Active Learning and Predictive Modeling Using Uncertainty Quantification
Type
Master's thesis
Published
2020
Authors
Blomgren, Carl
Gummesson Svensson, Hampus
Abstract
A shortcoming of current state-of-the-art machine learning algorithms in drug discovery
is that they provide only a point estimate. However, in drug discovery, where data
is associated with costly and time-consuming experiments, there is a need for the
models to indicate the uncertainty of their outputs. Otherwise, the models might be
used erroneously. To obtain uncertainty estimates from the models, this thesis uses
Bayesian statistical models. In particular, the objective of this thesis is twofold: (1)
Investigate the use of uncertainty in active learning (AL) for predicting the observed
yields of chemical reactions with different reaction conditions and reactants. Uncertainty-based
methods for AL and methods based on design of experiments were compared.
The predictions were made using the Bayesian probabilistic matrix factorization
model Macau. (2) Investigate how the induced uncertainty affects the performance
of Bayesian neural networks used to predict reaction conditions. The uncertainty
was used to evaluate how reliable the obtained predictions are. The network was
based on variational Bayesian methods, and we compared Bayes by Backprop and MC
dropout on a severely imbalanced data set. We found that the use of uncertainty
in active learning shows better performance with respect to absolute error and variance
when a sufficient number of data points have been added to the training set.
Also, using uncertainty seems to yield a significantly different training set compared
to randomly selected points. Bayes by Backprop achieves accuracy comparable
to MC dropout but struggles to predict the minority classes. This further
affects the uncertainty estimates on the minority classes, which could indicate that
MC dropout is more certain than Bayes by Backprop. To conclude, the introduction
of uncertainty quantification seems to provide some valuable information to synthesis
prediction models. However, further research on the quality of the uncertainty estimates is
needed to use the induced uncertainty to its full extent.
Subject/keywords
machine learning, uncertainty quantification, Bayesian probabilistic matrix factorization, Bayesian neural networks, Bayesian statistics, variational inference, active learning, drug discovery, synthesis prediction