Active Learning and Predictive Modeling Using Uncertainty Quantification

Blomgren, Carl; Gummesson Svensson, Hampus

Download

CSE 20-55 Blomgren Gummesson Svensson.pdf (15.62 MB)

Published

2020

Authors

Blomgren, Carl

Gummesson Svensson, Hampus

Type

Master's thesis

Abstract

A shortcoming of current state-of-the-art machine learning algorithms in drug discovery is that they provide only a point estimate. In drug discovery, however, where data come from costly and time-consuming experiments, models need to indicate the uncertainty of their outputs; otherwise, they might be used erroneously. To obtain uncertainty estimates from the models, this thesis uses Bayesian statistical models. The objective of the thesis is twofold: (1) Investigate the use of uncertainty in active learning (AL) for predicting the observed yields of chemical reactions with different reaction conditions and reactants. Uncertainty-based methods for AL were compared with methods based on design of experiments. The predictions were made with the Bayesian probabilistic matrix factorization model Macau. (2) Investigate how the induced uncertainty affects the performance of Bayesian neural networks used to predict reaction conditions. The uncertainty was used to evaluate how reliable the obtained predictions are. The network was based on variational Bayesian methods, and we compared Bayes by Backprop and MC dropout on a severely imbalanced data set. We found that using uncertainty in active learning gives better performance with respect to absolute error and variance once a sufficient number of data points have been added to the training set. Using uncertainty also seems to yield a significantly different training set compared to randomly selected points. Bayes by Backprop achieves accuracy comparable to MC dropout, but it struggles to predict the minority classes. This in turn affects the uncertainty estimates on the minority classes, which could indicate that MC dropout is more certain than Bayes by Backprop. To conclude, the introduction of uncertainty quantification seems to provide some valuable information to synthesis prediction models. However, future research on the quality of the uncertainty is needed to use the induced uncertainty to its full extent.
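
As a rough illustration of the uncertainty-based active learning mentioned in the abstract, the Python sketch below picks the next reaction to measure as the one whose predicted yield varies most across posterior samples. This is not the thesis's implementation: the function name, the input format, the example numbers, and the choice of sample variance as the selection score are all assumptions made for illustration, and the Macau model itself is not used here.

```python
from statistics import pvariance


def select_most_uncertain(posterior_predictions):
    """Return the candidate whose predictions have the highest variance.

    posterior_predictions: dict mapping a candidate id to a list of
    predicted yields, one per posterior sample (hypothetical format).
    """
    return max(
        posterior_predictions,
        key=lambda candidate: pvariance(posterior_predictions[candidate]),
    )


# Hypothetical predicted yields (in percent) for three unlabelled reactions.
candidates = {
    "reaction_a": [52.0, 51.0, 53.0, 52.5],
    "reaction_b": [20.0, 60.0, 35.0, 80.0],
    "reaction_c": [70.0, 72.0, 69.0, 71.0],
}

# The selected reaction would be measured and added to the training set.
chosen = select_most_uncertain(candidates)
print(chosen)
```

In an active learning loop, this selection step would be repeated after each retraining, whereas a random baseline would draw the next candidate uniformly from the unlabelled pool.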

Subject/keywords

machine learning, uncertainty quantification, Bayesian probabilistic matrix factorization, Bayesian neural networks, Bayesian statistics, variational inference, active learning, drug discovery, synthesis prediction

URI

https://hdl.handle.net/20.500.12380/301740

Collections

Master's theses
