Active Learning and Predictive Modeling Using Uncertainty Quantification

Blomgren, Carl; Gummesson Svensson, Hampus

dc.contributor.author	Blomgren, Carl
dc.contributor.author	Gummesson Svensson, Hampus
dc.contributor.department	Chalmers University of Technology / Department of Computer Science and Engineering	sv
dc.contributor.examiner	Kemp, Graham
dc.contributor.supervisor	Yu, Yinan
dc.contributor.supervisor	Johansson, Simon
dc.date.accessioned	2020-09-18T14:08:32Z
dc.date.available	2020-09-18T14:08:32Z
dc.date.issued	2020	sv
dc.date.submitted	2020
dc.description.abstract	A shortcoming of current state-of-the-art machine learning algorithms in drug discovery is that they provide only a point estimate. However, in drug discovery, where data comes from costly and time-consuming experiments, models need to indicate the uncertainty of their outputs; otherwise, they might be used erroneously. To obtain uncertainty from the models, this thesis uses Bayesian statistical models. The objective of the thesis is twofold: (1) Investigate the use of uncertainty in active learning (AL) for predicting the observed yields of chemical reactions with different reaction conditions and reactants. Uncertainty-based methods for AL were compared with methods based on design of experiments. The predictions were made using the Bayesian probabilistic matrix factorization model Macau. (2) Investigate how the induced uncertainty affects the performance of Bayesian neural networks used to predict reaction conditions. The uncertainty was used to evaluate how reliable the obtained predictions are. The network was based on variational Bayesian methods, and we compare Bayes by Backprop and MC dropout on a severely imbalanced data set. We found that using uncertainty in active learning gives better performance with respect to absolute error and variance once a sufficient number of data points have been added to the training set. Also, using uncertainty seems to yield a significantly different training set compared to randomly selected points. Bayes by Backprop achieves accuracy comparable to MC dropout; however, it struggles to predict the minority classes. This further affects the uncertainty estimates on the minority classes, which could indicate that MC dropout is more certain than Bayes by Backprop. To conclude, the introduction of uncertainty quantification seems to provide some valuable information to synthesis prediction models. However, future research on the quality of the uncertainty is needed to use the induced uncertainty to its full extent.	sv
dc.identifier.coursecode	DATX05	sv
dc.identifier.uri	https://hdl.handle.net/20.500.12380/301740
dc.language.iso	eng	sv
dc.setspec.uppsok	Technology
dc.subject	machine learning	sv
dc.subject	uncertainty quantification	sv
dc.subject	Bayesian probabilistic matrix factorization	sv
dc.subject	Bayesian neural networks	sv
dc.subject	Bayesian statistics	sv
dc.subject	variational inference	sv
dc.subject	active learning	sv
dc.subject	drug discovery	sv
dc.subject	synthesis prediction	sv
dc.title	Active Learning and Predictive Modeling Using Uncertainty Quantification	sv
dc.type.degree	Master's thesis	sv
dc.type.uppsok	H

Original bundle

Name: CSE 20-55 Blomgren Gummesson Svensson.pdf
Size: 15.62 MB
Format: Adobe Portable Document Format
Description: Active Learning and Predictive Modeling Using Uncertainty Quantification

Collections

Master's theses