Predicting multiple chemical contexts using multi-label classification and predictors
Type
Master's thesis
Published
2021
Authors
Lahti, Gustav
Mårdh, Agnes
Abstract
Drug discovery is a time- and resource-intensive process, and machine learning is one
way of speeding it up. One important task is to choose suitable conditions (solvents,
catalysts, etc.) for a reaction in order to optimize the amount of product obtained.
The purpose of this thesis was to investigate ways to improve condition prediction.
In this thesis, condition prediction is limited to chemical contexts, that is, sets of
conditions, and to the Buchwald-Hartwig reaction class, which is common in drug
discovery.
First, we evaluate two approaches to multi-label classification for predicting
several possible chemical contexts for a reaction: a neural network and a binary
relevance model. Second, we present a model for condition prediction for a chemical
library used in parallel synthesis. Last, we add Venn-ABERS predictors on top of
these models to evaluate the impact of model calibration on these tasks. However,
calibrating the scores with Venn-ABERS predictors did not improve our results.
All models show potential for improving condition prediction. We consider both
models for the multi-label classification task to be well-performing, and both
performed better than the naive models. The novel model for condition prediction
for chemical libraries also showed good results, outperforming the naive
classifiers.
Subject/keywords
reaction prediction, condition prediction, cheminformatics, machine learning, drug development, multi-label classification, predictor, model calibration