Predicting multiple chemical contexts using multi-label classification and predictors
Master's thesis
Drug discovery is a time- and resource-intensive process, and machine learning is one way to speed it up. One important task is choosing suitable conditions (solvents, catalysts, etc.) for a reaction in order to optimize the amount of product obtained. The purpose of this thesis was to investigate ways to improve condition prediction. The scope is limited to chemical contexts, that is, sets of conditions, and to the Buchwald-Hartwig reaction class, which is common in drug discovery. First, we evaluate two approaches to multi-label classification, a neural network and a binary relevance model, for predicting several possible chemical contexts for a reaction. Second, we present a model for condition prediction for a chemical library used in parallel synthesis. Last, we add Venn-ABERS predictors on top of these models to evaluate the impact of model calibration on these tasks. Calibrating the scores with Venn-ABERS predictors did not improve our results. Nevertheless, all models show potential for improving condition prediction. We consider both multi-label classification models to be well-performing, and both outperformed the naive models. The novel model for condition prediction for chemical libraries also showed good results and outperformed naive classifiers.
reaction prediction, condition prediction, cheminformatics, machine learning, drug development, multi-label classification, predictor, model calibration
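To make the binary relevance approach named in the abstract concrete, the following is a minimal sketch of the general technique: one independent binary classifier is trained per label (here, per chemical context), and the per-label predictions are combined into a multi-label output. It is not the thesis implementation. The class names `CentroidBinaryClassifier` and `BinaryRelevance`, the nearest-centroid base classifier, and the toy feature and label data are all assumptions introduced for illustration.

```python
from typing import List, Sequence


class CentroidBinaryClassifier:
    """Hypothetical stand-in base classifier: predicts the class (0 or 1)
    whose training centroid is closest to the input vector."""

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[int]):
        self.centroids = {}
        for cls in (0, 1):
            rows = [x for x, t in zip(X, y) if t == cls]
            if rows:  # skip a class that never occurs for this label
                self.centroids[cls] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict_one(self, x: Sequence[float]) -> int:
        def dist2(center: Sequence[float]) -> float:
            return sum((a - b) ** 2 for a, b in zip(x, center))

        return min(self.centroids, key=lambda cls: dist2(self.centroids[cls]))


class BinaryRelevance:
    """Binary relevance: fit one independent binary classifier per label."""

    def fit(self, X: Sequence[Sequence[float]], Y: Sequence[Sequence[int]]):
        n_labels = len(Y[0])
        self.models = [
            CentroidBinaryClassifier().fit(X, [row[j] for row in Y])
            for j in range(n_labels)
        ]
        return self

    def predict(self, X: Sequence[Sequence[float]]) -> List[List[int]]:
        # Each label is decided on its own, ignoring the other labels.
        return [[m.predict_one(x) for m in self.models] for x in X]


# Toy data (assumed): 2-d reaction features, 3 hypothetical chemical contexts.
X_train = [[0, 0], [0, 1], [1, 0], [1, 1]]
Y_train = [[1, 0, 1], [1, 1, 0], [0, 0, 1], [0, 1, 0]]

model = BinaryRelevance().fit(X_train, Y_train)
predictions = model.predict([[0.1, 0.9], [0.9, 0.1]])
print(predictions)
```

Because each label is modelled separately, a reaction can be assigned several contexts at once, but correlations between contexts are ignored. In practice the base classifier would be replaced by a stronger model, and its scores could then be passed to a calibration step.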