On nonlinear machine learning methodology for dose-response data in drug discovery
Type
Master's thesis
Abstract
This thesis investigates novel approaches to applying nonlinear methodology to dose-response
data in drug discovery. Such methodology could potentially yield new insights
and value within the field, saving resources such as time and reducing the number of
animals used in experiments. Methods for dimensionality reduction and visualization, as well as
methods for classification of compounds into clinical classes based on therapeutic
usage, are investigated. The thesis builds partly upon previous research where linear
methods, based on partial least squares and principal component analysis, have been
used for dimensionality reduction in drug discovery. Using results from linear
methods as a benchmark, this thesis investigates two nonlinear methods for dimensionality
reduction: kernel partial least squares and t-distributed stochastic neighbor embedding
(t-SNE). For classification of compounds, the linear method multinomial logistic regression
is compared with the nonlinear methods random forest and multi-layer perceptron networks.
Nonlinear methods for dimensionality reduction do not reveal any distinctly new patterns
or clusters compared to linear methods. However, some results are a promising basis
for further methodology development.
The best-performing classification method produces classifications consistent with
well-known effects for 70.6% of the evaluated compounds. Moreover, for 11.8% of the
compounds the classifications indicate potentially unknown effects; these are considered
interesting and could serve as a springboard for further analysis and innovation. This
classification methodology can therefore create insight and potentially high value.
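The classifier comparison can likewise be sketched with scikit-learn. This is an illustrative sketch on synthetic data, not the thesis pipeline or its results: the numbers of compounds, features and classes, the hyperparameters, and the use of 5-fold cross-validation are all assumptions. The last step shows one hypothetical way to tally agreement between predicted and annotated classes and to list disagreeing compounds for follow-up; it does not reproduce how the 70.6% and 11.8% figures were obtained.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in (assumption): 120 "compounds", 20 features, 4 "clinical classes".
X, y = make_classification(n_samples=120, n_features=20, n_informative=8,
                           n_classes=4, n_clusters_per_class=1, random_state=0)

models = {
    # With the default lbfgs solver, multiclass targets get a multinomial (softmax) fit.
    "multinomial_lr": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "random_forest": RandomForestClassifier(n_estimators=300, random_state=0),
    "mlp": make_pipeline(StandardScaler(),
                         MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000,
                                       random_state=0)),
}

# Mean 5-fold cross-validated accuracy for each model.
scores = {name: cross_val_score(m, X, y, cv=5).mean() for name, m in models.items()}
for name, s in scores.items():
    print(f"{name}: mean CV accuracy = {s:.3f}")

# Hypothetical follow-up step: out-of-fold predictions, agreement rate, and
# indices of compounds whose predicted class differs from the annotated one.
pred = cross_val_predict(models["random_forest"], X, y, cv=5)
agree = float(np.mean(pred == y))
flagged = np.flatnonzero(pred != y)
print(f"agreement: {agree:.1%}, flagged compounds: {flagged.size}")
```

Using out-of-fold predictions means each compound is classified by a model that did not see it during training, which is what makes a disagreement with the annotated class worth a closer look.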
Subject/keywords
drug discovery, machine learning, classification, multi-layer perceptron, random forest, dimensionality reduction, partial least squares, kernel partial least squares, t-SNE