Machine Learning for Prediction of Antibiotic Resistance

Carlsson, Emil

dc.contributor.author	Carlsson, Emil
dc.contributor.department	Chalmers tekniska högskola / Institutionen för matematiska vetenskaper	sv
dc.contributor.examiner	Lundh, Torbjörn
dc.contributor.supervisor	Dyrkell, Erik
dc.date.accessioned	2022-10-06T19:07:38Z
dc.date.available	2022-10-06T19:07:38Z
dc.date.issued	2019	sv
dc.date.submitted	2020
dc.description.abstract	This thesis investigates whether machine learning can be used to diagnose whether a bacterium is resistant to a given antibiotic. This is done by building a model that predicts the minimum inhibitory concentration, defined as the minimum dosage of a drug needed to inhibit an infection or disease. To do this, a labeled dataset consisting of 4964 genomes from Salmonella bacteria, with corresponding minimum inhibitory concentrations for up to 15 antibiotics, was used alongside an unlabeled dataset of Salmonella genomes taken from NCBI GenBank. Because the dataset is small compared to the length of a Salmonella genome (more than 4 000 000 nucleotides), we divided each genome into k-mers and viewed each k-mer as a word. The genome can then be viewed as a document, and the problem becomes classifying this document with respect to antibiotic resistance. To classify the document we took a bag-of-words approach, counting the occurrences of each k-mer and producing a count vector for each genome. The bag-of-words approach loses information about the context of each k-mer but makes further processing feasible. We considered two machine learning models for the task: a standard feedforward neural network trained in a supervised setting and a ladder network trained in a semi-supervised setting. We trained the networks to predict the minimum inhibitory concentration for all 15 antibiotics simultaneously. To handle missing labels in the data we constructed a customized output layer consisting of 15 concatenated softmax layers. Given a missing label, we simply ignored the gradient from the corresponding softmax layer. The training set was also over-sampled using two different techniques, based on bootstrapping and synthetic minority over-sampling.
Through hyperparameter tuning using the Tree-structured Parzen Estimator, it was found that semi-supervised learning did not improve accuracy and that a standard feedforward neural network had the best accuracy when predicting the exact minimum inhibitory concentration. Our feedforward neural network was then compared to a baseline model, based on the distribution of labels in the dataset, and to an existing machine learning model trained on the same dataset. Our feedforward neural network outperformed both of these models at predicting minimum inhibitory concentrations. The average accuracy for predicting the exact minimum inhibitory concentration was 0.78, and when the result was translated to the labels sensitive, intermediate and resistant, the model reached an average accuracy of 0.97. In addition, we evaluated our model with respect to the error rates defined by the National Antimicrobial Resistance Monitoring System, and the error rates were found not to be low enough for use in a clinical setting. We think that this is due to a combination of the limitations of the bag-of-words approach and the lack of data. Nevertheless, from this work we can conclude that machine learning is an interesting and promising approach to autonomous prediction of minimum inhibitory concentration and diagnosis of antibiotic resistance. However, several problems, such as the interpretability of the models and skewness in the datasets, are yet to be solved before a machine learning model can be used in a clinical setting for this purpose. We end this thesis with a discussion of future work that could solve many of the problems encountered throughout this thesis.	sv
dc.identifier.coursecode	MVEX03	sv
dc.identifier.uri	https://hdl.handle.net/20.500.12380/305689
dc.language.iso	eng	sv
dc.setspec.uppsok	PhysicsChemistryMaths
dc.subject	Machine Learning, Salmonella, Antibiotic Resistance, Minimum Inhibitory Concentration, Neural Network, Ladder Network, Bayesian Optimization	sv
dc.title	Machine Learning for Prediction of Antibiotic Resistance	sv
dc.type.degree	Examensarbete för masterexamen	sv
dc.type.uppsok	H
local.programme	Engineering mathematics and computational science (MPENM), MSc
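
As an illustration of two steps described in the abstract, the following is a minimal Python sketch, not the thesis's actual implementation. It assumes a four-letter alphabet (A, C, G, T), so k-mers containing other symbols are left out of the vector, and it picks k = 2 in the usage example purely for readability, since the abstract does not state which k was used. The function names `kmer_counts`, `bag_of_words_vector` and `masked_cross_entropy` are hypothetical. The last function shows one plausible way to ignore the gradient from a softmax layer whose label is missing: that layer is simply left out of the loss.

```python
from collections import Counter
from itertools import product
import math


def kmer_counts(genome, k):
    """Count every overlapping k-mer (substring of length k) in a genome."""
    genome = genome.upper()
    return Counter(genome[i:i + k] for i in range(len(genome) - k + 1))


def bag_of_words_vector(genome, k, alphabet="ACGT"):
    """Return a fixed-length count vector over all len(alphabet)**k k-mers.

    The vocabulary is enumerated in the order given by `alphabet`
    (an assumption of this sketch), so every genome maps to a vector
    with the same layout.
    """
    counts = kmer_counts(genome, k)
    vocabulary = ("".join(p) for p in product(alphabet, repeat=k))
    return [counts[kmer] for kmer in vocabulary]


def masked_cross_entropy(logits_per_head, labels):
    """Sum softmax cross-entropy over output heads, skipping missing labels.

    logits_per_head: one list of raw scores per antibiotic.
    labels: one class index per antibiotic, or None when the label is missing.
    A skipped head adds nothing to the loss, so it contributes no gradient.
    """
    total = 0.0
    for logits, label in zip(logits_per_head, labels):
        if label is None:
            continue
        peak = max(logits)  # subtract the maximum for numerical stability
        log_norm = peak + math.log(sum(math.exp(z - peak) for z in logits))
        total += log_norm - logits[label]
    return total


# Usage: a toy "genome" and two output heads, the second with a missing label.
vector = bag_of_words_vector("ACGTAC", k=2)
loss = masked_cross_entropy([[0.0, 0.0], [5.0, 1.0]], [0, None])
```

With k-mers of length k over four nucleotides the vector has 4^k entries, which is why the choice of k trades context against feasibility, as the abstract notes.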

Original bundle

Name: EmilCarlsson_Master_s_Thesis_2019.pdf
Size: 4.1 MB
Format: Adobe Portable Document Format

Collections

Master's theses