Machine Learning for Prediction of Antibiotic Resistance

dc.contributor.author: Carlsson, Emil
dc.contributor.department: Chalmers University of Technology / Department of Mathematical Sciences
dc.contributor.examiner: Lundh, Torbjörn
dc.contributor.supervisor: Dyrkell, Erik
dc.date.accessioned: 2022-10-06T19:07:38Z
dc.date.available: 2022-10-06T19:07:38Z
dc.date.issued: 2019
dc.date.submitted: 2020
dc.description.abstract: This thesis investigates whether machine learning can be used to diagnose whether a bacterium is resistant to a given antibiotic. This is done by building a model that predicts the minimum inhibitory concentration, defined as the minimum dosage of a drug needed to inhibit an infection or disease. To do this, a labeled dataset consisting of 4964 genomes from Salmonella bacteria, with corresponding minimum inhibitory concentrations for up to 15 antibiotics, was used alongside an unlabeled dataset of Salmonella genomes taken from NCBI GenBank. Because the dataset is small compared to the length of a Salmonella genome, more than 4 000 000 nucleotides, we divided each genome into k-mers and viewed each k-mer as a word. The genome can then be viewed as a document, and the problem becomes classifying this document with respect to antibiotic resistance. To classify the document we took a bag-of-words approach, counting the occurrences of each k-mer and producing a count vector for each genome. The bag-of-words approach loses information about the context of individual k-mers but makes further processing feasible. We considered two machine learning models for the task: a standard feedforward neural network trained in a supervised setting and a ladder network trained in a semi-supervised setting. We trained the networks to predict the minimum inhibitory concentration for all 15 antibiotics simultaneously. To handle missing labels in the data we constructed a customized output layer consisting of 15 concatenated softmax layers. Given a missing label, we simply ignored the gradient from the corresponding softmax layer. The training set was also over-sampled using two different techniques based on bootstrapping and synthetic minority over-sampling.
Through hyperparameter tuning using the Tree-structured Parzen Estimator, it was found that semi-supervised learning did not improve the accuracy and that a standard feedforward neural network had the best accuracy when predicting the exact minimum inhibitory concentration. Our feedforward neural network was then compared to a baseline model, based on the distribution of labels in the dataset, and to an existing machine learning model trained on the same dataset. Our feedforward neural network outperformed both of these models at predicting minimum inhibitory concentrations. The average accuracy for prediction of the exact minimum inhibitory concentration was 0.78, and when the results were translated to the labels sensitive, intermediate and resistant, the model achieved an average accuracy of 0.97. In addition, we evaluated our model with respect to the error rates defined by the National Antimicrobial Resistance Monitoring System, and the error rates were found not to be low enough for use in a clinical setting. We think this is due to a combination of the limitations of the bag-of-words approach and the lack of data. Nevertheless, from this work we can conclude that machine learning is an interesting and promising approach to autonomous prediction of minimum inhibitory concentration and diagnosis of antibiotic resistance. However, several problems, such as the interpretability of the models and skewness in the datasets, are yet to be solved before a machine learning model can be used in a clinical setting for this purpose. We end this thesis with a discussion of future work that could solve many of the problems encountered.
dc.identifier.coursecode: MVEX03
dc.identifier.uri: https://hdl.handle.net/20.500.12380/305689
dc.language.iso: eng
dc.setspec.uppsok: PhysicsChemistryMaths
dc.subject: Machine Learning, Salmonella, Antibiotic Resistance, Minimum Inhibitory Concentration, Neural Network, Ladder Network, Bayesian Optimization
dc.title: Machine Learning for Prediction of Antibiotic Resistance
dc.type.degree: Master's thesis
dc.type.uppsok: H
local.programme: Engineering mathematics and computational science (MPENM), MSc
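
The k-mer bag-of-words representation described in the abstract can be illustrated with a minimal sketch. This is not the thesis code: the function names `kmer_counts` and `bag_of_kmers`, the toy sequence, and the choice k = 2 are assumptions made for illustration, and the abstract does not state which k, vocabulary, or preprocessing the author used.

```python
from collections import Counter
from itertools import product


def kmer_counts(genome, k):
    """Count every overlapping substring of length k (k-mer) in the genome."""
    genome = genome.upper()
    return Counter(genome[i:i + k] for i in range(len(genome) - k + 1))


def bag_of_kmers(genome, k, alphabet="ACGT"):
    """Return a fixed-length count vector over all len(alphabet)**k k-mers.

    The vocabulary is enumerated in lexicographic order of the alphabet, so
    every genome maps to a vector with the same layout. k-mers containing
    symbols outside the alphabet (for example N) are not represented
    (an assumption of this sketch).
    """
    counts = kmer_counts(genome, k)
    vocabulary = ("".join(letters) for letters in product(alphabet, repeat=k))
    return [counts[kmer] for kmer in vocabulary]


# Hypothetical toy sequence; a real Salmonella genome has over 4 000 000 nucleotides.
toy_genome = "ACGTACGA"
vector = bag_of_kmers(toy_genome, k=2)
```

The vector records how often each k-mer occurs but not where, which is the loss of context the abstract mentions. Its length grows as 4^k, so the choice of k trades detail against dimensionality.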
Original bundle
Name: EmilCarlsson_Master_s_Thesis_2019.pdf
Size: 4.1 MB
Format: Adobe Portable Document Format