What is a successful antibiotic resistance gene? A conceptual model and machine learning predictions

Type
Master's Thesis
Program
Biotechnology (MPBIO), MSc
Published
2024
Authors
Einarsson, Elinor
Torell, Stina
Abstract
Antibiotic resistance is a global public health threat that makes bacterial infections more difficult to treat. The spread of antibiotic resistance genes (ARGs) is predominantly driven by horizontal gene transfer (HGT), which enables bacteria to share genetic information directly between cells. The ability of an ARG to spread is influenced by a range of factors, and identifying the characteristics that enable rapid dissemination of antibiotic resistance has become a popular field of research. Such knowledge facilitates the identification of ARGs that can disseminate rapidly and allows proactive measures against their dissemination to be implemented. Bioinformatics tools were used to study the prevalence of 4775 known ARGs in 867 318 bacterial genomes. A conceptual model describing the success of an ARG was developed, containing four measures of dissemination: across taxonomic barriers, in different GC environments, geographically, and to pathogenic bacteria. By using a top-down approach that studies the success of a gene, the thesis complements research on the factors that characterize successful and rapid HGT. The conceptual model resulted in a success score for each ARG that reflected its overall performance in the four components. Among the ARGs found to be highly successful, the most common class was multidrug resistance, followed by aminoglycoside, β-lactam, and MLS antibiotic resistance. Furthermore, the success score, together with information about the genes, was used to investigate whether the success of an ARG can be predicted with machine learning, using a Random Forest algorithm for binary classification. The model was built to evaluate the predictive performance using decreasing numbers of observations of each gene. As expected, the predictive performance of the model improved as the number of observations increased.
Based on only one observation, it was possible to predict the class of each gene with an average sensitivity of ~70% at 90% specificity, and with 250 observations a sensitivity of 98% could be attained. Sequence-related features such as gene length and codon usage were important when only a few observations of a gene were used, but as the number of observations grew, non-sequence-related features, such as the number of countries and pathogens a gene was found in, became more relevant. The thesis also includes a meta-analysis that explores the managerial and policy implications of antibiotic resistance; its findings include that policies that facilitate the use of machine learning are important to implement. This study can be used as a starting point for modelling the success of antibiotic resistance genes, with the aim of helping to identify emerging ARGs that could become future threats.
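For readers unfamiliar with the metric reported above, the following is a minimal sketch of how "sensitivity at 90% specificity" can be computed from a classifier's scores. It is not the thesis's evaluation code: the function name `sensitivity_at_specificity`, the thresholding rule (predict "successful" when the score is at or above a threshold), and the toy labels and scores are all assumptions made for illustration.

```python
def sensitivity_at_specificity(labels, scores, target_specificity=0.90):
    """Return the highest sensitivity over all thresholds whose
    specificity is at least `target_specificity`.

    labels: 1 for the positive class, 0 for the negative class (assumed coding).
    scores: classifier scores, higher meaning "more likely positive".
    """
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative example")

    best = 0.0
    # Try every distinct score as a threshold; predict positive when score >= threshold.
    for threshold in sorted(set(scores)):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
        fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
        specificity = (negatives - fp) / negatives  # true negative rate
        sensitivity = tp / positives                # true positive rate
        if specificity >= target_specificity:
            best = max(best, sensitivity)
    return best


# Made-up toy data: 3 positive and 10 negative examples.
toy_labels = [1, 1, 1] + [0] * 10
toy_scores = [0.9, 0.8, 0.3,
              0.85, 0.4, 0.2, 0.15, 0.1, 0.1, 0.05, 0.05, 0.02, 0.01]
toy_result = sensitivity_at_specificity(toy_labels, toy_scores)
```

With ten negatives, 90% specificity allows at most one false positive, so the threshold can be lowered only until a second negative would be misclassified.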
Subject/keywords
Antibiotic Resistance, Bioinformatics, Horizontal Gene Transfer, Successful ARGs, Machine Learning, Random Forest, Managerial implications