Deep learning for prediction of antibiotic resistance genes

Type
Master's thesis
Program
Engineering mathematics and computational science (MPENM), MSc
Published
2021
Author
Salomonsson, Erika
Abstract
Antibiotic resistance is a serious public health challenge because it reduces our ability to prevent and treat bacterial infections with antibiotics. Resistant bacteria carry resistance genes that can be shared between cells. This flow of resistance genes is one of the main reasons behind the rapid global increase in antibiotic-resistant bacteria. Gathering information about existing resistance genes is essential, both to counter this flow and to understand which resistance genes might appear in future clinical settings. The aim of this master's thesis is to investigate whether the transformer, a relatively new deep learning model, can predict genes that confer resistance to the antibiotic class aminoglycosides, and whether it can distinguish between five different resistance gene classes. An advantage of transformers is that they rely on attention mechanisms, which can detect global and complex dependencies in DNA structures that help characterize resistance genes. To reach this aim, the architecture and parameters of the transformer model are explored and evaluated to find the model yielding the best performance. The optimal model is then used to make predictions on a real dataset. We obtained a transformer model that could predict resistance genes with a sensitivity of 0.989 and a specificity of 0.999. Using the same model, about 0.237% of the real data was predicted as resistant. When the model was used to distinguish between resistance gene classes, the sensitivity varied between classes, from a lowest value of 0.263 to a highest value of 0.823. For all classes, the specificity was higher than 0.970. One conclusion is that the performance of the transformer model depends to a great extent on the characteristics of the input data. The larger and more diverse the dataset, the more dependencies in the DNA structure can be captured, which implies better performance. With proper datasets, the transformer model can make classifications with very good performance.
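The sensitivity and specificity figures quoted in the abstract can be read as one-vs-rest rates computed from true and predicted labels. The following is a minimal sketch of that calculation, not the thesis's own evaluation code; the function name `sensitivity_specificity` and the toy class labels are hypothetical and chosen only for illustration.

```python
def sensitivity_specificity(y_true, y_pred, positive):
    """One-vs-rest sensitivity and specificity for the class `positive`.

    Hypothetical helper for illustration only.
    sensitivity = TP / (TP + FN), specificity = TN / (TN + FP).
    """
    tp = fn = fp = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1  # positive sample correctly detected
            else:
                fn += 1  # positive sample missed
        else:
            if pred == positive:
                fp += 1  # other class wrongly flagged as positive
            else:
                tn += 1  # other class correctly rejected
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy labels (made up, not data from the thesis).
y_true = ["A", "A", "B", "B", "C"]
y_pred = ["A", "B", "B", "B", "A"]

for cls in sorted(set(y_true)):
    sens, spec = sensitivity_specificity(y_true, y_pred, cls)
    print(f"class {cls}: sensitivity={sens:.3f}, specificity={spec:.3f}")
```

In a multi-class setting such as the five resistance gene classes, each class is treated in turn as the positive class, which is why a single model can show high specificity for every class while sensitivity varies widely between classes.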
Subject/keywords
antibiotic resistance, resistance genes, deep learning, transformer model, predictions, sensitivity, specificity