Deep learning for prediction of antibiotic resistance genes

Type

Master’s thesis

Abstract

Antibiotic resistance is a serious public health challenge because it reduces the ability to prevent and treat bacterial infections with antibiotics. Bacteria that are resistant to antibiotics carry resistance genes that can be shared between cells. This flow of resistance genes is one of the main reasons behind the rapid global increase of antibiotic-resistant bacteria. Gathering information about existing resistance genes is essential to counter this flow and to understand which resistance genes might appear in future clinical settings. The aim of this master’s thesis is to investigate whether the transformer, a relatively new deep learning model, can predict genes that confer resistance to the antibiotic class aminoglycosides, and whether it can distinguish between five different resistance gene classes. An advantage of transformers is that they rely on attention mechanisms, which can detect global and complex dependencies in the DNA structure that help characterize resistance genes. To reach this aim, the architecture and parameters of the transformer model are explored and evaluated to find the model yielding the best performance. The best-performing model is then used to make predictions on a real dataset. We obtained a transformer model that could predict resistance genes with a sensitivity of 0.989 and a specificity of 0.999. Using the same model, around 0.237% of the real data were predicted as resistant. When the model was used to distinguish between resistance gene classes, the sensitivity varied between classes, from a lowest value of 0.263 to a highest value of 0.823. For all classes the specificity was higher than 0.970. One conclusion is that the performance of the transformer model depends to a great extent on the nature of the input data: the bigger and more diverse the dataset, the more dependencies in the DNA structure can be captured, which implies better performance. With suitable datasets, the transformer model can make classifications with very good performance.
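
For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how sensitivity and specificity could be computed for one class in a one-vs-rest fashion. It is not taken from the thesis: the function name `sensitivity_specificity`, the labels "resistant" and "non-resistant", and the toy predictions are all hypothetical and serve only as illustration.

```python
def sensitivity_specificity(y_true, y_pred, positive):
    """One-vs-rest sensitivity and specificity for the label `positive`.

    Hypothetical helper for illustration; not the thesis implementation.
    """
    tp = fn = fp = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1  # positive sequence correctly detected
            else:
                fn += 1  # positive sequence missed
        else:
            if pred == positive:
                fp += 1  # negative sequence wrongly flagged
            else:
                tn += 1  # negative sequence correctly rejected
    # Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy labels (made up): 4 resistant and 6 non-resistant sequences.
y_true = ["resistant"] * 4 + ["non-resistant"] * 6
y_pred = ["resistant"] * 3 + ["non-resistant"] * 6 + ["resistant"]

sens, spec = sensitivity_specificity(y_true, y_pred, positive="resistant")
print(f"sensitivity={sens:.3f}, specificity={spec:.3f}")
```

In a multi-class setting such as the five resistance gene classes, the same function could be called once per class, treating that class as `positive` and all other classes as negative.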

Keywords

antibiotic resistance, resistance genes, deep learning, transformer model, predictions, sensitivity, specificity
