Deep learning for prediction of antibiotic resistance genes
Type
Master's thesis
Abstract
Antibiotic resistance is a serious public health challenge because it reduces the ability
to prevent and treat bacterial infections with antibiotics. Bacteria that are resistant
to antibiotics carry resistance genes that can be transferred between cells. This flow
of resistance genes is one of the main reasons behind the rapid global increase of
antibiotic-resistant bacteria. It is essential to gather information about already
existing resistance genes in order to counter this flow and to understand which
resistance genes might appear in future clinical settings.
The aim of this master’s thesis is to investigate whether the transformer, a relatively
new deep learning model, can predict genes that confer resistance to the antibiotic
class aminoglycosides, and whether it can distinguish between five
different resistance gene classes. An advantage of transformers is that they rely
on attention mechanisms, which can detect the global and complex dependencies in DNA
structures that help characterize resistance genes. To reach this aim,
the architecture and parameters of the transformer model are explored and
evaluated to find the model that yields the best performance. The best-performing model is
then used to make predictions on a real dataset.
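As an illustration of the attention mechanism mentioned above, the following minimal sketch computes scaled dot-product self-attention over a short DNA fragment. It is not the thesis's model: the one-hot nucleotide encoding, the fragment "ACGA", and the function names are assumptions chosen for illustration, and a real transformer would add learned projections, multiple heads, and positional information.

```python
import math

# Hypothetical one-hot encoding of nucleotides (an assumption for illustration).
ONE_HOT = {"A": [1, 0, 0, 0], "C": [0, 1, 0, 0], "G": [0, 0, 1, 0], "T": [0, 0, 0, 1]}


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Return (outputs, weights) for softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        w = softmax(scores)
        weights.append(w)
        # Weighted average of the value vectors.
        outputs.append(
            [sum(wj * v[c] for wj, v in zip(w, values)) for c in range(len(values[0]))]
        )
    return outputs, weights


# Self-attention: queries, keys and values all come from the same fragment.
fragment = "ACGA"
x = [ONE_HOT[base] for base in fragment]
outputs, weights = scaled_dot_product_attention(x, x, x)
```

In this toy case the first position (A) attends equally to itself and to the other A at the end of the fragment, regardless of the distance between them, which is the sense in which attention can capture global dependencies.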
We obtained a transformer model that predicted resistance genes with a sensitivity
of 0.989 and a specificity of 0.999. Using the same model, about 0.237% of the
real data was predicted as resistant. When the model was used to distinguish
between resistance gene classes, the sensitivity varied between classes, from a lowest
value of 0.263 to a highest value of 0.823. For all classes, the
specificity was higher than 0.970. We conclude that the performance of the transformer
model depends to a great extent on the characteristics of the input data. The
larger and more diverse the dataset, the more dependencies in the DNA structure can
be captured, which implies better performance. With suitable datasets, the transformer
model can make classifications with very good performance.
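For reference, sensitivity is TP / (TP + FN) and specificity is TN / (TN + FP). The sketch below shows one common way to compute both per class in a one-vs-rest fashion. The function name, the class labels ("A", "B", "N"), and the toy predictions are hypothetical and do not come from the thesis data; how the thesis itself computed its per-class figures is not stated here.

```python
def per_class_sensitivity_specificity(y_true, y_pred, labels):
    """Return {label: (sensitivity, specificity)} from one-vs-rest counts."""
    results = {}
    for label in labels:
        tp = fn = fp = tn = 0
        for truth, pred in zip(y_true, y_pred):
            if truth == label and pred == label:
                tp += 1  # class present and detected
            elif truth == label:
                fn += 1  # class present but missed
            elif pred == label:
                fp += 1  # class absent but predicted
            else:
                tn += 1  # class absent and not predicted
        sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
        specificity = tn / (tn + fp) if (tn + fp) else float("nan")
        results[label] = (sensitivity, specificity)
    return results


# Toy labels only (hypothetical): two resistance classes "A" and "B",
# plus "N" for non-resistant sequences.
y_true = ["A", "A", "A", "A", "B", "B", "N", "N", "N", "N"]
y_pred = ["A", "A", "A", "N", "B", "N", "N", "N", "N", "A"]
metrics = per_class_sensitivity_specificity(y_true, y_pred, ["A", "B", "N"])
```

Because the negatives for each class include every other class, specificity can stay high even when a class has low sensitivity, which matches the pattern reported above.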
Keywords
antibiotic resistance, resistance genes, deep learning, transformer model, predictions, sensitivity, specificity