Taxonomic Classification of Metagenomic Short Reads
Type
Master's thesis
Program
Complex adaptive systems (MPCAS), MSc
Published
2020
Author
Wikström, Matilda
Abstract
Hospital-acquired infections are a major issue in modern healthcare, and they are
becoming more difficult to treat due to increasing antibiotic resistance. To limit the
spread of serious bacterial infections there is a need for fast diagnosis and treatment.
The advent of next-generation sequencing has drastically reduced sequencing costs,
making it feasible to analyze metagenomic samples taken directly from the patients.
This thesis evaluates three metagenomic analysis tools with regard to species
identification and abundance estimation for simulated metagenomic short reads originating
from 15 different species. The tools showed different strengths and weaknesses;
however, one outstanding weakness was the classification of reads belonging
to the Streptococcus mitis group and the Mycobacterium tuberculosis complex.
To improve the classification of reads from Streptococcus and Mycobacterium, we
implemented a feed-forward neural network. For Streptococcus species, we obtained
an accuracy of 95%, while our models failed to reach higher than 31% accuracy for
Mycobacterium species. One cause of this difference is that the
pairwise BLAST identity within the species groups is around 95% for
Streptococcus but around 99% for Mycobacterium, which leaves fewer
distinguishing positions per read.
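
To give a rough sense of why the gap between 95% and 99% identity matters for short reads, the following sketch estimates how many positions in a single read are expected to differ between two species of the same group. It is an illustration only, not part of the thesis: the read length of 150 bp is an assumption (the abstract does not state one), and differences are assumed to be independent and evenly spread along the genome, which real genomes do not strictly satisfy. The function names are hypothetical.

```python
def expected_mismatches(read_length: int, identity: float) -> float:
    """Expected number of differing positions in a read of the given length,
    assuming each position differs independently with probability 1 - identity."""
    return read_length * (1.0 - identity)


def prob_identical_read(read_length: int, identity: float) -> float:
    """Probability that a read carries no distinguishing position at all,
    under the same independence assumption."""
    return identity ** read_length


# Assumed read length; not taken from the abstract.
READ_LENGTH = 150

# Pairwise identities as reported in the abstract (approximate values).
GROUP_IDENTITY = {"Streptococcus": 0.95, "Mycobacterium": 0.99}

for group, identity in GROUP_IDENTITY.items():
    mismatches = expected_mismatches(READ_LENGTH, identity)
    p_identical = prob_identical_read(READ_LENGTH, identity)
    print(f"{group}: ~{mismatches:.1f} expected differing positions, "
          f"P(read identical between species) = {p_identical:.4f}")
```

Under these assumptions a 150 bp read would carry roughly 7.5 differing positions at 95% identity but only about 1.5 at 99%, and a noticeable share of reads at 99% identity would carry none, so a classifier has little signal to work with.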
Subject/keywords
Metagenomics, Machine Learning, Neural Network, Taxonomic Classification