Taxonomic Classification of Metagenomic Short Reads
Master's thesis
Complex Adaptive Systems (MPCAS), MSc
Hospital-acquired infections are a major problem in modern healthcare, and they are becoming more difficult to treat due to increasing antibiotic resistance. Limiting the spread of serious bacterial infections requires fast diagnosis and treatment. The advent of next-generation sequencing has drastically reduced sequencing costs, making it feasible to analyze metagenomic samples taken directly from patients. This thesis evaluates three metagenomic analysis tools with regard to species identification and abundance estimation, using simulated metagenomic short reads originating from 15 different species. Each tool showed its own strengths and weaknesses; however, a notable weakness was the classification of reads belonging to the Streptococcus mitis group and the Mycobacterium tuberculosis complex. To improve the classification of reads from Streptococcus and Mycobacterium, we implemented a feed-forward neural network. For Streptococcus species our models reached an accuracy of 95%, whereas for Mycobacterium species they did not exceed 31%. One cause of this difference is that pairwise BLAST identity within the species groups is around 95% for Streptococcus but around 99% for Mycobacterium.
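To give a rough sense of why a 4-percentage-point difference in pairwise identity matters so much for short reads, the sketch below works through a deliberately simplified model. It assumes that two genomes with pairwise identity p differ at each position independently with probability 1 - p (ignoring indels, local variation in divergence, and the fact that BLAST identity is measured over aligned regions only), and it uses a hypothetical read length of 150 bp, which the abstract does not specify. The function names are illustrative only and are not part of the thesis.

```python
# Back-of-envelope model (an assumption, not the thesis's analysis): two genomes
# with pairwise identity p differ at each position independently with probability 1 - p.


def expected_differences(identity: float, read_length: int) -> float:
    """Expected number of positions at which a read differs between the two genomes."""
    return read_length * (1.0 - identity)


def prob_identical_read(identity: float, read_length: int) -> float:
    """Probability that a read spans no distinguishing position at all."""
    return identity ** read_length


READ_LENGTH = 150  # hypothetical short-read length; not stated in the abstract

for group, identity in [("Streptococcus", 0.95), ("Mycobacterium", 0.99)]:
    print(
        f"{group}: {expected_differences(identity, READ_LENGTH):.1f} expected differing positions, "
        f"P(indistinguishable read) = {prob_identical_read(identity, READ_LENGTH):.4f}"
    )
```

Under these assumptions a 150 bp read carries about 7.5 distinguishing positions at 95% identity but only about 1.5 at 99% identity, and roughly one read in five (about 0.22) would be entirely identical between two 99%-identical genomes, compared with well under one in a thousand at 95%. Such reads give any classifier, neural or otherwise, no sequence signal to separate the species.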
Keywords: Metagenomics, Machine Learning, Neural Network, Taxonomic Classification