Taxonomic Classification of Metagenomic Short Reads

Type

Master's thesis

Abstract

Hospital-acquired infections are a major problem in modern healthcare, and they are becoming more difficult to treat due to increasing antibiotic resistance. Limiting the spread of serious bacterial infections requires fast diagnosis and treatment. The advent of next-generation sequencing has drastically reduced sequencing costs, making it feasible to analyze metagenomic samples taken directly from patients. This thesis evaluates three metagenomic analysis tools with regard to species identification and abundance estimation, using simulated metagenomic short reads originating from 15 different species. Each tool showed its own strengths and weaknesses; however, a notable weakness was the classification of reads belonging to the Streptococcus mitis group and the Mycobacterium tuberculosis complex. To improve the classification of reads from Streptococcus and Mycobacterium, we implemented a feed-forward neural network. For Streptococcus species we obtained an accuracy of 95%, while our models did not exceed 31% accuracy for Mycobacterium species. One cause of this difference is that the pairwise BLAST identity within the species groups is around 95% for Streptococcus and 99% for Mycobacterium.

Subject/keywords

Metagenomics, Machine Learning, Neural Network, Taxonomic Classification