Experimental Design for Comparative Metagenomics: Investigating and optimising the trade-off between number of samples and sequencing depth

Conti, Sofia; Hermanova Billstein, Martina

dc.contributor.author	Conti, Sofia
dc.contributor.author	Hermanova Billstein, Martina
dc.contributor.department	Chalmers tekniska högskola / Institutionen för matematiska vetenskaper	sv
dc.contributor.examiner	Kristiansson, Erik
dc.contributor.supervisor	Kristiansson, Erik
dc.contributor.supervisor	Jonsson, Viktor
dc.date.accessioned	2020-06-10T07:42:59Z
dc.date.available	2020-06-10T07:42:59Z
dc.date.issued	2020	sv
dc.date.submitted	2019
dc.description.abstract	In comparative metagenomics, samples from different environments are compared with the aim of identifying differentially abundant genes. Such studies need a sound experimental design, including a sufficiently large number of samples from each environment and a sufficiently high sequencing depth in each sample. The aim of this master’s thesis was to provide guidance on the required number of samples and sequencing depth for experimental designs in future comparative metagenomic studies. To this end, experimental designs with different numbers of samples and sequencing depths were evaluated based on their statistical performance. For each design, a large number of artificial datasets were created by resampling real metagenomic data. Three real datasets were used and the analyses were conducted in R. The performance of all investigated designs improved when the effect size of the studied phenomenon was large and when the studied genes had high abundance or low variability. Performance also increased both with increasing sequencing depth and with increasing number of samples in each group. A sequencing depth of ten thousand reads was generally too low to yield acceptable performance. Likewise, three samples in each group were too few unless the studied genes had high abundance or low variability. The main result was that performance improved more with an increasing number of samples than with increasing sequencing depth. However, when the economic aspect was taken into account, a larger number of samples became less cost-effective because of the high sequencing cost per sample. A final conclusion was that an experimental design may be less extensive and use fewer samples if the effect size is large or if the studied genes have high abundance or low variability.	sv
dc.identifier.coursecode	MVEX03	sv
dc.identifier.uri	https://hdl.handle.net/20.500.12380/300817
dc.language.iso	eng	sv
dc.setspec.uppsok	PhysicsChemistryMaths
dc.subject	bioinformatics, performance, statistical power, economic impact, false discovery rate (FDR), effect size, gene abundance, gene variability, differentially abundant genes (DAGs), R.	sv
dc.title	Experimental Design for Comparative Metagenomics: Investigating and optimising the trade-off between number of samples and sequencing depth	sv
dc.type.degree	Master's thesis	sv
dc.type.uppsok	H
local.programme	Engineering mathematics and computational science (MPENM), MSc

Original bundle

Name: MastersThesis_Conti-Hermanova Billstein.pdf
Size: 5.62 MB
Format: Adobe Portable Document Format

Collections

Master's theses