Experimental Design for Comparative Metagenomics: Investigating and optimising the trade-off between number of samples and sequencing depth

dc.contributor.author: Conti, Sofia
dc.contributor.author: Hermanova Billstein, Martina
dc.contributor.department: Chalmers tekniska högskola / Institutionen för matematiska vetenskaper
dc.contributor.examiner: Kristiansson, Erik
dc.contributor.supervisor: Kristiansson, Erik
dc.contributor.supervisor: Jonsson, Viktor
dc.date.accessioned: 2020-06-10T07:42:59Z
dc.date.available: 2020-06-10T07:42:59Z
dc.date.issued: 2020
dc.date.submitted: 2019
dc.description.abstract: In comparative metagenomics, samples from different environments are compared with the aim of identifying differentially abundant genes. Such studies need a sound experimental design, including a sufficiently large number of samples from each environment and a sufficiently high sequencing depth in each sample. The aim of this master’s thesis was to provide guidance on the number of samples and the sequencing depth required in future comparative metagenomic studies. To this end, experimental designs with different numbers of samples and different sequencing depths were evaluated based on their statistical performance. For each design, a large number of artificial datasets were created by resampling real metagenomic data. Three real datasets were used, and the analyses were conducted in R. The performance of all investigated designs improved when the effect size of the studied phenomenon was large, as well as when the studied genes had high abundance or low variability. Performance also increased both with increasing sequencing depth and with increasing number of samples in each group. A sequencing depth of ten thousand reads was generally too low to yield acceptable performance. Likewise, three samples per group were found to be too few unless the studied genes had high abundance or low variability. The main result was that performance improved more with an increasing number of samples than with increasing sequencing depth. However, when the economic aspect was taken into account, a larger number of samples became less profitable due to the high sequencing cost per sample. A final conclusion was that an experimental design may be less extensive and use fewer samples if the effect size is large or if the studied genes have high abundance or low variability.
dc.identifier.coursecode: MVEX03
dc.identifier.uri: https://hdl.handle.net/20.500.12380/300817
dc.language.iso: eng
dc.setspec.uppsok: PhysicsChemistryMaths
dc.subject: bioinformatics, performance, statistical power, economic impact, false discovery rate (FDR), effect size, gene abundance, gene variability, differentially abundant genes (DAGs), R
dc.title: Experimental Design for Comparative Metagenomics: Investigating and optimising the trade-off between number of samples and sequencing depth
dc.type.degree: Master's thesis
dc.type.uppsok: H
local.programme: Engineering mathematics and computational science (MPENM), MSc
Download
Original bundle
Name: MastersThesis_Conti-Hermanova Billstein.pdf
Size: 5.62 MB
Format: Adobe Portable Document Format