Experimental Design for Comparative Metagenomics: Investigating and optimising the trade-off between number of samples and sequencing depth

Master’s thesis

Please use this identifier to cite or link to this item: https://hdl.handle.net/20.500.12380/300817
File: MastersThesis_Conti-Hermanova Billstein.pdf (5.76 MB, Adobe PDF)
Type: Master’s thesis
Title: Experimental Design for Comparative Metagenomics: Investigating and optimising the trade-off between number of samples and sequencing depth
Authors: Conti, Sofia
Hermanova Billstein, Martina
Abstract: In comparative metagenomics, samples from different environments are compared with the aim of identifying differentially abundant genes. Such studies need a sound experimental design, including a sufficiently large number of samples from each environment and a sufficiently high sequencing depth in each sample. The aim of this master’s thesis was to provide guidance on the number of samples and the sequencing depth required in future comparative metagenomic studies. To this end, experimental designs with different numbers of samples and sequencing depths were evaluated based on their statistical performance. For each design, a large number of artificial datasets were created by resampling real metagenomic data. Three real datasets were used and the analyses were conducted in R. All investigated designs performed better when the effect size of the studied phenomenon was large, and when the studied genes had high abundance or low variability. Performance also improved both with increasing sequencing depth and with increasing number of samples in each group. A sequencing depth of ten thousand reads was generally too low to yield acceptable performance. Likewise, three samples in each group were found to be too few unless the studied genes had high abundance or low variability. The main result was that performance improved more with an increasing number of samples than with increasing sequencing depth. However, when the economic aspect was taken into account, a larger number of samples became less profitable because of the high sequencing cost per sample. A final conclusion was that an experimental design may be less extensive and use fewer samples if the effect size is large or if the studied genes have high abundance or low variability.
Keywords: bioinformatics, performance, statistical power, economic impact, false discovery rate (FDR), effect size, gene abundance, gene variability, differentially abundant genes (DAGs), R.
Issue Date: 2020
Publisher: Chalmers tekniska högskola / Institutionen för matematiska vetenskaper
URI: https://hdl.handle.net/20.500.12380/300817
Collection: Examensarbeten för masterexamen // Master Theses
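
Illustration: The abstract describes building artificial datasets by resampling real metagenomic data for designs that differ in number of samples and sequencing depth, and then weighing performance against sequencing cost. The sketch below shows one way such a resampling step and a cost comparison could look. It is not the authors' code: the thesis analyses were done in R, whereas this is plain Python, and the function names, the toy count data, and the linear cost model with its prices are all assumptions made for illustration only.

```python
import random


def subsample_counts(counts, depth, rng):
    """Draw `depth` reads without replacement from one sample's gene counts."""
    # Expand the count vector into one entry per read, labelled by gene index.
    reads = [gene for gene, c in enumerate(counts) for _ in range(c)]
    if depth > len(reads):
        raise ValueError("requested depth exceeds the reads available")
    out = [0] * len(counts)
    for gene in rng.sample(reads, depth):
        out[gene] += 1
    return out


def make_design(samples, n_per_group, depth, rng):
    """Pick 2 * n_per_group samples at random, reduce each to `depth` reads,
    and split them into two groups of equal size."""
    chosen = rng.sample(samples, 2 * n_per_group)
    reduced = [subsample_counts(s, depth, rng) for s in chosen]
    return reduced[:n_per_group], reduced[n_per_group:]


def design_cost(n_per_group, depth, cost_per_sample, cost_per_read):
    """Hypothetical linear cost model: a fixed cost per sample plus a cost
    per sequenced read, summed over both groups."""
    return 2 * n_per_group * (cost_per_sample + depth * cost_per_read)


# Toy data (assumed): 8 samples, 5 genes, 50 to 200 reads per gene.
rng = random.Random(0)
toy = [[rng.randint(50, 200) for _ in range(5)] for _ in range(8)]

group_a, group_b = make_design(toy, n_per_group=3, depth=100, rng=rng)
cost = design_cost(n_per_group=3, depth=100, cost_per_sample=10.0,
                   cost_per_read=0.01)
```

Repeating `make_design` many times for each combination of `n_per_group` and `depth`, and running a differential abundance test on every artificial dataset, would give the kind of performance estimates the abstract refers to; the test itself and the way effects are introduced are left out here because the abstract does not specify them.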


