Population Based Microsatellite Genotyping

Master's thesis

Please use this identifier to cite or link to this item: https://hdl.handle.net/20.500.12380/203063
File: 203063.pdf (Fulltext, 1.19 MB, Adobe PDF)
Type: Master Thesis
Title: Population Based Microsatellite Genotyping
Authors: Kristmundsdóttir, Snædís
Abstract: Microsatellites, also known as short tandem repeats (STRs), are short DNA sequences made up of repeated motifs of 2 to 6 bases. The number of repeats varies between individuals, and the repeat counts that occur in a population are known as the alleles of a microsatellite. Each individual carries two copies of each chromosome and hence two alleles of each microsatellite. At least 250,000 microsatellites have a known location on a human reference genome; the most common form is the dinucleotide repeat. Microsatellite analysis has a wide range of applications, including medical genetics, forensics and genetic genealogy. However, microsatellite variation is rarely considered in whole-genome sequencing studies, in large part due to a lack of tools capable of analyzing it. The goal of this thesis is to create a microsatellite genotype caller that is faster and more accurate than those previously presented. To accomplish this goal, two approaches were examined. First, we reduced by 87% the amount of sequencing data needed to create microsatellite profiles from previously aligned sequencing data. This was achieved by filtering the input to contain only reads aligned to known microsatellite locations and unaligned reads, since these are the reads expected to be useful for profiling. The results indicate that, when microsatellite profiling is performed on previously aligned data, running time can be reduced significantly with negligible effects on the resulting profile. Second, the accuracy of the microsatellite profiler was increased from 87.5% to 96.3%. The improvements included using population information to train microsatellite-specific and individual-specific error profiles. This was done by adding parameters to the model and by using sequencing data from multiple individuals to improve the parameter estimates.
Combining these two procedures, we were able to give a practical implementation of microsatellite genotyping that is both much faster and more accurate than previously presented solutions.
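The read-filtering step described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the `Read` type, the function names `build_index`, `overlaps_locus` and `filter_reads`, the half-open coordinate convention, and the assumption that the known microsatellite loci do not overlap one another are all hypothetical choices made for illustration.

```python
from bisect import bisect_left
from typing import NamedTuple, Optional


class Read(NamedTuple):
    """Hypothetical minimal read record (not a real library type)."""
    name: str
    chrom: Optional[str]  # None marks an unaligned read
    start: int            # 0-based, inclusive
    end: int              # 0-based, exclusive


def build_index(loci):
    """Group (chrom, start, end) loci by chromosome, sorted by start.

    Assumption: loci on the same chromosome do not overlap each other.
    """
    grouped = {}
    for chrom, start, end in loci:
        grouped.setdefault(chrom, []).append((start, end))
    index = {}
    for chrom, intervals in grouped.items():
        intervals.sort()
        starts = [s for s, _ in intervals]
        ends = [e for _, e in intervals]
        index[chrom] = (starts, ends)
    return index


def overlaps_locus(read, index):
    """Return True if the aligned read overlaps any known locus."""
    if read.chrom not in index:
        return False
    starts, ends = index[read.chrom]
    # Number of loci that start before the read ends.
    i = bisect_left(starts, read.end)
    if i == 0:
        return False
    # With non-overlapping loci, the last such locus has the largest end,
    # so it is the only candidate that needs checking.
    return ends[i - 1] > read.start


def filter_reads(reads, index):
    """Keep unaligned reads and reads aligned to a known locus."""
    return [r for r in reads if r.chrom is None or overlaps_locus(r, index)]


if __name__ == "__main__":
    loci = [("chr1", 100, 120), ("chr1", 500, 530), ("chr2", 40, 60)]
    reads = [
        Read("a", "chr1", 90, 110),   # overlaps first chr1 locus
        Read("b", "chr1", 200, 300),  # between loci
        Read("c", None, 0, 0),        # unaligned
        Read("d", "chr2", 60, 100),   # starts where the chr2 locus ends
        Read("e", "chr1", 520, 600),  # overlaps second chr1 locus
    ]
    kept = filter_reads(reads, build_index(loci))
    print([r.name for r in kept])
```

Unaligned reads are kept unconditionally because, as the abstract notes, they may still be useful for profiling; every other read is kept only if it overlaps a known locus.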
Keywords: Computer and Information Science
Issue Date: 2014
Publisher: Chalmers tekniska högskola / Institutionen för data- och informationsteknik (Chalmers)
Chalmers University of Technology / Department of Computer Science and Engineering (Chalmers)
URI: https://hdl.handle.net/20.500.12380/203063
Collection: Master Theses (Examensarbeten för masterexamen)
