Bioinformatics and Statistical Methods for Identifying Enrichment of Functional Gene Classes in Telomeric Regions of Chromosomes

Master Thesis

Download file(s):
File: 193480.pdf; Description: Fulltext; Size: 3.47 MB; Format: Adobe PDF
Bibliographical item details
Type: Master Thesis (Examensarbete för masterexamen)
Title: Bioinformatics and Statistical Methods for Identifying Enrichment of Functional Gene Classes in Telomeric Regions of Chromosomes
Authors: Ahamed, Tanvir Mohammad
Abstract: It has been noted that the telomeric regions of Saccharomyces cerevisiae contain fewer essential genes than expected from random shuffling. Furthermore, silencing a single non-essential gene in the telomeric regions has, on average, less effect on viability than silencing a non-essential gene in other chromosomal regions. It has also been suggested that genes in the telomeric regions are less stable, with higher mutation and recombination rates. This could be an evolutionarily advantageous property for adapting genes to a changing environment, provided that backup systems for the genes exist. In this work, we examine several statistical properties of the telomeres and of the genes in the telomeric regions. The questions studied include: How dense is the code in the telomeric region compared with the rest of the genome? What length distribution do genes in the telomeric region have compared with the general length distribution? Which GO-annotated classes are over-represented in telomeres? Can we find protein sequence clusters that are over-represented in the telomeres? We found a number of interesting properties, and our results at least partly support the earlier suggestions. For future work, we suggest comparing the telomeric statistical properties found in Saccharomyces cerevisiae with those of other yeast species, such as Schizosaccharomyces pombe, which is evolutionarily distant enough to be genomically fairly reshuffled. As usual in multivariate statistics, the statistical properties are correlated (length correlates with viability, function, etc.) and causality is hard to deduce, but it may be easier to understand using more organisms.
The main findings of the thesis are as follows. There is less code in the extreme telomeric region. As a percentage, very few long essential genes lie in the telomeric region. Long non-essential genes are more numerous, but still few compared with elsewhere. Of those that reside in the telomeric region, many are related to metal ion transport and to disaccharide and oligosaccharide metabolic and catabolic processes. The pipeline of methods used in this research also identifies gene functions related to helicase activity, which have been pointed out in earlier research.
Keywords: Mathematical statistics; Basic Sciences
Issue Date: 2013
Publisher: Chalmers tekniska högskola / Institutionen för matematiska vetenskaper
Chalmers University of Technology / Department of Mathematical Sciences
Collection: Master Theses (Examensarbeten för masterexamen)
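
The abstract asks which GO-annotated classes are over-represented in telomeres and notes that essential genes are rarer there than expected by chance. One common way to frame such questions is a hypergeometric test. The sketch below illustrates that idea only; the record does not state which test the thesis uses. The function names and all counts in the usage example are hypothetical.

```python
from math import comb


def upper_tail(k, total, annotated, region):
    """P(X >= k): chance of seeing at least k annotated genes in a region.

    Assumes the `region` genes are drawn without replacement from `total`
    genes, of which `annotated` carry the class of interest
    (an over-representation test).
    """
    top = min(annotated, region)
    hits = sum(
        comb(annotated, i) * comb(total - annotated, region - i)
        for i in range(k, top + 1)
    )
    return hits / comb(total, region)


def lower_tail(k, total, annotated, region):
    """P(X <= k): chance of seeing at most k annotated genes in a region.

    Same sampling assumption as upper_tail (an under-representation test,
    e.g. for essential genes).
    """
    top = min(k, annotated, region)
    hits = sum(
        comb(annotated, i) * comb(total - annotated, region - i)
        for i in range(0, top + 1)
    )
    return hits / comb(total, region)


if __name__ == "__main__":
    # Hypothetical counts, NOT taken from the thesis.
    total_genes = 6000      # genes in the genome
    telomeric_genes = 300   # genes in the telomeric regions
    class_genes = 40        # genes carrying one GO class
    class_in_telomeres = 8  # of those, how many are telomeric

    p_over = upper_tail(class_in_telomeres, total_genes, class_genes, telomeric_genes)
    print(f"over-representation p-value: {p_over:.4g}")
```

When many GO classes are tested at once, the resulting p-values would normally need a multiple-testing correction, which this sketch leaves out.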
