Bioinformatics and Statistical Methods for Identifying Enrichment of Functional Gene Classes in Telomeric Regions of Chromosomes

Type
Master's thesis
Program
Bioinformatics and systems biology, MSc
Published
2013
Author
Ahamed, Tanvir Mohammad
Abstract
It has been noted that the telomeric regions of Saccharomyces cerevisiae contain fewer essential genes than expected under random shuffling. Furthermore, silencing a single non-essential gene in the telomeric regions has, on average, less effect on viability than silencing a non-essential gene in other chromosomal regions. It has also been suggested that genes in the telomeric regions are less stable, with higher mutation and recombination rates. This could be an evolutionarily advantageous property for the adaptation of genes to a changing environment, provided that there are backup systems for the genes. In this work, we examined several statistical properties of the telomeres and of the genes in the telomeric regions. Among the questions studied are: How dense is the coding sequence in the telomeric region compared to the rest of the genome? What length distribution do the genes in the telomeric region have in comparison to the general length distribution? Which GO-annotated classes are over-represented in telomeres? Can we find protein sequence clusters that are over-represented in the telomeres? We found a number of interesting properties, and our results at least partly support the earlier suggestions. For future work, we suggest that the telomeric statistical properties found here for Saccharomyces cerevisiae be compared with those of other yeast species, such as Schizosaccharomyces pombe, which is evolutionarily distant enough to be fairly reshuffled genomically. As usual in multivariate statistics, the statistical properties are correlated (length correlates with viability, function, etc.) and causality is hard to deduce, but it may be easier to understand using more organisms. The main findings of the thesis are that there is less coding sequence in the extreme telomeric region and that, in percentage terms, long essential genes are very few in the telomeric region. Long non-essential genes are more numerous but still few compared to elsewhere in the genome. Of those that reside in the telomeric region, many are related to metal ion transport and to disaccharide and oligosaccharide metabolic and catabolic processes. The pipeline of methods used in the present research also identifies some gene functions related to helicase activity that have been pointed out in earlier research.
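
The question of which GO-annotated classes are over-represented in telomeres can be illustrated with a one-sided hypergeometric test. The abstract does not say which test the thesis used, so the following is only a minimal sketch under that assumption; the function name, its parameters, and the toy counts are hypothetical and are not taken from the thesis.

```python
from math import comb


def hypergeom_enrichment_pvalue(genome_size, class_size, region_size, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    Hypothetical helper, not the thesis's method. X counts how many genes of
    a functional class (class_size genes in total) fall in a region holding
    region_size genes, if those genes were drawn at random without
    replacement from a genome of genome_size genes.
    """
    if not 0 <= class_size <= genome_size:
        raise ValueError("class_size must lie between 0 and genome_size")
    if not 0 <= region_size <= genome_size:
        raise ValueError("region_size must lie between 0 and genome_size")

    max_overlap = min(class_size, region_size)
    if overlap > max_overlap:
        return 0.0

    # Smallest overlap that is possible at all, given the counts.
    min_overlap = max(0, region_size - (genome_size - class_size))
    start = max(overlap, min_overlap)

    # Sum the hypergeometric point probabilities over the upper tail,
    # using exact integer arithmetic until the final division.
    tail = sum(
        comb(class_size, k) * comb(genome_size - class_size, region_size - k)
        for k in range(start, max_overlap + 1)
    )
    return tail / comb(genome_size, region_size)


# Toy numbers only: 10 genes, 4 in the class, 3 in the region, 2 in both.
p_value = hypergeom_enrichment_pvalue(10, 4, 3, 2)
```

A small p-value would indicate that the class occurs in the region more often than random placement would suggest. When many classes are tested, the p-values would normally also need a multiple-testing correction.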
Subject/keywords
Mathematical statistics, Basic sciences