Integrating heterogeneous ranked sources with different reliabilities - A case study of Gene-Disease associations

Master Thesis

Please use this identifier to cite or link to this item: https://hdl.handle.net/20.500.12380/147383
Download file(s):
File | Description | Size | Format
147383.pdf | Fulltext | 3.7 MB | Adobe PDF
Type: Examensarbete för masterexamen (Master Thesis)
Title: Integrating heterogeneous ranked sources with different reliabilities - A case study of Gene-Disease associations
Authors: Akhondi, Saber Ahmad
Abstract: Identifying genes associated with a particular disease, bioprocess, or pathway remains a major challenge in the pharmaceutical industry; the process is time-consuming and costly. To speed it up, candidate genes can be prioritized using ranked lists produced by different methods and data sources. These ranked lists differ in reliability, so integrating their results is becoming necessary. Several methods have been proposed for integrating ranked lists, but they do not take differences in reliability into account and do not handle missing data satisfactorily. In this project, we modified the Discounted Rating System. The resulting method, MDRS, integrates multiple ranked lists with different reliabilities, regardless of their scoring functions and list sizes. The reliability of each data source was chosen on the basis of expert knowledge. The method was applied to gene-disease relations. To evaluate the results, gold-standard gene sets were used and the output was analyzed using enrichment plots, which were also used to compare the performance of the different methods and data sources. To our understanding, the MDRS method outperforms current methods. The correlation between different data sources and methods was analyzed using Venn diagrams and hierarchical clustering. Distance matrices were created using Spearman's rank correlation and the percentage of data similarity. Finally, a method was introduced that helps analyze a set of genes in order to find the diseases most relevant to that set.
Keywords: Datavetenskap (datalogi); Computer Science
Issue Date: 2011
Publisher: Chalmers tekniska högskola / Institutionen för data- och informationsteknik (Chalmers)
Chalmers University of Technology / Department of Computer Science and Engineering (Chalmers)
URI: https://hdl.handle.net/20.500.12380/147383
Collection: Examensarbeten för masterexamen // Master Theses
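
Illustrative note: the abstract mentions distance matrices built from Spearman's rank correlation between data sources. The record does not give the formula the thesis used, so the following is only a minimal sketch under stated assumptions: the distance is taken to be 1 minus Spearman's rho, every source scores the same genes in the same order, and the source names and scores are made up for illustration. It is not the thesis's implementation and does not reproduce the MDRS method.

```python
from itertools import combinations


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed scores."""
    return pearson(average_ranks(x), average_ranks(y))


def distance_matrix(sources):
    """Pairwise distances, assumed here to be 1 - rho (0 = identical ranking)."""
    names = sorted(sources)
    dist = {a: {b: 0.0 for b in names} for a in names}
    for a, b in combinations(names, 2):
        d = 1.0 - spearman(sources[a], sources[b])
        dist[a][b] = dist[b][a] = d
    return dist


# Hypothetical scores for the same five genes from three made-up sources.
scores = {
    "source_a": [0.9, 0.7, 0.5, 0.3, 0.1],
    "source_b": [0.8, 0.9, 0.4, 0.2, 0.1],
    "source_c": [0.1, 0.2, 0.3, 0.4, 0.5],
}
dist = distance_matrix(scores)
```

A symmetric matrix of this kind is the usual input to hierarchical clustering. Under the 1 - rho assumption, sources that rank genes alike end up close together and sources with reversed rankings end up far apart.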


