Integrating heterogeneous ranked sources with different reliabilities - A case study of Gene-Disease associations
Master's thesis
Akhondi, Saber Ahmad
Identifying genes associated with a given disease, biological process or pathway remains a major challenge in the pharmaceutical industry, and the process is time-consuming and costly. To speed it up, candidate genes can be prioritized using ranked lists produced by different methods and data sources. These ranked lists differ in reliability, so integrating the results of the different methods is becoming necessary. Several methods have been proposed for integrating such lists, but they do not take differences in reliability into account and do not handle missing data satisfactorily. In this project, we modified the Discounted Rating System (DRS). The resulting MDRS method integrates multiple ranked lists with different reliabilities, regardless of their scoring functions and list sizes. The reliabilities of the data sources were set using expert knowledge. The method was applied to gene-disease associations. The results were evaluated against gold-standard gene sets, and the output was analyzed using enrichment plots, which were also used to compare the performance of the individual methods and data sources. To our understanding, the MDRS method outperforms the current methods in this evaluation. The correlations between the different data sources and methods were analyzed using Venn diagrams and hierarchical clustering, with distance matrices computed from Spearman's rank correlation and from the percentage of similarity between the data. Finally, a method is introduced for analyzing a set of genes to find the diseases most relevant to that set.
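The abstract does not spell out how the distance matrices were computed. As a minimal sketch only, assuming each source is an ordered list of gene symbols (best first), that Spearman's rank correlation is computed over the genes present in both lists after re-ranking them within that shared subset (distance = 1 - rho), and that "percentage of similarity" means shared genes as a percentage of the union, a pairwise distance matrix could be built as below. The function names and gene lists are hypothetical illustrations, not the thesis's implementation.

```python
from itertools import combinations


def rerank(ranked: list[str], keep: set[str]) -> dict[str, int]:
    """Assign ranks 1..n to the genes in `keep`, preserving their order in `ranked`."""
    kept = [gene for gene in ranked if gene in keep]
    return {gene: i + 1 for i, gene in enumerate(kept)}


def spearman_distance(list_a: list[str], list_b: list[str]) -> float:
    """1 - Spearman's rho over the genes shared by both lists (assumes no tied ranks).

    Returns NaN when fewer than two genes are shared, since rho is undefined.
    """
    common = set(list_a) & set(list_b)
    n = len(common)
    if n < 2:
        return float("nan")
    ra, rb = rerank(list_a, common), rerank(list_b, common)
    d2 = sum((ra[g] - rb[g]) ** 2 for g in common)
    rho = 1 - 6 * d2 / (n * (n * n - 1))
    return 1 - rho  # ranges from 0 (identical order) to 2 (reversed order)


def overlap_percentage(list_a: list[str], list_b: list[str]) -> float:
    """Hypothetical reading of 'percentage of similarity': shared genes / union, in %."""
    a, b = set(list_a), set(list_b)
    return 100.0 * len(a & b) / len(a | b)


def distance_matrix(sources: dict[str, list[str]]) -> dict[tuple[str, str], float]:
    """Spearman distance for every unordered pair of named sources."""
    names = sorted(sources)
    return {(x, y): spearman_distance(sources[x], sources[y])
            for x, y in combinations(names, 2)}


if __name__ == "__main__":
    # Made-up ranked lists for three hypothetical sources.
    sources = {
        "text_mining": ["BRCA1", "TP53", "EGFR", "KRAS"],
        "expression": ["TP53", "BRCA1", "KRAS", "MYC"],
        "literature": ["EGFR", "KRAS", "TP53", "BRCA1"],
    }
    for pair, dist in distance_matrix(sources).items():
        print(pair, round(dist, 3))
```

Re-ranking within the shared genes is one way to cope with lists of different sizes and missing genes; it discards how far down the longer list a gene sits, which is a trade-off rather than a requirement. The resulting pairwise distances could then be fed to a hierarchical clustering routine (for example `scipy.cluster.hierarchy.linkage` on a condensed distance vector) to group sources whose rankings agree.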
Computer Science