Integrating heterogeneous ranked sources with different reliabilities - A case study of Gene-Disease associations

dc.contributor.authorAkhondi, Saber Ahmad
dc.contributor.departmentChalmers tekniska högskola / Institutionen för data- och informationsteknik (Chalmers)sv
dc.contributor.departmentChalmers University of Technology / Department of Computer Science and Engineering (Chalmers)en
dc.date.accessioned2019-07-03T12:40:29Z
dc.date.available2019-07-03T12:40:29Z
dc.date.issued2011
dc.description.abstractIdentifying genes associated with a certain disease, bioprocess or pathway remains a major challenge in the pharmaceutical industry; the process is time-consuming and costly. To speed it up, candidate genes can be prioritized using ranked lists created by different methods and data sources. These ranked lists differ in reliability, and integrating the results of these methods is becoming necessary. Several methods have been proposed for integrating such ranked lists, but they do not take into account the differences in reliability and they do not handle missing data satisfactorily. In this project, we modified the Discounted Rating System. The resulting MDRS method integrates multiple ranked lists with different reliabilities, regardless of their scoring functions and list sizes. The reliabilities of the data sources were assigned using expert knowledge. The method was applied to gene-disease relations. To evaluate the results, gold-standard gene sets were used and the output was analyzed using enrichment plots. Enrichment plots were also used to compare the performance of the different methods and data sources. To our understanding, the MDRS method outperforms current methods. The correlations between the different data sources and methods were analyzed using Venn diagrams and hierarchical clustering. Distance matrices were created using Spearman’s rank correlation and the percentage of data similarity. Finally, a method was introduced that helps analyze a set of genes to find the diseases most relevant to that set.
dc.identifier.urihttps://hdl.handle.net/20.500.12380/147383
dc.language.isoeng
dc.setspec.uppsokTechnology
dc.subjectDatavetenskap (datalogi)
dc.subjectComputer Science
dc.titleIntegrating heterogeneous ranked sources with different reliabilities - A case study of Gene-Disease associations
dc.type.degreeExamensarbete för masterexamensv
dc.type.degreeMaster Thesisen
dc.type.uppsokH
Name: 147383.pdf
Size: 3.61 MB
Format: Adobe Portable Document Format
Description: Full text