Deep Learning Models for Data Integration and Surrogate Models for Interpretable Predictions with Applications in Integromics and Recommender Systems
Published
Author
Type
Master's thesis
Programme
Model builders
Journal title
ISSN
Volume title
Publisher
Abstract
Many tasks require the simultaneous analysis of multiple heterogeneous data sets, known as integrative
data analysis. In the past, most data integration methods assumed linear relationships in the latent
representations shared between the data sets. Recently, Deep Collective Matrix Factorization (dCMF) was proposed
as a matrix completion algorithm that can utilize auxiliary data sources without making any assumptions
about the data, by modelling non-linearities using deep learning. In this thesis, we examine the performance
and versatility of dCMF and propose a framework for interpreting the model's predictions, based on Local
Interpretable Model-agnostic Explanations (LIME), that we call dCMF-LIME. The explanations give variable
importance measures for an individual prediction and can be used to build trust in a model or to troubleshoot it. We
also propose a method for unsupervised data translation, which we call a Data Translation Network (DTN). It
learns to transform data from one data set to another by first encoding them into a shared latent domain
and then reconstructing any of the learned data sets from that latent domain. We found that dCMF outperformed
our baseline methods on simulated data and on a recommendation task, but it performed poorly on our
gene-disease association test, where it was outperformed by all other methods. DTN achieved the third-best
performance in the same test and shows promise for future work.
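
The idea of a local surrogate explanation mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the dCMF-LIME framework from the thesis, only the generic LIME-style recipe under simplifying assumptions: perturb one instance with Gaussian noise, weight the perturbed samples by proximity, and fit a weighted ridge regression whose coefficients serve as variable importance measures for that single prediction. The function names `explain_locally` and `black_box`, the noise scale, and the kernel width are hypothetical choices made for illustration.

```python
import numpy as np


def explain_locally(predict, x, n_samples=2000, scale=0.5,
                    kernel_width=1.0, ridge=1e-3, seed=0):
    """Fit a proximity-weighted linear surrogate around x.

    Returns one coefficient per feature, read as local importance.
    All defaults are illustrative assumptions, not values from the thesis.
    """
    rng = np.random.default_rng(seed)
    # Sample the neighbourhood of x by adding Gaussian noise.
    Z = x + scale * rng.standard_normal((n_samples, x.size))
    y = predict(Z)
    # Weight each sample by its closeness to x (exponential kernel).
    d2 = np.sum((Z - x) ** 2, axis=1)
    w = np.exp(-d2 / kernel_width ** 2)
    # Weighted ridge regression on offsets from x, with an intercept column.
    A = np.hstack([np.ones((n_samples, 1)), Z - x])
    AtW = A.T * w
    reg = ridge * np.eye(A.shape[1])
    reg[0, 0] = 0.0  # leave the intercept unpenalised
    beta = np.linalg.solve(AtW @ A + reg, AtW @ y)
    return beta[1:]


def black_box(Z):
    # Hypothetical non-linear model standing in for a trained predictor;
    # the third feature has no influence on the output.
    return np.sin(Z[:, 0]) + Z[:, 1] ** 2 + 0.0 * Z[:, 2]


x0 = np.array([0.0, 1.0, 3.0])
importance = explain_locally(black_box, x0)
print(importance)
```

In this toy setting the surrogate should assign the largest coefficient to the second feature and a coefficient near zero to the third, which matches how the stand-in model behaves around `x0`. The explanation is only valid locally, and a different instance may give a different ranking.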
Description
Subject/keywords
Integrative data analysis, Deep learning, dCMF, CMF, Integromics