Cross-tissue variance analysis of gene sets
Published
Author
Type
Master's thesis
Abstract
Gene set enrichment analysis is used to investigate differences in the expression of
genetic pathways in transcriptomic data. Gene set scoring methods such as GSVA and
singscore are used in this analysis to assess the enrichment of groups of genes of
interest, called gene sets. GSVA and singscore produce a score of how strongly a gene
set is expressed relative to a reference expression, and such a reference is not always available.
In this work we apply variance decomposition to investigate whether singscore and
GSVA can be used to create a baseline for RNA-seq data that lacks control samples, and we
apply a variational autoencoder (VAE) to predict gene set scores across tissues. To this end,
variance decomposition was performed on the GTEx dataset to assess its suitability as a
baseline, and a VAE was trained on GTEx with the aim of predicting gene set scores across tissues.
Our results show that a reference dataset is of limited use as a baseline for RNA-seq data.
The results are not conclusive enough to warrant its use in applications that require the
precision needed in pharmaceutical research. The VAE-based prediction performed poorly
at predicting expression across tissues, and other machine learning methods should
be investigated for this application.
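
As a rough illustration of what a variance decomposition of gene set scores can look like, the sketch below splits the total variance of one gene set's scores into a between-tissue part and a within-tissue part, and reports the between-tissue fraction. This is an assumption for illustration only: the abstract does not state which decomposition the thesis uses, and the function name, the toy scores, and the tissue labels are all hypothetical.

```python
from collections import defaultdict


def tissue_variance_fraction(scores, tissues):
    """Return the share of total score variance explained by tissue.

    This is a one-way between/within decomposition (eta squared):
    between-tissue sum of squares divided by total sum of squares.
    Hypothetical helper, not the method used in the thesis.
    """
    if len(scores) != len(tissues):
        raise ValueError("scores and tissues must have the same length")
    if not scores:
        raise ValueError("at least one sample is required")

    grand_mean = sum(scores) / len(scores)

    # Group the per-sample gene set scores by tissue label.
    by_tissue = defaultdict(list)
    for score, tissue in zip(scores, tissues):
        by_tissue[tissue].append(score)

    ss_total = sum((s - grand_mean) ** 2 for s in scores)
    ss_between = sum(
        len(group) * (sum(group) / len(group) - grand_mean) ** 2
        for group in by_tissue.values()
    )

    # With constant scores there is no variance to attribute.
    return ss_between / ss_total if ss_total > 0 else 0.0


# Toy data: made-up scores for one gene set in two tissues.
toy_scores = [0.1, 0.2, 0.8, 0.9]
toy_tissues = ["liver", "liver", "brain", "brain"]
fraction = tissue_variance_fraction(toy_scores, toy_tissues)
```

A fraction close to 1 would mean that tissue identity accounts for most of the variation in the score, which is the kind of signal that matters when judging whether a reference dataset can serve as a tissue-specific baseline.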
Description
Subject/keywords
RNA-seq, GSVA, Transcriptomics, Bioinformatics, Variational autoencoder, GTEx, Variance decomposition