Learning Meaningful Representations of Cells
Type
Master's Thesis
Program
Biotechnology (MPBIO), MSc
Published
2024
Author
Andrekson, Leo
Abstract
Batch effects are a significant concern in single-cell RNA sequencing (scRNA-Seq) data analysis: variation in the data can stem from factors unrelated to cell type, which complicates downstream analysis. In this study, a neural network model is designed that uses contrastive learning and a novel loss function to learn a generalizable embedding space from scRNA-Seq data. When benchmarked against multiple established methods for scRNA-Seq integration, the model outperforms them in learning a generalizable embedding space on multiple datasets. Cell type annotation was investigated as a downstream application of the embedding space. When compared against multiple well-established cell type classifiers, the model performed competitively with the top-performing methods across multiple metrics, such as accuracy, balanced accuracy, and F1 score. These findings help quantify the “meaningfulness” of the embedding space learned by the model and highlight potential applications of the learned cellular representations. The model is currently being structured into an open-source Python package to simplify and streamline its usage.
Subject/keywords
scRNA-Seq, Deep learning, Contrastive learning, Bioinformatics, Cell type annotation, Novel cell type detection, Cell type representations, Machine learning, AI, Transformer