Using Transformer-based Neural Networks for classifying cellular states in Glioblastoma
Type
Master's Thesis
Program
Complex adaptive systems (MPCAS), MSc
Published
2024
Author
Hedberg, Ronja
Abstract
Inspired by the progress that Transformer-based neural networks have brought to Natural Language Processing, similar approaches have been proposed for single-cell RNA-sequencing data in the hope of capturing complex gene-to-gene interactions. One such approach is the pre-trained single-cell bidirectional encoder (scBERT), whose architecture and pre-training follow those of its natural-language counterpart, BERT. Unlike BERT, scBERT was pre-trained for masked gene expression prediction, using single-cell datasets comprising over 1.5 million single-cell RNA-sequencing profiles. This thesis performs an initial assessment of scBERT on novel single-cell data. In classifying annotated cellular states of Glioblastoma, including scBERT showed limited overall advantages compared to using the gene expression directly. However, by simulating different scenarios, this thesis provides preliminary evidence in favor of using scBERT when signal is scarce (a low number of expressed genes and few training examples). This showcases the potential benefits of using the gene representations of massive single-cell Transformer-based models, especially when little information is available, which is frequently the case when working with in-house data or heavily underrepresented cellular states.
Subject/keywords
Machine Learning, scRNA-seq, Transformer, Cellular states, Glioblastoma, Cancer, Natural Language Processing, Encoder.