Detecting Metastable States in Proteins using E(3) Equivariant VAMPnets
Type
Master's Thesis
Program
Data science and AI (MPDSC), MSc
Engineering mathematics and computational science (MPENM), MSc
Published
2023
Authors
Arnesen, Sara
Nordström, David
Abstract
As proteins fold, they encounter intermediary conformations, often denoted metastable
states, that are vital to deciphering diseases related to malfunctions in conformational
changes. To detect these metastable states, a deep learning framework using
the variational approach for Markov processes (VAMP) has been proposed, dubbed
VAMPnets. In this master’s thesis, we improve the training of VAMPnets through
the use of E(3) equivariant neural networks. These networks incorporate the symmetries
of Euclidean space, facilitating faster and more data-efficient learning. To
study the effectiveness of these networks, we benchmark two different equivariant
Transformer architectures and an equivariant convolutional network against both
a simple and an invariant multilayer perceptron. The models are evaluated on
molecular dynamics trajectories of alanine dipeptide and protein folding datasets.
The use of E(3) equivariant neural networks in training VAMPnets is shown to
significantly improve the prediction accuracy on randomly downsampled data. Using
only 1% of the dataset, the equivariant Transformer achieves almost twice the
VAMP-2 score of the benchmarks. The model also exhibits improved robustness:
with only 20% of the data remaining, its score is on par with that obtained using
the complete dataset. On average, the model requires significantly fewer backward
passes, converging more than twice as fast as the benchmark models, which shows
enhanced data efficiency. However, the results also highlight the significant
computational burden that equivariant neural networks pose, especially for larger
molecules: they prove almost 1,000 times slower on the protein folding dataset.
Finally, we propose a novel algorithm
for detecting the number of metastable states of a molecule using the VAMP-2
score and provide estimates for the 12 proteins in the protein folding dataset.
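For readers unfamiliar with the VAMP-2 score referred to above, the following is a minimal NumPy sketch of one common way to compute it from time-lagged feature pairs. It is an illustration under stated assumptions, not the thesis code: the names `vamp2_score` and `inv_sqrt` are hypothetical, the features are assumed to be the network outputs at times t and t + tau, and conventions differ between implementations (for example, whether the contribution of the constant function is included).

```python
import numpy as np

def inv_sqrt(mat, eps=1e-10):
    """Inverse square root of a symmetric PSD matrix (hypothetical helper).

    Eigenvalues below eps are discarded for numerical stability.
    """
    vals, vecs = np.linalg.eigh(mat)
    keep = vals > eps
    return vecs[:, keep] @ np.diag(vals[keep] ** -0.5) @ vecs[:, keep].T

def vamp2_score(x0, xt):
    """VAMP-2 score of features x0 (time t) and xt (time t + tau).

    Both arrays have shape (n_samples, n_features). The score is the
    squared Frobenius norm of C00^{-1/2} C0t Ctt^{-1/2}, computed here
    on mean-free features (an assumed convention).
    """
    x0 = x0 - x0.mean(axis=0)
    xt = xt - xt.mean(axis=0)
    n = x0.shape[0]
    c00 = x0.T @ x0 / n   # instantaneous covariance at time t
    c0t = x0.T @ xt / n   # time-lagged cross-covariance
    ctt = xt.T @ xt / n   # instantaneous covariance at time t + tau
    k = inv_sqrt(c00) @ c0t @ inv_sqrt(ctt)
    return float(np.sum(k ** 2))
```

For a single trajectory of features `feats` and a lag of `tau` frames, the pairs would be formed as `vamp2_score(feats[:-tau], feats[tau:])`. A higher score indicates that the features capture more of the slow dynamics at that lag time.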
Subject/keywords
Computer Science, Engineering, Project, Thesis, Deep Learning, Protein Structures, Equivariant Neural Networks, Molecular Dynamics, Computational Biology, VAMPnets, Transformers