Detecting Metastable States in Proteins using E(3) Equivariant VAMPnets

Type
Master's Thesis
Programme
Data science and AI (MPDSC), MSc
Engineering mathematics and computational science (MPENM), MSc
Published
2023
Authors
Arnesen, Sara
Nordström, David
Abstract
As proteins fold, they pass through intermediate conformations, often called metastable states, that are vital to understanding diseases related to malfunctions in conformational changes. To detect these metastable states, a deep learning framework based on the variational approach for Markov processes (VAMP) has been proposed, dubbed VAMPnets. In this master’s thesis, we improve the training of VAMPnets through the use of E(3) equivariant neural networks. These networks incorporate the symmetries of Euclidean space, facilitating faster and more data-efficient learning. To study the effectiveness of these networks, we benchmark two different equivariant Transformer architectures and an equivariant convolutional network against both a simple and an invariant multilayer perceptron. The models are evaluated on molecular dynamics trajectories of alanine dipeptide and on protein folding datasets. The use of E(3) equivariant neural networks in training VAMPnets is shown to significantly improve the prediction accuracy on randomly downsampled data. Using only 1% of the dataset, the equivariant Transformer achieves almost twice the VAMP-2 score of the benchmarks. Furthermore, the model exhibits improved robustness: with only 20% of the data remaining, it scores on par with training on the complete dataset. On average, the model also requires significantly fewer backward passes, converging more than twice as fast as the benchmark models, which indicates enhanced data efficiency. However, the results also highlight the significant computational burden that equivariant neural networks pose, especially for larger molecules, where they prove almost 1,000 times slower on the protein folding dataset. Finally, we propose a novel algorithm for detecting the number of metastable states of a molecule using the VAMP-2 score and provide estimates for the 12 proteins in the protein folding dataset.
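For readers unfamiliar with the VAMP-2 score mentioned in the abstract, the following is a minimal NumPy sketch of how it is commonly estimated from time-lagged feature pairs. It is not the thesis's implementation: the function names `inv_sqrt` and `vamp2_score`, the cutoff `eps`, and the choice to remove the mean are assumptions made for illustration. Some implementations add a constant 1 to account for the stationary (constant) singular function; this sketch omits it.

```python
import numpy as np


def inv_sqrt(c, eps=1e-10):
    """Symmetric inverse square root of a covariance matrix.

    Eigenvalues at or below `eps` (a hypothetical cutoff) are discarded
    so that near-singular covariances do not blow up.
    """
    w, v = np.linalg.eigh(c)
    keep = w > eps
    return (v[:, keep] / np.sqrt(w[keep])) @ v[:, keep].T


def vamp2_score(x, y):
    """Estimate the VAMP-2 score from features x(t) and y(t + tau).

    x, y: arrays of shape (n_frames, n_features), row i of y being the
    time-lagged partner of row i of x.
    """
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    n = x.shape[0]
    c00 = x.T @ x / (n - 1)  # instantaneous covariance
    c11 = y.T @ y / (n - 1)  # time-lagged covariance
    c01 = x.T @ y / (n - 1)  # cross-covariance
    koopman = inv_sqrt(c00) @ c01 @ inv_sqrt(c11)
    # VAMP-2 is the squared Frobenius norm of the whitened cross-covariance.
    return float(np.sum(koopman ** 2))
```

For a trajectory of network outputs `feats` and a lag of `tau` frames, the pairs would be formed as `vamp2_score(feats[:-tau], feats[tau:])`. Higher scores indicate that the features capture more of the slow dynamics.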
Subject/keywords
Computer Science, Engineering, Project, Thesis, Deep Learning, Protein Structures, Equivariant Neural Networks, Molecular Dynamics, Computational Biology, VAMPnets, Transformers