Detecting Metastable States in Proteins using E(3) Equivariant VAMPnets

Type

Master's Thesis

Abstract

As proteins fold, they pass through intermediate conformations, often called metastable states, that are vital to understanding diseases related to malfunctions in conformational changes. To detect these metastable states, a deep learning framework based on the variational approach for Markov processes (VAMP), dubbed VAMPnets, has been proposed. In this master’s thesis, we improve the training of VAMPnets through the use of E(3) equivariant neural networks. These networks incorporate the symmetries of Euclidean space, facilitating faster and more data-efficient learning. To study their effectiveness, we benchmark two different equivariant Transformer architectures and an equivariant convolutional network against both a simple and an invariant multilayer perceptron. The models are evaluated on molecular dynamics trajectories of alanine dipeptide and on protein folding datasets. The use of E(3) equivariant neural networks in training VAMPnets is shown to significantly improve prediction accuracy on randomly downsampled data. Using only 1% of the dataset, the equivariant Transformer achieves almost twice the VAMP-2 score of the benchmarks. The model also exhibits improved robustness: with only 20% of the data remaining, it scores on par with training on the complete dataset. On average, the model requires significantly fewer backward passes, converging more than twice as fast as the benchmark models, which shows enhanced data efficiency. The results also highlight the significant computational burden that equivariant neural networks pose, especially for larger molecules, proving almost 1,000 times slower on the protein folding dataset. Finally, we propose a novel algorithm for detecting the number of metastable states of a molecule using the VAMP-2 score and provide estimates for the 12 proteins in the protein folding dataset.
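
The abstract uses the VAMP-2 score as its evaluation metric without defining it. As a minimal sketch, assuming the standard definition from the VAMP literature (the squared Frobenius norm of the whitened time-lagged covariance matrix), the score could be estimated from two arrays of model outputs as follows. The function names `inv_sqrt` and `vamp2_score`, the `eps` cutoff, and the synthetic data are illustrative assumptions and are not taken from the thesis.

```python
import numpy as np


def inv_sqrt(mat, eps=1e-10):
    """Inverse square root of a symmetric PSD matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(mat)
    keep = vals > eps  # drop near-singular directions for numerical stability
    return (vecs[:, keep] / np.sqrt(vals[keep])) @ vecs[:, keep].T


def vamp2_score(x_t, x_tau, eps=1e-10):
    """VAMP-2 score of features at time t (x_t) and at time t + tau (x_tau).

    Both arrays have shape (n_frames, n_features). The means are removed
    first, so the trivial constant singular function is not counted here
    (some implementations add 1 to account for it).
    """
    x_t = x_t - x_t.mean(axis=0)
    x_tau = x_tau - x_tau.mean(axis=0)
    n = x_t.shape[0] - 1
    c00 = x_t.T @ x_t / n        # instantaneous covariance at time t
    c11 = x_tau.T @ x_tau / n    # instantaneous covariance at time t + tau
    c01 = x_t.T @ x_tau / n      # time-lagged cross-covariance
    koopman = inv_sqrt(c00, eps) @ c01 @ inv_sqrt(c11, eps)
    return float(np.sum(koopman ** 2))  # squared Frobenius norm


# Synthetic sanity check (not thesis data).
rng = np.random.default_rng(0)
x = rng.normal(size=(1000, 3))
# Identical inputs are perfectly correlated in every direction, so the
# score approaches the feature dimension (3 here).
perfect = vamp2_score(x, x)
# Independent inputs carry no slow dynamics, so the score stays near 0.
noise = vamp2_score(x, rng.normal(size=(1000, 3)))
```

In a VAMPnet, `x_t` and `x_tau` would be the network outputs for time-lagged pairs of frames, and training would maximize this quantity; the sketch only evaluates the score and does not include the differentiable training loop.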

Subject/keywords

Computer Science, Engineering, Project, Thesis, Deep Learning, Protein Structures, Equivariant Neural Networks, Molecular Dynamics, Computational Biology, VAMPnets, Transformers
