Machine Learning for Structural Predictions of PROTACs

Type
Master's Thesis
Program
Biotechnology (MPBIO), MSc
Published
2024
Author
Källberg, Anders
Abstract
PROteolysis TArgeting Chimeras (PROTACs) are molecules that induce the degradation of targeted proteins by hijacking the ubiquitin–proteasome system of the cell. A PROTAC binds simultaneously to an E3 ligase and a protein of interest (POI), forming a ternary complex. The ubiquitin–proteasome system then tags the POI with ubiquitin, marking it for degradation by the proteasome. The formation of a favorable ternary complex is essential for the ubiquitination and subsequent degradation of the POI. Being able to model ternary complexes accurately thus provides critical advantages in the development of PROTACs; however, data on PROTACs and their crystallized ternary complexes are limited. Accurate predictions of these structures are desirable, but current computational methods struggle to simulate the interactions between the PROTAC and both proteins simultaneously. AlphaFold, a machine learning tool, has been shown to predict protein complexes accurately, yet research on applying AlphaFold to ternary complexes is scarce. In the first part of this thesis, ternary complexes were modeled with AlphaFold using the sequences of both natural and artificially linked POIs and E3 ligases. However, AlphaFold was unable to predict these complexes accurately, presumably because it could not take the PROTAC into account in its predictions. The second part of this thesis focused on generating data on PROTAC substructures, which are essential for the development of these molecules. Although such data exist, obtaining high-quality substructure data for specific PROTACs can be challenging and time-consuming. To address this, the PROTAC Splitter, a novel machine learning tool based on graph neural networks, was developed to predict these substructures. For PROTACs with known substructures, the PROTAC Splitter predicts 99.7% with an error of at most 6 misassigned atoms at the boundaries between the ligands and the linker. It also generalizes to PROTACs with three unknown substructures, where 23.1% of the predictions satisfy the same criterion. The code for the PROTAC Splitter is available at https://github.com/AndersKallberg/PROTAC_splitter. Although accurate prediction of ternary complexes remains challenging, the PROTAC Splitter makes substructure data easily accessible to anyone in this field of research. In summary, the work presented in this thesis addresses scientific questions in two complementary areas of PROTAC development: (1) ternary (protein) structure prediction, and (2) PROTAC component prediction. Data of both kinds are scarce and valuable, and accurate predictions could accelerate the discovery of effective PROTACs and help in the fight against disease.
Subject/keywords
PROTAC, Ternary Structure, Substructures, AlphaFold, Protein Structure Prediction, Graph Neural Networks, Node Prediction, Link Prediction, Machine Learning, AI