Generating Molecules in 3D from a Single Sequence

Type

Master's Thesis

Abstract

The de novo generation of three-dimensional molecular structures is a fundamental task in drug discovery, where state-of-the-art approaches often rely on computationally expensive and architecturally complex SE(3)-equivariant models. This thesis explores a simpler, representation-centric paradigm. We introduce a novel method that uses a standard, non-equivariant autoregressive Transformer to generate molecules from a single, unified sequence. This sequence is constructed by interleaving discrete tokens for chemical topology (from SMILES) with discretized tokens for 3D geometry (from internal coordinates), reframing the entire task as a pure language modeling problem. Our primary discrete model, ALT_TOKEN, demonstrates the success of this strategy, achieving 99.0% chemical validity and generating structures with a low median energy of 3.07 kcal/mol that closely match the dataset distribution. These results outperform baselines using continuous representations. In conclusion, this work establishes that a standard Transformer, when paired with a carefully designed discrete and interleaved data representation, provides a viable, efficient, and less complex alternative for high-quality 3D molecular design.
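
To make the interleaving idea concrete, the following is a minimal sketch of how topology tokens and discretized internal coordinates could be merged into one sequence. It is not the thesis's actual ALT_TOKEN scheme, which the abstract does not specify: the value ranges, the bin count, the token spellings (`<d..>`, `<a..>`, `<t..>`), and the helper names `to_bin` and `interleave` are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical discretization settings (assumed, not taken from the thesis).
DIST_RANGE = (0.0, 3.0)           # bond length, angstroms
ANGLE_RANGE = (0.0, 180.0)        # bond angle, degrees
DIHEDRAL_RANGE = (-180.0, 180.0)  # dihedral angle, degrees
N_BINS = 100

# One entry per atom: (bond length, bond angle, dihedral). A value is None
# when it is undefined, as for the first atoms of a Z-matrix.
Coords = Tuple[Optional[float], Optional[float], Optional[float]]


def to_bin(value: float, lo: float, hi: float, n_bins: int = N_BINS) -> int:
    """Map a continuous value to an integer bin index in [0, n_bins - 1]."""
    clipped = min(max(value, lo), hi)
    idx = int((clipped - lo) / (hi - lo) * n_bins)
    return min(idx, n_bins - 1)  # the upper bound falls into the last bin


def interleave(atoms: List[str], coords: List[Coords]) -> List[str]:
    """Follow each topology token with the geometry tokens defined for it."""
    if len(atoms) != len(coords):
        raise ValueError("need exactly one coordinate triple per atom token")
    seq: List[str] = []
    for atom, (dist, angle, dihedral) in zip(atoms, coords):
        seq.append(atom)
        if dist is not None:
            seq.append(f"<d{to_bin(dist, *DIST_RANGE)}>")
        if angle is not None:
            seq.append(f"<a{to_bin(angle, *ANGLE_RANGE)}>")
        if dihedral is not None:
            seq.append(f"<t{to_bin(dihedral, *DIHEDRAL_RANGE)}>")
    return seq


# Toy three-atom chain with made-up geometry values.
tokens = interleave(
    ["C", "C", "O"],
    [(None, None, None), (1.5, None, None), (1.5, 90.0, None)],
)
```

Because every element of the resulting list is a discrete token, a standard autoregressive Transformer could be trained on it with ordinary next-token prediction, which is the sense in which the abstract describes the task as pure language modeling.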

Subject/keywords

Molecular generation, Internal coordinates, Language models
