Generating Molecules in 3D from a Single Sequence
| dc.contributor.author | Shi, Yuwei | |
| dc.contributor.author | Zhao, Jinyi | |
| dc.contributor.department | Chalmers tekniska högskola / Institutionen för data och informationsteknik | sv |
| dc.contributor.department | Chalmers University of Technology / Department of Computer Science and Engineering | en |
| dc.contributor.examiner | Engkvist, Ola | |
| dc.contributor.supervisor | Olsson, Simon | |
| dc.date.accessioned | 2026-01-19T08:27:33Z | |
| dc.date.issued | 2025 | |
| dc.date.submitted | ||
| dc.description.abstract | The de novo generation of three-dimensional molecular structures is a fundamental task in drug discovery, where state-of-the-art approaches often rely on computationally expensive and architecturally complex SE(3)-equivariant models. This thesis explores a simpler, representation-centric paradigm. We introduce a novel method that uses a standard, non-equivariant autoregressive Transformer to generate molecules from a single, unified sequence. This sequence is constructed by interleaving discrete tokens for chemical topology (from SMILES) with discretized tokens for 3D geometry (from internal coordinates), reframing the entire task as a pure language modeling problem. Our primary discrete model, ALT_TOKEN, demonstrates the success of this strategy, achieving 99.0% chemical validity and generating structures with a low median energy of 3.07 kcal/mol that closely match the dataset distribution. These results outperform baselines using continuous representations. In conclusion, this work establishes that a standard Transformer, when paired with a carefully designed discrete and interleaved data representation, provides a viable, efficient, and less complex alternative for high-quality 3D molecular design. | |
| dc.identifier.coursecode | DATX05 | |
| dc.identifier.uri | http://hdl.handle.net/20.500.12380/310920 | |
| dc.language.iso | eng | |
| dc.setspec.uppsok | Technology | |
| dc.subject | Molecular generation | |
| dc.subject | Internal coordinates | |
| dc.subject | Language models | |
| dc.title | Generating Molecules in 3D from a Single Sequence | |
| dc.type.degree | Examensarbete för masterexamen | sv |
| dc.type.degree | Master's Thesis | en |
| dc.type.uppsok | H | |
| local.programme | Engineering mathematics and computational science (MPENM), MSc | |
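
The abstract describes building one sequence by interleaving discrete topology tokens with discretized internal-coordinate tokens. The following is a minimal sketch of that idea only, not the thesis's actual tokenizer: the function names `discretize` and `interleave`, the token spellings (`<r_k>`, `<a_k>`, `<d_k>`), the uniform binning, the value ranges, and the bin count are all assumptions made for illustration. It also simplifies by attaching coordinates to atom tokens only, ignoring SMILES branch and ring-closure symbols.

```python
from typing import List, Sequence, Tuple


def discretize(value: float, lo: float, hi: float, n_bins: int) -> int:
    """Map a continuous value in [lo, hi] to a uniform bin index in [0, n_bins - 1]."""
    if not lo <= value <= hi:
        raise ValueError(f"value {value} outside [{lo}, {hi}]")
    frac = (value - lo) / (hi - lo)
    # The upper bound hi would land in bin n_bins, so clamp it into the last bin.
    return min(int(frac * n_bins), n_bins - 1)


def interleave(
    atom_tokens: Sequence[str],
    internal_coords: Sequence[Tuple[float, float, float]],
    n_bins: int = 100,
) -> List[str]:
    """Interleave atom tokens with binned (bond length, bond angle, dihedral) tokens.

    Assumed units and ranges (hypothetical): bond length in angstroms within
    [0, 3], bond angle in degrees within [0, 180], dihedral in degrees within
    [-180, 180].
    """
    if len(atom_tokens) != len(internal_coords):
        raise ValueError("need exactly one coordinate triple per atom token")
    sequence: List[str] = []
    for token, (length, angle, dihedral) in zip(atom_tokens, internal_coords):
        sequence.append(token)
        sequence.append(f"<r_{discretize(length, 0.0, 3.0, n_bins)}>")
        sequence.append(f"<a_{discretize(angle, 0.0, 180.0, n_bins)}>")
        sequence.append(f"<d_{discretize(dihedral, -180.0, 180.0, n_bins)}>")
    return sequence


if __name__ == "__main__":
    # Toy input: two atoms with made-up internal coordinates.
    print(interleave(["C", "O"], [(1.5, 90.0, 0.0), (0.75, 45.0, 180.0)]))
```

A sequence produced this way consists purely of discrete tokens, so in principle it can be fed to a standard autoregressive language model with an extended vocabulary, which is the framing the abstract describes.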
