Substructure Constrained Molecular Generation for Counterfactual Design: Transformer-Based Masked Language Modeling for Constrained Molecular Design in Drug Discovery
dc.contributor.author | Osolian, Dylan | |
dc.contributor.author | Samuelsson, Lisa | |
dc.contributor.department | Chalmers tekniska högskola / Institutionen för data och informationsteknik | sv |
dc.contributor.department | Chalmers University of Technology / Department of Computer Science and Engineering | en |
dc.contributor.examiner | Engkvist, Ola | |
dc.contributor.supervisor | Menke, Janosch | |
dc.date.accessioned | 2024-10-16T13:47:24Z | |
dc.date.available | 2024-10-16T13:47:24Z | |
dc.date.issued | 2024 | |
dc.date.submitted | ||
dc.description.abstract | In this thesis, masked modeling is used in de novo drug design to create substructure-constrained counterfactual molecules. Utilizing Natural Language Processing (NLP) methods, SMILES strings are processed through a transformer-based architecture. The research explores diverse masking and training strategies, demonstrating the ability to produce a wide range of relevant molecules. These strategies impact the properties of the molecules produced, showing that the choice of masking influences the molecular structures generated. Findings suggest that masking strategies must be carefully chosen based on the model's intended use in molecule design. This work highlights the potential of masked modeling to effectively generate diverse molecules and enhance molecular drug design, opening possibilities for further exploration and application. | |
dc.identifier.coursecode | DATX05 | |
dc.identifier.uri | http://hdl.handle.net/20.500.12380/308920 | |
dc.language.iso | eng | |
dc.setspec.uppsok | Technology | |
dc.subject | Machine Learning | |
dc.subject | Natural Language Processing | |
dc.subject | Cheminformatics | |
dc.subject | Transformers | |
dc.subject | Deep Learning | |
dc.subject | Molecular Design | |
dc.subject | Drug Discovery | |
dc.subject | Molecular Generation | |
dc.title | Substructure Constrained Molecular Generation for Counterfactual Design: Transformer-Based Masked Language Modeling for Constrained Molecular Design in Drug Discovery | |
dc.type.degree | Examensarbete för masterexamen | sv |
dc.type.degree | Master's Thesis | en |
dc.type.uppsok | H | |
local.programme | Computer science – algorithms, languages and logic (MPALG), MSc | |
local.programme | Data science and AI (MPDSC), MSc |