Substructure Constrained Molecular Generation for Counterfactual Design: Transformer-Based Masked Language Modeling for Constrained Molecular Design in Drug Discovery

dc.contributor.author: Osolian, Dylan
dc.contributor.author: Samuelsson, Lisa
dc.contributor.department (sv): Chalmers tekniska högskola / Institutionen för data och informationsteknik
dc.contributor.department (en): Chalmers University of Technology / Department of Computer Science and Engineering
dc.contributor.examiner: Engkvist, Ola
dc.contributor.supervisor: Menke, Janosch
dc.date.accessioned: 2024-10-16T13:47:24Z
dc.date.available: 2024-10-16T13:47:24Z
dc.date.issued: 2024
dc.date.submitted:
dc.description.abstract: In this thesis, masked modeling is used in de novo drug design to create substructure-constrained counterfactual molecules. Utilizing Natural Language Processing (NLP) methods, SMILES strings are processed through a transformer-based architecture. The research explores diverse masking and training strategies, demonstrating the ability to produce a wide range of relevant molecules. These strategies affect the properties of the molecules produced, showing that the choice of masking influences the molecular structures generated. The findings suggest that masking strategies must be chosen carefully based on the model's intended use in molecule design. This work highlights the potential of masked modeling to generate diverse molecules effectively and to enhance molecular drug design, opening possibilities for further exploration and application.
dc.identifier.coursecode: DATX05
dc.identifier.uri: http://hdl.handle.net/20.500.12380/308920
dc.language.iso: eng
dc.setspec.uppsok: Technology
dc.subject: Machine Learning
dc.subject: Natural Language Processing
dc.subject: Cheminformatics
dc.subject: Transformers
dc.subject: Deep Learning
dc.subject: Molecular Design
dc.subject: Drug Discovery
dc.subject: Molecular Generation
dc.title: Substructure Constrained Molecular Generation for Counterfactual Design: Transformer-Based Masked Language Modeling for Constrained Molecular Design in Drug Discovery
dc.type.degree (sv): Examensarbete för masterexamen
dc.type.degree (en): Master's Thesis
dc.type.uppsok: H
local.programme: Computer science – algorithms, languages and logic (MPALG), MSc
local.programme: Data science and AI (MPDSC), MSc
Original bundle
Name: CSE 24-22 DO LS.pdf
Size: 2.39 MB
Format: Adobe Portable Document Format