Contrastive Learning For Molecular Representation

dc.contributor.author: HANI, SALAM
dc.contributor.author: LINDER, JONATHAN
dc.contributor.department (sv): Chalmers tekniska högskola / Institutionen för data och informationsteknik
dc.contributor.department (en): Chalmers University of Technology / Department of Computer Science and Engineering
dc.contributor.examiner: Bernardy, Jean-Phillippe
dc.contributor.supervisor: Olsson, Simon
dc.date.accessioned: 2026-01-16T07:39:21Z
dc.date.issued: 2025
dc.date.submitted:
dc.description.abstract: This thesis explores the integration of contrastive learning into REINVENT, AstraZeneca’s in-house generative model for molecular design, with the aim of improving the model’s understanding of chemical equivalence between different SMILES representations of the same compound. To this end, a contrastive learning framework was developed, incorporating SMILES-based data augmentation techniques such as enumeration and subgraphing. The framework was evaluated on three datasets: a proprietary baseline derived from ChEMBL35, and the publicly available MOSES and GuacaMol datasets. To assess the impact of architectural design on performance, multiple model architectures were investigated, including a newly introduced intermediate architecture. Results indicate that the intermediate architecture consistently achieves higher validity across all datasets, but tends to reduce novelty. Furthermore, using multiple augmentation strategies improved the model’s ability to generate chemically diverse and novel compounds, as measured by metrics such as novelty and Fréchet ChemNet Distance (FCD). These findings suggest that contrastive learning can offer measurable benefits in de novo molecule generation, although its effectiveness may depend heavily on architecture and dataset-specific tuning.
dc.identifier.coursecode: DATX05
dc.identifier.uri: http://hdl.handle.net/20.500.12380/310897
dc.language.iso: eng
dc.setspec.uppsok: Technology
dc.subject: Artificial Intelligence
dc.subject: AI
dc.subject: Deep Learning
dc.subject: DL
dc.subject: Machine Learning
dc.subject: Computer Science
dc.subject: Computer Engineering
dc.subject: Contrastive learning
dc.subject: Self-supervised learning
dc.subject: Representation learning
dc.subject: Data augmentation
dc.subject: Embeddings
dc.subject: Latent space
dc.subject: Generative models
dc.subject: Recurrent neural networks
dc.subject: RNN
dc.subject: Long Short-Term Memory
dc.subject: LSTM
dc.subject: Neural architecture design
dc.subject: Hyperparameter tuning
dc.subject: Transfer learning
dc.subject: Reinforcement learning
dc.subject: NT-Xent loss
dc.subject: Negative log-likelihood
dc.subject: Benchmarking
dc.subject: PCA
dc.subject: T-SNE
dc.subject: UMAP
dc.subject: Drug discovery
dc.subject: De novo molecular generation
dc.subject: In silico screening
dc.subject: Molecular design
dc.subject: Molecular representations
dc.subject: Molecular fingerprints
dc.subject: SMILES
dc.subject: SMILES enumeration
dc.subject: SMILES randomization
dc.subject: Subgraph sampling
dc.subject: Canonicalization
dc.subject: Chemical space
dc.subject: Physicochemical properties
dc.subject: Tanimoto similarity
dc.subject: Fréchet ChemNet Distance
dc.subject: FCD
dc.subject: Synthetic accessibility
dc.subject: SA
dc.subject: Quantitative Estimate of Drug-likeness
dc.subject: QED
dc.subject: Stereoisomers
dc.subject: Stereochemistry
dc.subject: Stereocenters
dc.subject: Tautomerism
dc.subject: Validity
dc.subject: Novelty
dc.subject: Diversity
dc.subject: Internal diversity
dc.subject: IntDiv
dc.subject: ChEMBL35
dc.subject: MOSES benchmark
dc.subject: GuacaMol benchmark
dc.subject: Pharmaceutical AI
dc.subject: Cheminformatics
dc.subject: AstraZeneca
dc.subject: REINVENT
dc.subject: Bioactivity prediction
dc.title: Contrastive Learning For Molecular Representation
dc.type.degree (sv): Examensarbete för masterexamen
dc.type.degree (en): Master's Thesis
dc.type.uppsok: H
local.programme: Computer science – algorithms, languages and logic (MPALG), MSc
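
Illustrative note on the record above: the abstract describes a contrastive framework in which different SMILES strings of the same compound are treated as equivalent, and the subject list names the NT-Xent loss. This record does not show the thesis's implementation, so the following is only a minimal, generic sketch of NT-Xent in plain Python. The function names `cosine` and `nt_xent`, the temperature of 0.5, and the toy two-dimensional embeddings are all assumptions made for illustration; in practice the embeddings would come from an encoder applied to augmented SMILES.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nt_xent(view_a, view_b, temperature=0.5):
    """Generic NT-Xent loss (hypothetical helper, not the thesis code).

    view_a[i] and view_b[i] are assumed to be embeddings of two augmented
    views of the same molecule, e.g. two enumerated SMILES strings.
    Every other embedding in the batch acts as a negative.
    """
    n = len(view_a)
    z = list(view_a) + list(view_b)
    total = 0.0
    for i in range(2 * n):
        pos = (i + n) % (2 * n)  # index of the other view of the same molecule
        # Temperature-scaled similarities to every embedding except the anchor itself.
        logits = [cosine(z[i], z[k]) / temperature for k in range(2 * n) if k != i]
        pos_logit = cosine(z[i], z[pos]) / temperature
        # Log-sum-exp with the maximum subtracted for numerical stability.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denominator - pos_logit
    return total / (2 * n)


if __name__ == "__main__":
    # Toy batch of two "molecules" with correctly paired and wrongly paired views.
    paired = nt_xent([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
    swapped = nt_xent([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
    print(paired < swapped)
```

The loss is lower when the two views of each molecule lie close together and far from the other molecules in the batch, which is the property the abstract relies on when it speaks of learning chemical equivalence between SMILES representations.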

Download

Original bundle

Name: CSE 25-134 SH JL.pdf
Size: 8.36 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 2.35 KB
Format: Item-specific license agreed upon to submission