Active Learning for Surrogate Models to Augment AI-Driven Molecular Design

dc.contributor.author: JOSEFSON, CHRISTIAN
dc.contributor.author: NYMAN, CLARA
dc.contributor.department: Chalmers University of Technology / Department of Computer Science and Engineering
dc.contributor.examiner: Dubhashi, Devdatt
dc.contributor.supervisor: Haghir Chehreghani, Morteza
dc.date.accessioned: 2022-10-14T12:33:45Z
dc.date.available: 2022-10-14T12:33:45Z
dc.date.issued: 2022
dc.date.submitted: 2020
dc.description.abstract: This project investigated whether an active learning (AL) framework can reduce the computational cost of AI-driven molecular design without negatively impacting accuracy. The surrogate models Random Forest (RF) and Support Vector Regression (SVR) were tested together with the acquisition functions (AFs) Random, Thompson Sampling (TS), Tanimoto Similarity, Expected Improvement (EI), Probability of Improvement (PI), Upper Confidence Bound (UCB) and ε-Greedy. Of these, the combination of RF and Random acquisition was found to perform best with regard to error, measured as root mean square error (RMSE), and time consumption, measured as runtime per epoch. SVR had slightly lower error but took substantially longer. Depending on the choice of AF, one run using RF took approximately 2 to 17.5 hours, while one run using SVR took approximately 100 to 175 hours. Four tuning parameters were introduced to see whether they could further optimize the framework. A longer retrain interval and a smaller acquisition batch were found to shorten the runtime without significantly impacting accuracy. To summarise, an RF model with the Random AF, a 5-epoch initial pooling, no warm-up phase, a retrain interval of 20 and an acquisition batch size of 20 was selected to mitigate computational costs while keeping the error stable.
dc.identifier.coursecode: DATX05
dc.identifier.uri: https://hdl.handle.net/20.500.12380/305713
dc.language.iso: eng
dc.setspec.uppsok: Technology
dc.subject: active learning
dc.subject: bayesian optimization
dc.subject: de novo design
dc.subject: molecular design
dc.subject: drug discovery
dc.subject: surrogate model
dc.subject: machine learning
dc.subject: molecular docking
dc.title: Active Learning for Surrogate Models to Augment AI-Driven Molecular Design
dc.type.degree: Master's thesis
dc.type.uppsok: H
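
The abstract names several acquisition functions (PI, EI, UCB) and reports error as RMSE. As an illustration only, the sketch below gives the textbook forms of these quantities in terms of a surrogate's predicted mean and standard deviation. It assumes a maximization objective and a Gaussian predictive distribution; the function names and the `xi` and `beta` parameters are hypothetical and are not taken from the thesis, whose exact formulations may differ.

```python
import math


def _normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _normal_cdf(z):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probability_of_improvement(mu, sigma, best, xi=0.0):
    """PI: probability that a candidate beats the best observed value.

    Assumes maximization; xi is an illustrative exploration margin.
    """
    gain = mu - best - xi
    if sigma <= 0.0:
        return 1.0 if gain > 0.0 else 0.0
    return _normal_cdf(gain / sigma)


def expected_improvement(mu, sigma, best, xi=0.0):
    """EI: expected amount by which a candidate beats the best value."""
    gain = mu - best - xi
    if sigma <= 0.0:
        return max(gain, 0.0)
    z = gain / sigma
    return gain * _normal_cdf(z) + sigma * _normal_pdf(z)


def upper_confidence_bound(mu, sigma, beta=2.0):
    """UCB: predicted mean plus beta times the predicted uncertainty."""
    return mu + beta * sigma


def rmse(y_true, y_pred):
    """Root mean square error between two equal-length sequences."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def select_batch(scores, batch_size):
    """Return indices of the batch_size highest acquisition scores."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:batch_size]
```

In an active learning loop, each unlabeled candidate would be scored with one of these functions, and `select_batch` would pick the next batch to evaluate with the expensive oracle. The Random acquisition function that the abstract reports as the best trade-off skips the scoring step and samples the batch uniformly.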
Original bundle
Name: CSE 22-107 Josefsson Nyman.pdf
Size: 6.01 MB
Format: Adobe Portable Document Format