Optimising a Transformer-Based Model for Metabolite Prediction in Drug Discovery
dc.contributor.author | Larsson, Sofia | |
dc.contributor.author | Carlsson, Miranda | |
dc.contributor.department | Chalmers tekniska högskola / Institutionen för data och informationsteknik | sv |
dc.contributor.department | Chalmers University of Technology / Department of Computer Science and Engineering | en |
dc.contributor.examiner | Mercado Oropeza, Rocío | |
dc.contributor.supervisor | Beckmann, Richard | |
dc.date.accessioned | 2025-10-06T14:25:25Z | |
dc.date.issued | ||
dc.date.submitted | ||
dc.description.abstract | Drug metabolism plays a crucial role in drug discovery, impacting the safety and efficacy of medications. Experimental methods for predicting the metabolic reactions of potential drugs have long been costly in both time and resources. Computational methods have emerged as more cost-effective and time-efficient approaches for predicting drug metabolites, but are often hindered by their reliance on rigid rules. Large language models provide a more adaptable and rule-free alternative. In this project, the transformer-based language model Chemformer, previously employed in various chemical tasks, was optimised for drug metabolism prediction. To achieve this, a dataset of drugs and their corresponding metabolites was compiled and preprocessed, and the Chemformer model was fine-tuned and evaluated on it. Further methods to enhance the model’s performance included an additional pre-training of the Chemformer model, randomisation of the input SMILES (Simplified Molecular Input Line Entry System) strings, augmentation of the dataset, annotation of the data with chemical information, ensemble models, and optimisation of the prediction space through further pre-training. The most promising of these were the additional pre-trainings and the randomisation of the SMILES strings, both of which yielded a significant increase in performance. In benchmarking, the best-performing model outperformed the existing models GLORYx and SyGMa in precision and F1 score. These results suggest that Chemformer is a promising tool for metabolite prediction. | |
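The SMILES randomisation mentioned in the abstract can be illustrated with a minimal sketch. The snippet below assumes RDKit is used for SMILES handling (the abstract does not name a toolkit); the function name, the number of variants, and the aspirin example are illustrative assumptions, not the thesis code.

```python
# Minimal sketch of SMILES randomisation as a data augmentation step,
# assuming RDKit; names and the example molecule are illustrative only.
from rdkit import Chem


def randomise_smiles(smiles: str, n_variants: int = 5) -> list[str]:
    """Generate alternative, chemically equivalent SMILES strings for one molecule."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {smiles}")
    # doRandom=True starts the atom traversal at a random atom, so each call
    # returns a different string that still encodes the same molecule.
    return [Chem.MolToSmiles(mol, doRandom=True, canonical=False) for _ in range(n_variants)]


if __name__ == "__main__":
    for variant in randomise_smiles("CC(=O)Oc1ccccc1C(=O)O"):  # aspirin
        print(variant)
```

Feeding such non-canonical variants of the same input molecule to a sequence model is a common way to expose it to the many valid spellings of one structure.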
dc.identifier.coursecode | DATX05 | |
dc.identifier.uri | http://hdl.handle.net/20.500.12380/310599 | |
dc.language.iso | eng | |
dc.relation.ispartofseries | CSE 25-14 | |
dc.setspec.uppsok | Technology | |
dc.subject | drug discovery, metabolism, language model, transformer, computational chemistry, drugs, metabolites, optimisation | |
dc.title | Optimising a Transformer-Based Model for Metabolite Prediction in Drug Discovery | |
dc.type.degree | Examensarbete för masterexamen | sv |
dc.type.degree | Master's Thesis | en |
dc.type.uppsok | H | |
local.programme | Complex adaptive systems (MPCAS), MSc |