Predicting UV-Vis absorption spectra by using graph neural network models

dc.contributor.author: Nyrén, William
dc.contributor.author: Taha, Ibrahim
dc.contributor.department: Chalmers tekniska högskola / Institutionen för matematiska vetenskaper [sv]
dc.contributor.examiner: Gerken, Jan
dc.contributor.supervisor: Josefson, Mats
dc.contributor.supervisor: Hulthe, Gustaf
dc.date.accessioned: 2024-08-05T11:02:41Z
dc.date.available: 2024-08-05T11:02:41Z
dc.date.issued: 2024
dc.date.submitted:
dc.description.abstract: In recent years, predicting absorption spectra with models based on deep neural networks has become an increasingly popular topic within spectroscopy. These models can be trained on datasets consisting of molecules represented as SMILES (Simplified Molecular Input Line Entry System) strings, together with intensities for a range of wavelengths. The resulting models can accurately predict excitation spectra for molecules. Their capabilities have proven useful in the drug industry, for example in identifying harmful molecules or detecting substances. Two popular models that implement graph neural networks (GNNs) and predict absorption spectra are AttentiveFP and Chemprop-IR. AttentiveFP uses a graph attention mechanism and a message passing neural network to form a graph convolutional network. Chemprop-IR uses a directed message passing neural network and was originally designed to predict IR spectra. These models were chosen because both use the same type of data and molecular representation, outperform regular regression models, and are able to capture and predict complex patterns in diverse spectra. The goal of this project was to set up, modify, and train the GNNs AttentiveFP and Chemprop-IR, which were constructed for other spectra prediction purposes, to predict UV-Vis absorption spectra within the range of 150 nm to 450 nm, with a 6 nm discretization, from the chemical structures of molecules. Both models were trained on an identical dataset consisting of 10,502,904 molecules obtained from Oak Ridge National Laboratory, with the same split for training, testing, and validation. The models took the SMILES representation of each molecule as input, along with the intensities at each corresponding wavelength. AttentiveFP was trained with three different implementations of the attention mechanism: GAT, GATv2, and DenseGAT. GAT has been shown to compute static attention, whereas GATv2 computes dynamic attention. DenseGAT is based on GAT but instead treats the molecule as a fully connected graph. Chemprop-IR was trained with three different values, 2200, 2800, and 3400, for the FFN (feedforward neural network) hidden size and the hidden size. The best-performing AttentiveFP model used the GATv2 attention mechanism and 8 attentive layers. For Chemprop-IR, the best-performing model used an FFN hidden size of 2800. Both models show promise in predicting UV-Vis absorption spectra. AttentiveFP proved to be the better-performing model of the two, based on predictions and validation loss. The model was able to make accurate predictions on a large set of molecules, correctly identifying the number of peaks and their positions. However, it failed for some molecules, where the predicted spectra were completely different from the true spectra. The exact reason is hard to determine. However, the coexistence of larger molecules with high-frequency components and low prediction error, along with small molecules with low-frequency components and high prediction error, suggests that the attention mechanism is working. Chemprop-IR predicted multiple peaks more accurately as the FFN hidden size and hidden size increased. The least accurate predictions were mostly for spectra with multiple peaks. One theory is that predicting IR spectra differs greatly from predicting UV-Vis spectra.
dc.identifier.coursecode: MVEX03
dc.identifier.uri: http://hdl.handle.net/20.500.12380/308333
dc.language.iso: eng
dc.setspec.uppsok: PhysicsChemistryMaths
dc.subject: Attention-based mechanism, AttentiveFP, Backpropagation, Chemprop-IR, D-MPNN, GNN, MPNN, SMILES, UV-Vis absorption spectra
dc.title: Predicting UV-Vis absorption spectra by using graph neural network models
dc.type.degree: Examensarbete för masterexamen [sv]
dc.type.degree: Master's Thesis [en]
dc.type.uppsok: H
local.programme: Engineering mathematics and computational science (MPENM), MSc
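
The abstract's distinction between static attention (GAT) and dynamic attention (GATv2) can be illustrated with a small sketch. This is not code from the thesis. It uses scalar node features and hand-picked toy weights (`w`, `a_src`, `a_dst`, `W2`, `a2` are all assumptions for illustration) and only the standard scoring forms: GAT applies LeakyReLU after the attention vector, while GATv2 applies the attention vector after LeakyReLU. With static attention every query node ranks the candidate neighbours in the same order; with dynamic attention the ranking can depend on the query.

```python
# Illustrative sketch only: scalar features and hand-picked toy weights,
# not the models or parameters used in the thesis.

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def gat_score(h_i, h_j, w, a_src, a_dst):
    """GAT-style score: e = LeakyReLU(a_src * (w * h_i) + a_dst * (w * h_j))."""
    return leaky_relu(a_src * w * h_i + a_dst * w * h_j)

def gatv2_score(h_i, h_j, W, a):
    """GATv2-style score: e = a^T LeakyReLU(W [h_i, h_j])."""
    hidden = [leaky_relu(row[0] * h_i + row[1] * h_j) for row in W]
    return sum(a_k * z_k for a_k, z_k in zip(a, hidden))

def top_key(score_fn, query, keys):
    """Index of the neighbour (key) that receives the highest score."""
    scores = [score_fn(query, k) for k in keys]
    return scores.index(max(scores))

queries = [1.0, -1.0]   # two query nodes
keys = [-1.0, 1.0]      # two candidate neighbours

# GAT: the score is monotonic in the key term, so the ranking of keys
# cannot change with the query (static attention).
gat_top = [
    top_key(lambda q, k: gat_score(q, k, w=1.0, a_src=1.0, a_dst=1.0), q, keys)
    for q in queries
]

# GATv2 with two hidden units: the score equals 0.8 * |h_i + h_j| here,
# so the preferred key depends on the query (dynamic attention).
W2 = [[1.0, 1.0], [-1.0, -1.0]]
a2 = [1.0, 1.0]
gatv2_top = [
    top_key(lambda q, k: gatv2_score(q, k, W2, a2), q, keys)
    for q in queries
]

print(gat_top)    # [1, 1]: both queries prefer the same key
print(gatv2_top)  # [1, 0]: the preferred key changes with the query
```

In this toy setup the GAT ranking is fixed because LeakyReLU is monotonic and the key enters the score additively, whereas the GATv2 weights were chosen so that each query prefers the key with the same sign as itself.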

Original bundle

Name: Master_Thesis_William Nyrén_Ibrahim Taha_2024.pdf
Size: 7.83 MB
Format: Adobe Portable Document Format
