Machine learning to predict enzymes’ optimal catalytic temperature
Type
Master's thesis
Published
2020
Author
Ulfenborg, Josefin
Abstract
Enzymes are proteins that act as biological catalysts in chemical processes,
for instance in biofuel production. The efficiency and sustainability of these processes
may be greatly improved by knowing the optimal catalytic temperature (Topt)
of the enzymes. However, determining these temperatures experimentally is time-consuming,
so a machine learning approach for predicting Topt is proposed instead.
In a previous approach, sequential features were used to predict Topt. In this thesis,
new structural features that account for various structural properties of the
enzymes were used alongside the sequential features. Test scores show that
combining structural features with sequential features raises the R² score from
0.4 (previous approach) to 0.48. Furthermore, for pairs of similar enzymes in which
one has a lower and the other a higher Topt, the models predict the correct
temperature order 83% of the time. By gathering more data and fine-tuning the
structural features, the scores are expected to improve further.
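The abstract reports two evaluation measures: the R² score and the fraction of similar-enzyme pairs whose Topt order is predicted correctly. Below is a minimal sketch of how these two measures could be computed. It assumes that R² is the standard coefficient of determination (1 - SS_res / SS_tot). It also assumes that a pair counts as correct when the predicted Topt difference has the same sign as the measured one. The function names, enzyme identifiers and temperatures are hypothetical illustrations, not the thesis's code or data.

```python
from typing import Dict, Sequence, Tuple


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot (standard definition, assumed here)."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equally long sequences with at least two values")
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean) ** 2 for t in y_true)  # total sum of squares around the mean
    if ss_tot == 0:
        raise ValueError("R² is undefined when all measured values are equal")
    return 1.0 - ss_res / ss_tot


def pairwise_order_accuracy(
    pairs: Sequence[Tuple[str, str]],
    measured: Dict[str, float],
    predicted: Dict[str, float],
) -> float:
    """Fraction of enzyme pairs whose predicted Topt order matches the measured order.

    Assumption: each pair has distinct measured Topt values (one colder, one hotter).
    A tie in the predictions is counted as incorrect.
    """
    if not pairs:
        raise ValueError("need at least one pair")
    correct = 0
    for a, b in pairs:
        measured_diff = measured[a] - measured[b]
        predicted_diff = predicted[a] - predicted[b]
        # Same sign means the model ranks the colder and hotter enzyme correctly.
        if measured_diff * predicted_diff > 0:
            correct += 1
    return correct / len(pairs)


if __name__ == "__main__":
    # Hypothetical measured and predicted Topt values in °C.
    measured = {"enzA": 37.0, "enzB": 60.0, "enzC": 45.0, "enzD": 80.0}
    predicted = {"enzA": 40.0, "enzB": 55.0, "enzC": 50.0, "enzD": 48.0}
    pairs = [("enzA", "enzB"), ("enzC", "enzD")]
    ids = list(measured)
    print(r2_score([measured[i] for i in ids], [predicted[i] for i in ids]))
    print(pairwise_order_accuracy(pairs, measured, predicted))  # 0.5
```

In this toy example the first pair is ordered correctly and the second is not, so the pair score is 0.5. The pair score only checks ordering, not absolute error. A model can therefore rank pairs well while still having a modest R².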
Subject/keywords
Structural bioinformatics, enzymes, machine learning, feature engineering