Extracting Interpretable Equations from Data
Type
Master's thesis
Program
Applied physics (MPAPP), MSc
Published
2020
Authors
Eriksson, Adrian
Frostelind, Filip
Abstract
Expressing relationships within data using mathematical formulas lies at the centre
of scientific discovery. In recent years, symbolic regression has been proposed as a
tool for discovering such relationships and thereby conveying information about
system dynamics. In this study, an algorithm for symbolic regression using genetic
programming is developed and tested on a number of iconic equations from physics
(50 equations from the Feynman Lectures on Physics) and on breathing data from a
ventilator. Two important features of genetic algorithms are investigated, namely
non-disruptive bloat control and sampling methods for breeding. The results show
that the developed algorithm performs on par with cutting-edge commercial genetic
programming software and that interesting features can be extracted from input
data. Further, an enhanced implementation of substitution with an approximate
terminal (SAT) is developed with promising results: it reduces bloat without hindering
adaptation. Lastly, the impacts of roulette, linear rank and Boltzmann selection
are investigated; we find that all three produce similar results, but with different
strategies regarding exploration and exploitation.
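The three parent-selection schemes compared above can each be viewed as a probability distribution over the population. The Python sketch below uses common textbook forms, not necessarily the exact variants used in the thesis. Roulette selection assumes non-negative fitness where higher is better (for symbolic regression one might first map an error such as MSE to 1/(1 + MSE), which is an assumption here). Linear rank selection uses a hypothetical selection-pressure parameter `pressure` in [1, 2], and Boltzmann selection uses a hypothetical `temperature` parameter. All function names are illustrative.

```python
import math
import random


def roulette_probs(fitness):
    """Fitness-proportionate (roulette-wheel) selection probabilities.

    Assumes non-negative fitness values where higher is better.
    """
    total = sum(fitness)
    if total <= 0:
        # Degenerate case: fall back to uniform sampling.
        return [1.0 / len(fitness)] * len(fitness)
    return [f / total for f in fitness]


def linear_rank_probs(fitness, pressure=1.5):
    """Linear ranking: probability depends only on rank, not on magnitude.

    `pressure` (hypothetical parameter, 1 <= pressure <= 2) is the expected
    number of copies of the best individual in n draws; ties are ranked arbitrarily.
    """
    n = len(fitness)
    if n == 1:
        return [1.0]
    order = sorted(range(n), key=lambda i: fitness[i])  # worst first
    rank = [0] * n
    for r, i in enumerate(order):
        rank[i] = r  # 0 = worst, n - 1 = best
    return [(2 - pressure + 2 * (pressure - 1) * rank[i] / (n - 1)) / n
            for i in range(n)]


def boltzmann_probs(fitness, temperature=1.0):
    """Boltzmann (softmax) selection: p_i proportional to exp(f_i / T).

    `temperature` is a hypothetical parameter; low values favour the best
    individuals, high values approach uniform sampling.
    """
    m = max(fitness)  # subtract the maximum for numerical stability
    weights = [math.exp((f - m) / temperature) for f in fitness]
    total = sum(weights)
    return [w / total for w in weights]


def select(probs, k, rng):
    """Draw k parent indices with replacement according to `probs`."""
    return rng.choices(range(len(probs)), weights=probs, k=k)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical fitness values, e.g. 1 / (1 + MSE) of five candidate expressions.
    fitness = [0.10, 0.25, 0.50, 0.80, 0.95]
    for name, probs in [
        ("roulette", roulette_probs(fitness)),
        ("linear rank", linear_rank_probs(fitness)),
        ("Boltzmann, T=0.1", boltzmann_probs(fitness, temperature=0.1)),
    ]:
        print(name, [round(p, 3) for p in probs], select(probs, 4, rng))
```

A low temperature makes Boltzmann selection concentrate on the fittest individuals (exploitation), while a high temperature flattens the distribution towards uniform sampling (exploration). The pressure parameter in linear ranking plays a similar role, but depends only on the ordering of fitness values rather than on their magnitudes.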
Subject/keywords
Genetic Programming, Symbolic Regression, Stochastic Optimisation, Machine Learning, Bloat Control, Boltzmann Sampling, SAT-GP, TOPSIS