Extracting Interpretable Equations from Data

Type
Master's thesis
Published
2020
Authors
Eriksson, Adrian
Frostelind, Filip
Abstract
Expressing relationships within data as mathematical formulas lies at the centre of scientific discovery. In recent years, symbolic regression has been proposed as a tool for discovering relationships in data and conveying information about system dynamics. In this study, an algorithm for symbolic regression using genetic programming is developed and tested on a number of iconic equations from physics (50 equations from the Feynman Lectures on Physics) and on breathing data from a ventilator. Two important features of genetic algorithms are investigated: non-disruptive bloat control and sampling methods for breeding. The results show that the developed algorithm performs on par with cutting-edge commercial genetic programming software and that interesting features can be extracted from input data. Further, an enhanced version of substitution with an approximate terminal (SAT) is implemented with promising results, reducing bloat without hindering adaptation. Lastly, the impacts of roulette, linear rank and Boltzmann selection are investigated, and we find that they all produce similar results, but with different strategies regarding exploration and exploitation.
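To illustrate how the three selection schemes named in the abstract differ, the sketch below computes the probability of picking each individual for breeding under the textbook form of each scheme. It is not taken from the thesis: the function names, the `pressure` and `temperature` parameters, and the convention that higher fitness is better are all assumptions made for illustration. In symbolic regression an error is usually minimised, so the error would first have to be mapped to such a fitness value.

```python
import math


def roulette_probs(fitness):
    """Fitness-proportionate selection: p_i = f_i / sum(f).

    Assumes all fitness values are non-negative and not all zero.
    """
    total = sum(fitness)
    return [f / total for f in fitness]


def linear_rank_probs(fitness, pressure=1.5):
    """Linear ranking: probability depends only on rank, not on raw fitness.

    `pressure` (assumed name) lies in [1, 2]; 1 gives uniform selection,
    2 gives the worst individual zero probability. Needs at least 2 individuals.
    """
    n = len(fitness)
    order = sorted(range(n), key=lambda i: fitness[i])  # worst first
    probs = [0.0] * n
    for rank, i in enumerate(order):
        probs[i] = (2 - pressure + 2 * (pressure - 1) * rank / (n - 1)) / n
    return probs


def boltzmann_probs(fitness, temperature=1.0):
    """Boltzmann selection: p_i proportional to exp(f_i / T).

    A high temperature favours exploration (near-uniform probabilities);
    a low temperature favours exploitation (the best individual dominates).
    """
    best = max(fitness)  # subtracting the maximum avoids overflow in exp
    weights = [math.exp((f - best) / temperature) for f in fitness]
    total = sum(weights)
    return [w / total for w in weights]
```

Any of the returned lists can be passed as `weights` to `random.choices` from the Python standard library to draw parents. The temperature in Boltzmann selection and the pressure in linear ranking make the trade-off between exploration and exploitation explicit, whereas roulette selection inherits it from the spread of the raw fitness values.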
Subject/keywords
Genetic Programming, Symbolic Regression, Stochastic Optimisation, Machine Learning, Bloat Control, Boltzmann Sampling, SAT-GP, TOPSIS