Extracting Interpretable Equations from Data
Master's thesis
Expressing relationships within data as mathematical formulas lies at the centre of scientific discovery. In recent years, symbolic regression has been proposed as a tool for discovering relationships in data in order to convey information about system dynamics. In this study, an algorithm for symbolic regression using genetic programming is developed and tested on a number of iconic equations from physics (50 equations from the Feynman Lectures on Physics) and on breathing data from a ventilator. Two important features of genetic algorithms are investigated, namely non-disruptive bloat control and sampling methods for breeding. The results show that the developed algorithm performs on par with cutting-edge commercial genetic programming software and that interesting features can be extracted from the input data. Further, an enhanced version of substitution with an approximate terminal (SAT) is implemented with promising results, reducing bloat without hindering adaptation. Lastly, the impacts of roulette, linear rank and Boltzmann selection are investigated; we find that all three produce similar results but balance exploration and exploitation differently.
Genetic Programming, Symbolic Regression, Stochastic Optimisation, Machine Learning, Bloat Control, Boltzmann Sampling, SAT-GP, TOPSIS
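To make the three selection schemes named in the abstract concrete, the following is a minimal sketch of how each one could turn a list of fitness values into selection probabilities. It is not the thesis implementation. The function names, the `pressure` and `temperature` parameters, and the convention that a higher fitness is better (and non-negative for roulette selection) are all assumptions made for illustration; the linear rank formula follows the common textbook form with a selection pressure between 1 and 2.

```python
import math
import random


def roulette_probs(fitness):
    """Fitness-proportionate selection: probability scales with raw fitness.

    Assumes non-negative fitness values with a positive sum.
    """
    total = sum(fitness)
    return [f / total for f in fitness]


def linear_rank_probs(fitness, pressure=1.5):
    """Linear rank selection: probability depends only on rank, not magnitude.

    `pressure` (assumed name) lies in [1, 2]; 1 gives uniform selection and
    2 gives the worst individual a probability of zero. Needs n >= 2.
    """
    n = len(fitness)
    order = sorted(range(n), key=lambda i: fitness[i])  # worst first
    probs = [0.0] * n
    for rank, i in enumerate(order):
        probs[i] = (2 - pressure) / n + 2 * rank * (pressure - 1) / (n * (n - 1))
    return probs


def boltzmann_probs(fitness, temperature=1.0):
    """Boltzmann selection: softmax of fitness scaled by a temperature.

    A high temperature flattens the distribution (more exploration);
    a low temperature concentrates it on the best individuals.
    """
    best = max(fitness)  # subtract the maximum for numerical stability
    weights = [math.exp((f - best) / temperature) for f in fitness]
    total = sum(weights)
    return [w / total for w in weights]


def select(population, probs, k, rng=random):
    """Draw k parents with replacement according to the given probabilities."""
    return rng.choices(population, weights=probs, k=k)
```

For example, `select(population, boltzmann_probs(fitness, temperature=0.5), k=2)` would draw two parents for breeding. The three functions differ only in how strongly they favour the fittest individuals, which is one way to read the exploration and exploitation trade-off described above.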