Exploring Heuristics for Predicting Microbenchmark Stability and Code Coverage using Static Code Analysis
Master's Thesis
Software engineering and technology (MPSOF), MSc
Salek Maghsoudi, Sam
Åkvist, Malte
Performance testing is a method for optimizing performance and identifying performance
regressions in software applications. It can be carried out through microbenchmarks,
which measure the performance of a small unit of code. However, writing accurate
microbenchmarks is difficult because they demand high measurement precision. Tools
have been developed to automate the creation of microbenchmarks, but the approach
they implement produces large benchmark suites and, consequently, long execution
times. One solution to this problem is to select only a subset of the benchmarks,
reducing the execution time of the suite as a whole. The goal of
this thesis was therefore to explore heuristics for selecting benchmarks with high
stability, high code coverage, or both; these are two important properties of
benchmarks for detecting performance regressions. A
laboratory experiment was conducted to explore two heuristics: first, a heuristic
for predicting the stability of microbenchmarks using only code features obtained
through static code analysis; second, a heuristic for combining the stability of
benchmarks with their code coverage. The experiment used 2250
JUnit tests from three open-source projects, which were converted into benchmarks
with the tool ju2jmh. Data from these benchmarks was used to design the heuristics.
The first heuristic was built using regression models, with four candidate models
explored. The best-performing model was a Random Forest, which achieved an R² of
0.214 and a mean absolute error (MAE) of 3.491. This indicates that it performs
better than a baseline that always predicts the median value, but that its
explanatory power remains low. The second heuristic, designed using average rank
aggregation, showed promising results: it achieved a balance in which strength in
either stability or code coverage compensated for weaker performance in the other.
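
The regression metrics above can be made concrete with a small calculation. The sketch below, in plain Python with only the standard library, computes MAE and R² for a set of predictions and for a constant baseline that always predicts the training-set median. All values are hypothetical placeholders, not data from this thesis, and the "model" predictions are hand-written stand-ins rather than the output of a Random Forest. Note that R² measures improvement over predicting the mean (in squared error), whereas a constant median is the natural baseline for MAE, since the median is the constant that minimizes absolute error on the data it is computed from.

```python
from statistics import mean, median


def mae(y_true, y_pred):
    """Mean absolute error between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot, with SS_tot taken around the mean."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def median_baseline(y_train, n):
    """Predict the training-set median for each of n test samples."""
    m = median(y_train)
    return [m] * n


# Hypothetical stability values; NOT measurements from the thesis.
y_train = [2.0, 3.5, 1.0, 8.0, 4.0, 2.5, 6.0]
y_test = [3.0, 1.5, 7.0, 2.0, 5.0]
# Placeholder predictions standing in for a trained regressor's output.
y_model = [3.5, 2.5, 5.0, 2.5, 4.0]

baseline = median_baseline(y_train, len(y_test))
print(f"model    MAE={mae(y_test, y_model):.3f}  R2={r2(y_test, y_model):.3f}")
print(f"baseline MAE={mae(y_test, baseline):.3f}  R2={r2(y_test, baseline):.3f}")
```

With these placeholder values the model's MAE (1.000) is lower than the median baseline's (1.800), while the baseline's R² falls slightly below zero: a constant prediction can never score above zero on R², because R² compares against the test-set mean.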

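Average rank aggregation can be sketched in a few lines. The example below is a minimal illustration, not the thesis's implementation, and rests on several assumptions: stability is represented by a hypothetical `instability` score where lower values mean a more stable benchmark, coverage is a fraction where higher is better, both criteria are weighted equally, and tied values share the average of their rank positions. The class and function names (`Benchmark`, `ranks`, `average_rank_aggregation`) and the four benchmarks are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Benchmark:
    name: str
    instability: float  # hypothetical stability metric: lower = more stable
    coverage: float     # hypothetical fraction of code covered: higher = better


def ranks(values, descending):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=descending)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values tied with values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def average_rank_aggregation(benchmarks):
    """Rank by each criterion, average the two ranks, and sort best (lowest) first."""
    stab = ranks([b.instability for b in benchmarks], descending=False)
    cov = ranks([b.coverage for b in benchmarks], descending=True)
    scored = [((s + c) / 2, b.name) for s, c, b in zip(stab, cov, benchmarks)]
    return sorted(scored)  # equal scores fall back to alphabetical order by name


# Toy suite with invented values.
suite = [
    Benchmark("A", instability=0.5, coverage=0.10),
    Benchmark("B", instability=1.0, coverage=0.60),
    Benchmark("C", instability=4.0, coverage=0.80),
    Benchmark("D", instability=8.0, coverage=0.20),
]

for score, name in average_rank_aggregation(suite):
    print(f"{name}: {score}")
# B: 2.0
# C: 2.0
# A: 2.5
# D: 3.5
```

In this toy suite, B ranks first because it is reasonably good on both criteria, and C ties with it because its top coverage rank offsets a middling stability rank; A, the most stable benchmark, drops to third because its coverage is the lowest. This mirrors the compensating behavior described in the abstract.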