Advanced Algorithms to Identify Performance Degradation

Type
Master Thesis
Program
Computer science – algorithms, languages and logic (MPALG), MSc
Published
2016
Authors
Johansson, Annika
Otterberg, Markus
Abstract
For the purpose of analysing performance of a system, data measuring the resources consumed are gathered. The common goal, independent of what is measured, is to draw conclusions on the performance and see if an update has improved or degraded it. Performance analysis in computer science becomes increasingly important as software controls more and more complex processes and requires more and more accuracy in both precision and timing. As new data are rapidly generated, automation of the analysis both saves money and achieves a more reliable result compared to manual inspection. Machine learning is common for automated data analysis. In this thesis methods from the machine learning field are applied to performance data with the aim of identifying performance degradation. Both the data and aggregated points are analysed with k-means and k-medoids clustering algorithms and the results show points leading to degraded performance. The performance measurements analysed are the load and memory usage of the hardware, generated during testing of the actual hardware and software in a simulated environment. It is generated from a number of different tests running different scenarios, which gives the data a large internal spread in covariance. Due to this large spread a threshold method is not exact enough to determine performance of a single update. In order to analyse changes in the data, aggregated adaptations consisting of the change from one point in time to another are generated. The changes are clustered for each kind of measurement and the clustering is quantitatively and qualitatively evaluated in order to determine its success. By using two stage hierarchical clustering, where the first layer is used to remove outliers, most of the points leading to performance degradation within the dataset are singled out.
At each stage of the clustering different distance metrics are evaluated and the optimal k and the corresponding weights are algorithmically found for each metric. After evaluating each way of clustering the top performing ones are chosen based on quantitative and qualitative measures, such as V-measure and adjusted Rand index. The centroids of the chosen clustering method are labelled and all points are labelled, each point according to the centroid of its respective cluster. The points labelled as performance degrading are used to locate updates which led to degraded performance. Finally, the method designed is compared to what is generally required of a fault detection system to determine if it can be used as such.
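The two-stage idea described in the abstract can be illustrated with a small sketch. This is not the thesis's implementation: it assumes a single scalar measurement per update, plain absolute distance, fixed values of k, and a deterministic seeding scheme chosen only to make the example reproducible. The names `deltas`, `kmeans_1d`, `degrading_updates`, and the parameters `k1`, `k2`, and `min_size` are hypothetical.

```python
from statistics import mean

def deltas(series):
    """Change from one point in time to the next (one value per update)."""
    return [b - a for a, b in zip(series, series[1:])]

def kmeans_1d(points, k, iters=50):
    """Lloyd's k-means on scalars; seeds are evenly spaced in the sorted data
    (an assumption made for determinism, not taken from the thesis)."""
    ordered = sorted(points)
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = []
    for _ in range(iters):
        # Assign every point to its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(p - centroids[c])) for p in points]
        # Recompute each centroid; an empty cluster keeps its old centroid.
        new = []
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            new.append(mean(members) if members else centroids[c])
        if new == centroids:
            break
        centroids = new
    return labels, centroids

def degrading_updates(series, k1=3, k2=3, min_size=2):
    """Stage 1 drops points in very small clusters (treated as outliers);
    stage 2 clusters the rest and flags the cluster with the largest centroid."""
    changes = deltas(series)
    labels1, _ = kmeans_1d(changes, k1)
    kept = [i for i, l in enumerate(labels1) if labels1.count(l) >= min_size]
    labels2, cents2 = kmeans_1d([changes[i] for i in kept], k2)
    worst = max(range(k2), key=lambda c: cents2[c])
    # Delta i is the change introduced by update i + 1.
    return [i + 1 for i, l in zip(kept, labels2) if l == worst]

# Made-up memory readings: two persistent step increases and one transient spike.
usage = [50, 51, 50, 51, 58, 59, 58, 59, 200, 59, 66, 67, 66]
print(degrading_updates(usage))
```

On this made-up series the transient spike at position 8 is discarded in the first stage because its two changes each form a cluster of size one, so only the persistent step increases remain as candidates in the second stage. Labelling the cluster with the largest centroid as degrading is a simplification of the centroid labelling step described above.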
Subject/keywords
Computer and Information Science