Low-Latency Anomaly Detection using Stream Processing

dc.contributor.author: BERGBOM, KATARINA
dc.contributor.author: HÖGBERG, MATS
dc.contributor.department: Chalmers tekniska högskola / Institutionen för data och informationsteknik
dc.contributor.examiner: Tsigas, Philippas
dc.contributor.supervisor: Massimiliano Gulisano, Vincenzo
dc.date.accessioned: 2021-04-09T08:53:21Z
dc.date.available: 2021-04-09T08:53:21Z
dc.date.issued: 2020
dc.date.submitted: 2020
dc.description.abstract: To ensure the continuous operation of online services, it is important to be able to detect system failures quickly. This can be done by monitoring metrics, such as the number of logins or errors per hour, for unexpected behaviour. Such unexpected behaviours, also known as anomalies, can indicate that something in the system is not working as intended, which makes it important to detect them with low latency. In this thesis, we researched how anomalies can be detected in metrics with low latency using stream processing, a data processing paradigm in which data is processed as continuous streams of events. The thesis was conducted at Spotify, one of the largest audio streaming platforms in the world. To research low-latency anomaly detection using stream processing, we implemented Harpooner, a stream processing-based counterpart to an existing batch-processing-based anomaly detection system at Spotify. Harpooner analyses metrics in segments, which are subsets of users, and detects anomalies on an hourly basis. Anomalies are detected using the Kolmogorov-Smirnov (K-S) test, a statistical test that can be used to determine whether two samples are drawn from the same underlying distribution. Harpooner was implemented using Apache Beam, a programming model for expressing stream processing pipelines. It was implemented in several versions, which weighed trade-offs between implementation simplicity, data storage and the computational complexity of the K-S test. Harpooner consists of two parts: a metric calculation part, which is identical in all versions, and an anomaly detection part, which differs between versions. These parts were evaluated separately, using data from Spotify to ensure semantic equivalence between Harpooner and the existing system, and synthetic data to measure their scalability.
During evaluation, it was shown that the most efficient anomaly detection part was able to detect anomalies in a metric with 6,000 segments with a latency below 10 seconds when run on a single node on Cloud Dataflow, and that in a real setting the metric calculation part would be the bottleneck of the pipeline. However, if the two parts were deployed as two separate pipelines, our preliminary results indicate that Harpooner would be able to scale to handle the load necessary to do anomaly detection in metrics at Spotify.
dc.identifier.coursecode: DATX05
dc.identifier.uri: https://hdl.handle.net/20.500.12380/302291
dc.language.iso: eng
dc.setspec.uppsok: Technology
dc.subject: Stream processing
dc.subject: Anomaly detection
dc.subject: Apache Beam
dc.subject: Streaming systems
dc.subject: Kolmogorov-Smirnov test
dc.title: Low-Latency Anomaly Detection using Stream Processing
dc.type.degree: Master's thesis (Examensarbete för masterexamen)
dc.type.uppsok: H
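
The abstract describes the two-sample Kolmogorov-Smirnov test as a way to decide whether two samples come from the same underlying distribution. The sketch below illustrates that idea in plain Python. It is not Harpooner's implementation: the function names `ks_statistic` and `is_anomalous`, the significance level of 0.05, and the large-sample critical-value approximation are all assumptions made for illustration.

```python
import math
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample K-S statistic: the largest gap between the two empirical CDFs."""
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    a = sorted(sample_a)
    b = sorted(sample_b)
    d = 0.0
    # The empirical CDFs only change at observed values, so it is enough
    # to compare them at every value that occurs in either sample.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def is_anomalous(reference, current, alpha=0.05):
    """Flag `current` when the K-S statistic exceeds the large-sample
    critical value (an asymptotic approximation, assumed here)."""
    n, m = len(reference), len(current)
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2))
    critical = c_alpha * math.sqrt((n + m) / (n * m))
    return ks_statistic(reference, current) > critical
```

For instance, `is_anomalous(last_week_values, this_hour_values)` (both hypothetical lists of metric values for one segment) would return `True` when the two samples differ more than chance would plausibly explain at the chosen significance level. This naive version re-sorts both samples on every call. The abstract notes that the versions of Harpooner traded off data storage against the computational complexity of the test, which this sketch does not attempt to reproduce.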

Original bundle

Name: CSE 20-105 Bergbom Högberg.pdf
Size: 2.3 MB
Format: Adobe Portable Document Format