Interactive Fine-Grained Provenance for Streaming-based Analysis Applications

dc.contributor.author: Erlandsson, Andréas
dc.contributor.author: Gordani Shahri, Mikael
dc.contributor.department: Chalmers tekniska högskola / Institutionen för data och informationsteknik
dc.contributor.examiner: Papatriantafilou, Marina
dc.contributor.supervisor: Massimiliano Gulisano, Vincenzo
dc.contributor.supervisor: Palyvos-Giannis, Dimitris
dc.date.accessioned: 2021-04-01T14:02:53Z
dc.date.available: 2021-04-01T14:02:53Z
dc.date.issued: 2021
dc.date.submitted: 2020
dc.description.abstract: Streaming-based applications that process unbounded, continuous streams of data, such as user activity on the web or sensor data, can be designed to detect critical events. When such an event occurs, an application can benefit from retaining the associated source data for further analysis. This can be achieved with fine-grained data provenance, which links each event back to the source data that contributed to it. This thesis focuses on GeneaLog, a state-of-the-art data provenance technique that collects fine-grained provenance data for cyber-physical systems and maintains it with low overhead. Generating provenance can nevertheless be a heavy operation in certain applications, where the overhead is not always negligible. Adjusting GeneaLog so that it operates only when a critical event occurs, rather than continuously, can reduce unnecessary provenance generation. The goal is to extend GeneaLog to generate provenance information interactively and to evaluate under what conditions such an extension is beneficial. Because such an extension might reduce processing and memory overhead, it could bring GeneaLog, and data provenance techniques in general, to a wider range of devices and applications. The thesis proposes an extension to GeneaLog called Twins. To activate and deactivate GeneaLog, Twins introduces a system consisting of two queries and a pair of special operators. The first query is equipped with standard operators and the second with operators that generate provenance information. Initially, the first query processes tuples until a critical event is produced, which initiates a transition to the second query. If no critical events occur after a transition, the system transitions back to the first query.
These transitions are handled by the special operators, called the Ward operators, which are responsible for triggering and performing a transition between the queries. The thesis uses and extends a prototype of GeneaLog built for the stream processing engine Apache Flink. During the evaluation, the observed throughput of Twins resembled that of GeneaLog when provenance was active and that of a baseline query with no provenance generation when provenance was inactive. The preliminary results indicate that Twins can be beneficial in scenarios where generating provenance is not a negligible operation in terms of overhead.
dc.identifier.coursecode: DATX05
dc.identifier.uri: https://hdl.handle.net/20.500.12380/302287
dc.language.iso: eng
dc.setspec.uppsok: Technology
dc.subject: data analytics
dc.subject: apache flink
dc.subject: streaming
dc.subject: data provenance
dc.title: Interactive Fine-Grained Provenance for Streaming-based Analysis Applications
dc.type.degree: Master's thesis
dc.type.uppsok: H
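
The switching behaviour described in the abstract (a plain query runs until a critical event occurs, a provenance-generating query takes over, and the system returns to the plain query once critical events stop) can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the thesis implementation or the Apache Flink prototype: the class `TwinsSketch`, the threshold-based event detector, and the `quiet_limit` rule for switching back are all hypothetical, since the abstract does not specify how the Ward operators decide when to transition. The sketch also does not model how the triggering event itself obtains provenance.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class TwinsSketch:
    """Toy model of switching between a plain and a provenance query."""
    is_critical: Callable[[float], bool]  # hypothetical critical-event detector
    quiet_limit: int = 3                  # hypothetical: quiet tuples before switching back
    provenance_active: bool = False
    quiet_count: int = 0
    log: List[Tuple[float, str]] = field(default_factory=list)

    def process(self, value: float) -> None:
        # Route the tuple through whichever query is currently active.
        mode = "provenance" if self.provenance_active else "plain"
        self.log.append((value, mode))

        if self.is_critical(value):
            # A critical event activates (or keeps active) the provenance query.
            self.provenance_active = True
            self.quiet_count = 0
        elif self.provenance_active:
            # Count tuples seen without a critical event; after enough of
            # them, transition back to the plain query.
            self.quiet_count += 1
            if self.quiet_count >= self.quiet_limit:
                self.provenance_active = False
                self.quiet_count = 0


def run(values, threshold=100.0, quiet_limit=2):
    """Feed values through the sketch and return the mode used per tuple."""
    sketch = TwinsSketch(is_critical=lambda v: v > threshold,
                         quiet_limit=quiet_limit)
    for v in values:
        sketch.process(v)
    return [mode for _, mode in sketch.log]
```

For example, `run([1, 150, 2, 3, 4])` treats 150 as the critical event: the tuples that follow are handled in provenance mode until two quiet tuples have passed, after which processing falls back to the plain mode.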
Download
Original bundle
Name: CSE 21-09 Erlandsson Gordani Shahri.pdf
Size: 5.22 MB
Format: Adobe Portable Document Format