Balancing strict performance requirements and trade-offs for efficient data handling in unbounded flows - Design considerations for a proof-of-concept stream processing pipeline for vehicular data validation

dc.contributor.authorJosefsson, Måns
dc.contributor.authorWall, Carl-Magnus
dc.contributor.departmentChalmers tekniska högskola / Institutionen för data och informationstekniksv
dc.contributor.departmentChalmers University of Technology / Department of Computer Science and Engineeringen
dc.contributor.examinerMassimiliano Gulisano, Vincenzo
dc.contributor.supervisorPapatriantafilou, Marina
dc.date.accessioned2025-02-25T13:41:14Z
dc.date.available2025-02-25T13:41:14Z
dc.date.issued2024
dc.date.submitted
dc.description.abstractIn the current era of ever-growing networks of sophisticated sensors and smart devices, there are vast volumes of data created at every moment. Big Data environments such as these necessitate scalable and efficient data pipelines in order to provide near real-time analytics and monitoring. However, with especially high, unbounded data rates, traditional (store-then-process) database procedures and batchbased processing methods are struggling to remain performant. To this end, processing streams of data continuously has become an increasingly appealing approach, targeting low latency, high scalability and real-time data processing. Stream processing tools enable the execution of both stateless and stateful computations on data as it is being transferred through the pipeline, providing opportunities to increase the efficiency of processing procedures such as, e.g., data validation. However, stream processing pipelines can be complex, especially in multi-tenant settings, and it is not always clear how to efficiently approach their implementation. A process such as data validation might be subject to a multitude of requirements that all affect pipeline design and considerations. To investigate such requirements and give detailed insight on how to approach the design of a stream processing pipeline for efficient data validation on unbounded flows of data, a proof of concept pipeline is developed and tested in a case study at Volvo Trucks. The case study involves multi-tenant automotive testing, where the proposed pipeline enables near real-time validation of data, for purposes such as monitoring vehicle sensor behavior. The pipeline is comprised of Apache Kafka, for persistent event storing, Apache Flink, for continuous stateful analysis, and Apache Druid, for data serving. Evaluation of the pipeline is performed from the perspective of a set of metrics, namely data completeness, sustainable throughput, latency, scalability and fault tolerance. In order to harmonize the requirements of the pipeline and discern how trade-offs affect performance, various tool-tuning experiments and stress tests are performed. Performance evaluation of the pipeline reveals that in a controlled environment, with limited resources, the minimum throughput requirement of the use case can be sustained, while still achieving sub-second latencies and offering a degree of fault tolerance. The pipeline also shows promise of adapting well to different levels of scale, providing enough headroom for a tenfold increase in data volumes over current demands.
dc.identifier.coursecodeDATX05
dc.identifier.urihttp://hdl.handle.net/20.500.12380/309161
dc.language.isoeng
dc.setspec.uppsokTechnology
dc.subjectData pipelines
dc.subjectstream processing
dc.subjectdata completeness
dc.subjectlatency & throughput
dc.subjectfault tolerance
dc.subjectscalability
dc.subjectdata validation
dc.titleBalancing strict performance requirements and trade-offs for efficient data handling in unbounded flows - Design considerations for a proof-of-concept stream processing pipeline for vehicular data validation
dc.type.degreeExamensarbete för masterexamensv
dc.type.degreeMaster's Thesisen
dc.type.uppsokH
local.programmeComputer science – algorithms, languages and logic (MPALG), MSc
local.programmeComputer systems and networks (MPCSN), MSc
Ladda ner
Original bundle
Visar 1 - 1 av 1
Hämtar...
Bild (thumbnail)
Namn:
CSE 24-115 MJ CMW.pdf
Storlek:
1.9 MB
Format:
Adobe Portable Document Format
Beskrivning:
License bundle
Visar 1 - 1 av 1
Hämtar...
Bild (thumbnail)
Namn:
license.txt
Storlek:
2.35 KB
Format:
Item-specific license agreed upon to submission
Beskrivning: