Efficient industrial big data pipeline for lossless transfer of vehicular data

dc.contributor.author HILGENDORF, MARTIN
dc.contributor.department Chalmers University of Technology / Department of Computer Science and Engineering
dc.contributor.examiner Gulisano, Vincenzo
dc.contributor.supervisor Papatriantafilou, Marina
dc.date.accessioned 2022-12-05T10:48:33Z
dc.date.available 2022-12-05T10:48:33Z
dc.date.issued 2022
dc.date.submitted 2022
dc.description.abstract In the age of big data and growing product complexity, it has become common to monitor and record many aspects of a product or system in order to extract well-founded intelligence and draw conclusions that continue to drive innovation. Automating and scaling data transfer and analysis processes in pipelines is essential to keep pace with the increasing data volumes and rates such practices generate. Industrial big data pipelines are further subject to a number of requirements and challenges: data veracity, security, and governance, alongside overall pipeline performance and scalability. To address these challenges in a case study at Volvo Trucks, a general big data pipeline design is developed to serve as a framework for the efficient transfer of large data volumes from remote test sites to data centres. The synergetic effects of data compression and in-memory processing as techniques to improve pipeline performance, in terms of both throughput and end-to-end latency, are studied and evaluated. A pipeline based on the proposed design is implemented on Apache Airflow to explore its latency and throughput as well as other aspects such as efficiency and scalability. Various general-purpose lossless data compression algorithms are evaluated and compared in order to balance compression effectiveness against compression time in the pipeline. A performance evaluation of the proposed pipeline with data compression achieves an average throughput uplift of 38.8% over the solution currently in use, while also providing previously missing functionality such as integrity verification, logging, monitoring and traceability, and cataloguing of ingested data.
Further, a variation of the pipeline design that uses shared-memory processing to alleviate an identified hardware bottleneck is demonstrated, achieving 82.6% higher average throughput than the current solution on identical infrastructure and hardware resources.
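The abstract describes evaluating general-purpose lossless compressors to balance compression effectiveness against compression time, with integrity verification of transferred data. The record does not list the algorithms the thesis actually tested, so the following is a minimal sketch using three codecs from Python's standard library (`zlib`, `bz2`, `lzma`) as stand-ins; the `benchmark` function and its output format are illustrative, not the thesis's tooling.

```python
import bz2
import hashlib
import lzma
import time
import zlib


def benchmark(payload: bytes) -> dict:
    """Compare lossless codecs on compression ratio and time.

    Veracity check: the round-tripped payload must hash to the same
    SHA-256 digest as the original, confirming lossless transfer.
    """
    codecs = {
        "zlib": (zlib.compress, zlib.decompress),
        "bz2": (bz2.compress, bz2.decompress),
        "lzma": (lzma.compress, lzma.decompress),
    }
    digest = hashlib.sha256(payload).hexdigest()
    results = {}
    for name, (compress, decompress) in codecs.items():
        start = time.perf_counter()
        packed = compress(payload)
        elapsed = time.perf_counter() - start
        # Integrity verification: decompressed data must match the original hash.
        assert hashlib.sha256(decompress(packed)).hexdigest() == digest
        results[name] = {
            "ratio": len(payload) / len(packed),
            "seconds": elapsed,
        }
    return results


if __name__ == "__main__":
    # Repetitive CSV-like payload standing in for logged vehicle signals.
    sample = b"vehicle_signal,timestamp,value\n" * 10_000
    for name, stats in benchmark(sample).items():
        print(f"{name}: ratio={stats['ratio']:.1f}x time={stats['seconds']:.4f}s")
```

Running this on representative recordings is one way to pick the codec whose ratio/time trade-off best matches the pipeline's throughput target.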
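The shared-memory pipeline variant passes data between stages in memory rather than staging it on disk, alleviating the identified hardware bottleneck. The thesis's implementation runs across Airflow tasks; as a single-process sketch of the underlying idea only, Python's `multiprocessing.shared_memory` can hand a compressed chunk from a producer stage to a consumer stage by segment name (the function and staging flow here are assumptions, not the thesis's code).

```python
import zlib
from multiprocessing import shared_memory


def stage_via_shared_memory(payload: bytes) -> bytes:
    """Hand a compressed chunk between pipeline stages through a
    shared-memory segment instead of a file on disk."""
    packed = zlib.compress(payload)
    # Producer stage: write the compressed chunk into a shared segment.
    shm = shared_memory.SharedMemory(create=True, size=len(packed))
    try:
        shm.buf[: len(packed)] = packed
        # Consumer stage: attach to the segment by name and decompress,
        # avoiding a disk round trip between stages.
        view = shared_memory.SharedMemory(name=shm.name)
        try:
            recovered = zlib.decompress(bytes(view.buf[: len(packed)]))
        finally:
            view.close()
    finally:
        shm.close()
        shm.unlink()
    return recovered
```

In a real deployment the producer and consumer would be separate processes sharing only the segment name and chunk length; the try/finally cleanup matters because shared segments outlive the process that created them until unlinked.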
dc.identifier.coursecode DATX05
dc.identifier.uri https://odr.chalmers.se/handle/20.500.12380/305885
dc.language.iso eng
dc.setspec.uppsok Technology
dc.subject Data pipelines
dc.subject big data
dc.subject latency & throughput
dc.subject data compression
dc.subject data governance
dc.subject data veracity
dc.subject compression
dc.subject Apache Airflow
dc.subject workflow orchestration
dc.title Efficient industrial big data pipeline for lossless transfer of vehicular data
dc.type.degree Master's Thesis
dc.type.uppsok H
local.programme Computer systems and networks (MPCSN), MSc

Download

Original bundle
Name: CSE 22-141 MH.pdf
Size: 3.03 MB
Format: Adobe Portable Document Format
