Efficient industrial big data pipeline for lossless transfer of vehicular data

dc.contributor.author HILGENDORF, MARTIN
dc.contributor.department Chalmers University of Technology / Department of Computer Science and Engineering
dc.contributor.examiner Gulisano, Vincenzo
dc.contributor.supervisor Papatriantafilou, Marina
dc.date.accessioned 2022-12-05T10:48:33Z
dc.date.available 2022-12-05T10:48:33Z
dc.date.issued 2022
dc.date.submitted 2022
dc.description.abstract In the age of big data and growing product complexity, it has become common to monitor and record many aspects of a product or system in order to extract well-founded intelligence and draw conclusions that continue to drive innovation. Automating and scaling data transfer and analysis processes in pipelines is essential to keep pace with the increasing data volumes and rates such practices generate. Industrial big data pipelines are further subject to a number of requirements and challenges: data veracity, security, and governance, alongside overall pipeline performance and scalability. To address these challenges in a case study at Volvo Trucks, a general big data pipeline design is developed to serve as a framework for the efficient transfer of large data volumes from remote test sites to data centres. The synergetic effects of data compression and in-memory processing as techniques to improve pipeline performance, in terms of both throughput and end-to-end latency, are studied and evaluated. A pipeline based on the proposed design is implemented on Apache Airflow to explore its latency and throughput as well as other aspects such as efficiency and scalability. Various general-purpose lossless data compression algorithms are evaluated and compared in order to balance compression effectiveness against compression time in the pipeline. A performance evaluation of the proposed pipeline with data compression achieves an average throughput uplift of 38.8% over the solution currently in use, while also providing previously missing functionality such as integrity verification, logging, monitoring and traceability, and cataloguing of ingested data.
Further, a variation of the pipeline design that uses shared-memory processing to alleviate an identified hardware bottleneck is demonstrated, achieving 82.6% higher average throughput than the current solution on identical infrastructure and hardware resources.
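The abstract describes evaluating general-purpose lossless compressors to balance compression effectiveness against compression time, with integrity verification of transferred data. The record does not list the algorithms the thesis actually tested, so the following is a minimal sketch using three codecs from Python's standard library (`zlib`, `bz2`, `lzma`) as stand-ins; the `benchmark` function and its output format are illustrative, not the thesis's tooling.

```python
import bz2
import hashlib
import lzma
import time
import zlib


def benchmark(payload: bytes) -> dict:
    """Compare lossless codecs on compression ratio and time.

    Veracity check: the round-tripped payload must hash to the same
    SHA-256 digest as the original, confirming lossless transfer.
    """
    codecs = {
        "zlib": (zlib.compress, zlib.decompress),
        "bz2": (bz2.compress, bz2.decompress),
        "lzma": (lzma.compress, lzma.decompress),
    }
    digest = hashlib.sha256(payload).hexdigest()
    results = {}
    for name, (compress, decompress) in codecs.items():
        start = time.perf_counter()
        packed = compress(payload)
        elapsed = time.perf_counter() - start
        # Integrity verification: decompressed data must match the original hash.
        assert hashlib.sha256(decompress(packed)).hexdigest() == digest
        results[name] = {
            "ratio": len(payload) / len(packed),
            "seconds": elapsed,
        }
    return results


if __name__ == "__main__":
    # Repetitive CSV-like payload standing in for logged vehicle signals.
    sample = b"vehicle_signal,timestamp,value\n" * 10_000
    for name, stats in benchmark(sample).items():
        print(f"{name}: ratio={stats['ratio']:.1f}x time={stats['seconds']:.4f}s")
```

Running this on representative recordings is one way to pick the codec whose ratio/time trade-off best matches the pipeline's throughput target.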
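The shared-memory pipeline variant passes data between stages in memory rather than staging it on disk, alleviating the identified hardware bottleneck. The thesis's implementation runs across Airflow tasks; as a single-process sketch of the underlying idea only, Python's `multiprocessing.shared_memory` can hand a compressed chunk from a producer stage to a consumer stage by segment name (the function and staging flow here are assumptions, not the thesis's code).

```python
import zlib
from multiprocessing import shared_memory


def stage_via_shared_memory(payload: bytes) -> bytes:
    """Hand a compressed chunk between pipeline stages through a
    shared-memory segment instead of a file on disk."""
    packed = zlib.compress(payload)
    # Producer stage: write the compressed chunk into a shared segment.
    shm = shared_memory.SharedMemory(create=True, size=len(packed))
    try:
        shm.buf[: len(packed)] = packed
        # Consumer stage: attach to the segment by name and decompress,
        # avoiding a disk round trip between stages.
        view = shared_memory.SharedMemory(name=shm.name)
        try:
            recovered = zlib.decompress(bytes(view.buf[: len(packed)]))
        finally:
            view.close()
    finally:
        shm.close()
        shm.unlink()
    return recovered
```

In a real deployment the producer and consumer would be separate processes sharing only the segment name and chunk length; the try/finally cleanup matters because shared segments outlive the process that created them until unlinked.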
dc.identifier.coursecode DATX05
dc.identifier.uri https://odr.chalmers.se/handle/20.500.12380/305885
dc.language.iso eng
dc.setspec.uppsok Technology
dc.subject Data pipelines
dc.subject big data
dc.subject latency & throughput
dc.subject data compression
dc.subject data governance
dc.subject data veracity
dc.subject compression
dc.subject Apache Airflow
dc.subject workflow orchestration
dc.title Efficient industrial big data pipeline for lossless transfer of vehicular data
dc.type.degree Master's Thesis
dc.type.uppsok H
local.programme Computer systems and networks (MPCSN), MSc

Download

Original bundle
Name: CSE 22-141 MH.pdf
Size: 3.03 MB
Format: Adobe Portable Document Format
