Efficient industrial big data pipeline for lossless transfer of vehicular data
Published
Author
Type
Master's Thesis
Abstract
In the age of big data and growing product complexity, it has become common to monitor and record many aspects of a product or system in order to extract well-founded intelligence and draw conclusions that continue to drive innovation. Automating and scaling data transfer and analysis processes in pipelines becomes essential to keep pace with the increasing data volumes and rates generated by such practices. Furthermore, industrial big data pipelines are subject to a number of requirements and challenges: data veracity, security, and governance, alongside overall pipeline performance and scalability. To address these challenges in a case study at Volvo Trucks, a general big data pipeline design is developed to serve as a framework for enabling efficient transfer of large data volumes from remote test sites to data centres. The synergetic effects of data compression and in-memory processing as techniques to improve pipeline performance, both in terms of throughput and end-to-end latency, are studied and evaluated. A pipeline based on the proposed design is implemented on Apache Airflow to explore latency and throughput performance, as well as other aspects of the design such as efficiency and scalability. Various general-purpose lossless data compression algorithms are evaluated and compared in order to balance compression effectiveness against compression time in the pipeline. Performance evaluation of the proposed pipeline with data compression shows an average throughput uplift of 38.8% over the solution currently in use, while also providing previously missing functionality such as integrity verification, logging, monitoring, traceability, and cataloguing of ingested data.
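The trade-off between compression effectiveness and compression time described above can be illustrated with a small benchmark. This is a minimal sketch, not the thesis's evaluation code: the algorithms compared (Python's standard-library zlib, bz2, and lzma) and the synthetic CSV-like payload are assumptions chosen for illustration only.

```python
import bz2
import lzma
import time
import zlib


def benchmark(name, compress, data):
    """Time one compressor and report its ratio and wall-clock cost."""
    start = time.perf_counter()
    packed = compress(data)
    elapsed = time.perf_counter() - start
    ratio = len(data) / len(packed)
    return name, ratio, elapsed


# Synthetic stand-in for a vehicular log file; the thesis's real test
# data is not reproduced here.
payload = b"timestamp,signal,value\n" + b"1652345,engine_rpm,1800\n" * 50_000

for name, fn in [("zlib", zlib.compress),
                 ("bz2", bz2.compress),
                 ("lzma", lzma.compress)]:
    name, ratio, elapsed = benchmark(name, fn, payload)
    print(f"{name:5s} ratio {ratio:7.1f}x in {elapsed:.3f}s")
```

On highly repetitive data like this, the slower algorithms typically achieve higher ratios, which is exactly the balance a pipeline must strike between CPU time spent compressing and bytes saved on the wire.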
Further, a variation of the pipeline design using shared memory processing to alleviate an identified hardware bottleneck is demonstrated, achieving 82.6% higher average throughput than the current solution using identical infrastructure and hardware resources.
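The shared-memory idea can be sketched as follows, assuming Python's `multiprocessing.shared_memory` as the mechanism; the thesis's actual implementation is not specified here, so this is illustrative only. One pipeline stage writes a data block into a named shared-memory segment so the next stage can attach by name and read it in place, rather than handing the block over via disk.

```python
from multiprocessing import shared_memory

# Illustrative payload standing in for a batch of vehicular data.
data = b"vehicular sensor batch payload"

# Producer stage: allocate a named segment and write the block once.
shm = shared_memory.SharedMemory(create=True, size=len(data))
try:
    shm.buf[:len(data)] = data

    # Consumer stage: attach to the same segment by name and read the
    # block in place, with no intermediate file or socket copy.
    view = shared_memory.SharedMemory(name=shm.name)
    received = bytes(view.buf[:len(data)])
    view.close()
finally:
    shm.close()
    shm.unlink()  # release the segment once both stages are done
```

Because both stages address the same physical memory, the handover cost is independent of any disk bottleneck, which is the effect the shared-memory pipeline variant exploits.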
Subject/keywords
Data pipelines, big data, latency & throughput, data compression, data governance, data veracity, Apache Airflow, workflow orchestration