Distributed Sketching Pipelines for Data Mining and Analytics

dc.contributor.author: Mentzer, Jonatan
dc.contributor.author: Bruhn, Gustav
dc.contributor.department: Chalmers tekniska högskola / Institutionen för data och informationsteknik [sv]
dc.contributor.department: Chalmers University of Technology / Department of Computer Science and Engineering [en]
dc.contributor.examiner: Papatriantafilou, Marina
dc.contributor.supervisor: Hilgendorf, Martin
dc.date.accessioned: 2026-03-05T10:53:54Z
dc.date.issued: 2026
dc.date.submitted:
dc.description.abstract: The rapid growth of data generated by modern industrial systems at the edge poses significant challenges for efficient data management. Processing the data directly on the edge device offers benefits, including improved throughput, scalability, and reduced bandwidth usage. Another effective approach is summarizing the data using sketches. Sketches are stochastic data structures that can greatly compress data while preserving essential statistical properties. This thesis investigates the conditions under which it is beneficial to offload sketching computations to edge devices. The study evaluates the throughput and latency of two systems across multiple configurations designed to reflect real-world scenarios, comparing a federated architecture with a centralized one. The results indicate that, across all evaluated scenarios, executing computations at the edge increases the maximum throughput, especially as the number of edge devices grows. The thesis also explores the trade-offs between scalability (throughput as the number of vehicles increases) and data freshness (latency resulting from the micro-batch size, i.e., the frequency of data summarization).
dc.identifier.coursecode: DATX05
dc.identifier.uri: http://hdl.handle.net/20.500.12380/311004
dc.language.iso: eng
dc.setspec.uppsok: Technology
dc.subject: Computer Science
dc.subject: Sketches
dc.subject: Distributed Systems
dc.subject: Big Data
dc.subject: Federated Computation
dc.subject: Federated Sketching
dc.subject: Federated Analytics
dc.title: Distributed Sketching Pipelines for Data Mining and Analytics
dc.type.degree: Examensarbete för masterexamen [sv]
dc.type.degree: Master's Thesis [en]
dc.type.uppsok: H
local.programme: Computer science – algorithms, languages and logic (MPALG), MSc

Download

Original bundle

Name: CSE 26-10 JM GB.pdf
Size: 2.5 MB
Format: Adobe Portable Document Format
