Distributed Sketching Pipelines for Data Mining and Analytics
Published
Author
Type
Master's Thesis
Model builders
Journal title
ISSN
Volume title
Publisher
Abstract
The rapid growth of data generated by modern industrial systems at the edge poses
significant challenges for efficient data management. Processing the data directly
on the edge device offers benefits, including improved throughput, scalability, and
reduced bandwidth usage. Another effective approach is summarizing the data using
sketches. Sketches are stochastic data structures that can greatly compress
data while preserving essential statistical properties. This thesis investigates the
conditions under which it is beneficial to offload sketching computations to edge
devices. The study evaluates the throughput and latency of two systems, comparing a
federated architecture with a centralized one, across multiple configurations
designed to reflect real-world scenarios. The results indicate that, across
all evaluated scenarios, executing computations at the edge increases the maximum
throughput, especially as the number of edge devices grows. The thesis explores
the trade-off between scalability (in the form of throughput as the number of
vehicles increases) and data freshness (in the form of latency due to the micro-batch
size, i.e. the frequency of data summarization).
Description
Subject/keywords
Computer Science, Sketches, Distributed Systems, Big Data, Federated Computation, Federated Sketching, Federated Analytics
