Balancing Performance and Memory
Published
Author
Type
Master's Thesis
Abstract
Stream processing has become a cornerstone of real-time data analytics, particularly in edge-to-cloud computing environments where timely insights are crucial. Stream aggregates, as fundamental stateful operators, maintain windowed state to compute summaries over continuous data streams. However, in resource-constrained edge deployments, managing memory efficiently while maintaining high performance remains a significant challenge. Recent work has shown that compressing infrequently accessed window instances can reduce memory usage, but existing approaches rely on fixed, manually configured thresholds, which are difficult to tune without prior knowledge of data characteristics such as reporting frequency and access patterns. Moreover, these methods often overlook the impact of different compression libraries on system behavior. To address these limitations, this thesis presents a study on adaptive memory compression for stream aggregates. We first evaluate the performance trade-offs of three widely used compression libraries, namely Snappy, Zstandard (Zstd), and JZlib, within a stream processing context. Our results show that each library offers a distinct balance between compression efficiency and processing overhead: Snappy provides superior throughput and latency at the cost of lower memory savings, while Zstd achieves better compression at the expense of higher CPU cost. Building on these findings, we propose a dynamic, self-tuning mechanism that automatically adjusts the compression threshold D based on runtime feedback. Instead of requiring analysts to specify a fixed D, our approach allows them to define a target range for a performance metric, such as the non-compressed/compressed (n/c) ratio, and the system adapts D online using simple adjustment rules. This enables robust and predictable compression behavior under varying workloads, without requiring expert knowledge. We implement and evaluate our approach using the Liebre stream processing engine and the Linear Road benchmark.
Experimental results demonstrate that our adaptive mechanism stabilizes the n/c ratio within user-defined bounds, with negligible performance overhead compared to an optimally tuned fixed-D configuration. The dynamic strategy proves resilient to workload fluctuations and configuration resets, making it suitable for real-world, unpredictable environments.
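The abstract does not spell out the adjustment rules, so the following is only an illustrative sketch of how a threshold D might be steered toward a target n/c range. Everything in it is an assumption rather than the thesis's method: the class name `AdaptiveThreshold`, the additive step, the bounds `d_min` and `d_max`, and in particular the direction of the rule (it assumes that a smaller D causes window instances to be compressed sooner, which lowers the n/c ratio).

```python
from dataclasses import dataclass


@dataclass
class AdaptiveThreshold:
    """Hypothetical controller that nudges D to keep the n/c ratio in [low, high]."""
    d: float                # current compression threshold D
    low: float              # lower bound of the target n/c range
    high: float             # upper bound of the target n/c range
    step: float = 1.0       # additive adjustment per update (assumption)
    d_min: float = 1.0      # smallest D the controller may choose (assumption)
    d_max: float = 1000.0   # largest D the controller may choose (assumption)

    def update(self, non_compressed: int, compressed: int) -> float:
        """Adjust D from the latest counts of window instances and return it."""
        # With no compressed instances the ratio is treated as infinitely high.
        ratio = float("inf") if compressed == 0 else non_compressed / compressed
        if ratio > self.high:
            # Too few instances are compressed: lower D (assumed to compress sooner).
            self.d = max(self.d_min, self.d - self.step)
        elif ratio < self.low:
            # Too many instances are compressed: raise D (assumed to compress later).
            self.d = min(self.d_max, self.d + self.step)
        # Inside the target range, D is left unchanged.
        return self.d
```

In this sketch an analyst would supply only the target range, for example `AdaptiveThreshold(d=10, low=1.0, high=3.0)`, and call `update` periodically with the observed counts; the initial value of D then matters less because it is corrected online.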
Subject/keywords
stream processing, aggregate, memory compression
