An exploratory study of trade-offs in traditional vs. serverless stream processing

dc.contributor.author: TRUBKIN, NILS
dc.contributor.department: Chalmers tekniska högskola / Institutionen för data och informationsteknik (sv)
dc.contributor.department: Chalmers University of Technology / Department of Computer Science and Engineering (en)
dc.contributor.examiner: Gulisano, Vincenzo
dc.contributor.supervisor: Gulisano, Vincenzo
dc.contributor.supervisor: Petersen Moura Trancoso, Pedro
dc.date.accessioned: 2023-12-20T18:52:47Z
dc.date.available: 2023-12-20T18:52:47Z
dc.date.issued: 2023
dc.date.submitted: 2023
dc.description.abstract: A stream is the natural form of data that is continuously being generated. Stream processing is a way to draw valuable insights from a data stream. With the rapid increase in data volumes, primarily driven by IoT devices, stream processing has emerged as a practical approach to data processing. Some characteristics, such as data volume and distribution, can vary over time, which changes the computational requirements of streaming applications. Adjusting the frameworks in use to these changing requirements calls for elasticity. Because the traditional frameworks commonly used to run stream processing applications, known as Stream Processing Engines (SPEs), are not flexible enough, there is often some degree of over-provisioning: the allocated resources exceed what is required and remain unutilized. Alternative approaches, such as serverless, can ease scalability, but they have both pros and cons, which this work explores. This work implements an SPE-like API for a serverless framework and uses it to explore the differences between the traditional and serverless models of stream processing, using Apache Flink and Apache OpenWhisk. The study shows that OpenWhisk can be used to implement and execute streaming applications similar to those run by Flink. With the application logic implemented correctly, behavior similar to Flink’s can be achieved in OpenWhisk. The serverless nature of OpenWhisk, with its pay-per-use pricing model, allows for reduced costs when the framework is idle. Performance was evaluated using a stateless application (one that does not require application state to be preserved across executions) based on the map() API, and a stateful application (one that requires application state to be preserved across executions) based on the windowAll() API with a sum aggregate.
The findings indicate that, compared to Flink, OpenWhisk shows a latency increase of 300-400% in the most intensive test cases and throughput lowered to 50%. These results show that Flink offers greater capacity and performance than OpenWhisk for comparable workloads. Flink’s extensive resource base, including APIs and support resources, makes it easier to develop applications and positions it as a robust and well-established solution. OpenWhisk, on the other hand, is best suited for projects that do not require rich stream processing libraries or explicit state management. Its high-level scalability abstraction, which utilizes Kubernetes, simplifies scaling operations. Both frameworks can be configured to behave similarly, with benefits and trade-offs that depend on the individual use case.
dc.identifier.coursecode: DATX05
dc.identifier.uri: http://hdl.handle.net/20.500.12380/307464
dc.language.iso: eng
dc.setspec.uppsok: Technology
dc.subject: Stream
dc.subject: data
dc.subject: serverless
dc.subject: Flink
dc.subject: OpenWhisk
dc.subject: latency
dc.subject: throughput
dc.title: An exploratory study of trade-offs in traditional vs. serverless stream processing
dc.type.degree: Examensarbete för masterexamen (sv)
dc.type.degree: Master's Thesis (en)
dc.type.uppsok: H
local.programme: Computer systems and networks (MPCSN), MSc

Download

Original bundle

Name: CSE 23-131 NT.pdf
Size: 3.68 MB
Format: Adobe Portable Document Format
