Distributed Stream Analysis with Java 8 Stream API

Type
Master Thesis
Program
Computer systems and networks (MPCSN), MSc
Published
2016
Authors
Mwambazi, Brian
Philogene, Andy Moise
Abstract
The increasing demand for fast and scalable data analysis has impacted the technology industry in ways that are defining the future of computer systems. The boom in machine-to-machine communication, cloud-based solutions, and similar technologies has led to large amounts of data (popularly termed Big Data) being generated and processed by supporting applications in real time. These changes in the field have paved the way for stronger data processing paradigms and a proliferation of data stream processing systems. These systems usually come with an execution environment and programming models that developers have to adopt. Migrating applications from one system to another is in general tedious and comes with a learning curve. The Stream package introduced in Java 8 brings a new abstraction over a sequence of data. Such a sequence can represent any flow of data and is therefore interesting for data streaming. Together with new features such as lambda expressions and functional interfaces, the Stream package was introduced to increase the expressiveness of Java and to ease the writing of complex operations in a declarative manner when dealing with a sequence of data. These language features are attractive for data stream processing. However, the Java 8 Stream API is currently geared towards batch processing of bounded datasets such as collections, rather than data streaming, which involves unbounded datasets. For this reason, it does not provide an easy way for programmers to write applications for typical data stream use cases or stateful operations such as window-based aggregates. Furthermore, the scalability mechanism currently provided in this API does not fit a typical data streaming environment. In this thesis, we explore the expressiveness of the Java 8 Stream API together with the data streaming paradigm.
We design and implement JxStream, an API around the existing Java 8 Stream that provides more stream analysis capabilities to the developer and eases scalable processing in a distributed environment. We evaluate JxStream with different test cases aimed at investigating specific performance attributes. The results show that JxStream would allow programmers to solve a greater range of data streaming problems without sacrificing performance. The test results also show that our implementation performs fairly well when comparing similar features with a stream processing engine. We believe this tool can be of good use in writing stream analysis applications in Java.
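To illustrate the batch orientation the abstract describes, the following is a minimal sketch of a window-based aggregate written with the standard Java 8 Stream API only. It is not JxStream code and does not reflect the JxStream API; the `Event` type, the `sumPerWindow` method, and the sample values are hypothetical names chosen for illustration. The sketch computes a sum per tumbling time window over a bounded list.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class WindowSketch {
    // Hypothetical event type: a timestamp in milliseconds and a numeric reading.
    static final class Event {
        final long timestampMs;
        final double value;

        Event(long timestampMs, double value) {
            this.timestampMs = timestampMs;
            this.value = value;
        }
    }

    // Sums values per tumbling window of windowMs milliseconds,
    // keyed by the start time of each window.
    // This works only because the input is bounded: collect() is a
    // terminal operation and returns nothing until all input is consumed.
    static Map<Long, Double> sumPerWindow(List<Event> events, long windowMs) {
        return events.stream()
                .collect(Collectors.groupingBy(
                        e -> (e.timestampMs / windowMs) * windowMs, // window start
                        TreeMap::new,                               // keep windows ordered
                        Collectors.summingDouble(e -> e.value)));
    }

    public static void main(String[] args) {
        List<Event> events = Arrays.asList(
                new Event(100, 1.0), new Event(900, 2.0),
                new Event(1200, 4.0), new Event(2500, 8.0));
        // Prints {0=3.0, 1000=4.0, 2000=8.0}
        System.out.println(sumPerWindow(events, 1000));
    }
}
```

On an unbounded source, the same pipeline would never produce a result, because the collector emits its map only once the stream ends. This is one way to see why stateful, window-based operations on unbounded data need support beyond what the plain Stream API offers.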
Subject/keywords
Computer and Information Science