Distributed Stream Analysis with Java 8 Stream API

Master's thesis

Please use this identifier to cite or link to this item: https://hdl.handle.net/20.500.12380/234986
Download file: 234986.pdf (full text, 1.13 MB, Adobe PDF)
Type: Master's thesis
Title: Distributed Stream Analysis with Java 8 Stream API
Authors: Mwambazi, Brian; Philogene, Andy Moise
Abstract: The increasing demand for fast and scalable data analysis has affected the technology industry in ways that are defining the future of computer systems. The boom in machine-to-machine communication, cloud-based solutions and similar technologies has led to large amounts of data (popularly termed Big Data) being generated and processed by supporting applications in real time. These changes in the field have paved the way for stronger data processing paradigms and a proliferation of data stream processing systems. Such systems usually come with an execution environment and a programming model that developers have to adopt, so migrating an application from one system to another is generally tedious and involves a learning curve. The Stream package introduced in Java 8 brings a new abstraction over a sequence of data. Such a sequence can represent any flow of data, which makes it interesting for data streaming. Together with new language features such as lambda expressions and functional interfaces, the Stream package increases the expressiveness of Java and makes it easier to write complex operations on a sequence of data in a declarative manner. These language features are attractive for data stream processing. However, the Java 8 Stream API is currently oriented more towards batch processing of bounded datasets, such as collections, than towards data streaming, which involves unbounded datasets. As a result, it offers no easy way for programmers to write applications for typical data streaming use cases or for stateful operations such as window-based aggregates. Furthermore, the scalability mechanism the API currently provides does not fit a typical data streaming environment. In this thesis, we explore the expressiveness of the Java 8 Stream API in combination with the data streaming paradigm.
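To make the contrast between bounded and unbounded data concrete, the following minimal sketch in plain Java 8 is an illustration and is not taken from the thesis. `countEvenOdd` is a declarative pipeline over a bounded collection built from lambdas, while `tumblingSums` shows that a window-based aggregate over an unbounded `Stream.iterate` source has to be hand-rolled with mutable state inside a terminal `forEachOrdered`, since `java.util.stream` has no window operator. The class and method names, the window size, and the `limit` used to stop the infinite source are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Illustrative sketch (hypothetical names), not code from the thesis.
public class StreamWindowSketch {

    // Bounded case: a declarative pipeline over a collection, built from lambdas.
    static Map<Boolean, Long> countEvenOdd(List<Integer> readings) {
        return readings.stream()
                .filter(r -> r >= 0) // drop invalid (negative) readings
                .collect(Collectors.partitioningBy(r -> r % 2 == 0, Collectors.counting()));
    }

    // Unbounded case: a tumbling-window sum that must be hand-rolled,
    // because java.util.stream offers no window operator.
    static List<Integer> tumblingSums(Stream<Integer> source, int windowSize, int maxWindows) {
        List<Integer> sums = new ArrayList<>();
        int[] acc = {0, 0}; // {running sum, elements seen in current window}
        source.limit((long) windowSize * maxWindows) // cut the infinite stream for the demo
                .forEachOrdered(x -> {
                    acc[0] += x;
                    if (++acc[1] == windowSize) { // window full: emit its sum and reset
                        sums.add(acc[0]);
                        acc[0] = 0;
                        acc[1] = 0;
                    }
                });
        return sums;
    }

    public static void main(String[] args) {
        System.out.println(countEvenOdd(Arrays.asList(1, 2, 3, 4, -5)));
        // prints [6, 15, 24, 33]
        System.out.println(tumblingSums(Stream.iterate(1, n -> n + 1), 3, 4));
    }
}
```

Because the window state lives outside the pipeline and the sums only exist once the terminal operation returns, this style does not compose with further stream operations, which is one way to read the limitation described above.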
We design and implement JxStream, an API built around the existing Java 8 Stream API that gives developers more stream analysis capabilities and eases scalable processing in a distributed environment. We evaluate JxStream with different test cases, each aimed at investigating specific performance attributes. The results show that JxStream would allow programmers to solve a wider range of data streaming problems without sacrificing performance. They also show that our implementation performs fairly well when compared with a stream processing engine on similar features. We believe this tool can be useful for writing stream analysis applications in Java.
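The abstract does not describe JxStream's actual interface, so the following is only a hypothetical sketch of the general idea of an API layered around `java.util.stream.Stream`: a wrapper (here called `WindowedPipeline`, an invented name) that adds a lazy tumbling-window operator by wrapping the source's `Spliterator`. It assumes sequential, ordered processing, drops a trailing partial window, and makes no attempt at the distributed execution JxStream targets.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical wrapper for illustration only; this is NOT the JxStream API.
public final class WindowedPipeline<T> {
    private final Stream<T> source;

    private WindowedPipeline(Stream<T> source) {
        this.source = source;
    }

    public static <T> WindowedPipeline<T> of(Stream<T> source) {
        return new WindowedPipeline<>(source);
    }

    // Lazily groups consecutive elements into non-overlapping windows of `size`.
    // Elements are pulled from the source only when the next window is requested,
    // so this also works on unbounded sources.
    public Stream<List<T>> tumblingWindow(int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        Spliterator<T> upstream = source.spliterator();
        Spliterator<List<T>> windows =
                new Spliterators.AbstractSpliterator<List<T>>(Long.MAX_VALUE, Spliterator.ORDERED) {
                    @Override
                    public boolean tryAdvance(Consumer<? super List<T>> action) {
                        List<T> window = new ArrayList<>(size);
                        // Pull until the window is full or the source is exhausted.
                        while (window.size() < size && upstream.tryAdvance(window::add)) {
                            // body intentionally empty
                        }
                        if (window.size() < size) {
                            return false; // trailing partial window is dropped
                        }
                        action.accept(window);
                        return true;
                    }
                };
        return StreamSupport.stream(windows, false);
    }

    // Applies an aggregate function to each tumbling window.
    public <R> Stream<R> aggregate(int size, Function<List<T>, R> fn) {
        return tumblingWindow(size).map(fn);
    }

    public static void main(String[] args) {
        List<Integer> sums = WindowedPipeline.of(Stream.iterate(1, n -> n + 1))
                .aggregate(3, w -> w.stream().mapToInt(Integer::intValue).sum())
                .limit(4)
                .collect(Collectors.toList());
        System.out.println(sums); // prints [6, 15, 24, 33]
    }
}
```

Because windows are pulled on demand, the result remains an ordinary `Stream` and can be chained with standard operations such as `limit` and `collect`, even over an infinite source.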
Keywords: Computer and Information Science
Issue Date: 2016
Publisher: Chalmers University of Technology / Department of Computer Science and Engineering (Chalmers)
URI: https://hdl.handle.net/20.500.12380/234986
Collection: Master's Theses



Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.