SSPOC: Smart Stream Processing Operator Classification
Published
Author
Type
Master's thesis
Programme
Model builders
Journal title
ISSN
Volume title
Publisher
Abstract
Stream processing is a rapidly growing field. Handling a stream processing
query efficiently often requires knowing the type of each operator, since knowing
an operator's behaviour allows for tailored solutions. Today, each framework identifies
operators in its own way, often using semantics and compile-time information.
A more general method of classification could simplify the creation of such
frameworks. Such a method requires a shift from semantic information, which differs
between frameworks, to more general information. We take a first step in this
direction by using metrics available at runtime to classify a basic set of operators.
In this thesis, we present a machine learning model for the classification of stream
processing operators. The model is a densely connected multi-layer feed-forward neural
network. The operators that are classified are limited to a subset of the standard
set of operators available in the stream processing framework Apache Flink. The
training, validation and test datasets are also a contribution of this thesis. These
were collected from public queries using our collection method. We also propose
a set of features for our classifier that aid in differentiating operators; we suggest
that other machine-learning-based solutions can use them. The model is optimized
for prediction accuracy while training on data collected from 9 different queries. It
reaches a prediction accuracy of 97.51% on the validation dataset and 99.796% on
the test dataset.
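As a rough illustration of the kind of model described above, the following is a minimal sketch of the forward pass of a densely connected feed-forward classifier. The operator labels, the feature names, the layer sizes and the sample values are hypothetical placeholders, since the abstract does not list them, and the weights are random rather than trained, so the predicted label carries no meaning.

```python
import math
import random

# Hypothetical labels and runtime features; the abstract does not name them.
OPERATORS = ["map", "filter", "flatMap"]
FEATURES = ["records_in_per_s", "records_out_per_s", "selectivity"]


def init_layer(n_in, n_out, rng):
    """Return (weights, biases) for one dense layer with small random weights."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, layer):
    """Affine transform: every output unit is connected to every input."""
    weights, biases = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]


def relu(v):
    """Element-wise rectified linear activation."""
    return [max(0.0, a) for a in v]


def softmax(v):
    """Turn raw scores into probabilities (shifted by the max for stability)."""
    m = max(v)
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]


def predict(x, layers):
    """Forward pass: ReLU on the hidden layers, softmax on the output layer."""
    for layer in layers[:-1]:
        x = relu(dense(x, layer))
    return softmax(dense(x, layers[-1]))


rng = random.Random(0)
sizes = [len(FEATURES), 8, 8, len(OPERATORS)]  # assumed architecture, two hidden layers
layers = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]

sample = [1.0, 0.4, 0.4]  # made-up, already normalised feature vector
probs = predict(sample, layers)
label = OPERATORS[probs.index(max(probs))]
```

In a real setting the weights would be fitted on labelled runtime measurements, and accuracy would be reported separately on held-out validation and test data.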
Description
Subject/keywords
Computer, science, computer science, engineering, project, thesis, machine learning, neural networks, stream processing