Evaluating Different Approaches for Predicting Task Execution Time: A Case Study in a Distributed Production Environment

Carlsson, Jesper; Forsström, Erik

dc.contributor.author	Carlsson, Jesper
dc.contributor.author	Forsström, Erik
dc.contributor.department	Chalmers tekniska högskola / Institutionen för data och informationsteknik	sv
dc.contributor.examiner	Dubhashi, Devdatt
dc.contributor.supervisor	Gulisano, Vincenzo
dc.date.accessioned	2019-10-03T13:36:46Z
dc.date.available	2019-10-03T13:36:46Z
dc.date.issued	2019	sv
dc.date.submitted	2019
dc.description.abstract	This project evaluates several machine learning models for predicting processing times in a complex distributed system that analyzes large amounts of data. Specifically, the system is owned and developed by the company Recorded Future (RF), which specializes in sifting through vast amounts of textual data acquired from a variety of online sources in search of threat intelligence to provide to its clients. The input is pre-processed in several stages before it arrives at the main analyser process, where natural language processing and other tools are used to perform a threat analysis of the text. Our primary goal is to predict the time needed to analyse one of these texts. The ability to predict the time required to process a given set of inputs can be used to design scheduling algorithms in cloud computing environments [1]. It is of particular interest to Recorded Future because the company uses the size of its message queues, which can grow large when processing takes too long, to decide when to start additional servers. As servers take some time to come online, starting them proactively based on estimated queue size, which can be inferred from the required processing time and the available computing resources, can alleviate bottlenecks and other performance issues. RF has defined a maximum error of one order of magnitude relative to the actual time as acceptable for the purposes of workload estimation. To accomplish this goal, we have developed, trained, and tested several prediction models based on neural networks. Each network considers a different set of input features that may affect the processing time: information extracted from the input data, system performance at the time of analysis, total server workload in terms of input processed in parallel, and past processing times. 
For evaluation, each model's prediction error is compared to that of two naive algorithms that predict the mean and median of the task execution times in each data set. Our results show that all but one of the prediction models achieve a lower error than the naive approaches, and all models stay within the maximum error specified by RF. There is a trade-off between how feasible it would be to implement and use a model in the real system and the accuracy achieved. The model that considers system performance achieves an error that is half that of the one based purely on input information. Considering the total workload of each server reduces the error by a negligible amount compared to the first model, and using previous task execution times provides fluctuating results, indicating that it is not a suitable model for prediction in this system.	sv
dc.identifier.coursecode	DATX05	sv
dc.identifier.uri	https://hdl.handle.net/20.500.12380/300389
dc.language.iso	eng	sv
dc.setspec.uppsok	Technology
dc.subject	distributed system	sv
dc.subject	processing time prediction	sv
dc.subject	EC2	sv
dc.subject	task execution time	sv
dc.subject	neural network	sv
dc.subject	time series prediction	sv
dc.title	Evaluating Different Approaches for Predicting Task Execution Time A Case Study in a Distributed Production Environment	sv
dc.type.degree	Master's thesis	sv
dc.type.uppsok	H

Original bundle

Name: CSE 19-87 Forsström Carlsson.pdf
Size: 31.14 MB
Format: Adobe Portable Document Format

Collections

Master's theses