Evaluating Different Approaches for Predicting Task Execution Time: A Case Study in a Distributed Production Environment
Type
Master's thesis
Abstract
This project evaluates several machine learning models for predicting processing times in a complex distributed system that analyzes large amounts of data. The system is owned and developed by Recorded Future (RF), a company that specializes in sifting through vast amounts of textual data from a variety of online sources in search of threat intelligence for its clients. The input is pre-processed in several stages before it arrives at the main analyzer process, where natural language processing and other tools are used to perform a threat analysis of the text. Our primary goal is to predict the time needed to analyze one of these texts. The ability to predict the time required to process a given set of inputs can be used to design scheduling algorithms in cloud computing environments [1]. It is of particular interest to Recorded Future, which uses the size of its message queues, which can grow large when processing takes too long, to decide when to start additional servers. Because servers take some time to come online, starting them proactively based on an estimated queue size, which can be inferred from the required processing time and the available computing resources, can alleviate bottlenecks and other performance issues. For the purposes of workload estimation, RF considers a prediction error within one order of magnitude of the actual time to be acceptable.
To accomplish this goal, we developed, trained, and tested several prediction models based on neural networks. Each network considers a different set of input features that may affect the processing time: information extracted from the input data, system performance at the time of analysis, total server workload in terms of input processed in parallel, and past processing times. For evaluation, each model's error is compared to the prediction error of two naive algorithms that predict the mean and the median of the task execution times in each data set.
Our results show that all but one of the prediction models achieve a lower error than the naive approaches, and all models stay within the maximum error specified by RF. There is a trade-off between the achieved accuracy and how feasible it would be to implement and use a model in the real system. The model that considers system performance achieves half the error of the model based purely on input information. Considering the total workload of each server reduces the error by a negligible amount compared to the first model, and using previous task execution times gives fluctuating results, indicating that it is not a suitable model for prediction in this system.
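The naive baselines and the order-of-magnitude criterion described in the abstract can be illustrated with a minimal sketch. The abstract does not state which error metric the thesis uses, so the absolute difference in log10 of the times is an assumption made here for illustration, as are the function names and the sample values.

```python
import math
from statistics import mean, median


def naive_predictors(train_times):
    """Return the two constant baselines: mean and median of the training times."""
    return {"mean": mean(train_times), "median": median(train_times)}


def log10_error(predicted, actual):
    """Absolute error in orders of magnitude (assumed metric, not from the thesis)."""
    return abs(math.log10(predicted) - math.log10(actual))


def evaluate_constant(prediction, test_times):
    """Score a constant prediction against observed execution times."""
    errors = [log10_error(prediction, t) for t in test_times]
    return {
        "mean_log10_error": mean(errors),
        # Fraction of tasks predicted within one order of magnitude.
        "within_one_order": sum(e <= 1.0 for e in errors) / len(errors),
    }


# Made-up execution times in seconds, for illustration only.
train = [0.2, 0.5, 0.4, 3.0, 0.3]
test = [0.25, 0.6, 40.0]

baselines = naive_predictors(train)
results = {name: evaluate_constant(value, test) for name, value in baselines.items()}
```

A trained model would be scored with the same function, one prediction per task instead of a single constant, and compared against these two baseline scores.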
Subject/keywords
distributed system, processing time prediction, EC2, task execution time, neural network, time series prediction