Auto-scaling cloud infrastructure with Reinforcement Learning: A comparison between multiple RL algorithms to auto-scale resources in cloud infrastructure
Type
Master's thesis
Abstract
As cloud services see increasing personal and professional use, more companies offer them and competition to deliver the best product intensifies. Providers want not only to reduce cost but also to improve stability, so that they can better handle sudden, unexpected problems that degrade the performance and responsiveness of their cloud service. The purpose of this project was therefore to propose and evaluate solutions that auto-scale cloud infrastructure based on its resource usage. The report also covers algorithms that did not produce usable results or could not handle the complexity of the problem. We developed three reinforcement learning algorithms in Python, using the TensorFlow framework to train neural networks, and compared their performance in terms of both cost and stability. The algorithms were implemented to work on virtual machines with Apcera installed and were trained on data collected through Apcera's API. Training was done in a simulation of the cloud cluster. The results show a noticeable difference between the three algorithms. While all three work to some degree, one stands out and performs significantly better than the other two in terms of cost and cluster stability. In conclusion, we obtained an algorithm that can accurately predict how to scale the cloud cluster based on the time of day and the current resource usage.
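To make the setup described in the abstract more concrete, the following is a minimal sketch of a reinforcement learning auto-scaler trained in a simulation. It is not one of the three algorithms from the thesis: it assumes plain tabular Q-learning instead of the TensorFlow neural networks the authors used, and every name and number in it (`load_at`, `CAPACITY_PER_INSTANCE`, the reward weights, the daily load curve) is hypothetical. The state is the hour of day plus the instance count, which together determine utilization in this toy simulator, and the reward trades instance cost against an overload penalty as a stand-in for stability.

```python
import math
import random

ACTIONS = (-1, 0, 1)          # remove one instance, hold, add one instance
MAX_INSTANCES = 10            # hypothetical cluster size limit
CAPACITY_PER_INSTANCE = 10.0  # hypothetical load units one instance can serve


def load_at(hour):
    """Hypothetical daily load curve peaking at noon (not thesis data)."""
    return 20.0 + 50.0 * math.sin(math.pi * hour / 24.0) ** 2


def reward(instances, load):
    """Negative cost: one unit per instance plus a penalty for unserved load."""
    cost = instances * 1.0
    overload = max(0.0, load - instances * CAPACITY_PER_INSTANCE)
    return -(cost + 5.0 * overload / CAPACITY_PER_INSTANCE)


def step(hour, instances, action):
    """Apply a scaling action, advance one hour, and return the new state and reward."""
    instances = min(MAX_INSTANCES, max(1, instances + action))
    hour = (hour + 1) % 24
    return hour, instances, reward(instances, load_at(hour))


def train(episodes=200, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning over simulated days; returns the Q-table."""
    rng = random.Random(seed)
    q = {}  # maps (hour, instances) -> list with one value per action
    for _ in range(episodes):
        hour, instances = 0, 1
        for _ in range(24):
            values = q.setdefault((hour, instances), [0.0] * len(ACTIONS))
            # Epsilon-greedy: explore occasionally, otherwise take the best known action.
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                a = values.index(max(values))
            hour, instances, r = step(hour, instances, ACTIONS[a])
            next_values = q.setdefault((hour, instances), [0.0] * len(ACTIONS))
            # Standard Q-learning update toward the bootstrapped target.
            values[a] += alpha * (r + gamma * max(next_values) - values[a])
    return q


def greedy_action(q, hour, instances):
    """Scaling decision for a state; unseen states fall back to the first action on ties."""
    values = q.get((hour, instances), [0.0] * len(ACTIONS))
    return ACTIONS[values.index(max(values))]
```

After `q = train()`, calling `greedy_action(q, hour, instances)` returns -1, 0, or 1 for the given hour and instance count. A table like this only works for small, discrete state spaces; with continuous usage metrics a function approximator such as a neural network would be the more natural choice.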
Keywords
Computer and Information Science