Auto-scaling cloud infrastructure with Reinforcement Learning: A comparison between multiple RL algorithms to auto-scale resources in cloud infrastructure

Type

Master's thesis

Abstract

With cloud services increasingly used for both personal and professional purposes, competition among providers intensifies as more companies offer this type of service. Providers want not only to reduce cost but also to improve stability, so that they can better handle sudden, unexpected problems that degrade the performance and responsiveness of their cloud service. The purpose of this project was therefore to propose and evaluate different solutions that auto-scale the cloud infrastructure based on its resource usage. The report also covers algorithms that did not produce usable results or could not handle the complexity of the problem. We developed three reinforcement learning algorithms in Python, using the TensorFlow framework to train neural networks, and compared their performance in terms of both cost and stability. The algorithms were implemented to run on virtual machines with Apcera installed and were trained on data collected through Apcera's API. Training was done in a simulation of the cloud cluster. The results show a noticeable difference between the three algorithms. While all three work to some degree, one stands out and performs significantly better than the other two in terms of cost and cluster stability. In conclusion, we have an algorithm that can accurately predict how to scale the cloud cluster based on the time of day and the current resource usage.

Keywords

Computer and Information Science
