Bayesian Reinforcement Learning on Semi–Markov Decision Processes

Hilding Södergren, Marcus; Vrede, Samuel

Bayesian Reinforcement Learning on Semi–Markov Decision Processes

Ladda ner

Master_Thesis_Hilding_Södergren_Marcus_och_Vrede_Samuel.pdf (1.1 MB)

Typ

Examensarbete för masterexamen

Program

Complex adaptive systems (MPCAS), MSc

Publicerad

2020

Författare

Hilding Södergren, Marcus

Vrede, Samuel

Sammanfattning

Automated decision making is a highly relevant research topic in the context of supply chains as the usage of automated vehicles is increasing. Further, utilising historical data and domain knowledge to increase the performance and ensure robust behaviour of the decision–making agent is valuable from an industry point of view. The dynamic vehicle routing problem (DVRP) is a routing problem where costs of stochastically and dynamically appearing tasks in a system are minimised. Further, the semi–Markov decision process (SMDP) is an extension of the, in reinforcement learning commonly used, Markov decision process (MDP), that allows for modelling systems with stochastic state and time transition dynamics. In Bayesian reinforcement learning, ideas from Bayesian statistics are incorporated with reinforcement learning as a method to obtain better models based on historical data. In this thesis, we study how SMDPs can be applied, within the context of Bayesian reinforcement learning, to the DVRP, while considering risk–aversion. We develop an SMDP model of the DVRP and a Bayesian reinforcement learning solver for SMDPs, and show that our solver is able to outperform a naive routing strategy such as first in, first out (FIFO). Our results show that the, to our knowledge, novel idea of applying SMDPs in a Bayesian reinforcement learning context to the DVRP is promising, though further work is needed. In addition, while we have incorporated risk–aversion into our solver we believe that the topic of risk–aversion needs further study. Based on the results, and by the development of the research fields, we believe that the ideas covered in this thesis deserve, and will get, more research attention. More autonomous decision making in the global supply chains is interesting from multiple perspectives. Improved decision making may result in more rapid supply chains while improving the efficiency resulting in reduced resource consumption. Further, incorporating risk–aversion into the decision making may lead to less fragile supply chains, potentially reducing the impacts caused by unexpected events and disturbances.

Ämne/nyckelord

Bayes–adaptive Monte–Carlo planning, Bayesian reinforcement learning, dynamic vehicle routing, Monte–Carlo tree search, risk measure, robustness, semi–Markov decision process

URI

https://hdl.handle.net/20.500.12380/302145

Samling

Examensarbeten för masterexamen

Visa fullständig post