Bayesian Reinforcement Learning on Semi–Markov Decision Processes

Master's thesis

Download file(s):
File: Master_Thesis_Hilding_Södergren_Marcus_och_Vrede_Samuel.pdf
Size: 1.13 MB
Format: Adobe PDF
Bibliographical item details
Type: Master's thesis
Title: Bayesian Reinforcement Learning on Semi–Markov Decision Processes
Authors: Hilding Södergren, Marcus
Vrede, Samuel
Abstract: Automated decision making is a highly relevant research topic in the context of supply chains, as the use of automated vehicles is increasing. From an industry point of view, it is also valuable to use historical data and domain knowledge to improve the performance and ensure robust behaviour of the decision–making agent. The dynamic vehicle routing problem (DVRP) is a routing problem in which the costs of tasks that appear stochastically and dynamically in a system are minimised. The semi–Markov decision process (SMDP) extends the Markov decision process (MDP), which is commonly used in reinforcement learning, and allows for modelling systems with stochastic state and time transition dynamics. Bayesian reinforcement learning combines ideas from Bayesian statistics with reinforcement learning to obtain better models based on historical data. In this thesis, we study how SMDPs can be applied to the DVRP within the context of Bayesian reinforcement learning, while considering risk aversion. We develop an SMDP model of the DVRP and a Bayesian reinforcement learning solver for SMDPs, and show that our solver is able to outperform a naive routing strategy such as first in, first out (FIFO). Our results show that applying SMDPs in a Bayesian reinforcement learning context to the DVRP, an idea that is novel to our knowledge, is promising, though further work is needed. In addition, while we have incorporated risk aversion into our solver, we believe that the topic needs further study. Based on these results and on the development of the research fields, we believe that the ideas covered in this thesis deserve, and will get, more research attention. More autonomous decision making in global supply chains is interesting from multiple perspectives. Improved decision making may make supply chains faster and more efficient, reducing resource consumption. Further, incorporating risk aversion into the decision making may lead to less fragile supply chains, potentially reducing the impact of unexpected events and disturbances.
Keywords: Bayes–adaptive Monte–Carlo planning, Bayesian reinforcement learning, dynamic vehicle routing, Monte–Carlo tree search, risk measure, robustness, semi–Markov decision process
Issue Date: 2020
Publisher: Chalmers tekniska högskola / Institutionen för matematiska vetenskaper
Collection: Examensarbeten för masterexamen // Master Theses
