Bayesian Reinforcement Learning on Semi–Markov Decision Processes

Hilding Södergren, Marcus; Vrede, Samuel

dc.contributor.author	Hilding Södergren, Marcus
dc.contributor.author	Vrede, Samuel
dc.contributor.department	Chalmers tekniska högskola / Institutionen för matematiska vetenskaper	sv
dc.contributor.examiner	Axelson-Fisk, Marina
dc.contributor.supervisor	Ehn, Gustaf
dc.date.accessioned	2021-01-08T11:56:23Z
dc.date.available	2021-01-08T11:56:23Z
dc.date.issued	2020	sv
dc.date.submitted	2020
dc.description.abstract	Automated decision making is a highly relevant research topic in the context of supply chains, as the use of automated vehicles is increasing. Moreover, using historical data and domain knowledge to improve the performance and ensure robust behaviour of the decision–making agent is valuable from an industry point of view. The dynamic vehicle routing problem (DVRP) is a routing problem in which the costs of tasks that appear stochastically and dynamically in a system are minimised. The semi–Markov decision process (SMDP) extends the Markov decision process (MDP), the model commonly used in reinforcement learning, to allow modelling of systems with stochastic state and time transition dynamics. In Bayesian reinforcement learning, ideas from Bayesian statistics are combined with reinforcement learning to obtain better models based on historical data. In this thesis, we study how SMDPs can be applied, within the context of Bayesian reinforcement learning, to the DVRP, while considering risk–aversion. We develop an SMDP model of the DVRP and a Bayesian reinforcement learning solver for SMDPs, and show that our solver is able to outperform a naive routing strategy such as first in, first out (FIFO). Our results show that applying SMDPs in a Bayesian reinforcement learning context to the DVRP, an idea that is novel to our knowledge, is promising, though further work is needed. In addition, while we have incorporated risk–aversion into our solver, we believe that the topic of risk–aversion needs further study. Based on the results and on the development of the research fields, we believe that the ideas covered in this thesis deserve, and will get, more research attention. More autonomous decision making in global supply chains is interesting from multiple perspectives. Improved decision making may result in faster and more efficient supply chains, and thereby in reduced resource consumption.
Further, incorporating risk–aversion into the decision making may lead to less fragile supply chains, potentially reducing the impact of unexpected events and disturbances.	sv
dc.identifier.coursecode	MVEX03	sv
dc.identifier.uri	https://hdl.handle.net/20.500.12380/302145
dc.language.iso	eng	sv
dc.setspec.uppsok	PhysicsChemistryMaths
dc.subject	Bayes–adaptive Monte–Carlo planning, Bayesian reinforcement learning, dynamic vehicle routing, Monte–Carlo tree search, risk measure, robustness, semi–Markov decision process	sv
dc.title	Bayesian Reinforcement Learning on Semi–Markov Decision Processes	sv
dc.type.degree	Master's thesis	sv
dc.type.uppsok	H
local.programme	Complex adaptive systems (MPCAS), MSc

Original bundle

Name: Master_Thesis_Hilding_Södergren_Marcus_och_Vrede_Samuel.pdf
Size: 1.1 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.14 KB
Format: Item-specific license agreed upon to submission

Collections

Master's theses