Bayesian Reinforcement Learning on Semi–Markov Decision Processes

dc.contributor.author: Hilding Södergren, Marcus
dc.contributor.author: Vrede, Samuel
dc.contributor.department: Chalmers tekniska högskola / Institutionen för matematiska vetenskaper
dc.contributor.examiner: Axelson-Fisk, Marina
dc.contributor.supervisor: Ehn, Gustaf
dc.date.accessioned: 2021-01-08T11:56:23Z
dc.date.available: 2021-01-08T11:56:23Z
dc.date.issued: 2020
dc.date.submitted: 2020
dc.description.abstract: Automated decision making is a highly relevant research topic in the context of supply chains, as the use of automated vehicles is increasing. Further, using historical data and domain knowledge to improve the performance and ensure robust behaviour of the decision–making agent is valuable from an industry point of view. The dynamic vehicle routing problem (DVRP) is a routing problem in which the costs of tasks that appear stochastically and dynamically in a system are minimised. The semi–Markov decision process (SMDP) extends the Markov decision process (MDP) commonly used in reinforcement learning, and allows for modelling systems with stochastic state and time transition dynamics. In Bayesian reinforcement learning, ideas from Bayesian statistics are combined with reinforcement learning to obtain better models based on historical data. In this thesis, we study how SMDPs can be applied to the DVRP within the context of Bayesian reinforcement learning, while considering risk–aversion. We develop an SMDP model of the DVRP and a Bayesian reinforcement learning solver for SMDPs, and show that our solver is able to outperform a naive routing strategy such as first in, first out (FIFO). Our results show that the idea of applying SMDPs in a Bayesian reinforcement learning context to the DVRP, which is novel to our knowledge, is promising, though further work is needed. In addition, while we have incorporated risk–aversion into our solver, we believe that the topic needs further study. Based on the results and on the development of the research fields, we believe that the ideas covered in this thesis deserve, and will get, more research attention. More autonomous decision making in global supply chains is interesting from multiple perspectives. Improved decision making may result in faster and more efficient supply chains, and thereby in reduced resource consumption. Further, incorporating risk–aversion into the decision making may lead to less fragile supply chains, potentially reducing the impact of unexpected events and disturbances.
dc.identifier.coursecode: MVEX03
dc.identifier.uri: https://hdl.handle.net/20.500.12380/302145
dc.language.iso: eng
dc.setspec.uppsok: PhysicsChemistryMaths
dc.subject: Bayes–adaptive Monte–Carlo planning, Bayesian reinforcement learning, dynamic vehicle routing, Monte–Carlo tree search, risk measure, robustness, semi–Markov decision process
dc.title: Bayesian Reinforcement Learning on Semi–Markov Decision Processes
dc.type.degree: Master's thesis
dc.type.uppsok: H
local.programme: Complex adaptive systems (MPCAS), MSc

Download

Original bundle

Name:
Master_Thesis_Hilding_Södergren_Marcus_och_Vrede_Samuel.pdf
Size:
1.1 MB
Format:
Adobe Portable Document Format
Description:

License bundle

Name:
license.txt
Size:
1.14 KB
Format:
Item-specific license agreed upon to submission
Description: