Bayesian Reinforcement Learning on Semi–Markov Decision Processes
Published
Author
Type
Master's thesis
Abstract
Automated decision making is a highly relevant research topic in the context of supply
chains as the usage of automated vehicles is increasing. Further, utilising historical data
and domain knowledge to increase the performance and ensure robust behaviour of the
decision-making agent is valuable from an industry point of view. The dynamic vehicle
routing problem (DVRP) is a routing problem in which the costs of tasks that appear
stochastically and dynamically in a system are minimised. The semi-Markov decision
process (SMDP) extends the Markov decision process (MDP), which is commonly used in
reinforcement learning, and allows for modelling systems in which both the state
transitions and the transition times are stochastic. In Bayesian reinforcement learning,
ideas from Bayesian statistics are incorporated into reinforcement learning as a way to
obtain better models based on historical data. In this thesis, we study how SMDPs can be
applied, within the context of Bayesian reinforcement learning, to the DVRP while
considering risk-aversion. We
develop an SMDP model of the DVRP and a Bayesian reinforcement learning solver for
SMDPs, and show that our solver is able to outperform a naive routing strategy such
as first in, first out (FIFO). Our results show that applying SMDPs in a Bayesian
reinforcement learning context to the DVRP, an idea that is novel to our knowledge, is
promising, though further work is needed. In addition, while we have incorporated
risk-aversion into our solver, we believe that the topic needs further study. Based on
the results and on developments in the research fields, we believe that the ideas covered
in this thesis deserve, and will get, more research attention. More autonomous decision
making in global supply chains is interesting from multiple perspectives. Improved
decision making may result in faster supply chains while improving efficiency and
thereby reducing resource consumption. Further, incorporating risk-aversion into
decision making may lead to less fragile supply chains, potentially reducing the impact
of unexpected events and disturbances.
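As an illustration of what the abstract means by learning a model with stochastic state and time transitions from historical data, the following is a minimal Python sketch. It is not the solver developed in the thesis. The class name SmdpPosterior, the Dirichlet prior over next states, the Gamma prior over an exponential sojourn-time rate, and the toy history are all assumptions chosen only because they give simple conjugate updates.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SmdpPosterior:
    """Hypothetical conjugate posterior for one (state, action) pair of a toy SMDP.

    Assumptions (not taken from the thesis):
    - next states follow a categorical distribution with a Dirichlet prior;
    - sojourn times are exponential with a Gamma(shape, rate) prior on the rate.
    """
    n_states: int
    gamma_shape: float = 1.0
    gamma_rate: float = 1.0
    dirichlet: List[float] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Start from a uniform Dirichlet prior (one pseudo-count per state).
        if not self.dirichlet:
            self.dirichlet = [1.0] * self.n_states

    def update(self, next_state: int, sojourn_time: float) -> None:
        """Fold one observed transition (next state, elapsed time) into the posterior."""
        self.dirichlet[next_state] += 1.0
        self.gamma_shape += 1.0
        self.gamma_rate += sojourn_time

    def mean_transition_probs(self) -> List[float]:
        """Posterior mean of the next-state probabilities."""
        total = sum(self.dirichlet)
        return [count / total for count in self.dirichlet]

    def mean_sojourn_rate(self) -> float:
        """Posterior mean of the exponential rate (events per unit time)."""
        return self.gamma_shape / self.gamma_rate


# Toy historical data: (observed next state, observed sojourn time).
history = [(1, 2.0), (1, 1.0), (0, 3.0)]

posterior = SmdpPosterior(n_states=3)
for next_state, sojourn_time in history:
    posterior.update(next_state, sojourn_time)

print(posterior.mean_transition_probs())
print(posterior.mean_sojourn_rate())
```

A planner could sample transition models from such a posterior instead of relying on a single point estimate, which is one common way of carrying model uncertainty into decision making.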
Subject/keywords
Bayes-adaptive Monte-Carlo planning, Bayesian reinforcement learning, dynamic vehicle routing, Monte-Carlo tree search, risk measure, robustness, semi-Markov decision process