Bayesian Reinforcement Learning on Semi-Markov Decision Processes

Type
Master's thesis
Programme
Complex adaptive systems (MPCAS), MSc
Published
2020
Authors
Hilding Södergren, Marcus
Vrede, Samuel
Abstract
Automated decision making is a highly relevant research topic in the context of supply chains, as the use of automated vehicles is increasing. Furthermore, using historical data and domain knowledge to improve performance and ensure robust behaviour of the decision-making agent is valuable from an industry point of view. The dynamic vehicle routing problem (DVRP) is a routing problem in which the costs of tasks that appear stochastically and dynamically in a system are minimised. The semi-Markov decision process (SMDP) is an extension of the Markov decision process (MDP), which is commonly used in reinforcement learning; the SMDP allows for modelling systems with stochastic state and time transition dynamics. In Bayesian reinforcement learning, ideas from Bayesian statistics are incorporated into reinforcement learning as a way to obtain better models from historical data. In this thesis, we study how SMDPs can be applied to the DVRP within the context of Bayesian reinforcement learning, while considering risk aversion. We develop an SMDP model of the DVRP and a Bayesian reinforcement learning solver for SMDPs, and show that our solver is able to outperform a naive first in, first out (FIFO) routing strategy. Our results show that applying SMDPs to the DVRP in a Bayesian reinforcement learning context, an idea that is to our knowledge novel, is promising, though further work is needed. In addition, although we have incorporated risk aversion into our solver, we believe that the topic of risk aversion needs further study. Based on the results, and on the development of the research fields, we believe that the ideas covered in this thesis deserve, and will receive, more research attention. More autonomous decision making in global supply chains is interesting from multiple perspectives. Improved decision making may result in faster supply chains while also improving efficiency, which in turn reduces resource consumption.
Further, incorporating risk aversion into the decision making may lead to less fragile supply chains, potentially reducing the impact of unexpected events and disturbances.
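The SMDP generalises the MDP by letting the time between decisions be random, so future costs are discounted by an amount that depends on how long each transition takes. The Python below is a minimal sketch of that idea, not the thesis's solver (which, judging by the keywords, builds on Bayes-adaptive Monte-Carlo planning and Monte-Carlo tree search). It runs value iteration on a hypothetical two-state, cost-minimising SMDP. It assumes exponentially distributed sojourn times and a continuous discount rate `beta`, so the expected discount over one transition is E[exp(-beta * tau)] = rate / (rate + beta). The model, the state and action names, and all numbers are invented for illustration.

```python
# Hypothetical toy SMDP, NOT the thesis's DVRP model.
# model[state][action] = (immediate_cost, sojourn_rate, {next_state: probability})
# Assumption: sojourn times are Exponential(sojourn_rate), and future costs are
# discounted continuously as exp(-beta * t).


def expected_discount(rate: float, beta: float) -> float:
    """E[exp(-beta * tau)] for tau ~ Exponential(rate)."""
    return rate / (rate + beta)


def q_value(model, values, state, action, beta):
    """Immediate cost plus expected discounted cost-to-go for one (state, action)."""
    cost, rate, transitions = model[state][action]
    future = sum(p * values[nxt] for nxt, p in transitions.items())
    return cost + expected_discount(rate, beta) * future


def smdp_value_iteration(model, beta=0.1, tol=1e-9, max_iter=10_000):
    """Minimise expected discounted cost; returns (values, greedy policy)."""
    values = {s: 0.0 for s in model}
    for _ in range(max_iter):
        # Bellman backup: best (cheapest) action in every state.
        new_values = {
            s: min(q_value(model, values, s, a, beta) for a in model[s])
            for s in model
        }
        delta = max(abs(new_values[s] - values[s]) for s in model)
        values = new_values
        if delta < tol:
            break
    # Greedy policy with respect to the converged values.
    policy = {
        s: min(model[s], key=lambda a, s=s: q_value(model, values, s, a, beta))
        for s in model
    }
    return values, policy


toy_model = {
    "A": {
        "serve_near": (1.0, 2.0, {"A": 0.5, "B": 0.5}),  # cheap, short mean sojourn
        "serve_far": (3.0, 0.5, {"A": 1.0}),             # costly, long mean sojourn
    },
    "B": {"wait": (0.5, 1.0, {"A": 1.0})},
}

values, policy = smdp_value_iteration(toy_model, beta=0.1)
print(policy)  # {'A': 'serve_near', 'B': 'wait'}
print({s: round(v, 3) for s, v in values.items()})
```

Holding the model fixed is a simplification: in the Bayesian setting described in the abstract, the transition probabilities and sojourn-time parameters would instead be uncertain and updated from historical data.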
Subject/keywords
Bayes-adaptive Monte-Carlo planning, Bayesian reinforcement learning, dynamic vehicle routing, Monte-Carlo tree search, risk measure, robustness, semi-Markov decision process