Pathfinding with reinforcement learning in partially observable environments
Bachelor's thesis
Reinforcement learning algorithms can solve problems without explicit knowledge of the underlying model. Instead, they infer a strategy directly from observations and rewards acquired by interacting with their environment. This makes them suitable candidates for solving pathfinding problems in a partially observable setting, where the aim is to find a path in an environment with restricted vision. This report investigates how Markov decision processes and reinforcement learning can be used to model and solve partially observable pathfinding problems. Existing literature has been reviewed to give a theoretical background to the subject, before progressing to practical implementations. We have applied state-of-the-art algorithms from two subclasses of reinforcement learning methods: value-based algorithms and policy-based algorithms. We find that partially observable Markov decision processes can be used to model pathfinding problems, but not all reinforcement learning algorithms are suitable for solving them. Value-based algorithms show theoretical potential, but our implementations did not yield positive results. Conversely, the policy-based algorithm Proximal Policy Optimization solves the problem convincingly. This algorithm also performs well in environments it was not trained in, thus displaying some ability to generalize its policy.
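To illustrate the modelling idea, a pathfinding task with restricted vision can be cast as a POMDP by letting the agent observe only its immediate surroundings rather than its absolute position. The sketch below is a hypothetical minimal environment of this kind, not the thesis's actual implementation; all names (`PartialGridworld`, the observation encoding, the reward values) are illustrative assumptions.

```python
class PartialGridworld:
    """Hypothetical minimal pathfinding POMDP: the agent moves on a grid
    toward a goal, but observes only which of its four neighbouring cells
    are inside the grid -- never its absolute position."""

    # Action index -> (dx, dy): up, down, left, right.
    MOVES = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}

    def __init__(self, width=4, height=4, goal=(3, 3)):
        self.width, self.height, self.goal = width, height, goal
        self.reset()

    def reset(self):
        self.pos = (0, 0)
        return self._observe()

    def _observe(self):
        # Partial observation: 1 per direction if that move stays on the grid.
        x, y = self.pos
        return tuple(
            int(0 <= x + dx < self.width and 0 <= y + dy < self.height)
            for dx, dy in self.MOVES.values()
        )

    def step(self, action):
        dx, dy = self.MOVES[action]
        nx, ny = self.pos[0] + dx, self.pos[1] + dy
        if 0 <= nx < self.width and 0 <= ny < self.height:
            self.pos = (nx, ny)
        done = self.pos == self.goal
        # Small step cost encourages short paths; bonus on reaching the goal.
        reward = 1.0 if done else -0.01
        return self._observe(), reward, done
```

Because distinct grid positions can yield the same observation, an optimal policy cannot in general be a function of the current observation alone, which is the core difficulty the reviewed algorithms must handle.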
Basic Sciences, Mathematics