Pathfinding med reinforcement learning i delvis observerbara miljöer

Loading...
Thumbnail Image

Date

Type

Examensarbete för kandidatexamen
Bachelor Thesis

Programme

Model builders

Journal Title

Journal ISSN

Volume Title

Publisher

Abstract

Reinforcement learning algorithms have the ability to solve problems without explicit knowledge of their underlying model. Instead, they infer a strategy directly from observations and rewards acquired by interacting with their environment. This makes them suitable candidates for solving pathfinding problems in a partially observable setting, where the aim is to find a path in an environment with restricted vision. This report aims to investigate how Markov decision processes and reinforcement learning can be used to model and solve partially observable pathfinding problems. Existing literature has been reviewed to give a theoretical background of the subject, before progressing to practical implementations. We have applied state-of-the-art algorithms taken from two subclasses of reinforcement learning methods: value based algorithms and policy based algorithms. We find that partially observable Markov decision processes can be used to model pathfinding problems, but not all reinforcement learning algorithms are suitable for solving them. In theory, value based algorithms show potential but when implemented they did not yield positive results. Conversely, the policy based algorithm Proximal Policy Optimization is able to solve the problem convincingly. This algorithm also performs well in environments previously not trained in, thus displaying some ability to generalize its policy.

Description

Keywords

Grundläggande vetenskaper, Matematik, Basic Sciences, Mathematics

Citation

Architect

Location

Type of building

Build Year

Model type

Scale

Material / technology

Index

Collections

Endorsement

Review

Supplemented By

Referenced By