Pathfinding med reinforcement learning i delvis observerbara miljöer
dc.contributor.author | Engström, Anne | |
dc.contributor.author | Lidin, Joel | |
dc.contributor.author | Molander, Gustav | |
dc.contributor.author | Onoszko, Noa | |
dc.contributor.author | Månsson, Olle | |
dc.contributor.author | Ölund, Hugo | |
dc.contributor.department | Chalmers tekniska högskola / Institutionen för matematiska vetenskaper | sv |
dc.contributor.department | Chalmers University of Technology / Department of Mathematical Sciences | en |
dc.date.accessioned | 2019-07-05T12:03:09Z | |
dc.date.available | 2019-07-05T12:03:09Z | |
dc.date.issued | 2019 | |
dc.description.abstract | Reinforcement learning algorithms can solve problems without explicit knowledge of the underlying model. Instead, they infer a strategy directly from observations and rewards acquired by interacting with their environment. This makes them suitable candidates for pathfinding problems in a partially observable setting, where the aim is to find a path through an environment with restricted vision. This report investigates how Markov decision processes and reinforcement learning can be used to model and solve partially observable pathfinding problems. Existing literature is reviewed to give a theoretical background of the subject before progressing to practical implementations. We have applied state-of-the-art algorithms from two subclasses of reinforcement learning methods: value-based algorithms and policy-based algorithms. We find that partially observable Markov decision processes can be used to model pathfinding problems, but not all reinforcement learning algorithms are suitable for solving them. In theory, value-based algorithms show potential, but when implemented they did not yield positive results. Conversely, the policy-based algorithm Proximal Policy Optimization solves the problem convincingly. This algorithm also performs well in environments it was not trained in, thus displaying some ability to generalize its policy. | |
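For reference, the partially observable Markov decision process mentioned in the abstract is conventionally formalized as the tuple below. This is standard textbook notation, not taken from the thesis itself, so the symbols used by the authors may differ.

\[
(\mathcal{S}, \mathcal{A}, T, R, \Omega, O, \gamma), \qquad
T(s' \mid s, a), \qquad
O(o \mid s', a), \qquad
R : \mathcal{S} \times \mathcal{A} \to \mathbb{R},
\]

where $\mathcal{S}$ is the set of states, $\mathcal{A}$ the set of actions, $\Omega$ the set of observations, $T$ the state-transition probabilities, $O$ the observation probabilities, $R$ the reward function, and $\gamma \in [0,1)$ the discount factor. The agent never observes the state $s$ directly, only an observation $o$; in the pathfinding setting this corresponds to the restricted vision described in the abstract.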
dc.identifier.uri | https://hdl.handle.net/20.500.12380/257380 | |
dc.language.iso | swe | |
dc.setspec.uppsok | PhysicsChemistryMaths | |
dc.subject | Grundläggande vetenskaper | |
dc.subject | Matematik | |
dc.subject | Basic Sciences | |
dc.subject | Mathematics | |
dc.title | Pathfinding med reinforcement learning i delvis observerbara miljöer | |
dc.type.degree | Examensarbete för kandidatexamen | sv |
dc.type.degree | Bachelor Thesis | en |
dc.type.uppsok | M2 |
Download

Original bundle
- Name: 257380.pdf
- Size: 1.45 MB
- Format: Adobe Portable Document Format
- Description: Full text