Safe Multi-Robot Planning Via Long-Run Averages

Type

Master's Thesis

Abstract

Constrained reinforcement learning in Markov decision processes (MDPs) has received increasing attention for its use in sequential decision-making problems with safety requirements. This thesis investigates safe planning via long-run average reward in MDPs. It uses grid-world environments and builds on the Triple-QA framework [1]. Three approaches are evaluated: a single-agent baseline and two multi-agent extensions, namely a trivial joint-state extension and a separate Q-table approach. The results show that the single-agent algorithm reproduces the results of the original framework and serves as a reliable baseline. The joint-state extension scales poorly because the state-action space grows exponentially with the number of agents, and it therefore does not achieve a reward per agent comparable to the baseline. In contrast, the separate Q-table approach scales significantly better and achieves a level of reward comparable to the single-agent case, both in environments with and without agent interaction. With two agents, both the trivial extension and the separate Q-table approach satisfied the constraint; with three agents, neither algorithm did.
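
The scalability gap described above can be illustrated by counting Q-table entries. The following is a minimal sketch, not the thesis implementation: it assumes every agent has the same number of local states and actions, that the joint-state extension indexes one table by the tuple of all agents' states and actions, and that the separate approach keeps one table per agent over its own states and actions only. The function names and the 5x5 grid with 4 actions are hypothetical choices for illustration.

```python
def joint_table_size(n_states: int, n_actions: int, n_agents: int) -> int:
    """Entries in one Q-table indexed by the joint state and joint action.

    Assumption: joint state = tuple of per-agent states, and
    joint action = tuple of per-agent actions.
    """
    return (n_states ** n_agents) * (n_actions ** n_agents)


def separate_tables_size(n_states: int, n_actions: int, n_agents: int) -> int:
    """Total entries when each agent keeps its own local Q-table.

    Assumption: each table covers only that agent's states and actions.
    """
    return n_agents * n_states * n_actions


if __name__ == "__main__":
    # Hypothetical environment: 5x5 grid (25 cells), 4 movement actions.
    n_states, n_actions = 25, 4
    for n_agents in (1, 2, 3):
        joint = joint_table_size(n_states, n_actions, n_agents)
        separate = separate_tables_size(n_states, n_actions, n_agents)
        print(f"agents={n_agents}  joint={joint}  separate={separate}")
```

Under these assumptions the joint table grows from 100 entries for one agent to 1,000,000 for three, while the separate tables grow linearly from 100 to 300. An algorithm that maintains several such tables multiplies both counts by the same constant, so the contrast is unchanged.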

Subject/keywords

Computer, science, computer science, engineering, multi-agent, reinforcement learning, project, thesis.
