Safe Multi-Robot Planning Via Long-Run Averages
Type
Master's Thesis
Abstract
Constrained reinforcement learning in Markov decision processes (MDPs) has received
increasing attention for its use in sequential decision-making problems with safety
requirements. This thesis investigates safe planning via long-run average reward
in MDPs. It uses grid-world environments and builds on the Triple-QA
framework [1]. Three approaches are evaluated: a single-agent baseline and two
multi-agent extensions, namely a trivial joint-state extension and a separate
Q-table approach. The results show that the single-agent algorithm reproduces the
results of the original framework and serves as a reliable baseline. The joint-state
extension suffers from poor scalability due to exponential growth of the
state-action space, and therefore does not achieve a reward per agent comparable
to the baseline. In contrast, the separate Q-table approach scales significantly
better and achieves performance comparable to the single-agent case, both in
environments with and without agent interaction. Although both the trivial extension
and the separate Q-table approach satisfied the constraint with two agents, neither
algorithm satisfied it in the test with three agents.
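The scalability gap described in the abstract can be illustrated with a rough count of Q-table entries. The sketch below is not taken from the thesis: the grid size (25 cells), the number of actions (4), and the assumption that each separate Q-table is indexed only by its own agent's state and action are all hypothetical choices made for illustration.

```python
def joint_q_entries(n_cells: int, n_actions: int, n_agents: int) -> int:
    """Entries in a single Q-table over the joint state and the joint action.

    With k agents, the joint state space has n_cells**k elements and the
    joint action space has n_actions**k elements.
    """
    return (n_cells ** n_agents) * (n_actions ** n_agents)


def separate_q_entries(n_cells: int, n_actions: int, n_agents: int) -> int:
    """Total entries when every agent keeps its own Q-table.

    Assumption (hypothetical): each table is indexed only by that agent's
    own grid cell and own action, so the total grows linearly in k.
    """
    return n_agents * n_cells * n_actions


if __name__ == "__main__":
    N_CELLS = 25    # hypothetical 5x5 grid world
    N_ACTIONS = 4   # hypothetical action set: up, down, left, right

    for k in (1, 2, 3):
        joint = joint_q_entries(N_CELLS, N_ACTIONS, k)
        separate = separate_q_entries(N_CELLS, N_ACTIONS, k)
        print(f"agents={k}  joint={joint}  separate={separate}")
```

Under these assumed sizes, the joint table grows from 100 entries for one agent to 1,000,000 for three, while the separate tables total only 300 entries for three agents.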
Subject/keywords
Computer, science, computer science, engineering, multi-agent, reinforcement learning, project, thesis.
