Strictly-Local Multi-Agent Formation Control via BC-Anchored Reinforcement Learning
Type
Master's Thesis
Abstract
Multi-robot formation control requires a group of robots to coordinate under limited
communication and local sensing in order to form desired geometric structures.
Classical artificial-potential or spring-based controllers are interpretable and stable,
but are difficult to turn directly into trainable closed-loop neural policies. In contrast,
learning formation behavior directly with multi-agent reinforcement learning often
suffers from symmetry-induced local optima, unstable reward shaping, and poor
exploration.
This thesis studies strictly local self-organization for multi-robot formation control,
where each robot observes only local relative geometry, local edge-distance information,
and its own role encoding, without access to global target vectors or centralized
planning. We propose a controller-guided learning framework that combines DAgger
behavior cloning with Multi-Agent Proximal Policy Optimization (MAPPO) refinement.
A local spring controller is first constructed from the desired pairwise distances
of the target template and used as the teacher policy. A graph neural network with
recurrent memory is then trained on the closed-loop state distribution induced by the
learned policy. Finally, MAPPO refines the policy using a formation-stress reward
aligned with the teacher controller, while a frozen behavior-cloning policy serves as
an MSE anchor to prevent destructive policy drift. The target shape is conveyed
entirely through the desired pairwise distance on local communication edges, so a
single policy can switch between formations without any global goal vector.
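
As a rough illustration of the ingredients named above, the sketch below shows one plausible form of a local spring teacher, a formation-stress measure, and an MSE-anchored loss. It is a minimal sketch under stated assumptions, not the thesis implementation: the function names, the gain k, the anchor weight beta, the step size used in the usage note, and the exact form of the stress term are all hypothetical.

```python
import math

def spring_action(pos, neighbors, i, k=1.0):
    """Hooke-style velocity command for robot i (hypothetical teacher form).

    pos:       dict robot_id -> (x, y)
    neighbors: dict robot_id -> list of (neighbor_id, desired_distance),
               i.e. only the robot's local communication edges.
    k:         spring gain (assumed value, not taken from the thesis).
    """
    ux = uy = 0.0
    for j, d_star in neighbors[i]:
        dx = pos[j][0] - pos[i][0]      # local relative geometry only
        dy = pos[j][1] - pos[i][1]
        dist = math.hypot(dx, dy)
        if dist > 1e-9:
            gain = k * (dist - d_star) / dist
            ux += gain * dx             # pull if too far, push if too close
            uy += gain * dy
    return (ux, uy)

def formation_stress(pos, neighbors):
    """Mean squared edge-length error over all local edges (assumed form)."""
    errs = []
    for i, nbrs in neighbors.items():
        for j, d_star in nbrs:
            dist = math.hypot(pos[j][0] - pos[i][0], pos[j][1] - pos[i][1])
            errs.append((dist - d_star) ** 2)
    return sum(errs) / len(errs)

def anchored_loss(rl_loss, action, bc_action, beta=0.1):
    """RL loss plus an MSE penalty toward a frozen BC policy's action.

    beta is a hypothetical anchor weight; rl_loss stands in for the
    MAPPO objective, which is not reproduced here.
    """
    mse = sum((a - b) ** 2 for a, b in zip(action, bc_action)) / len(action)
    return rl_loss + beta * mse
```

For example, with three robots whose edges all request unit distance, an equilateral triangle of side 1 gives zero stress and zero spring action, while a triangle scaled to side 1.5 has positive stress that decreases after one small step along the spring actions. Because the target shape enters only through the per-edge desired distances, changing those distances is enough to request a different formation in this sketch.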
Experiments demonstrate that the proposed method achieves stable formation on
10-robot multi-shape tasks, reaching succ@0.4d = 0.998 and succ@0.25d = 0.987 across
triangular, hexagonal, circular, and rectangular-grid target formations. Using a single
model per team size, trained on all of its target formations, the method retains partial
scaling on the non-circular formations: succ@0.4d = 0.965 and succ@0.25d = 0.617
for the 21-robot system, and succ@0.4d = 0.525 and succ@0.25d = 0.254 for the
28-robot system. Large circular formations, however, remain unsolved at these larger
team sizes, suggesting an empirical feasibility boundary for formation control with a
fixed local communication radius: the approach handles shapes that stay locally
observable at the chosen communication scale, but struggles when the target geometry
becomes globally sparse relative to the communication radius.
Subject/keywords
Multi-Robot Formation Control; Multi-Agent Reinforcement Learning; Behavior Cloning; Graph Neural Network.
