Strictly-Local Multi-Agent Formation Control via BC-Anchored Reinforcement Learning


Type

Master's Thesis

Abstract

Multi-robot formation control requires a group of robots to coordinate under limited communication and local sensing in order to form desired geometric structures. Classical artificial-potential or spring-based controllers are interpretable and stable, but they are difficult to turn directly into trainable closed-loop neural policies. Conversely, learning formation behavior directly with multi-agent reinforcement learning often suffers from symmetry-induced local optima, unstable reward shaping, and poor exploration. This thesis studies strictly local self-organization for multi-robot formation control. Each robot observes only local relative geometry, local edge-distance information, and its own role encoding. It has no access to global target vectors or centralized planning. We propose a controller-guided learning framework that combines DAgger behavior cloning with Multi-Agent Proximal Policy Optimization (MAPPO) refinement. A local spring controller is first constructed from the desired pairwise distances of the target template and used as the teacher policy. A graph neural network with recurrent memory is then trained with DAgger on the closed-loop state distribution induced by the learned policy itself, so that the teacher labels the states the student actually visits. Finally, MAPPO refines the policy using a formation-stress reward aligned with the teacher controller. During this stage, a frozen behavior-cloning policy serves as a mean-squared-error (MSE) anchor to prevent destructive policy drift. The target shape is conveyed entirely through the desired pairwise distances on local communication edges. A single policy can therefore switch between formations without any global goal vector. Experiments show that the proposed method achieves stable formation on 10-robot multi-shape tasks, reaching succ@0.4d = 0.998 and succ@0.25d = 0.987 across triangular, hexagonal, circular, and rectangular-grid target formations.
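The teacher stage can be sketched as follows. A spring law is built only from desired pairwise distances, and a formation stress is defined whose negative gradient is exactly that spring law. This is a minimal sketch, not the thesis implementation. It assumes single-integrator robots. It also assumes that the stress takes the form S = 1/2 Σ (|p_i - p_j| - d_ij)^2 over edges. The edge set is fixed to template pairs within a communication radius. The helpers `triangle_template`, `local_edges`, `formation_stress` and `spring_teacher` and the values of `gain`, `v_max` and `radius` are all hypothetical.

```python
import numpy as np


def triangle_template(rows=4, spacing=1.0):
    """Hypothetical helper: triangular-lattice template with rows*(rows+1)/2 robots."""
    pts = []
    for k in range(rows):
        for m in range(k + 1):
            pts.append(((m - k / 2.0) * spacing, k * spacing * np.sqrt(3.0) / 2.0))
    return np.array(pts)


def local_edges(template, radius):
    """Robot pairs (i, j), i < j, within `radius` in the template, with the desired
    distance d_ij read off the template (assumed edge rule, fixed during rollout)."""
    dist = np.linalg.norm(template[:, None, :] - template[None, :, :], axis=-1)
    i, j = np.nonzero(np.triu(dist <= radius, k=1))
    return i, j, dist[i, j]


def formation_stress(pos, i, j, d):
    """Assumed stress: S = 1/2 * sum over edges of (|p_i - p_j| - d_ij)^2."""
    r = np.linalg.norm(pos[i] - pos[j], axis=-1)
    return 0.5 * np.sum((r - d) ** 2)


def spring_teacher(pos, i, j, d, gain=1.0, v_max=1.0):
    """Velocity command u = -gain * dS/dp, saturated at v_max per robot.
    Robot i's command depends only on its own edges, so the law is local."""
    diff = pos[j] - pos[i]                    # edge vector from robot i to robot j
    r = np.linalg.norm(diff, axis=-1)
    f = ((r - d) / r)[:, None] * diff         # a stretched edge pulls i toward j
    u = np.zeros_like(pos)
    np.add.at(u, i, f)                        # accumulate spring force on i
    np.add.at(u, j, -f)                       # equal and opposite force on j
    u *= gain
    speed = np.linalg.norm(u, axis=-1, keepdims=True)
    return u * np.minimum(1.0, v_max / np.maximum(speed, 1e-12))


# Closed-loop rollout of the teacher from a perturbed start.
rng = np.random.default_rng(0)
template = triangle_template()                # 10 robots, unit spacing
i, j, d = local_edges(template, radius=1.2)   # nearest neighbours only
pos = template + 0.1 * rng.standard_normal(template.shape)
stress_start = formation_stress(pos, i, j, d)
for _ in range(2000):
    pos = pos + 0.05 * spring_teacher(pos, i, j, d)
stress_end = formation_stress(pos, i, j, d)
print(f"stress: {stress_start:.4f} -> {stress_end:.2e}")
```

The shape enters the controller only through the vector `d`. Swapping in distances from another template changes the target formation without any global goal vector. Because the spring law is the negative gradient of the stress, a reward based on decreasing stress points in the same direction as the teacher. This is one plausible reading of "a formation-stress reward aligned with the teacher controller".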
The method uses a single model per team size, trained on all of that team size's target formations. With this setup it partially retains its performance when scaled to larger teams on the non-circular formations. It reaches succ@0.4d = 0.965 and succ@0.25d = 0.617 for the 21-robot system, and succ@0.4d = 0.525 and succ@0.25d = 0.254 for the 28-robot system. Large circular formations, however, remain unsolved at these larger team sizes. This suggests an empirical feasibility boundary for local formation control at a fixed communication scale. The approach handles shapes that remain locally observable at the chosen communication scale, but it struggles when the target geometry becomes globally sparse relative to the communication radius.
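The MAPPO refinement stage with a behavior-cloning anchor can be sketched as a loss function. It combines the standard PPO clipped surrogate with an MSE penalty between the learner's mean action and the action of the frozen BC policy. This is a sketch under assumptions, not the thesis objective. The function name `bc_anchored_loss` and the values of `clip_eps` and `anchor_coef` are hypothetical. MAPPO's centralized value loss and any entropy bonus are omitted. NumPy is used only to show the arithmetic. A real implementation would use an autodiff framework.

```python
import numpy as np


def bc_anchored_loss(logp_new, logp_old, advantages, mu_new, mu_bc,
                     clip_eps=0.2, anchor_coef=1.0):
    """Clipped PPO surrogate plus an MSE anchor to a frozen BC policy (sketch).

    logp_new / logp_old: log-probabilities of the taken actions, shape (B,)
    advantages:          advantage estimates, shape (B,)
    mu_new / mu_bc:      mean actions of the learner and frozen BC policy, shape (B, A)
    Returns (total, policy_loss, anchor_loss); all are quantities to minimize.
    """
    ratio = np.exp(logp_new - logp_old)
    surrogate = np.minimum(ratio * advantages,
                           np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages)
    policy_loss = -np.mean(surrogate)
    # The BC policy is frozen: with autodiff, mu_bc would be detached, so only
    # the learner is pulled back toward the teacher-imitating behaviour.
    anchor_loss = np.mean(np.sum((mu_new - mu_bc) ** 2, axis=-1))
    return policy_loss + anchor_coef * anchor_loss, policy_loss, anchor_loss


# Toy batch: one sample whose probability ratio (2.0) exceeds the clip range.
total, pg, anchor = bc_anchored_loss(
    logp_new=np.log(np.array([2.0])), logp_old=np.zeros(1),
    advantages=np.ones(1),
    mu_new=np.array([[1.0, 0.0]]), mu_bc=np.zeros((1, 2)),
    anchor_coef=0.5)
print(total, pg, anchor)   # approximately -0.7, -1.2, 1.0
```

The clip bounds how far one update can move the policy. The anchor term adds a persistent pull toward the cloned teacher behaviour, so reward-driven updates cannot drift far from it. In this sketch, `anchor_coef` sets the trade-off between those two pressures.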

Subject/keywords

Multi-Robot Formation Control; Multi-Agent Reinforcement Learning; Behavior Cloning; Graph Neural Network.
