Strictly-Local Multi-Agent Formation Control via BC-Anchored Reinforcement Learning
| dc.contributor.author | Cai, Yihuai | |
| dc.contributor.department | Chalmers tekniska högskola / Institutionen för fysik | sv |
| dc.contributor.department | Chalmers University of Technology / Department of Physics | en |
| dc.contributor.examiner | Volpe, Giovanni | |
| dc.contributor.supervisor | Volpe, Giovanni | |
| dc.date.accessioned | 2026-06-22T16:27:49Z | |
| dc.date.issued | 2026 | |
| dc.date.submitted | ||
| dc.description.abstract | Multi-robot formation control requires a group of robots to coordinate under limited communication and local sensing in order to form desired geometric structures. Classical artificial-potential or spring-based controllers are interpretable and stable, but are difficult to turn directly into trainable closed-loop neural policies. In contrast, learning formation behavior directly with multi-agent reinforcement learning often suffers from symmetry-induced local optima, unstable reward shaping, and poor exploration. This thesis studies strict-local self-organization for multi-robot formation control, where each robot observes only local relative geometry, local edge-distance information, and its own role encoding, without access to global target vectors or centralized planning. We propose a controller-guided learning framework that combines DAgger behavior cloning with Multi-Agent Proximal Policy Optimization (MAPPO) refinement. A local spring controller is first constructed from the desired pairwise distances of the target template and used as the teacher policy. A graph neural network with recurrent memory is then trained on the closed-loop state distribution induced by the learned policy. Finally, MAPPO refines the policy using a formation-stress reward aligned with the teacher controller, while a frozen behavior-cloning policy serves as an MSE anchor to prevent destructive policy drift. The target shape is conveyed entirely through the desired pairwise distance on local communication edges, so a single policy can switch between formations without any global goal vector. Experiments demonstrate that the proposed method achieves stable formation on 10-robot multi-shape tasks, reaching succ@0.4d = 0.998 and succ@0.25d = 0.987 across triangular, hexagonal, circular, and rectangular-grid target formations.
Using a single model per team size, trained on all of its target formations, the method retains partial scaling on the non-circular formations: succ@0.4d = 0.965 and succ@0.25d = 0.617 for the 21-robot system, and succ@0.4d = 0.525 and succ@0.25d = 0.254 for the 28-robot system. Large circular formations, however, remain unsolved at these larger team sizes, suggesting an empirical feasibility boundary of fixed-local formation control: the approach handles shapes that stay locally observable at the chosen communication scale, but struggles when the target geometry becomes globally sparse relative to the communication radius. | |
| dc.identifier.coursecode | TIFX61 | |
| dc.identifier.uri | https://hdl.handle.net/20.500.12380/311443 | |
| dc.language.iso | eng | |
| dc.setspec.uppsok | PhysicsChemistryMaths | |
| dc.subject | Multi-Robot Formation Control; Multi-Agent Reinforcement Learning; Behavior Cloning; Graph Neural Network. | |
| dc.title | Strictly-Local Multi-Agent Formation Control via BC-Anchored Reinforcement Learning | |
| dc.type.degree | Examensarbete för masterexamen | sv |
| dc.type.degree | Master's Thesis | en |
| dc.type.uppsok | H | |
| local.programme | Data science and AI (MPDSC), MSc |
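
The abstract describes a local spring controller built from desired pairwise distances, together with a formation-stress reward aligned with it. The record does not give the exact control law, so the following is only a minimal sketch under stated assumptions: a linear spring on each communication edge, planar positions, and stress defined as the mean squared edge-length error. The names `local_edges`, `spring_step`, `formation_stress`, `gain`, and `comm_radius` are hypothetical and not taken from the thesis.

```python
import math


def local_edges(positions, comm_radius):
    """Return index pairs (i, j), i < j, whose robots are within comm_radius."""
    edges = []
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            if math.dist(positions[i], positions[j]) <= comm_radius:
                edges.append((i, j))
    return edges


def spring_step(positions, desired, comm_radius, gain=0.5):
    """Assumed linear-spring law: one velocity command per robot.

    positions: list of (x, y) tuples.
    desired:   dict mapping (i, j) with i < j to the target edge length.
    """
    vel = [[0.0, 0.0] for _ in positions]
    for i, j in local_edges(positions, comm_radius):
        (xi, yi), (xj, yj) = positions[i], positions[j]
        dist = math.dist(positions[i], positions[j])
        if dist == 0.0:
            continue  # direction undefined for coincident robots; skip the edge
        err = dist - desired[(i, j)]  # > 0 means the edge is too long
        ux, uy = (xj - xi) / dist, (yj - yi) / dist
        # Each endpoint moves along the edge so as to reduce the length error.
        vel[i][0] += gain * err * ux
        vel[i][1] += gain * err * uy
        vel[j][0] -= gain * err * ux
        vel[j][1] -= gain * err * uy
    return vel


def formation_stress(positions, desired, comm_radius):
    """Assumed stress: mean squared edge-length error over local edges."""
    edges = local_edges(positions, comm_radius)
    if not edges:
        return 0.0
    total = sum(
        (math.dist(positions[i], positions[j]) - desired[(i, j)]) ** 2
        for i, j in edges
    )
    return total / len(edges)
```

Both functions use only quantities available on local communication edges, which matches the strictly-local setting of the abstract: no global goal vector appears, and switching formation amounts to swapping the `desired` dictionary. A reward of the form `-formation_stress(...)` would be one plausible way to align reinforcement learning with such a teacher, though the thesis may define it differently.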
