Strictly-Local Multi-Agent Formation Control via BC-Anchored Reinforcement Learning

dc.contributor.author: Cai, Yihuai
dc.contributor.department: Chalmers tekniska högskola / Institutionen för fysik (sv)
dc.contributor.department: Chalmers University of Technology / Department of Physics (en)
dc.contributor.examiner: Volpe, Giovanni
dc.contributor.supervisor: Volpe, Giovanni
dc.date.accessioned: 2026-06-22T16:27:49Z
dc.date.issued: 2026
dc.date.submitted:
dc.description.abstract: Multi-robot formation control requires a group of robots to coordinate under limited communication and local sensing in order to form desired geometric structures. Classical artificial-potential or spring-based controllers are interpretable and stable, but are difficult to turn directly into trainable closed-loop neural policies. In contrast, learning formation behavior directly with multi-agent reinforcement learning often suffers from symmetry-induced local optima, unstable reward shaping, and poor exploration. This thesis studies strictly local self-organization for multi-robot formation control, where each robot observes only local relative geometry, local edge-distance information, and its own role encoding, without access to global target vectors or centralized planning. We propose a controller-guided learning framework that combines DAgger behavior cloning with Multi-Agent Proximal Policy Optimization (MAPPO) refinement. A local spring controller is first constructed from the desired pairwise distances of the target template and used as the teacher policy. A graph neural network with recurrent memory is then trained on the closed-loop state distribution induced by the learned policy. Finally, MAPPO refines the policy using a formation-stress reward aligned with the teacher controller, while a frozen behavior-cloning policy serves as an MSE anchor to prevent destructive policy drift. The target shape is conveyed entirely through the desired pairwise distances on local communication edges, so a single policy can switch between formations without any global goal vector. Experiments demonstrate that the proposed method achieves stable formation on 10-robot multi-shape tasks, reaching succ@0.4d = 0.998 and succ@0.25d = 0.987 across triangular, hexagonal, circular, and rectangular-grid target formations.
Using a single model per team size, trained on all of its target formations, the method retains partial scaling on the non-circular formations: succ@0.4d = 0.965 and succ@0.25d = 0.617 for the 21-robot system, and succ@0.4d = 0.525 and succ@0.25d = 0.254 for the 28-robot system. Large circular formations, however, remain unsolved at these larger team sizes, suggesting an empirical feasibility boundary of fixed-local formation control: the approach handles shapes that stay locally observable at the chosen communication scale, but struggles when the target geometry becomes globally sparse relative to the communication radius.
dc.identifier.coursecode: TIFX61
dc.identifier.uri: https://hdl.handle.net/20.500.12380/311443
dc.language.iso: eng
dc.setspec.uppsok: PhysicsChemistryMaths
dc.subject: Multi-Robot Formation Control; Multi-Agent Reinforcement Learning; Behavior Cloning; Graph Neural Network.
dc.title: Strictly-Local Multi-Agent Formation Control via BC-Anchored Reinforcement Learning
dc.type.degree: Examensarbete för masterexamen (sv)
dc.type.degree: Master's Thesis (en)
dc.type.uppsok: H
local.programme: Data science and AI (MPDSC), MSc
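
The abstract describes a local spring controller built from the desired pairwise distances on communication edges, and a formation-stress reward aligned with it. The record does not give their exact form, so the following is only a minimal sketch under stated assumptions: a Hooke-law velocity command per edge and a mean-squared edge-length error as the stress. The names `spring_control`, `formation_stress`, the gain `k`, the step size, and the three-robot example are all hypothetical and are not taken from the thesis.

```python
import numpy as np

def spring_control(pos, edges, rest_len, k=1.0):
    """Assumed Hooke-law teacher: each robot sums spring terms over its own edges.

    pos: (n, 2) positions; edges: list of (i, j); rest_len: desired distance per edge.
    Only relative positions along listed edges are used, so the rule is local.
    """
    u = np.zeros_like(pos)
    for (i, j), d_star in zip(edges, rest_len):
        diff = pos[j] - pos[i]
        dist = np.linalg.norm(diff)
        if dist < 1e-9:
            continue  # coincident robots: direction undefined, skip this edge
        f = k * (dist - d_star) * diff / dist  # pulls i toward j when too far apart
        u[i] += f
        u[j] -= f
    return u

def formation_stress(pos, edges, rest_len):
    """Assumed stress: mean squared deviation of edge lengths from desired distances."""
    errs = [np.linalg.norm(pos[j] - pos[i]) - d for (i, j), d in zip(edges, rest_len)]
    return float(np.mean(np.square(errs)))

# Hypothetical example: three robots that should form a unit equilateral triangle.
edges = [(0, 1), (1, 2), (0, 2)]
rest_len = [1.0, 1.0, 1.0]
pos = np.array([[0.0, 0.0], [2.0, 0.0], [0.5, 1.5]])

initial_stress = formation_stress(pos, edges, rest_len)
for _ in range(200):
    pos = pos + 0.1 * spring_control(pos, edges, rest_len)  # explicit Euler step
final_stress = formation_stress(pos, edges, rest_len)
```

In this sketch the target shape enters only through `rest_len`, which mirrors the abstract's point that swapping the desired edge distances changes the formation without any global goal vector. A reward of the form `-formation_stress(...)` would be one way to align a reward with such a teacher, though the thesis may define it differently.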

Download

Original bundle

Name: Yihuai_Cai.pdf
Size: 2.42 MB
Format: Adobe Portable Document Format
