Strictly-Local Multi-Agent Formation Control via BC-Anchored Reinforcement Learning

Cai, Yihuai

dc.contributor.author	Cai, Yihuai
dc.contributor.department	Chalmers tekniska högskola / Institutionen för fysik	sv
dc.contributor.department	Chalmers University of Technology / Department of Physics	en
dc.contributor.examiner	Volpe, Giovanni
dc.contributor.supervisor	Volpe, Giovanni
dc.date.accessioned	2026-06-22T16:27:49Z
dc.date.issued	2026
dc.date.submitted
dc.description.abstract	Multi-robot formation control requires a group of robots to coordinate under limited communication and local sensing in order to form desired geometric structures. Classical artificial-potential or spring-based controllers are interpretable and stable, but are difficult to turn directly into trainable closed-loop neural policies. In contrast, learning formation behavior directly with multi-agent reinforcement learning often suffers from symmetry-induced local optima, unstable reward shaping, and poor exploration. This thesis studies strict-local self-organization for multi-robot formation control, where each robot observes only local relative geometry, local edge-distance information, and its own role encoding, without access to global target vectors or centralized planning. We propose a controller-guided learning framework that combines DAgger behavior cloning with Multi-Agent Proximal Policy Optimization (MAPPO) refinement. A local spring controller is first constructed from the desired pairwise distances of the target template and used as the teacher policy. A graph neural network with recurrent memory is then trained on the closed-loop state distribution induced by the learned policy. Finally, MAPPO refines the policy using a formation-stress reward aligned with the teacher controller, while a frozen behavior-cloning policy serves as an MSE anchor to prevent destructive policy drift. The target shape is conveyed entirely through the desired pairwise distance on local communication edges, so a single policy can switch between formations without any global goal vector. Experiments demonstrate that the proposed method achieves stable formation on 10-robot multi-shape tasks, reaching succ@0.4d = 0.998 and succ@0.25d = 0.987 across triangular, hexagonal, circular, and rectangular-grid target formations.
Using a single model per team size, trained on all of its target formations, the method retains partial scaling on the non-circular formations: succ@0.4d = 0.965 and succ@0.25d = 0.617 for the 21-robot system, and succ@0.4d = 0.525 and succ@0.25d = 0.254 for the 28-robot system. Large circular formations, however, remain unsolved at these larger team sizes, suggesting an empirical feasibility boundary of fixed-local formation control: the approach handles shapes that stay locally observable at the chosen communication scale, but struggles when the target geometry becomes globally sparse relative to the communication radius.
dc.identifier.coursecode	TIFX61
dc.identifier.uri	https://hdl.handle.net/20.500.12380/311443
dc.language.iso	eng
dc.setspec.uppsok	PhysicsChemistryMaths
dc.subject	Multi-Robot Formation Control; Multi-Agent Reinforcement Learning; Behavior Cloning; Graph Neural Network.
dc.title	Strictly-Local Multi-Agent Formation Control via BC-Anchored Reinforcement Learning
dc.type.degree	Examensarbete för masterexamen	sv
dc.type.degree	Master's Thesis	en
dc.type.uppsok	H
local.programme	Data science and AI (MPDSC), MSc

Download

Original bundle

Name: Yihuai_Cai.pdf
Size: 2.42 MB
Format: Adobe Portable Document Format

Collections

Master's theses