Paper 1: Graph Neural Network for Decentralized Multi-Robot Goal Assignment
Uses a graph neural network (GNN) to solve the Linear Sum Assignment Problem (LSAP) under communication constraints: minimizing total cost subject to a one-to-one robot-to-goal assignment
Paper information
- Title: Graph Neural Network for Decentralized Multi-Robot Goal Assignment
- Authors / affiliation: Manohari Goarin, Giuseppe Loianno / Tandon School of Engineering, New York University, Brooklyn, NY 11201 USA
- Venue: IEEE Robotics and Automation Letters, Vol. 9, No. 5, May 2024
- Link: https://ieeexplore.ieee.org/document/10452797
Background and contributions
- Approaches to the LSAP divide into centralized and decentralized methods
- Decentralized methods include
  - Optimization-based methods: e.g., the distributed Hungarian algorithm
  - Market-based methods: e.g., auction algorithms
  - Learning-based methods: e.g., GNNs
- Main contribution: handles the case with communication-topology constraints
Method
The network's input is a heterogeneous graph whose nodes are the robots r and goals g, and whose edges encode the communication-topology constraints and the assignment costs.
The network's output is, for each robot, the probability of being assigned each goal.
Training is supervised learning, imitating the optimal LSAP solution of the centralized Hungarian algorithm.
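Supervision targets of this kind can be generated with a centralized Hungarian solver; a minimal sketch of producing one-hot assignment labels (the toy cost matrix and the use of `scipy` are illustrative assumptions, not the paper's code):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Toy cost matrix: cost[i, j] = cost of robot i serving goal j
# (e.g., travel distance). Values are illustrative.
cost = np.array([
    [4.0, 1.0, 3.0],
    [2.0, 0.0, 5.0],
    [3.0, 2.0, 2.0],
])

# Centralized Hungarian solution of the LSAP: a one-to-one
# robot-to-goal assignment minimizing total cost.
rows, cols = linear_sum_assignment(cost)

# One-hot assignment matrix usable as the supervision target:
# label[i, j] = 1 iff robot i is assigned goal j.
label = np.zeros_like(cost)
label[rows, cols] = 1.0

print(cols.tolist(), cost[rows, cols].sum())  # [1, 0, 2] 5.0
```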

Results and assessment
GNN-based methods offer one way to model and represent the relationships between tasks and agents.
Paper 2:Dynamic Coalition Formation and Routing for Multirobot Task Allocation via Reinforcement Learning
Uses an attention network to solve the task-allocation problem for a homogeneous robot swarm (ST-MR-TA)
Paper information
- Title: Dynamic Coalition Formation and Routing for Multirobot Task Allocation via Reinforcement Learning
- Authors / affiliation: Weiheng Dai, Aditya Bidwai, Guillaume Sartoretti / Department of Mechanical Engineering, College of Design and Engineering, National University of Singapore
- Venue: 2024 IEEE International Conference on Robotics and Automation (ICRA)
- Link: https://ieeexplore.ieee.org/document/10611244/
Background and contributions
- Work falls under the category of ST-MR-TA, where each robot can perform only one task at a time (ST), each task can require the cooperation of multiple robots (MR), and task allocation happens continuously across time (TA).
- Agents learn to reason about their position, the status of all tasks, as well as the position and short-term intent of other agents, to make reactive movement decisions (i.e., which task to travel to and complete next).
- Provides a leader-follower trick that reduces the number of decision variables actually used during training
Method

Observation (task state, agent info, task-already-done flag) --[Linear Projection]--> embeddings --[Multi-head Attention Encoder]--> context --[Decoder]--> probability distribution over tasks
Training: the REINFORCE [28] algorithm with a greedy rollout baseline
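The encode-decode pattern above can be sketched with single-head scaled dot-product attention (the paper uses stacked multi-head layers; dimensions and random weights here are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Scaled dot-product attention.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

n_tasks, obs_dim, d = 4, 6, 8

# One observation row per task (task state, agent info, done flag); toy values.
obs = rng.normal(size=(n_tasks, obs_dim))

# Linear projection to embeddings.
W_proj = rng.normal(size=(obs_dim, d))
emb = obs @ W_proj

# One self-attention encoder layer producing per-task context vectors.
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
context = attention(emb @ Wq, emb @ Wk, emb @ Wv)

# Decoder: score each task from a query (here: mean context) and output
# a probability distribution over which task to travel to next.
query = context.mean(axis=0, keepdims=True)
logits = (query @ Wq) @ (context @ Wk).T / np.sqrt(d)
probs = softmax(logits).ravel()  # sums to 1 over the n_tasks tasks
```

In REINFORCE training, an action would be sampled from `probs` and its log-probability weighted by the advantage over the greedy rollout baseline.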
Results and assessment
- Builds the planner with an attention network, providing a concrete example of this approach
- The agents use global information, not locally sensed (partial) information
Paper 3:Learning Policies for Dynamic Coalition Formation in Multi-Robot Task Allocation
One-sentence summary
Learns decentralized coalition-formation policies under partial observability by modeling the problem as a Dec-POMDP and training a U-Net-based policy with MAPPO
Paper information
- Title: Learning Policies for Dynamic Coalition Formation in Multi-Robot Task Allocation
- Authors / affiliation: Lucas C. D. Bezerra, Ataíde M. G. dos Santos, and Shinkyu Park / Electrical and Computer Engineering, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia; Department of Electrical Engineering, Federal University of Sergipe (UFS), São Cristóvão, Sergipe 49107-230, Brazil
- Venue: IEEE Robotics and Automation Letters, Vol. 10, No. 9, September 2025
- Link: https://ieeexplore.ieee.org/document/11091462
Background and contributions
- Multi-Robot Task Allocation (MRTA): Single-Task robots, Multi-Robot tasks, Time-extended Assignment (ST-MR-TA)
- The problem of decentralized dynamic coalition formation under partial observability has not been previously addressed
- Focus on developing policies for a team of robots capable of performing tasks that require coalition in dynamic environments.
- An end-to-end convolutional neural network based on the U-Net architecture
Method
- In this framework, the policy selects a task at each time step, while a motion planner handles the low-level actuator control to navigate the robot to the task location; this abstraction frees the learned policy from low-level control, allowing it to concentrate on long-term planning.
- Model the problem as a Decentralized Partially-Observable Markov Decision Process (Dec-POMDP)
- Adopt MAPPO, an actor-critic CTDE algorithm designed for MARL
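The CTDE split in MAPPO-style training can be sketched as follows: actors act on local observations, while a centralized critic sees the joint state during training only. A toy linear actor/critic sketch (shapes, names, and the REINFORCE-style surrogate are illustrative assumptions; full MAPPO adds the PPO clipped objective and GAE):

```python
import numpy as np

rng = np.random.default_rng(1)

n_agents, local_dim, n_actions = 3, 5, 4
global_dim = n_agents * local_dim  # the critic sees the joint state

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Decentralized actors: map each agent's *local* observation to action
# probabilities (parameters shared across agents here).
actor_W = rng.normal(size=(local_dim, n_actions)) * 0.1

# Centralized critic: maps the *joint* observation to a value estimate,
# available only at training time (CTDE).
critic_w = rng.normal(size=global_dim) * 0.1

local_obs = rng.normal(size=(n_agents, local_dim))
joint_obs = local_obs.ravel()

probs = np.array([softmax(o @ actor_W) for o in local_obs])
actions = np.array([rng.choice(n_actions, p=p) for p in probs])

# Advantage from the centralized critic (toy one-step reward).
reward = 1.0
advantage = reward - critic_w @ joint_obs

# Policy-gradient step for each actor.
lr = 0.01
for o, a, p in zip(local_obs, actions, probs):
    grad_logp = np.outer(o, np.eye(n_actions)[a] - p)  # d log pi / d W
    actor_W += lr * advantage * grad_logp

# Critic regression step toward the observed return.
critic_w += lr * (reward - critic_w @ joint_obs) * joint_obs
```

At execution time only `actor_W` and each agent's local observation are needed; `critic_w` and `joint_obs` are discarded, which is the point of the CTDE design.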

Results and assessment
- Models the problem as a Dec-POMDP and trains the policy with an actor-critic algorithm