GNNs for Robotics: Planning, Manipulation, and Multi-Agent Systems
Why Graphs in Robotics
Problem 1: Variable structure. A robot arm picking up objects faces a different number of objects each time. A flat neural network with a fixed input size cannot handle this directly, whereas a GNN operates on graphs of any size.
Problem 2: Relational reasoning. Planning a stack requires reasoning about relations such as "object A is above object B, which is supported by the table". GNNs capture this relational structure explicitly.
Problem 3: Generalisation. A policy trained on a 4-link robot should generalise to a 6-link robot. A GNN that treats robot links as nodes can generalise to different numbers of links because the same message-passing function is applied at every node, regardless of graph size.
Application 1: Robot Morphology (NerveNet)
NerveNet (Wang et al., 2018) models a robot's body as a graph where:
- Nodes = actuators/joints
- Edges = kinematic connections (joint to joint)
- Node features = joint state (angle, velocity)
The GNN propagates information along the kinematic chain, and the policy maps the joint-level graph to actions. Crucially, the same GNN policy works for robots with different numbers of joints: a single policy was tested on 2-link, 4-link, and 6-link robots.
Advantage: the policy generalises to robot variants not seen during training, e.g., train on 4 legs, then test on 3 or 5 legs.
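To make the weight-sharing argument concrete, here is a minimal sketch of a kinematic-graph policy in plain Python. It is not NerveNet's actual architecture: the hidden size, the mean aggregation, the `tanh` updates, and the random weights standing in for trained parameters are all assumptions chosen for illustration. The point is only that one parameter set produces one action per joint for chains of any length.

```python
import math
import random

def make_params(hidden=4, seed=0):
    """Hypothetical parameters: random weights standing in for trained ones."""
    rng = random.Random(seed)
    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
    return {
        "encode": mat(hidden, 2),       # (angle, velocity) -> hidden state
        "self": mat(hidden, hidden),    # transforms a joint's own state
        "neigh": mat(hidden, hidden),   # transforms the aggregated neighbour state
        "decode": mat(1, hidden),       # hidden state -> one torque
    }

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def chain_edges(n):
    """Kinematic chain: joint i is connected to joint i + 1."""
    return [(i, i + 1) for i in range(n - 1)]

def policy(params, joint_states, edges, steps=2):
    n = len(joint_states)
    neighbours = [[] for _ in range(n)]
    for i, j in edges:
        neighbours[i].append(j)
        neighbours[j].append(i)
    # Encode each joint's (angle, velocity) with the same weights.
    h = [[math.tanh(x) for x in matvec(params["encode"], s)] for s in joint_states]
    dim = len(h[0])
    for _ in range(steps):
        new_h = []
        for i in range(n):
            count = max(len(neighbours[i]), 1)
            agg = [sum(h[j][k] for j in neighbours[i]) / count for k in range(dim)]
            own = matvec(params["self"], h[i])
            msg = matvec(params["neigh"], agg)
            new_h.append([math.tanh(a + b) for a, b in zip(own, msg)])
        h = new_h
    # Decode one bounded action per joint.
    return [math.tanh(matvec(params["decode"], hi)[0]) for hi in h]

params = make_params()
actions_4 = policy(params, [(0.1 * i, 0.0) for i in range(4)], chain_edges(4))
actions_6 = policy(params, [(0.1 * i, 0.0) for i in range(6)], chain_edges(6))
print(len(actions_4), len(actions_6))
```

Because no weight matrix has a dimension tied to the number of joints, the 4-joint and 6-joint calls reuse `params` unchanged. Whether the resulting behaviour transfers well is an empirical question that this sketch does not address.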
Application 2: Object Manipulation
Task-and-motion planning: plan a sequence of robot actions to achieve a goal (e.g., build a tower from blocks).
Scene graph GNN: represent the scene as a graph:
- Nodes = objects (position, shape, type)
- Edges = spatial relations (on-top-of, adjacent-to, in-front-of)
The GNN encodes the current state; a planning algorithm searches over sequences of actions and the predicted resulting states. The GNN's relational encoding enables compositional generalisation, such as solving 5-block towers after training on 3-block towers.
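The scene-graph construction described above can be sketched as follows. This covers only the graph-building step, not the GNN encoder or the planner. The `Block` type, the simplified 2D geometry (horizontal position `x` and base height `z`), and the `clear_blocks` helper are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Block:
    name: str
    x: float            # horizontal position of the block centre
    z: float            # height of the block's base above the table
    size: float = 1.0   # edge length

def on_top_of(a, b, tol=1e-6):
    """True if a rests directly on b: centres roughly aligned, a's base at b's top."""
    return abs(a.x - b.x) < b.size / 2 and abs(a.z - (b.z + b.size)) < tol

def build_scene_graph(blocks):
    """Nodes are objects; edges are (subject, relation, object) triples."""
    nodes = {b.name: b for b in blocks}
    edges = [
        (a.name, "on-top-of", b.name)
        for a in blocks
        for b in blocks
        if a is not b and on_top_of(a, b)
    ]
    return nodes, edges

def clear_blocks(nodes, edges):
    """Blocks with nothing on top of them, i.e. the ones a planner could pick up."""
    covered = {dst for _, rel, dst in edges if rel == "on-top-of"}
    return sorted(name for name in nodes if name not in covered)

three = [Block("A", 0.0, 0.0), Block("B", 0.0, 1.0), Block("C", 2.0, 0.0)]
five = [Block("A", 0.0, 0.0), Block("B", 0.0, 1.0), Block("C", 0.0, 2.0),
        Block("D", 3.0, 0.0), Block("E", 3.0, 1.0)]

nodes3, edges3 = build_scene_graph(three)
nodes5, edges5 = build_scene_graph(five)
print(clear_blocks(nodes3, edges3))
print(clear_blocks(nodes5, edges5))
```

The same construction handles three or five blocks without modification, which is the property a relational encoder relies on when it is asked to generalise to larger scenes.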
Application 3: Multi-Robot Coordination
Decentralised multi-robot planning: N robots must coordinate without a central controller. Each robot observes local state and communicates with nearby robots.
CommNet / GMMN: model inter-robot communication as a GNN. At each step:
- Each robot sends a message to nearby robots (along the edges of the proximity graph)
- Each robot aggregates received messages
- Each robot decides its action based on own state + aggregated messages
The GNN is the communication protocol, and it is trained via multi-agent RL.
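The three steps above can be sketched as one communication round. In this illustration the message is simply the sender's position and the decision rule is a hand-written "move toward the neighbours' mean position" update. Both are assumptions standing in for the learned message and policy functions, and the `radius` and `gain` parameters are hypothetical.

```python
import math

def proximity_graph(positions, radius):
    """neighbours[i] lists the robots within communication range of robot i."""
    n = len(positions)
    return [
        [j for j in range(n)
         if j != i and math.dist(positions[i], positions[j]) <= radius]
        for i in range(n)
    ]

def communication_step(positions, radius, gain=0.5):
    neighbours = proximity_graph(positions, radius)
    actions = []
    for i, (x, y) in enumerate(positions):
        # Step 1: receive one message per neighbour (here: the sender's position).
        messages = [positions[j] for j in neighbours[i]]
        if not messages:
            actions.append((0.0, 0.0))   # isolated robot: no information, stay put
            continue
        # Step 2: aggregate messages with a permutation-invariant mean.
        mean_x = sum(m[0] for m in messages) / len(messages)
        mean_y = sum(m[1] for m in messages) / len(messages)
        # Step 3: act on own state plus the aggregate.
        actions.append((gain * (mean_x - x), gain * (mean_y - y)))
    return actions

small = communication_step([(0.0, 0.0), (1.0, 0.0), (10.0, 10.0)], radius=2.0)
fleet_5 = communication_step([(float(i), 0.0) for i in range(5)], radius=1.5)
fleet_20 = communication_step([(float(i), 0.0) for i in range(20)], radius=1.5)
print(small)
print(len(fleet_5), len(fleet_20))
```

Each robot uses only information from robots within `radius`, so the rule is decentralised, and the same function runs for 5 or 20 robots because nothing in it depends on the fleet size.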
Key results:
- GNN-based communication outperforms no-communication baselines by 40%+ on cooperative navigation tasks
- Scales from N=5 to N=20 robots without retraining (variable graph size)
Application 4: Physics Simulation and Model-Based RL
Interaction networks (Battaglia et al., 2016): model physical systems as graphs. Nodes = objects, edges = interactions. GNN predicts next state from current state.
Applications:
- Cloth simulation: nodes = vertices, edges = cloth edges
- Rigid body dynamics: nodes = objects, edges = contact constraints
- Particle systems: nodes = particles, edges = proximity
Model-based RL with a GNN dynamics model: learn the physical model as a GNN, then use it for planning (model-predictive control or model-based policy search). GNNs generalise to unseen object configurations because the learned dynamics are shared across objects rather than tied to a particular object or object count.
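The structure of such a dynamics model can be sketched as a relation function applied to every edge, followed by an object function applied to every node. In the sketch below both functions are hand-written (a 1D spring force and an Euler update) in place of the learned networks, so it illustrates the computation pattern only. The dictionary keys and the `stiffness`, `rest_length`, and `dt` parameters are assumptions.

```python
def relation_effect(sender, receiver, stiffness=1.0, rest_length=1.0):
    """Stand-in for the learned relation model: spring force acting on the receiver."""
    displacement = sender["pos"] - receiver["pos"]
    stretch = abs(displacement) - rest_length
    direction = 1.0 if displacement > 0 else -1.0
    return stiffness * stretch * direction

def step(objects, edges, dt=0.1):
    """Predict the next state of every object from the current state."""
    # Relation phase: compute one effect per directed edge, summed per receiver.
    effects = [0.0] * len(objects)
    for sender, receiver in edges:
        effects[receiver] += relation_effect(objects[sender], objects[receiver])
    # Object phase: stand-in for the learned object model (Euler update).
    next_objects = []
    for obj, force in zip(objects, effects):
        vel = obj["vel"] + dt * force / obj["mass"]
        pos = obj["pos"] + dt * vel
        next_objects.append({"pos": pos, "vel": vel, "mass": obj["mass"]})
    return next_objects

# Two unit masses joined by a spring stretched to twice its rest length.
state = [
    {"pos": 0.0, "vel": 0.0, "mass": 1.0},
    {"pos": 2.0, "vel": 0.0, "mass": 1.0},
]
edges = [(0, 1), (1, 0)]   # directed: sender -> receiver
next_state = step(state, edges)
print(next_state)
```

Adding a third object only means appending a node and its edges. `relation_effect` and the update rule stay the same, which is the sense in which the dynamics are shared across objects. A planner such as model-predictive control would call `step` repeatedly to roll out candidate action sequences.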
Application 5: Point Cloud Processing for Perception
LiDAR sensors produce 3D point clouds, which are unordered sets of 3D points. GNNs can process point clouds by constructing a graph (k-nearest neighbours) and running message passing.
DGCNN (Wang et al., 2019): a dynamic graph CNN that rebuilds the k-NN graph after each layer (in feature space, not just in spatial coordinates). It reported state-of-the-art results at the time on ModelNet40 (3D object classification) and ShapeNet (part segmentation).
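A rough sketch of the dynamic-graph idea follows: each layer recomputes the k-NN graph from its own input features, builds an edge input from a point's features and the offset to each neighbour, applies a shared function, and max-pools over neighbours. The single linear-plus-ReLU edge function, the layer sizes, and the random weights are assumptions for illustration, not the published DGCNN configuration.

```python
import math
import random

def knn_indices(features, k):
    """For each point, the indices of its k nearest neighbours in the given feature space."""
    result = []
    for i, fi in enumerate(features):
        others = [j for j in range(len(features)) if j != i]
        others.sort(key=lambda j: math.dist(fi, features[j]))
        result.append(others[:k])
    return result

def edge_conv(features, weights, k):
    """One edge-convolution layer; the graph is rebuilt from `features` on every call."""
    neighbours = knn_indices(features, k)
    outputs = []
    for i, fi in enumerate(features):
        pooled = [0.0] * len(weights)   # ReLU outputs are >= 0, so 0 is a safe start for max
        for j in neighbours[i]:
            # Edge input: the point's own features plus the offset to its neighbour.
            edge_input = list(fi) + [b - a for a, b in zip(fi, features[j])]
            message = [max(0.0, sum(w * x for w, x in zip(row, edge_input)))
                       for row in weights]
            pooled = [max(p, m) for p, m in zip(pooled, message)]
        outputs.append(pooled)
    return outputs

def random_weights(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
cloud = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(8)]
layer1 = random_weights(4, 6, rng)   # input: 3 coordinates + 3 offset components
layer2 = random_weights(4, 8, rng)   # input: 4 features + 4 offset components

h1 = edge_conv(cloud, layer1, k=3)   # neighbours chosen in 3D space
h2 = edge_conv(h1, layer2, k=3)      # neighbours re-chosen in feature space
print(len(h2), len(h2[0]))
```

The second call selects neighbours by distance between the layer-1 features rather than between the original coordinates, so points that are far apart spatially but similar in feature space can exchange messages.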
Equivariant GNNs for point clouds (EGNN): maintain SE(3) equivariance, so that rotating or translating the input transforms the output in the same way, regardless of LiDAR orientation.
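The equivariance property can be checked numerically with a simplified coordinate update in the spirit of EGNN: each point moves along the difference vectors to the other points, weighted by a function of the squared distance only. The fixed `exp(-d^2)` weighting stands in for a learned function, node features are omitted, and the `scale` parameter is hypothetical, so this sketches the principle rather than the published layer.

```python
import math

def equivariant_update(points, scale=0.1):
    """Move each point along relative vectors, weighted by a rotation-invariant scalar."""
    n = len(points)
    updated = []
    for i, xi in enumerate(points):
        shift = [0.0, 0.0, 0.0]
        for j, xj in enumerate(points):
            if i == j:
                continue
            diff = [a - b for a, b in zip(xi, xj)]
            d2 = sum(c * c for c in diff)      # invariant to rotation and translation
            weight = math.exp(-d2)             # stand-in for a learned scalar function
            shift = [s + weight * c for s, c in zip(shift, diff)]
        updated.append([a + scale * s / max(n - 1, 1) for a, s in zip(xi, shift)])
    return updated

def rotate_z(points, angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[c * x - s * y, s * x + c * y, z] for x, y, z in points]

cloud = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [1.0, 1.0, 3.0]]
update_then_rotate = rotate_z(equivariant_update(cloud), 0.7)
rotate_then_update = equivariant_update(rotate_z(cloud, 0.7))

max_gap = max(
    abs(a - b)
    for p, q in zip(update_then_rotate, rotate_then_update)
    for a, b in zip(p, q)
)
print(max_gap < 1e-9)
```

The two orders of operation agree up to floating-point error because the weights depend only on distances, while the direction of each shift rotates together with the input.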
Summary
| Application | Graph structure | Key challenge solved |
|---|---|---|
| Robot morphology | Kinematic graph | Generalise to new robot designs |
| Object manipulation | Scene graph | Compositional planning |
| Multi-robot | Proximity/communication graph | Scalable coordination |
| Physics simulation | Particle/object interaction graph | Generalise to new configurations |
| Point cloud perception | k-NN graph | Unordered 3D data |
Robotics is one of the most natural application domains for GNNs because physical and relational structure is explicit and actionable. The field is rapidly adopting GNN-based representations for perception, dynamics modelling, planning, and multi-agent control.
References
- Wang, T., Liao, R., Ba, J., & Fidler, S. (2018). NerveNet: Learning Structured Policy with Graph Neural Networks. ICLR 2018 (NerveNet: kinematic graph GNNs for robot locomotion policies that generalise across morphologies).
- Battaglia, P., Pascanu, R., Lai, M., Rezende, D. J., & Kavukcuoglu, K. (2016). Interaction Networks for Learning about Objects, Relations and Physics. NeurIPS 2016 (Interaction Networks: object-relation graphs for physics simulation; foundational for GNN robotics applications).
- Tolstaya, E., Gama, F., Paulos, J., Pappas, G., Kumar, V., & Ribeiro, A. (2020). Learning Decentralized Controllers for Robot Swarms with Graph Neural Networks. CoRL 2019, proceedings published 2020 (GNN-based decentralised multi-robot coordination that scales to large swarms without per-robot retraining).
