Global Pooling in GNNs: Mean, Sum, and Max
The Readout Problem
A K-layer GNN produces a set of node embeddings {h^{(K)}_v : v ∈ V}. For node-level tasks (node classification, link prediction), these are used directly. For graph-level tasks (graph classification, graph regression), they must be compressed into a single vector h_G.
This compression is the readout or global pooling step. It must be:
- Permutation-invariant: reordering the nodes of the same graph must not change h_G
- Differentiable: so the whole model can be trained end to end
- Expressive: non-isomorphic graphs should map to different embeddings
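The permutation-invariance requirement can be checked empirically on a toy graph. The sketch below is plain Python and assumes node embeddings are given as a list of equal-length float vectors; the names `sum_readout`, `first_node_readout` and `is_permutation_invariant` are illustrative, not taken from any library. It tries every node ordering, so it is only practical for tiny graphs.

```python
import math
from itertools import permutations

def sum_readout(H):
    """Sum node embeddings dimension by dimension (H: list of equal-length vectors)."""
    return [sum(col) for col in zip(*H)]

def first_node_readout(H):
    """Deliberately order-dependent: returns whichever node happens to be listed first."""
    return list(H[0])

def is_permutation_invariant(readout, H):
    """Compare the readout of every node ordering against the original ordering.

    Exhaustive (n! orderings), so only suitable for tiny illustrative graphs.
    math.isclose tolerates float rounding from summing in a different order.
    """
    reference = readout(H)
    for order in permutations(H):
        out = readout(list(order))
        if not all(math.isclose(a, b) for a, b in zip(out, reference)):
            return False
    return True

H = [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0]]
print(is_permutation_invariant(sum_readout, H))         # True
print(is_permutation_invariant(first_node_readout, H))  # False
```

Mean, sum and max all pass this check; a readout that depends on the position of a node in the input list does not.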
Mean Pooling
h_G = (1/|V|) Σ_{v ∈ V} h^{(K)}_v
Properties:
- Permutation-invariant: ✓
- Normalised by graph size: yes (divides by |V|)
- Sensitive to graph size: no (a graph of 10 identical nodes and a graph of 100 copies of that same node → same embedding)
- Captures average node behaviour
When to use: tasks where the typical node matters — e.g., average atom property in a molecule, average sentiment in a document graph.
Failure case: cannot distinguish a graph with one active node from a graph with 100 identical active nodes — mean pooling normalises out the count.
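The failure case is easy to reproduce. This is a minimal plain-Python sketch of mean pooling, assuming node embeddings are lists of floats; `mean_pool` and the `active` embedding values are illustrative.

```python
def mean_pool(H):
    """Mean readout: h_G = (1/|V|) * sum of node embeddings, per dimension."""
    if not H:
        raise ValueError("mean pooling is undefined for a graph with no nodes")
    n = len(H)
    return [sum(col) / n for col in zip(*H)]

active = [1.0, 0.5]            # hypothetical embedding of an "active" node
one_node = [active]
hundred_nodes = [active] * 100

print(mean_pool(one_node))       # [1.0, 0.5]
print(mean_pool(hundred_nodes))  # [1.0, 0.5]  (the count is normalised out)
```

Raising on an empty graph is a design choice for the sketch; library implementations typically handle this case differently, for example by returning zeros.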
Sum Pooling
h_G = Σ_{v ∈ V} h^{(K)}_v
Properties:
- Permutation-invariant: ✓
- Sensitive to graph size: yes (more nodes → larger magnitude)
- Injective over multisets: yes, for multisets of bounded size over a countable feature space, provided node embeddings first pass through a suitable transformation (Xu et al., 2019)
- Captures total contribution of all nodes
When to use: tasks where the total matters — e.g., total charge of a molecule, total influence in a social network.
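Expressive power: Xu et al. (GIN, 2019) showed that, over multisets, sum aggregation is strictly more expressive than mean or max. Mean keeps only the proportions of each element, max keeps only the set of distinct elements, while sum (after a suitable node-wise transformation) can preserve the full multiset, including counts.
Failure case: the output magnitude grows with graph size, which may be undesirable when graph sizes vary widely, since large graphs hand downstream layers embeddings on a much larger scale. Sum is also blind to nodes with zero embeddings: a graph with 100 zero-embedding nodes has the same sum as a graph with 0 nodes.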
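Both behaviours of sum pooling, count sensitivity and blindness to zero-embedding nodes, show up in a few lines. This is a plain-Python sketch; `sum_pool` and its `dim` parameter (needed only to give the empty graph a shape) are illustrative.

```python
def sum_pool(H, dim=None):
    """Sum readout: h_G = sum of node embeddings, per dimension.

    `dim` is only needed for an empty graph, where no node reveals the width.
    """
    if not H:
        if dim is None:
            raise ValueError("pass dim= to sum-pool an empty graph")
        return [0.0] * dim
    return [sum(col) for col in zip(*H)]

active = [1.0, 0.5]                # hypothetical node embedding
print(sum_pool([active]))          # [1.0, 0.5]
print(sum_pool([active] * 100))    # [100.0, 50.0]  (count is preserved)

zeros = [[0.0, 0.0]] * 100
print(sum_pool(zeros))             # [0.0, 0.0]
print(sum_pool([], dim=2))         # [0.0, 0.0]  (same as the 100 zero nodes)
```

Unlike mean pooling, one active node and 100 copies of it are now separated, but the 100 zero-embedding nodes vanish entirely.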
Max Pooling
h_G = max_{v ∈ V} h^{(K)}_v (taken element-wise, one maximum per embedding dimension)
Properties:
- Permutation-invariant: ✓
- Captures the most prominent feature value in each dimension
- Insensitive to multiplicity: ignores non-maximal values and how many nodes share the maximum
When to use: tasks where the extreme matters — e.g., is there any toxic functional group? Does any node have property X?
Failure case: cannot distinguish {1, 2} from {2} — max pooling drops information about non-maximal elements.
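A plain-Python sketch of max pooling reproduces the {1, 2} vs {2} example with one-dimensional embeddings; `max_pool` is an illustrative name. The last line shows a less obvious consequence of taking the maximum per dimension.

```python
def max_pool(H):
    """Max readout: element-wise maximum over node embeddings."""
    if not H:
        raise ValueError("max pooling is undefined for a graph with no nodes")
    return [max(col) for col in zip(*H)]

# One-dimensional embeddings: {1, 2} and {2} collapse to the same readout.
print(max_pool([[1.0], [2.0]]))  # [2.0]
print(max_pool([[2.0]]))         # [2.0]

# Each dimension is maximised independently, so h_G need not equal any single node.
print(max_pool([[1.0, 0.0], [0.0, 1.0]]))  # [1.0, 1.0]
```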
Expressivity Ranking
For graph-level tasks requiring discrimination between non-isomorphic graphs:
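Sum > Mean > Max (in distinguishing power over multisets, as ranked by Xu et al.; mean and max are strictly ordered only for categorical node features, since on real-valued features max separates {0, 2} from {1, 1} while mean does not)
Sum aggregation is the foundation of GIN's graph-level expressiveness. The GIN paper proved that if the node-update functions (neighbour aggregation and combination) and the graph-level readout are all injective over multisets, the resulting GNN is as powerful as the 1-WL test at distinguishing non-isomorphic graphs. Sum over suitably transformed node embeddings is what makes such an injective readout achievable.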
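The gap between the three readouts can be reproduced on tiny multisets with one-hot node features for two hypothetical node types, a and b. This plain-Python sketch uses `fractions.Fraction` so that equality comparisons are exact; the helper names are illustrative, and it demonstrates only the multiset argument, not the full 1-WL result.

```python
from fractions import Fraction  # exact arithmetic keeps the comparisons honest

def one_hot(i, dim):
    return [Fraction(1) if j == i else Fraction(0) for j in range(dim)]

def sum_pool(H):
    return [sum(col) for col in zip(*H)]

def mean_pool(H):
    return [sum(col) / len(H) for col in zip(*H)]

def max_pool(H):
    return [max(col) for col in zip(*H)]

a, b = one_hot(0, 2), one_hot(1, 2)   # two hypothetical node types
pairs = {
    "{a, b} vs {a, a, b, b}": ([a, b], [a, a, b, b]),  # same proportions, same set
    "{a, b} vs {a, a, b}": ([a, b], [a, a, b]),        # same set, different proportions
}
readouts = {"sum": sum_pool, "mean": mean_pool, "max": max_pool}

for label, (g1, g2) in pairs.items():
    separated = [name for name, f in readouts.items() if f(g1) != f(g2)]
    print(label, "->", separated)
# {a, b} vs {a, a, b, b} -> ['sum']
# {a, b} vs {a, a, b} -> ['sum', 'mean']
```

With one-hot features the sum is exactly the per-type node count, which is why it separates every pair here, whereas mean sees only proportions and max only which types are present.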
Combinations and Hierarchical Pooling
In practice, combining multiple pooling types often works best:
h_G = concat( mean_pool(H), sum_pool(H), max_pool(H) )
This captures average behaviour (mean), count sensitivity (sum), and extreme values (max) simultaneously.
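In mini-batch training, several graphs are usually pooled at once. The plain-Python sketch below computes the concatenated readout per graph, assuming a `batch` list that assigns each node to a graph index (a convention similar to the one PyTorch Geometric uses with `global_mean_pool`, `global_add_pool` and `global_max_pool`); `concat_readout` itself is a hypothetical helper.

```python
from collections import defaultdict

def concat_readout(H, batch):
    """Concatenate mean, sum and max pooling for every graph in a mini-batch.

    H     : list of node embeddings (equal-length float vectors)
    batch : batch[i] is the index of the graph that node i belongs to
    Returns {graph_id: [mean..., sum..., max...]}, i.e. 3 * d values per graph.
    """
    groups = defaultdict(list)
    for h, g in zip(H, batch):
        groups[g].append(h)

    out = {}
    for g, nodes in sorted(groups.items()):
        cols = list(zip(*nodes))
        total = [sum(c) for c in cols]          # sum pooling
        mean = [t / len(nodes) for t in total]  # mean pooling
        peak = [max(c) for c in cols]           # max pooling
        out[g] = mean + total + peak
    return out

H = [[1.0, 0.0], [3.0, 2.0],  # graph 0: two nodes
     [2.0, 2.0]]              # graph 1: one node
batch = [0, 0, 1]
print(concat_readout(H, batch))
# {0: [2.0, 1.0, 4.0, 2.0, 3.0, 2.0], 1: [2.0, 2.0, 2.0, 2.0, 2.0, 2.0]}
```

Because both the sum and the mean are present, their ratio in any non-zero dimension equals |V|, so downstream layers can recover graph size. The cost is a readout three times as wide as a single pooling type.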
For graphs where structure at different scales matters (molecules with atoms and functional groups, social networks with individuals and communities), hierarchical pooling, covered in the DiffPool and TopK-Pool posts, is more appropriate than flat global pooling.
Summary
| Pooling | Formula | Sensitive to Size | Information Captured | Best For |
|---|---|---|---|---|
| Mean | Σ_v h_v / \|V\| | No | Average node behaviour | Distribution of properties |
| Sum | Σ_v h_v | Yes | Total + count | Additive properties |
| Max | max_v h_v (element-wise) | No | Extreme values | Existence queries |
| Concat(all) | [mean; sum; max] | Partial | Combined | General tasks |
The choice of readout is as important as the choice of message-passing architecture. On some graph classification benchmarks, switching from mean to sum pooling alone can change accuracy by 5-10 percentage points.
References
- Xu, K., Hu, W., Leskovec, J., & Jegelka, S. (2019). How Powerful are Graph Neural Networks? ICLR 2019 (proves sum readout is strictly more expressive than mean or max).
- Zaheer, M., Kottur, S., Ravanbakhsh, S., Poczos, B., Salakhutdinov, R., & Smola, A. J. (2017). Deep Sets. NeurIPS 2017 (theory of permutation-invariant functions over sets).
