Why GNNs Need Positional Encodings
Published:
Permutation Equivariance: A Double-Edged Sword
GNNs are designed to be permutation equivariant: the result of processing a graph should not depend on the arbitrary labelling of nodes. If you permute the node indices, the output node embeddings permute accordingly.
This is a desirable property โ the graph has no canonical ordering, so the model should not depend on one.
But equivariance creates a problem: two nodes with identical structural roles get identical embeddings, even if their global position in the graph is different.
The Symmetric Node Problem
Consider a path graph: A โ B โ C โ D โ E
Nodes B and D are structurally symmetric: both have degree 2, and their 2-hop neighbourhoods are identical. A 2-layer GNN will assign B and D the same embedding.
But B and D may need different predictions: if A has a specific feature that makes โthe second node from Aโ important, but the GNN cannot distinguish B from D (both look the same locally), it cannot leverage this.
More extreme: in a perfectly regular graph (every node has the same degree), all nodes have identical K-hop neighbourhoods for all K โ all nodes get the same embedding.
Why Sequences Donโt Have This Problem
In a Transformer processing a sentence, position 3 is always โposition 3โ regardless of the tokenโs content. Positional encodings inject this absolute location.
In graphs, there is no canonical position 3. The graph has no start, no end, no linear order. This is why graph PEs must be derived from the graph structure itself.
What Positional Encodings Can Provide
A good graph PE x_pe_v should:
- Be unique (or near-unique) for each node โ distinguishes it from others
- Encode structural role โ similar nodes get similar encodings
- Be computationally efficient โ no O(Nยณ) precomputation
- Be transferable โ PE computed on training graphs should generalise to test graphs
Types of Graph Positional Information
| Type | What it encodes | Example |
|---|---|---|
| Positional | Where the node is in the global graph | Laplacian eigenvectors |
| Structural | What role the node plays locally | Degree, clustering coefficient, cycle membership |
| Distance-based | Distances to other nodes | Random walk landing probabilities |
Positional and structural encodings are complementary โ some tasks need absolute position, others need local role information.
Impact on Graph Transformers
For Graph Transformers (which lack the inductive structural bias of message passing), positional encodings are essential. Without them, the Transformer has no information about which nodes are connected โ it processes a set of feature vectors with no graph structure at all.
With Laplacian eigenvector PEs: the model can compute attention scores that reflect graph distance. With random walk PEs: the model can identify structurally similar nodes. Graph PEs are to Graph Transformers what sinusoidal encodings are to sequence Transformers.
Summary
| Without PEs | With PEs |
|---|---|
| Symmetric nodes are indistinguishable | Each node has a unique structural fingerprint |
| Regular graphs: all nodes identical | Eigenvector PEs break symmetry |
| Graph Transformer ignores structure | Structure encoded via PE attention biases |
| Limited to 1-WL expressiveness | Can exceed 1-WL |
The next posts cover specific PE methods: Laplacian eigenvectors, random walk PEs, shortest-path encodings, and the challenges they introduce.
References
- Dwivedi, V. P., Lim, A. T., Beaini, D., & Liรณ, P. (2021). Graph Neural Networks with Learnable Structural and Positional Representations. ICLR 2022.
- Srinivasan, B., & Ribeiro, B. (2020). On the Equivalence between Positional Node Embeddings and Structural Graph Representations. ICLR 2020.
