Why GNNs Need Positional Encodings

3 minute read

Published:

TL;DR: GNNs are permutation-equivariant: relabelling nodes does not change the output. This means two structurally identical but geometrically different nodes get the same embedding. Positional encodings break this symmetry โ€” injecting node-specific structural information that message passing alone cannot provide.

Permutation Equivariance: A Double-Edged Sword

GNNs are designed to be permutation equivariant: the result of processing a graph should not depend on the arbitrary labelling of nodes. If you permute the node indices, the output node embeddings permute accordingly.

This is a desirable property โ€” the graph has no canonical ordering, so the model should not depend on one.

But equivariance creates a problem: two nodes with identical structural roles get identical embeddings, even if their global position in the graph is different.

The Symmetric Node Problem

Consider a path graph: A โ€” B โ€” C โ€” D โ€” E

Nodes B and D are structurally symmetric: both have degree 2, and their 2-hop neighbourhoods are identical. A 2-layer GNN will assign B and D the same embedding.

But B and D may need different predictions: if A has a specific feature that makes โ€œthe second node from Aโ€ important, but the GNN cannot distinguish B from D (both look the same locally), it cannot leverage this.

More extreme: in a perfectly regular graph (every node has the same degree), all nodes have identical K-hop neighbourhoods for all K โ†’ all nodes get the same embedding.

Why Sequences Donโ€™t Have This Problem

In a Transformer processing a sentence, position 3 is always โ€œposition 3โ€ regardless of the tokenโ€™s content. Positional encodings inject this absolute location.

In graphs, there is no canonical position 3. The graph has no start, no end, no linear order. This is why graph PEs must be derived from the graph structure itself.

What Positional Encodings Can Provide

A good graph PE x_pe_v should:

  1. Be unique (or near-unique) for each node โ€” distinguishes it from others
  2. Encode structural role โ€” similar nodes get similar encodings
  3. Be computationally efficient โ€” no O(Nยณ) precomputation
  4. Be transferable โ€” PE computed on training graphs should generalise to test graphs

Types of Graph Positional Information

TypeWhat it encodesExample
PositionalWhere the node is in the global graphLaplacian eigenvectors
StructuralWhat role the node plays locallyDegree, clustering coefficient, cycle membership
Distance-basedDistances to other nodesRandom walk landing probabilities

Positional and structural encodings are complementary โ€” some tasks need absolute position, others need local role information.

Impact on Graph Transformers

For Graph Transformers (which lack the inductive structural bias of message passing), positional encodings are essential. Without them, the Transformer has no information about which nodes are connected โ€” it processes a set of feature vectors with no graph structure at all.

With Laplacian eigenvector PEs: the model can compute attention scores that reflect graph distance. With random walk PEs: the model can identify structurally similar nodes. Graph PEs are to Graph Transformers what sinusoidal encodings are to sequence Transformers.

Summary

Without PEsWith PEs
Symmetric nodes are indistinguishableEach node has a unique structural fingerprint
Regular graphs: all nodes identicalEigenvector PEs break symmetry
Graph Transformer ignores structureStructure encoded via PE attention biases
Limited to 1-WL expressivenessCan exceed 1-WL

The next posts cover specific PE methods: Laplacian eigenvectors, random walk PEs, shortest-path encodings, and the challenges they introduce.

References