The Weisfeiler-Lehman Test: How Powerful Are GNNs?

5 minute read

Published:

TL;DR: The 1-WL test repeatedly hashes each node's label with the multiset of its neighbours' labels until stable. Two graphs are "probably isomorphic" if they produce the same sequence of label histograms. Xu et al. (2019) proved that no MPNN can be more expressive than 1-WL, and GIN achieves exactly this bound by using sum aggregation and an injective MLP.

What Is Graph Isomorphism?

Two graphs G and G’ are isomorphic (G ≅ G’) if there exists a bijection σ: V(G) → V(G’) such that (u,v) ∈ E(G) ↔ (σ(u), σ(v)) ∈ E(G’). Informally: they are the same graph up to node relabelling.

Graph isomorphism testing is a fundamental problem. It is not known to be in P or NP-complete (it’s in NP ∩ co-AM ∩ GI-complete). For practical purposes, the 1-WL algorithm provides a fast, powerful (but not complete) test.

The 1-WL Algorithm (Color Refinement)

Initialisation: assign each node a colour (label) based on its initial feature or degree.

Iteration: at each step, each node v updates its colour by hashing the multiset of its neighbours’ colours:

c^{(k)}_v = HASH( c^{(k-1)}_v, {{c^{(k-1)}_u : u ∈ N(v)}} )

Where {{ }} denotes a multiset (counts matter, order does not).

Termination: stop when no colour changes between iterations.

Test: compare the multisets of final colours between G and G’. If they differ → G ≇ G’. If they match → 1-WL declares G ≅ G’ (though this can be wrong for some graph pairs).

1-WL as Message Passing

The 1-WL iteration is exactly message passing with an injective aggregation:

Message:  m_{u→v} = c_u   (send colour to neighbour)
Aggregate: m_v = {c_u : u in N(v)}   (multiset of neighbour colours)
Update:   c_v = HASH(c_v, m_v)    (inject own colour + multiset into new colour)

The HASH function must be injective: different (c_v, m_v) → different c_v’. This is the key requirement.

The Main Theorem (Xu et al., 2019)

Theorem: Let A and B be two graphs. If 1-WL determines A ≅ B (cannot distinguish them), then no MPNN can distinguish them either — regardless of the message function M, aggregation ā–”, or update function U.

Conversely, there exists an MPNN that is as powerful as 1-WL: one that uses sum aggregation and an injective update function.

This means:

  • 1-WL is the exact upper bound on MPNN expressivity
  • The bound is tight — achievable by a specific architecture (GIN)
  • More expressive architectures must go beyond MPNN (higher-order WL, graph Transformers, structural encodings)

What 1-WL Cannot Distinguish

Two classes of graphs that fool 1-WL (and therefore any MPNN):

1. Regular graphs: if every node in G has the same degree, then after initialisation all nodes get the same colour → 1-WL cannot distinguish any two regular graphs with the same number of nodes and edges.

Triangle (3 nodes, 3 edges, degree-2 regular)
vs.
No edges + 3 isolated nodes: distinguished (different degrees)

Two different 3-regular graphs on 6 nodes: NOT distinguished by 1-WL

2. Structurally indistinguishable substructures: any two nodes v and v’ with identical K-hop neighbourhood multisets for all K have the same 1-WL colour — even if their global structural roles differ (e.g., one is in a 4-cycle, one is not, but their up-to-K-hop neighbourhood multisets match).

GIN: The Most Expressive MPNN

GIN (Graph Isomorphism Network, Xu et al., 2019) achieves 1-WL expressiveness:

h^{(k)}_v = MLP^{(k)}( (1 + ε^{(k)}) h^{(k-1)}_v + Σ_{u ∈ N(v)} h^{(k-1)}_u )

Sum aggregation (not mean or max) is crucial. By the theory of multisets: sum is injective over multisets of bounded values, while mean and max are not.

  • Mean cannot distinguish {1,1,1} from {1} (normalises out count information)
  • Max cannot distinguish {1,2,3} from {2,3} (drops minimum)
  • Sum distinguishes both: 3 ≠ 1, 6 ≠ 5

The MLP must be sufficiently expressive (universal approximator) to implement the injective mapping.

Why does ε matter? The (1+ε) term on the self-loop allows the model to weight its own features differently from neighbour features. Without it, a node with zero features but non-zero neighbours could not distinguish itself from a node with the same neighbour multiset. ε can be 0 (fixed) or learned.

Beyond 1-WL: Higher-Order Tests

To go beyond 1-WL, you need:

MethodExpressivenessCost
1-WL / MPNN / GIN1-WLO(N)
k-WLk-WL (strictly more)O(N^k)
Structural encodings (RWPE, LapPE)Breaks some 1-WL tiesLow overhead
Subgraph GNNs3-WL equivalentO(N²) or O(N³)
Graph TransformersBeyond 1-WL (distance info)O(N²)

The k-WL hierarchy: k-WL is strictly more powerful than (k-1)-WL. The price: k-WL considers k-tuples of nodes simultaneously — O(N^k) in complexity.

Summary

ResultImplication
All MPNNs ≤ 1-WLStructural indistinguishability is a hard limit
GIN = 1-WLSum + injective MLP achieves the bound
Mean aggregation < GINLoses count information
Max aggregation < GINLoses minimum information
1-WL fails on regular graphsAny MPNN also fails

The WL test is the lens through which GNN expressivity is understood. It tells you not just what GNNs can do, but precisely what they cannot — and why adding structural encodings, higher-order interactions, or global attention is necessary for harder graph reasoning tasks.

References

  • Xu, K., Hu, W., Leskovec, J., & Jegelka, S. (2019). How Powerful are Graph Neural Networks?. ICLR 2019.
  • Weisfeiler, B., & Lehman, A. A. (1968). A Reduction of a Graph to a Canonical Form and an Algebra Arising During This Reduction. Nauchno-Technicheskaya Informatsia.
  • Babai, L., & Kucera, L. (1979). Canonical Labelling of Graphs in Linear Average Time. FOCS 1979.