The Weisfeiler-Lehman Test: How Powerful Are GNNs?

5 minute read

Published: April 12, 2024

TL;DR: The 1-WL test repeatedly hashes each node's label with the multiset of its neighbours' labels until stable. Two graphs are "probably isomorphic" if they produce the same sequence of label histograms. Xu et al. (2019) proved that no MPNN can be more expressive than 1-WL, and GIN achieves exactly this bound by using sum aggregation and an injective MLP.

What Is Graph Isomorphism?

Two graphs G and G’ are isomorphic (G ≅ G’) if there exists a bijection σ: V(G) → V(G’) such that (u,v) ∈ E(G) ↔ (σ(u), σ(v)) ∈ E(G’). Informally: they are the same graph up to node relabelling.

Graph isomorphism testing is a fundamental problem. It is not known to be in P or NP-complete (it’s in NP ∩ co-AM ∩ GI-complete). For practical purposes, the 1-WL algorithm provides a fast, powerful (but not complete) test.

Initialisation: assign each node a colour (label) based on its initial feature or degree.

Iteration: at each step, each node v updates its colour by hashing the multiset of its neighbours’ colours:

c^{(k)}_v = HASH( c^{(k-1)}_v, {{c^{(k-1)}_u : u ∈ N(v)}} )

Where {{ }} denotes a multiset (counts matter, order does not).

Termination: stop when no colour changes between iterations.

Test: compare the multisets of final colours between G and G’. If they differ → G ≇ G’. If they match → 1-WL declares G ≅ G’ (though this can be wrong for some graph pairs).

1-WL as Message Passing

The 1-WL iteration is exactly message passing with an injective aggregation:

Message:  m_{u→v} = c_u   (send colour to neighbour)
Aggregate: m_v = {c_u : u in N(v)}   (multiset of neighbour colours)
Update:   c_v = HASH(c_v, m_v)    (inject own colour + multiset into new colour)

The HASH function must be injective: different (c_v, m_v) → different c_v’. This is the key requirement.

The Main Theorem (Xu et al., 2019)

Theorem: Let A and B be two graphs. If 1-WL determines A ≅ B (cannot distinguish them), then no MPNN can distinguish them either — regardless of the message function M, aggregation □, or update function U.

Conversely, there exists an MPNN that is as powerful as 1-WL: one that uses sum aggregation and an injective update function.

This means:

1-WL is the exact upper bound on MPNN expressivity
The bound is tight — achievable by a specific architecture (GIN)
More expressive architectures must go beyond MPNN (higher-order WL, graph Transformers, structural encodings)

What 1-WL Cannot Distinguish

Two classes of graphs that fool 1-WL (and therefore any MPNN):

1. Regular graphs: if every node in G has the same degree, then after initialisation all nodes get the same colour → 1-WL cannot distinguish any two regular graphs with the same number of nodes and edges.

Triangle (3 nodes, 3 edges, degree-2 regular)
vs.
No edges + 3 isolated nodes: distinguished (different degrees)

Two different 3-regular graphs on 6 nodes: NOT distinguished by 1-WL

2. Structurally indistinguishable substructures: any two nodes v and v’ with identical K-hop neighbourhood multisets for all K have the same 1-WL colour — even if their global structural roles differ (e.g., one is in a 4-cycle, one is not, but their up-to-K-hop neighbourhood multisets match).

GIN: The Most Expressive MPNN

GIN (Graph Isomorphism Network, Xu et al., 2019) achieves 1-WL expressiveness:

h^{(k)}_v = MLP^{(k)}( (1 + ε^{(k)}) h^{(k-1)}_v + Σ_{u ∈ N(v)} h^{(k-1)}_u )

Sum aggregation (not mean or max) is crucial. By the theory of multisets: sum is injective over multisets of bounded values, while mean and max are not.

Mean cannot distinguish {1,1,1} from {1} (normalises out count information)
Max cannot distinguish {1,2,3} from {2,3} (drops minimum)
Sum distinguishes both: 3 ≠ 1, 6 ≠ 5

The MLP must be sufficiently expressive (universal approximator) to implement the injective mapping.

Why does ε matter? The (1+ε) term on the self-loop allows the model to weight its own features differently from neighbour features. Without it, a node with zero features but non-zero neighbours could not distinguish itself from a node with the same neighbour multiset. ε can be 0 (fixed) or learned.

Beyond 1-WL: Higher-Order Tests

To go beyond 1-WL, you need:

Method	Expressiveness	Cost
1-WL / MPNN / GIN	1-WL	O(N)
k-WL	k-WL (strictly more)	O(N^k)
Structural encodings (RWPE, LapPE)	Breaks some 1-WL ties	Low overhead
Subgraph GNNs	3-WL equivalent	O(N²) or O(N³)
Graph Transformers	Beyond 1-WL (distance info)	O(N²)

The k-WL hierarchy: k-WL is strictly more powerful than (k-1)-WL. The price: k-WL considers k-tuples of nodes simultaneously — O(N^k) in complexity.

Summary

Result	Implication
All MPNNs ≤ 1-WL	Structural indistinguishability is a hard limit
GIN = 1-WL	Sum + injective MLP achieves the bound
Mean aggregation < GIN	Loses count information
Max aggregation < GIN	Loses minimum information
1-WL fails on regular graphs	Any MPNN also fails

The WL test is the lens through which GNN expressivity is understood. It tells you not just what GNNs can do, but precisely what they cannot — and why adding structural encodings, higher-order interactions, or global attention is necessary for harder graph reasoning tasks.

References

Xu, K., Hu, W., Leskovec, J., & Jegelka, S. (2019). How Powerful are Graph Neural Networks?. ICLR 2019.
Weisfeiler, B., & Lehman, A. A. (1968). A Reduction of a Graph to a Canonical Form and an Algebra Arising During This Reduction. Nauchno-Technicheskaya Informatsia.
Babai, L., & Kucera, L. (1979). Canonical Labelling of Graphs in Linear Average Time. FOCS 1979.

Share on

Bluesky Facebook LinkedIn X (formerly Twitter)

Alessio Borgi

The Weisfeiler-Lehman Test: How Powerful Are GNNs?

What Is Graph Isomorphism?

The 1-WL Algorithm (Color Refinement)

1-WL as Message Passing

The Main Theorem (Xu et al., 2019)

What 1-WL Cannot Distinguish

GIN: The Most Expressive MPNN

Beyond 1-WL: Higher-Order Tests

Summary

References

Share on

You May Also Enjoy

Open Problems and Future Directions in Sheaf Neural Networks

Sheaf GNNs for Molecular Property Prediction

Sheaf GNNs for Heterophilic Node Classification: Benchmarks and Results

Sheaves Meet Attention: Transformer-Inspired Sheaf Models