The Weisfeiler-Lehman Test: How Powerful Are GNNs?
Published:
What Is Graph Isomorphism?
Two graphs G and Gā are isomorphic (G ā Gā) if there exists a bijection Ļ: V(G) ā V(Gā) such that (u,v) ā E(G) ā (Ļ(u), Ļ(v)) ā E(Gā). Informally: they are the same graph up to node relabelling.
Graph isomorphism testing is a fundamental problem. It is not known to be in P or NP-complete (itās in NP ā© co-AM ā© GI-complete). For practical purposes, the 1-WL algorithm provides a fast, powerful (but not complete) test.
The 1-WL Algorithm (Color Refinement)
Initialisation: assign each node a colour (label) based on its initial feature or degree.
Iteration: at each step, each node v updates its colour by hashing the multiset of its neighboursā colours:
Where {{ }} denotes a multiset (counts matter, order does not).
Termination: stop when no colour changes between iterations.
Test: compare the multisets of final colours between G and Gā. If they differ ā G ā Gā. If they match ā 1-WL declares G ā Gā (though this can be wrong for some graph pairs).
1-WL as Message Passing
The 1-WL iteration is exactly message passing with an injective aggregation:
Message: m_{uāv} = c_u (send colour to neighbour)
Aggregate: m_v = {c_u : u in N(v)} (multiset of neighbour colours)
Update: c_v = HASH(c_v, m_v) (inject own colour + multiset into new colour)
The HASH function must be injective: different (c_v, m_v) ā different c_vā. This is the key requirement.
The Main Theorem (Xu et al., 2019)
Theorem: Let A and B be two graphs. If 1-WL determines A ā B (cannot distinguish them), then no MPNN can distinguish them either ā regardless of the message function M, aggregation ā”, or update function U.
Conversely, there exists an MPNN that is as powerful as 1-WL: one that uses sum aggregation and an injective update function.
This means:
- 1-WL is the exact upper bound on MPNN expressivity
- The bound is tight ā achievable by a specific architecture (GIN)
- More expressive architectures must go beyond MPNN (higher-order WL, graph Transformers, structural encodings)
What 1-WL Cannot Distinguish
Two classes of graphs that fool 1-WL (and therefore any MPNN):
1. Regular graphs: if every node in G has the same degree, then after initialisation all nodes get the same colour ā 1-WL cannot distinguish any two regular graphs with the same number of nodes and edges.
Triangle (3 nodes, 3 edges, degree-2 regular)
vs.
No edges + 3 isolated nodes: distinguished (different degrees)
Two different 3-regular graphs on 6 nodes: NOT distinguished by 1-WL
2. Structurally indistinguishable substructures: any two nodes v and vā with identical K-hop neighbourhood multisets for all K have the same 1-WL colour ā even if their global structural roles differ (e.g., one is in a 4-cycle, one is not, but their up-to-K-hop neighbourhood multisets match).
GIN: The Most Expressive MPNN
GIN (Graph Isomorphism Network, Xu et al., 2019) achieves 1-WL expressiveness:
Sum aggregation (not mean or max) is crucial. By the theory of multisets: sum is injective over multisets of bounded values, while mean and max are not.
- Mean cannot distinguish {1,1,1} from {1} (normalises out count information)
- Max cannot distinguish {1,2,3} from {2,3} (drops minimum)
- Sum distinguishes both: 3 ā 1, 6 ā 5
The MLP must be sufficiently expressive (universal approximator) to implement the injective mapping.
Beyond 1-WL: Higher-Order Tests
To go beyond 1-WL, you need:
| Method | Expressiveness | Cost |
|---|---|---|
| 1-WL / MPNN / GIN | 1-WL | O(N) |
| k-WL | k-WL (strictly more) | O(N^k) |
| Structural encodings (RWPE, LapPE) | Breaks some 1-WL ties | Low overhead |
| Subgraph GNNs | 3-WL equivalent | O(N²) or O(N³) |
| Graph Transformers | Beyond 1-WL (distance info) | O(N²) |
The k-WL hierarchy: k-WL is strictly more powerful than (k-1)-WL. The price: k-WL considers k-tuples of nodes simultaneously ā O(N^k) in complexity.
Summary
| Result | Implication |
|---|---|
| All MPNNs ⤠1-WL | Structural indistinguishability is a hard limit |
| GIN = 1-WL | Sum + injective MLP achieves the bound |
| Mean aggregation < GIN | Loses count information |
| Max aggregation < GIN | Loses minimum information |
| 1-WL fails on regular graphs | Any MPNN also fails |
The WL test is the lens through which GNN expressivity is understood. It tells you not just what GNNs can do, but precisely what they cannot ā and why adding structural encodings, higher-order interactions, or global attention is necessary for harder graph reasoning tasks.
References
- Xu, K., Hu, W., Leskovec, J., & Jegelka, S. (2019). How Powerful are Graph Neural Networks?. ICLR 2019.
- Weisfeiler, B., & Lehman, A. A. (1968). A Reduction of a Graph to a Canonical Form and an Algebra Arising During This Reduction. Nauchno-Technicheskaya Informatsia.
- Babai, L., & Kucera, L. (1979). Canonical Labelling of Graphs in Linear Average Time. FOCS 1979.
