Why Message Passing Is Not Enough: The Case for Sheaves
Published:
The Fundamental Assumption of Message Passing
Standard message passing (e.g., GCN, GAT) computes something like:
For this to make sense, the features h^{(k)}_u from different neighbouring nodes must live in the same feature space and be meaningfully aggregatable (averageable, summable).
This is a strong assumption. Consider:
Heterophilic graphs: in a social network, a user interested in cooking might be connected to a user interested in music. Their feature vectors are in very different semantic directions. Averaging them produces something meaningful to neither.
Multi-relational graphs: “A is-parent-of B” and “A works-with B” are very different relationships. Aggregating h_B via both gives a confused mixture.
Cross-domain graphs: a node representing a paper (text features) connected to a node representing an author (profile features). These live in literally different feature spaces.
What Goes Wrong: The Heterophily Problem
On homophilic graphs (connected nodes tend to have the same label), GNNs work well — averaging similar nodes gives a good representation of the node’s label.
On heterophilic graphs (connected nodes tend to have different labels), the standard GNN suffers:
- It averages over nodes with different labels → the average is “between” all label classes → uninformative
- Oversmoothing pushes all nodes toward the global average faster → even worse on heterophilic data
- The model must learn to “undo” the averaging to recover discriminative information
Empirically: GCN on Chameleon and Squirrel (heterophilic graphs) achieves 50-60% accuracy — barely above random. Models designed for heterophily (H2GCN, GPRGNN, GPR) reach 70-80%.
The Core Issue: Features on Edges
Standard GNNs attach features to nodes and send them unchanged along edges. There is no mechanism to transform features as they cross an edge.
Consider two nodes u and v connected by an edge, with features x_u ∈ ℝ^d. The message from u to v is (some function of) x_u. But what if the “right” message from u to v should be a different projection of x_u — one that highlights what is relevant from u’s perspective to v?
Sheaves formalise exactly this: each edge (u,v) carries a linear map F(u→v): ℝ^{d_u} → ℝ^{d_{uv}} that transforms u’s features before they are compared to v’s (also transformed) features.
From Flat to Structured Aggregation
Standard message passing:
All neighbour features aggregated directly.
Sheaf message passing:
Each neighbour feature transformed by edge-specific map before aggregation.
The edge maps F_{u→v} can be:
- Scalar (d=1): just a scalar weight per edge — equivalent to standard GAT with fixed weights
- Diagonal: elementwise rescaling — captures which features to emphasise
- Orthogonal: rotation in feature space — preserves norm, changes direction
- General (d×d matrix): full linear transformation — most expressive
The Mathematical Object: A Cellular Sheaf
A cellular sheaf on graph G assigns:
- A vector space F(v) to each node v (the “stalk” over v)
- A vector space F(e) to each edge e (the “stalk” over e)
- A linear map F(v ⊴ e): F(v) → F(e) for each v incident to e (the “restriction map”)
The restriction maps are the edge maps F_{u→v}. They “restrict” the node feature to the edge — producing a view of the node from the edge’s perspective.
This structure, coming from algebraic topology, provides a principled mathematical foundation for understanding information flow on graphs beyond simple averaging.
Why This Matters for Deep Learning
Sheaf-based GNNs can:
- Handle heterophilic graphs by learning edge maps that align features of nodes with different labels
- Model multi-relational graphs with different maps per edge type
- Enable richer information flow: the “disagreement” between F(u→e) h_u and F(v→e) h_v measures edge inconsistency — a useful signal
- Connect to topological data analysis, providing interpretability
The next posts build this intuition into concrete architectures: the Sheaf Laplacian, Neural Sheaf Diffusion, and Polynomial Neural Sheaf Diffusion.
References
- Hansen, J., & Gebhart, T. (2020). Sheaf Neural Networks. NeurIPS 2020 GRL+ Workshop (first application of cellular sheaves to graph neural networks, showing how sheaves resolve heterophily).
- Bodnar, C., Giovanni, F. D., Chamberlain, B. P., Liò, P., & Bronstein, M. M. (2022). Neural Sheaf Diffusion: A Topological Perspective on Heterophily and Oversmoothing in GNNs. NeurIPS 2022 (NSD: learning sheaf maps from data to build the Sheaf Laplacian, with theoretical analysis of heterophily and oversmoothing).
- Zhu, M., Wang, X., Shi, C., Ji, H., & Cui, P. (2020). Interpreting and Unifying Graph Neural Networks with An Optimization Framework. WWW 2021 (unified GNN analysis showing that oversmoothing corresponds to feature homogenisation by the graph Laplacian).
