Why Message Passing Is Not Enough: The Case for Sheaves

5 minute read

Published: May 13, 2024

TL;DR: Standard message passing computes h_v = UPDATE(h_v, AGG({h_u : u ∈ N(v)})). This assumes neighbours' features are directly comparable — the same "type" of information. On heterophilic graphs, this assumption fails: neighbours have different labels, different semantics, different feature spaces. Sheaves replace this flat comparison with per-edge linear maps that transform features before comparison.

The Fundamental Assumption of Message Passing

Standard message passing (e.g., GCN, GAT) computes something like:

h^{(k+1)}_v = σ( W^{(k)} · AGG({ h^{(k)}_u : u ∈ N(v) }) )

For this to make sense, the features h^{(k)}_u from different neighbouring nodes must live in the same feature space and be meaningfully aggregatable (averageable, summable).

This is a strong assumption. Consider:

Heterophilic graphs: in a social network, a user interested in cooking might be connected to a user interested in music. Their feature vectors are in very different semantic directions. Averaging them produces something meaningful to neither.

Multi-relational graphs: “A is-parent-of B” and “A works-with B” are very different relationships. Aggregating h_B via both gives a confused mixture.

Cross-domain graphs: a node representing a paper (text features) connected to a node representing an author (profile features). These live in literally different feature spaces.

What Goes Wrong: The Heterophily Problem

On homophilic graphs (connected nodes tend to have the same label), GNNs work well — averaging similar nodes gives a good representation of the node’s label.

On heterophilic graphs (connected nodes tend to have different labels), the standard GNN suffers:

It averages over nodes with different labels → the average is “between” all label classes → uninformative
Oversmoothing pushes all nodes toward the global average faster → even worse on heterophilic data
The model must learn to “undo” the averaging to recover discriminative information

Empirically: GCN on Chameleon and Squirrel (heterophilic graphs) achieves 50-60% accuracy — barely above random. Models designed for heterophily (H2GCN, GPRGNN, GPR) reach 70-80%.

The Core Issue: Features on Edges

Standard GNNs attach features to nodes and send them unchanged along edges. There is no mechanism to transform features as they cross an edge.

Consider two nodes u and v connected by an edge, with features x_u ∈ ℝ^d. The message from u to v is (some function of) x_u. But what if the “right” message from u to v should be a different projection of x_u — one that highlights what is relevant from u’s perspective to v?

Sheaves formalise exactly this: each edge (u,v) carries a linear map F(u→v): ℝ^{d_u} → ℝ^{d_{uv}} that transforms u’s features before they are compared to v’s (also transformed) features.

The geometric intuition: Think of two observers at different locations. They may be looking at the same object, but from different angles. To compare their observations, you must first transform each observation to a common reference frame. The sheaf's edge maps are exactly these "frame transformation" operations — they align features from different nodes before aggregation.

From Flat to Structured Aggregation

Standard message passing:

Aggregation: AGG({ h_u : u ∈ N(v) })

All neighbour features aggregated directly.

Sheaf message passing:

Aggregation: AGG({ F_{u→v} h_u : u ∈ N(v) })

Each neighbour feature transformed by edge-specific map before aggregation.

The edge maps F_{u→v} can be:

Scalar (d=1): just a scalar weight per edge — equivalent to standard GAT with fixed weights
Diagonal: elementwise rescaling — captures which features to emphasise
Orthogonal: rotation in feature space — preserves norm, changes direction
General (d×d matrix): full linear transformation — most expressive

The Mathematical Object: A Cellular Sheaf

A cellular sheaf on graph G assigns:

A vector space F(v) to each node v (the “stalk” over v)
A vector space F(e) to each edge e (the “stalk” over e)
A linear map F(v ⊴ e): F(v) → F(e) for each v incident to e (the “restriction map”)

The restriction maps are the edge maps F_{u→v}. They “restrict” the node feature to the edge — producing a view of the node from the edge’s perspective.

This structure, coming from algebraic topology, provides a principled mathematical foundation for understanding information flow on graphs beyond simple averaging.

Why This Matters for Deep Learning

Sheaf-based GNNs can:

Handle heterophilic graphs by learning edge maps that align features of nodes with different labels
Model multi-relational graphs with different maps per edge type
Enable richer information flow: the “disagreement” between F(u→e) h_u and F(v→e) h_v measures edge inconsistency — a useful signal
Connect to topological data analysis, providing interpretability

The next posts build this intuition into concrete architectures: the Sheaf Laplacian, Neural Sheaf Diffusion, and Polynomial Neural Sheaf Diffusion.

References

Hansen, J., & Gebhart, T. (2020). Sheaf Neural Networks. NeurIPS 2020 GRL+ Workshop (first application of cellular sheaves to graph neural networks, showing how sheaves resolve heterophily).
Bodnar, C., Giovanni, F. D., Chamberlain, B. P., Liò, P., & Bronstein, M. M. (2022). Neural Sheaf Diffusion: A Topological Perspective on Heterophily and Oversmoothing in GNNs. NeurIPS 2022 (NSD: learning sheaf maps from data to build the Sheaf Laplacian, with theoretical analysis of heterophily and oversmoothing).
Zhu, M., Wang, X., Shi, C., Ji, H., & Cui, P. (2020). Interpreting and Unifying Graph Neural Networks with An Optimization Framework. WWW 2021 (unified GNN analysis showing that oversmoothing corresponds to feature homogenisation by the graph Laplacian).

Share on

Bluesky Facebook LinkedIn X (formerly Twitter)

Alessio Borgi

Why Message Passing Is Not Enough: The Case for Sheaves

The Fundamental Assumption of Message Passing

What Goes Wrong: The Heterophily Problem

The Core Issue: Features on Edges

From Flat to Structured Aggregation

The Mathematical Object: A Cellular Sheaf

Why This Matters for Deep Learning

References

Share on

You May Also Enjoy

Open Problems and Future Directions in Sheaf Neural Networks

Sheaf GNNs for Molecular Property Prediction

Sheaf GNNs for Heterophilic Node Classification: Benchmarks and Results

Sheaves Meet Attention: Transformer-Inspired Sheaf Models