Sheaves and Oversquashing: Topology, Curvature, and Information Flow

6 minute read

Published: June 15, 2025

TL;DR: Oversquashing measures how much influence node u has on node v's representation after K layers — quantified by the Jacobian ∂h_v^{(K)}/∂x_u. For GCNs, this Jacobian decays exponentially with the graph distance dist(u,v) on bottleneck graphs. Sheaf structure modifies the effective resistance between nodes via the Sheaf Laplacian — changing the Cheeger constant and hence the Jacobian decay rate. Learned sheaf maps can improve information flow by increasing the spectral gap of Δ_F, but cannot resolve oversquashing when the graph itself has structural bottlenecks.

What Is Oversquashing?

Oversquashing (Alon & Yahav, 2021) is the failure of message-passing GNNs to propagate information across long distances. After K layers, node v’s representation depends on its K-hop neighbourhood:

h_v^{(K)} = f(x_u : u ∈ N^K(v))

For nodes u with dist(u, v) = K, the influence of x_u on h_v^{(K)} is measured by the Jacobian:

J_{vu}^{(K)} = ∂h_v^{(K)} / ∂x_u

**Oversquashing occurs when

J_{vu}^{(K)}

→ 0 exponentially in K** — the information from u is “squashed” and doesn’t reach v.

The Jacobian Decay in GCN

For a K-layer GCN (ignoring nonlinearities for simplicity):

J_{vu}^{(K)} ∝ [(D^{-1/2}ÃD^{-1/2})^K]_{vu} W^{(1)} ⊗ ... ⊗ W^{(K)}

The key factor is [(Ã_norm)^K]_{vu} — the (v,u)-entry of the K-th power of the normalised adjacency. This decays as:

|[(Ã_norm)^K]_{vu}| ≤ exp(−K · h(G)²/2) · C

where h(G) is the Cheeger constant of G. Graphs with small h(G) (long bottlenecks, sparse cuts) have slow information propagation — oversquashing is severe.

The Cheeger Constant and Graph Topology

The Cheeger constant h(G) measures the minimum bottleneck across all graph partitions:

h(G) = min_{S ⊂ V, |S| ≤ |V|/2} |∂S| / |S|

where

∂S

= number of edges crossing the cut. Small h(G) → bad expansion → oversquashing.

Examples:

Path graph P_n: h(P_n) = O(1/n) — terrible expansion, severe oversquashing
Complete graph K_n: h(K_n) = n/2 — perfect expansion, no oversquashing
Expander graphs: h(G) = Ω(1) — constant expansion, no oversquashing

The Cheeger inequality relates h(G) to the graph Laplacian spectral gap:

h(G)²/2 ≤ λ₂(L) ≤ 2h(G)

How Sheaf Structure Affects Oversquashing

The Sheaf Cheeger constant is:

h(G, F) = min_{S ⊂ V} ||δ₀ · 1_S|| / min(vol_F(S), vol_F(V\S))

where 1_S is the sheaf indicator of S and vol_F(S) is the sheaf-weighted volume.

The Sheaf Cheeger inequality (Bandeira et al., 2013):

h(G, F)²/2 ≤ λ_gap(Δ_F) ≤ 2h(G, F)

Key question: Can learned sheaf restriction maps increase h(G, F) relative to h(G)?

Theorem (informal): For a fixed graph G, there exist restriction maps F such that:

h(G, F) > h(G)

i.e., the sheaf Cheeger constant is strictly larger than the graph Cheeger constant.

Proof sketch: For an edge bottleneck e = (u,v) that contributes to a small cut, choosing F_{u▷e} and F_{v▷e} to be rank-deficient (e.g., zero maps) effectively removes the bottleneck from the sheaf — the edge doesn’t contribute to δ₀ and hence not to h(G, F). More constructively: choosing maps that amplify the signal crossing the cut increases

δ₀ 1_S

, raising h(G, F). □

Practical insight: Sheaf maps can selectively amplify information flow across graph bottlenecks by boosting the contribution of bottleneck edges to the Sheaf Laplacian. However, this is a spectral effect — it improves the Jacobian decay rate but cannot make a non-expander graph behave like an expander. For truly long-range dependencies, graph rewiring (adding edges) remains necessary alongside sheaf structure.

The Sheaf Jacobian

For a K-layer NSD model:

h_v^{(K)} = σ( (I − Δ_F^{norm})^K H^{(0)} W^{(1)}...W^{(K)} )_v

The Jacobian ∂h_v^{(K)}/∂x_u contains the factor:

[(I − Δ_F^{norm})^K]_{vu} ∈ ℝ^{d×d}

This is a d×d block — rather than a scalar — measuring the influence of u’s stalk on v’s stalk after K diffusion steps.

Comparison with standard GCN:

GCN Jacobian: scalar [(Ã)^K]_{vu} ∈ ℝ → rank-1 bottleneck
NSD Jacobian: matrix [(I−Δ_F^{norm})^K]_{vu} ∈ ℝ^{d×d} → rank-d bottleneck (d times more capacity)

The sheaf Jacobian has d-fold higher rank than the standard Jacobian — each node can transmit d-dimensional information through bottleneck edges, compared to 1-dimensional for standard GCN. This is the sense in which sheaf GNNs have higher information capacity at bottlenecks.

Ollivier-Ricci Curvature on Sheaves

Oversquashing in GNNs has also been connected to Ollivier-Ricci curvature (Topping et al., 2022). Positive curvature edges facilitate information flow; negative curvature edges are bottlenecks.

For standard graphs, the Ollivier-Ricci curvature of edge (u,v) is:

κ(u,v) = 1 − W₁(μ_u, μ_v) / dist(u,v)

where W₁ is the Wasserstein distance between the degree-normalised neighbourhood distributions of u and v.

For sheaf GNNs, a natural sheaf Ricci curvature can be defined using the Sheaf Laplacian’s off-diagonal blocks:

κ_F(u,v) = 1 − ||[Δ_F^{norm}]_{uv}||_F / d_{eff}(u,v)

Restriction maps that are rank-deficient at edge (u,v) reduce

[Δ_F^{norm}]_{uv}

_F → reduce negative curvature → improve information flow. This provides a sheaf-theoretic interpretation of graph rewiring: instead of adding edges, learning restriction maps that “virtually add” information channels.

When Sheaf Structure Helps vs Doesn’t Help

Sheaves help oversquashing when:

The bottleneck is spectral (low spectral gap of L), not structural — maps can increase the sheaf spectral gap
The model needs to distinguish multiple channels of information crossing the bottleneck (rank-d Jacobian)
The bottleneck edges have learnable relational structure

Sheaves don’t help oversquashing when:

The bottleneck is a single edge (a bridge) — even with d-dimensional maps, only one path exists
The graph has O(log n) diameter — oversquashing is inherent at scale, regardless of sheaf structure
Tasks require exponentially long-range dependencies — no polynomial-depth GNN (sheaf or not) can help

What still works: For truly long-range tasks, adding a global attention mechanism (like a Graph Transformer) in combination with sheaf local aggregation provides both local relational structure and long-range dependencies.

Practical Recommendations

Diagnose whether your task suffers from oversquashing or heterophily (or both) before applying sheaf GNNs
Sheaf GNNs primarily address heterophily; oversquashing improvements are secondary
For oversquashing, consider graph rewiring (SDRF, FoSR) in addition to sheaf structure
Monitor the sheaf spectral gap λ_gap(Δ_F) during training — if it increases, the model is learning to improve information flow

References

Alon, U., & Yahav, E. (2021). On the Bottleneck of Graph Neural Networks and its Practical Implications. ICLR 2021 (introduces oversquashing and the Jacobian analysis — the standard GNN baseline being compared).
Topping, J., Giovanni, F. D., Chamberlain, B. P., Dong, X., & Bronstein, M. M. (2022). Understanding Over-Squashing and Bottlenecks on Graphs via Curvature. ICLR 2022 (connects oversquashing to Ollivier-Ricci curvature — the geometric framework extended to sheaves in this post).
Bandeira, A. S., Singer, A., & Spielman, D. A. (2013). A Cheeger Inequality for the Graph Connection Laplacian. SIAM Journal on Matrix Analysis (sheaf/connection Cheeger inequality — foundation for the sheaf oversquashing analysis).

Share on

Bluesky Facebook LinkedIn X (formerly Twitter)

Alessio Borgi

Sheaves and Oversquashing: Topology, Curvature, and Information Flow

What Is Oversquashing?

The Jacobian Decay in GCN

The Cheeger Constant and Graph Topology

How Sheaf Structure Affects Oversquashing

The Sheaf Jacobian

Ollivier-Ricci Curvature on Sheaves

When Sheaf Structure Helps vs Doesn’t Help

Practical Recommendations

References

Share on

You May Also Enjoy

Open Problems and Future Directions in Sheaf Neural Networks

Sheaf GNNs for Molecular Property Prediction

Sheaf GNNs for Heterophilic Node Classification: Benchmarks and Results

Sheaves Meet Attention: Transformer-Inspired Sheaf Models