Sheaves and Oversquashing: Topology, Curvature, and Information Flow

6 minute read

Published:

TL;DR: Oversquashing measures how much influence node u has on node v's representation after K layers — quantified by the Jacobian ∂h_v^{(K)}/∂x_u. For GCNs, this Jacobian decays exponentially with the graph distance dist(u,v) on bottleneck graphs. Sheaf structure modifies the effective resistance between nodes via the Sheaf Laplacian — changing the Cheeger constant and hence the Jacobian decay rate. Learned sheaf maps can improve information flow by increasing the spectral gap of Δ_F, but cannot resolve oversquashing when the graph itself has structural bottlenecks.

What Is Oversquashing?

Oversquashing (Alon & Yahav, 2021) is the failure of message-passing GNNs to propagate information across long distances. After K layers, node v’s representation depends on its K-hop neighbourhood:

h_v^{(K)} = f(x_u : u ∈ N^K(v))

For nodes u with dist(u, v) = K, the influence of x_u on h_v^{(K)} is measured by the Jacobian:

J_{vu}^{(K)} = ∂h_v^{(K)} / ∂x_u
**Oversquashing occurs when J_{vu}^{(K)} → 0 exponentially in K** — the information from u is “squashed” and doesn’t reach v.

The Jacobian Decay in GCN

For a K-layer GCN (ignoring nonlinearities for simplicity):

J_{vu}^{(K)} ∝ [(D^{-1/2}ÃD^{-1/2})^K]_{vu} W^{(1)} ⊗ ... ⊗ W^{(K)}

The key factor is [(Ã_norm)^K]_{vu} — the (v,u)-entry of the K-th power of the normalised adjacency. This decays as:

|[(Ã_norm)^K]_{vu}| ≤ exp(−K · h(G)²/2) · C

where h(G) is the Cheeger constant of G. Graphs with small h(G) (long bottlenecks, sparse cuts) have slow information propagation — oversquashing is severe.

The Cheeger Constant and Graph Topology

The Cheeger constant h(G) measures the minimum bottleneck across all graph partitions:

h(G) = min_{S ⊂ V, |S| ≤ |V|/2} |∂S| / |S|
where∂S= number of edges crossing the cut. Small h(G) → bad expansion → oversquashing.

Examples:

  • Path graph P_n: h(P_n) = O(1/n) — terrible expansion, severe oversquashing
  • Complete graph K_n: h(K_n) = n/2 — perfect expansion, no oversquashing
  • Expander graphs: h(G) = Ω(1) — constant expansion, no oversquashing

The Cheeger inequality relates h(G) to the graph Laplacian spectral gap:

h(G)²/2 ≤ λ₂(L) ≤ 2h(G)

How Sheaf Structure Affects Oversquashing

The Sheaf Cheeger constant is:

h(G, F) = min_{S ⊂ V} ||δ₀ · 1_S|| / min(vol_F(S), vol_F(V\S))

where 1_S is the sheaf indicator of S and vol_F(S) is the sheaf-weighted volume.

The Sheaf Cheeger inequality (Bandeira et al., 2013):

h(G, F)²/2 ≤ λ_gap(Δ_F) ≤ 2h(G, F)

Key question: Can learned sheaf restriction maps increase h(G, F) relative to h(G)?

Theorem (informal): For a fixed graph G, there exist restriction maps F such that:

h(G, F) > h(G)

i.e., the sheaf Cheeger constant is strictly larger than the graph Cheeger constant.

Proof sketch: For an edge bottleneck e = (u,v) that contributes to a small cut, choosing F_{u▷e} and F_{v▷e} to be rank-deficient (e.g., zero maps) effectively removes the bottleneck from the sheaf — the edge doesn’t contribute to δ₀ and hence not to h(G, F). More constructively: choosing maps that amplify the signal crossing the cut increases δ₀ 1_S , raising h(G, F). □
Practical insight: Sheaf maps can selectively amplify information flow across graph bottlenecks by boosting the contribution of bottleneck edges to the Sheaf Laplacian. However, this is a spectral effect — it improves the Jacobian decay rate but cannot make a non-expander graph behave like an expander. For truly long-range dependencies, graph rewiring (adding edges) remains necessary alongside sheaf structure.

The Sheaf Jacobian

For a K-layer NSD model:

h_v^{(K)} = σ( (I − Δ_F^{norm})^K H^{(0)} W^{(1)}...W^{(K)} )_v

The Jacobian ∂h_v^{(K)}/∂x_u contains the factor:

[(I − Δ_F^{norm})^K]_{vu} ∈ ℝ^{d×d}

This is a d×d block — rather than a scalar — measuring the influence of u’s stalk on v’s stalk after K diffusion steps.

Comparison with standard GCN:

  • GCN Jacobian: scalar [(Ã)^K]_{vu} ∈ ℝ → rank-1 bottleneck
  • NSD Jacobian: matrix [(I−Δ_F^{norm})^K]_{vu} ∈ ℝ^{d×d} → rank-d bottleneck (d times more capacity)

The sheaf Jacobian has d-fold higher rank than the standard Jacobian — each node can transmit d-dimensional information through bottleneck edges, compared to 1-dimensional for standard GCN. This is the sense in which sheaf GNNs have higher information capacity at bottlenecks.

Ollivier-Ricci Curvature on Sheaves

Oversquashing in GNNs has also been connected to Ollivier-Ricci curvature (Topping et al., 2022). Positive curvature edges facilitate information flow; negative curvature edges are bottlenecks.

For standard graphs, the Ollivier-Ricci curvature of edge (u,v) is:

κ(u,v) = 1 − W₁(μ_u, μ_v) / dist(u,v)

where W₁ is the Wasserstein distance between the degree-normalised neighbourhood distributions of u and v.

For sheaf GNNs, a natural sheaf Ricci curvature can be defined using the Sheaf Laplacian’s off-diagonal blocks:

κ_F(u,v) = 1 − ||[Δ_F^{norm}]_{uv}||_F / d_{eff}(u,v)
Restriction maps that are rank-deficient at edge (u,v) reduce [Δ_F^{norm}]_{uv} _F → reduce negative curvature → improve information flow. This provides a sheaf-theoretic interpretation of graph rewiring: instead of adding edges, learning restriction maps that “virtually add” information channels.

When Sheaf Structure Helps vs Doesn’t Help

Sheaves help oversquashing when:

  • The bottleneck is spectral (low spectral gap of L), not structural — maps can increase the sheaf spectral gap
  • The model needs to distinguish multiple channels of information crossing the bottleneck (rank-d Jacobian)
  • The bottleneck edges have learnable relational structure

Sheaves don’t help oversquashing when:

  • The bottleneck is a single edge (a bridge) — even with d-dimensional maps, only one path exists
  • The graph has O(log n) diameter — oversquashing is inherent at scale, regardless of sheaf structure
  • Tasks require exponentially long-range dependencies — no polynomial-depth GNN (sheaf or not) can help

What still works: For truly long-range tasks, adding a global attention mechanism (like a Graph Transformer) in combination with sheaf local aggregation provides both local relational structure and long-range dependencies.

Practical Recommendations

  1. Diagnose whether your task suffers from oversquashing or heterophily (or both) before applying sheaf GNNs
  2. Sheaf GNNs primarily address heterophily; oversquashing improvements are secondary
  3. For oversquashing, consider graph rewiring (SDRF, FoSR) in addition to sheaf structure
  4. Monitor the sheaf spectral gap λ_gap(Δ_F) during training — if it increases, the model is learning to improve information flow

References