Sheaves and Oversquashing: Topology, Curvature, and Information Flow
Published:
What Is Oversquashing?
Oversquashing (Alon & Yahav, 2021) is the failure of message-passing GNNs to propagate information across long distances. After K layers, node v’s representation depends on its K-hop neighbourhood:
For nodes u with dist(u, v) = K, the influence of x_u on h_v^{(K)} is measured by the Jacobian:
| **Oversquashing occurs when | J_{vu}^{(K)} | → 0 exponentially in K** — the information from u is “squashed” and doesn’t reach v. |
The Jacobian Decay in GCN
For a K-layer GCN (ignoring nonlinearities for simplicity):
The key factor is [(Ã_norm)^K]_{vu} — the (v,u)-entry of the K-th power of the normalised adjacency. This decays as:
where h(G) is the Cheeger constant of G. Graphs with small h(G) (long bottlenecks, sparse cuts) have slow information propagation — oversquashing is severe.
The Cheeger Constant and Graph Topology
The Cheeger constant h(G) measures the minimum bottleneck across all graph partitions:
| where | ∂S | = number of edges crossing the cut. Small h(G) → bad expansion → oversquashing. |
Examples:
- Path graph P_n: h(P_n) = O(1/n) — terrible expansion, severe oversquashing
- Complete graph K_n: h(K_n) = n/2 — perfect expansion, no oversquashing
- Expander graphs: h(G) = Ω(1) — constant expansion, no oversquashing
The Cheeger inequality relates h(G) to the graph Laplacian spectral gap:
How Sheaf Structure Affects Oversquashing
The Sheaf Cheeger constant is:
where 1_S is the sheaf indicator of S and vol_F(S) is the sheaf-weighted volume.
The Sheaf Cheeger inequality (Bandeira et al., 2013):
Key question: Can learned sheaf restriction maps increase h(G, F) relative to h(G)?
Theorem (informal): For a fixed graph G, there exist restriction maps F such that:
i.e., the sheaf Cheeger constant is strictly larger than the graph Cheeger constant.
| Proof sketch: For an edge bottleneck e = (u,v) that contributes to a small cut, choosing F_{u▷e} and F_{v▷e} to be rank-deficient (e.g., zero maps) effectively removes the bottleneck from the sheaf — the edge doesn’t contribute to δ₀ and hence not to h(G, F). More constructively: choosing maps that amplify the signal crossing the cut increases | δ₀ 1_S | , raising h(G, F). □ |
The Sheaf Jacobian
For a K-layer NSD model:
The Jacobian ∂h_v^{(K)}/∂x_u contains the factor:
This is a d×d block — rather than a scalar — measuring the influence of u’s stalk on v’s stalk after K diffusion steps.
Comparison with standard GCN:
- GCN Jacobian: scalar [(Ã)^K]_{vu} ∈ ℝ → rank-1 bottleneck
- NSD Jacobian: matrix [(I−Δ_F^{norm})^K]_{vu} ∈ ℝ^{d×d} → rank-d bottleneck (d times more capacity)
The sheaf Jacobian has d-fold higher rank than the standard Jacobian — each node can transmit d-dimensional information through bottleneck edges, compared to 1-dimensional for standard GCN. This is the sense in which sheaf GNNs have higher information capacity at bottlenecks.
Ollivier-Ricci Curvature on Sheaves
Oversquashing in GNNs has also been connected to Ollivier-Ricci curvature (Topping et al., 2022). Positive curvature edges facilitate information flow; negative curvature edges are bottlenecks.
For standard graphs, the Ollivier-Ricci curvature of edge (u,v) is:
where W₁ is the Wasserstein distance between the degree-normalised neighbourhood distributions of u and v.
For sheaf GNNs, a natural sheaf Ricci curvature can be defined using the Sheaf Laplacian’s off-diagonal blocks:
| Restriction maps that are rank-deficient at edge (u,v) reduce | [Δ_F^{norm}]_{uv} | _F → reduce negative curvature → improve information flow. This provides a sheaf-theoretic interpretation of graph rewiring: instead of adding edges, learning restriction maps that “virtually add” information channels. |
When Sheaf Structure Helps vs Doesn’t Help
Sheaves help oversquashing when:
- The bottleneck is spectral (low spectral gap of L), not structural — maps can increase the sheaf spectral gap
- The model needs to distinguish multiple channels of information crossing the bottleneck (rank-d Jacobian)
- The bottleneck edges have learnable relational structure
Sheaves don’t help oversquashing when:
- The bottleneck is a single edge (a bridge) — even with d-dimensional maps, only one path exists
- The graph has O(log n) diameter — oversquashing is inherent at scale, regardless of sheaf structure
- Tasks require exponentially long-range dependencies — no polynomial-depth GNN (sheaf or not) can help
What still works: For truly long-range tasks, adding a global attention mechanism (like a Graph Transformer) in combination with sheaf local aggregation provides both local relational structure and long-range dependencies.
Practical Recommendations
- Diagnose whether your task suffers from oversquashing or heterophily (or both) before applying sheaf GNNs
- Sheaf GNNs primarily address heterophily; oversquashing improvements are secondary
- For oversquashing, consider graph rewiring (SDRF, FoSR) in addition to sheaf structure
- Monitor the sheaf spectral gap λ_gap(Δ_F) during training — if it increases, the model is learning to improve information flow
References
- Alon, U., & Yahav, E. (2021). On the Bottleneck of Graph Neural Networks and its Practical Implications. ICLR 2021 (introduces oversquashing and the Jacobian analysis — the standard GNN baseline being compared).
- Topping, J., Giovanni, F. D., Chamberlain, B. P., Dong, X., & Bronstein, M. M. (2022). Understanding Over-Squashing and Bottlenecks on Graphs via Curvature. ICLR 2022 (connects oversquashing to Ollivier-Ricci curvature — the geometric framework extended to sheaves in this post).
- Bandeira, A. S., Singer, A., & Spielman, D. A. (2013). A Cheeger Inequality for the Graph Connection Laplacian. SIAM Journal on Matrix Analysis (sheaf/connection Cheeger inequality — foundation for the sheaf oversquashing analysis).
