The Sheaf Laplacian: Spectrum, Hodge Decomposition, and Diffusion
Published:
Spectral Properties of Δ_F
The Sheaf Laplacian Δ_F = δ₀ᵀδ₀ is:
- Symmetric: Δ_F = Δ_Fᵀ (since δ₀ᵀδ₀ is always symmetric)
Positive semidefinite: xᵀΔ_Fx = δ₀x ² ≥ 0 - Real eigenvalues 0 ≤ λ₁ ≤ λ₂ ≤ … ≤ λ_{Nd}
- Zero eigenspace = ker(Δ_F) = ker(δ₀) = H⁰(G, F)
For the normalised Sheaf Laplacian Δ_F^{norm} = D_F^{-1/2}Δ_FD_F^{-1/2} (where D_F is the block-diagonal of Δ_F):
This mirrors the standard normalised graph Laplacian bound λ_{max} ≤ 2.
Spectral Gap and Mixing
The spectral gap is:
It controls how quickly sheaf diffusion converges to H⁰:
Large spectral gap → fast convergence → information from the initial condition is quickly forgotten (rapid mixing, but potential loss of discriminative power). Small spectral gap → slow convergence → the model retains initial features for many diffusion steps before projecting onto H⁰.
For standard GCN: λ_gap = λ₂(L) (the algebraic connectivity / Fiedler value of the graph). For sheaf GNNs, the spectral gap of Δ_F depends on both the graph topology and the learned restriction maps — the model can effectively increase or decrease its own mixing rate.
The Hodge Decomposition on Node Space
The space of 0-cochains C⁰ = ℝ^{Nd} decomposes orthogonally as:
(This uses the fact that ker(Δ_F) = ker(δ₀) and im(Δ_F) = im(δ₀ᵀ) since Δ_F = δ₀ᵀδ₀.)
Every node signal x ∈ ℝ^{Nd} decomposes uniquely as:
where x_harm ∈ ker(Δ_F) (the harmonic / global-section component) and x_grad ∈ im(Δ_F) (the gradient component, with positive Dirichlet energy).
Sheaf diffusion acts as:
- x_harm is unchanged (in ker(Δ_F))
- x_grad decays: at time t, the gradient component is exp(−Δ_F t) x_grad → 0
So diffusion retains the harmonic component and attenuates the gradient component. This is the sheaf analogue of low-pass filtering — but “low” means “in ker(Δ_F)”, not “in span{1}”.
Comparing Standard and Sheaf Laplacians
| Property | Standard L | Sheaf Δ_F |
|---|---|---|
| Size | N×N | (Nd)×(Nd) |
| Null space | span{1_N} (constants) | ker(δ₀) = H⁰(G, F) |
| Null space dimension | 1 (per component) | dim H⁰ ≥ d (can be » d) |
| Maximum eigenvalue | ≤ 2 (normalised) | ≤ 2 (normalised) |
| Depends on graph | Yes | Yes (topology) |
| Depends on features | No | Yes (via learned maps F) |
| Controls | Homophily | Relational geometry |
Spectral Graph Convolution with Δ_F
A spectral sheaf filter is a polynomial in Δ_F applied to the signal:
In the eigenbasis Δ_F = QΛQᵀ, this becomes:
Different filter profiles:
- h(λ) = 1 − λ (GCN-style): low-pass, amplifies ker(Δ_F)
- h(λ) = 1: identity (no filtering)
- h(λ) = λ: high-pass, amplifies gradient components
- h(λ) = a₀ + a₁λ + a₂λ²: learnable, can be any polynomial profile
Polynomial Neural Sheaf Diffusion (PNSD) learns the coefficients a_k to fit the task, rather than using the fixed low-pass filter (1 − λ).
Sheaf Cheeger Inequality
The classical Cheeger inequality relates the graph spectral gap λ₂(L) to the graph’s edge connectivity (via the Cheeger constant h(G)):
An analogous inequality holds for the Sheaf Laplacian. Define the sheaf Cheeger constant:
where 1_S is the indicator vector of S (in the sheaf sense — an indicator in each stalk). Then:
This sheaf Cheeger inequality provides a topological interpretation of the spectral gap: large λ_gap means it is hard to “cut” the sheaf across any partition of V, i.e., the relational structure is globally well-connected.
Continuous-Time Sheaf Diffusion
The heat equation on the sheaf is:
Solution: X(t) = exp(−Δ_F t) X₀.
Discrete-time approximation (Euler step, step size α):
For stability: α ≤ 1 (since eigenvalues of Δ_F^{norm} ≤ 2, so (I − αΔ_F^{norm}) has eigenvalues in [1−2α, 1]).
With α = 1: X_{k+1} = (I − Δ_F^{norm}) X_k. This is the NSD update (before adding the trainable weight W).
The full NSD layer: X ← (I − Δ_F^{norm}) X W, where W ∈ ℝ^{d×d} is a trainable weight matrix applied after diffusion.
Spectral Interpretation of Oversmoothing
Standard GCN oversmoothing: applying (I − L)^K x repeatedly drives x toward ker(L) = span{1_N}. This is a rank-N collapse to a d-dimensional space — devastating for node classification.
Sheaf diffusion: applying (I − Δ_F^{norm})^K x drives x toward ker(Δ_F) = H⁰(G, F). This is a collapse to an m-dimensional space where m = dim H⁰. Since m can be » d (and is determined by the learned restriction maps), this collapse is far less destructive — and can be tuned to preserve task-relevant features.
In the limit, as the restriction maps are learned end-to-end, the model can implicitly choose m to balance expressiveness and smoothness.
References
- Bodnar, C., Giovanni, F. D., Chamberlain, B. P., Liò, P., & Bronstein, M. M. (2022). Neural Sheaf Diffusion. NeurIPS 2022 (proves the spectral gap argument and null-space characterisation of oversmoothing for sheaf diffusion).
- Zaghen, O., Quak, M., & Bronstein, M. M. (2024). Polynomial Neural Sheaf Diffusion. ICLR 2024 (uses the spectral interpretation to motivate learnable polynomial filters over Δ_F).
- Horak, D., & Jost, J. (2013). Spectra of Combinatorial Laplace Operators on Simplicial Complexes. Advances in Mathematics 2013 (Cheeger inequality for higher-order Laplacians, foundational for sheaf Cheeger theory).
