Spectral Sheaf Convolution: Filtering Signals on Sheaves

6 minute read

Published:

TL;DR: A sheaf convolution replaces the graph Laplacian L with the Sheaf Laplacian Δ_F in any spectral filter h(L). The resulting filter h(Δ_F) acts on Nd-dimensional node signals, with the Sheaf Laplacian's eigenvectors as the Fourier basis. Polynomial filters h(Δ_F) = Σ_k a_k Δ_F^k are computed by K sparse matrix-vector products, with cost O(K·E·d²). This generalises ChebNet, GCN, and GPRGNN to the sheaf setting.

Graph Signal Processing Recap

In classical Graph Signal Processing (GSP), a signal x ∈ ℝ^N on a graph G has a Fourier transform defined by the eigenvectors of L = UΛUᵀ:

x̂ = Uᵀ x (Fourier transform) x = U x̂ (inverse Fourier transform)

A spectral filter is a function h applied to the Fourier coefficients:

h(L) x = U · diag(h(λ₁), ..., h(λ_N)) · Uᵀ x

This multiplies each frequency component by h(λ_i). Low-pass filters (h(λ) small for large λ) smooth the signal; high-pass filters (h(λ) large for large λ) sharpen it.

Sheaf Signal Processing

For a sheaf F on G, signals are 0-cochains x ∈ C⁰(G, F) = ℝ^{Nd}. The Sheaf Laplacian Δ_F = QΛQᵀ provides the Fourier basis Q ∈ ℝ^{Nd × Nd}:

x̂ = Qᵀ x (sheaf Fourier transform) h(Δ_F) x = Q · diag(h(λ₁), ..., h(λ_{Nd})) · Qᵀ x

A sheaf spectral filter h(Δ_F) acts on the Nd-dimensional signal x by scaling each sheaf-Fourier component independently.

Key differences from standard GSP:

  • The signal dimension is Nd (not N) — each node contributes d frequency components
  • The eigenvectors Q are sheaf-specific — they depend on the restriction maps
  • The filter h(Δ_F) operates on the sheaf’s frequency domain, not the graph’s
  • Low-frequency components (λ_i ≈ 0) correspond to global sections — consistent signals

Polynomial Sheaf Filters

Computing the full eigendecomposition of Δ_F costs O((Nd)³) — prohibitive for large N. Instead, polynomial filters avoid explicit eigendecomposition:

h(Δ_F) x ≈ Σ_{k=0}^{K} a_k Δ_F^k x

This requires only K applications of Δ_F to x, each costing O(E·d²) (sparse matrix-vector product). Total cost: O(K·E·d²).

Universality: By the Weierstrass approximation theorem, any continuous function h on [0, λ_max] can be approximated by polynomials. So polynomial sheaf filters can approximate any spectral filter to arbitrary accuracy (as K → ∞).

Standard Graph Filters as Special Cases

Graph filterPolynomial in LSheaf generalisation
GCN(I − L̃)¹(I − Δ_F^{norm})¹ = NSD
ChebNetΣ_k a_k T_k(L)Σ_k a_k T_k(Δ_F)
APPNPα(I − (1−α)L̃)^{-1}α(I − (1−α)Δ_F^{norm})^{-1}
GPRGNNΣ_k a_k L^kΣ_k a_k Δ_F^k = PNSD
BernNetΣ_k θ_k B_k^K(L/2)Σ_k θ_k B_k^K(Δ_F/2) = PNSD with Bernstein
SGCL^K (no trainable weights in filter)Δ_F^K (K-hop sheaf diffusion)

Every spectral GNN has a natural sheaf generalisation — replace L with Δ_F.

The Sheaf-Fourier Basis: What Does It Look Like?

For the identity sheaf (all maps = I): Δ_F = L ⊗ I_d. The eigenvectors are Q = U ⊗ I_d where U is the graph Fourier basis. Each node-frequency pair (i, k) has eigenvector u_i ⊗ e_k (the i-th graph eigenvector in the k-th coordinate direction). The eigenvalue is λ_i (repeated d times).

For a non-trivial sheaf: Q is no longer block-diagonal. Each eigenvector is a sheaf-specific “vibrational mode” — a consistent pattern of vectors across nodes that respects the restriction maps.

Geometric interpretation: The k-th eigenvector of Δ_F describes a "mode of global inconsistency" — how nodes could jointly move in their stalks to reduce Sheaf Dirichlet energy. Modes with small eigenvalues (near 0) are near-consistent (close to global sections). Modes with large eigenvalues are highly inconsistent and are attenuated by low-pass sheaf filters.

Chebyshev Sheaf Filters

The Chebyshev basis {T_k}_{k=0}^K provides numerically stable polynomial approximation on [−1, 1]:

T_0(x) = 1, T_1(x) = x, T_k(x) = 2x·T_{k-1}(x) − T_{k-2}(x)

A Chebyshev sheaf filter rescales Δ_F to [−1, 1]: let Λ̃ = 2Δ_F/λ_max − I, then:

h(Δ_F) x ≈ Σ_{k=0}^{K} a_k T_k(Λ̃) x

Computed via the recurrence: x₀ = x, x₁ = Λ̃x, x_k = 2Λ̃x_{k-1} − x_{k-2}. Each step costs O(E·d²).

This is the sheaf generalisation of ChebNet — applying Chebyshev polynomials of the Sheaf Laplacian rather than the graph Laplacian.

Wavelet-Like Filters on Sheaves

The diffusion wavelets framework (Coifman & Maggioni, 2006) can be extended to sheaves: define sheaf wavelets as:

ψ_{j,v}(u) = [Δ_F^{2^j} δ_v](u)

where δ_v is the indicator of node v (a delta function in the stalk). Sheaf wavelets are scale-specific (at scale 2^j) and localised around node v — they describe how a perturbation at v diffuses through the sheaf at scale j.

These provide a multi-resolution representation of sheaf signals — useful for hierarchical graph learning and sheaf-based graph coarsening.

Computational Trade-offs

OperationCostComment
Δ_F constructionO(E·d²)Assembling block matrix from maps
Δ_F^k x (one step)O(E·d²)Sparse block-matrix-vector product
Full eigendecomp of Δ_FO((Nd)³)Infeasible for large N
K-polynomial filterO(K·E·d²)Feasible; preferred approach
Map prediction (MLP)O(E·d_hidden²)Per-edge MLP forward pass

For d=2, N=10⁴, E=10⁵, K=5: cost ≈ 5×10⁵×4 = 2×10⁶ operations per layer — fast on modern hardware.

When Spectral View Helps

The spectral view of sheaf GNNs is useful for:

  1. Diagnosing oversmoothing: the filter h(λ) = (1−λ)^K at large K suppresses all frequencies except ker(Δ_F) — visualising the spectrum shows what information survives
  2. Designing task-specific filters: for heterophilic tasks, design h to amplify high frequencies; for smooth tasks, design h to suppress them
  3. Understanding PNSD: PNSD learns h(λ) ≈ Σ_k a_k λ^k — the learned profile reveals what frequency the task requires
  4. Connecting to GSP: established GSP theory on sampling, reconstruction, and uncertainty principles transfers to the sheaf setting

References