Polynomial Neural Sheaf Diffusion

4 minute read

Published:

TL;DR: NSD uses the fixed filter h(λ) = 1 - λ (simple low-pass). PNSD replaces this with a learnable polynomial p(Δ_F) = Σ_k a_k Δ_F^k — the graph spectral equivalent of designing a custom frequency filter. Combined with the richer sheaf structure, PNSD achieves state-of-the-art on heterophilic benchmarks by learning the right spectral profile per task.

From Fixed Diffusion to Polynomial Filters

NSD’s diffusion step:

H^{(k+1)} = (I - Δ_F^{norm}) H^{(k)} W^{(k)}

This is a first-order polynomial in Δ_F with fixed coefficients (1 for identity, -1 for Laplacian). In spectral terms, this applies the filter h(λ) = 1 - λ — a low-pass filter that attenuates high-frequency components.

For homophilic graphs: low-pass filtering is appropriate (smooth out noise, preserve class-consistent low-frequency signal).

For heterophilic graphs: high-frequency components (class-discriminative) need to be amplified, not attenuated. A fixed low-pass filter is wrong.

PNSD’s approach: learn the filter coefficients from data.

The Polynomial Filter

Define the spectral filter as a polynomial of degree K:

p(Δ_F) = Σ_{k=0}^{K} a_k Δ_F^k

Where {a_k} are learnable coefficients. The propagation step:

H^{out} = p(Δ_F) H^{in} = ( Σ_k a_k Δ_F^k ) H^{in}

This can represent:

  • Low-pass (homophily): a_0 ≈ 1, a_1 ≈ -small, higher terms ≈ 0
  • High-pass (heterophily): a_0 ≈ 0, a_1 ≈ -large (or alternating signs)
  • Band-pass (intermediate): arbitrary polynomial shape

Connection to Existing Methods

The polynomial filter framework unifies many GNN architectures:

ArchitectureFilterPolynomial
GCNh(λ) = 1 - λ/2Linear polynomial, fixed
APPNPh(λ) = α(I - (1-α)Δ)^{-1}Geometric series (infinite)
ChebNetChebyshev polynomialK-degree, learnable
GPRGNNGeneral polynomialK-degree, learnable
PNSDPolynomial of Δ_FK-degree, learnable, sheaf

PNSD = GPRGNN applied to the Sheaf Laplacian instead of the standard graph Laplacian.

Why sheaf + polynomial? The sheaf provides richer structure (per-edge maps that handle heterophily). The polynomial filter provides spectral flexibility (learn which frequencies to amplify/suppress). Neither alone is sufficient: sheaf with fixed low-pass filter still oversmooths on some tasks; polynomial filter on standard graph still cannot handle cross-class edges correctly. Together they address both the structural and spectral dimensions of heterophily.

Computing the Polynomial

Direct computation of Δ_F^k requires repeated matrix multiplication — expensive for large Δ_F (which is Nd × Nd). Instead, use the recurrence:

Z^{(0)} = H, Z^{(k)} = Δ_F Z^{(k-1)} H^{out} = Σ_{k=0}^{K} a_k Z^{(k)}

Each Z^{(k)} requires one sparse matrix-vector product with Δ_F — total cost O(K E d²) (same as K rounds of NSD).

Chebyshev Polynomials for Sheaves

Chebyshev polynomials are numerically stable and form an orthogonal basis for functions on [-1, 1]. Using them as the polynomial basis (rescaling eigenvalues to [-1, 1]):

p(Δ_F) = Σ_{k=0}^{K} θ_k T_k( Δ_F^{norm} )

Where T_k are Chebyshev polynomials and Δ_F^{norm} is normalised to have eigenvalues in [-1, 1]. This is the sheaf generalisation of ChebNet.

Benefits: numerically stable, easily interpretable (each θ_k controls contribution of degree-k spectral component), efficient K-hop aggregation.

Training and Regularisation

Learning the polynomial coefficients {a_k}:

  • Too many coefficients (K large) → overfitting
  • Typical choice: K = 3 to 10
  • Optional constraint: a_k ≥ 0 for homophilic tasks (enforce low-pass behaviour)

Layer-wise vs shared coefficients:

  • Shared across all nodes (standard)
  • Node-specific: each node learns its own polynomial — expensive but more flexible
  • Group-specific: different polynomials for different node types (heterogeneous graphs)

Empirical Advantage

On heterophilic benchmarks, the polynomial filter provides additional improvement over fixed NSD:

MethodChameleonSquirrelCornell
NSD (diag)71.6%56.7%88.9%
NSD (general)76.2%61.9%91.4%
PNSD (diag)74.1%60.3%90.5%
PNSD (general)78.4%64.8%93.2%

The polynomial filter provides ~2-3% improvement over fixed diffusion on each benchmark.

Summary

PropertyNSDPNSD
Sheaf mapsLearnedLearned
Diffusion filterFixed (1 - λ)Learnable polynomial
Spectral profileLow-pass onlyAny (low/high/band-pass)
Extra parametersNoneK coefficients per layer
Heterophily handlingStructural (sheaf maps)Structural + spectral

PNSD is the current strongest sheaf-based architecture for node classification on heterophilic graphs. It combines the topological richness of cellular sheaves with the spectral flexibility of polynomial graph filters — addressing heterophily from both angles simultaneously.

References