Polynomial Neural Sheaf Diffusion

6 minute read

Published: May 20, 2024

TL;DR: NSD uses the fixed filter h(λ) = 1 - λ (simple low-pass). PNSD replaces this with a learnable polynomial p(Δ_F) = Σ_k a_k Δ_F^k — the graph spectral equivalent of designing a custom frequency filter. Combined with the richer sheaf structure, PNSD achieves state-of-the-art on heterophilic benchmarks by learning the right spectral profile per task.

Polynomial sheaf diffusion — Polynomial spectral filters on the Sheaf Laplacian (Bodnar et al., 2022)

From Fixed Diffusion to Polynomial Filters

Intuition First: The fixed NSD filter h(λ) = 1 − λ is like an audio equaliser with only one preset: “bass boost” (attenuates high frequencies). For homophilic graphs this is perfect — smooth, low-frequency signals carry the class information. For heterophilic graphs you need the opposite: “treble boost” — amplify the high-frequency, class-discriminative components. PNSD gives you a fully programmable equaliser, learned from data. The K polynomial coefficients {a_k} define the frequency response curve, and gradient descent finds the right curve for each task.

NSD’s diffusion step:

H^{(k+1)} = (I - Δ_F^{norm}) H^{(k)} W^{(k)}

This is a first-order polynomial in Δ_F with fixed coefficients (1 for identity, -1 for Laplacian). In spectral terms, this applies the filter h(λ) = 1 - λ — a low-pass filter that attenuates high-frequency components.

For homophilic graphs: low-pass filtering is appropriate (smooth out noise, preserve class-consistent low-frequency signal).

For heterophilic graphs: high-frequency components (class-discriminative) need to be amplified, not attenuated. A fixed low-pass filter is wrong.

PNSD’s approach: learn the filter coefficients from data.

The Polynomial Filter

Define the spectral filter as a polynomial of degree K:

p(Δ_F) = Σ_{k=0}^{K} a_k Δ_F^k

Where {a_k} are learnable coefficients. The propagation step:

H^{out} = p(Δ_F) H^{in} = ( Σ_k a_k Δ_F^k ) H^{in}

This can represent:

Low-pass (homophily): a_0 ≈ 1, a_1 ≈ -small, higher terms ≈ 0
High-pass (heterophily): a_0 ≈ 0, a_1 ≈ -large (or alternating signs)
Band-pass (intermediate): arbitrary polynomial shape

Connection to Existing Methods

The polynomial filter framework unifies many GNN architectures:

Architecture	Filter	Polynomial
GCN	h(λ) = 1 - λ/2	Linear polynomial, fixed
APPNP	h(λ) = α(I - (1-α)Δ)^{-1}	Geometric series (infinite)
ChebNet	Chebyshev polynomial	K-degree, learnable
GPRGNN	General polynomial	K-degree, learnable
PNSD	Polynomial of Δ_F	K-degree, learnable, sheaf

PNSD = GPRGNN applied to the Sheaf Laplacian instead of the standard graph Laplacian.

Why sheaf + polynomial? The sheaf provides richer structure (per-edge maps that handle heterophily). The polynomial filter provides spectral flexibility (learn which frequencies to amplify/suppress). Neither alone is sufficient: sheaf with fixed low-pass filter still oversmooths on some tasks; polynomial filter on standard graph still cannot handle cross-class edges correctly. Together they address both the structural and spectral dimensions of heterophily.

Computing the Polynomial

Direct computation of Δ_F^k requires repeated matrix multiplication — expensive for large Δ_F (which is Nd × Nd). Instead, use the recurrence:

Z^{(0)} = H, Z^{(k)} = Δ_F Z^{(k-1)} H^{out} = Σ_{k=0}^{K} a_k Z^{(k)}

Each Z^{(k)} requires one sparse matrix-vector product with Δ_F — total cost O(K E d²) (same as K rounds of NSD).

Chebyshev Polynomials for Sheaves

Chebyshev polynomials are numerically stable and form an orthogonal basis for functions on [-1, 1]. Using them as the polynomial basis (rescaling eigenvalues to [-1, 1]):

p(Δ_F) = Σ_{k=0}^{K} θ_k T_k( Δ_F^{norm} )

Where T_k are Chebyshev polynomials and Δ_F^{norm} is normalised to have eigenvalues in [-1, 1]. This is the sheaf generalisation of ChebNet.

Benefits: numerically stable, easily interpretable (each θ_k controls contribution of degree-k spectral component), efficient K-hop aggregation.

Training and Regularisation

Learning the polynomial coefficients {a_k}:

Too many coefficients (K large) → overfitting
Typical choice: K = 3 to 10
Optional constraint: a_k ≥ 0 for homophilic tasks (enforce low-pass behaviour)

Layer-wise vs shared coefficients:

Shared across all nodes (standard)
Node-specific: each node learns its own polynomial — expensive but more flexible
Group-specific: different polynomials for different node types (heterogeneous graphs)

Worked Example: Learning a High-Pass Filter

Setup: degree-2 polynomial, Sheaf Laplacian eigenvalues λ ∈ {0, 0.5, 1.0, 1.5, 2.0}.

Low-pass (standard NSD): h(λ) = 1 − λ → responses: 1.0, 0.5, 0.0, −0.5, −1.0. High-frequency components (λ close to 2) are suppressed.

High-pass PNSD (learned for heterophily): learned coefficients a₀=−1, a₁=0, a₂=1 → h(λ) = −1 + λ² → responses: −1.0, −0.75, 0.0, 1.25, 3.0. Low-frequency components (near-consistent signals) are suppressed; high-frequency class-discriminative signals are amplified.

Training: with K=3 and a task on the Chameleon heterophilic graph, the model learns roughly a₀≈0.2, a₁≈−1.4, a₂≈0.8, a₃≈0.4 — a band-pass / high-pass shape that the fixed NSD filter cannot represent. This 3-coefficient generalisation provides the ~2-3% accuracy boost shown in empirical results.

Empirical Advantage

On heterophilic benchmarks, the polynomial filter provides additional improvement over fixed NSD:

Method	Chameleon	Squirrel	Cornell
NSD (diag)	71.6%	56.7%	88.9%
NSD (general)	76.2%	61.9%	91.4%
PNSD (diag)	74.1%	60.3%	90.5%
PNSD (general)	78.4%	64.8%	93.2%

The polynomial filter provides ~2-3% improvement over fixed diffusion on each benchmark.

Summary

Property	NSD	PNSD
Sheaf maps	Learned	Learned
Diffusion filter	Fixed (1 - λ)	Learnable polynomial
Spectral profile	Low-pass only	Any (low/high/band-pass)
Extra parameters	None	K coefficients per layer
Heterophily handling	Structural (sheaf maps)	Structural + spectral

PNSD is the current strongest sheaf-based architecture for node classification on heterophilic graphs. It combines the topological richness of cellular sheaves with the spectral flexibility of polynomial graph filters — addressing heterophily from both angles simultaneously.

References

Bodnar, C., Giovanni, F. D., Chamberlain, B. P., Liò, P., & Bronstein, M. M. (2022). Neural Sheaf Diffusion: A Topological Perspective on Heterophily and Oversmoothing in GNNs. NeurIPS 2022 (NSD: the base architecture that PNSD extends with a learnable polynomial diffusion filter).
Zaghen, O., Quak, M., & Bronstein, M. M. (2024). Polynomial Neural Sheaf Diffusion. ICLR 2024 (PNSD: replaces the fixed (I - Δ_F) diffusion step with a learnable polynomial p(Δ_F) for spectral flexibility).
He, M., Wei, Z., Huang, Z., & Xu, H. (2021). BernNet: Learning Arbitrary Graph Spectral Filters via Bernstein Approximation. NeurIPS 2021 (BernNet: polynomial spectral filters using Bernstein basis — the homogeneous-graph precursor to the polynomial sheaf filter in PNSD).

Share on

Bluesky Facebook LinkedIn X (formerly Twitter)

Alessio Borgi

Polynomial Neural Sheaf Diffusion

From Fixed Diffusion to Polynomial Filters

The Polynomial Filter

Connection to Existing Methods

Computing the Polynomial

Chebyshev Polynomials for Sheaves

Training and Regularisation

Worked Example: Learning a High-Pass Filter

Empirical Advantage

Summary

References

Share on

You May Also Enjoy

Output, Gated, and Special Activations: Softmax, GLU, SIREN, and More

Modern Activation Functions: GELU, SiLU, Mish, and Smooth Gating

Activation Functions in Neural Networks: Why Non-Linearity Matters

FoPE: Fourier Position Embedding for Length Generalization

Alessio Borgi

From Fixed Diffusion to Polynomial Filters

The Polynomial Filter

Connection to Existing Methods

Computing the Polynomial

Chebyshev Polynomials for Sheaves

Training and Regularisation

Worked Example: Learning a High-Pass Filter

Empirical Advantage

Summary

References

Share on

You May Also Enjoy

📄 Output, Gated, and Special Activations: Softmax, GLU, SIREN, and More

📄 Modern Activation Functions: GELU, SiLU, Mish, and Smooth Gating

📄 Activation Functions in Neural Networks: Why Non-Linearity Matters

📄 FoPE: Fourier Position Embedding for Length Generalization

Output, Gated, and Special Activations: Softmax, GLU, SIREN, and More

Modern Activation Functions: GELU, SiLU, Mish, and Smooth Gating

Activation Functions in Neural Networks: Why Non-Linearity Matters

FoPE: Fourier Position Embedding for Length Generalization