PolyNSD: Polynomial Neural Sheaf Diffusion
Published:
Background: Neural Sheaf Diffusion
A Sheaf Neural Network enriches a graph with a cellular sheaf: each node and edge gets a vector space (a stalk), and each endpoint of each edge gets a restriction map encoding how node signals relate to edge signals. The sheaf Laplacian encodes this relational geometry and replaces the standard graph Laplacian in the diffusion operator.
Neural Sheaf Diffusion (NSD) โ the dominant sheaf GNN approach โ learns restriction maps end-to-end and runs diffusion on the sheaf Laplacian. It handles heterophily well and resists oversmoothing, but has three practical problems:
- SVD-based normalisation: requires expensive SVD decomposition of the sheaf Laplacian at every layer, making Laplacian rebuilds slow.
- Dense restriction maps: one d ร d matrix per node-edge pair, scaling quadratically with stalk dimension d.
- Brittle gradients: the normalised sheaf Laplacian construction is numerically unstable for large d, leading to gradient issues.
The PolyNSD Fix
PolyNSD replaces the NSD propagation operator with a degree-K polynomial of a spectrally rescaled normalised sheaf Laplacian, evaluated via a stable three-term Chebyshev recurrence.
This gives:
- Explicit K-hop receptive field in a single layer (independently of the stalk dimension d).
- Trainable spectral response as a convex mixture of K+1 orthogonal polynomial basis responses โ the model learns which frequency components to amplify or suppress.
- No SVD needed: the recurrence only requires sparse matrix-vector products.
- Stability via convex mixtures (coefficients sum to 1) + spectral rescaling to [โ1, 1] + residual/gated paths.

Architecture Overview

Diagonal Restriction Maps
The key parameter-reduction insight: diagonal restriction maps (a vector of d scalars per node-edge pair instead of a d ร d matrix) are sufficient for strong performance. This reduces per-edge parameter count from O(dยฒ) to O(d) and decouples performance from large stalk dimensions.
Results

Key results vs. NSD and spectral GNN baselines:
- New SOTA on both homophilic (Cora, CiteSeer, PubMed) and heterophilic (Texas, Film, Wisconsin) benchmarks โ inverting the NSD trend that required large stalk dimensions for heterophilic gains.
- Diagonal maps + small d match or exceed NSD with dense maps + large d.
- Lower runtime and memory: no SVD, sparse recurrence, small stalk dimensions.
- Spectral filter shape is interpretable: the model learns when to apply low-pass (homophilic) vs. high-pass (heterophilic) filters.
โ Key Takeaways
- PolyNSD replaces the NSD diffusion operator with a degree-K Chebyshev polynomial in the normalised sheaf Laplacian, evaluated via a stable three-term recurrence.
- Diagonal restriction maps are sufficient โ decoupling performance from stalk dimension and reducing parameters from O(dยฒ) to O(d) per edge.
- Stable by design: convex mixture coefficients + spectral rescaling + residual paths prevent gradient collapse.
- SOTA on homo- and heterophilic benchmarks with lower runtime and memory than NSD.
