PolyNSD: Polynomial Neural Sheaf Diffusion

3 minute read

Published:

TL;DR: Neural Sheaf Diffusion works but has problems โ€” SVD-based normalisation, dense restriction maps, and fragile gradients. PolyNSD fixes all three by using a degree-K polynomial in the normalised sheaf Laplacian (via a stable three-term recurrence), with only diagonal restriction maps, achieving SOTA on both homophilic and heterophilic benchmarks while being cheaper to run.
Paper: "Polynomial Neural Sheaf Diffusion"  ยท  arXiv:2512.00242
Authors: A. Borgi, P. Liรฒ
Venue: arXiv preprint, 2025  ยท  ๐Ÿ“„ Read the paper

Background: Neural Sheaf Diffusion

A Sheaf Neural Network enriches a graph with a cellular sheaf: each node and edge gets a vector space (a stalk), and each endpoint of each edge gets a restriction map encoding how node signals relate to edge signals. The sheaf Laplacian encodes this relational geometry and replaces the standard graph Laplacian in the diffusion operator.

Neural Sheaf Diffusion (NSD) โ€” the dominant sheaf GNN approach โ€” learns restriction maps end-to-end and runs diffusion on the sheaf Laplacian. It handles heterophily well and resists oversmoothing, but has three practical problems:

  1. SVD-based normalisation: requires expensive SVD decomposition of the sheaf Laplacian at every layer, making Laplacian rebuilds slow.
  2. Dense restriction maps: one d ร— d matrix per node-edge pair, scaling quadratically with stalk dimension d.
  3. Brittle gradients: the normalised sheaf Laplacian construction is numerically unstable for large d, leading to gradient issues.

The PolyNSD Fix

PolyNSD replaces the NSD propagation operator with a degree-K polynomial of a spectrally rescaled normalised sheaf Laplacian, evaluated via a stable three-term Chebyshev recurrence.

This gives:

  • Explicit K-hop receptive field in a single layer (independently of the stalk dimension d).
  • Trainable spectral response as a convex mixture of K+1 orthogonal polynomial basis responses โ€” the model learns which frequency components to amplify or suppress.
  • No SVD needed: the recurrence only requires sparse matrix-vector products.
  • Stability via convex mixtures (coefficients sum to 1) + spectral rescaling to [โˆ’1, 1] + residual/gated paths.
PolyNSD Layer: Chebyshev recurrence, high-pass correction, gated residual
Figure 1 โ€” A single PolyNSD layer. The sheaf Laplacian polynomial is evaluated via Chebyshev three-term recurrence (left), followed by a high-pass correction gate and a gated residual connection for stable training.

Architecture Overview

PolyNSD end-to-end architecture overview
Figure 2 โ€” End-to-end PolyNSD. Input features are projected into stalk spaces using diagonal restriction maps; the polynomial sheaf diffusion layers run the Chebyshev recurrence over the normalised sheaf Laplacian; the output heads perform node classification or regression.

Diagonal Restriction Maps

The key parameter-reduction insight: diagonal restriction maps (a vector of d scalars per node-edge pair instead of a d ร— d matrix) are sufficient for strong performance. This reduces per-edge parameter count from O(dยฒ) to O(d) and decouples performance from large stalk dimensions.

Results

Test accuracy vs. stalk dimension on Cora, PubMed, Texas, Film
Figure 3 โ€” Test accuracy vs. stalk dimension on four benchmarks (Cora, PubMed, Texas, Film). PolyNSD (solid) maintains strong accuracy across all stalk dimensions; standard NSD (dashed) degrades with large stalk dimensions due to numerical instability. PolyNSD achieves its best results with diagonal maps at small stalk dimensions.

Key results vs. NSD and spectral GNN baselines:

  • New SOTA on both homophilic (Cora, CiteSeer, PubMed) and heterophilic (Texas, Film, Wisconsin) benchmarks โ€” inverting the NSD trend that required large stalk dimensions for heterophilic gains.
  • Diagonal maps + small d match or exceed NSD with dense maps + large d.
  • Lower runtime and memory: no SVD, sparse recurrence, small stalk dimensions.
  • Spectral filter shape is interpretable: the model learns when to apply low-pass (homophilic) vs. high-pass (heterophilic) filters.

โœ… Key Takeaways

  • PolyNSD replaces the NSD diffusion operator with a degree-K Chebyshev polynomial in the normalised sheaf Laplacian, evaluated via a stable three-term recurrence.
  • Diagonal restriction maps are sufficient โ€” decoupling performance from stalk dimension and reducing parameters from O(dยฒ) to O(d) per edge.
  • Stable by design: convex mixture coefficients + spectral rescaling + residual paths prevent gradient collapse.
  • SOTA on homo- and heterophilic benchmarks with lower runtime and memory than NSD.