ChebNet: Spectral Graph Convolutions via Chebyshev Polynomials
Published:
The Problem with Direct Spectral Convolution
The spectral graph convolution h_θ *_G f = U h_θ(Λ) Uᵀ f requires:
- Full eigendecomposition of L: O(N³) — infeasible for large graphs
- Multiplying by U and Uᵀ: O(N²) per forward pass
- Learned filter h_θ(Λ) has N parameters — one per eigenvalue — not scalable
ChebNet solves all three problems at once.
Chebyshev Polynomial Approximation
Chebyshev polynomials {T₀, T₁, T₂, …} form an orthonormal basis for functions on [−1, 1] — they are the optimal polynomial approximation in the minimax sense.
ChebNet approximates the spectral filter h_θ(Λ) as a degree-K Chebyshev polynomial:
Where Λ̃ = 2Λ/λₘₐₓ − I rescales eigenvalues to [−1, 1], and the Chebyshev polynomials satisfy the recursion:
Why this matters: applying T_k(Λ̃) to f does not require computing eigenvectors. Since T_k(L̃) = U T_k(Λ̃) Uᵀ, we can compute:
Where z̄₀ = f, z̄₁ = L̃f, z̄ₖ = 2L̃z̄ₖ₋₁ − z̄ₖ₋₂ — a simple matrix-vector recursion using only sparse matrix-vector products with L̃.
Computational Properties
| Property | Direct spectral | ChebNet (order K) | ||
|---|---|---|---|---|
| Eigendecomposition | O(N³) | None needed | ||
| Forward pass cost | O(N²) | O(K · | E | ) |
| Filter parameters | N | K+1 | ||
| Spatial locality | Non-local (all nodes) | K-hop neighbourhood | ||
| Scalable to large graphs | No | Yes |
Spatial interpretation: T_k(L̃) applied to f aggregates information from the k-hop neighbourhood of each node. A K-th order ChebNet is equivalent to aggregating from all nodes within K hops — exactly what K stacked message-passing layers would do.
The ChebNet Layer
In the multi-feature case, a ChebNet layer with F_in input features and F_out output features:
Where Θₖ ∈ ℝ^{F_in × F_out} are learnable weight matrices for each polynomial order k.
Learnable parameters: K × F_in × F_out (versus a single F_in × F_out for a standard linear layer).
From ChebNet to GCN
GCN (Kipf & Welling, 2017) simplifies ChebNet to K=1:
With the further simplification θ = θ₀ = −θ₁ (one free parameter) and the renormalisation trick (adding self-loops to avoid numerical issues):
Where à = A + I. This is the K=1 first-order Chebyshev approximation with parameter tying. GCN sacrifices filter expressiveness for simplicity and scalability.
ChebConv in Practice (PyG)
from torch_geometric.nn import ChebConv
conv = ChebConv(in_channels=64, out_channels=64, K=3)
# K=3: aggregates from 3-hop neighbourhood
# 3 × 64 × 64 = 12,288 learnable parameters vs 64 × 64 = 4,096 for GCN
When to Use ChebNet vs GCN
| Scenario | Recommendation |
|---|---|
| Need multi-hop receptive field explicitly | ChebNet (set K) |
| Fast, simple baseline | GCN (K=1) |
| Very large graph (>1M nodes) | GCN (simpler, better optimised) |
| Spectral filter learning | ChebNet |
| Need long-range dependencies | Multiple layers (both) or Graph Transformers |
Summary
ChebNet is the theoretical bridge between the spectral and spatial views of graph neural networks. It:
- Avoids eigendecomposition via polynomial approximation
Achieves O(K· E ) complexity (linear in edges) - Reduces filter parameters from N to K+1
- Has provable locality (K-hop neighbourhood)
- Subsumes GCN as the K=1 special case
Understanding ChebNet is essential for understanding why GCN works and what it approximates.
References
- Defferrard, M., Bresson, X., & Vandergheynst, P. (2016). Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering. NeurIPS 2016.
- Kipf, T. N., & Welling, M. (2017). Semi-Supervised Classification with Graph Convolutional Networks. ICLR 2017.
