ChebNet: Spectral Graph Convolutions via Chebyshev Polynomials

4 minute read

Published:

TL;DR: ChebNet (Defferrard et al., 2016) approximates spectral graph filters using K-th order Chebyshev polynomials of the Laplacian. This eliminates the need for eigendecomposition, runs in O(K·|E|) time, and each node only aggregates from its K-hop neighbourhood. GCN is the special case K=1.

The Problem with Direct Spectral Convolution

The spectral graph convolution h_θ *_G f = U h_θ(Λ) Uᵀ f requires:

  1. Full eigendecomposition of L: O(N³) — infeasible for large graphs
  2. Multiplying by U and Uᵀ: O(N²) per forward pass
  3. Learned filter h_θ(Λ) has N parameters — one per eigenvalue — not scalable

ChebNet solves all three problems at once.

Chebyshev Polynomial Approximation

Chebyshev polynomials {T₀, T₁, T₂, …} form an orthonormal basis for functions on [−1, 1] — they are the optimal polynomial approximation in the minimax sense.

ChebNet approximates the spectral filter h_θ(Λ) as a degree-K Chebyshev polynomial:

h_θ(Λ) ≈ Σₖ₌₀ᴷ θₖ Tₖ(Λ̃)

Where Λ̃ = 2Λ/λₘₐₓ − I rescales eigenvalues to [−1, 1], and the Chebyshev polynomials satisfy the recursion:

T₀(x) = 1    T₁(x) = x    Tₖ(x) = 2x Tₖ₋₁(x) − Tₖ₋₂(x)

Why this matters: applying T_k(Λ̃) to f does not require computing eigenvectors. Since T_k(L̃) = U T_k(Λ̃) Uᵀ, we can compute:

ȳ = Σₖ θₖ Tₖ(L̃) f = Σₖ θₖ z̄ₖ

Where z̄₀ = f, z̄₁ = L̃f, z̄ₖ = 2L̃z̄ₖ₋₁ − z̄ₖ₋₂ — a simple matrix-vector recursion using only sparse matrix-vector products with L̃.

Computational Properties

PropertyDirect spectralChebNet (order K)  
EigendecompositionO(N³)None needed  
Forward pass costO(N²)O(K ·E)
Filter parametersNK+1  
Spatial localityNon-local (all nodes)K-hop neighbourhood  
Scalable to large graphsNoYes  

Spatial interpretation: T_k(L̃) applied to f aggregates information from the k-hop neighbourhood of each node. A K-th order ChebNet is equivalent to aggregating from all nodes within K hops — exactly what K stacked message-passing layers would do.

Key insight: By choosing Chebyshev polynomials, Defferrard et al. simultaneously (1) avoided eigendecomposition, (2) reduced filter parameters from N to K+1, and (3) localised the filter to K-hop neighbourhoods. This makes ChebNet a spatial-spectral bridge: defined spectrally, computed spatially.

The ChebNet Layer

In the multi-feature case, a ChebNet layer with F_in input features and F_out output features:

H_out = σ( Σₖ₌₀ᴷ Tₖ(L̃) H_in Θₖ )

Where Θₖ ∈ ℝ^{F_in × F_out} are learnable weight matrices for each polynomial order k.

Learnable parameters: K × F_in × F_out (versus a single F_in × F_out for a standard linear layer).

From ChebNet to GCN

GCN (Kipf & Welling, 2017) simplifies ChebNet to K=1:

h_θ(L̃) ≈ θ₀ T₀(L̃) + θ₁ T₁(L̃) = θ₀ I + θ₁ L̃

With the further simplification θ = θ₀ = −θ₁ (one free parameter) and the renormalisation trick (adding self-loops to avoid numerical issues):

GCN: H' = σ( D̃^{-1/2} Ã D̃^{-1/2} H W )

Where à = A + I. This is the K=1 first-order Chebyshev approximation with parameter tying. GCN sacrifices filter expressiveness for simplicity and scalability.

ChebConv in Practice (PyG)

from torch_geometric.nn import ChebConv

conv = ChebConv(in_channels=64, out_channels=64, K=3)
# K=3: aggregates from 3-hop neighbourhood
# 3 × 64 × 64 = 12,288 learnable parameters vs 64 × 64 = 4,096 for GCN

When to Use ChebNet vs GCN

ScenarioRecommendation
Need multi-hop receptive field explicitlyChebNet (set K)
Fast, simple baselineGCN (K=1)
Very large graph (>1M nodes)GCN (simpler, better optimised)
Spectral filter learningChebNet
Need long-range dependenciesMultiple layers (both) or Graph Transformers

Summary

ChebNet is the theoretical bridge between the spectral and spatial views of graph neural networks. It:

  • Avoids eigendecomposition via polynomial approximation
  • Achieves O(K·E) complexity (linear in edges)
  • Reduces filter parameters from N to K+1
  • Has provable locality (K-hop neighbourhood)
  • Subsumes GCN as the K=1 special case

Understanding ChebNet is essential for understanding why GCN works and what it approximates.

References