SGC: Simple Graph Convolution

4 minute read

Published:

TL;DR: SGC (Wu et al., 2019) collapses K GCN layers into one: propagate features K times with the normalised adjacency (no weights, no nonlinearities), then apply a single linear classifier. This makes propagation pre-computable, turns the model into a linear logistic regression on smoothed features, and runs orders of magnitude faster than GCN with comparable accuracy.

The GCN Formula Revisited

GCN with K layers applies:

H^{(k+1)} = σ( S̃ H^{(k)} W^{(k)} )

Where S̃ = D̃^{-1/2} Ã D̃^{-1/2} is the normalised adjacency with self-loops, and σ is ReLU.

Each layer: (1) aggregate from neighbours (S̃ H^{(k)}), (2) apply a linear transformation W^{(k)}, (3) apply nonlinearity σ.

The nonlinearity σ couples the graph propagation and the linear transformation — you cannot separate or pre-compute them.

The SGC Simplification

SGC (Simple Graph Convolution) makes two changes:

  1. Remove all intermediate nonlinearities
  2. Collapse all weight matrices into one
Ŷ = softmax( S̃ᴷ X W )

Where S̃ᴷ = S̃ · S̃ · … · S̃ (K times) is the K-th power of the normalised adjacency.

The entire model is now:

  1. Pre-compute S̃ᴷ X (K steps of feature propagation — this is a fixed linear operation)
  2. Apply a single linear classifier W to the smoothed features
  3. Softmax for probabilities

Pre-Computation: The Key Efficiency Gain

Because S̃ᴷ X involves no learned parameters, it can be computed once before training and cached. Training then reduces to logistic regression on the pre-computed features S̃ᴷ X.

Computational cost comparison (training, N nodes, K=2):

ModelForward pass cost  
GCN (2 layers)O(E· d₁ + N · d₁ · d₂)
SGC (K=2)**O(E· d)** pre-compute, then O(N · d · C) per epoch

For Cora (2,708 nodes, 10,556 edges): SGC is 11× faster than GCN with similar accuracy.

What does this tell us? If removing all nonlinearities between GCN layers has minimal impact on accuracy, then those nonlinearities were not doing much. The expressive power of GCN comes largely from the graph propagation, not the intermediate MLPs. The final nonlinearity (softmax) is sufficient.

Theoretical Interpretation

S̃ᴷ X is a K-step random walk smoothing of node features. Each application of S̃ moves each node’s feature vector toward the weighted average of its neighbours. After K steps, each node’s feature is a weighted average over its K-hop neighbourhood.

SGC then applies a linear classifier on this smoothed representation — exactly analogous to a linear classifier on bag-of-words features, where the “bag” is the K-hop neighbourhood.

When Nonlinearities Do Matter

SGC’s simplification works on homophilic graphs (Cora, CiteSeer, Pubmed). On these, K-step averaging makes same-class nodes progressively more similar — which is exactly what the linear classifier needs.

SGC fails relative to GCN on:

  • Heterophilic graphs: averaging across class boundaries hurts
  • Complex structural tasks: tasks where the pattern is in the local structure, not smooth features
  • Very deep propagation: without nonlinearities, K-step averaging becomes over-smoothing very quickly

SGC as Logistic Regression

After pre-computing X̃ = S̃ᴷ X, the SGC model is:

Ŷ = softmax( X̃ W ) trained with cross-entropy

This is exactly logistic regression on pre-computed graph-smoothed features. The model has no architecture hyperparameters beyond K. There is no depth, no hidden layers, no dropout decisions.

This makes SGC the perfect ablation baseline: if SGC matches your GNN, the GNN’s complexity is not providing value.

SGC Variants and Successors

  • SIGN (2020): use multiple powers of S̃ simultaneously (X, S̃X, S̃²X, …) and concatenate, then apply an MLP. More expressive than SGC while keeping pre-computation.
  • APPNP: separate propagation from transformation with personalized PageRank.
  • GAMLP: large-scale variant using multiple propagation matrices.

Summary

PropertyGCNSGC
Per-layer nonlinearityYes (ReLU)No
Weight matricesK separate W^{(k)}Single W
Propagation pre-computableNoYes
Training costHigh (full forward pass)Low (logistic regression)
Accuracy (homophilic)Slightly betterComparable
Accuracy (heterophilic)PoorAlso poor

SGC reveals that graph neural networks, at least in the homophilic setting, are largely doing one thing: smoothing features over the graph and then classifying. The sophistication of multi-layer nonlinear GNNs can sometimes be simplified to this without loss — a profound insight about what graph learning actually requires.

References