GCN: Graph Convolutional Networks
Published:
From Spectral Theory to a Practical Layer
As explained in the Graph Laplacian post, spectral graph convolution filters signals using the Laplacian’s eigenvectors. But computing eigenvectors is O(N³) — completely infeasible for large graphs.
Kipf & Welling (2016) made two elegant simplifications:
- Localise: approximate the spectral filter as a first-order polynomial in L — only immediate neighbours matter.
- Normalise: use symmetric normalisation to prevent large-degree nodes from dominating.
The result is a layer that can be computed with simple sparse matrix multiplication.
The GCN Formula
Where:
H^(l)— node features at layer l (N × F matrix, one row per node).Ã = A + I— adjacency with self-loops added.D̃— degree matrix of Ã.W^(l)— learned weight matrix (F × F’ ).σ— activation function (ReLU).
Breaking It Down Step by Step
Let’s trace what happens for a single node v with neighbours A, B, C:
Step 1: Add self-loops. Ã = A + I ensures v attends to itself too, not just its neighbours. Without this, a node’s own features would be excluded from the aggregation.
Step 2: Compute normalisation weights. The weight for the edge (u→v) is: 1 / √(deg_Ã(u) · deg_Ã(v)).
This is the symmetric normalisation. It down-weights edges from high-degree “hub” nodes (which would otherwise dominate).
Step 3: Aggregate. (D̃^(-½) Ã D̃^(-½) · H)[v] = weighted sum of features from v and all its neighbours.
Step 4: Transform. Multiply by W: linear projection to a new feature space.
Step 5: Activate. Apply ReLU (or other non-linearity).
What GCN Is in MPNN Terms
- MSG: send neighbour features multiplied by normalisation weight.
- AGGREGATE: weighted sum (the normalised adjacency does this).
- UPDATE: apply a single linear layer W followed by ReLU.
Stacking Layers and the Oversmoothing Problem
With k GCN layers, each node’s representation captures its k-hop neighbourhood.
- k=2: captures 2-hop info (good for most citation/social graph tasks).
- k≫1: oversmoothing — all node representations converge to the same vector as information diffuses uniformly across the whole graph, making nodes indistinguishable.
This is GCN’s main limitation. GAT, GPRGNN, and JK-Nets propose various solutions.
✅ Key Takeaways
- GCN layer: H' = σ(Ã_norm · H · W), where Ã_norm is the degree-normalised adjacency with self-loops.
- All neighbours contribute equally (weighted only by degree normalisation) — no attention, no learned weights on edges.
- Simple and fast: just sparse matrix multiplication + a linear layer.
- Main weakness: oversmoothing with deep stacks; all node embeddings converge to the same vector.
