GCN: Graph Convolutional Networks

3 minute read

Published: February 05, 2024

TL;DR: GCN's layer formula is H' = σ(Ã·H·W), where Ã = D̃^(-1/2)(A+I)D̃^(-1/2) is the degree-normalised adjacency with self-loops. Each layer averages neighbour representations and applies a learned linear transformation. Simple. Powerful. The foundation everything else builds on.

From Spectral Theory to a Practical Layer

As explained in the Graph Laplacian post, spectral graph convolution filters signals using the Laplacian’s eigenvectors. But computing eigenvectors is O(N³) — completely infeasible for large graphs.

Kipf & Welling (2016) made two elegant simplifications:

Localise: approximate the spectral filter as a first-order polynomial in L — only immediate neighbours matter.
Normalise: use symmetric normalisation to prevent large-degree nodes from dominating.

The result is a layer that can be computed with simple sparse matrix multiplication.

The GCN Formula

H^(l+1) = σ( D̃^-½ Ã D̃^-½ · H^(l) · W^(l) )

Where:

H^(l) — node features at layer l (N × F matrix, one row per node).
Ã = A + I — adjacency with self-loops added.
D̃ — degree matrix of Ã.
W^(l) — learned weight matrix (F × F’ ).
σ — activation function (ReLU).

Breaking It Down Step by Step

Let’s trace what happens for a single node v with neighbours A, B, C:

Step 1: Add self-loops. Ã = A + I ensures v attends to itself too, not just its neighbours. Without this, a node’s own features would be excluded from the aggregation.

Step 2: Compute normalisation weights. The weight for the edge (u→v) is: 1 / √(deg_Ã(u) · deg_Ã(v)).

This is the symmetric normalisation. It down-weights edges from high-degree “hub” nodes (which would otherwise dominate).

Step 3: Aggregate. (D̃^(-½) Ã D̃^(-½) · H)[v] = weighted sum of features from v and all its neighbours.

Step 4: Transform. Multiply by W: linear projection to a new feature space.

Step 5: Activate. Apply ReLU (or other non-linearity).

Figure 1: GCN aggregation at node v. Degree normalisation (1/√(deg_u · deg_v)) prevents high-degree hub nodes from dominating the aggregation. The self-loop ensures v's own features are included.

What GCN Is in MPNN Terms

MSG: send neighbour features multiplied by normalisation weight.
AGGREGATE: weighted sum (the normalised adjacency does this).
UPDATE: apply a single linear layer W followed by ReLU.

Stacking Layers and the Oversmoothing Problem

With k GCN layers, each node’s representation captures its k-hop neighbourhood.

k=2: captures 2-hop info (good for most citation/social graph tasks).
k≫1: oversmoothing — all node representations converge to the same vector as information diffuses uniformly across the whole graph, making nodes indistinguishable.

This is GCN’s main limitation. GAT, GPRGNN, and JK-Nets propose various solutions.

✅ Key Takeaways

GCN layer: H' = σ(Ã_norm · H · W), where Ã_norm is the degree-normalised adjacency with self-loops.
All neighbours contribute equally (weighted only by degree normalisation) — no attention, no learned weights on edges.
Simple and fast: just sparse matrix multiplication + a linear layer.
Main weakness: oversmoothing with deep stacks; all node embeddings converge to the same vector.

Share on

Bluesky Facebook LinkedIn X (formerly Twitter)

Alessio Borgi

GCN: Graph Convolutional Networks

From Spectral Theory to a Practical Layer

The GCN Formula

Breaking It Down Step by Step

What GCN Is in MPNN Terms

Stacking Layers and the Oversmoothing Problem

✅ Key Takeaways

Share on

You May Also Enjoy

GIN: Graph Isomorphism Network — The Most Expressive GNN

GraphSAGE: Inductive Learning on Large Graphs

GAT: Graph Attention Networks

Message Passing: The Universal GNN Framework