APPNP: Personalized PageRank Meets Graph Neural Networks

4 minute read

Published:

TL;DR: APPNP (Klicpera et al., 2019) separates two entangled operations in GCN: (1) transform features via a neural network f_θ(X), (2) propagate via K steps of Personalized PageRank. The teleport probability α keeps each node anchored to its own representation, preventing over-smoothing even for K=10.

The Problem: Transformation and Propagation Are Entangled

In GCN, each layer simultaneously:

  • Aggregates neighbour features (graph propagation)
  • Transforms the aggregated features (learned weights)

This entanglement means you cannot independently control how deep you propagate (how large a neighbourhood you use) and how expressive the per-node transformation is. More layers → both deeper propagation AND more transformation depth → over-smoothing and vanishing gradients.

APPNP’s Decoupling

APPNP breaks this into two explicit, separate steps:

Step 1: Transform — apply an MLP to the raw node features:

H = f_θ(X) ∈ ℝ^{N × F}

This is a standard neural network (any depth, any nonlinearity) that processes each node independently. No graph structure is used here.

Step 2: Propagate — run K steps of Personalized PageRank starting from H:

Z^{(0)} = H Z^{(k+1)} = (1 − α) S̃ Z^{(k)} + α H Ŷ = softmax( Z^{(K)} )

Where S̃ is the normalised adjacency, α ∈ (0,1) is the teleport probability, and H is the initial (transformed) representation.

Why Personalized PageRank?

Regular PageRank diffuses information globally — after many steps, every node’s representation converges to the global stationary distribution (over-smoothing). The “personalized” variant prevents this:

At each step, with probability α, the random walk teleports back to the starting node’s initial representation H[v]. This keeps each node “anchored” to its own identity while still aggregating from the neighbourhood.

The closed-form solution of this Personalized PageRank propagation is:

Z* = α (I − (1−α) S̃)⁻¹ H

APPNP approximates this with K power iterations.

Why this prevents over-smoothing: In standard K-step propagation (GCN/SGC), each step mixes the representation with neighbours', eventually washing out local identity. The teleport probability α constantly re-injects the original (pre-propagation) representation, preventing the node from drifting too far from its own features regardless of how many propagation steps are used.

Hyperparameter Choices

  • α = 0.1 to 0.2: works well empirically. Small α → more propagation weight; large α → more anchoring.
  • K = 10 to 20: APPNP can use 10-20 propagation steps without over-smoothing (vs GCN which degrades beyond 2-3 layers).
  • The MLP (step 1) is typically 2 layers with ReLU and dropout.

APPNP and the Transformation-Propagation Paradigm

APPNP pioneered the idea of separating transformation from propagation, which became influential:

ModelTransformationPropagation
GCNInterleavedInterleaved
SGCNone (identity)S̃ᴷ (pre-computed)
APPNPMLP on XPersonalized PageRank
SIGNMLP on XMultiple S̃ᵏ X concatenated

Pre-computation Option

Like SGC, the propagation step in APPNP does not involve learned parameters. The K iterations of Personalized PageRank can be computed on the final transformed features H — but since H depends on learned θ, this must be recomputed each epoch. However, the graph structure (S̃) is fixed, so the sparse matrix operations are efficient.

Performance

On Cora, CiteSeer, and Pubmed (standard node classification benchmarks):

ModelCoraCiteSeerPubmed
GCN (2 layers)81.570.379.0
GAT83.072.579.0
APPNP (K=10, α=0.1)83.371.880.1

APPNP matches or exceeds GCN/GAT at a fraction of the parameter count.

Summary

PropertyGCNAPPNP
Propagation + transformationInterleavedSeparate
Depth before over-smoothing2-3 layers10-20 steps
Over-smoothing mechanismRepeated averagingTeleport prevents drift
ParametersK × W^{(k)}One MLP
Theoretical basisChebNet (K=1)Personalized PageRank

APPNP is not just a better GCN — it is a proof of concept that separating what to compute (transformation) from where to get information (propagation) is a productive design principle. Nearly all scalable modern GNNs inherit this idea.

References