APPNP: Personalized PageRank Meets Graph Neural Networks

4 minute read

Published: April 08, 2024

TL;DR: APPNP (Klicpera et al., 2019) separates two entangled operations in GCN: (1) transform features via a neural network f_θ(X), (2) propagate via K steps of Personalized PageRank. The teleport probability α keeps each node anchored to its own representation, preventing over-smoothing even for K=10.

The Problem: Transformation and Propagation Are Entangled

In GCN, each layer simultaneously:

Aggregates neighbour features (graph propagation)
Transforms the aggregated features (learned weights)

This entanglement means you cannot independently control how deep you propagate (how large a neighbourhood you use) and how expressive the per-node transformation is. More layers → both deeper propagation AND more transformation depth → over-smoothing and vanishing gradients.

APPNP’s Decoupling

APPNP breaks this into two explicit, separate steps:

Step 1: Transform — apply an MLP to the raw node features:

H = f_θ(X) ∈ ℝ^{N × F}

This is a standard neural network (any depth, any nonlinearity) that processes each node independently. No graph structure is used here.

Step 2: Propagate — run K steps of Personalized PageRank starting from H:

Z^{(0)} = H Z^{(k+1)} = (1 − α) S̃ Z^{(k)} + α H Ŷ = softmax( Z^{(K)} )

Where S̃ is the normalised adjacency, α ∈ (0,1) is the teleport probability, and H is the initial (transformed) representation.

Why Personalized PageRank?

Regular PageRank diffuses information globally — after many steps, every node’s representation converges to the global stationary distribution (over-smoothing). The “personalized” variant prevents this:

At each step, with probability α, the random walk teleports back to the starting node’s initial representation H[v]. This keeps each node “anchored” to its own identity while still aggregating from the neighbourhood.

The closed-form solution of this Personalized PageRank propagation is:

Z* = α (I − (1−α) S̃)⁻¹ H

APPNP approximates this with K power iterations.

Why this prevents over-smoothing: In standard K-step propagation (GCN/SGC), each step mixes the representation with neighbours', eventually washing out local identity. The teleport probability α constantly re-injects the original (pre-propagation) representation, preventing the node from drifting too far from its own features regardless of how many propagation steps are used.

Hyperparameter Choices

α = 0.1 to 0.2: works well empirically. Small α → more propagation weight; large α → more anchoring.
K = 10 to 20: APPNP can use 10-20 propagation steps without over-smoothing (vs GCN which degrades beyond 2-3 layers).
The MLP (step 1) is typically 2 layers with ReLU and dropout.

APPNP and the Transformation-Propagation Paradigm

APPNP pioneered the idea of separating transformation from propagation, which became influential:

Model	Transformation	Propagation
GCN	Interleaved	Interleaved
SGC	None (identity)	S̃ᴷ (pre-computed)
APPNP	MLP on X	Personalized PageRank
SIGN	MLP on X	Multiple S̃ᵏ X concatenated

Pre-computation Option

Like SGC, the propagation step in APPNP does not involve learned parameters. The K iterations of Personalized PageRank can be computed on the final transformed features H — but since H depends on learned θ, this must be recomputed each epoch. However, the graph structure (S̃) is fixed, so the sparse matrix operations are efficient.

Performance

On Cora, CiteSeer, and Pubmed (standard node classification benchmarks):

Model	Cora	CiteSeer	Pubmed
GCN (2 layers)	81.5	70.3	79.0
GAT	83.0	72.5	79.0
APPNP (K=10, α=0.1)	83.3	71.8	80.1

APPNP matches or exceeds GCN/GAT at a fraction of the parameter count.

Summary

Property	GCN	APPNP
Propagation + transformation	Interleaved	Separate
Depth before over-smoothing	2-3 layers	10-20 steps
Over-smoothing mechanism	Repeated averaging	Teleport prevents drift
Parameters	K × W^{(k)}	One MLP
Theoretical basis	ChebNet (K=1)	Personalized PageRank

APPNP is not just a better GCN — it is a proof of concept that separating what to compute (transformation) from where to get information (propagation) is a productive design principle. Nearly all scalable modern GNNs inherit this idea.

References

Klicpera, J., Bojchevski, A., & Günnemann, S. (2019). Predict then Propagate: Graph Neural Networks meet Personalized PageRank. ICLR 2019.
Page, L., Brin, S., Motwani, R., & Winograd, T. (1999). The PageRank Citation Ranking: Bringing Order to the Web. Stanford InfoLab.

Share on

Bluesky Facebook LinkedIn X (formerly Twitter)

Alessio Borgi

APPNP: Personalized PageRank Meets Graph Neural Networks

The Problem: Transformation and Propagation Are Entangled

APPNP’s Decoupling

Why Personalized PageRank?

Hyperparameter Choices

APPNP and the Transformation-Propagation Paradigm

Pre-computation Option

Performance

Summary

References

Share on

You May Also Enjoy

Open Problems and Future Directions in Sheaf Neural Networks

Sheaf GNNs for Molecular Property Prediction

Sheaf GNNs for Heterophilic Node Classification: Benchmarks and Results

Sheaves Meet Attention: Transformer-Inspired Sheaf Models