APPNP: Personalized PageRank Meets Graph Neural Networks
Published:
The Problem: Transformation and Propagation Are Entangled
In GCN, each layer simultaneously:
- Aggregates neighbour features (graph propagation)
- Transforms the aggregated features (learned weights)
This entanglement means you cannot independently control how deep you propagate (how large a neighbourhood you use) and how expressive the per-node transformation is. More layers → both deeper propagation AND more transformation depth → over-smoothing and vanishing gradients.
APPNP’s Decoupling
APPNP breaks this into two explicit, separate steps:
Step 1: Transform — apply an MLP to the raw node features:
This is a standard neural network (any depth, any nonlinearity) that processes each node independently. No graph structure is used here.
Step 2: Propagate — run K steps of Personalized PageRank starting from H:
Where S̃ is the normalised adjacency, α ∈ (0,1) is the teleport probability, and H is the initial (transformed) representation.
Why Personalized PageRank?
Regular PageRank diffuses information globally — after many steps, every node’s representation converges to the global stationary distribution (over-smoothing). The “personalized” variant prevents this:
At each step, with probability α, the random walk teleports back to the starting node’s initial representation H[v]. This keeps each node “anchored” to its own identity while still aggregating from the neighbourhood.
The closed-form solution of this Personalized PageRank propagation is:
APPNP approximates this with K power iterations.
Hyperparameter Choices
- α = 0.1 to 0.2: works well empirically. Small α → more propagation weight; large α → more anchoring.
- K = 10 to 20: APPNP can use 10-20 propagation steps without over-smoothing (vs GCN which degrades beyond 2-3 layers).
- The MLP (step 1) is typically 2 layers with ReLU and dropout.
APPNP and the Transformation-Propagation Paradigm
APPNP pioneered the idea of separating transformation from propagation, which became influential:
| Model | Transformation | Propagation |
|---|---|---|
| GCN | Interleaved | Interleaved |
| SGC | None (identity) | S̃ᴷ (pre-computed) |
| APPNP | MLP on X | Personalized PageRank |
| SIGN | MLP on X | Multiple S̃ᵏ X concatenated |
Pre-computation Option
Like SGC, the propagation step in APPNP does not involve learned parameters. The K iterations of Personalized PageRank can be computed on the final transformed features H — but since H depends on learned θ, this must be recomputed each epoch. However, the graph structure (S̃) is fixed, so the sparse matrix operations are efficient.
Performance
On Cora, CiteSeer, and Pubmed (standard node classification benchmarks):
| Model | Cora | CiteSeer | Pubmed |
|---|---|---|---|
| GCN (2 layers) | 81.5 | 70.3 | 79.0 |
| GAT | 83.0 | 72.5 | 79.0 |
| APPNP (K=10, α=0.1) | 83.3 | 71.8 | 80.1 |
APPNP matches or exceeds GCN/GAT at a fraction of the parameter count.
Summary
| Property | GCN | APPNP |
|---|---|---|
| Propagation + transformation | Interleaved | Separate |
| Depth before over-smoothing | 2-3 layers | 10-20 steps |
| Over-smoothing mechanism | Repeated averaging | Teleport prevents drift |
| Parameters | K × W^{(k)} | One MLP |
| Theoretical basis | ChebNet (K=1) | Personalized PageRank |
APPNP is not just a better GCN — it is a proof of concept that separating what to compute (transformation) from where to get information (propagation) is a productive design principle. Nearly all scalable modern GNNs inherit this idea.
References
- Klicpera, J., Bojchevski, A., & Günnemann, S. (2019). Predict then Propagate: Graph Neural Networks meet Personalized PageRank. ICLR 2019.
- Page, L., Brin, S., Motwani, R., & Winograd, T. (1999). The PageRank Citation Ranking: Bringing Order to the Web. Stanford InfoLab.
