Random Walk Positional Encodings

5 minute read

Published: April 19, 2024

TL;DR: RWPE encodes node v as a vector [RW^1[v,v], RW^2[v,v], ..., RW^k[v,v]] — the probability of landing back on v after 1, 2, ..., k random walk steps. This captures local structural roles (cycle membership, connectivity patterns) without sign ambiguity or expensive eigendecomposition.

Random walk positional encoding — Random Walk Structural Encodings (Dwivedi et al., 2022)

Intuition First

Imagine a random walker starting at node v. After k steps, what is the probability they ended up back at v? If v is in a tight clique, the walker is likely to boomerang back quickly — high return probability. If v is on a long path, the walker drifts away and rarely returns — low return probability.

By stacking the return probabilities at steps 1, 2, 3, …, K into a vector, we get a structural fingerprint for v. Nodes in triangles get elevated P³[v,v]; nodes in 4-cycles get elevated P⁴[v,v]. The fingerprint tells the model what local structure surrounds v — without any of the sign ambiguity of Laplacian eigenvectors.

A node inside a clique (left) has many short paths back to itself, giving high P^k[v,v] at small k. A node on a long path (right) has a very different profile. These profiles serve as structural fingerprints.

The Idea

The random walk matrix is P = D⁻¹A (row-stochastic adjacency). P^k[u,v] is the probability of reaching v from u in exactly k steps.

The diagonal entries P^k[v,v] — the probability of returning to v in k steps — encode v’s local structural role:

Nodes in dense cliques have high return probabilities (many short cycles)
Nodes on long paths have low return probabilities (random walks escape quickly)
Nodes in k-cycles have elevated P^k[v,v]

pe_v = [P^1[v,v], P^2[v,v], ..., P^K[v,v]] ∈ ℝᴷ

Why RWPE Works as a Structural Encoding

Each entry P^k[v,v] measures “how often does a k-step random walk starting from v return to v?” This is directly related to:

Cycle membership: v in a triangle → P³[v,v] > 0
Clique membership: v in a clique-k → high P^k[v,v]
Degree: high-degree nodes are returned to more often

RWPE is a structural encoding: it encodes the local structural role of v. Two nodes with identical local structure get the same RWPE — intended behaviour. Nodes with different cycle membership, degree patterns, or connectivity get different PEs.

Comparison with LapPE

Property	LapPE	RWPE
Type	Positional (global position)	Structural (local role)
Sign ambiguity	Yes — requires fix	No (probabilities are positive)
Computation	O(k·	E	·T) eigenvector	O(K·	E	) sparse power iteration
Captures	Global graph geometry	Local structural patterns
Distinguishes	Globally unique positions	Structurally unique roles
Used by	SAN, GPS	GPS, CRaWL, GRIT

Computing RWPE Efficiently

pe_v[k] = (D⁻¹A)^k [v,v] = e_vᵀ P^k e_v

You only need the diagonal of P^k. Using the recursion P^k = P · P^{k-1}, you can compute the full sequence iteratively:

P = D_inv @ A          # row-stochastic: P[v,:] sums to 1
Pk = torch.eye(N)      # P^0 = I
rwpe = []
for k in range(1, K+1):
    Pk = Pk @ P
    rwpe.append(Pk.diagonal())  # landing probs at step k
pe = torch.stack(rwpe, dim=-1)  # [N, K]

Cost: K sparse matrix multiplications, each O(

). Total: O(K ·

). Much cheaper than eigendecomposition for large K.

Key Insight: RWPE is a structural encoding, not a positional one. Two nodes with the same local topology (e.g., both in a triangle on degree-3 nodes) get the same RWPE vector — intentionally. This is correct for tasks that ask "what is this node's structural role?" but wrong for tasks that ask "which specific node is this?". For the latter, use LapPE.

Worked Example: Triangle vs. Star Center

Triangle node (3-clique, degree 2): P = [[0, 0.5, 0.5], [0.5, 0, 0.5], [0.5, 0.5, 0]]. Diagonal of P^k:

P¹[v,v] = 0 (can’t stay)
P²[v,v] = 0.5 (go to either neighbour and come back)
P³[v,v] = 0.25 (short triangle path)

Star center (degree 4, no cycles): all walks from the center go to leaves at step 1, come back at step 2. No odd-length return paths.

P¹[v,v] = 0
P²[v,v] = 1.0 (all 4 walks: center→leaf→center)
P³[v,v] = 0 (must go to leaf again, can’t return in odd steps)

These profiles [0, 0.5, 0.25, ...] vs [0, 1.0, 0, ...] are clearly different — the model can immediately distinguish a clique member from a star hub, which GCN without PE cannot.

RWPE in Practice

GPS (Rampášek et al., 2022) found RWPE competitive with LapPE and often preferable:

No sign ambiguity → more stable training
Cheaper to compute → scales better
Captures structural roles (cycles, cliques) rather than global positions

For tasks where structural role matters more than global position (molecular property prediction, graph classification), RWPE often outperforms LapPE.

Summary

RWPE is the practical, sign-free alternative to LapPE. It encodes local structural fingerprints — cycle membership, clique participation, escape probabilities — through simple sparse power iterations. When global position matters, use LapPE; when local structural role matters, use RWPE; when both matter, use both (GPS does this).

References

Dwivedi, V. P., Lim, A. T., Beaini, D., & Lió, P. (2021). Graph Neural Networks with Learnable Structural and Positional Representations. ICLR 2022.
Rampasek, L., Galkin, M., Dwivedi, V. P., Lim, A. T., Wolf, G., & Beaini, D. (2022). Recipe for a General, Powerful, Scalable Graph Transformer. NeurIPS 2022 (GPS — uses RWPE as default positional encoding).

Share on

Bluesky Facebook LinkedIn X (formerly Twitter)

Alessio Borgi

Random Walk Positional Encodings

Intuition First

The Idea

Why RWPE Works as a Structural Encoding

Comparison with LapPE

Computing RWPE Efficiently

Worked Example: Triangle vs. Star Center

RWPE in Practice

Summary

References

Share on

You May Also Enjoy

Output, Gated, and Special Activations: Softmax, GLU, SIREN, and More

Modern Activation Functions: GELU, SiLU, Mish, and Smooth Gating

Activation Functions in Neural Networks: Why Non-Linearity Matters

FoPE: Fourier Position Embedding for Length Generalization

Alessio Borgi

Intuition First

The Idea

Why RWPE Works as a Structural Encoding

Comparison with LapPE

Computing RWPE Efficiently

Worked Example: Triangle vs. Star Center

RWPE in Practice

Summary

References

Share on

You May Also Enjoy

📄 Output, Gated, and Special Activations: Softmax, GLU, SIREN, and More

📄 Modern Activation Functions: GELU, SiLU, Mish, and Smooth Gating

📄 Activation Functions in Neural Networks: Why Non-Linearity Matters

📄 FoPE: Fourier Position Embedding for Length Generalization

Output, Gated, and Special Activations: Softmax, GLU, SIREN, and More

Modern Activation Functions: GELU, SiLU, Mish, and Smooth Gating

Activation Functions in Neural Networks: Why Non-Linearity Matters

FoPE: Fourier Position Embedding for Length Generalization