GNNs for Recommender Systems

5 minute read

Published: May 22, 2024

TL;DR: The user-item interaction graph is bipartite: users on one side, items on the other, with edges representing clicks/purchases/ratings. GCN-style propagation on this graph captures multi-hop collaborative signals — "users who liked what you liked also liked X." LightGCN simplifies this to pure propagation without transformation, achieving state-of-the-art efficiency.

PinSage recommendation GNN — PinSage: graph convolutional network for web-scale recommender systems (Ying et al., 2018)

Recommendation as a Graph Problem

Intuition First: Matrix factorisation is like learning that “Alice likes comedies” and “this film is a comedy” and multiplying those two vectors. It captures direct user–item similarity but cannot represent the chain: “Alice liked this film, Bob also liked it, Bob also liked that other film, so Alice might like that other film too.” GNNs capture this multi-hop chain by propagating information along the bipartite graph — 2-hop neighbours of Alice (items liked by users who liked Alice’s items) are exactly the collaborative filtering signal that matrix factorisation misses.

Traditional collaborative filtering: learn user embedding e_u and item embedding e_i; predict score as e_u · e_i. This captures pairwise similarity but not higher-order structure.

GNN approach: build a bipartite graph where user u is connected to item i if u interacted with i. Run GNN to produce user/item embeddings that capture multi-hop neighbourhood structure:

1-hop: items u has interacted with (or users who interacted with i)
2-hop: items interacted with by users who also interacted with u’s items (“collaborative filtering signal”)
3-hop: transitive similarities

The Bipartite User-Item Graph

G = (U ∪ I, E) where (u, i) ∈ E if user u interacted with item i

Message passing on this bipartite graph:

User aggregation (from items):

h^{(k)}_u = AGG({ h^{(k-1)}_i : i ∈ N(u) })

Item aggregation (from users):

h^{(k)}_i = AGG({ h^{(k-1)}_u : u ∈ N(i) })

After K layers, h^{(K)}_u encodes the K-hop neighbourhood — capturing collaborative filtering signals up to K hops.

LightGCN (He et al., 2020)

LightGCN makes a key simplification: remove weight matrices and non-linearities. The propagation is pure averaging:

h^{(k)}_u = Σ_{i ∈ N(u)} (1/√|N(u)||N(i)|) h^{(k-1)}_i h^{(k)}_i = Σ_{u ∈ N(i)} (1/√|N(i)||N(u)|) h^{(k-1)}_u

Final embedding: weighted combination of all layers:

e_u = Σ_{k=0}^{K} α_k h^{(k)}_u

Where α_k = 1/(K+1) typically. Score: ê_{ui} = e_u · e_i.

Why remove transformations? Empirically, on collaborative filtering benchmarks, removing W_k and σ(·) improves performance. The collaborative filtering signal is in the propagation, not the transformation — adding learnable matrices introduces overfitting without expressiveness gains.

LightGCN's key insight: Standard GCNs were designed for graphs with rich node features. In collaborative filtering, nodes have only ID embeddings (no features). The transformation W is not useful — it merely maps one random initialisation to another. Pure propagation propagates collaborative signals without adding noise. This is the recommender-system-specific reason why simpler is better.

PinSage (Ying et al., 2018)

Pinterest’s GNN for image recommendation — one of the first industrial deployments of GNNs.

Scale: 3 billion nodes (pins + boards), 18 billion edges, 7500 GPUs.

Key innovations:

GraphSAGE-style sampling: for each node, sample a fixed-size neighbourhood (not full neighbourhood) — makes computation tractable at scale
Random walk importance sampling: sample neighbours by importance (how often they co-occur in random walks), not uniformly
Curriculum training: gradually increase neighbourhood size during training

NGCF and Variants

NGCF (Wang et al., 2019): adds explicit feature interaction in message passing:

m_{ui} = (W_1 h_i + W_2 (h_i ⊙ h_u)) / √|N(u)||N(i)|

The Hadamard product h_i ⊙ h_u captures user-item feature interactions. LightGCN showed this adds overfitting without expressive benefit on standard benchmarks — but for rich feature settings it can help.

GNN propagation on the bipartite graph: Alice and Bob both liked Item2 (1-hop). Bob also liked Item3 (2-hop). LightGCN propagates this signal to suggest Item3 for Alice.

Session-Based Recommendation

Standard CF assumes all past interactions are known for each user. Session-based recommendation has no long-term user history — only the current session (sequence of clicks).

SR-GNN (Wu et al., 2019): model a session as a directed graph (clicks are edges from previous item to next item). Run GCN on session graph, then use attention to extract user intent from node embeddings. This captures transition patterns between items within a session.

Knowledge Graph-Enhanced Recommendation

KGNN-LS / KGCN: enrich the item side with a knowledge graph (item → category, brand, attributes). GNN propagates over both the user-item graph and the item knowledge graph simultaneously.

Benefit: cold-start items with no interactions can leverage KG features (genre, director for movies) to receive recommendations from users with similar taste in KG-related items.

Summary

Model	Key idea	Scale
Matrix Factorisation	Pairwise similarity only	Any
NGCF	GCN + feature interaction	Millions
LightGCN	GCN without transformation	Billions (efficient)
PinSage	GraphSAGE + importance sampling	3 billion nodes
SR-GNN	Session graph + GCN	Millions

GNNs are now the dominant paradigm for production recommendation systems at scale — deployed by Pinterest, Alibaba, Amazon, Netflix, and most major e-commerce platforms.

References

Ying, R., He, R., Chen, K., Eksombatchai, P., Hamilton, W. L., & Leskovec, J. (2018). Graph Convolutional Neural Networks for Web-Scale Recommender Systems. KDD 2018 (PinSage: GraphSAGE-based GNN for Pinterest’s 3-billion-node pin-board graph at production scale).
He, X., Deng, K., Wang, X., Li, Y., Zhang, Y., & Wang, M. (2020). LightGCN: Simplifying and Powering Graph Convolution Network for Recommendation. SIGIR 2020 (LightGCN: ablation study showing that removing feature transformation and activation from GCN improves collaborative filtering).
Wu, S., Tang, Y., Zhu, Y., Wang, L., Xie, X., & Tan, T. (2019). Session-Based Recommendation with Graph Neural Networks. AAAI 2019 (SR-GNN: models session sequences as directed graphs for next-item prediction with GNN).

Share on

Bluesky Facebook LinkedIn X (formerly Twitter)

Alessio Borgi

GNNs for Recommender Systems

Recommendation as a Graph Problem

The Bipartite User-Item Graph

LightGCN (He et al., 2020)

PinSage (Ying et al., 2018)

NGCF and Variants

Session-Based Recommendation

Knowledge Graph-Enhanced Recommendation

Summary

References

Share on

You May Also Enjoy

Output, Gated, and Special Activations: Softmax, GLU, SIREN, and More

Modern Activation Functions: GELU, SiLU, Mish, and Smooth Gating

Activation Functions in Neural Networks: Why Non-Linearity Matters

FoPE: Fourier Position Embedding for Length Generalization

Alessio Borgi

Recommendation as a Graph Problem

The Bipartite User-Item Graph

LightGCN (He et al., 2020)

PinSage (Ying et al., 2018)

NGCF and Variants

Session-Based Recommendation

Knowledge Graph-Enhanced Recommendation

Summary

References

Share on

You May Also Enjoy

📄 Output, Gated, and Special Activations: Softmax, GLU, SIREN, and More

📄 Modern Activation Functions: GELU, SiLU, Mish, and Smooth Gating

📄 Activation Functions in Neural Networks: Why Non-Linearity Matters

📄 FoPE: Fourier Position Embedding for Length Generalization

Output, Gated, and Special Activations: Softmax, GLU, SIREN, and More

Modern Activation Functions: GELU, SiLU, Mish, and Smooth Gating

Activation Functions in Neural Networks: Why Non-Linearity Matters

FoPE: Fourier Position Embedding for Length Generalization