GNNs for Recommender Systems

TL;DR: The user-item interaction graph is bipartite: users on one side, items on the other, with edges representing clicks, purchases, or ratings. GCN-style propagation on this graph captures multi-hop collaborative signals: "users who liked what you liked also liked X." LightGCN simplifies this to pure propagation with no feature transformation or non-linearity, which improves accuracy on standard collaborative filtering benchmarks while leaving the ID embeddings as the only trainable parameters.

Recommendation as a Graph Problem

Traditional collaborative filtering: learn user embedding e_u and item embedding e_i; predict score as e_u · e_i. This captures pairwise similarity but not higher-order structure.

GNN approach: build a bipartite graph where user u is connected to item i if u interacted with i. Run a GNN to produce user and item embeddings that capture multi-hop neighbourhood structure. Counting hops from a user u (the picture is symmetric when starting from an item):

  • 1-hop: items u has interacted with
  • 2-hop: other users who interacted with those same items (users with similar behaviour)
  • 3-hop: items those similar users interacted with (the “collaborative filtering signal”: candidate items for u)
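To make the hop structure concrete, here is a minimal sketch that runs a breadth-first search over a toy bipartite graph and groups nodes by the hop at which they are first reached. The function name `hop_levels`, the `("user", ...)` / `("item", ...)` tagging, and the toy interactions are all assumptions made for illustration.

```python
from collections import deque

def hop_levels(interactions, start, max_hops):
    """Breadth-first search from `start`.

    Returns {hop: set of nodes first reached at that hop}.
    """
    # Undirected adjacency list; users and items are tagged so names cannot collide.
    adj = {}
    for user, item in interactions:
        u, i = ("user", user), ("item", item)
        adj.setdefault(u, set()).add(i)
        adj.setdefault(i, set()).add(u)

    dist = {start: 0}
    queue = deque([start])
    levels = {}
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand beyond the requested depth
        for nbr in adj.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                levels.setdefault(dist[nbr], set()).add(nbr)
                queue.append(nbr)
    return levels

# Toy data (made up): alice and bob both clicked "book"; bob also clicked "lamp".
toy = [("alice", "book"), ("bob", "book"), ("bob", "lamp")]
levels = hop_levels(toy, ("user", "alice"), max_hops=3)
```

Starting from alice, odd hops land on items and even hops land on users, because every edge crosses from one side of the bipartite graph to the other. The item "lamp" only shows up at hop 3, via bob.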

The Bipartite User-Item Graph

G = (U ∪ I, E) where (u, i) ∈ E if user u interacted with item i

Message passing on this bipartite graph:

User aggregation (from items):

h^{(k)}_u = AGG({ h^{(k-1)}_i : i ∈ N(u) })

Item aggregation (from users):

h^{(k)}_i = AGG({ h^{(k-1)}_u : u ∈ N(i) })

After K layers, h^{(K)}_u encodes the K-hop neighbourhood — capturing collaborative filtering signals up to K hops.
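As a rough illustration of the two aggregation rules, the sketch below implements one layer with AGG chosen as a plain mean. That choice, the name `propagate_mean`, and the dictionary-based graph layout are assumptions for the example; real models use other aggregators and sparse matrix operations.

```python
def propagate_mean(user_items, h_user, h_item):
    """One message-passing layer on a bipartite graph given as {user: [items]}.

    Each node's new embedding is the mean of its neighbours' previous embeddings.
    Assumes every user in `user_items` has at least one item.
    """
    # Invert the adjacency so items can look up their users.
    item_users = {}
    for u, items in user_items.items():
        for i in items:
            item_users.setdefault(i, []).append(u)

    def mean(vectors):
        n = len(vectors)
        return [sum(column) / n for column in zip(*vectors)]

    # Users aggregate from items, items aggregate from users, both using layer k-1 values.
    new_user = {u: mean([h_item[i] for i in items]) for u, items in user_items.items()}
    new_item = {i: mean([h_user[u] for u in users]) for i, users in item_users.items()}
    return new_user, new_item

# Toy graph and embeddings (made up for illustration).
graph = {"u1": ["a", "b"], "u2": ["b"]}
h_user0 = {"u1": [1.0, 0.0], "u2": [0.0, 1.0]}
h_item0 = {"a": [2.0, 0.0], "b": [0.0, 2.0]}
h_user1, h_item1 = propagate_mean(graph, h_user0, h_item0)
```

Calling `propagate_mean` K times in a row gives the K-layer embeddings; both sides are updated from the previous layer's values, not from each other's fresh output.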

LightGCN (He et al., 2020)

LightGCN makes a key simplification: remove weight matrices and non-linearities. The propagation is a symmetrically normalised sum over neighbours:

h^{(k)}_u = Σ_{i ∈ N(u)} (1/√(|N(u)| |N(i)|)) h^{(k-1)}_i

h^{(k)}_i = Σ_{u ∈ N(i)} (1/√(|N(i)| |N(u)|)) h^{(k-1)}_u

Final embedding: weighted combination of all layers:

e_u = Σ_{k=0}^{K} α_k h^{(k)}_u

The item embedding e_i is combined the same way, and typically α_k = 1/(K+1). Score: ŷ_{ui} = e_u · e_i.
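Putting propagation, layer combination, and scoring together, a LightGCN-style forward pass might look like the sketch below. It uses plain Python lists instead of sparse tensors, and the names `lightgcn_embeddings` and `score`, the dictionary graph layout, and the toy values are assumptions for illustration, not the authors' implementation.

```python
import math

def lightgcn_embeddings(user_items, e_user, e_item, num_layers):
    """LightGCN-style forward pass on a graph given as {user: [items]}.

    Assumes every user and item in the embedding tables has at least one interaction.
    """
    item_users = {}
    for u, items in user_items.items():
        for i in items:
            item_users.setdefault(i, []).append(u)
    deg_u = {u: len(items) for u, items in user_items.items()}
    deg_i = {i: len(users) for i, users in item_users.items()}
    dim = len(next(iter(e_user.values())))

    def normalised_sum(neighbours, own_deg, nbr_deg, h_nbr):
        # Sum of neighbour embeddings, each weighted by 1 / sqrt(|N(self)| * |N(neighbour)|).
        out = [0.0] * dim
        for n in neighbours:
            w = 1.0 / math.sqrt(own_deg * nbr_deg[n])
            for d in range(dim):
                out[d] += w * h_nbr[n][d]
        return out

    h_u, h_i = e_user, e_item
    acc_u = {u: list(v) for u, v in e_user.items()}  # running sum over layers, starts at layer 0
    acc_i = {i: list(v) for i, v in e_item.items()}
    for _ in range(num_layers):
        new_u = {u: normalised_sum(user_items[u], deg_u[u], deg_i, h_i) for u in user_items}
        new_i = {i: normalised_sum(item_users[i], deg_i[i], deg_u, h_u) for i in item_users}
        h_u, h_i = new_u, new_i
        for u in acc_u:
            acc_u[u] = [a + b for a, b in zip(acc_u[u], h_u[u])]
        for i in acc_i:
            acc_i[i] = [a + b for a, b in zip(acc_i[i], h_i[i])]

    alpha = 1.0 / (num_layers + 1)  # uniform layer weights
    final_u = {u: [alpha * x for x in v] for u, v in acc_u.items()}
    final_i = {i: [alpha * x for x in v] for i, v in acc_i.items()}
    return final_u, final_i

def score(e_u, e_i):
    """Predicted preference: inner product of the final embeddings."""
    return sum(a * b for a, b in zip(e_u, e_i))

# Toy graph and layer-0 ID embeddings (made up for illustration).
toy_graph = {"u1": ["a", "b"], "u2": ["b"]}
e_user0 = {"u1": [1.0, 0.0], "u2": [0.0, 1.0]}
e_item0 = {"a": [1.0, 1.0], "b": [0.5, -0.5]}
final_u, final_i = lightgcn_embeddings(toy_graph, e_user0, e_item0, num_layers=2)
u2_likes_a = score(final_u["u2"], final_i["a"])
```

The only trainable quantities here would be the layer-0 tables `e_user0` and `e_item0`; everything else is fixed arithmetic on the graph.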

Why remove transformations? Empirically, on collaborative filtering benchmarks, removing W_k and σ(·) improves performance. The collaborative filtering signal is in the propagation, not the transformation: He et al. report that the extra learnable matrices and non-linearities mainly make the model harder to train, without adding useful expressiveness.

LightGCN's key insight: Standard GCNs were designed for graphs with rich node features. In collaborative filtering, nodes have only ID embeddings (no features), and those embeddings are themselves free parameters. A transformation W applied to them adds little: it maps one learnable embedding table to another. Pure propagation spreads collaborative signals without that extra machinery. This is the recommender-system-specific reason why simpler is better.
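One way to see the redundancy is a short derivation for the purely linear case. The notation here is assumed for the sketch: \hat{A} is the normalised adjacency matrix of the bipartite graph and E^{(0)} stacks all user and item ID embeddings row by row. The argument covers the layer-K output only, and it no longer holds exactly once a non-linearity σ is applied between layers.

```latex
% Propagation with weight matrices but no non-linearity:
E^{(K)} = \hat{A}\,E^{(K-1)}\,W_K
        = \hat{A}^{K}\,E^{(0)}\,W_1 W_2 \cdots W_K
        = \hat{A}^{K}\,\tilde{E}^{(0)},
\qquad
\tilde{E}^{(0)} := E^{(0)}\,W_1 W_2 \cdots W_K .
```

Since E^{(0)} is learned from scratch, the product \tilde{E}^{(0)} is just another embedding table that could have been learned directly, so the weight matrices add parameters without enlarging the set of functions the layer can represent.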

PinSage (Ying et al., 2018)

Pinterest’s GNN for image recommendation — one of the first industrial deployments of GNNs.

Scale: a graph with 3 billion nodes (pins + boards) and 18 billion edges, trained on 7.5 billion examples.

Key innovations:

  1. GraphSAGE-style sampling: for each node, sample a fixed-size neighbourhood (not full neighbourhood) — makes computation tractable at scale
  2. Random walk importance sampling: sample neighbours by importance (how often they co-occur in random walks), not uniformly
  3. Curriculum training: feed the model progressively harder negative examples as training proceeds
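The random-walk idea in item 2 can be sketched in a few lines: simulate short walks from a node, count how often every other node is visited, and keep the most-visited ones as its neighbourhood with normalised visit counts as weights. The function name `importance_neighbours`, its parameters, and the toy pin/board graph are assumptions for illustration, not Pinterest's production code.

```python
import random
from collections import Counter

def importance_neighbours(adj, start, num_walks, walk_length, top_t, rng):
    """Pick a fixed-size neighbourhood for `start` by random-walk visit counts.

    adj: {node: [neighbouring nodes]}, every list non-empty.
    Returns [(node, weight)] for the top_t most-visited nodes; weights sum to 1.
    """
    visits = Counter()
    for _ in range(num_walks):
        node = start
        for _ in range(walk_length):
            node = rng.choice(adj[node])
            if node != start:
                visits[node] += 1
    top = visits.most_common(top_t)
    total = sum(count for _, count in top)
    return [(node, count / total) for node, count in top]

# Toy pin/board graph (made up for illustration).
toy_adj = {
    "pin1": ["boardA", "boardB"],
    "pin2": ["boardA"],
    "pin3": ["boardB"],
    "boardA": ["pin1", "pin2"],
    "boardB": ["pin1", "pin3"],
}
rng = random.Random(0)  # seeded so the sketch is repeatable
neighbourhood = importance_neighbours(toy_adj, "pin1", num_walks=200,
                                      walk_length=2, top_t=3, rng=rng)
```

The returned weights can then be used for a weighted aggregation of neighbour embeddings, so frequently co-visited nodes contribute more than rarely visited ones.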

NGCF and Variants

NGCF (Wang et al., 2019): adds explicit feature interaction in message passing:

m_{ui} = (W_1 h_i + W_2 (h_i ⊙ h_u)) / √(|N(u)| |N(i)|)

The Hadamard product h_i ⊙ h_u captures user-item feature interactions. LightGCN showed that on standard ID-only benchmarks these extra components make training harder without adding expressive benefit, but in settings with rich node features they can help.
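For a single user-item edge, the NGCF message can be computed as in the sketch below. The helper names `matvec` and `ngcf_message` and the toy numbers are assumptions for illustration; the weight matrices are set to the identity only to keep the arithmetic easy to follow.

```python
import math

def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def ngcf_message(h_u, h_i, W1, W2, deg_u, deg_i):
    """Message from item i to user u: (W1 h_i + W2 (h_i * h_u)) / sqrt(deg_u * deg_i)."""
    hadamard = [a * b for a, b in zip(h_i, h_u)]  # element-wise product
    transformed = matvec(W1, h_i)
    interaction = matvec(W2, hadamard)
    norm = math.sqrt(deg_u * deg_i)
    return [(a + b) / norm for a, b in zip(transformed, interaction)]

# Toy values (made up): 2-dimensional embeddings, identity weights.
identity = [[1.0, 0.0], [0.0, 1.0]]
message = ngcf_message(h_u=[1.0, 2.0], h_i=[3.0, 4.0],
                       W1=identity, W2=identity, deg_u=4, deg_i=1)
```

Dropping `W1`, `W2`, and the Hadamard term from this function leaves exactly the normalised neighbour embedding that LightGCN propagates.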

Session-Based Recommendation

Standard CF assumes all past interactions are known for each user. Session-based recommendation has no long-term user history — only the current session (sequence of clicks).

SR-GNN (Wu et al., 2019): model a session as a directed graph (each click adds an edge from the previous item to the next item). Run a gated graph neural network (GGNN) on the session graph, then use attention to extract user intent from the node embeddings. This captures transition patterns between items within a session.
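The graph-construction step is easy to sketch: turn a click sequence into directed edges and normalise each edge by the number of outgoing transitions of its source item. The function name `session_graph` and the example session are assumptions for illustration; the gated GNN and attention layers that follow are not shown.

```python
def session_graph(clicks):
    """Build a weighted directed graph from one session's click sequence.

    Returns {source item: {next item: weight}}, where the weights of each
    source item's outgoing edges sum to 1.
    """
    counts = {}
    for prev, nxt in zip(clicks, clicks[1:]):
        outgoing = counts.setdefault(prev, {})
        outgoing[nxt] = outgoing.get(nxt, 0) + 1

    graph = {}
    for src, outgoing in counts.items():
        total = sum(outgoing.values())
        graph[src] = {dst: c / total for dst, c in outgoing.items()}
    return graph

# Example session (made up): item v2 is clicked twice, with different follow-ups.
session = ["v1", "v2", "v3", "v2", "v4"]
graph = session_graph(session)
```

Because v2 is followed once by v3 and once by v4, its two outgoing edges each get weight 0.5, while v1 → v2 and v3 → v2 get weight 1.0. Repeated items collapse into a single node, which is what lets the model see loops and revisits that a plain sequence model treats as unrelated positions.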

Knowledge Graph-Enhanced Recommendation

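KGNN-LS / KGCN: enrich the item side with a knowledge graph (item → category, brand, attributes). The GNN aggregates each item's knowledge-graph neighbourhood, weighting relations by how relevant they are to the given user, so the item representation reflects both interaction data and KG structure.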

Benefit: cold-start items with no interactions can still leverage KG features (for movies, genre or director) and be recommended to users who liked items with related KG attributes.
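A simplified, KGCN-flavoured sketch of this idea: score each KG relation against the user embedding, softmax the scores over the item's KG neighbours, and add the weighted neighbour entities to the item's own vector. The dot-product scoring, the sum combination without a learned transformation, and the names `dot` and `kg_item_representation` are assumptions chosen to keep the example small.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def kg_item_representation(user_vec, item_vec, kg_neighbours):
    """Combine an item's own vector with its user-weighted KG neighbourhood.

    kg_neighbours: list of (relation_vec, entity_vec) pairs linked to the item.
    Returns (combined item vector, per-neighbour attention weights).
    """
    # How much does this user care about each relation (e.g. genre vs. director)?
    scores = [dot(user_vec, rel) for rel, _ in kg_neighbours]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # shifted for numerical stability
    z = sum(exps)
    weights = [e / z for e in exps]

    dim = len(item_vec)
    neighbourhood = [
        sum(w * ent[d] for w, (_, ent) in zip(weights, kg_neighbours))
        for d in range(dim)
    ]
    combined = [a + b for a, b in zip(item_vec, neighbourhood)]
    return combined, weights

# Cold-start item (made up): zero interaction embedding, two KG neighbours.
user = [1.0, 0.0]
cold_item = [0.0, 0.0]
neighbours = [
    ([2.0, 0.0], [1.0, 1.0]),  # (relation "genre", entity "sci-fi"), hypothetical vectors
    ([0.0, 2.0], [0.0, 3.0]),  # (relation "director", entity vector), hypothetical
]
item_repr, attention = kg_item_representation(user, cold_item, neighbours)
```

Even though `cold_item` starts as a zero vector, `item_repr` is non-zero because it inherits information from its KG neighbours, with the relation this user cares about receiving the larger weight.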

Summary

| Model | Key idea | Scale |
|---|---|---|
| Matrix Factorisation | Pairwise similarity only | Any |
| NGCF | GCN + feature interaction | Millions |
| LightGCN | GCN without transformation | Millions (benchmark datasets) |
| PinSage | GraphSAGE + importance sampling | 3 billion nodes |
| SR-GNN | Session graph + gated GNN | Millions |

GNNs have become a widely adopted approach for large-scale recommendation, with production deployments reported by Pinterest (PinSage) and Alibaba, among others.

References