GraphSAGE: Inductive Learning on Large Graphs
The Inductive vs. Transductive Distinction
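Transductive GNNs (e.g. GCN as originally trained on a single fixed graph): learn embeddings tied to the specific nodes in the training graph. If you add a new node tomorrow, you have to retrain, or at least run another forward pass with the full adjacency matrix. (GAT is often grouped here too, although its attention mechanism can also be applied to unseen graphs.)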
Inductive GNNs (GraphSAGE): learn a function that maps a node’s local neighbourhood to an embedding. Apply this function to any neighbourhood — seen or unseen — to get an embedding.
This matters enormously in practice:
- Pinterest's PinSage, a GraphSAGE-based model, embeds new pins (items) as users upload them, without retraining on the whole graph.
- Social networks onboard new users continuously — their profiles must be embedded immediately.
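To make the distinction concrete, here is a minimal NumPy sketch. It contrasts a per-node lookup table (a stand-in for transductive, node-specific embeddings) with a shared function over a node's neighbourhood. The toy graph, the weight matrix `W`, and the helper `embed_inductive` are all hypothetical and untrained; the point is only what each approach can do when a new node arrives.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy graph: 4 training nodes with 3-dim features (all values are made up).
features = rng.normal(size=(4, 3))
neighbours = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2]}

# Transductive stand-in: one free embedding row per training node.
embedding_table = rng.normal(size=(4, 2))

# Inductive stand-in: one shared weight matrix applied to
# concat(own features, mean of neighbour features).
W = rng.normal(size=(2, 6))

def embed_inductive(x_self, x_neigh):
    """Map a node's features and its neighbours' features to an embedding."""
    agg = x_neigh.mean(axis=0)
    return np.tanh(W @ np.concatenate([x_self, agg]))

# The shared function embeds every training node...
z_train = np.stack(
    [embed_inductive(features[v], features[neighbours[v]]) for v in neighbours]
)

# ...and also a brand-new node (id 4) with edges to nodes 1 and 3.
x_new = rng.normal(size=3)
z_new = embed_inductive(x_new, features[[1, 3]])

# The lookup table has no row for node 4, so it cannot embed it without retraining.
table_has_row = 4 < embedding_table.shape[0]
print(z_new.shape, table_has_row)
```

The function needs only the new node's features and its neighbours' features, which is exactly what is available at upload or sign-up time.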
The Algorithm
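For each node v at each layer k (starting from h_v^(0) = x_v, the input features, with K the number of neighbours sampled per node):
1. SAMPLE: S_v = uniform random sample of min(K, |N(v)|) neighbours of v
2. AGG: agg_v = AGGREGATE({ h_u^(k-1) : u ∈ S_v })
3. UPDATE: h_v^(k) = σ( W^(k) · concat(h_v^(k-1), agg_v) )
4. NORM: h_v^(k) = h_v^(k) / ||h_v^(k)||₂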
The key novelty: concatenate the node’s own previous representation with the aggregated neighbourhood representation, then apply a shared learned W. This ensures the node retains its own identity while incorporating neighbour information.
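The four steps can be sketched in a few lines of NumPy. This is an illustrative sketch rather than a reference implementation: it assumes a mean aggregator, ReLU for σ, a dict-of-lists adjacency, and a zero vector as the aggregate for isolated nodes. The function name `sage_layer` and its parameters are hypothetical.

```python
import numpy as np

def sage_layer(h, adj, W, num_samples, rng):
    """One GraphSAGE-style layer with a mean aggregator (sketch).

    h:   (num_nodes, d_in) previous-layer representations
    adj: dict mapping node id -> list of neighbour ids
    W:   (d_out, 2 * d_in) weight matrix shared by all nodes
    """
    h_next = np.zeros((h.shape[0], W.shape[0]))
    for v in range(h.shape[0]):
        nbrs = adj[v]
        # 1. SAMPLE: at most num_samples neighbours, without replacement.
        k = min(num_samples, len(nbrs))
        if k > 0:
            sampled = rng.choice(nbrs, size=k, replace=False)
            # 2. AGG: mean of the sampled neighbours' representations.
            agg = h[sampled].mean(axis=0)
        else:
            agg = np.zeros(h.shape[1])  # assumption: isolated node -> zeros
        # 3. UPDATE: shared linear map on concat(self, aggregate), then ReLU.
        out = np.maximum(W @ np.concatenate([h[v], agg]), 0.0)
        # 4. NORM: scale to unit L2 norm (skipped if the vector is all zeros).
        norm = np.linalg.norm(out)
        h_next[v] = out / norm if norm > 0 else out
    return h_next

rng = np.random.default_rng(0)
adj = {0: [1, 2, 3], 1: [0], 2: [0, 3], 3: [0, 2]}
h0 = rng.normal(size=(4, 5))    # input features
W1 = rng.normal(size=(8, 10))   # 10 = 2 * 5 because of the concatenation
h1 = sage_layer(h0, adj, W1, num_samples=2, rng=rng)
print(h1.shape)
```

Note that `W1` has twice as many input columns as the feature dimension: half of them act on the node's own representation and half on the aggregate, which is how the node's identity is kept separate from its neighbourhood.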
Aggregator Choices
The GraphSAGE paper proposes three aggregators:
| Aggregator | Formula | Properties |
|---|---|---|
| Mean | mean({h_u : u ∈ S}) | Fast, size-invariant, similar to GCN |
| Max-pooling | element-wise max over u ∈ S of σ(W_pool·h_u + b) | Captures extreme features |
| LSTM | LSTM on a random ordering of S | Highest capacity, not permutation-invariant |
The LSTM aggregator technically violates permutation invariance (LSTMs care about input order) — GraphSAGE handles this by randomly permuting neighbour order each training step, which empirically works well.
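A short NumPy sketch illustrates the permutation-invariance point. The mean and max-pooling aggregators follow the table; the third function is deliberately not an LSTM but a simple tanh recurrence, used here only as an order-sensitive stand-in. The names `W_pool`, `b_pool`, and `W_rec` are hypothetical, untrained parameters.

```python
import numpy as np

def mean_aggregate(h_neigh):
    """Element-wise mean over the sampled neighbours."""
    return h_neigh.mean(axis=0)

def maxpool_aggregate(h_neigh, W_pool, b_pool):
    """Transform each neighbour (ReLU), then take the element-wise max."""
    transformed = np.maximum(h_neigh @ W_pool.T + b_pool, 0.0)
    return transformed.max(axis=0)

def recurrent_aggregate(h_neigh, W_rec):
    """Order-sensitive stand-in for the LSTM: a plain tanh recurrence."""
    state = np.zeros(W_rec.shape[0])
    for h_u in h_neigh:
        state = np.tanh(W_rec @ np.concatenate([state, h_u]))
    return state

rng = np.random.default_rng(0)
h_neigh = rng.normal(size=(3, 4))   # 3 sampled neighbours, 4-dim each
reversed_neigh = h_neigh[::-1]      # same set, different order
W_pool, b_pool = rng.normal(size=(5, 4)), rng.normal(size=5)
W_rec = rng.normal(size=(5, 9))     # 9 = state dim 5 + input dim 4

mean_same = np.allclose(mean_aggregate(h_neigh), mean_aggregate(reversed_neigh))
max_same = np.allclose(
    maxpool_aggregate(h_neigh, W_pool, b_pool),
    maxpool_aggregate(reversed_neigh, W_pool, b_pool),
)
rec_same = np.allclose(
    recurrent_aggregate(h_neigh, W_rec),
    recurrent_aggregate(reversed_neigh, W_rec),
)
print(mean_same, max_same, rec_same)
```

Mean and max give the same result for any ordering of the neighbours, while the recurrent aggregate depends on the order it is fed, which is why the neighbour order is shuffled during training.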
Mini-Batch Training
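Because each node aggregates at most K sampled neighbours, the work needed for one batch is bounded by the batch size and the sample sizes rather than by the size of the graph. This makes mini-batch training feasible on very large graphs: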
- Sample a batch of target nodes.
- Sample their multi-hop neighbourhoods, one hop per layer (expanding the computation graph outward from the targets).
- Compute embeddings from the outside in: farthest sampled hop → … → 1-hop → target nodes.
- Update W via backprop.
The same sampling-based mini-batch idea underlies Pinterest's PinSage, which scales to graphs with billions of nodes and edges.
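The sampling and outside-in computation can be sketched as follows. This is a forward-pass-only illustration under several assumptions: a mean aggregator, ReLU, a dict-of-lists adjacency, and random untrained weights. The names `sample_computation_graph`, `forward`, and `fanouts` (neighbours sampled per node, one entry per layer) are hypothetical. The backprop step is omitted because it would need an autodiff framework.

```python
import numpy as np

def sample_computation_graph(adj, batch, fanouts, rng):
    """Expand outward from the batch, one hop per layer.

    frontiers[l] is the set of nodes whose layer-l representation is needed;
    samples[l][v] lists the neighbours sampled for v when computing layer l.
    """
    num_layers = len(fanouts)
    frontiers = [None] * (num_layers + 1)
    samples = [None] * (num_layers + 1)   # samples[0] stays unused
    frontiers[num_layers] = set(batch)
    for layer in range(num_layers, 0, -1):
        samples[layer] = {}
        needed = set(frontiers[layer])    # a node also needs its own previous state
        for v in sorted(frontiers[layer]):
            nbrs = adj[v]
            k = min(fanouts[layer - 1], len(nbrs))
            picked = rng.choice(nbrs, size=k, replace=False).tolist() if k > 0 else []
            samples[layer][v] = picked
            needed.update(picked)
        frontiers[layer - 1] = needed
    return frontiers, samples

def forward(features, frontiers, samples, weights):
    """Compute representations from the outermost hop in to the batch nodes."""
    h = {v: features[v] for v in frontiers[0]}
    for layer, W in enumerate(weights, start=1):
        h_next = {}
        for v in frontiers[layer]:
            picked = samples[layer][v]
            agg = np.mean([h[u] for u in picked], axis=0) if picked else np.zeros_like(h[v])
            out = np.maximum(W @ np.concatenate([h[v], agg]), 0.0)
            norm = np.linalg.norm(out)
            h_next[v] = out / norm if norm > 0 else out
        h = h_next
    return h

rng = np.random.default_rng(0)
n = 1000
# Ring-like toy graph: each node is linked to its 4 nearest ring neighbours.
adj = {v: [(v + d) % n for d in (-2, -1, 1, 2)] for v in range(n)}
features = rng.normal(size=(n, 4))
weights = [rng.normal(size=(8, 8)), rng.normal(size=(3, 16))]

batch, fanouts = [0, 500], [2, 2]
frontiers, samples = sample_computation_graph(adj, batch, fanouts, rng)
out = forward(features, frontiers, samples, weights)
print(len(frontiers[0]), "of", n, "nodes touched")
```

With two target nodes and two neighbours sampled per layer, the outermost frontier can hold at most 2 × 3 × 3 = 18 nodes no matter how large the graph is, which is the property that keeps per-batch cost bounded.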
✅ Key Takeaways
- GraphSAGE is inductive: learns an aggregation function, not per-node embeddings — generalises to new nodes.
- Neighbourhood sampling (K neighbours per node) enables mini-batch training on billion-scale graphs.
- Concatenates own representation + aggregated neighbourhood before the linear transform — preserving node identity.
- GraphSAGE-style models are used in production at Pinterest (PinSage), LinkedIn, and other platforms for embedding items and users, including ones not seen during training.
