Graph Neural Networks: Learning on Graphs
Published:
Graphs Are Everywhere
A graph G = (V, E) consists of:
- Nodes (V): entities — atoms, people, papers, intersections.
- Edges (E): relationships — bonds, friendships, citations, roads.
- Features on nodes and/or edges: atom type, age, year, speed limit.
Real-world data that’s naturally a graph:
- Molecules: atoms = nodes, bonds = edges. Predicting drug toxicity or binding affinity.
- Social networks: users = nodes, follows/friends = edges. Recommendation, fraud detection.
- Knowledge graphs: entities = nodes, relations = edges. Question answering, link prediction.
- Citation networks: papers = nodes, citations = edges. Classifying papers by topic.
- Road networks: intersections = nodes, roads = edges. Route planning, traffic prediction.
Why Not Just Use Standard Neural Networks?
A standard MLP takes a fixed-size vector as input. Graphs have:
- Variable size — different graphs have different numbers of nodes and edges.
- No canonical ordering — there’s no “first” node; permuting nodes shouldn’t change predictions.
- Relational structure — the patterns live in the connections, not just the individual features.
GNNs are designed to respect all three of these properties.
The Core Idea: Aggregate from Neighbours
Every GNN follows the same fundamental principle, called message passing:
Each node’s new representation = function(its current representation, representations of its neighbours)
After k iterations, node v’s embedding captures information from all nodes up to k hops away (its k-hop neighbourhood).
This is beautiful because:
- Nearby nodes influence each other (just like in the real world).
- The same aggregation function works on graphs of any size.
- The function is learned from data, so it adapts to the task.
Three Task Levels
GNNs can produce predictions at three granularities:
| Level | What you predict | Example |
|---|---|---|
| Node | Label for each node | Is this user a bot? |
| Edge | Label or score for each edge | Will A befriend B? |
| Graph | Label for the whole graph | Is this molecule toxic? |
For node tasks: use the node embeddings directly. For graph tasks: readout (pooling) the node embeddings into a single graph vector.
The Landscape of GNN Architectures
| Model | Year | Key idea |
|---|---|---|
| GCN | 2016 | Spectral convolution → normalised averaging |
| GAT | 2018 | Attention weights on edges |
| GraphSAGE | 2017 | Inductive learning via neighbourhood sampling |
| GIN | 2019 | Most expressive aggregator (sum + MLP) |
| Sheaf NN | 2022+ | Section-space diffusion, generalises GCN |
✅ Key Takeaways
- Graphs model relational data: atoms, users, papers, intersections — any entities with relationships.
- GNNs learn by iterative neighbourhood aggregation: after k layers, each node knows about its k-hop neighbourhood.
- The same model works on graphs of any size and any node ordering — it's permutation invariant/equivariant.
- Supports node-, edge-, and graph-level predictions via readout pooling.
