Message Passing: The Universal GNN Framework

Published: February 04, 2024

TL;DR: Message Passing Neural Networks (Gilmer et al., 2017) provide a unified framework for most popular GNNs. Each layer runs three steps: MESSAGE (what each neighbour sends), AGGREGATE (combine all incoming messages), UPDATE (compute the new node representation). Choosing different functions for each step gives you different GNN architectures.

The Framework

The MPNN framework (Gilmer et al., 2017, ICML) defines GNN computation as a series of message passing steps. At each step t, every node v first aggregates the messages from its neighbours and then updates its own representation:

m^{t+1}_v = AGGREGATE({ MSG(h^t_v, h^t_u, e_{uv}) : u ∈ N(v) })

h^{t+1}_v = UPDATE(h^t_v, m^{t+1}_v)

Where:

h^t_v — representation of node v at step t.
N(v) — neighbours of v.
e_{uv} — optional edge feature between u and v.
MSG — the message function.
AGGREGATE — combines all messages (must be permutation-invariant).
UPDATE — computes new representation from old + aggregated message.

Figure 1: Node B receives messages from its three neighbours A, C, D. The messages are aggregated (e.g., summed or averaged), then combined with B's own representation in an UPDATE function to produce a new h_B.
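The three steps can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function name `message_passing_step`, the helper `vec_sum`, and the feature values are made up for this example, edge features are left out for brevity, and the graph is the star from Figure 1 (B connected to A, C, D).

```python
from typing import Callable, Dict, List

Vector = List[float]


def message_passing_step(
    h: Dict[str, Vector],
    neighbours: Dict[str, List[str]],
    msg: Callable[[Vector, Vector], Vector],
    aggregate: Callable[[List[Vector]], Vector],
    update: Callable[[Vector, Vector], Vector],
) -> Dict[str, Vector]:
    """One synchronous step: every node reads the OLD states of its neighbours."""
    new_h = {}
    for v, h_v in h.items():
        messages = [msg(h_v, h[u]) for u in neighbours[v]]  # MESSAGE
        m_v = aggregate(messages)                           # AGGREGATE
        new_h[v] = update(h_v, m_v)                         # UPDATE
    return new_h


def vec_sum(vectors: List[Vector]) -> Vector:
    """Element-wise sum; the result does not depend on the order of the inputs."""
    return [sum(components) for components in zip(*vectors)]


# Star graph from Figure 1 (illustrative 2-dimensional features).
neighbours = {"A": ["B"], "B": ["A", "C", "D"], "C": ["B"], "D": ["B"]}
h0 = {"A": [1.0, 0.0], "B": [0.0, 0.0], "C": [0.0, 1.0], "D": [1.0, 1.0]}

h1 = message_passing_step(
    h0,
    neighbours,
    msg=lambda h_v, h_u: h_u,                                   # send raw features
    aggregate=vec_sum,                                          # sum the messages
    update=lambda h_v, m_v: [a + b for a, b in zip(h_v, m_v)],  # add to own state
)
print(h1["B"])  # [2.0, 2.0]
```

Swapping any of the three callables changes the architecture without touching the loop, which is the point of the framework.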

Step 1: Message Function

The message function computes what each neighbour sends. The simplest choice: just send the neighbour’s features.

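MSG(h_v, h_u, e_uv) = h_u              # Simplest: pass neighbour features unchanged
MSG(h_v, h_u, e_uv) = W · h_u          # Linear transform first
MSG(h_v, h_u, e_uv) = c_vu · W · h_u   # GCN: fixed weight c_vu = 1/sqrt(deg(v) · deg(u))
MSG(h_v, h_u, e_uv) = α_vu · W · h_u   # GAT: scale by a learned attention weight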

The examples above ignore e_uv. Feeding the edge features into MSG as well lets the model distinguish bond types in a molecule or relationship types in a knowledge graph.
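One possible way to use edge features is to pick a different weight matrix per edge type. The sketch below assumes a categorical edge feature (a bond-type string); `W_BY_BOND`, `matvec`, and `msg_edge_conditioned` are hypothetical names, and the matrices are hand-picked constants standing in for learned parameters.

```python
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(W: Matrix, x: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


# Hypothetical weights, one matrix per bond type (learned in a real model).
W_BY_BOND: Dict[str, Matrix] = {
    "single": [[1.0, 0.0], [0.0, 1.0]],
    "double": [[2.0, 0.0], [0.0, 2.0]],
}


def msg_edge_conditioned(h_v: Vector, h_u: Vector, e_uv: str) -> Vector:
    """MSG that transforms the neighbour's features with an edge-specific matrix."""
    return matvec(W_BY_BOND[e_uv], h_u)


h_u = [1.0, 3.0]
print(msg_edge_conditioned([0.0, 0.0], h_u, "single"))  # [1.0, 3.0]
print(msg_edge_conditioned([0.0, 0.0], h_u, "double"))  # [2.0, 6.0]
```

The same neighbour now sends a different message depending on the bond, which a message function that ignores e_uv cannot do.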

Step 2: Aggregate Function

The aggregation combines all messages. It must be permutation-invariant (the order of neighbours shouldn’t matter):

Aggregator	Formula	Properties
Sum	Σ m_u	Captures size of neighbourhood
Mean	(1/|N(v)|) Σ m_u	Normalised, size-invariant
Max	max_u m_u	Captures the most extreme feature
Attention-weighted	Σ α_u m_u	Adaptive, like GAT

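GIN (see separate post) shows that, of these, sum is the most expressive: paired with an injective update such as an MLP, it can match the Weisfeiler-Lehman test at distinguishing graph structures. Mean and max lose information. Mean keeps only the proportions of the messages and max only the largest values, so both can map different neighbourhoods to the same output.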
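A small numeric check makes the difference concrete. The example uses scalar messages and two hand-picked neighbourhoods, one being the other with every message duplicated. The names `AGGREGATORS`, `small`, and `large` are illustrative only.

```python
# Scalar messages keep the example short; real messages are vectors.
AGGREGATORS = {
    "sum": sum,
    "mean": lambda msgs: sum(msgs) / len(msgs),
    "max": max,
}

small = [1.0, 2.0]            # two neighbours
large = [1.0, 1.0, 2.0, 2.0]  # same values, each duplicated

# Does the aggregator give the same output for both neighbourhoods?
collides = {name: agg(small) == agg(large) for name, agg in AGGREGATORS.items()}
print(collides)  # {'sum': False, 'mean': True, 'max': True}

# All three are permutation-invariant: reordering the messages changes nothing.
shuffled = list(reversed(large))
invariant = all(agg(large) == agg(shuffled) for agg in AGGREGATORS.values())
print(invariant)  # True
```

Sum is not collision-free on raw scalars either (for instance, {1, 2} and {3} have the same sum), which is why the GIN argument also relies on a suitable feature transform.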

Step 3: Update Function

Given the aggregated message and the old representation, compute the new one:

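h_v^new = σ(W · agg_message)               # GCN-style: h_v enters agg_message via a self-loop
h_v^new = σ(W · concat(h_v, agg_message))  # GraphSAGE-style
h_v^new = GRU(h_v, agg_message)            # Recurrent update, as in Gilmer et al. (2017)
h_v^new = MLP(concat(h_v, agg_message))    # Deeper variant of the concat update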
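The concatenate-then-transform update can be sketched as follows. This assumes ReLU for σ and a hand-picked weight matrix in place of learned parameters. `update_concat`, `matvec`, and `relu` are illustrative helper names.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(W: Matrix, x: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def relu(x: Vector) -> Vector:
    """Element-wise max(0, x)."""
    return [max(0.0, xi) for xi in x]


def update_concat(h_v: Vector, agg_message: Vector, W: Matrix) -> Vector:
    """h_v^new = relu(W · concat(h_v, agg_message))."""
    return relu(matvec(W, h_v + agg_message))  # list '+' concatenates


# W maps the 4-dimensional concatenation back to 2 dimensions.
W = [
    [1.0, 0.0, 1.0, 0.0],   # own feature 0 + aggregated feature 0
    [0.0, 1.0, 0.0, -1.0],  # own feature 1 - aggregated feature 1
]
h_new = update_concat([1.0, 2.0], [0.5, 3.0], W)
print(h_new)  # [1.5, 0.0]
```

Because the node's own state and the aggregate occupy separate columns of W, the layer can weight them differently, which a plain sum of the two cannot.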

A Running Example: Molecule Property Prediction

Consider predicting whether a molecule is toxic:

Nodes = atoms (features: atom type, charge, is_aromatic)
Edges = bonds (features: bond type: single/double/triple)
After k MPNN layers, each atom knows about its k-hop neighbourhood.
A readout aggregates all atom embeddings into a graph embedding.
An MLP predicts toxicity from the graph embedding.

After 3 layers, an atom “knows” about the atoms 3 bonds away — capturing local chemical environments like functional groups.
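The k-hop claim can be checked on a toy chain of five atoms. This is a sketch under simplifying assumptions: one-hot node features so that each dimension tracks one atom, no learned weights, sum aggregation, an additive update, and a sum readout. The function name `step` and the chain itself are made up for illustration.

```python
from typing import List

Vector = List[float]


def step(h: List[Vector], neighbours: List[List[int]]) -> List[Vector]:
    """One layer: new state = own state + sum of the neighbours' old states."""
    new_h = []
    for v, h_v in enumerate(h):
        agg = [sum(h[u][i] for u in neighbours[v]) for i in range(len(h_v))]
        new_h.append([a + b for a, b in zip(h_v, agg)])
    return new_h


# Chain of 5 atoms: 0 - 1 - 2 - 3 - 4
neighbours = [[1], [0, 2], [1, 3], [2, 4], [3]]
n = len(neighbours)

# One-hot features: dimension i is non-zero only at atom i.
h = [[1.0 if i == v else 0.0 for i in range(n)] for v in range(n)]

for _ in range(3):  # 3 message passing layers
    h = step(h, neighbours)

# Which atoms have influenced atom 0's embedding?
reached = [i for i, x in enumerate(h[0]) if x > 0]
print(reached)  # [0, 1, 2, 3]

# Sum readout: pool all atom embeddings into a single graph-level vector.
graph_embedding = [sum(h_v[i] for h_v in h) for i in range(n)]
```

Atom 4 sits four bonds away from atom 0, so it has not reached atom 0 after three layers; a fourth layer would include it. In a real model the graph embedding would then be passed to an MLP for the toxicity prediction.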

✅ Key Takeaways

Most popular GNNs (GCN, GraphSAGE, GAT, GIN) are instances of MPNN: choose MSG, AGGREGATE, and UPDATE functions.
AGGREGATE must be permutation-invariant. Sum is the most expressive of the common choices (GIN).
After k layers, each node's embedding captures its k-hop neighbourhood.
Graph-level predictions require a readout function that pools node embeddings into a single vector.


Alessio Borgi
