GNNs for Traffic Forecasting

TL;DR: A city's sensor network is a fixed graph (sensors = nodes, road connections = edges). At each timestamp, sensors report speed/volume. The task: given the last T timesteps, predict the next H timesteps. GNNs capture "traffic jam propagates downstream" (spatial); RNNs/convolutions capture "rush hour occurs every morning" (temporal). The best models combine both.

The Traffic Forecasting Task

Input: X ∈ ℝ^{N × T × d}, readings from N sensors over the past T timesteps, each with d features (e.g. speed, volume, occupancy)

Output: X̂ ∈ ℝ^{N × H × d}, predictions for the next H timesteps

Graph: G = (V, E, W) where V = sensors, E = road segments connecting sensors, W = edge weights (distance, travel time, or correlation)
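
One common way to turn pairwise road distances into the edge weights W is a thresholded Gaussian kernel. The sketch below is only an illustration: the function name `gaussian_kernel_adjacency`, the threshold `eps`, and the 3-sensor distance matrix are assumptions, not taken from any specific dataset.

```python
import numpy as np

def gaussian_kernel_adjacency(dist, eps=0.1):
    """Turn a matrix of pairwise road distances into edge weights.

    dist[i, j] is the (assumed) driving distance from sensor i to sensor j;
    np.inf marks pairs with no connecting route. Weights below `eps` are
    zeroed to keep the graph sparse.
    """
    finite = dist[np.isfinite(dist)]
    sigma = finite.std()                     # kernel width taken from the data
    w = np.exp(-np.square(dist / sigma))     # exp(-inf) evaluates to 0.0
    w[w < eps] = 0.0
    return w

# Hypothetical 3-sensor example (distances in km, one-way roads).
dist = np.array([[0.0, 1.0, np.inf],
                 [np.inf, 0.0, 2.0],
                 [np.inf, np.inf, 0.0]])
W = gaussian_kernel_adjacency(dist)
print(W)
```

Because driving distance is not symmetric on one-way roads, W is generally a directed adjacency: here W[0, 1] is positive while W[1, 0] is zero.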

Standard benchmarks:

  • METR-LA: 207 sensors on LA freeways, 4 months, 5-min intervals
  • PEMS-BAY: 325 sensors in Bay Area, 6 months

Typical forecasting horizons: 15 min (3 steps), 30 min (6 steps), 60 min (12 steps).
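
To make the input and output shapes concrete, here is a minimal sliding-window sketch. It assumes the raw data is an array of shape (total_steps, N, d); the helper name `make_windows` and the synthetic series are illustrative, and T = H = 12 corresponds to one hour in and one hour out at 5-minute resolution.

```python
import numpy as np

def make_windows(series, T=12, H=12):
    """Slice a (total_steps, N, d) array into (input, target) pairs.

    Returns X with shape (samples, N, T, d) and Y with shape (samples, N, H, d).
    """
    total = series.shape[0]
    samples = total - T - H + 1
    X = np.stack([series[s:s + T] for s in range(samples)])
    Y = np.stack([series[s + T:s + T + H] for s in range(samples)])
    # Move the time axis behind the sensor axis: (samples, N, T, d).
    return X.transpose(0, 2, 1, 3), Y.transpose(0, 2, 1, 3)

# Synthetic stand-in for one day of 5-minute readings (288 steps, 207 sensors).
rng = np.random.default_rng(0)
series = rng.normal(60.0, 10.0, size=(288, 207, 1))
X, Y = make_windows(series, T=12, H=12)
print(X.shape, Y.shape)   # (265, 207, 12, 1) (265, 207, 12, 1)
```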

Why Graphs Improve over ARIMA and LSTM

ARIMA / LSTM (per-sensor): each sensor is modelled independently, so spatial correlations cannot be captured: "upstream congestion causes downstream slowdown" is invisible to the model.

CNN on grid: grids work for regular spatial layouts (e.g. weather stations on a regular grid). Traffic networks are irregular: sensors follow road geometry, not a grid.

GNN + temporal model: captures both spatial (road network structure) and temporal (recurrent patterns) dependencies.

DCRNN (Diffusion Convolutional Recurrent Neural Network)

DCRNN (Li et al., 2018) uses bidirectional random walk diffusion as the spatial module inside a sequence-to-sequence GRU:

Diffusion convolution (captures directional traffic flow):

H = Σ_{k=0}^{K} ( (D_O^{-1} A)^k X W_k^{fwd} + (D_I^{-1} A^T)^k X W_k^{bwd} )

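Here D_O and D_I are the diagonal out-degree and in-degree matrices of the weighted adjacency A, and K is the number of diffusion steps. Forward diffusion follows traffic direction (upstream → downstream). Backward diffusion captures reverse influence (a road closure downstream affects upstream traffic).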
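
The formula above can be sketched in a few lines of NumPy for a single timestep. This is a minimal illustration that follows the equation as written, not the authors' implementation; the function name `diffusion_conv` and the weight layout (K + 1, d_in, d_out) are assumptions. Some implementations apply the transposed transition matrices, so it is worth checking the convention before comparing outputs.

```python
import numpy as np

def diffusion_conv(A, X, W_fwd, W_bwd):
    """Bidirectional diffusion convolution on one timestep.

    A:      (N, N) directed weighted adjacency, A[i, j] = weight of edge i -> j
    X:      (N, d_in) node features
    W_fwd:  (K + 1, d_in, d_out) weights for the forward walk
    W_bwd:  (K + 1, d_in, d_out) weights for the backward walk
    """
    out_deg = A.sum(axis=1, keepdims=True)
    in_deg = A.sum(axis=0, keepdims=True).T
    # P_fwd = D_O^{-1} A and P_bwd = D_I^{-1} A^T; zero-degree rows stay zero.
    P_fwd = np.divide(A, out_deg, out=np.zeros_like(A), where=out_deg > 0)
    P_bwd = np.divide(A.T, in_deg, out=np.zeros_like(A), where=in_deg > 0)

    H = np.zeros((X.shape[0], W_fwd.shape[2]))
    Xf, Xb = X, X                          # k = 0 term: P^0 X = X
    for k in range(W_fwd.shape[0]):
        H += Xf @ W_fwd[k] + Xb @ W_bwd[k]
        Xf, Xb = P_fwd @ Xf, P_bwd @ Xb    # advance to P^(k+1) X
    return H

# Toy directed chain 0 -> 1 -> 2 with K = 1 and all-ones weights.
A = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 0.0]])
X = np.array([[1.0], [2.0], [3.0]])
H = diffusion_conv(A, X, np.ones((2, 1, 1)), np.ones((2, 1, 1)))
print(H.ravel())   # [4. 8. 8.]
```

In DCRNN this operation replaces the dense matrix multiplications inside each GRU gate, so the same spatial mixing is applied at every recurrent step.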

Encoder-decoder: DCRNN encodes the T past steps with a diffusion-GRU encoder and decodes the H future steps with a matching decoder, trained with scheduled sampling to reduce exposure bias.

Result on METR-LA (as reported by Li et al., 2018): MAE 2.77 at the 15-min horizon, vs 3.44 for FC-LSTM (no graph), about 19% lower. At the 60-min horizon the gap is similar: 3.60 vs 4.37.

Why diffusion (not standard GCN)? Traffic is a directed flow: what happens at sensor A affects a downstream sensor A' differently from an upstream sensor A''. Standard GCN uses a symmetric (undirected) adjacency, so it cannot tell the two apart. Diffusion convolution uses the directed transition matrices D_O^{-1} A and D_I^{-1} A^T with separate weights, so each direction of influence is modelled on its own. This is a domain-specific structural choice that improves accuracy on directed road networks.

STGCN (Spatio-Temporal Graph Convolutional Network)

STGCN (Yu et al., 2018) replaces recurrence with 1D temporal convolutions for speed:

Block: [Temporal gated conv] → [Spatial ChebNet] → [Temporal gated conv]
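
The spatial step in this block is a Chebyshev-polynomial graph convolution. A minimal NumPy sketch follows, assuming a symmetric adjacency; the name `cheb_graph_conv` and the weight layout (K, c_in, c_out) are illustrative choices rather than the paper's code.

```python
import numpy as np

def cheb_graph_conv(A, X, theta):
    """Chebyshev graph convolution with K = theta.shape[0] polynomial terms.

    A:     (N, N) symmetric adjacency
    X:     (N, c_in) node features
    theta: (K, c_in, c_out) one weight matrix per polynomial order
    """
    N = A.shape[0]
    d = A.sum(axis=1)
    d_inv_sqrt = np.zeros_like(d)
    d_inv_sqrt[d > 0] = d[d > 0] ** -0.5
    # Normalised Laplacian L = I - D^{-1/2} A D^{-1/2}.
    L = np.eye(N) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    lam_max = np.linalg.eigvalsh(L).max()
    L_tilde = 2.0 * L / lam_max - np.eye(N)    # rescale spectrum to [-1, 1]

    T_prev, T_curr = X, L_tilde @ X            # T_0(L~) X and T_1(L~) X
    out = T_prev @ theta[0]
    if theta.shape[0] > 1:
        out += T_curr @ theta[1]
    for k in range(2, theta.shape[0]):
        # Chebyshev recurrence: T_k = 2 L~ T_{k-1} - T_{k-2}.
        T_prev, T_curr = T_curr, 2.0 * L_tilde @ T_curr - T_prev
        out += T_curr @ theta[k]
    return out

# Two connected nodes, three polynomial terms, all-ones weights.
A = np.array([[0.0, 1.0], [1.0, 0.0]])
X = np.array([[1.0], [3.0]])
out = cheb_graph_conv(A, X, np.ones((3, 1, 1)))
print(out.ravel())   # [-1.  5.]
```

The recurrence avoids an eigendecomposition of the filter itself: each extra polynomial order costs one more sparse-style multiplication by the rescaled Laplacian.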

Temporal gated convolution (GLU):

Y = (X * Θ_1) ⊙ σ(X * Θ_2) (element-wise gating)
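
As a rough illustration of the gating, the sketch below applies the formula to the time series of a single node. It is a stripped-down version under stated assumptions: no padding, two separate kernels `theta1` and `theta2`, and a hypothetical helper name `temporal_glu`.

```python
import numpy as np

def temporal_glu(x, theta1, theta2):
    """Gated 1D convolution along time for a single node.

    x:       (T, c_in) sequence
    theta1:  (Kt, c_in, c_out) kernel for the linear branch
    theta2:  (Kt, c_in, c_out) kernel for the gate branch
    Returns  (T - Kt + 1, c_out): no padding, so the sequence shrinks.
    """
    Kt = theta1.shape[0]
    T_out = x.shape[0] - Kt + 1
    # Stack the Kt shifted views: windows[t, k] = x[t + k].
    windows = np.stack([x[k:k + T_out] for k in range(Kt)], axis=1)
    p = np.einsum("tkc,kco->to", windows, theta1)   # linear branch
    q = np.einsum("tkc,kco->to", windows, theta2)   # gate branch
    return p * (1.0 / (1.0 + np.exp(-q)))           # p times sigmoid(q)

# Six timesteps, two input channels, kernel width 3, four output channels.
x = np.arange(12, dtype=float).reshape(6, 2)
y = temporal_glu(x, np.ones((3, 2, 4)), np.zeros((3, 2, 4)))
print(y.shape)   # (4, 4)
```

Every output step is computed from its own window, so all steps can be evaluated at once instead of one after another as in a GRU.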

No recurrence → fully parallelisable over time → 10× faster training than DCRNN.

Result: similar accuracy to DCRNN on METR-LA, much faster training.

Graph WaveNet (Wu et al., 2019)

Adds an adaptive adjacency matrix that is learned from data, not just from road geometry:

Â = softmax( ReLU( E_1 E_2^T ) )

Here E_1, E_2 ∈ ℝ^{N × d} are learnable node embeddings (d is the embedding size here, not the feature count used earlier). The adaptive adjacency captures non-geographic correlations: sensors that are far apart but behave similarly, e.g. on parallel highways.
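
A minimal NumPy sketch of the adaptive adjacency, with the softmax taken row-wise. The embeddings are random here purely for illustration (in a real model they would be trained by backpropagation), and the embedding size of 10 is an assumed value.

```python
import numpy as np

def adaptive_adjacency(E1, E2):
    """Row-wise softmax(ReLU(E1 @ E2.T)) for node embeddings of shape (N, d)."""
    scores = np.maximum(E1 @ E2.T, 0.0)                    # ReLU
    scores = scores - scores.max(axis=1, keepdims=True)    # numerical stability
    expd = np.exp(scores)
    return expd / expd.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
N, d = 207, 10             # d = 10 is an illustrative embedding size
E1 = rng.normal(size=(N, d))
E2 = rng.normal(size=(N, d))
A_adp = adaptive_adjacency(E1, E2)
print(A_adp.shape, A_adp.sum(axis=1)[:3])
```

Two properties follow from the formula: each row sums to 1, so the matrix acts like a transition matrix, and because E_1 and E_2 are separate the result is generally asymmetric, so it can represent directed dependencies.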

Also uses dilated causal convolutions (as in WaveNet) for temporal modelling: stacking layers with growing dilation widens the receptive field much faster than standard 1D convolutions, without adding parameters per layer.
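
The receptive-field argument can be checked with a small sketch. The helper names `dilated_causal_conv` and `receptive_field` are illustrative, and the dilation schedule 1, 2, 4, 8 is an assumed example rather than the configuration used in the paper.

```python
import numpy as np

def dilated_causal_conv(x, kernel, dilation):
    """1D causal convolution: output[t] uses x[t], x[t - d], x[t - 2d], ...

    x: (T,) sequence, kernel: (K,) weights; kernel[0] applies to the current step.
    """
    T, K = x.shape[0], kernel.shape[0]
    pad = (K - 1) * dilation
    xp = np.concatenate([np.zeros(pad), x])    # left-pad so no future leaks in
    return sum(kernel[k] * xp[pad - k * dilation: pad - k * dilation + T]
               for k in range(K))

def receptive_field(kernel_size, dilations):
    """Number of input steps visible after stacking the given layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)

x = np.arange(8, dtype=float)
y = dilated_causal_conv(x, np.array([1.0, 1.0]), dilation=2)
print(y)                                    # [ 0.  1.  2.  4.  6.  8. 10. 12.]
print(receptive_field(2, [1, 2, 4, 8]))     # 16
print(receptive_field(2, [1, 1, 1, 1]))     # 5
```

With four layers of kernel size 2, doubling the dilation covers 16 past steps, against 5 for undilated layers with the same number of weights.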

Industrial Deployment

Google Maps: uses graph-based models for ETA (estimated time of arrival) prediction. The road network is a graph; historical traffic patterns are the training signal. DeepMind and Google reported that GNNs improved ETA accuracy by up to 50% in some cities.

DiDi / Uber: ride-hailing platforms use traffic forecasting to optimise driver positioning and surge pricing. GNNs process city-wide sensor networks in real-time.

Summary

| Model | Spatial | Temporal | Speed |
|---|---|---|---|
| ARIMA | None | Statistical | Fast |
| LSTM | None | Recurrent | Medium |
| DCRNN | Diffusion GCN | Encoder-decoder GRU | Slow (recurrent) |
| STGCN | ChebNet | Gated 1D conv | Fast (parallel) |
| Graph WaveNet | Adaptive adjacency | Dilated causal conv | Fast |

Traffic forecasting is the canonical spatio-temporal GNN application: a clean problem definition, public benchmarks, and real-world deployment at scale. Progress here has directly translated into improved navigation systems, logistics optimisation, and urban planning tools.

References