Spatio-Temporal GNNs: Learning on Graphs Through Time


TL;DR: In spatio-temporal GNNs, the graph structure is fixed (road network, sensor grid) but node features evolve over time as time series. The model combines a GNN (spatial: neighbours influence each other) with a sequence model (temporal: past influences future). Two architectures, DCRNN (GNN inside an RNN) and STGCN (GNN + 1D convolution), are the foundational models and standard baselines for traffic forecasting.

The Spatio-Temporal Setting

Given:

  • Fixed graph G = (V, E) with N = |V| nodes: the spatial structure (road network, weather stations)
  • Time series at each node: X_t ∈ ℝ^{N × d} for t = 1, …, T (d features per node)
  • Goal: predict the next H steps X_{T+1}, …, X_{T+H} from the last τ observations X_{T−τ+1}, …, X_T
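In practice, this setting is turned into supervised training pairs by sliding a window over the series. The sketch below assumes the series is stored as a Python list of T frames, each an N × d nested list. `make_windows` is a hypothetical helper, not part of any library. Code indices are 0-based, whereas the text above is 1-based.

```python
def make_windows(series, tau, horizon):
    """Split a series of T frames (each N x d) into (input, target) pairs.
    Each input holds tau consecutive frames; its target holds the next `horizon` frames."""
    inputs, targets = [], []
    # the last valid start index leaves room for tau inputs plus horizon targets
    for start in range(len(series) - tau - horizon + 1):
        inputs.append(series[start:start + tau])
        targets.append(series[start + tau:start + tau + horizon])
    return inputs, targets


# Toy usage: T = 10 steps, N = 3 nodes, d = 1 feature (made-up values)
T, N = 10, 3
series = [[[float(t + n)] for n in range(N)] for t in range(T)]
inputs, targets = make_windows(series, tau=4, horizon=2)
print(len(inputs))  # 10 - 4 - 2 + 1 = 5
```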

The key insight: readings at nearby nodes are correlated. A traffic jam upstream affects downstream sensors, and a temperature reading in Paris is informative for predicting the temperature in Frankfurt. The graph structure encodes which nodes influence each other.

Two Architectures

DCRNN (Diffusion Convolutional Recurrent Neural Network)

DCRNN replaces the linear transformations in a GRU with diffusion convolution, a graph convolution that captures directional information flow:

Standard GRU update:

h_t = GRU( x_t, h_{t-1} )

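DCRNN (schematically, each linear map inside the GRU becomes a graph convolution):

h_t = GRU( GCN(x_t, A), GCN(h_{t-1}, A) )

Written out gate by gate, each gate has its own graph convolution applied to the node-wise concatenation [x_t, h_{t-1}]:

r_t = σ( GCN_r([x_t, h_{t-1}], A) + b_r )
u_t = σ( GCN_u([x_t, h_{t-1}], A) + b_u )
c_t = tanh( GCN_c([x_t, r_t ⊙ h_{t-1}], A) + b_c )
h_t = u_t ⊙ h_{t-1} + (1 − u_t) ⊙ c_t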
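The sketch below shows one recurrent step of a GRU whose dense products are replaced by graph convolutions. It is a minimal, dependency-free version built on three simplifying assumptions:

- A plain one-hop graph convolution Â·Z·W (with Â a row-normalised adjacency including self-loops) stands in for DCRNN's diffusion convolution.
- Tensors are nested lists.
- The parameter dictionary layout (`Wr`, `br`, …) is hypothetical.

```python
import math


def matmul(a, b):
    # plain-list matrix product: (n x m) @ (m x p)
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def graph_conv(a_hat, z, w, b):
    # one-hop graph convolution A_hat @ Z @ W, plus a bias shared by all nodes
    out = matmul(matmul(a_hat, z), w)
    return [[v + b[j] for j, v in enumerate(row)] for row in out]


def concat(x, h):
    # node-wise concatenation [x_i, h_i]
    return [xi + hi for xi, hi in zip(x, h)]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def graph_gru_step(a_hat, x, h, p):
    """One GRU step with every dense product replaced by a graph convolution.
    x: N x d_in inputs, h: N x d_h hidden state, p: weights (hypothetical layout)."""
    xh = concat(x, h)
    r = [[sigmoid(v) for v in row] for row in graph_conv(a_hat, xh, p["Wr"], p["br"])]
    u = [[sigmoid(v) for v in row] for row in graph_conv(a_hat, xh, p["Wu"], p["bu"])]
    rh = [[ri * hi for ri, hi in zip(r_row, h_row)] for r_row, h_row in zip(r, h)]
    c = [[math.tanh(v) for v in row]
         for row in graph_conv(a_hat, concat(x, rh), p["Wc"], p["bc"])]
    # h_t = u * h_{t-1} + (1 - u) * c
    return [[ui * hi + (1.0 - ui) * ci for ui, hi, ci in zip(u_row, h_row, c_row)]
            for u_row, h_row, c_row in zip(u, h, c)]


def zeros(rows, cols):
    return [[0.0] * cols for _ in range(rows)]


# Toy usage: path graph 0-1-2 with self-loops, row-normalised (hypothetical values)
a_hat = [[1 / 2, 1 / 2, 0.0], [1 / 3, 1 / 3, 1 / 3], [0.0, 1 / 2, 1 / 2]]
d_in, d_h = 1, 2
params = {name: zeros(d_in + d_h, d_h) for name in ("Wr", "Wu", "Wc")}
params.update({name: [0.0] * d_h for name in ("br", "bu", "bc")})
x = [[1.0], [2.0], [3.0]]
h = [[1.0, -1.0], [0.5, 0.0], [2.0, 4.0]]
h_new = graph_gru_step(a_hat, x, h, params)
# With all-zero weights r = u = 0.5 and c = tanh(0) = 0, so h_new = 0.5 * h.
```

The only change from a standard GRU is where neighbourhood mixing happens: inside every gate. As a result, the update and reset decisions for a node depend on its neighbours' inputs and hidden states.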

Specifically, DCRNN uses bidirectional random walk diffusion to capture both forward and backward traffic flow directions:

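GCN(X, A) = Σ_{k=0}^{K} ( (D_O^{-1} A)^k X W_k^{fwd} + (D_I^{-1} A^T)^k X W_k^{bwd} )

where D_O and D_I are the diagonal out-degree and in-degree matrices of the weighted adjacency A. D_O^{-1} A and D_I^{-1} A^T are the forward and backward random-walk transition matrices, K is the number of diffusion steps, and W_k^{fwd}, W_k^{bwd} are learned weight matrices acting on the feature dimension.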

For traffic, forward diffusion models influence along the direction of travel. Backward diffusion models influence against it, for example congestion spilling back to upstream sensors.
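Below is a minimal plain-Python sketch of bidirectional diffusion convolution. It computes Σ_{k=0}^{K} ( P_f^k X W_k^fwd + P_b^k X W_k^bwd ), where P_f = D_O^{-1} A and P_b = D_I^{-1} A^T. The weights multiply the feature dimension on the right. Function names are illustrative, not taken from the DCRNN codebase. In a real model the W_k would be trainable tensors.

```python
def matmul(a, b):
    # plain-list matrix product: (n x m) @ (m x p)
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def random_walk(a):
    # D^{-1} A with D = diag(row sums); rows with zero degree stay all-zero
    out = []
    for row in a:
        s = sum(row)
        out.append([v / s if s else 0.0 for v in row])
    return out


def diffusion_conv(a, x, w_fwd, w_bwd):
    """Sum over k = 0..K of (D_O^-1 A)^k X W_k^fwd + (D_I^-1 A^T)^k X W_k^bwd.
    a: N x N weighted directed adjacency, x: N x c_in,
    w_fwd[k], w_bwd[k]: c_in x c_out, with K = len(w_fwd) - 1."""
    p_f = random_walk(a)             # forward transition matrix D_O^-1 A
    p_b = random_walk(transpose(a))  # backward: row sums of A^T are in-degrees
    out = [[0.0] * len(w_fwd[0][0]) for _ in x]
    xf, xb = x, x                    # P^0 X = X
    for k in range(len(w_fwd)):
        if k > 0:                    # advance one more diffusion step
            xf, xb = matmul(p_f, xf), matmul(p_b, xb)
        for term in (matmul(xf, w_fwd[k]), matmul(xb, w_bwd[k])):
            out = [[o + t for o, t in zip(o_row, t_row)] for o_row, t_row in zip(out, term)]
    return out


# Toy usage: directed path 0 -> 1 -> 2, one input and one output channel, K = 1
a = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
x = [[1.0], [2.0], [4.0]]
w_fwd = [[[1.0]], [[1.0]]]   # weights for k = 0 and k = 1
w_bwd = [[[0.0]], [[1.0]]]
out = diffusion_conv(a, x, w_fwd, w_bwd)
```

With these toy weights, node 1 ends up with its own value (2), plus node 2's value through the forward term (4), plus node 0's value through the backward term (1). That gives 7. The forward term averages over a node's out-neighbours and the backward term over its in-neighbours.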

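Encoder-decoder: DCRNN uses an encoder (a diffusion-convolutional GRU, DCGRU, run over the past τ steps) and a decoder (a DCGRU that generates the next H steps). During training it uses scheduled sampling, gradually switching the decoder's inputs from ground-truth values to its own predictions. This reduces the mismatch between training and inference (exposure bias).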
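The sketch below illustrates scheduled sampling with an inverse-sigmoid decay of the teacher-forcing probability, ε_i = k / (k + exp(i / k)), where i is the training iteration. The DCRNN paper describes a schedule of this form; the constant k is a hyperparameter. `decoder_step` is a hypothetical stand-in for a DCGRU decoder cell, and the `coin` argument exists only to make the random draw injectable.

```python
import math
import random


def teacher_forcing_prob(step, k):
    """Inverse-sigmoid decay: probability of feeding the ground truth to the
    decoder at training iteration `step`; k > 0 sets how slowly it decays."""
    return k / (k + math.exp(step / k))


def decode(decoder_step, h, first_input, targets, step, k, coin=random.random):
    """Roll out a decoder for len(targets) steps with scheduled sampling.
    decoder_step(x, h) -> (y_pred, h) is a hypothetical recurrent cell;
    coin() draws a uniform number in [0, 1)."""
    eps = teacher_forcing_prob(step, k)
    x, preds = first_input, []
    for y_true in targets:
        y_pred, h = decoder_step(x, h)
        preds.append(y_pred)
        # feed the ground truth with probability eps, else the model's own output
        x = y_true if coin() < eps else y_pred
    return preds


def toy_step(x, h):
    # hypothetical "decoder": predicts the previous input plus one
    return x + 1.0, h


preds = decode(toy_step, None, 0.0, [10.0, 20.0, 30.0], step=0, k=100,
               coin=random.Random(0).random)
```

At inference time no ground truth is available, so the decoder always consumes its own predictions. The decaying ε moves training towards that regime gradually rather than all at once.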

STGCN (Spatio-Temporal Graph Convolutional Network)

STGCN alternates spatial (graph convolution) and temporal (1D convolution) blocks:

Input: (N × τ × d)
       ↓
Temporal conv (1D across time axis)
       ↓
Spatial conv (GCN across node axis)
       ↓
Temporal conv
       ↓
... repeat
       ↓
Output: (N × H × d)

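Each temporal block applies a gated 1D convolution across the time dimension. The gate is a GLU (gated linear unit): the convolution output is split into halves P and Q and combined as P ⊙ σ(Q). Each spatial block applies ChebNet (Chebyshev polynomial) or first-order GCN filtering across the node dimension.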
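Here is a minimal sketch of a GLU-gated temporal convolution, applied independently to each node's series. For readability it uses one input channel and produces P and Q from two separate kernels. In STGCN a single convolution produces 2·C_out channels that are then split. The function names are hypothetical, and the convolution is "valid" (unpadded), so each layer shortens the time axis by K_t − 1.

```python
import math


def gated_temporal_conv(series, kernel_p, kernel_q, bias_p=0.0, bias_q=0.0):
    """GLU-gated 1D convolution along time for one node and one channel.
    Valid convolution: output length = len(series) - len(kernel_p) + 1."""
    kt = len(kernel_p)
    out = []
    for t in range(len(series) - kt + 1):
        window = series[t:t + kt]
        p = sum(w * v for w, v in zip(kernel_p, window)) + bias_p
        q = sum(w * v for w, v in zip(kernel_q, window)) + bias_q
        out.append(p / (1.0 + math.exp(-q)))   # P * sigmoid(Q)
    return out


def temporal_block(x, kernel_p, kernel_q):
    """Apply the same gated convolution to every node; x is N x T (one channel)."""
    return [gated_temporal_conv(row, kernel_p, kernel_q) for row in x]


# Toy usage: 2 nodes x 5 time steps; P = moving sum, Q = 0 so the gate is sigmoid(0) = 0.5
x = [[1.0, 2.0, 3.0, 4.0, 5.0], [0.0, 0.0, 1.0, 0.0, 0.0]]
out = temporal_block(x, kernel_p=[1.0, 1.0, 1.0], kernel_q=[0.0, 0.0, 0.0])
```

Because the same kernels slide over every node and every time window independently, all output positions can be computed in parallel. This is the property the next paragraph relies on.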

Advantage over DCRNN: fully convolutional, with no recurrence, so computation is parallel across time steps and training is much faster.

DCRNN vs STGCN: DCRNN can carry long-range temporal context in its GRU hidden state, but it must process time steps sequentially, which slows training. STGCN trains faster because its convolutions run in parallel. However, its temporal receptive field is fixed by the architecture: a stack of L undilated temporal convolutions with kernel size K_t sees only 1 + L·(K_t − 1) input steps. On standard traffic benchmarks (METR-LA, PEMS-BAY), the two report broadly similar accuracy; STGCN is preferred when training speed matters.
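The receptive-field limit can be made concrete with a few lines of arithmetic. Each 1D convolution layer with kernel size K and dilation δ extends the receptive field by (K − 1)·δ steps. Undilated stacks, as in STGCN's temporal blocks, therefore grow linearly with depth. Dilated stacks (the "dilated causal conv" used by GWaveNet) can grow exponentially. The function name and the layer configurations below are illustrative.

```python
def receptive_field(kernel_size, dilations):
    """Input steps seen by one output of a stack of 1D convolutions.
    Each layer with dilation d adds (kernel_size - 1) * d steps."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Undilated stack (STGCN-style temporal convs): linear growth with depth
print(receptive_field(3, [1] * 4))        # 1 + 4 * 2 = 9
# Dilated stack (WaveNet-style): dilation doubles per layer
print(receptive_field(2, [1, 2, 4, 8]))   # 1 + 1 + 2 + 4 + 8 = 16
```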

Graph Construction for ST-GNNs

The spatial graph is typically constructed from domain knowledge:

Traffic: node = sensor station, edge = road segment (weighted by distance or travel time)

Weather: node = weather station, edge = geographic proximity (stations connected when closer than a distance threshold in km)

Energy: node = power generator/consumer, edge = transmission line
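For distance-based graphs, a common choice in the traffic literature (used, for example, in DCRNN) is a thresholded Gaussian kernel: W_ij = exp(−dist_ij² / σ²), with entries below a threshold set to zero to keep the graph sparse. The sketch below assumes a precomputed N × N distance matrix with `math.inf` for unreachable pairs. The σ, threshold and distances in the usage lines are made-up values.

```python
import math


def gaussian_kernel_adjacency(dist, sigma, threshold):
    """W_ij = exp(-dist_ij^2 / sigma^2), zeroed when below `threshold`.
    dist: N x N road-network or geographic distances; math.inf = no path."""
    n = len(dist)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if math.isinf(dist[i][j]):
                continue                      # unreachable pairs stay unconnected
            v = math.exp(-(dist[i][j] ** 2) / sigma ** 2)
            w[i][j] = v if v >= threshold else 0.0
    return w


# Toy usage: 3 sensors, distances in km (hypothetical), sigma = 2 km, threshold = 0.2
inf = math.inf
dist = [[0.0, 1.0, inf], [1.0, 0.0, 3.0], [inf, 3.0, 0.0]]
w = gaussian_kernel_adjacency(dist, sigma=2.0, threshold=0.2)
```

The 3 km pair gets weight exp(−9/4) ≈ 0.105, which falls below the threshold and is dropped. Distances in a directed road network need not be symmetric, so W may not be either.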

Some methods learn the graph adaptively:

  • MTGNN: learns the graph topology jointly with the ST-GNN
  • GWaveNet: adaptive adjacency matrix learned from data

Benchmarks

  • METR-LA: 207 traffic-speed sensors in Los Angeles, 4 months, 5-minute intervals
  • PEMS-BAY: 325 sensors in the Bay Area, 6 months, 5-minute intervals
  • Solar-Energy: 137 solar plants, one year (2006) of production data, 10-minute intervals
  • Electricity: 321 electricity consumption time series, hourly

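Standard task on the traffic datasets: predict 15, 30 and 60 minutes ahead (3, 6 and 12 steps at 5-minute resolution), typically from the previous hour (12 steps). Metrics: MAE, MAPE, RMSE.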
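The three metrics can be sketched as follows. One assumption, not stated in this post: zero readings are treated as missing data and excluded, which is a convention in many traffic-forecasting codebases for METR-LA and PEMS-BAY. Without masking, MAPE would divide by zero on those entries. The function name and data are hypothetical.

```python
import math


def masked_metrics(y_true, y_pred):
    """MAE, RMSE and MAPE (%) over entries whose ground truth is non-zero.
    Assumption: a zero reading marks a missing value and is skipped."""
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if t != 0.0]
    n = len(pairs)
    mae = sum(abs(t - p) for t, p in pairs) / n
    rmse = math.sqrt(sum((t - p) ** 2 for t, p in pairs) / n)
    mape = 100.0 * sum(abs((t - p) / t) for t, p in pairs) / n
    return mae, rmse, mape


# Toy usage: speeds in mph (made up); the 0.0 reading is treated as missing
y_true = [50.0, 0.0, 40.0, 20.0]
y_pred = [45.0, 30.0, 44.0, 20.0]
mae, rmse, mape = masked_metrics(y_true, y_pred)
```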

Recent Advances

GWaveNet (Wu et al., 2019): adds a self-adaptive adjacency matrix, learned from node embeddings jointly with the rest of the model. It can complement a predefined graph or replace it when none is available. This allows the model to capture non-geographic correlations (sensors far apart but behaviourally correlated).
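Graph WaveNet builds its adaptive adjacency as Ã = SoftMax(ReLU(E₁E₂ᵀ)), with two learnable node-embedding matrices E₁ and E₂ and the softmax taken over each row. The sketch below computes that expression on plain lists. In the real model E₁ and E₂ are trainable parameters updated by backpropagation, and the embedding values here are arbitrary.

```python
import math


def adaptive_adjacency(e1, e2):
    """A_adp = softmax(relu(E1 @ E2^T)), softmax over each row.
    e1, e2: N x c node embeddings (learnable parameters in a real model)."""
    n = len(e1)
    scores = [[max(0.0, sum(a * b for a, b in zip(e1[i], e2[j]))) for j in range(n)]
              for i in range(n)]
    adj = []
    for row in scores:
        m = max(row)                              # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in row]
        z = sum(exps)
        adj.append([e / z for e in exps])
    return adj


# Toy usage: 3 nodes, embedding size 2 (arbitrary values)
e1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
e2 = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
adj = adaptive_adjacency(e1, e2)
```

Each row is a probability distribution over the nodes, so the result can be plugged into the same diffusion-style convolution as a predefined transition matrix. Because the embeddings are learned, two distant sensors can receive a large weight if that lowers the training loss.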

AGCRN (Bai et al., 2020): fully adaptive; it learns node-specific patterns and the graph structure simultaneously.
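AGCRN's node-specific patterns come from generating a separate weight matrix for every node. This is a minimal sketch of that idea, assuming the factorisation W_i = Σ_k E_ik · W_pool[k]: a learned node embedding E_i mixes a shared pool of weight matrices. The function names and values are hypothetical, and only a plain linear layer is shown, not the full graph-convolutional GRU.

```python
def node_weights(emb, pool):
    """W_i = sum_k emb[i][k] * pool[k] for every node i.
    emb: N x d node embeddings; pool: d matrices, each c_in x c_out."""
    c_in, c_out = len(pool[0]), len(pool[0][0])
    return [[[sum(e[k] * pool[k][r][c] for k in range(len(pool)))
              for c in range(c_out)] for r in range(c_in)] for e in emb]


def node_adaptive_linear(x, emb, pool):
    """y_i = x_i @ W_i: one layer, but every node gets its own weights."""
    out = []
    for xi, w in zip(x, node_weights(emb, pool)):
        out.append([sum(xi[r] * w[r][c] for r in range(len(xi)))
                    for c in range(len(w[0]))])
    return out


# Toy usage: 2 nodes, embedding size d = 2, c_in = c_out = 1 (hypothetical values)
emb = [[1.0, 0.0], [0.0, 1.0]]
pool = [[[2.0]], [[-1.0]]]
y = node_adaptive_linear([[3.0], [3.0]], emb, pool)
```

The same input produces different outputs at the two nodes because their embeddings select different mixtures of the pool. The pool holds d·c_in·c_out parameters (plus N·d for the embeddings), instead of N·c_in·c_out for fully independent per-node weights.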

GMAN (Zheng et al., 2020): attention-based approach. Replaces GCN with spatial attention and uses temporal attention across time steps.
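Below is the core operation behind spatial attention, sketched as single-head scaled dot-product attention across the nodes at one time step. Node i mixes the values of all nodes with weights softmax_j(q_i · k_j / √d_k). This is a simplified sketch: GMAN's actual blocks add multiple heads, spatio-temporal embeddings and gated fusion of the spatial and temporal branches, all omitted here. Temporal attention applies the same operation along the time axis of each node instead.

```python
import math


def spatial_attention(q, k, v):
    """Single-head scaled dot-product attention across nodes at one time step.
    q, k: N x d_k queries/keys, v: N x d_v values; returns (outputs, weights)."""
    d_k = len(q[0])
    out, weights = [], []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d_k) for kj in k]
        m = max(scores)                           # stabilise the softmax
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        w = [e / z for e in exps]
        weights.append(w)
        out.append([sum(w[j] * v[j][c] for j in range(len(v))) for c in range(len(v[0]))])
    return out, weights


# Toy usage: identical keys give uniform weights, so every node gets the mean value
q = [[0.5, -0.2], [1.0, 2.0], [0.0, 0.0]]
k = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
v = [[3.0], [6.0], [9.0]]
out, weights = spatial_attention(q, k, v)
```

Unlike a GCN, the mixing weights here are computed from the data at each time step rather than fixed by the adjacency matrix. This is why attention-based models can capture correlations that the predefined graph misses.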

Summary

| Model | Spatial | Temporal | Parallel over time? |
|---|---|---|---|
| DCRNN | Diffusion GCN | GRU encoder-decoder | No (recurrent) |
| STGCN | ChebNet/GCN | Gated 1D conv | Yes |
| GWaveNet | Adaptive adjacency | Dilated causal conv | Yes |
| GMAN | Spatial attention | Temporal attention | Yes |

Spatio-temporal GNNs are the dominant framework for sensor network prediction, wherever measurements at graph nodes evolve over time and spatial correlations matter. The field is rapidly incorporating Transformer-style attention to replace both spatial and temporal convolutions.

References