Spatio-Temporal GNNs: Learning on Graphs Through Time
The Spatio-Temporal Setting
Given:
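- Fixed graph G = (V, E): the spatial structure (road network, weather stations)
- Time series at each node: X_t ∈ ℝ^{N × d} for t = 1, …, T, where N = |V| is the number of nodes and d the number of features per node
- Goal: predict X_{T+1}, …, X_{T+H} from X_{T-τ+1}, …, X_T, i.e. the next H steps from the last τ observed steps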
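The setup above maps directly onto a sliding-window dataset. The sketch below is a minimal illustration with NumPy; the function name `make_windows`, the `(T, N, d)` array layout, and the toy sizes are assumptions made for this example, not something fixed by any particular library.

```python
import numpy as np

def make_windows(series, tau, horizon):
    """Slice a (T, N, d) series into (input, target) pairs.

    Returns inputs of shape (S, tau, N, d) and targets of shape
    (S, horizon, N, d), where S = T - tau - horizon + 1.
    """
    T = series.shape[0]
    num_samples = T - tau - horizon + 1
    if num_samples <= 0:
        raise ValueError("series too short for the requested tau and horizon")
    # Window s covers steps [s, s + tau); its target covers the next `horizon` steps.
    inputs = np.stack([series[s:s + tau] for s in range(num_samples)])
    targets = np.stack(
        [series[s + tau:s + tau + horizon] for s in range(num_samples)]
    )
    return inputs, targets

# Toy data (assumed sizes): T=20 time steps, N=4 nodes, d=2 features per node.
series = np.arange(20 * 4 * 2, dtype=float).reshape(20, 4, 2)
inputs, targets = make_windows(series, tau=12, horizon=3)
print(inputs.shape, targets.shape)  # (6, 12, 4, 2) (6, 3, 4, 2)
```

With 5-minute data, `tau=12` and `horizon=3` would correspond to one hour of history and a 15-minute forecast.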
The key insight: sensors at nearby nodes are correlated. A traffic jam upstream affects downstream sensors. A temperature reading in Paris is informative for predicting Frankfurt. The graph structure encodes which nodes influence each other.
Two Architectures
DCRNN (Diffusion Convolutional Recurrent Neural Network)
DCRNN replaces the linear transformation in a GRU with a diffusion convolution, a GNN layer that captures directional information flow:
Standard GRU update:
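One common way to write the GRU cell is given here. The notation is an assumption, since the text does not fix one: $x_t$ is the input, $h_{t-1}$ the previous hidden state, $\sigma$ the logistic sigmoid, $\odot$ the elementwise product, and $[\cdot,\cdot]$ concatenation.

$$
\begin{aligned}
r_t &= \sigma\left(W_r\,[x_t,\ h_{t-1}] + b_r\right) \\
u_t &= \sigma\left(W_u\,[x_t,\ h_{t-1}] + b_u\right) \\
c_t &= \tanh\left(W_c\,[x_t,\ r_t \odot h_{t-1}] + b_c\right) \\
h_t &= u_t \odot h_{t-1} + (1 - u_t) \odot c_t
\end{aligned}
$$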
DCRNN (replace linear with graph conv):
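In the diffusion-convolutional GRU, each matrix multiplication of a standard GRU cell becomes a diffusion convolution $\star_G$ over the graph. The symbols below follow the general form in Li et al. (2018), but the exact notation is an assumption: $X_t$ and $H_{t-1}$ are the node-feature and hidden-state matrices, $\Theta_r$, $\Theta_u$, $\Theta_c$ are filter parameters, $\sigma$ is the logistic sigmoid, and $\odot$ is the elementwise product.

$$
\begin{aligned}
r_t &= \sigma\left(\Theta_r \star_G [X_t,\ H_{t-1}] + b_r\right) \\
u_t &= \sigma\left(\Theta_u \star_G [X_t,\ H_{t-1}] + b_u\right) \\
C_t &= \tanh\left(\Theta_c \star_G [X_t,\ r_t \odot H_{t-1}] + b_c\right) \\
H_t &= u_t \odot H_{t-1} + (1 - u_t) \odot C_t
\end{aligned}
$$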
Specifically, DCRNN uses bidirectional random walk diffusion to capture both forward and backward traffic flow directions:
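The diffusion convolution of a graph signal $X_{:,p}$ (feature $p$ across all nodes) with a filter $f_\theta$ can be written as follows. Here $W$ is the weighted adjacency matrix, $D_O$ and $D_I$ are the diagonal out-degree and in-degree matrices, and $K$ is the number of diffusion steps; the notation is taken to match Li et al. (2018) and should be read as an assumption where the text is silent.

$$
X_{:,p} \star_G f_\theta \;=\; \sum_{k=0}^{K-1} \left( \theta_{k,1}\,\left(D_O^{-1} W\right)^{k} + \theta_{k,2}\,\left(D_I^{-1} W^{\top}\right)^{k} \right) X_{:,p}
$$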
For traffic: forward diffusion follows traffic direction; backward diffusion captures reverse influence.
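A minimal NumPy sketch of a bidirectional diffusion convolution is shown below. It assumes a single scalar coefficient per diffusion step and direction (no channel mixing), which is a simplification of what a full DCRNN layer does. The function names and the toy three-node chain are illustrative assumptions.

```python
import numpy as np

def random_walk_matrix(W):
    """Row-normalise W (D^{-1} W); rows with zero degree stay all-zero."""
    degree = W.sum(axis=1)
    inv = np.zeros_like(degree)
    np.divide(1.0, degree, out=inv, where=degree > 0)
    return inv[:, None] * W

def diffusion_conv(X, W, theta_fwd, theta_bwd):
    """Bidirectional K-step diffusion convolution.

    X: (N, d) node features; W: (N, N) weighted directed adjacency;
    theta_fwd, theta_bwd: length-K coefficient sequences (assumed scalar per step).
    """
    P_fwd = random_walk_matrix(W)    # D_O^{-1} W
    P_bwd = random_walk_matrix(W.T)  # D_I^{-1} W^T
    out = np.zeros(X.shape, dtype=float)
    X_f = X_b = X.astype(float)      # k = 0 terms: P^0 X = X
    for k in range(len(theta_fwd)):
        out += theta_fwd[k] * X_f + theta_bwd[k] * X_b
        X_f = P_fwd @ X_f            # advance to (D_O^{-1} W)^{k+1} X
        X_b = P_bwd @ X_b            # advance to (D_I^{-1} W^T)^{k+1} X
    return out

# Toy directed chain 0 -> 1 -> 2 with a unit signal on node 0.
W = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 0.0]])
X = np.array([[1.0], [0.0], [0.0]])
out = diffusion_conv(X, W, theta_fwd=[1.0, 0.5], theta_bwd=[0.0, 0.5])
print(out.ravel())  # [1.  0.5 0. ]
```

In this toy case only the transposed term moves the signal from node 0 to node 1, which illustrates why a directed graph needs both transition matrices.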
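Encoder-decoder: DCRNN uses an encoder (GRU over the past τ steps) and a decoder (GRU unrolled over the future H steps). The decoder is trained with scheduled sampling, which gradually replaces ground-truth inputs with the model's own predictions to reduce exposure bias.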
STGCN (Spatio-Temporal Graph Convolutional Network)
STGCN alternates spatial (graph convolution) and temporal (1D convolution) blocks:
Input: (N × T × d)
↓
Temporal conv (1D across time axis)
↓
Spatial conv (GCN across node axis)
↓
Temporal conv
↓
... repeat
↓
Output: (N × H × d)
Each temporal block uses a gated 1D convolution (GLU: gated linear unit) across the time dimension. Each spatial block uses ChebNet or standard GCN across the node dimension.
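The temporal-spatial-temporal pattern can be sketched in a few lines of NumPy. This is an illustrative simplification under stated assumptions: the spatial step uses a first-order GCN with the symmetric normalisation D^{-1/2}(A + I)D^{-1/2}, the GLU omits the residual connection that STGCN adds, weights are random rather than trained, and all function names and sizes are made up for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_temporal_conv(X, kernel, bias):
    """Valid 1D convolution along time followed by a GLU gate.

    X: (N, T, c_in); kernel: (K_t, c_in, 2 * c_out); bias: (2 * c_out,).
    Returns (N, T - K_t + 1, c_out).
    """
    K_t, _, two_c = kernel.shape
    T_out = X.shape[1] - K_t + 1
    # Each of the K_t taps is a linear map c_in -> 2 * c_out on a shifted slice.
    conv = sum(X[:, k:k + T_out, :] @ kernel[k] for k in range(K_t)) + bias
    P, Q = conv[..., : two_c // 2], conv[..., two_c // 2:]
    return P * sigmoid(Q)  # GLU: the second half gates the first half

def normalized_adjacency(A):
    """Symmetric normalisation with self-loops: D^{-1/2} (A + I) D^{-1/2}."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]

def spatial_conv(X, A_norm, weight):
    """First-order graph convolution applied independently at every time step."""
    mixed = np.einsum("ij,jtc->itc", A_norm, X)  # aggregate over neighbours
    return np.maximum(mixed @ weight, 0.0)       # channel mixing + ReLU

rng = np.random.default_rng(0)
N, T, c_in, c_hid = 5, 12, 2, 4
X = rng.normal(size=(N, T, c_in))

# Ring graph on N nodes (assumed toy topology).
A = np.zeros((N, N))
for i in range(N):
    A[i, (i + 1) % N] = A[(i + 1) % N, i] = 1.0
A_norm = normalized_adjacency(A)

k1 = rng.normal(size=(3, c_in, 2 * c_hid)) * 0.1
k2 = rng.normal(size=(3, c_hid, 2 * c_hid)) * 0.1
w = rng.normal(size=(c_hid, c_hid)) * 0.1

h = gated_temporal_conv(X, k1, np.zeros(2 * c_hid))    # (5, 10, 4)
h = spatial_conv(h, A_norm, w)                         # (5, 10, 4)
out = gated_temporal_conv(h, k2, np.zeros(2 * c_hid))  # (5, 8, 4)
print(out.shape)  # (5, 8, 4)
```

Each unpadded temporal convolution with kernel size 3 shortens the time axis by 2, so one block reduces T=12 to 8.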
Advantage over DCRNN: all-convolutional → no recurrence → parallelisable across time steps → much faster training.
Graph Construction for ST-GNNs
The spatial graph is typically constructed from domain knowledge:
- Traffic: node = sensor station, edge = road segment (weighted by distance or travel time)
- Weather: node = weather station, edge = geographic proximity (threshold by km distance)
- Energy: node = power generator/consumer, edge = transmission line
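One construction often used for distance-based graphs is a thresholded Gaussian kernel, which turns pairwise distances into edge weights and drops weak edges. The sketch below is a hedged illustration: the function name, the choice of the distance standard deviation as the kernel width, the 0.1 cutoff, and the toy one-dimensional sensor positions are all assumptions.

```python
import numpy as np

def gaussian_kernel_adjacency(dist, threshold=0.1):
    """Build edge weights W_ij = exp(-(dist_ij / sigma)^2), zeroing weak edges.

    dist: (N, N) pairwise distances (road or geographic).
    sigma is taken to be the standard deviation of all distances (an assumption).
    """
    sigma = dist.std()
    W = np.exp(-np.square(dist / sigma))
    W[W < threshold] = 0.0  # sparsify: keep only sufficiently close pairs
    return W

# Toy layout (assumed): four sensors on a line at positions 0, 1, 2 and 10 km.
positions = np.array([0.0, 1.0, 2.0, 10.0])
dist = np.abs(positions[:, None] - positions[None, :])
W = gaussian_kernel_adjacency(dist)
print(np.round(W, 3))
```

The distant sensor at 10 km ends up disconnected from the other three, while the nearby sensors keep weights that decay with distance.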
Some methods learn the graph adaptively:
- MTGNN: learns the graph topology jointly with the ST-GNN
- GWaveNet: adaptive adjacency matrix learned from data
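A minimal sketch of an adaptive adjacency matrix follows, using the form softmax(ReLU(E1 E2ᵀ)) associated with Graph WaveNet. The embeddings `E1` and `E2` would be trainable parameters updated by gradient descent in a real model; here they are random NumPy arrays, and the names and sizes are assumptions for illustration.

```python
import numpy as np

def softmax_rows(Z):
    """Numerically stable softmax over each row."""
    Z = Z - Z.max(axis=1, keepdims=True)
    e = np.exp(Z)
    return e / e.sum(axis=1, keepdims=True)

def adaptive_adjacency(E1, E2):
    """Dense learned adjacency: softmax(relu(E1 @ E2.T)), normalised per row.

    E1, E2: (N, emb) source and target node embeddings.
    """
    return softmax_rows(np.maximum(E1 @ E2.T, 0.0))

rng = np.random.default_rng(0)
N, emb = 6, 3  # assumed toy sizes
E1 = rng.normal(size=(N, emb))
E2 = rng.normal(size=(N, emb))
A_adp = adaptive_adjacency(E1, E2)
print(A_adp.shape, A_adp.sum(axis=1))
```

Because the two embedding tables differ, the resulting matrix is generally asymmetric, so it can represent directed influence between nodes that share no physical edge.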
Benchmarks
- METR-LA: 207 traffic sensors in Los Angeles, 4 months, 5-minute intervals
- PEMS-BAY: 325 sensors in Bay Area, 6 months
- Solar-Energy: 137 solar plants, one year (2006) of production data at 10-minute intervals
- Electricity: 321 electricity consumption time series
Standard task on the traffic benchmarks: prediction at 15-, 30-, and 60-minute horizons (3, 6, and 12 steps at 5-minute resolution). Metrics: MAE, MAPE, RMSE.
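The three metrics can be computed as in the sketch below. It assumes the convention, common on traffic datasets, that a reading of exactly 0 marks a missing value and is excluded from the averages; the function name and the `null_value` parameter are illustrative.

```python
import numpy as np

def masked_metrics(y_true, y_pred, null_value=0.0):
    """Return (MAE, RMSE, MAPE in percent), ignoring entries equal to null_value."""
    mask = y_true != null_value
    if not mask.any():
        raise ValueError("no valid targets to evaluate")
    err = y_pred[mask] - y_true[mask]
    mae = np.abs(err).mean()
    rmse = np.sqrt(np.square(err).mean())
    mape = (np.abs(err) / np.abs(y_true[mask])).mean() * 100.0
    return mae, rmse, mape

# Toy speeds in mph; the third reading is treated as missing.
y_true = np.array([50.0, 60.0, 0.0, 40.0])
y_pred = np.array([45.0, 66.0, 10.0, 40.0])
mae, rmse, mape = masked_metrics(y_true, y_pred)
print(f"MAE={mae:.3f} RMSE={rmse:.3f} MAPE={mape:.3f}%")
# MAE=3.667 RMSE=4.509 MAPE=6.667%
```

Masking also keeps MAPE finite, since dividing by a true value of zero would otherwise blow up.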
Recent Advances
GWaveNet (Graph WaveNet; Wu et al., 2019): adds a self-adaptive adjacency matrix, trained jointly with the rest of the network, that can supplement a predefined graph or replace it when none is available. This allows the model to capture non-geographic correlations (sensors far apart but behaviourally correlated).
AGCRN (Bai et al., 2020): fully adaptive, learning node-specific patterns and graph structure simultaneously.
GMAN (Zheng et al., 2020): attention-based approach. Replaces GCN with spatial attention and uses temporal attention across time steps.
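To make "spatial attention" concrete, the sketch below applies generic single-head scaled dot-product attention across the nodes at one time step. It is not GMAN's exact formulation, which adds spatio-temporal embeddings, multiple heads, and gated fusion. The projection matrices are random stand-ins for learned weights, and all names are assumptions.

```python
import numpy as np

def spatial_attention(H, Wq, Wk, Wv):
    """Single-head scaled dot-product attention over nodes.

    H: (N, c) node features at one time step; Wq, Wk, Wv: (c, c_att).
    Returns the attended features (N, c_att) and the (N, N) attention weights.
    """
    Q, K, V = H @ Wq, H @ Wk, H @ Wv
    scores = Q @ K.T / np.sqrt(Q.shape[1])
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ V, weights

rng = np.random.default_rng(0)
N, c, c_att = 5, 4, 8  # assumed toy sizes
H = rng.normal(size=(N, c))
Wq, Wk, Wv = (rng.normal(size=(c, c_att)) for _ in range(3))
attended, weights = spatial_attention(H, Wq, Wk, Wv)
print(attended.shape, weights.shape)  # (5, 8) (5, 5)
```

The attention weights play the role of a dense, input-dependent adjacency matrix: every node can attend to every other node, and the weights change with the current features. Temporal attention follows the same pattern with the roles of nodes and time steps swapped.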
Summary
| Model | Spatial | Temporal | Parallel? |
|---|---|---|---|
| DCRNN | Diffusion GCN | GRU encoder-decoder | No (recurrent) |
| STGCN | ChebNet/GCN | Gated 1D conv | Yes |
| GWaveNet | Adaptive adjacency | Dilated causal conv | Yes |
| GMAN | Spatial attention | Temporal attention | Yes |
Spatio-temporal GNNs are the dominant framework for sensor network prediction, applicable wherever measurements at graph nodes evolve over time and spatial correlations matter. The field is rapidly incorporating Transformer-style attention to replace both spatial and temporal convolutions.
References
- Li, Y., Yu, R., Shahabi, C., & Liu, Y. (2018). Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting. ICLR 2018 (DCRNN: bidirectional diffusion GCN with GRU encoder-decoder for traffic prediction).
- Yu, B., Yin, H., & Zhu, Z. (2018). Spatio-Temporal Graph Convolutional Networks: A Deep Learning Framework for Traffic Forecasting. IJCAI 2018 (STGCN: gated 1D temporal convolution + Chebyshev spatial convolution, fully parallelisable).
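- Wu, Z., Pan, S., Long, G., Jiang, J., & Zhang, C. (2019). Graph WaveNet for Deep Spatial-Temporal Graph Modeling. IJCAI 2019 (GWaveNet: adaptive adjacency matrix + dilated causal convolution for long-range temporal patterns).
- Wu, Z., Pan, S., Long, G., Jiang, J., Chang, X., & Zhang, C. (2020). Connecting the Dots: Multivariate Time Series Forecasting with Graph Neural Networks. KDD 2020 (MTGNN: graph structure learned jointly with the forecasting model).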
