SLAM: Simultaneous Localisation and Mapping
Published:

The SLAM Problem
SLAM (Simultaneous Localisation and Mapping) is one of the most fundamental problems in mobile robotics. The challenge is circular: accurate localisation requires a map of the environment, but building an accurate map requires knowing your location.
Formally, SLAM estimates the joint posterior:
where \(\mathbf{x}_{1:t}\) is the robot trajectory, \(\mathbf{m}\) is the map, \(\mathbf{z}_{1:t}\) are observations (laser scans, images), and \(\mathbf{u}_{1:t}\) are control inputs (odometry). The map can be represented as a set of landmarks \(\mathbf{m} = \{l_1, \dots, l_K\}\) in feature-based SLAM, or as an occupancy grid in metric SLAM.
EKF-SLAM
Extended Kalman Filter SLAM maintains a joint Gaussian over the robot pose and all landmark positions: state vector \(\mathbf{x} = [\mathbf{x}_r^T, l_1^T, \dots, l_K^T]^T \in \mathbb{R}^{3 + 2K}\) (for a 2D robot with 2D landmarks). The covariance matrix \(\Sigma \in \mathbb{R}^{(3+2K) \times (3+2K)}\) captures uncertainty and correlations.
The EKF proceeds in two steps per timestep:
- Predict: propagate state through motion model \(f(\mathbf{x}_r, \mathbf{u})\), inflate uncertainty via process noise \(Q\).
- Update: incorporate each new landmark observation \(\mathbf{z}_i = h(\mathbf{x}_r, l_i) + \mathbf{n}\) via Kalman gain.
EKF-SLAM scales quadratically with the number of landmarks โ the full covariance matrix grows as \(O(K^2)\), making it impractical for large environments (\(K > 1000\)).
FastSLAM: Particle Filters
FastSLAM (Montemerlo et al. 2002) exploits a key conditional independence: given the robot trajectory, each landmark can be estimated independently. Using the Rao-Blackwellised particle filter, each particle maintains its own trajectory estimate plus \(K\) independent 2D Kalman filters for landmarks:
This reduces complexity to \(O(KM)\) where \(M\) is the number of particles, enabling SLAM in larger environments.
Graph-Based SLAM
Modern SLAM systems use a pose graph formulation. Robot poses \(x_1, \dots, x_T\) are nodes; relative pose measurements (from odometry or visual matching) are edges with associated uncertainty. Loop closure โ recognising a previously visited place โ adds long-range edges that correct accumulated drift.
SLAM then becomes a maximum a posteriori (MAP) estimation problem solved by nonlinear least squares:
where \(\Omega_{ij}\) is the information matrix of measurement \((i,j)\). Libraries g2o and GTSAM solve this efficiently using sparse Cholesky factorisation, exploiting the graphโs sparsity. iSAM2 extends this to incremental updates.
ORB-SLAM3 (Campos et al. 2021) is the state-of-the-art feature-based SLAM system supporting monocular, stereo, and RGB-D cameras. It extracts ORB (Oriented FAST and Rotated BRIEF) features, tracks them across frames, builds a pose graph, and performs bundle adjustment. It achieves centimetre-level accuracy in real-time on consumer hardware.
Neural SLAM
Recent systems replace classical map representations with implicit neural representations. iMAP (Sucar et al. 2021) represents the scene as a neural radiance field (NeRF): a small MLP \(f_\theta(x, y, z) \to (\text{color}, \text{density})\). The pose and map are jointly optimised by minimising photometric and depth rendering losses.
NICE-SLAM (Zhu et al. 2022) extends iMAP with a hierarchical feature grid for faster optimisation in larger environments. Neural SLAM produces photorealistic reconstructions and generalises well to novel viewpoints, but remains slower than classical SLAM and harder to scale to large scenes.
References
- Thrun, S., Burgard, W., & Fox, D. (2005). Probabilistic Robotics. MIT Press (Ch. 10: SLAM).
- Mur-Artal, R., Montiel, J. M. M., & Tardos, J. D. (2015). ORB-SLAM: A versatile and accurate monocular SLAM system. IEEE TRO. arXiv:1502.00956.
- Sucar, E., Liu, S., Ortiz, J., & Davison, A. J. (2021). iMAP: Implicit mapping and positioning in real-time. ICCV. arXiv:2103.12352.
