Topological Regularisation in Deep Learning
Published:
Motivation: When L2 Is Not Enough
Standard regularisers (L1, L2, dropout) control the complexity of functions in terms of smoothness or sparsity. They cannot enforce topological properties:
- A segmentation with L2-regularised weights can still produce disconnected regions when the true object is connected.
- A latent space with small weight norms can have arbitrary topological structure, ignoring the data manifold’s topology.
Topological regularisation adds complementary constraints that operate on the topology of outputs or representations.
Topology-Preserving Segmentation
Given a binary segmentation mask \(Y \in \{0,1\}^{m \times n}\) and a ground truth \(Y^* \in \{0,1\}^{m \times n}\):
- Compute cubical persistence of \(Y\) (filtered by predicted probability).
- Compute cubical persistence of \(Y^*\) (filtered by distance transform).
- Add a Betti matching loss:
where \(d_W\) is the Wasserstein distance between persistence diagrams. This penalises topological discrepancies: extra holes, missing connected components, etc.
Hu et al. (2019) demonstrated that this loss significantly reduces topological errors in neuron and vessel segmentation, even when the pixel-wise accuracy (cross-entropy loss) is similar.
Latent Space Topology
For autoencoders, the encoder \(e: X \to Z\) maps data to a latent space \(Z\). If we know the data lives near a manifold \(M\) with known topology (e.g., a circle for periodic data), we can regularise:
\[\mathcal{L}_{latent} = d_B(\mathrm{dgm}(\mathrm{Rips}(e(X))), \mathrm{dgm}_{\text{target}})\]This forces the latent space point cloud \(e(X)\) to have the same topological structure as the target manifold.
Decision Boundary Regularisation
For classifiers, the decision boundary \(\{x : f(x) = 0.5\}\) is a level set of the output function. Topological regularisation can:
- Encourage the boundary to be connected (one smooth surface, not many fragments).
- Penalise high-persistence \(H_0\) features in the boundary (no spurious isolated components).
Computational Cost
The main overhead is computing persistence diagrams during training. For:
- Cubical persistence (images): \(O(n \log n)\) per image, feasible in mini-batch training.
- Rips persistence (point clouds/graphs): \(O(n^3)\) worst case, often amortised using sparse filtrations.
References
- X. Hu, L. Li, D. Samaras, C. Chen, “Topology-Preserving Deep Image Segmentation,” NeurIPS 2019. arXiv:1906.05404.
- C. Hofer, F. Graf, B. Rieck, M. Niethammer, R. Kwitt, “Graph Filtration Learning,” ICML 2020.
- B. Rieck et al., “Neural Persistence: A Complexity Measure for Deep Neural Networks Using Algebraic Topology,” ICLR 2019.
