Topological Autoencoders: Preserving Shape in Latent Space
Published:
Standard Autoencoders and Topological Distortion
A standard autoencoder minimises \(\|x - \hat{x}\|^2\) (reconstruction loss). This encourages point-wise fidelity but makes no guarantees about the global structure of the latent space.
Problem: Methods like t-SNE and UMAP empirically preserve local structure (nearby points stay nearby) but can distort global topology — a single connected manifold can be fragmented into disconnected clusters, or a loop can be contracted to a point.
The TopoAE Objective
Topological Autoencoder (Moor et al. 2020) defines:
where:
- \(\mathrm{dgm}(X)\) = persistence diagram of the Rips filtration on input points \(X\).
- \(\mathrm{dgm}(Z)\) = persistence diagram of the Rips filtration on latent codes \(Z = e_\theta(X)\).
- \(d_T\) is a differentiable approximation to the bottleneck/Wasserstein distance.
Computing the Topological Loss
The Wasserstein distance between diagrams can be computed via linear assignment (the Hungarian algorithm). For batches, the key insight is:
- Each diagram point \((b_k, d_k)\) in \(\mathrm{dgm}(X)\) corresponds to a specific edge or simplex in the filtration.
- Each diagram point in \(\mathrm{dgm}(Z)\) similarly corresponds to latent-space distances.
- Matching diagram points gives a correspondence between input-space features and latent-space features.
- The loss measures how much the birth/death times shift.
Gradients flow back through: \(\mathrm{loss} \to d_k^{(Z)} \to d(z_i, z_j) \to z_i = e_\theta(x_i) \to \theta\)
What Topology Is Preserved
For \(H_0\) (connected components): the encoder is forced to map disconnected components in \(X\) to disconnected regions in \(Z\), and connected data to connected latent space.
For \(H_1\) (loops): if the input data lies near a circle, the latent space is forced to also have a circular topology, not a line.
Comparison with Other Methods
| Method | Preserves Local Structure | Preserves Global Topology |
|---|---|---|
| PCA | Partially | No |
| t-SNE | Yes | No |
| UMAP | Yes | Partially |
| VAE | No guarantee | No |
| TopoAE | Yes (via reconstruction) | Yes (via topology loss) |
References
- M. Moor, M. Horn, B. Rieck, K. Borgwardt, “Topological Autoencoders,” ICML 2020. arXiv:1906.00722.
- L. McInnes, J. Healy, J. Melville, “UMAP: Uniform Manifold Approximation and Projection,” arXiv:1802.03426.
