TDA for Point Clouds: Shape Analysis at Every Scale
Published:
The Sampling Setup
Goal: Infer topological properties of an unknown shape \(X \subseteq \mathbb{R}^d\) from a finite sample \(P = \{p_1, \ldots, p_n\}\) drawn near \(X\).
Approach: Build a Rips filtration \(\mathrm{Rips}(P, r)_{r \geq 0}\) and compute its persistence diagram.
Key guarantee (Chazal et al.): If \(P\) is an \(\varepsilon\)-sample of \(X\) (every point of \(X\) is within \(\varepsilon\) of some \(p \in P\)), then:
So the diagram from the point cloud converges to the “true” diagram of the shape as sampling density increases.
Reading the Diagram
\(H_0\) diagram: Bars \((0, d_i)\) for each connected component. The single infinite bar corresponds to the global component. Finite bars record when components merge; short bars indicate noisy clustering, long bars indicate clearly separated clusters.
\(H_1\) diagram: Bars \((b_i, d_i)\) for loops/tunnels. A long bar \((b, d)\) with \(d \gg b\) indicates a robust hole at scale \([b, d]\). For a sample from a circle \(S^1\): one long \(H_1\) bar, plus many short bars from sampling noise.
\(H_2\) diagram: Bars for enclosed voids/cavities. For a sample from \(S^2\): one long \(H_2\) bar. For a torus \(T^2\): two \(H_1\) bars and one \(H_2\) bar.
Shape Comparison
Persistence diagrams are shape descriptors: two point clouds sampled from the same shape have similar diagrams (stability), and different shapes produce different diagrams.
Applications:
- Protein structure: compare molecular shapes by their \(H_0, H_1, H_2\) diagrams; similar shapes have similar diagrams regardless of orientation or small conformational changes.
- Medical imaging: compare bone shapes between healthy and diseased patients using \(H_2\) persistence of surface point clouds.
- 3D object retrieval: use persistence images of shape point clouds as fixed-size feature vectors for nearest-neighbour search.
Practical Workflow
- Subsample if \(n > 10000\) (e.g., farthest point sampling to \(1000\) points).
- Build Rips filtration up to scale \(r_{\max}\) (estimate from \(k\)-NN distances).
- Compute persistence with Ripser (seconds for 1000 points).
- Vectorise diagram (persistence image, PersLay) for downstream ML.
- Compare using bottleneck distance or kernel methods.
References
- F. Chazal, B. Michel, “An Introduction to Topological Data Analysis: Fundamental and Practical Aspects for Data Scientists,” arXiv:1710.04019, 2017.
- H. Edelsbrunner, J. Harer, Computational Topology, AMS, 2010.
- U. Bauer, “Ripser: Efficient Computation of Vietoris-Rips Persistence Barcodes,” J. Applied and Computational Topology, 2021.
