Witness Complexes: Sparse Topology at Scale
Published:
Motivation: Scalability of Vietoris-Rips
The Vietoris-Rips complex on \(n\) points has up to \(O(n^k)\) simplices of dimension \(k\). For \(n = 10^5\) and \(k = 2\), this is \(10^{10}\) triangles — computationally infeasible. We need a way to capture the same topology with fewer simplices.
| The key insight: most data points carry redundant topological information. If we pick a small representative set of landmarks \(L \subset P\) ($$ | L | \ll | P | \(), we can define a complex on\)L\(that captures the topology of\)P$$ by using the remaining points as witnesses. |
Lazy Witness Complex
Choose landmarks \(L = \{l_1, \ldots, l_m\} \subset P\). For a data point \(z \in P\), let \(m_k(z)\) be its \((k+1)\)-th nearest landmark distance (the distance to the \((k+1)\)-th closest landmark, counting from 0).
The lazy witness complex \(W(P, L, \nu)\) at parameter \(\nu \geq 0\) includes a simplex \(\sigma = [l_{i_0}, \ldots, l_{i_k}]\) (a set of landmarks) if there exists a witness \(z \in P\) such that:
As \(\nu\) increases from 0 to ∞, we get the witness filtration, whose persistent homology approximates the topology of \(P\).
Landmark Selection
The choice of landmarks affects the quality of the approximation. Common strategies:
- Random selection: simple, works for large \(n\).
- Maxmin sequential: iteratively select the point farthest from all currently selected landmarks — provides a \(\varepsilon\)-net guarantee.
- k-means centroids: landmarks represent cluster centres.
The number of landmarks \(m\) typically ranges from \(\sqrt{n}\) to \(n/10\). With \(m = 300\) landmarks, a complex with millions of points becomes tractable.
References
- V. de Silva & G. Carlsson, “Topological Estimation Using Witness Complexes,” SPBG 2004. link.
- G. Carlsson, “Topology and Data,” Bulletin of the AMS, 2009.
