TDA in Drug Discovery: Molecular Topology

3 minute read

Published:

TL;DR: Molecular structures — atom position clouds with element types — are natural inputs for TDA. By building element-specific Rips filtrations (e.g., on carbon atoms, on nitrogen atoms, on C–N atom pairs), one computes persistence diagrams that encode ring systems, binding pockets, and molecular cavities. These topological molecular descriptors predict ADMET properties and binding affinities better than many classical fingerprints, especially when combined with GNNs.

Molecular Topology

A molecule can be represented as a 3D point cloud: \(P = \{(x_i, \text{element}_i)\}\) where \(x_i \in \mathbb{R}^3\) is the 3D position of atom \(i\) and \(\text{element}_i \in \{\text{C}, \text{N}, \text{O}, \text{S}, \ldots\}\) is the element type.

Traditional fingerprints (Morgan/ECFP, MACCS keys): encode local graph structure (subgraph patterns up to radius \(r\)). These miss:

  • 3D spatial arrangement.
  • Multi-scale geometric features.
  • Cavities and voids.

Topological fingerprints from persistent homology capture all of the above.

Element-Specific Filtrations

Cang & Wei (2017) introduced element-specific TDA: instead of one Rips filtration on all atoms, compute separate filtrations for each element type and pair:

  • \(\mathrm{Rips}(P_C)\) — carbon-only complex; \(H_1\) captures aromatic rings and ring systems.
  • \(\mathrm{Rips}(P_N)\) — nitrogen atoms; encodes nitrogen-containing rings (pyridine, imidazole).
  • \(\mathrm{Rips}(P_{C,O})\) — carbon-oxygen pairs; captures carbonyl and ether geometry.

Each element-specific diagram is vectorised (persistence images) and concatenated into a multi-channel topological fingerprint.

Performance: On benchmark ADMET datasets (solubility, toxicity, metabolic stability), element-specific TDA features achieve state-of-the-art performance among non-deep-learning methods and are competitive with GNNs.

Protein Binding Site Detection

H₂ persistence of the protein surface point cloud captures cavities (enclosed voids) that correspond to binding pockets:

  • A large, long-lived \(H_2\) bar = a deep, geometrically robust cavity.
  • Birth scale \(b\) ≈ pocket entrance width; death scale \(d\) ≈ pocket depth.

This gives a scale-parameterised pocket detection without requiring a threshold on solvent-accessible surface area.

Binding pocket score: $$\mathrm{PS} = \max_{(b,d) \in H_2(\text{surface})} (d - b) \cdot d$$

Protein-Ligand Interaction

For a protein-ligand complex:

  1. Build element-specific Rips filtrations on the complex and on the apo protein.
  2. Compute the difference in persistence diagrams (before and after ligand binding).
  3. The “topological fingerprint of binding” = the difference diagram.

This captures how the ligand changes the topological environment of the binding pocket — complementary to docking score energy terms.

Key Insight: The power of TDA in drug discovery comes from capturing ring systems — both aromatic rings (critical for drug-likeness) and macrocyclic loops (increasingly important drug scaffolds). Classical 2D fingerprints count ring patterns by subgraph matching; TDA captures them geometrically via H₁ persistence. The geometric information (ring size, 3D arrangement, conformation) is automatically encoded in the birth-death times, giving TDA a natural advantage for 3D-QSAR applications.

References

  • Z. Cang, G.-W. Wei, “TopologyNet: Topology Based Deep Convolutional and Multi-Task Neural Networks for Biomolecular Property Predictions,” PLOS Computational Biology, 2017.
  • K. Xia, G.-W. Wei, “Persistent Homology Analysis of Protein Structure, Flexibility, and Folding,” International Journal for Numerical Methods in Biomedical Engineering, 2014.
  • C. Nguyen, Z. Cang, K. Wu, M. Chen, Y. Nie, G.-W. Wei, “Mathematical Deep Learning for D and F Block Organometallic and Inorganic Chemistry,” J. Chem. Inf. Model., 2018.