TDA in Drug Discovery: Molecular Topology
Molecular Topology
A molecule can be represented as a 3D point cloud: \(P = \{(x_i, \text{element}_i)\}\) where \(x_i \in \mathbb{R}^3\) is the 3D position of atom \(i\) and \(\text{element}_i \in \{\text{C}, \text{N}, \text{O}, \text{S}, \ldots\}\) is the element type.
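As a minimal sketch of this representation, the point cloud can be held as a list of (element, position) records. The `Atom` type, the `select` and `distance_matrix` helpers, and the roughly formaldehyde-like coordinates are illustrative assumptions, not part of any particular cheminformatics toolkit.

```python
from dataclasses import dataclass
from math import dist


@dataclass(frozen=True)
class Atom:
    element: str                           # e.g. "C", "N", "O", "S"
    position: tuple[float, float, float]   # Cartesian coordinates in angstroms


# Hypothetical coordinates, roughly formaldehyde-like (illustrative only)
molecule = [
    Atom("C", (0.00, 0.00, 0.00)),
    Atom("O", (1.21, 0.00, 0.00)),
    Atom("H", (-0.55, 0.94, 0.00)),
    Atom("H", (-0.55, -0.94, 0.00)),
]


def select(cloud, elements):
    """Return the sub-cloud containing only the given element types."""
    return [a for a in cloud if a.element in elements]


def distance_matrix(cloud):
    """Pairwise Euclidean distances, the input a Rips filtration is built from."""
    return [[dist(a.position, b.position) for b in cloud] for a in cloud]


heavy = select(molecule, {"C", "O"})   # heavy atoms only
d = distance_matrix(heavy)
```

Keeping the element label next to each position is what makes it possible to restrict the cloud to chosen element types before any topology is computed.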
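Traditional fingerprints encode local structure of the 2D molecular graph: Morgan/ECFP hashes subgraph patterns up to radius \(r\), and MACCS keys flag predefined substructures. Because they are derived from the graph alone, they miss: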
- 3D spatial arrangement.
- Multi-scale geometric features.
- Cavities and voids.
Topological fingerprints built from persistent homology are designed to capture all three.
Element-Specific Filtrations
Cang & Wei (2017) introduced element-specific TDA: instead of building one Rips filtration on all atoms, one builds a separate filtration for each element type and for each pair of element types:
- \(\mathrm{Rips}(P_C)\): carbon atoms only; \(H_1\) captures aromatic rings and ring systems.
- \(\mathrm{Rips}(P_N)\): nitrogen atoms only; encodes the spatial arrangement of nitrogens, such as those contributed by nitrogen-containing rings (pyridine, imidazole).
- \(\mathrm{Rips}(P_{C,O})\): carbon and oxygen atoms together; captures carbonyl and ether geometry.
Each element-specific persistence diagram is vectorised (for example as a persistence image), and the resulting vectors are concatenated into a multi-channel topological fingerprint.
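The split, vectorise and concatenate pipeline can be sketched in plain Python under two simplifying assumptions, both of which depart from the method described above. First, only \(H_0\) is computed, because in a Rips filtration its finite death scales are exactly the minimum-spanning-tree edge lengths and need no homology library; \(H_1\) and \(H_2\) would normally come from a dedicated package such as Ripser or GUDHI. Second, a fixed-length histogram stands in for a persistence image. The function names and toy coordinates are hypothetical.

```python
from itertools import combinations
from math import dist


def h0_deaths(points):
    """Death scales of H_0 bars in a Rips filtration (every bar is born at 0).

    Edge length is used as the filtration value. Components merge in order of
    increasing edge length, so the finite deaths are the MST edge lengths.
    """
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    edges = sorted(
        (dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    deaths = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:                # edge merges two components: one bar dies
            parent[ri] = rj
            deaths.append(length)
    return deaths


def vectorise(deaths, max_scale=6.0, n_bins=6):
    """Fixed-length histogram of death scales (stand-in for a persistence image)."""
    vec = [0] * n_bins
    for d in deaths:
        if d < max_scale:
            vec[int(d / max_scale * n_bins)] += 1
    return vec


def fingerprint(atoms, channels):
    """Concatenate one vector per element channel."""
    out = []
    for elements in channels:
        sub = [xyz for el, xyz in atoms if el in elements]
        out.extend(vectorise(h0_deaths(sub)))
    return out


# Hypothetical toy molecule: (element, (x, y, z)) in angstroms
atoms = [
    ("C", (0.0, 0.0, 0.0)), ("C", (1.5, 0.0, 0.0)), ("C", (3.0, 0.0, 0.0)),
    ("O", (0.0, 1.2, 0.0)), ("N", (3.0, 1.4, 0.0)),
]
fp = fingerprint(atoms, channels=[{"C"}, {"N"}, {"C", "O"}])
```

Here `fp` has one 6-bin block per channel. Every channel shares the same bins, so the blocks stay comparable across molecules with different atom counts.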
Performance: On benchmark ADMET datasets (solubility, toxicity, metabolic stability), element-specific TDA features have been reported to rank among the strongest non-deep-learning methods and to be competitive with GNNs, although the exact ranking depends on the dataset and evaluation protocol.
Protein Binding Site Detection
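\(H_2\) persistence of the protein surface point cloud captures cavities (enclosed voids) that correspond to binding pockets. A pocket that is open to solvent still registers, because it becomes enclosed once the filtration scale is large enough to close its entrance:
- A long-lived \(H_2\) bar (large \(d - b\)) indicates a deep, geometrically robust cavity.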
- Birth scale \(b\) ≈ pocket entrance width; death scale \(d\) ≈ pocket depth.
This gives scale-parameterised pocket detection that does not require a threshold on solvent-accessible surface area.
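Once \(H_2\) bars are available, reading off candidate pockets is a matter of sorting by persistence \(d - b\). The sketch below assumes the bars have already been computed elsewhere. The bar values, the `rank_cavities` helper and the optional `min_persistence` cutoff are hypothetical and only illustrate the interpretation.

```python
def rank_cavities(h2_bars, min_persistence=1.0):
    """Sort H_2 bars (birth, death) by persistence, dropping short-lived ones."""
    scored = [
        {"birth": b, "death": d, "persistence": d - b}
        for b, d in h2_bars
        if d - b >= min_persistence
    ]
    return sorted(scored, key=lambda bar: bar["persistence"], reverse=True)


# Hypothetical H_2 bars in angstroms, as a persistent homology library might return
h2_bars = [(3.0, 3.4), (4.0, 9.5), (2.5, 5.0)]
pockets = rank_cavities(h2_bars)

for bar in pockets:
    # birth ~ entrance width, death ~ pocket depth, per the reading above
    print(f"entrance ~{bar['birth']:.1f}, depth ~{bar['death']:.1f}, "
          f"persistence {bar['persistence']:.1f}")
```

The short bar `(3.0, 3.4)` is treated as noise and dropped. The remaining bars come back with the most persistent cavity first.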
Protein-Ligand Interaction
For a protein-ligand complex:
- Build element-specific Rips filtrations on the complex and on the apo (ligand-free) protein.
- Compare the two sets of persistence diagrams, for example by subtracting their vectorised forms or by computing a bottleneck or Wasserstein distance between them.
- The “topological fingerprint of binding” is this difference.
This captures how the ligand changes the topological environment of the binding pocket, which is complementary to the energy terms of a docking score.
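One way to make the difference concrete is to vectorise both diagrams on the same grid and subtract. The following is a minimal sketch that assumes a small, unnormalised persistence image (a Gaussian bump per bar, weighted by persistence). The grid bounds, resolution, `sigma` and the bar values are all hypothetical choices, and a real pipeline would repeat this per element channel.

```python
from math import exp


def persistence_image(bars, lo=0.0, hi=8.0, res=4, sigma=1.0):
    """Tiny persistence image on a res x res grid over (birth, persistence).

    Each bar adds a Gaussian bump centred at (birth, death - birth), weighted
    by its persistence. Returned as a flat list of length res * res.
    """
    step = (hi - lo) / res
    centres = [lo + (k + 0.5) * step for k in range(res)]
    image = []
    for p in centres:               # persistence axis
        for b in centres:           # birth axis
            value = 0.0
            for birth, death in bars:
                pers = death - birth
                sq = (b - birth) ** 2 + (p - pers) ** 2
                value += pers * exp(-sq / (2 * sigma ** 2))
            image.append(value)
    return image


def binding_fingerprint(bars_complex, bars_apo, **grid):
    """Element-wise difference of the two images, computed on the same grid."""
    img_c = persistence_image(bars_complex, **grid)
    img_a = persistence_image(bars_apo, **grid)
    return [c - a for c, a in zip(img_c, img_a)]


# Hypothetical H_1 bars for one element channel
apo = [(1.0, 3.0), (2.0, 6.0)]
holo = [(1.0, 3.0), (2.0, 4.5)]     # ligand shortens the second bar
delta = binding_fingerprint(holo, apo)
```

Using an identical grid for both structures matters: the subtraction is only meaningful if each entry of the two vectors refers to the same region of the diagram.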
References
- Z. Cang, G.-W. Wei, “TopologyNet: Topology Based Deep Convolutional and Multi-Task Neural Networks for Biomolecular Property Predictions,” PLOS Computational Biology, 2017.
- K. Xia, G.-W. Wei, “Persistent Homology Analysis of Protein Structure, Flexibility, and Folding,” International Journal for Numerical Methods in Biomedical Engineering, 2014.
- C. Nguyen, Z. Cang, K. Wu, M. Chen, Y. Nie, G.-W. Wei, “Mathematical Deep Learning for D and F Block Organometallic and Inorganic Chemistry,” J. Chem. Inf. Model., 2018.
