SE(3)-Transformers: Attention with 3D Symmetry
Published:
The Gap SE(3)-Transformers Fill
EGNN achieves E(n) equivariance with simple relative-position updates, but only captures first-order geometric information (vectors, l=1). For tasks requiring higher-order geometric features — polarisability tensors (l=2), octupoles (l=3) — or where orientation-specific attention is needed, more sophisticated geometric representations are required.
SE(3)-Transformers use spherical harmonics as a basis for geometric features, allowing the model to capture arbitrary-order rotational information while maintaining exact SE(3) equivariance.
Spherical Harmonics as Geometric Features
Spherical harmonics Y_l^m(r̂) are functions defined on the unit sphere, indexed by degree l ≥ 0 and order m ∈ [-l, l]:
- l=0 (1 function): constant — encodes scalar information
- l=1 (3 functions): like x, y, z components — encodes vector information
- l=2 (5 functions): like quadrupole components — encodes tensor information
- l=k (2k+1 functions): irreducible representations of SO(3) of degree k
A feature at degree l (an “l-feature”) is a (2l+1)-dimensional vector that transforms under rotation R via the Wigner-D matrix D^l(R):
This is the mathematical definition of equivariance at degree l.
The SE(3)-Transformer Layer
Invariant Attention Weights
For node pair (i, j), compute attention weights from invariant quantities:
Where q_i (query) and k_{ij} (key) are invariant scalars derived from node features and distances. Invariant attention weights ensure that rotating the molecule doesn’t change which nodes attend to each other.
Equivariant Values
| Value features for the pair (i,j) are constructed using spherical harmonics of the direction vector r̂_{ij} = (r_i - r_j)/ | r_i - r_j | : |
Where ⊗ denotes the tensor product of irreducible representations (via Clebsch-Gordan coefficients), combining node j’s l-features with the geometric information from the direction to j.
Equivariant Attention Output
The weighted sum of equivariant values is equivariant. Attention weights α_{ij} are invariant scalars — they scale but don’t rotate the equivariant values.
Tensor Products and Clebsch-Gordan Coefficients
| Combining two irreps of degrees l_1 and l_2 via tensor product produces irreps of degrees | l_1 - l_2 | , …, l_1 + l_2 (triangle rule). The Clebsch-Gordan coefficients C^{l_1 l_2 l}_{m_1 m_2 m} mediate this combination: |
This is the mathematically correct way to combine geometric features of different degrees — analogous to how matrix multiplication combines vectors in ordinary linear algebra.
Cost of Higher-Order Features
The tensor product computation scales as O(L^3) in the maximum degree L:
- L=1 (vectors only): fast, same as EGNN
- L=2 (quadrupole level): 8× more expensive than L=1
- L=3: 27× more expensive
For molecular tasks, L=2 or L=3 typically provides diminishing returns beyond L=1, while L=1 (EGNN-like) is often sufficient for energies and forces.
SE(3)-Transformers vs EGNN
| Property | EGNN | SE(3)-Transformer |
|---|---|---|
| Max feature degree | l=1 (vectors) | Arbitrary l |
| Attention mechanism | None (simple AGG) | Multi-head attention |
| Computational cost | O(E d) | O(E d L³) |
| Expressive power | Lower | Higher (higher-order tensors) |
| Implementation | Simple | Complex (CG coefficients) |
Applications
- Protein structure prediction and docking (require direction-sensitive features)
- Crystal property prediction (crystal symmetries require higher-order features)
- Force field learning for quantum chemistry
- Molecular conformation generation
Summary
SE(3)-Transformers provide the full power of equivariant geometric deep learning: attention mechanisms that respect 3D symmetry, and geometric features that encode information at arbitrary tensor degree. The price is computational complexity scaling as O(L³) in feature degree. For tasks where L=1 suffices, EGNN is preferred; for tasks requiring higher-order geometric information, SE(3)-Transformers or their successors (NequIP, MACE) are the right choice.
References
- Fuchs, F. B., Worrall, D. E., Fischer, V., & Welling, M. (2020). SE(3)-Transformers: 3D Roto-Translation Equivariant Attention Networks. NeurIPS 2020 (SE(3)-Transformers: invariant attention weights with equivariant geometric value aggregation via spherical harmonics).
- Thomas, N., Smidt, T., Kearnes, S., Yang, L., Li, L., Kohlhoff, K., & Riley, P. (2018). Tensor Field Networks: Rotation- and Translation-Equivariant Neural Networks for 3D Point Clouds. arXiv 2018 (TFN: the foundational SE(3)-equivariant MPNN using Clebsch-Gordan tensor products).
- Liao, Y.-L., & Smidt, T. (2022). Equiformer: Equivariant Graph Attention Transformer for 3D Atomistic Graphs. ICLR 2023 (Equiformer: successor integrating SE(3)-equivariant attention into a Transformer architecture).
