Sheaves and Heterophily: A Complete Theoretical Account
Published:
The Homophily Assumption in GCN
| Homophily: the tendency of nodes to connect to nodes of the same class. The homophily ratio h(G) = | {(u,v)∈E : y_u=y_v} | / | E | quantifies this. |
GCN’s loss: minimising node classification cross-entropy implicitly minimises a combination of:
- Task loss: predicting the right labels
Smoothness penalty: the Dirichlet energy E(H) = Σ_{(u,v)} h_u − h_v ²
The smoothness penalty penalises any feature difference across edges — whether the edge connects same-class or different-class nodes. On homophilic graphs (most edges connect same-class nodes), this is a useful regulariser. On heterophilic graphs, it is actively harmful: the model is penalised for correctly representing that different-class neighbours should have different features.
Formal statement: For a GCN with L layers and normalised aggregation, the learned representations H^{(L)} minimise:
where λ is determined by the learning rate and layer depth. The Dirichlet energy term forces all-neighbour feature similarity.
The Heterophily Challenge: What the Energy Imposes
On a bipartite heterophilic graph (alternating class labels, every edge crosses class boundaries):
The only zero-energy signal is the constant — but different classes should have different representations. So GCN is forced to trade off task accuracy against smoothness, and cannot achieve both.
What a good model should do on a heterophilic graph: Different-class pairs should have maximally different features; same-class pairs should have similar features. This is the opposite of what the standard Dirichlet energy encodes.
How Sheaf Maps Encode Heterophily
For a heterophilic edge (u, v) with different class labels y_u ≠ y_v, the sheaf restriction maps can encode the “correct” relational structure:
Example: scalar sheaf with negative map
Choose F_{u▷e} = +1 and F_{v▷e} = −1 (or vice versa). Then the consistency condition is:
Adjacent nodes of opposite class should have opposite feature values. The global section is the antipodal assignment — not a constant. The Sheaf Dirichlet energy is zero when x_u = −x_v, even though x_u ≠ x_v.
Example: orthogonal map with 180° rotation
Choose F_{u▷e} = I and F_{v▷e} = −I (negation). Global sections satisfy x_u = −x_v — same as above, but in d dimensions.
More generally: choose F_{v▷e} = R for any rotation R. Global sections then satisfy x_u = Rᵀ x_v — adjacent nodes’ features are related by a rotation, not equal. This represents heterophilic graphs where the relational structure is a structured rotation between class representations.
The General Heterophily Theorem
Theorem (Bodnar et al., 2022, Theorem 1, informal): For any desired node assignment x* where x_u ≠ x_v for heterophilic edges (u,v), there exist restriction maps F such that x* is a global section of F (i.e., x* ∈ ker(Δ_F)).
Proof: For each heterophilic edge e = (u,v), choose F_{u▷e} and F_{v▷e} such that F_{u▷e}x_u = F_{v▷e}xv. This is always satisfiable (e.g., by setting F{u▷e} = I and F_{v▷e} = (x_v)⁻¹ x_u when these are invertible). □
Interpretation: Any desired feature assignment — including one that maximally separates different-class nodes — can be made a global section of some sheaf. NSD learns these maps from data, finding the restriction maps that make the task-optimal features lie in ker(Δ_F).
Why Signed Attention Is Not Enough
FAGCN (Bo et al., 2021) uses signed attention a_{uv} ∈ [−1, +1], which allows negative edge weights. This is the scalar sheaf case: F_{u▷e} = 1, F_{v▷e} = a_{uv}.
The scalar sheaf can represent edge-wise sign information (same or opposite direction), but cannot represent rotational relationships between features. For d-dimensional features, all d channels must agree or disagree uniformly — there is no per-channel or rotational heterophily representation.
Diagonal sheaf maps (as in NSD-diag) allow d independent signs per edge — different feature channels can have different relationship types simultaneously.
General/orthogonal sheaf maps allow arbitrary rotational relationships — representing the full d-dimensional relational geometry.
Necessary and Sufficient Conditions for Sheaf Heterophily
Necessary condition: For sheaf diffusion to correctly classify heterophilic graphs, the global sections of the learned sheaf must include the task-optimal node features (or a transformation of them that is linearly separable).
Sufficient condition: If the MLP sheaf predictor can represent the restriction maps that make the task-optimal features consistent (the theorem above guarantees such maps exist), and if the learning algorithm finds them, then sheaf diffusion converges to representations that correctly classify heterophilic nodes.
In practice: NSD with diagonal maps has been empirically shown to find approximately correct maps on most heterophilic benchmarks — the maps encode the class-conditional relational geometry automatically.
Comparison with Other Heterophily Methods
| Method | Key mechanism | Limitation |
|---|---|---|
| H2GCN (Zhu et al., 2020) | Ego + neighbourhood features | Feature engineering, no principled framework |
| GPRGNN (Chien et al., 2021) | Learnable polynomial in L | Can use high-frequency components but not richer relational structure |
| FAGCN (Bo et al., 2021) | Signed edge attention (scalar sheaf) | Scalar only — all channels treated uniformly |
| NSD | Learned d×d restriction maps | Full relational geometry, requires choosing d |
| PNSD | Learned maps + polynomial filter | Maps + spectral flexibility |
The sheaf framework provides the most principled treatment of heterophily — it explains why heterophily is hard for GCN (wrong null space) and how to fix it (change the null space via restriction maps).
Empirical Validation
On Cornell (h = 0.11, highly heterophilic): 85% accuracy for NSD vs 57% for GCN. The gap is entirely explained by the heterophily account above — GCN’s Dirichlet energy penalises the correct class-conditional feature assignment, while NSD’s Sheaf Dirichlet energy (with learned maps) rewards it.
References
- Bodnar, C., Giovanni, F. D., Chamberlain, B. P., Liò, P., & Bronstein, M. M. (2022). Neural Sheaf Diffusion. NeurIPS 2022 (the main theoretical source for the heterophily account of sheaf GNNs).
- Zhu, M., Wang, X., Shi, C., Ji, H., & Cui, P. (2020). Beyond Homophily in Graph Neural Networks: Current Limitations and Effective Designs. NeurIPS 2020 (H2GCN: the benchmark paper defining heterophily and establishing the empirical baseline that NSD surpasses).
- Bo, D., Wang, X., Shi, C., & Shen, H. (2021). Beyond Low-Frequency Information in Graph Convolutional Networks. AAAI 2021 (FAGCN: signed attention — the scalar sheaf special case of NSD’s full matrix maps).
