Open Problems and Future Directions in Sheaf Neural Networks

7 minute read

Published:

TL;DR: The field of sheaf neural networks is young (2020–2024) and several foundational questions remain open: (1) scalability to graphs with millions of nodes, (2) stochastic and probabilistic sheaves for uncertainty quantification, (3) sheaf autoencoders for generative models, (4) formal connections to persistent homology and TDA, (5) unified Sheaf Transformer architectures, and (6) sheaf-based explanation methods. This post maps the open landscape.

Open Problem 1: Scalability

The challenge: NSD and PNSD predict restriction maps via per-edge MLPs — one forward pass per edge. For a graph with E = 10⁷ edges, this requires 10⁷ MLP evaluations per layer, which is too slow for large-scale deployment.

Current approaches:

  • Mini-batch training with neighbourhood sampling (GraphSAGE-style): sample K neighbours per node, predict maps only for sampled edges
  • Shared maps: instead of per-edge maps, predict a per-node transformation that approximates per-edge maps
  • Cluster-GCN style: cluster the graph, predict maps within clusters only

Key research question: Can we design a sheaf GNN that:

  1. Has O(N) map prediction cost (not O(E))
  2. Maintains the theoretical guarantees of NSD (heterophily handling, oversmoothing avoidance)
  3. Achieves competitive accuracy on large heterophilic datasets (Amazon, ogbn-arxiv)?

Promising direction: Predict restriction maps from node features alone (not node pairs): F_{v▷e} = f(h_v) — this gives O(N) map prediction, at the cost of losing the inter-node relational information. Combining this with shared global maps per edge type could balance cost and expressiveness.

Open Problem 2: Stochastic Sheaves

The challenge: Current sheaf GNNs produce deterministic restriction maps. For uncertainty quantification, we want probabilistic restriction maps: F_{v▷e} ~ p(Fh_u, h_v, G), giving a distribution over sheaves rather than a single sheaf.

Why this matters:

  • In drug discovery, uncertain molecular property predictions are as important as accurate ones
  • On heterophilic graphs with noisy labels, some edges may have uncertain relational structure
  • For out-of-distribution detection, sheaf uncertainty can signal when the model’s relational assumptions break down

Formulation:

F_{v▷e} ~ N(μ_{v▷e}, Σ_{v▷e}) (Gaussian restriction maps)

The resulting Sheaf Laplacian Δ_F is random — its distribution determines the uncertainty in the diffusion.

Research questions: What is the distribution of the eigenvalues of Δ_F when the maps are random? How does map uncertainty propagate to prediction uncertainty? Can we train a stochastic sheaf GNN via variational inference?

Open Problem 3: Sheaf Autoencoders

The challenge: Sheaf GNNs are discriminative models — they map input graphs to node representations for classification. Can they be used as generative models — generating graphs with specified sheaf structure?

Sheaf autoencoder: Encode a graph G with sheaf F into a latent code z; decode z to reconstruct (G, F). The decoder must generate both:

  1. The graph topology (nodes, edges) — standard graph generation challenge
  2. The sheaf structure (restriction maps) — new challenge

Why sheaf autoencoders? For molecular generation (generating molecules with desired properties), the restriction maps encode bond chemistry. A sheaf autoencoder could generate molecules by specifying the desired restriction map structure — the chemical relationships between atoms — and decoding to a compatible graph.

Research questions: What is the right latent space for sheaf structure? How do we train a sheaf decoder (generating restriction maps from latent codes)? How do we evaluate the quality of generated sheaves?

Open Problem 4: Persistent Sheaf Homology

The challenge: Persistent homology (Edelsbrunner & Harer, 2010) tracks how topological features (connected components, loops) appear and disappear as a filtration parameter increases. Can this be extended to sheaves?

Persistent sheaf homology: Define a filtration on a sheaf: for each threshold ε, consider the sub-sheaf F_ε consisting only of edges with (F_{u▷e} − F_{v▷e}) _F < ε. Track how the global section space H⁰(G, F_ε) changes with ε.

Potential applications:

  • Multi-scale sheaf analysis: different ε values reveal relational structure at different scales
  • Sheaf barcodes: persistence diagrams for sheaf features — a topological signature of the graph’s relational geometry
  • Comparison of sheaves: the bottleneck distance between sheaf barcodes as a sheaf similarity metric

Research questions: Is persistent sheaf homology stable (small changes in F give small changes in barcodes)? Can it be computed efficiently? Does it provide useful features for graph classification tasks?

Open Problem 5: Unified Sheaf Transformer Theory

The challenge: The theoretical framework for sheaf GNNs (heterophily, oversmoothing, expressiveness) is well-developed for diffusion-based models. Can it be extended to sheaf Transformers?

Specific questions:

  • Does a Sheaf Transformer with global attention still avoid oversmoothing? (Global attention may or may not preserve the null space structure)
  • Can we prove heterophily handling guarantees for sheaf attention mechanisms?
  • What is the expressiveness of a Sheaf Transformer relative to 1-WL and higher-order WL tests?
  • Does gauge equivariance (SheafAN-style) survive the addition of global attention?

Open Problem 6: Sheaf Interpretability

The challenge: The restriction maps F_{v▷e} encode the relational geometry of each edge. Can we interpret what the learned maps mean in terms of the underlying domain?

For citation networks: Do the learned maps cluster edges by paper discipline? Do they encode citation direction (citing vs cited) or citation strength?

For molecular graphs: Do the maps cluster by bond type (single/double/triple)? Do they encode electronegativity differences between bonded atoms?

Approach: Visualise the maps as elements of O(d) (for orthogonal maps): each map is a rotation in ℝ^d — its rotation angle and axis can be plotted and compared across edges. If edges of the same type cluster in this rotation space, the maps are interpretable.

Research question: Is there a principled sheaf interpretability method analogous to GNN attribution methods (GNN-Explainer, GradCAM for graphs)?

Open Problem 7: Sheaf Normalisation Flows

The challenge: Normalising flows learn a diffeomorphism from a simple distribution to a complex one. On graphs, this requires handling the non-Euclidean structure. Sheaf-based normalising flows could model distributions over graph-valued data.

Sheaf flow: Learn a time-evolving sheaf F_t such that at t=0 the node stalks follow a simple distribution (e.g., Gaussian), and at t=1 they follow the data distribution. The flow is driven by the sheaf ODE:

dH(t)/dt = −Δ_{F_t} H(t) + b_t

where b_t is a bias term (the “source” of the flow). This is a sheaf-based continuous normalising flow.

Open Problem 8: Multi-Scale Sheaf Pooling

The challenge: Graph pooling for graph classification requires aggregating node representations into a single graph representation. Standard pooling (mean, sum, max) loses structural information. Sheaf pooling could use the global section space H⁰(G, F) as the graph representation — a topologically meaningful aggregate.

Sheaf pooling: The global sections form a d·dim(H⁰)-dimensional representation of the graph. This is computed as the null space of Δ_F — a principled global summary.

Hierarchical sheaf pooling: Coarsen the graph by contracting nodes into super-nodes; define restriction maps between the fine and coarse sheaves. This gives a multi-scale sheaf — a new architecture for graph classification with topological guarantees.

A Research Roadmap

Near-term (achievable with current tools):

  • Scalable sheaf GNNs via node-level map prediction
  • Sheaf interpretability via rotation space visualisation
  • Sheaf GNNs on large-scale benchmarks (ogbn-arxiv, ogbn-products)

Medium-term (requires new theory):

  • Persistent sheaf homology algorithms
  • Sheaf Transformer expressiveness theory
  • Stochastic sheaf GNNs via variational inference

Long-term (requires new frameworks):

  • Sheaf generative models
  • Persistent sheaf learning for graph classification
  • Sheaf-based multi-scale learning

References

  • Bodnar, C., Giovanni, F. D., Chamberlain, B. P., Liò, P., & Bronstein, M. M. (2022). Neural Sheaf Diffusion. NeurIPS 2022 (the core architecture that all open problems build on).
  • Edelsbrunner, H., & Harer, J. (2010). Computational Topology: An Introduction. AMS 2010 (persistent homology — the TDA framework extended to sheaves in open problem 4).
  • Curry, J. (2014). Sheaves, Cosheaves and Applications. PhD Thesis, Penn 2014 (sheaf homology and cosheaf theory — mathematical foundation for open problems 3 and 4).