HetSheaf: Heterogeneous Graphs Meet Cellular Sheaves


Published: May 26, 2026

TL;DR: Standard heterogeneous GNNs add type-specific layers. HetSheaf instead encodes node and edge types directly in the sheaf, so one unified model can handle heterogeneous graphs with far fewer parameters.

Paper: "Heterogeneous Sheaf Neural Networks" · arXiv:2409.08036
Authors: L. Braithwaite, A. Borgi, G. Onorato, K. Tarantelli, F. Restuccia, F. Silvestri, P. Liò
Venue: arXiv preprint, 2024

Paper preview: first page of Heterogeneous Sheaf Neural Networks (Braithwaite et al., 2024).

The Problem: Heterogeneity is Expensive

Real-world graphs are rarely uniform. In a knowledge graph, nodes can be people, organisations, or concepts; edges can be authored, affiliated with, or cited. This is heterogeneity: multiple node types and edge types, each with its own feature space.

Existing heterogeneous GNNs — R-GCN, HAN, HGT — handle this by adding type-specific modules: one transformation matrix per relation type, one attention head per meta-path, or separate encoders per node type. The result is parameter bloat and architectural complexity that grows with the number of types.

HetSheaf asks: can we encode heterogeneity in the structure rather than the architecture?

The Intuition in One Sentence

HetSheaf treats node type and edge type not as a reason to build a different neural layer for every relation, but as a reason to build a richer local geometry: typed stalk semantics, type-conditioned restriction maps, same global propagation rule.

The Core Idea: Type-Aware Sheaves

A cellular sheaf assigns a vector space (a stalk) to each node and each edge, plus a restriction map for each endpoint of each edge that answers the question “how does the signal on this node relate to the signal on this edge?” In standard Sheaf Neural Networks, all stalks have the same dimension and the restriction maps are learned without reference to node or edge types.

HetSheaf makes two changes:

Type-aware stalks: Each node and edge keeps the same stalk dimension, but the stalk content is made type-aware through the learned sheaf construction. In other words, HetSheaf does not yet assign different stalk sizes to different types; instead, it uses a shared-dimensional local space whose semantics depend on node and edge type. Allowing genuinely different stalk dimensions across types is a natural future direction, but it is not part of the current method.
Conditioned restriction maps: The restriction map for each edge endpoint is conditioned on the node features, node type, and edge type. This lets the model learn type-specific relational structure automatically, without separate architectural components per relation.
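A minimal sketch of what a type-conditioned restriction map could look like, assuming a single shared two-layer MLP that takes concatenated node features, a node-type embedding, and an edge-type embedding. The sizes, the `tanh` nonlinearity, the embedding tables, and the function name `restriction_map` are all illustrative assumptions, not the paper's implementation, and the weights are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumed, not taken from the paper).
D_FEAT, D_TYPE, D_STALK, D_HIDDEN = 4, 3, 2, 16

# Hypothetical type embeddings; random here because nothing is trained.
node_type_emb = {t: rng.normal(size=D_TYPE) for t in ("person", "org", "concept")}
edge_type_emb = {t: rng.normal(size=D_TYPE) for t in ("person->org", "org->concept")}

# One shared MLP serves every node type and every edge type.
W1 = rng.normal(size=(D_HIDDEN, D_FEAT + 2 * D_TYPE)) * 0.1
b1 = np.zeros(D_HIDDEN)
W2 = rng.normal(size=(D_STALK * D_STALK, D_HIDDEN)) * 0.1
b2 = np.zeros(D_STALK * D_STALK)

def restriction_map(h_v, node_type, edge_type):
    """Return a D_STALK x D_STALK map for one (node, edge) incidence."""
    z = np.concatenate([h_v, node_type_emb[node_type], edge_type_emb[edge_type]])
    hidden = np.tanh(W1 @ z + b1)
    return (W2 @ hidden + b2).reshape(D_STALK, D_STALK)

# The same node (an org) gets a different map on each edge type it touches.
h_B = rng.normal(size=D_FEAT)
F_B_eAB = restriction_map(h_B, "org", "person->org")
F_B_eBC = restriction_map(h_B, "org", "org->concept")
print(F_B_eAB.shape, np.allclose(F_B_eAB, F_B_eBC))
```

The point of the sketch is that only the embedding tables grow with the number of types; the MLP weights are shared across all relations.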

Figure 1: The HetSheaf overview contrasts the standard approach of baking heterogeneity into the architecture with the sheaf-based alternative, in which node and edge types are absorbed directly into local stalk spaces and restriction maps, so the propagation rule itself stays unified and geometry-aware.

HetSheaf conditions each restriction map on node and edge type. Three node types (person, org, concept) with different local geometries share one propagation rule: heterogeneity lives in the maps, not the architecture.

Mini-Example: 3-Node Heterogeneous Graph

Consider a small graph with nodes A (person), B (org), C (concept), two edges e_AB and e_BC.

Standard SNN (type-blind): both edges use the same learned matrix W ∈ ℝ^{2×2}.

 F_{A→e_AB} = W, F_{B→e_AB} = W
 F_{B→e_BC} = W, F_{C→e_BC} = W

HetSheaf (type-conditioned): each edge endpoint gets its own restriction map, conditioned on the node’s features h_v, the node-type embedding τ, and the edge-type embedding (τ_{P→O} for person-to-org, τ_{O→C} for org-to-concept).

 F_{A→e_AB} = MLP(h_A, τ_person, τ_{P→O}) ∈ ℝ^{2×2}
 F_{B→e_AB} = MLP(h_B, τ_org, τ_{P→O}) ∈ ℝ^{2×2}
 F_{B→e_BC} = MLP(h_B, τ_org, τ_{O→C}) ∈ ℝ^{2×2}
 F_{C→e_BC} = MLP(h_C, τ_concept, τ_{O→C}) ∈ ℝ^{2×2}

The disagreement at e_AB that the Sheaf Laplacian penalises is then:

 (δ₀ x)_{e_AB} = F_{A→e_AB} x_A − F_{B→e_AB} x_B

With type-conditioned maps, this disagreement is measured in the coordinate frame appropriate to the A–B relation, not in a generic frame shared by all edge types. A person-to-org relationship and an org-to-concept relationship are penalised on their own terms.
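The disagreement term can be checked numerically on the three-node example. The sketch below assumes random 2×2 placeholders for the restriction maps (stand-ins for learned ones), stacks them into a coboundary matrix `delta`, and forms the sheaf Laplacian as `delta.T @ delta`. The variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 2  # stalk dimension, as in the example

nodes = ["A", "B", "C"]
edges = [("A", "B"), ("B", "C")]  # e_AB, e_BC

# Placeholder restriction maps F[(node, edge_index)]; random stand-ins.
F = {(v, k): rng.normal(size=(d, d)) for k, e in enumerate(edges) for v in e}

# Coboundary operator: one block row per edge, one block column per node.
delta = np.zeros((len(edges) * d, len(nodes) * d))
for k, (u, v) in enumerate(edges):
    rows = slice(k * d, (k + 1) * d)
    iu, iv = nodes.index(u), nodes.index(v)
    delta[rows, iu * d:(iu + 1) * d] = F[(u, k)]
    delta[rows, iv * d:(iv + 1) * d] = -F[(v, k)]

L = delta.T @ delta  # sheaf Laplacian

x = rng.normal(size=len(nodes) * d)  # stacked node signals x_A, x_B, x_C
x_A, x_B = x[0:d], x[d:2 * d]

# Disagreement on e_AB, written exactly as in the formula above.
disagreement_AB = F[("A", 0)] @ x_A - F[("B", 0)] @ x_B

# The Laplacian quadratic form sums squared disagreements over all edges.
energy = x @ L @ x
print(np.allclose((delta @ x)[:d], disagreement_AB))
print(np.isclose(energy, np.sum((delta @ x) ** 2)))
```

Swapping the random placeholders for type-conditioned maps changes only the entries of `F`; the assembly of `delta` and `L` stays the same.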

Sheaf Predictors

The restriction maps can be instantiated in different ways, giving a family of Heterogeneous Sheaf Predictors (HSPs). The paper explores several variants ranging from linear maps to nonlinear maps conditioned on concatenated node/edge features.

Figure 2: The Heterogeneous Sheaf Predictor family (Sheaf-NSD, ensemble, NE, EE, TE, NT, ET, and types) shows how expressive power increases as restriction maps are conditioned on richer typed context. The variants progressively inject node-type functions, edge-type functions, or both, making it clear that HetSheaf is a framework for typed local geometry rather than one fixed predictor.

SheafPool: Graph-Level Readout

The graph-classification problem is simple to state:

ordinary GNNs can sum or average node embeddings directly;
sheaf GNNs cannot, because each node lives in its own local coordinate frame.

So naive pooling is not just weak, but wrong. Two stalk vectors can represent the same geometric content in different bases, and direct averaging would treat them as different. The final graph embedding would then depend on arbitrary local frame choices instead of only on the graph.
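A small numerical check of this claim, assuming 2-dimensional stalks and random rotations as the local changes of basis. It shows that mean pooling changes when each node's vector is re-expressed in a different local basis, while a basis-invariant quantity such as the per-node norm does not. The helper `random_rotation` is illustrative only.

```python
import numpy as np

rng = np.random.default_rng(2)

def random_rotation(rng):
    """Random 2x2 rotation matrix (an orthogonal change of basis)."""
    theta = rng.uniform(0.0, 2.0 * np.pi)
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

stalks = rng.normal(size=(5, 2))  # one 2-d stalk vector per node

# Re-express every node's vector in its own, different local basis.
rotations = [random_rotation(rng) for _ in range(len(stalks))]
stalks_rebased = np.stack([R.T @ x for R, x in zip(rotations, stalks)])

mean_before = stalks.mean(axis=0)
mean_after = stalks_rebased.mean(axis=0)

norms_before = np.linalg.norm(stalks, axis=1)
norms_after = np.linalg.norm(stalks_rebased, axis=1)

print("mean pooling changed:", not np.allclose(mean_before, mean_after))
print("per-node norms unchanged:", np.allclose(norms_before, norms_after))
```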

SheafPool fixes this by making the readout basis-invariant.

In practice, it does four things:

Normalise each stalk locally.
Align stalks to a shared anchor frame.
Score nodes with invariant attention weights.
Pool the aligned stalks into one graph representation.

The key idea is: align first, pool later. That is what makes graph-level prediction well-defined in sheaf space, especially on heterogeneous graphs where local geometry is already more complex.
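The four steps can be mimicked in a toy setting. This is not the paper's SheafPool: it assumes each node's orthogonal frame relative to the anchor is already known (the array `frames` is hypothetical), uses plain unit-norm scaling in place of whitening, and scores nodes with a softmax over their norms. It only illustrates why aligning before pooling makes the result independent of local basis choices.

```python
import numpy as np

rng = np.random.default_rng(3)

def random_orthogonal(rng, d):
    """Random d x d orthogonal matrix from a QR decomposition."""
    q, _ = np.linalg.qr(rng.normal(size=(d, d)))
    return q

def align_then_pool(stalks, frames):
    """stalks[v]: coordinates in node v's local basis.
    frames[v]: orthogonal matrix taking local coordinates to the anchor
    frame (assumed to be given in this toy example)."""
    norms = np.linalg.norm(stalks, axis=1)
    unit = stalks / norms[:, None]                    # 1. normalise locally
    aligned = np.einsum("vij,vj->vi", frames, unit)   # 2. align to the anchor
    weights = np.exp(norms - norms.max())             # 3. invariant scores
    weights /= weights.sum()
    return weights @ aligned                          # 4. pool aligned stalks

n, d = 6, 3
stalks = rng.normal(size=(n, d))
frames = np.stack([random_orthogonal(rng, d) for _ in range(n)])

# Change every node's local basis by its own orthogonal matrix R_v:
# coordinates become R_v^T x_v and the frame becomes Q_v R_v.
rebase = np.stack([random_orthogonal(rng, d) for _ in range(n)])
stalks_rebased = np.einsum("vji,vj->vi", rebase, stalks)
frames_rebased = frames @ rebase

pooled = align_then_pool(stalks, frames)
pooled_rebased = align_then_pool(stalks_rebased, frames_rebased)
print(np.allclose(pooled, pooled_rebased))
```

The hard part in practice is estimating the alignment rather than being handed it, and the toy example leaves that part out.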

Figure 3: SheafPool handles the graph-level readout step by step. It whitens each stalk, aligns residual orientations with a shared anchor frame, computes invariant attention weights, pools aligned stalks into a receive-only token, and finally extracts graph features through channel-wise invariant energies. This is what makes graph classification well-defined under local basis changes.

Results

On the Heterogeneous Graph Benchmark (HGB) — covering node classification, link prediction, and graph classification across multiple heterogeneous datasets — HetSheaf achieves:

Up to 2 percentage points higher Macro F1 on node classification vs. both homogeneous (GCN, GAT, GIN, GraphSAGE) and heterogeneous (R-GCN, HAN, HGT) baselines.
Up to 99.62% F1 on link prediction benchmarks.
10× parameter reduction vs. type-specialised baselines while maintaining competitive performance.
SheafPool delivers a gain of 42 percentage points over mean pooling on graph classification tasks.

Key Insight: Type-specific neural modules can scale as O(T²) in the number of node types T, because every ordered pair of node types may need its own transformation. Type-aware restriction maps in HetSheaf scale as O(T): each type gets a conditioning signal, but the propagation rule stays the same. Encoding heterogeneity in geometry rather than architecture is the key to parameter-efficient relational learning.
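A back-of-the-envelope count makes the scaling argument concrete. It assumes T counts node types, that the specialised design needs one square weight matrix per ordered pair of node types (the worst case), and that the conditioned design needs one embedding per node type plus a single shared predictor. The sizes `D_HIDDEN`, `D_TYPE`, and `SHARED` are made-up illustrative values, not figures from the paper, and edge-type embeddings are left out of the count.

```python
def specialised_params(num_node_types, d_hidden):
    """Assumed worst case: one d_hidden x d_hidden matrix per ordered type pair."""
    return num_node_types ** 2 * d_hidden ** 2

def conditioned_params(num_node_types, d_type, shared_params):
    """Assumed: one d_type-dim embedding per node type plus one shared predictor."""
    return num_node_types * d_type + shared_params

# Illustrative sizes only (not from the paper).
D_HIDDEN, D_TYPE, SHARED = 64, 16, 20_000

counts = {
    T: (specialised_params(T, D_HIDDEN), conditioned_params(T, D_TYPE, SHARED))
    for T in (2, 4, 8, 16)
}
for T, (spec, cond) in counts.items():
    print(f"T={T:2d}  specialised={spec:>9,d}  conditioned={cond:>7,d}")
```

Under these assumed sizes the shared predictor dominates for very few types, so the saving only appears as T grows.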

Why This Matters

The important shift is conceptual. Most heterogeneous GNNs ask: “what new neural block do I need for this new graph type?” HetSheaf asks: “what local compatibility structure does this graph already have?” Once that structure is encoded in the sheaf, the downstream model becomes simpler, more principled, and easier to scale as the number of types grows.

✅ Key Takeaways

HetSheaf moves heterogeneity from the architecture into the data structure via type-aware sheaves.
Restriction maps conditioned on node/edge types encode relational structure without type-specific modules.
SheafPool provides a basis-change-invariant graph-level readout — essential for correct graph classification with sheaves.
Competitive or better results on HGB (up to 2 percentage points higher Macro F1 on node classification) with up to 10× fewer parameters than specialised baselines.


Alessio Borgi
