What Is a Sheaf? From Topology to Graph Learning
Published:
The Intuition: Consistent Local Information
Imagine a weather network: temperature sensors at every city (nodes), with the understanding that adjacent cities should have related temperatures. A sheaf formalises this by:
- Assigning a data space to every city: the set of possible temperature readings.
- Assigning a data space to every road between cities: the pair of readings that are “consistent” for that road.
- Specifying a restriction map per road that says: “if city u has reading x_u, what does that imply at the road endpoint?”
A global section is an assignment of readings to all cities such that every road’s restriction is satisfied — the whole network is self-consistent.
This is exactly the structure a Sheaf Neural Network learns: not just node features, but the relational geometry between them.
Sheaves in Classical Topology
Historically, sheaves arose in algebraic geometry and topology (Leray, 1940s; Serre, 1950s). A sheaf on a topological space X assigns:
- A set (or vector space) F(U) to each open set U ⊆ X
- A restriction map ρ_{UV} : F(U) → F(V) for every inclusion V ⊆ U
- Consistency axioms: ρ_{UW} = ρ_{VW} ∘ ρ_{UV} for W ⊆ V ⊆ U
| The classic example: the sheaf of continuous functions. F(U) = {continuous functions on U}. For V ⊆ U, ρ_{UV}(f) = f | _V (restrict the function to the smaller set). A global section is a function defined consistently everywhere on X. |
From Topology to Graphs: Cellular Sheaves
A graph G = (V, E) is a 1-dimensional CW complex: a topological space built from 0-cells (nodes) and 1-cells (edges). A cellular sheaf F on G assigns:
- A vector space F(v) ≅ ℝ^{d_v} to each node v — called the node stalk
- A vector space F(e) ≅ ℝ^{d_e} to each edge e — called the edge stalk
- A linear map F_{v→e} : F(v) → F(e) for each incidence (v is an endpoint of e) — the restriction map
No consistency axioms are needed for a 1-dimensional CW complex (they would involve 2-cells, which a graph doesn’t have). The simplicity is what makes cellular sheaves tractable for graph learning.
What the Restriction Maps Encode
The restriction map F_{v→e} : ℝ^d → ℝ^d (assuming equal stalk dimensions) describes how node v’s data “projects onto” the shared edge. The two restriction maps for edge e = (u, v) — namely F_{u→e} and F_{v→e} — describe how u and v respectively relate to the shared edge.
Special cases:
| Restriction maps | Interpretation |
|---|---|
| F_{v→e} = I (identity) for all (v, e) | Standard graph: nodes should be equal |
| F_{v→e} ∈ {+I, −I} | Signed edges: nodes should agree or disagree |
| F_{v→e} = diag(d₁,…,d_d) | Feature-wise scaling: different channels scale differently |
| F_{v→e} ∈ O(d) | Orthogonal rotation: nodes related by a rotation |
| F_{v→e} ∈ ℝ^{d×d} | General linear: arbitrary learned relationship |
Global Sections: When Everything Agrees
A 0-cochain is a collection of vectors x = (x_v)_{v∈V} with x_v ∈ F(v) — an assignment of data to every node.
A global section is a 0-cochain where every restriction is satisfied: for every edge e = (u, v),
In words: the data at u, projected to the edge stalk, equals the data at v, projected to the edge stalk. The system is globally consistent.
The space of global sections, denoted H⁰(G, F) = ker(δ₀), is the analogue of the null space of the standard graph Laplacian. For standard GCN, H⁰ consists of constant functions. For a sheaf with non-trivial maps, H⁰ is much richer — it can contain functions that vary across nodes while still satisfying the relational constraints imposed by the restriction maps.
The Coboundary and Inconsistency
The coboundary operator δ₀ : C⁰(G, F) → C¹(G, F) maps node-level data to edge-level disagreement:
for edge e = (u, v) with a chosen orientation. The disagreement (δ₀ x)_e ∈ F(e) measures how inconsistent x is at edge e. The Sheaf Laplacian Δ_F = δ₀ᵀδ₀ accumulates this inconsistency into a node-level operator:
This is the central operator for sheaf-based graph learning. Its null space is exactly the space of global sections.
The Three Key Objects
| Object | Symbol | Role in graph learning |
|---|---|---|
| Node stalk | F(v) ≅ ℝ^d | Feature space at each node |
| Restriction map | F_{v→e} ∈ ℝ^{d×d} | Relational geometry per edge |
| Sheaf Laplacian | Δ_F = δ₀ᵀδ₀ | Aggregation operator (replaces L) |
The Sheaf Laplacian Δ_F is an (Nd) × (Nd) block matrix, where N is the number of nodes. Its (u, v)-block (for an edge e = (u,v)) is:
When all restriction maps are the identity: [Δ_F]{uv} = −I (if u∼v), [Δ_F]{uu} = deg(u)·I — exactly d copies of the standard graph Laplacian.
Summary
A sheaf is not exotic mathematics — it is a principled framework for attaching structured data to a space and asking when that data is globally consistent. On a graph:
- Node stalks hold feature vectors
- Restriction maps encode the relational geometry between adjacent nodes
- Global sections are the self-consistent configurations — the null space of the Sheaf Laplacian
- The Sheaf Laplacian measures inconsistency and drives diffusion
Everything else in this series — architectures, theory, applications — is a consequence of this foundational structure.
References
- Hansen, J., & Gebhart, T. (2020). Sheaf Neural Networks. NeurIPS 2020 GRL+ Workshop (the first paper to apply cellular sheaves to GNNs).
- Ghrist, R. (2014). Elementary Applied Topology. Createspace 2014 (ch. 5 covers cellular sheaves with graph examples and is freely available).
- Curry, J. (2014). Sheaves, Cosheaves and Applications. PhD Thesis, Penn 2014 (mathematical foundation; ch. 2–3 are the relevant sections for graph learning).
