What Is a Sheaf? From Topology to Graph Learning

6 minute read

Published:

TL;DR: A sheaf assigns data (vectors, functions, sets) to the parts of a space, plus restriction maps that say how data on larger pieces relates to data on smaller pieces. A global section is a consistent assignment across the whole space — one where all the local pieces agree. On a graph, nodes and edges are the "pieces", restriction maps encode inter-node relationships, and the Sheaf Laplacian measures how inconsistent a signal is.

The Intuition: Consistent Local Information

Imagine a weather network: temperature sensors at every city (nodes), with the understanding that adjacent cities should have related temperatures. A sheaf formalises this by:

  1. Assigning a data space to every city: the set of possible temperature readings.
  2. Assigning a data space to every road between cities: the pair of readings that are “consistent” for that road.
  3. Specifying a restriction map per road that says: “if city u has reading x_u, what does that imply at the road endpoint?”

A global section is an assignment of readings to all cities such that every road’s restriction is satisfied — the whole network is self-consistent.

This is exactly the structure a Sheaf Neural Network learns: not just node features, but the relational geometry between them.

Sheaves in Classical Topology

Historically, sheaves arose in algebraic geometry and topology (Leray, 1940s; Serre, 1950s). A sheaf on a topological space X assigns:

  • A set (or vector space) F(U) to each open set U ⊆ X
  • A restriction map ρ_{UV} : F(U) → F(V) for every inclusion V ⊆ U
  • Consistency axioms: ρ_{UW} = ρ_{VW} ∘ ρ_{UV} for W ⊆ V ⊆ U
The classic example: the sheaf of continuous functions. F(U) = {continuous functions on U}. For V ⊆ U, ρ_{UV}(f) = f_V (restrict the function to the smaller set). A global section is a function defined consistently everywhere on X.

From Topology to Graphs: Cellular Sheaves

A graph G = (V, E) is a 1-dimensional CW complex: a topological space built from 0-cells (nodes) and 1-cells (edges). A cellular sheaf F on G assigns:

  • A vector space F(v) ≅ ℝ^{d_v} to each node v — called the node stalk
  • A vector space F(e) ≅ ℝ^{d_e} to each edge e — called the edge stalk
  • A linear map F_{v→e} : F(v) → F(e) for each incidence (v is an endpoint of e) — the restriction map
F = { F(v), F(e), F_{v→e} : v ∈ V, e ∈ E, v incident to e }

No consistency axioms are needed for a 1-dimensional CW complex (they would involve 2-cells, which a graph doesn’t have). The simplicity is what makes cellular sheaves tractable for graph learning.

What the Restriction Maps Encode

The restriction map F_{v→e} : ℝ^d → ℝ^d (assuming equal stalk dimensions) describes how node v’s data “projects onto” the shared edge. The two restriction maps for edge e = (u, v) — namely F_{u→e} and F_{v→e} — describe how u and v respectively relate to the shared edge.

Special cases:

Restriction mapsInterpretation
F_{v→e} = I (identity) for all (v, e)Standard graph: nodes should be equal
F_{v→e} ∈ {+I, −I}Signed edges: nodes should agree or disagree
F_{v→e} = diag(d₁,…,d_d)Feature-wise scaling: different channels scale differently
F_{v→e} ∈ O(d)Orthogonal rotation: nodes related by a rotation
F_{v→e} ∈ ℝ^{d×d}General linear: arbitrary learned relationship
Key insight: Standard GCN's message passing is a sheaf neural network with all restriction maps fixed to the identity. The "oversmoothing" problem is exactly the consequence of this assumption: the Sheaf Laplacian with identity maps forces all node features to agree, collapsing to constants. By learning richer restriction maps, sheaf GNNs can represent the relational structure of heterophilic graphs — where connected nodes should differ in a *structured way*, not randomly.

Global Sections: When Everything Agrees

A 0-cochain is a collection of vectors x = (x_v)_{v∈V} with x_v ∈ F(v) — an assignment of data to every node.

A global section is a 0-cochain where every restriction is satisfied: for every edge e = (u, v),

F_{u→e} x_u = F_{v→e} x_v

In words: the data at u, projected to the edge stalk, equals the data at v, projected to the edge stalk. The system is globally consistent.

The space of global sections, denoted H⁰(G, F) = ker(δ₀), is the analogue of the null space of the standard graph Laplacian. For standard GCN, H⁰ consists of constant functions. For a sheaf with non-trivial maps, H⁰ is much richer — it can contain functions that vary across nodes while still satisfying the relational constraints imposed by the restriction maps.

The Coboundary and Inconsistency

The coboundary operator δ₀ : C⁰(G, F) → C¹(G, F) maps node-level data to edge-level disagreement:

(δ₀ x)_e = F_{v→e} x_v − F_{u→e} x_u

for edge e = (u, v) with a chosen orientation. The disagreement (δ₀ x)_e ∈ F(e) measures how inconsistent x is at edge e. The Sheaf Laplacian Δ_F = δ₀ᵀδ₀ accumulates this inconsistency into a node-level operator:

Δ_F = δ₀ᵀ δ₀

This is the central operator for sheaf-based graph learning. Its null space is exactly the space of global sections.

The Three Key Objects

ObjectSymbolRole in graph learning
Node stalkF(v) ≅ ℝ^dFeature space at each node
Restriction mapF_{v→e} ∈ ℝ^{d×d}Relational geometry per edge
Sheaf LaplacianΔ_F = δ₀ᵀδ₀Aggregation operator (replaces L)

The Sheaf Laplacian Δ_F is an (Nd) × (Nd) block matrix, where N is the number of nodes. Its (u, v)-block (for an edge e = (u,v)) is:

[Δ_F]_{uv} = −F_{u→e}ᵀ F_{v→e} [Δ_F]_{uu} = Σ_{e incident to u} F_{u→e}ᵀ F_{u→e}

When all restriction maps are the identity: [Δ_F]{uv} = −I (if u∼v), [Δ_F]{uu} = deg(u)·I — exactly d copies of the standard graph Laplacian.

Summary

A sheaf is not exotic mathematics — it is a principled framework for attaching structured data to a space and asking when that data is globally consistent. On a graph:

  • Node stalks hold feature vectors
  • Restriction maps encode the relational geometry between adjacent nodes
  • Global sections are the self-consistent configurations — the null space of the Sheaf Laplacian
  • The Sheaf Laplacian measures inconsistency and drives diffusion

Everything else in this series — architectures, theory, applications — is a consequence of this foundational structure.

References

  • Hansen, J., & Gebhart, T. (2020). Sheaf Neural Networks. NeurIPS 2020 GRL+ Workshop (the first paper to apply cellular sheaves to GNNs).
  • Ghrist, R. (2014). Elementary Applied Topology. Createspace 2014 (ch. 5 covers cellular sheaves with graph examples and is freely available).
  • Curry, J. (2014). Sheaves, Cosheaves and Applications. PhD Thesis, Penn 2014 (mathematical foundation; ch. 2–3 are the relevant sections for graph learning).