What Is a Sheaf? From Topology to Graph Learning

5 minute read

Published:

TL;DR: A cellular sheaf on a graph assigns a vector space ("stalk") to each node and each edge, with linear "restriction maps" from node stalks to adjacent edge stalks. A global section is an assignment of vectors to all nodes that is "consistent" — the restriction maps agree at every edge. The Sheaf Laplacian measures the degree of global inconsistency.

Sheaves in Ordinary Mathematics

In mathematics, a sheaf is a tool for tracking local data (defined on open sets of a topological space) and understanding when local data can be assembled into global data.

The key property: local-to-global consistency. Data is consistent locally at every overlap → data assembles into a unique global section.

For our purposes (cellular sheaves on graphs), we use a discretisation: the “topological space” is the graph, “open sets” are nodes and edges, and “local data” are vectors in the stalks.

Cellular Sheaves on Graphs

A cellular sheaf F on graph G = (V, E) assigns:

  1. Node stalks: a vector space F(v) = ℝ^{d_v} to each node v
  2. Edge stalks: a vector space F(e) = ℝ^{d_e} to each edge e
  3. Restriction maps: for each edge e = (u,v) and each incident node w ∈ {u,v}:
    • F(w ⊴ e): F(w) → F(e) — a linear map from the node stalk to the edge stalk

Notation: F_{v→e} denotes the restriction map from node v to edge e.

The Cochain Complex

The stalks and restriction maps define a cochain complex:

C^0(G, F) --δ₀--> C^1(G, F)

Where:

  • C^0 = ⊕_v F(v): the space of all node assignments (0-cochains)
  • C^1 = ⊕_e F(e): the space of all edge assignments (1-cochains)
  • δ₀: C^0 → C^1 is the coboundary map (codifferential):
(δ₀ x)_e = F_{v→e} x_v - F_{u→e} x_u for edge e = (u,v)

The coboundary δ₀ x measures the disagreement between u’s and v’s contributions to edge e.

Global Sections

A global section is a vector x ∈ C^0 (assignment of vectors to all nodes) such that:

(δ₀ x)_e = 0 for all edges e

i.e., F_{v→e} x_v = F_{u→e} x_u for all edges e=(u,v). The two endpoints “agree” at every edge.

The space of global sections ker(δ₀) measures how much consistent global data the sheaf supports.

Special case — trivial sheaf: F(v) = ℝ, F(e) = ℝ, all restriction maps = 1. Then δ₀ x = x_v - x_u is just the ordinary graph gradient, and global sections are constant functions on connected components. This recovers the standard GCN setting.

Intuition via temperature: Imagine nodes are thermometers at different locations. The "sheaf" models how readings should relate across edges — maybe a north-facing thermometer at A and a south-facing one at B should read slightly differently even if they measure the same "true" temperature. The restriction maps encode this "transformation rule." A global section represents a consistent temperature assignment across the network after applying all local transformations.

The Standard Graph as a Trivial Sheaf

Standard GCN corresponds to the trivial sheaf:

  • F(v) = ℝ^d for all v (all stalks the same space)
  • F(e) = ℝ^d for all e
  • F_{v→e} = I_d (identity map) for all v, e

The coboundary δ₀ x = x_v - x_u is just the edge-difference of features. The Sheaf Laplacian reduces to the standard graph Laplacian L = δ₀^T δ₀.

GCN propagation H ← (I - L/2) H is graph diffusion — it minimises the Dirichlet energy Σ_{(u,v)} x_u - x_v ².

Non-Trivial Sheaves Allow Disagreement

With non-trivial restriction maps, the “agreement” condition becomes F_{u→e} x_u = F_{v→e} x_v — x_u and x_v are not required to be equal, only to agree after transformation.

This allows adjacent nodes to have different but compatible features. In a heterophilic graph, two nodes with different labels might have very different features, but a learned sheaf map could rotate one into the other’s space — making them “consistent” under the sheaf even though they are numerically different.

Why Sheaves for Graphs?

The sheaf framework provides:

  1. Richer aggregation: edges have their own “mediation” structure (restriction maps)
  2. Heterophily handling: adjacent nodes with different features are not forced to agree — the restriction maps can accommodate difference
  3. Mathematical guarantees: the Sheaf Laplacian inherits spectral theory from the standard Laplacian, with richer structure
  4. Interpretability: the consistency defect δ₀ x ² measures “how heterophilic” the data is under the learned sheaf

Summary

ConceptStandard GCNCellular Sheaf
Node dataVectors h_v ∈ ℝ^dVectors x_v ∈ F(v) = ℝ^{d_v}
Edge dataNoneVectors x_e ∈ F(e) = ℝ^{d_e}
Agreement conditionh_u = h_v at edgesF_{u→e} x_u = F_{v→e} x_v
LaplacianL = D - AΔ_F = δ₀^T δ₀ (Sheaf Laplacian)
Global sectionsConstant functionsker(δ₀): consistent assignments

The sheaf framework generalises the standard graph to a richer structure that can encode per-edge relational information. The Sheaf Laplacian, covered in the next post, is the key operator that makes this actionable for graph learning.

References