Diagonal, Orthogonal, and General Sheaf Maps

5 minute read

Published:

TL;DR: Sheaf restriction maps F_{u→e} can be scalars (d=1), diagonal (d parameters), orthogonal (d(d-1)/2 parameters), or general d×d matrices (d² parameters). General maps are most expressive but expensive. Orthogonal maps offer a good trade-off: they can represent rotations and reflections (enough for most geometric relationships) at lower cost than general maps.

The Design Space of Restriction Maps

In Neural Sheaf Diffusion, the MLP outputs a restriction map F_{u→e} for each directed edge. The choice of matrix class constrains both what relationships the sheaf can represent and how much computation is required.

Scalar Maps (d_e = 1)

F_{u→e} ∈ ℝ (a single scalar)

Parameters per edge: 1 (from the original d-dimensional node space to a 1-dimensional edge space).

What it represents: attention weight — how much of u’s contribution flows to edge e.

Sheaf Laplacian block: (Δ_F){uv} = -f{u→e} · f_{v→e} where f are scalars.

Connection to existing models: scalar sheaf maps recover GAT (graph attention network) with fixed attention weights.

Limitation: cannot represent directional transformations — just scaling.

Diagonal Maps

F_{u→e} = diag(f_1, ..., f_d) ∈ ℝ^{d×d} (d parameters)

What it represents: per-dimension scaling — emphasise some feature dimensions over others.

Sheaf Laplacian block: (Δ_F)_{uv} = -diag(f_1^u f_1^v, …, f_d^u f_d^v). This is a diagonal matrix — the Sheaf Laplacian is block-diagonal in each feature dimension, so different dimensions are independent.

Advantage: O(d) parameters per edge (vs O(d²) for general); fast Sheaf Laplacian construction.

Limitation: cannot model inter-dimensional coupling — what node u thinks dimension 1 means is the same as what node v thinks dimension 1 means. Only magnitudes differ, not directions.

Orthogonal Maps

F_{u→e} ∈ O(d) = {Q ∈ ℝ^{d×d} : Q^T Q = I}

Parameters per edge: d(d-1)/2 (dimensions of the Lie group O(d)).

What it represents: rotations and reflections — a rigid transformation of feature space.

Key property: F^T_{u→e} F_{u→e} = I_d, so the diagonal block simplifies:

(Δ_F)_{vv} = Σ_{e ∋ v} F^T_{v→e} F_{v→e} = deg(v) · I_d

This means the diagonal blocks are scalar multiples of the identity — greatly simplifying the Sheaf Laplacian.

Why orthogonal maps are special: With orthogonal restriction maps, the Sheaf Laplacian block (Δ_F)_{uv} = -Q_u^T Q_v where Q_u, Q_v ∈ O(d). This is a rotation matrix — it expresses "how much the feature spaces of u and v are rotated relative to each other." Sheaf diffusion with orthogonal maps is equivalent to diffusion on a graph where each node has its own coordinate frame, and the edge maps express the frame rotation between neighbours. This is the discrete analogue of a connection Laplacian in differential geometry.

Connection geometry: orthogonal sheaves on graphs correspond exactly to flat vector bundles with orthogonal structure group — a classical object in differential geometry. This gives orthogonal sheaf GNNs a rich theoretical foundation connecting graph learning to Riemannian geometry.

General Linear Maps

F_{u→e} ∈ ℝ^{d_e × d} (d_e · d parameters)

Parameters per edge: d² (for square d×d maps) or d_e × d (for rectangular).

What it represents: arbitrary linear transformations — mixing, scaling, rotating, and projecting features.

Most expressive: can represent any linear relationship between u’s and v’s feature spaces.

Cost: O(d²) parameters per edge, O(d²) per Sheaf Laplacian block, O(E d²) total for the full Sheaf Laplacian.

Summary Comparison

Map typeParameters/edgeFeature couplingGeometric meaning
Scalar1NoneAttention weight
DiagonaldPer-dimensionFeature selection
Orthogonald(d-1)/2Full (rotation)Frame rotation
GeneralFullArbitrary linear

Practical Recommendations

Use diagonal maps when: graphs are large (E » 1000), computation is a bottleneck, and per-dimension scaling is sufficient.

Use orthogonal maps when: geometry of the feature space matters (the maps should be interpretable as rotations), or when the theoretical connection to differential geometry is valuable.

Use general maps when: maximum expressiveness is needed and the graph is small enough (molecules, proteins with E < 10,000).

Impact on Sheaf Laplacian Sparsity

For all map types, the Sheaf Laplacian Δ_F has the same sparsity pattern as the standard graph Laplacian, but each scalar entry is replaced by a d×d block. The total size is (Nd) × (Nd) with at most 2E non-zero blocks (plus N diagonal blocks).

For diagonal maps, each block is diagonal — the Sheaf Laplacian is sparse in the d-expanded sense, enabling efficient sparse operations.

For general maps, each block is dense — the full Sheaf Laplacian requires O(E d²) storage.

References