Learning Sheaf Maps: Parameterisation Strategies Compared
The Core Choice
Every sheaf GNN requires a decision: what is the allowed form of the restriction maps F_{v▷e} : ℝ^d → ℝ^d?
This single choice determines:
- The number of parameters per edge
- The expressiveness of the relational geometry
- Whether the model has gauge symmetry
- The structure of the Sheaf Laplacian’s null space
- Computational cost of map learning and Laplacian construction
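All of these properties flow from how the restriction maps enter the Sheaf Laplacian. The following is a minimal NumPy sketch of that assembly, assuming a dense matrix and a node-major block layout; the function name `sheaf_laplacian` and the argument conventions are illustrative choices, not the API of any particular library.

```python
import numpy as np

def sheaf_laplacian(num_nodes, d, edges, maps):
    """Assemble the (num_nodes*d) x (num_nodes*d) Sheaf Laplacian.

    edges: list of (u, v) node pairs.
    maps:  list of (F_u, F_v) pairs of d x d arrays, one pair per edge,
           standing in for the restriction maps of the two endpoints.
    """
    L = np.zeros((num_nodes * d, num_nodes * d))
    for (u, v), (F_u, F_v) in zip(edges, maps):
        su = slice(u * d, (u + 1) * d)
        sv = slice(v * d, (v + 1) * d)
        L[su, su] += F_u.T @ F_u   # contribution to the diagonal block of u
        L[sv, sv] += F_v.T @ F_v   # contribution to the diagonal block of v
        L[su, sv] -= F_u.T @ F_v   # off-diagonal block for (u, v)
        L[sv, su] -= F_v.T @ F_u   # its transpose, for (v, u)
    return L

# Sanity check: identity maps on the path 0-1-2 give (graph Laplacian) ⊗ I_d.
d = 2
edges = [(0, 1), (1, 2)]
identity_maps = [(np.eye(d), np.eye(d)) for _ in edges]
L_id = sheaf_laplacian(3, d, edges, identity_maps)
graph_L = np.array([[1, -1, 0], [-1, 2, -1], [0, -1, 1]], dtype=float)
```

Each map type discussed below only restricts which matrices may appear as `F_u` and `F_v`; the assembly itself does not change.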
Type 1: Scalar Maps (d=1 effective)
Form: F_{v▷e} = s_{v▷e} · I where s_{v▷e} ∈ ℝ is a scalar.
Parameters per edge: 2 scalars (one per endpoint).
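Sheaf Laplacian blocks:
[Δ_F]_{uv} = −s_{u▷e} s_{v▷e} · I_d for an edge e = (u, v), and [Δ_F]_{vv} = (Σ_{e∋v} s_{v▷e}²) · I_d.
Each block is a scalar multiple of the identity, so the Sheaf Laplacian is a scalar-weighted (signed) graph Laplacian tensored with I_d.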
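Null space: For nonzero scalars on a connected graph, dim ker is either d or 0. It is d when the scalars are consistent around every cycle (for sign-valued maps, a balanced signed graph); the global sections are then node-wise rescaled or sign-flipped constant functions, which reduce to the constant-per-component functions of GCN when all scalars are equal. Otherwise only the zero section survives.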
Expressive power: Equivalent to a signed graph Laplacian — can represent positive (same-class, homophily) or negative (different-class, heterophily) edges, but with identity relational geometry.
Relation to prior work: Scalar sheaves with sign-valued maps recover the signed graph Laplacians used in SSGC (Zhu et al., 2021). FAGCN’s signed attention (a_{uv} ∈ (−1, +1), produced by a tanh) acts as a soft scalar sheaf.
When to use: When computational cost is paramount, or as a baseline to test whether sheaf structure (beyond signs) is needed.
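As a small illustration of the scalar case, the sketch below builds the weighted signed Laplacian from per-endpoint scalars and compares two sign patterns on a triangle. The helper names (`scalar_sheaf_laplacian`, `null_dim`) and the example values are assumptions made for this sketch.

```python
import numpy as np

def scalar_sheaf_laplacian(num_nodes, edges, scalars):
    """Signed, weighted graph Laplacian from per-endpoint scalars (s_u, s_v)."""
    L = np.zeros((num_nodes, num_nodes))
    for (u, v), (s_u, s_v) in zip(edges, scalars):
        L[u, u] += s_u ** 2
        L[v, v] += s_v ** 2
        L[u, v] -= s_u * s_v
        L[v, u] -= s_u * s_v
    return L

def null_dim(M, tol=1e-9):
    """Dimension of the null space, via numerical rank."""
    return M.shape[0] - np.linalg.matrix_rank(M, tol=tol)

d = 3
triangle = [(0, 1), (1, 2), (0, 2)]
# Balanced pattern: node 2 is "flipped" relative to nodes 0 and 1.
balanced = [(1.0, 1.0), (1.0, -1.0), (1.0, -1.0)]
# Unbalanced pattern: a single negative edge in the cycle.
unbalanced = [(1.0, 1.0), (1.0, 1.0), (1.0, -1.0)]

# The full Sheaf Laplacian is the scalar Laplacian tensored with I_d.
L_bal = np.kron(scalar_sheaf_laplacian(3, triangle, balanced), np.eye(d))
L_unb = np.kron(scalar_sheaf_laplacian(3, triangle, unbalanced), np.eye(d))
dim_bal, dim_unb = null_dim(L_bal), null_dim(L_unb)
```

In this example the balanced pattern keeps a d-dimensional null space (spanned by the sign-flipped constant signal in each channel), while the unbalanced one leaves only the zero section.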
Type 2: Diagonal Maps
Form: F_{v▷e} = diag(f_1^{v▷e}, …, f_d^{v▷e}) where each f_k^{v▷e} ∈ ℝ.
Parameters per edge: 2d scalars.
Sheaf Laplacian blocks:
[Δ_F]_{uv} = −diag(f_1^{u▷e} f_1^{v▷e}, …, f_d^{u▷e} f_d^{v▷e}) for an edge e = (u, v).
This is a diagonal matrix: each feature dimension has its own independent signed weight.
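Null space: The Laplacian decouples into d independent scalar sheaves, one per feature dimension, so the overall null space is the direct sum of the d scalar-sheaf null spaces and its dimension is the sum of theirs. Channels can therefore differ in what they preserve at depth. On a connected graph the total can exceed the standard Laplacian's d only when some diagonal entries vanish.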
Expressive power: Can represent d independent signed weights per edge — different channels can be treated as homophilic (positive weight) or heterophilic (negative weight). This decouples the heterophily handling per feature dimension.
When to use: The recommended default for most tasks. Provides the best accuracy-vs-cost tradeoff in NSD experiments.
MLP output: The sheaf predictor MLP outputs a 2d-dimensional vector per edge (d values for each endpoint’s diagonal entries).
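The following sketch shows one way the 2d-dimensional predictor output could be turned into Laplacian blocks without materialising any d×d matrix. The random array `mlp_out` is a stand-in for the predictor output, and the variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d, num_edges = 3, 4

# Stand-in for the sheaf predictor output: one 2d-vector per edge
# (random values here; a real model would produce these from node features).
mlp_out = rng.normal(size=(num_edges, 2 * d))

# First d values: diagonal of the map at endpoint u; last d: endpoint v.
f_u, f_v = mlp_out[:, :d], mlp_out[:, d:]

# Because the maps are diagonal, every block is diagonal too,
# so only its d diagonal entries need to be stored.
off_diag_blocks = -(f_u * f_v)   # diagonal of -F_u^T F_v, shape (E, d)
diag_contrib_u = f_u ** 2        # diagonal of F_u^T F_u
diag_contrib_v = f_v ** 2        # diagonal of F_v^T F_v

# Cross-check edge 0 against the dense matrix formula.
F_u0, F_v0 = np.diag(f_u[0]), np.diag(f_v[0])
dense_block = -F_u0.T @ F_v0
```

Storing only the diagonals keeps the per-edge cost at O(d) rather than O(d²), which is part of why this map type is cheap.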
Type 3: Orthogonal Maps
Form: F_{v▷e} = O_{v▷e} ∈ O(d) (orthogonal matrix, OO^T = I, det O = ±1).
Parameters per edge: 2·d(d−1)/2 = d(d−1) scalars (each O_{v▷e} is parameterised by d(d−1)/2 Cayley parameters or Givens angles). Note that the Cayley transform reaches only rotations (det O = +1), not reflections.
Sheaf Laplacian blocks:
[Δ_F]_{uv} = −O_{u▷e}ᵀ O_{v▷e} for an edge e = (u, v), and [Δ_F]_{vv} = deg(v) · I_d.
The off-diagonal block is (up to sign) an orthogonal matrix; this is the Connection Laplacian.
Null space: Global sections are parallel-transported signals — signals consistent with the connection. For a flat connection (trivial holonomy), dim ker = d. For non-flat connections, dim ker can be lower.
Expressive power: Can represent arbitrary rotations between adjacent nodes (but no scaling). This is the natural choice for geometric data where relative orientations matter.
Gauge equivariance: Yes — the Connection Laplacian is O(d)-gauge-equivariant by construction. Equivariant sheaf GNNs require orthogonal maps.
When to use: Geometric data (molecules, point clouds), synchronisation tasks, when gauge equivariance is required.
Key limitation: Cannot scale features, since ‖O_{v▷e} x‖ = ‖x‖. If feature magnitude carries task-relevant information, orthogonal maps cannot amplify or attenuate it.
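To make the orthogonal parameterisation concrete, here is a minimal sketch of the Cayley transform, which maps d(d−1)/2 free parameters to an orthogonal matrix through a skew-symmetric matrix. The function name `cayley_orthogonal` and the example parameter values are assumptions for illustration; actual implementations may use Givens rotations, Householder reflections, or a matrix exponential instead.

```python
import numpy as np

def cayley_orthogonal(params, d):
    """Map d(d-1)/2 free parameters to a rotation via the Cayley transform."""
    A = np.zeros((d, d))
    A[np.triu_indices(d, k=1)] = params   # fill the strict upper triangle
    A = A - A.T                           # make A skew-symmetric
    I = np.eye(d)
    # (I + A)^{-1} (I - A); I + A is always invertible for skew-symmetric A.
    return np.linalg.solve(I + A, I - A)

d = 3
params = np.array([0.3, -1.2, 0.7])       # d(d-1)/2 = 3 free parameters
O = cayley_orthogonal(params, d)

x = np.array([1.0, 2.0, -0.5])
norm_before, norm_after = np.linalg.norm(x), np.linalg.norm(O @ x)
```

The result satisfies OᵀO = I with det O = +1, and `norm_before` equals `norm_after` up to floating-point error, which is the norm-preservation property behind the key limitation of this map type.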
Type 4: General Linear Maps
Form: F_{v▷e} ∈ ℝ^{d×d} (no constraint).
Parameters per edge: 2d² scalars.
Sheaf Laplacian blocks:
[Δ_F]_{uv} = −F_{u▷e}ᵀ F_{v▷e} for an edge e = (u, v), and [Δ_F]_{vv} = Σ_{e∋v} F_{v▷e}ᵀ F_{v▷e}.
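Null space: Each edge e = (u, v) imposes the d linear constraints F_{u▷e} x_u = F_{v▷e} x_v, and the null space is the set of signals satisfying all of them, so its dimension is highly task-dependent. It can be very large (if many maps share common null vectors) or trivial.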
Expressive power: Maximum — can represent any linear relational structure between adjacent nodes. Subsumes scalar, diagonal, and orthogonal maps as special cases.
Risk: With d² parameters per map, general maps have high capacity and can overfit on small graphs. The Sheaf Laplacian may become nearly rank-deficient if the maps degenerate.
Regularisation: L2 regularisation on map norms, or constraining the maps to be near-orthogonal, helps prevent degeneracy.
When to use: Large graphs with abundant training data; tasks with complex relational structure that cannot be captured by simpler map types.
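The two regularisers mentioned above can be sketched as penalty terms added to the training loss. The function names and the batched array layout are assumptions for this sketch, and the relative weighting of the penalties is left to the practitioner.

```python
import numpy as np

def l2_penalty(F):
    """Squared Frobenius norm of each map; F has shape (..., d, d)."""
    return np.sum(F ** 2, axis=(-2, -1))

def orthogonality_penalty(F):
    """||F^T F - I||_F^2 per map: zero exactly when F is orthogonal."""
    d = F.shape[-1]
    gram = np.swapaxes(F, -1, -2) @ F
    return np.sum((gram - np.eye(d)) ** 2, axis=(-2, -1))

# A batch of two 2x2 maps: the identity and a uniformly scaled copy.
maps = np.stack([np.eye(2), 2.0 * np.eye(2)])
l2_values = l2_penalty(maps)
orth_values = orthogonality_penalty(maps)
```

The L2 term shrinks all maps toward zero, whereas the orthogonality term pulls them toward norm-preserving maps and so guards more directly against collapsed (rank-deficient) maps.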
Symmetric Maps: A Useful Intermediate
Form: F_{v▷e} = F_{v▷e}ᵀ ∈ S(d) (symmetric matrix).
Parameters per edge: 2·d(d+1)/2 = d(d+1) per edge.
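Property: The off-diagonal blocks [Δ_F]_{uv} = −F_{u▷e}ᵀ F_{v▷e} = −F_{u▷e} F_{v▷e} are products of symmetric matrices, and such a product is symmetric if and only if the two maps commute. This holds exactly for diagonal maps and only approximately for near-diagonal ones, so the individual blocks are not symmetric in general (the full Sheaf Laplacian always is).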
When to use: When the relational geometry is undirected (the map from u to e is “the same” as from e to u in some sense). Fewer parameters than general, more expressive than diagonal.
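A short sketch of the symmetric parameterisation, assuming the d(d+1)/2 parameters fill the upper triangle row by row; the helper name `symmetric_from_params` and the example values are hypothetical. It also checks numerically when the product of two symmetric maps is itself symmetric.

```python
import numpy as np

def symmetric_from_params(params, d):
    """Fill the upper triangle (including the diagonal) and mirror it."""
    S = np.zeros((d, d))
    S[np.triu_indices(d)] = params
    return S + np.triu(S, k=1).T

d = 2
# Two symmetric maps that do not commute.
F_u = symmetric_from_params(np.array([1.0, 2.0, 3.0]), d)   # [[1, 2], [2, 3]]
F_v = symmetric_from_params(np.array([0.0, 1.0, 0.0]), d)   # [[0, 1], [1, 0]]
block = -F_u.T @ F_v

# Two diagonal (hence commuting) symmetric maps.
G_u = symmetric_from_params(np.array([1.0, 0.0, 3.0]), d)
G_v = symmetric_from_params(np.array([2.0, 0.0, 5.0]), d)
block_commuting = -G_u.T @ G_v

block_is_symmetric = np.allclose(block, block.T)
commuting_block_is_symmetric = np.allclose(block_commuting, block_commuting.T)
```

Here `block` is not symmetric while `block_commuting` is, matching the commutation condition stated in the property above.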
Comparison Table
| Map type | Params/edge | Laplacian block | Gauge equiv | Scaling | Heterophily |
|---|---|---|---|---|---|
| Scalar | 2 | Scalar × I | No | Yes | Via sign |
| Diagonal | 2d | Diagonal matrix | No | Yes | Per-channel sign |
| Orthogonal | d(d−1) | Orthogonal matrix | Yes | No | Via rotation |
| Symmetric | d(d+1) | Symmetric matrix | No | Yes | Via eigenvalue |
| General | 2d² | Arbitrary matrix | No | Yes | Maximum |
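The parameter counts in the table can be tabulated for a given stalk dimension with a small helper; the function name and the string keys are illustrative.

```python
def params_per_edge(map_type, d):
    """Learnable scalars per edge (both endpoints), following the table above."""
    counts = {
        "scalar": 2,
        "diagonal": 2 * d,
        "orthogonal": d * (d - 1),
        "symmetric": d * (d + 1),
        "general": 2 * d * d,
    }
    return counts[map_type]

d = 3
table = {name: params_per_edge(name, d)
         for name in ("scalar", "diagonal", "orthogonal", "symmetric", "general")}
```

For d = 3, diagonal and orthogonal maps have the same count (6 per edge), so the choice between them at that size is about geometry rather than parameter budget.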
Impact on Null Space Dimension
The null space dimension dim(H⁰) = dim ker(Δ_F) determines the long-time attractor of sheaf diffusion — what information is preserved at large depth.
| Map type | dim H⁰ (connected graph) |
|---|---|
| Identity (GCN) | d (constant functions) |
| Scalar | d if the nonzero scalars are consistent around every cycle, otherwise 0 |
| Diagonal | 0 to d for nonzero entries (sum over channels); above d only if entries vanish |
| Orthogonal (flat) | d (parallel-transported sections) |
| Orthogonal (non-flat) | < d |
| General | ≥ 0 (depends on learned maps) |
The key insight: with general or diagonal maps, NSD learns the maps and therefore learns H⁰ itself, so the model adapts its oversmoothing attractor to the task. Raising dim(H⁰) beyond d requires rank-deficient maps (for diagonal maps, vanishing entries).
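The dimension of H⁰ can be checked numerically as the nullity of the Sheaf Laplacian. The sketch below does this on a triangle with d = 2 for a flat orthogonal connection, a non-flat one, and a degenerate all-zero choice of maps. All helper names and numeric values are assumptions for illustration.

```python
import numpy as np

def rot(theta):
    """2x2 rotation matrix."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def sheaf_laplacian(num_nodes, d, edges, maps):
    """Dense Sheaf Laplacian from per-edge pairs (F_u, F_v)."""
    L = np.zeros((num_nodes * d, num_nodes * d))
    for (u, v), (F_u, F_v) in zip(edges, maps):
        su, sv = slice(u * d, (u + 1) * d), slice(v * d, (v + 1) * d)
        L[su, su] += F_u.T @ F_u
        L[sv, sv] += F_v.T @ F_v
        L[su, sv] -= F_u.T @ F_v
        L[sv, su] -= F_v.T @ F_u
    return L

def dim_H0(L, tol=1e-9):
    """Nullity of L, computed from its numerical rank."""
    return L.shape[0] - np.linalg.matrix_rank(L, tol=tol)

d = 2
triangle = [(0, 1), (1, 2), (0, 2)]

# Flat connection: each map depends only on its node, so holonomy is trivial.
frames = [rot(0.3), rot(1.1), rot(-0.8)]
flat = [(frames[u], frames[v]) for u, v in triangle]

# Non-flat connection: twist one edge so the cycle has nontrivial holonomy.
non_flat = list(flat)
non_flat[2] = (rot(0.5) @ frames[0], frames[2])

# Degenerate general maps: all-zero maps impose no constraints at all.
zero_maps = [(np.zeros((d, d)), np.zeros((d, d))) for _ in triangle]

dims = {
    "flat": dim_H0(sheaf_laplacian(3, d, triangle, flat)),
    "non_flat": dim_H0(sheaf_laplacian(3, d, triangle, non_flat)),
    "zero_maps": dim_H0(sheaf_laplacian(3, d, triangle, zero_maps)),
}
```

In this toy setting the flat connection gives dim H⁰ = 2 = d, the twisted one gives 0, and the all-zero maps give 6 (every signal is a section), which illustrates how the choice of maps moves the diffusion attractor.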
Practical Recommendations
- Start with diagonal maps — they work well empirically, have few parameters, and are interpretable.
- Use orthogonal maps when gauge equivariance is needed or the data has a natural geometric interpretation.
- Use general maps only with sufficient training data (>1k nodes per class) and appropriate regularisation.
- Reserve scalar maps for cost-constrained settings or for testing whether sheaf structure beyond signs is beneficial.
- Stalk dimension d=2 or d=3 usually suffices — increasing d beyond 5 rarely helps and increases cost.
References
- Bodnar, C., Di Giovanni, F., Chamberlain, B. P., Liò, P., & Bronstein, M. M. (2022). Neural Sheaf Diffusion. NeurIPS 2022 (ablation over map types: general, diagonal, orthogonal, symmetric).
- Barbero, F., Bodnar, C., de Ocáriz Borde, H. S., Bronstein, M., Veličković, P., & Liò, P. (2022). Sheaf Attention Networks. NeurIPS 2022 Workshop (orthogonal maps with attention — gauge-equivariant architecture).
- Singer, A. (2011). Angular Synchronisation by Eigenvectors and Semidefinite Programming. Applied and Computational Harmonic Analysis (orthogonal maps as connection Laplacian — motivates the orthogonal parameterisation).
