Sitemap

A list of all the posts and pages found on the site. For you robots out there, there is an XML version available for digesting as well.

Pages

Posts

Activation Functions in Neural Networks: Why Non-Linearity Matters

7 minute read

Published:

Activation functions are the reason neural networks can model curved decision boundaries instead of collapsing into one giant linear map. This chapter builds the intuition first, then walks through the classical functions that shaped deep learning.

FoPE: Fourier Position Embedding for Length Generalization

4 minute read

Published:

FoPE rethinks long-context positional encoding from a frequency-domain perspective. Instead of only stretching RoPE heuristically, it explicitly improves attention’s periodic extension so Transformers generalize more gracefully to longer sequences.

Position Interpolation: Extending RoPE with Minimal Fine-Tuning

4 minute read

Published:

Position Interpolation rescales positions before applying RoPE so a model trained on short contexts can be adapted to longer ones with surprisingly little fine-tuning. It became the reference baseline for long-context RoPE extension.
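As a rough illustration of the rescaling idea, the sketch below compresses positions from a longer target window into the range seen during training before computing rotary angles. The function names, the base of 10000, and the context lengths are assumptions chosen for the example, not values taken from the post.

```python
import math

def rope_angles(position, dim, base=10000.0):
    """Rotary angles for one position: position * base^(-2i/dim) for each pair i."""
    return [position * base ** (-2.0 * i / dim) for i in range(dim // 2)]

def interpolated_angles(position, dim, train_len, target_len, base=10000.0):
    """Position Interpolation: scale the position index by train_len / target_len."""
    scale = train_len / target_len  # below 1 when the context is being extended
    return rope_angles(position * scale, dim, base)

# Hypothetical lengths: trained on 2048 tokens, extended to 8192.
last = interpolated_angles(8191, dim=8, train_len=2048, target_len=8192)
limit = rope_angles(2048, dim=8)

# Every rescaled angle stays inside the range covered during training.
in_range = all(a < b for a, b in zip(last, limit))
print(in_range)
```

The point of the design is that the model never sees rotation angles larger than those it was trained on; the price is that neighbouring positions sit closer together, which is what the short fine-tuning run has to absorb.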

XPos: Length-Extrapolatable Rotary Embeddings

4 minute read

Published:

XPos modifies RoPE with a multiplicative decay that keeps relative rotations while stabilising magnitude at long distance. It is one of the cleanest attempts to make rotary embeddings extrapolate better.

p-RoPE: What Makes Rotary Positional Encodings Useful?

4 minute read

Published:

This paper does two things at once: it explains what RoPE is really doing inside a trained LLM, and it proposes p-RoPE, a partial rotary variant that drops the lowest frequencies to preserve stronger semantic channels.

SheafPool: Basis-Invariant Graph Readout for Sheaf Neural Networks

5 minute read

Published:

SheafPool solves a key missing piece in sheaf GNNs: graph-level pooling. Instead of averaging stalk vectors in arbitrary local bases, it aligns them into a shared canonical frame and builds a readout that is invariant to local basis changes.

GAPE: Remember to Forget — Gated Adaptive Positional Encoding

8 minute read

Published:

GAPE is a drop-in RoPE augmentation that adds content-aware attention logit biases: a query-gate suppresses irrelevant distant context while a key-gate preserves salient distant tokens. Provably sharper attention and improved long-context robustness — no architecture changes needed.

PolyNSD: Polynomial Neural Sheaf Diffusion

7 minute read

Published:

PolyNSD replaces the NSD propagation operator with a degree-K Chebyshev polynomial in the normalised sheaf Laplacian, achieving SOTA on homo- and heterophilic benchmarks with only diagonal restriction maps and dramatically lower memory usage.

Z-SASLM: Zero-Shot Style Blending via Spherical Interpolation

5 minute read

Published:

Z-SASLM is a zero-shot, fine-tuning-free style blending pipeline that replaces linear latent interpolation with SLERP along the geodesic of the hypersphere, preserving latent manifold structure when blending multiple styles. Published at CVPR 2025 Workshop.
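For readers unfamiliar with SLERP, here is a minimal two-vector sketch in plain Python. It is only an illustration of spherical interpolation itself; the function name and the toy vectors are assumptions, and it does not reproduce the multi-style weighting used in Z-SASLM.

```python
import math

def slerp(v0, v1, t):
    """Spherical linear interpolation between two vectors, t in [0, 1]."""
    dot = sum(a * b for a, b in zip(v0, v1))
    norm0 = math.sqrt(sum(a * a for a in v0))
    norm1 = math.sqrt(sum(b * b for b in v1))
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_omega = max(-1.0, min(1.0, dot / (norm0 * norm1)))
    omega = math.acos(cos_omega)
    if omega < 1e-8:
        # Nearly parallel vectors: fall back to linear interpolation.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    s = math.sin(omega)
    w0 = math.sin((1 - t) * omega) / s
    w1 = math.sin(t * omega) / s
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]

# Toy unit vectors standing in for two style latents.
mid = slerp([1.0, 0.0], [0.0, 1.0], 0.5)
lerp_mid = [0.5, 0.5]
print(math.hypot(*mid), math.hypot(*lerp_mid))
```

For unit-norm inputs the SLERP midpoint keeps unit norm, while the linear midpoint shrinks toward the origin, which is the manifold-preservation argument in miniature.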

HetSheaf: Heterogeneous Graphs Meet Cellular Sheaves

5 minute read

Published:

HetSheaf encodes graph heterogeneity directly in the sheaf data structure — type-aware stalks and restriction maps conditioned on node and edge types — instead of specialised architectural components, achieving +2pp on HGB with 10× fewer parameters.

LongRoPE: Extending Context to 2 Million Tokens

6 minute read

Published:

LongRoPE (Microsoft, 2024) pushes RoPE-based context to 2M tokens by searching for optimal per-dimension rescaling factors — far outperforming NTK or YaRN at extreme lengths.

YaRN: Yet Another RoPE Extension Method

5 minute read

Published:

YaRN uses NTK-by-parts interpolation: high-frequency dimensions are left uninterpolated, low-frequency ones are linearly interpolated, and the dimensions in between blend the two. Together with an attention temperature correction, this gives better long-context performance with minimal fine-tuning.

The Transformer Block: Putting It All Together

6 minute read

Published:

A single Transformer block combines attention, residuals, layer norm, and an FFN into one reusable unit. Understanding this block is understanding the Transformer.

Residual Connections: Why Transformers Can Be Deep

5 minute read

Published:

Without residual connections, training a 96-layer Transformer would be practically impossible. The skip connection is a simple addition that gives gradients a direct path through the network, which mitigates the vanishing gradient problem and makes very deep stacks trainable.

Layer Normalization in Transformers

5 minute read

Published:

Layer norm is not optional plumbing. It determines training stability, gradient flow, and whether deep Transformers converge at all. Pre-LN vs Post-LN is not a detail — it changes training dynamics fundamentally.

Query, Key, Value: The Intuition Behind QKV

5 minute read

Published:

Q, K, and V are not arbitrary labels. They map precisely onto search queries, database labels, and retrieved content — a framework you already understand.

ALiBi: Attention with Linear Biases

3 minute read

Published:

ALiBi skips traditional positional embeddings entirely and just subtracts a distance penalty from attention scores. Zero extra parameters, excellent extrapolation. Press et al., 2022.
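A minimal sketch of the bias that gets added to the attention scores, assuming causal attention and a power-of-two number of heads. The helper names are hypothetical; the slope schedule follows the geometric sequence described by Press et al.

```python
def alibi_slopes(num_heads):
    """Per-head slopes 2^(-8/n), 2^(-16/n), ... for n heads (n a power of two)."""
    return [2.0 ** (-8.0 * (h + 1) / num_heads) for h in range(num_heads)]

def alibi_bias(seq_len, slope):
    """Causal bias: entry [i][j] is -slope * (i - j) for j <= i, -inf for future keys."""
    return [
        [-slope * (i - j) if j <= i else float("-inf") for j in range(seq_len)]
        for i in range(seq_len)
    ]

slopes = alibi_slopes(8)
bias = alibi_bias(4, slopes[0])  # added to the q.k scores before the softmax
for row in bias:
    print(row)
```

Each head gets a different slope, so some heads penalise distance steeply and focus locally while others stay nearly flat and can still see far back.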

RoPE: Rotary Position Embeddings

4 minute read

Published:

RoPE encodes position by rotating query and key vectors by an angle proportional to position. The clever result: absolute encoding produces relative attention for free — and it’s now the dominant PE for large language models.
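The relative-position property can be checked numerically with a small sketch. The function names, the base of 10000, and the toy vectors below are assumptions for illustration; real implementations operate on batched tensors and may pair dimensions differently.

```python
import math

def rope(vec, position, base=10000.0):
    """Rotate consecutive pairs (x0, x1), (x2, x3), ... by position * base^(-i/dim)."""
    dim = len(vec)
    out = []
    for i in range(0, dim, 2):
        theta = position * base ** (-i / dim)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out += [x * c - y * s, x * s + y * c]
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

q = [0.3, -1.2, 0.7, 0.5]
k = [1.0, 0.4, -0.6, 0.9]

# Same offset of 3 between query and key, at two different absolute positions.
score_a = dot(rope(q, 5), rope(k, 2))
score_b = dot(rope(q, 105), rope(k, 102))
print(score_a, score_b)
```

The two scores agree up to floating-point error because each pair's dot product depends only on the difference of the two rotation angles, which is the sense in which an absolute encoding yields relative attention.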

Relative Positional Encodings: It’s All About Distance

3 minute read

Published:

Instead of asking ‘where am I?’, relative PEs ask ‘how far are these two tokens apart?’ Shaw et al. and T5 both use this idea to build models that generalise better to variable-length inputs.

Learned Positional Encodings: Data-Driven Position

3 minute read

Published:

Instead of a fixed formula, why not just train position embeddings from scratch — like word embeddings? That’s exactly what BERT and GPT-1 do. Here’s how and when it works.

Sinusoidal Positional Encodings: The Original Solution

3 minute read

Published:

The PE method from the 2017 ‘Attention Is All You Need’ paper uses sine and cosine waves at different frequencies. Learn why this elegant choice encodes position without any training.

Positional Encodings: Why Position Matters

3 minute read

Published:

Transformers see all tokens at once — which means without help they’d treat ‘cat ate mouse’ and ‘mouse ate cat’ the same. Positional encodings fix this. Here’s the full landscape.

Multi-Head Attention: Many Eyes on the Data

2 minute read

Published:

One attention head sees one relationship. Multiple heads running in parallel let the model capture syntax, semantics, and coreference simultaneously — here’s how.

Self-Attention: Teaching Machines to Focus

4 minute read

Published:

Self-attention is the core of every Transformer. Learn how Query, Key, and Value vectors let every token directly attend to every other — and why that matters.

Transformers: The Architecture That Changed AI

7 minute read

Published:

A self-contained guide to the Transformer — the engine behind GPT, BERT, and modern AI. Learn how attention replaces recurrence and why every major AI system uses it.

PartecipationsAndTalks

portfolio

projects

MoonBot Navigation

Autonomous lunar rover navigation and interaction — winner of the TESP 2025 Competition.

RoboMAT

MATLAB library for robotics simulations, kinematics, dynamics, control, and path planning.

UniDrive: University Carpooling App

A Flutter/Dart mobile app that connects university students for ride-sharing — schedule, match, and split commutes within the campus community.

publications

Heterogeneous Sheaf Neural Networks

Published in arXiv preprint arXiv:2409.08036, 2024

HetSheaf is a cellular-sheaf framework for heterogeneous graphs that encodes node and edge types through type-aware local feature spaces and learned restriction maps, without specialised architectural components. The companion SheafPool readout is invariant to basis changes and enables graph-level prediction. The paper reports gains of up to +2 pp on the Heterogeneous Graph Benchmark with up to 10× fewer parameters.

Recommended citation: Braithwaite, L.; Borgi, A.; Onorato, G.; Tarantelli, K.; Restuccia, F.; Silvestri, F.; Liò, P. (2024). "Heterogeneous Sheaf Neural Networks." arXiv:2409.08036.
Go to the Webpage | Download Paper | Download Bibtex

Z-SASLM: Zero-Shot Style-Aligned SLI Blending for Latent Manipulation

Published in CVPR (Computer Vision and Pattern Recognition) 2025 Workshops (Nashville, USA 🇺🇸), 2025

Z-SASLM introduces a zero-shot, fine-tuning-free approach to style alignment in diffusion models by blending multiple reference styles directly in latent space using spherical linear interpolation (SLI) with learned, context-aware weights. The method avoids model retraining, preserves content semantics, and yields consistent style transfer across prompts and seeds.

Recommended citation: Borgi, A.; Maiano, L.; Amerini, I. (2025). "Z-SASLM: Zero-Shot Style-Aligned SLI Blending for Latent Manipulation." CVPR 2025 Workshops.
Go to the Webpage | Download Paper | Download Poster | Download Bibtex | GitHub Code

Remember to Forget: Gated Adaptive Positional Encoding

Published in arXiv preprint arXiv:2605.10414, 2026

GAPE (Gated Adaptive Positional Encoding) addresses core limitations of RoPE in long-context language models. A content-aware bias is injected directly into attention logits while preserving rotary geometry: query-dependent and key-dependent gates suppress irrelevant distant tokens while protecting salient context, improving attention sharpness and long-context performance on retrieval and standard benchmarks.

Recommended citation: Ali, R.; Borgi, A.; Irwin, C.; Severino, M.; Liò, P. (2026). "Remember to Forget: Gated Adaptive Positional Encoding." arXiv:2605.10414.
Go to the Webpage | Download Paper | Download Bibtex

talks

teaching
