Sitemap
A list of all the posts and pages found on the site. For you robots out there, there is an XML version available for digesting as well.
Pages
Posts
Output, Gated, and Special Activations: Softmax, GLU, SIREN, and More
Not every activation is a hidden-layer curve. Some produce probabilities, some implement learned gates, some shrink values toward zero, and some are designed for very specialized settings such as implicit neural representations.
Modern Activation Functions: GELU, SiLU, Mish, and Smooth Gating
Once ReLU became the default, researchers started asking a better question: can we keep the easy optimization while making the activation smoother, softer, and more expressive? This chapter covers the modern answers.
Activation Functions in Neural Networks: Why Non-Linearity Matters
Activation functions are the reason neural networks can model curved decision boundaries instead of collapsing into one giant linear map. This chapter builds the intuition first, then walks through the classical functions that shaped deep learning.
FoPE: Fourier Position Embedding for Length Generalization
FoPE rethinks long-context positional encoding from a frequency-domain perspective. Instead of only stretching RoPE heuristically, it explicitly improves attention’s periodic extension so Transformers generalize more gracefully to longer sequences.
Position Interpolation: Extending RoPE with Minimal Fine-Tuning
Position Interpolation rescales positions before applying RoPE so a model trained on short contexts can be adapted to longer ones with surprisingly little fine-tuning. It became the reference baseline for long-context RoPE extension.
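As a rough illustration of the rescaling idea, the sketch below squeezes positions from a longer target window back into the trained range before computing rotary angles. The function names, the 2048 and 8192 window sizes, and the base of 10000 are assumptions chosen for the example, not values taken from the post.

```python
def interpolate_position(position, trained_len, target_len):
    """Linearly rescale a position so target_len maps onto trained_len."""
    scale = min(1.0, trained_len / target_len)  # never stretch short contexts
    return position * scale


def rope_angles(position, dim, base=10000.0):
    """Rotation angle for each 2-D pair of a RoPE-encoded vector."""
    return [position * base ** (-i / dim) for i in range(0, dim, 2)]


trained_len, target_len = 2048, 8192  # hypothetical window sizes
last = interpolate_position(target_len - 1, trained_len, target_len)
angles = rope_angles(last, dim=8)
print(last, angles[0])
```

The last position of the extended window lands just inside the trained range, so the model only ever sees rotation angles it could have met during training, at a finer spacing.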
XPos: Length-Extrapolatable Rotary Embeddings
XPos modifies RoPE with a multiplicative decay that keeps relative rotations while stabilising magnitude at long distance. It is one of the cleanest attempts to make rotary embeddings extrapolate better.
p-RoPE: What Makes Rotary Positional Encodings Useful?
This paper does two things at once: it explains what RoPE is really doing inside a trained LLM, and it proposes p-RoPE, a partial rotary variant that drops the lowest frequencies to preserve stronger semantic channels.
SheafPool: Basis-Invariant Graph Readout for Sheaf Neural Networks
SheafPool solves a key missing piece in sheaf GNNs: graph-level pooling. Instead of averaging stalk vectors in arbitrary local bases, it aligns them into a shared canonical frame and builds a readout that is invariant to local basis changes.
GAPE: Remember to Forget — Gated Adaptive Positional Encoding
GAPE is a drop-in RoPE augmentation that adds content-aware attention logit biases: a query-gate suppresses irrelevant distant context while a key-gate preserves salient distant tokens. Provably sharper attention and improved long-context robustness — no architecture changes needed.
PolyNSD: Polynomial Neural Sheaf Diffusion
PolyNSD replaces the NSD propagation operator with a degree-K Chebyshev polynomial in the normalised sheaf Laplacian, achieving SOTA on homo- and heterophilic benchmarks with only diagonal restriction maps and dramatically lower memory usage.
Z-SASLM: Zero-Shot Style Blending via Spherical Interpolation
Z-SASLM is a zero-shot, fine-tuning-free style blending pipeline that replaces linear latent interpolation with SLERP along the geodesic of the hypersphere, preserving latent manifold structure when blending multiple styles. Published at CVPR 2025 Workshop.
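To make the difference between linear and spherical interpolation concrete, here is a minimal two-vector sketch. It is not the Z-SASLM pipeline: the helper names are hypothetical, the inputs are toy 2-D vectors, and multi-style weighting is left out entirely.

```python
import math


def length(v):
    return math.sqrt(sum(x * x for x in v))


def normalize(v):
    norm = length(v)
    return [x / norm for x in v]


def slerp(a, b, t):
    """Interpolate along the great circle between the directions of a and b."""
    ua, ub = normalize(a), normalize(b)
    cos_omega = max(-1.0, min(1.0, sum(x * y for x, y in zip(ua, ub))))
    omega = math.acos(cos_omega)
    sin_omega = math.sin(omega)
    if sin_omega < 1e-8:
        if cos_omega < 0:
            raise ValueError("antipodal inputs have no unique geodesic")
        # Nearly parallel inputs: a normalised linear blend is safe here.
        return normalize([(1 - t) * x + t * y for x, y in zip(ua, ub)])
    wa = math.sin((1 - t) * omega) / sin_omega
    wb = math.sin(t * omega) / sin_omega
    return [wa * x + wb * y for x, y in zip(ua, ub)]


a, b = [1.0, 0.0], [0.0, 1.0]
slerp_mid = slerp(a, b, 0.5)
lerp_mid = [0.5 * x + 0.5 * y for x, y in zip(a, b)]
print(length(slerp_mid), length(lerp_mid))
```

The spherical midpoint stays on the unit sphere, while the linear midpoint cuts through its interior and loses norm, which is the effect the blurb refers to as leaving the latent manifold.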
HetSheaf: Heterogeneous Graphs Meet Cellular Sheaves
HetSheaf encodes graph heterogeneity directly in the sheaf data structure — type-aware stalks and restriction maps conditioned on node and edge types — instead of specialised architectural components, achieving +2pp on HGB with 10× fewer parameters.
LongRoPE: Extending Context to 2 Million Tokens
LongRoPE (Microsoft, 2024) pushes RoPE-based context to 2M tokens by searching for optimal per-dimension rescaling factors — far outperforming NTK or YaRN at extreme lengths.
YaRN: Yet Another RoPE Extension Method
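YaRN interpolates RoPE frequencies by parts: high-frequency dimensions are left essentially unchanged, low-frequency dimensions are linearly interpolated, and the band in between is blended. Combined with an attention temperature correction, this yields better long-context performance with minimal fine-tuning.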
NTK-Aware Scaling: Extending Context Without Fine-Tuning
NTK-Aware Scaling extends the context window of RoPE-based models by rescaling frequencies using Neural Tangent Kernel theory — with no fine-tuning required.
The Transformer Block: Putting It All Together
A single Transformer block combines attention, residuals, layer norm, and an FFN into one reusable unit. Understanding this block is understanding the Transformer.
Feed-Forward Networks: The Forgotten Half of Transformers
The FFN holds roughly two-thirds of a standard Transformer block’s parameters and appears to do much of the model’s factual recall. Yet introductions that focus on attention almost always overlook it.
Residual Connections: Why Transformers Can Be Deep
Without residual connections, training a 96-layer Transformer would be practically impossible. The skip connection is a simple addition that gives gradients a direct path through the network, which mitigates the vanishing gradient problem and makes very deep stacks trainable.
Layer Normalization in Transformers
Layer norm is not optional plumbing. It determines training stability, gradient flow, and whether deep Transformers converge at all. Pre-LN vs Post-LN is not a detail — it changes training dynamics fundamentally.
Encoder vs Decoder vs Encoder-Decoder Transformers
BERT, GPT, and T5 are all Transformers — but their architectures are fundamentally different. One comparison table clarifies the entire landscape.
Cross-Attention: How Models Attend to Another Sequence
Cross-attention lets one sequence query information from a completely different sequence. It is the bridge between encoder and decoder, and the core of multimodal AI.
Attention Masks: Causal, Padding, and Bidirectional
The difference between GPT, BERT, and T5 is largely a masking decision. Learn how causal, padding, and bidirectional masks shape what each token is allowed to see.
Query, Key, Value: The Intuition Behind QKV
Q, K, and V are not arbitrary labels. They map precisely onto search queries, database labels, and retrieved content — a framework you already understand.
Scaled Dot-Product Attention: Why the √d Matters
Dividing by √d_k is not just a trick — it prevents softmax from saturating and dying in high-dimensional spaces. Here’s the math and the intuition.
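A small numerical sketch can show the saturation effect. The vectors below are random Gaussian stand-ins for real queries and keys, and d_k = 512 and the seed are arbitrary assumptions made for the example.

```python
import math
import random


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query, keys, scale=True):
    """Softmax over query-key dot products, optionally divided by sqrt(d_k)."""
    d_k = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    if scale:
        scores = [s / math.sqrt(d_k) for s in scores]
    return softmax(scores)


random.seed(0)
d_k = 512
query = [random.gauss(0, 1) for _ in range(d_k)]
keys = [[random.gauss(0, 1) for _ in range(d_k)] for _ in range(8)]

unscaled = attention_weights(query, keys, scale=False)
scaled = attention_weights(query, keys, scale=True)
print(max(unscaled), max(scaled))
```

With unit-variance entries the raw dot products have a standard deviation near √d_k, so the unscaled softmax tends to put almost all of its mass on one key, while the scaled version keeps the weights spread out.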
ALiBi: Attention with Linear Biases
ALiBi skips traditional positional embeddings entirely and just subtracts a distance penalty from attention scores. Zero extra parameters, excellent extrapolation. Press et al., 2022.
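The distance penalty can be sketched in a few lines. The slope schedule below is the geometric sequence commonly described for power-of-two head counts and should be read as an assumption, as should the causal masking of future tokens; the function names are hypothetical.

```python
def alibi_slopes(num_heads):
    """Per-head slopes: a geometric sequence starting at 2^(-8/num_heads).

    Assumed schedule for power-of-two head counts.
    """
    return [2 ** (-8.0 * (h + 1) / num_heads) for h in range(num_heads)]


def alibi_bias(seq_len, slope):
    """Bias added to attention scores: -slope * distance, future tokens masked."""
    return [
        [-slope * (i - j) if j <= i else float("-inf") for j in range(seq_len)]
        for i in range(seq_len)
    ]


slopes = alibi_slopes(8)
bias = alibi_bias(4, slopes[0])
for row in bias:
    print(row)
```

Each head simply adds its bias matrix to the raw attention scores before the softmax, so nothing is learned and no position vectors are added to the token embeddings.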
RoPE: Rotary Position Embeddings
RoPE encodes position by rotating query and key vectors by an angle proportional to position. The clever result: absolute encoding produces relative attention for free — and it’s now the dominant PE for large language models.
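The "relative attention for free" property can be checked numerically. The sketch below rotates toy 4-dimensional vectors pair by pair; the function names, the example vectors, and the base of 10000 are assumptions for illustration.

```python
import math


def rope(vec, position, base=10000.0):
    """Rotate each consecutive pair of vec by an angle proportional to position."""
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


q = [0.3, -1.2, 0.7, 0.5]
k = [1.1, 0.4, -0.6, 0.9]

# Same offset (3) at two very different absolute positions.
score_near = dot(rope(q, 5), rope(k, 2))
score_far = dot(rope(q, 105), rope(k, 102))
print(score_near, score_far)
```

Both scores agree up to floating-point error, because the product of the two rotations depends only on the difference between the positions.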
Relative Positional Encodings: It’s All About Distance
Instead of asking ‘where am I?’, relative PEs ask ‘how far are these two tokens apart?’ Shaw et al. and T5 both use this idea to build models that generalise better to variable-length inputs.
Learned Positional Encodings: Data-Driven Position
Instead of a fixed formula, why not just train position embeddings from scratch — like word embeddings? That’s exactly what BERT and GPT-1 do. Here’s how and when it works.
Sinusoidal Positional Encodings: The Original Solution
The PE method from the 2017 ‘Attention Is All You Need’ paper uses sine and cosine waves at different frequencies. Learn why this elegant choice encodes position without any training.
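For reference, the sine and cosine scheme fits in one short function. This is a minimal sketch with a hypothetical function name, using the base of 10000 from the original formula and assuming an even model dimension.

```python
import math


def sinusoidal_encoding(position, d_model, base=10000.0):
    """Interleave sin and cos at geometrically spaced frequencies (d_model even)."""
    pe = []
    for i in range(0, d_model, 2):
        angle = position / base ** (i / d_model)
        pe.append(math.sin(angle))  # even index
        pe.append(math.cos(angle))  # odd index
    return pe


first = sinusoidal_encoding(0, 4)
later = sinusoidal_encoding(7, 6)
print(first)
print(later)
```

Every value is a fixed function of the position and the dimension index, so the table can be computed once for any sequence length with nothing to train.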
Positional Encodings: Why Position Matters
Transformers see all tokens at once — which means without help they’d treat ‘cat ate mouse’ and ‘mouse ate cat’ the same. Positional encodings fix this. Here’s the full landscape.
Multi-Head Attention: Many Eyes on the Data
One attention head sees one relationship. Multiple heads running in parallel let the model capture syntax, semantics, and coreference simultaneously — here’s how.
Self-Attention: Teaching Machines to Focus
Self-attention is the core of every Transformer. Learn how Query, Key, and Value vectors let every token directly attend to every other — and why that matters.
Transformers: The Architecture That Changed AI
A self-contained guide to the Transformer — the engine behind GPT, BERT, and modern AI. Learn how attention replaces recurrence and why every major AI system uses it.
PartecipationsAndTalks
projects
AdaViT (Adaptive Vision Transformers)
Adaptive Vision Transformer with dynamic token sparsification and halting for efficient image classification.
ALPR: Automatic License Plate Recognition System
End-to-end real-time license plate detection and OCR pipeline with dual GUIs — one for security managers, one for drivers — built with PyTorch and Streamlit.
(AMR) Autonomous Mobile Robotics Cleaning Robot
Autonomous mobile robot for indoor cleaning with navigation, obstacle avoidance, and task orchestration.
AutoDriveCarSimulator: Autonomous Driving with CNNs
A simulation platform for developing and testing autonomous driving algorithms — using CNNs to map raw camera frames to steering and throttle commands.
BioHeat PINNs: Temperature Estimation with Bio-Heat Equation using Physics-Informed Neural Networks
Physics-Informed Neural Networks for real-time temperature estimation via the Pennes Bio-Heat Equation, supporting hyperthermia therapy control.
CareConnect: AI-Driven Hospital Environment Monitoring System
An AI system for querying hospital environmental sensor data via natural language chat, generating real-time graphs, and triggering automated actions via LangChain and MQTT.
Clustering-Deepening: Clustering Algorithms for Object Tracking & Image Segmentation
An in-depth study of clustering algorithms — from k-Means to DBSCAN and GMMs — applied to object tracking and image segmentation.
ElectricCompany-TicketingSystem: IT Infrastructure & Ticketing for an Electric Consultancy
End-to-end analysis and implementation of IT infrastructure (disaster recovery, smart working, fleet management) and a full ticketing system for an electric consultancy firm.
EmailSpamDetector: Spam Detection with Bidirectional LSTMs
Classifies spam and ham emails using a Bidirectional LSTM — capturing both forward and backward temporal context in email text for high-accuracy filtering.
HelpDeskSystem: Web-Based Customer Support Platform
A full-stack web help desk for issue tracking and customer support — with ticket management, user authentication, and real-time status updates.
Home-Automation: Smart Home with IoT and Arduino
End-to-end smart home system — from a physical miniature house build to Arduino-powered sensors, automated routines, and a companion mobile app.
InstaSocial: Photo-Sharing Social Platform
A full-stack Instagram-like photo sharing app — upload, explore, like, and comment — built with Vue.js frontend, Go REST API, and Docker deployment.
Java-CategoryTheory: A Category Theory Library in Java
A Java library that models core Category Theory constructs — categories, functors, natural transformations — and demonstrates their practical role in software design.
MLPipelineOptimizationStudy: End-to-End ML Pipeline Exploration
A systematic exploration of ML pipeline optimisation — covering preprocessing, feature engineering, model selection, and hyperparameter tuning across multiple algorithms.
MoonBot Navigation
Autonomous lunar rover navigation and interaction — winner of the TESP 2025 Competition.
NSIO: Neural Search Indexing Optimization
Optimising the Differentiable Search Index (DSI) with data augmentation and parameter-efficient fine-tuning (LoRA, QLoRA, AdaLoRA) — evaluated on MS MARCO.
PC-Performance-Monitoring: Statistical Analysis & ML for System Metrics
Collects, analyses, and visualises PC performance metrics — then applies ML clustering to detect anomalies and performance degradation patterns.
QRCodeGenerator: Custom Static QR Code Generator
Generate static, unlimited-use QR codes with custom styles, embedded icons, and optional captions — entirely in Python.
RealTime-VLM: Real-Time Vision-Language Model Inference in the Browser
Browser-based real-time VLM inference — continuously captures webcam frames and feeds them to any OpenAI-compatible vision API with sub-second latency.
RoboMAT
MATLAB library for robotics simulations, kinematics, dynamics, control, and path planning.
RTAD5G: Real-Time Anomaly Detection in 5G Networks
A real-time anomaly detection pipeline for 5G network telemetry, developed in collaboration with Hewlett Packard Enterprise (HPE).
SkinMe: Deep Learning for Skin Disease Detection
A deep learning application that classifies skin conditions from dermoscopic images using CNNs and LSTMs, supporting early diagnosis assistance.
StyleAligned: Zero-Shot Style Alignment in Text-to-Image Generation
A zero-shot framework for consistent style transfer in text-to-image generation — using minimal shared attention to propagate a reference style without fine-tuning.
UniDrive: University Carpooling App
A Flutter/Dart mobile app that connects university students for ride-sharing — schedule, match, and split commutes within the campus community.
XGNNs: Model-level Explanation of Graph Neural Networks with RL through Graph Generation
Model-level explanations for GNNs via reinforcement-learned graph generation on MUTAG.
Z-SASLM: Zero-Shot Multi-Style Image Synthesis via Spherical Linear Interpolation
CVPR 2025 workshop paper — a zero-shot framework for smooth multi-style image synthesis using Spherical Linear Interpolation in the latent space of diffusion models.
publications
Heterogeneous Sheaf Neural Networks
Published in arXiv preprint arXiv:2409.08036, 2024
HetSheaf is a cellular-sheaf framework for heterogeneous graphs that encodes node and edge types through type-aware local feature spaces and learned restriction maps — without specialised architectural components. The companion SheafPool readout is invariant to basis changes and enables graph-level prediction. Gains of up to +2 pp on the Heterogeneous Graph Benchmark with up to 10× fewer parameters.
Recommended citation: Braithwaite, L.; Borgi, A.; Onorato, G.; Tarantelli, K.; Restuccia, F.; Silvestri, F.; Liò, P. (2024). "Heterogeneous Sheaf Neural Networks." arXiv:2409.08036.
Z-SASLM: Zero-Shot Style-Aligned SLI Blending for Latent Manipulation
Published in CVPR (Computer Vision and Pattern Recognition) 2025 Workshops (Nashville, USA 🇺🇸), 2025
Z-SASLM introduces a zero-shot, fine-tuning-free approach to style alignment in diffusion models by blending multiple reference styles directly in latent space using spherical linear interpolation (SLI) with learned, context-aware weights. The method avoids model retraining, preserves content semantics, and yields consistent style transfer across prompts and seeds.
Recommended citation: Borgi, A.; Maiano, L.; Amerini, I. (2025). "Z-SASLM: Zero-Shot Style-Aligned SLI Blending for Latent Manipulation." CVPR 2025 Workshops.
Polynomial Neural Sheaf Diffusion: A Spectral Filtering Approach on Cellular Sheaves
Published in arXiv preprint arXiv:2512.00242, 2025
ArXiv preprint on Polynomial Neural Sheaf Diffusion: A Spectral Filtering Approach on Cellular Sheaves.
Recommended citation: Borgi, A.; Silvestri, F.; Liò, P. (2025). "Polynomial Neural Sheaf Diffusion: A Spectral Filtering Approach on Cellular Sheaves." arXiv:2512.00242.
Remember to Forget: Gated Adaptive Positional Encoding
Published in arXiv preprint arXiv:2605.10414, 2026
GAPE (Gated Adaptive Positional Encoding) addresses core limitations of RoPE in long-context language models. A content-aware bias is injected directly into attention logits while preserving rotary geometry: query-dependent and key-dependent gates suppress irrelevant distant tokens while protecting salient context, improving attention sharpness and long-context performance on retrieval and standard benchmarks.
Recommended citation: Ali, R.; Borgi, A.; Irwin, C.; Severino, M.; Liò, P. (2026). "Remember to Forget: Gated Adaptive Positional Encoding." arXiv:2605.10414.
