Remember to Forget: Gated Adaptive Positional Encoding

Published in arXiv preprint arXiv:2605.10414, 2026

Abstract

We address the limitations of Rotary Positional Encoding (RoPE) when language models process long sequences. We propose GAPE (Gated Adaptive Positional Encoding), which adds a content-aware bias directly to the attention logits while preserving the rotary geometry. The mechanism uses query-dependent and key-dependent gates to shape the attention distribution, suppressing irrelevant distant context while protecting salient tokens. GAPE improves attention sharpness and long-context performance on retrieval and benchmark tasks.

Key Contributions

GAPE: a drop-in extension to RoPE that adds content-aware gating to attention logits without modifying the rotary geometry.
Query-dependent and key-dependent gates that learn to remember salient tokens and forget irrelevant distant context.
Improved attention sharpness and long-context retrieval performance without additional parameters in the positional layer.
Empirical gains on long-context benchmarks, demonstrating practical utility for extended-context language modelling.
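
To make the idea of a gated, content-aware logit bias concrete, here is a minimal NumPy sketch. It is not the formulation from the paper: this page does not give the exact equations, so the bias form `-forget[i] * (1 - protect[j]) * (i - j)`, the sigmoid gates, and the names `w_forget`, `w_protect`, and `gated_bias_attention` are all assumptions made for illustration. The only property it is meant to show is the one described above: RoPE is applied to queries and keys unchanged, and a separate additive term on the logits penalises distant keys unless a key-dependent gate protects them.

```python
import numpy as np

def rope(x, base=10000.0):
    """Standard rotary encoding (half-split variant) for x of shape (seq_len, dim), dim even."""
    seq_len, dim = x.shape
    half = dim // 2
    freqs = base ** (-np.arange(half) / half)               # (half,)
    angles = np.arange(seq_len)[:, None] * freqs[None, :]   # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_bias_attention(q, k, v, w_forget, w_protect):
    """Single-head causal attention with RoPE plus a HYPOTHETICAL gated distance bias.

    ASSUMPTION (not taken from the paper):
        bias[i, j] = -forget[i] * (1 - protect[j]) * (i - j)
    where forget is a query-dependent gate and protect a key-dependent gate.
    """
    seq_len, dim = q.shape
    # The rotary geometry is left untouched: plain RoPE dot-product logits.
    logits = rope(q) @ rope(k).T / np.sqrt(dim)

    forget = sigmoid(q @ w_forget)     # (seq_len,), how strongly query i forgets
    protect = sigmoid(k @ w_protect)   # (seq_len,), how strongly key j is protected

    pos = np.arange(seq_len)
    distance = np.maximum(pos[:, None] - pos[None, :], 0)   # i - j for past keys
    bias = -forget[:, None] * (1.0 - protect[None, :]) * distance

    # Causal mask: query i may only attend to keys j <= i.
    causal = pos[None, :] <= pos[:, None]
    logits = np.where(causal, logits + bias, -np.inf)

    # Numerically stable softmax over keys.
    logits = logits - logits.max(axis=-1, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(6, 8)) for _ in range(3))
out, weights = gated_bias_attention(q, k, v, rng.normal(size=8), rng.normal(size=8))
```

In this sketch a key with `protect` close to 1 receives almost no distance penalty, while a query with `forget` close to 0 falls back to ordinary RoPE attention. The two gate vectors here are extra illustrative parameters and say nothing about how the paper parameterises its gates.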

Resources

📄 ArXiv: https://arxiv.org/abs/2605.10414
🧾 BibTeX: https://arxiv.org/bibtex/2605.10414

Recommended citation: Ali, R.; Borgi, A.; Irwin, C.; Severino, M.; Liò, P. (2026). "Remember to Forget: Gated Adaptive Positional Encoding." arXiv:2605.10414.

Alessio Borgi
