Remember to Forget: Gated Adaptive Positional Encoding
Published in arXiv preprint arXiv:2605.10414, 2026
Abstract
We address limitations of Rotary Positional Encoding (RoPE) in language models when handling extended sequences. We propose GAPE (Gated Adaptive Positional Encoding), which introduces a content-aware bias directly into the attention logits while preserving the rotary geometry. The mechanism uses query-dependent and key-dependent gates to manage attention distribution, suppressing irrelevant distant context while protecting salient tokens. GAPE improves attention sharpness and long-context performance on retrieval and benchmark tasks.
Key Contributions
- GAPE: a drop-in extension to RoPE that adds content-aware gating to attention logits without modifying the rotary geometry.
- Query-dependent and key-dependent gates that learn to remember salient tokens and forget irrelevant distant context.
- Improved attention sharpness and long-context retrieval performance without additional parameters in the positional layer.
- Empirical gains on long-context benchmarks, demonstrating practical utility for extended-context language modelling.
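The gating idea summarised above can be illustrated with a small NumPy sketch. This is not the paper's formulation: the bias form `-g_q[i] * (1 - g_k[j]) * |i - j|`, the sigmoid gates, and the projection vectors `w_q` and `w_k` are all assumptions chosen only to show how a content-aware bias can be added to the logits while the RoPE-rotated queries and keys are left untouched.

```python
import numpy as np

def rope(x, base=10000.0):
    """Rotary encoding for x of shape (seq_len, dim), dim even (half-split pairing)."""
    seq_len, dim = x.shape
    half = dim // 2
    freqs = base ** (-np.arange(half) / half)               # one frequency per pair
    angles = np.arange(seq_len)[:, None] * freqs[None, :]   # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_attention(q, k, w_q, w_k):
    """Causal attention weights with a hypothetical gated distance bias.

    w_q and w_k are illustrative gate projections (assumptions, not taken
    from the paper). Returns (weights, bias).
    """
    seq_len, dim = q.shape
    # Standard RoPE logits: the rotary geometry is not modified.
    logits = rope(q) @ rope(k).T / np.sqrt(dim)

    g_q = sigmoid(q @ w_q)  # per-query "forget" strength in (0, 1)
    g_k = sigmoid(k @ w_k)  # per-key "protect" strength in (0, 1)

    pos = np.arange(seq_len)
    dist = np.abs(pos[:, None] - pos[None, :])
    # Assumed bias: distant keys are penalised more when the query gate is
    # high, and less when the key gate marks the token as salient.
    bias = -g_q[:, None] * (1.0 - g_k[None, :]) * dist
    logits = logits + bias

    # Causal mask, then a numerically stable softmax over keys.
    future = pos[None, :] > pos[:, None]
    logits = np.where(future, -np.inf, logits)
    logits -= logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights, bias

# Small usage example with random inputs.
rng = np.random.default_rng(0)
q = rng.normal(size=(6, 8))
k = rng.normal(size=(6, 8))
w_q = rng.normal(size=8)
w_k = rng.normal(size=8)
weights, bias = gated_attention(q, k, w_q, w_k)
```

In this sketch the bias is always non-positive and vanishes on the diagonal, so it can only sharpen attention toward nearby or gate-protected tokens. The actual gate parameterisation and bias used by GAPE should be taken from the paper itself.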
Resources
- arXiv: https://arxiv.org/abs/2605.10414
- BibTeX: https://arxiv.org/bibtex/2605.10414
Recommended citation: Ali, R.; Borgi, A.; Irwin, C.; Severino, M.; Liò, P. (2026). "Remember to Forget: Gated Adaptive Positional Encoding." arXiv:2605.10414.
