Remember to Forget: Gated Adaptive Positional Encoding

Published in arXiv preprint arXiv:2605.10414, 2026

Abstract

We address limitations of Rotary Positional Encoding (RoPE) in language models when handling extended sequences. We propose GAPE (Gated Adaptive Positional Encoding), which introduces a content-aware bias directly into the attention logits while preserving the rotary geometry. The mechanism uses query-dependent and key-dependent gates to manage attention distribution, suppressing irrelevant distant context while protecting salient tokens. GAPE improves attention sharpness and long-context performance on retrieval and benchmark tasks.

Key Contributions

  • GAPE: a drop-in extension to RoPE that adds content-aware gating to attention logits without modifying the rotary geometry.
  • Query-dependent and key-dependent gates that learn to remember salient tokens and forget irrelevant distant context.
  • Improved attention sharpness and long-context retrieval performance without additional parameters in the positional layer.
  • Empirical gains on long-context benchmarks, demonstrating practical utility for extended-context language modelling.
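
The summary above does not give GAPE's exact equations, so the following is only an illustrative sketch of the general idea: a content-dependent bias added to attention logits. It is not the paper's formulation. It assumes the query and key vectors have already been rotated by RoPE, and every name in it (`gated_attention_row`, `w_forget`, `w_protect`, `slope`) and the linear distance penalty are hypothetical choices made for illustration. The explicit gate vectors also mean the sketch does not try to reproduce the paper's parameter-count claim.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def gated_attention_row(q, keys, q_pos, w_forget, w_protect, slope=1.0):
    """Attention weights for one query over keys at positions 0..len(keys)-1.

    Hypothetical gating scheme (an assumption, not taken from the paper):
      forget  = sigmoid(w_forget . q)   query-dependent, in (0, 1)
      protect = sigmoid(w_protect . k)  key-dependent, in (0, 1)
      bias    = -slope * forget * (1 - protect) * distance
    The bias is added to the scaled dot-product logit, so q and k
    themselves (and hence any rotary geometry) are left untouched.
    """
    forget = sigmoid(dot(w_forget, q))
    scale = math.sqrt(len(q))
    logits = []
    for j, k in enumerate(keys):
        protect = sigmoid(dot(w_protect, k))
        distance = q_pos - j
        bias = -slope * forget * (1.0 - protect) * distance
        logits.append(dot(q, k) / scale + bias)
    return softmax(logits)


# Toy example: four keys with identical content scores against q.
# Key 0 carries a large second component that opens its protect gate.
q = [1.0, 0.0]
keys = [[1.0, 5.0], [1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
weights = gated_attention_row(
    q, keys, q_pos=3, w_forget=[2.0, 0.0], w_protect=[0.0, 5.0]
)
print([round(w, 3) for w in weights])
```

In this toy setup the unprotected keys lose weight as their distance from the query grows, while the protected key at position 0 keeps a weight close to that of the nearest key. That is the "remember salient tokens, forget irrelevant distant context" behaviour in miniature.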

Resources

Recommended citation: Ali, R.; Borgi, A.; Irwin, C.; Severino, M.; Liò, P. (2026). "Remember to Forget: Gated Adaptive Positional Encoding." arXiv:2605.10414.
Download Paper | Download Bibtex