Remember to Forget: Gated Adaptive Positional Encoding
Published in arXiv preprint arXiv:2605.10414, 2026
GAPE (Gated Adaptive Positional Encoding) addresses core limitations of Rotary Position Embedding (RoPE) in long-context language models. It injects a content-aware bias directly into the attention logits while preserving the rotary geometry. Query-dependent and key-dependent gates suppress irrelevant distant tokens while protecting salient context, which improves attention sharpness and long-context performance on retrieval and standard benchmarks.
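
The summary above does not give GAPE's exact formulation, so the following is only an illustrative sketch of the general idea: an additive, content-dependent distance bias applied on top of ordinary RoPE attention logits. Everything specific here is an assumption made for illustration and is not taken from the paper: the sigmoid gates, the linear distance penalty, the way the two gates are combined, and the names `gated_bias_attention`, `w_q`, and `w_k`.

```python
import numpy as np

def rope(x, base=10000.0):
    """Standard rotary embedding (rotate-half variant) for x of shape (seq, dim), dim even."""
    seq, dim = x.shape
    half = dim // 2
    freqs = base ** (-np.arange(half) / half)            # (half,)
    angles = np.arange(seq)[:, None] * freqs[None, :]    # (seq, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_bias_attention(q, k, w_q, w_k):
    """Causal attention weights with a hypothetical gated distance bias.

    q, k: (seq, dim) queries and keys; w_q, w_k: (dim,) gate projections.
    The gate form below is an assumption, not the paper's definition.
    """
    seq, dim = q.shape
    # RoPE logits are computed as usual, so the rotary geometry is untouched.
    logits = rope(q) @ rope(k).T / np.sqrt(dim)

    forget = sigmoid(q @ w_q)    # (seq,) query-dependent: how strongly to discount distance
    protect = sigmoid(k @ w_k)   # (seq,) key-dependent: how much a key resists the discount

    i = np.arange(seq)[:, None]
    j = np.arange(seq)[None, :]
    dist = np.maximum(i - j, 0)  # causal distance from query i back to key j

    # Additive bias: distant keys are penalised unless their protect gate is near 1.
    bias = -forget[:, None] * (1.0 - protect[None, :]) * dist
    logits = np.where(j <= i, logits + bias, -np.inf)    # causal mask

    # Numerically stable softmax over keys.
    logits = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
q, k = rng.normal(size=(6, 8)), rng.normal(size=(6, 8))
w_q, w_k = rng.normal(size=8), rng.normal(size=8)
attn = gated_bias_attention(q, k, w_q, w_k)
print(attn.shape)
```

In this sketch the bias is added after the rotary dot product rather than folded into the rotation, which is one simple way to leave the RoPE logits unchanged whenever the gates are inactive.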
