XPos: Length-Extrapolatable Rotary Embeddings

Why RoPE Still Struggles at Long Range
RoPE is elegant because relative position emerges from rotating the query and key vectors by position-dependent angles. But when the sequence length grows far beyond what the model saw in training, those rotations can become hard for the model to use reliably. The issue is not only phase wrapping: attention scores at large distances also become less well-conditioned.
So the question behind XPos is: can we keep RoPE’s relative-position geometry, but make its long-range behaviour numerically more stable?
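To make the starting point concrete, here is a minimal sketch of plain RoPE in standard-library Python. It checks the property XPos wants to keep: the query-key score depends only on the offset between positions, not on the absolute positions. The helper names (`rope_rotate`, `dot`), the toy vectors, and the frequency base of 10000 are illustrative assumptions, not taken from any particular implementation.

```python
import math

def rope_rotate(vec, pos, base=10000.0):
    """Rotate each consecutive 2D pair of `vec` by the angle pos * theta_i (plain RoPE)."""
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        theta = base ** (-i / d)  # per-pair frequency; slower rotation for later pairs
        c, s = math.cos(pos * theta), math.sin(pos * theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Toy 4-dimensional query and key (hypothetical values).
q = [0.3, -1.2, 0.8, 0.5]
k = [1.0, 0.4, -0.7, 0.9]

# Same offset (m - n = 3) at two very different absolute positions.
score_near = dot(rope_rotate(q, 5), rope_rotate(k, 2))
score_far = dot(rope_rotate(q, 1005), rope_rotate(k, 1002))

print(score_near, score_far)  # the two scores agree up to floating-point error
```

Because each pair is only rotated, vector norms are unchanged. Plain RoPE has no direct handle on how strongly distant tokens interact, which is the gap XPos targets.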
The Core Modification
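Standard RoPE rotates each 2D query-key pair by an angle that depends on token position. XPos applies the same rotation idea, but introduces a position-dependent scale term. In simplified form:
\[
\tilde{q}_m = \alpha^{m} R_m q, \qquad \tilde{k}_n = \alpha^{-n} R_n k
\]
where:
- \(R_m\) is the usual RoPE rotation at position \(m\)
- \(\alpha\) is a learned or fixed scale base close to 1
- \(q\) and \(k\) are the query and key vectors before any positional transform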
The important property is that the relative phase structure is preserved, but the magnitude now changes with distance in a controlled way.
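A small sketch can make this property checkable. It assumes the simplified form in which a query at position `m` is multiplied by `alpha ** m` and a key at position `n` by `alpha ** (-n)`, with one scalar `alpha` shared by all dimensions (the published method uses a per-dimension scale, which this sketch does not reproduce). The function names and the value `alpha=0.99` are hypothetical.

```python
import math

def rotate_pairs(vec, pos, base=10000.0):
    """Plain RoPE: rotate each consecutive 2D pair of `vec` by pos * theta_i."""
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        theta = base ** (-i / d)
        c, s = math.cos(pos * theta), math.sin(pos * theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out

def xpos_query(q, m, alpha=0.99):
    """Assumed simplified XPos query: rotate, then scale by alpha ** m."""
    scale = alpha ** m
    return [scale * x for x in rotate_pairs(q, m)]

def xpos_key(k, n, alpha=0.99):
    """Assumed simplified XPos key: rotate, then scale by alpha ** (-n)."""
    scale = alpha ** (-n)
    return [scale * x for x in rotate_pairs(k, n)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

q = [0.3, -1.2, 0.8, 0.5]
k = [1.0, 0.4, -0.7, 0.9]
m, n = 40, 10

rope_score = dot(rotate_pairs(q, m), rotate_pairs(k, n))
xpos_score = dot(xpos_query(q, m), xpos_key(k, n))
# Shift both positions by 100: the offset m - n is unchanged.
shifted_score = dot(xpos_query(q, m + 100), xpos_key(k, n + 100))

print(xpos_score, 0.99 ** (m - n) * rope_score, shifted_score)
```

The XPos score equals the RoPE score times `alpha ** (m - n)`, and it is unchanged when both positions shift together. Note that `alpha ** m` and `alpha ** (-n)` individually shrink and grow with absolute position, so a practical implementation has to keep those factors in a safe numeric range; this sketch ignores that.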
What Problem This Solves
With plain RoPE, extending sequence length can distort the effective distribution of attention logits. XPos tries to counteract that by making long-range interactions decay in a smoother, more stable way.
In practice, that means:
- better length extrapolation than plain RoPE in some settings
- less brittle long-context behaviour
- a small change to the positional mechanism, not a full architectural rewrite
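To get a feel for what "decay in a smoother way" means, the sketch below tabulates the distance-dependent factor under the assumption of a single scalar `alpha` below 1 and causal attention (the query is never earlier than the key). The function names and the chosen `alpha` values are illustrative only.

```python
import math

def distance_scale(distance, alpha=0.99):
    """Multiplicative factor applied to a query-key score at the given token distance."""
    return alpha ** distance

def half_distance(alpha):
    """Distance at which the factor drops to one half (assumes 0 < alpha < 1)."""
    return math.log(0.5) / math.log(alpha)

for alpha in (0.999, 0.99):
    factors = [distance_scale(d, alpha) for d in (0, 64, 512, 4096)]
    print(alpha, [f"{f:.3g}" for f in factors], round(half_distance(alpha), 1))
```

The closer `alpha` is to 1, the more slowly distant tokens are down-weighted. Choosing it is a trade-off between stable long-range logits and keeping distant context usable.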
How to Think About XPos
If RoPE says, “position is a rotation angle,” then XPos says:
position is mostly a rotation angle, but distance should also slightly reweight how strongly those rotated vectors interact.
That extra degree of control is what makes XPos interesting. It is still fundamentally a rotary method, but it acknowledges that angle alone is not always enough when context length grows.
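The "rotation plus reweighting" reading can be written out as a short derivation. It assumes the simplified form in which a query at position \(m\) is scaled by \(\alpha^{m}\) and a key at position \(n\) by \(\alpha^{-n}\), with \(R_m\) the RoPE rotation; the tilde notation is introduced here for illustration.

```latex
\begin{aligned}
\tilde{q}_m^{\top} \tilde{k}_n
  &= \left(\alpha^{m} R_m q\right)^{\top} \left(\alpha^{-n} R_n k\right) \\
  &= \alpha^{m-n} \, q^{\top} R_m^{\top} R_n \, k \\
  &= \alpha^{m-n} \, q^{\top} R_{n-m} \, k
\end{aligned}
```

The last step uses the fact that rotations compose additively, so \(R_m^{\top} R_n = R_{n-m}\). The score therefore factors into the usual RoPE term, which depends only on the offset, and a scalar weight \(\alpha^{m-n}\), which also depends only on the offset.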
When It Is Useful
XPos is most useful when:
- you already like RoPE’s relative-position behaviour
- you want better extrapolation without switching to a completely different scheme
- you care about long context, but do not want to redesign the attention mechanism
It is less famous than YaRN or LongRoPE in today’s LLM tooling, but conceptually it is one of the cleanest “make RoPE more stable” ideas.
✅ Key Takeaways
- XPos is a rotary embedding extension, not a brand-new positional family.
- It keeps RoPE's relative rotation structure but adds distance-aware scaling.
- The goal is better length extrapolation and better-behaved attention logits at long range.
- You can read it as a principled "RoPE, but more stable" design.
References
- [1] Sun, Y., Dong, L., Patra, B., Ma, S., Huang, S., Benhaim, A., Chaudhary, V., Song, X., Wei, F. (2022). A Length-Extrapolatable Transformer. arXiv:2212.10554.
- [2] Su, J., Lu, Y., Pan, S., Murtadha, A., Wen, B., Liu, Y. (2021). RoFormer: Enhanced Transformer with Rotary Position Embedding. arXiv:2104.09864.
