FoPE: Fourier Position Embedding for Length Generalization

6 minute read

Published: May 29, 2026

TL;DR: FoPE looks at positional encoding through the Fourier lens. Its core claim is that long-context failure is partly a frequency-domain problem: attention extends periodically beyond the training window, but existing encodings do not control that periodic extension well enough. FoPE is designed to improve that behaviour, which the paper reports leads to better length generalization than plain RoPE-style extrapolation tricks.

Paper: "Fourier Position Embedding: Enhancing Attention's Periodic Extension for Length Generalization" · arXiv:2412.17739
Authors: Ermo Hua, Che Jiang, Xingtai Lv, Kaiyan Zhang, Youbang Sun, Yuchen Fan, Xuekai Zhu, Biqing Qi, Ning Ding, Bowen Zhou
Venue: arXiv 2024 / ICML 2025 code release

Paper preview: first page of "Fourier Position Embedding: Enhancing Attention’s Periodic Extension for Length Generalization" (Hua et al., 2024).

Figure 1 — FoPE reframes the long-context problem as a periodic-extension problem in attention's frequency domain. Instead of only stretching positional coordinates, it tries to improve how attention behaves when that positional pattern extends beyond the training window. Source: [1].

The big shift: many context-extension methods are heuristic fixes on top of RoPE. FoPE instead starts from a more structural question: if attention behaves periodically in the Fourier sense, how should that periodic extension be designed so the model keeps generalizing when sequences get longer?

Intuition First: The Periodic Extension Problem

Every positional encoding scheme must answer: “what happens when the model is asked about position 8000 but was only trained up to position 4096?”

Most answers are: “stretch or rescale the coordinates so 8000 looks like something familiar.” That is the PI / NTK / YaRN family of answers.
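
As a rough illustration of the "stretch or rescale" answer, here is a minimal sketch of linear position interpolation on RoPE angles. The helper name `rope_angles`, the head dimension, the base of 10000, and the 4096 to 8192 extension are all assumptions chosen for the example, not values taken from the FoPE paper.

```python
import numpy as np

def rope_angles(positions, head_dim=64, base=10000.0):
    """Return RoPE rotation angles, shape (len(positions), head_dim // 2)."""
    # One frequency per pair of dimensions: theta_i = base^(-2i / head_dim).
    inv_freq = base ** (-np.arange(0, head_dim, 2) / head_dim)
    return np.outer(np.asarray(positions, dtype=np.float64), inv_freq)

train_len = 4096       # assumed training window
target_len = 8192      # assumed extended window
scale = train_len / target_len

pos = 8000
plain = rope_angles([pos])                  # angles outside the trained range
interpolated = rope_angles([pos * scale])   # squeezed back into the trained range

print(pos * scale)                                        # 4000.0
print(np.allclose(interpolated, rope_angles([4000])))     # True
```

After rescaling, position 8000 produces exactly the angles that position 4000 produced in training, which is what "looks like something familiar" means here. The cost is that neighbouring positions are now packed twice as densely.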

FoPE asks a more fundamental question: “what is the natural mathematical continuation of the positional signal beyond the training window, and how can we design the encoding so that continuation is well-behaved?”

Think of it this way: a sine wave of frequency f, when you extend its domain, simply continues the same oscillation — that is a clean periodic extension. If your positional encoding is a superposition of sinusoids, the behaviour at longer contexts is entirely determined by how those sinusoids continue. FoPE explicitly controls that continuation in the frequency domain, rather than just rescaling inputs and hoping for the best.
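
A small numerical sketch of that intuition follows. It is a toy signal, not the paper's construction: the window length of 64 and the chosen cycle counts are arbitrary assumptions, and `signal` is a hypothetical helper.

```python
import numpy as np

L = 64                        # assumed training window length
n_train = np.arange(L)        # positions seen in training
n_beyond = n_train + L        # the next window, never seen in training

def signal(n, cycles_per_window):
    """Sum of unit-amplitude cosines; each entry is cycles per window of length L."""
    n = np.asarray(n, dtype=np.float64)[:, None]
    omega = 2 * np.pi * np.asarray(cycles_per_window, dtype=np.float64) / L
    return np.cos(n * omega).sum(axis=1)

# Every component completes a whole number of cycles inside the window,
# so the values beyond the window repeat the values seen inside it.
clean = [1, 3, 8]
repeats_cleanly = np.allclose(signal(n_train, clean), signal(n_beyond, clean))

# One component completes only 0.3 of a cycle inside the window,
# so beyond the window it takes values it never took during training.
slow = [1, 3, 0.3]
repeats_with_slow = np.allclose(signal(n_train, slow), signal(n_beyond, slow))

print(repeats_cleanly, repeats_with_slow)   # True False
```

The point of the toy is only that the behaviour past the window is fixed by the frequency content: components that fit the window extend cleanly, while a single slow component is enough to break the repetition.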

Both curves are identical within the training window. Beyond the training cutoff, a well-designed periodic extension (teal, FoPE-style) continues the oscillation cleanly. A naive extension (red dashed) drifts in amplitude and frequency — the model has no guarantee its attention patterns will be stable in the out-of-distribution region.

Why This Paper Matters

Most of the long-context positional-encoding story has focused on practical recipes:

interpolate positions
rescale RoPE frequencies
blend low- and high-frequency dimensions differently

Those methods work, but they are often introduced as engineering tricks. FoPE is interesting because it tries to explain the same problem more fundamentally. The paper argues that length generalization should be understood through the frequency-domain behaviour of attention, especially through how positional structure extends periodically beyond the training range.

The Core Idea

The name gives away the perspective: Fourier Position Embedding. The method is built around the observation that attention and positional encoding have a natural periodic structure, and that this structure matters once context length grows past what the model saw in training.
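
One standard way to see that periodic structure, given here as background on RoPE rather than as the paper's own derivation: with rotary embeddings, the attention logit between a query at position m and a key at position n depends only on the offset m − n, and it is a sum of sinusoids with one term per rotary frequency. The notation (q, k, d, θ_i, a_i, b_i, and the rotation matrices R_m, R_n) is chosen for this sketch, and the sign of b_i depends on the rotation convention.

\[ \langle R_m q,\; R_n k \rangle \;=\; \sum_{i=0}^{d/2-1} \Big[ a_i \cos\big((m-n)\,\theta_i\big) + b_i \sin\big((m-n)\,\theta_i\big) \Big], \qquad a_i = q_{2i}k_{2i} + q_{2i+1}k_{2i+1}, \quad b_i = q_{2i}k_{2i+1} - q_{2i+1}k_{2i} \]

Each term is periodic in m − n with period 2π/θ_i, so how the logit behaves at unseen offsets is governed by how these sinusoids continue.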

In simplified terms, FoPE says:

\[ \text{good long-context positional encoding} \;\approx\; \text{good periodic extension in the frequency domain} \]

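That is not the exact implementation formula, but it is the right mental model. The paper is less about “one more RoPE scaling constant” and more about controlling how the positional signal continues when the model is pushed to unseen lengths. Concretely, the paper describes FoPE as constructing a Fourier series and zeroing out the frequency components it identifies as destructive.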
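
One concrete reason the continuation can behave badly, sketched under assumed settings: a RoPE dimension whose period is longer than the training window only ever shows the model part of one cycle. The head dimension, base, and window length below are illustrative assumptions, and the "one full cycle" threshold is a simple heuristic for this sketch, not the paper's criterion.

```python
import numpy as np

head_dim = 128        # assumed head dimension
base = 10000.0        # common RoPE base, assumed here
train_len = 4096      # assumed training window

# theta_i = base^(-2i / head_dim), one frequency per pair of dimensions.
inv_freq = base ** (-np.arange(0, head_dim, 2) / head_dim)

# Number of positions needed for each frequency to complete one full cycle.
periods = 2 * np.pi / inv_freq

# True where the training window contains at least one complete cycle.
full_cycle = periods <= train_len

print(int(full_cycle.sum()), "of", head_dim // 2,
      "frequencies complete a cycle within the training window")
```

Under these assumptions a sizeable share of the lowest frequencies never complete a cycle in training, which is the kind of frequency-domain detail that position rescaling alone does not address.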

How It Differs from RoPE Extensions

RoPE and its descendants already use sinusoidal or rotational structure, so they are naturally tied to Fourier ideas. But most RoPE extension methods still act locally:

rescale positions
rescale frequencies
reweight frequency bands
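
To make "rescale frequencies" concrete, here is a minimal sketch of NTK-aware base scaling in its commonly cited form, where the RoPE base is enlarged by scale^(d / (d − 2)). The helper `rope_inv_freq`, the head dimension, and the scale factor of 4 are assumptions for illustration; published variants differ in detail.

```python
import numpy as np

def rope_inv_freq(head_dim, base=10000.0):
    """Per-pair rotary frequencies theta_i = base^(-2i / head_dim)."""
    return base ** (-np.arange(0, head_dim, 2) / head_dim)

head_dim = 64
scale = 4.0   # assumed context-extension factor

# Commonly cited NTK-aware rule: enlarge the base instead of shrinking positions.
ntk_base = 10000.0 * scale ** (head_dim / (head_dim - 2))

original = rope_inv_freq(head_dim)
rescaled = rope_inv_freq(head_dim, base=ntk_base)
ratio = rescaled / original

# The highest frequency is left untouched; the lowest is slowed by 1 / scale.
print(ratio[0], ratio[-1])
```

The adjustment is applied per frequency band: high frequencies keep their resolution and low frequencies are stretched. That is the sense in which these methods act locally on the spectrum rather than asking what the overall continuation should be.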

FoPE is more global in spirit. It asks whether the periodic continuation itself is well shaped for attention. That is why it belongs in the same family as long-context RoPE methods, but still feels conceptually different from NTK scaling or YaRN.

When This Is Useful

FoPE is useful if you want to understand not only how to extend context, but why some positional schemes generalize better than others. It is especially valuable as a conceptual bridge between:

classical Fourier-style positional encodings
rotary embeddings and their long-context fixes
newer attempts to reason about extrapolation through signal-processing or kernel views

So even if you deploy YaRN or LongRoPE in practice, FoPE is the kind of paper that sharpens the mental model behind the whole field.

Where It Fits in the Series

If you read the positional-encoding chapters as a progression, FoPE belongs late in the story:

Sinusoidal / learned / relative explain the early positional ideas
RoPE turns position into rotation
PI / NTK / YaRN / LongRoPE show practical long-context fixes
FoPE steps back and asks what the periodic extension should look like in the first place

That is why it is useful: it is not just another trick, but a more explanatory lens on long-context behaviour.

✅ Key Takeaways

FoPE treats length generalization as a frequency-domain and periodic-extension problem.
It is conceptually close to RoPE extensions, but it aims to explain the problem rather than only patch it.
The method is useful for understanding why some long-context positional encodings extrapolate better.
In the positional-encoding story, FoPE belongs after the practical long-context RoPE fixes.

References

[1] Hua, E., Jiang, C., Lv, X., Zhang, K., Sun, Y., Fan, Y., Zhu, X., Qi, B., Ding, N., Zhou, B. (2024). Fourier Position Embedding: Enhancing Attention’s Periodic Extension for Length Generalization. arXiv:2412.17739.
[2] GitHub repository: TsinghuaC3I/Fourier-Position-Embedding.


Alessio Borgi

