T5: Every NLP Task as Text-to-Text
Published:
The Unifying Idea
Different NLP models used to need different architectures and training objectives. BERT needs a classification head for sentiment; a seq2seq model for translation; another model for QA.
T5 asks: what if we just described the task in natural language as part of the input?
Every task gets a text prefix that tells the model what to do. The model is trained with teacher-forcing on the target text. At inference, it generates the answer token-by-token.
Architecture: Full Encoder-Decoder
T5 uses the original Transformer’s full encoder-decoder:
- Encoder: reads the input (prefix + content), builds rich contextual representations.
- Decoder: generates the output token-by-token, attending to both its own previous outputs (causal self-attention) and the encoder representations (cross-attention).
This is different from BERT (encoder only) and GPT (decoder only).
Pre-Training: Span Corruption
T5 doesn’t use masked LM (BERT-style) — it uses span corruption: randomly select spans of 2–5 consecutive tokens, replace each span with a single sentinel token, and train the decoder to reconstruct the original spans.
Original: "The cat sat on the mat."
Corrupted: "The cat <extra_id_0> the <extra_id_1>."
Target: "<extra_id_0> sat on <extra_id_1> mat."
This is more efficient than masking individual tokens and produces better representations.
Scale: T5-Small to T5-11B
| Model | Params |
|---|---|
| T5-small | 60M |
| T5-base | 220M |
| T5-large | 770M |
| T5-XL | 3B |
| T5-XXL / T5-11B | 11B |
Flan-T5 (2022) is T5 further fine-tuned on 1,836 language tasks — making it an excellent open-source instruction-following model.
✅ Key Takeaways
- T5 unifies all NLP tasks as text-in, text-out using task prefixes in natural language.
- Uses full encoder-decoder architecture with cross-attention connecting the two halves.
- Pre-trained via span corruption rather than token masking — more efficient.
- Flan-T5 adds instruction fine-tuning, making it a strong open-source baseline for reasoning tasks.
