@juliekallini.bsky.social
As models get larger, MrT5 gets better.

At 1.23B parameters, the perplexity (PPL) gap between ByT5 and MrT5 shrinks dramatically, suggesting that MrT5’s deletion mechanism scales effectively with model size.

This means better efficiency–performance trade-offs in high-resource settings.
April 23, 2025 at 3:18 PM
MrT5 is a variant of ByT5 that dynamically shortens inputs for faster inference, addressing a key limitation of tokenizer-free models: byte-level sequences are much longer than their subword counterparts!
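
To give a feel for the idea, here is a minimal PyTorch sketch of a per-token deletion gate (class and parameter names are hypothetical, not the released MrT5 code): a learned gate scores each byte token after an early encoder layer, and low-scoring tokens are dropped before the remaining layers run.

```python
import torch
import torch.nn as nn

class DeletionGate(nn.Module):
    """Sketch of a per-token deletion gate (hypothetical, for illustration).

    After an early encoder layer, each byte token gets a keep-score;
    tokens scoring below a threshold are dropped, so the rest of the
    encoder runs on a shorter sequence.
    """
    def __init__(self, d_model: int, threshold: float = 0.5):
        super().__init__()
        self.score = nn.Linear(d_model, 1)  # one keep-score per token
        self.threshold = threshold

    def forward(self, hidden: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
        # hidden: (batch, seq_len, d_model)
        keep_prob = torch.sigmoid(self.score(hidden)).squeeze(-1)  # (batch, seq_len)
        keep_mask = keep_prob > self.threshold  # hard drop at inference time
        return keep_prob, keep_mask

# Usage: drop low-scoring byte tokens, then run the remaining encoder layers
gate = DeletionGate(d_model=512)
hidden = torch.randn(1, 64, 512)
keep_prob, keep_mask = gate(hidden)
shortened = hidden[:, keep_mask[0], :]  # shorter sequence (batch size 1)
```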

In the final version, we include:
- A new controller algorithm for targeted compression rates (see the sketch after this post)
- More baselines and downstream tasks
- MrT5 at the 1.23B-parameter scale
April 23, 2025 at 3:18 PM
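
The paper describes the controller in full; as a rough illustration of the idea, here is a simple proportional update (hypothetical names and gain, not the paper’s exact algorithm) that tunes the weight of the deletion regularizer during training so the observed deletion rate tracks a target:

```python
def update_gate_penalty(alpha: float, observed_rate: float,
                        target_rate: float, k_p: float = 0.01) -> float:
    """One plausible controller step (illustrative, hypothetical names):
    nudge the deletion-regularizer weight alpha so the model's observed
    deletion rate moves toward the target compression rate.
    """
    error = target_rate - observed_rate
    # Deleting too few tokens -> positive error -> stronger deletion pressure
    return max(0.0, alpha + k_p * error)

# Example: the model currently deletes 40% of tokens but we target 50%,
# so the regularizer weight is nudged upward.
alpha = update_gate_penalty(alpha=0.1, observed_rate=0.40, target_rate=0.50)
```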
If you’re at #ICLR2025, come see me present 💪MrT5 on Thursday (4/24)!

🪧 Poster: 10:00–12:30 in Hall 3 + 2B (#273)
⚡️ Lightning talk: right after in Opal 103–104 (Session on Tokenizer-Free, End-to-end Architectures)

Plus, MrT5 has many exciting updates 🧵
April 23, 2025 at 3:18 PM