juliekallini.bsky.social
@juliekallini.bsky.social
I once again want to thank my wonderful coauthors for making this work possible!

@shikharmurty.bsky.social Chris Manning @cgpotts.bsky.social @robertcsordas.bsky.social

Can’t wait to connect with folks @iclr-conf.bsky.social—come say hi if you're around!
April 23, 2025 at 3:18 PM
🧩 Want to use MrT5?

On HuggingFace 🤗 we’re releasing:
- MrT5 Small (300M params): stanfordnlp/mrt5-small
- MrT5 Large (1.23B params): stanfordnlp/mrt5-large

And if you haven’t already, check out the paper!

Paper: arxiv.org/abs/2410.20771
GitHub repo: github.com/jkallini/mrt5
GitHub - jkallini/mrt5: Code repository for the paper "MrT5: Dynamic Token Merging for Efficient Byte-level Language Models."
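A minimal loading sketch with 🤗 transformers, assuming the checkpoints load through AutoTokenizer / AutoModelForSeq2SeqLM (with trust_remote_code=True, since MrT5 is a custom ByT5 variant); see the GitHub repo for the canonical loading code.

```python
# Minimal sketch: loading an MrT5 checkpoint with Hugging Face transformers.
# Assumption: the stanfordnlp/mrt5-* checkpoints use a custom architecture,
# so trust_remote_code=True may be required; the repo at
# github.com/jkallini/mrt5 has the canonical loading code.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "stanfordnlp/mrt5-small"
tokenizer = AutoTokenizer.from_pretrained(model_id)  # byte-level (ByT5-style) tokenizer
model = AutoModelForSeq2SeqLM.from_pretrained(model_id, trust_remote_code=True)

inputs = tokenizer("Translate to French: Hello, world!", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```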
April 23, 2025 at 3:18 PM
As the models get larger, MrT5 gets better.

At 1.23B params, the perplexity (PPL) gap between ByT5 and MrT5 shrinks dramatically, suggesting that MrT5’s deletion mechanism scales effectively with model size.

This means: better efficiency–performance trade-offs in high-resource settings.
April 23, 2025 at 3:18 PM
MrT5 is a variant of ByT5 that dynamically shortens its input sequences for faster inference, addressing the long-sequence inefficiency of tokenizer-free modeling!

In the final version, we include:
- A new controller algorithm for targeted compression rates
- More baselines and downstream tasks
- MrT5 at 1.23B parameter scale
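For intuition, here is a toy sketch of the core idea (not the actual MrT5 implementation): a small learned gate scores each byte token's hidden state after an early encoder layer, and low-scoring tokens are hard-deleted at inference so the remaining layers run on a shorter sequence. The class name and fixed threshold below are illustrative; the real model trains its deletion gate with a controller that targets a chosen compression rate.

```python
import torch
import torch.nn as nn

class ToyDeletionGate(nn.Module):
    """Toy illustration of MrT5-style dynamic token deletion (not the real implementation).

    A linear gate scores each token's hidden state; tokens whose score falls
    below a threshold are dropped, so the remaining encoder layers (and the
    decoder's cross-attention) process a shorter sequence.
    """

    def __init__(self, d_model: int, threshold: float = 0.5):
        super().__init__()
        self.score = nn.Linear(d_model, 1)
        self.threshold = threshold  # illustrative; MrT5 tunes deletion via a controller

    def forward(self, hidden: torch.Tensor, attention_mask: torch.Tensor):
        # hidden: (batch, seq_len, d_model), attention_mask: (batch, seq_len)
        keep_prob = torch.sigmoid(self.score(hidden)).squeeze(-1)    # (batch, seq_len)
        keep = (keep_prob > self.threshold) & attention_mask.bool()  # hard deletion at inference

        # Gather the surviving tokens (batch size 1 shown for simplicity).
        kept_hidden = hidden[keep].unsqueeze(0)
        return kept_hidden, keep

# Usage with dummy inputs: 1 example, 16 byte tokens, d_model=64.
gate = ToyDeletionGate(d_model=64)
h = torch.randn(1, 16, 64)
mask = torch.ones(1, 16, dtype=torch.long)
shorter_h, kept = gate(h, mask)
print(f"kept {kept.sum().item()} of {mask.sum().item()} tokens")
```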
April 23, 2025 at 3:18 PM