@shikharmurty.bsky.social Chris Manning @cgpotts.bsky.social @robertcsordas.bsky.social
Can’t wait to connect with folks @iclr-conf.bsky.social—come say hi if you're around!
On HuggingFace 🤗 we’re releasing:
- MrT5 Small (300M params): stanfordnlp/mrt5-small
- MrT5 Large (1.23B): stanfordnlp/mrt5-large
And if you haven’t already, check out the paper!
Paper: arxiv.org/abs/2410.20771
GitHub repo: github.com/jkallini/mrt5
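If you want to try the checkpoints, here's a minimal loading sketch. It assumes the repos expose the usual transformers interface; since MrT5 is a custom ByT5 variant, trust_remote_code=True may be needed. Check the model cards and the GitHub repo for the exact API.

```python
# Minimal sketch: loading MrT5 Small from the Hugging Face Hub.
# Assumes the checkpoint works with AutoTokenizer / AutoModelForSeq2SeqLM;
# trust_remote_code=True is an assumption, since MrT5 is a custom architecture.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "stanfordnlp/mrt5-small"
tokenizer = AutoTokenizer.from_pretrained(model_id)  # byte-level (ByT5-style) tokenizer
model = AutoModelForSeq2SeqLM.from_pretrained(model_id, trust_remote_code=True)

# MrT5 operates on raw bytes, so any UTF-8 string is valid input.
inputs = tokenizer("Hello, world!", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```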
At 1.23B params, the gap in PPL between ByT5 and MrT5 shrinks dramatically—suggesting that MrT5’s deletion mechanism scales effectively with model size.
This means: better efficiency–performance trade-offs in high-resource settings.
In the final version, we include:
- A new controller algorithm for targeted compression rates
- More baselines and downstream tasks
- MrT5 at 1.23B parameter scale