@juliekallini.bsky.social

As the models get larger, MrT5 gets better.

At 1.23B parameters, the perplexity (PPL) gap between ByT5 and MrT5 shrinks dramatically, suggesting that MrT5’s deletion mechanism scales effectively with model size.

This means: better efficiency–performance trade-offs in high-resource settings.

April 23, 2025 at 3:18 PM

MrT5 is a variant of ByT5 that dynamically shortens inputs for faster inference, addressing the limitations of tokenizer-free modeling!

In the final version, we include:
- A new controller algorithm for targeted compression rates
- More baselines and downstream tasks
- MrT5 at 1.23B parameter scale

April 23, 2025 at 3:18 PM

If you’re at #ICLR2025, come see me present 💪MrT5 on Thursday (4/24)!

🪧 Poster: 10–12:30 in Hall 3 + 2B (#273)
⚡️ Lightning talk: right after in Opal 103–104 (Session on Tokenizer-Free, End-to-end Architectures)

Plus, MrT5 has many exciting updates 🧵

April 23, 2025 at 3:18 PM
