@shikharmurty.bsky.social Chris Manning @cgpotts.bsky.social @robertcsordas.bsky.social
Can’t wait to connect with folks @iclr-conf.bsky.social—come say hi if you're around!
On HuggingFace 🤗 we’re releasing:
- MrT5 Small (300M params): stanfordnlp/mrt5-small
- MrT5 Large (1.23B): stanfordnlp/mrt5-large
And if you haven’t already, check out the paper!
Paper: arxiv.org/abs/2410.20771
GitHub repo: github.com/jkallini/mrt5
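If you want to try the checkpoints, here's a minimal loading sketch. It assumes the repos expose the usual transformers interface; since MrT5 is a custom ByT5 variant, trust_remote_code=True may be needed. Check the model cards and the GitHub repo for the exact API.

```python
# Minimal sketch: loading MrT5 Small from the Hugging Face Hub.
# Assumes the checkpoint works with AutoTokenizer / AutoModelForSeq2SeqLM;
# trust_remote_code=True is an assumption, since MrT5 is a custom architecture.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "stanfordnlp/mrt5-small"
tokenizer = AutoTokenizer.from_pretrained(model_id)  # byte-level (ByT5-style) tokenizer
model = AutoModelForSeq2SeqLM.from_pretrained(model_id, trust_remote_code=True)

# MrT5 operates on raw bytes, so any UTF-8 string is valid input.
inputs = tokenizer("Hello, world!", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```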
At 1.23B params, the gap in PPL between ByT5 and MrT5 shrinks dramatically—suggesting that MrT5’s deletion mechanism scales effectively with model size.
This means: better efficiency–performance trade-offs in high-resource settings.
In the final version, we include:
- A new controller algorithm for targeted compression rates
- More baselines and downstream tasks
- MrT5 at 1.23B parameter scale