Thibaut Boissin
thib-s.bsky.social
Reposted by Thibaut Boissin
🔥I am super excited for the official release of an open-source library we've been working on for about a year!

🪄interpreto is an interpretability toolbox for HF language models🤗. In both generation and classification!

Why do you need it, and for what?

1/8 (links at the end)
January 20, 2026 at 4:03 PM
Good news: I managed to get an extra 1.6x speedup of the Newton-Schulz algorithm (which is at the core of Dion/Muon). It now reaches nearly a 3x speedup over the plain torch implementation!
September 21, 2025 at 8:06 PM
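For readers unfamiliar with Newton-Schulz, here is a minimal NumPy sketch of the classic cubic iteration, which pushes all singular values of a matrix toward 1 using only matrix multiplications. This is not the optimized kernel from the post, and not necessarily the variant used in Dion/Muon: the function name, the step count, and the Frobenius-norm scaling are assumptions made for illustration.

```python
import numpy as np

def newton_schulz_orthogonalize(g, steps=20, eps=1e-7):
    """Approximate the orthogonal polar factor of g (hypothetical helper).

    Uses the classic cubic iteration X <- 1.5 X - 0.5 X X^T X, which only
    needs matmuls and converges when all singular values start in (0, sqrt(3)).
    """
    # Frobenius-norm scaling (an assumption) puts every singular value in (0, 1].
    x = g / (np.linalg.norm(g) + eps)
    for _ in range(steps):
        x = 1.5 * x - 0.5 * (x @ x.T @ x)
    return x

rng = np.random.default_rng(0)
g = rng.standard_normal((4, 16))  # wide matrix: the rows end up orthonormal
q = newton_schulz_orthogonalize(g)

# Largest deviation of q @ q.T from the identity; should be close to zero.
ortho_error = np.abs(q @ q.T - np.eye(4)).max()
print(f"max |QQ^T - I| = {ortho_error:.2e}")
```

The loop is dominated by the two matrix products per step, which is presumably why memory traffic and kernel-level tuning matter so much for this algorithm.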
Sharing my journey to learn Triton: still WIP, but I/O optimization yields a decent runtime improvement (around 25% on 512x512) on Newton-Schulz (as used in Dion/Muon).
August 10, 2025 at 10:15 AM
My journey with Triton
August 7, 2025 at 10:00 AM
It's likely better to have a larger model in FP4 than a smaller one in FP8 (if you can train it):
- Improved non-linearity utilization with larger feature vectors
- Enhanced hardware utilization on Blackwell architectures
- Stress-tests your training and yields models robust to input noise

more below
August 3, 2025 at 11:01 AM
Beyond robustness: Lipschitz networks = stability.
Different inits, different seeds, different weights—same function.
A thread 🧵
July 25, 2025 at 7:44 PM
Some bad, but creative, training losses 👌
June 10, 2025 at 9:55 PM