Jannis Born
@jannisblrn.bsky.social
Research Scientist @IBM - AI for Scientific Discovery! Tech & sports enthusiast
Jonas Zausinger*, Lars Pennig*, Anamarija Kozina, Sean Sdahl, Julian Sikora, Adrian Dendorfer, Timofey Kuznetsov, Mohamad Hagog, Nina Wiedemann, Kacper Chlodny, Vincent Limbach, Anna Ketteler, Thorben Prein, Vishwa Mohan Singh & Michael Danziger.

💻 GitHub code: ibm.biz/ntl-code
GitHub - tum-ai/number-token-loss: A regression-alike loss to improve numerical reasoning in language models
July 3, 2025 at 9:21 PM
It was an incredible experience to run this project 🚀 But it only really came to life through the endless effort of all the amazing co-authors 🔥💪

🌐 Landing page: ibm.biz/ntl-main
Regress, Don't Guess -- A Regression-like Loss on Number Tokens for Language Models
While language models have exceptional capabilities at text generation, they lack a natural inductive bias for emitting numbers and thus struggle in tasks involving quantitative reasoning, especially ...
July 3, 2025 at 9:21 PM
5. Text-task friendly: Doesn’t interfere with CE on purely textual tasks 📚
6. Scalable: Tested up to 3B parameters, e.g., with #IBMGranite 3.2 🚀
7. Plug-and-play: It’s “just a loss,” so it’s super easy to adopt 🔢
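To illustrate point 7, here is a hypothetical training step in which NTL is simply added to the usual CE objective. This is only a sketch: `number_token_loss` stands for any NTL implementation (e.g., the one in the repo linked above, or the sketch further down this thread), and the 0.3 weight is an arbitrary placeholder, not a recommended value.

```python
# Hypothetical training step: the standard CE objective stays untouched,
# and NTL is added as a weighted extra term.
# Assumes an encoder-decoder LM (e.g., T5) whose logits align with the labels;
# for causal LMs the labels would need the usual one-position shift.
def training_step(model, batch, digit_token_ids, ntl_weight=0.3):
    outputs = model(input_ids=batch["input_ids"],
                    attention_mask=batch["attention_mask"],
                    labels=batch["labels"])
    ce_loss = outputs.loss                      # standard cross-entropy from the LM head
    ntl = number_token_loss(outputs.logits,     # any NTL implementation
                            batch["labels"],
                            digit_token_ids)
    return ce_loss + ntl_weight * ntl           # joint objective to backpropagate
```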
📄 ICML paper: ibm.biz/ntl-paper
Regress, Don’t Guess – Number Token Loss
A regression-like loss on number tokens for language models.
July 3, 2025 at 9:21 PM
1. Better math performance: NTL consistently boosts accuracy on math benchmarks (e.g., GSM-8K) 📊
2. Lightning-fast: 100× faster to compute than CE, so there’s no training overhead ⚡
3. Model-agnostic: Works with Transformers, Mamba, etc. 🤖
(continued ⬇️ )
🎛️ Hugging Face Spaces demo: ibm.biz/ntl-demo
July 3, 2025 at 9:21 PM
In our upcoming #ICML2025 paper, we introduce the #NumberTokenLoss (NTL) to address this (see the demo above)! NTL is a regression-style loss computed directly at the token level, so no extra regression head is needed. We propose adding NTL on top of the cross-entropy (CE) loss during LLM pretraining. Our experiments show (see ⬇️):
July 3, 2025 at 9:21 PM
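For anyone curious how "a regression-style loss at the token level" can look in code, below is a minimal, unofficial sketch of the NTL-MSE idea in PyTorch. It assumes the number tokens are the single digits "0"–"9" and renormalizes the softmax over those ten tokens; the function and variable names are illustrative only, and the reference implementation lives in the GitHub repo linked above (ibm.biz/ntl-code).

```python
import torch
import torch.nn.functional as F

def number_token_loss(logits, labels, digit_token_ids, ignore_index=-100):
    """Sketch of an NTL-MSE-style loss.

    At every position whose label is a digit token, compare the expected
    numeric value under the model's (renormalized) distribution over the
    ten digit tokens with the true digit value, using a squared error.

    logits:           (batch, seq, vocab) raw LM-head outputs
    labels:           (batch, seq) target token ids
    digit_token_ids:  LongTensor of shape (10,), ids of the tokens "0".."9"
    """
    digit_token_ids = digit_token_ids.to(labels.device)
    digit_values = torch.arange(10, dtype=logits.dtype, device=logits.device)  # 0.0 .. 9.0

    # Positions whose ground-truth token is a digit (and not padding/ignored)
    is_digit = torch.isin(labels, digit_token_ids) & (labels != ignore_index)
    if not is_digit.any():
        return logits.new_zeros(())  # nothing numeric in this batch

    # Model's distribution restricted to the ten digit tokens
    digit_probs = F.softmax(logits[..., digit_token_ids], dim=-1)  # (batch, seq, 10)

    # Expected numeric value the model predicts at each position
    expected_value = (digit_probs * digit_values).sum(dim=-1)      # (batch, seq)

    # Numeric value of the labeled digit token (token id -> digit lookup)
    id_to_value = torch.zeros(int(digit_token_ids.max()) + 1,
                              dtype=logits.dtype, device=logits.device)
    id_to_value[digit_token_ids] = digit_values
    true_value = id_to_value[labels[is_digit]]

    # Regression penalty only where the target actually is a number token
    return F.mse_loss(expected_value[is_digit], true_value)
```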
Great to hear! 🙃 Let me know if you have any questions
January 16, 2025 at 8:58 PM
Full poster
December 14, 2024 at 10:48 PM