Jannis Born
@jannisblrn.bsky.social
Research Scientist @IBM - AI for Scientific Discovery! Tech & sports enthusiast
Jonas Zausinger*, Lars Pennig*, Anamarija Kozina, Sean Sdahl, Julian Sikora, Adrian Dendorfer, Timofey Kuznetsov, Mohamad Hagog, Nina Wiedemann, Kacper Chlodny, Vincent Limbach, Anna Ketteler, Thorben Prein, Vishwa Mohan Singh & Michael Danziger.

💻 GitHub code: ibm.biz/ntl-code
GitHub - tum-ai/number-token-loss: A regression-alike loss to improve numerical reasoning in language models
July 3, 2025 at 9:21 PM
It was an incredible experience to run this project 🚀 But it only really came to life through the endless effort of all the amazing co-authors 🔥💪

🌐 Landing page: ibm.biz/ntl-main
Regress, Don't Guess -- A Regression-like Loss on Number Tokens for Language Models
While language models have exceptional capabilities at text generation, they lack a natural inductive bias for emitting numbers and thus struggle in tasks involving quantitative reasoning, especially ...
July 3, 2025 at 9:21 PM
5. Text-task friendly: Doesn’t interfere with CE on purely textual tasks 📚
6. Scalable: Tested up to 3B parameters, e.g., with #IBMGranite 3.2 🚀
7. Plug-and-play: It’s “just a loss,” so it’s super easy to adopt 🔢
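To illustrate point 7, here is a hypothetical training step in which NTL is simply added to the usual CE objective. This is only a sketch: `number_token_loss` stands for any NTL implementation (e.g., the one in the repo linked above, or the sketch further down this thread), and the 0.3 weight is an arbitrary placeholder, not a recommended value.

```python
# Hypothetical training step: the standard CE objective stays untouched,
# and NTL is added as a weighted extra term.
# Assumes an encoder-decoder LM (e.g., T5) whose logits align with the labels;
# for causal LMs the labels would need the usual one-position shift.
def training_step(model, batch, digit_token_ids, ntl_weight=0.3):
    outputs = model(input_ids=batch["input_ids"],
                    attention_mask=batch["attention_mask"],
                    labels=batch["labels"])
    ce_loss = outputs.loss                      # standard cross-entropy from the LM head
    ntl = number_token_loss(outputs.logits,     # any NTL implementation
                            batch["labels"],
                            digit_token_ids)
    return ce_loss + ntl_weight * ntl           # joint objective to backpropagate
```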
📄 ICML paper: ibm.biz/ntl-paper
Regress, Don’t Guess – Number Token Loss
A regression-like loss on number tokens for language models.
July 3, 2025 at 9:21 PM
1. Better math performance: NTL consistently boosts accuracy on math benchmarks (e.g., GSM-8K) 📊
2. Lightning-fast: 100× faster to compute than CE, so there’s no training overhead ⚡
3. Model-agnostic: Works with Transformers, Mamba, etc. 🤖
(continued ⬇️ )
🎛️ Hugging Face Spaces demo: ibm.biz/ntl-demo
July 3, 2025 at 9:21 PM
In our upcoming #ICML2025 paper, we introduce the #NumberTokenLoss (NTL) to address this (see the demo above)! NTL is a regression-style loss computed directly at the token level, so no extra regression head is needed. We propose adding NTL on top of the cross-entropy (CE) loss during LLM pretraining. Our experiments show (see ⬇️):
July 3, 2025 at 9:21 PM
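For anyone curious how "a regression-style loss at the token level" can look in code, below is a minimal, unofficial sketch of the NTL-MSE idea in PyTorch. It assumes the number tokens are the single digits "0"–"9" and renormalizes the softmax over those ten tokens; the function and variable names are illustrative only, and the reference implementation lives in the GitHub repo linked above (ibm.biz/ntl-code).

```python
import torch
import torch.nn.functional as F

def number_token_loss(logits, labels, digit_token_ids, ignore_index=-100):
    """Sketch of an NTL-MSE-style loss.

    At every position whose label is a digit token, compare the expected
    numeric value under the model's (renormalized) distribution over the
    ten digit tokens with the true digit value, using a squared error.

    logits:           (batch, seq, vocab) raw LM-head outputs
    labels:           (batch, seq) target token ids
    digit_token_ids:  LongTensor of shape (10,), ids of the tokens "0".."9"
    """
    digit_token_ids = digit_token_ids.to(labels.device)
    digit_values = torch.arange(10, dtype=logits.dtype, device=logits.device)  # 0.0 .. 9.0

    # Positions whose ground-truth token is a digit (and not padding/ignored)
    is_digit = torch.isin(labels, digit_token_ids) & (labels != ignore_index)
    if not is_digit.any():
        return logits.new_zeros(())  # nothing numeric in this batch

    # Model's distribution restricted to the ten digit tokens
    digit_probs = F.softmax(logits[..., digit_token_ids], dim=-1)  # (batch, seq, 10)

    # Expected numeric value the model predicts at each position
    expected_value = (digit_probs * digit_values).sum(dim=-1)      # (batch, seq)

    # Numeric value of the labeled digit token (token id -> digit lookup)
    id_to_value = torch.zeros(int(digit_token_ids.max()) + 1,
                              dtype=logits.dtype, device=logits.device)
    id_to_value[digit_token_ids] = digit_values
    true_value = id_to_value[labels[is_digit]]

    # Regression penalty only where the target actually is a number token
    return F.mse_loss(expected_value[is_digit], true_value)
```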
Great to hear! 🙃 Let me know if you have any questions
January 16, 2025 at 8:58 PM
Full poster
December 14, 2024 at 10:48 PM