Tiancheng Hu
@tiancheng.bsky.social
PhD student @CambridgeLTL; Previously @DLAB @EPFL; Interested in NLP and CSS. Apple Scholar, Gates Scholar.
Great fun working on this with @bminixhofer.bsky.social and Prof. Collier at @cambridgeltl.bsky.social.
Special thanks to Paul Martin and to Arcee AI's Mergekit library.
October 30, 2025 at 5:00 PM
TL;DR: The alignment-calibration trade-off is real, but you don't have to be stuck with the endpoints.
Model merging provides a simple, powerful dial to find the perfect balance of capability and reliability for YOUR application.
Paper here: arxiv.org/abs/2510.17426 (8/8)
Navigating the Alignment-Calibration Trade-off: A Pareto-Superior Frontier via Model Merging
The "alignment tax" of post-training is typically framed as a drop in task accuracy. We show it also involves a severe loss of calibration, making models overconfident, less reliable, and model output...
October 30, 2025 at 5:00 PM
Better calibration has benefits beyond accuracy scores. It helps reduce "mode collapse" in generation tasks, leading to more diverse generations (and higher utility), as measured on NoveltyBench. It also improves model performance on group-level simulation tasks! (7/8)
October 30, 2025 at 5:00 PM
And it gets better with scale! 📈
The benefits of merging, both the accuracy boost and the stability of the "sweet spot", become even more pronounced in larger, more capable models. This echoes prior work showing that merging bigger models is more effective and stable. (6/8)
October 30, 2025 at 5:00 PM
The Pareto-superior frontier is a general phenomenon: across model families (Gemma, Qwen), sizes, and datasets, we consistently find a better-balanced model. We show Qwen 2.5 results on BBH and MMLU-Pro below. (5/8)
October 30, 2025 at 5:00 PM
It's NOT a zero-sum game between base and instruct.
We find a "sweet spot" merge that is Pareto-superior: it has HIGHER accuracy than both parents while substantially restoring the calibration lost during alignment. (4/8)
October 30, 2025 at 5:00 PM
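To make "Pareto-superior" concrete, here's a minimal Python sketch with made-up numbers (illustrative only, not results from the paper): given accuracy and expected calibration error (ECE) for the two parents and several interpolated merges, it extracts the models that no other model beats on both axes.

```python
# Minimal sketch of the Pareto comparison (illustrative numbers only,
# not results from the paper). Higher accuracy and lower ECE are both better.

def pareto_frontier(points):
    """points: dict name -> (accuracy, ece). Return the non-dominated entries."""
    def dominated(a, b):
        # b dominates a: at least as good on both axes, strictly better on one.
        return b[0] >= a[0] and b[1] <= a[1] and (b[0] > a[0] or b[1] < a[1])
    return {
        name: p
        for name, p in points.items()
        if not any(dominated(p, q) for other, q in points.items() if other != name)
    }

# Hypothetical (accuracy, ECE) pairs for one model family.
results = {
    "base (alpha=0.0)":     (0.62, 0.04),
    "merge (alpha=0.3)":    (0.67, 0.05),
    "merge (alpha=0.5)":    (0.70, 0.07),  # the kind of "sweet spot" described above
    "merge (alpha=0.7)":    (0.69, 0.11),
    "instruct (alpha=1.0)": (0.68, 0.15),
}
print(pareto_frontier(results))
# -> base and the alpha=0.3 / alpha=0.5 merges survive; the instruct parent
#    is dominated by the alpha=0.5 merge (lower accuracy AND worse calibration).
```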
Our solution is simple and computationally cheap: model merging.
By interpolating between the well-calibrated base model and its capable but overconfident instruct counterpart, we create a continuous spectrum to navigate this trade-off. No retraining needed.
(3/8)
October 30, 2025 at 5:00 PM
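Roughly, the interpolation looks like this in plain PyTorch (in practice we use Arcee AI's Mergekit; the checkpoint names and alpha value below are placeholders, not the exact setup from the paper):

```python
# Minimal sketch: linear interpolation between a base checkpoint and its
# instruct counterpart. Placeholder names / alpha, not the paper's exact setup.
import torch
from transformers import AutoModelForCausalLM

BASE = "Qwen/Qwen2.5-7B"               # well-calibrated base model
INSTRUCT = "Qwen/Qwen2.5-7B-Instruct"  # capable but overconfident instruct model
ALPHA = 0.5                            # 0.0 = pure base, 1.0 = pure instruct

base = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.bfloat16)
instruct = AutoModelForCausalLM.from_pretrained(INSTRUCT, torch_dtype=torch.bfloat16)
instruct_state = instruct.state_dict()

# Interpolate every weight tensor, then reuse the base model as the merged skeleton.
merged_state = {
    name: (1 - ALPHA) * param + ALPHA * instruct_state[name]
    for name, param in base.state_dict().items()
}
base.load_state_dict(merged_state)
base.save_pretrained(f"merged-alpha-{ALPHA}")
```

Sweeping ALPHA from 0 to 1 gives the continuous spectrum mentioned above.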
Let's start by redefining the problem. We argue the "alignment tax" MUST include the severe loss of model calibration.
Instruction tuning doesn't just nudge performance; it wrecks calibration, causing a huge spike in overconfidence. (2/8)
October 30, 2025 at 5:00 PM
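Overconfidence here is the usual calibration-gap story. A minimal sketch of one standard way to quantify it, expected calibration error (ECE) with equal-width bins (the exact metric and binning in the paper may differ):

```python
# Toy illustration of ECE: an overconfident model answers with near-1.0
# confidence while being right far less often, so its calibration gap is large.
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """confidences: probability of the chosen answer; correct: 0/1 per example."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(confidences[in_bin].mean() - correct[in_bin].mean())
            ece += in_bin.mean() * gap  # weight by the fraction of samples in the bin
    return ece

rng = np.random.default_rng(0)
acc = 0.7
correct = rng.random(10_000) < acc
# Well-calibrated: confidence tracks accuracy -> ECE near 0.
print(expected_calibration_error(np.full(10_000, acc), correct))
# Overconfident: ~0.99 confidence at 70% accuracy -> ECE around 0.29.
print(expected_calibration_error(np.full(10_000, 0.99), correct))
```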
Huge thanks to my amazing collaborators @joachimbaumann.bsky.social @Lorenzo Lupo @nigelcollier.bsky.social @dirkhovy.bsky.social and especially @paul-rottger.bsky.social
@cambridgeltl.bsky.social
Work partially done during my visit to @milanlp.bsky.social. Highly recommended!
October 28, 2025 at 4:54 PM
Check out the paper and data for details!
Paper: arxiv.org/abs/2510.17516
Data: huggingface.co/datasets/pit...
Website: simbench.tiancheng.hu (9/9)
October 28, 2025 at 4:54 PM
Overall, by making progress measurable, SimBench provides the foundation to build more faithful LLM simulators.
Moving forward, we should work on better training strategies for improving LLM social simulators. These will most likely diverge from advances in chat / coding models. (8/9)
October 28, 2025 at 4:54 PM
We find simulation ability correlates most strongly with deep, knowledge-intensive general reasoning (MMLU-Pro, r=0.94), rather than with competition math (AIME, r=0.48).
To simulate humans well, a model needs a broad, nuanced understanding of the world. (7/9)
October 28, 2025 at 4:54 PM
Why does this happen? We dug deeper and found two opposing forces:
✅ a helpful direct effect (+6.46 score): models get much better at following instructions
❌ a harmful indirect effect (-1.74 score): models become less diverse
The challenge: how do we get the good without the bad? (6/9)
October 28, 2025 at 4:54 PM
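For intuition on how a direct/indirect split like this can be computed, here's a toy product-of-coefficients mediation sketch on simulated data (this is NOT our actual estimation procedure, and all numbers are made up to roughly mirror the sign and size of the effects above):

```python
# Toy mediation sketch: decompose the total effect of instruction tuning on
# simulation score into a direct effect and an indirect effect via reduced
# output diversity. Simulated data, purely for illustration.
import numpy as np

rng = np.random.default_rng(0)
n = 2_000
tuned = rng.integers(0, 2, n)                            # 0 = base, 1 = instruction-tuned
diversity = 1.0 - 0.4 * tuned + rng.normal(0, 0.1, n)    # tuning lowers diversity
score = 50 + 6.5 * tuned + 4.0 * diversity + rng.normal(0, 1, n)

def ols(y, X):
    X = np.column_stack([np.ones(len(y))] + list(X))     # add intercept column
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Mediator model: diversity ~ tuned         -> a = effect of tuning on diversity
a = ols(diversity, [tuned])[1]
# Outcome model:  score ~ tuned + diversity -> direct effect and mediator slope b
_, direct, b = ols(score, [tuned, diversity])
indirect = a * b                                         # product of coefficients

print(f"direct effect:   {direct:+.2f}")   # ~ +6.5 (better instruction following)
print(f"indirect effect: {indirect:+.2f}") # ~ -1.6 (harm via lost diversity)
```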
This echoes findings in the calibration literature: current alignment algorithms typically optimize for the single best answer (improving pass@1), causing overconfidence at the expense of the full output distribution.
October 28, 2025 at 4:54 PM
There’s also an alignment-simulation tradeoff:
Instruction-tuning (the process that makes LLMs helpful and safe) improves their ability to predict consensus opinions.
BUT, it actively harms their ability to predict diverse, pluralistic opinions where humans disagree. (5/9)
October 28, 2025 at 4:54 PM
We found a clear log-linear scaling trend.
Across the model families we could test, bigger models are consistently better simulators: performance reliably increases with model size. This suggests that future, larger models could become highly accurate simulators. (4/9)
October 28, 2025 at 4:54 PM
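"Log-linear" just means score grows roughly linearly in log(parameter count). A toy fit below; the sizes and scores are made up for illustration, not our measurements:

```python
# Toy log-linear fit: regress simulation score on log10(parameter count).
# Hypothetical sizes and scores, purely to illustrate the fit.
import numpy as np

params_billions = np.array([1, 3, 7, 14, 32, 72])
scores = np.array([18.0, 23.5, 27.0, 30.5, 34.0, 37.5])

slope, intercept = np.polyfit(np.log10(params_billions * 1e9), scores, deg=1)
print(f"score ~ {intercept:.1f} + {slope:.1f} * log10(params)")
# Extrapolation (use with caution): predicted score for a hypothetical 400B model.
print(intercept + slope * np.log10(400e9))
```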
The best model we tested at release, Claude 3.7 Sonnet, scores just 40.8 out of 100. A lot of room for improvement for LLM social simulators! Interestingly, more test-time compute doesn’t help. This suggests that simulation requires a different type of reasoning than math / coding. (3/9)
October 28, 2025 at 4:54 PM
SimBench is a big, unified benchmark built from 20 diverse datasets with a global participant pool.
It spans moral dilemmas, economic games, psych assessments & more to rigorously test how well LLMs can predict group-level human responses across a wide range of tasks. (2/9)
October 28, 2025 at 4:54 PM