🔹 RL on top of distilled models? Often negligible gains, and prone to overfitting.
🔹 Supervised finetuning (SFT) on reasoning traces? Stable & generalizable.
– Random seed: swings Pass@1 by 5–15pp
– Temperature/top-p: another ±10pp
– Software stack & hardware? Yes, even those shift scores
🎯 Single-seed results on small datasets are essentially noise.
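To see why, here is a back-of-the-envelope sketch (mine, not from the paper): on a ~30-question benchmark like AIME'24, the binomial standard error of a single Pass@1 run is already several points, before seed, sampling, or hardware effects are even counted.

```python
import math

def pass_at_1_stderr(p: float, n_questions: int) -> float:
    """Binomial standard error of Pass@1 when true accuracy is p over n_questions items."""
    return math.sqrt(p * (1 - p) / n_questions)

# AIME-style benchmarks have ~30 questions, so one question flipping
# right/wrong already moves Pass@1 by ~3.3pp.
for p in (0.3, 0.5, 0.7):
    print(f"accuracy={p:.0%}: ±{pass_at_1_stderr(p, 30):.1%} per run (1 std)")
```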
We re-evaluate recent SFT and RL models for mathematical reasoning and find most gains vanish under rigorous, multi-seed, standardized evaluation.
📊 bethgelab.github.io/sober-reason...
📄 arxiv.org/abs/2504.07086
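The multi-seed protocol is easy to sketch: run the same evaluation under several seeds and report mean ± spread instead of a single number. A toy simulation (hypothetical, not the paper's pipeline) of what that spread looks like on a 30-question benchmark:

```python
import random
import statistics

# Toy simulation (not the authors' code): a model with true accuracy 50%,
# evaluated once per seed on 30 questions. The spread across seeds is
# exactly what a single-seed leaderboard number hides.
def simulated_pass_at_1(true_acc: float, n_questions: int, seed: int) -> float:
    rng = random.Random(seed)
    return sum(rng.random() < true_acc for _ in range(n_questions)) / n_questions

scores = [simulated_pass_at_1(0.5, 30, seed) for seed in range(10)]
print(f"mean={statistics.mean(scores):.1%}  std={statistics.stdev(scores):.1%}  "
      f"range={min(scores):.1%}–{max(scores):.1%}")
```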
🔸 Some questions contain subquestions, but only one answer is labeled, so a model can be marked wrong despite valid reasoning on an unlabeled part. [2/6]