Claas Voelcker
@cvoelcker.bsky.social
For professional matters, see https://cvoelcker.de

If I seem very angry, check if I have been watered in the last 24 hours.

Now 🇺🇸 flavoured, previously available in 🇨🇦 and 🇩🇪
Pinned
I'm Claas, a hopefully-soon-finished-PhD researcher at @uoft.bsky.social. I work on reinforcement learning, especially on deep model-based methods, and am dabbling in diffusion policies and large-scale imitation learning.
I'm way too political and loud in general, so please be warned.
And most importantly, I'm at least 200% nicer than I appear in any given moment, so let me know if you really just want to geek out about random shit!
Hyperparameters are a social construct (this is not irony or just sh*tposting)
February 10, 2026 at 5:01 AM
OMFG my former boss got featured on r/LinkedInLunatics 😂😂😂
February 9, 2026 at 10:57 PM
Reposted by Claas Voelcker
🎉 Really excited: our paper "XQC: Well-conditioned Optimization Accelerates Deep Reinforcement Learning" has been accepted at #ICLR2026.

If you are interested in reinforcement learning, sample efficiency, or compute efficiency, go check it out. See you in Rio!
🚀 New preprint! Introducing XQC, a simple, well-conditioned actor-critic that achieves SOTA sample efficiency in #RL
✅ ~4.5× fewer parameters than SimbaV2
✅ Scales to vision-based RL
👉 arxiv.org/pdf/2509.25174

Thanks to Florian Vogt @joemwatson.bsky.social @jan-peters.bsky.social
February 3, 2026 at 10:33 AM
"PPO is not good, a thousand labs just reward tuned for it" is something I want to get tattooed so badly...
February 2, 2026 at 8:13 PM
Looking at the math scores, I'd say at pass@1 (which is the thing RL actually optimizes for), the method is clearly outperformed by RL, at least within the math distribution. So the claim seems ... wrong? Am I missing something?
Also, why do people write "RL doesn't work" papers so passionately?
They found that much of LLM “reasoning” doesn’t come from RL training; it comes from how you sample the model.

Paper: Scalable Power Sampling: Unlocking Efficient, Training-Free Reasoning for LLMs via Distribution Sharpening
( www.arxiv.org/abs/2601.21590 )
February 1, 2026 at 4:55 AM
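(For anyone wondering what "distribution sharpening" means mechanically: for a softmax model, raising the next-token distribution to a power alpha and renormalizing is the same as rescaling the logits by alpha, i.e. sampling at temperature 1/alpha. A minimal sketch of that idea only; the paper's actual method is surely more involved, and power_sample/alpha are just illustrative names:)

import numpy as np

def power_sample(logits, alpha, rng):
    # p_i ∝ exp(l_i), so p_i**alpha ∝ exp(alpha * l_i):
    # power sampling a softmax == rescaling the logits (temperature 1/alpha)
    z = alpha * logits
    z = z - z.max()          # numerical stability
    p = np.exp(z)
    p = p / p.sum()
    return int(rng.choice(len(p), p=p))

rng = np.random.default_rng(0)
print(power_sample(np.array([2.0, 1.0, 0.1]), alpha=4.0, rng=rng))  # mode gets much likelier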
This feels like true diffusion model training on LLMs 😂 start from pure information-free LinkedIn prose, iteratively refine toward Goethe and Shakespeare
Pretty cool project on /r/localllama - they take human-written text and sloppify it 10x with 4o-mini, then train the model to de-slop by reversing the transformation
January 31, 2026 at 8:49 PM
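(A minimal sketch of that de-slop pipeline as I understand it; sloppify here is a stand-in for the actual 4o-mini call, and all names are made up for illustration:)

def sloppify(text):
    # Placeholder: in the real project this is an LLM call (4o-mini)
    # that rewrites the text into maximally bland prose.
    return "In today's fast-paced world, " + text

def build_deslop_pairs(human_texts, rounds=10):
    pairs = []
    for clean in human_texts:
        slop = clean
        for _ in range(rounds):   # "sloppify it 10x"
            slop = sloppify(slop)
        # Train the reverse direction: slop in, human text out.
        pairs.append({"prompt": slop, "completion": clean})
    return pairs

print(build_deslop_pairs(["Brevity is the soul of wit."], rounds=2))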
Why is pass@k a metric? Does any proper LLM use case actually generate 16 different answers and then pick the best one on ... vibes? This smells like massive cooking on the test set (or verifier reward).
January 29, 2026 at 9:34 PM
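(Context for the pass@k gripe: the standard unbiased estimator from the HumanEval paper, Chen et al. 2021, assumes an oracle that picks a correct answer out of k samples, which is exactly the "vibes" picker that doesn't exist in real use. Sketch with made-up numbers:)

import numpy as np

def pass_at_k(n, c, k):
    # Unbiased estimate of P(at least one of k samples is correct),
    # given n generations per problem of which c passed the verifier.
    if n - c < k:
        return 1.0
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

print(pass_at_k(n=16, c=2, k=1))   # 0.125: pass@1 is just the mean success rate
print(pass_at_k(n=16, c=2, k=16))  # 1.0: an oracle picker always wins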
I refuse to accept Texan weather as a real thing. It’s January for C sake! I want snow ❄️ and hot chocolate ☕️
January 29, 2026 at 9:29 PM
I still hate coding with AI, but luckily, I hate writing data processing pipelines and webcrawlers more 😁
January 29, 2026 at 7:57 PM
Reposted by Claas Voelcker
A new paper in Nature informs us there's a new AI benchmark called Humanity’s Last Exam.

Yep, it's that same old HLE. The paper was submitted on 07 May 2025. And no, I don't know what the point of publishing it like that is either. Looks good on CVs, I guess.
January 29, 2026 at 7:18 PM
❌ Institutional impact from cool new paper? 🦗
✅ Institutional impact from making an LLM library installable on our hell-hole of a cluster? 🏆
GIF: the "smart" meme, a guy poking at his temple and smiling knowingly
January 29, 2026 at 7:29 PM
How many great strategies for steering Claude et al. are just sparkling placebo effect? Magical thinking? Asking for a friend...
January 29, 2026 at 5:49 PM
Nothing has made me such a hardliner on rejecting unscientific woo in health/nutrition/etc as having a family member undergo tumour treatments over the years. Last year alone, HUGE breakthroughs in the treatment of brain tumours have given countless patients a new lease on life www.nejm.org/doi/full/10....
January 29, 2026 at 3:34 PM
How much people fetishize absolutely terrible gig jobs like "checks notes" Uber driving and long-distance trucking in the comments is wild…
January 29, 2026 at 2:29 PM
I need a list of which LLM RLVR modifications are ad-hoc hacks, and which ones can be justified from principles... Seems like this is all over the place. For half of these modifications, a sane RL researcher says "duh", and for the other half it's "Ew, why???" ... I need to write this, right?
January 28, 2026 at 11:31 PM
My job can be reliably done by a small script that repeats that the std deviation is not actually any form of confidence interval for an estimator of the mean.
January 27, 2026 at 3:16 AM
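(The small script in question, roughly; seed count and numbers are made up, and with only 10 runs a t-quantile, about 2.26 for 9 degrees of freedom, would be more honest than 1.96:)

import numpy as np

returns = np.random.default_rng(0).normal(100.0, 15.0, size=10)  # e.g. 10 seeds

mean = returns.mean()
std = returns.std(ddof=1)            # spread of individual runs: NOT a CI
sem = std / np.sqrt(len(returns))    # uncertainty of the mean estimate
lo, hi = mean - 1.96 * sem, mean + 1.96 * sem  # normal-approx 95% CI for the mean

print(f"mean {mean:.1f}, std {std:.1f}, 95% CI for the mean ≈ [{lo:.1f}, {hi:.1f}]")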
Reposted by Claas Voelcker
The other paper accepted to @iclr-conf.bsky.social 2026 🇧🇷. Our work on replicable RL sheds some light on how to consistently make decisions in RL.

@ericeaton.bsky.social @mkearnsphilly.bsky.social @aaroth.bsky.social @sikatasengupta.bsky.social @optimistsinc.bsky.social
I think I posted about it before but never with a thread. We recently put a new preprint on arxiv.

📖 Replicable Reinforcement Learning with Linear Function Approximation

🔗 arxiv.org/abs/2509.08660

In this paper, we study formal replicability in RL with linear function approximation. The... (1/6)
January 26, 2026 at 4:08 PM
Or… you can chat with us in 🇧🇷 Rio 🇧🇷 as we are going to @iclr-conf.bsky.social to present our paper!!!
🤔 Want to use REPPO (cvoelcker.de/projects/rep...) but hate jax? 🤔
😮 Want to have stable on-policy RL without filling your GPU with an enormous replay buffer? 😮
🤖 Are you a roboticist and just want your RL code to run? 🤖

🎉 Fear not, we started adding new REPPO versions! 🎉
github.com/cvoelcker/rs...
January 26, 2026 at 2:37 PM
It is incredibly funny to get photos from my husband in Toronto of massive amounts of snow, while all of Austin is in apocalypse mode because there are 5 cm of fluff ❄️❄️❄️
January 25, 2026 at 8:26 PM
OK, so, LLM coding models are kinda good, but LLM implementations themselves are ABSOLUTE DOGSHIT?! Like, wtf is the amount of breaking changes and random conflicts in absolutely every framework... This is worse than MuJoCo circa 2019
January 25, 2026 at 6:26 AM
Reposted by Claas Voelcker
This week at reading group 📚
@pranav-nlp.bsky.social presented "You Cannot Sound Like GPT": Signs of language discrimination and resistance in computer science publishing.

Paper: arxiv.org/abs/2505.08127

#NLProc
January 23, 2026 at 1:35 PM
@icmlconf.bsky.social has cracked 25000 submissions 😂 (yes, there is a larger ICLR bulk in there, but still)
GIF: a woman singing into a microphone, saying "may the odds be ever in your favor"
January 23, 2026 at 2:53 PM
There are no in-the-box solutions to out-of-the-box problems. You can tweak the scientific system, but without at least acknowledging the real extra-scientific pressures on the publication and review system, every conversation is incomplete. Publications support extra-scientific goals.
Nothing about publishing will improve until we collectively acknowledge that our systems were not built to be (a) hiring filters for generational big tech wealth and (b) the last remaining sane immigration paths to many countries, but especially the US. Thanks for coming to my TED talk…
January 22, 2026 at 3:27 PM
I need everybody to stop mentioning Sokal until we clean up after ourselves 😂
January 21, 2026 at 9:22 PM