Claas Voelcker
@cvoelcker.bsky.social
For professional matters, see https://cvoelcker.de

If I seem very angry, check if I have been watered in the last 24 hours.

Now 🇺🇸 flavoured, previously available in 🇨🇦 and 🇩🇪
Pinned
I'm Claas, a hopefully-soon-finished-PhD researcher at @uoft.bsky.social. I work on reinforcement learning, especially on deep model-based methods, and am dabbling in diffusion policies and large-scale imitation learning.
I'm way too political and loud in general, so please be warned.
And most importantly, I'm at least 200% nicer than I appear in any given moment, so let me know if you really just want to geek out about random shit!
Hyperparameters are a social construct (this is not irony or just sh*tposting)
February 10, 2026 at 5:01 AM
OMFG my former boss got featured on r/LinkedInLunatics 😂😂😂
February 9, 2026 at 10:57 PM
Reposted by Claas Voelcker
🎉 Really excited: our paper "XQC: Well-conditioned Optimization Accelerates Deep Reinforcement Learning" has been accepted at #ICLR2026.

If you are interested in reinforcement learning, sample efficiency, or compute efficiency, go check it out. See you in Rio!
🚀 New preprint! Introducing XQC, a simple, well-conditioned actor-critic that achieves SOTA sample efficiency in #RL
✅ ~4.5× fewer parameters than SimbaV2
✅ Scales to vision-based RL
👉 arxiv.org/pdf/2509.25174

Thanks to Florian Vogt @joemwatson.bsky.social @jan-peters.bsky.social
February 3, 2026 at 10:33 AM
"PPO is not good, a thousand labs just reward tuned for it" is something I want to get tattooed so badly...
February 2, 2026 at 8:13 PM
Looking at the math scores, I'd say at pass@1 (which is the thing RL actually optimizes for), the method is clearly outperformed by RL, at least within the math distribution. So the claim seems ... wrong? Am I missing something?
Also, why do people write "RL doesn't work" papers so passionately?
They found that much of LLM “reasoning” doesn’t come from RL training; it comes from how you sample the model.

Paper: Scalable Power Sampling: Unlocking Efficient, Training-Free Reasoning for LLMs via Distribution Sharpening
( www.arxiv.org/abs/2601.21590 )
February 1, 2026 at 4:55 AM
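(For anyone wondering what "distribution sharpening" means mechanically: for a softmax model, raising the next-token distribution to a power alpha and renormalizing is the same as rescaling the logits by alpha, i.e. sampling at temperature 1/alpha. A minimal sketch of that idea only; the paper's actual method is surely more involved, and power_sample/alpha are just illustrative names:)

import numpy as np

def power_sample(logits, alpha, rng):
    # p_i ∝ exp(l_i), so p_i**alpha ∝ exp(alpha * l_i):
    # power sampling a softmax == rescaling the logits (temperature 1/alpha)
    z = alpha * logits
    z = z - z.max()          # numerical stability
    p = np.exp(z)
    p = p / p.sum()
    return int(rng.choice(len(p), p=p))

rng = np.random.default_rng(0)
print(power_sample(np.array([2.0, 1.0, 0.1]), alpha=4.0, rng=rng))  # mode gets much likelier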
This feels like true diffusion model training on LLMs 😂 start from pure information-free LinkedIn prose, iteratively refine toward Goethe and Shakespeare
Pretty cool project on /r/localllama - they take human-written text and sloppify it 10x with 4o-mini, then train the model to de-slop by reversing the transformation
January 31, 2026 at 8:49 PM
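(A minimal sketch of that de-slop pipeline as I understand it; sloppify here is a stand-in for the actual 4o-mini call, and all names are made up for illustration:)

def sloppify(text):
    # Placeholder: in the real project this is an LLM call (4o-mini)
    # that rewrites the text into maximally bland prose.
    return "In today's fast-paced world, " + text

def build_deslop_pairs(human_texts, rounds=10):
    pairs = []
    for clean in human_texts:
        slop = clean
        for _ in range(rounds):   # "sloppify it 10x"
            slop = sloppify(slop)
        # Train the reverse direction: slop in, human text out.
        pairs.append({"prompt": slop, "completion": clean})
    return pairs

print(build_deslop_pairs(["Brevity is the soul of wit."], rounds=2))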
Why is pass@k a metric? Does any proper LLM use case actually generate 16 different answers and then pick the best one on ... vibes? This smells like massive cooking on the test set (or verifier reward).
January 29, 2026 at 9:34 PM
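(Context for the pass@k gripe: the standard unbiased estimator from the HumanEval paper, Chen et al. 2021, assumes an oracle that picks a correct answer out of k samples, which is exactly the "vibes" picker that doesn't exist in real use. Sketch with made-up numbers:)

import numpy as np

def pass_at_k(n, c, k):
    # Unbiased estimate of P(at least one of k samples is correct),
    # given n generations per problem of which c passed the verifier.
    if n - c < k:
        return 1.0
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

print(pass_at_k(n=16, c=2, k=1))   # 0.125: pass@1 is just the mean success rate
print(pass_at_k(n=16, c=2, k=16))  # 1.0: an oracle picker always wins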
I refuse to accept Texan weather as a real thing. It’s January for C sake! I want snow ❄️ and hot chocolate ☕️
January 29, 2026 at 9:29 PM
I still hate coding with AI, but luckily, I hate writing data processing pipelines and webcrawlers more 😁
January 29, 2026 at 7:57 PM
Reposted by Claas Voelcker
A new paper in Nature informs us there's a new AI benchmark called Humanity’s Last Exam.

Yep, it's that same old HLE. The paper was submitted on 07 May 2025. And no, I don't know what the point of publishing it like that is either. Looks good on CVs, I guess.
January 29, 2026 at 7:18 PM
❌ Institutional impact from cool new paper? 🦗
✅ Institutional impact from making an LLM library installable on our hell-hole of a cluster? 🏆
GIF: the "smart" meme, a guy poking at his temple and smiling knowingly
January 29, 2026 at 7:29 PM
How many great strategies for steering Claude et al. are just sparkling placebo effect? Magical thinking? Asking for a friend...
January 29, 2026 at 5:49 PM
Nothing has made me such a hardliner on rejecting unscientific woo in health/nutrition/etc as having a family member undergo tumour treatments over the years. Last year alone, HUGE breakthroughs in the treatment of brain tumours have given countless patients a new lease on life www.nejm.org/doi/full/10....
January 29, 2026 at 3:34 PM
How much people fetishize absolutely terrible gig jobs like "checks notes" Uber driving and long-distance trucking in the comments is wild…
January 29, 2026 at 2:29 PM
I need a list of which LLM RLVR modifications are ad-hoc hacks, and which ones can be justified from principles... Seems like this is all over the place. For half of these modifications, a sane RL researcher says "duh", and for the other half it's "Ew, why???" ... I need to write this, right?
January 28, 2026 at 11:31 PM
My job can be reliably done by a small script that repeats that the std deviation is not actually any form of confidence interval for an estimator of the mean.
January 27, 2026 at 3:16 AM
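(The small script in question, roughly; seed count and numbers are made up, and with only 10 runs a t-quantile, about 2.26 for 9 degrees of freedom, would be more honest than 1.96:)

import numpy as np

returns = np.random.default_rng(0).normal(100.0, 15.0, size=10)  # e.g. 10 seeds

mean = returns.mean()
std = returns.std(ddof=1)            # spread of individual runs: NOT a CI
sem = std / np.sqrt(len(returns))    # uncertainty of the mean estimate
lo, hi = mean - 1.96 * sem, mean + 1.96 * sem  # normal-approx 95% CI for the mean

print(f"mean {mean:.1f}, std {std:.1f}, 95% CI for the mean ≈ [{lo:.1f}, {hi:.1f}]")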
Reposted by Claas Voelcker
The other paper accepted to @iclr-conf.bsky.social 2026 🇧🇷. Our work on replicable RL sheds some light on how to consistently make decisions in RL.

@ericeaton.bsky.social @mkearnsphilly.bsky.social @aaroth.bsky.social @sikatasengupta.bsky.social @optimistsinc.bsky.social
I think I posted about it before but never with a thread. We recently put a new preprint on arxiv.

📖 Replicable Reinforcement Learning with Linear Function Approximation

🔗 arxiv.org/abs/2509.08660

In this paper, we study formal replicability in RL with linear function approximation. The... (1/6)
January 26, 2026 at 4:08 PM
Or… you can chat with us in 🇧🇷 Rio 🇧🇷 as we are going to @iclr-conf.bsky.social to present our paper!!!
🤔 Want to use REPPO (cvoelcker.de/projects/rep...) but hate jax? 🤔
😮 Want to have stable on-policy RL without filling your GPU with an enormous replay buffer? 😮
🤖 Are you a roboticist and just want your RL code to run? 🤖

🎉 Fear not, we started adding new REPPO versions! 🎉
github.com/cvoelcker/rs...
January 26, 2026 at 2:37 PM
It is incredibly funny to get photos from my husband in Toronto of massive amounts of snow, while all of Austin is in apocalypse mode because there are 5 cm of fluff ❄️❄️❄️
January 25, 2026 at 8:26 PM
OK, so, LLM coding models are kinda good, but LLM implementations themselves are ABSOLUTE DOGSHIT?! Like, wtf is the amount of breaking changes and random conflicts in absolutely every framework... This is worse than MuJoCo circa 2019
January 25, 2026 at 6:26 AM
Reposted by Claas Voelcker
This week at reading group 📚
@pranav-nlp.bsky.social presented "You Cannot Sound Like GPT": Signs of language discrimination and resistance in computer science publishing.

Paper: arxiv.org/abs/2505.08127

#NLProc
January 23, 2026 at 1:35 PM
@icmlconf.bsky.social has cracked 25000 submissions 😂 (yes, there is a larger ICLR bulk in there, but still)
GIF: a woman singing into a microphone, saying "may the odds be ever in your favor"
January 23, 2026 at 2:53 PM
There are no in-the-box solutions to out-of-the-box problems. You can tweak the scientific system, but without at least acknowledging the real extra-scientific pressures on the publication and review system, every conversation is incomplete. Publications support extra-scientific goals.
Nothing about publishing will improve until we collectively acknowledge that our systems were not built to be (a) hiring filters for generational big tech wealth and (b) the last remaining sane immigration paths to many countries, but especially the US. Thanks for coming to my TED talk…
January 22, 2026 at 3:27 PM
I need everybody to stop mentioning Sokal until we clean up after ourselves 😂
January 21, 2026 at 9:22 PM