Anikait Singh (@asap7772.bsky.social)
PhD Student @StanfordAILab @stanfordnlp.bsky.social, previously SR @GoogleDeepMind.bsky.social, undergraduate @Berkeley_AI
9/N For more details, please check out the paper and website (with code coming soon)!

Paper: arxiv.org/abs/2510.02263
Website: cohenqu.github.io/rlad.github....

I will also be presenting this at the RAM2 Workshop at CoLM next week, so please stop by!
[Link card] RLAD: Training LLMs to Discover Abstractions for Solving Reasoning Problems (arxiv.org)
October 3, 2025 at 7:33 PM
8/N More qualitatively, in an example solution we see (highlighted in cyan) references to the abstraction (“cheatsheet”) and its keywords used meaningfully in the solution generator's reasoning trace, showing that abstractions can elicit concrete strategies.
7/N We also analyze the abstractions and solutions that RLAD generates. RLAD produces solutions with greater semantic diversity across different abstractions (left) and higher adherence of the solution to its abstraction (right) than baselines.
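As a rough illustration of what "semantic diversity" could mean here (my own guess at the flavor of metric, not necessarily the one used in the paper): embed each solution and average the pairwise cosine distances.

```python
# Hedged sketch: one plausible way to score semantic diversity of a set of
# solutions. Not necessarily the metric used in the RLAD paper.
import numpy as np

def mean_pairwise_cosine_distance(embeddings: np.ndarray) -> float:
    """Average pairwise cosine distance between solution embeddings (rows);
    higher values mean the solutions are more semantically diverse."""
    if len(embeddings) < 2:
        return 0.0
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = x @ x.T                       # cosine similarities
    iu = np.triu_indices(len(x), k=1)    # unique unordered pairs
    return float(np.mean(1.0 - sims[iu]))
```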
6/N Furthermore, the abstraction generator shows weak-to-strong generalization: if we swap in o4-mini (with a 24K token budget) as the solution generator, conditioning on abstractions consistently yields higher pass@k accuracy than question-only conditioning.
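For reference, I am assuming pass@k here is the standard unbiased estimator over n samples with c correct (Chen et al., 2021); the paper may compute it differently. A small sketch:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n sampled solutions, c of them correct:
    the probability that at least one of k samples drawn without replacement
    from the n is correct."""
    if n - c < k:
        return 1.0  # every size-k subset must contain a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)
```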
5/N On AIME 2025, as inference compute grows, it becomes more efficient to devote a larger share of the budget to generating abstractions rather than additional solutions; this trend is robust across all normalization offsets 𝑘₀. Local errors can be corrected with retries, but once retries are exhausted, fresh abstractions are what help!
4/N We evaluate RLAD on math reasoning benchmarks such as AIME 2025, DeepScaleR Hard, and AMC 2023, achieving consistent accuracy gains over the base Qwen3-1.7B model and DAPO. Performance is measured without (w/o) an abstraction, with (w/) a single abstraction, and with the best abstraction among 4 samples.
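A small sketch of the three evaluation settings as I read them from this post (`solve`, `propose_abstraction`, and `is_correct` are hypothetical placeholders, not the paper's code):

```python
def evaluate_modes(question, answer, solve, propose_abstraction, is_correct,
                   n_abstractions=4):
    """Score one problem under the three settings described above."""
    # w/o: the solver sees only the question
    wo = float(is_correct(solve(question, abstraction=None), answer))
    # w/: the solver conditions on a single sampled abstraction
    abstractions = [propose_abstraction(question) for _ in range(n_abstractions)]
    w = float(is_correct(solve(question, abstraction=abstractions[0]), answer))
    # best-of-4: the best result across the sampled abstractions
    best = max(float(is_correct(solve(question, abstraction=a), answer))
               for a in abstractions)
    return wo, w, best
```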
3/N We instantiate a two-player RL framework:
1. An Abstraction Generator proposes a reasoning strategy (an abstraction) for the problem.
2. A Solution Generator conditions on that abstraction to produce an answer.
The abstraction generator's reward is the average success rate of the resulting solutions, pushing it to propose genuinely useful abstractions.
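To make the setup concrete, here is a minimal sketch of that reward loop as I read it from this post, not the paper's actual training code; `sample_abstraction`, `sample_solution`, `is_correct`, and the 4-solution count are hypothetical placeholders:

```python
# Hypothetical sketch of the two-player reward described in this post.
from statistics import mean

def abstraction_reward(question, answer, sample_abstraction, sample_solution,
                       is_correct, n_solutions=4):
    """Reward the abstraction generator (player 1) with the average success
    rate of solutions (player 2) conditioned on its abstraction."""
    abstraction = sample_abstraction(question)                   # player 1's move
    hits = [float(is_correct(sample_solution(question, abstraction), answer))
            for _ in range(n_solutions)]                         # player 2's attempts
    return abstraction, mean(hits)                               # player 1's reward
```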
2/N Reasoning requires going beyond pattern-matching and recall to the execution of algorithmic procedures. RLVR aims to induce this, but models often underthink, switching logic midstream. Instead, can we optimize “breadth”, training models to explore a wider array of strategies?