Anikait Singh (@asap7772.bsky.social)
PhD Student @StanfordAILab @stanfordnlp.bsky.social, previously SR @GoogleDeepMind.bsky.social, undergraduate @Berkeley_AI
9/N For more details, please check out the paper and website (with code coming soon)!

Paper: arxiv.org/abs/2510.02263
Website: cohenqu.github.io/rlad.github....

I will also be presenting this at the RAM2 Workshop at CoLM next week, so please stop by!
[Link card] RLAD: Training LLMs to Discover Abstractions for Solving Reasoning Problems (arxiv.org)
October 3, 2025 at 7:33 PM
8/N More qualitatively, in an example solution we see (highlighted in cyan) references to the abstraction (“cheatsheet”) and its keywords used meaningfully in the solution generator's reasoning trace, showing that abstractions can elicit concrete strategies.
7/N We also analyze the abstractions and solutions that RLAD generates. RLAD produces solutions with greater semantic diversity across different abstractions (left) and higher adherence of the solution to its abstraction (right) than baselines.
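As a rough illustration of what "semantic diversity" could mean here (my own guess at the flavor of metric, not necessarily the one used in the paper): embed each solution and average the pairwise cosine distances.

```python
# Hedged sketch: one plausible way to score semantic diversity of a set of
# solutions. Not necessarily the metric used in the RLAD paper.
import numpy as np

def mean_pairwise_cosine_distance(embeddings: np.ndarray) -> float:
    """Average pairwise cosine distance between solution embeddings (rows);
    higher values mean the solutions are more semantically diverse."""
    if len(embeddings) < 2:
        return 0.0
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = x @ x.T                       # cosine similarities
    iu = np.triu_indices(len(x), k=1)    # unique unordered pairs
    return float(np.mean(1.0 - sims[iu]))
```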
6/N Furthermore, the abstraction generator shows weak-to-strong generalization: if we swap in o4-mini (with a 24K token budget) as the solution generator, conditioning on abstractions consistently yields higher pass@k accuracy than question-only conditioning.
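For reference, I am assuming pass@k here is the standard unbiased estimator over n samples with c correct (Chen et al., 2021); the paper may compute it differently. A small sketch:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n sampled solutions, c of them correct:
    the probability that at least one of k samples drawn without replacement
    from the n is correct."""
    if n - c < k:
        return 1.0  # every size-k subset must contain a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)
```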
5/N On AIME 2025, as inference compute grows, it becomes more efficient to devote a larger share of the budget to generating abstractions rather than additional solutions; this trend is robust across all normalization offsets 𝑘₀. Local errors can be corrected with retries, but once retries are exhausted, fresh abstractions are what help!
4/N We evaluate RLAD on math reasoning benchmarks such as AIME 2025, DeepScaleR Hard, and AMC 2023, achieving consistent accuracy gains over the base Qwen3-1.7B model and DAPO. Performance is measured without (w/o) an abstraction, with (w/) a single abstraction, and with the best abstraction among 4 samples.
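A small sketch of the three evaluation settings as I read them from this post (`solve`, `propose_abstraction`, and `is_correct` are hypothetical placeholders, not the paper's code):

```python
def evaluate_modes(question, answer, solve, propose_abstraction, is_correct,
                   n_abstractions=4):
    """Score one problem under the three settings described above."""
    # w/o: the solver sees only the question
    wo = float(is_correct(solve(question, abstraction=None), answer))
    # w/: the solver conditions on a single sampled abstraction
    abstractions = [propose_abstraction(question) for _ in range(n_abstractions)]
    w = float(is_correct(solve(question, abstraction=abstractions[0]), answer))
    # best-of-4: the best result across the sampled abstractions
    best = max(float(is_correct(solve(question, abstraction=a), answer))
               for a in abstractions)
    return wo, w, best
```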
3/N We instantiate a two-player RL framework:
1. An Abstraction Generator proposes a reasoning strategy (an abstraction) for the problem.
2. A Solution Generator conditions on that abstraction to produce an answer.
The abstraction generator's reward is the average success rate of the resulting solutions, pushing it to propose genuinely useful abstractions.
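To make the setup concrete, here is a minimal sketch of that reward loop as I read it from this post, not the paper's actual training code; `sample_abstraction`, `sample_solution`, `is_correct`, and the 4-solution count are hypothetical placeholders:

```python
# Hypothetical sketch of the two-player reward described in this post.
from statistics import mean

def abstraction_reward(question, answer, sample_abstraction, sample_solution,
                       is_correct, n_solutions=4):
    """Reward the abstraction generator (player 1) with the average success
    rate of solutions (player 2) conditioned on its abstraction."""
    abstraction = sample_abstraction(question)                   # player 1's move
    hits = [float(is_correct(sample_solution(question, abstraction), answer))
            for _ in range(n_solutions)]                         # player 2's attempts
    return abstraction, mean(hits)                               # player 1's reward
```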
2/N Reasoning requires going beyond pattern-matching and recall to the execution of algorithmic procedures. RLVR aims to induce this, but models often underthink, switching logic midstream. Instead, can we optimize “breadth”, training models to explore a wider array of strategies?