We scaffold cognitive structures from successful traces to guide reasoning.
Major gains on ill-structured problems🌟
Models possess latent capabilities—they just don't deploy them adaptively without explicit guidance.
Research concentrates on easily quantifiable behaviors—sequential organization (55%), decomposition (60%)
Neglects meta-cognitive controls (8-16%) and alternative representations (10-27%) that correlate with success⚠️
28 elements across 4 dimensions—reasoning invariants (compositionality, logical coherence), meta-cognitive controls (self-awareness), representations (hierarchical, causal), and operations (backtracking, verification)
Xingshuai Huang, Di Wu, Benoit Boulet
Action editor: Baoxiang Wang
https://openreview.net/forum?id=8K16dplpE0
#reinforcement #conditioning #learns
Learn more → buff.ly/6xLHLk6
arxiv.org/abs/2510.21686
academicjobsonline.org/ajo/jobs/30971
- generate student rollouts
- query teacher distribution forced on student history
- update using the reverse KL divergence at each step
thinkingmachines.ai/blog/on-poli...
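The three steps above can be sketched on a toy problem. This is a minimal illustration, not the implementation from the linked post: it assumes a tabular "bigram" student and teacher (the next-token distribution depends only on the previous token), computes the exact reverse KL over the whole vocabulary at each step rather than a sampled per-token estimate, and uses a hand-derived gradient on the student logits. All names (`rollout`, `distill_step`, `teacher_probs`, and so on) are hypothetical.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reverse_kl(p, q):
    """KL(p || q), with p the student and q the teacher distribution."""
    return sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)


def rollout(student_logits, length, rng):
    """Step 1: sample a sequence from the student. Token 0 is the start context."""
    tokens = [0]
    for _ in range(length):
        probs = softmax(student_logits[tokens[-1]])
        tokens.append(rng.choices(range(len(probs)), weights=probs)[0])
    return tokens


def distill_step(student_logits, teacher_probs, length, lr, rng):
    """One update on a single student rollout; returns the mean per-step reverse KL."""
    tokens = rollout(student_logits, length, rng)
    total = 0.0
    for prev in tokens[:-1]:
        # Step 2: query the teacher on the context the student actually visited.
        q = teacher_probs[prev]
        p = softmax(student_logits[prev])
        # Step 3: reverse KL at this step, then a gradient step on the logits.
        kl = reverse_kl(p, q)
        total += kl
        for j in range(len(p)):
            # d KL(p || q) / d logit_j = p_j * ((log p_j - log q_j) - KL)
            grad = p[j] * ((math.log(p[j]) - math.log(q[j])) - kl)
            student_logits[prev][j] -= lr * grad
    return total / length


def mean_kl(student_logits, teacher_probs):
    """Average reverse KL over every context, used only for monitoring."""
    kls = [reverse_kl(softmax(z), q) for z, q in zip(student_logits, teacher_probs)]
    return sum(kls) / len(kls)


# Toy setup (assumed values): 3-token vocabulary, teacher with full support.
rng = random.Random(0)
teacher_probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4]]
student_logits = [[0.0, 0.0, 0.0] for _ in range(3)]

before = mean_kl(student_logits, teacher_probs)
for _ in range(500):
    distill_step(student_logits, teacher_probs, length=8, lr=0.3, rng=rng)
after = mean_kl(student_logits, teacher_probs)
print(f"mean reverse KL before: {before:.4f}, after: {after:.4f}")
```

The point the sketch tries to make concrete is that the teacher is only ever evaluated on contexts the student itself sampled, so the training signal covers the states the student actually reaches. In a real LLM setting the student and teacher would be neural networks and the loss would go through an autodiff framework; none of that is shown here.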
> all you see is tokens
> you don't care, it's all abstracted away
> you live for a world of pure ideas, chain of concepts, reasoning streams
> tokens don't exist.
The Art of Scaling Reinforcement Learning Compute for LLMs
Khatri & Madaan et al.
buff.ly/olKwF3X
simonwillison.net/2025/Oct/14/...
🔗 github.com/rasbt/LLMs-f...
📑 arxiv.org/abs/2510.02375
[1/10]🧵
Amine El hattami, Nicolas Chapados, Christopher Pal
Action editor: Colin Raffel
https://openreview.net/forum?id=p0KTYl2B9T
#scheduling #scheduled #training
In the feature learning regime, we map this connection: phase diagrams of scaling exponents <-> spectral signatures of trained weights. The paper is: arxiv.org/abs/2509.24882