Here's a short thread with my thoughts. Overall, I'm extremely pleased, with a few minor surprises.
Basically pushes RLVR & self-refinement to gold-level scores on IMO 2025.
Coincidentally, I am currently working on a chapter on self-refinement, and this comes in handy as a nice, scaled-up case study.
Project: shaochenze.github.io/blog/2025/CA...
Paper: arxiv.org/abs/2510.27688
Repo: github.com/shaochenze/c...
- Core recurrent “thinking” block, T: generates latent “thoughts”.
- Latent decoder, D: un-embeds from latent space to the output (language) space.
Paper: arxiv.org/abs/2510.07358
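To make the two-component split concrete, here's a minimal PyTorch sketch of how I read it: a weight-shared block that iterates in latent space, plus a small decoder that projects the final latent state back to vocabulary logits. The class names, dimensions, and number of recurrent steps are my own assumptions for illustration, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class RecurrentThinkingBlock(nn.Module):
    """T: refines a latent 'thought' state by reusing the same weights for several steps."""
    def __init__(self, d_model=512, n_heads=8):
        super().__init__()
        self.layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)

    def forward(self, h, n_steps=4):
        # Depth via recurrence: iterate in latent space instead of emitting tokens.
        for _ in range(n_steps):
            h = self.layer(h)
        return h

class LatentDecoder(nn.Module):
    """D: un-embeds latent thoughts back into the output (language) space."""
    def __init__(self, d_model=512, vocab_size=32000):
        super().__init__()
        self.unembed = nn.Linear(d_model, vocab_size)

    def forward(self, h):
        return self.unembed(h)  # logits over the vocabulary

# Toy usage: 'think' over a batch of latent states, then decode to logits.
T, D = RecurrentThinkingBlock(), LatentDecoder()
h = torch.randn(1, 16, 512)          # [batch, seq_len, d_model]
logits = D(T(h, n_steps=4))          # [1, 16, 32000]
```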
HuggingFace 👉 huggingface.co/PaddlePaddle...
AI Studio 👉 paddleocr.ai/latest/en/in...
They propose dividing LLM parameters into 1) an anchor (always used, capturing commonsense) and 2) a memory bank (selected per query, capturing world knowledge).
Paper: arxiv.org/abs/2510.02375
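A toy sketch of how such a split could look in code, assuming a router that picks memory blocks from the query embedding; the block sizes, top-k routing, and GELU nonlinearity are my assumptions, not the paper's actual mechanism.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AnchorMemoryFFN(nn.Module):
    def __init__(self, d_model=256, d_anchor=512, n_blocks=16, d_block=64, top_k=2):
        super().__init__()
        # Anchor: dense parameters applied to every query (commonsense / core skills).
        self.anchor = nn.Sequential(
            nn.Linear(d_model, d_anchor), nn.GELU(), nn.Linear(d_anchor, d_model)
        )
        # Memory bank: many small parameter blocks; only top_k are used per query
        # (world knowledge looked up on demand).
        self.mem_in = nn.Parameter(torch.randn(n_blocks, d_model, d_block) * 0.02)
        self.mem_out = nn.Parameter(torch.randn(n_blocks, d_block, d_model) * 0.02)
        self.router = nn.Linear(d_model, n_blocks)
        self.top_k = top_k

    def forward(self, x):                                  # x: [batch, d_model] query embeddings
        out = self.anchor(x)                               # always-on path
        chosen = self.router(x).topk(self.top_k, dim=-1).indices
        mem_rows = []
        for b in range(x.size(0)):                         # per-query loop for clarity
            row = torch.zeros_like(out[b])
            for idx in chosen[b]:                          # only the selected blocks are touched
                row = row + F.gelu(x[b] @ self.mem_in[idx]) @ self.mem_out[idx]
            mem_rows.append(row)
        return out + torch.stack(mem_rows)

y = AnchorMemoryFFN()(torch.randn(4, 256))                 # -> [4, 256]
```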
github.com/steveyegge/b...
The 11 LLM archs covered in this video:
1. DeepSeek V3/R1
2. OLMo 2
3. Gemma 3
4. Mistral Small 3.1
5. Llama 4
6. Qwen3
7. SmolLM3
8. Kimi K2
9. GPT-OSS
10. Grok 2.5
11. GLM-4.5/4.6
www.youtube.com/watch?v=rNlU...
- +20–40% more non-zero gradients
- Up to 93 rollouts for hard tasks (w/o extra compute)
- +2–4 avg points, up to +9 peak gain on math benchmarks
- ~2× cheaper than uniform allocation
Paper: arxiv.org/abs/2509.25849
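A back-of-the-envelope sketch of the idea as I understand it: keep the total rollout budget fixed, but shift samples away from easy prompts (where every rollout succeeds, the group advantage is zero, and so is the gradient in GRPO-style training) toward hard ones. The pass-rate weighting below is my own guess for illustration, not the paper's actual allocation rule.

```python
import numpy as np

def allocate_rollouts(pass_rates, total_budget, min_n=2, max_n=93):
    """Distribute a fixed rollout budget across prompts by estimated difficulty.

    Easy prompts (pass rate ~1) tend to produce all-correct groups with zero
    advantage, so they get few rollouts; harder prompts get more, raising the
    chance of a mixed group and hence a non-zero gradient.
    """
    p = np.asarray(pass_rates, dtype=float)
    weight = 1.0 - p**2                  # ~0 for easy prompts, ~1 for hard ones (my heuristic)
    alloc = weight / weight.sum() * total_budget
    # Rounding/clipping can drift slightly from the exact budget; fine for a sketch.
    return np.clip(np.round(alloc), min_n, max_n).astype(int)

# Same total budget as a uniform 8-rollouts-per-prompt scheme over 4 prompts:
print(allocate_rollouts([0.95, 0.5, 0.2, 0.02], total_budget=32))  # e.g. [ 2  9 11 11]
```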
Model: huggingface.co/fangwu97/Dee...
Repo: github.com/rbalestr-lab...