Julien Pourcel
@jul-p.bsky.social
PhD student at INRIA (FLOWERS team) working on LLM4code | Prev. MVA at ENS Paris-Saclay
🤗 This project wouldn't have been possible without my incredible co-author team, @ccolas.bsky.social & @pyoudeyer.bsky.social
#LLM #AI #ProgramSynthesis #ICML2025
July 10, 2025 at 4:04 PM
I’ll be at ICML next week—let’s chat if you’re interested in self-improving LLMs, program synthesis, ARC, or other related subjects.
July 10, 2025 at 4:04 PM
Want to learn more? We've made everything public:

📗 Blog Post: julienp.netlify.app/posts/soar/
🤗 Models (7/14/32/72/123b) & Data: huggingface.co/collections/...
💻 Code: github.com/flowersteam/...
📄 Paper: icml.cc/virtual/2025...
Self-Improving Language Models for Evolutionary Program Synthesis: A Case Study on ARC-AGI
July 10, 2025 at 4:04 PM
🚀 **Broader Impact**: This isn't just about ARC puzzles. SOAR's framework could enhance program synthesis pipelines where search-based LLM methods are limited by static model capabilities (e.g., FunSearch, AlphaEvolve, …).
July 10, 2025 at 4:04 PM
🌟 **Test-Time Learning**: Even on new problems, SOAR continues improving by focusing on solutions that work well on the given examples. This enables real-time adaptation to novel challenges.
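As a rough illustration of that selection step (a simplified sketch with hypothetical helper names, not SOAR's actual code): candidate programs can be scored by how many demonstration pairs they reproduce, and the best-scoring ones vote on the test output.

```python
# Minimal sketch of example-based selection at test time
# (hypothetical helpers, not the SOAR implementation).
from collections import Counter

def run_program(program_src, grid):
    """Execute a candidate program (assumed to define `solve(grid)`) on one input grid."""
    namespace = {}
    exec(program_src, namespace)
    return namespace["solve"](grid)

def score(program_src, demo_pairs):
    """Count how many demonstration (input, output) pairs the program reproduces."""
    hits = 0
    for inp, out in demo_pairs:
        try:
            if run_program(program_src, inp) == out:
                hits += 1
        except Exception:
            pass  # crashing programs simply score lower
    return hits

def predict(candidates, demo_pairs, test_input):
    """Keep the candidates that best fit the demos and let them vote on the test output."""
    best = max(score(p, demo_pairs) for p in candidates)
    finalists = [p for p in candidates if score(p, demo_pairs) == best]
    votes, grids = Counter(), {}
    for p in finalists:
        try:
            out = run_program(p, test_input)
        except Exception:
            continue
        key = str(out)
        votes[key] += 1
        grids[key] = out
    return grids[votes.most_common(1)[0][0]] if votes else None
```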
July 10, 2025 at 4:04 PM
📈 **Results**:
- Qwen-7B model: 6% → 36% accuracy
- Qwen-32B model: 13% → 45% accuracy
- Mistral-Large-2: 20% → 46% accuracy
- Combined ensemble: 52% on ARC-AGI test set
- Outperforms much larger models like o3-mini and Claude-4-Sonnet
July 10, 2025 at 4:04 PM
🎯 Key Insight: Failed programs aren't useless! Through "hindsight relabeling," SOAR treats each failed program as the *correct* solution to a different (synthetic) problem. This massively expands the training data diversity.
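Concretely, the relabeling step can be sketched like this (a simplified illustration with hypothetical names, not the repo's exact code): execute the failed program on the original inputs and record whatever it produces as the outputs of a new synthetic task, for which that program is correct by construction.

```python
# Hindsight relabeling, as a simplified sketch (hypothetical names, not the repo's exact code):
# a program that fails the original task becomes the *correct* solution to the synthetic
# task defined by its own outputs.

def hindsight_relabel(program_src, original_inputs, run_program):
    """Turn a failed candidate into a (synthetic_task, correct_program) training pair."""
    synthetic_pairs = []
    for inp in original_inputs:
        try:
            out = run_program(program_src, inp)   # whatever the program produces...
        except Exception:
            return None                           # crashing programs give us nothing
        synthetic_pairs.append((inp, out))        # ...is the "ground truth" of the new task
    # The synthetic task plus the program form a valid supervised example for fine-tuning.
    return {"task": synthetic_pairs, "solution": program_src}
```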
July 10, 2025 at 4:04 PM
🧠 **The Learning Process**: The system learns TWO skills simultaneously:
- **Sampling**: Generate better initial solutions
- **Refinement**: Enhance initial solutions
We also find that learning both together works better than specializing!
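As a loose sketch of what the two kinds of fine-tuning examples could look like (prompt formats are illustrative, not the paper's exact templates): sampling examples map a task description to a program, while refinement examples map a task plus a previous attempt and its execution feedback to an improved program.

```python
# Loose sketch of the two kinds of fine-tuning examples
# (illustrative prompt formats, not the paper's exact templates).

def make_sampling_example(task_text, solution_src):
    """Sampling: task description -> program."""
    return {
        "prompt": f"Solve this ARC task with a Python function `solve`:\n{task_text}",
        "completion": solution_src,
    }

def make_refinement_example(task_text, draft_src, feedback, improved_src):
    """Refinement: task + previous attempt + execution feedback -> improved program."""
    return {
        "prompt": (
            f"Task:\n{task_text}\n\n"
            f"Previous attempt:\n{draft_src}\n\n"
            f"Execution feedback:\n{feedback}\n\n"
            "Write an improved `solve` function."
        ),
        "completion": improved_src,
    }

# Training on a mix of both example types is what the thread reports working
# better than training two specialized models.
```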
July 10, 2025 at 4:04 PM
🔄 SOAR doesn't just search harder — it gets SMARTER. It alternates between:
- Evolutionary search: LLM samples and refines candidate programs.
- Hindsight learning: The model learns from all its search attempts, successes and failures, to fine-tune its skills for the next round.
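Put together, the loop looks roughly like this (a structural sketch: the callables stand in for the real components, and the names are illustrative rather than the repository's API).

```python
# Structural sketch of the SOAR loop (illustrative names, not the repository's API).

def soar_loop(sample, refine, evaluate, relabel, finetune, tasks,
              n_iterations, n_samples, n_refinements):
    """Alternate between evolutionary search and hindsight fine-tuning."""
    for _ in range(n_iterations):
        attempts = []
        # 1) Evolutionary search: sample candidate programs, then refine them
        #    using execution feedback on each task's demonstration pairs.
        for task in tasks:
            candidates = [sample(task) for _ in range(n_samples)]
            for _ in range(n_refinements):
                candidates = [refine(task, prog, evaluate(task, prog))
                              for prog in candidates]
            attempts.extend((task, prog) for prog in candidates)
        # 2) Hindsight learning: successes are kept as-is, failures are relabeled
        #    as correct solutions to synthetic tasks, and the model is fine-tuned
        #    on both sampling and refinement examples for the next round.
        dataset = [relabel(task, prog) for task, prog in attempts]
        sample, refine = finetune(dataset)
    return sample, refine
```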
July 10, 2025 at 4:04 PM
🔬 **Why This Matters**: Many hard coding tasks are beyond even the best language models in a single shot. Traditional search methods help, but they hit a wall because the model's abilities stay fixed. SOAR breaks through this barrier by letting the model improve itself over time.
July 10, 2025 at 4:04 PM