Julien Pourcel
@jul-p.bsky.social
PhD student at INRIA (FLOWERS team) working on LLM4code | Prev. MVA at ENS Paris-Saclay
🤗 This project wouldn't have been possible without my incredible co-author team, @ccolas.bsky.social & @pyoudeyer.bsky.social
#LLM #AI #ProgramSynthesis #ICML2025
July 10, 2025 at 4:04 PM
I’ll be at ICML next week—let’s chat if you’re interested in self-improving LLMs, program synthesis, ARC, or other related subjects.
July 10, 2025 at 4:04 PM
Want to learn more? We've made everything public:

📗 Blog Post: julienp.netlify.app/posts/soar/
🤗 Models (7/14/32/72/123b) & Data: huggingface.co/collections/...
💻 Code: github.com/flowersteam/...
📄 Paper: icml.cc/virtual/2025...
Self-Improving Language Models for Evolutionary Program Synthesis: A Case Study on ARC-AGI
July 10, 2025 at 4:04 PM
🚀 **Broader Impact**: This isn't just about ARC puzzles. SOAR's framework could enhance program synthesis pipelines where search-based LLM methods are limited by static model capabilities (e.g., FunSearch, AlphaEvolve, …).
July 10, 2025 at 4:04 PM
🌟 **Test-Time Learning**: Even on new problems, SOAR continues improving by focusing on solutions that work well on the given examples. This enables real-time adaptation to novel challenges.
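As a rough illustration of that selection step (a simplified sketch with hypothetical helper names, not SOAR's actual code): candidate programs can be scored by how many demonstration pairs they reproduce, and the best-scoring ones vote on the test output.

```python
# Minimal sketch of example-based selection at test time
# (hypothetical helpers, not the SOAR implementation).
from collections import Counter

def run_program(program_src, grid):
    """Execute a candidate program (assumed to define `solve(grid)`) on one input grid."""
    namespace = {}
    exec(program_src, namespace)
    return namespace["solve"](grid)

def score(program_src, demo_pairs):
    """Count how many demonstration (input, output) pairs the program reproduces."""
    hits = 0
    for inp, out in demo_pairs:
        try:
            if run_program(program_src, inp) == out:
                hits += 1
        except Exception:
            pass  # crashing programs simply score lower
    return hits

def predict(candidates, demo_pairs, test_input):
    """Keep the candidates that best fit the demos and let them vote on the test output."""
    best = max(score(p, demo_pairs) for p in candidates)
    finalists = [p for p in candidates if score(p, demo_pairs) == best]
    votes, grids = Counter(), {}
    for p in finalists:
        try:
            out = run_program(p, test_input)
        except Exception:
            continue
        key = str(out)
        votes[key] += 1
        grids[key] = out
    return grids[votes.most_common(1)[0][0]] if votes else None
```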
July 10, 2025 at 4:04 PM
📈 **Results**:
- Qwen-7B model: 6% → 36% accuracy
- Qwen-32B model: 13% → 45% accuracy
- Mistral-Large-2: 20% → 46% accuracy
- Combined ensemble: 52% on ARC-AGI test set
- Outperforms much larger models like o3-mini and Claude-4-Sonnet
July 10, 2025 at 4:04 PM
🎯 Key Insight: Failed programs aren't useless! Through "hindsight relabeling," SOAR treats each failed program as the *correct* solution to a different (synthetic) problem. This massively expands the training data diversity.
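Concretely, the relabeling step can be sketched like this (a simplified illustration with hypothetical names, not the repo's exact code): execute the failed program on the original inputs and record whatever it produces as the outputs of a new synthetic task, for which that program is correct by construction.

```python
# Hindsight relabeling, as a simplified sketch (hypothetical names, not the repo's exact code):
# a program that fails the original task becomes the *correct* solution to the synthetic
# task defined by its own outputs.

def hindsight_relabel(program_src, original_inputs, run_program):
    """Turn a failed candidate into a (synthetic_task, correct_program) training pair."""
    synthetic_pairs = []
    for inp in original_inputs:
        try:
            out = run_program(program_src, inp)   # whatever the program produces...
        except Exception:
            return None                           # crashing programs give us nothing
        synthetic_pairs.append((inp, out))        # ...is the "ground truth" of the new task
    # The synthetic task plus the program form a valid supervised example for fine-tuning.
    return {"task": synthetic_pairs, "solution": program_src}
```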
July 10, 2025 at 4:04 PM
🧠 **The Learning Process**: The system learns TWO skills simultaneously:
- **Sampling**: Generate better initial solutions
- **Refinement**: Enhance initial solutions
We also find that learning both together works better than specializing!
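As a loose sketch of what the two kinds of fine-tuning examples could look like (prompt formats are illustrative, not the paper's exact templates): sampling examples map a task description to a program, while refinement examples map a task plus a previous attempt and its execution feedback to an improved program.

```python
# Loose sketch of the two kinds of fine-tuning examples
# (illustrative prompt formats, not the paper's exact templates).

def make_sampling_example(task_text, solution_src):
    """Sampling: task description -> program."""
    return {
        "prompt": f"Solve this ARC task with a Python function `solve`:\n{task_text}",
        "completion": solution_src,
    }

def make_refinement_example(task_text, draft_src, feedback, improved_src):
    """Refinement: task + previous attempt + execution feedback -> improved program."""
    return {
        "prompt": (
            f"Task:\n{task_text}\n\n"
            f"Previous attempt:\n{draft_src}\n\n"
            f"Execution feedback:\n{feedback}\n\n"
            "Write an improved `solve` function."
        ),
        "completion": improved_src,
    }

# Training on a mix of both example types is what the thread reports working
# better than training two specialized models.
```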
July 10, 2025 at 4:04 PM
🔄 SOAR doesn't just search harder — it gets SMARTER. It alternates between:
- Evolutionary search: LLM samples and refines candidate programs.
- Hindsight learning: The model learns from all its search attempts, successes and failures, to fine-tune its skills for the next round.
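Put together, the loop looks roughly like this (a structural sketch: the callables stand in for the real components, and the names are illustrative rather than the repository's API).

```python
# Structural sketch of the SOAR loop (illustrative names, not the repository's API).

def soar_loop(sample, refine, evaluate, relabel, finetune, tasks,
              n_iterations, n_samples, n_refinements):
    """Alternate between evolutionary search and hindsight fine-tuning."""
    for _ in range(n_iterations):
        attempts = []
        # 1) Evolutionary search: sample candidate programs, then refine them
        #    using execution feedback on each task's demonstration pairs.
        for task in tasks:
            candidates = [sample(task) for _ in range(n_samples)]
            for _ in range(n_refinements):
                candidates = [refine(task, prog, evaluate(task, prog))
                              for prog in candidates]
            attempts.extend((task, prog) for prog in candidates)
        # 2) Hindsight learning: successes are kept as-is, failures are relabeled
        #    as correct solutions to synthetic tasks, and the model is fine-tuned
        #    on both sampling and refinement examples for the next round.
        dataset = [relabel(task, prog) for task, prog in attempts]
        sample, refine = finetune(dataset)
    return sample, refine
```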
July 10, 2025 at 4:04 PM
🔬 **Why This Matters**: Many hard coding tasks are beyond even the best language models in a single shot. Traditional search methods help, but they hit a wall because the model's abilities stay fixed. SOAR breaks through this barrier by letting the model improve itself over time.
July 10, 2025 at 4:04 PM