This project wouldn't have been possible without my incredible co-author team, @ccolas.bsky.social & @pyoudeyer.bsky.social
#LLM #AI #ProgramSynthesis #ICML2025
I'll be at ICML next week; let's chat if you're interested in self-improving LLMs, program synthesis, ARC, or other related subjects.
Want to learn more? We've made everything public:
Blog post: julienp.netlify.app/posts/soar/
Models (7/14/32/72/123B) & data: huggingface.co/collections/...
Code: github.com/flowersteam/...
Paper: icml.cc/virtual/2025...
**Broader Impact**: This isn't just about ARC puzzles. SOAR's framework could enhance program synthesis tasks where search-based LLM methods are limited by static model capabilities (FunSearch, AlphaEvolve, …).
**Test-Time Learning**: Even on new problems, SOAR continues improving by focusing on solutions that work well on the given examples. This enables real-time adaptation to novel challenges.
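One way to picture this test-time selection (a simplified sketch, not SOAR's actual scoring code; all names here are illustrative): candidates are ranked by how many of the task's demonstration pairs they reproduce exactly.

```python
def score(program, demos, run):
    """Fraction of demonstration (input, output) pairs the program reproduces."""
    hits = sum(run(program, x) == y for x, y in demos)
    return hits / len(demos)


def best_candidate(programs, demos, run):
    # Test-time selection: keep whichever candidate fits the
    # given examples best, even on a never-seen problem.
    return max(programs, key=lambda p: score(p, demos, run))


# Toy task: double every element.
demos = [([1, 2], [2, 4]), ([3], [6])]
runner = lambda p, x: eval(p)(x)  # stand-in for a sandboxed executor
progs = ["lambda v: v", "lambda v: [i * 2 for i in v]"]
winner = best_candidate(progs, demos, runner)  # the doubling program
```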
**Results**:
- Qwen-7B: 6% → 36% accuracy
- Qwen-32B: 13% → 45% accuracy
- Mistral-Large-2: 20% → 46% accuracy
- Combined ensemble: 52% on the ARC-AGI test set
- Outperforms much larger models like o3-mini and Claude-4-Sonnet
**Key Insight**: Failed programs aren't useless! Through "hindsight relabeling," SOAR treats each failed program as the *correct* solution to a different (synthetic) problem. This massively expands the training data diversity.
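A minimal sketch of the hindsight-relabeling idea (function names and data shapes are hypothetical, not from the SOAR codebase): a program that fails the target task is reused as the ground-truth solution for the synthetic task defined by its own outputs.

```python
def hindsight_relabel(program, run, task_inputs):
    """Turn a failed candidate into a correctly labeled training example.

    The program did not solve the original task, but it *does* solve
    the synthetic task whose outputs are whatever the program itself
    produces on the task inputs.
    """
    synthetic_outputs = [run(program, x) for x in task_inputs]
    synthetic_task = list(zip(task_inputs, synthetic_outputs))
    # (synthetic_task, program) is now a valid (problem, solution)
    # pair for fine-tuning, regardless of the original verdict.
    return synthetic_task, program


# Toy usage: a "failed" row-doubling program still yields a
# perfectly labeled example for the task it actually solves.
prog = "lambda g: [row * 2 for row in g]"
task, solution = hindsight_relabel(
    prog,
    run=lambda p, x: eval(p)(x),  # stand-in for a sandboxed executor
    task_inputs=[[[1, 2]], [[3]]],
)
```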
**The Learning Process**: The system learns TWO skills simultaneously:
- **Sampling**: generate better initial solutions
- **Refinement**: enhance initial solutions
We also find that learning both together works better than specializing!
SOAR doesn't just search harder: it gets SMARTER. It alternates between:
- Evolutionary search: the LLM samples and refines candidate programs.
- Hindsight learning: the model learns from all its search attempts, successes and failures, to fine-tune its skills for the next round.
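The alternation above can be sketched as a loop. This is a toy mock, not the SOAR implementation: `MockModel`, its growing `skill` counter, and the round/sample counts are all stand-ins for an LLM, fine-tuning, and the real search budget.

```python
from dataclasses import dataclass


@dataclass
class MockModel:
    """Stand-in for an LLM whose 'skill' grows with each fine-tune round."""
    skill: int = 0
    data_seen: int = 0

    def sample(self, task):
        return f"draft({task}, skill={self.skill})"

    def refine(self, task, candidate):
        return candidate + "+refined"

    def finetune(self, dataset):
        return MockModel(skill=self.skill + 1,
                         data_seen=self.data_seen + len(dataset))


def soar_loop(model, tasks, rounds=3, samples=4):
    """Alternate evolutionary search (sample + refine) with learning."""
    for _ in range(rounds):
        traces = []
        for task in tasks:
            # Search phase: draw candidates, then refine each one.
            candidates = [model.sample(task) for _ in range(samples)]
            refined = [model.refine(task, c) for c in candidates]
            traces += [(task, p) for p in candidates + refined]
        # Learning phase: every attempt (success or failure) becomes
        # training data; failures would be hindsight-relabeled here.
        model = model.finetune(traces)
    return model


final = soar_loop(MockModel(), tasks=["task-A", "task-B"])
```

Each round feeds the entire search trace back into training, so later rounds search with a strictly better model rather than a fixed one.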
Why does this matter? Most coding tasks are too hard for even the best language models to solve in one shot. Traditional search methods help, but they hit a wall because the model's abilities are fixed. SOAR breaks through this barrier by letting the model improve itself over time.
Introducing SOAR, a self-improving framework for program synthesis that alternates between search and learning (accepted to #ICML!)
It brings LLMs from just a few percent on ARC-AGI-1 up to 52%.
We're releasing the fine-tuned LLMs, a dataset of 5M generated programs, and the code.
🧵