This project wouldn't have been possible without my incredible co-author team, @ccolas.bsky.social & @pyoudeyer.bsky.social
#LLM #AI #ProgramSynthesis #ICML2025
I'll be at ICML next week; let's chat if you're interested in self-improving LLMs, program synthesis, ARC, or other related subjects.
Want to learn more? We've made everything public:
Blog post: julienp.netlify.app/posts/soar/
Models (7/14/32/72/123B) & data: huggingface.co/collections/...
Code: github.com/flowersteam/...
Paper: icml.cc/virtual/2025...
**Broader Impact**: This isn't just about ARC puzzles. SOAR's framework could enhance program synthesis tasks where search-based LLM methods are limited by static model capabilities (FunSearch, AlphaEvolve, …).
**Test-Time Learning**: Even on new problems, SOAR continues improving by focusing on solutions that work well on the given examples. This enables real-time adaptation to novel challenges.
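One way to picture this test-time selection (a simplified sketch, not SOAR's actual scoring code; all names here are illustrative): candidates are ranked by how many of the task's demonstration pairs they reproduce exactly.

```python
def score(program, demos, run):
    """Fraction of demonstration (input, output) pairs the program reproduces."""
    hits = sum(run(program, x) == y for x, y in demos)
    return hits / len(demos)


def best_candidate(programs, demos, run):
    # Test-time selection: keep whichever candidate fits the
    # given examples best, even on a never-seen problem.
    return max(programs, key=lambda p: score(p, demos, run))


# Toy task: double every element.
demos = [([1, 2], [2, 4]), ([3], [6])]
runner = lambda p, x: eval(p)(x)  # stand-in for a sandboxed executor
progs = ["lambda v: v", "lambda v: [i * 2 for i in v]"]
winner = best_candidate(progs, demos, runner)  # the doubling program
```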
**Results**:
- Qwen-7B: 6% → 36% accuracy
- Qwen-32B: 13% → 45% accuracy
- Mistral-Large-2: 20% → 46% accuracy
- Combined ensemble: 52% on the ARC-AGI test set
- Outperforms much larger models like o3-mini and Claude-4-Sonnet
**Key Insight**: Failed programs aren't useless! Through "hindsight relabeling," SOAR treats each failed program as the *correct* solution to a different (synthetic) problem. This massively expands the training data diversity.
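A minimal sketch of the hindsight-relabeling idea (function names and data shapes are hypothetical, not from the SOAR codebase): a program that fails the target task is reused as the ground-truth solution for the synthetic task defined by its own outputs.

```python
def hindsight_relabel(program, run, task_inputs):
    """Turn a failed candidate into a correctly labeled training example.

    The program did not solve the original task, but it *does* solve
    the synthetic task whose outputs are whatever the program itself
    produces on the task inputs.
    """
    synthetic_outputs = [run(program, x) for x in task_inputs]
    synthetic_task = list(zip(task_inputs, synthetic_outputs))
    # (synthetic_task, program) is now a valid (problem, solution)
    # pair for fine-tuning, regardless of the original verdict.
    return synthetic_task, program


# Toy usage: a "failed" row-doubling program still yields a
# perfectly labeled example for the task it actually solves.
prog = "lambda g: [row * 2 for row in g]"
task, solution = hindsight_relabel(
    prog,
    run=lambda p, x: eval(p)(x),  # stand-in for a sandboxed executor
    task_inputs=[[[1, 2]], [[3]]],
)
```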
**The Learning Process**: The system learns TWO skills simultaneously:
- **Sampling**: generate better initial solutions
- **Refinement**: enhance initial solutions
We also find that learning both together works better than specializing!
SOAR doesn't just search harder: it gets SMARTER. It alternates between:
- Evolutionary search: the LLM samples and refines candidate programs.
- Hindsight learning: the model learns from all its search attempts, successes and failures, to fine-tune its skills for the next round.
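The alternation above can be sketched as a loop. This is a toy mock, not the SOAR implementation: `MockModel`, its growing `skill` counter, and the round/sample counts are all stand-ins for an LLM, fine-tuning, and the real search budget.

```python
from dataclasses import dataclass


@dataclass
class MockModel:
    """Stand-in for an LLM whose 'skill' grows with each fine-tune round."""
    skill: int = 0
    data_seen: int = 0

    def sample(self, task):
        return f"draft({task}, skill={self.skill})"

    def refine(self, task, candidate):
        return candidate + "+refined"

    def finetune(self, dataset):
        return MockModel(skill=self.skill + 1,
                         data_seen=self.data_seen + len(dataset))


def soar_loop(model, tasks, rounds=3, samples=4):
    """Alternate evolutionary search (sample + refine) with learning."""
    for _ in range(rounds):
        traces = []
        for task in tasks:
            # Search phase: draw candidates, then refine each one.
            candidates = [model.sample(task) for _ in range(samples)]
            refined = [model.refine(task, c) for c in candidates]
            traces += [(task, p) for p in candidates + refined]
        # Learning phase: every attempt (success or failure) becomes
        # training data; failures would be hindsight-relabeled here.
        model = model.finetune(traces)
    return model


final = soar_loop(MockModel(), tasks=["task-A", "task-B"])
```

Each round feeds the entire search trace back into training, so later rounds search with a strictly better model rather than a fixed one.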
Why does this matter? Most coding tasks are too hard for even the best language models to solve in one shot. Traditional search methods help, but they hit a wall because the model's abilities are fixed. SOAR breaks through this barrier by letting the model improve itself over time.
Introducing SOAR, a self-improving framework for program synthesis that alternates between search and learning (accepted to #ICML!)
It brings LLMs from just a few percent on ARC-AGI-1 up to 52%.
We're releasing the fine-tuned LLMs, a dataset of 5M generated programs, and the code.
🧵