Momchil Tomov (@momchiltomov)

Hello all! 👋 🚨 New Preprint Alert! 🚨

Code World Models for General Game-Playing. ♟️🎲 ♣️♥️♠️♦️

I am pleased to announce our new paper, which provides an extremely sample-efficient way to create an agent that can perform well in multi-agent, partially-observed, symbolic environments!

🧵 1/N

09.10.2025 19:27 👍 54 🔁 9 💬 2 📌 4

Yes! I'm putting out the ad for that early next year. Let's get in touch.

03.12.2025 22:41 👍 1 🔁 0 💬 1 📌 0

Here are several examples of real-world cut-ins. TreeIRL anticipates the cut-in and brakes comfortably, while the other baselines either brake too late or brake uncomfortably (see inset history of vehicle kinematics).

18.09.2025 15:49 👍 3 🔁 0 💬 0 📌 0

TreeIRL achieves 1-2 orders of magnitude improvement in safety, while also improving comfort and progress! On the road, it is by far the best planner.

18.09.2025 15:48 👍 3 🔁 0 💬 1 📌 0

We compare TreeIRL against multiple classical and SOTA planners in 7000+ nuPlan simulations. But the most exciting result is from deploying and evaluating the planners on real self-driving cars in Las Vegas.

18.09.2025 15:48 👍 3 🔁 0 💬 2 📌 0

We feed the MCTS trajectories into a deep scoring function trained with IRL to choose the most human-like among them.

The IRL network is trained on many hours of human expert demonstrations to effectively reverse-engineer the intrinsic reward function of human driving.
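To make the selection step concrete, here is a minimal Python sketch of choosing one trajectory from a candidate set by maximizing a score. It is not the TreeIRL scorer: the post describes a deep network trained with IRL, while `toy_score` below is a hand-written stand-in, and the trajectory format (a list of speeds) and all names are assumptions made for illustration.

```python
from typing import Callable, Sequence

# Hypothetical trajectory format: speeds (m/s) at successive timesteps.
Trajectory = Sequence[float]


def select_trajectory(
    candidates: Sequence[Trajectory],
    score: Callable[[Trajectory], float],
) -> Trajectory:
    """Return the candidate with the highest score."""
    return max(candidates, key=score)


def toy_score(traj: Trajectory) -> float:
    """Stand-in for a learned reward: penalize speed changes between steps."""
    return -sum(abs(b - a) for a, b in zip(traj, traj[1:]))


candidates = [
    [10.0, 10.0, 4.0],  # late, hard brake
    [10.0, 8.0, 6.0],   # steady brake
    [10.0, 9.0, 9.0],   # gentle early slowdown
]
chosen = select_trajectory(candidates, toy_score)
print(chosen)  # [10.0, 9.0, 9.0]
```

Swapping `toy_score` for a learned model leaves `select_trajectory` unchanged, which is the point of separating candidate generation from scoring.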

18.09.2025 15:48 👍 3 🔁 0 💬 1 📌 0

MCTS uses search + ML to efficiently explore combinatorially large search spaces. In most applications (e.g. AlphaGo), MCTS outputs a single next best action.

The main innovation is to repurpose MCTS to output a *set of possible sequences* of actions (i.e., trajectories).
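A toy Python sketch of the difference between the two kinds of output, assuming a search tree has already been built. This is not the TreeIRL implementation: the `Node` class, the action labels, and ranking paths by leaf visit count are all assumptions chosen to keep the example small.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    action: str                      # action that led to this node (hypothetical labels)
    visits: int = 0                  # visit count accumulated during search
    children: list["Node"] = field(default_factory=list)


def best_action(root: Node) -> str:
    """Conventional output: the single most-visited action at the root."""
    return max(root.children, key=lambda c: c.visits).action


def candidate_trajectories(root: Node, k: int) -> list[list[str]]:
    """Alternative output: up to k root-to-leaf action sequences,
    ranked by leaf visit count (an assumed ranking rule)."""
    paths: list[tuple[int, list[str]]] = []

    def walk(node: Node, prefix: list[str]) -> None:
        if not node.children:
            paths.append((node.visits, prefix))
            return
        for child in node.children:
            walk(child, prefix + [child.action])

    walk(root, [])
    paths.sort(key=lambda p: p[0], reverse=True)
    return [actions for _, actions in paths[:k]]


# A small hand-built tree standing in for the result of a search.
root = Node("root", 10, [
    Node("keep_lane", 7, [Node("brake", 5), Node("cruise", 2)]),
    Node("change_left", 3, [Node("accelerate", 3)]),
])

print(best_action(root))                # keep_lane
print(candidate_trajectories(root, 2))  # [['keep_lane', 'brake'], ['change_left', 'accelerate']]
```

Returning whole paths rather than one action is what lets a separate scoring stage compare complete trajectories.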

18.09.2025 15:47 👍 3 🔁 0 💬 1 📌 0

Why it matters (cont'd):

🧩 Flexible framework that can be extended with imitation learning and reinforcement learning.

‼️ Underscores importance of diverse metrics and real-world evaluation.

18.09.2025 15:47 👍 3 🔁 0 💬 1 📌 0

Why this matters:

🛣️ First real-world evaluation of MCTS-based planner on public roads.

📊 Comprehensive comparison across simulation and **500+ miles of urban driving** in Las Vegas.

🏆 Beats classical + SOTA planners, balancing safety, progress, comfort, and human-likeness.

18.09.2025 15:47 👍 3 🔁 0 💬 1 📌 0

TreeIRL: Safe Urban Driving with Tree Search and Inverse Reinforcement Learning

We present TreeIRL, a novel planner for autonomous driving that combines Monte Carlo tree search (MCTS) and inverse reinforcement learning (IRL) to achieve state-of-the-art performance in simulation a...

💡The key idea is to use Monte Carlo tree search (MCTS) to find a promising set of safe candidate trajectories and inverse reinforcement learning (IRL) to choose the most human-like trajectory among them.

Read the full paper here --> arxiv.org/abs/2509.13579

18.09.2025 15:39 👍 4 🔁 0 💬 1 📌 0

Excited to share a new preprint based on my work this past year:

**TreeIRL** is a novel planner that combines classical search with learning-based methods to achieve state-of-the-art performance in simulation and in **real-world autonomous driving**! 🚘 🤖 🚀

18.09.2025 15:39 👍 27 🔁 6 💬 1 📌 0

Competitive integration of time and reward explains value-sensitive foraging decisions and frontal cortex ramping dynamics

Bukwich and Campbell et al. show that mice integrate elapsed time and reward intake, scaled by a latent patience variable, to decide when to leave virtual “patches.” Frontal cortex ramping activity ma...

Our paper on foraging is now published in Neuron! Read it here:

www.cell.com/neuron/fullt...

This project was co-led by Michael Bukwich (not on Bluesky) and me, with major contributions from all co-authors. Huge thanks to the whole team!

07.08.2025 17:35 👍 82 🔁 22 💬 2 📌 1
