Momchil Tomov

@momchiltomov

Cognitive Neuroscientist @ Harvard, AI Researcher @ Motional. Models of human & robot decision making in complex environments, including video games and urban driving. https://www.momchiltomov.com/

179 Followers · 305 Following · 10 Posts · Joined 06.06.2025

Latest posts by Momchil Tomov @momchiltomov

Hello all! 👋 🚨 New Preprint Alert! 🚨

Code World Models for General Game-Playing. ♟️🎲 ♣️♥️♠️♦️

I am pleased to announce our new paper, which provides an extremely sample-efficient way to create an agent that can perform well in multi-agent, partially observed, symbolic environments!

🧵 1/N

09.10.2025 19:27 👍 54 🔁 9 💬 2 📌 4

Yes! I'm putting out the ad for that early next year. Let's get in touch.

03.12.2025 22:41 👍 1 🔁 0 💬 1 📌 0

Here are several examples of real-world cut-ins. TreeIRL anticipates the cut-in and brakes comfortably, while the other baselines either brake too late or brake uncomfortably (see the inset history of vehicle kinematics).

18.09.2025 15:49 👍 3 🔁 0 💬 0 📌 0

TreeIRL achieves a 1-2 order-of-magnitude improvement in safety while also improving comfort and progress! On the road, it is by far the best planner.

18.09.2025 15:48 👍 3 🔁 0 💬 1 📌 0

We compare TreeIRL against multiple classical and SOTA planners in 7000+ nuPlan simulations. But the most exciting result is from deploying and evaluating the planners on real self-driving cars in Las Vegas.

18.09.2025 15:48 👍 3 🔁 0 💬 2 📌 0

We feed the MCTS trajectories into a deep scoring function trained with IRL to choose the most human-like among them.

The IRL network is trained on many hours of human expert demonstrations to effectively reverse-engineer the intrinsic reward function of human driving.

18.09.2025 15:48 👍 3 🔁 0 💬 1 📌 0
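These posts don't say how the deep IRL scorer is trained, so what follows is only a minimal sketch of the general idea, not TreeIRL's method. It swaps the deep network for a *linear* reward over two hypothetical features (progress and jerk) and assumes each training scenario marks which candidate best matches the human demonstration. In maximum-entropy IRL over a finite candidate set, each candidate's probability is a softmax of its reward, and the log-likelihood gradient is the expert's features minus the model's expected features. The function names and toy data are made up for illustration.

```python
import math


def softmax(scores):
    # Numerically stable softmax over candidate rewards.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_linear_irl(scenarios, n_features, lr=0.1, epochs=200):
    """Maximum-entropy IRL with a linear reward r(traj) = w . phi(traj).

    scenarios: list of (features, expert_idx), where features holds one
    hypothetical feature vector per candidate trajectory and expert_idx marks
    the candidate closest to the human demonstration.
    """
    w = [0.0] * n_features
    for _ in range(epochs):
        grad = [0.0] * n_features
        for feats, expert_idx in scenarios:
            p = softmax([dot(f, w) for f in feats])  # P(candidate | w)
            for j in range(n_features):
                expected = sum(p_k * f[j] for p_k, f in zip(p, feats))
                grad[j] += feats[expert_idx][j] - expected  # expert minus expected
        # Gradient ascent on the average log-likelihood of the expert choices.
        w = [w_j + lr * g / len(scenarios) for w_j, g in zip(w, grad)]
    return w


def pick_most_human_like(feats, w):
    # Index of the candidate with the highest learned reward.
    scores = [dot(f, w) for f in feats]
    return max(range(len(feats)), key=scores.__getitem__)


# Hypothetical toy data: features are [progress, jerk]; the "human" prefers
# high progress and low jerk.
scenarios = [
    ([[1.0, 0.1], [0.5, 0.1], [1.0, 2.0]], 0),
    ([[0.2, 0.0], [0.9, 0.2], [0.9, 1.5]], 1),
]
w = train_linear_irl(scenarios, n_features=2)
held_out = [[0.3, 0.1], [0.8, 0.3], [0.8, 3.0]]
print(w, pick_most_human_like(held_out, w))
```

On this toy data the learned weight is positive for progress and negative for jerk, so the held-out scenario selects the fast, smooth candidate (index 1). A deep scorer would replace the dot product with a neural network over richer scene features.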

MCTS uses search + ML to efficiently explore combinatorially large search spaces. In most applications (e.g. AlphaGo), MCTS outputs a single next best action.

The main innovation is to repurpose MCTS to output a *set of possible sequences* of actions (i.e., trajectories).

18.09.2025 15:47 👍 3 🔁 0 💬 1 📌 0
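The posts don't spell out how TreeIRL turns the search tree into a set of trajectories. One simple option, assumed here purely for illustration, is to record every complete action sequence the search simulates and return the k highest-return ones instead of a single best first action. The sketch runs plain UCT (selection, expansion, random rollout, backpropagation) on a hypothetical 1-D task in which the ego vehicle approaches a stopped car and each action is an acceleration. `ACTIONS`, `HORIZON`, the reward weights, and all function names are illustrative assumptions.

```python
import math
import random

# Hypothetical toy setting: ego drives along a straight lane toward a stopped
# vehicle; actions are longitudinal accelerations applied for DT seconds.
ACTIONS = (-2.0, 0.0, 2.0)  # m/s^2
HORIZON = 5                 # decisions per trajectory
DT = 1.0                    # seconds per step


def step(state, accel):
    pos, vel = state
    vel = max(0.0, vel + accel * DT)
    return (pos + vel * DT, vel)


def reward(state, accel, lead_pos):
    # Progress, minus a small comfort cost, minus a large penalty for being
    # within 5 m of (or past) the stopped lead vehicle.
    pos, vel = state
    r = vel * DT - 0.1 * accel ** 2
    if lead_pos - pos < 5.0:
        r -= 100.0
    return r


class Node:
    def __init__(self, state, depth):
        self.state, self.depth = state, depth
        self.children = {}  # action -> Node
        self.visits, self.value_sum = 0, 0.0


def uct_select(node, c):
    # Child maximizing mean return plus the UCB1 exploration bonus.
    def ucb(item):
        child = item[1]
        return (child.value_sum / child.visits
                + c * math.sqrt(math.log(node.visits) / child.visits))
    return max(node.children.items(), key=ucb)


def mcts_trajectory_set(init_state, lead_pos, n_iter=2000, k=3, c=10.0, seed=0):
    rng = random.Random(seed)
    root = Node(init_state, 0)
    seen = {}  # full action sequence -> its (deterministic) total return
    for _ in range(n_iter):
        node, path, actions, total = root, [root], [], 0.0
        # Selection: descend through fully expanded nodes with UCB1.
        while node.depth < HORIZON and len(node.children) == len(ACTIONS):
            a, node = uct_select(node, c)
            path.append(node)
            actions.append(a)
            total += reward(node.state, a, lead_pos)
        # Expansion: add one untried action.
        if node.depth < HORIZON:
            a = rng.choice([x for x in ACTIONS if x not in node.children])
            child = Node(step(node.state, a), node.depth + 1)
            node.children[a] = child
            node = child
            path.append(node)
            actions.append(a)
            total += reward(node.state, a, lead_pos)
        # Rollout: random actions until the horizon.
        state = node.state
        for _ in range(HORIZON - node.depth):
            a = rng.choice(ACTIONS)
            state = step(state, a)
            actions.append(a)
            total += reward(state, a, lead_pos)
        seen[tuple(actions)] = total
        # Backpropagation along the tree part of the path.
        for n in path:
            n.visits += 1
            n.value_sum += total
    # Instead of a single next action, return the k best full trajectories.
    return sorted(seen.items(), key=lambda kv: kv[1], reverse=True)[:k]


candidates = mcts_trajectory_set(init_state=(0.0, 10.0), lead_pos=40.0)
for seq, ret in candidates:
    print(seq, round(ret, 1))
```

Because the toy dynamics are deterministic, each returned trajectory can be recovered from its action sequence by replaying `step` from the initial state.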

Why it matters (cont'd):

🧩 Flexible framework that can be extended with imitation learning and reinforcement learning.

‼️ Underscores the importance of diverse metrics and real-world evaluation.

18.09.2025 15:47 👍 3 🔁 0 💬 1 📌 0

Why this matters:

🛣️ First real-world evaluation of an MCTS-based planner on public roads.

📊 Comprehensive comparison across simulation and **500+ miles of urban driving** in Las Vegas.

🏆 Beats classical + SOTA planners, balancing safety, progress, comfort, and human-likeness.

18.09.2025 15:47 👍 3 🔁 0 💬 1 📌 0
TreeIRL: Safe Urban Driving with Tree Search and Inverse Reinforcement Learning
We present TreeIRL, a novel planner for autonomous driving that combines Monte Carlo tree search (MCTS) and inverse reinforcement learning (IRL) to achieve state-of-the-art performance in simulation a...

💡 The key idea is to use Monte Carlo tree search (MCTS) to find a promising set of safe candidate trajectories and inverse reinforcement learning (IRL) to choose the most human-like trajectory among them.

Read the full paper here --> arxiv.org/abs/2509.13579

18.09.2025 15:39 👍 4 🔁 0 💬 1 📌 0
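The two-stage idea in this post (search proposes safe candidates, a learned scorer picks among them) can be sketched as a safety filter followed by an argmax. This is a hypothetical illustration: the minimum-distance safety check, the 2 m margin, and `toy_human_likeness` (a hand-written stand-in for the learned IRL network) are all assumptions, not the paper's actual components. The example replays a cut-in like the ones described in the real-world examples, where keeping speed would pass too close to the merging vehicle.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

Point = Tuple[float, float]


@dataclass
class Candidate:
    name: str
    points: Sequence[Point]  # ego (x, y) at t = 1..T


def min_gap(traj: Sequence[Point], obstacle: Sequence[Point]) -> float:
    # Smallest Euclidean distance between ego and the predicted obstacle at matching times.
    return min(math.dist(p, q) for p, q in zip(traj, obstacle))


def select_trajectory(
    candidates: List[Candidate],
    obstacle: Sequence[Point],
    score: Callable[[Sequence[Point]], float],
    margin: float = 2.0,
) -> Optional[Candidate]:
    # Stage 1: keep only candidates that respect a (hypothetical) safety margin.
    safe = [c for c in candidates if min_gap(c.points, obstacle) >= margin]
    if not safe:
        return None  # hypothetical: a caller would need a fallback such as an emergency stop
    # Stage 2: among safe candidates, return the one the scorer prefers.
    return max(safe, key=lambda c: score(c.points))


def toy_human_likeness(points: Sequence[Point]) -> float:
    # Stand-in for the learned IRL scorer: reward progress, penalize abrupt speed changes.
    xs = [p[0] for p in points]
    vel = [b - a for a, b in zip(xs, xs[1:])]
    acc = [b - a for a, b in zip(vel, vel[1:])]
    return xs[-1] - 2.0 * sum(a * a for a in acc)


# A vehicle cutting in from the adjacent lane (y = 3.5) into the ego lane (y = 0).
cut_in = [(10.0, 3.5), (12.0, 2.0), (14.0, 0.5), (16.0, 0.0)]
candidates = [
    Candidate("keep_speed", [(5.0, 0.0), (10.0, 0.0), (15.0, 0.0), (20.0, 0.0)]),
    Candidate("brake", [(4.0, 0.0), (7.0, 0.0), (9.0, 0.0), (10.0, 0.0)]),
    Candidate("hard_brake", [(3.0, 0.0), (4.0, 0.0), (4.0, 0.0), (4.0, 0.0)]),
]
chosen = select_trajectory(candidates, cut_in, toy_human_likeness)
print(chosen.name if chosen else "no safe candidate")
```

Keeping safety as a hard filter, rather than folding it into the score, means a high human-likeness score can never override a safety violation. Here `keep_speed` would score highest but is rejected, and `brake` beats `hard_brake` because it makes more progress with smoother speed changes.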

Excited to share a new preprint based on my work this past year:

**TreeIRL** is a novel planner that combines classical search with learning-based methods to achieve state-of-the-art performance in simulation and in **real-world autonomous driving**! 🚘 🤖 🚀

18.09.2025 15:39 👍 27 🔁 6 💬 1 📌 0
Competitive integration of time and reward explains value-sensitive foraging decisions and frontal cortex ramping dynamics
Bukwich and Campbell et al. show that mice integrate elapsed time and reward intake, scaled by a latent patience variable, to decide when to leave virtual “patches.” Frontal cortex ramping activity ma...

Our paper on foraging is now published in Neuron! Read it here:

www.cell.com/neuron/fullt...

This project was co-led by Michael Bukwich (not on Bluesky) and me, with major contributions from all co-authors. Huge thanks to the whole team!

07.08.2025 17:35 👍 82 🔁 22 💬 2 📌 1