Hello all! 🚨 New Preprint Alert! 🚨
Code World Models for General Game-Playing. 🎲 ♣️♥️♠️♦️
I am pleased to announce our new paper, which provides an extremely sample-efficient way to create an agent that can perform well in multi-agent, partially-observed, symbolic environments!
🧵 1/N
09.10.2025 19:27
Yes! I'm putting out the ad for that early next year. Let's get in touch.
03.12.2025 22:41
Here are several examples of real-world cut-ins. TreeIRL anticipates the cut-in and brakes comfortably, while the other baselines either brake too late or brake uncomfortably (see inset history of vehicle kinematics).
18.09.2025 15:49
TreeIRL achieves a 1-2 orders of magnitude improvement in safety, while also improving comfort and progress! On the road, it is by far the best planner.
18.09.2025 15:48
We compare TreeIRL against multiple classical and SOTA planners in 7000+ nuPlan simulations. But the most exciting result is from deploying and evaluating the planners on real self-driving cars in Las Vegas.
18.09.2025 15:48
We feed the MCTS trajectories into a deep scoring function trained with IRL to choose the most human-like among them.
The IRL network is trained on many hours of human expert demonstrations to effectively reverse-engineer the intrinsic reward function of human driving.
18.09.2025 15:48
MCTS uses search + ML to efficiently explore combinatorially large search spaces. In most applications (e.g. AlphaGo), MCTS outputs a single next best action.
The main innovation is to repurpose MCTS to output a *set of possible sequences* of actions (i.e., trajectories).
18.09.2025 15:47
Why it matters (cont'd):
🧩 Flexible framework that can be extended with imitation learning and reinforcement learning.
‼️ Underscores the importance of diverse metrics and real-world evaluation.
18.09.2025 15:47
Why this matters:
First real-world evaluation of an MCTS-based planner on public roads.
Comprehensive comparison across simulation and **500+ miles of urban driving** in Las Vegas.
Beats classical + SOTA planners, balancing safety, progress, comfort, and human-likeness.
18.09.2025 15:47
Excited to share a new preprint based on my work this past year:
**TreeIRL** is a novel planner that combines classical search with learning-based methods to achieve state-of-the-art performance in simulation and in **real-world autonomous driving**!
18.09.2025 15:39