Regularized self-play RL in grounded simulation effectively adapts driving policies to completely new cities.
Really enjoyed collaborating on this work, led by Zilin and Saeed! Check out Zilin's post below for a great summary
Thread: x.com/nirhso/statu...
Paper: arxiv.org/abs/2602.15891
20.02.2026 20:09
The most important finding from this analysis! See the post for more details
08.02.2026 20:20
PufferDrive 2.0 release - PufferDrive
High-throughput autonomous driving simulator built on PufferLib.
Several fast evals are included, too! Check out our release post:
emerge-lab.github.io/PufferDrive/...
Work done with Spencer Cheng* (co-first), Pragnay Mandavilli, Julian Hunt, Kevin Joseph, Waël Doulazmi, Valentin Charraut, Aditya Gupta, Joseph Suarez, and
@eugenevinitsky.bsky.social
30.12.2025 16:12
PufferDrive 2.0 release
YouTube video by Daphne Cornelisse
What if you could train agents on a decade of driving experience in under an hour, on a single GPU?
Excited to share PufferDrive 2.0: a fast, friendly driving simulator with RL training via PufferLib at 300K steps/sec.
youtu.be/LfQ324R-cbE?...
30.12.2025 16:12
Estimating cognitive biases with attention-aware inverse planning
People's goal-directed behaviors are influenced by their cognitive biases, and autonomous systems that interact with people should be aware of this. For example, people's attention to objects in their...
Excited to share a new preprint, accepted as a spotlight at #NeurIPS2025!
Humans are imperfect decision-makers, and autonomous systems should understand how we deviate from idealized rationality
Our paper aims to address this!
arxiv.org/abs/2510.25951
a 🧵⤵️
13.11.2025 13:20
How to catch subtle RL bugs before they catch you
Tools and habits for reliable, fast RL experimentation and development
Rapid RL experimentation is great. But how do you catch silent errors before they slip by?
In this post, I share tools and habits that help me move quickly from idea to result without sacrificing reliability.
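One such habit can be sketched as a tiny invariant check run on every training batch. This is a minimal illustration, not code from the post: the function name `sanity_check_batch` and the specific thresholds are hypothetical.

```python
import numpy as np

def sanity_check_batch(obs, rewards, advantages):
    """Cheap invariant checks to run on every training batch.

    These catch silent RL bugs (non-finite observations, a dead
    advantage signal, wildly scaled rewards) before they waste a run.
    """
    assert np.isfinite(obs).all(), "non-finite values in observations"
    assert np.isfinite(rewards).all(), "non-finite rewards"
    # A (near-)constant advantage means the policy gradient is ~zero,
    # often a symptom of broken value targets or reward wiring.
    assert advantages.std() > 1e-6, "advantages are (near-)constant"
    # Rewards far outside a sane range usually indicate a scaling bug.
    assert np.abs(rewards).max() < 1e4, "suspiciously large rewards"

# Example: a healthy batch passes silently.
rng = np.random.default_rng(0)
sanity_check_batch(
    obs=rng.normal(size=(64, 8)),
    rewards=rng.normal(size=64),
    advantages=rng.normal(size=64),
)
```

Checks like these cost microseconds per batch, so they can stay enabled even in long production runs.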
13.10.2025 11:29
The single biggest epistemic challenge in the internet era is remaining calibrated about what "normal" people think while the internet throws up an infinite wall of crazy. Thousands of people sharing an absurd opinion on the internet tells you very little!
08.09.2025 18:43
Overnight runs are the overnight oats of research โ prep, forget, and rewarding by morning
19.04.2025 00:44
Self-play for Self-driving and where Scaling Reinforcement Learning is Heading with Eugene Vinitsky
YouTube video by Interconnects AI
Building a "human-level" simulated driver that zero-shot generalizes to many benchmarks: a fun interview with @natolambert.bsky.social
www.youtube.com/watch?v=2Q66...
12.03.2025 19:19
This was joint work with Aarav Pandya, Kevin Joseph, Joseph Suárez, and @eugenevinitsky.bsky.social
28.02.2025 17:19
Results (2): Beyond in-distribution generalization, our agents show partial robustness to scenarios that rarely occur in the data.
More importantly, results show that agents can be fine-tuned in minutes to reach near-perfect performance in such cases.
28.02.2025 17:19
Results (1): Self-play scales well with data. With 10,000 training scenarios, the model nearly reaches the ceiling of our benchmark: a 99.81% goal-reaching rate, 0.44% collision rate, and 0.31% off-road rate on 10,000 held-out test scenarios.
28.02.2025 17:19
We train sim agents using self-play PPO on 10K+ scenarios from the Waymo Open Dataset in GPUDrive, under a semi-realistic framework for human perception and control.
Agents learn goal-directed behavior, avoiding collisions and staying on the road.
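As a rough illustration of the self-play setup, every agent in a scene is driven by copies of one shared policy, so the joint experience trains a single set of weights. This toy sketch stands in for (and does not reproduce) the actual GPUDrive/PPO code; `shared_policy` and `self_play_rollout` are hypothetical names operating on random stand-in observations.

```python
import numpy as np

rng = np.random.default_rng(0)

def shared_policy(obs, weights):
    """One policy, applied independently to every agent's observation."""
    logits = obs @ weights
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return probs / probs.sum(axis=-1, keepdims=True)

def self_play_rollout(n_agents=8, obs_dim=4, n_actions=3, horizon=10):
    """Collect one joint rollout where all agents share the same weights.

    In self-play, each agent's "other drivers" are controlled by the
    current policy, so one policy update improves the whole scene.
    """
    weights = rng.normal(size=(obs_dim, n_actions)) * 0.1
    trajectory = []
    for _ in range(horizon):
        obs = rng.normal(size=(n_agents, obs_dim))  # toy observations
        probs = shared_policy(obs, weights)
        actions = np.array([rng.choice(n_actions, p=p) for p in probs])
        trajectory.append((obs, actions))
    return trajectory  # joint experience fed to the PPO update

traj = self_play_rollout()
```

The key design point is that `weights` appears once: there is no separate opponent model to fit, which is what lets the approach scale with the number of training scenarios.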
28.02.2025 17:19
SOTA generative models trained on large human datasets show unintended behaviors like crashes (5-6%) and off-road events (6-12%) in benchmarks for nominal driving.
Unpredictable deviations make it hard to separate signal from noise.
28.02.2025 17:19
Sim agents are key for developing safety-critical autonomous systems, like self-driving cars.
We're open-sourcing sim agents that achieve a 99.8% success rate with < 0.8% failures on the Waymo Dataset. These agents are built through scaling self-play.
28.02.2025 17:19
Challenge accepted
26.02.2025 18:13
Oh, and stay tuned for another big release tomorrow!
20.02.2025 18:53
Huge thanks to my incredible collaborators for making this possible: Saman Kazemkhani, Aarav Pandya, @eugenevinitsky.bsky.social, Joseph Suarez for converting the sim to a package and optimizing the PPO loop, and Kevin Joseph for all his help with data processing, tutorials, and more!
20.02.2025 18:53
GPUDrive got accepted to ICLR 2025!
With that, we release GPUDrive v0.4.0! 🚨 You can now install the repo and run your first fast PPO experiment in under 10 minutes.
I'm honestly so excited about the new opportunities and research the sim makes possible. 1/2
20.02.2025 18:53
A large group of us (spearheaded by Denizalp Goktas) have put out a position paper on paths towards foundation models for strategic decision-making. Language models still lack these capabilities so we'll need to build them: hal.science/hal-04925309...
18.02.2025 18:33