Ville (@villekuosmanen)

Does learning from experience benefit small AI robotics models? Replicating the RL loop behind Physical Intelligence’s Pi*0.6 foundation model without VLAs or diffusion

Full writeup, more demos, and thoughts here: villekuosmanen.medium.com/3de024f930e0

Hoping to revisit this project after the holidays with diffusion-based model architectures, which I expect to give better results!

22.12.2025 11:36 👍 0 🔁 0 💬 0 📌 0

As an ablation, we train an ACT model with human-collected data only (full rollouts and interventions), and get much better results. This suggests the problem is caused by the model not learning differentiated distributions for positive-advantage and negative-advantage actions.

22.12.2025 11:36 👍 0 🔁 0 💬 1 📌 0

However, the results are underwhelming - our Advantage-conditioned model seems to learn the average of all actions.

Diffusion models are specifically designed to allow conditional generation - perhaps adding a single token into the inputs of a transformer is not enough?

22.12.2025 11:34 👍 0 🔁 0 💬 1 📌 0
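
To make concrete what "adding a single token into the inputs of a transformer" means in the post above, here is a rough sketch in plain Python. The embedding table, its size, and the function names are hypothetical and are not taken from the actual ACT code. It only shows that the two conditioning cases differ in one token out of the whole input sequence.

```python
import random


def make_embedding_table(num_tokens, dim, seed=0):
    """Hypothetical learned embedding table, randomly initialised here."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(num_tokens)]


def build_input_sequence(obs_tokens, advantage_indicator, adv_table):
    """Prepend the embedding of the advantage indicator (0 or 1) to the observation tokens."""
    return [adv_table[advantage_indicator]] + list(obs_tokens)


dim = 8
adv_table = make_embedding_table(num_tokens=2, dim=dim)
# Stand-in for encoded camera / joint-state tokens (assumed shape: 20 tokens of size dim).
obs_tokens = [[0.1] * dim for _ in range(20)]

seq_positive = build_input_sequence(obs_tokens, 1, adv_table)
seq_negative = build_input_sequence(obs_tokens, 0, adv_table)
# Count how many positions differ between the two conditioned inputs.
num_different = sum(a != b for a, b in zip(seq_positive, seq_negative))
```

With 20 observation tokens, the positive and negative inputs are identical in 20 of 21 positions, which is one way to picture why the conditioning signal might be easy for the model to ignore.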

By replicating the Advantage calculation from the Pi paper (along with some heuristics), we can classify each action as positive or negative advantage. Adding this label to our inputs as an “Advantage token” should allow the trained model to predict positive-advantage actions only…

22.12.2025 11:28 👍 1 🔁 0 💬 1 📌 0
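
The labelling step in the post above can be sketched roughly as follows. This is a simplified N-step advantage estimate with a fixed threshold, written under stated assumptions: the horizon `n`, the threshold, the treatment of the value after the episode end as 0, and all names are illustrative, and it is not necessarily the exact formula or heuristics used in the paper or in this replication.

```python
def n_step_advantages(rewards, values, n=5):
    """A_t = sum of the next n rewards + V(t+n) - V(t).

    Assumption: the value after the end of the episode is treated as 0.
    """
    horizon = len(rewards)
    advantages = []
    for t in range(horizon):
        end = min(t + n, horizon)
        n_step_return = sum(rewards[t:end])
        bootstrap = values[end] if end < horizon else 0.0
        advantages.append(n_step_return + bootstrap - values[t])
    return advantages


def advantage_tokens(advantages, threshold=0.0):
    """Binarise advantages: 1 = positive advantage, 0 = negative (or neutral)."""
    return [1 if a > threshold else 0 for a in advantages]


# Toy episode with a -1 reward per step and per-step value predictions.
rewards = [-1.0, -1.0, -1.0]
values = [-4.0, -2.0, -0.5]
advantages = n_step_advantages(rewards, values, n=1)
tokens = advantage_tokens(advantages)
```

At training time each action chunk would carry its own token; at inference the token would be fixed to 1 to ask for positive-advantage actions only.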

I had previously shown how the ACT architecture can be used as a robust value function.

By changing the value prediction head from regression to binned classification with cross-entropy loss (like in the Pi paper) we get accurate reward predictions for each time step…

22.12.2025 11:27 👍 1 🔁 0 💬 1 📌 0
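
A minimal sketch of the binned value head idea from the post above, in plain Python. The value range of [-1, 0], the 10 bins, and all function names are assumptions for illustration and are not taken from the replication code or the Pi paper. The regression target is turned into a bin index, the head is trained with cross-entropy over bins, and a scalar value is recovered as the expectation over bin centres.

```python
import math


def value_to_bin(value, v_min=-1.0, v_max=0.0, num_bins=10):
    """Map a scalar value target to a bin index, clamped to the valid range."""
    frac = (value - v_min) / (v_max - v_min)
    idx = int(frac * num_bins)
    return max(0, min(num_bins - 1, idx))


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target_bin):
    """Classification loss for one time step: -log p(target_bin)."""
    return -math.log(softmax(logits)[target_bin])


def expected_value(logits, v_min=-1.0, v_max=0.0):
    """Decode a scalar value as the probability-weighted mean of the bin centres."""
    num_bins = len(logits)
    width = (v_max - v_min) / num_bins
    centers = [v_min + (i + 0.5) * width for i in range(num_bins)]
    return sum(p * c for p, c in zip(softmax(logits), centers))


# Example: a head that is fully uncertain predicts the middle of the range.
uniform_logits = [0.0] * 10
loss = cross_entropy(uniform_logits, value_to_bin(-0.55))
decoded = expected_value(uniform_logits)
```

One reason to prefer this over plain regression is that the head can represent a multi-modal belief about the outcome instead of a single averaged number.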

Last month Physical Intelligence published an elegant RL method for improving VLAs with both success and failure data.

We replicated the paper to see if the same method works with small, single-task ACT models…

22.12.2025 11:26 👍 1 🔁 0 💬 1 📌 0

duplicates or corrupted episodes in your LeRobot datasets?

building a data studio to find and delete them, as well as much more!

private beta coming soon, DM if you want access and are happy with occasional bugs

22.04.2025 20:05 👍 1 🔁 0 💬 0 📌 0

I still can't get over that this is a real graph!

I wonder if it holds true with pi0 or is only a feature of weaker, lower FPS foundation models like OpenVLA and Octo?

22.03.2025 21:44 👍 1 🔁 0 💬 0 📌 0

generalist-distillation.github.io/static/high_...

basically trajectories from rolling out RL policies generate more useful data than human teleoperation.

genuinely surprised by this though "tasks with well defined reward functions" is probably doing some heavy lifting here

22.03.2025 21:34 👍 0 🔁 0 💬 2 📌 0

Reading the RLDG paper and this statement is crazy 🤯

is RL just going to solve robot manipulation?

22.03.2025 21:34 👍 1 🔁 0 💬 1 📌 0

Chinese scientists ‘make first perfect replica’ of tooth enamel

New material almost identical in structure to human enamel – which does not regenerate itself – can grow on teeth and last permanently, researchers say.

A perfect replica of tooth enamel, created by Zhejiang University researchers: www.scmp.com/news/china/s...

21.03.2025 11:09 👍 22 🔁 5 💬 0 📌 1

Does a CLIP style model for video exist?

Would be super useful for robotics. Currently I see people encoding camera frames individually which is super inefficient.

21.01.2025 16:04 👍 1 🔁 0 💬 0 📌 0

Navigation World Models

Navigation World Models from Meta seems super exciting to me! I think deep learning will eventually supplant SLAM in navigation, and the NWM seems like a step in the right direction.

Hoping the model & code can be open-sourced in the future! www.amirbar.net/nwm/

13.12.2024 16:25 👍 0 🔁 0 💬 0 📌 0

My definition would be something like "a model for controlling one or more types of robot, that is able to do different kinds of tasks based on prompting."

06.12.2024 09:24 👍 1 🔁 0 💬 0 📌 0

Interesting 👀 thanks!

02.12.2024 18:26 👍 1 🔁 0 💬 0 📌 0

Do you think this is because the current benchmarks are not good enough, or something more fundamental (i.e. there can never be a good benchmark because of X reasons)?

Asking because I might do a side project in this space if it looks like a high-value problem to solve.

02.12.2024 15:59 👍 0 🔁 0 💬 1 📌 0

Sorry, I am struggling to understand how this is related to robotics benchmarks? Looks like some kind of IoT management platform to me...

02.12.2024 12:22 👍 0 🔁 0 💬 1 📌 0

I know @cpaxton.bsky.social has done a few OVMM benchmarks in the past. But these were pretty strongly coupled with the Habitat simulator and deploying an unrelated model would require lots of engineering.

I think better tools for reducing the engineering time needed to connect a model to a sim would be a good start.

02.12.2024 12:14 👍 2 🔁 0 💬 1 📌 0

I agree.

Perhaps we need multiple different simulations & easy ways to run any policy implementation against all of them.

If it performs well in diverse sims, it probably performs well in the real world as well.

02.12.2024 12:12 👍 0 🔁 0 💬 0 📌 0
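
One way to picture "run any policy implementation against all of them" from the post above: a rough sketch in which every simulator is wrapped behind the same tiny interface. The `reset`/`step` signature, the `evaluate` helper, and the toy environment are all hypothetical and do not correspond to an existing benchmark API.

```python
from typing import Callable, Dict, List, Tuple

Observation = List[float]
Action = List[float]
Policy = Callable[[Observation], Action]


class ToyReachEnv:
    """Hypothetical 1-D stand-in for a simulator: succeed by driving position to ~0."""

    def __init__(self, start: float):
        self.start = start
        self.pos = start

    def reset(self) -> Observation:
        self.pos = self.start
        return [self.pos]

    def step(self, action: Action) -> Tuple[Observation, bool, bool]:
        """Apply the action and return (observation, done, success)."""
        self.pos += action[0]
        success = abs(self.pos) < 0.05
        return [self.pos], success, success


def evaluate(policy: Policy, envs: Dict[str, ToyReachEnv],
             episodes: int = 3, max_steps: int = 6) -> Dict[str, float]:
    """Run one policy against every environment and report per-env success rate."""
    results = {}
    for name, env in envs.items():
        successes = 0
        for _ in range(episodes):
            obs = env.reset()
            for _ in range(max_steps):
                obs, done, success = env.step(policy(obs))
                if done:
                    successes += int(success)
                    break
        results[name] = successes / episodes
    return results


def halve_distance(obs: Observation) -> Action:
    """Toy policy: move halfway towards the origin each step."""
    return [-0.5 * obs[0]]


scores = evaluate(halve_distance, {"near": ToyReachEnv(1.0), "far": ToyReachEnv(4.0)})
```

The policy only ever sees observations and returns actions, so swapping in a different simulator would mean writing one adapter rather than re-engineering the model.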

Does Bluesky have a techy industrial / manufacturing side, or is that community still on Twitter?

Out yourself in the replies if you are building in this space!

02.12.2024 11:13 👍 2 🔁 0 💬 1 📌 0

I actually really like Twitter's new Blue Tick policy, it makes it so much easier to find real, interesting people in the sea of bots.

Would like for Bluesky to add something like that as well. I'd probably pay a few £ a month for it.

02.12.2024 07:45 👍 1 🔁 0 💬 1 📌 0

What robotics benchmarks are people currently using?

Do we need a better benchmark to compare robotics foundation models? Is anyone working on this?

02.12.2024 07:44 👍 9 🔁 1 💬 4 📌 0

Sim2real has essentially solved locomotion, which made me adjust my priors. Modelling physics and reward functions for mobile manipulation is much harder than for locomotion, but I'd place a cautious bet on the approach working.

14.11.2024 10:54 👍 1 🔁 0 💬 1 📌 0

I'm interested in following the growth in sim2real techniques. With GenAI you could create infinitely diverse envs, though major engineering challenges remain.

14.11.2024 10:53 👍 1 🔁 0 💬 2 📌 0
