Full writeup, more demos, and thoughts here: villekuosmanen.medium.com/3de024f930e0
Hoping to revisit this project after the holidays with diffusion-based model architectures which I expect would give better results!
As an ablation, we train an ACT model with human-collected data only (full rollouts and interventions), and get much better results - proving the problem is caused by the model not learning differentiated distributions for positive and negative advantage actions.
However, the results are underwhelming - our Advantage-conditioned model seems to learn the average of all actions.
Diffusion models are specifically designed to allow conditional generation - perhaps adding a single token into the inputs of a transformer is not enough?
By replicating the Advantage calculation from the Pi paper (along with some heuristics), we can classify actions as having positive or negative advantage. Adding this “Advantage token” to our inputs should allow the trained model to predict positive actions only…
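A minimal sketch of the labelling step described above, not the exact code from the writeup. It assumes an undiscounted n-step advantage estimate computed from the value function's predictions, a threshold of zero, and one illustrative heuristic (human-collected steps are always labelled positive). The function names, the threshold, and the heuristic are all assumptions made for illustration.

```python
def n_step_advantages(rewards, values, n):
    """Undiscounted n-step advantage estimate (assumed form):
    A_t = sum_{k<n} r_{t+k} + V(s_{t+n}) - V(s_t).

    `values` must have len(rewards) + 1 entries; the last entry is the
    value after the final step of the episode.
    """
    num_steps = len(rewards)
    advantages = []
    for t in range(num_steps):
        end = min(t + n, num_steps)  # truncate the window at episode end
        advantages.append(sum(rewards[t:end]) + values[end] - values[t])
    return advantages


def advantage_tokens(advantages, threshold=0.0, is_human=None):
    """Binarise advantages into a conditioning token: 1 = positive, 0 = negative.

    Hypothetical heuristic: steps flagged in `is_human` are always positive.
    """
    if is_human is None:
        is_human = [False] * len(advantages)
    return [
        1 if (human or adv > threshold) else 0
        for adv, human in zip(advantages, is_human)
    ]


# Toy episode with a reward of -1 per step until success.
rewards = [-1.0, -1.0, -1.0]
values = [-3.0, -2.5, -1.0, 0.0]  # predicted values, incl. terminal state
advs = n_step_advantages(rewards, values, n=1)
tokens = advantage_tokens(advs)
```

During training the token would be fed to the policy alongside the observations; at inference time it would be fixed to 1 to request positive-advantage actions.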
I had previously shown how the ACT architecture can be used as a robust value function.
By changing the value prediction head from regression to binned classification with cross-entropy loss (like in the Pi paper) we get accurate reward predictions for each time step…
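To make the regression-to-classification change concrete, here is a small pure-Python sketch of binned value prediction. It assumes evenly spaced bins over a fixed value range and decodes a scalar as the probability-weighted mean of the bin centres. The bin layout, range, and function names are illustrative assumptions, not details taken from the post or the Pi paper.

```python
import math


def value_to_bin(value, v_min, v_max, num_bins):
    """Map a scalar value to the index of the evenly spaced bin containing it."""
    clipped = min(max(value, v_min), v_max)
    frac = (clipped - v_min) / (v_max - v_min)
    return min(int(frac * num_bins), num_bins - 1)  # v_max falls in the last bin


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target_bin):
    """Classification loss for one time step: -log p(target_bin)."""
    return -math.log(softmax(logits)[target_bin])


def expected_value(logits, v_min, v_max):
    """Decode a scalar prediction as the probability-weighted mean of bin centres."""
    num_bins = len(logits)
    width = (v_max - v_min) / num_bins
    centres = [v_min + (i + 0.5) * width for i in range(num_bins)]
    return sum(p * c for p, c in zip(softmax(logits), centres))


# Example: values normalised to [-1, 0], 4 bins, uninformative (uniform) logits.
logits = [0.0, 0.0, 0.0, 0.0]
target = value_to_bin(-0.3, -1.0, 0.0, 4)
loss = cross_entropy(logits, target)
decoded = expected_value(logits, -1.0, 0.0)
```

Compared with a regression head, the classification head outputs a full distribution over values, so an uncertain prediction is spread over several bins rather than collapsed to a single number.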
Last month Physical Intelligence published an elegant RL method for improving VLAs with both success and failure data.
We replicated the paper to see if the same method works with small, single-task ACT models…
duplicates or corrupted episodes in your LeRobot datasets?
building a data studio to find and delete them, as well as much more!
private beta coming soon, DM if you want access and are happy with occasional bugs
I still can't get over that this is a real graph!
I wonder if it holds true with pi0 or is only a feature of weaker, lower FPS foundation models like OpenVLA and Octo?
generalist-distillation.github.io/static/high_...
basically trajectories from rolling out RL policies generate more useful data than human teleoperation.
genuinely surprised by this though "tasks with well defined reward functions" is probably doing some heavy lifting here
Reading the RLDG paper and this statement is crazy 🤯
is RL just going to solve robot manipulation?
A perfect replica of tooth enamel, created by Zhejiang University researchers: www.scmp.com/news/china/s...
Does a CLIP style model for video exist?
Would be super useful for robotics. Currently I see people encoding camera frames individually which is super inefficient.
Navigation World Models from Meta seems super exciting to me! I think deep learning will eventually supplant SLAM in navigation, and the NWM seems like a step in the right direction.
Hoping the model & code can be open-sourced in the future! www.amirbar.net/nwm/
My definition would be something like "a model for controlling one or more types of robot, that is able to do different kinds of tasks based on prompting."
Interesting, thanks!
Do you think this is because the current benchmarks are not good enough, or something more fundamental (i.e. there can never be a good benchmark because of X reasons)?
Asking because I might do a side project on this space if it looks like a high-value problem to solve.
Sorry, I am struggling to understand how this is related to robotics benchmarks? Looks like some kind of IoT management platform to me...
I know @cpaxton.bsky.social has done a few OVMM benchmarks in the past. But these were pretty strongly coupled with the Habitat simulator and deploying an unrelated model would require lots of engineering.
Think better tools to reduce this engineering time to connect to sim would be a good start.
I agree.
Perhaps we need multiple different simulations & easy ways to run any policy implementation against all of them.
If it performs well in diverse sims, it probably performs well in the real world as well.
Does Bluesky have a techy industrial / manufacturing side, or is that community still on Twitter?
Out yourself in the replies if you are building in this space!
I actually really like Twitter's new Blue Tick policy, it makes it so much easier to find real, interesting people in the sea of bots.
Would like for Bluesky to add something like that as well. I'd probably pay a few £ a month for it.
What robotics benchmarks are people currently using?
Do we need a better benchmark to compare robotics foundation models? Is anyone working on this?
Sim2real has essentially solved locomotion, which made me adjust my priors. Modelling physics and reward functions for mobile manipulation is much harder than for locomotion, but I'd place a cautious bet on the approach working.
I'm interested in following the growth in sim2real techniques. With GenAI you could create infinitely diverse envs, though major engineering challenges remain.