Many thanks to @marcelhussing.bsky.social, Shubhankar Patankar, and advisors @danisbassett.bsky.social, @jmendezm.bsky.social, @ericeaton.bsky.social for the collaboration and guidance that made this work possible 🦾.
🧵9/9
Result 3: the model learns meaningful compositional structure.
Attention & intervention analyses reveal structured dependencies. The learned task graph differs from prior hand-designed architectures & better reflects which components matter for action & reward prediction.
🧵8/9
Result 2: iterative compositional generation solves almost all tasks over time.
Over refinement rounds, our model yields successful trajectories for nearly every task, outperforming monolithic generation and providing a strong foundation for downstream policy learning.
🧵7/9
Result 1: compositional generation of unseen tasks enables strong performance & improves with iteration.
Policies trained on synthetic data from our model outperform monolithic & standard DiT baselines, and quickly surpass multitask RL baselines without any new real data.
🧵6/9
Starting from data on ~22% of tasks, we iteratively generate data for unseen combinations, evaluate via offline RL & add datasets that yield strong policies to the next iteration training set.
Component-local updates prevent cross-task corruption, mitigating model collapse.
🧵5/9
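The loop in the post above can be sketched as a toy Python script. Every function here is a placeholder stand-in for the real pipeline (diffusion-transformer training, compositional generation, and offline-RL evaluation), not the paper's actual API:

```python
import random

random.seed(0)

# Stand-ins, purely for illustration:
def train(model, datasets):
    return dict(model, trained_on=len(datasets))  # "train" on current data

def generate(model, task):
    return f"synthetic:{task}"  # compositional generation for an unseen task

def policy_score(dataset):
    return random.random()  # placeholder for offline RL + policy evaluation

def iterate(seed_tasks, unseen_tasks, rounds=3, threshold=0.5):
    model = {}
    datasets = {t: f"real:{t}" for t in seed_tasks}  # ~22% of tasks to start
    for _ in range(rounds):
        model = train(model, datasets)
        for task in list(unseen_tasks):
            synthetic = generate(model, task)
            # Only datasets that yield strong policies enter the next round.
            if policy_score(synthetic) >= threshold:
                datasets[task] = synthetic
                unseen_tasks.remove(task)
    return datasets

datasets = iterate(["taskA", "taskB"], ["taskC", "taskD", "taskE"])
```

The key property mirrored here is that real seed data is never overwritten; synthetic datasets are only added once they pass the evaluation gate.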
Key design choice: semantic, compositional tokenization.
Each transition is tokenized by task components, not arbitrary patches. Each observation component has its own encoder and decoder, so synthetic data only updates the parts involved in that task, not the entire model.
🧵4/9
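A toy version of the component-wise tokenization described above, assuming one linear encoder per observation component (component names and dimensions are made up for illustration):

```python
import numpy as np

# Each observation component gets its own encoder, so updates driven by a
# task's data only touch the encoders that task actually uses.
component_dims = {"robot_state": 32, "object": 7, "obstacle": 7, "goal": 7}
d_model = 16
rng = np.random.default_rng(0)
encoders = {name: rng.normal(size=(dim, d_model)) * 0.1
            for name, dim in component_dims.items()}

def tokenize(obs):
    """Map each raw component vector to a token; unused encoders untouched."""
    return {name: vec @ encoders[name] for name, vec in obs.items()}

# A task involving only two of the four components:
obs = {"robot_state": rng.normal(size=32), "object": rng.normal(size=7)}
tokens = tokenize(obs)
```

Contrast with patch-based tokenization, where every token mixes all input dimensions and any gradient step touches the whole model.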
We model tasks as a functionally compositional graph with state components, action, reward, and terminal as nodes.
Rather than hard-coding, a diffusion transformer learns this graph. Attention enables info exchange between components, capturing structure directly from data.
🧵3/9
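A minimal sketch of the info-exchange step above, assuming standard single-head self-attention over one token per graph node (this is illustrative, not the paper's exact architecture):

```python
import numpy as np

# One token per node of the task graph; the attention weights act as a
# soft, learned dependency structure between components.
nodes = ["robot_state", "object", "obstacle", "goal",
         "action", "reward", "terminal"]
d = 8
rng = np.random.default_rng(0)
tokens = rng.normal(size=(len(nodes), d))

scores = tokens @ tokens.T / np.sqrt(d)        # node-to-node affinities
weights = np.exp(scores)
weights /= weights.sum(axis=1, keepdims=True)  # softmax over source nodes
mixed = weights @ tokens                       # each node gathers info
```

Inspecting `weights` row by row is the spirit of the attention analysis in Result 3: which components each node actually attends to.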
To ground this idea, we use CompoSuite, where manipulation tasks are defined by composing a robot, object, obstacle and objective, yielding 256 tasks with shared components but distinct solutions. Observations include symbolic robot state, object, obstacle, and goal poses.
🧵2/9
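The 4-axis composition above can be sketched in a few lines (the component names below are illustrative placeholders, not CompoSuite's exact identifiers):

```python
from itertools import product

# Four axes with four options each, as in CompoSuite.
robots = ["robot_1", "robot_2", "robot_3", "robot_4"]
objects = ["object_1", "object_2", "object_3", "object_4"]
obstacles = ["obstacle_1", "obstacle_2", "obstacle_3", "obstacle_4"]
objectives = ["objective_1", "objective_2", "objective_3", "objective_4"]

tasks = list(product(robots, objects, obstacles, objectives))
print(len(tasks))  # 4**4 = 256 compositional tasks
```

This is the combinatorial growth the thread opens with: adding one option to any axis adds 64 new tasks, while data collection cost grows per task.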
Most prior work uses generative models to upsample data within a single task.
We ask a different question:
Can we exploit the compositional structure of manipulation tasks to generate data for unseen task combinations using conditional generative models?
🧵1/9
🤖 Robotic manipulation tasks grow combinatorially, but data collection still scales linearly.
Is there a better way to obtain expert datasets at scale? 🤔
Excited to share our latest work, Iterative Compositional Data Generation for Robot Control.
doi.org/10.48550/arXiv.2512.10891
🧵👇
Anh-Quan Pham, Marcel Hussing, Shubhankar P. Patankar, Dani S. Bassett, Jorge Mendez-Mendez, Eric Eaton
Iterative Compositional Data Generation for Robot Control
https://arxiv.org/abs/2512.10891
Anh-Quan Pham, Kabir Ram Puri, Shreyas Raorane
SBAMP: Sampling Based Adaptive Motion Planning
https://arxiv.org/abs/2511.12022
it appears your low-hanging-fruit phd project wasn't so risk free after all, mr. bond
Did you get a chance to try the Kaya Toast too? Btw I spent 5 months doing an RL research internship there last year and made a list of must-visit spots for when my parents visited. Happy to share the list if you're interested/have time 🫡
One of the courses I'm taking this sem revolves around this book, & I love it so far. It really provides new perspectives & approaches to understanding which robotics problems are solved and which aren't (my background is in RL so forgive me if the content is already common knowledge to people).
True. I imagine a "reviewer mentor" just means triple the work: reading your assigned papers, your mentees' assigned papers, AND your mentees' reviews to give feedback.
Do you mind sharing what the obvious reasons were? I think it's a very good format to follow.
(are we pretending that we didn't know?)
I remember there was one video in which they admitted that most shots take between 1 and 10 tries (the one where they dropped a basketball from an airplane took 2, iirc). It's still a surprisingly low number of trials compared to the average person attempting it.
Except for the fact that Dude Perfect admitted they only posted their perfect shots.
Many people just didn't
Reminds me of the time I heard about people implementing progress bars to make users "feel" the process is faster. I think these cases can be referred to as examples of consumer behavior bias, things that are there to make people feel good :)
I haven't tried o1 pro, so I have no idea about its performance yet.
I had similar thoughts when reading about Jibo's shutdown
Cloud-based makes sense for keeping costs down, especially when rolling out new updates. I really hope they maintain a local version or some way to keep things running, though, since robots can form much stronger emotional bonds than other hardware that ends up a brick.
Curious to see the learning process!