
Jiafei Duan

@djiafei

Robotics PhD student @uwcse | Graduate Student Researcher @allen_ai | Ex-@NVIDIA | @ASTARsg scholars | BEng from @ntueee. Research in robot learning and embodied AI. www.duanjiafei.com

160 Followers · 328 Following · 19 Posts · Joined 27.11.2024

Latest posts by Jiafei Duan @djiafei


🌟 Open Call for Nominations: Top 10 Robotics Papers of 2024 πŸ“š

πŸ† Categories:
1️⃣ Navigation 🧭
2️⃣ Manipulation πŸ€–
3️⃣ Whole-Body Motion (Humanoids/Locomotion) πŸšΆβ€β™‚οΈ
4️⃣ Foundation Models for Robotics 🧠
5️⃣ Robotic Systems βš™οΈ
6️⃣ Benchmarks & Simulation πŸ§ͺ

docs.google.com/forms/d/e/1F...

18.12.2024 00:54 πŸ‘ 2 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

Paper link: arxiv.org/abs/2412.07755

11.12.2024 16:12 πŸ‘ 2 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

10/🧵Curious for more? Check out our paper for the full breakdown: "SAT: Spatial Aptitude Training for Multimodal Language Models" by @ARRay693 @ehsanik @anikembhavi @rosemhendrix @RanjayKrishna @KuoHaoZeng @kate_saenko_ @drbashkirova, et al.

11.12.2024 16:12 πŸ‘ 2 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

9/🧡The takeaway: Dynamic spatial QAs improve static QA performance too!
Mixing static & dynamic training data results in significant accuracy gains across all tasks. πŸ“Š

11.12.2024 16:12 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
Post image

8/🧡Challenges MLMs face:
Even strong models perform near-randomly on SAT's dynamic tasks.
Egocentric movement and multiview reasoning remain tough nuts to crack.

11.12.2024 16:12 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

7/🧡SAT enables five complex spatial tasks:
Egocentric Movement
Object Movement
Allocentric Perspective
Goal Aiming
Action Consequence
Each task tests unique dimensions of spatial cognition.🧠

11.12.2024 16:12 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
Post image

6/🧡How does SAT generate data?
Uses ProcTHOR for 3D scenes.
Procedurally generates static & dynamic QAs.
Scalable, cost-effective, & adaptable for new tasks. 🏠
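To give a flavor of what "procedurally generates static QAs" means, here is a toy sketch that emits left/right questions from scene metadata. The scene format and helper names are purely illustrative assumptions, not ProcTHOR's actual API or the SAT pipeline.

```python
# Hypothetical sketch of procedural static spatial-QA generation from
# scene metadata (object name -> image-plane (x, depth) coordinates).
from typing import Dict, List, Tuple

Scene = Dict[str, Tuple[float, float]]

def left_of(a: Tuple[float, float], b: Tuple[float, float]) -> bool:
    """True if object a appears left of b (smaller camera-frame x)."""
    return a[0] < b[0]

def generate_static_qas(scene: Scene) -> List[Tuple[str, str]]:
    """Emit one (question, answer) pair per ordered object pair."""
    qas = []
    names = list(scene)
    for i, x in enumerate(names):
        for y in names[i + 1:]:
            question = f"Is the {x} to the left of the {y}?"
            answer = "yes" if left_of(scene[x], scene[y]) else "no"
            qas.append((question, answer))
    return qas

# Toy scene with two objects.
scene = {"mug": (0.2, 1.0), "laptop": (0.8, 1.2)}
print(generate_static_qas(scene))  # [('Is the mug to the left of the laptop?', 'yes')]
```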

11.12.2024 16:12 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
Post image

5/🧵Here's the kicker: Fine-tuning on SAT makes the open-source LLaVA-13B model match or surpass proprietary giants like GPT-4V in spatial reasoning! 🎯

11.12.2024 16:12 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
Post image

4/🧡 Results? SAT improves performance not only on its own dataset but also boosts zero-shot spatial reasoning:
+23% on CVBench
+9% on BLINK (harder benchmarks)
+18% on Visual Spatial Relations (VSR) dataset. πŸ’ͺ

11.12.2024 16:12 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
Post image

3/🧡Example tasks SAT tackles:
Static: Is object X to the left of object Y?
Dynamic: How did the camera move between frames? Did the object get closer or further?
Perspective: What does object placement look like from point X?
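As a rough illustration of the dynamic case, here is a minimal sketch of how a "did the object get closer or further?" label could be derived from camera and object positions in two frames. The function names and the position format are hypothetical, purely for illustration; this is not code from the SAT paper.

```python
# Hypothetical sketch: label object motion relative to the camera by
# comparing camera-to-object distances across two frames.
import math

def closer_or_further(cam0, obj0, cam1, obj1, eps=1e-3):
    """Compare per-frame Euclidean distances to label the motion.

    cam0/obj0 and cam1/obj1 are (x, y, z) positions in frames 0 and 1;
    eps absorbs tiny numerical differences.
    """
    d0 = math.dist(cam0, obj0)
    d1 = math.dist(cam1, obj1)
    if d1 < d0 - eps:
        return "closer"
    if d1 > d0 + eps:
        return "further"
    return "same"

# Frame 0: object 2 m ahead of the camera; frame 1: camera moved 0.5 m toward it.
print(closer_or_further((0, 0, 0), (0, 0, 2.0), (0, 0, 0.5), (0, 0, 2.0)))  # closer
```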

11.12.2024 16:12 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
Post image

2/🧡SAT introduces 218K question-answer pairs for 22K synthetic scenes created using a photorealistic physics engine. It goes beyond static benchmarks to tackle dynamic reasoning tasks like egocentric actions, object movement, & perspective-taking. πŸ”

11.12.2024 16:12 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

1/🧡 Why does spatial reasoning matter? 🌎 Cognitive science shows spatial reasoning is foundational to intelligence, impacting geometry, physics, and physical world reasoning. Yet, MLMs struggle with it, especially in dynamic real-world scenarios. Enter SAT! βš’οΈ

11.12.2024 16:12 πŸ‘ 2 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
Video thumbnail

πŸš€Excited to introduce our latest work- SAT: Spatial Aptitude Training, a groundbreaking approach to enhance spatial reasoning in Multimodal Language Models (MLMs). SAT isn't just about understanding static object positions but dives deep into dynamic spatial reasoning. πŸ§΅πŸ‘‡

11.12.2024 16:12 πŸ‘ 1 πŸ” 1 πŸ’¬ 1 πŸ“Œ 0
Video thumbnail

A scene from ManiSkill.
Prompt: Move the mobile robot to the table and place the red bowl onto the table.

10.12.2024 18:51 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

I think text-to-video is not that bad; at least we see some decent humanoid robot motion, likely because these models are trained on a lot of human video. What is not good, though, is image-to-video generation.

10.12.2024 18:50 πŸ‘ 1 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0
Video thumbnail

I am impressed by Sora, and I see potential for using it in robotics.

10.12.2024 05:40 πŸ‘ 1 πŸ” 0 πŸ’¬ 2 πŸ“Œ 0
Post image

We’ve been investigating how sim, while wrong, can be useful for real-world robotic RL! In our #NeurIPS2024 work, we theoretically showed how naive sim2real transfer can be inefficient, but if you *learn to explore* in sim, this transfers to the real world! We show this works on real robots! 🧡(1/6)

06.12.2024 00:46 πŸ‘ 13 πŸ” 5 πŸ’¬ 2 πŸ“Œ 0

Everyone I see who moved from X to here seems to have gained back the same number of followers, except me 🤣

04.12.2024 10:25 πŸ‘ 1 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

Is there a robotics starter pack?

28.11.2024 04:45 πŸ‘ 1 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

β€œHello, World!”

27.11.2024 08:11 πŸ‘ 3 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0