
Jiafei Duan

@djiafei

Robotics PhD student @uwcse | Graduate Student Researcher @allen_ai | Ex-@NVIDIA | @ASTARsg scholars | BEng from @ntueee. Research in robot learning and embodied AI. www.duanjiafei.com

160 Followers · 328 Following · 19 Posts · Joined 27.11.2024

Latest posts by Jiafei Duan @djiafei


🌟 Open Call for Nominations: Top 10 Robotics Papers of 2024 πŸ“š

πŸ† Categories:
1️⃣ Navigation 🧭
2️⃣ Manipulation πŸ€–
3️⃣ Whole-Body Motion (Humanoids/Locomotion) πŸšΆβ€β™‚οΈ
4️⃣ Foundation Models for Robotics 🧠
5️⃣ Robotic Systems βš™οΈ
6️⃣ Benchmarks & Simulation πŸ§ͺ

docs.google.com/forms/d/e/1F...

18.12.2024 00:54 πŸ‘ 2 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

Paper link: arxiv.org/abs/2412.07755

11.12.2024 16:12 πŸ‘ 2 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

10/🧵Curious for more? Check out our paper for the full breakdown: "SAT: Spatial Aptitude Training for Multimodal Language Models" by @ARRay693 @ehsanik @anikembhavi @rosemhendrix @RanjayKrishna @KuoHaoZeng @kate_saenko_ @drbashkirova, et al.

11.12.2024 16:12 πŸ‘ 2 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

9/🧡The takeaway: Dynamic spatial QAs improve static QA performance too!
Mixing static & dynamic training data results in significant accuracy gains across all tasks. πŸ“Š

11.12.2024 16:12 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
Post image

8/🧡Challenges MLMs face:
Even strong models perform near-randomly on SAT's dynamic tasks.
Egocentric movement and multiview reasoning remain tough nuts to crack.

11.12.2024 16:12 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

7/🧡SAT enables five complex spatial tasks:
Egocentric Movement
Object Movement
Allocentric Perspective
Goal Aiming
Action Consequence
Each task tests unique dimensions of spatial cognition.🧠

11.12.2024 16:12 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
Post image

6/🧡How does SAT generate data?
Uses ProcTHOR for 3D scenes.
Procedurally generates static & dynamic QAs.
Scalable, cost-effective, & adaptable for new tasks. 🏠
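To give a flavor of what "procedurally generates static QAs" means, here is a toy sketch that emits left/right questions from scene metadata. The scene format and helper names are purely illustrative assumptions, not ProcTHOR's actual API or the SAT pipeline.

```python
# Hypothetical sketch of procedural static spatial-QA generation from
# scene metadata (object name -> image-plane (x, depth) coordinates).
from typing import Dict, List, Tuple

Scene = Dict[str, Tuple[float, float]]

def left_of(a: Tuple[float, float], b: Tuple[float, float]) -> bool:
    """True if object a appears left of b (smaller camera-frame x)."""
    return a[0] < b[0]

def generate_static_qas(scene: Scene) -> List[Tuple[str, str]]:
    """Emit one (question, answer) pair per ordered object pair."""
    qas = []
    names = list(scene)
    for i, x in enumerate(names):
        for y in names[i + 1:]:
            question = f"Is the {x} to the left of the {y}?"
            answer = "yes" if left_of(scene[x], scene[y]) else "no"
            qas.append((question, answer))
    return qas

# Toy scene with two objects.
scene = {"mug": (0.2, 1.0), "laptop": (0.8, 1.2)}
print(generate_static_qas(scene))  # [('Is the mug to the left of the laptop?', 'yes')]
```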

11.12.2024 16:12 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
Post image

5/🧵Here's the kicker: Fine-tuning on SAT makes the open-source LLaVA-13B model match or surpass proprietary giants like GPT-4V in spatial reasoning! 🎯

11.12.2024 16:12 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
Post image

4/🧡 Results? SAT improves performance not only on its own dataset but also boosts zero-shot spatial reasoning:
+23% on CVBench
+9% on BLINK (harder benchmarks)
+18% on Visual Spatial Relations (VSR) dataset. πŸ’ͺ

11.12.2024 16:12 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
Post image

3/🧡Example tasks SAT tackles:
Static: Is object X to the left of object Y?
Dynamic: How did the camera move between frames? Did the object get closer or further?
Perspective: What does object placement look like from point X?
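As a rough illustration of the dynamic case, here is a minimal sketch of how a "did the object get closer or further?" label could be derived from camera and object positions in two frames. The function names and the position format are hypothetical, purely for illustration; this is not code from the SAT paper.

```python
# Hypothetical sketch: label object motion relative to the camera by
# comparing camera-to-object distances across two frames.
import math

def closer_or_further(cam0, obj0, cam1, obj1, eps=1e-3):
    """Compare per-frame Euclidean distances to label the motion.

    cam0/obj0 and cam1/obj1 are (x, y, z) positions in frames 0 and 1;
    eps absorbs tiny numerical differences.
    """
    d0 = math.dist(cam0, obj0)
    d1 = math.dist(cam1, obj1)
    if d1 < d0 - eps:
        return "closer"
    if d1 > d0 + eps:
        return "further"
    return "same"

# Frame 0: object 2 m ahead of the camera; frame 1: camera moved 0.5 m toward it.
print(closer_or_further((0, 0, 0), (0, 0, 2.0), (0, 0, 0.5), (0, 0, 2.0)))  # closer
```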

11.12.2024 16:12 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
Post image

2/🧡SAT introduces 218K question-answer pairs for 22K synthetic scenes created using a photorealistic physics engine. It goes beyond static benchmarks to tackle dynamic reasoning tasks like egocentric actions, object movement, & perspective-taking. πŸ”

11.12.2024 16:12 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

1/🧡 Why does spatial reasoning matter? 🌎 Cognitive science shows spatial reasoning is foundational to intelligence, impacting geometry, physics, and physical world reasoning. Yet, MLMs struggle with it, especially in dynamic real-world scenarios. Enter SAT! βš’οΈ

11.12.2024 16:12 πŸ‘ 2 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
Video thumbnail

πŸš€Excited to introduce our latest work- SAT: Spatial Aptitude Training, a groundbreaking approach to enhance spatial reasoning in Multimodal Language Models (MLMs). SAT isn't just about understanding static object positions but dives deep into dynamic spatial reasoning. πŸ§΅πŸ‘‡

11.12.2024 16:12 πŸ‘ 1 πŸ” 1 πŸ’¬ 1 πŸ“Œ 0
Video thumbnail

A scene from ManiSkill.
Prompt: Move the mobile robot to the table and place the red bowl onto the table.

10.12.2024 18:51 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

I think text-to-video is not that bad; at least we see some decent humanoid robot motion, likely because these models are trained on a lot of human video. What is not good, though, is image-to-video generation.

10.12.2024 18:50 πŸ‘ 1 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0
Video thumbnail

I am impressed by Sora, and I see potential for using it in robotics.

10.12.2024 05:40 πŸ‘ 1 πŸ” 0 πŸ’¬ 2 πŸ“Œ 0
Post image

We’ve been investigating how sim, while wrong, can be useful for real-world robotic RL! In our #NeurIPS2024 work, we theoretically showed how naive sim2real transfer can be inefficient, but if you *learn to explore* in sim, this transfers to the real world! We show this works on real robots! 🧡(1/6)

06.12.2024 00:46 πŸ‘ 13 πŸ” 5 πŸ’¬ 2 πŸ“Œ 0

Everyone I see who moved from X to here seems to have gained back the same number of followers, except me 🤣

04.12.2024 10:25 πŸ‘ 1 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

Is there a robotics starter pack?

28.11.2024 04:45 πŸ‘ 1 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

β€œHello, World!”

27.11.2024 08:11 πŸ‘ 3 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0