
Mehdi S. M. Sajjadi

@msajjadi.com

Research Scientist, Tech Lead & Manager at Google DeepMind
msajjadi.com

105 Followers · 83 Following · 8 Posts
Joined 26.11.2024

Latest posts by Mehdi S. M. Sajjadi @msajjadi.com

D4RT: Unified, Fast 4D Scene Reconstruction & Tracking
Meet D4RT, a unified AI model for 4D scene reconstruction and tracking.

D4RT: Teaching AI to see the world in four dimensions
deepmind.google/blog/d4rt-te...

We just released a Google DeepMind blog post on our latest work; please check it out!

The project website & tech report can be found at d4rt-paper.github.io

22.01.2026 15:07 👍 11 🔁 1 💬 0 📌 0

🔥 Efficiently Reconstructing Dynamic Scenes One 🎯 D4RT at a Time
d4rt-paper.github.io

Building on the SRT architecture (srt-paper.github.io), D4RT unlocks a flexible interface for Dynamic 4D Reconstruction and Tracking.

It's truly been a privilege to work with this incredibly talented team.

09.12.2025 12:47 👍 2 🔁 0 💬 0 📌 0

Looking forward to it!

02.11.2025 05:42 👍 1 🔁 0 💬 0 📌 0
Scaling 4D Representations

Self-supervised learning from video does scale! In our latest work, we scaled masked auto-encoding models to 22B params, boosting performance on pose estimation, tracking & more.

Paper: arxiv.org/abs/2412.15212
Code & models: github.com/google-deepmind/representations4d
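As a toy illustration of the masked video-modeling idea (this is not the released code; the patch size, mask ratio, and zero-fill are arbitrary choices here), random spatio-temporal patches of a clip can be hidden and the model trained to reconstruct them:

```python
import numpy as np

def mask_video_patches(video, mask_ratio=0.9, patch=4, seed=0):
    """Hide a random fraction of spatio-temporal patches (illustrative only).

    video: array of shape (time, height, width), dims divisible by `patch`.
    Returns the masked clip and a boolean keep-mask over patches.
    """
    t, h, w = video.shape
    nt, nh, nw = t // patch, h // patch, w // patch
    n = nt * nh * nw
    rng = np.random.default_rng(seed)
    keep = rng.random(n) >= mask_ratio  # True = patch stays visible
    masked = video.copy()
    idx = 0
    for it in range(nt):
        for ih in range(nh):
            for iw in range(nw):
                if not keep[idx]:
                    # zero out the hidden patch; the model must reconstruct it
                    masked[it * patch:(it + 1) * patch,
                           ih * patch:(ih + 1) * patch,
                           iw * patch:(iw + 1) * patch] = 0.0
                idx += 1
    return masked, keep
```

With a high mask ratio like 0.9, most of the clip is hidden, which is what makes reconstruction a strong self-supervised signal.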

10.07.2025 11:52 👍 20 🔁 8 💬 0 📌 0

We're very excited to introduce TAPNext: a model that sets a new state of the art for Tracking Any Point in videos by formulating the task as Next Token Prediction. For more, see: tap-next.github.io
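One way to picture the "tracking as next-token prediction" framing (a sketch under my own assumptions; the grid size and coordinate scheme are illustrative, not TAPNext's actual tokenizer): quantize each point location into a discrete token so a sequence model can predict a track autoregressively, one token per frame:

```python
import numpy as np

def coords_to_tokens(xy, grid=64):
    """Quantize normalized (x, y) in [0, 1) into one id from a grid*grid vocabulary."""
    ix = np.clip((xy[..., 0] * grid).astype(int), 0, grid - 1)
    iy = np.clip((xy[..., 1] * grid).astype(int), 0, grid - 1)
    return iy * grid + ix

def tokens_to_coords(tok, grid=64):
    """Map a token id back to the center of its grid cell."""
    iy, ix = tok // grid, tok % grid
    return np.stack([(ix + 0.5) / grid, (iy + 0.5) / grid], axis=-1)
```

Round-tripping through the tokenizer loses at most half a cell of precision, so a finer grid (or a refinement head) trades vocabulary size against localization accuracy.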

09.04.2025 14:04 👍 23 🔁 9 💬 1 📌 0
Video vs. image diffusion representations

Feature visualization for image and video diffusion

Generative Video Diffusion: does a model trained with this objective learn better features compared to image generation?

We investigated this question and more in our latest work; please check it out!

*From Image to Video: An Empirical Study of Diffusion Representations*
arxiv.org/abs/2502.07001

13.02.2025 16:11 👍 6 🔁 2 💬 0 📌 0

Check out @tkipf.bsky.social's post on MooG, the latest in our line of research on self-supervised neural scene representations learned from raw pixels:

SRT: srt-paper.github.io
OSRT: osrt-paper.github.io
RUST: rust-paper.github.io
DyST: dyst-paper.github.io
MooG: moog-paper.github.io

13.01.2025 15:25 👍 13 🔁 3 💬 0 📌 0
Viorica Patraucean on LinkedIn: Super excited to share our recent work on designing more efficient video models: TRecViT https://lnkd.in/ehh4gGbn alternates SSM blocks (LRUs) that integrate…

Authors:
Viorica Pătrăucean, Xu Owen He, Joseph Heyward, Chuhan Zhang, Mehdi S. M. Sajjadi, George-Cristian Muraru, Artem Zholus, Mahdi Karami, Ross Goroshin, Yutian Chen, Simon Osindero, João Carreira, Razvan Pascanu

Original post:
www.linkedin.com/posts/vioric...

10.01.2025 15:44 👍 8 🔁 1 💬 0 📌 0
TRecViT architecture

TRecViT: A Recurrent Video Transformer
arxiv.org/abs/2412.14294

Causal, 3× fewer parameters, 12× less memory, 5× lower FLOPs than (non-causal) ViViT, matching / outperforming it on Kinetics & SSv2 action recognition.

Code and checkpoints out soon.
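For intuition on the SSM/LRU building block (a minimal real-valued sketch; the actual model uses learned complex-diagonal recurrences, gating, and per-channel parameters not shown here), the core is a cheap linear scan over time:

```python
import numpy as np

def linear_recurrence(x, lam=0.9):
    """h_t = lam * h_{t-1} + x_t, scanned along the leading (time) axis.

    x: array of shape (time, features). Returns the hidden state at each step.
    """
    h = np.zeros_like(x[0])
    out = np.empty_like(x)
    for t, xt in enumerate(x):
        h = lam * h + xt  # state carries a decaying summary of the past
        out[t] = h
    return out
```

Because each step depends only on past inputs, the scan is causal by construction, and its cost grows linearly with sequence length rather than quadratically as in full attention.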

10.01.2025 15:44 👍 25 🔁 7 💬 1 📌 0