D4RT: Teaching AI to see the world in four dimensions
deepmind.google/blog/d4rt-te...
We just released a Google DeepMind blog post on our latest work; please check it out!
The project website & tech report can be found at d4rt-paper.github.io
🔥 Efficiently Reconstructing Dynamic Scenes One 🎯 D4RT at a Time
d4rt-paper.github.io
Building on the SRT architecture (srt-paper.github.io), D4RT unlocks a flexible interface for Dynamic 4D Reconstruction and Tracking.
It's truly been a privilege to work with this incredibly talented team.
Looking forward to it!
Scaling 4D Representations
Self-supervised learning from video does scale! In our latest work, we scaled masked auto-encoding models to 22B params, boosting performance on pose estimation, tracking & more.
Paper: arxiv.org/abs/2412.15212
Code & models: github.com/google-deepmind/representations4d
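For readers unfamiliar with masked auto-encoding, here is a minimal toy sketch of the objective in NumPy: hide most patch tokens, encode only the visible ones, and score reconstruction on the hidden ones. All names and the 75% mask ratio are illustrative choices of mine, not the paper's implementation (see the repo above for that).

```python
import numpy as np

rng = np.random.default_rng(0)

def random_masking(patches, mask_ratio=0.75):
    """Keep a random subset of patch tokens; the rest are masked.

    patches: (num_patches, dim) array of patch embeddings.
    Returns the visible patches and a boolean mask (True = hidden).
    """
    n = patches.shape[0]
    num_keep = int(n * (1 - mask_ratio))
    keep_idx = np.sort(rng.permutation(n)[:num_keep])
    mask = np.ones(n, dtype=bool)
    mask[keep_idx] = False
    return patches[keep_idx], mask

def mae_reconstruction_loss(pred, target, mask):
    """Mean squared error computed only on the masked patches."""
    return ((pred - target) ** 2)[mask].mean()

# Toy example: 16 patches of dimension 8.
patches = rng.normal(size=(16, 8))
visible, mask = random_masking(patches)
# An encoder would see only `visible`; a decoder predicts all patches.
pred = np.zeros_like(patches)  # stand-in for decoder output
loss = mae_reconstruction_loss(pred, patches, mask)
```

Because the loss is taken only over hidden patches, the model is pushed to infer missing content from context rather than copy its input.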
We're very excited to introduce TAPNext: a model that sets a new state of the art for Tracking Any Point in videos by formulating the task as Next Token Prediction. For more, see: tap-next.github.io
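One way to see how tracking can become a token-prediction problem: quantize each point's (x, y) position into a discrete token, so a track is a token sequence that a sequence model can extend one step at a time. This is only a toy illustration of coordinate tokenization under my own assumptions (64 bins per axis, function names mine); see tap-next.github.io for the actual formulation.

```python
NUM_BINS = 64  # quantization bins per axis (illustrative choice)

def coord_to_token(x, y, num_bins=NUM_BINS):
    """Quantize a normalized (x, y) point in [0, 1)^2 to one token id."""
    xi = min(int(x * num_bins), num_bins - 1)
    yi = min(int(y * num_bins), num_bins - 1)
    return yi * num_bins + xi

def token_to_coord(token, num_bins=NUM_BINS):
    """Map a token id back to the center of its quantization cell."""
    yi, xi = divmod(token, num_bins)
    return ((xi + 0.5) / num_bins, (yi + 0.5) / num_bins)

# A track becomes a token sequence; a sequence model would predict the
# next token (the point's next position) given past tokens and video
# features, exactly like next-word prediction in language models.
track = [(0.10, 0.20), (0.12, 0.21), (0.15, 0.22)]
tokens = [coord_to_token(x, y) for x, y in track]
decoded = [token_to_coord(t) for t in tokens]
```

The round trip is lossy only up to the cell size (1/64 per axis here), which is the usual trade-off when discretizing continuous targets.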
Video vs. image diffusion representations
Feature visualization for image and video diffusion
Generative video diffusion: does a model trained with this objective learn better features than one trained for image generation?
We investigated this question and more in our latest work; please check it out!
*From Image to Video: An Empirical Study of Diffusion Representations*
arxiv.org/abs/2502.07001
Check out @tkipf.bsky.social's post on MooG, the latest in our line of research on self-supervised neural scene representations learned from raw pixels:
SRT: srt-paper.github.io
OSRT: osrt-paper.github.io
RUST: rust-paper.github.io
DyST: dyst-paper.github.io
MooG: moog-paper.github.io
Authors:
Viorica Pătrăucean, Xu Owen He, Joseph Heyward, Chuhan Zhang, Mehdi S. M. Sajjadi, George-Cristian Muraru, Artem Zholus, Mahdi Karami, Ross Goroshin, Yutian Chen, Simon Osindero, João Carreira, Razvan Pascanu
Original post:
www.linkedin.com/posts/vioric...
TRecViT architecture
TRecViT: A Recurrent Video Transformer
arxiv.org/abs/2412.14294
Causal, 3× fewer parameters, 12× less memory, 5× lower FLOPs than (non-causal) ViViT, matching or outperforming it on Kinetics & SSv2 action recognition.
Code and checkpoints out soon.
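The causal/efficiency claims come from factorizing the video: recurrence over time per spatial token, attention over space within each frame. Here is a toy NumPy sketch of one such factorized block under my own simplifications (a fixed-decay diagonal recurrence standing in for a learned gated linear recurrent unit, and attention without learned projections); the paper's architecture differs in the details.

```python
import numpy as np

def temporal_recurrence(x, decay=0.9):
    """Causal linear recurrence over time, applied per spatial token.

    x: (T, N, D) — T frames, N spatial tokens, D channels.
    h_t = decay * h_{t-1} + (1 - decay) * x_t  (toy stand-in for a
    learned gated linear recurrent unit).
    """
    h = np.zeros_like(x[0])
    out = np.empty_like(x)
    for t in range(x.shape[0]):
        h = decay * h + (1 - decay) * x[t]
        out[t] = h
    return out

def spatial_attention(x):
    """Self-attention over the N spatial tokens within each frame.

    x: (T, N, D). Queries/keys/values are the inputs themselves
    (no learned projections, for brevity).
    """
    scores = x @ np.swapaxes(x, -1, -2) / np.sqrt(x.shape[-1])  # (T, N, N)
    scores -= scores.max(axis=-1, keepdims=True)  # stable softmax
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)
    return w @ x

T, N, D = 4, 6, 8
x = np.random.default_rng(0).normal(size=(T, N, D))
y = spatial_attention(temporal_recurrence(x))  # one factorized block
```

Note the causality for free: the recurrence only ever reads past frames, and attention mixes tokens within a single frame, so no future information leaks backward. Cost over time is linear in T, versus quadratic for full spatio-temporal attention.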