I'll be presenting our work at #cvpr2025. We finetune video models to be 3D consistent without any 3D supervision!
Feel free to stop by our poster or reach out to chat:
Sunday, Jun 15, 4-6pm
ExHall D, poster #168
cvpr.thecvf.com/virtual/2025...
13.06.2025 17:31
a photograph of a cat, overlaid with the words: "stop using chatgpt i can also give you misinformation and i'm beautiful"
i'm mostly not out here just to repost memes, but this is in fact a photograph of me soooo
29.04.2025 20:54
Radiance Fields and the Future of Generative Media
YouTube video by Jon Barron
Here's a recording of my 3DV keynote from a couple weeks ago. If you're already familiar with my research, I recommend skipping to ~22 minutes in where I get to the fun stuff (whether or not 3D has been bitter-lesson'ed by video generation models)
www.youtube.com/watch?v=hFlF...
28.04.2025 20:52
Introducing MegaSaM!
Accurate, fast, & robust structure + camera estimation from casual monocular videos of dynamic scenes!
MegaSaM outputs camera parameters and consistent video depth, scaling to long videos with unconstrained camera paths and complex scene dynamics!
06.12.2024 17:42
Introducing Stereo4D!
A method for mining 4D from internet stereo videos. It enables large-scale, high-quality, dynamic, *metric* 3D reconstructions, with camera poses and long-term 3D motion trajectories.
We used Stereo4D to make a dataset of over 100k real-world 4D scenes.
13.12.2024 03:13
Check out CAT4D: our new paper that turns (text, sparse images, videos) => (dynamic 3D scenes)!
I can't get over how cool the interactive demo is.
Try it out for yourself on the project page: cat-4d.github.io
28.11.2024 02:52
This work wouldn't have been possible without my internship mentor Kai, advisors @snavely.bsky.social @bharathhariharan.bsky.social, and other coauthors.
Project Page: genechou.com/kfcw
N/N
21.11.2024 16:19
Our method outperforms commercial models in terms of geometric and appearance consistency, and we show that video models trained with 3D-aware objectives can be useful as 3D priors for downstream tasks such as SfM and 3DGS. Check out our paper for more details! 5/N
21.11.2024 16:19
We design a self-supervised method that takes advantage of the consistency of videos and the variability of multiview internet photos to train a 3D-aware video model without ANY 3D annotations. We name our method KFC-W (KeyFrame-Conditioned video generation in-the-Wild). 4/N
21.11.2024 16:19
This task is difficult for video models! They perform visually pleasing morphing but can ignore scene identity and hallucinate new structures. Our main insight is that training for general video synthesis is not enough: we need to introduce scalable, 3D-aware objectives. 3/N
21.11.2024 16:19
We propose the task of generating videos from sparse (2-5), unposed internet photos. A model's ability to capture underlying geometry, recognize scene identity, and relate frames in terms of camera position reflects its understanding of 3D structure and scene layout. 2/N
21.11.2024 16:19
We've released our paper "Generating 3D-Consistent Videos from Unposed Internet Photos"! Video models like Luma generate pretty videos, but sometimes struggle with 3D consistency. We can do better by training them with scalable, 3D-aware objectives. 1/N
page: genechou.com/kfcw
21.11.2024 16:19