ImageNet is not a benchmark though. It is the training dataset that enabled the deep learning revolution :) Only later did it become a (worse and worse) benchmark.
Wow! Never thought I would see it on the internet ever again. I've been using this picture as an avatar for at least 20 years.
Glad it's available in higher resolution than it used to be.
You cannot make this stuff up
VGGT-Det: Mining VGGT Internal Priors for Sensor-Geometry-Free Multi-View Indoor 3D Object Detection
Yang Cao, Feize Wu, Dave Zhenyu Chen, Yingji Zhong, Lanqing Hong, Dan Xu
tl;dr: VGGT features are not that great for detection unless you add bells & whistles.
arxiv.org/abs/2603.00912
Multimodal Large Language Models as Image Classifiers
Nikita Kisel, Illia Volkov @klara-cz.bsky.social Jiri Matas
tl;dr: if you evaluate a good model (ChatGPT) on a dirty test set (ImageNet), it looks bad. Yes, the ImageNet test set is bad nowadays. Plus insights from the labeling process.
arxiv.org/abs/2603.065...
Lol. I got stuck and rage-quit at the convincing-the-chatbot stage.
🧵 For the last seven years, I kept re-implementing the same pattern: a parallel map loop that divides the work among several processes or threads. My very first attempts were built on Python's standard tools, e.g., multiprocessing.map...
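For reference, a minimal sketch of that pattern with the standard library (the post mentions multiprocessing.map; the worker function here is just a placeholder):

```python
from multiprocessing import Pool

def process_item(x):
    # placeholder for real per-item, CPU-bound work
    return x * x

if __name__ == "__main__":
    items = list(range(100))
    # divide the work among 4 worker processes
    with Pool(processes=4) as pool:
        results = pool.map(process_item, items)
    print(results[:5])  # [0, 1, 4, 9, 16]
```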
To study this, we introduce ReGT, a new multilabel reannotation of 625 ImageNet classes that corrects many of these issues. When evaluated on the cleaned labels, multimodal LLMs improve by up to +10.8% accuracy, substantially narrowing the gap with supervised vision models.
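For intuition, a minimal sketch of how a multilabel evaluation of this kind typically works, assuming a prediction counts as correct if it matches any of the annotated valid labels for the image (the function name and the data below are illustrative, not from the ReGT code):

```python
def multilabel_accuracy(predictions, valid_labels):
    # predictions: one predicted class name per image
    # valid_labels: one set of acceptable class names per image
    correct = sum(p in labels for p, labels in zip(predictions, valid_labels))
    return correct / len(predictions)

# Under single-label top-1, each set has exactly one entry; a multilabel
# reannotation enlarges the sets, so a model that picked a
# different-but-valid label is no longer penalized.
preds = ["tabby cat", "notebook computer"]
labels = [{"tabby cat", "tiger cat"}, {"laptop"}]
print(multilabel_accuracy(preds, labels))  # 0.5
```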
Let me introduce our new paper: Multimodal Large Language Models as Image Classifiers
Multimodal LLMs are increasingly used for visual tasks, but evaluating their image classification ability has produced conflicting conclusions.
Link: arxiv.org/html/2603.06...
I've made SatAst, a small collection of hand-annotated satellite-to-astronaut image correspondences, public on GitHub: github.com/georg-bn/sat.... This benchmark is part of the RoMa v2 paper; see Johan's thread below. bsky.app/profile/pars...
"Authors should not use negative v-spaces to change the template layout."
The template layout:
yes
Utonia: Toward One Encoder for All Point Clouds
Yujia Zhang, Xiaoyang Wu, Yunhan Yang, Xianzhe Fan, Han Li, Yuechen Zhang, Zehao Huang, Naiyan Wang, Hengshuang Zhao
tl;dr: Point Transformer V3 pretrained on tons of different data.
arxiv.org/abs/2603.03283
ZipMap: Linear-Time Stateful 3D Reconstruction with Test-Time Training
@haian-jin.bsky.social Rundi Wu, Tianyuan Zhang, Ruiqi Gao, @jonbarron.bsky.social @snavely.bsky.social @holynski.bsky.social
tl;dr: more test-time training for getting a scene latent.
arxiv.org/abs/2603.04385
DAGE: Dual-Stream Architecture for Efficient and Fine-Grained Geometry Estimation
Tuan Duc Ngo, Jiahui Huang, Seoung Wug Oh, Kevin Blackburn-Matzen, Evangelos Kalogerakis, Chuang Gan, Joon-Young Lee
tl;dr: low-res multi-view (Pi3-distilled) + high-res single view (MoGe2 fine-tuned).
arxiv.org/abs/2603.03744
NOVA3R: Non-pixel-aligned Visual Transformer for Amodal 3D Reconstruction
Weirong Chen, Chuanxia Zheng, Ganlin Zhang, Andrea Vedaldi, Daniel Cremers
tl;dr: let VGGT output latents -> decode a point cloud.
arxiv.org/abs/2603.04179
Dark3R: Learning Structure from Motion in the Dark
Andrew Y Guo, Anagh Malik, SaiKiran Tedla, Yutong Dai, Yiqian Qin, Zach Salehe, Benjamin Attal, Sotiris Nousias, Kyros Kutulakos, David B. Lindell
tl;dr: LoRA for MASt3R to make it work in low light (a generic LoRA sketch follows below).
arxiv.org/abs/2603.05330
#CVPR2026
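Not the paper's code, just a generic sketch of what LoRA fine-tuning means here: freeze the pretrained weights and learn a low-rank residual on selected linear layers (the rank and scaling values are illustrative defaults):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wrap a frozen nn.Linear with a trainable low-rank update."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # pretrained weights stay fixed
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero init: starts as a no-op
        self.scale = alpha / rank

    def forward(self, x):
        # frozen base output plus scaled low-rank correction
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale
```

Only A and B receive gradients, so adapting a large pretrained model to a new domain (here, low-light imagery) stays cheap.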
NOVA3R: Non-pixel-aligned Visual Transformer for Amodal 3D Reconstruction
Weirong Chen, @chuanxiaz.bsky.social, @ganlinzhang.xyz, Andrea Vedaldi, @dcremers.bsky.social
tl;dr: TripoSG+VGGT
layout issue with tables?
arxiv.org/abs/2603.04179
Submit your paper on structured reconstruction -- CAD, semantic, wireframe, city monitoring, etc. -- to USM3D 2026!
cmt3.research.microsoft.com/USM2026
Deadline: March 24, 2026.
@cvprconference.bsky.social
#CVPR2026
#USM3D2026 #USM3D
Wow
#CVPR2026 One more week to submit your work to the Embedded Vision Workshop @ CVPR! @cvprconference.bsky.social (new deadline: March 11)
Info at: embeddedvisionworkshop.wordpress.com
AI writing is like store-bought cake. It might be perfectly fine, maybe even as good as something you could make yourself, but it's weird to give it to someone and say it's homemade
Image Matching Challenge 2026.
It is named "IMC 2025 On-going".
- It will live longer than "until the next CVPR": a multi-year leaderboard.
- No prizes, but an invite to present your solution at CVPR.
- Dataset+metrics same as 2025.
www.kaggle.com/competitions...
#CVPR2026
@cvprconference.bsky.social
You still have 2 weeks to submit your paper to Image Matching Workshop at #CVPR2026
Deadline: March 16.
Topics: anything related to image matching and 3D reconstruction.
cmt3.research.microsoft.com/IMW2026
@cvprconference.bsky.social
Clarification about dual submissions to @eccv.bsky.social and @cvprconference.bsky.social Findings track.
If you submit the same work to ECCV, please do not opt in to CVPR 2026 Findings; opting in would make it a dual submission. Opt-in instructions will be sent once the logistics are finalized.
Exploring the AI Obedience: Why is Generating a Pure Color Image Harder than CyberPunk?
Hongyu Li, Kuan Liu, Yuan Chen, Juntao Hu, Huimin Lu, Guanjie Chen, Xue Liu, Guangming Lu, Hong Huang
tl;dr: Flux and NanoBanana fail at precise color filling.
arxiv.org/abs/2603.00166
I am delighted (that pun will make sense in a second) that @alistairfoggin.bsky.social's first paper, CroCoDiLight, has been accepted to ICLR. The idea came from a group discussion on the CroCo paper from @naverlabseurope.bsky.social and realising it might implicitly already understand relighting.
Rethinking Camera Choice: An Empirical Study on Fisheye Camera Properties in Robotic Manipulation
Han Xue, Nan Min, Xiaotong Liu, Wendi Chen, Yuan Fang, Jun Lv, Cewu Lu, Chuan Wen
tl;dr: fisheye cameras are great for robotics, as long as they can see a non-textureless environment.
arxiv.org/abs/2603.02139