
Dmytro Mishkin

@ducha-aiki

Marrying classical CV and Deep Learning. I do things that work, rather than things that are novel but don't work. http://dmytro.ai

2,594
Followers
156
Following
1,340
Posts
06.12.2023
Joined

Latest posts by Dmytro Mishkin @ducha-aiki

ImageNet is not a benchmark, though. It is the training dataset that enabled the deep learning revolution :) Only later did it become a (worse and worse) benchmark

10.03.2026 20:15 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

Wow! Never thought I would see it on the internet ever again. I've been using this picture as an avatar for at least 20 years.
Glad it's available in higher resolution than it used to be.

10.03.2026 17:40 πŸ‘ 16 πŸ” 1 πŸ’¬ 1 πŸ“Œ 0
Post image

You cannot make this thing up

10.03.2026 18:56 πŸ‘ 23 πŸ” 5 πŸ’¬ 2 πŸ“Œ 1
Post image Post image Post image Post image

VGGT-Det: Mining VGGT Internal Priors for Sensor-Geometry-Free Multi-View Indoor 3D Object Detection

Yang Cao, Feize Wu, Dave Zhenyu Chen, Yingji Zhong, Lanqing Hong, Dan Xu

tl;dr: VGGT features are not that great for detection unless you add bells & whistles.
arxiv.org/abs/2603.00912

10.03.2026 11:19 πŸ‘ 4 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0
Post image Post image Post image Post image

Multimodal Large Language Models as Image Classifiers

Nikita Kisel, Illia Volkov @klara-cz.bsky.social Jiri Matas

tl;dr: if you evaluate a good model (ChatGPT) on a dirty test set (ImageNet), it looks bad. Yes, the ImageNet test set is bad nowadays. Plus insights from the labeling process.
arxiv.org/abs/2603.065...

10.03.2026 10:56 πŸ‘ 7 πŸ” 1 πŸ’¬ 1 πŸ“Œ 0

Lol. I got stuck and rage-quit at the convince-the-chatbot stage.

09.03.2026 21:59 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

🧡For the last seven years, I kept re-implementing the same pattern: A parallel map loop that divides the work among several processes or threads. My very first attempts were built on Python’s standard tools, e.g., multiprocessing.map... ↩️

09.03.2026 19:57 πŸ‘ 2 πŸ” 1 πŸ’¬ 1 πŸ“Œ 0
Post image
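The pattern described in the thread above — a parallel map that divides work among several processes or threads — can be sketched with Python's standard `concurrent.futures` module (the thread mentions `multiprocessing.map`; this is a minimal illustrative version, not the author's library, and the helper names are my own):

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def parallel_map(fn, items, max_workers=4, use_processes=False):
    """Apply fn to each item, dividing the work among workers.

    use_processes=True sidesteps the GIL for CPU-bound fn,
    at the cost of pickling fn and each item.
    """
    Executor = ProcessPoolExecutor if use_processes else ThreadPoolExecutor
    with Executor(max_workers=max_workers) as pool:
        # Executor.map preserves input order, like built-in map.
        return list(pool.map(fn, items))


def square(x):
    return x * x


if __name__ == "__main__":
    print(parallel_map(square, range(5)))  # [0, 1, 4, 9, 16]
```

The recurring pain points the thread alludes to (picklability of `fn` under processes, choosing threads vs. processes per workload) are exactly what makes people re-implement this wrapper again and again.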

To study this, we introduce ReGT, a new multilabel reannotation of 625 ImageNet classes that corrects many of these issues. When evaluated on the cleaned labels, multimodal LLMs improve by up to +10.8% accuracy, substantially narrowing the gap with supervised vision models. πŸ“ˆ

09.03.2026 20:08 πŸ‘ 4 πŸ” 3 πŸ’¬ 1 πŸ“Œ 0
Post image

Let me introduce our new paper: Multimodal Large Language Models as Image Classifiers

❓ Multimodal LLMs are increasingly used for visual tasks, but evaluating their image classification ability has produced conflicting conclusions.

Link: arxiv.org/html/2603.06...

09.03.2026 20:08 πŸ‘ 11 πŸ” 3 πŸ’¬ 2 πŸ“Œ 1

I've made SatAst, a small collection of hand-annotated satellite-to-astronaut image correspondences, public on GitHub: github.com/georg-bn/sat.... This benchmark is part of the RoMa v2 paper; see Johan's thread below. bsky.app/profile/pars...

07.03.2026 18:05 πŸ‘ 7 πŸ” 1 πŸ’¬ 1 πŸ“Œ 0
Preview
I'm Not a Robot Prove your humanity once and for all

neal.fun/not-a-robot/

06.03.2026 15:41 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0
Post image

"Authors should not use negative v-spaces to change the template layout."

The template layout:

06.03.2026 07:56 πŸ‘ 28 πŸ” 5 πŸ’¬ 0 πŸ“Œ 0

yes

06.03.2026 15:34 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0
Post image Post image Post image Post image

Utonia: Toward One Encoder for All Point Clouds

Yujia Zhang, Xiaoyang Wu, Yunhan Yang, Xianzhe Fan, Han Li, Yuechen Zhang, Zehao Huang, Naiyan Wang, Hengshuang Zhao
tl;dr: PointTransformer v3 pretrained on tons of different data.
arxiv.org/abs/2603.03283

06.03.2026 14:31 πŸ‘ 12 πŸ” 1 πŸ’¬ 0 πŸ“Œ 0
Post image Post image Post image Post image

ZipMap: Linear-Time Stateful 3D Reconstruction with Test-Time Training

@haian-jin.bsky.social Rundi Wu, Tianyuan Zhang, Ruiqi Gao, @jonbarron.bsky.social @snavely.bsky.social @holynski.bsky.social

tl;dr: more test-time training to get a scene latent.
arxiv.org/abs/2603.04385

06.03.2026 14:09 πŸ‘ 8 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0
Post image Post image Post image Post image

DAGE: Dual-Stream Architecture for Efficient and Fine-Grained Geometry Estimation

Tuan Duc Ngo, Jiahui Huang, Seoung Wug Oh, Kevin Blackburn-Matzen, Evangelos Kalogerakis, Chuang Gan, Joon-Young Lee
tl;dr: low-res multi-view (Pi3-distilled) + high-res single view (MoGe2 fine-tuned).
arxiv.org/abs/2603.03744

06.03.2026 14:02 πŸ‘ 3 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0
Post image Post image Post image Post image

NOVA3R: Non-pixel-aligned Visual Transformer for Amodal 3D Reconstruction

Weirong Chen, Chuanxia Zheng, Ganlin Zhang, Andrea Vedaldi, Daniel Cremers

tl;dr: let VGGT output latents -> decode point cloud.
arxiv.org/abs/2603.04179

06.03.2026 13:54 πŸ‘ 4 πŸ” 0 πŸ’¬ 0 πŸ“Œ 1
Post image Post image Post image Post image

Dark3R: Learning Structure from Motion in the Dark

Andrew Y Guo, Anagh Malik, SaiKiran Tedla, Yutong Dai, Yiqian Qin, Zach Salehe, Benjamin Attal, Sotiris Nousias, Kyros Kutulakos, David B. Lindell

tl;dr: LoRA for MASt3R to make it work in low light.
arxiv.org/abs/2603.05330
#CVPR2026

06.03.2026 13:29 πŸ‘ 3 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0
Post image Post image Post image Post image

NOVA3R: Non-pixel-aligned Visual Transformer for Amodal 3D Reconstruction

Weirong Chen, @chuanxiaz.bsky.social, @ganlinzhang.xyz, Andrea Vedaldi, @dcremers.bsky.social

tl;dr: TripoSG+VGGT

layout issue with tables?
arxiv.org/abs/2603.04179

05.03.2026 17:06 πŸ‘ 1 πŸ” 1 πŸ’¬ 0 πŸ“Œ 0
Post image

Submit your paper on structured reconstruction -- CAD, semantic, wireframe, city monitoring, etc. -- to USM3D 2026!

cmt3.research.microsoft.com/USM2026
Deadline: March 24, 2026.
@cvprconference.bsky.social
#CVPR2026
#USM3D2026 #USM3D

04.03.2026 13:19 πŸ‘ 5 πŸ” 3 πŸ’¬ 0 πŸ“Œ 0

Wow

04.03.2026 13:13 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

#CVPR2026 One more week to submit your work to the Embedded Vision Workshop @ CVPR! @cvprconference.bsky.social (new deadline: March 11)

Info at: embeddedvisionworkshop.wordpress.com

04.03.2026 12:02 πŸ‘ 3 πŸ” 2 πŸ’¬ 0 πŸ“Œ 0

AI writing is like store-bought cake. It might be perfectly fine, maybe even as good as something you could make yourself, but it’s weird to give it to someone and say it’s homemade

03.03.2026 19:03 πŸ‘ 523 πŸ” 66 πŸ’¬ 16 πŸ“Œ 8
Preview
Image Matching Challenge 2025 Ongoing: Ongoing leaderboard for Image Matching Challenge 2025.

Image Matching Challenge 2026.
It is named "IMC 2025 On-going".
- It will live longer than "until next CVPR": a multi-year leaderboard.
- No prizes, but an invitation to present your solution at CVPR.
- Dataset and metrics are the same as in 2025.
www.kaggle.com/competitions...
#CVPR2026
@cvprconference.bsky.social

02.03.2026 13:37 πŸ‘ 7 πŸ” 5 πŸ’¬ 0 πŸ“Œ 0
Post image

You still have 2 weeks to submit your paper to Image Matching Workshop at #CVPR2026

Deadline: March 16.
Topics: anything related to image matching and 3D reconstruction.
cmt3.research.microsoft.com/IMW2026
@cvprconference.bsky.social

02.03.2026 13:42 πŸ‘ 7 πŸ” 5 πŸ’¬ 0 πŸ“Œ 0

Clarification about dual submissions to @eccv.bsky.social and @cvprconference.bsky.social Findings track.

If you submit the same work to ECCV, please do not opt in to CVPR 2026 Findingsβ€”opting in would make it a dual submission. Opt-in instructions will be sent once the logistics are finalized.

03.03.2026 13:40 πŸ‘ 10 πŸ” 4 πŸ’¬ 0 πŸ“Œ 0
Post image Post image Post image Post image

Exploring the AI Obedience: Why is Generating a Pure Color Image Harder than CyberPunk?

Hongyu Li, Kuan Liu, Yuan Chen, Juntao Hu, Huimin Lu, Guanjie Chen, Xue Liu, Guangming Lu, Hong Huang

tl;dr: Flux and NanoBanana fail at precise color filling.
arxiv.org/abs/2603.00166

03.03.2026 12:22 πŸ‘ 6 πŸ” 2 πŸ’¬ 0 πŸ“Œ 0

I am delighted (that pun will make sense in a second) that @alistairfoggin.bsky.social's first paper, CroCoDiLight, has been accepted to ICLR. The idea came from a group discussion on the CroCo paper from @naverlabseurope.bsky.social and realising it might implicitly already understand relighting.

03.03.2026 10:40 πŸ‘ 9 πŸ” 1 πŸ’¬ 1 πŸ“Œ 0
Post image Post image Post image Post image

Rethinking Camera Choice: An Empirical Study on Fisheye Camera Properties in Robotic Manipulation

Han Xue, Nan Min, Xiaotong Liu, Wendi Chen, Yuan Fang, Jun Lv, Cewu Lu, Chuan Wen

tl;dr: fisheye cameras are great for robotics, as long as the environment they see is not textureless.
arxiv.org/abs/2603.02139

03.03.2026 11:04 πŸ‘ 7 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0
Post image
