Got a few labeled images lying around? You can use them to drastically improve your open-vocabulary segmentation! Check out RnS, which boosts OVS baselines by up to 34%. 👇👇
Retrieve and Segment: Are a Few Examples Enough to Bridge the Supervision Gap in Open-Vocabulary Segmentation?
Tilemachos Aravanis @stojnicv.xyz @billpsomas.bsky.social Nikos Komodakis @gtolias.bsky.social
tl;dr: almost yes if you use 1-3 images, no if you use more (Fig. 6)
arxiv.org/abs/2602.23339
#CVPR2026
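For intuition, here is a minimal sketch of the retrieval step such a retrieve-and-segment pipeline implies: given precomputed embeddings, fetch the few labeled images closest to a query. This is not the paper's method; all names, shapes, and the cosine-similarity choice are assumptions.

```python
import torch
import torch.nn.functional as F

def retrieve_labeled_examples(query_emb, support_embs, k=3):
    """Return indices of the k labeled images closest to the query.

    query_emb:    (D,)   embedding of the query image
    support_embs: (N, D) embeddings of the small labeled pool
    """
    q = F.normalize(query_emb, dim=-1)
    s = F.normalize(support_embs, dim=-1)
    sims = s @ q                   # cosine similarities, shape (N,)
    return sims.topk(k).indices    # the k nearest labeled examples

# Toy usage: random tensors stand in for real encoder features.
print(retrieve_labeled_examples(torch.randn(512), torch.randn(100, 512), k=3))
```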
Excited to share that our paper "Global-Aware Edge Prioritization for Pose Graph Initialization" has been accepted to CVPR 2026! #CVPR2026 See you soon in Denver! 🥳🥳 Code is coming soon 🚧
How would you do accurate and efficient pose graph initialization in a global manner? arxiv.org/abs/2602.21963
1/n Attention, Please!
Our work "Revisiting Attentive Probing Through the Lens of Efficiency" has been accepted at #ICLR2026.
We introduce Efficient Probing (EP): a lightweight, multi-query attentive probing method for frozen encoders.
Paper + code at the end 👇
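Before the links, a rough sketch of what multi-query attentive probing of a frozen encoder can look like: a few learnable queries cross-attend to the frozen patch tokens, and the pooled output feeds a linear classifier. The dimensions, query count, and mean pooling here are my assumptions, not necessarily EP's exact design.

```python
import torch
import torch.nn as nn

class AttentiveProbe(nn.Module):
    """Pool frozen patch tokens with learnable queries, then classify."""
    def __init__(self, dim=768, num_queries=4, num_classes=1000, heads=8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_queries, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, tokens):                     # tokens: (B, N, dim)
        q = self.queries.unsqueeze(0).expand(tokens.size(0), -1, -1)
        pooled, _ = self.attn(q, tokens, tokens)   # cross-attention pooling
        return self.head(pooled.mean(dim=1))       # average queries, classify

probe = AttentiveProbe()
feats = torch.randn(2, 196, 768)                   # e.g. ViT-B/16 patch tokens
print(probe(feats).shape)                          # torch.Size([2, 1000])
```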
What if position encodings were designed for vision from scratch? We introduce PaPE: Parabolic Position Encoding. Outperforms RoPE on 7/8 datasets and extrapolates to higher resolutions without fine-tuning or position interpolation. Paper, code, and website in thread 🧵
Sorry about that. It should be allowed now.
I would like to try it, if possible.
I have an opening for a two-year post-doc position on instance-level (personalized) visual generation. Eligibility: (i) <=7 years from Ph.D.; (ii) studies or 1 year spent outside of Czechia; (iii) >=3 papers in journals with IF or at CORE A*/A conferences. Deadline: 15 Feb.
Details: www.euraxess.cz/jobs/399390
1/n REGLUE Your Latents!
We introduce REGLUE: a unified framework that entangles VAE latents with global and local semantics for faster, higher-fidelity image generation.
Links (paper + code) at the end 👇
Announcing the first AI for Peace Workshop @ ICLR 2026. The workshop aims to provide a forum for examining the relationship between AI research and its military, surveillance, and conflict-related applications.
aiforpeaceworkshop.github.io
I guess it is related to this email from a few days ago. However, it seems someone forgot to close the access as stated in the email, so we are seeing changes as they happen.
Are you at BMVC tomorrow and interested in VLMs?
Come see our poster "Image Recognition with Vision and Language Embeddings of VLMs".
TLDR: We benchmark VLMs on language- and vision-based classification, and propose a simple, training-free vision-language fusion.
Link: arxiv.org/pdf/2509.09311
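As one illustration of what a training-free fusion in a shared CLIP-like space could look like (the paper's exact rule may differ): blend zero-shot text-prompt scores with scores from per-class image prototypes. The blending weight and all names here are assumptions.

```python
import torch
import torch.nn.functional as F

def fused_scores(img_emb, text_class_embs, vision_prototypes, alpha=0.5):
    """Blend zero-shot text scores with vision-prototype scores.

    img_emb:           (D,)   embedding of the test image
    text_class_embs:   (C, D) text embeddings of class prompts
    vision_prototypes: (C, D) mean image embedding per class
    """
    img = F.normalize(img_emb, dim=-1)
    t = F.normalize(text_class_embs, dim=-1)
    v = F.normalize(vision_prototypes, dim=-1)
    return alpha * (t @ img) + (1 - alpha) * (v @ img)  # (C,) class scores

# Toy usage: random tensors stand in for real CLIP embeddings.
scores = fused_scores(torch.randn(512), torch.randn(10, 512), torch.randn(10, 512))
print(scores.argmax().item())
```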
We have a PhD opportunity (start date Sep 2026) at the University of Edinburgh at the intersection of biodiversity mapping and zoonotic disease prediction.
It is part of the UKRI AI Centre for Doctoral Training in Biomedical Innovation based in the School of Informatics:
ai4bi-cdt.ed.ac.uk
23rd of October. @iccv.bsky.social We present our work on "Large-scale Pretraining for Grounded Video Caption Generation" with Cordelia Schmid and @josef-sivic.bsky.social at Exhibit Hall I #434 in the morning session. We'll have a search demo for our dataset as well! See you there!!
#skyvision
Are you at @iccv.bsky.social #ICCV2025? Come by our poster.
📅 October 22, 2025, 14:30–16:30 HST
📍 Location: Exhibit Hall I, Poster #207
Paper: arxiv.org/abs/2508.10637
Work with @ryan-ramos.bsky.social, @gkordo.bsky.social, Yuta Nakashima, @gtolias.bsky.social, and @noagarciad.bsky.social.
We show that representations from some foundation models, especially CVLs like CLIP, encode information about image metadata. More surprisingly, we show that such metadata traces can even affect performance on semantic downstream tasks.
I second the Hyperion
🌺 Just 4 days to go!
Join us in Honolulu for the Instance-Level Recognition and Generation Workshop at #ICCV2025
🗓️ Oct 19, 8:30am–12:30pm 📍 Room 306 A
We'll have amazing keynotes, plus oral and poster sessions featuring accepted and invited papers.
Don't miss it!
ilr-workshop.github.io/ICCVW2025/
The Visual Recognition Group at CTU in Prague organizes the 50th Pattern Recognition and Computer Vision Colloquium with
Torsten Sattler, Paul-Edouard Sarlin, Vicky Kalogeiton, Spyros Gidaris, Anna Kukleva, and Lukas Neumann.
On Thursday Oct 9, 11:00-17:00.
cmp.felk.cvut.cz/colloquium/
Super happy that QuARI: Query Adaptive Retrieval Improvement was accepted at #NeurIPS2025. You can significantly boost retrieval performance for very hard retrieval tasks by learning query-specific transformations of your encoders. w/ @jacobsn.bsky.social @pless.bsky.social arxiv.org/pdf/2505.21647
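My toy reading of the idea, emphatically not QuARI's actual architecture: a small network maps the query embedding to a query-specific (here low-rank) linear transform, which re-embeds both query and database before ranking.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QueryAdaptiveTransform(nn.Module):
    """Predict a query-specific linear map and rank the database with it."""
    def __init__(self, dim=256, rank=16):
        super().__init__()
        self.dim, self.rank = dim, rank
        # Predict two low-rank factors of a (dim x dim) map from the query.
        self.to_map = nn.Linear(dim, 2 * dim * rank)

    def forward(self, query, database):      # query: (D,), database: (N, D)
        a, b = self.to_map(query).split(self.dim * self.rank)
        W = a.view(self.dim, self.rank) @ b.view(self.rank, self.dim)
        q = F.normalize(query @ W, dim=-1)
        db = F.normalize(database @ W, dim=-1)
        return (db @ q).argsort(descending=True)   # ranked database indices

model = QueryAdaptiveTransform()
print(model(torch.randn(256), torch.randn(1000, 256))[:5])
```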
Crash-test your foundation models on object recognition at its finest granularity. Here are the updated results on our instance-level image retrieval benchmark (ILIAS, CVPR'25). DINOv3 and Perception Encoder (PE) are included, with DINOv3 being the new SotA! Oh, but wait, look at this...
New state of the art on the ILIAS dataset!
Curious how well the latest models can recognize particular objects?
We evaluated the base and large variants of DINOv3 and Perception Encoder (PE) on instance-level image retrieval.
See the results 👉 vrg.fel.cvut.cz/ilias/
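To run a similar comparison on your own data, a minimal retrieval evaluation can look like the sketch below; ILIAS's official protocol and metric differ, and the random tensors stand in for real DINOv3/PE embeddings.

```python
import torch
import torch.nn.functional as F

def recall_at_k(query_embs, db_embs, gt, k=10):
    """Fraction of queries whose relevant item ranks in the top k.

    query_embs: (Q, D), db_embs: (N, D), gt: (Q,) index of the relevant item.
    """
    q = F.normalize(query_embs, dim=-1)
    db = F.normalize(db_embs, dim=-1)
    topk = (q @ db.T).topk(k, dim=-1).indices          # (Q, k) retrieved ids
    return (topk == gt.unsqueeze(1)).any(dim=1).float().mean().item()

print(recall_at_k(torch.randn(50, 384), torch.randn(5000, 384),
                  torch.randint(0, 5000, (50,))))
```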
Nice idea for cross-modal retrieval by @gtolias.bsky.social and team arxiv.org/abs/2509.00177
Use the text to generate images using an image generative model, and augment the text query with these. A bit brute-force if you ask me, but effectively captures the visual diversity.
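A hedged sketch of that augmentation step: embed the text, embed a handful of images generated from it, and blend. The weight is an assumption, and the random tensors stand in for a real text-to-image model plus CLIP encoders.

```python
import torch
import torch.nn.functional as F

def augmented_text_query(text_emb, gen_img_embs, beta=0.5):
    """Blend a text query with embeddings of images generated from it.

    text_emb:     (D,)   embedding of the text query
    gen_img_embs: (M, D) embeddings of M images generated from the text
    """
    t = F.normalize(text_emb, dim=-1)
    g = F.normalize(gen_img_embs, dim=-1).mean(dim=0)  # average generated views
    return F.normalize(beta * t + (1 - beta) * g, dim=-1)

# Toy usage: random tensors stand in for real CLIP text/image embeddings.
print(augmented_text_query(torch.randn(512), torch.randn(4, 512)).shape)
```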
The Colloquium in Pattern Recognition and Computer Vision of the Visual Recognition Group at CTU in Prague has a long tradition dating back to 1998. The list of all speakers is available at docs.google.com/spreadsheets.... Enjoy! The 50th edition is coming soon: cmp.felk.cvut.cz/colloquium/
Processing and acquisition traces in visual encoders: What does CLIP know about your camera?
@ryan-ramos.bsky.social @stojnicv.xyz @gkordo.bsky.social Yuta Nakashima @gtolias.bsky.social
@noagarciad.bsky.social
tl;dr: CLIP sees the difference between a DSLR and an iPhone; DINO doesn't.
arxiv.org/abs/2508.10637
1/
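A minimal way to test for such traces yourself, assuming frozen CLIP image embeddings and EXIF-derived camera labels (random stand-ins below): fit a linear probe and check for above-chance accuracy.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Stand-ins: in practice X = frozen CLIP image embeddings,
# y = camera model parsed from each image's EXIF metadata.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 512))
y = rng.integers(0, 5, size=1000)           # e.g. 5 camera models

probe = LogisticRegression(max_iter=1000).fit(X[:800], y[:800])
# Above-chance test accuracy would indicate metadata traces in the features.
print(probe.score(X[800:], y[800:]))
```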
When it comes to the CVL term, we specifically went with it to distinguish CLIP-like VLMs from VLMs that can generate text, as the term VLM is overused and means many different things in different papers. To an extent, it also follows the naming from arxiv.org/pdf/2405.17247