
Vladan Stojnić

@stojnicv.xyz

Ph.D. student at Visual Recognition Group, Czech Technical University in Prague 🔗 https://stojnicv.xyz

650 Followers · 238 Following · 47 Posts · Joined 29.06.2023

Latest posts by Vladan Stojnić @stojnicv.xyz

Got a few labeled images lying around? You can use them to drastically improve your open-vocabulary segmentation! Check out RnS, which boosts OVS baselines by up to 34%. 📈👇

09.03.2026 16:32 👍 4 🔁 0 💬 0 📌 0

Retrieve and Segment: Are a Few Examples Enough to Bridge the Supervision Gap in Open-Vocabulary Segmentation?

Tilemachos Aravanis @stojnicv.xyz @billpsomas.bsky.social Nikos Komodakis @gtolias.bsky.social

tl;dr: almost yes if you use 1-3 images, no if more (Fig. 6)
arxiv.org/abs/2602.23339
#CVPR2026

27.02.2026 16:17 👍 7 🔁 3 💬 0 📌 0
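
For context, the retrieval step that such a few-labeled-examples approach builds on can be sketched in a few lines: embed the query and the small labeled pool with a frozen encoder, then fetch the nearest labeled images. Everything below (feature sizes, the random stand-in features) is illustrative, not the RnS implementation.

```python
import torch
import torch.nn.functional as F

def retrieve_support(query_feat, pool_feats, k=3):
    """Indices of the k labeled pool images most similar to the query.

    query_feat: (D,) global feature of the query image
    pool_feats: (N, D) global features of the labeled pool
    """
    sims = F.normalize(pool_feats, dim=-1) @ F.normalize(query_feat, dim=-1)
    return sims.topk(k).indices

# Illustrative usage with random stand-in features; in practice these would
# come from a frozen encoder (e.g., a CLIP or DINO image tower).
pool = torch.randn(10, 512)   # a handful of labeled images
query = torch.randn(512)
print(retrieve_support(query, pool, k=3))
```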

Excited to share that our paper "Global-Aware Edge Prioritization for Pose Graph Initialization" has been accepted to CVPR 2026! #CVPR2026 See you soon in Denver! 🥳🥳 Code is coming soon 🚧
❓ How would you do an accurate and efficient pose graph initialization in a global manner? arxiv.org/abs/2602.21963

26.02.2026 15:54 👍 10 🔁 3 💬 1 📌 0
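
For context, a classical baseline for the problem in the post: prioritize edges, build a spanning tree over the pose graph, and chain relative rotations along it. The sketch below shows only that generic recipe; the paper's global-aware edge scoring is not reproduced here.

```python
import networkx as nx
import numpy as np

def init_rotations(edges):
    """edges: list of (i, j, R_ij, score); R_ij maps frame j into frame i,
    and score is any edge-priority value (the paper proposes a global-aware one)."""
    g = nx.Graph()
    g.add_weighted_edges_from((i, j, s) for i, j, _, s in edges)
    tree = nx.maximum_spanning_tree(g)      # keep only the highest-priority edges
    rel = {(i, j): R for i, j, R, _ in edges}
    R_abs = {0: np.eye(3)}                  # anchor node 0 at the identity
    for i, j in nx.bfs_edges(tree, 0):      # propagate rotations along the tree
        R_ij = rel[(i, j)] if (i, j) in rel else rel[(j, i)].T
        R_abs[j] = R_abs[i] @ R_ij
    return R_abs

# Toy usage: a chain 0-1-2 plus a low-priority redundant edge 0-2.
I3 = np.eye(3)
print(init_rotations([(0, 1, I3, 0.9), (1, 2, I3, 0.8), (0, 2, I3, 0.1)]))
```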

1/n Attention, Please! 🚀

Our work “Revisiting Attentive Probing Through the Lens of Efficiency” has been accepted at #ICLR2026.

We introduce Efficient Probing (EP) — a lightweight, multi-query attentive probing method for frozen encoders.

Paper + code at the end 👇

20.02.2026 15:03 👍 11 🔁 4 💬 1 📌 1
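
For readers new to the technique: attentive probing pools frozen patch tokens with learnable queries via cross-attention, then classifies. A minimal multi-query sketch follows; sizes and names are illustrative, and EP's actual design choices are in the paper.

```python
import torch
import torch.nn as nn

class AttentiveProbe(nn.Module):
    """Learnable queries cross-attend to frozen encoder tokens, then classify."""
    def __init__(self, dim=768, n_queries=4, n_classes=1000):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(1, n_queries, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.head = nn.Linear(dim * n_queries, n_classes)

    def forward(self, tokens):                    # tokens: (B, N, dim), frozen
        q = self.queries.expand(tokens.size(0), -1, -1)
        pooled, _ = self.attn(q, tokens, tokens)  # cross-attention pooling
        return self.head(pooled.flatten(1))

probe = AttentiveProbe()
logits = probe(torch.randn(2, 197, 768))          # e.g. ViT-B/16 token grid
```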

What if position encodings were designed for vision from scratch? We introduce PaPE—Parabolic Position Encoding. Outperforms RoPE on 7/8 datasets and extrapolates to higher resolutions without fine-tuning or position interpolation. Paper, code, and website in thread 🧵

04.02.2026 08:22 👍 36 🔁 7 💬 3 📌 0
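
For context, the sketch below shows where such a positional-encoding choice plugs into attention: RoPE rotates query/key channels by phases linear in position, and a scheme like PaPE would swap in a different phase function. PaPE's actual parabolic formulation is in the paper and is not reproduced here.

```python
import torch

def rotate(x, phase):                     # x: (N, D) with D even, phase: (N, D/2)
    x1, x2 = x[..., 0::2], x[..., 1::2]
    c, s = phase.cos(), phase.sin()
    return torch.stack([x1 * c - x2 * s, x1 * s + x2 * c], dim=-1).flatten(-2)

D, N = 64, 16
inv_freq = 1.0 / (10000 ** (torch.arange(0, D, 2) / D))
pos = torch.arange(N).float()
linear_phase = pos[:, None] * inv_freq[None, :]   # RoPE: phase linear in position
q_rot = rotate(torch.randn(N, D), linear_phase)   # a new scheme changes the phase
```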

Sorry for that. Should be allowed now

13.01.2026 21:09 👍 0 🔁 0 💬 0 📌 0

I would like to try it, if possible.

13.01.2026 15:35 👍 0 🔁 0 💬 1 📌 0
Preview: Postdoctoral research position in Instance-level visual generation

I have an opening for a two-year post-doc position on instance-level (personalized) visual generation. Eligibility: (i) <=7 years from Ph.D.; (ii) studies or 1 year spent outside of Czechia; (iii) >=3 journal papers with IF or CORE A*/A conference papers. Deadline: 15 Feb.
Details: www.euraxess.cz/jobs/399390

08.01.2026 11:11 👍 12 🔁 10 💬 2 📌 1

1/n REGLUE Your Latents! 🚀

We introduce REGLUE: a unified framework that entangles VAE latents ➕ Global ➕ Local semantics for faster, higher-fidelity image generation.

Links (paper + code) at the end 👇

27.12.2025 10:26 👍 14 🔁 4 💬 1 📌 0
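
A toy sketch of the high-level idea as stated above (conditioning VAE latents on global and local semantic features); the module below is a guess for illustration only, not REGLUE's architecture.

```python
import torch
import torch.nn as nn

class LatentFusion(nn.Module):
    """Mix a VAE latent grid with a global token and a local feature grid."""
    def __init__(self, z_ch=4, g_dim=768, l_dim=768):
        super().__init__()
        self.g_proj = nn.Linear(g_dim, z_ch)      # global token -> latent channels
        self.l_proj = nn.Conv2d(l_dim, z_ch, 1)   # local patch grid -> latent grid
        self.mix = nn.Conv2d(3 * z_ch, z_ch, 1)

    def forward(self, z, g, l):  # z: (B,4,H,W), g: (B,768), l: (B,768,h,w)
        g_map = self.g_proj(g)[:, :, None, None].expand_as(z)
        l_map = nn.functional.interpolate(self.l_proj(l), size=z.shape[-2:])
        return self.mix(torch.cat([z, g_map, l_map], dim=1))

fused = LatentFusion()(torch.randn(2, 4, 32, 32),
                       torch.randn(2, 768), torch.randn(2, 768, 16, 16))
```
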
Building Bridges, Not Weapons: AI for Peaceful Progress

Announcing the first AI for Peace Workshop @ ICLR 2026. The workshop aims to provide a forum for examining the relationship between AI research and its military, surveillance, and conflict-related applications.
aiforpeaceworkshop.github.io

16.12.2025 09:27 👍 32 🔁 11 💬 2 📌 9
Post image

I guess it is related to this email from a few days ago. However, it seems someone forgot to close the access as stated in the email, so we are seeing things as changes are happening.

09.12.2025 14:37 👍 1 🔁 0 💬 0 📌 0

Are you at BMVC tomorrow and interested in VLMs?
Come see our poster "Image Recognition with Vision and Language Embeddings of VLMs". 👁️📕

TLDR: We benchmark VLMs on language- and vision-based classification, and propose a simple, training-free vision-language fusion.
Link: arxiv.org/pdf/2509.09311

23.11.2025 18:49 👍 3 🔁 1 💬 1 📌 0
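
The training-free fusion can be sketched as combining two zero-shot classifiers from the same VLM: image-to-text similarities against class prompts, and image-to-image similarities against class prototype embeddings. The mixing rule and shapes below are illustrative; the paper's exact fusion may differ.

```python
import torch
import torch.nn.functional as F

def fused_logits(img_emb, text_class_emb, vis_class_emb, alpha=0.5):
    """img_emb: (B, D); text_class_emb: (C, D); vis_class_emb: (C, D)."""
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(text_class_emb, dim=-1)
    vis = F.normalize(vis_class_emb, dim=-1)
    return alpha * img @ txt.T + (1 - alpha) * img @ vis.T

logits = fused_logits(torch.randn(8, 512), torch.randn(100, 512), torch.randn(100, 512))
```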

We have a PhD opportunity (start date Sep 2026) at the University of Edinburgh at the intersection of biodiversity mapping and zoonotic disease prediction.

It is part of the UKRI AI Centre for Doctoral Training in Biomedical Innovation based in the School of Informatics:
ai4bi-cdt.ed.ac.uk

21.11.2025 13:48 👍 5 🔁 2 💬 1 📌 0

23rd of October @iccv.bsky.social: We present our work on “Large-scale Pretraining for Grounded Video Caption Generation” with Cordelia Schmid and @josef-sivic.bsky.social at Exhibit Hall I #434 in the morning session. We'll have a search demo for our dataset as well! See you there!! 🚀

23.10.2025 09:12 👍 12 🔁 2 💬 0 📌 1

#skyvision

21.10.2025 18:36 👍 0 🔁 0 💬 0 📌 0
Preview: Processing and acquisition traces in visual encoders: What does CLIP know about your camera?

Paper: arxiv.org/abs/2508.10637

Work with @ryan-ramos.bsky.social, @gkordo.bsky.social, Yuta Nakashima, @gtolias.bsky.social, and @noagarciad.bsky.social

21.10.2025 18:15 👍 1 🔁 0 💬 0 📌 0

We show that representations from some foundation models, especially CVLs like CLIP, encode information about image metadata. More surprisingly, we show that such metadata traces can even affect performance on semantic downstream tasks.

21.10.2025 18:15 👍 1 🔁 0 💬 1 📌 0
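
The kind of experiment behind such a claim can be sketched as a linear probe on frozen features predicting a metadata field (an illustrative setup, not the paper's exact protocol):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

feats = np.random.randn(1000, 512)          # stand-in for frozen CLIP image features
camera_id = np.random.randint(0, 5, 1000)   # stand-in metadata label per image
probe = LogisticRegression(max_iter=1000).fit(feats[:800], camera_id[:800])
print("probe accuracy:", probe.score(feats[800:], camera_id[800:]))
# Accuracy well above chance on real features would indicate metadata traces.
```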

Are you at @iccv.bsky.social #ICCV2025? Come by our poster.

📅 October 22, 2025, 14:30 – 16:30 HST
📍 Location: Exhibit Hall I, Poster #207

21.10.2025 18:15 👍 12 🔁 5 💬 2 📌 2

I second the Hyperion

20.10.2025 08:46 👍 2 🔁 0 💬 0 📌 0
Post image

🌺 Just 4 days to go!
Join us in Honolulu for the Instance-Level Recognition and Generation Workshop at #ICCV2025 🏝
🗓️ Oct 19, 8:30am–12:30pm 📍 Room 306 A

We'll have amazing keynotes, plus oral and poster sessions featuring accepted and invited papers.
Don't miss it!
ilr-workshop.github.io/ICCVW2025/

15.10.2025 15:54 👍 9 🔁 5 💬 1 📌 1
Post image

The Visual Recognition Group at CTU in Prague organizes the 50th Pattern Recognition and Computer Vision Colloquium with
Torsten Sattler, Paul-Edouard Sarlin, Vicky Kalogeiton, Spyros Gidaris, Anna Kukleva, and Lukas Neumann.
On Thursday Oct 9, 11:00-17:00.

cmp.felk.cvut.cz/colloquium/

06.10.2025 15:13 👍 25 🔁 6 💬 2 📌 4
Post image

Super happy that QuARI: Query Adaptive Retrieval Improvement was accepted at #NeurIPS2025. You can significantly boost retrieval performance for very hard retrieval tasks by learning query-specific transformations of your encoders. w/ @jacobsn.bsky.social @pless.bsky.social arxiv.org/pdf/2505.21647

18.09.2025 18:55 👍 20 🔁 4 💬 1 📌 2
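
A rough sketch of the query-adaptive idea: predict a transformation from the query embedding and score the database in the transformed space. The low-rank hypernetwork and shapes below are illustrative stand-ins, not QuARI's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

D, R = 512, 32
hyper = nn.Linear(D, D * R)          # query embedding -> low-rank projection

def adaptive_scores(q, db):          # q: (D,), db: (N, D)
    W = hyper(q).view(D, R)          # query-specific transform
    q_t, db_t = q @ W, db @ W        # map query and database into its space
    return F.cosine_similarity(db_t, q_t[None, :], dim=-1)

scores = adaptive_scores(torch.randn(D), torch.randn(1000, D))
```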

Crash-test your foundation models for object recognition at its finest granularity. Here are the updated results on our instance-level image retrieval benchmark (ILIAS, CVPR'25). DINOv3 and Perception Encoder (PE) are included, with DINOv3 being the new SotA! Oh, but no, look at this...

08.09.2025 13:57 👍 12 🔁 2 💬 2 📌 0
Post image

🚀 New state-of-the-art on the ILIAS dataset!

Curious how well the latest models can recognize particular objects?
We evaluated the base and large variants of DINOv3 and Perception Encoder (PE) on instance-level image retrieval.

See the results 👉 vrg.fel.cvut.cz/ilias/

05.09.2025 14:18 👍 13 🔁 5 💬 1 📌 1
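
For context, instance-level retrieval evaluation typically reduces to ranking the database by cosine similarity to each query embedding, then scoring the ranking with mAP-style metrics. A minimal illustrative sketch (not the ILIAS evaluation code):

```python
import torch
import torch.nn.functional as F

q = F.normalize(torch.randn(5, 768), dim=-1)         # query embeddings
db = F.normalize(torch.randn(1000, 768), dim=-1)     # database embeddings
ranks = (q @ db.T).argsort(dim=-1, descending=True)  # per-query retrieval ranking
```
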
Preview: Category-level Text-to-Image Retrieval Improved: Bridging the Domain Gap with Diffusion Models and Vision Encoders

Nice idea for cross-modal retrieval by @gtolias.bsky.social and team arxiv.org/abs/2509.00177
Use the text to generate images using an image generative model, and augment the text query with these. A bit brute-force if you ask me, but effectively captures the visual diversity.

03.09.2025 07:08 👍 6 🔁 1 💬 2 📌 1
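
The augmentation described above can be sketched as mixing the text embedding with the averaged embeddings of images generated from that text; the weighting and the stand-in embeddings below are illustrative.

```python
import torch
import torch.nn.functional as F

def augmented_query(text_emb, gen_img_embs, alpha=0.5):
    """text_emb: (D,); gen_img_embs: (K, D), from images generated for the text."""
    t = F.normalize(text_emb, dim=-1)
    v = F.normalize(gen_img_embs, dim=-1).mean(dim=0)
    return F.normalize(alpha * t + (1 - alpha) * v, dim=-1)

q = augmented_query(torch.randn(512), torch.randn(4, 512))
```
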
Preview: Pattern Recognition and Computer Vision Colloquium - past speakers

The Colloquium in Pattern Recognition and Computer Vision of the Visual Recognition Group at CTU in Prague has a long tradition dating back to 1998. The list of all speakers is available at docs.google.com/spreadsheets.... Enjoy! The 50th edition is coming soon: cmp.felk.cvut.cz/colloquium/

01.09.2025 12:55 👍 17 🔁 8 💬 1 📌 1

Processing and acquisition traces in visual encoders: What does CLIP know about your camera?

@ryan-ramos.bsky.social @stojnicv.xyz @gkordo.bsky.social Yuta Nakashima @gtolias.bsky.social
@noagarciad.bsky.social
tl;dr: CLIP sees the difference between a DSLR and an iPhone, DINO doesn't.
arxiv.org/abs/2508.10637
1/

25.08.2025 12:04 👍 15 🔁 4 💬 1 📌 0

When it comes to the CVL term, we specifically went with it to distinguish CLIP-like VLMs from VLMs that can generate text, since the term VLM is overused and means many different things in different papers. To an extent, it also follows the naming from arxiv.org/pdf/2405.17247

18.08.2025 15:05 👍 1 🔁 0 💬 1 📌 0