6/n Personalized Segmentation
RNS can easily be employed for fine-grained tasks like personalized segmentation by simply expanding the support set with a few examples of a specific instance, letting it distinguish that instance from the others of its class.
09.03.2026 16:19
5/n Bridging the Gap
⚡ RNS improves over different kinds of OVS approaches by 14.1% on average, while maintaining open-vocabulary generalization.
09.03.2026 16:19
4/n Diverse Few-shot Scenarios
We investigate multiple few-shot settings where visual or textual information may be missing for some test classes.
We consistently improve the respective baselines, making RNS a practical, real-world OVS method.
09.03.2026 16:19
3/n How does it work?
💾 RNS stores VLM features from visual and textual examples in a memory-efficient manner.
🖼️ At test time, it retrieves only class-relevant examples to train a linear classifier on both modalities.
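The store-retrieve-probe loop above can be sketched in a few lines. Everything here (an array-based feature memory, top-k cosine retrieval, softmax regression as the linear classifier) is my own illustrative assumption, not the paper's implementation:

```python
import numpy as np

def retrieve_and_probe(memory_feats, memory_labels, query_feat, k=16, lr=0.1, steps=50):
    """Toy sketch: retrieve the k stored features closest to the query,
    then fit a linear (softmax) classifier on them. Illustrative only."""
    mem = memory_feats / np.linalg.norm(memory_feats, axis=1, keepdims=True)
    q = query_feat / np.linalg.norm(query_feat)
    idx = np.argsort(mem @ q)[-k:]          # top-k neighbors by cosine similarity
    X, y = mem[idx], memory_labels[idx]
    n_cls = memory_labels.max() + 1
    W = np.zeros((X.shape[1], n_cls))
    for _ in range(steps):                  # gradient steps on cross-entropy
        logits = X @ W
        p = np.exp(logits - logits.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        p[np.arange(len(y)), y] -= 1.0      # softmax gradient
        W -= lr * X.T @ p / len(y)
    return W

rng = np.random.default_rng(0)
feats, labels = rng.normal(size=(100, 32)), rng.integers(0, 3, size=100)
W = retrieve_and_probe(feats, labels, rng.normal(size=32))
print(W.shape)  # (32, 3)
```

In practice the same probe would be fit on both the visual and the textual feature memories; this sketch shows only the single-modality mechanics.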
09.03.2026 16:19
2/n Zero-shot open-vocabulary segmentation (OVS) significantly underperforms fully supervised methods.
RNS bridges this gap using a few pixel-level annotated visual examples along with class names.
With a few adaptation steps on each test image, we improve zero-shot OVS by up to 34% on average.
09.03.2026 16:19
1/n #CVPR2026 Accepted Paper
Are a Few Examples Enough to Bridge the Supervision Gap in Open-Vocabulary Segmentation?
Retrieve and Segment (RNS) answers this question.
Paper/code at the end 👇🏼
09.03.2026 16:19
๐๐๐
26.02.2026 23:05
Excited to share that our paper "Global-Aware Edge Prioritization for Pose Graph Initialization" has been accepted to CVPR 2026! #CVPR2026 See you soon in Denver! 🥳🥳 Code is coming soon 🚧
How would you perform accurate and efficient pose graph initialization in a global manner? arxiv.org/abs/2602.21963
26.02.2026 15:54
Global-Aware Edge Prioritization for Pose Graph Initialization
@weitong8591.bsky.social, @gtolias.bsky.social, Jiri Matas, @danielbarath.bsky.social
tl;dr: rank pose graph edges->global consistency->improve SfM
arxiv.org/abs/2602.21963
26.02.2026 13:24
Sleeping while waiting on an โanywhere in the worldโ paper decision release. #CVPR2026
20.02.2026 21:21
Attention, Please! Revisiting Attentive Probing Through the Lens of Efficiency
As fine-tuning becomes impractical at scale, probing is emerging as the preferred evaluation protocol. However, standard linear probing can understate the capability of models whose pre-training optim...
8/8 Resources 👇
Paper: arxiv.org/abs/2506.10178
Code: github.com/billpsomas/e...
Joint work with: Dionysis Christopoulos, @eirinibaltzi.bsky.social, @ikakogeorgiou.bsky.social, @tim-arav.bsky.social, Nikos Komodakis, Konstantinos Karantzalos, Yannis Avrithis, @gtolias.bsky.social.
See you @ ICLR 2026 🇧🇷
20.02.2026 15:03
7/n Take-home messages 💡
EP:
- Plug-and-play.
- Compatible with all pre-training families.
- Unlocks the potential of encoders optimized for local representations.
- Complementary with PEFT.
- Better to have it than not to have it.
20.02.2026 15:03
6/n EP + PEFT = 🔥
- EP captures information that LoRA alone does not, and vice versa.
- LoRA+EP improves over both pure EP and pure LoRA.
Example: a LoRA+EP configuration with 250K params reaches 72%, 4.3% above linear probing (67.7%), while using over 3× fewer parameters.
20.02.2026 15:03
5/n Interpretability
- EP queries specialize in distinct spatial regions.
- Attention maps are complementary.
- Semantic correspondences emerge (e.g. tails, feet).
- Verified quantitatively too.
20.02.2026 15:03
4/n Designed for local representations 🧩
Across ImageNet-1K:
- Consistent gains over k-NN and Linear Probing (LP).
- Particularly strong improvements for MIM, vision-language (VL), and generative pre-training.
- Minimal overhead.
20.02.2026 15:03
3/n Core observation
Prior attentive probing uses redundant projections.
Introducing Efficient Probing (EP):
- Multi-query cross-attention.
- Plug-and-play on top of frozen encoders.
- Lightweight and parameter-efficient.
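The multi-query cross-attention pooling at the heart of EP can be sketched as follows. Shapes and names are my own illustrative assumptions; the real EP design goes further in removing redundant projections:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def efficient_probe(tokens, queries, W_head):
    """Multi-query cross-attention pooling over frozen patch tokens (sketch).
    tokens: (N, D) features from a frozen encoder; queries: (Q, D) learned;
    W_head: (Q*D, C) linear classifier. Names and shapes are illustrative."""
    attn = softmax(queries @ tokens.T / np.sqrt(tokens.shape[1]))  # (Q, N)
    pooled = attn @ tokens                                         # (Q, D)
    return pooled.reshape(-1) @ W_head                             # (C,) logits

rng = np.random.default_rng(0)
tokens = rng.normal(size=(49, 64))        # e.g. a 7x7 patch grid
queries = rng.normal(size=(4, 64))        # a few learnable queries
W_head = rng.normal(size=(4 * 64, 10))
print(efficient_probe(tokens, queries, W_head).shape)  # (10,)
```

Only the queries and the classifier head are trained; the encoder producing `tokens` stays frozen, which is what keeps the probe lightweight.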
20.02.2026 15:03
2/n Why revisit probing? 🤔
- Linear probing underestimates encoders optimized for local representations.
- Full fine-tuning is costly at scale.
- Attentive probing helps, yet existing methods are over-parameterized and under-studied.
Can we get the benefits of attention without the overhead?
20.02.2026 15:03
1/n Attention, Please!
Our work "Revisiting Attentive Probing Through the Lens of Efficiency" has been accepted at #ICLR2026.
We introduce Efficient Probing (EP): a lightweight, multi-query attentive probing method for frozen encoders.
Paper + code at the end 👇
20.02.2026 15:03
Would love to try
13.01.2026 18:33
Best promo anyone could make for this position. And, amazingly, everything said is true.
09.01.2026 05:36
Postdoctoral research position in Instance-level visual generation
Czech Technical University in Prague (CTU) offers a fellowship program, the CTU Global Postdoc Fellowship. This new and attractive two-year fellowship-program offers excellent researchers who have rec...
I have an opening for a two-year post-doc position on instance-level (personalized) visual generation. Eligibility: (i) <=7 years from Ph.D.; (ii) studies or 1 year of work outside of Czechia; (iii) >=3 journal papers with IF or CORE A*/A conference papers. Deadline: 15 Feb.
Details: www.euraxess.cz/jobs/399390
08.01.2026 11:11
New task: Instance-level Image+Text→Image Retrieval
Given a query image + an edit ("during night"), retrieve the same specific instance after the change, not just any similar object.
New dataset on HF: i-CIR huggingface.co/datasets/bil...
Download, run, and share results!
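For intuition, here is a toy zero-shot composed-retrieval baseline of the kind commonly used for this task: fuse the query-image and edit-text embeddings additively, then rank by cosine similarity. It is a sketch under my own assumptions, not i-CIR's official evaluation protocol:

```python
import numpy as np

def compose_and_rank(img_emb, txt_emb, gallery):
    """Toy composed-retrieval baseline: sum the query-image and edit-text
    embeddings, then rank the gallery by cosine similarity. Illustrative
    sketch only, not the dataset's official protocol."""
    q = img_emb + txt_emb
    q = q / np.linalg.norm(q)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    return np.argsort(g @ q)[::-1]        # gallery indices, best match first

rng = np.random.default_rng(0)
gallery = rng.normal(size=(5, 16))        # 5 gallery items, toy embeddings
img_emb = gallery[2]                      # query shows the same instance as item 2
txt_emb = np.zeros(16)                    # no-op "edit" for this toy check
ranking = compose_and_rank(img_emb, txt_emb, gallery)
print(ranking[0])  # 2
```

The instance-level twist is exactly what makes this hard: the correct answer is the same object under the edit, so embedding fusion alone is a weak baseline to beat.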
06.01.2026 20:00
12/12 Joint work with Giorgos Petsangourakis, Christos Sgouropoulos, Theodoros Giannakopoulos, Giorgos Sfikas, @ikakogeorgiou.bsky.social.
27.12.2025 10:32
REGLUE Your Latents with Global and Local Semantics for Entangled Diffusion
Latent diffusion models (LDMs) achieve state-of-the-art image synthesis, yet their reconstruction-style denoising objective provides only indirect semantic supervision: high-level semantics emerge slo...
11/n Summary
REGLUE shows that the way we leverage VFM semantics matters for diffusion. Combining compact local semantics with global context yields faster convergence and state-of-the-art image generation.
arXiv: arxiv.org/abs/2512.16636
Project: reglueyourlatents.github.io
27.12.2025 10:30
10/n Faster convergence 🔥
REGLUE (SiT-B/2) achieves FID scores of 12.9 and 28.7 at 400K iterations in conditional and unconditional generation, respectively, outperforming REPA, ReDi, and REG. REGLUE (SiT-XL/2) matches 1M-step SOTA performance in just 700K iterations (~30% fewer steps).
27.12.2025 10:30
9/n Alignment effects
External alignment complements joint modeling, but its benefits depend on the signal. Local alignment yields consistent gains, whereas global-only alignment can degrade performance. Spatial joint modeling remains the primary driver.
27.12.2025 10:29
8/n Local > Global Semantics 🧩
Our analysis shows that jointly modeling with patch-level semantics drives most of the gains. The global [CLS] token helps, but fine-grained spatial features deliver a substantially larger FID improvement, highlighting the importance of local structure for diffusion.
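The patch-level alignment discussed here can be illustrated with a generic REPA-style cosine objective between diffusion features and frozen VFM patch features; this is a sketch under my own assumptions, not REGLUE's exact loss:

```python
import numpy as np

def patch_alignment_loss(diff_feats, vfm_feats):
    """REPA-style patch-level alignment (sketch): negative mean cosine
    similarity between (projected) diffusion features and frozen VFM patch
    features, both of shape (N_patches, D). Illustrative of the idea only."""
    a = diff_feats / np.linalg.norm(diff_feats, axis=1, keepdims=True)
    b = vfm_feats / np.linalg.norm(vfm_feats, axis=1, keepdims=True)
    return -np.mean(np.sum(a * b, axis=1))

rng = np.random.default_rng(0)
x = rng.normal(size=(256, 32))            # e.g. a 16x16 grid of patch features
print(patch_alignment_loss(x, x))         # maximal alignment when features match
```

Because the loss is computed per patch rather than on a single pooled vector, it supervises exactly the local spatial structure that the thread identifies as the main driver of gains.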
27.12.2025 10:29