Agneet Chatterjee (@agneet)

REVISION: Rendering Tools Enable Spatial Fidelity in Vision-Language Models
Text-to-Image (T2I) and multimodal large language models (MLLMs) have been adopted in solutions for several computer vision and multimodal learning tasks. However, it has been found that such vision-l...

We also developed a benchmark to evaluate the spatial understanding of VLMs. The core idea is to use synthetic images, which avoids any possibility of test-time leakage: arxiv.org/abs/2408.02231

26.11.2024 15:26

@csprofkgd.bsky.social could you add me too? Thank you!

24.11.2024 21:11
