We also developed a benchmark to evaluate the spatial understanding of VLMs. The core idea is to use synthetic images, which rules out any possibility of test-set leakage: arxiv.org/abs/2408.02231
26.11.2024 15:26