Congrats! Looks like time is a big failure case for these models (cc @neuralnoise.com @aryopg.bsky.social @rohit-saxena.bsky.social)
bsky.app/profile/emil...
Work done with @neuralnoise.com and Frank Keller.
We tested state-of-the-art multimodal LLMs on this challenging task, and they struggled!
We also propose a new method:
🔥 SEGMENT & SUMMARIZE, a training-free approach that outperforms existing models by:
🔹 Segmenting the poster into logical regions
🔹 Performing local & global summarization
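The two steps above can be sketched roughly as follows. This is only a minimal illustration of the control flow (segment, summarize each region, then combine), not the authors' implementation. The function names and the toy text-based stand-ins are hypothetical; a real system would presumably operate on poster image regions and call a multimodal LLM at each step.

```python
from typing import Callable, List


def segment_and_summarize(
    poster: str,
    segment: Callable[[str], List[str]],
    summarize_local: Callable[[str], str],
    summarize_global: Callable[[List[str]], str],
) -> str:
    """Segment the poster, summarize each region, then combine the summaries."""
    regions = segment(poster)                                      # segmentation
    local_summaries = [summarize_local(r) for r in regions]       # local step
    return summarize_global(local_summaries)                       # global step


# Toy stand-ins (assumptions for illustration only): the "poster" is plain
# text, and regions are blocks separated by blank lines.
def toy_segment(poster_text: str) -> List[str]:
    return [b.strip() for b in poster_text.split("\n\n") if b.strip()]


def toy_local(region: str) -> str:
    # Keep only the first sentence of the region.
    return region.split(".")[0].strip() + "."


def toy_global(summaries: List[str]) -> str:
    # Concatenate the local summaries into one abstract-like string.
    return " ".join(summaries)


poster = "Intro here. More detail.\n\nResults are good. Extra."
abstract = segment_and_summarize(poster, toy_segment, toy_local, toy_global)
```

Because every step is passed in as a function, the stand-ins can be swapped for model calls without touching the pipeline itself.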
PosterSum features 16,305 poster-abstract pairs from major ML conferences.
Task: Summarize a research poster image into a concise abstract summary.
Can multimodal LLMs truly understand research poster images?
We introduce PosterSum, a new multimodal benchmark for scientific poster summarization!
Dataset: huggingface.co/datasets/rohitsaxena/PosterSum
Paper: arxiv.org/abs/2502.17540
I'd love to be added!
Thanks
Would love to be added!
Hello, can you please add me? Thanks
I'd love to be added!
Thanks