Our paper VisOnlyQA has been accepted to
@colmweb.org #COLM2025! See you in Montreal!
We find that even recent Vision Language Models struggle with simple questions about geometric properties in images, such as "What is the degree of angle AOD?"
arxiv.org/abs/2412.00947
bsky.app/profile/ryok...
13.07.2025 19:05
Excited to share that Communications of the ACM featured an article that includes an interview with me about LLM self-correction! I mainly discuss self-correction before o1, but I believe it still offers some takeaways.
cacm.acm.org/news/self-co...
arxiv.org/abs/2406.01297
06.03.2025 14:26
VLMEvalKit now supports our VisOnlyQA dataset!
github.com/open-compass...
VisOnlyQA reveals that even recent LVLMs like GPT-4o and Gemini 1.5 Pro stumble on simple visual perception questions, e.g., "What is the degree of angle AOD?"
arxiv.org/abs/2412.00947
06.12.2024 15:38
Interestingly, our experiments suggest that stronger language models improve the visual perception of LVLMs, even when the same visual encoder (ViT) is used.
We conclude that both the training data and the model architecture of LVLMs need to be improved for better visual perception. [4/n]
04.12.2024 19:05
We hypothesize that the weak visual perception stems from a lack of training data. To verify this, we create training data for VisOnlyQA, but we observe that performance after fine-tuning depends on the task and model, suggesting that training data is not the only problem. [3/n]
04.12.2024 19:05
VisOnlyQA includes questions about geometric and numerical information in scientific figures.
Recent benchmarks for LVLMs often involve reasoning or knowledge, placing less focus on visual perception. In contrast, VisOnlyQA is designed to evaluate visual perception directly. [2/n]
04.12.2024 19:05
New preprint! Do LVLMs have strong visual perception capabilities? Not quite yet...
We introduce VisOnlyQA, a new dataset for evaluating the visual perception of LVLMs, and find that existing LVLMs perform poorly on it. [1/n]
arxiv.org/abs/2412.00947
github.com/psunlpgroup/...
04.12.2024 19:05
I'm on the academic job market this year! I'm completing my @uwcse.bsky.social @uwnlp.bsky.social Ph.D. (2025), focusing on overcoming LLM limitations like hallucinations by building new LMs.
My Ph.D. work focuses on Retrieval-Augmented LMs to create more reliable AI systems.
04.12.2024 13:26
Curious about LLM self-correction? Check out our reading list!
github.com/ryokamoi/llm...
We feature papers & blog posts on:
* Key self-correction papers
* Negative results in self-correction
* Projects inspired by OpenAI o1
29.11.2024 21:27
A starter pack for the NLP and Computational Linguistics researchers at UT Austin!
go.bsky.app/75g9JLT
22.11.2024 17:18
We at UT Linguistics are hiring for 2 faculty positions in Computational Linguistics! Assistant or Associate professors, deadline Dec 1.
UT has a super vibrant comp ling & #nlp community!!
Apply here: apply.interfolio.com/158280
19.11.2024 22:38