
Ryo Kamoi

@ryokamoi

#NLProc PhD Student at Penn State. Prev: MS at UT Austin, BE at Keio Univ, Intern at Microsoft OAR and Amazon Alexa. https://ryokamoi.github.io/

181
Followers
77
Following
11
Posts
26.12.2023
Joined

Latest posts by Ryo Kamoi @ryokamoi

Our paper VisOnlyQA has been accepted to
@colmweb.org #COLM2025! See you in Montreal🍁
We find that even recent Vision Language Models struggle with simple questions about geometric properties in images, such as "What is the degree of angle AOD?"🧐
arxiv.org/abs/2412.00947
bsky.app/profile/ryok...

13.07.2025 19:05 👍 0 🔁 0 💬 0 📌 0

Excited to share that Communications of the ACM featured an article that includes an interview with me about LLM self-correction! I mainly discuss self-correction before o1, but I believe it still offers some takeaways.
cacm.acm.org/news/self-co...
arxiv.org/abs/2406.01297

06.03.2025 14:26 👍 0 🔁 0 💬 0 📌 0

VLMEvalKit now supports our VisOnlyQA dataset 🔥🔥🔥
github.com/open-compass...

VisOnlyQA reveals that even recent LVLMs like GPT-4o and Gemini 1.5 Pro stumble on simple visual perception questions, e.g., "What is the degree of angle AOD?"🧐
arxiv.org/abs/2412.00947

06.12.2024 15:38 👍 2 🔁 0 💬 0 📌 0

VisOnlyQA: Large Vision Language Models Still Struggle with Visual Perception of Geometric Information

Ryo Kamoi, Yusen Zhang, Sarkar Snigdha Sarathi Das, Ranran Haoran Zhang, Rui Zhang

Paper: arxiv.org/abs/2412.00947
Data: huggingface.co/collections/...
Code: github.com/psunlpgroup/...

04.12.2024 19:05 👍 0 🔁 0 💬 0 📌 0

Interestingly, our experiments suggest that stronger language models improve the visual perception of LVLMs, even when they use the same visual encoder (ViT).

We conclude that we need to improve both the training data and model architecture of LVLMs for better visual perception. [4/n]

04.12.2024 19:05 👍 0 🔁 0 💬 1 📌 0

We hypothesize that the weak visual perception stems from a lack of training data. To verify this, we create training data for VisOnlyQA, but we observe that performance after fine-tuning varies across tasks and models, suggesting that training data is not the only problem. [3/n]

04.12.2024 19:05 👍 0 🔁 0 💬 1 📌 0

VisOnlyQA includes questions about geometric and numerical information in scientific figures.
Recent benchmarks for LVLMs often involve reasoning or knowledge, putting less focus on visual perception. In contrast, VisOnlyQA is designed to evaluate visual perception directly. [2/n]

04.12.2024 19:05 👍 0 🔁 0 💬 1 📌 0

📢 New preprint! Do LVLMs have strong visual perception capabilities? Not quite yet...
We introduce VisOnlyQA, a new dataset for evaluating the visual perception of LVLMs, and find that existing LVLMs perform poorly on it. [1/n]
arxiv.org/abs/2412.00947
github.com/psunlpgroup/...

04.12.2024 19:05 👍 1 🔁 0 💬 1 📌 2

I’m on the academic job market this year! I’m completing my @uwcse.bsky.social @uwnlp.bsky.social Ph.D. (2025), focusing on overcoming LLM limitations, such as hallucinations, by building new LMs.
My Ph.D. work focuses on Retrieval-Augmented LMs to create more reliable AI systems 🧵

04.12.2024 13:26 👍 71 🔁 17 💬 3 📌 2

This reading list is based on our survey paper. Don't forget to check it out as well 😉

When Can LLMs Actually Correct Their Own Mistakes? A Critical Survey of Self-Correction of LLMs (TACL 2024)
arxiv.org/abs/2406.01297

29.11.2024 21:28 👍 0 🔁 0 💬 0 📌 0

Curious about LLM self-correction? Check out our reading list!
📚 github.com/ryokamoi/llm...

We feature papers & blog posts on:
* Key self-correction papers
* Negative results in self-correction
* Projects inspired by OpenAI o1

29.11.2024 21:27 👍 10 🔁 2 💬 1 📌 0

A starter pack for the NLP and Computational Linguistics researchers at UT Austin!
go.bsky.app/75g9JLT

22.11.2024 17:18 👍 22 🔁 7 💬 0 📌 0

We at UT Linguistics are hiring for 🔥 2 faculty positions in Computational Linguistics! Assistant or Associate Professor, deadline Dec 1.
UT has a super vibrant comp ling & #nlp community!!

Apply here 👉 apply.interfolio.com/158280

19.11.2024 22:38 👍 12 🔁 7 💬 0 📌 1

Hello Bluesky. It was great to talk with so many people at #EMNLP2024!
The paper we presented, a survey on self-correction of LLMs, is now published by MIT Press!

When Can LLMs Actually Correct Their Own Mistakes? A Critical Survey of Self-Correction of LLMs (TACL 2024)
direct.mit.edu/tacl/article...

19.11.2024 15:10 👍 8 🔁 1 💬 0 📌 0