Lightnews — Scholar-powered news

Ryo Kamoi

@ryokamoi.bsky.social

#NLProc PhD Student at Penn State. Prev: MS at UT Austin, BE at Keio Univ, Intern at Microsoft OAR and Amazon Alexa.
https://ryokamoi.github.io/


Ryo Kamoi

@ryokamoi.bsky.social

Our paper VisOnlyQA has been accepted to @colmweb.org #COLM2025! See you in Montreal🍁
We find that even recent Vision Language Models struggle with simple questions about geometric properties in images, such as "What is the degree of angle AOD?"🧐
arxiv.org/abs/2412.00947
bsky.app/profile/ryok...

July 13, 2025 at 7:05 PM

Ryo Kamoi

@ryokamoi.bsky.social

VLMEvalKit now supports our VisOnlyQA dataset 🔥🔥🔥
github.com/open-compass...

VisOnlyQA reveals that even recent LVLMs like GPT-4o and Gemini 1.5 Pro stumble on simple visual perception questions, e.g., "What is the degree of angle AOD?"🧐
arxiv.org/abs/2412.00947

December 6, 2024 at 3:38 PM

Ryo Kamoi

@ryokamoi.bsky.social

Interestingly, our experiments suggest that stronger language models improve the visual perception of LVLMs, even when the same visual encoder (ViT) is used.

We conclude that we need to improve both the training data and model architecture of LVLMs for better visual perception. [4/n]

December 4, 2024 at 7:05 PM

Ryo Kamoi

@ryokamoi.bsky.social

We hypothesize that the weak visual perception stems from a lack of training data. To test this, we create training data for VisOnlyQA, but we observe that performance after fine-tuning varies across tasks and models, suggesting that training data is not the only problem. [3/n]

December 4, 2024 at 7:05 PM

Ryo Kamoi

@ryokamoi.bsky.social

VisOnlyQA includes questions about geometric and numerical information on scientific figures.
Recent benchmarks for LVLMs often involve reasoning or knowledge, putting less focus on visual perception. In contrast, VisOnlyQA is designed to evaluate visual perception directly. [2/n]

December 4, 2024 at 7:05 PM

Ryo Kamoi

@ryokamoi.bsky.social

📢 New preprint! Do LVLMs have strong visual perception capabilities? Not quite yet...
We introduce VisOnlyQA, a new dataset for evaluating the visual perception of LVLMs, and find that existing LVLMs perform poorly on it. [1/n]
arxiv.org/abs/2412.00947
github.com/psunlpgroup/...

December 4, 2024 at 7:05 PM
