https://ryokamoi.github.io/
Ryo Kamoi, Yusen Zhang, Sarkar Snigdha Sarathi Das, Ranran Haoran Zhang, Rui Zhang
Paper: arxiv.org/abs/2412.00947
Data: huggingface.co/collections/...
Code: github.com/psunlpgroup/...
We conclude that we need to improve both the training data and model architecture of LVLMs for better visual perception. [4/n]
Recent benchmarks for LVLMs often involve reasoning or knowledge, putting less focus on visual perception. In contrast, VisOnlyQA is designed to evaluate visual perception directly. [2/n]
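For readers who want to try the benchmark, here is a minimal sketch of loading the evaluation data with the Hugging Face `datasets` library. The dataset ID and split name below are hypothetical, since the collection link above is truncated; substitute the identifiers from the actual Hugging Face collection.

```python
from datasets import load_dataset

# Hypothetical dataset ID and split -- replace with the real values
# from the VisOnlyQA collection on Hugging Face.
dataset = load_dataset("psunlpgroup/VisOnlyQA", split="test")

# Each example pairs a figure image with a question that requires only
# visual perception, not external knowledge or multi-step reasoning.
example = dataset[0]
print(example.keys())
```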
When Can LLMs Actually Correct Their Own Mistakes? A Critical Survey of Self-Correction of LLMs (TACL 2024)
arxiv.org/abs/2406.01297