Lightnews — Scholar-powered news

Florian Schneider

@floschne-nlp.bsky.social

he/him

3rd and final year PhD Student

Researching the applications and limitations of multimodal transformer encoder and decoder models.


Reposted by Florian Schneider

Fabian David Schmidt

@fdschmidt.bsky.social

Strong vision-language models (VLMs) like GPT-4o-mini maintain good performance for top-150 languages, only to drop to performing no better than chance for the lowest resource languages!

February 21, 2025 at 7:46 AM

Reposted by Florian Schneider

Fabian David Schmidt

@fdschmidt.bsky.social

X-modal to text-only perf. *gap* shows that VL support decreases from high to low-resource language tiers:

Images/Topic→Sentence (for I/T, pick S): narrows with less textual support (left)
Sentences→Image/Topic (for S, pick I/T): increases with less VL support (right)

February 21, 2025 at 7:46 AM

Reposted by Florian Schneider

Fabian David Schmidt

@fdschmidt.bsky.social

Cross-modal topic matching correlates well with other multilingual vision-language tasks!

🤗Images-To-Sentence (given Images, select topically fitting sentence) & Sentences-To-Image (given Sentences, pick topically matching image) probe complementary aspects in VLU

February 21, 2025 at 7:46 AM
