Lightnews — Scholar-powered news

Fabian David Schmidt

@fdschmidt.bsky.social

PhD candidate at Uni of Würzburg working on multilinguality & multimodality | prev. visited Mila & LTL@UniCambridge

https://fdschmidt93.github.io


Fabian David Schmidt

@fdschmidt.bsky.social

Joint work with Florian Schneider, Chris Biemann, and @gglavas.bsky.social

My first paper on multilingual vision-language, and couldn't be happier with how this work turned out!🙂

February 21, 2025 at 7:46 AM

Fabian David Schmidt

@fdschmidt.bsky.social

Cross-modal topic matching correlates well with other multilingual vision-language tasks!

🤗Images-To-Sentence (given Images, select topically fitting sentence) & Sentences-To-Image (given Sentences, pick topically matching image) probe complementary aspects in VLU

February 21, 2025 at 7:46 AM

Fabian David Schmidt

@fdschmidt.bsky.social

X-modal to text-only perf. *gap* shows that VL support decreases from high to low-resource language tiers:

Images/Topic→Sentence (for I/T, pick S): narrows with less textual support (left)
Sentences→Image/Topic (for S, pick I/T): increases as VL support gets worse (right)

February 21, 2025 at 7:46 AM

Fabian David Schmidt

@fdschmidt.bsky.social

Strong vision-language models (VLMs) like GPT-4o-mini maintain good performance for top-150 languages, only to drop to performing no better than chance for the lowest resource languages!

February 21, 2025 at 7:46 AM
