Fabian David Schmidt
fdschmidt.bsky.social
PhD candidate at Uni of Würzburg working on multilinguality & multimodality | prev. visited Mila & LTL@UniCambridge

https://fdschmidt93.github.io
Joint work with Florian Schneider, Chris Biemann, and @gglavas.bsky.social

My first paper on multilingual vision-language, and I couldn't be happier with how this work turned out!🙂
February 21, 2025 at 7:46 AM
Cross-modal topic matching correlates well with other multilingual vision-language tasks!

🤗Images-To-Sentence (given images, select the topically fitting sentence) & Sentences-To-Image (given sentences, pick the topically matching image) probe complementary aspects of vision-language understanding (VLU)

The X-modal vs. text-only perf. *gap* shows that VL support decreases from high- to low-resource language tiers:

Images/Topic→Sentence (for I/T, pick S): narrows with less textual support (left)
Sentences→Image/Topic (for S, pick I/T): increases with less VL support (right)

Strong vision-language models (VLMs) like GPT-4o-mini maintain good performance for the top-150 languages, only to drop to no better than chance for the lowest-resource languages!