Florian Schneider
floschne-nlp.bsky.social
he/him

3rd and final year PhD Student

Researching the applications and limitations of multimodal transformer encoder and decoder models.
Reposted by Florian Schneider
😩Tired of the latest ARR cycle? @aclmeeting.bsky.social
Join us for #KnitTogether25 ⛰️✨ – a week to focus on “Bias and Social Factors in NLP”!
Let's explore and exchange research - together. But we also cook and explore nature - together. 🚀💡
🔗 knittogether.github.io/kt25
#NLPProc
Knit Together 25
knittogether.github.io
February 18, 2025 at 4:43 PM
Reposted by Florian Schneider
Introducing MVL-SIB, a massively multilingual vision-language benchmark for cross-modal topic matching in 205 languages!

🤔Tasks: Given images (or sentences), select the topically matching sentence (or image).

Arxiv: arxiv.org/abs/2502.12852
HF: huggingface.co/datasets/Wue...

Details👇
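The task protocol above (given images, select the topically matching sentence) can be sketched as an embedding-similarity ranking. This is a minimal illustration, not the benchmark's actual evaluation code: the embedding vectors and the helper names are hypothetical, standing in for whatever multilingual VLM produces them.

```python
import math

def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den

def pick_matching_sentence(image_embs, sentence_embs):
    """Images -> Sentence: average the image embeddings for a topic,
    then return the index of the candidate sentence whose embedding
    is most similar to that average. (Illustrative protocol only.)"""
    dim = len(image_embs[0])
    mean = [sum(e[d] for e in image_embs) / len(image_embs) for d in range(dim)]
    scores = [cosine(mean, s) for s in sentence_embs]
    return max(range(len(scores)), key=scores.__getitem__)

# Toy example: two images near direction [1, 0]; the first candidate
# sentence points the same way, so it is selected.
choice = pick_matching_sentence(
    image_embs=[[1.0, 0.0], [0.9, 0.2]],
    sentence_embs=[[0.8, 0.1], [0.0, 1.0]],
)
```

The Sentences→Image direction is the mirror image: average the sentence embeddings and rank the candidate images.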
February 21, 2025 at 7:46 AM
Reposted by Florian Schneider
Strong vision-language models (VLMs) like GPT-4o-mini maintain good performance for the top 150 languages, only to drop to chance-level performance for the lowest-resource languages!
February 21, 2025 at 7:46 AM
Reposted by Florian Schneider
The cross-modal vs. text-only performance *gap* shows that vision-language support decreases from high- to low-resource language tiers:

Images/Topic→Sentence (given images or a topic, pick a sentence): narrows with less textual support (left)
Sentences→Image/Topic (given sentences, pick an image or topic): widens with less VL support (right)
February 21, 2025 at 7:46 AM
Reposted by Florian Schneider
Cross-modal topic matching correlates well with other multilingual vision-language tasks!

🤗Images-To-Sentence (given images, select the topically fitting sentence) & Sentences-To-Image (given sentences, pick the topically matching image) probe complementary aspects of vision-language understanding (VLU)
February 21, 2025 at 7:46 AM