We find leading multimodal LLMs can reliably identify objects, yet hallucinate when reasoning across scenes.
🧵1/3
w/ Jingtong Su, Jianyu Zhang, @karen-ullrich.bsky.social, and Léon Bottou.
🧵
LLIP proposes a new pre-training objective that captures the many ways to describe an image, leading to strong performance across a suite of 22 zero-shot benchmarks.
bsky.app/profile/lavo...
Paper: arxiv.org/abs/2405.00740
Code: github.com/facebookrese...
Models:
- ViT-G: huggingface.co/lavoies/llip...
- ViT-B: huggingface.co/lavoies/llip...
We find frontier reasoning degrades models’ ability to know when NOT to answer.
🧵1/2
to start this summer or fall, with a focus on open science on multimodal models, agents, and beyond! Email polkirichenko@meta.com with the title [Prospective Intern 2025] and attach your CV if interested!
We emphatically say YES in our #NeurIPS 2024 study! 🧵
w/ Ouail Kitouni, Niklas Nolte, Diane Bouchacourt, Adina Williams, and Mike Rabbat
Paper: arxiv.org/abs/2406.05183