Dhruv Batra
@dhruvbatra.bsky.social
Co-founder & Chief Scientist at Yutori. Prev: Senior Director leading FAIR Embodied AI at Meta, and Professor at Georgia Tech.
As part of the award ceremony, the VQA team presented a recap of vision-and-language research over the last decade: solved problems, progress, and open challenges for multimodal LLMs.
October 23, 2025 at 5:18 PM
VQA challenge series won the Mark Everingham prize at #ICCV2025 for stimulating a new strand of vision-and-language research.
It's extra special because ICCV25 marks the 10-year anniversary of the VQA paper.
When we started, the idea of answering any question about any image seemed outlandish.
October 21, 2025 at 7:27 PM
The problem with “AI slop” isn’t the AI — it’s the slop.
People act like AI is the issue, when it’s actually part of the fix.
If we're honest: most of what we make, most of the time, is slop by our own standards.
That’s the generator–discriminator gap in creative work that Ira Glass talks about.
October 15, 2025 at 4:22 PM
It is so refreshing to see conferences innovate on the reviewing model and run actual experiments (!) as opposed to fighting change.
For #ICLR2025, we piloted an LLM that provided optional feedback to some reviewers. Results are promising: over 12K suggestions were incorporated by reviewers to improve review quality. See our blog post for details and more analysis: blog.iclr.cc/2025/04/15/l...
Leveraging LLM feedback to enhance review quality – ICLR Blog
blog.iclr.cc
April 16, 2025 at 4:43 AM
My entire robotics career has led to this.
April 1, 2025 at 4:05 PM
The answer to many "why X?" questions:
Because the laws of physics do not prohibit X and the forces of biology gave us curiosity.
March 28, 2025 at 3:43 PM
I started something new last year with a wonderful group of people. We showed a demo in Jan.
Today, we’re telling our story — show before you talk!
𝘞𝘦 𝘢𝘳𝘦 𝘳𝘦-𝘪𝘮𝘢𝘨𝘪𝘯𝘪𝘯𝘨 𝘩𝘰𝘸 𝘱𝘦𝘰𝘱𝘭𝘦 𝘪𝘯𝘵𝘦𝘳𝘢𝘤𝘵 𝘸𝘪𝘵𝘩 𝘵𝘩𝘦 𝘸𝘦𝘣, one of humanity’s greatest inventions and a mess overdue for an overhaul.
yutori.com
March 27, 2025 at 2:31 PM
Reposted by Dhruv Batra
📢Excited to announce our upcoming workshop - Vision Language Models For All: Building Geo-Diverse and Culturally Aware Vision-Language Models (VLMs-4-All) @CVPR 2025!
🌐 sites.google.com/view/vlms4all
March 14, 2025 at 3:55 PM
Using a locally-running LLM to translate a review is explicitly prohibited by @iccv.bsky.social
Why? Whom does this possibly harm?
March 6, 2025 at 6:10 PM
Brilliant talk by Ilya, but he's wrong on one point.
We are NOT running out of data. We are running out of human-written text.
We have more videos than we know what to do with. We just haven't solved pre-training in vision.
Just go out and sense the world. Data is easy.
December 14, 2024 at 7:15 PM
3.2 —> 3.3
See, model naming isn't that hard.
December 7, 2024 at 2:31 AM
Looking forward to #NeurIPS2024 next week!
If you work in digital or physical AI agents, I'm scheduling chats (Dec 9-12). DMs open.
December 6, 2024 at 7:52 PM
Does the term "LLM" mean:
— a language model in the technical sense
— a "modern" AI system
— an auto-regressive symbol-sequence model, built with transformers, trained with SGD and self-supervised learning
— something else?
dhruvbatra.substack.com/p/the-term-l...
The term “LLM” is a misnomer.
Sometime last year, I noticed AI-adjacent (or “AI curious”) folks using the term “LLM” in odd ways:
dhruvbatra.substack.com
December 4, 2024 at 8:01 PM