Dhruv Batra
@dhruvbatra.bsky.social
Co-founder & Chief Scientist at Yutori. Prev: Senior Director leading FAIR Embodied AI at Meta, and Professor at Georgia Tech.
Solved: robustness to paraphrasing and false premises, OCR, world-knowledge based reasoning.

Open: spatial reasoning, data-efficiency, learning compatible representations.
October 23, 2025 at 5:18 PM
As part of the award ceremony, the VQA team presented a recap of vision-and-language research over the last decade: solved problems, progress, and open challenges for multimodal LLMs.
October 23, 2025 at 5:18 PM
Fun fact: the T-shirt I'm wearing is an inside joke about the quality of 2015 models.

However, every few years we rediscover the lesson that on difficult tasks, VLMs silently regress to being nearly blind.

x.com/DhruvBatra_/...
October 21, 2025 at 7:27 PM
The VQA challenge series won the Mark Everingham prize at #ICCV2025 for stimulating a new strand of vision-and-language research.

It's extra special because ICCV25 marks the 10-year anniversary of the VQA paper.

When we started, the idea of answering any question about any image seemed outlandish.
October 21, 2025 at 7:27 PM
I started something new last year with a wonderful group of people. We showed a demo in January.

Today, we’re telling our story — show before you talk!

𝘞𝘦 𝘢𝘳𝘦 𝘳𝘦-𝘪𝘮𝘢𝘨𝘪𝘯𝘪𝘯𝘨 𝘩𝘰𝘸 𝘱𝘦𝘰𝘱𝘭𝘦 𝘪𝘯𝘵𝘦𝘳𝘢𝘤𝘵 𝘸𝘪𝘵𝘩 𝘵𝘩𝘦 𝘸𝘦𝘣: one of humanity’s greatest inventions, and a mess overdue for an overhaul.

yutori.com
March 27, 2025 at 2:31 PM
Using a locally running LLM to translate a review is explicitly prohibited by @iccv.bsky.social

Why? Whom does this possibly harm?
March 6, 2025 at 6:10 PM

Brilliant talk by Ilya, but he's wrong on one point.

We are NOT running out of data. We are running out of human-written text.

We have more videos than we know what to do with. We just haven't solved pre-training in vision.

Just go out and sense the world. Data is easy.
December 14, 2024 at 7:15 PM