Lightnews — Scholar-powered news

Dhruv Batra

@dhruvbatra.bsky.social

Co-founder & Chief Scientist at Yutori. Prev: Senior Director leading FAIR Embodied AI at Meta, and Professor at Georgia Tech.


Dhruv Batra

@dhruvbatra.bsky.social

Solved: robustness to paraphrasing and false premises, OCR, world-knowledge based reasoning.

Open: spatial reasoning, data-efficiency, learning compatible representations.

October 23, 2025 at 5:18 PM

Dhruv Batra

@dhruvbatra.bsky.social

As part of the award ceremony, the VQA team presented a recap of vision-and-language research over the last decade: solved problems, progress, and open challenges for multimodal LLMs.

October 23, 2025 at 5:18 PM

Dhruv Batra

@dhruvbatra.bsky.social

Fun fact: the T-shirt I'm wearing is an inside joke about the quality of 2015 models.

However, every few years we rediscover the lesson that on difficult tasks, VLMs silently regress to being nearly blind.

x.com/DhruvBatra_/...

October 21, 2025 at 7:27 PM

Dhruv Batra

@dhruvbatra.bsky.social

VQA challenge series won the Mark Everingham prize at #ICCV2025 for stimulating a new strand of vision-and-language research.

It's extra special because ICCV25 marks the 10-year anniversary of the VQA paper.

When we started, the idea of answering any question about any image seemed outlandish.

October 21, 2025 at 7:27 PM

Dhruv Batra

@dhruvbatra.bsky.social

I started something new last year with a wonderful group of people. We showed a demo in Jan.

Today, we’re telling our story — show before you talk!

𝘞𝘦 𝘢𝘳𝘦 𝘳𝘦-𝘪𝘮𝘢𝘨𝘪𝘯𝘪𝘯𝘨 𝘩𝘰𝘸 𝘱𝘦𝘰𝘱𝘭𝘦 𝘪𝘯𝘵𝘦𝘳𝘢𝘤𝘵 𝘸𝘪𝘵𝘩 𝘵𝘩𝘦 𝘸𝘦𝘣, one of humanity’s greatest inventions and a mess overdue for an overhaul.

yutori.com

March 27, 2025 at 2:31 PM

Dhruv Batra

@dhruvbatra.bsky.social

Using a locally-running LLM to translate a review is explicitly prohibited by @iccv.bsky.social

Why? Whom does this possibly harm?

March 6, 2025 at 6:10 PM

Dhruv Batra

@dhruvbatra.bsky.social

Brilliant talk by Ilya, but he's wrong on one point.

We are NOT running out of data. We are running out of human-written text.

We have more videos than we know what to do with. We just haven't solved pre-training in vision.

Just go out and sense the world. Data is easy.

December 14, 2024 at 7:15 PM
