Atharva Sehgal
@aseg.bsky.social
PhD student at UT Austin working on program synthesis. Visiting student at Caltech.
Check out the full paper for the mathematical formulation, experiments, and our methodology: arxiv.org/abs/2504.00185
Code and other artifacts are available here: trishullab.github.io/escher-web/
Thank you for following along!
[Link: "Self-Evolving Visual Concept Library using Vision-Language Critics" (arxiv.org)]
June 13, 2025 at 2:44 PM
How it works:
1️⃣ LLM proposes concepts per class
2️⃣ CLIP-style VLM scores them
3️⃣ Escher spots confused classes
4️⃣ Escher stores this in a history bank
5️⃣ LLM proposes better concepts and stores them → repeat
The loop is self-amplifying: better concepts ➡️ better feedback ➡️ an even better concept library.
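To make the loop concrete, here is a minimal Python sketch. Every helper in it (llm_propose_concepts, vlm_score, top_confused_pairs, llm_refine_concepts) is a hypothetical stand-in, not the paper's API; the actual implementation is in the linked repo.

```python
# A minimal sketch of the Escher loop. All helpers are hypothetical
# stand-ins for the components described in the thread above.

def escher_loop(classes, images, labels, n_iters=10):
    # 1) LLM proposes an initial concept list per class
    library = {c: llm_propose_concepts(c) for c in classes}
    history = []  # 4) history bank of past feedback
    for _ in range(n_iters):
        # 2) CLIP-style VLM scores every image against every concept
        scores = vlm_score(images, library)
        # 3) find the class pairs the current library confuses most
        confused = top_confused_pairs(scores, labels)
        history.append(confused)
        # 5) LLM proposes better concepts for the confused classes
        for a, b in confused:
            library[a] = llm_refine_concepts(a, b, library[a], history)
            library[b] = llm_refine_concepts(b, a, library[b], history)
    return library
```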
June 13, 2025 at 2:44 PM
Escher solves this problem by using feedback from a vision-language model to improve its reasoning, specifically for fine-grained image classification.
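For the scoring side, here is a minimal sketch using Hugging Face's CLIP. Scoring a class as the mean image-text similarity over its concept descriptions is one common choice (in the classification-by-description style), and is an assumption here rather than the paper's exact aggregation rule.

```python
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def class_scores(image, library):
    """Score a PIL image against each class's concept list; a class's
    score is the mean similarity over its concept descriptions."""
    results = {}
    for cls, concepts in library.items():
        inputs = processor(text=concepts, images=image,
                           return_tensors="pt", padding=True)
        with torch.no_grad():
            sims = model(**inputs).logits_per_image  # shape (1, n_concepts)
        results[cls] = sims.mean().item()
    return results
```

The predicted label is then max(scores, key=scores.get); the feedback signal comes from where these predictions disagree with the ground-truth labels.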
June 13, 2025 at 2:44 PM
Our hypothesis: the failure arises from program synthesizers treating the vision model as a deterministic function. Reality is messy and VLM outputs are stochastic: the LLM's assumptions about how the VLM will behave are decoupled from how it actually behaves. We need to overcome this decoupling.
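One way to bridge the decoupling is to measure VLM behavior rather than assume it. A sketch of that idea, where score_fn(image, text) is any CLIP-style similarity function (an assumed interface, not the paper's):

```python
# Treat the VLM as a stochastic scorer, not a deterministic oracle.
# Instead of trusting the LLM's belief that a concept is discriminative,
# measure its empirical margin on real images and report that back.

def concept_report(concept, pos_images, neg_images, score_fn):
    pos = [score_fn(img, concept) for img in pos_images]  # images of the class
    neg = [score_fn(img, concept) for img in neg_images]  # images of confused classes
    margin = sum(pos) / len(pos) - sum(neg) / len(neg)
    # A concept the LLM assumed was discriminative may show ~zero margin
    # in practice; surfacing that gap is what closes the feedback loop.
    return {"concept": concept, "observed_margin": margin}
```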
June 13, 2025 at 2:44 PM
A visual program decomposes complex perceptual reasoning problems into a logical combination of simpler perceptual tasks that can be solved using off-the-shelf vision foundation models. This provides a modular and robust framework, but finding the correct decomposition is still extremely hard.
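As a toy example of such a decomposition, here is a fine-grained bird query split into localization and attribute checks. detect and has_attribute are hypothetical wrappers around, say, an open-vocabulary detector and a CLIP-style scorer; they are illustrations, not the paper's primitives.

```python
def is_red_winged_blackbird(image):
    # Subtask 1: localize the bird with an open-vocabulary detector.
    bird = detect(image, "bird")
    if bird is None:
        return False
    # Subtask 2: check fine-grained attributes on the crop, combined
    # with ordinary boolean logic.
    return (has_attribute(bird, "glossy black body") and
            has_attribute(bird, "red and yellow shoulder patch"))
```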
June 13, 2025 at 2:44 PM
Reasoning about these images is pretty hard. o3 – even with web access – can’t do this for us out of the box. In such a situation, writing programs provides a mechanism for dividing up a complex reasoning task into solvable subtasks. This motivates most of the visual programming literature.
June 13, 2025 at 2:44 PM
In many vision tasks, perceptual reasoning does not come naturally. Experts still have to deeply study an image, deduce relevant concepts, and reason about them in natural language (www.inaturalist.org/observations...). Our goal is to automate this process – with no human oversight.
June 13, 2025 at 2:44 PM
Massive thanks to my co-authors Patrick Yuan, Ziniu Hu, @yisongyue.bsky.social, Jennifer J. Sun & @swarat.bsky.social for making this possible!
June 13, 2025 at 2:44 PM
Check out the full paper for the mathematical formulation, LLM scaling law experiments, and our methodology: arxiv.org/abs/2409.09359

More context here: x.com/atharva_sehg...

Thank you to all my coauthors: Arya, Omar, @milescranmer.bsky.social, and @swarat.bsky.social!
December 10, 2024 at 2:10 AM