Atharva Sehgal
aseg.bsky.social
PhD student at UT Austin working on program synthesis. Visiting student at Caltech.
How it works:
1️⃣ LLM proposes concepts per class
2️⃣ CLIP-style VLM scores them
3️⃣ Escher spots confused classes
4️⃣ Escher stores this in a history bank
5️⃣ LLM proposes better concepts and stores them → repeat
The loop is self-amplifying: better concepts ➡️ better feedback ➡️ an even better concept library.
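The five steps above can be sketched as a loop. This is a minimal illustration, not Escher's actual implementation: `llm_propose_concepts`, `vlm_score`, and `find_confused_pairs` are hypothetical stand-ins for the LLM, the CLIP-style VLM scorer, and the critic that spots confused classes.

```python
from collections import defaultdict

def escher_loop(classes, images, llm_propose_concepts, vlm_score,
                find_confused_pairs, n_rounds=5):
    history = defaultdict(list)  # 4) history bank: class -> past feedback
    # 1) LLM proposes an initial concept list per class
    library = {c: llm_propose_concepts(c, history[c]) for c in classes}
    for _ in range(n_rounds):
        # 2) CLIP-style VLM scores each class's concepts on the images
        scores = {c: vlm_score(library[c], images) for c in classes}
        # 3) spot pairs of classes the current concepts cannot tell apart
        confused = find_confused_pairs(scores)
        if not confused:
            break
        # 4) store the feedback in the history bank
        for a, b in confused:
            history[a].append(b)
            history[b].append(a)
        # 5) LLM proposes better concepts for the confused classes -> repeat
        for c in {c for pair in confused for c in pair}:
            library[c] = llm_propose_concepts(c, history[c])
    return library
```

The self-amplification lives in `history`: each round's feedback conditions the next round's proposals, so later concept libraries are built from strictly more information than earlier ones.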
June 13, 2025 at 2:44 PM
Our hypothesis: the failure arises because program synthesizers treat the vision model as a deterministic function. Reality is messy, and VLM outputs are stochastic. The LLM's assumptions about how the VLM will behave are decoupled from how it actually behaves. We need to overcome this decoupling.
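A toy illustration of the decoupling (not Escher's code): `noisy_vlm_score` is a hypothetical VLM whose per-call scores fluctuate around a true alignment score. A program that branches on a single call can behave unpredictably, while aggregating repeated calls recovers a stable signal.

```python
import random

def noisy_vlm_score(concept, image, rng, true_score=0.6, noise=0.3):
    # hypothetical VLM: the true alignment score corrupted by per-call noise
    return true_score + rng.uniform(-noise, noise)

rng = random.Random(0)

# a synthesizer that assumes determinism branches on one call like this;
# individual verdicts may flip from call to call
single_verdicts = [noisy_vlm_score("striped tail", "img", rng) > 0.5
                   for _ in range(10)]

# the mean over many calls concentrates near the true score (0.6 here)
mean = sum(noisy_vlm_score("striped tail", "img", rng)
           for _ in range(100)) / 100
```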
June 13, 2025 at 2:44 PM
A visual program decomposes complex perceptual reasoning problems into a logical combination of simpler perceptual tasks that can be solved using off-the-shelf vision foundation models. This provides a modular and robust framework, but finding the correct decomposition is still extremely hard.
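For concreteness, here is a toy visual program in that spirit (an illustration, not from the paper): a hard query is decomposed into a logical combination of simpler perceptual calls, where `detect` and `relation` are hypothetical wrappers around off-the-shelf vision foundation models.

```python
def person_walking_dog(image, detect, relation):
    """Answer 'is a person walking a dog?' by composing simple subtasks."""
    people = detect(image, "person")   # simple perceptual subtask 1
    dogs = detect(image, "dog")        # simple perceptual subtask 2
    # logical combination of the subtask results
    return any(relation(image, p, d, "holding leash of")
               for p in people for d in dogs)
```

The decomposition is modular (each subtask can be swapped for a better model), but as the post says, finding the right decomposition is the hard part.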
June 13, 2025 at 2:44 PM
Reasoning about these images is pretty hard. o3 – even with web access – can’t do this for us out of the box. In such a situation, writing programs provides a mechanism for dividing up a complex reasoning task into solvable subtasks. This motivates most of the visual programming literature.
June 13, 2025 at 2:44 PM
In many vision tasks, perceptual reasoning does not come naturally. Experts still have to deeply study an image, deduce relevant concepts, and reason about them in natural language (www.inaturalist.org/observations...). Our goal is to automate this process – with no human oversight.
June 13, 2025 at 2:44 PM
I’m presenting Escher (trishullab.github.io/escher-web) at #cvpr2025 Saturday morning (Poster Session #3). Escher builds a visual concept library with a vision‑language critic (no human labels needed). Swing by if you’d like to chat about program synthesis & multimodal reasoning!
June 13, 2025 at 2:44 PM
Just julia things.
February 13, 2025 at 5:39 AM
Arya and I will be at #NeurIPS presenting LaSR (trishullab.github.io/lasr-web/) on Wednesday morning, 11 AM to 2 PM PST (East Exhibit Hall A-C #4003). Drop by and say hi!
December 10, 2024 at 2:04 AM