Lightnews — Scholar-powered news

Aida Nematzadeh

@aidanematzadeh.bsky.social

170 followers 35 following 10 posts

Research scientist at Google DeepMind.🦎
She/her.
http://www.aidanematzadeh.me/

Posts Replies Media Videos

Aida Nematzadeh

@aidanematzadeh.bsky.social

This result is particularly interesting for capabilities that are harder for models (like numerical reasoning or text rendering) as prompt-aware guidance boosts performance on these failure modes without retraining.

September 30, 2025 at 4:00 PM

Aida Nematzadeh

@aidanematzadeh.bsky.social

Also at #ICLR2025: See this work in action! We're demoing "Gecko" showing how we use capability-based evaluators to help users customize evals & select models. 🦎

Find us at the Google booth, Fri 4/26, 12:00-12:30 PM.

April 23, 2025 at 9:23 PM

Aida Nematzadeh

@aidanematzadeh.bsky.social

At #ICLR2025, we're diving into what makes prompt adherence evaluators work for image/video generation.
Check out our poster Friday at 3 PM: iclr.cc/virtual/2025/p… 🦎

https://iclr.cc/virtual/2025/p…

April 23, 2025 at 9:23 PM

Aida Nematzadeh

@aidanematzadeh.bsky.social

The released benchmark can also be used as a counting VQA benchmark and evaluation of auto-metrics.

Paper: arxiv.org/abs/2406.14774

Evaluating Numerical Reasoning in Text-to-Image Models

Text-to-image generative models are capable of producing high-quality images that often faithfully depict concepts described using natural language. In this work, we comprehensively evaluate a range o...

arxiv.org

December 9, 2024 at 7:08 PM

Aida Nematzadeh

@aidanematzadeh.bsky.social

We design 3 main tasks with varying degrees of difficulty and evaluate 13 models across different families. Models show rudimentary numerical reasoning skills, limited to small numbers and simple prompt formats; many models are affected by non-numerical prompt manipulations.

December 9, 2024 at 7:08 PM

Add to Home Screen

Light up
your news

Add to Home Screen

Light upyour news

Sign in to Lightnews

Sign up to start reading

Connect Bluesky

Connect with Bluesky

Light up
your news