Aida Nematzadeh
banner
aidanematzadeh.bsky.social
Aida Nematzadeh
@aidanematzadeh.bsky.social
Research scientist at Google DeepMind.🦎
She/her.
http://www.aidanematzadeh.me/
This result is particularly interesting for capabilities that are harder for models (like numerical reasoning or text rendering) as prompt-aware guidance boosts performance on these failure modes without retraining.
September 30, 2025 at 4:00 PM
Also at #ICLR2025: See this work in action! We're demoing "Gecko" showing how we use capability-based evaluators to help users customize evals & select models. 🦎

Find us at the Google booth, Fri 4/26, 12:00-12:30 PM.
April 23, 2025 at 9:23 PM
At #ICLR2025, we're diving into what makes prompt adherence evaluators work for image/video generation.
Check out our poster Friday at 3 PM: iclr.cc/virtual/2025/p… 🦎
https://iclr.cc/virtual/2025/p…
April 23, 2025 at 9:23 PM
The released benchmark can also be used as a counting VQA benchmark and evaluation of auto-metrics.

Paper: arxiv.org/abs/2406.14774
Evaluating Numerical Reasoning in Text-to-Image Models
Text-to-image generative models are capable of producing high-quality images that often faithfully depict concepts described using natural language. In this work, we comprehensively evaluate a range o...
arxiv.org
December 9, 2024 at 7:08 PM
We design 3 main tasks with varying degrees of difficulty and evaluate 13 models across different families. Models show rudimentary numerical reasoning skills, limited to small numbers and simple prompt formats; many models are affected by non-numerical prompt manipulations.
December 9, 2024 at 7:08 PM