Hanna Wallach
@hannawallach.bsky.social
VP and Distinguished Scientist at Microsoft Research NYC. AI evaluation and measurement, responsible AI, computational social science, machine learning. She/her.

One photo a day since January 2018: https://www.instagram.com/logisticaggression/
Alright, it's that time of year: Who all is going to @neuripsconf.bsky.social this year??? #NeurIPS2025 🤖☃️
November 24, 2025 at 3:34 PM
Three exciting opportunities at @msftresearch.bsky.social in NYC!!! 🎉

Internship w/ FATE: apply.careers.microsoft.com/careers/job?...

Internship w/ STAC on AI evaluation and measurement: apply.careers.microsoft.com/careers/job?...

Postdoc w/ FATE: apply.careers.microsoft.com/careers/job?...
November 24, 2025 at 3:33 PM
This is happening now!!!
If you're at @icmlconf.bsky.social this week, come check out our poster on "Position: Evaluating Generative AI Systems Is a Social Science Measurement Challenge" presented by the amazing @afedercooper.bsky.social from 11:30am--1:30pm PDT on Weds!!! icml.cc/virtual/2025...
ICML Poster Position: Evaluating Generative AI Systems Is a Social Science Measurement Challenge | ICML 2025
icml.cc
July 16, 2025 at 6:33 PM
Reposted by Hanna Wallach
1) (Tomorrow!) Wed 7/16, 11am--1:30pm PT poster for "Position: Evaluating Generative AI Systems Is a Social Science Measurement Challenge" (E. Exhibition Hall A-B, E-503)

Work led by @hannawallach.bsky.social + @azjacobs.bsky.social

arxiv.org/abs/2502.00561
Position: Evaluating Generative AI Systems Is a Social Science Measurement Challenge
The measurement tasks involved in evaluating generative AI (GenAI) systems lack sufficient scientific rigor, leading to what has been described as "a tangle of sloppy tests [and] apples-to-oranges com...
arxiv.org
July 16, 2025 at 12:46 AM
If you're at @icmlconf.bsky.social this week, come check out our poster on "Position: Evaluating Generative AI Systems Is a Social Science Measurement Challenge" presented by the amazing @afedercooper.bsky.social from 11:30am--1:30pm PDT on Weds!!! icml.cc/virtual/2025...
ICML Poster Position: Evaluating Generative AI Systems Is a Social Science Measurement Challenge | ICML 2025
icml.cc
July 15, 2025 at 6:35 PM
Generative language systems are everywhere, and many of them stereotype, demean, or erase particular social groups.
June 16, 2025 at 9:49 PM
Alright, people, let's be honest: GenAI systems are everywhere, and figuring out whether they're any good is a total mess. Should we use them? Where? How? Do they need a total overhaul?

(1/6)
June 15, 2025 at 12:20 AM
I'm so excited this paper is finally online!!! 🎉 We had so much fun working on this with @emmharv.bsky.social!!! Thread below summarizing our contributions...
📣 "Understanding and Meeting Practitioner Needs When Measuring Representational Harms Caused by LLM-Based Systems" is forthcoming at #ACL2025NLP - and you can read it now on arXiv!

🔗: arxiv.org/pdf/2506.04482
🧵: ⬇️
June 10, 2025 at 7:12 PM
Exciting news: The Fairness, Accountability, Transparency and Ethics (FATE) group at Microsoft Research NYC is hiring a predoctoral fellow!!! 🎉

www.microsoft.com/en-us/resear...
FATE Research Assistant (“Pre-doc”) - Microsoft Research
The Fairness, Accountability, Transparency, and Ethics (FATE) Research group at Microsoft Research New York City (MSR NYC) is looking for a pre-doctoral research assistant (pre-doc) to start August 20...
www.microsoft.com
May 20, 2025 at 1:47 PM
Exciting news!!! This just got into @icmlconf.bsky.social as a position paper!!! 🎉 More updates to come as we work on the camera-ready version!!!
Remember this @neuripsconf.bsky.social workshop paper? We spent the past month writing a newer, better, longer version!!! You can find it online here: arxiv.org/abs/2502.00561
May 3, 2025 at 8:59 PM
Reposted by Hanna Wallach
Reading - Evaluating Evaluations for GenAI from @hannawallach.bsky.social madesai.bsky.social afedercooper.bsky.social et al. This work dovetails with our work at @worldprivacyforum.bsky.social on measuring AI governance tools from governments, through a privacy/policy lens arxiv.org/pdf/2502.00561
April 3, 2025 at 12:27 PM
Reposted by Hanna Wallach
At the #HEAL workshop, I'll present "Systematizing During Measurement Enables Broader Stakeholder Participation" on the ways we can further structure LLM evaluations and open them for deliberation. A project led by @hannawallach.bsky.social
April 25, 2025 at 10:57 PM
Reposted by Hanna Wallach
2. Also Saturday, @amabalayn.bsky.social will represent our piece arguing that systematization during measurement enables broad stakeholder participation in AI evaluation.

This came out of a huge group collaboration led by @hannawallach.bsky.social: bsky.app/profile/hann...

heal-workshop.github.io
April 25, 2025 at 5:24 PM
Reposted by Hanna Wallach
📣 New paper! The field of AI research is increasingly realising that benchmarks are very limited in what they can tell us about AI system performance and safety. We argue and lay out a roadmap toward a *science of AI evaluation*: arxiv.org/abs/2503.05336 🧵
March 20, 2025 at 1:28 PM
Reposted by Hanna Wallach
⚫⚪ It's coming...SHADES. ⚪⚫
The first ever resource of multilingual, multicultural, and multigeographical stereotypes, built to support nuanced LLM evaluation and bias mitigation. We have been working on this around the world for almost **4 years** and I am thrilled to share it with you all soon.
February 10, 2025 at 8:28 AM
Remember this @neuripsconf.bsky.social workshop paper? We spent the past month writing a newer, better, longer version!!! You can find it online here: arxiv.org/abs/2502.00561
February 4, 2025 at 3:28 PM
Reposted by Hanna Wallach
🚨Postdoc Alert 🚨

The Computational Social Science group at Microsoft Research NYC (Jake Hofman, David Rothschild, Dan Goldstein) is hiring a postdoc!

jobs.careers.microsoft.com/global/en/jo...

Deadline: December 20, 2024
Search Jobs | Microsoft Careers
jobs.careers.microsoft.com
October 17, 2024 at 6:44 PM
Reposted by Hanna Wallach
Microsoft's Computational Social Science group may have the opportunity to hire one researcher

Senior: 0-3 yrs post PhD
jobs.careers.microsoft.com/global/en/jo...

Principal: 3+ yrs post PhD
jobs.careers.microsoft.com/global/en/sh...

Please note: our ability to hire this season is not certain
December 19, 2024 at 6:54 PM
Reposted by Hanna Wallach
It's company holiday party season! Every year I start a thread of my favorite questions guaranteed to get you 20 minutes of lively conversation (as an introvert, this is how I thrive at parties).

What are your favorites? Here are some of mine...
December 17, 2024 at 7:59 PM
Reposted by Hanna Wallach
"there's a lot of qualitative work that goes into designing quantitative metrics" -- @azjacobs.bsky.social

"how do we translate between benchmark performance and what it will really be like to use a model" -- Su Lin Blodgett
Super interesting panel discussion taking place right now at the Evaluating Evaluations workshop at @neuripsconf.bsky.social with amazing panelists @abeba.bsky.social, @azjacobs.bsky.social, Su Lin Blodgett, and Lee Wan Sie!!! #NeurIPS2024
December 15, 2024 at 6:16 PM
Super interesting panel discussion taking place right now at the Evaluating Evaluations workshop at @neuripsconf.bsky.social with amazing panelists @abeba.bsky.social, @azjacobs.bsky.social, Su Lin Blodgett, and Lee Wan Sie!!! #NeurIPS2024
December 15, 2024 at 5:35 PM
And, as if that wasn't enough excitement for one day, we'll also be presenting a poster on "Red Teaming: Everything Everywhere All at Once" at the Safe GenAI @neuripsconf.bsky.social workshop from 3--5pm: neurips.cc/virtual/2024... #NeurIPS2024
Safe Generative AI | NeurIPS 2024
neurips.cc
December 15, 2024 at 4:28 PM
Super excited for the Evaluating Evaluations workshop at @neuripsconf.bsky.social today!!! evaleval.github.io #NeurIPS2024

@msftresearch.bsky.social's FATE group, Sociotechnical Alignment Center, and friends will be presenting several papers there. See below for details...
Home - EvalEval 2024
A NeurIPS 2024 workshop on best practices for measuring the broader impacts of generative AI systems
evaleval.github.io
December 15, 2024 at 4:19 PM
Stop by the 3:45--4:30pm poster session at the Statistical Frontiers in LLMs @neuripsconf.bsky.social workshop today (Dec 14) to catch posters on the following papers... #NeurIPS2024
December 14, 2024 at 8:15 PM
New paper on why machine "unlearning" is much harder than it seems is now up on arXiv: arxiv.org/abs/2412.06966 This was a huuuuuge cross-disciplinary effort led by @msftresearch.bsky.social FATE postdoc @grumpy-frog.bsky.social!!!
Machine Unlearning Doesn't Do What You Think: Lessons for Generative AI Policy, Research, and Practice
We articulate fundamental mismatches between technical methods for machine unlearning in Generative AI, and documented aspirations for broader impact that these methods could have for law and policy. ...
arxiv.org
December 14, 2024 at 12:55 AM