Hanna Wallach
@hannawallach.bsky.social
VP and Distinguished Scientist at Microsoft Research NYC. AI evaluation and measurement, responsible AI, computational social science, machine learning. She/her.

One photo a day since January 2018: https://www.instagram.com/logisticaggression/
Alright, it's that time of year: Who all is going to @neuripsconf.bsky.social this year??? #NeurIPS2025 🤖☃️
November 24, 2025 at 3:34 PM
Three exciting opportunities at @msftresearch.bsky.social in NYC!!! 🎉

Internship w/ FATE: apply.careers.microsoft.com/careers/job?...

Internship w/ STAC on AI evaluation and measurement: apply.careers.microsoft.com/careers/job?...

Postdoc w/ FATE: apply.careers.microsoft.com/careers/job?...
November 24, 2025 at 3:33 PM
This is happening now!!!
If you're at @icmlconf.bsky.social this week, come check out our poster on "Position: Evaluating Generative AI Systems Is a Social Science Measurement Challenge" presented by the amazing @afedercooper.bsky.social from 11:30am--1:30pm PDT on Weds!!! icml.cc/virtual/2025...
ICML Poster Position: Evaluating Generative AI Systems Is a Social Science Measurement Challenge | ICML 2025
icml.cc
July 16, 2025 at 6:33 PM
Reposted by Hanna Wallach
1) (Tomorrow!) Wed 7/16, 11am--1:30pm PT poster for "Position: Evaluating Generative AI Systems Is a Social Science Measurement Challenge" (E. Exhibition Hall A-B, E-503)

Work led by @hannawallach.bsky.social + @azjacobs.bsky.social

arxiv.org/abs/2502.00561
Position: Evaluating Generative AI Systems Is a Social Science Measurement Challenge
The measurement tasks involved in evaluating generative AI (GenAI) systems lack sufficient scientific rigor, leading to what has been described as "a tangle of sloppy tests [and] apples-to-oranges com...
arxiv.org
July 16, 2025 at 12:46 AM
If you're at @icmlconf.bsky.social this week, come check out our poster on "Position: Evaluating Generative AI Systems Is a Social Science Measurement Challenge" presented by the amazing @afedercooper.bsky.social from 11:30am--1:30pm PDT on Weds!!! icml.cc/virtual/2025...
ICML Poster Position: Evaluating Generative AI Systems Is a Social Science Measurement Challenge | ICML 2025
icml.cc
July 15, 2025 at 6:35 PM
Generative language systems are everywhere, and many of them stereotype, demean, or erase particular social groups.
June 16, 2025 at 9:49 PM
Alright, people, let's be honest: GenAI systems are everywhere, and figuring out whether they're any good is a total mess. Should we use them? Where? How? Do they need a total overhaul?

(1/6)
June 15, 2025 at 12:20 AM
I'm so excited this paper is finally online!!! 🎉 We had so much fun working on this with @emmharv.bsky.social!!! Thread below summarizing our contributions...
📣 "Understanding and Meeting Practitioner Needs When Measuring Representational Harms Caused by LLM-Based Systems" is forthcoming at #ACL2025NLP - and you can read it now on arXiv!

🔗: arxiv.org/pdf/2506.04482
🧵: ⬇️
June 10, 2025 at 7:12 PM
Exciting news: The Fairness, Accountability, Transparency and Ethics (FATE) group at Microsoft Research NYC is hiring a predoctoral fellow!!! 🎉

www.microsoft.com/en-us/resear...
FATE Research Assistant (“Pre-doc”) - Microsoft Research
The Fairness, Accountability, Transparency, and Ethics (FATE) Research group at Microsoft Research New York City (MSR NYC) is looking for a pre-doctoral research assistant (pre-doc) to start August 20...
www.microsoft.com
May 20, 2025 at 1:47 PM
Exciting news!!! This just got into @icmlconf.bsky.social as a position paper!!! 🎉 More updates to come as we work on the camera-ready version!!!
Remember this @neuripsconf.bsky.social workshop paper? We spent the past month writing a newer, better, longer version!!! You can find it online here: arxiv.org/abs/2502.00561
May 3, 2025 at 8:59 PM
Reposted by Hanna Wallach
Reading - Evaluating Evaluations for GenAI from @hannawallach.bsky.social madesai.bsky.social afedercooper.bsky.social et al. This work dovetails with our work at @worldprivacyforum.bsky.social on measuring AI governance tools from governments, through a privacy/policy lens arxiv.org/pdf/2502.00561
April 3, 2025 at 12:27 PM
Reposted by Hanna Wallach
At the #HEAL workshop, I'll present "Systematizing During Measurement Enables Broader Stakeholder Participation" on the ways we can further structure LLM evaluations and open them for deliberation. A project led by @hannawallach.bsky.social
April 25, 2025 at 10:57 PM
Reposted by Hanna Wallach
2. Also Saturday, @amabalayn.bsky.social will represent our piece arguing that systematization during measurement enables broad stakeholder participation in AI evaluation.

This came out of a huge group collaboration led by @hannawallach.bsky.social: bsky.app/profile/hann...

heal-workshop.github.io
April 25, 2025 at 5:24 PM
Reposted by Hanna Wallach
📣 New paper! The field of AI research is increasingly realising that benchmarks are very limited in what they can tell us about AI system performance and safety. We argue and lay out a roadmap toward a *science of AI evaluation*: arxiv.org/abs/2503.05336 🧵
March 20, 2025 at 1:28 PM
Reposted by Hanna Wallach
⚫⚪ It's coming...SHADES. ⚪⚫
The first ever resource of multilingual, multicultural, and multigeographical stereotypes, built to support nuanced LLM evaluation and bias mitigation. We have been working on this around the world for almost **4 years** and I am thrilled to share it with you all soon.
February 10, 2025 at 8:28 AM
Remember this @neuripsconf.bsky.social workshop paper? We spent the past month writing a newer, better, longer version!!! You can find it online here: arxiv.org/abs/2502.00561
February 4, 2025 at 3:28 PM
Reposted by Hanna Wallach
🚨Postdoc Alert 🚨

The Computational Social Science group at Microsoft Research NYC (Jake Hofman, David Rothschild, Dan Goldstein) is hiring a postdoc!

jobs.careers.microsoft.com/global/en/jo...

Deadline: December 20, 2024
Search Jobs | Microsoft Careers
jobs.careers.microsoft.com
October 17, 2024 at 6:44 PM
Reposted by Hanna Wallach
Microsoft's Computational Social Science group may have the opportunity to hire one researcher

Senior: 0-3 yrs post PhD
jobs.careers.microsoft.com/global/en/jo...

Principal: 3+ yrs post PhD
jobs.careers.microsoft.com/global/en/sh...

Please note: our ability to hire this season is not certain
December 19, 2024 at 6:54 PM
Reposted by Hanna Wallach
It's company holiday party season! Every year I start a thread of my favorite questions guaranteed to get you 20 minutes of lively conversation (as an introvert, this is how I thrive at parties).

What are your favorites? Here are some of mine...
December 17, 2024 at 7:59 PM
Reposted by Hanna Wallach
"there's a lot of qualitative work that goes into designing quantitative metrics" -- @azjacobs.bsky.social

"how do we translate between benchmark performance and what it will really be like to use a model" -- Su Lin Blodgett
Super interesting panel discussion taking place right now at the Evaluating Evaluations workshop at @neuripsconf.bsky.social with amazing panelists @abeba.bsky.social, @azjacobs.bsky.social, Su Lin Blodgett, and Lee Wan Sie!!! #NeurIPS2024
December 15, 2024 at 6:16 PM
Super interesting panel discussion taking place right now at the Evaluating Evaluations workshop at @neuripsconf.bsky.social with amazing panelists @abeba.bsky.social, @azjacobs.bsky.social, Su Lin Blodgett, and Lee Wan Sie!!! #NeurIPS2024
December 15, 2024 at 5:35 PM
And, as if that wasn't enough excitement for one day, we'll also be presenting a poster on "Red Teaming: Everything Everywhere All at Once" at the Safe GenAI @neuripsconf.bsky.social workshop from 3--5pm: neurips.cc/virtual/2024... #NeurIPS2024
Safe Generative AI | NeurIPS 2024
neurips.cc
December 15, 2024 at 4:28 PM
Super excited for the Evaluating Evaluations workshop at @neuripsconf.bsky.social today!!! evaleval.github.io #NeurIPS2024

@msftresearch.bsky.social's FATE group, Sociotechnical Alignment Center, and friends will be presenting several papers there. See below for details...
Home - EvalEval 2024
A NeurIPS 2024 workshop on best practices for measuring the broader impacts of generative AI systems
evaleval.github.io
December 15, 2024 at 4:19 PM
Stop by the 3:45--4:30pm poster session at the Statistical Frontiers in LLMs @neuripsconf.bsky.social workshop today (Dec 14) to catch posters on the following papers... #NeurIPS2024
December 14, 2024 at 8:15 PM
New paper on why machine "unlearning" is much harder than it seems is now up on arXiv: arxiv.org/abs/2412.06966 This was a huuuuuge cross-disciplinary effort led by @msftresearch.bsky.social FATE postdoc @grumpy-frog.bsky.social!!!
Machine Unlearning Doesn't Do What You Think: Lessons for Generative AI Policy, Research, and Practice
We articulate fundamental mismatches between technical methods for machine unlearning in Generative AI, and documented aspirations for broader impact that these methods could have for law and policy. ...
arxiv.org
December 14, 2024 at 12:55 AM