Emma Harvey
@emmharv.bsky.social
PhD student @ Cornell info sci | Sociotechnical fairness & algorithm auditing | Previously Stanford RegLab, MSR FATE, Penn | https://emmaharv.github.io/
⏳ Bias Delayed is Bias Denied? Assessing the Effect of Reporting Delays on Disparity Assessments by @jennahgosciak.bsky.social and @aparnabee.bsky.social et al. (incl. @allisonkoe.bsky.social @marzyehghassemi.bsky.social) analyzes how missing demographic data impacts estimates of health disparities.
July 24, 2025 at 7:53 PM
Towards AI Accountability Infrastructure: Gaps and Opportunities in AI Audit Tooling by @victorojewale.bsky.social @rbsteed.com @briana-v.bsky.social @abeba.bsky.social @rajiinio.bsky.social compares the landscape of AI audit tools (tools.auditing-ai.com) to the actual needs of AI auditors.
July 24, 2025 at 7:53 PM
📱 External Evaluation of Discrimination Mitigation Efforts in Meta's Ad Delivery by Imana et al. audits VRS (Meta’s process for reducing bias in ad delivery as part of a settlement with DOJ), and finds VRS reduces demographic differences in ad audience – but also reduces reach and increases cost.
July 24, 2025 at 7:53 PM
🎲 Consistently Arbitrary or Arbitrarily Consistent: Navigating the Tensions Between Homogenization and Multiplicity in Algorithmic Decision-Making by Gur-Arieh and Lee explores the competing desires for consistency in decision-making models and opportunity pluralism in decision-making ecosystems.
July 24, 2025 at 7:53 PM
🔡 Examining the Expanding Role of Synthetic Data Throughout the AI Development Pipeline by Kapania et al. (incl. @jennwv.bsky.social) asks: what are practitioners' motivations, current practices, desiderata, and challenges when generating, using, and validating synthetic data to develop AI?
July 21, 2025 at 3:47 PM
🚗 Not Even Nice Work If You Can Get It: A Longitudinal Study of Uber's Algorithmic Pay and Pricing by @rdbinns.bsky.social @jmlstein.bsky.social et al. (incl. @emax.bsky.social) audits Uber's pay practices, focusing on the shift to paying drivers a "dynamic" (opaque, unpredictable) share of the fare.
July 21, 2025 at 3:47 PM
🏦 Legacy Procurement Practices Shape How U.S. Cities Govern AI: Understanding Government Employees’ Practices, Challenges, and Needs by @narijohnson.bsky.social et al. explores procurement in the context of recent calls for governments to use their "purchasing power" to incentivize responsible AI.
July 15, 2025 at 4:31 PM
🕵️ Auditing the Audits: Lessons for Algorithmic Accountability from Local Law 144’s Bias Audits by @mkgerchick.bsky.social et al. (incl. @rone.bsky.social @metaxa.net) analyzes the reports produced in response to NYC's Local Law 144, which requires audits of automated employment decision tools.
July 14, 2025 at 5:03 PM
📜 Historical Methods for AI Evaluations, Assessments, and Audits by Becerra Sandoval & @feliciajing.bsky.social proposes historical analysis as a methodological component of AI audits and presents a case study of this approach in an audit of a virtual agent.
July 14, 2025 at 5:03 PM
👯 Allocation Multiplicity: Evaluating the Promises of the Rashomon Set by Jain et al. (incl. @kathleencreel.bsky.social) argues that allocation (vs. model) multiplicity should be seen as a pathway for reducing discrimination, homogenization, and arbitrariness in decision-making problems.
July 14, 2025 at 5:03 PM
🍎 Difficult Lessons on Social Prediction from Wisconsin Public Schools by Perdomo et al. asks: are individual risk scores necessary for effectively targeting interventions? In WI, targeted support is provided to students who are predicted by a model to be at risk of dropping out of high school.
July 14, 2025 at 5:03 PM
📚 Algorithms in the Stacks: Investigating automated, for-profit diversity audits in public libraries by @mellymeldubs.bsky.social et al. asks: what happens when value-driven cultural work (measuring how diverse a library collection is) is outsourced to automated systems owned by commercial vendors?
July 14, 2025 at 5:03 PM
I've arrived in the 🌁Bay Area🌁, where I'll be spending the summer as a research fellow at Stanford's RegLab! If you're also here, LMK and let's get a meal / go on a hike / etc!!
July 1, 2025 at 6:58 PM
In particular, Rufus consistently provides incorrect responses to prompts with zero copula, or the omission of the verb "to be" that is common in dialects like AAE. Encouragingly, this implies that quality of service could improve if Rufus were trained to be robust to common linguistic features!
June 23, 2025 at 2:45 PM
We apply our framework to audit Amazon Rufus, a customer service chatbot. We find that Rufus produces lower-quality responses to prompts written in minoritized English dialects. These quality-of-service harms are exacerbated by the presence of typos in prompts.
June 23, 2025 at 2:45 PM
In our paper, we present a five-step framework for auditing LLM-based chatbots for dialect bias by measuring the extent to which they produce quality-of-service harms, which occur when systems do not work equally well for different people.
June 23, 2025 at 2:45 PM
I am so excited to be in 🇬🇷Athens🇬🇷 to present "A Framework for Auditing Chatbots for Dialect-Based Quality-of-Service Harms" by me, @kizilcec.bsky.social, and @allisonkoe.bsky.social, at #FAccT2025!!

🔗: arxiv.org/pdf/2506.04419
June 23, 2025 at 2:45 PM
👎 In some cases, this is because instruments are *not useful*: they do not meaningfully measure what practitioners seek to measure or are otherwise misaligned with practitioner needs. In other words, they lack validity, reliability, specificity, scalability, interpretability, and/or actionability.
June 9, 2025 at 6:58 PM
💬 Through semi-structured interviews with 12 practitioners tasked with evaluating LLM-based systems for representational harms, we find that practitioners are often unable to use publicly available measurement instruments - despite a desire to do so!
June 9, 2025 at 6:58 PM
🤖 LLM-based systems can cause representational harms (e.g., by stereotyping, demeaning, or failing to recognize the existence of a particular social group). The NLP research community has produced numerous *publicly available measurement instruments* for measuring such harms.
June 9, 2025 at 6:58 PM
📣 "Understanding and Meeting Practitioner Needs When Measuring Representational Harms Caused by LLM-Based Systems" is forthcoming at #ACL2025NLP - and you can read it now on arXiv!

🔗: arxiv.org/pdf/2506.04482
🧵: ⬇️
June 9, 2025 at 6:58 PM
And if you're looking for something to do from 2:46 to 2:58 PM tomorrow, why not come see me present "Don't Forget the Teachers": Towards an Educator-Centered Understanding of Harms from Large Language Models in Education by me, @allisonkoe.bsky.social, and @kizilcec.bsky.social??
April 27, 2025 at 8:33 AM
I am officially in 🌸Yokohama🌸 for #CHI2025! If you're interested in algorithmic fairness and/or getting something to eat, please say hi as I am a big fan of both!!
April 27, 2025 at 8:33 AM
🎉 So excited to share that "Don't Forget the Teachers" has received a Best Paper Award at #CHI2025!!

@allisonkoe.bsky.social @kizilcec.bsky.social
March 27, 2025 at 2:59 PM
In light of this, we make recommendations on how edtech providers, researchers, regulators, and school leaders can facilitate the design of *educator-centered* edtech.
March 13, 2025 at 4:07 PM