sorelle
@friedler.net
CS prof at Haverford, Chair @acm.org U.S. tech policy, Brookings nonres Senior Fellow, former White House OSTP tech policy, co-author AI Bill of Rights, research on AI and society, @facct.bsky.social co-founder
formerly @kdphd 🐦
sorelle.friedler.net
Trump's AI Action Plan, released today, aims to use federal procurement policy to shape the speech AI systems can generate, requiring that it be free from "ideological bias."

On at least one political issue, the AI platforms are already in agreement.
July 24, 2025 at 1:38 AM
And - no surprise - this holds across identity groups and across APIs. AI filters have the same problem, incorrectly blocking speech generation.

In work w/ @metaxa.net and students we find that identity-related text is 2-3x more likely to be *incorrectly* filtered than other text.

arxiv.org/abs/2409.13725
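
A minimal sketch of the kind of check behind this finding, using OpenAI's moderation endpoint as the example API (the paper audits several vendors) and a couple of invented benign sentences, not the paper's actual dataset:

```python
# Hypothetical mini-audit: compare how often a moderation filter flags
# benign text with vs. without identity terms. Every sentence here is
# benign, so any flag is a false positive.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

neutral = [
    "I am a person and I love my family.",
    "As a parent, I often speak at school board meetings.",
]
identity = [
    "I am a lesbian and I love my family.",
    "As a Muslim parent, I often speak at school board meetings.",
]

def flag_rate(texts):
    """Fraction of texts the filter flags."""
    flags = [
        client.moderations.create(
            model="omni-moderation-latest", input=t
        ).results[0].flagged
        for t in texts
    ]
    return sum(flags) / len(flags)

print("false-flag rate, neutral text: ", flag_rate(neutral))
print("false-flag rate, identity text:", flag_rate(identity))
```

The 2-3x figure in the paper comes from a far larger dataset run across multiple vendors' APIs, not a toy comparison like this.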
May 3, 2025 at 7:30 PM
You can calculate responsiveness scores too! We've released our code, including a quickstart guide.

We'd love to hear if or how you find them useful.

github.com/ustunb/reachml

...4/
April 24, 2025 at 4:37 PM
Instead, we score each feature by the proportion of changes to that feature alone that would lead to recourse.

We call these responsiveness scores and find that they can successfully identify features that individuals can change to get a better outcome. ...3/
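
For intuition, a from-scratch sketch (this is not the reachml API; the toy loan data, feature names, and action sets are all invented for illustration): for one individual, enumerate feasible changes to a single feature and count the fraction that flip the model to the favorable outcome.

```python
# Responsiveness, sketched: the fraction of feasible single-feature
# changes that flip the model's decision for one individual.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy loan data: columns are [age, income_k, n_open_accounts].
X = rng.normal(loc=[40, 50, 3], scale=[10, 15, 2], size=(500, 3))
y = (0.04 * X[:, 1] + 0.3 * X[:, 2] > 3).astype(int)  # approval label
model = LogisticRegression().fit(X, y)

# Feasible single-feature actions. Age is immutable: no actions,
# so its responsiveness is 0 by definition.
actions = {
    0: [],                # age: cannot be changed
    1: [5, 10, 20, 40],   # income_k: raise income
    2: [-2, -1, 1, 2],    # n_open_accounts: open/close accounts
}

def responsiveness(x, feature):
    """Fraction of actions on `feature` that yield the favorable outcome."""
    acts = actions[feature]
    if not acts:
        return 0.0
    flips = sum(
        model.predict((x + delta * np.eye(3)[feature]).reshape(1, -1))[0] == 1
        for delta in acts
    )
    return flips / len(acts)

x_denied = X[model.predict(X) == 0][0]  # someone the model denies
for j, name in enumerate(["age", "income_k", "n_open_accounts"]):
    print(f"{name}: responsiveness = {responsiveness(x_denied, j):.2f}")
```

The released reachml code (repo linked in the post above) is the real implementation, with a quickstart guide.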
April 24, 2025 at 4:37 PM
In work with @harrycheon.bsky.social @anniewernerfelt.bsky.social @berkustun.bsky.social we show that many features highlighted by SHAP and LIME are non-responsive: they can't be changed (like age) or wouldn't lead to a better model outcome (e.g., getting a loan) even if you did change them!... 2/
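
To make that failure mode concrete, a toy sketch (invented data and weights, not the paper's experiments): when an immutable feature drives the model, SHAP dutifully ranks it first, so the top "explanation" is something no applicant can act on.

```python
# Toy demonstration: SHAP ranks an immutable feature (age) as the top
# reason for a decision, yet changing age is impossible, so the
# explanation offers no recourse. All numbers are invented.
import numpy as np
import shap
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
names = ["age", "income_k", "n_open_accounts"]
X = rng.normal(loc=[40, 50, 3], scale=[10, 15, 2], size=(500, 3))
y = (0.08 * X[:, 0] + 0.02 * X[:, 1] > 4.2).astype(int)  # age dominates
model = LogisticRegression().fit(X, y)

explainer = shap.LinearExplainer(model, X)
shap_vals = explainer.shap_values(X[:1])[0]  # one applicant's attributions

ranking = np.argsort(-np.abs(shap_vals))
print("SHAP importance ranking:", [names[j] for j in ranking])
# Typically prints age first: a non-responsive feature tops the explanation.
```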
April 24, 2025 at 4:37 PM
Hey AI folks - stop using SHAP! It won't help you debug [1], won't catch discrimination [2], and makes no sense for feature importance [3].

Plus - as we show - it also won't give recourse.

In a paper at #ICLR we introduce feature responsiveness scores... 1/

arxiv.org/pdf/2410.22598
April 24, 2025 at 4:37 PM
Our in-progress work shows that, across AI systems, identity-related speech (whether about marginalized or dominant groups) is more likely than other speech to be incorrectly flagged.

arxiv.org/abs/2409.13725
November 21, 2024 at 2:06 PM
In a recent audit (with @metaxa.net and students) we found that even some PG-rated TV scripts get blocked by OpenAI's automated content moderation filter.

Press release description: ai.seas.upenn.edu/news/censori...

Actual paper: facctconference.org/static/paper...
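
The per-text check at the core of the audit looks roughly like this, assuming OpenAI's current Python client and moderation endpoint (the model name is today's; the audit predates the omni models). The excerpt is an invented stand-in, not a line from the audited scripts:

```python
# Check whether a single script-like excerpt trips the moderation filter,
# and inspect per-category scores to see why. The excerpt is invented.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

excerpt = "He slams the door. 'This whole town is going to pay for this!'"
result = client.moderations.create(
    model="omni-moderation-latest", input=excerpt
).results[0]

print("flagged:", result.flagged)
# The top category scores hint at which rule fired (e.g., violence).
scores = result.category_scores.model_dump()
for category, score in sorted(scores.items(), key=lambda kv: -kv[1])[:3]:
    print(f"  {category}: {score:.3f}")
```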
November 21, 2024 at 2:06 PM