Julian Skirzynski
@jskirzynski.bsky.social
PhD student in Computer Science @UCSD. Studying interpretable AI and RL to improve people's decision-making.
We’ll be presenting @facct on 06.24 at 10:45 AM during the Evaluating Explainable AI session!
Come chat with us. We would love to discuss implications for AI policy, better auditing methods, and next steps for algorithmic fairness research.
#AIFairness #XAI
June 24, 2025 at 6:14 AM
But if they are indeed used to dispute discrimination claims, we can expect many cases to fail for lack of evidence and many discriminatory decisions to go undetected.
Current explanation-based auditing is therefore fundamentally flawed, and we need additional safeguards.
June 24, 2025 at 6:14 AM
Despite their unreliability, explanations are suggested as anti-discrimination measures by a number of regulations.
GDPR ✓ Digital Services Act ✓ Algorithmic Accountability Act ✓ LGPD (Brazil) ✓
June 24, 2025 at 6:14 AM
So why do explanations fail?
1️⃣ They target individuals, while discrimination operates on groups
2️⃣ Users’ causal models are flawed
3️⃣ Users overestimate proxy strength and treat its presence in the explanation as discrimination
4️⃣ Feature-outcome relationships bias user claims
June 24, 2025 at 6:14 AM
BADLY.
When participants flag discrimination, they are correct only ~50% of the time; they miss 55% of the discriminatory predictions and have a ~30% false-positive rate.
Additional knowledge (protected attributes, proxy strength) improves detection to roughly 60% without affecting the other measures.
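For readers who want the metrics spelled out: a minimal Python sketch with made-up flags and ground-truth labels (not the study data), showing how precision, miss rate, and false-positive rate are computed from participant flags.

```python
# Illustrative only: made-up flags and labels, not the study's data.
flagged      = [1, 1, 0, 0, 1, 0, 1, 0]   # 1 = participant flagged discrimination
ground_truth = [1, 0, 1, 0, 1, 0, 0, 1]   # 1 = prediction was truly discriminatory

tp = sum(f and g for f, g in zip(flagged, ground_truth))          # correctly flagged
fp = sum(f and not g for f, g in zip(flagged, ground_truth))      # flagged but fair
fn = sum(not f and g for f, g in zip(flagged, ground_truth))      # discriminatory, missed
tn = sum(not f and not g for f, g in zip(flagged, ground_truth))  # fair, not flagged

precision = tp / (tp + fp)   # how often a flag is correct
miss_rate = fn / (fn + tp)   # share of discriminatory predictions missed
fpr       = fp / (fp + tn)   # false-positive rate
print(precision, miss_rate, fpr)
```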
June 24, 2025 at 6:14 AM
Our setup assigns each robot a ground-truth discrimination label, which lets us evaluate how well each participant could do under different information regimes.
So, how did they do?
June 24, 2025 at 6:14 AM
We recruited participants, anchored their beliefs about discrimination, trained them to use explanations, and tested them to make sure they got it right.
We then saw how well they could flag unfair predictions based on counterfactual explanations and feature attribution scores.
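As a rough illustration (my own mock-up, not the study interface), here is what those two explanation formats can look like for a single robot; the scores and wording are assumed.

```python
# Mock-up of the two explanation formats for one robot (not the study's interface).
robot = {"has_antenna": False, "wheels": 4, "solar_panel": True}
prediction = "will break down"

# Counterfactual explanation: the smallest change that would flip the prediction.
counterfactual = "If this robot had an antenna, it would be predicted NOT to break down."

# Feature-attribution scores: how much each body part pushed the prediction
# toward "break down" (positive) or away from it (negative). Scores are made up.
attributions = {"has_antenna": +0.72, "solar_panel": -0.11, "wheels": -0.05}

print(f"Prediction: {prediction}")
for feature, score in sorted(attributions.items(), key=lambda kv: -abs(kv[1])):
    print(f"  {feature:>12}: {score:+.2f}")
print(counterfactual)
```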
June 24, 2025 at 6:14 AM
Participants audit a model that predicts whether robots sent to Mars will break down. Some are built by “Company X.” Others by “Company S.”
Our model predicts failure based on robot body parts. It can discriminate against Company X by predicting that robots without an antenna fail.
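A tiny simulation of that proxy effect, with assumed probabilities rather than the paper's actual setup: antennas are rarer on Company X robots, so a model that predicts failure for antenna-less robots penalizes Company X without ever seeing the manufacturer.

```python
import random

random.seed(0)

# Toy setup: antenna presence is correlated with the manufacturer (made-up rates).
def make_robot() -> dict:
    company = random.choice(["X", "S"])
    has_antenna = random.random() < (0.2 if company == "X" else 0.8)
    return {"company": company, "has_antenna": has_antenna}

def model_predicts_failure(robot: dict) -> bool:
    return not robot["has_antenna"]   # the model only looks at body parts

robots = [make_robot() for _ in range(10_000)]

def predicted_failure_rate(company: str) -> float:
    group = [r for r in robots if r["company"] == company]
    return sum(model_predicts_failure(r) for r in group) / len(group)

print(f"Company X: {predicted_failure_rate('X'):.2f}")   # ~0.80
print(f"Company S: {predicted_failure_rate('S'):.2f}")   # ~0.20
```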
June 24, 2025 at 6:14 AM
Because of these confounding factors, we cannot tell whether explanations themselves work or not.
To tackle this challenge, we introduce a synthetic task where we:
- Teach users how to use explanations
- Control their beliefs
- Adapt the world to fit their beliefs
- Control the explanation content
June 24, 2025 at 6:14 AM
Users may fail to detect discrimination through explanations due to:
- Proxies not being revealed by explanations
- Issues with interpreting explanations
- Wrong assumptions about proxy strength
- Unknown protected class
- Incorrect causal beliefs
June 24, 2025 at 6:14 AM
Imagine a model that predicts loan approval based on credit history and salary.
Would a rejected female applicant get approved if she somehow applied as a man?
If yes, her prediction was discriminatory.
Fairness requires predictions to stay the same regardless of the protected class.
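Here is a minimal Python sketch of that counterfactual test. The loan model and the applicant record are hypothetical; the point is the flip-and-compare check.

```python
# Minimal sketch of the counterfactual test: flip only the protected attribute
# and see whether the prediction changes. Model and data are hypothetical.
def predict(applicant: dict) -> str:
    """Toy loan model that (unfairly) uses a lower salary bar for men."""
    threshold = 50_000 if applicant["gender"] == "man" else 70_000
    return "approved" if applicant["salary"] >= threshold else "rejected"

def counterfactually_discriminated(applicant: dict) -> bool:
    """True if changing only the protected attribute flips the decision."""
    flipped = {**applicant, "gender": "man" if applicant["gender"] == "woman" else "woman"}
    return predict(flipped) != predict(applicant)

applicant = {"gender": "woman", "salary": 60_000, "credit_years": 6}
print(predict(applicant))                         # rejected
print(counterfactually_discriminated(applicant))  # True -> discriminatory prediction
```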
June 24, 2025 at 6:14 AM
Oh yeah, welcome to the pack!
January 8, 2025 at 4:01 PM
Actually, I added you some time ago already, so you're good :)
December 8, 2024 at 3:29 PM
Let's have bioinformatics represented then :) Regarding the clubs, I haven't heard of any; it might just be a coincidence :D
December 8, 2024 at 3:28 PM