prakharg.bsky.social
@prakharg.bsky.social
Four case studies on the gap between how models are actually used and how they are evaluated in sandboxed audits... Definitely need to take a deeper dive, great presentation by Emily Black!
June 25, 2025 at 8:53 AM
Evaluations that reflect how the model would be deployed vs evaluations in only controlled, unrealistic settings!
June 25, 2025 at 8:53 AM
Allowing companies to do isolated audits can lead to D-Hacking!! More robust testing is needed...
June 25, 2025 at 8:53 AM
Legal frameworks tend to have control over allocative decisions (Yes/No outcomes), which fit well with traditional ML systems... But not with GenAI systems
June 25, 2025 at 8:53 AM
Nuance of stereotype errors is so important to understand their true harms... Insightful presentation by @angelinawang.bsky.social
June 25, 2025 at 8:43 AM
Women tend to report stereotype-reinforcing errors as more harmful while men tend to report stereotype-violating errors as more harmful...
June 25, 2025 at 8:43 AM
Some items are more associated with men vs women (not surprising), but not all of them are equally harmful!!
June 25, 2025 at 8:43 AM
Cognitive beliefs, attitudes and behaviours... Three ways to measure harms ('pragmatic harms')
June 25, 2025 at 8:43 AM
Are all errors equally harmful? No! Stereotype-reinforcing errors vs stereotype-violating errors
June 25, 2025 at 8:43 AM
Our understanding of stereotypes sometimes isn't indicative of reality... They can appear in both directions, or might simply exist without harm
June 25, 2025 at 8:43 AM
Clear narrative and a great presentation by Cecilia Panigutti
June 25, 2025 at 8:33 AM
Risk-measuring studies - Bringing it back to risk measurement, but this time with a clearly defined objective instead of risk-uncovering as before... Not just whether a risk exists, but 'how severe' is it?
June 25, 2025 at 8:33 AM
Interface-design studies - Focus on UI design elements which impact user interaction
June 25, 2025 at 8:33 AM
Reverse-engineering studies - Narrower scope and in-depth studies of how algorithms work... Methodological precision is the key!
June 25, 2025 at 8:33 AM
Risk-uncovering studies - Typically start from anecdotal evidence and help surface new risks
June 25, 2025 at 8:33 AM
A review organized not by data collection technique, but by DSA risk management framework categories
June 25, 2025 at 8:33 AM
Narrative review of algorithmic auditing studies, practical recommendations for best practices, and mapping to DSA obligations...
June 25, 2025 at 8:33 AM
Such a broad topic... Excellent presentation by @feliciajing.bsky.social
June 25, 2025 at 8:22 AM
Historical methods, working alongside many other ways of auditing these models, can help us take advantage of the broader scope of historical evaluations...
June 25, 2025 at 8:22 AM
AI audits have moved from bottom-up external evaluations to new-age 'auditing companies'. While this has increased speed and scale, it has significantly narrowed the scope of auditing.
June 25, 2025 at 8:22 AM
Why the history of AI assessments? A study through the lens of historical methods can help us understand neglected areas of auditing.
June 25, 2025 at 8:22 AM