Angelie Kraft
krangelie.bsky.social
(She/her)
Researcher AI Ethics & NLP
@uni-hamburg.de & @leuphana.bsky.social
🌳 Previously Fellow @weizenbauminstitut.bsky.social
🌐 angeliekraft.com
The video recording of our Weizenbaum Forum on decolonial and feminist perspectives on AI and justice is now available here: youtu.be/oPMFTIh6Dco?...
Weizenbaum Forum Mai25 | Ethische KI von der Theorie zur Praxis
YouTube video by Weizenbaum-Institut
May 31, 2025 at 10:46 AM
Otherwise, benchmarks will continue to systematically incentivize the optimization of LLMs towards biased heuristics, rewarding the development of technologies that serve the needs of only a privileged few. 7/7
May 23, 2025 at 12:22 PM
We need benchmarks that are representative of diverse perspectives and experiences, or we need to make it a rule to be intentional and transparent about which perspectives and experiences are represented. A benchmark will always represent only some interests, and that is okay, as long as we are aware of it! 6/7
May 23, 2025 at 12:21 PM
Most of them lack transparency regarding the individuals involved in their creation, particularly the annotators, even though decisions made while curating and labeling data are influenced by identity. 5/7
May 23, 2025 at 12:20 PM
Most benchmarks are biased in terms of gender, occupation, religion, and geographic representation. The questions asked, i.e. the contents benchmarked for, are highly skewed towards Western, Christian and male entities. 4/7
May 23, 2025 at 12:15 PM
We analyzed the 30 most popular question-answering (QA) and reading comprehension (RC) benchmark papers & 20 respective datasets and found that most are created without considering social representation, leading to biased outcomes. 3/7
May 23, 2025 at 12:14 PM
In our work, we focus on the task of question-answering as it is the closest proxy to the ways in which users access knowledge from LLM-based chatbots. 2/7
May 23, 2025 at 12:14 PM
More and more researchers have been acknowledging the issues around non-transparent documentation practices and a lack of construct validity. Our paper focuses on yet another issue and presents a systematic analysis of the social bias inherent in AI performance benchmarks. 1/7
May 23, 2025 at 12:13 PM