Maharshi Gor
@maharshigor.bsky.social
PhD student @ Univ of Maryland
NLP, Question Answering, Human-AI Interaction, LLMs
More at mgor.info
📝 Full paper link: arxiv.org/abs/2406.16342

TL;DR: We introduce AdvScore, a human-grounded metric that measures how "adversarial" a dataset really is by comparing model and human performance. It helps build better, lasting benchmarks, like our proposed AdvQA, that evolve with AI progress.
Is your benchmark truly adversarial? AdvScore: Evaluating Human-Grounded Adversarialness
Adversarial datasets should validate AI robustness by providing samples on which humans perform well, but models do not. However, as models evolve, datasets can become obsolete. Measuring whether a da...
arxiv.org
May 1, 2025 at 12:44 PM
Reposted by Maharshi Gor
The Impact of Explanations on Fairness in Human-AI Decision-Making: Protected vs Proxy Features

Despite hopes that explanations improve fairness, we see that when biases are hidden behind proxy features, explanations may not help.

Navita Goyal, Connor Baumler, et al., IUI'24
hal3.name/docs/daume23...
December 9, 2024 at 11:41 AM
Reposted by Maharshi Gor
Do great minds think alike? Investigating Human-AI Complementarity in QA

We use item response theory to compare the capabilities of 155 people vs 70 chatbots at answering questions, teasing apart complementarities; implications for design.

by Maharshi Gor et al., EMNLP'24
hal3.name/docs/daume24...
December 12, 2024 at 10:41 AM
I used to like Writefull when it was new and there was nothing better. But 🥲
December 12, 2024 at 4:07 PM
👋🏽 Hey! 🫡
November 11, 2024 at 5:21 AM