Lightnews — Scholar-powered news

Shiwali Mohan

@shiwali.bsky.social

Shiwali Mohan

@shiwali.bsky.social

It's a preliminary study, but it shows how we can make #AI #ML evaluations more informative, moving beyond benchmarks curated with minimal insight into what a useful question is and what an appropriate answer looks like. (8/8)

📖 Paper: arxiv.org/abs/2402.00234
🎥Talk: drive.google.com/file/d/1m79W...

Can Generative AI Support Patients' & Caregivers' Informational Needs? Towards Task-Centric Evaluation Of AI Systems

Generative AI systems such as ChatGPT and Claude are built upon language models that are typically evaluated for accuracy on curated benchmark datasets. Such evaluation paradigms measure predictive an...

arxiv.org

March 24, 2025 at 7:28 PM

Shiwali Mohan

@shiwali.bsky.social

🤖 Measured how #GenAI systems performed, not only in terms of correctness but also in how similar their answers were to those of an expert answering the same question. (7/8)

March 24, 2025 at 7:28 PM
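
A minimal sketch of the two-part scoring described in (7/8), under stated assumptions: the thread does not say how similarity to the expert was computed, so the token-overlap (Jaccard) score below is only a stand-in, and the names `expert_similarity` and `score_response` are hypothetical. Correctness is passed in as a human judgment rather than computed.

```python
import re


def _tokens(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def expert_similarity(system_answer: str, expert_answer: str) -> float:
    """Jaccard overlap between token sets, in [0, 1].

    Assumption: a stand-in metric, not the one used in the paper.
    """
    a, b = _tokens(system_answer), _tokens(expert_answer)
    if not a and not b:
        return 1.0  # two empty answers are treated as identical
    return len(a & b) / len(a | b)


def score_response(system_answer: str, expert_answer: str, is_correct: bool) -> dict:
    """Report correctness and expert similarity side by side."""
    return {
        "correct": is_correct,  # assumed to come from a human rater
        "expert_similarity": expert_similarity(system_answer, expert_answer),
    }
```

Keeping the two scores separate, rather than folding them into one number, preserves the distinction the post draws: an answer can be correct and still differ from what an expert would have said.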

Shiwali Mohan

@shiwali.bsky.social

📥 Curated an evaluation question set from observed interactions. The set contains real questions asked by participants as they were attempting to do a specific task. Such datasets are critical to measuring if an #AI system is producing responses that are useful. (6/8)

March 24, 2025 at 7:28 PM

Shiwali Mohan

@shiwali.bsky.social

🤕👩‍⚕️ Studied how people interact with an expert when one is available. This uncovered the specific needs people have as they make sense of data, and how an expert addresses those needs. (5/8)

March 24, 2025 at 7:28 PM

Shiwali Mohan

@shiwali.bsky.social

🏥 Identified a specific use case in which people need support from an expert but the expert is not easily accessible: understanding medical scans and reports in order to make good decisions about your treatment. (4/8)

March 24, 2025 at 7:28 PM

Shiwali Mohan

@shiwali.bsky.social

In our most recent paper, we explore an evaluation approach for #GenAI #GenerativeAI systems. Here are the steps we followed: (3/8)

March 24, 2025 at 7:28 PM

Shiwali Mohan

@shiwali.bsky.social

As a science, we have to adopt rigorous evaluations that identify what #IntelligentSystem #AIAgent #Agent behavior should be & measure if it works as intended. Move beyond a 𝘱𝘳𝘰𝘣𝘭𝘦𝘮-𝘢𝘨𝘯𝘰𝘴𝘵𝘪𝘤 metric (accuracy) on a 𝘵𝘢𝘴𝘬-𝘢𝘨𝘯𝘰𝘴𝘵𝘪𝘤 benchmark. Adopt practices from #HCI, #psychology, #economics. (2/8)

March 24, 2025 at 7:28 PM
