nbalepur.github.io
Our first paper designs a preference training method to boost LLM personalization 🎨
While the second outlines our position on why MCQA evals are terrible and how to make them better 🙏
Grateful for amazing collaborators!
And loved visiting London+Edinburgh this week, hope to be back soon! 🙏
🔩 Robustness Issues
🌎 Biases
💬 Unfaithful Explanations
Many of the fixes we proposed for MCQA's format and datasets can also help address or evaluate these issues 😁
📋 Writing MCQs using educators' rubrics to improve question quality
🧑🎓 Designing MCQs that are hard for models but easy for humans (adversarial), rather than creating needlessly impossible or obscure questions
We explore two possible improvements:
1️⃣ Constructed Response (short-form QA)
2️⃣ Explanation MCQA (justifying answers)
Both are grounded in education research, better align with LLM use cases, and test deeper levels of knowledge than MCQA ⭐️
🔒 Test subjectivity and generation
👥 Align with real LLM use cases
🧠 Assess knowledge (based on education research)
When's the last time you asked ChatGPT to answer an MCQ? 🤔
1️⃣ Flaws in MCQA’s format
2️⃣ Issues in datasets
3️⃣ Weaknesses in how LLMs run MCQA
The good news? Best practices from education, developed for effective student testing, can help fix these 🧑🏫
Yet we rarely apply these insights to LLM evaluation 🤦
Multiple choice evals for LLMs are simple and popular, but we know they are awful 😬
We complain they're full of errors, saturated, and test nothing meaningful, so why do we still use them? 🫠
Here's why MCQA evals are broken, and how to fix them 🧵
📄✍️ MoDS: Multi-Doc Summarization for Debatable Queries (Adobe intern work, coming soon!)
🤔❓Reverse QA: LLMs struggle with the simple task of giving questions for answers
Grateful for all my collaborators 😁
Best of luck to anyone submitting tmrw :)