dpeskoff.bsky.social
Reposted
🚨 New Position Paper 🚨

Multiple-choice evals for LLMs are simple and popular, but we know they are awful 😬

We complain they're full of errors, saturated, and test nothing meaningful, so why do we still use them? 🫠

Here's why MCQA evals are broken, and how to fix them 🧵
February 24, 2025 at 9:04 PM
Hello, World!
April 5, 2025 at 3:21 AM