https://serinachang5.github.io/
@msftresearch.bsky.social, @ashtonanderson.bsky.social and @jakehofman.bsky.social! 9/
ChatBench: huggingface.co/datasets/mic...
Paper: arxiv.org/abs/2504.07114
@msftresearch.bsky.social, @ashtonanderson.bsky.social and @jakehofman.bsky.social! 9/
ChatBench: huggingface.co/datasets/mic...
Paper: arxiv.org/abs/2504.07114
What happens when a static benchmark comes to life? ✨ Introducing ChatBench, a large-scale user study where we *converted* MMLU questions into thousands of user-AI conversations. Then, we trained a user simulator on ChatBench to generate user-AI outcomes on unseen questions. 1/ 🧵
What happens when a static benchmark comes to life? ✨ Introducing ChatBench, a large-scale user study where we *converted* MMLU questions into thousands of user-AI conversations. Then, we trained a user simulator on ChatBench to generate user-AI outcomes on unseen questions. 1/ 🧵