https://serinachang5.github.io/
Post your questions for panelists here: forms.gle/m2mXY3xFafAX...
What happens when a static benchmark comes to life? ✨ Introducing ChatBench, a large-scale user study where we *converted* MMLU questions into thousands of user-AI conversations. Then, we trained a user simulator on ChatBench to generate user-AI outcomes on unseen questions. 1/ 🧵
@serinachang5.bsky.social @ashtonanderson.bsky.social
serinachang5.github.io/assets/files...
huggingface.co/datasets/mic...