Alex Gill
agill32.bsky.social
NLP researcher at U of U
Reposted by Alex Gill
Folks, I don’t know how it’s possible, but it gets funnier.
November 21, 2025 at 3:19 PM
I'll be in Suzhou 🇨🇳 at #EMNLP this week presenting "What Has Been Lost with Synthetic Evaluation?", joint work with @anamarasovic.bsky.social & @lasha.bsky.social! 🎉

📍Findings Session 1 - Hall C
📅 Wed, November 5, 13:00 - 14:00

arxiv.org/abs/2505.22830
November 3, 2025 at 11:03 AM
Reposted by Alex Gill
🧠 Can large language models build the very benchmarks used to evaluate them?
In “What Has Been Lost with Synthetic Evaluation?”, Ana Marasović (@anamarasovic.bsky.social) and collaborators ask what happens when LLMs start generating the datasets used to test their reasoning. (1/6🧵)
October 20, 2025 at 4:01 PM
𝐖𝐡𝐚𝐭 𝐇𝐚𝐬 𝐁𝐞𝐞𝐧 𝐋𝐨𝐬𝐭 𝐖𝐢𝐭𝐡 𝐒𝐲𝐧𝐭𝐡𝐞𝐭𝐢𝐜 𝐄𝐯𝐚𝐥𝐮𝐚𝐭𝐢𝐨𝐧?

(arxiv.org/abs/2505.22830)

I'm happy to announce that the preprint release of my first project is online! Developed with the amazing support of @lasha.bsky.social & @anamarasovic.bsky.social
What Has Been Lost with Synthetic Evaluation?
Large language models (LLMs) are increasingly used for data generation. However, creating evaluation benchmarks raises the bar for this emerging paradigm. Benchmarks must target specific phenomena, pe...
arxiv.org
June 4, 2025 at 10:24 PM