weidingerlaura.bsky.social
📣 New paper! The field of AI research is increasingly realising that benchmarks are very limited in what they can tell us about AI system performance and safety. We argue for, and lay out a roadmap toward, a *science of AI evaluation*: arxiv.org/abs/2503.05336 🧵
March 20, 2025 at 1:28 PM