Lexin Zhou
lexinzhou.bsky.social
Research Intern at Microsoft | Working on AI Evaluation, Social Computing and NLP | Incoming PhD candidate for Fall 2025
https://lexzhou.github.io
Thrilled to share this accessible MSR blog post summarizing our latest work on building a Science of AI Evaluation, in which we both reliably explain and predict the success or failure of general-purpose AI models on new, unforeseen tasks and environments!
May 13, 2025 at 4:08 AM
🚨 To keep fostering conceptual and technical innovation toward a science of AI evaluation:

The Leverhulme Centre for the Future of Intelligence has launched an open, collaborative community to adopt and extend our new methodology.

Join us: kinds-of-intelligence-cfi.github.io/ADELE!
March 14, 2025 at 3:37 AM
Reposted by Lexin Zhou
To better understand why this matters in high-stakes contexts, you can also check out our previous work. We discuss why predicting model performance (e.g., failures on out-of-distribution languages in machine translation) remains essential in legal contexts.
March 11, 2025 at 8:07 PM
Reposted by Lexin Zhou
Understanding and extrapolating benchmark results will become essential for effective policymaking and informing users. New work identifies indicators that have high predictive power in modeling LLM performance. Excited for it to be out!
March 11, 2025 at 8:07 PM
Thrilled to unlock AI Evaluation with explanatory and predictive power through general ability scales!

With a new methodology to:
- Explain what common benchmarks really measure
- Extract explainable ability profiles of AI systems
- Predict performance on new task instances, both in and out of distribution
🧵
March 11, 2025 at 6:12 PM