Oxford Internet Institute
oii.ox.ac.uk
@oii.ox.ac.uk
The Oxford Internet Institute (OII) is a multidisciplinary research and teaching department at the University of Oxford that helps shape the development of our digital world for the public good.
Find out more about the study 'Measuring What Matters: Construct Validity in Large Language Model Benchmarks', accepted for publication in the upcoming NeurIPS 2025 conference proceedings. oii.ox.ac.uk/news-events/... 2/2
oii.ox.ac.uk
November 11, 2025 at 2:39 PM
“You need to really take it with a grain of salt when you hear things like ‘a model achieves Ph.D. level intelligence,’” said Andrew Bean, lead author of the study and DPhil student. “We’re not sure that those measurements are being done especially well.”

www.nbcnews.com/tech/tech-ne...
AI's capabilities may be exaggerated by flawed tests, according to new study
A study from the Oxford Internet Institute analyzed 445 tests used to evaluate AI models.
www.nbcnews.com
November 6, 2025 at 11:47 AM
Researchers from the OII, including @aboxiwu.bsky.social and Zoe Hawkins, in collaboration with Prof. Vili Lehdonvirta and the OECD's AI Policy Observatory, have co-authored a working paper and blog post on how to measure domestic public cloud compute availability for AI:

oecd.ai/en/wonk/the-...
The geography of AI compute: Mapping what is available and where
Countries count AI compute infrastructure as a strategic asset without systematically tracking its distribution, availability and access. A new OECD Working Paper presents a methodology to help fill t...
oecd.ai
November 6, 2025 at 10:27 AM
Supervisors: Professor Helen Margetts and Professor Phil Howard, Oxford Internet Institute, University of Oxford.

Examiners: Professor Ralph Schroeder, Oxford Internet Institute, University of Oxford, and Professor Kate Dommett, The University of Sheffield.
November 6, 2025 at 10:08 AM
We particularly welcome applications from candidates with a research profile in fields such as computational social science, political science, social data science, or related disciplines. Find out more and apply today! www.oii.ox.ac.uk/people/vacan...
November 4, 2025 at 2:13 PM
Reposted by Oxford Internet Institute
Read about our work on oxrml.com/measuring-wh...
Measuring what Matters
Construct Validity in Large Language Model Benchmarks
oxrml.com
November 4, 2025 at 12:03 PM
With thanks to the team of international researchers including EPFL, Stanford, the Technical University of Munich, @ucberkeleyofficial.bsky.social, the UK AI Security Institute, the Weizenbaum Institute and Yale. 4/4
November 4, 2025 at 11:30 AM
The researchers find that many of these benchmarks are built on unclear definitions or weak analytical methods, making it difficult to draw reliable conclusions about AI progress, capabilities or safety. Download the paper: oxrml.com/measuring-wh... 3/4
November 4, 2025 at 11:30 AM
In their new paper, 'Measuring What Matters: Construct Validity in Large Language Model Benchmarks', accepted for publication in the upcoming NeurIPS conference proceedings, researchers review 445 AI benchmarks – the standardised evaluations used to compare and rank AI systems. 2/4
November 4, 2025 at 11:30 AM