@euractiv.com @maximilian-henning.bsky.social
www.euractiv.com/news/stop-ov...
1️⃣ Lack of statistical rigour. Only 16% of the reviewed benchmarks used statistical tests in their comparisons. Statistical tests are essential to reliable science: without them, many AI system “wins” could simply be due to random chance.