https://evalevalai.com/
Our new paper, 📎“Who Evaluates AI’s Social Impacts? Mapping Coverage and Gaps in First and Third Party Evaluations,” analyzes hundreds of evaluation reports and reveals major blind spots ‼️🧵 (1/7)
🧠 Join us for the 2025 Workshop on "Evaluating AI in Practice: Bridging Statistical Rigor, Sociotechnical Insights, and Ethical Boundaries" (co-hosted with UKAISI)
📅 Dec 8, 2025
📝 Abstract due: Nov 20, 2025
Details below! ⬇️
evalevalai.com/events/works...
🤔Is it time to move beyond static tests and toward more dynamic, adaptive, and model-aware evaluation?
🖇️ "Fluid Language Model Benchmarking" by
@valentinhofmann.bsky.social et al. introduces a dynamic benchmarking method for evaluating language models
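Adaptive, model-aware evaluation of this kind can be sketched in a few lines. The snippet below is a generic illustration of IRT-style adaptive item selection, not the paper's exact method: it assumes a hypothetical item bank with 2PL parameters and picks the next item by Fisher information at the current ability estimate.

```python
# Generic sketch of adaptive, IRT-style benchmarking (illustrative only).
import math
import random

def p_correct(theta: float, a: float, b: float) -> float:
    """2PL item response model: probability a model with ability theta answers correctly."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def fisher_info(theta: float, a: float, b: float) -> float:
    """How informative an item is at the current ability estimate."""
    p = p_correct(theta, a, b)
    return a ** 2 * p * (1 - p)

# Hypothetical item bank: (discrimination a, difficulty b) for each benchmark item.
random.seed(0)
items = [(random.uniform(0.5, 2.0), random.gauss(0, 1)) for _ in range(200)]

theta_hat = 0.0          # running ability estimate for the model under test
asked = set()
for step in range(20):   # administer 20 adaptively chosen items instead of all 200
    # pick the unseen item that is most informative at the current estimate
    idx = max((i for i in range(len(items)) if i not in asked),
              key=lambda i: fisher_info(theta_hat, *items[i]))
    asked.add(idx)
    a, b = items[idx]
    correct = random.random() < p_correct(0.7, a, b)  # simulate a "true ability" of 0.7
    # crude gradient-style update of the ability estimate (illustrative only)
    theta_hat += 0.3 * ((1.0 if correct else 0.0) - p_correct(theta_hat, a, b))

print(f"estimated ability after 20 adaptive items: {theta_hat:.2f}")
```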
🤖 Did you know malicious actors can exploit trust in AI leaderboards to promote poisoned models in the community?
This week's paper 📜"Exploiting Leaderboards for Large-Scale Distribution of Malicious Models" by @iamgroot42.bsky.social explores this!
🕵️ Are benchmark noise and label errors masking the true fragility of LLMs?
🖇️"Do Large Language Model Benchmarks Test Reliability?" - This paper by @joshvendrow.bsky.social provides insights!
From misleading bar heights to missing error bars, recent model launches have sparked debate on AI evals. In our new blogpost, we dig into what’s broken, why it matters, and how eval results should be presented 👇
evalevalai.com/documentatio...
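As a concrete illustration of the error-bar point, here is a minimal sketch (with made-up numbers) of reporting benchmark accuracy with a 95% Wilson interval instead of a bare bar height.

```python
# Minimal sketch: report a benchmark score with a confidence interval,
# not just a bar height. The counts below are made up for illustration.
import math

def wilson_interval(correct: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """95% Wilson score interval for an accuracy estimated from `total` items."""
    p = correct / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2))
    return centre - half, centre + half

# Hypothetical results: two models on a 500-item benchmark.
for name, correct in [("model_a", 412), ("model_b", 398)]:
    lo, hi = wilson_interval(correct, 500)
    print(f"{name}: acc={correct / 500:.3f}  95% CI=[{lo:.3f}, {hi:.3f}]")
```

With overlapping intervals like these, a few points of difference in bar height may not reflect a real gap between models.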
We’re building a shared scientific foundation for evaluating AI systems, one that’s rigorous, open, and grounded in real-world & cross-disciplinary best practices👇 (1/2)
Read our new blog post: tinyurl.com/evalevalai
We are a community of researchers dedicated to designing, developing, and deploying better evaluations (1/3)