Ruchira Dhar
eclecticruchira.bsky.social
PhD Fellow in AI Evals @UniCopenhagen.
Interested in AI Policy / AI Ethics / Responsible AI.
Community Lead @cohereforai.bsky.social
Site: ruchiradhar.github.io
#nlproc #llm #ai
A small but meaningful step toward an evaluation culture that values clarity over marketing. Read the paper: papers.ssrn.com/sol3/papers...
(feat. EvalCards for Qwen & Gemini!).

Thrilled to share this work; I hope it leads to more transparent, accessible AI releases 🚀
November 13, 2025 at 4:08 PM
📌 Why it matters:

As LLM adoption grows, we need clear, honest, and comparable evaluation reporting. EvalCards support:
1️⃣ Better model selection
2️⃣ Smoother regulatory compliance
3️⃣ A more transparent AI ecosystem
November 13, 2025 at 4:08 PM
💡 What we propose:

EvalCards for reporting model evaluations. They’re designed to be:
✅ Easy to write
✅ Easy to understand
✅ Hard to miss
Each card summarizes capabilities, safety tests, metrics, prompts & key notes. Here’s a sample for an OLMo model from @allen_ai!
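The card's contents read like a small structured record. As a rough illustration only (the field names, types, and values below are my assumptions, not the paper's actual schema), an EvalCard could be sketched as:

```python
from dataclasses import dataclass, field

@dataclass
class EvalCard:
    """Hypothetical sketch of an EvalCard's fields, based on the post's list:
    capabilities, safety tests, metrics, prompts, and key notes."""
    model_name: str
    capabilities: dict[str, float] = field(default_factory=dict)  # benchmark -> score
    safety_tests: dict[str, str] = field(default_factory=dict)    # test -> outcome summary
    metrics: list[str] = field(default_factory=list)              # e.g. "accuracy"
    prompt_settings: str = ""                                     # prompting setup used
    key_notes: list[str] = field(default_factory=list)

    def summary(self) -> str:
        """One-line, hard-to-miss summary of what was (and wasn't) reported."""
        return (f"{self.model_name}: {len(self.capabilities)} capability results, "
                f"{len(self.safety_tests)} safety tests, "
                f"metrics: {', '.join(self.metrics) or 'unreported'}")

# Illustrative entry; all numbers and names are placeholders, not real results.
card = EvalCard(
    model_name="example-model",
    capabilities={"some-benchmark": 0.0},
    safety_tests={"toxicity-probe": "see notes"},
    metrics=["accuracy"],
    prompt_settings="5-shot, fixed template",
    key_notes=["Placeholder values for illustration only."],
)
print(card.summary())
```

The point of a flat record like this is the post's three checkmarks: easy to write (a handful of fields), easy to understand (one summary line), and hard to miss (it ships with the release rather than being scattered across appendices).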
November 13, 2025 at 4:08 PM
AI evaluation reporting has 3 major problems:

⚠️ Reproducibility (missing metrics and prompting details)
⚠️ Accessibility (details scattered everywhere)
⚠️ Governance (inconsistent disclosures + rising AI regulations)
November 13, 2025 at 4:08 PM
Yes, this! And sometimes, I think about how we never really needed AI - like we wouldn't have died without it.
August 25, 2025 at 7:37 AM