Interested in AI Policy / AI Ethics / Responsible AI.
Community Lead @cohereforai.bsky.social
Site: ruchiradhar.github.io
#nlproc #llm #ai
(feat. EvalCards for Qwen & Gemini!).
Thrilled to share this work! Hope it leads to more transparent, accessible AI releases 🚀
As LLM adoption grows, we need clear, honest, and comparable evaluation reporting. EvalCards help enable:
1️⃣ Better model selection
2️⃣ Smoother regulatory compliance
3️⃣ A more transparent AI ecosystem
EvalCards to report model evaluations. They’re designed to be:
✅ Easy to write
✅ Easy to understand
✅ Hard to miss
Each card summarizes capabilities, safety tests, metrics, prompts & key notes. Here’s a sample for an OLMo model from @allen_ai!
⚠️ Reproducibility (missing metrics/prompting)
⚠️ Accessibility (details scattered everywhere)
⚠️ Governance (inconsistent disclosures + rising AI regulations)