Interested in AI Policy / AI Ethics / Responsible AI.
Community Lead @cohereforai.bsky.social
Site: ruchiradhar.github.io
#nlproc #llm #ai
EvalCards to report model evaluations. They’re designed to be:
✅ Easy to write
✅ Easy to understand
✅ Hard to miss
Each card summarizes capabilities, safety tests, metrics, prompts & key notes. Here’s a sample for an OLMo model from @allen_ai!
Excited to share that our new paper, “EvalCards: A Framework for Standardized Evaluation Reporting”, has been accepted for presentation at the @EurIPSConf workshop on "The Science of Benchmarking and Evaluating AI".