navitas.bsky.social
Reposted by navitas.bsky.social
Today, @tuetschek.bsky.social shared his team's work on evaluating LLM text generation with both human annotation frameworks and LLM-based metrics. Their approach addresses the benchmark data leakage problem and the question of how to obtain unseen data for unbiased LLM testing.
April 30, 2025 at 12:02 PM