Ivan Kartáč
@ivankartac.bsky.social
PhD student @ Charles University. Researching evaluation and explainability of reasoning in language models.
OpeNLGauge comes in two variants: a prompt-based ensemble and a smaller fine-tuned model, both built exclusively on open-weight LLMs (including the training data!).

Thanks @tuetschek.bsky.social and @mlango.bsky.social!
August 23, 2025 at 4:39 PM
We introduce an explainable metric for evaluating a wide range of natural language generation tasks, without any need for reference texts. Given an evaluation criterion, the metric provides fine-grained assessments of the output by highlighting and explaining problematic spans in the text.
August 23, 2025 at 4:37 PM
Reposted by Ivan Kartáč
Slides and links to papers at bit.ly/mlprague25-od 🤓
Ondrej Dusek MLPrague 2025
Evaluating LLM outputs with humans and LLMs Ondřej Dušek MLPrague 30 April 2025 These slides: https://bit.ly/mlprague25-od
May 2, 2025 at 7:25 PM
Reposted by Ivan Kartáč
Today, @tuetschek.bsky.social shared his team's work on evaluating LLM text generation with both human annotation frameworks and LLM-based metrics. Their approach tackles the benchmark data leakage problem and shows how to obtain unseen data for unbiased LLM testing.
April 30, 2025 at 12:02 PM