Low-resource languages | culture-aware LLMs | machine-generated text detection
arxiv.org/abs/2305.10284
"Towards More Robust NLP System Evaluation: Handling Missing Scores in Benchmarks" by Anas Himmi et al. They explore how to rank LLMs when scores for some tasks are missing, and use the Borda count to construct reliable leaderboards.
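For readers unfamiliar with the Borda count, here is a minimal sketch of the basic idea in Python. It is an illustration under simplifying assumptions, not the paper's method: each system earns normalised rank points only on the tasks where it has a score, and the points are averaged over those tasks. The function name `borda_ranking` and the data in `example_scores` are hypothetical.

```python
from collections import defaultdict


def borda_ranking(scores):
    """Rank systems with a simple Borda count that tolerates missing scores.

    scores: {task: {system: score}}, higher is better.
    A system with no score on a task is simply absent from that task's dict.
    Ties within a task are broken by insertion order (a simplification).
    """
    points = defaultdict(float)
    tasks_seen = defaultdict(int)
    for task_scores in scores.values():
        n = len(task_scores)
        if n < 2:
            continue  # a single system on a task carries no ranking information
        # Order the systems scored on this task from worst to best.
        ordered = sorted(task_scores, key=task_scores.get)
        for position, system in enumerate(ordered):
            # Normalise to [0, 1] so tasks with fewer systems weigh the same.
            points[system] += position / (n - 1)
            tasks_seen[system] += 1
    # Average over the tasks each system actually appeared in.
    average = {s: points[s] / tasks_seen[s] for s in points}
    return sorted(average, key=average.get, reverse=True)


# Hypothetical data: system "C" has no score on task_b.
example_scores = {
    "task_a": {"A": 0.9, "B": 0.8, "C": 0.7},
    "task_b": {"A": 0.6, "B": 0.7},
    "task_c": {"A": 0.5, "B": 0.3, "C": 0.4},
}
print(borda_ranking(example_scores))
```

Averaging over observed tasks is only one way to handle the gaps; the paper should be consulted for how it treats missing scores.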