Lightnews — Scholar-powered news

Katya Artemova

@katya-art.bsky.social

#NLProc researcher @ Toloka AI, ex LMU, ex HSE

Low-resource languages | culture-aware LLMs | machine-generated text detection

Katya Artemova

@katya-art.bsky.social

Poster Session 8 - R&E; Hall 3, May 2, 11-12:30

April 29, 2025 at 7:52 PM

Katya Artemova

@katya-art.bsky.social

Check out Beemo: huggingface.co/datasets/tol...

toloka/beemo · Datasets at Hugging Face

April 29, 2025 at 7:50 PM

Katya Artemova

@katya-art.bsky.social

Thank you for your answer! I co-authored RuBLIMP, so I was curious about your take on this and whether you had the same experience: when we first ran the experiments without decontamination, the results seemed super inflated. But it seems that for low-resource languages that's not the case, then.

April 21, 2025 at 7:32 PM

Katya Artemova

@katya-art.bsky.social

Hi! Great work and thanks for sharing! I wonder if there is a chance all these LLMs have been trained on the UD data? Aren’t they contaminated?

April 17, 2025 at 5:47 PM

Katya Artemova

@katya-art.bsky.social

Check out our repo: github.com/eloquent-lab... !

February 11, 2025 at 7:44 PM

Katya Artemova

@katya-art.bsky.social

Co-organizers: Akim Tsvigun (University of Amsterdam and Nebius), Dominik Schlechtweg (University of Stuttgart), with Natalia Fedorova, Boris Obmoroshev, Sergei Tilga, Ekaterina Artemova, and Konstantin Chernyshev from Toloka

January 7, 2025 at 1:10 PM

Katya Artemova

@katya-art.bsky.social

Such a good thread idea!

arxiv.org/abs/2305.10284

"Towards More Robust NLP System Evaluation: Handling Missing Scores in Benchmarks" by Anas Himmi et al. They explore how to rank LLMs when scores for certain tasks are missing. The Borda count is used to construct reliable leaderboards.

November 27, 2024 at 10:12 AM
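
As a rough illustration of the Borda count mentioned in the post above, here is a minimal Python sketch. It is not the method from the paper by Himmi et al., which treats missing scores more carefully. The function name `borda_ranking`, the task and system names, the scores, and the rule that a missing score simply earns no points on that task are all assumptions made for this example.

```python
from typing import Dict, List, Optional

# task name -> system name -> score (None when the score is missing)
Scores = Dict[str, Dict[str, Optional[float]]]


def borda_ranking(scores: Scores) -> List[str]:
    """Rank systems by a simple Borda count over tasks.

    Assumption for this sketch: a system with a missing score on a task
    neither earns nor gives points on that task.
    """
    points: Dict[str, float] = {}
    for task_scores in scores.values():
        # Make sure every system appears in the final ranking.
        for system in task_scores:
            points.setdefault(system, 0.0)
        # Only systems with an observed score compete on this task.
        observed = {s: v for s, v in task_scores.items() if v is not None}
        for system, value in observed.items():
            for other, other_value in observed.items():
                if other == system:
                    continue
                if value > other_value:
                    points[system] += 1.0  # one point per system beaten
                elif value == other_value:
                    points[system] += 0.5  # ties split the point
    # Highest total first; ties broken alphabetically for determinism.
    return sorted(points, key=lambda s: (-points[s], s))


# Hypothetical scores: higher is better, None marks a missing result.
example_scores: Scores = {
    "task_a": {"sys1": 0.9, "sys2": 0.8, "sys3": 0.7},
    "task_b": {"sys1": 0.6, "sys2": None, "sys3": 0.5},
    "task_c": {"sys1": None, "sys2": 0.4, "sys3": 0.3},
}
print(borda_ranking(example_scores))
```

Because the points come from per-task ranks rather than raw scores, tasks with different score scales contribute equally. A system that is missing from many tasks is still penalized under this simple rule, which is one reason a more careful treatment of missing scores is worth having.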
