Kenneth Enevoldsen
@kennethenevoldsen.bsky.social
Postdoc at Aarhus University working on developing and evaluating representations of language and more
Maintain and develop: MTEB, ScandEval, tomsup, DaCy, etc.
#NLPProc
Long way to go 😅
October 18, 2025 at 8:23 PM
This is joint work with Dan Sattrup Nielsen and Peter Schneider-Kamp.
We share the code openly and update the leaderboard with new releases:
🔗 Website: euroeval.com/leaderboards...
📄 Paper: arxiv.org/abs/2406.13469
👩💻 GitHub: github.com/EuroEval/Eur...
March 11, 2025 at 10:12 AM
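For anyone who wants to reproduce a leaderboard entry locally, here is a minimal sketch, assuming the Python Benchmarker interface that EuroEval carried over from ScandEval (the `model` keyword and the placeholder model ID are assumptions; see the GitHub README for the current API):

```python
# pip install euroeval
from euroeval import Benchmarker

# Run the EuroEval suite on a Hugging Face model.
# NOTE: the Benchmarker API and the `model` keyword are assumed from the
# ScandEval lineage; "<model-id>" is a placeholder for any model on the Hub.
benchmarker = Benchmarker()
benchmarker.benchmark(model="<model-id>")
```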
I especially like our dashboard, which allows comparing models of interest across target languages.
March 11, 2025 at 10:10 AM
It notably includes high-, mid-, and very low-resource languages, which allows examining generalization even in areas where the available training data is minuscule compared to English:
March 11, 2025 at 10:09 AM
This is joint work with Dan Sattrup Nielsen and Peter Schneider-Kamp.
📄 Paper: arxiv.org/abs/2406.13469
🔗 Website: euroeval.com
👩💻 GitHub: github.com/EuroEval/Eur...
March 11, 2025 at 10:04 AM
One of the features that I really like is the ability to compare specific models of interest across target languages. Here, we show an example of Dutch, English, and German, but you can try out any combination:
euroeval.com/extras/radia...
March 11, 2025 at 10:01 AM
This notably includes low-resource languages such as Faroese and Icelandic, which are great for checking generalization to languages for which the available data is minuscule.
March 11, 2025 at 10:00 AM
Find out more or check the leaderboard here:
📑 Paper: arxiv.org/abs/2502.135...
📈 Leaderboard: huggingface.co/spaces/mteb/...
👩💻 GitHub: github.com/embeddings-b...
February 20, 2025 at 10:04 AM
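To evaluate your own embedding model against the benchmark, a minimal sketch using the documented mteb entry points; the model and task chosen here are purely illustrative:

```python
# pip install mteb sentence-transformers
import mteb
from sentence_transformers import SentenceTransformer

# Any SentenceTransformer-compatible embedding model works; MiniLM is just an example.
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Select one or more tasks by name from the task registry.
tasks = mteb.get_tasks(tasks=["Banking77Classification"])

# Run the evaluation and write per-task scores to disk.
evaluation = mteb.MTEB(tasks=tasks)
results = evaluation.run(model, output_folder="results")
```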
I would especially like to thank the managing team: Isaac Chung, @imeneker.bsky.social, Márton Kardos, Roman Solomatin, @tomaarsen.com, Chenghao Xiao, @vaibhavadlakha.bsky.social, @orionweller.bsky.social, Siva Reddy, and @muennighoff.bsky.social, who all have done fantastic work 🙏
February 20, 2025 at 10:02 AM
This work resulted from a large-scale collaboration, and I would like to thank all of the authors and contributors on MTEB.
February 20, 2025 at 10:00 AM
This new release also comes with a whole new leaderboard, where you can build benchmarks tailored to your use case using in-depth task selection.
February 20, 2025 at 9:59 AM
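The same tailoring is available programmatically through task filtering; a sketch using mteb's get_tasks filters (the task types and ISO 639-3 language codes below are an arbitrary example, not a recommendation):

```python
import mteb

# Build a custom benchmark by filtering the task registry.
# The selection below (Danish, Swedish, and Norwegian Bokmål classification
# and retrieval tasks) is illustrative only.
tasks = mteb.get_tasks(
    task_types=["Classification", "Retrieval"],
    languages=["dan", "swe", "nob"],
)
evaluation = mteb.MTEB(tasks=tasks)
```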
Such an extensive collection of tasks comes with a considerable computational cost. Thus, we have added multiple optimizations to ensure the benchmark is accessible and quick to run. We see notable speedups for the English benchmark while maintaining the relative ranking of models.
February 20, 2025 at 9:58 AM
Examining this claim, we see that the Mistral-derived models do indeed perform better in the languages they are believed to have been trained on:
February 20, 2025 at 9:58 AM
We use this collection of tasks to propose multiple benchmarks, covering multilingual, code, European, and Indic languages, and many more.
We find that smaller multilingual models (~500M parameters) outperform notably larger 7B models, likely due to the latter's limited multilingual pre-training.
February 20, 2025 at 9:57 AM
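These proposed benchmarks can also be loaded directly by name; a sketch assuming the get_benchmark helper and MMTEB's naming scheme (verify the registered names with mteb.get_benchmarks()):

```python
import mteb

# Load a predefined benchmark by name. "MTEB(Europe, v1)" is an assumed
# name here; mteb.get_benchmarks() lists what is actually registered.
benchmark = mteb.get_benchmark("MTEB(Europe, v1)")
evaluation = mteb.MTEB(tasks=benchmark)
```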
Would love to hear more. Do you intend to expand the existing metadata or utilise it in the pretraining?
December 27, 2024 at 8:33 PM
Unfortunately not
December 21, 2024 at 12:57 PM