Tom Aarsen
@tomaarsen.com
tomaarsen.com
Sentence Transformers, SetFit & NLTK maintainer
Machine Learning Engineer at 🤗 Hugging Face
That choice ended up being very valuable for the embedding & information retrieval community, and I think granting Hugging Face stewardship will be similarly successful.

I'm very excited about the future of the project, and for the world of embeddings and retrieval at large!
October 22, 2025 at 1:04 PM
I would like to thank the @ukplab.bsky.social, and especially Nils Reimers and @igurevych.bsky.social, for their dedication to the project and for their trust in me, both now and two years ago. Back then, neither of you knew me well, yet you trusted me to lead the project.

🧵
October 22, 2025 at 1:04 PM
We see an increasing desire from companies to move from large LLM APIs to local models for better control and privacy, reflected in the library's growth: in just the last 30 days, Sentence Transformer models have been downloaded >270 million times, second only to transformers.

🧵
October 22, 2025 at 1:04 PM
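For reference, running such a model locally takes only a few lines. A minimal sketch (the model name is just an example):

```python
from sentence_transformers import SentenceTransformer

# Runs fully locally after the initial download.
model = SentenceTransformer("all-MiniLM-L6-v2")  # example model, pick any

sentences = ["The weather is lovely today.", "It's so sunny outside!"]
embeddings = model.encode(sentences)
print(embeddings.shape)  # (2, 384) for this model
```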
Today, the @ukplab.bsky.social is transferring the project to @hf.co.

Sentence Transformers will remain a community-driven, open-source project, with the same Apache 2.0 license as before. Contributions from researchers, developers, and enthusiasts are welcome and encouraged!

🧵
October 22, 2025 at 1:04 PM
Read our full announcement for more details and quotes from UKP and Hugging Face leadership: huggingface.co/blog/sentenc...

🧵
Sentence Transformers is joining Hugging Face!
October 22, 2025 at 1:04 PM
Check out the blogpost here: huggingface.co/blog/isaacch...

Super nice work by the MTEB core team; this has been in the works for a very long time.
Introducing MTEB v2: Evaluation of embedding and retrieval systems for more than just text
A blog post by Isaac Chung on Hugging Face
October 20, 2025 at 2:36 PM
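For anyone who wants to try it: a minimal sketch of running an evaluation with the mteb package, using the long-standing API (v2 may refine some of these calls; the blog post has the details). The model and task names are just examples:

```python
import mteb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # example model
tasks = mteb.get_tasks(tasks=["Banking77Classification"])  # example task
evaluation = mteb.MTEB(tasks=tasks)
results = evaluation.run(model, output_folder="results")
```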
I quite like this one! Would like to see its code & model released.
October 16, 2025 at 7:01 AM
Hahaha, or somewhat less accidental overfitting 😉
October 1, 2025 at 4:20 PM
Check out the leaderboard here: mteb-leaderboard.hf.space?benchmark_na...

I'm very proud of everyone who worked on this. It's been a nice collaboration between Voyage AI by @mongodb.bsky.social and the core MTEB team.
October 1, 2025 at 3:52 PM
The benchmark is multilingual (20 languages) and covers various domains (general, legal, healthcare, code, etc.), and it's already available on MTEB.

There's also an English-only version available.

🧵
October 1, 2025 at 3:52 PM
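A minimal sketch of pulling up the benchmark through the mteb package. The exact benchmark identifier here is an assumption; list the available names if it doesn't resolve:

```python
import mteb

# Assumption: the exact identifier; confirm with
# [b.name for b in mteb.get_benchmarks()]
benchmark = mteb.get_benchmark("RTEB(beta)")
print(benchmark.tasks)  # the retrieval tasks making up the benchmark
```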
With RTEB, we can see the difference between a model's scores on public and private benchmarks, displayed in the figure here.

This gap is an indication of whether the model is capable of generalizing well.

🧵
October 1, 2025 at 3:52 PM
In short: RTEB uses a hybrid approach with both open and private datasets to measure generalization, preventing overfitting to test sets.

The picture at the top of this thread is what we commonly see on MTEB: models with lower zero-shot percentages score higher, but generalize worse.

🧵
October 1, 2025 at 3:52 PM
And more! Check out the full release notes here: github.com/UKPLab/sente...

Looking forward to bigger changes coming soon!
Release v5.1.1 - Explicit incorrect arguments, fixes for multi-GPU, evaluator, and hard negative · UKPLab/sentence-transformers
This patch makes Sentence Transformers more explicit about incorrect arguments and introduces fixes for multi-GPU processing, evaluators, and hard negatives mining.
September 22, 2025 at 11:42 AM
- Add FLOPS calculation to SparseEncoder evaluators for determining a performance/speed tradeoff
- Add support for Knowledgeable Passage Retriever (KPR) models
- Multi-GPU processing with `model.encode()` now works with `convert_to_tensor` (see the sketch below)

🧵
September 22, 2025 at 11:42 AM
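A minimal sketch of the multi-GPU fix; the device list assumes two available GPUs, and the model name is just an example:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # example model
sentences = ["First sentence.", "Second sentence."] * 1000

# A list of devices spreads encoding across multiple processes; as of
# v5.1.1 this also works when requesting a single stacked tensor.
embeddings = model.encode(
    sentences,
    device=["cuda:0", "cuda:1"],  # assumption: two GPUs available
    convert_to_tensor=True,
)
print(embeddings.shape)
```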
- `model.encode()` now throws an error if an unused keyword argument is passed
- a new `model.get_model_kwargs()` method for checking which custom model-specific keyword arguments the model supports (see the sketch below)

🧵
September 22, 2025 at 11:42 AM
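A minimal sketch of both changes (the model name is just an example):

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # example model

# Inspect which custom keyword arguments this model's encode() supports:
print(model.get_model_kwargs())

# Misspelled or unused keyword arguments are no longer silently ignored:
model.encode(["Hello!"], normalize_embedings=True)  # typo -> raises an error
```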
Sounds like a great initiative. I'm looking forward to seeing it develop.
September 10, 2025 at 12:25 PM
Or even read their paper: huggingface.co/papers/2509....
Paper page - mmBERT: A Modern Multilingual Encoder with Annealed Language Learning
September 9, 2025 at 2:54 PM
And if you made it this far, just go read the blogpost! huggingface.co/blog/mmbert

🧵
mmBERT: ModernBERT goes Multilingual
September 9, 2025 at 2:54 PM