Tom Aarsen
@tomaarsen.com
tomaarsen.com
Sentence Transformers, SetFit & NLTK maintainer
Machine Learning Engineer at 🤗 Hugging Face
That choice ended up being very valuable for the embedding & information retrieval community, and I think granting Hugging Face stewardship will be similarly successful.

I'm very excited about the future of the project, and for the world of embeddings and retrieval at large!
October 22, 2025 at 1:04 PM
I would like to thank the @ukplab.bsky.social, and especially Nils Reimers and @igurevych.bsky.social, for their dedication to the project and for their trust in me, both now and two years ago. Back then, neither of you knew me well, yet you trusted me to lead the project.

🧵
October 22, 2025 at 1:04 PM
We see an increasing desire from companies to move from large LLM APIs to local models for better control and privacy, reflected in the library's growth: in just the last 30 days, Sentence Transformer models have been downloaded >270 million times, second only to transformers.

🧵
October 22, 2025 at 1:04 PM
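For reference, running such a model locally takes only a few lines. A minimal sketch (the model name is just an example):

```python
from sentence_transformers import SentenceTransformer

# Runs fully locally after the initial download.
model = SentenceTransformer("all-MiniLM-L6-v2")  # example model, pick any

sentences = ["The weather is lovely today.", "It's so sunny outside!"]
embeddings = model.encode(sentences)
print(embeddings.shape)  # (2, 384) for this model
```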
Today, the @ukplab.bsky.social is transferring the project to @hf.co.

Sentence Transformers will remain a community-driven, open-source project, with the same Apache 2.0 license as before. Contributions from researchers, developers, and enthusiasts are welcome and encouraged!

🧵
October 22, 2025 at 1:04 PM
Read our full announcement for more details and quotes from UKP and Hugging Face leadership: huggingface.co/blog/sentenc...

🧵
Sentence Transformers is joining Hugging Face!
October 22, 2025 at 1:04 PM
Check out the blogpost here: huggingface.co/blog/isaacch...

Super nice work by the MTEB core team; this has been in the works for a very long time.
Introducing MTEB v2: Evaluation of embedding and retrieval systems for more than just text
A blog post by Isaac Chung on Hugging Face
October 20, 2025 at 2:36 PM
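For anyone who wants to try it: a minimal sketch of running an evaluation with the mteb package, using the long-standing API (v2 may refine some of these calls; the blog post has the details). The model and task names are just examples:

```python
import mteb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # example model
tasks = mteb.get_tasks(tasks=["Banking77Classification"])  # example task
evaluation = mteb.MTEB(tasks=tasks)
results = evaluation.run(model, output_folder="results")
```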
I quite like this one! Would like to see its code & model released.
October 16, 2025 at 7:01 AM
Hahaha, or somewhat less accidental overfitting 😉
October 1, 2025 at 4:20 PM
Check out the leaderboard here: mteb-leaderboard.hf.space?benchmark_na...

I'm very proud of everyone who worked on this. It's been a nice collaboration between Voyage AI by @mongodb.bsky.social and the core MTEB team.
October 1, 2025 at 3:52 PM
The benchmark is multilingual (20 languages) and covers various domains (general, legal, healthcare, code, etc.), and it's already available on MTEB.

There's also an English-only version available.

🧵
October 1, 2025 at 3:52 PM
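A minimal sketch of pulling up the benchmark through the mteb package. The exact benchmark identifier here is an assumption; list the available names if it doesn't resolve:

```python
import mteb

# Assumption: the exact identifier; confirm with
# [b.name for b in mteb.get_benchmarks()]
benchmark = mteb.get_benchmark("RTEB(beta)")
print(benchmark.tasks)  # the retrieval tasks making up the benchmark
```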
With RTEB, we can see the difference between a model's scores on public and private benchmarks, displayed in the figure here.

This gap is an indication of whether the model is capable of generalizing well.

🧵
October 1, 2025 at 3:52 PM
In short: RTEB uses a hybrid approach with both open and private datasets to measure generalization, preventing overfitting to test sets.

The picture at the top of this thread is what we commonly see on MTEB: models with lower zero-shot percentages score higher, but generalize worse.

🧵
October 1, 2025 at 3:52 PM
And more! Check out the full release notes here: github.com/UKPLab/sente...

Looking forward to bigger changes coming soon!
Release v5.1.1 - Explicit incorrect arguments, fixes for multi-GPU, evaluator, and hard negative · UKPLab/sentence-transformers
This patch makes Sentence Transformers more explicit about incorrect arguments and introduces fixes for multi-GPU processing, evaluators, and hard negatives mining.
September 22, 2025 at 11:42 AM
- Add FLOPS calculation to SparseEncoder evaluators for determining a performance/speed tradeoff
- Add support for Knowledgeable Passage Retriever (KPR) models
- Multi-GPU processing with `model.encode()` now works with `convert_to_tensor` (see the sketch below)

🧵
September 22, 2025 at 11:42 AM
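A minimal sketch of the multi-GPU fix; the device list assumes two available GPUs, and the model name is just an example:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # example model
sentences = ["First sentence.", "Second sentence."] * 1000

# A list of devices spreads encoding across multiple processes; as of
# v5.1.1 this also works when requesting a single stacked tensor.
embeddings = model.encode(
    sentences,
    device=["cuda:0", "cuda:1"],  # assumption: two GPUs available
    convert_to_tensor=True,
)
print(embeddings.shape)
```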
- `model.encode()` now throws an error if an unused keyword argument is passed
- a new `model.get_model_kwargs()` method for checking which custom model-specific keyword arguments the model supports (see the sketch below)

🧵
September 22, 2025 at 11:42 AM
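A minimal sketch of both changes (the model name is just an example):

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # example model

# Inspect which custom keyword arguments this model's encode() supports:
print(model.get_model_kwargs())

# Misspelled or unused keyword arguments are no longer silently ignored:
model.encode(["Hello!"], normalize_embedings=True)  # typo -> raises an error
```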
Sounds like a great initiative. I'm looking forward to seeing it develop.
September 10, 2025 at 12:25 PM
Or even read their paper: huggingface.co/papers/2509....
Paper page - mmBERT: A Modern Multilingual Encoder with Annealed Language Learning
September 9, 2025 at 2:54 PM
And if you made it this far, just go read the blogpost! huggingface.co/blog/mmbert

🧵
mmBERT: ModernBERT goes Multilingual
September 9, 2025 at 2:54 PM