Benjamin Warner
benjaminwarner.dev
@benjaminwarner.dev
Research at sophont.med, previously answer.ai

Vaccines save lives.
Can all encoders be instruction-tuned? Can we replicate ModernBERT's results with an older model like RoBERTa or a peer model like GTE-en-MLM?

No. And it's not close.
February 10, 2025 at 6:13 PM
When we finetune ModernBERT-Large-Instruct on task-specific datasets, the generative MLM head performs as well as or better than standard classification heads.
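For the curious, the idea is to cast the task as masked-token prediction so the existing MLM head predicts a label word and no classification head is added. A minimal sketch: the prompt template, label words, and single-token label assumption below are illustrative, not the exact recipe from the paper.

```python
# Illustrative sketch: finetuning with the generative MLM head instead of a
# classification head. Prompt template and label words are assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_id = "answerdotai/ModernBERT-large"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

label_words = {0: "negative", 1: "positive"}

def build_example(text: str, label: int):
    # The mask token stands in for the label word; MLM loss is computed
    # only at the masked position (-100 ignores every other token).
    prompt = f"{text} Overall, the sentiment was {tokenizer.mask_token}."
    enc = tokenizer(prompt, return_tensors="pt", truncation=True)
    labels = torch.full_like(enc["input_ids"], -100)
    mask_pos = (enc["input_ids"] == tokenizer.mask_token_id).nonzero()[0, 1]
    # Assumes the label word maps to a single (leading-space) token.
    label_id = tokenizer.encode(" " + label_words[label], add_special_tokens=False)[0]
    labels[0, mask_pos] = label_id
    return enc, labels

enc, labels = build_example("A delight from start to finish.", 1)
loss = model(**enc, labels=labels).loss  # standard MLM loss, no extra head
loss.backward()
```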
February 10, 2025 at 6:13 PM
After instruction tuning on Flan, ModernBERT-Large-Instruct outperforms similarly sized LLMs on MMLU & MMLU-Pro, and achieves ~90 percent of Llama 3.2 1B's performance with ~65 percent fewer parameters.
February 10, 2025 at 6:13 PM
One of the questions we debated while training ModernBERT was whether a modern, well-trained encoder would unlock zero-shot reasoning using only its generative head.

Spoilers: the answer is yes.
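Roughly, the zero-shot setup formats a multiple-choice question so the mask sits where the answer letter goes, then scores only the answer-letter tokens with the MLM head. A sketch, where the instruct checkpoint id and the prompt template are assumptions for illustration:

```python
# Sketch: zero-shot multiple choice with only the generative (MLM) head.
# The checkpoint id and prompt template are assumptions for illustration.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_id = "answerdotai/ModernBERT-Large-Instruct"  # assumed HF id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id).eval()

question = (
    "What is the capital of France?\n"
    "A. Berlin\nB. Paris\nC. Rome\nD. Madrid\n"
    f"Answer: {tokenizer.mask_token}"
)
enc = tokenizer(question, return_tensors="pt")
mask_pos = (enc["input_ids"] == tokenizer.mask_token_id).nonzero()[0, 1]

with torch.no_grad():
    logits = model(**enc).logits[0, mask_pos]

# Score only the answer-letter tokens and take the best one.
choices = ["A", "B", "C", "D"]
choice_ids = [tokenizer.encode(" " + c, add_special_tokens=False)[0] for c in choices]
print(choices[logits[choice_ids].argmax().item()])  # "B"
```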
February 10, 2025 at 6:13 PM
In addition to being the best retrieval model under 300M params on MTEB (without extra work), and top 10 for under 1B, here's a fun tidbit from Alibaba's GTE ModernBERT model card:

gte-modernbert-base beats gte-qwen1.5-7b on LoCo long-context retrieval with ~7B fewer parameters.
January 23, 2025 at 7:22 PM
ModernBERT is officially released in Transformers v4.48.0. You no longer need to install from git to use it.

If you are plugging ModernBERT into an existing encoder finetuning pipeline, try increasing the learning rate. We've found that ModernBERT tends to prefer a higher LR than older models.
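As a rough starting point, something like the sketch below, with the learning rate bumped up. The 8e-5 value and the tiny toy dataset are only there to illustrate the point, not tuned recommendations.

```python
# Sketch: finetuning ModernBERT with stock transformers (>= 4.48) and a
# higher learning rate than BERT-era defaults. Values are illustrative.
from datasets import Dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

model_id = "answerdotai/ModernBERT-base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=2)

# Toy dataset standing in for your real task data.
train_ds = Dataset.from_dict({"text": ["great movie", "terrible movie"], "label": [1, 0]})
train_ds = train_ds.map(lambda x: tokenizer(x["text"], truncation=True), batched=True)

args = TrainingArguments(
    output_dir="modernbert-finetune",
    learning_rate=8e-5,  # try higher than the ~2e-5 typical for older encoders
    num_train_epochs=3,
    per_device_train_batch_size=32,
)
Trainer(model=model, args=args, train_dataset=train_ds, tokenizer=tokenizer).train()
```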
January 10, 2025 at 6:28 PM
Second, we carefully designed ModernBERT's architecture to run efficiently across most common GPUs. Many older models weren't designed with the hardware they run on in mind and are slower than they should be. Not so with ModernBERT.
(Full model sequence packing illustrated below)
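In plain-PyTorch terms, unpadding looks roughly like the sketch below: drop the padding tokens, concatenate the batch into one packed sequence, and keep cumulative sequence lengths so a variable-length attention kernel can still respect sequence boundaries. The real implementation lives inside the model; this is just the idea.

```python
# Rough sketch of unpadding / sequence packing: remove padding tokens,
# concatenate the batch into one packed sequence, and record cumulative
# sequence lengths so a varlen attention kernel (e.g. FlashAttention) can
# still attend within each original sequence. Illustrative only.
import torch
import torch.nn.functional as F

def unpad(input_ids: torch.Tensor, attention_mask: torch.Tensor):
    # input_ids, attention_mask: (batch, seq_len); 1 = real token, 0 = padding
    seq_lens = attention_mask.sum(dim=1)
    indices = attention_mask.flatten().nonzero().squeeze(1)
    packed = input_ids.flatten()[indices]            # (total_real_tokens,)
    cu_seqlens = F.pad(seq_lens.cumsum(0), (1, 0))   # sequence boundaries
    return packed, cu_seqlens, indices

input_ids = torch.tensor([[101, 7, 8, 102, 0, 0],
                          [101, 5, 102, 0, 0, 0]])
attention_mask = torch.tensor([[1, 1, 1, 1, 0, 0],
                               [1, 1, 1, 0, 0, 0]])
packed, cu_seqlens, _ = unpad(input_ids, attention_mask)
print(packed.tolist())      # [101, 7, 8, 102, 101, 5, 102] -- no compute wasted on pads
print(cu_seqlens.tolist())  # [0, 4, 7]
```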
December 22, 2024 at 6:12 AM
How did we do it? First, we brought all the modern LLM architectural improvements to encoders, including alternating global & local attention, RoPE, and GeGLU layers, and added full model unpadding using Flash Attention for maximum performance (illustrated in the next post).
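A toy sketch of the alternating global/local idea follows; the cadence (every third layer global) and window size are illustrative placeholders rather than ModernBERT's exact settings.

```python
# Toy sketch of alternating global / local (sliding-window) attention masks.
# The cadence and window size are illustrative, not ModernBERT's exact values.
import torch

def layer_attention_mask(layer_idx: int, seq_len: int,
                         window: int = 128, global_every: int = 3) -> torch.Tensor:
    # Returns a (seq_len, seq_len) boolean mask: True = this pair may attend.
    if layer_idx % global_every == 0:
        return torch.ones(seq_len, seq_len, dtype=torch.bool)  # global layer
    pos = torch.arange(seq_len)
    dist = (pos[:, None] - pos[None, :]).abs()
    return dist <= window // 2                                  # local sliding window

mask = layer_attention_mask(layer_idx=1, seq_len=8, window=4)
# On local layers each token attends only to neighbors within +/- 2 positions,
# which keeps attention cheap on most layers at long context.
```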
December 22, 2024 at 6:12 AM
ModernBERT was designed from the ground up for speed and memory efficiency. It is both faster and more memory-efficient than every major encoder released since the original BERT.
December 22, 2024 at 6:12 AM
ModernBERT-base is the first encoder to beat DeBERTaV3-base on GLUE. ModernBERT is also competitive or top-scoring on single-vector retrieval, ColBERT retrieval, and programming benchmarks.
December 22, 2024 at 6:12 AM
This week we released ModernBERT, the first encoder to reach SOTA on most common benchmarks across language understanding, retrieval, and code, while running twice as fast as DeBERTaV3 on short context and three times faster than NomicBERT & GTE on long context.
December 22, 2024 at 6:12 AM
I feel the need for speed.
December 13, 2024 at 9:56 PM