This led us to write a position paper (accepted at #ICML2025) that attempts to identify the problem and to propose a solution.
arxiv.org/abs/2402.02870
👇🧵
Paper: arxiv.org/abs/2410.03249
👇🧵
The phenomenon is complex and requires more investigation.
However, in large-scale training runs, weight decay plays an important role.
This leads to a fun little theory of example forgetting via cumulative weight decay - check the paper for details! :)
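To make the intuition concrete, here is a back-of-the-envelope sketch (our illustration, not code from the paper): with AdamW-style decoupled weight decay, the weight perturbation left by a single contaminated batch is multiplied by (1 - lr·wd) at every subsequent step, so its cumulative imprint shrinks over the rest of training. The schedule and hyperparameters below are hypothetical.

```python
import numpy as np

# Back-of-the-envelope sketch (our illustration, not the paper's code).
# Assumption: AdamW-style decoupled weight decay,
#   theta <- theta - lr * (update + wd * theta),
# so a weight perturbation injected by a contaminated batch at step t0
# shrinks by prod_{t > t0} (1 - lr_t * wd) over the rest of training.
# All hyperparameters below are hypothetical.
total_steps = 100_000
wd = 0.1        # weight decay coefficient
peak_lr = 3e-4  # peak learning rate
steps = np.arange(total_steps)
lrs = 0.5 * peak_lr * (1 + np.cos(np.pi * steps / total_steps))  # cosine schedule

t0 = 20_000  # step at which the contaminated batch is seen
surviving = np.prod(1.0 - wd * lrs[t0:])
print(f"fraction of the contamination update surviving training: {surviving:.2f}")
# The earlier the contamination occurs (smaller t0), the more of it decays away.
```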
Immediately after contamination, this leads to strong benchmark overfitting.
Surprisingly, as we continue training, almost all of the contamination is forgotten!
If the amount of training data follows the Chinchilla scaling law (roughly 20 tokens per model parameter), even minor contamination leads to benchmark overfitting.
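For scale, a quick sketch of what that rule of thumb implies (hypothetical model sizes, not numbers from the paper):

```python
# Chinchilla rule of thumb: ~20 training tokens per model parameter.
# Model sizes below are hypothetical, for illustration only.
for params_b in [1, 7, 70]:  # billions of parameters
    print(f"{params_b}B params -> ~{20 * params_b}B Chinchilla-optimal tokens")
```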
Check out our poster today at the NeurIPS ATTRIB workshop (3-4:30pm)!
💡 TL;DR: In the large-data regime, benchmark data that leaks into training a few times matters less than you might think.