This led us to write a position paper (accepted at #ICML2025) that attempts to identify the problem and to propose a solution.
arxiv.org/abs/2402.02870
👇🧵
Paper: arxiv.org/abs/2410.03249
👇🧵
The phenomenon is complex and requires more investigation.
However, in large-scale training runs, weight decay plays an important role.
This leads to a fun little theory of example forgetting via cumulative weight decay - check the paper for details! :)
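To make the intuition concrete, here is a back-of-the-envelope sketch (our illustration, not code from the paper): with AdamW-style decoupled weight decay, the weight perturbation left by a single contaminated batch is multiplied by (1 - lr·wd) at every subsequent step, so its cumulative imprint shrinks over the rest of training. The schedule and hyperparameters below are hypothetical.

```python
import numpy as np

# Back-of-the-envelope sketch (our illustration, not the paper's code).
# Assumption: AdamW-style decoupled weight decay,
#   theta <- theta - lr * (update + wd * theta),
# so a weight perturbation injected by a contaminated batch at step t0
# shrinks by prod_{t > t0} (1 - lr_t * wd) over the rest of training.
# All hyperparameters below are hypothetical.
total_steps = 100_000
wd = 0.1        # weight decay coefficient
peak_lr = 3e-4  # peak learning rate
steps = np.arange(total_steps)
lrs = 0.5 * peak_lr * (1 + np.cos(np.pi * steps / total_steps))  # cosine schedule

t0 = 20_000  # step at which the contaminated batch is seen
surviving = np.prod(1.0 - wd * lrs[t0:])
print(f"fraction of the contamination update surviving training: {surviving:.2f}")
# The earlier the contamination occurs (smaller t0), the more of it decays away.
```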
Immediately after contamination, this leads to strong benchmark overfitting.
Surprisingly, as we continue training, almost all of the contamination is forgotten!
If the amount of training data follows the Chinchilla scaling law (roughly 20 tokens per model parameter), even minor contamination leads to benchmark overfitting.
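For scale, a quick sketch of what that rule of thumb implies (hypothetical model sizes, not numbers from the paper):

```python
# Chinchilla rule of thumb: ~20 training tokens per model parameter.
# Model sizes below are hypothetical, for illustration only.
for params_b in [1, 7, 70]:  # billions of parameters
    print(f"{params_b}B params -> ~{20 * params_b}B Chinchilla-optimal tokens")
```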
Check out our poster today at the NeurIPS ATTRIB workshop (3-4:30pm)!
💡 TL;DR: In the large-data regime, benchmark data that leaks into training a few times matters less than you might think.