Yet, standard regularizers (like L1 and L2) actively push models away from these perfect solutions.
Full paper 👉 arxiv.org/abs/2505.13398
(2/7)
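To see what “actively push away” means, here is a toy sketch (illustrative numbers only, not from the paper): once a network fits every example perfectly, the task gradient is zero, so the only remaining force on the weights is the regularizer's, and a gradient step moves the model off the perfect solution.

```python
# Toy sketch (not from the paper): at a perfect solution the task loss
# gradient vanishes, so L2's gradient alone drives the update.
w_perfect = 2.0   # a weight of some hypothetical perfectly fitting network
lam = 0.01        # L2 regularization strength (made-up value)
task_grad = 0.0   # zero: the model already fits every training example
total_grad = task_grad + 2 * lam * w_perfect  # L2 adds 2*lam*w = 0.04
w_next = w_perfect - 0.1 * total_grad         # one SGD step, lr = 0.1
print(w_next)  # 1.996: the weights have drifted off the perfect solution
```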
Tiny weight, full memorization.
(3/7)
But smaller ≠ simpler.
Networks can still “smuggle” complexity, even in tiny numbers.
(4/7)
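A toy illustration of the “smuggling” point (my own sketch, with hypothetical helpers pack_bits/unpack_bits; the paper's construction may differ): a single weight of near-zero magnitude can carry an arbitrarily long label string in its binary digits, so L1/L2 barely charge for it while its description length grows with every memorized bit.

```python
# Toy sketch: memorize 16 binary labels inside one tiny weight.
def pack_bits(bits: str, shift: int = 24) -> float:
    """Hide a bit string in the binary expansion of a single small weight."""
    # The leading '1' guard digit preserves any leading zeros in `bits`.
    return int("1" + bits, 2) / 2 ** (len(bits) + 1 + shift)

def unpack_bits(w: float, n: int, shift: int = 24) -> str:
    """Recover the n memorized bits from the weight."""
    return bin(int(w * 2 ** (n + 1 + shift)))[3:]  # drop '0b' and the guard

labels = "0110100110010110"        # made-up training labels to memorize
w = pack_bits(labels)              # a single weight around 4e-8
assert unpack_bits(w, len(labels)) == labels
print(f"|w| = {w:.1e}, L2 penalty = {w*w:.1e}, bits memorized = {len(labels)}")
# Exact in float64 for strings up to ~52 bits; the moral holds either way:
# small magnitude does not mean small information content.
```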
MDL, by contrast, encourages true generalization and can also yield more interpretable models.
(5/7)
1️⃣ Training with L1, L2, or no regularization destroys perfectly generalizing solutions across 6 formal-language tasks.
2️⃣ MDL keeps or improves on these same perfect solutions.
(6/7)
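To make the MDL contrast concrete, here is a minimal sketch (my own simplification with hypothetical helpers l2_penalty/mdl_penalty, not the paper's actual objective): L2 charges a weight for its magnitude, while a description-length penalty charges for the bits needed to write it down, so the information-dense “smuggler” weight from the sketch above is nearly free under L2 but expensive under MDL.

```python
# Toy sketch: magnitude cost (L2) vs. description-length cost (MDL-style).
from fractions import Fraction

def l2_penalty(weights):
    return sum(w * w for w in weights)

def mdl_penalty(weights):
    """Toy code length: bits to write each weight as an exact fraction."""
    bits = 0
    for w in weights:
        f = Fraction(w)  # every float is an exact dyadic rational
        bits += 1 + abs(f.numerator).bit_length() + f.denominator.bit_length()
    return bits

simple = [0.5, -1.0]                                     # short to describe
smuggler = [int("1" + "0110100110010110", 2) / 2 ** 41]  # the memorizing weight

print(f"L2 : simple = {l2_penalty(simple):.2f}, smuggler = {l2_penalty(smuggler):.1e}")
print(f"MDL: simple = {mdl_penalty(simple)} bits, smuggler = {mdl_penalty(smuggler)} bits")
# L2 calls the smuggler essentially free (~2e-15), while the toy MDL code
# charges it 58 bits vs. 7 for the simple weights.
```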