Suhas Kotha
suhasia.bsky.social
cs phd @ stanford
yeah, i think the main reason is that larger models generalize better and can be distilled. this seems like a satisfying enough reason?
July 16, 2025 at 10:12 PM
i really hope every ML class can catch up to teaching this (just yesterday i also wrote a post about how double descent appears in linear regression, specifically due to the inductive bias of minimum norm solutions kothasuhas.github.io/writing/doub...)
Suhas Kotha
Some Intuition Behind Double Descent
kothasuhas.github.io
February 14, 2025 at 10:30 PM