Dan Busbridge
@dbusbridge.bsky.social
Machine Learning Research @ Apple (opinions are my own)
Pinned
Dan Busbridge
@dbusbridge.bsky.social
· Feb 13
Reading "Distilling Knowledge in a Neural Network" left me fascinated and wondering:
"If I want a small, capable model, should I distill from a more powerful model, or train from scratch?"
Our distillation scaling law shows, well, it's complicated... 🧵
arxiv.org/abs/2502.08606
Distillation Scaling Laws
We provide a distillation scaling law that estimates distilled model performance based on a compute budget and its allocation between the student and teacher. Our findings reduce the risks associated ...
arxiv.org
February 13, 2025 at 9:50 PM
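For readers unfamiliar with the starting point, below is a minimal sketch of the knowledge-distillation objective from Hinton et al.'s "Distilling the Knowledge in a Neural Network" referenced above: a student is trained on a blend of temperature-softened teacher outputs and hard labels. This is illustrative only; the names (student_logits, teacher_logits, temperature, alpha) are assumptions, and it is not the scaling-law fitting code from the paper linked in the post.

```python
# Sketch of the classic distillation loss (Hinton et al.), assuming PyTorch.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Blend of soft-target matching and ordinary cross-entropy.

    alpha weights the soft-target (distillation) term against the
    hard-label term; temperature softens both distributions.
    """
    # Soft targets: KL divergence between temperature-softened distributions.
    # The T^2 factor keeps gradient magnitudes comparable across temperatures,
    # as suggested in the original paper.
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    kd_term = F.kl_div(soft_student, soft_teacher,
                       reduction="batchmean") * temperature ** 2

    # Hard targets: standard cross-entropy against ground-truth labels.
    ce_term = F.cross_entropy(student_logits, labels)

    return alpha * kd_term + (1.0 - alpha) * ce_term

# Example usage with random tensors (batch of 4, 10 classes).
if __name__ == "__main__":
    student_logits = torch.randn(4, 10)
    teacher_logits = torch.randn(4, 10)
    labels = torch.randint(0, 10, (4,))
    print(distillation_loss(student_logits, teacher_logits, labels))
```

The question the thread asks is when training a student this way beats simply training it from scratch for the same compute, which is what the distillation scaling law in the linked paper is built to answer.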
Several people have asked me to comment further on the connection between our work and the Patient and Consistent Teachers study by Beyer et al., since in Section 5.1 we note that our findings appear to contradict theirs.
arxiv.org/abs/2106.05237
February 13, 2025 at 9:52 PM
Reading "Distilling Knowledge in a Neural Network" left me fascinated and wondering:
"If I want a small, capable model, should I distill from a more powerful model, or train from scratch?"
Our distillation scaling law shows, well, it's complicated... 🧵
arxiv.org/abs/2502.08606
"If I want a small, capable model, should I distill from a more powerful model, or train from scratch?"
Our distillation scaling law shows, well, it's complicated... 🧵
arxiv.org/abs/2502.08606
Distillation Scaling Laws
We provide a distillation scaling law that estimates distilled model performance based on a compute budget and its allocation between the student and teacher. Our findings reduce the risks associated ...
arxiv.org
February 13, 2025 at 9:50 PM
Reading "Distilling Knowledge in a Neural Network" left me fascinated and wondering:
"If I want a small, capable model, should I distill from a more powerful model, or train from scratch?"
Our distillation scaling law shows, well, it's complicated... 🧵
arxiv.org/abs/2502.08606
"If I want a small, capable model, should I distill from a more powerful model, or train from scratch?"
Our distillation scaling law shows, well, it's complicated... 🧵
arxiv.org/abs/2502.08606