The tutorial to do so is here: ott-jax.readthedocs.io/tutorials/ne...
"If I want a small, capable model, should I distill from a more powerful model, or train from scratch?"
Our distillation scaling law shows, well, it's complicated... 🧵
arxiv.org/abs/2502.08606
We explored this through the lens of MoEs:
www.nature.com/articles/s41...
arxiv.org/abs/2412.13303
I’m really proud of Apple ML research and wanted to share a summary that may be useful for #NeurIPS2024 attendees (and everyone else)! I’m particularly excited about code and model releases and will highlight some here.
1/n