Nicolas Beltran-Velez
@velezbeltran.bsky.social
Machine Learning PhD Student @ Blei Lab & Columbia University.

Working on probabilistic ML | uncertainty quantification | LLM interpretability.

Excited about everything ML, AI and engineering!
Reposted by Nicolas Beltran-Velez
This is probably not the complete picture of KD, but I can definitely sleep better after writing down and confirming this minimal working explanation.

arXiv: arxiv.org/abs/2505.13111

(3/4)
Why Knowledge Distillation Works in Generative Models: A Minimal Working Explanation
Knowledge distillation (KD) is a core component in the training and deployment of modern generative models, particularly large language models (LLMs). While its empirical benefits are well documented-...
arxiv.org
May 20, 2025 at 12:18 PM
Tests!! :)
January 25, 2025 at 7:50 PM
But the memory needed for the value function kills the ones that don't have good GPUs 😭
January 25, 2025 at 3:36 PM
I mostly use Copilot for writing code (as autocomplete), GPT-4o for boilerplate, and o1 for serious debugging or boilerplate with some complexity or a lot of requirements. I also use o1 for quick but slightly involved experiments, but not as often.
January 8, 2025 at 7:36 PM
I use ChatGPT over Google for a lot of things because it is really good at fuzzy queries + data aggregation from many sources. I feel that as long as you double-check the results, it is much faster and more convenient.
January 7, 2025 at 12:40 AM