The Yang-Lecun correspondence
The Yang-Lecun correspondence
Lots of interesting post-training details as well. And great performance ofc!
arxiv.org/abs/2504.00698
Lots of interesting post-training details as well. And great performance ofc!
arxiv.org/abs/2504.00698
> c++ code is slower than python
> After a while optimizing it, figure out you forgot to add -O3
> Runs much faster obviously
> At the end the python bindings eat up half of the runtime benefits
🥲🎢
> c++ code is slower than python
> After a while optimizing it, figure out you forgot to add -O3
> Runs much faster obviously
> At the end the python bindings eat up half of the runtime benefits
🥲🎢