Francis Bach
@bachfrancis.bsky.social
Researcher in machine learning
Not all scaling laws are nice power laws. This month’s blog post: Zipf’s law in next-token prediction and why Adam (ok, sign descent) scales better to large vocab sizes than gradient descent: francisbach.com/scaling-laws...
September 27, 2025 at 2:57 PM
Tired of lengthy computations to derive scaling laws? This post is made for you: discover the sharpness of the z-transform!
francisbach.com/z-transform/
July 18, 2025 at 2:39 PM