calccon.bsky.social
@calccon.bsky.social
August 12, 2025 at 5:23 PM
Where do HTSR and the weightwatcher theory come from? 𝐓𝐡𝐞 𝐖𝐢𝐥𝐬𝐨𝐧 𝐄𝐱𝐚𝐜𝐭 𝐑𝐞𝐧𝐨𝐫𝐦𝐚𝐥𝐢𝐳𝐚𝐭𝐢𝐨𝐧 𝐆𝐫𝐨𝐮𝐩

A new principle of learning, one that is fundamental to our understanding of AI 🧠

I have a draft of the theory monograph up on GitHub, and it is just about ready

lnkd.in/gBsZ-QKF
June 11, 2025 at 4:39 AM
🎉 🚀 𝐀𝐩𝐩𝐫𝐨𝐚𝐜𝐡𝐢𝐧𝐠 𝟐𝟎𝟎𝐊 𝐝𝐨𝐰𝐧𝐥𝐨𝐚𝐝𝐬 🥳 💯

𝐖𝐞𝐢𝐠𝐡𝐭𝐖𝐚𝐭𝐜𝐡𝐞𝐫: 𝐃𝐚𝐭𝐚-𝐅𝐫𝐞𝐞 𝐃𝐢𝐚𝐠𝐧𝐨𝐬𝐭𝐢𝐜𝐬 𝐟𝐨𝐫 𝐃𝐞𝐞𝐩 𝐋𝐞𝐚𝐫𝐧𝐢𝐧𝐠

WeightWatcher is based on theoretical research into 𝑾𝒉𝒚 𝑫𝒆𝒆𝒑 𝑳𝒆𝒂𝒓𝒏𝒊𝒏𝒈, using the new 𝐓𝐡𝐞𝐨𝐫𝐲 𝐨𝐟 𝐇𝐞𝐚𝐯𝐲-𝐓𝐚𝐢𝐥𝐞𝐝 𝐒𝐞𝐥𝐟-𝐑𝐞𝐠𝐮𝐥𝐚𝐫𝐢𝐳𝐚𝐭𝐢𝐨𝐧 (HTSR), published in JMLR, Nature Comm., and NeurIPS

weightwatcher.ai
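To give a flavor of what a data-free diagnostic looks like: the core HTSR measurement is computing a layer's empirical spectral density (ESD) and fitting a power-law exponent alpha to its tail. A minimal numpy sketch of that idea — `esd` and `fit_alpha` are illustrative helpers I wrote for this post, not the actual weightwatcher API:

```python
import numpy as np

def esd(W):
    """Empirical spectral density: eigenvalues of the correlation
    matrix X = (1/N) W^T W for an N x M weight matrix W."""
    N = W.shape[0]
    X = (W.T @ W) / N
    return np.linalg.eigvalsh(X)

def fit_alpha(evals, xmin):
    """Continuous power-law MLE (Clauset-Shalizi-Newman form) for a
    tail density rho(lambda) ~ lambda^(-alpha), lambda >= xmin."""
    tail = evals[evals >= xmin]
    return 1.0 + len(tail) / np.sum(np.log(tail / xmin))

# A random Gaussian layer has a Marchenko-Pastur (not heavy-tailed) ESD;
# it is the well-trained layers that develop power-law tails.
rng = np.random.default_rng(0)
W = rng.standard_normal((300, 100))
print(esd(W).max())
```

The real tool also selects `xmin` automatically and reports per-layer summaries; this sketch only shows the shape of the computation.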
June 2, 2025 at 10:20 PM
March 30, 2025 at 4:47 AM
The weightwatcher theory paper is just about ready.

SETOL: SemiEmpirical Theory of (Deep) Learning

It's been a passion project of mine for nearly 10 years. Before submitting it, I'd like to have a few people read it carefully and comment
January 20, 2025 at 5:16 PM
How did I come up with the idea to look for power-law signatures in deep learning models? Why does power-law behavior matter in neural systems and deep learning? Here’s the story:
January 19, 2025 at 6:56 PM
The quality of a NN layer is given by an Effective Free Energy, obtained through a volume-preserving change of measure analogous to taking a single step of the Wilson Exact Renormalization Group

For an ideal layer (alpha=2), this can be experimentally validated with weightwatcher
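If I've read the claim correctly, "volume-preserving" means the determinant over the tail of the ESD is 1, so the sum of log-eigenvalues should vanish for an ideal (alpha = 2) layer. A toy numeric check of that condition — `trace_log` is a hypothetical helper of mine, not part of weightwatcher, and the vanishing condition is my reading of the post, not the monograph's exact statement:

```python
import numpy as np

def trace_log(tail_evals):
    # Log-determinant of the ESD tail: sum_i ln(lambda_i).
    # Volume preservation (det = 1) means this sum is ~ 0.
    return float(np.sum(np.log(tail_evals)))

# Toy tail whose eigenvalue product is exactly 1 -> volume preserving
print(trace_log(np.array([0.5, 1.0, 2.0])))
```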
January 17, 2025 at 12:46 AM
You can see the emerging signatures of the Wilson Exact Renormalization Group in the best trained layers of modern LLMs like Llama and Falcon. If you're an old physics supernerd like me, that's super cool. But it's also super useful for AI people.

weightwatcher.ai
January 13, 2025 at 5:34 AM
A quick weightwatcher workup on the new Falcon3 base models. As predicted by theory, the average weightwatcher layer alpha systematically decreases with increasing model size. The exception is the 10B model, which is an upscaled model.
December 19, 2024 at 6:19 AM
The theory behind weightwatcher is essentially an application of the Wilson Renormalization Group. The PL exponent alpha=2 is the analogous critical exponent separating the good generalization and overfit (i.e., spin-glass) phases of the NN layer.
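The alpha = 2 boundary lends itself to a back-of-the-envelope layer triage. The thresholds below are the ranges commonly quoted in the HTSR papers (alpha < 2 over-fit / spin-glass, roughly 2 to 6 heavy-tailed and well-trained, above that random-like); this is my illustrative sketch, not the official tool's logic:

```python
def layer_phase(alpha: float) -> str:
    """Rough HTSR-style phase label for a fitted power-law exponent alpha.
    Thresholds are the commonly quoted HTSR ranges, used here illustratively."""
    if alpha < 2.0:
        return "over-fit (spin-glass phase)"
    if alpha <= 6.0:
        return "heavy-tailed, well-generalizing"
    return "under-trained / random-like"

for a in (1.5, 2.0, 3.5, 8.0):
    print(a, layer_phase(a))
```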
December 10, 2024 at 7:59 AM
The claims on Hopfield make no sense. Hopfield cited a 1977 Amari paper that literally says "This work is a development of my 1971 work". Yet, somehow, citing this 1977 paper is plagiarism, but citing the 1971 paper would not be?!

This is just not credible.

www.researchgate.net/publication/...
December 8, 2024 at 8:01 AM
December 8, 2024 at 12:16 AM
Why is alpha=2 the ideal state of a NN layer? In our upcoming monograph, A SemiEmpirical Theory of (Deep) Learning, we show that the HTSR metrics can be derived from a phenomenological Effective Hamiltonian, but one that is governed by a scale-invariant partition function, just like the Wilson RG
December 7, 2024 at 1:04 AM
Llama3-70B is Baking
December 7, 2024 at 12:10 AM
updated history.
December 2, 2024 at 9:34 PM
What is a SemiEmpirical Theory? Second pass. Comments? Suggestions?
November 30, 2024 at 5:19 PM
What is a SemiEmpirical Theory?
November 29, 2024 at 8:50 PM
The SETOL layer quality metric, derived from statistical mechanics, correlated perfectly with the HTSR alpha layer quality metric. Good sign
November 28, 2024 at 7:35 PM
OLMo-7B. Looks pretty good!
November 28, 2024 at 3:17 PM
The weightwatcher theory (SETOL) posits that the quality of a NN layer is given by a sum of the integrated R-transform R(z) over the power-law tail of the ESD. When alpha=2, the Inverse Wishart model is a good model of the ESD, and the branch cut starts right at the tail.
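In symbols, as I read the post (a sketch of the claim, not the monograph's exact normalization), the layer quality is the R-transform integrated over the tail:

```latex
% Layer quality ~ integrated R-transform over the power-law tail of the ESD,
% tail = { lambda : lambda >= lambda_min }
\mathcal{Q} \;\sim\; \int_{\lambda_{\min}}^{\lambda_{\max}} R(z)\, dz
```

with the claim that at alpha = 2 the Inverse-Wishart fit becomes good and the branch cut of R(z) begins right at the start of the tail, lambda_min.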
November 27, 2024 at 7:14 PM
I recently worked up a bunch of examples of Instruction Fine-Tuned models using weightwatcher. I hope they are useful to you.

weightwatcher.ai/models.html
November 26, 2024 at 5:31 PM
WeightWatcher (w|w) is an open-source diagnostic tool for analyzing Deep Neural Networks (DNNs) without needing access to training or even test data.
Here are over a dozen new examples of popular Instruction Fine-Tuned models.
weightwatcher.ai/models.html
November 22, 2024 at 7:11 AM