calccon.bsky.social
SETOL: SemiEmpirical Theory of (Deep) Learning
The draft is just about ready

Why weightwatcher--and the HTSR theory--work
github.com/CalculatedCo...
June 27, 2025 at 3:27 PM
August 12, 2025 at 5:23 PM
Where do HTSR and the weightwatcher theory come from? 𝐓𝐡𝐞 𝐖𝐢𝐥𝐬𝐨𝐧 𝐄𝐱𝐚𝐜𝐭 𝐑𝐞𝐧𝐨𝐫𝐦𝐚𝐥𝐢𝐳𝐚𝐭𝐢𝐨𝐧 𝐆𝐫𝐨𝐮𝐩

A new principle of learning that is fundamental to our understanding of AI 🧠

I have a draft of the theory monograph up on GitHub, and it is just about ready.

lnkd.in/gBsZ-QKF
June 11, 2025 at 4:39 AM
🎉 🚀 𝐀𝐩𝐩𝐫𝐨𝐚𝐜𝐡𝐢𝐧𝐠 𝟐𝟎𝟎𝐊 𝐝𝐨𝐰𝐧𝐥𝐨𝐚𝐝𝐬 🥳 💯

𝐖𝐞𝐢𝐠𝐡𝐭𝐖𝐚𝐭𝐜𝐡𝐞𝐫: 𝐃𝐚𝐭𝐚-𝐅𝐫𝐞𝐞 𝐃𝐢𝐚𝐠𝐧𝐨𝐬𝐭𝐢𝐜𝐬 𝐟𝐨𝐫 𝐃𝐞𝐞𝐩 𝐋𝐞𝐚𝐫𝐧𝐢𝐧𝐠

WeightWatcher is based on theoretical research into 𝑾𝒉𝒚 𝑫𝒆𝒆𝒑 𝑳𝒆𝒂𝒓𝒏𝒊𝒏𝒈 𝑾𝒐𝒓𝒌𝒔, using the new 𝐓𝐡𝐞𝐨𝐫𝐲 𝐨𝐟 𝐇𝐞𝐚𝐯𝐲-𝐓𝐚𝐢𝐥𝐞𝐝 𝐒𝐞𝐥𝐟-𝐑𝐞𝐠𝐮𝐥𝐚𝐫𝐢𝐳𝐚𝐭𝐢𝐨𝐧 (HTSR), published in JMLR, Nature Communications, and NeurIPS

weightwatcher.ai
June 2, 2025 at 10:20 PM
March 30, 2025 at 4:47 AM
Reminder: tomorrow at 10AM PST we will be finishing the table read of section 4.2 of the SETOL monograph

Here's the video from last week
www.youtube.com/watch?v=0WhB...

and the latest version of the paper can be found in theory-paper
SETOL Paper Table Read Section 4 Part 1
YouTube video by Calculation Consulting
February 13, 2025 at 7:25 PM
SETOL: SemiEmpirical Theory of (Deep) Learning & the connection to the Renormalization Group

Turns out, AI models obey a fundamental law of physics when they are trained well.

A big thanks to the ML Research Jam for giving me the opportunity to present.

www.slideshare.net/slideshow/se...
January 22, 2025 at 11:10 PM
"We have not used perturbation theory—we have used an axe on the Hamiltonian"
Ken Wilson (Nobel Prize in Physics, 1982)
January 21, 2025 at 6:33 AM
The weightwatcher theory paper is just about ready.

SETOL: SemiEmpirical Theory of (Deep) Learning

It's been a passion project of mine for nearly 10 years. Before submitting it, I'd like to have a few people read it carefully and comment.
January 20, 2025 at 5:16 PM
How did I come up with the idea to look for power-law signatures in deep learning models? Why does power-law behavior matter in neural systems and deep learning? Here’s the story:
January 19, 2025 at 6:56 PM
The quality of an NN layer is described by an Effective Free Energy, obtained through a volume-preserving change of measure analogous to taking a single step of the Wilson Exact Renormalization Group.

For an ideal layer (alpha=2), this can be experimentally validated with weightwatcher
January 17, 2025 at 12:46 AM
You can see the emerging signatures of the Wilson Exact Renormalization Group in the best trained layers of modern LLMs like Llama and Falcon. If you're an old physics supernerd like me, that's super cool. But it's also super useful for AI people.

weightwatcher.ai
January 13, 2025 at 5:34 AM
A quick weightwatcher workup on the new Falcon3 base models. As predicted by theory, the average weightwatcher layer alpha systematically decreases with increasing model size. The exception is the 10B model, which is an upscaled model.
December 19, 2024 at 6:19 AM
The theory behind weightwatcher is essentially an application of the Wilson Renormalization Group. The power-law (PL) exponent alpha=2 is the analogue of a critical exponent, separating the good-generalization and overfit (i.e., spin-glass) phases of the NN layer.
December 10, 2024 at 7:59 AM
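For readers who want a feel for what "fitting a PL exponent alpha" means in the post above, here is a minimal sketch using the standard maximum-likelihood (Hill-type) estimator for a continuous power law. It is an illustration only: the function name `fit_alpha`, the fixed `xmin`, and the synthetic data are assumptions, and this is not weightwatcher's own fitting code, which the posts do not describe.

```python
import math
import random


def fit_alpha(values, xmin):
    """MLE of alpha for a density p(x) ~ x^(-alpha) on x >= xmin.

    Hypothetical helper for illustration; xmin is taken as given here,
    whereas a real fit would also have to choose it.
    """
    tail = [x for x in values if x >= xmin]
    if len(tail) < 2:
        raise ValueError("need at least two values in the tail")
    log_sum = sum(math.log(x / xmin) for x in tail)
    return 1.0 + len(tail) / log_sum


# Synthetic stand-in for the tail of a layer's eigenvalue spectrum (made-up data).
# random.paretovariate(a) has density ~ x^-(a+1) on x >= 1, so a=1 gives alpha=2.
rng = random.Random(0)
synthetic_tail = [rng.paretovariate(1.0) for _ in range(20000)]

alpha_hat = fit_alpha(synthetic_tail, xmin=1.0)
print(f"fitted alpha = {alpha_hat:.3f}")
```

With a sample this large the estimate should land close to 2; on real layer spectra the result also depends on how the start of the tail is chosen.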
December 8, 2024 at 12:16 AM
Why is alpha=2 the ideal state of an NN layer? In our upcoming monograph, A SemiEmpirical Theory of (Deep) Learning, we show that the HTSR metrics can be derived from a phenomenological Effective Hamiltonian, but one that is governed by a scale-invariant partition function, just like the Wilson RG.
December 7, 2024 at 1:04 AM
Llama3-70B is Baking
December 7, 2024 at 12:10 AM
updated history.
December 2, 2024 at 9:34 PM
What is a SemiEmpirical Theory? Second pass. Comments? Suggestions?
November 30, 2024 at 5:19 PM
What is a SemiEmpirical Theory?
November 29, 2024 at 8:50 PM
The SETOL layer quality metric, derived from statistical mechanics, correlated perfectly with the HTSR alpha layer quality metric. A good sign.
November 28, 2024 at 7:35 PM
OLMo-7B. Looks pretty good!
November 28, 2024 at 3:17 PM
The weightwatcher theory (SETOL) posits that the quality of an NN layer is given by a sum of the integrated R-transform R(z) over the power-law tail of the ESD. When alpha=2, the Inverse Wishart model is a good model of the ESD, and the branch cut starts right at the tail.
November 27, 2024 at 7:14 PM
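For readers unfamiliar with the R-transform mentioned in the post above, the usual free-probability definition is sketched below. The sign and normalization conventions are the common textbook ones and are an assumption here; the posts do not say which convention the SETOL monograph uses, nor do they give the exact form of the layer quality sum, so neither is reproduced.

```latex
% Green's function (Stieltjes transform) of the ESD \rho(\lambda)
\begin{align}
  G(z) &= \int \frac{\rho(\lambda)}{z - \lambda}\, d\lambda \\
% Blue function B: the functional inverse of G
  B\bigl(G(z)\bigr) &= z \\
% R-transform
  R(z) &= B(z) - \frac{1}{z}
\end{align}
% Simple worked example (not the Inverse Wishart case from the post):
% for a Wigner semicircle ESD with variance \sigma^2,
%   G(z) = \frac{z - \sqrt{z^2 - 4\sigma^2}}{2\sigma^2},
%   \qquad R(z) = \sigma^2 z .
```

The semicircle case has an R-transform with no singularities, in contrast to the branch cut the post describes for the Inverse Wishart model.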