Lightnews — Scholar-powered news

Reposted

calccon.bsky.social

@calccon.bsky.social

SETOL: SemiEmpirical Theory of (Deep) Learning
The draft is just about ready

Why weightwatcher--and the HTSR theory--work
github.com/CalculatedCo...

github.com

June 27, 2025 at 3:27 PM

calccon.bsky.social

@calccon.bsky.social

August 12, 2025 at 5:23 PM

calccon.bsky.social

@calccon.bsky.social

Where do HTSR and the weightwatcher theory come from? The Wilson Exact Renormalization Group

A new principle of learning that is not only fundamental to our understanding of AI 🧠

I have a draft of the theory monograph up on github, and it is just about ready

lnkd.in/gBsZ-QKF

June 11, 2025 at 4:39 AM

calccon.bsky.social

@calccon.bsky.social

🎉 🚀 Approaching 200K downloads 🥳 💯

WeightWatcher: Data-Free Diagnostics for Deep Learning

WeightWatcher is based on theoretical research into Why Deep Learning, using the new Theory of Heavy-Tailed Self-Regularization (HTSR), published in JMLR, Nature Comm., and NeurIPS

weightwatcher.ai

June 2, 2025 at 10:20 PM

calccon.bsky.social

@calccon.bsky.social

github.com/CalculatedCo...

March 30, 2025 at 4:47 AM

calccon.bsky.social

@calccon.bsky.social

Reminder: tomorrow at 10AM PST we will be finishing the table read of section 4.2 of the SETOL monograph

Here's the video from last week
www.youtube.com/watch?v=0WhB...

and the latest version of the paper can be found in theory-paper

SETOL Paper Table Read Section 4 Part 1

YouTube video by Calculation Consulting

www.youtube.com

February 13, 2025 at 7:25 PM

calccon.bsky.social

@calccon.bsky.social

SETOL: SemiEmpirical Theory of (Deep) Learning and its connection to the Renormalization Group

Turns out, AI models obey a fundamental law of physics when they are trained well.

A big thanks to the ML Research Jam for giving me the opportunity to present.

www.slideshare.net/slideshow/se...

SETOL: SemiEmpirical Theory of (Deep Learning)

www.slideshare.net

January 22, 2025 at 11:10 PM

calccon.bsky.social

@calccon.bsky.social

"We have not used perturbation theory—we have used an axe on the Hamiltonian"
Ken Wilson (Nobel Prize in Physics, 1982)

January 21, 2025 at 6:33 AM

calccon.bsky.social

@calccon.bsky.social

The weightwatcher theory paper is just about ready.

SETOL: SemiEmpirical Theory of (Deep) Learning

It's been a passion project of mine for nearly 10 years. Before submitting it, I'd like to have a few people read it carefully and comment.

January 20, 2025 at 5:16 PM

calccon.bsky.social

@calccon.bsky.social

How did I come up with the idea to look for power-law signatures in deep learning models? Why does power-law behavior matter in neural systems and deep learning? Here's the story:

January 19, 2025 at 6:56 PM

calccon.bsky.social

@calccon.bsky.social

The quality of an NN layer is described by an Effective Free Energy, obtained through a volume-preserving change of measure that is analogous to taking a single step of the Wilson Exact Renormalization Group.

For an ideal layer (alpha=2), this can be experimentally validated with weightwatcher

January 17, 2025 at 12:46 AM

calccon.bsky.social

@calccon.bsky.social

You can see the emerging signatures of the Wilson Exact Renormalization Group in the best trained layers of modern LLMs like Llama and Falcon. If you're an old physics supernerd like me, that's super cool. But it's also super useful for AI people.

weightwatcher.ai

January 13, 2025 at 5:34 AM

calccon.bsky.social

@calccon.bsky.social

calculatedcontent.com/2024/12/24/w...

WeightWatcher, HTSR theory, and the Renormalization Group

There is a deep connection between the open-source weightwatcher tool, which implements ideas from the theory of Heavy Tailed Self-Regularization (HTSR) of Deep Neural Networks (DNNs), and the Wils…

calculatedcontent.com

December 25, 2024 at 6:13 PM

calccon.bsky.social

@calccon.bsky.social

A quick weightwatcher workup on the new Falcon3 base models. As predicted by theory, the average weightwatcher layer alpha systematically decreases with increasing model size. The exception is the 10B model, which is an upscaled model.

December 19, 2024 at 6:19 AM

calccon.bsky.social

@calccon.bsky.social

The theory behind weightwatcher is essentially an application of the Wilson Renormalization Group. The PL exponent alpha=2 is the analogous critical exponent separating the good generalization and overfit (i.e., spin-glass) phases of the NN layer.

December 10, 2024 at 7:59 AM
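
For readers who have not fit a power-law (PL) exponent before, here is a minimal sketch of how an alpha can be estimated from the tail of a set of positive values such as layer eigenvalues. It uses the standard continuous maximum-likelihood estimator with a fixed tail cutoff. It is not weightwatcher's own fitting procedure, and the function names, the synthetic data, and the choice of xmin are all assumptions made for illustration.

```python
import math
import random


def estimate_pl_alpha(values, xmin):
    """Continuous power-law MLE: alpha_hat = 1 + n / sum(ln(x / xmin)) over x >= xmin."""
    tail = [x for x in values if x >= xmin]
    if not tail:
        raise ValueError("no values at or above xmin")
    log_sum = sum(math.log(x / xmin) for x in tail)
    if log_sum == 0.0:
        raise ValueError("all tail values equal xmin; exponent is undefined")
    return 1.0 + len(tail) / log_sum


def sample_power_law(alpha, n, seed=0):
    """Draw n synthetic values with density proportional to x**(-alpha) for x >= 1."""
    rng = random.Random(seed)
    # random.paretovariate(a) has density a / x**(a + 1), so a = alpha - 1.
    return [rng.paretovariate(alpha - 1.0) for _ in range(n)]


if __name__ == "__main__":
    # Hypothetical stand-in for a layer's eigenvalue tail, generated with alpha = 2.
    synthetic_eigenvalues = sample_power_law(alpha=2.0, n=20000)
    alpha_hat = estimate_pl_alpha(synthetic_eigenvalues, xmin=1.0)
    print(f"estimated alpha: {alpha_hat:.3f}")
```

On real spectra the cutoff xmin is not known in advance and has to be chosen as part of the fit, so the fixed value used here is a simplification.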

calccon.bsky.social

@calccon.bsky.social

December 8, 2024 at 12:16 AM

calccon.bsky.social

@calccon.bsky.social

Why is alpha=2 the ideal state of an NN layer? In our upcoming monograph, A SemiEmpirical Theory of (Deep) Learning, we show that the HTSR metrics can be derived as a phenomenological Effective Hamiltonian, but one that is governed by a scale-invariant partition function, just like the Wilson RG.

December 7, 2024 at 1:04 AM

calccon.bsky.social

@calccon.bsky.social

Llama3-70B is Baking

December 7, 2024 at 12:10 AM

calccon.bsky.social

@calccon.bsky.social

updated history.

December 2, 2024 at 9:34 PM

calccon.bsky.social

@calccon.bsky.social

What is a SemiEmpirical Theory? Second pass. Comments? Suggestions?

November 30, 2024 at 5:19 PM

calccon.bsky.social

@calccon.bsky.social

What is a SemiEmpirical Theory?

November 29, 2024 at 8:50 PM

calccon.bsky.social

@calccon.bsky.social

The SETOL layer quality metric, derived from statistical mechanics, correlated perfectly with the HTSR alpha layer quality metric. A good sign.

November 28, 2024 at 7:35 PM

calccon.bsky.social

@calccon.bsky.social

OLMo-7B. Looks pretty good!

November 28, 2024 at 3:17 PM

calccon.bsky.social

@calccon.bsky.social

The weightwatcher theory (SETOL) posits that the quality of an NN layer is given by a sum of the integrated R-transform R(z) over the power-law tail of the ESD. When alpha=2, the Inverse Wishart model is a good model of the ESD, and the branch cut starts right at the tail.

November 27, 2024 at 7:14 PM
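
As background for the post above, the random-matrix objects it names can be written out in standard textbook notation. This is only a sketch of the usual definitions of the empirical spectral density (ESD), its Green's function, and the R-transform. It is not the SETOL layer quality formula itself, and the symbols $\rho$, $G$, $M$, and $\lambda_{\min}$ are assumptions introduced here.

```latex
\begin{align}
\rho(\lambda) &= \frac{1}{M}\sum_{i=1}^{M}\delta(\lambda-\lambda_i)
  && \text{(ESD of the $M$ eigenvalues $\lambda_i$)} \\
G(z) &= \int \frac{\rho(\lambda)}{z-\lambda}\,d\lambda
  && \text{(Green's function, or Stieltjes transform, of the ESD)} \\
R(z) &= G^{-1}(z) - \frac{1}{z}
  && \text{(R-transform; $G^{-1}$ is the functional inverse of $G$)} \\
\rho(\lambda) &\sim \lambda^{-\alpha}, \quad \lambda \ge \lambda_{\min}
  && \text{(power-law tail with exponent $\alpha$)}
\end{align}
```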
