Hidde Fokkema
@hiddefokkema.bsky.social
PhD candidate in Mathematical Machine Learning with @tverven | Researching formal XAI | Maths nerd | Occasional producer of electronic music

https://www.hidde-fokkema.com
I did some googling and this article has a surprisingly nice and pedagogical discussion on this, with a similar conclusion to your idea.

tinyurl.com/52a3whac

And I realise I missed the opportunity to joke that the posterior of the simpler model is "sharper", keeping with the razor theme.
Ockham's Razor and Bayesian Analysis on JSTOR
William H. Jefferys, James O. Berger, Ockham's Razor and Bayesian Analysis, American Scientist, Vol. 80, No. 1 (January-February 1992), pp. 64-72
November 3, 2025 at 5:03 PM
(2/2) If we see the complicated model and the simple model as 2 different hypothesis classes, with 2 separate priors, then the posterior for the more complicated class will be flatter than the posterior for the simple class, which is what you want, I think.
November 3, 2025 at 4:38 PM
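A toy numerical illustration of that effect (the Bayesian Ockham's razor discussed in the Jefferys–Berger article linked above): a flexible model spreads its prior over many possible datasets, so on data that both models explain well, the simpler model gets the higher marginal likelihood. The coin-flip setup, the data (11 heads in 20 flips), and the Beta(1, 1) prior are assumptions made purely for this sketch.

```python
# Toy Bayesian Ockham's razor: compare the marginal likelihood (evidence) of a
# "simple" model (fair coin, no free parameters) against a "complex" model
# (unknown bias theta with a uniform Beta(1, 1) prior) on the same data.
import numpy as np
from scipy.special import betaln

n, k = 20, 11  # hypothetical data: 11 heads in 20 flips, close to fair

# Simple model M0: P(heads) = 0.5 exactly, so the evidence of the sequence is 0.5^n.
log_evidence_simple = n * np.log(0.5)

# Complex model M1: P(heads) = theta, theta ~ Beta(1, 1).
# Evidence = integral of theta^k (1 - theta)^(n - k) dtheta = B(k + 1, n - k + 1).
log_evidence_complex = betaln(k + 1, n - k + 1)

print("log evidence M0 (fair):  ", round(log_evidence_simple, 3))
print("log evidence M1 (biased):", round(log_evidence_complex, 3))
# Bayes factor > 1 here: the evidence automatically favours the simpler model.
print("Bayes factor M0 / M1:    ", round(np.exp(log_evidence_simple - log_evidence_complex), 2))
```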
(1/2) Fair point. My point was that anything Bayesian is prior-related, so with the correct prior you could at least recover Ockham's razor, but not really derive it. My thinking is a bit different, though: in my points above the hypothesis classes are the same.

In your idea, if ..
November 3, 2025 at 4:36 PM
(7/n=7) So, in the end, you can get Ockham's razor if your prior is that simple explanations (read: explanations with fewer parameters) are more likely than complicated ones. For binary parameters you could write the prior explicitly. For real-valued parameters this becomes impossible (I am guessing).
November 3, 2025 at 4:14 PM
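One hypothetical way to write such a prior explicitly for the binary case (the 2^(-2k) weighting and the cap K_MAX are my own choices for the sketch, not something from the thread):

```python
# An explicit "simplicity prior" over models with k binary parameters: give each
# concrete k-bit parameter setting weight 2^(-2k), i.e. a uniform 2^(-k) over the
# 2^k settings times an extra 2^(-k) penalty per parameter. Model classes with
# fewer parameters then receive strictly more total prior mass.
K_MAX = 10  # cap on model size so the prior can be normalised

class_mass = {}
for k in range(K_MAX + 1):
    n_settings = 2 ** k                        # number of distinct k-bit parameter vectors
    class_mass[k] = n_settings * 2 ** (-2 * k)  # unnormalised mass of the whole class = 2^(-k)

Z = sum(class_mass.values())                   # normalising constant (finite geometric sum)
for k in range(4):
    print(f"P(class with {k} binary params) = {class_mass[k] / Z:.3f}")
```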
(6/n) Now if you really want to derive Ockham's razor, in the sense of minimal assumptions (or really, the number of parameters), you would need a prior distribution that assigns more probability mass to simple models.
November 3, 2025 at 4:11 PM
(5/n) Similarly, if β ~ Laplace(0, σ^2), then you get the Lasso objective

min ||y - <β, x>||^2 + λ||β||_1

where we now have the 1-norm as the regularization penalty. This one has the added benefit that irrelevant parameters are set exactly to 0, which resembles the original Ockham's razor principle more closely.
November 3, 2025 at 4:09 PM
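A quick sketch of that sparsity effect, assuming scikit-learn is available (the toy data with only 2 relevant features out of 10, and alpha=0.1, are illustrative choices):

```python
# Fit Lasso (L1 penalty, MAP under a Laplace prior) and Ridge (L2 penalty, MAP
# under a Gaussian prior) on data where only 2 of 10 features matter, and
# compare which coefficients get driven exactly to 0.
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
beta_true = np.zeros(10)
beta_true[:2] = [3.0, -2.0]                  # only the first two features are relevant
y = X @ beta_true + 0.5 * rng.normal(size=200)

lasso = Lasso(alpha=0.1).fit(X, y)
ridge = Ridge(alpha=0.1).fit(X, y)

print("lasso:", np.round(lasso.coef_, 2))    # irrelevant coefficients should be exactly 0
print("ridge:", np.round(ridge.coef_, 2))    # irrelevant coefficients are small but nonzero
```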
(4/n) writing out the posterior density and maximising it over the parameters (maximum a posteriori, i.e. MAP, inference). How much you regularise is determined by σ, and there is a direct relation between σ and λ.
November 3, 2025 at 4:07 PM
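Spelling out that relation, under the extra assumption of Gaussian observation noise with variance σ_n² and design matrix X (notation mine):

```latex
\[
\hat{\beta}_{\mathrm{MAP}}
  = \arg\min_{\beta}\left[ \tfrac{1}{2\sigma_n^2}\lVert y - X\beta\rVert^2
      + \tfrac{1}{2\sigma^2}\lVert \beta\rVert^2 \right]
  = \arg\min_{\beta}\left[ \lVert y - X\beta\rVert^2 + \lambda \lVert \beta\rVert^2 \right],
\qquad \lambda = \frac{\sigma_n^2}{\sigma^2}.
\]
```

So a tighter prior (small σ) or noisier data (large σ_n) means stronger regularisation.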
(3/n) Let's say we consider as possible models all linear models, and as complexity measure the Euclidean norm of the parameters (this is ridge regression). Then we would retrieve the optimisation problem:

min ||y - <β, x>||^2 + λ||β||^2

By assuming that β ~ N(0, σ^2) and ...
November 3, 2025 at 4:04 PM
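A small numerical sketch of that ridge objective, using its closed-form solution (the toy data and λ = 1.0 are illustrative choices):

```python
# Penalised least squares: min ||y - X beta||^2 + lam * ||beta||^2 has the
# closed-form solution beta = (X^T X + lam * I)^{-1} X^T y.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
beta_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ beta_true + 0.3 * rng.normal(size=100)

lam = 1.0
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(5), X.T @ y)

# Increasing lam shrinks all coefficients towards 0, i.e. penalises complexity harder.
print(np.round(beta_ridge, 3))
```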
(2/n) In particular, this would give you the model with the fewest assumptions: if you consider 2 models that explain the data equally well, but one has fewer assumptions, and the number of assumptions is the complexity measure you use, then the simpler model wins.
November 3, 2025 at 4:01 PM
Sure! Here are some thoughts

(1/n) I would see Ockham's razor as the following optimisation problem:

min Error(data, model) + Complexity(model)

where you minimise over all models.
November 3, 2025 at 3:59 PM
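One toy way to make that optimisation concrete, with polynomial degree as a stand-in complexity measure (the penalty of 0.02 per parameter is an arbitrary choice for the sketch):

```python
# "min over models of Error(data, model) + Complexity(model)": select a
# polynomial degree by penalising the mean squared error with the parameter count.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 40)
y = 1.5 * x - 0.8 * x**2 + 0.1 * rng.normal(size=x.size)   # "true" model has degree 2

def penalised_score(degree, penalty=0.02):
    coeffs = np.polyfit(x, y, degree)
    error = np.mean((np.polyval(coeffs, x) - y) ** 2)       # Error(data, model)
    complexity = penalty * (degree + 1)                      # Complexity(model) = #parameters
    return error + complexity

best = min(range(10), key=penalised_score)
print("selected degree:", best)   # higher degrees fit marginally better but pay for the extra parameters
```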
If you see Ockham's razor as a regularization mechanism, because you optimize to fit the data while penalizing the parameters, then there are explicit connections. For example, ridge regression follows from assuming a Gaussian prior on the parameters, and Lasso regression follows from a Laplace prior.
November 3, 2025 at 3:29 PM
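For completeness, that explicit connection in compact form (a standard MAP derivation; the Laplace scale b and the rest of the notation are mine):

```latex
\[
\hat{\beta}_{\mathrm{MAP}}
  = \arg\max_{\beta}\, p(\beta \mid \mathrm{data})
  = \arg\min_{\beta}\Big[ -\log p(\mathrm{data}\mid\beta) - \log p(\beta) \Big]
\]
% Gaussian prior \beta_i \sim \mathcal{N}(0,\sigma^2):
%   -\log p(\beta) = \tfrac{1}{2\sigma^2}\lVert\beta\rVert_2^2 + \mathrm{const} \;\Rightarrow\; \text{ridge (L2) penalty}
% Laplace prior \beta_i \sim \mathrm{Laplace}(0, b) with scale b:
%   -\log p(\beta) = \tfrac{1}{b}\lVert\beta\rVert_1 + \mathrm{const} \;\Rightarrow\; \text{lasso (L1) penalty}
```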
Oh, and the timing of the Q2B conference being this week probably also factors in, so they can hype it a bit more there.
December 10, 2024 at 9:25 PM
My guess would be because the Nature version of the article was just published?
December 10, 2024 at 9:22 PM
Aren't these dual numbers? I think Julia has some autodiff packages based on this idea.
November 29, 2024 at 3:34 PM
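A minimal sketch of the dual-number idea (forward-mode autodiff) in Python; Julia's ForwardDiff.jl is, as far as I know, built on exactly this, and the class and function names below are made up for the example:

```python
# Forward-mode autodiff with dual numbers: carry (value, derivative) pairs and
# propagate them through arithmetic using the rule eps^2 = 0.
from dataclasses import dataclass

@dataclass
class Dual:
    val: float   # function value
    eps: float   # derivative part (coefficient of the nilpotent unit eps)

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(float(other), 0.0)
        return Dual(self.val + other.val, self.eps + other.eps)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(float(other), 0.0)
        # product rule: (a + a' eps)(b + b' eps) = ab + (a b' + a' b) eps, since eps^2 = 0
        return Dual(self.val * other.val, self.val * other.eps + self.eps * other.val)

    __rmul__ = __mul__

def derivative(f, x):
    # Seed the derivative part with 1.0 and read off the eps coefficient.
    return f(Dual(float(x), 1.0)).eps

f = lambda x: 3 * x * x + 2 * x + 1
print(derivative(f, 4.0))   # 6 * 4 + 2 = 26.0
```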