Mathurin Massias
@mathurinmassias.bsky.social
Tenured Researcher @INRIA, Ockham team. Teacher @Polytechnique
and @ENSdeLyon

Machine Learning, Python and Optimization
🌀🌀🌀New paper on the generation phases of Flow Matching arxiv.org/abs/2510.24830
Are FM & diffusion models nothing more than denoisers at every noise level?
In theory yes, *if trained optimally*. But in practice, do all noise levels matter equally?

with @annegnx.bsky.social, S Martin & R Gribonval
November 5, 2025 at 9:03 AM
Yet flow matching generates new samples!

A hypothesis to explain this paradox is target stochasticity: FM targets the conditional velocity field, i.e. only a stochastic approximation of the full velocity field u*

*We refute this hypothesis*: very early on, the approximation almost equals u*
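For reference, the standard flow matching identity behind this claim (notation assumed here for illustration, not quoted from the paper): the full velocity field is the conditional expectation of the per-sample target.

```latex
% Standard flow matching identity (notation assumed for illustration):
% the full (marginal) velocity field is the conditional expectation of the
% conditional velocity given the current position, which is what the FM
% regression loss recovers at the optimum.
\[
  u^*(x, t) \;=\; \mathbb{E}\left[\, u_t(x \mid X_1) \;\middle|\; X_t = x \,\right]
\]
```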
June 18, 2025 at 8:11 AM
New paper on the generalization of Flow Matching www.arxiv.org/abs/2506.03719

🤯 Why does flow matching generalize? Did you know that the flow matching target you're trying to learn *can only generate training points*?
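A hedged way to state the claim (my formalization, not quoted from the paper): the FM target is built for the empirical training distribution.

```latex
% Illustration only (my notation): the target is the empirical distribution,
% a mixture of Diracs on the n training points
\[
  \hat p_1 \;=\; \frac{1}{n}\sum_{i=1}^{n} \delta_{x_i},
\]
% so the exact velocity field transports the base distribution onto \hat p_1:
% integrating its ODE up to t = 1 can only land on x_1, ..., x_n.
```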

with @quentinbertrand.bsky.social @annegnx.bsky.social @remiemonet.bsky.social 👇👇👇
June 18, 2025 at 8:08 AM
FM is a technique to train continuous normalizing flows (CNFs) that progressively transform a simple base distribution into the target one
Two benefits:
- no need to compute likelihoods or to solve an ODE during training (see the sketch below)
- makes the problem better posed by defining a *unique sequence of densities* from base to target
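To make the first benefit concrete, here is a minimal PyTorch sketch of one conditional FM training step (straight-line interpolation path assumed; `model` and `optimizer` are placeholders, not code from the tutorial or papers):

```python
# Minimal conditional flow matching training step (illustrative sketch).
# Assumes the common straight-line path x_t = (1 - t) x0 + t x1, whose
# conditional velocity target is simply x1 - x0: no likelihood, no ODE solve.
import torch

def fm_training_step(model, optimizer, x1):
    """One gradient step of conditional FM on a batch x1 of data samples."""
    x0 = torch.randn_like(x1)                      # base (Gaussian) samples
    t = torch.rand(x1.shape[0], 1)                 # random times in [0, 1]
    xt = (1 - t) * x0 + t * x1                     # point on the straight path
    target = x1 - x0                               # conditional velocity target
    loss = ((model(xt, t) - target) ** 2).mean()   # regress predicted velocity
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```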
November 27, 2024 at 9:05 AM
Anne Gagneux, Ségolène Martin, @quentinbertrand.bsky.social Remi Emonet and I wrote a tutorial blog post on flow matching: dl.heeere.com/conditional-... with lots of illustrations and intuition!

We got this idea after their cool work on improving Plug and Play with FM: arxiv.org/abs/2410.02423
November 27, 2024 at 9:00 AM
Johnson-Lindenstrauss lemma in action:
it is possible to embed any cloud of N points from R^d into R^k without distorting their pairwise distances too much, provided k is not too small (and the required k is independent of d!)

Better: any random Gaussian embedding works with high probability!
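A quick numerical illustration (my own sketch, with arbitrary choices of N, d, k): project a random point cloud with a Gaussian matrix and check the pairwise-distance distortion.

```python
# Johnson-Lindenstrauss in action (illustrative sketch, arbitrary sizes).
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
N, d, k = 200, 10_000, 500            # many points, huge ambient dim, small target dim
X = rng.standard_normal((N, d))       # point cloud in R^d

P = rng.standard_normal((d, k)) / np.sqrt(k)   # random Gaussian embedding R^d -> R^k
Y = X @ P

ratios = pdist(Y) / pdist(X)          # distortion of each pairwise distance
print(ratios.min(), ratios.max())     # all ratios stay close to 1
```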
November 25, 2024 at 9:58 AM
Conditioning of a function = ratio between the largest and smallest eigenvalues of its Hessian.

Higher conditioning => harder to minimize the function

Gradient descent gets faster on functions as the conditioning L/mu decreases 👇
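A small numerical check (my own sketch, arbitrary quadratics): gradient descent with step 1/L on a well-conditioned and on a badly-conditioned quadratic.

```python
# Gradient descent on a quadratic f(x) = 0.5 x^T diag(eigs) x:
# convergence degrades as the conditioning L/mu grows (illustrative sketch).
import numpy as np

def gd_error(eigs, n_iters=200):
    """Distance to the minimizer after n_iters steps of GD with step 1/L."""
    L = eigs.max()
    x = np.ones_like(eigs)               # arbitrary starting point
    for _ in range(n_iters):
        x = x - (1.0 / L) * (eigs * x)   # gradient of the diagonal quadratic
    return np.linalg.norm(x)

well = np.array([1.0, 2.0])               # L/mu = 2
ill = np.array([1.0, 200.0])              # L/mu = 200
print(gd_error(well), gd_error(ill))      # the ill-conditioned one lags far behind
```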
November 22, 2024 at 1:52 PM
Time to unearth posts from the previous network!
1°: Two equivalent views on PCA: maximize the variance of the projected data, or minimize the reconstruction error
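A small sketch checking the equivalence numerically (random data, my own example): for any unit direction, projected variance plus reconstruction error is constant, so maximizing one minimizes the other.

```python
# PCA, two equivalent views (illustrative sketch on random data):
# for a unit direction u and centered data X, projected variance + mean
# reconstruction error equals the total variance, so the two criteria
# pick the same direction.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 5)) @ rng.standard_normal((5, 5))  # correlated data
X -= X.mean(axis=0)                                              # center it

def projected_variance(u):
    return np.mean((X @ u) ** 2)

def reconstruction_error(u):
    return np.mean(np.sum((X - np.outer(X @ u, u)) ** 2, axis=1))

total_variance = np.mean(np.sum(X ** 2, axis=1))
for _ in range(3):                        # check on a few random unit directions
    u = rng.standard_normal(5)
    u /= np.linalg.norm(u)
    print(projected_variance(u) + reconstruction_error(u), total_variance)  # equal
```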
November 20, 2024 at 4:27 PM