Are FM & diffusion models nothing more than denoisers at every noise level?
In theory yes, *if trained optimally*. But in practice, do all noise levels matter equally?
with @annegnx.bsky.social, S Martin & R Gribonval
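A minimal sketch of the denoiser equivalence (my own illustration, not the paper's code; it assumes the linear path x_t = (1 - t) x0 + t x1): regressing the velocity x1 - x0 at time t is exactly a denoising problem at noise level 1 - t.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16
x1 = rng.standard_normal(d)     # a "data" point
x0 = rng.standard_normal(d)     # base (noise) sample
t = rng.uniform()               # time / noise level, drawn uniformly here

x_t = (1 - t) * x0 + t * x1     # noisy interpolant at level t
u = x1 - x0                     # flow-matching velocity target
x1_hat = x_t + (1 - t) * u      # "denoising" view: recover the clean point

assert np.allclose(x1_hat, x1)  # velocity regression == denoising at every level
```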
A hypothesis to explain this paradox is target stochasticity: FM targets the conditional velocity field, i.e. only a stochastic approximation of the full velocity field u*
*We refute this hypothesis*: very early on, the approximation almost equals u*
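To see what "stochastic approximation" means concretely, here is a toy sketch (my own illustration, assuming a Gaussian base, a linear path and an empirical target): u* is the posterior-weighted average of the conditional velocities, and a single conditional target is one random term of that average.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.standard_normal((32, 2))  # toy training set

def u_star(x, t):
    # exact marginal velocity: posterior-weighted average of (y_i - x) / (1 - t)
    logits = -np.sum((x - t * data) ** 2, axis=1) / (2 * (1 - t) ** 2)
    w = np.exp(logits - logits.max())
    w /= w.sum()
    return w @ ((data - x) / (1 - t))

y = data[rng.integers(len(data))]    # sampled coupling x1 = y
x0 = rng.standard_normal(2)
t = 0.9
x_t = (1 - t) * x0 + t * y
u_cond = y - x0                      # conditional (stochastic) target at (x_t, t)

# small once the posterior over data points concentrates (here t = 0.9)
print(np.linalg.norm(u_cond - u_star(x_t, t)))
```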
🤯 Why does flow matching generalize? Did you know that the flow matching target you're trying to learn *can only generate training points*?
w @quentinbertrand.bsky.social @annegnx.bsky.social @remiemonet.bsky.social 👇👇👇
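A toy illustration of the claim (not the authors' code; it assumes a Gaussian base, a linear path and an empirical target): integrating the exact target field u* with Euler steps drives a base sample onto one of the training points.

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.standard_normal((8, 2))   # toy "training set"

def u_star(x, t):
    # closed-form target field for an empirical dataset (Gaussian base, linear path)
    logits = -np.sum((x - t * data) ** 2, axis=1) / (2 * (1 - t) ** 2)
    w = np.exp(logits - logits.max())
    w /= w.sum()
    return w @ ((data - x) / (1 - t))

x = rng.standard_normal(2)           # base sample
dt = 1e-3
for k in range(999):                 # Euler integration from t = 0 to t ≈ 1
    x = x + dt * u_star(x, k * dt)

# ≈ 0 (up to discretization): the sample lands on a training point
print(np.linalg.norm(data - x, axis=1).min())
```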
2 benefits:
- no need to compute likelihoods or solve an ODE during training (see the sketch below)
- it makes the problem better posed by defining a *unique sequence of densities* from base to target
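A minimal sketch of such a simulation-free training step (the tiny MLP, the 2-D toy data and the linear path are placeholder assumptions, not the method's actual setup): a single regression on the conditional velocity, with no likelihood computation and no ODE solve.

```python
import torch

# placeholder velocity network taking (x_t, t) as input
model = torch.nn.Sequential(torch.nn.Linear(3, 64), torch.nn.SiLU(), torch.nn.Linear(64, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

x1 = torch.randn(128, 2)   # a batch of (toy) data
x0 = torch.randn(128, 2)   # base samples
t = torch.rand(128, 1)     # random times along the unique density path
x_t = (1 - t) * x0 + t * x1
target = x1 - x0           # conditional velocity: no likelihood, no ODE solve

loss = ((model(torch.cat([x_t, t], dim=1)) - target) ** 2).mean()
opt.zero_grad(); loss.backward(); opt.step()
```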
We got this idea after their cool work on improving Plug and Play with FM: arxiv.org/abs/2410.02423
It is possible to embed any cloud of N points from R^d into R^k without distorting their pairwise distances too much, provided k is not too small, on the order of log(N) (independently of d!)
Better: any random Gaussian embedding works with high probability!
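A quick numerical check of the Johnson-Lindenstrauss lemma (sizes here are arbitrary; the lemma asks for k = O(log N / eps^2)): project with a random Gaussian matrix and measure the worst relative distortion of pairwise distances.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d, k = 100, 10_000, 1_000
X = rng.standard_normal((N, d))                # N points in high dimension d
A = rng.standard_normal((k, d)) / np.sqrt(k)   # random Gaussian embedding R^d -> R^k
Y = X @ A.T

def pdists(Z):
    # pairwise Euclidean distances via the Gram matrix
    g = Z @ Z.T
    sq = np.diag(g)
    return np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2 * g, 0.0))

D, Dk = pdists(X), pdists(Y)
off = ~np.eye(N, dtype=bool)
print(np.abs(Dk[off] / D[off] - 1).max())      # worst relative distortion, typically ~10% here
```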
Higher conditioning => harder to minimize the function
Gradient Descent gets faster on functions as the condition number L/mu decreases 👇
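A toy illustration (a diagonal quadratic with eigenvalues in [mu, L] and step size 1/L): the number of gradient descent steps to reach a fixed accuracy grows with the condition number kappa = L/mu.

```python
import numpy as np

for kappa in (10, 100, 1000):
    L, mu = 1.0, 1.0 / kappa
    evals = np.linspace(mu, L, 50)            # spectrum of the diagonal quadratic
    x = np.ones_like(evals)                   # f(x) = 0.5 * sum_i evals[i] * x[i]**2
    steps = 0
    while np.linalg.norm(evals * x) > 1e-6:   # stop when the gradient is tiny
        x -= (1.0 / L) * evals * x            # gradient step with size 1/L
        steps += 1
    print(f"kappa = {kappa}: {steps} steps")  # steps grow roughly linearly in kappa
```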
1°: Two equivalent views on PCA: maximize the variance of the projected data, or minimize the reconstruction error
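A small numerical check of the equivalence (toy data, numpy only): the variance-maximizing direction (top eigenvector of the covariance) coincides, up to sign, with the direction minimizing the rank-1 reconstruction error (top right-singular vector of the centered data).

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 5))
X -= X.mean(axis=0)                  # center the data

# View 1: maximize projected variance -> top eigenvector of the covariance
C = X.T @ X / len(X)
w_var = np.linalg.eigh(C)[1][:, -1]  # eigh sorts eigenvalues in ascending order

# View 2: minimize ||X - X w w^T||_F -> top right-singular vector of X
w_rec = np.linalg.svd(X)[2][0]

print(abs(w_var @ w_rec))            # ≈ 1: same direction, up to sign
```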