Oussama Zekri
@ozekri.bsky.social
ENS Saclay maths dpt + UW Research Intern.

Website : https://oussamazekri.fr
Blog : https://logb-research.github.io/
You mean we don’t stop at the frontier of the convex set, but go just a bit further?

Wow, does this trick have a name?
February 6, 2025 at 12:12 PM
Looks nice!! Will stop by your notebooks
February 5, 2025 at 10:28 PM
Working with him these past months has been both fun and inspiring. He’s an incredibly talented researcher! 🚀

If you haven’t heard of him, check out his work: he’s one of the pioneers of operator learning and is pushing the field to new heights!
Nicolas Boullé
About me
nboulle.github.io
February 4, 2025 at 3:42 PM
Thanks for reading!

❤️ Work done during my 3-month internship at Imperial College!

A huge thanks to Nicolas Boullé (nboulle.github.io) for letting me work on a topic that interested me a lot during the internship.
February 4, 2025 at 3:42 PM
We fine-tuned a discrete diffusion model to respond to user prompts. In just 7k iterations (GPU poverty is real, haha), it outperforms the vanilla model ~75% of the time! 🚀
February 4, 2025 at 3:42 PM
Building on this, we can correct the gradient direction to better **follow the flow**, using the implicit function theorem (cf. @mblondel.bsky.social et al., arxiv.org/abs/2105.15183)✨

The cool part? We only need to solve a linear system, and its inverse is known in closed form! 🔥
February 4, 2025 at 3:42 PM
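
A rough sketch of that correction step (my notation, not the paper's exact formulation): if the sampler's output distribution is pinned down by a fixed-point condition, the implicit function theorem turns the naive gradient into the corrected one via a single linear solve.

```latex
% Schematic sketch (my notation, not the paper's exact formulation).
% Suppose the sampler's output distribution \pi^\star(\theta) is defined
% implicitly by a fixed-point / optimality condition F(\pi^\star(\theta), \theta) = 0.
% Differentiating this condition and solving the resulting linear system gives
\[
  \frac{d\pi^\star}{d\theta}
  = -\bigl(\partial_\pi F(\pi^\star, \theta)\bigr)^{-1}\,
     \partial_\theta F(\pi^\star, \theta),
\]
% which is then chained with the reward to correct the naive gradient:
\[
  \nabla_\theta\, \mathbb{E}_{x \sim \pi^\star(\theta)}[R(x)]
  = \Bigl(\frac{d\pi^\star}{d\theta}\Bigr)^{\!\top}
    \nabla_\pi\, \mathbb{E}_{x \sim \pi}[R(x)]\Big|_{\pi = \pi^\star}.
\]
```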
Inspired by Implicit Diffusion (@pierremarion.bsky.social @akorba.bsky.social @qberthet.bsky.social🤓, arxiv.org/abs/2402.05468), we sample using a specific CTMC, reaching the limiting distribution in an infinite time horizon. This effectively implements a gradient flow w.r.t. a Wasserstein metric!🔥
February 4, 2025 at 3:42 PM
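
Some standard CTMC background behind that claim (generic facts, not the paper's specific construction): the law of the chain follows the Kolmogorov forward equation and settles on its stationary distribution in the infinite-horizon limit.

```latex
% Generic CTMC background (standard facts, not the paper's specific construction):
% the marginal law p_t of a CTMC with generator matrix Q evolves by the
% Kolmogorov forward equation
\[
  \frac{d p_t}{dt} = Q^\top p_t,
\]
% and for an irreducible chain on a finite state space it converges to the
% unique stationary distribution \pi, characterized by
\[
  Q^\top \pi = 0, \qquad \lim_{t \to \infty} p_t = \pi.
\]
% Running this dynamics over an infinite time horizon is what lets sampling be
% read as an optimization (a gradient flow) over the space of distributions.
```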
SEPO, like most policy optimization algorithms, alternates between sampling and optimization. But what if sampling itself was seen as an optimization procedure in distribution space? 🚀
February 4, 2025 at 3:42 PM
If you have a discrete diffusion model (naturally designed for discrete data, e.g. language or DNA sequence modeling), you can finetune it with non-differentiable reward functions! 🎯

For example, this enables RLHF for discrete diffusion models, making alignment more flexible and powerful. ✅
February 4, 2025 at 3:42 PM
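
A toy, hypothetical reward (not from the paper) to show why non-differentiability is harmless: policy-gradient-style updates only evaluate the reward on samples, they never differentiate through it.

```python
# Toy, hypothetical reward on DNA-like sequences (illustration only, not the
# paper's reward): 1.0 if GC-content exceeds 60%, else 0.0. It is
# non-differentiable (a hard threshold), yet that's fine: this style of
# fine-tuning only *evaluates* the reward on sampled sequences and uses it to
# weight the score/policy gradient; it never differentiates through the reward.
def gc_reward(seq: str) -> float:
    gc = sum(c in "GC" for c in seq) / max(len(seq), 1)
    return 1.0 if gc > 0.6 else 0.0

samples = ["ATGCGC", "ATATAT", "GCGCGC"]   # imagine these came from the model
weights = [gc_reward(s) for s in samples]  # -> [1.0, 0.0, 1.0]
```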
The main gradient takes the form of a weighted log concrete score, echoing DeepSeek’s unified paradigm with the weighted log policy!🔥

From this, we can reconstruct any policy gradient method for discrete diffusion models (e.g. PPO, GRPO, etc.). 🚀
February 4, 2025 at 3:42 PM
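
Schematically, in my own notation (not the paper's exact expression), the parallel with DeepSeek's weighted log policy looks like this:

```latex
% Schematic form only (my notation, not the paper's exact expression):
\[
  \nabla_\theta J(\theta)
  \;\approx\;
  \mathbb{E}\bigl[\, w(x, r)\, \nabla_\theta \log s_\theta(x) \,\bigr],
\]
% where s_\theta is the learned concrete score and the weight w carries the
% reward signal. Compare with the weighted log policy of DeepSeek's unified
% paradigm,
\[
  \nabla_\theta J(\theta)
  = \mathbb{E}\bigl[\, w\, \nabla_\theta \log \pi_\theta(o \mid q) \,\bigr],
\]
% so swapping in different weights w recovers PPO-, GRPO-, ... style updates.
```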
The main bottleneck of Energy-Based Models is computing the normalizing constant Z.

Instead, recent discrete diffusion models skip Z by learning ratios of probabilities. This forms the concrete score, which a neural network models efficiently!⚡

The challenge? Using this score network as a policy.
February 4, 2025 at 3:42 PM
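
In formulas, the standard setup looks like this (textbook EBM / concrete-score identities, with notation of my choosing):

```latex
% Standard energy-based model over sequences x:
\[
  p_\theta(x) = \frac{e^{-E_\theta(x)}}{Z_\theta},
  \qquad
  Z_\theta = \sum_{x'} e^{-E_\theta(x')} \quad \text{(intractable sum)}.
\]
% Probability ratios cancel Z_\theta: for a neighbour y of x (e.g. x with one
% token changed),
\[
  \frac{p_\theta(y)}{p_\theta(x)} = e^{\,E_\theta(x) - E_\theta(y)},
\]
% and the concrete score is the collection of these ratios,
% s_\theta(x)_y \approx p(y)/p(x), which a network can output directly
% without ever computing Z.
```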
I couldn’t have said it better myself!
December 6, 2024 at 4:09 PM
This equivalence between LLMs and Markov chains may seem useless, but it isn't! Among its contributions, the paper establishes bounds thanks to this equivalence, and verifies the influence of the bounds' terms on recent LLMs!

I invite you to take a look at the other contributions of the paper 🙂
December 4, 2024 at 9:47 AM
This number is huge, but **finite**! Working with Markov chains on a finite state space really gives non-trivial mathematical insights (for example, existence and uniqueness of a stationary distribution).
December 4, 2024 at 9:41 AM
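
To make "huge but finite" concrete, here is an illustrative count under assumed parameters (vocabulary size V, context window K; not the paper's exact statement):

```latex
% Illustrative count (assuming a vocabulary of size V and a context window of
% K tokens; not the paper's exact statement): the chain's states are token
% sequences of length at most K, so
\[
  |\mathcal{S}| \;\le\; \sum_{k=0}^{K} V^k \;=\; \frac{V^{K+1} - 1}{V - 1} \;<\; \infty.
\]
% Astronomically large, but finite: on a finite state space an irreducible,
% aperiodic Markov chain has a unique stationary distribution.
```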