11/11 This is joint work with @willberghammer, @haoyu_wang66, @EnnemoserMartin, @HochreiterSepp, and @sebaleh. See you at #ICLR! [Poster Link](iclr.cc/virtual/202...) [Paper Link](arxiv.org/abs/2502.08696)
April 24, 2025 at 8:57 AM
8/11 💡 𝐒𝐨𝐥𝐮𝐭𝐢𝐨𝐧 2: We address the limitations of the forward KL (fKL) by combining it with Neural Importance Sampling over samples from the diffusion sampler. This lets us estimate the fKL gradient via Monte Carlo integration, making training more memory-efficient.
April 24, 2025 at 8:57 AM
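As a rough illustration of how such an estimator can look (notation assumed here, not taken from the paper): drawing trajectories from the diffusion sampler p_θ itself as the proposal turns the fKL gradient into a self-normalized importance-sampling average, so the unknown normalizing constant of the target cancels:

```latex
% Schematic self-normalized importance-sampling estimate of the fKL gradient,
% with the diffusion sampler p_theta as proposal (notation assumed).
\nabla_\theta D_{\mathrm{KL}}\!\left(q \,\|\, p_\theta\right)
  = -\,\mathbb{E}_{x_{0:T} \sim q}\!\left[\nabla_\theta \log p_\theta(x_{0:T})\right]
  \approx -\sum_{i=1}^{N} \bar{w}_i \,\nabla_\theta \log p_\theta\!\bigl(x_{0:T}^{(i)}\bigr),
\quad
\bar{w}_i = \frac{w_i}{\sum_{j=1}^{N} w_j},
\quad
w_i = \frac{\tilde{q}\bigl(x_{0:T}^{(i)}\bigr)}{p_\theta\bigl(x_{0:T}^{(i)}\bigr)},
\quad
x_{0:T}^{(i)} \sim p_\theta ,
```

where q̃ denotes the unnormalized target-anchored path density. Because the sampled trajectories and their weights are treated as constants, only log p_θ of the stored paths needs to be differentiated, which keeps the memory cost low.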
6/11 💡 𝐒𝐨𝐥𝐮𝐭𝐢𝐨𝐧 1: We apply the policy gradient theorem to the rKL between the joint distributions of the forward and reverse diffusion paths. Leveraging reinforcement learning methods, this enables mini-batches over diffusion time steps and thus memory-efficient training.
April 24, 2025 at 8:57 AM
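To make the mechanism concrete, here is a minimal, self-contained sketch of a score-function (REINFORCE-style) update of a path-space rKL that sub-samples diffusion time steps. The drift network, toy target `log_rho`, step size, and the simplified per-trajectory return are placeholders chosen for illustration; this is a generic policy-gradient sketch under those assumptions, not the paper's exact estimator.

```python
import math
import torch

T, DIM, SIGMA, DT = 50, 2, 1.0, 1.0 / 50

# Hypothetical drift network f_theta(x, t); placeholder architecture.
drift = torch.nn.Linear(DIM + 1, DIM)
opt = torch.optim.Adam(drift.parameters(), lr=1e-3)

def log_rho(x):
    # Unnormalized log-density of a toy target (standard Gaussian up to a constant).
    return -0.5 * (x ** 2).sum(-1)

def step_mean(x, t):
    # Mean of the generative transition p_theta(x_{t-1} | x_t): Euler step with learned drift.
    tt = torch.full((x.shape[0], 1), t / T)
    return x + drift(torch.cat([x, tt], dim=-1)) * DT

def gaussian_logpdf(x, mean, std):
    return (-0.5 * ((x - mean) / std) ** 2 - math.log(std * math.sqrt(2.0 * math.pi))).sum(-1)

def training_step(batch=64, t_minibatch=8):
    std = SIGMA * math.sqrt(DT)
    # 1) Roll out the sampler WITHOUT building a graph: no backprop through the rollout.
    with torch.no_grad():
        xs = [torch.randn(batch, DIM)]                          # x_T ~ N(0, I) prior
        logp = gaussian_logpdf(xs[0], torch.zeros(DIM), 1.0)    # log p(x_T)
        for t in range(T, 0, -1):
            mean = step_mean(xs[-1], t)
            x_next = mean + std * torch.randn_like(mean)
            logp = logp + gaussian_logpdf(x_next, mean, std)
            xs.append(x_next)
        # Per-trajectory "return": simplified here to log p_theta(path) - log rho(x_0);
        # the full path-space rKL would also subtract the noising-path log-density.
        R = logp - log_rho(xs[-1])
        R = R - R.mean()                                        # baseline for variance reduction
    # 2) Score-function surrogate on a random mini-batch of time steps.
    ts = (torch.randperm(T)[:t_minibatch] + 1).tolist()
    surrogate = torch.zeros(())
    for t in ts:
        x_t, x_prev = xs[T - t], xs[T - t + 1]                  # detached states from the rollout
        log_step = gaussian_logpdf(x_prev, step_mean(x_t, t), std)
        surrogate = surrogate + (R * log_step).mean()
    loss = surrogate * (T / t_minibatch)                        # rescale to the full sum over t
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

if __name__ == "__main__":
    for _ in range(5):
        print(training_step())
```

The rollout runs under `torch.no_grad()`, so no computation graph is retained across the T steps; gradients only flow through the log-probabilities of the sub-sampled transitions, which is where the memory saving comes from.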
5/11 A commonly used divergence is the reverse KL divergence (rKL), since its expectation is taken over samples from the generative model. However, naively optimizing this divergence requires backpropagating through the entire generative process.
April 24, 2025 at 8:57 AM
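Spelled out in assumed notation, the path-space rKL is an expectation under the sampler's own trajectories:

```latex
% Path-space reverse KL; notation assumed for illustration.
D_{\mathrm{KL}}\!\left(p_\theta(x_{0:T}) \,\|\, q(x_{0:T})\right)
  = \mathbb{E}_{x_{0:T} \sim p_\theta}\!\left[
      \log \frac{p_\theta(x_{0:T})}{q(x_{0:T})}
    \right].
```

Because each trajectory x_{0:T} is produced by sequentially applying all T generative transitions, a pathwise (reparameterized) gradient must retain the computation graph of the whole rollout, so memory grows linearly with the number of diffusion steps.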
3/11 🔍 𝐃𝐢𝐟𝐟𝐮𝐬𝐢𝐨𝐧 𝐒𝐚𝐦𝐩𝐥𝐞𝐫𝐬 aim to sample from an unnormalized target distribution without access to samples from that distribution. They can be trained by minimizing a divergence between the joint distributions of the forward and reverse diffusion paths.
April 24, 2025 at 8:57 AM
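In assumed notation, with an unnormalized target ρ = Zπ that can be evaluated pointwise but not sampled from, the two joint path distributions are the generative (reverse) path of the sampler and the target-anchored noising (forward) path:

```latex
% Joint path distributions of a diffusion sampler; notation assumed for illustration.
p_\theta(x_{0:T}) = p(x_T) \prod_{t=1}^{T} p_\theta(x_{t-1} \mid x_t),
\qquad
q(x_{0:T}) = \frac{\rho(x_0)}{Z} \prod_{t=1}^{T} q(x_t \mid x_{t-1}).
```

Training then minimizes a divergence between p_θ(x_{0:T}) and q(x_{0:T}); for gradient-based training, the unknown constant Z only enters as an additive constant in log q.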