Willis (Nanye) Ma
@nanye-ma.bsky.social
PhD at NYU Courant | undergrad at NYU
Lastly, we examine how scaling inference-time compute benefits smaller diffusion models.
These results indicate that substantial training costs can be partially offset by modest inference-time compute, enabling higher-quality samples more efficiently. [6/n]
January 17, 2025 at 4:50 PM
We then examine the search framework on text-conditioned generation.
With the 12B FLUX.1-dev model on DrawBench, searching with all verifiers improves sample quality, though again the specific gains vary considerably across setups. [5/n]
January 17, 2025 at 4:50 PM
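For a sense of what a verifier can look like in the text-conditioned setting, here is a minimal sketch of a prompt-image alignment scorer built on CLIP via Hugging Face transformers. This is only an illustrative assumption, not necessarily one of the verifiers used in the study, and the function name clip_verifier is a hypothetical placeholder.

```python
import torch
from transformers import CLIPModel, CLIPProcessor

# Illustrative verifier: CLIP image-text similarity as a proxy for how well a
# generated image matches its prompt. Higher score = better alignment.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_verifier(image, prompt):
    """Return a scalar score; higher means the image matches the prompt better."""
    inputs = processor(text=[prompt], images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        outputs = model(**inputs)
    # logits_per_image is the image-text cosine similarity scaled by CLIP's logit scale
    return outputs.logits_per_image.item()
```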
Our search framework consists of two components: verifiers to provide feedback, and algorithms to find better noise candidates.
On ImageNet with SiT-XL, different combinations of verifiers and algorithms exhibit markedly different scaling behaviors. [4/n]
January 17, 2025 at 4:50 PM
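To make the verifier-plus-algorithm loop concrete, here is a minimal sketch of the simplest search algorithm, best-of-N random search over initial noises. It assumes a frozen sampler sample_with_noise and a scoring function verifier; both names are hypothetical placeholders, not identifiers from the paper's code.

```python
import torch

# Hypothetical stand-ins: `sample_with_noise` runs the full diffusion sampler
# from a given initial noise, and `verifier` scores the resulting sample
# (higher is better).
def best_of_n_noise_search(sample_with_noise, verifier, n_candidates=16,
                           noise_shape=(1, 4, 64, 64), device="cuda"):
    """Best-of-N random search over initial sampling noises.

    Draws `n_candidates` Gaussian noises, denoises each with the frozen
    diffusion model, scores the outputs with the verifier, and keeps the
    best sample. Inference compute scales linearly with N.
    """
    best_sample, best_score = None, float("-inf")
    for _ in range(n_candidates):
        noise = torch.randn(noise_shape, device=device)   # candidate initial noise
        sample = sample_with_noise(noise)                  # full denoising trajectory
        score = verifier(sample)                           # feedback signal
        if score > best_score:
            best_sample, best_score = sample, score
    return best_sample, best_score
```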
From "cherry-picking", we know that some noises are better than others.
This suggests pushing the inference-time scaling limit by investing compute in searching for better noises.
Then, it's natural to ask: how do we know which sampling noises are good, and how do we search for such noises? [3/n]
January 17, 2025 at 4:50 PM
Inference-time scaling for LLMs improves their capabilities in many respects, but what about diffusion models?
In our latest study—Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps—we reframe inference-time scaling as a search problem over sampling noises. 🧵[1/n]
January 17, 2025 at 4:50 PM