Nicolas Dufour
@nicolasdufour.bsky.social
PhD student at IMAGINE (ENPC) and GeoVic (Ecole Polytechnique). Working on image generation.
http://nicolas-dufour.github.io
The explicit reward conditioning allows for flexible trade-offs, like optimizing for GenEval by reducing the aesthetic weight in the prompt. We can also isolate the look of a specific reward or interpolate between them via multi-reward classifier-free guidance.
October 31, 2025 at 11:24 AM
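As an illustration, manipulating the conditioning vector s might look like this; the reward names and target values are invented for the sketch, not taken from the MIRO release:

```python
import torch

# Illustrative reward ordering; these names are assumptions, not MIRO's API.
REWARDS = ["aesthetic", "pickscore", "image_reward", "hpsv2"]

# High targets everywhere except aesthetics, trading visual polish for
# compositional fidelity (e.g., to push GenEval).
s_geneval = torch.tensor([0.3, 1.0, 1.0, 1.0])

# Isolate the "look" of a single reward by zeroing the others.
s_aesthetic = torch.tensor([1.0, 0.0, 0.0, 0.0])

# Or interpolate between two reward targets before conditioning.
alpha = 0.5
s_mix = alpha * s_geneval + (1 - alpha) * s_aesthetic
```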
MIRO excels on challenging compositional tasks (GenEval here).

The multi-reward conditioning fosters better understanding of complex spatial relationships and object interactions.
October 31, 2025 at 11:24 AM
Despite being a compact model (0.36B parameters), MIRO achieves state-of-the-art results:

GenEval score of 75, outperforming the 12B FLUX-dev (67) at 370x less inference cost.
Conditioning on rich reward signals is a highly effective way to achieve large model capabilities in a compact form!
October 31, 2025 at 11:24 AM
MIRO dramatically improves sample efficiency for test-time scaling.

On PickScore, MIRO needs just 4 samples to match the baseline's 128 samples (a 32x efficiency gain).
For ImageReward, it's a 16x efficiency gain.

This demonstrates superior inference-time efficiency for high-quality generation.
October 31, 2025 at 11:24 AM
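The test-time scaling referred to here is best-of-N sampling: draw several candidates and keep the one the reward model scores highest. A minimal sketch, with `generate` and `score` as caller-supplied placeholders:

```python
def best_of_n(generate, score, prompt, n=4):
    """Draw n candidates and return the one the reward model prefers.

    `generate(prompt) -> image` and `score(image, prompt) -> float` are
    placeholders for a sampler and a reward model such as PickScore.
    """
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda img: score(img, prompt))
```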
Traditional single-objective optimization often leads to reward hacking. MIRO's multi-dimensional conditioning naturally prevents this by requiring the model to balance multiple objectives simultaneously. This produces balanced, robust performance across all metrics, unlike single-reward optimization.
October 31, 2025 at 11:24 AM
The multi-reward conditioning provides a dense supervisory signal, accelerating convergence dramatically. A snapshot of the speed-up:

AestheticScore: 19.1x faster to reach baseline quality.
HPSv2: 6.2x faster.

You can clearly see the improvements visually.
October 31, 2025 at 11:24 AM
This reward vector s becomes an explicit, interpretable control input at inference time. We extend classifier-free guidance to the multi-reward setting, allowing users to steer generation toward jointly high-reward regions by defining positive (s^+) and negative (s^−) targets.
October 31, 2025 at 11:24 AM
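A minimal sketch of what this extended guidance could look like, assuming the model predicts a denoising direction and takes the reward vector as an extra input (the signature is an assumption, not MIRO's actual API):

```python
import torch

def multi_reward_cfg(model, x_t, t, c, s_pos, s_neg, w=5.0):
    # Prediction steered toward the jointly high-reward target s^+ ...
    pred_pos = model(x_t, t, c, s_pos)
    # ... and toward the low-reward target s^- we want to move away from.
    pred_neg = model(x_t, t, c, s_neg)
    # Standard CFG combination, with reward targets playing the role of the
    # conditional/unconditional branches; w > 1 pushes samples toward the
    # high-reward region.
    return pred_neg + w * (pred_pos - pred_neg)
```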
MIRO trains p(x∣c,s) by conditioning the generative model on a vector s of reward scores for each image-text pair. Instead of correcting a pre-trained model, we teach it how to trade off multiple rewards from the start.
October 31, 2025 at 11:24 AM
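A minimal sketch of one such reward-conditioned training step under a flow-matching objective; the denoiser signature, loss, and interpolation path are assumptions for illustration, not the actual MIRO implementation:

```python
import torch

def train_step(denoiser, reward_models, optimizer, images, captions):
    """One reward-conditioned flow-matching step (sketch only)."""
    with torch.no_grad():
        # Score every image-text pair with each reward model -> vector s.
        s = torch.stack([rm(images, captions) for rm in reward_models], dim=-1)
    t = torch.rand(images.shape[0], device=images.device).view(-1, 1, 1, 1)
    noise = torch.randn_like(images)
    x_t = (1 - t) * noise + t * images          # linear path from noise to data
    target = images - noise                     # flow-matching velocity target
    pred = denoiser(x_t, t.flatten(), captions, s)  # s enters as conditioning
    loss = (pred - target).pow(2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```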
We introduce MIRO: a new paradigm for T2I model alignment integrating reward conditioning into pretraining, eliminating the need for separate fine-tuning/RL stages. This single-stage approach offers unprecedented efficiency and control.

- 19x faster convergence ⚡
- 370x fewer FLOPs than FLUX-dev 📉
October 31, 2025 at 11:24 AM
Makes me think of StyleGAN3 visualizations
August 18, 2025 at 10:44 PM
Even crazier 🤯 DinoV3 works in some out-of-distribution setups too — as long as there are geographical cues 🌄🗺️

(Remember: the network is trained only on road images!)

Where DinoV2 totally failed, DinoV3 is holding up 👊
August 18, 2025 at 3:14 PM
The setup 👉 We use our Riemannian flow matching model PLONK (CVPR25: Around the World in 80 Timesteps: A Generative Approach to Global Visual Geolocation) 🌍

We simply swap StreetCLIP with DinoV3 as a drop-in backbone, and train on OpenStreetView-5M.

And boom 💥 — DinoV3 wins.
August 18, 2025 at 3:14 PM
🚀 DinoV3 just became the new go-to backbone for geoloc!
It outperforms CLIP-like models (SigLip2, finetuned StreetCLIP)… and that’s shocking 🤯
Why? CLIP models have an innate advantage — they literally learn place names + images. DinoV3 doesn’t.
August 18, 2025 at 3:14 PM
Come see us at poster 186 for our paper "Around the World in 80 Timesteps: A Generative Approach to Global Visual Geolocation"!

Cc @loicland.bsky.social @davidpicard.bsky.social @vickykalogeiton.bsky.social
June 15, 2025 at 3:30 PM
I will be at #CVPR2025 this week in Nashville.

I will be presenting our paper "Around the World in 80 Timesteps:
A Generative Approach to Global Visual Geolocation".

We tackle geolocalization as a generative task, allowing for SOTA performance and more interpretable predictions.
June 11, 2025 at 12:52 AM
This is an idea I've had for a while, but wow, it's working way better than expected! 🚀
The model looks really promising, even though it's just 256px for now.
April 24, 2025 at 12:40 PM
Our paper got accepted at TMLR!

TL;DR: You can improve your diffusion samples by increasing guidance during the sampling process. A simple linear schedule suffices and is more robust than more elaborate methods.
December 20, 2024 at 1:23 AM
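A minimal sketch of such a linear guidance schedule inside a standard CFG sampler; the endpoint scales are illustrative, not the paper's tuned values:

```python
def guidance_scale(step, num_steps, w_start=1.0, w_end=10.0):
    """Guidance weight that grows linearly over the sampling trajectory:
    low early (diversity), high late (fidelity to the prompt)."""
    return w_start + (w_end - w_start) * step / max(num_steps - 1, 1)

# Inside a sampler loop, the schedule replaces the usual constant w:
#   w = guidance_scale(i, num_steps)
#   eps = eps_uncond + w * (eps_cond - eps_uncond)
```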
To make our models easy to use, we packaged them into a simple library.
Also all our training code is available here: github.com/nicolas-dufo...
You can also run the demo from there on a GPU for superfast inference.
December 12, 2024 at 3:21 PM
🌎 Why does geolocation matter?
From OSINT for journalists 🕵️ to tracking wildlife 🐘, geolocation solves critical challenges.
Models released:
🚗 OSV-5M: Pinpoint street-view images
🦋 iNat21: Track biodiversity
📸 YFCC-100M: Organize millions of diverse user-uploaded images
December 10, 2024 at 3:56 PM
🧭 Our diffusion model learns to map an image's content to its location through multiple scales.

🌍 With Riemannian Flow Matching, we can denoise coordinates on the Earth's spherical geometry.

🔍 We reach SOTA geolocation results on OpenStreetView-5M, iNat-21, and YFCC100M.
December 10, 2024 at 3:56 PM
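For intuition: the conditional paths in spherical flow matching follow great circles rather than straight lines through the Earth. A minimal sketch of that geodesic interpolation (standard slerp, not the released PLONK code):

```python
import torch

def slerp(x0, x1, t):
    """Geodesic (great-circle) interpolation between unit vectors on the
    sphere. x0, x1: (..., 3) unit vectors; t in [0, 1]."""
    cos_omega = (x0 * x1).sum(-1, keepdim=True).clamp(-1 + 1e-7, 1 - 1e-7)
    omega = torch.arccos(cos_omega)  # angle between the two points
    return (torch.sin((1 - t) * omega) * x0
            + torch.sin(t * omega) * x1) / torch.sin(omega)
```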
Some images can be pinpointed exactly, while others are more ambiguous. Our model doesn't just pick a point, but provides a spatial probability distribution.

🌐 We can now quantify how "localizable" an image is, from a picture of the Eiffel Tower 🗼 to one of a random pigeon 🐦!
December 10, 2024 at 3:56 PM
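One simple way to quantify localizability from such a distribution is the mean resultant length of sampled locations; this is a heuristic sketch, not necessarily the paper's exact metric:

```python
import torch

def localizability(samples: torch.Tensor) -> float:
    """Mean resultant length of locations sampled from the model, given as
    unit vectors on the sphere (shape: (n, 3)). Near 1 = sharply peaked
    (the Eiffel Tower), near 0 = diffuse (a random pigeon)."""
    return samples.mean(dim=0).norm().item()
```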
🌍 Guessing where an image was taken is a hard and often ambiguous problem. Introducing diffusion-based geolocation: we predict global locations by refining random guesses into trajectories across the Earth's surface!

🗺️ Paper, code, and demo: nicolas-dufour.github.io/plonk
December 10, 2024 at 3:56 PM