leodingo.bsky.social
MGPO: multi-turn grounding-based policy optimization.

I've been waiting for a paper like this! It trains the LLM to iteratively crop regions of interest in the image to answer a question, and the only reward is whether the final answer is correct.
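Here is a rough sketch of that loop, written under my own assumptions and not taken from the paper. At each turn the policy either requests a crop or gives an answer. The crop boxes get no direct supervision, and a single outcome reward scores only the final answer. Every name here is hypothetical and chosen for illustration: `crop`, `run_episode`, `outcome_reward`, `toy_policy`, the `(x0, y0, x1, y1)` box format, and the `max_turns` budget. The policy-gradient update is omitted.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple, Union

Image = List[List[int]]          # toy grayscale image: rows of pixel values
Box = Tuple[int, int, int, int]  # hypothetical crop format: (x0, y0, x1, y1)

@dataclass
class Rollout:
    crops: List[Box] = field(default_factory=list)  # boxes the policy asked for
    answer: str = ""                                 # final answer ("" if none given)

def crop(image: Image, box: Box) -> Image:
    """Return the sub-image inside the box (end coordinates exclusive)."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]

# Hypothetical policy interface: returns a Box to request a zoomed-in view,
# or a str to commit to a final answer.
Policy = Callable[[str, List[Image]], Union[Box, str]]

def run_episode(policy: Policy, question: str, image: Image,
                max_turns: int = 3) -> Rollout:
    rollout = Rollout()
    views = [image]  # the policy first sees the full image
    for _ in range(max_turns):
        action = policy(question, views)
        if isinstance(action, str):
            rollout.answer = action
            return rollout
        rollout.crops.append(action)
        views.append(crop(image, action))  # crop is fed back as a new view
    return rollout  # turn budget exhausted without an answer

def outcome_reward(rollout: Rollout, gold: str) -> float:
    """Binary reward on the final answer only; crops are never scored directly."""
    return 1.0 if rollout.answer.strip().lower() == gold.strip().lower() else 0.0

def toy_policy(question: str, views: List[Image]) -> Union[Box, str]:
    """Stand-in for the model: zoom into the top-left quadrant, then answer."""
    if len(views) == 1:
        h, w = len(views[0]), len(views[0][0])
        return (0, 0, w // 2, h // 2)
    return str(max(max(row) for row in views[-1]))

img = [[1, 2, 3, 4], [5, 9, 7, 8], [0, 0, 0, 0], [0, 0, 0, 0]]
r = run_episode(toy_policy, "brightest pixel in the top-left?", img)
print(r.crops, r.answer, outcome_reward(r, "9"))  # [(0, 0, 2, 2)] 9 1.0
```

Because the reward depends only on the answer, a crop is credited only indirectly, when it leads to a correct response.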

Details in thread 👇
July 9, 2025 at 3:24 PM