I've been waiting for a paper like this! Trains the LLM to iteratively crop regions of interest to answer a question, and the only reward is the final answer.
Details in thread 👇
I've been waiting for a paper like this! Trains the LLM to iteratively crop regions of interest to answer a question, and the only reward is the final answer.
Details in thread 👇