Richard M. Bailey
rmbailey.bsky.social
Professor of Environmental Systems, Oxford.
I mostly like building computer models, pondering complex natural systems, and appreciating friendly cats.
How to improve LLM responses in domains we can’t score? Implicit signals from structured dialogue help LLM agents edit their own contexts, improving responses dramatically.

“Self-evolving expertise in complex non-verifiable subject domains: dialogue as implicit meta-RL”.

arxiv.org/pdf/2510.15772
October 20, 2025 at 11:16 AM
New paper just out on multi-agent reinforcement learning in an open-ended environment.
It introduces the RULE algorithm, allowing groups of agents to update their own reward functions to solve otherwise insoluble problems. Fixed reward functions, so 2024…

www.jmlr.org/papers/volum...
May 14, 2025 at 2:16 PM