Richard M. Bailey
@rmbailey.bsky.social
Professor of Environmental Systems, Oxford.
I mostly like building computer models, pondering complex natural systems, appreciating friendly cats.
How to improve LLM responses in domains we can’t score? Implicit signals from structured dialogue help LLM agents edit their own contexts, improving responses dramatically.
“Self-evolving expertise in complex non-verifiable subject domains: dialogue as implicit meta-RL”.
arxiv.org/pdf/2510.15772
October 20, 2025 at 11:16 AM
New paper just out on multi-agent reinforcement learning in an open-ended environment.
It introduces the RULE algorithm, allowing groups of agents to update their own reward functions to solve otherwise insoluble problems. Fixed reward functions, so 2024…
www.jmlr.org/papers/volum...
May 14, 2025 at 2:16 PM