Richard M. Bailey

@rmbailey.bsky.social

Professor of Environmental Systems, Oxford.
I mostly like building computer models, pondering complex natural systems, and appreciating friendly cats.

How can we improve LLM responses in domains we can’t score? Implicit signals from structured dialogue help LLM agents edit their own contexts, improving responses dramatically.

“Self-evolving expertise in complex non-verifiable subject domains: dialogue as implicit meta-RL”.

arxiv.org/pdf/2510.15772

October 20, 2025 at 11:16 AM

New paper just out on multi-agent reinforcement learning in an open-ended environment.
It introduces the RULE algorithm, allowing groups of agents to update their own reward functions to solve otherwise insoluble problems. Fixed reward functions, so 2024…

www.jmlr.org/papers/volum...

May 14, 2025 at 2:16 PM