Miguel Suau
miguelsuau.bsky.social
Machine Teacher. Research Scientist at Phaidra. PhD from TU Delft. Previously JP Morgan, Huawei, Unity.

https://www.suau.io/
It achieves this by reweighting samples according to the likelihood of state-action pairs under the agent’s state representation, effectively breaking the spurious correlations introduced by the policy.
June 18, 2025 at 7:55 PM
Here, we show that the advantage function not only reduces the variance of gradient estimates but also helps mitigate the effects of policy confounding.
June 18, 2025 at 7:55 PM
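The variance-reduction half of that claim can be illustrated with a toy sketch. This is not the paper's setup: the two-armed bandit, the reward values, and the helper names `softmax` and `grad_stats` are all assumptions made for illustration. It enumerates both actions to compute the exact mean and variance of the single-sample score-function gradient, once with raw rewards and once with the state value subtracted (the advantage). It says nothing about policy confounding itself.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def grad_stats(logits, rewards, baseline):
    """Exact mean and variance of the single-sample gradient estimate
    with respect to logit 0, computed by enumerating every action."""
    pi = softmax(logits)
    # d/d(logit_0) log pi(a) = 1[a == 0] - pi[0] for a softmax policy.
    samples = [
        ((1.0 if a == 0 else 0.0) - pi[0]) * (rewards[a] - baseline)
        for a in range(len(pi))
    ]
    mean = sum(p * g for p, g in zip(pi, samples))
    var = sum(p * (g - mean) ** 2 for p, g in zip(pi, samples))
    return mean, var

# Hypothetical bandit: uniform policy, rewards with a large shared offset.
logits = [0.0, 0.0]
rewards = [10.0, 11.0]
pi = softmax(logits)
value = sum(p * r for p, r in zip(pi, rewards))  # expected reward under pi

mean_raw, var_raw = grad_stats(logits, rewards, baseline=0.0)
mean_adv, var_adv = grad_stats(logits, rewards, baseline=value)

print("raw reward: mean", mean_raw, "variance", var_raw)
print("advantage:  mean", mean_adv, "variance", var_adv)
```

In this toy case both estimators have the same expected gradient, while subtracting the value removes the shared reward offset that dominates the variance of the raw estimator.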
This paper builds on our work published last year at RLC, where we showed that agents can learn behaviors that exploit spurious correlations induced by their own policies, a phenomenon we call policy confounding.
June 18, 2025 at 7:55 PM
x.com
January 25, 2025 at 4:26 PM