@gfranceschelli.bsky.social
📢 New Paper!
We replace the entropy bonus in PPO with a *complexity* bonus, encouraging structured and stochastic policies that are robust to different scaling factors and can work in environments with variable exploration needs.
Read more:
arxiv.org/abs/2509.20509
w/ @mircomusolesi.bsky.social
Complexity-Driven Policy Optimization
Policy gradient methods often balance exploitation and exploration via entropy maximization. However, maximizing entropy pushes the policy towards a uniform random distribution, which represents an un...
arxiv.org
October 8, 2025 at 9:08 AM