External lecturer at Johannes Kepler University Linz, Austria
Drug Discovery | Deep Learning | RL
http://www.arjonamedina.com
To me it looks like a clever way of applying PPO-like clipping within a supervised framework, constrained by a fixed reference model. Although some parts of its formulation are very similar to PPO, I wouldn't describe it as RL. (1/5)🧵