Jose Arjona-Medina
arjonamedina.bsky.social
AI/ML Principal Scientist at J&J
External lecturer at Johannes Kepler University Linz, Austria
Drug Discovery | Deep Learning | RL
http://www.arjonamedina.com
I'm still uncertain about the RL aspect in DeepSeek.

To me, it looks like a clever way of applying PPO-like clipping within a supervised framework, constrained by a fixed reference model. Although some parts of its formulation are very similar to PPO, I wouldn't describe it as RL. (1/5)🧵
March 1, 2025 at 7:30 PM