Mostly a retrospective on how I mourned RL after AlphaZero and how much better it feels that it's back.
If you weren't working with DQNs, it's hard to appreciate just how well things work with LLMs.
hackbot.dad/writing/rl-l...