shane caldwell
shanecaldwell.bsky.social
synthetic data, RL, hackbots - writing at https://hackbot.dad/
new blog: RL Needed LLMs Because Agency Requires Priors

Mostly a retrospective on how I mourned RL after AlphaZero and how much better it feels that it's back.

If you weren't working with DQNs, it's hard to appreciate just how well things work with LLMs.

hackbot.dad/writing/rl-l...
August 25, 2025 at 2:16 PM