www.alexirpan.com/2025/11/16/a...
We looked at problems that could be solved if the model behaved consistently over a set of prompts, and tried training for that consistency in both output space and internal activations. Both were effective. See the thread or paper for details.
www.alexirpan.com/2025/08/18/t...
www.alexirpan.com/2025/07/21/b...
vs
"Let me do one more hyperparam sweep before giving up. One more prompt tuning run. I swear we'll beat baseline. I know it's gonna beat the baseline this time. It's gonna win. This time for sure."
95% accuracy -> 97.5% accuracy = meh
5% error -> 2.5% error = omg we've halved the error rate
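
The arithmetic behind the two framings, as a quick sketch. The function names are made up for illustration; the only inputs are the 95% and 97.5% figures from the post.

```python
def relative_accuracy_gain(acc_before: float, acc_after: float) -> float:
    """Relative change in accuracy: the 'meh' framing."""
    return (acc_after - acc_before) / acc_before


def relative_error_reduction(acc_before: float, acc_after: float) -> float:
    """Relative change in error rate (error = 1 - accuracy): the 'omg' framing."""
    err_before = 1.0 - acc_before
    err_after = 1.0 - acc_after
    return (err_before - err_after) / err_before


if __name__ == "__main__":
    before, after = 0.95, 0.975
    # Same two numbers, two very different-looking headlines.
    print(f"accuracy gain:   {relative_accuracy_gain(before, after):.1%}")
    print(f"error reduction: {relative_error_reduction(before, after):.1%}")
```

Same model, same eval: the accuracy view shows a gain of under 3%, while the error view shows the error rate cut in half.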