Come chat about alignment!
Turns out:
-best-of-$n$ sampling is essentially optimal in the win-rate vs. KL-from-base trade-off!
-you can contrastively train an LLM to mimic its own best-of-$n$ distribution (rough sketch below)!
BoNBoN alignment: arxiv.org/abs/2406.00832
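To give a rough idea of what "contrastively train to mimic best-of-$n$" can look like: below is a minimal sketch of an IPO-style loss between the best and worst of the $n$ samples, mixed with an SFT term on the best sample. This is my own sketch, not the paper's code; names like `target_margin` and `alpha` are placeholders (the paper derives the target analytically from the best-of-$n$ distribution), so check the paper for the exact objective.

```python
# Sketch of a BoNBoN-style update, assuming per-sequence log-probs for the
# best and worst of n samples are already computed for both the trainable
# policy and a frozen reference model. Hypothetical names, not the paper's API.
import torch

def bonbon_style_loss(
    logp_best: torch.Tensor,       # policy log p(y_best | x), shape (batch,)
    logp_worst: torch.Tensor,      # policy log p(y_worst | x), shape (batch,)
    ref_logp_best: torch.Tensor,   # frozen reference log p(y_best | x)
    ref_logp_worst: torch.Tensor,  # frozen reference log p(y_worst | x)
    target_margin: float,          # best-vs-worst log-ratio gap (derived in the paper)
    alpha: float = 0.5,            # mixes SFT-on-best with the contrastive term
) -> torch.Tensor:
    # Log-likelihood-ratio gap between the best and worst of the n samples.
    margin = (logp_best - ref_logp_best) - (logp_worst - ref_logp_worst)
    # IPO-style squared loss: push the gap toward its best-of-n target value.
    contrastive = (margin - target_margin).pow(2).mean()
    # SFT term: directly maximize likelihood of the best-of-n sample.
    sft = -logp_best.mean()
    return alpha * sft + (1.0 - alpha) * contrastive
```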