https://noamrazin.github.io/
📰 We investigate why reward models (RMs) implicitly defined by language models (LMs) often generalize worse than explicit RMs
🧵
1/6
📰 We study what makes a good RM from an optimization perspective. Among other results, we formalize why more accurate RMs are not necessarily better teachers!
🧵
Catch me at these #NeurIPS workshop poster sessions:
- M3L 11:15am
- ATTRIB 3:00pm
- FITML 4:40pm
At the M3L, FITML, and ATTRIB workshops, I will present our paper on why DPO often decreases the probability of preferred responses, and how that can lead to strange failures in alignment.
arxiv.org/abs/2410.08847
Hope it can be useful, especially to those entering the field, as it highlights distinctions between DL and "classical" ML theory.
arxiv.org/abs/2408.13767