Noam Razin
@noamrazin.bsky.social
Postdoctoral Fellow at Princeton Language and Intelligence | Past: Computer Science PhD at Tel Aviv University & Apple Scholar in AI/ML | Interested in the foundations of deep learning

https://noamrazin.github.io/
Reward models (RMs) are key to language model post-training and inference pipelines. But little is known about the relative pros and cons of different RM types.

📰 We investigate why RMs implicitly defined by language models (LMs) often generalize worse than explicit RMs
🧵
1/6
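For concreteness, here is a minimal sketch of the two RM types the thread contrasts (illustrative only, not the paper's code): an implicit RM scores a response through the policy's log-probabilities relative to a reference model, as in the DPO parameterization, while an explicit RM attaches a trained scalar head to the model's representation. The helper names, toy numbers, and the value of beta are assumptions for the example.

```python
# Illustrative sketch of implicit vs. explicit reward models.

def implicit_reward(policy_logprob, ref_logprob, beta=0.1):
    """Implicit RM induced by an LM (the DPO parameterization):
    r(x, y) = beta * log( pi(y|x) / pi_ref(y|x) ),
    computed here from precomputed summed token log-probs."""
    return beta * (policy_logprob - ref_logprob)

def explicit_reward(hidden_state, head_weights, head_bias=0.0):
    """Explicit RM: a trained scalar head on the final hidden state,
    r(x, y) = w . h(x, y) + b."""
    return sum(w * h for w, h in zip(head_weights, hidden_state)) + head_bias

# Example usage with toy numbers:
print(implicit_reward(policy_logprob=-42.0, ref_logprob=-45.0))  # 0.3
print(explicit_reward([0.2, -1.0, 0.5], [1.0, 0.5, 2.0]))        # 0.7
```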
July 11, 2025 at 5:32 PM
The success of RLHF depends heavily on the quality of the reward model (RM), but how should we measure this quality?

📰 We study what makes a good RM from an optimization perspective. Among other results, we formalize why more accurate RMs are not necessarily better teachers!
🧵
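As a toy illustration of the accuracy-vs-teaching gap (my own example, not taken from the paper): two RMs can rank every preference pair correctly, yet induce very differently spread rewards. Pairwise accuracy only sees the ranking, while the magnitude of reward differences is what shapes the policy-gradient signal during RLHF.

```python
# Toy example: two RMs with equal accuracy but different reward spread.
from statistics import pvariance

def pairwise_accuracy(scores_chosen, scores_rejected):
    """Fraction of preference pairs the RM ranks correctly."""
    correct = sum(c > r for c, r in zip(scores_chosen, scores_rejected))
    return correct / len(scores_chosen)

# Both RMs are perfectly accurate, but RM A's reward gaps are tiny:
rms = {
    "A": ([1.00, 1.00, 1.00], [0.99, 0.99, 0.99]),  # near-flat rewards
    "B": ([2.00, 3.00, 4.00], [-1.0, 0.00, 1.00]),  # well-spread rewards
}
for name, (chosen, rejected) in rms.items():
    acc = pairwise_accuracy(chosen, rejected)
    var = pvariance(chosen + rejected)  # spread over all scored responses
    print(f"RM {name}: accuracy={acc:.2f}, reward variance={var:.4f}")
```

An optimizer training against RM A sees an almost flat reward landscape despite its perfect accuracy, which is one way the "accurate but poor teacher" situation can arise.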
March 20, 2025 at 6:05 PM
Tomorrow I'm presenting a poster on why DPO often decreases the probability of preferred responses, how that can cause surprising failures in alignment, and what we can do about it.

Catch me at these #NeurIPS workshop poster sessions:
- M3L 11:15am
- ATTRIB 3:00pm
- FITML 4:40pm
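For reference, here is the standard DPO objective (Rafailov et al.) in a minimal sketch, with made-up numbers illustrating the effect the poster is about: the loss depends only on the margin between the preferred and dispreferred responses, so it can improve even while the preferred response's own likelihood drops.

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """DPO loss for one preference pair (y_w preferred over y_l):
    -log sigmoid(beta * [(logp_w - ref_logp_w) - (logp_l - ref_logp_l)]).
    The loss depends only on the *margin*, so it can shrink even when
    logp_w itself decreases, as long as logp_l decreases faster."""
    margin = (logp_w - ref_logp_w) - (logp_l - ref_logp_l)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# Both log-probs drop, including the preferred one, yet the loss improves:
print(dpo_loss(-10.0, -10.0, -10.0, -10.0))  # start: margin 0, loss ~0.693
print(dpo_loss(-12.0, -15.0, -10.0, -10.0))  # logp_w fell, loss ~0.554
```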
December 14, 2024 at 1:35 AM
I am attending NeurIPS! Feel free to reach out if you want to chat.

At the M3L, FITML, and ATTRIB workshops, I will present our paper on why DPO often decreases the probability of preferred responses and how that can lead to weird failures in alignment.

arxiv.org/abs/2410.08847
Unintentional Unalignment: Likelihood Displacement in Direct Preference Optimization
December 9, 2024 at 2:41 PM
Catch Sadhika's talk today if you want to learn more about the surprising ways in which aligning language models based on preference data can fail
November 26, 2024 at 3:20 PM
Nadav Cohen and I recently uploaded lecture notes on the theory (and surprising practical applications) of linear neural networks.

Hope they can be useful, especially to those entering the field, as they highlight distinctions between deep learning and "classical" ML theory

arxiv.org/abs/2408.13767
Lecture Notes on Linear Neural Networks: A Tale of Optimization and Generalization in Deep Learning
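For a taste of the object the notes study, here is a minimal NumPy sketch of my own (not code from the notes): a deep linear network computes nothing more than a single linear map, the product of its weight matrices, yet training the factors end-to-end with gradient descent has different optimization dynamics than fitting the map directly, which is what makes these models a useful theory testbed.

```python
# Minimal sketch of a deep linear network trained with gradient descent.
# Expressively it is just one linear map P = W3 @ W2 @ W1, but the
# factorized parameterization changes the optimization dynamics.
import numpy as np

rng = np.random.default_rng(0)
n, d, depth, lr, steps = 100, 5, 3, 0.02, 5000
X = rng.normal(size=(n, d))
W_true = rng.normal(size=(d, d))
Y = X @ W_true.T

def product(ws):
    """End-to-end matrix of a list of layers: W_k @ ... @ W_1."""
    P = np.eye(d)
    for W in ws:
        P = W @ P
    return P

# Near-identity initialization of the factors
Ws = [np.eye(d) + 0.01 * rng.normal(size=(d, d)) for _ in range(depth)]

for _ in range(steps):
    P = product(Ws)
    G = 2 * (X @ P.T - Y).T @ X / n  # gradient of mean squared loss w.r.t. P
    # Chain rule through the product: layer i is sandwiched between the
    # product of later layers (left) and earlier layers (right).
    grads = [product(Ws[i + 1:]).T @ G @ product(Ws[:i]).T
             for i in range(depth)]
    for W, gW in zip(Ws, grads):
        W -= lr * gW

print("recovery error:", np.linalg.norm(product(Ws) - W_true))
```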
November 20, 2024 at 1:51 PM