https://noamrazin.github.io/
📰 We investigate why reward models (RMs) implicitly defined by language models (LMs) often generalize worse than explicit RMs
🧵
1/6
📰 We study what makes a good RM from an optimization perspective. Among other results, we formalize why more accurate RMs are not necessarily better teachers!
🧵
Catch me at these #NeurIPS workshop poster sessions:
- M3L 11:15am
- ATTRIB 3:00pm
- FITML 4:40pm
At the M3L, FITML, and ATTRIB workshops, I will present our paper on why DPO often decreases the probability of preferred responses, and how that can lead to strange failures in alignment.
arxiv.org/abs/2410.08847
Hope it can be useful, especially to those entering the field, as it highlights distinctions between DL and "classical" ML theory.
arxiv.org/abs/2408.13767