Kawin Ethayarajh
@kawinethayarajh.bsky.social
Postdoc at Princeton PLI. Formerly PhD at Stanford CS. Working on behavioral machine learning. https://kawine.github.io/
Interesting. Is this because they have govt contracts that would be jeopardized? Or is it uncertainty around whether those models will be banned for use in America, which adds a huge risk premium?
May 5, 2025 at 4:51 PM
for all methods, it is better for the data to be on-policy and to be labelled as good/bad relative to the current state of the policy.

but ultimately this is a learning dynamics problem that transcends how the data is sampled
December 19, 2024 at 8:27 PM
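A minimal sketch of what "labelled relative to the current state of the policy" could look like in practice: sample completions from the current policy, score them, and mark each one good or bad against the batch median rather than a fixed external standard. The `policy_sample` and `score` callables and the median rule are illustrative assumptions, not a prescribed recipe.

```python
import statistics

def label_on_policy(policy_sample, score, prompt, n=8):
    """Sample n completions from the *current* policy and label each one
    good/bad relative to the batch median score, so labels track the
    policy's current state rather than an absolute quality bar."""
    completions = [policy_sample(prompt) for _ in range(n)]
    scores = [score(prompt, c) for c in completions]
    threshold = statistics.median(scores)
    return [(c, s >= threshold) for c, s in zip(completions, scores)]
```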
unpaired methods work the way we'd hope paired methods would: simultaneously increasing the relative prob of good outputs and decreasing the relative prob of bad outputs. this allows you to skip SFT entirely
December 19, 2024 at 8:25 PM
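A rough sketch of an unpaired objective in this spirit (loosely KTO-flavored, with the asymmetric weights omitted and `kl_estimate` standing in for the reference point; the names here are illustrative, not the exact published loss). Each example carries its own good/bad label, so good outputs are pushed above the reference point and bad outputs below it at the same time, with no pairing required.

```python
import torch

def unpaired_preference_loss(logp, ref_logp, is_good, kl_estimate, beta=0.1):
    # implicit reward: how much more likely the policy makes this output
    # than the reference model does
    reward = beta * (logp - ref_logp)
    # distance from the reference point (a detached KL estimate in KTO)
    margin = reward - kl_estimate
    # good examples are penalized for falling below the reference point,
    # bad examples for rising above it
    loss_good = 1.0 - torch.sigmoid(margin)
    loss_bad = 1.0 - torch.sigmoid(-margin)
    return torch.where(is_good, loss_good, loss_bad).mean()
```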
it's not really on- vs. off-policy. in theory, paired methods should increase the prob of good outputs and decrease the prob of bad outputs. in practice, they decrease *both*. you need to do SFT beforehand so that you can pay this price and hope that, relative to the base model, p(good|x) is still higher
December 19, 2024 at 8:23 PM
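For contrast, a DPO-style paired loss (a simplified sketch, not the full recipe): the logistic loss only rewards the *gap* between the chosen and rejected implicit rewards, so nothing anchors the absolute values, and both sequence log-probs can drift down together as long as the gap grows, which is the failure mode described above.

```python
import torch
import torch.nn.functional as F

def paired_preference_loss(logp_chosen, logp_rejected,
                           ref_logp_chosen, ref_logp_rejected, beta=0.1):
    # implicit rewards under the policy, measured against the reference model
    chosen_reward = beta * (logp_chosen - ref_logp_chosen)
    rejected_reward = beta * (logp_rejected - ref_logp_rejected)
    # only the margin is optimized: both absolute log-probs can fall,
    # so long as the chosen one falls more slowly than the rejected one
    return -F.logsigmoid(chosen_reward - rejected_reward).mean()
```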
all paired preference methods suffer from this problem while also being less flexible. unpaired preference methods are always the way to go IME
December 19, 2024 at 6:02 PM
These differences with the DPO version don't seem statistically significant?
November 21, 2024 at 6:39 PM
(almost) all good poetry has high perplexity. it's by design something an out-of-the-box llm would be bad at. alignment on one poet would actually help imo.
November 21, 2024 at 5:24 AM
Moderately grumpy UToronto alumnus nominating myself 🙋
November 18, 2024 at 11:19 PM
Could i be added (recent alumnus)? Thank you!
November 18, 2024 at 11:16 PM
Would love to be added thanks!
November 17, 2024 at 4:41 PM