Conor Durkan
@conormdurkan.bsky.social
Generative modeling person
https://conordurkan.com
https://arxiv.org/abs/1805.00909
January 3, 2025 at 5:02 PM
https://arxiv.org/abs/2205.11275
January 3, 2025 at 5:02 PM
This means post-training (of this kind at least) optimizes KL(model || posterior), whereas pre-training optimizes KL(data || model). It also means post-training is mode-seeking (as opposed to mode-covering like pre-training), so those rewards better be well calibrated.
January 3, 2025 at 5:02 PM
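A minimal numerical sketch of the mode-seeking vs mode-covering point above (toy example, notation and numbers mine, not from the thread): fit a single Gaussian to a bimodal target on a grid, once by minimizing reverse KL(q || p) and once by forward KL(p || q).

```python
# Toy sketch: mode-seeking (reverse KL) vs mode-covering (forward KL).
import numpy as np

x = np.linspace(-10, 10, 2001)
dx = x[1] - x[0]

def gauss(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Bimodal target with two well-separated modes.
p = 0.5 * gauss(x, -4.0, 1.0) + 0.5 * gauss(x, 4.0, 1.0)
p /= p.sum() * dx

def kl(a, b):
    # Grid approximation of KL(a || b) for densities a, b on x.
    eps = 1e-12
    return np.sum(a * (np.log(a + eps) - np.log(b + eps))) * dx

mus = np.linspace(-8, 8, 321)
reverse, forward = [], []
for mu in mus:
    q = gauss(x, mu, 1.0)
    q /= q.sum() * dx
    reverse.append(kl(q, p))  # KL(model || target): what this kind of post-training optimizes
    forward.append(kl(p, q))  # KL(target || model): what maximum-likelihood pre-training optimizes

print("reverse-KL optimum mu:", mus[np.argmin(reverse)])  # collapses onto one mode (~±4)
print("forward-KL optimum mu:", mus[np.argmin(forward)])  # averages across modes (~0)
```

With well-separated modes, the reverse-KL fit latches onto a single mode while the forward-KL fit spreads across both, which is why miscalibrated rewards are a bigger risk in the post-training direction.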
Reward functions are log-likelihoods, and the pre-trained model is a prior. The posterior target is the product of the likelihoods and prior (the prior weighting can equivalently sharpen or smooth your likelihoods). Rewards can be hard, as in math/code verification, or soft, as in subjective preference.
January 3, 2025 at 5:02 PM
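Spelling that out (notation mine, a sketch of the standard identity rather than anything quoted from the thread): with prior pi_0, reward r, and KL weight beta, the posterior target and the KL-regularized objective line up as

```latex
% Requires amsmath and amssymb.
\begin{align}
  \pi^{*}(y \mid x)
    &\propto \pi_{0}(y \mid x)\, \exp\!\big(r(x, y)/\beta\big)
    && \text{posterior $\propto$ prior $\times$ likelihood} \\
  \operatorname*{arg\,max}_{\pi}\Big\{ \mathbb{E}_{\pi}\!\big[r(x, y)\big]
    - \beta\, \mathrm{KL}\big(\pi \,\|\, \pi_{0}\big) \Big\}
    &= \operatorname*{arg\,min}_{\pi}\; \mathrm{KL}\big(\pi \,\|\, \pi^{*}\big)
    && \text{post-training as reverse KL}
\end{align}
```

The beta here is the prior weighting mentioned in the post: a larger beta leans on the prior and smooths the likelihood term, a smaller one sharpens it.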
Ironically on my wrist
November 26, 2024 at 5:29 PM