Hadi Khalaf
hadikh.bsky.social
phd @ harvard seas, thinking about alignment, information theory, and the like
Modeling: arxiv.org/pdf/2307.15217 (very detailed view of the alignment pipeline) - arxiv.org/pdf/2312.13619 (ground-up explanation of BTL reward models) - arxiv.org/pdf/2410.02197 - arxiv.org/pdf/2312.09244

Evals: arxiv.org/pdf/2403.13787 (check the HF leaderboard!) - arxiv.org/pdf/2410.14872
February 19, 2025 at 9:11 PM
Any specific areas (reward modeling, pluralistic alignment, controllability, alignment evaluation, variants of alignment...)?
January 13, 2025 at 4:41 AM