Hadi Khalaf
hadikh.bsky.social
phd @ harvard seas, thinking about alignment, information theory, and the like
Modeling: arxiv.org/pdf/2307.15217 (very detailed view of the alignment pipeline) - arxiv.org/pdf/2312.13619 (ground-up explanation of BTL reward models) - arxiv.org/pdf/2410.02197 - arxiv.org/pdf/2312.09244

Evals: arxiv.org/pdf/2403.13787 (check the HF leaderboard!) - arxiv.org/pdf/2410.14872
February 19, 2025 at 9:11 PM
Any specific areas (reward modeling, pluralistic alignment, controllability, alignment evaluation, variants of alignment...)?
January 13, 2025 at 4:41 AM