Hadi Khalaf
@hadikh.bsky.social
phd @ harvard seas, thinking about alignment, information theory, and the like
Reposted by Hadi Khalaf
AI is built to “be helpful” or “avoid harm”, but which principles should it prioritize and when? We call this alignment discretion. As Asimov's stories show: balancing such principles for AI behavior is tricky. In fact, we find that AI has its own set of priorities. (comic by @xkcd.com)🧵👇
February 19, 2025 at 9:08 PM
I feel queasy when I read LLM interpretability papers; some results seem wonderful, but I am distrustful of the methodology and interpretation
February 2, 2025 at 4:19 AM
Reward modeling + BoN seem like a poor man's way to get good alignment because PPO training is expensive. Are perfect reward signals all we need? I'm not convinced
February 1, 2025 at 6:06 AM
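For context on the BoN half of that comparison, here is a minimal sketch of best-of-N sampling against a reward model; `generate` and `reward` are hypothetical stand-ins for a policy model's sampler and a trained reward model, not any particular library's API.

```python
# Minimal sketch of best-of-N (BoN) sampling against a reward model.
# `generate(prompt)` and `reward(prompt, response)` are hypothetical
# stand-ins for a policy model's sampler and a trained reward model.

def best_of_n(prompt, generate, reward, n=16):
    """Draw n candidate responses and return the one the reward model scores highest."""
    candidates = [generate(prompt) for _ in range(n)]
    scores = [reward(prompt, c) for c in candidates]
    best = max(range(n), key=lambda i: scores[i])
    return candidates[best]
```

The appeal is that this needs no gradient updates to the policy, only extra inference plus a good reward signal, which is exactly where the "are perfect reward signals all we need?" question bites.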
The current alignment paradigm is plagued by the fact that it's a flimsy adaptation of RL. RLHF is not RL, but it could be and maybe it should be. There's something missing in the treatment of feedback-based alignment, and there are fundamental differences between it & RL that are not clear to me!
February 1, 2025 at 6:03 AM
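For concreteness, the contrast this post gestures at can be written in standard textbook form (not the author's notation): RLHF usually maximizes a learned reward over single responses with a KL anchor to a reference policy, while generic RL maximizes a discounted return over a sequential decision process.

\[
\text{RLHF:}\quad \max_{\pi}\; \mathbb{E}_{x \sim \mathcal{D},\, y \sim \pi(\cdot \mid x)}\!\big[\, r_\phi(x, y) \,\big] \;-\; \beta\,\mathrm{KL}\big(\pi(\cdot \mid x)\,\|\,\pi_{\mathrm{ref}}(\cdot \mid x)\big)
\]
\[
\text{RL:}\quad \max_{\pi}\; \mathbb{E}_{\pi}\!\Big[\, \textstyle\sum_{t \ge 0} \gamma^{t}\, r(s_t, a_t) \,\Big]
\]

The learned reward \(r_\phi\), the single-turn (bandit-like) setup, and the KL term to \(\pi_{\mathrm{ref}}\) are the most visible departures from the generic RL objective.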
spending my sunday battling with tikz (im losing)
December 9, 2024 at 1:17 AM
true agi is when your llm stops affirming everything you say
December 6, 2024 at 3:43 AM
doing your phd at a time when it *feels* like everyone is studying the same things and you'd be facing major FOMO if you don't... is not fun
November 26, 2024 at 1:37 AM