Brad Knox
@bradknox.bsky.social
Research Associate Professor in CS at UT Austin. I research how humans can specify aligned reward functions.
Our paper, "Towards Improving Reward Design in RL: A Reward Alignment Metric for RL Practitioners," won the Outstanding Paper Award on Emerging Topics in Reinforcement Learning this year at RLC! Congrats to 1st author @cmuslima.bsky.social!

Paper: sites.google.com/ualberta.ca/...
August 8, 2025 at 12:21 AM
Vibecoding apparently requires a magic touch I lack. In two attempts from scratch, Cursor AI + Claude 3.5 has gone off the rails constantly and eventually degenerated into non-functionality.

Degeneration #2: Claude is only pretending to run terminal commands and edit my code. 🤦
March 4, 2025 at 6:01 PM
Study 3: Simply changing the question asked during preference elicitation. (7/n)
January 14, 2025 at 11:51 PM
Study 2: Training people to follow a specific preference model. (6/n)
January 14, 2025 at 11:51 PM
Study 1 intervention: Show humans the quantities that underlie a preference model: information derived from the reward function that is normally unobservable to them. (5/n)
January 14, 2025 at 11:51 PM
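
[Editor's illustrative sketch, not part of the original thread.] The post doesn't spell out which underlying quantities were shown to participants. In much of the RLHF literature, the central quantity derived from the reward function is a segment's partial return, so the sketch below assumes that; the names `partial_return` and `annotate_pair_for_display` are hypothetical helpers, not the paper's interface.

```python
# Illustrative only: the post doesn't specify the exact quantities displayed.
# Partial return (the sum of rewards over a segment) is assumed here because it
# is the quantity most commonly assumed to drive preferences in RLHF.
from typing import Callable, List, Tuple

State = Tuple[float, ...]   # placeholder state representation (assumption)
Action = int                # placeholder action representation (assumption)
Segment = List[Tuple[State, Action]]


def partial_return(segment: Segment,
                   reward_fn: Callable[[State, Action], float]) -> float:
    """Sum of rewards along a segment, computed from the reward function."""
    return sum(reward_fn(s, a) for s, a in segment)


def annotate_pair_for_display(seg_a: Segment, seg_b: Segment,
                              reward_fn: Callable[[State, Action], float]) -> dict:
    """Package the normally hidden quantities so an elicitation UI could show them."""
    return {
        "partial_return_A": partial_return(seg_a, reward_fn),
        "partial_return_B": partial_return(seg_b, reward_fn),
    }
```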
RLHF algorithms assume humans generate preferences according to normative models. We propose a new method for model alignment: influence humans to conform to these assumptions through interface design. Good news: it works!
#AI #MachineLearning #RLHF #Alignment (1/n)
January 14, 2025 at 11:51 PM
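
[Editor's illustrative sketch, not part of the original thread.] The thread doesn't name the normative model the studies target, so as one concrete example: many RLHF algorithms assume preferences follow a Bradley-Terry (logistic) model over the difference in segment returns. A minimal sketch under that assumption:

```python
# Minimal sketch of a Bradley-Terry-style preference model, a common normative
# assumption in RLHF. The thread doesn't specify which model the studies used,
# so treat this as illustrative rather than the paper's exact formulation.
import math


def preference_probability(return_a: float, return_b: float,
                           beta: float = 1.0) -> float:
    """P(human prefers segment A over segment B) under a logistic model
    of the return difference, with inverse-temperature beta."""
    return 1.0 / (1.0 + math.exp(-beta * (return_a - return_b)))


if __name__ == "__main__":
    # Example: if segment A's partial return is 3.0 and B's is 1.0,
    # the model predicts A is preferred with probability ~0.88.
    print(preference_probability(3.0, 1.0))
```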