Leon Lang
leon-lang.bsky.social
PhD Candidate at the University of Amsterdam. AI alignment and safety research. Formerly multivariate information theory and equivariant deep learning. Master's degrees in both maths and AI. https://langleon.github.io/
The idea: In the robot-hand example, when the hand is in front of the ball, the human believes the ball was grasped and gives a "thumbs up", which rewards the wrong behavior. If we knew the human's beliefs, we could assign the feedback properly: reward the ball being grasped! (2/4)
March 3, 2025 at 3:44 PM
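
A toy sketch of the idea in the post, not the method from the thread itself: the state names, the `BELIEF` table, and the +1/-1 feedback values are all hypothetical choices made for illustration. It compares crediting the human's feedback to the true state against crediting it to the state the human believes they saw.

```python
from collections import defaultdict

# Hypothetical belief model: maps each true state to what the human thinks happened.
BELIEF = {
    "ball_grasped": "ball_grasped",
    "hand_in_front_of_ball": "ball_grasped",  # looks like a grasp to the observer
    "ball_missed": "ball_missed",
}


def human_feedback(true_state):
    """Thumbs up (+1) if the human believes the ball was grasped, else -1."""
    return 1 if BELIEF[true_state] == "ball_grasped" else -1


def assign_rewards(true_states, use_beliefs):
    """Sum feedback per state, credited to the true or the believed state."""
    totals = defaultdict(int)
    for true_state in true_states:
        feedback = human_feedback(true_state)
        target = BELIEF[true_state] if use_beliefs else true_state
        totals[target] += feedback
    return dict(totals)


episodes = ["ball_grasped", "hand_in_front_of_ball", "ball_missed"]
naive = assign_rewards(episodes, use_beliefs=False)
belief_aware = assign_rewards(episodes, use_beliefs=True)
print(naive)
print(belief_aware)
```

In the naive assignment, the deceptive state `hand_in_front_of_ball` collects positive reward. With the belief-aware assignment, that feedback is credited to `ball_grasped` instead, so the deceptive state earns nothing.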