Aly Lidayan
@aliday.bsky.social
AI PhD student at Berkeley
alyd.github.io
5️⃣We demonstrate our framework in Mountain Car. We set the potential to the maximum displacement the agent has learned to reach so far, signaling the value of its training. Rewarding displacement directly (pink) led to reward hacking, but the BAMPF (green) preserved optimality✅
March 26, 2025 at 12:05 AM
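A minimal sketch of the kind of shaping term described above, assuming the standard Gymnasium MountainCar-v0 environment, a discount of 0.99, and a potential equal to the largest displacement from the start position reached so far; the class and helper names are illustrative, not the paper's code.

```python
import gymnasium as gym

GAMMA = 0.99  # assumed discount factor


class DisplacementBAMPF:
    """Sketch of a potential-based shaping term over the agent's learning history.

    Phi(history) = largest displacement from the start position reached so far,
    a proxy for the value of the agent's training. The shaping term
    F = gamma * Phi(h') - Phi(h) preserves optimality, unlike rewarding
    displacement directly.
    """

    def __init__(self, start_pos: float, gamma: float = GAMMA):
        self.gamma = gamma
        self.start_pos = start_pos  # simplification: measure from the first episode's start
        self.max_disp = 0.0         # Phi of the history so far

    def shaping(self, new_pos: float) -> float:
        phi_old = self.max_disp
        self.max_disp = max(self.max_disp, abs(new_pos - self.start_pos))
        return self.gamma * self.max_disp - phi_old


env = gym.make("MountainCar-v0")
obs, _ = env.reset(seed=0)
bampf = DisplacementBAMPF(start_pos=obs[0])

for _ in range(500):
    action = env.action_space.sample()  # stand-in for the learner's policy
    obs, reward, terminated, truncated, _ = env.step(action)
    shaped_reward = reward + bampf.shaping(obs[0])  # what the learner would optimize
    if terminated or truncated:
        obs, _ = env.reset()
```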
4️⃣We get a new typology for intrinsic motivation & reward shaping terms based on which BAMDP value component they signal! They hinder exploration if they align poorly with actual value, e.g., prediction error stays high while watching a noisy TV even though no valuable information is gained.
March 26, 2025 at 12:05 AM
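A toy numerical illustration of the noisy-TV point above: for observations that are pure noise, a prediction-error bonus keeps paying out forever while the Bayesian information gained per step shrinks toward zero. The Beta-Bernoulli model and running-mean predictor are illustrative assumptions, not the paper's setup.

```python
import numpy as np
from scipy.special import betaln, digamma


def beta_kl(a1, b1, a2, b2):
    """KL( Beta(a1, b1) || Beta(a2, b2) )."""
    return ((a1 - a2) * (digamma(a1) - digamma(a1 + b1))
            + (b1 - b2) * (digamma(b1) - digamma(a1 + b1))
            + betaln(a2, b2) - betaln(a1, b1))


rng = np.random.default_rng(0)
a, b = 1.0, 1.0   # Beta(1, 1) prior over the "TV pixel" bias
pred = 0.5        # running predictor of the next bit

pred_error, info_gain = [], []
for _ in range(1000):
    x = rng.integers(0, 2)                              # the noisy TV: an i.i.d. fair coin flip
    pred_error.append((x - pred) ** 2)                  # prediction-error pseudo-reward
    info_gain.append(beta_kl(a + x, b + 1 - x, a, b))   # information gained from this observation
    a, b = a + x, b + 1 - x                             # Bayesian posterior update
    pred = a / (a + b)                                  # predictor tracks the posterior mean

print(f"mean prediction-error bonus (last 100 steps): {np.mean(pred_error[-100:]):.3f}")
print(f"mean information gain       (last 100 steps): {np.mean(info_gain[-100:]):.2e}")
```

The bonus stays near 0.25 indefinitely while the per-step information gain decays toward zero, matching the claim that prediction-error signals can misalign with actual BAMDP value.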
3️⃣To guide more efficient exploration, BAMPF potentials should encode BAMDP state value. To gain further insights, we decompose BAMDP value into the value of the information gathered🧠 and the value of the MDP state given prior knowledge only🌎.
March 26, 2025 at 12:05 AM
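One way to write the decomposition described above as an identity, with assumed notation (s_t is the MDP state, h_t the interaction history, h_0 the empty history, i.e., the prior alone); the paper's exact definitions may differ.

```latex
\[
V^{*}(s_t, h_t)
  \;=\;
  \underbrace{\bigl[\,V^{*}(s_t, h_t) - V^{*}(s_t, h_0)\,\bigr]}_{\text{value of the information gathered}}
  \;+\;
  \underbrace{V^{*}(s_t, h_0)}_{\text{value of the MDP state under the prior only}}
\]
```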
2️⃣Harmful reward-hacking policies maximize modified rewards to the detriment of true rewards. We prove that converting intrinsic motivation and reward-shaping terms to BAMDP potential-based shaping functions (BAMPFs) prevents hacking, and empirically validate this in both RL and meta-RL.
March 26, 2025 at 12:05 AM
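A minimal sketch of the conversion described above, assuming histories are simple transition lists and a discount of 0.99; the potential shown (count of distinct states seen) is only an illustrative stand-in for an intrinsic-motivation term.

```python
from typing import Callable, List, Tuple

GAMMA = 0.99  # assumed discount factor
History = List[Tuple]  # the agent's experience so far, as a list of (s, a, r, s') transitions


def bampf_shaping(phi: Callable[[History], float],
                  history: History,
                  next_history: History,
                  gamma: float = GAMMA) -> float:
    """Potential-based shaping over BAMDP states (histories): gamma * Phi(h') - Phi(h).

    Added to the true reward, this changes every policy's return by the same
    constant, so it cannot make a reward-hacking policy optimal; adding an
    intrinsic bonus directly to the reward carries no such guarantee.
    """
    return gamma * phi(next_history) - phi(history)


def novelty_potential(history: History) -> float:
    """Illustrative potential: number of distinct states visited so far."""
    return float(len({transition[0] for transition in history}))


history = [("s0", "a0", 0.0, "s1")]
next_history = history + [("s1", "a1", 0.0, "s2")]
shaped_bonus = bampf_shaping(novelty_potential, history, next_history)  # 0.99 * 2 - 1 = 0.98
```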
1️⃣We cast RL agents as policies in Bayes-Adaptive MDPs, which augment the MDP state with the history of all environment interactions. Optimal exploration maximizes BAMDP state value, and pseudo-rewards guide RL agents by rewarding them for going to more valuable BAMDP states.
March 26, 2025 at 12:05 AM
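One standard way to write the resulting Bellman recursion, with assumed notation: the BAMDP state pairs the MDP state s_t with the history h_t, transitions are taken under the posterior-predictive model given h_t, and ⊕ appends the new transition to the history.

```latex
\[
V^{*}(s_t, h_t)
  \;=\; \max_{a}\;
  \mathbb{E}_{s_{t+1} \sim P(\cdot \mid s_t, a,\, h_t)}
  \Bigl[\, r(s_t, a, s_{t+1})
         + \gamma\, V^{*}\bigl(s_{t+1},\, h_t \oplus (s_t, a, s_{t+1})\bigr) \Bigr]
\]
```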
🚨Our new #ICLR2025 paper presents a unified framework for intrinsic motivation and reward shaping: they signal the value of the RL agent’s state🤖=external state🌎+past experience🧠. Rewards based on potentials over the learning agent’s state provably avoid reward hacking!🧵
March 26, 2025 at 12:05 AM
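The shaping form referred to above, written over the agent's full state (notation assumed): Φ is a potential over the pair (external state, past experience), and the pseudo-reward is its discounted difference. Summing the discounted shaping terms telescopes to a constant (assuming Φ is bounded), which is why such rewards cannot change which policies are optimal.

```latex
\[
F\bigl((s_t, h_t), a_t, (s_{t+1}, h_{t+1})\bigr)
  \;=\; \gamma\, \Phi(s_{t+1}, h_{t+1}) - \Phi(s_t, h_t),
\qquad
\sum_{t \ge 0} \gamma^{t} F_t \;=\; -\,\Phi(s_0, h_0)
\]
```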