Maëva L'Hôtellier
@maevalhotellier.bsky.social
Studying learning and decision-making in humans | HRL team - ENS Ulm |
We further extend the RA model by integrating a temporal-difference component into the dynamic range updates. With this extension, we demonstrate that the magnitude invariance of the RA model persists in multi-step tasks.
December 10, 2024 at 6:02 PM
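The thread does not spell out how the temporal-difference component enters the range updates, so the sketch below is only one plausible reading: a tabular Q-learning agent (assuming an `env` with `n_states`, `n_actions`, `reset()` and `step()`) whose range variables are driven toward the bootstrapped TD target rather than the immediate reward, with the softmax acting on range-normalized values. Function names and learning rates are illustrative, not taken from the paper.

```python
import numpy as np

def softmax(x):
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

def td_range_agent(env, n_episodes=500, alpha=0.1, alpha_r=0.1,
                   gamma=0.95, beta=5.0, seed=0):
    """Q-learning whose range variables track the TD target (sketch)."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((env.n_states, env.n_actions))   # absolute-scale value estimates
    r_min, r_max = 0.0, 1e-6                      # dynamic range estimates

    for _ in range(n_episodes):
        s, done = env.reset(), False
        while not done:
            # choices use range-normalised values, so beta needs no
            # re-tuning when the reward magnitude changes
            q_norm = (Q[s] - r_min) / max(r_max - r_min, 1e-6)
            a = rng.choice(env.n_actions, p=softmax(beta * q_norm))
            s_next, r, done = env.step(a)

            target = r + (0.0 if done else gamma * Q[s_next].max())
            # the "temporal-difference component": the range tracks the
            # bootstrapped return estimate, not just the immediate reward
            if target > r_max:
                r_max += alpha_r * (target - r_max)
            if target < r_min:
                r_min += alpha_r * (target - r_min)

            Q[s, a] += alpha * (target - Q[s, a])
            s = s_next
    return Q
```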
With this enhanced model, we generalize the main findings to other bandit settings: The dynamic RA model outperforms the ABS model in several bandit tasks with noisy outcomes, non-stationary rewards, and even multiple options.
December 10, 2024 at 6:02 PM
Once these basic properties are demonstrated in a simplified setup, we enhance the RA model to cope with stochastic and volatile environments by dynamically adjusting its internal range variables (Rmax / Rmin).
December 10, 2024 at 6:02 PM
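As an illustration of "dynamically adjusting its internal range variables", here is a minimal bandit sketch in which Rmin / Rmax drift toward observed outcomes with their own learning rate, a delta-rule reading of the idea. The paper's exact update rule may differ, and the function name, parameters, and reward format are assumptions.

```python
import numpy as np

def dynamic_ra_bandit(rewards, alpha=0.2, alpha_r=0.1, beta=10.0, seed=0):
    """Bandit agent with dynamically adapted range variables (sketch)."""
    rng = np.random.default_rng(seed)
    n_trials, n_options = rewards.shape
    Q = np.zeros(n_options)          # values learned on the normalised scale
    r_min, r_max = 0.0, 1e-6         # dynamic Rmin / Rmax estimates
    choices = np.zeros(n_trials, dtype=int)

    for t in range(n_trials):
        z = beta * Q - (beta * Q).max()
        p = np.exp(z) / np.exp(z).sum()
        a = rng.choice(n_options, p=p)
        r = rewards[t, a]

        # delta-rule range updates: Rmin / Rmax drift toward observed
        # outcomes, letting them track noisy and non-stationary scales
        if r > r_max:
            r_max += alpha_r * (r - r_max)
        if r < r_min:
            r_min += alpha_r * (r - r_min)

        # learn from the range-normalised outcome, roughly bounded in [0, 1]
        r_norm = (r - r_min) / max(r_max - r_min, 1e-6)
        Q[a] += alpha * (r_norm - Q[a])
        choices[t] = a
    return choices
```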
In contrast, the RA model, by constraining all rewards to a similar scale, efficiently balances exploration and exploitation without the need for task-specific adjustment!
December 10, 2024 at 6:02 PM
Crucially, modifying the value of the temperature (𝛽) of the softmax function does not solve the standard model's problem: it simply shifts the peak performance along the magnitude axis.
Thus, to achieve high performance, the ABS model requires tuning 𝛽 to the magnitudes at stake.
December 10, 2024 at 6:02 PM
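A quick numerical illustration of why retuning 𝛽 only moves the sweet spot: scaling the learned values by a factor k is indistinguishable from scaling 𝛽 by k, so any fixed 𝛽 over-explores at small magnitudes and over-exploits at large ones. The values and 𝛽 below are made-up numbers, not parameters from the paper.

```python
import numpy as np

def softmax(q, beta):
    z = beta * q - (beta * q).max()
    e = np.exp(z)
    return e / e.sum()

# Illustrative absolute values for two options (best option worth twice
# the other), presented at three different reward magnitudes.
for scale in (0.1, 1.0, 10.0):
    q = scale * np.array([1.0, 0.5])
    print(scale, softmax(q, beta=3.0).round(3))

# scale  0.1 -> ~[0.54, 0.46]  near-chance choices (over-exploration)
# scale  1.0 -> ~[0.82, 0.18]  reasonable balance
# scale 10.0 -> ~[1.00, 0.00]  quasi-deterministic choices (over-exploitation)
# Since softmax(beta * (k * q)) == softmax((beta * k) * q), changing beta
# only moves this sweet spot along the magnitude axis; it cannot remove it.
```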
Agent-Level Insights: ABS performance drops to chance due to over-exploration at small reward magnitudes and over-exploitation at large ones.
In contrast, the RA model maintains consistent, scale-invariant performance.
December 10, 2024 at 6:02 PM
First, we simulate ABS and RA behavior in bandit tasks with various magnitude and discriminability levels.

As expected, the standard model is highly dependent on these task levels, while the RA model achieves high accuracy over the whole range of values tested!
December 10, 2024 at 6:02 PM
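For concreteness, here is one way such a magnitude × discriminability grid of bandit tasks could be generated; the parametrisation below (relative gap between option means for discriminability, noise proportional to magnitude) is an assumption for illustration, not the paper's exact design.

```python
import numpy as np

def make_bandit(magnitude, discriminability, n_trials=200, noise_frac=0.1, seed=0):
    """Outcomes for a 2-armed bandit task (illustrative parametrisation)."""
    rng = np.random.default_rng(seed)
    # `magnitude` sets the overall reward scale, `discriminability` the
    # relative gap between the two options' means; noise scales with magnitude
    means = magnitude * np.array([1.0, 1.0 - discriminability])
    noise = rng.normal(0.0, noise_frac * magnitude, size=(n_trials, 2))
    return means + noise        # shape (n_trials, 2): one outcome per option per trial

# A magnitude x discriminability grid of tasks, spanning several orders
# of magnitude crossed with easy (0.5) and hard (0.1) discriminability.
tasks = {(m, d): make_bandit(m, d)
         for m in (0.1, 1.0, 10.0, 100.0)
         for d in (0.5, 0.1)}
```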
To avoid magnitude-dependence, we propose the Range-Adapted (RA) model: RA normalizes rewards, enabling consistent representation of subjective values within a constrained space, independent of reward magnitude.
December 10, 2024 at 6:02 PM
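The normalization step presumably takes the usual range-adaptation form, mapping each outcome onto the interval spanned by the smallest and largest rewards encountered so far; a minimal sketch (the exact equation is in the paper, not in this thread):

```python
def range_normalize(r, r_min, r_max, eps=1e-6):
    """Map a raw outcome onto a bounded subjective scale (assumed form).

    This is the usual range-normalisation expression; the thread does not
    give the RA model's exact equation, so treat this as an illustration.
    """
    return (r - r_min) / max(r_max - r_min, eps)

# The same relative outcome maps to the same subjective value at any scale:
range_normalize(7.5, r_min=0.0, r_max=10.0)      # ~0.75
range_normalize(0.075, r_min=0.0, r_max=0.1)     # ~0.75, hundred-fold smaller rewards
```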
Standard reinforcement learning algorithms encode rewards in an unbiased, absolute manner (ABS), which makes their performance magnitude-dependent.
December 10, 2024 at 6:02 PM
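For reference, "ABS" here denotes the standard delta-rule learner with a softmax choice rule that keeps values on the raw reward scale; a minimal sketch of that kind of agent, with details assumed rather than taken from the paper:

```python
import numpy as np

def abs_bandit(rewards, alpha=0.2, beta=10.0, seed=0):
    """Standard absolute-value ("ABS") delta-rule bandit learner (sketch)."""
    rng = np.random.default_rng(seed)
    n_trials, n_options = rewards.shape
    Q = np.zeros(n_options)                 # values kept on the raw reward scale
    choices = np.zeros(n_trials, dtype=int)

    for t in range(n_trials):
        z = beta * Q - (beta * Q).max()
        p = np.exp(z) / np.exp(z).sum()     # softmax with a fixed temperature
        a = rng.choice(n_options, p=p)
        # learning directly from raw rewards means the effective choice
        # stochasticity (beta * Q) scales with the reward magnitude
        Q[a] += alpha * (rewards[t, a] - Q[a])
        choices[t] = a
    return choices
```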