Harley Wiltzer
harwiltz.bsky.social
PhD student at Mila / McGill. Studying distributional RL for transfer across risk-sensitive utilities, and for long-horizon high-frequency decision-making.
December 12, 2024 at 10:44 PM
There's an Easter egg after the 1024th iteration
December 9, 2024 at 10:31 PM
Thanks a lot :D
December 9, 2024 at 7:20 PM
For feature dimensions larger than 1, things get tricky: projecting distributions onto finite representations can be expensive, and sample-based updates can be biased. We present new methods using *randomized projections* and *signed measures* to overcome these issues.
December 9, 2024 at 3:30 PM
This is closely related to our recent work on the Distributional Successor Measure (arxiv.org/abs/2402.08530). We strengthen the analysis to tractable projected DP and TD algorithms, and provide convergence rates as a function of the return distribution resolution & feature dim.
December 9, 2024 at 3:30 PM
We learn the joint distribution over SFs in RL. Whereas SFs enable 0-shot transfer of value functions across a finite-dimensional class of reward functions, distributional SFs enable 0-shot generalization of return *distribution* functions across the class.
December 9, 2024 at 3:30 PM
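For readers new to SFs: here is a minimal sketch of the classic (expected) successor-feature transfer that the post generalizes. Everything here is an illustrative assumption, a toy tabular chain with a fixed policy, not the paper's setup; distributional SFs replace the expectation with the full return distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 5-state Markov chain with tabular features phi(s) = e_s.
# Successor features for a fixed policy: Psi = (I - gamma P)^{-1} Phi,
# so for ANY reward r = Phi @ w we get its value function for free:
# V_w = Psi @ w. That is the 0-shot transfer property.
n, gamma = 5, 0.9
P = rng.dirichlet(np.ones(n), size=n)              # random transition matrix
Phi = np.eye(n)                                    # tabular state features
Psi = np.linalg.solve(np.eye(n) - gamma * P, Phi)  # successor features

w = rng.normal(size=n)                             # a new reward's weights
V_sf = Psi @ w                                     # 0-shot value via SFs
V_direct = np.linalg.solve(np.eye(n) - gamma * P, Phi @ w)  # ground truth
print(np.allclose(V_sf, V_direct))
```

The same `Psi` serves every reward in the span of the features, which is exactly what makes SFs (and their distributional analogue) a transfer mechanism.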
The rescaled superiority also preserves consistent action rankings for any distortion risk measure. We design DRL algorithms from these insights, and demonstrate that they are much more robust in a high-frequency option trading domain, *especially* with risk-sensitive utilities.
December 9, 2024 at 2:46 PM
By *rescaling* the superiority, we can preserve *distributional action gaps* at high frequency. However, these gaps collapse at a slower sqrt(h) rate! Consequently, we discover that Baird's rescaled advantage has unbounded variance, making it tough to estimate in stochastic MDPs.
December 9, 2024 at 2:46 PM
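The variance blow-up mentioned above can be seen in a few lines. A toy Monte Carlo sketch with an assumed Gaussian reward rate and next-state value (none of these numbers come from the paper): as h shrinks, the one-step estimate of the rescaled advantage (Q(s,a) - V(s)) / h carries a (gamma * V(s') - V(s)) / h term whose standard deviation grows roughly like 1/h.

```python
import numpy as np

rng = np.random.default_rng(0)

def rescaled_advantage_samples(h, n=100_000, v=10.0, sigma=1.0):
    """One-step samples of the rescaled advantage (Q - V) / h.

    Assumed toy model: reward rate r ~ N(1, 1), discount gamma = exp(-h),
    noisy next-state value V(s') ~ N(v, sigma^2).
    """
    r = rng.normal(1.0, 1.0, size=n)           # stochastic reward rate
    v_next = rng.normal(v, sigma, size=n)      # noisy next-state value
    q_samples = h * r + np.exp(-h) * v_next    # one-step return target
    return (q_samples - v) / h                 # rescaled advantage samples

for h in [1.0, 0.1, 0.01]:
    std = rescaled_advantage_samples(h).std()
    print(f"h={h:5}: std of estimate = {std:.1f}")
```

Cutting h by 10x inflates the standard deviation by roughly 10x, which is the estimation difficulty in stochastic MDPs that the post refers to.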
Towards solving this problem, we define the *superiority* as a probabilistic analogue of the advantage. Our axiomatic characterization of the superiority admits a simple and natural representation, despite the fact that superiority samples cannot be observed.
December 9, 2024 at 2:46 PM
Q-Learning at high frequency fails, since action values differ by a quantity proportional to h, the amount of time between actions.

What about return distributions? We show that action-conditioned distributions also collapse, but different statistics collapse at different rates.
December 9, 2024 at 2:46 PM
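The h-proportional action gap is easy to see in a toy model (my own illustrative setup, not the paper's): two actions that differ only in reward rate over one decision period of length h, after which both reach the same state.

```python
import numpy as np

def action_gap(h, beta=1.0, v=10.0):
    """Q-value gap between two actions separated by time step h.

    Assumed toy model: action a earns reward rate 1, action b earns 0,
    both then reach the same state with value v, discounted by
    gamma = exp(-beta * h). The gap is h * (1 - 0): linear in h.
    """
    gamma = np.exp(-beta * h)
    q_a = 1.0 * h + gamma * v   # better action for one period of length h
    q_b = 0.0 * h + gamma * v   # worse action, same continuation value
    return q_a - q_b

for h in [1.0, 0.1, 0.01]:
    print(f"h={h:5}: gap = {action_gap(h)}")
```

As h goes to 0 the gap vanishes, so value estimates must be increasingly precise to rank actions, which is why naive Q-learning degrades at high decision frequency.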
This was the result of a fantastic collaboration with the OT wizard Yash Jhaveri, Marc G. Bellemare, David Meger, and @patrickshafto.bsky.social.

Paper: arxiv.org/abs/2410.11022
#NeurIPS2024 poster: neurips.cc/virtual/2024...
December 9, 2024 at 2:46 PM