Nathan Kallus
@kallus.bsky.social
🏳️‍🌈👨‍👨‍👧‍👦 interested in causal inference, experimentation, optimization, RL, statML, econML, fairness
Cornell & Netflix
www.nathankallus.com
arxiv.org/abs/2302.02392 In offline RL, we replace exploration with assumptions that the data are nice. We try to make these minimal by refining standard realizability and coverage assumptions down to single policies. We do this via a minimax formulation, with strong guarantees for learning the saddle point.
September 27, 2023 at 7:09 PM
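A minimal sketch of the general flavor of such saddle-point objectives: a Lagrangian-style offline evaluation objective where a value network is minimized against an adversarial density-ratio (coverage) network. This is an illustrative analogue, not the paper's estimator; QNet/WNet architectures, the synthetic batch, and the target policy below are all made-up assumptions.

```python
# Hedged sketch: a generic minimax (Lagrangian) objective for offline policy
# evaluation with function approximation. Not the paper's method; all names,
# networks, and the synthetic data are illustrative assumptions.
import torch
import torch.nn as nn

gamma = 0.99

class MLP(nn.Module):
    def __init__(self, in_dim, out_dim=1):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, out_dim))
    def forward(self, x):
        return self.net(x)

# q(s, a): value class to be minimized; w(s, a): density-ratio class to be maximized.
q_net = MLP(in_dim=4 + 1)   # state dim 4 plus a scalar action feature (assumed)
w_net = MLP(in_dim=4 + 1)
opt_q = torch.optim.Adam(q_net.parameters(), lr=1e-3)
opt_w = torch.optim.Adam(w_net.parameters(), lr=1e-3)

def lagrangian(batch, pi_action):
    """L(q, w) = (1 - gamma) E[q(s0, pi(s0))] + E_data[w(s,a) (r + gamma q(s', pi(s')) - q(s,a))].
    The saddle point's value estimates the target policy's value."""
    s, a, r, s_next, s0 = batch
    q_sa = q_net(torch.cat([s, a], -1)).squeeze(-1)
    q_next = q_net(torch.cat([s_next, pi_action(s_next)], -1)).squeeze(-1)
    q_init = q_net(torch.cat([s0, pi_action(s0)], -1)).squeeze(-1)
    w_sa = torch.relu(w_net(torch.cat([s, a], -1)).squeeze(-1))  # ratios kept nonnegative
    bellman_residual = r + gamma * q_next - q_sa
    return (1 - gamma) * q_init.mean() + (w_sa * bellman_residual).mean()

# Synthetic stand-in batch and a fixed target policy (purely for the demo).
n = 256
batch = (torch.randn(n, 4), torch.randn(n, 1), torch.randn(n),
         torch.randn(n, 4), torch.randn(n, 4))
pi_action = lambda s: torch.tanh(s.sum(-1, keepdim=True))

for step in range(200):
    # Gradient ascent on w (maximizer), gradient descent on q (minimizer).
    loss_w = -lagrangian(batch, pi_action)
    opt_w.zero_grad(); loss_w.backward(); opt_w.step()
    loss_q = lagrangian(batch, pi_action)
    opt_q.zero_grad(); loss_q.backward(); opt_q.step()

print("estimated policy value:", lagrangian(batch, pi_action).item())
```

The single-policy refinement in the paper concerns which q and w classes must be realizable and how much coverage the data needs; the alternating ascent/descent above is only one simple way to approach a saddle point.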
arxiv.org/abs/2305.15703 RL only needs the mean reward-to-go (the q-fn), so why is distRL (learning the whole reward-to-go distribution) so empirically effective? We prove distRL does really well when the optimal policy has small loss. When that's true, least squares (q-learning) misses the signal due to heteroskedasticity.
September 27, 2023 at 7:08 PM
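A toy illustration of the heteroskedasticity point, not the paper's analysis: when noise scale varies across states, plain least squares weights all samples equally, while a likelihood/distributional fit that learns the noise scale effectively downweights high-noise samples and recovers the mean more accurately. The data-generating process and the oracle noise weights below are assumptions made purely for the demo.

```python
# Hedged sketch: heteroskedastic toy regression. Plain least squares (the
# q-learning regression) vs. variance-weighted least squares, a stand-in for
# the mean estimate implied by a distributional/likelihood fit.
import numpy as np

def fit_means(n, rng):
    x = rng.uniform(0, 1, size=n)
    true_mean = 2.0 * x                    # "mean reward-to-go" as a function of state
    noise_scale = 0.1 + 3.0 * x            # heteroskedastic noise: large where x is large
    y = true_mean + noise_scale * rng.normal(size=n)
    X = np.stack([np.ones(n), x], axis=1)

    # Plain least squares: every sample weighted equally.
    beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]

    # Variance-weighted least squares (oracle weights here), i.e. what a fit
    # that models the noise scale can exploit.
    w = 1.0 / noise_scale**2
    beta_wls = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))

    grid = np.stack([np.ones(100), np.linspace(0, 1, 100)], axis=1)
    err = lambda b: np.mean((grid @ b - 2.0 * grid[:, 1]) ** 2)
    return err(beta_ols), err(beta_wls)

errs = np.array([fit_means(200, np.random.default_rng(seed)) for seed in range(200)])
print("avg MSE of fitted mean:  OLS = %.4f   variance-weighted = %.4f" % tuple(errs.mean(axis=0)))
```

Both fits are unbiased here; the point is the variance gap, which is what the small-loss regime magnifies in the paper's argument.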
arxiv.org/abs/2207.13081 Off-policy evaluation in POMDPs is tough because hidden states ruin memorylessness, inducing a curse of horizon. Using histories as instrumental variables, we derive a new Bellman equation for a new kind of value function. We solve it with minimax learning to get model-free evaluation with general function approximation.
September 27, 2023 at 7:06 PM
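A schematic, linear instrumental-variable version of the idea of using history features as instruments for a Bellman-type restriction. The paper's actual value function, Bellman equation, and minimax estimator with general function approximation are different; the feature maps, synthetic data, and discount below are illustrative assumptions only.

```python
# Hedged sketch: linear-instrument GMM for a Bellman-style moment condition,
# a closed-form analogue of a minimax conditional-moment fit. Toy data only.
import numpy as np

rng = np.random.default_rng(0)
gamma, n = 0.95, 2000

# Toy logged data: current observation o, next observation o_next, reward r,
# and history features z used as instruments (all synthetic assumptions).
o = rng.normal(size=(n, 3))
o_next = 0.8 * o + 0.2 * rng.normal(size=(n, 3))
r = o @ np.array([1.0, -0.5, 0.3]) + 0.5 * rng.normal(size=n)
z = np.concatenate([np.ones((n, 1)), o, rng.normal(size=(n, 1))], axis=1)

def phi(obs):
    """Linear features for a value function v(o) = phi(o) @ beta (assumed class)."""
    return np.concatenate([np.ones((obs.shape[0], 1)), obs], axis=1)

# The residual r + gamma * v(o_next) - v(o) should be mean-zero given the history.
# With linear instruments this becomes E[ z * residual ] = 0, i.e. the moment
# system A @ beta = b below, solved by GMM with weighting matrix W.
A = z.T @ (phi(o) - gamma * phi(o_next)) / n
b = z.T @ r / n
W = np.linalg.pinv(z.T @ z / n)
beta = np.linalg.solve(A.T @ W @ A, A.T @ W @ b)
print("fitted value-function coefficients:", beta)
```

With richer, nonlinear function and instrument classes, the analogous fit is no longer closed form, which is where the minimax (adversarial test-function) learning in the paper comes in.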