Nathan Kallus
@kallus.bsky.social
🏳️‍🌈👨‍👨‍👧‍👦 interested in causal inference, experimentation, optimization, RL, statML, econML, fairness
Cornell & Netflix
www.nathankallus.com
arxiv.org/abs/2302.02392 In offline RL, we replace exploration with assumptions that the data are nice. We try to make these minimal by refining standard realizability and coverage assumptions down to single policies. We do this via a minimax formulation, with strong guarantees for learning the saddle point.
September 27, 2023 at 7:09 PM
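A minimal sketch of the general flavor of such saddle-point objectives: a Lagrangian-style offline evaluation objective where a value network is minimized against an adversarial density-ratio (coverage) network. This is an illustrative analogue, not the paper's estimator; QNet/WNet architectures, the synthetic batch, and the target policy below are all made-up assumptions.

```python
# Hedged sketch: a generic minimax (Lagrangian) objective for offline policy
# evaluation with function approximation. Not the paper's method; all names,
# networks, and the synthetic data are illustrative assumptions.
import torch
import torch.nn as nn

gamma = 0.99

class MLP(nn.Module):
    def __init__(self, in_dim, out_dim=1):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, out_dim))
    def forward(self, x):
        return self.net(x)

# q(s, a): value class to be minimized; w(s, a): density-ratio class to be maximized.
q_net = MLP(in_dim=4 + 1)   # state dim 4 plus a scalar action feature (assumed)
w_net = MLP(in_dim=4 + 1)
opt_q = torch.optim.Adam(q_net.parameters(), lr=1e-3)
opt_w = torch.optim.Adam(w_net.parameters(), lr=1e-3)

def lagrangian(batch, pi_action):
    """L(q, w) = (1 - gamma) E[q(s0, pi(s0))] + E_data[w(s,a) (r + gamma q(s', pi(s')) - q(s,a))].
    The saddle point's value estimates the target policy's value."""
    s, a, r, s_next, s0 = batch
    q_sa = q_net(torch.cat([s, a], -1)).squeeze(-1)
    q_next = q_net(torch.cat([s_next, pi_action(s_next)], -1)).squeeze(-1)
    q_init = q_net(torch.cat([s0, pi_action(s0)], -1)).squeeze(-1)
    w_sa = torch.relu(w_net(torch.cat([s, a], -1)).squeeze(-1))  # ratios kept nonnegative
    bellman_residual = r + gamma * q_next - q_sa
    return (1 - gamma) * q_init.mean() + (w_sa * bellman_residual).mean()

# Synthetic stand-in batch and a fixed target policy (purely for the demo).
n = 256
batch = (torch.randn(n, 4), torch.randn(n, 1), torch.randn(n),
         torch.randn(n, 4), torch.randn(n, 4))
pi_action = lambda s: torch.tanh(s.sum(-1, keepdim=True))

for step in range(200):
    # Gradient ascent on w (maximizer), gradient descent on q (minimizer).
    loss_w = -lagrangian(batch, pi_action)
    opt_w.zero_grad(); loss_w.backward(); opt_w.step()
    loss_q = lagrangian(batch, pi_action)
    opt_q.zero_grad(); loss_q.backward(); opt_q.step()

print("estimated policy value:", lagrangian(batch, pi_action).item())
```

The single-policy refinement in the paper concerns which q and w classes must be realizable and how much coverage the data needs; the alternating ascent/descent above is only one simple way to approach a saddle point.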
arxiv.org/abs/2305.15703 RL only needs the mean reward-to-go (the q-fn), so why is distRL (learning the whole reward-to-go distribution) so empirically effective? We prove distRL does really well when the optimal policy has small loss. When that's true, least squares (q-learning) misses the signal due to heteroskedasticity.
September 27, 2023 at 7:08 PM
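A toy illustration of the heteroskedasticity point, not the paper's analysis: when noise scale varies across states, plain least squares weights all samples equally, while a likelihood/distributional fit that learns the noise scale effectively downweights high-noise samples and recovers the mean more accurately. The data-generating process and the oracle noise weights below are assumptions made purely for the demo.

```python
# Hedged sketch: heteroskedastic toy regression. Plain least squares (the
# q-learning regression) vs. variance-weighted least squares, a stand-in for
# the mean estimate implied by a distributional/likelihood fit.
import numpy as np

def fit_means(n, rng):
    x = rng.uniform(0, 1, size=n)
    true_mean = 2.0 * x                    # "mean reward-to-go" as a function of state
    noise_scale = 0.1 + 3.0 * x            # heteroskedastic noise: large where x is large
    y = true_mean + noise_scale * rng.normal(size=n)
    X = np.stack([np.ones(n), x], axis=1)

    # Plain least squares: every sample weighted equally.
    beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]

    # Variance-weighted least squares (oracle weights here), i.e. what a fit
    # that models the noise scale can exploit.
    w = 1.0 / noise_scale**2
    beta_wls = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))

    grid = np.stack([np.ones(100), np.linspace(0, 1, 100)], axis=1)
    err = lambda b: np.mean((grid @ b - 2.0 * grid[:, 1]) ** 2)
    return err(beta_ols), err(beta_wls)

errs = np.array([fit_means(200, np.random.default_rng(seed)) for seed in range(200)])
print("avg MSE of fitted mean:  OLS = %.4f   variance-weighted = %.4f" % tuple(errs.mean(axis=0)))
```

Both fits are unbiased here; the point is the variance gap, which is what the small-loss regime magnifies in the paper's argument.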
arxiv.org/abs/2207.13081 Off-policy evaluation in POMDPs is tough because hidden states ruin memorylessness, inducing a curse of horizon. Using histories as instrumental variables, we derive a new Bellman equation for a new kind of value function. We solve it with minimax learning to get model-free evaluation with general function approximation.
September 27, 2023 at 7:06 PM
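A schematic, linear instrumental-variable version of the idea of using history features as instruments for a Bellman-type restriction. The paper's actual value function, Bellman equation, and minimax estimator with general function approximation are different; the feature maps, synthetic data, and discount below are illustrative assumptions only.

```python
# Hedged sketch: linear-instrument GMM for a Bellman-style moment condition,
# a closed-form analogue of a minimax conditional-moment fit. Toy data only.
import numpy as np

rng = np.random.default_rng(0)
gamma, n = 0.95, 2000

# Toy logged data: current observation o, next observation o_next, reward r,
# and history features z used as instruments (all synthetic assumptions).
o = rng.normal(size=(n, 3))
o_next = 0.8 * o + 0.2 * rng.normal(size=(n, 3))
r = o @ np.array([1.0, -0.5, 0.3]) + 0.5 * rng.normal(size=n)
z = np.concatenate([np.ones((n, 1)), o, rng.normal(size=(n, 1))], axis=1)

def phi(obs):
    """Linear features for a value function v(o) = phi(o) @ beta (assumed class)."""
    return np.concatenate([np.ones((obs.shape[0], 1)), obs], axis=1)

# The residual r + gamma * v(o_next) - v(o) should be mean-zero given the history.
# With linear instruments this becomes E[ z * residual ] = 0, i.e. the moment
# system A @ beta = b below, solved by GMM with weighting matrix W.
A = z.T @ (phi(o) - gamma * phi(o_next)) / n
b = z.T @ r / n
W = np.linalg.pinv(z.T @ z / n)
beta = np.linalg.solve(A.T @ W @ A, A.T @ W @ b)
print("fitted value-function coefficients:", beta)
```

With richer, nonlinear function and instrument classes, the analogous fit is no longer closed form, which is where the minimax (adversarial test-function) learning in the paper comes in.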