Lars van der Laan
@larsvanderlaan3.bsky.social
Ph.D. Student @uwstat; Research fellowship @Netflix; visiting researcher @UCJointCPH; M.A. @UCBStatistics - machine learning; calibration; semiparametrics; causal inference.

https://larsvanderlaan.github.io
What does ‘biased’ mean here? The estimator would be biased in expectation, since if you were to repeat the experiment many times, some power users would join. If you instead define your estimator as the empirical mean over non-power users, then it might be unbiased.
June 18, 2025 at 1:24 AM
I’d be surprised if this actually works in practice, since neural networks often overfit (e.g., perfectly fitting the labels under double descent), which violates Donsker conditions. Also, the neural tangent kernel ridge approximation of neural networks has been shown not to hold empirically.
May 26, 2025 at 11:39 PM
Looks like they are assuming the neural network can be approximated by ridge regression in an RKHS (which seems like a strong assumption in practice). Under this approximation, plug-in efficiency follows from fairly standard results on undersmoothed ridge regression; see, e.g., arxiv.org/abs/2306.08598
Kernel Debiased Plug-in Estimation: Simultaneous, Automated Debiasing without Influence Functions for Many Target Parameters
May 26, 2025 at 11:34 PM
Calibrate your outcome predictions and propensities using isotonic regression as follows:

mu_hat <- as.stepfun(isoreg(mu_hat, Y))(mu_hat)

pi_hat <- as.stepfun(isoreg(pi_hat, A))(pi_hat)

(Or use the isoreg_with_xgboost function given in the paper, which I recommend)
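
For anyone who wants to try this end to end, here is a minimal self-contained sketch (simulated data; the glm/lm fits are just stand-ins for whatever cross-fitted ML predictions you actually have):

# Simulate data and form placeholder nuisance predictions
set.seed(1)
n <- 1000
X <- rnorm(n)
A <- rbinom(n, 1, plogis(0.5 * X))   # treatment
Y <- X + A + rnorm(n)                # outcome

pi_hat <- predict(glm(A ~ X, family = binomial), type = "response")  # propensity predictions
mu_hat <- predict(lm(Y ~ X + A))                                     # outcome predictions

# Isotonic calibration: fit a monotone regression of the labels on the
# predictions, then evaluate the fitted step function back at the predictions.
mu_hat_cal <- as.stepfun(isoreg(mu_hat, Y))(mu_hat)
pi_hat_cal <- as.stepfun(isoreg(pi_hat, A))(pi_hat)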
May 19, 2025 at 12:07 AM
This work is a result of my internship at Netflix over the summer and is joint with Aurelien Bibaut and Nathan Kallus.
May 12, 2025 at 6:10 PM
Inference for smooth functionals of M-estimands in survival models, such as the regularized Cox PH model and the beta-geometric model (see our experiments section), is one application of this approach.
May 12, 2025 at 5:56 PM
By targeting low-dimensional summaries, we avoid having to establish asymptotic normality of the entire infinite-dimensional M-estimator (which isn’t possible in general). This allows ML and regularization to be used for estimation, with valid inference via a one-step bias correction.
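
In symbols (a rough sketch in my own notation, not necessarily the paper’s exact statement): with M-estimand \theta_0 = \arg\min_\theta E[\ell(\theta; Z)], an ML/regularized fit \hat\theta, and a smooth target \psi(\theta_0), the one-step estimator takes the usual form

\hat\psi = \psi(\hat\theta) - \frac{1}{n} \sum_{i=1}^{n} \partial_\theta \ell(\hat\theta; Z_i)[\hat\alpha],

where \partial_\theta \ell(\theta; Z)[h] denotes the directional derivative of the loss in direction h and \hat\alpha estimates the Riesz representer of the functional’s derivative.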
May 12, 2025 at 5:53 PM
If you’re willing to consider smooth functionals of the infinite-dimensional M-estimand, then there is a general theory for inference, where the sandwich variance estimator now involves the derivative of the loss and a Riesz representer of the functional.
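
Concretely (again a sketch in my notation): the Riesz representer \alpha_0 solves E[\partial_\theta^2 \ell(\theta_0; Z)[\alpha_0, h]] = \partial \psi(\theta_0)[h] for all directions h, and the variance is estimated by

\hat\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} \big( \partial_\theta \ell(\hat\theta; Z_i)[\hat\alpha] \big)^2,

which collapses to the classical sandwich \nabla\psi^\top H^{-1} E[\nabla\ell\, \nabla\ell^\top] H^{-1} \nabla\psi when \theta is finite-dimensional with Hessian H.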

Working paper:
arxiv.org/pdf/2501.11868
May 12, 2025 at 5:51 PM
The motivation should have been something like: a confounder that is somewhat predictive of both the treatment and the outcome might be more important to adjust for than a variable that is highly predictive of the outcome but doesn’t predict treatment. TR might help give more importance to such variables.
April 25, 2025 at 4:45 PM
One could have given an analogous theorem saying that E[Y | T, X] is a sufficient deconfounding score and argued that one should only adjust for features predictive of the outcome. So yeah, I think it’s wrong/poorly phrased.
April 25, 2025 at 5:12 AM
The OP’s approach is based on the conditional probability of Y given that the treatment is intervened upon and set to some value. But they don’t seem to define formally what this means, which is exactly what potential outcomes/NPSEMs achieve.
April 1, 2025 at 3:33 AM
The second-stage coefficients are the estimand (they identify the structural coefficients/treatment effect). The first-stage coefficients are nuisances and are typically not of direct interest.
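
Assuming this is about 2SLS, here is a toy simulated illustration (my own hypothetical example, not from the thread): the first-stage fit is only used to construct the instrumented regressor, while the second-stage coefficient is the quantity being estimated.

# Z is the instrument, D the treatment, Y the outcome; U is an unobserved
# confounder, and the structural effect of D on Y is 1.5.
set.seed(1)
n <- 10000
Z <- rnorm(n); U <- rnorm(n)
D <- 0.8 * Z + U + rnorm(n)
Y <- 1.5 * D + U + rnorm(n)

D_hat <- fitted(lm(D ~ Z))       # first stage: a nuisance fit
coef(lm(Y ~ D_hat))["D_hat"]     # second stage: the estimand, approx. 1.5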
March 20, 2025 at 9:35 PM