Edward H. Kennedy
@edwardhkennedy.bsky.social
assoc prof of statistics & data science at Carnegie Mellon

https://www.ehkennedy.com/

interested in causality, machine learning, nonparametrics, public policy, etc
Awesome!
April 6, 2025 at 2:59 PM
Ok I think I'll stop now :) I'm always amazed at how ahead of its time this work was.

It's too bad it's not as widely known among us causal+ML people
February 17, 2025 at 2:47 AM
Once you have a pathwise differentiable parameter, a natural estimator is a debiased plug-in, which subtracts off the avg of the estimated influence fn

Pfanzagl gives this 1-step estimator here - in causal inference this is exactly the doubly robust / DML estimator you know & love!
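
A minimal sketch of that estimator for the ATE, assuming binary treatment and already-fitted nuisance values (the function and argument names here are my own, hypothetical ones):

```python
import numpy as np

def one_step_ate(y, a, mu0, mu1, pi):
    """One-step / debiased plug-in (AIPW) estimator of the ATE.

    y, a     : outcome and binary treatment arrays
    mu0, mu1 : fitted outcome regressions E[Y | A=0, X], E[Y | A=1, X]
    pi       : fitted propensity scores P(A=1 | X)
    """
    plug_in = mu1 - mu0
    # first-order bias correction: avg of the estimated influence fn
    correction = a * (y - mu1) / pi - (1 - a) * (y - mu0) / (1 - pi)
    return np.mean(plug_in + correction)
```

In practice you'd pair this with sample splitting / cross-fitting; there's a sketch of that in the Feb 11 thread below.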
February 17, 2025 at 2:47 AM
Pfanzagl uses pathwise differentiability above, but w/ regularity conditions this is just a distributional Taylor expansion, which is easier to think about

I note this in my tutorial here:

www.ehkennedy.com/uploads/5/8/...

Also v related to so-called "Neyman orthogonality" - worth separate thread
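
Spelled out (notation loosely following the tutorial): for a smooth functional ψ and estimate P̂ of P,

```latex
\[
\psi(\hat P) = \psi(P) + \int \varphi(z; \hat P) \, d(\hat P - P)(z) + R_2(\hat P, P)
\]
```

where φ is the influence function and R₂ is a second-order remainder. Since φ(·; P̂) averages to zero under P̂, the middle term equals −∫ φ(z; P̂) dP(z), i.e., exactly the first-order bias the one-step estimator removes.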
February 17, 2025 at 2:47 AM
Here’s Pfanzagl on the gradient of a functional/parameter, aka derivative term in a von Mises expansion, aka influence function, aka Neyman-orthogonal score

Richard von Mises first characterized smoothness this way for stats in the 30s/40s! e.g.:

projecteuclid.org/journals/ann...
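
The standard warm-up example, just to fix ideas: for the mean ψ(P) = ∫ z dP(z), perturbing P toward a point mass δ_z gives

```latex
\[
\frac{\partial}{\partial \epsilon} \, \psi\big( (1-\epsilon) P + \epsilon\, \delta_z \big) \Big|_{\epsilon=0}
  = z - \psi(P) \equiv \varphi(z; P)
\]
```

so the influence function of the mean is just the centered observation (and the one-step estimator reduces to the sample mean).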
February 17, 2025 at 2:47 AM
The m-estimator logic certainly relies on “exactly correct”

Once you start moving to “close enough”, to me that means you’re no longer getting precise root-n rates with the nuisances. Then you’ll have to deal with the bias/variance consequences just as if you were using flexible ML
February 11, 2025 at 3:12 PM
And here for more specific discussion:

arxiv.org/pdf/2405.08525

I think DR estimation vs inference are two quite different things, and we need different assumptions to make them work
February 11, 2025 at 2:47 PM
If we really rely on 2 parametric models, we should of course use a variance estimator recognizing this. But this is more about how we model the nuisances vs the DR estimator itself

Also our paper here suggests strictly more assumptions are needed for DR inference vs estimation:

arxiv.org/pdf/2305.04116
February 11, 2025 at 2:43 PM
I find it much more believable that I could estimate both nuisances consistently, but at slower rates, vs that I could pick 2 parametric models (without looking at data) & happen to get one exactly correct
February 11, 2025 at 2:43 PM
Hm not sure I agree with this logic…

To me the beautiful thing about the DR estimator is you can get away with estimating both nuisances at slower rates (as long as the product of their errors is o(1/sqrt(n)))

This opens the door to using much more flexible methods - random forests, lasso, ensembles, etc etc
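
A minimal cross-fitting sketch along these lines, with random forests as the flexible nuisance learners (sklearn assumed; the function and names are hypothetical, not from any particular paper):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import KFold

def crossfit_dr_ate(x, a, y, n_splits=2, seed=0):
    """Cross-fitted doubly robust (AIPW) estimate of the ATE + std error.

    Nuisances are fit on training folds and evaluated on held-out folds,
    so flexible learners can be used without own-observation overfitting.
    """
    n = len(y)
    phi = np.zeros(n)
    for train, test in KFold(n_splits, shuffle=True, random_state=seed).split(x):
        # flexible nuisance estimates: two outcome regressions + propensity score
        mu1 = RandomForestRegressor(random_state=seed).fit(
            x[train][a[train] == 1], y[train][a[train] == 1])
        mu0 = RandomForestRegressor(random_state=seed).fit(
            x[train][a[train] == 0], y[train][a[train] == 0])
        ps = RandomForestClassifier(random_state=seed).fit(x[train], a[train])
        m1, m0 = mu1.predict(x[test]), mu0.predict(x[test])
        p = np.clip(ps.predict_proba(x[test])[:, 1], 0.01, 0.99)  # trim for overlap
        # estimated efficient influence function values on the held-out fold
        phi[test] = (m1 - m0
                     + a[test] * (y[test] - m1) / p
                     - (1 - a[test]) * (y[test] - m0) / (1 - p))
    # root-n CIs are justified when the product of nuisance errors is o(n^{-1/2})
    return phi.mean(), phi.std() / np.sqrt(n)
```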
February 11, 2025 at 2:43 PM
Here's the recent paper!

bsky.app/profile/edwa...
In this paper we consider incremental effects of continuous exposures:

arxiv.org/abs/2409.11967

i.e., soft interventions on continuous treatments like dose, duration, frequency

it turns out exponential tilts preserve all the nice properties of incremental effects with binary treatment (arxiv.org/abs/1704.00211)
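
For reference, the shape of the two interventions as I'd sketch them (my notation; see the papers for the exact definitions): the binary incremental intervention multiplies the odds of treatment by δ, while the continuous version exponentially tilts the conditional exposure density:

```latex
\[
q_\delta(x) = \frac{\delta\, \pi(x)}{\delta\, \pi(x) + 1 - \pi(x)}
\quad \text{(binary)},
\qquad
dQ_\delta(a \mid x) = \frac{e^{\delta a}\, dP(a \mid x)}{\int e^{\delta a'}\, dP(a' \mid x)}
\quad \text{(continuous)}
\]
```

Both are "soft": they shift the observational exposure distribution rather than setting treatment to a fixed value, so no positivity assumption is needed.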
December 19, 2024 at 2:24 AM
Reposted by Edward H. Kennedy
@bonv.bsky.social presented this at NYU this week -- terrific work with an excellent presentation (no surprise there)! I found the connections to higher-order estimators and the orthogonalizing property of the U-stat kernel fascinating & illuminating.
December 13, 2024 at 7:05 PM