Ehud Karavani
@ehudk.bsky.social
Research Staff Member at IBM Research.
Causal Inference 🔴→🟠←🟡.
Machine Learning 🤖🎓.
Data Communication 📈.
Healthcare ⚕️.
Creator of 𝙲𝚊𝚞𝚜𝚊𝚕𝚕𝚒𝚋: https://github.com/IBM/causallib
Website: https://ehud.co
it ain't nuthin' but a g thang
November 8, 2025 at 5:07 PM
An excellent opportunity to sneak a DAG into a flowchart
October 23, 2025 at 2:05 PM
these parentheses have a lot to unpack 😅
May 22, 2025 at 5:53 AM
fixed it 🤭
May 14, 2025 at 6:46 AM
I'll admit the argument through IF theory is beyond me at this hour of my day, but I believe my case holds in the simplest simulation conceivable.
full code: gist.github.com/ehudkr/a9dd3...
April 26, 2025 at 7:28 PM
oh don't look at me just causally bayesianizing my lm
April 25, 2025 at 11:27 AM
starting on quantum computing, and now I cannot unsee bra-kets over Bell states everywhere I go.
April 23, 2025 at 3:04 PM
i mean, not quite, but also not entirely wrong?
February 27, 2025 at 12:15 PM
when I present this topic, I often also exclude the upper-right quadrant to amplify the effect (and for realism 😬😅)
February 26, 2025 at 5:09 PM
However, if the variables affecting the decision to prolong treatment (X1) are themselves affected by the first treatment decision (A0), which is plausible, then regular regression methods can no longer provide a valid estimate, because of the feedback between the treatment and the confounders.
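For concreteness, here's a toy simulation of that feedback structure (my own invention, with made-up variable names and coefficients, not from the thread), plus one standard fix, inverse-probability weighting for a marginal structural model; it's a sketch, not necessarily what the author had in mind:

```python
# Toy illustration of treatment-confounder feedback:
# X1 confounds A1 but is itself caused by A0.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 200_000
pre = rng.normal(size=n)
X0 = rng.normal(size=n)
A0 = rng.binomial(1, 1 / (1 + np.exp(-(X0 + 0.5 * pre))))
X1 = 0.8 * A0 + 0.5 * X0 + rng.normal(size=n)   # affected by A0 (the feedback)
A1 = rng.binomial(1, 1 / (1 + np.exp(-X1)))
post = pre + A0 + A1 + 0.6 * X1 + rng.normal(size=n)
df = pd.DataFrame(dict(pre=pre, X0=X0, A0=A0, X1=X1, A1=A1, post=post))

# Plain regression is stuck: including X1 blocks the A0 -> X1 -> post path
# (so A0's coefficient misses part of its effect), while excluding X1 leaves
# A1 confounded.
print(smf.ols("post ~ pre + X0 + A0 + X1 + A1", df).fit().params[["A0", "A1"]])
print(smf.ols("post ~ pre + X0 + A0 + A1", df).fit().params[["A0", "A1"]])

# One standard fix: inverse-probability weighting, i.e. a marginal structural
# model for the joint intervention on (A0, A1), with unstabilized weights.
ps0 = smf.logit("A0 ~ pre + X0", df).fit(disp=0).predict(df)
ps1 = smf.logit("A1 ~ X1", df).fit(disp=0).predict(df)
w = 1 / (np.where(df["A0"] == 1, ps0, 1 - ps0) *
         np.where(df["A1"] == 1, ps1, 1 - ps1))
msm = smf.wls("post ~ A0 + A1", df, weights=w).fit()
# In this data-generating process the joint-intervention effects are
# A0: 1 + 0.8 * 0.6 = 1.48 (direct plus the path through X1) and A1: 1.
print(msm.params[["A0", "A1"]])
```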
February 19, 2025 at 3:18 PM
If the second treatment is also confounded (as is probably the case) BUT THESE CONFOUNDERS ARE NOT AFFECTED BY THE FIRST TREATMENT, then the adjustment is still quite simple:
post ~ pre + X0 + A0 + X1 + A1
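A minimal sketch of fitting that adjustment with statsmodels' formula API; the data, variable names, and coefficients are my own toy simulation (X1 confounds A1 but is not affected by A0), not from the thread:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
n = 10_000
df = pd.DataFrame({
    "pre": rng.normal(size=n),
    "X0": rng.normal(size=n),
    "X1": rng.normal(size=n),   # confounds A1 but is NOT affected by A0
})
df["A0"] = rng.binomial(1, 1 / (1 + np.exp(-df["X0"])))
df["A1"] = rng.binomial(1, 1 / (1 + np.exp(-df["X1"])))
df["post"] = (df["pre"] + df["A0"] + df["A1"]
              + 0.5 * df["X0"] + 0.5 * df["X1"] + rng.normal(size=n))

# Adjusting for both sets of confounders in one regression is valid here
# precisely because X1 is not a descendant of A0.
model = smf.ols("post ~ pre + X0 + A0 + X1 + A1", data=df).fit()
print(model.params[["A0", "A1"]])  # both should be close to 1
```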
February 19, 2025 at 3:18 PM
If you care about the cumulative effect of treatment over time, then you need to account for treatment varying over time.

In the simplest case, if the second treatment is completely randomized then it isn't a big deal:
post ~ pre + X0 + A0 + A1
will suffice
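A matching toy sketch (again, my own made-up data) where A1 is completely randomized, so no second-stage confounder needs to enter the model:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
n = 10_000
pre = rng.normal(size=n)
X0 = rng.normal(size=n)
A0 = rng.binomial(1, 1 / (1 + np.exp(-X0)))   # confounded by X0
A1 = rng.binomial(1, 0.5, size=n)             # completely randomized
post = pre + A0 + A1 + 0.5 * X0 + rng.normal(size=n)

df = pd.DataFrame(dict(pre=pre, X0=X0, A0=A0, A1=A1, post=post))
fit = smf.ols("post ~ pre + X0 + A0 + A1", data=df).fit()
print(fit.params[["A0", "A1"]])  # both ≈ 1: adjusting for X0 alone suffices
```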
February 19, 2025 at 3:18 PM
Welcome to the zoo of time-varying treatments, Solomon. There are different answers depending on the question of interest.

The simplest answer is to ignore the treatment being dynamic. It will answer whether treatment _initiation_ is effective, regardless of how well patients stick to the protocol.
February 19, 2025 at 3:18 PM
now I'm feeling like a sucker for making this absolutely stunning graphical abstract last week
February 4, 2025 at 1:58 PM
This is a bit of a tangent, but still a related and interesting perspective on the topic (and the authors seem to have read Ben there)
arxiv.org/abs/2407.12220
January 31, 2025 at 8:17 PM
I'm flattered, but now you made me draw DAGs, Ben.
On the left, you don't expect ɛ (y=f(X)+ɛ) to be consistent across data splits since it's random, and thus fitting it is bad.
On the right, you don't expect U (ruler) to appear on deployment, so a model using it instead of X (skin) will be wrong.
January 31, 2025 at 8:08 PM
Appreciate the post, and I agree DL provided new evidence.
I just think overfitting assumes i.i.d. train/test data, so I'm not sure cases like the one described in this paragraph hold (e.g., a black swan).
I don't think poor performance due to distribution shift would be classified as "overfitting".
January 31, 2025 at 9:09 AM
For a Bayesian view of this issue I can recommend "Regularization and Confounding in Linear Regression for Treatment Effect Estimation" by Hahn and friends, if only for coining "regularization-induced confounding"
projecteuclid.org/journals/bay...
January 15, 2025 at 9:26 PM
Gave a hands-on causal inference workshop in Python tonight at the DataNights/DataHack causality course and really enjoyed how engaged everyone was.
January 7, 2025 at 8:36 PM
oh of course, the 3rd english dialect from the pirates of the C
January 6, 2025 at 1:14 PM
I didn't know Quarto could do this!
But it seems as simple as everything else in Quarto:
closeread.dev
January 6, 2025 at 10:24 AM
"next week" who was I kidding...
anyways, it's here now, and I'm glad I sat down and wrote it because I discovered I had to iron out some personal misunderstandings.
so without further ado and with even shinier visuals, a post about double cross-fitting for #causalinference
ehud.co/blog/2024/03...
January 2, 2025 at 7:17 PM