Wouter van Amsterdam
@vanamsterdam.bsky.social
machine learning, causal inference, healthcare - assistant professor in the department of Data Science Methods, Julius Center, University Medical Center Utrecht, the Netherlands; wvanamsterdam.com
Work with Diantha Schipaanboord, Floor B.H. van der Zalm, René van Es, Melle Vessies, Rutger R. van de Leur, Klaske R. Siegersma, Pim van der Harst, Hester M. den Ruijter, N. Charlotte Onland-Moret, on behalf of the IMPRESS consortium
September 4, 2025 at 12:03 PM
Conclusion: The convolutional neural networks in this study demonstrated resilience to simulated sex-imbalance in training ECG data.

pre-print: doi.org/10.1101/2025...
ECG classification with convolutional neural networks demonstrates resilience to sex-imbalances in data
September 4, 2025 at 12:03 PM
Discrimination remained stable across sexes; only calibration shifted in extreme scenarios when prevalence differed by sex, with similar patterns for women and men.
September 4, 2025 at 12:03 PM
Using ~165k ECGs, we simulated sex-imbalances in representation (women-to-men ratio), outcome prevalence, and misclassification in the training data for LBBB, long QT syndrome, LVH, and physician-labeled “abnormal” ECGs.
September 4, 2025 at 12:03 PM
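The imbalance protocol described above might be sketched like this. This is a hypothetical toy version, not the study's code: a synthetic one-dimensional "ECG feature" and logistic regression stand in for real ECGs and CNNs, and the resilience is baked in by construction here because the simulated feature-given-outcome distribution is identical for both sexes.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

def make_cohort(n):
    """Synthetic cohort: outcome prevalence and X|Y identical for both sexes."""
    sex = rng.integers(0, 2, n)                       # 0 = men, 1 = women
    y = rng.binomial(1, 0.2, n)                       # outcome label
    x = rng.normal(2.0 * y, 1.0, n).reshape(-1, 1)    # 'ECG feature' driven by outcome only
    return x, y, sex

def train_with_ratio(women_ratio, n=20_000):
    """Train after subsampling women to the requested women-to-men ratio."""
    x, y, sex = make_cohort(n)
    keep = (sex == 0) | (rng.random(n) < women_ratio)
    model = LogisticRegression().fit(x[keep], y[keep])
    xt, yt, st = make_cohort(n)                       # fresh, balanced test cohort
    p = model.predict_proba(xt)[:, 1]
    return {s: roc_auc_score(yt[st == s], p[st == s]) for s in (0, 1)}

results = {r: train_with_ratio(r) for r in (1.0, 0.25, 0.05)}
for ratio, auc in results.items():
    print(f"women-to-men ratio {ratio:.2f}: AUC men {auc[0]:.3f}, women {auc[1]:.3f}")
```

Even at a 0.05 women-to-men training ratio, discrimination stays essentially equal across sexes in this toy setup, because nothing about X|Y differs between the groups.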
Even if you model a physical system, e.g. average yearly temperature as a function of altitude, and assume that temperature given altitude is the same everywhere: if you invert it into predicting the presence of a mountain given temperature, you'll find varying discrimination in different countries. Example from Schölkopf's talks
April 25, 2025 at 2:58 PM
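The inverted-physics example can be simulated in a few lines (my own toy numbers, not from Schölkopf's talks): temperature given altitude follows the same law in every country, but the altitude distribution differs, so the anticausal predictor "mountain given temperature" discriminates differently per country.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)

def country(height_mean, height_sd, n=50_000):
    """One country: its own altitude distribution, but a shared temp-given-altitude law."""
    height = rng.normal(height_mean, height_sd, n).clip(min=0.0)   # metres
    temp = 15.0 - 0.0065 * height + rng.normal(0.0, 1.0, n)        # same lapse rate everywhere
    mountain = (height > 1000.0).astype(int)                       # label: mountainous location
    return temp, mountain

aucs = {}
for name, (mu, sd) in {"flat": (200.0, 300.0), "alpine": (900.0, 600.0)}.items():
    temp, mountain = country(mu, sd)
    aucs[name] = roc_auc_score(mountain, -temp)   # invert: colder predicts mountain
    print(f"{name} country: AUC {aucs[name]:.3f}")
```

In the flat country almost all locations sit far below the mountain threshold, so the few mountains are easy to separate by temperature; in the alpine country many locations hover near the threshold and discrimination drops, despite the physics being identical.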
You’ve modeled a system with no meaningful variation across environments. The model may be reliable in the tested environments, but you haven’t shown robustness to variation in distributions, as you haven’t observed any.
April 25, 2025 at 2:56 PM
A question that remains is how these differences between environments come about and what to do about them in practice. On this, I wrote a paper, available here: arxiv.org/abs/2409.01444

fin!
A causal viewpoint on prediction model performance under changes in case-mix: discrimination and calibration respond differently for prognosis and diagnosis predictions
April 25, 2025 at 11:13 AM
If the distribution of the outcome given the features remains the same (Y|X), calibration is preserved. If both are the same, the environments were not meaningfully different to begin with!

a more lengthy explanation is in this blog post: wvanamsterdam.com/posts/250425...
wvanamsterdam.com
April 25, 2025 at 11:13 AM
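Both halves of the TLDR can be checked with a toy simulation (a hypothetical sketch with synthetic data and logistic regression, not the paper's experiments): shifting outcome prevalence in an anticausal setup keeps X|Y fixed, so discrimination is preserved while Y|X, and hence calibration, drifts.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(2)

def sample(prevalence, n=100_000):
    """Anticausal setup: Y causes X, so X|Y is fixed and prevalence shifts Y|X."""
    y = rng.binomial(1, prevalence, n)
    x = rng.normal(2.0 * y, 1.0, n).reshape(-1, 1)   # X|Y identical in every environment
    return x, y

x_tr, y_tr = sample(prevalence=0.2)
model = LogisticRegression().fit(x_tr, y_tr)

results = {}
for prev in (0.2, 0.05):                              # source vs prevalence-shifted environment
    x_te, y_te = sample(prev)
    p = model.predict_proba(x_te)[:, 1]
    auc = roc_auc_score(y_te, p)                      # discrimination
    miscal = p.mean() - y_te.mean()                   # calibration-in-the-large
    results[prev] = (auc, miscal)
    print(f"prevalence {prev}: AUC {auc:.3f}, mean(p) - mean(y) = {miscal:+.3f}")
```

The AUC barely moves between the two environments, while the model trained at 20% prevalence systematically over-predicts in the 5% environment.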
as promised (so all of you can breathe normally again), here's my TLDR answer:

Environments must differ with respect to something. If the distribution of features given outcome remains the same (X|Y), discrimination is preserved;
April 25, 2025 at 11:13 AM
what are the exceptions?
April 11, 2025 at 6:12 AM
2. an external reproduction of the PROTECT method from Manchester University with Charlie Cuniffe, Matt Sperrin and Gareth Price (www.nature.com/articles/s41...)

3. a 'causal' meta-analysis method using only aggregate data, exciting work with Qingyang Shi from Groningen University
Individual treatment effect estimation in the presence of unobserved confounding using proxies: a cohort study in stage III non-small cell lung cancer - Scientific Reports
April 9, 2025 at 6:28 AM
Building in the physics is one way to potentially get the right causal mechanisms.

Insofar as the model is trained on real-world patient data, you'll still have to ensure no biases, e.g. related to confounding, creep in.
December 23, 2024 at 8:01 AM
Not sure about overfitting; results seemed robust to 5-site cross-validation.

It just learns correlations; what's wrong with that? The words 'confounders' and 'bias' make it sound as if they expected the model to yield some causal understanding. Maybe these heatmaps are the new Table 2 fallacy.
December 16, 2024 at 7:23 PM