Maarten van Smeden
maartenvsmeden.bsky.social

statistician • associate prof • team lead health data science and head methods research program at julius center • director ai methods lab, umc utrecht, netherlands • views and opinions my own


Prediction models that are used to guide medical decisions are usually regulated under medical device regulation. This means that putting a calculator out there to promote the use of your new prediction model is likely to break some rules.

The lasso works really well in particular settings and for particular purposes. If you are after high prediction performance alone and you have a rather large sample size, it can be an excellent choice indeed. But most analytical goals are not only about prediction.
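To illustrate the favourable case described above, here is a small sketch (my own, not from the post; the data-generating setup and the penalty choice are assumptions for illustration): with a sparse truth and a large sample, a plain coordinate-descent lasso recovers the signal and predicts well on new data.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate-descent lasso with soft-thresholding (no intercept)."""
    n, p = X.shape
    beta = np.zeros(p)
    col_ss = (X ** 2).sum(axis=0)      # per-column sum of squares
    resid = y - X @ beta               # running residual
    for _ in range(n_iter):
        for j in range(p):
            resid += X[:, j] * beta[j]           # remove j's contribution
            rho = X[:, j] @ resid
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_ss[j]
            resid -= X[:, j] * beta[j]           # add back updated contribution
    return beta

rng = np.random.default_rng(1)
n, p = 2000, 50                        # "rather large" n relative to p
X = rng.standard_normal((n, p))
true_beta = np.zeros(p)
true_beta[:5] = [3.0, -2.0, 1.5, 1.0, -1.0]  # sparse truth
y = X @ true_beta + rng.standard_normal(n)

beta_hat = lasso_cd(X, y, lam=2.0 * np.sqrt(n * np.log(p)))

# Evaluate prediction on fresh data from the same process
X_new = rng.standard_normal((n, p))
y_new = X_new @ true_beta + rng.standard_normal(n)
r2_new = 1 - np.sum((y_new - X_new @ beta_hat) ** 2) / np.sum((y_new - y_new.mean()) ** 2)
print(f"nonzero coefficients: {np.sum(beta_hat != 0)} of {p}; new-data R^2: {r2_new:.3f}")
```

Note the flip side implied by the post: the estimated coefficients are shrunken toward zero, so they are fine for prediction but not directly interpretable as effect estimates.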

Kind reminder: data driven variable selection (e.g. forward/stepwise/univariable screening) makes things *worse* for most analytical goals
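The reminder above can be made concrete with a tiny null simulation (my own sketch; sample size, number of candidates, and the top-k screening rule are assumptions for illustration): univariable screening on pure noise still "finds" predictors and inflates apparent performance.

```python
import numpy as np

rng = np.random.default_rng(42)
n, p, k = 100, 50, 5                   # small sample, many candidate predictors

# Null world: the outcome is unrelated to every candidate predictor
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

# Univariable screening: keep the k predictors most correlated with y
corrs = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(p)])
keep = np.argsort(corrs)[-k:]

def design(M):
    return np.column_stack([np.ones(len(M)), M[:, keep]])

# Fit OLS on the "winners" and compute apparent R^2 on the same data
beta, *_ = np.linalg.lstsq(design(X), y, rcond=None)
fitted = design(X) @ beta
apparent_r2 = 1 - np.sum((y - fitted) ** 2) / np.sum((y - y.mean()) ** 2)

# The same model evaluated on fresh data from the same null process
X_new, y_new = rng.standard_normal((n, p)), rng.standard_normal(n)
pred = design(X_new) @ beta
new_r2 = 1 - np.sum((y_new - pred) ** 2) / np.sum((y_new - y_new.mean()) ** 2)

print(f"apparent R^2: {apparent_r2:.2f}; new-data R^2: {new_r2:.2f}")
```

The selected "predictors" look useful in-sample purely because they were selected on the same data used to judge them.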

NEW FULLY FUNDED PHD POSITION

Looking for a motivated PhD candidate to join our team. Together with Danya Muilwijk, Jeffrey Beekman and me, you will explore the opportunities and limitations of AI in the context of organoids.

For more info and for applying 👉
www.careersatumcutrecht.com/vacancies/sc...
Vacancy — PhD position on AI methodology for prediction of patient outcomes using organoid models
Are you passionate about bringing personalized medicine to the next level and making a real impact in healthcare? Join our team and develop novel AI methodology to improve predictions of relevant patient ...
www.careersatumcutrecht.com

This is right tho. Let’s therefore call them sensitivity positive predictive value curves bsky.app/profile/laur...
9. It's annoying how often the same model is "discovered" in a different field, with a completely different set of jargon

No.
5. You should use a precision-recall curve for a binary classifier, not an ROC curve
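The disagreement above hinges on a real property of the two curves; here is a sketch of it (my own, not from the thread; the score distributions and threshold are assumptions for illustration). With the same classifier scores, the ROC AUC is essentially unchanged when negatives become far more common, while precision collapses — so PR curves answer a prevalence-dependent question, not a universally better one.

```python
import numpy as np

rng = np.random.default_rng(0)

def roc_auc(pos, neg):
    """Rank-based AUC: P(random positive scores above random negative)."""
    scores = np.concatenate([pos, neg])
    ranks = scores.argsort().argsort() + 1.0
    n_pos = len(pos)
    return (ranks[:n_pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * len(neg))

def precision_at(pos, neg, thresh):
    tp = np.sum(pos > thresh)
    fp = np.sum(neg > thresh)
    return tp / (tp + fp)

# Identical score distributions, two prevalences: 50% vs 10% positives
pos = rng.normal(1.0, 1.0, 1000)
neg_a = rng.normal(0.0, 1.0, 1000)     # balanced
neg_b = rng.normal(0.0, 1.0, 9000)     # negatives 9x more common

auc_a, auc_b = roc_auc(pos, neg_a), roc_auc(pos, neg_b)
prec_a, prec_b = precision_at(pos, neg_a, 0.5), precision_at(pos, neg_b, 0.5)

print(f"ROC AUC:   {auc_a:.3f} vs {auc_b:.3f}")
print(f"precision: {prec_a:.3f} vs {prec_b:.3f}")
```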

I wonder who those people are who come here dying to know what GenAI has done with some prompt you put in

If you think AI is cool, wait until you learn about regression analysis

TL;DR: Explainable AI models often don't do a good job explaining. They can be very useful for description. We should be really careful when using Explainable AI in clinical decision making, and even when judging face validity of AI models

Excellently led by @alcarriero.bsky.social

NEW PREPRINT

Explainable AI refers to an extremely popular group of approaches that aim to open "black box" AI models. But what can we see when we open the black AI box? We use Galit Shmueli's framework (to describe, predict or explain) to evaluate

arxiv.org/abs/2508.05753

This is, however, not clever or safe writing; it is a bad collective habit that needs to stop. Not by avoiding references to causality, but by referring to it clearly.

pubmed.ncbi.nlm.nih.gov/37286459/
Guidelines for Reporting Observational Research in Urology: The Importance of Clear Reference to Causality - PubMed
Observational studies often dance around the issue of causality. We propose guidelines to ensure that papers refer to whether or not the study aim is to investigate causality, and suggest language to ...
pubmed.ncbi.nlm.nih.gov

The healthcare literature is filled with "risk factors". This word combination makes research findings sound important by implying causality, while avoiding direct claims of having identified causal associations that are easily critiqued.


Reposted by Maarten van Smeden

Wait people are sending MDPI cash money?

Reposted by Maarten van Smeden

Once my selection had been approved by the AAAS board, I spoke with Rush Holt about details. He said that the salary would be the same as the current EiC Marcia McNutt, namely $500,000/year. I was surprised and, frankly, a bit confused.

10/n
[GIF: a large pile of gold coins is being poured out of a vault]

Periodic reminder the world of data analysis cannot be meaningfully categorised into "machine learning" and "statistics". Two cultures with substantial overlap in the use of methods (e.g. logistic regression), analytical goals (e.g. causal inference) and history

jamanetwork.com/journals/jam...
Wrote to Scientific Reports on February 8, 2024, that a newly published meta-analysis on mindfulness & brain morphology excluded all null findings and therefore ... by definition found a relationship.

Still no proper response from the journal (other than many "we'll look into it"). It's been a year now.

Reposted by Maarten van Smeden

I'm now audience captured. A few more gems:

Reposted by Maarten van Smeden

What is common knowledge in your field, but shocks outsiders?

We're not clear on what peer review is, at all.
What is common knowledge in your field, but shocks outsiders?

We’re not clear on what intelligence is, at all
What is common knowledge in your field, but shocks outsiders?

We’re not clear on what *information* is, at all

And taking this analogy one step further: it gives genuine phone repair shops a bad name

When forced to make a choice, my choice will be logistic regression model over linear probability model 103% of the time
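One concrete reason behind that (joking) 103%, sketched in code (my own illustration; the data-generating setup is an assumption): a linear probability model fit by OLS happily predicts "probabilities" outside [0, 1], while logistic regression cannot.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 500
x = rng.normal(0.0, 2.0, n)
y = (rng.random(n) < 1 / (1 + np.exp(-2 * x))).astype(float)

X = np.column_stack([np.ones(n), x])

# Linear probability model: plain OLS on the 0/1 outcome
beta_lpm, *_ = np.linalg.lstsq(X, y, rcond=None)
p_lpm = X @ beta_lpm

# Logistic regression via simple gradient ascent on the log-likelihood
w = np.zeros(2)
for _ in range(5000):
    p = 1 / (1 + np.exp(-X @ w))
    w += 0.1 * X.T @ (y - p) / n
p_logit = 1 / (1 + np.exp(-X @ w))

print(f"LPM range:      [{p_lpm.min():.2f}, {p_lpm.max():.2f}]")
print(f"logistic range: [{p_logit.min():.2f}, {p_logit.max():.2f}]")
```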

Reposted by Maarten van Smeden

Post just up: Is multiple imputation making up information?

tldr: no.

Includes a cheeky simulation study to demonstrate the point.
open.substack.com/pub/tpmorris...
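In the same spirit as the linked post, a minimal simulation of the point (my own sketch, not the post's study; the setup is assumed, and the imputation step is simplified — proper multiple imputation would also draw the regression parameters from their posterior): with missingness that depends on an observed variable, the complete-case mean is biased while imputation using the observed relationship is not. The imputations are not "made-up information"; they carry the information already in the observed data.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 10000, 20                       # sample size; number of imputations

# x depends on y; the true mean of x is 1
y = rng.standard_normal(n)
x = 1.0 + 0.8 * y + rng.normal(0.0, 0.6, n)

# MAR missingness: x is usually missing when y is large
miss = (y > 0) & (rng.random(n) < 0.8)
obs = ~miss

cc_mean = x[obs].mean()                # complete-case estimate: biased

# Regression of x on y among the observed cases
A = np.column_stack([np.ones(obs.sum()), y[obs]])
coef, *_ = np.linalg.lstsq(A, x[obs], rcond=None)
resid_sd = np.std(x[obs] - A @ coef)

# Simplified MI: impute fitted value + residual noise, m times, pool
means = []
for _ in range(m):
    x_imp = x.copy()
    x_imp[miss] = coef[0] + coef[1] * y[miss] + rng.normal(0, resid_sd, miss.sum())
    means.append(x_imp.mean())
mi_mean = np.mean(means)

print(f"true mean 1.00 | complete-case {cc_mean:.2f} | MI {mi_mean:.2f}")
```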

Reposted by Maarten van Smeden

You can have all the omni-omics data in the world and the bestest algorithms, but eventually a predicted probability is produced & it should be evaluated using well-established methods, and correctly implemented in the context of medical decision making.

statsepi.substack.com/i/140315566/...

Clients: “I want to find real, meaningful clusters”
Me: “I want world peace, which is more likely to happen than what you want”

Depending on which methods guru you ask, every analytical task is "essentially" a missing data problem, a causal inference problem, a Bayesian problem, a regression problem or a machine learning problem

In medicine they are called "risk factors" and, of course, you want all "important" risk factors in your model all the time

Unless a risk factor is not statistically significant, in which case you can drop that factor without issues

Also lost in such cases: a model with the best AUC doesn't always make the best predictions.
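That last point can be shown in a few lines (my own sketch; the data-generating setup is an assumption): a strictly increasing transformation of the predicted risks leaves the ranking, and hence the AUC, exactly unchanged, while wrecking calibration and the Brier score.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 20000
x = rng.normal(0.0, 1.5, n)
true_p = 1 / (1 + np.exp(-x))
y = (rng.random(n) < true_p).astype(float)

p_a = true_p                       # well-calibrated model
p_b = 1 / (1 + np.exp(-3 * x))    # same ranking of patients, overconfident

def roc_auc(scores, y):
    """Rank-based AUC; identical rankings give identical AUC."""
    ranks = scores.argsort().argsort() + 1.0
    n1 = y.sum()
    return (ranks[y == 1].sum() - n1 * (n1 + 1) / 2) / (n1 * (len(y) - n1))

def brier(p, y):
    return np.mean((p - y) ** 2)

print(f"AUC:   A {roc_auc(p_a, y):.3f}  B {roc_auc(p_b, y):.3f}")
print(f"Brier: A {brier(p_a, y):.4f}  B {brier(p_b, y):.4f}")
```

Both models discriminate equally well, yet model B's predicted risks would mislead any decision rule that takes the probabilities at face value.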