Vladimir Shitov
shitovhappens.bsky.social
Computational biologist, data scientist, PhD candidate @ Lücken lab, Helmholtz Munich
Fun fact: this was supposed to be a quick one-month project at the intersection of ethics and single-cell research, producing a one-page comment. But we got carried away and wrote a bit more 😅 I hope you learn something useful! I certainly did while working on it. 10/10
February 19, 2025 at 6:49 PM
Want to see more examples and details? Check out the full publication: nature.com/articles/s41...

Thanks to all co-authors, especially @theresawillem.bsky.social, who did most of the work,
Malte Lücken, who initialised the collaboration, and
@fabiantheis.bsky.social. 9/10
Biases in machine-learning models of human single-cell data - Nature Cell Biology
This Perspective discusses the various biases that can emerge along the pipeline of machine learning-based single-cell analysis and presents methods to train models on human single-cell data in order ...
nature.com
February 19, 2025 at 6:49 PM
6. Result interpretation bias. The complexity of modern methods sometimes leads to misinterpretation of the results. The literature contains examples of conclusions drawn from UMAP plots alone, and of useless models being praised because data leakage inflated their metrics. 8/10
February 19, 2025 at 6:49 PM
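One common form of the data leakage mentioned in 8/10 is cells from the same donor ending up in both the training and the test set. The post does not say which kind of leakage it has in mind, so the following is only a minimal sketch of one way to avoid that particular case. The function name `split_by_donor` and its parameters are hypothetical.

```python
import random

def split_by_donor(cell_donors, test_fraction=0.25, seed=0):
    """Split cell indices so that no donor appears in both train and test.

    cell_donors: one donor ID per cell (hypothetical input format).
    """
    donors = sorted(set(cell_donors))
    rng = random.Random(seed)
    rng.shuffle(donors)
    # Hold out whole donors, never individual cells.
    n_test = max(1, round(len(donors) * test_fraction))
    test_donors = set(donors[:n_test])
    train_idx = [i for i, d in enumerate(cell_donors) if d not in test_donors]
    test_idx = [i for i, d in enumerate(cell_donors) if d in test_donors]
    return train_idx, test_idx

cell_donors = ["d1", "d1", "d2", "d2", "d3", "d4", "d4", "d4"]
train_idx, test_idx = split_by_donor(cell_donors)
```

Metrics computed on `test_idx` then describe performance on unseen donors, which is usually the quantity of interest for patient-level conclusions.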
5. Machine learning bias. Batch effects in the data, ignored outliers, limitations of the chosen models, or inappropriate metrics can all lead to incorrect results. 7/10
February 19, 2025 at 6:49 PM
4. Single-cell sequencing bias. Some cell types (e.g. neutrophils) are often missing from the data for technical reasons. And even for captured cells, we don't see all RNA copies because of dropout. 6/10
February 19, 2025 at 6:49 PM
3. Cohort bias. The number of donors in SC studies is still quite low (see previous post: x.com/shitov_happe..., sorry for X link). Moreover, most samples in the datasets come from individuals of European ancestry. This can limit how well conclusions generalize to other populations. 5/10
February 19, 2025 at 6:49 PM
2. Clinical bias. Patients with different conditions are not sampled uniformly. In particular, "healthy" controls might not reflect the population norm well. Not everyone wants to donate a piece of their lung or brain for science. 4/10
February 19, 2025 at 6:49 PM
1. Societal bias. The samples likely come from clinics or research institutions with enough money to run single-cell experiments. Not everyone has access to them. Be careful when extrapolating your conclusions to the general population. 3/10
February 19, 2025 at 6:49 PM
Recently, a number of methods have emerged for working with single-cell data at the sample level. We call them sample (in a clinical context, patient) representation methods. They enable patient stratification and support prognosis and diagnosis. But be aware of the biases! 2/10
February 19, 2025 at 6:49 PM
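To make "sample level" in 2/10 concrete: the simplest baseline representation of a sample is a pseudobulk, the average expression over all of a donor's cells. This is a minimal sketch of that baseline only, not one of the methods the post refers to. The function name `pseudobulk` and the input layout (one row of gene values per cell, one donor ID per cell) are assumptions.

```python
def pseudobulk(expression, donors):
    """Average per-gene expression over all cells of each donor.

    expression: list of per-cell gene value lists (assumed layout).
    donors: one donor ID per cell, in the same order.
    """
    sums, counts = {}, {}
    for row, donor in zip(expression, donors):
        if donor not in sums:
            sums[donor] = [0.0] * len(row)
            counts[donor] = 0
        for gene, value in enumerate(row):
            sums[donor][gene] += value
        counts[donor] += 1
    # One vector per donor: the mean of that donor's cells.
    return {d: [s / counts[d] for s in sums[d]] for d in sums}

profiles = pseudobulk([[1, 0], [3, 2], [10, 4]], ["a", "a", "b"])
```

Each donor becomes a single vector that can be clustered or fed to a classifier, which is also where the biases listed in the thread start to matter.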
That sometimes led to amazing comebacks. An ace could massacre an entire group, but then meet a six and lose the army. It was also fascinating to think about the best places to put your strongest and weakest cards
January 29, 2025 at 11:36 PM
We used to play a card game as kids. Everyone gets the same set of cards and lays them face down on the floor. Players move step by step. When enemy cards meet, they are turned face up and the higher-ranked card wins; the other one dies. The highest card (ace) can only be beaten by the weakest (six)
January 29, 2025 at 11:33 PM
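The duel rule from the post above can be sketched in a few lines. The rank list assumes a 36-card deck running from six to ace, and returning `None` for equal cards is also an assumption, since the post only names the six and the ace and does not say what happens on a tie.

```python
# Assumed rank order, weakest to strongest (the post only names six and ace).
RANKS = ["6", "7", "8", "9", "10", "J", "Q", "K", "A"]

def duel(card_a, card_b):
    """Return the winning rank, or None on a tie (tie handling is assumed)."""
    if card_a == card_b:
        return None
    # Special rule from the post: the ace loses only to the six.
    if {card_a, card_b} == {"A", "6"}:
        return "6"
    # Otherwise the card that comes later in RANKS wins.
    return max(card_a, card_b, key=RANKS.index)
```

With this rule the ranks form a cycle rather than a strict order, which is why placing the six matters as much as placing the ace.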