Rafael Irizarry
rafalab.bsky.social
Applied statistician. I tweet data-driven observations, data science educational materials, academic research updates, and the occasional joke.
We agree. I meant function in its most basic definition: every p-dimensional x in your dataset is mapped to a unique 2-dimensional f(x). I did not claim f is defined for all p-dimensional xs.

My point is that this f is difficult or impossible to describe. In contrast, we can write it down for PCA.
December 24, 2024 at 10:47 PM
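To illustrate the contrast drawn in the post above, here is a minimal sketch of what "writing down f" looks like for PCA. The data matrix X is randomly generated and purely hypothetical, and the names mu, W, and f are illustrative choices, not anything taken from the thread. For PCA the map is the explicit affine function f(x) = W^T (x - mu), so it can be evaluated at any p-dimensional point, not only the ones in the dataset.

```python
import numpy as np

# Hypothetical data: n = 200 observations, p = 5 dimensions (assumption for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))

# Center the data and take the top two right singular vectors as PCA loadings.
mu = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
W = Vt[:2].T  # p x 2 matrix of loadings


def f(x):
    """Explicit PCA map: f(x) = W^T (x - mu)."""
    return (np.asarray(x) - mu) @ W


Z = f(X)                        # 2-D coordinates for every point in the dataset
z_new = f(rng.normal(size=5))   # the same formula applies to a point not in the dataset
```

UMAP also assigns each data point a 2-D coordinate, but, as the post argues, there is no comparably short closed-form expression for that assignment.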
I like it. Thanks for the tip.
December 24, 2024 at 10:28 PM
I agree. I don't use them at all with genomics data, especially sparse noisy scRNA-Seq data.

It does appear to perform impressively well with high signal-to-noise ratio datasets, such as MNIST.
December 24, 2024 at 1:23 PM
It depends on the specific biological insight you want to highlight or communicate.
December 24, 2024 at 1:05 PM
The inertia here is strong. I have not been able to convince collaborators to not use them on papers I am a co-author on...
I'll keep trying though.

At some point I might give up as I did with pie charts: simplystatistics.org/posts/2012-1...
Simply Statistics: I give up, I am embracing pie charts
simplystatistics.org
December 24, 2024 at 12:58 PM
As made clear in the blog post, I am not against UMAP either. But when I see a plot in a paper, I want to understand what I am being shown and why.

Other than to show that different cell types have different expression patterns, which I already know, or to decorate, why use UMAP to display in 2D?
December 24, 2024 at 12:49 PM
What am I supposed to learn from that plot? Cell types have different expression patterns. Those are markers for different cell types. So, this just confirms something obvious. Do all those non-linear shapes and tiny clusters represent anything biological?
December 24, 2024 at 12:40 PM
To be clear, as the post explains, UMAP can be useful for exploratory data analysis. My concern is the inclusion of UMAP plots in papers as if they were results. What exactly is the reader supposed to learn? And how often are we misdirected by false clusters or artifactual shapes?
December 23, 2024 at 7:35 PM
Can you explain what the axes represent?

As mentioned in the post, UMAP can be useful for exploring data. But why are plots included in papers? What is the reader supposed to get out of them? The 2D distance between points can't be interpreted.

It seems the only reason is that they are pretty.
December 23, 2024 at 7:25 PM
There are plenty of alternatives. They don’t produce flashy artwork, but they do provide scientific insights.

If journals want artwork, there is no need to pretend we are analyzing data. Just paint pretty pictures.
December 23, 2024 at 5:20 PM
This is unfortunately true. I would say the main reasons are that the subject is hard and deep understanding is not incentivized enough.

But note that understanding UMAP is much harder than understanding p-values.
December 23, 2024 at 2:54 PM
To apply, use the links below:

1️⃣ Tenure-track (any rank) in AI/ML academicpositions.harvard.edu/postings/14387

2️⃣ Assistant Professor in Single Cell Genomics academicpositions.harvard.edu/postings/14416

3️⃣ Lecturer & Director Training/Education careers.dana-farber.org/job/9572/dir...
Assistant/Associate/Full Professor of Data Science and Biostatistics
The Departments of Biostatistics at Harvard T.H. Chan School of Public Health and Data Science at the Dana-Farber Cancer Institute provide exceptional environments to pursue research and education in ...
academicpositions.harvard.edu
November 26, 2024 at 3:07 PM
Starting to work on it 😅
September 22, 2023 at 1:55 PM