Luke Zappia
@lazappi.bsky.social
Bioinformatician, data scientist, software developer

Also @_lazappi_ and @lazappi@mastodon.au
We focused on feature selection methods, but we also compared scANVI and Harmony/Symphony to our scVI baseline. Feature selection methods performed similarly, but scANVI scored higher overall and Symphony worse, particularly at unseen population detection. More work is needed to understand why.

12/16
March 18, 2025 at 3:40 PM
What about lineage-specific integration? Using subsets of the Human Lung Cell Atlas, we saw poorer performance on individual lineages than on the full dataset, particularly for unseen population detection, but a full study is needed to answer this properly.

11/16
March 18, 2025 at 3:40 PM
Highly variable features performed consistently well, especially the Seurat VST method. Supervised marker genes also scored highly but were more variable and require cell labels. Check out triku for an alternative approach that performs similarly.

9/16
March 18, 2025 at 3:40 PM
Most methods require setting a number of features. We tried different numbers for some common methods and used 2000 for the rest of the benchmark. Slightly more features improves the query metrics while slightly fewer improves the integration, but this should be tuned to your dataset and use case.

8/16
March 18, 2025 at 3:40 PM
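To illustrate the number-of-features knob, here is a minimal Python sketch (not the benchmark's actual code) that ranks genes by raw variance and keeps the top n, a simplified stand-in for HVG methods such as Seurat's VST; the `select_top_variable` name and the synthetic counts are my own:

```python
import numpy as np

def select_top_variable(matrix, n_features=2000):
    """Rank genes (columns) by variance across cells (rows) and
    return the indices of the top n_features."""
    variances = matrix.var(axis=0)
    order = np.argsort(variances)[::-1]  # highest variance first
    return order[:n_features]

# Synthetic cells-by-genes count matrix for demonstration.
rng = np.random.default_rng(0)
counts = rng.poisson(1.0, size=(100, 50)).astype(float)
print(len(select_top_variable(counts, n_features=10)))  # 10
```

In practice you would vary `n_features` (e.g. 1000, 2000, 5000) and compare the integration and query metrics, as the thread suggests.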
Even well-designed metrics have different effective ranges. We used a set of positive and negative baseline methods to scale each metric to a range that was meaningful for this task, providing extra context. Scaled scores were combined to summarise each metric category.

7/16
March 18, 2025 at 3:40 PM
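The baseline scaling described above can be sketched as a simple min-max rescaling against the two controls; the numbers here are hypothetical, not scores from the paper:

```python
def scale_metric(score, negative_baseline, positive_baseline):
    """Rescale a raw metric score so the negative-control baseline
    maps to 0 and the positive-control baseline maps to 1."""
    return (score - negative_baseline) / (positive_baseline - negative_baseline)

# Hypothetical raw scores: a random-features control scored 0.2,
# a positive control 0.9, and a method under test 0.55.
print(round(scale_metric(0.55, negative_baseline=0.2, positive_baseline=0.9), 2))  # 0.5
```

Scaled scores within a category can then be averaged to give one summary score per category.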
We spent a lot of time selecting a final set of effective, non-redundant metrics that are independent of technical factors. We did this by simulating methods that select random features. This was an important part of the study and something I think more benchmarks should show.

6/16
March 18, 2025 at 3:40 PM
The benchmark was implemented as a @Nextflow workflow with each step a separate R or Python script. Having this set up before starting the project allowed everyone to start contributing right away. Check out the code on GitHub github.com/theislab/atl....

5/16
March 18, 2025 at 3:40 PM
We used a standard benchmark design. Test datasets were split into query and reference sets, with features selected on the reference. The reference was then integrated and the query samples mapped onto it. Metrics then measured different aspects of integration and reference usage.

3/16
March 18, 2025 at 3:40 PM
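The reference/query split described above can be sketched like this; it is a toy illustration with a made-up `split_reference_query` helper, not the workflow's actual splitting logic:

```python
import random

def split_reference_query(sample_ids, query_fraction=0.3, seed=0):
    """Randomly hold out a fraction of samples as the query;
    the remainder form the reference used for feature selection
    and integration."""
    rng = random.Random(seed)
    samples = list(sample_ids)
    rng.shuffle(samples)
    n_query = int(len(samples) * query_fraction)
    return samples[n_query:], samples[:n_query]  # (reference, query)

reference, query = split_reference_query([f"sample{i}" for i in range(10)])
print(len(reference), len(query))  # 7 3
```

Splitting by whole samples (rather than individual cells) keeps batch structure intact, which matters for evaluating query mapping.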
Final #scverse conference keynote Fabian Theis "From scanpy to the virtual cell: the coming-of-age of single cell analysis" #sketchnotes
September 12, 2024 at 10:18 AM
#scverse conference keynote Maria Brbic "Towards AI-driven discoveries in Single-Cell genomics" #sketchnotes
September 12, 2024 at 9:27 AM
#scverse conference keynote Alex Wolf "Many anecdotes make a novel? Study-centered analysis & training models" #sketchnotes
September 11, 2024 at 12:24 PM
#scverse conference keynote Christina Leslie "Machine learning for regulatory genomics at single-cell resolution" #sketchnotes
September 11, 2024 at 9:47 AM
Angela Oliveira Pisco #scverse conference keynote "Multimodal Atlas for Biological Data Analysis and Drug Discovery" #sketchnotes
September 10, 2024 at 12:24 PM
#scverse conference first keynote @robp.bsky.social "Upstream of the #singlecell data deluge" #sketchnotes
September 10, 2024 at 9:15 AM
Using alpha to represent variability, with size showing the mean and colour the direction. I don't think this adds much over colour = mean, size = SD, and it is harder to understand.
April 11, 2024 at 8:48 AM
Some ideas:

- Scale size by variability (circle/square)
- Donuts (or squares with holes) where outside size is Mean+SD and inside size is Mean-SD
April 11, 2024 at 8:41 AM
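A rough matplotlib sketch of the donut idea, using hypothetical means and SDs: the outer disc is scaled by mean + SD and an inner "hole" by mean - SD, so the visible ring width reflects variability.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for scripting
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical per-group means and standard deviations.
means = np.array([1.0, 2.5, 4.0])
sds = np.array([0.5, 1.5, 0.2])
x = np.arange(len(means))

fig, ax = plt.subplots()
# Outer disc: area tracks mean + SD.
ax.scatter(x, np.zeros_like(x), s=200 * (means + sds), color="steelblue")
# Inner hole: area tracks mean - SD (clipped so it never goes negative).
ax.scatter(x, np.zeros_like(x), s=200 * np.clip(means - sds, 0, None), color="white")
fig.savefig("donuts.png")
```

Drawing the hole as a white overlay is a quick approximation; true donuts would need unfilled markers or wedge patches.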