Nathan Schaefer
nkschaefer.bsky.social
UCSF postdoc, human, mammal
Thanks for reading, and good luck checking IDs and keeping the riffraff out of your single cell data sets.

www.biorxiv.org/content/10.1...

github.com/nkschaefer/c...
March 24, 2025 at 6:47 PM
Back-mutations to the ancestral state at this site are uncommon, occurring at a frequency typically seen in mitochondrial protein-coding or disease-implicated mutations. This suggests that this mutation may be one of the changes affecting gene regulation at this locus.
March 24, 2025 at 6:38 PM
Mitochondrial genes are expressed as polycistronic transcripts, which are then cleaved and selectively degraded. We looked at species differences in this process arising from two sources: nuclear and mitochondrial mutations. Interestingly, the biggest differences we found were compensatory, with little net effect.
March 24, 2025 at 6:38 PM
After demultiplexing with CellBouncer, we found that composite cells mostly inherit only one species’ mitochondria: human, for human/chimpanzee cells, and bonobo, for chimpanzee/bonobo cells. Not always, though: some cells retained both species’ mitochondria, or inherited mitochondria from the less common species.
March 24, 2025 at 6:38 PM
We take CellBouncer for a spin on a cool data set: inter-species composite iPSCs we created by cell fusion (www.nature.com/articles/s41...) for studying species differences in gene regulation. Here, we asked if there were biases in which species’ mitochondria were inherited by the composite cells.
March 24, 2025 at 6:38 PM
doublet_dragon takes assignments from the other programs and infers a global doublet rate that encompasses both homotypic doublets (invisible to individual programs) and heterotypic ones. This can help with QC (by comparison with the rate expected from cell loading density) and serve as a prior for other tools.
March 24, 2025 at 6:38 PM
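A minimal sketch of the underlying idea, assuming doublets form by drawing two cells independently from the pool: if singlet labels occur with proportions p_i, only a fraction 1 - sum(p_i^2) of doublets are heterotypic and therefore visible, so dividing the observed doublet fraction by that number estimates the total rate. "Combining programs" here just means treating the tuple of per-program labels as one joint label. This is not doublet_dragon's actual model; the function name and the input format (None for a doublet call) are hypothetical.

```python
from collections import Counter


def estimate_doublet_rate(assignments):
    """Estimate the total doublet rate (homotypic + heterotypic).

    assignments: one tuple per cell with one entry per program (e.g. species,
    individual, tag). Each entry is a singlet label, or None if that program
    called the cell a doublet. (Hypothetical input format.)
    """
    n = len(assignments)
    # A cell is a singlet only if every program called it a singlet;
    # its joint label is the tuple of all per-program labels.
    singlets = [a for a in assignments if all(x is not None for x in a)]
    observed = (n - len(singlets)) / n
    # Probability that two randomly drawn cells share every label, i.e. that
    # a doublet is homotypic and invisible to all programs.
    counts = Counter(singlets)
    homotypic = sum((c / len(singlets)) ** 2 for c in counts.values())
    if homotypic >= 1:
        raise ValueError("only one joint label; homotypic doublets are unidentifiable")
    # observed = total * (1 - homotypic), so solve for total.
    return observed / (1 - homotypic)
```

For example, if 20% of droplets are flagged and four joint labels are equally common, a quarter of doublets are homotypic and the estimated total rate is 0.2 / 0.75, about 26.7%.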
demux_tags assigns custom labels (e.g., from MULTIseq/HTO data) or sgRNAs (from CRISPR guide capture data) to cells. Our method considers the distribution of all tag counts together, rather than each tag independently, and handles noisy/low-count data better than some alternatives.
March 24, 2025 at 6:38 PM
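To illustrate what "considering all tag counts together" can look like, here is a sketch, not CellBouncer's method: each candidate label (no tag, one tag, or a pair of tags) defines expected tag proportions as a mix of signal and an ambient background profile, and the cell gets whichever candidate gives its full count vector the highest multinomial likelihood. The function name, the fixed background fraction `bg`, and the assumption that the ambient profile is known in advance (e.g. from empty droplets) are all hypothetical.

```python
import math
from itertools import combinations


def assign_tags(counts, ambient, bg=0.1):
    """Pick the most likely tag assignment for one cell.

    counts: {tag: UMI count} for the cell.
    ambient: {tag: fraction} background tag profile (assumed known);
    must be > 0 for every tag.
    bg: assumed fraction of a labelled cell's counts that come from background.
    """
    tags = sorted(ambient)
    # Candidate labels: no tag, one tag, or two tags (a doublet).
    candidates = [()] + [(t,) for t in tags] + list(combinations(tags, 2))
    best, best_ll = None, -math.inf
    for cand in candidates:
        ll = 0.0
        for t in tags:
            if cand:
                signal = 1 / len(cand) if t in cand else 0.0
                p = (1 - bg) * signal + bg * ambient[t]
            else:
                p = ambient[t]  # "no tag": counts are pure background
            # Multinomial log-likelihood over all tags jointly (constant dropped).
            ll += counts.get(t, 0) * math.log(p)
        if ll > best_ll:
            best, best_ll = cand, ll
    return best
```

With a uniform ambient profile over tags A, B, and C, a cell with one read of each tag gets the "no tag" label, while {A: 25, B: 25, C: 1} is called an A/B doublet, because every tag's count enters the same likelihood instead of passing or failing its own threshold.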
bulkprops takes genotypes and bulk data (or single cell data, ignoring cell barcodes) and infers the proportion of each individual in the pool. This can cross-check the other programs, and we provide a method to bootstrap proportions and get p-values when comparing two sets of proportions.
March 24, 2025 at 6:38 PM
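A sketch of one way to infer pool proportions from genotypes and allele counts, assuming each read comes from individual i with probability p_i and carries the alt allele with probability g/2 (adjusted by an assumed error rate): expectation-maximization splits each SNP's ref and alt reads among individuals and re-estimates p, and resampling SNPs with replacement gives bootstrap replicates. The function names are hypothetical, this is not necessarily how bulkprops works, and the p-value comparison is not shown.

```python
import random


def estimate_props(counts, genotypes, iters=200, err=0.01):
    """EM estimate of each individual's share of reads in a pool.

    counts: list of (snp, ref_reads, alt_reads) from bulk (or pseudobulk) data.
    genotypes: {individual: {snp: alt allele count 0, 1 or 2}}.
    err: assumed per-read sequencing error rate.
    """
    inds = sorted(genotypes)
    props = {i: 1 / len(inds) for i in inds}
    total = sum(ref + alt for _, ref, alt in counts)
    for _ in range(iters):
        reads = {i: 0.0 for i in inds}
        for snp, ref, alt in counts:
            # Probability that a read from individual i carries the alt allele.
            q = {i: genotypes[i][snp] / 2 * (1 - err) + (1 - genotypes[i][snp] / 2) * err
                 for i in inds}
            z_alt = sum(props[i] * q[i] for i in inds)
            z_ref = sum(props[i] * (1 - q[i]) for i in inds)
            # E-step: split alt and ref reads among individuals by posterior.
            for i in inds:
                if alt:
                    reads[i] += alt * props[i] * q[i] / z_alt
                if ref:
                    reads[i] += ref * props[i] * (1 - q[i]) / z_ref
        # M-step: new proportions are each individual's expected share of reads.
        props = {i: reads[i] / total for i in inds}
    return props


def bootstrap_props(counts, genotypes, n_boot=100, seed=0):
    """Resample SNPs with replacement to get a distribution of proportions."""
    rng = random.Random(seed)
    return [estimate_props([rng.choice(counts) for _ in counts], genotypes, iters=50)
            for _ in range(n_boot)]
```

Each bootstrap replicate is a full set of proportions, so the spread of the replicates gives a rough sense of how well the SNPs constrain each individual's share.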
quant_contam quantifies ambient RNA by measuring how often cells mismatch their expected genotypes. This introduces an external ground truth (genotype data), avoids the need to consider empty droplets, and can find ambient RNA in data lacking cell type diversity.
March 24, 2025 at 6:38 PM
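A minimal sketch of the genotype-mismatch idea, assuming ambient RNA has the same individual composition as the pool and ignoring sequencing errors: at SNPs where the cell's assigned individual is homozygous, any read carrying the other allele must be ambient. If a fraction c of reads is ambient, the expected number of such reads is c times the sum over SNPs of read depth times the ambient frequency of the mismatching allele, and solving for c gives a simple moment estimator. This is an illustration, not quant_contam's model; all names are hypothetical.

```python
def estimate_contamination(cell_counts, individual, genotypes, pool_props):
    """Moment estimate of the ambient RNA fraction for one cell.

    cell_counts: {snp: (ref_reads, alt_reads)} for the cell.
    individual: the individual the cell was assigned to.
    genotypes: {individual: {snp: alt allele count 0, 1 or 2}}.
    pool_props: {individual: fraction of the pool}, assumed to also
    describe the ambient RNA.
    """
    mismatched = expected = 0.0
    for snp, (ref, alt) in cell_counts.items():
        g = genotypes[individual][snp]
        # Alt allele frequency of ambient RNA at this SNP.
        f = sum(pool_props[i] * genotypes[i][snp] / 2 for i in pool_props)
        if g == 0:    # homozygous ref: any alt read must be ambient
            mismatched += alt
            expected += (ref + alt) * f
        elif g == 2:  # homozygous alt: any ref read must be ambient
            mismatched += ref
            expected += (ref + alt) * (1 - f)
        # Heterozygous sites are skipped: both alleles are expected.
    if expected == 0:
        return None
    return min(1.0, mismatched / expected)
```

Ambient RNA from the cell's own individual cannot produce mismatches, and the ambient allele frequency accounts for this: in a two-person 50/50 pool, 10% mismatching reads at homozygous sites correspond to about 20% ambient RNA.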
For pools without genotype data, demux_mt simultaneously clusters mitochondrial haplotypes and infers the number of individuals in the pool. It takes only a BAM file. There is also a way to plot the haplotypes to see how well the clustering worked.
March 24, 2025 at 6:38 PM
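One way to sketch clustering haplotypes while also choosing the number of individuals, assuming per-cell alt and total read counts at variable mitochondrial sites have already been extracted from the BAM (that pileup step is omitted): fit a binomial mixture by EM for each candidate k and keep the k with the lowest BIC. This swaps in a generic model-selection technique and is not demux_mt's algorithm; the function names and the farthest-point initialization are assumptions.

```python
import math


def _logsumexp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def fit_haplotypes(alt, tot, k, iters=50, pseudo=0.5):
    """Binomial mixture EM: cluster cells into k mitochondrial haplotypes.

    alt[c][j], tot[c][j]: alt and total read counts for cell c at mito site j.
    Returns (log-likelihood from the last E-step, per-cell cluster labels).
    """
    n, s = len(alt), len(alt[0])
    frac = [[alt[c][j] / tot[c][j] if tot[c][j] else 0.5 for j in range(s)] for c in range(n)]

    # Farthest-point initialisation: start from cell 0, then repeatedly add the
    # cell whose allele-fraction profile is farthest from every chosen centre.
    def dist(c, centres):
        return min(sum((frac[c][j] - frac[m][j]) ** 2 for j in range(s)) for m in centres)

    centres = [0]
    while len(centres) < k:
        rest = [c for c in range(n) if c not in centres]
        centres.append(max(rest, key=lambda c: dist(c, centres)))
    theta = [[min(max(frac[m][j], 0.05), 0.95) for j in range(s)] for m in centres]
    weights = [1 / k] * k

    for _ in range(iters):
        # E-step: posterior probability of each haplotype for each cell.
        resp, ll = [], 0.0
        for c in range(n):
            logp = [math.log(weights[h]) + sum(
                alt[c][j] * math.log(theta[h][j])
                + (tot[c][j] - alt[c][j]) * math.log(1 - theta[h][j])
                for j in range(s)) for h in range(k)]
            z = _logsumexp(logp)
            ll += z
            resp.append([math.exp(x - z) for x in logp])
        # M-step: update mixture weights and per-haplotype alt allele frequencies
        # (pseudocounts keep every frequency strictly between 0 and 1).
        for h in range(k):
            w = [resp[c][h] for c in range(n)]
            weights[h] = max(sum(w) / n, 1e-12)
            theta[h] = [(sum(w[c] * alt[c][j] for c in range(n)) + pseudo)
                        / (sum(w[c] * tot[c][j] for c in range(n)) + 2 * pseudo)
                        for j in range(s)]
    labels = [max(range(k), key=lambda h: resp[c][h]) for c in range(n)]
    return ll, labels


def choose_num_individuals(alt, tot, max_k=4):
    """Fit k = 1..max_k haplotypes and keep the k with the lowest BIC."""
    n, s = len(alt), len(alt[0])
    best = None
    for k in range(1, min(max_k, n) + 1):
        ll, labels = fit_haplotypes(alt, tot, k)
        bic = -2 * ll + (k * s + k - 1) * math.log(n)
        if best is None or bic < best[0]:
            best = (bic, k, labels)
    return best[1], best[2]
```

BIC penalizes each extra haplotype by its site frequencies and mixture weight, so splitting one real haplotype into two identical clusters costs more than it gains in likelihood.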
demux_vcf assigns cells to individuals using genotypes and is fast, accurate, and robust to deep population structure. It groups SNPs by allelic state in each pair of individuals and compares the likelihood of each pair of IDs for each cell, improving speed over methods that filter or refine SNPs.
March 24, 2025 at 6:38 PM
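A sketch of the pairwise grouping idea, assuming diploid genotypes coded as alt allele counts and a fixed per-read error rate: for each pair of individuals, the cell's SNPs collapse into at most nine classes by the pair's genotype combination, so the singlet and doublet likelihoods for that pair are computed from summed counts rather than SNP by SNP. Names are hypothetical and this is not demux_vcf's code; it also assumes every SNP is genotyped in every individual and that `err` is greater than zero.

```python
import math
from itertools import combinations


def alt_prob(g, err=0.01):
    """Probability of an alt read from a diploid genotype with g alt alleles."""
    f = g / 2
    return f * (1 - err) + (1 - f) * err


def binom_ll(alt, ref, p):
    return alt * math.log(p) + ref * math.log(1 - p)


def assign_cell(cell_counts, genotypes, err=0.01):
    """cell_counts: {snp: (ref, alt)}; genotypes: {individual: {snp: 0|1|2}}.

    Returns (best label, log-likelihood); a label is (ind,) or (ind_a, ind_b).
    """
    best_label, best_ll = None, -math.inf
    for a, b in combinations(sorted(genotypes), 2):
        # Collapse SNPs into at most 9 classes keyed by the pair's genotypes.
        classes = {}
        for snp, (ref, alt) in cell_counts.items():
            key = (genotypes[a][snp], genotypes[b][snp])
            r, x = classes.get(key, (0, 0))
            classes[key] = (r + ref, x + alt)
        ll = {(a,): 0.0, (b,): 0.0, (a, b): 0.0}
        for (ga, gb), (ref, alt) in classes.items():
            pa, pb = alt_prob(ga, err), alt_prob(gb, err)
            ll[(a,)] += binom_ll(alt, ref, pa)
            ll[(b,)] += binom_ll(alt, ref, pb)
            ll[(a, b)] += binom_ll(alt, ref, (pa + pb) / 2)  # 50/50 doublet
        for label, value in ll.items():
            if value > best_ll:
                best_label, best_ll = label, value
    return best_label, best_ll
```

The cost per pair depends on the number of genotype classes rather than the number of SNPs once the counts are summed, which is where the speedup comes from.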
demux_species uses an alignment-free k-mer counting strategy to save time and memory and assigns cells to species using a statistical model instead of a cutoff. Users can plot the clustered k-mer counts to see if it worked. demux_species also separates reads by species for downstream processing.
March 24, 2025 at 6:38 PM
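A toy sketch of alignment-free species assignment, assuming reference sequences are available for each species: keep k-mers found in only one species, count how many of a cell's read k-mers hit each set, and pick the label (one species, or a two-species doublet) with the highest multinomial likelihood rather than applying a cutoff. The error rate `err` and all function names are hypothetical; CellBouncer's actual model may differ.

```python
import math
from collections import Counter
from itertools import combinations


def unique_kmers(seqs_by_species, k):
    """For each species, the set of k-mers that occur in no other species."""
    kmers = {sp: {s[i:i + k] for s in seqs for i in range(len(s) - k + 1)}
             for sp, seqs in seqs_by_species.items()}
    return {sp: ks - set().union(*(v for o, v in kmers.items() if o != sp))
            for sp, ks in kmers.items()}


def count_species_kmers(reads, unique, k):
    """Count a cell's read k-mers that are diagnostic for each species."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            for sp, ks in unique.items():
                if kmer in ks:
                    counts[sp] += 1
    return counts


def assign_species(counts, species, err=0.05):
    """Most likely label (one species, or a pair = doublet) under a multinomial model."""
    best, best_ll = None, -math.inf
    for label in [(s,) for s in species] + list(combinations(species, 2)):
        ll = 0.0
        for s in species:
            if s in label:
                # Signal is split evenly among the species in the label.
                p = 1 / len(species) if len(label) == len(species) else (1 - err) / len(label)
            else:
                # Off-label hits are attributed to an assumed error rate.
                p = err / (len(species) - len(label))
            ll += counts.get(s, 0) * math.log(p)
        if ll > best_ll:
            best, best_ll = label, ll
    return best
```

A real tool would also handle reverse complements and use a compact hash of k-mers; plain Python sets are used here only to keep the idea visible.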
CellBouncer provides fast, compiled, self-contained, interacting programs with methods to validate results where possible (e.g. you can visually compare two sets of IDs for the same cells, and you can visualize inferred mitochondrial haplotypes to determine how well the clustering worked).
March 24, 2025 at 6:38 PM
Introducing CellBouncer, a toolkit for pooled single cell data that assigns cells to species or individual of origin, performs genotype-free IDs using mitochondrial haplotypes, assigns sgRNAs and custom tags to cells, and models ambient RNA using external genotype data as a ground truth.
March 24, 2025 at 6:38 PM
Cells with both species’ mitochondria show significantly altered expression of genes related to cell cycle arrest and apoptosis relative to other cells, suggesting they’re in trouble. They also express fewer mitochondrial transcripts overall and show abnormal post-transcriptional regulation.
March 24, 2025 at 6:03 PM