Nathan Schaefer
nkschaefer.bsky.social
UCSF postdoc, human, mammal
Thanks for reading, and good luck checking IDs and keeping the riffraff out of your single cell data sets.

www.biorxiv.org/content/10.1...

github.com/nkschaefer/c...
March 24, 2025 at 6:47 PM
In total, our study demonstrates the need for this set of tools, which provide new functionality, speed, and/or accuracy over existing tools. It also demonstrates the power of pooled single cell studies, including those involving composite cell lines, to discover new and interesting biology.
March 24, 2025 at 6:38 PM
Back-mutations to the ancestral state at this site are uncommon, occurring at a frequency typical of mitochondrial protein-coding or disease-implicated mutations. This suggests the mutation may be one of the changes affecting gene regulation at this locus.
March 24, 2025 at 6:38 PM
The affected locus (MT-ND3/MT-ND4L) was found by others (bmcbiol.biomedcentral.com/articles/10....) to be cleaved by an unknown mechanism at a site that we noticed is next to a fixed, derived human-specific mutation that might affect cleavage rates by altering the 3D shape of the RNA.
Identification of human mitochondrial RNA cleavage sites and candidate RNA processing factors - BMC Biology
March 24, 2025 at 6:38 PM
Mitochondrial genes are expressed as polycistronic transcripts, then cleaved and selectively degraded. We looked at species differences in this process arising from two sources: nuclear and mitochondrial mutations. Interestingly, the biggest differences we found were compensatory, with little net effect.
March 24, 2025 at 6:38 PM
By finding one fusion line that tended to retain both species’ mitochondria, we were able to home in on the gene network involved in this process: we can see what was turned up in the unhealthy cells, and what was turned down in those that survived.
March 24, 2025 at 6:38 PM
We think this points to an incompatibility between allospecific mitochondria that causes gene dysregulation, as well as a nuclear self-destruct mechanism. Interestingly, a prior study also found that human cells have “a suicidal preference for self-mtDNA”: www.molbiolcell.org/doi/10.1091/...
Mechanisms of Human Mitochondrial DNA Maintenance: The Determining Role of Primary Sequence and Length over Function | Molecular Biology of the Cell
March 24, 2025 at 6:38 PM
Cells with two species’ mitochondria have significantly altered gene expression related to cell cycle arrest and apoptosis relative to other cells, suggesting they’re in trouble. They also express fewer mitochondrial transcripts overall and have abnormal post-expression transcriptional regulation.
March 24, 2025 at 6:38 PM
After demultiplexing with CellBouncer, we found that composite cells mostly inherit only one species’ mitochondria: human, for human/chimpanzee cells, and bonobo, for chimpanzee/bonobo cells. Not always, though: some cells retained both mitochondria, or those from the less common species.
March 24, 2025 at 6:38 PM
We take CellBouncer for a spin on a cool data set: inter-species composite iPSCs we created by cell fusion (www.nature.com/articles/s41...) for studying species differences in gene regulation. Here, we asked if there were biases in which species’ mitochondria were inherited by the composite cells.
March 24, 2025 at 6:38 PM
doublet_dragon takes assignments from the other programs and infers a global doublet rate that includes both homotypic doublets (invisible to individual programs) and heterotypic ones. This can help with QC (by comparison with the rate expected from cell loading density) and serve as a prior for other tools.
March 24, 2025 at 6:38 PM
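Homotypic doublets can be inferred at all because of a simple probability argument: if two cells land in a droplet at random, the chance that they share an identity is the sum of the squared identity proportions. Below is a minimal Python sketch of that one idea, assuming random pairing and a single set of identity labels. `global_doublet_rate` is a hypothetical helper, and doublet_dragon's actual model (which combines assignments from several programs) may differ.

```python
def global_doublet_rate(singlet_counts, heterotypic_doublets):
    """Estimate the overall doublet rate from detectable (heterotypic) doublets.

    singlet_counts: dict identity -> number of cells called as that identity
    heterotypic_doublets: number of droplets called as mixed-identity doublets
    Assumes droplets pair cells at random, so a doublet is homotypic with
    probability sum(p_i ** 2), where p_i is identity i's share of singlets.
    (Hypothetical helper for illustration, not CellBouncer code.)
    """
    n_singlets = sum(singlet_counts.values())
    n_droplets = n_singlets + heterotypic_doublets
    props = [n / n_singlets for n in singlet_counts.values()]
    p_homotypic = sum(p * p for p in props)
    observed_rate = heterotypic_doublets / n_droplets
    # Observed heterotypic doublets are only a (1 - sum p^2) share of all doublets.
    return observed_rate / (1.0 - p_homotypic)
```

For example, with four equally represented donors a quarter of all doublets are homotypic, so a 10% heterotypic doublet rate implies roughly 13.3% doublets overall.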
demux_tags assigns custom labels (e.g. MULTIseq/HTO data) or sgRNAs (CRISPR guide capture data) to cells. Our method considers the distribution of all tag counts together, rather than considering each tag independently, and handles noisy/low-count data better than some alternatives.
March 24, 2025 at 6:38 PM
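To make "considering all tag counts together" concrete, here is a hedged Python sketch, not demux_tags' actual model: each cell's full count vector is scored under background-only, singlet, and doublet hypotheses, using a mixture of an assumed ambient tag profile and signal from the true tag(s). `assign_tags`, `frac_signal`, and the fixed background profile are illustrative assumptions.

```python
import math
from itertools import combinations


def assign_tags(counts, background, frac_signal=0.8):
    """Choose the most likely label for one cell using all of its tag counts jointly.

    counts: dict tag -> UMI count for this cell
    background: dict tag -> ambient proportion (should sum to 1, all > 0)
    frac_signal: assumed share of reads coming from the cell's own tag(s); a
        hypothetical fixed value here, where a real model would estimate it
    Returns (hypothesis, log-likelihood); the hypothesis is () for background only,
    (tag,) for a singlet, or (tag1, tag2) for a doublet.
    """
    tags = list(background)

    def loglik(true_tags):
        ll = 0.0
        for t in tags:
            c = counts.get(t, 0)
            if c == 0:
                continue
            if true_tags:
                # Mixture: ambient background plus signal split evenly among true tags.
                p = (1 - frac_signal) * background[t]
                if t in true_tags:
                    p += frac_signal / len(true_tags)
            else:
                p = background[t]
            ll += c * math.log(p)
        return ll

    hypotheses = [()] + [(t,) for t in tags] + list(combinations(tags, 2))
    best = max(hypotheses, key=loglik)
    return best, loglik(best)
```

Because every hypothesis is scored on the same counts, a cell with 30 reads of one tag and 28 of another comes out as a doublet instead of being pushed through separate per-tag cutoffs, and a cell with one read of each tag stays unassigned.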
bulkprops takes genotypes and bulk data (or single cell data, ignoring cell barcodes) and infers the proportion of each individual in the pool. This can cross-check the other programs, and we provide a method to bootstrap proportions and get p-values when comparing two sets of proportions.
March 24, 2025 at 6:38 PM
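A common way to estimate mixing proportions from pooled allele counts is expectation-maximization over which individual each read came from. The Python sketch below assumes known genotypes, a fixed error rate, and that each individual contributes reads in proportion to its share of the pool. `estimate_proportions` is a hypothetical helper, not bulkprops' implementation.

```python
def estimate_proportions(genotypes, ref_counts, alt_counts, n_iter=200, err=0.01):
    """EM estimate of each individual's share of a pooled sample.

    genotypes: list over individuals, each a list of alt-allele dosages (0, 1, 2) per SNP
    ref_counts / alt_counts: pooled read counts per SNP
    err: assumed per-read error rate (hypothetical default)
    """
    n_ind = len(genotypes)
    n_snp = len(ref_counts)
    # Probability that a read from individual i at SNP j shows the alt allele.
    q = [[err + (1 - 2 * err) * g / 2 for g in row] for row in genotypes]
    props = [1.0 / n_ind] * n_ind
    total = sum(ref_counts) + sum(alt_counts)
    for _ in range(n_iter):
        expected = [0.0] * n_ind
        for j in range(n_snp):
            alt_w = [props[i] * q[i][j] for i in range(n_ind)]
            ref_w = [props[i] * (1 - q[i][j]) for i in range(n_ind)]
            alt_sum, ref_sum = sum(alt_w), sum(ref_w)
            for i in range(n_ind):
                # E-step: expected number of this SNP's reads that came from individual i.
                expected[i] += alt_counts[j] * alt_w[i] / alt_sum
                expected[i] += ref_counts[j] * ref_w[i] / ref_sum
        # M-step: new proportions are the expected read shares.
        props = [e / total for e in expected]
    return props
```

One simple way to get bootstrap distributions like those mentioned above would be to resample SNPs with replacement and rerun the estimate; that is an assumption, not necessarily bulkprops' procedure.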
Additionally, quant_contam models the genotypic origins of ambient RNA, meaning it can highlight when specific donors or cell lines contribute disproportionately to ambient RNA. If expression data are provided, quant_contam can adjust counts to account for contamination.
March 24, 2025 at 6:38 PM
quant_contam quantifies ambient RNA by measuring how often cells mismatch their expected genotypes. This introduces an external ground truth (genotype data), avoids the need to consider empty droplets, and can find ambient RNA in data lacking cell type diversity.
March 24, 2025 at 6:38 PM
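The genotype-mismatch idea fits in a few lines. In this minimal Python sketch (a method-of-moments estimate, not quant_contam's actual model), a cell assigned to one individual should show only that individual's alleles at its homozygous SNPs, so reads carrying the other allele are attributed to ambient RNA whose allele frequencies are assumed known. Sequencing error is ignored, and `estimate_contamination` is hypothetical.

```python
def estimate_contamination(cell_genotype, ambient_alt_freq, ref_counts, alt_counts):
    """Method-of-moments estimate of the ambient RNA fraction in one cell.

    cell_genotype: alt-allele dosage (0/1/2) per SNP for the cell's assigned individual
    ambient_alt_freq: assumed alt-allele frequency per SNP in the ambient pool
    ref_counts / alt_counts: this cell's read counts per SNP
    Only homozygous SNPs are used; sequencing error is ignored (simplifying assumption).
    Returns NaN when no homozygous SNP is informative.
    """
    mismatched = 0.0
    expected_if_all_ambient = 0.0
    for g, a, r, alt in zip(cell_genotype, ambient_alt_freq, ref_counts, alt_counts):
        n = r + alt
        if g == 0:
            # Alt reads cannot come from a hom-ref cell, so they are ambient here.
            mismatched += alt
            expected_if_all_ambient += n * a
        elif g == 2:
            # Likewise, ref reads in a hom-alt cell are ambient.
            mismatched += r
            expected_if_all_ambient += n * (1 - a)
    if expected_if_all_ambient == 0:
        return float("nan")
    # Mismatches scale linearly with the ambient fraction under this model.
    return min(1.0, mismatched / expected_if_all_ambient)
```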
After running demux_mt, we suggest a pipeline that can produce a VCF file of nuclear variants and demultiplex more cells using demux_vcf. While this approach is not suited to every data set, we demonstrate it on whole-cell RNA-seq and single nucleus ATAC data, where it outperforms competing methods.
March 24, 2025 at 6:38 PM
demux_mt addresses this problem by simultaneously clustering mitochondrial haplotypes and inferring the number of individuals in the pool. It takes only a BAM file. There is also a way to plot the haplotypes to see how well clustering worked.
March 24, 2025 at 6:38 PM
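demux_mt's exact model isn't described in this thread. As a stand-in, here is a Python sketch of one standard way to cluster haplotypes and choose the number of individuals at the same time: fit Bernoulli mixtures over per-cell mitochondrial allele calls for several values of k and keep the k with the lowest BIC. The technique, the farthest-point initialization, and all names are assumptions for illustration.

```python
import math


def _logsumexp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def fit_bernoulli_mixture(cells, k, n_iter=50, alpha=0.5):
    """EM for a k-component Bernoulli mixture over mitochondrial variant sites.

    cells: per-cell lists holding 1 (alt), 0 (ref) or None (no coverage) per site.
    Requires k <= number of cells. Returns (log-likelihood, hard cluster labels).
    """
    n, n_sites = len(cells), len(cells[0])

    def dist(a, b):
        # Hamming distance over sites covered in both cells.
        return sum(x != y for x, y in zip(a, b) if x is not None and y is not None)

    # Deterministic farthest-point initialization of k cluster centers.
    centers = [0]
    while len(centers) < k:
        rest = [c for c in range(n) if c not in centers]
        centers.append(max(rest, key=lambda c: min(dist(cells[c], cells[j]) for j in centers)))
    theta = [[0.5 if x is None else (0.9 if x else 0.1) for x in cells[c]] for c in centers]
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibility of each cluster for each cell.
        resp, ll = [], 0.0
        for cell in cells:
            logp = []
            for z in range(k):
                lp = math.log(max(weights[z], 1e-300))
                for s, x in enumerate(cell):
                    if x is not None:
                        lp += math.log(theta[z][s] if x else 1 - theta[z][s])
                logp.append(lp)
            total = _logsumexp(logp)
            ll += total
            resp.append([math.exp(lp - total) for lp in logp])
        # M-step; pseudocounts (alpha) keep probabilities away from exactly 0 or 1.
        weights = [sum(r[z] for r in resp) / n for z in range(k)]
        for z in range(k):
            for s in range(n_sites):
                num = sum(r[z] for r, cell in zip(resp, cells) if cell[s] == 1)
                den = sum(r[z] for r, cell in zip(resp, cells) if cell[s] is not None)
                theta[z][s] = (num + alpha) / (den + 2 * alpha)
    labels = [max(range(k), key=lambda z: r[z]) for r in resp]
    return ll, labels


def choose_k(cells, max_k=5):
    """Choose the number of individuals by BIC (lower is better)."""
    n, n_sites = len(cells), len(cells[0])
    best = None
    for k in range(1, min(max_k, n) + 1):
        ll, labels = fit_bernoulli_mixture(cells, k)
        n_params = (k - 1) + k * n_sites
        bic = -2 * ll + n_params * math.log(n)
        if best is None or bic < best[0]:
            best = (bic, k, labels)
    return best[1], best[2]
```

With three clean haplotypes of ten cells each, BIC favors k = 3: extra components add parameters without improving the fit, while fewer components cannot explain the data.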
If you don’t have preexisting genotype data, there are tools to assign cells to individuals of origin by clustering genotypes (Vireo, souporcell, scSplit, freemuxlet), but there’s not a clear way to check results, and they can make mistakes.
March 24, 2025 at 6:38 PM
demux_vcf assigns cells to individuals using genotypes and is fast, accurate, and robust to deep population structure. It groups SNPs by allelic state in each pair of individuals and compares the likelihood of each pair of IDs for each cell, improving speed over methods that filter or refine SNPs.
March 24, 2025 at 6:38 PM
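The pairwise grouping can be sketched as follows, under a simple binomial read model with an assumed error rate. For one pair of individuals, every SNP falls into one of at most nine (genotype in A, genotype in B) classes, so a cell's reads can be summed per class before any logarithms are taken. This is a hedged illustration; `pair_loglik` is hypothetical and demux_vcf's actual data structures may differ.

```python
import math
from collections import defaultdict


def pair_loglik(geno_a, geno_b, cell_ref, cell_alt, err=0.001):
    """Log-likelihoods that one cell came from individual A versus individual B.

    geno_a, geno_b: dicts SNP id -> alt-allele dosage (0, 1 or 2)
    cell_ref, cell_alt: dicts SNP id -> this cell's ref / alt read counts
    err: assumed per-read error rate (hypothetical default)
    """
    # Collapse covered SNPs into at most 9 classes keyed by (dosage in A, dosage in B).
    binned = defaultdict(lambda: [0, 0])
    for snp in set(cell_ref) | set(cell_alt):
        if snp in geno_a and snp in geno_b:
            counts = binned[(geno_a[snp], geno_b[snp])]
            counts[0] += cell_ref.get(snp, 0)
            counts[1] += cell_alt.get(snp, 0)

    def p_alt(dosage):
        return err + (1 - 2 * err) * dosage / 2

    ll_a = ll_b = 0.0
    # At most 9 classes, so each individual's log-likelihood needs at most 9 terms.
    for (ga, gb), (ref, alt) in binned.items():
        ll_a += ref * math.log(1 - p_alt(ga)) + alt * math.log(p_alt(ga))
        ll_b += ref * math.log(1 - p_alt(gb)) + alt * math.log(p_alt(gb))
    return ll_a, ll_b
```

Classes where A and B share a genotype add the same amount to both log-likelihoods, so they cancel in the comparison and could be skipped entirely.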
demux_species uses an alignment-free k-mer counting strategy to save time and memory and assigns cells to species using a statistical model instead of a cutoff. Users can plot the clustered k-mer counts to see if it worked. demux_species also separates reads by species for downstream processing.
March 24, 2025 at 6:38 PM
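Here is a toy Python version of the alignment-free idea, assuming species-specific k-mers are taken from per-species reference sequences and that each cell's reads are already grouped by barcode. Assignment uses a small likelihood model over singlet and two-species doublet hypotheses rather than a count cutoff. The noise rate `err`, the helper names, and the model itself are illustrative assumptions, and reverse complements are ignored for brevity.

```python
import math
from collections import Counter
from itertools import combinations


def kmers(seq, k):
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def species_specific_kmers(references, k):
    """Map every k-mer that occurs in exactly one species' reference to that species."""
    per_species = {sp: kmers(seq, k) for sp, seq in references.items()}
    seen = Counter()
    for ks in per_species.values():
        seen.update(ks)
    return {km: sp for sp, ks in per_species.items() for km in ks if seen[km] == 1}


def assign_species(reads, kmer_to_species, species, k, err=0.05):
    """Count species-specific k-mers in one cell's reads, then pick the likeliest hypothesis.

    Hypotheses are single species or two-species doublets. Under a hypothesis with
    member set M, a k-mer hit comes from species s with probability
    err / len(species) + (1 - err) / len(M) if s is in M, else err / len(species).
    err is a hypothetical noise rate, not a CellBouncer parameter.
    """
    counts = Counter()
    for read in reads:
        for km in kmers(read, k):
            sp = kmer_to_species.get(km)
            if sp is not None:
                counts[sp] += 1
    if not counts:
        return None, {}

    def loglik(members):
        ll = 0.0
        for sp, n in counts.items():
            p = err / len(species) + ((1 - err) / len(members) if sp in members else 0.0)
            ll += n * math.log(p)
        return ll

    hypotheses = [(s,) for s in species] + list(combinations(species, 2))
    return max(hypotheses, key=loglik), dict(counts)
```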
CellBouncer provides fast, compiled, self-contained, interacting programs with methods to validate results where possible (e.g. you can visually compare two sets of IDs for the same cells, and you can visualize inferred mitochondrial haplotypes to determine how well the clustering worked).
March 24, 2025 at 6:38 PM
…and tools that can identify specific types of cell doublets, but cannot calculate a global doublet rate (which includes droplets containing two cells of the same type).
March 24, 2025 at 6:38 PM
…genotype-free demultiplexing tools that lack a validation method, ambient RNA removal tools that require cell type heterogeneity, custom tag (e.g. MULTIseq, HTO) or sgRNA (e.g. CRISPR guide capture) assignment strategies that fail when data are sparse or noisy, …
March 24, 2025 at 6:38 PM
Useful bioinformatic tools for demultiplexing and QCing pooled data exist. We identified several unmet needs, though, including: no dedicated method for species demultiplexing, slow genotype demultiplexing with large SNP panels, sensitivity to deep population structure in SNP reference panels…
March 24, 2025 at 6:38 PM
Pooling cells from multiple donors, cell lines, or species makes it easy to scale up experiments, incorporate genetic variation, and mitigate technical artifacts, while doing cool things like disentangling the effects of cell-extrinsic from cell-intrinsic variation (www.nature.com/articles/s41...).
Human neuronal maturation comes of age: cellular mechanisms and species differences - Nature Reviews Neuroscience
March 24, 2025 at 6:38 PM