Li Song
mourisl.bsky.social
毛利元光. Assistant Prof at the Department of Biomedical Data Science at Dartmouth College. Research on bioinformatics, algorithms. Lab page: mourisl.github.io
Reposted by Li Song
Introns have to come from somewhere, right? @celineh2ooo.bsky.social and I looked at multiple genome alignments with 1000s of genomes and found 342 cases where humans (and our relatives) had gained a new intron. Still not sure where these come from, but it's a fascinating question
@celineh2ooo.bsky.social and @stevensalzberg.bsky.social compared 3,493 vertebrate genomes to identify 342 gains of introns in human genes, tracing their origins and identifying cases of intronization as a mechanism of intron emergence.

🔗 doi.org/10.1093/gbe/evaf091

#genome #evolution #introns
June 4, 2025 at 8:13 PM
Reposted by Li Song
Neng Huang developed longcallR for joint SNP calling and phasing from long RNA-seq reads, AND for identifying allele-specific splicing/junctions (ASJ). Although ASJs of statistical significance are rare, a large fraction involve unannotated junctions. In Rust!
SNP calling, haplotype phasing and allele-specific analysis with long RNA-seq reads https://www.biorxiv.org/content/10.1101/2025.05.26.656191v1
May 30, 2025 at 2:54 PM
Reposted by Li Song
Industry friends, now is the time for MUCH more speaking out on behalf of academic colleagues under duress. Here are core open source methods that many of your products doubtlessly depend on either directly or indirectly (see en.wikipedia.org/wiki/HMMER) being abruptly defunded. Make noise.
May 29, 2025 at 2:39 PM
Reposted by Li Song
Announcing myloasm, a new long-read (ONT R10/PacBio) metagenome assembler that I've been working on during my postdoc in the Heng Li lab (@lh3lh3.bsky.social).

myloasm-docs.github.io
myloasm - metagenomic assembly with (noisy) long reads
myloasm-docs.github.io
May 28, 2025 at 5:54 PM
Reposted by Li Song
Excited to share a new update to Mumemto, scaling MUM and conserved element finding to any size pangenome! Preprint out now w/ @benlangmead.bsky.social.
Mumemto scales to the new HPRC v2 release and beyond, and can merge in future assemblies without any recomputation! 1/n
Partitioned Multi-MUM finding for scalable pangenomics
Pangenome collections are growing to hundreds of high-quality genomes. This necessitates scalable methods for constructing pangenome alignments that can incorporate newly-sequenced assemblies. We prev...
www.biorxiv.org
May 27, 2025 at 7:35 PM
Centrifuger has updated its pre-built index list to include the exciting new GTDB release r226 for taxonomic classification of sequencing data: github.com/mourisl/cent.... There is also a gtdb+refseq human/virus/fungi/contaminants index, which will hopefully be useful for human microbiome studies.
May 27, 2025 at 3:58 PM
Reposted by Li Song
Great 🧵 by Pierre on the Kaminari paper! In short, Kaminari is a simple and elegant, but highly effective index for approximate colored k-mer queries. The simplicity leads to very fast query, but with accuracy consistent with (or exceeding) best-in-class solutions; a very fun collaboration indeed!
📜 Excited to share insights from our recent paper: "Kaminari: a resource-frugal index for approximate colored k-mer queries". The study aims to efficiently identify documents containing a query string, focusing on DNA strings. www.biorxiv.org/content/10.1... 🧬 🖥️ 1/8
May 27, 2025 at 3:41 PM
Reposted by Li Song
Bioinformatics folks: check out our @biorxivpreprint on a new, very efficient and accurate system for automated genome annotation, EviAnn, led by my colleague Aleksey Zimin: www.biorxiv.org/content/10.1...
Efficient evidence-based genome annotation with EviAnn
For many years, machine learning-based ab initio gene finding approaches have been the central components of eukaryotic genome annotation pipelines, and they remain so today. The reliance on these app...
www.biorxiv.org
May 13, 2025 at 5:52 PM
Reposted by Li Song
Check out our latest collaboration with UniProt, which has integrated over 700,000 experimentally validated epitopes to enhance its protein entries with detailed immune response information. This data is accessible via the UniProt Feature Viewer and API! 💻🔬🧪 #collaboration #immunology #proteins
Inside UniProt
Rich Epitope Information Comes to UniProt
Mammalian immune responses are mediated by interactions between antigens and immune system compo...
insideuniprot.blogspot.com
May 9, 2025 at 12:35 AM
Reposted by Li Song
The deadline for WABI 2025 has been extended (but is still rapidly approaching) wabiconf.github.io/2025/

* abstract deadline: May 12 (AoE)
* paper deadline: May 15 (AoE)

Consider submitting your exciting algorithmic bioinformatics work to the WABI conference!
WABI 2025
WABI Conference on Algorithms in Bioinformatics
wabiconf.github.io
May 7, 2025 at 7:14 PM
Forgot to run dustmasker on the genomes before building a Centrifuger index, and indeed saw some misclassifications. It took a while to figure out; lesson learned. I need to implement a built-in masking step, like the one in Kraken2, in case I forget to do it again in the future.
May 4, 2025 at 6:25 AM
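A minimal sketch of the kind of masking step mentioned in the post above. It is not Centrifuger's or Kraken2's actual implementation. It assumes dustmasker has already been run with FASTA output, so that low-complexity regions come back as lowercase letters, and it only hard-masks those bases to N before index building. The function name is hypothetical.

```python
def hard_mask_soft_masked_fasta(lines):
    """Yield FASTA lines with lowercase (soft-masked) bases replaced by 'N'.

    Assumption: the input was soft-masked upstream (for example by
    dustmasker writing FASTA output), so lowercase marks low-complexity
    sequence. Header lines starting with '>' are passed through unchanged.
    """
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            yield line  # headers are left untouched
        else:
            # Replace every lowercase base with 'N'; keep everything else.
            yield "".join("N" if base.islower() else base for base in line)


if __name__ == "__main__":
    # Toy record for illustration only; not real data.
    example = [">seq1 toy record", "ACGTacacacacACGT", "ttttttGGCC"]
    for masked_line in hard_mask_soft_masked_fasta(example):
        print(masked_line)
```

Hard-masking to N is one possible choice here; a real pipeline might keep the soft-masked sequence or substitute a different placeholder character instead.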
Reposted by Li Song
Extracting @NCBI SRA files with fasterq-dump can require 17x the size of the accession while decompressing. Our new tool xsra extracts sequences at 5x throughput with significantly less disk usage, built-in compression, and optional BINSEQ outputs

github.com/arcInstitute...
GitHub - ArcInstitute/xsra: An efficient CLI to extract sequences from the SRA
An efficient CLI to extract sequences from the SRA - ArcInstitute/xsra
github.com
April 29, 2025 at 9:03 PM
Reposted by Li Song
Small update from AllTheBacteria (allthebacteria.org). Assemblies can be bulk downloaded from OSF as before, or you can now get individual assemblies from AWS. We now also have a LexicMap index on AWS, so you can align your favourite gene against 2.4 million bacteria (next post for price estimates)
AllTheBacteria
allthebacteria.org
April 29, 2025 at 3:36 PM
Reposted by Li Song
The Department of Human Genetics at the University of Utah is sponsoring the Rising Stars in Genetics and Genomics symposium!

- We are seeking nominations by June 1.
- September 18-19, 2025
- Please share with the star postdocs that you know.

docs.google.com/forms/d/e/1F...
April 28, 2025 at 5:20 PM
Reposted by Li Song
The sequence analysis session of #RECOMB2025 is off to a great start with @jimshaw.bsky.social presenting devider, a new algorithm for haplotyping small sequences from long-read sequencing.

www.biorxiv.org/content/10.1...
April 27, 2025 at 1:27 AM
Reposted by Li Song
If you want to check if a human gene has copy-number changes or lands in a complex region, try pangene.bioinweb.org. Recently updated with more and better assemblies.
April 26, 2025 at 1:06 AM
Time to build a new index!!
GTDB release 10 based on RefSeq 226 (R10-RS226) is live at gtdb.ecogenomic.org. This release covers 732,475 genomes (22% increase) and has 143,614 species clusters (37% increase). Release notes at: forum.gtdb.ecogenomic.org/t/announcing.... Release statistics at: gtdb.ecogenomic.org/stats/r226.
GTDB - Genome Taxonomy Database
The Genome Taxonomy Database (GTDB) is an initiative to establish a standardised microbial taxonomy based on genome phylogeny.
gtdb.ecogenomic.org
April 24, 2025 at 1:54 AM
Reposted by Li Song
Minimap2-2.29 released with the support of short RNA-seq read alignment. More explanation and results here: lh3.github.io/2025/04/18/s...
Short RNA-seq read alignment with minimap2
lh3.github.io
April 18, 2025 at 9:53 PM
Reposted by Li Song
minimap2 adds support for short read spliced RNA-seq alignment! lh3.github.io/2025/04/18/s...
Short RNA-seq read alignment with minimap2
lh3.github.io
April 18, 2025 at 9:58 PM
Reposted by Li Song
New set of thesis figures on pairwise alignment just dropped!
- schematic and worked example for many algorithms
- alignment modes
March 27, 2025 at 4:31 PM
Reposted by Li Song
fqgrep release 1.1.0 now speeds up searching FASTQ files!

Thank you to both Markus Schlegel from @activegroupgmbh.bsky.social for updating seq_io and Nicholas D. Crosbie of grepq for some competition and inspiration.

See more: github.com/fulcrumgenom...
GitHub - fulcrumgenomics/fqgrep: Grep for FASTQ files
Grep for FASTQ files. Contribute to fulcrumgenomics/fqgrep development by creating an account on GitHub.
github.com
March 14, 2025 at 5:45 PM