Wei Shen 沈 伟
@shenwei356.bsky.social
Associate professor of Bioinformatics at Chongqing Medical University, China. Lab: https://mbio.info, Personal: https://shenwei.me, https://x.com/shenwei356
Pinned
I sincerely appreciate the opportunity to visit @ebi.embl.org (thanks to the @embl.org Sabbatical fellowship). The guidance and support I received from Zam (@zaminiqbal.bsky.social), John (@bacpop.org) and other colleagues have been immensely valuable! You changed my career!❤️
Sometimes you meet absolutely incredible bioinfo-magicians.
It was a huge privilege when @shenwei356.bsky.social
joined our group for a year on an @embl.org sabbatical.
While here, he developed a new way of aligning to
millions of bacteria, called LexicMap 1/n
www.nature.com/articles/s41...
Efficient sequence alignment against millions of prokaryotic genomes with LexicMap - Nature Biotechnology
LexicMap uses a fixed set of probes to efficiently query gene sequences for fast and low-memory alignment.
www.nature.com
Reposted by Wei Shen 沈 伟
Our method for genome size estimation from long-read overlaps is now published 🥳
academic.oup.com/bioinformati...
Genome size estimation from long read overlaps
Abstract. Motivation: Accurate genome size estimation is an important component of genomic analyses such as assembly and coverage calculation, though existin
academic.oup.com
November 7, 2025 at 3:19 AM
Reposted by Wei Shen 沈 伟
Thread on #GI2025 's second day! 👇🏻
Second day of Genome Informatics #GI2025 began with the session “Genome Assembly and Sequence Algorithms”. Yun William Yu presented “Average-case Analysis of Seed-Chain-Extend under Random Mutations” (genome.cshlp.org/content/33/7/1175), providing theoretical guarantees for the popular seed-chain-extend
November 6, 2025 at 5:53 PM
Reposted by Wei Shen 沈 伟
Ben Langmead @benlangmead.bsky.social delivers the official opening for this year's Genome Informatics Conference #GI2025 at Cold Spring Harbor Laboratory.
List of talks and posters: meetings.cshl.edu/abstracts.as...
November 6, 2025 at 12:38 AM
Reposted by Wei Shen 沈 伟
Cool new paper from Lorién López-Villellas, @santiagomarco.bsky.social and others!

Super cute and simple idea:
In Gotoh's affine-cost alignment, only the M matrix is needed during tracing: we can just search for a gap-length x such that M[i][j] = M[i-x][j]+o+x*e or M[i][j] = M[i][j-x]+o+x*e.
Singletrack: An Algorithm for Improving Memory Consumption and Performance of Gap-Affine Sequence Alignment https://www.biorxiv.org/content/10.1101/2025.10.31.685625v1
November 4, 2025 at 7:12 PM
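
The tracing idea in the post above can be sketched in Python. This is a minimal illustration under assumed conventions, not the Singletrack implementation: costs are minimized, a gap of length x costs o + x*e, M[i][j] is taken to be the best cost over all alignments of the two prefixes (including those ending in a gap), and the function names and default penalties (mismatch 4, o = 6, e = 2) are hypothetical.

```python
INF = float("inf")


def gotoh_m(a, b, mismatch=4, o=6, e=2):
    """Fill the affine-gap DP and return only the M matrix.

    The two gap states are kept as a rolling row (dele) and a scalar (ins),
    so nothing but M survives for the traceback.
    """
    n, m = len(a), len(b)
    M = [[INF] * (m + 1) for _ in range(n + 1)]
    dele = [INF] * (m + 1)  # dele[j]: best cost ending in a gap that consumes a
    for i in range(n + 1):
        ins = INF  # best cost ending in a gap that consumes b, current cell
        for j in range(m + 1):
            if i == 0 and j == 0:
                M[0][0] = 0
                continue
            if i > 0:
                # open a new gap from M, or extend the gap from the row above
                dele[j] = min(M[i - 1][j] + o + e, dele[j] + e)
            if j > 0:
                ins = min(M[i][j - 1] + o + e, ins + e)
            best = min(dele[j], ins)
            if i > 0 and j > 0:
                sub = 0 if a[i - 1] == b[j - 1] else mismatch
                best = min(best, M[i - 1][j - 1] + sub)
            M[i][j] = best
    return M


def traceback_m_only(a, b, M, mismatch=4, o=6, e=2):
    """Recover an optimal alignment from M alone.

    Returns a string of ops: '=' match, 'X' mismatch,
    'D' base of a without partner, 'I' base of b without partner.
    """
    i, j = len(a), len(b)
    ops = []
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = 0 if a[i - 1] == b[j - 1] else mismatch
            if M[i][j] == M[i - 1][j - 1] + sub:
                ops.append("=" if sub == 0 else "X")
                i, j = i - 1, j - 1
                continue
        # search for a gap length x with M[i][j] == M[i-x][j] + o + x*e
        x = next((x for x in range(1, i + 1)
                  if M[i][j] == M[i - x][j] + o + x * e), None)
        if x is not None:
            ops.extend("D" * x)
            i -= x
            continue
        # otherwise search for x with M[i][j] == M[i][j-x] + o + x*e
        x = next((x for x in range(1, j + 1)
                  if M[i][j] == M[i][j - x] + o + x * e), None)
        if x is None:
            raise ValueError("M is inconsistent with the given penalties")
        ops.extend("I" * x)
        j -= x
    return "".join(reversed(ops))


if __name__ == "__main__":
    a, b = "ACGTACGT", "ACGACGT"
    M = gotoh_m(a, b)
    print(M[len(a)][len(b)], traceback_m_only(a, b, M))  # 8 ===D====
```

With integer penalties the equality tests are exact. The gap-length search adds work per gap during tracing, which is the trade made for not storing the two gap matrices.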
Reposted by Wei Shen 沈 伟
I also have serious concerns about the consolidation of roles (one person is now publisher, chief editor, and also a frequent author) as exemplified in a recent paper that was fast-tracked for publication.
October 29, 2025 at 4:07 PM
Reposted by Wei Shen 沈 伟
Really exciting that the preprint on Barbell, a new demultiplexer, is finally out!
It's the first tool that builds on Sassy, the approximate-DNA-searching tool that @rickbitloo.bsky.social and I developed earlier this year, specifically with this application in mind.
Around 10% of your Nanopore reads (SQK-RBK114) are incorrectly trimmed. Here is why, and how our new tool Barbell solves it:

www.biorxiv.org/content/10.1...

Want to get started? github.com/rickbeeloo/b...
October 23, 2025 at 9:28 PM
Reposted by Wei Shen 沈 伟
Around 10% of your Nanopore reads (SQK-RBK114) are incorrectly trimmed. Here is why, and how our new tool Barbell solves it:

www.biorxiv.org/content/10.1...

Want to get started? github.com/rickbeeloo/b...
October 23, 2025 at 8:16 PM
Reposted by Wei Shen 沈 伟
1/6 Movi 2 is here: faster and more space-efficient for pangenome queries. Its fastest mode uses half the memory of Movi 1 while running ~30% faster. github.com/mohsenzakeri...
GitHub - mohsenzakeri/Movi: Fast, Cache-Efficient, and Scalable Queries on Pangenomes
github.com
October 21, 2025 at 8:00 PM
Reposted by Wei Shen 沈 伟
Podcast with me and @turiking.bsky.social for the @milnerevolution.bsky.social series, on plasmid evolution over the last 100 years, talking about our ( @cazares-adr.bsky.social , Nick Thomson, @sarah1alexander.bsky.social & co) recent paper www.science.org/doi/10.1126/...
youtu.be/Mzr3TD4ijs0?...
How the Vectors of Antibiotic Resistance Have Evolved - Professor Zamin Iqbal
YouTube video by Milner Centre for Evolution
youtu.be
October 17, 2025 at 11:48 AM
Reposted by Wei Shen 沈 伟
Preprint out for myloasm, our new nanopore / HiFi metagenome assembler!

Nanopore's getting accurate, but

1. Can this lead to better metagenome assemblies?
2. How, algorithmically, to leverage them?

with co-author Max Marin @mgmarin.bsky.social, supervised by Heng Li @lh3lh3.bsky.social

1 / N
High-resolution metagenome assembly for modern long reads with myloasm https://www.biorxiv.org/content/10.1101/2025.09.05.674543v1
September 7, 2025 at 11:35 PM
Reposted by Wei Shen 沈 伟
Our preprint on our new metagenomic HiFi assembler Alice is out 🥳 Based on a *new sketching method* (🧵1/6)
👉 Preprint www.biorxiv.org/content/10.1...
👉 Github github.com/rolandfaure/...
Alice: fast and haplotype-aware assembly of high-fidelity reads based on MSR sketching
We introduce Mapping-friendly Sequence Reduction (MSR) sketches, a sketching method for high-fidelity (HiFi) long reads, and Alice, an assembler that operates directly on these sketches. MSR produces ...
www.biorxiv.org
October 3, 2025 at 2:51 PM
Reposted by Wei Shen 沈 伟
New tool "bwt-svg" for making illustrations of the BWT and the many auxiliary arrays and other structures related to it. Pyodide-based no-installation-necessary interface here: benlangmead.github.io/bwt-svg/. (H/t to @robert.bio for pointing me to pyodide!) Full repo: github.com/benlangmead/....
October 14, 2025 at 8:48 PM
Reposted by Wei Shen 沈 伟
After years of research and continuous refinement, we’re thrilled to share that our paper on the MetaGraph framework — enabling Petabase-scale search across sequencing data — has been published today in Nature (www.nature.com/articles/s41...)
Efficient and accurate search in petabase-scale sequence repositories - Nature
MetaGraph enables scalable indexing of large sets of DNA, RNA or protein sequences using annotated de Bruijn graphs.
www.nature.com
October 8, 2025 at 8:56 PM
Reposted by Wei Shen 沈 伟
Efficient and accurate search in petabase-scale sequence repositories www.nature.com/articles/s41... 🧬🖥️🧪
MetaGraph: metagraph.ethz.ch
Code: github.com/ratschlab/me...
October 9, 2025 at 5:10 PM
Reposted by Wei Shen 沈 伟
Just published an interactive article about a magical algorithm known as the Burrows-Wheeler Transform, which powers sequence alignment tools like bowtie and bwa: sandbox.bio/concepts/bwt

It's also notoriously unintuitive so I'm hoping this article helps you build that intuition.
October 9, 2025 at 5:05 PM
Reposted by Wei Shen 沈 伟
There are millions of openly available microbial genomes, but searching them can be slow.

Until now 🥁

Introducing LexicMap, a new alignment tool that lets scientists search these data in minutes, helping track antibiotic resistance, trace outbreaks, and more.

www.ebi.ac.uk/about/news/r...
🦠
How to rapidly search the world’s microbial DNA
By making the world’s microbial DNA easier to explore, LexicMap helps researchers track outbreaks, study antibiotic resistance, and understand microbial diversity.
www.ebi.ac.uk
September 30, 2025 at 9:47 AM
Reposted by Wei Shen 沈 伟
Thank you folks for your feedback on our survey about Hash functions in genomic sequence analysis. We've updated the paper and you can see the new version here: tinyurl.com/4kk9ccmt.
September 25, 2025 at 1:21 PM
Reposted by Wei Shen 沈 伟
Delighted to see our paper studying the evolution of plasmids over the last 100 years, now out! Years of work by Adrian Cazares, also Nick Thomson @sangerinstitute.bsky.social - this version much improved over the preprint. Final version should be open access, apols.
Thread 1/n
September 25, 2025 at 9:29 PM
Reposted by Wei Shen 沈 伟
Efficient sequence alignment against millions of prokaryotic genomes with LexicMap - @shenwei356.bsky.social @zaminiqbal.bsky.social go.nature.com/3K09TgJ
Efficient sequence alignment against millions of prokaryotic genomes with LexicMap - Nature Biotechnology
LexicMap uses a fixed set of probes to efficiently query gene sequences for fast and low-memory alignment.
go.nature.com
September 10, 2025 at 4:08 PM
I sincerely appreciate the opportunity to visit @ebi.embl.org (thanks to the @embl.org Sabbatical fellowship). The guidance and support I received from Zam (@zaminiqbal.bsky.social), John (@bacpop.org) and other colleagues have been immensely valuable! You changed my career!❤️
Sometimes you meet absolutely incredible bioinfo-magicians.
It was a huge privilege when @shenwei356.bsky.social
joined our group for a year on an @embl.org sabbatical.
While here, he developed a new way of aligning to
millions of bacteria, called LexicMap 1/n
www.nature.com/articles/s41...
Efficient sequence alignment against millions of prokaryotic genomes with LexicMap - Nature Biotechnology
LexicMap uses a fixed set of probes to efficiently query gene sequences for fast and low-memory alignment.
www.nature.com
September 10, 2025 at 9:56 AM
Reposted by Wei Shen 沈 伟
Hashing vs. sorting; interesting! reiner.org/hashed-sorting. Also I wonder if, depending on your use case, semi-sorting provides an even greater benefit? 🧬🖥️
Hashed sorting is typically faster than hash tables
Benchmarks and theoretical explanation of why and when hashed radix sort beats hash tables.
reiner.org
September 8, 2025 at 12:37 PM
Amazing Jim!
Preprint out for myloasm, our new nanopore / HiFi metagenome assembler!

Nanopore's getting accurate, but

1. Can this lead to better metagenome assemblies?
2. How, algorithmically, to leverage them?

with co-author Max Marin @mgmarin.bsky.social, supervised by Heng Li @lh3lh3.bsky.social

1 / N
High-resolution metagenome assembly for modern long reads with myloasm https://www.biorxiv.org/content/10.1101/2025.09.05.674543v1
September 8, 2025 at 12:14 AM
Reposted by Wei Shen 沈 伟
Ok this is mad
"...an ant that lays individuals from two distinct species. In this life cycle, females must clone males of another species because they require their sperm to produce the worker caste"

www.nature.com/articles/s41...
One mother for two species via obligate cross-species cloning in ants - Nature
In a case of obligate cross-species cloning, female ants of Messor ibericus need to clone males of Messor structor to obtain sperm for producing the worker caste, resulting in males from the same moth...
www.nature.com
September 4, 2025 at 11:02 PM
Reposted by Wei Shen 沈 伟
🌎👩‍🔬 For 15+ years biology has accumulated petabytes (million gigabytes) of🧬DNA sequencing data🧬 from the far reaches of our planet.🦠🍄🌵

Logan now democratizes efficient access to the world’s most comprehensive genetics dataset. Free and open.

doi.org/10.1101/2024...
September 3, 2025 at 8:39 AM
Reposted by Wei Shen 沈 伟
Finally printed and submitted my thesis :)

You may call me Dr. now 🎓

curiouscoding.nl/thesis.pdf
August 26, 2025 at 1:40 PM