Jim Shaw
jimshaw.bsky.social
Postdoc at Dana-Farber and Harvard Med with Heng Li (@lh3lh3.bsky.social). Prev: UBC / UofT.

I like thinking about computational biological sequence analysis and its applications to metagenomics.

https://jim-shaw-bluenote.github.io
Reposted by Jim Shaw
🚨New preprint out!
We present a foundational genomic resource of human gut microbiome viruses. It delivers high-quality, deeply curated data spanning taxonomy, predicted hosts, structures, and functions, providing a reference for gut virome research. (1/8)
www.biorxiv.org/content/10.1...
November 6, 2025 at 5:26 PM
Reposted by Jim Shaw
Excited to share our LongTrack study out in @natmicrobiol.nature.com today!

Fecal microbiota transplant (FMT), donor 💩 => patients' gut, is an effective treatment for recurrent C. difficile infection & is being evaluated for Inflammatory Bowel Diseases (IBD) & other conditions 1/

📄 rdcu.be/eL8mR
Long-read metagenomics for strain tracking after faecal microbiota transplant
Nature Microbiology - A long-read metagenomics method empowers faecal microbiota transplantation studies by precisely tracking bacteria from donors to recipients, distinguishing co-existing strains...
rdcu.be
October 22, 2025 at 3:39 PM
Reposted by Jim Shaw
Our @narjournal.bsky.social manuscript is out! It explores the growth of the GTDB (gtdb.ecogenomic.org) since its inception, as well as updates to the website, methodology, policies, and major taxonomic and nomenclatural changes over the past three years.

academic.oup.com/nar/advance-...
GTDB release 10: a complete and systematic taxonomy for 715 230 bacterial and 17 245 archaeal genomes
Abstract. The Genome Taxonomy Database (GTDB; https://gtdb.ecogenomic.org) provides a phylogenetically consistent and rank normalized genome-based taxonomy
academic.oup.com
October 22, 2025 at 2:20 PM
Reposted by Jim Shaw
Our preprint on our new metagenomic HiFi assembler Alice is out 🥳 Based on a *new sketching method* (🧵1/6)
👉 Preprint www.biorxiv.org/content/10.1...
👉 Github github.com/rolandfaure/...
Alice: fast and haplotype-aware assembly of high-fidelity reads based on MSR sketching
We introduce Mapping-friendly Sequence Reduction (MSR) sketches, a sketching method for high-fidelity (HiFi) long reads, and Alice, an assembler that operates directly on these sketches. MSR produces ...
www.biorxiv.org
October 3, 2025 at 2:51 PM
Reposted by Jim Shaw
New pre-print from the Banfield lab, highlighting an interesting case of 1.5 Mb megaplasmids found in the human gut.

Plasmid genomes were resolved using #PacBio HiFi sequencing with hifiasm-meta for #metagenome assembly. Host association was detected using epigenetic signals.

doi.org/10.1101/2025...
Megaplasmids associate with Escherichia coli and other Enterobacteriaceae
Humans and animals are ubiquitously colonized by Enterobacteriaceae, a bacterial family that contains both commensals and clinically significant pathogens. Here, we report Enterobacteriaceae megaplas...
doi.org
October 1, 2025 at 4:44 PM
Reposted by Jim Shaw
Did you know that ~60% of human SVs fall in ~1% of GRCh38? See our new preprint: arxiv.org/abs/2509.23057 and the companion blog post on how we started this project and longdust: lh3.github.io/2025/09/29/o.... Work with Alvin Qin
September 30, 2025 at 2:19 AM
Reposted by Jim Shaw
High-accuracy SNV calling for bacterial isolates using deep learning with AccuSNV https://www.biorxiv.org/content/10.1101/2025.09.26.678787v1
September 29, 2025 at 6:47 PM
Reposted by Jim Shaw
Delighted to see our paper studying the evolution of plasmids over the last 100 years, now out! Years of work by Adrian Cazares, also Nick Thomson @sangerinstitute.bsky.social - this version much improved over the preprint. Final version should be open access, apols.
Thread 1/n
September 25, 2025 at 9:29 PM
Reposted by Jim Shaw
New blog post!

metaMDBG (@gaetanbenoit.bsky.social) and Myloasm (@jimshaw.bsky.social) have had recent releases, so I updated the benchmarks from the Autocycler paper:
rrwick.github.io/2025/09/23/a...

Both tools improved considerably! Time to update your conda environments 😄
Benchmark update: metaMDBG and Myloasm
a blog for miscellaneous bioinformatics stuff
rrwick.github.io
September 23, 2025 at 1:53 AM
Reposted by Jim Shaw
Many of the most complex and useful functions in biology emerge at the scale of whole genomes.

Today, we share our preprint “Generative design of novel bacteriophages with genome language models”, where we validate the first functional AI-generated genomes 🧵
September 17, 2025 at 3:03 PM
Reposted by Jim Shaw
agtools: a software framework to manipulate assembly graphs https://www.biorxiv.org/content/10.1101/2025.09.14.676178v1
September 16, 2025 at 8:48 PM
Reposted by Jim Shaw
X-Mapper 🦠🧬🧪 - a sequence aligner developed for microbes, now on Bioconda! 🚀
• 11–24× fewer suboptimal alignments (same for human genome)
• 3–579× lower inconsistency
• improves on ~30% of reads aligned to non-target species
github.com/mathjeff/map...
bioconda.github.io/recipes/x-ma...
#microsky
September 15, 2025 at 2:32 AM
Reposted by Jim Shaw
New blog post – A quick look at Roche's SBX
lh3.github.io/2025/09/11/a...
September 12, 2025 at 3:26 AM
Reposted by Jim Shaw
I sincerely appreciate the opportunity to visit @ebi.embl.org (thanks to the @embl.org Sabbatical fellowship). The guidance and support I received from Zam (@zaminiqbal.bsky.social), John (@bacpop.org) and other colleagues have been immensely valuable! You changed my career!❤️
Sometimes you meet absolutely incredible bioinfo-magicians.
It was a huge privilege when @shenwei356.bsky.social
joined our group for a year on an @embl.org sabbatical.
While here, he developed a new way of aligning to
millions of bacteria, called LexicMap 1/n
www.nature.com/articles/s41...
Efficient sequence alignment against millions of prokaryotic genomes with LexicMap - Nature Biotechnology
LexicMap uses a fixed set of probes to efficiently query gene sequences for fast and low-memory alignment.
www.nature.com
September 10, 2025 at 9:56 AM
Reposted by Jim Shaw
Sometimes you meet absolutely incredible bioinfo-magicians.
It was a huge privilege when @shenwei356.bsky.social
joined our group for a year on an @embl.org sabbatical.
While here, he developed a new way of aligning to
millions of bacteria, called LexicMap 1/n
www.nature.com/articles/s41...
Efficient sequence alignment against millions of prokaryotic genomes with LexicMap - Nature Biotechnology
LexicMap uses a fixed set of probes to efficiently query gene sequences for fast and low-memory alignment.
www.nature.com
September 10, 2025 at 9:12 AM
Reposted by Jim Shaw
Now preprinted at arxiv.org/abs/2509.07357
September 10, 2025 at 2:10 AM
Reposted by Jim Shaw
How do you long-read sequence metagenomes? I would argue it starts with the right sample storage & DNA extraction, to enable efficient @nanoporetech.com / @pacbio.bsky.social sequencing, which we investigated in our new paper: www.biorxiv.org/content/10.1...

Massive thanks to Klara for driving this
September 9, 2025 at 3:35 PM
Preprint out for myloasm, our new nanopore / HiFi metagenome assembler!

Nanopore's getting accurate, but

1. Can this lead to better metagenome assemblies?
2. How, algorithmically, to leverage them?

with co-author Max Marin @mgmarin.bsky.social, supervised by Heng Li @lh3lh3.bsky.social

1 / N
High-resolution metagenome assembly for modern long reads with myloasm https://www.biorxiv.org/content/10.1101/2025.09.05.674543v1
September 7, 2025 at 11:35 PM
Reposted by Jim Shaw
Check out Ryan's new blogpost, especially if you work on and polish small eukaryotic genome assemblies - it's always nice when someone adds new features for your tools
New blog post!

I added a new feature to @gbouras13.bsky.social's Pypolca: homopolymer-only polishing. Potentially useful for cross-sample polishing - early test on Cryptosporidium looks promising.

Check it out here:
rrwick.github.io/2025/09/04/h...
Cross-sample homopolymer polishing with Pypolca
a blog for miscellaneous bioinformatics stuff
rrwick.github.io
September 5, 2025 at 6:20 AM
Reposted by Jim Shaw
Now published in GigaScience with minor improvements: academic.oup.com/gigascience/...

* Download: zenodo.org/records/1490...
* More info: github.com/lh3/panmask
Preprint on "Finding easy regions for short-read variant calling from pangenome data": arxiv.org/abs/2507.03718
September 4, 2025 at 4:44 PM
Reposted by Jim Shaw
🌎👩‍🔬 For 15+ years biology has accumulated petabytes (millions of gigabytes) of 🧬 DNA sequencing data 🧬 from the far reaches of our planet. 🦠🍄🌵

Logan now democratizes efficient access to the world’s most comprehensive genetics dataset. Free and open.

doi.org/10.1101/2024...
September 3, 2025 at 8:39 AM
Reposted by Jim Shaw
Our study developing a skin metatranscriptomics protocol is now out in @natbiotech.nature.com!

We finally have the ability to study microbial activity on skin and identify key functional genes playing a role in diseases.

Amazing team of Chia Minghao and Amanda Ng 👏

nature.com/articles/s41...
August 30, 2025 at 2:17 AM
Reposted by Jim Shaw
The microbial composition of metagenomes is identified in seconds by profiling against large databases go.nature.com/3BBVqDC
rdcu.be/eCbj4
Rapid species-level metagenome profiling and containment estimation with sylph - Nature Biotechnology
The microbial composition of metagenomes is identified in seconds by profiling against large databases.
go.nature.com
August 27, 2025 at 1:14 AM
Reposted by Jim Shaw
Thrilled to share our recent review on multidimensional metagenomics analysis, in which we highlight cutting-edge technologies and AI applications at 1D to 4D levels. Congrats @hpeng.bsky.social and Angel Ruiz-Moreno.
www.nature.com/articles/s44...
Multi-dimensional metagenomics - Nature Reviews Bioengineering
High-throughput sequencing and artificial intelligence-driven structural biology have vastly expanded our understanding of the human metagenome, yet microbial functions remain largely elusive. In this...
www.nature.com
August 26, 2025 at 11:44 AM
Reposted by Jim Shaw