Ben J Woodcroft
@benjwoodcroft.bsky.social
Yet another microbial bioinformatician, group leader, dad
github.com/wwood https://research.qut.edu.au/cmr/team/ben-woodcroft/
In the meantime, we've been collecting a list of these at tinyurl.com/mag-collecti.... Feel free to add more you find.
See also GlobDB from @daanspeth.bsky.social which incorporates some of these into a new MAG collection arxiv.org/abs/2506.11896
Public MAG datasets not available at NCBI or ENA
Some metagenome assembled genome (MAG) datasets are not available in the standard locations (NCBI / ENA / etc) for a variety of reasons. Here you can contribute new ones you come across. To be recorde...
tinyurl.com
October 24, 2025 at 8:41 AM
Yes normal behaviour. See wwood.github.io/singlem/FAQ for the formula - most windows are 60bp, and so if your reads are uniform length you get that.
But you are looking at the OTU table there, perhaps you want the taxonomic profile output (which is a more final output)?
FAQ
Documentation for SingleM
wwood.github.io
July 23, 2025 at 10:27 PM
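A minimal sketch of why uniform read lengths give repeating coverage values in the OTU table. It assumes coverage is estimated as reads × L / (L − w + 1) for read length L and window length w = 60; that form and the name `window_coverage` are assumptions for illustration, so check the FAQ for the formula SingleM actually uses.

```python
def window_coverage(num_reads: int, read_length: int, window_length: int = 60) -> float:
    """Estimate coverage from reads that fully span a fixed-length window.

    Assumed form (not taken from the SingleM source): a read of length L can
    span a window of length w at only L - w + 1 start positions, so each
    spanning read is scaled up by L / (L - w + 1).
    """
    if read_length < window_length:
        raise ValueError("read is shorter than the window and cannot span it")
    return num_reads * read_length / (read_length - window_length + 1)


# With uniform 100 bp reads, every OTU's coverage is a multiple of 100/41.
for n in (1, 2, 3):
    print(n, round(window_coverage(n, 100), 2))
```

Under this assumption the scaling factor grows quickly as reads approach the window length, which fits the point that heavily trimmed, short reads are the ones to worry about.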
Trimmed reads are bad news when they become short, but if they remain 100bp+ then you should be fine I reckon.
2/2
July 23, 2025 at 5:35 AM
Thanks - strange that your Lyrebird experience wasn't good. Please report errors (what did you do?) at github.com/wwood/single... or just via email. We test installation inclusive of DB download at github.com/wwood/single...
But a new version of the lyrebird DB incoming btw.
1/2
wwood/singlem
Novelty-inclusive microbial (and now dsDNA phage) community profiling of shotgun metagenomes - wwood/singlem
github.com
July 23, 2025 at 5:35 AM
Thanks for kind words. By UCEs you mean e.g. 16S? It actually does this already, and tests pass. But it isn't the most efficient and code is a bit crusty and db is out of date, since it doesn't get used much. See wwood.github.io/singlem/FAQ
FAQ
Documentation for SingleM
wwood.github.io
July 19, 2025 at 10:51 AM
This is great @titus.idyll.org (though to be picky it's SingleM or singlem, not singleM). We wrote a few parsers for other formats at github.com/wwood/single... - it'd be nice if not everyone needed to reinvent (and use standard names for things like coverage inclusive vs exclusive of children).
singlem-benchmarking/bin at main · wwood/singlem-benchmarking
Contribute to wwood/singlem-benchmarking development by creating an account on GitHub.
github.com
July 18, 2025 at 11:39 PM
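To make the "inclusive vs exclusive of children" distinction concrete, here is a small sketch. It assumes a profile stored as semicolon-delimited lineages mapped to exclusive coverage (coverage assigned to that taxon and no deeper); the function name and the example values are made up for illustration and are not taken from the parsers in the repository.

```python
from collections import defaultdict


def inclusive_coverage(exclusive: dict[str, float]) -> dict[str, float]:
    """Sum each taxon's own (exclusive) coverage into all of its ancestors.

    Keys are semicolon-delimited lineages, e.g. "d__Bacteria;p__Bacillota".
    """
    totals: dict[str, float] = defaultdict(float)
    for lineage, cov in exclusive.items():
        ranks = [r.strip() for r in lineage.split(";") if r.strip()]
        # Credit this coverage to the taxon itself and to every ancestor prefix.
        for depth in range(1, len(ranks) + 1):
            totals[";".join(ranks[:depth])] += cov
    return dict(totals)


# Hypothetical profile: coverage values are invented for the example.
profile = {
    "d__Bacteria": 1.5,  # coverage resolved only to domain level
    "d__Bacteria;p__Bacillota": 2.0,
    "d__Bacteria;p__Pseudomonadota": 0.5,
}
print(inclusive_coverage(profile)["d__Bacteria"])  # 4.0
```

Here d__Bacteria has an exclusive coverage of 1.5 but an inclusive coverage of 4.0, which is why a shared name for each quantity matters when comparing profilers.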
I wonder if AI could do a good job of that integration. I'd love to learn some Haskell actually, just need to find the time..
July 18, 2025 at 11:33 PM
There is also a branch that takes nanopore reads as input, which works reasonably well. We are putting some final code touches on it, but maybe helpful - github.com/wwood/single...
Nanopore by thepatientwait · Pull Request #208 · wwood/singlem
Working Nanopore build.
Important changes:
DIAMOND prefilter
Uses --range-culling + related args for DIAMOND.
These results are now streamed to help memory.
Sequences are indexed using gene names ...
github.com
July 18, 2025 at 1:07 AM
Good good, or could be better?
July 17, 2025 at 11:24 PM
Good q. I imagine that new-chemistry nanopore should be fine. You can check by running the supplemented package on your genomes and making sure the expected number of markers is detected. It should be in line with MAG completeness.
July 17, 2025 at 11:22 PM
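One way to script that sanity check, as a rough sketch. It assumes a tab-separated OTU table with `gene` and `sample` columns (one sample per genome), and that you supply the total number of markers in the package yourself; the function names and the 0.15 tolerance are arbitrary choices for illustration, not part of SingleM.

```python
import csv
import io


def markers_detected(otu_table_tsv: str) -> dict[str, int]:
    """Count distinct marker genes with at least one OTU per sample (genome)."""
    genes_by_sample: dict[str, set[str]] = {}
    for row in csv.DictReader(io.StringIO(otu_table_tsv), delimiter="\t"):
        genes_by_sample.setdefault(row["sample"], set()).add(row["gene"])
    return {sample: len(genes) for sample, genes in genes_by_sample.items()}


def looks_consistent(detected: int, total_markers: int, completeness: float,
                     tolerance: float = 0.15) -> bool:
    """True if the detected marker fraction is within `tolerance` of completeness (0-1)."""
    return abs(detected / total_markers - completeness) <= tolerance


# Hypothetical two-genome table with placeholder marker names.
table = "gene\tsample\nmarker_A\tmag1\nmarker_B\tmag1\nmarker_A\tmag2\n"
print(markers_detected(table))
```

A genome whose detected fraction falls well below its estimated completeness would be the one to look at more closely.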
cheers Daan - here's a thread explaining some of the deets bsky.app/profile/benj...
Out in @natbiotech.nature.com: Metagenome taxonomy profilers usually ignore unknown species. SingleM is an accurate profiler which doesn't, even detecting phyla with no MAGs. Profiles of 700,000 metagenomes at sandpiper.qut.edu.au. A 🧵
July 16, 2025 at 10:07 PM
Thanks for spreading the word @jcamthrash.bsky.social - there's an explanatory thread at bsky.app/profile/benj...
Out in @natbiotech.nature.com: Metagenome taxonomy profilers usually ignore unknown species. SingleM is an accurate profiler which doesn't, even detecting phyla with no MAGs. Profiles of 700,000 metagenomes at sandpiper.qut.edu.au. A 🧵
July 16, 2025 at 10:05 PM
Thanks for considering it for publication. There's an explanation thread at bsky.app/profile/benj...
Out in @natbiotech.nature.com: Metagenome taxonomy profilers usually ignore unknown species. SingleM is an accurate profiler which doesn't, even detecting phyla with no MAGs. Profiles of 700,000 metagenomes at sandpiper.qut.edu.au. A 🧵
July 16, 2025 at 10:03 PM
Much appreciated @bcoltman.bsky.social
July 16, 2025 at 10:02 PM
Thanks for reading this thread - share link at rdcu.be/ewqLW
Comprehensive taxonomic identification of microbial species in metagenomic data using SingleM and Sandpiper
Nature Biotechnology - Novel microbial species in metagenomes are identified using conserved regions within universal marker genes.
rdcu.be
July 16, 2025 at 9:59 PM
Thanks also to the reviewers including Alice McHardy - very fair and helpful we thought.
July 16, 2025 at 9:59 PM
Many many to thank, particularly @aroneys.bsky.social @rossenzhao.bsky.social Mitchell Cunningham, Linda Blackall, Gene Tyson, @cmrqut.bsky.social and dozens of people who have helped with the software, ms, and everyone who tolerated my enthusiasm.
July 16, 2025 at 9:59 PM
SingleM is BYO genome, you can add your MAGs to the refDB to get profiles which include both known species and your novel MAGs. wwood.github.io/singlem/tool...
SingleM supplement
Documentation for SingleM
wwood.github.io
July 16, 2025 at 9:59 PM
Novel lineage detection + 700k profiles makes it possible to recover novel MAGs from taxa you care about. We recovered new genera from the underrepresented Muirbacteria, Wallbacteria, Riflebacteria and Fusobacteria phyla by assembling the right metagenomes.
July 16, 2025 at 9:59 PM
@ace-gtdb.bsky.social R226-based profiles from 700k public metagenomes are at sandpiper.qut.edu.au. Search for your fave microbe by GTDB taxonomy there to see its prevalence and community profiles. Got something novel? Get in touch.
July 16, 2025 at 9:59 PM
A new @rust-lang.org approach also helps - conserved regions are already aligned to each other so distance calcs become a vector similarity search problem. github.com/wwood/smafa Big distance => novel species. Props to @viralinstruction.bsky.social for awesome PR.
GitHub - wwood/smafa: Biological sequence aligner for pre-aligned sequences
Biological sequence aligner for pre-aligned sequences - wwood/smafa
github.com
July 16, 2025 at 9:59 PM
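A toy illustration of how pre-aligned sequences turn distance calculation into a vector problem. This is not smafa's implementation: the one-hot encoding, the function names and the example sequences are all assumptions chosen to keep the sketch short.

```python
ALPHABET = "ACGT"


def one_hot(seq: str) -> list[int]:
    """Encode an aligned sequence as a flat 0/1 vector, 4 slots per column.

    Gaps and ambiguous bases become all-zero columns, so they never match.
    """
    vec: list[int] = []
    for base in seq.upper():
        vec.extend(1 if base == letter else 0 for letter in ALPHABET)
    return vec


def mismatches(a: str, b: str) -> int:
    """Number of aligned columns that do not match, computed via a dot product."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to the same length")
    matches = sum(x * y for x, y in zip(one_hot(a), one_hot(b)))
    return len(a) - matches


def nearest(query: str, references: dict[str, str]) -> tuple[str, int]:
    """Return the reference name with the fewest mismatches to the query."""
    return min(
        ((name, mismatches(query, ref)) for name, ref in references.items()),
        key=lambda hit: hit[1],
    )


# Hypothetical 8-column windows; real windows are mostly 60 bp.
refs = {"sp1": "ACGTACGT", "sp2": "ACGTTTTT"}
print(nearest("ACGTACGA", refs))
```

Because every sequence occupies the same columns, no alignment step is needed at query time, and a large distance to the nearest reference is the signal that the query may come from a novel species.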
Fast and RAM-efficient since most raw reads are swiftly ignored. We optimise an up-front DIAMOND BLASTX-based method. Thanks @bbuchfink.bsky.social / Serratus for makeidx
July 16, 2025 at 9:59 PM
Perhaps most strikingly, it detects microbes that aren't in the ref db, correctly weighting their relative abundance.
July 16, 2025 at 9:59 PM
It's accurate on communities of known species / non-rep strains (though can struggle with low abundance species where coverage <1X)
July 16, 2025 at 9:59 PM