#GlobDB
I love singleM, and it matches sourmash taxonomy really well, too (which means I trust it ;)). I wouldn't consider using singleM for classifying individual long reads, however, since it's marker-gene based. sylph or sourmash for that. Using globdb sounds like a good idea.
January 29, 2026 at 4:29 PM
And a very worthwhile mention ;) globDB is an extensive database so you would get even better coverage than using GTDB as a reference !
January 29, 2026 at 11:45 AM
for reads under 1 kb, straight-up diamond blastx might work, but then you're of course doing a two-stage analysis and might be overengineering it (depending on the question).

Also going to briefly (and selfishly) mention the GlobDB here again, because I want it stuck in the heads of the vocal folks on here 😅
January 29, 2026 at 8:46 AM
the GlobDB (globdb.org) is the better option here. Full taxonomy but twice as many species reps as the GTDB. Those extra 50% are also biased towards less characterized environments because those big metagenome studies don't end up in INSDC dbs anymore (and thus go missing from GTDB).
January 29, 2026 at 8:41 AM
i would use either singleM or sylph with the GlobDB databases available for them from globdb.org. Double the number of species reps relative to GTDB.

Also would genuinely be interested to see the comparison of both profilers
January 29, 2026 at 8:08 AM
With the release of anvi'o v9, the 300k+ contigs databases available for download on the GlobDB website are now compatible with both the development version and the latest stable release (v9) of anvi'o.

More info:
globdb.org/news

Thanks #anvio team!
🖥️🧬🦠
January 26, 2026 at 1:27 PM
Check out this awesome resource for all your future microbial #genome analyses:
#GlobDB is the most complete genome database available to date!

Developed by @daanspeth.bsky.social and colleagues 🧪🦠🧬🖥️
Our paper describing the GlobDB is now published in @bioinfoadv.bsky.social
doi.org/10.1093/bioa...

The GlobDB is the largest species dereplicated genome database currently available, containing 306,260 species representatives.
More information on globdb.org 1/5
🖥️🧬🦠
GlobDB: a comprehensive species-dereplicated microbial genome resource
Abstract. Motivation: Over the past years, substantial numbers of microbial species’ genomes have been deposited outside of conventional INSDC databases. Resu
doi.org
December 30, 2025 at 4:33 PM
New @fwf-at.bsky.social CoE Microplanet study introduces GlobDB (globdb.org), a comprehensive database that integrates 14 genomic catalogues and provides consistent taxonomy for microbial species.
academic.oup.com/bioinformati...
December 10, 2025 at 12:43 PM
The annual increase in the number of dereplicated species-representative genomes in GTDB or GlobDB shows how undersampled microbial diversity still is. And reference-based MG profiling works best at species level, but not so well at higher phylogenetic distance.
Anyways, what I want to say is amplicon sequencing ain't dead (yet)
December 1, 2025 at 9:28 PM
The GlobDB aggregates 14 independent #genomic catalogues to create a unified, species-dereplicated microbial genome database.
November 26, 2025 at 10:01 AM
🧫 Just out in Bioinformatics Advances: “GlobDB: A comprehensive species-dereplicated microbial genome resource.” 

Explore the full study: https://doi.org/10.1093/bioadv/vbaf280
November 26, 2025 at 10:01 AM
Just in time for me to cite in something I've been using GlobDB for
November 22, 2025 at 9:32 AM
Congratulations @daanspeth.bsky.social and the whole team. GlobDB has already been very helpful for several projects at @cemess.bsky.social, and it will no doubt be a useful resource for many microbiologists around the world. Really glad to see it out there.
November 21, 2025 at 6:28 PM
I don’t have separate funding for the GlobDB, so i don’t think this is in the cards. Our cluster, where this is hosted, is separately funded and storage is available long term.
November 21, 2025 at 5:47 PM
Great news. I've been using GlobDB for tracking distribution of a class of replicators across taxonomies. Highly, highly recommended for people looking for dereplicated dataset.

🧬💻
November 21, 2025 at 4:37 PM
GlobDB is a great resource and is frequently of use in my research!

Developed and maintained by @daanspeth.bsky.social, I was glad to contribute in a small way. Have a look at Daan’s thread to find out more and let us know how you use it! >>
November 21, 2025 at 4:35 PM
Finally, for taxonomic analyses, the GlobDB includes a full seven level taxonomy that is compatible with, and extends, the GTDB taxonomy. This taxonomy was also used to create sylph databases and a SingleM metapackage for taxonomic profiling of read datasets. 4/5
November 21, 2025 at 4:22 PM
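As a rough illustration of what a seven-level, GTDB-compatible taxonomy looks like in practice, here is a minimal Python sketch that splits a GTDB-style lineage string into its ranks. It assumes GlobDB lineages use the same `d__;p__;c__;o__;f__;g__;s__` prefix convention as GTDB, which the post implies but does not spell out; the function name `parse_lineage` and the example lineage are only for illustration.

```python
# Sketch only: assumes GTDB-style rank prefixes (d__, p__, ..., s__).
# The exact GlobDB file layout is not described in the post.
RANKS = ("domain", "phylum", "class", "order", "family", "genus", "species")
PREFIXES = ("d__", "p__", "c__", "o__", "f__", "g__", "s__")


def parse_lineage(lineage):
    """Split a semicolon-separated seven-level lineage into a rank -> name dict."""
    parts = [part.strip() for part in lineage.split(";")]
    if len(parts) != len(RANKS):
        raise ValueError(f"expected {len(RANKS)} ranks, got {len(parts)}")
    parsed = {}
    for rank, prefix, part in zip(RANKS, PREFIXES, parts):
        if not part.startswith(prefix):
            raise ValueError(f"{rank} field should start with {prefix!r}: {part!r}")
        # Strip the rank prefix and keep the taxon name.
        parsed[rank] = part[len(prefix):]
    return parsed


example = (
    "d__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;o__Enterobacterales;"
    "f__Enterobacteriaceae;g__Escherichia;s__Escherichia coli"
)
lineage = parse_lineage(example)
```

With the lineage in a dict, profiler output can be summarised at any rank, for example by grouping on `lineage["genus"]`.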
For protein analyses, the GlobDB provides the amino acid fasta files for all genomes, as well as kegg/cog/pfam annotations, and a clustered dataset (40% id over 80% of both sequences) of ~80M proteins. For this clustered dataset, PLM embeddings are available. 3/5
November 21, 2025 at 4:22 PM
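To make the clustering criterion concrete ("40% id over 80% of both sequences"), here is a small Python sketch of how such a threshold could be checked for one pairwise alignment. The post does not say which clustering tool was used or how identity and coverage were computed, so the function name `meets_cluster_threshold`, its parameters, and the coverage definition (aligned residues divided by sequence length) are all assumptions.

```python
# Sketch only: one possible reading of "40% id over 80% of both sequences".
# The tool and exact definitions used for GlobDB are not given in the post.
def meets_cluster_threshold(
    identity,
    aligned_query,
    aligned_target,
    query_len,
    target_len,
    min_identity=0.40,
    min_coverage=0.80,
):
    """Return True if the alignment passes the identity and bidirectional coverage cutoffs."""
    query_cov = aligned_query / query_len      # fraction of the query that is aligned
    target_cov = aligned_target / target_len   # fraction of the target that is aligned
    return (
        identity >= min_identity
        and query_cov >= min_coverage
        and target_cov >= min_coverage
    )


same_cluster = meets_cluster_threshold(0.45, 90, 90, 100, 100)
fragment_hit = meets_cluster_threshold(0.45, 90, 90, 100, 200)
```

Requiring coverage on both sequences keeps a short fragment from being clustered with a much longer protein just because the fragment aligns well.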
The GlobDB extends the GTDB (@ace-gtdb.bsky.social) by over 160,000 species representatives. In addition to the added diversity captured, we provide several analysis products for download.

For the genomes, there's anvi'o dbs, genome fasta, quality stats, and GFF files. 2/5
November 21, 2025 at 4:21 PM
Our paper describing the GlobDB is now published in @bioinfoadv.bsky.social
doi.org/10.1093/bioa...

The GlobDB is the largest species dereplicated genome database currently available, containing 306,260 species representatives.
More information on globdb.org 1/5
🖥️🧬🦠
GlobDB: a comprehensive species-dereplicated microbial genome resource
Abstract. Motivation: Over the past years, substantial numbers of microbial species’ genomes have been deposited outside of conventional INSDC databases. Resu
doi.org
November 21, 2025 at 4:21 PM
ah yes, that i totally get. Sorry, don’t know a workflow offhand…

I could/should put together something like this for the globdb, but that’ll not happen before xmas 😅
November 19, 2025 at 8:13 PM
jgi used to provide contig taxonomy for assemblies they made, i think based on best hits of gene products with some minimum requirements on gene number & fraction. Of course very sensitive to binning error propagation, but easy to implement as a screening with gtdb or globdb taxonomy
November 19, 2025 at 6:39 PM
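The best-hit screening idea in the post above could be sketched roughly like this in Python. This is not JGI's actual procedure: the function name `assign_contig_taxonomy`, the default cutoffs, and the choice to compute the fraction over all genes on the contig (including genes with no hit) are assumptions made for illustration.

```python
from collections import Counter


# Sketch only: a simple majority vote over per-gene best-hit taxa.
# Cutoffs and the fraction definition are assumptions, not JGI's method.
def assign_contig_taxonomy(gene_hits, min_genes=3, min_fraction=0.5):
    """Assign a taxon to a contig from the best-hit taxon of each gene.

    gene_hits holds one entry per gene on the contig: a taxon label
    (e.g. from GTDB or GlobDB taxonomy) or None if the gene had no hit.
    Returns the winning taxon, or None if support is too weak.
    """
    labelled = [taxon for taxon in gene_hits if taxon is not None]
    if len(labelled) < min_genes:
        return None  # too few genes with a hit to make a call
    taxon, count = Counter(labelled).most_common(1)[0]
    if count / len(gene_hits) < min_fraction:
        return None  # the top taxon does not cover enough of the contig's genes
    return taxon


call = assign_contig_taxonomy(["g__A", "g__A", "g__A", "g__B"])
```

Running the vote at a chosen rank (genus, family, ...) is a matter of truncating each gene's lineage to that rank before passing it in.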
In the meantime, we've been collecting a list of these at tinyurl.com/mag-collecti.... Feel free to add more you find.

See also GlobDB from @daanspeth.bsky.social which incorporates some of these into a new MAG collection arxiv.org/abs/2506.11896
Public MAG datasets not available at NCBI or ENA
Some metagenome assembled genome (MAG) datasets are not available in the standard locations (NCBI / ENA / etc) for a variety of reasons. Here you can contribute new ones you come across. To be recorde...
tinyurl.com
October 24, 2025 at 8:41 AM