Ryan Wick
@rrwick.bsky.social
Bioinformatician at the Centre for Pathogen Genomics at the University of Melbourne
And since both metaMDBG and Myloasm had new versions after the paper was accepted, here's a blog post with an updated benchmark:
rrwick.github.io/2025/09/23/a...
(3/3)
September 29, 2025 at 4:11 AM
Here's the ultra-short version:
If you want the best possible long-read bacterial genome assemblies, Autocycler is the tool for you! It is computationally intensive (due to the need to generate many alternative input assemblies) but consistently more accurate than other methods.
(2/3)
September 29, 2025 at 4:11 AM
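Autocycler is a consensus assembler: it combines many alternative input assemblies into one. As a toy illustration of why agreement across independent attempts suppresses errors, here is a column-wise majority vote over hypothetical, already-aligned sequences. This is not Autocycler's algorithm (which has to cope with assemblies that differ in length, start position and structure); it only sketches the underlying intuition.

```python
from collections import Counter


def majority_consensus(assemblies: list[str]) -> str:
    """Column-wise majority vote over equal-length, pre-aligned sequences.

    Toy illustration only: real alternative assemblies differ in length,
    start position and structure, and Autocycler does not work this way.
    """
    if not assemblies:
        raise ValueError("need at least one assembly")
    length = len(assemblies[0])
    if any(len(a) != length for a in assemblies):
        raise ValueError("toy example assumes equal-length, pre-aligned sequences")
    # For each column, keep the most common base (ties go to the first seen).
    return "".join(Counter(column).most_common(1)[0][0]
                   for column in zip(*assemblies))


# Three hypothetical alternative assemblies of the same true sequence
# (ACGTTAGC), each with a different single-base error.
alternatives = ["ACGATAGC", "ACGTTGGC", "TCGTTAGC"]
print(majority_consensus(alternatives))  # ACGTTAGC
```

Every input contains an error, but because the errors fall in different places, the vote recovers the true sequence.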
One caveat though: my experience is with isolates that (usually) have a single 'true' sequence to aim for. In a metagenome with natural variation (i.e. a mixture of multiple 'true' sequences), I'm not sure how Dorado/Medaka would behave...
September 24, 2025 at 12:59 AM
I have limited experience with long-read metagenome assembly, so I don't have any special insight here. But I like the examples shown in @floriantrigodet.bsky.social's preprint - they show how a single strange read (e.g. a chimera) can sometimes throw off the assembly.
September 23, 2025 at 9:37 PM
No, the assemblies are not Dorado/Medaka polished. I predict doing so would significantly reduce error rates, especially for the higher-error assemblies - Dorado/Medaka is quite good at fixing small-to-medium scale errors. But for lower-error assemblies (e.g. Autocycler), it may not change much.
September 23, 2025 at 9:36 PM
I agree, most real-world MAGs probably have more errors than isolate assemblies, especially at low depth. Also, metagenomes may have within-species variation, and then the ideal MAG is (arguably) some sort of consensus. Especially if there is structural variation, this can be a BIG challenge for assemblers.
September 23, 2025 at 6:09 AM
Amazing, thanks! Will keep an eye out for this preprint.
September 7, 2025 at 9:03 PM
I haven't yet, but that is absolutely something I should try. I should also more generally educate myself on best practices in telomere assembly, e.g. with that paper Adam linked. This is new to me!
September 7, 2025 at 9:02 PM
Thanks for this - looks interesting!
In the paper you said: "...Illumina sequencing can generate spurious indels within HTs, especially for HT lengths longer than 14 bp." Do you have a sense of how bad this gets for really long homopolymers, e.g. 20+ bp?
September 5, 2025 at 1:22 AM
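Exploring this question starts with knowing where an assembly's long homopolymers are. A minimal sketch for listing them, assuming a plain A/C/G/T sequence string and a hypothetical default cutoff of 15 bp (chosen only to match "longer than 14 bp"):

```python
from itertools import groupby


def homopolymer_runs(seq: str, min_len: int = 15) -> list[tuple[int, int, str]]:
    """Return (start, length, base) for every A/C/G/T run of at least min_len bp.

    Start positions are 0-based. The default of 15 bp is a hypothetical
    cutoff matching "longer than 14 bp"; runs of N or other symbols are ignored.
    """
    runs = []
    pos = 0
    for base, group in groupby(seq.upper()):
        length = sum(1 for _ in group)
        if base in "ACGT" and length >= min_len:
            runs.append((pos, length, base))
        pos += length
    return runs


seq = "CG" + "A" * 20 + "CG" + "T" * 12 + "C"
print(homopolymer_runs(seq))              # [(2, 20, 'A')]
print(homopolymer_runs(seq, min_len=10))  # [(2, 20, 'A'), (24, 12, 'T')]
```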
I'm sure there are more robust ways to go about telomere assembly - I'm not very experienced with T2T eukaryote genome assemblies 😬
September 5, 2025 at 1:10 AM
And my manual telomere fixing was makeshift. I pieced together a few assemblies (mostly from Flye) that extended all the way to the telomeres. And then I manually repaired the telomeres to be exact 6-mer repeats - i.e. I assumed any deviation from the 6-mer was ONT error, not real biology.
September 5, 2025 at 1:10 AM
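The manual repair described above amounts to replacing each telomeric region with exact copies of the 6-mer. A minimal sketch of that step, assuming the region boundaries are already known; the function name is hypothetical, and TTAGGG is used purely as an example hexamer, not necessarily the motif in this genome:

```python
def repair_telomere(seq: str, start: int, end: int, motif: str) -> str:
    """Replace seq[start:end] with exact tandem copies of motif.

    Mirrors the assumption in the post: any deviation from the repeat is
    treated as sequencing error. The copy count is the region length divided
    by the motif length, rounded to a whole number of repeats, so small
    indels inside the region are absorbed. `start` should sit on a repeat
    boundary so the rebuilt region keeps the original phase.
    """
    if not motif:
        raise ValueError("motif must be non-empty")
    if not 0 <= start < end <= len(seq):
        raise ValueError("invalid telomere region")
    n_copies = max(1, round((end - start) / len(motif)))
    return seq[:start] + motif * n_copies + seq[end:]


# Hypothetical contig end: unique sequence, then a TTAGGG-style telomere with
# one G missing in the second repeat (the kind of error assumed to be ONT's).
contig = "ACGTACGT" + "TTAGGG" + "TTAGG" + "TTAGGG" + "TTAGGG"
repaired = repair_telomere(contig, 8, len(contig), "TTAGGG")
print(repaired == "ACGTACGT" + "TTAGGG" * 4)  # True
```

Rounding the copy count is one simple way to absorb a dropped or extra base; it assumes the errors are small relative to the motif length.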
Since all Illumina sequencing involves some PCR (bridge amplification), I also wonder at what length Illumina reads start to fail with homopolymers. Can they reliably sequence 20-mers? 40-mers? 60-mers? It's a hard question to answer if every sequencing tech struggles with these...
September 5, 2025 at 1:10 AM
I too am wary. I suppose the hope is that by limiting changes to long homopolymers, polishing will fix more errors than it introduces. I.e. I'm guessing that ONT's long-homopolymer error rate is greater than the rate of genuine cross-sample homopolymer differences. But this is very much unproven!
September 5, 2025 at 1:10 AM
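The "only change long homopolymers" idea can be pictured as a filter on a polisher's proposed edits. A minimal sketch, assuming edits are available as hypothetical 0-based positions in the draft assembly and using an arbitrary 10 bp threshold; no real polisher's output format or default is implied:

```python
from itertools import groupby


def long_homopolymer_intervals(seq: str, min_len: int) -> list[tuple[int, int]]:
    """Half-open (start, end) intervals of homopolymers at least min_len long."""
    intervals = []
    pos = 0
    for _base, group in groupby(seq.upper()):
        length = sum(1 for _ in group)
        if length >= min_len:
            intervals.append((pos, pos + length))
        pos += length
    return intervals


def keep_homopolymer_edits(draft: str, edit_positions: list[int],
                           min_len: int = 10) -> list[int]:
    """Keep only proposed edits that land in or at the end of a long homopolymer.

    The end boundary is inclusive because an insertion that lengthens a run
    may be reported at the position just past its last base.
    """
    intervals = long_homopolymer_intervals(draft, min_len)
    return [p for p in edit_positions
            if any(start <= p <= end for start, end in intervals)]


draft = "AC" + "G" * 12 + "TAC"
print(keep_homopolymer_edits(draft, [1, 5, 14, 16]))  # [5, 14]
```

Everything outside a long run is left untouched, which is exactly the trade-off discussed: fewer opportunities to introduce errors, at the cost of not fixing errors elsewhere.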
However, I hear rumours that ONT might be working on a new move-table-aware bacterial polishing model. See my blog post from Feb for details: rrwick.github.io/2025/02/07/d.... If true, I'll be eager to test it out when released.
June 10, 2025 at 2:37 AM
Good to know - thanks for clarifying. Makes sense for a tool that's designed to work with big metagenomic datasets. I'm using it a bit out of its domain on a bacterial isolate.
May 30, 2025 at 5:43 AM
The read set was only 240 MB (gzipped), so needing ~10 GB of RAM means myloasm is memory hungry. The myloasm docs do acknowledge that it uses more memory than other assemblers.
Also, I ran my tests on an ARM Mac, but the docs suggest that myloasm (specifically the polishing step) will be even faster on x86-64 CPUs with AVX2.
May 30, 2025 at 2:43 AM
Just ran a few more tests through GNU time:
1 thread: 435 seconds, 10.1 GB RAM
2 threads: 238 seconds, 10.0 GB RAM
4 threads: 133 seconds, 10.1 GB RAM
8 threads: 73 seconds, 10.1 GB RAM
16 threads: 49 seconds, 13.3 GB RAM
May 30, 2025 at 2:42 AM
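Turning the GNU time numbers above into speedup and parallel efficiency (speedup divided by thread count) makes the scaling easier to read. A small sketch using only those measurements:

```python
# Wall-clock seconds and RAM (GB) for each thread count, copied from the
# GNU time results above.
RUNS = {1: (435, 10.1), 2: (238, 10.0), 4: (133, 10.1), 8: (73, 10.1), 16: (49, 13.3)}


def scaling_table(runs: dict[int, tuple[float, float]]) -> list[tuple[int, float, float]]:
    """Return (threads, speedup, parallel efficiency) relative to 1 thread."""
    baseline = runs[1][0]
    table = []
    for threads, (seconds, _ram_gb) in sorted(runs.items()):
        speedup = baseline / seconds
        table.append((threads, speedup, speedup / threads))
    return table


for threads, speedup, efficiency in scaling_table(RUNS):
    print(f"{threads:>2} threads: {speedup:4.1f}x speedup, {efficiency:4.0%} efficiency")
```

On these numbers, 2 threads run at about 91% efficiency, while 16 threads give roughly an 8.9x speedup (about 55% efficiency), with RAM staying near 10 GB until the 16-thread run.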