James Hemker
jahemker.bsky.social
Stanford Dev Bio PhD
@petrovadmitri.bsky.social's lab
Carcharodontosaurus is my favorite dinosaur.
Grateful to be talking at #Evol2025! Will be presenting on how long Nanopore reads need to be in order to accurately call structural variants in Drosophila at the population level. Talk is at 3pm on Saturday in the Genomics III section. If you can’t make it, get in touch!
June 20, 2025 at 10:32 PM
Finally, we short-read sequenced our inbred lines as the vast majority of genomic data is from NGS. Unsurprisingly, short-read data had the poorest accuracy, as well as the most significant biases against insertions and the largest number of spurious inversion calls.
April 25, 2025 at 8:03 PM
We additionally downsampled our 30x-coverage ultra-long reads to 20x and 10x coverage. We found that accuracy decreased even at 20x coverage, and neither low-coverage distribution could recover all three of the cosmopolitan inversions.
April 25, 2025 at 8:03 PM
We report significant shifts in SV-calling accuracy at the population level when systematically varying read length within D. melanogaster. Our ultra-long read distribution (as defined by ONT: read N50 > 50kb) called more SVs, and at a significantly higher accuracy, than any other distribution.
April 25, 2025 at 8:03 PM
As no definitive benchmark SV call sets exist for D. melanogaster, we then manually validated more than 2,300 SVs at over 18,000 genomic loci across the read-length distributions to assess variant-calling accuracy. Validation was done by visualizing read alignments in JBrowse 2.
April 25, 2025 at 8:03 PM
To investigate this, we Nanopore sequenced eight D. melanogaster inbred lines to extremely high coverage (mean 238x) and then downsampled the reads to create 30x-coverage pools of distinct read-length distributions (as quantified by read N50). We additionally assembled genomes for each pool.
April 25, 2025 at 8:03 PM