John Lovell
@jotlovell.bsky.social
Helping to make genomics useful for crop improvement, ecology, evolutionary biology, and conservation

HudsonAlpha Genome Sequencing Center and DOE Joint Genome Institute
I haven’t slept for seven days either … that would be too long
September 16, 2025 at 3:22 PM
Very nice! Thanks for the link.
This is something you find when you dig deep enough. We've been looking for ways to harmonize annotations since we saw a similar pattern among pecan genomes in 2021 (buried in the SI tho). www.nature.com/articles/s41...
August 19, 2025 at 12:26 AM
We can try it, but my guess is way worse. So many false positives.
August 18, 2025 at 5:51 PM
So, tl;dr: gene PAV and CDS variation is highly dependent on annotation method. Carefully choose, re-annotate, and integrate your pangenome if you want to trust the results

Preprint led by @tomasbruna.bsky.social, Avinash Sreedasyam, and @avril-m-harder.bsky.social. Support from @jgi.doe.gov.
August 18, 2025 at 4:51 PM
Furthermore, even within fully present ('core') gene families we noticed a disturbing trend — identical sequence was not annotated with identical gene structures 20-50% of the time w/in annotation methods and 40-70% of the time btw methods
IGC-reannotation is not perfect, but reduces this to 5-15%
August 18, 2025 at 4:51 PM
But what about within methods? Is using the same method enough to trust PAV? The answer here is less obvious, but method clearly matters.

Within two groups that annotated 7 and 23 soybean genomes there were 3x & 2x more PAVs than IGC — these pangenomes are not as 'open' as reported.
August 18, 2025 at 4:51 PM
These results clearly show that 'naive' integration of existing annotations is not a good idea, especially among genomes that were annotated with similar but not identical methods.
August 18, 2025 at 4:51 PM
In other words, while gene PAV similarity among IGC re-annotated genomes recapitulates known relatedness, clustering by original-annotation PAV simply distinguished which consortium did the annotation (and did not reflect evolutionary relationships): PAV across the original annotations is largely artifactual.
August 18, 2025 at 4:51 PM
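For readers who want to try this kind of comparison, here is a minimal sketch of scoring PAV similarity between genome pairs. It assumes Jaccard similarity over the sets of gene families present in each genome; the post does not say which similarity measure was used, and the genome names, family IDs, and helper functions are hypothetical toy values.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity of two sets of gene-family IDs (assumed metric)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def pairwise_pav_similarity(presence):
    """presence: dict mapping genome name -> set of gene families present."""
    names = sorted(presence)
    return {(x, y): jaccard(presence[x], presence[y])
            for x, y in combinations(names, 2)}

# Hypothetical toy data: which gene families each genome contains.
toy_presence = {
    "genome_A": {"OG1", "OG2", "OG3"},
    "genome_B": {"OG1", "OG2", "OG4"},
    "genome_C": {"OG1", "OG5"},
}

similarity = pairwise_pav_similarity(toy_presence)
for pair, score in similarity.items():
    print(pair, round(score, 2))
```

If PAV reflects biology, these pairwise scores should track assembly-based relatedness; if they instead group genomes by who annotated them, the PAV signal is likely a method artifact.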
To develop a baseline, we re-annotated the genomes with exactly the same 'Integrated Gene Caller' (IGC) pipeline. IGC annotations had ⬆️ BUSCO and ⬇️ false positives, yet more than halved PAV%. Critically, assembly-based relatedness predicted PAV similarity from IGC but not original annotations.
August 18, 2025 at 4:51 PM
We downloaded 'original' genome annotations directly from the SoyBase and CottonGen repos and calculated gene families with OrthoFinder. In both species there were WAY more PAVs than we expected: ~140k (86%) and ~90k (62%) of gene families were absent in ≥1 soybean and cotton genome, respectively.
August 18, 2025 at 4:51 PM
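The "absent in ≥1 genome" percentage can be sketched as follows. This is an illustration under stated assumptions, not the preprint's code: it takes a per-family table of gene counts per genome (the kind of table OrthoFinder writes as a gene-count matrix), and the family IDs, counts, and function name are hypothetical.

```python
def pav_fraction(counts):
    """Fraction of gene families missing from at least one genome.

    counts: dict mapping gene-family ID -> list of gene counts, one per genome.
    Families with no genes in any genome are ignored.
    """
    observed = [c for c in counts.values() if any(n > 0 for n in c)]
    if not observed:
        return 0.0
    variable = [c for c in observed if any(n == 0 for n in c)]
    return len(variable) / len(observed)

# Hypothetical toy table: 4 gene families across 3 genomes.
toy_counts = {
    "OG0000001": [1, 1, 1],  # core: present everywhere
    "OG0000002": [2, 0, 1],  # PAV: absent from the second genome
    "OG0000003": [0, 0, 1],  # PAV: private to the third genome
    "OG0000004": [1, 1, 2],  # core, with copy-number variation
}

print(f"{pav_fraction(toy_counts):.0%} of gene families are absent in >=1 genome")
```

Note that this counts a family as PAV whenever the annotation lacks a gene model, whether or not the underlying sequence is actually missing, which is why the annotation method affects the result.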
To study causes of gene PAV, we looked for species with (1) a history of polyploidy, (2) relatively low amounts of genetic variation, and (3) the availability of many high-quality reference genomes with independent RNA-seq evidenced gene annotation. Soybean and cotton popped to the top of the list.
August 18, 2025 at 4:51 PM
We first looked at how divergence time correlates with gene PAV in pairs of plant and animal genomes that were annotated with the same method (mostly NCBI RefSeq).

While PAV generally scales with divergence time, it is 2-4X more common in plants, especially those with a history of polyploidy.
August 18, 2025 at 4:51 PM
Reposted by John Lovell
Motherfucker wrote one sloppy paper in April 2020 and instead of being like oops, shit, my bad, he has kept doubling down until now he's killing most promising medical technology of the past quarter century rather than going to therapy.
August 14, 2025 at 4:30 AM
Funding and support from: @energygov.bsky.social (especially The Office of Biological and Environmental Research), Bill and Melinda Gates Foundation, and many others. 🙏
August 6, 2025 at 8:32 PM
This work is part of a global collaboration across many groups. In particular, Todd Mockler, who tragically passed in 2023, and the contributions of many scientists at @danforthcenter.bsky.social made much of this work possible. The pangenome was built by scientists at @jgi.doe.gov & HudsonAlpha.
August 6, 2025 at 8:32 PM
Combined, these results illustrate the power of pangenomics for trait discovery ... but they also highlight how far we have to go. Integrated methods to probe and iteratively update variant calls in pangenome frameworks really are needed to bridge the gap between resources and stakeholders
August 6, 2025 at 8:32 PM
There are three major haplotypes that each harbor several large structural variants but few coding variants. While the evidence for single-marker associations was limited, these three typable haplotypes segregate major variation in dhurrin concentration and drought severity of source habitat
August 6, 2025 at 8:32 PM
Finally, we combined pangenome-informed haplotype classification and tests of drought adaptation by probing the biosynthetic gene cluster that produces dhurrin, a secondary metabolite known to enhance drought stress tolerance and resistance against chewing insect herbivory ...
August 6, 2025 at 8:32 PM