🎓 Associate Lecturer in Bioinformatics @ UVic-UCC
📚 Author of biologydatascience.com
🔬 Exploring genomics, data science, and the frontiers of biology
We hope this helps others working on retroviral domains, paleovirology, or TE functional genomics.
🙏 Thanks to @cnag-eu.bsky.social and @annaesteveco.bsky.social for support.
We welcome feedback, questions, or collaborations as we submit to a peer-reviewed journal.
🧰 All annotations are open-access:
- BED + FASTA files
- InterProScan + Phobius output
- Domain sequences & conservation scores
- Scripts on GitHub
🧬 Data: doi.org/10.5281/zeno...
💻 Code: github.com/funcgen/herv...
🧠 Intriguingly, HERV activity has been linked to neurodegenerative diseases (ALS, MS, AD) and immune defense.
- HERV-encoded proteins may modulate immunity
- Some may even restrict exogenous viruses
Our dataset enables deeper exploration of these hypotheses.
🧬 One famous case of HERV co-option is Syncytin, a retroviral Env protein now essential for placenta formation.
Could other HERV proteins—preserved across millions of years—also serve beneficial roles in the human host?
Why does this matter?
✔️ Residual protein function?
✔️ Host co-option?
✔️ Antiviral defense?
✔️ Role in development, immunity, neurodegeneration?
Our resource supports new lines of research into the functional potential of HERV proteins.
🔍 We also found 13 HERVK loci encoding Gag, Pol, and Env with strong domain conservation and intact 5′/3′ LTRs.
- Some domains even share the same ORF—suggesting fused or intact polyproteins.
- A few insert into human gene introns—potential regulatory effects?
💡 Subfamily patterns:
- HERVK: Full polyproteins across several loci
- HERVH: Conserves enzymatic domains, but not structural ones
- HERVE: Unexpectedly retains protease & RT domains
Young and ancient families both preserve functional fragments!
🧠 3 examples:
- HERVK Env (99.5% coverage, fusion domains) chr5:156658763–156665917
- HERVH RNase H (DEDD motif) chr14:53129175–53135122
- HERVE protease (full-length) chr1:20154322–20160102
More details in the preprint!
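
For anyone who wants to look these loci up alongside the BED files, here is a minimal Python sketch that turns a `chr:start–end` string into a BED line. It assumes the coordinates quoted in this thread are 1-based and inclusive, which the thread does not state, and it says nothing about the genome assembly. The function name `region_to_bed` and the feature label are hypothetical.

```python
import re

# Accepts either a plain hyphen or an en dash between start and end.
REGION = re.compile(r"^(chr[\w.]+):(\d+)[-–](\d+)$")


def region_to_bed(region: str, name: str = ".") -> str:
    """Convert 'chr:start-end' to a tab-separated BED line.

    ASSUMPTION: the input is 1-based and inclusive. BED is 0-based and
    half-open, so only the start is shifted by one.
    """
    match = REGION.match(region.replace(",", ""))
    if match is None:
        raise ValueError(f"unrecognised region string: {region!r}")
    chrom = match.group(1)
    start = int(match.group(2))
    end = int(match.group(3))
    if start > end:
        raise ValueError("start must not exceed end")
    return f"{chrom}\t{start - 1}\t{end}\t{name}"


# Hypothetical label; the coordinates are the HERVK Env example from this thread.
bed_line = region_to_bed("chr5:156658763–156665917", "HERVK_Env_example")
print(bed_line)
```

If the published coordinates turn out to be 0-based already, drop the `- 1` on the start.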
💡 To our surprise, thousands of domains are highly conserved, with >1,000 showing nearly full alignment (>95% coverage).
We even recovered key catalytic motifs (e.g. DEDD in RNase H) and transmembrane regions in Env.
These are not just fossils—they may retain function.
💻 Using a reproducible pipeline (HMMER + InterProScan), we identified 17,540 retroviral domains—incl. Gag, RT, RNase H, protease, integrase, and Env.
We then quantified alignment coverage to assess structural conservation.
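
As a rough illustration of the coverage idea (not the pipeline's actual code), here is a minimal Python sketch. It assumes coverage means the fraction of the profile HMM spanned by a hit. The `DomainHit` class, its fields, the domain names, and the numbers are all hypothetical. The 0.95 cut-off mirrors the >95% figure quoted in this thread.

```python
from dataclasses import dataclass


@dataclass
class DomainHit:
    """One domain match against a profile HMM (hypothetical record layout)."""
    domain: str
    hmm_from: int  # 1-based first model position covered by the hit
    hmm_to: int    # 1-based last model position covered (inclusive)
    hmm_len: int   # total length of the profile HMM


def alignment_coverage(hit: DomainHit) -> float:
    """Fraction of the model covered by the hit, between 0 and 1."""
    if hit.hmm_len <= 0:
        raise ValueError("model length must be positive")
    return (hit.hmm_to - hit.hmm_from + 1) / hit.hmm_len


# Made-up hits, for illustration only.
hits = [
    DomainHit("example_domain_a", hmm_from=1, hmm_to=140, hmm_len=143),
    DomainHit("example_domain_b", hmm_from=10, hmm_to=60, hmm_len=100),
]

# Keep hits that span more than 95% of their model.
nearly_full = [h for h in hits if alignment_coverage(h) > 0.95]

for h in hits:
    print(f"{h.domain}\t{alignment_coverage(h):.3f}")
```

Measuring coverage on the model rather than on the ORF is a design choice here: it asks how much of the reference domain survives, regardless of how long the surrounding ORF is.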
🧬 HERVs make up ~8% of the human genome.
Yet no systematic annotation that quantifies the structural conservation of protein domains within their internal sequences has been released.
We analyzed >120,000 ORFs to address this gap.