plantgenomics.bsky.social
@plantgenomics.bsky.social
Squidly: Enzyme Catalytic Residue Prediction Harnessing a Biology-Informed Contrastive Learning Framework
doi.org/10.1101/2025...

KIPEs3: Automatic annotation of biosynthesis pathways
doi.org/10.1101/2022...

#Bioinformatics #Enzymes
Squidly: Enzyme Catalytic Residue Prediction Harnessing a Biology-Informed Contrastive Learning Framework
Enzymes present a sustainable alternative to traditional chemical industries, drug synthesis, and bioremediation applications. Because catalytic residues are the key amino acids that drive enzyme func...
November 8, 2025 at 6:38 AM
Whole Metagenome Sequencing: not Deep Enough for Complete Microbial Function Recovery
doi.org/10.1101/2025...
Background: Whole metagenome shotgun sequencing (WMS) is widely used to profile microbial function. However, technical variability in sequencing and analysis often obscures true biological patterns. Large-scale studies are particularly susceptible to batch effects, such as differences in sequencing depth, platform, and annotation strategy, as well as sample-to-flow-cell assignments. The relative effects of these factors on functional inference in such studies, however, have yet to be systematically evaluated. We analyzed oral-rinse WMS data from a study cohort including 671 Nigerian youths aged 9-18, sequenced on two Illumina platforms. Microbial molecular functionality encoded in these data was annotated using the mi-faser/Fusion pipeline, to capture the broad functional repertoire, and the HUMAnN 3/EC numbers pipeline, to characterize curated enzymatic activities. We then quantified how technical factors and batch effects shaped the recovery of microbial functionality.
Results: Three findings of our work were most salient. First, we observed that the choice of annotation strategy traded off between breadth and specificity of functional coverage. Second, we found that low-prevalence functions were disproportionately lost at shallow sequencing depths, indicating that in, e.g., case-control studies with few representatives of the minor class, sequencing depth could critically impact study resolution. Finally, using our newly developed model relating sequencing depth to functional recovery, we demonstrated that increasing sequencing depth does not directly or proportionally improve functional recall. That is, at as little as 10% of this study’s sequencing depth, 30% of the estimated complete microbiome functional repertoire was detectable. However, even at the full depth used in this study, we were only able to recover an estimated 60% of that complete functional repertoire.
Conclusions: Together, these findings and our depth-to-function mapping framework provide practical guidelines for the design and interpretation of WMS studies. Coordinating sequencing depth planning with annotation strategy, experimental design, and rigorous batch control is thus essential for robust detection of microbial functions and for ensuring reproducible microbiome insights.
Competing Interest Statement: Dr. Osazuwa-Peters was a scientific advisor to Navigating Cancer and has received consulting fees from Merck for consultation on HPV vaccination.
National Institute of Dental and Craniofacial Research, R01-DE032216
November 7, 2025 at 9:12 PM
The eukaryotic horizontal gene transfer dataset a compendium
doi.org/10.1101/2025...

DupyliCate - mining, classifying, and characterizing gene duplications
doi.org/10.1101/2025...

#Evolution #Bioinformatics
The eukaryotic horizontal gene transfer dataset a compendium
With more eukaryotic genomes available for study, researchers have been able to identify a growing number of horizontal gene transfer (HGT) candidates. We compiled 9,511 protein coding genes that were ...
November 7, 2025 at 9:11 PM
Reposted
Curious about plant genomics? 🌿

Join our upcoming training courses to explore how plant genomes are assembled and annotated.

Details 👉 www.izmb.uni-bonn.de/en/pbb/news#...
#Genomics #Bioinformatics #DeNBI #PlantScience
@denbi.bsky.social @puckerlab.bsky.social
November 6, 2025 at 5:25 PM
GNAT: An Interactive Web Tool for Gene Neighbourhood Analysis
doi.org/10.1101/2025...

DupyliCate - mining, classifying, and characterizing gene duplications
doi.org/10.1101/2025...

#Genomics #Bioinformatics #Evolution
GNAT: An Interactive Web Tool for Gene Neighbourhood Analysis
Given the sequence of a protein, the Gene Neighbourhood Analysis Tool (GNAT) identifies homologues within microbial (bacterial, archaeal, or fungal) or viral databases, aligns and clusters their genom...
November 6, 2025 at 5:01 PM
Svirlpool: structural variant detection from long read sequencing by local assembly
doi.org/10.1101/2025...

Large scale genomic rearrangements in selected Arabidopsis thaliana T-DNA lines are caused by T-DNA insertion mutagenesis
doi.org/10.1101/2021...
Svirlpool: structural variant detection from long read sequencing by local assembly
Motivation Long-Read Sequencing (LRS) promises great improvements in the detection of structural genome variants (SVs). However, existing methods are lacking in key areas such as the reliable detectio...
November 6, 2025 at 4:57 PM
deCYPher: Star Allele-Resolution Computational Framework of Pharmacogenes for Haplotype-Resolved Long-Read Assemblies
doi.org/10.1101/2025...

Genome sequence of the blue flowering Centaurea cyanus
doi.org/10.1101/2025...
deCYPher: Star Allele-Resolution Computational Framework of Pharmacogenes for Haplotype-Resolved Long-Read Assemblies
Although existing next-generation sequencing (NGS) tools, such as Aldy and Cyrius, have been applied for allele typing, they cannot achieve complete accuracy due to various genomic challenges includin...
November 4, 2025 at 5:53 PM
KCFtools: Rapid alignment-free method for introgression screening and GWAS using k-mer profiles
doi.org/10.1101/2025...

Mapping-by-sequencing reveals genomic regions associated with seed quality parameters in Brassica napus
doi.org/10.1101/2022...
KCFtools: Rapid alignment-free method for introgression screening and GWAS using k-mer profiles
Motivation: In the era of multiple genome references, researchers often align sequencing reads against distinct assemblies or even multiple references simultaneously. This enables applications such as...
November 4, 2025 at 5:50 PM
Clair-Mosaic: A deep-learning method for long-read mosaic small variant calling
doi.org/10.1101/2025...

Comparison of read mapping and variant calling tools for the analysis of plant NGS data
doi.org/10.1101/2020...
Clair-Mosaic: A deep-learning method for long-read mosaic small variant calling
Mosaic variants, defined as postzygotic mutations occurring during an organism's development from zygote to adult, play critical roles in developmental biology, aging, and diseases such as cancer and ...
November 4, 2025 at 5:48 PM
STARCall integrates image stitching, alignment, and read calling to enable scalable analysis of in situ sequencing data
doi.org/10.1101/2025...

Large scale genomic rearrangements in selected Arabidopsis thaliana T-DNA lines are caused by T-DNA insertion mutagenesis
doi.org/10.1101/2021...
STARCall integrates image stitching, alignment, and read calling to enable scalable analysis of in situ sequencing data
Fluorescent in situ sequencing involves imaging-based sequencing by synthesis in intact cells or tissues to reveal target nucleotide sequences inside each cell. Often, the target sequences are barcode...
November 2, 2025 at 7:01 AM
Reposted
Interested in long-read plant genomics? 🌱

We are sharing our ONT P2 Solo sequencing outputs per flowcell — explore our real data and experience here:

👉 www.izmb.uni-bonn.de/en/pbb/news#...

#Genomics #LongReads #PlantSci @puckerlab.bsky.social
November 1, 2025 at 10:47 AM
Reposted
@igemhq.bsky.social and SB8 was unforgettable.

Would love to supervise a team on flower design for the new art/design village 🎨🌹🧬
October 31, 2025 at 5:16 PM
Large-scale Phylogenomics Reveals Systematic Loss of Anthocyanin Biosynthesis Genes at the Family Level in Cucurbitaceae
doi.org/10.1101/2025...
Competing Interest Statement: The authors have declared no competing interest.
November 1, 2025 at 7:38 AM
ProteinSight: A Volumetric Deep Learning Model for Carotenoid-Binding Site Prediction
doi.org/10.1101/2025...
Carotenoproteins play essential roles across all domains of life, yet identifying them from sequence or structure remains a significant challenge due to the lack of conserved motifs. To address this g...
November 1, 2025 at 7:37 AM
Juggling offsets unlocks RNA-seq tools for fast and Scalable differential usage, Aberrant Splicing and Expression Retrieval
doi.org/10.1101/2023...

Animal, fungi, and plant genome sequences harbour different non-canonical splice sites
doi.org/10.1101/616565
Juggling offsets unlocks RNA-seq tools for fast and Scalable differential usage, Aberrant Splicing and Expression Retrieval.
RNA-seq data analysis relies on many different tools, each tailored to specific applications and coming with unique assumptions and limitations. Indeed, tools for differential transcript usage or rare...
November 1, 2025 at 7:36 AM
Reposted
🎃 Why are (Halloween) pumpkins orange – and not pink or blue? This secret is explored by @bpucker.bsky.social, @nancy-choudhary.bsky.social and Marie Hagedorn. Their research examines how carotenoids took over, making Jack O’Lanterns forever orange: www.uni-bonn.de/en/news/colo.... Happy Halloween!
October 31, 2025 at 1:07 PM
Hi-C informed kernel association test: integrating 3-dimensional genome structure into variant-set association for whole-genome sequencing data
doi.org/10.1101/2025...

Comparison of read mapping and variant calling tools for the analysis of plant NGS data
doi.org/10.1101/2020...
Hi-C informed kernel association test: integrating 3-dimensional genome structure into variant-set association for whole-genome sequencing data
Variant-set association analysis is a powerful strategy for genetic studies of whole-genome sequence (WGS) data, especially for rare variants. By aggregating variant signals, variant-set analysis can ...
October 31, 2025 at 5:57 AM
Genetic and developmental constraints drive parallelism in flower evolution
doi.org/10.1101/2025...

Genetic factors explaining anthocyanin pigmentation differences
doi.org/10.1101/2023...

#PlantSci #Evolution
Genetic and developmental constraints drive parallelism in flower evolution
Evolution repeatedly gives rise to similar phenotypes, reflecting shared constraints across independent lineages. In flowering plants, transitions to self-fertilization are typically accompanied by re...
October 30, 2025 at 6:45 AM
Improving long-read somatic structural variant calling with pangenome and de novo personal genome assembly
doi.org/10.1101/2025...

Comparison of read mapping and variant calling tools for the analysis of plant NGS data
doi.org/10.1101/2020...

#Genomics #Bioinformatics
Improving long-read somatic structural variant calling with pangenome and de novo personal genome assembly
Accurate detection of mosaic and somatic structural variants (SVs) provides early diagnostic and therapeutic evidence for cancers. While long-read whole-genome sequencing leads to more accurate SV det...
October 30, 2025 at 6:43 AM
From Likelihood to Fitness: Improving Variant Effect Prediction in Protein and Genome Language Models
doi.org/10.1101/2025...

NAVIP: Unraveling the Influence of Neighboring Small Sequence Variants on Functional Impact Prediction
doi.org/10.1101/596718

#Python #Bioinformatics
From Likelihood to Fitness: Improving Variant Effect Prediction in Protein and Genome Language Models
Generative models trained on natural sequences are increasingly used to predict the effects of genetic variation, enabling progress in therapeutic design, disease risk prediction, and synthetic biolog...
October 29, 2025 at 5:31 AM
GrAnnoT, a tool for efficient and reliable annotation transfer through pangenome graph
doi.org/10.1101/2025...

KIPEs3: Automatic annotation of biosynthesis pathways
doi.org/10.1101/2022...

#FunctionalGenomics #PlantSci #Pangenomics
GrAnnoT, a tool for efficient and reliable annotation transfer through pangenome graph
The increasing availability of genome sequences has highlighted the limitations of using a single reference genome to represent the diversity within a species. Pangenomes, encompassing the genomic inf...
October 29, 2025 at 5:29 AM
Predicting protein complexes in biosynthetic gene clusters
doi.org/10.1101/2025...

Phylogenomics and metabolic engineering reveal a conserved gene cluster in Solanaceae plants for withanolide biosynthesis
doi.org/10.1101/2024...

#Genomics #PlantSci #PlantMetabolism
Predicting protein complexes in biosynthetic gene clusters
Biosynthetic gene clusters (BGCs) are contiguous genomic regions that encode diverse, non-homologous proteins required for the production of specific natural products. Their genetic diversity underlie...
October 28, 2025 at 5:43 AM
Reposted
Genetically engineered color-changing Arabidopsis 🧬📷- attempt #3

I think I finally nailed it with this one.
October 27, 2025 at 11:53 AM
Reposted
1/2 Want to become up to date with pangenomes and genome graphs and their history? Check out this fantastic review by @zbao.bsky.social!

Complexity welcome: Pangenome graphs for comprehensive population genomics
#pangenomes #plantscience #genomegraphs
www.cambridge.org/core/journal...
October 27, 2025 at 5:53 PM
Reposted
I cannot tell you how many tech journalists at prominent media organizations do not understand this
Chatbots — LLMs — do not know facts and are not designed to be able to accurately answer factual questions. They are designed to find and mimic patterns of words, probabilistically. When they’re “right” it’s because correct things are often written down, so those patterns are frequent. That’s all.
October 27, 2025 at 3:32 PM