Gherman Novakovsky
gnovakovsky.bsky.social
PhD, Illumina AI lab
We followed up by testing promoter variants in Mendelian genes using MPRA. Surprisingly, PromoterAI was more effective than MPRA at prioritizing variants linked to patient phenotypes, highlighting limitations of MPRA for rare disease interpretation. (13/)
May 29, 2025 at 11:57 PM
In the Genomics England rare disease cohort, functional promoter variants predicted by PromoterAI were enriched in phenotype-matched Mendelian genes. These variants accounted for an estimated 6% of the rare disease genetic burden. (11/)
May 29, 2025 at 11:57 PM
In the UK Biobank cohort, PromoterAI's predicted promoter variant effects correlated strongly with measured protein levels and quantitative traits, suggesting that promoter variants contribute meaningfully to phenotypic variation in the general population. (10/)
May 29, 2025 at 11:57 PM
PromoterAI's embeddings split promoters into three distinct classes: P1 (~9K genes, ubiquitously active), P2 (~3K genes, bivalent chromatin), E (~6K genes, enhancer-like). The E class, enriched for TATA boxes, may reflect enhancers co-opted as promoters. (9/)
May 29, 2025 at 11:57 PM
Fine-tuning improved PromoterAI's ability to predict the direction of motif effects, a known issue with multitask models. Before fine-tuning, the model often recognized motifs but got the direction wrong. After fine-tuning, its predictions aligned better with the data. (8/)
May 29, 2025 at 11:57 PM
We used our list of gene expression outliers to explore how variants affect transcription factor binding sites. Our results show that new variants more easily cause outlier gene expression by disrupting existing regulatory components than by creating new ones. (7/)
May 29, 2025 at 11:57 PM
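One way to picture the disrupt-versus-create distinction from (7/) is to scan the reference and variant sequences with a motif matrix and compare the hits. This is only a minimal sketch: the toy TATA matrix, the threshold, and the function names `best_match` and `classify_variant` are illustrative assumptions, not the analysis used in the thread.

```python
def best_match(seq, pwm):
    """Return the highest motif score over all windows of seq.

    pwm is a list of dicts (one per motif position) mapping A/C/G/T to a score.
    Assumes seq is uppercase ACGT and at least as long as the motif.
    """
    width = len(pwm)
    return max(
        sum(pwm[i][seq[start + i]] for i in range(width))
        for start in range(len(seq) - width + 1)
    )


def classify_variant(ref_seq, alt_seq, pwm, threshold):
    """Label a variant by whether a motif hit is lost, gained, or unchanged."""
    ref_hit = best_match(ref_seq, pwm) >= threshold
    alt_hit = best_match(alt_seq, pwm) >= threshold
    if ref_hit and not alt_hit:
        return "disrupts"
    if alt_hit and not ref_hit:
        return "creates"
    return "no change"


# Hypothetical toy matrix: +1 for the consensus base of "TATA", -1 otherwise.
toy_pwm = [{b: (1.0 if b == c else -1.0) for b in "ACGT"} for c in "TATA"]

print(classify_variant("GGTATAGG", "GGTCTAGG", toy_pwm, threshold=4.0))
print(classify_variant("GGTCTAGG", "GGTATAGG", toy_pwm, threshold=4.0))
```

The first call removes a perfect match and the second introduces one, so the same single-base change is labeled differently depending on which allele is the reference.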
We also tried fine-tuning Enformer and Borzoi on our promoter variant set. While their performance improved, both models still lagged behind PromoterAI. Notably, even before fine-tuning, PromoterAI outperformed Enformer and was similar to Borzoi. (6/)
May 29, 2025 at 11:57 PM
When it comes to predicting the expression effects of promoter variants, PromoterAI achieved the best performance across benchmarks spanning RNA, proteins, QTLs, and MPRA. (5/)
May 29, 2025 at 11:57 PM
The second step was to fine-tune the model on a carefully curated list of rare promoter variants linked to aberrant gene expression. Fine-tuning used a twin-network setup to ensure generalization across unseen genes and datasets. (4/)
May 29, 2025 at 11:57 PM
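The thread does not spell out the twin-network setup in (4/). A common reading, assumed here, is that the same model (shared weights) scores both the reference and the variant sequence and the variant effect is the difference between the two outputs. The sketch below uses a hypothetical stand-in model (`toy_model`) and hypothetical helper names, so it illustrates the idea and not PromoterAI's actual implementation.

```python
def apply_snv(ref_seq, pos, alt_base):
    """Return a copy of ref_seq with the base at 0-based position pos replaced."""
    return ref_seq[:pos] + alt_base + ref_seq[pos + 1:]


def twin_score(model, ref_seq, pos, alt_base):
    """Score a variant as model(alt) - model(ref), using one shared model.

    Because both branches use the same weights, anything the model predicts
    identically for both sequences cancels out in the difference.
    """
    alt_seq = apply_snv(ref_seq, pos, alt_base)
    return model(alt_seq) - model(ref_seq)


# Hypothetical stand-in for a trained network: a sum of per-base weights.
BASE_WEIGHTS = {"A": 1.0, "C": 0.0, "G": 2.0, "T": -1.0}


def toy_model(seq):
    return sum(BASE_WEIGHTS[base] for base in seq)


print(twin_score(toy_model, "ACGT", pos=1, alt_base="G"))
```

A variant that leaves the sequence unchanged scores exactly zero under this scheme, which is one reason a difference-based design is attractive for variant effect prediction.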
First, we pre-trained PromoterAI to predict histone marks, TF binding, DNA accessibility, and CAGE signal from genomic sequence. The key difference from models like Enformer and Borzoi is that we predict at single-base-pair resolution and use only TSS-centered regions. (3/)
May 29, 2025 at 11:57 PM
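To make "TSS-centered regions" from (3/) concrete, here is a minimal sketch of preparing model input: cut a window around the TSS and one-hot encode it so each base gets its own row. The flank size, the N-padding at sequence edges, and the function names are assumptions for illustration; the thread does not state the actual window length or preprocessing.

```python
def tss_window(chrom_seq, tss, flank):
    """Return the bases from tss - flank to tss + flank (0-based, inclusive).

    Positions that fall off either end of chrom_seq are padded with 'N',
    so the result always has length 2 * flank + 1.
    """
    start, end = tss - flank, tss + flank + 1
    left_pad = "N" * max(0, -start)
    right_pad = "N" * max(0, end - len(chrom_seq))
    core = chrom_seq[max(0, start):min(len(chrom_seq), end)]
    return left_pad + core + right_pad


def one_hot(seq):
    """Encode a DNA string as rows of [A, C, G, T]; unknown bases are all zeros."""
    index = {"A": 0, "C": 1, "G": 2, "T": 3}
    rows = []
    for base in seq.upper():
        row = [0, 0, 0, 0]
        if base in index:
            row[index[base]] = 1
        rows.append(row)
    return rows


window = tss_window("ACGTACGT", tss=1, flank=2)
print(window, one_hot(window))
```

With one input row per base, a model that keeps the sequence length unchanged through its layers can emit one prediction per base for each track, which is what single-base-pair resolution means in practice.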
PromoterAI is built from transformer-inspired blocks called metaformers, but instead of attention we use depthwise convolutions, making it a fully convolutional model. We believe CNN-based methods have not yet been surpassed and remain a great choice for genomics tasks. (2/)
May 29, 2025 at 11:57 PM
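For readers unfamiliar with the idea in (2/), a metaformer-style block keeps the transformer layout (a token mixer followed by a per-position channel MLP, each with a residual connection) and swaps attention for another mixer, here a depthwise convolution. The pure-Python sketch below is a simplified illustration with made-up weights; it omits normalization layers and is not PromoterAI's actual architecture or code.

```python
def depthwise_conv1d(x, kernels):
    """Convolve each channel with its own kernel (zero padding, same length).

    x: list of channels, each a list of floats of equal length.
    kernels: one odd-length kernel per channel. No mixing across channels.
    """
    out = []
    for channel, kernel in zip(x, kernels):
        half = len(kernel) // 2
        row = []
        for i in range(len(channel)):
            acc = 0.0
            for k, weight in enumerate(kernel):
                j = i + k - half
                if 0 <= j < len(channel):
                    acc += weight * channel[j]
            row.append(acc)
        out.append(row)
    return out


def pointwise_mlp(x, w1, w2):
    """Apply the same two-layer MLP (with ReLU) across channels at every position."""
    length = len(x[0])
    out = [[0.0] * length for _ in w2]
    for i in range(length):
        v = [channel[i] for channel in x]
        hidden = [max(0.0, sum(w * a for w, a in zip(row, v))) for row in w1]
        for c, row in enumerate(w2):
            out[c][i] = sum(w * h for w, h in zip(row, hidden))
    return out


def add(x, y):
    """Elementwise sum of two channel-by-length arrays (the residual connection)."""
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]


def metaformer_block(x, kernels, w1, w2):
    x = add(x, depthwise_conv1d(x, kernels))  # token mixing along the sequence
    x = add(x, pointwise_mlp(x, w1, w2))      # channel mixing at each position
    return x


x = [[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]]
kernels = [[0.0, 1.0, 0.0], [1.0, 1.0, 1.0]]
print(metaformer_block(x, kernels, w1=[[0.0, 0.0]], w2=[[0.0], [0.0]]))
```

Because every operation is a convolution or a per-position transform, the block works on any sequence length and preserves it, which fits the fully convolutional description in the post.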
Excited to share my first contribution here at Illumina! We developed PromoterAI, a deep neural network that accurately identifies non-coding promoter variants that disrupt gene expression.🧵 (1/)
May 29, 2025 at 11:57 PM