Dmitry Penzar
pensarata.bsky.social
PhD student, regulatory genomics, machine learning in biology, algorithms
The key ingredient of our solution was MPRA-LegNet, but we also incorporated a large number of new ideas to master the challenge.

It’s inspiring that the second-place team also used LegNet as the basis for their solution.

More details to come
December 8, 2025 at 3:58 AM
Our team achieved first place in the CAGI7 lentiMPRA challenge on predicting the effects of single-nucleotide mutations in regulatory elements, surpassing the nearest competitors by a significant margin.
December 8, 2025 at 3:58 AM
(13/13) In turn, the wider set of data for Final TFs remains suitable for offline benchmarking with the open-source bibis framework (github.com/autosome-ru/...). The whole story can be found on bioRxiv: doi.org/10.1101/2025....
GitHub - autosome-ru/ibis-challenge: Repository with source code and metadata for IBIS challenge
github.com
November 18, 2025 at 10:55 PM
(12/13) The online Leaderboard benchmarking platform, including the preprocessed data, benchmarking protocols, and rich documentation, remains fully functional and accessible (ibis.autosome.org) to facilitate the development of future TFBS models.
IBIS Challenge
ibis.autosome.org
November 18, 2025 at 10:55 PM
(11/13) However, those changes did not translate into better prediction of SNP effects. Additionally, pre-initialization of the first convolutional layers with the best available PWMs for the corresponding TFs didn't yield any notable performance gain.
November 18, 2025 at 10:55 PM
(10/13) We conducted ablation studies on LegNet. Minor modifications, such as replacing global average pooling with global max pooling in the SE block, led to substantial performance gains, making the resulting model the best in the post-challenge assessment.
November 18, 2025 at 10:55 PM
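The pooling swap mentioned in (10/13) can be illustrated with a generic squeeze-and-excitation (SE) block. This is a minimal pure-Python sketch, not LegNet's actual implementation: the function names (`squeeze`, `dense`, `se_block`), the two-layer gating with ReLU and sigmoid, and the toy weights are all assumptions made for illustration.

```python
import math

def squeeze(feature_map, mode="avg"):
    """Collapse each channel (a list of per-position activations) to one number."""
    if mode == "avg":
        return [sum(channel) / len(channel) for channel in feature_map]
    if mode == "max":
        return [max(channel) for channel in feature_map]
    raise ValueError(f"unknown pooling mode: {mode}")

def dense(vec, weights, bias):
    """Plain fully connected layer: one output per row of `weights`."""
    return [sum(w * x for w, x in zip(row, vec)) + b
            for row, b in zip(weights, bias)]

def se_block(feature_map, w1, b1, w2, b2, mode="avg"):
    """Generic SE block: squeeze, two dense layers, then per-channel rescaling."""
    summary = squeeze(feature_map, mode)
    hidden = [max(0.0, h) for h in dense(summary, w1, b1)]      # ReLU
    gates = [1.0 / (1.0 + math.exp(-g))                         # sigmoid
             for g in dense(hidden, w2, b2)]
    return [[g * x for x in channel]
            for g, channel in zip(gates, feature_map)]

# Hypothetical toy input: channel 0 has one sharp peak (a motif-like hit),
# channel 1 is flat. Both channels have the same mean.
TOY_MAP = [[0.0, 0.0, 4.0, 0.0],
           [1.0, 1.0, 1.0, 1.0]]
```

With average pooling both toy channels are summarized as 1.0, so the gate cannot tell a single strong hit from a uniformly weak signal; max pooling summarizes them as 4.0 and 1.0. Whether this difference explains the gains reported in the thread is not something the sketch can show.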
(9/13) Post-challenge analysis added extra DL models: top models from the DREAM challenge and popular architectures unused in IBIS, including Malinois and DNA language models. Fine-tuned DNA LMs performed far worse than fully supervised approaches.
November 18, 2025 at 10:55 PM
(8/13) TF-binding models can be used to predict the effect of single-nucleotide variants. In A2G, PWMs performed unexpectedly well, e.g. MEX secured 2nd place. In G2A, the original top triple-A models dominated, followed by MEX and RSAT — the strongest PWM-based approach.
November 18, 2025 at 10:55 PM
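As a rough illustration of how a PWM can score a single-nucleotide variant, as (8/13) describes, here is a minimal sketch. It is not the IBIS, MEX, or RSAT scoring protocol: the log-odds conversion with a uniform background and a pseudocount, the "best hit overlapping the variant" rule, the forward-strand-only scan, and the toy count matrix are all assumptions made for illustration.

```python
import math

BASES = "ACGT"

def pfm_to_pwm(pfm, pseudocount=1.0, background=0.25):
    """Convert per-position base counts (A, C, G, T) into log2-odds weights."""
    pwm = []
    for counts in pfm:
        total = sum(counts) + 4 * pseudocount
        pwm.append([math.log2((c + pseudocount) / total / background)
                    for c in counts])
    return pwm

def best_score(seq, pwm):
    """Highest PWM score over all forward-strand windows in `seq`."""
    width = len(pwm)
    return max(
        sum(pwm[i][BASES.index(seq[start + i])] for i in range(width))
        for start in range(len(seq) - width + 1)
    )

def snv_effect(seq, pos, alt, pwm):
    """Best-hit score of the alternative allele minus that of the reference.

    Only windows overlapping the variant are scanned; `seq` is assumed to be
    at least as long as the motif and to contain only A, C, G, T.
    """
    width = len(pwm)
    mutated = seq[:pos] + alt + seq[pos + 1:]
    lo = max(0, pos - width + 1)
    hi = min(len(seq), pos + width)
    return best_score(mutated[lo:hi], pwm) - best_score(seq[lo:hi], pwm)

# Hypothetical toy count matrix for the motif "ACG" (rows = positions).
TOY_PFM = [[10, 0, 0, 0],
           [0, 10, 0, 0],
           [0, 0, 10, 0]]
TOY_PWM = pfm_to_pwm(TOY_PFM)
```

For example, `snv_effect("TTACGTT", 3, "T", TOY_PWM)` is negative because the variant disrupts the motif, while `snv_effect("TTACTTT", 4, "G", TOY_PWM)` is positive because it creates one. A realistic pipeline would also scan the reverse strand.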
(7/13) Yet, several deep learning (DL) approaches failed substantially in cross-experiment validation – in some cases performing far worse than PWMs. Unlocking the full potential of DL clearly requires careful architectural and training design.
November 18, 2025 at 10:55 PM
(6/13) Performance of the solutions varied substantially across TFs and experimental platforms. The top-scoring ML models outperformed PWM-based IBIS solutions from the competition and our PWM baseline from Codebook MEX (x.com/VorontsovIE/...).
Ilya Vorontsov on X: "Our paper on LARGE-scale benchmarking of motif discovery tools is published! https://t.co/jIvipjvqxq It was a long, 7 years long journey, which coordinated efforts of 50+ researchers, proud to be one of them. More results from Codebook about poorly studied TFs are coming soon." / X
x.com
November 18, 2025 at 10:55 PM
(5/13) Once again, we congratulate the runner-up teams (Medici, Salimov & Frolov lab, callitmagic), and the winners (Bench Pressers, mj, and Biology Impostor) (x.com/halfacrocodi...)
November 18, 2025 at 10:55 PM
(4/13) Participants employed a wide range of methods from classic motif discovery with position-specific weight matrices (PWMs) to arbitrary advanced approaches (triple-As), including CNNs, RNNs, gradient boosting, and even more exotic approaches.
November 18, 2025 at 10:55 PM
(3/13) For the first time, the IBIS Challenge assessed in depth the transferability of DNA motif models from artificial to genomic sequences (A2G), and vice versa (G2A), with rigorous test-train splits, multiple performance metrics, and a transparent ranking system.
November 18, 2025 at 10:55 PM
(2/13) TFs orchestrate transcriptional programs by recognizing short DNA motifs. The long-standing goal is to develop reliable models of TFs' DNA binding specificities and avoid biases of particular experimental assays (x.com/halfacrocodi...).
Vanja (Ivan Kulakovskiy) on X: "Join the IBIS Challenge: an open competition focused on the computational prediction of transcription factor binding motifs. IBIS aims to advance state-of-the-art methods for Inferring Binding Specificities of human transcription factors from diverse experimental data. (1/12) https://t.co/5DUhweEOy9" / X
x.com
November 18, 2025 at 10:55 PM
(1/13) Excited to share the outcome of the IBIS Challenge! The IBIS Challenge united dozens of teams across the world in tackling the problem of modeling transcription factor (TF) binding specificity using a diverse collection of experimental datasets for understudied human TFs.
November 18, 2025 at 10:55 PM
Reposted by Dmitry Penzar
Excited / nervous to share the “magnum opus” of my postdoc in Andreas Wagner’s lab!

"De-novo promoters emerge more readily from random DNA than from genomic DNA"

This project is the culmination of 4 years of work, and lays the foundation for my future group. In short, we… (1/4)
De-novo promoters emerge more readily from random DNA than from genomic DNA
Promoters are DNA sequences that help to initiate transcription. Point mutations can create de-novo promoters, which can consequently transcribe inactive genes or create novel transcripts. We know lit...
www.biorxiv.org
August 28, 2025 at 6:37 AM
Reposted by Dmitry Penzar
Out in Cell @cp-cell.bsky.social: Design principles of cell-state-specific enhancers in hematopoiesis
🧬🩸 screen of fully synthetic enhancers in blood progenitors
🤖 AI that creates new cell state specific enhancers
🔍 negative synergies between TFs lead to specificity!
www.cell.com/cell/fulltex...
🧵
Design principles of cell-state-specific enhancers in hematopoiesis
Screen of minimalistic enhancers in blood progenitor cells demonstrates widespread dual activator-repressor function of transcription factors (TFs) and enables the model-guided design of cell-state-sp...
www.cell.com
May 8, 2025 at 4:07 PM
Reposted by Dmitry Penzar
Finally published! We developed an epigenomics to therapeutics screening approach that identifies naturally occurring elements that can titrate expression of transgenes at various levels including single elements stronger than the B-globin LCR. www.nature.com/articles/s41...
Large-scale discovery of potent, compact and erythroid specific enhancers for gene therapy vectors - Nature Communications
This study presents a large-scale enhancer screening approach to optimize gene therapy vectors. A compact, potent, erythroid-specific enhancer used in a therapeutic vector, improved viral titers, tran...
www.nature.com
May 9, 2025 at 2:15 PM
Reposted by Dmitry Penzar
Our preprint on designing and editing cis-regulatory elements using Ledidi is out! Ledidi turns *any* ML model (or set of models) into a designer of edits to DNA sequences that induce desired characteristics.

Preprint: www.biorxiv.org/content/10.1...
GitHub: github.com/jmschrei/led...
Programmatic design and editing of cis-regulatory elements
The development of modern genome editing tools has enabled researchers to make such edits with high precision but has left unsolved the problem of designing these edits. As a solution, we propose Ledi...
www.biorxiv.org
April 24, 2025 at 12:59 PM
Reposted by Dmitry Penzar
We share a lot of our ideas, code, datasets (that we spend years sanitizing) early. Often way before we release preprints. We do this so that others can use, build on, improve & even "beat" our approaches. But I want to say a few things about some simple expectations 1/
January 17, 2025 at 5:16 PM
Reposted by Dmitry Penzar
We wrote a review article on modelling and design of transcriptional enhancers using sequence-to-function models.

From conventional machine learning methods to CNNs and using models as oracles/generative AI for synthetic enhancer design!

@natrevbioeng.bsky.social

www.nature.com/articles/s44...
Modelling and design of transcriptional enhancers - Nature Reviews Bioengineering
Enhancers are genomic elements critical for regulating gene expression. In this Review, the authors discuss how sequence-to-function models can be used to unravel the rules underlying enhancer activit...
www.nature.com
February 28, 2025 at 2:45 PM
Reposted by Dmitry Penzar
Super excited to announce our latest work. On a personal note, it's not an exaggeration to say that blood, sweat, and tears got us to the finish line on this: working w/ an outstanding global team of scientists in Germany, Japan, Russia, and USA responding to >100 pages of complex reviewer comments.
Massively parallel characterization of transcriptional regulatory elements - Nature
Lentivirus-based reporter assays for 680,000 regulatory sequences from three cell lines coupled to machine-learning models lead to insights into the grammar of cis-regulatory elements.
www.nature.com
January 15, 2025 at 5:39 PM
Reposted by Dmitry Penzar
Finally out! We present EXTRA-seq, a new EXTended Reporter Assay to quantify endogenous enhancer-promoter communication at kb scale!
www.biorxiv.org/content/10.1...
A 🧵about what it can do:
#SynBio #DeepLearning #GeneRegulation
EXTRA-seq: a genome-integrated extended massively parallel reporter assay to quantify enhancer-promoter communication
Precise control of gene expression is essential for cellular function, but the mechanisms by which enhancers communicate with promoters to coordinate this process are not fully understood. While seque...
biorxiv.org
December 16, 2024 at 2:39 PM
Wonderful.
Just two weeks ago I was explaining to a junior colleague the problem of exaggerated claims in science. This paragraph is exactly what should be printed in place of a user agreement when anybody submits a paper.
@dereklowe.bsky.social homing in on the same bottom line message from @wpwalters.bsky.social @prof-ajay-jain.bsky.social

it's so true and hits so hard:
December 7, 2024 at 6:11 PM