Lightnews — Scholar-powered news

Selin Jessa

@selinjessa.bsky.social

And then we stratified off-target base edits in non-coding loci based on their predicted consequences on the epigenome. In a case study, an intergenic off-target edit overlaps multiple motifs - our models predict that it specifically disrupts an AP-1 site. Much more in the paper, check out Tong's 🧵!

November 7, 2025 at 6:38 PM

Selin Jessa

@selinjessa.bsky.social

We then used sequence-to-activity deep learning models to predict effects of non-coding edits on TF binding and chromatin accessibility. We first show that a ChromBPNet model can predict the same GATA site disruption mechanism exploited by the FDA-approved Casgevy medicine, specifically in T cells:

Erythroblast ChromBPNet model predicts impact of known therapeutic CRISPR target (exagamglogene autotemcel, Casgevy) on accessibility. a) Schematic of Casgevy mechanism. BCL11A represses fetal hemoglobin in adulthood. Disruption of a BCL11A enhancer reactivates fetal hemoglobin. b) Observed ATAC and predicted accessibility from ChromBPNet at the BCL11A intron, for T cells (this study) and erythroblasts (ref. 48). c) Predicted accessibility for the chr2:60495264 T>C edit and reference sequence in erythroblasts, along with DeepLIFT contribution scores. d) Predicted accessibility for the chr2:60495267 T>C edit and reference sequence in erythroblasts. The ABE8e edit window is derived from the highest-efficiency gRNA, sg1620.

November 7, 2025 at 6:38 PM

Selin Jessa

@selinjessa.bsky.social

And then we stratified off-target base edits in non-coding loci based on their predicted consequences on the epigenome. We show a case study of an intergenic off-target edit overlapping multiple motifs. Our models predict that it disrupts an AP-1 site. So much more in the paper, check out Tong's 🧵!

November 7, 2025 at 4:25 PM

Selin Jessa

@selinjessa.bsky.social

Last, we found that putative causal noncoding variants for various diseases were enriched in cREs in a cell type-specific manner, and we used our models to predict how single nucleotide changes disrupt/create motifs and thus alter accessibility, providing mechanistic interpretation of variant effect

Heatmap showing enrichment of disease associated putative causal variants in regulatory elements of fetal cell types

Predicted effects of CAD variant in muscle endothelial cells. The predicted accessibility for the effect and non-effect alleles show the variant is predicted to increase accessibility through ablation of a weak ZEB/SNAI motif and creation of a CEBP site

May 3, 2025 at 6:27 PM
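The variant-effect idea in the post above (score both alleles with the model, then compare) can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: `toy_counts` is a hypothetical stand-in for a trained model, the two motif strings are toy consensus sites, the sequence is synthetic (it is not the real CAD locus), and using a log2 fold change as the effect size is an assumption.

```python
import math

def toy_counts(seq):
    """Hypothetical stand-in for a model's predicted total counts (not ChromBPNet)."""
    log_counts = 3.0
    log_counts += 1.5 * seq.count("TTGCGCAA")  # toy activating site (C/EBP-like)
    log_counts -= 0.5 * seq.count("CAGGTG")    # toy negative site (ZEB/SNAI-like)
    return math.exp(log_counts)

def variant_effect(model, ref_seq, pos, alt):
    """log2 fold change in predicted counts, alternate allele vs reference."""
    if ref_seq[pos] == alt:
        raise ValueError("alt allele matches the reference base")
    alt_seq = ref_seq[:pos] + alt + ref_seq[pos + 1:]
    return math.log2(model(alt_seq) / model(ref_seq))

# Synthetic sequence: the reference carries the negative site; a single C>A
# change at index 13 removes it and completes the activating site.
ref_seq = "ATATAT" + "TTGCGCACAGGTG" + "ATATAT"
effect = variant_effect(toy_counts, ref_seq, pos=13, alt="A")
print(f"log2 fold change (alt vs ref): {effect:.2f}")
```

In this toy case the effect is positive (about 2.89) because one base change both removes the negative site and creates the activating one, which mirrors the kind of mechanism described in the alt text.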

Selin Jessa

@selinjessa.bsky.social

Deep learning models can learn which sequences are predictive of accessibility, but also which *negatively* influence accessibility. We discovered a handful of negative motifs (in peaks!) which were extremely abundant in every cell type, enriched near nucleosome dyads, & concentrated at peak flanks

MYOD1 locus. Predicted and observed accessibility, and contribution scores showing negatively-contributing region

c) De novo motifs (as CWMs) for each negative motif category, and most similar known PWM in external databases. d) Left and middle: Heatmaps indicating counts of motif instances in 10 bp bins from inferred nucleosome dyad positions (left) and peak summits (middle), Z-scored across distance bins per motif in each heatmap. Right: Proportion of genomic instances overlapping various genomic features.

May 3, 2025 at 6:27 PM
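The heatmap normalization described in the alt text above (counts of motif instances in 10 bp bins, Z-scored across distance bins per motif) could look roughly like this. The motif names and distances are made up for illustration, and the use of the population standard deviation is an assumption; the paper may normalize differently.

```python
from statistics import mean, pstdev

def bin_counts(distances, bin_width=10, max_dist=50):
    """Count instances per bin of signed distance (bp) to a reference point."""
    row = [0] * (2 * max_dist // bin_width)
    for d in distances:
        if -max_dist <= d < max_dist:
            row[(d + max_dist) // bin_width] += 1
    return row

def zscore_rows(counts):
    """Z-score each motif's counts across its own distance bins."""
    scaled = {}
    for motif, row in counts.items():
        mu, sd = mean(row), pstdev(row)
        scaled[motif] = [0.0 if sd == 0 else (x - mu) / sd for x in row]
    return scaled

# Hypothetical signed distances from motif instances to the nearest dyad.
distances = {
    "motif_x": [-3, 2, 4, -8, 1, 25, -40],
    "motif_y": [-45, -35, 30, 44, -22, 15],
}
counts = {motif: bin_counts(d) for motif, d in distances.items()}
z = zscore_rows(counts)
```

Scaling each row separately keeps a very abundant motif from dominating the color scale, so the heatmap shows where each motif is concentrated rather than how common it is.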

Selin Jessa

@selinjessa.bsky.social

We also found examples of "soft" syntax, where motifs synergize across longer distances (<150 bp), potentially reflecting active or passive competition of TFs with nucleosomes mediating cooperativity:

An animation showing the predicted accessibility as two motifs are inserted at closer distances to each other. They have a soft synergistic effect, increasing as motifs move closer together

May 3, 2025 at 6:27 PM

Selin Jessa

@selinjessa.bsky.social

Our models were able to predict synergistic effects at exactly the motif syntax described for the Coordinator motif by @seungsookim.bsky.social/Wysocka lab, where X-ray crystallography showed DNA facilitates weak contacts between TWIST1 & ALX4, and the TF complex directs mesenchymal gene programs

Comparison between predicted synergy for a BHLH and HD composite motif, and the Coordinator motif identified in Kim et al, Cell, 2024

May 3, 2025 at 6:27 PM

Selin Jessa

@selinjessa.bsky.social

We identified 100s of composite motifs, so we used our models to run in silico experiments to systematically define effects of motif syntax (spacing/orientation) on synergy. We found dozens of cases of "hard" syntax, where synergy relies on strict motif position, likely due to direct interactions:

Schematic showing in silico experiments. Motifs are inserted in 100 background sequences and pushed through ChromBPNet models. Predicted accessibility/counts are averaged over 100 sequences to get isolated joint effects

An animation showing the predicted accessibility as two motifs are inserted at closer distances to each other. They have a synergistic spike when they are 5 bp apart

May 3, 2025 at 6:27 PM
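The in silico experiment described above (insert motifs into 100 background sequences, predict, average) can be sketched as below. Everything here is a toy under stated assumptions: `toy_model` is a hypothetical scoring function rather than a ChromBPNet model, the two motif strings are illustrative, the backgrounds are synthetic A/C-only sequences so the toy motifs cannot appear by chance, and synergy is defined as the joint effect minus the sum of the individual effects, which may not be the exact definition used in the paper.

```python
import random

MOTIF_A = "CAGCTG"  # toy bHLH-like site, illustrative only
MOTIF_B = "TAATTA"  # toy homeodomain-like site, illustrative only

def toy_model(seq):
    """Hypothetical stand-in for a trained model; returns a log-count score."""
    score = 0.5 * seq.count("C") / len(seq)  # sequence-dependent baseline
    a, b = seq.find(MOTIF_A), seq.find(MOTIF_B)
    score += 1.0 if a >= 0 else 0.0
    score += 1.0 if b >= 0 else 0.0
    # Toy "hard" syntax: bonus only when B starts exactly 5 bp after A ends.
    if a >= 0 and b >= 0 and b - (a + len(MOTIF_A)) == 5:
        score += 2.0
    return score

def insert(seq, motif, pos):
    """Overwrite seq with motif at pos, keeping the length fixed."""
    return seq[:pos] + motif + seq[pos + len(motif):]

def marginal_effect(model, backgrounds, edits):
    """Mean change in prediction after applying (motif, pos) edits."""
    total = 0.0
    for bg in backgrounds:
        seq = bg
        for motif, pos in edits:
            seq = insert(seq, motif, pos)
        total += model(seq) - model(bg)
    return total / len(backgrounds)

def synergy(model, backgrounds, spacing, a_pos=80):
    """Joint effect minus the sum of the two single-motif effects."""
    b_pos = a_pos + len(MOTIF_A) + spacing
    joint = marginal_effect(model, backgrounds, [(MOTIF_A, a_pos), (MOTIF_B, b_pos)])
    only_a = marginal_effect(model, backgrounds, [(MOTIF_A, a_pos)])
    only_b = marginal_effect(model, backgrounds, [(MOTIF_B, b_pos)])
    return joint - only_a - only_b

rng = random.Random(0)
backgrounds = ["".join(rng.choice("AC") for _ in range(200)) for _ in range(100)]
profile = {d: round(synergy(toy_model, backgrounds, d), 6) for d in range(21)}
best_spacing = max(profile, key=profile.get)
```

Sweeping the spacing and plotting `profile` gives the kind of curve shown in the animation: flat everywhere except a spike at the one spacing the toy model rewards, which is what "hard" syntax would look like. Averaging over many backgrounds is what separates the motif effect from sequence context.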

Selin Jessa

@selinjessa.bsky.social

We used this map of motif instances in every cell type to identify ubiquitous and cell type-specific motifs, and found that a small set of ubiquitous, CG-rich motifs tended to occur in promoters, while cell type-specific motifs were predominant at distal and intronic regions:

Summary of motifs, one row per broad group of base motifs. From left to right: de novo motif representation as a contribution weight matrix (CWM), total number of genomic instances across cell types, proportion of instances overlapping various genomic features, distribution of median distance of instances (per cell type) to nearest transcription start sites (TSS), proportion of instances from cell types in each organ, and contribution to accessibility (positive or negative)

May 3, 2025 at 6:27 PM

Selin Jessa

@selinjessa.bsky.social

Clustering these motifs, we assembled a lexicon of 508 unique motifs which influence accessibility during development, and mapped these back to the peak regions to automatically annotate predictive motif instances in open chromatin in every cell type, representing putative TF binding sites

Left: summary of lexicon motifs. Right: genomic tracks at SRF locus, showing observed and predicted accessibility. A zoomed region shows contribution scores and predictive instances

May 3, 2025 at 6:27 PM

Selin Jessa

@selinjessa.bsky.social

We used model interpretation techniques (DeepLIFT/SHAP & TF-MoDISco) to score the contribution of every nucleotide to accessibility, and discover recurrent patterns of sequences predictive of local chromatin accessibility - and these patterns turned out to primarily resemble TF binding motifs!

Schematic showing model interpretation and then clustering of predictive regions to derive motifs

May 3, 2025 at 6:27 PM

Selin Jessa

@selinjessa.bsky.social

What are the DNA sequences that drive accessibility in each cell type? In every cell type, we trained ChromBPNet models - deep learning models tasked with predicting chromatin accessibility in 1 kbp regions at basepair resolution from 2 kbp local sequence alone. These models work remarkably well:

May 3, 2025 at 6:27 PM
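As a rough picture of the input/output setup in the post above, the sketch below one-hot encodes a DNA sequence and computes which window a prediction would cover. It uses the round numbers from the post (2 kbp in, 1 kbp out); the exact lengths in the real models may differ, and centring the output window inside the input is an assumption. The function names are hypothetical.

```python
def one_hot(seq):
    """Encode DNA as [A, C, G, T] indicator rows; any other symbol is all zeros."""
    index = {"A": 0, "C": 1, "G": 2, "T": 3}
    rows = []
    for base in seq.upper():
        row = [0, 0, 0, 0]
        if base in index:
            row[index[base]] = 1
        rows.append(row)
    return rows

def windows(center, input_len=2000, output_len=1000):
    """Half-open (start, end) of the input window and the centred output window."""
    in_start = center - input_len // 2
    out_start = center - output_len // 2
    return (in_start, in_start + input_len), (out_start, out_start + output_len)

encoded = one_hot("ACGTN")
input_window, output_window = windows(center=10_000)
```

The model sees more sequence than it predicts for, so bases near the edge of the output window still have flanking context on both sides.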

Selin Jessa

@selinjessa.bsky.social

Do these cREs drive activity in vivo? We inspected the VISTA enhancers (validated in reporter mice), and our data suggested previously unappreciated activity of some enhancers in the liver! Our accessibility data resolved specificity of one enhancer to erythroblasts, which we confirmed w/ histology:

Left: cryosections and histological analysis showing activity of enhancer in erythrocyte lineage. Right: accessibility tracks showing signal in erythroid lineage

May 3, 2025 at 6:27 PM

Selin Jessa

@selinjessa.bsky.social

We generated a single-cell multi-ome, multi-organ atlas of human development using SHARE-seq, profiling gene expression & accessibility in 818k cells from 12 organs, 10-23 PCW: the Human Development Multiomic Atlas (HDMA). We annotated 203 cell types & defined >1M candidate cis-regulatory elements:

A schematic of 12 organs and the number of samples in each organ that are contained in the Human Development Multiomic Atlas (HDMA). Text also reads n=76 samples, 203 cell types, 817k cells

Dendrogram showing relatedness of 203 cell types, along with marker gene expression

May 3, 2025 at 6:27 PM

Selin Jessa

@selinjessa.bsky.social

Last, we found that putative causal noncoding variants for various diseases were enriched in cREs in a cell type-specific manner, and we used our models to predict how single nucleotide changes disrupt/create motifs and thus alter accessibility, providing mechanistic interpretation of variant effect

Heatmap showing traits where putative causal variants are enriched in open chromatin regions in fetal cell types

rs12740374, a coronary artery disease variant, is a fetal-only hit overlapping a ZEB/SNAIL negative motif in muscle endothelial cells. Tracks show that the variant is predicted to increase accessibility through creation of a C/EBP site (the corresponding motif is shown as an inset at bottom right)

May 3, 2025 at 6:02 PM

Selin Jessa

@selinjessa.bsky.social

Deep learning models can learn which sequences are predictive of accessibility, but also which *negatively* influence accessibility. We discovered a handful of negative motifs (in peaks!) which were extremely abundant in every cell type, enriched near nucleosome dyads, & concentrated at peak flanks

Track showing predicted and observed accessibility at MYOD1 locus. In a zoomed region, contribution scores show an example of a negative motif

May 3, 2025 at 6:02 PM

Selin Jessa

@selinjessa.bsky.social

We also found examples of "soft" syntax, where motifs synergize across longer distances (<150 bp) with decaying effects as they move further apart, potentially reflecting active or passive competition of TFs with nucleosomes mediating cooperativity:

An animation showing the predicted accessibility as two motifs are inserted at closer distances to each other. They have a soft synergy, with increasing accessibility as they move closer together

May 3, 2025 at 6:02 PM

Selin Jessa

@selinjessa.bsky.social

For example, our models were able to predict synergistic effects at exactly the motif syntax described for the Coordinator motif by @seungsookim & Wysocka lab, where X-ray crystallography showed DNA facilitates weak contacts between TWIST1 & ALX4, and the TF complex directs mesenchymal gene programs

May 3, 2025 at 6:02 PM

Selin Jessa

@selinjessa.bsky.social

We identified 100s of composite motifs, so we used our models to run in silico experiments to systematically define effects of motif syntax (spacing/orientation) on synergy. We found 48 cases of "hard" syntax, where synergy relies on strict motif position, likely due to direct protein interactions:

Schematic indicating workflow for in silico marginalization to predict causal effects of sequence motifs on accessibility. Motifs are inserted into 100 inaccessible background regions, pushed through ChromBPNet models, and the predicted accessibility profiles/counts are averaged across sequences to isolate effects of motifs

May 3, 2025 at 6:02 PM

Selin Jessa

@selinjessa.bsky.social

We used this map of motif instances in every cell type to identify ubiquitous and cell type-specific motifs, and found that a small set of ubiquitous, CG-rich motifs tended to occur in promoters, while cell type-specific motifs were predominant at distal and intronic regions:

A data summary where each row corresponds to one motif. The first column has the motif logos, second column has a heatmap with total instances for each motif across cell types, third column is proportion of instances in different genomic regions, fourth column distance to TSS, fifth column distribution of instances among organs, and sixth column the motif class based on direction of contribution to accessibility

May 3, 2025 at 6:02 PM

Selin Jessa

@selinjessa.bsky.social

Clustering these motifs, we assembled a lexicon of 508 unique motifs which influence accessibility during development, and mapped these back to the peak regions to automatically annotate predictive motif instances in open chromatin in every cell type, representing putative TF binding sites:

Left: overview of different categories of 508 motifs learned de novo. Right: tracks showing observed and predicted ATAC at the SRF locus. A zoomed-in region shows model-derived contribution scores and annotated motif instances

May 3, 2025 at 6:02 PM

Selin Jessa

@selinjessa.bsky.social

We used model interpretation techniques (DeepLIFT/SHAP & TF-MoDISco) to score the contribution of every nucleotide to accessibility, and discover recurrent patterns of sequences predictive of local chromatin accessibility - and these patterns turned out to primarily resemble TF binding motifs!

Rest of the schematic showing how model interpretation produces base-resolution contribution scores. High contribution score regions are then clustered into motifs using TF-MoDISco, and used to annotate predictive motif instances in peaks using Fi-NeMo

May 3, 2025 at 6:02 PM

Selin Jessa

@selinjessa.bsky.social

What are the DNA sequences that drive accessibility in each cell type? In every cell type, we first trained ChromBPNet models - deep learning models tasked with predicting chromatin accessibility in 1 kbp regions at basepair resolution from 2 kbp local sequence alone, resulting in a suite of models:

May 3, 2025 at 6:02 PM

Selin Jessa

@selinjessa.bsky.social

Do these cREs drive activity in vivo? We inspected the VISTA enhancers (validated in reporter mice), and our data suggested previously unappreciated activity of some enhancers in the liver! Our accessibility data resolved specificity of one enhancer to erythroblasts, which we confirmed w/ histology:

Left: bright field and H&E staining images of VISTA embryo mm101 sections. Blue color is from X-Gal staining indicating where the enhancer is active. Right: Accessibility at VISTA enhancer mm101 indicating specificity to erythrocyte lineage

May 3, 2025 at 6:02 PM
