Adam He
missingarib.bsky.social
Genomics, transcription regulation, and machine learning.
Reposted by Adam He
Our paper describing the Range Extender element which is required and sufficient for long-range enhancer activation at the Shh locus is now available at @nature.com. Congrats to @gracebower.bsky.social who led the study. Below is a brief summary of the main findings www.nature.com/articles/s41... 1/
Range extender mediates long-distance enhancer activity - Nature
The REX element is associated with long-range enhancer–promoter interactions.
www.nature.com
July 2, 2025 at 4:17 PM
Reposted by Adam He
A case study on the challenges of evaluating AI predictions in biology and the implications for published results.

1/2
rachel.fast.ai/posts/2025-0...
Rachel Thomas, PhD - Deep learning gets the glory, deep fact checking gets ignored
an AI researcher going back to school for immunology
rachel.fast.ai
June 5, 2025 at 12:41 AM
Reposted by Adam He
Frustratingly easy domain adaptation for cross-species transcription factor binding prediction [new]
Predicts TF binding in target species via aligned sequence data distribution moments for cross-species generalization.
May 26, 2025 at 10:54 PM
Reposted by Adam He
breaking news, white steam emerges from the Autoclave
May 8, 2025 at 4:23 PM
Reposted by Adam He
Out in Cell @cp-cell.bsky.social: Design principles of cell-state-specific enhancers in hematopoiesis
🧬🩸 screen of fully synthetic enhancers in blood progenitors
🤖 AI that creates new cell state specific enhancers
🔍 negative synergies between TFs lead to specificity!
www.cell.com/cell/fulltex...
🧵
Design principles of cell-state-specific enhancers in hematopoiesis
Screen of minimalistic enhancers in blood progenitor cells demonstrates widespread dual activator-repressor function of transcription factors (TFs) and enables the model-guided design of cell-state-sp...
www.cell.com
May 8, 2025 at 4:07 PM
Reposted by Adam He
Many of you enjoy our sequence-based model of single-cell RNA and ATAC data scooby... Don't miss Laura Martens' talk at the upcoming Kipoi seminar about it this Wed!
@lauradmartens.bsky.social @johahi.bsky.social @kipoizoo.bsky.social
Last preprint version:
www.biorxiv.org/content/10.1...
May 5, 2025 at 4:26 PM
Finally finished porting our CLIPNET models to PyTorch. I've released the code for loading the TF models into PT as part of our PersonalBPNet package, which also contains ...
GitHub - adamyhe/PersonalBPNet: A small modification to bpnetlite's BPNet to accommodate large validation datasets.
A small modification to bpnetlite's BPNet to accommodate large validation datasets. - adamyhe/PersonalBPNet
github.com
May 5, 2025 at 8:07 PM
Reposted by Adam He
New preprint from the @arnausebe.bsky.social lab! 💐

Here @crisnava.bsky.social, @seanamontgomery.bsky.social & collaborators develop a novel ChIPseq protocol, and demonstrate its huge potential to study the evolution of chromatin function and regulation across the eukaryotic tree of life.
March 19, 2025 at 10:31 AM
Reposted by Adam He
Our preprint on designing and editing cis-regulatory elements using Ledidi is out! Ledidi turns *any* ML model (or set of models) into a designer of edits to DNA sequences that induce desired characteristics.

Preprint: www.biorxiv.org/content/10.1...
GitHub: github.com/jmschrei/led...
Programmatic design and editing of cis-regulatory elements
The development of modern genome editing tools has enabled researchers to make such edits with high precision but has left unsolved the problem of designing these edits. As a solution, we propose Ledi...
www.biorxiv.org
April 24, 2025 at 12:59 PM
High-resolution reconstruction of cell-type specific transcriptional regulatory processes from bulk sequencing samples
Biological systems exhibit remarkable heterogeneity, characterized by intricate interplay among diverse cell types. Resolving the regulatory processes of specific cell types is crucial for delineating developmental mechanisms and disease etiologies. While single-cell sequencing methods such as scRNA-seq and scATAC-seq have revolutionized our understanding of individual cellular functions, adapting bulk genome-wide assays to achieve single-cell resolution of other genomic features remains a significant technical challenge. Here, we introduce Deep-learning-based DEconvolution of Tissue profiles with Accurate Interpretation of Locus-specific Signals (DeepDETAILS), a novel quasi-supervised framework to reconstruct cell-type-specific genomic signals with base-pair precision. DeepDETAILS’ core innovation lies in its ability to perform cross-modality deconvolution using scATAC-seq reference libraries for other bulk datasets, benefiting from the affordability and availability of scATAC-seq data. DeepDETAILS enables high-resolution mapping of genomic signals across diverse cell types, with great versatility for various omics datasets, including nascent transcript sequencing (such as PRO-cap and PRO-seq) and ChIP-seq for chromatin modifications. Our results demonstrate that DeepDETAILS significantly outperformed traditional statistical deconvolution methods. Using DeepDETAILS, we developed a comprehensive compendium of high-resolution nascent transcription and histone modification signals across 39 diverse human tissues and 86 distinct cell types. Furthermore, we applied our compendium to fine-map risk variants associated with Primary Sclerosing Cholangitis (PSC), a progressive cholestatic liver disorder, and revealed a potential etiology of the disease. Our tool and compendium provide invaluable insights into cellular complexity, opening new avenues for studying biological processes in various contexts. 
Competing Interest Statement: The authors have declared no competing interest.
www.biorxiv.org
April 10, 2025 at 5:07 PM
Reposted by Adam He
Our latest work indicates that termination of paused RNA polymerase is its most likely fate, while attempting to reconcile disparate estimates of relative rates and pause residency times from previous studies: www.biorxiv.org/content/10.1...
Genome-wide dynamic nascent transcript profiles reveal that most paused RNA polymerases terminate
We present a simple model for analyzing and interpreting data from kinetic experiments that measure engaged RNA polymerase occupancy. The framework represents the densities of nascent transcripts with...
www.biorxiv.org
April 8, 2025 at 8:48 PM
Does anyone know if the ATAC-seq bam files on ENCODE have had their tags shifted? and if so, by the more common +4/-5 or by +4/-4?
March 17, 2025 at 1:14 AM
Reposted by Adam He
Join us for our next Kipoi Seminar with Alexander Sasse
@lxsasse.bsky.social
@zmbh.uni-heidelberg.de

👉Advanced training strategies for genomic sequence-to-function models
📅 Wed March 5, 5:30pm CET
🧬 kipoi.org/seminar/
🦋 @kipoizoo.bsky.social
Kipoi
kipoi.org
March 1, 2025 at 7:26 PM
Reposted by Adam He
ralphi: a deep reinforcement learning framework for haplotype assembly [new]
Deep reinforcement learning accurately partitions reads into haplotype sets. It uses fragment graphs and the max-cut problem for the reward objective.
February 22, 2025 at 3:51 AM
Reposted by Adam He
A scalable approach to investigating sequence-to-expression prediction from personal genomes [new]
Models fail to gen. w/ individual var., personal genome training helps some individuals only.
February 22, 2025 at 6:34 AM
Reposted by Adam He
New preprint w/ @soumyakundu.bsky.social @sbmontgom.bsky.social @anshulkundaje.bsky.social !

Using deep learning & scATAC-seq, we studied context-specific variants in disease & evolution, and introduce FLARE for de novo mutations—w/ application to autism-affected families.

doi.org/10.1101/2025...
Mapping the regulatory effects of common and rare non-coding variants across cellular and developmental contexts in the brain and heart
Whole genome sequencing has identified over a billion non-coding variants in humans, while GWAS has revealed the non-coding genome as a significant contributor to disease. However, prioritizing causal...
www.biorxiv.org
February 19, 2025 at 1:32 PM
Reposted by Adam He
This is the preprint write up of my sabbatical work with Dave Kelley’s group at Calico. We tried out several transformer replacements for multi-task learning in functional genomics (i.e. what Borzoi does). Mamba, in particular, seems to outperform a mini version of Borzoi, especially when “striped”.
Selective State Space Models Outperform Transformers at Predicting RNA-Seq Read Coverage https://www.biorxiv.org/content/10.1101/2025.02.13.638190v1
February 18, 2025 at 4:17 AM
Reposted by Adam He
Refining sequence-to-expression modelling with chromatin accessibility [new]
Chromatin accessibility enhances sequence-to-expression models by focusing on open regions. Incorporating it improves predictions and reduces bias.
February 16, 2025 at 8:02 AM
Reposted by Adam He
Benchmarking DNA Sequence Models for Causal Regulatory Variant Prediction in Human Genetics [new]
TraitGym benchmarks reveal model-specific strengths in causal variant prediction for Mendelian/complex traits.
February 13, 2025 at 7:37 AM
Reposted by Adam He
[SAVE THE DATE] MLCB 2025 is happening Sept 10-11 at the NY Genome Center in NYC!

Attend the premier conference at the intersection of ML & Bio, share your research and make lasting connections!

Submission deadline: June 1
More details: mlcb.github.io

Help spread the word—please RT! #MLCB2025
February 5, 2025 at 2:50 AM
www.biorxiv.org/content/10.1...

Might explain some of the discrepancies between QTL effect & personalized gene expression prediction performance by S2F models
Haplotype rather than single causal variants effects contribute to regulatory gene expression associations in human myeloid cells
Genome-wide association studies typically identify hundreds to thousands of loci, many of which harbor multiple independent peaks, each parsimoniously assumed to be due to the activity of a single cau...
www.biorxiv.org
February 1, 2025 at 12:47 AM