pdhsu.bsky.social
@pdhsu.bsky.social
This work was a wonderful collaboration with Silvana Konermann, led by star graduate student Nick Perry with key contributions from the amazing Liam Bartie, Dhruva Katrekar, Gabe Gonzalez, Matt Durrant, James Pai, Alison Fanton, Masa Hiraizumi, Chiara Ricci-Tam, and Hiroshi Nishimasu

Arc is on 🔥
May 15, 2025 at 1:47 PM
Bridge recombinases can modify the genome from single gene insertions to megabase-sized rearrangements

We're excited about programmable genome design at unprecedented length scales, especially when combined with high-complexity AI-generated DNA sequences (e.g. from Evo 2)
May 15, 2025 at 1:47 PM
Most people think of recombinases for payload insertion (e.g. of CARs or corrective genes)

We provide a therapeutic proof-of-concept with bridge-mediated excision of the BCL11A enhancer for sickle cell anemia and of expanded repeat sequences found in Friedreich's ataxia
May 15, 2025 at 1:47 PM
But unlike other tools, bridge editing is not limited to insertion! We use IS622 for programmable, precise, and scarless genome rearrangements, inverting up to 0.93 Mb and excising up to 0.13 Mb
May 15, 2025 at 1:47 PM
We then performed a systematic deep mutational scan of IS622 and combined a rationally engineered, high activity recombinase mutant with our enhanced bridge RNAs to demonstrate 20% insertion efficiency into the human genome
May 15, 2025 at 1:47 PM
Using these enhanced bridge RNAs, we discovered design principles for maximizing the specificity of insertion into the human genome, achieving as high as 82% specificity genome-wide
May 15, 2025 at 1:47 PM
In a tour de force of molecular engineering, our team conducted computational ortholog mining, human cell activity screening, and structure-guided bridge RNA engineering to enhance the activity of IS622, a bridge system that showed promising but low activity in human cells
May 15, 2025 at 1:47 PM
Genomes encode biological complexity, which is determined by combinations of DNA mutations across millions of bases

In new work @arcinstitute.org, we report the discovery and engineering of the first programmable DNA recombinases capable of megabase-scale human genome rearrangement
May 15, 2025 at 1:47 PM
This was an insane team effort between Arc and Nvidia that convened machine learning and computational biology researchers across Stanford, UC Berkeley, and UCSF. Especially grateful to Jensen Huang for his belief and support of this vision and labor of love, and the entire Evo 2 team below
February 19, 2025 at 4:37 PM
Finally, if Evo 2 sounded exciting, @arcinstitute.org is hiring. Check out open Arc jobs at arcinstitute.org/jobs or just email me directly. Our research group is hiring in molecular machine learning and at the interface of computational and synthetic biology
February 19, 2025 at 4:35 PM
DNA is just the beginning. In middle school, we learn that genotype and the environment collaborate to create phenotype. We are incorporating Evo 2's understanding of genetic variation into Arc's virtual cell models that can be used for drug discovery and target ID
February 19, 2025 at 4:35 PM
We're excited to see what the research community builds on top of this foundation model to enable the biological "app store"
February 19, 2025 at 4:35 PM
Evo 2 can also be used for biological design. We demonstrate generation of entire human mitochondrial genomes with coherent synteny and even whole bacterial genomes and eukaryotic chromosomes (see the preprint for more detail)
February 19, 2025 at 4:35 PM
A common critique of LLMs is that they're black boxes. To probe what Evo 2 is learning about biology (without any labels or annotations), we turned to mechanistic interpretability with Goodfire AI

Intriguingly, this AI brain has features that may correspond to regulatory elements
February 19, 2025 at 4:35 PM
Adding a simple supervised model trained on Evo 2 embeddings improves performance even further, reaching SOTA for coding mutations as well
February 19, 2025 at 4:35 PM
Without any variant-specific training, architectural optimization, or multiple sequence alignments, Evo 2 can predict the pathogenicity of breast cancer-associated mutations in genes like BRCA1

It's state of the art in doing this zero-shot for noncoding mutations
February 19, 2025 at 4:35 PM
Great, but what can it do? Evo 2 is a generalist model that can predict the pathogenic effects of human genome variants across coding and noncoding mutations

In other words, if you have a genetic mutation, Evo 2 has an opinion on whether or not it might cause disease
February 19, 2025 at 4:35 PM
To enable this, we report a new frontier deep learning architecture, StripedHyena 2, with improved loss scaling and up to 3× higher throughput at 40B scale compared to Transformer baselines or previous-generation hybrid models
February 19, 2025 at 4:35 PM
This enables it to reason about and understand biological interactions across diverse length scales, from individual molecules to entire bacterial genomes or eukaryotic chromosomes
February 19, 2025 at 4:35 PM
Evo 2 is trained on 9.3T tokens of DNA with single-base resolution at 1M token context length 🥳

We release two models with 7B and 40B parameters along with weights, training and inference code, and pretraining data—making this one of the largest fully open AI models available
February 19, 2025 at 4:35 PM
AI provides a universal framework that leverages data and compute at scale to uncover higher-order patterns

Today, @arcinstitute.org in collaboration with Nvidia releases Evo 2—a fully open source biological foundation model trained on genomes spanning the entire tree of life.
February 19, 2025 at 4:35 PM