Dave Burke
daveyburke.bsky.social
CTO at Arc Institute | Google Advisor (Android) 🇮🇪 + 🇺🇲
Zach believes we're actually making a company to sell this thing. He has a business card (he's the CEO and I'm his CTO obvs). Even made a badge. For now, we're open sourcing the base model :). Python code and build instructions here: github.com/daveyburke/Z.... Enjoy!
March 23, 2025 at 5:42 AM
In the future, we can use this mechanism to steer DNA generation: for example, to give a prokaryotic sequence more eukaryotic features, or to increase the presence of alpha helices. You can read more in the Evo 2 preprint here: arcinstitute.org/manuscripts/...
Manuscript | Arc Institute
Arc Institute is an independent nonprofit research organization headquartered in Palo Alto, California.
arcinstitute.org
February 19, 2025 at 4:07 PM
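The post does not say how steering would be implemented. One common way to steer a model with a learned feature is activation addition: push each position's hidden state along that feature's decoder direction. The sketch below assumes that approach; the function name `steer`, the toy sizes and the "eukaryotic feature" direction are hypothetical, not Evo 2's actual mechanism.

```python
import numpy as np


def steer(hidden: np.ndarray, decoder_dir: np.ndarray, alpha: float) -> np.ndarray:
    """Add a scaled, unit-normalized feature direction to every position's hidden state.

    hidden: (seq_len, d_model) activations from some layer (hypothetical).
    decoder_dir: (d_model,) direction associated with one learned feature (hypothetical).
    alpha: steering strength; larger values push harder toward the feature.
    """
    direction = decoder_dir / np.linalg.norm(decoder_dir)
    return hidden + alpha * direction  # broadcasts over the sequence axis


# Toy example: 4 positions with 3-dim hidden states (real models are far larger).
hidden = np.zeros((4, 3))
decoder_dir = np.array([0.0, 3.0, 4.0])  # hypothetical "eukaryotic feature" direction
steered = steer(hidden, decoder_dir, alpha=2.0)
print(steered[0])
```

In a real model the steered activations would be fed back into the remaining layers during generation, so the sampled DNA drifts toward sequences that express the chosen feature.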
It shows genomic concepts in a reference genome, such as coding sequences, alpha helices, tRNAs, etc. The tool overlays the corresponding features, which activate when Evo 2 detects such concepts. What’s amazing is that Evo learned all this from genomes in nature without any supervision!
February 19, 2025 at 4:07 PM
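To draw an overlay like this, per-nucleotide feature activations have to be turned into highlighted regions. The visualizer's actual method isn't described; the sketch below assumes a simple fixed threshold and returns the runs of positions where a feature is active. The function name `active_regions` and the example activations are hypothetical.

```python
from typing import List, Optional, Tuple


def active_regions(activations: List[float], threshold: float) -> List[Tuple[int, int]]:
    """Return half-open [start, end) intervals where a feature's activation exceeds threshold."""
    regions: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, a in enumerate(activations):
        if a > threshold and start is None:
            start = i  # a new active run begins here
        elif a <= threshold and start is not None:
            regions.append((start, i))  # the run ended at the previous position
            start = None
    if start is not None:
        regions.append((start, len(activations)))  # run extends to the end of the sequence
    return regions


acts = [0.0, 0.1, 0.9, 1.2, 0.2, 0.0, 0.7, 0.8]  # made-up activations for one feature
print(active_regions(acts, threshold=0.5))  # [(2, 4), (6, 8)]
```

Each interval can then be drawn as a track under the reference genome, next to its annotated coding sequences or tRNAs.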
Together with @GoodfireAI we built a visualizer that lets you explore the concepts learned by Evo 2. Try it here: arcinstitute.org/tools/evo/ev...
February 19, 2025 at 4:07 PM
We applied sparse autoencoders to Evo 2, our new DNA model, to show that it autonomously learns a broad range of biological features, including exon–intron boundaries, transcription factor binding sites, protein structural elements, and prophage genomic regions.
February 19, 2025 at 4:07 PM
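For readers new to sparse autoencoders, here is a minimal sketch of the standard recipe: a ReLU encoder maps a model's hidden states into a wider, mostly-zero feature vector, a linear decoder reconstructs the hidden states, and an L1 penalty encourages sparsity. The sizes, random weights and `l1_coeff` value are illustrative assumptions. The training loop is omitted, and the SAE used for Evo 2 may differ in its details.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_dict = 8, 32  # toy sizes; the feature dictionary is wider than the model

W_enc = rng.normal(scale=0.1, size=(d_model, d_dict))
b_enc = np.zeros(d_dict)
W_dec = rng.normal(scale=0.1, size=(d_dict, d_model))
b_dec = np.zeros(d_model)


def sae_forward(x: np.ndarray):
    """Encode hidden states into non-negative feature activations, then reconstruct them."""
    f = np.maximum(0.0, (x - b_dec) @ W_enc + b_enc)  # ReLU keeps features non-negative
    x_hat = f @ W_dec + b_dec
    return f, x_hat


def sae_loss(x: np.ndarray, l1_coeff: float = 1e-3) -> float:
    """Reconstruction error plus an L1 penalty that pushes most features toward zero."""
    f, x_hat = sae_forward(x)
    recon = np.mean(np.sum((x - x_hat) ** 2, axis=-1))
    sparsity = np.mean(np.sum(np.abs(f), axis=-1))
    return float(recon + l1_coeff * sparsity)


x = rng.normal(size=(4, d_model))  # stand-in for hidden states at 4 sequence positions
f, x_hat = sae_forward(x)
print(f.shape, x_hat.shape, sae_loss(x))
```

After training, individual columns of `f` tend to fire on interpretable patterns. Checking where they fire against genome annotations is how features like exon–intron boundaries can be identified.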
This is one of many applications of this work. Evolution has learned to read and write DNA over millions of years, and Evo 2 aims to learn from this knowledge. The AI model serves as a foundation for understanding the language of life across all domains, from bacteria to humans.
February 19, 2025 at 4:05 PM
This particular variant was initially reported as a variant of unknown significance (VUS). Years later, oncologists learned it was a driver of breast and ovarian cancers. In the Evo paper, we show state-of-the-art performance on classifying BRCA1 variants of unknown significance.
February 19, 2025 at 4:05 PM
If I take a known deleterious mutation, c.5095C>T, which changes just one nucleotide (position 5095 of the coding sequence, in exon 17) from C to T, the negative log likelihood increases from 0.96 to 0.99, indicating the model is less confident in the mutated sequence. Evo recognizes that this mutation causes a loss of function of the gene.
February 19, 2025 at 4:05 PM
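The comparison described above has two steps: apply the variant to the reference sequence, then compare the model's scores for the two sequences. The sketch below handles only simple substitutions in HGVS coding-DNA notation, where `c.` positions are 1-based and counted from the A of the ATG start codon. The helper names are hypothetical, the toy sequence is not BRCA1, and the scores themselves would come from the model.

```python
import re


def apply_cds_substitution(cds: str, hgvs: str) -> str:
    """Apply a coding-DNA substitution such as 'c.5095C>T' to a coding sequence."""
    m = re.fullmatch(r"c\.(\d+)([ACGT])>([ACGT])", hgvs)
    if m is None:
        raise ValueError(f"unsupported variant: {hgvs}")
    pos, ref, alt = int(m.group(1)), m.group(2), m.group(3)
    if not 1 <= pos <= len(cds):
        raise ValueError(f"position c.{pos} is outside the coding sequence")
    i = pos - 1  # convert the 1-based HGVS position to a 0-based index
    if cds[i] != ref:
        raise ValueError(f"reference mismatch at c.{pos}: expected {ref}, found {cds[i]}")
    return cds[:i] + alt + cds[i + 1:]


def delta_nll(ref_nll: float, alt_nll: float) -> float:
    """Positive values mean the model finds the variant sequence less likely than the reference."""
    return alt_nll - ref_nll


toy_cds = "ATGCGA"  # hypothetical toy coding sequence
print(apply_cds_substitution(toy_cds, "c.4C>T"))  # ATGTGA
print(round(delta_nll(0.96, 0.99), 2))  # 0.03, using the scores quoted in the post
```

Checking the reference base before substituting catches the common mistake of applying a variant to the wrong transcript or an offset sequence.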
Evo Designer can also score DNA sequences, i.e., estimate how likely a sequence is to occur in nature. Here’s an example using a section of the BRCA1 gene; certain mutations in this gene are known to increase the risk of breast and ovarian cancer.
February 19, 2025 at 4:05 PM
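One common way to score a sequence with an autoregressive DNA model is the average negative log-likelihood per nucleotide: lower means the model finds the sequence more natural. The sketch below assumes Evo Designer's score works this way, which the post does not spell out. The per-position distributions are made up rather than produced by Evo 2, and `mean_nll` is a hypothetical helper.

```python
import math
from typing import Dict, List


def mean_nll(seq: str, probs: List[Dict[str, float]]) -> float:
    """Average negative log-likelihood per nucleotide.

    probs[i] is the model's predicted distribution over A/C/G/T at position i,
    conditioned on seq[:i]. Lower values mean the sequence looks more 'natural'.
    """
    if len(seq) != len(probs):
        raise ValueError("need one distribution per position")
    total = 0.0
    for base, dist in zip(seq, probs):
        total += -math.log(dist[base])  # penalty for the probability given to the observed base
    return total / len(seq)


uniform = {b: 0.25 for b in "ACGT"}  # a model with no idea what comes next
confident = {"A": 0.97, "C": 0.01, "G": 0.01, "T": 0.01}  # a model sure the next base is A
print(mean_nll("ACGT", [uniform] * 4))  # ln 4, about 1.386
print(mean_nll("AAAA", [confident] * 4))  # much lower: the sequence matches the model's expectations
```

Averaging per nucleotide (instead of summing) keeps scores comparable across sequences of different lengths.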
Prompt with a sequence or species and the model will generate new DNA. Select sections of generated DNA to visualize the corresponding proteins, or use BLAST to find similar sequences in nature.
February 19, 2025 at 4:05 PM
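Visualizing the protein for a selected stretch of DNA starts with translating it codon by codon. The sketch below uses the standard genetic code and reads only the first forward frame. It illustrates that step and is not the tool's actual pipeline. The function name `translate` is a hypothetical helper, and BLAST search is not covered.

```python
BASES = "TCAG"
# Standard genetic code, listed in TCAG order for the 1st, 2nd and 3rd codon positions.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str, to_stop: bool = True) -> str:
    """Translate DNA in the first forward frame; '*' marks a stop codon."""
    dna = dna.upper()
    protein = []
    for start in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[start:start + 3], "X")  # X marks ambiguous codons, e.g. with N
        if aa == "*" and to_stop:
            break  # stop translating at the first stop codon
        protein.append(aa)
    return "".join(protein)


print(translate("ATGGCCAAATAAGGG"))  # MAK
print(translate("ATGGCCAAATAAGGG", to_stop=False))  # MAK*G
```

A fuller version would scan all six reading frames to find the longest open reading frame before handing the protein to a structure viewer.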
We built a new interactive user interface for generation and scoring called Evo Designer arcinstitute.org/tools/evo/ev...
Evo 2: DNA Foundation Model | Arc Institute
February 19, 2025 at 4:05 PM
Evo 2 uses a new hybrid architecture called StripedHyena 2, which enables a context window of 1M nucleotides at a model size of 40B parameters; the model was trained on 2,048 H100 GPUs. The preprint can be found at arcinstitute.org/manuscripts/... and includes links to the source code.
February 19, 2025 at 4:05 PM