Yunha Hwang
microyunha.bsky.social
Building genomic intelligence @ Tatta Bio
Pinned
At Tatta Bio, we have been thinking deeply about the sequence-to-function problem. We believe that before AI can power functional prediction, we first need to rethink how we curate, manage, and share sequence data. Here, we share our initial ideas on what we are building next:
Today's sequence data infrastructure is set up for failure in the age of AI.
Building an open and collaborative sequence platform for both Human and AI scientists.
tattabio.substack.com
Reposted by Yunha Hwang
This. Is. So. Cool. 🤯
We're thrilled to announce SeqHub, an AI-enabled platform for biological sequence analysis. SeqHub brings together sequence search, genome annotation, and data sharing in one place.
November 5, 2025 at 11:51 PM
Reposted by Yunha Hwang
Released today from Tatta Bio: SeqHub! A place to explore, annotate, and share sequence data with functional insights. 

Over 1,000 scientists worldwide have already used SeqHub to annotate more than 550,000 proteins, uncovering new insights and accelerating discovery.
October 28, 2025 at 3:03 PM
We're thrilled to announce SeqHub, an AI-enabled platform for biological sequence analysis. SeqHub brings together sequence search, genome annotation, and data sharing in one place.
October 28, 2025 at 1:47 PM
Reposted by Yunha Hwang
Ready to explore New Lineages of Life with @jgi.doe.gov ? 🧬🦠

Registration for our 2025 NeLLi Symposium is now open. For the first time in collaboration with @unlv.edu

Mark the date: November 6-7 in Las Vegas, NV
You can now register for the November NeLLi Symposium!

Join us in November for talks focused on the most recent expansions of the Tree of Life, and the latest discoveries toward the evolution of cellular complexity and microbial symbiosis. 

Learn more: jointgeno.me/NeLLi

@nigelmouncey.bsky.social
2025 NeLLi Symposium | Joint Genome Institute
Las Vegas, NV Immediately followed by 1.5-day jamboree on November 6-7
jointgeno.me
August 25, 2025 at 9:39 PM
At Tatta Bio, we have been thinking deeply about the sequence-to-function problem. We believe that before AI can power functional prediction, we first need to rethink how we curate, manage, and share sequence data. Here, we share our initial ideas on what we are building next:
Today's sequence data infrastructure is set up for failure in the age of AI.
Building an open and collaborative sequence platform for both Human and AI scientists.
tattabio.substack.com
June 2, 2025 at 4:23 PM
Reposted by Yunha Hwang
I am very happy (and anxious) to share with you our most recent work in which we evaluated four of the most popular long-read assemblers,

www.biorxiv.org/content/10.1...

and tell you just a little bit about it in the following 🧵
Assemblies of long-read metagenomes suffer from diverse errors
Genomes from metagenomes have revolutionised our understanding of microbial diversity, ecology, and evolution, propelling advances in basic science, biomedicine, and biotechnology. Assembly algorithms...
www.biorxiv.org
April 28, 2025 at 8:08 AM
It’s official! 🎉 I’m thrilled to announce that I will be joining MIT as an assistant professor in a shared appointment between Biology, EECS, and the Schwarzman College of Computing this fall.
April 28, 2025 at 1:47 PM
Tatta Bio is growing! We are hiring *two positions* in Business Development and Software Engineering to lead the development of AI-enabled scientific software for open science and biological sequence interpretation. Please check out the job postings at www.tatta.bio/careers and share widely!
Job Board | Notion
Overview
www.tatta.bio
March 24, 2025 at 4:29 PM
Can LLM agents discover novel protein functions? Introducing Gaia Agent 🌎 🤖: an AI biologist capable of reasoning across genomic contexts to predict functions of proteins! Gaia Agent is now integrated with Gaia Search at gaia.tatta.bio
December 17, 2024 at 1:38 PM
If you are at #NeurIPS2024 don't miss @ancornman1.bsky.social's talk on OMG/gLM2 at 9AM! @workshopmlsb.bsky.social East meeting room 11,12
December 15, 2024 at 4:21 PM
Excited to be at #NeurIPS this week. @ancornman1.bsky.social will give a spotlight talk at the @workshopmlsb.bsky.social on gLM2/OMG! Please reach out if you want to chat about gLM2/OMG/Gaia and our latest projects😇

www.biorxiv.org/content/10.1...
The OMG dataset: An Open MetaGenomic corpus for mixed-modality genomic language modeling
Biological language model performance depends heavily on pretraining data quality, diversity, and size. While metagenomic datasets feature enormous biological diversity, their utilization as pretraini...
www.biorxiv.org
December 10, 2024 at 4:01 PM
Reposted by Yunha Hwang
Are you working on natural products? We’ve just released version 4.0 of the MIBiG data standard and repository! It now includes 3059 biosynthetic gene clusters, thanks to the combined efforts of 288 expert contributors. A thread: (1/8) academic.oup.com/nar/advance-...
MIBiG 4.0: advancing biosynthetic gene cluster curation through global collaboration
Abstract. Specialized or secondary metabolites are small molecules of biological origin, often showing potent biological activities with applications in ag
academic.oup.com
December 10, 2024 at 8:05 AM
Reposted by Yunha Hwang
1/🧬 Excited to share PLAID, our new approach for co-generating sequence and all-atom protein structures by sampling from the latent space of ESMFold. This requires only sequences during training, which unlocks more data and annotations:

bit.ly/plaid-proteins
🧵
December 6, 2024 at 5:44 PM
Reposted by Yunha Hwang
Our Big Fantastic Virus Database (BFVD) is now published in NAR! It contains protein structure predictions of major viral clades, enhanced by petabase-scale homology search, and it's explorable on the web.
🌐 bfvd.foldseek.com
💾 bfvd.steineggerlab.workers.dev
📄 academic.oup.com/nar/advance-...
November 23, 2024 at 9:12 PM
Hello 🦋 #protein / #microbio / #BioML community! We are excited to release Gaia🌎, a context-aware protein search tool, extending protein search and discovery capabilities beyond sequence and structure to include *genomic context*. Search your favorite protein sequences on gaia.tatta.bio
November 19, 2024 at 3:07 PM