Andre Kahles
@akkah21.bsky.social
Thanks Rob! Much appreciated.
October 9, 2025 at 3:06 PM
We invite you to try out MetaGraph at metagraph.ethz.ch, learn more about our framework in the paper (nature.com/articles/s41...), or start building indexes from your own data (github.com/ratschlab/me...).
MetaGraph - Biological Sequence Search
Petabase-Scale Search for DNA, RNA & Amino acids
metagraph.ethz.ch
October 8, 2025 at 8:56 PM
We would like to thank the bioinformatics community for years of support and openness. A special thanks to the Logan effort, whose contig set we use as input for one of our largest indexes.
October 8, 2025 at 8:56 PM
While MetaGraph provides a lossless representation of the input k-mer set, it is not a lossless compression of the raw reads. To reach petabase scale, we remove noisy k-mers prior to indexing — a step that we show has only minimal impact on search sensitivity.
October 8, 2025 at 8:56 PM
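To illustrate what removing noisy k-mers before indexing can look like, here is a minimal Python sketch. It assumes a simple abundance threshold (keep only k-mers seen at least `min_count` times), which is a common cleaning heuristic but not necessarily the procedure used for the MetaGraph indexes. The function names, the threshold, and the toy reads are all hypothetical.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every length-k substring across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def solid_kmers(reads, k, min_count=2):
    """Keep k-mers seen at least min_count times (assumed noise filter)."""
    return {kmer for kmer, n in count_kmers(reads, k).items() if n >= min_count}


# Toy reads: the third read carries a single-base difference (A -> T).
reads = ["ACGTAC", "ACGTAC", "ACGTTC"]
kept = solid_kmers(reads, k=4, min_count=2)
print(sorted(kept))  # ['ACGT', 'CGTA', 'GTAC']
```

The k-mers that occur only once (`CGTT`, `GTTC`) are dropped, so the filtered set is no longer a lossless record of the reads, while every retained k-mer is still represented exactly.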
We show that MetaGraph indexes are both scalable and cost-efficient for querying. Searching 1 Mbp of sequence against the entire SRA costs less than $1 on standard cloud infrastructure, making petabase-scale biological data truly searchable and accessible.
October 8, 2025 at 8:56 PM
Our indexes support fast exact matching as well as alignment with edits. Labels can represent sample metadata, coordinates or quantification values. We can store 10’000 human transcriptome samples in < 160 GB and return position-wise expression for any queried sequence.
October 8, 2025 at 8:56 PM
We have already processed more than 10 Petabases of raw sequence data from the SRA and make the compressed indexes publicly available for search (metagraph.ethz.ch), download and cloud-based access.
October 8, 2025 at 8:56 PM
At its core, MetaGraph represents all input sequences as labeled, succinct de Bruijn graphs — a highly compressed yet fully searchable structure. Each k-mer carries metadata labels that remain interactively queryable through a flexible API.
October 8, 2025 at 8:56 PM
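The idea of a labeled k-mer index can be sketched in a few lines of Python. This toy version uses a plain dictionary rather than a succinct de Bruijn graph, and it is not MetaGraph's API: the function names, sample labels, and sequences are hypothetical and only show how per-k-mer labels make a query attributable to samples.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every length-k substring of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(samples, k):
    """Map each k-mer to the set of sample labels that contain it."""
    index = defaultdict(set)
    for label, seq in samples.items():
        for kmer in kmers(seq, k):
            index[kmer].add(label)
    return index


def query(index, seq, k):
    """Return, per label, the fraction of query k-mers found under that label."""
    query_kmers = list(kmers(seq, k))
    hits = defaultdict(int)
    for kmer in query_kmers:
        # .get avoids inserting missing k-mers into the defaultdict
        for label in index.get(kmer, ()):
            hits[label] += 1
    return {label: count / len(query_kmers) for label, count in hits.items()}


# Hypothetical toy samples
samples = {"sample_A": "ACGTACGTGA", "sample_B": "TTGACGTACC"}
index = build_index(samples, k=4)
result = query(index, "ACGTACG", k=4)
print(result)  # {'sample_A': 1.0, 'sample_B': 0.75}
```

A real index at this scale cannot afford one hash entry per k-mer, which is why a compressed graph representation is used in place of the dictionary shown here.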
Modern biology produces vast amounts of raw sequencing data — genomes, transcriptomes, and protein sequences. MetaGraph provides a unified computational framework to index, query, and reason across this landscape of biological information.
October 8, 2025 at 8:56 PM
The following thread describes the main ideas and results of this joint work with @gxxxr.bsky.social @karasikov.bsky.social @adamant-pwn.bsky.social @HarunMustafa416
October 8, 2025 at 8:56 PM