Andre Kahles

@akkah21.bsky.social

While MetaGraph provides a lossless representation of the input k-mer set, it is not a lossless compression of the raw reads. To reach petabase scale, we remove noisy k-mers before indexing, a step we show has minimal impact on search sensitivity.

October 8, 2025 at 8:56 PM
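The post does not say how noisy k-mers are identified. A common approach, assumed here and not taken from MetaGraph itself, is to count k-mers across the reads and discard those below an abundance threshold, because k-mers created by sequencing errors tend to be rare. The helper names and the `min_count` value below are hypothetical, and reverse complements are ignored for simplicity.

```python
from collections import Counter
from typing import Iterable, Set

def count_kmers(reads: Iterable[str], k: int) -> Counter:
    """Count every k-mer occurrence across all reads (reverse complements ignored)."""
    counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def filter_noisy_kmers(counts: Counter, min_count: int) -> Set[str]:
    """Keep only k-mers seen at least min_count times (the threshold is an assumption)."""
    return {kmer for kmer, c in counts.items() if c >= min_count}

reads = ["ACGTACGT", "ACGTACGA", "ACGTTCGT"]
counts = count_kmers(reads, k=4)
solid = filter_noisy_kmers(counts, min_count=2)
print(sorted(solid))  # ['ACGT', 'CGTA', 'GTAC', 'TACG']
```

The k-mers shared across reads survive, while the singletons introduced by the differing bases are dropped. The surviving set can be stored exactly, which matches the sense in which the index is lossless for its input k-mers but cannot reproduce the original reads.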

We show that MetaGraph indexes are both scalable and cost-efficient to query. Searching 1 Mbp of sequence against the entire SRA costs less than $1 on standard cloud infrastructure, making petabase-scale biological data truly searchable and accessible.

October 8, 2025 at 8:56 PM

Our indexes support fast exact matching as well as alignment with edits. Labels can represent sample metadata, coordinates, or quantification values. For example, we can store 10,000 human transcriptome samples in less than 160 GB and return position-wise expression values for any queried sequence.

October 8, 2025 at 8:56 PM
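To illustrate what position-wise quantification over labeled k-mers could look like, here is a toy sketch that assumes each k-mer maps to per-sample abundance counts. The dictionary layout and the functions `build_index`, `positional_counts`, and `exact_match` are hypothetical and are not MetaGraph's API; alignment with edits is not covered.

```python
from typing import Dict, List

# Hypothetical annotation layout: k-mer -> {sample label: abundance}
Index = Dict[str, Dict[str, int]]

def build_index(samples: Dict[str, List[str]], k: int) -> Index:
    """Count how often each k-mer occurs in each labeled sample."""
    index: Index = {}
    for label, reads in samples.items():
        for read in reads:
            for i in range(len(read) - k + 1):
                per_sample = index.setdefault(read[i:i + k], {})
                per_sample[label] = per_sample.get(label, 0) + 1
    return index

def positional_counts(index: Index, query: str, k: int, label: str) -> List[int]:
    """Abundance in one sample of the k-mer starting at each query position (0 if absent)."""
    return [index.get(query[i:i + k], {}).get(label, 0)
            for i in range(len(query) - k + 1)]

def exact_match(index: Index, query: str, k: int, label: str) -> bool:
    """True if the query has at least one k-mer and every k-mer occurs in the sample."""
    counts = positional_counts(index, query, k, label)
    return len(counts) > 0 and all(c > 0 for c in counts)

samples = {"s1": ["ACGTAC", "ACGTAC"], "s2": ["CGTACG"]}
index = build_index(samples, k=3)
print(positional_counts(index, "ACGTAA", 3, "s1"))  # [2, 2, 2, 0]
```

Each entry of the returned list corresponds to one query position, which is the kind of per-position signal the post describes; a production index would keep these counts in compressed form rather than in nested dictionaries.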

We have already processed more than 10 petabases of raw sequence data from the SRA, and the compressed indexes are publicly available for search (metagraph.ethz.ch), download, and cloud-based access.

October 8, 2025 at 8:56 PM

At its core, MetaGraph represents all input sequences as labeled, succinct de Bruijn graphs: a highly compressed yet fully searchable structure. Each k-mer carries metadata labels that remain interactively queryable through a flexible API.

October 8, 2025 at 8:56 PM
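The data model of a labeled de Bruijn graph can be sketched with an ordinary hash map: nodes are k-mers, an edge joins two k-mers that overlap by k-1 characters, and each k-mer stores a set of labels. The `LabeledDBG` class below is hypothetical and deliberately uses no succinct encoding, so it shows the structure being represented rather than MetaGraph's compressed implementation.

```python
from typing import Dict, List, Set

ALPHABET = "ACGT"

class LabeledDBG:
    """Toy node-centric de Bruijn graph: nodes are k-mers, edges are implicit
    (k-1)-overlaps, and labels live in a plain dict (NOT a succinct encoding)."""

    def __init__(self, k: int):
        self.k = k
        self.labels: Dict[str, Set[str]] = {}

    def add_sequence(self, seq: str, label: str) -> None:
        # Insert every k-mer of seq and attach the label to it.
        for i in range(len(seq) - self.k + 1):
            self.labels.setdefault(seq[i:i + self.k], set()).add(label)

    def successors(self, kmer: str) -> List[str]:
        # An edge u -> v exists when v == u[1:] + c and v is in the graph.
        return [kmer[1:] + c for c in ALPHABET if kmer[1:] + c in self.labels]

    def query_labels(self, kmer: str) -> Set[str]:
        # Return a copy so callers cannot mutate the stored label set.
        return set(self.labels.get(kmer, ()))

g = LabeledDBG(3)
g.add_sequence("ACGTA", "sampleA")
g.add_sequence("CGTTA", "sampleB")
print(g.successors("CGT"))  # ['GTA', 'GTT']
```

Here the shared k-mer `CGT` carries both labels and branches into one path per sample. Storing edges implicitly through k-mer overlaps keeps the graph itself equivalent to its k-mer set, and the labels form a separate annotation layer on top of it.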
