Lightnews — Scholar-powered news

Pierre Peterlongo

@pierrepeterlongo.bsky.social

Inria senior researcher.
Head of the GenScale team (https://team.inria.fr/genscale/) at Inria and Irisa.
Algorithmics for sequencing data analyses, genomics and metagenomics.


Pierre Peterlongo

@pierrepeterlongo.bsky.social

🤝 Amazing collaboration with @jermp.bsky.social, @yhhshb.bsky.social, @robp.bsky.social, Victor Levallois, and Bertrand Le Gal, with the help of @yoann.bsky.social. 8/8

May 27, 2025 at 12:06 PM

Pierre Peterlongo

@pierrepeterlongo.bsky.social

🌊 On metagenomic data, other tools such as kmindex are good alternatives. At the same time, Kaminari consistently ranks among the fastest tools across all data types, while generating the smallest indexes (or the lowest FPR). 7/8

May 27, 2025 at 12:06 PM

Pierre Peterlongo

@pierrepeterlongo.bsky.social

💾 For a fixed false positive rate, it uses up to 37x less space than COBS while being an order of magnitude faster to build and query. 6/8

May 27, 2025 at 12:06 PM

Pierre Peterlongo

@pierrepeterlongo.bsky.social

📊 Experimental results show Kaminari's superiority in index size and query performance across various genomic datasets. 5/8

May 27, 2025 at 12:06 PM

Pierre Peterlongo

@pierrepeterlongo.bsky.social

🧬 Kaminari's design leverages properties of k-mer minimizers for compact space and fast query time, inspired by techniques proposed in Fulgor. 4/8

May 27, 2025 at 12:06 PM
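The thread does not detail how Kaminari uses minimizers, so here is only a generic, hedged illustration of the concept: for each k-mer, take its minimizer (the m-mer with the smallest hash value), then group consecutive k-mers that share a minimizer into super-k-mers. Because neighbouring k-mers often share a minimizer, they can be stored or looked up together, which is one common reason minimizer-based indexes can be compact. The hash (a splitmix64 finalizer on 2-bit encoded m-mers), the naive per-k-mer scan, and all function names are assumptions for illustration, not Kaminari's code.

```python
# Hypothetical sketch of minimizers and super-k-mers (not Kaminari's implementation).
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
MASK64 = (1 << 64) - 1


def encode(s: str) -> int:
    """2-bit encode an uppercase ACGT string into an integer."""
    v = 0
    for ch in s:
        v = (v << 2) | CODE[ch]
    return v


def mix64(x: int) -> int:
    """splitmix64 finalizer, used here as an assumed pseudo-random order on m-mers."""
    x = (x + 0x9E3779B97F4A7C15) & MASK64
    x = ((x ^ (x >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    x = ((x ^ (x >> 27)) * 0x94D049BB133111EB) & MASK64
    return x ^ (x >> 31)


def minimizer(kmer: str, m: int) -> str:
    """Return the m-mer of `kmer` with the smallest hash (leftmost one on ties)."""
    best_hash, best = None, None
    for i in range(len(kmer) - m + 1):
        mmer = kmer[i:i + m]
        h = mix64(encode(mmer))
        if best_hash is None or h < best_hash:
            best_hash, best = h, mmer
    return best


def super_kmers(seq: str, k: int, m: int):
    """Group consecutive k-mers of `seq` that share the same minimizer.

    Returns (minimizer, start, end) triples; seq[start:end] is one super-k-mer.
    Naive O(len(seq) * k) scan for clarity; real tools use sliding-window tricks.
    """
    groups = []
    for i in range(len(seq) - k + 1):
        mini = minimizer(seq[i:i + k], m)
        if groups and groups[-1][0] == mini:
            groups[-1][2] = i + k  # same minimizer: extend the current super-k-mer by one base
        else:
            groups.append([mini, i, i + k])  # new minimizer: start a new super-k-mer
    return [tuple(g) for g in groups]
```

For example, `super_kmers("ACGTTGCATGCATGCAAGTCCGATAGGCT", k=11, m=5)` splits the 19 k-mers of that toy sequence into a few overlapping super-k-mers, each tagged with the minimizer its k-mers share.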

Pierre Peterlongo

@pierrepeterlongo.bsky.social

💻 We implemented Kaminari in C++17, available under the MIT license at github.com/yhhshb/kaminari. Additional results and reproducibility info at github.com/vicLeva/benchmarks_kaminari. 3/8

GitHub - yhhshb/kaminari: 雷 - kaminari (thunder/lightning)

May 27, 2025 at 12:06 PM

Pierre Peterlongo

@pierrepeterlongo.bsky.social

🔍 Key findings include:
- Use of minimizers and integer compression for indexing.
- Lower memory footprint and faster query times.
- Minimal impact of false positives on result ranking, using the Rank-Biased Overlap (RBO) metric.
2/8

May 27, 2025 at 12:06 PM
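Rank-Biased Overlap compares two rankings while giving more weight to agreement near the top. As a hedged sketch of the metric itself, the function below computes the standard extrapolated RBO for two rankings without duplicates, truncated to the same depth. How Kaminari's evaluation builds the rankings (for instance, datasets ordered by number of shared k-mers, from an exact index versus an approximate one) is an assumption here, as are the function name and the default `p = 0.9`.

```python
def rbo_ext(s, t, p=0.9):
    """Extrapolated Rank-Biased Overlap of two rankings without duplicates.

    Both lists are truncated to the same depth k. p in (0, 1) controls how
    strongly the top of the rankings is weighted (smaller p = more top-heavy).
    Returns a value in [0, 1]; identical rankings give 1.
    """
    k = min(len(s), len(t))
    if k == 0:
        raise ValueError("rankings must be non-empty")
    seen_s, seen_t = set(), set()
    overlap = 0     # |S[:d] ∩ T[:d]|, updated incrementally at each depth d
    weighted = 0.0  # sum over d = 1..k of (overlap_d / d) * p**d
    for d in range(1, k + 1):
        a, b = s[d - 1], t[d - 1]
        if a == b:
            overlap += 1
        else:
            # a joins the intersection if T already contained it, and vice versa
            overlap += (a in seen_t) + (b in seen_s)
        seen_s.add(a)
        seen_t.add(b)
        weighted += (overlap / d) * p ** d
    return (overlap / k) * p ** k + (1 - p) / p * weighted
```

Swapping the first two items lowers the score more than swapping the last two, which is why a few false positives that only shuffle the bottom of a result list have little effect on RBO.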

Pierre Peterlongo

@pierrepeterlongo.bsky.social

Thanks guys for your valuable feedback. I modified the code accordingly.

March 25, 2025 at 2:15 PM

Pierre Peterlongo

@pierrepeterlongo.bsky.social

Hi @imartayan.bsky.social, I wanted to run distinct-kmers, but I ran into a limitation: my input data contains non-ACGTacgt characters. So I created this github.com/pierrepeterl...
(again, extremely simple)

GitHub - pierrepeterlongo/hyperloglog_kmer_counter

March 25, 2025 at 11:09 AM
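Neither distinct-kmers nor the linked repository is shown in this thread. As a hedged sketch of one way to cope with non-ACGTacgt characters, the code below treats any other character (N, IUPAC codes, ...) as a break: only k-mers made entirely of ACGT (case-insensitive) are emitted, 2-bit encoded with a rolling update, and an exact distinct count is taken over several sequences. Counting forward-strand k-mers only (no canonical form), leaving FASTA/FASTQ parsing out, and the function names are all assumptions.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def kmers_2bit(seq: str, k: int):
    """Yield 2-bit encoded k-mers of `seq` that contain only A/C/G/T (any case).

    Any other character resets the rolling window, so no emitted k-mer spans it.
    """
    mask = (1 << (2 * k)) - 1
    value = 0
    run = 0  # number of consecutive valid bases seen so far
    for ch in seq:
        c = CODE.get(ch.upper())
        if c is None:
            value, run = 0, 0  # invalid character: restart the window
            continue
        value = ((value << 2) | c) & mask  # shift in the new base, drop the oldest
        run += 1
        if run >= k:
            yield value


def count_distinct_kmers(sequences, k: int) -> int:
    """Exact number of distinct (forward-strand) k-mers over several sequences."""
    seen = set()
    for seq in sequences:  # each sequence is scanned independently
        seen.update(kmers_2bit(seq, k))
    return len(seen)
```

The set holds one Python integer per distinct k-mer, so memory grows with the answer; that is the price of an exact count, as opposed to a sketch such as HyperLogLog.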

Pierre Peterlongo

@pierrepeterlongo.bsky.social

That's correct.
I just created this github.com/pierrepeterl... It is yet another HLL k-mer counter, but hyper simple. However, I did not find a way to accumulate the k-mer counts across several input datasets.

GitHub - pierrepeterlongo/hyperloglog_kmer_counter

March 25, 2025 at 11:08 AM
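The post does not say which hash, precision, or estimator hyperloglog_kmer_counter uses, so the following is only a minimal HyperLogLog sketch under assumed choices: a 64-bit BLAKE2b hash, 2^12 registers, the classic harmonic-mean estimator with linear-counting correction for small cardinalities. It estimates the number of distinct items (here, k-mer strings) in constant memory. The `merge` method uses a general HyperLogLog property, not something from the post: the register-wise maximum of two sketches is the sketch of the union, which is one way to estimate distinct k-mers across several datasets, if that is the kind of accumulation wanted.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog distinct-count estimator (illustrative, p >= 7)."""

    def __init__(self, p: int = 12):
        self.p = p
        self.m = 1 << p              # number of registers
        self.registers = [0] * self.m

    @staticmethod
    def _hash64(item: str) -> int:
        digest = hashlib.blake2b(item.encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big")

    def add(self, item: str) -> None:
        x = self._hash64(item)
        idx = x >> (64 - self.p)                   # first p bits pick a register
        rest = x & ((1 << (64 - self.p)) - 1)      # remaining 64 - p bits
        rank = (64 - self.p) - rest.bit_length() + 1  # position of the leftmost 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def merge(self, other: "HyperLogLog") -> None:
        """In-place union: afterwards self estimates |A ∪ B|."""
        if self.p != other.p:
            raise ValueError("sketches must use the same precision")
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]

    def estimate(self) -> float:
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)           # bias constant, valid for m >= 128
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros > 0:
            return m * math.log(m / zeros)         # linear counting for small sets
        return raw
```

To use it on a sequence, add each k-mer string, e.g. `for i in range(len(seq) - k + 1): hll.add(seq[i:i + k])`. With p = 12 the expected relative error is about 1.04 / 64, i.e. roughly 1.6%.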

Pierre Peterlongo

@pierrepeterlongo.bsky.social

@imartayan.bsky.social I needed a version of distinct_kmers for multiple FASTA/FASTQ files.
I created this fork github.com/pierrepeterl...
I'm almost ashamed that this code modification is public, but maybe it can be useful.

GitHub - pierrepeterlongo/distinct-kmers: How many distinct k-mers are there in a sequence?

March 24, 2025 at 5:50 PM

Pierre Peterlongo

@pierrepeterlongo.bsky.social

I added the notion of insertion order (mentioning your name). However, I don't get the point of the mergeability issue.

March 20, 2025 at 10:50 AM

Pierre Peterlongo

@pierrepeterlongo.bsky.social

Note that the "conservative update" is also something we implemented (without describing it) in fimpera github.com/lrobidou/fim...

March 20, 2025 at 7:59 AM

Pierre Peterlongo

@pierrepeterlongo.bsky.social

Thanks again for this pointer @benlangmead.bsky.social. What I described is the same idea, adapted to the case where items are added on the fly, without knowing their final abundance.
The "conservative update" technique targets the case where each item is added together with its abundance.

March 20, 2025 at 7:59 AM

Pierre Peterlongo

@pierrepeterlongo.bsky.social

Oh! Amazing results. That's the difference between you and a Rust beginner.
I'll try to understand your code.

March 18, 2025 at 6:28 PM

Pierre Peterlongo

@pierrepeterlongo.bsky.social

Thanks Ben, I'll look at this.

March 18, 2025 at 6:17 PM

Pierre Peterlongo

@pierrepeterlongo.bsky.social

Results: slightly longer insertion time, but abundance overestimations 2 to 3 times lower.

March 18, 2025 at 4:31 PM

Pierre Peterlongo

@pierrepeterlongo.bsky.social

In short: when adding an element to a counting Bloom filter (cBF), increment only those of its counters that hold the minimal value.

March 18, 2025 at 4:31 PM
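A hedged sketch of the rule described above, next to the plain counting Bloom filter it improves on: `add` increments only the counters that currently hold the minimum over the element's positions, whereas the standard update increments all of them. For comparison, `add_with_abundance` shows the classical conservative update for an item added together with its abundance (raise every counter to at least min + abundance), the case discussed in the thread. The class and method names, the BLAKE2b-based double hashing, and the parameters are assumptions; this is not fimpera's implementation.

```python
import hashlib


class CountingBloomFilter:
    """Counting Bloom filter with optional min-only (conservative) increments."""

    def __init__(self, size: int, num_hashes: int):
        self.size = size
        self.num_hashes = num_hashes
        self.counters = [0] * size

    def _positions(self, item: str):
        # Assumed hashing scheme: double hashing from one 128-bit BLAKE2b digest.
        d = hashlib.blake2b(item.encode(), digest_size=16).digest()
        h1 = int.from_bytes(d[:8], "little")
        h2 = int.from_bytes(d[8:], "little") | 1
        return [(h1 + i * h2) % self.size for i in range(self.num_hashes)]

    def add(self, item: str, conservative: bool = True) -> None:
        pos = self._positions(item)
        if not conservative:
            for i in pos:                 # standard cBF: increment every counter
                self.counters[i] += 1
            return
        low = min(self.counters[i] for i in pos)
        for i in pos:                     # only counters at the minimum move up
            if self.counters[i] == low:
                self.counters[i] += 1

    def add_with_abundance(self, item: str, count: int) -> None:
        """Classical conservative update when the abundance is known up front."""
        pos = self._positions(item)
        target = min(self.counters[i] for i in pos) + count
        for i in pos:
            self.counters[i] = max(self.counters[i], target)

    def query(self, item: str) -> int:
        """Abundance estimate: never below the true count, possibly above it."""
        return min(self.counters[i] for i in self._positions(item))
```

Under the same insertions, every counter of the conservative variant stays at or below the corresponding counter of the standard one, so its estimates can only be tighter, at the cost of reading the counters before writing them on each insertion.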

Pierre Peterlongo

@pierrepeterlongo.bsky.social

Yes, ntCard helps a lot, and its precision on reads is impressive. But here I wanted the exact number for a genome.

January 30, 2025 at 6:40 PM

Pierre Peterlongo

@pierrepeterlongo.bsky.social

I wanted something that uses as little memory as possible. I don't want to count k-mers, only to know the number of distinct k-mers. So Jellyfish, KMC, ... are overkill for this simple task.

January 30, 2025 at 5:08 PM
