Pierre Peterlongo
pierrepeterlongo.bsky.social
Inria Senior researcher.
Head of the GenScale team (https://team.inria.fr/genscale/) at Inria and Irisa.
Algorithmics for sequencing data analyses, genomics and metagenomics.
🤝 Amazing collaboration with @jermp.bsky.social, @yhhshb.bsky.social, @robp.bsky.social, Victor Levallois, and Bertrand Le Gal, with the help of @yoann.bsky.social. 8/8
May 27, 2025 at 12:06 PM
🌊 On metagenomic data, other tools such as kmindex are good alternatives. Still, Kaminari consistently ranks among the fastest tools across all data types, while generating the smallest indexes (or achieving the lowest FPR). 7/8
May 27, 2025 at 12:06 PM
💾 For a fixed false positive rate, it uses up to 37x less space than COBS while being an order of magnitude faster to build and query. 6/8
May 27, 2025 at 12:06 PM
📊 Experimental results show Kaminari's superiority in index size and query performance across various genomic datasets. 5/8
May 27, 2025 at 12:06 PM
🧬 Kaminari's design leverages properties of k-mer minimizers for compact space and fast query time, as inspired by the techniques proposed in Fulgor. 4/8
May 27, 2025 at 12:06 PM
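A minimal sketch of what a k-mer minimizer is, for readers unfamiliar with the term. It is not Kaminari's or Fulgor's implementation: the function names are hypothetical, and plain lexicographic order is an assumption made for readability (tools often rank m-mers by a hash instead). The point it illustrates is that consecutive k-mers tend to share the same minimizer, which is what makes minimizer-based indexing compact.

```python
def minimizer(kmer: str, m: int) -> str:
    """Return the smallest m-mer of `kmer` (lexicographic order, an assumption)."""
    return min(kmer[i:i + m] for i in range(len(kmer) - m + 1))


def kmers_with_minimizers(seq: str, k: int, m: int):
    """List every k-mer of `seq` together with its minimizer."""
    return [
        (seq[i:i + k], minimizer(seq[i:i + k], m))
        for i in range(len(seq) - k + 1)
    ]


if __name__ == "__main__":
    # Overlapping k-mers often map to the same minimizer.
    for kmer, mini in kmers_with_minimizers("GATTACA", k=5, m=3):
        print(kmer, mini)
```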
💻 We implemented Kaminari in C++17, available under the MIT license at github.com/yhhshb/kaminari. Additional results and reproducibility info at github.com/vicLeva/benchmarks_kaminari. 3/8
GitHub - yhhshb/kaminari: 雷 - kaminari (thunder/lightning)
May 27, 2025 at 12:06 PM
🔍 Key findings include:
- Use of minimizers and integer compression for indexing.
- Lower memory footprint and faster query times.
- Minimal impact of false positives on result ranking, as measured with the Rank-Biased Overlap (RBO) metric.
2/8
May 27, 2025 at 12:06 PM
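For readers who have not met RBO, here is a small sketch of the truncated form of the metric: a weighted average of the overlap between the two rankings at each depth, with weights decaying geometrically by a persistence parameter p. The function name and the default p = 0.9 are assumptions for illustration, each ranking is assumed to contain no duplicates, and nothing here reflects how the Kaminari evaluation computed its scores.

```python
def rbo_truncated(ranking_a, ranking_b, p=0.9):
    """Truncated Rank-Biased Overlap of two rankings (best item first).

    Computes (1 - p) * sum_{d=1..D} p^(d-1) * |A[:d] & B[:d]| / d,
    where D is the length of the shorter ranking.
    """
    depth = min(len(ranking_a), len(ranking_b))
    seen_a, seen_b = set(), set()
    score = 0.0
    for d in range(1, depth + 1):
        seen_a.add(ranking_a[d - 1])
        seen_b.add(ranking_b[d - 1])
        agreement = len(seen_a & seen_b) / d  # overlap of the two top-d prefixes
        score += p ** (d - 1) * agreement
    return (1 - p) * score


if __name__ == "__main__":
    exact = ["genomeA", "genomeB", "genomeC"]
    with_false_positives = ["genomeA", "genomeC", "genomeB"]
    print(rbo_truncated(exact, with_false_positives))
```

Because the sum is truncated, two identical rankings of length D score 1 - p^D rather than 1, so values are best compared against that ceiling.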
Thanks guys for your valuable feedback. I modified the code accordingly.
March 25, 2025 at 2:15 PM
Hi @imartayan.bsky.social, I wanted to run distinct-kmers but ran into a limitation: my input data contains characters other than ACGTacgt. So I created this: github.com/pierrepeterl...
(again extremely simple)
GitHub - pierrepeterlongo/hyperloglog_kmer_counter
March 25, 2025 at 11:09 AM
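A rough sketch of the idea behind a HyperLogLog k-mer counter that skips k-mers containing characters outside ACGT. This is not the code from the repository above: the class and function names, the blake2b hash, the precision p = 12, and the forward-strand-only (non-canonical) k-mers are all assumptions made for illustration.

```python
import hashlib
import math


class HyperLogLog:
    """Basic HyperLogLog with 2**p registers (alpha constant valid for p >= 7)."""

    def __init__(self, p: int = 12):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item: str) -> None:
        h = int.from_bytes(
            hashlib.blake2b(item.encode(), digest_size=8).digest(), "big"
        )
        index = h >> (64 - self.p)               # first p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)    # remaining 64 - p bits
        # Position of the leftmost 1-bit in `rest`, counted from 1.
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[index]:
            self.registers[index] = rank

    def estimate(self) -> float:
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction (linear counting).
            return self.m * math.log(self.m / zeros)
        return raw


def add_kmers(sketch: HyperLogLog, seq: str, k: int) -> None:
    """Add every k-mer of `seq` made only of A, C, G, T (case-insensitive)."""
    valid = set("ACGT")
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= valid:  # skip k-mers containing N or other symbols
            sketch.add(kmer)


if __name__ == "__main__":
    sketch = HyperLogLog()
    for sequence in ["ACGTNACGTTGCA", "ttgcaacgtacgt"]:
        add_kmers(sketch, sequence, k=4)
    print(round(sketch.estimate()))
```

Feeding several sequences into the same sketch, as in the usage above, estimates the number of distinct k-mers in their union.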
That's correct.
I just created this: github.com/pierrepeterl... It is yet another HLL k-mer counter, but a very simple one. I also did not find a way to accumulate the k-mer counts over several input datasets.
GitHub - pierrepeterlongo/hyperloglog_kmer_counter
March 25, 2025 at 11:08 AM
@imartayan.bsky.social I needed a version of distinct-kmers that handles multiple FASTA/FASTQ files.
I created this fork: github.com/pierrepeterl...
I'm almost ashamed that this code modification is public, but maybe it can be useful.
GitHub - pierrepeterlongo/distinct-kmers: How many distinct k-mers are there in a sequence?
March 24, 2025 at 5:50 PM
I added the notion of insertion order (mentioning your name). However, I don't get the point of the mergeability issue.
March 20, 2025 at 10:50 AM
Note that the "conservative update" is also something we implemented (without describing it) in fimpera github.com/lrobidou/fim...
March 20, 2025 at 7:59 AM
Thanks again for this pointer @benlangmead.bsky.social. What I described is the same idea, adapted to the case where items are added on the fly, without knowing their final abundance.
The "conservative update" technique is suited to the case where items are added together with their abundance.
March 20, 2025 at 7:59 AM
Oh! Amazing results. That's the difference between you and a Rust beginner.
I'll try to understand your code.
March 18, 2025 at 6:28 PM
Thanks Ben - I'll look at this.
March 18, 2025 at 6:17 PM
Results: slightly longer insertion time, but abundance overestimations reduced by a factor of 2 to 3.
March 18, 2025 at 4:31 PM
In short: when adding an element to a counting Bloom filter (cBF), increment only those of its cells that hold the minimal stored value.
March 18, 2025 at 4:31 PM
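A minimal sketch of that idea, assuming a plain counting Bloom filter with one-at-a-time insertions. The class name, the blake2b-based hashing, and the parameters are hypothetical and are not taken from any of the tools mentioned in this thread. With `conservative=False` every cell of an element is incremented. With `conservative=True` only the cells currently holding the minimum are.

```python
import hashlib


class CountingBloomFilter:
    """Counting Bloom filter with an optional minimum-only increment."""

    def __init__(self, size: int, num_hashes: int, conservative: bool = False):
        self.cells = [0] * size
        self.num_hashes = num_hashes  # must stay below 256 for the salt below
        self.conservative = conservative

    def _positions(self, item: str) -> set:
        # One salted hash per hash function; a set removes colliding positions.
        return {
            int.from_bytes(
                hashlib.blake2b(
                    item.encode(), digest_size=8, salt=bytes([i])
                ).digest(),
                "little",
            ) % len(self.cells)
            for i in range(self.num_hashes)
        }

    def add(self, item: str) -> None:
        positions = self._positions(item)
        if self.conservative:
            lowest = min(self.cells[p] for p in positions)
            # Only the cells at the current minimum are incremented.
            positions = {p for p in positions if self.cells[p] == lowest}
        for p in positions:
            self.cells[p] += 1

    def count(self, item: str) -> int:
        # The estimate is the minimum over the item's cells; it never
        # underestimates the true abundance.
        return min(self.cells[p] for p in self._positions(item))


if __name__ == "__main__":
    standard = CountingBloomFilter(size=16, num_hashes=3)
    minimal = CountingBloomFilter(size=16, num_hashes=3, conservative=True)
    for kmer in ["ACGT", "CGTA", "ACGT", "GTAC", "TACG", "ACGT"]:
        standard.add(kmer)
        minimal.add(kmer)
    print(standard.count("ACGT"), minimal.count("ACGT"))
```

For the same hash functions and insertion order, each cell of the minimum-only filter is at most the corresponding cell of the standard filter, so its estimates can only be closer to the true abundances.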
Yes, ntCard helps a lot and its precision is impressive on reads. That said, I wanted an exact number on genomes.
January 30, 2025 at 6:40 PM
I wanted something that uses as little memory as possible. I don't want to count k-mers, only to know the number of unique k-mers. So Jellyfish, KMC, etc. are overkill for this simple task.
January 30, 2025 at 5:08 PM