Mohsen Zakeri
mohsenzakeri.bsky.social
Postdoctoral Fellow at Johns Hopkins University, Computational Biology ❤️
www.mohsenzakeri.com
This figure from our Movi Color preprint could address your questions. (Don’t worry about the colors) Here we compare the PML query to backward search. The purple box shows the row tracked by PML and the green BWT offsets track the backward search intervals on the same query (AGCC).
October 22, 2025 at 5:20 AM
A formal definition of the thresholds is in this paper (Refining the r-index): www.sciencedirect.com/science/arti...
October 22, 2025 at 5:11 AM
6/6 Movi 2 supports multi-threading, further improving speed beyond the concurrent read processing per thread already available in Movi 1. You can read more about Movi 2 at: www.biorxiv.org/content/10.1...
October 21, 2025 at 8:07 PM
5/6 On the 466 haplotypes from the 2nd release of HPRC, the fastest Movi 2 index is under 50 GB. It can be reduced to 24 GB while remaining over 3x faster than SPUMONI. Movi 2 is smaller and faster than ropebwt3, although it computes PMLs, which are easier to get than the SMEMs found by ropebwt3.
October 21, 2025 at 8:07 PM
4/6 Movi 2 offers three main modes: regular, blocked, and sampled. Each mode uses a different row size, resulting in a different number of added rows due to its specific splitting strategy.
October 21, 2025 at 8:07 PM
3/6 Movi 2 includes a mode that samples the largest field in each row to achieve a space–speed tradeoff. In this mode, it can be smaller than r-index–based methods while remaining 3–8× faster.
October 21, 2025 at 8:07 PM
2/6 Movi 2 uses length-based and threshold-based splitting to reduce the size of rows in the move structure. The threshold-splitting strategy compresses each threshold value to a single bit.
October 21, 2025 at 8:07 PM
5/5 Processing the reads with Movi Color is as fast as Kraken 2, and 20x faster than Metabuli’s total query time. Movi Color is able to index sets of complete genomes from many species, but uses significantly more memory. The memory footprint can be reduced by using minimizer-digestion approaches.
May 29, 2025 at 2:39 PM
4/5 Movi Color is 2x more accurate than Kraken 2 and Metabuli for taxonomic classification of ONT reads at the species level.
May 29, 2025 at 2:38 PM
3/5 Movi Color classifies a read based on the colors observed during the pseudo matching lengths (PML) computation procedure.
May 29, 2025 at 2:38 PM
2/5 Movi Color adds colors to BWT runs. As in colored de Bruijn graphs, colors are sets of documents, defined based on the origin of the suffixes in each BWT run. Each distinct color is stored once in the color table.
May 29, 2025 at 2:37 PM
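A minimal sketch of the run-coloring idea from the 2/5 post, not Movi Color's actual implementation. It assumes a naive suffix sort over a "$"-separated concatenation of the documents, and the names `build_colored_runs`, `doc_of`, and `color_table` are hypothetical. Each BWT run gets the set of documents its suffixes come from, and each distinct set is stored once:

```python
from itertools import groupby


def build_colored_runs(docs):
    """Return (runs, color_table) for a list of document strings.

    runs: one (char, length, color_id) tuple per BWT run.
    color_table: maps each distinct document set to its color id.
    """
    # Assumption for illustration: documents are joined with "$" terminators.
    text = "".join(d + "$" for d in docs)
    # doc_of[i] = index of the document that text position i belongs to.
    doc_of = [doc_id for doc_id, d in enumerate(docs) for _ in range(len(d) + 1)]
    # Naive suffix array; fine for a toy example, far too slow for real genomes.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = [text[i - 1] for i in sa]

    color_table = {}
    runs = []
    row = 0
    for ch, group in groupby(bwt):
        length = len(list(group))
        # Color of the run = set of documents where its suffixes start.
        color = frozenset(doc_of[sa[j]] for j in range(row, row + length))
        # Store each distinct color once and reuse its id afterwards.
        color_id = color_table.setdefault(color, len(color_table))
        runs.append((ch, length, color_id))
        row += length
    return runs, color_table


runs, color_table = build_colored_runs(["ACGT", "ACGA", "TTGA"])
for color, color_id in color_table.items():
    print(color_id, sorted(color))
```

Because runs that share the same document set point at the same color id, the table can stay much smaller than the number of runs when many runs have identical origins.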
4/4 Movi is now capable of performing count queries using the backward search procedure, which is now implemented for the move structure. Movi is 16 times faster than the r-index for count queries while using about 3 times more memory.
February 19, 2024 at 6:30 PM
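For readers unfamiliar with the count query in the 4/4 post, here is a sketch of textbook backward search on a plain BWT. This is the classic FM-index formulation with a naive rank, not Movi's move-structure implementation, and `build_index`, `rank`, and `count` are hypothetical names:

```python
def build_index(text):
    """Build the BWT and the C array of a "$"-terminated text (naive construction)."""
    assert text.endswith("$")
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)
    # C[c] = number of characters in the text that are strictly smaller than c.
    C, total = {}, 0
    for ch in sorted(set(text)):
        C[ch] = total
        total += text.count(ch)
    return bwt, C


def rank(bwt, ch, i):
    """Number of occurrences of ch in bwt[0:i] (linear scan, for clarity only)."""
    return bwt.count(ch, 0, i)


def count(bwt, C, pattern):
    """Count occurrences of pattern by shrinking a half-open BWT interval [lo, hi)."""
    lo, hi = 0, len(bwt)
    # Process the pattern right to left, one LF step per character.
    for ch in reversed(pattern):
        if ch not in C:
            return 0
        lo = C[ch] + rank(bwt, ch, lo)
        hi = C[ch] + rank(bwt, ch, hi)
        if lo >= hi:
            return 0
    return hi - lo


bwt, C = build_index("AGCCAGCCTAGCC$")
print(count(bwt, C, "AGCC"))
```

A real index replaces the linear-scan `rank` with a compressed structure. The interval-narrowing loop is the part that carries over.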
3/4 Prefetching uses a single thread while processing many reads concurrently. Using prefetching, the median latency observed for Movi’s inner loop is 91 ns.
February 19, 2024 at 6:29 PM
2/4 Prefetching is made possible by the move structure's tabular form and simple inner loop. During the PML computation, whenever a new memory access is needed, a prefetch instruction is issued asynchronously while the algorithm processes other reads whose memory has already been fetched.
February 19, 2024 at 6:29 PM
1/4 A new version of Movi uses memory prefetching to achieve a degree of latency hiding, improving the speed even further over the version I wrote about in November. Movi is now 30 times faster than SPUMONI to compute pseudo matching lengths for ONT reads.
www.biorxiv.org/content/10.1...
February 19, 2024 at 6:29 PM
5/5 Movi’s index is large, but it scales well for pangenomes because the number of rows in the Move Structure grows only with r, the number of runs in the Burrows-Wheeler Transform.
November 7, 2023 at 6:48 PM
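To make r from the 5/5 post concrete, here is a small sketch that counts BWT runs with a naive suffix sort. The helper names `bwt_of` and `count_runs` are hypothetical, and the toy "pangenome" is just a string repeated several times, which is an assumption for illustration only:

```python
from itertools import groupby


def bwt_of(text):
    """BWT of a "$"-terminated text via a naive suffix sort (toy sizes only)."""
    assert text.endswith("$")
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    return "".join(text[i - 1] for i in sa)


def count_runs(bwt):
    """r = number of maximal runs of equal characters in the BWT."""
    return sum(1 for _ in groupby(bwt))


single = "GATTACAGGATCCA" + "$"
repeated = "GATTACAGGATCCA" * 8 + "$"
for name, text in [("one copy", single), ("eight copies", repeated)]:
    print(name, "n =", len(text), "r =", count_runs(bwt_of(text)))
```

Printing n next to r shows how the two diverge as near-identical copies are added, which is the property a run-length-based index depends on.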
4/5 Movi is 12 times faster than SPUMONI in computing pseudo-matching lengths while using 4.7 times more space.
November 7, 2023 at 6:48 PM
3/5 Locality of reference in Movi’s index keeps the number of cache misses per base low, which makes query times highly predictable and well suited to real-time applications like adaptive sampling.
November 7, 2023 at 6:47 PM
2/5 Movi is based on Nishimoto & Tabei's Move Structure, which achieves O(1) query time and O(r) space for LF-mapping queries, a unique feature compared to similar approaches like the r-index.
November 7, 2023 at 6:47 PM
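A rough sketch of how a move-structure-style table can answer LF-mapping queries, as referenced in the 2/5 post. This is not Movi's code: the row layout `(start, length, dest, dest_run)` and the function names are assumptions for illustration, and the sketch omits the run splitting that Nishimoto and Tabei use to bound the fast-forward loop by a constant:

```python
from bisect import bisect_right
from itertools import groupby


def bwt_of(text):
    """BWT of a "$"-terminated text via a naive suffix sort (toy sizes only)."""
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    return "".join(text[i - 1] for i in sa)


def build_move_table(bwt):
    """One row per BWT run: (start, length, LF(start), run containing LF(start))."""
    starts, chars, lengths = [], [], []
    pos = 0
    for ch, group in groupby(bwt):
        n = len(list(group))
        starts.append(pos)
        chars.append(ch)
        lengths.append(n)
        pos += n
    # C[c] = number of characters in the BWT strictly smaller than c.
    C, total = {}, 0
    for ch in sorted(set(bwt)):
        C[ch] = total
        total += bwt.count(ch)
    # LF of a run head = C[c] + number of c's in earlier runs.
    seen, dests = {}, []
    for ch, n in zip(chars, lengths):
        dests.append(C[ch] + seen.get(ch, 0))
        seen[ch] = seen.get(ch, 0) + n
    dest_runs = [bisect_right(starts, d) - 1 for d in dests]
    return list(zip(starts, lengths, dests, dest_runs))


def lf_move(table, i, k):
    """LF-map BWT offset i, known to lie in run k; return (new offset, new run)."""
    start, _, dest, dest_run = table[k]
    # Rows of one run map to consecutive offsets, so keep the offset in the run.
    j = dest + (i - start)
    k2 = dest_run
    # Fast-forward to the run that actually contains j.
    while j >= table[k2][0] + table[k2][1]:
        k2 += 1
    return j, k2


table = build_move_table(bwt_of("GATTACAGATTACA$"))
print(lf_move(table, 0, 0))
```

Each query touches one row plus a short forward scan of neighboring rows, with no rank structure involved, which is what gives the table its cache-friendly access pattern.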