Lightnews — Scholar-powered news

Mohsen Zakeri

@mohsenzakeri.bsky.social

PMLs are "exact" matches, but not maximal (unlike MEMs or matching statistics). That's why we call them "pseudo" matching lengths. PMLs are upper bounded by matching statistics.

October 23, 2025 at 2:38 PM
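
To make the upper bound concrete, here is a minimal sketch of matching statistics computed by brute force. The function name `matching_statistics` and the quadratic substring test are illustrative assumptions, not how MONI or Movi compute them; the point is only to show the quantity that bounds every PML from above.

```python
def matching_statistics(text, pattern):
    """MS[i] = length of the longest prefix of pattern[i:] that occurs in text.

    Brute-force version for illustration only (hypothetical helper).
    """
    ms = []
    for i in range(len(pattern)):
        length = 0
        # Grow the match while the extended prefix still occurs in the text.
        while i + length < len(pattern) and pattern[i:i + length + 1] in text:
            length += 1
        ms.append(length)
    return ms
```

For any position i, a PML counts characters of some exact match starting at i, so it can never exceed the value this function returns at i.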

Mohsen Zakeri

@mohsenzakeri.bsky.social

Thank you!! Although that comes at the expense of more cache misses. 🙈

October 22, 2025 at 5:37 AM

Mohsen Zakeri

@mohsenzakeri.bsky.social

PML was introduced in SPUMONI. For matching statistics, there is a second loop that determines how much the matches overlap, so the maximal exact matches can be retrieved. You can find out more about that in the MONI paper: pubmed.ncbi.nlm.nih.gov/35041495/

MONI: A Pangenomic Index for Finding Maximal Exact Matches - PubMed

Recently, Gagie et al. proposed a version of the FM-index, called the r-index, that can store thousands of human genomes on a commodity computer. Then Kuhnle et al. showed how to build the r...

pubmed.ncbi.nlm.nih.gov

October 22, 2025 at 5:34 AM

Mohsen Zakeri

@mohsenzakeri.bsky.social

When the match is not extendable, the thresholds are still useful, because they point in the direction that has a longer common prefix with the current match. After each repositioning we always reset the PML to zero because we are not sure whether it is extending a match or not. That’s PML.

October 22, 2025 at 5:30 AM
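
The PML walk described in this thread (LF on a single row, repositioning on a mismatch, resetting the length to zero) can be sketched in a few lines. This is a toy version under stated assumptions: the index is built naively by sorting suffixes, the threshold is recomputed on the fly as the position of the minimum LCP between the two neighbouring occurrences of the character, and the names `build_index`, `reposition`, and `pml` are hypothetical. Movi and SPUMONI store precomputed thresholds on run-length compressed structures instead.

```python
def build_index(text):
    """Naive BWT, LCP array, and LF mapping of text + '$' (illustration only)."""
    t = text + "$"
    n = len(t)
    sa = sorted(range(n), key=lambda i: t[i:])
    bwt = [t[i - 1] for i in sa]  # t[-1] is '$' when i == 0
    # lcp[k] = longest common prefix of the suffixes at rows k-1 and k
    lcp = [0] * n
    for k in range(1, n):
        a, b = t[sa[k - 1]:], t[sa[k]:]
        while lcp[k] < min(len(a), len(b)) and a[lcp[k]] == b[lcp[k]]:
            lcp[k] += 1
    # LF mapping: row k -> row of the suffix that starts one character earlier
    order = sorted(range(n), key=lambda k: (bwt[k], k))
    lf = [0] * n
    for rank, k in enumerate(order):
        lf[k] = rank
    return bwt, lcp, lf


def reposition(bwt, lcp, row, c):
    """Jump to the nearest BWT occurrence of c above or below `row`.

    The threshold is the position of the minimum LCP between the two
    candidates; rows before it share a longer prefix with the upper one.
    """
    up = next((k for k in range(row - 1, -1, -1) if bwt[k] == c), None)
    down = next((k for k in range(row + 1, len(bwt)) if bwt[k] == c), None)
    if up is None or down is None:
        return up if down is None else down  # None if c never occurs
    threshold = min(range(up + 1, down + 1), key=lambda k: lcp[k])
    return up if row < threshold else down


def pml(text, pattern):
    """Pseudo matching lengths of pattern against text, right to left."""
    bwt, lcp, lf = build_index(text)
    row, length = 0, 0  # arbitrary starting row (assumption)
    out = [0] * len(pattern)
    for j in range(len(pattern) - 1, -1, -1):
        c = pattern[j]
        if bwt[row] == c:
            length += 1  # the tracked row extends the match
        else:
            length = 0  # reset after every repositioning
            row = reposition(bwt, lcp, row, c)
            if row is None:  # c does not occur in text at all
                row = 0
                continue
        out[j] = length
        row = lf[row]
    return out
```

With these toy definitions, `pml("banana", "ana")` gives `[3, 2, 1]`, while `pml("banana", "nab")` gives `[1, 0, 0]` even though "na" occurs in the text, which shows how a PML can fall short of the matching statistic at the same position.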

Mohsen Zakeri

@mohsenzakeri.bsky.social

We would like to choose the one that has a longer common prefix with the current row because that is more likely to extend the match. The thresholds exactly encode this information to guide the search to the right direction, possibly to get back into the backward search range.

October 22, 2025 at 5:26 AM

Mohsen Zakeri

@mohsenzakeri.bsky.social

PML proceeds by applying LF to one row. For the first two steps (CC), the PML row remains in the backward search range, so the match length is extended. But then for G, we don’t see a matching character in the purple row. So, PML repositions either to the bottom of the preceding G run or the head of the next G run.

October 22, 2025 at 5:25 AM

Mohsen Zakeri

@mohsenzakeri.bsky.social

This figure from our Movi Color preprint could address your questions. (Don’t worry about the colors) Here we compare the PML query to backward search. The purple box shows the row tracked by PML and the green BWT offsets track the backward search intervals on the same query (AGCC).

October 22, 2025 at 5:20 AM

Mohsen Zakeri

@mohsenzakeri.bsky.social

Exactly! In Movi we like to access the thresholds directly from the move rows, and we need one for each character other than the run's character (in the case of a mismatch during the PML query).

October 22, 2025 at 5:16 AM

Mohsen Zakeri

@mohsenzakeri.bsky.social

A formal definition of the thresholds is in this paper (Refining the r-index): www.sciencedirect.com/science/arti...

October 22, 2025 at 5:11 AM

Mohsen Zakeri

@mohsenzakeri.bsky.social

6/6 Movi 2 supports multi-threading, further improving speed beyond the concurrent read processing per thread already available in Movi 1. You can read more about Movi 2 at: www.biorxiv.org/content/10.1...

October 21, 2025 at 8:07 PM

Mohsen Zakeri

@mohsenzakeri.bsky.social

5/6 On the 466 haplotypes from the 2nd release of HPRC, the fastest Movi 2 index is under 50 GB. It can be reduced to 24 GB while remaining over 3x faster than SPUMONI. Movi 2 is smaller and faster than ropebwt3, although it computes PMLs, which are easier to get than the SMEMs found by ropebwt3.

October 21, 2025 at 8:07 PM

Mohsen Zakeri

@mohsenzakeri.bsky.social

4/6 Movi 2 offers three main modes: regular, blocked, and sampled. Each mode uses a different row size, resulting in a different number of added rows due to its specific splitting strategy.

October 21, 2025 at 8:07 PM

Mohsen Zakeri

@mohsenzakeri.bsky.social

3/6 Movi 2 includes a mode that samples the largest field in each row to achieve a space–speed tradeoff. In this mode, it can be smaller than r-index–based methods while remaining 3–8× faster.

October 21, 2025 at 8:07 PM

Mohsen Zakeri

@mohsenzakeri.bsky.social

2/6 Movi 2 uses length-based and threshold-based splitting to reduce the size of rows in the move structure. The threshold-splitting strategy compresses each threshold value to a single bit.
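
As a rough illustration of the length-based part, the sketch below caps every BWT run at a maximum length so that a run-length field would fit in fewer bits, at the cost of extra rows. This is only one plausible reading of "length-based splitting"; the function name `split_runs_by_length`, the `(character, length)` run representation, and the `max_len` parameter are assumptions made for the example, and the actual Movi 2 scheme is described in the preprint.

```python
def split_runs_by_length(runs, max_len):
    """Split (character, length) runs so no run is longer than max_len.

    Hypothetical helper: shorter runs need fewer bits per length field,
    but every split adds a row.
    """
    out = []
    for ch, length in runs:
        while length > max_len:
            out.append((ch, max_len))
            length -= max_len
        out.append((ch, length))
    return out
```

For example, capping the runs `[("A", 5), ("C", 2)]` at length 2 yields four rows instead of two, which is the kind of row-count increase the thread refers to as "added rows".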