PMLs are "exact" matches, but not necessarily maximal (unlike MEMs or matching statistics). That's why we call them "pseudo" matching lengths. Each PML is upper bounded by the matching statistic at the same position.
October 23, 2025 at 2:38 PM
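To make the bound concrete, here is a minimal brute-force sketch of matching statistics, assuming the usual definition (the longest prefix of each pattern suffix that occurs in the text). The function name and the toy strings are illustrative, not taken from Movi or SPUMONI. Any PML at position i is the length of some exact match starting at i, so it can never exceed the value computed here.

```python
def matching_statistics(text, pattern):
    """ms[i] = length of the longest prefix of pattern[i:] that occurs in text.

    Brute force for illustration only; real tools use a compressed index.
    """
    ms = []
    for i in range(len(pattern)):
        length = 0
        # Extend the match one character at a time while it still occurs in text.
        while i + length < len(pattern) and pattern[i:i + length + 1] in text:
            length += 1
        ms.append(length)
    return ms


print(matching_statistics("GATTACA", "TTAG"))  # [3, 2, 1, 1]
```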
PML was introduced in SPUMONI. For matching statistics, there is a second loop that determines how much the matches overlap, so that the maximal exact matches can be retrieved. You can find out more about that in the MONI paper: pubmed.ncbi.nlm.nih.gov/35041495/
When the match is not extendable, the thresholds are still useful, because they point in the direction that has a longer common prefix with the current match. After each repositioning we always reset the PML to zero because we cannot be sure whether it is extending a match or not. That’s PML.
October 22, 2025 at 5:30 AM
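The PML loop described in this thread can be sketched on a toy, uncompressed BWT. This is a minimal sketch under stated assumptions, not Movi's or SPUMONI's implementation: all function names are hypothetical, the BWT and LCP array are built naively, and the repositioning decision scans the LCP array directly where a real index would read a precomputed threshold from the move row.

```python
def build_bwt(text):
    """Naive BWT and LCP array of text + '$' (toy inputs only)."""
    t = text + "$"
    sa = sorted(range(len(t)), key=lambda i: t[i:])
    bwt = [t[i - 1] for i in sa]
    lcp = [0] * len(t)  # lcp[k]: common prefix length of the suffixes at rows k-1 and k
    for k in range(1, len(t)):
        a, b = t[sa[k - 1]:], t[sa[k]:]
        while lcp[k] < min(len(a), len(b)) and a[lcp[k]] == b[lcp[k]]:
            lcp[k] += 1
    return bwt, lcp


def lf(bwt, row):
    """LF mapping computed by counting (a real index does this in O(1))."""
    c = bwt[row]
    smaller = sum(1 for x in bwt if x < c)
    return smaller + bwt[:row].count(c)


def reposition(bwt, lcp, row, c):
    """Jump to the bottom of the preceding c run or the head of the next c run,
    whichever row shares the longer common prefix with the current row.
    Thresholds encode this choice; here it is recomputed from the LCP array."""
    up = next((k for k in range(row - 1, -1, -1) if bwt[k] == c), None)
    down = next((k for k in range(row + 1, len(bwt)) if bwt[k] == c), None)
    if up is None or down is None:
        return up if down is None else down
    lcp_up = min(lcp[up + 1:row + 1])
    lcp_down = min(lcp[row + 1:down + 1])
    return up if lcp_up >= lcp_down else down


def pml(text, pattern):
    """Pseudo matching lengths of pattern against text, scanning right to left."""
    bwt, lcp = build_bwt(text)
    row, length = 0, 0  # the starting row is arbitrary
    out = [0] * len(pattern)
    for j in range(len(pattern) - 1, -1, -1):
        c = pattern[j]
        if c == "$" or c not in bwt:
            length = 0  # character absent from the text: nothing to extend
            continue
        if bwt[row] == c:
            length += 1  # the tracked row extends the match
        else:
            row = reposition(bwt, lcp, row, c)
            length = 0  # reset after every repositioning
        out[j] = length
        row = lf(bwt, row)
    return out


print(pml("GATTACAGATTACCA", "AGCC"))
```

Each reported length corresponds to a substring of the pattern that really occurs in the text, but because the length is reset on every jump it can fall short of the matching statistic at that position.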
We would like to choose the one that has a longer common prefix with the current row because that is more likely to extend the match. The thresholds encode exactly this information to guide the search in the right direction, possibly getting it back into the backward search range.
October 22, 2025 at 5:26 AM
PML proceeds by LF on one row. For the first two steps (CC), the PML row remains in the backward search range, so the match length is extended. But then for G, we don’t see a matching character in the purple row. So, PML repositions either to the bottom of the preceding G run or to the head of the next G run.
October 22, 2025 at 5:25 AM
This figure from our Movi Color preprint could address your questions. (Don’t worry about the colors) Here we compare the PML query to backward search. The purple box shows the row tracked by PML and the green BWT offsets track the backward search intervals on the same query (AGCC).
October 22, 2025 at 5:20 AM
Exactly! In Movi we like to access the thresholds directly from the move rows, and we need one for each character that differs from the run's character (in case of a mismatch during the PML query).
October 22, 2025 at 5:16 AM
6/6 Movi 2 supports multi-threading, further improving speed beyond the concurrent read processing per thread already available in Movi 1. You can read more about Movi 2 at: www.biorxiv.org/content/10.1...
October 21, 2025 at 8:07 PM
5/6 On the 466 haplotypes from the 2nd release of HPRC, the fastest Movi 2 index is under 50 GB. It can be reduced to 24 GB while remaining over 3x faster than SPUMONI. Movi 2 is smaller and faster than ropebwt3, although it computes PMLs, which are easier to get than the SMEMs found by ropebwt3.
October 21, 2025 at 8:07 PM
4/6 Movi 2 offers three main modes: regular, blocked, and sampled. Each mode uses a different row size, resulting in a different number of added rows due to its specific splitting strategy.
October 21, 2025 at 8:07 PM
3/6 Movi 2 includes a mode that samples the largest field in each row to achieve a space–speed tradeoff. In this mode, it can be smaller than r-index–based methods while remaining 3–8× faster.
October 21, 2025 at 8:07 PM
2/6 Movi 2 uses length-based and threshold-based splitting to reduce the size of rows in the move structure. The threshold-splitting strategy compresses each threshold value to a single bit.
October 21, 2025 at 8:07 PM