Matt Holt

@holtjma.bsky.social

Even if you do not (yet) fully buy in to basepair scoring, Aardvark includes a traditional genotype score... and it calculates both sets of scoring metrics *really* fast!

For small variants, it is on average 16x faster than hap.py, with most runs finishing in under 2 minutes (16 threads).

(6/N)

October 6, 2025 at 8:14 PM

Matt Holt

@holtjma.bsky.social

Since Aardvark looks at sequences, it enables some comparisons that were previously very challenging:

1. Tandem repeat (TR) v. TR benchmarking
2. TR v. small variant benchmarking
3. Structural variant (SV) benchmarking
4. Joint benchmarking (small + SV)

(5/N)

October 6, 2025 at 8:13 PM

Matt Holt

@holtjma.bsky.social

The main addition in Aardvark is the "basepair" scoring scheme, which compares local haplotype *sequences* instead of variants and genotypes. See the attached figure for a quick example of how basepair scoring compares to genotype scoring.
(2/N)
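
A minimal sketch of the idea, not Aardvark's implementation: the same haplotype can be written as different variant records, so comparing records can report a mismatch where comparing the resulting sequences does not. The helper names (`apply_variants`, `records_match`, `sequences_match`) and the toy reference are hypothetical, and real basepair scoring does more than the exact-equality check shown here.

```python
def apply_variants(ref, variants):
    """Build a haplotype sequence from (pos, ref_allele, alt_allele) edits.

    Positions are 0-based offsets into `ref` (an assumption for this sketch).
    """
    out, cursor = [], 0
    for pos, ref_allele, alt_allele in sorted(variants):
        if pos < cursor:
            raise ValueError("overlapping variants are not supported here")
        if ref[pos:pos + len(ref_allele)] != ref_allele:
            raise ValueError(f"REF allele mismatch at offset {pos}")
        out.append(ref[cursor:pos])   # unchanged reference up to the variant
        out.append(alt_allele)        # substituted allele
        cursor = pos + len(ref_allele)
    out.append(ref[cursor:])          # remaining reference tail
    return "".join(out)


def records_match(truth, query):
    """Record-level comparison: variant records must be identical."""
    return set(truth) == set(query)


def sequences_match(ref, truth, query):
    """Sequence-level comparison: only the resulting haplotypes must agree."""
    return apply_variants(ref, truth) == apply_variants(ref, query)


REF = "ACGTTTGA"  # toy reference, hypothetical

# One multi-base substitution vs. the same change written as two SNVs.
mnv_truth = [(2, "GT", "CA")]
mnv_query = [(2, "G", "C"), (3, "T", "A")]

# The same single-T deletion placed at two positions in a T homopolymer.
del_truth = [(2, "GT", "G")]
del_query = [(4, "TT", "T")]

for name, truth, query in [("MNV", mnv_truth, mnv_query),
                           ("deletion", del_truth, del_query)]:
    print(name,
          "records match:", records_match(truth, query),
          "| sequences match:", sequences_match(REF, truth, query))
```

In both toy cases the record comparison fails while the haplotype sequences are identical, which is the kind of representation difference a sequence-based comparison is meant to see through.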

October 6, 2025 at 8:09 PM

Matt Holt

@holtjma.bsky.social

I have never seen a more beautiful image in my life, hype train!

April 2, 2025 at 2:21 PM

Matt Holt

@holtjma.bsky.social

Xiao Chen #ACMGMtg25 describing Kivvi tool for assembling long repeat units in medically relevant genes (KIV2 and D4Z4) using #PacBio HiFi reads. Large repeats accurately assembled!

March 20, 2025 at 7:10 PM

Matt Holt

@holtjma.bsky.social

The highly elusive North Alabama snow day is here! ⛄️

January 10, 2025 at 1:19 PM

Matt Holt

@holtjma.bsky.social

Just as a follow-up, I was able to find the script that did this and test it using the original HiFi VCFs (i.e. high coverage) but with the downsampled HiFi data. The attached figure is more in line with what I would expect. Higher risk of switchflip errors at the lower coverages, of course.

January 9, 2025 at 5:53 PM

Matt Holt

@holtjma.bsky.social

@3rdreviewer.bsky.social This figure is downsampling with *just HiFi* in our HiPhase supplement. This is not exactly what you want because variant calling was a part of the experiment. Given quality variant calls, I expect the NG50 would be even better. Paper here: academic.oup.com/bioinformati...

January 9, 2025 at 4:07 PM

Matt Holt

@holtjma.bsky.social

Forgot to attach an image to 8/10, so here it is! An example of a CYP2D6 duplication event that is directly observable with long-read sequencing. We've enhanced this image to make it more obvious, but orange reads have *direct evidence* of two CYP2D6 *4.004 alleles. 11/10

December 11, 2024 at 3:07 PM

Matt Holt

@holtjma.bsky.social

With collaborators at Children’s Mercy Kansas City, Estonian Genome Centre, HudsonAlpha Institute for Biotechnology, and SingHealth Duke-NUS Institute of Precision Medicine, we also explore population haplotype distributions of these genes in 1,452 WGS datasets! 7/N

CYP2D6 haplotype and diplotype metabolizer phenotype distributions in our population, split by predicted ancestry

December 11, 2024 at 2:35 PM

Matt Holt

@holtjma.bsky.social

Across our entire benchmark, StarPhase diplotypes exactly match for 96.2%, an additional 3.3% are minor discrepancies caused by outdated comparators or database limitations, and we identify only 16 mismatches (0.5%)! Manual inspection of each supported the StarPhase diplotypes. 5/N

December 11, 2024 at 2:33 PM

Matt Holt

@holtjma.bsky.social

It’s a packed room for Xiao Chen’s talk on resolving paralogous genes with #PacBio HiFi sequencing! #ASHG24

November 8, 2024 at 5:48 PM

Matt Holt

@holtjma.bsky.social

OneRepublic putting on quite a show for us!

#ASHG24 #PacBio

November 7, 2024 at 4:45 AM

Matt Holt

@holtjma.bsky.social

Snowy blue bear greeting us at #ASHG24 this morning!

November 6, 2024 at 4:01 PM

Matt Holt

@holtjma.bsky.social

Interested in long-read pharmacogenomics? Then I have some exciting things to show you at #ASHG24... Looking forward to next week in Denver!

#PacBio #PGx

October 28, 2024 at 8:08 PM

Matt Holt

@holtjma.bsky.social

Up bright and early for the Liz Hurley Ribbon Run for cancer awareness!

October 19, 2024 at 2:14 PM

Matt Holt

@holtjma.bsky.social

FYI, you can basically take the Path all the way to Nathan Phillips Square from the #ACMGMtg24 conference center, don’t let the rain keep you from exploring!

March 14, 2024 at 10:13 PM

Matt Holt

@holtjma.bsky.social

Happy Friday!

February 23, 2024 at 9:46 PM

Matt Holt

@holtjma.bsky.social

We just added a couple tracks to MethBat segmentation that allow the output of haplotype-specific methylation status (methylated, unmethylated, or no data). Example IGV images of what this might look like are attached.

Full release notes: github.com/PacificBiosc...

High-level view of a CpG island near the GNAS gene showing ASM. The new haplotype-specific tracks are showcased in the image.

A zoomed-in version of the same region showing where the NoData track is caused by a deletion of a CG (and other bases) on haplotype 2.

February 20, 2024 at 4:25 PM

Matt Holt

@holtjma.bsky.social

The biggest change relative to our pre-print is that HiPhase can now phase tandem repeat calls #STR from TRGT in addition to the small and structural variants from before. On average, this added ~68K additional phased variants per sample that were previously ignored!