Lightnews — Scholar-powered news

Jeremy Schwartzentruber

@jeremy37.bsky.social

Scientist at Illumina using AI methods to interpret the non-coding genome and empower new genetic association discoveries.

Jeremy Schwartzentruber

@jeremy37.bsky.social

We trained FlexRV-PRS scores for ~60 quantitative traits in 256k UKB individuals and found that they correlated more strongly with trait values in the 64k test set than models based on raw variant annotations did. Individuals with high FlexRV-PRS were also more enriched for outlier trait values.

November 5, 2025 at 10:29 PM

Jeremy Schwartzentruber

@jeremy37.bsky.social

I visualized all of the variant score / MAF weights as a heatmap, counting how many times each weight transformation gave the lowest p-value. Interestingly, highly constrained genes (S_het > 0.05) more often benefit from placing weight on rarer, highly deleterious variants.

November 5, 2025 at 10:29 PM
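
The tally behind such a heatmap can be sketched as follows. This is only an illustration: the gene names, transformation labels, and p-values are made up, and `count_best_transforms` is a hypothetical helper, not part of FlexRV.

```python
from collections import Counter


def count_best_transforms(pvalues_by_gene):
    """Count, per weight transformation, how many genes it gave the lowest p-value for.

    pvalues_by_gene maps gene -> {transformation label -> p-value}.
    """
    wins = Counter()
    for pvals in pvalues_by_gene.values():
        # The transformation with the smallest p-value "wins" this gene.
        best = min(pvals, key=pvals.get)
        wins[best] += 1
    return wins


# Made-up p-values for three hypothetical genes and three hypothetical transformations.
example = {
    "GENE_A": {"linear": 1e-4, "squared": 1e-6, "threshold": 1e-3},
    "GENE_B": {"linear": 1e-2, "squared": 1e-5, "threshold": 1e-4},
    "GENE_C": {"linear": 1e-8, "squared": 1e-3, "threshold": 1e-2},
}
wins = count_best_transforms(example)
```

Each count would fill one cell of the heatmap, with one axis for the score transformation and one for the MAF weighting.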

Jeremy Schwartzentruber

@jeremy37.bsky.social

Why is this approach so effective? I think it comes down to the fact that the “annotation → trait” mapping is often nonlinear and, importantly, differs for each gene. Here are a few examples.

November 5, 2025 at 10:29 PM

Jeremy Schwartzentruber

@jeremy37.bsky.social

We also checked whether the FlexRV gene-based associations are enriched for proximity to GWAS hits (which are better powered but don’t give the causal gene directly) or have high PoPS scores (locus-independent GWAS signal), and found better enrichments than other methods.

November 5, 2025 at 10:29 PM

Jeremy Schwartzentruber

@jeremy37.bsky.social

We used multiple approaches to check whether these associations are real. We ran FlexRV in 200k UKB individuals and checked whether the resulting associations replicated in the reported DeepRVAT results on the full cohort. They replicated at a higher rate than associations from Regenie or STAAR.

November 5, 2025 at 10:29 PM

Jeremy Schwartzentruber

@jeremy37.bsky.social

For example, DeepRVAT is a very cool deep learning method that combines many variant annotations together into a “gene impairment score” for each individual. But for the 28 quantitative traits we tested in UKB, FlexRV found 37% more associations (and 58% more for binary traits)!

November 5, 2025 at 10:29 PM

Jeremy Schwartzentruber

@jeremy37.bsky.social

Each set of weights is a hypothesis about the possible relationship between variants and their effects.
Like STAAR, we combine p-values from these tests together with the Cauchy combination test (CCT). We were surprised by just how much this improved power over other approaches.

November 5, 2025 at 10:29 PM
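
For readers unfamiliar with the CCT, here is a minimal sketch of the standard form of the test, assuming equal weights and p-values strictly between 0 and 1. The function name `cauchy_combination` is illustrative and is not taken from FlexRV or STAAR.

```python
import math


def cauchy_combination(p_values, weights=None):
    """Combine p-values with the Cauchy combination test.

    Assumes every p-value lies strictly between 0 and 1.
    Weights default to equal and are normalized to sum to 1.
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if weights is None:
        weights = [1.0] * len(p_values)
    total = sum(weights)
    # Map each p-value to a standard Cauchy variate; a small p gives a large positive value.
    statistic = sum(
        w * math.tan((0.5 - p) * math.pi) for w, p in zip(weights, p_values)
    ) / total
    # Upper-tail probability of the standard Cauchy distribution.
    return 0.5 - math.atan(statistic) / math.pi


# One strong and one weak test: the combined p-value stays close to the strong one.
combined = cauchy_combination([0.001, 0.8])
```

Because the Cauchy distribution is heavy-tailed, the combined p-value is dominated by the smallest inputs, which suits a setting where only a few of the weight hypotheses are expected to fit a given gene.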

Jeremy Schwartzentruber

@jeremy37.bsky.social

Similar to STAAR, we ran many burden/SKAT tests with different weights. But in our case, we used a single high-performing variant effect predictor (as well as MAF), which we transformed to weights based on plausible relationships between score and biological effect.

November 5, 2025 at 10:29 PM
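
To make the idea of score-to-weight transformations concrete, here is a rough sketch of a weighted burden score. Everything in it is an assumption for illustration: the three transformations, the Beta(1, 25) MAF weighting (a common SKAT-style choice), and the way the two weights are multiplied together are not taken from the thread.

```python
def beta_1_25_density(maf):
    """Beta(1, 25) density at maf; it up-weights rarer variants."""
    return 25.0 * (1.0 - maf) ** 24


# Hypothetical transformations of a variant effect score in [0, 1].
TRANSFORMS = {
    "linear": lambda s: s,
    "squared": lambda s: s ** 2,
    "threshold_0.8": lambda s: 1.0 if s >= 0.8 else 0.0,
}


def burden_scores(genotypes, scores, mafs, transform):
    """Per-individual weighted sum of allele counts for one gene.

    genotypes: one list of allele counts per individual, one entry per variant.
    scores, mafs: per-variant effect score and minor allele frequency.
    """
    # Assumed combination: transformed score times the MAF weight.
    weights = [transform(s) * beta_1_25_density(m) for s, m in zip(scores, mafs)]
    return [sum(w * g for w, g in zip(weights, person)) for person in genotypes]


# Two individuals, two variants; toy values only.
genotypes = [[1, 0], [0, 1]]
scores = [0.9, 0.5]
mafs = [0.001, 0.01]
by_transform = {
    name: burden_scores(genotypes, scores, mafs, fn)
    for name, fn in TRANSFORMS.items()
}
```

Each entry of `by_transform` would feed one burden test, so each transformation yields its own p-value per gene.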

Jeremy Schwartzentruber

@jeremy37.bsky.social

We thought that the relationship between variant effect predictions and traits might be nonlinear. Indeed, just plotting PrimateAI3D scores against various measurements in UK Biobank shows that it can be. The same is true for AlphaMissense and other scores.

November 5, 2025 at 10:29 PM
