Jeremy Schwartzentruber
jeremy37.bsky.social
Scientist at Illumina using AI methods to interpret the non-coding genome and empower new genetic association discoveries.
Big thanks to co-author Jacob Ulirsch who originated the idea for FlexRV, and to Kyle Farh for leadership.
Thanks also to those who found me at ASHG and discussed our results!
@gagneurlab.bsky.social, Hilary Finucane, Luke O’Connor, Po-Ru Loh.
Happy for thoughts / comments / questions from anyone.
November 5, 2025 at 10:29 PM
If you’re curious for more details, check out the paper! FlexRV will soon be available on GitHub (link to follow), and will be free for academic use. www.biorxiv.org/content/10.1...
Flexibly Modeling Rare Variant Pathogenicity Improves Gene Discovery for Complex Traits
Rare variant burden tests can directly identify genes that influence complex traits, but their power is limited by our ability to separate functional from benign alleles. We introduce FlexRV, an appro...
www.biorxiv.org
November 5, 2025 at 10:29 PM
We trained scores for ~60 quantitative traits in 256k UKB individuals, and found that they correlated more strongly with measured trait values in the 64k test set than scores based on raw variant annotations did. Individuals with high FlexRV-PRS were also more enriched for outlier trait values.
November 5, 2025 at 10:29 PM
Using variant weights from FlexRV we can train improved rare variant polygenic scores (PGS). We first split UKB into train/test sets. Then, for significant genes found in the training set, we assume that the variant effect for a carrier relates to the weight from the FlexRV test.
November 5, 2025 at 10:29 PM
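A minimal sketch of how such a rare-variant PGS could be assembled, using made-up genotypes, weights, and gene effect sizes (the gene names, dimensions, and effect values are all hypothetical, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: for each significant gene from the training split,
# keep the best variant weights and the estimated per-gene effect size.
n = 500  # individuals in the test split
genes = {
    # gene -> (genotype matrix n x m_variants, variant weights, gene beta)
    "GENE_A": (rng.binomial(2, 0.005, (n, 6)).astype(float),
               rng.uniform(0, 1, 6), -0.4),
    "GENE_B": (rng.binomial(2, 0.005, (n, 4)).astype(float),
               rng.uniform(0, 1, 4), 0.7),
}

# Rare-variant PGS: sum over genes of (weighted carrier burden x gene effect).
prs = np.zeros(n)
for G, w, beta in genes.values():
    prs += beta * (G @ w)
```

The key point is that a carrier's contribution is scaled by the variant's weight, so two carriers of different variants in the same gene get different score contributions.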
Note that FlexRV is not doing any training to find the “best” variant weights - it is a statistical test where we assume some possible relationships, test them all, and correct for multiple testing. With an implementation based on STAAR, it is fast enough for biobank scale.
November 5, 2025 at 10:29 PM
I visualized all of the variant score / MAF weights as a heatmap, counting the number of times each weight transformation had the lowest p value. Interestingly, highly constrained genes (S_het > 0.05) more often benefit from placing weight on rarer, highly deleterious variants.
November 5, 2025 at 10:29 PM
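The tally behind such a heatmap can be sketched like this, with a toy p-value matrix (the numbers are invented for illustration):

```python
import numpy as np

# Hypothetical p-value matrix: rows = genes, columns = weight
# transformations tested for each gene.
pvals = np.array([
    [0.20, 0.01, 0.50],
    [0.03, 0.40, 0.60],
    [0.70, 0.02, 0.90],
])

# For each gene, which transformation gave the lowest p-value?
best = pvals.argmin(axis=1)
# Count how often each transformation "wins" across genes.
counts = np.bincount(best, minlength=pvals.shape[1])
```

Stratifying these counts by gene constraint (e.g. S_het bins) gives the pattern described above.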
Why is this approach so effective? I think it comes down to the fact that the “annotation → trait” mapping is often nonlinear, and importantly - is different for each gene. Here are a few examples.
November 5, 2025 at 10:29 PM
We also checked whether the FlexRV gene-based associations are enriched for proximity to GWAS hits (which are better powered but don’t give the causal gene directly) or have high PoPS scores (locus-independent GWAS signal), and found better enrichments than other methods.
November 5, 2025 at 10:29 PM
We used multiple approaches to check whether these associations are real. We ran FlexRV in 200k UKB individuals and checked whether these replicated in the reported DeepRVAT results on the full cohort - and found that they replicated at a higher rate than Regenie or STAAR.
November 5, 2025 at 10:29 PM
For example, DeepRVAT is a very cool deep learning method that combines many variant annotations together into a “gene impairment score” for each individual. But for the 28 quantitative traits we tested in UKB, FlexRV found 37% more associations (and 58% more for binary traits)!
November 5, 2025 at 10:29 PM
Each set of weights is a hypothesis about the possible relationship between variants and their effects.
Like STAAR, we combine p-values from these tests together with the Cauchy combination test (CCT). We were surprised by just how much this improved power over other approaches.
November 5, 2025 at 10:29 PM
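The Cauchy combination test itself is compact. A minimal sketch (the standard ACAT/CCT formula with equal weights, not the paper's implementation):

```python
import numpy as np

def cauchy_combination(pvals):
    """Combine p-values with the Cauchy combination test (ACAT/CCT).

    Each p-value is mapped to a standard Cauchy statistic; their mean is
    again approximately standard Cauchy under the null, even when the
    underlying tests are correlated, which is what makes CCT convenient
    for combining many overlapping burden/SKAT tests.
    """
    p = np.asarray(pvals, dtype=float)
    t = np.mean(np.tan((0.5 - p) * np.pi))  # mean Cauchy statistic
    return 0.5 - np.arctan(t) / np.pi       # back-transform to a p-value

# One strong test dominates even when the others are null.
p_combined = cauchy_combination([0.001, 0.5, 0.5])
```

Because a single small p-value drives the statistic, adding extra weight hypotheses costs relatively little power, which fits the "test them all" design.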
Similar to STAAR we ran many burden/SKAT tests with different weights. But in our case, we used a single high-performing variant effect predictor (as well as MAF), which we transformed to weights based on plausible relationships between score and biological effect.
November 5, 2025 at 10:29 PM
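To illustrate what "transforming a score into weights" might look like: a few hypothetical monotone transformations of a single predictor score in [0, 1] (these specific functions are illustrative assumptions; the actual FlexRV transformations are described in the paper):

```python
import numpy as np

# Illustrative weight transformations of a variant effect predictor
# score s in [0, 1]. Each encodes a different assumed relationship
# between predicted pathogenicity and true effect size.
transforms = {
    "linear":    lambda s: s,
    "quadratic": lambda s: s ** 2,                   # down-weights mid-range scores
    "sqrt":      lambda s: np.sqrt(s),               # up-weights mid-range scores
    "step_0.8":  lambda s: (s > 0.8).astype(float),  # hard threshold (mask-like)
}

s = np.array([0.1, 0.5, 0.9])
weights = {name: f(s) for name, f in transforms.items()}
```

Each entry yields a different set of variant weights, and hence a different burden/SKAT test, for the same gene.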
We thought that the relationship between variant effect predictions and traits might be nonlinear. Indeed, just visualizing PrimateAI3D scores vs. various measurements in UK Biobank shows you it can be nonlinear. The same is true for AlphaMissense and other scores.
November 5, 2025 at 10:29 PM
Giving weights to variants is a better approach, since the power of the test is maximal when the weights correspond to the true variant effect sizes. You could use a variant pathogenicity prediction, such as PrimateAI3D - but is this the best effect size estimate for every gene?
November 5, 2025 at 10:29 PM
Rare variants in a gene are usually grouped together and tested as a set for association with a disease or quantitative trait. You can use a “mask”, i.e. choose which variants are rare or damaging enough to include (like Regenie), or you can give a numeric weight to each variant.
November 5, 2025 at 10:29 PM
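The mask vs. weight distinction can be sketched with toy data (genotypes and scores below are random, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical genotype matrix: n individuals x m rare variants in one
# gene (0/1/2 minor-allele counts), plus a pathogenicity score per variant.
n, m = 1000, 8
G = rng.binomial(2, 0.005, size=(n, m)).astype(float)
score = rng.uniform(0, 1, size=m)  # e.g. a PrimateAI3D-like score in [0, 1]

# Mask approach: include only variants passing a threshold, each with weight 1.
mask_burden = G[:, score > 0.8].sum(axis=1)

# Weighted approach: every variant contributes, scaled by its score.
weighted_burden = G @ score

# Either burden would then be tested for association with the trait,
# e.g. by regressing trait on burden plus covariates.
```

The mask throws away sub-threshold variants entirely; the weighted burden keeps them with smaller contributions, which is why the choice of weights matters so much for power.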
I will be at ASHG for the first time in... nearly a decade? Maybe see you there. I actually have an idea it might be worth chatting about.
September 28, 2025 at 11:11 AM
Ah, I was going to guess some random pleiotropic loci like ABO, MHC... but technical artifacts make so much more sense!
September 28, 2025 at 11:09 AM
Correction: they trained on 60% of that 125k sample set... and evaluated on 20% of it, which may be very few cases indeed. Don't they have another ~275k European samples they could have evaluated on?
June 7, 2025 at 2:14 PM
I'm pretty confused by this paper. They're comparing NN-based PGS trained on 125k UKB samples to those from the PGS Catalog, trained on different sample sets... presumably many of which include much larger numbers of cases in case-control GWAS. Is that right? Isn't this an apples-to-oranges comparison?
June 7, 2025 at 2:13 PM