Stephen Burgess
stevesphd.bsky.social
Medical statistician, work with genetic data to disentangle causation from correlation. Author of book on Mendelian randomization.
Feedback is welcome as ever! Thanks to @angzhou.bsky.social for leading this work, and to Haodong, Ash, @amymariemason.bsky.social, Emma, and Elina for input!
January 26, 2026 at 12:19 PM
This makes a substantial difference to estimates for LDL-cholesterol, and a detectable but much smaller difference to estimates for BMI and vitamin D. The obvious limitation is that this correction only works for GxE interactions that we can measure and account for.
January 26, 2026 at 12:19 PM
If we subtract the GxE interaction from the exposure, then we can stratify on this corrected exposure value. This correction is only necessary in the stratification step; the estimation can proceed using the uncorrected exposure values.
January 26, 2026 at 12:19 PM
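A minimal Python sketch of the correction step described above. It assumes a single measured binary modifier `e` (for example, season) and a linear interaction model fitted by ordinary least squares; the function name `remove_gxe`, the model form, and all simulated values are hypothetical and not taken from the pre-print. Subtracting the estimated interaction term makes the genetic association with the corrected exposure the same in both levels of `e`.

```python
import numpy as np


def remove_gxe(g, e, x):
    """Subtract an estimated G x E interaction term from the exposure.

    Hypothetical model: x = a + b*g + c*e + d*g*e + noise, fitted by OLS.
    Returns the corrected exposure x - d_hat*g*e and the estimate d_hat.
    """
    design = np.column_stack([np.ones_like(g), g, e, g * e])
    coef, *_ = np.linalg.lstsq(design, x, rcond=None)
    d_hat = coef[3]
    return x - d_hat * g * e, d_hat


def slope(a, b):
    """OLS slope from regressing b on a."""
    return np.cov(a, b)[0, 1] / np.var(a, ddof=1)


# Simulated data (all values hypothetical): the genetic effect on the
# exposure is 0.3 when e = 0 and 0.3 + 0.4 = 0.7 when e = 1.
rng = np.random.default_rng(2026)
n = 100_000
g = rng.binomial(2, 0.3, size=n).astype(float)   # allele count
e = rng.binomial(1, 0.5, size=n).astype(float)   # binary modifier, e.g. season
x = 0.3 * g + 0.5 * e + 0.4 * g * e + rng.normal(size=n)

x_corrected, d_hat = remove_gxe(g, e, x)

# Genetic association with the raw and the corrected exposure in each level of e.
results = {}
for level in (0.0, 1.0):
    m = e == level
    results[level] = (slope(g[m], x[m]), slope(g[m], x_corrected[m]))
```

In this sketch, `x_corrected` would be used only to assign individuals to strata; the per-stratum estimates would still use the uncorrected `x`, as the post describes.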
For instance, genetic associations with 25(OH)D levels (a biomarker of vitamin D status) are larger in the summer and smaller in the winter, and genetic associations with several traits differ between men and women and across levels of socioeconomic markers.
January 26, 2026 at 12:19 PM
However, this assumption can also be violated. Enter Ang's manuscript! Ang shows that if we can model the heterogeneity in the genetic effect on the exposure, then we can correct for this heterogeneity in the doubly-ranked method.
January 26, 2026 at 12:19 PM
This is a strictly weaker assumption than the constant genetic effect assumption, in that it allows the magnitude of the genetic effect on the exposure to vary, but it still requires some degree of homogeneity in the genetic effect on the exposure.
January 26, 2026 at 12:19 PM
We developed a second method (the doubly-ranked method), which makes the weaker assumption that the ordering of individuals' exposure values would be the same if their genetic instrument were fixed to take any value (the rank-preserving assumption).
January 26, 2026 at 12:19 PM
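For readers unfamiliar with the doubly-ranked method, here is a simplified Python sketch of its stratification step followed by per-stratum ratio estimates. It assumes a single continuous instrument, equal-sized pre-strata and no ties; the function names and simulated data are hypothetical, and the published implementation may handle leftover individuals and ties differently.

```python
import numpy as np


def doubly_ranked_strata(z, x, n_strata):
    """Doubly-ranked stratification (sketch).

    1. Sort individuals by the instrument z and cut them into consecutive
       pre-strata of size n_strata.
    2. Within each pre-stratum, rank by exposure x: the k-th lowest exposure
       goes to stratum k (0-based).
    Individuals left over when n is not a multiple of n_strata get -1.
    """
    n = len(z)
    strata = np.full(n, -1, dtype=int)
    order = np.argsort(z, kind="stable")
    for start in range(0, n - n % n_strata, n_strata):
        block = order[start:start + n_strata]
        strata[block[np.argsort(x[block], kind="stable")]] = np.arange(n_strata)
    return strata


def ratio_estimates(z, x, y, strata, n_strata):
    """Per-stratum ratio estimate cov(z, y) / cov(z, x)."""
    est = []
    for k in range(n_strata):
        m = strata == k
        est.append(np.cov(z[m], y[m])[0, 1] / np.cov(z[m], x[m])[0, 1])
    return np.array(est)


# Hypothetical data: constant genetic effect (so ranks are preserved),
# a confounder u, and a linear causal effect of 0.5.
rng = np.random.default_rng(1)
n, k = 50_000, 5
z = rng.normal(size=n)
u = rng.normal(size=n)
x = 0.5 * z + u + rng.normal(size=n)
y = 0.5 * x + u + rng.normal(size=n)

strata = doubly_ranked_strata(z, x, k)
estimates = ratio_estimates(z, x, y, strata, k)
```

Because every pre-stratum contributes exactly one individual to each stratum, the strata keep roughly the full spread of the instrument, which is what allows a ratio estimate within each stratum.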
While we assessed sensitivity to this assumption in the original methods paper, violations in practice are more severe than those we assessed, and realistic violations of the assumption can lead to substantial bias.
January 26, 2026 at 12:19 PM
Our original method for non-linear Mendelian randomization (residual-stratified method) made a strong and unrealistic assumption that the effect of genetic variants on the exposure is constant for all individuals in the dataset.
January 26, 2026 at 12:19 PM
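A simplified Python sketch of the residual-stratified approach, to make the constant-effect assumption concrete: one slope estimated across everyone is used to remove the genetic contribution before stratifying. The function name, the quantile cut-points and the simulated data are hypothetical.

```python
import numpy as np


def residual_strata(z, x, n_strata):
    """Residual stratification (sketch).

    Removes the genetic contribution using ONE slope estimated across
    everyone (the constant-effect assumption), then cuts the residual
    exposure into quantile groups 0..n_strata-1.
    """
    slope = np.cov(z, x)[0, 1] / np.var(z, ddof=1)
    resid = x - slope * z
    cut_points = np.quantile(resid, np.linspace(0, 1, n_strata + 1)[1:-1])
    return np.searchsorted(cut_points, resid, side="right")


rng = np.random.default_rng(0)
n = 10_000
z = rng.normal(size=n)                # hypothetical genetic score
x = 0.5 * z + rng.normal(size=n)      # constant genetic effect of 0.5
strata = residual_strata(z, x, 5)
counts = np.bincount(strata, minlength=5)
```

If the genetic effect instead varies between individuals, `x - slope * z` still depends on `z` for some people, so the strata are no longer independent of the instrument; this is one way to see why violations of the assumption can bias the stratum-specific estimates.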
All statistical methods make assumptions (and those assumptions are inevitably violated), but the extent to which they are violated, and the impact of that violation on estimates, is often unclear.
January 26, 2026 at 12:19 PM
New pre-print: "Correcting for effect modification in the doubly-ranked non-linear Mendelian randomization method" led by Ang Zhou, available at www.medrxiv.org/content/10.6.... Brief thread:
The doubly-ranked non-linear Mendelian randomization method can yield biased estimates when instrument strength varies across individuals due to gene-environment (GxE) interactions. We propose a simpl...
January 26, 2026 at 12:19 PM
Reposted by Stephen Burgess
LD patterns can make it difficult to select optimal instrumental variables for Mendelian randomization studies. @stevesphd.bsky.social & co's latest article in @hggadvances.bsky.social evaluates the ability of four selection methods to increase instrument strength: bit.ly/4a0yXih #ASHG
January 22, 2026 at 9:17 PM
Thanks to Benji and others (@amymariemason.bsky.social, Chin Yang, Hyunseung, Hannah, and Marcus) for working on this! Great to see this published!
January 21, 2026 at 12:23 PM
We apply these methods to parental and offspring smoking behaviour - we saw some evidence for a direct effect of parental smoking status on offspring smoking status, although with wide confidence intervals for several methods.
January 21, 2026 at 12:23 PM
We present various methods that can be used in this setting depending on the format of data available (individual-level or summarized), who you have data on (both parents or one parent), and the assumptions you are willing to make (e.g. is assortative mating likely?).
January 21, 2026 at 12:23 PM
The idea of this work is to exploit randomness not only in whether you inherit a genetic variant, but also in whether you do not inherit a genetic variant from a parent. This enables estimation not only of the effect of an exposure, but also of the direct effect of a parent's exposure.
January 21, 2026 at 12:23 PM
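A toy Python simulation of the non-inherited-variant idea, assuming one parent, continuous haplotype scores, random mating (no assortative mating) and a linear model; every parameter value is hypothetical. The non-transmitted haplotype affects the offspring outcome only through the parent's exposure, so its Wald ratio targets the direct effect of the parental exposure, whereas a naive regression of offspring outcome on parental exposure also picks up the transmitted genes that parent and offspring share.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200_000
beta_g = 0.4   # hypothetical per-haplotype effect on the exposure
gamma = 0.2    # hypothetical direct effect of parental exposure on offspring outcome

# Two haplotype scores for one parent; which one is transmitted is a coin flip.
h1, h2 = rng.normal(size=(2, n))
transmitted_first = rng.random(n) < 0.5
t = np.where(transmitted_first, h1, h2)    # transmitted haplotype score
nt = np.where(transmitted_first, h2, h1)   # non-transmitted haplotype score
other = rng.normal(size=n)                 # haplotype from the other parent (random mating)

x_parent = beta_g * (h1 + h2) + rng.normal(size=n)
y_child = beta_g * (t + other) + gamma * x_parent + rng.normal(size=n)


def wald_ratio(z, x, y):
    """Ratio of instrument-outcome to instrument-exposure covariance."""
    return np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]


gamma_hat = wald_ratio(nt, x_parent, y_child)                             # close to gamma
naive = np.cov(x_parent, y_child)[0, 1] / np.var(x_parent, ddof=1)       # biased upwards
```

Under assortative mating the non-transmitted haplotype would also correlate with the other parent's genotype, which is why the paper treats that setting separately.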
All models are wrong and all instruments are invalid, but randomness inherent in how genetic variants are inherited means that genetic variants are often plausible instruments, particularly in within-family settings.
January 21, 2026 at 12:23 PM
New paper: "Extending the Use of Mendelian Randomisation With Non-Inherited Variants to Assess Socially Transmitted Parental Exposures Under Assortative Mating" published in Genetic Epidemiology and led by Benji Woolf: onlinelibrary.wiley.com/doi/10.1002/.... Brief thread:
January 21, 2026 at 12:23 PM
In summary, there are different methods for variant selection in cis-MR - they may result in more precise estimates, but: 1) always benchmark against the lead-variant estimate, and 2) biological considerations generally trump statistical considerations - more relevant > more precise!
January 19, 2026 at 12:24 PM
Of course, statistical considerations are only part of the story - if we have a single variant that is known to better mimic the intervention of interest (e.g. loss of function) or associates with a positive control outcome, it may be better to use that variant.
January 19, 2026 at 12:24 PM
Our advice is to benchmark findings against the lead-variant-only analysis - if your chosen approach using multiple variants gives somewhat narrower CIs, then that could be reasonable. If it gives much more precise estimates, then something may have gone wrong numerically.
January 19, 2026 at 12:24 PM
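A Python sketch of this benchmarking step, assuming summary statistics, the standard inverse-variance weighted (IVW) estimator for correlated variants (generalised least squares with first-order weights), and a crude lead-variant choice. The summary statistics and the cut-off of 2 for flagging a suspiciously large precision gain are hypothetical, not recommendations from the paper.

```python
import numpy as np


def ivw_correlated(bx, by, se_y, rho):
    """IVW estimate for correlated variants (generalised least squares)."""
    omega = np.outer(se_y, se_y) * rho          # covariance of the by estimates
    w = np.linalg.solve(omega, bx)              # omega^{-1} bx
    precision = bx @ w
    return (w @ by) / precision, 1.0 / np.sqrt(precision)


# Hypothetical summary statistics for three variants in one gene region.
bx = np.array([0.10, 0.08, 0.05])               # variant-exposure associations
by = 0.2 * bx                                   # variant-outcome associations
se_y = np.array([0.005, 0.005, 0.005])
rho = np.array([[1.0, 0.6, 0.3],
                [0.6, 1.0, 0.4],
                [0.3, 0.4, 1.0]])              # variant correlation matrix

lead = int(np.argmax(np.abs(bx) / se_y))        # crude lead-variant choice
est_lead = by[lead] / bx[lead]                  # ratio estimate
se_lead = se_y[lead] / abs(bx[lead])            # first-order standard error

est_multi, se_multi = ivw_correlated(bx, by, se_y, rho)
gain = se_lead / se_multi                       # > 1 means narrower CIs
suspicious = gain > 2.0                         # hypothetical cut-off for "much more precise"
```

Here the multi-variant estimate is only modestly more precise than the lead-variant estimate, which is the kind of gain the post describes as reasonable.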
Pruning approaches with high correlation thresholds tended to give the largest R^2 statistics, but could be over-optimistic. Conditional approaches (such as COJO) performed slightly less well in terms of R^2, but were more reliable. PCA was a reasonable alternative; SuSiE gave highly variable results.
January 19, 2026 at 12:24 PM
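To make "pruning with a correlation threshold" concrete, here is a greedy Python sketch in the style of clumping: variants are visited from strongest to weakest association and kept only if their r^2 with every variant already kept is below the threshold. The p-values and LD matrix are hypothetical; raising the threshold retains more variants.

```python
import numpy as np


def prune(pvals, ld, r2_threshold):
    """Greedy LD pruning: walk variants from strongest to weakest association
    and keep one only if its r^2 with every already-kept variant is below the
    threshold. ld holds pairwise correlations r."""
    kept = []
    for j in np.argsort(pvals):
        if all(ld[j, k] ** 2 < r2_threshold for k in kept):
            kept.append(int(j))
    return kept


pvals = np.array([1e-20, 1e-15, 1e-10, 1e-8])   # hypothetical p-values
ld = np.array([[1.0, 0.9, 0.2, 0.5],
               [0.9, 1.0, 0.3, 0.4],
               [0.2, 0.3, 1.0, 0.1],
               [0.5, 0.4, 0.1, 1.0]])

print(prune(pvals, ld, 0.1))   # [0, 2]
print(prune(pvals, ld, 0.5))   # [0, 2, 3]
print(prune(pvals, ld, 0.9))   # [0, 1, 2, 3]
```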
However, we show in simulations that selecting too many variants can lead to over-estimating the variance explained - this is typically due to errors (e.g. population mismatch) or variability (e.g. small sample size) in the variant correlation matrix.
January 19, 2026 at 12:24 PM
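A small Python illustration of how an error in the variant correlation matrix can inflate the estimated variance explained. It assumes standardised genotypes and exposure, so the variance explained by the variants jointly is r' R^{-1} r, where r holds the marginal variant-exposure correlations and R is the variant correlation matrix. Both matrices and the joint effects are hypothetical, with the second matrix standing in for a mismatched reference panel.

```python
import numpy as np


def variance_explained(r, ld):
    """Joint variance explained from marginal (standardised) associations r
    and the variant correlation matrix ld: r' ld^{-1} r."""
    return float(r @ np.linalg.solve(ld, r))


ld_true = np.array([[1.0, 0.8], [0.8, 1.0]])
alpha = np.array([0.1, 0.1])    # hypothetical joint (conditional) effects
r = ld_true @ alpha             # marginal associations implied by the true LD

ld_ref = np.array([[1.0, 0.4], [0.4, 1.0]])   # hypothetical mismatched reference panel

r2_true = variance_explained(r, ld_true)   # equals alpha' ld_true alpha = 0.036
r2_ref = variance_explained(r, ld_ref)     # about 0.046: over-estimated
```

The same marginal associations give a larger variance explained when the assumed correlation between the variants is too low, because the method then treats their signals as more independent than they really are.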
For 15 gene regions, we show that selecting multiple variants can explain a greater proportion of variance in the exposure, leading to more precise MR estimates - in some cases, the gain is trivial; but in others, the gain in efficiency is substantial.
January 19, 2026 at 12:24 PM
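Why a larger variance explained translates into more precise estimates can be seen from a standard large-sample approximation for two-stage least squares with individual-level data and homoscedastic errors (a textbook result, not one taken from the paper). Here σ²_ε is the outcome error variance, N the sample size and R²_{X|G} the proportion of exposure variance explained by the variants:

```latex
\operatorname{Var}\big(\hat{\beta}_{\mathrm{IV}}\big) \approx \frac{\sigma^2_{\varepsilon}}{N \, R^2_{X \mid G} \, \operatorname{Var}(X)},
\qquad
\frac{\operatorname{SE}_{\text{multi}}}{\operatorname{SE}_{\text{lead}}} \approx \sqrt{\frac{R^2_{\text{lead}}}{R^2_{\text{multi}}}}
```

So, with N and the error variance unchanged, doubling the variance explained would shrink the standard error by a factor of about 1/√2 ≈ 0.71, i.e. roughly 29% narrower confidence intervals, while a small gain in R² gives a correspondingly trivial gain in precision.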
When multiple variants in a gene region associate with an exposure (that is, a biomarker of the mechanism of interest), how do we choose which of the variants to include in an MR analysis? We address this question from a statistical perspective.
January 19, 2026 at 12:24 PM