Stephen Burgess
@stevesphd.bsky.social
Medical statistician, work with genetic data to disentangle causation from correlation. Author of book on Mendelian randomization.
Thanks to @hwang_seongwon for leading the project, to @jeffreypullin.bsky.social for performing code review, and to @chr1swallace.bsky.social and John Whittaker for co-supervising - has been a fun project so far, and look forward to getting feedback from the community!
November 8, 2025 at 3:40 PM
However, like all statistical methods, it has limitations, and results should not be thought of as unquestionable truth. It is likely that the differences between datasets in other applications are similar or stronger than those we considered here.
November 8, 2025 at 3:40 PM
In conclusion, while all methods were well-calibrated in the baseline scenario, they struggled, to differing degrees, to declare colocalization when the datasets varied in terms of platform and population. Colocalization can be a valuable tool for triaging and prioritizing.
November 8, 2025 at 3:40 PM
This was not intended to be a fair comparison - fairness is impossible to achieve. For example, coloc-SuSiE was judged to support colocalization if there was high PP.H4 for any pair of credible sets. Rather, we wanted to compare methods as they would typically be used.
November 8, 2025 at 3:40 PM
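The "any pair of credible sets" rule described for coloc-SuSiE can be sketched as a simple decision function. This is only an illustration, not the authors' code: the function name, the input format (one PP.H4 value per credible-set pair), and the 0.8 cut-off are all assumptions, since the post says only "high PP.H4".

```python
from typing import Iterable


def supports_colocalization(pp_h4_by_pair: Iterable[float],
                            threshold: float = 0.8) -> bool:
    """Return True if any credible-set pair has PP.H4 >= threshold.

    `threshold` is a hypothetical cut-off; the source only says "high PP.H4".
    """
    return any(pp >= threshold for pp in pp_h4_by_pair)


# Hypothetical PP.H4 values for three credible-set pairs in one region.
example_region = [0.05, 0.91, 0.30]
print(supports_colocalization(example_region))
```

Because a single high-scoring pair is enough, a rule of this form becomes more permissive as the number of credible-set pairs grows, which is one reason the comparison between methods cannot be entirely even-handed.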
We acknowledge that there are many legitimate reasons why we may observe non-colocalization for the same protein when using estimates from different platforms / populations. Also, we acknowledge that different methods use different standards of evidence.
November 8, 2025 at 3:40 PM
Enumeration methods tended to outperform proportional methods in most scenarios. However, no single approach dominated in all scenarios, with coloc-SuSiE reporting the highest rate of colocalization in Case 1, Case 2B, and Case 4; colocPropTest in Case 2F; and coloc in Case 3.
November 8, 2025 at 3:40 PM
In these cases, results were more mixed. We observed frequent disagreement between methods as to whether there was colocalization, non-colocalization, or insufficient evidence. In the worst-case scenario, all four methods agreed on colocalization for only 20% of proteins.
November 8, 2025 at 3:40 PM
We then consider associations with the same protein, but measured on different platforms (Olink vs SomaLogic in British [Case 2B] and Finnish [Case 2F] populations), and measured in different populations (British vs Finnish for Olink [Case 3] and SomaLogic [Case 4]).
November 8, 2025 at 3:40 PM
In the baseline context, we split the UK Biobank Pharma Proteomics Project in two at random, and tested associations for the same protein in one half of the data versus the other half of the data (Case 1). Unsurprisingly, all methods performed well in this context.
November 8, 2025 at 3:40 PM
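The baseline design (Case 1) rests on splitting one cohort into two random halves. A minimal sketch of such a split follows; the function name, the seed, and the toy participant IDs are illustrative assumptions and do not reflect how the authors actually partitioned the data.

```python
import random
from typing import Hashable, List, Sequence, Tuple


def split_in_half(sample_ids: Sequence[Hashable],
                  seed: int = 2025) -> Tuple[List[Hashable], List[Hashable]]:
    """Randomly split sample IDs into two non-overlapping halves."""
    ids = list(sample_ids)          # copy so the caller's data is untouched
    rng = random.Random(seed)       # seeded for a reproducible split
    rng.shuffle(ids)
    mid = len(ids) // 2
    return ids[:mid], ids[mid:]


# Toy example with ten hypothetical participant IDs.
half_a, half_b = split_in_half(range(10))
print(len(half_a), len(half_b))
```

Association estimates would then be computed separately in each half, so that both sets of estimates come from the same platform and population and differ only by sampling variation.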
We perform colocalization for protein-coding gene regions with ≥1 pQTL across four datasets using four colocalization methods: coloc, coloc-SuSiE, prop.coloc, and colocPropTest in a range of contexts.
November 8, 2025 at 3:40 PM
Big thanks to all co-authors for contributing to this: @amymariemason.bsky.social, @VerenaZuber, @explodecomputer, Elena, @IamYuXu, Amanda, @BarWoolf, @eliasallara, @dpsg108, and @OpeSoremekun. Feedback would be very welcome!
October 27, 2025 at 8:43 AM
Critical is what we can assume is shared between populations, and what is different - are we clear what we are assuming can be borrowed? And is it reasonable to borrow that information?
October 27, 2025 at 8:43 AM
When analysing non-European data, there is often a compromise between only including the most relevant data to the target population, and including all available data from any population - we describe some approaches to this taken in the literature.
October 27, 2025 at 8:43 AM
The green dashed arrows indicate potential mechanisms that would lead to heterogeneity and hence differences in MR estimates between populations - examples of each are given in Table 1.
October 27, 2025 at 8:43 AM
There are many reasons why an MR estimate (or any epidemiological estimate) may differ between populations. We would opine that a true biological difference between population groups is rarely the most likely explanation for a difference.
October 27, 2025 at 8:43 AM
The target population will likely depend on the question. For environmental exposures, geographic definitions may be best. For socially patterned exposures, cultural (ethnic) definitions. For genetic exposures, ancestral definitions.
October 27, 2025 at 8:43 AM
For instance, if we say "South Asians are at elevated risk of COVID-19", do we mean individuals living in South Asia? Do we mean individuals with South Asian ancestral heritage? Or do we mean individuals following South Asian cultural practices? These populations overlap, but they are distinct.
October 27, 2025 at 8:43 AM
Step 1 is to think carefully about what population our datasets represent, and what population we want our analysis to represent. There are many ways to define a population of interest.
October 27, 2025 at 8:43 AM
A common question on our MR course is "How to perform Mendelian randomization with non-European data?". This manuscript is our answer to that question.
October 27, 2025 at 8:43 AM
Thanks to Janne for leading this, and the team at @FinnGen_FI led by @johanneskettune for allowing us to perform bespoke analyses in their cohort!
October 7, 2025 at 8:59 AM
Negative studies are difficult to interpret (and publish) - there are legitimate reasons why the result may not replicate in a different study population. However, we did not see encouraging evidence from our attempted replication analysis.
October 7, 2025 at 8:59 AM
While we cannot rule out low power, we did not find any associations between PCSK9 variants and breast cancer survival in datasets other than the original Cell paper. In contrast, variants in the HMGCR gene were associated with breast cancer survival.
October 7, 2025 at 8:59 AM
For the BCAC data, we weren't able to replicate the original analysis exactly - we couldn't restrict to older women, or those with Stage 2/3 cancer, or consider a recessive model. For FinnGen, we were able to replicate the original analysis exactly - but the sample size was much lower.
October 7, 2025 at 8:59 AM