Nicola Pirastu
npirastu.bsky.social
It would be great if all cohorts used this method to share their LD information, as it would allow everyone to run post-GWAS analyses more reliably and precisely.
The software to convert the VCFs is simple to use, fast and memory efficient.
Please reach out for comments or questions.
October 21, 2025 at 9:47 AM
Finally, we show that this approach can also be applied to existing summary statistics (SAFE-LDss), as long as there are enough traits and no true SNP-trait associations, which need to be filtered out.
This can be used to retrieve LD from Omics data sets such as expression or proteomics.
October 21, 2025 at 9:47 AM
We also show that, with a large enough number of traits, finemapping results with SuSiE are comparable to those obtained with in-sample LD in terms of precision and power, and better than those obtained with a reference sample taken from the same cohort.
October 21, 2025 at 9:47 AM
We performed extensive simulations showing that the resulting correlation matrix is nearly identical to the true original one.
October 21, 2025 at 9:47 AM
The idea behind SAFE-LD is simple: perform a large number of GWAS on randomly generated traits and use the correlation between the effect sizes to compute the correlation between the SNPs. Once computed, the betas are rescaled to lie between 0 and 2 to simulate imputed dosages and saved in a new VCF.
October 21, 2025 at 9:47 AM
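To make the idea in the post above concrete, here is a minimal NumPy sketch under stated assumptions. It is not the SAFE-LD software: `safe_genotypes` and `simulate_dosages` are hypothetical helper names, the genotypes are simulated rather than read from a VCF, and the marginal regressions are done with a single matrix product instead of a real GWAS tool. It only illustrates that the correlation between betas across many random traits approximates the correlation between SNPs, and that rescaling each SNP's betas to the 0 to 2 range leaves that correlation unchanged.

```python
import numpy as np


def safe_genotypes(dosages, n_traits, seed=0):
    """Hypothetical helper: map a (samples x SNPs) dosage matrix to a
    (traits x SNPs) matrix of GWAS betas rescaled to the range [0, 2]."""
    rng = np.random.default_rng(seed)
    n_samples = dosages.shape[0]

    # Standardise each SNP so the marginal OLS slope is a scaled dot product.
    g = dosages - dosages.mean(axis=0)
    g = g / g.std(axis=0)

    # Random traits with no true SNP effect (assumed standard normal here).
    traits = rng.standard_normal((n_samples, n_traits))

    # Marginal regression beta for every (trait, SNP) pair.
    betas = traits.T @ g / n_samples

    # Rescale each SNP's betas to [0, 2] to mimic imputed dosages.
    # A per-SNP positive affine rescaling does not change correlations.
    lo = betas.min(axis=0)
    hi = betas.max(axis=0)
    return 2.0 * (betas - lo) / (hi - lo)


def simulate_dosages(n_samples, n_snps, rho=0.8, seed=1):
    """Hypothetical helper: toy correlated dosages built from two haplotypes."""
    rng = np.random.default_rng(seed)

    def haplotypes():
        # AR(1) latent normals, thresholded into 0/1 alleles.
        z = np.empty((n_samples, n_snps))
        z[:, 0] = rng.standard_normal(n_samples)
        for j in range(1, n_snps):
            noise = rng.standard_normal(n_samples)
            z[:, j] = rho * z[:, j - 1] + np.sqrt(1 - rho**2) * noise
        return (z > 0.5).astype(float)

    return haplotypes() + haplotypes()


dosages = simulate_dosages(n_samples=300, n_snps=6)
safe = safe_genotypes(dosages, n_traits=10000)

ld_true = np.corrcoef(dosages, rowvar=False)  # LD from the original dosages
ld_safe = np.corrcoef(safe, rowvar=False)     # LD from the rescaled betas
max_diff = np.abs(ld_true - ld_safe).max()
print(f"largest absolute difference in r: {max_diff:.4f}")
```

In this sketch the rows of `safe` play the role of pseudo-individuals: each row is one random trait, not a person, so the output carries the correlation structure without any per-sample genotype.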
The SAFE-genotypes are completely anonymous and do not allow for re-identification, as they have no link with the original genotypes beyond retaining the correlation structure. In fact, even MAF is lost in the process of creating them.
October 21, 2025 at 9:47 AM
To overcome this limitation, we have developed SAFE-LD, which takes the genotypes in VCF dosage format and converts them into SAFE-genotypes. These can be used with available software like plink to compute LD that is extremely precise compared to the original and produces comparable results in downstream analyses.
October 21, 2025 at 9:47 AM
Several methods, such as DENTIST, have been developed to try to account for this by detecting LD mismatches and removing them.
This is, however, non-optimal, as these methods are only partly successful and end up reducing the available data.
October 21, 2025 at 9:47 AM
Most people thus rely on reference genotypes, trying to match the reference samples as closely as possible to the ones used for the initial GWAS. This greatly influences the results, especially when finemapping loci with strong effects, producing a very large number of false positives.
October 21, 2025 at 9:47 AM
As most of you know, many post-GWAS analyses rely on summary statistics and LD matrices for tasks like finemapping, colocalization, PRS weight computation and others.
In-sample LD is often not available, as it would require sharing either raw genotypes or extremely large precomputed matrices.
October 21, 2025 at 9:47 AM
How does warhammer count as a pseudo religion?? I may have missed something
October 5, 2025 at 11:43 AM
So you’re saying this would be mostly driven by people moving for educational purposes and meeting their spouse there. I guess it also fits very well with the inbreeding depression observation. It would be interesting to repeat this in Italy, where this is much more recent.
September 28, 2025 at 10:16 AM
Very cool. I would have expected assortative mating to be stronger in past generations given society was more structured than today.
September 26, 2025 at 6:50 PM
📌
August 22, 2025 at 11:01 PM
Is there a paper anywhere or is this a preview?
August 22, 2025 at 5:18 PM
Also, it gives an idea of the sample size needed for these models to start working. I am not sure we will ever reach such numbers for many applications (at least not very soon).
August 22, 2025 at 6:28 AM
Good luck in this new adventure!
August 20, 2025 at 4:41 PM
📌
August 16, 2025 at 7:55 AM
📌
August 12, 2025 at 7:23 PM
📌
August 7, 2025 at 8:47 AM