Elizabeth Atkinson
@egatkinson.bsky.social
Population and statistical genomicist working to make genomics fully representative. Views are my own. (she/her)
Since we only include variants with MAF ≥0.1% in this article, we can't address ultra-rare variants, but check out Supp Fig 3: when comparing ancestry-specific AFs, many variants deviate from the 1:1 line. We plotted this on the log₁₀(AF) scale to help magnify the low-frequency range.
October 10, 2025 at 3:32 PM
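A minimal sketch of the comparison described in the post above: keep variants at or above the 0.1% MAF cutoff, then measure how far each one sits from the 1:1 line on the log₁₀(AF) scale. The variant IDs and frequencies are made up for illustration, and applying the filter to an overall (pooled) allele frequency is an assumption; the post does not say which frequency the cutoff was applied to.

```python
import math

MAF_THRESHOLD = 0.001  # 0.1% minor allele frequency, the cutoff named in the post


def maf(af):
    """Minor allele frequency computed from an alternate allele frequency."""
    return min(af, 1.0 - af)


def log10_af_deviation(af_a, af_b):
    """Signed distance from the 1:1 line on the log10(AF) scale.

    Returns None when either frequency is zero, because log10 is undefined there.
    """
    if af_a <= 0 or af_b <= 0:
        return None
    return math.log10(af_a) - math.log10(af_b)


# Hypothetical variants: (id, AF in ancestry A, AF in ancestry B, overall AF).
variants = [
    ("var1", 0.002, 0.020, 0.010),     # 10-fold difference between ancestries
    ("var2", 0.300, 0.310, 0.305),     # close to the 1:1 line
    ("var3", 0.0004, 0.0006, 0.0005),  # below the MAF cutoff, dropped
]

# Assumption: the MAF filter is applied to the overall allele frequency.
kept = [v for v in variants if maf(v[3]) >= MAF_THRESHOLD]
deviations = {vid: log10_af_deviation(a, b) for vid, a, b, _ in kept}

for vid, dev in deviations.items():
    print(vid, dev)
```

A deviation of 0 means the variant sits on the 1:1 line; a deviation of -1 or +1 means a 10-fold frequency difference between the two ancestries.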
To limit the noise from ultra-rare alleles we only looked at variants ≥0.1% MAF. Totally appreciate that's still quite low frequency, but even with that filter, we still saw the noted ancestry-specific frequency differences.
October 10, 2025 at 3:01 PM
Great point; we thought about that too! Pragati stratified by whether variants were monomorphic or not to capture at least that aspect, but you’re right that the impact depends on where a variant sits on the SFS. Rare ones can show big fold-changes but small absolute shifts.
October 10, 2025 at 2:58 PM
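To make the last point in the post above concrete, here is a small worked example with made-up allele frequencies (not values from the paper). It contrasts the fold-change and the absolute shift for a rare and a common variant.

```python
def fold_change(af_a, af_b):
    """Ratio of the larger frequency to the smaller one (always >= 1)."""
    return max(af_a, af_b) / min(af_a, af_b)


def absolute_shift(af_a, af_b):
    """Absolute difference between the two frequencies."""
    return abs(af_a - af_b)


# Hypothetical frequency pairs (ancestry A, ancestry B), for illustration only.
rare = (0.001, 0.005)    # low end of the site frequency spectrum
common = (0.30, 0.35)    # common variant

for label, (a, b) in (("rare", rare), ("common", common)):
    print(label, round(fold_change(a, b), 2), round(absolute_shift(a, b), 3))
```

The rare pair differs by about 5-fold but shifts by only 0.004 in absolute terms, while the common pair shifts by 0.05 with a fold-change under 1.2.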
Thanks for the interest! The tutorial code is available to download as supplemental information of the paper, and has been deposited as a community workspace in the All of Us Researcher Workbench.
July 23, 2025 at 3:05 PM
In summary, we present a replicable training model that empowers early-career researchers - including and especially those new to computational genomics - to responsibly integrate large-scale biobank data into their research programs and teaching.
July 22, 2025 at 4:36 PM
Across years 1–3, outcomes that scholars reported as stemming directly from this training included:
📊 17 conference presentations
🔬 Multiple funded research grants
🎓 Numerous genomics modules added in undergrad courses
🤝 Sustained collaborations across institutions
July 22, 2025 at 4:36 PM
During the summit, scholars used real short-read WGS data to:
• Prepare phenotypes & covariates
• Run GWAS via Hail
• Visualize results with PCA, Manhattan & QQ plots
• Manage compute costs
All in ~4 hours with no prior coding required.
July 22, 2025 at 4:36 PM
Our training was part of the All of Us Biomedical Researcher Scholars Program through @bcmgenetics.bsky.social, which focuses on mentoring early-stage faculty in genomic data science. The curriculum launches with an intensive Faculty Summit, where scholars get hands-on experience working with genomic data.
July 22, 2025 at 4:36 PM
Access to big genomic data is growing, but parallel access to skills needed to use it hasn’t kept up.
We created an accessible, cloud-based genomic analysis training bootcamp using real All of Us data, Jupyter notebooks, and the Hail framework to lower the barrier for early-career researchers.
July 22, 2025 at 4:36 PM
Tractor-Mix builds on Tractor’s strengths to detect ancestry-enriched signals while adding power and robust false-positive control for relatedness via a GRM. By modeling both admixture and relatedness, it overcomes key GWAS barriers and enables more accurate, representative genomic discovery.
June 9, 2025 at 6:31 PM
Tractor-Mix uses ancestry-specific genotypes as predictors, outputting ancestry-specific effect sizes and P values. We benchmark our new tool in simulations and apply it to multiple admixed cohorts (including UK Biobank and the Mexico City Prospective Study), uncovering signals missed by standard GWAS.
June 9, 2025 at 6:31 PM
In this work, we introduce Tractor-Mix, a new GWAS method that extends Tractor to handle related admixed samples. It combines a mixed model framework (like GMMAT) with local ancestry-aware genotypes (like Tractor) in a 2 d.o.f. test.
June 9, 2025 at 6:31 PM
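As a rough illustration of what "local ancestry-aware genotypes" means in the post above, the sketch below splits one person's genotype at a variant into per-ancestry alternate-allele counts, using phased haplotypes that each carry a local ancestry label. This is not the Tractor-Mix code: the function name, input layout, and two-ancestry setup are assumptions made for illustration. In a two-way admixed sample, the two resulting dosage columns are the predictors whose effects a 2 d.o.f. test would assess jointly.

```python
def ancestry_specific_dosages(haplotypes, n_ancestries=2):
    """Count alternate alleles separately on each local ancestry background.

    haplotypes: iterable of (alt_allele, ancestry_index) pairs, one per
    haplotype (two per diploid individual); alt_allele is 0 or 1.
    Returns a list with one dosage per ancestry.
    """
    dosages = [0] * n_ancestries
    for alt_allele, ancestry_index in haplotypes:
        dosages[ancestry_index] += alt_allele
    return dosages


# Hypothetical individuals at a single variant (ancestry 0 and ancestry 1).
samples = {
    "person1": [(1, 0), (1, 1)],  # one alt allele on each ancestry background
    "person2": [(0, 0), (1, 1)],  # alt allele only on the ancestry-1 haplotype
    "person3": [(1, 0), (1, 0)],  # both alt alleles on ancestry-0 haplotypes
}

dosage_table = {pid: ancestry_specific_dosages(h) for pid, h in samples.items()}
for pid, dosages in dosage_table.items():
    print(pid, dosages)
```

The per-ancestry dosages always sum to the ordinary genotype dosage (0, 1, or 2), so the split adds ancestry context without changing the underlying genotype.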
As biobanks and global cohorts grow, so does the inclusion of admixed individuals with close or cryptic relatedness. This introduces the statistical challenge of two interwoven sources of stratification: admixture and relatedness, which are rarely handled together.
June 9, 2025 at 6:31 PM
We previously developed Tractor, a local ancestry-aware GWAS method that’s been widely used to uncover ancestry-enriched signals and refine genetic architecture in admixed populations. But Tractor (being a GLM) only works on unrelated samples, limiting its use in many real-world datasets.
June 9, 2025 at 6:31 PM
👏 Huge thanks to all our amazing LAGC collaborators! Special shoutout to Estela Bruxel and Diego Rovaris for leading this crucial work, and of course @janitzamontalvo.bsky.social and @giustilab.bsky.social for co-founding the LAGC and co-leading alongside myself. 💪
April 2, 2025 at 3:24 PM
🔍 Why does this matter?
Most psychiatric GWAS are still Euro-centric, limiting the relevance of genetic findings across populations and ancestries. Latin America’s rich genetic, environmental, and cultural diversity presents a unique opportunity to refine genetic discovery & improve global research.
April 2, 2025 at 3:24 PM