April Wei
@aprilwei.bsky.social
Population geneticist, CompBio, Cornell
Excited to preprint our latest work (w/ Drew DeHaas, Zhibai Jia, Leo Speidel) on using ARGs for demographic inference, w/ applications using data from the 1000 Genomes Project. www.biorxiv.org/content/10.1...
Inference of complex demographic history using composite likelihood based on whole-genome genealogies
Accurate parametric inference on complex demographic models is a continuing challenge in population genetics. Ancestral recombination graphs (ARGs) provide richer information than simple population ge...
www.biorxiv.org
October 8, 2025 at 2:48 PM
Very proud of this manuscript with two talented undergraduate students, Aditya Syam and Chris Adonizio. We are continuing to push towards more scalable statistical genetics with Genotype Representation Graphs, and this is the start. www.biorxiv.org/content/10.1...
Fast Phenotype Simulation for Genotype Representation Graphs
Motivation: The Genotype Representation Graph (GRG) [DeHaas et al., 2025] is a graph representation of whole genome polymorphisms, designed to encode the variant hard-call information in phased wh...
www.biorxiv.org
August 25, 2025 at 2:26 PM
Our work (by Drew DeHaas) on an extremely simple yet efficient binary genotype format - designed to facilitate scalable bioinformatics tool development. www.biorxiv.org/content/10.1...
IGD: A simple, efficient genotype data format
Motivation: While there are a variety of file formats for storing reference-sequence-aligned genotype data, many are complex or inefficient. Programming language support for such formats is often limit...
www.biorxiv.org
February 12, 2025 at 1:03 AM
Reposted by April Wei
📢In a recent News & Views, @ryanlayer.bsky.social discusses a data structure introduced by @aprilwei.bsky.social and colleagues for reducing storage and computational costs for phased whole-genome polymorphisms. www.nature.com/articles/s43...
🔓https://rdcu.be/d8ay3
Biologically inspired graphs to explore massive genetic datasets - Nature Computational Science
A recent study proposes a data structure that addresses crucial challenges related to storage and computation of large genome databases.
www.nature.com
January 31, 2025 at 1:56 PM
Our work w/ two co-first authors, Drew DeHaas and Ziqing Pan, is now published. GRG allows large amounts of WGS polymorphism data to be analyzed in RAM via graph traversal & algebra operations. It has some intrinsic connection w/ the popgen data-generating process and is different from an ARG.
December 5, 2024 at 5:09 PM
We introduced an ARG-inspired data structure, the Genotype Representation Graph (GRG), to enable lossless data compression and efficient computation through graph traversal. We also developed a fast inference method. It cost ~80 GBP to convert 350 TB of VCF (200,000 UK Biobank WGS) into a 160 GB GRG.
t.co/0badfCYz47
Genotype Representation Graphs: Enabling Efficient Analysis of Biobank-Scale Data
bioRxiv - the preprint server for biology, operated by Cold Spring Harbor Laboratory, a research and educational institution
t.co
April 29, 2024 at 6:20 PM