Guillaume Holley
@guillaumeolesan.bsky.social
Research Scientist working on pangenomes and long reads at deCODE Genetics. Opinions shared here do not reflect the views of deCODE.

"There is no peace amongst the stars, for in the grim darkness of the far future, there is only war" - W40K
Reposted by Guillaume Holley
Really exciting that the preprint on Barbell, a new demultiplexer, is finally out!
It's the first tool that builds on Sassy, the approximate-DNA-searching tool that @rickbitloo.bsky.social and I developed earlier this year, specifically with this application in mind.
Around 10% of your Nanopore reads (SQK-RBK114) are incorrectly trimmed. Here is why, and how our new tool Barbell solves it:

www.biorxiv.org/content/10.1...

Want to get started? github.com/rickbeeloo/b...
October 23, 2025 at 9:28 PM
Reposted by Guillaume Holley
1/6 Movi 2 is here: faster and more space-efficient for pangenome queries. Its fastest mode uses half the memory of Movi 1 while running ~30% faster. github.com/mohsenzakeri...
GitHub - mohsenzakeri/Movi: Fast, Cache-Efficient, and Scalable Queries on Pangenomes
October 21, 2025 at 8:00 PM
Reposted by Guillaume Holley
I'm excited to share our pre-print about a new variant benchmarking tool we've been working on for the past few months!

Aardvark: Sifting through differences in a mound of variants
GitHub: github.com/PacificBiosc...

Some highlights in this thread:
1/N
October 6, 2025 at 8:07 PM
Reposted by Guillaume Holley
🦒Long read giraffe is out!🦒
Mapping long reads to pangenome graphs is ~10x faster than with GraphAligner, with veeery slightly better mapping accuracy, short variant calling, and SV genotyping than GraphAligner or Minimap2
Rapid, accurate long- and short-read mapping to large pangenome graphs with vg Giraffe https://www.biorxiv.org/content/10.1101/2025.09.29.678807v1
October 2, 2025 at 6:28 AM
Reposted by Guillaume Holley
Delighted to finally announce a preprint describing the Q100 project! “A complete diploid human genome benchmark for personalized genomics”, for which we finished HG002 to near-perfect accuracy: www.biorxiv.org/content/10.1... 🧵[1/14]
A complete diploid human genome benchmark for personalized genomics
Human genome resequencing typically involves mapping reads to a reference genome to call variants; however, this approach suffers from both technical and reference biases, leaving many duplicated and ...
September 22, 2025 at 5:01 PM
Reposted by Guillaume Holley
colorSV: Long-range Somatic Structural Variation Calling from Matched Tumor-normal Co-assembly Graphs. #SomaticStructuralVariants #SV #CoassemblyGraphs #Bioinformatics #Genomics #GenomicsProteomicsBioinformatics
academic.oup.com/gpb/advance-...
September 23, 2025 at 9:15 AM
Reposted by Guillaume Holley
In silico discovery of pathogenic PD-L1 nsSNVs with altered glycosylation and immunotherapy binding https://www.biorxiv.org/content/10.1101/2025.06.17.660108v1
June 19, 2025 at 4:47 PM
Reposted by Guillaume Holley
REINDEER2: practical abundance index at scale https://www.biorxiv.org/content/10.1101/2025.06.16.659990v1
June 17, 2025 at 1:46 PM
Reposted by Guillaume Holley
Congrats to @dantipov.bsky.social et al. on the publication of Verkko2! The team put a ton of work into this, making it the first assembler that deals with the complexity of human acrocentric chromosomes. Lots of interesting discoveries to come! genome.cshlp.org/content/earl...
June 17, 2025 at 1:39 PM
Reposted by Guillaume Holley
📜 Excited to share insights from our recent paper: "Kaminari: a resource-frugal index for approximate colored k-mer queries". The study aims to efficiently identify documents containing a query string, focusing on DNA strings. www.biorxiv.org/content/10.1... 🧬 🖥️ 1/8
May 27, 2025 at 12:06 PM
Reposted by Guillaume Holley
New preprint: we used k-mer matching with suffix match length information to create an assembly-to-assembly alignment algorithm + software, kbo.

We wanted to create a reference-based aligner and variant caller that scales to at least 10-100k bacterial queries.

www.biorxiv.org/content/10.1...
Sequence alignment with k-bounded matching statistics
Finding high-quality local alignments between a query sequence and sequences contained in a large genomic database is a fundamental problem in computational genomics, at the core of thousands of biolo...
May 26, 2025 at 8:00 AM
Reposted by Guillaume Holley
Delighted to see this paper from danderson123.bsky.social's PhD out. We have been building tools for AMR gene detection for over a decade now, but multicopy genes remain challenging. Dan shows that with a gene-space de Bruijn graph and long reads, you can do well.
www.biorxiv.org/content/10.1...
May 19, 2025 at 9:28 AM
Reposted by Guillaume Holley
📢 HPRC Release 2 is here!

Now with phased genomes from 200+ individuals, a 5x increase from Release 1.

Explore sequencing data, assemblies, annotations & alignments in our interactive data explorer ⬇️:

humanpangenome.org/hprc-data-re...
May 12, 2025 at 1:15 PM
Reposted by Guillaume Holley
Not only is this seriously elegant science from @gregfindlay.bsky.social, @nickywhiffin.bsky.social and friends - using saturation editing to define variant impact in RNU4-2 - it also defines *another* new syndrome associated with this fascinating non-coding RNA gene.
April 11, 2025 at 11:13 AM
Reposted by Guillaume Holley
Finally got around to fixing the main limitation of the current vcfdist release: exploding memory usage and runtime in regions with high-density variants. A new `--max-supercluster-size` parameter limits this. Release v2.6.0 is out on [Github](github.com/timd1/vcfdist), DockerHub, and bioconda!
GitHub - TimD1/vcfdist: vcfdist: Accurately benchmarking phased variant calls
April 6, 2025 at 3:21 PM
Reposted by Guillaume Holley
A milestone for our lab! Here's a full access link: rdcu.be/egmYb
April 5, 2025 at 4:40 PM
Reposted by Guillaume Holley
The Genetics research community has a problem. Most recent articles do not consider #splicing/isoforms.

Here, we analyze how important this opportunity gap is - and, spoiler warning, we find that considering isoforms is essential for the analysis of both common and rare variants.

More info👇

www.medrxiv.org/content/10.1...
Beyond the Gene in Genetics: How Isoform-Resolved Analysis Empowers the Study of Both Common and Rare Genetic Variation
Genetics is rapidly deepening our understanding of human health and disease by investigating common and rare genetic variants and their influence on gene expression1,2. Alternative splicing is a molec...
April 2, 2025 at 7:45 AM
Reposted by Guillaume Holley
Congratulations to @imartayan.bsky.social and @curiouscoding.nl, whose paper on fast minimizer computation with SIMD has been accepted to SEA 2025 🙌🏻 www.biorxiv.org/content/10.1...
SimdMinimizers: Computing random minimizers, fast
Motivation Because of the rapidly-growing amount of sequencing data, computing sketches of large textual datasets has become an essential preprocessing task. These sketches are typically much smaller ...
April 1, 2025 at 8:23 AM
Reposted by Guillaume Holley
"Our results reveal substantial differences between pipelines, with many inversions either misrepresented or lost. Most notably, recovery rates remain strikingly low, even with the most simple simulated genome sets, highlighting major challenges in analyzing inversions in pangenomic approaches."
Investigating the topological motifs of inversions in pangenome graphs https://www.biorxiv.org/content/10.1101/2025.03.14.643331v1
March 18, 2025 at 4:53 PM
Reposted by Guillaume Holley
Comparative population pangenomes reveal unexpected complexity and fitness effects of structural variants https://www.biorxiv.org/content/10.1101/2025.02.11.637762v1 🧬🖥️🧪 https://github.com/harvardinformatics/scrub-jay-genomics
February 14, 2025 at 4:30 PM
Reposted by Guillaume Holley
Pangenome graph augmentation from unassembled long reads https://www.biorxiv.org/content/10.1101/2025.02.07.637057v1
February 9, 2025 at 2:50 AM
Reposted by Guillaume Holley
Our study reveals that the MLLT3 gene, crucial for maintaining the self-renewal of some blood stem cells, also produces a truncated version of its protein via an alternative process. REINDEER was the indexing technique behind the discovery, more here: onlinelibrary.wiley.com/doi/10.1002/... 2/2
A strong internal promoter drives massive expression of YEATS‐domain devoid MLLT3 transcripts in HSC and most lethal AML
February 10, 2025 at 3:22 PM