Sergey Nurk
sergeynurk.bsky.social
Principal Bioinformatician
@nanopore. Ex: Postdoctoral fellow @ NIH; Researcher @ CAB. Views are my own; #StandWithUkraine Support Ukraine!
Reposted by Sergey Nurk
1/9 Just out:

k-mer indexes are the backbone of fast search in genomic data, but many degrade under small k, subsampling, or high diversity.

With Ondřej Sladký and @pavelvesely.bsky.social we asked: can we build one that works efficiently for any k-mer set?
🧮 Just out in Bioinformatics Advances: “FroM Superstring to Indexing: A space-efficient index for unconstrained k-mer sets using the Masked Burrows-Wheeler Transform (MBWT)” 

Full article available: https://doi.org/10.1093/bioadv/vbaf290 

Authors include: @pavelvesely.bsky.social, @brinda.eu
December 5, 2025 at 5:42 PM
Reposted by Sergey Nurk
This package to decompose weighted graphs into weighted paths by @alextomescu.bsky.social is going to be very useful. Can't wait to try it out in my viral metagenomic tools. 🤩🧬🖥️

#bioinformatics #graphs #graph-algorithms #flow-decomposition #integer-linear-programming

github.com/algbio/flowp...
GitHub - algbio/flowpaths: A Python package to quickly decompose weighted graphs (acyclic or not) into weighted paths or walks, under various models.
github.com
November 28, 2025 at 12:18 AM
Reposted by Sergey Nurk
Excited to share our latest preprint on agtools, an open-source Python framework for analysing and manipulating assembly graphs. (1/n)

www.biorxiv.org/content/10.1...

#Bioinformatics #genomics #assembly #assemblygraphs #software
agtools: a software framework to manipulate assembly graphs
Assembly graphs are a fundamental data structure used by genome and metagenome assemblers to represent sequences and their overlap information, facilitating the assembler to construct longer genomic f...
www.biorxiv.org
September 17, 2025 at 6:58 AM
Reposted by Sergey Nurk
579 high-quality human genomes from @humanpangenome.bsky.social, Arab Pangenome and individual papers (CHM13, CN1, KSA001, I002C, YAO and KOREF1). Sequences available in the AGC format (3.7GB) and FM-index in the ropebwt3 format (20.3GB). For details, see github.com/lh3/human-asm
GitHub - lh3/human-asm: A collection of high-quality human genomes
github.com
December 3, 2025 at 3:44 AM
Reposted by Sergey Nurk
Fantastic talk by @vikramshivakumar.bsky.social Mumemto—Scalable multi-MUM finding for pangenomes
Papers biorxiv.org/content/10.1101/2025.05.20.654611 & doi.org/10.1186/s13059-025-03644-0
Code: github.com/vikshiv/mume...
Very efficient pangenome visualization tool, revealing synteny and variations!
November 6, 2025 at 1:13 AM
Reposted by Sergey Nurk
Thread on #GI2025 's second day! 👇🏻
Second day of Genome Informatics #GI2025 began with the session “Genome Assembly and Sequence Algorithms”. Yun William Yu presented “Average-case Analysis of Seed-Chain-Extend under Random Mutations”
genome.cshlp.org/content/33/7/1175
providing theoretical guarantees for the popular seed-chain-extend
November 6, 2025 at 5:53 PM
Reposted by Sergey Nurk
🚀 Looking for talented PhD students!
Join us in 🇸🇬 Singapore for 1-2 years to push the frontiers of AI for Genomics.
Work on:
🧬 Cancer genome reconstruction
🧫 Cancer genome & cell foundation models
💊 RNA drug & mRNA therapeutic design

#AI #Genomics #PhD
1/5
November 4, 2025 at 7:32 AM
Reposted by Sergey Nurk
Following ish's `filter` and bqtools' `grep`, Sassy now also has initial support for grep and filter!

Grep mode shows all matches, grouped per record, and is meant for human consumption.
Filter mode prints full matching (or non-matching) records to stdout or output files.
October 30, 2025 at 11:46 PM
Reposted by Sergey Nurk
Very excited about Movi 2! Excellent work by Mohsen here. FYI, I have a series of 5 videos on the move structure starting with this one: youtu.be/REniD2dKf6A?...
October 21, 2025 at 9:39 PM
Reposted by Sergey Nurk
ASHG Plenary Session starting with the awards ceremony honoring Eric Green with the Leadership Award of @geneticssociety.bsky.social reflecting on his career in human genetics & genomics leading the Human Genome Project & the NHGRI and the leadership principles he has learned throughout #ASHG25
October 16, 2025 at 8:47 PM
Reposted by Sergey Nurk
The T2T zebra finch genome has hatched! 🐣 🧬 @vertebrategenomes.bsky.social
October 15, 2025 at 1:08 PM
Reposted by Sergey Nurk
I am hiring! - looking for a Staff Scientist to co-run my research group with me. Staff Scientist is a senior professional scientist role at EMBL. Please forward to people you might know who could be interested! embl.wd103.myworkdayjobs.com/en-US/EMBL/j...
Staff Scientist
About EMBL-EBI EMBL’s European Bioinformatics Institute is a data powerhouse, utilised on a global scale to advance scientific discovery through bioinformatics and solutions to some of the world’s mos...
embl.wd103.myworkdayjobs.com
October 10, 2025 at 7:30 AM
Reposted by Sergey Nurk
The Metagraph paper is out in Nature; it showed up in my feeds today! Congratulations to Mikhail Karasikov, @gxxxr.bsky.social, @akkah21.bsky.social and all of the other authors (whom I'd love to follow on Bluesky if I can find you ;P) www.nature.com/articles/s41...
Efficient and accurate search in petabase-scale sequence repositories - Nature
MetaGraph enables scalable indexing of large sets of DNA, RNA or protein sequences using annotated de Bruijn graphs.
www.nature.com
October 9, 2025 at 2:40 PM
Reposted by Sergey Nurk
I've added 7 videos to my Burrows-Wheeler indexing playlist (www.youtube.com/playlist?lis...), rounding out the r-index series and adding a 5-part series on the move structure. Now 27 videos in that playlist. I aim to add videos on prefix-free parsing, PBWT, Wheeler languages/automata in the future.
Burrows-Wheeler Indexing - YouTube
Videos on: (a) the Burrows-Wheeler Transform (BWT), (b) the FM Index, which uses the BWT to construct a full-text index, (c) Wheeler graphs, (d) r-index, an...
www.youtube.com
October 7, 2025 at 2:17 PM
Reposted by Sergey Nurk
🦒Long read giraffe is out!🦒
Mapping long reads to pangenome graphs is ~10x faster than with GraphAligner, with veeery slightly better mapping accuracy, short variant calling, and SV genotyping than GraphAligner or Minimap2
Rapid, accurate long- and short-read mapping to large pangenome graphs with vg Giraffe https://www.biorxiv.org/content/10.1101/2025.09.29.678807v1
October 2, 2025 at 6:28 AM
Reposted by Sergey Nurk
Do you know ~60% of human SVs fall in ~1% of GRCh38? See our new preprint: arxiv.org/abs/2509.23057 and the companion blog post on how we started this project and longdust: lh3.github.io/2025/09/29/o.... Work with Alvin Qin
September 30, 2025 at 2:19 AM
Reposted by Sergey Nurk
Delighted to see our paper studying the evolution of plasmids over the last 100 years, now out! Years of work by Adrian Cazares, also Nick Thomson @sangerinstitute.bsky.social - this version much improved over the preprint. Final version should be open access, apols.
Thread 1/n
September 25, 2025 at 9:29 PM
Reposted by Sergey Nurk
Delighted to finally announce a preprint describing the Q100 project! “A complete diploid human genome benchmark for personalized genomics”, for which we finished HG002 to near-perfect accuracy: www.biorxiv.org/content/10.1... 🧵[1/14]
A complete diploid human genome benchmark for personalized genomics
Human genome resequencing typically involves mapping reads to a reference genome to call variants; however, this approach suffers from both technical and reference biases, leaving many duplicated and ...
www.biorxiv.org
September 22, 2025 at 5:01 PM
Reposted by Sergey Nurk
Excited to share our EvANI benchmarking workflow, published in Briefings in Bioinformatics doi.org/10.1093/bib/...
Computing average nucleotide identity (ANI) is neither conceptually nor computationally trivial. Its definition has evolved over the years, with different meanings and assumptions (1/5)
September 21, 2025 at 3:26 PM
Reposted by Sergey Nurk
MMseqs2-GPU sets new standards in single query search speed, allows near instant search of big databases, scales to multiple GPUs and is fast beyond VRAM. It enables ColabFold MSA generation in seconds and sub-second Foldseek search against AFDB50. 1/n
📄 www.nature.com/articles/s41...
💿 mmseqs.com
GPU-accelerated homology search with MMseqs2 - Nature Methods
Graphics processing unit-accelerated MMseqs2 offers tremendous speedups for homology retrieval from metagenomic databases, query-centered multiple sequence alignment generation for structure predictio...
www.nature.com
September 21, 2025 at 8:06 AM
Reposted by Sergey Nurk
Now preprinted at arxiv.org/abs/2509.07357
September 10, 2025 at 2:10 AM
Reposted by Sergey Nurk
Preprint out for myloasm, our new nanopore / HiFi metagenome assembler!

Nanopore's getting accurate, but

1. Can this lead to better metagenome assemblies?
2. How, algorithmically, to leverage them?

with co-author Max Marin @mgmarin.bsky.social, supervised by Heng Li @lh3lh3.bsky.social

1 / N
High-resolution metagenome assembly for modern long reads with myloasm https://www.biorxiv.org/content/10.1101/2025.09.05.674543v1
September 7, 2025 at 11:35 PM
Reposted by Sergey Nurk
For anyone who has used pling for comparing plasmids using rearrangement distances ("how many structural events apart are these plasmids"), here's how to tweak parameters, and integrate it with typing info, and the host phylogeny
www.biorxiv.org/content/10.1...
github.com/iqbal-lab-or...
Clustering of plasmid genomes for genomic epidemiology by using rearrangement distances, with pling
Integration of plasmids into genomic epidemiology is challenging, because there are no clearly defined evolving-units (equivalent to species), and because plasmids appear to evolve as much by structur...
www.biorxiv.org
September 7, 2025 at 2:56 PM
Reposted by Sergey Nurk
High-resolution metagenome assembly for modern long reads with myloasm https://www.biorxiv.org/content/10.1101/2025.09.05.674543v1
September 7, 2025 at 4:47 AM