Bede Constantinides
@bedec.bsky.social
Interested in infectious disease informatics. Research fellow at the University of Birmingham. Also cycling, photography, active travel. https://bede.im
Pinned
New preprint! Deacon is a versatile tool for filtering FASTA/FASTQ files and streams at hundreds of megabases per second using minimizers, built with rapid metagenomic host depletion in mind, but equally useful for search.
github.com/bede/deacon
Deacon: fast sequence filtering and contaminant depletion https://www.biorxiv.org/content/10.1101/2025.06.09.658732v1
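To illustrate the idea of minimizer-based filtering mentioned in the post above, here is a toy Python sketch. It is not Deacon's implementation: the hash function, the values of k and w, and the function names (kmer_hash, minimizers, shared_fraction) are all assumptions chosen for illustration, and canonical k-mers and non-ACGT handling are left out.

```python
import hashlib


def kmer_hash(kmer: str) -> int:
    """Stable 64-bit hash of a k-mer (illustrative choice, not Deacon's hash)."""
    digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minimizers(seq: str, k: int = 5, w: int = 4) -> set:
    """Return the lowest-hash k-mer from every window of w consecutive k-mers."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    selected = set()
    # A sequence with fewer than w k-mers is treated as a single short window.
    for start in range(max(1, len(kmers) - w + 1)):
        window = kmers[start:start + w]
        if window:
            selected.add(min(window, key=kmer_hash))
    return selected


def shared_fraction(read: str, index: set, k: int = 5, w: int = 4) -> float:
    """Fraction of a read's minimizers that are present in a reference index."""
    read_minimizers = minimizers(read, k, w)
    if not read_minimizers:
        return 0.0
    return len(read_minimizers & index) / len(read_minimizers)


# Hypothetical "host" sequence and reads, for demonstration only.
host = "ACGTACGGACCTAGGCATCGATCGGA"
host_index = minimizers(host)
host_read = host[3:20]          # drawn from the host sequence
other_read = "TTTTTTTTTTTT"     # shares no k-mers with the host
```

A read whose shared fraction exceeds some threshold would be treated as host-derived and dropped (depletion) or kept (search); the threshold is a tunable assumption here.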
Reposted by Bede Constantinides
🔸️Early data suggests we could be in for a worse than normal flu season, brought on by a cluster of escape mutations in H3N2 this year that may lift the Re from 1.2 to 1.4. We are starting to see an uptick in the US. Data from BIOFIRE.
November 11, 2025 at 11:52 PM
Reposted by Bede Constantinides
New post from me, for UK folks only, on how you need to start preparing for Apple to switch off Advanced Data Protection and the end-to-end encryption of the data you store on it. Like I said, UK only. #SunlitUplands
heatherburns.tech/2025/11/10/t...
Time to start de-Appling – Hi, I'm Heather Burns
heatherburns.tech
November 10, 2025 at 1:18 PM
Reposted by Bede Constantinides
As expected, unfortunately.

If ever you needed a reason for never using GISAID ever again (as a data producer or data user - we're both), look no further.

Time to move on to more trusted and transparent solutions.
October 31, 2025 at 1:41 AM
Reposted by Bede Constantinides
Our method for genome size estimation from long-read overlaps is now published 🥳
academic.oup.com/bioinformati...
Genome size estimation from long read overlaps
Abstract. Motivation: Accurate genome size estimation is an important component of genomic analyses such as assembly and coverage calculation, though existin
academic.oup.com
November 7, 2025 at 3:19 AM
Reposted by Bede Constantinides
“my brain is open” users.monash.edu/~normd/docum...
November 2, 2025 at 2:17 PM
Reposted by Bede Constantinides
Really exciting that the preprint on Barbell, a new demultiplexer, is finally out!
It's the first tool that builds on Sassy, the approximate-DNA-searching tool that @rickbitloo.bsky.social and I developed earlier this year, specifically with this application in mind.
Around 10% of your Nanopore reads (SQK-RBK114) are incorrectly trimmed. Here is why, and how our new tool Barbell solves it:

www.biorxiv.org/content/10.1...

Want to get started? github.com/rickbeeloo/b...
October 23, 2025 at 9:28 PM
Reposted by Bede Constantinides
RIFs at CDC.

Destroying the Epidemic Intelligence Service, NCIRD, NCIPC & a dozen other areas/divisions/branches will cause enormous harm and suffering to America, and indeed to the whole world.

Call your Reps about it.

Then call them again.
October 11, 2025 at 4:09 PM
Reposted by Bede Constantinides
Our recent paper on rifampicin-resistant subpopulations in M. tuberculosis (M. tb) has been published in JAC-Antimicrobial Resistance.

I am really happy to see this work published just hours before submitting my DPhil thesis! 🔗👇
doi.org/10.1093/jaca...
Subpopulations in clinical samples of M. tuberculosis can give rise to rifampicin resistance and shed light on how resistance is acquired
Abstract. Objectives: WGS has become a key tool for diagnosing Mycobacterium tuberculosis infections, but discrepancies between genotypic and phenotypic drug
doi.org
October 13, 2025 at 4:40 PM
Reposted by Bede Constantinides
For more information about the Friday night massacre at CDC, I wrote up an analysis of who got terminated and what that means for public health.

Grateful to @saveamericamvmt.bsky.social for supporting and amplifying. We are in really terrible trouble.

rasmussenretorts.substack.com/p/the-death-...
October 11, 2025 at 4:42 PM
Reposted by Bede Constantinides
So what's the equivalent of `perf record && perf report` on a MacBook?

I want to see the generated assembly and which lines are hot.
October 11, 2025 at 1:48 PM
Reposted by Bede Constantinides
Last week we were in the Washington Post for our characterization of Robertsonian chromosomes. This week we are entering our 10th day of being shut down and all of our research is on hold. To help me feel not-so-bad, here is a thread of some studies we released right before the shutdown 🧵 [1/n]...
October 10, 2025 at 3:24 PM
Reposted by Bede Constantinides
Funny story, though, we found this gene in NCBI databases, but it was annotated in Streptococcus pneumoniae! This is surely human contamination in a bacterial strep sample that was not properly filtered. Lesson: use CHM13, or better yet a pangenome, when filtering for human contamination...
October 10, 2025 at 3:25 PM
Reposted by Bede Constantinides
Just published an interactive article about a magical algorithm known as the Burrows-Wheeler Transform, which powers sequence alignment tools like bowtie and bwa: sandbox.bio/concepts/bwt

It's also notoriously unintuitive so I'm hoping this article helps you build that intuition.
October 9, 2025 at 5:05 PM
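For readers who want to poke at the transform mentioned in the post above, here is a minimal, deliberately naive Python sketch of the Burrows-Wheeler Transform and its inverse. It sorts all rotations explicitly, which is fine for short strings but is not how bowtie or bwa build their indexes; the function names and the "$" sentinel are illustrative assumptions.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Naive BWT: sort all rotations of text + sentinel, take the last column."""
    if sentinel in text:
        raise ValueError("text must not contain the sentinel character")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last_column: str, sentinel: str = "$") -> str:
    """Naive inverse: repeatedly prepend the last column and re-sort."""
    table = [""] * len(last_column)
    for _ in range(len(last_column)):
        table = sorted(last_column[i] + table[i] for i in range(len(last_column)))
    # The original string is the row that ends with the sentinel.
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]


print(bwt("banana"))                  # annb$aa
print(inverse_bwt(bwt("GATTACA")))    # GATTACA
```

The sentinel must sort before every other character for the round trip to work, which holds for "$" against uppercase and lowercase letters in ASCII.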
Reposted by Bede Constantinides
I've added 7 videos to my Burrows-Wheeler indexing playlist (www.youtube.com/playlist?lis...), rounding out the r-index series and adding a 5-part series on the move structure. Now 27 videos in that playlist. I aim to add videos on prefix-free parsing, PBWT, Wheeler languages/automata in the future.
Burrows-Wheeler Indexing - YouTube
Videos on: (a) the Burrows-Wheeler Transform (BWT), (b) the FM Index, which uses the BWT to construct a full-text index, (c) Wheeler graphs, (d) r-index, an...
www.youtube.com
October 7, 2025 at 2:17 PM
Deacon 0.11.0:
- Local server mode
- Ultra-careful handling of non-ACGT
- Faster indexing & index loading
- Denser index now stores k-mers not hashes
- xxHash & FxHash replaced with rapidhash::fast
- Bug fixes

Thanks @curiouscoding.nl (and others!) for contributions
github.com/bede/deacon/...
Release 0.11.0 · bede/deacon
Major release incorporating new features, fixes and performance optimisations. Includes many PRs from @RagnarGrootKoerkamp, taking advantage of new features in simd-minimizers, packed-seq and parase...
github.com
October 7, 2025 at 5:00 PM
"OpenZL is our answer to the tension between the performance of format-specific compressors and the maintenance simplicity of a single executable binary."
engineering.fb.com/2025/10/06/d...
October 6, 2025 at 8:58 PM
Reposted by Bede Constantinides
Looking for people to test the latest version of simd-sketch.

It's now 2x as fast at sketching, and supports skipping over kmers containing N and other ambiguous bases (which is only ~35% slower).

'cargo install simd-sketch' is right there under your fingertips ;)

github.com/RagnarGrootK...
GitHub - RagnarGrootKoerkamp/simd-sketch: Compute bottom-s sketches and s-buckets sketches, using simd-minimizers crate.
github.com
October 1, 2025 at 2:38 PM
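As a rough illustration of what a bottom-s sketch is, and of skipping k-mers that contain N or other ambiguous bases as described in the post above, here is a small Python sketch. It does not reflect simd-sketch's API or internals: the hash, the function names and the parameters are assumptions, and canonical (strand-independent) k-mers are omitted for brevity.

```python
import hashlib

VALID_BASES = frozenset("ACGT")


def kmer_hash(kmer: str) -> int:
    """Stable 64-bit hash of a k-mer (illustrative choice only)."""
    digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def bottom_s_sketch(seq: str, k: int, s: int) -> list:
    """Return the s smallest distinct k-mer hashes, skipping ambiguous k-mers."""
    seq = seq.upper()
    hashes = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if VALID_BASES.issuperset(kmer):  # skip k-mers containing N etc.
            hashes.add(kmer_hash(kmer))
    return sorted(hashes)[:s]


def jaccard_estimate(sketch_a: list, sketch_b: list, s: int) -> float:
    """Estimate Jaccard similarity from two bottom-s sketches."""
    union_bottom = sorted(set(sketch_a) | set(sketch_b))[:s]
    if not union_bottom:
        return 0.0
    shared = set(sketch_a) & set(sketch_b)
    return sum(1 for h in union_bottom if h in shared) / len(union_bottom)
```

Two sequences with similar k-mer content will share many of their smallest hashes, so the estimate approaches 1.0; unrelated sequences give values near 0.0.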
Reposted by Bede Constantinides
FxHashSet::<u32>::contains throughput is wild!

- Up to 4x slowdown for negative queries due to probing.
- Positive queries are fast for small tables, but slow in RAM because they need 2 cache misses.

Lots of variance depending on the load factor, i.e. whether n is close to 87.5% of a power of 2.
September 28, 2025 at 11:19 PM
Reposted by Bede Constantinides
Pleased to see this pre-printed, highlighting the completeness/accuracy of @nanoporetech.com long-read genome assembly for clinical Enterobacterales: www.biorxiv.org/content/10.1...

Thanks to colleagues @modmedmicro.bsky.social, @ukhsa.bsky.social, @genewiz.bsky.social and @oxfordbrc.bsky.social!
September 25, 2025 at 8:48 AM
Reposted by Bede Constantinides
Terrific new feature on @pathoplexus.org presented by @theo.io, called SeqSets: it generates DOIs for the sequence subsets used in publications, so data generators can then track their impact via CrossRef! #IMMEMXiV
September 19, 2025 at 11:10 AM
Front page of Hacker News 🫨
bsky.app/profile/bede...
September 15, 2025 at 1:11 PM
Blogged about how zstd --long fills the gap between fast and slow-but-high-ratio genome compression methods log.bede.im/2025/09/12/z...
September 12, 2025 at 3:07 PM
First rate BBC journalism that somehow didn't make the cut for #r4today www.bbc.co.uk/news/article...
@andyverity.bsky.social
Anti-Islamic US biker gang members run security at deadly Gaza aid sites
BBC identifies members of Infidels MC gang hired as armed security at US and Israel-backed aid sites.
www.bbc.co.uk
September 10, 2025 at 2:20 PM
Reposted by Bede Constantinides
Sometimes you meet absolutely incredible bioinfo-magicians.
It was a huge privilege when @shenwei356.bsky.social joined our group for a year on an @embl.org sabbatical.
While here, he developed a new way of aligning to millions of bacteria, called LexicMap 1/n
www.nature.com/articles/s41...
Efficient sequence alignment against millions of prokaryotic genomes with LexicMap - Nature Biotechnology
LexicMap uses a fixed set of probes to efficiently query gene sequences for fast and low-memory alignment.
www.nature.com
September 10, 2025 at 9:12 AM
Zstandard's --long (long-distance matching) mode works wonders for assemblies, but needs uninterrupted single-line sequences.

*AllTheBacteria 661k, multiline fasta*
gzip (pigz): 751GB
zstandard --long: 641GB (30% original size)

*Single line fasta*
gzip (pigz): 700GB
zstandard --long: 232GB (10% original size)
September 9, 2025 at 10:27 AM
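Since the gain above depends on each sequence sitting on a single line, here is a minimal Python sketch of unwrapping multiline FASTA before compression. The function name unwrap_fasta is hypothetical, and this is not necessarily how the figures in the post were produced.

```python
from typing import Iterable, Iterator


def unwrap_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTA records with each sequence joined onto a single line."""
    parts = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            # Flush the previous record's sequence before the next header.
            if parts:
                yield "".join(parts)
                parts = []
            yield line
        elif line:
            parts.append(line)
    if parts:
        yield "".join(parts)


wrapped = [">contig1\n", "ACGT\n", "TTGA\n", ">contig2\n", "GG\n"]
for out_line in unwrap_fasta(wrapped):
    print(out_line)
```

Applied to a real file, the output lines could be written to stdout and piped into zstd with the --long flag; the compression level and window size used for the numbers above are not stated in the post.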