Bede Constantinides
bedec.bsky.social
Interested in infectious disease informatics. Research fellow at the University of Birmingham with @articnetwork.bsky.social. Also cycling, photography, active travel. https://bede.im
Pinned
New preprint! Deacon is a versatile tool for filtering FASTA/FASTQ files and streams at hundreds of megabases per second using minimizers, built with rapid metagenomic host depletion in mind, but equally useful for search.
github.com/bede/deacon
Deacon: fast sequence filtering and contaminant depletion https://www.biorxiv.org/content/10.1101/2025.06.09.658732v1
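A minimal sketch of minimizer-based host depletion in the spirit of the post. It assumes (w, k) = (10, 15), canonical k-mers hashed with BLAKE2b and a two-hit threshold. These parameters and the helper names `minimizers` and `deplete` are illustrative only, not Deacon's defaults or API. Reads sharing at least `min_hits` minimizers with the host index are dropped, and everything else passes through.

```python
import hashlib
import random
from typing import Iterable, Iterator

COMPLEMENT = str.maketrans("ACGT", "TGCA")
ACGT = set("ACGT")


def kmer_hash(kmer: str) -> int:
    """Hash the canonical form of a k-mer (smaller of k-mer and reverse complement)."""
    canonical = min(kmer, kmer.translate(COMPLEMENT)[::-1])
    digest = hashlib.blake2b(canonical.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "little")


def minimizers(seq: str, k: int = 15, w: int = 10) -> set[int]:
    """Smallest k-mer hash in each window of w consecutive k-mers."""
    seq = seq.upper()
    # None marks k-mers containing a non-ACGT base; they are never selected.
    hashes = [
        kmer_hash(seq[i:i + k]) if set(seq[i:i + k]) <= ACGT else None
        for i in range(len(seq) - k + 1)
    ]
    selected = set()
    for start in range(len(hashes) - w + 1):
        window = [h for h in hashes[start:start + w] if h is not None]
        if window:
            selected.add(min(window))
    return selected


def deplete(reads: Iterable[tuple[str, str]], host_index: set[int],
            min_hits: int = 2, k: int = 15, w: int = 10) -> Iterator[tuple[str, str]]:
    """Yield (name, seq) reads sharing fewer than min_hits minimizers with the host index."""
    for name, seq in reads:
        if len(minimizers(seq, k, w) & host_index) < min_hits:
            yield name, seq


# Toy demo: random "host" and "microbe" genomes, one read drawn from each.
rng = random.Random(1)
host = "".join(rng.choice("ACGT") for _ in range(5000))
microbe = "".join(rng.choice("ACGT") for _ in range(5000))
index = minimizers(host)
reads = [("host_read", host[1000:1150]), ("microbe_read", microbe[200:350])]
print([name for name, _ in deplete(reads, index)])  # ['microbe_read']
```

Every window inside a host-derived read is also a window of the host genome, so all of that read's minimizers hit the index. A read from an unrelated genome almost never shares even one.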
Reposted by Bede Constantinides
@wytamma.bsky.social : so, it took a little bit of extra time (not the flight back from the CZI meeting), but I decided to just f#&$ing do it, and the basic code to build and parse with the auxiliary fastq index is working (github.com/COMBINE-lab/...). 1/2
GitHub - COMBINE-lab/mim: A small, auxiliary index to massively improve parallel fastq parsing
November 19, 2025 at 3:01 AM
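The post doesn't describe mim's index format. As a hypothetical illustration of why an auxiliary index helps, this sketch records the byte offset of every `stride`-th record in a FASTQ buffer, assuming unwrapped four-line records. Independent workers can then each parse a slice without first scanning for record boundaries. `build_offset_index`, `parse_chunk` and `parallel_parse` are invented names.

```python
from concurrent.futures import ThreadPoolExecutor


def build_offset_index(data: bytes, stride: int) -> list[int]:
    """Byte offsets of every stride-th record; assumes unwrapped 4-line FASTQ records."""
    offsets, pos, record = [], 0, 0
    while pos < len(data):
        if record % stride == 0:
            offsets.append(pos)
        for _ in range(4):  # header, sequence, '+' line, quality
            newline = data.find(b"\n", pos)
            pos = len(data) if newline == -1 else newline + 1
        record += 1
    return offsets


def parse_chunk(data: bytes, start: int, end: int) -> list[tuple[str, str, str]]:
    """Parse the (name, sequence, quality) records in data[start:end]."""
    lines = data[start:end].decode().splitlines()
    return [(lines[i][1:], lines[i + 1], lines[i + 3]) for i in range(0, len(lines), 4)]


def parallel_parse(data: bytes, stride: int, workers: int = 4) -> list[tuple[str, str, str]]:
    """Split the buffer at indexed offsets and parse each chunk independently."""
    offsets = build_offset_index(data, stride)
    bounds = list(zip(offsets, offsets[1:] + [len(data)]))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        chunks = pool.map(lambda b: parse_chunk(data, *b), bounds)
        return [record for chunk in chunks for record in chunk]


# Quality lines starting with '@' are legal, which is what makes naive boundary-finding unsafe.
fastq = b"".join(b"@read%d\nACGT\n+\n@III\n" % i for i in range(5))
print(build_offset_index(fastq, stride=2))  # [0, 38, 76]
print(parallel_parse(fastq, stride=2)[4])   # ('read4', 'ACGT', '@III')
```

Because `@` (and `+`) can also begin a quality line, locating a safe record boundary from an arbitrary byte offset is ambiguous. Precomputing offsets once sidesteps that. A thread pool keeps the sketch short. In CPython, real parsing speedups would need processes or a compiled parser.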
Reposted by Bede Constantinides
New preprint: we looked into production of the bacterial toxin colibactin and found that MDR E. coli from the global north have co-evolved with endemic colibactin producers, acquiring colibactin resistance genes before undergoing clonal expansions.

www.biorxiv.org/content/10.1...
Co-evolution between colibactin production and resistance is linked to clonal expansions in Escherichia coli
Specific strains of Escherichia coli employ the polyketide synthase island to produce a metabolite called colibactin that is implicated in colorectal tumorigenesis via its genotoxic effect on human DN...
November 18, 2025 at 6:41 AM
Reposted by Bede Constantinides
I want to spell this out in case the implications aren't clear:

This means all public tools/webapps of GISAID data (all the ones you've been used to seeing thru the pandemic, as far as we can tell) are prohibited.

The flat file allowed this. Cutting it cut off all the tools the public & others were using.
On Oct 1, 2025, GISAID informed us that they had ended updates to the flat file of SARS-CoV-2 genomic sequences and associated metadata that we had used to update Nextstrain analyses since Feb 2020. GISAID's stated rationale was that their "resources are limited". 1/5
November 7, 2025 at 2:41 PM
My account's upload and bulk download access were terminated permanently in 2021 without explanation after I published *checksums* of GISAID genomes. GISAID and its SAB have since ignored a dozen emails seeking an explanation.

4 yrs on, even Nextstrain has lost access. GISAID has rotted from its core.
November 17, 2025 at 1:32 PM
Reposted by Bede Constantinides
I was on Last Word, the Radio 4 obituary programme, trying to sum up Jim Watson’s near-century long life.
Last Word - James Watson, Pauline Collins, Judith Vidal-Hall, Dugald Ross - BBC Sounds
Matthew Bannister on a scientist, an actor, a journalist and a fossil hunter.
November 15, 2025 at 12:48 PM
Reposted by Bede Constantinides
Long term, good software requires lazy users: complain and file issues as soon as things don't work first try.

If nothing else, it means documentation should be improved.
November 15, 2025 at 1:33 AM
Reposted by Bede Constantinides
🔸️Early data suggests we could be in for a worse-than-normal flu season, brought on by a cluster of escape mutations in H3N2 this year that may lift Re (the effective reproduction number) from 1.2 to 1.4. We are starting to see an uptick in the US. Data from BIOFIRE.
November 11, 2025 at 11:52 PM
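To see why a shift from 1.2 to 1.4 matters, here is a worked example under a deliberately simple model. The assumptions are a constant Re, discrete generations, and no depletion of susceptibles or behaviour change. Incidence is multiplied by Re each generation, so after n generations it grows by Re^n, and it doubles every ln 2 / ln Re generations.

```python
import math


def growth_factor(re: float, generations: int) -> float:
    """Multiplicative change in incidence after n generations at constant Re."""
    return re ** generations


def doubling_generations(re: float) -> float:
    """Generations needed for incidence to double (requires Re > 1)."""
    return math.log(2) / math.log(re)


for re in (1.2, 1.4):
    print(f"Re={re}: x{growth_factor(re, 5):.2f} after 5 generations, "
          f"doubles every {doubling_generations(re):.1f} generations")
# Re=1.2: x2.49 after 5 generations, doubles every 3.8 generations
# Re=1.4: x5.38 after 5 generations, doubles every 2.1 generations
```

Under these assumptions, a 0.2 rise in Re more than doubles five-generation growth and nearly halves the doubling time.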
Reposted by Bede Constantinides
New post from me, for UK folks only, on how you need to start preparing for Apple to switch off Advanced Data Protection and the end-to-end encryption of the data you store on it. Like I said, UK only. #SunlitUplands
heatherburns.tech/2025/11/10/t...
Time to start de-Appling – Hi, I'm Heather Burns
November 10, 2025 at 1:18 PM
Reposted by Bede Constantinides
As expected, unfortunately.

If ever you needed a reason for never using GISAID ever again (as a data producer or data user - we're both), look no further.

Time to move on to more trusted and transparent solutions.
October 31, 2025 at 1:41 AM
Reposted by Bede Constantinides
Our method for genome size estimation from long-read overlaps is now published 🥳
academic.oup.com/bioinformati...
Genome size estimation from long read overlaps
Abstract Motivation. Accurate genome size estimation is an important component of genomic analyses such as assembly and coverage calculation, though existin...
November 7, 2025 at 3:19 AM
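The post doesn't spell out the published method. The following is only a back-of-envelope illustration of how overlaps carry information about genome size, not the paper's estimator. Suppose N reads with lengths L_i are placed uniformly at random on a circular genome of size G. Reads i and j then overlap with probability roughly (L_i + L_j)/G. Summing over all pairs gives an expected (N - 1)·ΣL/G overlapping pairs, which can be inverted to estimate G. The simulation parameters are arbitrary.

```python
import random


def estimate_genome_size(read_lengths: list[int], overlapping_pairs: int) -> float:
    """Invert E[pairs] = (N - 1) * sum(L) / G to estimate G."""
    n = len(read_lengths)
    return (n - 1) * sum(read_lengths) / overlapping_pairs


def simulate_overlaps(genome_size: int, n_reads: int, read_len: int, seed: int = 0):
    """Place reads uniformly on a circular genome and count overlapping pairs."""
    rng = random.Random(seed)
    starts = [rng.randrange(genome_size) for _ in range(n_reads)]
    lengths = [read_len] * n_reads
    pairs = 0
    for i in range(n_reads):
        for j in range(i + 1, n_reads):
            # Two reads overlap if either one starts inside the other (modulo G).
            if ((starts[j] - starts[i]) % genome_size < lengths[i]
                    or (starts[i] - starts[j]) % genome_size < lengths[j]):
                pairs += 1
    return lengths, pairs


lengths, pairs = simulate_overlaps(genome_size=200_000, n_reads=400, read_len=5_000)
print(round(estimate_genome_size(lengths, pairs)))  # close to 200,000
```

Real data adds repeats, sequencing errors, minimum-overlap thresholds and uneven coverage, all of which this toy model ignores.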
Reposted by Bede Constantinides
“my brain is open” users.monash.edu/~normd/docum...
November 2, 2025 at 2:17 PM
Reposted by Bede Constantinides
Really exciting that the preprint on Barbell, a new demultiplexer, is finally out!
It's the first tool that builds on Sassy, the approximate-DNA-searching tool that @rickbitloo.bsky.social and I developed earlier this year, specifically with this application in mind.
Around 10% of your Nanopore reads (SQK-RBK114) are incorrectly trimmed. Here is why, and how our new tool Barbell solves it:

www.biorxiv.org/content/10.1...

Want to get started? github.com/rickbeeloo/b...
October 23, 2025 at 9:28 PM
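Demultiplexing boils down to locating a barcode or adapter inside a read while tolerating sequencing errors. Sassy's own implementation is not reproduced here. As a plain sketch of the underlying problem, this is the textbook semi-global edit-distance dynamic programme (Sellers' algorithm). It reports every read position where the pattern ends with at most `max_errors` edits. The function name `approx_find` is hypothetical.

```python
def approx_find(pattern: str, text: str, max_errors: int) -> list[tuple[int, int]]:
    """Sellers' algorithm: (end_index, edit_distance) for each text position where
    some substring ending there is within max_errors edits of pattern."""
    m = len(pattern)
    # col[i]: best edit distance of pattern[:i] to a text substring ending at the
    # current position. Before any text is read, matching pattern[:i] costs i edits.
    col = list(range(m + 1))
    hits = []
    for j, t in enumerate(text):
        diag = col[0]  # previous column's col[i - 1], starting with i = 1
        col[0] = 0     # an alignment may start anywhere in the text for free
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == t else 1
            new = min(diag + cost,     # match / substitution
                      col[i] + 1,      # extra text base (insertion)
                      col[i - 1] + 1)  # missing pattern base (deletion)
            diag, col[i] = col[i], new
        if col[m] <= max_errors:
            hits.append((j, col[m]))
    return hits


print(approx_find("ACGT", "TTACGTTT", max_errors=0))  # [(5, 0)]
print(approx_find("ACGT", "TTACTTT", max_errors=1))
```

Only one column of the DP table is kept, so memory is O(len(pattern)). The trimming question in the post is then about where in the read the best hit ends.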
Reposted by Bede Constantinides
RIFs at CDC.

Destroying the Epidemic Intelligence Service, NCIRD, NCIPC & a dozen other areas/divisions/branches will cause enormous harm and suffering to America, and indeed to the whole world.

Call your Reps about it.

Then call them again.
October 11, 2025 at 4:09 PM
Reposted by Bede Constantinides
Our recent paper on rifampicin resistant subpopulations in M. tuberculosis (M. tb) has been published at JAC-antimicrobial resistance.

I am really happy to see this work published just hours before submitting my DPhil thesis! 🔗👇
doi.org/10.1093/jaca...
Subpopulations in clinical samples of M. tuberculosis can give rise to rifampicin resistance and shed light on how resistance is acquired
Abstract Objectives. WGS has become a key tool for diagnosing Mycobacterium tuberculosis infections, but discrepancies between genotypic and phenotypic drug...
October 13, 2025 at 4:40 PM
Reposted by Bede Constantinides
For more information about the Friday night massacre at CDC, I wrote up an analysis of who got terminated and what that means for public health.

Grateful to @saveamericamvmt.bsky.social for supporting and amplifying. We are in really terrible trouble.

rasmussenretorts.substack.com/p/the-death-...
October 11, 2025 at 4:42 PM
Reposted by Bede Constantinides
So what's the equivalent of `perf record && perf report` on a MacBook?

I want to see the generated assembly and which lines are hot.
October 11, 2025 at 1:48 PM
Reposted by Bede Constantinides
Last week we were in the Washington Post for our characterization of Robertsonian chromosomes. This week we are entering our 10th day of being shut down and all of our research is on hold. To help me feel not-so-bad, here is a thread of some studies we released right before the shutdown 🧵 [1/n]...
October 10, 2025 at 3:24 PM
Reposted by Bede Constantinides
Funny story, though, we found this gene in NCBI databases, but it was annotated in Streptococcus pneumoniae! This is surely human contamination in a bacterial strep sample that was not properly filtered. Lesson: use CHM13, or better yet a pangenome, when filtering for human contamination...
October 10, 2025 at 3:25 PM
Reposted by Bede Constantinides
Just published an interactive article about a magical algorithm known as the Burrows-Wheeler Transform, which powers sequence alignment tools like bowtie and bwa: sandbox.bio/concepts/bwt

It's also notoriously unintuitive so I'm hoping this article helps you build that intuition.
October 9, 2025 at 5:05 PM
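For readers who want the definition alongside the intuition, start with a text terminated by a unique sentinel that sorts before every other symbol (`$` here). Its BWT is the last column of its sorted rotations, and it can be inverted with the last-to-first (LF) mapping. This naive sketch sorts all rotations, which costs quadratic memory and is fine only for toy inputs. Production indexers use far more efficient constructions.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Last column of the sorted rotations of text + sentinel."""
    assert sentinel not in text
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last: str) -> str:
    """Recover the original text (without sentinel) by walking the LF mapping."""
    # rank[i]: how many times last[i] occurs in last[:i].
    seen: dict[str, int] = {}
    rank = []
    for ch in last:
        rank.append(seen.get(ch, 0))
        seen[ch] = seen.get(ch, 0) + 1
    # first[ch]: row where ch's block starts in the sorted first column.
    first, total = {}, 0
    for ch in sorted(seen):
        first[ch] = total
        total += seen[ch]
    # Row 0 starts with the sentinel, so its last character is the text's final character.
    row, out = 0, []
    for _ in range(len(last) - 1):
        ch = last[row]
        out.append(ch)
        row = first[ch] + rank[row]
    return "".join(reversed(out))


print(bwt("banana"))               # annb$aa
print(inverse_bwt(bwt("banana")))  # banana
```

Note how the BWT groups equal characters (`nn`, `aa`) into runs, which is what makes it compressible and searchable.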
Reposted by Bede Constantinides
I've added 7 videos to my Burrows-Wheeler indexing playlist (www.youtube.com/playlist?lis...), rounding out the r-index series and adding a 5-part series on the move structure. Now 27 videos in that playlist. I aim to add videos on prefix-free parsing, PBWT, Wheeler languages/automata in the future.
Burrows-Wheeler Indexing - YouTube
Videos on: (a) the Burrows-Wheeler Transform (BWT), (b) the FM Index, which uses the BWT to construct a full-text index, (c) Wheeler graphs, (d) r-index, an...
October 7, 2025 at 2:17 PM
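The FM index's core query, backward search, counts pattern occurrences using only the BWT, a table `C` giving how many characters sort before each symbol, and rank (Occ) queries. Here is a minimal sketch. For readability, the BWT is built naively and Occ is computed by slicing rather than from sampled rank tables.

```python
def bwt(text: str) -> str:
    s = text + "$"  # '$' must be unique and sort before every other symbol
    return "".join(r[-1] for r in sorted(s[i:] + s[:i] for i in range(len(s))))


def backward_search_count(text: str, pattern: str) -> int:
    """Count occurrences of pattern in text using FM-index backward search."""
    last = bwt(text)
    # C[c]: number of characters (including the sentinel) smaller than c.
    C, total = {}, 0
    for ch in sorted(set(last)):
        C[ch] = total
        total += last.count(ch)

    def occ(ch: str, i: int) -> int:
        # Rank query: occurrences of ch in last[:i] (a real index samples these).
        return last[:i].count(ch)

    lo, hi = 0, len(last)  # current range [lo, hi) of sorted rotations
    for ch in reversed(pattern):  # extend the match one character leftwards
        if ch not in C:
            return 0
        lo, hi = C[ch] + occ(ch, lo), C[ch] + occ(ch, hi)
        if lo >= hi:
            return 0
    return hi - lo


print(backward_search_count("banana", "ana"))  # 2
```

Each pattern character costs two rank queries, independent of text length once Occ is backed by proper rank structures. Run-length encoding that BWT is the starting point for the r-index.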
Deacon 0.11.0:
- Local server mode
- Ultra-careful handling of non-ACGT
- Faster indexing & index loading
- Denser index now stores k-mers not hashes
- xxHash & FxHash replaced with rapidhash::fast
- Bug fixes

Thanks @curiouscoding.nl (and others!) for contributions
github.com/bede/deacon/...
Release 0.11.0 · bede/deacon
Major release incorporating new features, fixes and performance optimisations. Includes many PRs from @RagnarGrootKoerkamp, taking advantage of new features in simd-minimizers, packed-seq and parase...
October 7, 2025 at 5:00 PM
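The release notes don't say how the denser index encodes k-mers. One common way to store k-mers rather than hashes is 2-bit packing, which fits any k ≤ 32 in a 64-bit word and is exactly invertible. This is an assumption here, not taken from Deacon's source. The sketch also shows one simple policy for non-ACGT bases: any k-mer overlapping an ambiguous base is skipped, by resetting the run of valid bases. `packed_kmers` and `unpack` are hypothetical names.

```python
ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def packed_kmers(seq: str, k: int) -> list[int]:
    """2-bit-pack each k-mer; k-mers overlapping a non-ACGT base are skipped."""
    if not 1 <= k <= 32:
        raise ValueError("k must be in 1..32 to fit in 64 bits")
    mask = (1 << (2 * k)) - 1
    out, value, run = [], 0, 0
    for base in seq.upper():
        code = ENCODE.get(base)
        if code is None:
            value, run = 0, 0  # restart: no k-mer may span an ambiguous base
            continue
        value = ((value << 2) | code) & mask  # shift in the new base, drop the oldest
        run += 1
        if run >= k:
            out.append(value)
    return out


def unpack(value: int, k: int) -> str:
    """Invert the 2-bit encoding (possible because k-mers, not hashes, are stored)."""
    return "".join("ACGT"[(value >> (2 * (k - 1 - i))) & 3] for i in range(k))


kmers = packed_kmers("ACGNTTGCA", k=3)
print([unpack(v, 3) for v in kmers])  # ['ACG', 'TTG', 'TGC', 'GCA']
```

The rolling update costs one shift, OR and mask per base. Because the mapping is a bijection, equal packed values always mean equal k-mers, with no hash collisions to worry about.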
"OpenZL is our answer to the tension between the performance of format-specific compressors and the maintenance simplicity of a single executable binary."
engineering.fb.com/2025/10/06/d...
October 6, 2025 at 8:58 PM
Reposted by Bede Constantinides
Looking for people to test the latest version of simd-sketch.

It's now 2x as fast at sketching, and supports skipping over kmers containing N and other ambiguous bases (which is only ~35% slower).

'cargo install simd-sketch' is right there under your fingertips ;)

github.com/RagnarGrootK...
GitHub - RagnarGrootKoerkamp/simd-sketch: Compute bottom-s sketches and s-buckets sketches, using simd-minimizers crate.
October 1, 2025 at 2:38 PM
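A bottom-s sketch keeps the s smallest k-mer hashes of a sequence, and two sketches give an estimate of Jaccard similarity. Below is a minimal, non-SIMD sketch of the idea. It assumes BLAKE2b hashes and forward-strand (non-canonical) k-mers, and it skips k-mers containing N or any other non-ACGT symbol, as the post describes. It does not reproduce simd-sketch's hashing or its s-buckets variant.

```python
import hashlib
import random

ACGT = set("ACGT")


def kmer_hashes(seq: str, k: int) -> set[int]:
    """64-bit hashes of forward-strand k-mers, skipping k-mers with N or other non-ACGT symbols."""
    seq = seq.upper()
    hashes = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= ACGT:
            digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
            hashes.add(int.from_bytes(digest, "little"))
    return hashes


def bottom_s_sketch(seq: str, k: int, s: int) -> list[int]:
    """The s smallest distinct k-mer hashes."""
    return sorted(kmer_hashes(seq, k))[:s]


def jaccard_estimate(a: list[int], b: list[int], s: int) -> float:
    """Fraction of the s smallest hashes of the union that appear in both sketches."""
    union_bottom = sorted(set(a) | set(b))[:s]
    shared = set(a) & set(b)
    return sum(h in shared for h in union_bottom) / len(union_bottom)


rng = random.Random(7)
genome = "".join(rng.choice("ACGT") for _ in range(20_000))
variant = genome[:10_000] + "".join(rng.choice("ACGT") for _ in range(10_000))
sa, sb = bottom_s_sketch(genome, 21, 256), bottom_s_sketch(variant, 21, 256)
print(round(jaccard_estimate(sa, sb, 256), 2))  # roughly 1/3: half of each sequence is shared
```

The estimate's standard error shrinks roughly as 1/√s, which is why sketch size trades directly against accuracy.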
Reposted by Bede Constantinides
FxHashSet::<u32>::contains throughput is wild!

- Up to 4x slowdown for negative queries due to probing.
- Positive queries are fast for small tables, but slow in RAM because they need 2 cache misses.

Lots of variance depending on the load factor, i.e. whether n is close to 87.5% of a power of 2.
September 28, 2025 at 11:19 PM
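The 87.5% figure is a 7/8 maximum load factor. Assume a SwissTable-style table with power-of-two bucket counts that sizes itself to the smallest count whose 7/8 capacity holds n items. This is a simplification of how Rust's hashbrown-backed `HashSet` grows, valid here only for n ≥ 8. Under that assumption, the load factor swings sharply with n: one item past a threshold roughly halves it. `buckets_for` and `load_factor` are illustrative names.

```python
def buckets_for(n: int) -> int:
    """Smallest power-of-two bucket count whose 7/8 capacity fits n items (n >= 8)."""
    adjusted = n * 8 // 7
    return 1 << (adjusted - 1).bit_length()  # next power of two >= adjusted


def load_factor(n: int) -> float:
    return n / buckets_for(n)


for n in (700, 896, 897, 1000, 1792, 1793):
    print(n, buckets_for(n), f"{load_factor(n):.1%}")
```

A nearly full table means longer probe sequences for negative lookups, while a table just past a resize is sparse but twice as large in memory. Both effects feed into the throughput variance described in the post.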
Reposted by Bede Constantinides
Pleased to see this pre-printed, highlighting the completeness/accuracy of @nanoporetech.com long-read genome assembly for clinical Enterobacterales: www.biorxiv.org/content/10.1...

Thanks to colleagues @modmedmicro.bsky.social, @ukhsa.bsky.social, @genewiz.bsky.social and @oxfordbrc.bsky.social!
September 25, 2025 at 8:48 AM