Noam Teyssier
@noamteyssier.bsky.social
Bioinformatics Scientist at the Arc Institute.

Working at the intersection of functional genomics, systems biology, and machine learning. I also build rusty bioinformatics tools.

https://github.com/noamteyssier
The pattern counting is something I'm especially stoked about. I was actually very surprised to see that this feature isn't more common in grep-like tools (outside of bioinformatics as well).

I've had this problem for years and I end up writing bespoke tools that do some variation of it.
November 7, 2025 at 1:12 AM
And stay on the lookout over the next couple of weeks (hopefully) for the release of an even bigger project built with binseq!
October 29, 2025 at 8:41 PM
And if you're interested in building with binseq here is the place to start!

github.com/arcinstitute...
GitHub - ArcInstitute/binseq: A high efficiency binary format for sequencing data
October 29, 2025 at 8:41 PM
I've also added some nice functionality to bqtools including a very useful colored grep!

github.com/arcinstitute...
GitHub - ArcInstitute/bqtools: A command line utility for working with BINSEQ files
October 29, 2025 at 8:41 PM
Have you tried samply?
October 13, 2025 at 9:20 PM
The workspace publishing has been such a hassle. So glad to see this out
September 18, 2025 at 2:40 PM
Sounds great! Would be very interested in that and happy to help build one
September 17, 2025 at 2:18 PM
bsky.app/profile/noam...

Here's a benchmark I ran a while back comparing twobit and binseq on a single thread
Ah yes 2bit was a big inspiration for binseq - I didn't include it because it wasn't widely used and it was more geared towards large genomes so I figured it wouldn't scale.

But you're right I didn't formally test it. Here's a simple bench with Kent's utils (1-core bqtools to be fair)
September 15, 2025 at 5:15 PM
2bit was built for genomes where there are very long contiguous N-blocks. The overhead of managing these blocks on fastq-style records (generally very short, with non-contiguous Ns) is massive and most of the time unnecessary.
September 15, 2025 at 5:13 PM
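To make the N-block point concrete, here is a rough Python sketch (not code from 2bit, Kent's utils, or binseq). It assumes each run of Ns is recorded as a (start, length) pair costing 8 bytes, in the spirit of the two 32-bit fields the .2bit layout keeps per block; the function names and example sequences are made up for illustration.

```python
import math


def n_blocks(seq: str) -> list[tuple[int, int]]:
    """Return (start, length) for every contiguous run of N in seq."""
    blocks = []
    start = None
    for i, base in enumerate(seq.upper()):
        if base == "N":
            if start is None:
                start = i  # a new run of Ns begins here
        elif start is not None:
            blocks.append((start, i - start))  # the run ended at i
            start = None
    if start is not None:
        blocks.append((start, len(seq) - start))  # run reaches the end
    return blocks


def block_overhead_bytes(seq: str, bytes_per_block: int = 8) -> int:
    """Bytes spent on N-block bookkeeping (assumed 8 bytes per block)."""
    return len(n_blocks(seq)) * bytes_per_block


def packed_bytes(seq: str) -> int:
    """Bytes needed to store the bases themselves at 2 bits each."""
    return math.ceil(len(seq) / 4)


# Hypothetical inputs: one long N run (genome-like) vs scattered Ns (read-like).
genome_like = "ACGT" * 5 + "N" * 1000 + "ACGT" * 5
read_like = "ACNGTANCGTNACGTNA"

for name, seq in [("genome_like", genome_like), ("read_like", read_like)]:
    print(name, len(n_blocks(seq)), block_overhead_bytes(seq), packed_bytes(seq))
```

Under these assumptions the genome-like sequence covers 1000 Ns with a single block, while the 17-base read needs four blocks, so the bookkeeping is larger than the packed bases themselves.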
Are you going to have a remote component to this? Would love to watch some of these talks if I can
June 26, 2025 at 1:28 AM
Ah this is the way that I do it in paraseq! Doesn't work for fastq headers but works well for fasta
June 24, 2025 at 8:04 PM