Noam Teyssier
@noamteyssier.bsky.social
Bioinformatics Scientist at the Arc Institute.
Working at the intersection of functional genomics, systems biology, and machine learning. I also build rusty bioinformatics tools.
https://github.com/noamteyssier
The pattern counting is something I'm especially stoked about. I was actually very surprised to see that this feature isn't more common in grep-like tools (outside of bioinformatics as well).
I've had this problem for years and I keep ending up writing bespoke tools that do some variation of it.
November 7, 2025 at 1:12 AM
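To make the pattern-counting idea concrete, here is a minimal sketch of tallying several patterns in one pass over a set of sequences. It is an illustration only, not how bqtools implements it: the function name `count_patterns` and the sample reads are hypothetical, and it counts matching records per pattern using plain regular expressions.

```python
import re


def count_patterns(records, patterns):
    """Count how many records contain each pattern (one tally per pattern)."""
    # Compile once so each pattern is not re-parsed for every record.
    compiled = {p: re.compile(p) for p in patterns}
    counts = {p: 0 for p in patterns}
    for seq in records:
        for p, rx in compiled.items():
            if rx.search(seq):
                counts[p] += 1
    return counts


# Hypothetical sample reads, for illustration only.
reads = ["ACGTACGT", "TTTTACGA", "GGGGCCCC"]
result = count_patterns(reads, ["ACG", "TTTT", "AAAA"])
print(result)
```

A plain `grep -c` reports one count for the whole pattern set, which is why a per-pattern tally usually ends up as a small bespoke script like this.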
And stay on the lookout over the next couple of weeks (hopefully) for the release of an even bigger project built with binseq!
October 29, 2025 at 8:41 PM
And if you're interested in building with binseq here is the place to start!
github.com/arcinstitute...
GitHub - ArcInstitute/binseq: A high efficiency binary format for sequencing data
October 29, 2025 at 8:41 PM
I've also added some nice functionality to bqtools including a very useful colored grep!
github.com/arcinstitute...
GitHub - ArcInstitute/bqtools: A command line utility for working with BINSEQ files
October 29, 2025 at 8:41 PM
Have you tried samply?
October 13, 2025 at 9:20 PM
The workspace publishing has been such a hassle. So glad to see this out
September 18, 2025 at 2:40 PM
Sounds great! Would be very interested in that and happy to help build one
September 17, 2025 at 2:18 PM
bsky.app/profile/noam...
Here's a benchmark I ran a while back comparing twobit and binseq on a single thread
Ah yes 2bit was a big inspiration for binseq - I didn't include it because it wasn't widely used and it was more geared towards large genomes so I figured it wouldn't scale.
But you're right I didn't formally test it. Here's a simple bench with Kent's utils (1-core bqtools to be fair)
September 15, 2025 at 5:15 PM
2bit was built for genomes, where there are very long contiguous N-blocks. On fastq-style records (generally very short, with non-contiguous Ns), the overhead of managing these blocks is massive and most of the time unnecessary.
September 15, 2025 at 5:13 PM
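The N-block point can be illustrated with a small sketch. This is not code from Kent's utilities or from binseq. It only mimics the idea of recording runs of N as (start, length) pairs, and the helper name `n_blocks` and both example sequences are made up for illustration.

```python
def n_blocks(seq):
    """Return (start, length) pairs for each contiguous run of N in seq."""
    blocks = []
    start = None
    for i, base in enumerate(seq):
        if base == "N":
            if start is None:
                start = i  # a new run of Ns begins here
        elif start is not None:
            blocks.append((start, i - start))  # the run ended just before i
            start = None
    if start is not None:
        # The sequence ended while still inside a run of Ns.
        blocks.append((start, len(seq) - start))
    return blocks


# Genome-like: one long gap, summarised by a single block entry.
genome_like = "ACGT" * 5 + "N" * 1000 + "ACGT" * 5
# Read-like: a short record with scattered single Ns.
read_like = "ACNTGANCCTNGA"

genome_blocks = n_blocks(genome_like)
read_blocks = n_blocks(read_like)
print(genome_blocks)
print(read_blocks)
```

One block entry covers a thousand Ns in the genome-like case, while the 13-base read needs three entries for three Ns, so the per-record bookkeeping grows much faster than the data it describes.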
Are you going to have a remote component to this? Would love to watch some of these talks if I can
June 26, 2025 at 1:28 AM
Ah this is the way that I do it in paraseq! Doesn't work for fastq headers but works well for fasta
June 24, 2025 at 8:04 PM