Lightnews — Scholar-powered news

Noam Teyssier

@noamteyssier.bsky.social

Bioinformatics Scientist at the Arc Institute.

Working at the intersection of functional genomics, systems biology, and machine learning. I also build rusty bioinformatics tools

https://github.com/noamteyssier

Posts Replies Media Videos

Noam Teyssier

@noamteyssier.bsky.social

Added a feature to bqtools yesterday for colored grep output. Also supports colored FASTX output as well. Already useful this morning as I troubleshoot some sequencing outputs!

September 4, 2025 at 5:56 PM

Noam Teyssier

@noamteyssier.bsky.social

Just merged in an awesome new feature for xsra to support named pipes with @robp.bsky.social.

This lets you skip an intermediary write step and go straight from SRA to downstream tools.

It works with accessions that are both on- or off-disk.

May 9, 2025 at 3:26 PM

Noam Teyssier

@noamteyssier.bsky.social

Ah yes 2bit was a big inspiration for binseq - I didn't include it because it wasn't widely used and it was more geared towards large genomes so I figured it wouldn't scale.

But you're right I didn't formally test it. Here's a simple bench with Kent's utils (1-core bqtools to be fair)

May 8, 2025 at 3:24 PM

Noam Teyssier

@noamteyssier.bsky.social

I did an exploration into cheap per-record operations with binseq (grep in my example) and was curious how it would hold up here.

I forked sassy for a quick test and found 25x throughput over gzip and a 16x over raw.

That includes sharing SIMD with search during 2bit decode also.

May 2, 2025 at 12:42 AM

Noam Teyssier

@noamteyssier.bsky.social

One of the major focuses of the project was to streamline extraction by removing temporary outputs and we show that xsra achieves significantly less disk usage compared to fasterq-dump.

April 29, 2025 at 9:03 PM

Noam Teyssier

@noamteyssier.bsky.social

xsra significantly outperforms fasterq-dump and fastq-dump when dumping uncompressed records but is also faster when outputting compressed records.

It can achieve almost 10x faster throughput outputting BINSEQ files when skipping FASTQ altogether

April 29, 2025 at 9:03 PM

Noam Teyssier

@noamteyssier.bsky.social

Interestingly we also observe the behavior for more complex tasks like sequence alignment. We demonstrate that BINSEQ continues to scale linearly with thread usage with both minimap2 and STAR.

April 15, 2025 at 2:40 PM

Noam Teyssier

@noamteyssier.bsky.social

Where these formats really shine is in their ability to scale. Because they don't require being parsed sequentially they continue being performant as sequential formats (FASTQ) plateau. Simple tasks like kmer-counting show gains as early as 8 threads.

April 15, 2025 at 2:40 PM

Noam Teyssier

@noamteyssier.bsky.social

BINSEQ and VBINSEQ are the only family members (for now). They are both built around two-bit encoded nucleotides and each support paired records natively.

We provide rust libraries for IO, C and C++ bindings to BINSEQ, and a CLI tool to easily manipulate them.

April 15, 2025 at 2:40 PM

Add to Home Screen

Light up
your news

Add to Home Screen

Light upyour news

Sign in to Lightnews

Sign up to start reading

Connect Bluesky

Connect with Bluesky

Light up
your news