Noam Teyssier
@noamteyssier.bsky.social
Bioinformatics Scientist at the Arc Institute.

Working at the intersection of functional genomics, systems biology, and machine learning. I also build rusty bioinformatics tools.

https://github.com/noamteyssier
New bqtools release with some nice new features!

1. Support for fuzzy matching using sassy
2. Multi-Pattern counting (like `grep -c` but the count is for each individual pattern provided)
3. Pattern files (providing large lists of patterns as either regex or literals)
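
As a rough illustration of the multi-pattern counting idea in item 2, here is a minimal Python sketch. It is not bqtools code: the function name `count_patterns`, the example reads, and the choice to count matching records (as `grep -c` counts matching lines) rather than individual occurrences are all assumptions made for the example.

```python
import re

def count_patterns(records, patterns):
    """Count, per pattern, how many records contain at least one match.

    Hypothetical helper for illustration; mirrors `grep -c` semantics
    (matching records, not total occurrences) for each pattern separately.
    """
    compiled = {p: re.compile(p) for p in patterns}
    counts = {p: 0 for p in patterns}
    for seq in records:
        for pattern, rx in compiled.items():
            if rx.search(seq):
                counts[pattern] += 1
    return counts

# Made-up reads and patterns (two literals and one regex).
reads = ["ACGTACGT", "TTTTACGA", "GGGGCCCC", "ACGAACGA"]
pattern_counts = count_patterns(reads, ["ACGT", "ACGA", "G{4}"])
for pattern, n in pattern_counts.items():
    print(f"{pattern}\t{n}")
```

A pattern file would fit the same shape: read one pattern per line into the `patterns` list before calling the function.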

github.com/ArcInstitute...
Release bqtools-0.4.8 · ArcInstitute/bqtools
What's Changed: 116 support fuzzy grep with sassy by @noamteyssier in #118; 119 gate fuzzy matching behind feature flag by @noamteyssier in #120; 58 implement a pattern count feature by @noamteyssier...
November 7, 2025 at 1:12 AM
I've updated the BINSEQ manuscript to stay up to date with changes since I originally put it out at the beginning of the year

Some notable changes:
1. Support for ambiguous bases with 4bit encoding
2. Support for sequence headers
3. Improved API
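
To make the 4-bit idea in item 1 concrete, here is a small Python sketch of packing nucleotides, including IUPAC ambiguity codes, two per byte. The code table below (one bit per canonical base, so an ambiguity code is the OR of the bases it covers) is an assumption chosen for illustration; the actual BINSEQ bit assignments and layout may differ, and `pack`/`unpack` are hypothetical names.

```python
# Hypothetical code table: one bit per canonical base.
BASE_BITS = {"A": 0b0001, "C": 0b0010, "G": 0b0100, "T": 0b1000}

# IUPAC symbols mapped to the set of bases each one can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Each symbol's 4-bit code is the OR (here, sum of distinct bits) of its bases.
ENCODE = {sym: sum(BASE_BITS[b] for b in bases) for sym, bases in IUPAC.items()}
DECODE = {code: sym for sym, code in ENCODE.items()}

def pack(seq):
    """Pack a nucleotide string into bytes, two 4-bit codes per byte."""
    codes = [ENCODE[base] for base in seq.upper()]
    if len(codes) % 2:
        codes.append(0)  # zero-pad the final low nibble for odd lengths
    return bytes((hi << 4) | lo for hi, lo in zip(codes[0::2], codes[1::2]))

def unpack(data, length):
    """Recover the first `length` bases from packed bytes."""
    codes = []
    for byte in data:
        codes.append(byte >> 4)
        codes.append(byte & 0x0F)
    return "".join(DECODE[c] for c in codes[:length])

packed = pack("ACGTN")
restored = unpack(packed, 5)
```

A 2-bit encoding can only represent A, C, G, and T; spending 4 bits per base leaves room for `N` and the other ambiguity codes at the cost of doubling the packed size.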

www.biorxiv.org/content/10.1...
BINSEQ: A Family of High-Performance Binary Formats for Nucleotide Sequences
Modern genomics produces billions of sequencing records per run, which are typically stored as gzip-compressed FASTQ files. While this format is widely used, it is not optimal for high-throughput proc...
October 29, 2025 at 8:41 PM
Was just about to submit a revision for a paper and realized that I wouldn't be able to submit my source for the text because it was written with typst.

Such a bummer - moving this over to tex now but damn what a waste of time!

typst is just so much nicer to work with.
October 28, 2025 at 11:31 PM
Reposted by Noam Teyssier
Around 10% of your Nanopore reads (SQK-RBK114) are incorrectly trimmed. Here is why, and how our new tool Barbell solves it:

www.biorxiv.org/content/10.1...

Want to get started? github.com/rickbeeloo/b...
October 23, 2025 at 8:16 PM
Reposted by Noam Teyssier
Thank you Alzforum for featuring our new preprint identifying regulators of disease states of #microglia.

Project led by Amanda McQuade, computation by Reet Mishra, collaboration with the Nunez and De Jager labs.

Alzforum
www.alzforum.org/news/researc...

Preprint
www.biorxiv.org/content/10.1...
October 22, 2025 at 6:28 PM
Wrote a tool called `hist` years ago to run `sort | uniq -c`, and thought up a high-perf impl for it today. Tried it out and found 25x the throughput of the coreutils version.

Big takeaway - arena allocators, hashmaps, and serde work super well together.
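
For anyone unfamiliar with the pattern, here is a minimal Python sketch of counting unique lines with a hash map, which produces the same tallies as `sort | uniq -c`. It only illustrates the hashmap part of the takeaway; it says nothing about how hist-rs handles allocation or serialization, and `line_histogram` plus the output ordering are assumptions made for the example.

```python
from collections import Counter

def line_histogram(lines):
    """Tally identical lines in one pass using a hash map.

    Hypothetical helper: returns (line, count) pairs ordered by
    descending count, with ties broken alphabetically.
    """
    counts = Counter(line.rstrip("\n") for line in lines)
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

# Made-up input standing in for lines read from a file or stdin.
sample = ["b\n", "a\n", "b\n", "c\n", "a\n", "b\n"]
histogram = line_histogram(sample)
for line, n in histogram:
    print(f"{n:>7} {line}")
```

Only the distinct lines get sorted at the end, so inputs with many repeats do far less sorting work than a full `sort` of every line.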

github.com/noamteyssier...
GitHub - noamteyssier/hist-rs: An efficient unique-line counter (25x over `sort | uniq -c`)
October 22, 2025 at 11:03 PM
Reposted by Noam Teyssier
Paraseq 0.4 is out now! With double the throughput for processing paired-end input :)

github.com/noamteyssier...
September 4, 2025 at 10:41 PM
Added a feature to bqtools yesterday for colored grep output. It also supports colored FASTX output. Already useful this morning as I troubleshoot some sequencing outputs!
September 4, 2025 at 5:56 PM
Reposted by Noam Teyssier
Excited that the paper presenting our mouse brain in vivo CRISPR screening platform is out today in @natneuro.nature.com!

Great team effort, led by Biswa Ramani and @ivlrose.bsky.social in the Kampmann lab.

www.nature.com/articles/s41...
CRISPR screening by AAV episome-sequencing (CrAAVe-seq): a scalable cell-type-specific in vivo platform uncovers neuronal essential genes - Nature Neuroscience
The authors developed an adeno-associated virus-based high-throughput in vivo CRISPR screening platform for endogenous mouse brain cell types. Using this platform, they define genes and pathways essen...
August 22, 2025 at 10:15 PM
Reposted by Noam Teyssier
Preprint alert!
We present K2Rmini, an ultra-fast, grep-like tool that extracts sequences of interest from FASTA/FASTQ files based on their k-mer content.
www.biorxiv.org/content/10.1...
A thread
Accelerating k-mer-based sequence filtering
The exponential growth of global sequencing data repositories presents both analytical challenges and opportunities. While k-mer-based indexing has improved scalability over traditional alignment fo...
July 2, 2025 at 1:00 PM
Writing in rust again after a long stretch of python is such a breath of fresh air.
June 26, 2025 at 2:47 AM
Reposted by Noam Teyssier
Introducing Arc Institute’s first virtual cell model: STATE
June 23, 2025 at 5:28 PM
Pretty cool little utility and blog post - fun to see the business/pleasure index for rust crates

boydkane.com/projects/cra...
Downloaded more for business, or pleasure?
This mini-project was inspired by this tweet: After which I spent about two hours making a small script that grabs data from the rust package repository crates.io, and analyses the ...
June 18, 2025 at 8:32 PM
Reposted by Noam Teyssier
Preprint on "Improving spliced alignment by modeling splice sites with deep learning". It describes minisplice for modeling splice signals. Minimap2 and miniprot now optionally use the predicted scores to improve spliced alignment.
arxiv.org/abs/2506.12986
June 17, 2025 at 1:49 AM
Reposted by Noam Teyssier
New preprint! Deacon is a versatile tool for filtering FASTA/FASTQ files and streams at hundreds of megabases per second using minimizers, built with rapid metagenomic host depletion in mind, but equally useful for search.
github.com/bede/deacon
Deacon: fast sequence filtering and contaminant depletion https://www.biorxiv.org/content/10.1101/2025.06.09.658732v1
June 13, 2025 at 1:25 PM
Reposted by Noam Teyssier
ish is a grep-like CLI tool that uses optimal alignment instead of exact matching.

It’s record-type aware, supporting line, FASTA, and FASTQ records.

Built in Mojo as a proof of concept for bioinformatics.

🧵1/5
Ish: SIMD and GPU Accelerated Local and Semi-Global Alignment as a CLI Filtering Tool https://www.biorxiv.org/content/10.1101/2025.06.04.657890v1
June 9, 2025 at 1:05 PM
Reposted by Noam Teyssier
Slides from my talk (with @kamilsjaron.bsky.social) on a history of k-mers in bioinformatics: rayan.chikhi.name/pdf/2025-kme...
June 3, 2025 at 9:25 AM
Reposted by Noam Teyssier
📜 Excited to share insights from our recent paper: "Kaminari: a resource-frugal index for approximate colored k-mer queries". The study aims to efficiently identify documents containing a query string, focusing on DNA strings. www.biorxiv.org/content/10.1... 🧬 🖥️ 1/8
May 27, 2025 at 12:06 PM
Reposted by Noam Teyssier
Our Proseg paper is now out in Nature Methods!
www.nature.com/articles/s41...

We borrowed a sampling procedure from the cell simulation literature to infer cell boundaries that best explain the spatial distribution of transcripts.
Cell simulation as cell segmentation - Nature Methods
Proseg is a segmentation approach for single-cell spatially resolved transcriptomics data that uses unsupervised probabilistic modeling of the spatial distribution of transcripts to accurately segment...
May 22, 2025 at 5:52 PM
Reposted by Noam Teyssier
📄 The scanners are humming, the film is flowing.

The microfiche livestream is up—digitizing government docs in real time for Democracy’s Library.

Perfect second-screen vibes: Preservation in progress.

🕢 Live M-F, 7:30am–3:30pm PT (except U.S. holidays)
➡️ www.youtube.com/live/aPg2V5R...
lofi Archive radio 🎞️ beats to scan/read microfiche to
YouTube video by Internet Archive
May 22, 2025 at 2:37 PM
Reposted by Noam Teyssier
So yeah, this is why I keep going on about: do we have to sanitize user input or not? File formats where bad inputs are simply not representable are good, because it saves us from this 100x slowdown.
May 16, 2025 at 5:51 PM
Just merged in an awesome new feature for xsra to support named pipes with @robp.bsky.social.

This lets you skip an intermediary write step and go straight from SRA to downstream tools.

It works with accessions whether they are on- or off-disk.
May 9, 2025 at 3:26 PM
No bottlenecks with binseq...

github.com/ArcInstitute...
May 2, 2025 at 12:56 AM
Extracting @NCBI SRA files with fasterq-dump can require 17x the size of the accession while decompressing. Our new tool xsra extracts sequences at 5x throughput with significantly less disk usage, built-in compression, and optional BINSEQ outputs

github.com/arcInstitute...
GitHub - ArcInstitute/xsra: An efficient CLI to extract sequences from the SRA
April 29, 2025 at 9:03 PM
I think this captures a lot of my feelings on AI coding. I appreciate that it increases productivity, but at the expense of joy (and arguably quality). I've gone through phases with it, but ultimately I'm back to writing everything myself. It's just more fun.

terriblesoftware.org/2025/04/23/t...
The Hidden Cost of AI Coding
AI coding tools boost productivity but may sacrifice the flow state and deep satisfaction developers experience when writing code by hand. What are we losing?
April 24, 2025 at 6:59 PM