Austin Richardson
agdr.org
@agdr.org
Metagenomics, Software Engineering
Pinned
NCBI's Taxonomy changes over time. We built Taxonomy Time Machine to track these changes:

🕰️ app: taxonomy.onecodex.com
📄 pre-print: www.biorxiv.org/content/10.1...
Reposted by Austin Richardson
🚨New preprint out!
We present a foundational genomic resource of human gut microbiome viruses. It delivers high-quality, deeply curated data spanning taxonomy, predicted hosts, structures, and functions, providing a reference for gut virome research. (1/8)
www.biorxiv.org/content/10.1...
November 6, 2025 at 5:26 PM
Reposted by Austin Richardson
When you buy a cutting board from bioinformaticians
October 26, 2025 at 10:56 PM
Reposted by Austin Richardson
gut fauna
March 31, 2025 at 9:48 AM
Reposted by Austin Richardson
Tech snow day
October 20, 2025 at 4:00 PM
Reposted by Austin Richardson
Apple's approach to protein structure is great for accessibility - & potentially biological realism - reasons.

Eg, prediction could be achieved w/ smaller compute & the generative nature of prediction allows for multiple conformations

A summary here: genomely.substack.com/p/simplefold...
SimpleFold and the Future of Protein Folding
A Generative Shift in Protein Folding
genomely.substack.com
September 25, 2025 at 7:20 PM
Reposted by Austin Richardson
If you're wondering why we're hosting the pre-print via Dropbox, it's because arXiv (and bioRxiv) did not accept it (because it is a review). It's a bit disconcerting, because a review is precisely the type of paper that would benefit a lot from pre-publication dissemination and feedback.
Thank you folks for your feedback on our survey about Hash functions in genomic sequence analysis. We've updated the paper and you can see the new version here: tinyurl.com/4kk9ccmt.
September 25, 2025 at 1:25 PM
Closed my eyes for a sec and summoned another earthquake
September 23, 2025 at 1:23 AM
they should invent a type of volatile memory that gets heavier the more data it contains
September 15, 2025 at 10:29 PM
Reposted by Austin Richardson
Blogged about how zstd --long fills the gap between fast and slow-but-high-ratio genome compression methods log.bede.im/2025/09/12/z...
September 12, 2025 at 3:07 PM
NCBI BLAST can now output a proper CSV with headers 👏🎉: www.ncbi.nlm.nih.gov/books/NBK131...
BLAST+ Release Notes
Note: If building BLAST+ from source code, the Zlib, Zstandard and Bzip2 libraries will be needed.
www.ncbi.nlm.nih.gov
August 31, 2025 at 5:57 PM
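A CSV with a header row can be read by column name instead of by position. A minimal sketch, assuming the header uses the usual tabular field names (`qseqid`, `sseqid`, `pident`, `bitscore`); those names, the helper `best_hit_per_query`, and the sample rows are illustrative and not taken from the release notes:

```python
import csv
import io


def best_hit_per_query(csv_text):
    """Return the highest-bitscore row for each query in a headered CSV."""
    best = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        query = row["qseqid"]  # assumed column name
        score = float(row["bitscore"])  # assumed column name
        if query not in best or score > float(best[query]["bitscore"]):
            best[query] = row
    return best


# Made-up example rows, for illustration only.
example = (
    "qseqid,sseqid,pident,bitscore\n"
    "read1,refA,99.1,180\n"
    "read1,refB,97.0,150\n"
    "read2,refB,88.5,90\n"
)

hits = best_hit_per_query(example)
for query, row in hits.items():
    print(query, row["sseqid"], row["bitscore"])
```

With headers in the file, reordering or adding output columns does not break this kind of parsing, which is the main practical gain over positional indexing.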
you can just pour milk over trail mix and eat it like cereal
August 25, 2025 at 5:54 PM
Reposted by Austin Richardson
Little writeup on the speed of fasta parsers, at last.

Basically: both needletail and paraseq process input linearly, and thus have a limit around 4 GB/s.

By giving each thread its own slice of the input file, we're limited by RAM bandwidth instead :)

curiouscoding.nl/posts/fasta-...
August 6, 2025 at 5:42 PM
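The slicing idea can be sketched in a few lines. This is not the code from the linked writeup, just a toy illustration: the function names and the worker count are hypothetical, and Python threads will not reach the throughput discussed above. Each worker counts only the records whose `>` header byte falls inside its own byte range, so no record is counted twice or missed at a slice boundary:

```python
import os
from concurrent.futures import ThreadPoolExecutor


def count_records_in_slice(path, start, end):
    """Count FASTA headers whose '>' byte lies in [start, end)."""
    with open(path, "rb") as handle:
        if start == 0:
            # Pretend a newline precedes the file so a header at byte 0 counts.
            data = b"\n" + handle.read(end)
        else:
            # Include the byte before the slice to see if it ends a line.
            handle.seek(start - 1)
            data = handle.read(end - start + 1)
    # A header is a '>' that immediately follows a newline.
    return data.count(b"\n>")


def count_records(path, n_workers=4):
    """Split the file into byte slices and count records in each one."""
    size = os.path.getsize(path)
    step = max(1, -(-size // n_workers))  # ceiling division
    bounds = [(s, min(s + step, size)) for s in range(0, size, step)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        counts = pool.map(lambda b: count_records_in_slice(path, *b), bounds)
        return sum(counts)
```

Slices are cut at arbitrary byte offsets, so a record may straddle two slices. Assigning each record to the slice that contains its header is what makes the per-slice results add up.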
Reposted by Austin Richardson
I do not enjoy that we now live in a world where seeing this banner at the top of PubMed makes me nervous.
July 23, 2025 at 2:37 PM
Reposted by Austin Richardson
TIL the EBV genome is *included in the hg38 assembly* so that EBV reads are not erroneously mapped elsewhere to the human genome. That's certainly .... an interesting solution ... 🤯

But it enabled this extremely cool work:
In a cool twist of fate, the EBV contig is in hg38 to mop up unscrupulous EBV reads, a by-product of immortalizing lymphoblastic cell lines (used in 1000 Genomes Project, etc.). Hence, a simple `samtools view` could get a measure of persistent EBV DNA in large WGS cohorts, e.g., UK Biobank. 4/
July 22, 2025 at 10:30 PM
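One hedged way to turn that into a number is to swap `samtools view` for the per-contig summary that `samtools idxstats` prints (tab-separated: contig name, length, mapped reads, unmapped reads) and compute the share of mapped reads on the EBV contig. The contig name `chrEBV`, the helper name, and all counts below are assumptions for illustration, not values from the quoted work:

```python
def contig_read_fraction(idxstats_text, contig="chrEBV"):
    """Fraction of mapped reads assigned to one contig in idxstats output."""
    mapped_total = 0
    mapped_contig = 0
    for line in idxstats_text.splitlines():
        if not line.strip():
            continue
        name, _length, mapped, _unmapped = line.split("\t")
        mapped_total += int(mapped)
        if name == contig:
            mapped_contig += int(mapped)
    return mapped_contig / mapped_total if mapped_total else 0.0


# Made-up idxstats-style text; lengths and counts are illustrative only.
example = (
    "chr1\t248956422\t900\t0\n"
    "chrEBV\t171823\t100\t0\n"
    "*\t0\t0\t50\n"
)

fraction = contig_read_fraction(example)
print(f"{fraction:.3f}")
```

The trailing `*` line holds reads with no assigned contig; its mapped count is zero, so it does not affect the denominator here.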
Reposted by Austin Richardson
This is a bad take
stevensalzberg.substack.com/p/i-know-gen...

Saying that DNA data is like your browsing data and can therefore be leaked is a false equivalence. Thing A is on fire so it's fine for thing B to be on fire, too-style argumentation.
I know genomes. Don't delete your DNA
Too many people are panicking about 23andMe.
stevensalzberg.substack.com
July 21, 2025 at 11:51 PM
Q: what do viruses and potatoes have in common?
A: both are "acellular root"
June 30, 2025 at 6:35 PM
Reposted by Austin Richardson
June 23, 2025 at 5:02 PM
Reposted by Austin Richardson
Are you attending #ASMicrobe this week? Stop by my talk on Friday morning (10AM) and say hello! 👋 If you can’t make it and want to meet up - just drop me a DM!

I love this meeting, and connecting with so many friends and colleagues over the years has made it a really special one.
June 19, 2025 at 8:11 PM
🌳 Taxonomy Time Machine now supports batch lookups! Quickly resolve lists of names/TaxIDs to their current NCBI taxonomy → taxonomy.onecodex.com/bulk-resolver
June 16, 2025 at 6:02 PM
🚀 Pushed some updates to taxonomy.onecodex.com

- Example queries to help you get started
- Summary section for easier interpretation
- Perf. improvements
Taxonomy Time Machine
Explore and compare the history of the NCBI Taxonomy Database. Instantly browse, search, and reconstruct taxonomic lineages at any point in time. Open source, web-based, and API-accessible.
taxonomy.onecodex.com
May 24, 2025 at 9:38 PM
Reposted by Austin Richardson
🧵 The ATCC Genome Portal hit 5,500 authenticated microbial genomes (>2,600 species)! 🎉🥳 We've sequenced, assembled, annotated 4,538 bacteria, 479 viruses, 479 fungi, and 4 protists! All NGS in-house @ ATCC under ISO, and >90% on BOTH @nanoporetech.com and #Illumina 😎 www.atcc.org/applications...
Discover the ATCC Genome Portal | ATCC
www.atcc.org
April 2, 2025 at 7:23 PM
Something happened to my $PATH and now nothing works

Trisolarans: “the Sophons have succeeded in disrupting science”
April 2, 2025 at 8:09 PM
Bad day for VCF files
March 24, 2025 at 4:24 PM
Reposted by Austin Richardson
It's clearly a DNS issue, but overall, the NCBI is the least reliable I've ever experienced in my career. And I've been in this long enough to remember the Entrez API giving you only part of the file every 50-100th time.
March 2, 2025 at 1:45 PM
using github copilot to fail at github workflows aka boiling the ocean
February 15, 2025 at 1:43 AM