Johan Nystrom-Persson
jtnystrom.bsky.social
Metagenomics, algorithmic bioinformatics, programming languages. Performance engineering, esp w. Scala and Spark. Swedish, living in Japan. Consulting: https://jnpsolutions.io/
Slacken is available on GitHub (github.com/JNP-Solution...) and reference libraries are available on S3 thanks to AWS Open Data sponsorship.

Feedback very welcome. I'd be happy to answer any questions or assist people in getting started. We want Slacken to be as accessible as possible.
GitHub - JNP-Solutions/Slacken: Highly scalable implementation of the Kraken 2 genomic sequence classification method. Based on Apache Spark.
June 18, 2025 at 2:38 AM
2) We show that dynamically tailoring a genomic reference library to the samples being classified greatly increases the fraction of species- and strain-level classifications (making them more specific) and also improves Bracken quantification.
June 18, 2025 at 2:38 AM
1) We introduce a new implementation of the Kraken 2 method on Apache Spark, which has comparable cost-performance when classifying multiple samples.
June 18, 2025 at 2:38 AM
Particularly with a focus on making the software accessible for people with no Spark experience.
April 14, 2025 at 6:52 AM
I can imagine that people who are entering software development now might get the false impression that there's only accidental complexity and AI is our only hope to temper it. But you only get to understand simplicity by developing your own taste for it (by fighting complexity for long enough).
April 4, 2025 at 6:55 AM
One challenge I think young people are facing is that you have to wade through so much accidental complexity before you start seeing the light. It was only in my mid-30s that I think I understood how to value simplicity and elegance. Before that, I often could not see the forest for the trees.
April 4, 2025 at 6:52 AM
Reposted by Johan Nystrom-Persson
CDC datasets have been saved. But you can still help by seeding.
CDC datasets uploaded before January 28th, 2025 : Centers for Disease Control and Prevention : Free Download, Borrow, and Streaming : Internet Archive
An archive of all CDC datasets uploaded to https://data.cdc.gov/browse before January 28th, 2025. Excludes corrupt datasets and data not publicly accessible.
archive.org
February 2, 2025 at 4:38 AM
This resolves an inherent conflict between scalability and precision in Kraken 2.
January 21, 2025 at 5:45 AM