Daniel Anderson
danderson123.bsky.social
Reposted by Daniel Anderson
Rewriting protein alphabets with language models https://www.biorxiv.org/content/10.1101/2025.11.27.690975v1
November 29, 2025 at 2:47 AM
Reposted by Daniel Anderson
Deciphering enzymatic potential in metagenomic reads through DNA language model https://www.biorxiv.org/content/10.1101/2024.12.10.627786v1
December 12, 2024 at 2:47 AM
Reposted by Daniel Anderson
A General Transformer-Based Multi-Task Learning Framework for Predicting Interaction Types between Enzyme and Small Molecule https://www.biorxiv.org/content/10.1101/2025.10.09.681419v1
October 11, 2025 at 8:46 AM
Reposted by Daniel Anderson
RemoteFoldSet: Benchmarking Structural Awareness of Protein Language Models
Zinnia Ma, Neville P. Bethel
bioRxiv 2025.09.23.678152; doi: doi.org/10.1101/2025...
Protein language models (pLMs) have the capacity to infer structural information from amino acid sequences. Evaluating the extent to which structural information they truly encode is crucial for asses...
September 29, 2025 at 3:49 AM
Reposted by Daniel Anderson
Precisely calling mutations across hundreds of bacterial isolates has been hard, requiring manual filtering and expertise.

Until now, using AccuSNV.

Herui Liao trained an ML model based on our previous meticulously called SNVs.
www.biorxiv.org/content/10.1...
High-accuracy SNV calling for bacterial isolates using deep learning with AccuSNV
Accurate detection of mutations within bacterial species is critical for fundamental studies of microbial evolution, reconstructing transmission events, and identifying antimicrobial resistance mutati...
September 29, 2025 at 7:45 PM
Reposted by Daniel Anderson
September 30, 2025 at 4:21 PM
Reposted by Daniel Anderson
Machine learning for biosecurity: A probabilistic framework for invasive species management. Journal of Applied Ecology, 00, 1–13. doi.org/10.1111/1365...
Machine learning for biosecurity: A probabilistic framework for invasive species management
By using pre-introduction traits and leveraging ML for early detection, this study presents a scalable, data-driven framework for invasion risk assessment and conservation planning. Our approach enab...
October 4, 2025 at 12:30 PM
Reposted by Daniel Anderson
Our preprint on our new metagenomic HiFi assembler Alice is out 🥳 Based on a *new sketching method* (🧵1/6)
👉 Preprint www.biorxiv.org/content/10.1...
👉 Github github.com/rolandfaure/...
Alice: fast and haplotype-aware assembly of high-fidelity reads based on MSR sketching
We introduce Mapping-friendly Sequence Reduction (MSR) sketches, a sketching method for high-fidelity (HiFi) long reads, and Alice, an assembler that operates directly on these sketches. MSR produces ...
October 3, 2025 at 2:51 PM
Reposted by Daniel Anderson
There are millions of openly available microbial genomes, but searching them can be slow.

Until now 🥁

Introducing LexicMap, a new alignment tool that lets scientists search these data in minutes, helping track antibiotic resistance, trace outbreaks, and more.

www.ebi.ac.uk/about/news/r...
🦠
How to rapidly search the world’s microbial DNA
By making the world’s microbial DNA easier to explore, LexicMap helps researchers track outbreaks, study antibiotic resistance, and understand microbial diversity.
September 30, 2025 at 9:47 AM
Reposted by Daniel Anderson
"We show that, despite this compression factor, SSEs can be used as a highly effective tertiary structure comparison tool, with accuracy that approaches that of Foldseek, while offering a 200-fold speedup."

www.biorxiv.org/content/10.1...
Compression of protein secondary structures enables ultra-fast and accurate structure searching
Protein structure prediction has undergone a revolution with the advent of AI- based algorithms, such as AlphaFold and RoseTTAFold. As a result, over 200 million predicted protein structures have been...
September 17, 2025 at 6:53 PM
Reposted by Daniel Anderson
Sometimes you meet absolutely incredible bioinfo-magicians.
It was a huge privilege when @shenwei356.bsky.social joined our group for a year on an @embl.org sabbatical.
While here, he developed a new way of aligning to millions of bacteria, called LexicMap 1/n
www.nature.com/articles/s41...
Efficient sequence alignment against millions of prokaryotic genomes with LexicMap - Nature Biotechnology
LexicMap uses a fixed set of probes to efficiently query gene sequences for fast and low-memory alignment.
September 10, 2025 at 9:12 AM
Couldn’t have said it better myself!
Delighted to see this paper from danderson123.bsky.social's PhD out. We have been building tools for AMR gene detection for over a decade now, but multicopy genes remain challenging. Dan shows that with a gene-space de Bruijn graph and long reads, you can do well.
www.biorxiv.org/content/10.1...
May 20, 2025 at 5:47 PM