Willie Neiswanger
@willieneis.bsky.social
Assistant Professor in CS + AI at USC. Previously at Stanford, CMU. Machine Learning, Decision Making, AI-for-Science, Generative AI, ML Systems, LLMs.

https://willieneis.github.io
METAGENE-1 shows state-of-the-art results on pathogen detection, metagenomic embedding, and other genomic tasks.

We also release new benchmarks for genomic detection and embedding (e.g., Gene-MTEB, based on MTEB for LLMs).

See our paper for details: arxiv.org/abs/2501.02045
January 7, 2025 at 8:58 PM
Our data pipeline is: human microbiome > wastewater > metagenomic sequences > tokens > training data.

Wastewater provides a rich source of data from tens of thousands of species across the human-adjacent microbiome. In total, we pretrain on over 1.5T base pairs of DNA/RNA.
January 7, 2025 at 8:58 PM
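The "metagenomic sequences > tokens > training data" steps of the pipeline can be illustrated with a minimal sketch. This is not METAGENE-1's actual tokenizer or data code, which these posts do not describe. It swaps in simple non-overlapping k-mer tokenization as a stand-in, and every name here (`kmer_tokenize`, `build_vocab`, `pack_examples`, the `<pad>`/`<eos>` tokens, `k=4`, `seq_len=8`) is hypothetical and chosen for illustration only.

```python
from typing import Dict, Iterable, List


def kmer_tokenize(read: str, k: int = 4) -> List[str]:
    """Split one DNA/RNA read into non-overlapping k-mers.

    Hypothetical stand-in tokenizer: the last token may be shorter than k.
    """
    # Assumption: treat RNA uracil (U) as thymine (T) so DNA and RNA reads share a vocabulary.
    read = read.upper().replace("U", "T")
    return [read[i:i + k] for i in range(0, len(read), k)]


def build_vocab(reads: Iterable[str], k: int = 4) -> Dict[str, int]:
    """Assign an integer ID to every k-mer seen, in first-seen order.

    IDs 0 and 1 are reserved for the hypothetical <pad> and <eos> tokens.
    """
    vocab: Dict[str, int] = {"<pad>": 0, "<eos>": 1}
    for read in reads:
        for tok in kmer_tokenize(read, k):
            if tok not in vocab:
                vocab[tok] = len(vocab)
    return vocab


def pack_examples(reads: Iterable[str], vocab: Dict[str, int],
                  k: int = 4, seq_len: int = 8) -> List[List[int]]:
    """Turn reads into fixed-length training examples.

    Token IDs of all reads are concatenated into one stream (each read followed
    by <eos>), cut into chunks of seq_len, and the final chunk is right-padded.
    """
    stream: List[int] = []
    for read in reads:
        stream.extend(vocab[t] for t in kmer_tokenize(read, k))
        stream.append(vocab["<eos>"])
    examples = [stream[i:i + seq_len] for i in range(0, len(stream), seq_len)]
    if examples and len(examples[-1]) < seq_len:
        examples[-1] += [vocab["<pad>"]] * (seq_len - len(examples[-1]))
    return examples


if __name__ == "__main__":
    reads = ["ACGTACGTAA", "ACGTTTGA"]  # toy reads, not real data
    vocab = build_vocab(reads)
    print(vocab)                        # {'<pad>': 0, '<eos>': 1, 'ACGT': 2, 'AA': 3, 'TTGA': 4}
    print(pack_examples(reads, vocab))  # [[2, 2, 3, 1, 2, 4, 1, 0]]
```

Packing many short reads into one stream before chunking is a common way to avoid wasting context on padding. A real pretraining run over more than 1.5T base pairs would also need a learned tokenizer, streaming I/O, and far longer sequences than this toy `seq_len`.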
Metagenomic sequencing of wastewater produces vast amounts of data that can capture public health trends at a societal scale. Our goal is to train a model on this data to help in large-scale wastewater monitoring & detection of novel biothreats.
January 7, 2025 at 8:58 PM