https://willieneis.github.io
We also release new benchmarks for genomic detection and embedding (e.g., Gene-MTEB, based on MTEB for LLMs).
See our paper for details: arxiv.org/abs/2501.02045
Wastewater provides a rich source of data from tens of thousands of species across the human-adjacent microbiome. In total we pretrain on over 1.5T base pairs of DNA/RNA.