austintwang.com
arxiv.org/abs/2412.05430
github.com/kundajelab/D...
neurips.cc/virtual/2024...
#machinelearning #NeurIPS2024 #genomics
• Nailing short-context tasks before long-context
• Data sampling to account for class imbalance
• Conditioning on cell type context
These strategies use external annotations, which are plentiful!
Given their resource requirements, current DNALMs are a hard sell.
Furthermore, small models trained from scratch (<10M params) routinely outperform much larger DNALMs (>1B params), even after LoRA fine-tuning!
Our results on the hardest task: counterfactual variant effect prediction.
(5/10) Rigorous evaluations of DNALMs, though critical, are lacking. Existing benchmarks:
• Focus on surrogate tasks tenuously related to practical use cases
• Suffer from inadequate controls and other dataset design flaws
• Compare against outdated or inappropriate baselines
• Learn representations that can accurately distinguish different types of functional DNA elements
• Serve as a foundation for downstream supervised models
• Outperform models trained from scratch