Gonzalo Benegas
gonzalobenegas.bsky.social
Comp Bio Postdoc @ UC Berkeley
https://gonzalobenegas.github.io/
The conservation-aware models CADD and GPN-MSA do better on Mendelian trait variants, which are expected to be under strong purifying selection. On complex trait variants, especially for non-disease traits, the functional-genomics models Enformer and Borzoi tend to do better. Still, ensembling helps.
February 13, 2025 at 8:57 PM
We evaluate models zero-shot (unsupervised) and with linear probing (logistic regression on top of extracted features):
February 13, 2025 at 8:57 PM
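To make the linear-probing setup concrete, here is a minimal sketch in plain Python: a logistic regression trained by gradient descent on per-variant feature vectors. Everything in it is an assumption for illustration. The features are synthetic stand-ins for what would be extracted from a sequence model, the names `fit_linear_probe`, `predict_proba`, and `make_toy` are hypothetical, and accuracy is used only as a simple toy metric. It is not the benchmark's actual pipeline. Zero-shot evaluation, by contrast, would rank variants by a model's own score with no training step at all.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_linear_probe(X, y, lr=0.1, epochs=500, l2=0.01):
    """Fit logistic regression (weights w, bias b) by batch gradient descent.

    X: list of feature vectors, one per variant (assumed precomputed).
    y: list of 0/1 labels (1 = putative causal, 0 = matched control).
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * d
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Average the gradient and apply L2 shrinkage to the weights only.
        w = [wj - lr * (gw / n + l2 * wj) for wj, gw in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(X, w, b):
    # Predicted probability that each variant is causal.
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


def make_toy(n, rng):
    # Synthetic data: feature 0 is informative, feature 1 is pure noise.
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        X.append([rng.gauss(1.5 * label, 1.0), rng.gauss(0.0, 1.0)])
        y.append(label)
    return X, y


rng = random.Random(0)
X_train, y_train = make_toy(200, rng)
X_test, y_test = make_toy(100, rng)

w, b = fit_linear_probe(X_train, y_train)
probs = predict_proba(X_test, w, b)
accuracy = sum((p > 0.5) == bool(t) for p, t in zip(probs, y_test)) / len(y_test)
print(f"weights={w}, bias={b:.3f}, held-out accuracy={accuracy:.2f}")
```

The model's parameters stay frozen in this setup: only the small linear layer on top is trained, so the probe measures how much trait-relevant signal the extracted features already carry.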
We evaluate a wide range of models, with up to 7B parameters and context sizes of up to 500K. Do these numbers matter? 🤔
February 13, 2025 at 8:57 PM
We collect putative causal variants from OMIM and UKBB with carefully matched controls.
February 13, 2025 at 8:57 PM
Can DNA sequence models predict mutations affecting human traits?

We introduce TraitGym, a curated benchmark of causal regulatory variants for 113 Mendelian & 83 complex traits, and evaluate functional genomics and DNA language models. Joint work w/ Gökcen Eraslan and @yun-s-song.bsky.social 🧵👇
February 13, 2025 at 8:57 PM