davidheineman.com
📄: arxiv.org/abs/2508.13144
📝: allenai.org/blog/signal-noise
💻: github.com/allenai/signal-and-noise
❗️ Simply keeping the top 16 MMLU subtasks by SNR yields better decision accuracy and lower scaling law error than using the full task (only 6 subtasks are needed for an AutoBencher task)
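To make that selection recipe concrete, here's a minimal sketch in Python: compute SNR per subtask, rank, and keep the top k. The per-subtask SNR values and the `top_k_by_snr` helper are illustrative assumptions, not the repo's API; see github.com/allenai/signal-and-noise for the actual pipeline.

```python
# Hypothetical SNR values per MMLU subtask (illustrative numbers only).
subtask_snr = {
    "abstract_algebra": 12.4,
    "anatomy": 31.7,
    "astronomy": 25.1,
    "college_biology": 18.9,
    # ... remaining MMLU subtasks ...
}

def top_k_by_snr(snr_by_subtask: dict[str, float], k: int) -> list[str]:
    """Keep the k subtasks with the highest signal-to-noise ratio."""
    return sorted(snr_by_subtask, key=snr_by_subtask.get, reverse=True)[:k]

# k=16 for MMLU (6 sufficed for an AutoBencher task); the benchmark score
# would then be computed over only the selected subtasks.
print(top_k_by_snr(subtask_snr, k=16))
```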
Higher SNR is predictive of better decision accuracy, and tasks with lower noise have lower scaling law error!
This allows SNR to be estimated with a small number of models (around 50) at any compute scale!
We want ⭐ low noise and high signal ⭐: *both* low variance during training and a high spread of scores across models.
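As a rough illustration of that definition, here's a minimal sketch assuming signal is measured as the spread of final scores across a set of models and noise as the standard deviation of one model's score over its last few training checkpoints (the toy numbers are made up; the paper's exact estimators may differ):

```python
import numpy as np

# Hypothetical final scores for a set of models on one benchmark (signal = spread).
final_scores = np.array([0.31, 0.38, 0.42, 0.47, 0.55])

# Hypothetical scores of one model over its last few training checkpoints (noise = variance).
last_checkpoints = np.array([0.54, 0.56, 0.53, 0.55, 0.57])

signal = final_scores.max() - final_scores.min()  # spread of scores across models
noise = last_checkpoints.std()                    # step-to-step variation during training
snr = signal / noise

print(f"signal={signal:.2f}  noise={noise:.4f}  SNR={snr:.1f}")
```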