Full paper: arxiv.org/pdf/2510.22037
Huge thanks to my brilliant co-authors: Sneha, Niklas, I-Hung, Isaac, Sandy, Sercan, Chen-Yu, and Sayna!
🌟Answer: We found compute-optimal crossover points for every model size.
Rough rule of thumb: finetune if your compute budget C < 10^10 × N^1.54; otherwise pretrain.
8/
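A minimal sketch of that rule, assuming C is in FLOPs and N is the parameter count (constant and exponent as quoted; see the paper for the exact fit):

```python
def should_finetune(compute_budget: float, model_params: float) -> bool:
    """Finetune below the crossover C* = 1e10 * N**1.54, pretrain above."""
    crossover = 1e10 * model_params ** 1.54
    return compute_budget < crossover

# Example: a 1B-parameter model with a 1e22-FLOP budget -> finetune.
print("finetune" if should_finetune(1e22, 1e9) else "pretrain")
```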
The curse of multilinguality is real but quantifiable: ϕ=0.11 (capacity penalty), ψ=-0.04 (data benefit from transfer).
7/
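One illustrative way such exponents could enter a Chinchilla-style law (an assumed form for intuition, not necessarily the paper's exact parameterization): the language count K rescales effective capacity and effective data.

```python
PHI = 0.11   # capacity penalty: more languages shrink effective capacity
PSI = -0.04  # data benefit: transfer slightly inflates effective data

def effective_capacity(n_params: float, n_langs: int) -> float:
    return n_params * n_langs ** -PHI

def effective_data(n_tokens: float, n_langs: int) -> float:
    return n_tokens * n_langs ** -PSI  # negative PSI => grows with n_langs

# With parameters and tokens fixed, scale from 1 to 64 languages:
for k in (1, 8, 64):
    print(k, f"{effective_capacity(1e9, k):.2e}", f"{effective_data(1e11, k):.2e}")
```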
🌟Answer: We derived closed-form equations! To go from K to 4K languages while maintaining performance: scale data by 2.74×, model size by 1.4×.
6/
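Quick back-of-envelope check, assuming those equations are power laws in the language multiplier m (the quoted 4× case pins down the exponents):

```python
import math

a = math.log(2.74) / math.log(4)  # data exponent, ~0.73
b = math.log(1.4) / math.log(4)   # model-size exponent, ~0.24

for m in (2, 4, 16):
    print(f"{m}x languages -> data x{m**a:.2f}, model x{m**b:.2f}")
# 4x languages -> data x2.74, model x1.40 (the quoted numbers)
```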
Languages sharing a writing system (e.g., Latin script) show dramatically better transfer (mean: -0.23) than languages with different scripts (mean: -0.39).
Also important: transfer is often asymmetric (A helping B ≠ B helping A).
5/
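A toy sketch of both points, with made-up scores chosen to echo the quoted means (the real matrix and score definition are in the paper); keying by ordered (source, target) pairs makes the asymmetry explicit:

```python
from statistics import mean

script = {"en": "Latin", "fr": "Latin", "ru": "Cyrillic", "zh": "Han"}
transfer = {  # transfer[(source, target)]: note (a, b) != (b, a)
    ("en", "fr"): -0.21, ("fr", "en"): -0.25,
    ("en", "ru"): -0.37, ("ru", "en"): -0.41,
    ("zh", "fr"): -0.43, ("fr", "zh"): -0.35,
}
same = [s for (a, b), s in transfer.items() if script[a] == script[b]]
diff = [s for (a, b), s in transfer.items() if script[a] != script[b]]
print(f"same-script mean: {mean(same):.2f}, cross-script mean: {mean(diff):.2f}")
```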
🌟Answer: We measure this empirically. We built a 38×38 transfer matrix, or 1,444 language pairs—the largest such resource to date.
We highlight the top 5 most beneficial source languages for each target language.
4/
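A hypothetical sketch of the top-5 extraction, assuming T[s, t] scores how much source s helps target t and higher is better (flip the sort if the paper's convention is reversed); the matrix and names below are placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)
langs = [f"lang{i:02d}" for i in range(38)]  # placeholder names
T = rng.normal(size=(38, 38))                # stand-in for the real matrix

top5 = {}
for t, target in enumerate(langs):
    scores = T[:, t].copy()
    scores[t] = -np.inf                      # exclude self-transfer
    best = np.argsort(scores)[::-1][:5]      # five most helpful sources
    top5[target] = [langs[s] for s in best]

print(top5[langs[0]])
```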
Without modeling cross-lingual transfer, existing scaling laws fail in multilingual settings.
3/
🌟Answer: Yes! ATLAS outperforms prior work: R²(N)=0.88 vs 0.68 for scaling over model size, and R²(M)=0.82 vs 0.69 for mixture generalization.
2/
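For reference, a generic sketch of the R² computation, assuming the standard definition applied to held-out points:

```python
import numpy as np

def r_squared(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """1 - SS_res / SS_tot: fraction of held-out variance explained."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)

# e.g. r_squared(held_out_losses, scaling_law_predictions)
```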
He has done some of the best research on fine-grained, scalable, and human-aligned LLM-as-a-judge evaluation.
➡️ Flask
➡️ Prometheus 1 & 2
➡️ Multilingual Prometheus
➡️ KMMLU
➡️ BigGen Bench
Thank you to the team and advisors!
🧵/
Most surprising to me is that despite some growth in language/geographic coverage, representation hasn't significantly improved in a decade.
Check out the paper: arxiv.org/pdf/2412.17847
2/
Panelists: @atoosakz.bsky.social, @randomwalker.bsky.social, @alondra.bsky.social, and Deirdre K. Mulligan.
Moderator: @shaynelongpre.bsky.social.
#AIDemocraticFreedoms
🧵/