Olmo 3.1 is…
🐡 32B Thinking, still the best fully-open model to date
🐠 32B Instruct, for people who hate long yapping; as good as Qwen 3
We added 10 more pages to the paper! Thanks for the community feedback from conversations at NeurIPS.
We mined the web for thousands of real-world “how to do X” step-by-step instructions and turned them into a dataset, a synthetic-data training procedure, an eval suite, and more.
How2Everything evaluates and trains for this at scale. 🧵
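A minimal sketch of the kind of mining step described above — this is an illustrative heuristic, not the actual How2Everything pipeline; the function and threshold are hypothetical:

```python
import re

# Matches lines like "1. Boil water." or "Step 2) Rinse the filter."
STEP_RE = re.compile(r"^\s*(?:step\s*)?\d+[.)]\s+\S", re.IGNORECASE | re.MULTILINE)

def looks_like_howto(text: str, min_steps: int = 3) -> bool:
    """Flag a page as a 'how to do X' candidate if its first line reads
    like a how-to title and the body has at least `min_steps` numbered steps."""
    first_line = text.strip().splitlines()[0].lower()
    return "how to" in first_line and len(STEP_RE.findall(text)) >= min_steps

doc = "How to brew pour-over coffee\n1. Boil water.\n2. Rinse the filter.\n3. Pour in circles."
print(looks_like_howto(doc))  # True
```

In a real pipeline a filter like this would only be a first pass, with model-based filtering and rewriting downstream.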
Congrats to our lead @akariasai.bsky.social and the team of students and Ai2 researchers/engineers!
www.nature.com/articles/s41...
Call for papers is out. Topics include:
🐟 LMs as evaluators
🐠 Living benchmarks
🍣 Eval with humans
and more
New for 2026: Opinion & Statement Papers!
Full CFP: gem-workshop.com/call-for-pap...
I'm on board with the views that "English is the new programming language" and that software engineering, translating ambiguous goals into technical specs and execution, is still a skill.
I'm more concerned with the shift in my role from writer to reviewer.
Kept prompting it with examples of more informative topics, and it ended up with "LLM training", "LLM datasets", and "LLM evaluation".
thx
I like the idea of different feeds, but I actually want my subscribing to select feeds to be taken as a preference signal ("more like this") that informs a home/default feed.
I really dislike the UX of having to tab through each subscribed feed, especially when there's post overlap.
Come say hi 👋 if you wanna chat about
🦈 Olmo 3 stories
🐟 pretraining data & evals
🍣 why midtraining shouldn't exist
🐠 model specialization
🐡 AI for education
🍥 tabletop games
🐟 Olmo 3 32B Base, the best fully-open base model to date, near Qwen 2.5 & Gemma 3 on diverse evals
🐠 Olmo 3 32B Think, the first fully-open reasoning model approaching Qwen 3 levels
🐡 12 training datasets corresponding to the different training stages
LaTeX table formatting errors (straight-up missing "&"s so columns are misaligned, dropping a whole column, or shifting values by one position) make it feel unusable imo 😒
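For reference, the failure mode is easy to see in a minimal (hypothetical) tabular: each row needs exactly one `&` per column boundary, or every later cell silently shifts left.

```latex
% Correct: 3 columns declared, so 2 "&" separators per row
\begin{tabular}{lcc}
Model      & Acc. & F1   \\
Olmo 3 32B & 81.2 & 79.5 \\  % numbers here are placeholders
\end{tabular}
% Broken (the complaint): a missing "&" merges two cells,
% e.g. "Olmo 3 32B  81.2 & 79.5 \\" puts "81.2" in the Model
% column and shifts the remaining values one position left.
```

LaTeX doesn't error on a short row, which is why these mistakes slip through visually.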
🐟 interns own major parts of our model development, sometimes even leading whole projects
🐡 we're committed to open science and actively help our interns publish their work
Reach out if you wanna build open language models together 🤝
Links 👇
Small multimodal foundation language models, plus a system for finetuning them for important uses like agriculture, wildfire management, conservation & more 🌿
🔥 Training our VLM using RLVR with binary unit-test rewards 🔥
It's incredibly effective, and unit-test creation is easy to scale with synthetic data pipelines.
Check it out at olmocr.allen.ai
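The reward scheme above can be sketched in a few lines — a minimal illustration of a binary unit-test reward, assuming hypothetical helper names (`run_unit_tests`, `binary_reward`) rather than the actual olmOCR code:

```python
# RLVR with binary unit-test rewards: the model output earns 1.0
# only if it passes every unit test, otherwise 0.0.

def run_unit_tests(output: str, tests: list) -> bool:
    """Return True only if every test predicate passes on the model output."""
    return all(test(output) for test in tests)

def binary_reward(output: str, tests: list) -> float:
    """Verifiable reward: 1.0 if all unit tests pass, else 0.0."""
    return 1.0 if run_unit_tests(output, tests) else 0.0

# Example: OCR-style checks on an extracted text span
tests = [
    lambda s: "Total: $42.00" in s,  # exact string must be recovered
    lambda s: s.count("\n") >= 2,    # layout preserved as separate lines
]
print(binary_reward("Item A\nItem B\nTotal: $42.00", tests))  # 1.0
print(binary_reward("Total 42", tests))                       # 0.0
```

The binary signal is what makes the reward verifiable and cheap to scale: each synthetic example only needs a pass/fail check, not a graded score.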
Findings from a large-scale survey of 800 researchers on how they use LMs in their research #COLM2025
Come chat with me about pretraining horror stories, data & evals, what we're cooking for the next Olmo, etc.
Made a 🔥 poster for the Thursday session, come say hi