Kyle Lo
@kylelo.bsky.social
language model pretraining @ai2.bsky.social, co-lead of data research w/ @soldaini.net, statistics @uw, open science, tabletop, seattle, he/him,🧋 kyleclo.com
incredibly fun project led by our intern yapei chang

we mined the web for thousands of real-world “how to do X” step-by-step instructions and turned them into a dataset, a synthetic-data training procedure, an eval suite, etc.
LLMs often generate step-by-step instructions, from real-world tasks (how do I file taxes?) to plans for AI agents. Improving this is hard: outputs can sound fluent even when the steps don't work, and current datasets cover few domains.

How2Everything evals/trains for this at scale. 🧵
February 10, 2026 at 8:34 PM
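A hypothetical sketch of the mining step described above (the field names and parsing heuristic are my own illustration, not the actual How2Everything pipeline): turn a numbered how-to page into an (instruction, steps) record:

```python
# Hypothetical sketch: mine a "how to X" page into a structured record.
# Field names ("task", "steps") and the numbered-line heuristic are
# illustrative only, not the actual How2Everything pipeline.
import re

def parse_howto(title: str, body: str) -> dict:
    """Split a numbered how-to page into an (instruction, steps) record."""
    steps = [
        re.sub(r"^\d+\.\s*", "", line).strip()   # drop the "1. " prefix
        for line in body.splitlines()
        if re.match(r"^\d+\.", line.strip())     # keep only numbered lines
    ]
    return {"task": title, "steps": steps}

record = parse_howto(
    "How to file taxes",
    "1. Gather W-2 and 1099 forms\n"
    "2. Choose filing software\n"
    "3. Submit by the deadline",
)
print(record["task"])        # How to file taxes
print(len(record["steps"]))  # 3
```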
our open model, which proved out specialized RAG LMs over scientific literature, has been published in Nature ✌🏻

congrats to our lead @akariasai.bsky.social & team of students and Ai2 researchers/engineers

www.nature.com/articles/s41...
February 4, 2026 at 10:43 PM
0 days since last mixup of eval results between "copa" (choice of plausible alternatives) & "coqa" (conversational QA) tasks 😐
February 3, 2026 at 8:01 PM
The 5th Generation, Evaluation, and Metrics (GEM) Workshop will be at #ACL2026!

Call for papers is out. Topics include:
🐟 LMs as evaluators
🐠 Living benchmarks
🍣 Eval with humans
and more

New for 2026: Opinion & Statement Papers!

Full CFP: gem-workshop.com/call-for-pap...
January 27, 2026 at 7:17 PM
some thoughts about skill degradation w/ AI coding

im on board w the views that "english is the new programming language" & that "software engineering", translating ambiguous goals into technical specs/execution, is still a skill.

im more concerned w shift from my role as a writer to a reviewer and
January 21, 2026 at 5:31 PM
lucky to chat w sen. patty murray about olmo & importance of fully open AI
January 18, 2026 at 3:09 AM
using opus to extract research topics from papers & it was giving me useless words like "training", "datasets", and "evaluation"

kept prompting it w examples of more informative topics and it ended up with "LLM training", "LLM datasets", and "LLM evaluation"

thx
January 17, 2026 at 1:07 AM
just realized ive had food on my face all day & nobody at office told me, thx ai2 frens 😫
January 16, 2026 at 12:05 AM
bsky wish list

i like the idea of different feeds but i actually want my subscription to select feeds to be taken as a preference signal ("more like this") that informs a "home/default" feed.

i really dislike the UX of having to tab through each subscribed feed, esp when there's also post overlap
January 14, 2026 at 9:06 PM
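A toy sketch of the wish above (all names hypothetical, not a real Bluesky API): treat feed subscriptions as a "more like this" signal by scoring each post by how many subscribed feeds surface it, then serve one deduplicated home feed:

```python
# Toy sketch: subscriptions as a preference signal for a merged home feed.
# Feed names and post ids are hypothetical; this is not a Bluesky API.
from collections import Counter

def merged_home_feed(subscribed_feeds: dict) -> list:
    """Dedupe posts across feeds; rank by how many subscribed feeds carry each."""
    counts = Counter(post for posts in subscribed_feeds.values() for post in posts)
    # A post appearing in more subscribed feeds => stronger "more like this" signal.
    return [post for post, _ in counts.most_common()]

feeds = {
    "nlp": ["p1", "p2", "p3"],
    "open-science": ["p2", "p4"],
    "seattle": ["p2", "p3"],
}
print(merged_home_feed(feeds)[0])  # p2 (appears in all three feeds)
```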
just in case it wasn’t clear which room this is
January 7, 2026 at 5:58 PM
just had hechalou’s yin yang milk tea and i think i’ve transcended 🤤
December 14, 2025 at 1:43 AM
during neurips, we kept the RL run going & model kept getting better 😂

Olmo 3.1 is a..
🐡 32B Thinking, still the best fully-open model to date
🐠 32B Instruct, for ppl who hate long yapping, as good as Qwen 3

we added 10 more pages to the paper! thx for community feedback from convos at neurips
Olmo 3.1 is here. We extended our strongest RL run and scaled our instruct recipe to 32B—releasing Olmo 3.1 Think 32B & Olmo 3.1 Instruct 32B, our most capable models yet. 🧵
December 12, 2025 at 6:03 PM
I'll be at #NeurIPS2025 from Tues-Sat!

Come say hi 👋 if you wanna chat about
🦈 olmo 3 stories
🐟 pretraining data & evals
🍣 midtraining shouldnt exist
🐠 model specialization
🐡 AI for education
🍥 tabletop games
December 1, 2025 at 9:51 PM
fml 🤦🏻‍♂️
CPSC Warns Consumers to Immediately Stop Using Batteries for E-Bikes from Rad Power Bikes Due to Fire Hazard; Risk of Serious Injury or Death www.cpsc.gov/Warnings/202...
November 24, 2025 at 8:02 PM
we released Olmo 3! lot of exciting stuff but wanna focus on:

🐟Olmo 3 32B Base, the best fully-open base model to date, near Qwen 2.5 & Gemma 3 on diverse evals
🐠Olmo 3 32B Think, the first fully-open reasoning model approaching Qwen 3 levels
🐡12 training datasets corresponding to the different training stages
November 20, 2025 at 6:20 PM
going live with a mukbang tmr 🍱
November 19, 2025 at 5:35 PM
not happy abt the gpt 5.1 update. it's making way more mistakes than gpt 5 on basic stuff

latex table formatting errors (straight up missing "&" so columns misaligned, or dropping a whole column, or shifting values by 1 position), feels unusable imo 😒
November 14, 2025 at 12:26 PM
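For illustration, the first failure mode described above, with made-up model names and numbers: a single dropped `&` merges two cells, so every later value shifts one column left:

```latex
% Correct row: two "&" separators give three cells, aligned with the header.
\begin{tabular}{lcc}
Model  & Acc  & F1   \\
ModelA & 81.2 & 79.4 \\  % three cells: ModelA | 81.2 | 79.4
ModelB  81.0 & 78.9 \\   % BUG: missing "&" -> two cells: "ModelB 81.0" | 78.9
\end{tabular}
```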
picking between 3 checkpoints w/ same benchmark scores but what if one of them is agi
November 12, 2025 at 5:31 PM
why intern at Ai2?

🐟interns own major parts of our model development, sometimes even leading whole projects
🐡we're committed to open science & actively help our interns publish their work

reach out if u wanna build open language models together 🤝

links 👇
November 5, 2025 at 11:11 PM
congrats to our olmo earth team 🌎

small multimodal foundation models + a system for finetuning them for important uses like agriculture, wildfire management, conservation & more 🌿
Introducing OlmoEarth 🌍, state-of-the-art AI foundation models paired with ready-to-use open infrastructure to turn Earth data into clear, up-to-date insights within hours—not years.
November 4, 2025 at 5:57 PM
woah, guess VLMs for OCR are the hottest research topic this week 😆 since the first olmOCR, we've been..

🔥training our VLM using RLVR with binary unit test rewards🔥

it's incredibly effective & unit test creation is easy to scale w synthetic data pipelines

check it out at olmocr.allen.ai
October 22, 2025 at 6:02 PM
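For readers curious what a binary unit-test reward looks like, a minimal sketch (the test format and names are my own illustration, not the actual olmOCR pipeline): the policy gets reward 1.0 only when its transcription passes every unit test:

```python
# Minimal sketch of a binary unit-test reward for RLVR-style training.
# The unit-test format here is a hypothetical illustration, not the
# actual olmOCR pipeline.

def binary_reward(model_output: str, unit_tests: list) -> float:
    """Return 1.0 only if the output passes every unit test, else 0.0."""
    return 1.0 if all(t(model_output) for t in unit_tests) else 0.0

# Example unit tests: simple string checks on a transcribed page.
tests = [
    lambda out: "Introduction" in out,             # expected heading present
    lambda out: "lorem ipsum" not in out.lower(),  # no hallucinated filler
]

print(binary_reward("1. Introduction\nActual page text.", tests))  # 1.0
print(binary_reward("lorem ipsum placeholder", tests))             # 0.0
```

The all-or-nothing signal keeps the reward unhackable by partial credit, which is why it pairs well with synthetically generated tests.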
bye #colm2025 big fan of the montreal bagels 🥯 hot take I like them better than
October 11, 2025 at 6:16 PM
come say hi at our OLMo 2 and fluid benchmarking posters this morning 👋 and dont miss @valentinhofmann.bsky.social's morning talk #colm2025 @ai2.bsky.social vry proud of my gifs
October 9, 2025 at 1:14 PM
@josephc.bsky.social @mariaa.bsky.social and I are at poster #21

findings from a large-scale survey of 800 researchers on how they use LMs in their research #colm2025
October 8, 2025 at 8:12 PM
flyin to #colm2025 along w bunch of the @ai2.bsky.social team

come chat w me about pretraining horror stories, data & evals, what we're cookin for next olmo, etc

made a 🔥 poster for thursday sess, come say hi
October 6, 2025 at 3:20 PM