Lightnews — Scholar-powered news

Reposted by Jacob Morrison

Kunal Jha

@kjha02.bsky.social

Forget modeling every belief and goal! What if we represented people as following simple scripts instead (e.g., "cross the crosswalk")?

Our new paper shows AI which models others’ minds as Python code 💻 can quickly and accurately predict human behavior!

shorturl.at/siUYI%F0%9F%...

October 3, 2025 at 2:24 AM

Reposted by Jacob Morrison

Saumya Malik

@saumyamalik.bsky.social

Thank you to co-authors @natolambert.bsky.social, @valentinapy.bsky.social, @jacobcares.bsky.social, Sander Land, @nlpnoah.bsky.social, @hanna-nlp.bsky.social!
Read more in the paper here (ArXiv soon!): github.com/allenai/rewa...
Dataset, leaderboard, and models here: huggingface.co/collections/...

Reward Bench 2 - a allenai Collection

Datasets, spaces, and models for Reward Bench 2 benchmark and paper!

huggingface.co

June 2, 2025 at 11:41 PM

Reposted by Jacob Morrison

Ai2

@ai2.bsky.social

RewardBench 2 is here! We took a long time to learn from our first reward model evaluation tool to make one that is substantially harder and more correlated with both downstream RLHF and inference-time scaling.

The RewardBench 2 Leaderboard on HuggingFace.

June 2, 2025 at 4:31 PM

Reposted by Jacob Morrison

Nathan Lambert

@natolambert.bsky.social

Heading to NAACL? With "verification being the key to AI" you should go to the poster session Friday, 9-10:30am to chat with my star colleagues @valentinapy.bsky.social + @jacobcares.bsky.social about RewardBench (and really RewardBench 2, evaluation, and reward models in post-training).

April 29, 2025 at 4:07 PM

Jacob Morrison

@jacobcares.bsky.social

Valentina and I will be presenting RewardBench at NAACL! Come say hi at the poster session on Friday and we can chat about reward models, staying up for 30 hours straight to rapidly reset from Singapore time, and more 🏜️

Valentina Pyatkin @valentinapy.bsky.social · Apr 27

I'll be at #NAACL2025:

🖇️To present my paper "Superlatives in Context", showing how the interpretation of superlatives is very context dependent and often implicit, and how LLMs handle such semantic underspecification

🖇️And we will present RewardBench on Friday

Reach out if you want to chat!

April 28, 2025 at 3:15 PM

Reposted by Jacob Morrison

Valentina Pyatkin

@valentinapy.bsky.social

I'll be at #NAACL2025:

🖇️To present my paper "Superlatives in Context", showing how the interpretation of superlatives is very context dependent and often implicit, and how LLMs handle such semantic underspecification

🖇️And we will present RewardBench on Friday

Reach out if you want to chat!

April 27, 2025 at 8:00 PM

Jacob Morrison

@jacobcares.bsky.social

I'm in Singapore for @iclr-conf.bsky.social ! Come check out our spotlight paper on the environmental impact of training OLMo (link in next tweet) during the Saturday morning poster session from 10-12:30 -- happy to chat about this or anything else! DMs should be open, email works too

April 23, 2025 at 3:22 PM

Reposted by Jacob Morrison

Ai2

@ai2.bsky.social

Announcing OLMo 2 32B: the first fully open model to beat GPT 3.5 & GPT-4o mini on a suite of popular, multi-skill benchmarks.

Comparable to best open-weight models, but a fraction of training compute. When you have a good recipe, ✨ magical things happen when you scale it up!

March 13, 2025 at 6:36 PM

Reposted by Jacob Morrison

Cats of Yore

@catsofyore.bsky.social

There are no proven benefits to raw feeding yet plenty of serious, well-documented risks. It has never been a good idea but ESPECIALLY NOW. washingtonstatestandard.com/briefs/two-w...

Two Washington cats infected with bird flu • Washington State Standard

Two domestic cats in Washington state have been infected with bird flu after eating raw pet food, according to the department of agriculture.

washingtonstatestandard.com

February 26, 2025 at 11:36 PM

Jacob Morrison

@jacobcares.bsky.social

big tülu is here! can't wait for everyone to try it, it's been a lot of fun seeing how RL performs at this scale thanks to @hamishivi.bsky.social and @vwxyzjn.bsky.social, and preference data from @ljvmiranda.bsky.social

on an unrelated note, I'm applying to phd programs this year 👀

Ai2 @ai2.bsky.social · Jan 30

Here is Tülu 3 405B 🐫 our open-source post-training model that surpasses the performance of DeepSeek-V3! It demonstrates that our recipe, which includes RLVR, scales to 405B - with performance on par with GPT-4o, & surpassing prior open-weight post-trained models of the same size including Llama 3.1.

January 30, 2025 at 7:25 PM

Reposted by Jacob Morrison

Ai2

@ai2.bsky.social

Here is Tülu 3 405B 🐫 our open-source post-training model that surpasses the performance of DeepSeek-V3! It demonstrates that our recipe, which includes RLVR, scales to 405B - with performance on par with GPT-4o, & surpassing prior open-weight post-trained models of the same size including Llama 3.1.

January 30, 2025 at 2:28 PM

Reposted by Jacob Morrison

Hamish Ivison

@hamishivi.bsky.social

Excited to see Tulu 3 sits in between Llama 3.1 and 3.3 instruct on the chatbot arena leaderboard right now!

Particularly happy it is top 20 for Math and Multi-turn prompts :)

All the details and data on how to train a model this good are right here: arxiv.org/abs/2411.15124

January 8, 2025 at 5:47 PM

Reposted by Jacob Morrison

Nathan Lambert

@natolambert.bsky.social

Very pleased to see Tulu 3 70B more or less tied with Llama 3.1 70B Instruct on style controlled ChatBotArena. The only model anywhere close to that with open code and data for post-training! Lots of stuff people can build on.

Next looking for OLMo 2 numbers.

January 8, 2025 at 5:13 PM

Reposted by Jacob Morrison

Costa Huang

@vwxyzjn.bsky.social

We released the OLMo 2 report! Ready for some more RL curves? 😏

This time, we applied RLVR iteratively! Our initial RLVR checkpoint on the RLVR dataset mix shows a low GSM8K score, so we did another RLVR on GSM8K only and another on MATH only 😆.

And it works! A thread 🧵 1/N

January 6, 2025 at 6:34 PM

Reposted by Jacob Morrison

Kyle Lo

@kylelo.bsky.social

kicking off 2025 with our OLMo 2 tech report while payin homage to the sequelest of sequels 🫡

🚗 2 OLMo 2 Furious 🔥 is everythin we learned since OLMo 1, with deep dives into:

🚖 stable pretrain recipe
🚔 lr anneal 🤝 data curricula 🤝 soups
🚘 tulu post-train recipe
🚜 compute infra setup

👇🧵

January 3, 2025 at 4:02 PM

Reposted by Jacob Morrison

Jiacheng Liu

@liujch1998.bsky.social

Want to predict the task performance of LMs before pretraining them?

We develop task scaling laws and model ladders, which predict the accuracy on individual tasks by OLMo 2 7B & 13B models within 2 points of absolute error. The cost is 1% of the compute used to pretrain them.

December 9, 2024 at 5:07 PM

Reposted by Jacob Morrison

derek guy

@dieworkwear.bsky.social

Why is Tokyo so fashionable? Some theories. 🧵

Saagar Enjeti tweets: "Probably a cold take but IMO Tokyo is the male fashion capital of the world: whether it’s western wear, suits, street wear the aesthetic is refined to the highest possible level

From the salaryman to the rebel teen they are impeccably dressed

It also helps no one is fat"

November 27, 2024 at 6:43 AM

Reposted by Jacob Morrison

Luca Soldaini 🎀

@soldaini.net

OLMo 2 is out 🥳 7B and 13B trained on 5T tokens, and meticulously instruction tuned using Tulu 3 recipe.

Simply the best fully open models yet.

Really proud of the work & the amazing team at @ai2.bsky.social

November 26, 2024 at 9:12 PM

Jacob Morrison

@jacobcares.bsky.social

🍲

November 26, 2024 at 9:07 PM

Reposted by Jacob Morrison

Ai2

@ai2.bsky.social

Meet OLMo 2, the best fully open language model to date, including a family of 7B and 13B models trained up to 5T tokens. OLMo 2 outperforms other fully open models and competes with open-weight models like Llama 3.1 8B — As always, we released our data, code, recipes and more 🎁

The OLMo 2 models sit at the Pareto frontier of training FLOPs vs model average performance.

November 26, 2024 at 8:51 PM

Reposted by Jacob Morrison

Luca Soldaini 🎀

@soldaini.net

yeah language models are great, but which Tulu 3 are you

- brat tulu, a @jacobcares.bsky.social favorite
- PNW tulu, don’t forget where @ai2.bsky.social is from
- dank tulu 💪
- tulu at tulu, bc tulu means sunrise in farsi

November 21, 2024 at 9:59 PM

Reposted by Jacob Morrison

Ai2

@ai2.bsky.social

Meet Tülu 3, a set of state-of-the-art instruct models with fully open data, eval code, and training algorithms.
We invented new methods for fine-tuning language models with RL and built upon best practices to scale synthetic instruction and preference data.
Demo, GitHub, paper, and models 👇

November 21, 2024 at 5:15 PM

Jacob Morrison

@jacobcares.bsky.social

I'm so excited that we're finally releasing Tülu 3, our new post-training recipe! We're releasing models built on top of Llama 3.1 base (OLMo coming soon!), all of our datasets, a (73 page!) paper, new evaluations, and all of our code.

November 21, 2024 at 7:33 PM

Reposted by Jacob Morrison

Nathan Lambert

@natolambert.bsky.social

I've spent the last two years scouring all available resources on RLHF specifically and post training broadly. Today, with the help of a totally cracked team, we bring you the fruits of that labor — Tülu 3, an entirely open frontier model post training recipe. We beat Llama 3.1 Instruct.

Thread.

November 21, 2024 at 5:01 PM
