Hamish Ivison
@hamishivi.bsky.social
I (try to) do NLP research. Antipodean abroad.
currently doing PhD @uwcse,
prev @usyd @ai2
🇦🇺🇨🇦🇬🇧
ivison.id.au
Reposted by Hamish Ivison
We’re live on Reddit! Ask us Anything about our OLMo family of models. We have six of our researchers on hand to answer all your questions.
May 8, 2025 at 3:12 PM
I’ll be around for this! Come ask us questions about OLMo and Tülu :)
Have questions? We’re an open book!

We’re excited to host an AMA to answer your Qs about OLMo, our family of open language models.

🗓️ When: May 8, 8-10 am PT
🌐 Where: r/huggingface
🧠 Why: Gain insights from our expert researchers

Chat soon!
May 7, 2025 at 4:00 PM
Excited to be back home in Australia (Syd/Melb) for most of April! Email or DM if you want to grab a coffee :)
March 27, 2025 at 4:11 PM
Reposted by Hamish Ivison
@vwxyzjn.bsky.social and @hamishivi.bsky.social have uploaded intermediate checkpoints for our recent RL models at Ai2. Folks should do research into how RL finetuning affects the weights!

Models with checkpoints: OLMo 2 7B, 13B, 32B Instruct; Tulu 3 and 3.1 8B; Tulu 3 405B
March 17, 2025 at 4:05 PM
How well do data-selection methods work for instruction-tuning at scale?

Turns out, when you look at large, varied data pools, lots of recent methods lag behind simple baselines, and a simple embedding-based method (RDS) does best!

More below ⬇️ (1/8)
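The embedding-based approach can be sketched in a few lines. This is a minimal stand-in for RDS-style selection, not the paper's implementation: it assumes you already have embeddings for the candidate pool and for a few target-task examples (real RDS derives embeddings from the LM's own hidden states), scores each pool example by its best cosine similarity to any target example, and keeps the top-k.

```python
import numpy as np

def rds_select(pool_emb: np.ndarray, query_emb: np.ndarray, k: int) -> np.ndarray:
    """Score each pool example by its max cosine similarity to any
    target-task embedding, then keep the indices of the top-k.
    A sketch of embedding-based selection, not the paper's code."""
    pool = pool_emb / np.linalg.norm(pool_emb, axis=1, keepdims=True)
    query = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    sims = pool @ query.T            # (n_pool, n_query) cosine similarities
    scores = sims.max(axis=1)        # best match per pool example
    return np.argsort(-scores)[:k]   # k highest-scoring pool indices

# Toy usage: random vectors stand in for real model embeddings.
rng = np.random.default_rng(0)
pool = rng.normal(size=(1000, 64))     # 1000 candidate instructions
query = rng.normal(size=(8, 64))       # 8 target-task examples
selected = rds_select(pool, query, k=100)
```

The appeal over fancier methods is that scoring is a single matrix multiply over precomputed embeddings, so it stays cheap even on large, varied pools.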
March 4, 2025 at 5:10 PM
(1/8) Excited to share some new work: TESS 2!
TESS 2 is an instruction-tuned diffusion LM that can perform close to AR counterparts for general QA tasks, trained by adapting from an existing pretrained AR model.
📜 Paper: arxiv.org/abs/2502.13917
🤖 Demo: huggingface.co/spaces/hamis...

More below ⬇️
February 20, 2025 at 6:08 PM
GRPO makes everything better 😌
🔥 allenai/Llama-3.1-Tulu-3-8B (trained with PPO) -> allenai/Llama-3.1-Tulu-3.1-8B (trained with GRPO)

We are happy to "quietly" release our latest GRPO-trained Tulu 3.1 model, which is considerably better on MATH and GSM8K!
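For readers wondering what changed: GRPO's core difference from PPO is that it drops the learned value-function baseline and instead samples a group of completions per prompt, normalizing each completion's reward by the group's mean and standard deviation. A minimal sketch of that advantage computation (numbers and shapes are illustrative, not from the Tulu run):

```python
import numpy as np

def grpo_advantages(rewards: np.ndarray) -> np.ndarray:
    """Group-relative advantages, GRPO-style: for each prompt we
    sample a group of completions and normalize each reward by the
    group mean and std, replacing PPO's learned value baseline.
    rewards: (n_prompts, group_size) scalar rewards."""
    mean = rewards.mean(axis=1, keepdims=True)
    std = rewards.std(axis=1, keepdims=True) + 1e-8  # avoid divide-by-zero
    return (rewards - mean) / std

# e.g. 2 prompts, 4 sampled completions each, binary rewards
r = np.array([[1.0, 0.0, 0.0, 1.0],
              [0.0, 0.0, 0.0, 1.0]])
adv = grpo_advantages(r)
```

Since the baseline comes from the group itself, there is no value network to train, which simplifies the RL setup considerably.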
February 12, 2025 at 5:42 PM
Reposted by Hamish Ivison
We took our most efficient model and made an open-source iOS app📱but why?

As phones get faster, more AI will happen on device. With OLMoE, researchers, developers, and users can get a feel for this future: fully private LLMs, available anytime.

Learn more from @soldaini.net👇 youtu.be/rEK_FZE5rqQ
Ai2 OLMoE: Fully open source, running entirely on-device
February 11, 2025 at 2:04 PM
li'l holiday project from the tulu team :)

Scaling up the Tulu recipe to 405B works pretty well! We mainly see this as confirmation that open-instruct scales to large-scale training -- more exciting and ambitious things to come!
January 30, 2025 at 3:38 PM
Seems like a good time to share this: a poster from a class project diving a little more into Tulu 3's RLVR. Today's DeepSeek R1 release shows that scaling this sort of approach up can be very, very effective!
January 20, 2025 at 7:04 PM
Excited to see Tulu 3 sits in between Llama 3.1 and 3.3 instruct on the chatbot arena leaderboard right now!

Particularly happy it is top 20 for Math and Multi-turn prompts :)

All the details and data on how to train a model this good are right here: arxiv.org/abs/2411.15124
January 8, 2025 at 5:47 PM
Reposted by Hamish Ivison
We released the OLMo 2 report! Ready for some more RL curves? 😏

This time, we applied RLVR iteratively! Our initial checkpoint, trained on the full RLVR dataset mix, showed a low GSM8K score, so we ran another round of RLVR on GSM8K only, then another on MATH only 😆.

And it works! A thread 🧵 1/N
January 6, 2025 at 6:34 PM
More OLMo! More performance! More details!
We applied Tulu post-training to OLMo 2 as well, so you can get strong model performance AND see what your model was actually trained on.
kicking off 2025 with our OLMo 2 tech report while paying homage to the sequelest of sequels 🫡

🚗 2 OLMo 2 Furious 🔥 covers everything we learned since OLMo 1, with deep dives into:

🚖 stable pretrain recipe
🚔 lr anneal 🤝 data curricula 🤝 soups
🚘 tulu post-train recipe
🚜 compute infra setup

👇🧵
January 4, 2025 at 1:36 AM
Reposted by Hamish Ivison
UW News put out a Q&A about our recent work on Variational Preference Learning, a technique for personalizing Reinforcement Learning from Human Feedback (RLHF) washington.edu/news/2024/12...
Q&A: New AI training method lets systems better adjust to users’ values
University of Washington researchers created a method for training AI systems — both for large language models like ChatGPT and for robots — that can better reflect users’ diverse values. It...
washington.edu
December 18, 2024 at 9:51 PM
Reposted by Hamish Ivison
Want to predict the task performance of LMs before pretraining them?

We develop task scaling laws and model ladders, which predict the accuracy of OLMo 2 7B & 13B models on individual tasks within 2 points of absolute error. The cost is 1% of the compute used to pretrain them.
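The ladder idea can be illustrated with a toy fit. The actual method is a two-step prediction (compute → task loss → task accuracy); the sketch below shows only the first step as a power law, loss ≈ a·C^(−b), fit in log-log space on hypothetical small "ladder" models and extrapolated to a larger compute budget. All numbers here are made up for illustration.

```python
import numpy as np

# Hypothetical ladder of small training runs: (training FLOPs, task loss).
flops = np.array([1e18, 3e18, 1e19, 3e19])
loss = np.array([1.30, 1.10, 0.95, 0.82])

# Fit log(loss) = b * log(C) + log(a); b should come out negative.
b, log_a = np.polyfit(np.log(flops), np.log(loss), 1)

def predict_loss(c: float) -> float:
    """Extrapolate task loss to a compute budget c via the fitted power law."""
    return float(np.exp(log_a) * c ** b)

big_run = predict_loss(1e20)  # predicted loss for a much larger run
```

In the real method a second fitted function then maps predicted task loss to task accuracy, which is what gives the within-2-points claim.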
December 9, 2024 at 5:07 PM
New OpenAI RL finetuning API reminds me a lot of RLVR, which we used for Tülu 3 (arxiv.org/abs/2411.15124).

Using RL to train against ground-truth labels is a simple idea, but very effective (>10pt gains just using the GSM8K train set).

It's implemented for you to use in Open-Instruct 😉: github.com/allenai/open...
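The core of RLVR fits in a few lines: instead of a learned reward model, the reward is a programmatic check against the ground-truth label. Below is a minimal sketch; the answer-extraction convention (take the last number in the completion, common for GSM8K-style answers) is an assumption for illustration, not Open-Instruct's exact logic.

```python
import re

def verifiable_reward(completion: str, gold: str) -> float:
    """RLVR-style binary reward: 1.0 if the extracted answer matches
    the ground-truth label, else 0.0. Extraction here simply takes
    the last number in the completion (an illustrative convention)."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion.replace(",", ""))
    if not numbers:
        return 0.0
    return 1.0 if numbers[-1] == gold else 0.0

print(verifiable_reward("... so the answer is 42.", "42"))  # 1.0
print(verifiable_reward("I am not sure.", "42"))            # 0.0
```

Any RL algorithm (PPO, GRPO, ...) can then optimize against this reward wherever answers can be checked automatically — math, code, constrained formats.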
December 6, 2024 at 8:24 PM
Reposted by Hamish Ivison
OpenAI announced a new RL finetuning API. You can do this on open models w the repo we used to train Tulu 3.

Expanding reinforcement learning with verifiable rewards to more domains, with better answer extraction, is on our near-term roadmap.

https://buff.ly/3V4JEIJ
December 6, 2024 at 6:27 PM
Reposted by Hamish Ivison
Curious about all this inference-time scaling hype? Attend our NeurIPS tutorial: Beyond Decoding: Meta-Generation Algorithms for LLMs (Tue. 1:30)! We have a top-notch panelist lineup.

Our website: cmu-l3.github.io/neurips2024-...
December 6, 2024 at 5:18 PM
Reposted by Hamish Ivison
I’m on the academic job market this year! I’m completing my @uwcse.bsky.social @uwnlp.bsky.social Ph.D. (2025), focusing on overcoming LLM limitations like hallucinations by building new LMs.
My Ph.D. work focuses on Retrieval-Augmented LMs to create more reliable AI systems 🧵
December 4, 2024 at 1:26 PM
Reposted by Hamish Ivison
We're hiring another predoctoral researcher for my team at Ai2/OLMo next year. The goal of this position is to mentor and grow future academic stars of NLP/AI over 1-2 years before grad school.

This position typically suits people who have finished a BS or MS and want to continue to a PhD soon.
https://buff.ly/49nuggo
December 3, 2024 at 11:52 PM
Excited to be at #NeurIPS next week in 🇨🇦! Please reach out if you want to chat about LM post-training (Tülu!), data curation, or anything else :)

I'll be around all week, with two papers you should go check out (see image or next tweet):
December 2, 2024 at 6:53 PM
I know it doesn't know much, if anything, about me, but this was surprisingly good!
November 30, 2024 at 5:50 AM
Watching RL training curves is too addictive... begging my models to yap more and get more reward 🙏
November 29, 2024 at 7:35 AM
Reposted by Hamish Ivison
🔥 Old Norse poetry gen

The Vikings call, say now,
OLMo 2, the ruler of languages.
May your words fly over the seas,
all over the world, for you are wise.
Wordsmith, balanced and aligned,
for you the skalds themselves sing,
your soul, which hears new lifeforms,
may it live long and tell a saga.
November 28, 2024 at 6:19 PM