Excited to land in print in early 2026! Lots of improvements coming soon.
Thanks for the support!
hubs.la/Q03Tc37Q0
What on earth is "Lerns Geschichte"?
Two minutes later on the radio: "Learn a little @geschichte.fm, then ..." 😲
submitted the paper 24h before the deadline 😍.
It's integrated into the OLMo trainer here: github.com/allenai/OLMo...
DataDecide opens up the process: 1,050 models, 30k checkpoints, 25 datasets & 10 benchmarks 🧵
Introducing OLMoTrace, a new feature in the Ai2 Playground that begins to shed some light. 🔦
We do this at unprecedented scale and in real time: finding matching text between model outputs and 4 trillion training tokens within seconds. ✨
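For intuition, here is a minimal, hypothetical sketch of what matching output spans against a corpus can look like. The real OLMoTrace backend searches trillions of tokens with far more efficient indexes; the function names and toy data below are illustrative assumptions, not the actual implementation.

```python
# Toy sketch of exact-span matching between a model's output and a training corpus.
# The real OLMoTrace backend searches ~4T tokens with specialized indexes; this
# hypothetical version uses a simple in-memory n-gram index and made-up toy data.
from collections import defaultdict

def build_index(corpus_tokens, n=3):
    """Map every n-gram in the corpus to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(corpus_tokens) - n + 1):
        index[tuple(corpus_tokens[i:i + n])].append(i)
    return index

def find_matching_spans(output_tokens, corpus_tokens, index, n=3):
    """Return (output_start, corpus_start, length) for verbatim matches of length >= n."""
    spans, i = [], 0
    while i <= len(output_tokens) - n:
        best = None
        for j in index.get(tuple(output_tokens[i:i + n]), []):
            length = n
            while (i + length < len(output_tokens)
                   and j + length < len(corpus_tokens)
                   and output_tokens[i + length] == corpus_tokens[j + length]):
                length += 1
            if best is None or length > best[2]:
                best = (i, j, length)
        if best:
            spans.append(best)
            i += best[2]  # jump past the matched span
        else:
            i += 1
    return spans

corpus = "the quick brown fox jumps over the lazy dog".split()
output = "a quick brown fox jumps over fences".split()
print(find_matching_spans(output, corpus, build_index(corpus)))
# -> [(1, 1, 5)]  ("quick brown fox jumps over" appears verbatim in the corpus)
```

Jumping ahead by the length of each match keeps the scan roughly linear in the output length; scaling the same idea to trillions of corpus tokens is where the engineering work lives.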
When pretraining at 8B scale, SuperBPE models consistently outperform the BPE baseline on 30 downstream tasks (+8% MMLU), while also being 27% more efficient at inference time. 🧵
🐟 our largest & best fully open model to date
🐠 right up there with similarly sized weights-only models from big companies on popular benchmarks
🐡 but we used way less compute & all our data, ckpts, code, recipe are free & open
Made a nice plot of our post-training results! ✌️
Built for scale, olmOCR handles many document types with high throughput. Run it on your own GPU for free: at over 3,000 tokens/s, that works out to $190 per million pages, or 1/32 the cost of GPT-4o!
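As a rough sanity check on the cost claim, here is a hypothetical back-of-the-envelope calculation; the tokens-per-page figure and GPU hourly rate are assumptions plugged in for illustration, not numbers from the olmOCR release.

```python
# Hypothetical back-of-the-envelope check on self-hosted OCR cost.
# tokens_per_page and the GPU hourly rate are assumed placeholder values,
# not figures from the olmOCR release.

def cost_per_million_pages(tokens_per_sec, tokens_per_page, gpu_dollars_per_hour):
    pages_per_hour = tokens_per_sec * 3600 / tokens_per_page
    return gpu_dollars_per_hour / pages_per_hour * 1_000_000

# Assuming ~1500 output tokens per page and a ~$1/hr GPU:
print(f"${cost_per_million_pages(3000, 1500, 1.0):,.0f} per million pages")  # ~$139
```

Different assumptions about page length and GPU pricing shift the number, but at thousands of tokens per second the cost lands in the low hundreds of dollars per million pages, the same order of magnitude as the quoted figure.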
As phones get faster, more AI will happen on device. With OLMoE, researchers, developers, and users can get a feel for this future: fully private LLMs, available anytime.
Learn more from @soldaini.net👇 youtu.be/rEK_FZE5rqQ
Interviewing OLMo 2 leads: Open secrets of training language models
What we have learned and are going to do next.
YouTube: https://buff.ly/40IlSFF
Podcast / notes:
We just released our paper "2 OLMo 2 Furious"
Can't stop us in 2025. Links below.
LLMs give bland answers because they produce the average of what anyone would have said on the Internet.
github.com/allenai/awes...
We develop task scaling laws and model ladders that predict the accuracy of OLMo 2 7B & 13B models on individual tasks within 2 points of absolute error, at a cost of 1% of the compute used to pretrain them.
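To illustrate the general idea of a model ladder, here is a deliberately simplified sketch: fit a curve to cheap small-scale runs and extrapolate to the target scale. The paper's actual method is more involved (it goes through an intermediate task-loss prediction rather than a direct fit), and every number below is a made-up placeholder.

```python
import numpy as np

# Hypothetical ladder runs: (training FLOPs, task accuracy). All numbers are made up.
flops = np.array([1e19, 3e19, 1e20, 3e20, 1e21])
acc   = np.array([0.32, 0.38, 0.45, 0.51, 0.56])

# Fit accuracy as a linear function of log-compute. This is a crude stand-in for
# the paper's chain (compute -> predicted task loss -> predicted accuracy), but it
# captures the core move: extrapolate from small, cheap runs to the target scale.
slope, intercept = np.polyfit(np.log10(flops), acc, deg=1)

target_flops = 1e22  # assumed compute budget of the target model
print(f"predicted accuracy at {target_flops:.0e} FLOPs: "
      f"{slope * np.log10(target_flops) + intercept:.3f}")
```

Because the ladder models are tiny compared to the target, the whole set of fitting runs costs a small fraction of the full pretraining budget.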