Stella Biderman
@stellaathena.bsky.social
I make sure that OpenAI et al. aren't the only people who are able to study large scale AI systems.
In the original Pile paper we talked about various conceptions of consent (though I don't stand by everything I wrote about this topic 5 years ago). None of this data has EIC, though I think that the ones marked "author" in the table are ones where authorial objection would be unreasonable.
October 20, 2025 at 6:55 AM
Thanks for the signal boost! I think you'll appreciate the serious shade we threw at the end of our blog post: blog.eleuther.ai/deep-ignorance/
August 12, 2025 at 1:13 PM
"But wait," the skeptic cries. "Surely this is infeasible for frontier models! Their datasets are far too large to expect companies to meaningfully understand or document!"

Actually our methodology is extremely cheap, mostly runs on CPU, and adds an overhead of less than 1%.
August 12, 2025 at 12:41 PM
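To make the cost claim concrete, here is a minimal sketch of what CPU-only blocklist filtering of pretraining documents could look like. The term list, threshold, and function names are hypothetical placeholders for illustration, not the actual Deep Ignorance pipeline.

# Hypothetical sketch of cheap, CPU-only pretraining data filtering.
# A real pipeline would use curated term lists and/or a lightweight
# classifier; the terms and threshold here are placeholders.
import re
from typing import Iterable, Iterator

BLOCKLIST = re.compile(
    r"\b(example_hazard_term_1|example_hazard_term_2)\b", re.IGNORECASE
)

def filter_documents(docs: Iterable[str], max_hits: int = 0) -> Iterator[str]:
    """Yield only documents whose blocklist hit count is <= max_hits."""
    for doc in docs:
        hits = len(BLOCKLIST.findall(doc))
        if hits <= max_hits:
            yield doc

if __name__ == "__main__":
    corpus = [
        "A benign document about cell biology.",
        "A document mentioning example_hazard_term_1 repeatedly.",
    ]
    kept = list(filter_documents(corpus))
    print(f"kept {len(kept)} of {len(corpus)} documents")

Because this kind of scan is just string matching over the corpus, it parallelizes trivially across CPU workers and adds only a small fraction to the total cost of a pretraining run.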
Preventing a model from going into an interaction with relevant knowledge isn't a panacea though: filtered models see minimal performance loss when it comes to reasoning about information provided in-context. The models are still smart; they just don't know the concepts innately.
August 12, 2025 at 12:41 PM
Our results are competitive with circuit-breaking when it comes to out-of-the-box performance and substantially more robust to both adversarial and benign finetuning. These results lead us to believe that we are genuinely impairing the knowledge of LLMs, not just suppressing it
August 12, 2025 at 12:40 PM
Our headline result is simple: data filtering can send WMDP-Bio scores to random chance without hurting general performance. And finetuning the filtered models until they match the performance of the un-finetuned baseline model requires 300M tokens, >10x more than any other approach.
August 12, 2025 at 12:40 PM
Are you afraid of LLMs teaching people how to build bioweapons? Have you tried just... not teaching LLMs about bioweapons?

@eleutherai.bsky.social and the UK AISI joined forces to see what would happen, pretraining three 6.9B models for 500B tokens and producing 15 total models to study
August 12, 2025 at 12:40 PM
"It's unclear if this research matters because real users speak English and Chinese" has got to be up there for worst dismissive takes about how multilingual doesn't matter.
July 30, 2025 at 7:26 PM
At #ICML2025 and a fan of EleutherAI? Come find me for laptop stickers!
July 16, 2025 at 7:27 AM
Our deep-dive case studies show typical AI blind spots: models struggle with long-tail knowledge absent from web data and extremely long contexts; without fully spelled-out derivations, they misinterpret calculations and ignore domain-specific conventions, leading to student-like errors. 🔍
May 23, 2025 at 4:22 PM
We also look at what happens when you run a model multiple times: across 8 trials models rarely discover the same errors and generally assign a near-zero confidence to their claims.
May 23, 2025 at 4:22 PM
We benchmarked 10 top models, both closed and open, and the results are sobering – the best results are from o3, which has a precision of 6% and a recall of 21%. All other models score below 4% precision and 10% recall.
May 23, 2025 at 4:22 PM
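For concreteness, precision here is the fraction of error reports a model makes that match an author-validated error, and recall is the fraction of the annotated errors it recovers. A toy calculation with invented counts (not numbers from the paper):

# Toy precision/recall computation for paper-error detection.
# The counts below are invented for illustration; see the paper for
# the actual evaluation protocol and matching criteria.

def precision_recall(true_positives: int, flagged: int, annotated: int):
    """flagged = errors the model claimed; annotated = ground-truth errors."""
    precision = true_positives / flagged if flagged else 0.0
    recall = true_positives / annotated if annotated else 0.0
    return precision, recall

# e.g. a model flags 50 "errors", 3 of which match the 91 annotated ones
p, r = precision_recall(true_positives=3, flagged=50, annotated=91)
print(f"precision={p:.1%}, recall={r:.1%}")  # precision=6.0%, recall=3.3%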
2️⃣ The dataset spans long documents (avg 12,000 tokens) with rich visuals (avg 18 figures) 📊, challenging both the long-context reasoning and image understanding capabilities of modern LLMs.

To make sure these errors are genuine, we only include those that have been acknowledged by the original authors!
May 23, 2025 at 4:22 PM
SPOT consists of 83 papers and 91 author-validated errors across 10 STEM fields and 6 error types – equation/proof, figure duplication, data inconsistency, statistical reporting, reagent identity, and experiment setup.

We annotate each error's location and severity and provide a short human-written description.
May 23, 2025 at 4:22 PM
People keep plugging AI "Co-Scientists," so what happens when you ask them to do an important task like finding errors in papers?

We built SPOT, a dataset of STEM manuscripts across 10 fields annotated with real errors to find out.

(tl;dr not even close to usable) #NLProc

arxiv.org/abs/2505.11855
May 23, 2025 at 4:22 PM
I can't read the linked article, but if this representation is even vaguely accurate, that is deeply humiliating for ATI. LLMs may have caught the public by surprise in 2023, but they've been among the sexiest things in AI research since at least 2020 (when I entered the field; can't speak to before)
April 23, 2025 at 8:59 PM
@eleutherai.bsky.social succeeded in its original goal of ensuring independent research on LLMs would be possible. But the research landscape is still overwhelmingly dominated by corporate funding and corporate priorities, and there's much more still to do to help non-profit research thrive.
February 13, 2025 at 7:56 PM
In case you're curious how much of a hellscape X is: I opened it today and was greeted with a porn ad. The censored section is a 20 second video clip of a woman sucking a dick, with audio. It autoplays.
February 7, 2025 at 10:47 AM
Obligatory "actually my lab invented test-time-compute" post. In "Stay on topic with Classifier-Free Guidance," we show that CFG enables a model to expend twice as much compute at inference time and match the performance of a model twice as large.

arxiv.org/abs/2306.17806
January 29, 2025 at 6:56 PM
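For context, CFG at inference time runs the model twice per generated token, once with the conditioning prompt and once without, then extrapolates the logits, which is where the 2x inference compute comes from. A rough sketch of the logit combination, with a made-up gamma and random tensors standing in for real model outputs:

# Rough sketch of classifier-free guidance applied to LM logits; this is
# a paraphrase of the general technique, not code from the paper.
import torch

def cfg_logits(cond_logits: torch.Tensor,
               uncond_logits: torch.Tensor,
               gamma: float = 1.5) -> torch.Tensor:
    """Extrapolate from the unconditional toward the conditional logits.

    gamma = 1.0 recovers ordinary conditional sampling; gamma > 1.0
    strengthens the prompt. The two forward passes per token are why
    CFG spends roughly twice the inference compute.
    """
    return uncond_logits + gamma * (cond_logits - uncond_logits)

# toy example: batch of 1, vocabulary of 5 tokens
cond = torch.randn(1, 5)
uncond = torch.randn(1, 5)
next_token = torch.argmax(cfg_logits(cond, uncond, gamma=1.5), dim=-1)
print(next_token)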
“GPT-4o, o1, o1-preview & o1-mini all demonstrate strong persuasive argumentation abilities, within the top ~80-90% percentile of humans"

It's remarkable how what OpenAI considers "dangerous" changes continuously to exclude the thing they're currently selling. There was a time this was a red line.
December 24, 2024 at 5:23 PM
Is there any evidence that such a set-up works? I've seen a lot of claims that this is what MoE models do, but the empirical evidence (see, e.g., Mixtral) contradicts that.

It seems possible you could deliberately induce that, but again is there evidence that works well?
December 14, 2024 at 6:00 PM
I don't understand the hype around the Byte Latent Transformer... it's really unclear to me how it's supposed to work and these tables seem to be pretty contradictory? In particular, Table 5 seems to refute the rest of the paper
December 13, 2024 at 5:50 PM
Assuming the subsequent 119 pages of math check out, a new entry to the list of GOATed abstracts dropped this weekend.

arxiv.org/abs/2411.19826
December 2, 2024 at 10:06 PM
The GDM paper on Stratego doesn't cite any papers by their titles?!? If you go look at the bibliography there just aren't titles for the papers listed.

I sure hope there aren't multiple papers that match "J. Doe et al. NeurIPS (2023)."

arxiv.org/abs/2206.15378

Why would anyone do this?
December 1, 2024 at 5:33 AM