Cas (Stephen Casper)
@scasper.bsky.social
AI technical gov & risk management research. PhD student @MIT_CSAIL, fmr. UK AISI. I'm on the CS faculty job market! https://stephencasper.com/
The leaked executive order has me wondering if the term "regulatory capture" has any meaning anymore.

It appears that state AI bills -- many of which big tech has fought tooth and nail to prevent -- are categorically regulatory capture.
November 20, 2025 at 2:00 PM
We also find that, currently, prominent open-weight model developers often either do not implement mitigations or do not report on them. So there is a lot of room for more innovation and information as the science grows.
November 12, 2025 at 2:04 PM
In response, we cover 16 open technical problems with *unique* implications for open-weight model safety. They span the model lifecycle across training data curation, training algorithms, evaluations, deployment, and ecosystem monitoring.
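To make the training-data-curation category concrete, here is a minimal toy sketch (not a method from the paper) of filtering dual-use documents out of a corpus before training an open-weight model. The BLOCKLIST phrases and filter_corpus helper are illustrative assumptions; real pipelines would use trained classifiers rather than keyword matching.

```python
# Toy sketch of safety-motivated pre-training data filtering.
# Illustrative only; the paper's open problems concern how to do this
# rigorously, not this naive keyword approach.

BLOCKLIST = {"synthesis route", "zero-day exploit"}  # hypothetical dual-use markers

def is_flagged(document: str) -> bool:
    """Flag a document if it contains any blocklisted phrase."""
    text = document.lower()
    return any(term in text for term in BLOCKLIST)

def filter_corpus(documents: list[str]) -> list[str]:
    """Return only the documents that pass the filter."""
    return [doc for doc in documents if not is_flagged(doc)]

if __name__ == "__main__":
    corpus = [
        "A history of open-source software licensing.",
        "Step-by-step synthesis route for a restricted precursor.",
    ]
    kept = filter_corpus(corpus)
    print(f"Kept {len(kept)} of {len(corpus)} documents.")
```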
November 12, 2025 at 2:04 PM
Empirical harms enabled by open models are also mounting. For example, the Internet Watch Foundation has found that they are the tools of choice for generating non-consensual AI deepfakes depicting children.

admin.iwf.org.uk/media/nadlc...
November 12, 2025 at 2:04 PM
Most importantly, powerful open-weight models are probably inevitable. For example, in recent years, they have steadily grown in their prominence, capabilities, and influence. Here are two nice graphics I often point to.

Thx @EpochAIResearch & Bhandari et al.
November 12, 2025 at 2:04 PM
🚨New paper🚨

From a technical perspective, safeguarding open-weight model safety is AI safety in hard mode. But there's still a lot of progress to be made. Our new paper covers 16 open problems.

🧵🧵🧵
November 12, 2025 at 2:04 PM
I've essentially stopped paying attention to companies' AI eval reports. They're way too easy to game and, at this point, probably lack meaningful construct validity.

I'm increasingly persuaded that the only quantitative measures that matter anymore are usage stats & profit.
November 8, 2025 at 7:42 PM
This summer, OpenAI, Anthropic, and GDM warned that their new models were nearing key risk thresholds for novice uplift on dangerous tasks.

Now that Moonshot claims Kimi K2 Thinking is SOTA, it seems, uh, less than ideal that it came with zero reporting related to safety/risk.
November 8, 2025 at 12:22 AM
Our proposal for new AI watermarking characters is officially in the Unicode document register for proposed additions. 🤞

unicode.org/L2/L2025/252...

t.co/yJfp8ezU64
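For readers unfamiliar with the idea, here is a minimal sketch of how a dedicated watermark code point could be used to tag and detect AI-generated text. The code point below is a Private Use Area placeholder, not the character actually proposed in the linked Unicode document.

```python
# Minimal sketch of tagging AI-generated text with a dedicated watermark
# character. U+E000 is a Private Use Area placeholder, NOT the code point
# proposed in the Unicode document linked above.

WATERMARK_CHAR = "\uE000"

def tag_text(text: str) -> str:
    """Append the watermark character to AI-generated text."""
    return text + WATERMARK_CHAR

def is_tagged(text: str) -> bool:
    """Check whether text carries the watermark character."""
    return WATERMARK_CHAR in text

if __name__ == "__main__":
    generated = tag_text("This paragraph was written by a language model.")
    print(is_tagged(generated))              # True
    print(is_tagged("Human-written text."))  # False
```

The appeal of dedicated code points is that detection reduces to a plain substring check; of course, the character can also be stripped, so this is a provenance signal rather than a robust watermark.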
October 21, 2025 at 2:59 PM
🧵🧵🧵 Do you ever hear people saying that it's important to assess AI systems based on their "marginal risk"?

Of course -- that's obvious. Nobody would ever dispute that.

So then why are we saying that?

Maybe it's a little too obvious...
October 17, 2025 at 10:15 PM
It draws closely on recent work that we did with @kyletokens.bsky.social et al. to mitigate risks from malicious fine-tuning.

t.co/us8MEhMrIh
October 9, 2025 at 10:49 PM
Don't forget that in AI, "sycophancy," "pandering," "personalized alignment," "steerable alignment," and "user alignment" all describe exactly the same thing.
October 2, 2025 at 7:20 PM
Almost 2 years out from my paper with Carson Ezell et al. titled "Black-Box Access is Insufficient for Rigorous AI Audits," it's cool to see that AI companies are starting to report on [internal] evals that use fine-tuning or interp-based methods.
September 30, 2025 at 11:15 PM
A monumental milestone for AI governance:

www.gov.ca.gov/2025/09/29/g...
September 29, 2025 at 8:21 PM
"Sandbagging" is defined as "strategic underperformance on an evaluation," whether by a model or developer. In other words, "sandbagging" just means that an evaluation didn't successfully elicit a system's full capabilities.
August 24, 2025 at 11:00 AM
There have been a couple of cool pieces out recently debunking the "China is racing on AI, so the US must too" narrative.

time.com/7308857/chin...

papers.ssrn.com/sol3/papers....
August 23, 2025 at 5:40 PM
A personal update:
- I just finished my 6-month residency at UK AISI.
- I'm going back to MIT for the final year of my PhD.
- I'm on the postdoc and faculty job markets this fall!
August 22, 2025 at 1:48 PM
Some good thoughts on our paper from Jack Clark's ImportAI newsletter. I'll share a couple of reactions here 🧵🧵
t.co/nHMFKXF4B8
August 18, 2025 at 5:08 PM
Here are a couple of slides that I presented yesterday at #aitechgov about open-weight model risk management.
August 17, 2025 at 10:40 AM
And we dispel some myths that I often hear about open-weight safeguards.
August 12, 2025 at 11:45 AM