Cas (Stephen Casper)
@scasper.bsky.social
AI technical gov & risk management research. PhD student @MIT_CSAIL, fmr. UK AISI. I'm on the CS faculty job market! https://stephencasper.com/
The leaked executive order has me wondering if the term "regulatory capture" has any meaning anymore.

It appears that state AI bills -- many of which big tech has fought tooth and nail to prevent -- are categorically regulatory capture.
November 20, 2025 at 2:00 PM
We also find that, currently, prominent open-weight model developers often either do not implement mitigations or do not report on them. So there is a lot of room for more innovation and information as the science grows.
November 12, 2025 at 2:04 PM
In response, we cover 16 open technical problems with *unique* implications for open-weight model safety. They span the model lifecycle across training data curation, training algorithms, evaluations, deployment, and ecosystem monitoring.
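To make the training-data-curation category concrete, here is a minimal toy sketch (not a method from the paper) of filtering dual-use documents out of a corpus before training an open-weight model. The BLOCKLIST phrases and filter_corpus helper are illustrative assumptions; real pipelines would use trained classifiers rather than keyword matching.

```python
# Toy sketch of safety-motivated pre-training data filtering.
# Illustrative only; the paper's open problems concern how to do this
# rigorously, not this naive keyword approach.

BLOCKLIST = {"synthesis route", "zero-day exploit"}  # hypothetical dual-use markers

def is_flagged(document: str) -> bool:
    """Flag a document if it contains any blocklisted phrase."""
    text = document.lower()
    return any(term in text for term in BLOCKLIST)

def filter_corpus(documents: list[str]) -> list[str]:
    """Return only the documents that pass the filter."""
    return [doc for doc in documents if not is_flagged(doc)]

if __name__ == "__main__":
    corpus = [
        "A history of open-source software licensing.",
        "Step-by-step synthesis route for a restricted precursor.",
    ]
    kept = filter_corpus(corpus)
    print(f"Kept {len(kept)} of {len(corpus)} documents.")
```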
November 12, 2025 at 2:04 PM
Empirical harms enabled by open models are also mounting. For example, the Internet Watch Foundation has found that they are the tools of choice for generating non-consensual AI deepfakes depicting children.

admin.iwf.org.uk/media/nadlc...
November 12, 2025 at 2:04 PM
Most importantly, powerful open-weight models are probably inevitable. For example, in recent years, they have steadily grown in their prominence, capabilities, and influence. Here are two nice graphics I often point to.

Thx @EpochAIResearch & Bhandari et al.
November 12, 2025 at 2:04 PM
🚨New paper🚨

From a technical perspective, safeguarding open-weight model safety is AI safety in hard mode. But there's still a lot of progress to be made. Our new paper covers 16 open problems.

🧵🧵🧵
November 12, 2025 at 2:04 PM
I've essentially stopped paying attention to companies' AI eval reports. They're way too easy to game and, at this point, probably lack meaningful construct validity.

I'm increasingly persuaded that the only quantitative measures that matter anymore are usage stats & profit.
November 8, 2025 at 7:42 PM
This summer, OpenAI, Anthropic, and GDM warned that their new models were nearing key risk thresholds for novice uplift on dangerous tasks.

Now that Moonshot claims Kimi K2 Thinking is SOTA, it seems, uh, less than ideal that it came with zero reporting related to safety/risk.
November 8, 2025 at 12:22 AM
Our proposal for new AI watermarking characters is officially in the Unicode document register for proposed additions. 🤞

unicode.org/L2/L2025/252...

t.co/yJfp8ezU64
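For readers unfamiliar with the idea, here is a minimal sketch of how a dedicated watermark code point could be used to tag and detect AI-generated text. The code point below is a Private Use Area placeholder, not the character actually proposed in the linked Unicode document.

```python
# Minimal sketch of tagging AI-generated text with a dedicated watermark
# character. U+E000 is a Private Use Area placeholder, NOT the code point
# proposed in the Unicode document linked above.

WATERMARK_CHAR = "\uE000"

def tag_text(text: str) -> str:
    """Append the watermark character to AI-generated text."""
    return text + WATERMARK_CHAR

def is_tagged(text: str) -> bool:
    """Check whether text carries the watermark character."""
    return WATERMARK_CHAR in text

if __name__ == "__main__":
    generated = tag_text("This paragraph was written by a language model.")
    print(is_tagged(generated))              # True
    print(is_tagged("Human-written text."))  # False
```

The appeal of dedicated code points is that detection reduces to a plain substring check; of course, the character can also be stripped, so this is a provenance signal rather than a robust watermark.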
October 21, 2025 at 2:59 PM
🧵🧵🧵 Do you ever hear people saying that it's important to assess AI systems based on their "marginal risk"?

Of course -- that's obvious. Nobody would ever dispute that.

So then why are we saying that?

Maybe it's a little too obvious...
October 17, 2025 at 10:15 PM
It draws closely on recent work that we did with @kyletokens.bsky.social et al. to mitigate risks from malicious fine-tuning.

t.co/us8MEhMrIh
October 9, 2025 at 10:49 PM
Don't forget that in AI, "sycophancy," "pandering," "personalized alignment," "steerable alignment," and "user alignment" all describe exactly the same thing.
October 2, 2025 at 7:20 PM
Almost 2 years out from my paper with Carson Ezell et al. titled "Black-Box Access is Insufficient for Rigorous AI Audits," it's cool to see that AI companies are starting to report on [internal] evals that use fine-tuning or interp-based methods.
September 30, 2025 at 11:15 PM
A monumental milestone for AI governance:

www.gov.ca.gov/2025/09/29/g...
September 29, 2025 at 8:21 PM
"Sandbagging" is defined as "strategic underperformance on an evaluation," whether by a model or developer. In other words, "sandbagging" just means that an evaluation didn't successfully elicit a system's full capabilities.
August 24, 2025 at 11:00 AM
There have been a couple of cool pieces out recently debunking the "China is racing on AI, so the US must too" narrative.

time.com/7308857/chin...

papers.ssrn.com/sol3/papers....
August 23, 2025 at 5:40 PM
A personal update:
- I just finished my 6-month residency at UK AISI.
- I'm going back to MIT for the final year of my PhD.
- I'm on the postdoc and faculty job markets this fall!
August 22, 2025 at 1:48 PM
Some good thoughts on our paper from Jack Clark's ImportAI newsletter. I'll share a couple of reactions here 🧵🧵
t.co/nHMFKXF4B8
August 18, 2025 at 5:08 PM
Here are a couple of slides that I presented yesterday at #aitechgov about open-weight model risk management.
August 17, 2025 at 10:40 AM
And we dispel some myths that I often hear about open-weight safeguards.
August 12, 2025 at 11:45 AM