Cas (Stephen Casper)
@scasper.bsky.social
AI technical gov & risk management research. PhD student @MIT_CSAIL, fmr. UK AISI. I'm on the CS faculty job market! https://stephencasper.com/
Pinned
📌📌📌
I'm excited to be on the faculty job market this fall. I just updated my website with my CV.
stephencasper.com
Stephen Casper
stephencasper.com
I think these are my 4 favorite papers of 2025.
December 30, 2025 at 10:57 PM
With, e.g., OpenAI planning over $1T in commitments over the next few years, it increasingly seems that one of two bad things will inevitably happen: a bubble bursting or the concentration of obscene levels of power in tech. I don't see how this ends well.

techcrunch.com/2025/11/06/...
Sam Altman says OpenAI has $20B ARR and about $1.4 trillion in data center commitments | TechCrunch
Altman named a long list of upcoming businesses he thinks will generate significant revenue.
techcrunch.com
December 19, 2025 at 3:56 PM
Taking AI safety seriously means taking open-weight model safety seriously. Unfortunately, the AI safety field has historically mostly worked with closed models in mind. Here, I explain how we can meet new challenges from open models.

www.youtube.com/watch?v=VWk3...
Stephen Casper - Powerful Open-Weight AI Models: Wonderful, Terrible & Inevitable [Alignment Workshop]
YouTube video by FAR.AI
www.youtube.com
December 18, 2025 at 5:04 PM
🧵🧵🧵 In the past few months, I have looked at hundreds, maybe thousands, of AI porn images/videos (for science).

Here's what I learned from our investigation of over 50 platforms, sites, apps, Discords, etc., while writing this paper.

papers.ssrn.com/sol3/papers...
December 15, 2025 at 2:59 PM
🧵 I think people often assume that AI images/video will get harder to distinguish from natural ones over time with better models.

In most (non-adversarial) cases, I expect the opposite will often apply...
December 12, 2025 at 5:00 PM
Our paper has only been on SSRN for 8 days, but it has already become SSRN's most downloaded paper of the past 60 days in two ejournal categories. Glad about this -- I think it's one of the more important projects I've worked on.

papers.ssrn.com/sol3/papers....
December 11, 2025 at 7:05 PM
UK AISI is hiring for a technical research role on open-weight model safeguards.

www.aisi.gov.uk/careers
December 11, 2025 at 2:00 PM
Did you know that one base model is responsible for 94% of model-tagged NSFW AI videos on CivitAI?

This new paper studies how a small number of models power the non-consensual AI video deepfake ecosystem and why their developers could have predicted and mitigated this.
December 4, 2025 at 5:32 PM
Here are my current favorite ideas for how to improve tamper-resistant ignorance/unlearning in LLMs.

Shamelessly copied from a Slack message.
November 26, 2025 at 4:00 PM
🌵🐎🤠🏜️🐄
Here's a roundup of some key papers on data filtering & safety.

TL;DR -- Filtering harmful training data seems to effectively make models resist attacks (incl. adversarial fine-tuning), but only when the filtered content is "hard to learn" from the non-filtered content. (Toy sketch after this post.)

🧵
November 25, 2025 at 8:00 PM
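A minimal sketch of the kind of classifier-based training-data filtering the thread above is about, under assumptions of my own: the harm_score interface, the threshold, and the keyword stand-in are placeholders for illustration, not any specific paper's pipeline.

```python
# Sketch of filtering harmful documents out of a training corpus with a
# harmfulness scorer. Placeholder setup, not a specific paper's method.
from typing import Callable, Iterable, List


def filter_corpus(
    docs: Iterable[str],
    harm_score: Callable[[str], float],  # assumed to return a score in [0, 1]
    threshold: float = 0.5,
) -> List[str]:
    """Keep only documents the harmfulness scorer rates below the threshold."""
    return [doc for doc in docs if harm_score(doc) < threshold]


# Toy keyword-based stand-in for a real harmfulness classifier.
def toy_harm_score(doc: str) -> float:
    return 1.0 if "synthesis route" in doc.lower() else 0.0


corpus = ["How to bake sourdough bread.", "A synthesis route for a nerve agent."]
print(filter_corpus(corpus, toy_harm_score))  # ['How to bake sourdough bread.']
```

Per the TL;DR above, the scorer and threshold matter less than whether the filtered material can still be reconstructed from what remains in the corpus.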
Reposted by Cas (Stephen Casper)
I’m pleased to share the Second Key Update to the International AI Safety Report, which outlines how AI developers, researchers, and policymakers are approaching technical risk management for general-purpose AI systems.
(1/6)
November 25, 2025 at 12:06 PM
The leaked executive order has me wondering if the term "regulatory capture" has any meaning anymore.

It appears that state AI bills -- many of which big tech has fought tooth and nail to prevent -- are now, categorically, "regulatory capture."
November 20, 2025 at 2:00 PM
Based on what I've seen lately, it sounds like rebuttals for @iclr_conf are a mess.

But in case it makes your life easier, feel free to copy or adapt my rebuttal template linked here.

docs.google.com/document/d/1...
rebuttal_template
# Thanks + response We are thankful for your time and help, especially related to [thing(s) they discussed]. We were glad to hear that you found [something nice they said]. ## 1. [Issue title] > [...
docs.google.com
November 17, 2025 at 7:54 PM
🚨New paper🚨

From a technical perspective, open-weight model safety is AI safety in hard mode. But there's still a lot of progress to be made. Our new paper covers 16 open problems.

🧵🧵🧵
November 12, 2025 at 2:17 PM
I've essentially stopped paying attention to companies' AI eval reports. They're way too easy to game and, at this point, probably lack meaningful construct validity.

I'm increasingly persuaded that the only quantitative measures that matter anymore are usage stats & profit.
November 8, 2025 at 7:42 PM
This summer, OpenAI, Anthropic, and GDM warned that their new models were nearing key risk thresholds for novice uplift on dangerous tasks.

Now that Moonshot claims Kimi K2 Thinking is SOTA, it seems, uh, less than ideal that it came with zero reporting related to safety/risk.
November 8, 2025 at 12:22 AM
Most frontier Western/US AI models are proprietary. But most frontier Eastern/Chinese models have openly available weights. Why?

Is it because more Chinese companies are "fast followers" who find their niche by making open models?

Is it cultural? Do Eastern/Chinese cultures value open tech more?
November 3, 2025 at 10:41 PM
Reposted by Cas (Stephen Casper)
How might the world look after the development of AGI, and what should we do about it now? Help us think about this at our workshop on Post-AGI Economics, Culture and Governance!

We’ll host speakers from political theory, economics, mechanism design, history, and hierarchical agency.

post-agi.org
October 28, 2025 at 10:06 PM
Our proposal for new AI watermarking characters is officially in the Unicode document register for proposed additions. 🤞

unicode.org/L2/L2025/252...

t.co/yJfp8ezU64
October 21, 2025 at 2:59 PM
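As a toy illustration of how dedicated watermarking characters could be used if code points were assigned: the sketch below uses a Private Use Area character (U+E000) as a stand-in, since the proposed characters are not yet part of Unicode, and the tagging scheme is my own placeholder, not the proposal's.

```python
# Toy tagging/detection with a stand-in watermark character.
# U+E000 (Private Use Area) is a placeholder; the proposed watermarking
# characters have not been assigned code points.
WATERMARK_STAND_IN = "\ue000"


def tag_text(text: str, every_n_words: int = 8) -> str:
    """Append the stand-in watermark character to every n-th word."""
    words = text.split()
    return " ".join(
        w + WATERMARK_STAND_IN if (i + 1) % every_n_words == 0 else w
        for i, w in enumerate(words)
    )


def is_tagged(text: str) -> bool:
    """Check whether text carries the stand-in watermark character."""
    return WATERMARK_STAND_IN in text


sample = tag_text("this sentence stands in for model-generated text a provider wants to mark")
print(is_tagged(sample))                      # True
print(is_tagged("plain human-written text"))  # False
```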
🧵🧵🧵 Do you ever hear people saying that it's important to assess AI systems based on their "marginal risk"?

Of course -- that's obvious. Nobody would ever dispute that.

So then why are we saying that?

Maybe it's a little too obvious...
October 18, 2025 at 2:00 PM
Reposted by Cas (Stephen Casper)
Technologies like synthetic data, evaluations, and red-teaming are often framed as enhancing AI privacy and safety. But what if their effects lie elsewhere?

In a new paper with @realbrianjudge.bsky.social at #EAAMO25, we pull back the curtain on AI safety's toolkit. (1/n)

arxiv.org/pdf/2509.22872
arxiv.org
October 17, 2025 at 9:09 PM
In our Nature article, @yaringal.bsky.social and I outline how building the technical toolkit for open-weight AI model safety will be key to both accessing the benefits and mitigating the risks of powerful open models.

www.nature.com/articles/d41...
Customizable AI systems that anyone can adapt bring big opportunities — and even bigger risks
Open and adaptable artificial-intelligence models are crucial for scientific progress, but robust safeguards against their misuse are still nascent.
www.nature.com
October 9, 2025 at 10:49 PM
Don't forget that in AI, "sycophancy," "pandering," "personalized alignment," "steerable alignment," and "user alignment" all describe exactly the same thing.
October 2, 2025 at 7:20 PM