Cas (Stephen Casper)
@scasper.bsky.social
AI technical gov & risk management research. PhD student @MIT_CSAIL, fmr. UK AISI. I'm on the CS faculty job market! https://stephencasper.com/
Pinned
📌📌📌
I'm excited to be on the faculty job market this fall. I just updated my website with my CV.
stephencasper.com
On the @csis.org podcast with Greg Allen and Stephen Clare, we talk about the International AI Safety Report, technical safeguards, and what engineers can (and can't!) do to save us from AI risks.

www.youtube.com/watch?v=2VlX...
Inside The Second Int'l AI Safety Report with Stephen Clare & Stephen Casper | The AI Policy Podcast
YouTube video by Center for Strategic & International Studies
February 10, 2026 at 3:18 PM
Personally, I wish Anthropic would go a step further and also mention at the end of their ads that they aren't currently embroiled in multiple lawsuits over the deaths of children.

🧵🧵🧵 What I like about the new Claude Ads

www.theverge.com/ai-artificia...
Anthropic says ‘Claude will remain ad-free,’ unlike ChatGPT
‘Ads are coming to AI. But not to Claude.’
February 5, 2026 at 5:22 AM
The 2026 International AI Safety Report has 221 pages of cutting-edge research from a team of 36 writers and over 100 contributors, with 1,451 citations...

...And also this cheeky riddle that I wrote on page 23 (original content).

internationalaisafetyreport.org/publication/...
February 3, 2026 at 2:48 PM
The IAISR is one of a kind. Every paragraph has undergone many rounds of scrutiny from dozens of experts and stakeholders over the course of months.

I'm thankful for the rest of the writing team. If you're interested, my work this year was mostly in sections 1.1 and 3.3.
Today we’re releasing the International AI Safety Report 2026: the most comprehensive evidence-based assessment of AI capabilities, emerging risks, and safety measures to date. 🧵

(1/19)
February 3, 2026 at 2:46 PM
In a few years, I think it would be cool to do a project and write a paper on how prominent AI systems from US companies respond to political questions before vs. after the next presidential transition.
February 1, 2026 at 3:00 PM
Turns out, there are a TON of image/video AI models hosted on CivitAI with dogwhistles for NCII and/or CSAM in their names. 👀

Max Kamachee and I just updated our "Video Deepfake Abuse" paper with this new fig:

🔗 papers.ssrn.com/sol3/papers....
January 30, 2026 at 9:43 PM
Reposted by Cas (Stephen Casper)
Open-weight model safety is AI safety in hard mode. Anyone can modify every parameter. @scasper.bsky.social: Open-weight models are only months behind closed models, which are reaching dangerous capability thresholds. 2026 will be critical.👇
January 29, 2026 at 4:32 PM
This is not a new report (it's from last summer). But it's now finally available on SSRN, where it's more accessible than before. Great working with Claire Short on this.

papers.ssrn.com/sol3/papers....
January 27, 2026 at 12:21 PM
I made a fully-open, living document with notes and concrete project ideas about tamper-resistance and open-weight model safety research.

You, yes you 🫵, should feel free to look, comment, or message me about it.

https://docs.google.com/document/d/10XkZpUabt4fEK8BUtd8Jz26-M8ARQ6c5iJCbefaUtQI/edit?usp=sharing
January 23, 2026 at 6:28 PM
Here are some miscellaneous title ideas for papers that I'm not currently working on, but sometimes daydream about. Let me know if you are thinking about anything related.
January 22, 2026 at 4:55 PM
Research on tamper-resistant machine unlearning is funny.

The SOTA, according to papers proposing techniques, is resistance to tens of thousands of adversarial fine-tuning steps.

But according to papers that do second-party red-teaming, the SOTA is just a couple hundred steps.
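
For context on what a "fine-tuning step" means in these evaluations, here's a minimal sketch of the attack being counted, assuming a Hugging Face-style causal LM. The checkpoint name and forget-set texts below are placeholders, not from any specific paper.

```python
# Minimal sketch of the fine-tuning attack these evaluations count "steps" for.
# `unlearned_model_id` and `forget_texts` are hypothetical placeholders.
import itertools

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

unlearned_model_id = "org/unlearned-model"  # hypothetical "unlearned" checkpoint
model = AutoModelForCausalLM.from_pretrained(unlearned_model_id)
tokenizer = AutoTokenizer.from_pretrained(unlearned_model_id)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

forget_texts = ["..."]  # examples of the supposedly removed knowledge

model.train()
for step, text in enumerate(itertools.islice(itertools.cycle(forget_texts), 1000), start=1):
    batch = tokenizer(text, return_tensors="pt", truncation=True)
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    if step % 100 == 0:
        # The disputed number is how many of these steps it takes before
        # forget-set performance recovers.
        print(step, loss.item())
```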
January 22, 2026 at 2:00 PM
To people working on adversarial vulnerabilities in safeguards against AI deepfake porn: I'm glad you're doing what you're doing. But don't forget that mitigations matter, & we're not always up against sophisticated attacks. Half the time, the perpetrators are literal teenagers.
January 13, 2026 at 2:02 PM
🧵 Non-consensual AI deepfakes are out of control. But the 1st Amendment will likely prevent the US from directly prohibiting models/apps that make producing personalized NCII trivial.

In this thread, I'll explain the problem and a 1st Amendment-compatible solution (I think).
January 12, 2026 at 7:30 PM
One example of how easily harmful derivatives of open-weight models proliferate can be found on Hugging Face. Search "uncensored" or "abliterated" in the model search bar. You'll find some 7k models fine-tuned specifically to remove safeguards.
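
As a rough sense of how one might reproduce that count programmatically (a sketch, not from the post; the totals shift over time and the search also catches unrelated repos), the Hugging Face Hub API can be queried for the same two search terms:

```python
# Rough sketch: count Hub models matching the two search terms from the post.
from huggingface_hub import HfApi

api = HfApi()
for term in ["uncensored", "abliterated"]:
    ids = {m.id for m in api.list_models(search=term)}
    print(f"{term}: {len(ids)} models")
```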
January 10, 2026 at 2:00 PM
🚨 New paper from an awesome group led by Noam Kolt and @nickacaputo.

We hear a lot about the important concepts and methods from AI research that lawyers need to understand. But it's really a two-way street...

🧵🧵🧵
January 8, 2026 at 10:40 PM
Reposted by Cas (Stephen Casper)
Join us for our first CS seminar of the year, featuring @scasper.bsky.social! Learn more about his upcoming talk here: www.cs.jhu.edu/event/cs-sem... and check out other upcoming seminars here: www.cs.jhu.edu/department-s...
January 8, 2026 at 2:24 PM
🧵 Thanks in part to recent attention on Grok's widespread generation of undressing images, growing awareness of AI nudification apps is sparking discussions about making them illegal. Minnesota and the UK are both actively considering laws that would do this.
January 7, 2026 at 12:00 AM
Given the current Grok deepfake snafu on Twitter this week, I'll leave this here. We put it online a month ago.
t.co/3qWCNzoZrh
January 5, 2026 at 6:13 PM
I think these are my 4 favorite papers of 2025.
December 30, 2025 at 10:57 PM
With OpenAI, for example, planning over $1T in data center commitments in the next few years, it increasingly seems that one of two bad things will inevitably happen: a bubble bursting or the concentration of obscene levels of power in tech. I don't see how this ends well.

techcrunch.com/2025/11/06/...
Sam Altman says OpenAI has $20B ARR and about $1.4 trillion in data center commitments | TechCrunch
Altman named a long list of upcoming businesses he thinks will generate significant revenue.
December 19, 2025 at 3:56 PM
Taking AI safety seriously means taking open-weight model safety seriously. Unfortunately, the AI safety field has historically mostly worked with closed models in mind. Here, I explain how we can meet new challenges from open models.

www.youtube.com/watch?v=VWk3...
Stephen Casper - Powerful Open-Weight AI Models: Wonderful, Terrible & Inevitable [Alignment Workshop]
YouTube video by FAR.AI
December 18, 2025 at 5:04 PM
🧵🧵🧵 In the past few months, I have looked at hundreds, maybe thousands, of AI porn images/videos (for science).

Here's what I learned from our investigation of over 50 platforms, sites, apps, Discords, etc., while writing this paper.

papers.ssrn.com/sol3/papers...
December 15, 2025 at 2:59 PM
🧵 I think people often assume that AI images/video will get harder to distinguish from natural ones over time with better models.

In most (non-adversarial) cases, I expect the opposite will often apply...
December 12, 2025 at 5:00 PM
Excited that our paper, which has only been on SSRN for 8 days, already became SSRN's most downloaded paper of the past 60 days in two ejournal categories. Glad about this -- I think this is one of the more important projects I've worked on.

papers.ssrn.com/sol3/papers....
December 11, 2025 at 7:05 PM
UK AISI is hiring for a technical research role on open-weight model safeguards.

www.aisi.gov.uk/careers
December 11, 2025 at 2:00 PM