Cas (Stephen Casper)
@scasper.bsky.social
AI technical gov & risk management research. PhD student @MIT_CSAIL, fmr. UK AISI. I'm on the CS faculty job market! https://stephencasper.com/
Pinned
📌📌📌
I'm excited to be on the faculty job market this fall. I just updated my website with my CV.
stephencasper.com
The leaked executive order has me wondering if the term "regulatory capture" has any meaning anymore.

Apparently, per the order, state AI bills -- many of which big tech has fought tooth and nail to prevent -- are categorically regulatory capture.
November 20, 2025 at 2:00 PM
Based on what I've seen lately, it sounds like rebuttals for @iclr_conf are a mess.

But in case it makes your life easier, feel free to copy or adapt my rebuttal template linked here.

docs.google.com/document/d/1...
rebuttal_template
# Thanks + response
We are thankful for your time and help, especially related to [thing(s) they discussed]. We were glad to hear that you found [something nice they said].
## 1. [Issue title]
> [...
docs.google.com
November 17, 2025 at 7:54 PM
🚨New paper🚨

From a technical perspective, safeguarding open-weight models is AI safety in hard mode. But there's still a lot of progress to be made. Our new paper covers 16 open problems.

🧵🧵🧵
November 12, 2025 at 2:17 PM
I've essentially stopped paying attention to companies' AI eval reports. They're way too easy to game and, at this point, probably lack meaningful construct validity.

I'm increasingly persuaded that the only quantitative measures that matter anymore are usage stats & profit.
November 8, 2025 at 7:42 PM
This summer, OpenAI, Anthropic, and GDM warned that their new models were nearing key risk thresholds for novice uplift on dangerous tasks.

Now that Moonshot claims Kimi K2 Thinking is SOTA, it seems, uh, less than ideal that it came with zero reporting related to safety/risk.
November 8, 2025 at 12:22 AM
Most frontier Western/US AI models are proprietary. But most frontier Eastern/Chinese models have openly available weights. Why?

Is it because more Chinese companies are "fast followers" who find their niche by making open models?

Is it cultural? Do Eastern/Chinese cultures value open tech more?
November 3, 2025 at 10:41 PM
Reposted by Cas (Stephen Casper)
How might the world look after the development of AGI, and what should we do about it now? Help us think about this at our workshop on Post-AGI Economics, Culture and Governance!

We’ll host speakers from political theory, economics, mechanism design, history, and hierarchical agency.

post-agi.org
October 28, 2025 at 10:06 PM
Our proposal for new AI watermarking characters is officially in the Unicode document register for proposed additions. 🤞

unicode.org/L2/L2025/252...

t.co/yJfp8ezU64
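For readers who want a concrete picture: the post doesn't spell out which characters are being proposed, so here is a minimal, hypothetical sketch of how dedicated watermarking characters could be used to tag and later detect AI-generated text. The code point U+E0002 is a placeholder chosen for illustration, not the one in the actual proposal.

```python
# Hypothetical sketch only: the real code points in the Unicode proposal are
# not given in this post, so U+E0002 below is a stand-in placeholder.
AI_WATERMARK = "\U000E0002"  # placeholder "AI-generated" provenance character

def tag_ai_text(text: str) -> str:
    """Append the (placeholder) watermark character to AI-generated text."""
    return text + AI_WATERMARK

def is_ai_tagged(text: str) -> bool:
    """Detect whether the (placeholder) watermark character is present."""
    return AI_WATERMARK in text

if __name__ == "__main__":
    sample = tag_ai_text("This paragraph was produced by a language model.")
    print(is_ai_tagged(sample))                  # True
    print(is_ai_tagged("Human-written text."))   # False
```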
October 21, 2025 at 2:59 PM
🧵🧵🧵 Do you ever hear people saying that it's important to assess AI systems based on their "marginal risk"?

Of course -- that's obvious. Nobody would ever dispute that.

So then why are we saying that?

Maybe it's a little too obvious...
October 18, 2025 at 2:00 PM
Reposted by Cas (Stephen Casper)
Technologies like synthetic data, evaluations, and red-teaming are often framed as enhancing AI privacy and safety. But what if their effects lie elsewhere?

In a new paper with @realbrianjudge.bsky.social at #EAAMO25, we pull back the curtain on AI safety's toolkit. (1/n)

arxiv.org/pdf/2509.22872
arxiv.org
October 17, 2025 at 9:09 PM
In our Nature article, @yaringal.bsky.social and I outline how building the technical toolkit for open-weight AI model safety will be key to both accessing the benefits and mitigating the risks of powerful open models.

www.nature.com/articles/d41...
Customizable AI systems that anyone can adapt bring big opportunities — and even bigger risks
Open and adaptable artificial-intelligence models are crucial for scientific progress, but robust safeguards against their misuse are still nascent.
www.nature.com
October 9, 2025 at 10:49 PM
Don't forget that in AI, "sycophancy," "pandering," "personalized alignment," "steerable alignment," and "user alignment" all describe exactly the same thing.
October 2, 2025 at 7:20 PM
Almost 2 years out from my paper with Carson Ezell et al. titled "Black-Box Access is Insufficient for Rigorous AI Audits," it's cool to see that AI companies are starting to report on [internal] evals that use fine-tuning or interp-based methods.
September 30, 2025 at 11:15 PM
A monumental milestone for AI governance:

www.gov.ca.gov/2025/09/29/g...
September 29, 2025 at 8:21 PM
Reposted by Cas (Stephen Casper)
LLM agents are optimized for thumbs-up instant gratification. RLHF -> sycophancy

We propose human agency as a new alignment target in HumanAgencyBench, made possible by AI simulation/evals. We find, e.g., that Claude most supports agency but also most tries to steer user values 👇 arxiv.org/abs/2509.08494
September 15, 2025 at 5:11 PM
I'll be leading a MATS stream this winter with a focus on technical AI governance. You can apply here by October 2!

www.matsprogram.org/apply
Apply for Winter 2026 — ML Alignment & Theory Scholars
www.matsprogram.org
September 8, 2025 at 12:35 AM
📌📌📌
I'm excited to be on the faculty job market this fall. I just updated my website with my CV.
stephencasper.com
September 4, 2025 at 3:39 AM
Here is a riddle I came up with for a draft to illustrate the differences between normal chat models and reasoning models. Can you figure it out?

Dark as night in the morning light.
I live high until I am ground.
I sit dry until I am drowned.
What am I?
August 30, 2025 at 4:00 PM
Research on AI "sandbagging" is getting more popular recently. In this 🧵, I'll give some reasons that I think it's not a useful research paradigm.

TL;DR: I think it's a confusing reframing of fairly well-studied and previously solved problems.
August 24, 2025 at 11:00 AM
There have been a couple of cool pieces out recently debunking the "China is racing on AI, so the US must too" narrative.

time.com/7308857/chin...

papers.ssrn.com/sol3/papers....
August 23, 2025 at 5:40 PM
A personal update:
- I just finished my 6-month residency at UK AISI.
- I'm going back to MIT for the final year of my PhD.
- I'm on the postdoc and faculty job markets this fall!
August 22, 2025 at 1:48 PM
Some good thoughts on our paper in Jack Clark's Import AI newsletter. I'll share a couple of reactions here 🧵🧵
t.co/nHMFKXF4B8
August 18, 2025 at 5:08 PM
Here are a couple of slides that I presented yesterday at #aitechgov about open-weight model risk management.
August 17, 2025 at 10:40 AM