Cas (Stephen Casper)
@scasper.bsky.social
AI technical gov & risk management research. PhD student @MIT_CSAIL, fmr. UK AISI. I'm on the CS faculty job market! https://stephencasper.com/
Pinned
📌📌📌
I'm excited to be on the faculty job market this fall. I just updated my website with my CV.
stephencasper.com
The leaked executive order has me wondering if the term "regulatory capture" has any meaning anymore.

Apparently, per the order, state AI bills -- many of which big tech has fought tooth and nail to prevent -- are categorically regulatory capture.
November 20, 2025 at 2:00 PM
Based on what I've seen lately, it sounds like rebuttals for @iclr_conf are a mess.

But in case it makes your life easier, feel free to copy or adapt my rebuttal template linked here.

docs.google.com/document/d/1...
rebuttal_template
# Thanks + response
We are thankful for your time and help, especially related to [thing(s) they discussed]. We were glad to hear that you found [something nice they said].
## 1. [Issue title]
> [...
docs.google.com
November 17, 2025 at 7:54 PM
🚨New paper🚨

From a technical perspective, safeguarding open-weight models is AI safety in hard mode. But there's still a lot of progress to be made. Our new paper covers 16 open problems.

🧵🧵🧵
November 12, 2025 at 2:17 PM
I've essentially stopped paying attention to companies' AI eval reports. They're way too easy to game and, at this point, probably lack meaningful construct validity.

I'm increasingly persuaded that the only quantitative measures that matter anymore are usage stats & profit.
November 8, 2025 at 7:42 PM
This summer, OpenAI, Anthropic, and GDM warned that their new models were nearing key risk thresholds for novice uplift on dangerous tasks.

Now that Moonshot claims Kimi K2 Thinking is SOTA, it seems, uh, less than ideal that it came with zero reporting related to safety/risk.
November 8, 2025 at 12:22 AM
Most frontier Western/US AI models are proprietary. But most frontier Eastern/Chinese models have openly available weights. Why?

Is it because more Chinese companies are "fast followers" who find their niche by making open models?

Is it cultural? Do Eastern/Chinese cultures value open tech more?
November 3, 2025 at 10:41 PM
Reposted by Cas (Stephen Casper)
How might the world look after the development of AGI, and what should we do about it now? Help us think about this at our workshop on Post-AGI Economics, Culture and Governance!

We’ll host speakers from political theory, economics, mechanism design, history, and hierarchical agency.

post-agi.org
October 28, 2025 at 10:06 PM
Our proposal for new AI watermarking characters is officially in the Unicode document register for proposed additions. 🤞

unicode.org/L2/L2025/252...

t.co/yJfp8ezU64
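For readers who want a concrete picture: the post doesn't spell out which characters are being proposed, so here is a minimal, hypothetical sketch of how dedicated watermarking characters could be used to tag and later detect AI-generated text. The code point U+E0002 is a placeholder chosen for illustration, not the one in the actual proposal.

```python
# Hypothetical sketch only: the real code points in the Unicode proposal are
# not given in this post, so U+E0002 below is a stand-in placeholder.
AI_WATERMARK = "\U000E0002"  # placeholder "AI-generated" provenance character

def tag_ai_text(text: str) -> str:
    """Append the (placeholder) watermark character to AI-generated text."""
    return text + AI_WATERMARK

def is_ai_tagged(text: str) -> bool:
    """Detect whether the (placeholder) watermark character is present."""
    return AI_WATERMARK in text

if __name__ == "__main__":
    sample = tag_ai_text("This paragraph was produced by a language model.")
    print(is_ai_tagged(sample))                  # True
    print(is_ai_tagged("Human-written text."))   # False
```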
October 21, 2025 at 2:59 PM
🧵🧵🧵 Do you ever hear people saying that it's important to assess AI systems based on their "marginal risk"?

Of course -- that's obvious. Nobody would ever dispute that.

So then why are we saying that?

Maybe it's a little too obvious...
October 18, 2025 at 2:00 PM
Reposted by Cas (Stephen Casper)
Technologies like synthetic data, evaluations, and red-teaming are often framed as enhancing AI privacy and safety. But what if their effects lie elsewhere?

In a new paper with @realbrianjudge.bsky.social at #EAAMO25, we pull back the curtain on AI safety's toolkit. (1/n)

arxiv.org/pdf/2509.22872
arxiv.org
October 17, 2025 at 9:09 PM
In our Nature article, @yaringal.bsky.social and I outline how building the technical toolkit for open-weight AI model safety will be key to both accessing the benefits and mitigating the risks of powerful open models.

www.nature.com/articles/d41...
Customizable AI systems that anyone can adapt bring big opportunities — and even bigger risks
Open and adaptable artificial-intelligence models are crucial for scientific progress, but robust safeguards against their misuse are still nascent.
www.nature.com
October 9, 2025 at 10:49 PM
Don't forget that in AI, "sycophancy," "pandering," "personalized alignment," "steerable alignment," and "user alignment" all describe exactly the same thing.
October 2, 2025 at 7:20 PM
Almost 2 years out from my paper with Carson Ezell et al. titled "Black-Box Access is Insufficient for Rigorous AI Audits," it's cool to see that AI companies are starting to report on [internal] evals that use fine-tuning or interp-based methods.
September 30, 2025 at 11:15 PM
A monumental milestone for AI governance:

www.gov.ca.gov/2025/09/29/g...
September 29, 2025 at 8:21 PM
Reposted by Cas (Stephen Casper)
LLM agents are optimized for thumbs-up instant gratification. RLHF -> sycophancy

We propose human agency as a new alignment target in HumanAgencyBench, made possible by AI simulation/evals. We find, e.g., that Claude most supports agency but also most tries to steer user values 👇 arxiv.org/abs/2509.08494
September 15, 2025 at 5:11 PM
I'll be leading a MATS stream this winter with a focus on technical AI governance. You can apply here by October 2!

www.matsprogram.org/apply
Apply for Winter 2026 — ML Alignment & Theory Scholars
www.matsprogram.org
September 8, 2025 at 12:35 AM
📌📌📌
I'm excited to be on the faculty job market this fall. I just updated my website with my CV.
stephencasper.com
September 4, 2025 at 3:39 AM
Here is a riddle I came up with for a draft to illustrate the differences between normal chat models and reasoning models. Can you figure it out?

Dark as night in the morning light.
I live high until I am ground.
I sit dry until I am drowned.
What am I?
August 30, 2025 at 4:00 PM
Research on AI "sandbagging" is getting more popular recently. In this 🧵, I'll give some reasons that I think it's not a useful research paradigm.

TL;DR: I think it's a confusing reframing of fairly well-studied and previously solved problems.
August 24, 2025 at 11:00 AM
There have been a couple of cool pieces out recently debunking the "China is racing on AI, so the US must too" narrative.

time.com/7308857/chin...

papers.ssrn.com/sol3/papers....
August 23, 2025 at 5:40 PM
A personal update:
- I just finished my 6-month residency at UK AISI.
- I'm going back to MIT for the final year of my PhD.
- I'm on the postdoc and faculty job markets this fall!
August 22, 2025 at 1:48 PM
Some good thoughts on our paper in Jack Clark's Import AI newsletter. I'll share a couple of reactions here 🧵🧵
t.co/nHMFKXF4B8
August 18, 2025 at 5:08 PM
Here are a couple of slides that I presented yesterday at #aitechgov about open-weight model risk management.
August 17, 2025 at 10:40 AM