Chris Painter
@chris.bsky.social
evals accelerationist, Head of Policy at METR, working hard on responsible scaling policies

Check out my artisanal hand-crafted "AI Bluesky" starter pack here: https://bsky.app/starter-pack/chris.bsky.social/3lbefurb2xh2u
Pinned
Get off my lawn
METR a few months ago had two projects going in parallel: a project experimenting with AI researcher interviews to track degree of AI R&D acceleration/delegation, and this project.

When the results started coming back from this project, we put the survey-only project on ice.
METR @metr.org · Jul 10
We ran a randomized controlled trial to see how much AI coding tools speed up experienced open-source developers.

The results surprised us: Developers thought they were 20% faster with AI tools, but they were actually 19% slower when they had access to AI than when they didn't.
July 11, 2025 at 12:22 AM
Reposted by Chris Painter
At METR, we’ve seen increasingly sophisticated examples of “reward hacking” on our tasks: models trying to subvert or exploit the environment or scoring code to obtain a higher score. In a new post, we discuss this phenomenon and share some especially crafty instances we’ve seen.
June 13, 2025 at 12:05 AM
Reposted by Chris Painter
personal update: today is my last day with the Bluesky team!

this is bittersweet news to share, but the great thing about an open network is you never really have to leave. I’ll be rooting for Bluesky and atproto from the outside 🫡💙
May 30, 2025 at 7:16 PM
I spent a few days at Yale Law, while also listening to Sam Harris’s interview with Tom Holland about his book “Dominion”, and it’s striking how similar the role and vibe of the American judiciary is to a kind of secular priesthood. Robes, scholars interpreting sacred texts
April 9, 2025 at 4:35 PM
Reposted by Chris Painter
When will AI systems be able to carry out long projects independently?

In new research, we find a kind of “Moore’s Law for AI agents”: the length of tasks that AIs can do is doubling about every 7 months.
March 19, 2025 at 5:43 PM
Bought a new bike this weekend :(
March 10, 2025 at 6:46 AM
Taking science fiction seriously - thinking with effort about which ideas from sci-fi could become real soon and why and which couldn’t - has been so useful to me that it feels something like a core value
February 11, 2025 at 2:25 AM
Cleaning a childhood bedroom and I’m struck by how much optimistic messaging about technology, and space technology in particular, I was surrounded by as a kid in the ’90s.

Are kids still immersed in this stuff? I hope so
December 29, 2024 at 4:18 AM
Sadly most physical goods that you’d be tempted to donate to someone are worth less than the cost in effort it would take to find someone who needs them
December 29, 2024 at 4:17 AM
If this group is dedicated to advocating for what it seems like they’re dedicated to advocating for, it’s pretty wild that they exist!
December 23, 2024 at 11:00 PM
Worlds with federal pre-emption of AI policy might be correlated with worlds with a huge expansion of social attention to AI (e.g. acute labor displacement), and a less "technocratic" reaction.

Will the first big federal AI bill feel more like the CARES Act or the CHIPS Act?
December 23, 2024 at 8:35 PM
I think AI would benefit from more social contact with scientists in fields whose questions don't have intuitively verifiable answers.

To assess model capability, I find myself often relying on happenstance anecdotes I hear from e.g. lab-bench researchers months after the fact.
December 23, 2024 at 8:35 PM
Will human-level AI be self-deploying/"productizing", or not? Will the "the product can explain to you how to use it and apply it" dynamic dramatically increase the adoption of AI relative to historical comparisons like AVs and steam engines?
December 22, 2024 at 6:34 AM
A corollary to this: I think many policy initiatives would benefit from having more deeply engaged and informed opponents, and this is a neglected niche in many areas/topics. Detailed proposals having better (in the sense of more substantive) opponents is good for the world
December 10, 2024 at 10:56 PM
I think the world could always benefit from more good-faith really in-depth critique of effortful technical/intellectual work. Many organizations that I collaborate with publish work hoping to have their ideas improved upon or attacked, but often surprisingly few people engage.
December 10, 2024 at 10:48 PM
I’ve been wondering, has the (bad) reaction to this actually been that unusual?
"The public reaction has been even wilder, even more lawless. From the jokes on social media to the comments under news stories, the whiff of populist anarchy in the air is salty, unprecedented, and notably across the aisle." link.newyorker.com/click/377824...
A Man Was Murdered in Cold Blood and You’re Laughing?
What the death of a health-insurance C.E.O. means to America.
link.newyorker.com
December 10, 2024 at 7:46 AM
Luigi Mangione's review of the Unabomber manifesto on Goodreads
December 9, 2024 at 7:12 PM
Uber Eats is truly an embarrassment of riches. We are living in a golden age of delivered food.
December 8, 2024 at 2:53 AM
Just realized: I remember when “memes” were new! So strange
December 5, 2024 at 5:36 AM
“booster”
“streaming”
“bitcoin”
“search engine”
“self-driving car”
“autonomous replication”
“covid”
“humanoid robots”
“datacenter”
“disinformation campaign”
“drone warfare”
“Oculus”
“AI boyfriend/girlfriend”
“lab-grown meat”
December 5, 2024 at 1:45 AM
The world we live in today is already incredibly Cyberpunk, by comparison with the world of the 90s and early 2000s that I grew up in
December 2, 2024 at 8:55 PM
Reposted by Chris Painter
Here's a great starter pack of economists working on AI.

Who else should be on this list?

bsky.app/starter-pack...
December 1, 2024 at 7:30 PM
Reposted by Chris Painter
We've just launched our AI Benchmarking Hub!
This is a new platform for rigorous, independent evaluations of AI model capabilities, featuring interactive visualizations and in-depth analysis. (1/8)

epoch.ai/blog/introdu...
November 27, 2024 at 6:29 PM