Steve Byrnes
@stevebyrnes.bsky.social
Researching Artificial General Intelligence Safety, via thinking about neuroscience and algorithms, at Astera Institute. https://sjbyrnes.com/agi.html
This post then dives into 1 of those 4: “Sympathy Reward”. If someone (especially a friend or idol) feels good/bad, that makes me feel good/bad. I discuss its obvious prosocial effects, along with its less-obvious (not all nice!) effects. Link again: www.lesswrong.com/posts/KuBiv9... (5/5)
November 10, 2025 at 3:27 PM
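A toy sketch to make that hedonic logic concrete (the relationship categories, weights, and numbers below are invented purely for illustration, not part of the actual hypothesis):

```python
# Toy sketch of a "Sympathy Reward" signal: my reward tracks how someone
# else seems to be feeling, scaled by how I relate to them. The categories
# and weights are illustrative only, not the hypothesized circuit itself.

RELATIONSHIP_WEIGHT = {
    "close friend": 1.0,   # their joy/pain strongly moves me
    "admired idol":  0.8,
    "stranger":      0.2,
    "rival":        -0.5,  # schadenfreude: their pain can feel good (not all nice!)
}

def sympathy_reward(other_valence: float, relationship: str) -> float:
    """Reward I get when observing someone else's apparent feeling.

    other_valence: how good/bad they seem to feel, in [-1, +1].
    relationship:  how I categorize them (looked up in RELATIONSHIP_WEIGHT).
    """
    return RELATIONSHIP_WEIGHT.get(relationship, 0.0) * other_valence

# A friend in distress makes me feel bad; a rival's distress... less so.
print(sympathy_reward(-0.9, "close friend"))  # -0.9  -> compassion
print(sympathy_reward(-0.9, "rival"))         # +0.45 -> schadenfreude
```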
In this post, I flesh out the effects of that (hypothesized) circuit, first by splitting its output stream of reward signals into four subcomponents, depending on the circumstances in which it triggers. (4/5)
November 10, 2025 at 3:27 PM
New blog post! “Social drives 1: ‘Sympathy Reward’, from compassion to dehumanization”. This is the 1st of 2 posts building an ever-better bridge that connects from neuroscience & algorithms on one shore, to everyday human experience on the other… www.lesswrong.com/posts/KuBiv9... (1/5)
November 10, 2025 at 3:27 PM
my take is §2.1 here (1st screenshot) www.lesswrong.com/posts/hsf7tQ... (+ 2nd screenshot is about how different motives can lead to different knowledge [albeit overlapping])
October 29, 2025 at 10:51 AM
I agree that this is the status quo, and I think it’s bad and am doing what I can to change that.

I have something in mind where a good understanding of the SC + a certain tracer study (or access to a connectomic dataset) lets us find a key hypothalamus cell group for compassion & norm-following.
October 7, 2025 at 1:13 AM
I happen to like the term “AGI” for that, as long as it's understood that the G is “general as in not specific” (“in general, Boston has nice weather”), not “general as in universal” (“I have a general proof of the math theorem”). (2/6)
September 28, 2025 at 11:33 AM
Humans can learn to drive a car via 30 hours’ barely-supervised practice, using the same kind of brain that evolved long before cars existed. Meanwhile, AIs can learn to drive a car via a team of experts spending $5B and 10 years on R&D.

Humans can autonomously create and run companies; LLMs can’t.
September 19, 2025 at 7:07 PM
The authors propose to get an international treaty to pause progress towards superintelligence, including both scaling & R&D. I’m for it, although I don’t hold out much hope that such efforts will have more than a marginal impact. I expect that AI capabilities research would rebrand itself as AI safety, and plow ahead:
September 18, 2025 at 7:40 PM
The authors argue that people are trying to build ASI (superintelligent AI), and we should expect them to succeed sooner or later, even if they obviously haven’t succeeded YET. I agree. (I lean “later” more than the authors, but that’s a minor disagreement.)
September 18, 2025 at 7:40 PM
I feel like it’s important for at least some readers, right? E.g. ↓
September 17, 2025 at 5:02 PM
Clarification: When I shared this meme 2 years ago, I was referring specifically to traditional task-based fMRI studies.

“Functional Connectomics” fMRI studies, by contrast, would be flying overhead in a helicopter, strafing the water with a machine gun.
August 31, 2025 at 8:46 PM
Unfortunately, this sculpting process tends to systematically lead to an AGI whose motivations fit the reward function TOO well, such that it exploits errors and edge-cases in the reward function. This alignment failure mode is called “specification gaming” or “reward hacking”. (3/5)
August 5, 2025 at 6:28 PM
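A toy illustration of specification gaming (the “clean the room” scenario and reward function are invented for illustration, not from the post): the policy that scores highest under the literal reward function is not the behavior the designer wanted.

```python
# Toy illustration of specification gaming / reward hacking: the reward
# function is a buggy proxy for the designer's intent, and optimizing the
# proxy rewards exploiting its edge cases. Scenario and numbers are made up.

def proxy_reward(room_looks_clean: bool, mess_hidden_under_rug: bool) -> float:
    """Designer intent: reward actually cleaning the room.
    Actual (buggy) spec: reward the room *looking* clean on camera."""
    return 1.0 if room_looks_clean else 0.0

policies = {
    "clean the room properly":  dict(room_looks_clean=True,  mess_hidden_under_rug=False),
    "shove mess under the rug": dict(room_looks_clean=True,  mess_hidden_under_rug=True),
    "do nothing":               dict(room_looks_clean=False, mess_hidden_under_rug=False),
}

# An optimizer over the proxy is indifferent between the first two policies,
# and a cheaper-to-execute hack will tend to win outright.
for name, outcome in policies.items():
    print(f"{name:26s} -> proxy reward {proxy_reward(**outcome)}")
```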
New blog post: “The perils of under- vs over-sculpting AGI desires”. (1/5) www.alignmentforum.org/posts/grgb2i...
August 5, 2025 at 6:28 PM
Like, if all humans on Earth could agree to never help any AI accomplish its goals? Sure, that would help prevent AI takeover & human extinction. But if that were possible, why can’t we get all humans on Earth to agree to never help drug cartels? ↓ Or did you mean something else?
June 28, 2025 at 4:53 PM
(If the “future scary paradigm” is so bad, should we push on LLMs instead? I offer some concerns there too—e.g., the reasons to feel good about alignment of today’s LLMs seem to be getting less and less applicable over time!) (9/10)
June 23, 2025 at 6:46 PM
And that’s just one of three reasons that I expect technical alignment to be far harder in the next paradigm than for today’s LLMs. The other two are: specification gaming (a.k.a. the “literal genie” thing), and continuous learning. (8/10)
June 23, 2025 at 6:46 PM
Most people are skeptical of this kind of foom, because it’s obviously not how LLM development has been playing out. I discuss at length what I see as the disanalogies between LLMs and my (claimed) future scary AI paradigm, and more generally where I’m coming from. (4/10)
June 23, 2025 at 6:46 PM
Post 1 (foom) argues that, after a future AI paradigm shift, AI will blast through the gap between “unimpressive” and “strong superintelligence” in very little time (maybe months), with very little compute and R&D effort. (3/10)
June 23, 2025 at 6:46 PM
New 2-post series on “foom & doom” scenarios, where radical superintelligence arises seemingly out of nowhere and wipes out humanity. These were often discussed a decade ago, but are now widely dismissed due to LLMs. …Well, call me old-fashioned, but I’m still expecting foom & doom 🧵 (1/10)
June 23, 2025 at 6:46 PM
I choose to take the results at face value. Good day sir.
May 19, 2025 at 8:06 PM
I think people have a suite of innate unconscious snap reactions to social situations (e.g. learning that someone is angry at them). If Alice & Bob have similar snap reactions, they’ll more easily model each other, recognize where the other is coming from, and be more forgiving of each other’s bad reactions.
May 8, 2025 at 10:46 AM
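One toy way to picture that (the situations and reaction labels are invented for illustration): if Alice predicts Bob by simulating her own snap reactions, her accuracy is just the overlap between their reaction profiles.

```python
# Toy sketch: if I predict your snap reaction by simulating my own, then
# my prediction accuracy is the overlap between our reaction profiles.
# Situations and reaction labels here are invented for illustration.

SITUATIONS = ["someone is angry at me", "a stranger ignores me",
              "a friend teases me", "I'm publicly corrected"]

alice = {"someone is angry at me": "appease", "a stranger ignores me": "shrug",
         "a friend teases me": "laugh",        "I'm publicly corrected": "defensive"}
bob_similar    = {**alice, "I'm publicly corrected": "laugh"}  # differs in one situation
bob_dissimilar = {"someone is angry at me": "anger back", "a stranger ignores me": "hurt",
                  "a friend teases me": "withdraw",        "I'm publicly corrected": "laugh"}

def modeling_accuracy(me: dict, other: dict) -> float:
    """How often 'simulate myself' correctly predicts the other's reaction."""
    hits = sum(me[s] == other[s] for s in SITUATIONS)
    return hits / len(SITUATIONS)

print(modeling_accuracy(alice, bob_similar))     # 0.75 -> mostly easy mutual modeling
print(modeling_accuracy(alice, bob_dissimilar))  # 0.0  -> frequent misreadings
```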
Most of the post is spent explaining their proposal and why it won’t work. But in an epilogue, I also argue that this is not just a minor technical oversight, but a deeply troubling sign about the authors, the field, and the future. We need to do better. (4/4)
April 24, 2025 at 2:12 PM
Making a powerful real-world RL agent that definitely won’t try to murder its programmers and users is an unsolved technical problem. This paper barely addresses, much less solves, blockers that AI alignment researchers have been yelling about since at least 2011. (3/4)
April 24, 2025 at 2:12 PM
See also this handy table (from www.alignmentforum.org/posts/wBHSYw... ) (5/4)
April 17, 2025 at 12:37 PM