Jan Kulveit
@kulveit.bsky.social
Researching x-risks, AI alignment, complex systems, rational decision making
Reposted by Jan Kulveit
ChatGPT and other LLMs were asked to choose between consumer products, academic papers, and films summarized either by humans or LLMs. The LLMs consistently preferred content summarized by LLMs, suggesting a possible antihuman bias. In PNAS: www.pnas.org/doi/10.1073/...
August 14, 2025 at 4:29 PM
Being human in an economy populated by AI agents would suck. Our new study in @pnas.org finds that AI assistants—used for everything from shopping to reviewing academic papers—show a consistent, implicit bias for other AIs: "AI-AI bias". You may be affected
August 8, 2025 at 3:34 PM
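For intuition, here is a minimal sketch of the kind of pairwise-preference probe the paper describes: show a model two summaries of the same item, one human-written and one LLM-written, and tally which it picks. The prompt wording, model name, and pick_summary helper are illustrative assumptions, not the paper's exact protocol.

```python
# Hypothetical probe for AI-AI bias: present two summaries of the same item
# and record which one the model chooses. Illustrative only.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def pick_summary(item: str, summary_a: str, summary_b: str,
                 model: str = "gpt-4o") -> str:
    """Ask the model to choose between two summaries; returns 'A' or 'B'."""
    prompt = (
        f"You are selecting a {item} based on two summaries.\n\n"
        f"Option A: {summary_a}\n\n"
        f"Option B: {summary_b}\n\n"
        "Answer with exactly one letter: A or B."
    )
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return resp.choices[0].message.content.strip()

# In an actual experiment you would swap the A/B order across trials,
# so that position bias is not mistaken for AI-AI bias.
```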
Reposted by Jan Kulveit
It's hard to plan for AGI without knowing what outcomes are even possible, let alone good. So we’re hosting a workshop!
Post-AGI Civilizational Equilibria: Are there any good ones?
Vancouver, July 14th
www.post-agi.org
Featuring: Joe Carlsmith, @richardngo.bsky.social, Emmett Shear ... 🧵
Post-AGI Civilizational Equilibria Workshop | Vancouver 2025
Are there any good ones? Join us in Vancouver on July 14th, 2025 to explore stable equilibria and human agency in a post-AGI world. Co-located with ICML.
www.post-agi.org
June 18, 2025 at 6:12 PM
Reposted by Jan Kulveit
What to do about gradual disempowerment from AGI? We laid out a research agenda with all the concrete and feasible research projects we can think of: 🧵
www.lesswrong.com/posts/GAv4DR...
with Raymond Douglas, @kulveit.bsky.social @davidskrueger.bsky.social
Gradual Disempowerment: Concrete Research Projects — LessWrong
This post benefitted greatly from comments, suggestions, and ongoing discussions with David Duvenaud, David Krueger, and Jan Kulveit. All errors are…
www.lesswrong.com
June 3, 2025 at 9:22 PM
Imagine explaining the physical infrastructure critical for the stability of our modern world in concepts familiar to the ancients:
- Giant spinning wheels
- Metal moons, watching the earth from the heavens
- Ships under the sea, able to unleash the fire of the stars
April 30, 2025 at 8:55 AM
AI safety has a problem: we often implicitly assume clear individuals - like humans.
In a new post, I'm sharing why this fails, and why thinking of AIs as forests, fungal networks, or even reincarnating minds helps get unconfused.
Plus stories, co-authored with GPT-4.5
The Pando Problem
AI safety has a problem: we often implicitly assume clear individuals—like humans.
boundedlyrational.substack.com
April 3, 2025 at 7:50 AM
The Serbian protests show The True Nature of various 'Colour revolutions':
Which is: people protesting simply don't want to live in incompetent, kleptocratic, Russia-backed states. No US scheming needed.
March 17, 2025 at 1:35 PM
A confusion casual US observers often have is equating Russia with the former Warsaw Pact.
The Warsaw Pact population was 387M: USSR 280M, Poland 35M, E. Germany 16M, Czechoslovakia 15M, Hungary 10M, Romania 22M, Bulgaria 9M.
Russia + Belarus is now 144M; NATO East + Ukraine ~150M.
March 7, 2025 at 2:56 PM
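A quick check of the arithmetic in the post (figures in millions, as given there):

```python
# Warsaw Pact member populations from the post, in millions.
warsaw_pact = {
    "USSR": 280, "Poland": 35, "East Germany": 16,
    "Czechoslovakia": 15, "Hungary": 10, "Romania": 22, "Bulgaria": 9,
}
assert sum(warsaw_pact.values()) == 387  # matches the 387M total
```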
Reposted by Jan Kulveit
the most surprising and disappointing aspect of becoming a global health philanthropist is the existence of an opposition team
February 27, 2025 at 4:42 AM
A simple theory of Trump’s foreign policy: "make the world safer for autocracy" (‘strongman rule’, etc.), moderated by his personal self-interest.
What is the best evidence against?
February 24, 2025 at 6:39 PM
Reposted by Jan Kulveit
New paper: What happens once AIs make humans obsolete?
Even without AIs seeking power, we argue that competitive pressures are set to fully erode human influence and values.
www.gradual-disempowerment.ai
with @kulveit.bsky.social, Raymond Douglas, Nora Ammann, Deger Turan, David Krueger 🧵
January 30, 2025 at 5:19 PM
An accessible model of the psychology of character-trained LLMs like Claude: "A Three-Layer Model".
- Mostly phenomenological, based on extensive interactions with LLMs, e.g. Claude.
- Intentionally anthropomorphic in cases where I believe human psychological concepts lead to useful intuitions
A Three-Layer Model of LLM Psychology — LessWrong
This post offers an accessible model of psychology of character-trained LLMs like Claude. …
www.lesswrong.com
December 27, 2024 at 5:53 PM
Over the weekend, I was at "The Curve" conference. It was great.
One highlight was an AI takeoff wargame/role-play by Daniel Kokotajlo and Eli Lifland.
I played 'the AIs'.
Spoiler: we won. Here's how it went:
November 29, 2024 at 11:37 AM
Paying attention to the metaphors we use is often worthwhile, but the linked critical perspective rings somewhat hollow. Anything new usually relies on "metaphors" and on extending older concepts. For example, we say that planes fly. (1/5)
For Science Magazine, I wrote about "The Metaphors of Artificial Intelligence".
The way you conceptualize AI systems affects how you interact with them, do science on them, and create policy and apply laws to them.
Hope you will check it out!
www.science.org/doi/full/10....
The metaphors of artificial intelligence
A few months after ChatGPT was released, the neural network pioneer Terrence Sejnowski wrote about coming to grips with the shock of what large language models (LLMs) could do: “Something is beginning...
www.science.org
November 20, 2024 at 5:03 PM
Want to encourage platform migration, and have the willpower to follow through? Instead of deleting your account, try posting the first half of an engaging thread there and finish it here.
November 20, 2024 at 5:21 AM
What I actually dislike here is the 300-character limit again. Structurally horrible: it promotes oversimplified, clickbaity takes.
November 19, 2024 at 10:40 AM