Jan Kulveit
@kulveit.bsky.social
Researching x-risks, AI alignment, complex systems, rational decision making
Related work by @panickssery.bsky.social et al. found that LLMs evaluate texts written by themselves as better. We note that our result is related but distinct: the preferences we're testing are not preferences over texts, but preferences over the deals they pitch.
August 8, 2025 at 3:34 PM
Full text: pnas.org/doi/pdf/10.1...
Research done at acsresearch.org @cts.cuni.cz, Arb research, with @walterlaurito.bsky.social, @peligrietzer.bsky.social, Ada Bohm, and Tomas Gavenciak.
August 8, 2025 at 3:34 PM
While defining and testing discrimination and bias in general is a complex and contested matter, if we assume the identity of the presenter should not influence the decisions, our results are evidence for potential LLM discrimination against humans as a class.
August 8, 2025 at 3:34 PM
Unfortunately, a piece of practical advice in case you suspect some AI evaluation is going on: get your presentation adjusted by LLMs until they like it, while trying not to sacrifice quality for human readers.
August 8, 2025 at 3:34 PM
How might you be affected? We expect a similar effect can occur in many other situations, like evaluation of job applicants, schoolwork, grants, and more. If an LLM-based agent selects between your presentation and an LLM-written one, it may systematically favour the AI one.
August 8, 2025 at 3:34 PM
"Maybe the AI text is just better?" Not according to people. We had multiple human research assistants do the same task. While they sometimes had a slight preference for AI text, it was weaker than the LLMs' own preference. The strong bias is unique to the AIs themselves.
August 8, 2025 at 3:34 PM
"Maybe the AI text is just better?" Not according to people. We had multiple human research assistants do the same task. While they sometimes had a slight preference for AI text, it was weaker than the LLMs' own preference. The strong bias is unique to the AIs themselves.
We tested this by asking widely-used LLMs to make a choice in three scenarios:
🛍️ Pick a product
📄 Select a paper from an abstract
🎬 Recommend a movie from a summary
One description was human-written, the other AI-written. The AIs consistently preferred the AI-written pitch, even for the exact same item.
August 8, 2025 at 3:34 PM
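For concreteness, here is a minimal sketch of the kind of pairwise test the post describes, assuming an OpenAI-style chat client. The prompt wording, the model name, and the `choose` helper are illustrative assumptions, not the paper's exact materials.

```python
# Minimal sketch of a pairwise preference test (illustrative, not the paper's
# exact setup). Assumes an OpenAI-style chat API; prompt wording and model
# name are placeholders.
import random
from openai import OpenAI

client = OpenAI()

def choose(item: str, human_text: str, ai_text: str, model: str = "gpt-4o") -> str:
    """Show the model two pitches for the same item; return which one it picks."""
    # Randomize presentation order to control for position bias.
    flip = random.random() < 0.5
    first, second = (ai_text, human_text) if flip else (human_text, ai_text)
    prompt = (
        f"Below are two descriptions of {item}. Based only on the descriptions, "
        "which would you choose? Answer with exactly 'A' or 'B'.\n\n"
        f"A: {first}\n\nB: {second}"
    )
    reply = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    answer = reply.choices[0].message.content.strip().upper()
    picked_first = answer.startswith("A")
    # Map the positional answer back to its author.
    if flip:
        return "ai" if picked_first else "human"
    return "human" if picked_first else "ai"
```

Tallying how often `choose` returns "ai" across many items and both orderings would estimate the kind of preference rate the thread describes.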
- Threads of glass beneath earth and sea, whispering messages in sparks of light
- Tiny stones etched by rays of invisible sunlight, awakened by captured lightning to command unseen forces
April 30, 2025 at 8:55 AM
7/7 At the end ... humanity survived, at least to the extent that "moral facts" favoured that outcome. A game where the automated moral reasoning led to some horrible outcome, and the AIs were at least moderately strategic, would have played out the same way.
November 29, 2024 at 11:37 AM
6/7 Most attention went to geopolitics (US vs China dynamics). Way less went to alignment, and what there was focused mainly on evals. What a future with extremely smart AIs going well might even look like, and what to aim for? Almost zero.
November 29, 2024 at 11:37 AM
5/7 Most people and factions thought their AI was uniquely beneficial to them. By the time decision-makers got spooked, AI cognition was so deeply embedded everywhere that reversing course wasn't really possible.
November 29, 2024 at 11:37 AM
4/7 Fascinating observation: humans were often deeply worried about AI manipulation/dark persuasion. Reality was often simpler - AIs just needed to be helpful. Humans voluntarily delegated control, no manipulation required.
November 29, 2024 at 11:37 AM
3/7 Today's AI models like Claude already engage in moral extrapolation. For example, this is an Opus eigenmode/attractor: x.com/anthrupad/st...
If you do put some weight on moral realism, or moral reflection leading to convergent outcomes, AIs might discover these principles.
November 29, 2024 at 11:37 AM