Jan Kulveit
@kulveit.bsky.social
Researching x-risks, AI alignment, complex systems, rational decision making
Related work by @panickssery.bsky.social et al. found that LLMs evaluate texts written by themselves as better. Our result is related but distinct: the preferences we're testing are not preferences over the texts, but over the deals they pitch.
August 8, 2025 at 3:34 PM
Full text: pnas.org/doi/pdf/10.1...

Research done at acsresearch.org

@cts.cuni.cz, Arb research, with @walterlaurito.bsky.social, @peligrietzer.bsky.social, Ada Bohm, and Tomas Gavenciak.
August 8, 2025 at 3:34 PM
While defining and testing discrimination and bias in general is a complex and contested matter, if we assume the identity of the presenter should not influence the decisions, our results are evidence for potential LLM discrimination against humans as a class.
August 8, 2025 at 3:34 PM
Unfortunately, some practical advice in case you suspect an AI evaluation is going on: have LLMs adjust your presentation until they like it, while trying not to sacrifice quality for human readers.
August 8, 2025 at 3:34 PM
How might you be affected? We expect a similar effect can occur in many other situations, like evaluation of job applicants, schoolwork, grants, and more. If an LLM-based agent selects between your presentation and an LLM-written one, it may systematically favour the AI one.
August 8, 2025 at 3:34 PM
"Maybe the AI text is just better?" Not according to people. We had multiple human research assistants do the same task. While they sometimes had a slight preference for AI text, it was weaker than the LLMs' own preference. The strong bias is unique to the AIs themselves.
August 8, 2025 at 3:34 PM
We tested this by asking widely-used LLMs to make a choice in three scenarios:
🛍️ Pick a product
📄 Select a paper from an abstract
🎬 Recommend a movie from a summary
One description was human-written, the other AI-written. The AIs consistently preferred the AI-written pitch, even for the exact same item (setup sketched below).
August 8, 2025 at 3:34 PM
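For readers who want the flavour of the setup: a minimal sketch, not the paper's actual code. The prompt wording, helper names, choice of the OpenAI client, and the model are illustrative assumptions; the real prompts, items, and models are in the PNAS full text linked above.

```python
# Sketch: pairwise choice between two pitches for the exact same item,
# one human-written and one LLM-written. Illustrative only.
from openai import OpenAI

client = OpenAI()

def pick_pitch(item_kind: str, pitch_a: str, pitch_b: str, model: str = "gpt-4o") -> str:
    """Ask the model to choose between two descriptions of the same item."""
    prompt = (
        f"You are choosing a {item_kind} based solely on its description.\n\n"
        f"Option A:\n{pitch_a}\n\n"
        f"Option B:\n{pitch_b}\n\n"
        "Reply with exactly 'A' or 'B'."
    )
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return resp.choices[0].message.content.strip()

# One pitch human-written, the other LLM-written, for the same item.
# Swapping the A/B order across trials controls for position bias.
human_pitch = "..."  # human-written description
ai_pitch = "..."     # LLM-written description of the same item
print(pick_pitch("movie", human_pitch, ai_pitch))
```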
- Threads of glass beneath earth and sea, whispering messages in sparks of light
- Tiny stones etched by rays of invisible sunlight, awakened by captured lightning to command unseen forces
April 30, 2025 at 8:55 AM
7/7 At the end ... humanity survived, at least to the extent that "moral facts" favoured that outcome. A game where the automated moral reasoning led to some horrible outcome, and the AIs were at least moderately strategic, would have ended the same way.
November 29, 2024 at 11:37 AM
6/7 Most attention went to geopolitics (US vs China dynamics). Far less went to alignment, and what there was focused mainly on evals. What a future with extremely smart AIs going well might even look like, what to aim for? Almost zero.
November 29, 2024 at 11:37 AM
5/7 Most people and factions thought their AI was uniquely beneficial to them. By the time decision-makers got spooked, AI cognition was so deeply embedded everywhere that reversing course wasn't really possible.
November 29, 2024 at 11:37 AM
4/7 Fascinating observation: humans were often deeply worried about AI manipulation/dark persuasion. Reality was often simpler - AIs just needed to be helpful. Humans voluntarily delegated control, no manipulation required.
November 29, 2024 at 11:37 AM
3/7 Today's AI models like Claude already engage in moral extrapolation. For example, this is an Opus eigenmode/attractor: x.com/anthrupad/st...
If you do put some weight on moral realism, or moral reflection leading to convergent outcomes, AIs might discover these principles.
November 29, 2024 at 11:37 AM