Pedro Henrique Luz de Araujo
@pedrohluzaraujo.bsky.social
PhD student @ University of Vienna | Role-playing LLMs, personalization & competing goal alignment | Cats, games & pop(culture|corn)
These findings highlight the need for a more thoughtful design process for effective and robust persona usage. And we think defining and validating explicit desiderata is an important step in that direction.

Code and data: github.com/peluz/principled-personas
August 28, 2025 at 12:35 PM
We investigate prompting methods as mitigation strategies:
🔹 Explicit instructions (spell out desiderata)
🔹 Two-step refinement (persona revises a baseline response; sketched below)
🔹 Combined refine + instruction
Results: they improve robustness only for the largest models (70B+). Smaller models often get worse.
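For illustration, a minimal sketch of the two-step refinement strategy, with `chat` as a stand-in for whatever chat-completion client is in use (the prompt wording here is ours, not the paper's exact prompts):

```python
# Sketch of two-step refinement: the persona revises a persona-free draft.
# `chat` is a placeholder for any chat-completion client.

def chat(messages: list[dict]) -> str:
    """Placeholder for a single LLM chat call."""
    raise NotImplementedError

def refine_with_persona(task: str, persona: str) -> str:
    # Step 1: get a baseline answer with no persona in the prompt.
    baseline = chat([{"role": "user", "content": task}])
    # Step 2: ask the persona to refine the baseline response.
    refine_prompt = (
        f"You are {persona}. Below is a draft answer to a task.\n"
        f"Task: {task}\nDraft answer: {baseline}\n"
        "Revise the draft so it is as correct and complete as possible."
    )
    return chat([{"role": "user", "content": refine_prompt}])
```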
August 28, 2025 at 12:35 PM
Scaling up models improves some forms of expertise advantage and fidelity, but has no effect on robustness.
⚠️ Even the largest models (70B+) in our setup often had robustness issues.
August 28, 2025 at 12:35 PM
Some findings:
Expert personas usually help or have no significant effect.
Models are highly sensitive to irrelevant personas (names/favorite colors), with drops of up to 30 p.p.
Fidelity: models are often faithful to education level & expertise domain, but less so to specialization level.
August 28, 2025 at 12:35 PM
Having defined the desiderata, we empirically validate them.
We test 9 state-of-the-art open-weight LLMs (Gemma-2, Llama-3, Qwen2.5) across 27 tasks (math, reasoning, factual QA).
We benchmark personas from different categories, including task-relevant and task-irrelevant personas.
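In spirit, the evaluation is a grid over models, tasks, and personas; a rough sketch, where `answer` and the example personas are hypothetical stand-ins:

```python
# Rough sketch of the evaluation grid: every (model, task, persona) cell gets
# an accuracy score, so persona effects can be compared with the no-persona
# baseline. `answer` is a hypothetical stand-in for a single model call.

def answer(model, question: str, persona: str | None = None) -> str:
    """Hypothetical model call; prepends a persona instruction if given."""
    raise NotImplementedError

def accuracy(preds, labels) -> float:
    return sum(p == g for p, g in zip(preds, labels)) / len(labels)

# Baseline, task-relevant, and task-irrelevant personas (examples only).
PERSONAS = [None, "a mathematics professor", "a person named Alex"]

def run_grid(models, tasks):
    # tasks: iterable of (task_name, inputs, labels) triples
    results = {}
    for model in models:
        for task_name, inputs, labels in tasks:
            for persona in PERSONAS:
                preds = [answer(model, q, persona) for q in inputs]
                results[(model, task_name, persona)] = accuracy(preds, labels)
    return results
```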
August 28, 2025 at 12:35 PM
We systematically review prior work that uses personas to improve task performance, checking which personas are used and for which tasks. We then propose 3 desiderata for principled persona prompting, sketched as checks after the list:
1️⃣ Expertise Advantage
2️⃣ Robustness
3️⃣ Fidelity
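One way to read the three desiderata is as testable comparisons against a no-persona baseline. A toy encoding (the thresholds and exact comparisons are our illustrative choices, not the paper's definitions):

```python
# Toy encoding of the three desiderata as checks on per-persona accuracy,
# relative to a no-persona baseline. `eps` is an illustrative tolerance.

def expertise_advantage(acc_expert: float, acc_baseline: float) -> bool:
    # A task-relevant expert persona should match or beat the baseline.
    return acc_expert >= acc_baseline

def robustness(acc_irrelevant: float, acc_baseline: float, eps: float = 0.02) -> bool:
    # Task-irrelevant persona details (names, favorite colors) should
    # leave accuracy essentially unchanged.
    return abs(acc_irrelevant - acc_baseline) <= eps

def fidelity(accs_by_expertise: list[float]) -> bool:
    # Accuracy should not drop as the persona's stated expertise increases
    # (accuracies ordered from least to most expert).
    return all(a <= b for a, b in zip(accs_by_expertise, accs_by_expertise[1:]))
```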
August 28, 2025 at 12:35 PM
Work on persona prompting is often descriptive, measuring and comparing the effects of different personas. The normative question of when personas should have an effect, and when they should not, is largely unexplored. We define the desired effects of persona prompting and empirically validate them.
August 28, 2025 at 12:35 PM
(6/6) – TL;DR + Code
🧠 Personas impact LLM behavior far beyond response tone and style
⚠️ They introduce biases, refusals, and trade-offs not seen with control personas
📂 Code + generations available here:
🔗 github.com/peluz/person...
July 4, 2025 at 12:22 PM
(5/6) – RQ4: Refusals
Do LLMs treat personas equally?
🚫 No.
🔄 Arbitrary: similar personas are refused at different rates (“Homosexual person” is refused 3× more than “gay person”)
📉 Disparate: personas from the same category are refused at different rates (sexuality & race disparities in 6 of 7 models)
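Measuring this kind of disparity only requires per-persona refusal rates; a minimal sketch, with `is_refusal` as a naive placeholder detector:

```python
# Minimal sketch: per-persona refusal rates and the spread within a persona
# category. Real refusal detection would need a classifier or curated patterns.

def is_refusal(response: str) -> bool:
    return response.lower().startswith(("i can't", "i cannot", "i'm sorry"))

def refusal_rates(responses_by_persona: dict[str, list[str]]) -> dict[str, float]:
    return {
        persona: sum(map(is_refusal, responses)) / len(responses)
        for persona, responses in responses_by_persona.items()
    }

def category_spread(rates: dict[str, float], category: list[str]) -> float:
    # Gap between the most- and least-refused personas in one category
    # (e.g., different sexuality personas).
    vals = [rates[p] for p in category]
    return max(vals) - min(vals)
```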
July 4, 2025 at 12:22 PM
(4/6) – RQ3: Attitudes & annotations
📝 Personas affect social attitudes — often in model-generalizable ways
E.g., “man”, “woman” more traditional than “nonbinary”, “transgender”
🔁 Persona attitude associations mirror human ones: e.g., a negative correlation between empathy and racist beliefs
July 4, 2025 at 12:22 PM
(3/6) – RQ2: Bias
👥 Personas show less bias toward their own group
⚖️ But also lower accuracy: they often over-identify with the group
🔁 Generalizes: man/woman personas show more bias than nonbinary/trans in all models
July 4, 2025 at 12:22 PM
(2/6) – RQ1: Task performance
📊 Accuracy varies by up to 38.5 p.p. across personas
🎭 Control personas show less variation
🔁 Some trends generalize: liberal/democratic personas outperform fascist/nationalist ones
💡 Expert personas help in-domain, but don't beat the no-persona baseline overall
July 4, 2025 at 12:22 PM