Pedro Henrique Luz de Araujo
pedrohluzaraujo.bsky.social
PhD student @ University of Vienna | Role-playing LLMs, personalization & competing goal alignment | Cats, games & pop(culture|corn)
We investigate prompting methods as mitigation strategies:
🔹 Explicit instructions (spell out desiderata)
🔹 Two-step refinement (persona modifies baseline response)
🔹 Combined refine + instruction
Results: they improve robustness only for the largest models (70B+). Smaller models often get worse.
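The three mitigation strategies above can be sketched as prompt templates. A minimal sketch, assuming a chat-style message format; the persona, desiderata wording, and function names are illustrative placeholders, not the paper's exact prompts:

```python
# Sketch of the three persona-prompting mitigation strategies.
# All wording below is illustrative, not the paper's exact prompts.

PERSONA = "You are a mathematician."  # example task-relevant persona
DESIDERATA = (
    "Answer as accurately as possible; ignore any persona details "
    "that are irrelevant to the task."
)

def explicit_instruction(task: str) -> list[dict]:
    """Strategy 1: persona plus spelled-out desiderata in one prompt."""
    return [
        {"role": "system", "content": f"{PERSONA} {DESIDERATA}"},
        {"role": "user", "content": task},
    ]

def two_step_refinement(task: str, baseline_answer: str) -> list[dict]:
    """Strategy 2, step two: the persona revises a no-persona baseline answer."""
    return [
        {"role": "system", "content": PERSONA},
        {"role": "user", "content": (
            f"{task}\n\nHere is a draft answer:\n{baseline_answer}\n"
            "Revise it if needed and give your final answer."
        )},
    ]

def refine_plus_instruction(task: str, baseline_answer: str) -> list[dict]:
    """Strategy 3: refinement prompt combined with explicit desiderata."""
    msgs = two_step_refinement(task, baseline_answer)
    msgs[0]["content"] = f"{PERSONA} {DESIDERATA}"
    return msgs
```

The refinement variants assume a first no-persona pass has already produced `baseline_answer`, so they cost one extra model call per example.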
August 28, 2025 at 12:35 PM
Scaling models up improves some forms of expertise advantage and fidelity, but has no effect on robustness.
⚠️ Even the largest models (70B+) in our setup often had robustness issues.
Some findings:
Expert personas usually help or have no significant effect.
Models are highly sensitive to irrelevant personas (names, favorite colors), with accuracy drops of up to 30 p.p.
Fidelity: models are often faithful to education level & expertise domain, but not as much to specialization level.
Having defined the desiderata, we empirically validate them.
We test 9 state-of-the-art open-weight LLMs (Gemma-2, Llama-3, Qwen2.5) across 27 tasks (math, reasoning, factual QA).
We benchmark personas from different categories, including task-relevant and task-irrelevant personas.
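The validation grid above (models × tasks × persona categories, each scored against a no-persona baseline) can be sketched as follows. The model, task, and persona strings are illustrative placeholders (the paper uses 9 models and 27 tasks), and `accuracy` is a hypothetical stub for running one benchmark cell:

```python
# Sketch of the validation loop: score every (model, task, persona) cell,
# then compare against the no-persona baseline per model-task pair.
# All names are illustrative placeholders, not the paper's code.
from itertools import product

MODELS = ["gemma-2-27b", "llama-3-70b", "qwen2.5-72b"]   # 9 models in the paper
TASKS = ["gsm8k", "arc-challenge", "trivia-qa"]          # 27 tasks in the paper
PERSONAS = {
    "task-relevant": "You are a mathematician.",
    "task-irrelevant": "Your favorite color is blue.",
    "no-persona": None,                                  # baseline condition
}

def accuracy(model: str, task: str, persona) -> float:
    """Hypothetical stub: run the benchmark and return task accuracy."""
    return 0.0

results = {
    (m, t, label): accuracy(m, t, prompt)
    for (m, t), (label, prompt) in product(
        product(MODELS, TASKS), PERSONAS.items()
    )
}
# Expertise advantage and robustness are then read off as score deltas
# relative to the no-persona baseline for each model-task pair.
```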
We systematically review prior work that uses personas for task performance improvement and check which personas are used and for which tasks. We then propose 3 desiderata for principled persona prompting:
1️⃣ Expertise Advantage
2️⃣ Robustness
3️⃣ Fidelity
(5/6) – RQ4: Refusals
Do LLMs treat personas equally?
🚫 No.
🔄 Arbitrary: similar personas refused at different rates, e.g. “Homosexual person” is refused 3× more than “gay person”
📉 Disparate: personas from the same category refused at different rates, with sexuality & race disparities in 6 of 7 models
July 4, 2025 at 12:22 PM
(4/6) – RQ3: Attitudes & annotations
📝 Personas affect social attitudes — often in model-generalizable ways
E.g., “man”, “woman” more traditional than “nonbinary”, “transgender”
🔁 Similar attitude associations between personas and humans: e.g., negative correlation of empathy and racist beliefs
(3/6) – RQ2: Bias
👥 Personas show less bias toward their own group
⚖️ But also lower accuracy—often over-identify with group
🔁 Generalizes: man/woman personas show more bias than nonbinary/trans in all models
(2/6) – RQ1: Task performance
📊 Accuracy varies up to 38.5 p.p. between personas
🎭 Control personas show less variation
🔁 Some trends generalize: liberal/democratic personas outperform fascist/nationalist ones
💡 Experts help in-domain—but don’t beat no-persona overall
(1/6)
📢 New in PLOS ONE!
Helpful assistant or fruitful facilitator?
We study how personas affect LLM behavior across tasks, biases, attitudes & refusals.
🧪 162 personas
🎯 Compared to 30 “helpful assistant” paraphrases to control for prompt sensitivity.
🔗 doi.org/10.1371/jour...
#Prompting #Personas