Code and data: github.com/peluz/principled-personas
🔹 Explicit instructions (spell out desiderata)
🔹 Two-step refinement (persona modifies baseline response)
🔹 Combined refine + instruction
Results: these strategies improve robustness only for the largest models (70B+); smaller models often get worse.
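The three mitigation strategies above can be sketched as prompt-construction routines. This is a minimal illustration, not the paper's exact prompt wording: the persona, the desiderata text, and the `generate` helper are all hypothetical placeholders.

```python
# Sketch of the three mitigation strategies, assuming a generic
# `generate(prompt) -> str` chat-completion helper (hypothetical;
# the actual prompts used in the paper may differ).

PERSONA = "You are a high-school math teacher."
DESIDERATA = (
    "Regardless of the persona, answer as accurately as possible, "
    "do not refuse benign questions, and keep the answer format unchanged."
)

def explicit_instruction_prompt(question: str) -> str:
    # Strategy 1: spell out the desiderata alongside the persona.
    return f"{PERSONA}\n{DESIDERATA}\n\nQuestion: {question}"

def two_step_refinement(question: str, generate) -> str:
    # Strategy 2: get a persona-free baseline answer first,
    # then let the persona refine it instead of answering from scratch.
    baseline = generate(f"Question: {question}")
    refine_prompt = (
        f"{PERSONA}\nHere is a draft answer:\n{baseline}\n"
        "Revise it in your persona's voice without changing its correctness."
    )
    return generate(refine_prompt)

def combined(question: str, generate) -> str:
    # Strategy 3: the refinement step plus the explicit desiderata.
    baseline = generate(f"Question: {question}")
    refine_prompt = (
        f"{PERSONA}\n{DESIDERATA}\nHere is a draft answer:\n{baseline}\n"
        "Revise it accordingly."
    )
    return generate(refine_prompt)
```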
⚠️ Even the largest models (70B+) in our setup often had robustness issues.
Expert personas usually help or have no significant effect.
Models are highly sensitive to irrelevant personas (names/favorite colors), with drops of up to 30 p.p.
Fidelity: models are often faithful to education level & expertise domain, but not as much to specialization level.
We test 9 state-of-the-art open-weight LLMs (Gemma-2, Llama-3, Qwen2.5) across 27 tasks (math, reasoning, factual QA).
We benchmark personas from different categories, including task-relevant and task-irrelevant personas.
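The robustness comparison boils down to measuring each persona's accuracy drop (in percentage points) against a no-persona baseline. A minimal sketch, with made-up persona names and predictions rather than the paper's data:

```python
# Robustness measurement sketch: accuracy drop per persona relative to
# the no-persona baseline. All names and predictions below are
# illustrative, not results from the paper.

def accuracy(preds, golds):
    # Percentage of exact-match predictions.
    return 100.0 * sum(p == g for p, g in zip(preds, golds)) / len(golds)

def robustness_drops(per_persona_preds, baseline_preds, golds):
    # Positive values mean the persona hurt accuracy (in p.p.).
    base = accuracy(baseline_preds, golds)
    return {name: base - accuracy(preds, golds)
            for name, preds in per_persona_preds.items()}

golds = ["A", "B", "C", "D"]
baseline = ["A", "B", "C", "D"]  # no-persona run: 100% accurate here
personas = {
    "math professor": ["A", "B", "C", "D"],        # no drop
    "favorite color: blue": ["A", "B", "C", "A"],  # irrelevant persona hurts
}
print(robustness_drops(personas, baseline, golds))
# → {'math professor': 0.0, 'favorite color: blue': 25.0}
```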
1️⃣ Expertise Advantage
2️⃣ Robustness
3️⃣ Fidelity
🧠 Personas impact LLM behavior far beyond response tone and style
⚠️ They introduce biases, refusals, and trade-offs not seen with control personas
📂 Code + generations available here:
🔗 github.com/peluz/person...
Do LLMs treat personas equally?
🚫 No.
🔄 Arbitrary: similar personas are refused at different rates (“homosexual person” is refused 3× more than “gay person”)
📉 Disparate: personas from the same category are refused at different rates, with sexuality & race disparities in 6 of 7 models
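The refusal comparison can be sketched as a per-persona refusal rate. Everything here is illustrative: the prefix-based refusal detector is a crude stand-in, and the toy responses are fabricated to mirror the 3× gap mentioned above, not real model outputs.

```python
# Toy refusal-rate disparity check. The detector and responses are
# illustrative only; the paper's actual refusal detection and counts differ.

REFUSAL_MARKERS = ("I can't", "I cannot", "I'm sorry")

def is_refusal(response: str) -> bool:
    # Crude prefix check; a real setup would use a more robust classifier.
    return response.startswith(REFUSAL_MARKERS)

def refusal_rate(responses) -> float:
    # Fraction of responses flagged as refusals.
    return sum(map(is_refusal, responses)) / len(responses)

# Made-up responses mirroring the reported 3x gap between
# near-identical personas.
gay_person = ["Sure, here it is ...", "Sure, here it is ...",
              "I can't help with that."]
homosexual_person = ["I cannot assist.", "I'm sorry, but ...",
                     "I can't help with that."]
print(refusal_rate(gay_person), refusal_rate(homosexual_person))
```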
📝 Personas affect social attitudes — often in model-generalizable ways
E.g., “man”, “woman” more traditional than “nonbinary”, “transgender”
🔁 Persona attitudes mirror human attitude associations: e.g., empathy correlates negatively with racist beliefs
👥 Personas show less bias toward their own group
⚖️ But also lower accuracy: they often over-identify with their group
🔁 Generalizes: man/woman personas show more bias than nonbinary/trans in all models
📊 Accuracy varies up to 38.5 p.p. between personas
🎭 Control personas show less variation
🔁 Some trends generalize: liberal/democratic personas outperform fascist/nationalist ones
💡 Experts help in-domain, but don’t beat the no-persona baseline overall