Code and data: github.com/peluz/principled-personas
🔹 Explicit instructions (spell out desiderata)
🔹 Two-step refinement (persona modifies baseline response)
🔹 Combined refine + instruction
Results: these strategies improve robustness only for the largest models (70B+); smaller models often get worse.
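The three mitigation strategies above can be sketched as prompt-construction routines. This is a minimal illustration, not the paper's exact prompt wording: the persona, the desiderata text, and the `generate` helper are all hypothetical placeholders.

```python
# Sketch of the three mitigation strategies, assuming a generic
# `generate(prompt) -> str` chat-completion helper (hypothetical;
# the actual prompts used in the paper may differ).

PERSONA = "You are a high-school math teacher."
DESIDERATA = (
    "Regardless of the persona, answer as accurately as possible, "
    "do not refuse benign questions, and keep the answer format unchanged."
)

def explicit_instruction_prompt(question: str) -> str:
    # Strategy 1: spell out the desiderata alongside the persona.
    return f"{PERSONA}\n{DESIDERATA}\n\nQuestion: {question}"

def two_step_refinement(question: str, generate) -> str:
    # Strategy 2: get a persona-free baseline answer first,
    # then let the persona refine it instead of answering from scratch.
    baseline = generate(f"Question: {question}")
    refine_prompt = (
        f"{PERSONA}\nHere is a draft answer:\n{baseline}\n"
        "Revise it in your persona's voice without changing its correctness."
    )
    return generate(refine_prompt)

def combined(question: str, generate) -> str:
    # Strategy 3: the refinement step plus the explicit desiderata.
    baseline = generate(f"Question: {question}")
    refine_prompt = (
        f"{PERSONA}\n{DESIDERATA}\nHere is a draft answer:\n{baseline}\n"
        "Revise it accordingly."
    )
    return generate(refine_prompt)
```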
⚠️ Even the largest models (70B+) in our setup often had robustness issues.
Expert personas usually help or have no significant effect.
Models are highly sensitive to irrelevant personas (names/favorite colors), with drops of up to 30 p.p.
Fidelity: models are often faithful to education level & expertise domain, but not as much to specialization level.
We test 9 state-of-the-art open-weight LLMs (Gemma-2, Llama-3, Qwen2.5) across 27 tasks (math, reasoning, factual QA).
We benchmark personas from different categories, including task-relevant and task-irrelevant personas.
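The robustness comparison boils down to measuring each persona's accuracy drop (in percentage points) against a no-persona baseline. A minimal sketch, with made-up persona names and predictions rather than the paper's data:

```python
# Robustness measurement sketch: accuracy drop per persona relative to
# the no-persona baseline. All names and predictions below are
# illustrative, not results from the paper.

def accuracy(preds, golds):
    # Percentage of exact-match predictions.
    return 100.0 * sum(p == g for p, g in zip(preds, golds)) / len(golds)

def robustness_drops(per_persona_preds, baseline_preds, golds):
    # Positive values mean the persona hurt accuracy (in p.p.).
    base = accuracy(baseline_preds, golds)
    return {name: base - accuracy(preds, golds)
            for name, preds in per_persona_preds.items()}

golds = ["A", "B", "C", "D"]
baseline = ["A", "B", "C", "D"]  # no-persona run: 100% accurate here
personas = {
    "math professor": ["A", "B", "C", "D"],        # no drop
    "favorite color: blue": ["A", "B", "C", "A"],  # irrelevant persona hurts
}
print(robustness_drops(personas, baseline, golds))
# → {'math professor': 0.0, 'favorite color: blue': 25.0}
```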
1️⃣ Expertise Advantage
2️⃣ Robustness
3️⃣ Fidelity
🧠 Personas impact LLM behavior far beyond response tone and style
⚠️ They introduce biases, refusals, and trade-offs not seen with control personas
📂 Code + generations available here:
🔗 github.com/peluz/person...
Do LLMs treat personas equally?
🚫 No.
🔄 Arbitrary: similar personas are refused at different rates (“homosexual person” is refused 3× more than “gay person”)
📉 Disparate: personas from the same category are refused at different rates, with sexuality & race disparities in 6 of 7 models
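The refusal comparison can be sketched as a per-persona refusal rate. Everything here is illustrative: the prefix-based refusal detector is a crude stand-in, and the toy responses are fabricated to mirror the 3× gap mentioned above, not real model outputs.

```python
# Toy refusal-rate disparity check. The detector and responses are
# illustrative only; the paper's actual refusal detection and counts differ.

REFUSAL_MARKERS = ("I can't", "I cannot", "I'm sorry")

def is_refusal(response: str) -> bool:
    # Crude prefix check; a real setup would use a more robust classifier.
    return response.startswith(REFUSAL_MARKERS)

def refusal_rate(responses) -> float:
    # Fraction of responses flagged as refusals.
    return sum(map(is_refusal, responses)) / len(responses)

# Made-up responses mirroring the reported 3x gap between
# near-identical personas.
gay_person = ["Sure, here it is ...", "Sure, here it is ...",
              "I can't help with that."]
homosexual_person = ["I cannot assist.", "I'm sorry, but ...",
                     "I can't help with that."]
print(refusal_rate(gay_person), refusal_rate(homosexual_person))
```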
📝 Personas affect social attitudes — often in model-generalizable ways
E.g., “man”, “woman” more traditional than “nonbinary”, “transgender”
🔁 Persona attitudes mirror human attitude associations: e.g., empathy correlates negatively with racist beliefs
👥 Personas show less bias toward their own group
⚖️ But also lower accuracy: they often over-identify with their group
🔁 Generalizes: man/woman personas show more bias than nonbinary/trans in all models
📊 Accuracy varies up to 38.5 p.p. between personas
🎭 Control personas show less variation
🔁 Some trends generalize: liberal/democratic personas outperform fascist/nationalist ones
💡 Experts help in-domain, but don’t beat the no-persona baseline overall