More on this in the coming days - but I'm really excited about this work, and am so happy that it's now public
Fair - but this simple case is illustrative of a broader weakness. What about creative writing? Or hypothesis generation? Or diverse data generation?
We need models that SPAN the entire output space.
arxiv.org/pdf/2505.00047
arxiv.org/pdf/2203.02155
arxiv.org/pdf/2510.01171
For example, if you ask models to tell you a joke, they almost always tell you the same joke! This is true across samples and even across model families!
Why does this happen? Can we improve it?
And yes you are absolutely right, that's one of the risks of personalization in general (see great paper here: arxiv.org/pdf/2303.05453)
arxiv.org/abs/2503.15484
We find that the instance-level interannotator agreement (IAA) predicted by our simulated population correlates with the observed IAA.
(14/?)
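To make "instance-level IAA" concrete: one simple agreement statistic is the fraction of rater pairs that give the same label on an instance, and the claim is that this quantity computed from simulated raters tracks the same quantity computed from real raters. A toy sketch (the binary ratings and the pairwise-agreement statistic here are illustrative assumptions, not the paper's exact setup):

```python
from itertools import combinations

def instance_iaa(ratings):
    # fraction of rater pairs that agree on this instance
    pairs = list(combinations(ratings, 2))
    return sum(a == b for a, b in pairs) / len(pairs)

def pearson(xs, ys):
    # plain Pearson correlation, stdlib only
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# hypothetical binary ratings: rows = instances, columns = raters
observed  = [[1, 1, 1], [1, 0, 1], [0, 1, 0], [0, 0, 0]]
simulated = [[1, 1, 1], [1, 1, 0], [1, 0, 1], [0, 0, 1]]

obs_iaa = [instance_iaa(r) for r in observed]
sim_iaa = [instance_iaa(r) for r in simulated]
print(pearson(obs_iaa, sim_iaa))  # positive: simulated IAA tracks observed IAA
```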
This calibration is important for trusting the model's confidence and for disentangling value-related epistemic uncertainty from aleatoric uncertainty in rater variation.
(13/?)
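One simple way to read "calibration" here: group instances by the model's predicted agreement and check that observed human agreement in each group roughly matches. A toy sketch with made-up numbers (the two-bin reliability scheme is an illustrative assumption, not the paper's method):

```python
def calibration_bins(pred, obs, n_bins=2):
    # bucket instances by predicted agreement, then compare bin averages
    bins = [[] for _ in range(n_bins)]
    for p, o in zip(pred, obs):
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, o))
    return [(sum(p for p, _ in b) / len(b), sum(o for _, o in b) / len(b))
            for b in bins if b]

pred = [0.9, 0.8, 0.2, 0.3]  # predicted per-instance agreement (simulated raters)
obs  = [1.0, 0.7, 0.3, 0.2]  # observed agreement among real raters
for avg_pred, avg_obs in calibration_bins(pred, obs):
    print(f"predicted {avg_pred:.2f} vs observed {avg_obs:.2f}")
```

When the averages line up bin by bin, the model's confidence about rater agreement can be taken at face value.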
Yes, we find that semantic changes in value profile lead to expected changes in the output.
(12/?)
For example, for OQA/DIC, even restricting to just 2 clusters explains the majority of rater variation, suggesting a bimodal distribution.
Additionally, the profile descriptions suggest why people may disagree.
(11/?)
Additionally, on the dataset where demographics helped most, the clusters partially recover ideological trends.
(10/?)
Unlike traditional methods, ours: 1) does not require that raters label overlapping instances, 2) leverages semantic instance information, and 3) returns cluster descriptions.
(9/?)
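A rough intuition for why no overlapping instances are needed: each candidate cluster is a value-profile description, and a rater is assigned to whichever profile best predicts that rater's own labels, whatever items they happened to rate. A toy sketch with a table lookup standing in for the LLM decoder (all names and data are hypothetical):

```python
def assign_raters(raters, profiles, predict):
    # predict(profile, item) -> label predicted under that value profile
    assignment = {}
    for rid, labeled_items in raters.items():
        best = max(profiles,
                   key=lambda pid: sum(predict(profiles[pid], item) == label
                                       for item, label in labeled_items))
        assignment[rid] = best
    return assignment

# table-lookup stand-in for an LLM decoder; a profile maps items to labels
profiles = {"safety-first": {"q1": 0, "q2": 0, "q3": 0},
            "free-speech":  {"q1": 1, "q2": 1, "q3": 1}}
# note: the two raters labeled different item sets, and assignment still works
raters = {"r1": [("q1", 0), ("q2", 0)],
          "r2": [("q2", 1), ("q3", 1)]}
print(assign_raters(raters, profiles, lambda p, item: p[item]))
# -> {'r1': 'safety-first', 'r2': 'free-speech'}
```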
We find that value profiles generated by Gemini preserve the majority (>70%) of the useful predictive information!
(8/?)
- Using a rater's in-context examples improves predictions most
- Value profiles significantly improve predictions (but not as much as examples)
- Demographics _do not_ offer a significant performance boost (except for OpinionQA)
(7/?)
OpinionQA (Santurkar et al.)
Hatespeech (Kumar et al.)
DICES (Aroyo et al.)
Habermas (Tessler/@mbakker.bsky.social et al.)
PRISM (@hannahrosekirk.bsky.social et al.)
ValuePrism
(6/?)
We apply an information-theoretic methodology to measure the amount of model-usable information in a rater representation for predicting an individual's ratings.
(5/?)
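For intuition, "model-usable information" can be read in the V-information sense: how much a rater representation reduces a predictor's log-loss on that rater's labels, relative to a no-information baseline. A toy sketch with made-up predictive probabilities (not the paper's actual estimator):

```python
import math

def avg_log_loss(probs, labels):
    # mean negative log-likelihood of the true labels (in nats)
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)

def usable_information(baseline_probs, rep_probs, labels):
    # reduction in predictive log-loss when the rater representation is added
    return avg_log_loss(baseline_probs, labels) - avg_log_loss(rep_probs, labels)

# made-up predictive distributions over two rating classes
labels = [0, 1, 1, 0]
baseline = [[0.5, 0.5]] * 4  # predictor given no rater information
with_profile = [[0.9, 0.1], [0.2, 0.8], [0.1, 0.9], [0.8, 0.2]]
print(usable_information(baseline, with_profile, labels))  # ~0.53 nats
```

The same yardstick lets representations of very different kinds (demographics, examples, textual profiles) be compared on equal footing.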
An encoder proposes values that could explain a person's ratings, and a decoder generalizes to held-out examples based on the value description.
(4/?)
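The encoder-decoder loop can be sketched with stubs in place of the actual LLM calls (the profile texts and the keyword-rule decoder below are purely illustrative assumptions):

```python
def encode(train_examples):
    # encoder: propose a textual value profile that explains the ratings
    # (stub: a real system would prompt an LLM here)
    if all(label == 1 for _, label in train_examples):
        return "tends to find edgy humor acceptable"
    return "prefers cautious, inoffensive content"

def decode(profile, item):
    # decoder: predict the rater's label on a held-out item from the profile alone
    # (stub: keyword rule standing in for an LLM)
    return 1 if "edgy" in profile else 0

train = [("joke A", 1), ("joke B", 1)]  # one hypothetical rater's ratings
profile = encode(train)
print(profile, "->", decode(profile, "held-out joke C"))
```

The key property is that the decoder only ever sees the natural-language profile, so the profile itself must carry the predictive information.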
We compare four different representations:
- No information about an individual (baseline)
- Demographics
- In-context rater demonstrations
- "Value profile": natural language description of a rater's values relevant to the task
(3/?)
Instead, we focus on modeling at the individual level.
(2/?)
We propose modeling at the individual-level using open-ended, textual value profiles! 🗣️📝
arxiv.org/abs/2503.15484