Andy Liu
andyliu.bsky.social
phd type things @ cmu lti
andyjliu.github.io
Thanks to my collaborators @kghate.bsky.social @monadiab77.bsky.social @daniel-fried.bsky.social @atoosakz.bsky.social @maxkw.bsky.social for their support in making this work possible!
October 2, 2025 at 4:09 PM
Please reach out if you'd like to chat about this work! We hope ConflictScope helps researchers study how models handle value conflicts that matter to their communities.
Code and data: github.com/andyjliu/con...
arXiv: www.arxiv.org/abs/2509.25369
October 2, 2025 at 4:07 PM
ConflictScope can also be used to evaluate different approaches to steering models. We find that including detailed target rankings in system prompts consistently improves model alignment with the target ranking under conflict, though there is still plenty of room for improvement.
October 2, 2025 at 4:06 PM
We find significant shifts between models’ expressed and revealed preferences under conflict! Models say they prefer actions that support protective values (e.g. harmlessness) when asked directly, but support personal values (e.g. helpfulness) in more realistic evaluations.
October 2, 2025 at 4:06 PM
To address issues with multiple-choice evaluation, we focus on open-ended evaluation with a simulated user. Annotation studies show strong correlation between LLM and human judgments of which action a model took in a given scenario, allowing us to automate open-ended evaluations.
October 2, 2025 at 4:06 PM
We introduce new metrics to measure how morally challenging a dataset is for models. We find that ConflictScope produces datasets that elicit more disagreement and stronger preferences than moral dilemma datasets, while alignment data frequently elicits indifference from models.
October 2, 2025 at 4:05 PM
Given a set of values, ConflictScope generates scenarios in which an LLM-based assistant faces a conflict between a pair of values in the set. It then evaluates which value a target LLM supports more in each scenario before combining scenario-level judgments into a value ranking.
October 2, 2025 at 4:05 PM
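To make the last step of the pipeline post above concrete, here is a minimal sketch of combining scenario-level judgments into a value ranking. It assumes each judgment is a `(value_a, value_b, winner)` tuple and ranks values by simple win rate; that aggregation rule, the function name `rank_values`, and the example data are all illustrative assumptions, not necessarily what ConflictScope itself does.

```python
from collections import defaultdict


def rank_values(judgments):
    """Rank values by win rate over pairwise scenario judgments.

    judgments: iterable of (value_a, value_b, winner) tuples, where
    winner is whichever of the two values the model supported.
    Win-rate aggregation is an assumption made for this sketch.
    """
    wins = defaultdict(int)
    appearances = defaultdict(int)
    for value_a, value_b, winner in judgments:
        if winner not in (value_a, value_b):
            raise ValueError(f"winner {winner!r} is not one of the compared values")
        appearances[value_a] += 1
        appearances[value_b] += 1
        wins[winner] += 1
    win_rate = {v: wins[v] / appearances[v] for v in appearances}
    # Highest win rate first; ties are broken alphabetically for determinism.
    return sorted(win_rate, key=lambda v: (-win_rate[v], v))


# Hypothetical judgments, for illustration only.
example_judgments = [
    ("helpfulness", "harmlessness", "helpfulness"),
    ("helpfulness", "honesty", "helpfulness"),
    ("harmlessness", "honesty", "harmlessness"),
    ("helpfulness", "honesty", "honesty"),
]
print(rank_values(example_judgments))
```

Win rate is the simplest choice; a pairwise-comparison model such as Bradley-Terry could be swapped in when values appear in very different numbers of scenarios.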
very cool!
March 9, 2025 at 2:59 AM
these are great, thanks! will check them out
January 6, 2025 at 12:32 AM
started Axiomatic but didn’t get very far - Permutation City looks fun though, thanks
January 4, 2025 at 4:25 PM
PRISM has preference scores for different models that you can convert into pairwise labels
December 24, 2024 at 5:34 AM
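A rough sketch of the score-to-pairwise conversion mentioned in the reply above, assuming you have already grouped responses to the same prompt into `(response_id, score)` tuples. The record layout, the `scores_to_pairs` name, and the `min_gap` parameter are hypothetical and do not reflect PRISM's actual schema.

```python
from itertools import combinations


def scores_to_pairs(scored_responses, min_gap=0):
    """Turn per-response scores for one prompt into (chosen, rejected) pairs.

    scored_responses: list of (response_id, score) tuples for the same prompt.
    min_gap: pairs whose scores differ by this much or less are skipped,
    so ties (and near-ties, if min_gap > 0) produce no label.
    """
    pairs = []
    for (id_a, score_a), (id_b, score_b) in combinations(scored_responses, 2):
        if abs(score_a - score_b) <= min_gap:
            continue  # too close to call, no preference label
        if score_a > score_b:
            pairs.append((id_a, id_b))
        else:
            pairs.append((id_b, id_a))
    return pairs


# Hypothetical scores for three model responses to one prompt.
print(scores_to_pairs([("model_a", 80), ("model_b", 55), ("model_c", 80)]))
```

Raising `min_gap` trades dataset size for cleaner labels, since small score differences are often noise.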
could I be added? thanks for curating :)
November 7, 2024 at 9:06 PM