Andy Liu
@andyliu.bsky.social
phd type things @ cmu lti
andyjliu.github.io
ConflictScope can also be used to evaluate different approaches to steering models. We find that including a detailed target ranking in the system prompt consistently improves models' alignment with that ranking under conflict, though plenty of room for improvement remains.
October 2, 2025 at 4:06 PM
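A minimal sketch of what this steering intervention could look like, assuming a chat API that accepts a system prompt; the prompt wording and value names below are illustrative guesses, not ConflictScope's actual prompts:

```python
# Hypothetical sketch: embedding a detailed target value ranking in the
# system prompt to steer a model's behavior under value conflict.
# The wording and value names are illustrative, not from the paper.

TARGET_RANKING = ["harmlessness", "honesty", "helpfulness"]

def build_steering_prompt(ranking: list[str]) -> str:
    """Render the target ranking as an explicit priority list so the model
    knows which value to favor whenever two of them conflict."""
    ordered = "\n".join(f"{i + 1}. {value}" for i, value in enumerate(ranking))
    return (
        "You are a helpful AI assistant. When the following values conflict, "
        "prioritize them in this exact order (1 = highest priority):\n"
        f"{ordered}"
    )

print(build_steering_prompt(TARGET_RANKING))
```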
We find significant shifts between models’ expressed and revealed preferences under conflict! Models say they prefer actions that support protective values (e.g., harmlessness) when asked directly, but support personal values (e.g., helpfulness) in more realistic evaluations.
October 2, 2025 at 4:06 PM
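One plausible way to operationalize the expressed/revealed distinction; the prompt templates and the injected `query_model` callable are assumptions for illustration:

```python
# Hypothetical sketch of the two elicitation modes. `query_model` is an
# assumed callable mapping a prompt string to the model's text response.

def expressed_preference(query_model, value_a: str, value_b: str) -> str:
    """Expressed preference: ask the model directly, in the abstract."""
    prompt = (
        f"When {value_a} and {value_b} conflict, which should an AI "
        f"assistant prioritize? Answer with exactly one of the two words."
    )
    return query_model(prompt).strip().lower()

def revealed_preference(query_model, scenario: str,
                        action_a: dict, action_b: dict) -> str:
    """Revealed preference: place the model in a realistic scenario with two
    candidate actions, each supporting one of the conflicting values, and
    record which value its chosen action supports."""
    prompt = (
        f"{scenario}\n\nWhich action do you take?\n"
        f"(A) {action_a['text']}\n(B) {action_b['text']}\n"
        "Answer with A or B only."
    )
    choice = query_model(prompt).strip().upper()
    return action_a["value"] if choice.startswith("A") else action_b["value"]
```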
We introduce new metrics to measure how morally challenging a dataset is for models. We find that ConflictScope produces datasets that elicit more disagreement and stronger preferences than moral dilemma datasets, while alignment data frequently elicits indifference from models.
October 2, 2025 at 4:05 PM
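Here is one way such metrics might be operationalized; these are plausible definitions for illustration, not necessarily the paper's exact formulations:

```python
# Sketch of two possible dataset-difficulty metrics. These definitions are
# assumptions for illustration, not necessarily the paper's.

def preference_strength(p_first_action: float) -> float:
    """Distance of a model's choice probability from indifference (0.5),
    rescaled to [0, 1]. Near 0 = the scenario leaves the model indifferent;
    near 1 = the model has a strong preference."""
    return 2 * abs(p_first_action - 0.5)

def disagreement(choices: list[str]) -> float:
    """Fraction of model pairs that choose different actions on a scenario;
    `choices` holds each model's selected action label."""
    pairs = [(a, b) for i, a in enumerate(choices) for b in choices[i + 1:]]
    return sum(a != b for a, b in pairs) / len(pairs) if pairs else 0.0

print(preference_strength(0.55))      # 0.1 -> near-indifferent
print(disagreement(["A", "B", "A"]))  # ~0.67 -> models split on this scenario
```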
Given a set of values, ConflictScope generates scenarios in which an LLM-based assistant faces a conflict between a pair of values in the set. It then evaluates which value a target LLM supports more in each scenario before combining scenario-level judgments into a value ranking.
October 2, 2025 at 4:05 PM
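A minimal sketch of the aggregation step, assuming a judge that decides which value each response supports; the win-rate ranking is one simple scheme, and the helper names (`scenarios_for_pair`, `judge_scenario`) are hypothetical:

```python
from collections import defaultdict
from itertools import combinations

def rank_values(values, scenarios_for_pair, judge_scenario):
    """Combine scenario-level judgments into a value ranking.
    For each pair of values, generate conflict scenarios, judge which value
    the target model's response supports, and rank values by overall win
    rate. This aggregation scheme is an assumption; the paper's may differ."""
    wins, totals = defaultdict(int), defaultdict(int)
    for v1, v2 in combinations(values, 2):
        for scenario in scenarios_for_pair(v1, v2):
            winner = judge_scenario(scenario, v1, v2)  # returns v1 or v2
            wins[winner] += 1
            totals[v1] += 1
            totals[v2] += 1
    return sorted(values, key=lambda v: wins[v] / max(totals[v], 1),
                  reverse=True)
```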
🚨New Paper: LLM developers aim to align models with values like helpfulness or harmlessness. But when these values conflict, which do models choose to support? We introduce ConflictScope, a fully automated evaluation pipeline that reveals how models rank values under conflict.
(📷 xkcd)
October 2, 2025 at 4:04 PM