Lightnews — Scholar-powered news

Taylor Sorensen

@taylor-sorensen.bsky.social

Also check out these very cool related papers exploring diversity and mode collapse!
arxiv.org/pdf/2505.00047 from Peter West et al.
arxiv.org/abs/2504.05228, arxiv.org/abs/2404.10859 by Yiming Zhang + Daphne Ippolito et al.
arxiv.org/abs/2510.01171 by Jiayi Zhang et al.

October 8, 2025 at 2:27 PM

Taylor Sorensen

@taylor-sorensen.bsky.social

Check it out!
Paper: arxiv.org/abs/2510.06084
Models: huggingface.co/collections/...
Data and code: github.com/tsor13/spect...

Spectrum Tuning: Post-Training for Distributional Coverage and In-Context Steerability

Language model post-training has enhanced instruction-following and performance on many downstream tasks, but also comes with an often-overlooked cost on tasks with many possible valid answers. We cha...

October 8, 2025 at 2:26 PM

Taylor Sorensen

@taylor-sorensen.bsky.social

In new work, we introduce a simple post-training method and large-scale resource for maximizing diversity and coverage! We call it Spectrum Tuning.
More on this in the coming days - but I'm really excited about this work, and am so happy that it's now public.

October 8, 2025 at 2:25 PM

Taylor Sorensen

@taylor-sorensen.bsky.social

Pretrained models are better at this - they actually give you substantively different outputs when you sample. BUT, they are unable to reliably follow instructions.
How can we train models to follow instructions AND to span the space of possible outputs?

October 8, 2025 at 2:25 PM

Taylor Sorensen

@taylor-sorensen.bsky.social

This may seem like a silly toy example - shouldn’t we just use np.random.randint()?
Fair - but this simple case is illustrative of a broader weakness. What about creative writing? Or hypothesis generation? Or diverse data generation?
We need models that SPAN the entire output space.

October 8, 2025 at 2:25 PM

Taylor Sorensen

@taylor-sorensen.bsky.social

Current post-training teaches a model to output the highest reward answer, even if there are other good answers. E.g. when picking random numbers, 7 seems like the most “random” number to annotators - so models ALWAYS pick 7!
arxiv.org/pdf/2505.00047
arxiv.org/pdf/2203.02155
arxiv.org/pdf/2510.01171

October 8, 2025 at 2:24 PM
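
One way to make the "always picks 7" failure concrete is to measure the entropy of repeated samples. The sketch below is illustrative only: `collapsed_sampler` and `uniform_sampler` are hypothetical stand-ins, not real models, and entropy is just one possible diversity measure, not necessarily the one used in the paper.

```python
import math
import random
from collections import Counter


def empirical_entropy(samples):
    """Shannon entropy (in bits) of the empirical distribution of samples."""
    counts = Counter(samples)
    total = len(samples)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def collapsed_sampler(rng):
    # Hypothetical stand-in for a mode-collapsed model: it always answers 7.
    return 7


def uniform_sampler(rng):
    # Hypothetical stand-in for a sampler that spans the whole range 1..10.
    return rng.randint(1, 10)


rng = random.Random(0)
collapsed = [collapsed_sampler(rng) for _ in range(1000)]
spread = [uniform_sampler(rng) for _ in range(1000)]

print("collapsed entropy (bits):", empirical_entropy(collapsed))
print("spread entropy (bits):", empirical_entropy(spread))
```

A sampler that always returns the same answer has zero entropy, while one that covers ten outcomes evenly approaches log2(10) bits.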

Taylor Sorensen

@taylor-sorensen.bsky.social

Others can disagree, but in my view it's important to understand the science behind diverse viewpoint modeling both for understanding potential risks and for building prosocial systems. AI alignment especially is a case where I think if we aren't careful, it will be easy for people to be left behind

March 21, 2025 at 9:59 PM

Taylor Sorensen

@taylor-sorensen.bsky.social

That is a risk to personalization technologies like these 😬 However, I'm excited about AI's potential to _reduce_ polarization (www.pnas.org/doi/full/10....), find common ground (www.science.org/doi/10.1126/...), and help people gain sympathy by exploring others' perspectives (ongoing work)

March 21, 2025 at 9:55 PM

Taylor Sorensen

@taylor-sorensen.bsky.social

That being said, I'm in total agreement that there are absolutely risks to the technology as well, and figuring out which technologies to deploy where in what way will be very important.

March 21, 2025 at 9:36 PM

Taylor Sorensen

@taylor-sorensen.bsky.social

Additionally, I think steerability to diverse perspectives becomes even more important as AI systems start having more autonomy (e.g. AI agents). I want an AI agent that knows _my_ perspective, not just the average one!

March 21, 2025 at 9:36 PM

Taylor Sorensen

@taylor-sorensen.bsky.social

That being said, it's my belief that all model responses already have _a_ worldview associated with them - so personally, I think it's important to have systems where a) we can measure/explicitly see what perspectives they are being aligned to, so that b) many people's perspectives can be included.

March 21, 2025 at 9:36 PM

Taylor Sorensen

@taylor-sorensen.bsky.social

Apologies if this wasn't clear! They're provided textually in an in-context prompt, which the model then tries to steer towards

And yes you are absolutely right, that's one of the risks of personalization in general (see great paper here: arxiv.org/pdf/2303.05453)

March 21, 2025 at 9:36 PM

Taylor Sorensen

@taylor-sorensen.bsky.social

This was my Google DeepMind internship work with amazing collaborators Pushkar Mishra, @romapatel.bsky.social, Michael Henry Tessler, @mbakker.bsky.social, Georgina Evans, Iason Gabriel, @noahdgoodman.bsky.social, @verenarieser.bsky.social
They are a really amazing team!

March 20, 2025 at 4:02 AM

Taylor Sorensen

@taylor-sorensen.bsky.social

Read the full paper here!
arxiv.org/abs/2503.15484

March 20, 2025 at 3:57 AM

Taylor Sorensen

@taylor-sorensen.bsky.social

Value profiles enable new ways to model variation and enable representation at the individual level. We hope that our work helps enable systems that better model diverse perspectives and that work for everyone (yay for #pluralisticalignment #nlpforsocialgood #compsocialscience #compdemocracy!)

March 20, 2025 at 3:57 AM

Taylor Sorensen

@taylor-sorensen.bsky.social

There are benefits though!
✅ Value profiles may enhance user agency, as a person could change their own value profile
✅ Enabling value reflection via bottom-up discovery and top-down editing
✅ Unlike sociodemographics, which are often unchosen, people can choose values for themselves
(16/?)

March 20, 2025 at 3:56 AM

Taylor Sorensen

@taylor-sorensen.bsky.social

Our goal in this work is to improve AI systems' ability to model diverse perspectives and better serve more people! However, risks remain
❌ Privacy risks: people may not wish to have their values inferred
❌ Systems may fail to generalize to less common values, and we only test on English-language data
(15/?)

March 20, 2025 at 3:55 AM

Taylor Sorensen

@taylor-sorensen.bsky.social

As a last experiment, we simulate an annotator population ("jury learning" @mitchellgordon) using our trained models and value profiles.

We find that the instance-level interannotator agreement (IAA) predicted by our simulated population correlates with the observed IAA.
(14/?)

March 20, 2025 at 3:53 AM
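
To illustrate what comparing predicted and observed instance-level agreement could look like, here is a minimal sketch. It assumes agreement is measured as the fraction of annotator pairs giving the same label and that the comparison is a Pearson correlation; the thread does not say which measures the paper uses, and the labels below are made-up toy data, not results.

```python
import math
from itertools import combinations


def pairwise_agreement(labels):
    """Fraction of annotator pairs that gave the same label to one instance."""
    pairs = list(combinations(labels, 2))
    return sum(a == b for a, b in pairs) / len(pairs)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


# Toy labels (hypothetical): one inner list per instance, one label per rater.
observed = [["yes"] * 4, ["yes", "no", "yes", "no"], ["yes", "yes", "yes", "no"]]
simulated = [["yes"] * 4, ["no", "no", "yes", "yes"], ["yes", "yes", "no", "yes"]]

observed_iaa = [pairwise_agreement(labels) for labels in observed]
simulated_iaa = [pairwise_agreement(labels) for labels in simulated]
print("correlation:", pearson(observed_iaa, simulated_iaa))
```

A high correlation means the simulated population tends to disagree on the same instances where the real annotators disagree.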

Taylor Sorensen

@taylor-sorensen.bsky.social

We also find that our value profile system is very well-calibrated.

This calibration is important for trusting the model's confidence and for disentangling value-related epistemic uncertainty from aleatoric uncertainty in rater variation.
(13/?)

March 20, 2025 at 3:52 AM
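
For readers unfamiliar with calibration, a common way to quantify it is expected calibration error (ECE): bin predictions by confidence and compare each bin's average confidence to its accuracy. The sketch below is a generic illustration; the thread does not state which calibration metric the paper reports, and the function name and binning scheme are assumptions.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: bin-size-weighted gap between average confidence and accuracy."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Clamp so that a confidence of exactly 1.0 falls in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    total = len(confidences)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_conf = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(1 for _, ok in bucket if ok) / len(bucket)
        ece += (len(bucket) / total) * abs(avg_conf - accuracy)
    return ece


# Toy example: 80% confidence and 4 of 5 predictions right is well calibrated.
well_calibrated = expected_calibration_error([0.8] * 5, [True] * 4 + [False])
# Toy example: 90% confidence with every prediction wrong is badly calibrated.
overconfident = expected_calibration_error([0.9] * 4, [False] * 4)
print(well_calibrated, overconfident)
```

A well-calibrated system has an ECE near zero: when it reports 80% confidence, it is right about 80% of the time.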

Taylor Sorensen

@taylor-sorensen.bsky.social

Value profiles are written in natural language. But are they actually semantically interpretable? Does the system change its judgments with wording changes in common-sense ways?

Yes, we find that semantic changes in value profile lead to expected changes in the output.
(12/?)

March 20, 2025 at 3:52 AM

Taylor Sorensen

@taylor-sorensen.bsky.social

Clustering with value profiles also enables dataset-level qualitative analysis.

For example, for OQA/DIC, even restricting to just 2 clusters explains the majority of rater variation, suggesting a bimodal distribution.

Additionally, the profile descriptions suggest why people may disagree.
(11/?)

March 20, 2025 at 3:52 AM

Taylor Sorensen

@taylor-sorensen.bsky.social

Our algorithm is effective at uncovering useful rater groupings, with the resulting value profile clusters outperforming the most performant demographic grouping!

Additionally, on the dataset where demographics helped most, the clusters partially recover ideological trends.
(10/?)

March 20, 2025 at 3:51 AM

Taylor Sorensen

@taylor-sorensen.bsky.social

To characterize common modes of (dis)agreement, we introduce a value-based clustering algorithm.

Unlike traditional methods, ours: 1) does not require that raters label overlapping instances, 2) leverages semantic instance information, and 3) returns cluster descriptions.
(9/?)

March 20, 2025 at 3:51 AM
