Tanise Ceron
@taniseceron.bsky.social
Postdoc @milanlp.bsky.social
Great, thanks a lot!
October 19, 2025 at 9:59 AM
As I wasn't at the conference, I'd love to be able to watch the recording. Is it available online anywhere? :)
October 16, 2025 at 9:01 AM
Great collaboration with Dmitry Nikolaev, @dominsta.bsky.social and @deboranozza.bsky.social ☺️
September 29, 2025 at 2:54 PM
- Finally, and for me, most interestingly, our analysis suggests that political biases are already encoded during the pre-training stage.
Taking this evidence together, we highlight the important implications of these results for data processing in the development of fairer LLMs.
September 29, 2025 at 2:54 PM
- There's a strong correlation (Pearson r=0.90) between the predominant stances in the training data and the models’ behavior when probed for political bias on eight policy issues (e.g., environmental protection, migration).
September 29, 2025 at 2:54 PM
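For the curious, here's a minimal sketch of what that kind of correlation check looks like in code. The eight per-issue scores below are made up for illustration (they are not the paper's numbers), and the left/right scale is an assumption:

```python
# Minimal sketch: correlate per-issue stance in training data with the
# model's probed political bias. All numbers here are illustrative toys,
# not values from the paper.
from scipy.stats import pearsonr

# Hypothetical scores for eight policy issues (e.g. environment, migration),
# both on an arbitrary left(-1)/right(+1) scale.
stance_in_data = [-0.8, -0.5, -0.6, -0.2, 0.1, -0.4, -0.7, -0.3]
probed_bias    = [-0.7, -0.4, -0.5, -0.1, 0.2, -0.5, -0.6, -0.2]

r, p_value = pearsonr(stance_in_data, probed_bias)
print(f"Pearson r = {r:.2f} (p = {p_value:.3f})")
```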
- Source domains of pre-training documents differ significantly: right-leaning content contains twice as many blog posts, and left-leaning content three times as many news outlets.
September 29, 2025 at 2:54 PM
- The framing of political topics varies considerably: documents labeled right-leaning prioritize stability, sovereignty, and cautious reform via technology or deregulation, while left-leaning documents emphasize urgent, science-led mobilization for systemic transformation and equity.
September 29, 2025 at 2:54 PM
- Left-leaning documents consistently outnumber right-leaning ones by a factor of 3 to 12 across training datasets.
- Pre-training corpora contain about 4 times more politically engaged content than post-training data.
September 29, 2025 at 2:54 PM
We have the answers to these questions here: arxiv.org/pdf/2509.22367
We analyze the political content of the training data from OLMo 2, the largest fully open-source model.
🕵️♀️ We run an analysis on all the datasets (2 pre- and 2 post-training) used to train the models. Here are our findings:
September 29, 2025 at 2:54 PM
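A minimal sketch of the corpus-level bookkeeping behind findings like the left/right document ratio, assuming per-document stance labels are already available (the helper and toy labels below are illustrative stand-ins, not the paper's pipeline):

```python
# Sketch: given per-document stance labels for a training corpus (however
# obtained -- the labels here are toys), count documents per stance and
# compute the left/right ratio.
from collections import Counter

def stance_ratio(labels):
    """labels: iterable of 'left' / 'right' / 'neutral', one per document."""
    counts = Counter(labels)
    # Guard against division by zero if no right-leaning documents exist.
    return counts["left"] / max(counts["right"], 1), counts

labels = ["left", "left", "neutral", "right", "left"]  # toy corpus labels
ratio, counts = stance_ratio(labels)
print(f"left/right ratio = {ratio:.1f}", dict(counts))
```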
Thanks to SoftwareCampus for supporting Multiview, to the organizers of INRA, and to Sourabh Dattawad and @agnesedaff.bsky.social for the great collaboration!
September 26, 2025 at 4:20 PM
Our evaluation with normative metrics shows that this approach diversifies not only the frames in a user's history but also sentiment and news categories. These findings demonstrate that framing acts as a control lever for enhancing normative diversity.
September 26, 2025 at 4:20 PM
In this paper, we propose introducing media frames as a device for diversifying perspectives in news recommenders. Our results show an improvement of up to 50% in exposure to previously unclicked frames.
September 26, 2025 at 4:20 PM
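One way to picture frame-based diversification is as a re-ranking step that boosts candidate articles whose frame is absent from the user's click history. The sketch below illustrates that general idea only; the frame labels, scores, and boost weight are assumptions, not the recommender from the paper:

```python
# Illustrative frame-aware re-ranker: boost candidates whose media frame
# does not appear in the user's click history. A sketch of the idea, not
# the paper's method; all labels, scores, and the boost weight are made up.

def rerank_by_frame_diversity(candidates, clicked_frames, boost=0.3):
    """candidates: list of (article_id, relevance_score, frame) tuples."""
    def adjusted(item):
        _, score, frame = item
        # Reward frames the user has not clicked on before.
        return score + (boost if frame not in clicked_frames else 0.0)
    return sorted(candidates, key=adjusted, reverse=True)

candidates = [
    ("a1", 0.90, "Economic"),
    ("a2", 0.85, "Morality"),           # unseen frame, gets boosted
    ("a3", 0.80, "Health and Safety"),  # unseen frame, gets boosted
]
clicked_frames = {"Economic"}
for article_id, score, frame in rerank_by_frame_diversity(candidates, clicked_frames):
    print(article_id, frame)
```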
Sure, it's here: github.com/tceron/eval_...
The code mapping is in the readme file. :)
April 23, 2025 at 7:07 AM