Tanise Ceron
@taniseceron.bsky.social
Postdoc @milanlp.bsky.social
Great, thanks a lot!
October 19, 2025 at 9:59 AM
As I wasn't at the conference, I'd love to be able to watch the recording. Is it available online anywhere? :)
October 16, 2025 at 9:01 AM
Great collaboration with Dmitry Nikolaev, @dominsta.bsky.social and @deboranozza.bsky.social ☺️
September 29, 2025 at 2:54 PM
- Finally, and for me, most interestingly, our analysis suggests that political biases are already encoded during the pre-training stage.
Taking this evidence together, we highlight the important implications of these results for data processing in the development of fairer LLMs.
September 29, 2025 at 2:54 PM
- There's a strong correlation (Pearson r=0.90) between the predominant stances in the training data and the models’ behavior when probed for political bias on eight policy issues (e.g., environmental protection, migration).
September 29, 2025 at 2:54 PM
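For the curious, here's a minimal sketch of what that kind of correlation check looks like in code. The eight per-issue scores below are made up for illustration (they are not the paper's numbers), and the left/right scale is an assumption:

```python
# Minimal sketch: correlate per-issue stance in training data with the
# model's probed political bias. All numbers here are illustrative toys,
# not values from the paper.
from scipy.stats import pearsonr

# Hypothetical scores for eight policy issues (e.g. environment, migration),
# both on an arbitrary left(-1)/right(+1) scale.
stance_in_data = [-0.8, -0.5, -0.6, -0.2, 0.1, -0.4, -0.7, -0.3]
probed_bias    = [-0.7, -0.4, -0.5, -0.1, 0.2, -0.5, -0.6, -0.2]

r, p_value = pearsonr(stance_in_data, probed_bias)
print(f"Pearson r = {r:.2f} (p = {p_value:.3f})")
```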
- Source domains of pre-training documents differ significantly: right-leaning content contains twice as many blog posts, and left-leaning content three times as many news outlets.
September 29, 2025 at 2:54 PM
- The framing of political topics varies considerably: documents labeled right-leaning prioritize stability, sovereignty, and cautious reform via technology or deregulation, while left-leaning documents emphasize urgent, science-led mobilization for systemic transformation and equity.
September 29, 2025 at 2:54 PM
- Left-leaning documents consistently outnumber right-leaning ones by a factor of 3 to 12 across training datasets.
- Pre-training corpora contain about 4 times more politically engaged content than post-training data.
September 29, 2025 at 2:54 PM
We have the answers to these questions here: arxiv.org/pdf/2509.22367
We analyze the political content of the training data from OLMo 2, the largest fully open-source model.
🕵️♀️ We run an analysis on all the datasets (2 pre- and 2 post-training) used to train the models. Here are our findings:
September 29, 2025 at 2:54 PM
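A minimal sketch of the corpus-level bookkeeping behind findings like the left/right document ratio, assuming per-document stance labels are already available (the helper and toy labels below are illustrative stand-ins, not the paper's pipeline):

```python
# Sketch: given per-document stance labels for a training corpus (however
# obtained -- the labels here are toys), count documents per stance and
# compute the left/right ratio.
from collections import Counter

def stance_ratio(labels):
    """labels: iterable of 'left' / 'right' / 'neutral', one per document."""
    counts = Counter(labels)
    # Guard against division by zero if no right-leaning documents exist.
    return counts["left"] / max(counts["right"], 1), counts

labels = ["left", "left", "neutral", "right", "left"]  # toy corpus labels
ratio, counts = stance_ratio(labels)
print(f"left/right ratio = {ratio:.1f}", dict(counts))
```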
Thanks to SoftwareCampus for supporting Multiview, to the organizers of INRA, and to Sourabh Dattawad and @agnesedaff.bsky.social for the great collaboration!
September 26, 2025 at 4:20 PM
Our evaluation with normative metrics shows that this approach diversifies not only the frames in a user's history but also sentiment and news categories. These findings demonstrate that framing acts as a control lever for enhancing normative diversity.
September 26, 2025 at 4:20 PM
In this paper, we propose introducing media frames as a device for diversifying perspectives in news recommenders. Our results show an improvement of up to 50% in exposure to previously unclicked frames.
September 26, 2025 at 4:20 PM
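One way to picture frame-based diversification is as a re-ranking step that boosts candidate articles whose frame is absent from the user's click history. The sketch below illustrates that general idea only; the frame labels, scores, and boost weight are assumptions, not the recommender from the paper:

```python
# Illustrative frame-aware re-ranker: boost candidates whose media frame
# does not appear in the user's click history. A sketch of the idea, not
# the paper's method; all labels, scores, and the boost weight are made up.

def rerank_by_frame_diversity(candidates, clicked_frames, boost=0.3):
    """candidates: list of (article_id, relevance_score, frame) tuples."""
    def adjusted(item):
        _, score, frame = item
        # Reward frames the user has not clicked on before.
        return score + (boost if frame not in clicked_frames else 0.0)
    return sorted(candidates, key=adjusted, reverse=True)

candidates = [
    ("a1", 0.90, "Economic"),
    ("a2", 0.85, "Morality"),           # unseen frame, gets boosted
    ("a3", 0.80, "Health and Safety"),  # unseen frame, gets boosted
]
clicked_frames = {"Economic"}
for article_id, score, frame in rerank_by_frame_diversity(candidates, clicked_frames):
    print(article_id, frame)
```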
Sure, it's here: github.com/tceron/eval_...
The code mapping is in the readme file. :)
April 23, 2025 at 7:07 AM