Lightnews — Scholar-powered news

Reposted by Anna Wegmann

bhyravajjula.bsky.social

@bhyravajjula.bsky.social

If you're attending #EMNLP2025, we'll be presenting virtually in Gather Session 1 on Nov 5 at 4pm PT. Come say hello!

w/ the wonderful:
@mellymeldubs.bsky.social
Anna Preus,
@mariaa.bsky.social

Paper: arxiv.org/abs/2510.16713
Code/Data: github.com/darthbhyrava/wisp
Dash: poetry.darthbhyrava.com

October 31, 2025 at 3:36 PM

Reposted by Anna Wegmann

David Jurgens

@davidjurgens.bsky.social

What if a single model could recognize an author's writing style no matter what language they wrote in? 🌍✍️ Our new #EMNLP2025 paper explores multilingual authorship representation, showing how training across 36 languages can sharpen stylistic signals and reduce topic bias.
👇🧵

November 6, 2025 at 5:42 AM

Reposted by Anna Wegmann

Dong Nguyen

@dongng.bsky.social

New opinion paper out with Esther Ploeger (Aalborg University): We Need to Measure Data Diversity in NLP — Better and Broader at #EMNLP2025 (main) aclanthology.org/2025.emnlp-m...

We Need to Measure Data Diversity in NLP — Better and Broader

Dong Nguyen, Esther Ploeger. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing. 2025.

aclanthology.org

November 4, 2025 at 3:43 PM

Anna Wegmann

@annawegmann.bsky.social

Lots of exciting work on linguistic style this year at #EMNLP2025 #EMNLP, including work on machine-text detection, authorship representation, and more!

🧵 with anthology links below
📣 with an open call to everyone to add style work that's missing

November 4, 2025 at 1:42 PM

Reposted by Anna Wegmann

Catherine Arnett

@catherinearnett.bsky.social

I have a new blog post about the so-called “tokenizer-free” approach to language modeling and why it’s not tokenizer-free at all. I also talk about why people hate tokenizers so much!

September 25, 2025 at 3:14 PM

Anna Wegmann

@annawegmann.bsky.social

I successfully defended my PhD in Dutch fashion and received a PhD certificate in Latin. Thank you to the amazing people who got me here, among others @dongng.bsky.social and the ones I blurred here.

October 22, 2025 at 2:20 PM

Reposted by Anna Wegmann

Indira Sen

@indiiigo.bsky.social

Come join next Wednesday if you want to rant about society's love-hate relationship with LLMs!

MZES Social Science Data Lab @mzes-ssdl.bsky.social · Oct 15

🚨 Upcoming: "Large Language Models for Social Research: Potentials and Challenges"

👤 Indira Sen (University of Mannheim)

🗓️ Wed, October 22, 13:45-15:15 CET

📺 Register for the live stream: us02web.zoom.us/meeting/regi...

🔗 socialsciencedatalab.mzes.uni-mannheim.de/page/events/

Large Language Models for Social Research: Potentials and Challenges

Hybrid event [A5, 6, Room A231 + Zoom]
October 22, 2025, 13:45-15:15

Abstract

Large Language Models (LLMs) have the potential to revolutionize the social sciences—for example, by accelerating content analysis or enabling realistic social simulations. In this workshop, I will discuss how LLMs can be applied and audited for social science applications, including the generation of synthetic survey responses and content analysis. I will also address how biases in LLMs can hinder these applications and explore ways to better surface and understand these biases. Finally, I will present a hands-on use case demonstrating how LLMs can be guided using demographic personas for both content analysis and simulated surveys.

Presenter(s)

Indira Sen is a Junior Faculty member at the University of Mannheim’s Business School in the Chair of Data Science for the Social and Economic Sciences. Her work lies at the intersection of NLP and Computational Social Science, specifically in developing and evaluating representative and equitable language technology, including Large Language Models.

October 16, 2025 at 9:32 AM

Anna Wegmann

@annawegmann.bsky.social

Are these the Dutch budget cuts, or does Utrecht uni really not want me to come to the office? My highlight is the door that has been broken for weeks, with the only change being a laminated piece of paper telling me to enter the uni maze through two other buildings.

August 20, 2025 at 6:28 AM

Reposted by Anna Wegmann

Ruben van de Vijver

@rubenvandevijver.bsky.social

No trains are running between Mönchengladbach and Venlo. The timetable is being maintained by a bus. The bus code-switches.

August 1, 2025 at 8:48 AM

Anna Wegmann

@annawegmann.bsky.social

Utrecht is back from #ACL2025! We had a blast.

I should have posted this before but here are some papers from people in our group that were presented at ACL.

August 5, 2025 at 3:37 PM

Reposted by Anna Wegmann

Craig Schmidt

@craigschmidt.com

I'm sadly not at #ACL2025, but the work on tokenization seems to continue to explode. Here are the tokenization-related papers I could find, in no particular order. Let me know if I missed any.

July 30, 2025 at 2:03 PM

Anna Wegmann

@annawegmann.bsky.social

Since people at #ACL2025 are very interested in tokenization, a reminder to join the discussion on the Discord server set up by @mcognetta.bsky.social

July 29, 2025 at 12:52 PM

Anna Wegmann

@annawegmann.bsky.social

Has anyone tried the Kiss the Cook lunch place at #ACL2025?

July 28, 2025 at 12:57 PM

Anna Wegmann

@annawegmann.bsky.social

I will present our #ACL2025 paper Tokenization is Sensitive to Language Variation in the poster session after Tuesday's keynote, 10.30 - 12.00 in Hall 4/5

Anna Wegmann @annawegmann.bsky.social · Jul 17

Wanna do some authorship attribution? Chances are what tokenizer you use matters.

Tokenization is Sensitive to Language Variation, probably, more investigation necessary...

📄 ACL Findings paper: arxiv.org/pdf/2502.15343
🧑‍🏫 @dongng.bsky.social @davidjurgens.bsky.social and myself

See you at ACL!

July 28, 2025 at 5:43 AM

Reposted by Anna Wegmann

Tiago Pimentel

@tpimentel.bsky.social

@philipwitti.bsky.social will be presenting our paper "Tokenisation is NP-Complete" at #ACL2025 😁 Come to the language modelling 2 session (Wednesday morning, 9h~10h30) to learn more about how challenging tokenisation can be!

Tiago Pimentel @tpimentel.bsky.social · Dec 20

BPE is a greedy method to find a tokeniser which maximises compression! Why don't we try to find properly optimal tokenisers instead? Well, it seems this is a pretty difficult—in fact, NP-complete—problem!🤯
New paper + @philipwitti.bsky.social
@gregorbachmann.bsky.social :) arxiv.org/abs/2412.15210

Tokenisation is NP-Complete

In this work, we prove the NP-completeness of two variants of tokenisation, defined as the problem of compressing a dataset to at most $δ$ symbols by either finding a vocabulary directly (direct token...

arxiv.org

July 27, 2025 at 9:41 AM

Reposted by Anna Wegmann

Tiago Pimentel

@tpimentel.bsky.social

We are presenting this paper at #ACL2025 😁 Find us at poster session 4 (Wednesday morning, 11h~12h30) to learn more about tokenisation bias!

Tiago Pimentel @tpimentel.bsky.social · Jun 4

A string may get 17 times less probability if tokenised as two symbols (e.g., ⟨he, llo⟩) than as one (e.g., ⟨hello⟩)—by an LM trained from scratch in each situation! Our new ACL paper proposes an observational method to estimate this causal effect! Longer thread soon!

Title of paper "Causal Estimation of Tokenisation Bias" and schematic of how we define tokenisation bias, which is the causal effect we are interested in.

July 27, 2025 at 11:59 AM

Anna Wegmann

@annawegmann.bsky.social

I'm at #ACL2025 this week.

Happy to chat about measuring linguistic style, data diversity, creating synthetic data for analyzing (L)LMs, authorship attribution, paraphrases and tokenizers.

Let’s chat if you’re around

July 27, 2025 at 5:30 PM

Reposted by Anna Wegmann

Maria Antoniak

@mariaa.bsky.social

The #ACL2025 #ACL2025NLP feed is up and running! It matches both hashtags and any posts from or mentions of @aclmeeting.bsky.social

Pin it to your home 📌 and enjoy!

bsky.app/profile/did:...

July 17, 2025 at 11:15 AM

Reposted by Anna Wegmann

Matthias Orlikowski

@morlikow.bsky.social

Who's presenting on subjectivity in annotation (human label variation, learning from disagreement, perspectivism) at #ACL2025?

papers by e.g. @liweijiang.bsky.social @tiancheng.bsky.social @gabriellalapesa.bsky.social @romanklinger.de

keynote @verenarieser.bsky.social

link to full list below ⤵️

July 24, 2025 at 4:58 PM

Anna Wegmann

@annawegmann.bsky.social

I love it.

Andreas Geiger @andreasgeiger.bsky.social · Jul 24

We are excited that Scholar Inbox is supporting ACL 2025
@aclmeeting.bsky.social with personalized conference programs this year for the first time!
www.scholar-inbox.com/conference/a...
It would be great if you could widely share this news within the NLP community. See you next week in Vienna!

July 24, 2025 at 3:48 PM

Reposted by Anna Wegmann

Ece Takmaz

@ecekt.bsky.social

I'll be attending ACL 2025 in Vienna! Looking forward to seeing people there!😊🇦🇹 We are going to present 'LLMs instead of Human Judges? A Large Scale Empirical Study across 20 NLP Evaluation Tasks' aclanthology.org/2025.acl-sho... #acl2025 #acl2025nlp

July 24, 2025 at 12:34 PM

Anna Wegmann

@annawegmann.bsky.social

How come the @aclmeeting.bsky.social Underline page was set last Friday to be released on July 20 and now promises access only on the 24th?

Access to papers and videos remains elusive less than a week before the conference.

Screenshot of a text: This event page is still under construction and will be released on July 24, 2025.

July 22, 2025 at 12:34 PM

Reposted by Anna Wegmann

Indira Sen

@indiiigo.bsky.social

Do LLMs represent the people they're supposed to simulate or provide personalized assistance to?

We review the current literature in our #ACL2025 Findings paper and investigate what researchers conclude about the demographic representativeness of LLMs:
osf.io/preprints/so...

1/

Screenshot of our paper "Missing the Margins: A Systematic Literature Review on the Demographic Representativeness of LLMs"

Details about what we annotated in our systematic review

July 21, 2025 at 10:12 AM

Reposted by Anna Wegmann

Julia Mendelsohn

@jmendelsohn2.bsky.social

#ic2s2 I’ll be talking about this paper in one hour in Vingen 1+2!

Julia Mendelsohn @jmendelsohn2.bsky.social · Feb 20

New preprint!
Metaphors shape how people understand politics, but measuring them (& their real-world effects) is hard.

We develop a new method to measure metaphor & use it to study dehumanizing metaphor in 400K immigration tweets Link: bit.ly/4i3PGm3

#NLP #NLProc #polisky #polcom #compsocialsci
🐦🐦

Screenshot of the top half of the paper's first page. The paper is titled: "When People are Floods: Analyzing Dehumanizing Metaphors in Immigration Discourse with Large Language Models". The authors are Julia Mendelsohn (University of Chicago) and Ceren Budak (University of Michigan). The top right corner contains a visual showing the sentence "They want immigrants to pour into and infest this country". The caption says: Figure 1: Dehumanizing sentence likening immigrants to the source domain concepts of Water and Vermin via the words "pour" and "infest".

The abstract text on the left reads: Metaphor, discussing one concept in terms of another, is abundant in politics and can shape how people understand important issues. We develop a computational approach to measure metaphorical language, focusing on immigration discourse on social media. Grounded in qualitative social science research, we identify seven concepts evoked in immigration discourse (e.g. "water" or "vermin"). We propose and evaluate a novel technique that leverages both word-level and document-level signals to measure metaphor with respect to these concepts. We then study the relationship between metaphor, political ideology, and user engagement in 400K US tweets about immigration. While conservatives tend to use dehumanizing metaphors more than liberals, this effect varies widely across concepts. Moreover, creature-related metaphor is associated with more retweets, especially for liberal authors. Our work highlights the potential for computational methods to complement qualitative approaches in understanding subtle and implicit language in political discourse.

July 22, 2025 at 8:09 AM

Reposted by Anna Wegmann

Anders Giovanni Møller

@andersgiovanni.com

#ic2s2 I’ll have a poster (#21) today on 𝐭𝐡𝐞 𝐢𝐦𝐩𝐚𝐜𝐭 𝐨𝐟 𝐠𝐞𝐧𝐞𝐫𝐚𝐭𝐢𝐯𝐞 𝐀𝐈 𝐨𝐧 𝐬𝐨𝐜𝐢𝐚𝐥 𝐦𝐞𝐝𝐢𝐚 ✌🏼

Anders Giovanni Møller @andersgiovanni.com · Jun 18

What actually happens on social media when users get access to integrated AI?🤖

In our new preprint, we present a controlled experiment testing the impact of generative AI (4 treatments + control) on a social media platform, using a representative sample of nearly 700 U.S. participants.

The Impact of Generative AI on Social Media: An Experimental Study

ai-research.andersgiovanni.com

July 22, 2025 at 8:12 AM
