Jessy Li
@jessyjli.bsky.social
https://jessyli.com · Associate Professor, UT Austin Linguistics.
Part of UT Computational Linguistics https://sites.utexas.edu/compling/ and UT NLP https://www.nlp.utexas.edu/
Incredibly honored to serve as #EMNLP 2026 Program Chair along with @sunipadev.bsky.social and Hung-yi Lee, and General Chair @andre-t-martins.bsky.social. Looking forward to Budapest!!

(With thanks to Lisa Chuyuan Li who took this photo in Suzhou!)
November 8, 2025 at 2:39 AM
🚨 Does your LLM really understand code -- or is it just really good at remembering it?
We built **PLSemanticsBench** to find out.
The results: a wild mix.

✅The Brilliant:
Top reasoning models can execute complex, fuzzer-generated programs -- even with 5+ levels of nested loops! 🤯

❌The Brittle: 🧵
October 14, 2025 at 2:33 AM
On my way to #COLM2025 🍁

Check out jessyli.com/colm2025

QUDsim: Discourse templates in LLM stories arxiv.org/abs/2504.09373

EvalAgent: retrieval-based eval targeting implicit criteria arxiv.org/abs/2504.15219

RoboInstruct: code generation for robotics with simulators arxiv.org/abs/2405.20179
October 6, 2025 at 3:50 PM
The Echoes in AI paper showed quite the opposite, also using a story continuation setup.
Additionally, we present evidence that both *syntactic* and *discourse* diversity measures reveal strong homogenization that the lexical and cosine-similarity measures used in this paper do not capture.
August 12, 2025 at 9:01 PM
Finally, we consider whether LLMs can introspect (= directly rate the salience of questions), and whether those direct ratings correlate with their behavior and with human perceptions of salience. Surprisingly, LLM behavior only weakly correlates with both. (5/6)
February 21, 2025 at 6:15 PM
We can use CSMs to understand whether any two models have a similar notion of salience. Running a large-scale study with 13 LLMs on four genres (medicine, astrophysics, NLP, meetings), we find that each model's notion of salience is highly self-consistent, and also consistent across models. (4/6)
February 21, 2025 at 6:15 PM
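One way to quantify that cross-model agreement is rank correlation over per-question salience scores (e.g., row sums of the CSMs described later in this thread). A minimal sketch, assuming each model has already been reduced to one score per question; this is an illustration, not the paper's exact analysis:

```python
from scipy.stats import spearmanr

def salience_agreement(scores_a, scores_b):
    # Rank-correlate two models' per-question salience scores;
    # a high rho means the models share a notion of what matters.
    rho, _ = spearmanr(scores_a, scores_b)
    return rho
```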
Second, we conceptualize salience as the answerability of domain-relevant questions given the summaries: more important questions remain answerable even from shorter summaries. Our Content Salience Maps (CSMs) systematically track this. (3/6)
February 21, 2025 at 6:15 PM
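In sketch form, a CSM is just a question × budget grid of answerability judgments. Here `is_answerable` is a stand-in for a QA judge (e.g., another LLM verifying that an answer can be found) — an assumption for illustration, not the paper's exact implementation:

```python
import numpy as np

def content_salience_map(questions, summaries, is_answerable):
    # summaries: {word_budget: summary_text}
    # is_answerable(question, summary) -> bool  (hypothetical QA judge)
    budgets = sorted(summaries)
    csm = np.array([[int(is_answerable(q, summaries[b])) for b in budgets]
                    for q in questions])
    # Row sums rank questions by salience: the most important questions
    # stay answerable even from the shortest summaries.
    return budgets, csm
```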
First, we devise a behavioral probe: when we give LLMs an increasingly tight budget to summarize texts (e.g., 10, 20, 50, 100, 200 words), we posit that the least important (to the LLM) information is dropped first. 🔎 (2/6)
February 21, 2025 at 6:15 PM
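A minimal sketch of this probe, assuming a hypothetical `summarize(text, budget)` LLM wrapper and pre-annotated content units; simple string containment stands in for the answerability check used later in the thread:

```python
BUDGETS = [10, 20, 50, 100, 200]  # word budgets from the post above

def salience_by_survival(text, units, summarize):
    # A unit's salience ~ the tightest budget at which it still surfaces:
    # information the model deems important should survive even 10 words.
    summaries = {b: summarize(text, b).lower() for b in BUDGETS}
    return {
        u: min((b for b in BUDGETS if u.lower() in summaries[b]),
               default=float("inf"))  # dropped even at 200 words
        for u in units
    }
```

Lower numbers mean higher salience under this reading: a unit that survives the 10-word summary is what the model protects first.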
Do you want to know what information LLMs prioritize in text synthesis tasks? Here's a short 🧵 about our new paper, led by Jan Trienes: an interpretable framework for salience analysis in LLMs.

First of all, information salience is a fuzzy concept. So how can we even measure it? (1/6)
February 21, 2025 at 6:15 PM
We at UT Linguistics are hiring for 🔥 2 faculty positions in Computational Linguistics! Assistant or Associate Professor, deadline Dec 1.
UT has a super vibrant comp ling & #nlp community!!

Apply here 👉 apply.interfolio.com/158280
November 19, 2024 at 10:45 PM
Wednesday at #EMNLP: @yatingwu.bsky.social will present our work connecting curiosity and questions in discourse. We built strong models to predict salience, outperforming large LLMs.

👉[Oral] Discourse+Phonology+Syntax2 10:30-12:00 @ Flagler

also w/ Ritika Mangla @gregdnlp.bsky.social Alex Dimakis
November 13, 2024 at 5:45 AM
The 2nd Disinformation Day @ UT Austin is a 🔥 event with experts from AI/NLP, Linguistics, Philosophy, and media; join us in this virtual event (for free)! Details & registration at disinfoday.github.io
April 23, 2024 at 5:15 PM
To appear #EMNLP2023! Can LMs simplify medical texts in non-English languages? We introduce⚕️MultiCochrane: the *first* multilingual, aligned dataset for this. arxiv.org/abs/2305.12532. Led by Sebastian Joseph, also w/ @byron.bsky.social Wei Xu
November 14, 2023 at 11:31 PM
Psych theories suggest that how we judge a situation (*cognitive appraisal*) leads to diverse emotions. Our #EMNLP 2023 Findings paper tests LLMs' ability to assess and explain such appraisals -- revealing a big gap between open-source LLMs and GPT-3.5.

Paper arxiv.org/abs/2310.14389 w Hongli Zhan, Desmond Ong
November 1, 2023 at 8:22 PM
Can we use LLMs to help disseminate medical information more broadly? @byron.bsky.social, Mike Mackert, Wei Xu and I are hosting an online panel today at the HARC conference
at 4:30 EST/3:30 CST on Simplifying Medical Texts with Large Language Models! harcconf.org/agenda-monda...
October 30, 2023 at 2:21 PM
To appear at EMNLP 2023: simplifying text involves explaining and elaborating on concepts. Using QUDs in a question generation -> answering pipeline leads to much better generation of such elaborations!

arxiv.org/abs/2305.10387

w/ @yatingwu.bsky.social Will Sheffield @kmahowald.bsky.social
October 25, 2023 at 11:57 PM
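The pipeline above can be sketched in a few lines; `llm` is a hypothetical completion function and the prompts are illustrative, not the paper's:

```python
def elaborate(context, concept, llm):
    # Step 1 (question generation): what QUD would a reader raise when
    # hitting an unfamiliar concept in the text?
    qud = llm(f"A reader of this text just encountered '{concept}'. "
              f"What question would they ask?\n\nText: {context}")
    # Step 2 (question answering): answer that QUD in plain language;
    # the answer becomes the elaboration woven into the simplified text.
    return llm(f"Answer in one plain-language sentence: {qud}\n\nText: {context}")
```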
📢Our EMNLP 2023 work on Questions Under Discussion (QUD)! We introduce QUDeval, the first benchmark for evaluating QUD parsing -- the generation of open-ended questions -- using linguistic principles.

Paper: arxiv.org/abs/2310.14520

w/ @yatingwu.bsky.social, Ritika Mangla, @gregdnlp.bsky.social
October 24, 2023 at 10:22 PM