Jennifer Hu
@jennhu.bsky.social
Asst Prof at Johns Hopkins Cognitive Science • Director of the Group for Language and Intelligence (GLINT) ✨• Interested in all things language, cognition, and AI

jennhu.github.io
Yeah exactly, @kanishka.bsky.social -- in examples like yours above, if we assume that g=1 and those strings aren't likely to be ungrammatical realizations of some other messages, then diffs in p(string) will reflect diffs in p(m). Which is what we want, no?
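A minimal numerical sketch of that reduction, with made-up probabilities (nothing from the paper):

```python
# Made-up numbers: if both strings are grammatical realizations (g=1) of their
# messages, and neither is a likely ungrammatical realization of anything else,
# then p(string) ~= p(m) * p(g=1), and the p(g=1) factor cancels in comparisons.
p_g1 = 0.95                        # hypothetical P(g=1)
p_m1, p_m2 = 0.02, 0.005           # hypothetical message probabilities

p_s1 = p_m1 * p_g1                 # string 1: unique grammatical realization of m1
p_s2 = p_m2 * p_g1                 # string 2: unique grammatical realization of m2

print(p_s1 / p_s2, p_m1 / p_m2)    # both ~4.0: diffs in p(string) track diffs in p(m)
```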
November 11, 2025 at 4:17 PM
This work was done with an amazing team: @wegotlieb.bsky.social, @siyuansong.bsky.social, @kmahowald.bsky.social, @rplevy.bsky.social

Preprint (pre-TACL version): arxiv.org/abs/2510.16227

10/10
November 10, 2025 at 10:11 PM
Our work also raises new Qs. If LMs virtually always produce grammatical strings, then why is there so much overlap between the probs assigned to grammatical/ungrammatical strings?

This connects to tensions btwn language generation/identification (e.g., openreview.net/forum?id=FGT...)
9/10
November 10, 2025 at 10:11 PM
An offshoot of our analysis: if you use minimal pairs that are not tightly controlled, you risk underestimating the grammatical competence of models, due to differences in underlying message probabilities. 8/10
November 10, 2025 at 10:11 PM
As mentioned above, Prediction #3 shows that the overlap in probabilities across gram/ungram strings -- the target of recent criticism -- should NOT be interpreted as a failure of probability to tell us about grammaticality.

This overlap is to be expected if prob is influenced by factors other than gram. 7/10
November 10, 2025 at 10:11 PM
We use our framework to derive 3 predictions, which we validate empirically:

1. Correlation btwn the probs of strings within minimal pairs

2. Correlation btwn LMs’ and humans’ deltas within minimal pairs

3. Poor separation btwn probs of unpaired grammatical and ungrammatical strings (sketch below)

6/10
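A rough sketch of how checks in the spirit of predictions 1 and 3 could be run with an off-the-shelf causal LM. GPT-2 and the example sentences here are stand-ins I've made up, not the paper's models or stimuli:

```python
# Sketch: score strings with a causal LM and compare paired vs. unpaired behavior.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def logprob(text):
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss      # mean NLL over predicted tokens
    return -loss.item() * (ids.shape[1] - 1)    # total log-prob (after the first token)

pairs = [  # (grammatical, ungrammatical) minimal pairs -- same underlying message m
    ("The keys to the cabinet are on the table.",
     "The keys to the cabinet is on the table."),
    ("No author that liked the book has read it.",
     "No author that liked the book have read it."),
]

gram = [logprob(g) for g, _ in pairs]
ungram = [logprob(u) for _, u in pairs]

# Prediction 1-style check: within pairs, the grammatical member should usually win,
# and the two members' scores should track each other across pairs.
paired_acc = sum(g > u for g, u in zip(gram, ungram)) / len(pairs)

# Prediction 3-style check: across *unpaired* comparisons (different messages),
# separation should be poor -- grammatical strings won't always score higher.
unpaired = [g > u for i, g in enumerate(gram)
            for j, u in enumerate(ungram) if i != j]
unpaired_acc = sum(unpaired) / len(unpaired)

print(paired_acc, unpaired_acc)
```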
November 10, 2025 at 10:11 PM
In other words, when messages aren’t controlled for, gram strings won't always be more probable than ungram strings.

This phenomenon has previously been used to argue that probability is a bad tool for measuring grammatical knowledge -- but in fact, it follows directly from our framework! 5/10
November 10, 2025 at 10:11 PM
Minimal pairs are pairs of strings with the same underlying m but different values of g.

Good LMs have low P(g=0), so they prefer the grammatical string in the minimal pair.

But for non-minimal string pairs with different underlying messages, differences in P(m) can overwhelm even good LMs. 4/10
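A back-of-the-envelope version of that last point, with made-up numbers (not from the paper):

```python
# Made-up numbers: a frequent message realized *ungrammatically* can still
# score higher than a rare message realized *grammatically*.
p_g0 = 0.05                              # hypothetical P(g=0): LM rarely goes ungrammatical
p_m_frequent, p_m_rare = 0.10, 0.0001    # hypothetical message probabilities

p_ungram_frequent = p_m_frequent * p_g0  # 0.005
p_gram_rare = p_m_rare * (1 - p_g0)      # 0.000095

print(p_ungram_frequent > p_gram_rare)   # True: the P(m) gap overwhelms the P(g) preference

# Within a minimal pair, m is shared, so the comparison isolates g:
p_m_shared = 0.01
print(p_m_shared * (1 - p_g0) > p_m_shared * p_g0)   # True: grammatical member wins
```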
November 10, 2025 at 10:11 PM
Returning to first principles:

In our framework, the probability of a string comes from two latent variables: m, the message to be conveyed; and g, whether the message is realized grammatically.

Ungrammatical strings get probability mass when g=0: the message is not realized grammatically. 3/10
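A toy instantiation of that generative story (my own parameterization and example strings, not the paper's): the probability of a string marginalizes over m and g, and ungrammatical strings only pick up mass through g=0 paths.

```python
# Toy sketch: P(s) = sum over m, g of P(m) * P(g) * P(s | m, g).
# All numbers and strings below are invented for illustration.
p_m = {"KEYS-ON-TABLE": 0.7, "DOG-CHASED-CAT": 0.3}   # message prior
p_g = {1: 0.95, 0: 0.05}                              # g=1: realized grammatically

p_s_given_mg = {                                      # realization distributions P(s | m, g)
    ("KEYS-ON-TABLE", 1): {"the keys are on the table": 1.0},
    ("KEYS-ON-TABLE", 0): {"the keys is on the table": 1.0},
    ("DOG-CHASED-CAT", 1): {"the dog chased the cat": 1.0},
    ("DOG-CHASED-CAT", 0): {"the dog chased cat the": 1.0},
}

def p_string(s):
    return sum(p_m[m] * p_g[g] * dist.get(s, 0.0)
               for (m, g), dist in p_s_given_mg.items())

print(p_string("the keys are on the table"))   # = 0.7 * 0.95 (g=1 path)
print(p_string("the keys is on the table"))    # = 0.7 * 0.05 (g=0 path only)
```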
November 10, 2025 at 10:11 PM
Here we develop and give evidence for a formal framework that reconciles these two observations.

Our framework provides theoretical justification for the widespread practice of using *minimal pairs* to test what grammatical generalizations LMs have acquired. 2/10
November 10, 2025 at 10:11 PM
Join us at NeurIPS in San Diego this December for talks by experts in the field, including James McClelland, @cgpotts.bsky.social, @scychan.bsky.social, @ari-holtzman.bsky.social, @mtoneva.bsky.social, & @sydneylevine.bsky.social!

🗓️ Submit your 4-page paper (non-archival) by August 15!

4/4
July 16, 2025 at 1:08 PM
We're bringing together researchers in fields such as machine learning, psychology, linguistics, and neuroscience to discuss new empirical findings + theories which help us interpret high-level cognitive abilities in deep learning models.

3/4
July 16, 2025 at 1:08 PM
Deep learning models (e.g. LLMs) show impressive abilities. But what generalizations have these models acquired? What algorithms underlie model behaviors? And how do these abilities develop?

Cognitive science offers a rich body of theories and frameworks which can help answer these questions.

2/4
July 16, 2025 at 1:08 PM
Preprint link: arxiv.org/abs/2504.14107

A huge thank you to my amazing collaborators Michael Lepori (@michael-lepori.bsky.social) & Michael Franke (@meanwhileina.bsky.social)!

(12/12)
Signatures of human-like processing in Transformer forward passes (arxiv.org)
May 20, 2025 at 2:26 PM