Siyuan Song
siyuansong.bsky.social
Senior undergrad @ UTexas Linguistics
Looking for a Ph.D. position for Fall 2026
Comp Psycholing & CogSci, human-like AI, rock🎸 @growai.bsky.social
Prev:
Summer Research Visit @MIT BCS(2025), Harvard Psych(2024), Undergrad@SJTU(2022-24)
Opinions are my own.
Pinned
New preprint w/ @jennhu.bsky.social @kmahowald.bsky.social : Can LLMs introspect about their knowledge of language?
Across models and domains, we did not find evidence that LLMs have privileged access to their own predictions. 🧵(1/8)
String probability might be the best tool for assessing LMs' grammatical knowledge, yet it does not directly tell you 'how grammatical' a string is. Here's why and how we should use string probability and minimal pairs:
Excited to see this out - it's my great honor to be part of this amazing team!
New work to appear @ TACL!

Language models (LMs) are remarkably good at generating novel well-formed sentences, leading to claims that they have mastered grammar.

Yet they often assign higher probability to ungrammatical strings than to grammatical strings.

How can both things be true? 🧵👇
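The thread's point is that a raw string probability is not a grammaticality score, but the *comparison* within a minimal pair (two strings identical except for the phenomenon of interest) can be. As a toy illustration only (this is not the paper's method, and the corpus and bigram model here are made up for the example), the comparison looks like this:

```python
import math
from collections import defaultdict

# Hypothetical toy corpus; real evaluations use pretrained LMs, not bigram counts.
corpus = "the cat sleeps . the cats sleep . the dog sleeps . the dogs sleep .".split()

# Collect bigram counts from the corpus.
counts = defaultdict(lambda: defaultdict(int))
for a, b in zip(corpus, corpus[1:]):
    counts[a][b] += 1

vocab = set(corpus)

def logprob(sentence, alpha=0.1):
    """Add-alpha smoothed bigram log-probability of a whitespace-tokenized string."""
    toks = sentence.split()
    total = 0.0
    for a, b in zip(toks, toks[1:]):
        num = counts[a][b] + alpha
        den = sum(counts[a].values()) + alpha * len(vocab)
        total += math.log(num / den)
    return total

# Minimal pair: identical strings except for the agreement morpheme.
gram = "the cat sleeps"
ungram = "the cat sleep"

# The comparison within the pair, not either raw score, is the grammaticality signal:
# a rare-but-grammatical sentence can still score below a frequent ungrammatical one
# when compared across unrelated strings.
print(logprob(gram) > logprob(ungram))
```

Because the two strings share every bigram except `cat → sleeps` vs. `cat → sleep`, all confounds (length, word frequency) cancel out in the comparison.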
November 10, 2025 at 11:10 PM
Reposted by Siyuan Song
Oh cool! Excited this LM + construction paper was SAC-Highlighted! Check it out to see how LM-derived measures of statistical affinity separate out constructions with similar words like "I was so happy I saw you" vs "It was so big it fell over".
November 10, 2025 at 4:27 PM
Reposted by Siyuan Song
Delighted Sasha's (first year PhD!) work using mech interp to study complex syntax constructions won an Outstanding Paper Award at EMNLP!

Also delighted the ACL community continues to recognize unabashedly linguistic topics like filler-gaps... and the huge potential for LMs to inform such topics!
November 7, 2025 at 6:22 PM
Reposted by Siyuan Song
Interested in doing a PhD at the intersection of human and machine cognition? ✨ I'm recruiting students for Fall 2026! ✨

Topics of interest include pragmatics, metacognition, reasoning, & interpretability (in humans and AI).

Check out JHU's mentoring program (due 11/15) for help with your SoP 👇
The department of Cognitive Science @jhu.edu is seeking motivated students interested in joining our interdisciplinary PhD program! Applications due 1 Dec

Our PhD students also run an application mentoring program for prospective students. Mentoring requests due November 15.

tinyurl.com/2nrn4jf9
November 4, 2025 at 2:44 PM
Reposted by Siyuan Song
🧠 New at #NeurIPS2025!
🎵 We're far from the shallow now🎵
TL;DR: We introduce the first "reasoning embedding" and uncover its unique spatio-temporal pattern in the brain.

🔗 arxiv.org/abs/2510.228...
October 30, 2025 at 10:25 PM
Reposted by Siyuan Song
Introducing Global PIQA, a new multilingual benchmark for 100+ languages. This benchmark is the outcome of this year’s MRL shared task, in collaboration with 300+ researchers from 65 countries. This dataset evaluates physical commonsense reasoning in culturally relevant contexts.
October 29, 2025 at 3:50 PM
Reposted by Siyuan Song
Very excited to be going to Chicago for
@agnescallard.bsky.social's famous Night Owls next week! I'll be discussing my essay "ChatGPT and the Meaning of Life". Hope to see you there if you're local!
October 24, 2025 at 4:02 PM
Reposted by Siyuan Song
If I spill the tea—“Did you know Sue, Max’s gf, was a tennis champ?”—but then if you reply “They’re dating?!” I’d be a bit puzzled, since that’s not the main point! Humans can track what’s ‘at issue’ in conversation. How sensitive are LMs to this distinction?

New paper w/ @sangheekim.bsky.social!
October 21, 2025 at 2:02 PM
Reposted by Siyuan Song
I will be recruiting PhD students via Georgetown Linguistics this application cycle! Come join us in the PICoL (pronounced “pickle”) lab. We focus on psycholinguistics and cognitive modeling using LLMs. See the linked flyer for more details: bit.ly/3L3vcyA
October 21, 2025 at 9:52 PM
Reposted by Siyuan Song
"Although I hate leafy vegetables, I prefer daxes to blickets." Can you tell whether daxes are leafy vegetables? LMs can't seem to!

We investigate if LMs capture these inferences from connectives when they cannot rely on world knowledge.

New paper w/ Daniel, Will, @jessyjli.bsky.social
October 16, 2025 at 3:27 PM
Honored to get the chance to contribute to the Chinese dataset! And had a great time working with all the awesome collaborators!
🌍Introducing BabyBabelLM: A Multilingual Benchmark of Developmentally Plausible Training Data!

LLMs learn from vastly more data than humans ever experience. BabyLM challenges this paradigm by focusing on developmentally plausible data

We extend this effort to 45 new languages!
October 15, 2025 at 5:52 PM
Reposted by Siyuan Song
Excited to present this at COLM tomorrow! (Tuesday, 11:00 AM poster session)
One of the ways that LLMs can be inconsistent is the "generator-validator gap," where LLMs deem their own answers incorrect.

🎯 We demonstrate that ranking-based discriminator training can significantly reduce this gap, and improvements on one task often generalize to others!

🧵👇
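To make the "generator-validator gap" concrete: the gap is the set of cases where a model generates an answer but then, when asked to validate that same answer, rejects it. A minimal sketch with hypothetical score stubs (standing in for a real LM's generation and validation prompts; none of these names come from the paper):

```python
# Hypothetical generator scores: P(answer | question) under a generation prompt.
gen_scores = {
    "2+2=?": {"4": 0.9, "5": 0.1},
    "capital of France?": {"Paris": 0.6, "Lyon": 0.4},
}

# Hypothetical validator scores: P("yes" | "is <answer> correct for <question>?").
val_scores = {
    ("2+2=?", "4"): 0.95,
    ("capital of France?", "Paris"): 0.3,  # inconsistent: rejects its own top answer
}

def generator_validator_gap(threshold=0.5):
    """Fraction of questions where the validator rejects the generator's argmax answer."""
    rejected = 0
    for q, dist in gen_scores.items():
        top = max(dist, key=dist.get)       # what the model would generate
        if val_scores[(q, top)] < threshold:  # what the model says about its own answer
            rejected += 1
    return rejected / len(gen_scores)

print(generator_validator_gap())  # → 0.5: the model rejects its own answer on 1 of 2 questions
```

The discriminator training the post describes would push the validator scores for the model's own generations above the threshold, shrinking this fraction.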
October 6, 2025 at 3:21 PM
Reposted by Siyuan Song
I will be giving a short talk on this work at the COLM Interplay workshop on Friday (also to appear at EMNLP)!

Will be in Montreal all week and excited to chat about LM interpretability + its interaction with human cognition and ling theory.
A key hypothesis in the history of linguistics is that different constructions share underlying structure. We take advantage of recent advances in mechanistic interpretability to test this hypothesis in Language Models.

New work with @kmahowald.bsky.social and @cgpotts.bsky.social!

🧵👇!
October 6, 2025 at 12:05 PM
Reposted by Siyuan Song
Traveling to my first @colmweb.org🍁

Not presenting anything but here are two posters you should visit:

1. @qyao.bsky.social on Controlled rearing for direct and indirect evidence for datives (w/ me, @weissweiler.bsky.social and @kmahowald.bsky.social), W morning

Paper: arxiv.org/abs/2503.20850
Both Direct and Indirect Evidence Contribute to Dative Alternation Preferences in Language Models
October 6, 2025 at 3:22 PM
Reposted by Siyuan Song
On my way to #COLM2025 🍁

Check out jessyli.com/colm2025

QUDsim: Discourse templates in LLM stories arxiv.org/abs/2504.09373

EvalAgent: retrieval-based eval targeting implicit criteria arxiv.org/abs/2504.15219

RoboInstruct: code generation for robotics with simulators arxiv.org/abs/2405.20179
October 6, 2025 at 3:50 PM
Reposted by Siyuan Song
Heading to #COLM2025 to present my first paper w/ @jennhu.bsky.social @kmahowald.bsky.social !

When: Tuesday, 11 AM – 1 PM
Where: Poster #75

Happy to chat about my work and topics in computational linguistics & cogsci!

Also, I'm on the PhD application journey this cycle!

Paper info 👇:
New preprint w/ @jennhu.bsky.social @kmahowald.bsky.social : Can LLMs introspect about their knowledge of language?
Across models and domains, we did not find evidence that LLMs have privileged access to their own predictions. 🧵(1/8)
October 6, 2025 at 4:05 PM
Reposted by Siyuan Song
🤖 🧠 NEW BLOG POST 🧠 🤖

What skills do you need to be a successful researcher?

The list seems long: collaborating, writing, presenting, reviewing, etc

But I argue that many of these skills can be unified under a single overarching ability: theory of mind

rtmccoy.com/posts/theory...
September 30, 2025 at 3:14 PM
Reposted by Siyuan Song
The compling group at UT Austin (sites.utexas.edu/compling/) is looking for PhD students!

Come join me, @kmahowald.bsky.social, and @jessyjli.bsky.social as we tackle interesting research questions at the intersection of ling, cogsci, and ai!

Some topics I am particularly interested in:
September 30, 2025 at 4:17 PM
Reposted by Siyuan Song
Can AI aid scientists within their own workflows, when the scientists lack step-by-step procedures and may not know, in advance, what scientific utility a visualization would bring?

Check out @sebajoe.bsky.social’s feature on ✨AstroVisBench:
Exciting news! Introducing AstroVisBench: A Code Benchmark for Scientific Computing and Visualization in Astronomy!

A new benchmark developed by researchers at the NSF-Simons AI Institute for Cosmic Origins is testing how well LLMs implement scientific workflows in astronomy and visualize results.
September 25, 2025 at 8:52 PM
Reposted by Siyuan Song
Simon Goldstein and I have a new paper, “What does ChatGPT want? An interpretationist guide”.

The paper argues for three main claims.

philpapers.org/rec/GOLWDC-2 1/7
Simon Goldstein & Harvey Lederman, What Does ChatGPT Want? An Interpretationist Guide - PhilPapers
September 24, 2025 at 12:37 PM
Reposted by Siyuan Song
I did a QA with Quanta about interpretability and training dynamics! I got to talk about a bunch of research hobby horses and how I got into them.
September 24, 2025 at 1:57 PM
Reposted by Siyuan Song
Why does AI sometimes fail to generalize, and what might help? In a new paper (arxiv.org/abs/2509.16189), we highlight the latent learning gap — which unifies findings from language modeling to agent navigation — and suggest that episodic memory complements parametric learning to bridge it. Thread:
Latent learning: episodic memory complements parametric learning by enabling flexible reuse of experiences
September 22, 2025 at 4:21 AM
Reposted by Siyuan Song
Announcing the first (and perhaps only) Multilingual Minds and Machines Meeting! Come join us in Nijmegen, June 22-23, 2026, if you are interested in computational models of human multilingualism: mmmm2026.github.io
September 19, 2025 at 11:27 AM
Reposted by Siyuan Song
Did you know?

❌77% of language models on @hf.co are not tagged for any language
📈For 95% of languages, most models are multilingual
🚨88% of models with tags are trained on English

In a new blog post, @tylerachang.bsky.social and I dig into these trends and why they matter! 👇
September 19, 2025 at 2:53 PM