Rui Chaves
@rpchaves.bsky.social
"Words, words, words." - Hamlet, Act II, scene ii.
(he/him)
Yet another example of how LLMs are not robust (a key objection long raised by proponents of symbolic AI): a small amount of 'bad' data can compromise models regardless of dataset or model size. The amount of poisoned data needed does not scale with model size. arxiv.org/pdf/2510.07192
October 14, 2025 at 7:05 PM
There is mounting evidence that LLMs rely heavily on brute-force and shallow heuristics to achieve high accuracy, despite some studies (though not all) finding evidence for sophisticated latent representations. Add another to the bunch:
www.arxiv.org/abs/2506.16678
Mechanisms vs. Outcomes: Probing for Syntax Fails to Explain Performance on Targeted Syntactic Evaluations
Large Language Models (LLMs) exhibit a robust mastery of syntax when processing and generating text. While this suggests internalized understanding of hierarchical syntax and dependency relations, the...
www.arxiv.org
June 30, 2025 at 8:27 PM
Reposted by Rui Chaves
Wow, yes. An amazing glimpse of the LLM Lie Machine at work. Anyone whose students are being tempted by LLMs, sharing this could be very helpful in showing how it lies, and how it fails.
June 3, 2025 at 4:00 PM
May 8, 2025 at 10:36 PM
Reposted by Rui Chaves
Register now for the LSA Linguistic Institute in Eugene, OR: over 80 courses across two 2.5-week terms (July 7-22 and July 24-August 8). I'm teaching Constructions & the Grammar of Context with E Francis in session 1! center.uoregon.edu/LSA/2025/
Linguistic Society of America Summer Institute
center.uoregon.edu
March 11, 2025 at 7:22 PM
This is the most recent job posting data from the Linguist List. Data and code: github.com/RuiPChaves/L...
January 1, 2025 at 3:02 PM
Cool paper showing that because LLMs' subword tokens carry a leading whitespace, they incorrectly estimate word probabilities (an improper distribution), and that this yields significantly different estimates of garden-path effects:
arxiv.org/abs/2406.10851
Leading Whitespaces of Language Models' Subword Vocabulary...
Word-by-word conditional probabilities from Transformer-based language models are increasingly being used to evaluate their predictions over minimal pairs or to model the incremental processing...
arxiv.org
August 15, 2024 at 2:18 PM
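A toy illustration of the confound (my own sketch with hypothetical numbers, not the paper's code), assuming a GPT-2-style vocabulary where word-initial subword tokens carry a leading space:

```python
# Toy sketch of why leading-whitespace tokens make naive word
# probabilities an improper distribution. In GPT-2-style vocabularies,
# a token like " cat" carries the space that ends the PREVIOUS word,
# so the boundary after "cat" is paid for by whichever token comes next.

# Hypothetical next-token distribution at some position:
probs = {
    " cat": 0.4,  # begins a new word
    " dog": 0.3,  # begins a new word
    "s":    0.2,  # continues the current word ("cat" -> "cats")
    ".":    0.1,  # punctuation, also terminates the current word
}

# Naive estimate: P(next word is "cat") = P(" cat").
naive_cat = probs[" cat"]                # 0.4
# But that mass also covers "cats": " cat" followed by "s",
# so "cats" gets double-counted under the naive scheme.
naive_cats = probs[" cat"] * probs["s"]  # 0.08, a subset of the 0.4 above

# Correction in the spirit of the paper: "cat" is a complete word only
# if the NEXT token begins a new word (leading space) or ends this one.
p_boundary = probs[" cat"] + probs[" dog"] + probs["."]  # 0.8
corrected_cat = probs[" cat"] * p_boundary               # 0.32
```

The naive estimate overstates P("cat") because it silently includes every continuation of "cat"; the boundary correction is what makes per-word probabilities sum to one.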
I can't miss this one. 😆
November 21, 2023 at 7:46 PM
I need a German word for when a 7yo opts to play Conway's Game of Life (in Golly) rather than the usual Nintendo games. He loves the glider gun...! And experimenting with all the chaos. But don't we all?
August 25, 2023 at 1:51 PM
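For the curious: the rule he's experimenting with fits in a few lines. A minimal Python sketch (Golly is vastly faster, of course), showing the glider's signature move of one cell diagonally every four generations:

```python
from collections import Counter
from itertools import product

def step(live):
    """One Game of Life generation over a set of live (row, col) cells."""
    # Count live neighbours of every cell adjacent to a live cell.
    counts = Counter(
        (r + dr, c + dc)
        for r, c in live
        for dr, dc in product((-1, 0, 1), repeat=2)
        if (dr, dc) != (0, 0)
    )
    # Birth on exactly 3 neighbours; survival on 2 or 3.
    return {cell for cell, n in counts.items()
            if n == 3 or (n == 2 and cell in live)}

# The glider: after 4 generations it reappears one cell down-right.
glider = {(0, 1), (1, 2), (2, 0), (2, 1), (2, 2)}
cells = glider
for _ in range(4):
    cells = step(cells)
# cells == {(r + 1, c + 1) for r, c in glider}
```

Feed the glider gun's cells to the same `step` function and the gliders stream out on their own.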