Roger Levy
@rplevy.bsky.social
Director, MIT Computational Psycholinguistics Lab. President, Cognitive Science Society. Chair of the MIT Faculty. Open access & open science advocate. He.

Lab webpage: http://cpl.mit.edu/
Personal webpage: https://www.mit.edu/~rplevy
Helps to be a linguist!
November 14, 2024 at 11:45 AM
Not quite sure what you mean by a “complete” corpus. I do think the basic philosophical assumptions of frequentist probability are applicable to corpora, using the large-numbers-of-native-speakers thought experiment.

And productivity is a property of the asymptotic distribution, if I'm understanding you correctly.
November 10, 2024 at 12:19 PM
If there were enough native speakers of the language living at once, you'd quickly get enough instances of the prefix for relative-frequency estimation of the next-token distribution. Too few humans are alive for this in practice, but that's not a problem for the theoretical validity of the construct!
November 9, 2024 at 11:57 PM
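A minimal sketch of the thought experiment. It assumes a made-up "true" next-token distribution for one prefix and treats each speaker as contributing one independent continuation. Under those assumptions, the relative-frequency estimate (count of each continuation divided by the total) converges on the assumed distribution as the number of speakers grows. The prefix, tokens, probabilities, and function names are all hypothetical, chosen only for illustration.

```python
import random
from collections import Counter


def relative_frequency(continuations):
    """Estimate P(next token | prefix) as count / total over observed continuations."""
    counts = Counter(continuations)
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()}


def simulate_speakers(true_dist, n_speakers, seed=0):
    """Hypothetical: each 'speaker' produces one continuation of the same prefix,
    sampled independently from an assumed true next-token distribution."""
    rng = random.Random(seed)
    tokens = list(true_dist)
    weights = [true_dist[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=n_speakers)


# Hypothetical true distribution for the prefix "I'd like a cup of" (made-up numbers).
TRUE_DIST = {"coffee": 0.5, "tea": 0.3, "water": 0.2}

if __name__ == "__main__":
    for n in (10, 1_000, 100_000):
        est = relative_frequency(simulate_speakers(TRUE_DIST, n))
        # Largest gap between the estimate and the assumed true probability.
        max_err = max(abs(est.get(t, 0.0) - p) for t, p in TRUE_DIST.items())
        print(f"N={n:>7}: max abs error = {max_err:.4f}")
```

The error shrinks roughly in proportion to 1/sqrt(N). The obstacle is simply producing a large enough N for a given prefix, not the validity of the quantity being estimated.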
You might be interested in this paper we did some time ago!

escholarship.org/content/qt69...

It supports your conjecture that, insofar as we think the “true distribution” is a valid theoretical construct (which I consider a highly defensible position), large-N Cloze would not give it to us.
November 9, 2024 at 11:03 PM
“This significant effect was found using a post hoc weighting procedure aligned with our overarching hypothesis”?!?
November 10, 2023 at 1:05 PM