Kirill Maslinsky
@maslinych.bsky.social
Computational literary studies with a modicum of pure linguistics | research design, infrastructure and methods guy | open data enthusiast and curator | doing theory+engineering @ ERC Advanced project “Theory of tone” @ INALCO, Paris
Reposted by Kirill Maslinsky
How to model learning of Mandarin tones with no supervision, from raw sound, with a realistic model of human language learning?
The model learns the four tones (male only near perfectly) and also replicates stages of tone learning in language acquisition. What is more difficult to learn for children
September 23, 2025 at 3:54 PM
It may be helpful to think about LLMs as fiction generation machines (which they basically are), and treat all their output as fictional text, however realistic, rather than "hallucinations".
June 16, 2025 at 11:09 AM
Reposted by Kirill Maslinsky
I often show students this figure and ask, how different is the green distribution (p < 0.05) from the blue distribution (p = 0.10)? Just to raise some awareness that the difference between "statistically significant" and "not significant" is not always that significant...
June 3, 2025 at 6:57 AM
Reposted by Kirill Maslinsky
A scholar possessing a singular vision and a frightening working resilience, he was hounded by Stalin's repression machine, exiled, denied academic positions, firewood and food; his work was largely forgotten until the 2000s.

So we, who were influenced by him at the rise of DH, keep remembering.
March 26, 2025 at 12:43 PM
Reposted by Kirill Maslinsky
Boris Yarkho, a Moscow formalist, was born today in 1889; his work of the 1920s and 1930s fully anticipates computational literary studies: statistical methods used not for stylistics or attribution, but for questions of literary history and theory

Read him, if you haven't www.degruyter.com/document/doi...
Speech Distribution in Five-Act Tragedies (A Question of Classicism and Romanticism)
Article Speech Distribution in Five-Act Tragedies (A Question of Classicism and Romanticism) was published on March 1, 2019 in the journal Journal of Literary Theory (volume 13, issue 1).
www.degruyter.com
March 26, 2025 at 12:43 PM
As a gift for a patient reader, a graph showing the cohorts in terms of total print runs. *Graphs are better news
March 26, 2025 at 5:48 AM
I don't have pre-revolutionary data, so “cohorts” do not necessarily correspond to the true date of the translation of the author into Russian, esp. for “classics”. NA stands for books with no author indicated on a cover/title (folklore, collections). Data source: my dataset bsky.app/profile/masl...
March 26, 2025 at 5:48 AM
WWII was an obvious bottleneck for printing, including translations. Also to note: the rise in the number of translations during the Thaw, an effect that persisted until around 1976. And the Thaw indeed left its trace on further circulation of translations.
March 26, 2025 at 5:48 AM
no context graph: the number of translated books for children printed in Soviet Russia and USSR 1918-1984, split into “cohorts” by the moment a translated author first appears in the data. In red are mostly those “classics” who stay with us: Grimms, Andersen, Jules Verne etc.
March 26, 2025 at 5:48 AM
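A minimal sketch of how such cohorts could be computed, assuming each record is an (author, year) pair: every author is assigned to a cohort by the first year they appear in the data, and books are then counted per year and cohort. The toy records, the 10-year bin width, and all function names are hypothetical and are not taken from the dataset or the original graph.

```python
from collections import defaultdict

# Hypothetical toy records: (author, publication year) of translated books.
# None stands for a book with no author indicated (shown as NA in the graph).
books = [
    ("Author A", 1919), ("Author A", 1958),
    ("Author B", 1957), ("Author B", 1960),
    (None, 1957),
]

COHORT_WIDTH = 10  # assumed bin width in years; the real binning is not stated


def first_appearance(books):
    """Return the earliest year in which each named author appears."""
    first = {}
    for author, year in books:
        if author is None:
            continue
        first[author] = min(year, first.get(author, year))
    return first


def cohort_label(year, start=1918, width=COHORT_WIDTH):
    """Label the fixed-width bin (counted from `start`) containing `year`."""
    lo = start + (year - start) // width * width
    return f"{lo}-{lo + width - 1}"


def books_per_year_by_cohort(books):
    """Count books per publication year, split by the author's cohort."""
    first = first_appearance(books)
    table = defaultdict(lambda: defaultdict(int))
    for author, year in books:
        cohort = "NA" if author is None else cohort_label(first[author])
        table[year][cohort] += 1
    return table


table = books_per_year_by_cohort(books)
```

Because the cohort depends only on the first year observed in the data, an author translated before 1918 would still land in the earliest cohort in which they happen to appear.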
Reposted by Kirill Maslinsky
Along these lines, I recommend Carys Craig's "The AI-Copyright Trap," which argues (in my view convincingly) that copyright law is not actually academics' friend in a context in which big tech has more money than God:

papers.ssrn.com/sol3/papers....
March 22, 2025 at 2:05 PM
“Newness exists only in the minds of new up and coming researchers who didn’t live through it last time. To be really blunt, newness is just ignorance of the past.”
March 11, 2025 at 8:31 AM
The data is part of Daria's ongoing research, and she does wonderful things with it. As a teaser, here's Daria's graph showing cosine similarity between journals based on the poets who published there. Huge shoutout to Daria for sharing these data!
March 3, 2025 at 4:09 PM
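For readers unfamiliar with the measure, here is a minimal sketch of cosine similarity between journals, assuming each journal is represented by a vector counting how many times each poet was published there. The toy records and function names are hypothetical, and whether the original graph uses raw counts, binary presence, or some weighting is not stated in the post.

```python
import math
from collections import Counter

# Hypothetical toy records: one (journal, poet) pair per published poem.
records = [
    ("Journal A", "poet1"), ("Journal A", "poet2"), ("Journal A", "poet1"),
    ("Journal B", "poet1"), ("Journal B", "poet2"),
    ("Journal C", "poet3"),
]


def poet_counts(records):
    """Map each journal to a Counter of publications per poet."""
    counts = {}
    for journal, poet in records:
        counts.setdefault(journal, Counter())[poet] += 1
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


counts = poet_counts(records)
sim_ab = cosine(counts["Journal A"], counts["Journal B"])  # shared poets
sim_ac = cosine(counts["Journal A"], counts["Journal C"])  # no shared poets
```

Journals that publish the same poets in similar proportions score close to 1, while journals with no poets in common score 0.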
The data is published in the Repository of open data on Russian literature and folklore, doi.org/10.31860/ope.... The main table has an entry for every work published, and some info on authors, including party membership. Additional tables list editorial teams and the recipients of literary awards ↓
Contents of Soviet thick journals, 1955–1990 (Новый Мир, Октябрь, Наш Современник, Звезда, Знамя, Юность)
The database presents the authors and titles of works published in the literary journals «Новый мир», «Октябрь», «Знамя», «Звезда», «Наш С...
doi.org
March 3, 2025 at 4:09 PM
While the world is on fire, and datasets disappear here and there, we continue our modest effort to publish open data on Russian literature. This time, the contents of the Soviet “thick journals” 1955—1990, a dataset by Daria Franklin www.dariafranklin.com. See ↓ for the data
March 3, 2025 at 4:09 PM
a superficial similarity is also that both result in tables with asterisks
February 6, 2025 at 12:34 PM
thinking how optimality theory in phonology is like linear modeling in social science. A model you can use when you don't have any specific theory of language, really. Epicycles all the way down
February 6, 2025 at 12:34 PM
The database is accompanied by the theoretical framework that provides us with the toneme — a comparative concept that allows us to consistently analyze typologically diverse tonal systems. A sister poster at the same conf with concise presentation of the idea: zenodo.org/records/1481...
Toneme as a basic unit of tonology and criteria for its identification
This poster is a concise view of the theoretical framework for identifying phonological tonal inventories for the typological study of the tonal systems. We define basic comparative categories, of whi...
zenodo.org
February 5, 2025 at 8:43 PM
thot.huma-num.fr/db/ Interactive maps of languages colored by tonal status, sources for tonal status info, structured descriptions of tonal systems of a few sampled languages, accompanied by texts with detailed tonal markup.
February 5, 2025 at 8:37 PM
How many tonal languages are out there in the world? If you need an estimate based on the most comprehensive database to date, here it is: 42.7%. Concisely on a poster presented today at the #OCP22 conference in Amsterdam: zenodo.org/records/1481.... The database itself is online and has more ↓
February 5, 2025 at 8:37 PM
Reposted by Kirill Maslinsky
Did I mention these data are very special? The print runs of the editions were well documented throughout the Soviet period, and kept as part of bibliographic records. We have a good basis here to estimate total print runs, print runs by author, by gender etc.
December 27, 2024 at 7:51 PM
Of 14367 unique authors, 82% have a known gender, 26% have info on birth/death year, and 24.5% have a wikidata person ID. It may seem like not much, but authors with a known wikidata ID account for more than 65% of the total print runs of the whole period. →
December 27, 2024 at 7:51 PM
transformed into structured table data. The bibliography is the most comprehensive source on all books for children (fic and non-fic) printed in Soviet Russia and USSR. This year's edition includes a separate table of unique authors. Author data has undergone massive cleanup and disambiguation. →
December 27, 2024 at 7:51 PM
To all bibliographic data lovers (myself included) — a yearly Christmas update of the “Bibliography of Russian children's book 1918-1984” dataset: doi.org/10.31860/ope.... For those new to the show this dataset is based on the digitized 18-volume printed bibliography by Ivan Startsev →
Bibliography of Children's Books 1918–1984
A machine-readable bibliographic database of twentieth-century Russian children's books. The database is based on the 18-volume bibliographic index «Детская лите...
doi.org
December 27, 2024 at 7:51 PM
No context graph: the yearly proportion of the total print run of all books for children printed in Soviet Russia/USSR, split by gender of the author. Note the fluctuations in the share of female authors. 1931 marks the governmental ban on private publishers, 1941 the Nazi invasion. →
December 27, 2024 at 7:51 PM