Sascha Wolfer
sascha-wolfer.bsky.social
Linguist @ IDS Mannheim

interested in language, numbers, the mind and sometimes dictionaries

owned by a fluffy dog

Also on Mastodon: @sascha_wolfer@fediscience.org
When I saw the voting booth, I was really sad that I'd voted by mail as a precaution.
October 22, 2025 at 6:58 PM
At least it's not the website of a German hotel... but on the substance, I agree with you completely!
October 21, 2025 at 8:04 AM
The "significant" in this quote should be "important", sorry.
October 18, 2025 at 7:53 AM
Reposted by Sascha Wolfer
Here's a little teaser: the new version of #OWIDplusLIVE has been online for a few days (more soon). Since 2020, we (together with @sascha-wolfer.bsky.social) have been recording daily token, bigram and trigram counts from selected German-language RSS feeds. www.owid.de/plus/live-20...
October 17, 2025 at 9:26 PM
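Counting tokens, bigrams and trigrams boils down to sliding a window of length 1, 2 or 3 over each tokenised text and tallying the windows. The sketch below is a minimal illustration under stated assumptions: whitespace tokenisation, lowercasing, and made-up feed items. It is not the OWIDplusLIVE pipeline, whose tokenisation and filtering the post doesn't describe; `ngrams` and `count_ngrams` are hypothetical helper names.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-token sequences in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_ngrams(texts, max_n=3):
    """Count unigrams, bigrams and trigrams over a batch of texts.

    Whitespace tokenisation and lowercasing are simplifying assumptions;
    a real pipeline would use a proper German tokeniser.
    """
    counts = {n: Counter() for n in range(1, max_n + 1)}
    for text in texts:
        tokens = text.lower().split()
        for n in counts:
            counts[n].update(ngrams(tokens, n))
    return counts


# Hypothetical feed items for one day (made up, not OWIDplusLIVE data)
items = ["Die Wahl ist vorbei", "Die Wahl war knapp"]
counts = count_ngrams(items)
print(counts[2].most_common(1))
```

Running this once per day over that day's batch of feed items would yield the kind of daily frequency series the post describes.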
I think that's a classic fuzzy-boundaries effect. If you take this overview seriously (de.wikipedia.org/wiki/Baby-Bo...), people who are 60 today sit right at the boundary between Boomers and Gen X. I have a similar "problem": born in 1981, I'm somewhere between Gen X and Millennials. 🤷‍♂️
Baby-Boomer – Wikipedia
de.wikipedia.org
October 15, 2025 at 10:53 AM
Which labels exactly don't exist for older people? The ones with Greek letters? Other labels certainly do exist: Silent Generation/World War generation, Boomers, X, Y, Z. Presumably they simply ran out of letters in the Latin alphabet 😉
October 15, 2025 at 8:52 AM
In other words: we model relationships – Xia & Lindell summarise them into one number per language.
October 10, 2025 at 6:01 AM
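To make "one number per language" concrete: one generic way to collapse a matrix of pairwise relationships into a single score per language is to take its leading eigenvector. The sketch below does this by power iteration on a small, hypothetical similarity matrix (invented values, two made-up families). It is an illustration of the general idea only, not a reconstruction of Xia & Lindell's procedure, which the posts don't specify in detail.

```python
import math

# Hypothetical similarity matrix for five languages (NOT real data):
# languages 0-2 form one family, 3-4 another; small off-block values mimic weak contact.
similarity = [
    [1.0, 0.8, 0.7, 0.1, 0.1],
    [0.8, 1.0, 0.75, 0.1, 0.1],
    [0.7, 0.75, 1.0, 0.1, 0.1],
    [0.1, 0.1, 0.1, 1.0, 0.9],
    [0.1, 0.1, 0.1, 0.9, 1.0],
]


def leading_eigenvector(matrix, iterations=200):
    """Power iteration: repeatedly multiply a vector by the matrix and renormalise it."""
    n = len(matrix)
    vec = [1.0] * n
    for _ in range(iterations):
        product = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in product))
        vec = [x / norm for x in product]
    return vec


scores = leading_eigenvector(similarity)
print([round(s, 3) for s in scores])  # one number per language
```

The full 5 × 5 structure (who is related to whom, and how strongly) is reduced to a single vector of five numbers, which is the contrast the thread draws with explicitly modelling family and region.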
The real test is whether a mixed model that explicitly represents phylogeny and geography performs worse than their alternative, where the entire shared history of languages and environments is effectively collapsed into a single dimension (an eigenvector).
October 10, 2025 at 6:01 AM
So while Xia & Lindell insist that "autocorrelation due to relationships and distance cannot be captured in family or regional-level analyses", we see that as an empirical question – and we treated it as one.
October 10, 2025 at 6:01 AM
The outcome is well aligned with genealogy, showing that family membership captures something genuinely informative about the process. When the model finds that family explains a large share of the variance, that's not a failure; it's evidence that phylogenetic structure dominates the pattern.
October 10, 2025 at 6:01 AM
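One common way to express "family explains a large share of the variance" in a logistic random-intercept model is the latent-scale intraclass correlation, which compares the family-level variance with the fixed variance of the standard logistic distribution (π²/3). The sketch below computes it for a few hypothetical variance values. Whether this particular measure was used in the analyses discussed here is not stated in the posts; `latent_icc` is a hypothetical helper name.

```python
import math


def latent_icc(group_variance):
    """Share of latent-scale variance attributable to a grouping factor
    in a logistic random-intercept model (level-1 variance fixed at pi^2 / 3)."""
    residual_variance = math.pi ** 2 / 3  # variance of the standard logistic distribution
    return group_variance / (group_variance + residual_variance)


# Hypothetical family-level variances, not estimates from the study
for var in (0.5, 2.0, 8.0):
    print(var, round(latent_icc(var), 3))
```

A large family variance pushes this share towards 1, which is the situation described above: most of the variation sits between families rather than within them.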
Finally, what Xia & Lindell call a "separation problem" is, in our view, a feature of our approach and not a bug.

If, e.g., all languages in a family are polysynthetic (or none are), that’s not a statistical artefact – it’s the signal.
October 10, 2025 at 6:01 AM
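The situation described here, families whose members all share the same outcome, is easy to detect directly. The sketch below flags such families in a small, hypothetical dataset (invented family names and values). In a plain logistic regression with family entered as a fixed effect, such families would make the maximum-likelihood estimates diverge; the sketch only identifies them and makes no claim about how either side's models handle them. `constant_outcome_families` is a hypothetical helper name.

```python
from collections import defaultdict

# Hypothetical toy data (NOT the study's data): (family, is_polysynthetic)
languages = [
    ("Fam1", True), ("Fam1", True), ("Fam1", True),
    ("Fam2", False), ("Fam2", False),
    ("Fam3", True), ("Fam3", False), ("Fam3", False),
]


def constant_outcome_families(rows):
    """Return the families in which every language has the same outcome (all 1s or all 0s)."""
    outcomes = defaultdict(set)
    for family, poly in rows:
        outcomes[family].add(poly)
    return sorted(f for f, values in outcomes.items() if len(values) == 1)


print(constant_outcome_families(languages))
```

Here Fam1 (all polysynthetic) and Fam2 (none polysynthetic) are flagged, while the mixed Fam3 is not.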
A negative global association arises because polysynthetic languages are concentrated in regions with smaller overall populations, even though within regions the relationship is positive. Once we account for that structure, as our mixed logit models do, the supposed "global" negative effect reverses direction.
October 10, 2025 at 6:01 AM
However, if we compare within each of these three regions, polysynthetic languages have a higher median L1_population size than non-polysynthetic ones. Might this pattern point towards a classic Simpson's paradox?
October 10, 2025 at 6:01 AM
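A toy example shows how such a reversal can arise from group medians alone. The numbers below are invented for illustration (two hypothetical regions, "A" with small populations and mostly polysynthetic languages, "B" with large populations and mostly non-polysynthetic ones); they are not the study's data. Within each region polysynthetic languages have the higher median, yet pooled across regions they have the lower one.

```python
from statistics import median

# Hypothetical toy data (NOT the study's data): (region, is_polysynthetic, L1 speakers)
languages = [
    ("A", True, 100), ("A", True, 200), ("A", True, 300), ("A", True, 400), ("A", True, 500),
    ("A", False, 50), ("A", False, 150),
    ("B", True, 20000),
    ("B", False, 5000), ("B", False, 8000), ("B", False, 10000), ("B", False, 12000), ("B", False, 15000),
]


def median_pop(rows, poly, region=None):
    """Median L1 population for one polysynthesis class, optionally within a single region."""
    return median(pop for reg, p, pop in rows if p == poly and (region is None or reg == region))


pooled = (median_pop(languages, True), median_pop(languages, False))
within = {r: (median_pop(languages, True, r), median_pop(languages, False, r)) for r in ("A", "B")}

print("pooled (poly, non-poly):", pooled)
for r, (poly, non) in within.items():
    print(f"region {r} (poly, non-poly): ({poly}, {non})")
```

The pooled comparison is driven by where the polysynthetic languages are concentrated, not by the within-region relationship.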
Eyeballing Figure 1 of their response actually seems to support this: the three subregions in the Americas contain nearly 80 % of all polysynthetic languages. In each of them, the median population size lies below the global median.
October 10, 2025 at 6:01 AM
... not one to be decided by assertion. Take, for instance, our finding that once random effects for either subregion or language family are included, the estimated effect of L1_population reverses direction, from the negative value reported by Xia & Lindell to a positive one.
October 10, 2025 at 6:01 AM
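In generic form, a mixed logit with a random intercept for subregion or family can be written as below. This is an assumed, simplified specification for illustration; the posts don't spell out the exact predictors, transformations or further terms used.

```latex
\operatorname{logit} \Pr(y_i = 1) = \beta_0 + \beta_1 x_i + u_{g[i]},
\qquad u_g \sim \mathcal{N}(0, \sigma_u^2)
```

Here $y_i = 1$ if language $i$ is polysynthetic, $x_i$ stands for its L1_population (any transformation is not stated), and $g[i]$ indexes its subregion or family. The reversal concerns the sign of $\beta_1$ once the group intercepts $u_g$ are included.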
... we don't think it's justified simply to assert, as Xia & Lindell do, that our results are "counterintuitive", that the fixed-effect estimates are "unreliable" and that the high model fits are "unrealistic." Whether a mixed model better captures the data-generating process is ultimately an empirical question, ...
October 10, 2025 at 6:01 AM
... showing that in their re-analysis the PIPs for Small_Family (0.085 and 0.300) are clearly reduced.

One other thing, while we don't claim that our mixed-effects logit model is the perfect way to account for non-independence between languages, ...
October 10, 2025 at 6:01 AM
And btw: it's not enough to simply subtract 0.5 from their original PIP values to make them comparable to the 0–1 scale used in the response. The difference must also be divided by (1 – 0.5). Correctly scaled, the original PIPs are 0.114 (for Polysynthesis) and 0.588 (for Extended), ...
October 10, 2025 at 6:01 AM
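The rescaling described in this post maps the interval [0.5, 1] onto [0, 1]: subtract the baseline, then divide by the width of the interval, (PIP − 0.5) / (1 − 0.5). The sketch below contrasts this with subtracting 0.5 alone, using illustrative inputs rather than the values from either paper; `rescale_pip` and `naive_shift` are hypothetical helper names.

```python
def rescale_pip(pip, baseline=0.5):
    """Map a value on [baseline, 1] onto [0, 1]: (pip - baseline) / (1 - baseline)."""
    return (pip - baseline) / (1 - baseline)


def naive_shift(pip, baseline=0.5):
    """Subtracting the baseline alone only reaches [0, 1 - baseline], i.e. [0, 0.5] here."""
    return pip - baseline


# Illustrative inputs only, not the PIPs reported in either paper
for pip in (0.5, 0.75, 1.0):
    print(pip, naive_shift(pip), rescale_pip(pip))
```

With the naive shift, even a PIP of 1.0 only reaches 0.5, so shifted values are systematically too small by a factor of two compared with the 0–1 scale.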
... how this could be taken as support for their earlier statement that "different measures of language isolation – social, physical and *phylogenetic* – are *significant* predictors of polysynthesis."
October 10, 2025 at 6:01 AM
... the variable is rarely included in the best-supported models and its estimated effect is highly uncertain – essentially indistinguishable from zero. We therefore still struggle to see ...
October 10, 2025 at 6:01 AM
In the Polysynthesis analysis, Small_Family has a minuscule averaged effect estimate (0.02) with a standard error more than four times larger (0.085). Together with a posterior inclusion probability (PIP) of just 0.085, this means ...
October 10, 2025 at 6:01 AM
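The arithmetic in this post can be checked in a few lines, using the estimate (0.02) and standard error (0.085) quoted above. The approximate 95% interval relies on a normal (Wald) approximation, which is an assumption made here for illustration; model-averaged estimates need not be normally distributed, and the post itself reports only the estimate, the standard error and the PIP.

```python
# Figures quoted in the post for Small_Family (Polysynthesis analysis)
estimate = 0.02
std_error = 0.085

ratio = std_error / estimate  # how many times larger the SE is than the estimate
z = estimate / std_error      # Wald-type z statistic (normal approximation assumed)
lower = estimate - 1.96 * std_error
upper = estimate + 1.96 * std_error

print(f"SE/estimate = {ratio:.2f}, z = {z:.2f}, approx. 95% interval = [{lower:.3f}, {upper:.3f}]")
```

The standard error is roughly 4.25 times the estimate, and the rough interval comfortably spans zero, consistent with calling the effect essentially indistinguishable from zero.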
Xia & Lindell have also published a response (doi.org/10.1073/pnas...) – unsurprisingly, we don’t agree with most of their arguments. What puzzles us most is their claim that the re-analysis (their Table 1) "strengthens [their] conclusions." On the contrary:
PNAS
Proceedings of the National Academy of Sciences (PNAS), a peer reviewed journal of the National Academy of Sciences (NAS) - an authoritative source of high-impact, original research that broadly spans...
doi.org
October 10, 2025 at 6:01 AM