Alessio Miaschi
@alessiomiaschi.bsky.social
🎓 Full-time Researcher (RTD) at ItaliaNLP Lab, Institute for Computational Linguistics "A. Zampolli" (CNR-ILC) #NLProc

https://alemiaschi.github.io/
All the details about the task are available here 👉 sites.google.com/view/crucive...
📅 Training data release: 22 September 2026
Cruciverb-IT
Overview: Cruciverb-IT is the first shared task proposed at EVALITA 2026 on crossword puzzle solving. We propose two tasks: i) answering clues extracted from Italian crosswords; ii) autonomously solvin...
September 15, 2025 at 10:22 AM
📂 Code available at the following repository: github.com/snizio/Beyon...
🧵(5/5)
July 24, 2025 at 9:23 AM
✅ Larger models develop more robust substring awareness.
✅ Morphemes are recognized better than meaningless substrings.
✅ Awareness emerges early for suffixes and roots, later for non-morphemic units.
✅ Productivity, word frequency and tokenization shape this ability.
🧵(4/5)
July 24, 2025 at 9:22 AM
🧪 We design a controlled binary task asking models whether a substring appears in a word. Using MorphoLex, we evaluate models from the Pythia family across:
- substring position and length;
- morphemic vs. non-morphemic substrings;
- pre-training checkpoints.
🧵(3/5)
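The controlled binary task above can be sketched in a few lines. This is an illustrative reconstruction only: the function names, the sampling strategy, and the probe fields (`position`, `length`) are my assumptions, not the authors' exact setup.

```python
# Hypothetical sketch of building binary substring-awareness probes:
# for each word, pair one true substring with one random distractor.
import random


def make_probe(word: str, substring: str) -> dict:
    """One binary item: does `substring` occur in `word`?"""
    start = word.find(substring)
    return {
        "word": word,
        "substring": substring,
        "label": start != -1,   # gold answer for the yes/no question
        "position": start,      # -1 if the substring is absent
        "length": len(substring),
    }


def build_probes(words, n_neg_chars=3, seed=0):
    """Pair each word with a positive and a negative probe."""
    rng = random.Random(seed)
    probes = []
    for w in words:
        # positive: a real substring sampled from the word itself
        i = rng.randrange(len(w))
        j = rng.randrange(i + 1, len(w) + 1)
        probes.append(make_probe(w, w[i:j]))
        # negative: a random letter string not contained in the word
        while True:
            neg = "".join(rng.choice("abcdefghijklmnopqrstuvwxyz")
                          for _ in range(n_neg_chars))
            if neg not in w:
                break
        probes.append(make_probe(w, neg))
    return probes
```

In the actual study, positive substrings would additionally be split into morphemic units (from MorphoLex) vs. meaningless character spans, and probes would be posed to Pythia checkpoints as yes/no questions.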
July 24, 2025 at 9:22 AM
LMs operate on subword tokens and lack explicit access to characters. Despite so, they show a limited ability to recognize spelling-level patterns (i.e. Spelling Miracle). In this work, we take a look at when, where, and how such character-level awareness emerges.
🧵(2/5)
July 24, 2025 at 9:22 AM
🧠 Our findings show that Transformer-based models can handle lexical composition and meaning inference to some extent: they can produce and interpret plausible lexical innovations, though with a notable drop in performance compared to standard lexical items.
🧵(4/5)
July 23, 2025 at 8:29 AM
Key contributions:
✅ A new framework to assess lexical abilities across tasks & word types
✅ A lexical resource for Italian with definitions & examples
✅ Analysis of model size, multilinguality & linguistic features
✅ Human eval via the Optimal Innovation Hypothesis
🧵(3/5)
July 23, 2025 at 8:29 AM
In this study, we propose a novel, unified framework to evaluate lexical proficiency in Transformer-based LMs, testing their ability to generate, define, and use words across three lexical categories: commonly lexicalized words, recent neologisms and nonce words.
🧵(2/5)
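The generate/define/use evaluation over the three word categories could be organized with prompt templates along these lines. A minimal sketch only: the template wording, dict names, and example word are illustrative assumptions, not the resource's actual prompts.

```python
# Hypothetical prompt templates for the three lexical tasks,
# crossed with the three word categories from the post above.
TASKS = {
    "generate": "Coin a plausible Italian word meaning: {definition}",
    "define":   "Give a short definition of the word '{word}'.",
    "use":      "Write a sentence using the word '{word}'.",
}

CATEGORIES = ["lexicalized", "neologism", "nonce"]


def build_prompt(task: str, **fields) -> str:
    """Fill the template for one task with word/definition fields."""
    return TASKS[task].format(**fields)


# One evaluation cell = (task, category, item); the full grid crosses
# every task with every category of the lexical resource.
grid = [(t, c) for t in TASKS for c in CATEGORIES]
```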
July 23, 2025 at 8:29 AM
🔜 More info coming soon!
May 16, 2025 at 8:34 AM
3) Findings: Stress-testing Machine Generated Text Detection: Shifting Language Models Writing Style to Fool Detectors (with Pedrotti A., Papucci M., Ciaccio C., Puccetti G., Dell'Orletta F. and Esuli A.)
May 16, 2025 at 8:34 AM