Lightnews — Scholar-powered news

Alessio Miaschi

@alessiomiaschi.bsky.social

🎓 Full-time Researcher (RTD) at ItaliaNLP Lab, Institute for Computational Linguistics "A. Zampolli" (CNR-ILC) #NLProc

https://alemiaschi.github.io/


Alessio Miaschi

@alessiomiaschi.bsky.social

All the details about the task are available here 👉 sites.google.com/view/crucive...
📅 Training data release: 22 September 2025

Cruciverb-IT

Overview Cruciverb-IT is the first shared task proposed at EVALITA 2026 on crossword puzzle solving. We propose two tasks: i) answering clues extracted from Italian crosswords; ii) autonomously solvin...

sites.google.com

September 15, 2025 at 10:22 AM

Alessio Miaschi

@alessiomiaschi.bsky.social

📂 Code available at the following repository: github.com/snizio/Beyon...
🧵(5/5)

GitHub - snizio/Beyond-Spelling-Miracle

Contribute to snizio/Beyond-Spelling-Miracle development by creating an account on GitHub.

github.com

July 24, 2025 at 9:23 AM

Alessio Miaschi

@alessiomiaschi.bsky.social

✅ Larger models develop more robust substring awareness.
✅ Morphemes are recognized better than meaningless substrings.
✅ Awareness emerges early for suffixes and roots, later for non-morphemic units.
✅ Productivity, word frequency and tokenization shape this ability.
🧵(4/5)

July 24, 2025 at 9:22 AM

Alessio Miaschi

@alessiomiaschi.bsky.social

🧪 We design a controlled binary task asking models whether a substring appears in a word. Using MorphoLex, we evaluate models from the Pythia family across:
- substring position and length;
- morphemic vs. non-morphemic substrings;
- pre-training checkpoints.
🧵(3/5)

July 24, 2025 at 9:22 AM
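
The binary task described in the post above can be pictured with a minimal sketch. It assumes nothing about the authors' actual code: the `Probe` structure, the helper names, and the prompt wording are all hypothetical, and neither MorphoLex nor the Pythia models are loaded here. It only shows how positive and negative substring probes could be built while tracking substring position and length.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Probe:
    word: str
    substring: str
    label: bool      # True if the substring occurs in the word
    position: int    # start index in the word, or -1 for negatives


def positive_probes(word: str, length: int) -> list[Probe]:
    """All contiguous substrings of the given length, with their start index."""
    return [
        Probe(word, word[i:i + length], True, i)
        for i in range(len(word) - length + 1)
    ]


def negative_probe(word: str, candidate: str) -> Probe:
    """Wrap a candidate as a negative example; reject it if it actually occurs."""
    if candidate in word:
        raise ValueError(f"{candidate!r} occurs in {word!r}")
    return Probe(word, candidate, False, -1)


def to_prompt(probe: Probe) -> str:
    # Hypothetical prompt wording, not the one used in the paper.
    return (
        f'Does the string "{probe.substring}" appear in the word '
        f'"{probe.word}"? Answer yes or no.'
    )


# Eight positives of length 4 plus one hand-picked negative.
probes = positive_probes("unhappiness", 4) + [negative_probe("unhappiness", "ment")]
```

Keeping the start index on each probe makes it possible to group accuracy by position and length afterwards; a morphemic vs. non-morphemic flag would need an external lexical resource and is left out of this sketch.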

Alessio Miaschi

@alessiomiaschi.bsky.social

LMs operate on subword tokens and lack explicit access to characters. Despite this, they show a limited ability to recognize spelling-level patterns (the so-called Spelling Miracle). In this work, we examine when, where, and how such character-level awareness emerges.
🧵(2/5)

July 24, 2025 at 9:22 AM

Alessio Miaschi

@alessiomiaschi.bsky.social

🤗 Models available on Huggingface: huggingface.co/collections/...
📂 Code & Dataset: github.com/snizio/Lexic...
🧵(5/5)

Evaluating Lexical Proficiency in Neural Language Models - a snizio Collection

Public collection for our paper: "Evaluating Lexical Proficiency in Neural Language Models", C. Ciaccio, A. Miaschi, F. Dell'Orletta (ACL 2025)

huggingface.co

July 23, 2025 at 8:29 AM

Alessio Miaschi

@alessiomiaschi.bsky.social

🧠 Our findings show that Transformer-based models can handle lexical composition and meaning inference to some extent—effectively producing and interpreting plausible lexical innovations, though with a notable drop in performance vs. standard lexical items.
🧵(4/5)

July 23, 2025 at 8:29 AM

Alessio Miaschi

@alessiomiaschi.bsky.social

Key contributions:
✅ A new framework to assess lexical abilities across tasks & word types
✅ A lexical resource for Italian with definitions & examples
✅ Analysis of model size, multilinguality & linguistic features
✅ Human eval via the Optimal Innovation Hypothesis
🧵(3/5)

July 23, 2025 at 8:29 AM

Alessio Miaschi

@alessiomiaschi.bsky.social

In this study, we propose a novel, unified framework to evaluate lexical proficiency in Transformer-based LMs, testing their ability to generate, define, and use words across three lexical categories: commonly lexicalized words, recent neologisms and nonce words.
🧵(2/5)

July 23, 2025 at 8:29 AM

Alessio Miaschi

@alessiomiaschi.bsky.social

🔜 More info coming soon!

May 16, 2025 at 8:34 AM

Alessio Miaschi

@alessiomiaschi.bsky.social

3) Findings: Stress-testing Machine Generated Text Detection: Shifting Language Models Writing Style to Fool Detectors (with Pedrotti A., Papucci M., Ciaccio C., Puccetti G., Dell'Orletta F. and Esuli A.)

May 16, 2025 at 8:34 AM
