Interested in the inner workings of neural networks 🔍, multilinguality 🌍, tokenization 🔡 and fairer NLP ⚖️ (he/him)
tomorrow July 18 starting at 8:45 in Meeting 112-113!
Why bother with rebuttal when the perfect venue is right around the corner!
Submit your paper to the #ICML2025 Tokenization Workshop (TokShop) by May 30! 🚀
Main conf: @kathaem.bsky.social presented joint work w/ @tomlim.bsky.social, @jlibovicky.bsky.social and Alex Fraser: Beyond Literal Token Overlap: Token Alignability for Multilinguality aclanthology.org/2025.naacl-s...
TokShop explores tokenization across all data modalities. Topics include: subword NLP techniques, multimodal approaches, multilingual challenges, post-training modification, alternative representations, and statistical perspectives.
This is work with @tomlim.bsky.social, @jlibovicky.bsky.social, and Alex Fraser.
arxiv.org/abs/2502.06468
#newpaper #NLP #NLProc
go.bsky.app/NZDc31B
It's a good mix of tokenization algorithms, tokenization evaluation, tokenization-free methods, and subword embedding probing. Lmk if I missed some!
Here is a list with links + presentation time (in chronological order).
Are you working on NLP for low-resource or non-Latin script languages?
If yes, I have great news for you! Our MYTE tokenizer and MyT5 models 🪲 are now easily available through 🤗. It’s easy to try:
go.bsky.app/SipA7it