Tomasz Limisiewicz
tomlim.bsky.social
Postdoc at Meta and the University of Washington in NLP. Before: PhD from Charles University (Prague 🏰).
Interested in going into the inner workings of neural networks 🔍, multilinguality 🌍, tokenization 🔡 and fairer NLP ⚖️ (he/him)
Reposted by Tomasz Limisiewicz
🎥 Videos from our Tokenization Workshop are now live! Watch invited talks, panel discussions, and the best paper presentation at icml.cc/virtual/2025... #Tokenization #NLP #LLMs
Tokenization Workshop (TokShop) ICML 2025
August 25, 2025 at 8:36 AM
Check out the BLT poster at @aclmeeting.bsky.social. It’s just a foretaste before the main presentation at @tokshop.bsky.social next week from Artidoro Pagnoni!
July 18, 2025 at 8:12 PM
It’d be great to meet at the Tokenization Workshop @tokshop.bsky.social #icml
tomorrow, July 18, starting at 8:45 in Meeting 112-113!
The TokShop schedule is now live! Join us at #ICML2025 for invited talks, poster sessions, and a panel on the future of tokenization. tokenization-workshop.github.io/schedule #Tokenization #LLM #NLP
July 18, 2025 at 2:24 AM
Reposted by Tomasz Limisiewicz
The TokShop schedule is now live! Join us at #ICML2025 for invited talks, poster sessions, and a panel on the future of tokenization. tokenization-workshop.github.io/schedule #Tokenization #LLM #NLP
July 15, 2025 at 10:28 PM
I'm pleased to be in Vancouver for @ICML this week 🇨🇦🤖. I'll be happy to chat about multilingual and multimodal LMs and tokenization(-free) methods.
July 16, 2025 at 1:16 AM
If you have experience with tokenization (who doesn’t?), your help with reviewing will be hugely appreciated! 🔠🔡
TokShop @ #ICML2025 got way more submissions than expected! 📈 We could really use a few more reviewers to help out. If you have the capacity to review a #tokenization paper by Saturday, please fill out this form: forms.gle/32A6sQHQrMSb... 🙏
TokShop 2025
Registering interest in all things tokenization at TokShop @ ICML 2025 (July 18). Consider joining the Google group for future updates! https://groups.google.com/g/tokshop
June 2, 2025 at 9:23 PM
Reposted by Tomasz Limisiewicz
Got a good tokenization paper under review at COLM, but the scores were a letdown? 😬

Why bother with rebuttal when the perfect venue is right around the corner!

Submit your paper to the #ICML2025 Tokenization Workshop (TokShop) by May 30! 🚀
May 28, 2025 at 8:24 AM
Reposted by Tomasz Limisiewicz
#NAACL2025 ended more than a week ago & @ufal-cuni.bsky.social folks were there:
Main conf: @kathaem.bsky.social presented joint work w/ @tomlim.bsky.social, @jlibovicky.bsky.social and Alex Fraser: Beyond Literal Token Overlap: Token Alignability for Multilinguality aclanthology.org/2025.naacl-s...
May 16, 2025 at 12:07 PM
Reposted by Tomasz Limisiewicz
📣 Call for Papers Alert: TokShop @ ICML 2025
TokShop explores tokenization across all data modalities. Topics include: subword NLP techniques, multimodal approaches, multilingual challenges, post-training modification, alternative representations, and statistical perspectives.
ICML 2025 Workshop TokShop
Welcome to the OpenReview homepage for ICML 2025 Workshop TokShop
openreview.net
May 14, 2025 at 1:31 PM
It’s finally official: the long-awaited Tokenization Workshop is here!
April 15, 2025 at 5:10 PM
So, apparently, confusing these two buttons can ignite a serious flame war in a reviewer-author discussion 🔥 @aclmeeting.bsky.social
April 3, 2025 at 5:01 PM
Excited to continue my research adventure as a postdoc at @uwnlp.bsky.social and Meta! I’ve joined @lukezettlemoyer.bsky.social’s fantastic lab. Together, we plan to rethink how LLMs perceive data to extend their capabilities to uncharted languages and, further, beyond text!
March 31, 2025 at 2:23 PM
Reposted by Tomasz Limisiewicz
Paper 👉Beyond Literal Token Overlap: Token Alignability for Multilinguality👈 by @kathaem.bsky.social, @tomlim.bsky.social, @jlibovicky.bsky.social and Alex Fraser will appear at #NAACL2025! arxiv.org/abs/2502.06468 Congratulations to all authors! 🥳
March 10, 2025 at 3:52 PM
Reposted by Tomasz Limisiewicz
Happy to say that our paper "Beyond Literal Token Overlap: Token Alignability for Multilinguality" will be presented at #NAACL2025!

This is work with @tomlim.bsky.social, @jlibovicky.bsky.social, and Alex Fraser.

arxiv.org/abs/2502.06468

#newpaper #NLP #NLProc
Beyond Literal Token Overlap: Token Alignability for Multilinguality
Previous work has considered token overlap, or even similarity of token distributions, as predictors for multilinguality and cross-lingual knowledge transfer in language models. However, these very li...
arxiv.org
March 3, 2025 at 5:04 PM
Reposted by Tomasz Limisiewicz
Work in progress -- suggestions for NLP-ers based in the EU/Europe & already on Bluesky very welcome!

go.bsky.app/NZDc31B
November 10, 2024 at 5:24 PM
Tokenization is so back at #EMNLP!
#EMNLP has a nice set of tokenization/subword modeling papers this year.

It's a good mix of tokenization algorithms, tokenization evaluation, tokenization-free methods, and subword embedding probing. Lmk if I missed some!

Here is a list with links + presentation time (in chronological order).
November 12, 2024 at 8:41 AM
#firstpost

Are you working on NLP for low-resource or non-Latin script languages?

If yes, I have great news for you! Our MYTE tokenizer and MyT5 models 🪲 are now easily available through 🤗. It’s easy to try:
November 11, 2024 at 4:13 PM
Reposted by Tomasz Limisiewicz
If you are interested in AI, follow the folks in this starter pack! I have just updated it to include a few new arrivals here, but please let me know who else is missing

go.bsky.app/SipA7it
November 8, 2024 at 9:44 AM
Reposted by Tomasz Limisiewicz
A starter pack for #NLP #NLProc researchers! 🎉

go.bsky.app/SngwGeS
November 4, 2024 at 10:01 AM