If you’ve ever encountered oddities or frustrations with #tokenization, I’d love to chat about it! #EMNLP
Have you ever wondered what "ट能" means?
Probably not, since it's not a meaningful phrase.
But if you ever did, any well-trained LLM should be able to tell you that. Right?
Not quite! We discover that phrases like "ट能" trigger vulnerabilities in byte-level BPE tokenizers. (1/11)
Too many reviewers seem to not have internalised this. In my opinion, this is the hardest lesson a reviewer has to learn, and I want to share some thoughts.
The sky really is bluer on the other side.