@danielmisrael.bsky.social
Pretty shocking vulnerability that the community should be aware of!
What happens if we tokenize cat as [ca, t] rather than [cat]?

LLMs are trained on just one tokenization per word, but they still understand alternative tokenizations. We show that this can be exploited to bypass safety filters without changing the text itself.
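A quick way to see this for yourself, sketched with GPT-2's BPE tokenizer (where "ca" and "t" both happen to be vocabulary entries; any subword tokenizer behaves similarly):

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")

canonical = tok.encode("cat")                         # the single split seen during training
alternative = tok.convert_tokens_to_ids(["ca", "t"])  # a valid but non-canonical split

print(canonical, "->", repr(tok.decode(canonical)))      # different token ids...
print(alternative, "->", repr(tok.decode(alternative)))  # ...identical surface text "cat"

Both id sequences decode to the exact same string, so any filter that inspects the text (or only the canonical ids) never sees the difference.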

#AI #LLMs #tokenization #alignment
March 11, 2025 at 11:18 PM
“That’s one small [MASK] for [MASK], a giant [MASK] for mankind.” – [MASK] Armstrong

Can autoregressive models predict the next [MASK]? It turns out yes, and quite easily…
Introducing MARIA (Masked and Autoregressive Infilling Architecture)
arxiv.org/abs/2502.06901
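Roughly, the recipe: combine a pretrained masked LM and a pretrained AR model by training a linear decoder on their concatenated hidden states. A minimal sketch of that idea; the stand-in modules, names, and dimensions below are illustrative assumptions, not the paper's code:

import torch
import torch.nn as nn

class MariaStyleInfiller(nn.Module):
    # Trainable linear decoder over concatenated MLM + AR hidden states;
    # both backbones are assumed frozen and run over the same token sequence.
    def __init__(self, mlm, ar, d_mlm, d_ar, vocab_size):
        super().__init__()
        self.mlm, self.ar = mlm, ar
        self.decoder = nn.Linear(d_mlm + d_ar, vocab_size)

    def forward(self, ids):
        h = torch.cat([self.mlm(ids), self.ar(ids)], dim=-1)  # (B, L, d_mlm + d_ar)
        return self.decoder(h)  # per-position logits; read off at the [MASK] positions

# Stand-ins for real pretrained backbones: anything mapping ids -> (B, L, d) states.
vocab, d = 1000, 64
mlm = nn.Embedding(vocab, d)  # placeholder for a bidirectional masked LM
ar = nn.Embedding(vocab, d)   # placeholder for a causal AR LM (retains KV caching)
model = MariaStyleInfiller(mlm, ar, d, d, vocab)
logits = model(torch.randint(0, vocab, (2, 16)))  # -> shape (2, 16, vocab)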
February 14, 2025 at 12:28 AM