LLMs are trained on just one tokenization per word, but they still understand alternative tokenizations. We show that this can be exploited to bypass safety filters without changing the text itself.
#AI #LLMs #tokenization #alignment
Can autoregressive models predict the next [MASK]? It turns out yes, and quite easily…
Introducing MARIA (Masked and Autoregressive Infilling Architecture)
arxiv.org/abs/2502.06901
Can autoregressive models predict the next [MASK]? It turns out yes, and quite easily…
Introducing MARIA (Masked and Autoregressive Infilling Architecture)
arxiv.org/abs/2502.06901