1. We built an extension to our CUTE benchmark: EXECUTE! We find that LLMs actually *do* have the ability to do character-level manipulation, but language bias and tokenization both get in the way. openreview.net/forum?id=m95...
1/2