In multilingual models, the same meaning can take far more tokens in some languages, penalizing users of underrepresented languages with worse performance and higher API costs. Our Parity-aware BPE algorithm is a step toward addressing this issue: 🧵
Paper: aclanthology.org/2025.naacl-l...
Code: github.com/lucamouchel/...
#NAACL2025
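For the curious, here is one minimal, hypothetical sketch of what parity-aware merge selection could look like (an illustrative reading, not necessarily the paper's exact algorithm): at each BPE training step, the most frequent symbol pair is chosen from whichever language's corpus is currently worst compressed, so merges preferentially benefit underserved languages.

```python
from collections import Counter

def pair_counts(seqs):
    """Count adjacent symbol pairs across a list of token sequences."""
    counts = Counter()
    for seq in seqs:
        for a, b in zip(seq, seq[1:]):
            counts[(a, b)] += 1
    return counts

def apply_merge(seq, pair, new_sym):
    """Replace every occurrence of `pair` in `seq` with `new_sym`."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(new_sym)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out

def parity_aware_bpe(corpora, num_merges):
    """Toy parity-aware BPE trainer (illustrative sketch).

    corpora: {language: list of character-level sequences}; mutated in place.
    Each step picks the most frequent pair from the language whose corpus
    currently has the most tokens per text, i.e. the worst compression,
    then applies that merge to every language (shared vocabulary).
    """
    merges = []
    for _ in range(num_merges):
        worst = max(
            corpora,
            key=lambda lang: sum(map(len, corpora[lang])) / len(corpora[lang]),
        )
        counts = pair_counts(corpora[worst])
        if not counts:
            break
        pair = counts.most_common(1)[0][0]
        new_sym = pair[0] + pair[1]
        merges.append(pair)
        for lang in corpora:
            corpora[lang] = [apply_merge(s, pair, new_sym) for s in corpora[lang]]
    return merges
```

Function names and the worst-language heuristic here are assumptions for illustration; see the paper and repo above for the actual method.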
Our new #NLProc work recenters multilingual #LLM evaluation on regional knowledge across 44 languages.
It features *newly collected* data that prioritizes *regional knowledge*.
Setting the stage for truly global AI evaluation.
Ready to see how your model measures up?
#AI #Multilingual #LLM #NLProc