Michael Hu
@michahu.bsky.social
PhD student at NYU. NLP & training data.
michahu.github.io
So you want a good pretraining data mix🧑‍🍳, but which data mixing algorithm do you pick? DoGE, DoReMi, Skill-it, grid searching proportions… 😵‍💫

It turns out that these algorithms are all special cases of Linear Mixing Optimization, our new data mixing framework! 🧵
November 12, 2024 at 5:04 PM