Clara Na
@clarana.bsky.social
PhD student @ CMU LTI. efficiency/data in NLP/ML
Come through! #492 in Hall 2, 10am-12:30pm!
April 26, 2025 at 1:59 AM
Hi! I am at 232 in the back of the Riverfront room!
November 14, 2024 at 3:28 PM
We can even predict larger model perplexity scores w/ smaller model proxy evals, AND the relationship holds even when the actual ppl scores are high (4/n)
November 5, 2024 at 10:39 PM
What does this mean? We can simulate *comprehensive and fine-grained* data ablations on language corpora, at scale! Required training compute scales only linearly wrt *new* training data, i.e. work for previously seen train data is "cached" and reusable in subsequent evals (3/n)
November 5, 2024 at 10:39 PM
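A back-of-envelope sketch of the compute argument above (my own illustration, not code from the paper): if you train one model per data partition and cache it, evaluating new mixtures reuses that work, whereas training a fresh model on every non-empty subset of n partitions costs n·2^(n-1) partition-units of training.

```python
from itertools import combinations

def naive_cost(n_partitions: int, tokens_per_partition: int = 1) -> int:
    """Training tokens needed to evaluate every non-empty subset of the
    partitions by training a fresh model on each subset (no reuse)."""
    total = 0
    for k in range(1, n_partitions + 1):
        for subset in combinations(range(n_partitions), k):
            total += len(subset) * tokens_per_partition
    return total

def cached_cost(n_partitions: int, tokens_per_partition: int = 1) -> int:
    """With per-partition models trained once and cached, training compute
    scales linearly in the number of *new* partitions; subsequent mixture
    evals only pay inference cost (ignored here)."""
    return n_partitions * tokens_per_partition

# 3 partitions: naive = 3 singletons + 3 pairs (2 each) + 1 triple (3) = 12
print(naive_cost(3))   # 12
print(cached_cost(3))  # 3
```

The gap widens fast: at 10 partitions the naive count is 10·2^9 = 5120 partition-units of training versus 10 with caching.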
We show that there is a reliable *linear correlation* between perplexity evaluation scores for a model trained on a data mixture, and proxy scores from models trained on partitions of the mixture -- f(🟦🟩🟪) vs. f(🟦) f(🟩) f(🟪)
❗️This also works on arbitrary eval data (2/n)
November 5, 2024 at 10:38 PM
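A minimal sketch of the f(🟦🟩🟪) vs. f(🟦) f(🟩) f(🟪) idea: fit a linear map from partition-model perplexities to the mixture-model perplexity, then predict scores for new mixtures from proxies alone. All numbers below are made up for illustration; this is not the paper's data or exact procedure.

```python
import numpy as np

# Hypothetical proxy perplexities from models trained on each partition
# (rows = observed mixtures, cols = partitions A, B, C). Illustrative only.
proxy_ppl = np.array([
    [12.1, 15.3, 18.7],
    [11.8, 14.9, 19.2],
    [13.0, 16.1, 17.5],
    [12.5, 15.0, 18.0],
])
# Perplexity of a model trained on each full mixture (one per row).
mixture_ppl = np.array([10.2, 10.0, 10.9, 10.4])

# Fit a linear map (with intercept) from partition scores to mixture score.
X = np.column_stack([proxy_ppl, np.ones(len(proxy_ppl))])
coef, *_ = np.linalg.lstsq(X, mixture_ppl, rcond=None)

# Predict mixture perplexity for a new candidate mixture from proxies only,
# with no full-mixture training run needed.
new_proxy = np.array([12.3, 15.2, 18.4, 1.0])
predicted = new_proxy @ coef
```

The payoff is in the loop: once the per-partition models exist, scoring a new candidate mixture is a regression lookup, not a training run.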
Building/customizing your own LLM? You'll want to curate training data for it, but how do you know what makes the data good?
You can try out recipes👩‍🍳 iterate on ✨vibes✨ but we can't actually test all possible combos of tweaks,,, right?? 🙅‍♂️WRONG! arxiv.org/abs/2410.15661 (1/n) 🧵
November 5, 2024 at 10:37 PM