PhD student @ CMU with Zico Kolter and Zack Lipton | Founding Member @datologyai.com | Prev. Comp Sc @iitdelhi
http://pratyushmaini.github.io/
pratyushmaini.github.io/blog/2024/ri...
Very eager to hear more feedback on this new piece!
"common infra" includes question templates, topics, styles, annotators, etc.
> common annotators being the least privileged access.
"common infra" includes question templates, topics, styles, annotators, etc.
> common annotators being the least privileged access.
(Risk 1): There is a massive financial incentive for such companies to design evals that even marginally favor their own customers.
If you work on MIAs for LLMs, repeat after me: Temporally shifted benchmarks 👏 do 👏 not test membership.
Even more unfortunate, this paper cites Duan et al. (who are aware of the flaws in the setup), yet creates a new temporally shifted MIA benchmark
Duan et al: arxiv.org/abs/2402.07841
Dataset Inference: arxiv.org/abs/2406.06443
Blind MIAs: arxiv.org/abs/2406.16201 (@floriantramer.bsky.social)
Meeus et al: arxiv.org/pdf/2406.17975
and others...
Unfortunately, the benchmarks studied are all "temporally shifted". At this point, we know very well that these benchmarks give a false sense of membership success by detecting distributional differences.
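A minimal sketch of the problem (hypothetical texts and cutoff date, no real benchmark data): when "members" predate a model's training cutoff and "non-members" postdate it, a "blind" classifier that never queries the model at all can separate the two sets from surface cues alone — so high attack accuracy on such a benchmark says nothing about membership.

```python
# Hypothetical illustration: a blind "attack" keyed on year mentions.
# Members are pre-cutoff texts, non-members post-cutoff; the classifier
# never touches a model, yet separates the sets via temporal shift.
import re

CUTOFF = 2023  # assumed training cutoff (hypothetical)

members = [      # pre-cutoff texts (made up for illustration)
    "The 2019 conference was held in Vancouver.",
    "Results from the 2021 survey were mixed.",
]
non_members = [  # post-cutoff texts (made up for illustration)
    "The 2024 election dominated headlines.",
    "A 2024 model release surprised everyone.",
]

def blind_is_member(text: str) -> bool:
    """Guess 'member' iff no year >= CUTOFF appears -- no model access."""
    years = [int(y) for y in re.findall(r"\b(?:19|20)\d{2}\b", text)]
    return max(years, default=0) < CUTOFF

hits = sum(blind_is_member(t) for t in members) + \
       sum(not blind_is_member(t) for t in non_members)
accuracy = hits / (len(members) + len(non_members))
```

On this toy split the blind classifier is perfect — which is exactly the false sense of membership success the post describes.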
bsky.app/profile/leav...
And join us (@arimorcos.bsky.social @agcrnz.bsky.social @alvin-d.bsky.social and many more who shaped this work)!
We are only getting started: jobs.ashbyhq.com/DatologyAI
Wired: Bringing up @datologyai.com’s new text curation results at Thanksgiving
That’s right, we applied our data curation pipeline to text pretraining data and the results are hot enough to roast a 🦃
🧵
A small team, punching far above its weight, took on giants in an extremely competitive space and delivered kick-ass results. Huge shoutout to my amazing teammates, especially Jack Urbanek & @leavittron.bsky.social —absolute legends. 🙌
Let’s keep pushing 👊
🎯 Carefully designed quality filters.
🔍 Deep understanding of synthetic data.
📐 Analyzing geometric properties of unsupervised data.
👀 Constantly looking at data!
It’s all in our deep dive: tinyurl.com/best-llm-data
Our models trained on curated data:
• 4.4% better than DCLM
• 2x faster training than FW-edu
• Our 1.3B model outperforms 2.7B models trained on DCLM & FW-edu