- Fixes critical flaws in math reasoning datasets
- Removes 5-25% of problematic examples unsuitable for RL
- Prevents models from learning invalid reasoning paths
- Enables reliable reward verification for GRPO training
huggingface.co/datasets/bet...
25% of OpenThoughts-114k-math filtered out — issues included proof-based questions, missing figures, and multi-part questions paired with a single answer.
Check out work by
@ahochlehnert.bsky.social & @hrdkbhatnagar.bsky.social
below 👇
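The kinds of issues listed above all break scalar reward verification. A minimal sketch of what such heuristic filters could look like (hypothetical patterns for illustration, not the authors' actual curation code):

```python
import re

def is_unsuitable_for_rl(problem: str) -> bool:
    """Flag examples whose answers cannot be checked by a scalar reward.

    Hypothetical heuristics mirroring the issue categories in the thread:
    proofs, references to missing figures, and multi-part questions.
    """
    checks = [
        # Proof questions have no single verifiable final answer.
        r"\b[Pp]rove\b|\b[Ss]how that\b",
        # References to figures/diagrams that are absent from the text.
        r"\b[Ff]igure\b|\b[Dd]iagram\b|as shown",
        # Multi-part questions (a), (b), ... paired with one final answer.
        r"\([a-d]\)\s.*\([a-d]\)\s",
    ]
    return any(re.search(p, problem, flags=re.DOTALL) for p in checks)

examples = [
    "Prove that the sum of two even numbers is even.",
    "Using the diagram, find the area of triangle ABC.",
    "(a) Find x. (b) Find y.",
    "What is 17 * 23?",
]
kept = [p for p in examples if not is_unsuitable_for_rl(p)]
print(kept)  # only the verifiable arithmetic question survives
```

Real curation would likely combine such pattern checks with answer-format validation and model-assisted review; this sketch only shows why the flagged categories defeat automatic reward checking.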
Since DeepSeek-R1 introduced reasoning-based RL, datasets like Open-R1 & OpenThoughts emerged for fine-tuning & GRPO. Our deep dive found major flaws — 25% of OpenThoughts had to be filtered out during data curation.
Here's why 👇🧵