#curatedthoughts
I have recently become infatuated with Obsidian. I love its simplicity, its openness, and how easy it is for me to share thoughts rather than keeping them locked away in my Notes. So much so that I created my own Digital Garden with it here: https://buff.ly/3LEtoJC #obsidian #curatedthoughts #joy
July 24, 2024 at 5:31 PM
CuratedThoughts: Clean Math Data for RL

- Fixes critical flaws in math reasoning datasets
- Removes the 5-25% of examples that are problematic and unsuitable for RL
- Prevents models from learning invalid reasoning paths

Enables reliable reward verification for GRPO training
huggingface.co/datasets/bet...
bethgelab/CuratedThoughts · Datasets at Hugging Face
February 20, 2025 at 8:53 AM
CuratedThoughts: Data curation focus for RL post-training! (Update 1) 🚀

25% of OpenThoughts-114k-math was filtered out. Issues included proofs, missing figures, and multiple questions with one answer.
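
A rough sketch of what filtering for those three issue types could look like. The post does not describe the actual CuratedThoughts filters, so every pattern, function name, and field name here (`flag_issues`, `curate`, `"problem"`) is a hypothetical stand-in, and keyword heuristics like these would miss cases and raise false positives:

```python
import re

# Hypothetical keyword heuristics, NOT the filters used by CuratedThoughts.
PROOF_PATTERN = re.compile(r"\b(prove|show) that\b|\bproof\b", re.IGNORECASE)
FIGURE_PATTERN = re.compile(r"\b(figure|diagram|picture)\b", re.IGNORECASE)
SUBPART_PATTERN = re.compile(r"\(([a-e])\)")  # markers such as "(a)", "(b)"


def flag_issues(problem: str, has_image: bool = False) -> list[str]:
    """Return the issue labels that the heuristics detect in a problem statement."""
    issues = []
    # Proofs have no single final answer a reward function can check.
    if PROOF_PATTERN.search(problem):
        issues.append("proof")
    # The text refers to a figure, but no image accompanies the example.
    if FIGURE_PATTERN.search(problem) and not has_image:
        issues.append("missing_figure")
    # Several sub-questions cannot be verified against one stored answer.
    subparts = set(SUBPART_PATTERN.findall(problem))
    if len(subparts) >= 2 or problem.count("?") >= 2:
        issues.append("multiple_questions")
    return issues


def curate(examples: list[dict]) -> list[dict]:
    """Keep only the examples for which no issue is flagged."""
    return [ex for ex in examples if not flag_issues(ex["problem"])]
```

Calling `curate` on a list of dicts with a `"problem"` key returns only the unflagged ones. A real pipeline would likely need stronger checks than regexes, for example a model-based classifier or manual review.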

Check out work by
@ahochlehnert.bsky.social & @hrdkbhatnagar.bsky.social
below 👇
CuratedThoughts: Data Curation for RL Datasets 🚀

Since DeepSeek-R1 introduced reasoning-based RL, datasets like Open-R1 & OpenThoughts have emerged for fine-tuning & GRPO. Our deep dive found major flaws: 25% of OpenThoughts had to be removed through data curation.

Here's why 👇🧵
February 17, 2025 at 6:30 PM
CuratedThoughts: Data Curation for RL Datasets 🚀

Since DeepSeek-R1 introduced reasoning-based RL, datasets like Open-R1 & OpenThoughts have emerged for fine-tuning & GRPO. Our deep dive found major flaws: 25% of OpenThoughts had to be removed through data curation.

Here's why 👇🧵
February 17, 2025 at 6:22 PM