✅Filtering out easy samples—i.e., those solved by a 7B model—yields a +2.15% accuracy gain when training a 32B model.
✅Harder questions push the model to learn deeper reasoning patterns.
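A minimal sketch of this kind of difficulty filter, under the assumption that we can query the smaller model and verify its answer (`toy_small_model` and the field names are hypothetical, not the paper's implementation):

```python
def filter_hard_samples(samples, small_model_solves):
    """Keep only samples the smaller reference model fails to solve.

    small_model_solves(question, answer) -> bool is a hypothetical
    callable that runs the small model and checks its final answer.
    """
    return [s for s in samples if not small_model_solves(s["question"], s["answer"])]

# Toy usage with a stub "7B model" that only solves single-digit addition:
samples = [
    {"question": "1+1", "answer": "2"},
    {"question": "integrate x^2 from 0 to 3", "answer": "9"},
]

def toy_small_model(question, answer):
    # Stub: pretend the small model only handles tiny addition problems.
    return "+" in question and len(question) <= 3

hard = filter_hard_samples(samples, toy_small_model)
# Only the harder integration question survives the filter.
```

The surviving `hard` set is what the larger model then trains on.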
➣ Open-ended questions boost accuracy (+1.21%) by forcing models to reason, not guess!
➣ Short-form answers reduce ambiguity & avoid noisy rewards, boosting accuracy by +1.20%!
👉 Thoughtful templates = clearer supervision, better RL
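A rough sketch of what such templating could look like: rendering one QA pair as either a multiple-choice prompt or an open-ended prompt with a short-form gold answer (function and field names are illustrative assumptions, not the paper's code):

```python
import random

def to_multiple_choice(question, answer, distractors, seed=0):
    """Render a QA pair as a multiple-choice prompt with shuffled options."""
    rng = random.Random(seed)
    options = distractors + [answer]
    rng.shuffle(options)
    letters = "ABCD"
    lines = [question] + [f"{letters[i]}. {opt}" for i, opt in enumerate(options)]
    gold = letters[options.index(answer)]  # letter of the correct option
    return "\n".join(lines), gold

def to_open_ended(question, answer):
    """Render as open-ended; a short-form answer keeps reward checking simple."""
    prompt = f"{question}\nAnswer with a short final answer."
    return prompt, answer

prompt_mc, gold = to_multiple_choice(
    "Which planet is largest?", "Jupiter", ["Mars", "Venus", "Earth"]
)
```

Either rendering gives the RL reward function an unambiguous target to check against.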
The model adapts its response length to the task:
➣ concise on general reasoning (229 tokens on MMLU) and
➣ detailed on math (+62% token increase)
Math-only models barely adapt (12–14% token increase).
Nemotron-CrossThink achieves:
📈 +30.1% on MATH-500, +15.1% on AGIEVAL, +12.8% on MMLU-Pro compared to base LLM
📉 28% fewer tokens per correct answer
🏆 Outperforms math-only blends by training on broader, more diverse reasoning data
➣Curate QA pairs from Common Crawl + open datasets
➣Apply structured templates: multiple-choice + open-ended
➣Filter out unverifiable / ambiguous samples
➣Train LLM with GRPO—a scalable RL algorithm
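The GRPO step above relies on group-relative advantages: sample several answers per question, score each with a verifiable reward, and normalize rewards within the group. A minimal numeric sketch of that normalization (not NVIDIA's implementation):

```python
def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: normalize rewards across a group of
    responses sampled for the same prompt (mean 0, ~unit variance)."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

# Four sampled answers to one question; reward 1.0 if verifiably correct.
advs = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
# Correct answers get positive advantage, incorrect ones negative.
```

Because advantages are computed relative to the group mean, no separate value network is needed, which is part of what makes GRPO scale well.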
Meet Nemotron-CrossThink—a method to scale RL-based self-learning across law, physics, social science & more.
🔥Resulting in a model that reasons broadly, adapts dynamically, & uses 28% fewer tokens for correct answers!
🧵↓