@shrimai.bsky.social
Senior Research Scientist @nvidia | Adjunct Prof @BU | PhD from @CMU
🎯 Why it matters:
Nemotron-CrossThink achieves:
📈 +30.1% on MATH-500, +15.1% on AGIEval, +12.8% on MMLU-Pro over the base LLM
📉 28% fewer tokens per correct answer
🏆 Outperforms math-only blends by training on broader, more diverse reasoning data
May 1, 2025 at 5:42 PM
How does Nemotron-CrossThink work?
➣Curate QA pairs from Common Crawl + open datasets
➣Apply structured templates: multiple-choice + open-ended
➣Filter out unverifiable / ambiguous samples
➣Train the LLM with GRPO (Group Relative Policy Optimization), a scalable RL algorithm
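The GRPO step above centers on group-relative advantages: sample several completions per prompt, score each with a verifiable reward, and normalize each reward against its group's statistics instead of learning a value function. A minimal sketch of that computation, with illustrative names and assumptions (binary rewards, one group per prompt), not the Nemotron training code:

```python
# Sketch of GRPO's group-relative advantage (assumed setup: a group of
# completions sampled for one prompt, each scored by a verifiable reward).
from statistics import mean, pstdev

def grpo_advantages(rewards, eps=1e-6):
    """Normalize each completion's reward against its group mean and std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four completions sampled for one prompt; 1.0 = verified correct.
advs = grpo_advantages([1.0, 0.0, 0.0, 1.0])
```

Correct completions get positive advantages and incorrect ones negative, so the policy gradient pushes toward answers that the rule-based verifier accepts, without a separate critic.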
Most RL methods stick to math because rewards are easy to define.
But general-purpose reasoning?
❌ No clean answers
❌ No fixed rules
Nemotron-CrossThink addresses these by:
✅ Designing verifiable rewards for diverse tasks
✅ Blending structured data from STEM, law, humanities, & more
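The structured templates make those rewards checkable by rule: multiple-choice answers reduce to an option letter, open-ended answers to a normalized string match. A minimal sketch under those assumptions (my illustrative field names and matching rules, not the paper's exact code):

```python
# Sketch of rule-based verifiable rewards for the two template types
# (assumed schema: {"type": ..., "answer": ...}; names are illustrative).
import re

def extract_choice(text):
    """Pull the final standalone option letter (A-D) from model output."""
    letters = re.findall(r"\b([A-D])\b", text)
    return letters[-1] if letters else None

def reward(sample, model_output):
    if sample["type"] == "multiple_choice":
        return 1.0 if extract_choice(model_output) == sample["answer"] else 0.0
    # open-ended: exact match after light whitespace/case normalization
    norm = lambda s: re.sub(r"\s+", " ", s).strip().lower()
    return 1.0 if norm(model_output) == norm(sample["answer"]) else 0.0
```

Samples whose answers cannot be verified this way are exactly the "unverifiable / ambiguous" cases the pipeline filters out before training.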
RL boosts LLM reasoning—but why stop at math & code? 🤔
Meet Nemotron-CrossThink—a method to scale RL-based self-learning across law, physics, social science & more.

🔥 The result: a model that reasons broadly, adapts dynamically, & uses 28% fewer tokens per correct answer!
🧵↓