Gaurav Pandey
gpandey1.bsky.social
Research Scientist @ IBM Research
Reinforcement Learning for LLMs
Reposted by Gaurav Pandey
Hear me out: What if the Chinese translations of mathematical problems present in English test sets (e.g. MATH) were not filtered from the pre-training corpora of Qwen and DeepSeek? That would mean the knowledge is there, just translated. This would also explain the language switching when RL-ing CoT 👇
February 10, 2025 at 3:51 PM