From DeepSeek V3 Base to DeepSeek R1 Zero, a whopping 86% of parameters were NOT updated during RL training 😮😮
And this isn’t a one-off. The pattern holds across RL algorithms and models.
🧵A Deep Dive
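For intuition, here is a minimal sketch of how one could measure this kind of update sparsity by diffing two checkpoints. Exact bitwise equality as the "not updated" criterion and the placeholder model IDs are my assumptions, not necessarily the paper's setup; a real DeepSeek-V3-scale comparison would also need sharded or offloaded loading.

```python
# Hedged sketch: fraction of parameters left bit-identical by RL training.
# "base-model" / "rl-tuned-model" are placeholder IDs, and exact equality
# is one possible criterion (a tolerance-based check is another choice).
import torch
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("base-model", torch_dtype=torch.bfloat16)
tuned = AutoModelForCausalLM.from_pretrained("rl-tuned-model", torch_dtype=torch.bfloat16)

tuned_params = dict(tuned.named_parameters())
unchanged, total = 0, 0
for name, p_base in base.named_parameters():
    p_tuned = tuned_params[name]
    unchanged += (p_base == p_tuned).sum().item()  # count bit-identical entries
    total += p_base.numel()

print(f"{100 * unchanged / total:.1f}% of parameters unchanged")
```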
Short Answer: Yes, thanks to “quote-and-think” + test-time scaling. You can even force them to reason in a target language!
But:
🌐 Low-resource langs & non-STEM topics still tough.
New paper: arxiv.org/abs/2505.05408
We observe that reasoning language models finetuned only on English data are capable of zero-shot cross-lingual reasoning through a "quote-and-think" pattern.
However, this does not mean they reason the same way across all languages or in new domains.
[1/N]
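A hedged sketch of probing both behaviors with an off-the-shelf reasoning model. The model ID, the prompt wording, and the "just ask explicitly" trick for forcing a target reasoning language are illustrative assumptions, not the paper's exact setup:

```python
# Hedged sketch: cross-lingual reasoning probe for an English-finetuned
# reasoning model. Model ID and prompts are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"  # any reasoning model works here
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# A non-English question (French: "How many prime numbers are there
# between 10 and 30?").
question = "Combien de nombres premiers y a-t-il entre 10 et 30 ?"

# One way to force reasoning in a target language: request it explicitly.
messages = [{"role": "user",
             "content": question + "\nPlease reason step by step in French."}]
inputs = tok.apply_chat_template(messages, add_generation_prompt=True,
                                 return_tensors="pt").to(model.device)
out = model.generate(inputs, max_new_tokens=1024)
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Without the final instruction, the reasoning trace tends to stay in English but quote spans of the French input verbatim before reasoning about them, which is the "quote-and-think" pattern described above.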
with Prof. Monojit and @sagnikmukherjee.bsky.social
In our survey on cultural bias in LLMs, we reviewed ~90 papers. Interestingly, none of these papers define "culture" explicitly. Instead, they rely on "proxies". [1/7]
[Appeared in the EMNLP main conference]