Robert Isaacs
r-b-i.bsky.social
Founder and CEO of Nine Minds. Developer of open source MSP PSA Alga PSA.
Reposted by Robert Isaacs
The Llama 4 model that won in LM Arena is different from the released version. I have been comparing the answers from Arena with those from the released model. They aren't close.

The data is also worth a look, as it shows how LM Arena results can be manipulated to be more pleasing to humans. t.co/rqAey9SMwh
April 8, 2025 at 2:10 AM
It looks like quantizing the DeepSeek V3/R1 models devastates their performance. I can say that after weeks of using them extensively (V3, then R1). Maybe something about the FP8 training and MoE architecture makes them particularly susceptible.

Always test on full weight, non-distilled DeepSeek.
February 2, 2025 at 4:59 PM
If you are using OpenRouter for access to DeepSeek, I *highly* suggest you curate the providers in your account settings. I've gotten some garbage responses from some, as if they're hosting distilled R1 as the real thing.

The top 4 in this post have worked well for me.

aider.chat/2025/01/28/d...
Alternative DeepSeek V3 providers
DeepSeek’s API has been experiencing reliability issues. Here are alternative providers you can use.
aider.chat
February 2, 2025 at 4:30 PM
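Beyond the account-level settings, OpenRouter also lets you pin providers per request via a `provider` routing object in the chat completions payload. A minimal sketch of that idea, assuming the standard `/api/v1/chat/completions` endpoint; the provider names and model id below are illustrative placeholders, not a recommendation:

```python
import json
import os
import urllib.request


def build_payload(model: str, providers: list[str]) -> dict:
    """Build an OpenRouter chat request restricted to specific providers.

    `provider.order` lists the hosts to try, in order; setting
    `allow_fallbacks` to False tells OpenRouter not to route the
    request to any host outside that list.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": "Hello"}],
        "provider": {
            "order": providers,        # try these hosts, in order
            "allow_fallbacks": False,  # never fall back to unlisted hosts
        },
    }


def send(payload: dict) -> str:
    """POST the payload; requires OPENROUTER_API_KEY in the environment."""
    req = urllib.request.Request(
        "https://openrouter.ai/api/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode()


# Illustrative: pin R1 to two hypothetical curated providers.
payload = build_payload("deepseek/deepseek-r1", ["Fireworks", "Together"])
```

This mirrors the curated-provider list in your account settings, but at the request level, so a misbehaving host can be excluded without touching global config.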
Reposted by Robert Isaacs
Microsoft makes DeepSeek’s R1 model available on Azure AI and GitHub
Microsoft moves quickly to make R1 available broadly.
buff.ly
January 29, 2025 at 8:40 PM
I find it so interesting that Azure is hosting the DeepSeek R1 model now. On the one hand, they host a lot of models, but on the other, this one has the geopolitical angle, DeepSeek's upending OpenAI's biz model, and OpenAI's contentious relationship with Microsoft.

... and it's currently free. 🤯
January 30, 2025 at 1:31 AM
Reposted by Robert Isaacs
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training

- RL generalizes in rule-based envs, esp. when trained with an outcome-based reward
- SFT tends to memorize the training data and struggles to generalize OOD
January 29, 2025 at 1:43 PM