Kimi K2 is based on the DeepSeek V3/R1 architecture, and here's a side-by-side comparison.
In short, Kimi K2 is a slightly scaled DeepSeek V3/R1. And the gains are in the data and training recipes. Hopefully, we will see some details on those soon, too.
From DeepSeek to GPT-OSS, it’s all here ↓
Covers every flagship model
1️⃣ DeepSeek V3/R1
2️⃣ OLMo 2
3️⃣ Gemma 3
4️⃣ Mistral Small 3.1
5️⃣ Llama 4
6️⃣ Qwen3
7️⃣ SmolLM3
8️⃣ Kimi K2
9️⃣ GPT-OSS
#ArtificialIntelligence #MachineLearning #DeepLearning #DataScience #Analytics
lol notice how they clipped off the top 12
Here's how to deploy the Deepseek-r1 LLM using Ollama on bare metal Kubernetes with Talos and Omni’s Image Factory. → www.youtube.com/watch?v=HiDW...
Want to talk more about it? Find our team at #KubeCon!
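For reference, once the deepseek-r1 model is up behind Ollama, the pod can be queried over Ollama's standard REST API. A minimal sketch, assuming the service is reachable at localhost:11434 (e.g. via `kubectl port-forward`; the endpoint and model tag are assumptions, adjust to your cluster):

```python
import json
import urllib.request

# Assumed endpoint: adjust to however the Ollama Service is exposed
# in your cluster (port-forward, LoadBalancer, Ingress, ...).
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_generate_payload(model: str, prompt: str) -> dict:
    """Build a non-streaming request body for Ollama's /api/generate."""
    return {"model": model, "prompt": prompt, "stream": False}

def ask(model: str, prompt: str) -> str:
    """POST the prompt and return the generated text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_generate_payload(model, prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Example (needs a running cluster, so not executed here):
# print(ask("deepseek-r1", "Why is the sky blue?"))
```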
www.interconnects.ai/p/the-new-rl...
new research from Princeton shows several factors that complicate benchmarking
agents will:
- take shortcuts
- take overly expensive actions
- hardcode answers
also, token efficiency doesn’t translate to cost reduction
arxiv.org/abs/2510.11977
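the cost point is worth spelling out with toy numbers (all prices hypothetical): total cost depends on per-token price plus per-action fees, not token count alone, so a run that uses fewer tokens can still be the more expensive one

```python
# Toy arithmetic only; the token counts, per-million-token prices, and
# tool-call fees below are made up for illustration.

def run_cost(tokens: int, usd_per_mtok: float, action_fees: float = 0.0) -> float:
    """Total cost of an agent run: token cost plus any per-action fees."""
    return tokens / 1_000_000 * usd_per_mtok + action_fees

# Verbose agent on a cheap model vs. a token-efficient agent on a pricier
# model that also makes paid tool calls.
verbose   = run_cost(tokens=200_000, usd_per_mtok=0.50)                     # 0.10 USD
efficient = run_cost(tokens=100_000, usd_per_mtok=3.00, action_fees=0.05)   # 0.35 USD
assert efficient > verbose  # half the tokens, over 3x the bill
```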
huggingface.co/deepseek-ai/...
See our tracker geni.us/IIB-LLMs
made with @vizsweet
35 tok/s
qwen-coder-30b: 59 tok/s
gemma-3n-e4b: 42 tok/s
gpt-oss-20b: 57 tok/s
gpt-oss-120b: 27 tok/s
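those throughput numbers convert directly into latency for a fixed-length reply (time = tokens ÷ tok/s); a quick sketch using the figures above and an arbitrary 500-token response

```python
# tok/s figures taken from the post above; the 500-token response
# length is an arbitrary example, not part of the original benchmark.

def seconds_for(tokens: int, toks_per_sec: float) -> float:
    """Time to generate `tokens` tokens at a given throughput."""
    return tokens / toks_per_sec

throughput = {
    "qwen-coder-30b": 59,
    "gemma-3n-e4b": 42,
    "gpt-oss-20b": 57,
    "gpt-oss-120b": 27,
}

for model, tps in throughput.items():
    print(f"{model}: {seconds_for(500, tps):.1f}s for a 500-token reply")
```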
The 11 LLM archs covered in this video:
1. DeepSeek V3/R1
2. OLMo 2
3. Gemma 3
4. Mistral Small 3.1
5. Llama 4
6. Qwen3
7. SmolLM3
8. Kimi K2
9. GPT-OSS
10. Grok 2.5
11. GLM-4.5/4.6
www.youtube.com/watch?v=rNlU...
i get that people in AI are bad at branding, but are they really *this* bad?
afaict the next one is V4, but they got so much publicity with R1..
"2.2.3. Training Template
To train DeepSeek-R1-Zero, we begin by designing a straightforward template that guides the base model to adhere to our specified instructions."
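for context, the template that section goes on to describe wraps each question in a fixed prompt asking the model to reason inside <think> tags before answering inside <answer> tags; a paraphrased sketch (wording approximated, not quoted verbatim from the paper)

```python
# Paraphrase of the R1-Zero-style training template: reasoning goes in
# <think> ... </think>, the final answer in <answer> ... </answer>.
# The exact prompt text is approximated, not copied from the paper.
TEMPLATE = (
    "A conversation between User and Assistant. The User asks a question, "
    "and the Assistant solves it. The Assistant first thinks about the "
    "reasoning process in the mind and then provides the answer. The "
    "reasoning process and answer are enclosed within <think> </think> and "
    "<answer> </answer> tags, respectively.\n"
    "User: {question}\nAssistant:"
)

def format_prompt(question: str) -> str:
    """Fill the template with a single user question."""
    return TEMPLATE.format(question=question)

# The base model then completes the text after "Assistant:".
```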
"2.2.3. Training Template
To train DeepSeek-R1-Zero, we begin by designing a straightforward template that guides the base model to adhere to our specified instructions."
#Java #CodeGen #genai #llm
Huawei's partner was the elite of Zhejiang University,
the alma mater of DeepSeek's founder ⬇️
Counterpoint: It cost a lot more than $294k.
🤷‍♂️
www.theregister.com/2025/09/19/d...
go.nature.com/41WGjPu