Lightnews — Scholar-powered news

Awakari

@bluesky.awakari.com

Weibo's VibeThinker-1.5B outperforms DeepSeek-R1 with $7,800 post-training https://venturebeat.com/ai/weibos-new-open-source-ai-model-vibethinker-1-5b-outperforms-deepseek-r1-on?utm_source=flipboard&utm_medium=activitypub

Origin

flipboard.com

November 12, 2025 at 8:47 PM

AI Daily Post

@aidailypost.com

VibeThinker-1.5B just outpaced DeepSeek-R1 on a $7.8K post-training budget, matching bigger models on math and code tasks. Curious how it runs on edge devices? Dive into the details! #VibeThinker1_5B #DeepSeekR1 #GPQA

🔗 aidailypost.com/news/weibos-...

November 12, 2025 at 8:05 PM

pradi005.bsky.social

@pradi005.bsky.social

Weibo's new open source AI model VibeThinker-1.5B outperforms DeepSeek-R1 on $7,800 post-training budget

Another day at the end of 2025, another impressive result from a Chinese company in open source artificial intelligence. The AI division of Chinese social networking company Weibo recently released VibeThinker-1.5B, an open source 1.5 billion parameter large language model (LLM) and a fine-tuned version of rival Chinese tech firm Alibaba's Qwen2.5-Math-1.5B. It is now available for free download and use by researchers and enterprise developers, including for commercial purposes, under a permissive MIT license on Hugging Face, GitHub, and ModelScope, along with a technical report on the open access science publishing site arxiv.org.

cnznews.com

November 12, 2025 at 7:42 PM

デイリーHuggingFaceトレンド

@huggingfacetrends.bsky.social

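Today's HuggingFace trend

salakash/SamKash-Tolstoy
"SamKash-Tolstoy" is a lightweight LoRA adapter (LLM) developed specifically for Russian literature.
It is based on DeepSeek R1 Distill Qwen 1.5B and trained on 475 public-domain Russian classics and specialized critical essays.
It specializes in the style, themes, and historical register of authors such as Tolstoy and Dostoevsky, and is intended for writers and scholars to use for creative writing, analysis, and preparing lecture materials.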

salakash/SamKash-Tolstoy · Hugging Face

We’re on a journey to advance and democratize artificial intelligence through open source and open science.

huggingface.co

November 12, 2025 at 10:20 AM

IA

@iabots.bsky.social

📢 DeepSeek R1 burst onto the scene with a bang, and the company has been silent for almost a year. Now one of its leaders has spoken about its impact

👉 https://btz.es/5cpfenK

November 11, 2025 at 9:00 PM

Eliasv!l3

@svil3.bsky.social

DeepSeek R1 burst onto the scene with a bang, and the company has been silent for almost a year. Now one of its leaders has spoken about its impact

Unlike the big artificial intelligence companies in the United States, which often have something catastrophic to say about how the future of the...

www.genbeta.com

November 11, 2025 at 3:25 PM

𝕏 Demonoid (antes Larry)

@zerommx.bsky.social

DeepSeek R1 burst onto the scene with a bang, and the company has been silent for almost a year. Now one of its leaders has spoken about its impact www.genbeta.com/actualidad/d...

Unlike the big artificial intelligence companies in the United States, which often have something catastrophic to say about how the future of the...

www.genbeta.com

November 11, 2025 at 2:16 PM

Alexander Doria

@dorialexander.bsky.social

No we used open weight models. Initial synth seed from R1/GPT-OSS and then finetuning of qwen/deepseek-prover.

November 11, 2025 at 10:55 AM

Xoxe 🇵🇸

@xoxe.es

DeepSeek R1 burst onto the scene with a bang, and the company has been silent for almost a year. Unlike American CEOs, it is not selling hype but concern about AI's impact on humanity

www.genbeta.com/actualidad/d...

DeepSeek R1 burst onto the scene with a bang, and the company has been silent for almost a year. Now one of its leaders has spoken about its impact

Unlike the big artificial intelligence companies in the United States, which often have something catastrophic to say about how the future of the...

www.genbeta.com

November 11, 2025 at 10:48 AM

Genbeta

@genbeta.bsky.social

DeepSeek R1 burst onto the scene with a bang, and the company has been silent for almost a year. Now one of its leaders has spoken about its impact https://www.genbeta.com/p/325004

November 11, 2025 at 9:47 AM

concertypin

@euroka.moe

1. Grab the DeepSeek R1 (not 0528) API
2. Have it write a novel
3. Savor the chaos
4. Get indigestion

November 11, 2025 at 4:42 AM

Aran Nayebi

@anayebi.bsky.social

In today's Generative AI lecture, we dive into reasoning models by dissecting how DeepSeek-R1 works (GRPO, which unlike PPO removes the need for a separate value network, plus training with a simpler rule-based reward), and end on mechanistic interpretability to better understand those reasoning traces.
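
The GRPO idea mentioned in this post can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not DeepSeek's implementation: several completions are sampled for the same prompt, each is scored by a rule-based reward, and each completion's advantage is its reward normalized against the group's mean and standard deviation, so no learned value network is needed. The `<answer>` tag format, the function names, and the sample completions are all hypothetical.

```python
import re
from statistics import mean, pstdev


def rule_based_reward(completion: str, reference_answer: str) -> float:
    """Hypothetical rule-based reward: 1.0 if the text inside
    <answer>...</answer> matches the reference exactly, else 0.0."""
    match = re.search(r"<answer>(.*?)</answer>", completion, flags=re.DOTALL)
    if match is None:
        return 0.0  # missing tags count as a failed answer
    return 1.0 if match.group(1).strip() == reference_answer.strip() else 0.0


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own sampling group.
    The group mean plays the role PPO gives to a value network baseline."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# A group of sampled completions for one prompt ("What is 2 + 2?").
completions = [
    "<think>2+2 is 4</think><answer>4</answer>",
    "<think>maybe 5</think><answer>5</answer>",
    "<think>adding gives 4</think><answer>4</answer>",
    "no tags, just 4",
]
rewards = [rule_based_reward(c, "4") for c in completions]
advantages = group_relative_advantages(rewards)
```

In a full training loop these advantages would weight a clipped policy-gradient objective, typically with a KL penalty toward a reference model; that part is omitted here.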

November 10, 2025 at 8:46 PM

Últimas Noticias

@unoticias.bsky.social

A study shows that systems such as GPT-4o and DeepSeek R1 cannot reliably recognize first-person false beliefs

Language models still confuse beliefs with facts

ultimasnoticias.com.ve

November 10, 2025 at 8:17 PM

Ilyas Iqbal

@ilyasiqbal.bsky.social

A comparison of Kimi K2 Thinking with GPT-5, DeepSeek R1, Claude Opus 4, Qwen 3, and Grok 4, and how it achieves the same or better reasoning at 10x lower cost.

Discover benchmarks, business applications, pricing, implementation strategies and more here:

ilyasiqbal.com/2025/11/09/k...

Kimi K2 Thinking: A New Frontier for Agentic AI, Benchmarks and Pricing - Ilyas Iqbal

Kimi K2 Thinking revolutionizes AI with open-source GPT-5 level reasoning at 10x lower cost. Discover benchmarks, business applications, and implementation strategies.

ilyasiqbal.com

November 9, 2025 at 9:10 PM

Solidot

@solidot.bsky.social

Instruct v0.2, Qwen 2.5 7B Instruct, Gemma 3 4B Instruct, DeepSeek-R1-Distill-Llama-8B, and Apertus-8B-2509, and found that the classifier they developed could identify AI-generated replies with 70%-80% accuracy.

November 9, 2025 at 2:31 PM

isupermantw.bsky.social

@isupermantw.bsky.social

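What are the features of the DeepSeek R1 model

DeepSeek R1 is centered on localized Traditional Chinese understanding, combining efficient inference with lightweight deployment, and is suited to edge and cloud scenarios in Taiwanese enterprises. It offers local-context training, finance and manufacturing industry modules, and secure governance of smart-city data. Data stays local, in line with Taiwan's Personal Data Protection Act and cloud compliance requirements. Multilingual support and real-time multimodal capabilities improve the efficiency of customer service, technical support, and analysis, giving Taiwanese users a stable, low-latency AI solution.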

www.isuperman.tw

November 8, 2025 at 11:49 PM

Thomas Edwards

@mpeg2tom.bsky.social

“large reasoning models, such as OpenAI’s o1 (Jaech et al., 2024), Qwen-QwQ (Team), and DeepSeek-R1 (Team, 2024), have demonstrated impressive stepwise reasoning capabilities over long sequences through large-scale reinforcement learning.” arxiv.org/abs/2502.04644

Agentic Reasoning: A Streamlined Framework for Enhancing LLM Reasoning with Agentic Tools

We introduce Agentic Reasoning, a framework that enhances large language model (LLM) reasoning by integrating external tool-using agents. Agentic Reasoning dynamically leverages web search, code execu...

arxiv.org

November 8, 2025 at 4:05 PM

snuow

@snuow.bsky.social

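This is a video I made a while ago.
Battle for the best free-to-use AI! Comparing the output quality of GPT OSS, Qwen3, and DeepSeek R1
🔥 GPT-OSS vs Qwen3 vs DeepSeek-R1 in-depth comparison! The battle for the best open-source AI 🔥

This time, OpenAI has released...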
URL: https://www.youtube.com/watch?v=rLb71_GPQLI

November 8, 2025 at 12:18 PM

Tim Kellogg

@timkellogg.me

this morning, X is saturated with people from the US claiming that their favorite unknown benchmark (that happens to show K2 trailing US models) is actually the best single benchmark to watch

lol notice how they clipped off the top 12

A leaderboard-style table ranking AI models by performance percentage.

Rank | Model | Score | Organization
13th | o1-preview | 41.7% | OpenAI
14th | Claude 3.5 Sonnet 10-22 | 41.4% | Anthropic
15th | Gemini 2.5 Flash (latest) | 41.2% | Google
16th | DeepSeek R1 05/28 | 40.8% | DeepSeek
17th | o1-2024-12-17 (high) | 40.1% | OpenAI
18th | DeepSeek V3.1 | 40.0% | DeepSeek
19th | Kimi K2 Thinking (NEW) | 39.6% | Moonshot AI

The table shows incremental differences between model scores, with Kimi K2 Thinking newly added to the list at 19th place, just below DeepSeek V3.1.

November 8, 2025 at 12:10 PM

Corey Rayburn Yung

@coreyryung.bsky.social

I know my Titan XP is old, but I'm disappointed that DeepSeek-R1 requires 2 of them just to do inference at the int4 level. I would need 40 to do full training at float32! Madness.
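
A back-of-the-envelope sketch of the arithmetic behind numbers like these, under loud assumptions. The post does not say which DeepSeek-R1 variant or parameter count is meant, so the 32 billion parameters below are purely hypothetical, chosen only because they land near the post's figures. The estimate counts weights (plus gradients and Adam optimizer state for training) and ignores activations, KV cache, and framework overhead. The 12 GB per card is the Titan Xp's memory size.

```python
import math

TITAN_XP_BYTES = 12e9  # a Titan Xp card has 12 GB of memory


def gpus_needed(num_params: float, bytes_per_param: float,
                gpu_bytes: float = TITAN_XP_BYTES) -> int:
    """Lower bound on the number of cards needed to hold the model state."""
    return math.ceil(num_params * bytes_per_param / gpu_bytes)


# Hypothetical model size: NOT stated in the post.
ASSUMED_PARAMS = 32e9

# int4 inference: 4 bits = 0.5 bytes per weight.
INT4_INFERENCE = 0.5
# float32 training with Adam: 4 (weights) + 4 (gradients) + 8 (two moments).
FP32_ADAM_TRAINING = 4 + 4 + 8

inference_gpus = gpus_needed(ASSUMED_PARAMS, INT4_INFERENCE)
training_gpus = gpus_needed(ASSUMED_PARAMS, FP32_ADAM_TRAINING)
print(inference_gpus, training_gpus)
```

Under these assumptions the estimate is in the same range as the post's "2" and "40", but a different model size or optimizer would change it substantially.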

November 7, 2025 at 6:19 PM

Amanda Bertsch

@abertsch.bsky.social

Models show varying error patterns. Claude and some GPT-family models underperform on tasks that require outputting dates; Gemini and DeepSeek-R1 frequently over-reason and fail to return an answer at all on Oolong-synth, although Gemini is the best model on Oolong-real.

Score by answer type and task type for Oolong-synth. The month+year and date types are the hardest for many models, corresponding with the difficulty of the timeline tasks.

November 7, 2025 at 5:07 PM

adr (goddamit yes)

@jbfink.bsky.social

So rather than "never mention it again" - I assume that poster was talking about DeepSeek-R1 - they haven't *been in the news* like the R1 release was, but they've been doing the work - DeepSeek and others - and putting it out there.

November 7, 2025 at 12:42 PM

PostCandide

@postcandide.151e.org

Compared to DeepSeek R1 release, K2 Thinking seems to be making relatively few waves.

I guess most peeps are default-assuming the benchmarks are seriously gamed.

November 7, 2025 at 12:27 PM

Rohit Kumar Tiwari

@analyticalrohit.bsky.social

Best breakdown of modern LLM architectures

From DeepSeek to GPT-OSS, it’s all here ↓

Covers every flagship model

1️⃣ DeepSeek V3/R1
2️⃣ OLMo 2
3️⃣ Gemma 3
4️⃣ Mistral Small 3.1
5️⃣ Llama 4
6️⃣ Qwen3
7️⃣ SmolLM3
8️⃣ Kimi 2
9️⃣ GPT-OSS

#ArtificialIntelligence #MachineLearning #DeepLearning #DataScience #Analytics

November 7, 2025 at 12:27 PM

Tim Kellogg

@timkellogg.me

K2-Thinking is SOTA, top model in agentic tool calling

A horizontal bar chart titled “τ²-Bench Telecom (Agentic Tool Use)” comparing AI model performance across vendors.

Each bar shows a model’s accuracy percentage, color-coded by provider.

From left to right:
• Kimi K2 Think — 93% (blue, highest)
• GPT-5 (high) — 87% (black)
• MiniMax-M2 — 87% (pink)
• GPT-5 (base) — 85%
• Claude 4.5 Sonnet — 78%
• Grok-1 — 75%
• Kimi K2 0905 — 73%
• Claude 4.1 Opus — 71%
• GLM-4-9B — 71%
• Abel-v1.15 / 1.85B Thinker — 68%
• gpt-oss-210D (high) — 66%
• Grok 4 (test) — 66%
• Kimi K2 — 61%
• Claude 4.5 Haiku — 55%
• Gemini 2.5 Pro — 54%
• Qwen 2.5 32B — 53%
• Amazon Bedrock Medistinct-12 — 52%
• DeepSeek R1 025B — 37%
• DeepSeek V3 24B — 34%
• Nim Llama Super-490B v1.5 — 28%
• Llama Maverick — 18% (lowest).

A purple arrow points from MiniMax-M2 (87%) to Kimi K2 Think (93%).
The top-right corner shows “Artificial Analysis” as the source.

November 7, 2025 at 10:40 AM
