Create synthetic data pipelines with ease!
- Retries and caching included
- Inference via LiteLLM, vLLM, and popular batch APIs
- Asynchronous operations
🔗 URL: buff.ly/ajPRT1l
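The post doesn't spell out an API, so here is a minimal plain-Python sketch of those three bullets (retries with backoff, a naive in-memory cache, async fan-out) against a local vLLM OpenAI-compatible endpoint. The base URL and model id are placeholders, not any library's real interface.

# Minimal sketch: retries, caching, and async fan-out against a local
# vLLM OpenAI-compatible server. URL and model name are placeholders.
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
_cache: dict[str, str] = {}  # naive cache keyed by prompt

async def generate(prompt: str, retries: int = 3) -> str:
    if prompt in _cache:                    # caching
        return _cache[prompt]
    for attempt in range(retries):          # retries with exponential backoff
        try:
            resp = await client.chat.completions.create(
                model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model
                messages=[{"role": "user", "content": prompt}],
            )
            _cache[prompt] = resp.choices[0].message.content
            return _cache[prompt]
        except Exception:
            await asyncio.sleep(2 ** attempt)
    raise RuntimeError(f"gave up after {retries} tries: {prompt!r}")

async def main() -> None:
    prompts = [f"Write one synthetic QA pair about topic {i}." for i in range(8)]
    results = await asyncio.gather(*(generate(p) for p in prompts))  # async fan-out
    print(results[0])

asyncio.run(main())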
𝐯𝐋𝐋𝐌 𝐟𝐨𝐫 𝐁𝐞𝐠𝐢𝐧𝐧𝐞𝐫𝐬 𝐏𝐚𝐫𝐭 𝟑:📖 𝐃𝐞𝐩𝐥𝐨𝐲𝐦𝐞𝐧𝐭 𝐎𝐩𝐭𝐢𝐨𝐧𝐬
Learn to deploy #vLLM everywhere! Even on CPU🤫
✅Platform & model Support Matrix
✅Install on GPU & CPU
✅Build Wheel from scratch | Python vLLM package
✅Docker/Kubernetes Deployment
✅Running vLLM server (Offline + Online inference)
💎 What makes #vLLM the Rolls-Royce of inference? 👇🏻 Check our new blog, where we break it down in 5 performance-packed layers 😎 #TherYouGo
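For flavour, a rough illustration of the "offline inference" path with vLLM's Python API; the model id below is a small placeholder, and the "online" path is the same checkpoint served over HTTP with vllm serve (an example serve command appears further down this feed).

# Offline inference: load a model locally and batch-generate.
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2.5-1.5B-Instruct")   # placeholder, fits a single small GPU
params = SamplingParams(temperature=0.7, max_tokens=64)

for out in llm.generate(["Explain PagedAttention in one sentence."], params):
    print(out.outputs[0].text)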
𝐯𝐋𝐋𝐌 𝐟𝐨𝐫 𝐁𝐞𝐠𝐢𝐧𝐧𝐞𝐫𝐬 𝐏𝐚𝐫𝐭 𝟐:📖 𝐊𝐞𝐲 𝐅𝐞𝐚𝐭𝐮𝐫𝐞𝐬 & 𝐎𝐩𝐭𝐢𝐦𝐢𝐳𝐚𝐭𝐢𝐨𝐧𝐬
💎 What makes #vLLM the Rolls-Royce of inference?
👉check it out: cloudthrill.ca/what-is-vllm...
✅ #PagedAttention #PrefixCaching #ChunkedPrefill
✅ #SpeculativeDecoding #FlashAttention #lmcache
✅ Tensor & #PipelineParallelism⚡
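A hedged sketch of switching a few of these on through the offline Python API; the argument names follow vLLM's LLM constructor, the model id is a placeholder, and PagedAttention itself is always on, so it needs no flag.

# Enabling a few of the features above (assumes two visible GPUs).
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder
    enable_prefix_caching=True,    # PrefixCaching: reuse KV blocks across shared prompt prefixes
    enable_chunked_prefill=True,   # ChunkedPrefill: interleave long prefills with decode steps
    tensor_parallel_size=2,        # Tensor parallelism across 2 GPUs
)
print(llm.generate(["ping"], SamplingParams(max_tokens=8))[0].outputs[0].text)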
@reach_vb:
Letsss gooo! DeepSeek just released a 3B OCR model on Hugging Face 🔥 Optimised to be token efficient AND scale ~200K+ pages/day on A100-40G. Same arch as DeepSeek VL2. Use it with Transformers, vLLM and more 🤗 https://huggingface.co/...
Constrained output for LLMs, e.g., the outlines library for vLLM, which forces models to emit JSON/Pydantic schemas, is cool!
But because output tokens cost much more latency than input tokens, bespoke low-token output formats are often better when speed matters.
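To make the trade-off concrete, a rough sketch against a vLLM OpenAI-compatible server: (a) schema-constrained JSON via guided decoding, (b) a terse bespoke format that spends far fewer output tokens. The endpoint, model id, and toy schema are placeholders.

from openai import OpenAI
from pydantic import BaseModel

class Verdict(BaseModel):
    label: str
    confidence: float

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
MODEL = "meta-llama/Llama-3.1-8B-Instruct"  # placeholder

# (a) constrained JSON: trivially parseable, but verbose on the output side
json_resp = client.chat.completions.create(
    model=MODEL,
    messages=[{"role": "user", "content": "Classify the review 'great product'; give label and confidence."}],
    extra_body={"guided_json": Verdict.model_json_schema()},  # server-side guided decoding
)
print(Verdict.model_validate_json(json_resp.choices[0].message.content))

# (b) bespoke low-token format: a handful of output tokens, so lower latency
terse_resp = client.chat.completions.create(
    model=MODEL,
    messages=[{"role": "user", "content": "Classify the review 'great product'. Reply as LABEL|CONFIDENCE only."}],
    max_tokens=8,
)
label, confidence = terse_resp.choices[0].message.content.strip().split("|")
print(label, confidence)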
🗓️ Thursday 17th 11:30 AM EDT
🎯 A chill livestream unpacking LLM #Quantization: #vllm vs #ollama. Learn about the What & How.
🔥Dope guest stars:
#bartowski from arcee.ai & Eldar Kurtic from #RedHat
🔗 Stream on YouTube & LinkedIn:
www.youtube.com/watch?v=XTE0...
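A tiny preview of the "what": loading an already-quantized checkpoint in vLLM, with the Ollama counterpart sketched in comments. The repo id and model tag are placeholders, not picks from the stream.

# Running an AWQ-quantized checkpoint with vLLM (placeholder repo id).
from vllm import LLM, SamplingParams

llm = LLM(model="TheBloke/Mistral-7B-Instruct-v0.2-AWQ", quantization="awq")
print(llm.generate(["Why quantize an LLM?"], SamplingParams(max_tokens=48))[0].outputs[0].text)

# The Ollama side of the comparison pulls GGUF quants instead, e.g. via its
# Python client (tag is a placeholder):
#   import ollama
#   ollama.chat(model="llama3.1:8b-instruct-q4_K_M",
#               messages=[{"role": "user", "content": "Why quantize an LLM?"}])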
📦 vllm-project / vllm
⭐ 34,172 (+145)
🗒 Python
A high-throughput and memory-efficient inference and serving engine for LLMs
🤯 Expanded model support? ✅
💪 Intel Gaudi integration? ✅
🚀 Major engine & torch.compile boosts? ✅
🔗 github.com/vllm-project...
💎 This time, we shift from theory to practice, covering #vLLM installs across platforms. Check our new blog, where we break it down in 5 sections 😎 #TherYouGo
𝐯𝐋𝐋𝐌 𝐟𝐨𝐫 𝐁𝐞𝐠𝐢𝐧𝐧𝐞𝐫𝐬 𝐏𝐚𝐫𝐭 𝟑:📖 𝐃𝐞𝐩𝐥𝐨𝐲𝐦𝐞𝐧𝐭 𝐎𝐩𝐭𝐢𝐨𝐧𝐬
Learn to deploy #vLLM everywhere! Even on CPU🤫
✅Platform & model Support Matrix
✅Install on GPU & CPU
✅Build Wheel from scratch | Python vLLM package
✅Docker/Kubernetes Deployment
✅Running vLLM server (Offline + Online inference)
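And once a server is up (vllm serve <model>, or the official Docker image), "online inference" is just an OpenAI-compatible HTTP call. A minimal sketch; the URL and model id are placeholders for whatever the server was started with.

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
resp = client.chat.completions.create(
    model="Qwen/Qwen2.5-1.5B-Instruct",   # placeholder
    messages=[{"role": "user", "content": "One tip for running vLLM on CPU?"}],
    max_tokens=64,
)
print(resp.choices[0].message.content)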
📦 Alpha-VLLM / LLaMA2-Accessory
⭐ 489 (+174)
🗒 Python
An Open-source Toolkit for LLM Development
📦 vllm-project / vllm
⭐ 54,301 (+143)
🗒 Python
A high-throughput and memory-efficient inference and serving engine for LLMs
#AI #LargeLanguageModels #RedHatAI #benchmarks #AIoptimization #enterprise
"Now you can deploy Qwen3 via Ollama, LM Studio, SGLang, and vLLM — choose from multiple formats including GGUF, AWQ, and GPTQ for easy local deployment."
Find all models in the Qwen3 collection on Hugging Face @ huggingface.co/collections/...
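For example, a quick sanity check of the vLLM path with its Python API. The exact repo id below is an assumption; pick whichever size and format from the collection (GGUF for Ollama/LM Studio, AWQ/GPTQ for vLLM/SGLang) fits your hardware.

from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen3-8B-AWQ")   # assumed repo id from the Qwen3 collection
out = llm.generate(["Summarize what's new in Qwen3 in one sentence."],
                   SamplingParams(temperature=0.7, max_tokens=64))
print(out[0].outputs[0].text)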
"Now you can deploy Qwen3 via Ollama, LM Studio, SGLang, and vLLM — choose from multiple formats including GGUF, AWQ, and GPTQ for easy local deployment."
Find all models in the Qwen3 collection on Hugging Face @ huggingface.co/collections/...
📦 vllm-project / vllm
⭐ 48,168 (+100)
🗒 Python
A high-throughput and memory-efficient inference and serving engine for LLMs
📦 vllm-project / vllm
⭐ 46,215 (+128)
🗒 Python
A high-throughput and memory-efficient inference and serving engine for LLMs
Built on vLLM + Megatron-LM. Create agents that actually learn & improve.
🔥 github.com/tensorwavecl...
✨ Let's advance LLMs together
#AI #OpenSource
Run Pixtral Large with multiple input images from day 0 using vLLM.
Install vLLM:
pip install -U vllm
Run Pixtral Large:
vllm serve mistralai/Pixtral-Large-Instruct-2411 --tokenizer_mode mistral --limit_mm_per_prompt 'image=10' --tensor-parallel-size 8
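Once that server is running, multiple input images per prompt go through the standard OpenAI-compatible chat API. A short sketch; the image URLs are placeholders.

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
resp = client.chat.completions.create(
    model="mistralai/Pixtral-Large-Instruct-2411",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What differs between these two images?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/a.png"}},  # placeholder
            {"type": "image_url", "image_url": {"url": "https://example.com/b.png"}},  # placeholder
        ],
    }],
    max_tokens=128,
)
print(resp.choices[0].message.content)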
Big strides in generative AI and search technology!
A roundup of the 9 Advent Calendar articles
from Acroquest's YAMALEX team.
🔥 3 must-read posts:
・Fast partial-match search with Elasticsearch
・Multimodal PDF search with ColQwen2
・Speeding up LLM inference with vLLM
Details 👇
acro-engineer.hatenablog.com/entry/2024/1...
#AI #Search #Technology