Learn more at: https://llm-d.ai
Join contributors from vLLM and llm-d at NVIDIA Dynamo Day to see how the community is building the future of distributed inference.
📍 Virtual & Free 📅 Jan 22 | 8AM–1PM PT 🔗 nvevents.nvidia.com/dynamoday
- Model inference (KServe, vLLM, @llm-d.ai)
- @kubernetes.io AI Conformance Program
- @kubefloworg.bsky.social & @argoproj.bsky.social
- @cncf.io TAG Workloads Foundation
- Open source, cloud-native, AI infra and systems
1. Cloud Native AI + Kubeflow Day: Welcome + Opening Remarks: https://sched.co/2DZN3
2. Project Lightning Talk: Evolving KServe: https://sched.co/2EFyW
We launched our newsletter publicly last year to share our Red Hat AI teams’ contributions to upstream communities. We’ve gained over 𝟭𝟮𝟬𝟬 𝘀𝘂𝗯𝘀𝗰𝗿𝗶𝗯𝗲𝗿𝘀!
Huge shoutout to @vllm_project and @IBMResearch on the new KV Offloading Connector. We’re seeing up to 9x throughput gains on H100s and massive TTFT reductions. 🧵
blog.vllm.ai/2026/01/08/k...
Check out this breakdown by Cedric Clyburn from Red Hat on how llm-d intelligently routes distributed LLM requests.
🔹 Solves "round robin" congestion
🔹 Disaggregates P/D to save costs
www.youtube.com/watch?v=CNKG...
This demo shows a near 90% KV cache hit rate, a smoother time to first token, and a ~500ms drop in P95 tail latency.
https://www.youtube.com/watch?v=H2N4c-E-iw8
@RedHat_AI is hiring a variety of roles around open source LLM inference. https://x.com/RedHat_AI/status/2001362060777586744
We’ve curated our recent technical deep dives and talks from KubeCon, PyTorch Conf, and more into one central hub.
Learn Kubernetes-native distributed inference from the source. 🧵👇
https://llm-d.ai/videos
👇 https://x.com/TerryTangYuan/status/1992995298105290794
Enter llm-d ⚡️
Join @RedHat_AI's Rob Shaw for a deep dive into this open-source framework for optimizing distributed LLM inference using a "well-lit paths" approach
👉 https://www.youtube.com/watch?v=_xAXb70d4-0
⭐️ A Kubernetes-native distributed LLM inference framework built for performance and scalability.
Join the community today!
https://llm-d.ai/docs/community
Join the llm-d community sessions exploring how to route and scale LLM inference on Kubernetes.
From prefix-aware routing to multi-accelerator deployments - come learn what we've been building.
Schedule: https://llm-d.ai/docs/community/events
Your input on how it compares to the previous Helmfiles approach is crucial for our v0.4 release cycle.
Please share your thoughts in our short form! 👇
📝 https://t.co/HGDIusHtBu
Want to learn about efficient, scalable LLM inference directly from the experts?
Here's where you can find us this month: 👇
Standard round-robin routing is blind to KV-cache state, leading to cache misses that force costly re-computation of tokens.
Our latest post on the llm-d blog dives deep into this problem. 🧵
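To make the failure mode concrete, here is a minimal, hypothetical Python sketch (not llm-d's actual scheduler code): a round-robin picker ignores which pod already holds a prompt's KV blocks, while a prefix-aware picker scores pods by cached prefix overlap. The Pod class and both pick functions are illustrative assumptions only.

```python
# Hypothetical sketch contrasting round-robin with prefix-aware routing.
from dataclasses import dataclass, field
from itertools import cycle

@dataclass
class Pod:
    name: str
    # Prompt prefixes whose KV blocks are assumed resident on this pod.
    cached_prefixes: set = field(default_factory=set)

pods = [Pod("pod-a"), Pod("pod-b"), Pod("pod-c")]
rr = cycle(pods)

def round_robin_pick(prompt: str) -> Pod:
    # Blind to KV-cache state: the pod that already holds this prefix
    # is chosen only 1/N of the time, so prefill work is repeated elsewhere.
    return next(rr)

def prefix_aware_pick(prompt: str) -> Pod:
    # Score pods by the longest cached prefix the request can reuse,
    # and route to the pod with the biggest expected cache hit.
    def score(pod: Pod) -> int:
        return max((len(p) for p in pod.cached_prefixes if prompt.startswith(p)), default=0)
    return max(pods, key=score)

# Example: pod-b has already prefilled this shared system prompt once.
system_prompt = "You are a helpful assistant. "
pods[1].cached_prefixes.add(system_prompt)
request = system_prompt + "Summarize today's cluster metrics."
print(round_robin_pick(request).name)   # often a cache miss -> full re-computation
print(prefix_aware_pick(request).name)  # pod-b -> reuses cached KV blocks
```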
Case in point: an upcoming talk from Jeff Fan at @DigitalOcean diving into next-gen AI infrastructure with llm-d! 👇