Suraj Deshmukh | सुरज देशमुख
suraj.io
@suraj.io
@Microsoft.com | ex-@kinvolkio ex-@RedHat | bibliophile | He/Him | Opinions are my own.

🟥 🟩
🟦 🟨
Come see us (me & Yuhan Liu) tomorrow for our talk.

Specifically, Wednesday November 12, 2025 5:30pm - 6:00pm EST at Building B | Level 5 | Thomas Murphy Ballroom 1.

More info: sched.co/27FcQ #kubecon #vllm
KubeCon + CloudNativeCon North America 2025: LLMs on Kubernetes: Squeeze 5x GPU Effic...
View more about this event at KubeCon + CloudNativeCon North America 2025
November 11, 2025 at 7:52 PM
Building a tool to copy-paste share terminal sessions using Claude Code for web
open.substack.com/pub/simonw/p...
Plus Living dangerously with Claude, and prompt injection risks for ChatGPT Atlas
October 24, 2025 at 8:07 PM
Join me and Yuhan Liu for our talk at the upcoming #Kubecon NA 2025 in Atlanta: sched.co/27FcQ. We will talk about increasing efficiency while serving #LLMs using #vLLM & #LMCache!
October 15, 2025 at 10:29 PM
Using Claude Code but with Github Copilot hosted Claude models:
github.com/surajssd/dot...

TFS @nilekh.bsky.social
October 14, 2025 at 10:06 PM
Claude Code: Tips and Tricks

youtu.be/HSkLeECsBcw?...
YouTube video by Anand Tyagi
October 13, 2025 at 10:54 PM
Gang Scheduling for Llama by Anca Agape and Andre Darabanov
www.youtube.com/watch?v=4Bef...
YouTube video by @Scale
October 1, 2025 at 5:15 PM
Cut Model Deployment Costs While Keeping Performance With GPU Memory Swap | NVIDIA Technical Blog developer.nvidia.com/blog/cut-mod...
Deploying large language models (LLMs) at scale presents a dual challenge: ensuring fast responsiveness during high demand, while managing the costs of GPUs. Organizations often face a trade-off…
September 29, 2025 at 4:58 AM
The Only Trait for Success in the AI Era—How to Build It youtu.be/xWYb7tImErI?...
The Only Trait for Success in the AI Era—How to Build It | Carnegie Mellon University Po-Shen Loh
YouTube video by EO
September 3, 2025 at 3:18 AM
OSDI '24 - DistServe: Disaggregating Prefill and Decoding for Goodput-optimized LLM serving youtu.be/WwJvecXOeUA?...
OSDI '24 - DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language...
YouTube video by USENIX
August 28, 2025 at 8:09 AM
OSDI '24 - Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve youtu.be/S8rq3pYboZY?...
YouTube video by USENIX
August 28, 2025 at 7:47 AM
More Nodes, More Problems: Solving Multi-Host GPU/TPU Scheduling with Dynamic Resource Allocation youtu.be/YqIHESG0suI?...
More Nodes, More Problems: Solving Multi-Host GPU/TPU Scheduli... John Belamaric & Morten Torkildsen
YouTube video by CNCF [Cloud Native Computing Foundation]
August 28, 2025 at 7:28 AM
Extending Kubernetes for AI | Lessons Learned From Platform Engineering
youtu.be/d9K5PSsHtDg?...
Extending Kubernetes for AI | Lessons Learned From Platform... - Susan, Lucy, Andrea, Etienne, Tim
YouTube video by CNCF [Cloud Native Computing Foundation]
August 28, 2025 at 7:26 AM
You Need to Be Bored. Here's Why.
www.youtube.com/watch?v=orQK...
YouTube video by Harvard Business Review
August 27, 2025 at 1:53 PM
You can use ChatGPT and other models on a flight using onboard free WiFi via WhatsApp.

Use MetaAI out of the box or save these contacts:

- ChatGPT 1800 242 8478
- Microsoft Copilot +1 (877) 224-1042
August 27, 2025 at 12:51 PM
Andrej Karpathy: Software Is Changing (Again)
youtu.be/LCEmiRjPEtQ?...
YouTube video by Y Combinator
August 23, 2025 at 8:16 AM
DistServe: disaggregating prefill and decoding for goodput-optimized LLM inference
www.youtube.com/live/Bh-jlh5...
YouTube video by PyTorch
July 25, 2025 at 3:49 AM
The Kubernetes Network Driver Model: A Composable Architecture for High-Performance Networking
arxiv.org/html/2506.23...
July 25, 2025 at 2:48 AM