Nawaf Alageel
@nawafalageel.bsky.social
Trying to teach computers how to see through math.
Yes, in some cases CPU and GPU users can be different. However, that wasn't the issue here. Oversubscribing CPU memory was a concern initially, but again, we never actually ran into it.
July 19, 2025 at 3:35 PM
Yes, that was the main setup. But soon the question came up: if Project X (container/person) isn’t using GPUs this week, shouldn’t we allocate them to other projects? From then on, allocation/monitoring moved from node level to container level, tied directly to a person or project.
July 16, 2025 at 5:08 PM
Yep yup, you are right! The team used to run "--gpus device=0,2", and we would tag each container with the GPU IDs in its name. But as the workload grew, they started running "--gpus all". That’s when things started getting messy.
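Roughly, the two styles look like this (container names and the CUDA image tag are just placeholders; note the extra quoting the device list needs so Docker doesn't split on the comma):

```
# Old approach: pin the container to specific GPUs and encode them in its name.
docker run -d --gpus '"device=0,2"' --name projx-gpu0-2 nvidia/cuda:12.2.0-base-ubuntu22.04 sleep infinity

# Later approach: hand every GPU to the container; nothing in the name or the
# Docker CLI tells you which GPUs it actually ends up using.
docker run -d --gpus all --name projx-all nvidia/cuda:12.2.0-base-ubuntu22.04 sleep infinity
```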
July 16, 2025 at 11:16 AM
You can find the tool that I built here: github.com/nawafalageel...

I hope it helps you squeeze more out of your expensive GPUs.

#Nvidia #GPU #Docker #MachineLearning #MLOps #CloudDev #DataScience #OpenSource #LLM #AI #GenAI #AIagents
July 15, 2025 at 11:39 AM
Now, instead of guessing or jumping through hoops to find the answer, the tool can tell us:
"Container X is occupying 12GB on GPU #1 with Y memory utilization"

We went from blindfolded resource tracking to actual insight.

And our question is finally answered 🥳🎉
July 15, 2025 at 11:39 AM
- Nvidia tools (e.g., nvidia-smi) show processes, but not container names.
- Docker tools (e.g., docker stats) show CPU and memory, but no GPU data.

We would still be blindfolded. And our question is not answered yet!
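A rough sketch of how the two views can be stitched together by hand (not the actual tool, just the idea), assuming a Linux host where a container's ID shows up in its processes' /proc/<pid>/cgroup paths: ask nvidia-smi for every GPU compute process, then match each PID's cgroup against the container IDs from docker ps.

```
#!/usr/bin/env bash
# Sketch: join nvidia-smi's per-process view with Docker's container names.
# Assumes a Linux host where the owning container's ID appears in each
# process's /proc/<pid>/cgroup path (true for typical Docker setups).

# Running containers: full ID plus name.
containers=$(docker ps --no-trunc --format '{{.ID}} {{.Names}}')

# Every GPU compute process: PID, GPU UUID, and the memory it holds (MiB).
nvidia-smi --query-compute-apps=pid,gpu_uuid,used_memory \
           --format=csv,noheader,nounits |
while IFS=', ' read -r pid uuid mem; do
    cgroup=$(cat "/proc/$pid/cgroup" 2>/dev/null)
    owner="host process $pid"
    while read -r id name; do
        # The container whose ID appears in the cgroup path owns this PID.
        if [[ -n "$id" && "$cgroup" == *"$id"* ]]; then
            owner="$name"
            break
        fi
    done <<< "$containers"
    echo "$owner is using ${mem} MiB on $uuid"
done
```

Mapping the GPU UUID back to an index and adding utilization, like the output shown above, is left out to keep the sketch short.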
July 15, 2025 at 11:39 AM
When it comes to monitoring GPU usage in a containerized environment, both Nvidia and Docker provide good out-of-the-box tools, but they don't talk to each other.

Neither of them can answer my simple question:
"Which container uses which GPU?"
July 15, 2025 at 11:39 AM