Glenn K. Lockwood
glennklockwood.mast.hpc.social.ap.brid.gy
I am a supercomputing enthusiast, but I usually don't know what I'm talking about. I post about large-scale infrastructure for #HPC and #AI.

🌉 bridged from https://mast.hpc.social/@glennklockwood on the fediverse by https://fed.brid.gy/
Reposted by Glenn K. Lockwood
Finally, #jupiter crossed the 1 ExaFLOP/s threshold today. The list is lying to you, though: it's not exactly 1000 PFLOP/s, it's 1000.184 PFLOP/s; the rest got lost to rounding.
The 184 TFLOP/s are pretty much exactly the same as the previous #jsc […]

[Original post on mastodon.social]
November 17, 2025 at 8:24 PM
Reposted by Glenn K. Lockwood
November 20, 2025 at 5:42 PM
Andreas Dilger is now working for The Lustre Collective (https://thelustrecollective.com) after leaving DDN. I am glad to see his leadership continue to drive Lustre into the future. Say what you will about it, Lustre is the standard to which every other #hpc file system is compared.

#sc25
November 18, 2025 at 11:51 PM
When I spent a Thanksgiving week after SC writing the non-MPI layer for Darshan years ago, I thought to myself “surely this work will make me famous!”

I guess my ship finally came in at the PDSW keynote by Rob Ross.

#sc25
November 17, 2025 at 3:27 PM
Apparently I was the first DAOS user to complain about having to refer to DAOS containers by UUIDs, so they added container labels. Don’t know if this is completely true, but I remember voicing this and will accept the credit if Mohamad is willing to give it to me 🙂

(Learned this at my own […]
Original post on mast.hpc.social
November 16, 2025 at 4:43 PM
New SC record: ran into a colleague within 2 minutes of walking into the airport terminal from the curb. Been catching up nonstop straight through takeoff. Conference starts earlier and earlier every year.
November 15, 2025 at 6:27 PM
Chatting with a pal reminded me of a fun pre-SC activity: looking back at old conference takes that aged like milk. Remember this one?

#hpc #zettascale #hedoesntworkthereanymore
November 14, 2025 at 10:40 PM
🎶 It's the most wonderful time of the year 🎶

#sc25
November 14, 2025 at 8:22 PM
VAST and CoreWeave just announced a >$1.1 billion partnership to deliver #ai data services. Mind you, that's a billion in services, not GPUs. Though I can't claim any credit, I'm proud to work for a company that's earned this level of trust from a partner […]
Original post on mast.hpc.social
November 7, 2025 at 12:13 AM
SC25 will be my 12th SC (10th in-person). I've attended and presented on behalf of SDSC, NERSC, and Microsoft before, but I've got to say: this year has been the most work and the most stress I've ever had around the conference.
November 5, 2025 at 7:23 PM
Google recently posted a promo for using their managed #lustre service to accelerate inferencing via KV caching. Raises questions:

1. Whatever happened to Google's managed #daos (Parallelstore)? It performs better than Lustre.

2. Does Gemini use this? Unlikely. See […]
Original post on mast.hpc.social
November 4, 2025 at 4:41 PM
This is like showing up with a new boyfriend the week after the divorce. At least Microsoft is still getting those alimony payments.

https://www.aboutamazon.com/news/aws/aws-open-ai-workloads-compute-infrastructure
November 3, 2025 at 4:11 PM
NVIDIA, Oracle, and US DOE are named in the headline. Argonne is not. I don’t think this is an Argonne system.

https://nvidianews.nvidia.com/news/nvidia-oracle-us-department-of-energy-ai-supercomputer-scientific-discovery
October 30, 2025 at 3:33 PM
"AMD Powers U.S. Sovereign AI Factory Supercomputers": what exactly is "US sovereign AI"? I think the whole point of "sovereign AI" is "not dependent on the USA." All AI is, by default, sovereign to the US […]
Original post on mast.hpc.social
October 30, 2025 at 2:32 AM
A few more details on ATS-5/Mission, the #hpc system to be installed at LANL. It's confirmed as a next-gen Cray GX5000 with Vera Rubin + XDR InfiniBand, which confirms that GX5000 can do both InfiniBand and Slingshot++.

The messaging is funny; it "will build on the success" of LANL's Venado (non-ATS; GH200) system, but […]
Original post on mast.hpc.social
October 29, 2025 at 6:04 PM
Kinda funny that OpenAI owns less of OpenAI than Microsoft does.

OpenAI (the nonprofit) holds a $130B equity stake in OpenAI (the public benefit corporation) while Microsoft holds $135B.

https://openai.com/index/built-to-benefit-everyone/
October 29, 2025 at 1:47 AM
NVIDIA announced seven new DOE #hpc systems for #ai, including a 100K-GPU Blackwell system at Argonne. Oracle is a partner, just like with yesterday's OLCF-6/Discovery announcement.

Details on the ALCF systems are scant. This procurement was out of band of the Aurora follow-on.

The LANL systems […]
Original post on mast.hpc.social
October 28, 2025 at 6:23 PM
Article implies AMD is partly paying DOE for the #hpc systems being deployed at ORNL: “The Department of Energy will host the computers, the companies will provide the machines and capital spending”

Is AMD buying market share, similar to its arrangement with OpenAI? […]
Original post on mast.hpc.social
October 28, 2025 at 12:02 AM
Glad to see DAOS getting another round of DOE investment.

It highlights a widening gap between government AI infra and commercial AI infra. I have never met an AI customer who's found relevance in DAOS. Maybe .gov can change that […]
Original post on mast.hpc.social
October 27, 2025 at 11:24 PM
I published my first (of many) technical blog posts for VAST. This one gives a quantitative, real-world perspective on how much checkpoint bandwidth is required to train trillion-parameter-scale models (hint: less than many have suggested) […]
Original post on mast.hpc.social
October 23, 2025 at 1:23 AM
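To make the kind of arithmetic behind a checkpoint-bandwidth estimate concrete, here is a back-of-envelope sketch in Python. Every input is a hypothetical assumption, not a figure from the VAST blog: a 1-trillion-parameter model, 14 bytes of checkpointed state per parameter (bf16 weights plus fp32 master weights and two fp32 Adam moments), one checkpoint per hour, and a 5% budget for time spent blocked on writes. `CheckpointScenario` and its helper functions are illustrative names only.

```python
"""Back-of-envelope checkpoint bandwidth estimate (hypothetical inputs only)."""

from dataclasses import dataclass


@dataclass
class CheckpointScenario:
    params: float           # number of model parameters
    bytes_per_param: float  # checkpointed bytes per parameter (weights + optimizer state)
    interval_s: float       # seconds between checkpoints
    max_overhead: float     # fraction of wall time we tolerate being blocked on writes


def checkpoint_bytes(s: CheckpointScenario) -> float:
    """Total bytes written per checkpoint."""
    return s.params * s.bytes_per_param


def required_bandwidth(s: CheckpointScenario) -> float:
    """Aggregate write bandwidth (bytes/s) so a blocking checkpoint fits the overhead budget."""
    write_window_s = s.interval_s * s.max_overhead
    return checkpoint_bytes(s) / write_window_s


if __name__ == "__main__":
    # Assumed scenario: 1T params; 14 B/param = bf16 weights (2) + fp32 master
    # weights (4) + two fp32 Adam moments (4 + 4). None of these come from the blog.
    scenario = CheckpointScenario(params=1e12, bytes_per_param=14,
                                  interval_s=3600, max_overhead=0.05)
    print(f"checkpoint size: {checkpoint_bytes(scenario) / 1e12:.1f} TB")
    print(f"required bandwidth: {required_bandwidth(scenario) / 1e9:.1f} GB/s")
```

With these assumed inputs the script reports a 14 TB checkpoint and roughly 78 GB/s of required write bandwidth. The model covers only synchronous, blocking checkpoints; staging state to host memory and draining it asynchronously lets the write window stretch toward the full interval, which lowers the requirement accordingly.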
KV caching for LLM inference is just a specialized memoization problem, a pattern not uncommon across #hpc. Its implementation remains pretty primitive but is improving quickly. Here's an example of scavenging "perforated" results of memoized attention layers […]
Original post on mast.hpc.social
October 21, 2025 at 5:06 PM
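The memoization framing of KV caching can be made concrete with a toy Python sketch (the language and every name here are assumptions; the post describes no implementation). Each token's key/value entry is memoized under the full token prefix that produced it, so a request that shares a prefix with an earlier one reuses those entries and computes only its new tokens. `PrefixKVCache`, `_compute_kv`, and the arithmetic stand-in for the attention projections are hypothetical.

```python
"""Toy illustration of KV caching as memoization over token prefixes."""

from typing import Dict, List, Tuple

Token = int
KV = Tuple[float, float]  # stand-in for one token's (key, value) vectors


class PrefixKVCache:
    """Memoizes per-token K/V entries keyed by the full prefix that produced them.

    Beyond the first layer, a token's K/V depends on every token before it,
    so a cached entry is only reusable when the whole preceding prefix matches.
    """

    def __init__(self) -> None:
        self._store: Dict[Tuple[Token, ...], KV] = {}
        self.computed = 0  # number of K/V entries actually computed (cache misses)

    def _compute_kv(self, prefix: Tuple[Token, ...]) -> KV:
        # Hypothetical stand-in for the attention projections: any deterministic
        # function of the whole prefix is enough to demonstrate memoization.
        self.computed += 1
        return float(sum(prefix)), float(len(prefix))

    def kv_for(self, tokens: List[Token]) -> List[KV]:
        """Return K/V entries for every position, computing only uncached prefixes."""
        out: List[KV] = []
        for i in range(1, len(tokens) + 1):
            prefix = tuple(tokens[:i])
            if prefix not in self._store:
                self._store[prefix] = self._compute_kv(prefix)
            out.append(self._store[prefix])
        return out


if __name__ == "__main__":
    cache = PrefixKVCache()
    cache.kv_for([101, 7, 42])     # cold cache: 3 entries computed
    cache.kv_for([101, 7, 42, 9])  # shared prefix: only the new token is computed
    cache.kv_for([101, 5])         # diverges after the first token: 1 new entry
    print("K/V entries computed:", cache.computed)
```

Keying on whole prefixes keeps the sketch obviously correct but stores a quadratic number of tokens in its keys; serving engines such as vLLM instead hash fixed-size blocks of tokens together with their preceding context. The "perforated" reuse of non-prefix results mentioned in the post is not modeled here.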
Love to see DAOS getting mainstream attention, but it's a long road ahead before it can compete in the GPU cloud market. Everyone says their performance is the best; that doesn't differentiate storage for #ai. It's the hard problems: integration, reliability, etc. […]
Original post on mast.hpc.social
October 16, 2025 at 6:23 PM
I'll be talking about how hyperscale #ai resembles #hpc workflows at an #sc25 Exhibitor Forum session on Thurs 11/20.

Excited to be on stage at SC again. Aiming to maintain the same level of technical quality as before I left .gov.

Details […]
Original post on mast.hpc.social
October 16, 2025 at 12:28 AM
Something eye-opening I've learned since leaving MSFT/joining VAST: how many companies out there just want to turn space+power into $$$ via GPUs. They have zero interest in #ai or technology; it's just the next BTC. This partnership is for them. They can get a turn-key GPU datacenter with next […]
Original post on mast.hpc.social
September 30, 2025 at 9:45 PM
Although I am not officially attending OCP this year, I will be participating in an early morning breakfast panel that will discuss future directions of #ai infrastructure. If you'll be in San Jose that week, consider attending!

https://solidigm.techarena.ai/ocp-panel/

#shameless #selfpromotion
September 29, 2025 at 10:11 PM