Glenn K. Lockwood
glennklockwood.com
Glenn K. Lockwood
@glennklockwood.com
I am a supercomputing enthusiast, but I usually don't know what I'm talking about. I post about large-scale infrastructure for #HPC and #AI.
Tough break for my friends at WEKA and Qumulo. But from the sounds of it, neither side was getting what they used to out of the partnership.

Partnering through the channel isn’t a bad way to go, especially when integrating two vendors’ products.
November 11, 2025 at 3:35 AM
While I agree, I never abide by #1 or #2. Though I don’t plan on wearing western boots this year…St. Louis is a little far removed from the southwest to justify the getup.
1 week until #SC25

3 bits of advice:

1. Prioritise shoes for comfort, not smartness.

2. Find/force downtime during SC. Doing enough talks/meetings/receptions well is better than doing lots of them rushed and tired.

3. SC is about people as much as (or more than) technology.

#HPC #Supercomputing
November 11, 2025 at 3:05 AM
VAST and CoreWeave just announced a >$1.1 billion partnership to deliver #AI #data services. Mind you, that's a billion in services, not GPUs. Though I can't claim any credit, I'm proud to work for a company that's earned this level of trust from a partner.

www.reuters.com/technology/n...
November 7, 2025 at 12:13 AM
SC25 will be my 12th SC (10th in-person). I've attended and presented on behalf of SDSC, NERSC, and Microsoft before, but I've got to say: this year has been the most work and most stress I've ever had around the conference.
November 5, 2025 at 7:22 PM
Reposted by Glenn K. Lockwood
Tasty, but not as nice as these beauties.
November 5, 2025 at 5:50 PM
I'd never heard of Socks Club before this year, but they seem to be crushing it. I now have WEKA socks, VAST socks, and Anyscale socks, all from them. Made in the USA, and pretty high quality for conference swag. I should invest.

CC @addisonsnell.bsky.social my fellow sock aficionado.
November 5, 2025 at 4:57 PM
Google recently posted a promo for using their managed Lustre service to accelerate inferencing via KV caching. Raises questions:

1. Whatever happened to Google Managed DAOS (ParallelStore)? It performs better than Lustre.

2. Does Gemini use this? Unlikely. See glennklockwood.com/garden/atten...
attention
Attention is the mathematical operation within a transformer that allows different parts of the input to figure out how important they are to each other ...
glennklockwood.com
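For what it's worth, the core idea of KV caching on a shared filesystem is just memoization of attention state. A toy sketch in Python follows; the cache layout, mount path, and function names are entirely hypothetical and don't reflect Google's (or anyone's) actual implementation.

# Toy sketch: "KV caching" on a filesystem is memoization of the key/value
# tensors produced during prefill, keyed by the prompt tokens, so a later
# request with the same prefix skips recomputation. The cache layout, mount
# path, and function names here are hypothetical.
import hashlib
import os
import numpy as np

CACHE_DIR = "/mnt/lustre/kv-cache"   # hypothetical managed-Lustre mount point

def prefix_key(token_ids: list[int]) -> str:
    """Content-address the prompt prefix so identical prompts map to one entry."""
    return hashlib.sha256(str(token_ids).encode("utf-8")).hexdigest()

def load_or_compute_kv(token_ids, compute_prefill):
    """Return (keys, values) for the prompt, reusing an on-disk copy if present."""
    os.makedirs(CACHE_DIR, exist_ok=True)
    path = os.path.join(CACHE_DIR, prefix_key(token_ids) + ".npz")
    if os.path.exists(path):
        cached = np.load(path)
        return cached["k"], cached["v"]   # cache hit: no prefill compute needed
    k, v = compute_prefill(token_ids)     # cache miss: run the attention prefill
    np.savez(path, k=k, v=v)              # persist for future requests
    return k, v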
November 4, 2025 at 4:38 PM
I feel so famous now
This post appeared under this Techmeme headline:
This is like showing up with a new boyfriend the week after the divorce. At least Microsoft is still getting those alimony payments.

www.aboutamazon.com/news/aws/aws...
November 3, 2025 at 9:55 PM
This is like showing up with a new boyfriend the week after the divorce. At least Microsoft is still getting those alimony payments.

www.aboutamazon.com/news/aws/aws...
AWS announces new partnership to power OpenAI's AI workloads
Partnership will enable OpenAI to run its advanced AI workloads on AWS’s world-class infrastructure starting immediately.
www.aboutamazon.com
November 3, 2025 at 4:10 PM
The HDD manufacturers opted not to invest in manufacturing capacity to match demand from AI, so there is now an HDD shortage. That makes demand (and profit) look good in the short term, but the long-term outlook for HDDs is grim.
October 30, 2025 at 3:35 PM
NVIDIA, Oracle, and US DOE are named in the headline. Argonne is not. I don’t think this is an Argonne system.

nvidianews.nvidia.com/news/nvidia-...
NVIDIA and Oracle to Build US Department of Energy’s Largest AI Supercomputer for Scientific Discovery | NVIDIA Newsroom
NVIDIA today announced a landmark collaboration with Oracle to build the U.S. Department of Energy (DOE)’s largest AI supercomputer to dramatically accelerate scientific discovery.
nvidianews.nvidia.com
October 30, 2025 at 3:32 PM
So DDN is now selling NVIDIA GPUs in Supermicro chassis with DDN bezels? Is this a vehicle to have more third parties take capital off of NVIDIA’s balance sheet? Maybe it's because I'm not versed in NVIDIA’s Data Platform reference architecture, but I don’t understand what this is.
October 30, 2025 at 3:26 PM
"AMD Powers U.S. Sovereign AI Factory Supercomputers" - What exactly is "US sovereign AI?" I think the whole point of "sovereign AI" is "not dependent on the USA." All AI is, by default, sovereign to the US.

www.amd.com/en/newsroom/...
October 30, 2025 at 2:31 AM
Despite my greatest efforts to not do storage after quitting my job in Azure Storage, I find myself reading IOR source code once again. I guess once you've been type-cast as a storage person, there's no escape.
GIF: a man says "just when I thought I was out" in a kitchen (media.tenor.com)
October 30, 2025 at 12:02 AM
A few more details on ATS-5/Mission, the #HPC system to be installed at LANL. The messaging is funny; it "will build on the success" of LANL's Venado (non-ATS; GH200) system but "will replace" Crossroads (ATS-4; Sapphire Rapids HBM).

As with Aurora, Intel is swept under the rug.

www.lanl.gov/media/news/1...
Los Alamos National Laboratory announces two new supercomputers | LANL
Los Alamos National Laboratory has selected HPE and NVIDIA as partners on two new supercomputers to be built, delivered and installed in the coming years, with HPE selected as the prime contractor.
www.lanl.gov
October 29, 2025 at 6:02 PM
I find myself a bit out of sorts after all of today’s DOE HPC/AI news from GTC. I don’t understand this new public-private mission; the complex in which I made my career seems strange and foreign now.

Hopefully the people remain the same, and I can take comfort in the technology remaining familiar.
October 29, 2025 at 1:52 AM
Kinda funny that OpenAI owns less of OpenAI than Microsoft does.

OpenAI (the nonprofit) holds a $130B equity stake in OpenAI (the public benefit corporation) while Microsoft holds $135B.

openai.com/index/built-...
Built to benefit everyone | OpenAI
By Bret Taylor, Chair of the OpenAI Board of Directors
openai.com
October 29, 2025 at 1:47 AM
I wonder what “at Argonne” really means here.
DOE is partnering with Nvidia and Oracle to build 7 new AI supercomputers to accelerate scientific research and develop agentic AI for discovery.

Two of these systems, at Argonne, will together form the DOE's largest AI supercomputing infrastructure.

www.theregister.com/2025/10/28/n...
Nvidia will help build 7 AI supercomputers for DoE
100,000 Blackwell GPUs and 2,200 exaFLOPs make for a big system
www.theregister.com
October 29, 2025 at 1:41 AM
NVIDIA announced seven new DOE #HPC systems for #AI, including a 100K Blackwell system at Argonne. Oracle is a partner, just like OLCF-6/Discovery.

Details on the ALCF systems are scant, and this procurement was out of band of the Aurora follow-on.

Wild times

nvidianews.nvidia.com/news/nvidia-...
NVIDIA and Partners Build America’s AI Infrastructure and Create Blueprint to Power the Next Industrial Revolution
NVIDIA today announced that it is working with the U.S. Department of Energy’s national labs and the nation’s leading companies to build America’s AI infrastructure to support scientific discovery, ec...
nvidianews.nvidia.com
October 28, 2025 at 6:20 PM
Article implies AMD is partly paying DOE for the #HPC systems being deployed at ORNL: “The Department of Energy will host the computers, the companies will provide the machines and capital spending”

Is AMD buying market share, similar to its arrangement with OpenAI?

www.reuters.com/business/ene...
Exclusive: US Department of Energy forms $1 billion supercomputer and AI partnership with AMD | Reuters
The U.S. has formed a $1 billion partnership with Advanced Micro Devices to construct two supercomputers that will tackle large scientific problems ranging from nuclear power to cancer treatments to national security, Energy Secretary Chris Wright and AMD CEO Lisa Su told Reuters.
www.reuters.com
October 28, 2025 at 12:00 AM
Also glad to see DAOS getting a lifeline through this next round of DOE investment.

Highlights a widening gap between government AI infrastructure and industry AI infrastructure. I have never met an AI customer who's expressed any interest in DAOS. Maybe .gov can change that?

#AI #storage
October 27, 2025 at 11:20 PM
I like how the implementors (ORNL and HPE) are mentioned almost as afterthoughts. Speaks to how the current administration is approaching the DOE Office of Science #HPC mission relative to #AI and American innovation.

Regardless, congrats to all parties. This is the result of tremendous effort.
October 27, 2025 at 11:16 PM
I published my first (of many) technical blogs for VAST. This one gives a quantitative, real-world perspective on how much checkpoint bandwidth is required to train trillion-parameter-scale models (hint: less than many have suggested)

www.vastdata.com/blog/optimiz...
Optimizing Checkpoint Bandwidth for LLM Training - VAST Data
VAST Data blog post with insights on data infrastructure and AI innovation.
www.vastdata.com
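For a rough sense of the arithmetic, here's a back-of-envelope sketch. Every number in it is an illustrative assumption on my part, not a figure from the blog post, and asynchronous checkpointing could reduce the requirement further.

# Back-of-envelope sketch of the kind of arithmetic involved. Every number
# below is an assumption for illustration, not a figure from the blog post.
params = 1.0e12                   # 1-trillion-parameter model
bytes_per_param = 14              # e.g., bf16 weights + fp32 Adam state (assumption)
ckpt_bytes = params * bytes_per_param          # ~14 TB per full checkpoint

ckpt_interval_s = 30 * 60         # checkpoint every 30 minutes (assumption)
overhead_budget = 0.05            # tolerate 5% of runtime blocked on checkpointing
write_window_s = ckpt_interval_s * overhead_budget   # ~90 s to land each checkpoint

required_bw = ckpt_bytes / write_window_s      # aggregate bytes/s for the whole job
print(f"checkpoint size: {ckpt_bytes / 1e12:.1f} TB")
print(f"required write bandwidth: {required_bw / 1e9:.0f} GB/s")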
October 23, 2025 at 1:23 AM
KV caching for LLM inference is just a specialized memoization problem, and memoization is not uncommon across #HPC. Its implementation remains pretty primitive but is improving quickly. Here's an example of scavenging "perforated" results of memoized attention layers.

github.com/vllm-project...
[RFC]: Generalized KV cache reuse · Issue #25950 · vllm-project/vllm
Motivation. 🚀 The feature, motivation and pitch Summary This RFC proposes enabling the reuse of the KV cache for any subset of tokens, rather than restricting reuse to prefix-complete tokens. With ...
github.com
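To illustrate what I mean by memoization, here's a toy sketch in Python. The block size, function names, and caching scheme are made up; this is not vLLM's actual data structure, and the linked RFC is about relaxing the prefix-complete restriction that this naive version has.

# Toy illustration of KV-cache reuse as memoization: attention K/V blocks are
# keyed by the token prefix that produced them, so two requests sharing a
# prefix share the cached blocks. Block size and keying are made up here.
from functools import lru_cache

BLOCK = 4  # tokens per cached block (assumption)

def compute_kv_block(prefix: tuple[int, ...], block: tuple[int, ...]):
    """Stand-in for the expensive attention prefill over one block of tokens."""
    print(f"  computing K/V for block {block} given prefix of {len(prefix)} tokens")
    return ("kv", prefix, block)   # placeholder for real tensors

@lru_cache(maxsize=None)           # the memo table: (prefix, block) -> K/V
def cached_kv_block(prefix, block):
    return compute_kv_block(prefix, block)

def prefill(tokens: list[int]):
    """Build K/V for a prompt, reusing any prefix-complete blocks already cached."""
    kv = []
    for i in range(0, len(tokens), BLOCK):
        prefix, block = tuple(tokens[:i]), tuple(tokens[i:i + BLOCK])
        kv.append(cached_kv_block(prefix, block))
    return kv

prefill([1, 2, 3, 4, 5, 6, 7, 8])   # computes two blocks
prefill([1, 2, 3, 4, 9, 9, 9, 9])   # reuses the first block, computes only the second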
October 21, 2025 at 5:04 PM
Love to see DAOS getting mainstream attention, but it's a long road ahead before it can compete in the GPU cloud market. Everyone says their performance is the best; performance alone doesn't differentiate storage for #AI. The hard problems are integration, reliability, and the like.

blocksandfiles.com/2025/10/16/h...
High-performance orphan child DAOS and Enakta Labs – Blocks and Files
DAOS is the now-unwanted parallel file system offspring of Intel, back in its Optane era, and now, with its high performance, is being resuscitated by Enakta Labs and other members of the DAOS Foundation. We wrote about DAOS, the Distributed Asynchronous Object Store software, back in April, noting that DAOS Foundation was set up in […]
blocksandfiles.com
October 16, 2025 at 6:23 PM