Glenn K. Lockwood
glennklockwood.com
@glennklockwood.com
I am a supercomputing enthusiast, but I usually don't know what I'm talking about. I post about large-scale infrastructure for #HPC and #AI.
I pitch in and so should you. Plus there are stickers.
December 21, 2025 at 6:04 PM
Sync training across geos isn’t new, though doing it because of training-data governance is. But training across AMD+NVIDIA is new; leave it to DOE to demonstrate such odd methods!

Unclear what separates “federated learning” from multicluster training, though.

www.sandia.gov/labnews/2025...

#AI
Three national security laboratories, one AI model – LabNews
Sandia, Los Alamos and Lawrence Livermore national laboratories have proven that it's possible to share a large language model without compromising sensitive data from each lab.
www.sandia.gov
December 19, 2025 at 3:22 PM
Very last minute, but I'm giving a talk online tomorrow (Thurs Dec 18) about my analysis of over 85K model training checkpoints and implications for system design. Punchline is "less bandwidth makes training go faster."

Registration required: www.vastdata.com/events/vast-...

#AI #storage
Smarter, Not Faster: The Storage Reality Hidden in 85,000 AI Checkpoints - VAST Data
Stop chasing multi-terabyte-per-second performance for your global storage. Focus on "checkpoint overlap," not raw bandwidth. Invest your budget in what matters most: GPUs.
www.vastdata.com
December 18, 2025 at 12:32 AM
NERSC recently did a wholesale replacement of its FDR InfiniBand storage fabric with RoCE. The IB was a greenfield installation back when I started in 2015, and replacing it with a competing technology in production is quite the feat. Glad to hear it succeeded.

www.nersc.gov/news-and-eve...
Network Upgrades Pave the Way to a Faster Future | NERSC
The National Energy Research Scientific Computing Center (NERSC), a U.S.
www.nersc.gov
December 17, 2025 at 12:19 AM
Idle CPUs in GPU clusters do not “exceed $250 million per year in electricity.” This is crazy math. What are they using as fuel here??
December 17, 2025 at 12:08 AM
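A quick back-of-envelope on why that figure smells off. All the numbers below are my own assumptions (the article's aren't shown here), but at a typical electricity price, $250 million per year implies hundreds of megawatts of continuous draw attributable to idle CPUs alone:

```python
# Hypothetical sanity check -- every number here is an assumption of mine, not from the article.
DOLLARS_PER_YEAR = 250e6          # the claim being questioned
PRICE_PER_KWH = 0.10              # assumed electricity price, $/kWh
IDLE_WATTS_PER_NODE = 300         # assumed idle draw of the host CPUs in one GPU node

kwh_per_year = DOLLARS_PER_YEAR / PRICE_PER_KWH            # 2.5 billion kWh/yr
avg_megawatts = kwh_per_year / 8760 / 1000                  # ~285 MW of continuous idle-CPU draw
nodes_needed = avg_megawatts * 1e6 / IDLE_WATTS_PER_NODE    # ~950,000 nodes idling year-round

print(f"{avg_megawatts:.0f} MW continuous, or ~{nodes_needed:,.0f} GPU nodes' worth of idle CPUs")
```

That's roughly a million nodes doing nothing all year, which is why the math doesn't pass the sniff test for me.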
Does this mean no more dirt-cheap NRE from Slurm? Or will Slurm development no longer be coin-operated? Would love to see serious engineering effort go into modernizing Slurm, but this could go in many directions.
As hybrid #HPC + #AI + #Quantum workflows become more prevalent, orchestration of complex systems is becoming a key component of successful deployment. Looking forward to accelerating these capabilities for the open-source community.
NVIDIA Acquires Open-Source Workload Management Provider SchedMD
NVIDIA will continue to distribute SchedMD’s open-source, vendor-neutral Slurm software, ensuring wide availability for high-performance computing and AI.
blogs.nvidia.com
December 15, 2025 at 5:40 PM
Philosophical Q: what is the role of industry in the CS peer review process? I am no longer a researcher and no longer publish (so far), but am still invited to review papers/proposals/projects/abstracts. It's not really my job to do this anymore, but I still feel partly obligated. Thoughts?
December 5, 2025 at 8:19 PM
Wow, that is legit! Complete with Jensen figurine. Wonder if he gets royalties.
Great question, @glennklockwood.com! Here’s the AI Factory Lego set I left #SC25 with. I hope this answers your questions. 🤪😂
December 5, 2025 at 5:20 PM
So yesterday I flew home from Oak Ridge, TN to San Francisco by way of Dulles Airport. My two flights emitted the same amount of CO2 as running a 100 MW data center for about 70-80 minutes. Or running HPL on the Frontier supercomputer for about 7 hours.

#HPC #AI
December 4, 2025 at 4:13 AM
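For what it's worth, the two data-center yardsticks in the post above are mutually consistent on the electricity side. Here's a rough check, assuming Frontier draws about 22.7 MW during an HPL run (the power figure reported with its Top500 results); the flight-side CO2 number depends entirely on which emissions calculator you use, so I'm not reproducing it here:

```python
# Rough consistency check of the two data-center comparisons above.
# Assumption (mine): Frontier's HPL run draws roughly 22.7 MW.
generic_dc_mwh = 100 * (75 / 60)   # 100 MW for ~75 minutes -> ~125 MWh
frontier_mwh = 22.7 * 7            # ~22.7 MW for ~7 hours  -> ~159 MWh

# Both land in the same ~125-160 MWh ballpark, so whatever CO2 factor you
# apply to the electricity, the two equivalences tell the same story.
print(generic_dc_mwh, frontier_mwh)
```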
Reposted by Glenn K. Lockwood
I'm excited to be in San Diego for #NeurIPS2025! Shyam Sankaran will be presenting our work at the Machine Learning for Physical Sciences (ML4PS) workshop (Upper Level Ballroom 6CF) on Saturday, 11-12: "𝐃𝐢𝐬𝐭𝐫𝐢𝐛𝐮𝐭𝐞𝐝 𝐄𝐥𝐞𝐦𝐞𝐧𝐭-𝐋𝐨𝐜𝐚𝐥 𝐓𝐫𝐚𝐧𝐬𝐟𝐨𝐫𝐦𝐞𝐫 𝐟𝐨𝐫 𝐒𝐜𝐚𝐥𝐚𝐛𝐥𝐞 𝐌𝐞𝐬𝐡-𝐁𝐚𝐬𝐞𝐝 𝐌𝐨𝐝𝐞𝐥𝐢𝐧𝐠"
December 4, 2025 at 12:34 AM
Honest Q: what, exactly, is an AI factory?
December 4, 2025 at 1:52 AM
Helios sounds like AMD's answer to NVIDIA's rack-scale NVLink, but it uses UALink over Ethernet with custom Broadcom scale-up switches. Interestingly, HPE will ship its Helios rack before its own Cray GX rack. Another example of #HPC playing second fiddle to #AI.

www.amd.com/en/newsroom/...
AMD and HPE Expand Collaboration to Advance Open Rack-Scale AI Infrastructure
www.amd.com
December 3, 2025 at 8:15 AM
Wouldn’t be a trip to Oak Ridge National Lab without a leg on one of these lil guys.
December 3, 2025 at 1:59 AM
I wrote up my notes from #SC25. Have a look: blog.glennklockwood.com/2025/12/sc25...

I’ll keep picking away at the editing, but would love to hear more from others about what stood out to them. I wasn’t at the conference itself as much this year as in the past, so I know I missed a lot.

#HPC
SC'25 recap
The annual SC conference was held last week, drawing over 16,000 registrants and 560 exhibitors to St. Louis, Missouri to talk ab...
blog.glennklockwood.com
December 1, 2025 at 7:27 PM
Reposted by Glenn K. Lockwood
Appreciate the candor, and will be making sure the team is aware of the issues.

I'd recommend that everyone check out the article here:

www.theregister.com/2025/11/27/t...
Blackhole QuietBox, Tenstorrent's AI workstation reviewed
Hands on: $12K machine promises performance that can scale to 32-chip servers and beyond, but immature stack makes harnessing compute challenging
www.theregister.com
November 27, 2025 at 4:55 PM
I defended my doctoral dissertation thirteen years ago this month, and the only question I got from my entire committee was “why doesn’t the water fall out of jello?”

I still think about that.
November 27, 2025 at 5:02 PM
It was tradition at NERSC for the director to give everyone a half-day off on the Wednesday before Thanksgiving. By comparison, VAST has no company holidays, so technically, nobody gets Thanksgiving off (much less the half day before it!)
November 26, 2025 at 9:29 PM
A brave stake in the ground that defines what is (and isn’t) a parallel file system. I generally agree with Chris’ explanation. But I’m sure he’ll get hate from parallel storage elitists who don’t like how inclusive his take is.
November 26, 2025 at 4:50 PM
It’s funny to see the HPE K3000 docs pitch DAOS’s key capability as bypassing POSIX, when the DAOS team openly states that the majority of DAOS users access it via POSIX. Something doesn’t line up in the value prop.
=>
"The Future of Supercomputing Storage", HPE, DAOS UG at SC25 daos.io/wp-content/u...
HPE Cray K3000 DAOS
GX5000 bsky.app/profile/ogaw...

SmartNIC Offload, RESDIS 2025 dl.acm.org/doi/10.1145/...
www.dropbox.com/scl/fi/ymjo4...

DAOS on 400 Gbps, HPE, WIP, PDSW 2025 www.pdsw.org/pdsw25/paper...
November 25, 2025 at 2:29 PM
As much as I like DAOS and Denis, he was asking for trouble when he implied that DAOS is better than others for single, huge systems. For example, the largest VAST namespace in production is larger (in exabytes) than the largest single Lustre or DAOS namespace in existence.
November 24, 2025 at 9:59 PM
Reposted by Glenn K. Lockwood
St. Louis's weather is working in our favor! 😁 Join us in room 260 for the Interactive and Urgent #HPC workshop at #SC25. Starting at 8:30. Keynote by Alan Chalker of OOD fame!
November 21, 2025 at 1:50 PM
Reposted by Glenn K. Lockwood
=>
"The Future of Supercomputing Storage", HPE, DAOS UG at SC25 daos.io/wp-content/u...
HPE Cray K3000 DAOS
GX5000 bsky.app/profile/ogaw...

SmartNIC Offload, RESDIS 2025 dl.acm.org/doi/10.1145/...
www.dropbox.com/scl/fi/ymjo4...

DAOS on 400 Gbps, HPE, WIP, PDSW 2025 www.pdsw.org/pdsw25/paper...
November 21, 2025 at 1:04 PM
Reposted by Glenn K. Lockwood
📆 Our activities at #SC25 on Friday:

📈 Workshop: Third International Workshop on HPC Testing and Evaluation of Systems, Tools, and Software (HPCTESTS 2025)
⌚ 8:30 am - 12:30 pm
📍 Room: 276
👤 Andreas Herten

go.fzj.de/sc25

#HPCignites #HPC4Germany
November 21, 2025 at 12:01 AM
This year’s venue for the #SC25 Tech Program Reception took me back to my childhood, going to my local science center in New Jersey. As has happened all week though, I kept running into friends and colleagues, so I didn’t actually get to see many of the exhibits. Alas!
November 21, 2025 at 5:00 AM