Suhaib Khan
@suhaibkhan.bsky.social
Interested in HPC & large storage systems
Reposted by Suhaib Khan
Are 2030 AI hyperscalers capital constrained, power constrained, DRAM constrained, flash constrained, compute constrained, software constrained, or :-) demand constrained?
November 11, 2025 at 2:30 PM
Reposted by Suhaib Khan
Racks filled with GPUs and liquid cooling gear can now weigh 6,000 pounds or more, requiring new approaches to address human safety and investment protection. Google, Meta, and Microsoft are turning to robotics to safely move these huge racks.
open.substack.com/pub/datacent...
Data Centers Turn to Robots to Haul Multi-Ton Racks
Hyperscalers, OCP Ramp Up Robotics Teams for Worker Safety, Productivity
open.substack.com
November 11, 2025 at 12:47 PM
Reposted by Suhaib Khan
Scammers have a new way of getting into your pockets: by targeting your #AI assistant. They use prompt engineering, embedding code in emails that trick AI tools into taking malicious actions. Learn how to protect your digital presence. spectrum.ieee.org/ai-agent-phi...
November 9, 2025 at 4:01 PM
Reposted by Suhaib Khan
Can we build an #AI #Climate Scientist? Asked at the ADIA Lab Symposium in Abu Dhabi last week - now online at buff.ly/6igSeyg :-).
Much work to be done - this is outlining some directions of indicative results with a lot of potential to accelerate AI for Science.
November 9, 2025 at 9:24 AM
Reposted by Suhaib Khan
AI excels in complex tasks but falters at reading analog clocks—what does this tell us about its limitations?
AI Struggles to Read Analog Clocks Correctly
AI struggles with analog clocks. What does this reveal about its limitations in image analysis?
spectrum.ieee.org
November 8, 2025 at 2:01 PM
Reposted by Suhaib Khan
Nvidia's biggest scale up domain is 72 GPUs. Google's is 9,216 TPUs.
Historically TPUs have trailed on FLOPS, memory, & bandwidth. That's no longer the case with Ironwood.
Google has a Blackwell-class TPU with absurd scale. More on @theregister.com ⬇️
www.theregister.com/2025/11/06/g...
TPU v7, Google's answer to Nvidia's Blackwell is nearly here
Chocolate Factory's homegrown silicon boasts Blackwell-level perf at massive scale
www.theregister.com
November 7, 2025 at 4:16 PM
Reposted by Suhaib Khan
Exclusive: Intel is losing a data center AI executive who previously helped lead the company’s Gaudi accelerator chip efforts and is now headed for a job at AMD, CRN has learned. www.crn.com/news/compone...
Exclusive: Intel Is Losing A Data Center AI Executive To AMD
Intel is losing a data center AI executive who previously helped lead the company’s Gaudi accelerator chip efforts and is now headed for a job at AMD, CRN has learned.
www.crn.com
November 6, 2025 at 9:04 PM
Reposted by Suhaib Khan
Collaborator and friend Dan Alistarh talks at ETH about using the new NvFP4 and MXFP4 block formats for inference.
Some going from "terrible" accuracy to acceptable using micro rotations to smoothen outliers in blocks.
arxiv.org/abs/2509.23202
Great collaboration and cool stuff
November 5, 2025 at 8:32 AM
Reposted by Suhaib Khan
Google recently posted a promo for using their managed Lustre service to accelerate inferencing via KV caching. Raises questions:
1. What ever happened to Google Managed DAOS (ParallelStore)? It performs better than Lustre.
2. Does Gemini use this? Unlikely. See glennklockwood.com/garden/atten...
attention
Attention is the mathematical operation within a transformer that allows different parts of the input to figure out how important they are to each other ...
glennklockwood.com
November 4, 2025 at 4:38 PM
OpenAI spreads the imaginary wealth beyond Microsoft with $38B AWS deal
Amazon deal still dwarfed by $250B Azure commitment made as part of OpenAI's for-profit transformation
www.theregister.com/2025/11/03/o...
OpenAI signs $38B cloud computing deal with AWS
Amazon deal still dwarfed by $250B Azure commitment made as part of OpenAI's for-profit transformation
www.theregister.com
November 3, 2025 at 6:56 PM
Reposted by Suhaib Khan
Silicon Valley’s biggest companies are already planning to pour $400 billion into artificial intelligence efforts this year. They all say it’s nowhere near enough.
Big Tech Is Spending More Than Ever on AI and It’s Still Not Enough
Meta, Alphabet, Microsoft and Amazon have all said they will increase spending in 2026. But investors have given mixed signals.
on.wsj.com
October 31, 2025 at 11:18 AM
Reposted by Suhaib Khan
The largest hyperscale operators say demand for AI services is filling data centers as fast as they can build them, with several saying they are compute-constrained.
As a result, they expect to build even more data center space in 2026.
datacenterrichness.substack.com/p/hyperscale...
Hyperscale Building Boom Poised to Continue
Microsoft, Google, Meta and AWS Describe Strong Demand for New Services
datacenterrichness.substack.com
October 31, 2025 at 12:35 PM
Reposted by Suhaib Khan
Each time a new AI training benchmark is introduced, the fastest training time gets longer. Then, hardware improvements gradually bring the execution time down, only to get thwarted again by the next benchmark. Then the cycle repeats itself.
AI Model Growth Outpaces Hardware Improvements
AI training races are heating up as benchmarks get tougher.
spectrum.ieee.org
October 30, 2025 at 5:35 PM
Diamond Blankets Will Keep Future Chips Cool
Growing a micrometers-thick layer of diamond inside advanced chips spreads out the heat and drops the temperature more than 50°C.
spectrum.ieee.org/diamond-ther...
@spectrum.ieee.org
Can Diamonds Solve the Chip Heat Dilemma?
Stanford's diamond innovation could redefine chip cooling, making electronics more efficient and powerful.
spectrum.ieee.org
October 30, 2025 at 5:17 PM
John Shalf @cs.lbl.gov to receive the 2025 IEEE Seymour Cray Computer Engineering Award
www.computer.org/profiles/joh...
#SC25 #HPC
John Shalf
John Shalf is the Department Head for Computer Science at Lawrence Berkeley National Laboratory. He also formerly served as the Deputy Director for Hardware Technology on the US Department of Energy (...
www.computer.org
October 29, 2025 at 5:20 PM
NVIDIA and Oracle to Build US Department of Energy’s Largest AI Supercomputer for Scientific Discovery
nvidianews.nvidia.com/news/nvidia-...
NVIDIA and Oracle to Build US Department of Energy’s Largest AI Supercomputer for Scientific Discovery
NVIDIA today announced a landmark collaboration with Oracle to build the U.S. Department of Energy (DOE)’s largest AI supercomputer to dramatically accelerate scientific discovery.
nvidianews.nvidia.com
October 29, 2025 at 4:15 PM
Reposted by Suhaib Khan
Nvidia made history as the first company to reach $5 trillion in market value, powered by a stunning rally that has cemented its place at the center of the global AI boom reut.rs/48SKEHH
Nvidia storms past $5 trillion valuation as AI boom powers meteoric rise
Nvidia made history as the first company to reach $5 trillion in market value, powered by a stunning rally that has cemented its place at the center of the global artificial intelligence boom.
reut.rs
October 29, 2025 at 2:52 PM
Reposted by Suhaib Khan
#OTD in #ComputingHistory in 1969, the first message was sent over what would become the Internet. That brief, two-letter message marked the beginning of networked communication as we know it today. Read more: www.acm.org/education/ot...
October 29, 2025 at 12:04 PM
Reposted by Suhaib Khan
Electrifying everything will require simulations that can model several physics events at once, like thermal issues, acoustics, and structural physics. This multiphysics simulation is powering better models of the power grid. spectrum.ieee.org/multiphysics...
October 29, 2025 at 12:47 PM
Reposted by Suhaib Khan
Nvidia is poised to become the first company to hit $5 trillion in market value, the latest milestone that reflects the growing influence of artificial intelligence.
Nvidia Poised to Become First $5 Trillion Company
The company’s shares have been boosted by the AI boom and a flurry of new deals.
on.wsj.com
October 29, 2025 at 12:51 PM
Reposted by Suhaib Khan
Yes, but my point was more: is it physically at Argonne? or does the definition of Argonne expand to include a hypothetical new “Argonne East” which is on some contaminated land (near power) that DOE will lease to OCI?
October 29, 2025 at 2:13 AM
DOE is partnering with Nvidia and Oracle to build 7 new AI supercomputers to accelerate scientific research and develop agentic AI for discovery.
Two of these systems, both at Argonne, will together form the DOE's largest AI supercomputing infrastructure.
www.theregister.com/2025/10/28/n...
Nvidia will help build 7 AI supercomputers for DoE
100,000 Blackwell GPUs and 2,200 exaFLOPs make for a big system
www.theregister.com
October 29, 2025 at 12:02 AM
Reposted by Suhaib Khan
⚛️ US Department of Energy Advances Nuclear Program for AI Data Centers
The agency is opening up its Oak Ridge Reservation in Tennessee for the private development of AI data centers with on-site power generation.
Read more ▶️ www.datacenterknowledge.com/energy-power...
US DOE Advances Data Center Nuclear Program
Agency seeks private companies to build AI facilities with on-site power generation at Oak Ridge through a request for proposals.
www.datacenterknowledge.com
October 28, 2025 at 4:01 PM
Reposted by Suhaib Khan
HPE's Discovery to succeed Frontier supercomputer with next-gen Cray tech
HPE's Discovery to succeed Frontier supercomputer with next-gen Cray tech
Oak Ridge's $500M system due in 2028, paired with a separate Lux AI cluster arriving two years earlier
HPE is set to build a successor to the Frontier exascale system for America's Oak Ridge National Laboratory, based on the next generation of its Cray supercomputer platform, plus a separate AI cluster to advance machine learning with a multi-tenant cloud-like platform.…
dlvr.it
October 27, 2025 at 7:05 PM
Reposted by Suhaib Khan
👀 ICYMI: Exciting news from #ORNL today, not one but two supercomputers headed our way...! ⚡️⚡️
Read more about #OLCF 's #Discovery and #Lux supercomputers here 👉 www.olcf.ornl.gov/2025/10/27/o...
And check out this video for a #Discovery sneak peek 🤩: vimeo.com/1130587062
#HPC #AI
ORNL, AMD and HPE to deliver DOE’s newest AI supercomputers: Discovery and Lux
The U.S. Department of Energy announced today its newest supercomputers, Discovery and Lux, at Oak Ridge National Laboratory that will expand America’s leadership in artificial intelligence for scient...
www.olcf.ornl.gov
October 27, 2025 at 7:30 PM