Stephan Hoyer
stephanhoyer.com
@stephanhoyer.com
Building AI climate models at Google. I also contribute to the scientific Python ecosystem, including Xarray, NumPy and JAX.

Opinions are my own, not my employer's.
Woah! What visualization tool are you using?
October 9, 2025 at 4:09 PM
Do you take it yourself?
May 13, 2025 at 3:43 PM
I think the problem is the algorithm. Bluesky's lack of a recommendation engine means that if you're not posting all the time, your stuff doesn't get seen.
May 6, 2025 at 3:07 PM
I think it's just about readability with small font, the same reason why printed newspapers use many columns.
February 2, 2025 at 8:17 PM
The losses here should be marked as millions not billions, right?
January 27, 2025 at 5:45 PM
Pretty much anything that you can write in high level array code like NumPy is very fast in JAX. Only intrinsically very loopy code is (relatively) slow, but JAX has excellent support for writing custom kernels in lower level languages.
January 23, 2025 at 6:11 AM
AD-compatible Python is at the cutting edge of performance these days with its central role in large-scale AI training.

In my experience (mostly geophysical fluid dynamics), JAX has comparable perf to modern Fortran on CPUs, with a much easier path to GPUs and multi-device code.
January 23, 2025 at 1:07 AM
Those are tiny chunks! Does that reduce max throughput for analytics use-cases compared to larger chunks?
January 10, 2025 at 9:44 PM
Such exciting news!

For anyone who has tried the new sharding feature -- do you have any guidance on optimal shard sizes, if I want more flexibility in access patterns but still optimal throughput?
January 10, 2025 at 3:17 AM
Reposted by Stephan Hoyer
Hi, thanks for the mention. Here's a 7-day paywall-free link to the main feature: www.bloomberg.com/graphics/202...
The Risky Business of Predicting Where Climate Disaster Will Hit
Climate tech companies can calculate the chances that a flood or wildfire will ravage your home. But what if their odds are all different?
www.bloomberg.com
December 30, 2024 at 5:27 PM
This paper by Watt-Meyer et al. is a good example of "error-based learning": agupubs.onlinelibrary.wiley.com/doi/10.1029/...

ECMWF has also done similar work on top of IFS's data assimilation system.
Correcting Weather and Climate Models by Machine Learning Nudged Historical Simulations
Nudging an atmospheric model toward observations is a good way to estimate state-dependent biases. Machine learning of state-dependent biases improves hindcast skill of a coarse-resolution general...
agupubs.onlinelibrary.wiley.com
December 28, 2024 at 8:35 PM
We have a few pre-computed climatologies in WeatherBench2: weatherbench2.readthedocs.io/en/latest/da...
WeatherBench 2 Data Guide — WeatherBench 2 documentation
weatherbench2.readthedocs.io
December 27, 2024 at 1:39 AM
We have a few other updates to share as well, which can be found in the inaugural edition of the NeuralGCM newsletter:
groups.google.com/g/neuralgcm-...

The biggest one is that NeuralGCM models are now freely available for everyone to use, including for commercial purposes!
NeuralGCM update: new models, new license, new datasets
groups.google.com
December 19, 2024 at 8:34 PM
Please reach out if you want to chat about anything related to AI modeling, NeuralGCM, JAX or Xarray. Also see Eni's poster on xarray.DataTree on Thurs: agu.confex.com/agu/agu24/me...
Simplifying analysis of hierarchical HDF5 and NetCDF4 files with xarray-datatree
NASA’s Earth Observing System Data and Information System (EOSDIS) contains tho...
agu.confex.com
December 9, 2024 at 5:47 PM
When I hear "ML" I tend to think of old school (i.e., scikit-learn) machine learning, which is great but much less powerful than deep learning. So I would opt for "AI weather models" though that misses quite a bit of nuance.
December 7, 2024 at 7:01 PM
This diagram is accurate historically, but recently AI seems to have become synonymous with deep learning.
December 7, 2024 at 6:57 PM
The bottleneck for traditional models is data movement within the CPU, not data transfer to disk -- physics-based simulations do too little compute per byte (low arithmetic intensity) to fully utilize modern hardware.

AI is way better in this respect. It's easy to use lots of FLOPs on big matmuls!
December 7, 2024 at 7:22 AM
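A rough back-of-the-envelope sketch of the arithmetic intensity point in the post above. The function names are hypothetical, and the model is idealized: it assumes float32 data, that every operand is moved through memory exactly once, and it uses an elementwise update y = a*x + y as a stand-in for a simple physics-style pointwise step. Real simulation kernels and real caches will behave differently.

```python
def matmul_intensity(n: int, bytes_per_elem: int = 4) -> float:
    """FLOPs per byte for an n x n by n x n matrix multiply (idealized)."""
    flops = 2 * n**3  # one multiply and one add per inner-product term
    bytes_moved = 3 * n**2 * bytes_per_elem  # read A, read B, write C once each
    return flops / bytes_moved


def axpy_intensity(bytes_per_elem: int = 4) -> float:
    """FLOPs per byte for the elementwise update y = a*x + y (idealized)."""
    flops = 2  # one multiply and one add per element
    bytes_moved = 3 * bytes_per_elem  # read x, read y, write y
    return flops / bytes_moved


if __name__ == "__main__":
    print(f"elementwise update: {axpy_intensity():.3f} FLOPs/byte")
    for n in (256, 1024, 4096):
        print(f"matmul n={n}: {matmul_intensity(n):.1f} FLOPs/byte")
```

Under these assumptions the elementwise update stays fixed at 1/6 FLOP per byte no matter how large the arrays get, while matmul intensity grows as n/6, which is why large matmuls can keep compute units busy.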
Unlimited potential, zero bugs!
December 1, 2024 at 1:09 PM