Stephan Rasp
raspstephan.bsky.social

Other minor updates:
- Where available, we added 2022 as an eval year in the interactive graphics.
- We added forecast activity as a metric for deterministic models, a simple measure of blurring.
- More regions.
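
For readers unfamiliar with forecast activity, here is a minimal sketch of the idea. It assumes one common definition (the spatial standard deviation of the forecast anomaly relative to climatology) and is not taken from the WeatherBench or WB-X code. The function name, the synthetic fields, and the lack of latitude weighting are all simplifications for illustration.

```python
import numpy as np


def forecast_activity(forecast, climatology):
    """Spatial standard deviation of the forecast anomaly (unweighted).

    Assumed definition for illustration only; a real evaluation would
    typically apply latitude weights.
    """
    anomaly = forecast - climatology
    return float(np.sqrt(np.mean((anomaly - anomaly.mean()) ** 2)))


# Synthetic example: a "sharp" field and a smoothed copy of it.
rng = np.random.default_rng(0)
climatology = np.zeros((32, 64))
sharp = rng.normal(size=(32, 64))

# Three-point running mean along longitude as a stand-in for blurring.
blurred = (sharp + np.roll(sharp, 1, axis=1) + np.roll(sharp, -1, axis=1)) / 3.0

sharp_activity = forecast_activity(sharp, climatology)
blurred_activity = forecast_activity(blurred, climatology)
print(f"sharp: {sharp_activity:.3f}, blurred: {blurred_activity:.3f}")
```

The smoothed field has lower activity than the original, which is why the metric works as a simple indicator of blurring.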

Don't hesitate to file bugs or suggestions as GitHub issues.

end/

Next, we added 4 new models to the public benchmark (which now also uses WB-X as a backend):
- GenCast
- Stormer
- Excarta (HEAL-ViT)
- ArchesWeather

The probabilistic scorecard finally looks a little more populated :)

4/

To get started, check out the documentation: weatherbench-x.readthedocs.io/en/latest/

For an example of evaluating forecasts against sparse obs, see: weatherbench-x.readthedocs.io/en/latest/ho...

Please don't hesitate to ask questions or report bugs/feature requests via a GitHub issue :)

3/n

WB-X is a complete rewrite of our evaluation code. We designed it to be as modular and powerful as possible with cutting-edge use cases like observation-based models in mind. We've used WB-X internally over the last year for most of our model development.

2/n

🚨 WeatherBench Update

1. WeatherBench-X, our new evaluation code, is now on GitHub: github.com/google-resea...

2. New models (plus other small updates) on the WeatherBench website: sites.research.google/weatherbench/

1/n
GitHub - google-research/weatherbenchX: A modular framework for evaluating weather forecasts

Reposted by Stephan Rasp

2025 is here tomorrow, so let's reflect on 2024. Even without the final counts and the new AMS and AGU ML journals, 2024 has eclipsed 10% of all papers and had over 600 papers mentioning neural networks in their abstracts 📈

Sure. The y-axis shows the 3-day T850 RMSE relative to ECMWF IFS HRES (so >100% = better). It's a crude attempt at normalizing different evaluations, so don't overinterpret the small differences. This is more about the bigger picture.
Deterministic scores – WeatherBench2
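
A rough sketch of how such a relative score could be computed. The post does not give the exact formula, so this assumes the reference RMSE divided by the model RMSE, which is consistent with ">100% = better". The function name and the RMSE values are hypothetical.

```python
def relative_score_percent(rmse_model, rmse_reference):
    """Reference RMSE as a percentage of the model RMSE.

    Assumed formula: values above 100 mean the model has a lower RMSE
    than the reference (here, HRES).
    """
    if rmse_model <= 0:
        raise ValueError("rmse_model must be positive")
    return 100.0 * rmse_reference / rmse_model


# Hypothetical 3-day T850 RMSE values in kelvin, for illustration only.
rmse_hres = 1.0
rmse_ml_model = 0.8

score = relative_score_percent(rmse_ml_model, rmse_hres)
print(f"relative score: {score:.1f}%")
```

Under this convention, a model with the same RMSE as HRES scores 100%.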

So, for AIFS and GenCast I am evaluating the ensemble mean. I still use deterministic HRES as a reference. For AIFS I grabbed the NH HRES scores from the scorecard on the ECMWF website and then eyeballed the AIFS score from Fig 9.

Good idea, done: Rasp, Stephan (2024). AI-Weather SotA vs Time. figshare. Dataset. doi.org/10.6084/m9.f...
AI-Weather SotA vs Time
The purpose of this spreadsheet is not to exactly compare different models but rather to get an overall sense of progress in AI-based weather prediction.

But you do raise a good point. For purely obs-trained models, this probably isn't a fair comparison. In this case the conclusions are probably the same, but still.

True, but in the medium range the obs uncertainty is probably smaller than the forecast uncertainty, right? Radiosonde vs ERA5 RMSE ~1 K, right?

What is the conclusion from GraphDOP being so far away from SotA? Is the setup still suboptimal in some way, or is pure obs-based forecasting harder than some might have thought?

ECMWF with two new papers right before Christmas.

AIFS-CRPS: arxiv.org/abs/2412.158...
GraphDOP (the first truly end-to-end global weather model): arxiv.org/abs/2412.15687

Here they are added to the SotA tracker: docs.google.com/spreadsheets...

Reposted by Stephan Rasp

Can incorporating AI improve precipitation in global weather and climate models?

Yes! In the latest NeuralGCM paper, we show that training on satellite-based precipitation results in significant improvements over traditional atmospheric models:
arxiv.org/abs/2412.11973
Neural general circulation models optimized to predict satellite-based precipitation observations
Climate models struggle to accurately simulate precipitation, particularly extremes and the diurnal cycle. Here, we present a hybrid model that is trained directly on satellite-based precipitation obs...

Reposted by Stephan Rasp

🌎

👋