Stephan Rasp
@raspstephan.bsky.social
Other minor updates:
- Where available, we added 2022 as an eval year in the interactive graphics.
- We added forecast activity, a simple measure of blurring, as a metric for deterministic models (see the sketch after this post).
- More regions.
Don't hesitate to file bugs or suggestions as GitHub issues.
end/
February 13, 2025 at 7:38 AM
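Aside: forecast activity is typically computed as the area-weighted spatial standard deviation of the forecast anomaly with respect to climatology; blurring shows up as activity dropping with lead time. A minimal sketch in plain xarray (not the WB-X implementation; the `lat`/`lon` coordinate names and the precomputed climatology input are assumptions):

```python
import numpy as np
import xarray as xr

def forecast_activity(forecast: xr.DataArray, climatology: xr.DataArray) -> xr.DataArray:
    """Area-weighted spatial std of the forecast anomaly.

    A blurry forecast has lower activity than the verifying analysis.
    Assumes `forecast` and `climatology` share regular lat/lon coordinates.
    """
    anomaly = forecast - climatology
    # cos(latitude) approximates grid-cell area on a regular lat/lon grid.
    weights = np.cos(np.deg2rad(forecast.lat))
    mean = anomaly.weighted(weights).mean(dim=("lat", "lon"))
    var = ((anomaly - mean) ** 2).weighted(weights).mean(dim=("lat", "lon"))
    return np.sqrt(var)
```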
Next, we added 4 new models to the public benchmark (which now also uses WB-X as a backend):
- GenCast
- Stormer
- Excarta (HEAL-ViT)
- ArchesWeather
The probabilistic scorecard finally looks a little more populated :)
4/
February 13, 2025 at 7:38 AM
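For the probabilistic entries, scorecards like this typically report CRPS. Purely for reference, the textbook ensemble estimator (not WB-X's own code) is CRPS ≈ E|X − y| − ½ E|X − X′| over ensemble members X and observation y:

```python
import numpy as np

def crps_ensemble(members: np.ndarray, obs: float) -> float:
    """Standard ensemble CRPS estimator: E|X - y| - 0.5 * E|X - X'|.

    `members` has shape (n_members,); lower CRPS is better.
    """
    skill = np.abs(members - obs).mean()
    # Mean absolute difference over all member pairs.
    spread = np.abs(members[:, None] - members[None, :]).mean()
    return skill - 0.5 * spread
```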
To get started, check out the documentation: weatherbench-x.readthedocs.io/en/latest/
For an example of evaluating forecasts against sparse obs, see: weatherbench-x.readthedocs.io/en/latest/ho...
Please don't hesitate to ask questions or report bugs/feature requests via a GitHub issue :)
3/n
WeatherBench-X documentation
weatherbench-x.readthedocs.io
February 13, 2025 at 7:38 AM
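As a conceptual illustration of the sparse-obs case (plain xarray, not the WB-X API; the `station`-dimension layout below is hypothetical), scoring a gridded forecast against point observations amounts to interpolating the grid to the station locations before computing the metric:

```python
import numpy as np
import xarray as xr

def rmse_vs_stations(forecast: xr.DataArray, obs: xr.Dataset) -> float:
    """RMSE of a gridded forecast against sparse station observations.

    `obs` is assumed to carry 1-D `lat`, `lon` and `value` arrays along a
    `station` dimension (illustrative layout only).
    """
    # Pointwise bilinear interpolation of the grid to each station location.
    at_stations = forecast.interp(lat=obs["lat"], lon=obs["lon"], method="linear")
    return float(np.sqrt(((at_stations - obs["value"]) ** 2).mean()))
```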
WB-X is a complete rewrite of our evaluation code. We designed it to be as modular and powerful as possible with cutting-edge use cases like observation-based models in mind. We've used WB-X internally over the last year for most of our model development.
2/n
February 13, 2025 at 7:38 AM
Sure. The y-axis shows the 3-day T850 RMSE relative to ECMWF IFS HRES (so >100% = better). It's a crude attempt at normalizing different evaluations, so don't overinterpret the small differences. This is more about the bigger picture.
Deterministic scores – WeatherBench2
sites.research.google
December 23, 2024 at 6:51 PM
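Since >100% is defined as better, the normalization is presumably baseline-over-model; a sketch inferred from the post rather than from the site's code:

```python
def relative_skill(rmse_model: float, rmse_hres: float) -> float:
    """RMSE relative to IFS HRES in percent; >100 means beating HRES."""
    return 100.0 * rmse_hres / rmse_model

# e.g. a model with RMSE 0.8 K where HRES scores 1.0 K comes out at 125%.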
So, for AIFS and GenCast I am evaluating the ensemble mean. I still use deterministic HRES as a reference. For AIFS I grabbed the NH HRES scores from the scorecard on the ECMWF website and then eyeballed the AIFS score from Fig 9.
December 23, 2024 at 6:37 PM
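Scoring an ensemble mean against a deterministic baseline just collapses the member dimension before the usual RMSE; a minimal sketch (dimension names `member`/`lat`/`lon` are assumptions, and area weighting is omitted for brevity):

```python
import numpy as np
import xarray as xr

def ensemble_mean_rmse(ens: xr.DataArray, truth: xr.DataArray) -> xr.DataArray:
    """RMSE of the ensemble mean, directly comparable to a deterministic RMSE."""
    err = ens.mean(dim="member") - truth
    return np.sqrt((err ** 2).mean(dim=("lat", "lon")))
```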
Good idea, done: Rasp, Stephan (2024). AI-Weather SotA vs Time. figshare. Dataset. doi.org/10.6084/m9.f...
AI-Weather SotA vs Time
The purpose of this spreadsheet is not to exactly compare different models but rather to get an overall sense of progress in AI-based weather prediction.
doi.org
December 23, 2024 at 6:14 PM
But you do raise a good point: for purely obs-trained models, this probably isn't a fair comparison. The conclusions are probably the same in this case, but still.
December 23, 2024 at 4:58 PM
True, but in the medium range the obs uncertainty is probably smaller than the forecast uncertainty, right? Radiosonde vs ERA5 RMSE ~1 K, right?
December 23, 2024 at 4:56 PM
What is the conclusion from GraphDOP being so far away from SotA? Is the setup still suboptimal in some way, or is pure obs-based forecasting harder than some might have thought?
December 23, 2024 at 4:46 PM