Lily-belle Sweet
@lilybellesweet.bsky.social
PhD student at UFZ - interested in explainable machine learning, agriculture and food security, compound climate events 🌾
I don't disagree with the broader point, but the methods reportedly used to evaluate the models don't appear sufficient for the intended use case, and could lead to inflated performance scores. This is a common pitfall: www.nature.com/articles/s41... I'm pro-AI, but also pro-proper evaluation :)
Spatial validation reveals poor predictive performance of large-scale ecological mapping models - Nature Communications
Mapping ecological variables using machine-learning algorithms based on remote-sensing data has become a widespread practice in ecology. Here, the authors use forest biomass mapping as a study case to...
www.nature.com
May 26, 2025 at 5:42 PM
If they aren't to be used for decisions, then what are they for? The NE blog says they'll be used to calculate emissions, plan conservation, make landscape management decisions, and that they mark 'a step change in our ability to make national, regional and local scale plans for England's peatlands'
May 26, 2025 at 5:31 PM
This quote in particular!
May 26, 2025 at 12:24 PM
Purple: 'Only once'
May 23, 2025 at 10:40 AM
Also, this wasn't your question, but for extra context (because the climate change impact of AI has been brought up in the other thread a few times): the type of model they're using here is really not energy-intensive. I think with a dataset this size you could probably train it on your laptop :)
May 21, 2025 at 12:53 PM
A big concern with maps is not just how they are produced, but how their limitations are communicated and what decisions they are used to make. I think the authors did a great job detailing the potential uncertainties of the model. Hopefully the map is used appropriately with those caveats in mind!
May 21, 2025 at 12:48 PM
I don't agree with their statement towards the end that the model could potentially be used outside of the study area, because their evaluation doesn't test performance under those conditions, but for use inside that specific area it looks ok! And it reads like they're not actually planning that
May 21, 2025 at 12:43 PM
... to similar products, and investigate what might be causing the differences in accuracy between locations. Modelling temporal behaviour as well as spatial behaviour can be a bit tricky, so hopefully they evaluated that separately and carefully (didn't read that part in enough detail, sorry!)
May 21, 2025 at 12:41 PM
I had a quick look! Difference here is that they labelled their own data and carefully sampled the locations randomly over the study area. In this case, and when the model isn't planned to be used outside the study area, it's probably fine. The scores look way more reasonable & they compare them ...
May 21, 2025 at 12:37 PM
Thank you for letting me rant about model evaluation! My colleagues are sick of it by now.

I've never been to Dartmoor but it looks beautiful! Despite having possibly less peat (or in different places) than advertised
May 20, 2025 at 7:52 PM
Yeah, and I am guessing also a lack of experience in applying ML to real, messy datasets? Because seeing an accuracy over 90% on a problem like this, where data quality is probably not super great, should have set off massive alarm bells imo
May 20, 2025 at 7:43 PM
The issue is that the data is super clustered, and they've used random sampling, so they have no idea how good the model is outside of the clusters. Judging from Cat's examples in the thread (& she's not the only one to have spotted errors) it could be pretty bad. And it's being used to guide policy
May 20, 2025 at 7:39 PM
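(For anyone curious, here's roughly how you could check that mismatch yourself. This is only a sketch: the file names, and the x/y coordinate columns, are placeholders, not the actual peat dataset.)

```python
import numpy as np
import pandas as pd
from scipy.spatial import cKDTree

# Hypothetical inputs: clustered field samples and the full prediction grid
train = pd.read_csv("training_points.csv")[["x", "y"]].to_numpy()
grid = pd.read_csv("prediction_grid.csv")[["x", "y"]].to_numpy()

# Distance from each training point to its nearest *other* training point --
# roughly the gap a random train/test split asks the model to bridge
d_within, _ = cKDTree(train).query(train, k=2)
print("median sample-to-sample distance:", np.median(d_within[:, 1]))

# Distance from each mapped location to the nearest training point --
# the gap the model actually has to bridge when the map is made
d_to_map, _ = cKDTree(train).query(grid, k=1)
print("median map-to-sample distance:   ", np.median(d_to_map))
```

If the second number is much bigger than the first, the random-split score tells you very little about accuracy over most of the map.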
Definitely. And in that case you just have to push for more data collection, or accept that you have no idea how accurate your model is in that location and ideally not publish any predictions there
May 20, 2025 at 7:32 PM
If you train your model on some areas and then test that it works at another area really far away, you can be more confident that it works in the places in-between. Ofc it's a bit more complicated than that, because different places have different landscapes and trees etc, but that's the idea
May 20, 2025 at 7:27 PM
Ideally the data would be distributed more evenly, but often you can't control that. What you can do is evaluate the model in a tougher way. The trick is to split the data so that the distance between training and test points is similar to the distance between the training data and the places you plan to use the model.
May 20, 2025 at 7:23 PM
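(A minimal sketch of what that tougher, spatially blocked evaluation could look like with scikit-learn. The predictors, label column and file name are made up for illustration, not the actual setup used for the peat map.)

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GroupKFold, cross_val_score

df = pd.read_csv("training_points.csv")          # hypothetical labelled samples
X = df[["elevation", "slope", "ndvi"]]           # made-up predictor columns
y = df["peat"]                                   # made-up class label

# Group the samples into spatial blocks using their coordinates: points in the
# same block always stay on the same side of the train/test split
blocks = KMeans(n_clusters=10, random_state=0).fit_predict(df[["x", "y"]])

scores = cross_val_score(
    RandomForestClassifier(random_state=0), X, y,
    cv=GroupKFold(n_splits=5), groups=blocks,
)
print("Spatially blocked CV accuracy:", scores.mean().round(2))
```

Because whole blocks are held out at once, the test points end up genuinely far from anything the model was trained on, which is much closer to how the model is used when it predicts across the rest of the map.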
This is a great bit of writing from a paper by Meyer & Pebesma, which discusses this in more detail: www.nature.com/articles/s41...
May 20, 2025 at 7:15 PM
It's also possible the model has actually learned really useful relationships and performs great everywhere! But unless you test specifically on data far away from the data used to train it, you have no idea whether this is the case. And judging by the mistakes you've identified, it's not...
May 20, 2025 at 7:06 PM
If the model predicts the same value as the closest datapoint in the training set, it will probably achieve really high accuracy (perhaps even 94%...). But as soon as you use the model on data far away from any location it's seen, that won't work any more.
May 20, 2025 at 7:03 PM
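(One way to sanity-check a headline score like that: compare the model against a baseline that just copies the label of the nearest training point in space. Again only a sketch with made-up file and column names, not the evaluation actually used for the peat map.)

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

df = pd.read_csv("training_points.csv")           # hypothetical labelled samples
X_features = df[["elevation", "slope", "ndvi"]]   # made-up predictor columns
X_coords = df[["x", "y"]]                         # coordinates only
y = df["peat"]                                    # made-up class label

cv = KFold(n_splits=5, shuffle=True, random_state=0)  # the usual random split

# The "real" model, evaluated with a random split
rf_score = cross_val_score(RandomForestClassifier(random_state=0),
                           X_features, y, cv=cv).mean()

# Baseline that just copies the nearest training point's label (1-NN on coordinates)
nn_score = cross_val_score(KNeighborsClassifier(n_neighbors=1),
                           X_coords, y, cv=cv).mean()

print(f"random CV, real model:                  {rf_score:.2f}")
print(f"random CV, copy-nearest-point baseline: {nn_score:.2f}")
```

If the coordinates-only baseline scores nearly as high under a random split, the headline accuracy is mostly measuring spatial proximity to clustered training data rather than real generalisation.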