This information aids in understanding the strengths and weaknesses of current automated extraction methods.
The datasets have large re-use potential owing to the gold-standard quality of the emission metrics and the wealth of accompanying information.
- I expected that copying GHG emission values from a sustainability report into a table would be a very simple annotation task. It was not, as the high level of disagreement among both non-expert and expert coders shows.
Again, these expert teams disagreed on about half of the 40%. Agreement on which values needed to be extracted was reached only during an expert discussion.
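The agreement figures reported in this section (60% agreement between the two non-expert annotators, and expert disagreement on about half of the remaining 40%) can be illustrated with a minimal sketch of raw percent agreement. This is not the authors' evaluation code: the function names `percent_agreement` and `unresolved_share`, the exact-match criterion, and the toy values are all assumptions made for illustration.

```python
from typing import Optional, Sequence

# None stands for "no qualifying value found in the report" (an assumption).
Value = Optional[float]


def percent_agreement(a: Sequence[Value], b: Sequence[Value]) -> float:
    """Fraction of reports on which two annotators extracted the same value."""
    if len(a) != len(b):
        raise ValueError("both annotators must cover the same reports")
    if not a:
        raise ValueError("need at least one report")
    # Exact match is assumed here; values are copied from reports, not computed.
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def unresolved_share(first_round: float, second_round_disagreement: float) -> float:
    """Share of all reports still disputed after a second annotation round."""
    return (1.0 - first_round) * second_round_disagreement


# Toy data, invented for illustration: five reports, two annotators.
annotator_1 = [1200.0, 530.5, None, 88.0, 4100.0]
annotator_2 = [1200.0, 530.5, None, 90.0, 4010.0]

agreement = percent_agreement(annotator_1, annotator_2)
print(f"agreement: {agreement:.0%}")

# Figures from the text: 60% agreement, experts disagree on about half of the rest.
print(f"left for discussion: {unresolved_share(0.60, 0.5):.0%}")
```

Read this way, roughly one fifth of all reports would have been settled only in the expert discussion. Raw percent agreement does not correct for chance; a chance-corrected statistic such as Cohen's kappa could be substituted if the annotation were framed as a categorical decision.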
4. represent a total value, not subcategories.
Two human non-expert annotators searched for all GHG values that meet these conditions.
Despite a training session, these non-experts agreed on only 60% of all reports.
1. cover emissions for the entire company,
2. are reported according to the operational boundaries of the scopes (as defined by the Greenhouse Gas Protocol),
To obtain the GHG emission metrics, we first extracted them from the PDF files with an LLM, GPT-4. This step served only to simplify data extraction; human annotators double-checked the values.
We present a gold standard dataset containing emission metrics extracted from 139 sustainability reports collected from company websites.
Extracting GHG indicators from these reports by hand is a laborious task. Could one automate this process? How well do ML and AI models perform?