Malte Schierholz
@malteschierholz.bsky.social
The annotation dataset also documents a scalable data collection pipeline combining non-expert annotators with targeted expert input, offering a model for future data collection efforts.
August 27, 2025 at 5:47 PM
The dataset is a benchmark to compare various human & automatic annotation techniques.

This information aids in understanding the strengths and weaknesses of current automated extraction methods.
August 27, 2025 at 5:47 PM
🫱All our data are publicly available on Zenodo. zenodo.org/records/1512...

💥
The datasets have large re-use potential due to the gold standard nature of the emission metrics and the wealth of accompanying information.
💥
Gold Standard and Annotation Dataset for CO2 Emissions Annotation
This repository contains the results of a research project which provides a benchmark dataset for extracting greenhouse gas emissions from corporate annual and sustainability reports. The zipped datasets file contains two datasets, gold_standard and annotation_dataset (password is provided in the zip file).

Data collection
A Large Language Model (LLM) based pipeline was used to extract the greenhouse gas emissions from the reports (see columns prefixed with llm_ in annotation_dataset). The extracted emissions follow the categories Scope 1, Scope 2 (market-based), Scope 2 (location-based) and Scope 3, as defined in the Greenhouse Gas Protocol (GHGP) (see variables scope). Annotation of the pipeline output was done in 3 phases: first by non-experts (see columns prefixed with non_expert_ in annotation_dataset), then by expert groups (columns prefixed with exp_group_ in annotation_dataset) where the non-experts disagreed, and finally in a discussion of all experts (columns prefixed with exp__disc in annotation_dataset) where the expert groups disagreed. The annotation guidelines for the non-experts and experts are also included in this repository. The annotation results from all three phases are combined to form the final benchmark dataset: gold_standard. Codebooks detailing each variable of each of the two datasets are also provided. More details about the annotation template or the data wrangling scripts can be found in the GitHub repository.

Merging of datasets
Users can match the two datasets (gold_standard and annotation_dataset) using the variable combination of company_name, report_year and merge_id (index column). The merge_id already includes the company name and report year implicitly, but company_name and report_year should still be included as join variables to avoid column duplication in the join operation. This is useful, for example, when comparing LLM extractions to gold standard data.
zenodo.org
August 27, 2025 at 5:47 PM
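The merge described in the Zenodo record could look roughly like the following pandas sketch. Only the key columns company_name, report_year and merge_id and the llm_ prefix come from the record; the toy rows, the value columns `scope_1` and `llm_scope_1`, and the helper `merge_datasets` are assumptions made up for illustration, and the real files may be laid out differently (see the codebooks).

```python
import pandas as pd

# Toy stand-ins for the two datasets. All values, and the columns
# `scope_1` / `llm_scope_1`, are invented for illustration only.
gold_standard = pd.DataFrame({
    "merge_id": ["a_2022", "b_2022"],
    "company_name": ["Company A", "Company B"],
    "report_year": [2022, 2022],
    "scope_1": [100.0, 250.0],
})
annotation_dataset = pd.DataFrame({
    "merge_id": ["a_2022", "b_2022"],
    "company_name": ["Company A", "Company B"],
    "report_year": [2022, 2022],
    "llm_scope_1": [100.0, 205.0],
})

# Key combination named in the Zenodo record.
JOIN_KEYS = ["company_name", "report_year", "merge_id"]


def merge_datasets(gold, annotations):
    """Join on all three keys so each key column appears only once.

    Joining on merge_id alone would leave company_name_x/company_name_y
    and report_year_x/report_year_y in the result.
    """
    return gold.merge(annotations, on=JOIN_KEYS, how="inner")


merged = merge_datasets(gold_standard, annotation_dataset)

# Example comparison: does the LLM extraction equal the gold value?
merged["llm_matches_gold"] = merged["scope_1"] == merged["llm_scope_1"]
print(merged)
```

Listing all three keys in `on=` is what avoids the duplicated, suffixed columns; an inner join keeps only rows present in both datasets.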
- All reported Scope 3 emissions need to be treated with caution, as their optional and therefore often incomplete reporting makes comparisons between companies challenging. Companies might just "forget" to report parts of their emissions if these are (too) hard to calculate.
August 27, 2025 at 5:47 PM
- Direct emissions by company facilities (Scope 1) and indirect emissions for power and heat consumption (Scope 2 location-based) are most often reported. Other indirect emissions (Scope 3), e.g. from purchased goods or business travel, are reported less often.
August 27, 2025 at 5:47 PM
- About half of the sustainability reports (69 of 139) do not contain any GHG emission values, partly as a consequence of our strict annotation rules.
August 27, 2025 at 5:47 PM
What do we learn from this, and what did I find surprising?

- I expected it would be a very simple annotation task to copy GHG emission values from a sustainability report into a table. It was not, as the high level of disagreement between non-expert and expert coders shows.
August 27, 2025 at 5:47 PM
To ensure that we create a gold standard dataset, two teams of expert annotators double-coded the remaining 40% of reports.

Again, these expert teams disagreed on about half of that 40%. Only in an expert discussion was agreement reached on which values should be extracted.
August 27, 2025 at 5:47 PM
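As a rough worked example of the three-phase funnel, the sketch below applies the shares mentioned in this thread (139 reports, 60% non-expert agreement, about half of the remainder disputed by the expert teams). The function name `annotation_funnel` is hypothetical, and because the thread only gives rounded percentages, the resulting counts are approximations, not figures from the dataset.

```python
TOTAL_REPORTS = 139          # sample size stated in the thread
NON_EXPERT_AGREEMENT = 0.60  # share of reports where both non-experts agreed
EXPERT_DISAGREEMENT = 0.50   # "about half" of the reports passed to experts


def annotation_funnel(total, agree_share, expert_disagree_share):
    """Approximate how many reports were settled in each annotation phase."""
    to_experts = total * (1 - agree_share)
    to_discussion = to_experts * expert_disagree_share
    return {
        "settled_by_non_experts": total - to_experts,
        "settled_by_expert_teams": to_experts - to_discussion,
        "settled_in_discussion": to_discussion,
    }


funnel = annotation_funnel(TOTAL_REPORTS, NON_EXPERT_AGREEMENT, EXPERT_DISAGREEMENT)
for phase, count in funnel.items():
    # Rounded, since the input shares are themselves approximate.
    print(f"{phase}: about {round(count)} reports")
```

Under these assumptions, roughly one in five reports needed the full expert discussion.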
3. are reported in absolute terms as CO2 or CO2-equivalent emissions and
4. represent a total value, not subcategories.

Two human non-expert annotators searched for all GHG values that meet these conditions.

Despite a training session, these non-experts agreed on only 60% of all reports.
August 27, 2025 at 5:47 PM
Precise rules were created for human annotators. GHG emission values should be extracted only if they:

1. cover emissions for the entire company,
2. are reported according to the operational boundaries of the scopes (according to the Greenhouse Gas Protocol)
August 27, 2025 at 5:47 PM
We use a sample of 139 companies that are listed in the MSCI World Small Cap index and/or in the German DAX.

To obtain the GHG emission metrics, we extracted them from PDF files with an LLM, GPT-4. This was only to simplify data extraction; human annotators double-checked the values.
August 27, 2025 at 5:47 PM
Existing datasets are often inconsistent and lack transparent methodologies, making it difficult to obtain reliable emission data.

We present a gold standard dataset containing emission metrics extracted from 139 sustainability reports collected from company websites.
August 27, 2025 at 5:47 PM
Annual reports or sustainability reports are often more than 100 pages long and only available in PDF format.

Extracting GHG indicators from these reports by hand is a laborious task. Could one automate this process? How well do ML and AI models perform?
August 27, 2025 at 5:47 PM
How do political scientists explain his approach, then? My guess would be that such words make things easier for him in the coalition with the Union. In the long run, it may also be an adjustment to ease talks with the USA? After all, politicians do not only have to look to the electorate.
May 11, 2025 at 12:09 PM
The Twitter takeover was very much in the news, everyone knows by now what kind of person Elon Musk is, and one would hope X is dead by now... So is this Twitter-Musk showdown still relevant today? Okay, X is still very much alive... so maybe it is worth reading...?
April 6, 2025 at 12:41 PM
Besides allegations against Meta executives Mark Zuckerberg, Joel Kaplan and Sheryl Sandberg, there is a lot to learn about Facebook's company culture and international diplomacy/big tech lobbying. It was super-interesting to read about Facebook's values and how Facebook viewed its role in the world
April 6, 2025 at 12:22 PM
Reposted by Malte Schierholz
I like this paper: Gruber, C., Schenk, P. O., Schierholz, M., Kreuter, F., & Kauermann, G. (2023). Sources of Uncertainty in Machine Learning - A Statisticians' View. arxiv.org/abs/2305.16703. 1/
Sources of Uncertainty in Supervised Machine Learning -- A Statisticians' View
Supervised machine learning and predictive models have achieved an impressive standard today, enabling us to answer questions that were inconceivable a few years ago. Besides these successes, it becom...
arxiv.org
March 20, 2025 at 9:24 PM