A new gold-standard dataset reveals how hard it is to get accurate emissions data from company reports, even for AI, raising questions about the reliability of automated climate accountability tools.

Research: Addressing data gaps in sustainability reporting: A benchmark dataset for greenhouse gas emission extraction.
Large companies in the EU are legally required to report their greenhouse gas (GHG) emissions. However, manually extracting this information from lengthy PDF sustainability reports is slow and prone to errors. Many teams try to speed up the process with automation, for example by using large language models (LLMs), AI systems that read text and generate responses.
Dr. Malte Schierholz, project coordinator and postdoctoral researcher at the Social Data Science and AI Lab (SODA Lab), urges caution: "With automatic extraction methods, it's easy to fully trust the LLM's output and overlook measurement errors that occur frequently." Because the trend toward increased automation is both promising and risky, the research group Greenhouse Gas Insights and Sustainability Tracking (GIST) set out to establish a reliable point of reference for collecting emission data.
A Gold Standard for Recording Emissions Data
In a paper published in Scientific Data, the group introduces a gold-standard benchmark dataset for extracting GHG emissions. The dataset is based on sustainability reports sampled from companies in the MSCI World Small Cap index and the German DAX. "The basic task was to extract GHG emissions values from PDF files into a table," says Schierholz. "What first sounded straightforward turned out to be surprisingly complex."
In a multi-stage process, sustainable finance experts from LMU and the Deutsche Bundesbank collaborated with methodologists to define rigorous annotation rules, conducted multiple rounds of extraction and verification, and convened expert discussion groups. "If you want a dataset that's both accurate and allows for comparisons between companies, you need clear rules and plenty of feedback loops throughout the data annotation process," says Jacob Beck, who led the annotation effort. "In the end, some ambiguous cases still required expert group discussion."
Challenges with Company Disclosures
Sustainable Finance researcher Dr. Andreas Dimmelmeier (GreenDIA consortium) was not surprised: "The hard-to-resolve cases stem not only from complex and partly inconsistent reporting protocols, but also from missing context and incomplete disclosures in company reports. Many companies in our sample did not disclose emissions according to established reporting and calculation frameworks."
The team also observed that approximately half of the reports contained no usable greenhouse gas data. When emissions were reported, they most often referred to direct emissions and indirect emissions from energy consumption. Data on other indirect emissions, such as those arising in the supply chain or from travel and transport, were rarely complete.
A Transparent Resource for Researchers
The dataset, together with scripts and supplementary materials, offers a transparent and rigorously curated foundation for evaluating automated approaches to sustainability reporting. By making the assumptions and decisions explicit, it enables fair method comparisons and clearer communication of annotation uncertainty.
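To illustrate what evaluating an automated approach against a gold standard can look like, here is a minimal Python sketch. It is not the GIST group's code and does not reflect the dataset's actual schema: the key layout (company, year, emission category), the category labels, the 1% tolerance, and all numbers are assumptions invented for this example.

```python
import math
from collections import Counter
from typing import Dict, Optional, Tuple

# Hypothetical key: (company, reporting year, emission category).
Key = Tuple[str, int, str]


def score_extraction(
    gold: Dict[Key, Optional[float]],
    predicted: Dict[Key, Optional[float]],
    rel_tol: float = 0.01,  # assumed tolerance, not taken from the paper
) -> Counter:
    """Count outcome types for every entry in the gold standard.

    A value of None means "no usable value disclosed". Predicted keys
    that do not appear in the gold standard are ignored.
    """
    outcomes: Counter = Counter()
    for key, gold_value in gold.items():
        pred_value = predicted.get(key)  # missing key is treated as None
        if gold_value is None and pred_value is None:
            outcomes["correct_absent"] += 1
        elif gold_value is None:
            outcomes["spurious"] += 1  # a value was extracted where none exists
        elif pred_value is None:
            outcomes["missed"] += 1  # a disclosed value was not extracted
        elif math.isclose(pred_value, gold_value, rel_tol=rel_tol):
            outcomes["correct_value"] += 1
        else:
            outcomes["wrong_value"] += 1
    return outcomes


# Invented example data, for illustration only.
gold = {
    ("ExampleCo", 2022, "direct"): 1200.0,
    ("ExampleCo", 2022, "energy_indirect"): 800.0,
    ("ExampleCo", 2022, "other_indirect"): None,
    ("SampleAG", 2022, "direct"): None,
}
predicted = {
    ("ExampleCo", 2022, "direct"): 1200.0,
    ("ExampleCo", 2022, "energy_indirect"): 8000.0,  # e.g. a unit or digit slip
    ("ExampleCo", 2022, "other_indirect"): 50.0,
}

result = score_extraction(gold, predicted)
print(dict(result))
```

Counting absent values separately matters here because, as noted above, roughly half of the reports contained no usable data: a tool that invents a number for such a report makes a different kind of error than one that misreads a disclosed figure.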
Moving Toward Net-Zero with Better Data
The GIST group hopes this resource will help researchers and practitioners measure progress more honestly and close critical data gaps on the path to net zero.
Journal reference:
- Beck, J., Steinberg, A., Dimmelmeier, A., Domenech Burin, L., Kormanyos, E., Fehr, M., & Schierholz, M. (2025). Addressing data gaps in sustainability reporting: A benchmark dataset for greenhouse gas emission extraction. Scientific Data, 12(1), 1-9. DOI: 10.1038/s41597-025-05664-8, https://www.nature.com/articles/s41597-025-05664-8