Addressing Discrepancies in Scope 3 Emissions Reporting and Improving Predictive Accuracy Using Machine Learning

In an article published in the journal PLOS ONE, researchers addressed the growing demand from climate-risk-conscious investors for high-quality Scope 3 emissions data. Examining three major data providers, Bloomberg, Refinitiv Eikon, and Institutional Shareholder Services (ISS), they found substantial divergence between datasets, incomplete reporting of emissions categories, and difficulty in predicting emissions accurately. The findings highlight the need for investor awareness and transparent data disclosure.

Study: Addressing Discrepancies in Scope 3 Emissions Reporting and Improving Predictive Accuracy Using Machine Learning. Image credit: BESTWEB/Shutterstock


Scope 3 emissions are integral to comprehensive climate risk assessments, but they are difficult to measure because reporting standards vary and firms have discretion over which categories they disclose. The Greenhouse Gas (GHG) Protocol's Corporate Value Chain Accounting and Reporting Standard provides a foundational framework, delineating 15 distinct categories of Scope 3 emissions. However, the discretion granted to firms in category selection, coupled with potential trade-offs among principles such as relevance, completeness, consistency, transparency, and accuracy, raises concerns about data accuracy and comparability.

Prior research underscores two primary issues. First, Scope 3 disclosure lacks uniformity, with larger emitters reluctant to report downstream emissions. Second, firms tend to disclose categories selectively, which leads to inconsistent reporting and potential underrepresentation of material emissions. The paper also identifies three sources of error in Scope 3 reporting (boundary incompleteness, activity exclusion, and selection bias), all of which complicate accurate emissions calculation and cross-firm comparison.

To address these shortcomings in the existing literature, the paper examined third-party datasets from ISS, Refinitiv Eikon, and Bloomberg and uncovered significant divergence in reported Scope 3 emissions. By exploring the composition of emissions across categories and testing machine learning models for prediction accuracy, the study aimed to improve understanding of, and transparency about, Scope 3 data quality. The investigation sheds light on the complexities, discrepancies, and predictive challenges inherent in Scope 3 emissions reporting.


The study used a multifaceted approach to assess the quality of Scope 3 emissions data, emphasizing divergence, composition, and predictive accuracy through advanced statistical and machine learning techniques.

  • Divergence Analysis: To address the first research question on data divergence, the study went beyond correlation analyses, introducing percentage error metrics for quantifying the degree of inconsistency across datasets. Utilizing aggregated Scope 3 emissions data from ISS, Refinitiv Eikon, and Bloomberg, the researchers calculated the trimmed mean absolute percentage error and the trimmed mean percentage error, offering insights into both the direction and magnitude of discrepancies. Furthermore, the research explored the impact of data divergence on emissions rankings, crucial for constructing low-carbon portfolios, by evaluating the proportion of observations within the same or adjacent ranking deciles across datasets.
  • Composition Analysis: The second part of the analysis focused on the composition of Bloomberg's Scope 3 emissions. Assessing relevance and completeness, the study introduced carbon intensity as a measure, considering the contribution of each emissions category to the firm's overall Scope 3 emissions and the proportion of firms disclosing specific categories. The study highlighted the potential impact of an incomplete Scope 3 composition by imputing missing values based on peer group medians.
  • Prediction Accuracy Models: Addressing the third research question, the study developed machine learning models to predict Scope 3 emissions for non-disclosing firms. Baseline models included an Industry Fill model and an Ordinary Least Squares regression using traditional financial metrics. Linear models extended the predictor set to include additional financial variables, while tree-based ensemble models such as Random Forest and Extreme Gradient Boosting leveraged decision trees for improved predictive power. Linear Tree models, which combine decision trees with linear models, were included to capture non-linearities in the data. The study optimized hyperparameters using Bayesian methods and evaluated predictive performance through five-fold cross-validation.
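
The two percentage error metrics named in the divergence analysis can be illustrated with a minimal Python sketch. The function names, the 10% trimming share, and the sample figures are assumptions made for illustration only; the article does not state how the study trimmed its data.

```python
from statistics import mean

def percentage_errors(reference, other):
    """Signed percentage error of `other` relative to `reference`, in percent."""
    return [100.0 * (o - r) / r for r, o in zip(reference, other) if r != 0]

def trimmed_mean(values, trim=0.1):
    """Mean after dropping the lowest and highest `trim` share of values.

    Symmetric trimming is an assumption, not the study's documented rule.
    """
    ordered = sorted(values)
    k = int(len(ordered) * trim)
    kept = ordered[k:len(ordered) - k] if k else ordered
    return mean(kept)

def trimmed_mpe(reference, other, trim=0.1):
    """Trimmed mean percentage error: keeps the sign, so it shows direction."""
    return trimmed_mean(percentage_errors(reference, other), trim)

def trimmed_mape(reference, other, trim=0.1):
    """Trimmed mean absolute percentage error: shows magnitude only."""
    abs_errors = [abs(e) for e in percentage_errors(reference, other)]
    return trimmed_mean(abs_errors, trim)

# Hypothetical emissions for ten firms as reported by two providers.
provider_a = [100.0] * 10
provider_b = [110.0, 90.0, 120.0, 80.0, 100.0, 105.0, 95.0, 130.0, 70.0, 1000.0]

print(trimmed_mpe(provider_a, provider_b))   # direction of the discrepancy
print(trimmed_mape(provider_a, provider_b))  # size of the discrepancy
```

Trimming keeps a single extreme value, such as the last firm in the sample, from dominating either average. A signed result near zero alongside a large absolute result would indicate errors that are large but not systematically biased in one direction.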


The study assessed the data quality, composition, and prediction accuracy of Scope 3 emissions from Bloomberg, Refinitiv Eikon, and ISS. The divergence analysis revealed significant disparities among the datasets. ISS, which uses proprietary models, showed the highest coverage but also a clear upward bias, while Bloomberg and Eikon were more similar to each other. The divergence statistics indicated low consistency among the datasets, with ISS in particular sharing few identical data points with the other providers.
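
The ranking comparison used in the divergence analysis, the share of firms that fall in the same or an adjacent emissions decile across two datasets, can be sketched as follows. This is an illustrative reconstruction, not the study's code: the function names, the tie-breaking rule, and the sample data are assumptions.

```python
def decile_ranks(values):
    """Assign each value a decile from 1 (lowest) to 10 (highest) by rank.

    Ties are broken by position in the list, which is an assumption.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    deciles = [0] * n
    for rank, idx in enumerate(order):
        deciles[idx] = rank * 10 // n + 1
    return deciles

def decile_agreement(a, b):
    """Return (share in the same decile, share within one decile)."""
    da, db = decile_ranks(a), decile_ranks(b)
    n = len(da)
    same = sum(x == y for x, y in zip(da, db)) / n
    within_one = sum(abs(x - y) <= 1 for x, y in zip(da, db)) / n
    return same, within_one

# Hypothetical emissions for 20 firms from two providers; the second
# provider ranks the firms in exactly the opposite order.
provider_a = [float(v) for v in range(1, 21)]
provider_b = list(reversed(provider_a))

print(decile_agreement(provider_a, provider_a))
print(decile_agreement(provider_a, provider_b))
```

Low agreement on this measure matters for portfolio construction because a firm's decile, rather than its exact emissions figure, often determines whether it is included in or excluded from a low-carbon portfolio.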

In terms of composition, the study evaluated the completeness and relevance of reported Scope 3 emissions categories. Firms, on average, reported only 3.8 of the 15 distinct categories, although this improved over time. However, firms tended to report categories that are easier to calculate rather than those most material to their carbon footprints. The most relevant categories varied across sectors, highlighting sector-specific patterns.
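
The imputation step in the composition analysis, filling a missing category value with the median of a firm's peers, might look like the sketch below. It assumes that a peer group is simply the firm's sector and that the values are carbon intensities. The field names (`sector`, `cat11`) and the sample records are hypothetical.

```python
from statistics import median

def impute_with_peer_median(firms, category):
    """Return copies of `firms` with missing `category` values filled in.

    A missing value (None) is replaced by the median of the disclosed
    values in the same sector. Firms with no disclosing peers stay missing.
    """
    disclosed = {}
    for firm in firms:
        value = firm.get(category)
        if value is not None:
            disclosed.setdefault(firm["sector"], []).append(value)
    medians = {sector: median(vals) for sector, vals in disclosed.items()}

    filled = []
    for firm in firms:
        copy = dict(firm)  # leave the input records untouched
        if copy.get(category) is None and copy["sector"] in medians:
            copy[category] = medians[copy["sector"]]
        filled.append(copy)
    return filled

# Hypothetical carbon intensities for one Scope 3 category.
firms = [
    {"name": "A", "sector": "Energy", "cat11": 10.0},
    {"name": "B", "sector": "Energy", "cat11": 30.0},
    {"name": "C", "sector": "Energy", "cat11": None},
    {"name": "D", "sector": "Retail", "cat11": None},
]

for firm in impute_with_peer_median(firms, "cat11"):
    print(firm["name"], firm["cat11"])
```

Comparing a firm's total before and after imputation gives a rough sense of how much an incomplete set of categories may understate its Scope 3 footprint.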

The researchers then evaluated how well machine learning models predicted Scope 3 emissions. The baseline models, industry-fill and naïve Ordinary Least Squares (OLS), performed similarly, and machine learning algorithms provided limited improvement over them. Linear Forest outperformed the other algorithms in predicting total Scope 3 emissions. However, even the best model exhibited substantial prediction errors, with a median absolute percentage error of 72.2%. The inclusion of energy-related data improved prediction accuracy only marginally.
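
To make the reported error metric concrete, the sketch below computes a median absolute percentage error under five-fold cross-validation for a simple industry-fill baseline. It rests on several assumptions: the baseline is taken to predict the median emissions of training firms in the same industry, folds are contiguous rather than shuffled, and the data are invented. It does not reproduce the study's models or its 72.2% figure.

```python
from statistics import median

def k_fold_indices(n, k=5):
    """Yield (train, test) index lists for k contiguous folds."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size

def industry_fill_predict(train_rows, test_rows):
    """Predict each test firm's emissions as its industry's training median.

    Falls back to the overall training median for unseen industries.
    """
    overall = median(r["emissions"] for r in train_rows)
    by_industry = {}
    for r in train_rows:
        by_industry.setdefault(r["industry"], []).append(r["emissions"])
    medians = {ind: median(vals) for ind, vals in by_industry.items()}
    return [medians.get(r["industry"], overall) for r in test_rows]

def cross_validated_mdape(rows, k=5):
    """Median absolute percentage error pooled over all held-out folds."""
    errors = []
    for train, test in k_fold_indices(len(rows), k):
        train_rows = [rows[i] for i in train]
        test_rows = [rows[i] for i in test]
        preds = industry_fill_predict(train_rows, test_rows)
        errors += [
            100.0 * abs(p - r["emissions"]) / r["emissions"]
            for p, r in zip(preds, test_rows)
        ]
    return median(errors)

# Hypothetical firms in one industry with alternating emissions levels.
rows = [{"industry": "A", "emissions": e} for e in [100.0, 200.0] * 5]
print(cross_validated_mdape(rows))
```

Because every error is measured on firms held out of training, the metric approximates how far off an estimate might be for a firm that does not disclose its emissions at all.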


In conclusion, the study revealed significant divergence in Scope 3 emissions data among third-party providers such as Bloomberg, Refinitiv Eikon, and ISS, with implications for divestment strategies and low-carbon indices. Firms tend to disclose emissions categories that are easier to calculate, which reduces the relevance of the data.

Prediction accuracy remained limited even with advanced machine learning, calling for caution when using Scope 3 estimates. The findings point to the need for binding mandates, improved guidance, expanded reporting boundaries, and greater awareness of data uncertainties in Scope 3 emissions disclosure and analysis.

Journal reference:

Written by

Soham Nandi

Soham Nandi is a technical writer based in Memari, India. His academic background is in Computer Science Engineering, specializing in Artificial Intelligence and Machine Learning. He has extensive experience in Data Analytics, Machine Learning, and Python. He has worked on group projects that required the implementation of Computer Vision, Image Classification, and App Development.


Please use one of the following formats to cite this article in your essay, paper or report:

  • APA

    Nandi, Soham. (2023, November 17). Addressing Discrepancies in Scope 3 Emissions Reporting and Improving Predictive Accuracy Using Machine Learning. AZoAi. Retrieved on July 17, 2024 from

  • MLA

    Nandi, Soham. "Addressing Discrepancies in Scope 3 Emissions Reporting and Improving Predictive Accuracy Using Machine Learning". AZoAi. 17 July 2024. <>.

  • Chicago

    Nandi, Soham. "Addressing Discrepancies in Scope 3 Emissions Reporting and Improving Predictive Accuracy Using Machine Learning". AZoAi. (accessed July 17, 2024).

  • Harvard

    Nandi, Soham. 2023. Addressing Discrepancies in Scope 3 Emissions Reporting and Improving Predictive Accuracy Using Machine Learning. AZoAi, viewed 17 July 2024,

