In an article published in the journal PLOS One, researchers addressed the growing demand from climate-conscious investors for high-quality Scope 3 emissions data. Examining major providers such as Bloomberg, Refinitiv Eikon, and Institutional Shareholder Services (ISS), they found substantial divergence between datasets, incomplete reporting of emissions categories, and difficulty in predicting emissions accurately, highlighting the critical need for investor awareness and transparent data disclosure.
Scope 3 emissions, integral to comprehensive climate risk assessments, are often mired in complexity due to diverse reporting standards and the discretionary nature of firms in selecting categories for disclosure. The Greenhouse Gas (GHG) Protocol's Corporate Value Chain Accounting and Reporting Standard provides a foundational framework, delineating 15 distinct categories for Scope 3 emissions. However, the discretion granted to firms in category selection, coupled with potential trade-offs among principles like relevance, completeness, consistency, transparency, and accuracy, raises concerns about data accuracy and comparability.
Prior research underscores two primary issues: first, a pervasive lack of uniformity in Scope 3 disclosure, with larger emitters exhibiting a reluctance to report downstream emissions, and second, a tendency among firms to selectively disclose categories, leading to reporting inconsistency and potential underrepresentation of material emissions. Furthermore, this paper identifies three sources of errors in Scope 3 reporting—boundary incompleteness, activity exclusion, and selection bias—complicating accurate emissions calculation and cross-firm comparisons.
Addressing the shortcomings in existing literature, the present paper employed a novel approach by scrutinizing third-party datasets from ISS, Refinitiv Eikon, and Bloomberg to uncover significant data divergence in reported Scope 3 emissions. By exploring the composition of emissions across distinct categories and leveraging machine-learning models for prediction accuracy, this study aimed to enhance understanding and transparency in Scope 3 emissions data quality. This comprehensive investigation sought to bridge gaps in the current discourse, shedding light on the complexities, discrepancies, and predictive challenges inherent in Scope 3 emissions reporting.
The study used a multifaceted approach to assess the quality of Scope 3 emissions data, emphasizing divergence, composition, and predictive accuracy through advanced statistical and machine learning techniques.
- Divergence Analysis: To address the first research question on data divergence, the study went beyond correlation analyses, introducing percentage error metrics for quantifying the degree of inconsistency across datasets. Utilizing aggregated Scope 3 emissions data from ISS, Refinitiv Eikon, and Bloomberg, the researchers calculated the trimmed mean absolute percentage error and the trimmed mean percentage error, offering insights into both the direction and magnitude of discrepancies. Furthermore, the research explored the impact of data divergence on emissions rankings, crucial for constructing low-carbon portfolios, by evaluating the proportion of observations within the same or adjacent ranking deciles across datasets.
- Composition Analysis: The second part of the analysis focused on the composition of Bloomberg's Scope 3 emissions. Assessing relevance and completeness, the study introduced carbon intensity as a measure, considering the contribution of each emissions category to the firm's overall Scope 3 emissions and the proportion of firms disclosing specific categories. The study highlighted the potential impact of an incomplete Scope 3 composition by imputing missing values based on peer group medians.
- Prediction Accuracy Models: Addressing the third research question, the study developed machine learning models to predict Scope 3 emissions for non-disclosing firms. Baseline models included an Industry Fill model and an Ordinary Least Squares regression using traditional financial metrics. Linear models extended the predictor set to include additional financial variables, while tree-based ensemble models such as Random Forest and Extreme Gradient Boosting leveraged decision trees for improved predictive power. Linear Tree models, which combine decision trees with linear models, were included to capture non-linearities in the data. The study optimized hyperparameters using Bayesian methods and evaluated predictive performance through five-fold cross-validation.
The study assessed the data quality, composition, and prediction accuracy of Scope 3 emissions from Bloomberg, Refinitiv Eikon, and ISS. The divergence analysis revealed significant disparities among the datasets. ISS, which relies on proprietary models, showed the highest coverage but also a clear upward bias, while Bloomberg and Eikon were more similar to each other. The divergence statistics indicated low consistency among the datasets, with ISS in particular sharing few identical data points with the other providers.
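To make the divergence metrics concrete, the following is a minimal Python sketch of a trimmed mean percentage error, a trimmed mean absolute percentage error, and the share of firms falling in the same or an adjacent ranking decile across two datasets. The function names, the 10% trimming share, and the rank-based decile assignment are illustrative assumptions; the article does not state the exact definitions the researchers used.

```python
from statistics import mean


def percentage_errors(reference, other):
    """Signed percentage error of `other` relative to `reference`, pair by pair.

    Pairs with a non-positive reference value are skipped (an assumption).
    """
    return [100.0 * (o - r) / r for r, o in zip(reference, other) if r > 0]


def trimmed_mean(values, trim=0.1):
    """Mean after dropping the lowest and highest `trim` share of values."""
    ordered = sorted(values)
    k = int(len(ordered) * trim)
    kept = ordered[k:len(ordered) - k] if k else ordered
    return mean(kept)


def trimmed_mpe(reference, other, trim=0.1):
    """Trimmed mean percentage error: keeps the sign, so it shows direction."""
    return trimmed_mean(percentage_errors(reference, other), trim)


def trimmed_mape(reference, other, trim=0.1):
    """Trimmed mean absolute percentage error: shows magnitude only."""
    absolute = [abs(e) for e in percentage_errors(reference, other)]
    return trimmed_mean(absolute, trim)


def decile_ranks(values):
    """Assign each value a decile from 0 (lowest) to 9 (highest) by rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0] * n
    for position, i in enumerate(order):
        ranks[i] = position * 10 // n
    return ranks


def share_same_or_adjacent(a, b):
    """Share of firms whose deciles in the two datasets differ by at most one."""
    da, db = decile_ranks(a), decile_ranks(b)
    return sum(abs(x - y) <= 1 for x, y in zip(da, db)) / len(a)
```

A positive trimmed mean percentage error would indicate that the second dataset tends to report higher emissions than the reference, which is the kind of directional bias the article attributes to ISS.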
In terms of composition, the completeness and relevance of reported Scope 3 emissions categories were evaluated. Firms, on average, reported only 3.8 of the 15 distinct categories, although this improved over time. However, firms tended to report categories that are easier to calculate rather than those most material to their carbon footprints. The most relevant categories varied across sectors, highlighting sector-specific patterns.
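The effect of incomplete category reporting can be illustrated with a small Python sketch of peer-group median imputation. It assumes that the peer group is the firm's sector and that raw category values are imputed directly; the article does not specify how peer groups were defined or whether values were scaled (for example, to carbon intensity) before imputation, so the data layout and function names here are hypothetical.

```python
from collections import defaultdict
from statistics import median


def impute_with_peer_medians(firms, categories):
    """Fill unreported categories with the median of sector peers that report them.

    `firms` is a list of dicts with keys 'name', 'sector', and 'emissions',
    where 'emissions' maps a category name to a value or None (hypothetical layout).
    """
    reported = defaultdict(list)
    for firm in firms:
        for cat in categories:
            value = firm["emissions"].get(cat)
            if value is not None:
                reported[(firm["sector"], cat)].append(value)

    peer_median = {key: median(vals) for key, vals in reported.items()}

    completed = []
    for firm in firms:
        filled = {}
        for cat in categories:
            value = firm["emissions"].get(cat)
            if value is None:
                # Stays None when no peer in the sector reports this category.
                value = peer_median.get((firm["sector"], cat))
            filled[cat] = value
        completed.append({**firm, "emissions": filled})
    return completed


def total_scope3(firm):
    """Sum of all available category values for one firm."""
    return sum(v for v in firm["emissions"].values() if v is not None)


categories = ["purchased_goods", "use_of_sold_products"]
firms = [
    {"name": "A", "sector": "energy",
     "emissions": {"purchased_goods": 100, "use_of_sold_products": 900}},
    {"name": "B", "sector": "energy",
     "emissions": {"purchased_goods": 200, "use_of_sold_products": None}},
    {"name": "C", "sector": "energy",
     "emissions": {"purchased_goods": 300, "use_of_sold_products": 1100}},
]
completed = impute_with_peer_medians(firms, categories)
```

In this made-up example, firm B's total rises from 200 to 1200 once the unreported downstream category is filled in, showing how a total built only from easy-to-calculate categories can understate a firm's footprint.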
The researchers then examined how well the machine learning models predicted Scope 3 emissions. Baseline models such as industry-fill and naïve Ordinary Least Squares (OLS) performed similarly, and machine learning algorithms provided only limited improvement. Linear Forest outperformed the other algorithms in predicting total Scope 3 emissions. However, even the best model exhibited substantial prediction errors, with a median absolute percentage error of 72.2%. The inclusion of energy-related data improved prediction accuracy only marginally.
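As a rough illustration of how such an evaluation can be set up, the Python sketch below scores an industry-fill baseline with k-fold cross-validation and a median absolute percentage error. It assumes that industry-fill means predicting the median emissions of training firms in the same industry, and it uses deterministic round-robin folds; both are assumptions for illustration, not the study's documented procedure, and all function names are hypothetical.

```python
from collections import defaultdict
from statistics import median


def k_fold_indices(n, k=5):
    """Split indices 0..n-1 into k folds by round-robin (no shuffling)."""
    return [list(range(fold, n, k)) for fold in range(k)]


def industry_fill_predict(train, test):
    """Predict each test firm as the median of training firms in its industry.

    `train` and `test` are lists of (industry, emissions) pairs. Industries
    absent from the training data fall back to the overall training median.
    """
    by_industry = defaultdict(list)
    for industry, emissions in train:
        by_industry[industry].append(emissions)
    overall = median(e for _, e in train)
    return [
        median(by_industry[ind]) if ind in by_industry else overall
        for ind, _ in test
    ]


def median_ape(actual, predicted):
    """Median absolute percentage error, in percent."""
    return median(100.0 * abs(p - a) / a for a, p in zip(actual, predicted))


def cross_validated_median_ape(firms, k=5):
    """Pool out-of-fold predictions over k folds and score them once."""
    actual, predicted = [], []
    for held_out in k_fold_indices(len(firms), k):
        held = set(held_out)
        train = [firms[i] for i in range(len(firms)) if i not in held]
        test = [firms[i] for i in held_out]
        predicted.extend(industry_fill_predict(train, test))
        actual.extend(e for _, e in test)
    return median_ape(actual, predicted)
```

Because every prediction is made for a firm excluded from the training fold, the resulting error approximates how the baseline would perform on non-disclosing firms. A more complex model can be judged by how far it reduces this figure.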
In conclusion, the study revealed significant divergence in Scope 3 emissions data among third-party data providers like Bloomberg, Refinitiv Eikon, and ISS, impacting divestment strategies and low-carbon indices. Firms tend to disclose easier-to-calculate emission categories, affecting data relevance.
Prediction accuracy, even with advanced machine learning, remained limited, calling for caution in using Scope 3 estimates. The findings stressed the need for binding mandates, improved guidance, expanded reporting boundaries, and awareness of data uncertainties in addressing Scope 3 emissions disclosure and analysis.