A CatBoost Approach for Superconducting Material Prediction with AI

In an article published in the journal Nature, researchers used artificial intelligence (AI), specifically the CatBoost algorithm, to predict the transition temperatures (Tc) of superconducting materials. Starting from the SuperCon dataset, they pre-processed the data and introduced two packages: Jabir, which generates atomic descriptors, and Soraya, which selects the most important features.

Study: A CatBoost Approach for Superconducting Material Prediction with AI. Image credit: SeniMelihat/Shutterstock

The resulting model achieved an R-squared (R2) of 0.952 and a root mean square error (RMSE) of 6.45 K. The authors also introduced a web application for predicting the Tc of superconducting materials as a novel contribution.
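
For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of how R2 and RMSE are computed. The function names and the sample Tc values are made up for illustration and are not taken from the study.

```python
import math

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def rmse(y_true, y_pred):
    """Root mean square error, in the unit of the target (kelvin for Tc)."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))

# Made-up Tc values in kelvin, for illustration only (not from the study).
measured = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]

print("R2   =", r_squared(measured, predicted))
print("RMSE =", rmse(measured, predicted), "K")
```

An R2 close to 1 means the model explains most of the variance in the measured values, while the RMSE gives the typical size of the prediction error in kelvin.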


Background

Superconductivity, characterized by zero electrical resistance and the expulsion of magnetic fields, is a quantum mechanical phenomenon that manifests on a macroscopic scale. Predicting the Tc of superconductors remains challenging, and established methods such as density functional theory (DFT) are limited in their ability to handle strong correlations. Machine learning (ML) offers a data-driven alternative for predicting material properties. Recent studies have applied algorithms such as extreme gradient boosting (XGBoost), random forest, and convolutional neural networks (CNN) to predict Tc values for superconducting materials, but gaps persist in establishing a comprehensive feature space and identifying the most important features.

This research addressed these gaps by treating dataset quality as central and by introducing the Jabir package, which generates 322 atomic features to establish a feature space better suited to superconducting Tc. The Soraya package then selects the most relevant of these features. Previous studies used methods such as Magpie descriptors or crystal graph CNNs but did not focus closely on feature selection or on constructing an optimal feature space. The proposed CatBoost model outperformed prior work on both R2 and RMSE. This emphasis on dataset refinement and feature selection distinguishes the research and contributes to the development of "Data-Based Materials Science" and to more accurate prediction of superconducting material properties.

Data and computational methods

The researchers predicted the Tc of superconducting materials using the CatBoost algorithm and a carefully processed dataset, named DataG, derived from the SuperCon dataset. The source data, containing 33,407 compounds, underwent extensive cleaning to address missing and duplicated data, problematic compounds, and outliers. The result was DataG, a refined dataset of 13,022 compounds.
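
The kind of cleaning described above can be sketched in a few lines. This is only an illustration under stated assumptions: the record layout, the function name `clean_records`, and the `tc_max` cutoff are hypothetical, and the article does not say how the study defined outliers or problematic compounds.

```python
def clean_records(records, tc_max=200.0):
    """Drop rows with missing data, duplicated formulas, and out-of-range Tc.

    `tc_max` is a hypothetical outlier cutoff in kelvin, not a value
    taken from the study.
    """
    seen = set()
    cleaned = []
    for rec in records:
        formula, tc = rec.get("formula"), rec.get("tc")
        if not formula or tc is None:   # missing data
            continue
        if formula in seen:             # duplicate of a row already kept
            continue
        if not (0.0 < tc <= tc_max):    # crude filter for invalid or outlying Tc
            continue
        seen.add(formula)
        cleaned.append(rec)
    return cleaned

# Placeholder records, for illustration only (not from SuperCon).
raw = [
    {"formula": "A", "tc": 39.0},
    {"formula": "A", "tc": 39.0},   # duplicated entry
    {"formula": "B", "tc": None},   # missing Tc
    {"formula": "C", "tc": 950.0},  # implausibly high Tc
    {"formula": "D", "tc": 18.0},
]
cleaned = clean_records(raw)
print(len(raw), "->", len(cleaned), "records")
```

A real pipeline would need domain-specific rules for what counts as a problematic compound; the single threshold here only stands in for that step.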

CatBoost, an ensemble technique based on gradient-boosted decision trees, was chosen for its efficiency in handling large datasets. To represent each compound, a new Python package called Jabir generated 322 atomic features. With a feature space this large, feature selection became crucial, so the authors introduced the Soraya package, a hybrid method that combines correlation analysis, the Shapley additive explanations (SHAP) method, and forward selection. This approach identified the most significant features while eliminating redundant ones.
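
To make the idea of gradient-boosted decision trees concrete, here is a toy sketch that boosts one-split trees (stumps) on squared error using only the Python standard library. It is not CatBoost and not the authors' code: CatBoost adds refinements such as ordered boosting and symmetric trees, and all names and data below are made up for illustration.

```python
def fit_stump(rows, residuals):
    """Return the one-split tree (feature, threshold, left mean, right mean)
    that minimizes squared error on the residuals.
    Assumes at least one feature takes two distinct values."""
    best = None
    for j in range(len(rows[0])):
        for threshold in sorted({row[j] for row in rows}):
            left = [r for row, r in zip(rows, residuals) if row[j] <= threshold]
            right = [r for row, r in zip(rows, residuals) if row[j] > threshold]
            if not right:  # largest value sends every row left: not a split
                continue
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            error = (sum((r - left_mean) ** 2 for r in left)
                     + sum((r - right_mean) ** 2 for r in right))
            if best is None or error < best[0]:
                best = (error, j, threshold, left_mean, right_mean)
    return best[1:]

def predict_stump(stump, row):
    j, threshold, left_value, right_value = stump
    return left_value if row[j] <= threshold else right_value

def fit_boosted_stumps(rows, targets, n_rounds=50, learning_rate=0.1):
    """Start from the mean, then repeatedly fit a stump to the residuals."""
    base = sum(targets) / len(targets)
    preds = [base] * len(targets)
    stumps = []
    for _ in range(n_rounds):
        # For squared error, the residuals are the negative gradient.
        residuals = [t - p for t, p in zip(targets, preds)]
        stump = fit_stump(rows, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, row)
                 for p, row in zip(preds, rows)]
    return base, stumps, learning_rate

def predict(model, row):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(predict_stump(s, row) for s in stumps)

# Made-up data: two hypothetical features per compound and a Tc in kelvin.
rows = [[1.0, 0.3], [2.0, 0.9], [3.0, 0.1], [4.0, 0.7]]
tc = [5.0, 5.0, 20.0, 20.0]
model = fit_boosted_stumps(rows, tc)
print(predict(model, [1.0, 0.3]), predict(model, [4.0, 0.7]))
```

Each round corrects a fraction of the remaining error, which is why the predictions approach the targets gradually rather than in a single step.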

The researchers used these refined features to predict Tc values with high accuracy, placing particular emphasis on dataset quality and feature selection. The full pipeline, from dataset pre-processing through feature selection to ML modeling, contributes to the understanding and prediction of superconducting material properties.


Results and discussion

The Soraya package, the study's hybrid feature selection technique, selected 30 significant features from the 322 generated and highlighted the importance of thermal conductivity in determining superconducting Tc. The CatBoost algorithm was then used to rank these features, confirming the strong correlation (0.68) between thermal conductivity and Tc. On the refined DataG dataset of 13,022 superconducting materials, the CatBoost model predicted Tc values with an R2 of 0.952 and an RMSE of 6.45 K, surpassing results reported in the previous literature.
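
The correlation part of such a feature selection step can be illustrated with a short sketch. It ranks features by the absolute Pearson correlation with the target and drops any feature that is nearly collinear with one already kept. This is not the Soraya package: the SHAP and forward selection stages are omitted, and the function names, the `max_corr` threshold, and the data are assumptions for illustration.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def drop_redundant(features, target, max_corr=0.95):
    """Rank features by |correlation with target|, then skip any feature
    that is highly correlated with a feature already kept.

    `max_corr` is a hypothetical redundancy threshold.
    """
    ranked = sorted(features,
                    key=lambda name: abs(pearson(features[name], target)),
                    reverse=True)
    kept = []
    for name in ranked:
        if all(abs(pearson(features[name], features[k])) < max_corr for k in kept):
            kept.append(name)
    return kept

# Made-up columns, for illustration only.
target = [1.0, 2.0, 3.0, 4.0, 5.0]
features = {
    "f_a": [2.0, 4.0, 6.0, 8.0, 10.0],
    "f_a_copy": [2.1, 4.0, 6.1, 8.0, 10.1],  # nearly identical to f_a
    "f_b": [1.0, 3.0, 2.0, 5.0, 4.0],
}
kept = drop_redundant(features, target)
print(kept)
```

In a fuller pipeline, the surviving features would then be passed to a model-based ranking and a forward selection loop, which are the stages left out here.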

The methodology was also applied to other datasets (DataS, DataK, and DataH), where it improved the evaluation metrics. The model demonstrated its predictive power by estimating Tc values for new, previously unreported iron-based superconducting compounds. Its predictions for compounds absent from the original dataset agreed closely with experimental results, which validated its accuracy. The study thus both deepened the understanding of superconducting material characteristics and provided a robust ML model for Tc prediction.


Conclusion

In conclusion, the researchers used AI, specifically the CatBoost algorithm, to predict the Tc of superconducting materials. The DataG dataset of 13,022 compounds was built through extensive data pre-processing, and the newly designed Jabir package generated atomic features that outperformed those from existing methods.

The Soraya feature selection package further improved the prediction model by eliminating redundant features. Together, these steps improved the evaluation metrics across several datasets. The study's contributions, including the new web application for Tc prediction, demonstrate the value of combining AI with materials science.

Written by

Soham Nandi


Please use the following format to cite this article in your essay, paper or report:

Nandi, Soham. (2024, February 23). A CatBoost Approach for Superconducting Material Prediction with AI. AZoAi. Retrieved on April 16, 2024 from https://www.azoai.com/news/20240223/A-CatBoost-Approach-for-Superconducting-Material-Prediction-with-AI.aspx.

