Open Molecules 2025 Powers a New Era of AI-Driven Chemistry

By combining cutting-edge software, vast computational power, and over 100 million simulations, Open Molecules 2025 gives researchers unprecedented tools to accelerate breakthroughs in energy, medicine, and beyond.

Overview of OMol25, including chemical scope, sampling strategies used to construct structures, chemical phenomena we seek to capture, properties available for each datapoint, and envisioned application areas.

*Important notice: arXiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as definitive, used to guide development decisions, or treated as established information in the field of artificial intelligence research.

A collaborative effort between Meta, Lawrence Berkeley National Laboratory, and Los Alamos National Laboratory draws on Los Alamos' expertise in building tools for molecular screening. The release of Open Molecules 2025, an unprecedented dataset of molecular simulations, can accelerate opportunities for machine learning to transform research in fields such as biology, materials science, and energy technologies.

"A prohibitive part of molecular design has been the extreme computational cost needed to achieve quantum chemistry-level accuracy," said Michael G. Taylor, researcher at Los Alamos and project member. "In order to train machine learning models capable of quantum chemistry-level accuracy, we need vast amounts of diverse, valid training data. Open Molecules 2025 bridges this gap with a dataset of over 100 million density-functional theory calculations that we can use to train machine learning models accurate enough for all kinds of chemical challenges."

The dataset is key to unlocking the potential of machine learning for chemical applications, such as designing a new drug to combat disease or a battery cell to store energy. Because the dataset is built from density functional theory calculations, it enables a precise, atomic-level understanding of molecular behavior and interactions. Software designed by Taylor played a crucial role in allowing Open Molecules 2025 to achieve its goals.

Novel software helps build the dataset

To help run the calculations and build the dataset, the collaboration leveraged the capabilities of the Architector software, designed by Taylor. Architector is state-of-the-art software for predicting 3D structures of metal complexes. Metal complexes are chemical compounds in which a central metal atom is bound to an array of other molecules or atoms, and they represent important chemistry relevant to applications ranging from biology to materials science.

Architector, as employed by Taylor and collaborators in the Lab's Theoretical division, has mainly been applied to "F-block" elements: lanthanides like cerium and ytterbium, and actinides such as thorium and uranium. The F-block elements include many elements often referred to as rare earth elements, which are valuable for a wide range of industrial purposes, including high-tech applications in telecommunications, imaging, data storage, and more.

Metal complexes represent an important class of chemistry explored with the Open Molecules 2025 dataset. Other classes include biomolecules such as proteins and RNA, small molecules that might be the basis of drug discovery, and electrolytes, in which metals are surrounded by different solvents. Taylor estimates that the chemistry explored by Architector represents up to a third of the entire dataset.

An investment in foundational chemistry knowledge

Meta committed its vast computing power to running the density functional theory calculations. For the rare earth simulations alone, the Open Molecules 2025 project produced data on approximately 20,000 structures for each of the 17 rare earth elements. The next-largest dataset available in the literature has approximately 1,000 structures total per rare earth element.

The immense data generated can now be used to train other machine learning models at a fraction of the time and cost. The dataset could lead to pre-trained foundation models that can be fine-tuned with minimal added data in areas of interest. The entire Open Molecules 2025 effort, including the initial machine learning models trained on the data, will be open to the public, providing researchers with access to data and models relevant to their research.

"Chemical design often boils down to predicting the properties of new chemistries with minimal information and computational expense," said Taylor. "Having this dataset, with the ability to train machine learning models to do that predictive work, is potentially transformative for scientific discovery."

In addition to Meta and Lawrence Berkeley National Laboratory, the project's collaborators include representatives from Carnegie Mellon University, Genentech, the University of California, Berkeley, New York University, Princeton University, Stanford University, and the University of Cambridge.

Journal reference:
  • Preliminary scientific report. Levine, D. S., Shuaibi, M., Clark, E. W., Taylor, M. G., Hasyim, M. R., Michel, K., Batatia, I., Csányi, G., Dzamba, M., Eastman, P., Frey, N. C., Fu, X., Gharakhanyan, V., Krishnapriyan, A. S., Rackers, J. A., Raja, S., Rizvi, A., Rosen, A. S., Ulissi, Z., ... Wood, B. M. (2025). The Open Molecules 2025 (OMol25) Dataset, Evaluations, and Models. arXiv. https://arxiv.org/abs/2505.08762
