1
Morán-González L, Betten JE, Kneiding H, Balcells D. AABBA Graph Kernel: Atom-Atom, Bond-Bond, and Bond-Atom Autocorrelations for Machine Learning. J Chem Inf Model 2024; 64:8756-8769. [PMID: 39580812 PMCID: PMC11632777 DOI: 10.1021/acs.jcim.4c01583] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2024] [Revised: 11/03/2024] [Accepted: 11/15/2024] [Indexed: 11/26/2024]
Abstract
Graphs are one of the most natural and powerful representations available for molecules; natural because they have an intuitive correspondence to skeletal formulas, the language used by chemists worldwide, and powerful, because they are highly expressive both globally (molecular topology) and locally (atom and bond properties). Graph kernels are used to transform molecular graphs into fixed-length vectors, which, based on their capacity to measure similarity, can be used as fingerprints for machine learning (ML). To date, graph kernels have mostly focused on the atomic nodes of the graph. In this work, we developed a graph kernel based on atom-atom, bond-bond, and bond-atom (AABBA) autocorrelations. The resulting vector representations were tested on regression ML tasks on a data set of transition metal complexes; a benchmark motivated by the higher complexity of these compounds relative to organic molecules. In particular, we tested different flavors of the AABBA kernel in the prediction of the energy barriers and bond distances of the Vaska's complex data set (Friederich et al., Chem. Sci., 2020, 11, 4584). For a variety of ML models, including neural networks, gradient boosting machines, and Gaussian processes, we showed that AABBA outperforms the baseline including only atom-atom autocorrelations. Dimensionality reduction studies also showed that the bond-bond and bond-atom autocorrelations yield many of the most relevant features. We believe that the AABBA graph kernel can accelerate the exploration of large chemical spaces and inspire novel molecular representations in which both atomic and bond properties play an important role.
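The three autocorrelation families can be made concrete with a minimal Python sketch on a toy graph. The sketch makes several assumptions. It uses a Moreau-Broto-style product sum over ordered pairs, the form used in revised autocorrelations. Bond-bond distances are measured on the line graph, so bonds sharing an atom are one step apart. The bond-atom distance is taken as the smaller of the two end-atom distances. These distance conventions, the chosen properties (atomic number, bond order), and all function names are illustrative assumptions, and the AABBA paper's exact definitions and property sets may differ.

```python
from collections import deque
from itertools import combinations


def bfs_distances(adj, start):
    """Topological (step-count) distances from `start` to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def autocorrelation(props, adj, depth):
    """Sum of P_i * P_j over ordered node pairs (i, j) at distance d, for d = 0..depth."""
    ac = [0.0] * (depth + 1)
    for i in range(len(props)):
        for j, d in bfs_distances(adj, i).items():
            if d <= depth:
                ac[d] += props[i] * props[j]
    return ac


def bond_adjacency(bonds):
    """Line graph (assumption): two bonds are neighbours if they share an atom."""
    adj = {b: [] for b in range(len(bonds))}
    for a, b in combinations(range(len(bonds)), 2):
        if set(bonds[a]) & set(bonds[b]):
            adj[a].append(b)
            adj[b].append(a)
    return adj


def bond_atom(bond_props, atom_props, bonds, atom_adj, depth):
    """Bond-atom terms; assumed distance = nearer of the bond's two end atoms."""
    ac = [0.0] * (depth + 1)
    for b, (u, v) in enumerate(bonds):
        du, dv = bfs_distances(atom_adj, u), bfs_distances(atom_adj, v)
        for k in du:
            d = min(du[k], dv[k])
            if d <= depth:
                ac[d] += bond_props[b] * atom_props[k]
    return ac


# Toy C-C-O chain: atom property = atomic number, bond property = bond order.
atom_props = [6, 6, 8]
bonds = [(0, 1), (1, 2)]
bond_props = [1, 1]
atom_adj = {0: [1], 1: [0, 2], 2: [1]}

aa = autocorrelation(atom_props, atom_adj, depth=2)
bb = autocorrelation(bond_props, bond_adjacency(bonds), depth=2)
ba = bond_atom(bond_props, atom_props, bonds, atom_adj, depth=2)
print(aa + bb + ba)  # [136.0, 168.0, 96.0, 2.0, 2.0, 0.0, 26.0, 14.0, 0.0]
```

Concatenating the three depth-limited lists gives a vector whose length depends only on the maximum depth, not on molecule size. That fixed length is what allows such kernels to serve as ML fingerprints.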
Affiliation(s)
- Lucía Morán-González
- Hylleraas Centre for Quantum Molecular Sciences, Department of Chemistry, University of Oslo, P.O. Box 1033 0315 Oslo, Norway
- Centre for Materials Science and Nanotechnology, Department of Chemistry, University of Oslo, P.O. Box 1033 0315 Oslo, Norway
- Jørn Eirik Betten
- Simula Research Laboratory, Kristian Augusts Gate 23, 0164 Oslo, Norway
- Hannes Kneiding
- Hylleraas Centre for Quantum Molecular Sciences, Department of Chemistry, University of Oslo, P.O. Box 1033 0315 Oslo, Norway
- David Balcells
- Hylleraas Centre for Quantum Molecular Sciences, Department of Chemistry, University of Oslo, P.O. Box 1033 0315 Oslo, Norway
2
Vennelakanti V, Kilic IB, Terrones GG, Duan C, Kulik HJ. Machine Learning Prediction of the Experimental Transition Temperature of Fe(II) Spin-Crossover Complexes. J Phys Chem A 2024; 128:204-216. [PMID: 38148525 DOI: 10.1021/acs.jpca.3c07104] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/28/2023]
Abstract
Spin-crossover (SCO) complexes are materials that exhibit changes in the spin state in response to external stimuli, with potential applications in molecular electronics. It is challenging to know a priori how to design ligands to achieve the delicate balance of entropic and enthalpic contributions needed to tailor a transition temperature close to room temperature. We leverage the SCO complexes from the previously curated SCO-95 data set [Vennelakanti et al. J. Chem. Phys. 159, 024120 (2023)] to train three machine learning (ML) models for transition temperature (T1/2) prediction using graph-based revised autocorrelations as features. We perform feature selection using random forest-ranked recursive feature addition (RF-RFA) to identify the features essential to model transferability. Of the ML models considered, the full feature set RF and recursive feature addition RF models perform best, achieving moderate correlation to experimental T1/2 values. We then compare ML T1/2 predictions to those from three previously identified best-performing density functional approximations (DFAs) which accurately predict SCO behavior across SCO-95, finding that the ML models predict T1/2 more accurately than the best-performing DFAs. In addition, we study ML model predictions for a set of 18 SCO complexes for which only estimated T1/2 values are available. Upon excluding outliers from this set, the RF-RFA RF model shows a strong correlation to estimated T1/2 values with a Pearson's r of 0.82. In contrast, DFA-predicted T1/2 values have large errors and show no correlation to estimated T1/2 values over the same set of complexes. Overall, our study demonstrates slightly superior performance of ML models in comparison with some of the best-performing DFAs, and we expect ML models to improve further as larger data sets of SCO complexes are curated and become available for model training.
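The RF-RFA selection described above can be sketched as a greedy forward pass over a pre-ranked feature list. This minimal sketch assumes that the random-forest importance ranking has already been computed, which is not shown. It also assumes that a feature is kept only if it raises a validation score. `toy_score` and the feature names are hypothetical stand-ins, not RAC features, and the paper's actual acceptance criterion may differ.

```python
def recursive_feature_addition(ranked_features, score, min_gain=0.0):
    """Greedy forward pass over a pre-ranked feature list.

    ranked_features: names ordered from most to least important
        (e.g. by random-forest importance; the ranking step is not shown).
    score: callable mapping a list of features to a validation score
        (higher is better), e.g. cross-validated R^2 of a retrained model.
    """
    selected = []
    best = score(selected)
    for feature in ranked_features:
        trial = selected + [feature]
        trial_score = score(trial)
        if trial_score > best + min_gain:  # keep only features that help
            selected, best = trial, trial_score
    return selected, best


# Hypothetical toy score: two informative features, a small cost per feature.
useful = {"f_metal_chi": 0.5, "f_lig_Z_d2": 0.3}


def toy_score(features):
    return sum(useful.get(f, 0.0) for f in features) - 0.05 * len(features)


chosen, best = recursive_feature_addition(
    ["f_metal_chi", "f_noise_1", "f_lig_Z_d2", "f_noise_2"], toy_score)
print(chosen, round(best, 2))  # ['f_metal_chi', 'f_lig_Z_d2'] 0.7
```

Because uninformative features only add cost in this toy score, they are rejected even though they appear between informative ones in the ranking.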
Affiliation(s)
- Vyshnavi Vennelakanti
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Irem B Kilic
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Gianmarco G Terrones
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Chenru Duan
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Heather J Kulik
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
3
Kevlishvili I, Duan C, Kulik HJ. Classification of Hemilabile Ligands Using Machine Learning. J Phys Chem Lett 2023:11100-11109. [PMID: 38051982 DOI: 10.1021/acs.jpclett.3c02828] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/07/2023]
Abstract
Hemilabile ligands have the capacity to partially disengage from a metal center, providing a strategy to balance stability and reactivity in catalysis, but they are not straightforward to identify. We identify ligands in the Cambridge Structural Database that have been crystallized with distinct denticities and are thus identifiable as hemilabile ligands. We implement a semi-supervised learning approach using a label-spreading algorithm to augment a small negative set that is supported by heuristic rules of ligand and metal co-occurrence. We show that a heuristic based on coordinating atom identity alone is not sufficient to identify whether a ligand is hemilabile, and our trained machine-learning classification models are instead needed to predict whether a bi-, tri-, or tetradentate ligand is hemilabile with high accuracy and precision. Feature importance analysis of our models shows that the second, third, and fourth coordination spheres all play important roles in ligand hemilability.
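Label spreading can be sketched in pure Python following the iterative scheme of Zhou et al., `F <- alpha*S*F + (1 - alpha)*Y` with `S = D^-1/2 W D^-1/2`. The RBF affinity, the values of `gamma` and `alpha`, and the one-dimensional toy descriptors are assumptions for illustration. They are not the paper's featurization or settings. scikit-learn's `sklearn.semi_supervised.LabelSpreading` provides a production implementation of the same idea.

```python
import math


def label_spreading(X, y, gamma=1.0, alpha=0.2, n_iter=100):
    """Zhou et al.-style label spreading; y uses -1 for unlabeled points."""
    n = len(X)
    classes = sorted({label for label in y if label != -1})
    # RBF affinity matrix with a zero diagonal.
    W = [[0.0 if i == j else
          math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(X[i], X[j])))
          for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in W]
    # Symmetric normalization S = D^-1/2 W D^-1/2.
    S = [[W[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
         for i in range(n)]
    # One-hot initial labels; unlabeled rows are all zeros.
    Y = [[1.0 if y[i] == c else 0.0 for c in classes] for i in range(n)]
    F = [row[:] for row in Y]
    for _ in range(n_iter):
        # F <- alpha * S F + (1 - alpha) * Y
        F = [[alpha * sum(S[i][k] * F[k][c] for k in range(n))
              + (1 - alpha) * Y[i][c]
              for c in range(len(classes))] for i in range(n)]
    # Each point takes the class with the largest spread score.
    return [classes[max(range(len(classes)), key=lambda c: F[i][c])]
            for i in range(n)]


# Toy 1D descriptors: one labeled hemilabile (1) and one labeled
# non-hemilabile (0) ligand, each with an unlabeled near neighbour.
X = [(0.0,), (0.1,), (5.0,), (5.1,)]
y = [1, -1, 0, -1]
print(label_spreading(X, y))  # [1, 1, 0, 0]
```

The `(1 - alpha) * Y` term keeps pulling labeled points back toward their original labels. Meanwhile, unlabeled points inherit the label of the graph neighbours they are most strongly connected to.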
Affiliation(s)
- Ilia Kevlishvili
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Chenru Duan
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Heather J Kulik
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
4
Lu H, Kang X, Yu H, Zhang W, Luo Y. Using a single complex to predict the reaction energy profile: a case study of Pd/Ni-catalyzed ethylene polymerization. Dalton Trans 2023; 52:14790-14796. [PMID: 37807861 DOI: 10.1039/d3dt02745g] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/10/2023]
Abstract
Mechanism-driven catalyst screening could be greatly accelerated by quantitative prediction models of the reaction energy profile. Here, we propose a novel method for molecular representation, taking palladium- and nickel-catalyzed ethylene polymerization as model reactions. The geometric parameters (GPfra) and electron occupancies (EOfra) from the non-ligand fragment of the η3-complex were extracted as the molecular descriptors, followed by constructing the reaction energy profile prediction models on the basis of various regression algorithms. The models showed great accuracy with respect to both theoretical and experimental data. More importantly, the models are convenient for training and utilization. On one hand, all the features were easily captured from the single η3-complex. On the other hand, further investigation also demonstrated that the models could be constructed with a small training sample size. We believe that our featurization method could be generalized to other organometallic reactions and pave the way for efficient catalyst design.
Affiliation(s)
- Han Lu
- State Key Laboratory of Fine Chemicals, School of Chemical Engineering, Dalian University of Technology, Dalian 116024, China
- Xiaohui Kang
- College of Pharmacy, Dalian Medical University, Dalian 116044, China
- Hang Yu
- Liaoning Key Laboratory of Clean Energy, Shenyang Aerospace University, Shenyang 110136, China
- Wenzhen Zhang
- State Key Laboratory of Fine Chemicals, School of Chemical Engineering, Dalian University of Technology, Dalian 116024, China
- Yi Luo
- State Key Laboratory of Fine Chemicals, School of Chemical Engineering, Dalian University of Technology, Dalian 116024, China
- PetroChina Petrochemical Research Institute, Beijing 102206, China
5
Terrones GG, Duan C, Nandy A, Kulik HJ. Low-cost machine learning prediction of excited state properties of iridium-centered phosphors. Chem Sci 2023; 14:1419-1433. [PMID: 36794185 PMCID: PMC9906783 DOI: 10.1039/d2sc06150c] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 11/07/2022] [Accepted: 01/05/2023] [Indexed: 01/07/2023]
Abstract
Prediction of the excited state properties of photoactive iridium complexes challenges ab initio methods such as time-dependent density functional theory (TDDFT) both from the perspective of accuracy and of computational cost, complicating high-throughput virtual screening (HTVS). We instead leverage low-cost machine learning (ML) models and experimental data for 1380 iridium complexes to perform these prediction tasks. We find the best-performing and most transferable models to be those trained on electronic structure features from low-cost density functional tight binding calculations. Using artificial neural network (ANN) models, we predict the mean emission energy of phosphorescence, the excited state lifetime, and the emission spectral integral for iridium complexes with accuracy competitive with or superseding that of TDDFT. We conduct feature importance analysis to determine that high cyclometalating ligand ionization potential correlates to high mean emission energy, while high ancillary ligand ionization potential correlates to low lifetime and low spectral integral. As a demonstration of how our ML models can be used for HTVS and the acceleration of chemical discovery, we curate a set of novel hypothetical iridium complexes and use uncertainty-controlled predictions to identify promising ligands for the design of new phosphors while retaining confidence in the quality of the ANN predictions.
Affiliation(s)
- Gianmarco G Terrones
- Department of Chemical Engineering, Massachusetts Institute of Technology Cambridge MA 02139 USA
- Chenru Duan
- Department of Chemical Engineering, Massachusetts Institute of Technology Cambridge MA 02139 USA
- Department of Chemistry, Massachusetts Institute of Technology Cambridge MA 02139 USA
- Aditya Nandy
- Department of Chemical Engineering, Massachusetts Institute of Technology Cambridge MA 02139 USA
- Department of Chemistry, Massachusetts Institute of Technology Cambridge MA 02139 USA
- Heather J Kulik
- Department of Chemical Engineering, Massachusetts Institute of Technology Cambridge MA 02139 USA
- Department of Chemistry, Massachusetts Institute of Technology Cambridge MA 02139 USA
6
Nandy A, Duan C, Goffinet C, Kulik HJ. New Strategies for Direct Methane-to-Methanol Conversion from Active Learning Exploration of 16 Million Catalysts. JACS Au 2022; 2:1200-1213. [PMID: 35647589 PMCID: PMC9135396 DOI: 10.1021/jacsau.2c00176] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 03/16/2022] [Revised: 04/12/2022] [Accepted: 04/15/2022] [Indexed: 05/03/2023]
Abstract
Despite decades of effort, no earth-abundant homogeneous catalysts have been discovered that can selectively oxidize methane to methanol. We exploit active learning to simultaneously optimize methane activation and methanol release calculated with machine learning-accelerated density functional theory in a space of 16 M candidate catalysts including novel macrocycles. By constructing macrocycles from fragments inspired by synthesized compounds, we ensure synthetic realism in our computational search. Our large-scale search reveals that low-spin Fe(II) compounds paired with strong-field (e.g., P or S-coordinating) ligands have among the best energetic tradeoffs between hydrogen atom transfer (HAT) and methanol release. This observation contrasts with prior efforts that have focused on high-spin Fe(II) with weak-field ligands. By decoupling equatorial and axial ligand effects, we determine that negatively charged axial ligands are critical for more rapid release of methanol and that higher-valency metals [i.e., M(III) vs M(II)] are likely to be rate-limited by slow methanol release. With full characterization of barrier heights, we confirm that optimizing for HAT does not lead to large oxo formation barriers. Energetic span analysis reveals designs for an intermediate-spin Mn(II) catalyst and a low-spin Fe(II) catalyst that are predicted to have good turnover frequencies. Our active learning approach to optimize two distinct reaction energies with efficient global optimization is expected to be beneficial for the search of large catalyst spaces where no prior designs have been identified and where linear scaling relationships between reaction energies or barriers may be limited or unknown.
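The energetic span analysis mentioned above can be sketched with the energetic span approximation of Kozuch and Shaik. When the TOF-determining transition state (TDTS) follows the TOF-determining intermediate (TDI) in the cycle, δE = T_TDTS - I_TDI. When the TDTS precedes the TDI, δE = T_TDTS - I_TDI + ΔG_r. In both cases TOF ≈ (k_B T/h) exp(-δE/RT). This minimal sketch picks the transition-state/intermediate pair that maximizes δE. That is the dominant-term approximation, not the full TOF expression. The free energies used below are hypothetical, in kcal/mol, and are not taken from the paper.

```python
import math

K_B = 1.380649e-23     # Boltzmann constant, J/K
H = 6.62607015e-34     # Planck constant, J s
R_KCAL = 1.987204e-3   # gas constant, kcal/(mol K)


def energetic_span(intermediates, transition_states, delta_g_r):
    """Return (delta_E, TDTS index, TDI index).

    Convention (assumption): transition_states[k] directly follows
    intermediates[k] in the cycle; delta_g_r < 0 for an exergonic reaction.
    """
    best = None
    for i, t in enumerate(transition_states):
        for j, inter in enumerate(intermediates):
            # A TS that comes before the intermediate is reached in the next
            # turnover, whose energies are shifted by delta_g_r.
            span = t - inter if i >= j else t - inter + delta_g_r
            if best is None or span > best[0]:
                best = (span, i, j)
    return best


def tof_estimate(delta_e, temperature=298.15):
    """Approximate TOF (s^-1) from the energetic span in kcal/mol."""
    return K_B * temperature / H * math.exp(-delta_e / (R_KCAL * temperature))


# Hypothetical three-step cycle (kcal/mol).
span, tdts, tdi = energetic_span([0.0, -10.0, -5.0], [15.0, 12.0, 20.0], -30.0)
print(span, tdts, tdi)            # 30.0 2 1
print(f"{tof_estimate(span):.1e}")  # about 6e-10 s^-1
```

In this example the highest transition state and the lowest intermediate do not by themselves set δE. What matters is their order in the cycle, which is why the analysis needs full characterization of the barrier heights.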
Affiliation(s)
- Aditya Nandy
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Chenru Duan
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Conrad Goffinet
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Heather J. Kulik
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
7
Duan C, Nandy A, Kulik HJ. Machine Learning for the Discovery, Design, and Engineering of Materials. Annu Rev Chem Biomol Eng 2022; 13:405-429. [PMID: 35320698 DOI: 10.1146/annurev-chembioeng-092320-120230] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Abstract
Machine learning (ML) has become a part of the fabric of high-throughput screening and computational discovery of materials. Despite its increasingly central role, challenges remain in fully realizing the promise of ML. This is especially true for the practical acceleration of the engineering of robust materials and the development of design strategies that surpass trial and error or high-throughput screening alone. Depending on the quantity being predicted and the experimental data available, ML can either outperform physics-based models, be used to accelerate such models, or be integrated with them to improve their performance. We cover recent advances in algorithms and in their application that are starting to make inroads toward (a) the discovery of new materials through large-scale enumerative screening, (b) the design of materials through identification of rules and principles that govern materials properties, and (c) the engineering of practical materials by satisfying multiple objectives. We conclude with opportunities for further advancement to realize ML as a widespread tool for practical computational materials design.
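Satisfying multiple objectives usually comes down to finding the Pareto front. The front is the set of candidates that no other candidate beats or ties on every objective while beating it on at least one. This minimal sketch assumes every objective is to be minimized. The candidate tuples are hypothetical.

```python
def pareto_front(points):
    """Return the non-dominated points when every objective is minimized."""
    def dominates(a, b):
        # a dominates b: no worse on every objective, strictly better on one.
        return (all(x <= y for x, y in zip(a, b))
                and any(x < y for x, y in zip(a, b)))
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical candidates scored on two objectives to minimize.
candidates = [(1, 5), (2, 3), (3, 4), (4, 1), (5, 5)]
print(pareto_front(candidates))  # [(1, 5), (2, 3), (4, 1)]
```

This brute-force check is quadratic in the number of candidates. That cost is fine for illustration, but large screening campaigns typically rely on sorting-based nondominated algorithms or on acquisition functions that target the front directly.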
Affiliation(s)
- Chenru Duan
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Aditya Nandy
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Heather J Kulik
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
8
Kalikadien AV, Pidko EA, Sinha V. ChemSpaX: exploration of chemical space by automated functionalization of molecular scaffold. Digital Discovery 2022; 1:8-25. [PMID: 35340336 PMCID: PMC8887922 DOI: 10.1039/d1dd00017a] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 10/02/2021] [Accepted: 12/23/2021] [Indexed: 12/19/2022]
Abstract
Exploration of the local chemical space of molecular scaffolds by post-functionalization (PF) is a promising route to discover novel molecules with desired structure and function. PF with rationally chosen substituents based on known electronic and steric properties is a commonly used experimental and computational strategy in screening, design and optimization of catalytic scaffolds. Automated generation of reasonably accurate geometric representations of post-functionalized molecular scaffolds is highly desirable for data-driven applications. However, automated PF of transition metal (TM) complexes remains challenging. In this work, we introduce ChemSpaX, a Python-based workflow aimed at automating the PF of a given molecular scaffold, with special emphasis on TM complexes. In three representative applications of ChemSpaX, comparison with DFT and DFTB calculations shows that the generated structures are of reasonable quality for use in computational screening applications. Furthermore, we show that ChemSpaX-generated geometries can be used in machine learning applications to accurately predict DFT-computed HOMO-LUMO gaps for transition metal complexes. ChemSpaX is open-source and aims to bolster and democratize the efforts of the scientific community towards data-driven chemical discovery.
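The core geometric step of post-functionalization can be sketched with Rodrigues' rotation formula. In that step, a hydrogen is replaced by a substituent placed along the former bond to the scaffold atom. This is a minimal illustration under stated assumptions, not ChemSpaX's actual algorithm or API. The fragment input convention (connecting atom at the origin, group extending along +z), the function names, and the fixed `bond_length` are hypothetical. A real workflow would also need to handle torsions and steric clashes.

```python
import math


def _add(a, b):
    return tuple(x + y for x, y in zip(a, b))


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _scale(a, s):
    return tuple(x * s for x in a)


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def rotate_z_onto(u, v):
    """Apply to v the rotation that takes +z onto unit vector u (Rodrigues)."""
    z = (0.0, 0.0, 1.0)
    cos_t = _dot(z, u)
    axis = _cross(z, u)
    sin_t = math.sqrt(_dot(axis, axis))
    if sin_t < 1e-12:
        # u parallel to +z: identity; antiparallel: 180 degrees about x.
        return v if cos_t > 0 else (v[0], -v[1], -v[2])
    k = _scale(axis, 1.0 / sin_t)
    return _add(_add(_scale(v, cos_t), _scale(_cross(k, v), sin_t)),
                _scale(k, _dot(k, v) * (1.0 - cos_t)))


def attach_substituent(anchor, hydrogen, fragment, bond_length):
    """Hypothetical helper: replace the H on `anchor` with `fragment`.

    fragment: coordinates with its connecting atom at the origin and the rest
    of the group extending along +z (an assumed input convention).
    """
    d = _sub(hydrogen, anchor)
    u = _scale(d, 1.0 / math.sqrt(_dot(d, d)))    # unit anchor->H vector
    origin = _add(anchor, _scale(u, bond_length))  # new connecting-atom site
    return [_add(origin, rotate_z_onto(u, atom)) for atom in fragment]


# Scaffold carbon at the origin with an H along +x; two-atom toy fragment.
placed = attach_substituent((0.0, 0.0, 0.0), (1.09, 0.0, 0.0),
                            [(0.0, 0.0, 0.0), (0.0, 0.0, 1.0)],
                            bond_length=1.54)
print(placed)  # approximately [(1.54, 0, 0), (2.54, 0, 0)]
```

Rotating the whole fragment rigidly preserves its internal geometry. Only the new scaffold-substituent bond length has to be chosen, for example from covalent radii.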
Affiliation(s)
- Adarsh V Kalikadien
- Inorganic Systems Engineering, Department of Chemical Engineering, Faculty of Applied Sciences, Delft University of Technology Van der Maasweg 9 2629 HZ Delft The Netherlands
- Evgeny A Pidko
- Inorganic Systems Engineering, Department of Chemical Engineering, Faculty of Applied Sciences, Delft University of Technology Van der Maasweg 9 2629 HZ Delft The Netherlands
- Vivek Sinha
- Inorganic Systems Engineering, Department of Chemical Engineering, Faculty of Applied Sciences, Delft University of Technology Van der Maasweg 9 2629 HZ Delft The Netherlands
9
Harper DR, Nandy A, Arunachalam N, Duan C, Janet JP, Kulik HJ. Representations and strategies for transferable machine learning improve model performance in chemical discovery. J Chem Phys 2022; 156:074101. [DOI: 10.1063/5.0082964] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/14/2022]
Affiliation(s)
- Daniel R Harper
- Massachusetts Institute of Technology, United States of America
- Aditya Nandy
- Massachusetts Institute of Technology, United States of America
- Chenru Duan
- Massachusetts Institute of Technology, United States of America
- Heather J. Kulik
- Dept of Chemical Engineering, Massachusetts Institute of Technology, United States of America
10
Taylor MG, Nandy A, Lu CC, Kulik HJ. Deciphering Cryptic Behavior in Bimetallic Transition-Metal Complexes with Machine Learning. J Phys Chem Lett 2021; 12:9812-9820. [PMID: 34597514 DOI: 10.1021/acs.jpclett.1c02852] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/13/2023]
Abstract
We demonstrate an alternative, data-driven approach to uncovering structure-property relationships for the rational design of heterobimetallic transition-metal complexes that exhibit metal-metal bonding. We tailor graph-based representations of the metal-local environment for these complexes for use in multiple linear regression and kernel ridge regression (KRR) models. We curate a set of 28 experimentally characterized complexes to develop a multiple linear regression model for oxidation potentials. We achieve good accuracy (mean absolute error of 0.25 V) and preserve transferability to unseen experimental data with a new ligand structure. We also train a KRR model on a subset of 330 structurally characterized heterobimetallics to predict the degree of metal-metal bonding. This KRR model predicts relative metal-metal bond lengths in the test set to within 5%, and analysis of key features reveals the fundamental atomic contributions (e.g., the valence electron configuration) that most strongly influence the behavior of these complexes. Our work provides guidance for rational bimetallic design, suggesting that properties, including the formal shortness ratio, should be transferable from one period to another.
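Kernel ridge regression, one of the two model types named above, fits dual coefficients α = (K + λI)^-1 y. It then predicts f(x) = Σ_i α_i k(x, x_i). The pure-Python sketch below assumes a Gaussian (RBF) kernel, hypothetical values of `gamma` and `lam`, and toy one-dimensional data. It does not reproduce the paper's graph-based features or hyperparameters.

```python
import math


def rbf(a, b, gamma):
    """Gaussian kernel k(a, b) = exp(-gamma * |a - b|^2)."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_krr(X, y, gamma=1.0, lam=1e-3):
    """Dual coefficients alpha = (K + lam*I)^-1 y."""
    n = len(X)
    K = [[rbf(X[i], X[j], gamma) + (lam if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    return solve(K, y)


def predict_krr(X_train, alpha, x, gamma=1.0):
    """f(x) = sum_i alpha_i * k(x, x_i)."""
    return sum(a * rbf(x, xi, gamma) for a, xi in zip(alpha, X_train))


X_train = [(0.0,), (1.0,), (2.0,)]
y_train = [0.0, 1.0, 4.0]
alpha = fit_krr(X_train, y_train, gamma=1.0, lam=1e-8)
print([round(predict_krr(X_train, alpha, x), 3) for x in X_train])  # close to [0.0, 1.0, 4.0]
```

Larger `lam` trades training-set fit for smoothness. With an RBF kernel, predictions decay toward zero far from the training data, which is one reason transferability to new ligand structures has to be checked explicitly.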
Affiliation(s)
- Michael G Taylor
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Aditya Nandy
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Connie C Lu
- Department of Chemistry, University of Minnesota, Minneapolis, Minnesota 55455, United States
- Heather J Kulik
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
11
Westermayr J, Marquetand P. Machine Learning for Electronically Excited States of Molecules. Chem Rev 2021; 121:9873-9926. [PMID: 33211478 PMCID: PMC8391943 DOI: 10.1021/acs.chemrev.0c00749] [Citation(s) in RCA: 197] [Impact Index Per Article: 49.3] [Received: 07/17/2020] [Indexed: 12/11/2022]
Abstract
Electronically excited states of molecules are at the heart of photochemistry, photophysics, as well as photobiology and also play a role in material science. Their theoretical description requires highly accurate quantum chemical calculations, which are computationally expensive. In this review, we focus on not only how machine learning is employed to speed up such excited-state simulations but also how this branch of artificial intelligence can be used to advance this exciting research field in all its aspects. Discussed applications of machine learning for excited states include excited-state dynamics simulations, static calculations of absorption spectra, as well as many others. In order to put these studies into context, we discuss the promises and pitfalls of the involved machine learning techniques. Since the latter are mostly based on quantum chemistry calculations, we also provide a short introduction into excited-state electronic structure methods and approaches for nonadiabatic dynamics simulations and describe tricks and problems when using them in machine learning for excited states of molecules.
Affiliation(s)
- Julia Westermayr
- Institute of Theoretical Chemistry, Faculty of Chemistry, University of Vienna, Währinger Strasse 17, 1090 Vienna, Austria
- Philipp Marquetand
- Institute of Theoretical Chemistry, Faculty of Chemistry, University of Vienna, Währinger Strasse 17, 1090 Vienna, Austria
- Vienna Research Platform on Accelerating Photoreaction Discovery, University of Vienna, Währinger Strasse 17, 1090 Vienna, Austria
- Data Science @ Uni Vienna, University of Vienna, Währinger Strasse 29, 1090 Vienna, Austria
12
Nandy A, Duan C, Taylor MG, Liu F, Steeves AH, Kulik HJ. Computational Discovery of Transition-metal Complexes: From High-throughput Screening to Machine Learning. Chem Rev 2021; 121:9927-10000. [PMID: 34260198 DOI: 10.1021/acs.chemrev.1c00347] [Citation(s) in RCA: 104] [Impact Index Per Article: 26.0] [Indexed: 12/13/2022]
Abstract
Transition-metal complexes are attractive targets for the design of catalysts and functional materials. The behavior of the metal-organic bond, while very tunable for achieving target properties, is challenging to predict and necessitates searching a wide and complex space to identify needles in haystacks for target applications. This review will focus on the techniques that make high-throughput search of transition-metal chemical space feasible for the discovery of complexes with desirable properties. The review will cover the development, promise, and limitations of "traditional" computational chemistry (i.e., force field, semiempirical, and density functional theory methods) as it pertains to data generation for inorganic molecular discovery. The review will also discuss the opportunities and limitations in leveraging experimental data sources. We will focus on how advances in statistical modeling, artificial intelligence, multiobjective optimization, and automation accelerate discovery of lead compounds and design rules. The overall objective of this review is to showcase how bringing together advances from diverse areas of computational chemistry and computer science has enabled the rapid uncovering of structure-property relationships in transition-metal chemistry. We aim to highlight how unique considerations in motifs of metal-organic bonding (e.g., variable spin and oxidation state, and bonding strength/nature) set them and their discovery apart from more commonly considered organic molecules. We will also highlight how uncertainty and relative data scarcity in transition-metal chemistry motivate specific developments in machine learning representations, model training, and in computational chemistry. Finally, we will conclude with an outlook of areas of opportunity for the accelerated discovery of transition-metal complexes.
Affiliation(s)
- Aditya Nandy
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Chenru Duan
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Michael G Taylor
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Fang Liu
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Adam H Steeves
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Heather J Kulik
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
14
Affiliation(s)
- Heather J. Kulik
- Department of Chemical Engineering Massachusetts Institute of Technology 77 Massachusetts Ave Rm 66–464 Cambridge MA 02139 USA
15
Janet JP, Duan C, Nandy A, Liu F, Kulik HJ. Navigating Transition-Metal Chemical Space: Artificial Intelligence for First-Principles Design. Acc Chem Res 2021; 54:532-545. [PMID: 33480674 DOI: 10.1021/acs.accounts.0c00686] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Indexed: 12/22/2022]
Abstract
The variability of chemical bonding in open-shell transition-metal complexes not only motivates their study as functional materials and catalysts but also challenges conventional computational modeling tools. Here, tailoring ligand chemistry can alter preferred spin or oxidation states as well as electronic structure properties and reactivity, creating vast regions of chemical space to explore when designing new materials atom by atom. Although first-principles density functional theory (DFT) remains the workhorse of computational chemistry in mechanism deduction and property prediction, it is of limited use here. DFT is both far too computationally costly for widespread exploration of transition-metal chemical space and also prone to inaccuracies that limit its predictive performance for localized d electrons in transition-metal complexes. These challenges starkly contrast with the well-trodden regions of small-organic-molecule chemical space, where the analytical forms of molecular mechanics force fields and semiempirical theories have for decades accelerated the discovery of new molecules, accurate DFT functional performance has been demonstrated, and gold-standard methods from correlated wavefunction theory can predict experimental results to chemical accuracy. The combined promise of transition-metal chemical space exploration and lack of established tools has mandated a distinct approach. In this Account, we outline the path we charted in exploration of transition-metal chemical space starting from the first machine learning (ML) models (i.e., artificial neural network and kernel ridge regression) and representations for the prediction of open-shell transition-metal complex properties.
The distinct importance of the immediate coordination environment of the metal center as well as the lack of low-level methods to accurately predict structural properties in this coordination environment first motivated and then benefited from these ML models and representations. Once developed, the recipe for prediction of geometric, spin state, and redox potential properties was straightforwardly extended to a diverse range of other properties, including in catalysis, computational "feasibility", and the gas separation properties of periodic metal-organic frameworks. Interpretation of selected features most important for model prediction revealed new ways to encapsulate design rules and confirmed that models were robustly mapping essential structure-property relationships. Encountering the special challenge of ensuring that good model performance could generalize to new discovery targets motivated investigation of how to best carry out model uncertainty quantification. Distance-based approaches, whether in model latent space or in carefully engineered feature space, provided intuitive measures of the domain of applicability. With all of these pieces together, ML can be harnessed as an engine to tackle the large-scale exploration of transition-metal chemical space needed to satisfy multiple objectives using efficient global optimization methods. In practical terms, bringing these artificial intelligence tools to bear on the problems of transition-metal chemical space exploration has resulted in ML-model assessments of large, multimillion compound spaces in minutes and validated new design leads in weeks instead of decades.
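The Account points to distance-based uncertainty quantification in feature space as an intuitive measure of the domain of applicability. One common form of this idea is sketched below: the mean Euclidean distance from a query to its k nearest training points. This is not the authors' implementation. The Euclidean metric, the value of k, the hypothetical `cutoff` threshold, and the toy feature vectors are all assumptions. The same function would apply unchanged to latent-space vectors taken from a trained network.

```python
import math
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_distance(query: Sequence[float], train: List[Sequence[float]], k: int = 3) -> float:
    """Mean distance from `query` to its k nearest training points."""
    if not 1 <= k <= len(train):
        raise ValueError("k must be between 1 and len(train)")
    dists = sorted(euclidean(query, x) for x in train)
    return sum(dists[:k]) / k


def in_domain(query: Sequence[float], train: List[Sequence[float]],
              cutoff: float, k: int = 3) -> bool:
    """Treat a prediction as in-domain when its k-NN distance is at most `cutoff` (assumed threshold)."""
    return knn_distance(query, train, k) <= cutoff


# Toy 2D "feature space" with four training points (illustrative values only).
train = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(in_domain([0.5, 0.5], train, cutoff=1.0))  # True: close to the training data
print(in_domain([5.0, 5.0], train, cutoff=1.0))  # False: far outside it
```

In practice the cutoff would be calibrated against observed model errors rather than fixed by hand.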
Affiliation(s)
- Jon Paul Janet
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Chenru Duan
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Aditya Nandy
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Fang Liu
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Heather J. Kulik
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
16
Townsend J, Vogiatzis KD. Transferable MP2-Based Machine Learning for Accurate Coupled-Cluster Energies. J Chem Theory Comput 2020; 16:7453-7461. [DOI: 10.1021/acs.jctc.0c00927] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 12/31/2022]
Affiliation(s)
- Jacob Townsend
- Department of Chemistry, University of Tennessee, Knoxville, Tennessee 37996, United States
17
Liu F, Duan C, Kulik HJ. Rapid Detection of Strong Correlation with Machine Learning for Transition-Metal Complex High-Throughput Screening. J Phys Chem Lett 2020; 11:8067-8076. [PMID: 32864977 DOI: 10.1021/acs.jpclett.0c02288] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Indexed: 06/11/2023]
Abstract
Despite its widespread use in chemical discovery, approximate density functional theory (DFT) is poorly suited to many targets, such as those containing open-shell, 3d transition metals that can be expected to have strong multireference (MR) character. For discovery workflows to be predictive, we need automated, low-cost methods that can distinguish the regions of chemical space where DFT should be applied from those where it should not. We curate more than 4800 open-shell transition-metal complexes up to hundreds of atoms in size from prior high-throughput DFT studies and evaluate affordable, finite-temperature DFT fractional occupation number (FON)-based MR diagnostics. We show that intuitive measures of strong correlation (i.e., the HOMO-LUMO gap) are not predictive of MR character as judged by FON-based diagnostics. Analysis of independently trained machine learning (ML) models to predict HOMO-LUMO gaps and FON-based diagnostics reveals differences in the metal and ligand sensitivity of the two quantities. We use our trained ML models to rapidly evaluate MR character over a space of ∼187000 theoretical complexes, identifying large-scale trends in spin-state-dependent MR character and finding small HOMO-LUMO gap complexes while ensuring low MR character.
Affiliation(s)
- Fang Liu
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Chenru Duan
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Heather J Kulik
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
18
Bahlke MP, Mogos N, Proppe J, Herrmann C. Exchange Spin Coupling from Gaussian Process Regression. J Phys Chem A 2020; 124:8708-8723. [DOI: 10.1021/acs.jpca.0c05983] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 12/13/2022]
Affiliation(s)
- Marc Philipp Bahlke
- Department of Chemistry, University of Hamburg, Martin-Luther-King-Platz 6, 20146 Hamburg, Germany
- Natnael Mogos
- Department of Chemistry, University of Hamburg, Martin-Luther-King-Platz 6, 20146 Hamburg, Germany
- Jonny Proppe
- Institute of Physical Chemistry, Georg-August University, Tammannstr. 6, 37077 Göttingen, Germany
- Carmen Herrmann
- Department of Chemistry, University of Hamburg, Martin-Luther-King-Platz 6, 20146 Hamburg, Germany
19
Duan C, Liu F, Nandy A, Kulik HJ. Semi-supervised Machine Learning Enables the Robust Detection of Multireference Character at Low Cost. J Phys Chem Lett 2020; 11:6640-6648. [PMID: 32692570 DOI: 10.1021/acs.jpclett.0c02018] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Indexed: 05/26/2023]
Abstract
Multireference (MR) diagnostics are common tools for identifying strongly correlated electronic structure that makes single-reference (SR) methods (e.g., density functional theory or DFT) insufficient for accurate property prediction. However, MR diagnostics typically require computationally demanding correlated wave function theory (WFT) calculations, and diagnostics often disagree or fail to predict MR effects on properties. To overcome these challenges, we introduce a semi-supervised machine learning (ML) approach with virtual adversarial training (VAT) of an MR classifier using 15 WFT and DFT MR diagnostics as inputs. In semi-supervised learning, only the most extreme SR or MR points are labeled, and the remaining point labels are learned. The resulting VAT model outperforms the alternatives, as quantified by the distinct property distributions of SR- and MR-classified molecules. To reduce the cost of generating inputs to the VAT model, we leverage the VAT model's robustness to noisy inputs by replacing WFT MR diagnostics with regression predictions in an MR decision engine workflow that preserves excellent performance. We demonstrate the transferability of our approach to larger molecules and those with distinct chemical composition from the training set. This MR decision engine demonstrates promise as a low-cost, high-accuracy approach to the automatic detection of strong correlation for predictive high-throughput screening.
Affiliation(s)
- Chenru Duan
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Fang Liu
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Aditya Nandy
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Heather J Kulik
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
20
Duan C, Liu F, Nandy A, Kulik HJ. Data-Driven Approaches Can Overcome the Cost-Accuracy Trade-Off in Multireference Diagnostics. J Chem Theory Comput 2020; 16:4373-4387. [PMID: 32536161 DOI: 10.1021/acs.jctc.0c00358] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Indexed: 12/23/2022]
Abstract
High-throughput computational screening typically employs methods (i.e., density functional theory or DFT) that can fail to describe challenging molecules, such as those with strongly correlated electronic structure. In such cases, multireference (MR) correlated wavefunction theory (WFT) would be the appropriate choice but remains more challenging to carry out and automate than single-reference (SR) WFT or DFT. Numerous diagnostics have been proposed for identifying when MR character is likely to have an effect on the predictive power of SR calculations, but conflicting conclusions about diagnostic performance have been reached on small data sets. We compute 15 MR diagnostics, ranging from affordable DFT-based to more costly MR-WFT-based diagnostics, on a set of 3165 equilibrium and distorted small organic molecules containing up to six heavy atoms. Conflicting MR character assignments and low pairwise linear correlations among diagnostics are also observed over this set. We evaluate the ability of existing diagnostics to predict the percent recovery of the correlation energy, %Ecorr. None of the DFT-based diagnostics are nearly as predictive of %Ecorr as the best WFT-based diagnostics. To overcome the limitation of this cost-accuracy trade-off, we develop machine learning (ML, i.e., kernel ridge regression) models to predict WFT-based diagnostics from a combination of DFT-based diagnostics and a new, size-independent 3D geometric representation. The ML-predicted diagnostics correlate as well with MR effects as their computed (i.e., with WFT) values, significantly improving over the DFT-based diagnostics on which the models were trained. These ML models thus provide a promising approach to improve upon DFT-based diagnostic accuracy while remaining suitably low cost for high-throughput screening.
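The abstract names kernel ridge regression (KRR) as the ML model, trained on a combination of DFT-based diagnostics and a 3D geometric representation. The sketch below is a generic pure-Python KRR and not the authors' model. It solves (K + λI)α = y and predicts with k(x)ᵀα. The Gaussian kernel, the hyperparameters `gamma` and `lam`, and the toy one-dimensional data are assumptions. In the cited setting, each input row would be a feature vector of diagnostics and geometric descriptors.

```python
import math
from typing import List, Sequence


def rbf(a: Sequence[float], b: Sequence[float], gamma: float) -> float:
    """Gaussian (RBF) kernel exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def solve(A: List[List[float]], y: List[float]) -> List[float]:
    """Solve A x = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    M = [list(row) + [y[i]] for i, row in enumerate(A)]  # augmented copy of [A | y]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


class KernelRidge:
    """Kernel ridge regression: solve (K + lam * I) alpha = y, predict k(x)^T alpha."""

    def __init__(self, gamma: float = 1.0, lam: float = 1e-6) -> None:
        self.gamma = gamma
        self.lam = lam

    def fit(self, X: List[Sequence[float]], y: List[float]) -> "KernelRidge":
        self.X = [list(x) for x in X]
        n = len(self.X)
        # Kernel matrix with the ridge penalty added on the diagonal.
        K = [[rbf(self.X[i], self.X[j], self.gamma) + (self.lam if i == j else 0.0)
              for j in range(n)] for i in range(n)]
        self.alpha = solve(K, list(y))
        return self

    def predict(self, X: List[Sequence[float]]) -> List[float]:
        return [sum(a * rbf(x, xi, self.gamma) for a, xi in zip(self.alpha, self.X))
                for x in X]
```

With a very small `lam`, the model nearly interpolates the training targets. A large `lam` shrinks all predictions toward zero, which is the usual bias-variance dial in KRR.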
Affiliation(s)
- Chenru Duan
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Fang Liu
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Aditya Nandy
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Heather J Kulik
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
21
Janet JP, Ramesh S, Duan C, Kulik HJ. Accurate Multiobjective Design in a Space of Millions of Transition Metal Complexes with Neural-Network-Driven Efficient Global Optimization. ACS Cent Sci 2020; 6:513-524. [PMID: 32342001 PMCID: PMC7181321 DOI: 10.1021/acscentsci.0c00026] [Citation(s) in RCA: 90] [Impact Index Per Article: 18.0] [Received: 01/08/2020] [Indexed: 05/20/2023]
Abstract
The accelerated discovery of materials for real world applications requires the achievement of multiple design objectives. The multidimensional nature of the search necessitates exploration of multimillion compound libraries over which even density functional theory (DFT) screening is intractable. Machine learning (e.g., artificial neural network, ANN, or Gaussian process, GP) models for this task are limited by training data availability and predictive uncertainty quantification (UQ). We overcome such limitations by using efficient global optimization (EGO) with the multidimensional expected improvement (EI) criterion. EGO balances exploitation of a trained model with acquisition of new DFT data at the Pareto front, the region of chemical space that contains the optimal trade-off between multiple design criteria. We demonstrate this approach for the simultaneous optimization of redox potential and solubility in candidate M(II)/M(III) redox couples for redox flow batteries from a space of 2.8 M transition metal complexes designed for stability in practical redox flow battery (RFB) applications. We show that a multitask ANN with latent-distance-based UQ surpasses the generalization performance of a GP in this space. With this approach, ANN prediction and EI scoring of the full space are achieved in minutes. Starting from ca. 100 representative points, EGO improves both properties by over 3 standard deviations in only five generations. Analysis of lookahead errors confirms rapid ANN model improvement during the EGO process, achieving suitable accuracy for predictive design in the space of transition metal complexes. The ANN-driven EI approach achieves at least 500-fold acceleration over random search, identifying a Pareto-optimal design in around 5 weeks instead of 50 years.
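The abstract describes efficient global optimization (EGO) driven by a multidimensional expected improvement (EI) criterion that targets the Pareto front. The multidimensional EI used in the paper is not reproduced here. The sketch below shows only two simpler building blocks, assumed for illustration. The first is non-dominated filtering, which defines a Pareto front when every objective is maximized. The second is the standard single-objective EI under a Gaussian prediction with mean `mu` and standard deviation `sigma`. The candidate scores are invented toy values.

```python
import math
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` is no worse than `b` in every objective and strictly better in one.

    All objectives are treated as quantities to maximize; negate any that
    should be minimized before calling.
    """
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Return the non-dominated points, preserving input order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def expected_improvement(mu: float, sigma: float, best: float) -> float:
    """Single-objective EI for maximization, given a Gaussian prediction N(mu, sigma**2)."""
    if sigma <= 0.0:
        return max(mu - best, 0.0)  # no uncertainty: improvement is deterministic
    z = (mu - best) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (mu - best) * cdf + sigma * pdf


candidates = [(1.0, 5.0), (2.0, 4.0), (3.0, 1.0), (1.0, 1.0), (2.0, 2.0)]
print(pareto_front(candidates))  # [(1.0, 5.0), (2.0, 4.0), (3.0, 1.0)]
```

EI rewards both a high predicted mean and a large uncertainty. This is how EGO balances exploiting the trained model against acquiring new data.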
Affiliation(s)
- Jon Paul Janet
- Department
of Chemical Engineering, Massachusetts Institute
of Technology, Cambridge, Massachusetts 02139, United States
| | - Sahasrajit Ramesh
- Department
of Chemical Engineering, Massachusetts Institute
of Technology, Cambridge, Massachusetts 02139, United States
| | - Chenru Duan
- Department
of Chemical Engineering, Massachusetts Institute
of Technology, Cambridge, Massachusetts 02139, United States
- Department
of Chemistry, Massachusetts Institute of
Technology, Cambridge, Massachusetts 02139, United States
| | - Heather J. Kulik
- Department
of Chemical Engineering, Massachusetts Institute
of Technology, Cambridge, Massachusetts 02139, United States
- . Phone: 617-253-4584
22
Taylor MG, Yang T, Lin S, Nandy A, Janet JP, Duan C, Kulik HJ. Seeing Is Believing: Experimental Spin States from Machine Learning Model Structure Predictions. J Phys Chem A 2020; 124:3286-3299. [PMID: 32223165 PMCID: PMC7311053 DOI: 10.1021/acs.jpca.0c01458] [Citation(s) in RCA: 42] [Impact Index Per Article: 8.4] [Indexed: 12/11/2022]
Abstract
Determination of ground-state spins of open-shell transition-metal complexes is critical to understanding catalytic and materials properties but also challenging with approximate electronic structure methods. As an alternative approach, we demonstrate how structure alone can be used to guide assignment of ground-state spin from experimentally determined crystal structures of transition-metal complexes. We first identify the limits of distance-based heuristics from distributions of metal–ligand bond lengths of over 2000 unique mononuclear Fe(II)/Fe(III) transition-metal complexes. To overcome these limits, we employ artificial neural networks (ANNs) to predict spin-state-dependent metal–ligand bond lengths and classify experimental ground-state spins based on agreement of experimental structures with the ANN predictions. Although the ANN is trained on hybrid density functional theory data, we exploit the method-insensitivity of geometric properties to enable assignment of ground states for the majority (ca. 80–90%) of structures. We demonstrate the utility of the ANN by data-mining the literature for spin-crossover (SCO) complexes, which have experimentally observed temperature-dependent geometric structure changes, by correctly assigning almost all (>95%) spin states in the 46 Fe(II) SCO complex set. This approach represents a promising complement to more conventional energy-based spin-state assignment from electronic structure theory at the low cost of a machine learning model.
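The assignment step in this abstract classifies an experimental ground-state spin by how well the measured metal–ligand bond length agrees with spin-state-dependent ANN predictions. It can be illustrated with a nearest-prediction rule. The sketch below is hypothetical. It compares a single bond length per structure, and the tolerance `tol`, the ambiguity `margin`, and the bond-length values are assumptions rather than the criteria or data of the cited study. In practice, the predicted values would come from the trained ANN.

```python
from typing import Dict, Optional


def assign_spin(observed: float, predicted: Dict[str, float],
                tol: float = 0.05, margin: float = 0.02) -> Optional[str]:
    """Pick the spin state whose predicted bond length best matches the observed one.

    Returns None (unassigned) if the closest prediction deviates by more than
    `tol`, or if it is not at least `margin` closer than the runner-up.
    """
    ranked = sorted(predicted.items(), key=lambda kv: abs(kv[1] - observed))
    best_state, best_len = ranked[0]
    best_dev = abs(best_len - observed)
    if best_dev > tol:
        return None
    if len(ranked) > 1 and abs(ranked[1][1] - observed) - best_dev < margin:
        return None
    return best_state


# Hypothetical predicted mean Fe-N bond lengths (angstrom) for one complex.
pred = {"low-spin": 1.97, "high-spin": 2.18}
print(assign_spin(2.16, pred))   # high-spin
print(assign_spin(2.075, pred))  # None: matches neither prediction within tol
```

Returning None for ambiguous structures mirrors the abstract's point that only a majority of structures, not all, receive an assignment.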
Affiliation(s)
- Michael G Taylor
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Tzuhsiung Yang
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Sean Lin
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Aditya Nandy
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Jon Paul Janet
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Chenru Duan
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Heather J Kulik
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
23
Affiliation(s)
- Marco Foscato
- Department of Chemistry, University of Bergen, Allégaten 41, N-5007 Bergen, Norway
- Vidar R. Jensen
- Department of Chemistry, University of Bergen, Allégaten 41, N-5007 Bergen, Norway
24
Nandy A, Zhu J, Janet JP, Duan C, Getman RB, Kulik HJ. Machine Learning Accelerates the Discovery of Design Rules and Exceptions in Stable Metal–Oxo Intermediate Formation. ACS Catal 2019. [DOI: 10.1021/acscatal.9b02165] [Citation(s) in RCA: 52] [Impact Index Per Article: 8.7] [Indexed: 12/19/2022]
Affiliation(s)
- Jiazhou Zhu
- Department of Chemical & Biomolecular Engineering, Clemson University, Clemson, South Carolina 29634, United States
- Rachel B. Getman
- Department of Chemical & Biomolecular Engineering, Clemson University, Clemson, South Carolina 29634, United States
25
Townsend J, Vogiatzis KD. Data-Driven Acceleration of the Coupled-Cluster Singles and Doubles Iterative Solver. J Phys Chem Lett 2019; 10:4129-4135. [PMID: 31290671 DOI: 10.1021/acs.jpclett.9b01442] [Citation(s) in RCA: 33] [Impact Index Per Article: 5.5] [Indexed: 05/26/2023]
Abstract
Solving the coupled-cluster (CC) equations is a cost-prohibitive process that exhibits poor scaling with system size. These equations are solved by determining the set of amplitudes (t) that minimize the system energy with respect to the coupled-cluster equations at the selected level of truncation. Here, a novel approach to predict the converged coupled-cluster singles and doubles (CCSD) amplitudes, thus the coupled-cluster wave function, is explored by using machine learning and electronic structure properties inherent to the MP2 level. Features are collected from quantum chemical data, such as orbital energies, one-electron Hamiltonian, Coulomb, and exchange terms. The data-driven CCSD (DDCCSD) is not an alchemical method because the actual iterative coupled-cluster equations are solved. However, accurate energetics can also be obtained by bypassing solving the CC equations entirely. Our preliminary data show that it is possible to achieve remarkable speedups in solving the CCSD equations, especially when the correct physics are encoded and used for training of machine learning models.
Affiliation(s)
- Jacob Townsend
- Department of Chemistry, University of Tennessee, Knoxville, Tennessee 37996, United States
26
Duan C, Janet JP, Liu F, Nandy A, Kulik HJ. Learning from Failure: Predicting Electronic Structure Calculation Outcomes with Machine Learning Models. J Chem Theory Comput 2019; 15:2331-2345. [DOI: 10.1021/acs.jctc.9b00057] [Citation(s) in RCA: 50] [Impact Index Per Article: 8.3] [Indexed: 12/27/2022]
27
Janet JP, Liu F, Nandy A, Duan C, Yang T, Lin S, Kulik HJ. Designing in the Face of Uncertainty: Exploiting Electronic Structure and Machine Learning Models for Discovery in Inorganic Chemistry. Inorg Chem 2019; 58:10592-10606. [PMID: 30834738 DOI: 10.1021/acs.inorgchem.9b00109] [Citation(s) in RCA: 60] [Impact Index Per Article: 10.0] [Indexed: 11/30/2022]
Abstract
Recent transformative advances in computing power and algorithms have made computational chemistry central to the discovery and design of new molecules and materials. First-principles simulations are increasingly accurate and applicable to large systems with the speed needed for high-throughput computational screening. Despite these strides, the combinatorial challenges associated with the vastness of chemical space mean that more than just fast and accurate computational tools are needed for accelerated chemical discovery. In transition-metal chemistry and catalysis, unique challenges arise. The variable spin, oxidation state, and coordination environments favored by elements with well-localized d or f electrons provide great opportunity for tailoring properties in catalytic or functional (e.g., magnetic) materials but also add layers of uncertainty to any design strategy. We outline five key mandates for realizing computationally driven accelerated discovery in inorganic chemistry: (i) fully automated simulation of new compounds, (ii) knowledge of prediction sensitivity or accuracy, (iii) faster-than-fast property prediction methods, (iv) maps for rapid chemical space traversal, and (v) a means to reveal design rules on the kilocompound scale. Through case studies in open-shell transition-metal chemistry, we describe how advances in methodology and software in each of these areas bring about new chemical insights. We conclude with our outlook on the next steps in this process toward realizing fully autonomous discovery in inorganic chemistry using computational chemistry.
Affiliation(s)
- Jon Paul Janet
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Fang Liu
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Aditya Nandy
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Chenru Duan
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Tzuhsiung Yang
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Sean Lin
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Heather J Kulik
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
28
Vogiatzis KD, Polynski MV, Kirkland JK, Townsend J, Hashemi A, Liu C, Pidko EA. Computational Approach to Molecular Catalysis by 3d Transition Metals: Challenges and Opportunities. Chem Rev 2019; 119:2453-2523. [PMID: 30376310 PMCID: PMC6396130 DOI: 10.1021/acs.chemrev.8b00361] [Citation(s) in RCA: 237] [Impact Index Per Article: 39.5] [Received: 06/07/2018] [Indexed: 12/28/2022]
Abstract
Computational chemistry provides a versatile toolbox for studying mechanistic details of catalytic reactions and holds promise to deliver practical strategies to enable the rational in silico catalyst design. The versatile reactivity and nontrivial electronic structure effects, common for systems based on 3d transition metals, introduce additional complexity that may represent a particular challenge to the standard computational strategies. In this review, we discuss the challenges and capabilities of modern electronic structure methods for studying the reaction mechanisms promoted by 3d transition metal molecular catalysts. Particular focus will be placed on the ways of addressing the multiconfigurational problem in electronic structure calculations and the role of expert bias in the practical utilization of the available methods. The development of density functionals designed to address transition metals is also discussed. Special emphasis is placed on the methods that account for solvation effects and the multicomponent nature of practical catalytic systems. This is followed by an overview of recent computational studies addressing the mechanistic complexity of catalytic processes by molecular catalysts based on 3d metals. Cases that involve noninnocent ligands, multicomponent reaction systems, metal-ligand and metal-metal cooperativity, as well as modeling complex catalytic systems such as metal-organic frameworks are presented. Conventionally, computational studies on catalytic mechanisms are heavily dependent on the chemical intuition and expert input of the researcher. Recent developments in advanced automated methods for reaction path analysis hold promise for eliminating such human-bias from computational catalysis studies. A brief overview of these approaches is presented in the final section of the review. The paper is closed with general concluding remarks.
Affiliation(s)
- Justin K. Kirkland
- Department of Chemistry, University of Tennessee, Knoxville, Tennessee 37996, United States
- Jacob Townsend
- Department of Chemistry, University of Tennessee, Knoxville, Tennessee 37996, United States
- Ali Hashemi
- Inorganic Systems Engineering group, Department of Chemical Engineering, Delft University of Technology, Van der Maasweg 9, 2629 HZ Delft, The Netherlands
- Chong Liu
- Inorganic Systems Engineering group, Department of Chemical Engineering, Delft University of Technology, Van der Maasweg 9, 2629 HZ Delft, The Netherlands
- Evgeny A. Pidko
- TheoMAT group, ITMO University, Lomonosova 9, St. Petersburg 191002, Russia
- Inorganic Systems Engineering group, Department of Chemical Engineering, Delft University of Technology, Van der Maasweg 9, 2629 HZ Delft, The Netherlands
29
Abstract
Ligands, especially phosphines and carbenes, can play a key role in modifying and controlling homogeneous organometallic catalysts, and they often provide a convenient approach to fine-tuning the performance of known catalysts. The measurable outcomes of such catalyst modifications (yields, rates, selectivity) can be set into context by establishing their relationship to steric and electronic descriptors of ligand properties, and such models can guide the discovery, optimization, and design of catalysts. In this review we present a survey of calculated ligand descriptors, with a particular focus on homogeneous organometallic catalysis. A range of different approaches to calculating steric and electronic parameters are set out and compared, and we have collected descriptors for a range of representative ligand sets, including 30 monodentate phosphorus(III) donor ligands, 23 bidentate P,P-donor ligands, and 30 carbenes, with a view to providing a useful resource for analysis to practitioners. In addition, several case studies of applications of such descriptors, covering both maps and models, have been reviewed, illustrating how descriptor-led studies of catalysis can inform experiments and highlighting good practice for model comparison and evaluation.
Affiliation(s)
- Derek J Durand
- School of Chemistry, University of Bristol, Cantock's Close, Bristol BS8 1TS, U.K.
- Natalie Fey
- School of Chemistry, University of Bristol, Cantock's Close, Bristol BS8 1TS, U.K.
30
Spies JA, Perets EA, Fisher KJ, Rudshteyn B, Batista VS, Brudvig GW, Schmuttenmaer CA. Collaboration between experiment and theory in solar fuels research. Chem Soc Rev 2019; 48:1865-1873. [DOI: 10.1039/c8cs00819a] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Indexed: 11/21/2022]
Abstract
As the challenges in science increase in scope and interdisciplinarity, collaboration becomes increasingly important.
Affiliation(s)
- Jacob A. Spies
- Department of Chemistry, Yale University, New Haven, USA
- Energy Sciences Institute, Yale University
- Katherine J. Fisher
- Department of Chemistry, Yale University, New Haven, USA
- Energy Sciences Institute, Yale University
- Benjamin Rudshteyn
- Department of Chemistry, Yale University, New Haven, USA
- Energy Sciences Institute, Yale University
- Victor S. Batista
- Department of Chemistry, Yale University, New Haven, USA
- Energy Sciences Institute, Yale University
- Gary W. Brudvig
- Department of Chemistry, Yale University, New Haven, USA
- Energy Sciences Institute, Yale University
| | | |
31
Grajciar L, Heard CJ, Bondarenko AA, Polynski MV, Meeprasert J, Pidko EA, Nachtigall P. Towards operando computational modeling in heterogeneous catalysis. Chem Soc Rev 2018; 47:8307-8348. [PMID: 30204184 PMCID: PMC6240816 DOI: 10.1039/c8cs00398j] [Citation(s) in RCA: 114] [Impact Index Per Article: 16.3] [Received: 05/14/2018] [Indexed: 12/19/2022]
Abstract
An increased synergy between experimental and theoretical investigations in heterogeneous catalysis has become apparent during the last decade. Experimental work has extended from ultra-high vacuum and low temperature towards operando conditions. These developments have motivated the computational community to move from standard descriptive computational models, based on inspection of the potential energy surface at 0 K and low reactant concentrations (0 K/UHV model), to more realistic conditions. The transition from 0 K/UHV to operando models has been backed by significant developments in computer hardware and software over the past few decades. New methodological developments, designed to overcome part of the gap between 0 K/UHV and operando conditions, include (i) global optimization techniques, (ii) ab initio constrained thermodynamics, (iii) biased molecular dynamics, (iv) microkinetic models of reaction networks and (v) machine learning approaches. The importance of the transition is highlighted by discussing how the molecular level picture of catalytic sites and the associated reaction mechanisms changes when the chemical environment, pressure and temperature effects are correctly accounted for in molecular simulations. It is the purpose of this review to discuss each method on an equal footing, and to draw connections between methods, particularly where they may be applied in combination.
Affiliation(s)
- Lukáš Grajciar
- Department of Physical and Macromolecular Chemistry, Faculty of Science, Charles University in Prague, 128 43 Prague 2, Czech Republic
- Christopher J. Heard
- Department of Physical and Macromolecular Chemistry, Faculty of Science, Charles University in Prague, 128 43 Prague 2, Czech Republic
- Anton A. Bondarenko
- TheoMAT group, ITMO University, Lomonosova 9, St. Petersburg 191002, Russia
- Mikhail V. Polynski
- TheoMAT group, ITMO University, Lomonosova 9, St. Petersburg 191002, Russia
- Jittima Meeprasert
- Inorganic Systems Engineering group, Department of Chemical Engineering, Faculty of Applied Sciences, Delft University of Technology, Van der Maasweg 9, 2629 HZ Delft, The Netherlands
- Evgeny A. Pidko
- TheoMAT group, ITMO University, Lomonosova 9, St. Petersburg 191002, Russia
- Inorganic Systems Engineering group, Department of Chemical Engineering, Faculty of Applied Sciences, Delft University of Technology, Van der Maasweg 9, 2629 HZ Delft, The Netherlands
- Petr Nachtigall
- Department of Physical and Macromolecular Chemistry, Faculty of Science, Charles University in Prague, 128 43 Prague 2, Czech Republic
32
Nandy A, Duan C, Janet JP, Gugler S, Kulik HJ. Strategies and Software for Machine Learning Accelerated Discovery in Transition Metal Chemistry. Ind Eng Chem Res 2018. [DOI: 10.1021/acs.iecr.8b04015] [Citation(s) in RCA: 76] [Impact Index Per Article: 10.9] [Indexed: 12/18/2022]
Affiliation(s)
- Aditya Nandy
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Chenru Duan
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Jon Paul Janet
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Stefan Gugler
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Laboratorium für Physikalische Chemie, ETH Zürich, Vladimir-Prelog-Weg 2, 8093 Zürich, Switzerland
- Heather J. Kulik
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
33
Sachse T, Martínez TJ, Dietzek B, Presselt M. A program for automatically predicting supramolecular aggregates and its application to urea and porphin. J Comput Chem 2018; 39:763-772. [PMID: 29297589 DOI: 10.1002/jcc.25151] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 08/22/2017] [Revised: 12/04/2017] [Accepted: 12/07/2017] [Indexed: 11/08/2022]
Abstract
Not only the molecular structure but also the presence or absence of aggregates determines many properties of organic materials. Theoretical investigation of such aggregates requires the prediction of a suitable set of diverse structures. Here, we present the open-source program EnergyScan for the unbiased prediction of geometrically diverse sets of small aggregates. Its bottom-up approach is complementary to existing ones by performing a detailed scan of an aggregate's potential energy surface, from which diverse local energy minima are selected. We cross-validate this approach by predicting both literature-known and heretofore unreported geometries of the urea dimer. We also predict a diverse set of dimers of the less intensely studied case of porphin, which we investigate further using quantum chemistry. For several dimers, we find strong deviations from a reference absorption spectrum, which we explain using computed transition densities. This proof of principle clearly shows that EnergyScan successfully predicts aggregates exhibiting large structural and spectral diversity. © 2018 Wiley Periodicals, Inc.
Affiliation(s)
- Torsten Sachse
- Friedrich Schiller University, Institute of Physical Chemistry, Helmholtzweg 4, 07743 Jena, Germany
- Leibniz Institute of Photonic Technology Jena (IPHT), Research Department Functional Interfaces, Albert-Einstein-Straße 9, 07745 Jena, Germany
- Todd J Martínez
- Stanford University, Department of Chemistry and the PULSE Institute, 333 Campus Drive, Stanford, California 94305
- SLAC National Accelerator Laboratory, 2575 Sand Hill Rd, Menlo Park, California 94025
- Benjamin Dietzek
- Friedrich Schiller University, Institute of Physical Chemistry, Helmholtzweg 4, 07743 Jena, Germany
- Center for Energy and Environmental Chemistry Jena, Humboldtstraße 10, 07743 Jena, Germany
- Martin Presselt
- Leibniz Institute of Photonic Technology Jena (IPHT), Research Department Functional Interfaces, Albert-Einstein-Straße 9, 07745 Jena, Germany
- SciClus GmbH & Co. KG, Moritz-von-Rohr-Straße 1a, 07745 Jena, Germany
34
Kim JY, Kulik HJ. When Is Ligand pKa a Good Descriptor for Catalyst Energetics? In Search of Optimal CO2 Hydration Catalysts. J Phys Chem A 2018; 122:4579-4590. [DOI: 10.1021/acs.jpca.8b03301] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Indexed: 12/19/2022]
Affiliation(s)
- Jeong Yun Kim
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Heather J. Kulik
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
35
Janet JP, Chan L, Kulik HJ. Accelerating Chemical Discovery with Machine Learning: Simulated Evolution of Spin Crossover Complexes with an Artificial Neural Network. J Phys Chem Lett 2018; 9:1064-1071. [PMID: 29425453 DOI: 10.1021/acs.jpclett.8b00170] [Citation(s) in RCA: 106] [Impact Index Per Article: 15.1] [Indexed: 05/05/2023]
Abstract
Machine learning (ML) has emerged as a powerful complement to simulation for materials discovery by reducing time for evaluation of energies and properties at accuracy competitive with first-principles methods. We use genetic algorithm (GA) optimization to discover unconventional spin-crossover complexes in combination with efficient scoring from an artificial neural network (ANN) that predicts spin-state splitting of inorganic complexes. We explore a compound space of over 5600 candidate materials derived from eight metal/oxidation state combinations and a 32-ligand pool. We introduce a strategy for error-aware ML-driven discovery by limiting how far the GA travels away from the nearest ANN training points while maximizing property (i.e., spin-splitting) fitness, leading to discovery of 80% of the leads from full chemical space enumeration. Over a 51-complex subset, average unsigned errors (4.5 kcal/mol) are close to the ANN's baseline 3 kcal/mol error. By obtaining leads from the trained ANN within seconds rather than days from a DFT-driven GA, this strategy demonstrates the power of ML for accelerating inorganic material discovery.
Affiliation(s)
- Jon Paul Janet
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Lydia Chan
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Heather J Kulik
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
36
Gani TZH, Kulik HJ. Understanding and Breaking Scaling Relations in Single-Site Catalysis: Methane to Methanol Conversion by Fe(IV)═O. ACS Catal 2018. [DOI: 10.1021/acscatal.7b03597] [Citation(s) in RCA: 93] [Impact Index Per Article: 13.3] [Indexed: 11/29/2022]
Affiliation(s)
- Terry Z. H. Gani
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Heather J. Kulik
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
37
Janet JP, Kulik HJ. Resolving Transition Metal Chemical Space: Feature Selection for Machine Learning and Structure-Property Relationships. J Phys Chem A 2017; 121:8939-8954. [PMID: 29095620 DOI: 10.1021/acs.jpca.7b08750] [Citation(s) in RCA: 145] [Impact Index Per Article: 18.1] [Indexed: 12/31/2022]
Abstract
Machine learning (ML) of quantum mechanical properties shows promise for accelerating chemical discovery. For transition metal chemistry where accurate calculations are computationally costly and available training data sets are small, the molecular representation becomes a critical ingredient in ML model predictive accuracy. We introduce a series of revised autocorrelation functions (RACs) that encode relationships of the heuristic atomic properties (e.g., size, connectivity, and electronegativity) on a molecular graph. We alter the starting point, scope, and nature of the quantities evaluated in standard ACs to make these RACs amenable to inorganic chemistry. On an organic molecule set, we first demonstrate superior standard AC performance to other presently available topological descriptors for ML model training, with mean unsigned errors (MUEs) for atomization energies on set-aside test molecules as low as 6 kcal/mol. For inorganic chemistry, our RACs yield 1 kcal/mol ML MUEs on set-aside test molecules in spin-state splitting in comparison to 15-20× higher errors for feature sets that encode whole-molecule structural information. Systematic feature selection methods including univariate filtering, recursive feature elimination, and direct optimization (e.g., random forest and LASSO) are compared. Random-forest- or LASSO-selected subsets 4-5× smaller than the full RAC set produce sub- to 1 kcal/mol spin-splitting MUEs, with good transferability to metal-ligand bond length prediction (0.004-5 Å MUE) and redox potential on a smaller data set (0.2-0.3 eV MUE). Evaluation of feature selection results across property sets reveals the relative importance of local, electronic descriptors (e.g., electronegativity, atomic number) in spin-splitting and distal, steric effects in redox potential and bond lengths.
Affiliation(s)
- Jon Paul Janet
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Heather J Kulik
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
38
Gani TZH, Kulik HJ. Unifying Exchange Sensitivity in Transition-Metal Spin-State Ordering and Catalysis through Bond Valence Metrics. J Chem Theory Comput 2017; 13:5443-5457. [DOI: 10.1021/acs.jctc.7b00848] [Citation(s) in RCA: 35] [Impact Index Per Article: 4.4] [Indexed: 01/01/2023]
Affiliation(s)
- Terry Z. H. Gani
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Heather J. Kulik
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States