1
Jose A, Devijver E, Jakse N, Poloni R. Informative Training Data for Efficient Property Prediction in Metal-Organic Frameworks by Active Learning. J Am Chem Soc 2024; 146:6134-6144. [PMID: 38404041 DOI: 10.1021/jacs.3c13687] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/27/2024]
Abstract
In recent data-driven approaches to material discovery, scenarios where target quantities are expensive to compute and measure are often overlooked. In such cases, it becomes imperative to construct a training set that includes the most diverse, representative, and informative samples. Here, a novel regression tree-based active learning algorithm is employed for such a purpose. It is applied to predict the band gap and adsorption properties of metal-organic frameworks (MOFs), a novel class of materials that results from the virtually infinite combinations of their building units. Simpler and low dimensional descriptors, such as those based on stoichiometric and geometric properties, are used to compute the feature space for this model owing to their ability to better represent MOFs in the low data regime. The partitions given by a regression tree constructed on the labeled part of the data set are used to select new samples to be added to the training set, thereby limiting its size while maximizing the prediction quality. Tests on the QMOF, hMOF, and dMOF data sets reveal that our method constructs small training data sets to learn regression models that predict the target properties more efficiently than existing active learning approaches, and with lower variance. Specifically, our active learning approach is highly beneficial when labels are unevenly distributed in the descriptor space and when the label distribution is imbalanced, which is often the case for real world data. The regions defined by the tree help in revealing patterns in the data, thereby offering a unique tool to efficiently analyze complex structure-property relationships in materials and accelerate materials discovery.
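The selection step described in this abstract (fit a regression tree on the labeled samples, then use its partitions to decide where to label next) can be illustrated with a minimal standard-library sketch. It is not the authors' implementation. The function names, the toy data, and the allocation rule (queries per leaf proportional to the square root of leaf label variance times the leaf's share of the unlabeled pool) are all assumptions made for illustration.

```python
import random
from statistics import pvariance


def best_split(X, y, min_leaf):
    """Return (feature, threshold) with the lowest summed squared error, or None."""
    best, best_sse = None, float("inf")
    for j in range(len(X[0])):
        order = sorted(range(len(X)), key=lambda i: X[i][j])
        for cut in range(min_leaf, len(X) - min_leaf + 1):
            lo, hi = order[cut - 1], order[cut]
            if X[lo][j] == X[hi][j]:
                continue  # identical feature values cannot be separated
            left = [y[i] for i in order[:cut]]
            right = [y[i] for i in order[cut:]]
            sse = pvariance(left) * len(left) + pvariance(right) * len(right)
            if sse < best_sse:
                best, best_sse = (j, X[lo][j]), sse
    return best


def build_tree(X, y, depth, min_leaf=2):
    """Grow a small regression tree; each leaf stores the variance of its labels."""
    split = best_split(X, y, min_leaf) if depth > 0 else None
    if split is None:
        return {"var": pvariance(y)}
    j, thr = split
    left = [i for i in range(len(X)) if X[i][j] <= thr]
    right = [i for i in range(len(X)) if X[i][j] > thr]
    return {
        "feat": j,
        "thr": thr,
        "left": build_tree([X[i] for i in left], [y[i] for i in left], depth - 1, min_leaf),
        "right": build_tree([X[i] for i in right], [y[i] for i in right], depth - 1, min_leaf),
    }


def find_leaf(node, x):
    """Walk from the root to the leaf whose region contains x."""
    while "feat" in node:
        node = node["left"] if x[node["feat"]] <= node["thr"] else node["right"]
    return node


def select_queries(tree, X_pool, budget, rng):
    """Pick pool indices to label next, favouring large, high-variance leaves.

    Assumed rule: weight = sqrt(leaf variance * share of the pool in the leaf).
    Per-leaf counts are rounded, so the total may differ slightly from budget.
    """
    groups = {}
    for i, x in enumerate(X_pool):
        leaf = find_leaf(tree, x)
        groups.setdefault(id(leaf), (leaf, []))[1].append(i)
    weights = {k: (leaf["var"] * len(idx) / len(X_pool)) ** 0.5
               for k, (leaf, idx) in groups.items()}
    total = sum(weights.values())
    chosen = []
    for k, (leaf, idx) in groups.items():
        share = weights[k] / total if total > 0 else 1 / len(groups)
        chosen += rng.sample(idx, min(len(idx), round(budget * share)))
    return chosen


# Toy 1-D data (hypothetical): labels are flat for x < 0.5 and noisy above it.
X_lab = [[(i + 0.5) / 10] for i in range(10)]
y_lab = [0.0] * 5 + [15.0, 5.0, 15.0, 5.0, 15.0]
X_pool = [[(i + 0.5) / 100] for i in range(100)]

tree = build_tree(X_lab, y_lab, depth=1)
chosen = select_queries(tree, X_pool, budget=10, rng=random.Random(0))
```

In this toy case the tree separates the flat region from the noisy one, and the whole labeling budget goes to the noisy region, where additional labels are most informative. A practical version would refit the tree after each batch of new labels.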
Affiliation(s)
- Ashna Jose
  - SIMaP, Grenoble-INP, CNRS, University of Grenoble Alpes, Grenoble 38042, France
- Emilie Devijver
  - LiG, Grenoble-INP, CNRS, University of Grenoble Alpes, Grenoble 38042, France
- Noel Jakse
  - SIMaP, Grenoble-INP, CNRS, University of Grenoble Alpes, Grenoble 38042, France
- Roberta Poloni
  - SIMaP, Grenoble-INP, CNRS, University of Grenoble Alpes, Grenoble 38042, France
2
Li CH, Tabor DP. Generative organic electronic molecular design informed by quantum chemistry. Chem Sci 2023; 14:11045-11055. [PMID: 37860647 PMCID: PMC10583709 DOI: 10.1039/d3sc03781a] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/22/2023] [Accepted: 09/11/2023] [Indexed: 10/21/2023]
Abstract
Generative molecular design strategies have emerged as promising alternatives to trial-and-error approaches for exploring and optimizing within large chemical spaces. To date, generative models with reinforcement learning approaches have frequently used low-cost methods to evaluate the quality of the generated molecules, enabling many loops through the generative model. However, for functional molecular materials tasks, such low-cost methods are either not available or would require the generation of large amounts of training data to train surrogate machine learning models. In this work, we develop a framework that connects the REINVENT reinforcement learning framework with excited state quantum chemistry calculations to discover molecules with specified molecular excited state energy levels, specifically molecules with excited state landscapes that would serve as promising singlet fission or triplet-triplet annihilation materials. We employ a two-step curriculum strategy to first find a set of diverse promising molecules, then demonstrate the framework's ability to exploit a more focused chemical space with anthracene derivatives. Under this protocol, we show that the framework can find desired molecules and improve Pareto fronts for targeted properties versus synthesizability. Moreover, we are able to find several different design principles used by chemists for the design of singlet fission and triplet-triplet annihilation molecules.
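The Pareto-front comparison mentioned in this abstract (targeted property versus synthesizability) rests on a non-dominated filter, sketched below for two objectives. The candidate scores are invented for illustration, and treating both objectives as quantities to maximize is an assumption; the abstract does not give the scoring functions used in the paper.

```python
def pareto_front(points):
    """Return the points not dominated by any other point (both objectives maximised)."""
    front = []
    for p in points:
        dominated = any(
            q[0] >= p[0] and q[1] >= p[1] and (q[0] > p[0] or q[1] > p[1])
            for q in points
        )
        if not dominated:
            front.append(p)
    return front


# Hypothetical (property score, synthesizability score) pairs for five molecules.
candidates = [(0.9, 0.2), (0.7, 0.6), (0.4, 0.9), (0.5, 0.5), (0.3, 0.3)]
front = pareto_front(candidates)
```

Here `front` keeps the first three candidates. The last two are dropped because `(0.7, 0.6)` is at least as good on both scores and strictly better on at least one. A front is said to improve when newly generated molecules dominate points on the previous front.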
Affiliation(s)
- Cheng-Han Li
  - Department of Chemistry, Texas A&M University, College Station, TX 77842, USA
- Daniel P Tabor
  - Department of Chemistry, Texas A&M University, College Station, TX 77842, USA
3
Leonel G, Lennox CB, Scharrer M, Jayanthi K, Friščić T, Navrotsky A. Experimental Investigation of Thermodynamic Stabilization in Boron Imidazolate Frameworks (BIFs) Synthesized by Mechanochemistry. J Phys Chem C Nanomater Interfaces 2023; 127:17754-17760. [PMID: 37736295 PMCID: PMC10510708 DOI: 10.1021/acs.jpcc.3c04164] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2023] [Revised: 08/14/2023] [Indexed: 09/23/2023]
Abstract
This study experimentally explores the energetics for the formation of boron-imidazolate frameworks (BIFs), which are synthesized by mechanochemistry. The topologically similar frameworks employ the same tetratopic linker based on tetrakis(imidazolyl)boric acid but differ in the monovalent cation metal nodes. This permits assessment of the stabilizing effect of metal nodes in frameworks with sodalite (SOD) and diamondoid (dia) topologies. The enthalpy of formation from endmembers (metal oxide and linker), which defines the thermodynamic stability of the structures, has been determined by acid solution calorimetry. The results show that heavier metal atoms in the node promote greater energetic stabilization of denser structures. Overall, in BIFs the relation between cation descriptors (ionic radius and electronegativity) and thermodynamic stability depends on framework topology. Thermodynamic stability increases with the metallic character of the cation employed as the metal node, independent of the framework topology. The results suggest unifying aspects for thermodynamic stabilization across MOF systems.
Affiliation(s)
- Gerson J. Leonel
  - Navrotsky Eyring Center for Materials of the Universe, School of Molecular Sciences, Arizona State University, Tempe, Arizona 85287, United States
  - School of Engineering of Matter, Transport, and Energy, Arizona State University, Tempe, Arizona 85287, United States
- Cameron B. Lennox
  - School of Chemistry, University of Birmingham, Edgbaston, Birmingham B15 2TT, U.K.
  - Department of Chemistry, McGill University, 801 Sherbrooke St. W., Montreal, QC H2L 0B7, Canada
- Manuel Scharrer
  - School of Molecular Sciences and Center for Materials of the Universe, Arizona State University, Tempe, Arizona 85287, United States
  - Navrotsky Eyring Center for Materials of the Universe, School of Molecular Sciences, Arizona State University, Tempe, Arizona 85287, United States
- K Jayanthi
  - School of Molecular Sciences and Center for Materials of the Universe, Arizona State University, Tempe, Arizona 85287, United States
  - Chemical Sciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831, United States
- Tomislav Friščić
  - School of Chemistry, University of Birmingham, Edgbaston, Birmingham B15 2TT, U.K.
  - Department of Chemistry, McGill University, 801 Sherbrooke St. W., Montreal, QC H2L 0B7, Canada
- Alexandra Navrotsky
  - School of Molecular Sciences and Center for Materials of the Universe, Arizona State University, Tempe, Arizona 85287, United States
  - Navrotsky Eyring Center for Materials of the Universe, School of Molecular Sciences, Arizona State University, Tempe, Arizona 85287, United States
  - School of Engineering of Matter, Transport, and Energy, Arizona State University, Tempe, Arizona 85287, United States
4
Tseng YJ, Chuang PJ, Appell M. When Machine Learning and Deep Learning Come to the Big Data in Food Chemistry. ACS Omega 2023; 8:15854-15864. [PMID: 37179635 PMCID: PMC10173424 DOI: 10.1021/acsomega.2c07722] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2022] [Accepted: 04/07/2023] [Indexed: 05/15/2023]
Abstract
Since the first food database was released over one hundred years ago, food databases have become more diversified, including food composition databases, food flavor databases, and food chemical compound databases. These databases provide detailed information about the nutritional compositions, flavor molecules, and chemical properties of various food compounds. As artificial intelligence (AI) is becoming popular in every field, AI methods can also be applied to food industry research and molecular chemistry. Machine learning and deep learning are valuable tools for analyzing big data sources such as food databases. Studies investigating food compositions, flavors, and chemical compounds with AI concepts and learning methods have emerged in the past few years. This review illustrates several well-known food databases, focusing on their primary contents, interfaces, and other essential features. We also introduce some of the most common machine learning and deep learning methods. Furthermore, a few studies related to food databases are given as examples, demonstrating their applications in food pairing, food-drug interactions, and molecular modeling. Based on the results of these applications, it is expected that the combination of food databases and AI will play an essential role in food science and food chemistry.
Affiliation(s)
- Yufeng Jane Tseng
  - Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, No. 1 Roosevelt Rd. Sec. 4, Taipei 10617, Taiwan
  - Y.J.T.: tel, +886.2.3366.4888#529; fax, +886.2.23628167; email,
- Pei-Jiun Chuang
  - Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, No. 1 Roosevelt Rd. Sec. 4, Taipei 10617, Taiwan
- Michael Appell
  - USDA, Agricultural Research Service, National Center for Agricultural Utilization Research, Mycotoxin Prevention and Applied Microbiology Research Unit, 1815 N. University, Peoria, Illinois 61604, United States
5
Hartman RL, Grabow LC. Editorial overview: Data-centric catalysis and reaction engineering. Curr Opin Chem Eng 2022; 38:100875. [DOI: 10.1016/j.coche.2022.100875] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022]
6
Gallarati S, van Gerwen P, Laplaza R, Vela S, Fabrizio A, Corminboeuf C. OSCAR: an extensive repository of chemically and functionally diverse organocatalysts. Chem Sci 2022; 13:13782-13794. [PMID: 36544722 PMCID: PMC9710326 DOI: 10.1039/d2sc04251g] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 07/29/2022] [Accepted: 10/24/2022] [Indexed: 12/24/2022]
Abstract
The automated construction of datasets has become increasingly relevant in computational chemistry. While transition-metal catalysis has greatly benefitted from bottom-up or top-down strategies for the curation of organometallic complexes libraries, the field of organocatalysis is mostly dominated by case-by-case studies, with a lack of transferable data-driven tools that facilitate both the exploration of a wider range of catalyst space and the optimization of reaction properties. For these reasons, we introduce OSCAR, a repository of 4000 experimentally derived organocatalysts along with their corresponding building blocks and combinatorially enriched structures. We outline the fragment-based approach used for database generation and showcase the chemical diversity, in terms of functions and molecular properties, covered in OSCAR. The structures and corresponding stereoelectronic properties are publicly available (https://archive.materialscloud.org/record/2022.106) and constitute the starting point to build generative and predictive models for organocatalyst performance.
Affiliation(s)
- Simone Gallarati
  - Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland
- Puck van Gerwen
  - Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland
  - National Center for Competence in Research – Catalysis (NCCR-Catalysis), Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland
- Ruben Laplaza
  - Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland
  - National Center for Competence in Research – Catalysis (NCCR-Catalysis), Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland
- Sergi Vela
  - Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland
- Alberto Fabrizio
  - Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland
  - National Center for Computational Design and Discovery of Novel Materials (MARVEL), Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland
- Clemence Corminboeuf
  - Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland
  - National Center for Competence in Research – Catalysis (NCCR-Catalysis), Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland
  - National Center for Computational Design and Discovery of Novel Materials (MARVEL), Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland
7
Deng J, Jia G. Effect of hydrated shell layers on surface tension of electrolyte solutions: Insights from interpretable machine learning. J Mol Liq 2022. [DOI: 10.1016/j.molliq.2022.120887] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
8
Nandy A, Adamji H, Kastner DW, Vennelakanti V, Nazemi A, Liu M, Kulik HJ. Using Computational Chemistry To Reveal Nature’s Blueprints for Single-Site Catalysis of C–H Activation. ACS Catal 2022. [DOI: 10.1021/acscatal.2c02096] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 02/08/2023]
Affiliation(s)
- Aditya Nandy
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
  - Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Husain Adamji
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- David W. Kastner
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
  - Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Vyshnavi Vennelakanti
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
  - Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Azadeh Nazemi
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Mingjie Liu
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Heather J. Kulik
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
9
Steimers A, Schneider M. Sources of Risk of AI Systems. Int J Environ Res Public Health 2022; 19:3641. [PMID: 35329328 DOI: 10.3390/ijerph19063641] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/19/2022] [Revised: 03/15/2022] [Accepted: 03/16/2022] [Indexed: 12/04/2022]
Abstract
Artificial intelligence can be used to realise new types of protective devices and assistance systems, so their importance for occupational safety and health is continuously increasing. However, established risk mitigation measures in software development are only partially suitable for applications in AI systems, which also create new sources of risk. Risk management for systems using AI must therefore be adapted to the new problems. This work aims to contribute to this adaptation by identifying relevant sources of risk for AI systems. For this purpose, the differences between AI systems, especially those based on modern machine learning methods, and classical software were analysed, and the current research fields of trustworthy AI were evaluated. On this basis, a taxonomy was created that provides an overview of various AI-specific sources of risk. These new sources of risk should be taken into account in the overall risk assessment of a system based on AI technologies, examined for their criticality, and managed accordingly at an early stage to prevent a later system failure.
10
Duan C, Chu DBK, Nandy A, Kulik HJ. Detection of multi-reference character imbalances enables a transfer learning approach for virtual high throughput screening with coupled cluster accuracy at DFT cost. Chem Sci 2022; 13:4962-4971. [PMID: 35655882 PMCID: PMC9067623 DOI: 10.1039/d2sc00393g] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 01/21/2022] [Accepted: 04/04/2022] [Indexed: 01/08/2023]
Abstract
Appropriately identifying and treating molecules and materials with significant multi-reference (MR) character is crucial for achieving high data fidelity in virtual high-throughput screening (VHTS). Despite development of numerous MR diagnostics, the extent to which a single value of such a diagnostic indicates the MR effect on a chemical property prediction is not well established. We evaluate MR diagnostics for over 10 000 transition-metal complexes (TMCs) and compare to those for organic molecules. We observe that only some MR diagnostics are transferable from one chemical space to another. By studying the influence of MR character on chemical properties (i.e., MR effect) that involve multiple potential energy surfaces (i.e., adiabatic spin splitting, ΔE_{H–L}, and ionization potential, IP), we show that differences in MR character are more important than the cumulative degree of MR character in predicting the magnitude of an MR effect. Motivated by this observation, we build transfer learning models to predict CCSD(T)-level adiabatic ΔE_{H–L} and IP from lower levels of theory. By combining these models with uncertainty quantification and multi-level modeling, we introduce a multi-pronged strategy that accelerates data acquisition by at least a factor of three while achieving coupled cluster accuracy (i.e., to within 1 kcal mol⁻¹ MAE) for robust VHTS. We demonstrate that cancellation in multi-reference effect outweighs accumulation in evaluating chemical properties. We combine transfer learning and uncertainty quantification for accelerated data acquisition with chemical accuracy.
Affiliation(s)
- Chenru Duan
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
  - Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Daniel B. K. Chu
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Aditya Nandy
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
  - Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Heather J. Kulik
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA