1. Chua A, Hirn M, Little A. On Generalizations of the Nonwindowed Scattering Transform. Applied and Computational Harmonic Analysis 2024; 68:101597. [PMID: 37810532 PMCID: PMC10552568 DOI: 10.1016/j.acha.2023.101597]
Abstract
In this paper, we generalize finite depth wavelet scattering transforms, which we formulate as L^q(ℝ^n) norms of a cascade of continuous wavelet transforms (or dyadic wavelet transforms) and contractive nonlinearities. We then provide norms for these operators, prove that these operators are well-defined, and are Lipschitz continuous to the action of C^2 diffeomorphisms in specific cases. Lastly, we extend our results to formulate an operator invariant to the action of rotations R ∈ SO(n) and an operator that is equivariant to the action of rotations of R ∈ SO(n).
Affiliation(s)
- Albert Chua
  - Department of Mathematics, Michigan State University, East Lansing, MI, 48824 USA
- Matthew Hirn
  - Department of Mathematics, Michigan State University, East Lansing, MI, 48824 USA
  - Department of Computational Mathematics, Science & Engineering, Michigan State University, East Lansing, MI, 48824 USA
  - Center for Quantum Computing, Science & Engineering, Michigan State University, East Lansing, MI, 48824 USA
- Anna Little
  - Department of Mathematics and the Utah Center For Data Science, University of Utah, Salt Lake City, UT, 84112 USA
2. Saydjari AK, Finkbeiner DP. Equivariant Wavelets: Fast Rotation and Translation Invariant Wavelet Scattering Transforms. IEEE Transactions on Pattern Analysis and Machine Intelligence 2023; 45:1716-1731. [PMID: 35389861 DOI: 10.1109/tpami.2022.3165730]
Abstract
Wavelet scattering networks, which are convolutional neural networks (CNNs) with fixed filters and weights, are promising tools for image analysis. Imposing symmetry on image statistics can improve human interpretability, aid in generalization, and provide dimension reduction. In this work, we introduce a fast-to-compute, translationally invariant and rotationally equivariant wavelet scattering network (EqWS) and filter bank of wavelets (triglets). We demonstrate the interpretability and quantify the invariance/equivariance of the coefficients, briefly commenting on difficulties with implementing scale equivariance. On MNIST, we show that training on a rotationally invariant reduction of the coefficients maintains rotational invariance when generalized to test data and visualize residual symmetry breaking terms. Rotation equivariance is leveraged to estimate the rotation angle of digits and reconstruct the full rotation dependence of each coefficient from a single angle. We benchmark EqWS with linear classifiers on EMNIST and CIFAR-10/100, introducing a new second-order, cross-color channel coupling for the color images. We conclude by comparing the performance of an isotropic reduction of the scattering coefficients and RWST, a previous coefficient reduction, on an isotropic classification of magnetohydrodynamic simulations with astrophysical relevance.
3. He Y, Cheng P, Yang S, Zhang J. Three-Dimensional Face Recognition Using Solid Harmonic Wavelet Scattering and Homotopy Dictionary Learning. Entropy (Basel, Switzerland) 2022; 24:1646. [PMID: 36421501 PMCID: PMC9689438 DOI: 10.3390/e24111646] [Received: 08/30/2022] [Revised: 11/02/2022] [Accepted: 11/08/2022]
Abstract
Data representation has been one of the core topics in 3D graphics and pattern recognition in high-dimensional data. Although the high-resolution geometrical information of a physical object can be well preserved in the form of metrical data, e.g., point clouds/triangular meshes, from a regular data (e.g., image/audio) processing perspective, they also bring excessive noise in the course of feature abstraction and regression. For 3D face recognition, preceding attempts focus on treating the scan samples as signals laying on an underlying discrete surface (mesh) or morphable (statistic) models and by embedding auxiliary information, e.g., texture onto the regularized local planar structure to obtain a superior expressive performance to registration-based methods, but environmental variations such as posture/illumination will dissatisfy the integrity or uniform sampling condition, which holistic models generally rely on. In this paper, a geometric deep learning framework for face recognition is proposed, which merely requires the consumption of raw spatial coordinates. The non-uniformity and non-grid geometric transformations in the course of point cloud face scanning are mitigated by modeling each identity as a stochastic process. Individual face scans are considered realizations, yielding underlying inherent distributions under the appropriate assumption of ergodicity. To accomplish 3D facial recognition, we propose a windowed solid harmonic scattering transform on point cloud face scans to extract the invariant coefficients so that unrelated variations can be encoded into certain components of the scattering domain. With these constructions, a sparse learning network as the semi-supervised classification backbone network can work on reducing intraclass variability. 
Our framework obtained superior performance to current competing methods; without excluding any fragmentary or severely deformed samples, the rank-1 recognition rate (RR1) achieved was 99.84% on the Face Recognition Grand Challenge (FRGC) v2.0 dataset and 99.90% on the Bosphorus dataset.
Affiliation(s)
- Yi He
  - National Key Laboratory of Fundamental Science on Synthetic Vision, Sichuan University, Chengdu 610065, China
- Peng Cheng
  - College of Computer Science, Sichuan University, Chengdu 610065, China
- Shanmin Yang
  - School of Computer Science, Chengdu University of Information Technology, Chengdu 610065, China
- Jianwei Zhang
  - College of Computer Science, Sichuan University, Chengdu 610065, China
4. Parsaeifard B, Goedecker S. Manifolds of quasi-constant SOAP and ACSF fingerprints and the resulting failure to machine learn four-body interactions. J Chem Phys 2022; 156:034302. [DOI: 10.1063/5.0070488]
Affiliation(s)
- Behnam Parsaeifard
  - Department of Physics, University of Basel, Klingelbergstrasse 82, CH-4056 Basel, Switzerland
- Stefan Goedecker
  - Department of Physics, University of Basel, Klingelbergstrasse 82, CH-4056 Basel, Switzerland
5. Hirn M, Little A. Wavelet invariants for statistically robust multi-reference alignment. Information and Inference: A Journal of the IMA 2021; 10:1287-1351. [PMID: 35070296 PMCID: PMC8782248 DOI: 10.1093/imaiai/iaaa016]
Abstract
We propose a nonlinear, wavelet-based signal representation that is translation invariant and robust to both additive noise and random dilations. Motivated by the multi-reference alignment problem and generalizations thereof, we analyze the statistical properties of this representation given a large number of independent corruptions of a target signal. We prove the nonlinear wavelet-based representation uniquely defines the power spectrum but allows for an unbiasing procedure that cannot be directly applied to the power spectrum. After unbiasing the representation to remove the effects of the additive noise and random dilations, we recover an approximation of the power spectrum by solving a convex optimization problem, and thus reduce to a phase retrieval problem. Extensive numerical experiments demonstrate the statistical robustness of this approximation procedure.
Affiliation(s)
- Matthew Hirn
  - Department of Computational Mathematics, Science and Engineering, Department of Mathematics and Center for Quantum Computing, Science and Engineering, Michigan State University, East Lansing, MI 48824
- Anna Little
  - Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, MI 48824
6.
Abstract
Chemical compound space (CCS), the set of all theoretically conceivable combinations of chemical elements and (meta-)stable geometries that make up matter, is colossal. The first-principles based virtual sampling of this space, for example, in search of novel molecules or materials which exhibit desirable properties, is therefore prohibitive for all but the smallest subsets and simplest properties. We review studies aimed at tackling this challenge using modern machine learning techniques based on (i) synthetic data, typically generated using quantum mechanics based methods, and (ii) model architectures inspired by quantum mechanics. Such Quantum mechanics based Machine Learning (QML) approaches combine the numerical efficiency of statistical surrogate models with an ab initio view on matter. They rigorously reflect the underlying physics in order to reach universality and transferability across CCS. While state-of-the-art approximations to quantum problems impose severe computational bottlenecks, recent QML based developments indicate the possibility of substantial acceleration without sacrificing the predictive power of quantum mechanics.
Affiliation(s)
- Bing Huang
  - Faculty of Physics, University of Vienna, 1090 Vienna, Austria
- O. Anatole von Lilienfeld
  - Faculty of Physics, University of Vienna, 1090 Vienna, Austria
  - Institute of Physical Chemistry and National Center for Computational Design and Discovery of Novel Materials (MARVEL), Department of Chemistry, University of Basel, 4056 Basel, Switzerland
7. Dobbelaere MR, Plehiers PP, Van de Vijver R, Stevens CV, Van Geem KM. Learning Molecular Representations for Thermochemistry Prediction of Cyclic Hydrocarbons and Oxygenates. J Phys Chem A 2021; 125:5166-5179. [PMID: 34081474 DOI: 10.1021/acs.jpca.1c01956]
Abstract
Accurate thermochemistry estimation of polycyclic molecules is crucial for kinetic modeling of chemical processes that use renewable and alternative feedstocks. In kinetic model generators, molecular properties are estimated rapidly with group additivity, but this method is known to have limitations for polycyclic structures. This issue has been resolved in our work by combining a geometry-based molecular representation with a deep neural network trained on ab initio data. Each molecule is transformed into a probabilistic vector from its interatomic distances, bond angles, and dihedral angles. The model is tested on a small experimental dataset (200 molecules) from the literature, a new medium-sized set (4000 molecules) with both open-shell and closed-shell species, calculated at the CBS-QB3 level with empirical corrections, and a large G4MP2-level QM9-based dataset (40 000 molecules). Heat capacities between 298.15 and 2500 K are calculated in the medium set with an average deviation of about 1.5 J mol⁻¹ K⁻¹ and the standard entropy at 298.15 K is predicted with an average error below 4 J mol⁻¹ K⁻¹. The standard enthalpy of formation at 298.15 K has an average out-of-sample error below 4 kJ mol⁻¹ on a QM9 training set size of around 15 000 molecules. By fitting NASA polynomials, the enthalpy of formation at higher temperatures can be calculated with the same accuracy as the standard enthalpy of formation. Uncertainty quantification by means of the ensemble standard deviation is included to indicate when molecules that are on the edge or outside of the application range of the model are evaluated.
Affiliation(s)
- Maarten R Dobbelaere
  - Laboratory for Chemical Technology, Department of Materials, Textiles and Chemical Engineering, Ghent University, Technologiepark 125, 9052 Gent, Belgium
- Pieter P Plehiers
  - Laboratory for Chemical Technology, Department of Materials, Textiles and Chemical Engineering, Ghent University, Technologiepark 125, 9052 Gent, Belgium
- Ruben Van de Vijver
  - Laboratory for Chemical Technology, Department of Materials, Textiles and Chemical Engineering, Ghent University, Technologiepark 125, 9052 Gent, Belgium
- Christian V Stevens
  - SynBioC Research Group, Department of Green Chemistry and Technology, Faculty of Bioscience Engineering, Ghent University, Coupure Links 653, 9000 Gent, Belgium
- Kevin M Van Geem
  - Laboratory for Chemical Technology, Department of Materials, Textiles and Chemical Engineering, Ghent University, Technologiepark 125, 9052 Gent, Belgium
8. Parsaeifard B, Sankar De D, Christensen AS, Faber FA, Kocer E, De S, Behler J, Anatole von Lilienfeld O, Goedecker S. An assessment of the structural resolution of various fingerprints commonly used in machine learning. Machine Learning: Science and Technology 2021. [DOI: 10.1088/2632-2153/abb212]
9. Lu J, Xia S, Lu J, Zhang Y. Dataset Construction to Explore Chemical Space with 3D Geometry and Deep Learning. J Chem Inf Model 2021; 61:1095-1104. [PMID: 33683885 DOI: 10.1021/acs.jcim.1c00007]
Abstract
A dataset is the basis of deep learning model development, and the success of deep learning models heavily relies on the quality and size of the dataset. In this work, we present a new data preparation protocol and build a large fragment-based dataset Frag20, which consists of optimized 3D geometries and calculated molecular properties from Merck molecular force field (MMFF) and DFT at the B3LYP/6-31G* level of theory for more than half a million molecules composed of H, B, C, O, N, F, P, S, Cl, and Br with no larger than 20 heavy atoms. Based on the new dataset, we develop robust molecular energy prediction models using a simplified PhysNet architecture for both DFT-optimized and MMFF-optimized geometries, which achieve better than or close to chemical accuracy (1 kcal/mol) on multiple test sets, including CSD20 and Plati20 based on experimental crystal structures.
Affiliation(s)
- Jianing Lu
  - Department of Chemistry, New York University, New York, New York 10003, United States
- Song Xia
  - Department of Chemistry, New York University, New York, New York 10003, United States
- Jieyu Lu
  - Department of Chemistry, New York University, New York, New York 10003, United States
- Yingkai Zhang
  - Department of Chemistry, New York University, New York, New York 10003, United States
  - NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China
10. Vassilev-Galindo V, Fonseca G, Poltavsky I, Tkatchenko A. Challenges for machine learning force fields in reproducing potential energy surfaces of flexible molecules. J Chem Phys 2021; 154:094119. [DOI: 10.1063/5.0038516]
Affiliation(s)
- Valentin Vassilev-Galindo
  - Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
- Gregory Fonseca
  - Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
- Igor Poltavsky
  - Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
- Alexandre Tkatchenko
  - Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
11. Allen AEA, Dusson G, Ortner C, Csányi G. Atomic permutationally invariant polynomials for fitting molecular force fields. Machine Learning: Science and Technology 2021. [DOI: 10.1088/2632-2153/abd51e]
12. Grisafi A, Nigam J, Ceriotti M. Multi-scale approach for the prediction of atomic scale properties. Chem Sci 2020; 12:2078-2090. [PMID: 34163971 PMCID: PMC8179303 DOI: 10.1039/d0sc04934d]
Abstract
Electronic nearsightedness is one of the fundamental principles that governs the behavior of condensed matter and supports its description in terms of local entities such as chemical bonds. Locality also underlies the tremendous success of machine-learning schemes that predict quantum mechanical observables - such as the cohesive energy, the electron density, or a variety of response properties - as a sum of atom-centred contributions, based on a short-range representation of atomic environments. One of the main shortcomings of these approaches is their inability to capture physical effects ranging from electrostatic interactions to quantum delocalization, which have a long-range nature. Here we show how to build a multi-scale scheme that combines in the same framework local and non-local information, overcoming such limitations. We show that the simplest version of such features can be put in formal correspondence with a multipole expansion of permanent electrostatics. The data-driven nature of the model construction, however, makes this simple form suitable to tackle also different types of delocalized and collective effects. We present several examples that range from molecular physics to surface science and biophysics, demonstrating the ability of this multi-scale approach to model interactions driven by electrostatics, polarization and dispersion, as well as the cooperative behavior of dielectric response functions.
Affiliation(s)
- Andrea Grisafi
  - Laboratory of Computational Science and Modeling, IMX, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
- Jigyasa Nigam
  - Laboratory of Computational Science and Modeling, IMX, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
  - National Centre for Computational Design and Discovery of Novel Materials (MARVEL), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
  - Indian Institute of Space Science and Technology, Thiruvananthapuram 695547, India
- Michele Ceriotti
  - Laboratory of Computational Science and Modeling, IMX, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
  - National Centre for Computational Design and Discovery of Novel Materials (MARVEL), École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
13. Bogojeski M, Vogt-Maranto L, Tuckerman ME, Müller KR, Burke K. Quantum chemical accuracy from density functional approximations via machine learning. Nat Commun 2020; 11:5223. [PMID: 33067479 PMCID: PMC7567867 DOI: 10.1038/s41467-020-19093-1] [Received: 06/01/2020] [Accepted: 09/24/2020]
Abstract
Kohn-Sham density functional theory (DFT) is a standard tool in most branches of chemistry, but accuracies for many molecules are limited to 2-3 kcal·mol⁻¹ with presently-available functionals. Ab initio methods, such as coupled-cluster, routinely produce much higher accuracy, but computational costs limit their application to small molecules. In this paper, we leverage machine learning to calculate coupled-cluster energies from DFT densities, reaching quantum chemical accuracy (errors below 1 kcal·mol⁻¹) on test data. Moreover, density-based Δ-learning (learning only the correction to a standard DFT calculation, termed Δ-DFT) significantly reduces the amount of training data required, particularly when molecular symmetries are included. The robustness of Δ-DFT is highlighted by correcting "on the fly" DFT-based molecular dynamics (MD) simulations of resorcinol (C₆H₄(OH)₂) to obtain MD trajectories with coupled-cluster accuracy. We conclude, therefore, that Δ-DFT facilitates running gas-phase MD simulations with quantum chemical accuracy, even for strained geometries and conformer changes where standard DFT fails.
Affiliation(s)
- Mihail Bogojeski
  - Machine Learning Group, Technische Universität Berlin, Marchstr. 23, 10587, Berlin, Germany
- Mark E Tuckerman
  - Department of Chemistry, New York University, New York, NY, 10003, USA.
  - Courant Institute of Mathematical Science, New York University, New York, NY, 10012, USA.
  - NYU-ECNU Center for Computational Chemistry at NYU Shanghai, 3663 Zhongshan Road North, Shanghai, 200062, China.
- Klaus-Robert Müller
  - Machine Learning Group, Technische Universität Berlin, Marchstr. 23, 10587, Berlin, Germany.
  - Department of Artificial Intelligence, Korea University, Anam-dong, Seongbuk-gu, Seoul, 02841, Korea.
  - Max-Planck-Institut für Informatik, Stuhlsatzenhausweg, 66123, Saarbrücken, Germany.
- Kieron Burke
  - Department of Physics and Astronomy, University of California, Irvine, CA, 92697, USA.
  - Department of Chemistry, University of California, Irvine, CA, 92697, USA.
14. Sauceda HE, Gastegger M, Chmiela S, Müller KR, Tkatchenko A. Molecular force fields with gradient-domain machine learning (GDML): Comparison and synergies with classical force fields. J Chem Phys 2020; 153:124109. [DOI: 10.1063/5.0023005]
Affiliation(s)
- Huziel E. Sauceda
  - Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg, Luxembourg
  - Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
  - BASLEARN, BASF-TU Joint Lab, Technische Universität Berlin, 10587 Berlin, Germany
- Michael Gastegger
  - Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
  - BASLEARN, BASF-TU Joint Lab, Technische Universität Berlin, 10587 Berlin, Germany
  - DFG Cluster of Excellence “Unifying Systems in Catalysis” (UniSysCat), Technische Universität Berlin, 10623 Berlin, Germany
- Stefan Chmiela
  - Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
- Klaus-Robert Müller
  - Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
  - Department of Artificial Intelligence, Korea University, Anam-dong, Seongbuk-gu, Seoul 136-713, South Korea
  - Max Planck Institute for Informatics, Stuhlsatzenhausweg, 66123 Saarbrücken, Germany
  - Google Research, Brain Team, Berlin, Germany
- Alexandre Tkatchenko
  - Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg, Luxembourg
15. Sinz P, Swift MW, Brumwell X, Liu J, Kim KJ, Qi Y, Hirn M. Wavelet scattering networks for atomistic systems with extrapolation of material properties. J Chem Phys 2020; 153:084109. [DOI: 10.1063/5.0016020]
Affiliation(s)
- Paul Sinz
  - Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, Michigan 48824-1226, USA
- Michael W. Swift
  - Department of Chemical Engineering and Materials Science, Michigan State University, East Lansing, Michigan 48824-1226, USA
- Xavier Brumwell
  - Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, Michigan 48824-1226, USA
- Jialin Liu
  - Department of Chemical Engineering and Materials Science, Michigan State University, East Lansing, Michigan 48824-1226, USA
- Kwang Jin Kim
  - Department of Chemical Engineering and Materials Science, Michigan State University, East Lansing, Michigan 48824-1226, USA
- Yue Qi
  - Department of Chemical Engineering and Materials Science, Michigan State University, East Lansing, Michigan 48824-1226, USA
- Matthew Hirn
  - Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, Michigan 48824-1226, USA
  - Department of Mathematics, Michigan State University, East Lansing, Michigan 48824-1226, USA
  - Center for Quantum Computing, Science and Engineering, Michigan State University, East Lansing, Michigan 48824-1226, USA
16. Çaylak O, Anatole von Lilienfeld O, Baumeier B. Wasserstein metric for improved quantum machine learning with adjacency matrix representations. Machine Learning: Science and Technology 2020. [DOI: 10.1088/2632-2153/aba048]
17. Gkeka P, Stoltz G, Barati Farimani A, Belkacemi Z, Ceriotti M, Chodera JD, Dinner AR, Ferguson AL, Maillet JB, Minoux H, Peter C, Pietrucci F, Silveira A, Tkatchenko A, Trstanova Z, Wiewiora R, Lelièvre T. Machine Learning Force Fields and Coarse-Grained Variables in Molecular Dynamics: Application to Materials and Biological Systems. J Chem Theory Comput 2020; 16:4757-4775. [PMID: 32559068 PMCID: PMC8312194 DOI: 10.1021/acs.jctc.0c00355]
Abstract
Machine learning encompasses tools and algorithms that are now becoming popular in almost all scientific and technological fields. This is true for molecular dynamics as well, where machine learning offers promises of extracting valuable information from the enormous amounts of data generated by simulation of complex systems. We provide here a review of our current understanding of goals, benefits, and limitations of machine learning techniques for computational studies on atomistic systems, focusing on the construction of empirical force fields from ab initio databases and the determination of reaction coordinates for free energy computation and enhanced sampling.
Affiliation(s)
- Paraskevi Gkeka
  - Integrated Drug Discovery, Sanofi R&D, 91385 Chilly-Mazarin, France
- Gabriel Stoltz
  - CERMICS, Ecole des Ponts, Marne-la-Vallée, France
  - Matherials Project-Team, Inria Paris, 75012 Paris, France
- Zineb Belkacemi
  - Integrated Drug Discovery, Sanofi R&D, 91385 Chilly-Mazarin, France
  - CERMICS, Ecole des Ponts, Marne-la-Vallée, France
- Michele Ceriotti
  - Laboratory of Computational Science and Modelling, Institute of Materials, École Polytechnique Fédérale de Lausanne, CH-1015 Lausanne, Switzerland
- John D Chodera
  - Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, New York 10065, United States
- Aaron R Dinner
  - Department of Chemistry, The University of Chicago, Chicago, Illinois 60637, United States
- Andrew L Ferguson
  - Pritzker School of Molecular Engineering, University of Chicago, 5640 South Ellis Avenue, Chicago, Illinois 60637, United States
- Hervé Minoux
  - Integrated Drug Discovery, Sanofi R&D, 94403 Vitry-sur-Seine, France
- Fabio Pietrucci
  - UMR CNRS 7590, MNHN, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, Sorbonne Université, 75005 Paris, France
- Ana Silveira
  - Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, New York 10065, United States
- Alexandre Tkatchenko
  - Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
- Zofia Trstanova
  - School of Mathematics, The University of Edinburgh, Edinburgh EH9 3FD, U.K.
- Rafal Wiewiora
  - Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, New York 10065, United States
- Tony Lelièvre
  - CERMICS, Ecole des Ponts, Marne-la-Vallée, France
  - Matherials Project-Team, Inria Paris, 75012 Paris, France
18. Haghighatlari M, Li J, Heidar-Zadeh F, Liu Y, Guan X, Head-Gordon T. Learning to Make Chemical Predictions: the Interplay of Feature Representation, Data, and Machine Learning Methods. Chem 2020; 6:1527-1542. [PMID: 32695924 PMCID: PMC7373218 DOI: 10.1016/j.chempr.2020.05.014]
Abstract
Recently supervised machine learning has been ascending in providing new predictive approaches for chemical, biological and materials sciences applications. In this Perspective we focus on the interplay of machine learning method with the chemically motivated descriptors and the size and type of data sets needed for molecular property prediction. Using Nuclear Magnetic Resonance chemical shift prediction as an example, we demonstrate that success is predicated on the choice of feature extracted or real-space representations of chemical structures, whether the molecular property data is abundant and/or experimentally or computationally derived, and how these together will influence the correct choice of popular machine learning methods drawn from deep learning, random forests, or kernel methods.
Affiliation(s)
- Mojtaba Haghighatlari
  - Kenneth S. Pitzer Theory Center and Department of Chemistry, University of California, Berkeley, CA, USA
- Jie Li
  - Kenneth S. Pitzer Theory Center and Department of Chemistry, University of California, Berkeley, CA, USA
- Farnaz Heidar-Zadeh
  - Kenneth S. Pitzer Theory Center and Department of Chemistry, University of California, Berkeley, CA, USA
  - Center for Molecular Modeling (CMM), Ghent University, B-9052 Ghent, Belgium
  - Department of Chemistry, Queen's University, Kingston, Ontario K7L 3N6, Canada
- Yuchen Liu
  - Kenneth S. Pitzer Theory Center and Department of Chemistry, University of California, Berkeley, CA, USA
- Xingyi Guan
  - Kenneth S. Pitzer Theory Center and Department of Chemistry, University of California, Berkeley, CA, USA
- Teresa Head-Gordon
  - Kenneth S. Pitzer Theory Center and Department of Chemistry, University of California, Berkeley, CA, USA
  - Chemical Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
  - Departments of Bioengineering and Chemical and Biomolecular Engineering, University of California, Berkeley, CA, USA
19. von Lilienfeld OA, Müller KR, Tkatchenko A. Exploring chemical compound space with quantum-based machine learning. Nat Rev Chem 2020; 4:347-358. [PMID: 37127950 DOI: 10.1038/s41570-020-0189-9] [Accepted: 04/23/2020]
Abstract
Rational design of compounds with specific properties requires understanding and fast evaluation of molecular properties throughout chemical compound space - the huge set of all potentially stable molecules. Recent advances in combining quantum-mechanical calculations with machine learning provide powerful tools for exploring wide swathes of chemical compound space. We present our perspective on this exciting and quickly developing field by discussing key advances in the development and applications of quantum-mechanics-based machine-learning methods to diverse compounds and properties, and outlining the challenges ahead. We argue that significant progress in the exploration and understanding of chemical compound space can be made through a systematic combination of rigorous physical theories, comprehensive synthetic data sets of microscopic and macroscopic properties, and modern machine-learning methods that account for physical and chemical knowledge.
|
20
|
Gu GH, Noh J, Kim S, Back S, Ulissi Z, Jung Y. Practical Deep-Learning Representation for Fast Heterogeneous Catalyst Screening. J Phys Chem Lett 2020; 11:3185-3191. [PMID: 32191473 DOI: 10.1021/acs.jpclett.0c00634] [Citation(s) in RCA: 35] [Impact Index Per Article: 8.8] [Indexed: 05/27/2023]
Abstract
The binding site and energy is an invaluable descriptor in high-throughput screening of catalysts, as it is accessible and correlates with the activity and selectivity. Recently, comprehensive binding energy prediction machine-learning models have been demonstrated and promise to accelerate catalyst screening. Here, we present a simple and versatile representation, applicable to any deep-learning model, to further accelerate such a process. Our approach involves labeling the binding site atoms of the unrelaxed bare surface geometry; hence, for the model application, density functional theory calculations can be completely removed if the optimized bulk structure is available, as is the case when using the Materials Project database. In addition, we present ensemble learning, where a set of predictions is used together to form a predictive distribution that reduces the model bias. We apply the labeled site approach and ensemble to a crystal graph convolutional neural network and the ∼40 000 data set of alloy catalysts for CO2 reduction. The proposed model applied to the data set of unrelaxed structures shows 0.116 and 0.085 eV mean absolute error, respectively, for CO and H binding energy, better than the best method (0.13 and 0.13 eV) in the literature that requires costly geometry relaxations. The analysis of the model parameters demonstrates that the model can effectively learn the chemical information related to the binding site.
Affiliation(s)
- Geun Ho Gu
- Department of Chemical and Biomolecular Engineering, Korea Advanced Institute of Science and Technology, Daejeon 34141, South Korea
| | - Juhwan Noh
- Department of Chemical and Biomolecular Engineering, Korea Advanced Institute of Science and Technology, Daejeon 34141, South Korea
| | - Sungwon Kim
- Department of Chemical and Biomolecular Engineering, Korea Advanced Institute of Science and Technology, Daejeon 34141, South Korea
| | - Seoin Back
- Department of Chemical and Biomolecular Engineering, Sogang University, Seoul 04107, South Korea
| | - Zachary Ulissi
- Department of Chemical Engineering, Carnegie Mellon University, Pittsburgh 15213, Pennsylvania, United States
| | - Yousung Jung
- Department of Chemical and Biomolecular Engineering, Korea Advanced Institute of Science and Technology, Daejeon 34141, South Korea
| |
|
21
|
Smith JS, Zubatyuk R, Nebgen B, Lubbers N, Barros K, Roitberg AE, Isayev O, Tretiak S. The ANI-1ccx and ANI-1x data sets, coupled-cluster and density functional theory properties for molecules. Sci Data 2020; 7:134. [PMID: 32358545 PMCID: PMC7195467 DOI: 10.1038/s41597-020-0473-z] [Citation(s) in RCA: 76] [Impact Index Per Article: 19.0] [Received: 10/31/2019] [Accepted: 03/24/2020] [Indexed: 11/22/2022]
Abstract
Maximum diversification of data is a central theme in building generalized and accurate machine learning (ML) models. In chemistry, ML has been used to develop models for predicting molecular properties, for example quantum mechanics (QM) calculated potential energy surfaces and atomic charge models. The ANI-1x and ANI-1ccx ML-based general-purpose potentials for organic molecules were developed through active learning, an automated data diversification process. Here, we describe the ANI-1x and ANI-1ccx data sets. To demonstrate data diversity, we visualize it with a dimensionality reduction scheme, and contrast against existing data sets. The ANI-1x data set contains multiple QM properties from 5 M density functional theory calculations, while the ANI-1ccx data set contains 500 k data points obtained with an accurate CCSD(T)/CBS extrapolation. Approximately 14 million CPU core-hours were expended to generate this data. Multiple QM calculated properties for the chemical elements C, H, N, and O are provided: energies, atomic forces, multipole moments, atomic charges, etc. We provide this data to the community to aid research and development of ML models for chemistry.
Affiliation(s)
- Justin S Smith
- Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, NM, USA
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM, USA
| | - Roman Zubatyuk
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM, USA
- Department of Chemistry, Carnegie Mellon University, Pittsburgh, PA, USA
| | - Benjamin Nebgen
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM, USA
| | - Nicholas Lubbers
- Computer, Computational, and Statistical Sciences Division, Los Alamos National Laboratory, Los Alamos, NM, USA
| | - Kipton Barros
- Computer, Computational, and Statistical Sciences Division, Los Alamos National Laboratory, Los Alamos, NM, USA
| | - Adrian E Roitberg
- University of Florida, Department of Chemistry, PO Box 117200, 32611-7200, Gainesville, USA
| | - Olexandr Isayev
- Department of Chemistry, Carnegie Mellon University, Pittsburgh, PA, USA
| | - Sergei Tretiak
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM, USA
| |
|
22
|
Heinen S, Schwilk M, von Rudorff GF, von Lilienfeld OA. Machine learning the computational cost of quantum chemistry. MACHINE LEARNING-SCIENCE AND TECHNOLOGY 2020. [DOI: 10.1088/2632-2153/ab6ac4] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Indexed: 12/11/2022]
|
23
|
Sauceda HE, Chmiela S, Poltavsky I, Müller KR, Tkatchenko A. Construction of Machine Learned Force Fields with Quantum Chemical Accuracy: Applications and Chemical Insights. MACHINE LEARNING MEETS QUANTUM PHYSICS 2020. [DOI: 10.1007/978-3-030-40245-7_14] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 01/02/2023]
|
24
|
Accurate Molecular Dynamics Enabled by Efficient Physically Constrained Machine Learning Approaches. MACHINE LEARNING MEETS QUANTUM PHYSICS 2020. [DOI: 10.1007/978-3-030-40245-7_7] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 12/03/2022]
|
25
|
Uniformly accurate machine learning-based hydrodynamic models for kinetic equations. Proc Natl Acad Sci U S A 2019; 116:21983-21991. [PMID: 31619568 DOI: 10.1073/pnas.1909854116] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 12/21/2022]
Abstract
A framework is introduced for constructing interpretable and truly reliable reduced models for multiscale problems in situations without scale separation. Hydrodynamic approximation to the kinetic equation is used as an example to illustrate the main steps and issues involved. To this end, a set of generalized moments are constructed first to optimally represent the underlying velocity distribution. The well-known closure problem is then solved with the aim of best capturing the associated dynamics of the kinetic equation. The issue of physical constraints such as Galilean invariance is addressed and an active-learning procedure is introduced to help ensure that the dataset used is representative enough. The reduced system takes the form of a conventional moment system and works regardless of the numerical discretization used. Numerical results are presented for the BGK (Bhatnagar-Gross-Krook) model and binary collision of Maxwell molecules. We demonstrate that the reduced model achieves a uniform accuracy in a wide range of Knudsen numbers spanning from the hydrodynamic limit to free molecular flow.
|
26
|
Nudejima T, Ikabata Y, Seino J, Yoshikawa T, Nakai H. Machine-learned electron correlation model based on correlation energy density at complete basis set limit. J Chem Phys 2019; 151:024104. [DOI: 10.1063/1.5100165] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Indexed: 12/21/2022]
Affiliation(s)
- Takuro Nudejima
- Department of Chemistry and Biochemistry, School of Advanced Science and Engineering, Waseda University, 3-4-1 Okubo, Shinjuku-ku, Tokyo 169-8555, Japan
| | - Yasuhiro Ikabata
- Waseda Research Institute for Science and Engineering, Waseda University, 3-4-1 Okubo, Shinjuku-ku, Tokyo 169-8555, Japan
| | - Junji Seino
- Waseda Research Institute for Science and Engineering, Waseda University, 3-4-1 Okubo, Shinjuku-ku, Tokyo 169-8555, Japan
- PRESTO, Japan Science and Technology Agency, 4-1-8 Honcho, Kawaguchi, Saitama 332-0012, Japan
| | - Takeshi Yoshikawa
- Department of Chemistry and Biochemistry, School of Advanced Science and Engineering, Waseda University, 3-4-1 Okubo, Shinjuku-ku, Tokyo 169-8555, Japan
| | - Hiromi Nakai
- Department of Chemistry and Biochemistry, School of Advanced Science and Engineering, Waseda University, 3-4-1 Okubo, Shinjuku-ku, Tokyo 169-8555, Japan
- Waseda Research Institute for Science and Engineering, Waseda University, 3-4-1 Okubo, Shinjuku-ku, Tokyo 169-8555, Japan
- Elements Strategy Initiative for Catalysts and Batteries (ESICB), Kyoto University, Katsura, Kyoto 615-8520, Japan
| |
|
27
|
Sauceda HE, Chmiela S, Poltavsky I, Müller KR, Tkatchenko A. Molecular force fields with gradient-domain machine learning: Construction and application to dynamics of small molecules with coupled cluster forces. J Chem Phys 2019; 150:114102. [DOI: 10.1063/1.5078687] [Citation(s) in RCA: 56] [Impact Index Per Article: 11.2] [Indexed: 11/14/2022]
Affiliation(s)
- Huziel E. Sauceda
- Fritz-Haber-Institut der Max-Planck-Gesellschaft, 14195 Berlin, Germany
| | - Stefan Chmiela
- Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
| | - Igor Poltavsky
- Physics and Materials Science Research Unit, University of Luxembourg, L-1511 Luxembourg, Luxembourg
| | - Klaus-Robert Müller
- Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
- Department of Brain and Cognitive Engineering, Korea University, Anam-dong, Seongbuk-gu, Seoul 02841, South Korea
- Max Planck Institute for Informatics, Stuhlsatzenhausweg, 66123 Saarbrücken, Germany
| | - Alexandre Tkatchenko
- Physics and Materials Science Research Unit, University of Luxembourg, L-1511 Luxembourg, Luxembourg
| |
|
28
|
Towards exact molecular dynamics simulations with machine-learned force fields. Nat Commun 2018; 9:3887. [PMID: 30250077 PMCID: PMC6155327 DOI: 10.1038/s41467-018-06169-2] [Citation(s) in RCA: 335] [Impact Index Per Article: 55.8] [Received: 03/24/2018] [Accepted: 08/22/2018] [Indexed: 12/25/2022]
Abstract
Molecular dynamics (MD) simulations employing classical force fields constitute the cornerstone of contemporary atomistic modeling in chemistry, biology, and materials science. However, the predictive power of these simulations is only as good as the underlying interatomic potential. Classical potentials often fail to faithfully capture key quantum effects in molecules and materials. Here we enable the direct construction of flexible molecular force fields from high-level ab initio calculations by incorporating spatial and temporal physical symmetries into a gradient-domain machine learning (sGDML) model in an automatic data-driven way. The developed sGDML approach faithfully reproduces global force fields at quantum-chemical CCSD(T) level of accuracy and allows converged molecular dynamics simulations with fully quantized electrons and nuclei. We present MD simulations, for flexible molecules with up to a few dozen atoms and provide insights into the dynamical behavior of these molecules. Our approach provides the key missing ingredient for achieving spectroscopic accuracy in molecular simulations. Simultaneous accurate and efficient prediction of molecular properties relies on combined quantum mechanics and machine learning approaches. Here the authors develop a flexible machine-learning force-field with high-level accuracy for molecular dynamics simulations.
|
29
|
Rupp M, von Lilienfeld OA, Burke K. Guest Editorial: Special Topic on Data-Enabled Theoretical Chemistry. J Chem Phys 2018; 148:241401. [DOI: 10.1063/1.5043213] [Citation(s) in RCA: 62] [Impact Index Per Article: 10.3] [Indexed: 01/17/2023]
Affiliation(s)
- Matthias Rupp
- Fritz Haber Institute of the Max Planck Society, Faradayweg 4-6, 14195 Berlin, Germany
| | - O. Anatole von Lilienfeld
- Department of Chemistry, Institute of Physical Chemistry and National Center for Computational Design and Discovery of Novel Materials, University of Basel, 4056 Basel, Switzerland
| | - Kieron Burke
- Departments of Chemistry and Physics, University of California, Irvine, California 92697, USA
| |
|