1
|
Akgüller Ö, Balcı MA, Cioca G. Clustering Molecules at a Large Scale: Integrating Spectral Geometry with Deep Learning. Molecules 2024; 29:3902. [PMID: 39202980 PMCID: PMC11357287 DOI: 10.3390/molecules29163902] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2024] [Revised: 08/14/2024] [Accepted: 08/14/2024] [Indexed: 09/03/2024] Open
Abstract
This study conducts an in-depth analysis of clustering small molecules using spectral geometry and deep learning techniques. We applied a spectral geometric approach to convert molecular structures into triangulated meshes and used the Laplace-Beltrami operator to derive significant geometric features. By examining the eigenvectors of these operators, we captured the intrinsic geometric properties of the molecules, aiding their classification and clustering. The research utilized four deep learning methods: Deep Belief Network, Convolutional Autoencoder, Variational Autoencoder, and Adversarial Autoencoder, each paired with k-means clustering at different cluster sizes. Clustering quality was evaluated using the Calinski-Harabasz and Davies-Bouldin indices, Silhouette Score, and standard deviation. Nonparametric tests were used to assess the impact of topological descriptors on clustering outcomes. Our results show that the DBN + k-means combination is the most effective, particularly at lower cluster counts, demonstrating significant sensitivity to structural variations. This study highlights the potential of integrating spectral geometry with deep learning for precise and efficient molecular clustering.
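Of the quality measures named in this abstract, the Silhouette Score is compact enough to sketch directly. The block below is a minimal standard-library illustration, not the authors' pipeline: the 2D points and cluster labels are made up, and the function name `silhouette_score` is chosen here for illustration. In practice one would more likely call a library implementation on the learned embeddings.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient; assumes at least two clusters."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        # (the point's zero distance to itself does not affect the sum).
        a = sum(math.dist(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(point, q) for q in other) / len(other)
            for key, other in clusters.items()
            if key != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical data: two tight, well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_score(points, [0, 0, 1, 1])  # labels follow the groups
bad = silhouette_score(points, [0, 1, 0, 1])   # labels cut across the groups
print(f"good labelling: {good:.3f}, bad labelling: {bad:.3f}")
```

A score near 1 indicates compact, well-separated clusters, while a negative score indicates points that sit closer to another cluster than to their own.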
Affiliation(s)
- Ömer Akgüller
  - Faculty of Science, Department of Mathematics, Mugla Sitki Kocman University, Muğla 48000, Turkey
- Mehmet Ali Balcı
  - Faculty of Science, Department of Mathematics, Mugla Sitki Kocman University, Muğla 48000, Turkey
- Gabriela Cioca
  - Faculty of Medicine, Preclinical Department, Lucian Blaga University of Sibiu, 550024 Sibiu, Romania
|
2
|
Gao P, Zhang Q, Keely D, Cleveland DW, Ye Y, Zheng W, Shen M, Yu H. Molecular Graph-Based Deep Learning Algorithm Facilitates an Imaging-Based Strategy for Rapid Discovery of Small Molecules Modulating Biomolecular Condensates. J Med Chem 2023; 66:15084-15093. [PMID: 37937963 PMCID: PMC10810226 DOI: 10.1021/acs.jmedchem.3c00490] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2023]
Abstract
Biomolecular condensates are proposed to cause diseases, such as cancer and neurodegeneration, by concentrating proteins at abnormal subcellular loci. Imaging-based compound screens have been used to identify small molecules that reverse or promote biomolecular condensates. However, limitations of conventional imaging-based methods restrict the screening scale. Here, we used a graph convolutional network (GCN)-based computational approach and identified small molecule candidates that reduce the nuclear liquid-liquid phase separation of TAR DNA-binding protein 43 (TDP-43), an essential protein that undergoes phase transition in neurodegenerative diseases. We demonstrated that the GCN-based deep learning algorithm is suitable for spatial information extraction from the molecular graph. Thus, this is a promising method to identify small molecule candidates with novel scaffolds. Furthermore, we validated that these candidates do not affect the normal splicing function of TDP-43. Taken together, a combination of an imaging-based screen and a GCN-based deep learning method dramatically improves the speed and accuracy of the compound screen for biomolecular condensates.
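The abstract does not describe the network architecture, so the following is only a generic sketch of how a single graph-convolution step extracts information from a molecular graph. It assumes the widely used propagation rule ReLU(D^-1/2 (A + I) D^-1/2 H W); the three-atom chain, the one-hot atom features, and the identity weight matrix are hypothetical choices made for illustration.

```python
import math


def matmul(a, b):
    """Plain nested-list matrix product."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def gcn_layer(adjacency, features, weights):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    n = len(adjacency)
    # Add self-loops so each atom keeps part of its own features.
    a_hat = [
        [adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]
    # Symmetric normalisation by the degree of each node.
    inv_sqrt_deg = [1.0 / math.sqrt(sum(row)) for row in a_hat]
    norm = [
        [inv_sqrt_deg[i] * a_hat[i][j] * inv_sqrt_deg[j] for j in range(n)]
        for i in range(n)
    ]
    # Aggregate neighbour features, apply the weights, then ReLU.
    transformed = matmul(matmul(norm, features), weights)
    return [[max(0.0, value) for value in row] for row in transformed]


# Hypothetical heavy-atom chain C-C-O with one-hot features [is_C, is_O].
adjacency = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
features = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
weights = [[1.0, 0.0], [0.0, 1.0]]  # identity, so only the aggregation is visible

out = gcn_layer(adjacency, features, weights)
for row in out:
    print([round(value, 3) for value in row])
```

After one step the middle carbon already carries a nonzero "oxygen" component, which is how stacked layers let each atom's representation reflect progressively larger neighbourhoods.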
Affiliation(s)
- Peng Gao
  - The National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), MD 20850, USA
- Qi Zhang
  - The National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), MD 20850, USA
- Devin Keely
  - Center for Alzheimer’s and Neurodegenerative Diseases, Department of Molecular Biology, Peter O’Donnell Jr. Brain Institute, UT Southwestern Medical Center, TX, 75287, USA
- Don W. Cleveland
  - Department of Cellular and Molecular Medicine, UC San Diego, CA, 92093, USA
- Yihong Ye
  - National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK), National Institutes of Health (NIH), MD 20850, USA
- Wei Zheng
  - The National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), MD 20850, USA
- Min Shen
  - The National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), MD 20850, USA
- Haiyang Yu
  - Center for Alzheimer’s and Neurodegenerative Diseases, Department of Molecular Biology, Peter O’Donnell Jr. Brain Institute, UT Southwestern Medical Center, TX, 75287, USA
|
3
|
Christiansen MPV, Rønne N, Hammer B. Atomistic Global Optimization X: A Python package for optimization of atomistic structures. J Chem Phys 2022; 157:054701. [DOI: 10.1063/5.0094165] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022] Open
Abstract
Modelling and understanding properties of materials from first principles require knowledge of the underlying atomistic structure. This entails knowing the individual chemical identity and position of all atoms involved. Obtaining such information for macro-molecules, nano-particles, clusters, and for the surface, interface, and bulk phases of amorphous and solid materials represents a difficult high-dimensional global optimization problem. The rise of machine learning techniques in materials science has, however, led to many compelling developments that may speed up structure searches. The complexity of such new methods has prompted a need for an efficient way of assembling them into global optimization algorithms that can be experimented with. In this paper, we introduce the Atomistic Global Optimization X (AGOX) framework and code, as a customizable approach that enables efficient building and testing of global optimization algorithms. A modular way of expressing global optimization algorithms is described and modern programming practices are used to enable that modularity in the freely available AGOX Python package. A number of examples of global optimization approaches are implemented and analyzed. This ranges from random search and basin-hopping to machine learning aided approaches with on-the-fly learnt surrogate energy landscapes. The methods are show-cased on problems ranging from supported clusters over surface reconstructions to large carbon clusters and metal-nitride clusters incorporated into graphene sheets.
Affiliation(s)
- Nikolaj Rønne
  - Department of Physics and Astronomy, Aarhus University, Denmark
- Bjørk Hammer
  - Department of Physics and Astronomy and Interdisciplinary Nanoscience Center (iNANO), Aarhus University, Denmark
|
4
|
Bauer MN, Probert MIJ, Panosetti C. Systematic Comparison of Genetic Algorithm and Basin Hopping Approaches to the Global Optimization of Si(111) Surface Reconstructions. J Phys Chem A 2022; 126:3043-3056. [PMID: 35522778 PMCID: PMC9126620 DOI: 10.1021/acs.jpca.2c00647] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 12/05/2022]
Abstract
We present a systematic study of two widely used material structure prediction methods, the Genetic Algorithm and Basin Hopping approaches to global optimization, in a search for the 3 × 3, 5 × 5, and 7 × 7 reconstructions of the Si(111) surface. The Si(111) 7 × 7 reconstruction is the largest and most complex surface reconstruction known, and finding it is a very exacting test for global optimization methods. In this paper, we introduce a modification to previous Genetic Algorithm work on structure search for periodic systems, to allow the efficient search for surface reconstructions, and present a rigorous study of the effect of the different parameters of the algorithm. We also perform a detailed comparison with the recently improved Basin Hopping algorithm using Delocalized Internal Coordinates. Both algorithms succeeded in either resolving the 3 × 3, 5 × 5, and 7 × 7 DAS surface reconstructions or getting “sufficiently close”, i.e., identifying structures that only differ for the positions of a few atoms as well as thermally accessible structures within k_B T/unit area of the global minimum, with T = 300 K. Overall, the Genetic Algorithm is more robust with respect to parameter choice and in success rate, while the Basin Hopping method occasionally exhibits some advantages in speed of convergence. In line with previous studies, the results confirm that robustness, success, and speed of convergence of either approach are strongly influenced by how much the trial moves tend to preserve favorable bonding patterns once these appear.
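For readers unfamiliar with Basin Hopping, the generic loop (perturb, relax to the nearest local minimum, accept or reject with a Metropolis test) can be sketched on a toy problem. This is not the paper's implementation, which works on surface structures with delocalized internal coordinates; the 1D energy function, the gradient-descent relaxation, and every parameter value below are hypothetical.

```python
import math
import random


def energy(x):
    # Hypothetical 1D landscape with several local minima.
    return x * x + 10.0 * math.sin(x)


def local_minimize(x, step=0.01, iters=2000):
    # Plain gradient descent using the analytic derivative of energy().
    for _ in range(iters):
        x -= step * (2.0 * x + 10.0 * math.cos(x))
    return x


def basin_hopping(x0, hops=200, jump=2.0, kT=1.0, seed=0):
    rng = random.Random(seed)
    current = local_minimize(x0)
    current_e = energy(current)
    best, best_e = current, current_e
    for _ in range(hops):
        # Trial move: random displacement followed by local relaxation.
        trial = local_minimize(current + rng.uniform(-jump, jump))
        trial_e = energy(trial)
        # Metropolis criterion applied to the relaxed energies.
        accept = trial_e <= current_e or rng.random() < math.exp(
            -(trial_e - current_e) / kT
        )
        if accept:
            current, current_e = trial, trial_e
            if current_e < best_e:
                best, best_e = current, current_e
    return best, best_e


start_e = energy(local_minimize(4.0))
best_x, best_e = basin_hopping(4.0)
print(f"start basin: {start_e:.3f}, best found: {best_e:.3f} at x = {best_x:.3f}")
```

Because acceptance is decided on relaxed energies, the search effectively walks over a staircase of basins rather than the raw landscape. The abstract's closing remark about trial moves that preserve favorable bonding patterns corresponds, in this toy setting, to the choice of the `jump` size.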
Affiliation(s)
- Maximilian N Bauer
  - Department of Physics, University of York, York YO10 5DD, United Kingdom
  - Technical University of Munich, Lichtenbergstraße 4, 85748 Garching, Germany
- Matt I J Probert
  - Department of Physics, University of York, York YO10 5DD, United Kingdom
- Chiara Panosetti
  - Technical University of Munich, Lichtenbergstraße 4, 85748 Garching, Germany
  - Fritz Haber Institute of the Max Planck Society, Faradayweg 4, 14195 Berlin, Germany
|
5
|
Gao P, Xu M, Zhang Q, Chen CZ, Guo H, Ye Y, Zheng W, Shen M. Graph Convolutional Network-Based Screening Strategy for Rapid Identification of SARS-CoV-2 Cell-Entry Inhibitors. J Chem Inf Model 2022; 62:1988-1997. [PMID: 35404596 PMCID: PMC9016773 DOI: 10.1021/acs.jcim.2c00222] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/22/2022] [Indexed: 11/29/2022]
Abstract
The cell entry of SARS-CoV-2 has emerged as an attractive drug development target. We previously reported that the entry of SARS-CoV-2 depends on the cell surface heparan sulfate proteoglycan (HSPG) and the cortex actin, which can be targeted by therapeutic agents identified by conventional drug repurposing screens. However, this drug identification strategy requires laborious library screening, which is time consuming, and often limited number of compounds can be screened. As an alternative approach, we developed and trained a graph convolutional network (GCN)-based classification model using information extracted from experimentally identified HSPG and actin inhibitors. This method allowed us to virtually screen 170,000 compounds, resulting in ∼2000 potential hits. A hit confirmation assay with the uptake of a fluorescently labeled HSPG cargo further shortlisted 256 active compounds. Among them, 16 compounds had modest to strong inhibitory activities against the entry of SARS-CoV-2 pseudotyped particles into Vero E6 cells. These results establish a GCN-based virtual screen workflow for rapid identification of new small molecule inhibitors against validated drug targets.
Affiliation(s)
- Peng Gao
  - The National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Bethesda, Maryland 20850, United States
- Miao Xu
  - The National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Bethesda, Maryland 20850, United States
- Qi Zhang
  - The National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Bethesda, Maryland 20850, United States
  - National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK), National Institutes of Health (NIH), Bethesda, Maryland 20892, United States
- Catherine Z Chen
  - The National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Bethesda, Maryland 20850, United States
- Hui Guo
  - The National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Bethesda, Maryland 20850, United States
- Yihong Ye
  - National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK), National Institutes of Health (NIH), Bethesda, Maryland 20892, United States
- Wei Zheng
  - The National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Bethesda, Maryland 20850, United States
- Min Shen
  - The National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Bethesda, Maryland 20850, United States
|
6
|
Ager Meldgaard S, Köhler J, Lund Mortensen H, Christiansen MPV, Noé F, Hammer B. Generating stable molecules using imitation and reinforcement learning. Machine Learning: Science and Technology 2022. [DOI: 10.1088/2632-2153/ac3eb4] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/11/2022] Open
Abstract
Chemical space is routinely explored by machine learning methods to discover interesting molecules, before time-consuming experimental synthesizing is attempted. However, these methods often rely on a graph representation, ignoring 3D information necessary for determining the stability of the molecules. We propose a reinforcement learning (RL) approach for generating molecules in Cartesian coordinates allowing for quantum chemical prediction of the stability. To improve sample-efficiency we learn basic chemical rules from imitation learning (IL) on the GDB-11 database to create an initial model applicable for all stoichiometries. We then deploy multiple copies of the model conditioned on a specific stoichiometry in a RL setting. The models correctly identify low energy molecules in the database and produce novel isomers not found in the training set. Finally, we apply the model to larger molecules to show how RL further refines the IL model in domains far from the training data.
|
7
|
Modee R, Agarwal S, Verma A, Joshi K, Priyakumar UD. DART: deep learning enabled topological interaction model for energy prediction of metal clusters and its application in identifying unique low energy isomers. Phys Chem Chem Phys 2021; 23:21995-22003. [PMID: 34569568 DOI: 10.1039/d1cp02956h] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/21/2022]
Abstract
Recently, machine learning (ML) has proven to yield fast and accurate predictions of chemical properties to accelerate the discovery of novel molecules and materials. The majority of the work is on organic molecules, and much more work needs to be done for inorganic molecules, especially clusters. In the present work, we introduce a simple topological atomic descriptor called TAD, which encodes chemical environment information of each atom in the cluster. TAD is a simple and interpretable descriptor where each value represents the atom count in three shells. We also introduce the DART deep learning enabled topological interaction model, which uses TAD as a feature vector to predict energies of metal clusters, in our case gallium clusters with sizes ranging from 31 to 70 atoms. The DART model is designed based on the principle that the energy is a function of atomic interactions and allows us to model these complex atomic interactions to predict the energy. We further introduce a new dataset called GNC_31-70, which comprises structures and DFT optimized energies of gallium clusters with sizes ranging from 31 to 70 atoms. We show how DART can be used to accelerate the process of identification of low energy structures without geometry optimization. Albeit using a topological descriptor, DART achieves a mean absolute error (MAE) of 3.59 kcal mol⁻¹ (0.15 eV) on the test set. We also show that our model can distinguish core and surface atoms in the Ga-70 cluster, which the model has never encountered earlier. Finally, we demonstrate the transferability of the DART model by predicting energies for about 6k unseen configurations picked up from molecular dynamics (MD) data for three cluster sizes (46, 57, and 60) within seconds. The DART model was able to reduce the load on DFT optimizations while identifying unique low energy structures from MD data.
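The abstract says only that each TAD value "represents the atom count in three shells". One plausible reading, offered purely as an assumption, is a per-atom count of neighbours falling into three concentric distance shells. The sketch below implements that reading; the shell radii, the function name `shell_counts`, and the four-atom test geometry are hypothetical and are not taken from the paper.

```python
import math


def shell_counts(positions, shell_edges=(3.0, 5.0, 7.0)):
    """For each atom, count neighbours in three concentric distance shells.

    Shell k covers distances from the previous edge (0 for the first shell)
    up to, but not including, shell_edges[k]. The edges are assumed values.
    """
    descriptors = []
    for i, centre in enumerate(positions):
        counts = [0] * len(shell_edges)
        for j, other in enumerate(positions):
            if i == j:
                continue
            distance = math.dist(centre, other)
            for k, edge in enumerate(shell_edges):
                if distance < edge:
                    counts[k] += 1
                    break  # each neighbour is counted in exactly one shell
        descriptors.append(counts)
    return descriptors


# Hypothetical linear arrangement of four atoms along the x axis.
positions = [(0.0, 0.0, 0.0), (2.5, 0.0, 0.0), (4.5, 0.0, 0.0), (6.5, 0.0, 0.0)]
descriptors = shell_counts(positions)
print(descriptors)
```

Under this reading, end atoms and interior atoms receive different count vectors, which is consistent with the abstract's statement that the model can tell core atoms from surface atoms.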
Affiliation(s)
- Rohit Modee
  - Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad 500032, India
- Sheena Agarwal
  - Physical and Materials Chemistry Division, CSIR-National Chemical Laboratory, Dr Homi Bhabha Road, Pune-411008, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh-201002, India
- Ashwini Verma
  - Physical and Materials Chemistry Division, CSIR-National Chemical Laboratory, Dr Homi Bhabha Road, Pune-411008, India
- Kavita Joshi
  - Physical and Materials Chemistry Division, CSIR-National Chemical Laboratory, Dr Homi Bhabha Road, Pune-411008, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh-201002, India
- U Deva Priyakumar
  - Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad 500032, India
|
8
|
Keith JA, Vassilev-Galindo V, Cheng B, Chmiela S, Gastegger M, Müller KR, Tkatchenko A. Combining Machine Learning and Computational Chemistry for Predictive Insights Into Chemical Systems. Chem Rev 2021; 121:9816-9872. [PMID: 34232033 PMCID: PMC8391798 DOI: 10.1021/acs.chemrev.1c00107] [Citation(s) in RCA: 265] [Impact Index Per Article: 66.3] [Received: 02/04/2021] [Indexed: 12/23/2022]
Abstract
Machine learning models are poised to make a transformative impact on chemical sciences by dramatically accelerating computational algorithms and amplifying insights available from computational chemistry methods. However, achieving this requires a confluence and coaction of expertise in computer science and physical sciences. This Review is written for new and experienced researchers working at the intersection of both fields. We first provide concise tutorials of computational chemistry and machine learning methods, showing how insights involving both can be achieved. We follow with a critical review of noteworthy applications that demonstrate how computational chemistry and machine learning can be used together to provide insightful (and useful) predictions in molecular and materials modeling, retrosyntheses, catalysis, and drug design.
Affiliation(s)
- John A. Keith
  - Department of Chemical and Petroleum Engineering, Swanson School of Engineering, University of Pittsburgh, Pittsburgh, Pennsylvania 15261, United States
- Valentin Vassilev-Galindo
  - Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
- Bingqing Cheng
  - Accelerate Programme for Scientific Discovery, Department of Computer Science and Technology, 15 J. J. Thomson Avenue, Cambridge CB3 0FD, United Kingdom
- Stefan Chmiela
  - Department of Software Engineering and Theoretical Computer Science, Technische Universität Berlin, 10587, Berlin, Germany
- Michael Gastegger
  - Department of Software Engineering and Theoretical Computer Science, Technische Universität Berlin, 10587, Berlin, Germany
- Klaus-Robert Müller
  - Machine Learning Group, Technische Universität Berlin, 10587, Berlin, Germany
  - Department of Artificial Intelligence, Korea University, Anam-dong, Seongbuk-gu, Seoul, 02841, Korea
  - Max-Planck-Institut für Informatik, 66123 Saarbrücken, Germany
  - Google Research, Brain Team, 10117 Berlin, Germany
- Alexandre Tkatchenko
  - Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
|
9
|
Gao P, Zhang J, Qiu H, Zhao S. A general QSPR protocol for the prediction of atomic/inter-atomic properties: a fragment based graph convolutional neural network (F-GCN). Phys Chem Chem Phys 2021; 23:13242-13249. [PMID: 34086015 DOI: 10.1039/d1cp00677k] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 11/21/2022]
Abstract
In this study, a general quantitative structure-property relationship (QSPR) protocol, fragment based graph convolutional neural network (F-GCN), was developed for the prediction of atomic/inter-atomic properties. We applied this novel artificial intelligence (AI) tool in predictions of NMR chemical shifts and bond dissociation energies (BDEs). The obtained results were comparable to experimental measurements, while the computational cost was substantially reduced, with respect to pure density functional theory (DFT) calculations. The two important features of F-GCN can be summarised as: first, it could utilise different levels of molecular fragments for atomic/inter-atomic information extraction; second, the designed architecture is also open to include additional descriptors for a more accurate solution of the local environment at atomic level, making itself more efficient for structural solutions. And during our test, the averaged prediction error of ¹H NMR chemical shifts is as small as 0.32 ppm, and the error of C-H BDE estimation is 2.7 kcal mol⁻¹. Moreover, we further demonstrated the applicability of this developed F-GCN model via several challenging structural assignments. The success of the F-GCN in atomic and inter-atomic predictions also indicates an essential improvement of computational chemistry with the assistance of AI tools.
Affiliation(s)
- Peng Gao
  - School of Chemistry and Molecular Bioscience, University of Wollongong, NSW 2500, Australia
- Jie Zhang
  - Centre of Chemistry and Chemical Biology, Bioland Laboratory (Guangzhou Regenerative Medicine and Health-Guangdong Laboratory), Guangzhou 53000, China
  - School of Chemical Engineering, East China University of Science and Technology, Shanghai 200237, China
- Hongbo Qiu
  - Department of Chemical Engineering, Monash University, Clayton, VIC 3800, Australia
- Shuaifei Zhao
  - Institute for Frontier Materials (IFM), Deakin University, Perth, WA, Australia
|
10
|
Han R, Luber S. Fast Estimation of Møller-Plesset Correlation Energies Based on Atomic Contributions. J Phys Chem Lett 2021; 12:5324-5331. [PMID: 34061529 DOI: 10.1021/acs.jpclett.1c00900] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 06/12/2023]
Abstract
Dynamic correlation plays an important role in the accurate calculation of chemical compounds such as the description of equilibrium structures in chemical systems. A model for the fast estimation of dynamic correlation energy is introduced in this work. This model is based on the idea of decomposition of the contribution of dynamic correlation energy calculated by nth order Møller-Plesset perturbation (MPn) theory with respect to atomic regions. Multiple levels of theory, including MP2, MP2.5, and MP4, are used as the reference, and the corresponding correlation energy densities are calculated. The proposed model is concise, fast, and promising for practical use, such as the prediction of reaction energies. It can also work as a baseline model or pretrained model for follow-up studies of machine learning.
Affiliation(s)
- R Han
  - Department of Chemistry A, University of Zurich, Winterthurerstrasse 190, 8057 Zurich, Switzerland
- S Luber
  - Department of Chemistry A, University of Zurich, Winterthurerstrasse 190, 8057 Zurich, Switzerland
|
11
|
Computational Surface Modelling of Ices and Minerals of Interstellar Interest—Insights and Perspectives. Minerals 2020. [DOI: 10.3390/min11010026] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 12/19/2022]
Abstract
The universe is molecularly rich, comprising from the simplest molecule (H₂) to complex organic molecules (e.g., CH₃CHO and NH₂CHO), some of which of biological relevance (e.g., amino acids). This chemical richness is intimately linked to the different physical phases forming Solar-like planetary systems, in which at each phase, molecules of increasing complexity form. Interestingly, synthesis of some of these compounds only takes place in the presence of interstellar (IS) grains, i.e., solid-state sub-micron sized particles consisting of naked dust of silicates or carbonaceous materials that can be covered by water-dominated ice mantles. Surfaces of IS grains exhibit particular characteristics that allow the occurrence of pivotal chemical reactions, such as the presence of binding/catalytic sites and the capability to dissipate energy excesses through the grain phonons. The present know-how on the physicochemical features of IS grains has been obtained by the fruitful synergy of astronomical observational with astrochemical modelling and laboratory experiments. However, current limitations of these disciplines prevent us from having a full understanding of the IS grain surface chemistry as they cannot provide fundamental atomic-scale of grain surface elementary steps (i.e., adsorption, diffusion, reaction and desorption). This essential information can be obtained by means of simulations based on computational chemistry methods. One capability of these simulations deals with the construction of atom-based structural models mimicking the surfaces of IS grains, the very first step to investigate on the grain surface chemistry. This perspective aims to present the current state-of-the-art methods, techniques and strategies available in computational chemistry to model (i.e., construct and simulate) surfaces present in IS grains. Although we focus on water ice mantles and olivinic silicates as IS test case materials to exemplify the modelling procedures, a final discussion on the applicability of these approaches to simulate surfaces of other cosmic grain materials (e.g., cometary and meteoritic) is given.
|
12
|
Abstract
We introduce new and robust decompositions of mean-field Hartree-Fock and Kohn-Sham density functional theory relying on the use of localized molecular orbitals and physically sound charge population protocols. The new lossless property decompositions, which allow for partitioning one-electron reduced density matrices into either bond-wise or atomic contributions, are compared to alternatives from the literature with regard to both molecular energies and dipole moments. Besides commenting on possible applications as an interpretative tool in the rationalization of certain electronic phenomena, we demonstrate how decomposed mean-field theory makes it possible to expose and amplify compositional features in the context of machine-learned quantum chemistry. This is made possible by improving upon the granularity of the underlying data. On the basis of our preliminary proof-of-concept results, we conjecture that many of the structure-property inferences in existence today may be further refined by efficiently leveraging an increase in dataset complexity and richness.
Affiliation(s)
- Janus J Eriksen
  - School of Chemistry, University of Bristol, Cantock's Close, Bristol BS8 1TS, United Kingdom
|
13
|
Gao P, Zhang J, Sun Y, Yu J. Toward Accurate Predictions of Atomic Properties via Quantum Mechanics Descriptors Augmented Graph Convolutional Neural Network: Application of This Novel Approach in NMR Chemical Shifts Predictions. J Phys Chem Lett 2020; 11:9812-9818. [PMID: 33151693 DOI: 10.1021/acs.jpclett.0c02654] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Indexed: 06/11/2023]
Abstract
In this study, an augmented Graph Convolutional Network (GCN) with quantum mechanics (QM) descriptors was reported for its accurate predictions of NMR chemical shifts with respect to experimental values. The prediction errors of ¹³C/¹H NMR chemical shifts can be as small as 2.14/0.11 ppm. There are two crucial characteristics for this modified GCN: in one aspect, such a novel neural network could efficiently extract the overall molecule structure information; in another aspect, it could accurately solve the chemical environment of the target atom. As there exists an imperfect linear regression between the experimental NMR chemical shifts (δ) and the density functional theory (DFT) calculated isotropic shielding constants (σ), the inclusion of QM descriptors within GCN can largely improve its performance. Moreover, few-shot learning also becomes feasible with these descriptors. The success of this novel GCN in chemical shifts predictions also indicates its potential applicability for other computational studies.
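The "imperfect linear regression" between experimental shifts δ and computed shieldings σ mentioned above can be illustrated with an ordinary least-squares fit δ ≈ a·σ + b. The sketch below uses only the standard library; the σ/δ pairs are made-up illustrative numbers, not data from the paper, and the nonzero residuals stand in for the imperfection that, according to the abstract, additional QM descriptors help a model absorb.

```python
def fit_line(sigma, delta):
    """Ordinary least-squares fit of delta = slope * sigma + intercept."""
    n = len(sigma)
    mean_s = sum(sigma) / n
    mean_d = sum(delta) / n
    cov = sum((s - mean_s) * (d - mean_d) for s, d in zip(sigma, delta))
    var = sum((s - mean_s) ** 2 for s in sigma)
    slope = cov / var
    intercept = mean_d - slope * mean_s
    return slope, intercept


# Hypothetical shielding constants (ppm) and experimental shifts (ppm).
sigma = [160.0, 120.0, 60.0, 30.0]
delta = [25.0, 66.0, 124.0, 156.0]

slope, intercept = fit_line(sigma, delta)
residuals = [d - (slope * s + intercept) for s, d in zip(sigma, delta)]

print(f"delta ~ {slope:.4f} * sigma + {intercept:.2f}")
print("residuals:", [round(r, 2) for r in residuals])
```

The slope comes out negative because a larger shielding corresponds to a smaller chemical shift. The residuals are what a purely linear σ-to-δ mapping leaves unexplained.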
Affiliation(s)
- Peng Gao
  - School of Chemistry and Molecular Bioscience, University of Wollongong, Wollongong, NSW 2500, Australia
- Jie Zhang
  - Centre of Chemistry and Chemical Biology, Bioland Laboratory (Guangzhou Regenerative Medicine and Health-Guangdong Laboratory), Guangzhou 53000, China
  - School of Chemical Engineering, East China University of Science and Technology, Shanghai 200237, China
- Yuzhu Sun
  - School of Chemical Engineering, East China University of Science and Technology, Shanghai 200237, China
- Jianguo Yu
  - School of Chemical Engineering, East China University of Science and Technology, Shanghai 200237, China
|
14
|
Gao P, Zhang J, Sun Y, Yu J. Accurate predictions of aqueous solubility of drug molecules via the multilevel graph convolutional network (MGCN) and SchNet architectures. Phys Chem Chem Phys 2020; 22:23766-23772. [PMID: 33063077 DOI: 10.1039/d0cp03596c] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Indexed: 12/25/2022]
Abstract
Deep learning based methods have been widely applied to predict various kinds of molecular properties in the pharmaceutical industry with increasingly more success. In this study, we propose two novel models for aqueous solubility predictions, based on the Multilevel Graph Convolutional Network (MGCN) and SchNet architectures, respectively. The advantage of the MGCN lies in the fact that it could extract the graph features of the target molecules directly from the (3D) structural information; therefore, it doesn't need to rely on a lot of intra-molecular descriptors to learn the features, which are of significance for accurate predictions of the molecular properties. The SchNet performs well in modelling the interatomic interactions inside a molecule, and such a deep learning architecture is also capable of extracting structural information and further predicting the related properties. The actual accuracy of these two novel approaches was systematically benchmarked with four different independent datasets. We found that both the MGCN and SchNet models performed well for aqueous solubility predictions. In the future, we believe such promising predictive models will be applicable to enhancing the efficiency of the screening, crystallization and delivery of drug molecules, essentially as a useful tool to promote the development of molecular pharmaceutics.
Affiliation(s)
- Peng Gao
- School of Chemistry and Molecular Bioscience, University of Wollongong, NSW 2500, Australia
|
15
|
Meldgaard SA, Mortensen HL, Jørgensen MS, Hammer B. Structure prediction of surface reconstructions by deep reinforcement learning. J Phys Condens Matter 2020; 32:404005. [PMID: 32434171 DOI: 10.1088/1361-648x/ab94f2] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 02/17/2020] [Accepted: 05/20/2020] [Indexed: 06/11/2023]
Abstract
We demonstrate how image recognition and reinforcement learning combined may be used to determine the atomistic structure of reconstructed crystalline surfaces. A deep neural network represents a reinforcement learning agent that obtains training rewards by interacting with an environment. The environment contains a quantum mechanical potential energy evaluator in the form of a density functional theory program. The agent handles the 3D atomistic structure as a series of stacked 2D images and outputs the next atom type to place and the atomic site to occupy. Agents are seen to require 1000-10 000 single point density functional theory evaluations, to learn by themselves how to build the optimal surface reconstructions of anatase TiO2(001)-(1 × 4) and rutile SnO2(110)-(4 × 1).
Affiliation(s)
- Søren A Meldgaard
- Department of Physics and Astronomy, Aarhus University, DK-8000 Aarhus C, Denmark
- Henrik L Mortensen
- Department of Physics and Astronomy, Aarhus University, DK-8000 Aarhus C, Denmark
- Mathias S Jørgensen
- Department of Physics and Astronomy, Aarhus University, DK-8000 Aarhus C, Denmark
- Bjørk Hammer
- Department of Physics and Astronomy, Aarhus University, DK-8000 Aarhus C, Denmark
|
16
|
Gao P, Zhang J, Peng Q, Zhang J, Glezakou VA. General Protocol for the Accurate Prediction of Molecular 13C/1H NMR Chemical Shifts via Machine Learning Augmented DFT. J Chem Inf Model 2020; 60:3746-3754. [DOI: 10.1021/acs.jcim.0c00388] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Indexed: 11/28/2022]
Affiliation(s)
- Peng Gao
- School of Chemistry and Molecular Bioscience, University of Wollongong, Wollongong, NSW 2500, Australia
- Jun Zhang
- Physical Sciences Division, Pacific Northwest National Laboratory (PNNL), Richland, Washington 99352, United States
- Qian Peng
- State Key Laboratory of Elemento-Organic Chemistry, College of Chemistry, Nankai University, Tianjin 300071, China
- Jie Zhang
- Centre of Chemistry and Chemical Biology, Guangzhou Regenerative Medicine and Health-Guangdong Laboratory, Science Park, Guangzhou 510530, China
- Vassiliki-Alexandra Glezakou
- Physical Sciences Division, Pacific Northwest National Laboratory (PNNL), Richland, Washington 99352, United States
|
17
|
Zhang J, Glezakou VA, Rousseau R, Nguyen MT. NWPEsSe: An Adaptive-Learning Global Optimization Algorithm for Nanosized Cluster Systems. J Chem Theory Comput 2020; 16:3947-3958. [PMID: 32364725 DOI: 10.1021/acs.jctc.9b01107] [Citation(s) in RCA: 37] [Impact Index Per Article: 7.4] [Indexed: 12/31/2022]
Abstract
Global optimization constitutes an important and fundamental problem in theoretical studies in many chemical fields, such as catalysis, materials, or separations problems. In this paper, a novel algorithm has been developed for the global optimization of large systems including neat and ligated clusters in the gas phase and supported clusters in periodic boundary conditions. The method is based on an updated artificial bee colony (ABC) algorithm method, that allows for adaptive-learning during the search process. The new algorithm is tested against four classes of systems of diverse chemical nature: gas phase Au55, ligated Au82+, Au8 supported on graphene oxide and defected rutile, and a large cluster assembly [Co6Te8(PEt3)6][C60]n, with sizes ranging between 1 and 3 nm and containing up to 1300 atoms. Reliable global minima (GMs) are obtained for all cases, either confirming published data or reporting new lower energy structures. The algorithm and interface to other codes in the form of an independent program, Northwest Potential Energy Search Engine (NWPEsSe), is freely available, and it provides a powerful and efficient approach for global optimization of nanosized cluster systems.
Affiliation(s)
- Jun Zhang
- Physical Sciences Division, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Roger Rousseau
- Physical Sciences Division, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Manh-Thuong Nguyen
- Physical Sciences Division, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
|
18
|
Pei HW, Laaksonen A. Feature vector clustering molecular pairs in computer simulations. J Comput Chem 2019; 40:2539-2549. [PMID: 31313339 DOI: 10.1002/jcc.26028] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 04/11/2019] [Revised: 06/18/2019] [Accepted: 06/22/2019] [Indexed: 01/07/2023]
Abstract
A clustering framework is introduced to analyze the microscopic structural organization of molecular pairs in liquids and solutions. A molecular pair is represented by a representative vector (RV). To obtain the RV, intermolecular atom distances in the pair are extracted from the simulation trajectory as components of the key feature vector (KFV). A specific scheme is then suggested to transform the KFV into the RV by removing the influence of permutational molecular symmetry on the KFV, as the predicted clusters should be independent of possible permutations of identical atoms in the pair. After the RVs of the pairs are obtained, a clustering analysis technique is used to classify all the RVs of molecular pairs into clusters. The framework is applied to analyze a trajectory from molecular dynamics simulations of an ionic liquid, trihexyltetradecylphosphonium bis(oxalato)borate ([P6,6,6,14][BOB]). The molecular pairs are successfully categorized into physically meaningful clusters, and their effectiveness is evaluated by computing the product moment correlation coefficient (PMCC) (Willett, Winterman, and Bawden, J. Chem. Inf. Comput. Sci. 1986, 26, 109-118; Downs, Willett, and Fisanick, J. Chem. Inf. Comput. Sci. 1994, 34, 1094-1102). It is observed that representative configurations of two clusters are related to two local energy minimum structures optimized by density functional theory (DFT) calculation, respectively. Several widely used clustering analysis techniques, both nonhierarchical (k-means) and hierarchical, are also evaluated and compared with each other. The proposed KFV technique efficiently reveals local molecular pair structures in the simulated complex liquid. The method is highly useful for liquids and solutions, in particular those with strong intermolecular interactions. © 2019 Wiley Periodicals, Inc.
Affiliation(s)
- Han-Wen Pei
- Department of Materials and Environmental Chemistry, Arrhenius Laboratory, Stockholm University, SE-106 91, Stockholm, Sweden; System and Component Design, Department of Machine Design, KTH Royal Institute of Technology, SE-100 44, Stockholm, Sweden
- Aatto Laaksonen
- Department of Materials and Environmental Chemistry, Arrhenius Laboratory, Stockholm University, SE-106 91, Stockholm, Sweden; State Key Laboratory of Materials-Oriented and Chemical Engineering, Nanjing Tech University, Nanjing, 210009, China; Centre of Advanced Research in Bionanoconjugates and Biopolymers, Petru Poni Institute of Macromolecular Chemistry, Aleea Grigore Ghica-Voda, 41A, 700487, Iasi, Romania
|
19
|
Schmitz G, Godtliebsen IH, Christiansen O. Machine learning for potential energy surfaces: An extensive database and assessment of methods. J Chem Phys 2019; 150:244113. [DOI: 10.1063/1.5100141] [Citation(s) in RCA: 32] [Impact Index Per Article: 5.3] [Indexed: 12/31/2022] Open
Affiliation(s)
- Gunnar Schmitz
- Department of Chemistry, Aarhus Universitet, DK-8000 Aarhus, Denmark
- Ove Christiansen
- Department of Chemistry, Aarhus Universitet, DK-8000 Aarhus, Denmark
|
20
|
Van den Bossche M. DFTB-Assisted Global Structure Optimization of 13- and 55-Atom Late Transition Metal Clusters. J Phys Chem A 2019; 123:3038-3045. [DOI: 10.1021/acs.jpca.9b00927] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Indexed: 12/16/2022]
|
21
|
Zhang P, Shen L, Yang W. Solvation Free Energy Calculations with Quantum Mechanics/Molecular Mechanics and Machine Learning Models. J Phys Chem B 2019; 123:901-908. [PMID: 30557020 PMCID: PMC6448400 DOI: 10.1021/acs.jpcb.8b11905] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Indexed: 12/28/2022]
Abstract
For exploration of chemical and biological systems, the combined quantum mechanics and molecular mechanics (QM/MM) and machine learning (ML) models have been developed recently to achieve high accuracy and efficiency for molecular dynamics (MD) simulations. Despite its success on reaction free energy calculations, how to identify new configurations on insufficiently sampled regions during MD and how to update the current ML models with the growing database on the fly are both very important but still challenging. In this article, we apply the QM/MM ML method to solvation free energy calculations and address these two challenges. We employ three approaches to detect new data points and introduce the gradient boosting algorithm to reoptimize efficiently the ML model during ML-based MD sampling. The solvation free energy calculations on several typical organic molecules demonstrate that our developed method provides a systematic, robust, and efficient way to explore new chemistry using ML-based QM/MM MD simulations.
Affiliation(s)
- Pan Zhang
- Department of Chemistry, Duke University, Durham, North Carolina 27708, United States
- Lin Shen
- Department of Chemistry, Duke University, Durham, North Carolina 27708, United States
- Weitao Yang
- Department of Chemistry and Department of Physics, Duke University, Durham, NC 27708, United States
- Key Laboratory of Theoretical Chemistry of Environment, Ministry of Education, School of Chemistry and Environment, South China Normal University, Guangzhou 510006, P.R. China
|