1
Strandgaard M, Linjordet T, Kneiding H, Burnage AL, Nova A, Jensen JH, Balcells D. A Deep Generative Model for the Inverse Design of Transition Metal Ligands and Complexes. JACS Au 2025; 5:2294-2308. [PMID: 40443902 PMCID: PMC12117439 DOI: 10.1021/jacsau.5c00242] [Received: 03/04/2025] [Revised: 04/15/2025] [Accepted: 04/15/2025] [Indexed: 06/02/2025]
Abstract
Deep generative models yielding transition metal complexes (TMCs) remain scarce despite the key role of these compounds in industrial catalytic processes, anticancer therapies, and the energy transition. Compared to drug discovery within the chemical space of organic molecules, TMCs pose further challenges, including the encoding of chemical bonds of higher complexity and the need to optimize multiple properties. In this work, we developed a generative model for the inverse design of transition metal ligands and complexes, based on the junction tree variational autoencoder (JT-VAE). After implementing a SMILES-based encoding of the metal-ligand bonds, the model was trained with the tmQMg-L ligand library, allowing for the generation of thousands of novel, highly diverse monodentate (κ¹) and bidentate (κ²) ligands, including imines, phosphines, and carbenes. Further, the generated ligands were labeled with two target properties reflecting the stability and electron density of the associated homoleptic iridium TMCs: the HOMO-LUMO gap (ϵ) and the charge of the metal center (q_Ir). This data was used to implement a conditional model that generated ligands from a prompt, with the single- or dual-objective of optimizing either or both the ϵ and q_Ir properties and allowing for chemical interpretation based on the optimization trajectories. The optimizations also had an impact on other chemical properties, including ligand dissociation energies and oxidative addition barriers. A similar model was implemented to condition ligand generation by solubility and steric bulk.
Affiliation(s)
- Magnus Strandgaard
  - Hylleraas Centre for Quantum Molecular Sciences, Department of Chemistry, University of Oslo, P.O. Box 1033, Blindern, Oslo 0315, Norway
  - Department of Chemistry, University of Copenhagen, Copenhagen 2100, Denmark
- Trond Linjordet
  - Hylleraas Centre for Quantum Molecular Sciences, Department of Chemistry, University of Oslo, P.O. Box 1033, Blindern, Oslo 0315, Norway
- Hannes Kneiding
  - Hylleraas Centre for Quantum Molecular Sciences, Department of Chemistry, University of Oslo, P.O. Box 1033, Blindern, Oslo 0315, Norway
- Arron L. Burnage
  - Hylleraas Centre for Quantum Molecular Sciences, Department of Chemistry, University of Oslo, P.O. Box 1033, Blindern, Oslo 0315, Norway
- Ainara Nova
  - Hylleraas Centre for Quantum Molecular Sciences, Department of Chemistry, University of Oslo, P.O. Box 1033, Blindern, Oslo 0315, Norway
  - Centre for Materials Science and Nanotechnology, Department of Chemistry, University of Oslo, Oslo N-0315, Norway
- Jan Halborg Jensen
  - Department of Chemistry, University of Copenhagen, Copenhagen 2100, Denmark
- David Balcells
  - Hylleraas Centre for Quantum Molecular Sciences, Department of Chemistry, University of Oslo, P.O. Box 1033, Blindern, Oslo 0315, Norway
2
Mroz AM, Basford AR, Hastedt F, Jayasekera IS, Mosquera-Lois I, Sedgwick R, Ballester PJ, Bocarsly JD, Antonio Del Río Chanona E, Evans ML, Frost JM, Ganose AM, Greenaway RL, Kuok Mimi Hii K, Li Y, Misener R, Walsh A, Zhang D, Jelfs KE. Cross-disciplinary perspectives on the potential for artificial intelligence across chemistry. Chem Soc Rev 2025. [PMID: 40278836 PMCID: PMC12024683 DOI: 10.1039/d5cs00146c] [Received: 02/07/2025] [Indexed: 04/26/2025]
Abstract
From accelerating simulations and exploring chemical space, to experimental planning and integrating automation within experimental labs, artificial intelligence (AI) is changing the landscape of chemistry. We are seeing a significant increase in the number of publications leveraging these powerful data-driven insights and models to accelerate all aspects of chemical research. For example, how we represent molecules and materials to computer algorithms for predictive and generative models, as well as the physical mechanisms by which we perform experiments in the lab for automation. Here, we present ten diverse perspectives on the impact of AI coming from those with a range of backgrounds from experimental chemistry, computational chemistry, computer science, engineering and across different areas of chemistry, including drug discovery, catalysis, chemical automation, chemical physics, materials chemistry. The ten perspectives presented here cover a range of themes, including AI for computation, facilitating discovery, supporting experiments, and enabling technologies for transformation. We highlight and discuss imminent challenges and ways in which we are redefining problems to accelerate the impact of chemical research via AI.
Affiliation(s)
- Austin M Mroz
  - Department of Chemistry, Imperial College London, London W12 0BZ, UK
  - I-X Centre for AI in Science, Imperial College London, London W12 0BZ, UK
- Annabel R Basford
  - Department of Chemistry, Imperial College London, London W12 0BZ, UK
- Friedrich Hastedt
  - Department of Chemical Engineering, Imperial College London, London SW7 2AZ, UK
- Ruby Sedgwick
  - Department of Computing, Imperial College London, London SW7 2AZ, UK
- Pedro J Ballester
  - Department of Bioengineering, Imperial College London, London SW7 2AZ, UK
- Joshua D Bocarsly
  - Department of Chemistry and Texas Center for Superconductivity, University of Houston, Houston, USA
- Matthew L Evans
  - UCLouvain, Institute of Condensed Matter and Nanosciences (IMCN), Chemin des Étoiles 8, Louvain-la-Neuve 1348, Belgium
  - Matgenix SRL, A6K Advanced Engineering Center, Charleroi, Belgium
  - Datalab Industries Ltd, King's Lynn, Norfolk, UK
- Jarvist M Frost
  - Department of Chemistry, Imperial College London, London W12 0BZ, UK
- Alex M Ganose
  - Department of Chemistry, Imperial College London, London W12 0BZ, UK
- Yingzhen Li
  - Department of Computing, Imperial College London, London SW7 2AZ, UK
- Ruth Misener
  - Department of Computing, Imperial College London, London SW7 2AZ, UK
- Aron Walsh
  - Department of Materials, Imperial College London, London SW7 2AZ, UK
- Dandan Zhang
  - I-X Centre for AI in Science, Imperial College London, London W12 0BZ, UK
  - Department of Bioengineering, Imperial College London, London SW7 2AZ, UK
- Kim E Jelfs
  - Department of Chemistry, Imperial College London, London W12 0BZ, UK
3
Generative molecular design and discovery on the rise. Nature Computational Science 2025; 5:269-270. [PMID: 40247016 DOI: 10.1038/s43588-025-00802-z] [Indexed: 04/19/2025]
4
Katbashev A, Stahn M, Rose T, Alizadeh V, Friede M, Plett C, Steinbach P, Ehlert S. Overview on Building Blocks and Applications of Efficient and Robust Extended Tight Binding. J Phys Chem A 2025; 129:2667-2682. [PMID: 40013428 DOI: 10.1021/acs.jpca.4c08263] [Indexed: 02/28/2025]
Abstract
The extended tight binding (xTB) family of methods opened many new possibilities in the field of computational chemistry. Within just 5 years, the GFN2-xTB parametrization for all elements up to Z = 86 enabled more than a thousand applications, which were previously not feasible with other electronic structure methods. The xTB methods provide a robust and efficient way to apply quantum mechanics-based approaches for obtaining molecular geometries, computing free energy corrections or describing noncovalent interactions and found applicability for many more targets. A crucial contribution to the success of the xTB methods is the availability within many simulation packages and frameworks, supported by the open source development of its program library and packages. We present a comprehensive summary of the applications and capabilities of xTB methods in different fields of chemistry. Moreover, we consider the main software packages for xTB calculations, covering their current ecosystem, novel features, and usage by the scientific community.
Affiliation(s)
- Abylay Katbashev
  - Mulliken Center for Theoretical Chemistry, Clausius Institute for Physical and Theoretical Chemistry, University of Bonn, Beringstr. 4, 53115 Bonn, Germany
- Marcel Stahn
  - Mulliken Center for Theoretical Chemistry, Clausius Institute for Physical and Theoretical Chemistry, University of Bonn, Beringstr. 4, 53115 Bonn, Germany
  - OpenEye, Cadence Molecular Sciences, Ebertplatz 1, 50668 Cologne, Germany
- Thomas Rose
  - Mulliken Center for Theoretical Chemistry, Clausius Institute for Physical and Theoretical Chemistry, University of Bonn, Beringstr. 4, 53115 Bonn, Germany
- Vahideh Alizadeh
  - Mulliken Center for Theoretical Chemistry, Clausius Institute for Physical and Theoretical Chemistry, University of Bonn, Beringstr. 4, 53115 Bonn, Germany
  - Center for Advanced Systems Understanding (CASUS), Untermarkt 20, 02826 Görlitz, Germany
- Marvin Friede
  - Mulliken Center for Theoretical Chemistry, Clausius Institute for Physical and Theoretical Chemistry, University of Bonn, Beringstr. 4, 53115 Bonn, Germany
- Christoph Plett
  - Mulliken Center for Theoretical Chemistry, Clausius Institute for Physical and Theoretical Chemistry, University of Bonn, Beringstr. 4, 53115 Bonn, Germany
- Pit Steinbach
  - Institute of Physical Chemistry, RWTH Aachen University, Melatener Str. 20, 52074 Aachen, Germany
- Sebastian Ehlert
  - AI for Science, Microsoft Research, Evert van de Beekstraat 354, 1118 CZ Schiphol, The Netherlands
5
Kreimendahl L, Karnaukh M, Röhr MIS. Diffusion Generative Models for Designing Efficient Singlet Fission Dimers. J Phys Chem A 2025; 129:407-414. [PMID: 39780705 DOI: 10.1021/acs.jpca.4c08170] [Indexed: 01/11/2025]
Abstract
Diffusion generative models, a class of machine learning techniques, have shown remarkable promise in materials science and chemistry by enabling the precise generation of complex molecular structures. In this article, we propose a novel application of diffusion generative models for stabilizing reactive molecular structures identified through quantum mechanical screening. Specifically, we focus on the design challenge presented by singlet fission (SF), a phenomenon crucial for advancing solar cell efficiency beyond theoretical limits. While theoretical chemistry has been successful in predicting intermolecular arrangements with enhanced SF coupling, the practical implementation of these configurations faces challenges due to discrepancies between favorable and stabilized structures. To address this gap, we introduce a three-step strategy combining quantum mechanical screening for identifying optimal molecular arrangements and diffusion generative models for predicting stabilizing linkers. Through a case study of cibalackrot dimers, a promising SF material, we demonstrate the efficacy of our approach in enhancing SF efficiency by stabilizing the desired molecular arrangements.
Affiliation(s)
- Lasse Kreimendahl
  - Institute of Physical and Theoretical Chemistry, Julius-Maximilians-Universität Würzburg, Emil-Fischer-Str. 42, Würzburg 97074, Germany
- Mikhail Karnaukh
  - Institute of Physical and Theoretical Chemistry, Julius-Maximilians-Universität Würzburg, Emil-Fischer-Str. 42, Würzburg 97074, Germany
- Merle I S Röhr
  - Institute of Physical and Theoretical Chemistry, Julius-Maximilians-Universität Würzburg, Emil-Fischer-Str. 42, Würzburg 97074, Germany
  - Center for Nanosystems Chemistry, Julius-Maximilians-Universität Würzburg, Theodor-Boveri Weg, Würzburg 97074, Germany
6
Schneuing A, Harris C, Du Y, Didi K, Jamasb A, Igashov I, Du W, Gomes C, Blundell TL, Lio P, Welling M, Bronstein M, Correia B. Structure-based drug design with equivariant diffusion models. Nature Computational Science 2024; 4:899-909. [PMID: 39653846 DOI: 10.1038/s43588-024-00737-x] [Received: 04/23/2024] [Accepted: 11/04/2024] [Indexed: 12/21/2024]
Abstract
Structure-based drug design (SBDD) aims to design small-molecule ligands that bind with high affinity and specificity to pre-determined protein targets. Generative SBDD methods leverage structural data of drugs with their protein targets to propose new drug candidates. However, most existing methods focus exclusively on bottom-up de novo design of compounds or tackle other drug development challenges with task-specific models. The latter requires curation of suitable datasets, careful engineering of the models and retraining from scratch for each task. Here we show how a single pretrained diffusion model can be applied to a broader range of problems, such as off-the-shelf property optimization, explicit negative design and partial molecular design with inpainting. We formulate SBDD as a three-dimensional conditional generation problem and present DiffSBDD, an SE(3)-equivariant diffusion model that generates novel ligands conditioned on protein pockets. Furthermore, we show how additional constraints can be used to improve the generated drug candidates according to a variety of computational metrics.
Affiliation(s)
- Arne Schneuing
  - École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Arian Jamasb
  - University of Cambridge, Cambridge, UK
  - Prescient Design, Genentech, Basel, Switzerland
- Ilia Igashov
  - École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Weitao Du
  - Chinese Academy of Mathematics and System Science, Beijing, China
- Tom L Blundell
  - University of Cambridge, Cambridge, UK
  - Heart and Lung Research Institute, University of Cambridge, Cambridge, UK
- Pietro Lio
  - University of Cambridge, Cambridge, UK
  - University of Rome 'La Sapienza', Rome, Italy
- Max Welling
  - Microsoft Research AI4Science, Amsterdam, Netherlands
  - University of Amsterdam, Amsterdam, Netherlands
- Bruno Correia
  - École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
7
Chen M, Mei S, Fan J, Wang M. Opportunities and challenges of diffusion models for generative AI. Natl Sci Rev 2024; 11:nwae348. [PMID: 39554240 PMCID: PMC11562846 DOI: 10.1093/nsr/nwae348] [Received: 05/06/2024] [Revised: 07/03/2024] [Accepted: 07/07/2024] [Indexed: 11/19/2024]
Abstract
Diffusion models, a powerful and universal generative artificial intelligence technology, have achieved tremendous success and opened up new possibilities in diverse applications. In these applications, diffusion models provide flexible high-dimensional data modeling, and act as a sampler for generating new samples under active control towards task-desired properties. Despite the significant empirical success, theoretical underpinnings of diffusion models are very limited, potentially slowing down principled methodological innovations for further harnessing and improving diffusion models. In this paper, we review emerging applications of diffusion models to highlight their sample generation capabilities under various control goals. At the same time, we dive into the unique working flow of diffusion models through the lens of stochastic processes. We identify theoretical challenges in analyzing diffusion models, owing to their complicated training procedure and interaction with the underlying data distribution. To address these challenges, we overview several promising advances, demonstrating diffusion models as an efficient distribution learner and a sampler. Furthermore, we introduce a new avenue in high-dimensional structured optimization through diffusion models, where searching for solutions is reformulated as a conditional sampling problem and solved by diffusion models. Lastly, we discuss future directions about diffusion models. The purpose of this paper is to provide a well-rounded exposure for stimulating forward-looking theories and methods of diffusion models.
Affiliation(s)
- Minshuo Chen
  - Department of Electrical and Computer Engineering, Princeton University, Princeton 08544, USA
- Song Mei
  - Department of Statistics, University of California, Berkeley, Berkeley 94720, USA
- Jianqing Fan
  - Department of Operations Research and Financial Engineering, Princeton University, Princeton 08544, USA
- Mengdi Wang
  - Department of Electrical and Computer Engineering, Princeton University, Princeton 08544, USA
8
Li Z, Tolba SA, Wang Y, Alesadi A, Xia W. Modeling-driven materials by design for conjugated polymers: insights into optoelectronic, conformational, and thermomechanical properties. Chem Commun (Camb) 2024; 60:11625-11641. [PMID: 39157936 DOI: 10.1039/d4cc03217a] [Indexed: 08/20/2024]
Abstract
Conjugated polymers (CPs) have emerged as pivotal functional materials in the realm of flexible electronics and optoelectronic devices due to their unique blend of mechanical flexibility, solution processability, and tunable optoelectronic properties. This review synthesizes the latest molecular simulation-driven insights obtained from various multiscale modeling techniques, including quantum mechanics (QM), all-atomistic (AA) molecular dynamics (MD), coarse-grained (CG) modeling, and machine learning (ML), to elucidate the optoelectronic, structural, and thermomechanical properties of CPs. By integrating findings from our recent computational work with key experimental studies, we highlight the molecular mechanisms influencing the multifunctional performance of CPs. This comprehensive understanding aims to guide future research directions and applications in the modeling assisted design of high-performance CP-based materials and devices.
Affiliation(s)
- Zhaofan Li
  - Department of Aerospace Engineering, Iowa State University, Ames, Iowa 50011, USA
- Sara A Tolba
  - Materials and Nanotechnology Program, North Dakota State University, Fargo, ND 58108, USA
- Yang Wang
  - Zernike Institute for Advanced Materials, University of Groningen, 9747 AG, Groningen, The Netherlands
- Amirhadi Alesadi
  - Department of Civil, Construction and Environmental Engineering, North Dakota State University, Fargo, ND 58108, USA
- Wenjie Xia
  - Department of Aerospace Engineering, Iowa State University, Ames, Iowa 50011, USA
9
Liu H, Yin H, Luo Z, Wang X. Integrating chemistry knowledge in large language models via prompt engineering. Synth Syst Biotechnol 2024; 10:23-38. [PMID: 39206087 PMCID: PMC11350497 DOI: 10.1016/j.synbio.2024.07.004] [Received: 04/08/2024] [Revised: 07/08/2024] [Accepted: 07/20/2024] [Indexed: 09/04/2024]
Abstract
This paper presents a study on the integration of domain-specific knowledge in prompt engineering to enhance the performance of large language models (LLMs) in scientific domains. The proposed domain-knowledge embedded prompt engineering method outperforms traditional prompt engineering strategies on various metrics, including capability, accuracy, F1 score, and hallucination drop. The effectiveness of the method is demonstrated through case studies on complex materials including the MacMillan catalyst, paclitaxel, and lithium cobalt oxide. The results suggest that domain-knowledge prompts can guide LLMs to generate more accurate and relevant responses, highlighting the potential of LLMs as powerful tools for scientific discovery and innovation when equipped with domain-specific prompts. The study also discusses limitations and future directions for domain-specific prompt engineering development.
Affiliation(s)
- Hongxuan Liu
  - Department of Chemical Engineering, Tsinghua University, Beijing, 100084, China
- Haoyu Yin
  - Department of Chemical Engineering, Tsinghua University, Beijing, 100084, China
- Zhiyao Luo
  - Institute of Biomedical Engineering, Department of Engineering Science, University of Oxford, Old Road Campus Research Building, Headington, Oxford, OX3 7DQ, United Kingdom
- Xiaonan Wang
  - Department of Chemical Engineering, Tsinghua University, Beijing, 100084, China
  - Key Laboratory for Industrial Biocatalysis, Ministry of Education, Tsinghua University, Beijing, 100084, China
10
Chung JK, Brown ML, Popelier PLA. Transferability of Buckingham Parameters for Short-Range Repulsion between Topological Atoms. J Phys Chem A 2024; 128:4561-4572. [PMID: 38805440 PMCID: PMC11163427 DOI: 10.1021/acs.jpca.4c02048] [Received: 03/28/2024] [Revised: 05/16/2024] [Accepted: 05/20/2024] [Indexed: 05/30/2024]
Abstract
The repulsive part of the Buckingham potential, with parameters A and B, can be used to model deformation energies and steric energies. Both are calculated using the interacting quantum atom energy decomposition scheme where the latter is obtained from the former by a charge-transfer-based energy correction. These energies relate to short-range interactions, specifically the deformation of electron density and steric hindrance, respectively, when topological atoms approach each other. In this work, we calculate and fit the energies of carbonyl carbon, carbonyl oxygen, and, where possible, amine nitrogen atoms to the repulsive part of the Buckingham potential for 26 molecules. We find that while the steric energies of all atom pairs studied display exponential behavior with respect to distance, some deformation energy data do not. The obtained parameters are shown to be transferable by calculating root-mean-square errors of fitted potentials with respect to energy data of the same atom in, as far as possible, all other molecules from our data set. We observed that 36% and 10% of these errors were smaller than 4 kJ mol⁻¹ for steric and deformation energy, respectively. Thus, we find that steric energy parameters are more transferable than deformation energy parameters. Finally, we speculate about the physical meaning of the A and B parameters and the implications of the strong exponential and exponential-linear piecewise relationships that we observe between them.
Affiliation(s)
- Jaiming J. K. Chung
  - Department of Chemistry, The University of Manchester, Oxford Road, Manchester M13 9PL, Great Britain
- Matthew L. Brown
  - Department of Chemistry, The University of Manchester, Oxford Road, Manchester M13 9PL, Great Britain
- Paul L. A. Popelier
  - Department of Chemistry, The University of Manchester, Oxford Road, Manchester M13 9PL, Great Britain
11
Wahab A, Gershoni-Poranne R. COMPAS-3: a dataset of peri-condensed polybenzenoid hydrocarbons. Phys Chem Chem Phys 2024; 26:15344-15357. [PMID: 38758092 DOI: 10.1039/d4cp01027b] [Indexed: 05/18/2024]
Abstract
We introduce the third installment of the COMPAS Project - a COMputational database of Polycyclic Aromatic Systems, focused on peri-condensed polybenzenoid hydrocarbons. In this installment, we develop two datasets containing the optimized ground-state structures and a selection of molecular properties of ∼39k and ∼9k peri-condensed polybenzenoid hydrocarbons (at the GFN2-xTB and CAM-B3LYP-D3BJ/cc-pvdz//CAM-B3LYP-D3BJ/def2-SVP levels, respectively). The manuscript details the enumeration and data generation processes and describes the information available within the datasets. An in-depth comparison between the two types of computation is performed, and it is found that the geometrical disagreement is maximal for slightly-distorted molecules. In addition, a data-driven analysis of the structure-property trends of peri-condensed PBHs is performed, highlighting the effect of the size of peri-condensed islands and linearly annulated rings on the HOMO-LUMO gap. The insights described herein are important for rational design of novel functional aromatic molecules for use in, e.g., organic electronics. The generated datasets provide a basis for additional data-driven machine- and deep-learning studies in chemistry.
Affiliation(s)
- Alexandra Wahab
  - The Laboratory for Organic Chemistry, Department of Chemistry and Applied Biosciences, ETH Zurich, 8093 Zurich, Switzerland
- Renana Gershoni-Poranne
  - The Schulich Faculty of Chemistry and the Resnick Sustainability Center for Catalysis, Technion - Israel Institute of Technology, Haifa 32000, Israel
12
Zhang R, Yuan R, Tian B. PointGAT: A Quantum Chemical Property Prediction Model Integrating Graph Attention and 3D Geometry. J Chem Theory Comput 2024; 20:4115-4128. [PMID: 38727259 DOI: 10.1021/acs.jctc.3c01420] [Indexed: 05/29/2024]
Abstract
Predicting quantum chemical properties is a fundamental challenge for computational chemistry. While the development of graph neural networks has advanced molecular representation learning and property prediction, their performance could be further enhanced by incorporating three-dimensional (3D) structural geometry into two-dimensional (2D) molecular graph representation. In this study, we introduce the PointGAT model for quantum molecular property prediction, which integrates 3D molecular coordinates with graph-attention modeling. Comparison with other current models in molecular prediction tasks showed that PointGAT could provide higher predictive accuracy in various benchmark data sets from MoleculeNet, including ESOL, FreeSolv, Lipop, HIV, and 6 out of 12 tasks of the QM9 data set. To further examine PointGAT prediction of quantum mechanical (QM) energies, we constructed a C10 data set comprising 11,841 charged and chiral carbocation intermediates with QM energies calculated at the DM21/6-31G*//B3LYP/6-31G* levels. Notably, PointGAT achieved an R² value of 0.950 and an MAE of 1.616 kcal/mol, outperforming even the best-performing graph neural network model with a reduction of 0.216 kcal/mol in MAE and an improvement of 0.050 in R². Additional ablation studies indicated that incorporating molecular geometry into the model resulted in markedly higher predictive accuracy, reducing the MAE value from 1.802 to 1.616 kcal/mol. Moreover, visualization of PointGAT atomic attention weights suggested its predictions were interpretable. Findings in this study support the application of PointGAT as a powerful and versatile tool for quantum chemical property prediction that can facilitate high-accuracy modeling for fundamental exploration of chemical space as well as drug design and molecular engineering.
Affiliation(s)
- Rong Zhang
  - MOE Key Laboratory of Bioinformatics, State Key Laboratory of Molecular Oncology, School of Pharmaceutical Sciences, Tsinghua University, Beijing 100084, China
- Rongqing Yuan
  - Department of Chemistry, Tsinghua University, Beijing 100084, China
- Boxue Tian
  - MOE Key Laboratory of Bioinformatics, State Key Laboratory of Molecular Oncology, School of Pharmaceutical Sciences, Tsinghua University, Beijing 100084, China
13
Lu XY, Wu HP, Ma H, Li H, Li J, Liu YT, Pan ZY, Xie Y, Wang L, Ren B, Liu GK. Deep Learning-Assisted Spectrum-Structure Correlation: State-of-the-Art and Perspectives. Anal Chem 2024; 96:7959-7975. [PMID: 38662943 DOI: 10.1021/acs.analchem.4c01639] [Indexed: 05/22/2024]
Abstract
Spectrum-structure correlation is playing an increasingly crucial role in spectral analysis and has undergone significant development in recent decades. With the advancement of spectrometers, the high-throughput detection triggers the explosive growth of spectral data, and the research extension from small molecules to biomolecules accompanies massive chemical space. Facing the evolving landscape of spectrum-structure correlation, conventional chemometrics becomes ill-equipped, and deep learning assisted chemometrics rapidly emerges as a flourishing approach with superior ability of extracting latent features and making precise predictions. In this review, the molecular and spectral representations and fundamental knowledge of deep learning are first introduced. We then summarize the development of how deep learning assist to establish the correlation between spectrum and molecular structure in the recent 5 years, by empowering spectral prediction (i.e., forward structure-spectrum correlation) and further enabling library matching and de novo molecular generation (i.e., inverse spectrum-structure correlation). Finally, we highlight the most important open issues persisted with corresponding potential solutions. With the fast development of deep learning, it is expected to see ultimate solution of establishing spectrum-structure correlation soon, which would trigger substantial development of various disciplines.
Collapse
Affiliation(s)
- Xin-Yu Lu
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (iChEM), College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
  - Tan Kah Kee Innovation Laboratory, Xiamen 361005, P. R. China
- Hao-Ping Wu
  - State Key Laboratory of Marine Environmental Science, Fujian Provincial Key Laboratory for Coastal Ecology and Environmental Studies, Center for Marine Environmental Chemistry & Toxicology, College of the Environment and Ecology, Xiamen University, Xiamen, Fujian 361102, P. R. China
- Hao Ma
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (iChEM), College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
  - Tan Kah Kee Innovation Laboratory, Xiamen 361005, P. R. China
- Hui Li
  - Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University, Xiamen 361005, P. R. China
- Jia Li
  - Institute of Artificial Intelligence, Xiamen University, Xiamen 361005, P. R. China
- Yan-Ti Liu
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (iChEM), College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
  - Tan Kah Kee Innovation Laboratory, Xiamen 361005, P. R. China
- Zheng-Yan Pan
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (iChEM), College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
- Yi Xie
  - School of Informatics, Xiamen University, Xiamen 361005, P. R. China
- Lei Wang
  - Pen-Tung Sah Institute of Micro-Nano Science and Technology, Xiamen University, Xiamen 361005, P. R. China
- Bin Ren
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (iChEM), College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
  - Tan Kah Kee Innovation Laboratory, Xiamen 361005, P. R. China
- Guo-Kun Liu
  - State Key Laboratory of Marine Environmental Science, Fujian Provincial Key Laboratory for Coastal Ecology and Environmental Studies, Center for Marine Environmental Chemistry & Toxicology, College of the Environment and Ecology, Xiamen University, Xiamen, Fujian 361102, P. R. China
|
14
|
Mayo Yanes E, Chakraborty S, Gershoni-Poranne R. COMPAS-2: a dataset of cata-condensed hetero-polycyclic aromatic systems. Sci Data 2024; 11:97. [PMID: 38242917 PMCID: PMC10799083 DOI: 10.1038/s41597-024-02927-8] [Received: 10/19/2023] [Accepted: 01/05/2024] [Indexed: 01/21/2024] Open
Abstract
Polycyclic aromatic systems are highly important to numerous applications, in particular to organic electronics and optoelectronics. High-throughput screening and generative models that can help to identify new molecules to advance these technologies require large amounts of high-quality data, which is expensive to generate. In this report, we present the largest freely available dataset of geometries and properties of cata-condensed poly(hetero)cyclic aromatic molecules calculated to date. Our dataset contains ~500k molecules comprising 11 types of aromatic and antiaromatic building blocks calculated at the GFN1-xTB level and is representative of a highly diverse chemical space. We detail the structure enumeration process and the methods used to provide various electronic properties (including HOMO-LUMO gap, adiabatic ionization potential, and adiabatic electron affinity). Additionally, we benchmark against a ~50k dataset calculated at the CAM-B3LYP-D3BJ/def2-SVP level and develop a fitting scheme to correct the xTB values to higher accuracy. These new datasets represent the second installment in the COMputational database of Polycyclic Aromatic Systems (COMPAS) Project.
Affiliation(s)
- Eduardo Mayo Yanes
  - Schulich Faculty of Chemistry, Technion - Israel Institute of Technology, Haifa, 32000, Israel
- Sabyasachi Chakraborty
  - Schulich Faculty of Chemistry, Technion - Israel Institute of Technology, Haifa, 32000, Israel
- Renana Gershoni-Poranne
  - Schulich Faculty of Chemistry, Technion - Israel Institute of Technology, Haifa, 32000, Israel
|
15
|
Gryn'ova G. Crafting molecular architectures with guided diffusion. NATURE COMPUTATIONAL SCIENCE 2023; 3:821-822. [PMID: 38177764 DOI: 10.1038/s43588-023-00533-z] [Indexed: 01/06/2024]
Affiliation(s)
- Ganna Gryn'ova
  - Heidelberg Institute for Theoretical Studies (HITS gGmbH), Heidelberg, Germany
  - Interdisciplinary Center for Scientific Computing (IWR), Heidelberg University, Heidelberg, Germany
|