1
Li J, Reid JP. Connecting the complexity of stereoselective synthesis to the evolution of predictive tools. Chem Sci 2025; 16:3832-3851. [PMID: 39911341 PMCID: PMC11791519 DOI: 10.1039/d4sc07461k] [Received: 11/04/2024] [Accepted: 01/22/2025] [Indexed: 02/07/2025]
Abstract
Synthetic methods have seemingly progressed to an extent where there is an apparent and increasing need for predictive models to navigate the vast chemical space. Methods for anticipating and optimizing reaction outcomes have evolved from simple qualitative pictures generated from chemical intuition to complex models constructed from quantitative methods like quantum chemistry and machine learning. These toolsets are rooted in physical organic chemistry where fundamental principles of chemical reactivity and molecular interactions guide their development and application. Here, we detail how the evolution of these methods is a successful outcome and a powerful response to the diverse synthetic challenges confronted and the innovative selectivity concepts introduced. In this review, we perform a periodization of organic chemistry focusing on strategies that have been applied to guide the synthesis of chiral organic molecules.
Affiliation(s)
- Jiajing Li
  - Department of Chemistry, University of British Columbia, 2036 Main Mall, Vancouver, British Columbia V6T 1Z1, Canada
- Jolene P Reid
  - Department of Chemistry, University of British Columbia, 2036 Main Mall, Vancouver, British Columbia V6T 1Z1, Canada

2
Chen LY, Li YP. Machine learning-guided strategies for reaction conditions design and optimization. Beilstein J Org Chem 2024; 20:2476-2492. [PMID: 39376489 PMCID: PMC11457048 DOI: 10.3762/bjoc.20.212] [Received: 06/28/2024] [Accepted: 09/19/2024] [Indexed: 10/09/2024]
Abstract
This review surveys the recent advances and challenges in predicting and optimizing reaction conditions using machine learning techniques. The paper emphasizes the importance of acquiring and processing large and diverse datasets of chemical reactions, and the use of both global and local models to guide the design of synthetic processes. Global models exploit the information from comprehensive databases to suggest general reaction conditions for new reactions, while local models fine-tune the specific parameters for a given reaction family to improve yield and selectivity. The paper also identifies the current limitations and opportunities in this field, such as the data quality and availability, and the integration of high-throughput experimentation. The paper demonstrates how the combination of chemical engineering, data science, and ML algorithms can enhance the efficiency and effectiveness of reaction conditions design, and enable novel discoveries in synthetic chemistry.
Affiliation(s)
- Lung-Yi Chen
  - Department of Chemical Engineering, National Taiwan University, No. 1, Sec. 4, Roosevelt Road, Taipei 10617, Taiwan
- Yi-Pei Li
  - Department of Chemical Engineering, National Taiwan University, No. 1, Sec. 4, Roosevelt Road, Taipei 10617, Taiwan
  - Taiwan International Graduate Program on Sustainable Chemical Science and Technology (TIGP-SCST), No. 128, Sec. 2, Academia Road, Taipei 11529, Taiwan

3
Spiekermann KA, Dong X, Menon A, Green WH, Pfeifle M, Sandfort F, Welz O, Bergeler M. Accurately Predicting Barrier Heights for Radical Reactions in Solution Using Deep Graph Networks. J Phys Chem A 2024; 128:8384-8403. [PMID: 39298746 DOI: 10.1021/acs.jpca.4c04121] [Indexed: 09/22/2024]
Abstract
Quantitative estimates of reaction barriers and solvent effects are essential for developing kinetic mechanisms and predicting reaction outcomes. Here, we create a new data set of 5,600 unique elementary radical reactions calculated using the M06-2X/def2-QZVP//B3LYP-D3(BJ)/def2-TZVP level of theory. A conformer search is done for each species using TPSS/def2-TZVP. Gibbs free energies of activation and of reaction for these radical reactions in 40 common solvents are obtained using COSMO-RS for solvation effects. These balanced reactions involve the elements H, C, N, O, and S, contain up to 19 heavy atoms, and have atom-mapped SMILES. All transition states are verified by an intrinsic reaction coordinate calculation. We next train a deep graph network to directly estimate the Gibbs free energy of activation and of reaction in both gas and solution phases using only the atom-mapped SMILES of the reactant and product and the SMILES of the solvent. This simple input representation avoids computationally expensive optimizations for the reactant, transition state, and product structures during inference, making our model well-suited for high-throughput predictive chemistry and quickly providing information for (retro-)synthesis planning tools. To properly measure model performance, we report results on both interpolative and extrapolative data splits and also compare to several baseline models. During training and testing, the data set is augmented by including the reverse direction of each reaction and variants with different resonance structures. After data augmentation, we have around 2 million entries to train the model, which achieves a testing set mean absolute error of 1.16 kcal mol⁻¹ for the Gibbs free energy of activation in solution. We anticipate this model will accelerate predictions for high-throughput screening to quickly identify relevant reactions in solution, and our data set will serve as a benchmark for future studies.
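The reverse-direction augmentation mentioned in this abstract rests on a simple thermodynamic identity: because both directions share one transition state, the reverse barrier equals the forward barrier minus the reaction free energy. The following is a minimal Python sketch of that bookkeeping, not the authors' code. It assumes reactions are stored as `reactants>>products` SMILES strings; the function name `reverse_reaction` and the numerical values are hypothetical.

```python
def reverse_reaction(rxn_smiles, dg_act_fwd, dg_rxn_fwd):
    """Return the reverse reaction with its barrier and reaction free energy.

    Assumes a 'reactants>>products' reaction SMILES without an agents block.
    All energies are in kcal/mol.
    """
    reactants, products = rxn_smiles.split(">>")
    reversed_smiles = f"{products}>>{reactants}"
    # Same transition state, but measured from the product side.
    dg_act_rev = dg_act_fwd - dg_rxn_fwd
    dg_rxn_rev = -dg_rxn_fwd
    return reversed_smiles, dg_act_rev, dg_rxn_rev


# Illustrative (made-up) energies for a hydrogen abstraction from water
forward = "[CH3:1].[OH2:2]>>[CH4:1].[OH:2]"
print(reverse_reaction(forward, 20.0, 14.0))
```

Each labeled forward reaction therefore yields a second labeled training entry at no extra computational cost, which is one reason this kind of augmentation is attractive.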
Affiliation(s)
- Kevin A Spiekermann
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Xiaorui Dong
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Angiras Menon
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- William H Green
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Mark Pfeifle
  - BASF Digital Solutions GmbH, Ludwigshafen am Rhein 67061, Germany
- Frederik Sandfort
  - BASF SE, Scientific Modeling, Group Research, Ludwigshafen am Rhein 67056, Germany
- Oliver Welz
  - BASF SE, Scientific Modeling, Group Research, Ludwigshafen am Rhein 67056, Germany
- Maike Bergeler
  - BASF SE, Scientific Modeling, Group Research, Ludwigshafen am Rhein 67056, Germany

4
Plyer L, Marcou G, Perves C, Bonachera F, Varnek A. Implementation of a soft grading system for chemistry in a Moodle plugin: reaction handling. J Cheminform 2024; 16:90. [PMID: 39090756 PMCID: PMC11295431 DOI: 10.1186/s13321-024-00889-y] [Received: 04/17/2024] [Accepted: 07/21/2024] [Indexed: 08/04/2024]
Abstract
Here, we present a new method for evaluating questions on chemical reactions in the context of remote education. This method can be used when binary grading is not sufficient as some tolerance may be acceptable. In order to determine a grade, the developed workflow uses the pairwise similarity assessment of two considered reactions, each encoded by a single molecular graph with the help of the Condensed Graph of Reaction (CGR) approach. This workflow is part of the ChemMoodle project and is implemented as a Moodle Plugin. It uses the Chemdoodle engine for reaction drawing and visualization and communicates with a REST server calculating the similarity score using ISIDA fragment descriptors. The plugin is open-source, accessible in GitHub (https://github.com/Laboratoire-de-Chemoinformatique/moodle-qtype_reacsimilarity) and on the Moodle plugin store (https://moodle.org/plugins/qtype_reacsimilarity?lang=en). Both similarity measures and fragmentation can be configured. Scientific contribution: This work introduces an open-source method for evaluating chemical reaction questions within Moodle using the CGR approach. Our contribution provides a nuanced grading mechanism that accommodates acceptable tolerances in reaction assessments, enhancing the accuracy and flexibility of the grading process.
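To make the idea of similarity-based grading concrete, here is a minimal Python sketch under stated assumptions. It is not the plugin's implementation. It assumes each reaction has already been reduced to a dictionary of fragment counts, uses the Tanimoto coefficient as one possible similarity measure (the abstract only says the measure is configurable), and maps the score to a grade with a hypothetical linear rule. The fragment labels, `threshold`, and `max_points` are all illustrative.

```python
def tanimoto(counts_a, counts_b):
    """Tanimoto similarity between two fragment-count dictionaries."""
    keys = set(counts_a) | set(counts_b)
    shared = sum(min(counts_a.get(k, 0), counts_b.get(k, 0)) for k in keys)
    total = sum(max(counts_a.get(k, 0), counts_b.get(k, 0)) for k in keys)
    return shared / total if total else 1.0


def soft_grade(similarity, threshold=0.5, max_points=10.0):
    """Hypothetical grading rule: no credit below the threshold, linear above it.

    Assumes 0 <= threshold < 1.
    """
    if similarity < threshold:
        return 0.0
    return max_points * (similarity - threshold) / (1.0 - threshold)


# Made-up fragment counts for an expected answer and a student answer
expected = {"C-C": 2, "C=O": 1, "C-O(formed)": 1}
answer = {"C-C": 2, "C=O": 1, "C-N(formed)": 1}

score = tanimoto(expected, answer)
print(score, soft_grade(score))
```

A binary grader would give this answer zero; a similarity-based grader can award partial credit because most of the encoded transformation matches.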
Affiliation(s)
- Louis Plyer
  - Faculté de Chimie, University of Strasbourg, Strasbourg, France
- Gilles Marcou
  - Laboratory of Chemoinformatics-UMR7140, University of Strasbourg, Strasbourg, France
- Céline Perves
  - Direction du Numérique (DNUM), University of Strasbourg, Strasbourg, France
- Fanny Bonachera
  - Laboratory of Chemoinformatics-UMR7140, University of Strasbourg, Strasbourg, France
- Alexander Varnek
  - Laboratory of Chemoinformatics-UMR7140, University of Strasbourg, Strasbourg, France

5
Abedin MM, Tabata K, Matsumura Y, Komatsuzaki T. Multi-armed bandit algorithm for sequential experiments of molecular properties with dynamic feature selection. J Chem Phys 2024; 161:014115. [PMID: 38958158 DOI: 10.1063/5.0206042] [Received: 02/28/2024] [Accepted: 06/16/2024] [Indexed: 07/04/2024]
Abstract
Sequential optimization is one of the promising approaches for identifying the optimal candidate(s) (molecules, reactants, drugs, etc.) with desired properties (reaction yield, selectivity, efficacy, etc.) from a large set of potential candidates, while minimizing the number of experiments required. However, the high dimensionality of the feature space (e.g., molecular descriptors) often makes it difficult to utilize the relevant features when updating the set of candidates to be examined. In this article, we developed a new sequential optimization algorithm for molecular problems based on reinforcement learning, a multi-armed linear bandit framework, and online, dynamic feature selection in which the relevant molecular descriptors are updated as the experiments proceed. We also designed a stopping condition intended to guarantee the reliability of the candidate chosen from the dataset pool. The developed algorithm was compared with Bayesian optimization (BO) on two synthetic datasets and two real datasets, one comprising hydration free energies of molecules and the other free energy differences between enantiomeric products of a chemical reaction. We found that dynamic feature selection generally provides better performance (e.g., the time required to find the best candidate and stop the experiment) and that our multi-armed linear bandit approach with a dynamic feature selection scheme outperforms the standard BO with fixed feature variables. The comparison of our algorithm to BO with dynamic feature selection is also addressed.
Affiliation(s)
- Md Menhazul Abedin
  - Graduate School of Chemical Sciences and Engineering, Hokkaido University, Sapporo 060-8628, Japan
  - Khulna University, Khulna 9208, Bangladesh
- Koji Tabata
  - Research Institute for Electronic Science, Hokkaido University, Sapporo 001-0020, Japan
  - Department of Mathematics, Hokkaido University, Sapporo 060-0810, Japan
  - Institute for Chemical Reaction Design and Discovery (ICReDD), Hokkaido University, Sapporo 001-0020, Japan
- Yoshihiro Matsumura
  - Institute for Chemical Reaction Design and Discovery (ICReDD), Hokkaido University, Sapporo 001-0020, Japan
- Tamiki Komatsuzaki
  - Graduate School of Chemical Sciences and Engineering, Hokkaido University, Sapporo 060-8628, Japan
  - Research Institute for Electronic Science, Hokkaido University, Sapporo 001-0020, Japan
  - Institute for Chemical Reaction Design and Discovery (ICReDD), Hokkaido University, Sapporo 001-0020, Japan
  - Institute for Open and Transdisciplinary Research Initiatives, Osaka University, Yamadaoka, Suita 565-0871, Osaka, Japan
  - The Institute of Scientific and Industrial Research, Osaka University, 8-1 Mihogaoka, Ibaraki 567-0047, Osaka, Japan

6
Ryzhkov FV, Ryzhkova YE, Elinson MN. Python tools for structural tasks in chemistry. Mol Divers 2024:10.1007/s11030-024-10889-7. [PMID: 38744790 DOI: 10.1007/s11030-024-10889-7] [Received: 03/14/2024] [Accepted: 04/27/2024] [Indexed: 05/16/2024]
Abstract
In recent decades, the use of computational approaches and artificial intelligence in the scientific environment has become more widespread. In this regard, the popular and versatile programming language Python has attracted considerable attention from scientists in the field of chemistry. It is used to solve a variety of chemical and structural problems, including calculating descriptors, molecular fingerprints, graph construction, and computing chemical reaction networks. Python offers high-quality visualization tools for analyzing chemical spaces and compound libraries. This review is a list of tools for the above tasks, including scripts, libraries, ready-made programs, and web interfaces. Inevitably this manuscript does not claim to be an all-encompassing handbook including all the existing Python-based structural chemistry codes. The review serves as a starting point for scientists wishing to apply automatization or optimization to routine chemistry problems.
Affiliation(s)
- Fedor V Ryzhkov
  - N. D. Zelinsky Institute of Organic Chemistry Russian Academy of Sciences, 47 Leninsky Prospekt, Moscow, 119991, Russia
- Yuliya E Ryzhkova
  - N. D. Zelinsky Institute of Organic Chemistry Russian Academy of Sciences, 47 Leninsky Prospekt, Moscow, 119991, Russia
- Michail N Elinson
  - N. D. Zelinsky Institute of Organic Chemistry Russian Academy of Sciences, 47 Leninsky Prospekt, Moscow, 119991, Russia

7
Chen S, An S, Babazade R, Jung Y. Precise atom-to-atom mapping for organic reactions via human-in-the-loop machine learning. Nat Commun 2024; 15:2250. [PMID: 38480709 PMCID: PMC10937625 DOI: 10.1038/s41467-024-46364-y] [Received: 08/04/2023] [Accepted: 02/20/2024] [Indexed: 03/17/2024]
Abstract
Atom-to-atom mapping (AAM) is a task of identifying the position of each atom in the molecules before and after a chemical reaction, which is important for understanding the reaction mechanism. As more machine learning (ML) models were developed for retrosynthesis and reaction outcome prediction recently, the quality of these models is highly dependent on the quality of the AAM in reaction datasets. Although there are algorithms using graph theory or unsupervised learning to label the AAM for reaction datasets, existing methods map the atoms based on substructure alignments instead of chemistry knowledge. Here, we present LocalMapper, an ML model that learns correct AAM from chemist-labeled reactions via human-in-the-loop machine learning. We show that LocalMapper can predict the AAM for 50 K reactions with 98.5% calibrated accuracy by learning from only 2% of the human-labeled reactions from the entire dataset. More importantly, the confident predictions given by LocalMapper, which cover 97% of 50 K reactions, show 100% accuracy for 3,000 randomly sampled reactions. In an out-of-distribution experiment, LocalMapper shows favorable performance over other existing methods. We expect LocalMapper can be used to generate more precise reaction AAM and improve the quality of future ML-based reaction prediction models.
Affiliation(s)
- Shuan Chen
  - Department of Chemical and Biomolecular Engineering, KAIST, Daejeon, South Korea
  - Department of Chemical and Biological Engineering, Seoul National University, Seoul, South Korea
- Sunggi An
  - Department of Chemical and Biomolecular Engineering, KAIST, Daejeon, South Korea
  - Department of Chemical and Biological Engineering, Seoul National University, Seoul, South Korea
- Yousung Jung
  - Department of Chemical and Biomolecular Engineering, KAIST, Daejeon, South Korea
  - Department of Chemical and Biological Engineering, Seoul National University, Seoul, South Korea
  - Institute of Chemical Processes, Seoul National University, Seoul, South Korea
  - Interdisciplinary Program in Artificial Intelligence, Seoul National University, Seoul, South Korea

8
Sidorov P, Tsuji N. A Primer on 2D Descriptors in Selectivity Modeling for Asymmetric Catalysis. Chemistry 2024; 30:e202302837. [PMID: 38010242 DOI: 10.1002/chem.202302837] [Received: 08/31/2023] [Revised: 11/21/2023] [Accepted: 11/23/2023] [Indexed: 11/29/2023]
Abstract
Machine learning has permeated all fields of research, including chemistry, and is now an integral part of the design of novel compounds with desired properties. In the field of asymmetric catalysis, the preference still lies with models based on a physical understanding of the catalysis phenomenon and the electronic and steric properties of catalysts. However, such models require quantum chemical calculations and are thus limited by their computational cost. Here, we highlight the recent advances in modeling catalyst selectivity by using the 2D structures of catalysts and substrates. While these have a less explicit mechanistic connection to the modeled property, 2D descriptors, such as topological indices, molecular fingerprints, and fragments, offer the tremendous advantages of low cost and high speed of calculations. This makes them optimal for the in-silico screening of large amounts of data. We provide an overview of common quantitative structure-property relationship workflow, model building and validation techniques, applications of these methodologies in asymmetric catalysis design, and an outlook on improving the understanding of 2D-based models.
Affiliation(s)
- Pavel Sidorov
  - Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Sapporo, 001-0021, Japan
- Nobuya Tsuji
  - Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Sapporo, 001-0021, Japan

9
Chung Y, Green WH. Machine learning from quantum chemistry to predict experimental solvent effects on reaction rates. Chem Sci 2024; 15:2410-2424. [PMID: 38362410 PMCID: PMC10866337 DOI: 10.1039/d3sc05353a] [Received: 10/10/2023] [Accepted: 01/04/2024] [Indexed: 02/17/2024]
Abstract
Fast and accurate prediction of solvent effects on reaction rates is crucial for kinetic modeling, chemical process design, and high-throughput solvent screening. Despite recent advances in machine learning, a scarcity of reliable data has hindered the development of predictive models that are generalizable for diverse reactions and solvents. In this work, we generate a large set of data with the COSMO-RS method for over 28 000 neutral reactions and 295 solvents and train a machine learning model to predict the solvation free energy and solvation enthalpy of activation (ΔΔG‡solv, ΔΔH‡solv) for a solution phase reaction. On unseen reactions, the model achieves mean absolute errors of 0.71 and 1.03 kcal mol⁻¹ for ΔΔG‡solv and ΔΔH‡solv, respectively, relative to the COSMO-RS calculations. The model also provides reliable predictions of relative rate constants within a factor of 4 when tested on experimental data. The presented model can provide nearly instantaneous predictions of kinetic solvent effects or relative rate constants for a broad range of neutral closed-shell or free radical reactions and solvents based only on atom-mapped reaction SMILES and solvent SMILES strings.
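For readers who want to relate an error in the activation free energy to an error in the rate, transition state theory gives the relative rate constant as exp(-ΔΔG‡/RT). The sketch below is an illustration only and not part of the cited work. The temperature of 298.15 K is an assumption, and the function name `rate_ratio` is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def rate_ratio(ddg_act, temperature=298.15):
    """Relative rate constant implied by a change in activation free energy.

    ddg_act is in kcal/mol; a negative value lowers the barrier and speeds
    up the reaction. Uses k_rel = exp(-ddg_act / (R * T)).
    """
    return math.exp(-ddg_act / (R_KCAL * temperature))


# A barrier shift of 0.71 kcal/mol changes the rate by roughly a factor of 3.3
# at the assumed temperature of 298.15 K.
print(rate_ratio(-0.71))
```

Read this way, a sub-1 kcal mol⁻¹ error in the barrier is of the same order as a factor-of-4 uncertainty in a relative rate constant, although the two figures in the abstract come from different test sets.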
Affiliation(s)
- Yunsie Chung
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- William H Green
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA

10
Voinarovska V, Kabeshov M, Dudenko D, Genheden S, Tetko IV. When Yield Prediction Does Not Yield Prediction: An Overview of the Current Challenges. J Chem Inf Model 2024; 64:42-56. [PMID: 38116926 PMCID: PMC10778086 DOI: 10.1021/acs.jcim.3c01524] [Received: 09/21/2023] [Revised: 11/29/2023] [Accepted: 11/30/2023] [Indexed: 12/21/2023]
Abstract
Machine Learning (ML) techniques face significant challenges when predicting advanced chemical properties, such as yield, feasibility of chemical synthesis, and optimal reaction conditions. These challenges stem from the high-dimensional nature of the prediction task and the myriad essential variables involved, ranging from reactants and reagents to catalysts, temperature, and purification processes. Successfully developing a reliable predictive model not only holds the potential for optimizing high-throughput experiments but can also elevate existing retrosynthetic predictive approaches and bolster a plethora of applications within the field. In this review, we systematically evaluate the efficacy of current ML methodologies in chemoinformatics, shedding light on their milestones and inherent limitations. Additionally, a detailed examination of a representative case study provides insights into the prevailing issues related to data availability and transferability in the discipline.
Affiliation(s)
- Varvara Voinarovska
  - Molecular AI, Discovery Sciences R&D, AstraZeneca, 431 83 Gothenburg, Sweden
  - TUM Graduate School, Faculty of Chemistry, Technical University of Munich, 85748 Garching, Germany
- Mikhail Kabeshov
  - Molecular AI, Discovery Sciences R&D, AstraZeneca, 431 83 Gothenburg, Sweden
- Dmytro Dudenko
  - Enamine Ltd., 78 Chervonotkatska str., 02094 Kyiv, Ukraine
- Samuel Genheden
  - Molecular AI, Discovery Sciences R&D, AstraZeneca, 431 83 Gothenburg, Sweden
- Igor V. Tetko
  - Molecular Targets and Therapeutics Center, Helmholtz Munich - Deutsches Forschungszentrum für Gesundheit und Umwelt (GmbH), Institute of Structural Biology, 85764 Neuherberg, Germany

11
Zankov D, Madzhidov T, Polishchuk P, Sidorov P, Varnek A. Multi-Instance Learning Approach to the Modeling of Enantioselectivity of Conformationally Flexible Organic Catalysts. J Chem Inf Model 2023; 63:6629-6641. [PMID: 37902548 DOI: 10.1021/acs.jcim.3c00393] [Indexed: 10/31/2023]
Abstract
Computational design of chiral organic catalysts for asymmetric synthesis is a promising technology that can significantly reduce the material and human resources required for the preparation of enantiopure compounds. Herein, for the modeling of catalysts' enantioselectivity, we propose to use the multi-instance learning approach accounting for multiple catalyst conformers and requiring neither conformer selection nor their spatial alignment. A catalyst was represented by an ensemble of conformers, each encoded by three-dimensional (3D) pmapper descriptors. A catalyzed reactant transformation was converted into a single molecular graph, a condensed graph of reaction, encoded by 2D fragment descriptors. A whole chemical reaction was finally encoded by concatenated 3D catalyst and 2D transformation descriptors. The performance of the proposed method was demonstrated in the modeling of the enantioselectivity of homogeneous and phase-transfer reactions and compared with the state-of-the-art approaches.
Affiliation(s)
- Dmitry Zankov
  - Laboratory of Chemoinformatics, University of Strasbourg, Strasbourg 67081, France
  - Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Sapporo 001-0021, Japan
- Timur Madzhidov
  - Chemistry Solutions, Elsevier Ltd., Oxford OX5 1GB, United Kingdom
- Pavel Polishchuk
  - Institute of Molecular and Translational Medicine, Palacký University, Olomouc 77900, Czech Republic
- Pavel Sidorov
  - Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Sapporo 001-0021, Japan
- Alexandre Varnek
  - Laboratory of Chemoinformatics, University of Strasbourg, Strasbourg 67081, France
  - Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Sapporo 001-0021, Japan

12
Zankov D, Madzhidov T, Baskin I, Varnek A. Conjugated quantitative structure-property relationship models: Prediction of kinetic characteristics linked by the Arrhenius equation. Mol Inform 2023; 42:e2200275. [PMID: 37488968 DOI: 10.1002/minf.202200275] [Received: 12/14/2022] [Revised: 07/08/2023] [Accepted: 07/24/2023] [Indexed: 07/26/2023]
Abstract
Conjugated QSPR models for reactions integrate fundamental chemical laws expressed by mathematical equations with machine learning algorithms. Herein we present a methodology for building conjugated QSPR models integrated with the Arrhenius equation. Conjugated QSPR models were used to predict kinetic characteristics of cycloaddition reactions related by the Arrhenius equation: rate constant $\log k$, pre-exponential factor $\log A$, and activation energy $E_{\rm a}$. They were benchmarked against single-task (individual and equation-based models) and multi-task models. In individual models, all characteristics were modeled separately, while in multi-task models $\log k$, $\log A$ and $E_{\rm a}$ were treated cooperatively. An equation-based model assessed $\log k$ using the Arrhenius equation and $\log A$ and $E_{\rm a}$ values predicted by individual models. It has been demonstrated that the conjugated QSPR models can accurately predict the reaction rate constants at extreme temperatures, at which reaction rate constants hardly can be measured experimentally. Also, in the case of small training sets conjugated models are more robust than related single-task approaches.
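The Arrhenius relation that ties the three quantities together is k = A exp(-Ea / (R T)), or in base-10 logarithms log k = log A - Ea / (ln(10) R T). The sketch below shows how an equation-based estimate of log k could be computed from separately obtained log A and Ea values and then extrapolated across temperatures. It is a minimal illustration, not the authors' models: the function name `log10_k`, the units (kJ/mol for Ea), and the numerical inputs are assumptions.

```python
import math

R = 8.314462618  # gas constant, J mol^-1 K^-1


def log10_k(log10_a, ea_kj_mol, temperature_k):
    """Base-10 log of the rate constant from the Arrhenius equation.

    log10(k) = log10(A) - Ea / (ln(10) * R * T), with Ea given in kJ/mol.
    """
    return log10_a - (ea_kj_mol * 1000.0) / (math.log(10) * R * temperature_k)


# Hypothetical cycloaddition with log A = 6.0 and Ea = 60 kJ/mol,
# evaluated at a low, an ambient, and a high temperature.
for t in (200.0, 298.15, 500.0):
    print(t, round(log10_k(6.0, 60.0, t), 2))
```

Because the temperature dependence is carried by the equation rather than learned from data, the same pair of predicted parameters can be evaluated at temperatures where no measurements exist.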
Affiliation(s)
- Dmitry Zankov
  - Laboratory of Chemoinformatics, University of Strasbourg, France
- Timur Madzhidov
  - Chemistry Solutions, Elsevier Ltd, Oxford, OX5 1GB, United Kingdom
- Igor Baskin
  - Department of Materials Science and Engineering, Technion - Israel Institute of Technology, Israel
- Alexandre Varnek
  - Laboratory of Chemoinformatics, University of Strasbourg, France

13
Sar S, Mitra S, Panda P, Mandal SC, Ghosh N, Halder AK, Cordeiro MNDS. In Silico Modeling and Structural Analysis of Soluble Epoxide Hydrolase Inhibitors for Enhanced Therapeutic Design. Molecules 2023; 28:6379. [PMID: 37687207 PMCID: PMC10490281 DOI: 10.3390/molecules28176379] [Received: 07/13/2023] [Revised: 08/17/2023] [Accepted: 08/28/2023] [Indexed: 09/10/2023]
Abstract
Human soluble epoxide hydrolase (sEH), a dual-functioning homodimeric enzyme with hydrolase and phosphatase activities, is known for its pivotal role in the hydrolysis of epoxyeicosatrienoic acids. Inhibitors targeting sEH have shown promising potential in the treatment of various life-threatening diseases. In this study, we employed a range of in silico modeling approaches to investigate a diverse dataset of structurally distinct sEH inhibitors. Our primary aim was to develop predictive and validated models while gaining insights into the structural requirements necessary for achieving higher inhibitory potential. To accomplish this, we initially calculated molecular descriptors using nine different descriptor-calculating tools, coupled with stochastic and non-stochastic feature selection strategies, to identify the most statistically significant linear 2D-QSAR model. The resulting model highlighted the critical roles played by topological characteristics, 2D pharmacophore features, and specific physicochemical properties in enhancing inhibitory potential. In addition to conventional 2D-QSAR modeling, we implemented the Transformer-CNN methodology to develop QSAR models, enabling us to obtain structural interpretations based on the Layer-wise Relevance Propagation (LRP) algorithm. Moreover, a comprehensive 3D-QSAR analysis provided additional insights into the structural requirements of these compounds as potent sEH inhibitors. To validate the findings from the QSAR modeling studies, we performed molecular dynamics (MD) simulations using selected compounds from the dataset. The simulation results offered crucial insights into receptor-ligand interactions, supporting the predictions obtained from the QSAR models. Collectively, our work serves as an essential guideline for the rational design of novel sEH inhibitors with enhanced therapeutic potential. 
Importantly, all the in silico studies were performed using open-access tools to ensure reproducibility and accessibility.
Affiliation(s)
- Shuvam Sar
  - Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700032, India
- Soumya Mitra
  - Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700032, India
  - Dr. B. C. Roy College of Pharmacy and Allied Health Sciences, Campus Dr. Meghnad Saha Sarani, Durgapur 713206, India
- Parthasarathi Panda
  - Dr. B. C. Roy College of Pharmacy and Allied Health Sciences, Campus Dr. Meghnad Saha Sarani, Durgapur 713206, India
- Subhash C. Mandal
  - Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700032, India
- Nilanjan Ghosh
  - Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700032, India
- Amit Kumar Halder
  - Dr. B. C. Roy College of Pharmacy and Allied Health Sciences, Campus Dr. Meghnad Saha Sarani, Durgapur 713206, India
  - LAQV@REQUIMTE, Department of Chemistry and Biochemistry, Faculty of Sciences, University of Porto, 4169-007 Porto, Portugal
- Maria Natalia D. S. Cordeiro
  - LAQV@REQUIMTE, Department of Chemistry and Biochemistry, Faculty of Sciences, University of Porto, 4169-007 Porto, Portugal

14
Jiang J, Zhang R, Yuan Y, Li T, Li G, Zhao Z, Yu Z. NoiseMol: A noise-robusted data augmentation via perturbing noise for molecular property prediction. J Mol Graph Model 2023; 121:108454. [PMID: 36963306 DOI: 10.1016/j.jmgm.2023.108454] [Received: 01/10/2023] [Revised: 03/05/2023] [Accepted: 03/13/2023] [Indexed: 03/17/2023]
Abstract
Simplified Molecular-Input Line-Entry System (SMILES) is one of the widely used molecular representation methods for molecular property prediction. We conjecture that all the characters in the SMILES string of a molecule are essential for making up the molecule, but most of them make little contribution to determining a particular property of the molecule. We verified this conjecture in a preliminary experiment. Motivated by the result, we propose to inject proper noisy information into the SMILES to augment the training data by increasing the diversity of the labeled molecules. To this end, we explore injecting perturbing noise into the original labeled SMILES strings to construct augmented data, alleviating the scarcity of labeled compound data and helping the model extract more useful molecular representations for molecular property prediction. Specifically, we directly adopt mask, swap, deletion, and fusion operations on SMILES strings to randomly mask, swap, and delete atoms in SMILES strings. The augmented data are then used in one of two strategies: the original and the noise-perturbed molecules are fed alternately either from epoch to epoch or from batch to batch. We conduct experiments on both Transformer and BiGRU models, using widely used datasets from MoleculeNet and ZINC, to validate the method's effectiveness. Experimental results demonstrate that the proposed method outperforms strong baselines on all the datasets. NoiseMol obtains the best performance on BBBP and FDA when compared with state-of-the-art methods, and it also achieves the best accuracy on LogP. Injecting perturbing noise into the labeled SMILES strings is therefore an effective and efficient method that improves the prediction performance, generalization, and robustness of deep learning models.
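The mask, swap, and deletion operations named in this abstract can be illustrated with a minimal Python sketch. It is not the NoiseMol implementation: the tokenizer is deliberately rough (bracket atoms, `Cl`, and `Br` only), the `*` mask token and all function names are assumptions made for illustration, and the fusion operation is omitted because the abstract does not describe it. The perturbed strings are training noise and are not guaranteed to be valid SMILES.

```python
import random

MASK_TOKEN = "*"  # assumed placeholder; the paper's actual mask symbol is not given here


def tokenize(smiles):
    """Split a SMILES string into tokens (bracket atoms, Cl/Br, single characters)."""
    tokens, i = [], 0
    while i < len(smiles):
        if smiles[i] == "[":
            j = smiles.index("]", i)          # bracket atom such as [NH4+]
            tokens.append(smiles[i:j + 1])
            i = j + 1
        elif smiles[i:i + 2] in ("Cl", "Br"):  # two-letter organic-subset atoms
            tokens.append(smiles[i:i + 2])
            i += 2
        else:
            tokens.append(smiles[i])
            i += 1
    return tokens


def is_atom(token):
    """Treat bracket atoms and alphabetic tokens as atoms; bonds, digits, parentheses are not."""
    return token[0] == "[" or token[0].isalpha()


def atom_positions(tokens):
    return [i for i, t in enumerate(tokens) if is_atom(t)]


def mask_atom(tokens, rng):
    """Replace one randomly chosen atom token with the mask token."""
    out, idx = list(tokens), atom_positions(tokens)
    if idx:
        out[rng.choice(idx)] = MASK_TOKEN
    return out


def swap_atoms(tokens, rng):
    """Exchange the positions of two randomly chosen atom tokens."""
    out, idx = list(tokens), atom_positions(tokens)
    if len(idx) >= 2:
        a, b = rng.sample(idx, 2)
        out[a], out[b] = out[b], out[a]
    return out


def delete_atom(tokens, rng):
    """Remove one randomly chosen atom token."""
    out, idx = list(tokens), atom_positions(tokens)
    if idx:
        del out[rng.choice(idx)]
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    tokens = tokenize("CC(=O)Cl")  # acetyl chloride
    for op in (mask_atom, swap_atoms, delete_atom):
        print(op.__name__, "".join(op(tokens, rng)))
```

Operating on tokens rather than raw characters keeps multi-character atoms such as `Cl` or `[NH4+]` intact when they are masked, swapped, or deleted.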
Affiliation(s)
- Jing Jiang
- School of Information Science and Engineering, Lanzhou University, Lanzhou, Gansu, China; Key Laboratory of China's Ethnic Languages and Information Technology of Ministry of Education, Northwest Minzu University, Lanzhou, Gansu, China.
| | - Ruisheng Zhang
- School of Information Science and Engineering, Lanzhou University, Lanzhou, Gansu, China.
| | - Yongna Yuan
- School of Information Science and Engineering, Lanzhou University, Lanzhou, Gansu, China.
| | - Tongfeng Li
- School of Information Science and Engineering, Lanzhou University, Lanzhou, Gansu, China; Computer College, Qinghai Normal University, Xining, Qinghai, China.
| | - Gaili Li
- School of Information Science and Engineering, Lanzhou University, Lanzhou, Gansu, China.
| | - Zhili Zhao
- School of Information Science and Engineering, Lanzhou University, Lanzhou, Gansu, China.
| | - Zhixuan Yu
- School of Information Science and Engineering, Lanzhou University, Lanzhou, Gansu, China.
|
15
|
Ksenofontov AA, Isaev YI, Lukanov MM, Makarov DM, Eventova VA, Khodov IA, Berezin MB. Accurate prediction of 11B NMR chemical shift of BODIPYs via machine learning. Phys Chem Chem Phys 2023; 25:9472-9481. [PMID: 36935644 DOI: 10.1039/d3cp00253e] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 03/08/2023]
Abstract
In this article, we present a model for predicting the 11B NMR chemical shift of BODIPYs, built with an RFR machine learning method and ISIDA fragment descriptors. The model is freely available at https://ochem.eu/article/146458. It predicts the 11B NMR chemical shift with high quality (RMSE, 5CV (FINALE training set) = 0.40 ppm, RMSE (TEST set) = 0.14 ppm). In addition, we compared its "cost" and user-friendliness with those of a quantum-chemical model based on the DFT/GIAO approach. The model's prediction error (RMSE) for the 11B NMR chemical shift is more than three times lower than that of the DFT/GIAO calculations, and its predictions are tremendously faster. As a result, we provide all researchers with a convenient tool, together with the database we collected, for predicting the 11B NMR chemical shift of boron-containing dyes. We believe that the new model will make it easier for researchers to correctly interpret experimentally determined 11B NMR chemical shifts and to select better conditions for an NMR experiment.
Affiliation(s)
- Alexander A Ksenofontov
- G.A. Krestov Institute of Solution Chemistry of the Russian Academy of Sciences, Akademicheskaya Street, 153045 Ivanovo, Russia.
| | - Yaroslav I Isaev
- G.A. Krestov Institute of Solution Chemistry of the Russian Academy of Sciences, Akademicheskaya Street, 153045 Ivanovo, Russia.
- Ivanovo State University of Chemistry and Technology, 7, Sheremetevskiy Avenue, Ivanovo 153000, Russia
| | - Michail M Lukanov
- G.A. Krestov Institute of Solution Chemistry of the Russian Academy of Sciences, Akademicheskaya Street, 153045 Ivanovo, Russia.
| | - Dmitry M Makarov
- G.A. Krestov Institute of Solution Chemistry of the Russian Academy of Sciences, Akademicheskaya Street, 153045 Ivanovo, Russia.
| | - Varvara A Eventova
- G.A. Krestov Institute of Solution Chemistry of the Russian Academy of Sciences, Akademicheskaya Street, 153045 Ivanovo, Russia.
- Ivanovo State University of Chemistry and Technology, 7, Sheremetevskiy Avenue, Ivanovo 153000, Russia
| | - Ilya A Khodov
- G.A. Krestov Institute of Solution Chemistry of the Russian Academy of Sciences, Akademicheskaya Street, 153045 Ivanovo, Russia.
| | - Mechail B Berezin
- G.A. Krestov Institute of Solution Chemistry of the Russian Academy of Sciences, Akademicheskaya Street, 153045 Ivanovo, Russia.
|
16
|
Tsuji N, Sidorov P, Zhu C, Nagata Y, Gimadiev T, Varnek A, List B. Predicting Highly Enantioselective Catalysts Using Tunable Fragment Descriptors. Angew Chem Int Ed Engl 2023; 62:e202218659. [PMID: 36688354 DOI: 10.1002/anie.202218659] [Citation(s) in RCA: 22] [Impact Index Per Article: 11.0] [Received: 12/17/2022] [Revised: 01/17/2023] [Accepted: 01/19/2023] [Indexed: 01/24/2023]
Abstract
Catalyst optimization processes typically rely on inductive and qualitative assumptions of chemists based on screening data. While machine learning models using molecular properties or calculated 3D structures enable quantitative data evaluation, costly quantum chemical calculations are often required. In contrast, readily available binary fingerprint descriptors are time- and cost-efficient, but their predictive performance remains insufficient. Here, we describe a machine learning model based on fragment descriptors, which are fine-tuned for asymmetric catalysis and represent cyclic or polyaromatic hydrocarbons, enabling robust and efficient virtual screening. Using training data with only moderate selectivities, we designed theoretically and validated experimentally new catalysts showing higher selectivities in a challenging asymmetric tetrahydropyran synthesis.
Affiliation(s)
- Nobuya Tsuji
- Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Sapporo, 001-0021, Japan
| | - Pavel Sidorov
- Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Sapporo, 001-0021, Japan
| | - Chendan Zhu
- Max-Planck-Institut für Kohlenforschung, 45470, Mülheim an der Ruhr, Germany
| | - Yuuya Nagata
- Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Sapporo, 001-0021, Japan
| | - Timur Gimadiev
- Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Sapporo, 001-0021, Japan
| | - Alexandre Varnek
- Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Sapporo, 001-0021, Japan
- Laboratory of Chemoinformatics, UMR 7140, CNRS, University of Strasbourg, 67081, Strasbourg, France
| | - Benjamin List
- Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Sapporo, 001-0021, Japan
- Max-Planck-Institut für Kohlenforschung, 45470, Mülheim an der Ruhr, Germany
|
17
|
Kwon Y, Kim S, Choi YS, Kang S. Generative Modeling to Predict Multiple Suitable Conditions for Chemical Reactions. J Chem Inf Model 2022; 62:5952-5960. [PMID: 36413480 DOI: 10.1021/acs.jcim.2c01085] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 11/23/2022]
Abstract
In synthesis planning, it is important to determine suitable reaction conditions such that a chemical reaction proceeds as intended. Recent research attempts based on machine learning have proven to be effective in recommending reaction elements for specific categories regarding critical chemical context and operating conditions. However, existing methods can only make a single prediction per reaction and do not directly provide a complete specification of the reaction elements as the prediction. Therefore, their achievable performance is limited. In this study, we propose a generative modeling approach to predict multiple different reaction conditions for a chemical reaction, each of which fully specifies critical reaction elements such that these elements can be directly used as a feasible reaction condition. We formulate the problem of predicting reaction conditions as sampling from a generative distribution. We model the distribution by introducing a variational autoencoder augmented with a graph neural network and learn it from a reaction dataset. For a query reaction, multiple predictions can be obtained by repeated sampling from the distribution. Through experimental investigation on the reaction datasets of four major types of cross-coupling reactions, we demonstrate that the proposed method significantly outperforms existing methods in retrieving ground-truth reaction conditions.
Affiliation(s)
- Youngchun Kwon
- Samsung Advanced Institute of Technology, Samsung Electronics Co., Ltd., 130 Samsung-ro, Yeongtong-gu, Suwon 16678, Republic of Korea
- Department of Computer Science and Engineering, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul 08826, Republic of Korea
| | - Sun Kim
- Department of Computer Science and Engineering, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul 08826, Republic of Korea
| | - Youn-Suk Choi
- Samsung Advanced Institute of Technology, Samsung Electronics Co., Ltd., 130 Samsung-ro, Yeongtong-gu, Suwon 16678, Republic of Korea
| | - Seokho Kang
- Department of Industrial Engineering, Sungkyunkwan University, 2066 Seobu-ro, Jangan-gu, Suwon 16419, Republic of Korea
|
18
|
Mai H, Le TC, Chen D, Winkler DA, Caruso RA. Machine Learning for Electrocatalyst and Photocatalyst Design and Discovery. Chem Rev 2022; 122:13478-13515. [PMID: 35862246 DOI: 10.1021/acs.chemrev.2c00061] [Citation(s) in RCA: 97] [Impact Index Per Article: 32.3] [Indexed: 12/17/2022]
Abstract
Electrocatalysts and photocatalysts are key to a sustainable future, generating clean fuels, reducing the impact of global warming, and providing solutions to environmental pollution. Improved processes for catalyst design and a better understanding of electro/photocatalytic processes are essential for improving catalyst effectiveness. Recent advances in data science and artificial intelligence have great potential to accelerate electrocatalysis and photocatalysis research, particularly the rapid exploration of large materials chemistry spaces through machine learning. Here a comprehensive introduction to, and critical review of, machine learning techniques used in electrocatalysis and photocatalysis research are provided. Sources of electro/photocatalyst data and current approaches to representing these materials by mathematical features are described, the most commonly used machine learning methods summarized, and the quality and utility of electro/photocatalyst models evaluated. Illustrations of how machine learning models are applied to novel electro/photocatalyst discovery and used to elucidate electrocatalytic or photocatalytic reaction mechanisms are provided. The review offers a guide for materials scientists on the selection of machine learning methods for electrocatalysis and photocatalysis research. The application of machine learning to catalysis science represents a paradigm shift in the way advanced, next-generation catalysts will be designed and synthesized.
Affiliation(s)
- Haoxin Mai
- Applied Chemistry and Environmental Science, School of Science, STEM College, RMIT University, GPO Box 2476, Melbourne, Victoria 3001, Australia
| | - Tu C Le
- School of Engineering, STEM College, RMIT University, GPO Box 2476, Melbourne, Victoria 3001, Australia
| | - Dehong Chen
- Applied Chemistry and Environmental Science, School of Science, STEM College, RMIT University, GPO Box 2476, Melbourne, Victoria 3001, Australia
| | - David A Winkler
- Monash Institute of Pharmaceutical Sciences, Monash University, Parkville, Victoria 3052, Australia
- Biochemistry and Chemistry, La Trobe University, Kingsbury Drive, Bundoora, Victoria 3042, Australia
- School of Pharmacy, University of Nottingham, Nottingham NG7 2RD, United Kingdom
| | - Rachel A Caruso
- Applied Chemistry and Environmental Science, School of Science, STEM College, RMIT University, GPO Box 2476, Melbourne, Victoria 3001, Australia
|
19
|
Lewis‐Atwell T, Townsend PA, Grayson MN. Machine learning activation energies of chemical reactions. WIRES COMPUTATIONAL MOLECULAR SCIENCE 2022. [DOI: 10.1002/wcms.1593] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 12/11/2022]
Affiliation(s)
- Toby Lewis‐Atwell
- Department of Computer Science, Faculty of Science, University of Bath, Bath, UK
| | - Piers A. Townsend
- Department of Chemistry, Faculty of Science, University of Bath, Bath, UK
|
20
|
Spiekermann KA, Pattanaik L, Green WH. Fast Predictions of Reaction Barrier Heights: Toward Coupled-Cluster Accuracy. J Phys Chem A 2022; 126:3976-3986. [PMID: 35727075 DOI: 10.1021/acs.jpca.2c02614] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Indexed: 12/16/2022]
Abstract
Quantitative estimates of reaction barriers are essential for developing kinetic mechanisms and predicting reaction outcomes. However, the lack of experimental data and the steep scaling of accurate quantum calculations often hinder the ability to obtain reliable kinetic values. Here, we train a directed message passing neural network on nearly 24,000 diverse gas-phase reactions calculated at CCSD(T)-F12a/cc-pVDZ-F12//ωB97X-D3/def2-TZVP. Our model uses 75% fewer parameters than previous studies, an improved reaction representation, and proper data splits to accurately estimate performance on unseen reactions. Using information from only the reactant and product, our model quickly predicts barrier heights with a testing MAE of 2.6 kcal mol⁻¹ relative to the coupled-cluster data, making it more accurate than a good density functional theory calculation. Furthermore, our results show that future modeling efforts to estimate reaction properties would significantly benefit from fine-tuning calibration using a transfer learning technique. We anticipate this model will accelerate and improve kinetic predictions for small molecule chemistry.
Affiliation(s)
- Kevin A Spiekermann
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Lagnajit Pattanaik
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - William H Green
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
|
21
|
Heid E, Green WH. Machine Learning of Reaction Properties via Learned Representations of the Condensed Graph of Reaction. J Chem Inf Model 2022; 62:2101-2110. [PMID: 34734699 PMCID: PMC9092344 DOI: 10.1021/acs.jcim.1c00975] [Citation(s) in RCA: 60] [Impact Index Per Article: 20.0] [Received: 08/11/2021] [Indexed: 11/28/2022]
Abstract
The estimation of chemical reaction properties such as activation energies, rates, or yields is a central topic of computational chemistry. In contrast to molecular properties, where machine learning approaches such as graph convolutional neural networks (GCNNs) have excelled for a wide variety of tasks, no general and transferable adaptations of GCNNs for reactions have been developed yet. We therefore combined a popular cheminformatics reaction representation, the so-called condensed graph of reaction (CGR), with a recent GCNN architecture to arrive at a versatile, robust, and compact deep learning model. The CGR is a superposition of the reactant and product graphs of a chemical reaction and thus an ideal input for graph-based machine learning approaches. The model learns to create a data-driven, task-dependent reaction embedding that does not rely on expert knowledge, similar to current molecular GCNNs. Our approach outperforms current state-of-the-art models in accuracy, is applicable even to imbalanced reactions, and possesses excellent predictive capabilities for diverse target properties, such as activation energies, reaction enthalpies, rate constants, yields, or reaction classes. We furthermore curated a large set of atom-mapped reactions along with their target properties, which can serve as benchmark data sets for future work. All data sets and the developed reaction GCNN model are available online, free of charge, and open source.
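The idea of the condensed graph of reaction as a superposition of reactant and product graphs can be made concrete with a small Python sketch. It is a toy illustration under stated assumptions, not the representation used in the cited model: molecules are reduced to dictionaries mapping atom-mapped atom pairs to bond orders, and the function names `condensed_graph` and `reaction_center` are hypothetical.

```python
def condensed_graph(reactant_bonds, product_bonds):
    """Superimpose two bond dictionaries keyed by (atom_i, atom_j) with i < j.

    Each edge of the result carries a pair (order_in_reactants, order_in_products);
    an order of 0 means the bond is absent on that side of the reaction.
    """
    cgr = {}
    for pair in set(reactant_bonds) | set(product_bonds):
        cgr[pair] = (reactant_bonds.get(pair, 0), product_bonds.get(pair, 0))
    return cgr


def reaction_center(cgr):
    """Return the edges whose bond order changes during the reaction."""
    return {pair for pair, (before, after) in cgr.items() if before != after}


if __name__ == "__main__":
    # Toy substitution, heavy atoms only: CH3CH2-Br + OH(-) -> CH3CH2-OH + Br(-)
    # Atom map: 1 = CH3 carbon, 2 = CH2 carbon, 3 = Br, 4 = O
    reactants = {(1, 2): 1, (2, 3): 1}
    products = {(1, 2): 1, (2, 4): 1}

    cgr = condensed_graph(reactants, products)
    for pair in sorted(cgr):
        print(pair, cgr[pair])
    print("reaction center:", sorted(reaction_center(cgr)))
```

Because unchanged bonds carry identical labels on both sides, the edges with differing labels directly mark the broken and formed bonds, which is what makes the superposition a convenient single-graph input.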
Affiliation(s)
- Esther Heid
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - William H. Green
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
|
22
|
Schadow G, Borodina YV, Delannée V, Ihlenfeldt WD, Godfrey AG, Nicklaus MC. Reaction SPL – extension of a public document markup standard to chemical reactions. PURE APPL CHEM 2022. [PMCID: PMC9189732 DOI: 10.1515/pac-2021-2011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
There are numerous formats and data models for describing reaction-related data. However, each offers only a limited coverage of the multitude of information that can be of interest to a broad user base in the context of chemical reactions. Structured Product Labeling (SPL) is a robust yet fairly light public XML document standard. It uses a highly generic but usefully refinable data schema, which is, like a language, highly expressive. We are therefore presenting an extension of SPL to chemical reactions (“Reaction SPL”). This extension is designed to support chemical manufacturing processes, which include as a minimum the chemical reaction and the procedures and conditions to run it. We provide an overview of the SPL reaction specification structures followed by some examples of documents with reaction data: predicted single-step reactions, a two-step synthesis, an enzymatic reaction, an example how to represent a reaction center, a patent, and a fully annotated reaction with by-products. Special attention is given to a mechanism for atom-atom mapping of reactions as well as to the possibility to integrate Reaction SPL with laboratory automation equipment, in particular automated synthesis devices.
Affiliation(s)
- Alexander G. Godfrey
- National Center for Advancing Translational Sciences, NIH, Rockville, MD, USA
|
23
|
Baskin I, Epshtein A, Ein-Eli Y. Benchmarking machine learning methods for modeling physical properties of ionic liquids. J Mol Liq 2022. [DOI: 10.1016/j.molliq.2022.118616] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 01/21/2023]
|
24
|
Prediction of Carbonate Selectivity of PVC-Plasticized Sensor Membranes with Newly Synthesized Ionophores through QSPR Modeling. CHEMOSENSORS 2022. [DOI: 10.3390/chemosensors10020043] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 02/04/2023]
Abstract
Developing a potentiometric sensor with required target properties is a challenging task. This work explores the potential of quantitative structure-property relationship (QSPR) modeling in the prediction of potentiometric selectivity for plasticized polymeric membrane sensors based on newly synthesized ligands. As a case study, we have addressed sensors with selectivity towards carbonate—an important topic for environmental and biomedical studies. Using the logKsel(HCO3−/Cl−) selectivity data on 40 ionophores available in literature and their substructural molecular fragments as descriptors, we have constructed a QSPR model, which has demonstrated reasonable precision in predicting selectivities for newly synthesized ligands sharing similar molecular fragments with those employed for modeling.
|
25
|
Afonina VA, Mazitov DA, Nurmukhametova A, Shevelev MD, Khasanova DA, Nugmanov RI, Burilov VA, Madzhidov TI, Varnek A. Prediction of Optimal Conditions of Hydrogenation Reaction Using the Likelihood Ranking Approach. Int J Mol Sci 2021; 23:ijms23010248. [PMID: 35008674 PMCID: PMC8745269 DOI: 10.3390/ijms23010248] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 11/23/2021] [Revised: 12/18/2021] [Accepted: 12/23/2021] [Indexed: 11/20/2022] Open
Abstract
The selection of experimental conditions leading to a reasonable yield is an essential element of the automated development of a synthesis plan and the subsequent synthesis of the target compound. The classical QSPR approach, requiring one-to-one correspondence between chemical structure and a target property, can be used for optimal reaction conditions prediction only on a limited scale when only one condition component (e.g., catalyst or solvent) is considered. However, a particular reaction can proceed under several different conditions. In this paper, we describe the Likelihood Ranking Model representing an artificial neural network that outputs a list of different conditions ranked according to their suitability to a given chemical transformation. Benchmarking calculations demonstrated that our model outperformed some popular approaches to the theoretical assessment of reaction conditions, such as k Nearest Neighbors, and a recurrent artificial neural network performance prediction of condition components (reagents, solvents, catalysts, and temperature). The ability of the Likelihood Ranking model, trained on a dataset of hydrogenation reactions (~42,000 reactions) from the Reaxys® database, to propose conditions that led to the desired product was validated experimentally on a set of three reactions with rich selectivity issues.
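The ranking step described in this abstract, in which a model's outputs are turned into an ordered list of candidate conditions, can be sketched in a few lines of Python. This is only an illustration of ranking by normalized score, not the Likelihood Ranking Model itself: the raw scores stand in for the output of a trained network, and the condition labels and the function name `rank_conditions` are hypothetical.

```python
import math


def rank_conditions(scores):
    """Convert raw scores into softmax probabilities and sort from most to least likely.

    `scores` maps a condition label to an unnormalized score (a stand-in for
    the output of a trained network).
    """
    top = max(scores.values())
    # Subtract the maximum before exponentiating for numerical stability.
    weights = {label: math.exp(value - top) for label, value in scores.items()}
    total = sum(weights.values())
    ranked = [(label, weight / total) for label, weight in weights.items()]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical scores for one hydrogenation; the labels are invented for the example.
    raw_scores = {
        "Pd/C, methanol, 25 C": 2.1,
        "Raney Ni, ethanol, 50 C": 0.4,
        "PtO2, acetic acid, 25 C": 1.3,
    }
    for label, probability in rank_conditions(raw_scores):
        print(f"{probability:.2f}  {label}")
```

Returning the whole ordered list, rather than only the top entry, reflects the point made in the abstract that a reaction can proceed under several different conditions.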
Affiliation(s)
- Valentina A. Afonina
- Chemoinformatics and Molecular Modelling Lab, A.M. Butlerov Institute of Chemistry, Kazan Federal University, Kremlyovskaya Str. 18, 420008 Kazan, Russia; (V.A.A.); (D.A.M.); (A.N.); (M.D.S.); (D.A.K.); (R.I.N.); (V.A.B.)
| | - Daniyar A. Mazitov
- Chemoinformatics and Molecular Modelling Lab, A.M. Butlerov Institute of Chemistry, Kazan Federal University, Kremlyovskaya Str. 18, 420008 Kazan, Russia; (V.A.A.); (D.A.M.); (A.N.); (M.D.S.); (D.A.K.); (R.I.N.); (V.A.B.)
| | - Albina Nurmukhametova
- Chemoinformatics and Molecular Modelling Lab, A.M. Butlerov Institute of Chemistry, Kazan Federal University, Kremlyovskaya Str. 18, 420008 Kazan, Russia; (V.A.A.); (D.A.M.); (A.N.); (M.D.S.); (D.A.K.); (R.I.N.); (V.A.B.)
| | - Maxim D. Shevelev
- Chemoinformatics and Molecular Modelling Lab, A.M. Butlerov Institute of Chemistry, Kazan Federal University, Kremlyovskaya Str. 18, 420008 Kazan, Russia; (V.A.A.); (D.A.M.); (A.N.); (M.D.S.); (D.A.K.); (R.I.N.); (V.A.B.)
- Laboratory of Chemoinformatics (UMR 7140 CNRS/UniStra), Université de Strasbourg, 4, Rue Blaise Pascal, 67000 Strasbourg, France
| | - Dina A. Khasanova
- Chemoinformatics and Molecular Modelling Lab, A.M. Butlerov Institute of Chemistry, Kazan Federal University, Kremlyovskaya Str. 18, 420008 Kazan, Russia; (V.A.A.); (D.A.M.); (A.N.); (M.D.S.); (D.A.K.); (R.I.N.); (V.A.B.)
| | - Ramil I. Nugmanov
- Chemoinformatics and Molecular Modelling Lab, A.M. Butlerov Institute of Chemistry, Kazan Federal University, Kremlyovskaya Str. 18, 420008 Kazan, Russia; (V.A.A.); (D.A.M.); (A.N.); (M.D.S.); (D.A.K.); (R.I.N.); (V.A.B.)
| | - Vladimir A. Burilov
- Chemoinformatics and Molecular Modelling Lab, A.M. Butlerov Institute of Chemistry, Kazan Federal University, Kremlyovskaya Str. 18, 420008 Kazan, Russia; (V.A.A.); (D.A.M.); (A.N.); (M.D.S.); (D.A.K.); (R.I.N.); (V.A.B.)
| | - Timur I. Madzhidov
- Chemoinformatics and Molecular Modelling Lab, A.M. Butlerov Institute of Chemistry, Kazan Federal University, Kremlyovskaya Str. 18, 420008 Kazan, Russia; (V.A.A.); (D.A.M.); (A.N.); (M.D.S.); (D.A.K.); (R.I.N.); (V.A.B.)
- Correspondence: (T.I.M.); (A.V.)
| | - Alexandre Varnek
- Laboratory of Chemoinformatics (UMR 7140 CNRS/UniStra), Université de Strasbourg, 4, Rue Blaise Pascal, 67000 Strasbourg, France
- Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Kita 21 Nishi 10, Kita-ku, Sapporo 001-0021, Japan
- Correspondence: (T.I.M.); (A.V.)
|
26
|
Gimadiev T, Nugmanov R, Khakimova A, Fatykhova A, Madzhidov T, Sidorov P, Varnek A. CGRdb2.0: A Python Database Management System for Molecules, Reactions, and Chemical Data. J Chem Inf Model 2021; 62:2015-2020. [PMID: 34843251 DOI: 10.1021/acs.jcim.1c01105] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 11/29/2022]
Abstract
This work introduces CGRdb2.0, an open-source database management system for molecules, reactions, and chemical data. CGRdb2.0 is a Python package connecting to a PostgreSQL database that enables native searches for molecules and reactions without complicated SQL syntax. The library provides out-of-the-box implementations for similarity and substructure searches for molecules, as well as similarity and substructure searches for reactions in two ways: based on reaction components and based on the Condensed Graph of Reaction approach, the latter significantly accelerating the performance. In benchmarking studies with the RDKit database cartridge, we demonstrate that CGRdb2.0 performs searches faster for smaller data sets, while allowing for interactive access to the retrieved data.
Affiliation(s)
- Timur Gimadiev
- Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Kita 21 Nishi 10, Kita-ku, 001-0021 Sapporo, Japan
| | - Ramil Nugmanov
- Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry, Kazan Federal University, 18, Kremlyovskaya str., 420008 Kazan, Russia
| | - Aigul Khakimova
- JSC ≪BIOCAD≫, Petrodvortsoviy District, Strelna, Svyazi st., Bld. 34, Liter A, 198515 St. Petersburg, Russia
| | - Adeliya Fatykhova
- Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry, Kazan Federal University, 18, Kremlyovskaya str., 420008 Kazan, Russia
| | - Timur Madzhidov
- Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry, Kazan Federal University, 18, Kremlyovskaya str., 420008 Kazan, Russia
| | - Pavel Sidorov
- Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Kita 21 Nishi 10, Kita-ku, 001-0021 Sapporo, Japan
| | - Alexandre Varnek
- Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Kita 21 Nishi 10, Kita-ku, 001-0021 Sapporo, Japan
- Laboratory of Chemoinformatics, UMR 7140 CNRS, University of Strasbourg, 4, Blaise Pascal Str., 67081 Strasbourg, France
|
27
|
Orlov AA, Demenko DY, Bignaud C, Valtz A, Marcou G, Horvath D, Coquelet C, Varnek A, de Meyer F. Chemoinformatics-Driven Design of New Physical Solvents for Selective CO2 Absorption. ENVIRONMENTAL SCIENCE & TECHNOLOGY 2021; 55:15542-15553. [PMID: 34736317 DOI: 10.1021/acs.est.1c04092] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 06/13/2023]
Abstract
The removal of CO2 from gases is an important industrial process in the transition to a low-carbon economy. The use of selective physical (co-)solvents is especially promising when the amount of CO2 is large, as it lowers the energy requirements for solvent regeneration. However, only a few physical solvents have found industrial application, and the design of new ones can pave the way to more efficient gas treatment techniques. Experimental screening of gas solubility is a labor-intensive process, and solubility modeling is a viable strategy to reduce the number of solvents subject to experimental measurements. In this paper, a chemoinformatics-based modeling workflow was applied to build a predictive model for the solubility of CO2 and four other industrially important gases (CO, CH4, H2, and N2). A dataset containing solubilities of gases in 280 solvents was collected from literature sources and supplemented with new data for six solvents measured in the present study. A modeling workflow based on several state-of-the-art machine learning algorithms was applied to establish quantitative structure-solubility relationships. The best models were used to perform virtual screening of industrially produced chemicals, which enabled the identification of compounds with high predicted CO2 solubility and selectivity toward other gases. The prediction for one of the compounds, 4-methylmorpholine, was confirmed experimentally.
Affiliation(s)
- Alexey A Orlov
- Laboratory of Chemoinformatics, Faculty of Chemistry, University of Strasbourg, Strasbourg 67081, France
| | - Daryna Yu Demenko
- Laboratory of Chemoinformatics, Faculty of Chemistry, University of Strasbourg, Strasbourg 67081, France
| | - Charles Bignaud
- TotalEnergies S.E., Exploration Production, Development and Support to Operations, Liquefied Natural Gas - Acid Gas Entity, CCUS R&D Program, Paris 92078, France
| | - Alain Valtz
- MINES ParisTech, PSL University, Centre de thermodynamique des procédés (CTP), 35 rue St Honoré, 77300 Fontainebleau, France
| | - Gilles Marcou
- Laboratory of Chemoinformatics, Faculty of Chemistry, University of Strasbourg, Strasbourg 67081, France
| | - Dragos Horvath
- Laboratory of Chemoinformatics, Faculty of Chemistry, University of Strasbourg, Strasbourg 67081, France
| | - Christophe Coquelet
- MINES ParisTech, PSL University, Centre de thermodynamique des procédés (CTP), 35 rue St Honoré, 77300 Fontainebleau, France
| | - Alexandre Varnek
- Laboratory of Chemoinformatics, Faculty of Chemistry, University of Strasbourg, Strasbourg 67081, France
| | - Frédérick de Meyer
- TotalEnergies S.E., Exploration Production, Development and Support to Operations, Liquefied Natural Gas - Acid Gas Entity, CCUS R&D Program, Paris 92078, France
- MINES ParisTech, PSL University, Centre de thermodynamique des procédés (CTP), 35 rue St Honoré, 77300 Fontainebleau, France
| |
|
28
|
Machine learning modelling of chemical reaction characteristics: yesterday, today, tomorrow. Mendeleev Communications 2021. [DOI: 10.1016/j.mencom.2021.11.003] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/23/2022]
|
29
|
Dong J, Zhao M, Liu Y, Su Y, Zeng X. Deep learning in retrosynthesis planning: datasets, models and tools. Brief Bioinform 2021; 23:6375056. [PMID: 34571535 DOI: 10.1093/bib/bbab391] [Citation(s) in RCA: 41] [Impact Index Per Article: 10.3] [Received: 07/06/2021] [Revised: 08/16/2021] [Accepted: 08/30/2021] [Indexed: 12/29/2022] Open
Abstract
In recent years, drug synthesis powered by artificial intelligence has brought great convenience to society. Since retrosynthetic analysis occupies an essential position in synthetic chemistry, it has received broad attention from researchers. In this review, we comprehensively summarize the development of retrosynthesis in the context of deep learning. This review covers all aspects of retrosynthesis, including datasets, models and tools. Specifically, we report representative models from academia, in addition to a detailed description of the available and stable platforms in industry. We also discuss the disadvantages of the existing models and outline potential future trends, so that more newcomers can quickly understand and participate in the field of retrosynthesis planning.
Affiliation(s)
- Jingxin Dong
- College of Information Science and Engineering, Hunan University, 2 Lushan S Rd, Yuelu District, 410086, Hunan, China
| | - Mingyi Zhao
- Department of Pediatrics, Third Xiangya Hospital, Central South University, 400013, Hunan, China
| | - Yuansheng Liu
- College of Information Science and Engineering, Hunan University, 2 Lushan S Rd, Yuelu District, 410086, Hunan, China
| | - Yansen Su
- Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, 230601, Hefei, China
| | - Xiangxiang Zeng
- College of Information Science and Engineering, Hunan University, 2 Lushan S Rd, Yuelu District, 410086, Hunan, China
| |
|
30
|
Varnek A, Zankov D, Polishchuk P, Madzhidov T. Multi-Instance Learning Approach to Predictive Modeling of Catalysts Enantioselectivity. Synlett 2021. [DOI: 10.1055/a-1553-0427] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/18/2022]
Abstract
Here, we report an application of the multi-instance learning approach to predictive modeling of enantioselectivity of chiral catalysts. Catalysts were represented by ensembles of conformations encoded by the pmapper physicochemical descriptors capturing stereoconfiguration of the molecule. Each catalyzed chemical reaction was transformed to a condensed graph of reaction for which ISIDA fragment descriptors were generated. This approach does not require any alignment of conformations and can potentially be used for a diverse set of catalysts bearing different scaffolds. Its efficiency has been demonstrated in predicting the selectivity of BINOL-derived phosphoric acid catalysts in asymmetric thiol addition to N-acylimines and benchmarked against previously reported models.
Affiliation(s)
- A. Varnek
- Laboratory of Chemoinformatics, University of Strasbourg
- Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University
| | - D. Zankov
- Laboratory of Chemoinformatics, University of Strasbourg
- Laboratory of Chemoinformatics and Molecular Modeling, Kazan Federal University
| | - P. Polishchuk
- Institute of Molecular and Translational Medicine, Palacký University
| | - T. Madzhidov
- Laboratory of Chemoinformatics and Molecular Modeling, Kazan Federal University
| |
|
31
|
Baybekov S, Marcou G, Ramos P, Saurel O, Galzi JL, Varnek A. DMSO Solubility Assessment for Fragment-Based Screening. Molecules 2021; 26:3950. [PMID: 34203441 PMCID: PMC8271413 DOI: 10.3390/molecules26133950] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 06/04/2021] [Revised: 06/23/2021] [Accepted: 06/23/2021] [Indexed: 11/16/2022] Open
Abstract
In this paper, we report comprehensive experimental and chemoinformatics analyses of the solubility of small organic molecules ("fragments") in dimethyl sulfoxide (DMSO) in the context of their ability to be tested in screening experiments. Here, DMSO solubility of 939 fragments has been measured experimentally using an NMR technique. A Support Vector Classification model was built on the obtained data using the ISIDA fragment descriptors. The analysis revealed 34 outliers: experimental issues were retrospectively identified for 28 of them. The updated model performs well in 5-fold cross-validation (balanced accuracy = 0.78). The datasets are available on the Zenodo platform (DOI:10.5281/zenodo.4767511) and the model is available on the website of the Laboratory of Chemoinformatics.
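The balanced accuracy quoted in this abstract is, by its standard definition, the mean of the per-class recalls, which keeps a majority class from dominating the score. A minimal sketch of that calculation follows; the function name and the toy labels are illustrative assumptions and are not taken from the paper.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall (standard definition of balanced accuracy)."""
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        # Number of samples that truly belong to class c
        total = sum(1 for t in y_true if t == c)
        # Number of those samples that were predicted as class c
        correct = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        recalls.append(correct / total)
    return sum(recalls) / len(recalls)


# Hypothetical labels: 1 = soluble in DMSO, 0 = insoluble (not data from the study)
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]
print(balanced_accuracy(y_true, y_pred))
```

In this toy case the recall is 3/4 for class 1 and 1/2 for class 0, so the balanced accuracy is 0.625, lower than the plain accuracy of 4/6.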
Affiliation(s)
- Shamkhal Baybekov
- Laboratoire de Chémoinformatique UMR 7140 CNRS, Institut Le Bel, University of Strasbourg, 4 Rue Blaise Pascal, 67081 Strasbourg, France; (S.B.); (G.M.)
| | - Gilles Marcou
- Laboratoire de Chémoinformatique UMR 7140 CNRS, Institut Le Bel, University of Strasbourg, 4 Rue Blaise Pascal, 67081 Strasbourg, France; (S.B.); (G.M.)
| | - Pascal Ramos
- Institut de Pharmacologie et de Biologie Structurale, Université de Toulouse CNRS, UPS, 205 Route de Narbonne, 31077 Toulouse, France; (P.R.); (O.S.)
| | - Olivier Saurel
- Institut de Pharmacologie et de Biologie Structurale, Université de Toulouse CNRS, UPS, 205 Route de Narbonne, 31077 Toulouse, France; (P.R.); (O.S.)
| | - Jean-Luc Galzi
- Biotechnologie et Signalisation Cellulaire UMR 7242 CNRS, École Supérieure de Biotechnologie de Strasbourg, University of Strasbourg, 300 Boulevard Sébastien Brant, 67412 Illkirch, France;
- ChemBioFrance—Chimiothèque Nationale UAR3035, 8 Rue de L’école Normale, CEDEX 05, 34296 Montpellier, France
| | - Alexandre Varnek
- Laboratoire de Chémoinformatique UMR 7140 CNRS, Institut Le Bel, University of Strasbourg, 4 Rue Blaise Pascal, 67081 Strasbourg, France; (S.B.); (G.M.)
| |
|
32
|
Mansouri K, Karmaus AL, Fitzpatrick J, Patlewicz G, Pradeep P, Alberga D, Alepee N, Allen TE, Allen D, Alves VM, Andrade CH, Auernhammer TR, Ballabio D, Bell S, Benfenati E, Bhattacharya S, Bastos JV, Boyd S, Brown J, Capuzzi SJ, Chushak Y, Ciallella H, Clark AM, Consonni V, Daga PR, Ekins S, Farag S, Fedorov M, Fourches D, Gadaleta D, Gao F, Gearhart JM, Goh G, Goodman JM, Grisoni F, Grulke CM, Hartung T, Hirn M, Karpov P, Korotcov A, Lavado GJ, Lawless M, Li X, Luechtefeld T, Lunghini F, Mangiatordi GF, Marcou G, Marsh D, Martin T, Mauri A, Muratov EN, Myatt GJ, Nguyen DT, Nicolotti O, Note R, Pande P, Parks AK, Peryea T, Polash AH, Rallo R, Roncaglioni A, Rowlands C, Ruiz P, Russo DP, Sayed A, Sayre R, Sheils T, Siegel C, Silva AC, Simeonov A, Sosnin S, Southall N, Strickland J, Tang Y, Teppen B, Tetko IV, Thomas D, Tkachenko V, Todeschini R, Toma C, Tripodi I, Trisciuzzi D, Tropsha A, Varnek A, Vukovic K, Wang Z, Wang L, Waters KM, Wedlake AJ, Wijeyesakere SJ, Wilson D, Xiao Z, Yang H, Zahoranszky-Kohalmi G, Zakharov AV, Zhang FF, Zhang Z, Zhao T, Zhu H, Zorn KM, Casey W, Kleinstreuer NC. CATMoS: Collaborative Acute Toxicity Modeling Suite. Environmental Health Perspectives 2021; 129:47013. [PMID: 33929906 PMCID: PMC8086800 DOI: 10.1289/ehp8495] [Citation(s) in RCA: 63] [Impact Index Per Article: 15.8] [Received: 10/19/2020] [Revised: 03/10/2021] [Accepted: 03/19/2021] [Indexed: 05/02/2023]
Abstract
BACKGROUND: Humans are exposed to tens of thousands of chemical substances that need to be assessed for their potential toxicity. Acute systemic toxicity testing serves as the basis for regulatory hazard classification, labeling, and risk management. However, it is cost- and time-prohibitive to evaluate all new and existing chemicals using traditional rodent acute toxicity tests. In silico models built using existing data facilitate rapid acute toxicity predictions without using animals. OBJECTIVES: The U.S. Interagency Coordinating Committee on the Validation of Alternative Methods (ICCVAM) Acute Toxicity Workgroup organized an international collaboration to develop in silico models for predicting acute oral toxicity based on five different end points: Lethal Dose 50 (LD50) value, U.S. Environmental Protection Agency hazard (four) categories, Globally Harmonized System for Classification and Labeling hazard (five) categories, very toxic chemicals (LD50 ≤ 50 mg/kg), and nontoxic chemicals (LD50 > 2,000 mg/kg). METHODS: An acute oral toxicity data inventory for 11,992 chemicals was compiled, split into training and evaluation sets, and made available to 35 participating international research groups that submitted a total of 139 predictive models. Predictions that fell within the applicability domains of the submitted models were evaluated using external validation sets. These were then combined into consensus models to leverage strengths of individual approaches. RESULTS: The resulting consensus predictions, which leverage the collective strengths of each individual model, form the Collaborative Acute Toxicity Modeling Suite (CATMoS). CATMoS demonstrated high performance in terms of accuracy and robustness when compared with in vivo results. DISCUSSION: CATMoS is being evaluated by regulatory agencies for its utility and applicability as a potential replacement for in vivo rat acute oral toxicity studies. CATMoS predictions for more than 800,000 chemicals have been made available via the National Toxicology Program's Integrated Chemical Environment tools and data sets (ice.ntp.niehs.nih.gov). The models are also implemented in a free, standalone, open-source tool, OPERA, which allows predictions of new and untested chemicals to be made. https://doi.org/10.1289/EHP8495.
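The abstract says that only predictions inside each model's applicability domain were carried into the consensus, but it does not give the combination rule. The sketch below assumes a plain unweighted mean over in-domain predictions purely for illustration; the function name, the data layout, and the averaging rule are all assumptions and should not be read as the CATMoS procedure.

```python
def consensus_prediction(predictions):
    """Average the predictions of models whose applicability domain covers the chemical.

    predictions: list of (value, in_domain) pairs, one pair per model.
    Returns None when no model considers the chemical inside its domain.
    Assumption: unweighted mean; the source does not state the actual rule.
    """
    kept = [value for value, in_domain in predictions if in_domain]
    if not kept:
        return None
    return sum(kept) / len(kept)


# Hypothetical log-scale LD50 predictions from three models for one chemical
models = [(2.0, True), (3.0, True), (10.0, False)]
print(consensus_prediction(models))
```

Here the third model is ignored because the chemical lies outside its applicability domain, so the out-of-domain value does not distort the combined estimate.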
Affiliation(s)
- Kamel Mansouri
- Integrated Laboratory Systems, LLC, Morrisville, North Carolina, USA
- National Toxicology Program Interagency Center for the Evaluation of Alternative Toxicological Methods, Research Triangle Park, North Carolina, USA
| | - Agnes L. Karmaus
- Integrated Laboratory Systems, LLC, Morrisville, North Carolina, USA
| | | | - Grace Patlewicz
- Center for Computational Toxicology and Exposure, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina, USA
| | - Prachi Pradeep
- Center for Computational Toxicology and Exposure, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina, USA
- Oak Ridge Institute for Science and Education (ORISE) Research Participation Program, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina, USA
| | - Domenico Alberga
- Dipartimento di Farmacia-Scienze del Farmaco, Università degli Studi di Bari “Aldo Moro”, Bari, Italy
| | | | - Timothy E.H. Allen
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, UK
| | - Dave Allen
- Integrated Laboratory Systems, LLC, Morrisville, North Carolina, USA
| | - Vinicius M. Alves
- Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, North Carolina, USA
- Laboratory for Molecular Modeling and Design, Faculty of Pharmacy, Federal University of Goiás, Goiania, Brazil
| | - Carolina H. Andrade
- Laboratory for Molecular Modeling and Design, Faculty of Pharmacy, Federal University of Goiás, Goiania, Brazil
| | | | - Davide Ballabio
- Milano Chemometrics & QSAR Research Group, Department of Earth and Environmental Sciences, University of Milano-Bicocca, Milan, Italy
| | - Shannon Bell
- Integrated Laboratory Systems, LLC, Morrisville, North Carolina, USA
| | - Emilio Benfenati
- Laboratory of Environmental Chemistry and Toxicology, Department of Environmental Health Sciences, Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Milan, Italy
| | - Sudin Bhattacharya
- Institute for Quantitative Health Science and Engineering, Department of Biomedical Engineering, Michigan State University, East Lansing, Michigan, USA
| | - Joyce V. Bastos
- Laboratory for Molecular Modeling and Design, Faculty of Pharmacy, Federal University of Goiás, Goiania, Brazil
| | - Stephen Boyd
- Department of Plant, Soil, and Microbial Sciences, Michigan State University, East Lansing, Michigan, USA
| | - J.B. Brown
- Kyoto University Graduate School of Medicine, Kyoto, Japan
| | - Stephen J. Capuzzi
- Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, North Carolina, USA
| | - Yaroslav Chushak
- Aeromedical Research Department, Force Health Protection, USAFSAM, Dayton, Ohio, USA
- Henry M Jackson Foundation for the Advancement of Military Medicine, Dayton, Ohio, USA
| | - Heather Ciallella
- Center for Computational and Integrative Biology, Rutgers University, Camden, New Jersey, USA
| | - Alex M. Clark
- Collaborations Pharmaceuticals, Inc., Raleigh, North Carolina, USA
| | - Viviana Consonni
- Milano Chemometrics & QSAR Research Group, Department of Earth and Environmental Sciences, University of Milano-Bicocca, Milan, Italy
| | | | - Sean Ekins
- Collaborations Pharmaceuticals, Inc., Raleigh, North Carolina, USA
| | - Sherif Farag
- Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, North Carolina, USA
| | - Maxim Fedorov
- Skoltech, Skolkovo Institute of Science and Technology, Moscow, Russia
| | - Denis Fourches
- Department of Chemistry, North Carolina State University, Raleigh, North Carolina, USA
- Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina, USA
| | - Domenico Gadaleta
- Laboratory of Environmental Chemistry and Toxicology, Department of Environmental Health Sciences, Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Milan, Italy
| | - Feng Gao
- Department of Plant, Soil, and Microbial Sciences, Michigan State University, East Lansing, Michigan, USA
| | - Jeffery M. Gearhart
- Aeromedical Research Department, Force Health Protection, USAFSAM, Dayton, Ohio, USA
- Henry M Jackson Foundation for the Advancement of Military Medicine, Dayton, Ohio, USA
| | - Garett Goh
- Pacific Northwest National Laboratory, Richland, Washington, USA
| | - Jonathan M. Goodman
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, UK
| | - Francesca Grisoni
- Milano Chemometrics & QSAR Research Group, Department of Earth and Environmental Sciences, University of Milano-Bicocca, Milan, Italy
| | - Christopher M. Grulke
- Center for Computational Toxicology and Exposure, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina, USA
| | | | - Matthew Hirn
- Department of Computational Mathematics, Science & Engineering, Department of Mathematics, Michigan State University, East Lansing, Michigan, USA
| | - Pavel Karpov
- Institute of Structural Biology, Helmholtz Zentrum München (GmbH), Neuherberg, Germany
| | | | - Giovanna J. Lavado
- Laboratory of Environmental Chemistry and Toxicology, Department of Environmental Health Sciences, Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Milan, Italy
| | | | - Xinhao Li
- Department of Chemistry, North Carolina State University, Raleigh, North Carolina, USA
| | | | - Filippo Lunghini
- Laboratoire de Chemoinformatique, UMR7140, Université de Strasbourg, Strasbourg, France
| | - Giuseppe F. Mangiatordi
- Dipartimento di Farmacia-Scienze del Farmaco, Università degli Studi di Bari “Aldo Moro”, Bari, Italy
| | - Gilles Marcou
- Laboratoire de Chemoinformatique, UMR7140, Université de Strasbourg, Strasbourg, France
| | - Dan Marsh
- Underwriters Laboratories, Northbrook, Illinois, USA
| | - Todd Martin
- Center for Computational Toxicology and Exposure, U.S. Environmental Protection Agency, Cincinnati, Ohio, USA
| | | | - Eugene N. Muratov
- Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, North Carolina, USA
- Laboratory for Molecular Modeling and Design, Faculty of Pharmacy, Federal University of Goiás, Goiania, Brazil
| | | | - Dac-Trung Nguyen
- National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland, USA
| | - Orazio Nicolotti
- Dipartimento di Farmacia-Scienze del Farmaco, Università degli Studi di Bari “Aldo Moro”, Bari, Italy
| | - Reine Note
- L’Oréal Research & Innovation, Aulnay-sous-Bois, France
| | - Paritosh Pande
- Pacific Northwest National Laboratory, Richland, Washington, USA
| | | | - Tyler Peryea
- National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland, USA
| | | | - Robert Rallo
- Pacific Northwest National Laboratory, Richland, Washington, USA
| | - Alessandra Roncaglioni
- Laboratory of Environmental Chemistry and Toxicology, Department of Environmental Health Sciences, Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Milan, Italy
| | | | - Patricia Ruiz
- Office of Innovation and Analytics, Agency for Toxic Substances and Disease Registry, Centers for Disease Control and Prevention, Atlanta, Georgia, USA
| | - Daniel P. Russo
- Center for Computational and Integrative Biology, Rutgers University, Camden, New Jersey, USA
| | - Ahmed Sayed
- Rosettastein Consulting UG, Freising, Germany
| | - Risa Sayre
- Center for Computational Toxicology and Exposure, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina, USA
- Oak Ridge Institute for Science and Education (ORISE) Research Participation Program, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina, USA
| | - Timothy Sheils
- National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland, USA
| | - Charles Siegel
- Pacific Northwest National Laboratory, Richland, Washington, USA
| | - Arthur C. Silva
- Laboratory for Molecular Modeling and Design, Faculty of Pharmacy, Federal University of Goiás, Goiania, Brazil
| | - Anton Simeonov
- National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland, USA
| | - Sergey Sosnin
- Skoltech, Skolkovo Institute of Science and Technology, Moscow, Russia
| | - Noel Southall
- National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland, USA
| | - Judy Strickland
- Integrated Laboratory Systems, LLC, Morrisville, North Carolina, USA
| | - Yun Tang
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, China
| | - Brian Teppen
- Department of Plant, Soil, and Microbial Sciences, Michigan State University, East Lansing, Michigan, USA
| | - Igor V. Tetko
- Institute of Structural Biology, Helmholtz Zentrum München (GmbH), Neuherberg, Germany
- BIGCHEM GmbH, Unterschleissheim, Germany
| | - Dennis Thomas
- Pacific Northwest National Laboratory, Richland, Washington, USA
| | | | - Roberto Todeschini
- Milano Chemometrics & QSAR Research Group, Department of Earth and Environmental Sciences, University of Milano-Bicocca, Milan, Italy
| | - Cosimo Toma
- Laboratory of Environmental Chemistry and Toxicology, Department of Environmental Health Sciences, Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Milan, Italy
| | - Ignacio Tripodi
- Computer Science/Interdisciplinary Quantitative Biology, University of Colorado, Boulder, Colorado, USA
| | - Daniela Trisciuzzi
- Dipartimento di Farmacia-Scienze del Farmaco, Università degli Studi di Bari “Aldo Moro”, Bari, Italy
| | - Alexander Tropsha
- Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, North Carolina, USA
| | - Alexandre Varnek
- Laboratoire de Chemoinformatique, UMR7140, Université de Strasbourg, Strasbourg, France
| | - Kristijan Vukovic
- Laboratory of Environmental Chemistry and Toxicology, Department of Environmental Health Sciences, Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Milan, Italy
| | - Zhongyu Wang
- School of Environmental Sciences and Technology, Dalian University of Technology; Dalian, Liaoning, China
| | - Liguo Wang
- School of Environmental Sciences and Technology, Dalian University of Technology; Dalian, Liaoning, China
| | | | - Andrew J. Wedlake
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, UK
| | | | - Dan Wilson
- The Dow Chemical Company, Midland, Michigan, USA
| | - Zijun Xiao
- School of Environmental Sciences and Technology, Dalian University of Technology; Dalian, Liaoning, China
| | - Hongbin Yang
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, China
| | - Gergely Zahoranszky-Kohalmi
- National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland, USA
| | - Alexey V. Zakharov
- National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland, USA
| | | | - Zhen Zhang
- Dow Agrosciences, Indianapolis, Indiana, USA
| | - Tongan Zhao
- National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland, USA
| | - Hao Zhu
- Center for Computational and Integrative Biology, Rutgers University, Camden, New Jersey, USA
| | | | - Warren Casey
- National Toxicology Program Interagency Center for the Evaluation of Alternative Toxicological Methods, Research Triangle Park, North Carolina, USA
| | - Nicole C. Kleinstreuer
- National Toxicology Program Interagency Center for the Evaluation of Alternative Toxicological Methods, Research Triangle Park, North Carolina, USA
| |
|
33
|
Kumar S, Kim MH. SMPLIP-Score: predicting ligand binding affinity from simple and interpretable on-the-fly interaction fingerprint pattern descriptors. J Cheminform 2021; 13:28. [PMID: 33766140 PMCID: PMC7993508 DOI: 10.1186/s13321-021-00507-1] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 09/07/2020] [Accepted: 03/16/2021] [Indexed: 12/13/2022] Open
Abstract
In drug discovery, rapid and accurate prediction of protein–ligand binding affinities is a pivotal task for lead optimization with acceptable on-target potency as well as pharmacological efficacy. Furthermore, researchers hope for a high correlation between docking score and pose with key interactive residues, although scoring functions as free energy surrogates of protein–ligand complexes have failed to provide collinearity. Recently, various machine learning or deep learning methods have been proposed to overcome the drawbacks of scoring functions. Although these methods are highly accurate, their featurization process is complex, and the meaning of the embedded features cannot be interpreted directly without an additional feature analysis. Here, we propose SMPLIP-Score (Substructural Molecular and Protein–Ligand Interaction Pattern Score), a directly interpretable predictor of absolute binding affinity. Our simple featurization embeds the interaction fingerprint pattern on the ligand-binding site environment and molecular fragments of ligands into an input vectorized matrix for learning layers (random forest or deep neural network). Despite using less complex features than other state-of-the-art models, SMPLIP-Score achieved comparable performance, with a Pearson’s correlation coefficient up to 0.80 and a root mean square error up to 1.18 in pK units on several benchmark datasets (PDBbind v.2015, Astex Diverse Set, CSAR NRC HiQ, FEP, PDBbind NMR, and CASF-2016). For this model, generality, predictive power, ranking power, and robustness were examined using direct interpretation of feature matrices for specific targets.
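The two figures of merit quoted above, Pearson's correlation coefficient and root mean square error, can be computed from paired experimental and predicted affinities as in the sketch below. The function names and the pK values are hypothetical and serve only to illustrate the standard formulas; they are not results from the paper.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def rmse(x, y):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


# Hypothetical experimental and predicted binding affinities in pK units
experimental = [5.2, 6.8, 7.4, 4.9, 8.1]
predicted = [5.9, 6.1, 7.0, 5.5, 7.2]
print(pearson_r(experimental, predicted), rmse(experimental, predicted))
```

The two metrics answer different questions: the correlation reflects how well the ranking and trend are reproduced, while the RMSE measures the typical size of the error in pK units.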
Affiliation(s)
- Surendra Kumar
- Gachon Institute of Pharmaceutical Science & Department of Pharmacy, College of Pharmacy, Gachon University, 191 Hambakmoeiro, Yeonsu-gu, Incheon, Republic of Korea
| | - Mi-Hyun Kim
- Gachon Institute of Pharmaceutical Science & Department of Pharmacy, College of Pharmacy, Gachon University, 191 Hambakmoeiro, Yeonsu-gu, Incheon, Republic of Korea.
| |
|
34
|
Abstract
As more data are introduced in the building of models of chemical reactivity, the mechanistic component can be reduced until 'big data' applications are reached. These methods no longer depend on underlying mechanistic hypotheses, potentially learning them implicitly through extensive data training. Reactivity models often focus on reaction barriers, but can also be trained to directly predict lab-relevant properties, such as yields or conditions. Calculations with a quantum-mechanical component are still preferred for quantitative predictions of reactivity. Although big data applications tend to be more qualitative, they have the advantage of being broadly applicable to different kinds of reactions. There is a continuum of methods between these extremes, such as methods that use quantum-derived data or descriptors in machine learning models. Here, we present an overview of recent machine learning applications in the field of chemical reactivity from a mechanistic perspective. Starting with a summary of how reactivity questions are addressed by quantum-mechanical methods, we discuss methods that augment or replace quantum-based modelling with faster alternatives relying on machine learning.
|
35
|
Rakhimbekova A, Akhmetshin TN, Minibaeva GI, Nugmanov RI, Gimadiev TR, Madzhidov TI, Baskin II, Varnek A. Cross-validation strategies in QSPR modelling of chemical reactions. SAR and QSAR in Environmental Research 2021; 32:207-219. [PMID: 33601989 DOI: 10.1080/1062936x.2021.1883107] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 09/18/2020] [Accepted: 01/26/2021] [Indexed: 06/12/2023]
Abstract
In this article, we consider cross-validation of quantitative structure-property relationship models for reactions and show that the conventional k-fold cross-validation (CV) procedure gives an 'optimistically' biased assessment of prediction performance. To address this issue, we suggest two strategies of model cross-validation: 'transformation-out' CV and 'solvent-out' CV. Unlike the conventional k-fold cross-validation approach, which does not consider the nature of objects, the proposed procedures provide an unbiased estimation of the predictive performance of the models for novel types of structural transformations in chemical reactions and for reactions proceeding under new conditions. Both suggested strategies have been applied to predict the rate constants of bimolecular elimination and nucleophilic substitution reactions, and Diels-Alder cycloaddition. All suggested cross-validation methodologies and a tutorial are implemented in the open-source software package CIMtools (https://github.com/cimm-kzn/CIMtools).
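The idea behind 'transformation-out' and 'solvent-out' CV can be illustrated with a generic group-wise split, in which every record sharing a label (a transformation type or a solvent) lands in the same fold, so the test fold never contains a label seen in training. The sketch below is a plain-Python illustration under that assumption; it is not the CIMtools implementation, and the function name and toy labels are hypothetical.

```python
from collections import defaultdict


def group_out_folds(groups, n_folds):
    """Yield (train_idx, test_idx) pairs so that no group is split across train and test."""
    by_group = defaultdict(list)
    for idx, label in enumerate(groups):
        by_group[label].append(idx)

    # Greedy balancing: place the largest groups first into the currently smallest fold
    folds = [[] for _ in range(n_folds)]
    for label in sorted(by_group, key=lambda k: (-len(by_group[k]), str(k))):
        smallest = min(range(n_folds), key=lambda i: len(folds[i]))
        folds[smallest].extend(by_group[label])

    for i in range(n_folds):
        test = sorted(folds[i])
        train = sorted(j for k in range(n_folds) if k != i for j in folds[k])
        yield train, test


# Hypothetical transformation labels for seven reactions
transformations = ["SN2", "SN2", "E2", "E2", "DA", "DA", "SN2"]
for train, test in group_out_folds(transformations, n_folds=3):
    print("train:", train, "test:", test)
```

Passing solvent names instead of transformation labels gives the analogous solvent-out split. With conventional k-fold CV the same transformation can appear on both sides of a split, which is the source of the optimistic bias the abstract describes.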
Affiliation(s)
- A Rakhimbekova
- A.M. Butlerov Institute of Chemistry, Kazan Federal University, Kazan, Russia
| | - T N Akhmetshin
- A.M. Butlerov Institute of Chemistry, Kazan Federal University, Kazan, Russia
- Laboratory of Chemoinformatics, UMR 7140 CNRS, University of Strasbourg, Strasbourg, France
| | - G I Minibaeva
- A.M. Butlerov Institute of Chemistry, Kazan Federal University, Kazan, Russia
| | - R I Nugmanov
- A.M. Butlerov Institute of Chemistry, Kazan Federal University, Kazan, Russia
| | - T R Gimadiev
- Institute for Chemical Reaction Design and Discovery, Hokkaido University, Sapporo, Japan
| | - T I Madzhidov
- A.M. Butlerov Institute of Chemistry, Kazan Federal University, Kazan, Russia
| | - I I Baskin
- A.M. Butlerov Institute of Chemistry, Kazan Federal University, Kazan, Russia
- Department of Materials Science and Engineering, Technion - Israel Institute of Technology, Haifa, Israel
| | - A Varnek
- Laboratory of Chemoinformatics, UMR 7140 CNRS, University of Strasbourg, Strasbourg, France
- Institute for Chemical Reaction Design and Discovery, Hokkaido University, Sapporo, Japan
| |
|
36
|
Bort W, Baskin II, Gimadiev T, Mukanov A, Nugmanov R, Sidorov P, Marcou G, Horvath D, Klimchuk O, Madzhidov T, Varnek A. Discovery of novel chemical reactions by deep generative recurrent neural network. Sci Rep 2021; 11:3178. [PMID: 33542271 PMCID: PMC7862614 DOI: 10.1038/s41598-021-81889-y] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Received: 08/08/2020] [Accepted: 01/06/2021] [Indexed: 12/18/2022] Open
Abstract
The "creativity" of Artificial Intelligence (AI) in terms of generating de novo molecular structures opened a novel paradigm in compound design, weaknesses (stability and feasibility issues of such structures) notwithstanding. Here we show that "creative" AI may be just as successfully taught to enumerate novel chemical reactions that are stoichiometrically coherent. Furthermore, when coupled to reaction space cartography, de novo reaction design may be focused on the desired reaction class. A sequence-to-sequence autoencoder with bidirectional Long Short-Term Memory layers was trained on purpose-built "SMILES/CGR" strings encoding reactions of the USPTO database. The autoencoder latent space was visualized on a generative topographic map. Novel latent space points were sampled around a map area populated by Suzuki reactions and decoded to the corresponding reactions. These can be critically analyzed by the expert, cleaned of irrelevant functional groups and eventually attempted experimentally, thereby enlarging the synthetic purpose of popular synthetic pathways.
Affiliation(s)
- William Bort
- Laboratory of Chemoinformatics, UMR 7140 CNRS, University of Strasbourg, 1, rue Blaise Pascal, 67000, Strasbourg, France
| | - Igor I Baskin
- Laboratory of Chemoinformatics, UMR 7140 CNRS, University of Strasbourg, 1, rue Blaise Pascal, 67000, Strasbourg, France
- Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry, Kazan Federal University, Kremlyovskaya str. 18, 420008, Kazan, Russia
- Department of Materials Science and Engineering, Technion - Israel Institute of Technology, 3200003, Haifa, Israel
| | - Timur Gimadiev
- Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Kita 21 Nishi 10, Kita-ku, Sapporo, 001-0021, Japan
| | - Artem Mukanov
- Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry, Kazan Federal University, Kremlyovskaya str. 18, 420008, Kazan, Russia
| | - Ramil Nugmanov
- Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry, Kazan Federal University, Kremlyovskaya str. 18, 420008, Kazan, Russia
| | - Pavel Sidorov
- Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Kita 21 Nishi 10, Kita-ku, Sapporo, 001-0021, Japan
| | - Gilles Marcou
- Laboratory of Chemoinformatics, UMR 7140 CNRS, University of Strasbourg, 1, rue Blaise Pascal, 67000, Strasbourg, France
| | - Dragos Horvath
- Laboratory of Chemoinformatics, UMR 7140 CNRS, University of Strasbourg, 1, rue Blaise Pascal, 67000, Strasbourg, France
| | - Olga Klimchuk
- Laboratory of Chemoinformatics, UMR 7140 CNRS, University of Strasbourg, 1, rue Blaise Pascal, 67000, Strasbourg, France
| | - Timur Madzhidov
- Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry, Kazan Federal University, Kremlyovskaya str. 18, 420008, Kazan, Russia
| | - Alexandre Varnek
- Laboratory of Chemoinformatics, UMR 7140 CNRS, University of Strasbourg, 1, rue Blaise Pascal, 67000, Strasbourg, France.
- Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Kita 21 Nishi 10, Kita-ku, Sapporo, 001-0021, Japan.
| |
|
37
|
Gimadiev T, Nugmanov R, Batyrshin D, Madzhidov T, Maeda S, Sidorov P, Varnek A. Combined Graph/Relational Database Management System for Calculated Chemical Reaction Pathway Data. J Chem Inf Model 2021; 61:554-559. [PMID: 33502186 DOI: 10.1021/acs.jcim.0c01280] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 02/05/2023]
Abstract
Presently, quantum chemical calculations are widely used to generate extensive data sets for machine learning applications; however, generally, these sets only include information on equilibrium structures and some close conformers. Exploration of potential energy surfaces provides important information on ground and transition states, but analysis of such data is complicated due to the number of possible reaction pathways. Here, we present RePathDB, a database system for managing 3D structural data for both ground and transition states resulting from quantum chemical calculations. Our tool allows one to store, assemble, and analyze reaction pathway data. It combines relational database CGR DB for handling compounds and reactions as molecular graphs with a graph database architecture for pathway analysis by graph algorithms. Original condensed graph of reaction technology is used to store any chemical reaction as a single graph.
Affiliation(s)
- Timur Gimadiev
- Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Kita 21 Nishi 10, Kita-ku, 001-0021 Sapporo, Japan
| | - Ramil Nugmanov
- Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry, Kazan Federal University, 18, Kremlyovskaya str., 420008 Kazan, Russia
| | - Dinar Batyrshin
- Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry, Kazan Federal University, 18, Kremlyovskaya str., 420008 Kazan, Russia
| | - Timur Madzhidov
- Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry, Kazan Federal University, 18, Kremlyovskaya str., 420008 Kazan, Russia
| | - Satoshi Maeda
- Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Kita 21 Nishi 10, Kita-ku, 001-0021 Sapporo, Japan
| | - Pavel Sidorov
- Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Kita 21 Nishi 10, Kita-ku, 001-0021 Sapporo, Japan
| | - Alexandre Varnek
- Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Kita 21 Nishi 10, Kita-ku, 001-0021 Sapporo, Japan
- Laboratory of Chemoinformatics, UMR 7140 CNRS, University of Strasbourg, 4, Blaise Pascal str., 67081 Strasbourg, France
| |
|
38
|
Jorner K, Brinck T, Norrby PO, Buttar D. Machine learning meets mechanistic modelling for accurate prediction of experimental activation energies. Chem Sci 2021; 12:1163-1175. [PMID: 36299676 PMCID: PMC9528810 DOI: 10.1039/d0sc04896h] [Citation(s) in RCA: 88] [Impact Index Per Article: 22.0] [Received: 09/04/2020] [Accepted: 11/02/2020] [Indexed: 12/19/2022]
Abstract
Accurate prediction of chemical reactions in solution is challenging for current state-of-the-art approaches based on transition state modelling with density functional theory. Models based on machine learning have emerged as a promising alternative to address these problems, but these models currently lack the precision to give crucial information on the magnitude of barrier heights, influence of solvents and catalysts and extent of regio- and chemoselectivity. Here, we construct hybrid models which combine the traditional transition state modelling and machine learning to accurately predict reaction barriers. We train a Gaussian Process Regression model to reproduce high-quality experimental kinetic data for the nucleophilic aromatic substitution reaction and use it to predict barriers with a mean absolute error of 0.77 kcal mol⁻¹ for an external test set. The model was further validated on regio- and chemoselectivity prediction on patent reaction data and achieved a competitive top-1 accuracy of 86%, despite not being trained explicitly for this task. Importantly, the model gives error bars for its predictions that can be used for risk assessment by the end user. Hybrid models emerge as the preferred alternative for accurate reaction prediction in the very common low-data situation where only 100-150 rate constants are available for a reaction class. With recent advances in deep learning for quickly predicting barriers and transition state geometries from density functional theory, we envision that hybrid models will soon become a standard alternative to complement current machine learning approaches based on ground-state physical organic descriptors or structural information such as molecular graphs or fingerprints.
Affiliation(s)
- Kjell Jorner
- Early Chemical Development, Pharmaceutical Sciences, R&D, AstraZeneca Macclesfield UK
| | - Tore Brinck
- Applied Physical Chemistry, Department of Chemistry, CBH, KTH Royal Institute of Technology Stockholm Sweden
| | - Per-Ola Norrby
- Data Science & Modelling, Pharmaceutical Sciences, R&D, AstraZeneca Gothenburg Sweden
| | - David Buttar
- Early Chemical Development, Pharmaceutical Sciences, R&D, AstraZeneca Macclesfield UK
| |
|
39
|
Varnek A, Baskin II. Modern Trends in Chemical Reactions Modeling. SYSTEMS MEDICINE 2021. [DOI: 10.1016/b978-0-12-801238-3.11543-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2022]
|
40
|
Chemical Graph Theory for Property Modeling in QSAR and QSPR—Charming QSAR & QSPR. MATHEMATICS 2020. [DOI: 10.3390/math9010060] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
Abstract
Quantitative structure-activity relationship (QSAR) and quantitative structure-property relationship (QSPR) models are mathematical models for the prediction of the chemical, physical or biological properties of chemical compounds. Usually, they are based on structural (grounded on fragment contribution) or calculated (centered on QSAR three-dimensional (QSAR-3D) or chemical descriptors) parameters. Here, we describe a Graph Theory approach for generating and mining molecular fragments to be used in QSAR or QSPR modeling based exclusively on fragment contributions. Merging Molecular Graph Theory, Simplified Molecular Input Line Entry Specification (SMILES) notation, and the connection table data allows a precise way to differentiate and count the molecular fragments. Machine learning strategies generated models with outstanding root mean square error (RMSE) and R² values. We also present the software Charming QSAR & QSPR, written in Python, for the property prediction of chemical compounds using this approach.
|
41
|
Guan Y, Coley CW, Wu H, Ranasinghe D, Heid E, Struble TJ, Pattanaik L, Green WH, Jensen KF. Regio-selectivity prediction with a machine-learned reaction representation and on-the-fly quantum mechanical descriptors. Chem Sci 2020; 12:2198-2208. [PMID: 34163985 PMCID: PMC8179287 DOI: 10.1039/d0sc04823b] [Citation(s) in RCA: 70] [Impact Index Per Article: 14.0] [Received: 09/02/2020] [Accepted: 12/19/2020] [Indexed: 12/20/2022]
Abstract
Accurate and rapid evaluation of whether substrates can undergo the desired transformation is crucial and challenging for both human knowledge and computer predictions. Despite the potential of machine learning in predicting chemical reactivity such as selectivity, popular feature engineering and learning methods are either time-consuming or data-hungry. We introduce a new method that combines machine-learned reaction representation with selected quantum mechanical descriptors to predict regio-selectivity in general substitution reactions. We construct a reactivity descriptor database based on ab initio calculations of 130k organic molecules, and train a multi-task constrained model to calculate demanded descriptors on-the-fly. The proposed platform enhances the inter/extra-polated performance for regio-selectivity predictions and enables learning from small datasets with just hundreds of examples. Furthermore, the proposed protocol is demonstrated to be generally applicable to a diverse range of chemical spaces. For three general types of substitution reactions (aromatic C-H functionalization, aromatic C-X substitution, and other substitution reactions) curated from a commercial database, the fusion model achieves 89.7%, 96.7%, and 97.2% top-1 accuracy in predicting the major outcome, respectively, each using 5000 training reactions. Using predicted descriptors, the fusion model is end-to-end, and requires only approximately 70 ms per reaction to predict the selectivity from reaction SMILES strings.
Affiliation(s)
- Yanfei Guan
- Department of Chemical Engineering, Massachusetts Institute of Technology 77 Massachusetts Avenue Cambridge MA 02139 USA
| | - Connor W Coley
- Department of Chemical Engineering, Massachusetts Institute of Technology 77 Massachusetts Avenue Cambridge MA 02139 USA
| | - Haoyang Wu
- Department of Chemical Engineering, Massachusetts Institute of Technology 77 Massachusetts Avenue Cambridge MA 02139 USA
| | - Duminda Ranasinghe
- Department of Chemical Engineering, Massachusetts Institute of Technology 77 Massachusetts Avenue Cambridge MA 02139 USA
| | - Esther Heid
- Department of Chemical Engineering, Massachusetts Institute of Technology 77 Massachusetts Avenue Cambridge MA 02139 USA
| | - Thomas J Struble
- Department of Chemical Engineering, Massachusetts Institute of Technology 77 Massachusetts Avenue Cambridge MA 02139 USA
| | - Lagnajit Pattanaik
- Department of Chemical Engineering, Massachusetts Institute of Technology 77 Massachusetts Avenue Cambridge MA 02139 USA
| | - William H Green
- Department of Chemical Engineering, Massachusetts Institute of Technology 77 Massachusetts Avenue Cambridge MA 02139 USA
| | - Klavs F Jensen
- Department of Chemical Engineering, Massachusetts Institute of Technology 77 Massachusetts Avenue Cambridge MA 02139 USA
| |
|
42
|
Cabrera-Andrade A, López-Cortés A, Jaramillo-Koupermann G, González-Díaz H, Pazos A, Munteanu CR, Pérez-Castillo Y, Tejera E. A Multi-Objective Approach for Anti-Osteosarcoma Cancer Agents Discovery through Drug Repurposing. Pharmaceuticals (Basel) 2020; 13:ph13110409. [PMID: 33266378 PMCID: PMC7700154 DOI: 10.3390/ph13110409] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 10/02/2020] [Revised: 11/11/2020] [Accepted: 11/12/2020] [Indexed: 02/08/2023]
Abstract
Osteosarcoma is the most common type of primary malignant bone tumor. Although nowadays 5-year survival rates can reach up to 60–70%, acute complications and late effects of osteosarcoma therapy are two of the limiting factors in treatments. We developed a multi-objective algorithm for the repurposing of new anti-osteosarcoma drugs, based on the modeling of molecules with described activity for HOS, MG63, SAOS2, and U2OS cell lines in the ChEMBL database. Several predictive models were obtained for each cell line and those with accuracy greater than 0.8 were integrated into a desirability function for the final multi-objective model. An exhaustive exploration of model combinations was carried out to obtain the best multi-objective model in virtual screening. For the top 1% of the screened list, the final model showed a BEDROC = 0.562, EF = 27.6, and AUC = 0.653. The repositioning was performed on 2218 molecules described in DrugBank. Within the top-ranked drugs, we found: temsirolimus, paclitaxel, sirolimus, everolimus, and cabazitaxel, which are antineoplastic drugs described in clinical trials for cancer in general. Interestingly, we found several broad-spectrum antibiotics and antiretroviral agents. This powerful model predicts several drugs that should be studied in depth to find new chemotherapy regimens and to propose new strategies for osteosarcoma treatment.
Affiliation(s)
- Alejandro Cabrera-Andrade
- Grupo de Bio-Quimioinformática, Universidad de Las Américas, Quito 170125, Ecuador
- Carrera de Enfermería, Facultad de Ciencias de la Salud, Universidad de Las Américas, Quito 170125, Ecuador
- Department of Computer Science and Information Technologies, Faculty of Computer Science, University of A Coruña, CITIC, Campus Elviña s/n, 15071 A Coruña, Spain
- Correspondence: (A.C.-A.); (E.T.)
| | - Andrés López-Cortés
- Department of Computer Science and Information Technologies, Faculty of Computer Science, University of A Coruña, CITIC, Campus Elviña s/n, 15071 A Coruña, Spain
- Centro de Investigación Genética y Genómica, Facultad de Ciencias de la Salud Eugenio Espejo, Universidad UTE, Quito 170129, Ecuador
- Latin American Network for Implementation and Validation of Clinical Pharmacogenomics Guidelines (RELIVAF-CYTED), 28029 Madrid, Spain
| | - Gabriela Jaramillo-Koupermann
- Laboratorio de Biología Molecular, Subproceso de Anatomía Patológica, Hospital de Especialidades Eugenio Espejo, Quito 170403, Ecuador
| | - Humberto González-Díaz
- Department of Organic and Inorganic Chemistry, and Basque Center for Biophysics CSIC-UPV/EHU, University of the Basque Country UPV/EHU, 48940 Leioa, Spain
- IKERBASQUE, Basque Foundation for Science, 48011 Bilbao, Spain
| | - Alejandro Pazos
- Department of Computer Science and Information Technologies, Faculty of Computer Science, University of A Coruña, CITIC, Campus Elviña s/n, 15071 A Coruña, Spain
- Biomedical Research Institute of A Coruña (INIBIC), University Hospital Complex of A Coruña (CHUAC), 15006 A Coruña, Spain
| | - Cristian R. Munteanu
- Department of Computer Science and Information Technologies, Faculty of Computer Science, University of A Coruña, CITIC, Campus Elviña s/n, 15071 A Coruña, Spain
- Biomedical Research Institute of A Coruña (INIBIC), University Hospital Complex of A Coruña (CHUAC), 15006 A Coruña, Spain
| | - Yunierkis Pérez-Castillo
- Grupo de Bio-Quimioinformática, Universidad de Las Américas, Quito 170125, Ecuador
- Escuela de Ciencias Físicas y Matemáticas, Universidad de Las Américas, Quito 170125, Ecuador
| | - Eduardo Tejera
- Grupo de Bio-Quimioinformática, Universidad de Las Américas, Quito 170125, Ecuador
- Facultad de Ingeniería y Ciencias Agropecuarias, Universidad de Las Américas, Quito 170125, Ecuador
- Correspondence: (A.C.-A.); (E.T.)
| |
|
43
|
David L, Thakkar A, Mercado R, Engkvist O. Molecular representations in AI-driven drug discovery: a review and practical guide. J Cheminform 2020; 12:56. [PMID: 33431035 PMCID: PMC7495975 DOI: 10.1186/s13321-020-00460-5] [Citation(s) in RCA: 215] [Impact Index Per Article: 43.0] [Received: 01/11/2020] [Accepted: 09/05/2020] [Indexed: 02/08/2023]
Abstract
The technological advances of the past century, marked by the computer revolution and the advent of high-throughput screening technologies in drug discovery, opened the path to the computational analysis and visualization of bioactive molecules. For this purpose, it became necessary to represent molecules in a syntax that would be readable by computers and understandable by scientists of various fields. A large number of chemical representations have been developed over the years, their numerosity being due to the fast development of computers and the complexity of producing a representation that encompasses all structural and chemical characteristics. We present here some of the most popular electronic molecular and macromolecular representations used in drug discovery, many of which are based on graph representations. Furthermore, we describe applications of these representations in AI-driven drug discovery. Our aim is to provide a brief guide on structural representations that are essential to the practice of AI in drug discovery. This review serves as a guide for researchers who have little experience with the handling of chemical representations and plan to work on applications at the interface of these fields.
Affiliation(s)
- Laurianne David
- Hit Discovery, Discovery Sciences, BioPharmaceuticals R&D, Astrazeneca Gothenburg, Sweden.
| | - Amol Thakkar
- Hit Discovery, Discovery Sciences, BioPharmaceuticals R&D, Astrazeneca Gothenburg, Sweden
- Department of Chemistry and Biochemistry, University of Bern, Bern, Switzerland
| | - Rocío Mercado
- Hit Discovery, Discovery Sciences, BioPharmaceuticals R&D, Astrazeneca Gothenburg, Sweden
| | - Ola Engkvist
- Hit Discovery, Discovery Sciences, BioPharmaceuticals R&D, Astrazeneca Gothenburg, Sweden
| |
|
44
|
Chaube S, Goverapet Srinivasan S, Rai B. Applied machine learning for predicting the lanthanide-ligand binding affinities. Sci Rep 2020; 10:14322. [PMID: 32868845 PMCID: PMC7459320 DOI: 10.1038/s41598-020-71255-9] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 04/01/2020] [Accepted: 08/12/2020] [Indexed: 11/25/2022]
Abstract
Binding affinities of metal-ligand complexes are central to a multitude of applications such as drug design, chelation therapy, and the design of reagents for solvent extraction. While state-of-the-art molecular modelling approaches are usually employed to gather structural and chemical insights about metal complexation with ligands, their computational cost and limited ability to predict metal-ligand stability constants with reasonable accuracy render them impractical for screening large chemical spaces. In this context, leveraging vast amounts of experimental data to learn the metal-binding affinities of ligands becomes a promising alternative. Here, we develop a machine learning framework for predicting binding affinities (logK1) of lanthanide cations with several structurally diverse molecular ligands. Six supervised machine learning algorithms (Random Forest (RF), k-Nearest Neighbours (KNN), Support Vector Machines (SVM), Kernel Ridge Regression (KRR), Multi Layered Perceptrons (MLP) and Adaptive Boosting (AdaBoost)) were trained on a dataset comprising thousands of experimental values of logK1 and validated in an external 10-fold cross-validation procedure. This was followed by a thorough feature engineering and feature importance analysis to identify the molecular, metallic and solvent features most relevant to binding affinity prediction, along with an evaluation of performance metrics against the dimensionality of feature space. Having demonstrated the excellent predictive ability of our framework, we utilized the best performing AdaBoost model to predict the logK1 values of lanthanide cations with nearly 71 million compounds present in the PubChem database. Our methodology opens up an opportunity for significantly accelerating screening and design of ligands for various targeted applications, from vast chemical spaces.
Affiliation(s)
- Suryanaman Chaube
- TCS Research, Tata Research Development and Design Center, 54-B Hadapsar Industrial Estate, Hadapsar, Pune, Maharashtra, 411013, India
| | - Sriram Goverapet Srinivasan
- TCS Research, Tata Research Development and Design Center, 54-B Hadapsar Industrial Estate, Hadapsar, Pune, Maharashtra, 411013, India.
| | - Beena Rai
- TCS Research, Tata Research Development and Design Center, 54-B Hadapsar Industrial Estate, Hadapsar, Pune, Maharashtra, 411013, India
| |
|
45
|
Rakhimbekova A, Madzhidov TI, Nugmanov RI, Gimadiev TR, Baskin II, Varnek A. Comprehensive Analysis of Applicability Domains of QSPR Models for Chemical Reactions. Int J Mol Sci 2020; 21:E5542. [PMID: 32756326 PMCID: PMC7432167 DOI: 10.3390/ijms21155542] [Citation(s) in RCA: 33] [Impact Index Per Article: 6.6] [Received: 07/13/2020] [Revised: 07/27/2020] [Accepted: 07/30/2020] [Indexed: 01/28/2023]
Abstract
Nowadays, the problem of the model's applicability domain (AD) definition is an active research topic in chemoinformatics. Although many AD definitions for models predicting properties of molecules (Quantitative Structure-Activity/Property Relationship (QSAR/QSPR) models) have been described in the literature, none for chemical reactions (Quantitative Reaction-Property Relationships (QRPR)) has been reported to date. The point is that a chemical reaction is a much more complex object than an individual molecule, and its yield, thermodynamic and kinetic characteristics depend not only on the structures of reactants and products but also on experimental conditions. The QRPR models' performance largely depends on the way that chemical transformation is encoded. In this study, various AD definition methods extensively used in QSAR/QSPR studies of individual molecules, as well as several novel approaches suggested in this work for reactions, were benchmarked on several reaction datasets. Their ability to exclude wrong reaction types, increase coverage, improve the model performance and detect Y-outliers was tested. As a result, several "best" AD definitions for the QRPR models predicting reaction characteristics have been revealed and tested on a previously published external dataset with a clear AD definition problem.
Affiliation(s)
- Assima Rakhimbekova
- A.M. Butlerov Institute of Chemistry, Kazan Federal University, 420008 Kazan, Russia
| | - Timur I. Madzhidov
- A.M. Butlerov Institute of Chemistry, Kazan Federal University, 420008 Kazan, Russia
| | - Ramil I. Nugmanov
- A.M. Butlerov Institute of Chemistry, Kazan Federal University, 420008 Kazan, Russia
| | - Timur R. Gimadiev
- Institute for Chemical Reaction Design and Discovery, Hokkaido University, Sapporo 001-0021, Japan
| | - Igor I. Baskin
- A.M. Butlerov Institute of Chemistry, Kazan Federal University, 420008 Kazan, Russia
- Faculty of Physics, Moscow State University, 119234 Moscow, Russia
- Laboratory of Chemoinformatics, UMR 7140 CNRS, University of Strasbourg, 67000 Strasbourg, France
| | - Alexandre Varnek
- Institute for Chemical Reaction Design and Discovery, Hokkaido University, Sapporo 001-0021, Japan
- Laboratory of Chemoinformatics, UMR 7140 CNRS, University of Strasbourg, 67000 Strasbourg, France
| |
|
46
|
Baskin II, Lozano S, Durot M, Marcou G, Horvath D, Varnek A. Autoignition temperature: comprehensive data analysis and predictive models. SAR AND QSAR IN ENVIRONMENTAL RESEARCH 2020; 31:597-613. [PMID: 32646236 DOI: 10.1080/1062936x.2020.1785933] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 05/18/2020] [Accepted: 06/18/2020] [Indexed: 06/11/2023]
Abstract
Here we report a new predictive model for autoignition temperature (AIT), an important physical parameter widely used to assess potential safety hazards of combustible materials. Available structure-AIT data extracted from different sources were critically analysed. Support vector regression (SVR) models on different data subsets were built in order to identify a reliable compound set on which a realistic model could be built. This led to a selection of the dataset containing 875 compounds annotated with AIT values. The SVR model built on this dataset performs reasonably well in cross-validation, with the determination coefficient r² = 0.77 and mean absolute error MAE = 37.8°C. External validation on 20 industrial compounds missing in the training set confirmed its good predictive power (MAE = 28.7°C).
Affiliation(s)
- I I Baskin
- Laboratory of Chemoinformatics, University of Strasbourg, UMR 7140 CNRS/UniStra, Strasbourg, France
| | - S Lozano
- BioLab, Centre de Recherche de Solaize, Total, Solaize, France
| | - M Durot
- BioLab, Centre de Recherche de Solaize, Total, Solaize, France
| | - G Marcou
- Laboratory of Chemoinformatics, University of Strasbourg, UMR 7140 CNRS/UniStra, Strasbourg, France
| | - D Horvath
- Laboratory of Chemoinformatics, University of Strasbourg, UMR 7140 CNRS/UniStra, Strasbourg, France
| | - A Varnek
- Laboratory of Chemoinformatics, University of Strasbourg, UMR 7140 CNRS/UniStra, Strasbourg, France
| |
|
47
|
Thermodynamic radii of lanthanide ions derived from metal–ligand complexes stability constants. J INCL PHENOM MACRO 2020. [DOI: 10.1007/s10847-020-01010-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 12/20/2022]
|
48
|
Bosc N, Muller C, Hoffer L, Lagorce D, Bourg S, Derviaux C, Gourdel ME, Rain JC, Miller TW, Villoutreix BO, Miteva MA, Bonnet P, Morelli X, Sperandio O, Roche P. Fr-PPIChem: An Academic Compound Library Dedicated to Protein-Protein Interactions. ACS Chem Biol 2020; 15:1566-1574. [PMID: 32320205 PMCID: PMC7399473 DOI: 10.1021/acschembio.0c00179] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Indexed: 01/17/2023]
Abstract
Protein-protein interactions (PPIs) mediate nearly every cellular process and represent attractive targets for modulating disease states but are challenging to target with small molecules. Despite this, several PPI inhibitors (iPPIs) have entered clinical trials, and a growing number of PPIs have become validated drug targets. However, high-throughput screening efforts still endure low hit rates mainly because of the use of unsuitable screening libraries. Here, we describe the collective effort of a French consortium to build, select, and store in plates a unique chemical library dedicated to the inhibition of PPIs. Using two independent predictive models and two updated databases of experimentally confirmed PPI inhibitors developed by members of the consortium, we built models based on different training sets, molecular descriptors, and machine learning methods. Independent statistical models were used to select putative PPI inhibitors from large commercial compound collections showing great complementarity. Medicinal chemistry filters were applied to remove undesirable structures from this set (such as PAINS, frequent hitters, and toxic compounds) and to improve drug likeness. The remaining compounds were subjected to a clustering procedure to reduce the final size of the library while maintaining its chemical diversity. In practice, the library showed a 46-fold activity rate enhancement when compared to a non-iPPI-enriched diversity library in high-throughput screening against the CD47-SIRPα PPI. The Fr-PPIChem library is plated in 384-well plates and will be distributed on demand to the scientific community as a powerful tool for discovering new chemical probes and early hits for the development of potential therapeutic drugs.
Affiliation(s)
- Nicolas Bosc
- Inserm U973 MTi, 25 rue Hélène Brion 75013 Paris
- Institut Pasteur, Unité de Bioinformatique Structurale, CNRS UMR3528, 28 rue du Dr Roux 75015 Paris
| | - Christophe Muller
- IPC Drug Discovery Platform, Institut Paoli-Calmettes, 232 Boulevard de Sainte-Marguerite, 13009, Marseille, France
| | - Laurent Hoffer
- CRCM, CNRS, INSERM, Institut Paoli-Calmettes, Aix-Marseille Univ, 13009 Marseille, France
| | - David Lagorce
- Université de Paris, INSERM US14, Plateforme Maladies Rares - Orphanet, 75014 Paris, France
| | - Stéphane Bourg
- Institut de Chimie Organique et Analytique (ICOA), Université d’Orléans, UMR CNRS 7311, BP 6759, 45067 Orléans, France
| | - Carine Derviaux
- IPC Drug Discovery Platform, Institut Paoli-Calmettes, 232 Boulevard de Sainte-Marguerite, 13009, Marseille, France
| | - Marie-Edith Gourdel
- Hybrigenics Services SAS, 1 rue Pierre Fontaine, 91000 Evry Courcouronnes, France
| | - Jean-Christophe Rain
- Hybrigenics Services SAS, 1 rue Pierre Fontaine, 91000 Evry Courcouronnes, France
| | - Thomas W. Miller
- IPC Drug Discovery Platform, Institut Paoli-Calmettes, 232 Boulevard de Sainte-Marguerite, 13009, Marseille, France
| | - Bruno O. Villoutreix
- Université de Lille, INSERM, Institut Pasteur de Lille, U1177 - Drugs and Molecules for living Systems, 59000 Lille, France
| | - Maria A. Miteva
- Inserm U1268 MCTR, CNRS UMR 8038 CiTCoM – Univ. De Paris, Faculté de Pharmacie de Paris, 75006 Paris, France
| | - Pascal Bonnet
- Institut de Chimie Organique et Analytique (ICOA), Université d’Orléans, UMR CNRS 7311, BP 6759, 45067 Orléans, France
| | - Xavier Morelli
- IPC Drug Discovery Platform, Institut Paoli-Calmettes, 232 Boulevard de Sainte-Marguerite, 13009, Marseille, France
- CRCM, CNRS, INSERM, Institut Paoli-Calmettes, Aix-Marseille Univ, 13009 Marseille, France
| | - Olivier Sperandio
- Inserm U973 MTi, 25 rue Hélène Brion 75013 Paris
- Institut Pasteur, Unité de Bioinformatique Structurale, CNRS UMR3528, 28 rue du Dr Roux 75015 Paris
| | - Philippe Roche
- CRCM, CNRS, INSERM, Institut Paoli-Calmettes, Aix-Marseille Univ, 13009 Marseille, France
| |
|
49
|
Muratov EN, Bajorath J, Sheridan RP, Tetko IV, Filimonov D, Poroikov V, Oprea TI, Baskin II, Varnek A, Roitberg A, Isayev O, Curtarolo S, Fourches D, Cohen Y, Aspuru-Guzik A, Winkler DA, Agrafiotis D, Cherkasov A, Tropsha A. QSAR without borders. Chem Soc Rev 2020; 49:3525-3564. [PMID: 32356548 PMCID: PMC8008490 DOI: 10.1039/d0cs00098a] [Citation(s) in RCA: 384] [Impact Index Per Article: 76.8] [Indexed: 12/15/2022]
Abstract
Prediction of chemical bioactivity and physical properties has been one of the most important applications of statistical and more recently, machine learning and artificial intelligence methods in chemical sciences. This field of research, broadly known as quantitative structure-activity relationships (QSAR) modeling, has developed many important algorithms and has found a broad range of applications in physical organic and medicinal chemistry in the past 55+ years. This Perspective summarizes recent technological advances in QSAR modeling but it also highlights the applicability of algorithms, modeling methods, and validation practices developed in QSAR to a wide range of research areas outside of traditional QSAR boundaries including synthesis planning, nanotechnology, materials science, biomaterials, and clinical informatics. As modern research methods generate rapidly increasing amounts of data, the knowledge of robust data-driven modelling methods professed within the QSAR field can become essential for scientists working both within and outside of chemical research. We hope that this contribution highlighting the generalizable components of QSAR modeling will serve to address this challenge.
Affiliation(s)
- Eugene N Muratov
- UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, USA.
|
50
|
Li X, Fourches D. Inductive transfer learning for molecular activity prediction: Next-Gen QSAR Models with MolPMoFiT. J Cheminform 2020; 12:27. [PMID: 33430978 PMCID: PMC7178569 DOI: 10.1186/s13321-020-00430-x] [Citation(s) in RCA: 63] [Impact Index Per Article: 12.6] [Received: 01/29/2020] [Accepted: 04/15/2020] [Indexed: 12/25/2022] Open
Abstract
Deep neural networks can directly learn from chemical structures without extensive, user-driven selection of descriptors in order to predict molecular properties/activities with high reliability. But these approaches typically require large training sets to learn the endpoint-specific structural features and ensure reasonable prediction accuracy. Even though large datasets are becoming the new normal in drug discovery, especially when it comes to high-throughput screening or metabolomics datasets, one should also consider smaller datasets with challenging endpoints to model and forecast. Thus, it would be highly relevant to better utilize the tremendous compendium of unlabeled compounds from publicly-available datasets for improving the model performances for the user’s particular series of compounds. In this study, we propose the Molecular Prediction Model Fine-Tuning (MolPMoFiT) approach, an effective transfer learning method based on self-supervised pre-training + task-specific fine-tuning for QSPR/QSAR modeling. A large-scale molecular structure prediction model is pre-trained using one million unlabeled molecules from ChEMBL in a self-supervised learning manner, and can then be fine-tuned on various QSPR/QSAR tasks for smaller chemical datasets with specific endpoints. Herein, the method is evaluated on four benchmark datasets (lipophilicity, FreeSolv, HIV, and blood–brain barrier penetration). The results showed the method can achieve strong performances for all four datasets compared to other state-of-the-art machine learning modeling techniques reported in the literature so far.
Affiliation(s)
- Xinhao Li
- Department of Chemistry, Bioinformatics Research Center, North Carolina State University, Raleigh, NC, 27695, USA
| | - Denis Fourches
- Department of Chemistry, Bioinformatics Research Center, North Carolina State University, Raleigh, NC, 27695, USA.
| |
|