1
Panwar P, Yang Q, Martini A. Temperature-Dependent Density and Viscosity Prediction for Hydrocarbons: Machine Learning and Molecular Dynamics Simulations. J Chem Inf Model 2024; 64:2760-2774. [PMID: 37582234 DOI: 10.1021/acs.jcim.3c00231] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/17/2023]
Abstract
Machine learning-based predictive models allow rapid and reliable prediction of material properties and facilitate innovative materials design. Base oils used in the formulation of lubricant products are complex hydrocarbons of varying sizes and structure. This study developed Gaussian process regression-based models to accurately predict the temperature-dependent density and dynamic viscosity of 305 complex hydrocarbons. In our approach, strongly correlated/collinear predictors were trimmed, important predictors were selected by least absolute shrinkage and selection operator (LASSO) regularization and prior domain knowledge, hyperparameters were systematically optimized by Bayesian optimization, and the models were interpreted. The approach provided versatile and quantitative structure-property relationship (QSPR) models with relatively simple predictors for determining the dynamic viscosity and density of complex hydrocarbons at any temperature. In addition, we developed molecular dynamics simulation-based descriptors and evaluated the feasibility and versatility of dynamic descriptors from simulations for predicting the material properties. It was found that the models developed using a comparably smaller pool of dynamic descriptors performed similarly in predicting density and viscosity to models based on many more static descriptors. The best models were shown to predict density and dynamic viscosity with coefficient of determination (R²) values of 99.6% and 97.7%, respectively, for all data sets, including a test data set of 45 molecules. Finally, partial dependency plots (PDPs), individual conditional expectation (ICE) plots, local interpretable model-agnostic explanation (LIME) values, and trimmed model R² values were used to identify the most important static and dynamic predictors of the density and viscosity.
Affiliation(s)
- Pawan Panwar
  - Department of Mechanical Engineering, University of California Merced, 5200 North Lake Road, Merced, California 95343, United States
- Quanpeng Yang
  - Department of Mechanical Engineering, University of California Merced, 5200 North Lake Road, Merced, California 95343, United States
- Ashlie Martini
  - Department of Mechanical Engineering, University of California Merced, 5200 North Lake Road, Merced, California 95343, United States
2
Duprat F, Ploix JL, Aubry JM, Gaudin T. Fast and Accurate Prediction of Refractive Index of Organic Liquids with Graph Machines. Molecules 2023; 28:6805. [PMID: 37836648 PMCID: PMC10574377 DOI: 10.3390/molecules28196805] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2023] [Revised: 09/22/2023] [Accepted: 09/23/2023] [Indexed: 10/15/2023] Open
Abstract
The refractive index (RI) of liquids is a key physical property of molecular compounds and materials. In addition to its ubiquitous role in physics, it is also exploited to impart specific optical properties (transparency, opacity, and gloss) to materials and various end-use products. Since few methods exist to accurately estimate this property, we have designed a graph machine model (GMM) capable of predicting the RI of liquid organic compounds containing up to 16 different types of atoms and effective in discriminating between stereoisomers. Using 8267 carefully checked RI values from the literature and the corresponding 2D organic structures, the GMM provides a training root mean square relative error of less than 0.5%, i.e., an RMSE of 0.004 for the estimation of the refractive index of the 8267 compounds. The GMM predictive ability is also compared to that obtained by several fragment-based approaches. Finally, a Docker-based tool is proposed to predict the RI of organic compounds solely from their SMILES code. The GMM developed is easy to apply, as shown by the video tutorials provided on YouTube.
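The abstract above quotes two error measures side by side: a root mean square relative error (in percent) and an RMSE (in refractive index units). A minimal sketch of how the two are computed is given here for orientation; the function names and the sample values are hypothetical and are not taken from the paper.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error, in the units of the property itself."""
    n = len(y_true)
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / n)

def rmsre(y_true, y_pred):
    """Root mean square relative error, as a dimensionless fraction."""
    n = len(y_true)
    return math.sqrt(sum(((p - t) / t) ** 2 for t, p in zip(y_true, y_pred)) / n)

# Hypothetical refractive indices for illustration only (not data from the paper).
measured = [1.333, 1.361, 1.501, 1.426]
predicted = [1.336, 1.358, 1.497, 1.430]

print(f"RMSE  = {rmse(measured, predicted):.4f}")
print(f"RMSRE = {100 * rmsre(measured, predicted):.2f} %")
```

Because refractive indices of organic liquids cluster in a narrow range around 1.3 to 1.6, the relative and absolute errors differ by a roughly constant factor, which is why both figures can be reported for the same model.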
Affiliation(s)
- François Duprat
  - Molecular, Macromolecular Chemistry and Materials, ESPCI Paris, PSL Research University, 75005 Paris, France
- Jean-Luc Ploix
  - Molecular, Macromolecular Chemistry and Materials, ESPCI Paris, PSL Research University, 75005 Paris, France
- Jean-Marie Aubry
  - Unité de Catalyse et Chimie du Solide, Centrale Lille, University Lille, UMR CNRS 8181, 59000 Lille, France
3
Zhu W, Zhang Y, Zhao D, Xu J, Wang L. HiGNN: A Hierarchical Informative Graph Neural Network for Molecular Property Prediction Equipped with Feature-Wise Attention. J Chem Inf Model 2023; 63:43-55. [PMID: 36519623 DOI: 10.1021/acs.jcim.2c01099] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Indexed: 12/23/2022]
Abstract
Elucidating and accurately predicting the druggability and bioactivities of molecules plays a pivotal role in drug design and discovery and remains an open challenge. Recently, graph neural networks (GNNs) have made remarkable advancements in graph-based molecular property prediction. However, current graph-based deep learning methods neglect the hierarchical information of molecules and the relationships between feature channels. In this study, we propose a well-designed hierarchical informative graph neural network (termed HiGNN) framework for predicting molecular property by utilizing a corepresentation learning of molecular graphs and chemically synthesizable breaking of retrosynthetically interesting chemical substructure (BRICS) fragments. Furthermore, a plug-and-play feature-wise attention block is first designed in HiGNN architecture to adaptively recalibrate atomic features after the message passing phase. Extensive experiments demonstrate that HiGNN achieves state-of-the-art predictive performance on many challenging drug discovery-associated benchmark data sets. In addition, we devise a molecule-fragment similarity mechanism to comprehensively investigate the interpretability of the HiGNN model at the subgraph level, indicating that HiGNN as a powerful deep learning tool can help chemists and pharmacists identify the key components of molecules for designing better molecules with desired properties or functions. The source code is publicly available at https://github.com/idruglab/hignn.
Affiliation(s)
- Weimin Zhu
  - Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Joint International Research Laboratory of Synthetic Biology and Medicine, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China
- Yi Zhang
  - Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Joint International Research Laboratory of Synthetic Biology and Medicine, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China
- Duancheng Zhao
  - Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Joint International Research Laboratory of Synthetic Biology and Medicine, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China
- Jianrong Xu
  - Department of Pharmacology and Chemical Biology, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China
  - Academy of Integrative Medicine, Shanghai University of Traditional Chinese Medicine, Shanghai 201203, China
- Ling Wang
  - Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Joint International Research Laboratory of Synthetic Biology and Medicine, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China
4
Delforce L, Duprat F, Ploix JL, Ontiveros JF, Goussard V, Nardello-Rataj V, Aubry JM. Fast Prediction of the Equivalent Alkane Carbon Number Using Graph Machines and Neural Networks. ACS Omega 2022; 7:38869-38881. [PMID: 36340160 PMCID: PMC9631404 DOI: 10.1021/acsomega.2c04592] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/20/2022] [Accepted: 08/09/2022] [Indexed: 06/16/2023]
Abstract
The hydrophobicity of oils is a key parameter to design surfactant/oil/water (SOW) macro-, micro-, or nano-dispersed systems with the desired features. This essential physicochemical characteristic is quantitatively expressed by the equivalent alkane carbon number (EACN) whose experimental determination is tedious since it requires knowledge of the phase behavior of the SOW systems at different temperatures and for different surfactant concentrations. In this work, two mathematical models are proposed for the rapid prediction of the EACN of oils. They have been designed using artificial intelligence (machine-learning) methods, namely, neural networks (NN) and graph machines (GM). While the GM model is implemented from the SMILES codes of a 111-molecule training set of known EACN values, the NN model is fed with some σ-moment descriptors computed with the COSMOtherm software for the 111-molecule set. In a preliminary step, the leave-one-out algorithm is used to select, given the available data, the appropriate complexity of the two models. A comparison of the EACNs of liquids of a fresh set of 10 complex cosmetic and perfumery molecules shows that the two approaches provide comparable results in terms of accuracy and reliability. Finally, the NN and GM models are applied to nine series of homologous compounds, for which the GM model results are in better agreement with the experimental EACN trends than the NN model predictions. The results obtained by the GMs and by the NN based on σ-moments can be duplicated with the demonstration tool available for download as detailed in the Supporting Information.
Affiliation(s)
- Lucie Delforce
  - University of Lille, CNRS, Centrale Lille, Université d'Artois, UMR 8181 - UCCS - Unité de Catalyse et Chimie du Solide, F-59000 Lille, France
- François Duprat
  - Laboratoire de Chimie Organique, CNRS, ESPCI Paris, PSL Research University, 10 rue Vauquelin, 75005 Paris, France
- Jean-Luc Ploix
  - Laboratoire de Chimie Organique, CNRS, ESPCI Paris, PSL Research University, 10 rue Vauquelin, 75005 Paris, France
- Jesus Fermín Ontiveros
  - University of Lille, CNRS, Centrale Lille, Université d'Artois, UMR 8181 - UCCS - Unité de Catalyse et Chimie du Solide, F-59000 Lille, France
- Valentin Goussard
  - University of Lille, CNRS, Centrale Lille, Université d'Artois, UMR 8181 - UCCS - Unité de Catalyse et Chimie du Solide, F-59000 Lille, France
- Véronique Nardello-Rataj
  - University of Lille, CNRS, Centrale Lille, Université d'Artois, UMR 8181 - UCCS - Unité de Catalyse et Chimie du Solide, F-59000 Lille, France
- Jean-Marie Aubry
  - University of Lille, CNRS, Centrale Lille, Université d'Artois, UMR 8181 - UCCS - Unité de Catalyse et Chimie du Solide, F-59000 Lille, France
5
Bio-based alternatives to volatile silicones: Relationships between chemical structure, physicochemical properties and functional performances. Adv Colloid Interface Sci 2022; 304:102679. [PMID: 35512559 DOI: 10.1016/j.cis.2022.102679] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 01/03/2022] [Revised: 04/08/2022] [Accepted: 04/13/2022] [Indexed: 11/23/2022]
Abstract
Emollient oils are ubiquitous ingredients of personal care products, especially skin care and hair care formulations. They offer excellent spreading properties and give end-use products a soft, pleasant and non-sticky after-feel. Emollients belong to various petro- or bio-based chemical families among which silicone oils, hydrocarbons and esters are the most prominent. Silicones have exceptional physicochemical and sensory properties but their high chemical stability results in very low biodegradability and a high bioaccumulation potential. Nowadays, consumers are increasingly responsive to environmental issues and demand more environmentally friendly products. This awareness strongly encourages cosmetics industries to develop bio-based alternatives to silicone oils. Finding effective silicon-free emollients requires understanding the molecular origin of emollience. This review details the relationships between the molecular structures of emollients and their physicochemical properties as well as the resulting functional performances in order to facilitate the design of alternative oils with suitable physicochemical and sensory properties. The molecular profile of an ideal emollient in terms of chemical function (alkane, ether, ester, carbonate, alcohol), optimal number of carbons and branching is established to obtain an odourless oil with good spreading on the skin. Since none of the carbon-based emollients alone can imitate the non-sticky and dry feel of silicone oils, it is judicious to blend alkanes and esters to significantly improve both the sensory properties and the solubilizing properties of the synergistic mixture towards polar ingredients (sun filters, antioxidants, fragrances). Finally, it is shown how modelling tools (QSPR, COSMO-RS and neural networks) can predict in silico the key properties of hundreds of virtual candidate molecules in order to synthesize only the most promising whose predicted properties are close to the specifications.
6
Goussard V, Duprat F, Ploix JL, Dreyfus G, Nardello-Rataj V, Aubry JM. A New Machine-Learning Tool for Fast Estimation of Liquid Viscosity. Application to Cosmetic Oils. J Chem Inf Model 2020; 60:2012-2023. [PMID: 32250628 DOI: 10.1021/acs.jcim.0c00083] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 11/29/2022]
Abstract
The viscosities of pure liquids are estimated at 25 °C, from their molecular structures, using three modeling approaches: group contributions, COSMO-RS σ-moment-based neural networks, and graph machines. The last two are machine-learning methods, whereby models are designed and trained from a database of viscosities of 300 molecules at 25 °C. Group contributions and graph machines make use of the 2D structures only (the SMILES codes of the molecules), while the neural network estimations are based on a set of five descriptors: COSMO-RS σ-moments. For the first time, leave-one-out is used for graph machine selection, and it is shown that it can be replaced with the much faster virtual leave-one-out algorithm. The database covers a wide diversity of chemical structures, namely, alkanes, ethers, esters, ketones, carbonates, acids, alcohols, silanes, and siloxanes, as well as different chemical backbones, i.e., straight, branched, or cyclic chains. A comparison of the viscosities of liquids of an independent set of 22 cosmetic oils shows that the graph machine approach provides the most accurate results given the available data. The results obtained by the neural network based on σ-moments and by the graph machines can be duplicated easily by using a demonstration tool based on the Docker technology, available for download as explained in the Supporting Information. This demonstration also allows the reader to predict, at 25 °C, the viscosity of any liquid of moderate molecular size (M < 600 Da) that contains C, H, O, or Si atoms, starting either from its SMILES code or from its σ-moments computed with the COSMOtherm software.
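The abstract above contrasts leave-one-out, which refits the model once per example, with the faster virtual leave-one-out, which estimates the same residuals from a single fit. The sketch below illustrates the idea on a deliberately simple stand-in, a straight-line least-squares fit, where rescaling each residual by its leverage reproduces the explicit leave-one-out residual exactly. This is an assumption made for illustration: the paper applies the idea to graph machines, which are nonlinear models for which the virtual estimate is generally only an approximation. All function names and data here are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_mean - b * x_mean, b

def explicit_loo_residuals(xs, ys):
    """Refit n times, each time predicting the one example left out."""
    residuals = []
    for i in range(len(xs)):
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        residuals.append(ys[i] - (a + b * xs[i]))
    return residuals

def virtual_loo_residuals(xs, ys):
    """Fit once; divide each ordinary residual by (1 - leverage)."""
    n = len(xs)
    a, b = fit_line(xs, ys)
    x_mean = sum(xs) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    residuals = []
    for x, y in zip(xs, ys):
        leverage = 1.0 / n + (x - x_mean) ** 2 / sxx
        residuals.append((y - (a + b * x)) / (1.0 - leverage))
    return residuals

# Hypothetical data for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.1, 1.9, 3.2, 3.9, 5.3]
for e, v in zip(explicit_loo_residuals(xs, ys), virtual_loo_residuals(xs, ys)):
    print(f"explicit = {e:+.6f}   virtual = {v:+.6f}")
```

The explicit version costs one fit per example, whereas the virtual version costs a single fit, which is the source of the speed-up mentioned in the abstract.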
Affiliation(s)
- Valentin Goussard
  - Université de Lille, CNRS, ENSCL, UMR 8181, UCCS-Unité de Catalyse et de Chimie du Solide, 59655 Villeneuve d'Ascq, France
- François Duprat
  - Chimie Moléculaire, Macromoléculaire, Matériaux, ESPCI Paris, CNRS, PSL University, 10 rue Vauquelin, 75005 Paris, France
- Jean-Luc Ploix
  - Chimie Moléculaire, Macromoléculaire, Matériaux, ESPCI Paris, CNRS, PSL University, 10 rue Vauquelin, 75005 Paris, France
- Gérard Dreyfus
  - Chimie Moléculaire, Macromoléculaire, Matériaux, ESPCI Paris, CNRS, PSL University, 10 rue Vauquelin, 75005 Paris, France
- Véronique Nardello-Rataj
  - Université de Lille, CNRS, ENSCL, UMR 8181, UCCS-Unité de Catalyse et de Chimie du Solide, 59655 Villeneuve d'Ascq, France
- Jean-Marie Aubry
  - Université de Lille, CNRS, ENSCL, UMR 8181, UCCS-Unité de Catalyse et de Chimie du Solide, 59655 Villeneuve d'Ascq, France
7
Tang B, Kramer ST, Fang M, Qiu Y, Wu Z, Xu D. A self-attention based message passing neural network for predicting molecular lipophilicity and aqueous solubility. J Cheminform 2020; 12:15. [PMID: 33431047 PMCID: PMC7035778 DOI: 10.1186/s13321-020-0414-z] [Citation(s) in RCA: 55] [Impact Index Per Article: 13.8] [Received: 10/04/2019] [Accepted: 01/27/2020] [Indexed: 01/19/2023] Open
Abstract
Efficient and accurate prediction of molecular properties, such as lipophilicity and solubility, is highly desirable for rational compound design in chemical and pharmaceutical industries. To this end, we build and apply a graph-neural-network framework called self-attention-based message-passing neural network (SAMPN) to study the relationship between chemical properties and structures in an interpretable way. The main advantages of SAMPN are that it directly uses chemical graphs and breaks the black-box mold of many machine/deep learning methods. Specifically, its attention mechanism indicates the degree to which each atom of the molecule contributes to the property of interest, and these results are easily visualized. Further, SAMPN outperforms random forests and the deep learning framework MPN from Deepchem. In addition, another formulation of SAMPN (Multi-SAMPN) can simultaneously predict multiple chemical properties with higher accuracy and efficiency than other models that predict one specific chemical property. Moreover, SAMPN can generate chemically visible and interpretable results, which can help researchers discover new pharmaceuticals and materials. The source code of the SAMPN prediction pipeline is freely available at Github (https://github.com/tbwxmu/SAMPN).
Affiliation(s)
- Bowen Tang
  - Fujian Provincial Key Laboratory of Innovative Drug Target Research, School of Pharmaceutical Sciences, Xiamen University, Xiamen, 361000, China
  - Department of Electrical Engineering and Computer Science, Informatics Institute, and Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO, 65211, USA
- Skyler T Kramer
  - Department of Electrical Engineering and Computer Science, Informatics Institute, and Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO, 65211, USA
- Meijuan Fang
  - Fujian Provincial Key Laboratory of Innovative Drug Target Research, School of Pharmaceutical Sciences, Xiamen University, Xiamen, 361000, China
- Yingkun Qiu
  - Fujian Provincial Key Laboratory of Innovative Drug Target Research, School of Pharmaceutical Sciences, Xiamen University, Xiamen, 361000, China
- Zhen Wu
  - Fujian Provincial Key Laboratory of Innovative Drug Target Research, School of Pharmaceutical Sciences, Xiamen University, Xiamen, 361000, China
- Dong Xu
  - Department of Electrical Engineering and Computer Science, Informatics Institute, and Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO, 65211, USA
8
Goussard V, Duprat F, Gerbaud V, Ploix JL, Dreyfus G, Nardello-Rataj V, Aubry JM. Predicting the Surface Tension of Liquids: Comparison of Four Modeling Approaches and Application to Cosmetic Oils. J Chem Inf Model 2017; 57:2986-2995. [DOI: 10.1021/acs.jcim.7b00512] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Indexed: 02/05/2023]
Affiliation(s)
- Valentin Goussard
  - Univ. Lille, CNRS, Centrale Lille, ENSCL, Univ. Artois, UMR 8181 - UCCS - Unité de Catalyse et Chimie du Solide, F-59000 Lille, France
- François Duprat
  - Laboratoire de Chimie Organique, CNRS, ESPCI Paris, PSL Research University, 10 rue Vauquelin, 75005 Paris, France
- Vincent Gerbaud
  - Laboratoire de Génie Chimique, Université de Toulouse, CNRS, INP, UPS, 31432 Toulouse, France
- Jean-Luc Ploix
  - Laboratoire de Chimie Organique, CNRS, ESPCI Paris, PSL Research University, 10 rue Vauquelin, 75005 Paris, France
- Gérard Dreyfus
  - Laboratoire de Chimie Organique, CNRS, ESPCI Paris, PSL Research University, 10 rue Vauquelin, 75005 Paris, France
- Véronique Nardello-Rataj
  - Univ. Lille, CNRS, Centrale Lille, ENSCL, Univ. Artois, UMR 8181 - UCCS - Unité de Catalyse et Chimie du Solide, F-59000 Lille, France
- Jean-Marie Aubry
  - Univ. Lille, CNRS, Centrale Lille, ENSCL, Univ. Artois, UMR 8181 - UCCS - Unité de Catalyse et Chimie du Solide, F-59000 Lille, France
9
Creton B. Chemoinformatics at IFP Energies Nouvelles: Applications in the Fields of Energy, Transport, and Environment. Mol Inform 2017; 36. [PMID: 28418201 DOI: 10.1002/minf.201700028] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 02/09/2017] [Accepted: 03/20/2017] [Indexed: 11/10/2022]
Abstract
The objective of the present paper is to summarize chemoinformatics-based research, and more precisely, the development of quantitative structure-property relationships performed at IFP Energies nouvelles (IFPEN) during the last decade. Special focus is given to research activities performed in the "Thermodynamics and Molecular Simulation" department, i.e., the use of multiscale molecular simulation methods in response to projects. Molecular simulation techniques can be used to supplement data sets when experimental information is lacking; thus, the review includes a section dedicated to molecular simulation codes, development of intermolecular potentials, and some of their possible applications. Know-how and feedback from our experience with machine learning applications for thermophysical property predictions are included in a section dealing with methodological aspects. The generic character of chemoinformatics is emphasized through applications in the fields of energy, transport, and environment, with illustrations for three IFPEN business units: "Transports", "Energy Resources", and "Processes". More precisely, the review focuses on different challenges such as the prediction of properties for alternative fuels, the prediction of fuel compatibility with polymeric materials, the prediction of properties for surfactants usable in chemical enhanced oil recovery, and the prediction of guest-host interactions between gases and nanoporous materials in the frame of carbon dioxide capture or gas separation activities.
Affiliation(s)
- Benoit Creton
  - IFP Energies nouvelles, 1 et 4 avenue de Bois-Préau, 92852, Rueil-Malmaison, France
10
Abstract
INTRODUCTION: Neural networks are becoming a very popular method for solving machine learning and artificial intelligence problems. The variety of neural network types and their application to drug discovery requires expert knowledge to choose the most appropriate approach. AREAS COVERED: In this review, the authors discuss traditional and newly emerging neural network approaches to drug discovery. Their focus is on backpropagation neural networks and their variants, self-organizing maps and associated methods, and a relatively new technique, deep learning. The most important technical issues are discussed including overfitting and its prevention through regularization, ensemble and multitask modeling, model interpretation, and estimation of applicability domain. Different aspects of using neural networks in drug discovery are considered: building structure-activity models with respect to various targets; predicting drug selectivity, toxicity profiles, ADMET and physicochemical properties; characteristics of drug-delivery systems and virtual screening. EXPERT OPINION: Neural networks continue to grow in importance for drug discovery. Recent developments in deep learning suggest further improvements may be gained in the analysis of large chemical data sets. It is anticipated that neural networks will be more widely used in drug discovery in the future, and applied in non-traditional areas such as drug delivery systems, biologically compatible materials, and regenerative medicine.
Affiliation(s)
- Igor I Baskin
  - Faculty of Physics, M.V. Lomonosov Moscow State University, Moscow, Russia
  - A.M. Butlerov Institute of Chemistry, Kazan Federal University, Kazan, Russia
- David Winkler
  - CSIRO Manufacturing, Clayton, VIC, Australia
  - Monash Institute for Pharmaceutical Sciences, Monash University, Parkville, VIC, Australia
  - Latrobe Institute for Molecular Science, Bundoora, VIC, Australia
  - School of Chemical and Physical Sciences, Flinders University, Bedford Park, SA, Australia
- Igor V Tetko
  - Helmholtz Zentrum München - German Research Center for Environmental Health (GmbH), Institute of Structural Biology, Neuherberg, Germany
  - BigChem GmbH, Neuherberg, Germany
11
Veselinović JB, Nikolić GM, Trutić NV, Živković JV, Veselinović AM. Monte Carlo QSAR models for predicting organophosphate inhibition of acetycholinesterase. SAR and QSAR in Environmental Research 2015; 26:449-460. [PMID: 26043064 DOI: 10.1080/1062936x.2015.1049665] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Indexed: 06/04/2023]
Abstract
A series of 278 organophosphate compounds acting as acetylcholinesterase inhibitors has been studied. The Monte Carlo method was used as a tool for building up one-variable quantitative structure-activity relationship (QSAR) models for acetylcholinesterase inhibition activity based on the principle that the target endpoint is treated as a random event. As an activity, bimolecular rate constants were used. The QSAR models were based on optimal descriptors obtained from Simplified Molecular Input-Line Entry System (SMILES) used for the representation of molecular structure. Two modelling approaches were examined: (1) 'classic' training-test system where the QSAR model was built with one random split into a training, test and validation set; and (2) the correlation balance based QSAR models were built with two random splits into a sub-training, calibration, test and validation set. The DModX method was used for defining the applicability domain. The obtained results suggest that the studied activity can be determined with the application of QSAR models calculated with the Monte Carlo method since the statistical quality of all built models was very good. Finally, structural indicators for the increase and the decrease of the bimolecular rate constant are defined. The possibility of using these results for the computer-aided design of new organophosphate compounds is presented.
12
Veselinović JB, Toropov AA, Toropova AP, Nikolić GM, Veselinović AM. Monte Carlo Method-Based QSAR Modeling of Penicillins Binding to Human Serum Proteins. Arch Pharm (Weinheim) 2014; 348:62-7. [DOI: 10.1002/ardp.201400259] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Received: 07/01/2014] [Revised: 09/12/2014] [Accepted: 10/01/2014] [Indexed: 11/12/2022]
Affiliation(s)
- Andrey A. Toropov
  - IRCCS - Istituto di Ricerche Farmacologiche Mario Negri, Milano, Italy
- Alla P. Toropova
  - IRCCS - Istituto di Ricerche Farmacologiche Mario Negri, Milano, Italy
- Goran M. Nikolić
  - Faculty of Medicine, Department of Chemistry, University of Niš, Niš, Serbia
13
Dioury F, Duprat A, Dreyfus G, Ferroud C, Cossy J. QSPR Prediction of the Stability Constants of Gadolinium(III) Complexes for Magnetic Resonance Imaging. J Chem Inf Model 2014; 54:2718-31. [DOI: 10.1021/ci500346w] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Indexed: 12/21/2022]
Affiliation(s)
- Fabienne Dioury
  - Laboratoire de Chimie moléculaire, génie des procédés chimiques et énergétiques (CMGPCE), Conservatoire national des arts et métiers (Cnam), 2 rue Conté, 75003 Paris, France
- Arthur Duprat
  - Signal Processing and Machine Learning (SIGMA) Lab, ESPCI ParisTech, 10 rue Vauquelin, 75005 Paris, France
  - Laboratoire de Chimie Organique, ESPCI ParisTech, 10 rue Vauquelin, 75005 Paris, France
- Gérard Dreyfus
  - Signal Processing and Machine Learning (SIGMA) Lab, ESPCI ParisTech, 10 rue Vauquelin, 75005 Paris, France
- Clotilde Ferroud
  - Laboratoire de Chimie moléculaire, génie des procédés chimiques et énergétiques (CMGPCE), Conservatoire national des arts et métiers (Cnam), 2 rue Conté, 75003 Paris, France
- Janine Cossy
  - Laboratoire de Chimie Organique, ESPCI ParisTech, 10 rue Vauquelin, 75005 Paris, France
14
Toropov AA, Veselinović JB, Veselinović AM, Miljković FN, Toropova AP. QSAR models for 1,2,4-benzotriazines as Src inhibitors based on Monte Carlo method. Med Chem Res 2014. [DOI: 10.1007/s00044-014-1132-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/30/2022]
15
Saldana DA, Starck L, Mougin P, Rousseau B, Creton B. On the rational formulation of alternative fuels: melting point and net heat of combustion predictions for fuel compounds using machine learning methods. SAR and QSAR in Environmental Research 2013; 24:259-277. [PMID: 23574496 DOI: 10.1080/1062936x.2013.766634] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Indexed: 06/02/2023]
Abstract
We report the development of predictive models for two fuel specifications: melting points (Tm) and net heat of combustion (ΔcH). Compounds inside the scope of these models are those likely to be found in alternative fuels, i.e. hydrocarbons, alcohols and esters. Experimental Tm and ΔcH values for these types of molecules have been gathered to generate a unique database. Various quantitative structure-property relationship (QSPR) approaches have been used to build models, ranging from methods leading to multi-linear models such as genetic function approximation (GFA), or partial least squares (PLS) to those leading to non-linear models such as feed-forward artificial neural networks (FFANN), general regression neural networks (GRNN), support vector machines (SVM), or graph machines. Except for the case of the graph machines method for which the only inputs are SMILES formulae, previously listed approaches working on molecular descriptors and functional group count descriptors were used to develop specific models for Tm and ΔcH. For each property, the predictive models return slightly different responses for each molecular structure. Therefore, models labelled as 'consensus models' were built by averaging values computed with selected individual models. Predicted results were then compared with experimental data and with predictions of models in the literature.
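The consensus step described in the abstract above, averaging the values computed by selected individual models, can be sketched in a few lines. This is only an illustration under stated assumptions: the function name, the model labels used as dictionary keys, and the numerical values are hypothetical, and the abstract does not say how the individual models were selected or whether the average was weighted.

```python
def consensus_prediction(predictions_by_model, selected):
    """Unweighted per-compound average over the selected models' predictions."""
    chosen = [predictions_by_model[name] for name in selected]
    # zip(*chosen) groups the predictions compound by compound.
    return [sum(values) / len(values) for values in zip(*chosen)]

# Hypothetical melting points (K) for three compounds; not values from the paper.
predictions = {
    "PLS": [250.0, 300.0, 180.0],
    "SVM": [260.0, 290.0, 190.0],
    "FFANN": [255.0, 310.0, 185.0],
}

print(consensus_prediction(predictions, ["PLS", "SVM", "FFANN"]))
```

Averaging tends to damp the model-to-model scatter that the abstract mentions, at the cost of hiding which individual model was closest for a given compound.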
Affiliation(s)
- D A Saldana
- IFP Energies Nouvelles, Rueil-Malmaison, France
|
16
|
Varnek A, Baskin I. Machine learning methods for property prediction in chemoinformatics: Quo Vadis? J Chem Inf Model 2012; 52:1413-1437. [PMID: 22582859 DOI: 10.1021/ci200409x] [Citation(s) in RCA: 145] [Impact Index Per Article: 12.1] [Indexed: 01/19/2023]
Abstract
This paper is focused on modern approaches to machine learning, most of which are as yet used infrequently or not at all in chemoinformatics. Machine learning methods are characterized in terms of the "modes of statistical inference" and "modeling levels" nomenclature and by considering different facets of the modeling with respect to input/output matching, data types, models duality, and models inference. Particular attention is paid to new approaches and concepts that may provide efficient solutions to common problems in chemoinformatics: improvement of predictive performance of structure-property (activity) models, generation of structures possessing desirable properties, model applicability domain, modeling of properties with functional endpoints (e.g., phase diagrams and dose-response curves), and accounting for multiple molecular species (e.g., conformers or tautomers).
Affiliation(s)
- Alexandre Varnek
- Laboratoire d'Infochimie, UMR 7177 CNRS, Université de Strasbourg, 4, rue B. Pascal, Strasbourg 67000, France.
|
17
|
Porcheron F, Gibert A, Mougin P, Wender A. High throughput screening of CO2 solubility in aqueous monoamine solutions. Environ Sci Technol 2011; 45:2486-2492. [PMID: 21341690 DOI: 10.1021/es103453f] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.3] [Indexed: 05/30/2023]
Abstract
Post-combustion Carbon Capture and Storage technology (CCS) is viewed as an efficient solution to reduce CO2 emissions of coal-fired power stations. In CCS, an aqueous amine solution is commonly used as a solvent to selectively capture CO2 from the flue gas. However, this process generates additional costs, mostly from the reboiler heat duty required to release the carbon dioxide from the loaded solvent solution. In this work, we present thermodynamic results of CO2 solubility in aqueous amine solutions from a 6-reactor High Throughput Screening (HTS) experimental device. This device is fully automated and designed to perform sequential injections of CO2 within stirred-cell reactors containing the solvent solutions. The gas pressure within each reactor is monitored as a function of time, and the resulting transient pressure curves are transformed into CO2 absorption isotherms. Solubility measurements are first performed on monoethanolamine, diethanolamine, and methyldiethanolamine aqueous solutions at T = 313.15 K. Experimental results are compared with existing data in the literature to validate the HTS device. In addition, a comprehensive thermodynamic model is used to represent CO2 solubility variations in different classes of amine structures over a wide range of thermodynamic conditions. This model is used to fit the experimental data and to calculate the cyclic capacity, which is a key parameter for CO2 process design. Solubility measurements are then performed on a set of 50 monoamines and cyclic capacities are extracted using the thermodynamic model, to assess the potential of these molecules for CO2 capture.
Affiliation(s)
- Fabien Porcheron
- IFP Energies Nouvelles, Rond-Point de l'Échangeur de Solaize, BP 3, 69360 Solaize, France.
|
18
|
Novel graph machine based QSAR approach for the prediction of the adsorption enthalpies of alkanes on zeolites. Catal Today 2011. [DOI: 10.1016/j.cattod.2010.07.029] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 11/18/2022]
|
19
|
Considerations and recent advances in QSAR models for cytochrome P450-mediated drug metabolism prediction. J Comput Aided Mol Des 2008; 22:843-855. [DOI: 10.1007/s10822-008-9225-4] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.3] [Received: 12/19/2007] [Accepted: 06/08/2008] [Indexed: 02/07/2023]
|