1
Yan X, Yue T, Winkler DA, Yin Y, Zhu H, Jiang G, Yan B. Converting Nanotoxicity Data to Information Using Artificial Intelligence and Simulation. Chem Rev 2023. [PMID: 37262026 DOI: 10.1021/acs.chemrev.3c00070] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Indexed: 06/03/2023]
Abstract
Decades of nanotoxicology research have generated extensive and diverse data sets. However, data is not equal to information. The question is how to extract critical information buried in vast data streams. Here we show that artificial intelligence (AI) and molecular simulation play key roles in transforming nanotoxicity data into critical information, i.e., constructing the quantitative nanostructure (physicochemical properties)-toxicity relationships, and elucidating the toxicity-related molecular mechanisms. For AI and molecular simulation to realize their full impacts in this mission, several obstacles must be overcome. These include the paucity of high-quality nanomaterials (NMs) and standardized nanotoxicity data, the lack of model-friendly databases, the scarcity of specific and universal nanodescriptors, and the inability to simulate NMs at realistic spatial and temporal scales. This review provides a comprehensive and representative, but not exhaustive, summary of the current capability gaps and tools required to fill these formidable gaps. Specifically, we discuss the applications of AI and molecular simulation, which can address the large-scale data challenge for nanotoxicology research. The need for model-friendly nanotoxicity databases, powerful nanodescriptors, new modeling approaches, molecular mechanism analysis, and design of the next-generation NMs are also critically discussed. Finally, we provide a perspective on future trends and challenges.
Affiliation(s)
- Xiliang Yan
  - Institute of Environmental Research at the Greater Bay Area, Key Laboratory for Water Quality and Conservation of the Pearl River Delta, Ministry of Education, Guangzhou University, Guangzhou 510006, China
- Tongtao Yue
  - Key Laboratory of Marine Environment and Ecology, Ministry of Education, Institute of Coastal Environmental Pollution Control, Ocean University of China, Qingdao 266100, China
- David A Winkler
  - Monash Institute of Pharmaceutical Sciences, Monash University, Parkville, Victoria 3052, Australia
  - School of Pharmacy, University of Nottingham, Nottingham NG7 2QL, U.K.
  - Department of Biochemistry and Chemistry, La Trobe Institute for Molecular Science, La Trobe University, Melbourne, Victoria 3086, Australia
- Yongguang Yin
  - State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
- Hao Zhu
  - Department of Chemistry and Biochemistry, Rowan University, Glassboro, New Jersey 08028, United States
- Guibin Jiang
  - State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
- Bing Yan
  - Institute of Environmental Research at the Greater Bay Area, Key Laboratory for Water Quality and Conservation of the Pearl River Delta, Ministry of Education, Guangzhou University, Guangzhou 510006, China
2
Gruber S, Rienesl L, Köck A, Egger-Danner C, Sölkner J. Importance of Mid-Infrared Spectra Regions for the Prediction of Mastitis and Ketosis in Dairy Cows. Animals (Basel) 2023; 13:ani13071193. [PMID: 37048449 PMCID: PMC10093284 DOI: 10.3390/ani13071193] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2023] [Revised: 03/22/2023] [Accepted: 03/27/2023] [Indexed: 03/31/2023] [Open access]
Abstract
Mid-infrared (MIR) spectroscopy is routinely applied to determine major milk components, such as fat and protein. Moreover, it is used to predict fine milk composition and various traits pertinent to animal health. MIR spectra indicate an absorbance value of infrared light at 1060 specific wavenumbers from 926 to 5010 cm−1. According to research, certain parts of the spectrum do not contain sufficient information on traits of dairy cows. Hence, the objective of the present study was to identify specific regions of the MIR spectra of particular importance for the prediction of mastitis and ketosis, performing variable selection analysis. Partial least squares discriminant analysis (PLS-DA) along with three other statistical methods, support vector machine (SVM), least absolute shrinkage and selection operator (LASSO), and random forest (RF), were compared. Data originated from the Austrian milk recording and associated health monitoring system (GMON). Test-day data and corresponding MIR spectra were linked to respective clinical mastitis and ketosis diagnoses. Certain wavenumbers were identified as particularly relevant for the prediction models of clinical mastitis (23) and ketosis (61). Wavenumbers varied across four distinct statistical methods as well as concerning different traits. The results indicate that variable selection analysis could potentially be beneficial in the process of modeling.
Affiliation(s)
- Stefan Gruber
  - Institute of Livestock Sciences, University of Natural Resources and Life Sciences, Vienna (BOKU), Gregor-Mendel-Straße 33, 1180 Vienna, Austria
- Lisa Rienesl
  - Institute of Livestock Sciences, University of Natural Resources and Life Sciences, Vienna (BOKU), Gregor-Mendel-Straße 33, 1180 Vienna, Austria
  - Correspondence: Tel.: +43-1-476-549-3201
- Astrid Köck
  - ZuchtData EDV-Dienstleistungen GmbH, Dresdner Straße 89/19, 1200 Vienna, Austria
- Christa Egger-Danner
  - ZuchtData EDV-Dienstleistungen GmbH, Dresdner Straße 89/19, 1200 Vienna, Austria
- Johann Sölkner
  - Institute of Livestock Sciences, University of Natural Resources and Life Sciences, Vienna (BOKU), Gregor-Mendel-Straße 33, 1180 Vienna, Austria
3
De León G, Fröhlich E, Fink E, Di Pizio A, Salar-Behzadi S. Premexotac: Machine learning bitterants predictor for advancing pharmaceutical development. Int J Pharm 2022; 628:122263. [DOI: 10.1016/j.ijpharm.2022.122263] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2022] [Revised: 09/27/2022] [Accepted: 09/29/2022] [Indexed: 10/31/2022]
4
Conev A, Devaurs D, Rigo MM, Antunes DA, Kavraki LE. 3pHLA-score improves structure-based peptide-HLA binding affinity prediction. Sci Rep 2022; 12:10749. [PMID: 35750701 PMCID: PMC9232595 DOI: 10.1038/s41598-022-14526-x] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 02/11/2022] [Accepted: 06/08/2022] [Indexed: 12/30/2022] [Open access]
Abstract
Binding of peptides to Human Leukocyte Antigen (HLA) receptors is a prerequisite for triggering immune response. Estimating peptide-HLA (pHLA) binding is crucial for peptide vaccine target identification and epitope discovery pipelines. Computational methods for binding affinity prediction can accelerate these pipelines. Currently, most of those computational methods rely exclusively on sequence-based data, which leads to inherent limitations. Recent studies have shown that structure-based data can address some of these limitations. In this work we propose a novel machine learning (ML) structure-based protocol to predict binding affinity of peptides to HLA receptors. For that, we engineer the input features for ML models by decoupling energy contributions at different residue positions in peptides, which leads to our novel per-peptide-position protocol. Using Rosetta's ref2015 scoring function as a baseline we use this protocol to develop 3pHLA-score. Our per-peptide-position protocol outperforms the standard training protocol and leads to an increase from 0.82 to 0.99 of the area under the precision-recall curve. 3pHLA-score outperforms widely used scoring functions (AutoDock4, Vina, Dope, Vinardo, FoldX, GradDock) in a structural virtual screening task. Overall, this work brings structure-based methods one step closer to epitope discovery pipelines and could help advance the development of cancer and viral vaccines.
Affiliation(s)
- Anja Conev
  - Department of Computer Science, Rice University, Houston, 77005 USA
- Didier Devaurs
  - MRC Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, EH4 2XU UK
- Mauricio Menegatti Rigo
  - Department of Computer Science, Rice University, Houston, 77005 USA
- Lydia E. Kavraki
  - Department of Computer Science, Rice University, Houston, 77005 USA
5
Alijagic A, Engwall M, Särndahl E, Karlsson H, Hedbrant A, Andersson L, Karlsson P, Dalemo M, Scherbak N, Färnlund K, Larsson M, Persson A. Particle Safety Assessment in Additive Manufacturing: From Exposure Risks to Advanced Toxicology Testing. Frontiers in Toxicology 2022; 4:836447. [PMID: 35548681 PMCID: PMC9081788 DOI: 10.3389/ftox.2022.836447] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 12/15/2021] [Accepted: 04/06/2022] [Indexed: 11/13/2022] [Open access]
Abstract
Additive manufacturing (AM) or industrial three-dimensional (3D) printing drives a new spectrum of design and production possibilities, pushing the boundaries both in the production of sophisticated products and in the development of next-generation materials. AM technologies apply a diversity of feedstocks, including plastic, metallic, and ceramic particle powders with distinct size, shape, and surface chemistry. In addition, powders are often reused, which may change the particles' physicochemical properties and thereby alter their toxic potential. The AM production technology commonly relies on a laser or electron beam to selectively melt or sinter particle powders. Large energy input on feedstock powders generates several byproducts, including varying amounts of virgin microparticles, nanoparticles, spatter, and volatile chemicals that are emitted in the working environment throughout the production and processing phases. The micro and nanoscale size may enable particles to interact with and to cross biological barriers, which could, in turn, give rise to unexpected adverse outcomes, including inflammation, oxidative stress, activation of signaling pathways, genotoxicity, and carcinogenicity. Another important aspect of AM-associated risks is emission/leakage of mono- and oligomers due to polymer breakdown and high temperature transformation of chemicals from polymeric particles during production, use, and in vivo, including in target cells. These chemicals are potential inducers of direct toxicity, genotoxicity, and endocrine disruption. Nevertheless, understanding of whether AM particle powders and their byproducts may exert adverse effects in humans is largely lacking, which calls for comprehensive safety assessment across the entire AM lifecycle, spanning from virgin and reused to airborne particles. 
Therefore, this review will detail: 1) brief overview of the AM feedstock powders, impact of reuse on particle physicochemical properties, main exposure pathways and protective measures in AM industry, 2) role of particle biological identity and key toxicological endpoints in the particle safety assessment, and 3) next-generation toxicology approaches in nanosafety for safety assessment in AM. Altogether, the proposed testing approach will enable a deeper understanding of existing and emerging particle and chemical safety challenges and provide a strategy for the development of cutting-edge methodologies for hazard identification and risk assessment in the AM industry.
Affiliation(s)
- Andi Alijagic
  - Man-Technology-Environment Research Center (MTM), Örebro University, Örebro, Sweden
  - Inflammatory Response and Infection Susceptibility Centre (iRiSC), Faculty of Medicine and Health, Örebro University, Örebro, Sweden
  - School of Medical Sciences, Faculty of Medicine and Health, Örebro University, Örebro, Sweden
- Magnus Engwall
  - Man-Technology-Environment Research Center (MTM), Örebro University, Örebro, Sweden
- Eva Särndahl
  - Inflammatory Response and Infection Susceptibility Centre (iRiSC), Faculty of Medicine and Health, Örebro University, Örebro, Sweden
  - School of Medical Sciences, Faculty of Medicine and Health, Örebro University, Örebro, Sweden
- Helen Karlsson
  - Department of Health, Medicine and Caring Sciences, Occupational and Environmental Medicine Center in Linköping, Linköping University, Linköping, Sweden
- Alexander Hedbrant
  - Inflammatory Response and Infection Susceptibility Centre (iRiSC), Faculty of Medicine and Health, Örebro University, Örebro, Sweden
  - School of Medical Sciences, Faculty of Medicine and Health, Örebro University, Örebro, Sweden
- Lena Andersson
  - Inflammatory Response and Infection Susceptibility Centre (iRiSC), Faculty of Medicine and Health, Örebro University, Örebro, Sweden
  - School of Medical Sciences, Faculty of Medicine and Health, Örebro University, Örebro, Sweden
  - Department of Occupational and Environmental Medicine, Örebro University, Örebro, Sweden
- Patrik Karlsson
  - Department of Mechanical Engineering, Örebro University, Örebro, Sweden
- Nikolai Scherbak
  - Man-Technology-Environment Research Center (MTM), Örebro University, Örebro, Sweden
- Maria Larsson
  - Man-Technology-Environment Research Center (MTM), Örebro University, Örebro, Sweden
- Alexander Persson
  - Inflammatory Response and Infection Susceptibility Centre (iRiSC), Faculty of Medicine and Health, Örebro University, Örebro, Sweden
  - School of Medical Sciences, Faculty of Medicine and Health, Örebro University, Örebro, Sweden
6
Casey AD, Son SF, Bilionis I, Barnes BC. Prediction of Energetic Material Properties from Electronic Structure Using 3D Convolutional Neural Networks. J Chem Inf Model 2020; 60:4457-4473. [PMID: 33054184 DOI: 10.1021/acs.jcim.0c00259] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Indexed: 01/13/2023]
Affiliation(s)
- Alex D. Casey
  - School of Mechanical Engineering, Purdue University, West Lafayette, Indiana 47907, United States
- Steven F. Son
  - School of Mechanical Engineering, Purdue University, West Lafayette, Indiana 47907, United States
- Ilias Bilionis
  - School of Mechanical Engineering, Purdue University, West Lafayette, Indiana 47907, United States
- Brian C. Barnes
  - CCDC Army Research Laboratory, Aberdeen Proving Ground, Maryland 21005, United States
7
Yuan R, Xue D, Xue D, Li J, Ding X, Sun J, Lookman T. Knowledge-Based Descriptor for the Compositional Dependence of the Phase Transition in BaTiO3-Based Ferroelectrics. ACS Applied Materials & Interfaces 2020; 12:44970-44980. [PMID: 32924419 DOI: 10.1021/acsami.0c12763] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/11/2023]
Abstract
Descriptors play a central role in constructing composition-structure-property relationships to guide materials design. We propose a material descriptor, δτ, for the composition dependence of the Curie temperature (Tc) on single doping elements in BaTiO3 ferroelectrics, which is then generalized to a linear combination of multiple dopants in the solid solutions. The descriptor δτ depends linearly on the Curie temperature and also serves to separate the ferroelectric phase from the relaxor phase. We compare δτ to other commonly used descriptors such as the tolerance factor, electronegativity, and ionic displacement. By using regression analysis on our assembled experimental data, we show how it outperforms other descriptors. We use the trained machine-learned models to predict compositions in our search space with the largest ferroelectric, dielectric, and piezoelectric properties, namely, d33, electrostrain, and recoverable energy storage density. We experimentally verify our predictions for Tc and classification into ferroelectrics and relaxors by synthesizing and characterizing six solid solutions in BaTiO3 ferroelectrics. Our definition of δτ can shed light on the design of knowledge-based descriptors in other systems such as Pb-based and Bi-based solid solutions.
Affiliation(s)
- Ruihao Yuan
  - State Key Laboratory of Solidification Processing, Northwestern Polytechnical University, Xi'an 710072, China
  - State Key Laboratory for Mechanical Behavior of Materials, Xi'an Jiaotong University, Xi'an 710049, China
  - Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Deqing Xue
  - School of Materials Science and Engineering, Xi'an University of Technology, Xi'an 710048, China
- Dezhen Xue
  - State Key Laboratory for Mechanical Behavior of Materials, Xi'an Jiaotong University, Xi'an 710049, China
- Jinshan Li
  - State Key Laboratory of Solidification Processing, Northwestern Polytechnical University, Xi'an 710072, China
- Xiangdong Ding
  - State Key Laboratory for Mechanical Behavior of Materials, Xi'an Jiaotong University, Xi'an 710049, China
- Jun Sun
  - State Key Laboratory for Mechanical Behavior of Materials, Xi'an Jiaotong University, Xi'an 710049, China
- Turab Lookman
  - Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
8
Winkler DA. Role of Artificial Intelligence and Machine Learning in Nanosafety. Small (Weinheim an der Bergstrasse, Germany) 2020; 16:e2001883. [PMID: 32537842 DOI: 10.1002/smll.202001883] [Citation(s) in RCA: 51] [Impact Index Per Article: 12.8] [Received: 03/22/2020] [Revised: 05/07/2020] [Indexed: 06/11/2023]
Abstract
Robotics and automation provide potentially paradigm shifting improvements in the way materials are synthesized and characterized, generating large, complex data sets that are ideal for modeling and analysis by modern machine learning (ML) methods. Nanomaterials have not yet fully captured the benefits of automation, so lag behind in the application of ML methods of data analysis. Here, some key developments in, and roadblocks to the application of ML methods are reviewed to model and predict potentially adverse biological and environmental effects of nanomaterials. This work focuses on the diverse ways a range of ML algorithms are applied to understand and predict nanomaterials properties, provides examples of the application of traditional ML and deep learning methods to nanosafety, and provides context and future perspectives on developments that are likely to occur, or need to occur in the near future that allow artificial intelligence to make a deeper contribution to nanosafety.
Affiliation(s)
- David A Winkler
  - La Trobe Institute for Molecular Science, La Trobe University, Kingsbury Drive, Bundoora, 3042, Australia
  - CSIRO Data61, 1 Technology Court, Pullenvale, 4069, Australia
  - School of Pharmacy, University of Nottingham, Nottingham, NG7 2QL, UK
  - Monash Institute of Pharmaceutical Sciences, Monash University, 392 Royal Parade, Parkville, 3052, Australia
9
Liu Y, Wu J, Avdeev M, Shi S. Multi-Layer Feature Selection Incorporating Weighted Score-Based Expert Knowledge toward Modeling Materials with Targeted Properties. Advanced Theory and Simulations 2020. [DOI: 10.1002/adts.201900215] [Citation(s) in RCA: 52] [Impact Index Per Article: 13.0] [Indexed: 12/31/2022]
Affiliation(s)
- Yue Liu
  - School of Computer Engineering and Science, Shanghai Institute for Advanced Communication and Data Science, Shanghai University, Shanghai 200444, China
- Jun-Ming Wu
  - School of Computer Engineering and Science, Shanghai Institute for Advanced Communication and Data Science, Shanghai University, Shanghai 200444, China
- Maxim Avdeev
  - Australian Nuclear Science and Technology Organisation, Locked Bag 2001, Kirrawee DC, NSW 2232, Australia
  - School of Chemistry, The University of Sydney, Sydney 2006, Australia
- Si-Qi Shi
  - School of Materials Science and Engineering, Materials Genome Institute, Shanghai University, Shanghai 200444, China
10
Ma XY, Lewis JP, Yan QB, Su G. Accelerated Discovery of Two-Dimensional Optoelectronic Octahedral Oxyhalides via High-Throughput Ab Initio Calculations and Machine Learning. J Phys Chem Lett 2019; 10:6734-6740. [PMID: 31621332 DOI: 10.1021/acs.jpclett.9b02420] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Indexed: 06/10/2023]
Abstract
Traditional trial-and-error methods are obstacles for large-scale searching of new optoelectronic materials. Here, we introduce a method combining high-throughput ab initio calculations and machine-learning approaches to predict two-dimensional octahedral oxyhalides with improved optoelectronic properties. We develop an effective machine-learning model based on an expansive data set generated from density functional calculations including the geometric and electronic properties of 300 two-dimensional octahedral oxyhalides. Our model accelerates the screening of potential optoelectronic materials of 5000 two-dimensional octahedral oxyhalides. The distorted stacked octahedral factors proposed in our model play essential roles in the machine-learning prediction. Several potential two-dimensional optoelectronic octahedral oxyhalides with moderate band gaps, high electron mobilities, and ultrahigh absorbance coefficients are successfully hypothesized.
Affiliation(s)
- Xing-Yu Ma
  - School of Physical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- James P Lewis
  - Department of Physics and Astronomy, West Virginia University, Morgantown, West Virginia 26506-6315, United States
  - State Key Laboratory of Coal Conversion, Institute of Coal Chemistry, Chinese Academy of Sciences, Taiyuan, Shanxi 030001, China
  - Beijing Advanced Innovation Center for Materials Genome Engineering, Beijing Information S&T University, Beijing 101400, China
- Qing-Bo Yan
  - Center of Materials Science and Optoelectronics Engineering, College of Materials Science and Optoelectronic Technology, University of Chinese Academy of Sciences, Beijing 100049, China
- Gang Su
  - School of Physical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
  - Kavli Institute for Theoretical Sciences, and CAS Center of Excellence in Topological Quantum Computation, University of Chinese Academy of Sciences, Beijing 100190, China
11
García-Jacas CR, Marrero-Ponce Y, Cortés-Guzmán F, Suárez-Lezcano J, Martinez-Rios FO, García-González LA, Pupo-Meriño M, Martinez-Mayorga K. Enhancing Acute Oral Toxicity Predictions by using Consensus Modeling and Algebraic Form-Based 0D-to-2D Molecular Encodes. Chem Res Toxicol 2019; 32:1178-1192. [PMID: 31066547 DOI: 10.1021/acs.chemrestox.9b00011] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Indexed: 12/13/2022]
Abstract
Quantitative structure-activity relationships (QSAR) are introduced to predict acute oral toxicity (AOT), by using the QuBiLS-MAS (acronym for quadratic, bilinear and N-Linear maps based on graph-theoretic electronic-density matrices and atomic weightings) framework for the molecular encoding. Three training sets were employed to build the models: EPA training set (5931 compounds), EPA-full training set (7413 compounds), and Zhu training set (10,152 compounds). Additionally, the EPA test set (1482 compounds) was used for the validation of the QSAR models built on the EPA training set, while the ProTox (425 compounds) and T3DB (284 compounds) external sets were employed for the assessment of all the models. The k-nearest neighbor, multilayer perceptron, random forest, and support vector machine procedures were employed to build several base (individual) models. The base models with R_EPA-training ≥ 0.75 (R = correlation coefficient) and MAE_EPA-training ≤ 0.5 (MAE = mean absolute error) were retained to build consensus models. As a result, two consensus models based on the minimum operator and denoted as M19 and M22, as well as a consensus model based on the weighted average operator and denoted as M24, were selected as the best ones for each training set considered. According to the applicability domain (AD) analysis performed, model M19 (built on the EPA training set) has MAE_test-AD = 0.4044, MAE_ProTox-AD = 0.4067 and MAE_T3DB-AD = 0.2586 on the EPA test set, ProTox external set, and T3DB external set, respectively; whereas model M22 (built on the EPA-full set) and model M24 (built on the Zhu set) present MAE_ProTox-AD = 0.3992 and MAE_T3DB-AD = 0.2286, and MAE_ProTox-AD = 0.3773 and MAE_T3DB-AD = 0.2471 on the two external sets accounted for, respectively. These outcomes were compared and statistically validated with respect to 14 QSAR methods (e.g., admetSAR, ProTox-II) from the literature. As a result, model M22 presents the best overall performance. 
In addition, a retrospective study on 261 drugs withdrawn due to their toxic/side effects was performed, to assess the usefulness of prospectively using the proposed QSAR models in the labeling of chemicals. A comparison with the methods from the literature was also made. As a result, model M22 has the best ability to label a compound as toxic according to the globally harmonized system of classification and labeling of chemicals. Therefore, it can be concluded that the models proposed, especially model M22, constitute prominent tools for studying AOT, providing the best results among all the methods examined. Freely available software was also developed to be used in virtual screening tasks (http://tomocomd.com/apps/ptoxra).
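The consensus step described in this abstract (retain base models that pass the R and MAE thresholds, then combine their predictions with a minimum or weighted-average operator) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the dictionary layout, the toy numbers, and the inverse-MAE weighting are all assumptions made for illustration only.

```python
def select_base_models(models, min_r=0.75, max_mae=0.5):
    """Keep base models with R >= min_r and MAE <= max_mae (thresholds from the abstract)."""
    return [m for m in models if m["r"] >= min_r and m["mae"] <= max_mae]


def consensus_min(predictions):
    """Minimum operator: for each compound, take the lowest base-model prediction."""
    return [min(values) for values in zip(*predictions)]


def consensus_weighted(predictions, weights):
    """Weighted-average operator over base-model predictions for each compound."""
    total = sum(weights)
    return [
        sum(w * v for w, v in zip(weights, values)) / total
        for values in zip(*predictions)
    ]


# Hypothetical base models with made-up training statistics and predictions
# for two compounds; none of these numbers come from the paper.
base_models = [
    {"name": "knn", "r": 0.80, "mae": 0.45, "pred": [2.0, 3.0]},
    {"name": "mlp", "r": 0.70, "mae": 0.40, "pred": [9.0, 9.0]},  # fails R threshold
    {"name": "rf", "r": 0.78, "mae": 0.40, "pred": [2.5, 2.8]},
]

kept = select_base_models(base_models)
preds = [m["pred"] for m in kept]
# Assumed weighting scheme: inverse training MAE (the abstract does not say how weights are set).
weights = [1.0 / m["mae"] for m in kept]

print([m["name"] for m in kept])
print(consensus_min(preds))
print(consensus_weighted(preds, weights))
```

The sketch only shows how the two operators differ: the minimum operator follows whichever retained base model predicts the lowest value for a compound, while the weighted average blends all retained models.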
Affiliation(s)
- César R García-Jacas
  - Departamento de Ciencias de la Computación, Centro de Investigación Científica y de Educación Superior de Ensenada, Ensenada, Baja California, México
- Yovani Marrero-Ponce
  - Universidad San Francisco de Quito, Grupo de Medicina Molecular y Traslacional, Colegio de Ciencias de la Salud, Escuela de Medicina, Edificio de Especialidades Médicas, Quito, Pichincha, Ecuador
  - Grupo de Investigación Ambiental, Programas Ambientales, Facultad de Ingenierías, Fundacion Universitaria Tecnologico Comfenalco-Cartagena, Cr44 DN 30 A, 91, Cartagena, Bolívar, Colombia
- Fernando Cortés-Guzmán
  - Instituto de Química, Universidad Nacional Autónoma de México, Ciudad de México, México
- José Suárez-Lezcano
  - Pontificia Universidad Católica del Ecuador Sede Esmeraldas, Esmeraldas, Ecuador
- Luis A García-González
  - Grupo de Investigación de Bioinformática, Universidad de las Ciencias Informáticas, La Habana, Cuba
- Mario Pupo-Meriño
  - Grupo de Investigación de Bioinformática, Universidad de las Ciencias Informáticas, La Habana, Cuba
12
Isayev O, Oses C, Toher C, Gossett E, Curtarolo S, Tropsha A. Universal fragment descriptors for predicting properties of inorganic crystals. Nat Commun 2017; 8:15679. [PMID: 28580961 PMCID: PMC5465371 DOI: 10.1038/ncomms15679] [Citation(s) in RCA: 173] [Impact Index Per Article: 24.7] [Received: 02/14/2017] [Accepted: 04/11/2017] [Indexed: 12/23/2022] [Open access]
Abstract
Although historically materials discovery has been driven by a laborious trial-and-error process, knowledge-driven materials design can now be enabled by the rational combination of Machine Learning methods and materials databases. Here, data from the AFLOW repository for ab initio calculations is combined with Quantitative Materials Structure-Property Relationship models to predict important properties: metal/insulator classification, band gap energy, bulk/shear moduli, Debye temperature and heat capacities. The prediction's accuracy compares well with the quality of the training data for virtually any stoichiometric inorganic crystalline material, reciprocating the available thermomechanical experimental data. The universality of the approach is attributed to the construction of the descriptors: Property-Labelled Materials Fragments. The representations require only minimal structural input allowing straightforward implementations of simple heuristic design rules.
Affiliation(s)
- Olexandr Isayev
  - Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, North Carolina 27599, USA
- Corey Oses
  - Center for Materials Genomics, Duke University, Durham, North Carolina 27708, USA
- Cormac Toher
  - Center for Materials Genomics, Duke University, Durham, North Carolina 27708, USA
- Eric Gossett
  - Center for Materials Genomics, Duke University, Durham, North Carolina 27708, USA
- Stefano Curtarolo
  - Center for Materials Genomics, Duke University, Durham, North Carolina 27708, USA
  - Materials Science, Electrical Engineering, Physics and Chemistry, Duke University, Durham, North Carolina 27708, USA
- Alexander Tropsha
  - Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, North Carolina 27599, USA
13
Zhu XW, Xin YJ, Ge HL. Recursive Random Forests Enable Better Predictive Performance and Model Interpretation than Variable Selection by LASSO. J Chem Inf Model 2015; 55:736-46. [PMID: 25746224 DOI: 10.1021/ci500715e] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Indexed: 01/13/2023]
Abstract
Variable selection is of crucial significance in QSAR modeling since it increases the model predictive ability and reduces noise. The selection of the right variables is far more complicated than the development of predictive models. In this study, eight continuous and categorical data sets were employed to explore the applicability of two distinct variable selection methods: random forests (RF) and least absolute shrinkage and selection operator (LASSO). Variable selection was performed: (1) by using recursive random forests to rule out a quarter of the least important descriptors at each iteration and (2) by using LASSO modeling with 10-fold inner cross-validation to tune its penalty λ for each data set. Along with regular statistical parameters of model performance, we proposed the highest pairwise correlation rate, average pairwise Pearson's correlation coefficient, and Tanimoto coefficient to evaluate the optimal variables selected by RF and LASSO in an extensive way. Results showed that variable selection could allow a tremendous reduction of noisy descriptors (at most 96% with the RF method in this study) and apparently enhance the model's predictive performance as well. Furthermore, random forests showed the property of gathering important predictors without restricting their pairwise correlation, which is contrary to LASSO. The mutual exclusion of highly correlated variables in LASSO modeling tends to skip important variables that are highly related to response endpoints and thus undermine the model's predictive performance. The optimal variables selected by RF share low similarity with those by LASSO (e.g., the Tanimoto coefficients were smaller than 0.20 in seven out of eight data sets). We found that the differences between RF and LASSO predictive performances mainly resulted from the variables selected by different strategies rather than the learning algorithms. Our study showed that the right selection of variables is more important than the learning algorithm for modeling. 
We hope that a standard procedure could be developed based on these proposed statistical metrics to select the truly important variables for model interpretation, as well as for further use to facilitate drug discovery and environmental toxicity assessment.
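The selection loop and two of the comparison metrics named in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `importance_fn` callback (a hypothetical stand-in for refitting a random forest and reading its descriptor importances), the `min_keep` stopping rule, and the use of absolute values when averaging correlations are all assumptions made for illustration. The "highest pairwise correlation rate" is not defined in the abstract and is left out.

```python
from itertools import combinations
from math import sqrt


def tanimoto(selected_a, selected_b):
    """Tanimoto (Jaccard) similarity between two sets of selected descriptors."""
    a, b = set(selected_a), set(selected_b)
    union = a | b
    # Two empty selections are treated as identical (an assumption).
    return len(a & b) / len(union) if union else 1.0


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (non-constant input assumed)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    var_y = sum((yi - mean_y) ** 2 for yi in y)
    return cov / sqrt(var_x * var_y)


def average_pairwise_correlation(columns):
    """Mean absolute Pearson r over all pairs of descriptor columns.

    `columns` maps descriptor name -> list of values. Taking the absolute
    value is an assumption; the abstract does not specify it.
    """
    pairs = list(combinations(columns.values(), 2))
    return sum(abs(pearson(x, y)) for x, y in pairs) / len(pairs)


def recursive_elimination(descriptors, importance_fn, min_keep=1):
    """Drop the least important quarter of descriptors at each iteration.

    `importance_fn(names)` is a hypothetical callback returning a dict of
    name -> importance for the current subset. Returns the subset kept at
    every iteration so the best one can be chosen by model performance.
    """
    current = list(descriptors)
    history = [list(current)]
    while len(current) > min_keep:
        scores = importance_fn(current)
        n_drop = max(1, len(current) // 4)
        n_drop = min(n_drop, len(current) - min_keep)
        ranked = sorted(current, key=lambda name: scores[name], reverse=True)
        current = ranked[: len(ranked) - n_drop]
        history.append(list(current))
    return history
```

In practice `importance_fn` would retrain the forest on the surviving descriptors at every round, and the subset with the best validation performance would be kept; `tanimoto` can then compare that subset with the descriptors that receive nonzero LASSO coefficients.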
Affiliation(s)
- Hui-Lin Ge
- Hainan Provincial Key Laboratory of Quality and Safety for Tropical Fruits and Vegetables, Analysis and Testing Center, Chinese Academy of Tropical Agricultural Sciences, Haikou, 571101 Hainan, China
|
14
|
Cumming JG, Davis AM, Muresan S, Haeberlein M, Chen H. Chemical predictive modelling to improve compound quality. Nat Rev Drug Discov 2014; 12:948-62. [PMID: 24287782 DOI: 10.1038/nrd4128] [Citation(s) in RCA: 167] [Impact Index Per Article: 16.7] [Indexed: 11/09/2022]
Abstract
The 'quality' of small-molecule drug candidates, encompassing aspects including their potency, selectivity and ADMET (absorption, distribution, metabolism, excretion and toxicity) characteristics, is a key factor influencing the chances of success in clinical trials. Importantly, such characteristics are under the control of chemists during the identification and optimization of lead compounds. Here, we discuss the application of computational methods, particularly quantitative structure-activity relationships (QSARs), in guiding the selection of higher-quality drug candidates, as well as cultural factors that may have affected their use and impact.
Affiliation(s)
- John G Cumming
- Chemistry Innovation Centre, Discovery Sciences, AstraZeneca R&D, Alderley Park, Macclesfield SK10 4TG, UK
|
15
|
Gabler S, Soelter J, Hussain T, Sachse S, Schmuker M. Physicochemical vs. Vibrational Descriptors for Prediction of Odor Receptor Responses. Mol Inform 2013; 32:855-65. [PMID: 27480237 DOI: 10.1002/minf.201300037] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Received: 03/01/2013] [Accepted: 07/25/2013] [Indexed: 01/20/2023]
Abstract
Responses of olfactory receptors (ORs) can be predicted by applying machine learning methods on a multivariate encoding of an odorant's chemical structure. Physicochemical descriptors that encode features of the molecular graph are a popular choice for such an encoding. Here, we explore the EVA descriptor set, which encodes features derived from the vibrational spectrum of a molecule. We assessed the performance of Support Vector Regression (SVR) and Random Forest Regression (RFR) to predict the gradual response of Drosophila ORs. We compared a 27-dimensional variant of the EVA descriptor against a set of 1467 descriptors provided by the eDragon software package, and against a 32-dimensional subset thereof that has been proposed as the basis for an odor metric consisting of 32 descriptors (HADDAD). The best prediction performance was reproducibly achieved using SVR on the highest-dimensional feature set. The low-dimensional EVA and HADDAD feature sets predicted odor-OR interactions with similar accuracy. Adding charge and polarizability information to the EVA descriptor did not improve the results but rather decreased predictive power. Post-hoc in vivo measurements confirmed these results. Our findings indicate that EVA provides a meaningful low-dimensional representation of odor space, although EVA hardly outperformed "classical" descriptor sets.
Affiliation(s)
- Stephan Gabler
- Theoretical Neuroscience, Institute of Biology, Dept. of Biology, Chemistry, Pharmacy, Freie Universität Berlin, Königin-Luise-Str. 1-3, D-14195 Berlin, Germany; Bernstein Center for Computational Neuroscience Berlin, Philippstr. 13, Haus 6, D-10115 Berlin, Germany
- Jan Soelter
- Theoretical Neuroscience, Institute of Biology, Dept. of Biology, Chemistry, Pharmacy, Freie Universität Berlin, Königin-Luise-Str. 1-3, D-14195 Berlin, Germany
- Taufia Hussain
- Max Planck Institute for Chemical Ecology, Hans-Knöll-Straße 8, D-07745 Jena, Germany
- Silke Sachse
- Max Planck Institute for Chemical Ecology, Hans-Knöll-Straße 8, D-07745 Jena, Germany
- Michael Schmuker
- Theoretical Neuroscience, Institute of Biology, Dept. of Biology, Chemistry, Pharmacy, Freie Universität Berlin, Königin-Luise-Str. 1-3, D-14195 Berlin, Germany; Bernstein Center for Computational Neuroscience Berlin, Philippstr. 13, Haus 6, D-10115 Berlin, Germany
|
16
|
Polishchuk PG, Kuz'min VE, Artemenko AG, Muratov EN. Universal Approach for Structural Interpretation of QSAR/QSPR Models. Mol Inform 2013; 32:843-53. [DOI: 10.1002/minf.201300029] [Citation(s) in RCA: 45] [Impact Index Per Article: 4.1] [Received: 02/20/2013] [Accepted: 07/29/2013] [Indexed: 11/07/2022]
|
17
|
Koch CP, Pillong M, Hiss JA, Schneider G. Computational Resources for MHC Ligand Identification. Mol Inform 2013; 32:326-36. [PMID: 27481589 DOI: 10.1002/minf.201300042] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 03/09/2012] [Accepted: 04/04/2013] [Indexed: 01/16/2023]
Abstract
Advances in the high-throughput determination of functional modulators of the major histocompatibility complex (MHC) and improved computational predictions of MHC ligands have rendered the rational design of immunomodulatory peptides feasible. Proteome-derived peptides and 'reverse vaccinology' by computational means will play a driving role in future vaccine design. Here we review the molecular mechanisms of the MHC-mediated immune response, present the computational approaches that have emerged in this area of biotechnology, and provide an overview of publicly available computational resources for predicting and designing new peptidic MHC ligands.
Affiliation(s)
- Christian P Koch
- ETH Zürich, Department of Chemistry and Applied Biosciences, Institute of Pharmaceutical Sciences, Wolfgang-Pauli-Str. 10, 8093 Zürich, Switzerland
- Max Pillong
- ETH Zürich, Department of Chemistry and Applied Biosciences, Institute of Pharmaceutical Sciences, Wolfgang-Pauli-Str. 10, 8093 Zürich, Switzerland
- Jan A Hiss
- ETH Zürich, Department of Chemistry and Applied Biosciences, Institute of Pharmaceutical Sciences, Wolfgang-Pauli-Str. 10, 8093 Zürich, Switzerland
- Gisbert Schneider
- ETH Zürich, Department of Chemistry and Applied Biosciences, Institute of Pharmaceutical Sciences, Wolfgang-Pauli-Str. 10, 8093 Zürich, Switzerland
|