1
|
Damavandi S, Shiri F, Emamjomeh A, Pirhadi S, Beyzaei H. A study of the interaction space of two lactate dehydrogenase isoforms (LDHA and LDHB) and some of their inhibitors using proteochemometrics modeling. BMC Chem 2023; 17:70. [PMID: 37415191 DOI: 10.1186/s13065-023-00991-6] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/25/2023] [Accepted: 06/30/2023] [Indexed: 07/08/2023] Open
Abstract
Lactate dehydrogenase (LDH) is a tetramer enzyme that converts pyruvate to lactate reversibly. This enzyme becomes important because it is associated with diseases such as cancers, heart disease, liver problems, and most importantly, corona disease. As a system-based method, proteochemometrics does not require knowledge of the protein's three-dimensional structure, but rather depends on the amino acid sequence and protein descriptors. Here, we applied this methodology to model a set of LDHA and LDHB isoenzyme inhibitors. To implement the proteochemetrics method, the camb package in the R Studio Server programming environment was used. The activity of 312 compounds of LDHA and LDHB isoenzyme inhibitors from the valid Binding DB database was retrieved. The proteochemometrics method was applied to three machine learning algorithms gradient amplification model, random forest, and support vector machine as regression methods to find the best model. Through the combination of different models into an ensemble (greedy and stacking optimization), we explored the possibility of improving the performance of models. For the RF best ensemble model of inhibitors of LDHA and LDHB isoenzymes, and were 0.66 and 0.62, respectively. LDH inhibitory activation is influenced by Morgan fingerprints and topological structure descriptors.
Collapse
Affiliation(s)
- Sedigheh Damavandi
- Department of Bioinformatics, Laboratory of Computational Biotechnology and Bioinformatics (CBB Lab), University of Zabol, Zabol, Iran
| | - Fereshteh Shiri
- Department of Chemistry, Faculty of Science, University of Zabol, Zabol, Iran.
| | - Abbasali Emamjomeh
- Department of Bioinformatics, Laboratory of Computational Biotechnology and Bioinformatics (CBB Lab), University of Zabol, Zabol, Iran
- Department of Plant Breeding and Biotechnology (PBB), Faculty of Agriculture, University of Zabol, Zabol, Iran
| | - Somayeh Pirhadi
- Medicinal and Natural Products Chemistry Research Center, Shiraz University of Medical Sciences, Shiraz, Iran
| | - Hamid Beyzaei
- Department of Chemistry, Faculty of Science, University of Zabol, Zabol, Iran
| |
Collapse
|
2
|
Gosline SJC, Tognon C, Nestor M, Joshi S, Modak R, Damnernsawad A, Posso C, Moon J, Hansen JR, Hutchinson-Bunch C, Pino JC, Gritsenko MA, Weitz KK, Traer E, Tyner J, Druker B, Agarwal A, Piehowski P, McDermott JE, Rodland K. Proteomic and phosphoproteomic measurements enhance ability to predict ex vivo drug response in AML. Clin Proteomics 2022; 19:30. [PMID: 35896960 PMCID: PMC9327422 DOI: 10.1186/s12014-022-09367-9] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/20/2022] [Accepted: 06/22/2022] [Indexed: 11/23/2022] Open
Abstract
Acute Myeloid Leukemia (AML) affects 20,000 patients in the US annually with a five-year survival rate of approximately 25%. One reason for the low survival rate is the high prevalence of clonal evolution that gives rise to heterogeneous sub-populations of leukemic cells with diverse mutation spectra, which eventually leads to disease relapse. This genetic heterogeneity drives the activation of complex signaling pathways that is reflected at the protein level. This diversity makes it difficult to treat AML with targeted therapy, requiring custom patient treatment protocols tailored to each individual's leukemia. Toward this end, the Beat AML research program prospectively collected genomic and transcriptomic data from over 1000 AML patients and carried out ex vivo drug sensitivity assays to identify genomic signatures that could predict patient-specific drug responses. However, there are inherent weaknesses in using only genetic and transcriptomic measurements as surrogates of drug response, particularly the absence of direct information about phosphorylation-mediated signal transduction. As a member of the Clinical Proteomic Tumor Analysis Consortium, we have extended the molecular characterization of this cohort by collecting proteomic and phosphoproteomic measurements from a subset of these patient samples (38 in total) to evaluate the hypothesis that proteomic signatures can improve the ability to predict response to 26 drugs in AML ex vivo samples. In this work we describe our systematic, multi-omic approach to evaluate proteomic signatures of drug response and compare protein levels to other markers of drug response such as mutational patterns. We explore the nuances of this approach using two drugs that target key pathways activated in AML: quizartinib (FLT3) and trametinib (Ras/MEK), and show how patient-derived signatures can be interpreted biologically and validated in cell lines. In conclusion, this pilot study demonstrates strong promise for proteomics-based patient stratification to assess drug sensitivity in AML.
Collapse
Affiliation(s)
| | - Cristina Tognon
- Knight Cancer Institute, Oregon Health & Science University, Portland, OR, USA
- Division of Hematology & Medical Oncology, Department of Medicine, Oregon Health & Science University, Portland, OR, USA
| | | | - Sunil Joshi
- Knight Cancer Institute, Oregon Health & Science University, Portland, OR, USA
- Division of Hematology & Medical Oncology, Department of Medicine, Oregon Health & Science University, Portland, OR, USA
| | - Rucha Modak
- Knight Cancer Institute, Oregon Health & Science University, Portland, OR, USA
| | - Alisa Damnernsawad
- Knight Cancer Institute, Oregon Health & Science University, Portland, OR, USA
- Division of Hematology & Medical Oncology, Department of Medicine, Oregon Health & Science University, Portland, OR, USA
- Department of Biology, Faculty of Science, Mahidol University, Bangkok, Thailand
| | - Camilo Posso
- Pacific Northwest National Laboratory, Seattle, WA, USA
| | - Jamie Moon
- Pacific Northwest National Laboratory, Seattle, WA, USA
| | | | | | - James C Pino
- Pacific Northwest National Laboratory, Seattle, WA, USA
| | | | - Karl K Weitz
- Pacific Northwest National Laboratory, Seattle, WA, USA
| | - Elie Traer
- Knight Cancer Institute, Oregon Health & Science University, Portland, OR, USA
- Division of Hematology & Medical Oncology, Department of Medicine, Oregon Health & Science University, Portland, OR, USA
| | - Jeffrey Tyner
- Knight Cancer Institute, Oregon Health & Science University, Portland, OR, USA
- Division of Hematology & Medical Oncology, Department of Medicine, Oregon Health & Science University, Portland, OR, USA
| | - Brian Druker
- Knight Cancer Institute, Oregon Health & Science University, Portland, OR, USA
- Division of Hematology & Medical Oncology, Department of Medicine, Oregon Health & Science University, Portland, OR, USA
| | - Anupriya Agarwal
- Knight Cancer Institute, Oregon Health & Science University, Portland, OR, USA
- Division of Hematology & Medical Oncology, Department of Medicine, Oregon Health & Science University, Portland, OR, USA
- Division of Oncological Sciences, Oregon Health & Science University, Portland, OR, USA
- Department of Cell, Developmental, and Cancer Biology, Oregon Health & Science University, Portland, OR, USA
| | | | - Jason E McDermott
- Pacific Northwest National Laboratory, Seattle, WA, USA
- Department of Molecular Microbiology and Immunology, Oregon Health & Science University, Portland, OR, USA
| | - Karin Rodland
- Pacific Northwest National Laboratory, Seattle, WA, USA.
- Department of Cell, Developmental, and Cancer Biology, Oregon Health & Science University, Portland, OR, USA.
| |
Collapse
|
3
|
Thomas M, Boardman A, Garcia-Ortegon M, Yang H, de Graaf C, Bender A. Applications of Artificial Intelligence in Drug Design: Opportunities and Challenges. METHODS IN MOLECULAR BIOLOGY (CLIFTON, N.J.) 2021; 2390:1-59. [PMID: 34731463 DOI: 10.1007/978-1-0716-1787-8_1] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Subscribe] [Scholar Register] [Indexed: 10/19/2022]
Abstract
Artificial intelligence (AI) has undergone rapid development in recent years and has been successfully applied to real-world problems such as drug design. In this chapter, we review recent applications of AI to problems in drug design including virtual screening, computer-aided synthesis planning, and de novo molecule generation, with a focus on the limitations of the application of AI therein and opportunities for improvement. Furthermore, we discuss the broader challenges imposed by AI in translating theoretical practice to real-world drug design; including quantifying prediction uncertainty and explaining model behavior.
Collapse
Affiliation(s)
- Morgan Thomas
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, UK
| | - Andrew Boardman
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, UK
| | - Miguel Garcia-Ortegon
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, UK.,Department of Pure Mathematics and Mathematical Statistics, University of Cambridge, Cambridge, UK
| | - Hongbin Yang
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, UK
| | | | - Andreas Bender
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, UK.
| |
Collapse
|
4
|
Kolmar SS, Grulke CM. The effect of noise on the predictive limit of QSAR models. J Cheminform 2021; 13:92. [PMID: 34823605 PMCID: PMC8613965 DOI: 10.1186/s13321-021-00571-7] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/15/2021] [Accepted: 11/14/2021] [Indexed: 01/09/2023] Open
Abstract
A key challenge in the field of Quantitative Structure Activity Relationships (QSAR) is how to effectively treat experimental error in the training and evaluation of computational models. It is often assumed in the field of QSAR that models cannot produce predictions which are more accurate than their training data. Additionally, it is implicitly assumed, by necessity, that data points in test sets or validation sets do not contain error, and that each data point is a population mean. This work proposes the hypothesis that QSAR models can make predictions which are more accurate than their training data and that the error-free test set assumption leads to a significant misevaluation of model performance. This work used 8 datasets with six different common QSAR endpoints, because different endpoints should have different amounts of experimental error associated with varying complexity of the measurements. Up to 15 levels of simulated Gaussian distributed random error was added to the datasets, and models were built on the error laden datasets using five different algorithms. The models were trained on the error laden data, evaluated on error-laden test sets, and evaluated on error-free test sets. The results show that for each level of added error, the RMSE for evaluation on the error free test sets was always better. The results support the hypothesis that, at least under the conditions of Gaussian distributed random error, QSAR models can make predictions which are more accurate than their training data, and that the evaluation of models on error laden test and validation sets may give a flawed measure of model performance. These results have implications for how QSAR models are evaluated, especially for disciplines where experimental error is very large, such as in computational toxicology. ![]()
Collapse
Affiliation(s)
- Scott S Kolmar
- Center for Computational Toxicology and Exposure, Office of Research and Development, US Environmental Protection Agency, Research Triangle Park, NC, USA.
| | - Christopher M Grulke
- Center for Computational Toxicology and Exposure, Office of Research and Development, US Environmental Protection Agency, Research Triangle Park, NC, USA
| |
Collapse
|
5
|
Tarasova O, Poroikov V. Machine Learning in Discovery of New Antivirals and Optimization of Viral Infections Therapy. Curr Med Chem 2021; 28:7840-7861. [PMID: 33949929 DOI: 10.2174/0929867328666210504114351] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/10/2020] [Revised: 02/13/2021] [Accepted: 02/24/2021] [Indexed: 11/22/2022]
Abstract
Nowadays, computational approaches play an important role in the design of new drug-like compounds and optimization of pharmacotherapeutic treatment of diseases. The emerging growth of viral infections, including those caused by the Human Immunodeficiency Virus (HIV), Ebola virus, recently detected coronavirus, and some others, leads to many newly infected people with a high risk of death or severe complications. A huge amount of chemical, biological, clinical data is at the disposal of the researchers. Therefore, there are many opportunities to find the relationships between the particular features of chemical data and the antiviral activity of biologically active compounds based on machine learning approaches. Biological and clinical data can also be used for building models to predict relationships between viral genotype and drug resistance, which might help determine the clinical outcome of treatment. In the current study, we consider machine-learning approaches in the antiviral research carried out during the past decade. We overview in detail the application of machine-learning methods for the design of new potential antiviral agents and vaccines, drug resistance prediction, and analysis of virus-host interactions. Our review also covers the perspectives of using the machine-learning approaches for antiviral research, including Dengue, Ebola viruses, Influenza A, Human Immunodeficiency Virus, coronaviruses, and some others.
Collapse
Affiliation(s)
- Olga Tarasova
- Department of Bioinformatics, Institute of Biomedical Chemistry, Moscow. Russian Federation
| | - Vladimir Poroikov
- Department of Bioinformatics, Institute of Biomedical Chemistry, Moscow. Russian Federation
| |
Collapse
|
6
|
Yang S, Ye Q, Ding J, Yin, Lu A, Chen X, Hou T, Cao D. Current advances in ligand‐based target prediction. WIRES COMPUTATIONAL MOLECULAR SCIENCE 2020. [DOI: 10.1002/wcms.1504] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/25/2022]
Affiliation(s)
- Su‐Qing Yang
- Xiangya School of Pharmaceutical Sciences Central South University Changsha Hunan China
| | - Qing Ye
- College of Pharmaceutical Sciences Innovation Institute for Artificial Intelligence in Medicine, Zhejiang University Hangzhou, Zhejiang China
| | - Jun‐Jie Ding
- Beijing Institute of Pharmaceutical Chemistry Beijing China
| | - Yin
- Department of Dermatology, Hunan Engineering Research Center of Skin Health and Disease, Hunan Key Laboratory of Skin Cancer and Psoriasis, Xiangya Hospital Central South University Changsha Hunan China
| | - Ai‐Ping Lu
- Institute for Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine Hong Kong Baptist University Hong Kong China
| | - Xiang Chen
- Department of Dermatology, Hunan Engineering Research Center of Skin Health and Disease, Hunan Key Laboratory of Skin Cancer and Psoriasis, Xiangya Hospital Central South University Changsha Hunan China
| | - Ting‐Jun Hou
- College of Pharmaceutical Sciences Innovation Institute for Artificial Intelligence in Medicine, Zhejiang University Hangzhou, Zhejiang China
| | - Dong‐Sheng Cao
- Xiangya School of Pharmaceutical Sciences Central South University Changsha Hunan China
- Institute for Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine Hong Kong Baptist University Hong Kong China
| |
Collapse
|
7
|
D’Souza S, Prema K, Balaji S. Machine learning models for drug–target interactions: current knowledge and future directions. Drug Discov Today 2020; 25:748-756. [DOI: 10.1016/j.drudis.2020.03.003] [Citation(s) in RCA: 58] [Impact Index Per Article: 11.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/30/2019] [Revised: 02/28/2020] [Accepted: 03/05/2020] [Indexed: 12/22/2022]
|
8
|
Svensson F, Norinder U. Conformal Prediction for Ecotoxicology and Implications for Regulatory Decision-Making. METHODS IN PHARMACOLOGY AND TOXICOLOGY 2020. [DOI: 10.1007/978-1-0716-0150-1_12] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/20/2022]
|
9
|
Cortés-Ciriano I, Bender A. Reliable Prediction Errors for Deep Neural Networks Using Test-Time Dropout. J Chem Inf Model 2019; 59:3330-3339. [PMID: 31241929 DOI: 10.1021/acs.jcim.9b00297] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/14/2022]
Abstract
While the use of deep learning in drug discovery is gaining increasing attention, the lack of methods to compute reliable errors in prediction for Neural Networks prevents their application to guide decision making in domains where identifying unreliable predictions is essential, e.g., precision medicine. Here, we present a framework to compute reliable errors in prediction for Neural Networks using Test-Time Dropout and Conformal Prediction. Specifically, the algorithm consists of training a single Neural Network using dropout, and then applying it N times to both the validation and test sets, also employing dropout in this step. Therefore, for each instance in the validation and test sets an ensemble of predictions are generated. The residuals and absolute errors in prediction for the validation set are then used to compute prediction errors for the test set instances using Conformal Prediction. We show using 24 bioactivity data sets from ChEMBL 23 that Dropout Conformal Predictors are valid (i.e., the fraction of instances whose true value lies within the predicted interval strongly correlates with the confidence level) and efficient, as the predicted confidence intervals span a narrower set of values than those computed with Conformal Predictors generated using Random Forest (RF) models. Lastly, we show in retrospective virtual screening experiments that dropout and RF-based Conformal Predictors lead to comparable retrieval rates of active compounds. Overall, we propose a computationally efficient framework (as only N extra forward passes are required in addition to training a single network) to harness Test-Time Dropout and the Conformal Prediction framework, which is generally applicable to generate reliable prediction errors for Deep Neural Networks in drug discovery and beyond.
Collapse
Affiliation(s)
- Isidro Cortés-Ciriano
- Centre for Molecular Informatics, Department of Chemistry , University of Cambridge , Lensfield Road , Cambridge CB2 1EW , United Kingdom
| | - Andreas Bender
- Centre for Molecular Informatics, Department of Chemistry , University of Cambridge , Lensfield Road , Cambridge CB2 1EW , United Kingdom
| |
Collapse
|
10
|
Feng D, Svetnik V, Liaw A, Pratola M, Sheridan RP. Building Quantitative Structure–Activity Relationship Models Using Bayesian Additive Regression Trees. J Chem Inf Model 2019; 59:2642-2655. [DOI: 10.1021/acs.jcim.9b00094] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
Affiliation(s)
- Dai Feng
- Biometics Research, Merck & Co., Inc., Kenilworth, New Jersey 07033, United States
| | - Vladimir Svetnik
- Biometics Research, Merck & Co., Inc., Kenilworth, New Jersey 07033, United States
| | - Andy Liaw
- Biometics Research, Merck & Co., Inc., Kenilworth, New Jersey 07033, United States
| | - Matthew Pratola
- Department of Statistics, The Ohio State University, Cockins Hall, 1958 Neil Avenue, Columbus, Ohio 43210, United States
| | - Robert P. Sheridan
- Modeling and Informatics, Merck & Co., Inc., Kenilworth, New Jersey 07033, United States
| |
Collapse
|
11
|
Cortés-Ciriano I, Bender A. Deep Confidence: A Computationally Efficient Framework for Calculating Reliable Prediction Errors for Deep Neural Networks. J Chem Inf Model 2018; 59:1269-1281. [DOI: 10.1021/acs.jcim.8b00542] [Citation(s) in RCA: 46] [Impact Index Per Article: 6.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/05/2023]
Affiliation(s)
- Isidro Cortés-Ciriano
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom
| | - Andreas Bender
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom
| |
Collapse
|
12
|
Russo DP, Zorn KM, Clark AM, Zhu H, Ekins S. Comparing Multiple Machine Learning Algorithms and Metrics for Estrogen Receptor Binding Prediction. Mol Pharm 2018; 15:4361-4370. [PMID: 30114914 PMCID: PMC6181119 DOI: 10.1021/acs.molpharmaceut.8b00546] [Citation(s) in RCA: 99] [Impact Index Per Article: 14.1] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
Abstract
Many chemicals that disrupt endocrine function have been linked to a variety of adverse biological outcomes. However, screening for endocrine disruption using in vitro or in vivo approaches is costly and time-consuming. Computational methods, e.g., quantitative structure-activity relationship models, have become more reliable due to bigger training sets, increased computing power, and advanced machine learning algorithms, such as multilayered artificial neural networks. Machine learning models can be used to predict compounds for endocrine disrupting capabilities, such as binding to the estrogen receptor (ER), and allow for prioritization and further testing. In this work, an exhaustive comparison of multiple machine learning algorithms, chemical spaces, and evaluation metrics for ER binding was performed on public data sets curated using in-house cheminformatics software (Assay Central). Chemical features utilized in modeling consisted of binary fingerprints (ECFP6, FCFP6, ToxPrint, or MACCS keys) and continuous molecular descriptors from RDKit. Each feature set was subjected to classic machine learning algorithms (Bernoulli Naive Bayes, AdaBoost Decision Tree, Random Forest, Support Vector Machine) and Deep Neural Networks (DNN). Models were evaluated using a variety of metrics: recall, precision, F1-score, accuracy, area under the receiver operating characteristic curve, Cohen's Kappa, and Matthews correlation coefficient. For predicting compounds within the training set, DNN has an accuracy higher than that of other methods; however, in 5-fold cross validation and external test set predictions, DNN and most classic machine learning models perform similarly regardless of the data set or molecular descriptors used. We have also used the rank normalized scores as a performance-criteria for each machine learning method, and Random Forest performed best on the validation set when ranked by metric or by data sets. These results suggest classic machine learning algorithms may be sufficient to develop high quality predictive models of ER activity.
Collapse
Affiliation(s)
- Daniel P. Russo
- Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, NC 27606, USA
- The Rutgers Center for Computational and Integrative Biology, Camden, NJ, 08102, USA
- first author
| | - Kimberley M. Zorn
- Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, NC 27606, USA
- first author
| | - Alex M. Clark
- Molecular Materials Informatics, Inc., Montreal, Quebec, Canada
| | - Hao Zhu
- The Rutgers Center for Computational and Integrative Biology, Camden, NJ, 08102, USA
| | - Sean Ekins
- Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, NC 27606, USA
| |
Collapse
|
13
|
Svensson F, Aniceto N, Norinder U, Cortes-Ciriano I, Spjuth O, Carlsson L, Bender A. Conformal Regression for Quantitative Structure–Activity Relationship Modeling—Quantifying Prediction Uncertainty. J Chem Inf Model 2018; 58:1132-1140. [DOI: 10.1021/acs.jcim.8b00054] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/21/2022]
Affiliation(s)
- Fredrik Svensson
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, U.K
- IOTA Pharmaceuticals, St Johns Innovation Centre, Cowley Road, Cambridge CB4 0WS, U.K
| | - Natalia Aniceto
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, U.K
| | - Ulf Norinder
- Swetox, Unit of Toxicology Sciences, Karolinska Institutet, Forskargatan 20, SE-151 36 Södertälje, Sweden
- Department of Computer and Systems Sciences, Stockholm University, Box 7003, SE-164 07 Kista, Sweden
| | - Isidro Cortes-Ciriano
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, U.K
| | - Ola Spjuth
- Department of Pharmaceutical Biosciences, Uppsala University, Box 591, SE-75124, Uppsala Sweden
| | - Lars Carlsson
- Quantitative Biology, Discovery Sciences, IMED Biotech Unit, AstraZeneca, SE-43183, Mölndal, Sweden
- Department of Computer Science, Royal Holloway, University of London, Egham Hill, Surrey, U.K
| | - Andreas Bender
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, U.K
| |
Collapse
|
14
|
Qiu T, Wu D, Qiu J, Cao Z. Finding the molecular scaffold of nuclear receptor inhibitors through high-throughput screening based on proteochemometric modelling. J Cheminform 2018; 10:21. [PMID: 29651663 PMCID: PMC5897275 DOI: 10.1186/s13321-018-0275-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/27/2017] [Accepted: 04/02/2018] [Indexed: 02/10/2023] Open
Abstract
Nuclear receptors (NR) are a class of proteins that are responsible for sensing steroid and thyroid hormones and certain other molecules. In that case, NR have the ability to regulate the expression of specific genes and associated with various diseases, which make it essential drug targets. Approaches which can predict the inhibition ability of compounds for different NR target should be particularly helpful for drug development. In this study, proteochemometric modelling was introduced to analysis the bioactivity between chemical compounds and NR targets. Results illustrated the ability of our PCM model for high-throughput NR-inhibitor screening after evaluated on both internal (AUC > 0.870) and external (AUC > 0.746) validation set. Moreover, in-silico predicted bioactive compounds were clustered according to structure similarity and a series of representative molecular scaffolds can be derived for five major NR targets. Through scaffolds analysis, those essential bioactive scaffolds of different NR target can be detected and compared. Generally, the methods and molecular scaffolds proposed in this article can not only help the screening of potential therapeutic NR-inhibitors but also able to guide the future NR-related drug discovery.
Collapse
Affiliation(s)
- Tianyi Qiu
- School of Life Sciences and Technology, Shanghai 10th People's Hospital, Tongji University, No. 1239 SiPing Road, Shanghai, China.,The Institute of Biomedical Sciences, Fudan University, No. 138 Medical College Road, Shanghai, China
| | - Dingfeng Wu
- School of Life Sciences and Technology, Shanghai 10th People's Hospital, Tongji University, No. 1239 SiPing Road, Shanghai, China
| | - Jingxuan Qiu
- School of Life Sciences and Technology, Shanghai 10th People's Hospital, Tongji University, No. 1239 SiPing Road, Shanghai, China.,School of Medical Instrument and Food Engineering, University of Shanghai for Science and Technology, No. 516 JunGong Road, Shanghai, China
| | - Zhiwei Cao
- School of Life Sciences and Technology, Shanghai 10th People's Hospital, Tongji University, No. 1239 SiPing Road, Shanghai, China.
| |
Collapse
|
15
|
Dong J, Yao ZJ, Zhang L, Luo F, Lin Q, Lu AP, Chen AF, Cao DS. PyBioMed: a python library for various molecular representations of chemicals, proteins and DNAs and their interactions. J Cheminform 2018; 10:16. [PMID: 29556758 PMCID: PMC5861255 DOI: 10.1186/s13321-018-0270-2] [Citation(s) in RCA: 82] [Impact Index Per Article: 11.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/06/2017] [Accepted: 03/12/2018] [Indexed: 11/15/2022] Open
Abstract
Background
With the increasing development of biotechnology and informatics technology, publicly available data in chemistry and biology are undergoing explosive growth. Such wealthy information in these data needs to be extracted and transformed to useful knowledge by various data mining methods. Considering the amazing rate at which data are accumulated in chemistry and biology fields, new tools that process and interpret large and complex interaction data are increasingly important. So far, there are no suitable toolkits that can effectively link the chemical and biological space in view of molecular representation. To further explore these complex data, an integrated toolkit for various molecular representation is urgently needed which could be easily integrated with data mining algorithms to start a full data analysis pipeline. Results Herein, the python library PyBioMed is presented, which comprises functionalities for online download for various molecular objects by providing different IDs, the pretreatment of molecular structures, the computation of various molecular descriptors for chemicals, proteins, DNAs and their interactions. PyBioMed is a feature-rich and highly customized python library used for the characterization of various complex chemical and biological molecules and interaction samples. The current version of PyBioMed could calculate 775 chemical descriptors and 19 kinds of chemical fingerprints, 9920 protein descriptors based on protein sequences, more than 6000 DNA descriptors from nucleotide sequences, and interaction descriptors from pairwise samples using three different combining strategies. Several examples and five real-life applications were provided to clearly guide the users how to use PyBioMed as an integral part of data analysis projects. By using PyBioMed, users are able to start a full pipelining from getting molecular data, pretreating molecules, molecular representation to constructing machine learning models conveniently. Conclusion PyBioMed provides various user-friendly and highly customized APIs to calculate various features of biological molecules and complex interaction samples conveniently, which aims at building integrated analysis pipelines from data acquisition, data checking, and descriptor calculation to modeling. PyBioMed is freely available at http://projects.scbdd.com/pybiomed.html.![]()
Collapse
Affiliation(s)
- Jie Dong
- Xiangya School of Pharmaceutical Sciences, Central South University, No. 172, Tongzipo Road, Yuelu District, Changsha, People's Republic of China.,College of Food Science and Engineering, National Engineering Laboratory for Deep Processing of Rice and Byproducts, Central South University of Forestry and Technology, Changsha, China
| | - Zhi-Jiang Yao
- Xiangya School of Pharmaceutical Sciences, Central South University, No. 172, Tongzipo Road, Yuelu District, Changsha, People's Republic of China
| | - Lin Zhang
- College of Food Science and Engineering, National Engineering Laboratory for Deep Processing of Rice and Byproducts, Central South University of Forestry and Technology, Changsha, China
| | - Feijun Luo
- College of Food Science and Engineering, National Engineering Laboratory for Deep Processing of Rice and Byproducts, Central South University of Forestry and Technology, Changsha, China
| | - Qinlu Lin
- College of Food Science and Engineering, National Engineering Laboratory for Deep Processing of Rice and Byproducts, Central South University of Forestry and Technology, Changsha, China
| | - Ai-Ping Lu
- Institute for Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR, China
| | - Alex F Chen
- Center for Vascular Disease and Translational Medicine, Third Xiangya Hospital, Central South University, Changsha, People's Republic of China
| | - Dong-Sheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, No. 172, Tongzipo Road, Yuelu District, Changsha, People's Republic of China. .,Institute for Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR, China. .,Center for Vascular Disease and Translational Medicine, Third Xiangya Hospital, Central South University, Changsha, People's Republic of China.
| |
Collapse
|
16
|
Polishchuk P. Interpretation of Quantitative Structure–Activity Relationship Models: Past, Present, and Future. J Chem Inf Model 2017; 57:2618-2639. [DOI: 10.1021/acs.jcim.7b00274] [Citation(s) in RCA: 120] [Impact Index Per Article: 15.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/01/2023]
Affiliation(s)
- Pavel Polishchuk
- Institute of Molecular and
Translational Medicine, Faculty of Medicine and Dentistry, Palacký University and University Hospital in Olomouc, Hněvotínská
1333/5, 779 00 Olomouc, Czech Republic
| |
Collapse
|
17
|
Sorgenfrei FA, Fulle S, Merget B. Kinome-Wide Profiling Prediction of Small Molecules. ChemMedChem 2017; 13:495-499. [PMID: 28544552 DOI: 10.1002/cmdc.201700180] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/23/2017] [Revised: 05/20/2017] [Indexed: 12/21/2022]
Abstract
Extensive kinase profiling data, covering more than half of the human kinome, are available nowadays and allow the construction of activity prediction models of high practical utility. Proteochemometric (PCM) approaches use compound and protein descriptors, which enables the extrapolation of bioactivity values to thus far unexplored kinases. In this study, the potential of PCM to make large-scale predictions on the entire kinome is explored, considering the applicability on novel compounds and kinases, including clinically relevant mutants. A rigorous validation indicates high predictive power on left-out kinases and superiority over individual kinase QSAR models for new compounds. Furthermore, external validation on clinically relevant mutant kinases reveals an excellent predictive power for mutations spread across the ATP binding site.
Collapse
Affiliation(s)
- Frieda A Sorgenfrei
- BioMed X Innovation Center, Im Neuenheimer Feld 515, 69120, Heidelberg, Germany
| | - Simone Fulle
- BioMed X Innovation Center, Im Neuenheimer Feld 515, 69120, Heidelberg, Germany
| | - Benjamin Merget
- BioMed X Innovation Center, Im Neuenheimer Feld 515, 69120, Heidelberg, Germany
| |
Collapse
|
18
|
Dong J, Yao ZJ, Zhu MF, Wang NN, Lu B, Chen AF, Lu AP, Miao H, Zeng WB, Cao DS. ChemSAR: an online pipelining platform for molecular SAR modeling. J Cheminform 2017; 9:27. [PMID: 29086046 PMCID: PMC5418185 DOI: 10.1186/s13321-017-0215-1] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/03/2016] [Accepted: 04/24/2017] [Indexed: 12/31/2022] Open
Abstract
Background In recent years, predictive models based on machine learning techniques have proven to be feasible and effective in drug discovery. However, to develop such a model, researchers usually have to combine multiple tools and undergo several different steps (e.g., RDKit or ChemoPy package for molecular descriptor calculation, ChemAxon Standardizer for structure preprocessing, scikit-learn package for model building, and ggplot2 package for statistical analysis and visualization, etc.). In addition, it may require strong programming skills to accomplish these jobs, which poses severe challenges for users without advanced training in computer programming. Therefore, an online pipelining platform that integrates a number of selected tools is a valuable and efficient solution that can meet the needs of related researchers. Results This work presents a web-based pipelining platform, called ChemSAR, for generating SAR classification models of small molecules. The capabilities of ChemSAR include the validation and standardization of chemical structure representation, the computation of 783 1D/2D molecular descriptors and ten types of widely-used fingerprints for small molecules, the filtering methods for feature selection, the generation of predictive models via a step-by-step job submission process, model interpretation in terms of feature importance and tree visualization, as well as a helpful report generation system. The results can be visualized as high-quality plots and downloaded as local files. Conclusion ChemSAR provides an integrated web-based platform for generating SAR classification models that will benefit cheminformatics and other biomedical users. It is freely available at: http://chemsar.scbdd.com.. ![]() Electronic supplementary material The online version of this article (doi:10.1186/s13321-017-0215-1) contains supplementary material, which is available to authorized users.
Collapse
Affiliation(s)
- Jie Dong
- Xiangya School of Pharmaceutical Sciences, Central South University, No. 172, Tongzipo Road, Yuelu District, Changsha, People's Republic of China
| | - Zhi-Jiang Yao
- Xiangya School of Pharmaceutical Sciences, Central South University, No. 172, Tongzipo Road, Yuelu District, Changsha, People's Republic of China.,The Third Xiangya Hospital, Central South University, Changsha, People's Republic of China
| | - Min-Feng Zhu
- Xiangya School of Pharmaceutical Sciences, Central South University, No. 172, Tongzipo Road, Yuelu District, Changsha, People's Republic of China.,The Third Xiangya Hospital, Central South University, Changsha, People's Republic of China
| | - Ning-Ning Wang
- Xiangya School of Pharmaceutical Sciences, Central South University, No. 172, Tongzipo Road, Yuelu District, Changsha, People's Republic of China
| | - Ben Lu
- The Third Xiangya Hospital, Central South University, Changsha, People's Republic of China
| | - Alex F Chen
- Xiangya School of Pharmaceutical Sciences, Central South University, No. 172, Tongzipo Road, Yuelu District, Changsha, People's Republic of China.,The Third Xiangya Hospital, Central South University, Changsha, People's Republic of China
| | - Ai-Ping Lu
- Institute for Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Kowloon Tong, Hong Kong SAR, People's Republic of China
| | - Hongyu Miao
- Department of Biostatistics, School of Public Health, University of Texas Health Science Center, Houston, TX, 77030, USA
| | - Wen-Bin Zeng
- Xiangya School of Pharmaceutical Sciences, Central South University, No. 172, Tongzipo Road, Yuelu District, Changsha, People's Republic of China
| | - Dong-Sheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, No. 172, Tongzipo Road, Yuelu District, Changsha, People's Republic of China. .,Institute for Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Kowloon Tong, Hong Kong SAR, People's Republic of China.
| |
Collapse
|
19
|
Abstract
Aim: Computational chemogenomics models the compound–protein interaction space, typically for drug discovery, where existing methods predominantly either incorporate increasing numbers of bioactivity samples or focus on specific subfamilies of proteins and ligands. As an alternative to modeling entire large datasets at once, active learning adaptively incorporates a minimum of informative examples for modeling, yielding compact but high quality models. Results/methodology: We assessed active learning for protein/target family-wide chemogenomic modeling by replicate experiment. Results demonstrate that small yet highly predictive models can be extracted from only 10–25% of large bioactivity datasets, irrespective of molecule descriptors used. Conclusion: Chemogenomic active learning identifies small subsets of ligand–target interactions in a large screening database that lead to knowledge discovery and highly predictive models.
Collapse
|
20
|
Bosc N, Wroblowski B, Meyer C, Bonnet P. Prediction of Protein Kinase-Ligand Interactions through 2.5D Kinochemometrics. J Chem Inf Model 2017; 57:93-101. [PMID: 27983837 DOI: 10.1021/acs.jcim.6b00520] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
Abstract
So far, 518 protein kinases have been identified in the human genome. They share a common mechanism of protein phosphorylation and are involved in many critical biological processes of eukaryotic cells. Deregulation of the kinase phosphorylation function induces severe illnesses such as cancer, diabetes, or inflammatory diseases. Many actors in the pharmaceutical domain have made significant efforts to design potent and selective protein kinase inhibitors as new potential drugs. Because the ATP binding site is highly conserved in the protein kinase family, the design of selective inhibitors remains a challenge and has negatively impacted the progression of drug candidates to late-stage clinical development. The work presented here adopts a 2.5D kinochemometrics (KCM) approach, derived from proteochemometrics (PCM), in which protein kinases are depicted by a novel 3D descriptor and the ligands by 2D fingerprints. We demonstrate in two examples that the protein descriptor successfully classified protein kinases based on their group membership and their Asp-Phe-Gly (DFG) conformation. We also compared the performance of our models with those obtained from a full 2D KCM model and QSAR models. In both cases, the internal validation of the models demonstrated good capabilities to distinguish "active" from "inactive" protein kinase-ligand pairs. However, the external validation performed on two independent data sets showed that the two statistical models tended to overestimate the number of "inactive" pairs.
Collapse
Affiliation(s)
- Nicolas Bosc
- Institut de Chimie Organique et Analytique (ICOA), UMR CNRS-Université d'Orléans 7311 , Université d'Orléans BP 6759, 45067 Orléans Cedex 2, France
| | - Berthold Wroblowski
- Janssen Research & Development, Janssen Pharmaceutica N.V. , Turnhoutseweg 30, 2340 Beerse, Belgium
| | - Christophe Meyer
- Centre de Recherche Janssen-Cilag , Campus de Maigremont - CS 10615, 27106 Val de Reuil CEDEX, France
| | - Pascal Bonnet
- Institut de Chimie Organique et Analytique (ICOA), UMR CNRS-Université d'Orléans 7311 , Université d'Orléans BP 6759, 45067 Orléans Cedex 2, France
| |
Collapse
|
21
|
Small Random Forest Models for Effective Chemogenomic Active Learning. JOURNAL OF COMPUTER AIDED CHEMISTRY 2017. [DOI: 10.2751/jcac.18.124] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
|
22
|
Qiu T, Qiu J, Feng J, Wu D, Yang Y, Tang K, Cao Z, Zhu R. The recent progress in proteochemometric modelling: focusing on target descriptors, cross-term descriptors and application scope. Brief Bioinform 2016; 18:125-136. [PMID: 26873661 DOI: 10.1093/bib/bbw004] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/26/2015] [Revised: 12/09/2015] [Indexed: 12/17/2022] Open
Abstract
As an extension of the conventional quantitative structure activity relationship models, proteochemometric (PCM) modelling is a computational method that can predict the bioactivity relations between multiple ligands and multiple targets. Traditional PCM modelling includes three essential elements: descriptors (including target descriptors, ligand descriptors and cross-term descriptors), bioactivity data and appropriate learning functions that link the descriptors to the bioactivity data. Since its appearance, PCM modelling has developed rapidly over the past decade by taking advantage of the progress of different descriptors and machine learning techniques, along with the increasing amounts of available bioactivity data. Specifically, the new emerging target descriptors and cross-term descriptors not only significantly increased the performance of PCM modelling but also expanded its application scope from traditional protein-ligand interaction to more abundant interactions, including protein-peptide, protein-DNA and even protein-protein interactions. In this review, target descriptors and cross-term descriptors, as well as the corresponding application scope, are intensively summarized. Additionally, we look forward to seeing PCM modelling extend into new application scopes, such as Target-Catalyst-Ligand systems, with the further development of descriptors, machine learning techniques and increasing amounts of available bioactivity data.
Collapse
|
23
|
Clark AM, Dole K, Ekins S. Open Source Bayesian Models. 3. Composite Models for Prediction of Binned Responses. J Chem Inf Model 2016; 56:275-85. [PMID: 26750305 PMCID: PMC4764945 DOI: 10.1021/acs.jcim.5b00555] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/07/2023]
Abstract
![]()
Bayesian models constructed from
structure-derived fingerprints
have been a popular and useful method for drug discovery research
when applied to bioactivity measurements that can be effectively classified
as active or inactive. The results can be used to rank candidate structures
according to their probability of activity, and this ranking benefits
from the high degree of interpretability when structure-based fingerprints
are used, making the results chemically intuitive. Besides selecting
an activity threshold, building a Bayesian model is fast and requires
few or no parameters or user intervention. The method also does not
suffer from such acute overtraining problems as quantitative structure–activity
relationships or quantitative structure–property relationships
(QSAR/QSPR). This makes it an approach highly suitable for automated
workflows that are independent of user expertise or prior knowledge
of the training data. We now describe a new method for creating a
composite group of Bayesian models to extend the method to work with
multiple states, rather than just binary. Incoming activities are
divided into bins, each covering a mutually exclusive range of activities.
For each of these bins, a Bayesian model is created to model whether
or not the compound belongs in the bin. Analyzing putative molecules
using the composite model involves making a prediction for each bin
and examining the relative likelihood for each assignment, for example,
highest value wins. The method has been evaluated on a collection
of hundreds of data sets extracted from ChEMBL v20 and validated data
sets for ADME/Tox and bioactivity.
Collapse
Affiliation(s)
- Alex M Clark
- Molecular Materials Informatics, Inc. , 1900 St. Jacques #302, Montreal H3J 2S1, Quebec, Canada
| | - Krishna Dole
- Collaborative Drug Discovery, Inc. , 1633 Bayshore Highway, Suite 342, Burlingame, California 94010, United States
| | - Sean Ekins
- Collaborative Drug Discovery, Inc. , 1633 Bayshore Highway, Suite 342, Burlingame, California 94010, United States.,Collaborations in Chemistry , 5616 Hilltop Needmore Road, Fuquay-Varina, North Carolina 27526, United States
| |
Collapse
|
24
|
Dong J, Cao DS, Miao HY, Liu S, Deng BC, Yun YH, Wang NN, Lu AP, Zeng WB, Chen AF. ChemDes: an integrated web-based platform for molecular descriptor and fingerprint computation. J Cheminform 2015; 7:60. [PMID: 26664458 PMCID: PMC4674923 DOI: 10.1186/s13321-015-0109-z] [Citation(s) in RCA: 194] [Impact Index Per Article: 19.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/05/2015] [Accepted: 11/26/2015] [Indexed: 12/13/2022] Open
Abstract
BACKGROUND Molecular descriptors and fingerprints have been routinely used in QSAR/SAR analysis, virtual drug screening, compound search/ranking, drug ADME/T prediction and other drug discovery processes. Since the calculation of such quantitative representations of molecules may require substantial computational skills and efforts, several tools have been previously developed to make an attempt to ease the process. However, there are still several hurdles for users to overcome to fully harness the power of these tools. First, most of the tools are distributed as standalone software or packages that require necessary configuration or programming efforts of users. Second, many of the tools can only calculate a subset of molecular descriptors, and the results from multiple tools need to be manually merged to generate a comprehensive set of descriptors. Third, some packages only provide application programming interfaces and are implemented in different computer languages, which pose additional challenges to the integration of these tools. RESULTS A freely available web-based platform, named ChemDes, is developed in this study. It integrates multiple state-of-the-art packages (i.e., Pybel, CDK, RDKit, BlueDesc, Chemopy, PaDEL and jCompoundMapper) for computing molecular descriptors and fingerprints. ChemDes not only provides friendly web interfaces to relieve users from burdensome programming work, but also offers three useful and convenient auxiliary tools for format converting, MOPAC optimization and fingerprint similarity calculation. Currently, ChemDes has the capability of computing 3679 molecular descriptors and 59 types of molecular fingerprints. CONCLUSION ChemDes provides users an integrated and friendly tool to calculate various molecular descriptors and fingerprints. It is freely available at http://www.scbdd.com/chemdes. The source code of the project is also available as a supplementary file. Graphical abstract:An overview of ChemDes. A platform for computing various molecular descriptors and fingerprints.
Collapse
Affiliation(s)
- Jie Dong
- School of Pharmaceutical Sciences, Central South University, Changsha, 410013 Hunan People's Republic of China
| | - Dong-Sheng Cao
- School of Pharmaceutical Sciences, Central South University, Changsha, 410013 Hunan People's Republic of China
| | - Hong-Yu Miao
- Department of Biostatistics, School of Public Health, University of Texas Health Science Center at Houston, Houston, USA
| | - Shao Liu
- Xiangya Hospital, Central South University, Changsha, 410008 Hunan People's Republic of China
| | - Bai-Chuan Deng
- College of Chemistry and Chemical Engineering, Central South University, Changsha, 410083 People's Republic of China
| | - Yong-Huan Yun
- College of Chemistry and Chemical Engineering, Central South University, Changsha, 410083 People's Republic of China
| | - Ning-Ning Wang
- School of Pharmaceutical Sciences, Central South University, Changsha, 410013 Hunan People's Republic of China
| | - Ai-Ping Lu
- Institute of Advancing Translational Medicine in Bone & Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Kowloon Tong, Hong Kong SAR People's Republic of China
| | - Wen-Bin Zeng
- School of Pharmaceutical Sciences, Central South University, Changsha, 410013 Hunan People's Republic of China
| | - Alex F Chen
- School of Pharmaceutical Sciences, Central South University, Changsha, 410013 Hunan People's Republic of China
| |
Collapse
|
25
|
Murrell DS, Cortes-Ciriano I, van Westen GJP, Stott IP, Bender A, Malliavin TE, Glen RC. Chemically Aware Model Builder (camb): an R package for property and bioactivity modelling of small molecules. J Cheminform 2015; 7:45. [PMID: 26322135 PMCID: PMC4551546 DOI: 10.1186/s13321-015-0086-2] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/01/2015] [Accepted: 07/03/2015] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND In silico predictive models have proved to be valuable for the optimisation of compound potency, selectivity and safety profiles in the drug discovery process. RESULTS camb is an R package that provides an environment for the rapid generation of quantitative Structure-Property and Structure-Activity models for small molecules (including QSAR, QSPR, QSAM, PCM) and is aimed at both advanced and beginner R users. camb's capabilities include the standardisation of chemical structure representation, computation of 905 one-dimensional and 14 fingerprint type descriptors for small molecules, 8 types of amino acid descriptors, 13 whole protein sequence descriptors, filtering methods for feature selection, generation of predictive models (using an interface to the R package caret), as well as techniques to create model ensembles using techniques from the R package caretEnsemble). Results can be visualised through high-quality, customisable plots (R package ggplot2). CONCLUSIONS Overall, camb constitutes an open-source framework to perform the following steps: (1) compound standardisation, (2) molecular and protein descriptor calculation, (3) descriptor pre-processing and model training, visualisation and validation, and (4) bioactivity/property prediction for new molecules. camb aims to speed model generation, in order to provide reproducibility and tests of robustness. QSPR and proteochemometric case studies are included which demonstrate camb's application.Graphical abstractFrom compounds and data to models: a complete model building workflow in one package.
Collapse
Affiliation(s)
- Daniel S Murrell
- Department of Chemistry, Centre for Molecular Informatics, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW UK
| | - Isidro Cortes-Ciriano
- Unite de Bioinformatique Structurale, Structural Biology and Chemistry Department, Institut Pasteur and CNRS UMR 3825, 25-28, rue Dr. Roux, 75 724 Paris, France
| | - Gerard J P van Westen
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, CB101SD UK
| | - Ian P Stott
- Unilever Research, Port Sunlight Laboratory, Bebington, L63 3JW Wirral UK
| | - Andreas Bender
- Department of Chemistry, Centre for Molecular Informatics, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW UK
| | - Thérèse E Malliavin
- Unite de Bioinformatique Structurale, Structural Biology and Chemistry Department, Institut Pasteur and CNRS UMR 3825, 25-28, rue Dr. Roux, 75 724 Paris, France
| | - Robert C Glen
- Department of Chemistry, Centre for Molecular Informatics, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW UK
| |
Collapse
|
26
|
Ain QU, Méndez-Lucio O, Ciriano IC, Malliavin T, van Westen GJP, Bender A. Modelling ligand selectivity of serine proteases using integrative proteochemometric approaches improves model performance and allows the multi-target dependent interpretation of features. Integr Biol (Camb) 2015; 6:1023-33. [PMID: 25255469 DOI: 10.1039/c4ib00175c] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/12/2022]
Abstract
Serine proteases, implicated in important physiological functions, have a high intra-family similarity, which leads to unwanted off-target effects of inhibitors with insufficient selectivity. However, the availability of sequence and structure data has now made it possible to develop approaches to design pharmacological agents that can discriminate successfully between their related binding sites. In this study, we have quantified the relationship between 12,625 distinct protease inhibitors and their bioactivity against 67 targets of the serine protease family (20,213 data points) in an integrative manner, using proteochemometric modelling (PCM). The benchmarking of 21 different target descriptors motivated the usage of specific binding pocket amino acid descriptors, which helped in the identification of active site residues and selective compound chemotypes affecting compound affinity and selectivity. PCM models performed better than alternative approaches (models trained using exclusively compound descriptors on all available data, QSAR) employed for comparison with R(2)/RMSE values of 0.64 ± 0.23/0.66 ± 0.20 vs. 0.35 ± 0.27/1.05 ± 0.27 log units, respectively. Moreover, the interpretation of the PCM model singled out various chemical substructures responsible for bioactivity and selectivity towards particular proteases (thrombin, trypsin and coagulation factor 10) in agreement with the literature. For instance, absence of a tertiary sulphonamide was identified to be responsible for decreased selective activity (by on average 0.27 ± 0.65 pChEMBL units) on FA10. Among the binding pocket residues, the amino acids (arginine, leucine and tyrosine) at positions 35, 39, 60, 93, 140 and 207 were observed as key contributing residues for selective affinity on these three targets.
Collapse
Affiliation(s)
- Qurrat U Ain
- Centre for Molecular Informatics, Department of Chemistry, Lensfield Road, CB2 1EW, University of Cambridge, UK.
| | | | | | | | | | | |
Collapse
|
27
|
Cortes-Ciriano I, Bender A, Malliavin TE. Comparing the Influence of Simulated Experimental Errors on 12 Machine Learning Algorithms in Bioactivity Modeling Using 12 Diverse Data Sets. J Chem Inf Model 2015; 55:1413-25. [DOI: 10.1021/acs.jcim.5b00101] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/10/2023]
Affiliation(s)
- Isidro Cortes-Ciriano
- Département
de Biologie Structurale et Chimie, Institut Pasteur, Unité de Bioinformatique Structurale, CNRS UMR 3825, 25, rue du Dr Roux, 75015 Paris, Ile de France, France
| | - Andreas Bender
- Centre
for Molecular Science Informatics, Department of Chemistry, University of Cambridge, Cambridge CB2 1EW, United Kingdom
| | - Thérèse E. Malliavin
- Département
de Biologie Structurale et Chimie, Institut Pasteur, Unité de Bioinformatique Structurale, CNRS UMR 3825, 25, rue du Dr Roux, 75015 Paris, Ile de France, France
| |
Collapse
|
28
|
Harigua-Souiai E, Cortes-Ciriano I, Desdouits N, Malliavin TE, Guizani I, Nilges M, Blondel A, Bouvier G. Identification of binding sites and favorable ligand binding moieties by virtual screening and self-organizing map analysis. BMC Bioinformatics 2015; 16:93. [PMID: 25888251 PMCID: PMC4381396 DOI: 10.1186/s12859-015-0518-z] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/01/2014] [Accepted: 02/24/2015] [Indexed: 11/24/2022] Open
Abstract
Background Identifying druggable cavities on a protein surface is a crucial step in structure based drug design. The cavities have to present suitable size and shape, as well as appropriate chemical complementarity with ligands. Results We present a novel cavity prediction method that analyzes results of virtual screening of specific ligands or fragment libraries by means of Self-Organizing Maps. We demonstrate the method with two thoroughly studied proteins where it successfully identified their active sites (AS) and relevant secondary binding sites (BS). Moreover, known active ligands mapped the AS better than inactive ones. Interestingly, docking a naive fragment library brought even more insight. We then systematically applied the method to the 102 targets from the DUD-E database, where it showed a 90% identification rate of the AS among the first three consensual clusters of the SOM, and in 82% of the cases as the first one. Further analysis by chemical decomposition of the fragments improved BS prediction. Chemical substructures that are representative of the active ligands preferentially mapped in the AS. Conclusion The new approach provides valuable information both on relevant BSs and on chemical features promoting bioactivity. Electronic supplementary material The online version of this article (doi:10.1186/s12859-015-0518-z) contains supplementary material, which is available to authorized users.
Collapse
Affiliation(s)
- Emna Harigua-Souiai
- Institut Pasteur, Unité de Bioinformatique Structurale, CNRS UMR 3528, Département de Biologie Structurale et Chimie, 25, rue du Dr Roux, Paris, 75015, France. .,Laboratory of Molecular Epidemiology and Experimental Pathology - LR11IPT04, Institut Pasteur de Tunis, Université Tunis el Manar - Tunisia, 13, Place Pasteur, Tunis, 1002, Tunisia. .,University of Carthage, Faculty of sciences of Bizerte - Tunisia, Jarzouna, 7021, Tunisia.
| | - Isidro Cortes-Ciriano
- Institut Pasteur, Unité de Bioinformatique Structurale, CNRS UMR 3528, Département de Biologie Structurale et Chimie, 25, rue du Dr Roux, Paris, 75015, France.
| | - Nathan Desdouits
- Institut Pasteur, Unité de Bioinformatique Structurale, CNRS UMR 3528, Département de Biologie Structurale et Chimie, 25, rue du Dr Roux, Paris, 75015, France.
| | - Thérèse E Malliavin
- Institut Pasteur, Unité de Bioinformatique Structurale, CNRS UMR 3528, Département de Biologie Structurale et Chimie, 25, rue du Dr Roux, Paris, 75015, France.
| | - Ikram Guizani
- Laboratory of Molecular Epidemiology and Experimental Pathology - LR11IPT04, Institut Pasteur de Tunis, Université Tunis el Manar - Tunisia, 13, Place Pasteur, Tunis, 1002, Tunisia.
| | - Michael Nilges
- Institut Pasteur, Unité de Bioinformatique Structurale, CNRS UMR 3528, Département de Biologie Structurale et Chimie, 25, rue du Dr Roux, Paris, 75015, France.
| | - Arnaud Blondel
- Institut Pasteur, Unité de Bioinformatique Structurale, CNRS UMR 3528, Département de Biologie Structurale et Chimie, 25, rue du Dr Roux, Paris, 75015, France.
| | - Guillaume Bouvier
- Institut Pasteur, Unité de Bioinformatique Structurale, CNRS UMR 3528, Département de Biologie Structurale et Chimie, 25, rue du Dr Roux, Paris, 75015, France.
| |
Collapse
|
29
|
Cortés-Ciriano I, Bender A, Malliavin T. Prediction of PARP Inhibition with Proteochemometric Modelling and Conformal Prediction. Mol Inform 2015; 34:357-66. [DOI: 10.1002/minf.201400165] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/20/2014] [Accepted: 01/20/2015] [Indexed: 11/07/2022]
|
30
|
Cortes-Ciriano I, van Westen GJP, Murrell DS, Lenselink EB, Bender A, Malliavin TE. Applications of proteochemometrics - from species extrapolation to cell line sensitivity modelling. BMC Bioinformatics 2015. [PMCID: PMC4340128 DOI: 10.1186/1471-2105-16-s3-a4] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open
|
31
|
Cortes-Ciriano I, Murrell DS, van Westen GJ, Bender A, Malliavin TE. Prediction of the potency of mammalian cyclooxygenase inhibitors with ensemble proteochemometric modeling. J Cheminform 2015; 7:1. [PMID: 25705261 PMCID: PMC4335128 DOI: 10.1186/s13321-014-0049-z] [Citation(s) in RCA: 50] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/01/2014] [Accepted: 11/21/2014] [Indexed: 12/16/2022] Open
Abstract
Cyclooxygenases (COX) are present in the body in two isoforms, namely: COX-1, constitutively expressed, and COX-2, induced in physiopathological conditions such as cancer or chronic inflammation. The inhibition of COX with non-steroideal anti-inflammatory drugs (NSAIDs) is the most widely used treatment for chronic inflammation despite the adverse effects associated to prolonged NSAIDs intake. Although selective COX-2 inhibition has been shown not to palliate all adverse effects (e.g. cardiotoxicity), there are still niche populations which can benefit from selective COX-2 inhibition. Thus, capitalizing on bioactivity data from both isoforms simultaneously would contribute to develop COX inhibitors with better safety profiles. We applied ensemble proteochemometric modeling (PCM) for the prediction of the potency of 3,228 distinct COX inhibitors on 11 mammalian cyclooxygenases. Ensemble PCM models ([Formula: see text], and RMSEtest = 0.71) outperformed models exclusively trained on compound ([Formula: see text], and RMSEtest = 1.09) or protein descriptors ([Formula: see text] and RMSEtest = 1.10) on the test set. Moreover, PCM predicted COX potency for 1,086 selective and non-selective COX inhibitors with [Formula: see text] and RMSEtest = 0.76. These values are in agreement with the maximum and minimum achievable [Formula: see text] and RMSEtest values of approximately 0.68 for both metrics. Confidence intervals for individual predictions were calculated from the standard deviation of the predictions from the individual models composing the ensembles. Finally, two substructure analysis pipelines singled out chemical substructures implicated in both potency and selectivity in agreement with the literature. Graphical AbstractPrediction of uncorrelated bioactivity profiles for mammalian COX inhibitors with Ensemble Proteochemometric Modeling.
Collapse
Affiliation(s)
- Isidro Cortes-Ciriano
- Département de Biologie Structurale et Chimie, Institut Pasteur, Unité de Bioinformatique Structurale; CNRS UMR 3825, 25, rue du Dr Roux, Paris, 75015 France
| | - Daniel S Murrell
- Centre for Molecular Science Informatics, Department of Chemistry, University of Cambridge, Cambridge, UK
| | - Gerard Jp van Westen
- European Molecular Biology Laboratory European Bioinformatics Institute Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD UK
| | - Andreas Bender
- Centre for Molecular Science Informatics, Department of Chemistry, University of Cambridge, Cambridge, UK
| | - Thérèse E Malliavin
- Département de Biologie Structurale et Chimie, Institut Pasteur, Unité de Bioinformatique Structurale; CNRS UMR 3825, 25, rue du Dr Roux, Paris, 75015 France
| |
Collapse
|
32
|
Cortés-Ciriano I, Ain QU, Subramanian V, Lenselink EB, Méndez-Lucio O, IJzerman AP, Wohlfahrt G, Prusis P, Malliavin TE, van Westen GJP, Bender A. Polypharmacology modelling using proteochemometrics (PCM): recent methodological developments, applications to target families, and future prospects. MEDCHEMCOMM 2015. [DOI: 10.1039/c4md00216d] [Citation(s) in RCA: 80] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/21/2022]
Abstract
Proteochemometric (PCM) modelling is a computational method to model the bioactivity of multiple ligands against multiple related protein targets simultaneously.
Collapse
Affiliation(s)
- Isidro Cortés-Ciriano
- Unité de Bioinformatique Structurale
- Institut Pasteur and CNRS UMR 3825
- Structural Biology and Chemistry Department
- 75 724 Paris
- France
| | - Qurrat Ul Ain
- Unilever Centre for Molecular Informatics
- Department of Chemistry
- CB2 1EW Cambridge
- UK
| | | | - Eelke B. Lenselink
- Division of Medicinal Chemistry
- Leiden Academic Centre for Drug Research
- Leiden
- The Netherlands
| | - Oscar Méndez-Lucio
- Unilever Centre for Molecular Informatics
- Department of Chemistry
- CB2 1EW Cambridge
- UK
| | - Adriaan P. IJzerman
- Division of Medicinal Chemistry
- Leiden Academic Centre for Drug Research
- Leiden
- The Netherlands
| | - Gerd Wohlfahrt
- Computer-Aided Drug Design
- Orion Pharma
- FIN-02101 Espoo
- Finland
| | - Peteris Prusis
- Computer-Aided Drug Design
- Orion Pharma
- FIN-02101 Espoo
- Finland
| | - Thérèse E. Malliavin
- Unité de Bioinformatique Structurale
- Institut Pasteur and CNRS UMR 3825
- Structural Biology and Chemistry Department
- 75 724 Paris
- France
| | - Gerard J. P. van Westen
- European Molecular Biology Laboratory
- European Bioinformatics Institute
- Wellcome Trust Genome Campus
- Hinxton
- UK
| | - Andreas Bender
- Unilever Centre for Molecular Informatics
- Department of Chemistry
- CB2 1EW Cambridge
- UK
| |
Collapse
|