1. Zhong Y, Seoighe C, Yang H. Non-Negative matrix factorization combined with kernel regression for the prediction of adverse drug reaction profiles. Bioinformatics Advances 2024; 4:vbae009. [PMID: 38736682 PMCID: PMC11087822 DOI: 10.1093/bioadv/vbae009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2023] [Revised: 01/11/2024] [Accepted: 01/18/2024] [Indexed: 05/14/2024]
Abstract
Motivation: Post-market unexpected Adverse Drug Reactions (ADRs) are associated with significant costs, in both financial burden and human health. Due to the high cost and time required to run clinical trials, there is significant interest in accurate computational methods that can aid in the prediction of ADRs for new drugs. As a machine learning task, ADR prediction is made more challenging by a high degree of class imbalance, and existing methods do not successfully balance the requirement to detect the minority cases (true positives for ADR), as measured by the Area Under the Precision-Recall (AUPR) curve, with the ability to separate true positives from true negatives, as measured by the Area Under the Receiver Operating Characteristic (AUROC) curve. Surprisingly, the performance of most existing methods is worse than a naïve method that attributes ADRs to drugs according to the frequency with which the ADR has been observed over all other drugs. The existing advanced methods applied do not lead to substantial gains in predictive performance.
Results: We designed a rigorous evaluation to provide an unbiased estimate of the performance of ADR prediction methods: Nested Cross-Validation and a hold-out set were adopted. Among the existing methods, Kernel Regression (KR) performed best in AUPR but had a disadvantage in AUROC, relative to other methods, including the naïve method. We proposed a novel method that combines non-negative matrix factorization with kernel regression, called VKR. This novel approach matched or exceeded the performance of existing methods, overcoming the weakness of the existing methods.
Availability: Code and data are available on https://github.com/YezhaoZhong/VKR.
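The naïve baseline and the two metrics named in this abstract can be illustrated with a minimal sketch. The function names, the leave-one-drug-out handling and the toy drug-by-ADR matrix are assumptions made for illustration; this is not the authors' code, which is in the VKR repository. Average precision is used here as a common step-wise estimate of AUPR.

```python
def naive_frequency_scores(Y):
    """Score each (drug, ADR) pair by the ADR's frequency over all OTHER drugs."""
    n_drugs, n_adrs = len(Y), len(Y[0])
    col_sums = [sum(row[j] for row in Y) for j in range(n_adrs)]
    # Subtract the drug's own label so a drug never "sees" its own outcome.
    return [[(col_sums[j] - row[j]) / (n_drugs - 1) for j in range(n_adrs)]
            for row in Y]


def auroc(labels, scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of precision@k taken at the rank of every positive."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y:
            hits += 1
            total += hits / k
    return total / hits


# Hypothetical toy data: 4 drugs (rows) x 3 ADRs (columns), 1 = ADR reported.
Y = [
    [1, 0, 0],
    [1, 1, 0],
    [1, 0, 0],
    [1, 0, 1],
]
scores = naive_frequency_scores(Y)
labels = [y for row in Y for y in row]
flat_scores = [s for row in scores for s in row]
print("AUROC:", auroc(labels, flat_scores))
print("Average precision:", average_precision(labels, flat_scores))
```

In this toy matrix the common ADR is ranked at the top while the two rare positives receive a score of zero, which shows how a single frequency-based ranking can behave quite differently under the two metrics.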
Affiliation(s)
- Yezhao Zhong
  - School of Mathematical & Statistical Sciences, University of Galway, Galway H91 TK33, Ireland
- Cathal Seoighe
  - School of Mathematical & Statistical Sciences, University of Galway, Galway H91 TK33, Ireland
- Haixuan Yang
  - School of Mathematical & Statistical Sciences, University of Galway, Galway H91 TK33, Ireland
2. Zhang F, Sun B, Diao X, Zhao W, Shu T. Prediction of adverse drug reactions based on knowledge graph embedding. BMC Med Inform Decis Mak 2021; 21:38. [PMID: 33541342 PMCID: PMC7863488 DOI: 10.1186/s12911-021-01402-3] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 09/16/2020] [Accepted: 01/19/2021] [Indexed: 11/12/2022] Open
Abstract
BACKGROUND: Adverse drug reactions (ADRs) are an important concern in the medication process and can pose a substantial economic burden for patients and hospitals. Because of the limitations of clinical trials, it is difficult to identify all possible ADRs of a drug before it is marketed. We developed a new model based on data mining technology to predict potential ADRs based on available drug data.
METHOD: Based on the Word2Vec model in Natural Language Processing, we propose a new knowledge graph embedding method that embeds drugs and ADRs into their respective vectors and builds a logistic regression classification model to predict whether a given drug will have ADRs.
RESULT: First, a new knowledge graph embedding method was proposed, and comparison with similar studies showed that our model not only had high prediction accuracy but also was simpler in model structure. In our experiments, the AUC of the classification model reached a maximum of 0.87, and the mean AUC was 0.863.
CONCLUSION: In this paper, we introduce a new method to embed a knowledge graph to vectorize drugs and ADRs, then use a logistic regression classification model to predict whether there is a causal relationship between them. The experiment showed that knowledge graph embedding can effectively encode drugs and ADRs, and that the proposed ADR prediction system is also very effective.
Affiliation(s)
- Fei Zhang
  - Department of Information Center, Fuwai Hospital, National Center for Cardiovascular Diseases, Chinese Academy of Medical Sciences and Peking Union Medical College, No. 167 North Lishi Road, Xicheng District, Beijing, 100037 China
- Bo Sun
  - Department of Information Center, Fuwai Hospital, National Center for Cardiovascular Diseases, Chinese Academy of Medical Sciences and Peking Union Medical College, No. 167 North Lishi Road, Xicheng District, Beijing, 100037 China
- Xiaolin Diao
  - Department of Information Center, Fuwai Hospital, National Center for Cardiovascular Diseases, Chinese Academy of Medical Sciences and Peking Union Medical College, No. 167 North Lishi Road, Xicheng District, Beijing, 100037 China
- Wei Zhao
  - Department of Information Center, Fuwai Hospital, National Center for Cardiovascular Diseases, Chinese Academy of Medical Sciences and Peking Union Medical College, No. 167 North Lishi Road, Xicheng District, Beijing, 100037 China
- Ting Shu
  - National Institute of Hospital Administration, National Health Commission, Building 3, Yard 6, Shouti South Road, Haidian, Beijing, 100044 China
3. Wang M, Ma X, Si J, Tang H, Wang H, Li T, Ouyang W, Gong L, Tang Y, He X, Huang W, Liu X. Adverse Drug Reaction Discovery Using a Tumor-Biomarker Knowledge Graph. Front Genet 2021; 11:625659. [PMID: 33584816 PMCID: PMC7873847 DOI: 10.3389/fgene.2020.625659] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/03/2020] [Accepted: 12/09/2020] [Indexed: 12/14/2022] Open
Abstract
Adverse drug reactions (ADRs) are a major public health concern, and early detection is crucial for drug development and patient safety. Together with the increasing availability of large-scale literature data, machine learning has the potential to predict unknown ADRs from current knowledge. Using machine learning methods, we constructed from the biomedical literature a Tumor-Biomarker Knowledge Graph (TBKG) that contains four types of node: Tumor, Biomarker, Drug, and ADR. Based on this knowledge graph, we not only discovered potential ADRs of antitumor drugs but also provided explanations. Experiments on real-world data show that our model can achieve 0.81 accuracy of three cross-validation, and the ADR discovery for Osimertinib was chosen for clinical validation. The ADRs of Osimertinib calculated by our model consisted of known ADRs, which were in line with the official manual, and some unreported rare ADRs found in clinical cases. Results also showed that our model outperformed traditional co-occurrence methods. Moreover, each calculated ADR was attached to the corresponding “tumor-biomarker-drug” paths in the knowledge graph, which could help to obtain in-depth insights into the underlying mechanisms. In conclusion, the tumor-biomarker knowledge graph-based approach is an explainable method for potential ADR discovery based on biomarkers; it might be valuable to the community working in the emerging field of biomedical literature mining and provide impetus for research into the mechanisms of ADRs.
Affiliation(s)
- Meng Wang
  - School of Computer Science and Engineering, Southeast University, Nanjing, China
- Xinyu Ma
  - School of Computer Science and Engineering, Southeast University, Nanjing, China
- Jingwen Si
  - Department of Pharmaceutical Sciences, Tsinghua University, Beijing, China
- Hongjia Tang
  - Department of Anesthesiology, Third Xiangya Hospital, Central South University, Changsha, China
- Haofen Wang
  - College of Design and Innovation, Tongji University, Shanghai, China
- Tunliang Li
  - Department of Anesthesiology, Third Xiangya Hospital, Central South University, Changsha, China
- Wen Ouyang
  - Department of Anesthesiology, Third Xiangya Hospital, Central South University, Changsha, China
- Liying Gong
  - Department of Intensive Care Unit, Third Xiangya Hospital, Central South University, Changsha, China
- Yongzhong Tang
  - Department of Anesthesiology, Third Xiangya Hospital, Central South University, Changsha, China
- Xi He
  - Department of Anesthesiology, Third Xiangya Hospital, Central South University, Changsha, China
- Wei Huang
  - Department of Cardiology, Third Xiangya Hospital, Central South University, Changsha, China
- Xing Liu
  - Department of Anesthesiology, Third Xiangya Hospital, Central South University, Changsha, China
4. Xue R, Liao J, Shao X, Han K, Long J, Shao L, Ai N, Fan X. Prediction of Adverse Drug Reactions by Combining Biomedical Tripartite Network and Graph Representation Model. Chem Res Toxicol 2019; 33:202-210. [PMID: 31777246 DOI: 10.1021/acs.chemrestox.9b00238] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 12/18/2022]
Abstract
As one of the primary contributors to high clinical attrition rates of drugs, toxicity evaluation is of critical significance to new drug discovery. Unsurprisingly, a vast number of computational methods have been developed at various stages of the development pipeline to evaluate potential adverse drug reactions (ADRs). Despite the previous success of these methods on individual ADRs or certain drug families, there are great challenges to toxicity evaluation. In this study, a novel strategy was developed to predict drug-ADR associations by combining deep learning and the biomedical tripartite network. This heterogeneous network contains biomedical linked data of three entities, namely drugs, targets, and ADRs. For the first time, GraRep, a deep learning method for distributed representations, is introduced to learn graph representations and identify hidden features from the tripartite network, which are further used for ADR prediction. Through this approach, drug-ADR associations could possibly be discovered from a systemic perspective. The accuracy of our method is 0.95 based on internal resource validation and 0.88 based on external resource validation. Moreover, our results show that the prediction accuracy using the tripartite network is better than that using a bipartite network, suggesting that model performance can be improved with further enrichment of information. According to the result of 10-fold cross validation, the deep learning model outperforms two traditional methods (topology-based measures and chemical structure-based measures). Additionally, predictive models are also constructed using other deep learning methods, and comparable results are achieved. In summary, the biomedical tripartite network-based deep learning model proposed here offers a promising solution for the prediction of ADRs.
Affiliation(s)
- Rui Xue
  - Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Jie Liao
  - Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Xin Shao
  - Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Ke Han
  - Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Jingbo Long
  - Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Li Shao
  - State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, The First Affiliated Hospital, School of Medicine, Zhejiang University, 79 Qingchun Road, Hangzhou, 310003, China
- Ni Ai
  - Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Xiaohui Fan
  - Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
5. Identification of drug-side effect association via multiple information integration with centered kernel alignment. Neurocomputing 2019. [DOI: 10.1016/j.neucom.2018.10.028] [Citation(s) in RCA: 148] [Impact Index Per Article: 29.6] [Indexed: 01/01/2023]
6. Dong J, Yao ZJ, Zhang L, Luo F, Lin Q, Lu AP, Chen AF, Cao DS. PyBioMed: a python library for various molecular representations of chemicals, proteins and DNAs and their interactions. J Cheminform 2018; 10:16. [PMID: 29556758 PMCID: PMC5861255 DOI: 10.1186/s13321-018-0270-2] [Citation(s) in RCA: 70] [Impact Index Per Article: 11.7] [Received: 09/06/2017] [Accepted: 03/12/2018] [Indexed: 11/15/2022] Open
Abstract
Background: With the increasing development of biotechnology and informatics technology, publicly available data in chemistry and biology are undergoing explosive growth. The wealth of information in these data needs to be extracted and transformed into useful knowledge by various data mining methods. Considering the amazing rate at which data are accumulated in the chemistry and biology fields, new tools that process and interpret large and complex interaction data are increasingly important. So far, there are no suitable toolkits that can effectively link the chemical and biological space in view of molecular representation. To further explore these complex data, an integrated toolkit for various molecular representations is urgently needed, one that can be easily integrated with data mining algorithms to start a full data analysis pipeline.
Results: Herein, the python library PyBioMed is presented, which comprises functionalities for online download of various molecular objects by providing different IDs, the pretreatment of molecular structures, and the computation of various molecular descriptors for chemicals, proteins, DNAs and their interactions. PyBioMed is a feature-rich and highly customized python library used for the characterization of various complex chemical and biological molecules and interaction samples. The current version of PyBioMed can calculate 775 chemical descriptors and 19 kinds of chemical fingerprints, 9920 protein descriptors based on protein sequences, more than 6000 DNA descriptors from nucleotide sequences, and interaction descriptors from pairwise samples using three different combining strategies. Several examples and five real-life applications are provided to guide users in using PyBioMed as an integral part of data analysis projects. By using PyBioMed, users are able to conveniently start a full pipeline, from getting molecular data, pretreating molecules and molecular representation to constructing machine learning models.
Conclusion: PyBioMed provides various user-friendly and highly customized APIs to conveniently calculate various features of biological molecules and complex interaction samples, and aims at building integrated analysis pipelines from data acquisition, data checking, and descriptor calculation to modeling. PyBioMed is freely available at http://projects.scbdd.com/pybiomed.html.
Affiliation(s)
- Jie Dong
  - Xiangya School of Pharmaceutical Sciences, Central South University, No. 172, Tongzipo Road, Yuelu District, Changsha, People's Republic of China
  - College of Food Science and Engineering, National Engineering Laboratory for Deep Processing of Rice and Byproducts, Central South University of Forestry and Technology, Changsha, China
- Zhi-Jiang Yao
  - Xiangya School of Pharmaceutical Sciences, Central South University, No. 172, Tongzipo Road, Yuelu District, Changsha, People's Republic of China
- Lin Zhang
  - College of Food Science and Engineering, National Engineering Laboratory for Deep Processing of Rice and Byproducts, Central South University of Forestry and Technology, Changsha, China
- Feijun Luo
  - College of Food Science and Engineering, National Engineering Laboratory for Deep Processing of Rice and Byproducts, Central South University of Forestry and Technology, Changsha, China
- Qinlu Lin
  - College of Food Science and Engineering, National Engineering Laboratory for Deep Processing of Rice and Byproducts, Central South University of Forestry and Technology, Changsha, China
- Ai-Ping Lu
  - Institute for Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR, China
- Alex F Chen
  - Center for Vascular Disease and Translational Medicine, Third Xiangya Hospital, Central South University, Changsha, People's Republic of China
- Dong-Sheng Cao
  - Xiangya School of Pharmaceutical Sciences, Central South University, No. 172, Tongzipo Road, Yuelu District, Changsha, People's Republic of China
  - Institute for Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR, China
  - Center for Vascular Disease and Translational Medicine, Third Xiangya Hospital, Central South University, Changsha, People's Republic of China
7. Kanwa N, De SK, Adhikari C, Chakraborty A. Spectroscopic Study of the Interaction of Carboxyl-Modified Gold Nanoparticles with Liposomes of Different Chain Lengths and Controlled Drug Release by Layer-by-Layer Technology. J Phys Chem B 2017; 121:11333-11343. [DOI: 10.1021/acs.jpcb.7b08455] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Indexed: 01/22/2023]
Affiliation(s)
- Nishu Kanwa
  - Discipline of Chemistry, Indian Institute of Technology Indore, Indore, Madhya Pradesh, India 453552
- Soumya Kanti De
  - Discipline of Chemistry, Indian Institute of Technology Indore, Indore, Madhya Pradesh, India 453552
- Chandan Adhikari
  - Discipline of Chemistry, Indian Institute of Technology Indore, Indore, Madhya Pradesh, India 453552
- Anjan Chakraborty
  - Discipline of Chemistry, Indian Institute of Technology Indore, Indore, Madhya Pradesh, India 453552
8. Bean DM, Wu H, Iqbal E, Dzahini O, Ibrahim ZM, Broadbent M, Stewart R, Dobson RJB. Knowledge graph prediction of unknown adverse drug reactions and validation in electronic health records. Sci Rep 2017; 7:16416. [PMID: 29180758 PMCID: PMC5703951 DOI: 10.1038/s41598-017-16674-x] [Citation(s) in RCA: 46] [Impact Index Per Article: 6.6] [Received: 07/25/2017] [Accepted: 11/16/2017] [Indexed: 01/31/2023] Open
Abstract
Unknown adverse reactions to drugs available on the market present a significant health risk and limit accurate judgement of the cost/benefit trade-off for medications. Machine learning has the potential to predict unknown adverse reactions from current knowledge. We constructed a knowledge graph containing four types of node: drugs, protein targets, indications and adverse reactions. Using this graph, we developed a machine learning algorithm based on a simple enrichment test and first demonstrated this method performs extremely well at classifying known causes of adverse reactions (AUC 0.92). A cross validation scheme in which 10% of drug-adverse reaction edges were systematically deleted per fold showed that the method correctly predicts 68% of the deleted edges on average. Next, a subset of adverse reactions that could be reliably detected in anonymised electronic health records from South London and Maudsley NHS Foundation Trust were used to validate predictions from the model that are not currently known in public databases. High-confidence predictions were validated in electronic records significantly more frequently than random models, and outperformed standard methods (logistic regression, decision trees and support vector machines). This approach has the potential to improve patient safety by predicting adverse reactions that were not observed during randomised trials.
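The edge-deletion cross-validation described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the fixed seed and the toy edge list are hypothetical, and the enrichment-test model itself is not reproduced. Only the fold construction and the "fraction of deleted edges recovered" measure are shown.

```python
import random


def edge_folds(edges, n_folds=10, seed=0):
    """Yield (kept_edges, held_out_edges) so each edge is deleted in exactly one fold."""
    edges = list(edges)                      # copy, so the caller's list is untouched
    random.Random(seed).shuffle(edges)
    for i in range(n_folds):
        held_out = edges[i::n_folds]         # roughly 1/n_folds of the edges
        held = set(held_out)
        kept = [e for e in edges if e not in held]
        yield kept, held_out


def fraction_recovered(predicted_edges, held_out_edges):
    """Share of deleted drug-ADR edges that a model predicted back."""
    held = set(held_out_edges)
    return len(held & set(predicted_edges)) / len(held)


# Hypothetical toy graph: 5 drugs x 4 adverse reactions = 20 known edges.
edges = [(f"drug{d}", f"adr{a}") for d in range(5) for a in range(4)]
for kept, held_out in edge_folds(edges):
    # A real model would be trained on `kept` and asked to predict `held_out`.
    assert len(kept) + len(held_out) == len(edges)
```

Averaging `fraction_recovered` over the folds would give a figure comparable in kind to the per-fold recovery rate the abstract reports.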
Affiliation(s)
- Daniel M Bean
  - Department of Biostatistics and Health Informatics, Institute of Psychiatry Psychology and Neuroscience, King's College London, 16 De Crespigny Park, London, SE5 8AF, United Kingdom
- Honghan Wu
  - Department of Biostatistics and Health Informatics, Institute of Psychiatry Psychology and Neuroscience, King's College London, 16 De Crespigny Park, London, SE5 8AF, United Kingdom
- Ehtesham Iqbal
  - Department of Biostatistics and Health Informatics, Institute of Psychiatry Psychology and Neuroscience, King's College London, 16 De Crespigny Park, London, SE5 8AF, United Kingdom
- Olubanke Dzahini
  - South London and Maudsley NHS Foundation Trust, Denmark Hill, London, SE5 8AZ, United Kingdom
  - Institute of Pharmaceutical Science, King's College, London, 5th Floor, Franklin-Wilkins Building, 150 Stamford Street, London, SE1 9NH, United Kingdom
- Zina M Ibrahim
  - Department of Biostatistics and Health Informatics, Institute of Psychiatry Psychology and Neuroscience, King's College London, 16 De Crespigny Park, London, SE5 8AF, United Kingdom
  - Farr Institute of Health Informatics Research, UCL Institute of Health Informatics, University College London, London, WC1E 6BT, United Kingdom
- Matthew Broadbent
  - South London and Maudsley NHS Foundation Trust, Denmark Hill, London, SE5 8AZ, United Kingdom
- Robert Stewart
  - South London and Maudsley NHS Foundation Trust, Denmark Hill, London, SE5 8AZ, United Kingdom
  - Institute of Psychiatry, Psychology and Neuroscience, King's College London, 16 De Crespigny Park, London, SE5 8AF, United Kingdom
- Richard J B Dobson
  - Department of Biostatistics and Health Informatics, Institute of Psychiatry Psychology and Neuroscience, King's College London, 16 De Crespigny Park, London, SE5 8AF, United Kingdom
  - Farr Institute of Health Informatics Research, UCL Institute of Health Informatics, University College London, London, WC1E 6BT, United Kingdom
9.
Affiliation(s)
- Saeed Alqahtani
  - Department of Clinical Pharmacy, College of Pharmacy, King Saud University, Riyadh, Saudi Arabia
10. Maggiora G, Gokhale V. A simple mathematical approach to the analysis of polypharmacology and polyspecificity data. F1000Res 2017; 6:Chem Inf Sci-788. [PMID: 28690829 PMCID: PMC5482344 DOI: 10.12688/f1000research.11517.1] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Accepted: 05/30/2017] [Indexed: 12/23/2022] Open
Abstract
There are many possible types of drug-target interactions, because there are a surprising number of ways in which drugs and their targets can associate with one another. These relationships are expressed as polypharmacology and polyspecificity. Polypharmacology is the capability of a given drug to exhibit activity with respect to multiple drug targets, which are not necessarily in the same activity class. Adverse drug reactions ('side effects') are its principal manifestation, but polypharmacology is also playing a role in the repositioning of existing drugs for new therapeutic indications. Polyspecificity, on the other hand, is the capability of a given target to exhibit activity with respect to multiple, structurally dissimilar drugs. That these concepts are closely related to one another is, surprisingly, not well known. It will be shown in this work that they are, in fact, mathematically related to one another and are in essence 'two sides of the same coin'. Hence, information on polypharmacology provides equivalent information on polyspecificity, and vice versa. Networks are playing an increasingly important role in biological research. Drug-target networks, in particular, are made up of drug nodes that are linked to specific target nodes if a given drug is active with respect to that target. Such networks provide a graphic depiction of polypharmacology and polyspecificity. However, by their very nature they can obscure information that may be useful in their interpretation and analysis. This work will show how such latent information can be used to determine bounds for the degrees of polypharmacology and polyspecificity, and how to estimate other useful features associated with the lack of completeness of most drug-target datasets.
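One way to make the 'two sides of the same coin' remark concrete is to read polypharmacology and polyspecificity as node degrees in the drug-target network the abstract describes. The sketch below rests on that assumption; the function name and the toy activity pairs are hypothetical, and it is not the paper's own derivation. It shows only the elementary identity that both sets of degrees count the same edges.

```python
def degrees(active_pairs):
    """Count targets per drug and drugs per target from (drug, target) activity pairs."""
    poly_pharm = {}   # drug   -> number of targets it is active against
    poly_spec = {}    # target -> number of drugs active against it
    for drug, target in set(active_pairs):   # set() drops duplicate records
        poly_pharm[drug] = poly_pharm.get(drug, 0) + 1
        poly_spec[target] = poly_spec.get(target, 0) + 1
    return poly_pharm, poly_spec


# Hypothetical toy network: an edge means "drug is active against target".
pairs = [
    ("drugA", "T1"), ("drugA", "T2"),
    ("drugB", "T1"),
    ("drugC", "T1"), ("drugC", "T2"), ("drugC", "T3"),
]
poly_pharm, poly_spec = degrees(pairs)
print(poly_pharm)   # per-drug degree
print(poly_spec)    # per-target degree
# Every edge contributes once to a drug and once to a target,
# so the two degree totals both equal the number of edges.
print(sum(poly_pharm.values()), sum(poly_spec.values()), len(set(pairs)))
```

Because the two totals are tied to the same edge count, the mean degree on one side fixes the mean degree on the other once the numbers of drugs and targets are known.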
Affiliation(s)
- Gerry Maggiora
  - BIO5 Institute, University of Arizona, 1657 East Helen Street, Tucson, AZ, 85719, USA
- Vijay Gokhale
  - BIO5 Institute, University of Arizona, 1657 East Helen Street, Tucson, AZ, 85719, USA
11. Dong J, Yao ZJ, Zhu MF, Wang NN, Lu B, Chen AF, Lu AP, Miao H, Zeng WB, Cao DS. ChemSAR: an online pipelining platform for molecular SAR modeling. J Cheminform 2017; 9:27. [PMID: 29086046 PMCID: PMC5418185 DOI: 10.1186/s13321-017-0215-1] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.6] [Received: 12/03/2016] [Accepted: 04/24/2017] [Indexed: 12/31/2022] Open
Abstract
Background: In recent years, predictive models based on machine learning techniques have proven to be feasible and effective in drug discovery. However, to develop such a model, researchers usually have to combine multiple tools and undergo several different steps (e.g., RDKit or ChemoPy package for molecular descriptor calculation, ChemAxon Standardizer for structure preprocessing, scikit-learn package for model building, and ggplot2 package for statistical analysis and visualization, etc.). In addition, it may require strong programming skills to accomplish these jobs, which poses severe challenges for users without advanced training in computer programming. Therefore, an online pipelining platform that integrates a number of selected tools is a valuable and efficient solution that can meet the needs of related researchers.
Results: This work presents a web-based pipelining platform, called ChemSAR, for generating SAR classification models of small molecules. The capabilities of ChemSAR include the validation and standardization of chemical structure representation, the computation of 783 1D/2D molecular descriptors and ten types of widely-used fingerprints for small molecules, the filtering methods for feature selection, the generation of predictive models via a step-by-step job submission process, model interpretation in terms of feature importance and tree visualization, as well as a helpful report generation system. The results can be visualized as high-quality plots and downloaded as local files.
Conclusion: ChemSAR provides an integrated web-based platform for generating SAR classification models that will benefit cheminformatics and other biomedical users. It is freely available at: http://chemsar.scbdd.com.
Electronic supplementary material: The online version of this article (doi:10.1186/s13321-017-0215-1) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Jie Dong
  - Xiangya School of Pharmaceutical Sciences, Central South University, No. 172, Tongzipo Road, Yuelu District, Changsha, People's Republic of China
- Zhi-Jiang Yao
  - Xiangya School of Pharmaceutical Sciences, Central South University, No. 172, Tongzipo Road, Yuelu District, Changsha, People's Republic of China
  - The Third Xiangya Hospital, Central South University, Changsha, People's Republic of China
- Min-Feng Zhu
  - Xiangya School of Pharmaceutical Sciences, Central South University, No. 172, Tongzipo Road, Yuelu District, Changsha, People's Republic of China
  - The Third Xiangya Hospital, Central South University, Changsha, People's Republic of China
- Ning-Ning Wang
  - Xiangya School of Pharmaceutical Sciences, Central South University, No. 172, Tongzipo Road, Yuelu District, Changsha, People's Republic of China
- Ben Lu
  - The Third Xiangya Hospital, Central South University, Changsha, People's Republic of China
- Alex F Chen
  - Xiangya School of Pharmaceutical Sciences, Central South University, No. 172, Tongzipo Road, Yuelu District, Changsha, People's Republic of China
  - The Third Xiangya Hospital, Central South University, Changsha, People's Republic of China
- Ai-Ping Lu
  - Institute for Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Kowloon Tong, Hong Kong SAR, People's Republic of China
- Hongyu Miao
  - Department of Biostatistics, School of Public Health, University of Texas Health Science Center, Houston, TX, 77030, USA
- Wen-Bin Zeng
  - Xiangya School of Pharmaceutical Sciences, Central South University, No. 172, Tongzipo Road, Yuelu District, Changsha, People's Republic of China
- Dong-Sheng Cao
  - Xiangya School of Pharmaceutical Sciences, Central South University, No. 172, Tongzipo Road, Yuelu District, Changsha, People's Republic of China
  - Institute for Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Kowloon Tong, Hong Kong SAR, People's Republic of China
12.
Abstract
Drug discovery is a multidisciplinary and multivariate optimization endeavor. As such, in silico screening tools have gained considerable importance to archive, analyze and exploit the vast and ever-increasing amount of experimental data generated throughout the process. The current review will focus on the computer-aided prediction of the numerous properties that need to be controlled during the discovery of a preliminary hit and its promotion to a viable clinical candidate. It does not pretend to the almost impossible task of an exhaustive report but will highlight a few key points that need to be collectively addressed both by chemists and biologists to fuel the drug discovery pipeline with innovative and safe drug candidates.
Affiliation(s)
- Didier Rognan
  - Laboratoire d'Innovation Thérapeutique, UMR 7200 CNRS-Université de Strasbourg, 74 route du Rhin, 67400 Illkirch, France
13. Siragusa L, Luciani R, Borsari C, Ferrari S, Costi MP, Cruciani G, Spyrakis F. Comparing Drug Images and Repurposing Drugs with BioGPS and FLAPdock: The Thymidylate Synthase Case. ChemMedChem 2016; 11:1653-66. [PMID: 27404817 DOI: 10.1002/cmdc.201600121] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 02/29/2016] [Revised: 06/08/2016] [Indexed: 12/14/2022]
Abstract
Repurposing and repositioning drugs has become a frequently pursued and successful strategy in the current era, as new chemical entities are increasingly difficult to find and get approved. Herein we report an integrated BioGPS/FLAPdock pipeline for rapid and effective off-target identification and drug repurposing. Our method is based on the structural and chemical properties of protein binding sites, that is, the ligand image, encoded in the GRID molecular interaction fields (MIFs). Protein similarity is disclosed through the BioGPS algorithm by measuring the pockets' overlap according to which pockets are clustered. Co-crystallized and known ligands can be cross-docked among similar targets, selected for subsequent in vitro binding experiments, and possibly improved for inhibitory potency. We used human thymidylate synthase (TS) as a test case and searched the entire RCSB Protein Data Bank (PDB) for similar target pockets. We chose casein kinase IIα as a control and tested a series of its inhibitors against the TS template. Ellagic acid and apigenin were identified as TS inhibitors, and various flavonoids were selected and synthesized in a second-round selection. The compounds were demonstrated to be active in the low-micromolar range.
Affiliation(s)
- Lydia Siragusa
  - Molecular Discovery Limited, 215 Marsh Road, Pinner Middlesex, London, HA5 5NE, UK
- Rosaria Luciani
  - Department of Life Sciences, University of Modena and Reggio Emilia, Via Campi 103, 41125, Modena, Italy
- Chiara Borsari
  - Department of Life Sciences, University of Modena and Reggio Emilia, Via Campi 103, 41125, Modena, Italy
- Stefania Ferrari
  - Department of Life Sciences, University of Modena and Reggio Emilia, Via Campi 103, 41125, Modena, Italy
- Maria Paola Costi
  - Department of Life Sciences, University of Modena and Reggio Emilia, Via Campi 103, 41125, Modena, Italy
- Gabriele Cruciani
  - Department of Chemistry, Biology and Biotechnology, University of Perugia, Via Elce di Sotto 8, 06123, Perugia, Italy
- Francesca Spyrakis
  - Department of Life Sciences, University of Modena and Reggio Emilia, Via Campi 103, 41125, Modena, Italy
  - Department of Food Science, University of Parma, Viale delle Scienze 17A, 43124, Parma, Italy