1
Zhou M, Zheng C, Xu R. Combining phenome-driven drug-target interaction prediction with patients' electronic health records-based clinical corroboration toward drug discovery. Bioinformatics 2021; 36:i436-i444. [PMID: 32657406 PMCID: PMC7355254 DOI: 10.1093/bioinformatics/btaa451] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Indexed: 12/20/2022]
Abstract
Motivation: Predicting drug–target interactions (DTIs) using human phenotypic data has the potential to eliminate the translational gap between animal experiments and clinical outcomes in humans. One challenge in human phenome-driven DTI prediction is integrating and modeling diverse drug and disease phenotypic relationships. Leveraging large amounts of clinically observed phenotypes of drugs and diseases and electronic health records (EHRs) of 72 million patients, we developed a novel integrated computational drug discovery approach by seamlessly combining DTI prediction and clinical corroboration. Results: We developed a network-based DTI prediction system (TargetPredict) by modeling 855 904 phenotypic and genetic relationships among 1430 drugs, 4251 side effects, 1059 diseases and 17 860 genes. We systematically evaluated TargetPredict in de novo cross-validation and compared it to a state-of-the-art phenome-driven DTI prediction approach. We applied TargetPredict in identifying novel repositioned candidate drugs for Alzheimer’s disease (AD), a disease affecting over 5.8 million people in the United States. We evaluated the clinical efficiency of top repositioned drug candidates using EHRs of over 72 million patients. The area under the receiver operating characteristic (ROC) curve was 0.97 in the de novo cross-validation when evaluated using 910 drugs. TargetPredict outperformed a state-of-the-art phenome-driven DTI prediction system as measured by precision–recall curves [mean average precision (MAP): 0.28 versus 0.23, P-value < 0.0001]. The EHR-based case–control studies identified that the prescriptions of top-ranked repositioned drugs are significantly associated with lower odds of AD diagnosis. For example, we showed that the prescription of liraglutide, a type 2 diabetes drug, is significantly associated with decreased risk of AD diagnosis [adjusted odds ratio (AOR): 0.76; 95% confidence interval (CI): 0.70, 0.82; P-value < 0.0001].
In summary, our integrated approach that seamlessly combines computational DTI prediction and large-scale patients’ EHRs-based clinical corroboration has high potential in rapidly identifying novel drug targets and drug candidates for complex diseases. Availability and implementation: nlp.case.edu/public/data/TargetPredict.
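The abstract above reports an adjusted odds ratio with a 95% confidence interval. As a rough illustration of how such a figure is read, the sketch below computes an unadjusted odds ratio and a Wald interval from a 2x2 case-control table. The counts are hypothetical and the function name is an assumption; the adjustment for confounders used in the paper is not reproduced here.

```python
import math

def odds_ratio_ci(exposed_cases, exposed_controls,
                  unexposed_cases, unexposed_controls, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval (log scale)."""
    a, b, c, d = exposed_cases, exposed_controls, unexposed_cases, unexposed_controls
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) for a 2x2 table.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    low = math.exp(math.log(odds_ratio) - z * se_log)
    high = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, low, high

# Hypothetical counts, not taken from the paper.
or_, low, high = odds_ratio_ci(100, 900, 200, 800)
print(f"OR = {or_:.2f}, 95% CI ({low:.2f}, {high:.2f})")
```

An interval that lies entirely below 1, as in the liraglutide result quoted above, indicates lower odds of diagnosis among the exposed group.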
Affiliation(s)
- Mengshi Zhou
- Center for Artificial Intelligence in Drug Discovery, School of Medicine, Case Western Reserve University, Cleveland, OH 44106, USA; Department of Population and Quantitative Health Sciences, School of Medicine, Case Western Reserve University, Cleveland, OH 44106, USA
- Chunlei Zheng
- Center for Artificial Intelligence in Drug Discovery, School of Medicine, Case Western Reserve University, Cleveland, OH 44106, USA
- Rong Xu
- Center for Artificial Intelligence in Drug Discovery, School of Medicine, Case Western Reserve University, Cleveland, OH 44106, USA
2
Wang Q, Xu R. CoMNRank: An integrated approach to extract and prioritize human microbial metabolites from MEDLINE records. J Biomed Inform 2020; 109:103524. [PMID: 32791237 DOI: 10.1016/j.jbi.2020.103524] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 03/12/2020] [Revised: 07/17/2020] [Accepted: 07/29/2020] [Indexed: 02/06/2023]
Abstract
MOTIVATION: Trillions of bacteria in the human body (human microbiota) affect human health and diseases by controlling host functions through small molecule metabolites. An accurate and comprehensive catalog of the metabolic output from human microbiota is critical for our deep understanding of how microbial metabolism contributes to human health. The large number of published biomedical research articles is a rich resource of microbiome studies. However, automatically extracting microbial metabolites from free-text documents and differentiating them from other human metabolites is a challenging task. Here we developed an integrated approach called Co-occurrence Metabolite Network Ranking (CoMNRank) by combining named entity extraction, network construction and topic sensitive network-based prioritization to extract and prioritize microbial metabolites from biomedical articles. METHODS: The text data included 28,851,232 MEDLINE records. CoMNRank consists of three steps: (1) extraction of human metabolites from MEDLINE records; (2) construction of a weighted co-occurrence metabolite network (CoMN); (3) prioritization and differentiation of microbial metabolites from other human metabolites. RESULTS: For the first step of CoMNRank, we extracted 11,846 human metabolites from MEDLINE articles, with a baseline performance of precision of 0.014, recall of 0.959 and F1 of 0.028. We then constructed a weighted CoMN of 6,996 nodes and 986,186 edges. CoMNRank effectively prioritized microbial metabolites: the precision of top ranked metabolites is 0.45, a 31-fold enrichment as compared to the overall precision of 0.014. Manual curation of the top 100 metabolites showed a true precision of 0.67, among which 48% of true positives are not captured by existing databases. CONCLUSION: Our study sets the foundation for future tasks of microbial entity and relationship extractions as well as data-driven studies of how microbial metabolism contributes to human health and diseases.
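Steps (2) and (3) in the abstract above can be sketched on toy data as follows. This is a minimal illustration, assuming that edge weights are record-level co-occurrence counts and that "topic sensitive" prioritization means a PageRank whose restart mass sits on known seed metabolites. The function names, records and seed set are hypothetical, and every seed is assumed to appear in the network.

```python
from collections import Counter
from itertools import combinations

def build_cooccurrence(records):
    """Count how often each unordered pair of metabolites shares a record."""
    weights = Counter()
    for mentions in records:
        for u, v in combinations(sorted(set(mentions)), 2):
            weights[(u, v)] += 1
    return weights

def personalized_pagerank(weights, seeds, damping=0.85, iterations=50):
    """Power iteration with restarts biased toward the seed nodes."""
    neighbors = {}
    for (u, v), w in weights.items():
        neighbors.setdefault(u, {})[v] = w
        neighbors.setdefault(v, {})[u] = w
    nodes = sorted(neighbors)
    restart = {n: (1 / len(seeds) if n in seeds else 0.0) for n in nodes}
    score = dict(restart)
    for _ in range(iterations):
        nxt = {n: (1 - damping) * restart[n] for n in nodes}
        for u in nodes:
            total = sum(neighbors[u].values())
            for v, w in neighbors[u].items():
                # Each node passes its score to neighbors in proportion to edge weight.
                nxt[v] += damping * score[u] * w / total
        score = nxt
    return score

# Hypothetical records: each list holds the metabolites mentioned in one abstract.
records = [
    ["butyrate", "propionate", "acetate"],
    ["butyrate", "propionate"],
    ["glucose", "lactate"],
    ["acetate", "glucose"],
]
seeds = {"butyrate"}  # assumed known microbial metabolite
scores = personalized_pagerank(build_cooccurrence(records), seeds)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

Metabolites that co-occur often with the seeds accumulate more score, which is the intuition behind separating microbial metabolites from other human metabolites.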
Affiliation(s)
- QuanQiu Wang
- Center for Artificial Intelligence in Drug Discovery, School of Medicine, Case Western Reserve University, Cleveland, OH 44106, United States
- Rong Xu
- Center for Artificial Intelligence in Drug Discovery, School of Medicine, Case Western Reserve University, Cleveland, OH 44106, United States
3
Yan CK, Wang WX, Zhang G, Wang JL, Patel A. BiRWDDA: A Novel Drug Repositioning Method Based on Multisimilarity Fusion. J Comput Biol 2019; 26:1230-1242. [DOI: 10.1089/cmb.2019.0063] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Indexed: 12/27/2022]
Affiliation(s)
- Chao-Kun Yan
- School of Computer and Information Engineering, Henan University, Kaifeng, China
- Wen-Xiu Wang
- School of Computer and Information Engineering, Henan University, Kaifeng, China
- Ge Zhang
- School of Computer and Information Engineering, Henan University, Kaifeng, China
- Jian-Lin Wang
- School of Computer and Information Engineering, Henan University, Kaifeng, China
4
Luo L, Zheng C, Wang J, Tan M, Li Y, Xu R. Analysis of disease organ as a novel phenotype towards disease genetics understanding. J Biomed Inform 2019; 95:103235. [PMID: 31207382 PMCID: PMC6644057 DOI: 10.1016/j.jbi.2019.103235] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 11/03/2018] [Revised: 06/06/2019] [Accepted: 06/13/2019] [Indexed: 11/24/2022]
Abstract
Discerning the modular nature of human diseases through computational approaches calls for diverse data. The finding sites of diseases, like other disease phenotypes, possess rich information for understanding disease genetics. Yet the rich knowledge of disease finding sites has not been comprehensively investigated. In this study, we built a large-scale disease organ network (DON) based on 76,561 disease-organ associations (for 37,615 diseases and 3492 organs) extracted from the Unified Medical Language System (UMLS) Metathesaurus. We investigated how phenotypic organ similarity among diseases in DON reflects disease gene sharing. We constructed a disease genetic network (DGN) using curated disease-gene associations and demonstrated that disease pairs with higher organ similarities not only are more likely to share genes, but also tend to share more genes. Using a community detection algorithm, we showed that phenotypic disease clusters on DON significantly correlated with genetic disease clusters on DGN. We compared DON with a state-of-the-art disease phenotype network, the disease manifestation network (DMN), that we have recently constructed, and demonstrated that DON contains complementary knowledge for disease genetics understanding.
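The comparison of organ similarity with gene sharing described in the abstract above can be illustrated on toy data. The sketch below assumes Jaccard similarity over organ sets, which the abstract does not specify, and uses hypothetical disease, organ and gene names.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical toy data, not drawn from UMLS or any curated resource.
disease_organs = {
    "disease_A": {"heart", "kidney"},
    "disease_B": {"heart", "kidney", "liver"},
    "disease_C": {"skin"},
}
disease_genes = {
    "disease_A": {"G1", "G2"},
    "disease_B": {"G2", "G3"},
    "disease_C": {"G4"},
}

pairs = []
for d1, d2 in combinations(sorted(disease_organs), 2):
    organ_similarity = jaccard(disease_organs[d1], disease_organs[d2])
    shared_genes = len(disease_genes[d1] & disease_genes[d2])
    pairs.append((d1, d2, organ_similarity, shared_genes))

for row in pairs:
    print(row)
```

In this toy case the only pair with nonzero organ similarity is also the only pair that shares a gene, which is the kind of relationship the study tests at scale.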
Affiliation(s)
- Lingyun Luo
- School of Computer Science, University of South China, Hengyang, Hunan 421001, China; Department of Population and Quantitative Health Sciences, School of Medicine, Case Western Reserve University, Cleveland, Ohio 44106, USA
- Chunlei Zheng
- Department of Population and Quantitative Health Sciences, School of Medicine, Case Western Reserve University, Cleveland, Ohio 44106, USA
- Jiaolong Wang
- School of Computer Science, University of South China, Hengyang, Hunan 421001, China
- Minsheng Tan
- School of Computer Science, University of South China, Hengyang, Hunan 421001, China
- Yanshu Li
- Department of Population and Quantitative Health Sciences, School of Medicine, Case Western Reserve University, Cleveland, Ohio 44106, USA
- Rong Xu
- Department of Population and Quantitative Health Sciences, School of Medicine, Case Western Reserve University, Cleveland, Ohio 44106, USA
5
Zhou M, Chen Y, Xu R. A Drug-Side Effect Context-Sensitive Network approach for drug target prediction. Bioinformatics 2019; 35:2100-2107. [PMID: 30428013 PMCID: PMC6581434 DOI: 10.1093/bioinformatics/bty906] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 06/29/2018] [Revised: 10/05/2018] [Accepted: 11/13/2018] [Indexed: 01/21/2023]
Abstract
SUMMARY: Computational drug target prediction has become an important process in drug discovery. Network-based approaches are commonly used in computational drug-target interaction (DTI) prediction. Existing network-based approaches are limited in capturing the contextual information on how diseases, drugs and genes are connected. Here, we proposed a context-sensitive network (CSN) model for DTI prediction by modeling contextual drug phenotypic relationships. We constructed a Drug-Side Effect Context-Sensitive Network (DSE-CSN) of 139 760 drug-side effect pairs, representing 1480 drugs and 5868 side effects. We also built a protein-protein interaction network (PPIN) of 15 267 gene nodes and 178 972 weighted edges. A heterogeneous network was built by connecting the DSE-CSN and the PPIN through 3684 known DTIs. For each drug on the DSE-CSN, its genetic targets were predicted and prioritized using a network-based ranking algorithm. Our approach was evaluated in both de novo and leave-one-out cross-validation analysis using known DTIs as the gold standard. We compared our DSE-CSN-based model to the traditional similarity-based network (SBN)-based prediction model. The results suggested that the DSE-CSN-based model was able to rank known DTIs highly. In a de novo cross-validation, the area under the receiver operating characteristic (ROC) curve was 0.95. In a leave-one-out cross-validation, the average rank was top 3.2% for known DTIs. When compared to the SBN-based model using the Precision-Recall curve, our CSN-based model achieved a higher mean average precision (MAP) (0.23 versus 0.19, P-value<1e-4) in a de novo cross-validation analysis. We further improved the CSN-based DTI prediction by differentially weighting the drug-side effect pairs on the network and showed a significant improvement of the MAP (0.29 versus 0.23, P-value<1e-4).
We also showed that the CSN-based model consistently achieved better performance than the traditional SBN-based model across different drug classes. Moreover, we demonstrated that our novel DTI predictions can be supported by published literature. In summary, the CSN-based model, by modeling the context-specific inter-relationships among drugs and side effects, has a high potential in drug target prediction. AVAILABILITY AND IMPLEMENTATION: nlp.case.edu/public/data/DSE/CSN_DTI.
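The abstract above compares models by mean average precision (MAP). The sketch below shows the standard definition on toy rankings: average precision for one drug is the mean of the precision values at each rank where a known target appears, and MAP averages that over drugs. The gene identifiers and rankings are hypothetical, and the paper's exact evaluation protocol may differ.

```python
def average_precision(ranked, positives):
    """Mean of precision@k over the ranks k at which a positive item appears."""
    hits = 0
    total = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in positives:
            hits += 1
            total += hits / rank
    # Positives missing from the ranking count as misses.
    return total / len(positives) if positives else 0.0

def mean_average_precision(rankings):
    """Average the per-drug average precision over all drugs."""
    values = [average_precision(ranked, positives) for ranked, positives in rankings]
    return sum(values) / len(values)

# Hypothetical predicted target rankings for two drugs, with their known targets.
rankings = [
    (["g1", "g2", "g3", "g4"], {"g1", "g3"}),
    (["g5", "g6"], {"g6"}),
]
map_score = mean_average_precision(rankings)
print(round(map_score, 3))
```

For the first drug the known targets sit at ranks 1 and 3, giving (1/1 + 2/3) / 2; for the second the single target sits at rank 2, giving 1/2.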
Affiliation(s)
- Yang Chen
- Department of Population and Quantitative Health Sciences, School of Medicine, Case Western Reserve University, Cleveland, OH, USA
- Rong Xu
- Department of Population and Quantitative Health Sciences, School of Medicine, Case Western Reserve University, Cleveland, OH, USA
6
Li W, Zhang Y, He Y, Wang Y, Guo S, Zhao X, Feng Y, Song Z, Zou Y, He W, Chen L. Candidate gene prioritization for non-communicable diseases based on functional information: Case studies. J Biomed Inform 2019; 93:103155. [PMID: 30902596 DOI: 10.1016/j.jbi.2019.103155] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/09/2018] [Revised: 03/14/2019] [Accepted: 03/19/2019] [Indexed: 10/27/2022]
Abstract
Candidate gene prioritization for complex non-communicable diseases is essential to understanding the mechanism and developing better means for diagnosing and treating these diseases. Many methods have been developed to prioritize candidate genes in protein-protein interaction (PPI) networks. Integrating functional information/similarity into disease-related PPI networks could improve the performance of prioritization. In this study, a candidate gene prioritization method was proposed for non-communicable diseases considering disease risks transferred between genes in weighted disease PPI networks with weights for nodes and edges based on functional information. Here, three types of non-communicable diseases with pathobiological similarity, Type 2 diabetes (T2D), coronary artery disease (CAD) and dilated cardiomyopathy (DCM), were used as case studies. Literature review and pathway enrichment analysis of top-ranked genes demonstrated the effectiveness of our method. Our method achieved better performance than other existing methods. Pathobiological similarity among these three diseases was further investigated for common top-ranked genes to reveal their pathogenesis.
Affiliation(s)
- Wan Li
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150000, Heilongjiang Province, China
- Yihua Zhang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150000, Heilongjiang Province, China
- Yuehan He
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150000, Heilongjiang Province, China
- Yahui Wang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150000, Heilongjiang Province, China
- Shanshan Guo
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150000, Heilongjiang Province, China
- Xilei Zhao
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150000, Heilongjiang Province, China
- Yuyan Feng
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150000, Heilongjiang Province, China
- Zhaona Song
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150000, Heilongjiang Province, China
- Yuqing Zou
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150000, Heilongjiang Province, China
- Weiming He
- Institute of Opto-electronics, Harbin Institute of Technology, Harbin 150000, Heilongjiang Province, China
- Lina Chen
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150000, Heilongjiang Province, China
7
Chen Y, Xu R. Context-sensitive network analysis identifies food metabolites associated with Alzheimer's disease: an exploratory study. BMC Med Genomics 2019; 12:17. [PMID: 30704467 PMCID: PMC6357669 DOI: 10.1186/s12920-018-0459-2] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 01/04/2023]
Abstract
BACKGROUND: Diet plays an important role in Alzheimer's disease (AD) initiation, progression and outcomes. Previous studies have shown individual food-derived substances may have neuroprotective or neurotoxic effects. However, few works systematically investigate the role of food and food-derived metabolites on the development and progression of AD. METHODS: In this study, we systematically investigated 7569 metabolites and identified AD-associated food metabolites using a novel network-based approach. We constructed a context-sensitive network to integrate heterogeneous chemical and genetic data, and to model context-specific inter-relationships among foods, metabolites, human genes and AD. RESULTS: Our metabolite prioritization algorithm ranked 59 known AD-associated food metabolites within the top 4.9%, which is significantly higher than random expectation. Interestingly, a few top-ranked food metabolites were specifically enriched in herbs and spices. Pathway enrichment analysis shows that these top-ranked herb-and-spice metabolites share many common pathways with AD, including the amyloid processing pathway, which is considered a hallmark in AD-affected brains and has pathological roles in AD development. CONCLUSIONS: Our study represents the first unbiased systems approach to characterizing the effects of food and food-derived metabolites in AD pathogenesis. Our ranking approach prioritizes the known AD-associated food metabolites, and identifies interesting relationships between AD and the food group "herbs and spices". Overall, our study provides intriguing evidence for the role of diet, as an important environmental factor, in AD etiology.
Affiliation(s)
- Yang Chen
- Department of Population and Quantitative Health Science, School of Medicine, Case Western Reserve University, Cleveland, OH 44106 USA
- Rong Xu
- Department of Population and Quantitative Health Science, School of Medicine, Case Western Reserve University, Cleveland, OH 44106 USA
8
Wang Q, Xu R. Disease comorbidity-guided drug repositioning: a case study in schizophrenia. AMIA Annu Symp Proc 2018; 2018:1300-1309. [PMID: 30815174 PMCID: PMC6371343] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2023]
Abstract
The key to any computational drug repositioning is the availability of relevant data in machine-understandable format. While large amounts of genetic, genomic and chemical data are publicly available, large-scale higher-level disease and drug phenotypic data are limited. We recently constructed a large-scale disease-comorbidity relationship knowledge base (dCombKB) and a comprehensive drug-treatment relationship knowledge base (TreatKB) from 21 million biomedical research articles and other resources. In this study, we demonstrated the potential of dCombKB and TreatKB in drug repositioning for schizophrenia (SCZ), one of the top ten illnesses contributing to the global burden of disease. dCombKB contains 121,359 unique disease-disease comorbidity pairs for 23,041 diseases. TreatKB contains 208,330 unique drug-disease treatment pairs for 2,484 drugs and 24,511 diseases. We constructed a phenotypic comorbidity disease network (PDN) of 14,645 disease nodes and 101,275 edges based on dCombKB. We applied a standard network-based ranking algorithm to find diseases that are phenotypically related to SCZ. We developed a drug prioritization system, PhenoPredict-CDN, to systematically reposition drugs for SCZ from diseases phenotypically related to SCZ. PhenoPredict-CDN found all 18 FDA-approved SCZ drugs and ranked them highly as tested in a de-novo validation setting (recall: 1.0, mean ranking: top 6.05%, median ranking: top 1.65%). When compared to PREDICT, a comprehensive drug repositioning system, for novel predictions, PhenoPredict-CDN outperformed PREDICT in Precision-Recall (PR) curves across three different evaluation datasets. Compared to PREDICT, PhenoPredict-CDN showed significant improvements of 110.0-230.0% in mean average precision. In summary, large-scale higher-level disease-comorbidity relationship data extracted from biomedical literature have potential in drug discovery for SCZ, a complex disease with unknown pathophysiological mechanisms.
All the data are publicly available: dCombKB at http://nlp.case.edu/public/data/dCombKB, TreatKB at http://nlp.case.edu/public/data/treatKB/, and predictions for SCZ at http://nlp.case.edu/public/data/SCZ_CDN/.
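The de-novo validation figures in the abstract above (recall, mean ranking and median ranking as a top percentage) can be computed as in the following sketch. The ranked list and the set of known drugs are hypothetical, and the helper name is an assumption, not part of PhenoPredict-CDN.

```python
from statistics import mean, median

def rank_percentiles(ranked, known):
    """Return the top-percentage position of each known item found in the ranking."""
    position = {item: i for i, item in enumerate(ranked, start=1)}
    return [100 * position[k] / len(ranked) for k in known if k in position]

# Hypothetical ranking of 200 candidate drugs and three known approved drugs.
ranked = [f"drug{i}" for i in range(200)]
known = ["drug1", "drug3", "drug11"]

percentiles = rank_percentiles(ranked, known)
recall = len(percentiles) / len(known)
print(f"recall: {recall:.2f}, mean ranking: top {mean(percentiles):.2f}%, "
      f"median ranking: top {median(percentiles):.2f}%")
```

A median well below the mean, as reported in the abstract, suggests that most known drugs rank very near the top while a few outliers rank lower.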
Affiliation(s)
- QuanQiu Wang
- Department of Population and Quantitative Health Sciences, School of Medicine, Case Western Reserve University, Cleveland OH 44106
- Rong Xu
- Department of Population and Quantitative Health Sciences, School of Medicine, Case Western Reserve University, Cleveland OH 44106
9
Miotto R, Wang F, Wang S, Jiang X, Dudley JT. Deep learning for healthcare: review, opportunities and challenges. Brief Bioinform 2018; 19:1236-1246. [PMID: 28481991 PMCID: PMC6455466 DOI: 10.1093/bib/bbx044] [Citation(s) in RCA: 870] [Impact Index Per Article: 124.3] [Received: 12/05/2016] [Revised: 02/19/2017] [Indexed: 02/07/2023]
Abstract
Gaining knowledge and actionable insights from complex, high-dimensional and heterogeneous biomedical data remains a key challenge in transforming health care. Various types of data have been emerging in modern biomedical research, including electronic health records, imaging, -omics, sensor data and text, which are complex, heterogeneous, poorly annotated and generally unstructured. Traditional data mining and statistical learning approaches typically need to first perform feature engineering to obtain effective and more robust features from those data, and then build prediction or clustering models on top of them. Both steps pose many challenges when the data are complicated and domain knowledge is insufficient. The latest advances in deep learning technologies provide new effective paradigms to obtain end-to-end learning models from complex data. In this article, we review the recent literature on applying deep learning technologies to advance the health care domain. Based on the analyzed work, we suggest that deep learning approaches could be the vehicle for translating big biomedical data into improved human health. However, we also note limitations and needs for improved methods development and applications, especially in terms of ease-of-understanding for domain experts and citizen scientists. We discuss such challenges and suggest developing holistic and meaningful interpretable architectures to bridge deep learning models and human interpretability.
Affiliation(s)
- Riccardo Miotto
- Institute for Next Generation Healthcare, Department of Genetics and Genomic Sciences at the Icahn School of Medicine at Mount Sinai, New York, NY
- Fei Wang
- Division of Health Informatics, Department of Healthcare Policy and Research at Weill Cornell Medicine at Cornell University, New York, NY
- Shuang Wang
- Department of Biomedical Informatics at the University of California San Diego, La Jolla, CA
- Xiaoqian Jiang
- Department of Biomedical Informatics at the University of California San Diego, La Jolla, CA
- Joel T Dudley
- Institute for Next Generation Healthcare, Department of Genetics and Genomic Sciences at the Icahn School of Medicine at Mount Sinai, New York, NY
10
GC[Formula: see text]NMF: A Novel Matrix Factorization Framework for Gene-Phenotype Association Prediction. Interdiscip Sci 2018; 10:572-582. [PMID: 29691712 DOI: 10.1007/s12539-018-0296-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 06/30/2017] [Revised: 03/05/2018] [Accepted: 04/03/2018] [Indexed: 10/17/2022]
Abstract
Gene-phenotype association prediction can be applied to reveal the inherited basis of human diseases and facilitate drug development. Gene-phenotype associations are related to complex biological processes and influenced by various factors, such as the relationships between phenotypes and those among genes. However, owing to the sparseness of curated gene-phenotype associations and the lack of integrated analysis of the joint effect of multiple factors, existing methods are limited in prediction accuracy and in detecting potential gene-phenotype associations. In this paper, we propose a novel method that builds on Non-negative Matrix Factorization (NMF) and exploits a weighted graph constraint learned from the hierarchical structure of phenotype data together with group prior information among genes, called Weighted Graph Constraint and Group Centric Non-negative Matrix Factorization (GC[Formula: see text]NMF). Specifically, first we introduce the depth of parent-child relationships between two adjacent phenotypes in hierarchical phenotypic data as a weighted graph constraint for a better phenotype understanding. Second, we utilize intra-group correlation among genes in a gene group as a group constraint for gene understanding. Such information provides us with the intuition that genes in a group probably result in similar phenotypes. The model not only allows us to achieve a high-grade prediction performance, but also helps us to learn interpretable representations of genes and phenotypes simultaneously to facilitate future biological analysis. Experimental results on biological gene-phenotype association datasets of mouse and human demonstrate that GC[Formula: see text]NMF can obtain superior prediction accuracy and good understandability for biological explanation over other state-of-the-art methods.
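The method in the abstract above builds on non-negative matrix factorization. The sketch below shows only plain NMF with the standard multiplicative updates for squared error, on a hypothetical gene-by-phenotype matrix. The weighted graph constraint and the group constraint described in the abstract are not implemented, and all names and data are assumptions for illustration.

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))

def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Plain NMF, V ~ W H, using multiplicative updates that keep W, H >= 0."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h

# Hypothetical binary association matrix: rows are genes, columns are phenotypes.
v = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
    [0, 1, 1],
]
w, h = nmf(v, rank=2)
print(round(frobenius_error(v, w, h), 4))
```

Entries of the product W H at positions that are zero in V can be read as scores for candidate associations; a constrained variant would add penalty terms to these updates.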
11
Wang Q, Li L, Xu R. A systems biology approach to predict and characterize human gut microbial metabolites in colorectal cancer. Sci Rep 2018; 8:6225. [PMID: 29670137 PMCID: PMC5906656 DOI: 10.1038/s41598-018-24315-0] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 07/07/2017] [Accepted: 03/26/2018] [Indexed: 12/16/2022]
Abstract
Colorectal cancer (CRC) is the second leading cause of cancer-related deaths. It is estimated that about half the cases of CRC occurring today are preventable. Recent studies showed that human gut microbiota and their collective metabolic outputs play important roles in CRC. However, the mechanisms by which human gut microbial metabolites interact with host genetics in contributing to CRC remain largely unknown. We hypothesize that computational approaches that integrate and analyze vast amounts of publicly available biomedical data have great potential in better understanding how human gut microbial metabolites are mechanistically involved in CRC. Leveraging vast amounts of publicly available data, we developed a computational algorithm to predict human gut microbial metabolites for CRC. We validated the prediction algorithm by showing that previously known CRC-associated gut microbial metabolites ranked highly (mean ranking: top 10.52%; median ranking: 6.29%; p-value: 3.85E-16). Moreover, we identified new gut microbial metabolites likely associated with CRC. Through computational analysis, we propose potential roles for tartaric acid, the top-ranked metabolite, in CRC etiology. In summary, our data-driven computation-based study generated a large number of associations that could serve as a starting point for further experiments to refute or validate these microbial metabolite associations in CRC.
Affiliation(s)
- Li Li
- Department of Family Medicine and Community Health, Case Comprehensive Cancer Center, School of Medicine, Case Western Reserve University, Cleveland, OH, USA
- Rong Xu
- Department of Population and Quantitative Health Sciences, School of Medicine, Case Western Reserve University, 2103 Cornell Road, Cleveland, Ohio, 44106, USA
12
Wang Q, Xu R. Drug repositioning for prostate cancer: using a data-driven approach to gain new insights. AMIA Annu Symp Proc 2018; 2017:1724-1733. [PMID: 29854243 PMCID: PMC5977574] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]
Abstract
Prostate cancer (PC) is the most common cancer and the third leading cause of cancer death in men worldwide. Despite its high incidence and mortality, the likelihood of a cure is low for late stages of PC. There is an unmet need for more effective agents for treating PC. Here, we present a drug repositioning system, GenoPredict, for finding innovative drug candidates for treating PC. GenoPredict leverages a large amount of disease genomics data and a large-scale drug treatment knowledge base (TreatKB) that we recently constructed. We first constructed a genetic disease network (GDN) that comprised 882 nodes and 200,758 edges and applied a network-based ranking algorithm to find diseases from GDN that are genetically related to PC. We developed a drug prioritization algorithm to reposition drugs from PC-related diseases to treat PC. When evaluated in a de-novo prediction setting using 27 FDA-approved PC drugs, GenoPredict found 25 of 27 FDA-approved PC drugs and ranked them highly (recall: 0.925, mean ranking: 27.3%, median ranking: 15.6%). When compared to PREDICT, a comprehensive drug repositioning system, in novel predictions, GenoPredict performed better than PREDICT across two evaluation datasets. GenoPredict achieved a mean average precision (MAP) of 0.447 when evaluated with 172 PC drugs extracted from 172,888 clinical trial reports, representing a 164.5% improvement as compared to a MAP of 0.169 for PREDICT. When evaluated with 72 PC drugs extracted from 43,811 ongoing clinical trial reports, GenoPredict achieved a MAP of 0.278, representing a 231.1% improvement as compared to a MAP of 0.084 for PREDICT. The data are publicly available at: http://nlp.case.edu/public/data/PC_GenoPredict and http://nlp.case.edu/public/data/treatKB.
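The percentage improvements quoted in the abstract above are relative changes in MAP over the baseline. A minimal check of the first figure, using the rounded MAP values as printed (the helper name is an assumption):

```python
def relative_improvement(new, baseline):
    """Percentage change of `new` relative to `baseline`."""
    return 100 * (new - baseline) / baseline

# MAP values as printed in the abstract: GenoPredict 0.447, PREDICT 0.169.
improvement = round(relative_improvement(0.447, 0.169), 1)
print(improvement)  # 164.5
```

Figures recomputed from rounded values can differ slightly in the last digit from those computed on unrounded scores.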
Affiliation(s)
- Rong Xu
- Department of Epidemiology and Biostatistics, School of Medicine, Case Western Reserve University, Cleveland OH 44106
13
Ni J, Cheng W, Fan W, Zhang X. ComClus: A Self-Grouping Framework for Multi-Network Clustering. IEEE Trans Knowl Data Eng 2018; 30:435-448. [PMID: 30416320 PMCID: PMC6221474 DOI: 10.1109/tkde.2017.2771762] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2023]
Abstract
Joint clustering of multiple networks has been shown to be more accurate than performing clustering on individual networks separately. This is because multi-network clustering algorithms typically assume there is a common clustering structure shared by all networks, and different networks can provide compatible and complementary information for uncovering this underlying clustering structure. However, this assumption is too strict to hold in many emerging applications, where multiple networks usually have diverse data distributions. More commonly, the networks under consideration belong to different underlying groups, and only networks in the same underlying group share similar clustering structures. Better clustering performance can be achieved by treating such groups differently. An ideal method should therefore be able to automatically detect network groups so that networks in the same group share a common clustering structure. To address this problem, we propose a new method, ComClus, to simultaneously group and cluster multiple networks. ComClus is novel in combining the clustering approach of non-negative matrix factorization (NMF) and the feature subspace learning approach of metric learning. Specifically, it treats node clusters as features of networks and learns proper subspaces from such features to distinguish different network groups. During the learning process, the two procedures of network grouping and clustering are coupled and mutually enhanced. Moreover, ComClus can effectively leverage prior knowledge on how to group networks, so that network grouping can be conducted in a semi-supervised manner. This enables users to guide the grouping process using domain knowledge so that network clustering accuracy can be further boosted. Extensive experimental evaluations on a variety of synthetic and real datasets demonstrate the effectiveness and scalability of the proposed method.
Affiliation(s)
- Jingchao Ni
- College of Information Sciences and Technology, Pennsylvania State University, PA 16802 USA
- Wei Cheng
- NEC Laboratories America, NJ 08540 USA
- Wei Fan
- Baidu Research Big Data Lab, CA 94089 USA
- Xiang Zhang
- College of Information Sciences and Technology, Pennsylvania State University, PA 16802 USA
|
14
|
Zhang Y, Liu J, Liu X, Fan X, Hong Y, Wang Y, Huang Y, Xie M. Prioritizing disease genes with an improved dual label propagation framework. BMC Bioinformatics 2018; 19:47. [PMID: 29422030 PMCID: PMC5806269 DOI: 10.1186/s12859-018-2040-6] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/30/2017] [Accepted: 01/24/2018] [Indexed: 12/21/2022] Open
Abstract
BACKGROUND Disease gene prioritization aims to identify potential disease-causing genes for a given phenotype, which can reveal the inherited basis of human diseases and facilitate drug development. Our work is motivated by the label propagation algorithm and by the false positive protein-protein interactions (PPIs) that exist in the dataset. To the best of our knowledge, false positive protein-protein interactions have not previously been considered in disease gene prioritization. Label propagation has been successfully applied to prioritize disease-causing genes in previous network-based methods. These methods use basic label propagation, i.e. random walk, on networks to prioritize disease genes in different ways. However, none of them can handle datasets that contain many false positive protein-protein interactions, because the PPI network is used as a fixed input. This characteristic of the data source may cause a large deviation in the results. RESULTS A novel network-based framework, IDLP, is proposed to prioritize candidate disease genes. IDLP effectively propagates labels throughout the PPI network and the phenotype similarity network. It prevents the method from failing when few disease genes are known. Meanwhile, IDLP models the bias caused by false positive protein interactions and other potential factors by treating the PPI network matrix and the phenotype similarity matrix as matrices to be learnt. By correcting the noise in the training matrices, it improves performance significantly. We conduct extensive experiments on OMIM datasets, and IDLP demonstrates its effectiveness compared with eight state-of-the-art approaches. The robustness of IDLP is also validated by experiments with a perturbed PPI network. Furthermore, we searched the literature to verify that the new genes predicted by IDLP are associated with the given diseases; the high prediction accuracy shows that IDLP can be a powerful tool to help biologists discover new disease genes. CONCLUSIONS IDLP is an effective method for disease gene prioritization, particularly for query phenotypes without known associated genes, which would be greatly helpful for identifying disease genes for less-studied phenotypes. AVAILABILITY https://github.com/nkiip/IDLP.
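The basic label propagation that the abstract contrasts IDLP with can be sketched as follows. This is only the fixed-network baseline, not IDLP itself (which additionally learns the network matrices); the function name, the `alpha` value, and the four-gene path graph are illustrative assumptions.

```python
def label_propagation(adj, seeds, alpha=0.8, iters=100, tol=1e-9):
    """Iterate f <- alpha * S f + (1 - alpha) * y on a weighted graph.

    adj   : dict node -> dict neighbor -> weight (assumed symmetric)
    seeds : nodes with known labels (y = 1 for seeds, 0 otherwise)
    S     : symmetrically normalized adjacency, w_uv / sqrt(d_u * d_v)
    """
    nodes = sorted(adj)
    degree = {u: sum(adj[u].values()) for u in nodes}
    y = {u: 1.0 if u in seeds else 0.0 for u in nodes}
    f = dict(y)
    for _ in range(iters):
        new_f = {}
        for u in nodes:
            # Score received from neighbors, damped by both degrees.
            spread = sum(
                w / (degree[u] * degree[v]) ** 0.5 * f[v]
                for v, w in adj[u].items()
            )
            new_f[u] = alpha * spread + (1 - alpha) * y[u]
        delta = max(abs(new_f[u] - f[u]) for u in nodes)
        f = new_f
        if delta < tol:
            break
    return f


# Hypothetical path-shaped PPI network g1 - g2 - g3 - g4 with one known disease gene.
toy_ppi = {
    "g1": {"g2": 1.0},
    "g2": {"g1": 1.0, "g3": 1.0},
    "g3": {"g2": 1.0, "g4": 1.0},
    "g4": {"g3": 1.0},
}
scores = label_propagation(toy_ppi, seeds={"g1"})
```

On this toy graph the scores decay with distance from the seed gene. Because the adjacency is a fixed input here, a false positive edge would carry score just like a true one, which is the limitation the abstract describes.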
Affiliation(s)
- Yaogong Zhang
- College of Software, Nankai University, TianJin, 300350, China
- Jiahui Liu
- College of Software, Nankai University, TianJin, 300350, China
- Xiaohu Liu
- College of Software, Nankai University, TianJin, 300350, China
- Xin Fan
- College of Software, Nankai University, TianJin, 300350, China
- Yuxiang Hong
- College of Software, Nankai University, TianJin, 300350, China
- Yuan Wang
- School of Computer Science and Information Engineering, Tianjin University of Science and Technology, TianJin, 300222, China
- YaLou Huang
- College of Software, Nankai University, TianJin, 300350, China
- MaoQiang Xie
- College of Software, Nankai University, TianJin, 300350, China
|
15
|
Using a novel computational drug-repositioning approach (DrugPredict) to rapidly identify potent drug candidates for cancer treatment. Oncogene 2017; 37:403-414. [PMID: 28967908 PMCID: PMC5799769 DOI: 10.1038/onc.2017.328] [Citation(s) in RCA: 58] [Impact Index Per Article: 7.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/02/2017] [Revised: 06/28/2017] [Accepted: 07/03/2017] [Indexed: 12/25/2022]
Abstract
Computation-based drug-repurposing/repositioning approaches can greatly speed up the traditional drug discovery process. To date, systematic and comprehensive computation-based approaches to identify and validate drug-repositioning candidates for epithelial ovarian cancer (EOC) have not been undertaken. Here, we present a novel drug discovery strategy that combines a computational drug-repositioning system (DrugPredict) with biological testing in cell lines in order to rapidly identify novel drug candidates for EOC. DrugPredict exploited unique repositioning opportunities rendered by a vast amount of disease genomics, phenomics, drug treatment, and genetic pathway data and uniquely revealed that non-steroidal anti-inflammatories (NSAIDs) rank just as high as currently used ovarian cancer drugs. As epidemiological studies have reported decreased incidence of ovarian cancer associated with regular intake of NSAIDs, we assessed whether NSAIDs could have chemoadjuvant applications in EOC and found that (i) the NSAID Indomethacin induces robust cell death in primary patient-derived platinum-sensitive and platinum-resistant ovarian cancer cells and ovarian cancer stem cells and (ii) downregulation of β-catenin partially drives the effects of Indomethacin in cisplatin-resistant cells. In summary, we demonstrate that DrugPredict represents an innovative computational drug-discovery strategy to uncover drugs that are routinely used for other indications and that could be effective in treating various cancers, thus introducing a potentially rapid and cost-effective translational opportunity. As NSAIDs are already in routine use in gynecological treatment regimens and have an acceptable safety profile, our results provide a rationale for testing NSAIDs as potential chemoadjuvants in EOC patient trials.
|
16
|
Cai X, Chen Y, Zheng C, Xu R. Interrogating Patient-level Genomics and Mouse Phenomics towards Understanding Cytokines in Colorectal Cancer Metastasis. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE PROCEEDINGS. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE 2017; 2017:227-236. [PMID: 28815134 PMCID: PMC5543389] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 11/05/2022]
Abstract
Background: Colorectal cancer is the second leading cause of cancer-related death worldwide, and a majority of patients die from metastasis. Chronic intestinal inflammation plays an important role in tumor progression of colorectal cancer. However, few studies have systematically predicted colorectal cancer metastasis using inflammatory cytokine genes. Results: We developed a supervised machine learning approach to predict colorectal cancer tumor progression using patient-level genomic features. To better understand the role of cytokines, we integrated the metastasis-related genes from mouse phenotypic data. In addition, pathway analysis and network visualization were applied to the most significant genes ranked by feature weights of the final prediction model. The combined model of cytokines and mouse phenotypes achieved a predictive accuracy of 75.54%, higher than the model based on mouse phenotypes alone (70.42%, p-value<0.05). In addition, the combined model outperformed the model based on the existing metastasis-related epithelial-to-mesenchymal transition (EMT) genes (75.54% vs. 71.61%, p-value<0.05). We also observed that the most important cytokine gene features of our model interact with the cancer driver genes and are highly associated with the colorectal cancer metastasis signaling pathway. Conclusion: We developed a combined model using both cytokine and mouse phenotype information to predict colorectal cancer metastasis. The results suggest that the inflammatory cytokines increase the power of predicting metastasis. We also systematically demonstrated the critical role of cytokines in the progression of colorectal tumors.
Affiliation(s)
- Xiaoshu Cai
- Department of Electrical Engineering and Computer Science, School of Engineering, Case Western Reserve University, Cleveland, Ohio, USA
- Yang Chen
- Department of Epidemiology & Biostatistics, School of Medicine, Case Western Reserve University, Cleveland, Ohio, USA
- Chunlei Zheng
- Department of Epidemiology & Biostatistics, School of Medicine, Case Western Reserve University, Cleveland, Ohio, USA
- Rong Xu
- Department of Epidemiology & Biostatistics, School of Medicine, Case Western Reserve University, Cleveland, Ohio, USA
|
17
|
González-Pérez S, Pazos F, Chagoyen M. Factors affecting interactome-based prediction of human genes associated with clinical signs. BMC Bioinformatics 2017; 18:340. [PMID: 28715999 PMCID: PMC5514523 DOI: 10.1186/s12859-017-1754-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/19/2017] [Accepted: 07/12/2017] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Clinical signs are a fundamental aspect of human pathologies. While disease diagnosis is problematic or impossible in many cases, signs are easier to perceive and categorize. Clinical signs are increasingly used, together with molecular networks, to prioritize detected variants in clinical genomics pipelines, even if the patient is still undiagnosed. Here we analyze the ability of these network-based methods to predict genes that underlie clinical signs from the human interactome. RESULTS Our analysis reveals that these approaches can locate genes associated with clinical signs with variable performance that depends on the sign and associated disease. We analyzed several clinical and biological factors that explain these variable results, including the number of genes involved (mono- vs. oligogenic diseases), mode of inheritance, type of clinical sign and gene product function. CONCLUSIONS Our results indicate that the characteristics of the clinical signs and their related diseases should be considered when interpreting the results of network-prediction methods, such as those aimed at discovering disease-related genes and variants. These results are important due to the increasing use of clinical signs as an alternative to diseases for studying the molecular basis of human pathologies.
Affiliation(s)
- Sara González-Pérez
- Computational Systems Biology Group, National Center for Biotechnology (CNB-CSIC), Darwin 3, 28049, Madrid, Spain
- Florencio Pazos
- Computational Systems Biology Group, National Center for Biotechnology (CNB-CSIC), Darwin 3, 28049, Madrid, Spain
- Mónica Chagoyen
- Computational Systems Biology Group, National Center for Biotechnology (CNB-CSIC), Darwin 3, 28049, Madrid, Spain
|
18
|
Chen Y, Xu R. Context-sensitive network-based disease genetics prediction and its implications in drug discovery. Bioinformatics 2017; 33:1031-1039. [PMID: 28062449 DOI: 10.1093/bioinformatics/btw737] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/13/2016] [Accepted: 11/19/2016] [Indexed: 01/05/2023] Open
Abstract
Motivation Disease phenotype networks play an important role in computational approaches to identifying new disease-gene associations. Current disease phenotype networks often model disease relationships based on pairwise similarities and therefore ignore the specific context in which two diseases are connected. In this study, we propose a new strategy to model disease associations using context-sensitive networks (CSNs). We developed a CSN-based phenome-driven approach for disease genetics prediction, and investigated the translational potential of the predicted genes in drug discovery. Results We constructed CSNs by directly connecting diseases with associated phenotypes. Here, we constructed two CSNs using different data sources; the two networks contain 26 790 and 13 822 nodes respectively. We integrated the CSNs with a genetic functional relationship network and predicted disease genes using a network-based ranking algorithm. For comparison, we built Similarity-Based disease Networks (SBNs) using the same disease phenotype data. In a de novo cross validation for 3324 diseases, the CSN-based approach significantly improved the average rank for all tested genes from the top 12.6% to the top 8.8% compared with the SBN-based approach (p < e-22). The area under the receiver operating characteristic curve for the CSN approach was also significantly higher than for the SBN approach (0.91 versus 0.87, p < e-3). In addition, we predicted genes for Parkinson's disease (PD) using CSNs, and demonstrated that the top-ranked genes are highly relevant to PD pathogenesis. We pinpointed a top-ranked drug target gene for PD, and found its association with neurodegeneration supported by the literature. In summary, CSNs significantly improve disease genetics prediction compared with SBNs and provide leads for potential drug targets. Availability and Implementation nlp.case.edu/public/data/. Contact rxx@case.edu.
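The two evaluation measures quoted above, the average rank of held-out genes expressed as a top percentage and the area under the ROC curve, can be computed as in the following minimal sketch. The function names and toy inputs are hypothetical, and the pairwise AUC formulation is one standard definition rather than the authors' documented procedure.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random positive outscores a random
    negative, with ties counted as one half (pairwise formulation)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_rank_percent(ranked_genes, true_genes):
    """Average position of the held-out genes as a percentage of the
    ranked list length (lower means ranked closer to the top)."""
    position = {g: i for i, g in enumerate(ranked_genes, start=1)}
    ranks = [position[g] for g in true_genes]
    return 100.0 * sum(ranks) / (len(ranks) * len(ranked_genes))


# Toy inputs: four scored genes, and a ten-gene ranking with two held-out genes.
toy_auc = roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
toy_rank = mean_rank_percent(list("abcdefghij"), ["a", "c"])
```

The quadratic pairwise loop is kept for readability; for large gene lists a rank-sum formulation gives the same value more efficiently.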
|
19
|
Gao Z, Chen Y, Cai X, Xu R. Predict drug permeability to blood-brain-barrier from clinical phenotypes: drug side effects and drug indications. Bioinformatics 2017; 33:901-908. [PMID: 27993785 PMCID: PMC5860495 DOI: 10.1093/bioinformatics/btw713] [Citation(s) in RCA: 41] [Impact Index Per Article: 5.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/11/2016] [Revised: 10/16/2016] [Accepted: 11/19/2016] [Indexed: 12/25/2022] Open
Abstract
Motivation The Blood-Brain-Barrier (BBB) is a rigorous permeability barrier for maintaining homeostasis of the Central Nervous System (CNS). Determining a compound's permeability to the BBB is a prerequisite in CNS drug discovery. Existing computational methods usually predict drug BBB permeability from chemical structure, and they generally apply to small compounds passing the BBB through passive diffusion. As abundant information on drug side effects and indications has been recorded over time through extensive clinical usage, we aim to explore BBB permeability prediction from a new angle and introduce a novel approach to predict BBB permeability from drug clinical phenotypes (drug side effects and drug indications). This method can apply to both small compounds and macro-molecules penetrating the BBB through various mechanisms besides passive diffusion. Results We composed a training dataset of 213 drugs with known brain and blood steady-state concentration ratios and extracted their side effects and indications as features. Next, we trained SVM models with a polynomial kernel and obtained an accuracy of 76.0%, AUC of 0.739, and F1 score (macro weighted) of 0.760 with Monte Carlo cross validation. The independent test accuracy was 68.3%, AUC 0.692, F1 score 0.676. When both chemical features and clinical phenotypes were available, combining the two types of features achieved significantly better performance than the chemical feature based approach (accuracy 85.5% versus 72.9%, AUC 0.854 versus 0.733, F1 score 0.854 versus 0.725; P < e-90). We also conducted de novo prediction and identified 110 drugs in the SIDER database having the potential to penetrate the BBB, which could serve as a starting point for CNS drug repositioning research. Availability and Implementation https://github.com/bioinformatics-gao/CASE-BBB-prediction-Data. Contact rxx@case.edu. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Zhen Gao
- Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, OH, USA
- Yang Chen
- Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, OH, USA
- Xiaoshu Cai
- Department of Electrical Engineering and Computer Science, Case Western Reserve University, Cleveland, OH, USA
- Rong Xu
- Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, OH, USA
|
20
|
Liu H, Song Y, Guan J, Luo L, Zhuang Z. Inferring new indications for approved drugs via random walk on drug-disease heterogenous networks. BMC Bioinformatics 2016; 17:539. [PMID: 28155639 PMCID: PMC5259862 DOI: 10.1186/s12859-016-1336-7] [Citation(s) in RCA: 53] [Impact Index Per Article: 5.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/03/2023] Open
Abstract
Background Since traditional drug research and development is often time-consuming and high-risk, there is an increasing interest in establishing new medical indications for approved drugs, referred to as drug repositioning, which provides a relatively low-cost and high-efficiency approach for drug discovery. With the explosive growth of large-scale biochemical and phenotypic data, drug repositioning holds great potential for precision medicine in the post-genomic era. There is an urgent need for rational and systematic approaches to predict new indications for approved drugs on a large scale. Results In this paper, we propose two-pass random walks with restart on a heterogeneous network, TP-NRWRH for short, to predict new indications for approved drugs. Rather than performing a random walk on a bipartite network, we integrated the drug-drug similarity network, disease-disease similarity network and known drug-disease association network into one heterogeneous network, on which the two-pass random walks with restart are implemented. We conducted performance evaluation on two datasets of drug-disease associations, and the results show that our method has higher performance than six existing methods. A case study on Alzheimer’s disease showed that nine of the top 10 predicted drugs have been approved or are investigational for neurodegenerative diseases. The experimental results show that our method achieves state-of-the-art performance in predicting new indications for approved drugs. Conclusions We proposed a two-pass random walk with restart on the drug-disease heterogeneous network, referred to as TP-NRWRH, to predict new indications for approved drugs. Performance evaluation on two independent datasets showed that TP-NRWRH achieved higher performance than six existing methods in 10-fold cross validations. The case study on Alzheimer’s disease showed that nine of the top 10 predicted drugs have been approved or are investigational for neurodegenerative diseases. The results show that our method achieves state-of-the-art performance in predicting new indications for approved drugs.
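The network construction described above, in which drug-drug similarity, disease-disease similarity and known drug-disease associations are merged into one heterogeneous network that a random walk with restart then explores, can be sketched as below. This is a plain single-pass random walk with restart, not the two-pass TP-NRWRH procedure; the function name, the restart probability and the toy matrices are illustrative assumptions.

```python
def rwr_heterogeneous(drug_sim, disease_sim, assoc, seed_disease,
                      restart=0.3, iters=200, tol=1e-10):
    """Random walk with restart from one disease; returns drug scores.

    drug_sim    : n_drugs x n_drugs similarity matrix (list of lists)
    disease_sim : n_dis x n_dis similarity matrix
    assoc       : n_drugs x n_dis known drug-disease associations (0/1)
    """
    n_drugs, n_dis = len(drug_sim), len(disease_sim)
    n = n_drugs + n_dis
    # Block adjacency [[drug_sim, assoc], [assoc^T, disease_sim]], no self-loops.
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n_drugs):
        for j in range(n_drugs):
            if i != j:
                adj[i][j] = drug_sim[i][j]
        for k in range(n_dis):
            adj[i][n_drugs + k] = assoc[i][k]
            adj[n_drugs + k][i] = assoc[i][k]
    for k in range(n_dis):
        for m in range(n_dis):
            if k != m:
                adj[n_drugs + k][n_drugs + m] = disease_sim[k][m]
    # Column-normalize so each non-empty column is a probability distribution.
    col = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    trans = [[adj[i][j] / col[j] if col[j] > 0 else 0.0 for j in range(n)]
             for i in range(n)]
    p0 = [0.0] * n
    p0[n_drugs + seed_disease] = 1.0
    p = p0[:]
    for _ in range(iters):
        nxt = [(1 - restart) * sum(trans[i][j] * p[j] for j in range(n))
               + restart * p0[i] for i in range(n)]
        diff = sum(abs(a - b) for a, b in zip(nxt, p))
        p = nxt
        if diff < tol:
            break
    return p[:n_drugs]


# Toy data: drug 0 treats disease 0, drug 1 resembles drug 0, drug 2 treats disease 1.
toy_drug_sim = [[1.0, 0.9, 0.1], [0.9, 1.0, 0.1], [0.1, 0.1, 1.0]]
toy_disease_sim = [[1.0, 0.2], [0.2, 1.0]]
toy_assoc = [[1, 0], [0, 0], [0, 1]]
drug_scores = rwr_heterogeneous(toy_drug_sim, toy_disease_sim, toy_assoc, seed_disease=0)
```

In this toy setup drug 1 has no known indication, yet it outranks drug 2 for disease 0 because it is highly similar to a drug that already treats that disease, which is the kind of inference a repositioning method relies on.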
Affiliation(s)
- Hui Liu
- Changzhou NO. 7 People's Hospital, Changzhou, Jiangsu, 213011, China; Changzhou University, Jiangsu, 213164, China
- Yinglong Song
- Shanghai Key Lab of Intelligent Information Processing, School of Computer Science, Fudan University, Shanghai, 200433, China
- Jihong Guan
- Department of Computer Science and Technology, Tongji University, Shanghai, 201804, China
- Libo Luo
- Changzhou NO. 7 People's Hospital, Changzhou, Jiangsu, 213011, China
- Ziheng Zhuang
- Changzhou NO. 7 People's Hospital, Changzhou, Jiangsu, 213011, China; Changzhou University, Jiangsu, 213164, China
|
21
|
Chen Y, Xu R. Drug repurposing for glioblastoma based on molecular subtypes. J Biomed Inform 2016; 64:131-138. [PMID: 27697594 PMCID: PMC6146394 DOI: 10.1016/j.jbi.2016.09.019] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/05/2016] [Revised: 08/23/2016] [Accepted: 09/27/2016] [Indexed: 01/12/2023]
Abstract
A recent multi-platform analysis by The Cancer Genome Atlas identified four distinct molecular subtypes for glioblastoma (GBM) and demonstrated that the subtypes correlate with clinical phenotypes and treatment responses. In this study, we developed a computational drug repurposing approach to predict GBM drugs based on the molecular subtypes. Our approach leverages the genomic signature for each GBM subtype, and integrates the human cancer genomics with mouse phenotype data to identify the opportunity of reusing the FDA-approved agents to treat specific GBM subtypes. Specifically, we first constructed the phenotype profile for each GBM subtype using their genomic signatures. For each approved drug, we also constructed a phenotype profile using the drug target genes. Then we developed an algorithm to match and prioritize drugs based on their phenotypic similarities to the GBM subtypes. Our approach is highly generalizable for other disorders if provided with a list of disorder-specific genes. We first evaluated the approach in predicting drugs for the whole GBM. For a combined set of approved, potential and off-label GBM drugs, we achieved a median rank of 9.3%, which is significantly higher (p
Affiliation(s)
- Yang Chen
- Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, OH 44106, United States
- Rong Xu
- Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, OH 44106, United States
|
22
|
Ni J, Koyuturk M, Tong H, Haines J, Xu R, Zhang X. Disease gene prioritization by integrating tissue-specific molecular networks using a robust multi-network model. BMC Bioinformatics 2016; 17:453. [PMID: 27829360 PMCID: PMC5103411 DOI: 10.1186/s12859-016-1317-x] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/10/2016] [Accepted: 10/29/2016] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Accurately prioritizing candidate disease genes is an important and challenging problem. Various network-based methods have been developed to predict potential disease genes by utilizing the disease similarity network and molecular networks such as protein interaction or gene co-expression networks. Although successful, a common limitation of the existing methods is that they assume all diseases share the same molecular network and a single generic molecular network is used to predict candidate genes for all diseases. However, different diseases tend to manifest in different tissues, and the molecular networks in different tissues are usually different. An ideal method should be able to incorporate tissue-specific molecular networks for different diseases. RESULTS In this paper, we develop a robust and flexible method to integrate tissue-specific molecular networks for disease gene prioritization. Our method allows each disease to have its own tissue-specific network(s). We formulate the problem of candidate gene prioritization as an optimization problem based on network propagation. When there are multiple tissue-specific networks available for a disease, our method can automatically infer the relative importance of each tissue-specific network. Thus it is robust to the noisy and incomplete network data. To solve the optimization problem, we develop fast algorithms which have linear time complexities in the number of nodes in the molecular networks. We also provide rigorous theoretical foundations for our algorithms in terms of their optimality and convergence properties. Extensive experimental results show that our method can significantly improve the accuracy of candidate gene prioritization compared with the state-of-the-art methods. CONCLUSIONS In our experiments, we compare our methods with 7 popular network-based disease gene prioritization algorithms on diseases from Online Mendelian Inheritance in Man (OMIM) database. 
The experimental results demonstrate that our methods recover true associations more accurately than other methods in terms of AUC values, and the performance differences are significant (with paired t-test p-values less than 0.05). This validates the importance of integrating tissue-specific molecular networks for disease gene prioritization and shows the superiority of our network models and ranking algorithms for this purpose. The source code and datasets are available at http://nijingchao.github.io/CRstar/ .
Affiliation(s)
- Jingchao Ni
- Department of Electrical Engineering and Computer Science, Case Western Reserve University, 10900 Euclid Avenue, Cleveland, 44106, OH, USA
- Mehmet Koyuturk
- Department of Electrical Engineering and Computer Science, Case Western Reserve University, 10900 Euclid Avenue, Cleveland, 44106, OH, USA
- Hanghang Tong
- School of Computing, Informatics, Decision Systems Engineering, Arizona State University, 699 S. Mill Ave., Tempe, 85281, AZ, USA
- Jonathan Haines
- Department of Epidemiology and Biostatistics, Case Western Reserve University, 10900 Euclid Avenue, Cleveland, 44106, OH, USA
- Rong Xu
- Department of Epidemiology and Biostatistics, Case Western Reserve University, 10900 Euclid Avenue, Cleveland, 44106, OH, USA
- Xiang Zhang
- College of Information Sciences and Technology, Pennsylvania State University, 332 Information Sciences and Technology Building, University Park, 16802, PA, USA
|
23
|
Chen Y, Xu R. Phenome-based gene discovery provides information about Parkinson's disease drug targets. BMC Genomics 2016; 17 Suppl 5:493. [PMID: 27586503 PMCID: PMC5009520 DOI: 10.1186/s12864-016-2820-1] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/17/2023] Open
Abstract
BACKGROUND Parkinson's disease (PD) is a severe neurodegenerative disease without curative drugs. The highly complex and heterogeneous disease mechanisms are still unclear. Detecting novel PD-associated genes not only contributes to revealing the disease pathogenesis, but also facilitates the discovery of new drug targets. METHODS We propose a phenome-based gene prediction strategy to identify disease-associated genes for PD. We integrated multiple disease phenotype networks, a gene functional relationship network, and known PD genes to predict novel candidate genes. Then we investigated the translational potential of the predicted genes in drug discovery. RESULTS In a cross validation analysis, the average rank for 15 known PD genes is within the top 0.8%. We also tested the algorithm with an independent validation set of 669 PD-associated genes detected by genome-wide association studies. The top-ranked genes predicted by our approach are enriched for these validation genes. In addition, our approach prioritized the target genes for FDA-approved PD drugs and for drugs that have been tested for PD in clinical trials. Pathway analysis shows that the prioritized drug target genes are closely associated with PD pathogenesis. The result provides empirical evidence that our computational gene prediction approach identifies novel candidate genes for PD, and has the potential to lead to rapid drug discovery.
Affiliation(s)
- Yang Chen
- Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, OH, USA
- Rong Xu
- Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, OH, USA
|
24
|
Abstract
BACKGROUND Alzheimer's disease (AD) is complex, with genetic, epigenetic, and environmental factors contributing to disease susceptibility and progression. While significant progress has been made in understanding genetic, molecular, behavioral, and neurological aspects of AD, relatively little is known about which environmental factors are important in AD etiology and how they interact with genetic factors in the development of AD. Here, we propose a data-driven, hypothesis-free computational approach to characterize which human gut microbial metabolites, an important modifiable environmental factor, may contribute to various aspects of AD, and how. MATERIALS AND METHODS We integrated vast amounts of complex and heterogeneous biomedical data, including disease genetics, chemical genetics, human microbial metabolites, protein-protein interactions, and genetic pathways. We developed a novel network-based approach to model the genetic interactions between all human microbial metabolites and genetic diseases. We identified metabolites that share significant genetic commonality with AD in humans. We developed signal prioritization algorithms to identify the co-regulated genetic pathways underlying the identified AD-metabolite (brain-gut) connections. RESULTS We validated our algorithms using known microbial metabolite-AD associations, namely AD-3,4-dihydroxybenzeneacetic acid, AD-mannitol, and AD-succinic acid. Our study provides supporting evidence that human gut microbial metabolites may be an important mechanistic link between environmental exposure and various aspects of AD. We identified metabolites that are significantly associated with various aspects of AD, including AD susceptibility, cognitive decline, biomarkers, age of onset, and the onset of AD. We identified common genetic pathways underlying AD biomarkers and their top-ranked metabolite, trimethylamine N-oxide (TMAO), a gut microbial metabolite of dietary meat and fat.
These coregulated pathways between TMAO-AD may provide insights into the mechanisms of how dietary meat and fat contribute to AD. CONCLUSIONS Employing an integrated computational approach, we provide intriguing and supporting evidence for a role of microbial metabolites, an important modifiable environmental factor, in AD etiology. Our study provides the foundations for subsequent hypothesis-driven biological and clinical studies of brain-gut-environment interactions in AD.
Affiliation(s)
- Rong Xu
- Department of Epidemiology and Biostatistics, Institute of Computational Biology, School of Medicine, Case Western Reserve University, 2103 Cornell Road, Cleveland, 44106 USA
|
25
|
Chen Y, Gao Z, Wang B, Xu R. Towards precision medicine-based therapies for glioblastoma: interrogating human disease genomics and mouse phenotypes. BMC Genomics 2016; 17 Suppl 7:516. [PMID: 27557118 PMCID: PMC5001238 DOI: 10.1186/s12864-016-2908-7] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Indexed: 01/09/2023] Open
Abstract
BACKGROUND Glioblastoma (GBM) is the most common and aggressive brain tumor. It has a poor prognosis even with optimal radio- and chemotherapies. Since GBM is highly heterogeneous, drugs that target the specific molecular profiles of individual tumors may achieve maximized efficacy. Currently, the Cancer Genome Atlas (TCGA) projects have identified hundreds of GBM-associated genes. We develop a drug repositioning approach combining disease genomics and mouse phenotype data towards predicting targeted therapies for GBM. METHODS We first identified disease-specific mouse phenotypes using the most recently discovered GBM genes. Then we systematically searched all FDA-approved drugs for candidates that share similar mouse phenotype profiles with GBM. We evaluated the ranks for approved and novel GBM drugs, and compared them with an existing approach, which also uses the mouse phenotype data but not the disease genomics data. RESULTS We achieved significantly higher ranks for the approved and novel GBM drugs than the earlier approach. For all positive examples of GBM drugs, we achieved a median rank of 9.2%. Of the top predictions, 45.6% have been demonstrated effective in inhibiting the growth of human GBM cells. CONCLUSION We developed a computational drug repositioning approach based on both genomic and phenotypic data. Our approach prioritized existing GBM drugs and outperformed a recent approach. Overall, our approach shows potential in discovering new targeted therapies for GBM.
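The profile-matching step described in this abstract (searching drugs for candidates whose mouse phenotype profiles resemble the disease's, then summarizing where known drugs land in the ranking) can be illustrated with a minimal sketch. The abstract does not state which similarity measure was used, so cosine similarity here is an assumption, and all phenotype names, drug names, and weights are hypothetical.

```python
import math
from statistics import median

def cosine_similarity(a, b):
    """Cosine similarity between two sparse profiles (dict: phenotype -> weight)."""
    shared = set(a) & set(b)
    dot = sum(a[k] * b[k] for k in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_drugs(disease_profile, drug_profiles):
    """Return drug names sorted by decreasing similarity to the disease profile."""
    scored = [(cosine_similarity(disease_profile, profile), name)
              for name, profile in drug_profiles.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # ties broken by name
    return [name for _, name in scored]

def percent_rank(ranking, drug):
    """Position of a drug in the ranking, as a percentage of all candidates."""
    return 100.0 * (ranking.index(drug) + 1) / len(ranking)

# Hypothetical toy data, not taken from the paper.
disease = {"abnormal_cell_proliferation": 1.0, "abnormal_brain_morphology": 0.5}
drugs = {
    "drug_x": {"abnormal_cell_proliferation": 1.0},
    "drug_y": {"abnormal_brain_morphology": 1.0},
    "drug_z": {"abnormal_heart_rate": 1.0},
}
ranking = rank_drugs(disease, drugs)
known_positive = ["drug_x", "drug_y"]  # assumed "approved" drugs for the toy case
median_rank = median(percent_rank(ranking, d) for d in known_positive)
print(ranking, median_rank)
```

A lower median percent rank for known drugs indicates that the ranking places them nearer the top of the candidate list.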
Affiliation(s)
- Yang Chen
- Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, Ohio, USA
- Zhen Gao
- Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, Ohio, USA
- Bingcheng Wang
- Department of Pharmacology, Case Western Reserve University, Cleveland, Ohio, USA
- Rong Xu
- Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, Ohio, USA.
|
26
|
Xu R, Wang Q. A genomics-based systems approach towards drug repositioning for rheumatoid arthritis. BMC Genomics 2016; 17 Suppl 7:518. [PMID: 27557330 PMCID: PMC5001200 DOI: 10.1186/s12864-016-2910-0] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Indexed: 12/29/2022] Open
Abstract
Background Rheumatoid arthritis (RA) is a chronic autoimmune disease characterized by inflammation and destruction of synovial joints. RA affects up to 1% of the population worldwide. Currently, there are no drugs that can cure RA or achieve sustained remission. The unknown cause of the disease represents a significant challenge in drug development. In this study, we address this challenge by proposing an alternative drug discovery approach that integrates and reasons over genetic interrelationships between RA and other genetic diseases as well as a large amount of higher-level drug treatment data. We first constructed a genetic disease network using disease genetics data from Genome-Wide Association Studies (GWAS). We developed a network-based ranking algorithm to prioritize diseases genetically related to RA (RA-related diseases). We then developed a drug prioritization algorithm to reposition drugs from RA-related diseases to treat RA. Results Our algorithm found 74 of the 80 FDA-approved RA drugs and ranked them highly (recall: 0.925, median ranking: 8.93%), demonstrating the validity of our strategy. When compared to a study that used GWAS data to directly connect RA-associated genes to drug targets (“direct genetics-based” approach), our algorithm (“indirect genetics-based”) achieved a comparable overall performance, but complementary precision and recall in retrospective validation (precision: 0.22, recall: 0.36; F1: 0.27 vs. precision: 0.74, recall: 0.16; F1: 0.28). Our approach performed significantly better in novel predictions when evaluated using 165 not-yet-FDA-approved RA drugs (precision: 0.46, recall: 0.50; F1: 0.47 vs. precision: 0.40, recall: 0.006; F1: 0.01).
Conclusions In summary, although the fundamental pathophysiological mechanisms remain uncharacterized, our proposed computation-based drug discovery approach to analyzing genetic and treatment interrelationships among thousands of diseases and drugs can facilitate the discovery of innovative drugs for treating RA.
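The recall and F1 figures quoted in this abstract follow the standard definitions, which the short sketch below works through. The function names are illustrative, and the inputs are the rounded values printed in the abstract rather than the authors' raw counts.

```python
def recall(found: int, total: int) -> float:
    """Fraction of known positives that the method retrieved."""
    return found / total

def f1_score(precision: float, recall_value: float) -> float:
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall_value == 0:
        return 0.0
    return 2 * precision * recall_value / (precision + recall_value)

# 74 of the 80 FDA-approved RA drugs were found.
print(recall(74, 80))                   # 0.925
# Retrospective validation, "indirect genetics-based" approach.
print(round(f1_score(0.22, 0.36), 2))   # 0.27
```

Because the abstract reports precision and recall already rounded, F1 recomputed from those values can differ from a reported F1 in the last digit.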
Affiliation(s)
- Rong Xu
- Department of Epidemiology and Biostatistics, Institute of Computational Biology, School of Medicine, Case Western Reserve University, 2103 Cornell Road, Cleveland, 44106, OH, USA.
|
27
|
Cai X, Chen Y, Gao Z, Xu R. Explore Small Molecule-induced Genome-wide Transcriptional Profiles for Novel Inflammatory Bowel Disease Drug. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE PROCEEDINGS. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE 2016; 2016:22-31. [PMID: 27570643 PMCID: PMC5001780] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/06/2023]
Abstract
Inflammatory Bowel Disease (IBD) is a chronic and relapsing disorder that affects millions of people worldwide. Current drug options cannot cure the disease and may cause severe side effects. We developed a systematic framework to identify novel IBD drugs exploiting millions of genomic signatures for chemical compounds. Specifically, we searched all FDA-approved drugs for candidates that share similar genomic profiles with IBD. In the evaluation experiments, our approach ranked approved IBD drugs on average within the top 26% among 858 candidates, significantly outperforming a state-of-the-art genomics-based drug repositioning method (p-value < e-8). Our approach also achieved significantly higher average precision than the state-of-the-art approach in predicting potential IBD drugs from clinical trials (0.072 vs. 0.043, p<0.1) and off-label IBD drugs (0.198 vs. 0.138, p<0.1). Furthermore, we found evidence supporting the therapeutic potential of the top-ranked drugs, such as Naloxone, in the literature and through analyzing target genes and pathways.
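Average precision, the metric compared in this abstract, rewards rankings that place true positives near the top. The sketch below uses the common definition (mean of the precision values at each rank where a positive appears); whether the authors used exactly this variant is an assumption, and the drug names are hypothetical.

```python
def average_precision(ranked_items, positives):
    """Mean of precision@k over the ranks k at which a positive item appears.

    Positives missing from the ranking contribute zero, since the sum is
    divided by the total number of positives.
    """
    positives = set(positives)
    if not positives:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked_items, start=1):
        if item in positives:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(positives)

# Hypothetical ranking of four candidates, two of which are true positives.
ranking = ["drug_a", "drug_b", "drug_c", "drug_d"]
ap = average_precision(ranking, {"drug_a", "drug_c"})
print(ap)  # (1/1 + 2/3) / 2
```

Averaging this quantity over several evaluation sets gives the mean average precision reported in some of the other abstracts in this list.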
Affiliation(s)
- Xiaoshu Cai
- Department of Electrical Engineering and Computer Science, School of Engineering, Case Western Reserve University, Cleveland, Ohio, USA
- Yang Chen
- Department of Epidemiology & Biostatistics, School of Medicine, Case Western Reserve University, Cleveland, Ohio, USA
- Zhen Gao
- Department of Epidemiology & Biostatistics, School of Medicine, Case Western Reserve University, Cleveland, Ohio, USA
- Rong Xu
- Department of Epidemiology & Biostatistics, School of Medicine, Case Western Reserve University, Cleveland, Ohio, USA
|
28
|
Chen Y, Cai X, Xu R. Combining Human Disease Genetics and Mouse Model Phenotypes towards Drug Repositioning for Parkinson's disease. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2015; 2015:1851-60. [PMID: 26958284 PMCID: PMC4765695] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/05/2023]
Abstract
Parkinson's disease (PD) is a severe neurodegenerative disorder without effective treatments. Here, we present a novel drug repositioning approach to predict new drugs for PD, leveraging both disease genetics and large amounts of mouse model phenotypes. First, we identified PD-specific mouse phenotypes using well-studied human disease genes. Then we searched all FDA-approved drugs for candidates that share similar mouse phenotype profiles with PD. We demonstrated the validity of our approach using drugs that have been approved for PD: 10 approved PD drugs were ranked within the top 10% among 1197 candidates. In predicting novel PD drugs, our approach achieved a mean average precision of 0.24, which is significantly higher (p
Affiliation(s)
- Yang Chen
- Department of Electrical Engineering and Computer Science, School of Engineering, Case Western Reserve University, Cleveland, Ohio, USA
- Xiaoshu Cai
- Department of Electrical Engineering and Computer Science, School of Engineering, Case Western Reserve University, Cleveland, Ohio, USA
- Rong Xu
- Department of Epidemiology and Biostatistics, School of Medicine, Case Western Reserve University, Cleveland, Ohio, USA
|
29
|
Wang Q, Xu R. DenguePredict: An Integrated Drug Repositioning Approach towards Drug Discovery for Dengue. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2015; 2015:1279-88. [PMID: 26958268 PMCID: PMC4765554] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/05/2023]
Abstract
Dengue is a viral disease of expanding global incidence without cures. Here we present a drug repositioning system (DenguePredict) that leverages a unique drug treatment database and vast amounts of disease- and drug-related data. We first constructed a large-scale genetic disease network with enriched dengue genetics data curated from biomedical literature. We applied a network-based ranking algorithm to find dengue-related diseases from the disease network. We then developed a novel algorithm to prioritize FDA-approved drugs from dengue-related diseases to treat dengue. When tested in a de novo validation setting, DenguePredict found the only two drugs tested in clinical trials for treating dengue and ranked them highly: chloroquine ranked in the top 0.96% and ivermectin in the top 22.75%. We showed that drugs targeting immune systems and arachidonic acid metabolism-related apoptotic pathways might represent innovative drugs to treat dengue. In summary, DenguePredict, by combining comprehensive disease- and drug-related data and novel algorithms, may greatly facilitate drug discovery for dengue.
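The abstract mentions a network-based ranking algorithm over a genetic disease network but does not name it. As a rough illustration of what such a ranking can look like, the sketch below uses random walk with restart, a common choice for this kind of task; treating it as a stand-in for the DenguePredict algorithm is an assumption, and the toy network and node names are hypothetical.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Score nodes by proximity to the seed nodes on an undirected graph.

    adjacency: dict mapping each node to a list of its neighbors
               (every node is assumed to have at least one neighbor).
    seeds:     set of nodes the walker restarts from.
    """
    nodes = sorted(adjacency)
    restart_vec = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    scores = dict(restart_vec)
    for _ in range(max_iter):
        # Restart mass goes back to the seeds; the rest flows along edges.
        new_scores = {n: restart * restart_vec[n] for n in nodes}
        for node in nodes:
            neighbors = adjacency[node]
            if not neighbors:
                continue
            share = (1.0 - restart) * scores[node] / len(neighbors)
            for neighbor in neighbors:
                new_scores[neighbor] += share
        delta = sum(abs(new_scores[n] - scores[n]) for n in nodes)
        scores = new_scores
        if delta < tol:
            break
    return scores

# Hypothetical toy disease network (edges stand for shared genes).
network = {
    "dengue": ["disease_a", "disease_b"],
    "disease_a": ["dengue", "disease_b"],
    "disease_b": ["dengue", "disease_a", "disease_c"],
    "disease_c": ["disease_b"],
}
scores = random_walk_with_restart(network, {"dengue"})
related = sorted((n for n in scores if n != "dengue"), key=lambda n: -scores[n])
print(related)
```

Diseases ranked highest by such a score would then supply candidate drugs for the prioritization step that the abstract describes.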
Affiliation(s)
- Rong Xu
- Department of Epidemiology and Biostatistics, Institute of Computational Biology, School of Medicine, Case Western Reserve University, Cleveland, OH
|