101
|
Ratnakumar A, Weinhold N, Mar JC, Riaz N. Protein-Protein interactions uncover candidate 'core genes' within omnigenic disease networks. PLoS Genet 2020; 16:e1008903. [PMID: 32678846 PMCID: PMC7390454 DOI: 10.1371/journal.pgen.1008903] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 09/24/2019] [Revised: 07/29/2020] [Accepted: 06/01/2020] [Indexed: 01/09/2023] Open
Abstract
Genome-wide association studies (GWAS) of human diseases have generally identified many loci associated with risk with relatively small effect sizes. The omnigenic model attempts to explain this observation by suggesting that diseases can be thought of as networks, where genes with direct involvement in disease-relevant biological pathways are named ‘core genes’, while peripheral genes influence disease risk via their interactions or regulatory effects on core genes. Here, we demonstrate a method for identifying candidate core genes solely from genes in or near disease-associated SNPs (GWAS hits) in conjunction with protein-protein interaction network data. Applied to 1,381 GWAS studies from 5 ancestries, we identify a total of 1,865 candidate core genes in 343 GWAS studies. Our analysis identifies several well-known disease-related genes that are not identified by GWAS, including BRCA1 in Breast Cancer, Amyloid Precursor Protein (APP) in Alzheimer’s Disease, INS in A1C measurement and Type 2 Diabetes, and PCSK9 in LDL cholesterol, amongst others. Notably, candidate core genes are preferentially enriched for disease relevance over GWAS hits and are enriched for both Clinvar pathogenic variants and known drug targets, consistent with the predictions of the omnigenic model. We subsequently use parent term annotations provided by the GWAS catalog to merge related GWAS studies and identify candidate core genes in over-arching disease processes such as cancer, where we identify 109 candidate core genes.
A recent theory suggests that only a small number of genes underpin the biology of a disease; these genes are called ‘core genes’, and for most diseases they remain unknown. The suggested methods for finding them require complex and expensive experiments. We reasoned that if we merge currently available datasets in smart ways, we may be able to uncover these ‘core genes’. Our method finds “hub” proteins by merging lists of genes previously linked with disease with information on how proteins interact with each other. We found that many of these hub proteins have central roles in disease, such as insulin for both A1C measurement and Type 2 Diabetes, BRCA1 in Breast cancer, and Amyloid Precursor Protein in Alzheimer’s Disease. We think these ‘hub’ proteins are candidate ‘core genes’, and offer our method as a way to find ‘core genes’ by utilizing publicly available reference datasets.
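The hub idea described in this abstract can be illustrated with a minimal sketch. The abstract does not state which statistical test the authors use, so the hypergeometric enrichment test, the `alpha` cutoff, the function names, and the toy network below are all assumptions made for illustration, not the published method.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) when drawing n items without replacement from N items, K of which are 'hits'."""
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)

def candidate_hubs(ppi_edges, gwas_genes, alpha=0.05):
    """Return proteins whose interaction partners are enriched for GWAS hits (assumed test)."""
    neighbors = {}
    for a, b in ppi_edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    all_genes = set(neighbors)
    hits = gwas_genes & all_genes
    hubs = {}
    for gene, nbrs in neighbors.items():
        pool_size = len(all_genes) - 1      # every other gene in the network
        hits_in_pool = len(hits - {gene})   # GWAS hits among those genes
        k = len(nbrs & hits)                # GWAS hits among this gene's partners
        p = hypergeom_sf(k, pool_size, hits_in_pool, len(nbrs))
        if k > 0 and p < alpha:
            hubs[gene] = p
    return hubs

# Hypothetical toy network: "HUB" is not a GWAS hit itself but binds all five hits.
edges = [("HUB", f"G{i}") for i in range(1, 6)]
edges += [(f"B{i}", f"B{i + 1}") for i in range(1, 20)]  # background chain
edges.append(("G1", "B1"))
gwas_hits = {f"G{i}" for i in range(1, 6)}
hubs = candidate_hubs(edges, gwas_hits)
print(hubs)
```

In this toy case only `HUB` is returned, which mirrors the abstract's point that a candidate core gene need not be a GWAS hit itself.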
Affiliation(s)
- Abhirami Ratnakumar
- Department of Radiation Oncology, Memorial Sloan Kettering Cancer Center, New York, New York, United States of America
| | - Nils Weinhold
- Department of Radiation Oncology, Memorial Sloan Kettering Cancer Center, New York, New York, United States of America
| | - Jessica C. Mar
- Australian Institute for Bioengineering and Nanotechnology, University of Queensland, Brisbane, Australia
| | - Nadeem Riaz
- Department of Radiation Oncology, Memorial Sloan Kettering Cancer Center, New York, New York, United States of America
|
102
|
Shen C, Luo J, Lai Z, Ding P. Multiview Joint Learning-Based Method for Identifying Small-Molecule-Associated MiRNAs by Integrating Pharmacological, Genomics, and Network Knowledge. J Chem Inf Model 2020; 60:4085-4097. [DOI: 10.1021/acs.jcim.0c00244] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 12/13/2022]
Affiliation(s)
- Cong Shen
- College of Computer Science and Electronic Engineering, Hunan University, Changsha 410083, China
| | - Jiawei Luo
- College of Computer Science and Electronic Engineering, Hunan University, Changsha 410083, China
| | - Zihan Lai
- College of Computer Science and Electronic Engineering, Hunan University, Changsha 410083, China
| | - Pingjian Ding
- School of Computer Science, University of South China, Hengyang 421001, China
|
103
|
Sarkar D, Maranas CD. SNPeffect: identifying functional roles of SNPs using metabolic networks. Plant J 2020; 103:512-531. [PMID: 32167625 PMCID: PMC9328443 DOI: 10.1111/tpj.14746] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 10/30/2019] [Accepted: 02/20/2020] [Indexed: 05/04/2023]
Abstract
Genetic sources of phenotypic variation have been a focus of plant studies aimed at improving agricultural yield and understanding adaptive processes. Genome-wide association studies identify the genetic background behind a trait by examining associations between phenotypes and single-nucleotide polymorphisms (SNPs). Although such studies are common, biological interpretation of the results remains a challenge; especially due to the confounding nature of population structure and the systematic biases thus introduced. Here, we propose a complementary analysis (SNPeffect) that offers putative genotype-to-phenotype mechanistic interpretations by integrating biochemical knowledge encoded in metabolic models. SNPeffect is used to explain differential growth rate and metabolite accumulation in A. thaliana and P. trichocarpa accessions as the outcome of SNPs in enzyme-coding genes. To this end, we also constructed a genome-scale metabolic model for Populus trichocarpa, the first for a perennial woody tree. As expected, our results indicate that growth is a complex polygenic trait governed by carbon and energy partitioning. The predicted set of functional SNPs in both species are associated with experimentally characterized growth-determining genes and also suggest putative ones. Functional SNPs were found in pathways such as amino acid metabolism, nucleotide biosynthesis, and cellulose and lignin biosynthesis, in line with breeding strategies that target pathways governing carbon and energy partition.
Affiliation(s)
- Debolina Sarkar
- Department of Chemical Engineering, Pennsylvania State University, University Park, PA, USA
| | - Costas D. Maranas
- Department of Chemical Engineering, Pennsylvania State University, University Park, PA, USA
|
104
|
Lu Y, Li Y, Li G, Lu H. Identification of potential markers for type 2 diabetes mellitus via bioinformatics analysis. Mol Med Rep 2020; 22:1868-1882. [PMID: 32705173 PMCID: PMC7411335 DOI: 10.3892/mmr.2020.11281] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 03/20/2019] [Accepted: 01/20/2020] [Indexed: 12/15/2022] Open
Abstract
Type 2 diabetes mellitus (T2DM) is a multifactorial and multigenetic disease, and its pathogenesis is complex and largely unknown. In the present study, microarray data (GSE20966) of β-cell enriched tissue obtained by laser capture microdissection were downloaded, including 10 control and 10 type 2 diabetic subjects. A comprehensive bioinformatics analysis of the microarray data in the context of protein-protein interaction (PPI) networks was employed, combined with subcellular location information, to mine potential candidate genes for T2DM and provide further insight into the possible mechanisms involved. First, differential analysis screened 108 differentially expressed genes. Then, network analysis identified 83 candidate genes in the layered PPI network, which were either directly or indirectly linked to T2DM. Literature retrieval showed that 27 of the 83 genes were involved in the development of T2DM, whereas the remaining 56 genes still need to be verified by experiments. Functional analysis showed that the candidate genes were involved in a number of biological activities: 46 upregulated candidate genes were involved in ‘inflammatory response’ and ‘lipid metabolic process’, and 37 downregulated candidate genes were involved in ‘positive regulation of cell death’ and ‘positive regulation of cell proliferation’. These candidate genes were also involved in different signaling pathways, including the ‘PI3K/Akt signaling pathway’, ‘Rap1 signaling pathway’, ‘Ras signaling pathway’ and ‘MAPK signaling pathway’, which are highly associated with the development of T2DM. Furthermore, a microRNA (miR)-target gene regulatory network and a transcription factor-target gene regulatory network were constructed based on the miRNet and NetworkAnalyst databases, respectively.
Notably, hsa-miR-192-5p, hsa-miR-124-5p and hsa-miR-335-5p appeared to be involved in T2DM by potentially regulating the expression of various candidate genes, including procollagen C-endopeptidase enhancer 2, connective tissue growth factor and family with sequence similarity 105, member A, protein phosphatase 1 regulatory inhibitor subunit 1 A and C-C motif chemokine receptor 4. Smad5 and Bcl6, as transcription factors, are regulated by ankyrin repeat domain 23 and transmembrane protein 37, respectively, which might also be used in the molecular diagnosis and targeted therapy of T2DM. Taken together, the results of the present study may offer insight for future genomic-based individualized treatment of T2DM and help determine the underlying molecular mechanisms that lead to T2DM.
Affiliation(s)
- Yana Lu
- Key Laboratory of Dai and Southern Medicine of Xishuangbanna Dai Autonomous Prefecture, Yunnan Branch, Institute of Medicinal Plant Development, Chinese Academy of Medical Sciences and Peking Union Medical College, Jinghong, Yunnan 666100, P.R. China
| | - Yihang Li
- Key Laboratory of Dai and Southern Medicine of Xishuangbanna Dai Autonomous Prefecture, Yunnan Branch, Institute of Medicinal Plant Development, Chinese Academy of Medical Sciences and Peking Union Medical College, Jinghong, Yunnan 666100, P.R. China
| | - Guang Li
- Key Laboratory of Dai and Southern Medicine of Xishuangbanna Dai Autonomous Prefecture, Yunnan Branch, Institute of Medicinal Plant Development, Chinese Academy of Medical Sciences and Peking Union Medical College, Jinghong, Yunnan 666100, P.R. China
| | - Haitao Lu
- Key Laboratory of Systems Biomedicine, Shanghai Center for Systems Biomedicine, Shanghai Jiao Tong University, Shanghai 200240, P.R. China
|
105
|
Hristov BH, Chazelle B, Singh M. uKIN Combines New and Prior Information with Guided Network Propagation to Accurately Identify Disease Genes. Cell Syst 2020; 10:470-479.e3. [PMID: 32684276 PMCID: PMC7821437 DOI: 10.1016/j.cels.2020.05.008] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 04/01/2020] [Revised: 04/24/2020] [Accepted: 05/19/2020] [Indexed: 12/23/2022]
Abstract
Protein interaction networks provide a powerful framework for identifying genes causal for complex genetic diseases. Here, we introduce a general framework, uKIN, that uses prior knowledge of disease-associated genes to guide, within known protein-protein interaction networks, random walks that are initiated from newly identified candidate genes. In large-scale testing across 24 cancer types, we demonstrate that our network propagation approach for integrating both prior and new information not only better identifies cancer driver genes than using either source of information alone but also readily outperforms other state-of-the-art network-based approaches. We also apply our approach to genome-wide association data to identify genes functionally relevant for several complex diseases. Overall, our work suggests that guided network propagation approaches that utilize both prior and new data are a powerful means to identify disease genes. uKIN is freely available for download at: https://github.com/Singh-Lab/uKIN.
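A guided random walk of the kind described here can be sketched as a random walk with restart whose steps are biased toward nodes with high prior-knowledge scores. This is a simplified illustration and not the uKIN implementation: the per-node `prior` scores, the `restart` probability, the smoothing constant `eps`, and the toy graph are all assumptions.

```python
def guided_walk(adj, prior, start, restart=0.5, iters=200, eps=1e-3):
    """Random walk with restart at `start`, with steps biased toward high-prior neighbors."""
    nodes = sorted(adj)
    # Transition probability u -> v is proportional to prior[v] + eps.
    trans = {}
    for u in nodes:
        weights = {v: prior.get(v, 0.0) + eps for v in adj[u]}
        total = sum(weights.values())
        trans[u] = {v: w / total for v, w in weights.items()}
    # The walk starts (and restarts) uniformly at the newly implicated genes.
    p0 = {u: (1.0 / len(start) if u in start else 0.0) for u in nodes}
    p = dict(p0)
    for _ in range(iters):
        nxt = {u: restart * p0[u] for u in nodes}
        for u in nodes:
            for v, t in trans[u].items():
                nxt[v] += (1.0 - restart) * p[u] * t
        p = nxt
    return p

# Hypothetical toy network: NEW is a newly implicated gene, KNOWN a prior disease gene.
adj = {
    "NEW": ["A", "B"],
    "A": ["NEW", "KNOWN"],
    "KNOWN": ["A"],
    "B": ["NEW", "C"],
    "C": ["B"],
}
prior = {"KNOWN": 1.0, "A": 0.5}  # assumed prior scores, e.g. already spread from KNOWN
scores = guided_walk(adj, prior, start={"NEW"})
for gene, s in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(gene, round(s, 4))
```

In this toy graph the walk visits `A` far more often than `B`, because `A` lies on the path from the new gene toward prior knowledge. That captures the intuition of combining both information sources, under the stated assumptions.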
Affiliation(s)
- Borislav H Hristov
- Department of Computer Science, Princeton University, Princeton, NJ 08544, USA; Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA
| | - Bernard Chazelle
- Department of Computer Science, Princeton University, Princeton, NJ 08544, USA
| | - Mona Singh
- Department of Computer Science, Princeton University, Princeton, NJ 08544, USA; Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA.
|
106
|
Ruan P, Wang Y, Shen R, Wang S. Using association signal annotations to boost similarity network fusion. Bioinformatics 2019; 35:3718-3726. [PMID: 30863842 DOI: 10.1093/bioinformatics/btz124] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 10/20/2018] [Revised: 01/17/2019] [Accepted: 02/15/2019] [Indexed: 01/13/2023] Open
Abstract
MOTIVATION: Recent technology developments have made it possible to generate various kinds of omics data, which provides opportunities to better solve problems such as disease subtyping or disease mapping using more comprehensive omics data jointly. Among many developed data-integration methods, the similarity network fusion (SNF) method has shown a great potential to identify new disease subtypes through separating similar subjects using multi-omics data. SNF effectively fuses similarity networks with pairwise patient similarity measures from different types of omics data into one fused network using both shared and complementary information across multiple types of omics data.
RESULTS: In this article, we proposed an association-signal-annotation boosted similarity network fusion (ab-SNF) method, adding feature-level association signal annotations as weights aiming to up-weight signal features and down-weight noise features when constructing subject similarity networks to boost the performance in disease subtyping. In various simulation studies, the proposed ab-SNF outperforms the original SNF approach without weights. Most importantly, the improvement in the subtyping performance due to association-signal-annotation weights is amplified in the integration process. Applications to somatic mutation data, DNA methylation data and gene expression data of three cancer types from The Cancer Genome Atlas project suggest that the proposed ab-SNF method consistently identifies new subtypes in each cancer that more accurately predict patient survival and are more biologically meaningful.
AVAILABILITY AND IMPLEMENTATION: The R package abSNF is freely available for downloading from https://github.com/pfruan/abSNF.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
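The feature-weighting step can be illustrated with a small sketch. It shows only how per-feature weights change a patient similarity matrix; the fusion step of SNF is not shown, and the exponential kernel, the `sigma` parameter, the function names, and the toy data are assumptions rather than the abSNF package's actual interface.

```python
from math import exp

def weighted_sq_dist(x, y, w):
    """Squared Euclidean distance with one non-negative weight per feature."""
    return sum(wi * (a - b) ** 2 for a, b, wi in zip(x, y, w))

def similarity_matrix(samples, weights, sigma=1.0):
    """Patient-by-patient similarity from an exponential kernel on the weighted distance."""
    total = sum(weights)
    w = [wi / total for wi in weights]  # normalise so that the weights sum to 1
    n = len(samples)
    return [
        [exp(-weighted_sq_dist(samples[i], samples[j], w) / (2 * sigma ** 2)) for j in range(n)]
        for i in range(n)
    ]

# Hypothetical data: feature 0 carries the subtype signal, feature 1 is noise.
# Patients 0 and 1 share a subtype (signal = 0); patient 2 belongs to another one.
samples = [[0.0, 2.0], [0.0, -2.0], [3.0, 2.0]]
flat = similarity_matrix(samples, weights=[1.0, 1.0])     # unweighted baseline
boosted = similarity_matrix(samples, weights=[9.0, 1.0])  # up-weight the signal feature
print(flat[0][1] < flat[0][2], boosted[0][1] > boosted[0][2])
```

With equal weights the noise feature makes patient 0 look closer to patient 2. Once the signal feature is up-weighted, patient 0 is closest to patient 1, its true subtype partner.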
Affiliation(s)
- Peifeng Ruan
- Department of Statistics, Columbian College of Arts and Sciences, The George Washington University, Washington, DC, USA
| | - Ya Wang
- Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, NY, USA
| | - Ronglai Shen
- Department of Epidemiology and Biostatistics, Memorial Sloan-Kettering Cancer Center, New York, NY, USA
| | - Shuang Wang
- Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, NY, USA
|
107
|
Chubar V, Van Leeuwen K, Bijttebier P, Van Assche E, Bosmans G, Van den Noortgate W, van Winkel R, Goossens L, Claes S. Gene-environment interaction: New insights into perceived parenting and social anxiety among adolescents. Eur Psychiatry 2020; 63:e64. [PMID: 32507125 PMCID: PMC7355173 DOI: 10.1192/j.eurpsy.2020.62] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 12/29/2022] Open
Abstract
Background. Social anxiety symptoms (SAS) are among the most common mental health problems during adolescence, and it has been shown that parenting influences the adolescent’s level of social anxiety. In addition, it is now widely assumed that most mental health problems, including social anxiety, originate from a complex interplay between genes and environment. However, to date, gene–environment (G × E) interaction studies in the field of social anxiety remain limited. In this study, we have examined how 274 genes involved in different neurotransmission pathways interact with five aspects of perceived parenting as environmental exposure (i.e., support, proactive control, psychological control, punitive control, and harsh punitive control) to affect SAS during adolescence. Methods. We have applied an analytical technique that allows studying genetic information at the gene level, by aggregating data from multiple single-nucleotide-polymorphisms within the same gene and by taking into account the linkage disequilibrium structure of the gene. All participants were part of the STRATEGIES cohort of 948 Flemish adolescents (mean age = 13.7), a population-based study on the development of problem behaviors in adolescence. Relevant genes were preselected based on prior findings and neurotransmitter-related functional protein networks. Results. The results suggest that genes involved in glutamate (SLC1A1), glutathione neurotransmission (GSTZ1), and oxidative stress (CALCRL), in association with harsh punitive parenting, may contribute to social anxiety in adolescence. Isolated polymorphisms in these genes have been related to anxiety and related disorders in earlier work. Conclusions. Taken together, these findings provide new insights into possible biological pathways and environmental risk factors involved in the etiology of social anxiety symptoms’ development.
Affiliation(s)
- Viktoria Chubar
- Mind-Body Research Group, Department of Neuroscience, KU Leuven, Leuven, Belgium
| | - Karla Van Leeuwen
- Parenting and Special Education Research Unit, Faculty of Psychology and Educational Sciences, KU Leuven, Leuven, Belgium
| | - Patricia Bijttebier
- School Psychology and Development in Context, Faculty of Psychology and Educational Sciences, KU Leuven, Leuven, Belgium
| | - Evelien Van Assche
- Mind-Body Research Group, Department of Neuroscience, KU Leuven, Leuven, Belgium; University Psychiatric Center KU Leuven, Leuven, Belgium
| | - Guy Bosmans
- Clinical Psychology, Faculty of Psychology and Educational Sciences, KU Leuven, Leuven, Belgium
| | - Wim Van den Noortgate
- Department of Methodology of Educational Sciences, Faculty of Psychology and Educational Sciences, KU Leuven, Leuven, Belgium
| | - Ruud van Winkel
- University Psychiatric Center KU Leuven, Leuven, Belgium; Center for Contextual Psychiatry, Department of Neuroscience, KU Leuven, Leuven, Belgium
| | - Luc Goossens
- School Psychology and Child and Adolescent Development Research Unit, Faculty of Psychology and Educational Sciences, KU Leuven, Leuven, Belgium
| | - Stephan Claes
- Mind-Body Research Group, Department of Neuroscience, KU Leuven, Leuven, Belgium; University Psychiatric Center KU Leuven, Leuven, Belgium
|
108
|
Dai Y, Ma W, Zhang T, Yang J, Zang C, Liu K, Wang X, Wang J, Wu Z, Zhang X, Li C, Li J, Wang X, Guo J, Li L. Long Noncoding RNA Expression Profiling During the Neuronal Differentiation of Glial Precursor Cells from Rat Dorsal Root Ganglia. Biotechnol Bioproc E 2020. [DOI: 10.1007/s12257-019-0317-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 12/11/2022]
|
109
|
Seth S, Debnath S, Chakraborty N. In silico analysis of functional linkage among arsenic induced MATE genes in rice. Biotechnol Rep (Amst) 2020; 26:e00390. [PMID: 32435604 PMCID: PMC7231838 DOI: 10.1016/j.btre.2019.e00390] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 09/02/2019] [Revised: 10/21/2019] [Accepted: 10/21/2019] [Indexed: 10/27/2022]
Abstract
MATE genes play an important role in cellular detoxification processes. Nine MATE genes were previously identified by a transcriptomics study. Candidate gene prioritization found 29 new genes that interact with these 9 guide genes. Therefore, a total of 38 genes were analyzed here to predict a concise model by a gene prioritization study. Those genes were analyzed further in the Rice Interactions Viewer programme, and based on high ICV, 10 new genes were found to interact among themselves at the protein level. Surprisingly, only 5 genes were found to play a key role at the protein level. These 15 genes were analyzed for their interaction with soil-available inorganic arsenic species. For those genes, maximum expression levels were found mostly at the young inflorescence and seed development stages. These genes may therefore have a direct role in arsenic sequestration from cells, thereby providing safety to the developing embryo within the seed.
Affiliation(s)
- Snigdhamayee Seth
- Department of Genetics & Plant Breeding, Palli Siksha Bhavana (Institute of Agriculture), Visva-Bharati, Sriniketan, 731236, India
| | - Sandip Debnath
- Department of Genetics & Plant Breeding, Palli Siksha Bhavana (Institute of Agriculture), Visva-Bharati, Sriniketan, 731236, India
| | - N.R. Chakraborty
- Department of Genetics & Plant Breeding, Palli Siksha Bhavana (Institute of Agriculture), Visva-Bharati, Sriniketan, 731236, India
|
110
|
Hwang S, Kim CY, Yang S, Kim E, Hart T, Marcotte EM, Lee I. HumanNet v2: human gene networks for disease research. Nucleic Acids Res 2019; 47:D573-D580. [PMID: 30418591 PMCID: PMC6323914 DOI: 10.1093/nar/gky1126] [Citation(s) in RCA: 114] [Impact Index Per Article: 22.8] [Received: 08/14/2018] [Accepted: 10/25/2018] [Indexed: 12/15/2022] Open
Abstract
Human gene networks have proven useful in many aspects of disease research, with numerous network-based strategies developed for generating hypotheses about gene-disease-drug associations. The ability to predict and organize genes most relevant to a specific disease has proven especially important. We previously developed a human functional gene network, HumanNet, by integrating diverse types of omics data using a Bayesian statistics framework and demonstrated its ability to retrieve disease genes. Here, we present HumanNet v2 (http://www.inetbio.org/humannet), a database of human gene networks, which was updated by incorporating new data types, extending data sources and improving network inference algorithms. HumanNet now comprises a hierarchy of human gene networks, allowing for more flexible incorporation of network information into studies. HumanNet performs well in ranking disease-linked gene sets with minimal literature-dependent biases. We observe that incorporating model organisms’ protein–protein interactions does not markedly improve disease gene predictions, suggesting that many of the disease gene associations are now captured directly in human-derived datasets. With an improved interactive user interface for disease network analysis, we expect HumanNet will be a useful resource for network medicine.
Affiliation(s)
- Sohyun Hwang
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul 03722, Korea; Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, University of Texas, Austin, TX 78712, USA; Department of Biomedical Science, College of Life Science, CHA University, Seongnam-si 13496, Korea
| | - Chan Yeong Kim
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul 03722, Korea
| | - Sunmo Yang
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul 03722, Korea
| | - Eiru Kim
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX USA
| | - Traver Hart
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX USA
| | - Edward M Marcotte
- Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, University of Texas, Austin, TX 78712, USA; Department of Molecular Biosciences, University of Texas at Austin, TX 78712, USA
| | - Insuk Lee
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul 03722, Korea
|
111
|
G2G: A web-server for the prediction of human synthetic lethal interactions. Comput Struct Biotechnol J 2020; 18:1028-1031. [PMID: 32419903 PMCID: PMC7215103 DOI: 10.1016/j.csbj.2020.04.012] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 02/20/2020] [Revised: 04/18/2020] [Accepted: 04/19/2020] [Indexed: 12/04/2022] Open
Abstract
Genetic interactions (GIs) are fundamental to our understanding of biological processes in the cell. While GIs have been systematically mapped in yeast, there is scarce information about them in humans. Recently, we have suggested a state-of-the-art hierarchical method that leverages gene ontology information for predicting GIs in yeast. Here, we adapt this method and apply it for the first time to predict GIs in human. We introduce a web service called G2G for this task that is available at http://bnet.cs.tau.ac.il/g2g/.
|
112
|
Fonseca PAS, Suárez-Vega A, Cánovas A. Weighted Gene Correlation Network Meta-Analysis Reveals Functional Candidate Genes Associated with High- and Sub-Fertile Reproductive Performance in Beef Cattle. Genes (Basel) 2020; 11:E543. [PMID: 32408659 PMCID: PMC7290847 DOI: 10.3390/genes11050543] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 04/22/2020] [Revised: 05/04/2020] [Accepted: 05/06/2020] [Indexed: 12/13/2022] Open
Abstract
Improved reproductive efficiency could lead to economic benefits for the beef industry, since intensive selection pressure has led to decreased fertility. However, several factors limit our understanding of fertility traits, including genetic differences between populations and statistical limitations. In the present study, RNA-sequencing data from uterine samples of high-fertile (HF) and sub-fertile (SF) animals were integrated using co-expression network meta-analysis, weighted gene correlation network analysis, identification of upstream regulators, variant calling, and network topology approaches. Using this pipeline, top hub-genes harboring fixed variants (HF × SF) were identified in differentially co-expressed gene modules (DcoExp). The functional prioritization analysis identified the genes with the highest potential to be key regulators of the DcoExp modules between HF and SF animals. Consequently, 32 functional candidate genes (10 upstream regulators and 22 top hub-genes of DcoExp modules) were identified. These genes were associated with the regulation of biological processes relevant to fertility, such as embryonic development, germ cell proliferation, and ovarian hormone regulation. Additionally, 100 candidate variants (single nucleotide polymorphisms (SNPs) and insertions and deletions (INDELs)) were identified within those genes. In the long term, the results obtained here may help to reduce the frequency of subfertility in beef herds, reducing the associated economic losses caused by this condition.
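The notion of a hub gene in a weighted correlation network can be sketched briefly. The abstract gives no parameters, so the soft-threshold power `beta`, the unsigned adjacency, the function names, and the expression values below are assumptions, and the sketch omits module detection and the meta-analysis step.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def connectivity(expr, beta=6):
    """Sum of soft-thresholded adjacencies |cor|^beta for each gene (unsigned network)."""
    genes = list(expr)
    k = {g: 0.0 for g in genes}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            a = abs(pearson(expr[g], expr[h])) ** beta
            k[g] += a
            k[h] += a
    return k

# Hypothetical expression values across five samples.
expr = {
    "G1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "G2": [2.0, 4.0, 6.0, 8.0, 10.0],
    "G3": [1.0, 2.0, 3.0, 4.0, 6.0],
    "G4": [5.0, 1.0, 4.0, 2.0, 3.0],  # weakly correlated with the others
}
k = connectivity(expr)
ranked = sorted(k, key=k.get, reverse=True)
print(ranked)
```

Genes at the top of `ranked` have the highest connectivity and would be the hub candidates in this toy network. Raising the correlation to the power `beta` suppresses weak correlations such as those involving `G4`.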
Affiliation(s)
- Pablo A. S. Fonseca
- Centre for Genetic Improvement of Livestock, Department of Animal Biosciences, University of Guelph, Guelph, ON N1G 2W1, Canada
| | | | - Angela Cánovas
- Centre for Genetic Improvement of Livestock, Department of Animal Biosciences, University of Guelph, Guelph, ON N1G 2W1, Canada
|
113
|
Ku AA, Hu HM, Zhao X, Shah KN, Kongara S, Wu D, McCormick F, Balmain A, Bandyopadhyay S. Integration of multiple biological contexts reveals principles of synthetic lethality that affect reproducibility. Nat Commun 2020; 11:2375. [PMID: 32398776 PMCID: PMC7217969 DOI: 10.1038/s41467-020-16078-y] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 03/13/2019] [Accepted: 04/08/2020] [Indexed: 12/30/2022] Open
Abstract
Synthetic lethal screens have the potential to identify new vulnerabilities incurred by specific cancer mutations but have been hindered by lack of agreement between studies. In the case of KRAS, we identify that published synthetic lethal screen hits significantly overlap at the pathway rather than gene level. Analysis of pathways encoded as protein networks could identify synthetic lethal candidates that are more reproducible than those previously reported. Lack of overlap likely stems from biological rather than technical limitations as most synthetic lethal phenotypes are strongly modulated by changes in cellular conditions or genetic context, the latter determined using a pairwise genetic interaction map that identifies numerous interactions that suppress synthetic lethal effects. Accounting for pathway, cellular and genetic context nominates a DNA repair dependency in KRAS-mutant cells, mediated by a network containing BRCA1. We provide evidence for why most reported synthetic lethals are not reproducible which is addressable using a multi-faceted testing framework.
Affiliation(s)
- Angel A Ku
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA, 94158, USA
| | - Hsien-Ming Hu
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA, 94158, USA
| | - Xin Zhao
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA, 94158, USA
| | - Khyati N Shah
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA, 94158, USA
| | - Sameera Kongara
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA, 94158, USA
| | - Di Wu
- Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, San Francisco, CA, 94158, USA
| | - Frank McCormick
- Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, San Francisco, CA, 94158, USA
| | - Allan Balmain
- Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, San Francisco, CA, 94158, USA
| | - Sourav Bandyopadhyay
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA, 94158, USA.
- Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, San Francisco, CA, 94158, USA.
| |
Collapse
|
114
|
Laurent JM, Garge RK, Teufel AI, Wilke CO, Kachroo AH, Marcotte EM. Humanization of yeast genes with multiple human orthologs reveals functional divergence between paralogs. PLoS Biol 2020; 18:e3000627. [PMID: 32421706 PMCID: PMC7259792 DOI: 10.1371/journal.pbio.3000627] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Received: 06/13/2019] [Revised: 05/29/2020] [Accepted: 04/14/2020] [Indexed: 01/17/2023]
Abstract
Despite over a billion years of evolutionary divergence, several thousand human genes possess clearly identifiable orthologs in yeast, and many have undergone lineage-specific duplications in one or both lineages. These duplicated genes may have been free to diverge in function since their expansion, and it is unclear how or at what rate ancestral functions are retained or partitioned among co-orthologs between species and within gene families. Thus, in order to investigate how ancestral functions are retained or lost post-duplication, we systematically replaced hundreds of essential yeast genes with their human orthologs from gene families that have undergone lineage-specific duplications, including those with single duplications (1 yeast gene to 2 human genes, 1:2) or higher-order expansions (1:>2) in the human lineage. We observe a variable pattern of replaceability across different ortholog classes, with an obvious trend toward differential replaceability inside gene families, and rarely observe replaceability by all members of a family. We quantify the ability of various properties of the orthologs to predict replaceability, showing that in the case of 1:2 orthologs, replaceability is predicted largely by the divergence and tissue-specific expression of the human co-orthologs, i.e., the human proteins that are less diverged from their yeast counterpart and more ubiquitously expressed across human tissues more often replace their single yeast ortholog. These trends were consistent with in silico simulations demonstrating that when only one ortholog can replace its corresponding yeast equivalent, it tends to be the least diverged of the pair. Replaceability of yeast genes having more than 2 human co-orthologs was marked by retention of orthologous interactions in functional or protein networks as well as by more ancestral subcellular localization. 
Overall, we performed >400 human gene replaceability assays, revealing 50 new human-yeast complementation pairs, thus opening up avenues to further functionally characterize these human genes in a simplified organismal context.
Affiliation(s)
- Jon M. Laurent
  - Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, Texas, United States of America
  - Institute for Systems Genetics, NYU Langone Health, New York, New York, United States of America
- Riddhiman K. Garge
  - Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, Texas, United States of America
  - Department of Molecular Biosciences, The University of Texas at Austin, Austin, Texas, United States of America
- Ashley I. Teufel
  - Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, Texas, United States of America
  - Department of Integrative Biology, The University of Texas at Austin, Austin, Texas, United States of America
  - Santa Fe Institute, Santa Fe, New Mexico, United States of America
- Claus O. Wilke
  - Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, Texas, United States of America
  - Department of Integrative Biology, The University of Texas at Austin, Austin, Texas, United States of America
- Aashiq H. Kachroo
  - The Department of Biology, Centre for Applied Synthetic Biology, Concordia University, Montreal, Quebec, Canada
- Edward M. Marcotte
  - Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, Texas, United States of America
  - Department of Molecular Biosciences, The University of Texas at Austin, Austin, Texas, United States of America
|
115
|
Ni P, Wang J, Zhong P, Li Y, Wu FX, Pan Y. Constructing Disease Similarity Networks Based on Disease Module Theory. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2020; 17:906-915. [PMID: 29993782 DOI: 10.1109/tcbb.2018.2817624] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Indexed: 06/08/2023]
Abstract
Quantifying the associations between diseases now plays an important role in modern biology and medicine. Discovering associations between diseases could help us gain deeper insights into the pathogenic mechanisms of complex diseases, and could thus lead to improvements in disease diagnosis, drug repositioning, and drug development. Due to the growing body of high-throughput biological data, a number of methods have been developed for computing similarity between diseases during the past decade. However, these methods rarely consider the interconnections of genes related to each disease in a protein-protein interaction network (PPIN). Recently, the disease module theory has been proposed, which states that disease-related genes or proteins tend to interact with each other in the same neighborhood of a PPIN. In this study, we propose a new method called ModuleSim to measure associations between diseases by using disease-gene association data and PPIN data based on disease module theory. The experimental results show that by considering the interactions between disease modules and their modularity, the disease similarity calculated by ModuleSim has a significant correlation with the disease classification of Disease Ontology (DO). Furthermore, ModuleSim outperforms four other popular methods, all of which use disease-gene association data and PPIN data to measure disease-disease associations. In addition, the disease similarity network constructed by ModuleSim suggests that ModuleSim is capable of finding potential associations between diseases.
|
116
|
Wei PJ, Wu FX, Xia J, Su Y, Wang J, Zheng CH. Prioritizing Cancer Genes Based on an Improved Random Walk Method. Front Genet 2020; 11:377. [PMID: 32411180 PMCID: PMC7198854 DOI: 10.3389/fgene.2020.00377] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 12/13/2019] [Accepted: 03/26/2020] [Indexed: 12/18/2022]
Abstract
Identifying driver genes that contribute to cancer progression from numerous passenger genes, although a central goal, is a major challenge. The protein-protein interaction network provides convenient and reasonable assistance for driver gene discovery. Random walk-based methods have been widely used to prioritize nodes in social or biological networks. However, most studies select the next arriving node uniformly from the random walker's neighbors. Few consider a transition preference that depends on the degree of the random walker's neighbors. In this study, based on the random walk method, we propose a novel approach named Driver_IRW (Driver genes discovery with Improved Random Walk method) to prioritize cancer genes in a cancer-related network. The key idea of Driver_IRW is to assign different transition probabilities to different edges of a constructed cancer-related network in accordance with the degree of the nodes' neighbors. Furthermore, the global centrality (here, betweenness centrality) and Katz feedback centrality are incorporated into the framework to evaluate the probability of walking to the seed nodes. Experimental results on four cancer types indicate that Driver_IRW performs more efficiently than some previously published methods for uncovering known cancer-related genes. In conclusion, our method can aid in prioritizing cancer-related genes and complement traditional frequency and network-based methods.
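The degree-dependent transition idea in this abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that the probability of stepping to a neighbor is proportional to that neighbor's degree, and it leaves out the betweenness and Katz centrality terms. The function names and the toy graph are hypothetical and are not taken from Driver_IRW.

```python
def degree_biased_transitions(adj):
    """trans[u][v] = probability of stepping from u to neighbor v,
    taken here to be proportional to deg(v) (an illustrative assumption)."""
    degree = {u: len(nbrs) for u, nbrs in adj.items()}
    trans = {}
    for u, nbrs in adj.items():
        total = sum(degree[v] for v in nbrs)
        trans[u] = {v: degree[v] / total for v in nbrs}
    return trans


def random_walk_with_restart(adj, seeds, restart=0.3, n_iter=100):
    """Iterate p <- restart * p0 + (1 - restart) * (one biased walk step)."""
    trans = degree_biased_transitions(adj)
    p0 = {u: (1.0 / len(seeds) if u in seeds else 0.0) for u in adj}
    p = dict(p0)
    for _ in range(n_iter):
        nxt = {u: restart * p0[u] for u in adj}
        for u, out in trans.items():
            for v, w in out.items():
                nxt[v] += (1.0 - restart) * p[u] * w
        p = nxt
    return p


# Toy undirected network (hypothetical); every node has at least one neighbor.
toy_adj = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B"],
    "D": ["B"],
}
toy_trans = degree_biased_transitions(toy_adj)
toy_scores = random_walk_with_restart(toy_adj, seeds={"A"})
ranking = sorted(toy_scores, key=toy_scores.get, reverse=True)
```

A uniform walker at A would step to B or C with probability 0.5 each; under the degree-biased rule it prefers the better-connected B. Genes would then be prioritized by their stationary scores in `ranking`.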
Affiliation(s)
- Pi-Jing Wei
  - Key Lab of Intelligent Computing and Signal Processing of Ministry of Education, College of Computer Science and Technology, Anhui University, Hefei, China
- Fang-Xiang Wu
  - Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, SK, Canada
  - Department of Computer Sciences, University of Saskatchewan, Saskatoon, SK, Canada
  - Department of Mechanical Engineering, University of Saskatchewan, Saskatoon, SK, Canada
- Junfeng Xia
  - Institutes of Physical Science and Information Technology, Anhui University, Hefei, China
- Yansen Su
  - Key Lab of Intelligent Computing and Signal Processing of Ministry of Education, College of Computer Science and Technology, Anhui University, Hefei, China
- Jing Wang
  - Key Lab of Intelligent Computing and Signal Processing of Ministry of Education, College of Computer Science and Technology, Anhui University, Hefei, China
  - College of Computer and Information Engineering, Fuyang Normal University, Fuyang, China
- Chun-Hou Zheng
  - Key Lab of Intelligent Computing and Signal Processing of Ministry of Education, College of Computer Science and Technology, Anhui University, Hefei, China
|
117
|
Klein HU, Schäfer M, Bennett DA, Schwender H, De Jager PL. Bayesian integrative analysis of epigenomic and transcriptomic data identifies Alzheimer's disease candidate genes and networks. PLoS Comput Biol 2020; 16:e1007771. [PMID: 32255787 PMCID: PMC7138305 DOI: 10.1371/journal.pcbi.1007771] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 07/01/2019] [Accepted: 03/03/2020] [Indexed: 12/28/2022]
Abstract
Biomedical research studies have generated large multi-omic datasets to study complex diseases like Alzheimer’s disease (AD). An important aim of these studies is the identification of candidate genes that demonstrate congruent disease-related alterations across the different data types measured by the study. We developed a new method to detect such candidate genes in large multi-omic case-control studies that measure multiple data types in the same set of samples. The method is based on a gene-centric integrative coefficient quantifying to what degree consistent differences are observed in the different data types. For statistical inference, a Bayesian hierarchical model is used to study the distribution of the integrative coefficient. The model employs a conditional autoregressive prior to integrate a functional gene network and to share information between genes known to be functionally related. We applied the method to an AD dataset consisting of histone acetylation, DNA methylation, and RNA transcription data from human cortical tissue samples of 233 subjects, and we detected 816 genes with consistent differences between persons with AD and controls. The findings were validated in protein data and in RNA transcription data from two independent AD studies. Finally, we found three subnetworks of jointly dysregulated genes within the functional gene network which capture three distinct biological processes: myeloid cell differentiation, protein phosphorylation and synaptic signaling. Further investigation of the myeloid network indicated an upregulation of this network in early stages of AD prior to accumulation of hyperphosphorylated tau and suggested that increased CSF1 transcription in astrocytes may contribute to microglial activation in AD. 
Thus, we developed a method that integrates multiple data types and external knowledge of gene function to detect candidate genes, applied the method to an AD dataset, and identified several disease-related genes and processes demonstrating the usefulness of the integrative approach.
Recent technological advances have led to a new generation of studies that interrogate multiple molecular levels in the same target tissue of a set of subjects, generating complex multi-omic datasets with which to study disease mechanism. These datasets of genetic, epigenomic, transcriptomic, and other data have the potential to reveal novel biological insights; however, integrative analyses remain challenging and require new computational methods. We developed an integrative Bayesian approach to detect genes with consistent differences between case and control samples across multiple data types. The method further integrates prior knowledge about gene function in the form of a gene functional similarity network to improve statistical inference by sharing information between related genes. We applied our method to an Alzheimer’s disease dataset of epigenomic and transcriptomic data and detected and then validated several novel and known candidate genes as well as three major disease-related biological processes. One of these processes reflected microglial activation and included the cytokine CSF1. Single-nucleus data revealed that CSF1 was primarily upregulated in astrocytes, implicating the involvement of this cell type in microglial activation. Hence, we demonstrated that integrative analysis approaches to multi-omic datasets can improve candidate gene detection and thereby generate new insights into complex diseases.
Affiliation(s)
- Hans-Ulrich Klein
  - Center for Translational & Computational Neuroimmunology, Department of Neurology, Columbia University Irving Medical Center, New York, New York, United States of America
  - Taub Institute for Research on Alzheimer's Disease and the Aging Brain, Columbia University Irving Medical Center, New York, New York, United States of America
- Martin Schäfer
  - Mathematical Institute, Heinrich Heine University, Düsseldorf, Germany
- David A. Bennett
  - Rush Alzheimer’s Disease Center, Rush University Medical Center, Chicago, Illinois, United States of America
- Holger Schwender
  - Mathematical Institute, Heinrich Heine University, Düsseldorf, Germany
- Philip L. De Jager
  - Center for Translational & Computational Neuroimmunology, Department of Neurology, Columbia University Irving Medical Center, New York, New York, United States of America
  - Taub Institute for Research on Alzheimer's Disease and the Aging Brain, Columbia University Irving Medical Center, New York, New York, United States of America
|
118
|
Peng J, Xue H, Wei Z, Tuncali I, Hao J, Shang X. Integrating multi-network topology for gene function prediction using deep neural networks. Brief Bioinform 2020; 22:2096-2105. [PMID: 32249297 DOI: 10.1093/bib/bbaa036] [Citation(s) in RCA: 50] [Impact Index Per Article: 10.0] [Received: 12/02/2019] [Revised: 02/09/2020] [Accepted: 02/25/2020] [Indexed: 01/18/2023]
Abstract
MOTIVATION: The emergence of abundant biological networks, which benefit from the development of advanced high-throughput techniques, contributes to describing and modeling complex internal interactions among biological entities such as genes and proteins. Multiple networks provide rich information for inferring the function of genes or proteins. To extract functional patterns of genes based on multiple heterogeneous networks, network embedding-based methods, aiming to capture non-linear and low-dimensional feature representation based on network biology, have recently achieved remarkable performance in gene function prediction. However, existing methods do not consider the shared information among different networks during the feature learning process.
RESULTS: Taking the correlation among the networks into account, we design a novel semi-supervised autoencoder method to integrate multiple networks and generate a low-dimensional feature representation. Then we utilize a convolutional neural network based on the integrated feature embedding to annotate unlabeled gene functions. We test our method on both yeast and human datasets and compare with three state-of-the-art methods. The results demonstrate the superior performance of our method. We not only provide a comprehensive analysis of the performance of the newly proposed algorithm but also provide a tool for extracting features of genes based on multiple networks, which can be used in the downstream machine learning task.
AVAILABILITY: DeepMNE-CNN is freely available at https://github.com/xuehansheng/DeepMNE-CNN.
CONTACT: jiajiepeng@nwpu.edu.cn; shang@nwpu.edu.cn; jianye.hao@tju.edu.cn.
Affiliation(s)
- Jiajie Peng
  - School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China
- Hansheng Xue
  - Key Laboratory of Big Data Storage and Management, Ministry of Industry and Information Technology, Northwestern Polytechnical University, Xi'an, 710072, China
- Zhongyu Wei
  - Research School of Computer Science, Australian National University, Canberra, 2601, Australia
- Idil Tuncali
  - School of Data Science, Fudan University, Shanghai, 200433, China
|
119
|
Qiang J, Ding W, Kuijjer M, Quackenbush J, Chen P. Clustering Sparse Data With Feature Correlation With Application to Discover Subtypes in Cancer. IEEE Access: Practical Innovations, Open Solutions 2020; 8:67775-67789. [PMID: 36329870 PMCID: PMC9629797 DOI: 10.1109/access.2020.2982569] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 06/16/2023]
Abstract
In this paper, given data with high-dimensional features, we study the problem of how to calculate the similarity between two samples by considering a feature interaction network, which represents the relationships between features. This differs from traditional methods that learn similarities based on a sample network, which represents the relationships between samples. We propose a novel network-based similarity metric for computing the similarity between samples, which incorporates knowledge of the feature interaction network in order to overcome the data sparseness problem. Our similarity metric uses a new Feature Alignment Similarity measure, which does not directly compute the similarities among samples, but projects each sample into a feature interaction network and measures the similarity between two samples using the similarities between the vertices of the samples in the network. As such, even when two samples do not share any common features, they are likely to have higher similarity values when their features fall in similar network regions. To ensure that the metric is useful in a real-world application, we apply it to discover subtypes in tumor mutational data by incorporating the information of the gene interaction network. Our experimental results on synthetic data and real-world tumor mutational data show that our approach outperforms the top competitors in cancer subtype discovery. Furthermore, our approach can identify cancer subtypes that cannot be detected by other clustering algorithms in real cancer data.
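The idea that two samples with no shared features can still be similar when their features are close in a feature network can be sketched as follows. This is not the paper's Feature Alignment Similarity measure: the distance-based feature similarity 1 / (1 + d), the symmetric best-match averaging, the function names, and the toy network are all assumptions made for illustration.

```python
from collections import deque


def shortest_path_lengths(adj, source):
    """Breadth-first search distances from `source` in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def feature_similarity(adj, a, b):
    """1 / (1 + d) for features at network distance d; 0 if disconnected."""
    d = shortest_path_lengths(adj, a).get(b)
    return 0.0 if d is None else 1.0 / (1.0 + d)


def sample_similarity(adj, feats_a, feats_b):
    """Average best-match feature similarity, symmetrized over both samples."""
    def directed(xs, ys):
        best = [max(feature_similarity(adj, x, y) for y in ys) for x in xs]
        return sum(best) / len(best)
    return 0.5 * (directed(feats_a, feats_b) + directed(feats_b, feats_a))


# Toy gene network (hypothetical): g1-g2-g3 form a path, g4-g5 a separate pair.
toy_net = {"g1": ["g2"], "g2": ["g1", "g3"], "g3": ["g2"],
           "g4": ["g5"], "g5": ["g4"]}
sim_neighbors = sample_similarity(toy_net, {"g1"}, {"g2"})  # adjacent genes
sim_distant = sample_similarity(toy_net, {"g1"}, {"g4"})    # unconnected genes
sim_self = sample_similarity(toy_net, {"g1", "g3"}, {"g1", "g3"})
```

The first two sample pairs share no mutated gene, so a plain set overlap would score both as zero. The network-aware score separates them because g1 and g2 are adjacent while g1 and g4 are not connected.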
Affiliation(s)
- Jipeng Qiang
  - Department of Computer Science, Yangzhou University, Yangzhou 225127, China
  - Department of Computer Science, University of Massachusetts Boston, Boston, MA 02125, USA
- Wei Ding
  - Department of Computer Science, University of Massachusetts Boston, Boston, MA 02125, USA
- Marieke Kuijjer
  - Centre for Molecular Medicine Norway, University of Oslo Faculty of Medicine, 0318 Oslo, Norway
- John Quackenbush
  - Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA 02115, USA
- Ping Chen
  - Department of Computer Science, University of Massachusetts Boston, Boston, MA 02125, USA
|
120
|
Wang S, Wang W, Wang W, Xia P, Yu L, Lu Y, Chen X, Xu C, Liu H. Context-Specific Coordinately Regulatory Network Prioritize Breast Cancer Genetic Risk Factors. Front Genet 2020; 11:255. [PMID: 32273883 PMCID: PMC7113376 DOI: 10.3389/fgene.2020.00255] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 11/28/2019] [Accepted: 03/03/2020] [Indexed: 12/16/2022]
Abstract
Breast cancer (BC) is one of the most common tumors and a leading cause of cancer death in women. However, the pathogenesis of BC remains unclear, and the atlas of BC-associated risk factors is far from complete. In this study, we constructed a BC-specific coordinately regulatory network (CRN) to prioritize potential BC-associated protein-coding genes (PCGs) and non-coding RNAs (ncRNAs). We integrated transcriptome data of 813 BC samples from The Cancer Genome Atlas (TCGA) and eight types of regulatory relationships to construct a BC-specific CRN, including 387 transcription factors (TFs), 174 microRNAs (miRNAs), 407 long non-coding RNAs (lncRNAs), and 905 PCGs. After that, the random walk with restart (RWR) method was performed on the CRN using the known BC-associated factors as seeds, and potential BC-associated risk factors were prioritized. Leave-one-out cross-validation (LOOCV) on the BC-specific CRN achieved an area under the curve (AUC) of 0.92. The performances of a common CRN, a common protein-protein interaction (PPI) network, and a BC-specific PPI network were also evaluated, demonstrating that the context-specific CRN prioritizes BC risk factors. Functional analysis of the top 100-ranked risk factors in the candidate list revealed that these factors were significantly enriched in cancer-related functions and had significant semantic similarity with BC-related gene ontology (GO) terms. Differential expression analysis and survival analysis showed that the prioritized risk factors were significantly associated with BC progression and prognosis. In total, we provided a computational method to predict reliable BC-associated risk factors, which would help improve the understanding of the pathology of BC and benefit disease diagnosis and prognosis.
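The seed-based random walk with restart and the leave-one-out evaluation mentioned above can be sketched on a toy network. This is a generic illustration rather than the study's pipeline: the restart probability, the uniform transition rule, the function names, and the six-node graph are assumptions, and the sketch reports held-out ranks instead of an AUC.

```python
def rwr(adj, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Random walk with restart; assumes every node has at least one neighbor."""
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in adj}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in adj}
        for u, nbrs in adj.items():
            share = (1.0 - restart) * p[u] / len(nbrs)
            for v in nbrs:
                nxt[v] += share
        delta = sum(abs(nxt[n] - p[n]) for n in adj)
        p = nxt
        if delta < tol:
            break
    return p


def leave_one_out_ranks(adj, known):
    """Hold out each known factor, walk from the rest, and record the rank
    of the held-out node among all non-seed candidates (1 = top)."""
    ranks = {}
    for held in sorted(known):
        seeds = known - {held}
        scores = rwr(adj, seeds)
        candidates = [n for n in adj if n not in seeds]
        ordered = sorted(candidates, key=scores.get, reverse=True)
        ranks[held] = ordered.index(held) + 1
    return ranks


# Toy regulatory network (hypothetical): two triangles joined by edge g3-g4.
toy_crn = {
    "g1": ["g2", "g3"], "g2": ["g1", "g3"], "g3": ["g1", "g2", "g4"],
    "g4": ["g3", "g5", "g6"], "g5": ["g4", "g6"], "g6": ["g4", "g5"],
}
known_factors = {"g1", "g2", "g3"}
loo_ranks = leave_one_out_ranks(toy_crn, known_factors)
```

Because the three known factors form a tightly connected triangle, each held-out factor is recovered at the top of its candidate list. On a real network, such held-out ranks would typically be summarized with a ROC curve and its AUC.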
Affiliation(s)
- Shuyuan Wang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Wencan Wang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Weida Wang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Peng Xia
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Lei Yu
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Ye Lu
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Xiaowen Chen
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Chaohan Xu
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Hui Liu
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
|
121
|
Chang JW, Ding Y, Tahir Ul Qamar M, Shen Y, Gao J, Chen LL. A deep learning model based on sparse auto-encoder for prioritizing cancer-related genes and drug target combinations. Carcinogenesis 2020; 40:624-632. [PMID: 30944926 DOI: 10.1093/carcin/bgz044] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 07/25/2018] [Revised: 01/06/2019] [Accepted: 03/10/2019] [Indexed: 12/21/2022]
Abstract
Prioritization of cancer-related genes from gene expression profiles and proteomic data is vital for improving targeted therapy research. Although computational approaches have been complementing high-throughput biological experiments in the understanding of human diseases, it remains a big challenge to accurately discover cancer-related proteins/genes via automatic learning from large-scale protein/gene expression data and protein-protein interaction data. Most of the existing methods are based on network construction combined with gene expression profiles, which ignore the diversity between normal samples and disease cell lines. In this study, we introduced a deep learning model based on a sparse auto-encoder to learn the specific characteristics of protein interactions in cancer cell lines integrated with protein expression data. The model was able to learn to identify cancer-related proteins/genes from different input protein expression profiles by extracting the characteristics of protein interaction information, and could also predict cancer-related protein combinations. Compared with other reported methods, including differential expression and network-based methods, our model achieved the highest area under the curve value (>0.8) in predicting cancer-related genes. Our study prioritized ~500 high-confidence cancer-related genes; among these genes, 211 already known cancer drug targets were found, which supported the accuracy of our method. The above results indicated that the proposed auto-encoder model could computationally prioritize candidate proteins/genes involved in cancer and improve targeted therapy research.
Affiliation(s)
- Ji-Wei Chang
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, P. R. China
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, P. R. China
- Yuduan Ding
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, P. R. China
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, P. R. China
- Muhammad Tahir Ul Qamar
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, P. R. China
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, P. R. China
- Yin Shen
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, P. R. China
- Junxiang Gao
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, P. R. China
- Ling-Ling Chen
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, P. R. China
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, P. R. China
|
122
|
Lagunes-García G, Rodríguez-González A, Prieto-Santamaría L, García Del Valle EP, Zanin M, Menasalvas-Ruiz E. DISNET: a framework for extracting phenotypic disease information from public sources. PeerJ 2020; 8:e8580. [PMID: 32110491 PMCID: PMC7032061 DOI: 10.7717/peerj.8580] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 10/13/2019] [Accepted: 01/16/2020] [Indexed: 12/25/2022]
Abstract
Background: Within the global endeavour of improving population health, one major challenge is the identification and integration of medical knowledge spread through several information sources. The creation of a comprehensive dataset of diseases and their clinical manifestations based on information from public sources is an interesting approach that allows one not only to complement and merge medical knowledge but also to increase it and thereby to interconnect existing data and analyse and relate diseases to each other. In this paper, we present DISNET (http://disnet.ctb.upm.es/), a web-based system designed to periodically extract the knowledge from signs and symptoms retrieved from medical databases, and to enable the creation of customisable disease networks.
Methods: We here present the main features of the DISNET system. We describe how information on diseases and their phenotypic manifestations is extracted from Wikipedia and PubMed websites; specifically, texts from these sources are processed through a combination of text mining and natural language processing techniques.
Results: We further present the validation of our system on Wikipedia and PubMed texts and report the resulting accuracy. The final output includes the creation of a comprehensive symptoms-disease dataset, shared (free access) through the system's API. We finally describe, with some simple use cases, how a user can interact with it and extract information that could be used for subsequent analyses.
Discussion: DISNET allows retrieving knowledge about the signs, symptoms and diagnostic tests associated with a disease. It is not limited to a specific disease category (it covers all the categories that the selected sources of information offer) or to clinical diagnosis terms. It further allows tracking the evolution of those terms through time, thus offering an opportunity to analyse and observe the progress of human knowledge on diseases. We further discuss the validation of the system, suggesting that it is good enough to be used to extract diseases and diagnostically-relevant terms. At the same time, the evaluation also revealed that improvements could be introduced to enhance the system's reliability.
Affiliation(s)
- Gerardo Lagunes-García
  - Centro de Tecnología Biomédica, Universidad Politécnica de Madrid, Pozuelo de Alarcón, Madrid, Spain
- Alejandro Rodríguez-González
  - Centro de Tecnología Biomédica, Universidad Politécnica de Madrid, Pozuelo de Alarcón, Madrid, Spain
  - Escuela Técnica Superior de Ingenieros Informáticos, Universidad Politécnica de Madrid, Boadilla del Monte, Madrid, Spain
- Lucía Prieto-Santamaría
  - Centro de Tecnología Biomédica, Universidad Politécnica de Madrid, Pozuelo de Alarcón, Madrid, Spain
- Massimiliano Zanin
  - Centro de Tecnología Biomédica, Universidad Politécnica de Madrid, Pozuelo de Alarcón, Madrid, Spain
- Ernestina Menasalvas-Ruiz
  - Centro de Tecnología Biomédica, Universidad Politécnica de Madrid, Pozuelo de Alarcón, Madrid, Spain
123
|
Wang C, Zhang J, Wang X, Han K, Guo M. Pathogenic Gene Prediction Algorithm Based on Heterogeneous Information Fusion. Front Genet 2020; 11:5. [PMID: 32117433 PMCID: PMC7010852 DOI: 10.3389/fgene.2020.00005] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 11/26/2019] [Accepted: 01/06/2020] [Indexed: 12/23/2022]
Abstract
Complex diseases seriously affect people's physical and mental health. The discovery of disease-causing genes has become a target of research. With the emergence of bioinformatics and the rapid development of biotechnology, to overcome the inherent difficulties of the long experimental period and high cost of traditional biomedical methods, researchers have proposed many gene prioritization algorithms that use a large amount of biological data to mine pathogenic genes. However, because the currently known gene-disease association matrix is still very sparse and lacks evidence that genes and diseases are unrelated, there are limits to the predictive performance of gene prioritization algorithms. Based on the hypothesis that functionally related gene mutations may lead to similar disease phenotypes, this paper proposes a PU induction matrix completion algorithm based on heterogeneous information fusion (PUIMCHIF) to predict candidate genes involved in the pathogenicity of human diseases. On the one hand, PUIMCHIF uses different compact feature learning methods to extract features of genes and diseases from multiple data sources, compensating for the sparsity of the data. On the other hand, based on the prior knowledge that most of the unknown gene-disease associations are unrelated, we use the PU-Learning strategy to treat the unknown unlabeled data as negative examples for biased learning. The experimental results of the PUIMCHIF algorithm regarding the three indexes of precision, recall, and mean percentile ranking (MPR) were significantly better than those of other algorithms. In the top 100 global prediction analysis of multiple genes and multiple diseases, the probability of recovering true gene associations using PUIMCHIF reached 50% and the MPR value was 10.94%. PUIMCHIF ranks true associations higher than other methods, such as IMC and CATAPULT.
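The mean percentile ranking (MPR) index reported above can be illustrated with a small sketch. It assumes a common definition, which the abstract does not spell out: the rank of each held-out true gene among all candidate genes for its disease, divided by the number of candidates and averaged over held-out pairs, so lower is better. The function names, scores, and gene labels are hypothetical.

```python
def percentile_rank(scores, true_gene):
    """Rank of the true gene (1 = best score) divided by the candidate count."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return (ordered.index(true_gene) + 1) / len(ordered)


def mean_percentile_ranking(predictions, held_out):
    """Average percentile rank over held-out (disease, gene) pairs."""
    ranks = [percentile_rank(predictions[d], g) for d, g in held_out]
    return sum(ranks) / len(ranks)


# Hypothetical predicted scores for four candidate genes per disease.
predictions = {
    "d1": {"g1": 0.90, "g2": 0.20, "g3": 0.10, "g4": 0.05},
    "d2": {"g1": 0.10, "g2": 0.80, "g3": 0.70, "g4": 0.30},
}
held_out = [("d1", "g1"), ("d2", "g3")]
mpr = mean_percentile_ranking(predictions, held_out)
```

Here the true gene for d1 is ranked first of four (0.25) and the true gene for d2 second of four (0.5), so the toy MPR is 0.375. A method that pushes true associations toward the top of each list drives this value down.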
Affiliation(s)
- Chunyu Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
| | - Jie Zhang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
| | - Xueping Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
| | - Ke Han
- School of Computer and Information Engineering, Harbin University of Commerce, Harbin, China
| | - Maozu Guo
- School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing, China
- Beijing Key Laboratory of Intelligent Processing for Building Big Data, Beijing University of Civil Engineering and Architecture, Beijing, China
|
124
|
Blatti C, Emad A, Berry MJ, Gatzke L, Epstein M, Lanier D, Rizal P, Ge J, Liao X, Sobh O, Lambert M, Post CS, Xiao J, Groves P, Epstein AT, Chen X, Srinivasan S, Lehnert E, Kalari KR, Wang L, Weinshilboum RM, Song JS, Jongeneel CV, Han J, Ravaioli U, Sobh N, Bushell CB, Sinha S. Knowledge-guided analysis of "omics" data using the KnowEnG cloud platform. PLoS Biol 2020; 18:e3000583. [PMID: 31971940 PMCID: PMC6977717 DOI: 10.1371/journal.pbio.3000583] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 05/29/2019] [Accepted: 12/19/2019] [Indexed: 12/19/2022] Open
Abstract
We present Knowledge Engine for Genomics (KnowEnG), a free-to-use computational system for analysis of genomics data sets, designed to accelerate biomedical discovery. It includes tools for popular bioinformatics tasks such as gene prioritization, sample clustering, gene set analysis, and expression signature analysis. The system specializes in "knowledge-guided" data mining and machine learning algorithms, in which user-provided data are analyzed in light of prior information about genes, aggregated from numerous knowledge bases and encoded in a massive "Knowledge Network." KnowEnG adheres to "FAIR" principles (findable, accessible, interoperable, and reusable): its tools are easily portable to diverse computing environments, run on the cloud for scalable and cost-effective execution, and are interoperable with other computing platforms. The analysis tools are made available through multiple access modes, including a web portal with specialized visualization modules. We demonstrate the KnowEnG system's potential value in democratization of advanced tools for the modern genomics era through several case studies that use its tools to recreate and expand upon the published analysis of cancer data sets.
Affiliation(s)
- Charles Blatti
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
| | - Amin Emad
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Department of Electrical and Computer Engineering, McGill University, Montreal, Canada
| | - Matthew J. Berry
- National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
| | - Lisa Gatzke
- National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
| | - Milt Epstein
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
| | - Daniel Lanier
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
| | - Pramod Rizal
- National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
| | - Jing Ge
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
| | - Xiaoxia Liao
- National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
| | - Omar Sobh
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
| | - Mike Lambert
- National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
| | - Corey S. Post
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
| | - Jinfeng Xiao
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
| | - Peter Groves
- National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
| | - Aidan T. Epstein
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
| | - Xi Chen
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
| | - Subhashini Srinivasan
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
| | - Erik Lehnert
- Seven Bridges Genomics, Charlestown, Massachusetts, United States of America
| | - Krishna R. Kalari
- Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, United States of America
| | - Liewei Wang
- Department of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic, Rochester, Minnesota, United States of America
| | - Richard M. Weinshilboum
- Department of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic, Rochester, Minnesota, United States of America
| | - Jun S. Song
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Department of Physics, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Cancer Center at Illinois, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
| | - C. Victor Jongeneel
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
| | - Jiawei Han
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Cancer Center at Illinois, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
| | - Umberto Ravaioli
- Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
| | - Nahil Sobh
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
| | - Colleen B. Bushell
- National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
| | - Saurabh Sinha
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Cancer Center at Illinois, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
|
125
|
Yao Y, Ramsey SA. CERENKOV3: Clustering and molecular network-derived features improve computational prediction of functional noncoding SNPs. Pacific Symposium on Biocomputing 2020; 25:535-546. [PMID: 31797625 PMCID: PMC6897322] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Identification of causal noncoding single nucleotide polymorphisms (SNPs) is important for maximizing the knowledge dividend from human genome-wide association studies (GWAS). Recently, diverse machine learning-based methods have been used for functional SNP identification; however, this task remains a fundamental challenge in computational biology. We report CERENKOV3, a machine learning pipeline that leverages clustering-derived and molecular network-derived features to improve prediction accuracy of regulatory SNPs (rSNPs) in the context of post-GWAS analysis. The clustering-derived feature, locus size (number of SNPs in the locus), derives from our locus partitioning procedure and represents the sizes of clusters based on SNP locations. We generated two molecular network-derived features from representation learning on a network representing SNP-gene and gene-gene relations. Based on empirical studies using a ground-truth SNP dataset, CERENKOV3 significantly improves rSNP recognition performance in AUPRC, AUROC, and AVGRANK (a locus-wise rank-based measure of classification accuracy we previously proposed).
Affiliation(s)
- Yao Yao
- School of Electrical Engineering and Computer Science, Oregon State University
| | - Stephen A. Ramsey
- School of Electrical Engineering and Computer Science, Oregon State University
- Department of Biomedical Sciences, Oregon State University, Corvallis, OR 97330, USA
|
126
|
Huang EW, Bhope A, Lim J, Sinha S, Emad A. Tissue-guided LASSO for prediction of clinical drug response using preclinical samples. PLoS Comput Biol 2020; 16:e1007607. [PMID: 31967990 PMCID: PMC6975549 DOI: 10.1371/journal.pcbi.1007607] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 08/06/2019] [Accepted: 12/15/2019] [Indexed: 12/12/2022] Open
Abstract
Prediction of clinical drug response (CDR) of cancer patients, based on their clinical and molecular profiles obtained prior to administration of the drug, can play a significant role in individualized medicine. Machine learning models have the potential to address this issue but training them requires data from a large number of patients treated with each drug, limiting their feasibility. While large databases of drug response and molecular profiles of preclinical in-vitro cancer cell lines (CCLs) exist for many drugs, it is unclear whether preclinical samples can be used to predict CDR of real patients. We designed a systematic approach to evaluate how well different algorithms, trained on gene expression and drug response of CCLs, can predict CDR of patients. Using data from two large databases, we evaluated various linear and non-linear algorithms, some of which utilized information on gene interactions. Then, we developed a new algorithm called TG-LASSO that explicitly integrates information on samples' tissue of origin with gene expression profiles to improve prediction performance. Our results showed that regularized regression methods provide better prediction performance. However, including the network information or common methods of including information on the tissue of origin did not improve the results. On the other hand, TG-LASSO improved the predictions and distinguished resistant and sensitive patients for 7 out of 13 drugs. Additionally, TG-LASSO identified genes associated with the drug response, including known targets and pathways involved in the drugs' mechanism of action. Moreover, genes identified by TG-LASSO for multiple drugs in a tissue were associated with patient survival. In summary, our analysis suggests that preclinical samples can be used to predict CDR of patients and identify biomarkers of drug sensitivity and survival.
Affiliation(s)
- Edward W. Huang
- Department of Computer Science, University of Illinois at Urbana-Champaign, Illinois, United States of America
| | - Ameya Bhope
- Department of Electrical and Computer Engineering, McGill University, Canada
| | - Jing Lim
- Department of Computer Science, University of Illinois at Urbana-Champaign, Illinois, United States of America
| | - Saurabh Sinha
- Department of Computer Science, University of Illinois at Urbana-Champaign, Illinois, United States of America
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Illinois, United States of America
- Cancer Center at Illinois, University of Illinois at Urbana-Champaign, Illinois, United States of America
| | - Amin Emad
- Department of Electrical and Computer Engineering, McGill University, Canada
|
127
|
Hur B, Kang D, Lee S, Moon JH, Lee G, Kim S. Venn-diaNet: venn diagram based network propagation analysis framework for comparing multiple biological experiments. BMC Bioinformatics 2019; 20:667. [PMID: 31881980 PMCID: PMC6941187 DOI: 10.1186/s12859-019-3302-7] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 11/19/2019] [Accepted: 12/02/2019] [Indexed: 12/13/2022] Open
Abstract
BACKGROUND The main research topic of this paper is how to compare multiple biological experiments using transcriptome data, where each experiment is designed to compare control and treated samples. Such comparisons are usually made in terms of the number of differentially expressed genes (DEGs) in an arbitrary combination of experiments, typically with the help of a Venn diagram. However, there are several issues when a Venn diagram is used to compare and analyze multiple experiments in terms of DEGs. First, current Venn diagram tools do not provide a systematic analysis for prioritizing genes, so genes located in a segment of the Venn diagram (especially an intersection) are usually difficult to rank. Second, elucidating phenotypic differences from lists of DEGs and expression values alone is challenging when the experimental design combines treatments; synergistic effects of combined treatments are very difficult to find without an informative system.
RESULTS We introduce Venn-diaNet, a Venn diagram based analysis framework that uses network propagation on a protein-protein interaction network to prioritize genes from experiments that have multiple DEG lists. We suggest that the two issues can be handled effectively by ranking genes within segments of a Venn diagram. The user can easily compare multiple DEG lists through gene rankings, which are easy to understand and can be coupled with additional analyses. Our system provides a web-based interface for selecting seed genes in any area of a Venn diagram and then performing network propagation analysis to measure the influence of the selected seed genes in terms of a ranked list of DEGs.
CONCLUSIONS We suggest that our system can logically guide the selection of seed genes without additional prior knowledge, which frees the user from the seed-selection problem of network propagation. We showed that Venn-diaNet can reproduce the research findings reported in original papers that compare two, three, and eight experiments. Venn-diaNet is freely available at: http://biohealth.snu.ac.kr/software/venndianet.
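To make the idea of Venn diagram segments concrete, the sketch below partitions several DEG lists into exclusive segments (genes found in exactly one particular combination of experiments). Any segment could then serve as a seed set for network propagation. This is only an illustration of the concept, not Venn-diaNet's implementation; the function name, experiment labels, and gene identifiers are hypothetical.

```python
def venn_segments(deg_lists):
    """Split DEG lists into exclusive Venn diagram segments.

    deg_lists: dict mapping an experiment name to an iterable of gene symbols.
    Returns a dict mapping a frozenset of experiment names to the set of genes
    that are DEGs in exactly those experiments and in no others.
    """
    # Record, for every gene, the experiments in which it is a DEG.
    membership = {}
    for experiment, genes in deg_lists.items():
        for gene in set(genes):
            membership.setdefault(gene, set()).add(experiment)

    # Group genes that share the same experiment combination.
    segments = {}
    for gene, experiments in membership.items():
        segments.setdefault(frozenset(experiments), set()).add(gene)
    return segments

# Hypothetical DEG lists from three experiments.
example = {
    "A": ["g1", "g2", "g3"],
    "B": ["g2", "g3", "g4"],
    "C": ["g3", "g5"],
}
segments = venn_segments(example)
seed_genes = segments[frozenset({"A", "B"})]  # DEGs shared by A and B only
```

Keying segments by a frozenset of experiment names keeps each gene in exactly one segment, which matches how regions of a Venn diagram are read.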
Affiliation(s)
- Benjamin Hur
- Interdisciplinary Program in Bioinformatics, Seoul National University, 1 Gwanak-ro, Seoul, Korea
| | - Dongwon Kang
- Department of Computer Science and Engineering, 1 Gwanak-ro, Seoul, Korea
| | - Sangseon Lee
- Department of Computer Science and Engineering, 1 Gwanak-ro, Seoul, Korea
| | - Ji Hwan Moon
- Interdisciplinary Program in Bioinformatics, Seoul National University, 1 Gwanak-ro, Seoul, Korea
| | - Gung Lee
- National Creative Research Initiatives Center for Adipose Tissue Remodeling, Institute of Molecular Biology and Genetics, Department of Biological Sciences, Seoul National University, 1 Gwanak-ro, Seoul, Korea
| | - Sun Kim
- Interdisciplinary Program in Bioinformatics, Seoul National University, 1 Gwanak-ro, Seoul, Korea
- Department of Computer Science and Engineering, 1 Gwanak-ro, Seoul, Korea
- Bioinformatics Institute, Seoul National University, 1 Gwanak-ro, Seoul, Korea
|
128
|
MiRNA-disease interaction prediction based on kernel neighborhood similarity and multi-network bidirectional propagation. BMC Med Genomics 2019; 12:185. [PMID: 31865912 PMCID: PMC6927119 DOI: 10.1186/s12920-019-0622-4] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Indexed: 01/22/2023] Open
Abstract
Background Studies have shown that miRNAs are functionally associated with the development of many human diseases, but the roles of miRNAs in diseases and their underlying molecular mechanisms are not fully understood. Research on miRNA-disease interactions has received increasing attention. Compared with complex and costly biological experiments, computational methods can rapidly and efficiently predict potential miRNA-disease interactions and can serve as a useful supplement to experimental methods.
Results In this paper, we propose a novel computational model based on kernel neighborhood similarity and multi-network bidirectional propagation (KNMBP) for miRNA-disease interaction prediction, especially for new miRNAs and new diseases. First, we integrate multiple data sources for diseases and miRNAs to construct a novel disease semantic similarity network and a miRNA functional similarity network. Second, based on the modified miRNA-disease interactions, we use the kernel neighborhood similarity algorithm to calculate the disease kernel neighborhood similarity and the miRNA kernel neighborhood similarity. Finally, we use a bidirectional propagation algorithm to predict miRNA-disease interaction scores based on the integrated disease similarity network and miRNA similarity network. The AUC of 5-fold cross validation for all interactions by KNMBP is 0.93126 on the commonly used dataset, and the AUC values for all interactions, for all miRNAs, and for all diseases are 0.93795, 0.86363, and 0.86937, respectively, on another dataset that we extracted ourselves; these values are higher than those of other state-of-the-art methods. In addition, our model has good parameter robustness. A case study further demonstrated the predictive performance of the model for novel miRNA-disease interactions.
Conclusions Our KNMBP algorithm efficiently integrates multiple omics data from miRNAs and diseases to stably and efficiently predict potential miRNA-disease interactions. It is anticipated that KNMBP will be a useful tool in biomedical research.
|
129
|
Hill A, Gleim S, Kiefer F, Sigoillot F, Loureiro J, Jenkins J, Morris MK. Benchmarking network algorithms for contextualizing genes of interest. PLoS Comput Biol 2019; 15:e1007403. [PMID: 31860671 PMCID: PMC6944391 DOI: 10.1371/journal.pcbi.1007403] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 01/16/2019] [Revised: 01/06/2020] [Accepted: 09/11/2019] [Indexed: 12/11/2022] Open
Abstract
Computational approaches have shown promise in contextualizing genes of interest with known molecular interactions. In this work, we evaluate seventeen previously published algorithms based on characteristics of their output and their performance in three tasks: cross validation, prediction of drug targets, and behavior with random input. Our work highlights strengths and weaknesses of each algorithm and results in a recommendation of the algorithms best suited to different tasks. The evaluation was motivated by our own aim of using network algorithms to contextualize hits from functional genomics screens and gene expression studies, and by the need to understand how to apply these algorithms to our data.
Affiliation(s)
- Abby Hill
- Chemical Biology and Therapeutics, Novartis Institutes for Biomedical Research, Cambridge, Massachusetts, United States of America
| | - Scott Gleim
- Chemical Biology and Therapeutics, Novartis Institutes for Biomedical Research, Cambridge, Massachusetts, United States of America
| | - Florian Kiefer
- Novartis Informatics, Novartis Institutes for Biomedical Research, Basel, Switzerland
| | - Frederic Sigoillot
- Chemical Biology and Therapeutics, Novartis Institutes for Biomedical Research, Cambridge, Massachusetts, United States of America
| | - Joseph Loureiro
- Chemical Biology and Therapeutics, Novartis Institutes for Biomedical Research, Cambridge, Massachusetts, United States of America
| | - Jeremy Jenkins
- Chemical Biology and Therapeutics, Novartis Institutes for Biomedical Research, Cambridge, Massachusetts, United States of America
| | - Melody K. Morris
- Respiratory Disease Area Department, Novartis Institutes for Biomedical Research, Cambridge, Massachusetts, United States of America
|
130
|
Leal LG, David A, Jarvelin MR, Sebert S, Männikkö M, Karhunen V, Seaby E, Hoggart C, Sternberg MJE. Identification of disease-associated loci using machine learning for genotype and network data integration. Bioinformatics 2019; 35:5182-5190. [PMID: 31070705 PMCID: PMC6954643 DOI: 10.1093/bioinformatics/btz310] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 12/22/2018] [Revised: 03/28/2019] [Accepted: 04/25/2019] [Indexed: 01/19/2023] Open
Abstract
MOTIVATION Integration of different omics data could markedly help to identify biological signatures, understand the missing heritability of complex diseases and ultimately achieve personalized medicine. Standard regression models used in Genome-Wide Association Studies (GWAS) identify loci with a strong effect size, whereas GWAS meta-analyses are often needed to capture weak loci contributing to the missing heritability. Development of novel machine learning algorithms for merging genotype data with other omics data is highly needed as it could enhance the prioritization of weak loci. RESULTS We developed cNMTF (corrected non-negative matrix tri-factorization), an integrative algorithm based on clustering techniques of biological data. This method assesses the inter-relatedness between genotypes, phenotypes, the damaging effect of the variants and gene networks in order to identify loci-trait associations. cNMTF was used to prioritize genes associated with lipid traits in two population cohorts. We replicated 129 genes reported in GWAS world-wide and provided evidence that supports 85% of our findings (226 out of 265 genes), including recent associations in literature (NLGN1), regulators of lipid metabolism (DAB1) and pleiotropic genes for lipid traits (CARM1). Moreover, cNMTF performed efficiently against strong population structures by accounting for the individuals' ancestry. As the method is flexible in the incorporation of diverse omics data sources, it can be easily adapted to the user's research needs. AVAILABILITY AND IMPLEMENTATION An R package (cnmtf) is available at https://lgl15.github.io/cnmtf_web/index.html. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Luis G Leal
- Department of Life Sciences, Centre for Integrative Systems Biology and Bioinformatics, Imperial College London, London SW7 2AZ, UK
| | - Alessia David
- Department of Life Sciences, Centre for Integrative Systems Biology and Bioinformatics, Imperial College London, London SW7 2AZ, UK
| | - Marjo-Riita Jarvelin
- Center for Life Course Health Research, Faculty of Medicine, University of Oulu, Oulu FI-90014, Finland
- Biocenter Oulu, University of Oulu, Oulu 90220, Finland
- Unit of Primary Health Care, Oulu University Hospital, Oulu 90220, Finland
- Department of Epidemiology and Biostatistics, MRC-PHE Centre for Environment and Health, School of Public Health, Imperial College London, London W2 1PG, UK
- Department of Life Sciences, College of Health and Life Sciences, Brunel University London, Middlesex UB8 3PH, UK
| | - Sylvain Sebert
- Center for Life Course Health Research, Faculty of Medicine, University of Oulu, Oulu FI-90014, Finland
- Biocenter Oulu, University of Oulu, Oulu 90220, Finland
| | - Minna Männikkö
- Center for Life Course Health Research, Faculty of Medicine, University of Oulu, Oulu FI-90014, Finland
| | - Ville Karhunen
- Center for Life Course Health Research, Faculty of Medicine, University of Oulu, Oulu FI-90014, Finland
- Biocenter Oulu, University of Oulu, Oulu 90220, Finland
- Unit of Primary Health Care, Oulu University Hospital, Oulu 90220, Finland
- Department of Epidemiology and Biostatistics, MRC-PHE Centre for Environment and Health, School of Public Health, Imperial College London, London W2 1PG, UK
- Department of Life Sciences, College of Health and Life Sciences, Brunel University London, Middlesex UB8 3PH, UK
| | - Eleanor Seaby
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Clive Hoggart
- Department of Medicine, Imperial College London, London W2 1PG, UK
| | - Michael J E Sternberg
- Department of Life Sciences, Centre for Integrative Systems Biology and Bioinformatics, Imperial College London, London SW7 2AZ, UK
|
131
|
Wang Y, Juan L, Peng J, Zang T, Wang Y. Prioritizing candidate diseases-related metabolites based on literature and functional similarity. BMC Bioinformatics 2019; 20:574. [PMID: 31760947 PMCID: PMC6876110 DOI: 10.1186/s12859-019-3127-4] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 12/18/2022] Open
Abstract
Background As the terminal products of cellular regulatory processes, functionally related metabolites have a close relationship with complex diseases and are often associated with the same or similar diseases. Identifying disease-related metabolites therefore plays a critical role in understanding the pathogenesis of disease comprehensively, with the aim of improving clinical medicine. Considering that a large number of metabolic markers of diseases remain to be explored, we propose a computational model to identify potential disease-related metabolites based on functional relationships between metabolites and literature-derived scores. First, after obtaining associations between metabolites and diseases from the Human Metabolome database, we calculate the similarities of metabolites using a modified collaborative filtering recommendation strategy that utilizes the similarities between diseases. Next, a disease-associated metabolite network (DMN) is built with the similarities between metabolites as edge weights. To improve the ability to identify disease-related metabolites, we introduce text-mining scores from an existing database of chemicals and proteins into the DMN and build a new disease-associated metabolite network (FLDMN) by fusing functional associations and literature scores. Finally, we apply random walk with restart (RWR) on this network to predict candidate metabolites related to diseases.
Results We construct the disease-associated metabolite network and its improved network (FLDMN) with 245 diseases, 587 metabolites and 28,715 disease-metabolite associations. Subsequently, we extract training sets and testing sets from two different versions of the Human Metabolome database and assess the performance of DMN and FLDMN on 19 diseases. The average AUC (area under the receiver operating characteristic curve) of DMN is 64.35%. As a further improved network, FLDMN successfully predicts potential metabolic signatures for the 19 diseases with an average AUC of 76.03%.
Conclusion In this paper, a computational model is proposed for exploring metabolite-disease pairs, and adequate validation shows that it performs well in predicting potential metabolites related to diseases. This result suggests that integrating literature and functional associations can be an effective way to construct a disease-associated metabolite network for prioritizing candidate disease-related metabolites.
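The final step of the abstract, random walk with restart, can be sketched generically as below. This is a textbook formulation under stated assumptions, not the authors' code: the column normalization, the restart probability of 0.5, the convergence tolerance, and the four-node toy network are all choices made here for illustration.

```python
import numpy as np

def random_walk_with_restart(adjacency, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Generic RWR on a weighted, undirected network.

    adjacency: (n, n) symmetric weight matrix (e.g. metabolite similarities).
    seeds:     indices of nodes already known to be associated with the disease.
    Returns the steady-state visiting probability of every node.
    """
    A = np.asarray(adjacency, dtype=float)
    col_sums = A.sum(axis=0)
    col_sums[col_sums == 0] = 1.0          # leave isolated nodes as zero columns
    W = A / col_sums                       # column-normalized transition matrix

    p0 = np.zeros(A.shape[0])
    p0[list(seeds)] = 1.0 / len(seeds)     # walker starts uniformly on the seeds
    p = p0.copy()
    for _ in range(max_iter):
        # Move along an edge with probability (1 - restart), else jump back to a seed.
        p_next = (1 - restart) * (W @ p) + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p

# Hypothetical 4-node path network 0 - 1 - 2 - 3, with node 0 as the only seed.
path = np.array([[0, 1, 0, 0],
                 [1, 0, 1, 0],
                 [0, 1, 0, 1],
                 [0, 0, 1, 0]])
probabilities = random_walk_with_restart(path, seeds=[0])
ranking = np.argsort(-probabilities)       # candidates ordered by proximity to the seed
```

Nodes closer to the seed set receive higher steady-state probability, which is what allows non-seed nodes to be ranked as candidates.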
Affiliation(s)
- Yongtian Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, People's Republic of China
| | - Liran Juan
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, 150001, People's Republic of China
| | - Jiajie Peng
- School of Computer Science, Northwestern Polytechnical University, Xi'an, People's Republic of China
| | - Tianyi Zang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, People's Republic of China.
| | - Yadong Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, People's Republic of China.
|
132
|
Hyung D, Mallon AM, Kyung DS, Cho SY, Seong JK. TarGo: network based target gene selection system for human disease related mouse models. Lab Anim Res 2019; 35:23. [PMID: 32257911 PMCID: PMC7081697 DOI: 10.1186/s42826-019-0023-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2019] [Accepted: 10/21/2019] [Indexed: 11/25/2022] Open
Abstract
Genetically engineered mouse models are used in high-throughput phenotyping screens to understand genotype-phenotype associations and their relevance to human diseases. However, not all mutant mouse lines with detectable phenotypes are associated with human diseases. Here, we propose the “Target gene selection system for Genetically engineered mouse models” (TarGo). Using a combination of human disease descriptions, network topology, and genotype-phenotype correlations, novel genes that are potentially related to human diseases are suggested. We constructed a gene interaction network using protein-protein interactions, molecular pathways, and co-expression data. Several repositories for human disease signatures were used to obtain information on human disease-related genes. We calculated disease- or phenotype-specific gene ranks using network topology and disease signatures. In conclusion, TarGo provides many novel features for gene function prediction.
Affiliation(s)
- Daejin Hyung
- National Cancer Center, 323 Ilsan-ro, Goyang-si, Kyeonggi-do 10408 Republic of Korea
| | - Ann-Marie Mallon
- MRC Harwell Institute, Mammalian Genetics Unit, Oxfordshire, OX11 0RD UK
| | - Dong Soo Kyung
- Laboratory of Developmental Biology and Genomics, Research Institute for Veterinary Science, and BK21 Plus Program for Creative Veterinary Science, College of Veterinary Medicine, Seoul National University, Seoul, 08826 Republic of Korea
- Korea Mouse Phenotyping Center (KMPC), Seoul National University, Seoul, 08826 Republic of Korea
- Interdisciplinary Program for Bioinformatics, Program for Cancer Biology and BIO-MAX institute, Seoul National University, Seoul, 08826 Republic of Korea
| | - Soo Young Cho
- National Cancer Center, 323 Ilsan-ro, Goyang-si, Kyeonggi-do 10408 Republic of Korea
- Korea Mouse Phenotyping Center (KMPC), Seoul National University, Seoul, 08826 Republic of Korea
| | - Je Kyung Seong
- Laboratory of Developmental Biology and Genomics, Research Institute for Veterinary Science, and BK21 Plus Program for Creative Veterinary Science, College of Veterinary Medicine, Seoul National University, Seoul, 08826 Republic of Korea
- Korea Mouse Phenotyping Center (KMPC), Seoul National University, Seoul, 08826 Republic of Korea
- Interdisciplinary Program for Bioinformatics, Program for Cancer Biology and BIO-MAX institute, Seoul National University, Seoul, 08826 Republic of Korea
| |
Collapse
|
133
|
Chen Z, Wang X, Gao P, Liu H, Song B. Predicting Disease Related microRNA Based on Similarity and Topology. Cells 2019; 8:cells8111405. [PMID: 31703479 PMCID: PMC6912199 DOI: 10.3390/cells8111405] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 07/04/2019] [Revised: 10/31/2019] [Accepted: 11/05/2019] [Indexed: 12/19/2022] Open
Abstract
It is known that many diseases are caused by mutations or abnormalities in microRNA (miRNA). The usual method to predict miRNA disease relationships is to build a high-quality similarity network of diseases and miRNAs. All unobserved associations are ranked by their similarity scores, such that a higher score indicates a greater probability of a potential connection. However, this approach does not utilize information within the network. Therefore, in this study, we propose a machine learning method, called STIM, which uses network topology information to predict disease-miRNA associations. In contrast to the conventional approach, STIM constructs features according to information on similarity and topology in networks and then uses a machine learning model to predict potential associations. To verify the reliability and accuracy of our method, we compared STIM to other classical algorithms. The results of fivefold cross validation demonstrated that STIM outperforms many existing methods, particularly in terms of the area under the curve. In addition, the top 30 candidate miRNAs recommended by STIM in a case study of lung neoplasm have been confirmed in previous experiments, which proved the validity of the method.
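The conventional similarity-based ranking that this abstract contrasts STIM with can be sketched as follows. This is only an illustration, not the STIM method or any published scoring rule: the function name `rank_unobserved`, the scoring rule (sum of similarities to miRNAs already linked to the disease), and all identifiers and numbers in the toy data are assumptions made for the example.

```python
def rank_unobserved(known, mirna_sim, mirnas, diseases):
    """Rank unobserved (miRNA, disease) pairs by a similarity-based score.

    Assumed scoring rule (illustrative only): the score of (m, d) is the sum of
    the similarities between m and every other miRNA already linked to d.
    """
    scores = {}
    for m in mirnas:
        for d in diseases:
            if (m, d) in known:
                continue  # only unobserved associations are ranked
            scores[(m, d)] = sum(
                mirna_sim.get(frozenset((m, other)), 0.0)
                for (other, linked_disease) in known
                if linked_disease == d and other != m
            )
    # Higher score first; ties are broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical toy data: identifiers and similarity values are made up.
mirnas = ["miR-A", "miR-B", "miR-C"]
diseases = ["D1", "D2"]
known = {("miR-A", "D1"), ("miR-B", "D1"), ("miR-B", "D2")}
mirna_sim = {
    frozenset(("miR-A", "miR-B")): 0.9,
    frozenset(("miR-A", "miR-C")): 0.2,
    frozenset(("miR-B", "miR-C")): 0.5,
}

ranking = rank_unobserved(known, mirna_sim, mirnas, diseases)
for pair, score in ranking:
    print(pair, round(score, 3))
```

In this toy setting the pair at the top of the list is the candidate most similar to miRNAs already associated with the disease, which is the sense in which a higher score is read as a greater probability of a connection.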
Affiliation(s)
- Zhihua Chen: Institute of Computing Science and Technology, Guangzhou University, Guangzhou 510006, China
- Xinke Wang: School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan 430074, China
- Peng Gao: School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan 430074, China
- Hongju Liu: College of Information Technology and Computer Science, University of the Cordilleras, Baguio 2600, Philippines
- Bosheng Song: School of Information Science and Engineering, Hunan University, Changsha 410082, China
|
134
|
Alam A, Imam N, Ahmed MM, Tazyeen S, Tamkeen N, Farooqui A, Malik MZ, Ishrat R. Identification and Classification of Differentially Expressed Genes and Network Meta-Analysis Reveals Potential Molecular Signatures Associated With Tuberculosis. Front Genet 2019; 10:932. [PMID: 31749827 PMCID: PMC6844239 DOI: 10.3389/fgene.2019.00932] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Received: 04/09/2019] [Accepted: 09/05/2019] [Indexed: 12/13/2022] Open
Abstract
Tuberculosis (TB) is a deadly transmissible disease that causes death worldwide; however, only 10% of people infected with Mycobacterium tuberculosis develop disease, indicating that host genetic factors may play a key role in determining susceptibility to TB disease. Gene expression profiling of TB-infected individuals can therefore give a snapshot of actively expressed genes and transcripts under various conditions. In the present study, we analyzed microarray data sets and compared gene expression profiles across datasets of healthy controls, latent infection, and active TB. We observed the transition of genes from the normal condition to different stages of TB and identified and annotated those genes/pathways/processes that have important roles in TB disease during its cyclic interventions in the human body. We identified 488 genes that were differentially expressed at various stages of TB and assigned them to pathways and gene set enrichment analysis (GSEA). The importance of these pathways and GSEA terms was evaluated according to the number of differentially expressed genes (DEGs) present in both. In addition, we studied the gene regulatory networks, which may help to further understand the molecular mechanism of the immune response against TB infection and provide a new angle for future biomarkers and therapeutic targets. We identified 26 leading hubs that are deeply rooted from top to bottom in the gene regulatory network and work as the backbone of the network. These leading hubs contain 31 key regulator genes, of which 14 were up-regulated and 17 were down-regulated. The proposed approach, based on gene-expression profiling and network analysis, predicts some unknown TB-associated genes, which can be considered (or tested) as reliable candidates for further (in vivo/in vitro) studies.
Affiliation(s)
- Aftab Alam: Centre for Interdisciplinary Research in Basic Sciences, Jamia Millia Islamia, New Delhi, India
- Nikhat Imam: Centre for Interdisciplinary Research in Basic Sciences, Jamia Millia Islamia, New Delhi, India; Department of Mathematics, Institute of Computer Science & Information Technology, Magadh University, Bodh Gaya, India
- Mohd Murshad Ahmed: Centre for Interdisciplinary Research in Basic Sciences, Jamia Millia Islamia, New Delhi, India
- Safia Tazyeen: Centre for Interdisciplinary Research in Basic Sciences, Jamia Millia Islamia, New Delhi, India
- Naaila Tamkeen: Department of Biosciences, Jamia Millia Islamia, New Delhi, India
- Anam Farooqui: Centre for Interdisciplinary Research in Basic Sciences, Jamia Millia Islamia, New Delhi, India
- Md Zubbair Malik: School of Computational & Integrative Sciences, Jawaharlal Nehru University, New Delhi, India
- Romana Ishrat: Centre for Interdisciplinary Research in Basic Sciences, Jamia Millia Islamia, New Delhi, India
|
135
|
Ma X, Sun P, Zhang ZY. An Integrative Framework for Protein Interaction Network and Methylation Data to Discover Epigenetic Modules. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2019; 16:1855-1866. [PMID: 29994031 DOI: 10.1109/tcbb.2018.2831666] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Indexed: 02/02/2023]
Abstract
DNA methylation is a critical epigenetic modification that plays an important role in cancers. The available algorithms fail to fully characterize epigenetic modules. To address this issue, we first characterize the epigenetic module as a group of genes that are well connected in the protein interaction network and co-methylated according to gene methylation profiles. The epigenetic module discovery problem is then transformed into an optimization problem, and a regularized nonnegative matrix factorization algorithm for methylation modules (RNMF-MM) is presented, in which the co-methylation constraint is treated as a regularizer. Using artificial networks with known module structure, we demonstrate that the proposed algorithm outperforms state-of-the-art approaches in terms of accuracy. On the basis of breast cancer methylation data and a protein interaction network, the RNMF-MM algorithm discovers methylation modules that are significantly more enriched in known pathways than those obtained by other algorithms. These modules serve as biomarkers for predicting cancer stages and estimating the survival time of patients. The proposed model and algorithm provide an effective way to integrate protein interaction network and methylation data.
|
136
|
Gligorijevic V, Barot M, Bonneau R. deepNF: deep network fusion for protein function prediction. Bioinformatics 2019; 34:3873-3881. [PMID: 29868758 PMCID: PMC6223364 DOI: 10.1093/bioinformatics/bty440] [Citation(s) in RCA: 131] [Impact Index Per Article: 21.8] [Received: 11/24/2017] [Accepted: 05/28/2018] [Indexed: 01/10/2023] Open
Abstract
Motivation: The prevalence of high-throughput experimental methods has resulted in an abundance of large-scale molecular and functional interaction networks. The connectivity of these networks provides a rich source of information for inferring functional annotations for genes and proteins. An important challenge has been to develop methods for combining these heterogeneous networks to extract useful protein feature representations for function prediction. Most of the existing approaches for network integration use shallow models that encounter difficulty in capturing complex and highly non-linear network structures. Thus, we propose deepNF, a network fusion method based on Multimodal Deep Autoencoders to extract high-level features of proteins from multiple heterogeneous interaction networks.
Results: We apply this method to combine STRING networks to construct a common low-dimensional representation containing high-level protein features. We use separate layers for different network types in the early stages of the multimodal autoencoder, later connecting all the layers into a single bottleneck layer from which we extract features to predict protein function. We compare the cross-validation and temporal holdout predictive performance of our method with state-of-the-art methods, including the recently proposed method Mashup. Our results show that our method outperforms previous methods for both human and yeast STRING networks. We also show substantial improvement in the performance of our method in predicting gene ontology terms of varying type and specificity.
Availability and implementation: deepNF is freely available at: https://github.com/VGligorijevic/deepNF.
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Vladimir Gligorijevic: Center for Computational Biology, Flatiron Institute, Simons Foundation, New York, NY, USA
- Meet Barot: Center for Computational Biology, Flatiron Institute, Simons Foundation, New York, NY, USA
- Richard Bonneau: Center for Computational Biology, Flatiron Institute, Simons Foundation, New York, NY, USA; Department of Biology, Center for Genomics and Systems Biology, New York University, New York, NY, USA; Center for Data Science, New York University, New York, NY, USA
|
137
|
De Novo Pathway-Based Classification of Breast Cancer Subtypes. Methods Mol Biol 2019. [PMID: 31583640 DOI: 10.1007/978-1-4939-9873-9_15] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/03/2025]
Abstract
Breast cancer is a heterogeneous disease for which various clinically relevant subtypes have been reported. These subtypes are characterized by molecular differences which direct treatment selection. The state of the art for breast cancer subtyping utilizes histochemistry or gene expression to measure a few selected markers. However, classification based on molecular pathways (rather than individual markers) is a more robust way to classify breast cancer samples into known subtypes. Here, we present PathClass, a web application that allows its users to predict breast cancer subtypes using various traditional as well as advanced methods. This includes methods based on classical gene expression panels as well as de novo pathway-based predictors. Users can predict labels for datasets in the Gene Expression Omnibus or upload their own expression profiling data. Availability: https://pathclass.compbio.sdu.dk/.
|
138
|
Reyna MA, Leiserson MDM, Raphael BJ. Hierarchical HotNet: identifying hierarchies of altered subnetworks. Bioinformatics 2019; 34:i972-i980. [PMID: 30423088 PMCID: PMC6129270 DOI: 10.1093/bioinformatics/bty613] [Citation(s) in RCA: 64] [Impact Index Per Article: 10.7] [Indexed: 01/06/2023] Open
Abstract
Motivation: The analysis of high-dimensional ‘omics data is often informed by the use of biological interaction networks. For example, protein–protein interaction networks have been used to analyze gene expression data, to prioritize germline variants, and to identify somatic driver mutations in cancer. In these and other applications, the underlying computational problem is to identify altered subnetworks containing genes that are both highly altered in an ‘omics dataset and are topologically close (e.g. connected) on an interaction network.
Results: We introduce Hierarchical HotNet, an algorithm that finds a hierarchy of altered subnetworks. Hierarchical HotNet assesses the statistical significance of the resulting subnetworks over a range of biological scales and explicitly controls for ascertainment bias in the network. We evaluate the performance of Hierarchical HotNet and several other algorithms that identify altered subnetworks on the problem of predicting cancer genes and significantly mutated subnetworks. On somatic mutation data from The Cancer Genome Atlas, Hierarchical HotNet outperforms other methods and identifies significantly mutated subnetworks containing both well-known cancer genes and candidate cancer genes that are rarely mutated in the cohort. Hierarchical HotNet is a robust algorithm for identifying altered subnetworks across different ‘omics datasets.
Availability and implementation: http://github.com/raphael-group/hierarchical-hotnet.
Supplementary information: Supplementary materials are available at Bioinformatics online.
Affiliation(s)
- Matthew A Reyna: Department of Computer Science, Princeton University, Princeton, NJ, USA
- Mark D M Leiserson: Department of Computer Science, University of Maryland, College Park, MD, USA
- Benjamin J Raphael: Department of Computer Science, Princeton University, Princeton, NJ, USA
|
139
|
Alshahrani M, Hoehndorf R. Semantic Disease Gene Embeddings (SmuDGE): phenotype-based disease gene prioritization without phenotypes. Bioinformatics 2019; 34:i901-i907. [PMID: 30423077 PMCID: PMC6129260 DOI: 10.1093/bioinformatics/bty559] [Citation(s) in RCA: 34] [Impact Index Per Article: 5.7] [Indexed: 01/09/2023] Open
Abstract
Motivation: In the past years, several methods have been developed to incorporate information about phenotypes into computational disease gene prioritization methods. These methods commonly compute the similarity between a disease’s (or patient’s) phenotypes and a database of gene-to-phenotype associations to find the phenotypically most similar match. A key limitation of these methods is their reliance on knowledge about phenotypes associated with particular genes, which is highly incomplete in humans as well as in many model organisms such as the mouse.
Results: We developed SmuDGE, a method that uses feature learning to generate vector-based representations of phenotypes associated with an entity. SmuDGE can be used as a trainable semantic similarity measure to compare two sets of phenotypes (such as between a disease and gene, or a disease and patient). More importantly, SmuDGE can generate phenotype representations for entities that are only indirectly associated with phenotypes through an interaction network; for this purpose, SmuDGE exploits background knowledge in interaction networks comprised of multiple types of interactions. We demonstrate that SmuDGE can match or outperform semantic similarity in phenotype-based disease gene prioritization, and furthermore significantly extends the coverage of phenotype-based methods to all genes in a connected interaction network.
Availability and implementation: https://github.com/bio-ontology-research-group/SmuDGE
Affiliation(s)
- Mona Alshahrani: Computer, Electrical and Mathematical Sciences and Engineering Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
- Robert Hoehndorf: Computer, Electrical and Mathematical Sciences and Engineering Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
|
140
|
Cheng L, Zhao H, Wang P, Zhou W, Luo M, Li T, Han J, Liu S, Jiang Q. Computational Methods for Identifying Similar Diseases. Molecular Therapy. Nucleic Acids 2019; 18:590-604. [PMID: 31678735 PMCID: PMC6838934 DOI: 10.1016/j.omtn.2019.09.019] [Citation(s) in RCA: 80] [Impact Index Per Article: 13.3] [Received: 04/25/2019] [Revised: 09/11/2019] [Accepted: 09/12/2019] [Indexed: 02/01/2023]
Abstract
Although our knowledge of human diseases has increased dramatically, the molecular basis, phenotypic traits, and therapeutic targets of most diseases still remain unclear. An increasing number of studies have observed that similar diseases often are caused by similar molecules, can be diagnosed by similar markers or phenotypes, or can be cured by similar drugs. Thus, the identification of diseases similar to known ones has attracted considerable attention worldwide. To this end, the associations between diseases at the molecular, phenotypic, and taxonomic levels were used to measure the pairwise similarity in diseases. The corresponding performance assessment strategies for these methods involving the terms “category-based,” “simulated-patient-based,” and “benchmark-data-based” were thus further emphasized. Then, frequently used methods were evaluated using a benchmark-data-based strategy. To facilitate the assessment of disease similarity scores, researchers have designed dozens of tools that implement these methods for calculating disease similarity. Currently, disease similarity has been advantageous in predicting noncoding RNA (ncRNA) function and therapeutic drugs for diseases. In this article, we review disease similarity methods, evaluation strategies, tools, and their applications in the biomedical community. We further evaluate the performance of these methods and discuss the current limitations and future trends for calculating disease similarity.
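As a small illustration of measuring pairwise disease similarity at the molecular level, the sketch below scores two diseases by the overlap of their associated gene sets. The Jaccard index is one common choice picked here as an assumption; the abstract does not say which measures the review covers, and the disease names and gene identifiers are hypothetical placeholders.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two sets: |A & B| / |A | B| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical disease-to-gene associations (placeholder identifiers).
disease_genes = {
    "disease_a": {"G1", "G2", "G3"},
    "disease_b": {"G2", "G3", "G4"},
    "disease_c": {"G5"},
}

# Similarity for every unordered pair of diseases.
pairwise = {
    (x, y): jaccard(disease_genes[x], disease_genes[y])
    for x, y in combinations(sorted(disease_genes), 2)
}

for pair, score in pairwise.items():
    print(pair, score)
```

Phenotype-level or taxonomy-level similarity would replace the gene sets with phenotype term sets or ontology paths, and typically uses measures more elaborate than set overlap.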
Affiliation(s)
- Liang Cheng: College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Hengqiang Zhao: College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Pingping Wang: School of Life Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
- Wenyang Zhou: School of Life Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
- Meng Luo: School of Life Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
- Tianxin Li: School of Life Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
- Junwei Han: College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Shulin Liu: Systemomics Center, College of Pharmacy, and Genomics Research Center (State-Province Key Laboratories of Biomedicine-Pharmaceutics of China), Harbin Medical University, Harbin, Heilongjiang, China; Department of Microbiology, Immunology and Infectious Diseases, University of Calgary, Calgary, AB, Canada
- Qinghua Jiang: School of Life Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
|
141
|
Chagoyen M, Ranea JAG, Pazos F. Applications of molecular networks in biomedicine. Biol Methods Protoc 2019; 4:bpz012. [PMID: 32395629 PMCID: PMC7200821 DOI: 10.1093/biomethods/bpz012] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 06/30/2019] [Revised: 08/20/2019] [Accepted: 08/28/2019] [Indexed: 12/12/2022] Open
Abstract
Due to the large interdependence between the molecular components of living systems, many phenomena, including those related to pathologies, cannot be explained in terms of a single gene or a small number of genes. Molecular networks, representing different types of relationships between molecular entities, embody these large sets of interdependences in a framework that allows them to be mined for information from a systemic point of view. These networks, often generated from high-throughput omics datasets, are used to study the complex phenomena of human pathologies. Complementing the reductionist approach of molecular biology, based on the detailed study of a small number of genes, systemic approaches to human diseases consider that these are better reflected in large and intricate networks of relationships between genes. These networks, and not the single genes, provide both better markers for diagnosing diseases and targets for treating them. Network approaches are being used to gain insight into the molecular basis of complex diseases and interpret the large datasets associated with them, such as genomic variants. Network formalism is also suitable for integrating large, heterogeneous and multilevel datasets associated with diseases from the molecular level to organismal and epidemiological scales. Many of these approaches are available to nonexpert users through standard software packages.
Affiliation(s)
- Monica Chagoyen: Computational Systems Biology Group, Systems Biology Program, National Centre for Biotechnology (CNB-CSIC), Madrid, Spain
- Juan A G Ranea: Department of Molecular Biology and Biochemistry, University of Malaga, Malaga, Spain; CIBER de Enfermedades Raras, Instituto de Salud Carlos III, Madrid, Spain
- Florencio Pazos: Computational Systems Biology Group, Systems Biology Program, National Centre for Biotechnology (CNB-CSIC), Madrid, Spain
|
142
|
Zeng X, Liu L, Lü L, Zou Q. Prediction of potential disease-associated microRNAs using structural perturbation method. Bioinformatics 2019; 34:2425-2432. [PMID: 29490018 DOI: 10.1093/bioinformatics/bty112] [Citation(s) in RCA: 168] [Impact Index Per Article: 28.0] [Received: 07/16/2017] [Accepted: 02/24/2018] [Indexed: 12/12/2022] Open
Abstract
Motivation: The identification of disease-related microRNAs (miRNAs) is an essential but challenging task in bioinformatics research. Similarity-based link prediction methods are often used to predict potential associations between miRNAs and diseases. In these methods, all unobserved associations are ranked by their similarity scores; a higher score indicates a higher probability of existence. However, most previous studies focus mainly on designing advanced methods to improve prediction accuracy while neglecting to investigate the link predictability of the networks that represent the miRNA-disease associations. In this work, we construct a bilayer network by integrating the miRNA-disease network, the miRNA similarity network and the disease similarity network. We use structural consistency as an indicator to estimate the link predictability of the related networks. On the basis of this indicator, a derivative algorithm, called the structural perturbation method (SPM), is applied to predict potential associations between miRNAs and diseases.
Results: The link predictability of the bilayer network is higher than that of the miRNA-disease network, indicating that prediction of potential miRNA-disease associations on the bilayer network can achieve higher accuracy than prediction based merely on the miRNA-disease network. A comparison between SPM and other algorithms, tested on fifteen networks, shows that SPM performed reliably in 5-fold cross-validation. The AUC values of SPM are higher than those of some well-known methods, indicating that SPM could serve as a useful computational method for improving the identification accuracy of miRNA-disease associations. Moreover, in a case study on breast neoplasm, 80% of the top-20 predicted miRNAs have been manually confirmed by previous experimental studies.
Availability and implementation: https://github.com/lecea/SPM-code.git.
Supplementary information: Supplementary data are available at Bioinformatics online.
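To make the evaluation terms in this abstract concrete, the sketch below shows one way a 5-fold split and an AUC value could be computed for link prediction. It is an illustration under stated assumptions, not the authors' evaluation code: the helper names `k_folds` and `auc` are hypothetical, the folds are taken round-robin without shuffling, and the scores are made-up numbers. AUC is computed as the probability that a held-out true association receives a higher score than a non-association, with ties counted as one half.

```python
def k_folds(items, k):
    """Split items into k roughly equal folds (round-robin, no shuffling)."""
    return [items[i::k] for i in range(k)]


def auc(pos_scores, neg_scores):
    """Area under the ROC curve via pairwise comparison of scores.

    Counts the fraction of (positive, negative) pairs in which the positive
    scores higher; a tie contributes 0.5.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Ten known associations (represented by index) split into five folds;
# each fold would be hidden in turn and predicted from the other four.
folds = k_folds(list(range(10)), 5)

# Made-up prediction scores for one fold.
held_out_scores = [0.9, 0.7, 0.4]               # hidden true associations
non_association_scores = [0.8, 0.3, 0.3, 0.1]   # pairs with no known link
auc_value = auc(held_out_scores, non_association_scores)
print(folds)
print(round(auc_value, 4))
```

The pairwise form is quadratic in the number of scores, which is fine for a sketch; a rank-based computation is the usual choice for larger networks.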
Affiliation(s)
- Xiangxiang Zeng: Department of Computer Science, Xiamen University, Xiamen, China; Department of Artificial Intelligence, Universidad Politécnica de Madrid (UPM) Campus Montegancedo s/n, Boadilla del Monte, Madrid, Spain
- Li Liu: Department of Computer Science, Xiamen University, Xiamen, China
- Linyuan Lü: Alibaba Research Center for Complexity Sciences, Alibaba Business College, Hangzhou Normal University, Hangzhou, China; Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Quan Zou: School of Computer Science and Technology, Tianjin University, Tianjin, China
|
143
|
Zolotareva O, Kleine M. A Survey of Gene Prioritization Tools for Mendelian and Complex Human Diseases. J Integr Bioinform 2019; 16:/j/jib.ahead-of-print/jib-2018-0069/jib-2018-0069.xml. [PMID: 31494632 PMCID: PMC7074139 DOI: 10.1515/jib-2018-0069] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Received: 09/06/2018] [Accepted: 07/12/2019] [Indexed: 12/16/2022] Open
Abstract
Modern high-throughput experiments provide us with numerous potential associations between genes and diseases. Experimental validation of all the discovered associations, let alone all the possible interactions between them, is time-consuming and expensive. To facilitate the discovery of causative genes, various approaches for prioritizing genes according to their relevance for a given disease have been developed. In this article, we explain the gene prioritization problem and provide an overview of computational tools for gene prioritization. Among roughly a hundred published gene prioritization tools, we select and briefly describe the 14 most up-to-date and user-friendly ones. We also discuss the advantages and disadvantages of existing tools, the challenges of their validation, and directions for future research.
Affiliation(s)
- Olga Zolotareva: Bielefeld University, Faculty of Technology and Center for Biotechnology, International Research Training Group "Computational Methods for the Analysis of the Diversity and Dynamics of Genomes" and Genome Informatics, Universitätsstraße 25, Bielefeld, Germany
- Maren Kleine: Bielefeld University, Faculty of Technology, Bioinformatics/Medical Informatics Department, Universitätsstraße 25, Bielefeld, Germany
|
144
|
Moutaoufik MT, Malty R, Amin S, Zhang Q, Phanse S, Gagarinova A, Zilocchi M, Hoell L, Minic Z, Gagarinova M, Aoki H, Stockwell J, Jessulat M, Goebels F, Broderick K, Scott NE, Vlasblom J, Musso G, Prasad B, Lamantea E, Garavaglia B, Rajput A, Murayama K, Okazaki Y, Foster LJ, Bader GD, Cayabyab FS, Babu M. Rewiring of the Human Mitochondrial Interactome during Neuronal Reprogramming Reveals Regulators of the Respirasome and Neurogenesis. iScience 2019; 19:1114-1132. [PMID: 31536960 PMCID: PMC6831851 DOI: 10.1016/j.isci.2019.08.057] [Citation(s) in RCA: 37] [Impact Index Per Article: 6.2] [Received: 01/22/2019] [Revised: 06/28/2019] [Accepted: 08/29/2019] [Indexed: 12/13/2022] Open
Abstract
Mitochondrial protein (MP) assemblies undergo alterations during neurogenesis, a complex process vital in brain homeostasis and disease. Yet which MP assemblies remodel during differentiation remains unclear. Here, using mass spectrometry-based co-fractionation profiles and phosphoproteomics, we generated mitochondrial interaction maps of human pluripotent embryonal carcinoma stem cells and differentiated neuronal-like cells, which presented as two discrete cell populations by single-cell RNA sequencing. The resulting networks, encompassing 6,442 high-quality associations among 600 MPs, revealed widespread changes in mitochondrial interactions and site-specific phosphorylation during neuronal differentiation. By leveraging the networks, we show the orphan C20orf24 as a respirasome assembly factor whose disruption markedly reduces respiratory chain activity in patients deficient in complex IV. We also find that a heme-containing neurotrophic factor, neuron-derived neurotrophic factor (NENF), couples with Parkinson disease-related proteins to promote neurotrophic activity. Our results provide insights into the dynamic reorganization of mitochondrial networks during neuronal differentiation and highlight mechanisms for MPs in respirasome, neuronal function, and mitochondrial diseases.
Highlights
- Rewiring of mitochondrial (mt) protein interaction network in distinct cell states
- Dramatic changes in site-specific phosphorylation during neuronal differentiation
- C20orf24 is a respirasome assembly factor depleted in patients deficient in CIV
- NENF binding with DJ-1/PINK1 promotes neurotrophic activity and neuronal survival
Collapse
Affiliation(s)
- Ramy Malty
- Department of Biochemistry, University of Regina, Regina, SK S4S 0A2, Canada
- Shahreen Amin
- Department of Biochemistry, University of Regina, Regina, SK S4S 0A2, Canada
- Qingzhou Zhang
- Department of Biochemistry, University of Regina, Regina, SK S4S 0A2, Canada
- Sadhna Phanse
- Department of Biochemistry, University of Regina, Regina, SK S4S 0A2, Canada
- Alla Gagarinova
- Department of Biochemistry, University of Saskatchewan, Saskatoon, SK S7N 5E5, Canada
- Mara Zilocchi
- Department of Biochemistry, University of Regina, Regina, SK S4S 0A2, Canada
- Larissa Hoell
- Department of Biochemistry, University of Regina, Regina, SK S4S 0A2, Canada
- Zoran Minic
- Department of Biochemistry, University of Regina, Regina, SK S4S 0A2, Canada
- Maria Gagarinova
- Department of Biochemistry, University of Regina, Regina, SK S4S 0A2, Canada
- Hiroyuki Aoki
- Department of Biochemistry, University of Regina, Regina, SK S4S 0A2, Canada
- Jocelyn Stockwell
- Department of Surgery, Neuroscience Research Group, College of Medicine, University of Saskatchewan, Saskatoon, SK S7N 5E5, Canada
- Matthew Jessulat
- Department of Biochemistry, University of Regina, Regina, SK S4S 0A2, Canada
- Florian Goebels
- The Donnelly Centre, University of Toronto, Toronto, ON M5S 3E1, Canada
- Kirsten Broderick
- Department of Biochemistry, University of Regina, Regina, SK S4S 0A2, Canada
- Nichollas E Scott
- Department of Biochemistry and Molecular Biology, University of British Columbia, Vancouver, BC V6T 1Z3, Canada
- James Vlasblom
- Department of Biochemistry, University of Regina, Regina, SK S4S 0A2, Canada
- Gabriel Musso
- Department of Medicine, Harvard Medical School and Cardiovascular Division, Brigham and Women's Hospital, Boston, MA 02115, USA
- Bhanu Prasad
- Department of Medicine, Regina Qu'Appelle Health Region, Regina, SK S4P 0W5, Canada
- Eleonora Lamantea
- Medical Genetics and Neurogenetics Unit, Fondazione IRCCS Instituto Neurologico Carlo Besta, via L. Temolo, 4, 20126 Milan, Italy
- Barbara Garavaglia
- Medical Genetics and Neurogenetics Unit, Fondazione IRCCS Instituto Neurologico Carlo Besta, via L. Temolo, 4, 20126 Milan, Italy
- Alex Rajput
- Department of Medicine, Division of Neurology, College of Medicine, University of Saskatchewan, Saskatoon, SK S7N 5E5, Canada
- Kei Murayama
- Department of Metabolism, Chiba Children's Hospital, 579-1 Heta-cho, Midori, Chiba 266-0007, Japan
- Yasushi Okazaki
- Graduate School of Medicine, Intractable Disease Research Center, Juntendo University, Hongo 2-1-1, Bunkyo-ku, Tokyo 113-8421, Japan
- Leonard J Foster
- Department of Biochemistry and Molecular Biology, University of British Columbia, Vancouver, BC V6T 1Z3, Canada
- Gary D Bader
- The Donnelly Centre, University of Toronto, Toronto, ON M5S 3E1, Canada
- Francisco S Cayabyab
- Department of Surgery, Neuroscience Research Group, College of Medicine, University of Saskatchewan, Saskatoon, SK S7N 5E5, Canada
- Mohan Babu
- Department of Biochemistry, University of Regina, Regina, SK S4S 0A2, Canada.
145
Picart-Armada S, Barrett SJ, Willé DR, Perera-Lluna A, Gutteridge A, Dessailly BH. Benchmarking network propagation methods for disease gene identification. PLoS Comput Biol 2019; 15:e1007276. [PMID: 31479437 PMCID: PMC6743778 DOI: 10.1371/journal.pcbi.1007276] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Received: 01/24/2019] [Revised: 09/13/2019] [Accepted: 07/16/2019] [Indexed: 12/17/2022]
Abstract
In-silico identification of potential target genes for disease is an essential aspect of drug target discovery. Recent studies suggest that successful targets can be found by leveraging genetic, genomic and protein interaction information. Here, we systematically tested the ability of 12 varied algorithms, based on network propagation, to identify genes that have been targeted by any drug, on gene-disease data from 22 common non-cancerous diseases in OpenTargets. We considered two biological networks, six performance metrics and compared two types of input gene-disease association scores. The impact of the design factors on performance was quantified through additive explanatory models. Standard cross-validation led to over-optimistic performance estimates due to the presence of protein complexes. In order to obtain realistic estimates, we introduced two novel protein complex-aware cross-validation schemes. When seeding biological networks with known drug targets, machine learning and diffusion-based methods found around 2-4 true targets within the top 20 suggestions. Seeding the networks with genes associated with disease by genetics decreased performance below 1 true hit on average. The use of a larger network, although noisier, improved overall performance. We conclude that diffusion-based prioritisers and machine learning applied to diffusion-based features are suited for drug discovery in practice and improve over simpler neighbour-voting methods. We also demonstrate the large impact of choosing an adequate validation strategy and the definition of seed disease genes.
The use of biological network data has proven its effectiveness in many areas of computational biology. Networks consist of nodes, usually genes or proteins, and edges that connect pairs of nodes, representing information such as physical interactions, regulatory roles or co-occurrence. In order to find new candidate nodes for a given biological property, the so-called network propagation algorithms start from the set of known nodes with that property and leverage the connections from the biological network to make predictions. Here, we assess the performance of several network propagation algorithms to find sensible gene targets for 22 common non-cancerous diseases, i.e. those that have been found promising enough to start clinical trials with any compound. We focus on obtaining performance metrics that reflect a practical scenario in drug development where only a small set of genes can be assayed. We found that the presence of protein complexes biased the performance estimates, leading to over-optimistic conclusions, and introduced two novel strategies to address it. Our results support that network propagation is still a viable approach to find drug targets, but that special care needs to be taken with the validation strategy. Algorithms benefitted from the use of a larger (although noisier) network and of direct evidence data, rather than indirect genetic associations to disease.
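The propagation idea summarised in this abstract, starting from known nodes and spreading scores along network edges, can be illustrated with a random walk with restart. This is only a minimal sketch on a made-up four-node graph. It is not one of the 12 algorithms benchmarked in the paper, and the function name, the toy node labels and the restart probability of 0.3 are assumptions chosen for illustration.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Score nodes by the steady-state visit probability of a walker that
    jumps back to a seed node with probability `restart` at every step.

    adjacency: dict mapping each node to a list of its neighbours (undirected).
    seeds: set of nodes already known to have the property of interest.
    """
    if not seeds:
        raise ValueError("at least one seed node is required")
    nodes = sorted(adjacency)
    # Restart distribution: uniform over the seed nodes.
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            neighbours = adjacency[u]
            if not neighbours:
                continue  # isolated nodes pass nothing on in this sketch
            share = (1.0 - restart) * p[u] / len(neighbours)
            for v in neighbours:
                nxt[v] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Hypothetical toy network: a path A - B - C - D with A as the only seed.
toy_network = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
scores = random_walk_with_restart(toy_network, seeds={"A"})
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # nodes closer to the seed receive higher scores
```

In a prioritisation setting the non-seed nodes would be ranked by score and the top few taken as candidates, which is the "top 20 suggestions" style of evaluation the abstract refers to.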
Affiliation(s)
- Sergio Picart-Armada
- B2SLab, Departament d’Enginyeria de Sistemes, Automàtica i Informàtica Industrial, Universitat Politècnica de Catalunya, CIBER-BBN, Barcelona, Spain
- Networking Biomedical Research Centre in the subject area of Bioengineering, Biomaterials and Nanomedicine (CIBER-BBN), Madrid, Spain
- Institut de Recerca Pediàtrica Hospital Sant Joan de Déu, Esplugues de Llobregat, Spain
- Alexandre Perera-Lluna
- B2SLab, Departament d’Enginyeria de Sistemes, Automàtica i Informàtica Industrial, Universitat Politècnica de Catalunya, CIBER-BBN, Barcelona, Spain
- Networking Biomedical Research Centre in the subject area of Bioengineering, Biomaterials and Nanomedicine (CIBER-BBN), Madrid, Spain
- Institut de Recerca Pediàtrica Hospital Sant Joan de Déu, Esplugues de Llobregat, Spain
- Alex Gutteridge
- Computational Biology and Statistics, GSK, Stevenage, United Kingdom
146
Lee T, Lee S, Yang S, Lee I. MaizeNet: a co-functional network for network-assisted systems genetics in Zea mays. Plant J 2019; 99:571-582. [PMID: 31006149 DOI: 10.1111/tpj.14341] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 01/24/2019] [Revised: 03/21/2019] [Accepted: 03/28/2019] [Indexed: 05/27/2023]
Abstract
Maize (Zea mays) has multiple uses in human food, animal fodder, starch and sweetener production and as a biofuel, and is accordingly the most extensively cultivated cereal worldwide. To enhance maize production, genetic factors underlying important agricultural traits, including stress tolerance and flowering, have been explored through forward and reverse genetics approaches. Co-functional gene networks are systems biology resources useful in identifying trait-associated genes in plants by prioritizing candidate genes. Here, we present MaizeNet (http://www.inetbio.org/maizenet/), a genome-scale co-functional network of Z. mays genes, and a companion web server for network-assisted systems genetics. We describe the validation of MaizeNet network quality and its ability to functionally predict molecular pathways and complex traits in maize. Furthermore, we demonstrate that MaizeNet-based prioritization of candidate genes can facilitate the identification of cell wall biosynthesis genes and detect network communities associated with flowering-time candidate genes derived from genome-wide association studies. The demonstrated gene prioritization and subnetwork analysis can be conducted by simply submitting maize gene models based on the commonly used B73 RefGen_v3 and the latest B73 RefGen_v4 reference genomes on the MaizeNet web server. MaizeNet-based network-assisted systems genetics will substantially accelerate the discovery of trait-associated genes for crop improvement.
Affiliation(s)
- Tak Lee
- Department of Biotechnology, College of Life Sciences and Biotechnology, Yonsei University, Seoul, 03722, Korea
- Sungho Lee
- Department of Biotechnology, College of Life Sciences and Biotechnology, Yonsei University, Seoul, 03722, Korea
- Sunmo Yang
- Department of Biotechnology, College of Life Sciences and Biotechnology, Yonsei University, Seoul, 03722, Korea
- Insuk Lee
- Department of Biotechnology, College of Life Sciences and Biotechnology, Yonsei University, Seoul, 03722, Korea
147
Detailed modeling of positive selection improves detection of cancer driver genes. Nat Commun 2019; 10:3399. [PMID: 31363082 PMCID: PMC6667447 DOI: 10.1038/s41467-019-11284-9] [Citation(s) in RCA: 51] [Impact Index Per Article: 8.5] [Received: 10/05/2018] [Accepted: 07/02/2019] [Indexed: 01/04/2023]
Abstract
Identifying driver genes from somatic mutations is a central problem in cancer biology. Existing methods, however, either lack explicit statistical models, or use models based on simplistic assumptions. Here, we present driverMAPS (Model-based Analysis of Positive Selection), a model-based approach to driver gene identification. This method explicitly models positive selection at the single-base level, as well as highly heterogeneous background mutational processes. In particular, the selection model captures elevated mutation rates in functionally important sites using multiple external annotations, and spatial clustering of mutations. Simulations under realistic evolutionary models demonstrate the increased power of driverMAPS over current approaches. Applying driverMAPS to TCGA data of 20 tumor types, we identified 159 new potential driver genes, including the mRNA methyltransferase METTL3-METTL14. We experimentally validated METTL3 as a tumor suppressor gene in bladder cancer, providing support to the important role mRNA modification plays in tumorigenesis.
Finding driver genes sheds light on the biological mechanisms propelling the development of a tumour, and can suggest therapeutic strategies. Here, the authors develop driverMAPS, a model-based approach to identify driver genes, and apply it to TCGA datasets.
148
Cheng L, Hu Y, Sun J, Zhou M, Jiang Q. DincRNA: a comprehensive web-based bioinformatics toolkit for exploring disease associations and ncRNA function. Bioinformatics 2018; 34:1953-1956. [PMID: 29365045 DOI: 10.1093/bioinformatics/bty002] [Citation(s) in RCA: 181] [Impact Index Per Article: 30.2] [Received: 05/22/2017] [Accepted: 01/22/2018] [Indexed: 01/09/2023]
Abstract
Summary: DincRNA aims to provide a comprehensive web-based bioinformatics toolkit to elucidate the entangled relationships among diseases and non-coding RNAs (ncRNAs) from the perspective of disease similarity. The quantitative way to illustrate relationships of pair-wise diseases always depends on their molecular mechanisms, and structures of the directed acyclic graph of Disease Ontology (DO). Corresponding methods for calculating similarity of pair-wise diseases involve Resnik's, Lin's, Wang's, PSB and SemFunSim methods. Recently, disease similarity was validated as suitable for calculating functional similarities of ncRNAs and prioritizing ncRNA-disease pairs, and it has been widely applied for predicting ncRNA function due to the limited biological knowledge from wet lab experiments of these RNAs. For this purpose, a large number of algorithms and a priori knowledge need to be integrated, e.g. the 'pair-wise best, pairs-average' (PBPA) and 'pair-wise all, pairs-maximum' (PAPM) methods for calculating functional similarities of ncRNAs, and the random walk with restart (RWR) method for prioritizing ncRNA-disease pairs. To facilitate the exploration of disease associations and ncRNA function, DincRNA implemented all of the above eight algorithms based on DO and disease-related genes. Currently, it provides the function to query disease similarity scores, miRNA and lncRNA functional similarity scores, and the prioritization scores of lncRNA-disease and miRNA-disease pairs.
Availability and implementation: http://bio-annotation.cn:18080/DincRNAClient/.
Contact: biofomeng@hotmail.com or qhjiang@hit.edu.cn.
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Liang Cheng
- Department of College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang Sheng 150081, China
- Yang Hu
- Department of Bioinformatics, School of Life Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang Sheng 150001, China
- Jie Sun
- Department of College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang Sheng 150081, China
- Meng Zhou
- Department of College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang Sheng 150081, China
- Qinghua Jiang
- Department of Bioinformatics, School of Life Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang Sheng 150001, China
149
Wan C, Cozzetto D, Fa R, Jones DT. Using deep maxout neural networks to improve the accuracy of function prediction from protein interaction networks. PLoS One 2019; 14:e0209958. [PMID: 31335894 PMCID: PMC6650051 DOI: 10.1371/journal.pone.0209958] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 12/13/2018] [Accepted: 07/01/2019] [Indexed: 12/02/2022]
Abstract
Protein-protein interaction network data provides valuable information for inferring direct links between genes and their biological roles. This information supports a fundamental hypothesis for protein function prediction: that interacting proteins tend to have similar functions. With the help of recently developed network embedding feature generation methods and deep maxout neural networks, it is possible to extract functional representations that encode direct links between protein-protein interaction information and protein function. Our novel method, STRING2GO, successfully adopts deep maxout neural networks to learn functional representations simultaneously encoding both protein-protein interactions and functional predictive information. The experimental results show that STRING2GO outperforms other protein-protein interaction network-based prediction methods and one benchmark method adopted in a recent large-scale protein function prediction competition.
Affiliation(s)
- Cen Wan
- Bioinformatics Group, Department of Computer Science, University College London, London, United Kingdom
- Biomedical Data Science Laboratory, The Francis Crick Institute, London, United Kingdom
- Domenico Cozzetto
- Bioinformatics Group, Department of Computer Science, University College London, London, United Kingdom
- Biomedical Data Science Laboratory, The Francis Crick Institute, London, United Kingdom
- Rui Fa
- Bioinformatics Group, Department of Computer Science, University College London, London, United Kingdom
- Biomedical Data Science Laboratory, The Francis Crick Institute, London, United Kingdom
- David T. Jones
- Bioinformatics Group, Department of Computer Science, University College London, London, United Kingdom
- Biomedical Data Science Laboratory, The Francis Crick Institute, London, United Kingdom
150
Ramsahai E, Tripathi V, John M. Cancer driver genes: a guilty by resemblance doctrine. PeerJ 2019; 7:e6979. [PMID: 31275738 PMCID: PMC6598669 DOI: 10.7717/peerj.6979] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 05/22/2018] [Accepted: 04/16/2019] [Indexed: 11/30/2022]
Abstract
A major benefit of expansive cancer genome projects is the discovery of new targets for drug treatment and development. To date, cancer driver genes have been primarily identified by methods based on gene mutation frequency. This approach fails to identify culpable genes that are not mutated, rarely mutated, or contribute to the development of rare forms of cancer. Due to the complexity of the disease and the sheer volume of data, computational methods may encounter an NP-complete problem. We have developed a novel pathway and reach (PAR) method that employs a guilty by resemblance approach to identify cancer driver genes and avoids the above problems. Essentially, PAR sifts through a list of genes of biological pathways to find those that are common to the same pathways and possess a similar 2-reach topology metric as a reference set of recognized driver genes. This approach leads to faster processing times and eliminates any dependency on gene mutation frequency. Out of the three pathways, signal transduction, immune system, and gene expression, a set of 50 candidate driver genes was identified, 30 of which were new. The top five were HGF, E2F1, C6, MIF, and CDK2.
Affiliation(s)
- Emilie Ramsahai
- Department of Mathematics and Statistics, The University of the West Indies, St. Augustine, Trinidad and Tobago
- Vrijesh Tripathi
- Department of Mathematics and Statistics, The University of the West Indies, St. Augustine, Trinidad and Tobago
- Melford John
- Department of Preclinical Sciences, The University of the West Indies, St. Augustine, Trinidad and Tobago