251
|
Tian Z, Guo M, Wang C, Liu X, Wang S. Refine gene functional similarity network based on interaction networks. BMC Bioinformatics 2017; 18:550. [PMID: 29297381 PMCID: PMC5751769 DOI: 10.1186/s12859-017-1969-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 01/07/2023] Open
Abstract
BACKGROUND In recent years, biological interaction networks have become the basis of some essential studies and have been applied successfully in many areas. Some typical networks, such as protein-protein interaction networks, have already been investigated systematically. However, little work has so far been done on the construction of gene functional similarity networks. In this research, we aim to build a highly reliable gene functional similarity network to promote its further application. RESULTS Here, we propose a novel method to construct and refine the gene functional similarity network. It consists of three main steps. First, we establish an integrated gene functional similarity network based on different functional similarity calculation methods. Then, we construct a reference gene-gene association network based on protein-protein interaction networks. Finally, we refine the spurious edges in the integrated gene functional similarity network with the help of the reference gene-gene association network. Experimental results indicate that the refined gene functional similarity network (RGFSN) exhibits a scale-free, small-world and modular architecture, with a degree distribution best fit by a power law. In addition, we conduct a protein complex prediction experiment for human based on RGFSN and achieve an outstanding result, which implies that the network is highly reliable and widely applicable. CONCLUSIONS Our efforts are insightful for constructing and refining gene functional similarity networks, and the approach can be applied to build other high-quality biological networks.
Affiliation(s)
- Zhen Tian
  - Department of Computer Science and Engineering, Harbin Institute of Technology, Harbin, 150001 People’s Republic of China
- Maozu Guo
  - Department of Computer Science and Engineering, Harbin Institute of Technology, Harbin, 150001 People’s Republic of China
  - School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing, 100044 People’s Republic of China
- Chunyu Wang
  - Department of Computer Science and Engineering, Harbin Institute of Technology, Harbin, 150001 People’s Republic of China
- Xiaoyan Liu
  - Department of Computer Science and Engineering, Harbin Institute of Technology, Harbin, 150001 People’s Republic of China
- Shiming Wang
  - Department of Computer Science and Engineering, Harbin Institute of Technology, Harbin, 150001 People’s Republic of China
|
252
|
Zeng X, Ding N, Rodríguez-Patón A, Zou Q. Probability-based collaborative filtering model for predicting gene-disease associations. BMC Med Genomics 2017; 10:76. [PMID: 29297351 PMCID: PMC5751590 DOI: 10.1186/s12920-017-0313-y] [Citation(s) in RCA: 51] [Impact Index Per Article: 6.4] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Accurately predicting pathogenic human genes has been challenging in recent research. Given the extensive gene-disease data verified by biological experiments, we can apply computational methods to make accurate predictions at reduced time and expense. METHODS We propose a probability-based collaborative filtering model (PCFM) to predict pathogenic human genes. Several kinds of data sets, covering humans and nonhuman species, are integrated in our model. First, on the basis of a typical latent factorization model, we propose model I with an average heterogeneous regularization. Second, we develop a modified model II with personal heterogeneous regularization to enhance the accuracy of the aforementioned models. In this model, vector space similarity or Pearson correlation coefficient metrics and data on related species are also used. RESULTS We compared the results of PCFM with those of four state-of-the-art approaches. The results show that PCFM performs better than the other advanced approaches. CONCLUSIONS The PCFM model can be leveraged for predictions of disease genes, especially for new human genes or diseases with no known relationships.
Affiliation(s)
- Xiangxiang Zeng
  - Department of Computer Science, School of Information Science and Technology, Xiamen University, Xiamen, China
  - Department of Artificial Intelligence, Universidad Politécnica de Madrid (UPM), Madrid, Spain
- Ningxiang Ding
  - Department of Computer Science, School of Information Science and Technology, Xiamen University, Xiamen, China
- Alfonso Rodríguez-Patón
  - Department of Artificial Intelligence, Universidad Politécnica de Madrid (UPM), Madrid, Spain
- Quan Zou
  - School of Computer Science and Technology, Tianjin University, Tianjin, China
|
253
|
Hu Y, Zhou M, Shi H, Ju H, Jiang Q, Cheng L. Measuring disease similarity and predicting disease-related ncRNAs by a novel method. BMC Med Genomics 2017; 10:71. [PMID: 29297338 PMCID: PMC5751624 DOI: 10.1186/s12920-017-0315-9] [Citation(s) in RCA: 35] [Impact Index Per Article: 4.4] [Indexed: 11/27/2022] Open
Abstract
Background Similar diseases are always caused by similar molecular origins, such as disease-related protein-coding genes (PCGs), and the molecular associations reflect their similarity. Therefore, current methods for calculating disease similarity often utilize functional interactions of PCGs. In addition, the existing methods have neglected the fact that genes could also be associated in the gene functional network (GFN) through intermediate nodes. Methods Here we present a novel method, InfDisSim, to deduce the similarity of diseases. InfDisSim utilizes the whole network, based on random walk with damping, to model the information flow. A benchmark set of similar disease pairs was employed to evaluate the performance of InfDisSim. Results The area under the receiver operating characteristic curve (AUC) was calculated to assess the performance. InfDisSim reaches a high AUC (0.9786), which indicates very good performance. Furthermore, after calculating the disease similarity with InfDisSim, we reconfirmed that similar diseases tend to have common therapeutic drugs (Pearson correlation γ2 = 0.1315, p = 2.2e-16). Finally, the disease similarity computed by InfDisSim was employed to construct an lncRNA similarity network (LSN) and a miRNA similarity network (MSN), which were further exploited to predict potential associations of lncRNA-disease pairs and miRNA-disease pairs, respectively. High AUCs (0.9893, 0.9007) based on leave-one-out cross validation show that the LSN and MSN are very appropriate for predicting novel disease-related lncRNAs and miRNAs, respectively. Conclusions The high AUC based on benchmark data indicates that the method performs well. The method is valuable in the prediction of disease-related lncRNAs and miRNAs. Electronic supplementary material The online version of this article (doi: 10.1186/s12920-017-0315-9) contains supplementary material, which is available to authorized users.
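The abstract describes modelling information flow over the whole network with a damped random walk. A minimal sketch of that general idea is a random walk with restart on a toy graph. This is not the InfDisSim implementation: the four-node network, the seed set, the function name `random_walk_with_restart` and the restart probability 0.3 (standing in for the damping) are all assumptions made for illustration.

```python
def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-12, max_iter=10000):
    """Steady-state visiting probabilities of a walker that, at each step,
    jumps back to the seed nodes with probability `restart` and otherwise
    moves to a uniformly chosen neighbour.

    adj   : dict mapping node -> list of neighbours (every node needs >= 1)
    seeds : set of start nodes (hypothetical stand-in for disease genes)
    """
    nodes = sorted(adj)
    # Restart distribution: uniform over the seed nodes.
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            # Each node passes the non-restarting mass equally to its neighbours.
            share = (1.0 - restart) * p[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Toy path network A - B - C - D with a single seed at A (assumed data).
toy_network = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
scores = random_walk_with_restart(toy_network, seeds={"A"})
```

In this toy case the scores decay with distance from the seed, which is the property that lets such walks relate genes through intermediate nodes rather than only through direct neighbours.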
Affiliation(s)
- Yang Hu
  - School of Life Science and Technology, Harbin Institute of Technology, Harbin, 150001, People's Republic of China
- Meng Zhou
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150001, China
- Hongbo Shi
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150001, China
- Hong Ju
  - Department of Information Engineering, Heilongjiang Biological Science and Technology Career Academy, Harbin, 150001, China
- Qinghua Jiang
  - School of Life Science and Technology, Harbin Institute of Technology, Harbin, 150001, People's Republic of China
- Liang Cheng
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150001, China
|
254
|
Zhang X, Gao L, Jia S. Extracting Fitness Relationships and Oncogenic Patterns among Driver Genes in Cancer. Molecules 2017; 23:molecules23010039. [PMID: 29295608 PMCID: PMC5943933 DOI: 10.3390/molecules23010039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2017] [Revised: 12/13/2017] [Accepted: 12/18/2017] [Indexed: 11/16/2022] Open
Abstract
Driver mutations provide a fitness advantage to cancer cells, and their accumulation increases the fitness of cancer cells and accelerates cancer progression. This work seeks to extract patterns accumulated by driver genes (“fitness relationships”) in tumorigenesis. We introduce a network-based method for extracting the fitness relationships of driver genes by modeling the network properties of the “fitness” of cancer cells. Colon adenocarcinoma (COAD) and skin cutaneous malignant melanoma (SKCM) are employed as case studies. Consistent results derived from different background networks suggest the reliability of the identified fitness relationships. Additionally, co-occurrence analysis and pathway analysis reveal the functional significance of the fitness relationships with respect to signal transduction. In addition, a subset of driver genes called the “fitness core” is recognized for each case. Further analyses indicate the functional importance of the fitness core in carcinogenesis and point to potential therapeutic opportunities in medicinal intervention. Fitness relationships characterize the functional continuity among driver genes in carcinogenesis, offer new insights into the oncogenic mechanisms of cancers, and provide guiding information for medicinal intervention.
Affiliation(s)
- Xindong Zhang
  - School of Computer Science and Technology, Xidian University, Xi'an 710000, China
  - School of Computer Science, Xi'an Polytechnic University, Xi'an 710000, China
- Lin Gao
  - School of Computer Science and Technology, Xidian University, Xi'an 710000, China
- Songwei Jia
  - School of Software, Xidian University, Xi'an 710000, China
|
255
|
Yang F, Wu D, Lin L, Yang J, Yang T, Zhao J. The integration of weighted gene association networks based on information entropy. PLoS One 2017; 12:e0190029. [PMID: 29272314 PMCID: PMC5741255 DOI: 10.1371/journal.pone.0190029] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 04/27/2017] [Accepted: 12/06/2017] [Indexed: 01/18/2023] Open
Abstract
Constructing genome-scale weighted gene association networks (WGAN) from multiple data sources is one of the research hot spots in systems biology. In this paper, we employ information entropy to describe the degree of uncertainty of gene-gene links and propose a strategy for data integration of weighted networks. We use this method to integrate four existing human weighted gene association networks and construct a much larger WGAN, which includes richer biological information while still keeping high functional relevance between linked gene pairs. The new WGAN shows satisfactory performance in disease gene prediction, which suggests the reliability of our integration strategy. Compared with existing integration methods, our method takes advantage of the inherent characteristics of the component networks and pays less attention to the biological background of the data. It can make full use of existing biological networks with low computational effort.
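The abstract does not state how entropy is computed from a link weight, so the following is only a sketch of one common reading: treat a link's confidence weight as a probability and measure its uncertainty with the binary Shannon entropy. The function name `link_entropy`, the assumption that weights lie in [0, 1], and the example weights are all hypothetical; the paper's actual integration rule is not reproduced here.

```python
import math


def link_entropy(w):
    """Binary Shannon entropy, in bits, of a link whose confidence is w.

    Assumes w is a probability-like weight in [0, 1]. Weights of exactly
    0 or 1 carry no uncertainty, so their entropy is defined as 0.
    """
    if w <= 0.0 or w >= 1.0:
        return 0.0
    return -(w * math.log2(w) + (1.0 - w) * math.log2(1.0 - w))


# Hypothetical weights for the same gene pair in three component networks.
example_weights = [0.5, 0.6, 0.9]
example_entropies = [link_entropy(w) for w in example_weights]
```

Under this reading, a weight near 0.5 is the least informative and a weight near 0 or 1 the most certain, which is the kind of per-link signal an entropy-based integration could use to decide how much to trust each source.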
Affiliation(s)
- Fan Yang
  - Department of Mathematics, Army Logistics University of PLA, Chongqing, China
- Duzhi Wu
  - Rongzhi College of Chongqing Technology and Business, Chongqing, China
  - * E-mail: (DW); (JZ)
- Limei Lin
  - Department of Mathematics, Army Logistics University of PLA, Chongqing, China
- Jian Yang
  - School of Pharmacy, Second Military Medical University, Shanghai, China
- Tinghong Yang
  - Department of Mathematics, Army Logistics University of PLA, Chongqing, China
- Jing Zhao
  - Institute of Interdisciplinary Complex Research, Shanghai University of Traditional Chinese Medicine, Shanghai, China
  - * E-mail: (DW); (JZ)
|
256
|
Lin L, Yang T, Fang L, Yang J, Yang F, Zhao J. Gene gravity-like algorithm for disease gene prediction based on phenotype-specific network. BMC Systems Biology 2017; 11:121. [PMID: 29212543 PMCID: PMC5718078 DOI: 10.1186/s12918-017-0519-9] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 05/10/2017] [Accepted: 11/24/2017] [Indexed: 01/24/2023]
Abstract
Background Polygenic diseases are usually caused by the dysfunction of multiple genes. Unravelling such disease genes is crucial to fully understanding the genetic landscape of diseases at the molecular level. With the advent of the ‘omic’ data era, network-based methods have prominently boosted disease gene discovery. However, how to make better use of different types of data for the prediction of disease genes remains a challenge. Results In this study, we improved the performance of disease gene prediction by integrating the similarity of disease phenotype, biological function and network topology. First, for each phenotype, a phenotype-specific network was constructed by mapping the phenotype similarity information of the given phenotype onto the protein-protein interaction (PPI) network. Then, we developed a gene gravity-like algorithm to score candidate genes based on not only topological similarity but also functional similarity. We tested the proposed network and algorithm by conducting leave-one-out and leave-10%-out cross validation and compared them with state-of-the-art algorithms. The results favored the phenotype-specific network as well as the gene gravity-like algorithm. Finally, we tested the predictive capacity of the proposed algorithms on a test gene set derived from the DisGeNET database. Potential disease genes of three polygenic diseases (obesity, prostate cancer and lung cancer) were also predicted by the proposed methods. We found that the predicted disease genes are highly consistent with literature and database evidence. Conclusions The good performance of phenotype-specific networks indicates that phenotype similarity information has a positive effect on the prediction of disease genes. The proposed gene gravity-like algorithm outperforms the Random Walk with Restart (RWR) algorithm, indicating the predictive capacity gained by combining topological similarity with functional similarity. Our work gives insight into the discovery of disease genes by fusing multiple similarities of genes and diseases. Electronic supplementary material The online version of this article (10.1186/s12918-017-0519-9) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Limei Lin
  - Department of Mathematics, Army Logistics University of PLA, Chongqing, China
- Tinghong Yang
  - Department of Mathematics, Army Logistics University of PLA, Chongqing, China
- Ling Fang
  - Department of Mathematics, Army Logistics University of PLA, Chongqing, China
- Jian Yang
  - School of Pharmacy, Second Military Medical University, Shanghai, China
- Fan Yang
  - Department of Mathematics, Army Logistics University of PLA, Chongqing, China
- Jing Zhao
  - Institute of Interdisciplinary Complex Research, Shanghai University of Traditional Chinese Medicine, Shanghai, China
|
257
|
Yang H, Wei Q, Zhong X, Yang H, Li B. Cancer driver gene discovery through an integrative genomics approach in a non-parametric Bayesian framework. Bioinformatics 2017; 33:483-490. [PMID: 27797769 DOI: 10.1093/bioinformatics/btw662] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 06/28/2016] [Accepted: 10/17/2016] [Indexed: 01/06/2023] Open
Abstract
Motivation A comprehensive catalogue of genes that drive tumor initiation and progression in cancer is key to advancing diagnostics, therapeutics and treatment. Given the complexity of cancer, the catalogue is still far from complete. Increasing evidence shows that driver genes exhibit consistent aberration patterns across multiple omics data types in tumors. In this study, we aim to leverage the complementary information encoded in each of the omics data types to identify novel driver genes through an integrative framework. Specifically, we integrated mutations, gene expression, DNA copy numbers, DNA methylation and protein abundance, all available in The Cancer Genome Atlas (TCGA), and developed iDriver, a non-parametric Bayesian framework based on multivariate statistical modeling to identify driver genes in an unsupervised fashion. iDriver captures the inherent clusters of gene aberrations and constructs the background distribution that is used to assess and calibrate the confidence of driver genes identified through multi-dimensional genomic data. Results We applied the method to 4 cancer types in TCGA and identified candidate driver genes that are highly enriched with known drivers (e.g. P < 3.40 × 10^-36 for breast cancer). We are particularly interested in novel genes and observed multiple lines of supporting evidence. Using systematic evaluation from multiple independent aspects, we identified 45 candidate driver genes that were not previously known across these 4 cancer types. The finding implies that integrating additional genomic data with multivariate statistics can help identify cancer drivers and guide the next stage of cancer genomics research. Availability and Implementation The C++ source code is freely available at https://medschool.vanderbilt.edu/cgg/. Contacts hai.yang@vanderbilt.edu or bingshan.li@Vanderbilt.Edu. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Hai Yang
  - Department of Molecular Physiology & Biophysics, Vanderbilt University, Nashville, TN, USA
  - Vanderbilt Genetics Institute, Nashville, TN, USA
- Qiang Wei
  - Department of Molecular Physiology & Biophysics, Vanderbilt University, Nashville, TN, USA
  - Vanderbilt Genetics Institute, Nashville, TN, USA
- Xue Zhong
  - Vanderbilt Genetics Institute, Nashville, TN, USA
  - Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Hushan Yang
  - Department of Medical Oncology, Sidney Kimmel Cancer Center, Thomas Jefferson University, Philadelphia, PA, USA
- Bingshan Li
  - Department of Molecular Physiology & Biophysics, Vanderbilt University, Nashville, TN, USA
  - Vanderbilt Genetics Institute, Nashville, TN, USA
|
258
|
Disease gene classification with metagraph representations. Methods 2017; 131:83-92. [DOI: 10.1016/j.ymeth.2017.06.036] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 04/15/2017] [Revised: 06/23/2017] [Accepted: 06/30/2017] [Indexed: 12/28/2022] Open
|
259
|
Szedlak A, Sims S, Smith N, Paternostro G, Piermarocchi C. Cell cycle time series gene expression data encoded as cyclic attractors in Hopfield systems. PLoS Comput Biol 2017; 13:e1005849. [PMID: 29149186 PMCID: PMC5711035 DOI: 10.1371/journal.pcbi.1005849] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 07/24/2017] [Revised: 12/01/2017] [Accepted: 10/25/2017] [Indexed: 12/18/2022] Open
Abstract
Modern time series gene expression and other omics data sets have enabled unprecedented resolution of the dynamics of cellular processes such as cell cycle and response to pharmaceutical compounds. In anticipation of the proliferation of time series data sets in the near future, we use the Hopfield model, a recurrent neural network based on spin glasses, to model the dynamics of cell cycle in HeLa (human cervical cancer) and S. cerevisiae cells. We study some of the rich dynamical properties of these cyclic Hopfield systems, including the ability of populations of simulated cells to recreate experimental expression data and the effects of noise on the dynamics. Next, we use a genetic algorithm to identify sets of genes which, when selectively inhibited by local external fields representing gene silencing compounds such as kinase inhibitors, disrupt the encoded cell cycle. We find, for example, that inhibiting the set of four kinases AURKB, NEK1, TTK, and WEE1 causes simulated HeLa cells to accumulate in the M phase. Finally, we suggest possible improvements and extensions to our model. Cell cycle—the process in which a parent cell replicates its DNA and divides into two daughter cells—is an upregulated process in many forms of cancer. Identifying gene inhibition targets to regulate cell cycle is important to the development of effective therapies. Although modern high throughput techniques offer unprecedented resolution of the molecular details of biological processes like cell cycle, analyzing the vast quantities of the resulting experimental data and extracting actionable information remains a formidable task. Here, we create a dynamical model of the process of cell cycle using the Hopfield model (a type of recurrent neural network) and gene expression data from human cervical cancer cells and yeast cells. We find that the model recreates the oscillations observed in experimental data. 
Tuning the level of noise (representing the inherent randomness in gene expression and regulation) to the “edge of chaos” is crucial for the proper behavior of the system. We then use this model to identify potential gene targets for disrupting the process of cell cycle. This method could be applied to other time series data sets and used to predict the effects of untested targeted perturbations.
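To illustrate how a Hopfield-type network can store a cycle of states rather than a fixed point, the sketch below uses an asymmetric Hebbian rule that maps each stored pattern onto the next one. It is a noise-free toy, not the authors' model: the coupling rule, the synchronous sign update, the function names and the three four-unit patterns (standing in for binarized expression states at successive time points) are assumptions chosen so the example can be checked by hand.

```python
def train_cyclic(patterns):
    """Build couplings W = (1/N) * sum_k outer(x_{k+1}, x_k), indices cyclic.

    With mutually orthogonal +/-1 patterns, W applied to x_k gives x_{k+1}
    exactly, so the stored sequence becomes a cyclic attractor.
    """
    n = len(patterns[0])
    m = len(patterns)
    w = [[0.0] * n for _ in range(n)]
    for k in range(m):
        cur, nxt = patterns[k], patterns[(k + 1) % m]
        for i in range(n):
            for j in range(n):
                w[i][j] += nxt[i] * cur[j] / n
    return w


def step(w, state):
    """One synchronous, deterministic update: s_i <- sign(sum_j W_ij * s_j)."""
    return [1 if sum(wij * sj for wij, sj in zip(row, state)) >= 0 else -1
            for row in w]


# Three mutually orthogonal toy patterns (assumed data, one per "phase").
phases = [
    [1, 1, 1, 1],
    [1, -1, 1, -1],
    [1, 1, -1, -1],
]
couplings = train_cyclic(phases)

# Follow the dynamics from the first phase for one full period.
trajectory = [phases[0]]
for _ in range(len(phases)):
    trajectory.append(step(couplings, trajectory[-1]))
```

Starting from the first pattern, the state visits each stored phase in order and returns to the start after one period. The abstract's noise level and inhibiting external fields would enter as extra terms in the update, which this sketch leaves out.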
Affiliation(s)
- Anthony Szedlak
  - Department of Physics and Astronomy, Michigan State University, East Lansing, Michigan, United States of America
- Spencer Sims
  - Department of Physics and Astronomy, Michigan State University, East Lansing, Michigan, United States of America
- Nicholas Smith
  - Salgomed Inc., Del Mar, California, United States of America
- Giovanni Paternostro
  - Sanford Burnham Prebys Medical Discovery Institute, La Jolla, California, United States of America
- Carlo Piermarocchi
  - Department of Physics and Astronomy, Michigan State University, East Lansing, Michigan, United States of America
|
260
|
Miryala SK, Anbarasu A, Ramaiah S. Discerning molecular interactions: A comprehensive review on biomolecular interaction databases and network analysis tools. Gene 2017; 642:84-94. [PMID: 29129810 DOI: 10.1016/j.gene.2017.11.028] [Citation(s) in RCA: 100] [Impact Index Per Article: 12.5] [Received: 07/25/2017] [Revised: 10/17/2017] [Accepted: 11/08/2017] [Indexed: 12/12/2022]
Abstract
Computational analysis of biomolecular interaction networks is gaining importance for understanding the functions of novel genes/proteins. Gene interaction (GI) network analysis and protein-protein interaction (PPI) network analysis play a major role in predicting the functionality of interacting genes or proteins and give insight into the functional relationships and evolutionary conservation of interactions among the genes. An interaction network is a graphical representation of the gene/protein interactome, where each gene/protein is a node and each interaction between genes/proteins is an edge. In this review, we discuss the popular open source databases that serve as data repositories for searching and collecting protein/gene interaction data, as well as the tools available for interaction network generation, visualization and network analysis. We also discuss network analysis approaches, such as topological and clustering approaches, for studying network properties, and functional enrichment servers, which illustrate the functions and pathways of the genes and proteins. The distinctive aim of this review is therefore not only to provide an overview of tools and web servers for gene and protein-protein interaction (PPI) network analysis but also to extract useful and meaningful information from the interaction networks.
Affiliation(s)
- Sravan Kumar Miryala
  - Medical and Biological Computing Laboratory, School of Biosciences and Technology, VIT University, Vellore 632014, Tamil Nadu, India
- Anand Anbarasu
  - Medical and Biological Computing Laboratory, School of Biosciences and Technology, VIT University, Vellore 632014, Tamil Nadu, India
- Sudha Ramaiah
  - Medical and Biological Computing Laboratory, School of Biosciences and Technology, VIT University, Vellore 632014, Tamil Nadu, India
|
261
|
Nagamani S, Gaur AS, Tanneeru K, Muneeswaran G, Madugula SS, Consortium M, Druzhilovskiy D, Poroikov VV, Sastry GN. Molecular property diagnostic suite (MPDS): Development of disease-specific open source web portals for drug discovery. SAR and QSAR in Environmental Research 2017; 28:913-926. [PMID: 29206500 DOI: 10.1080/1062936x.2017.1402819] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 11/03/2017] [Accepted: 11/06/2017] [Indexed: 06/07/2023]
Abstract
Molecular property diagnostic suite (MPDS) is a Galaxy-based open source drug discovery and development platform. MPDS web portals are designed for several diseases, such as tuberculosis, diabetes mellitus, and other metabolic disorders, specifically aimed to evaluate and estimate the drug-likeness of a given molecule. MPDS consists of three modules, namely data libraries, data processing, and data analysis tools which are configured and interconnected to assist drug discovery for specific diseases. The data library module encompasses vast information on chemical space, wherein the MPDS compound library comprises 110.31 million unique molecules generated from public domain databases. Every molecule is assigned with a unique ID and card, which provides complete information for the molecule. Some of the modules in the MPDS are specific to the diseases, while others are non-specific. Importantly, a suitably altered protocol can be effectively generated for another disease-specific MPDS web portal by modifying some of the modules. Thus, the MPDS suite of web portals shows great promise to emerge as disease-specific portals of great value, integrating chemoinformatics, bioinformatics, molecular modelling, and structure- and analogue-based drug discovery approaches.
Affiliation(s)
- S Nagamani
  - Centre for Molecular Modeling, CSIR-Indian Institute of Chemical Technology, Hyderabad, India
- A S Gaur
  - Centre for Molecular Modeling, CSIR-Indian Institute of Chemical Technology, Hyderabad, India
- K Tanneeru
  - Centre for Molecular Modeling, CSIR-Indian Institute of Chemical Technology, Hyderabad, India
- G Muneeswaran
  - Centre for Molecular Modeling, CSIR-Indian Institute of Chemical Technology, Hyderabad, India
- S S Madugula
  - Centre for Molecular Modeling, CSIR-Indian Institute of Chemical Technology, Hyderabad, India
- V V Poroikov
  - Institute of Biomedical Chemistry, Moscow, Russia
- G N Sastry
  - Centre for Molecular Modeling, CSIR-Indian Institute of Chemical Technology, Hyderabad, India
|
262
|
Zhong J, Wang J, Ding X, Zhang Z, Li M, Wu FX, Pan Y. Protein Inference from the Integration of Tandem MS Data and Interactome Networks. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2017; 14:1399-1409. [PMID: 28113634 DOI: 10.1109/tcbb.2016.2601618] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 06/06/2023]
Abstract
Since proteins are digested into a mixture of peptides in the preprocessing step of tandem mass spectrometry (MS), it is difficult to determine which specific protein a shared peptide belongs to. In recent studies, besides tandem MS data and peptide identification information, other information has been exploited to infer proteins. Unlike methods that first use only tandem MS data to infer proteins and then use network information to refine them, this study proposes a protein inference method named TMSIN, which uses interactome networks directly. As two interacting proteins should co-exist, it is reasonable to assume that if one of the interacting proteins is confidently inferred in a sample, its interacting partners should also have a high probability of being in the same sample. Therefore, we can use the neighborhood information of a protein in an interactome network to adjust the probability that the shared peptide belongs to the protein. In TMSIN, a multi-weighted graph is constructed by combining the bipartite graph with interactome network information, where the bipartite graph is built from the peptide identification information. Based on multi-weighted graphs, TMSIN adopts an iterative workflow to infer proteins. At each iterative step, the probability that a shared peptide belongs to a specific protein is calculated using Bayes' law, based on the neighbor protein support scores of each protein to which the shared peptide maps. We carried out experiments on yeast data and human data to evaluate the performance of TMSIN in terms of ROC, q-value, and accuracy. The experimental results show that the AUC scores yielded by TMSIN are 0.742 and 0.874 on the yeast and human datasets, respectively, and that TMSIN yields the maximum number of true positives when the q-value is less than or equal to 0.05. The overlap analysis shows that TMSIN is an effective complementary approach for protein inference.
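The abstract says the probability that a shared peptide belongs to a given protein is obtained with Bayes' law from neighbor support scores, but it does not give the formula. The sketch below shows only the generic Bayes update such a step could rest on. The function name `assign_shared_peptide`, the protein identifiers and every number are hypothetical, and treating the neighbor support score as a likelihood-like term is an assumption, not the TMSIN definition.

```python
def assign_shared_peptide(prior, support):
    """Posterior probability that a shared peptide belongs to each protein.

    prior   : dict protein -> prior probability (e.g. from peptide evidence)
    support : dict protein -> likelihood-like neighbour support score
    Applies Bayes' law: posterior_i = prior_i * support_i / sum_j(prior_j * support_j).
    """
    joint = {p: prior[p] * support[p] for p in prior}
    total = sum(joint.values())
    if total == 0.0:
        raise ValueError("at least one protein needs non-zero prior * support")
    return {p: v / total for p, v in joint.items()}


# One peptide shared by two candidate proteins (assumed toy values):
# equal priors, but P1 has better-supported interaction partners.
toy_prior = {"P1": 0.5, "P2": 0.5}
toy_support = {"P1": 0.9, "P2": 0.3}
posterior = assign_shared_peptide(toy_prior, toy_support)
```

In an iterative workflow of the kind the abstract describes, posteriors like these would presumably feed back into the protein scores that define the next round's support values.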
|
263
|
Patkar S, Magen A, Sharan R, Hannenhalli S. A network diffusion approach to inferring sample-specific function reveals functional changes associated with breast cancer. PLoS Comput Biol 2017; 13:e1005793. [PMID: 29190299 PMCID: PMC5708603 DOI: 10.1371/journal.pcbi.1005793] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 04/05/2017] [Accepted: 09/27/2017] [Indexed: 11/18/2022] Open
Abstract
Guilt-by-association codifies the empirical observation that a gene's function is informed by its neighborhood in a biological network. This would imply that when a gene's network context is altered, for instance in disease condition, so could be the gene's function. Although context-specific changes in biological networks have been explored, the potential changes they may induce on the functional roles of genes are yet to be characterized. Here we analyze, for the first time, the network-induced potential functional changes in breast cancer. Using transcriptomic samples for 1047 breast tumors and 110 healthy breast tissues from TCGA, we derive sample-specific protein interaction networks and assign sample-specific functions to genes via a diffusion strategy. Testing for significant changes in the inferred functions between normal and cancer samples, we find several functions to have significantly gained or lost genes in cancer, not due to differential expression of genes known to perform the function, but rather due to changes in the network topology. Our predicted functional changes are supported by mutational and copy number profiles in breast cancers. Our diffusion-based functional assignment provides a novel characterization of a tumor that is complementary to the standard approach based on functional annotation alone. Importantly, this characterization is effective in predicting patient survival, as well as in predicting several known histopathological subtypes of breast cancer.
Affiliation(s)
- Sushant Patkar
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, Maryland, United States of America
| | - Assaf Magen
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, Maryland, United States of America
| | - Roded Sharan
- Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel
| | - Sridhar Hannenhalli
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, Maryland, United States of America
| |
|
264
|
Alcaraz N, List M, Batra R, Vandin F, Ditzel HJ, Baumbach J. De novo pathway-based biomarker identification. Nucleic Acids Res 2017; 45:e151. [PMID: 28934488 PMCID: PMC5766193 DOI: 10.1093/nar/gkx642] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.5] [Received: 03/14/2017] [Accepted: 07/13/2017] [Indexed: 02/07/2023] Open
Abstract
Gene expression profiles have been extensively discussed as an aid to guide the therapy by predicting disease outcome for the patients suffering from complex diseases, such as cancer. However, prediction models built upon single-gene (SG) features show poor stability and performance on independent datasets. Attempts to mitigate these drawbacks have led to the development of network-based approaches that integrate pathway information to produce meta-gene (MG) features. Also, MG approaches have only dealt with the two-class problem of good versus poor outcome prediction. Stratifying patients based on their molecular subtypes can provide a detailed view of the disease and lead to more personalized therapies. We propose and discuss a novel MG approach based on de novo pathways, which for the first time have been used as features in a multi-class setting to predict cancer subtypes. Comprehensive evaluation in a large cohort of breast cancer samples from The Cancer Genome Atlas (TCGA) revealed that MGs are considerably more stable than SG models, while also providing valuable insight into the cancer hallmarks that drive them. In addition, when tested on an independent benchmark non-TCGA dataset, MG features consistently outperformed SG models. We provide an easy-to-use web service at http://pathclass.compbio.sdu.dk where users can upload their own gene expression datasets from breast cancer studies and obtain the subtype predictions from all the classifiers.
Affiliation(s)
- Nicolas Alcaraz
- Department of Mathematics and Computer Science, University of Southern Denmark, 5230 Odense, Denmark.,Department of Cancer and Inflammation Research, Institute of Molecular Medicine, University of Southern Denmark, 5000 Odense, Denmark.,The Bioinformatics Centre, Department of Biology, University of Copenhagen, 2200 Copenhagen, Denmark
| | - Markus List
- Computational Biology and Applied Algorithms, Max Planck Institute for Informatics, Saarland Informatics Campus, 66123 Saarbrücken, Germany
| | - Richa Batra
- Institute of Computational Biology, Helmholtz Zentrum München, 85764 Munich, Germany.,Department of Dermatology and Allergy, Technical University of Munich, 80802 Munich, Germany
| | - Fabio Vandin
- Department of Mathematics and Computer Science, University of Southern Denmark, 5230 Odense, Denmark.,Department of Information and Engineering, University of Padova, 35122 Padova, Italy
| | - Henrik J Ditzel
- Department of Cancer and Inflammation Research, Institute of Molecular Medicine, University of Southern Denmark, 5000 Odense, Denmark.,Department of Oncology, Odense University Hospital, 5000 Odense, Denmark
| | - Jan Baumbach
- Department of Mathematics and Computer Science, University of Southern Denmark, 5230 Odense, Denmark.,Computational Systems Biology Group, Max Planck Institute for Informatics, Saarland Informatics Campus, 66123 Saarbrücken, Germany
| |
|
265
|
Carlin DE, Demchak B, Pratt D, Sage E, Ideker T. Network propagation in the cytoscape cyberinfrastructure. PLoS Comput Biol 2017; 13:e1005598. [PMID: 29023449 PMCID: PMC5638226 DOI: 10.1371/journal.pcbi.1005598] [Citation(s) in RCA: 50] [Impact Index Per Article: 6.3] [Received: 12/31/2016] [Accepted: 05/29/2017] [Indexed: 01/15/2023] Open
Abstract
Network propagation is an important and widely used algorithm in systems biology, with applications in protein function prediction, disease gene prioritization, and patient stratification. However, up to this point it has required significant expertise to run. Here we extend the popular network analysis program Cytoscape to perform network propagation as an integrated function. Such integration greatly increases the access to network propagation by putting it in the hands of biologists and linking it to the many other types of network analysis and visualization available through Cytoscape. We demonstrate the power and utility of the algorithm by identifying mutations conferring resistance to Vemurafenib.
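As a rough illustration of what network propagation computes, the sketch below uses the common random-walk-with-restart formulation on a toy undirected graph. The abstract does not state which variant the Cytoscape integration uses, so the update rule, the `alpha` parameter, the function name `propagate`, and the node names are all assumptions made for illustration.

```python
def propagate(adjacency, seeds, alpha=0.5, tol=1e-9, max_iter=1000):
    """Random walk with restart on an undirected graph (illustrative sketch).

    adjacency: dict mapping each node to the set of its neighbours (symmetric).
    seeds:     set of nodes that start with the signal (e.g. mutated genes).
    alpha:     fraction of the score that spreads to neighbours each step;
               the remaining (1 - alpha) restarts at the seeds.
    """
    nodes = sorted(adjacency)
    degree = {n: len(adjacency[n]) for n in nodes}
    # Initial scores: uniform over the seed nodes, zero elsewhere.
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {}
        for n in nodes:
            # Each neighbour m passes on an equal share of its current score.
            spread = sum(p[m] / degree[m] for m in adjacency[n])
            nxt[n] = alpha * spread + (1 - alpha) * p0[n]
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Toy path graph A - B - C with the signal seeded at A (hypothetical nodes).
toy_graph = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
scores = propagate(toy_graph, seeds={"A"})
```

Nodes closer to the seed end up with higher scores, which is the property that applications such as gene prioritization rely on.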
Affiliation(s)
- Daniel E. Carlin
- Department of Medicine, University of California-San Diego, San Diego, California, United States of America
| | - Barry Demchak
- Department of Medicine, University of California-San Diego, San Diego, California, United States of America
| | - Dexter Pratt
- Department of Medicine, University of California-San Diego, San Diego, California, United States of America
| | - Eric Sage
- Department of Medicine, University of California-San Diego, San Diego, California, United States of America
| | - Trey Ideker
- Department of Medicine, University of California-San Diego, San Diego, California, United States of America
| |
|
266
|
Mezlini AM, Goldenberg A. Incorporating networks in a probabilistic graphical model to find drivers for complex human diseases. PLoS Comput Biol 2017; 13:e1005580. [PMID: 29023450 PMCID: PMC5638204 DOI: 10.1371/journal.pcbi.1005580] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 01/14/2017] [Accepted: 05/09/2017] [Indexed: 12/12/2022] Open
Abstract
Discovering genetic mechanisms driving complex diseases is a hard problem. Existing methods often lack power to identify the set of responsible genes. Protein-protein interaction networks have been shown to boost power when detecting gene-disease associations. We introduce a Bayesian framework, Conflux, to find disease associated genes from exome sequencing data using networks as a prior. There are two main advantages to using networks within a probabilistic graphical model. First, networks are noisy and incomplete, a substantial impediment to gene discovery. Incorporating networks into the structure of a probabilistic model for gene inference has less impact on the solution than relying on the noisy network structure directly. Second, using a Bayesian framework we can keep track of the uncertainty of each gene being associated with the phenotype rather than returning a fixed list of genes. We first show that using networks clearly improves gene detection compared to individual gene testing. We then show consistently improved performance of Conflux compared to the state-of-the-art diffusion network-based method Hotnet2 and a variety of other network and variant aggregation methods, using randomly generated and literature-reported gene sets. We test Hotnet2 and Conflux on several network configurations to reveal biases and patterns of false positives and false negatives in each case. Our experiments show that our novel Bayesian framework Conflux incorporates many of the advantages of the current state-of-the-art methods, while offering more flexibility and improved power in many gene-disease association scenarios.
Networks and pathway-based methods are commonly used to improve the power of gene detection in associations with complex human diseases. Network diffusion approaches have shown their effectiveness and superior performance in cancer studies. Still, there are many problems such as noise and missingness with currently available human networks that bias the results of gene detection. We propose a novel graphical model-based method Conflux that overcomes several of the pitfalls of the existing state-of-the-art approaches while building on their successes. Conflux integrates genotype data with networks directly, using diffusion-like methods, but only as part of a structure in a probabilistic model to reduce the negative effect of the noise in the networks. This Bayesian framework allows Conflux to keep track of the uncertainty in the gene list that is being associated with the disease and consequently rank the genes with respect to our confidence in the association. It also allows for the discovery of gene sets that are not fully supported by the network if they have enough support in the data. These improvements result in a flexible approach that improves the power in many gene-disease association scenarios while reducing the number of false positives reported.
Affiliation(s)
- Aziz M Mezlini
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada.,Genetics and Genome Biology, The Hospital for Sick Children, Toronto, Ontario, Canada
| | - Anna Goldenberg
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada.,Genetics and Genome Biology, The Hospital for Sick Children, Toronto, Ontario, Canada
| |
|
267
|
Xi J, Wang M, Li A. DGPathinter: a novel model for identifying driver genes via knowledge-driven matrix factorization with prior knowledge from interactome and pathways. PeerJ Computer Science 2017; 3:e133. [DOI: 10.7717/peerj-cs.133] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 01/03/2025]
Abstract
Cataloging mutated driver genes that confer a selective growth advantage for tumor cells from sporadic passenger mutations is a critical problem in cancer genomic research. Previous studies have reported that some driver genes are not highly frequently mutated and cannot be tested as statistically significant, which complicates the identification of driver genes. To address this issue, some existing approaches incorporate prior knowledge from an interactome to detect driver genes which may be dysregulated by interaction network context. However, altered operations of many pathways in cancer progression have been frequently observed, and prior knowledge from pathways is not exploited in the driver gene identification task. In this paper, we introduce a driver gene prioritization method called driver gene identification through pathway and interactome information (DGPathinter), which is based on a knowledge-based matrix factorization model that incorporates prior knowledge from both the interactome and pathways. When DGPathinter is applied to somatic mutation datasets of three types of cancers and evaluated against known driver genes, its prioritization performance is better than that of the existing interactome-driven methods. The top-ranked genes detected by DGPathinter are also significantly enriched for known driver genes. Moreover, most of the top-ranked pathways given by DGPathinter are also cancer progression-associated pathways. These results suggest that DGPathinter is a useful tool to identify potential driver genes.
Affiliation(s)
- Jianing Xi
- School of Information Science and Technology, University of Science and Technology of China, Hefei, China
| | - Minghui Wang
- School of Information Science and Technology, University of Science and Technology of China, Hefei, China
- Centers for Biomedical Engineering, University of Science and Technology of China, Hefei, China
| | - Ao Li
- School of Information Science and Technology, University of Science and Technology of China, Hefei, China
- Centers for Biomedical Engineering, University of Science and Technology of China, Hefei, China
| |
|
268
|
Sandor C, Beer NL, Webber C. Diverse type 2 diabetes genetic risk factors functionally converge in a phenotype-focused gene network. PLoS Comput Biol 2017; 13:e1005816. [PMID: 29059180 PMCID: PMC5667928 DOI: 10.1371/journal.pcbi.1005816] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 03/25/2017] [Revised: 11/02/2017] [Accepted: 10/11/2017] [Indexed: 12/14/2022] Open
Abstract
Type 2 Diabetes (T2D) constitutes a global health burden. Efforts to uncover predisposing genetic variation have been considerable, yet detailed knowledge of the underlying pathogenesis remains poor. Here, we constructed a T2D phenotypic-linkage network (T2D-PLN), by integrating diverse gene functional information that highlight genes, which when disrupted in mice, elicit similar T2D-relevant phenotypes. Sensitising the network to T2D-relevant phenotypes enabled significant functional convergence to be detected between genes implicated in monogenic or syndromic diabetes and genes lying within genomic regions associated with T2D common risk. We extended these analyses to a recent multiethnic T2D case-control exome of 12,940 individuals that found no evidence of T2D risk association for rare frequency variants outside of previously known T2D risk loci. Examining associations involving protein-truncating variants (PTV), most at low population frequencies, the T2D-PLN was able to identify a convergent set of biological pathways that were perturbed within four of five independent T2D case/control ethnic sets of 2000 to 5000 exomes each. These same pathways were found to be over-represented among both known monogenic or syndromic diabetes genes and genes within T2D-associated common risk loci. Our study demonstrates convergent biology amongst variants representing different classes of T2D genetic risk. Although convergence was observed at the pathway level, few of the contributing genes were found in common between different cohorts or variant classes, most notably between the exome variant sets which suggests that future rare variant studies may be better focusing their power onto a single population of recent common ancestry.
Affiliation(s)
- Cynthia Sandor
- Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford, United Kingdom
| | - Nicola L. Beer
- Oxford Centre for Diabetes, Endocrinology and Metabolism, Radcliffe Department of Medicine, University of Oxford, Oxford, United Kingdom
| | - Caleb Webber
- Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford, United Kingdom
| |
|
269
|
Frasca M. Gene2DisCo: Gene to disease using disease commonalities. Artif Intell Med 2017; 82:34-46. [DOI: 10.1016/j.artmed.2017.08.001] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 05/13/2017] [Revised: 07/24/2017] [Accepted: 08/13/2017] [Indexed: 01/10/2023]
|
270
|
Ye Y, Gao L, Zhang S. Integrative Analysis of Transcription Factor Combinatorial Interactions Using a Bayesian Tensor Factorization Approach. Front Genet 2017; 8:140. [PMID: 29033978 PMCID: PMC5625019 DOI: 10.3389/fgene.2017.00140] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 07/20/2017] [Accepted: 09/15/2017] [Indexed: 11/13/2022] Open
Abstract
Transcription factors play a key role in transcriptional regulation of genes and determination of cellular identity through combinatorial interactions. However, current studies of combinatorial regulation are deficient due to a lack of experimental data from the same cellular environment and the widespread presence of data noise. Here, we adopt a Bayesian CANDECOMP/PARAFAC (CP) factorization approach (BCPF) to integrate multiple datasets in a network paradigm for determining precise TF interaction landscapes. In our first application, we apply BCPF to integrate three networks, each built from diverse datasets of multiple cell lines from ENCODE, to predict a global and precise TF interaction network. This network gives 38 novel TF interactions with distinct biological functions. In our second application, we apply BCPF to seven types of cell type TF regulatory networks and predict seven cell lineage TF interaction networks, respectively. By further exploring their dynamics and modularity, we find that cell lineage-specific hub TFs participate in cell type or lineage-specific regulation by interacting with non-specific TFs. Furthermore, we illustrate the biological function of hub TFs by taking those of cancer lineage and blood lineage as examples. Taken together, our integrative analysis can reveal a more precise and extensive description of human TF combinatorial interactions.
Affiliation(s)
- Yusen Ye
- School of Computer Science and Technology, Xidian University, Xi'an, China
| | - Lin Gao
- School of Computer Science and Technology, Xidian University, Xi'an, China
| | - Shihua Zhang
- NCMIS, CEMS, RCSDS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China.,School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing, China
| |
|
271
|
Nikolayeva I, Guitart Pla O, Schwikowski B. Network module identification-A widespread theoretical bias and best practices. Methods 2017; 132:19-25. [PMID: 28941788 DOI: 10.1016/j.ymeth.2017.08.008] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 04/23/2017] [Revised: 08/14/2017] [Accepted: 08/18/2017] [Indexed: 10/18/2022] Open
Abstract
Biological processes often manifest themselves as coordinated changes across modules, i.e., sets of interacting genes. Commonly, the high dimensionality of genome-scale data prevents the visual identification of such modules, and straightforward computational search through a set of known pathways is a limited approach. Therefore, tools for the data-driven, computational, identification of modules in gene interaction networks have become popular components of visualization and visual analytics workflows. However, many such tools are known to result in modules that are large, and therefore hard to interpret biologically. Here, we show that the empirically known tendency towards large modules can be attributed to a statistical bias present in many module identification tools, and discuss possible remedies from a mathematical perspective. In the current absence of a straightforward practical solution, we outline our view of best practices for the use of the existing tools.
Affiliation(s)
- Iryna Nikolayeva
- Systems Biology Lab, Center of Bioinformatics, Biostatistics and Integrative Biology, Institut Pasteur, Paris, France; Functional Genetics of Infectious Diseases Unit, Department Genomes and Genetics, Institut Pasteur, Paris, France; Université Paris-Descartes, Sorbonne Paris Cité, Paris, France
| | - Oriol Guitart Pla
- Systems Biology Lab, Center of Bioinformatics, Biostatistics and Integrative Biology, Institut Pasteur, Paris, France
| | - Benno Schwikowski
- Systems Biology Lab, Center of Bioinformatics, Biostatistics and Integrative Biology, Institut Pasteur, Paris, France.
| |
|
272
|
Hu Y, Zhao L, Liu Z, Ju H, Shi H, Xu P, Wang Y, Cheng L. DisSetSim: an online system for calculating similarity between disease sets. J Biomed Semantics 2017; 8:28. [PMID: 29297411 PMCID: PMC5763469 DOI: 10.1186/s13326-017-0140-2] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Indexed: 12/21/2022] Open
Abstract
Background Functional similarity between molecules results in similar phenotypes, such as diseases. Therefore, revealing the function of molecules based on the diseases they induce is an effective approach. However, the lack of a tool for obtaining the similarity score of pair-wise disease sets (SSDS) limits this type of application. Results Here, we introduce DisSetSim, an online system to solve this problem. Five state-of-the-art methods, namely Resnik’s, Lin’s, Wang’s, PSB, and SemFunSim, were first implemented to measure the similarity score of pair-wise diseases (SSD). The “pair-wise-best pairs-average” (PWBPA) method was then implemented to calculate the SSDS from the SSD. The system was applied to calculate the functional similarity of miRNAs based on their induced disease sets. The results were further used to predict potential disease-miRNA relationships. Conclusions The high area under the receiver operating characteristic curve AUC (0.9296) based on leave-one-out cross validation shows that the PWBPA method achieves a high true positive rate and a low false positive rate. The system can be accessed from http://www.bio-annotation.cn:8080/DisSetSim/.
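The abstract does not spell out the PWBPA formula. The sketch below assumes it follows the usual best-match-average pattern: each disease is paired with its most similar counterpart in the other set, and the best scores from both directions are averaged. The function names, disease identifiers, and similarity values are hypothetical and stand in for any of the pair-wise disease measures listed above.

```python
def best_pairs_average(set_a, set_b, similarity):
    """Set-to-set similarity from pair-wise scores (assumed best-match average).

    similarity: callable returning a score in [0, 1] for two diseases.
    """
    if not set_a or not set_b:
        return 0.0
    # For every disease in A, keep its best match in B, and vice versa.
    best_a = sum(max(similarity(a, b) for b in set_b) for a in set_a)
    best_b = sum(max(similarity(a, b) for a in set_a) for b in set_b)
    return (best_a + best_b) / (len(set_a) + len(set_b))


# Hypothetical pair-wise disease similarity scores (illustration only).
TOY_SCORES = {
    frozenset(("d1", "d2")): 0.5,
    frozenset(("d1", "d3")): 0.2,
    frozenset(("d2", "d3")): 0.8,
}


def toy_similarity(x, y):
    """Look up a toy score; identical diseases score 1.0, unknown pairs 0.0."""
    if x == y:
        return 1.0
    return TOY_SCORES.get(frozenset((x, y)), 0.0)


# Disease sets of two hypothetical miRNAs.
ssds = best_pairs_average(["d1", "d2"], ["d2", "d3"], toy_similarity)
```

With these toy values the best matches sum to 1.5 in one direction and 1.8 in the other, giving 3.3 / 4 = 0.825.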
Affiliation(s)
- Yang Hu
- Harbin Institute of Technology, School of Life Science and Technology, Harbin, 150001, People's Republic of China
| | - Lingling Zhao
- Department of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, People's Republic of China
| | - Zhiyan Liu
- Department of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, People's Republic of China
| | - Hong Ju
- Department of information engineering, Heilongjiang Biological Science and Technology Career Academy, Harbin, 150001, People's Republic of China
| | - Hongbo Shi
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150001, People's Republic of China
| | - Peigang Xu
- Department of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, People's Republic of China
| | - Yadong Wang
- Department of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, People's Republic of China.
| | - Liang Cheng
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150001, People's Republic of China.
| |
|
273
|
Zhao Y, Song WM, Zhang F, Zhou MM, Zhang W, Walsh MJ, Zhang B. Distinct distributions of genomic features of the 5’ and 3’ partners of coding somatic cancer gene fusions: arising mechanisms and functional implications. Oncotarget 2017; 8:66769-66783. [PMID: 28977995 PMCID: PMC5620135 DOI: 10.18632/oncotarget.10734] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/02/2016] [Accepted: 06/06/2016] [Indexed: 11/25/2022] Open
Abstract
The genomic features and arising mechanisms of coding cancer somatic gene fusions (CSGFs) largely remain elusive. In this study, we show a gene origin stratification pattern of CSGF partners: fusion partners in human cancers are significantly enriched for genes with the gene age of Euteleostomes and with the gene family age of Bilateria. GC skew (a measurement of G, C nucleotide content bias, (G-C)/(G+C)) is a useful measurement to indicate the DNA leading strand, lagging strand, replication origin, replication terminus, and DNA-RNA R-loop formation. We find GC skew bias at the 5 prime (5’) but not the 3 prime (3’) partners of CSGFs, coincident with the polarity feature of gene expression breadth that the 5’ partners are more ubiquitous while the 3’ fusion partners are more tissue specific in general. We reveal distinct length and composition distributions of the 5’ and 3’ partners of CSGFs, including sequence features corresponding to the 5’ untranslated regions (UTRs), 3’ UTRs, and the N-terminal sequences of the encoded proteins. Oncogenic somatic gene fusions are most enriched for the 5’ and 3’ genes’ somatic amplification alongside a substantial proportion of other types of combinations. At the function level, 5’ partners of CSGFs appear more likely to be tumour suppressor genes while many 3’ partners appear to be proto-oncogenes. Such distinct polarities of CSGFs at the evolutionary, structural, genomic and functional levels indicate the heterogeneous arising mechanisms of CSGFs including R-loops and suggest potential novel targeted therapeutics specific to CSGF functional categories.
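The GC skew definition quoted in the abstract, (G-C)/(G+C), can be computed directly from a sequence. The sketch below is a minimal illustration; the function names, the sliding-window helper, and the example sequences are assumptions and not taken from the paper.

```python
def gc_skew(sequence):
    """Return (G - C) / (G + C) for a DNA sequence; 0.0 if it has no G or C."""
    seq = sequence.upper()
    g = seq.count("G")
    c = seq.count("C")
    if g + c == 0:
        return 0.0
    return (g - c) / (g + c)


def windowed_gc_skew(sequence, window, step):
    """GC skew over consecutive windows, useful for profiling along a gene."""
    return [
        gc_skew(sequence[start:start + window])
        for start in range(0, len(sequence) - window + 1, step)
    ]


# Hypothetical toy sequences (illustration only).
whole = gc_skew("GGGCAT")                         # 3 G, 1 C
profile = windowed_gc_skew("GGCC", window=2, step=2)
```

A positive value indicates G excess over C on the strand being read, and a negative value indicates the opposite.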
Affiliation(s)
- Yongzhong Zhao
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, NY 10029, USA
- Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, NY 10029, USA
| | - Won-Min Song
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, NY 10029, USA
- Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, NY 10029, USA
| | - Fan Zhang
- Department of Medicine, Icahn School of Medicine at Mount Sinai, NY 10029, USA
| | - Ming-Ming Zhou
- Department of Structural and Chemical Biology, Icahn School of Medicine at Mount Sinai, NY 10029, USA
| | - Weijia Zhang
- Department of Medicine, Icahn School of Medicine at Mount Sinai, NY 10029, USA
| | - Martin J. Walsh
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, NY 10029, USA
- Department of Structural and Chemical Biology, Icahn School of Medicine at Mount Sinai, NY 10029, USA
- Department of Pediatrics, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| | - Bin Zhang
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, NY 10029, USA
- Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, NY 10029, USA
| |
|
274
|
Xiao Q, Luo J, Liang C, Cai J, Ding P. A graph regularized non-negative matrix factorization method for identifying microRNA-disease associations. Bioinformatics 2017; 34:239-248. [PMID: 28968779 DOI: 10.1093/bioinformatics/btx545] [Citation(s) in RCA: 178] [Impact Index Per Article: 22.3] [Received: 01/09/2017] [Revised: 07/21/2017] [Accepted: 08/31/2017] [Indexed: 01/22/2023] Open
Abstract
MOTIVATION MicroRNAs (miRNAs) play crucial roles in post-transcriptional regulations and various cellular processes. The identification of disease-related miRNAs provides great insights into the underlying pathogenesis of diseases at a system level. However, most existing computational approaches are biased towards known miRNA-disease associations, which is inappropriate for those new diseases or miRNAs without any known association information. RESULTS In this study, we propose a new method with graph regularized non-negative matrix factorization in heterogeneous omics data, called GRNMF, to discover potential associations between miRNAs and diseases, especially for new diseases and miRNAs or those diseases and miRNAs with sparse known associations. First, we integrate the disease semantic information and miRNA functional information to estimate disease similarity and miRNA similarity, respectively. Considering that there is no available interaction observed for new diseases or miRNAs, a preprocessing step is developed to construct the interaction score profiles that will assist in prediction. Next, a graph regularized non-negative matrix factorization framework is utilized to simultaneously identify potential associations for all diseases. The results indicated that our proposed method can effectively prioritize disease-associated miRNAs with higher accuracy compared with other recent approaches. Moreover, case studies also demonstrated the effectiveness of GRNMF to infer unknown miRNA-disease associations for those novel diseases and miRNAs. AVAILABILITY AND IMPLEMENTATION The code of GRNMF is freely available at https://github.com/XIAO-HN/GRNMF/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Qiu Xiao
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
| | - Jiawei Luo
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
| | - Cheng Liang
- College of Information Science and Engineering, Shandong Normal University, Jinan, China
| | - Jie Cai
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
| | - Pingjian Ding
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
| |
|
275
|
Zhang Q, Li J, Wang D, Wang Y. Finding disagreement pathway signatures and constructing an ensemble model for cancer classification. Sci Rep 2017; 7:10044. [PMID: 28855608 PMCID: PMC5577098 DOI: 10.1038/s41598-017-10258-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 05/17/2017] [Accepted: 08/07/2017] [Indexed: 12/02/2022] Open
Abstract
Cancer classification at the molecular level has become a relatively routine research procedure with advances in high-throughput molecular profiling techniques. However, the number of genes typically far exceeds the sample size in gene expression studies. The existing gene selection methods are mostly based on statistics and machine learning, overlooking relevant biological principles or knowledge while working with biological data. Here, we propose a robust ensemble learning paradigm, which incorporates information from multiple pathways, to predict cancer classification. We compare the proposed method with other methods, such as Elastic SCAD and PPDMF, and estimate the classification performance. The results show that the proposed method performs better on most metrics and is robust. We further investigate the biological mechanism of the ensemble feature genes. The results demonstrate that the ensemble feature genes are associated with drug targets/clinically-relevant cancer. In addition, some core biological pathways and biological processes underlying clinically-relevant phenotypes are identified by function annotation. Overall, our research can provide a new perspective for the further study of molecular activities and manifestations of cancer.
Affiliation(s)
- Qiaosheng Zhang
- Harbin Institute of Technology, School of Computer Science and Technology, Harbin, 150001, P.R. China
- Heilongjiang Bayi Agricultural University, College of Science, Daqing, 163319, P.R. China
| | - Jie Li
- Harbin Institute of Technology, School of Computer Science and Technology, Harbin, 150001, P.R. China.
| | - Dong Wang
- Harbin Institute of Technology, School of Computer Science and Technology, Harbin, 150001, P.R. China
| | - Yadong Wang
- Harbin Institute of Technology, School of Computer Science and Technology, Harbin, 150001, P.R. China
| |
|
276
|
Feiglin A, Allen BK, Kohane IS, Kong SW. Comprehensive Analysis of Tissue-wide Gene Expression and Phenotype Data Reveals Tissues Affected in Rare Genetic Disorders. Cell Syst 2017; 5:140-148.e2. [PMID: 28822752 PMCID: PMC5928498 DOI: 10.1016/j.cels.2017.06.016] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 10/26/2016] [Revised: 03/21/2017] [Accepted: 06/29/2017] [Indexed: 01/23/2023]
Abstract
Linking putatively pathogenic variants to the tissues they affect is necessary for determining the correct diagnostic workup and therapeutic regime in undiagnosed patients. Here, we explored how gene expression across healthy tissues can be used to infer this link. We integrated 6,665 tissue-wide transcriptomes with genetic disorder knowledge bases covering 3,397 diseases. Receiver-operating characteristics (ROC) analysis using expression levels in each tissue and across tissues indicated significant but modest associations between elevated expression and phenotype for most tissues (maximum area under ROC curve = 0.69). At extreme elevation, associations were marked. Upregulation of disease genes in affected tissues was pronounced for genes associated with autosomal dominant over recessive disorders. Pathways enriched for genes expressed and associated with phenotypes highlighted tissue functionality, including lipid metabolism in spleen and DNA repair in adipose tissue. These results suggest features useful for evaluating the likelihood of particular tissue manifestations in genetic disorders. The web address of an interactive platform integrating these data is provided.
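As a minimal sketch of the ROC analysis mentioned in this abstract, the area under the ROC curve can be computed with the rank-based (Mann-Whitney) formulation, treating a gene's expression level in a tissue as the score and a gene-phenotype link in that tissue as the label. The function name and the toy values below are assumptions for illustration, not data or code from the study.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (Mann-Whitney U formulation).
    """
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical expression levels for six genes in one tissue; label 1 marks
# genes linked to a phenotype affecting that tissue.
expression = [8.1, 2.3, 5.0, 7.4, 1.2, 5.0]
affects_tissue = [1, 0, 1, 0, 0, 1]
auc = roc_auc(expression, affects_tissue)
```

An AUC of 0.5 corresponds to no association between elevated expression and phenotype, and 1.0 to perfect separation.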
Affiliation(s)
- Ariel Feiglin
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA
| | - Bryce K Allen
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA
| | - Isaac S Kohane
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA.
| | - Sek Won Kong
- Computational Health Informatics Program, Boston Children's Hospital, Boston, MA 02115, USA; Department of Pediatrics, Harvard Medical School, Boston, MA 02115, USA
| |
|
277
|
Peng J, Lu J, Shang X, Chen J. Identifying consistent disease subnetworks using DNet. Methods 2017; 131:104-110. [PMID: 28807723 DOI: 10.1016/j.ymeth.2017.07.024] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/01/2017] [Revised: 07/25/2017] [Accepted: 07/26/2017] [Indexed: 12/12/2022] Open
Abstract
It is critical to identify disease-specific subnetworks from the vastly available genome-wide gene expression data for elucidating how genes perform high-level biological functions together. Various algorithms have been developed for disease gene identification. However, the topological structure of the disease networks (or even the fraction of the networks) has been left largely unexplored. In this article, we present DNet, a method for the identification of significant disease subnetworks by integrating both the network structure and gene expression information. Our work will lead to the identification of missing key disease genes, which are highly expressed in a disease-specific gene expression dataset. The experimental evaluation of our method on both the Leukemia and the Duchenne Muscular Dystrophy gene expression datasets shows that DNet performs better than the existing state-of-the-art methods. In addition, literature support was found for the discovered disease subnetworks in a case study.
Affiliation(s)
- Jiajie Peng
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China.
| | - Junya Lu
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China.
| | - Xuequn Shang
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China.
| | - Jin Chen
- Institute for Biomedical Informatics, University of Kentucky, Lexington, USA; Department of Internal Medicine, University of Kentucky, Lexington, USA; Department of Computer Science, University of Kentucky, Lexington, USA.
| |
|
278
|
Climente-González H, Porta-Pardo E, Godzik A, Eyras E. The Functional Impact of Alternative Splicing in Cancer. Cell Rep 2017; 20:2215-2226. [DOI: 10.1016/j.celrep.2017.08.012] [Citation(s) in RCA: 387] [Impact Index Per Article: 48.4] [Reference Citation Analysis] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/08/2017] [Revised: 07/15/2017] [Accepted: 07/26/2017] [Indexed: 12/29/2022] Open
|
279
|
Wang JY, Chen LL, Zhou XH. Identifying prognostic signature in ovarian cancer using DirGenerank. Oncotarget 2017; 8:46398-46413. [PMID: 28615526 PMCID: PMC5542276 DOI: 10.18632/oncotarget.18189] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/17/2017] [Accepted: 04/26/2017] [Indexed: 12/27/2022] Open
Abstract
Identifying the prognostic genes in cancer is essential not only for the treatment of cancer patients, but also for drug discovery. However, it is still a major challenge to select prognostic genes that can distinguish the risk of cancer patients across various data sets because of tumor heterogeneity. In this situation, the selected genes whose expression levels are statistically related to prognostic risks may be passengers. In this paper, based on gene expression data and prognostic data of ovarian cancer patients, we used conditional mutual information to construct a gene dependency network in which the nodes (genes) with more out-degrees have more chances to be the modulators of cancer prognosis. After that, we proposed the DirGenerank (Generank in directed network) algorithm, which considers both the gene dependency network and genes' correlations to prognostic risks, to identify the gene signature that can predict the prognostic risks of ovarian cancer patients. Using the ovarian cancer data set from TCGA (The Cancer Genome Atlas) as the training data set, 40 genes with the highest importance were selected as the prognostic signature. Survival analysis of the patients divided by the prognostic signature in the testing data set and four independent data sets showed the signature can distinguish the prognostic risks of cancer patients significantly. Enrichment analysis of the signature with curated cancer genes and the drugs selected by CMAP showed the genes in the signature may be drug targets for therapy. In summary, we have proposed a useful pipeline to identify prognostic genes of cancer patients.
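The conditional mutual information used here to build the gene dependency network can be illustrated with a small plug-in estimator for discrete variables, I(X;Y|Z) = Σ p(x,y,z) log[p(z) p(x,y,z) / (p(x,z) p(y,z))]. This is only a sketch under stated assumptions: it presumes expression values have already been discretized, and the function name and toy vectors are hypothetical rather than taken from the paper.

```python
import math
from collections import Counter


def conditional_mutual_information(xs, ys, zs):
    """Plug-in estimate of I(X; Y | Z) in nats from paired discrete samples."""
    n = len(xs)
    if not (n == len(ys) == len(zs)) or n == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    c_xyz = Counter(zip(xs, ys, zs))
    c_xz = Counter(zip(xs, zs))
    c_yz = Counter(zip(ys, zs))
    c_z = Counter(zs)
    total = 0.0
    for (x, y, z), count in c_xyz.items():
        # The sample size n cancels inside the log, leaving a ratio of counts.
        ratio = (c_z[z] * count) / (c_xz[(x, z)] * c_yz[(y, z)])
        total += (count / n) * math.log(ratio)
    return total


# Toy case 1: X and Y are identical and Z is constant, so I(X;Y|Z) = H(X) = ln 2.
cmi_dependent = conditional_mutual_information([0, 1, 0, 1], [0, 1, 0, 1], [0, 0, 0, 0])
# Toy case 2: Z fully determines both X and Y, so nothing is left to share.
cmi_explained = conditional_mutual_information([0, 1, 0, 1], [0, 1, 0, 1], [0, 1, 0, 1])
```

A value near zero suggests that the apparent dependence between two genes is explained by the conditioning variable, which is the usual motivation for conditioning when inferring dependency edges.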
Affiliation(s)
- Jian-Yong Wang
- College of Informatics, Huazhong Agricultural University, Wuhan 430070, P.R. China
| | - Ling-Ling Chen
- College of Informatics, Huazhong Agricultural University, Wuhan 430070, P.R. China
| | - Xiong-Hui Zhou
- College of Informatics, Huazhong Agricultural University, Wuhan 430070, P.R. China
| |
|
280
|
Shim JE, Bang C, Yang S, Lee T, Hwang S, Kim CY, Singh-Blom UM, Marcotte EM, Lee I. GWAB: a web server for the network-based boosting of human genome-wide association data. Nucleic Acids Res 2017; 45:W154-W161. [PMID: 28449091 PMCID: PMC5793838 DOI: 10.1093/nar/gkx284] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/11/2017] [Revised: 04/01/2017] [Accepted: 04/17/2017] [Indexed: 12/29/2022] Open
Abstract
During the last decade, genome-wide association studies (GWAS) have represented a major approach to dissect complex human genetic diseases. Due in part to limited statistical power, most studies identify only small numbers of candidate genes that pass the conventional significance thresholds (e.g. P ≤ 5 × 10⁻⁸). This limitation can be partly overcome by increasing the sample size, but this comes at a higher cost. Alternatively, weak association signals can be boosted by incorporating independent data. Previously, we demonstrated the feasibility of boosting GWAS disease associations using gene networks. Here, we present a web server, GWAB (www.inetbio.org/gwab), for the network-based boosting of human GWAS data. Using GWAS summary statistics (P-values) for SNPs along with reference genes for a disease of interest, GWAB reprioritizes candidate disease genes by integrating the GWAS and network data. We found that GWAB could more effectively retrieve disease-associated reference genes than GWAS could alone. As an example, we describe GWAB-boosted candidate genes for coronary artery disease and supporting data in the literature. These results highlight the inherent value in sub-threshold GWAS associations, which are often not publicly released. GWAB offers a feasible general approach to boost such associations for human disease genetics.
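The general idea of boosting sub-threshold GWAS signals with a gene network can be sketched as follows. This is not GWAB's actual scoring formula: the additive form, the use of -log10(P) as the per-gene score, the `neighbor_weight` parameter, and all gene names and values are assumptions chosen to keep the illustration small.

```python
import math


def boost_scores(gene_pvalues, network, neighbor_weight=0.5):
    """Add a weighted share of each neighbor's GWAS evidence to a gene's own.

    gene_pvalues: gene -> gene-level P-value
    network: gene -> {neighbor: edge confidence in [0, 1]}
    neighbor_weight: hypothetical tuning parameter, not a GWAB default
    """
    own = {g: -math.log10(p) for g, p in gene_pvalues.items()}
    boosted = {}
    for gene, score in own.items():
        support = sum(
            conf * own[nbr]
            for nbr, conf in network.get(gene, {}).items()
            if nbr in own
        )
        boosted[gene] = score + neighbor_weight * support
    return boosted


# Toy data: GENE_C misses the 5e-8 threshold but is linked to two strong hits,
# while GENE_D has the same P-value and no network support.
pvalues = {"GENE_A": 1e-9, "GENE_B": 1e-8, "GENE_C": 1e-5, "GENE_D": 1e-5}
edges = {
    "GENE_A": {"GENE_C": 0.9},
    "GENE_B": {"GENE_C": 0.8},
    "GENE_C": {"GENE_A": 0.9, "GENE_B": 0.8},
}
boosted = boost_scores(pvalues, edges)
ranked = sorted(boosted.items(), key=lambda kv: -kv[1])
```

In this toy example GENE_C moves ahead of GENE_D even though their P-values are equal, which is the kind of reprioritization the abstract describes.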
Affiliation(s)
- Jung Eun Shim
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul 120-749, Korea
| | - Changbae Bang
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul 120-749, Korea
| | - Sunmo Yang
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul 120-749, Korea
| | - Tak Lee
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul 120-749, Korea
| | - Sohyun Hwang
- Department of Biomedical Science, College of Life Science, CHA University, Seongnam-si 13496, Korea
| | - Chan Yeong Kim
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul 120-749, Korea
| | - U Martin Singh-Blom
- Cognition Group, Schibsted Products & Technologies, Västra Järnvägsgatan 21, 111 64 Stockholm, Sweden
| | - Edward M Marcotte
- Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, University of Texas, Austin, TX 78712, USA
- Department of Molecular Biosciences, University of Texas at Austin, TX 78712, USA
| | - Insuk Lee
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul 120-749, Korea
| |
|
281
|
Liu Y, Zeng X, He Z, Zou Q. Inferring microRNA-disease associations by random walk on a heterogeneous network with multiple data sources. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2017; 14:905-915. [PMID: 27076459 DOI: 10.1109/tcbb.2016.2550432] [Citation(s) in RCA: 209] [Impact Index Per Article: 26.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/21/2023]
Abstract
Since the discovery of the regulatory function of microRNA (miRNA), increased attention has focused on identifying the relationship between miRNA and disease. It has been suggested that computational methods are an efficient way to identify potential disease-related miRNAs for further confirmation using biological experiments. In this paper, we first highlighted three limitations commonly associated with previous computational methods. To resolve these limitations, we established a disease similarity subnetwork and a miRNA similarity subnetwork by integrating multiple data sources, where the disease similarity is composed of disease semantic similarity and disease functional similarity, and the miRNA similarity is calculated using the miRNA-target gene and miRNA-lncRNA (long non-coding RNA) associations. Then, a heterogeneous network was constructed by connecting the disease similarity subnetwork and the miRNA similarity subnetwork using the known miRNA-disease associations. We extended random walk with restart to predict miRNA-disease associations in the heterogeneous network. The leave-one-out cross-validation achieved an average area under the curve (AUC) of 0.8049 across 341 diseases and 476 miRNAs. For five-fold cross-validation, our method achieved an AUC from 0.7970 to 0.9249 for 15 human diseases. Case studies further demonstrated the feasibility of our method to discover potential miRNA-disease associations. An online service for prediction is freely available at http://ifmda.aliapp.com.
|
282
|
Abstract
Biological networks are powerful resources for the discovery of genes and genetic modules that drive disease. Fundamental to network analysis is the concept that genes underlying the same phenotype tend to interact; this principle can be used to combine and to amplify signals from individual genes. Recently, numerous bioinformatic techniques have been proposed for genetic analysis using networks, based on random walks, information diffusion and electrical resistance. These approaches have been applied successfully to identify disease genes, genetic modules and drug targets. In fact, all these approaches are variations of a unifying mathematical machinery - network propagation - suggesting that it is a powerful data transformation method of broad utility in genetic research.
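One member of the network propagation family named in this abstract, random walk with restart, can be sketched in a few lines. The example assumes an undirected, weighted network stored as a nested dictionary in which every node has at least one neighbor; the function name, the restart value and the toy graph are illustrative assumptions, not taken from the review.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Propagate seed-gene signal over an undirected network.

    adjacency: node -> {neighbor: edge weight}; assumed symmetric
    seeds: non-empty set of nodes carrying prior evidence (e.g. known disease genes)
    restart: probability of jumping back to the seeds at each step
    """
    nodes = sorted(adjacency)
    degree = {n: sum(adjacency[n].values()) for n in nodes}
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {}
        for n in nodes:
            # Mass arriving at n: each neighbor m sends p[m] * w(m, n) / degree(m).
            inflow = sum(p[m] * w / degree[m] for m, w in adjacency[n].items())
            nxt[n] = (1.0 - restart) * inflow + restart * p0[n]
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Toy path graph A - B - C - D with a single seed gene A.
graph = {
    "A": {"B": 1.0},
    "B": {"A": 1.0, "C": 1.0},
    "C": {"B": 1.0, "D": 1.0},
    "D": {"C": 1.0},
}
scores = random_walk_with_restart(graph, seeds={"A"})
```

The resulting scores sum to one and fall off with distance from the seed, so genes close to known disease genes receive amplified signal, which matches the principle that genes underlying the same phenotype tend to interact.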
|
283
|
Drew K, Lee C, Huizar RL, Tu F, Borgeson B, McWhite CD, Ma Y, Wallingford JB, Marcotte EM. Integration of over 9,000 mass spectrometry experiments builds a global map of human protein complexes. Mol Syst Biol 2017; 13:932. [PMID: 28596423 PMCID: PMC5488662 DOI: 10.15252/msb.20167490] [Citation(s) in RCA: 150] [Impact Index Per Article: 18.8] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/13/2022] Open
Abstract
Macromolecular protein complexes carry out many of the essential functions of cells, and many genetic diseases arise from disrupting the functions of such complexes. Currently, there is great interest in defining the complete set of human protein complexes, but recent published maps lack comprehensive coverage. Here, through the synthesis of over 9,000 published mass spectrometry experiments, we present hu.MAP, the most comprehensive and accurate human protein complex map to date, containing > 4,600 total complexes, > 7,700 proteins, and > 56,000 unique interactions, including thousands of confident protein interactions not identified by the original publications. hu.MAP accurately recapitulates known complexes withheld from the learning procedure, which was optimized with the aid of a new quantitative metric (k‐cliques) for comparing sets of sets. The vast majority of complexes in our map are significantly enriched with literature annotations, and the map overall shows improved coverage of many disease‐associated proteins, as we describe in detail for ciliopathies. Using hu.MAP, we predicted and experimentally validated candidate ciliopathy disease genes in vivo in a model vertebrate, discovering CCDC138, WDR90, and KIAA1328 to be new cilia basal body/centriolar satellite proteins, and identifying ANKRD55 as a novel member of the intraflagellar transport machinery. By offering significant improvements to the accuracy and coverage of human protein complexes, hu.MAP (http://proteincomplexes.org) serves as a valuable resource for better understanding the core cellular functions of human proteins and helping to determine mechanistic foundations of human disease.
Affiliation(s)
- Kevin Drew
- Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, University of Texas at Austin, Austin, TX, USA
| | - Chanjae Lee
- Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, University of Texas at Austin, Austin, TX, USA.,Department of Molecular Biosciences, University of Texas at Austin, Austin, TX, USA
| | - Ryan L Huizar
- Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, University of Texas at Austin, Austin, TX, USA.,Department of Molecular Biosciences, University of Texas at Austin, Austin, TX, USA
| | - Fan Tu
- Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, University of Texas at Austin, Austin, TX, USA.,Department of Molecular Biosciences, University of Texas at Austin, Austin, TX, USA
| | - Blake Borgeson
- Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, University of Texas at Austin, Austin, TX, USA.,Department of Molecular Biosciences, University of Texas at Austin, Austin, TX, USA
| | - Claire D McWhite
- Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, University of Texas at Austin, Austin, TX, USA.,Department of Molecular Biosciences, University of Texas at Austin, Austin, TX, USA
| | - Yun Ma
- Department of Molecular Biosciences, University of Texas at Austin, Austin, TX, USA.,The Otolaryngology Hospital, The First Affiliated Hospital of Sun Yat-sen University, Sun Yat-sen University, Guangzhou, China
| | - John B Wallingford
- Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, University of Texas at Austin, Austin, TX, USA.,Department of Molecular Biosciences, University of Texas at Austin, Austin, TX, USA
| | - Edward M Marcotte
- Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, University of Texas at Austin, Austin, TX, USA.,Department of Molecular Biosciences, University of Texas at Austin, Austin, TX, USA
| |
|
284
|
A novel network regularized matrix decomposition method to detect mutated cancer genes in tumour samples with inter-patient heterogeneity. Sci Rep 2017; 7:2855. [PMID: 28588243 PMCID: PMC5460199 DOI: 10.1038/s41598-017-03141-w] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/27/2017] [Accepted: 04/20/2017] [Indexed: 01/01/2023] Open
Abstract
Inter-patient heterogeneity is a major challenge for detecting mutated cancer genes, which is crucial to advancing cancer diagnostics and therapeutics. To detect mutated cancer genes in heterogeneous tumour samples, a prominent strategy is to determine whether the genes are recurrently mutated in their interaction network context. However, recent studies show that some cancer genes in different perturbed pathways are mutated in different subsets of samples. Consequently, these genes may not display significant mutational recurrence and thus remain undiscovered even when network information is considered. We develop a novel method called mCGfinder to efficiently detect mutated cancer genes in tumour samples with inter-patient heterogeneity. Based on a matrix decomposition framework that incorporates gene interaction network information, mCGfinder can successfully measure the significance of mutational recurrence of genes in a subset of samples. When applying mCGfinder to TCGA somatic mutation datasets of five types of cancers, we find that the genes detected by mCGfinder are significantly enriched for known cancer genes, and yield substantially smaller p-values than other existing methods. All the results demonstrate that mCGfinder is an efficient method for detecting mutated cancer genes.
|
285
|
Le Morvan M, Zinovyev A, Vert JP. NetNorM: Capturing cancer-relevant information in somatic exome mutation data with gene networks for cancer stratification and prognosis. PLoS Comput Biol 2017; 13:e1005573. [PMID: 28650955 PMCID: PMC5507468 DOI: 10.1371/journal.pcbi.1005573] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/13/2016] [Revised: 07/11/2017] [Accepted: 05/15/2017] [Indexed: 01/01/2023] Open
Abstract
Genome-wide somatic mutation profiles of tumours can now be assessed efficiently and promise to move precision medicine forward. Statistical analysis of mutation profiles is, however, challenging due to the low frequency of most mutations, the varying mutation rates across tumours, and the presence of a majority of passenger events that hide the contribution of driver events. Here we propose a method, NetNorM, to represent whole-exome somatic mutation data in a form that enhances cancer-relevant information using a gene network as background knowledge. We evaluate its relevance for two tasks: survival prediction and unsupervised patient stratification. Using data from 8 cancer types from The Cancer Genome Atlas (TCGA), we show that it improves over the raw binary mutation data and network diffusion for these two tasks. In doing so, we also provide a thorough assessment of the prognostic power of somatic mutations, which has been overlooked by previous studies because of the sparse and binary nature of mutations.
Affiliation(s)
- Marine Le Morvan
- MINES ParisTech, PSL Research University, CBIO-Centre for Computational Biology, 75006 Paris, France
- Institut Curie, 75248 Paris Cedex 5, France
- INSERM, U900, 75248 Paris Cedex 5, France
| | - Andrei Zinovyev
- MINES ParisTech, PSL Research University, CBIO-Centre for Computational Biology, 75006 Paris, France
- Institut Curie, 75248 Paris Cedex 5, France
- INSERM, U900, 75248 Paris Cedex 5, France
| | - Jean-Philippe Vert
- MINES ParisTech, PSL Research University, CBIO-Centre for Computational Biology, 75006 Paris, France
- Institut Curie, 75248 Paris Cedex 5, France
- INSERM, U900, 75248 Paris Cedex 5, France
- Department of Mathematics and Applications, Ecole normale supérieure, CNRS, PSL Research University, 75005 Paris, France
| |
|
286
|
Ko YA, Yi H, Qiu C, Huang S, Park J, Ledo N, Köttgen A, Li H, Rader DJ, Pack MA, Brown CD, Susztak K. Genetic-Variation-Driven Gene-Expression Changes Highlight Genes with Important Functions for Kidney Disease. Am J Hum Genet 2017; 100:940-953. [PMID: 28575649 DOI: 10.1016/j.ajhg.2017.05.004] [Citation(s) in RCA: 62] [Impact Index Per Article: 7.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/04/2016] [Accepted: 05/05/2017] [Indexed: 01/22/2023] Open
Abstract
Chronic kidney disease (CKD) is a complex gene-environmental disease affecting close to 10% of the US population. Genome-wide association studies (GWASs) have identified sequence variants, localized to non-coding genomic regions, associated with kidney function. Despite these robust observations, the mechanism by which variants lead to CKD remains a critical unanswered question. Expression quantitative trait loci (eQTL) analysis is a method to identify genetic variation associated with gene expression changes in specific tissue types. We hypothesized that an integrative analysis combining CKD GWAS and kidney eQTL results can identify candidate genes for CKD. We performed eQTL analysis by correlating genotype with RNA-seq-based gene expression levels in 96 human kidney samples. Applying stringent statistical criteria, we detected 1,886 genes whose expression differs with the sequence variants. Using direct overlap and Bayesian methods, we identified new potential target genes for CKD. With respect to one of the target genes, lysosomal beta A mannosidase (MANBA), we observed that genetic variants associated with MANBA expression in the kidney showed statistically significant colocalization with variants identified in CKD GWASs, indicating that MANBA is a potential target gene for CKD. The expression of MANBA was significantly lower in kidneys of subjects with risk alleles. Suppressing manba expression in zebrafish resulted in renal tubule defects and pericardial edema, phenotypes typically induced by kidney dysfunction. Our analysis shows that gene-expression changes driven by genetic variation in the kidney can highlight potential new target genes for CKD development.
|
287
|
He Z, Zhang J, Yuan X, Liu Z, Liu B, Tuo S, Liu Y. Network based stratification of major cancers by integrating somatic mutation and gene expression data. PLoS One 2017; 12:e0177662. [PMID: 28520777 PMCID: PMC5433734 DOI: 10.1371/journal.pone.0177662] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/13/2016] [Accepted: 05/01/2017] [Indexed: 11/20/2022] Open
Abstract
The stratification of cancer into subtypes that are significantly associated with clinical outcomes is beneficial for targeted prognosis and treatment. In this study, we integrated somatic mutation and gene expression data to identify clusters of patients. In contrast to previous studies, we constructed cancer-type-specific significant co-expression networks (SCNs) rather than using a fixed gene network across all cancers, such as the network-based stratification (NBS) method, which ignores cancer heterogeneity. For each type of cancer, the gene expression data were used to construct the SCN network, while the gene somatic mutation data were mapped onto the network, propagated, and used for further clustering. For the clustering, we adopted an improved network-regularized non-negative matrix factorization (netNMF) (netNMF_HC) for a more precise classification. We applied our method to various datasets, including ovarian cancer (OV), lung adenocarcinoma (LUAD) and uterine corpus endometrial carcinoma (UCEC) cohorts derived from the TCGA (The Cancer Genome Atlas) project. Based on the results, we evaluated the performance of our method to identify survival-relevant subtypes and further compared it to the NBS method, which adopts a priori networks and the netNMF algorithm. The proposed algorithm outperformed the NBS method in identifying informative cancer subtypes that were significantly associated with clinical outcomes in most cancer types we studied. In particular, our method identified survival-associated UCEC subtypes that were not identified by the NBS method. Our analysis indicated that valid patient subtyping could be achieved from mutation data with cancer-type-specific SCNs and netNMF_HC for individual cancers, owing to cancer-specific co-expression patterns and more precise clustering.
Affiliation(s)
- Zongzhen He
- School of Computer Science and Technology, Xidian University, Xi’an, PR China
| | - Junying Zhang
- School of Computer Science and Technology, Xidian University, Xi’an, PR China
| | - Xiguo Yuan
- School of Computer Science and Technology, Xidian University, Xi’an, PR China
| | - Zhaowen Liu
- School of Computer Science and Technology, Xidian University, Xi’an, PR China
| | - Baobao Liu
- School of Computer Science and Technology, Xidian University, Xi’an, PR China
| | - Shouheng Tuo
- School of Computer Science and Technology, Xidian University, Xi’an, PR China
| | - Yajun Liu
- School of Computer Science and Technology, Xidian University, Xi’an, PR China
| |
|
288
|
Zeng X, Liao Y, Liu Y, Zou Q. Prediction and Validation of Disease Genes Using HeteSim Scores. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2017; 14:687-695. [PMID: 26890920 DOI: 10.1109/tcbb.2016.2520947] [Citation(s) in RCA: 152] [Impact Index Per Article: 19.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/05/2023]
Abstract
Deciphering gene-disease associations is an important goal in biomedical research. In this paper, we use a novel relevance measure, called HeteSim, to prioritize candidate disease genes. Two methods based on heterogeneous networks constructed using protein-protein interaction, gene-phenotype associations, and phenotype-phenotype similarity are presented. In HeteSim_MultiPath (HSMP), HeteSim scores of different paths are combined with a constant that dampens the contributions of longer paths. In HeteSim_SVM (HSSVM), HeteSim scores are combined with a machine learning method. The 3-fold experiments show that our non-machine learning method HSMP performs better than the existing non-machine learning methods, while our machine learning method HSSVM obtains accuracy similar to that of the best existing machine learning method, CATAPULT. From the analysis of the top 10 predicted genes for different diseases, we found that HSSVM avoids the disadvantage of the existing machine learning based methods, which always predict similar genes for different diseases. The data sets and Matlab code for the two methods are freely available for download at http://lab.malab.cn/data/HeteSim/index.jsp.
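The way HSMP combines per-path scores with a dampening constant can be illustrated with a short sketch. The exact weighting used in the paper is not given in this abstract, so the geometric form (weight equal to the constant raised to the path length), the function name and the toy scores are all assumptions.

```python
def combine_path_scores(path_scores, damping=0.5):
    """Weighted sum of per-path relevance scores, down-weighting longer paths.

    path_scores: list of (path_length, score) pairs for one gene-disease pair
    damping: constant in (0, 1); each extra step multiplies the weight by it
    """
    if not 0.0 < damping < 1.0:
        raise ValueError("damping must lie strictly between 0 and 1")
    return sum((damping ** length) * score for length, score in path_scores)


# Hypothetical scores for one candidate gene along paths of length 2, 3 and 4.
scores_for_gene = [(2, 0.40), (3, 0.60), (4, 0.80)]
combined = combine_path_scores(scores_for_gene, damping=0.5)
```

With this weighting, a high score on a long path contributes less than a moderate score on a short path, which is one plausible reading of "dampens the contributions of longer paths".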
|
289
|
Song L, Huang SSC, Wise A, Castanon R, Nery JR, Chen H, Watanabe M, Thomas J, Bar-Joseph Z, Ecker JR. A transcription factor hierarchy defines an environmental stress response network. Science 2017; 354:354/6312/aag1550. [PMID: 27811239 PMCID: PMC5217750 DOI: 10.1126/science.aag1550] [Citation(s) in RCA: 339] [Impact Index Per Article: 42.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/16/2016] [Accepted: 09/28/2016] [Indexed: 12/17/2022]
Abstract
Environmental stresses are universally encountered by microbes, plants, and animals. Yet systematic studies of stress-responsive transcription factor (TF) networks in multicellular organisms have been limited. The phytohormone abscisic acid (ABA) influences the expression of thousands of genes, allowing us to characterize complex stress-responsive regulatory networks. Using chromatin immunoprecipitation sequencing, we identified genome-wide targets of 21 ABA-related TFs to construct a comprehensive regulatory network in Arabidopsis thaliana. Determinants of dynamic TF binding and a hierarchy among TFs were defined, illuminating the relationship between differential gene expression patterns and ABA pathway feedback regulation. By extrapolating regulatory characteristics of observed canonical ABA pathway components, we identified a new family of transcriptional regulators modulating ABA and salt responsiveness and demonstrated their utility to modulate plant resilience to osmotic stress.
Affiliation(s)
- Liang Song
- Plant Biology Laboratory, Salk Institute for Biological Studies, La Jolla, CA 92037, USA
| | - Shao-Shan Carol Huang
- Plant Biology Laboratory, Salk Institute for Biological Studies, La Jolla, CA 92037, USA
| | - Aaron Wise
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
| | - Rosa Castanon
- Genomic Analysis Laboratory, Salk Institute for Biological Studies, La Jolla, CA 92037, USA
| | - Joseph R Nery
- Genomic Analysis Laboratory, Salk Institute for Biological Studies, La Jolla, CA 92037, USA
| | - Huaming Chen
- Genomic Analysis Laboratory, Salk Institute for Biological Studies, La Jolla, CA 92037, USA
| | - Marina Watanabe
- Plant Biology Laboratory, Salk Institute for Biological Studies, La Jolla, CA 92037, USA
| | - Jerushah Thomas
- Plant Biology Laboratory, Salk Institute for Biological Studies, La Jolla, CA 92037, USA
| | - Ziv Bar-Joseph
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
| | - Joseph R Ecker
- Plant Biology Laboratory, Salk Institute for Biological Studies, La Jolla, CA 92037, USA.,Genomic Analysis Laboratory, Salk Institute for Biological Studies, La Jolla, CA 92037, USA.,Howard Hughes Medical Institute (HHMI), Salk Institute for Biological Studies, La Jolla, CA 92037, USA
| |
|
290
|
Guala D, Sonnhammer ELL. A large-scale benchmark of gene prioritization methods. Sci Rep 2017; 7:46598. [PMID: 28429739 PMCID: PMC5399445 DOI: 10.1038/srep46598] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/09/2016] [Accepted: 03/22/2017] [Indexed: 11/16/2022] Open
Abstract
In order to maximize the use of results from high-throughput experimental studies, e.g. GWAS, for identification and diagnostics of new disease-associated genes, it is important to have properly analyzed and benchmarked gene prioritization tools. While prospective benchmarks are underpowered to provide statistically significant results in their attempt to differentiate the performance of gene prioritization tools, a strategy for retrospective benchmarking has been missing, and new tools usually only provide internal validations. The Gene Ontology (GO) contains genes clustered around annotation terms. This intrinsic property of GO can be utilized in construction of robust benchmarks, objective to the problem domain. We demonstrate how this can be achieved for network-based gene prioritization tools, utilizing the FunCoup network. We use cross-validation and a set of appropriate performance measures to compare state-of-the-art gene prioritization algorithms: three based on network diffusion, NetRank and two implementations of Random Walk with Restart, and MaxLink that utilizes network neighborhood. Our benchmark suite provides a systematic and objective way to compare the multitude of available and future gene prioritization tools, enabling researchers to select the best gene prioritization tool for the task at hand, and helping to guide the development of more accurate methods.
Affiliation(s)
- Dimitri Guala
- Stockholm Bioinformatics Center, Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Box 1031, 17121 Solna, Sweden
- Erik L L Sonnhammer
- Stockholm Bioinformatics Center, Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Box 1031, 17121 Solna, Sweden
291
Jang K, Kim K, Cho A, Lee I, Choi JK. Network perturbation by recurrent regulatory variants in cancer. PLoS Comput Biol 2017; 13:e1005449. [PMID: 28333928 PMCID: PMC5383347 DOI: 10.1371/journal.pcbi.1005449] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 10/17/2016] [Revised: 04/06/2017] [Accepted: 03/10/2017] [Indexed: 12/12/2022] Open
Abstract
Cancer driving genes have been identified as recurrently affected by variants that alter protein-coding sequences. However, a majority of cancer variants arise in noncoding regions, and some of them are thought to play a critical role through transcriptional perturbation. Here we identified putative transcriptional driver genes based on combinatorial variant recurrence in cis-regulatory regions. The identified genes showed high connectivity in the cancer type-specific transcription regulatory network, with high outdegree and many downstream genes, highlighting their causative role during tumorigenesis. In the protein interactome, the identified transcriptional drivers were not as highly connected as coding driver genes but appeared to form a network module centered on the coding drivers. The coding and regulatory variants associated via these interactions between the coding and transcriptional drivers showed exclusive and complementary occurrence patterns across tumor samples. Transcriptional cancer drivers may act through an extensive perturbation of the regulatory network and by altering protein network modules through interactions with coding driver genes.
Identifying driver variants is a current challenge facing cancer genomics. A well-established and robust method for this is to find recurrence in large cohorts of samples. Recurrence patterns of amino acid-changing variants can reveal oncogenes and tumor suppressor genes. However, such single-gene approaches have limitations because of rare variants. Therefore, recurrently affected protein complexes, network modules, or signaling pathways have been identified based on network-level recurrence. Here we dissect the chromatin interactome to identify cis-regulatory variants that show high gene-level recurrence. We then employ the gene regulatory network and protein interactome to characterize putative cancer genes with cis-regulatory variant recurrence. These genes were located at critical positions in the regulatory network. By contrast, they are at the periphery of the protein interactome; instead, they form a network module with coding cancer genes located at hub positions. Furthermore, the coding and regulatory variants associated via these interactions showed exclusive and complementary occurrence patterns across tumor samples. Therefore, we suggest that transcriptional cancer drivers may act through an extensive perturbation of the regulatory network and by altering protein network modules through interactions with coding driver genes.
Affiliation(s)
- Kiwon Jang
- Department of Bio and Brain Engineering, KAIST, Daejeon, Republic of Korea
- Kwoneel Kim
- Department of Bio and Brain Engineering, KAIST, Daejeon, Republic of Korea
- Ara Cho
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul, Republic of Korea
- Insuk Lee
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul, Republic of Korea
- Jung Kyoon Choi
- Department of Bio and Brain Engineering, KAIST, Daejeon, Republic of Korea
292
Luck K, Sheynkman GM, Zhang I, Vidal M. Proteome-Scale Human Interactomics. Trends Biochem Sci 2017; 42:342-354. [PMID: 28284537 DOI: 10.1016/j.tibs.2017.02.006] [Citation(s) in RCA: 92] [Impact Index Per Article: 11.5] [Received: 08/26/2016] [Revised: 02/10/2017] [Accepted: 02/16/2017] [Indexed: 01/28/2023]
Abstract
Cellular functions are mediated by complex interactome networks of physical, biochemical, and functional interactions between DNA sequences, RNA molecules, proteins, lipids, and small metabolites. A thorough understanding of cellular organization requires accurate and relatively complete models of interactome networks at proteome scale. The recent publication of four human protein-protein interaction (PPI) maps represents a technological breakthrough and an unprecedented resource for the scientific community, heralding a new era of proteome-scale human interactomics. Our knowledge gained from these and complementary studies provides fresh insights into the opportunities and challenges when analyzing systematically generated interactome data, defines a clear roadmap towards the generation of a first reference interactome, and reveals new perspectives on the organization of cellular life.
Affiliation(s)
- Katja Luck
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA 02215, USA; Department of Genetics, Harvard Medical School, Boston, MA 02115, USA; Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA 02215, USA.
- Gloria M Sheynkman
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA 02215, USA; Department of Genetics, Harvard Medical School, Boston, MA 02115, USA; Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA 02215, USA.
- Ivy Zhang
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA 02215, USA; Department of Genetics, Harvard Medical School, Boston, MA 02115, USA; Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA 02215, USA
- Marc Vidal
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA 02215, USA; Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
293
Yu Y, Zuo X, He M, Gao J, Fu Y, Qin C, Meng L, Wang W, Song Y, Cheng Y, Zhou F, Chen G, Zheng X, Wang X, Liang B, Zhu Z, Fu X, Sheng Y, Hao J, Liu Z, Yan H, Mangold E, Ruczinski I, Liu J, Marazita ML, Ludwig KU, Beaty TH, Zhang X, Sun L, Bian Z. Genome-wide analyses of non-syndromic cleft lip with palate identify 14 novel loci and genetic heterogeneity. Nat Commun 2017; 8:14364. [PMID: 28232668 PMCID: PMC5333091 DOI: 10.1038/ncomms14364] [Citation(s) in RCA: 191] [Impact Index Per Article: 23.9] [Received: 05/30/2016] [Accepted: 12/20/2016] [Indexed: 01/08/2023] Open
Abstract
Non-syndromic cleft lip with palate (NSCLP) is the most serious sub-phenotype of non-syndromic orofacial clefts (NSOFC), which are the most common craniofacial birth defects in humans. Here we conduct a GWAS of NSCLP with multiple independent replications, totalling 7,404 NSOFC cases and 16,059 controls from several ethnicities, to identify new NSCLP risk loci, and explore the genetic heterogeneity between sub-phenotypes of NSOFC. We identify 41 SNPs within 26 loci that achieve genome-wide significance, 14 of which are novel (RAD54B, TMEM19, KRT18, WNT9B, GSC/DICER1, PTCH1, RPS26, OFCC1/TFAP2A, TAF1B, FGF10, MSX1, LINC00640, FGFR1 and SPRY1). These 26 loci collectively account for 10.94% of the heritability for NSCLP in the Chinese population. We find evidence of genetic heterogeneity between the sub-phenotypes of NSOFC and among different populations. This study substantially increases the number of genetic susceptibility loci for NSCLP and provides important insights into the genetic aetiology of this common craniofacial malformation.
Non-syndromic cleft lip with palate is a common birth defect of unknown aetiology. Here, the authors discover 14 new genes associated with this condition, and show genetic heterogeneity in this and other non-syndromic orofacial clefting disorders.
Affiliation(s)
- Yanqin Yu
- The State Key Laboratory Breeding Base of Basic Science of Stomatology (Hubei-MOST) and Key Laboratory of Oral Biomedicine Ministry of Education, School and Hospital of Stomatology, Wuhan University, Wuhan, Hubei 430079, China
- Xianbo Zuo
- Institute of Dermatology and Department of Dermatology at No. 1 Hospital, Anhui Medical University, Hefei, Anhui 230032, China.,State Key Lab Incubation of Dermatology, Ministry of Science and Technology, Hefei, China.,Key Lab of Dermatology, Ministry of Education, Hefei, China.,Key Lab of Gene Resources Utilization for Severe Inherited Disorders, Anhui 230032, China.,Collaborative Innovation Center of Complex and Severe skin Disease, Anhui Medical University, Hefei, Anhui 230032, China
- Miao He
- The State Key Laboratory Breeding Base of Basic Science of Stomatology (Hubei-MOST) and Key Laboratory of Oral Biomedicine Ministry of Education, School and Hospital of Stomatology, Wuhan University, Wuhan, Hubei 430079, China.,Department of Pediatric Dentistry, School and Hospital of Stomatology, Wuhan University, Wuhan, Hubei 430079, China
- Jinping Gao
- Institute of Dermatology and Department of Dermatology at No. 1 Hospital, Anhui Medical University, Hefei, Anhui 230032, China.,State Key Lab Incubation of Dermatology, Ministry of Science and Technology, Hefei, China.,Key Lab of Dermatology, Ministry of Education, Hefei, China.,Key Lab of Gene Resources Utilization for Severe Inherited Disorders, Anhui 230032, China.,Collaborative Innovation Center of Complex and Severe skin Disease, Anhui Medical University, Hefei, Anhui 230032, China
- Yuchuan Fu
- Department of Oral and Maxillofacial Surgery, School and Hospital of Stomatology, Wuhan University, Wuhan, Hubei 430079, China
- Chuanqi Qin
- The State Key Laboratory Breeding Base of Basic Science of Stomatology (Hubei-MOST) and Key Laboratory of Oral Biomedicine Ministry of Education, School and Hospital of Stomatology, Wuhan University, Wuhan, Hubei 430079, China.,Department of Oral and Maxillofacial Surgery, School and Hospital of Stomatology, Wuhan University, Wuhan, Hubei 430079, China
- Liuyan Meng
- The State Key Laboratory Breeding Base of Basic Science of Stomatology (Hubei-MOST) and Key Laboratory of Oral Biomedicine Ministry of Education, School and Hospital of Stomatology, Wuhan University, Wuhan, Hubei 430079, China
- Wenjun Wang
- Institute of Dermatology and Department of Dermatology at No. 1 Hospital, Anhui Medical University, Hefei, Anhui 230032, China.,State Key Lab Incubation of Dermatology, Ministry of Science and Technology, Hefei, China.,Key Lab of Dermatology, Ministry of Education, Hefei, China.,Key Lab of Gene Resources Utilization for Severe Inherited Disorders, Anhui 230032, China.,Collaborative Innovation Center of Complex and Severe skin Disease, Anhui Medical University, Hefei, Anhui 230032, China
- Yaling Song
- The State Key Laboratory Breeding Base of Basic Science of Stomatology (Hubei-MOST) and Key Laboratory of Oral Biomedicine Ministry of Education, School and Hospital of Stomatology, Wuhan University, Wuhan, Hubei 430079, China
- Yong Cheng
- The State Key Laboratory Breeding Base of Basic Science of Stomatology (Hubei-MOST) and Key Laboratory of Oral Biomedicine Ministry of Education, School and Hospital of Stomatology, Wuhan University, Wuhan, Hubei 430079, China
- Fusheng Zhou
- Institute of Dermatology and Department of Dermatology at No. 1 Hospital, Anhui Medical University, Hefei, Anhui 230032, China.,State Key Lab Incubation of Dermatology, Ministry of Science and Technology, Hefei, China.,Key Lab of Dermatology, Ministry of Education, Hefei, China.,Key Lab of Gene Resources Utilization for Severe Inherited Disorders, Anhui 230032, China.,Collaborative Innovation Center of Complex and Severe skin Disease, Anhui Medical University, Hefei, Anhui 230032, China
- Gang Chen
- Institute of Dermatology and Department of Dermatology at No. 1 Hospital, Anhui Medical University, Hefei, Anhui 230032, China.,State Key Lab Incubation of Dermatology, Ministry of Science and Technology, Hefei, China.,Key Lab of Dermatology, Ministry of Education, Hefei, China.,Key Lab of Gene Resources Utilization for Severe Inherited Disorders, Anhui 230032, China.,Collaborative Innovation Center of Complex and Severe skin Disease, Anhui Medical University, Hefei, Anhui 230032, China
- Xiaodong Zheng
- Institute of Dermatology and Department of Dermatology at No. 1 Hospital, Anhui Medical University, Hefei, Anhui 230032, China.,State Key Lab Incubation of Dermatology, Ministry of Science and Technology, Hefei, China.,Key Lab of Dermatology, Ministry of Education, Hefei, China.,Key Lab of Gene Resources Utilization for Severe Inherited Disorders, Anhui 230032, China.,Collaborative Innovation Center of Complex and Severe skin Disease, Anhui Medical University, Hefei, Anhui 230032, China
- Xinhuan Wang
- The State Key Laboratory Breeding Base of Basic Science of Stomatology (Hubei-MOST) and Key Laboratory of Oral Biomedicine Ministry of Education, School and Hospital of Stomatology, Wuhan University, Wuhan, Hubei 430079, China
- Bo Liang
- Institute of Dermatology and Department of Dermatology at No. 1 Hospital, Anhui Medical University, Hefei, Anhui 230032, China.,State Key Lab Incubation of Dermatology, Ministry of Science and Technology, Hefei, China.,Key Lab of Dermatology, Ministry of Education, Hefei, China.,Key Lab of Gene Resources Utilization for Severe Inherited Disorders, Anhui 230032, China.,Collaborative Innovation Center of Complex and Severe skin Disease, Anhui Medical University, Hefei, Anhui 230032, China
- Zhengwei Zhu
- Institute of Dermatology and Department of Dermatology at No. 1 Hospital, Anhui Medical University, Hefei, Anhui 230032, China.,State Key Lab Incubation of Dermatology, Ministry of Science and Technology, Hefei, China.,Key Lab of Dermatology, Ministry of Education, Hefei, China.,Key Lab of Gene Resources Utilization for Severe Inherited Disorders, Anhui 230032, China.,Collaborative Innovation Center of Complex and Severe skin Disease, Anhui Medical University, Hefei, Anhui 230032, China
- Xiazhou Fu
- Department of Genetics and Centre for Developmental Biology, College of Life Science, Wuhan University, Wuhan, Hubei 430072, China
- Yujun Sheng
- Institute of Dermatology and Department of Dermatology at No. 1 Hospital, Anhui Medical University, Hefei, Anhui 230032, China.,State Key Lab Incubation of Dermatology, Ministry of Science and Technology, Hefei, China.,Key Lab of Dermatology, Ministry of Education, Hefei, China.,Key Lab of Gene Resources Utilization for Severe Inherited Disorders, Anhui 230032, China.,Collaborative Innovation Center of Complex and Severe skin Disease, Anhui Medical University, Hefei, Anhui 230032, China
- Jiebing Hao
- The Second Charity Hospital of Henan Province, Jiaozuo, Henan 454000, China
- Zhongyin Liu
- Stomatological Hospital of Nanyang, Nanyang, Henan 473013, China
- Hansong Yan
- Stomatological Hospital of Xiangyang, Xiangyang, Hubei 441011, China
- Elisabeth Mangold
- Institute of Human Genetics, Life and Brain Center, University of Bonn, 53127 Bonn, Germany
- Ingo Ruczinski
- Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, Maryland 21205, USA
- Jianjun Liu
- Institute of Dermatology and Department of Dermatology at No. 1 Hospital, Anhui Medical University, Hefei, Anhui 230032, China.,State Key Lab Incubation of Dermatology, Ministry of Science and Technology, Hefei, China.,Key Lab of Dermatology, Ministry of Education, Hefei, China.,Key Lab of Gene Resources Utilization for Severe Inherited Disorders, Anhui 230032, China.,Collaborative Innovation Center of Complex and Severe skin Disease, Anhui Medical University, Hefei, Anhui 230032, China
- Mary L Marazita
- Department of Oral Biology and Center for Craniofacial and Dental Genetics, School of Dental Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania 15219, USA.,Department of Human Genetics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, Pennsylvania 15261, USA.,Clinical and Translational Science, Department of Psychiatry, School of Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania 15213, USA
- Kerstin U Ludwig
- Institute of Human Genetics, Life and Brain Center, University of Bonn, 53127 Bonn, Germany.,Department of Genomics, Life and Brain Center, University of Bonn, 53127 Bonn, Germany
- Terri H Beaty
- Department of Epidemiology, School of Public Health, Johns Hopkins University, Baltimore, Maryland 21205, USA
- Xuejun Zhang
- Institute of Dermatology and Department of Dermatology at No. 1 Hospital, Anhui Medical University, Hefei, Anhui 230032, China.,State Key Lab Incubation of Dermatology, Ministry of Science and Technology, Hefei, China.,Key Lab of Dermatology, Ministry of Education, Hefei, China.,Key Lab of Gene Resources Utilization for Severe Inherited Disorders, Anhui 230032, China.,Collaborative Innovation Center of Complex and Severe skin Disease, Anhui Medical University, Hefei, Anhui 230032, China.,Department of Dermatology at No. 2 Hospital, Anhui Medical University, Hefei, Anhui 230022, China.,Institute of Dermatology and Department of Dermatology, Huashan Hospital of Fudan University, Shanghai 200040, China
- Liangdan Sun
- Institute of Dermatology and Department of Dermatology at No. 1 Hospital, Anhui Medical University, Hefei, Anhui 230032, China.,State Key Lab Incubation of Dermatology, Ministry of Science and Technology, Hefei, China.,Key Lab of Dermatology, Ministry of Education, Hefei, China.,Key Lab of Gene Resources Utilization for Severe Inherited Disorders, Anhui 230032, China.,Collaborative Innovation Center of Complex and Severe skin Disease, Anhui Medical University, Hefei, Anhui 230032, China.,The Key Laboratory of Major Autoimmune Diseases, Anhui Province, Anhui 230032, China
- Zhuan Bian
- The State Key Laboratory Breeding Base of Basic Science of Stomatology (Hubei-MOST) and Key Laboratory of Oral Biomedicine Ministry of Education, School and Hospital of Stomatology, Wuhan University, Wuhan, Hubei 430079, China
294
Abstract
Characterizing genetic interactions is crucial to understanding cellular and organismal response to gene-level perturbations. Such knowledge can inform the selection of candidate disease therapy targets, yet experimentally determining whether genes interact is technically nontrivial and time-consuming. High-fidelity prediction of different classes of genetic interactions in multiple organisms would substantially alleviate this experimental burden. Under the hypothesis that functionally related genes tend to share common genetic interaction partners, we evaluate a computational approach to predict genetic interactions in Homo sapiens, Drosophila melanogaster, and Saccharomyces cerevisiae. By leveraging knowledge of functional relationships between genes, we cross-validate predictions on known genetic interactions and observe high predictive power of multiple classes of genetic interactions in all three organisms. Additionally, our method suggests high-confidence candidate interaction pairs that can be directly experimentally tested. A web application is provided for users to query genes for predicted novel genetic interaction partners. Finally, by subsampling the known yeast genetic interaction network, we found that novel genetic interactions are predictable even when knowledge of currently known interactions is minimal.
295
Yang J, Yang T, Wu D, Lin L, Yang F, Zhao J. The integration of weighted human gene association networks based on link prediction. BMC Syst Biol 2017; 11:12. [PMID: 28137253 PMCID: PMC5282786 DOI: 10.1186/s12918-017-0398-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 08/15/2016] [Accepted: 01/25/2017] [Indexed: 12/27/2022]
Abstract
Background: Physical and functional interplays between genes or proteins have important biological meaning for cellular functions. Some efforts have been made to construct weighted gene association meta-networks by integrating multiple biological resources, where the weight indicates the confidence of the interaction. However, these existing human gene association networks share only a limited number of overlapping interactions, suggesting that they are incomplete and noisy. Results: Here we proposed a workflow to construct a weighted human gene association network using information from six existing networks, including two weighted specific PPI networks and four gene association meta-networks. We applied a link prediction algorithm to predict possible missing links in the networks, used a cross-validation approach to refine each network, and finally integrated the refined networks to obtain the final integrated network. Conclusions: The common information among the refined networks increases notably, suggesting their higher reliability. Our final integrated network has many more links than most of the original networks, while its links still retain high functional relevance. Used as the background network in a case study of disease gene prediction, the final integrated network performs well, implying its reliability and application significance. Our workflow could be insightful for integrating and refining existing gene association data.
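The abstract above does not say which link prediction algorithm the workflow uses. As a generic illustration of the idea only, the sketch below scores unconnected gene pairs by the Jaccard index of their neighbour sets, a common neighbourhood-based link predictor. The toy network, the node names `A` to `E` and the function name are assumptions, and this stand-in should not be read as the authors' method.

```python
from itertools import combinations


def jaccard_link_scores(adj):
    """Score every currently unconnected node pair by the Jaccard index
    of the two nodes' neighbour sets (shared neighbours / all neighbours).

    adj -- dict mapping each node to the set of its neighbours
           (hypothetical toy input, undirected and unweighted)
    Pairs without any shared neighbour are left out of the result.
    """
    scores = {}
    for u, v in combinations(sorted(adj), 2):
        if v in adj[u]:
            continue  # already linked, nothing to predict
        shared = adj[u] & adj[v]
        if shared:
            scores[(u, v)] = len(shared) / len(adj[u] | adj[v])
    return scores


# Toy gene association network (assumed for illustration only).
toy_network = {
    "A": {"B", "C"},
    "B": {"A", "C", "D"},
    "C": {"A", "B", "D"},
    "D": {"B", "C", "E"},
    "E": {"D"},
}

link_scores = jaccard_link_scores(toy_network)
# The highest-scoring pair is the most plausible missing link.
best_pair = max(link_scores, key=link_scores.get)
print(best_pair, round(link_scores[best_pair], 3))
```

In this toy graph the pair `A` and `D` shares two of three neighbours and is therefore proposed first. A weighted network would need a weighted variant of the index.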
Affiliation(s)
- Jian Yang
- Department of Mathematics, Logistical Engineering University, Chongqing, China
- Tinghong Yang
- Department of Mathematics, Logistical Engineering University, Chongqing, China
- Duzhi Wu
- Department of Mathematics, Logistical Engineering University, Chongqing, China
- Limei Lin
- Department of Mathematics, Logistical Engineering University, Chongqing, China
- Fan Yang
- Department of Mathematics, Logistical Engineering University, Chongqing, China
- Jing Zhao
- Department of Mathematics, Logistical Engineering University, Chongqing, China.,Institute of Interdisciplinary Complex Research, Shanghai University of Traditional Chinese Medicine, Shanghai, China
296
Shim JE, Lee T, Lee I. From sequencing data to gene functions: co-functional network approaches. Anim Cells Syst (Seoul) 2017; 21:77-83. [PMID: 30460054 PMCID: PMC6138336 DOI: 10.1080/19768354.2017.1284156] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 01/14/2017] [Accepted: 01/15/2017] [Indexed: 01/04/2023] Open
Abstract
Advanced high-throughput sequencing technology has accumulated massive amounts of genomics and transcriptomics data in public databases. Because of their high technical accessibility, DNA and RNA sequencing have huge potential for the study of gene functions in most species, including animals and crops. A proven analytic platform for converting sequencing data into gene functional information is the co-functional network. Because all genes exert their functions through interactions with others, network analysis is a legitimate way to study gene functions. The workflow of a network-based functional study is composed of three steps: (i) inferring co-functional links, (ii) evaluating and integrating the links into genome-scale networks, and (iii) generating functional hypotheses from the networks. Co-functional links can be inferred from DNA sequencing data by using phylogenetic profiling, gene neighborhood, domain profiling, and associalogs, and from RNA sequencing data by using co-expression analysis. The inferred links are then evaluated and integrated into a genome-scale network with the aid of gold-standard co-functional links. Functional hypotheses can be generated from the network based on (i) network connectivity, (ii) network propagation, and (iii) subnetwork analysis. The functional analysis pipeline described here requires only sequencing data, which are readily available for most species through next-generation sequencing technology. Therefore, co-functional networks will greatly potentiate the use of sequencing data for the study of genetics in any cellular organism.
Affiliation(s)
- Jung Eun Shim
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul, Korea
- Tak Lee
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul, Korea
- Insuk Lee
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul, Korea
297
Muhammad SA, Raza W, Nguyen T, Bai B, Wu X, Chen J. Cellular Signaling Pathways in Insulin Resistance-Systems Biology Analyses of Microarray Dataset Reveals New Drug Target Gene Signatures of Type 2 Diabetes Mellitus. Front Physiol 2017; 8:13. [PMID: 28179884 PMCID: PMC5264126 DOI: 10.3389/fphys.2017.00013] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 11/23/2016] [Accepted: 01/09/2017] [Indexed: 01/09/2023] Open
Abstract
Purpose: Type 2 diabetes mellitus (T2DM) is a chronic metabolic disorder affecting a large share of the world's population. To broaden the understanding of the genetic causes of this disease, we performed an interactive, toxicogenomics-based systems biology study to find potential T2DM-related genes after cDNA differential analysis. Methods: From a list of 50 differentially expressed genes (p < 0.05), we found 9 T2DM-related genes using extensive data mapping. In our constructed gene network, the 9 T2DM-related differentially expressed seeder genes are found to interact with 31 functionally related signature genes. The genetic interaction network of both T2DM-associated seeder and signature genes generally relates well to the disease condition based on toxicogenomic and data curation. Results: These networks showed significant enrichment of insulin signaling, insulin secretion and other T2DM-related pathways including JAK-STAT, MAPK, TGF, Toll-like receptor, p53 and mTOR, adipocytokine, FOXO, PPAR, PI3K-AKT, and triglyceride metabolic pathways. We found some enriched pathways that are common to different conditions. We recognized 11 signaling pathways as a connecting link between gene signatures in insulin resistance and T2DM. Notably, in the drug-gene network, the interacting genes showed significant overlap with 13 FDA-approved and a few non-approved drugs. This study demonstrates the value of systems genetics for identifying 18 potential genes associated with T2DM that are probable drug targets. Conclusions: These integrative, network-based approaches for finding variants in genomic data are expected to accelerate the identification of new drug target molecules for different diseases and can speed up drug discovery.
Affiliation(s)
- Syed Aun Muhammad
- Institute of Molecular Biology and Biotechnology, Bahauddin Zakariya University Multan, Pakistan; Institute of Biopharmaceutical Informatics and Technologies, Wenzhou Medical University Wenzhou, China; Wenzhou Medical University, 1st Affiliate Hospital Wenzhou Wenzhou, China
- Waseem Raza
- Institute of Molecular Biology and Biotechnology, Bahauddin Zakariya University Multan, Pakistan
- Thanh Nguyen
- Institute of Biopharmaceutical Informatics and Technologies, Wenzhou Medical University Wenzhou, China; Wenzhou Medical University, 1st Affiliate Hospital Wenzhou Wenzhou, China; Department of Computer and Information Science, Purdue University Indianapolis, IN, USA
- Baogang Bai
- Institute of Biopharmaceutical Informatics and Technologies, Wenzhou Medical University Wenzhou, China
- Xiaogang Wu
- Institute for Systems Biology Seattle, WA, USA
- Jake Chen
- Institute of Biopharmaceutical Informatics and Technologies, Wenzhou Medical University Wenzhou, China; Wenzhou Medical University, 1st Affiliate Hospital Wenzhou Wenzhou, China; Department of Computer and Information Science, Purdue University Indianapolis, IN, USA; Indiana Center for Systems Biology and Personalized Medicine, Indiana University-Purdue University Indianapolis, IN, USA; Informatics Institute, School of Medicine, The University of Alabama Birmingham, AL, USA
298
Liang S, Tippens ND, Zhou Y, Mort M, Stenson PD, Cooper DN, Yu H. iRegNet3D: three-dimensional integrated regulatory network for the genomic analysis of coding and non-coding disease mutations. Genome Biol 2017; 18:10. [PMID: 28100260 PMCID: PMC5241969 DOI: 10.1186/s13059-016-1138-2] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 10/17/2016] [Accepted: 12/16/2016] [Indexed: 01/05/2023] Open
Abstract
The mechanistic details of most disease-causing mutations remain poorly explored within the context of regulatory networks. We present a high-resolution three-dimensional integrated regulatory network (iRegNet3D) in the form of a web tool, where we resolve the interfaces of all known transcription factor (TF)-TF, TF-DNA and chromatin-chromatin interactions for the analysis of both coding and non-coding disease-associated mutations to obtain mechanistic insights into their functional impact. Using iRegNet3D, we find that disease-associated mutations may perturb the regulatory network through diverse mechanisms including chromatin looping. iRegNet3D promises to be an indispensable tool in large-scale sequencing and disease association studies.
Affiliation(s)
- Siqi Liang
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, NY, 14853, USA.,Weill Institute for Cell and Molecular Biology, Ithaca, NY, 14853, USA
- Nathaniel D Tippens
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, NY, 14853, USA.,Weill Institute for Cell and Molecular Biology, Ithaca, NY, 14853, USA
- Yaoda Zhou
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, NY, 14853, USA.,Weill Institute for Cell and Molecular Biology, Ithaca, NY, 14853, USA
- Matthew Mort
- Institute of Medical Genetics, Cardiff University, Heath Park, Cardiff, CF14 4XN, UK
- Peter D Stenson
- Institute of Medical Genetics, Cardiff University, Heath Park, Cardiff, CF14 4XN, UK
- David N Cooper
- Institute of Medical Genetics, Cardiff University, Heath Park, Cardiff, CF14 4XN, UK
- Haiyuan Yu
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, NY, 14853, USA.,Weill Institute for Cell and Molecular Biology, Ithaca, NY, 14853, USA
299
Greene CS, Himmelstein DS. Genetic Association-Guided Analysis of Gene Networks for the Study of Complex Traits. Circ Cardiovasc Genet 2017; 9:179-84. [PMID: 27094199 DOI: 10.1161/circgenetics.115.001181] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 09/11/2015] [Accepted: 03/08/2016] [Indexed: 12/29/2022]
Affiliation(s)
- Casey S Greene
- From the Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia (C.S.G.); and Biological and Medical Informatics, University of California, San Francisco (D.S.H.).
| | - Daniel S Himmelstein
- From the Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia (C.S.G.); and Biological and Medical Informatics, University of California, San Francisco (D.S.H.).
| |
|
300
|
Kim E, Hwang S, Lee I. SoyNet: a database of co-functional networks for soybean Glycine max. Nucleic Acids Res 2017; 45:D1082-D1089. [PMID: 27492285 PMCID: PMC5210602 DOI: 10.1093/nar/gkw704] [Citation(s) in RCA: 44] [Impact Index Per Article: 5.5] [Received: 06/02/2016] [Revised: 07/27/2016] [Accepted: 07/27/2016] [Indexed: 01/09/2023] Open
Abstract
Soybean (Glycine max) is a legume crop with substantial economic value, providing a source of oil and protein for humans and livestock. More than 50% of edible oils consumed globally are derived from this crop. Soybean plants are also important for soil fertility, as they fix atmospheric nitrogen by symbiosis with microorganisms. The latest soybean genome annotation (version 2.0) lists 56 044 coding genes, yet their functional contributions to crop traits remain mostly unknown. Co-functional networks have proven useful for identifying genes that are involved in a particular pathway or phenotype with various network algorithms. Here, we present SoyNet (available at www.inetbio.org/soynet), a database of co-functional networks for G. max and a companion web server for network-based functional predictions. SoyNet maps 1 940 284 co-functional links between 40 812 soybean genes (72.8% of the coding genome), which were inferred from 21 distinct types of genomics data including 734 microarrays and 290 RNA-seq samples from soybean. SoyNet provides a new route to functional investigation of the soybean genome, elucidating genes and pathways of agricultural importance.
Affiliation(s)
- Eiru Kim
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul, Korea
| | - Sohyun Hwang
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul, Korea
| | - Insuk Lee
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul, Korea
| |
|