401
Hope for GWAS: relevant risk genes uncovered from GWAS statistical noise. Int J Mol Sci 2014; 15:17601-21. [PMID: 25268625 PMCID: PMC4227180 DOI: 10.3390/ijms151017601] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 04/30/2014] [Revised: 09/01/2014] [Accepted: 09/22/2014] [Indexed: 02/07/2023] Open
Abstract
Hundreds of genetic variants have been associated with common diseases through genome-wide association studies (GWAS), yet current approaches are limited in their ability to detect true small-effect risk variants against a background of false positive findings. Here we addressed the missing heritability problem, aiming to test whether there are indeed risk variants within GWAS statistical noise and to develop a systematic strategy to retrieve these hidden variants. Employing an integrative approach, which combines protein-protein interactions with association data from GWAS for 6 common diseases, we found that genes associated with any of these diseases at less stringent significance levels (p < 0.1) are functionally connected beyond noise expectation. This functional coherence was used to identify disease-relevant subnetworks, which were shown to be enriched in known genes, outperforming the selection of top GWAS genes. As a proof of principle, we applied this approach to breast cancer, supporting well-known breast cancer genes while pinpointing novel susceptibility genes for experimental validation. This study reinforces the idea that GWAS are under-analyzed and that missing heritability is hidden rather than absent. It extends the use of protein networks to reveal this missing heritability, thus leveraging the large investment in GWAS, which has so far produced little tangible gain.
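The claim that weakly associated genes are "functionally connected beyond noise expectation" can be illustrated with a simple permutation test. The following is a minimal sketch, not the authors' procedure: the abstract does not specify their statistic, so counting edges inside the candidate set and comparing against random gene sets of the same size is an assumption, as are the function names and the toy network of hypothetical genes `g0`..`g19`.

```python
import random

def count_internal_edges(genes, edges):
    """Count edges whose two endpoints both lie in `genes`."""
    gene_set = set(genes)
    return sum(1 for a, b in edges if a in gene_set and b in gene_set)

def connectivity_p_value(candidates, all_genes, edges, n_perm=1000, seed=0):
    """Empirical p-value: how often a random gene set of the same size
    has at least as many internal edges as the candidate set."""
    rng = random.Random(seed)
    observed = count_internal_edges(candidates, edges)
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(all_genes, len(candidates))
        if count_internal_edges(sample, edges) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from an exact zero.
    return observed, (hits + 1) / (n_perm + 1)

# Toy network (hypothetical): a 4-gene clique plus a sparse chain.
all_genes = [f"g{i}" for i in range(20)]
clique = ["g0", "g1", "g2", "g3"]
edges = [(a, b) for i, a in enumerate(clique) for b in clique[i + 1:]]
edges += [(f"g{i}", f"g{i + 1}") for i in range(4, 19)]

observed, p = connectivity_p_value(clique, all_genes, edges)
print(observed, p)
```

In a real analysis the candidate set would be the genes passing the relaxed threshold (p < 0.1 in the abstract) and the edges would come from a protein-protein interaction database; a degree-preserving null model may be preferable to uniform sampling.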
402
Blair DR, Wang K, Nestorov S, Evans JA, Rzhetsky A. Quantifying the impact and extent of undocumented biomedical synonymy. PLoS Comput Biol 2014; 10:e1003799. [PMID: 25255227 PMCID: PMC4177665 DOI: 10.1371/journal.pcbi.1003799] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 12/18/2013] [Accepted: 06/26/2014] [Indexed: 12/14/2022] Open
Abstract
Synonymous relationships among biomedical terms are extensively annotated within specialized terminologies, implying that synonymy is important for practical computational applications within this field. It remains unclear, however, whether text mining actually benefits from documented synonymy and whether existing biomedical thesauri provide adequate coverage of these linguistic relationships. In this study, we examine the impact and extent of undocumented synonymy within a very large compendium of biomedical thesauri. First, we demonstrate that missing synonymy has a significant negative impact on named entity normalization, an important problem within the field of biomedical text mining. To estimate the amount of synonymy currently missing from thesauri, we develop a probabilistic model for the construction of synonym terminologies that is capable of handling a wide range of potential biases, and we evaluate its performance using the broader domain of near-synonymy among general English words. Our model predicts that over 90% of these relationships are currently undocumented, a result that we support experimentally through “crowd-sourcing.” Finally, we apply our model to biomedical terminologies and predict that they are missing the vast majority (>90%) of the synonymous relationships they intend to document. Overall, our results expose the dramatic incompleteness of current biomedical thesauri and suggest the need for “next-generation,” high-coverage lexical terminologies.
Automated systems that extract and integrate information from the research literature have become common in biomedicine. As the same meaning can be expressed in many distinct but synonymous ways, access to comprehensive thesauri may enable such systems to maximize their performance. Here, we establish the importance of synonymy for a specific text-mining task (named-entity normalization), and we suggest that current thesauri may be woefully inadequate in their documentation of this linguistic phenomenon. To test this claim, we develop a model for estimating the amount of missing synonymy. We apply our model to both biomedical terminologies and general-English thesauri, predicting massive amounts of missing synonymy for both lexicons. Furthermore, we verify some of our predictions for the latter domain through “crowd-sourcing.” Overall, our work highlights the dramatic incompleteness of current biomedical thesauri, and to mitigate this issue, we propose the creation of “living” terminologies, which would automatically harvest undocumented synonymy and help smart machines enrich biomedicine.
Affiliation(s)
- David R. Blair
- Institute for Genomics and Systems Biology, University of Chicago, Chicago, Illinois, United States of America
- Committee on Genetics, Genomics, and Systems Biology, University of Chicago, Chicago, Illinois, United States of America
| | - Kanix Wang
- Institute for Genomics and Systems Biology, University of Chicago, Chicago, Illinois, United States of America
- Committee on Genetics, Genomics, and Systems Biology, University of Chicago, Chicago, Illinois, United States of America
| | - Svetlozar Nestorov
- Computation Institute, University of Chicago, Chicago, Illinois, United States of America
| | - James A. Evans
- Computation Institute, University of Chicago, Chicago, Illinois, United States of America
- Department of Sociology, University of Chicago, Chicago, Illinois, United States of America
| | - Andrey Rzhetsky
- Institute for Genomics and Systems Biology, University of Chicago, Chicago, Illinois, United States of America
- Committee on Genetics, Genomics, and Systems Biology, University of Chicago, Chicago, Illinois, United States of America
- Computation Institute, University of Chicago, Chicago, Illinois, United States of America
- Departments of Medicine and Human Genetics, University of Chicago, Chicago, Illinois, United States of America
403
Hu Y, Petit SA, Ficarro SB, Toomire KJ, Xie A, Lim E, Cao SA, Park E, Eck MJ, Scully R, Brown M, Marto JA, Livingston DM. PARP1-driven poly-ADP-ribosylation regulates BRCA1 function in homologous recombination-mediated DNA repair. Cancer Discov 2014; 4:1430-47. [PMID: 25252691 DOI: 10.1158/2159-8290.cd-13-0891] [Citation(s) in RCA: 118] [Impact Index Per Article: 10.7] [Indexed: 01/04/2023]
Abstract
BRCA1 promotes homologous recombination-mediated DNA repair (HRR). However, HRR must be tightly regulated to prevent illegitimate recombination. We previously found that BRCA1 HRR function is regulated by the RAP80 complex, but the mechanism was unclear. We have now observed that PARP1 interacts with and poly-ADP-ribosylates (aka PARsylates) BRCA1. PARsylation is directed at the BRCA1 DNA binding domain and downmodulates its function. Moreover, RAP80 contains a poly-ADP-ribose-interacting domain that binds PARsylated BRCA1 and helps to maintain the stability of PARP1-BRCA1-RAP80 complexes. BRCA1 PARsylation is a key step in BRCA1 HRR control. When BRCA1 PARsylation is defective, it gives rise to excessive HRR and manifestations of genome instability. BRCA1 PARsylation and/or RAP80 expression is defective in a subset of sporadic breast cancer cell lines and patient-derived tumor xenograft models. These observations are consistent with the possibility that such defects, when chronic, contribute to tumor development in BRCA1+/+ individuals. SIGNIFICANCE: We propose a model that describes how BRCA1 functions to both support and restrict HRR. BRCA1 PARsylation is a key event in this process, failure of which triggers hyper-recombination and chromosome instability. Thus, hyperfunctioning BRCA1 can elicit genomic abnormalities similar to those observed in the absence of certain BRCA1 functions.
Affiliation(s)
- Yiduo Hu
- Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, Massachusetts. Department of Genetics, Harvard Medical School, Boston, Massachusetts. Department of Medicine, Brigham and Women's Hospital, Boston, Massachusetts
| | - Sarah A Petit
- Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, Massachusetts. Department of Genetics, Harvard Medical School, Boston, Massachusetts
| | - Scott B Ficarro
- Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, Massachusetts. Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, Massachusetts. Blais Proteomics Center, Dana-Farber Cancer Institute, Boston, Massachusetts
| | - Kimberly J Toomire
- Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, Massachusetts. Department of Genetics, Harvard Medical School, Boston, Massachusetts
| | - Anyong Xie
- Department of Medicine, Beth Israel Deaconess Medical Center, Boston, Massachusetts
| | - Elgene Lim
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts
| | - Shiliang A Cao
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts
| | - Eunyoung Park
- Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, Massachusetts. Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, Massachusetts
| | - Michael J Eck
- Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, Massachusetts. Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, Massachusetts
| | - Ralph Scully
- Department of Medicine, Beth Israel Deaconess Medical Center, Boston, Massachusetts
| | - Myles Brown
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts
| | - Jarrod A Marto
- Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, Massachusetts. Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, Massachusetts. Blais Proteomics Center, Dana-Farber Cancer Institute, Boston, Massachusetts
| | - David M Livingston
- Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, Massachusetts. Department of Genetics, Harvard Medical School, Boston, Massachusetts. Department of Medicine, Brigham and Women's Hospital, Boston, Massachusetts.
404
Abstract
Studying and modeling the complex dynamics of biological systems in order to describe human diseases has attracted great interest in recent years. Major biological processes are mediated through protein interactions, so understanding human diseases requires understanding the chaotic network that underlies these processes. Applying protein interaction networks to disease datasets allows the identification of genes and proteins associated with diseases, the study of network properties, the identification of subnetworks, and network-based disease gene classification. Although various protein interaction network analysis strategies have been employed, major challenges remain. A global understanding of protein interaction networks, obtained by integrating high-throughput functional genomics data from different levels, will allow researchers to examine disease pathways and identify strategies to control them. As a result, it seems likely that more personalized, more accurate, and more rapid disease gene diagnostic techniques, as well as novel personalized strategies, will be devised in the future. This mini-review summarizes the current use of protein interaction networks in medical research and the challenges still to be overcome.
Affiliation(s)
- Tuba Sevimoglu
- Department of Bioengineering, Marmara University, Goztepe, 34722 Istanbul, Turkey
| | - Kazim Yalcin Arga
- Department of Bioengineering, Marmara University, Goztepe, 34722 Istanbul, Turkey
405
Cornish AJ, Markowetz F. SANTA: quantifying the functional content of molecular networks. PLoS Comput Biol 2014; 10:e1003808. [PMID: 25210953 PMCID: PMC4161294 DOI: 10.1371/journal.pcbi.1003808] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.0] [Received: 01/24/2014] [Accepted: 07/15/2014] [Indexed: 12/31/2022] Open
Abstract
Linking networks of molecular interactions to cellular functions and phenotypes is a key goal in systems biology. Here, we adapt concepts of spatial statistics to assess the functional content of molecular networks. Based on the guilt-by-association principle, our approach (called SANTA) quantifies the strength of association between a gene set and a network, and functionally annotates molecular networks like other enrichment methods annotate lists of genes. As a general association measure, SANTA can (i) functionally annotate experimentally derived networks using a collection of curated gene sets and (ii) annotate experimentally derived gene sets using a collection of curated networks, as well as (iii) prioritize genes for follow-up analyses. We exemplify the efficacy of SANTA in several case studies using the S. cerevisiae genetic interaction network and genome-wide RNAi screens in cancer cell lines. Our theory, simulations, and applications show that SANTA provides a principled statistical way to quantify the association between molecular networks and cellular functions and phenotypes. SANTA is available from http://bioconductor.org/packages/release/bioc/html/SANTA.html.
Affiliation(s)
- Alex J. Cornish
- Department of Life Sciences, Imperial College London, London, United Kingdom
| | - Florian Markowetz
- Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, Cambridgeshire, United Kingdom
406
Honti F, Meader S, Webber C. Unbiased functional clustering of gene variants with a phenotypic-linkage network. PLoS Comput Biol 2014; 10:e1003815. [PMID: 25166029 PMCID: PMC4148192 DOI: 10.1371/journal.pcbi.1003815] [Citation(s) in RCA: 48] [Impact Index Per Article: 4.4] [Received: 01/07/2014] [Accepted: 07/14/2014] [Indexed: 01/04/2023] Open
Abstract
Groupwise functional analysis of gene variants is becoming standard in next-generation sequencing studies. As the function of many genes is unknown and their classification into pathways is scant, functional associations between genes are often inferred from large-scale omics data. Such data types, including protein–protein interactions and gene co-expression networks, are used to examine the interrelations of the implicated genes. Statistical significance is assessed by comparing the interconnectedness of the mutated genes with that of random gene sets. However, interconnectedness can be affected by confounding bias, potentially resulting in false positive findings. We show that genes implicated through de novo sequence variants are biased in their coding-sequence length and that longer genes tend to cluster together, which leads to exaggerated p-values in functional studies; we present here an integrative method that addresses this bias. To discern molecular pathways relevant to complex disease, we have inferred functional associations between human genes from diverse data types and assessed them with a novel phenotype-based method. Examining the functional association between de novo gene variants, we control for the heretofore unexplored confounding bias in coding-sequence length. We test different data types and networks and find that the disease-associated genes cluster more significantly in an integrated phenotypic-linkage network than in other gene networks. We present a tool of superior power to identify functional associations among genes mutated in the same disease even after accounting for significant sequencing study bias and demonstrate the suitability of this method to functionally cluster variant genes underlying polygenic disorders.
Many gene variants have been associated with disease, yet most of the heritability, along with the molecular basis, of common diseases remains unexplained. However, it is widely thought that the products of genes whose mutations are implicated in the same disease function together in the same biological pathways and that it is the disruption of these pathways that underlies the disease. Such pathways are not well defined, and their identification could help elucidate disease mechanisms. Consequently, groupwise functional analyses of gene variants to identify common disease-relevant pathways are becoming standard in next-generation sequencing studies, but we find that these analyses are confounded by coding-sequence length bias. We control for this bias and describe a phenotype-based approach that outperforms other methods in discerning functional associations among the disease-associated genes. We also demonstrate the suitability of this method to functionally dissect the gene variants underlying a complex disorder, with the identified functional clusters offering insight into disease mechanisms.
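One common way to control for a confounder such as coding-sequence length when building random gene sets is to sample the null sets from matching length bins. The sketch below illustrates that general idea only; the abstract does not say how the authors implement their correction, and the function name, the quantile binning, the exclusion of case genes from the null pool, and the toy lengths are all assumptions.

```python
import random
from collections import defaultdict

def length_matched_sample(case_genes, cds_length, n_bins=5, rng=None):
    """Draw one random gene per case gene from the same coding-length bin."""
    rng = rng or random.Random(0)
    ranked = sorted(cds_length, key=cds_length.get)
    # Assign each gene to a quantile bin by rank of coding-sequence length.
    bin_of = {g: i * n_bins // len(ranked) for i, g in enumerate(ranked)}
    case_set = set(case_genes)
    # Null pool per bin, excluding the case genes themselves (a design choice).
    pool = defaultdict(list)
    for g in ranked:
        if g not in case_set:
            pool[bin_of[g]].append(g)
    # Number of case genes that fall in each bin.
    needed = defaultdict(int)
    for g in case_genes:
        needed[bin_of[g]] += 1
    sample = []
    for b, k in needed.items():
        # rng.sample raises ValueError if a bin holds fewer than k genes.
        sample.extend(rng.sample(pool[b], k))
    return sample, bin_of

# Toy data (hypothetical): 50 genes with increasing coding lengths;
# the "case" genes are the three longest, mimicking a length bias.
cds_length = {f"g{i}": 100 * (i + 1) for i in range(50)}
case_genes = ["g47", "g48", "g49"]
sample, bin_of = length_matched_sample(case_genes, cds_length)
print(sample)
```

Repeating this draw many times and recomputing the interconnectedness statistic on each sample yields a null distribution in which long genes are as over-represented as they are in the case set.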
Affiliation(s)
- Frantisek Honti
- MRC Functional Genomics Unit, Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford, United Kingdom
| | - Stephen Meader
- MRC Functional Genomics Unit, Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford, United Kingdom
| | - Caleb Webber
- MRC Functional Genomics Unit, Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford, United Kingdom
407
Ma X, Gao L, Tan K. Modeling disease progression using dynamics of pathway connectivity. Bioinformatics 2014; 30:2343-2350. [PMID: 24771518 PMCID: PMC4133583 DOI: 10.1093/bioinformatics/btu298] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.5] [Received: 12/02/2013] [Revised: 03/31/2014] [Accepted: 04/23/2014] [Indexed: 02/02/2023] Open
Abstract
MOTIVATION: Disease progression is driven by dynamic changes in both the activity and connectivity of molecular pathways. Understanding these dynamic events is critical for disease prognosis and effective treatment. Compared with activity dynamics, connectivity dynamics is poorly explored. RESULTS: We describe the M-module algorithm to identify gene modules with common members but varied connectivity across multiple gene co-expression networks (aka M-modules). We introduce a novel metric to capture the connectivity dynamics of an entire M-module. We find that M-modules with dynamic connectivity have distinct topological and biochemical properties compared with static M-modules and hub genes. We demonstrate that incorporation of module connectivity dynamics significantly improves disease stage prediction. We identify different sets of M-modules that are important for specific disease stage transitions and offer new insights into the molecular events underlying disease progression. Besides modeling disease progression, the algorithm and metric introduced here are broadly applicable to modeling dynamics of molecular pathways. AVAILABILITY AND IMPLEMENTATION: M-module is implemented in R. The source code is freely available at http://www.healthcare.uiowa.edu/labs/tan/M-module.zip.
Affiliation(s)
- Xiaoke Ma
- Department of Internal Medicine and Department of Biomedical Engineering, University of Iowa, Iowa City, IA 52242, USA
| | - Long Gao
- Department of Internal Medicine and Department of Biomedical Engineering, University of Iowa, Iowa City, IA 52242, USA
| | - Kai Tan
- Department of Internal Medicine and Department of Biomedical Engineering, University of Iowa, Iowa City, IA 52242, USA
408
Igarashi K, Ochiai K, Itoh-Nakadai A, Muto A. Orchestration of plasma cell differentiation by Bach2 and its gene regulatory network. Immunol Rev 2014; 261:116-25. [DOI: 10.1111/imr.12201] [Citation(s) in RCA: 54] [Impact Index Per Article: 4.9] [Indexed: 12/26/2022]
Affiliation(s)
- Kazuhiko Igarashi
- Department of Biochemistry; Tohoku University Graduate School of Medicine; Sendai Japan
- CREST; Japan Science and Technology Agency; Sendai Japan
| | - Kyoko Ochiai
- Department of Biochemistry; Tohoku University Graduate School of Medicine; Sendai Japan
- CREST; Japan Science and Technology Agency; Sendai Japan
| | - Ari Itoh-Nakadai
- Department of Biochemistry; Tohoku University Graduate School of Medicine; Sendai Japan
- CREST; Japan Science and Technology Agency; Sendai Japan
| | - Akihiko Muto
- Department of Biochemistry; Tohoku University Graduate School of Medicine; Sendai Japan
- CREST; Japan Science and Technology Agency; Sendai Japan
409
Zhu F, Shi L, Li H, Eksi R, Engel JD, Guan Y. Modeling dynamic functional relationship networks and application to ex vivo human erythroid differentiation. Bioinformatics 2014; 30:3325-33. [PMID: 25115705 DOI: 10.1093/bioinformatics/btu542] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 01/05/2023]
Abstract
MOTIVATION: Functional relationship networks, which summarize the probability of co-functionality between any two genes in the genome, could complement the reductionist focus of modern biology for understanding diverse biological processes in an organism. One major limitation of the current networks is that they are static, while one might expect functional relationships to consistently reprogram during the differentiation of a cell lineage. To address this potential limitation, we developed a novel algorithm that leverages both differentiation stage-specific expression data and large-scale heterogeneous functional genomic data to model such dynamic changes. We then applied this algorithm to the time-course RNA-Seq data we collected for ex vivo human erythroid cell differentiation. RESULTS: Through computational cross-validation and literature validation, we show that the resulting networks correctly predict the (de)-activated functional connections between genes during erythropoiesis. We identified known critical genes, such as HBD and GATA1, and functional connections during erythropoiesis using these dynamic networks, while the traditional static network was not able to provide such information. Furthermore, by comparing the static and the dynamic networks, we identified novel genes (such as OSBP2 and PDZK1IP1) that are potential drivers of erythroid cell differentiation. This novel method of modeling dynamic networks is applicable to other differentiation processes where time-course genome-scale expression data are available, and should assist in generating greater understanding of the functional dynamics at play across the genome during development. AVAILABILITY AND IMPLEMENTATION: The network described in this article is available at http://guanlab.ccmb.med.umich.edu/stageSpecificNetwork.
Affiliation(s)
- Fan Zhu
- Department of Computational Medicine and Bioinformatics, Department of Cell and Developmental Biology, Department of Internal Medicine and Department of Computer Science and Engineering, University of Michigan, MI48109, USA
| | - Lihong Shi
- Department of Computational Medicine and Bioinformatics, Department of Cell and Developmental Biology, Department of Internal Medicine and Department of Computer Science and Engineering, University of Michigan, MI48109, USA
| | - Hongdong Li
- Department of Computational Medicine and Bioinformatics, Department of Cell and Developmental Biology, Department of Internal Medicine and Department of Computer Science and Engineering, University of Michigan, MI48109, USA
| | - Ridvan Eksi
- Department of Computational Medicine and Bioinformatics, Department of Cell and Developmental Biology, Department of Internal Medicine and Department of Computer Science and Engineering, University of Michigan, MI48109, USA
| | - James Douglas Engel
- Department of Computational Medicine and Bioinformatics, Department of Cell and Developmental Biology, Department of Internal Medicine and Department of Computer Science and Engineering, University of Michigan, MI48109, USA
| | - Yuanfang Guan
- Department of Computational Medicine and Bioinformatics, Department of Cell and Developmental Biology, Department of Internal Medicine and Department of Computer Science and Engineering, University of Michigan, MI48109, USA
410
Schramm SJ, Jayaswal V, Goel A, Li SS, Yang YH, Mann GJ, Wilkins MR. Molecular interaction networks for the analysis of human disease: utility, limitations, and considerations. Proteomics 2013; 13:3393-405. [PMID: 24166987 DOI: 10.1002/pmic.201200570] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Received: 12/18/2012] [Revised: 09/11/2013] [Accepted: 10/07/2013] [Indexed: 01/01/2023]
Abstract
High-throughput '-omics' data can be combined with large-scale molecular interaction networks, for example, protein-protein interaction networks, to provide a unique framework for the investigation of human molecular biology. Interest in these integrative '-omics' methods is growing rapidly because of their potential to understand complexity and association with disease; such approaches have a focus on associations between phenotype and "network-type." The potential of this research is enticing, yet there remain a series of important considerations. Here, we discuss interaction data selection, data quality, the relative merits of using data from large high-throughput studies versus a meta-database of smaller literature-curated studies, and possible issues of sociological or inspection bias in interaction data. Other work underway, especially international consortia to establish data formats, quality standards and address data redundancy, and the improvements these efforts are making to the field, is also evaluated. We present options for researchers intending to use large-scale molecular interaction networks as a functional context for protein or gene expression data, including microRNAs, especially in the context of human disease.
Affiliation(s)
- Sarah-Jane Schramm
- Sydney Medical School, Westmead Millennium Institute for Medical Research, The University of Sydney, Sydney, NSW, Australia; Melanoma Institute Australia, Sydney, NSW, Australia
411
Bunyavanich S, Schadt EE, Himes BE, Lasky-Su J, Qiu W, Lazarus R, Ziniti JP, Cohain A, Linderman M, Torgerson DG, Eng CS, Pino-Yanes M, Padhukasahasram B, Yang JJ, Mathias RA, Beaty TH, Li X, Graves P, Romieu I, Navarro BDR, Salam MT, Vora H, Nicolae DL, Ober C, Martinez FD, Bleecker ER, Meyers DA, Gauderman WJ, Gilliland F, Burchard EG, Barnes KC, Williams LK, London SJ, Zhang B, Raby BA, Weiss ST. Integrated genome-wide association, coexpression network, and expression single nucleotide polymorphism analysis identifies novel pathway in allergic rhinitis. BMC Med Genomics 2014; 7:48. [PMID: 25085501 PMCID: PMC4127082 DOI: 10.1186/1755-8794-7-48] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.8] [Received: 01/03/2014] [Accepted: 06/04/2014] [Indexed: 12/20/2022] Open
Abstract
BACKGROUND: Allergic rhinitis is a common disease whose genetic basis is incompletely explained. We report an integrated genomic analysis of allergic rhinitis. METHODS: We performed genome-wide association studies (GWAS) of allergic rhinitis in 5633 ethnically diverse North American subjects. Next, we profiled gene expression in disease-relevant tissue (peripheral blood CD4+ lymphocytes) collected from subjects who had been genotyped. We then integrated the GWAS and gene expression data using expression single nucleotide polymorphism (eSNP), coexpression network, and pathway approaches to identify the biologic relevance of our GWAS. RESULTS: GWAS revealed ethnicity-specific findings, with 4 genome-wide significant loci among Latinos and 1 genome-wide significant locus in the GWAS meta-analysis across ethnic groups. To identify biologic context for these results, we constructed a coexpression network to define modules of genes with similar patterns of CD4+ gene expression (coexpression modules) that could serve as constructs of broader gene expression. Six of the 22 GWAS loci with P-value ≤ 1 × 10⁻⁶ tagged one particular coexpression module (4.0-fold enrichment, P-value 0.0029), and this module also had the greatest enrichment (3.4-fold enrichment, P-value 2.6 × 10⁻²⁴) for allergic rhinitis-associated eSNPs (genetic variants associated with both gene expression and allergic rhinitis). The integrated GWAS, coexpression network, and eSNP results therefore supported this coexpression module as an allergic rhinitis module. Pathway analysis revealed that the module was enriched for mitochondrial pathways (8.6-fold enrichment, P-value 4.5 × 10⁻⁷²). CONCLUSIONS: Our results highlight mitochondrial pathways as a target for further investigation of allergic rhinitis mechanism and treatment. Our integrated approach can be applied to provide biologic context for GWAS of other diseases.
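The fold-enrichment figures quoted in this abstract follow a standard pattern: the fraction of hits that fall in a module divided by the fraction expected by chance, with a one-sided tail probability. The sketch below shows one conventional way to compute both, using a hypergeometric test. It is an assumption that this matches the authors' test, and only the counts 6 and 22 come from the abstract; the module size `K = 75` and background size `N = 1100` are hypothetical values chosen so the ratio works out to 4.0, so the printed p-value is not expected to reproduce the reported 0.0029.

```python
from math import comb

def fold_enrichment(k, n, K, N):
    """Observed fraction of hits in the module over the background fraction."""
    return (k / n) / (K / N)

def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k), where X counts module members among n items drawn
    without replacement from N items, K of which belong to the module."""
    total = comb(N, n)
    favorable = sum(comb(K, i) * comb(N - K, n - i)
                    for i in range(k, min(n, K) + 1))
    return favorable / total

k, n = 6, 22        # loci tagging the module / loci tested (from the abstract)
K, N = 75, 1100     # hypothetical module size and background size

fold = fold_enrichment(k, n, K, N)
p = hypergeom_upper_tail(k, n, K, N)
print(f"fold enrichment = {fold:.1f}, one-sided p = {p:.4f}")
```

Exact integer arithmetic via `math.comb` avoids floating-point underflow for very small tail probabilities, at the cost of speed on very large backgrounds.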
Affiliation(s)
- Supinda Bunyavanich
- Department of Genetics and Genomic Sciences and Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, 10029 New York, NY, USA
- Division of Pediatric Allergy and Immunology, Department of Pediatrics, and Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Eric E Schadt
- Department of Genetics and Genomic Sciences and Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, 10029 New York, NY, USA
| | - Blanca E Himes
- Channing Division of Network Medicine, Department of Medicine, Brigham & Women’s Hospital and Harvard Medical School, Boston, MA, USA
| | - Jessica Lasky-Su
- Channing Division of Network Medicine, Department of Medicine, Brigham & Women’s Hospital and Harvard Medical School, Boston, MA, USA
| | - Weiliang Qiu
- Channing Division of Network Medicine, Department of Medicine, Brigham & Women’s Hospital and Harvard Medical School, Boston, MA, USA
| | - Ross Lazarus
- Channing Division of Network Medicine, Department of Medicine, Brigham & Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Medical Bioinformatics, Baker IDI, Melbourne, Australia
| | - John P Ziniti
- Channing Division of Network Medicine, Department of Medicine, Brigham & Women’s Hospital and Harvard Medical School, Boston, MA, USA
| | - Ariella Cohain
- Department of Genetics and Genomic Sciences and Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, 10029 New York, NY, USA
| | - Michael Linderman
- Department of Genetics and Genomic Sciences and Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, 10029 New York, NY, USA
| | - Dara G Torgerson
- Department of Medicine and Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA
| | - Celeste S Eng
- Department of Medicine and Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA
| | - Maria Pino-Yanes
- Department of Medicine and Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA
- CIBER de Enfermedades Respiratorias, Instituto de Salud Carlos III, Madrid, Spain
| | - Badri Padhukasahasram
- Center for Health Policy and Health Services Research, Henry Ford Health System, Detroit, MI, USA
| | - James J Yang
- Department of Public Health Sciences, Henry Ford Health System, Detroit, MI, USA
| | - Rasika A Mathias
- Departments of Medicine and Epidemiology, Johns Hopkins University, Baltimore, MD, USA
| | - Terri H Beaty
- Departments of Medicine and Epidemiology, Johns Hopkins University, Baltimore, MD, USA
| | - Xingnan Li
- Center for Genomics, Wake Forest University School of Medicine, Winston Salem, NC, USA
| | - Penelope Graves
- Arizona Respiratory Center and BIO5 Institute, University of Arizona, Tucson, AZ, USA
| | - M Towhid Salam
- Department of Preventive Medicine, University of Southern California, Los Angeles, CA, USA
| | - Hita Vora
- Department of Preventive Medicine, University of Southern California, Los Angeles, CA, USA
| | - Dan L Nicolae
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
| | - Carole Ober
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
| | - Fernando D Martinez
- Arizona Respiratory Center and BIO5 Institute, University of Arizona, Tucson, AZ, USA
| | - Eugene R Bleecker
- Center for Genomics, Wake Forest University School of Medicine, Winston Salem, NC, USA
| | - Deborah A Meyers
- Center for Genomics, Wake Forest University School of Medicine, Winston Salem, NC, USA
| | - W James Gauderman
- Department of Preventive Medicine, University of Southern California, Los Angeles, CA, USA
| | - Frank Gilliland
- Department of Preventive Medicine, University of Southern California, Los Angeles, CA, USA
| | - Esteban G Burchard
- Department of Medicine and Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA
| | - Kathleen C Barnes
- Departments of Medicine and Epidemiology, Johns Hopkins University, Baltimore, MD, USA
| | - L Keoki Williams
- Center for Health Policy and Health Services Research, Henry Ford Health System, Detroit, MI, USA
- Department of Internal Medicine, Henry Ford Health System, Detroit, MI, USA
| | - Stephanie J London
- Division of Intramural Research, Department of Health and Human Services, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, NC, USA
| | - Bin Zhang
- Department of Genetics and Genomic Sciences and Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, 10029 New York, NY, USA
| | - Benjamin A Raby
- Channing Division of Network Medicine, Department of Medicine, Brigham & Women’s Hospital and Harvard Medical School, Boston, MA, USA
| | - Scott T Weiss
- Channing Division of Network Medicine, Department of Medicine, Brigham & Women’s Hospital and Harvard Medical School, Boston, MA, USA
| |
|
412
|
Shin J, Lee T, Kim H, Lee I. Complementarity between distance- and probability-based methods of gene neighbourhood identification for pathway reconstruction. Mol Biosyst 2014; 10:24-9. [PMID: 24194096 DOI: 10.1039/c3mb70366e] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Indexed: 12/27/2022]
Abstract
Identifying gene neighbourhoods using either distance- or probability-based measures has proven effective in retrieving co-functional links. We report that these two approaches are highly complementary, with differential sensitivity for the core pathway links. We demonstrate that integrating these measures improves prediction of both pathways and phenotypes.
Affiliation(s)
- Junha Shin
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul, Korea.
| |
|
413
|
Dittmar WJ, McIver L, Michalak P, Garner HR, Valdez G. EvoCor: a platform for predicting functionally related genes using phylogenetic and expression profiles. Nucleic Acids Res 2014; 42:W72-5. [PMID: 24848012 PMCID: PMC4086105 DOI: 10.1093/nar/gku442] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 01/31/2014] [Revised: 05/03/2014] [Accepted: 05/07/2014] [Indexed: 01/18/2023] Open
Abstract
The wealth of publicly available gene expression and genomic data provides unique opportunities for computational inference to discover groups of genes that function to control specific cellular processes. Such genes are likely to have co-evolved and be expressed in the same tissues and cells. Unfortunately, the expertise and computational resources required to compare tens of genomes and gene expression data sets make this type of analysis difficult for the average end-user. Here, we describe the implementation of a web server that predicts genes involved in affecting specific cellular processes together with a gene of interest. We termed the server 'EvoCor', to denote that it detects functional relationships among genes through evolutionary analysis and gene expression correlation. This web server integrates profiles of sequence divergence derived by a Hidden Markov Model (HMM) and tissue-wide gene expression patterns to determine putative functional linkages between pairs of genes. This server is easy to use and freely available at http://pilot-hmm.vbi.vt.edu/.
Affiliation(s)
- W James Dittmar
- Virginia Tech Carilion School of Medicine, Roanoke, VA 24016, USA
| | - Lauren McIver
- Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, VA 24061, USA
| | - Pawel Michalak
- Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, VA 24061, USA
| | - Harold R Garner
- Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, VA 24061, USA
| | - Gregorio Valdez
- Virginia Tech Carilion Research Institute, Roanoke, VA 24016, USA; Department of Biological Sciences, Virginia Tech, Blacksburg, VA 24061, USA
| |
|
414
|
Cheng L, Li J, Ju P, Peng J, Wang Y. SemFunSim: a new method for measuring disease similarity by integrating semantic and gene functional association. PLoS One 2014; 9:e99415. [PMID: 24932637 PMCID: PMC4059643 DOI: 10.1371/journal.pone.0099415] [Citation(s) in RCA: 78] [Impact Index Per Article: 7.1] [Received: 12/16/2013] [Accepted: 05/14/2014] [Indexed: 01/20/2023] Open
Abstract
BACKGROUND Measuring similarity between diseases plays an important role in disease-related molecular function research. Functional associations between disease-related genes and semantic associations between diseases are often used to identify pairs of similar diseases from different perspectives. Currently, it is still a challenge to exploit both of them to calculate disease similarity. Therefore, a new method (SemFunSim) that integrates semantic and functional association is proposed to address the issue. METHODS SemFunSim is designed as follows. First, FunSim (Functional similarity) is proposed to calculate disease similarity using disease-related gene sets in a weighted network of human gene function. Next, SemSim (Semantic Similarity) is devised to calculate disease similarity using the relationship between two diseases from Disease Ontology. Finally, FunSim and SemSim are integrated to measure disease similarity. RESULTS The high average AUC (area under the receiver operating characteristic curve) (96.37%) shows that SemFunSim achieves a high true positive rate and a low false positive rate. Seventy-nine of the top 100 pairs of similar diseases identified by SemFunSim are annotated in the Comparative Toxicogenomics Database (CTD) as being targeted by the same therapeutic compounds, while other methods we compared could identify 35 or fewer such pairs among the top 100. Moreover, when using our method on diseases without annotated compounds in CTD, we could confirm many of our predicted candidate compounds from the literature. This indicates that SemFunSim is an effective method for drug repositioning.
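The AUC quoted in the abstract above can be read as the probability that a randomly chosen true similar-disease pair receives a higher score than a randomly chosen non-similar pair. A minimal sketch of that rank-based calculation follows; the function name `rank_auc` and the toy scores are illustrative assumptions, not code or data from the paper.

```python
def rank_auc(positive_scores, negative_scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    Returns the fraction of (positive, negative) pairs in which the positive
    item has the higher score; ties count as half a win.
    """
    if not positive_scores or not negative_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical similarity scores, for illustration only:
# three of the four (positive, negative) pairs are ranked correctly.
toy_auc = rank_auc([0.9, 0.8], [0.1, 0.85])
```

The pairwise form is quadratic in the number of scored pairs, which is fine for a toy check; a sort-based rank sum would be the usual choice for large benchmarks.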
Affiliation(s)
- Liang Cheng
- Center for Bioinformatics, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
| | - Jie Li
- Center for Bioinformatics, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
| | - Peng Ju
- School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore, Singapore
| | - Jiajie Peng
- Center for Bioinformatics, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
| | - Yadong Wang
- Center for Bioinformatics, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
| |
|
415
|
Koyejo O, Lee C, Ghosh J. A constrained matrix-variate Gaussian process for transposable data. Mach Learn 2014. [DOI: 10.1007/s10994-014-5444-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/24/2022]
|
416
|
Grennan KS, Chen C, Gershon ES, Liu C. Molecular network analysis enhances understanding of the biology of mental disorders. Bioessays 2014; 36:606-616. [PMID: 24733456 PMCID: PMC4300946 DOI: 10.1002/bies.201300147] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Indexed: 12/12/2022]
Abstract
We provide an introduction to network theory, evidence to support a connection between molecular network structure and neuropsychiatric disease, and examples of how network approaches can expand our knowledge of the molecular bases of these diseases. Without systematic methods to derive their biological meanings and inter-relatedness, the many molecular changes associated with neuropsychiatric disease, including genetic variants, gene expression changes, and protein differences, present an impenetrably complex set of findings. Network approaches can potentially help integrate and reconcile these findings, as well as provide new insights into the molecular architecture of neuropsychiatric diseases. Network approaches to neuropsychiatric disease are still in their infancy, and we discuss what might be done to improve their prospects.
Affiliation(s)
- Elliot S. Gershon
- Department of Psychiatry, University of Illinois at Chicago, Chicago, IL 60607, USA
| | - Chunyu Liu
- Department of Psychiatry, University of Illinois at Chicago, Chicago, IL 60607, USA
| |
|
417
|
Valentini G, Paccanaro A, Caniza H, Romero AE, Re M. An extensive analysis of disease-gene associations using network integration and fast kernel-based gene prioritization methods. Artif Intell Med 2014; 61:63-78. [PMID: 24726035 PMCID: PMC4070077 DOI: 10.1016/j.artmed.2014.03.003] [Citation(s) in RCA: 41] [Impact Index Per Article: 3.7] [Received: 09/11/2013] [Revised: 03/05/2014] [Accepted: 03/10/2014] [Indexed: 02/07/2023]
Abstract
OBJECTIVE In the context of "network medicine", gene prioritization methods represent one of the main tools to discover candidate disease genes by exploiting the large amount of data covering different types of functional relationships between genes. Several works have proposed integrating multiple sources of data to improve disease gene prioritization, but to our knowledge no systematic studies have focused on the quantitative evaluation of the impact of network integration on gene prioritization. In this paper, we aim to provide an extensive analysis of gene-disease associations not limited to genetic disorders, and a systematic comparison of different network integration methods for gene prioritization. MATERIALS AND METHODS We collected nine different functional networks representing different functional relationships between genes, and we combined them through both unweighted and weighted network integration methods. We then prioritized genes with respect to each of the considered 708 medical subject headings (MeSH) diseases by applying classical guilt-by-association, random walk and random walk with restart algorithms, and the recently proposed kernelized score functions. RESULTS The results obtained with classical random walk algorithms and the best single network achieved an average area under the curve (AUC) across the 708 MeSH diseases of about 0.82, while kernelized score functions and network integration boosted the average AUC to about 0.89. Weighted integration, by exploiting the different "informativeness" embedded in different functional networks, outperforms unweighted integration at 0.01 significance level, according to the Wilcoxon signed rank sum test. For each MeSH disease we provide the top-ranked unannotated candidate genes, available for further bio-medical investigation. CONCLUSIONS Network integration is necessary to boost the performance of gene prioritization methods. Moreover, the methods based on kernelized score functions can further enhance disease gene ranking results, by adopting both local and global learning strategies, able to exploit the overall topology of the network.
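Of the algorithms named in the abstract above, random walk with restart is the easiest to illustrate. The sketch below is a generic textbook formulation on an unweighted, undirected graph, not the authors' implementation; the function name, the restart probability of 0.3, and the four-gene toy network are all assumptions made for illustration.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Score every node by its steady-state visit probability.

    adjacency: dict mapping node -> set of neighbouring nodes (undirected).
    seeds:     set of known disease genes where the walk restarts.
    Assumes every node has at least one neighbour, so no probability leaks.
    """
    nodes = sorted(adjacency)
    start = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    current = dict(start)
    for _ in range(max_iter):
        # With probability `restart`, jump back to the seed distribution.
        updated = {n: restart * start[n] for n in nodes}
        # Otherwise move to a uniformly chosen neighbour.
        for u in nodes:
            share = (1.0 - restart) * current[u] / len(adjacency[u])
            for v in adjacency[u]:
                updated[v] += share
        change = sum(abs(updated[n] - current[n]) for n in nodes)
        current = updated
        if change < tol:
            break
    return current


# Hypothetical path-shaped network A - B - C - D with a single seed gene A.
toy_network = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
scores = random_walk_with_restart(toy_network, {"A"})
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy case the candidates are ranked by their distance from the seed, which is the guilt-by-association intuition the abstract refers to; on a weighted or integrated network the neighbour step would use edge weights instead of a uniform choice.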
Affiliation(s)
- Giorgio Valentini
- AnacletoLab - Dipartimento di Informatica, Università degli Studi di Milano, via Comelico 39/41, 20135 Milano, Italy.
| | - Alberto Paccanaro
- Department of Computer Science and Centre for Systems and Synthetic Biology, Royal Holloway, University of London, Egham TW20 0EX, UK
| | - Horacio Caniza
- Department of Computer Science and Centre for Systems and Synthetic Biology, Royal Holloway, University of London, Egham TW20 0EX, UK
| | - Alfonso E Romero
- Department of Computer Science and Centre for Systems and Synthetic Biology, Royal Holloway, University of London, Egham TW20 0EX, UK
| | - Matteo Re
- AnacletoLab - Dipartimento di Informatica, Università degli Studi di Milano, via Comelico 39/41, 20135 Milano, Italy
| |
|
418
|
Abstract
IBD is a spectrum of chronic disorders that constitute an important health problem worldwide. The hunt for genetic determinants of disease onset and course has culminated in the Immunochip project, which has identified >160 loci containing IBD susceptibility genes. In this Review, we highlight how genetic association studies have informed our understanding of the pathogenesis of IBD by focusing research efforts on key pathways involved in innate immunity, autophagy, lymphocyte differentiation and chemotaxis. Several of these novel genetic markers and cellular pathways are promising candidates for patient stratification and therapeutic targeting.
|
419
|
Cheng F, Jia P, Wang Q, Lin CC, Li WH, Zhao Z. Studying tumorigenesis through network evolution and somatic mutational perturbations in the cancer interactome. Mol Biol Evol 2014; 31:2156-69. [PMID: 24881052 DOI: 10.1093/molbev/msu167] [Citation(s) in RCA: 73] [Impact Index Per Article: 6.6] [Indexed: 12/14/2022] Open
Abstract
Cells govern biological functions through complex biological networks. Perturbations to networks may drive cells to new phenotypic states, for example, tumorigenesis. Identifying how genetic lesions perturb molecular networks is a fundamental challenge. This study used large-scale human interactome data to systematically explore the relationship among network topology, somatic mutation, evolutionary rate, and evolutionary origin of cancer genes. We found that cancer proteins have a unique network centrality, which is largely independent of gene essentiality. Cancer genes likely have experienced a lower evolutionary rate and stronger purifying selection than those of noncancer, Mendelian disease, and orphan disease genes. Cancer proteins tend to have ancient histories, likely having originated in early metazoans, although they are younger than proteins encoded by Mendelian disease genes, orphan disease genes, and essential genes. We found that the protein evolutionary origin (age) positively correlates with protein connectivity in the human interactome. Furthermore, we investigated the network-attacking perturbations due to somatic mutations identified from 3,268 tumors across 12 cancer types in The Cancer Genome Atlas. We observed a positive correlation between protein connectivity and the number of nonsynonymous somatic mutations, but a weaker or insignificant correlation between protein connectivity and the number of synonymous somatic mutations. These observations suggest that somatic mutational network-attacking perturbations to hub genes play an important role in tumor emergence and evolution. Collectively, this work has broad biomedical implications for both basic cancer biology and the development of personalized cancer therapy.
Affiliation(s)
- Feixiong Cheng
- Department of Biomedical Informatics, Vanderbilt University School of Medicine
| | - Peilin Jia
- Department of Biomedical Informatics, Vanderbilt University School of Medicine
| | - Quan Wang
- Department of Biomedical Informatics, Vanderbilt University School of Medicine
| | - Chen-Ching Lin
- Department of Biomedical Informatics, Vanderbilt University School of Medicine
| | - Wen-Hsiung Li
- Department of Ecology and Evolution, University of Chicago
- Biodiversity Research Center and Genomics Research Center, Academia Sinica, Taipei, Taiwan
| | - Zhongming Zhao
- Department of Biomedical Informatics, Vanderbilt University School of Medicine
- Department of Cancer Biology, Vanderbilt University School of Medicine
- Department of Psychiatry, Vanderbilt University School of Medicine
- Center for Quantitative Sciences, Vanderbilt University Medical Center
| |
|
420
|
Hwang S, Kim E, Yang S, Marcotte EM, Lee I. MORPHIN: a web tool for human disease research by projecting model organism biology onto a human integrated gene network. Nucleic Acids Res 2014; 42:W147-53. [PMID: 24861622 PMCID: PMC4086117 DOI: 10.1093/nar/gku434] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Indexed: 01/08/2023] Open
Abstract
Despite recent advances in human genetics, model organisms are indispensable for human disease research. Most human disease pathways are evolutionally conserved among other species, where they may phenocopy the human condition or be associated with seemingly unrelated phenotypes. Much of the known gene-to-phenotype association information is distributed across diverse databases, growing rapidly due to new experimental techniques. Accessible bioinformatics tools will therefore facilitate translation of discoveries from model organisms into human disease biology. Here, we present a web-based discovery tool for human disease studies, MORPHIN (model organisms projected on a human integrated gene network), which prioritizes the most relevant human diseases for a given set of model organism genes, potentially highlighting new model systems for human diseases and providing context to model organism studies. Conceptually, MORPHIN investigates human diseases by an orthology-based projection of a set of model organism genes onto a genome-scale human gene network. MORPHIN then prioritizes human diseases by relevance to the projected model organism genes using two distinct methods: a conventional overlap-based gene set enrichment analysis and a network-based measure of closeness between the query and disease gene sets capable of detecting associations undetectable by the conventional overlap-based methods. MORPHIN is freely accessible at http://www.inetbio.org/morphin.
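The "conventional overlap-based gene set enrichment analysis" mentioned in the abstract above is commonly scored with a one-sided hypergeometric test. The sketch below shows that general test under stated assumptions; it is not MORPHIN's code, and the function name, gene identifiers, and universe size are hypothetical.

```python
from math import comb


def overlap_enrichment_pvalue(query_genes, disease_genes, universe_size):
    """One-sided hypergeometric P-value for the overlap of two gene sets.

    Returns the probability of seeing at least the observed overlap when
    len(query_genes) genes are drawn at random, without replacement, from a
    universe that contains len(disease_genes) disease genes.
    """
    n_query = len(query_genes)
    n_disease = len(disease_genes)
    observed = len(set(query_genes) & set(disease_genes))
    total = comb(universe_size, n_query)
    tail = sum(
        comb(n_disease, k) * comb(universe_size - n_disease, n_query - k)
        for k in range(observed, min(n_query, n_disease) + 1)
    )
    return tail / total


# Hypothetical example: all three projected query genes are disease genes
# in a universe of ten genes, so only 1 of the 120 possible draws matches.
toy_p = overlap_enrichment_pvalue({"g1", "g2", "g3"}, {"g1", "g2", "g3"}, 10)
```

A test of this kind only sees shared genes, which is why the abstract pairs it with a network-based closeness measure that can link query and disease sets with no genes in common.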
Affiliation(s)
- Sohyun Hwang
- Department of Biotechnology, Yonsei University, Seoul, 120-749, Korea
- Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, University of Texas at Austin, TX 78712, USA
| | - Eiru Kim
- Department of Biotechnology, Yonsei University, Seoul, 120-749, Korea
| | - Sunmo Yang
- Department of Biotechnology, Yonsei University, Seoul, 120-749, Korea
| | - Edward M Marcotte
- Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, University of Texas at Austin, TX 78712, USA
| | - Insuk Lee
- Department of Biotechnology, Yonsei University, Seoul, 120-749, Korea
| |
|
421
|
Guala D, Sjölund E, Sonnhammer ELL. MaxLink: network-based prioritization of genes tightly linked to a disease seed set. Bioinformatics 2014; 30:2689-90. [DOI: 10.1093/bioinformatics/btu344] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Indexed: 01/28/2023] Open
|
422
|
Xie L, Ge X, Tan H, Xie L, Zhang Y, Hart T, Yang X, Bourne PE. Towards structural systems pharmacology to study complex diseases and personalized medicine. PLoS Comput Biol 2014; 10:e1003554. [PMID: 24830652 PMCID: PMC4022462 DOI: 10.1371/journal.pcbi.1003554] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.0] [Indexed: 12/11/2022] Open
Abstract
Genome-Wide Association Studies (GWAS), whole genome sequencing, and high-throughput omics techniques have generated vast amounts of genotypic and molecular phenotypic data. However, these data have not yet been fully explored to improve the effectiveness and efficiency of drug discovery, which continues along a one-drug-one-target-one-disease paradigm. As a partial consequence, both the cost to launch a new drug and the attrition rate are increasing. Systems pharmacology and pharmacogenomics are emerging to exploit the available data and potentially reverse this trend, but, as we argue here, more is needed. To understand the impact of genetic, epigenetic, and environmental factors on drug action, we must study the structural energetics and dynamics of molecular interactions in the context of the whole human genome and interactome. Such an approach requires an integrative modeling framework for drug action that leverages advances in data-driven statistical modeling and mechanism-based multiscale modeling and transforms heterogeneous data from GWAS, high-throughput sequencing, structural genomics, functional genomics, and chemical genomics into unified knowledge. This is not a small task, but, as reviewed here, progress is being made towards the final goal of personalized medicines for the treatment of complex diseases.
Affiliation(s)
- Lei Xie
- Department of Computer Science, Hunter College, The City University of New York, New York, New York, United States of America
- Ph.D. Program in Computer Science, Biology, and Biochemistry, The Graduate Center, The City University of New York, New York, New York, United States of America
| | - Xiaoxia Ge
- Department of Computer Science, Hunter College, The City University of New York, New York, New York, United States of America
| | - Hepan Tan
- Department of Computer Science, Hunter College, The City University of New York, New York, New York, United States of America
| | - Li Xie
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, California, United States of America
| | - Yinliang Zhang
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, California, United States of America
| | - Thomas Hart
- Department of Biological Sciences, Hunter College, The City University of New York, New York, New York, United States of America
| | - Xiaowei Yang
- School of Public Health, Hunter College, The City University of New York, New York, New York, United States of America
| | - Philip E. Bourne
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, California, United States of America
| |
|
423
|
Emig-Agius D, Olivieri K, Pache L, Shih HL, Pustovalova O, Bessarabova M, Young JAT, Chanda SK, Ideker T. An integrated map of HIV-human protein complexes that facilitate viral infection. PLoS One 2014; 9:e96687. [PMID: 24817247 PMCID: PMC4016004 DOI: 10.1371/journal.pone.0096687] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Received: 11/27/2013] [Accepted: 04/11/2014] [Indexed: 12/03/2022] Open
Abstract
Recent proteomic and genetic studies have aimed to identify a complete network of interactions between HIV and human proteins and genes. This HIV-human interaction network provides invaluable information as to how HIV exploits the host machinery and can be used as a starting point for further functional analyses. We integrated this network with complementary datasets of protein function and interaction to nominate human protein complexes with likely roles in viral infection. Based on our approach we identified a global map of 40 HIV-human protein complexes with putative roles in HIV infection, some of which are involved in DNA replication and repair, transcription, translation, and cytoskeletal regulation. Targeted RNAi screens were used to validate several proteins and complexes for functional impact on viral infection. Thus, our HIV-human protein complex map provides a significant resource of potential HIV-host interactions for further study.
Affiliation(s)
- Dorothea Emig-Agius
- Departments of Medicine and Bioengineering, University of California at San Diego, La Jolla, California, United States of America
- IP&Science, Thomson Reuters Scientific Inc., Carlsbad, California, United States of America
| | - Kevin Olivieri
- Sanford-Burnham Medical Research Institute, La Jolla, California, United States of America
| | - Lars Pache
- Sanford-Burnham Medical Research Institute, La Jolla, California, United States of America
| | - Hsin Ling Shih
- Sanford-Burnham Medical Research Institute, La Jolla, California, United States of America
| | - Olga Pustovalova
- IP&Science, Thomson Reuters Scientific Inc., Carlsbad, California, United States of America
| | - Marina Bessarabova
- IP&Science, Thomson Reuters Scientific Inc., Carlsbad, California, United States of America
| | - John A. T. Young
- The Salk Institute for Biological Studies, La Jolla, California, United States of America
| | - Sumit K. Chanda
- Sanford-Burnham Medical Research Institute, La Jolla, California, United States of America
| | - Trey Ideker
- Departments of Medicine and Bioengineering, University of California at San Diego, La Jolla, California, United States of America
| |
|
424
|
Joshi S, Singh AR, Zulcic M, Bao L, Messer K, Ideker T, Dutkowski J, Durden DL. Rac2 controls tumor growth, metastasis and M1-M2 macrophage differentiation in vivo. PLoS One 2014; 9:e95893. [PMID: 24770346 PMCID: PMC4000195 DOI: 10.1371/journal.pone.0095893] [Citation(s) in RCA: 97] [Impact Index Per Article: 8.8] [Received: 12/09/2013] [Accepted: 03/31/2014] [Indexed: 12/16/2022] Open
Abstract
Although it is well established that the macrophage M1 to M2 transition plays a role in tumor progression, the molecular basis for this process remains incompletely understood. Herein, we demonstrate that the small GTPase Rac2 controls macrophage M1 to M2 differentiation and the metastatic phenotype in vivo. Using a genetic approach, combined with syngeneic and orthotopic tumor models, we demonstrate that Rac2-/- mice display a marked defect in tumor growth, angiogenesis and metastasis. Microarray, RT-PCR and metabolomic analysis on bone marrow-derived macrophages isolated from the Rac2-/- mice identify an important role for Rac2 in M2 macrophage differentiation. Furthermore, we define a novel molecular mechanism by which signals transmitted from the extracellular matrix via the α4β1 integrin and MCSF receptor lead to the activation of Rac2 and potentially regulate macrophage M2 differentiation. Collectively, our findings demonstrate a macrophage autonomous process by which the Rac2 GTPase is activated downstream of the α4β1 integrin and the MCSF receptor to control tumor growth, metastasis and macrophage differentiation into the M2 phenotype. Finally, using gene expression and metabolomic data from our Rac2-/- model, and information related to M1-M2 macrophage differentiation curated from the literature, we executed a systems biologic analysis of hierarchical protein-protein interaction networks in an effort to develop an iterative interactome map which will predict additional mechanisms by which Rac2 may coordinately control macrophage M1 to M2 differentiation and metastasis.
Affiliation(s)
- Shweta Joshi
- UCSD Department of Pediatrics, Moores Cancer Center, University of California San Diego, La Jolla, California, United States of America
| | - Alok R. Singh
- UCSD Department of Pediatrics, Moores Cancer Center, University of California San Diego, La Jolla, California, United States of America
| | - Muamera Zulcic
- UCSD Department of Pediatrics, Moores Cancer Center, University of California San Diego, La Jolla, California, United States of America
| | - Lei Bao
- UCSD Department of Biostatistics, Moores Cancer Center, University of California San Diego, La Jolla, California, United States of America
| | - Karen Messer
- UCSD Department of Biostatistics, Moores Cancer Center, University of California San Diego, La Jolla, California, United States of America
| | - Trey Ideker
- Department of Medicine, University of California San Diego, La Jolla, California, United States of America
| | - Janusz Dutkowski
- Department of Medicine, University of California San Diego, La Jolla, California, United States of America
| | - Donald L. Durden
- UCSD Department of Pediatrics, Moores Cancer Center, University of California San Diego, La Jolla, California, United States of America
- Department of Pediatrics and Rady Children's Hospital, San Diego, La Jolla, California, United States of America
| |
|
425
|
Guney E, Oliva B. Analysis of the robustness of network-based disease-gene prioritization methods reveals redundancy in the human interactome and functional diversity of disease-genes. PLoS One 2014; 9:e94686. [PMID: 24733074 PMCID: PMC3986215 DOI: 10.1371/journal.pone.0094686] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Received: 02/13/2014] [Accepted: 03/13/2014] [Indexed: 11/18/2022] Open
Abstract
Complex biological systems usually pose a trade-off between robustness and fragility where a small number of perturbations can substantially disrupt the system. Although biological systems are robust against changes in many external and internal conditions, even a single mutation can perturb the system substantially, giving rise to a pathophenotype. Recent advances in identifying and analyzing the sequence variations underlying human disorders help to build a systemic view of the mechanisms behind various disease phenotypes. Network-based disease-gene prioritization methods rank the relevance of genes in a disease under the hypothesis that genes whose proteins interact with each other tend to exhibit similar phenotypes. In this study, we have tested the robustness of several network-based disease-gene prioritization methods with respect to perturbations of the system using various disease phenotypes from the Online Mendelian Inheritance in Man database. These perturbations have been introduced either in the protein-protein interaction network or in the set of known disease-gene associations. As the network-based disease-gene prioritization methods are based on the connectivity between known disease-gene associations, we have further used these methods to categorize the pathophenotypes with respect to the recoverability of hidden disease-genes. Our results suggest that, in general, disease-genes are connected through multiple paths in the human interactome. Moreover, even when these paths are disturbed, network-based prioritization can reveal hidden disease-gene associations in some pathophenotypes such as breast cancer, cardiomyopathy, diabetes, leukemia, Parkinson disease and obesity to a greater extent than in the rest of the pathophenotypes tested in this study. Gene Ontology (GO) analysis highlighted the role of functional diversity for such diseases.
Affiliation(s)
- Emre Guney
- Center for Complex Network Research, Northeastern University, Boston, Massachusetts, United States of America
| | - Baldo Oliva
- Structural Bioinformatics Group (GRIB), Departament de Ciències Experimentals i de la Salut, Universitat Pompeu Fabra, Barcelona, Catalonia, Spain
|
426
|
Bayesian systems-based genetic association analysis with effect strength estimation and omic wide interpretation: a case study in rheumatoid arthritis. METHODS IN MOLECULAR BIOLOGY (CLIFTON, N.J.) 2014; 1142:143-76. [PMID: 24706282 DOI: 10.1007/978-1-4939-0404-4_14] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2022]
Abstract
Rich dependency structures are often formed in genetic association studies between the phenotypic, clinical, and environmental descriptors. These descriptors may not be standardized, and may encompass various disease definitions and clinical endpoints which are only weakly influenced by various (e.g., genetic) factors. Such loosely defined complex intermediate clinical phenotypes are typically used in follow-up candidate gene association studies, e.g., after genome-wide analysis, to deepen the understanding of the associations and to estimate effect strength. This chapter discusses a solid methodology, which is useful in such a scenario, by using probabilistic graphical models, namely, Bayesian networks in the Bayesian statistical framework. This method offers systematically scalable, comprehensive hierarchical hypotheses about multivariate relevance. We discuss its workflow: from data engineering to semantic publication of the results. We overview the construction, visualization, and interpretation of complex hypotheses related to the structural analysis of relevance. Furthermore, we illustrate the use of a dependency model-based relevance measure, which takes into account the structural properties of the model, for quantifying the effect strength. Finally, we discuss the "interpretational" or translational challenge of a genetic association study, with a focus on the fusion of heterogeneous omic knowledge to reintegrate the results into a genome-wide context.
|
427
|
Zhang SW, Shao DD, Zhang SY, Wang YB. Prioritization of candidate disease genes by enlarging the seed set and fusing information of the network topology and gene expression. MOLECULAR BIOSYSTEMS 2014; 10:1400-8. [PMID: 24695957 DOI: 10.1039/c3mb70588a] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Indexed: 01/13/2023]
Abstract
The identification of disease genes is very important not only to provide greater understanding of gene function and the cellular mechanisms that drive human disease, but also to enhance human disease diagnosis and treatment. Recently, high-throughput techniques have been applied to detect dozens or even hundreds of candidate genes. However, experimental approaches to validate the many candidates are usually time-consuming, tedious and expensive, and sometimes lack reproducibility. Therefore, numerous theoretical and computational methods (e.g. network-based approaches) have been developed to prioritize candidate disease genes. Many network-based approaches implicitly utilize the observation that genes causing the same or similar diseases tend to correlate with each other in gene-protein relationship networks. Of these network approaches, the random walk with restart algorithm (RWR) is considered to be a state-of-the-art approach. To further improve the performance of RWR, we propose a novel method named ESFSC to identify disease-related genes, by enlarging the seed set according to the centrality of disease genes in a network and fusing information of the protein-protein interaction (PPI) network topological similarity and the gene expression correlation. The ESFSC algorithm restarts at all of the nodes in the seed set, which consists of the known disease genes and their k-nearest neighbor nodes, then walks in the global network guided separately by the similarity transition matrix constructed from PPI network topological similarity properties and by the correlational transition matrix constructed from the gene expression profiles. As a result, all the genes in the network are ranked by a weighted fusion of the RWR results obtained with the two types of transition matrices.
Comprehensive simulation results for 10 diseases with 97 known disease genes collected from the Online Mendelian Inheritance in Man (OMIM) database show that ESFSC outperforms existing methods for prioritizing candidate disease genes. The top prediction results for Alzheimer's disease are consistent with previous literature reports.
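The random walk with restart step that this abstract builds on can be illustrated with a minimal sketch. This is the generic RWR iteration only, not the ESFSC method: seed-set enlargement, the two transition matrices and the weighted fusion are omitted. The toy network, the node names, the restart probability of 0.3 and the function name `random_walk_with_restart` are assumptions chosen for illustration, and every node is assumed to have at least one neighbor.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Return steady-state visiting probabilities for every node.

    adjacency: dict mapping node -> set of neighbors (undirected graph,
               assumed to contain no isolated nodes).
    seeds:     known disease genes; the walker restarts uniformly among them.
    """
    nodes = list(adjacency)
    seeds = set(seeds)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # With probability `restart` the walker jumps back to a seed node.
        nxt = {n: restart * p0[n] for n in nodes}
        # Otherwise it moves to a uniformly chosen neighbor.
        for n in nodes:
            share = (1.0 - restart) * p[n] / len(adjacency[n])
            for m in adjacency[n]:
                nxt[m] += share
        change = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if change < tol:
            break
    return p


# Hypothetical toy network: a simple chain A - B - C - D - E with seed gene A.
toy_network = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"C", "E"},
    "E": {"D"},
}
seeds = {"A"}
scores = random_walk_with_restart(toy_network, seeds)

# Rank the non-seed candidates by their steady-state probability.
ranked = sorted((g for g in scores if g not in seeds), key=scores.get, reverse=True)
print(ranked)
```

In this toy chain the candidates closest to the seed receive the highest scores, which is the proximity effect that RWR-based prioritization relies on.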
Affiliation(s)
- Shao-Wu Zhang
- College of Automation, Northwestern Polytechnical University, 710072, Xi'an, China.
|
428
|
Itan Y, Mazel M, Mazel B, Abhyankar A, Nitschke P, Quintana-Murci L, Boisson-Dupuis S, Boisson B, Abel L, Zhang SY, Casanova JL. HGCS: an online tool for prioritizing disease-causing gene variants by biological distance. BMC Genomics 2014; 15:256. [PMID: 24694260 PMCID: PMC4051124 DOI: 10.1186/1471-2164-15-256] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.2] [Received: 09/18/2013] [Accepted: 03/26/2014] [Indexed: 01/18/2023] Open
Abstract
BACKGROUND Identifying the genotypes underlying human disease phenotypes is a fundamental step in human genetics and medicine. High-throughput genomic technologies provide thousands of genetic variants per individual. The causal genes of a specific phenotype are usually expected to be functionally close to each other. According to this hypothesis, candidate genes are picked from high-throughput data on the basis of their biological proximity to core genes - genes already known to be responsible for the phenotype. There is currently no effective gene-centric online interface for this purpose. RESULTS We describe here the human gene connectome server (HGCS), a powerful, easy-to-use interactive online tool enabling researchers to prioritize any list of genes according to their biological proximity to core genes associated with the phenotype of interest. We also make available an updated and extended version for all human gene-specific connectomes. The HGCS is freely available to noncommercial users from: http://hgc.rockefeller.edu. CONCLUSIONS The HGCS should help investigators from diverse fields to identify new disease-causing candidate genes more effectively, via a user-friendly online interface.
Affiliation(s)
- Yuval Itan
- St. Giles Laboratory of Human Genetics of Infectious Diseases, Rockefeller Branch, The Rockefeller University, New York, NY, USA.
|
429
|
Wang Y, Fang H, Yang T, Wu D, Zhao J. Degree‐adjusted algorithm for prioritisation of candidate disease genes from gene expression and protein interactome. IET Syst Biol 2014; 8:41-6. [DOI: 10.1049/iet-syb.2013.0038] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 12/31/2022] Open
Affiliation(s)
- Yichuan Wang
- Department of Mathematics, Logistical Engineering University, Chongqing, People's Republic of China
| | - Haiyang Fang
- Department of Mathematics, Logistical Engineering University, Chongqing, People's Republic of China
| | - Tinghong Yang
- Department of Mathematics, Logistical Engineering University, Chongqing, People's Republic of China
| | - Duzhi Wu
- Department of Mathematics, Logistical Engineering University, Chongqing, People's Republic of China
| | - Jing Zhao
- Department of Mathematics, Logistical Engineering University, Chongqing, People's Republic of China
|
430
|
GeneSense: a new approach for human gene annotation integrated with protein-protein interaction networks. Sci Rep 2014; 4:4474. [PMID: 24667292 PMCID: PMC3966033 DOI: 10.1038/srep04474] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 11/07/2013] [Accepted: 03/10/2014] [Indexed: 12/29/2022] Open
Abstract
Virtually all cellular functions involve protein-protein interactions (PPIs). As an increasing number of PPIs are identified and a vast amount of information accumulates, researchers are finding different ways to interrogate the data and understand the interactions in context. However, it is widely recognized that a significant portion of the data is scattered, redundant, not considered high quality, and not readily accessible to researchers in a systematic fashion. In addition, it is challenging to identify the optimal protein targets in the current PPI networks. The GeneSense server was developed to integrate gene annotation and PPI networks in an expandable architecture that incorporates selected databases, with the aim of assembling, analyzing, evaluating and disseminating protein-protein association information in a comprehensive and user-friendly manner. Three network models, nodenet, leafnet and loopnet, are used to identify the optimal protein targets in the complex networks. GeneSense is freely available at www.biomedsense.org/genesense.php.
|
431
|
Xu Y, Guo M, Liu X, Wang C, Liu Y. SoyFN: a knowledge database of soybean functional networks. Database (Oxford) 2014; 2014:bau019. [PMID: 24618044 PMCID: PMC3949006 DOI: 10.1093/database/bau019] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Received: 11/21/2013] [Revised: 01/22/2014] [Accepted: 02/06/2014] [Indexed: 01/08/2023]
Abstract
Many databases for soybean genomic analysis have been built and made publicly available, but few of them contain knowledge specifically targeting the omics-level gene-gene, gene-microRNA (miRNA) and miRNA-miRNA interactions. Here, we present SoyFN, a knowledge database of soybean functional gene networks and miRNA functional networks. SoyFN provides user-friendly interfaces to retrieve, visualize, analyze and download the functional networks of soybean genes and miRNAs. In addition, it incorporates much information about KEGG pathways, gene ontology annotations and 3'-UTR sequences as well as many useful tools including SoySearch, ID mapping, Genome Browser, eFP Browser and promoter motif scan. SoyFN is a schema-free database that can be accessed as a Web service from any modern programming language using a simple Hypertext Transfer Protocol call. The Web site is implemented in Java, JavaScript, PHP, HTML and Apache, with all major browsers supported. We anticipate that this database will be useful for members of research communities both in soybean experimental science and bioinformatics. Database URL: http://nclab.hit.edu.cn/SoyFN.
Affiliation(s)
- Yungang Xu
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, P.R. China and School of Life Science and Technology, Harbin Institute of Technology, Harbin 150001, P.R. China
| | - Maozu Guo
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, P.R. China and School of Life Science and Technology, Harbin Institute of Technology, Harbin 150001, P.R. China
| | - Xiaoyan Liu
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, P.R. China and School of Life Science and Technology, Harbin Institute of Technology, Harbin 150001, P.R. China
| | - Chunyu Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, P.R. China and School of Life Science and Technology, Harbin Institute of Technology, Harbin 150001, P.R. China
| | - Yang Liu
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, P.R. China and School of Life Science and Technology, Harbin Institute of Technology, Harbin 150001, P.R. China
|
432
|
Shindo A, Wallingford JB. PCP and septins compartmentalize cortical actomyosin to direct collective cell movement. Science 2014; 343:649-52. [PMID: 24503851 DOI: 10.1126/science.1243126] [Citation(s) in RCA: 144] [Impact Index Per Article: 13.1] [Indexed: 11/03/2022]
Abstract
Despite our understanding of actomyosin function in individual migrating cells, we know little about the mechanisms by which actomyosin drives collective cell movement in vertebrate embryos. The collective movements of convergent extension drive both global reorganization of the early embryo and local remodeling during organogenesis. We report here that planar cell polarity (PCP) proteins control convergent extension by exploiting an evolutionarily ancient function of the septin cytoskeleton. By directing septin-mediated compartmentalization of cortical actomyosin, PCP proteins coordinate the specific shortening of mesenchymal cell-cell contacts, which in turn powers cell interdigitation. These data illuminate the interface between developmental signaling systems and the fundamental machinery of cell behavior and should provide insights into the etiology of human birth defects, such as spina bifida and congenital kidney cysts.
Affiliation(s)
- Asako Shindo
- Howard Hughes Medical Institute and University of Texas at Austin, Austin, TX 78712, USA
|
433
|
Jia P, Zhao Z. Network-assisted analysis to prioritize GWAS results: principles, methods and perspectives. Hum Genet 2014; 133:125-38. [PMID: 24122152 PMCID: PMC3943795 DOI: 10.1007/s00439-013-1377-1] [Citation(s) in RCA: 73] [Impact Index Per Article: 6.6] [Received: 12/08/2012] [Accepted: 10/03/2013] [Indexed: 01/24/2023]
Abstract
Genome-wide association studies (GWAS) have rapidly become a powerful tool in genetic studies of complex diseases and traits. Traditionally, single marker-based tests have been used prevalently in GWAS and have uncovered tens of thousands of disease-associated SNPs. Network-assisted analysis (NAA) of GWAS data is an emerging area in which network-related approaches are developed and utilized to perform advanced analyses of GWAS data in order to study various human diseases or traits. Progress has been made in both methodology development and applications of NAA in GWAS data, and it has already been demonstrated that NAA results may enhance our interpretation and prioritization of candidate genes and markers. Inspired by the strong interest in and high demand for advanced GWAS data analysis, in this review article, we discuss the methodologies and strategies that have been reported for the NAA of GWAS data. Many NAA approaches search for subnetworks and assess the combined effects of multiple genes participating in the resultant subnetworks through a gene set analysis. With no restriction to pre-defined canonical pathways, NAA has the advantage of defining subnetworks with the guidance of the GWAS data under investigation. In addition, some NAA methods prioritize genes from GWAS data based on their interconnections in the reference network. Here, we summarize NAA applications to various diseases and discuss the available options and potential caveats related to their practical usage. Additionally, we provide perspectives regarding this rapidly growing research area.
|
434
|
Novarino G, Fenstermaker AG, Zaki MS, Hofree M, Silhavy JL, Heiberg AD, Abdellateef M, Rosti B, Scott E, Mansour L, Masri A, Kayserili H, Al-Aama JY, Abdel-Salam GMH, Karminejad A, Kara M, Kara B, Bozorgmehri B, Ben-Omran T, Mojahedi F, El Din Mahmoud IG, Bouslam N, Bouhouche A, Benomar A, Hanein S, Raymond L, Forlani S, Mascaro M, Selim L, Shehata N, Al-Allawi N, Bindu P, Azam M, Gunel M, Caglayan A, Bilguvar K, Tolun A, Issa MY, Schroth J, Spencer EG, Rosti RO, Akizu N, Vaux KK, Johansen A, Koh AA, Megahed H, Durr A, Brice A, Stevanin G, Gabriel SB, Ideker T, Gleeson JG. Exome sequencing links corticospinal motor neuron disease to common neurodegenerative disorders. Science 2014; 343:506-511. [PMID: 24482476 PMCID: PMC4157572 DOI: 10.1126/science.1247363] [Citation(s) in RCA: 416] [Impact Index Per Article: 37.8] [Indexed: 12/25/2022]
Abstract
Hereditary spastic paraplegias (HSPs) are neurodegenerative motor neuron diseases characterized by progressive age-dependent loss of corticospinal motor tract function. Although the genetic basis is partly understood, only a fraction of cases can receive a genetic diagnosis, and a global view of HSP is lacking. By using whole-exome sequencing in combination with network analysis, we identified 18 previously unknown putative HSP genes and validated nearly all of these genes functionally or genetically. The pathways highlighted by these mutations link HSP to cellular transport, nucleotide metabolism, and synapse and axon development. Network analysis revealed a host of further candidate genes, of which three were mutated in our cohort. Our analysis links HSP to other neurodegenerative disorders and can facilitate gene discovery and mechanistic understanding of disease.
Affiliation(s)
- Gaia Novarino
- Howard Hughes Medical Institute, University of California, San Diego, La Jolla, CA 92093, USA
| | - Ali G. Fenstermaker
- Howard Hughes Medical Institute, University of California, San Diego, La Jolla, CA 92093, USA
| | - Maha S. Zaki
- Clinical Genetics Department, Human Genetics and Genome Research Division, National Research Center, Cairo 12311, Egypt
| | - Matan Hofree
- Department of Computer Science and Engineering and Department of Medicine, University of California, San Diego, La Jolla, CA 92093, USA
| | - Jennifer L. Silhavy
- Howard Hughes Medical Institute, University of California, San Diego, La Jolla, CA 92093, USA
| | - Andrew D. Heiberg
- Howard Hughes Medical Institute, University of California, San Diego, La Jolla, CA 92093, USA
| | - Mostafa Abdellateef
- Howard Hughes Medical Institute, University of California, San Diego, La Jolla, CA 92093, USA
| | - Basak Rosti
- Howard Hughes Medical Institute, University of California, San Diego, La Jolla, CA 92093, USA
| | - Eric Scott
- Howard Hughes Medical Institute, University of California, San Diego, La Jolla, CA 92093, USA
| | - Lobna Mansour
- Department of Pediatric Neurology, Neurometabolic Unit, Cairo University Children’s Hospital, Cairo 406, Egypt
| | - Amira Masri
- Division of Child Neurology, Department of Pediatrics, University of Jordan, Amman 11942, Jordan
| | - Hulya Kayserili
- Istanbul University, Istanbul Medical Faculty, Medical Genetics Department, 34093 Istanbul, Turkey
| | - Jumana Y. Al-Aama
- Department of Genetic Medicine, King Abdulaziz University, Jeddah, Kingdom of Saudi Arabia
| | - Ghada M. H. Abdel-Salam
- Clinical Genetics Department, Human Genetics and Genome Research Division, National Research Center, Cairo 12311, Egypt
| | | | - Majdi Kara
- Department of Pediatrics, Tripoli Children’s Hospital, Tripoli, Libya
| | - Bulent Kara
- Kocaeli University, Medical Faculty, Department of Pediatric Neurology, 41380 Umuttepe, Kocaeli, Turkey
| | - Bita Bozorgmehri
- Kariminejad-Najmabadi Pathology and Genetics Center, Tehran, Iran
| | - Tawfeg Ben-Omran
- Clinical and Metabolic Genetics Division, Department of Pediatrics, Hamad Medical Corporation, Doha 3050, Qatar
| | - Faezeh Mojahedi
- Mashhad Medical Genetic Counseling Center, 91767 Mashhad, Iran
| | - Iman Gamal El Din Mahmoud
- Department of Pediatric Neurology, Neurometabolic Unit, Cairo University Children’s Hospital, Cairo 406, Egypt
| | - Naima Bouslam
- Université Mohammed V Souissi, Equipe de Recherche de Maladies Neurodégénératives (ERMN) and Centre de Recherche en Épidémiologie Clinique et Essais Thérapeutiques (CRECET), 6402 Rabat, Morocco
| | - Ahmed Bouhouche
- Université Mohammed V Souissi, Equipe de Recherche de Maladies Neurodégénératives (ERMN) and Centre de Recherche en Épidémiologie Clinique et Essais Thérapeutiques (CRECET), 6402 Rabat, Morocco
| | - Ali Benomar
- Université Mohammed V Souissi, Equipe de Recherche de Maladies Neurodégénératives (ERMN) and Centre de Recherche en Épidémiologie Clinique et Essais Thérapeutiques (CRECET), 6402 Rabat, Morocco
| | - Sylvain Hanein
- Centre de Recherche de l’Institut du Cerveau et de la Moelle épinière, INSERM U1127, CNRS UMR7225; UPMC Univ Paris VI UMR_S975, 75013 Paris, France
| | - Laure Raymond
- Centre de Recherche de l’Institut du Cerveau et de la Moelle épinière, INSERM U1127, CNRS UMR7225; UPMC Univ Paris VI UMR_S975, 75013 Paris, France
| | - Sylvie Forlani
- Centre de Recherche de l’Institut du Cerveau et de la Moelle épinière, INSERM U1127, CNRS UMR7225; UPMC Univ Paris VI UMR_S975, 75013 Paris, France
| | - Massimo Mascaro
- Howard Hughes Medical Institute, University of California, San Diego, La Jolla, CA 92093, USA
| | - Laila Selim
- Department of Pediatric Neurology, Neurometabolic Unit, Cairo University Children’s Hospital, Cairo 406, Egypt
| | - Nabil Shehata
- Department of Pediatrics and Neonatology, Saudi German Hospital, Post Office Box 84348, Riyadh, Kingdom of Saudi Arabia
| | - Nasir Al-Allawi
- Department of Pathology, School of Medicine, University of Dohuk, Dohuk, Iraq
| | - P.S. Bindu
- Department of Neurology, National Institute of Mental Health and Neurosciences, Bangalore, India
| | - Matloob Azam
- Department of Pediatrics and Child Neurology, Wah Medical College, Wah Cantt, Pakistan
| | - Murat Gunel
- Department of Genetics and Neurosurgery, Yale University School of Medicine, New Haven, CT 06510, USA
| | - Ahmet Caglayan
- Department of Genetics and Neurosurgery, Yale University School of Medicine, New Haven, CT 06510, USA
| | - Kaya Bilguvar
- Department of Genetics and Neurosurgery, Yale University School of Medicine, New Haven, CT 06510, USA
| | - Aslihan Tolun
- Department of Molecular Biology and Genetics, Bogazici University, 34342 Istanbul, Turkey
| | - Mahmoud Y. Issa
- Clinical Genetics Department, Human Genetics and Genome Research Division, National Research Center, Cairo 12311, Egypt
| | - Jana Schroth
- Howard Hughes Medical Institute, University of California, San Diego, La Jolla, CA 92093, USA
| | - Emily G. Spencer
- Howard Hughes Medical Institute, University of California, San Diego, La Jolla, CA 92093, USA
| | - Rasim O. Rosti
- Howard Hughes Medical Institute, University of California, San Diego, La Jolla, CA 92093, USA
| | - Naiara Akizu
- Howard Hughes Medical Institute, University of California, San Diego, La Jolla, CA 92093, USA
| | - Keith K. Vaux
- Howard Hughes Medical Institute, University of California, San Diego, La Jolla, CA 92093, USA
| | - Anide Johansen
- Howard Hughes Medical Institute, University of California, San Diego, La Jolla, CA 92093, USA
| | - Alice A. Koh
- Howard Hughes Medical Institute, University of California, San Diego, La Jolla, CA 92093, USA
| | - Hisham Megahed
- Clinical Genetics Department, Human Genetics and Genome Research Division, National Research Center, Cairo 12311, Egypt
| | - Alexandra Durr
- Centre de Recherche de l’Institut du Cerveau et de la Moelle épinière, INSERM U1127, CNRS UMR7225; UPMC Univ Paris VI UMR_S975, 75013 Paris, France
- Assistance Publique–Hôpitaux de Paris, Fédération de Génétique, Pitié-Salpêtrière Hospital, 75013 Paris, France
| | - Alexis Brice
- Centre de Recherche de l’Institut du Cerveau et de la Moelle épinière, INSERM U1127, CNRS UMR7225; UPMC Univ Paris VI UMR_S975, 75013 Paris, France
- Assistance Publique–Hôpitaux de Paris, Fédération de Génétique, Pitié-Salpêtrière Hospital, 75013 Paris, France
- Institut du Cerveau et de la Moelle Épinière, 75013 Paris, France
| | - Giovanni Stevanin
- Centre de Recherche de l’Institut du Cerveau et de la Moelle épinière, INSERM U1127, CNRS UMR7225; UPMC Univ Paris VI UMR_S975, 75013 Paris, France
- Assistance Publique–Hôpitaux de Paris, Fédération de Génétique, Pitié-Salpêtrière Hospital, 75013 Paris, France
- Institut du Cerveau et de la Moelle Épinière, 75013 Paris, France
- Laboratoire de Neurogénétique, Ecole Pratique des Hautes Etudes, Institut du Cerveau et de la Moelle Épinière, 75013 Paris, France
| | - Stacy B. Gabriel
- Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, MA 02142, USA
| | - Trey Ideker
- Department of Computer Science and Engineering and Department of Medicine, University of California, San Diego, La Jolla, CA 92093, USA
| | - Joseph G. Gleeson
- Howard Hughes Medical Institute, University of California, San Diego, La Jolla, CA 92093, USA
|
435
|
Gillis J, Ballouz S, Pavlidis P. Bias tradeoffs in the creation and analysis of protein-protein interaction networks. J Proteomics 2014; 100:44-54. [PMID: 24480284 DOI: 10.1016/j.jprot.2014.01.020] [Citation(s) in RCA: 46] [Impact Index Per Article: 4.2] [Received: 06/07/2013] [Revised: 01/13/2014] [Accepted: 01/17/2014] [Indexed: 02/04/2023]
Abstract
Networks constructed from aggregated protein-protein interaction data are commonplace in biology. But the studies these data are derived from were conducted with their own hypotheses and foci. Focusing on data from budding yeast present in BioGRID, we determine that many of the downstream signals present in network data are significantly impacted by biases in the original data. We determine the degree to which selection bias in favor of biologically interesting bait proteins goes down with study size, while we also find that promiscuity in prey contributes more substantially in larger studies. We analyze interaction studies over time with respect to data in the Gene Ontology and find that reproducibly observed interactions are less likely to favor multifunctional proteins. We find that strong alignment between co-expression and protein-protein interaction data occurs only for extreme co-expression values, and use this data to suggest candidates for targets likely to reveal novel biology in follow-up studies. BIOLOGICAL SIGNIFICANCE Protein-protein interaction data finds particularly heavy use in the interpretation of disease-causal variants. In principle, network data allows researchers to find novel commonalities among candidate genes. In this study, we detail several of the most salient biases contributing to aggregated protein-protein interaction databases. We find strong evidence for the role of selection and laboratory biases. Many of these effects contribute to the commonalities researchers find for disease genes. In order for characterization of disease genes and their interactions to not simply be an artifact of researcher preference, it is imperative to identify data biases explicitly. Based on this, we also suggest ways to move forward in producing candidates less influenced by prior knowledge. This article is part of a Special Issue entitled: Can Proteomics Fill the Gap Between Genomics and Phenotypes?
Affiliation(s)
- Jesse Gillis
- Cold Spring Harbor Laboratory, Stanley Institute for Cognitive Genomics, 500 Sunnyside Boulevard, Woodbury, NY 11797, United States.
| | - Sara Ballouz
- Cold Spring Harbor Laboratory, Stanley Institute for Cognitive Genomics, 500 Sunnyside Boulevard, Woodbury, NY 11797, United States.
| | - Paul Pavlidis
- Department of Psychiatry and Centre for High-Throughput Biology, University of British Columbia, 2185 East Mall., Vancouver, BC V6T 1Z4, Canada.
|
436
|
Groop L, Pociot F. Genetics of diabetes--are we missing the genes or the disease? Mol Cell Endocrinol 2014; 382:726-739. [PMID: 23587769 DOI: 10.1016/j.mce.2013.04.002] [Citation(s) in RCA: 108] [Impact Index Per Article: 9.8] [Received: 04/04/2012] [Revised: 01/25/2013] [Accepted: 04/02/2013] [Indexed: 12/20/2022]
Abstract
Diabetes is a group of metabolic diseases characterized by hyperglycemia resulting from defects in insulin secretion, insulin action, or both. The chronic hyperglycemia of diabetes is associated with long-term damage, dysfunction, and failure of different organs, especially the eyes, kidneys, nerves, heart, and blood vessels. Several pathogenic processes are involved in the development of diabetes. These range from autoimmune destruction of the beta-cells of the pancreas with consequent insulin deficiency to abnormalities that result in resistance to insulin action (American Diabetes Association, 2011). The vast majority of cases of diabetes fall into two broad categories. In type 1 diabetes (T1D), the cause is an absolute deficiency of insulin secretion, whereas in type 2 diabetes (T2D), the cause is a combination of resistance to insulin action and an inadequate compensatory insulin secretory response. However, the subdivision into two main categories represents a simplification of the real situation, and research in recent years has shown that the disease is much more heterogeneous than a simple subdivision into two major subtypes assumes. Worldwide prevalence figures estimate that there were 280 million diabetic patients in 2011 and that there will be more than 500 million in 2030 (http://www.diabetesatlas.org/). In Europe, about 6-8% of the population suffers from diabetes; of these, about 90% have T2D and 10% T1D, making T2D the fastest increasing disease in Europe and worldwide. This epidemic has been ascribed to a collision between the genes and the environment. While our knowledge about the genes is clearly better for T1D than for T2D given the strong contribution of variation in the HLA region to the risk of T1D, the opposite is the case for T2D, where our knowledge about the environmental triggers (obesity, lack of exercise) is much better than the understanding of the underlying genetic causes.
This lack of knowledge about the underlying genetic causes of diabetes is often referred to as missing heritability (Manolio et al., 2009), which exceeds 80% for T2D but is less than 25% for T1D. In the following review, we will discuss potential sources of this missing heritability, including the possibility that our definition of diabetes and its subgroups is imprecise, thereby making the identification of genetic causes difficult.
Affiliation(s)
- Leif Groop
- Department of Clinical Sciences, Diabetes and Endocrinology, Lund University, University Hospital Skåne, Malmö, Sweden; Glostrup Research Institute, Glostrup University Hospital, Glostrup, Denmark.
- Flemming Pociot
- Department of Clinical Sciences, Diabetes and Endocrinology, Lund University, University Hospital Skåne, Malmö, Sweden; Glostrup Research Institute, Glostrup University Hospital, Glostrup, Denmark.
|
437
|
Qian Y, Besenbacher S, Mailund T, Schierup MH. Identifying disease associated genes by network propagation. BMC SYSTEMS BIOLOGY 2014; 8 Suppl 1:S6. [PMID: 24565229 PMCID: PMC4080512 DOI: 10.1186/1752-0509-8-s1-s6] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Indexed: 01/21/2023]
Abstract
Background: Genome-wide association studies have identified many individual genes associated with complex traits. However, pathway and network information have not been fully exploited in searches for genetic determinants, and including this information may increase our understanding of the underlying biology of common diseases. Results: In this study, we propose a framework to address this problem in a principled way, with the underlying hypothesis that complex disease operates through multiple connected genes. Associations inferred from GWAS are translated into prior scores for vertices in a protein-protein interaction network, and these scores are propagated through the network. Permutation is used to select genes that are guilty-by-association and thus consistently obtain high scores after network propagation. We apply the approach to data of Crohn's disease and call candidate genes that have been reported by other independent GWAS, but not in the analysed data set. A prediction model based on these candidate genes shows good predictive power as measured by the area under the receiver operating characteristic curve (AUC) in 10-fold cross-validation. Conclusions: Our network propagation method applied to a genome-wide association study increases association findings over other approaches.
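The propagate-then-permute idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy network, the prior values, the damping parameter `alpha`, and the function names `propagate` and `permutation_pvalues` are all assumptions chosen for illustration.

```python
import random

# Hypothetical toy protein-protein interaction network (undirected adjacency lists).
adj = {
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["A", "B", "D"],
    "D": ["C", "E"],
    "E": ["D"],
}
# Hypothetical prior scores, standing in for GWAS-derived gene-level evidence.
prior = {"A": 1.0, "B": 0.8, "C": 0.0, "D": 0.0, "E": 0.1}


def propagate(adj, prior, alpha=0.5, n_iter=50):
    """Iterate F <- alpha * W F + (1 - alpha) * prior.

    W spreads each node's score equally over its neighbors, so total score
    is conserved as long as every node has at least one neighbor.
    """
    scores = dict(prior)
    for _ in range(n_iter):
        updated = {}
        for node in adj:
            inflow = sum(scores[nb] / len(adj[nb]) for nb in adj[node])
            updated[node] = alpha * inflow + (1 - alpha) * prior[node]
        scores = updated
    return scores


def permutation_pvalues(adj, prior, n_perm=200, seed=0, **kwargs):
    """Empirical p-value per node: how often a shuffled prior gives a score
    at least as high as the observed one after propagation."""
    rng = random.Random(seed)
    observed = propagate(adj, prior, **kwargs)
    nodes = list(adj)
    values = [prior[n] for n in nodes]
    exceed = {n: 0 for n in nodes}
    for _ in range(n_perm):
        rng.shuffle(values)  # reassign the same prior values to random nodes
        null = propagate(adj, dict(zip(nodes, values)), **kwargs)
        for n in nodes:
            if null[n] >= observed[n]:
                exceed[n] += 1
    # Add-one correction keeps p-values strictly positive.
    return {n: (exceed[n] + 1) / (n_perm + 1) for n in nodes}


scores = propagate(adj, prior)
pvals = permutation_pvalues(adj, prior)
for node in sorted(scores, key=scores.get, reverse=True):
    print(node, round(scores[node], 3), round(pvals[node], 3))
```

In this toy example, node C has no prior evidence of its own but gains score from its well-supported neighbors A and B, which is the guilt-by-association effect the abstract refers to.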
|
438
|
Chung MI, Kwon T, Tu F, Brooks ER, Gupta R, Meyer M, Baker JC, Marcotte EM, Wallingford JB. Coordinated genomic control of ciliogenesis and cell movement by RFX2. eLife 2014; 3:e01439. [PMID: 24424412 PMCID: PMC3889689 DOI: 10.7554/elife.01439] [Citation(s) in RCA: 103] [Impact Index Per Article: 9.4] [Received: 08/28/2013] [Accepted: 11/27/2013] [Indexed: 12/16/2022] Open
Abstract
The mechanisms linking systems-level programs of gene expression to discrete cell biological processes in vivo remain poorly understood. In this study, we have defined such a program for multi-ciliated epithelial cells (MCCs), a cell type critical for proper development and homeostasis of the airway, brain and reproductive tracts. Starting from genomic analysis of the cilia-associated transcription factor Rfx2, we used bioinformatics and in vivo cell biological approaches to gain insights into the molecular basis of cilia assembly and function. Moreover, we discovered a previously unrecognized role for an Rfx factor in cell movement, finding that Rfx2 cell-autonomously controls apical surface expansion in nascent MCCs. Thus, Rfx2 coordinates multiple, distinct gene expression programs in MCCs, regulating genes that control cell movement, ciliogenesis, and cilia function. As such, the work serves as a paradigm for understanding genomic control of cell biological processes that span from early cell morphogenetic events to terminally differentiated cellular functions. DOI: http://dx.doi.org/10.7554/eLife.01439.001.
Affiliation(s)
- Mei-I Chung
- Department of Molecular Biosciences, University of Texas at Austin, Austin, United States
- Taejoon Kwon
- Department of Molecular Biosciences, University of Texas at Austin, Austin, United States
- Fan Tu
- Department of Molecular Biosciences, University of Texas at Austin, Austin, United States
- Eric R Brooks
- Department of Molecular Biosciences, University of Texas at Austin, Austin, United States
- Rakhi Gupta
- Department of Genetics, Stanford University, Stanford, United States
- Matthew Meyer
- Department of Molecular Biosciences, University of Texas at Austin, Austin, United States
- Julie C Baker
- Department of Genetics, Stanford University, Stanford, United States
- Edward M Marcotte
- Department of Molecular Biosciences, University of Texas at Austin, Austin, United States
- Center for Systems and Synthetic Biology, University of Texas at Austin, Austin, United States
- Institute for Cellular and Molecular Biology, University of Texas at Austin, Austin, United States
- John B Wallingford
- Department of Molecular Biosciences, University of Texas at Austin, Austin, United States
- Center for Systems and Synthetic Biology, University of Texas at Austin, Austin, United States
- Institute for Cellular and Molecular Biology, University of Texas at Austin, Austin, United States
- Howard Hughes Medical Institute, University of Texas at Austin, Austin, United States
|
439
|
Wu B, Xie J, Du Z, Wu J, Zhang P, Xu L, Li E. PPI network analysis of mRNA expression profile of ezrin knockdown in esophageal squamous cell carcinoma. BIOMED RESEARCH INTERNATIONAL 2014; 2014:651954. [PMID: 25126570 PMCID: PMC4122099 DOI: 10.1155/2014/651954] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Received: 04/03/2014] [Revised: 06/13/2014] [Accepted: 06/17/2014] [Indexed: 02/05/2023]
Abstract
Ezrin, which encodes the protein EZR that cross-links actin filaments, is overexpressed and is involved in invasion, metastasis, and poor prognosis in various cancers, including esophageal squamous cell carcinoma (ESCC). In our previous study, Ezrin was knocked down and the resulting mRNA expression profile was analyzed, but that profile has not been fully mined. In this study, we applied protein-protein interaction (PPI) network knowledge and methods to deepen our understanding of the differentially expressed genes (DEGs) in that profile. PPI subnetworks showed that hundreds of DEGs interact with thousands of other proteins. Subcellular localization analyses found that the DEGs and their directly or indirectly interacting proteins are distributed in multiple layers, which was applied to analyze the shortest paths between EZR and other DEGs. Gene ontology annotation generated a functional annotation map and found hundreds of significant terms, especially those associated with cytoskeleton organization of Ezrin protein, such as "cytoskeleton organization," "regulation of actin filament-based process," and "regulation of actin cytoskeleton organization." The Random Walk with Restart algorithm was applied to prioritize the DEGs and identified several cancer-related DEGs ranked closest to EZR. These PPI network-based analyses have greatly expanded our comprehension of the mRNA expression profile of Ezrin knockdown for future examination of the roles and mechanisms of Ezrin.
Affiliation(s)
- Bingli Wu
- Department of Biochemistry and Molecular Biology, Shantou University Medical College, Shantou 515041, China
- Jianjun Xie
- Department of Biochemistry and Molecular Biology, Shantou University Medical College, Shantou 515041, China
- Zepeng Du
- Department of Pathology, Shantou Central Hospital, Shantou 515041, China
- Jianyi Wu
- Department of Biochemistry and Molecular Biology, Shantou University Medical College, Shantou 515041, China
- Pixian Zhang
- Department of Biochemistry and Molecular Biology, Shantou University Medical College, Shantou 515041, China
- Liyan Xu
- Institute of Oncologic Pathology, Shantou University Medical College, Shantou 515041, China
- Enmin Li
- Department of Biochemistry and Molecular Biology, Shantou University Medical College, Shantou 515041, China
|
440
|
Yu D, Kim M, Xiao G, Hwang TH. Review of biological network data and its applications. Genomics Inform 2013; 11:200-10. [PMID: 24465231 PMCID: PMC3897847 DOI: 10.5808/gi.2013.11.4.200] [Citation(s) in RCA: 65] [Impact Index Per Article: 5.4] [Received: 10/15/2013] [Revised: 11/20/2013] [Accepted: 11/21/2013] [Indexed: 12/16/2022] Open
Abstract
Studying biological networks, such as protein-protein interactions, is key to understanding complex biological activities. Various types of large-scale biological datasets have been collected and analyzed with high-throughput technologies, including DNA microarray, next-generation sequencing, and the two-hybrid screening system, for this purpose. In this review, we focus on network-based approaches that help in understanding biological systems and identifying biological functions. Accordingly, this paper covers two major topics in network biology: reconstruction of gene regulatory networks and network-based applications, including protein function prediction, disease gene prioritization, and network-based genome-wide association study.
Affiliation(s)
- Donghyeon Yu
- Department of Clinical Sciences, Quantitative Biomedical Research Center, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Minsoo Kim
- Department of Clinical Sciences, Quantitative Biomedical Research Center, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Guanghua Xiao
- Department of Clinical Sciences, Quantitative Biomedical Research Center, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Tae Hyun Hwang
- Department of Clinical Sciences, Quantitative Biomedical Research Center, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
|
441
|
Hou L, Chen M, Zhang CK, Cho J, Zhao H. Guilt by rewiring: gene prioritization through network rewiring in genome wide association studies. Hum Mol Genet 2013; 23:2780-90. [PMID: 24381306 DOI: 10.1093/hmg/ddt668] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.3] [Indexed: 12/22/2022] Open
Abstract
Although Genome Wide Association Studies (GWAS) have identified many susceptibility loci for common diseases, they only explain a small portion of heritability. It is challenging to identify the remaining disease loci because their association signals are likely weak and difficult to identify among millions of candidates. One potentially useful direction to increase statistical power is to incorporate functional genomics information, especially gene expression networks, to prioritize GWAS signals. Most current methods utilizing network information to prioritize disease genes are based on the 'guilt by association' principle, in which networks are treated as static, and disease-associated genes are assumed to lie closer to each other than random pairs in the network. In contrast, we propose a novel 'guilt by rewiring' principle. Studying the dynamics of gene networks between controls and patients, this principle assumes that disease genes are more likely to undergo rewiring in patients, whereas most of the network remains unaffected in the disease condition. To demonstrate this principle, we consider the changes of co-expression networks in Crohn's disease patients and controls, and how network dynamics reveal information on disease associations. Our results demonstrate that network rewiring is abundant in the immune system, and disease-associated genes are more likely to be rewired in patients. To combine this network rewiring feature with GWAS signals, we propose using the Markov random field framework to integrate network information and prioritize genes. Applications in Crohn's disease and Parkinson's disease show that this framework leads to more replicable results, and implicates potentially disease-associated pathways.
Affiliation(s)
- Lin Hou
- Department of Biostatistics, Yale School of Public Health, New Haven, CT 06510, USA
|
442
|
Cheng J, Wu W, Zhang Y, Li X, Jiang X, Wei G, Tao S. A new computational strategy for predicting essential genes. BMC Genomics 2013; 14:910. [PMID: 24359534 PMCID: PMC3880044 DOI: 10.1186/1471-2164-14-910] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.4] [Received: 06/15/2013] [Accepted: 11/29/2013] [Indexed: 12/17/2022] Open
Abstract
Background: Determination of the minimum gene set for cellular life is one of the central goals in biology. Genome-wide essential gene identification has progressed rapidly in certain bacterial species; however, it remains difficult to achieve in most eukaryotic species. Several computational models have recently been developed to integrate gene features and used as alternatives to transfer gene essentiality annotations between organisms. Results: We first collected features that were widely used by previous predictive models and assessed the relationships between gene features and gene essentiality using a stepwise regression model. We found two issues that could significantly reduce model accuracy: (i) the effect of multicollinearity among gene features and (ii) the diverse and even contrasting correlations between gene features and gene essentiality existing within and among different species. To address these issues, we developed a novel model called the feature-based weighted Naïve Bayes model (FWM), which is based on Naïve Bayes classifiers, logistic regression, and a genetic algorithm. The proposed model assesses features and filters out the effects of multicollinearity and diversity. The performance of FWM was compared with other popular models, such as support vector machine, Naïve Bayes model, and logistic regression model, by applying FWM to reciprocally predict essential genes among and within 21 species. Our results showed that FWM significantly improves the accuracy and robustness of essential gene prediction. Conclusions: FWM can remarkably improve the accuracy of essential gene prediction and may be used as an alternative method for other classification work. This method can contribute substantially to the knowledge of the minimum gene sets required for living organisms and the discovery of new drug targets.
Affiliation(s)
- Gehong Wei
- College of Life Science, State Key Laboratory of Crop Stress Biology for Arid Areas, Northwest A&F University, Yangling, Shaanxi, China.
|
443
|
Integrative approaches for finding modular structure in biological networks. Nat Rev Genet 2013; 14:719-32. [PMID: 24045689 DOI: 10.1038/nrg3552] [Citation(s) in RCA: 363] [Impact Index Per Article: 30.3] [Indexed: 12/16/2022]
Abstract
A central goal of systems biology is to elucidate the structural and functional architecture of the cell. To this end, large and complex networks of molecular interactions are being rapidly generated for humans and model organisms. A recent focus of bioinformatics research has been to integrate these networks with each other and with diverse molecular profiles to identify sets of molecules and interactions that participate in a common biological function - that is, 'modules'. Here, we classify such integrative approaches into four broad categories, describe their bioinformatic principles and review their applications.
|
444
|
Carter H, Hofree M, Ideker T. Genotype to phenotype via network analysis. Curr Opin Genet Dev 2013; 23:611-21. [PMID: 24238873 PMCID: PMC3866044 DOI: 10.1016/j.gde.2013.10.003] [Citation(s) in RCA: 85] [Impact Index Per Article: 7.1] [Received: 06/07/2013] [Revised: 10/04/2013] [Accepted: 10/09/2013] [Indexed: 02/06/2023]
Abstract
A prime objective of genomic medicine is the identification of disease-causing mutations and the mechanisms by which such events result in disease. As most disease phenotypes arise not from single genes and proteins but from a complex network of molecular interactions, a priori knowledge about the molecular network serves as a framework for biological inference and data mining. Here we review recent developments at the interface of biological networks and mutation analysis. We examine how mutations may be treated as a perturbation of the molecular interaction network and what insights may be gained from taking this perspective. We review work that aims to transform static networks into rich context-dependent networks and recent attempts to integrate non-coding RNAs into such analysis. Finally, we conclude with an overview of the many challenges and opportunities that lie ahead.
Affiliation(s)
- Hannah Carter
- Institute for Genomic Medicine and Department of Medicine, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093, United States
|
445
|
Leiserson MDM, Eldridge JV, Ramachandran S, Raphael BJ. Network analysis of GWAS data. Curr Opin Genet Dev 2013; 23:602-10. [PMID: 24287332 PMCID: PMC3867794 DOI: 10.1016/j.gde.2013.09.003] [Citation(s) in RCA: 66] [Impact Index Per Article: 5.5] [Received: 07/19/2013] [Revised: 09/19/2013] [Accepted: 09/23/2013] [Indexed: 02/07/2023]
Abstract
Genome-wide association studies (GWAS) identify genetic variants that distinguish a control population from a population with a specific trait. Two challenges in GWAS are: (1) identification of the causal variant within a longer haplotype that is associated with the trait; (2) identification of causal variants for polygenic traits that are caused by variants in multiple genes within a pathway. We review recent methods that use information in protein-protein and protein-DNA interaction networks to address these two challenges.
Affiliation(s)
- Mark D M Leiserson
- Department of Computer Science, Brown University, Providence, RI 02912, United States; Center for Computational Molecular Biology, Brown University, Providence, RI 02912, United States
|
446
|
Fuxman Bass JI, Diallo A, Nelson J, Soto JM, Myers CL, Walhout AJM. Using networks to measure similarity between genes: association index selection. Nat Methods 2013; 10:1169-76. [PMID: 24296474 PMCID: PMC3959882 DOI: 10.1038/nmeth.2728] [Citation(s) in RCA: 164] [Impact Index Per Article: 13.7] [Received: 01/14/2013] [Accepted: 07/22/2013] [Indexed: 02/08/2023]
Abstract
Biological networks can be used to functionally annotate genes on the basis of interaction-profile similarities. Metrics known as association indices can be used to quantify interaction-profile similarity. We provide an overview of commonly used association indices, including the Jaccard index and the Pearson correlation coefficient, and compare their performance in different types of analyses of biological networks. We introduce the Guide for Association Index for Networks (GAIN), a web tool for calculating and comparing interaction-profile similarities and defining modules of genes with similar profiles.
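As a rough illustration of the two association indices named in this abstract, the sketch below computes the Jaccard index and the Pearson correlation coefficient between interaction profiles. It is not the GAIN tool's code: the gene and partner names and the helper `binary_profile` are hypothetical, and the Pearson correlation is applied here to binary presence/absence vectors as one possible convention.

```python
from math import sqrt


def jaccard(partners_a, partners_b):
    """Shared interaction partners divided by all partners of either gene."""
    a, b = set(partners_a), set(partners_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pearson(x, y):
    """Pearson correlation of two equal-length numeric profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    var_y = sum((yi - mean_y) ** 2 for yi in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant profile
    return cov / sqrt(var_x * var_y)


def binary_profile(partners, universe):
    """Presence/absence vector of a gene's partners over a fixed partner list."""
    return [1 if p in partners else 0 for p in universe]


# Hypothetical interaction data: gene -> set of interaction partners.
interactions = {
    "g1": {"p1", "p2", "p3"},
    "g2": {"p2", "p3", "p4"},
    "g3": {"p5"},
}
universe = sorted(set().union(*interactions.values()))

print(jaccard(interactions["g1"], interactions["g2"]))  # 2 shared of 4 total
print(pearson(binary_profile(interactions["g1"], universe),
              binary_profile(interactions["g2"], universe)))
```

The two indices can rank gene pairs differently: the Jaccard index ignores partners that neither gene has, while the Pearson correlation counts shared absences as agreement, so the choice of partner universe affects its value.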
Affiliation(s)
- Juan I Fuxman Bass
- [1] Program in Systems Biology, University of Massachusetts Medical School, Worcester, Massachusetts, USA. [2] Program in Molecular Medicine, University of Massachusetts Medical School, Worcester, Massachusetts, USA
|
447
|
Blair DR, Lyttle CS, Mortensen JM, Bearden CF, Jensen AB, Khiabanian H, Melamed R, Rabadan R, Bernstam EV, Brunak S, Jensen LJ, Nicolae D, Shah NH, Grossman RL, Cox NJ, White KP, Rzhetsky A. A nondegenerate code of deleterious variants in Mendelian loci contributes to complex disease risk. Cell 2013; 155:70-80. [PMID: 24074861 DOI: 10.1016/j.cell.2013.08.030] [Citation(s) in RCA: 142] [Impact Index Per Article: 11.8] [Received: 12/17/2012] [Revised: 03/30/2013] [Accepted: 08/16/2013] [Indexed: 12/19/2022]
Abstract
Although countless highly penetrant variants have been associated with Mendelian disorders, the genetic etiologies underlying complex diseases remain largely unresolved. By mining the medical records of over 110 million patients, we examine the extent to which Mendelian variation contributes to complex disease risk. We detect thousands of associations between Mendelian and complex diseases, revealing a nondegenerate, phenotypic code that links each complex disorder to a unique collection of Mendelian loci. Using genome-wide association results, we demonstrate that common variants associated with complex diseases are enriched in the genes indicated by this "Mendelian code." Finally, we detect hundreds of comorbidity associations among Mendelian disorders, and we use probabilistic genetic modeling to demonstrate that Mendelian variants likely contribute nonadditively to the risk for a subset of complex diseases. Overall, this study illustrates a complementary approach for mapping complex disease loci and provides unique predictions concerning the etiologies of specific diseases.
Affiliation(s)
- David R Blair
- Committee on Genetics, Genomics, and Systems Biology, University of Chicago, Chicago, IL 60637, USA
|
448
|
Teng L, He B, Gao P, Gao L, Tan K. Discover context-specific combinatorial transcription factor interactions by integrating diverse ChIP-Seq data sets. Nucleic Acids Res 2013; 42:e24. [PMID: 24217919 PMCID: PMC3936738 DOI: 10.1093/nar/gkt1105] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.4] [Indexed: 02/06/2023] Open
Abstract
Combinatorial interactions among transcription factors (TFs) are critical for integrating diverse intrinsic and extrinsic signals, fine-tuning regulatory output and increasing the robustness and plasticity of regulatory systems. Current knowledge about combinatorial regulation is rather limited due to the lack of suitable experimental technologies and bioinformatics tools. The rapid accumulation of ChIP-Seq data has provided genome-wide occupancy maps for a large number of TFs and chromatin modification marks for identifying enhancers without knowing individual TF binding sites. Integration of the two data types has not been researched extensively, resulting in underused data and missed opportunities. We describe a novel method for discovering frequent combinatorial occupancy patterns by multiple TFs at enhancers. Our method is based on probabilistic item set mining and takes into account uncertainty in both types of ChIP-Seq data. By joint analysis of 108 TFs in four human cell types, we found that cell type-specific interactions among TFs are abundant and that the majority of enhancers have flexible architecture. We show that several families of transposable elements disproportionately overlap with enhancers with combinatorial patterns, suggesting that these transposable element families play an important role in the evolution of combinatorial regulation.
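One common way to mine item sets from uncertain data is to rank them by expected support; the sketch below shows that idea on made-up occupancy probabilities. Whether the authors use this exact criterion is not stated in the abstract, so treat it as an assumption. The TF names, the probabilities, the independence assumption between TFs, and the brute-force enumeration (no Apriori-style pruning) are all illustrative choices.

```python
from itertools import combinations
from math import prod


def expected_support(itemset, enhancers):
    """Expected number of enhancers co-occupied by every TF in the item set.

    Each enhancer maps TF name -> occupancy probability; TFs are treated as
    independent, so the joint probability is the product of the marginals.
    """
    return sum(prod(probs.get(tf, 0.0) for tf in itemset) for probs in enhancers)


def frequent_itemsets(enhancers, min_support, max_size=3):
    """Enumerate TF combinations whose expected support meets the threshold."""
    tfs = sorted({tf for probs in enhancers for tf in probs})
    found = {}
    for size in range(1, max_size + 1):
        for itemset in combinations(tfs, size):
            support = expected_support(itemset, enhancers)
            if support >= min_support:
                found[itemset] = support
    return found


# Hypothetical occupancy probabilities at three enhancers.
enhancers = [
    {"TF_A": 0.9, "TF_B": 0.8},
    {"TF_A": 0.7, "TF_B": 0.6, "TF_C": 0.5},
    {"TF_C": 0.9},
]

for itemset, support in frequent_itemsets(enhancers, min_support=1.0).items():
    print(itemset, round(support, 2))
```

With a threshold of 1.0, the only multi-TF pattern that survives in this toy data is the TF_A and TF_B pair, because both are likely to be bound at the first two enhancers.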
Affiliation(s)
- Li Teng
- Department of Internal Medicine, University of Iowa, Iowa City, IA 52242, USA, Interdisciplinary Graduate Program in Genetics, University of Iowa, Iowa City, IA 52242, USA and Department of Biomedical Engineering, University of Iowa, Iowa City, IA 52242, USA
|
449
|
Hofree M, Shen JP, Carter H, Gross A, Ideker T. Network-based stratification of tumor mutations. Nat Methods 2013; 10:1108-15. [PMID: 24037242 PMCID: PMC3866081 DOI: 10.1038/nmeth.2651] [Citation(s) in RCA: 530] [Impact Index Per Article: 44.2] [Received: 02/14/2013] [Accepted: 08/12/2013] [Indexed: 12/30/2022]
Abstract
Many forms of cancer have multiple subtypes with different causes and clinical outcomes. Somatic tumor genome sequences provide a rich new source of data for uncovering these subtypes but have proven difficult to compare, as two tumors rarely share the same mutations. Here we introduce network-based stratification (NBS), a method to integrate somatic tumor genomes with gene networks. This approach allows for stratification of cancer into informative subtypes by clustering together patients with mutations in similar network regions. We demonstrate NBS in ovarian, uterine and lung cancer cohorts from The Cancer Genome Atlas. For each tissue, NBS identifies subtypes that are predictive of clinical outcomes such as patient survival, response to therapy or tumor histology. We identify network regions characteristic of each subtype and show how mutation-derived subtypes can be used to train an mRNA expression signature, which provides similar information in the absence of DNA sequence.
Affiliation(s)
- Matan Hofree
- Department of Computer Science and Engineering, University of California, San Diego, La Jolla, California USA
- John P Shen
- Department of Medicine, University of California, San Diego, La Jolla, California USA
- Hannah Carter
- Department of Medicine, University of California, San Diego, La Jolla, California USA
- Andrew Gross
- Department of Bioengineering, University of California, San Diego, La Jolla, California USA
- Trey Ideker
- Department of Computer Science and Engineering, University of California, San Diego, La Jolla, California USA
- Department of Medicine, University of California, San Diego, La Jolla, California USA
- Department of Bioengineering, University of California, San Diego, La Jolla, California USA
|
450
|
Pavlidis P, Gillis J. Progress and challenges in the computational prediction of gene function using networks: 2012-2013 update. F1000Res 2013; 2:230. [PMID: 24715959 PMCID: PMC3962002 DOI: 10.12688/f1000research.2-230.v1] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.3] [Accepted: 10/21/2013] [Indexed: 12/12/2022] Open
Abstract
In an opinion published in 2012, we reviewed and discussed our studies of how gene network-based guilt-by-association (GBA) is impacted by confounds related to gene multifunctionality. We found that such confounds account for a significant part of the GBA signal, and as a result, meaningfully evaluating and applying computationally guided GBA is more challenging than generally appreciated. We proposed that effort currently spent on incrementally improving algorithms would be better spent in identifying the features of data that do yield novel functional insights. We also suggested that part of the problem is the reliance by computational biologists on gold standard annotations such as the Gene Ontology. In the year since, there has been continued heavy activity in GBA-based research, including work that contributes to our understanding of the issues we raised. Here we review some of the most relevant recent work, including studies that point to new areas of progress and challenges.
Affiliation(s)
- Paul Pavlidis
- Centre for High-Throughput Biology and Department of Psychiatry, University of British Columbia, Vancouver, V6T1Z4, Canada
- Jesse Gillis
- Stanley Institute for Cognitive Genomics, Cold Spring Harbor Laboratory, Woodbury, NY, 11797, USA
|