1
Fischer SN, Claussen ER, Kourtis S, Sdelci S, Orchard S, Hermjakob H, Kustatscher G, Drew K. hu.MAP3.0: atlas of human protein complexes by integration of >25,000 proteomic experiments. Mol Syst Biol 2025:10.1038/s44320-025-00121-5. [PMID: 40425816 DOI: 10.1038/s44320-025-00121-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2024] [Revised: 05/07/2025] [Accepted: 05/09/2025] [Indexed: 05/29/2025] Open
Abstract
Macromolecular protein complexes carry out most cellular functions. Unfortunately, we lack the subunit composition for many human protein complexes. To address this gap we integrated >25,000 mass spectrometry experiments using a machine learning approach to identify >15,000 human protein complexes. We show our map of protein complexes is highly accurate and more comprehensive than previous maps, placing nearly 70% of human proteins into their physical contexts. We globally characterize our complexes using mass spectrometry-based protein covariation data (ProteomeHD.2) and identify covarying complexes suggesting common functional associations. hu.MAP3.0 generates testable functional hypotheses for 472 uncharacterized proteins which we support using AlphaFold modeling. Additionally, we use AlphaFold modeling to identify 5871 mutually exclusive proteins in hu.MAP3.0 complexes suggesting complexes serve different functional roles depending on their subunit composition. We identify expression as the primary way cells and organisms relieve the conflict of mutually exclusive subunits. Finally, we import our complexes to EMBL-EBI's Complex Portal (https://www.ebi.ac.uk/complexportal/home) and provide complexes through our hu.MAP3.0 web interface (https://humap3.proteincomplexes.org/). We expect our resource to be highly impactful to the broader research community.
Affiliation(s)
- Samantha N Fischer
  - Department of Biological Sciences, University of Illinois at Chicago, Chicago, IL, 60607, USA
- Erin R Claussen
  - Department of Biological Sciences, University of Illinois at Chicago, Chicago, IL, 60607, USA
- Savvas Kourtis
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Sara Sdelci
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Sandra Orchard
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Henning Hermjakob
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Georg Kustatscher
  - Centre for Cell Biology, University of Edinburgh, Edinburgh, EH9 3BF, UK
- Kevin Drew
  - Department of Biological Sciences, University of Illinois at Chicago, Chicago, IL, 60607, USA
2
Fischer SN, Claussen ER, Kourtis S, Sdelci S, Orchard S, Hermjakob H, Kustatscher G, Drew K. hu.MAP3.0: Atlas of human protein complexes by integration of >25,000 proteomic experiments. bioRxiv 2024:2024.10.11.617930. [PMID: 39464102 PMCID: PMC11507723 DOI: 10.1101/2024.10.11.617930] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/29/2024]
Abstract
Macromolecular protein complexes carry out most functions in the cell including essential functions required for cell survival. Unfortunately, we lack the subunit composition for all human protein complexes. To address this gap we integrated >25,000 mass spectrometry experiments using a machine learning approach to identify >15,000 human protein complexes. We show our map of protein complexes is highly accurate and more comprehensive than previous maps, placing ~75% of human proteins into their physical contexts. We globally characterize our complexes using protein co-variation data (ProteomeHD.2) and identify co-varying complexes suggesting common functional associations. Our map also generates testable functional hypotheses for 472 uncharacterized proteins which we support using AlphaFold modeling. Additionally, we use AlphaFold modeling to identify 511 mutually exclusive protein pairs in hu.MAP3.0 complexes suggesting complexes serve different functional roles depending on their subunit composition. We identify expression as the primary way cells and organisms relieve the conflict of mutually exclusive subunits. Finally, we import our complexes to EMBL-EBI's Complex Portal (https://www.ebi.ac.uk/complexportal/home) as well as provide complexes through our hu.MAP3.0 web interface (https://humap3.proteincomplexes.org/). We expect our resource to be highly impactful to the broader research community.
Affiliation(s)
- Samantha N. Fischer
  - Department of Biological Sciences, University of Illinois at Chicago, Chicago, IL 60607
- Erin R. Claussen
  - Department of Biological Sciences, University of Illinois at Chicago, Chicago, IL 60607
- Savvas Kourtis
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Sara Sdelci
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Sandra Orchard
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Henning Hermjakob
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Georg Kustatscher
  - Wellcome Centre for Cell Biology, University of Edinburgh, Edinburgh EH9 3BF, UK
- Kevin Drew
  - Department of Biological Sciences, University of Illinois at Chicago, Chicago, IL 60607
3
Perrone MC, Lerner MG, Dunworth M, Ewald AJ, Bader JS. Prioritizing drug targets by perturbing biological network response functions. PLoS Comput Biol 2024; 20:e1012195. [PMID: 38935814 PMCID: PMC11236158 DOI: 10.1371/journal.pcbi.1012195] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2023] [Revised: 07/10/2024] [Accepted: 05/24/2024] [Indexed: 06/29/2024] Open
Abstract
Therapeutic interventions are designed to perturb the function of a biological system. However, there are many types of proteins that cannot be targeted with conventional small molecule drugs. Accordingly, many identified gene-regulatory drivers and downstream effectors are currently undruggable. Drivers and effectors are often connected by druggable signaling and regulatory intermediates. Methods to identify druggable intermediates therefore have general value in expanding the set of targets available for hypothesis-driven validation. Here we identify and prioritize potential druggable intermediates by developing a network perturbation theory, termed NetPert, for response functions of biological networks. Dynamics are defined by a network structure in which vertices represent genes and proteins, and edges represent gene-regulatory interactions and protein-protein interactions. Perturbation theory for network dynamics prioritizes targets that interfere with signaling from driver to response genes. Applications to organoid models for metastatic breast cancer demonstrate the ability of this mathematical framework to identify and prioritize druggable intermediates. While the short-time limit of the perturbation theory resembles betweenness centrality, NetPert is superior in generating target rankings that correlate with previous wet-lab assays and are more robust to incomplete or noisy network data. NetPert also performs better than a related graph diffusion approach. Wet-lab assays demonstrate that drugs for targets identified by NetPert, including targets that are not themselves differentially expressed, are active in suppressing additional metastatic phenotypes.
Affiliation(s)
- Matthew C. Perrone
  - Institute for Computational Medicine and Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland, United States of America
- Michael G. Lerner
  - Department of Physics, Engineering and Astronomy, Earlham College, Richmond, Indiana, United States of America
- Matthew Dunworth
  - Department of Cell Biology, Johns Hopkins University School of Medicine, Baltimore, Maryland, United States of America
- Andrew J. Ewald
  - Department of Cell Biology, Johns Hopkins University School of Medicine, Baltimore, Maryland, United States of America
  - Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Baltimore, Maryland, United States of America
  - Giovanis Institute for Translational Cell Biology, Johns Hopkins University School of Medicine, Baltimore, Maryland, United States of America
- Joel S. Bader
  - Institute for Computational Medicine and Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland, United States of America
  - Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Baltimore, Maryland, United States of America
  - Giovanis Institute for Translational Cell Biology, Johns Hopkins University School of Medicine, Baltimore, Maryland, United States of America
4
Cox RM, Papoulas O, Shril S, Lee C, Gardner T, Battenhouse AM, Lee M, Drew K, McWhite CD, Yang D, Leggere JC, Durand D, Hildebrandt F, Wallingford JB, Marcotte EM. Ancient eukaryotic protein interactions illuminate modern genetic traits and disorders. bioRxiv 2024:2024.05.26.595818. [PMID: 38853926 PMCID: PMC11160598 DOI: 10.1101/2024.05.26.595818] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2024]
Abstract
All eukaryotes share a common ancestor from roughly 1.5-1.8 billion years ago, a single-celled, swimming microbe known as LECA, the Last Eukaryotic Common Ancestor. Nearly half of the genes in modern eukaryotes were present in LECA, and many current genetic diseases and traits stem from these ancient molecular systems. To better understand these systems, we compared genes across modern organisms and identified a core set of 10,092 shared protein-coding gene families likely present in LECA, a quarter of which are uncharacterized. We then integrated >26,000 mass spectrometry proteomics analyses from 31 species to infer how these proteins interact in higher-order complexes. The resulting interactome describes the biochemical organization of LECA, revealing both known and new assemblies. We analyzed these ancient protein interactions to find new human gene-disease relationships for bone density and congenital birth defects, demonstrating the value of ancestral protein interactions for guiding functional genetics today.
Affiliation(s)
- Rachael M Cox
  - Department of Molecular Biosciences, The University of Texas at Austin, Austin, TX 78712, USA
- Ophelia Papoulas
  - Department of Molecular Biosciences, The University of Texas at Austin, Austin, TX 78712, USA
- Shirlee Shril
  - Division of Nephrology, Department of Pediatrics, Boston Children's Hospital, Harvard Medical School, Boston, MA 02215, USA
- Chanjae Lee
  - Department of Molecular Biosciences, The University of Texas at Austin, Austin, TX 78712, USA
- Tynan Gardner
  - Department of Molecular Biosciences, The University of Texas at Austin, Austin, TX 78712, USA
- Anna M Battenhouse
  - Department of Molecular Biosciences, The University of Texas at Austin, Austin, TX 78712, USA
- Muyoung Lee
  - Department of Molecular Biosciences, The University of Texas at Austin, Austin, TX 78712, USA
- Kevin Drew
  - Department of Biological Sciences, University of Illinois at Chicago, Chicago, IL 60607, USA
- Claire D McWhite
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA
- David Yang
  - Department of Molecular Biosciences, The University of Texas at Austin, Austin, TX 78712, USA
- Janelle C Leggere
  - Department of Molecular Biosciences, The University of Texas at Austin, Austin, TX 78712, USA
- Dannie Durand
  - Department of Biological Sciences, Carnegie Mellon University, 4400 5th Avenue Pittsburgh, PA 15213, USA
- Friedhelm Hildebrandt
  - Division of Nephrology, Department of Pediatrics, Boston Children's Hospital, Harvard Medical School, Boston, MA 02215, USA
- John B Wallingford
  - Department of Molecular Biosciences, The University of Texas at Austin, Austin, TX 78712, USA
- Edward M Marcotte
  - Department of Molecular Biosciences, The University of Texas at Austin, Austin, TX 78712, USA
5
Farooq QUA, Shaukat Z, Aiman S, Li CH. Protein-protein interactions: Methods, databases, and applications in virus-host study. World J Virol 2021; 10:288-300. [PMID: 34909403 PMCID: PMC8641042 DOI: 10.5501/wjv.v10.i6.288] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 03/15/2021] [Revised: 04/19/2021] [Accepted: 07/30/2021] [Indexed: 02/06/2023] Open
Abstract
Almost all cellular processes in a living system are controlled by proteins: they regulate gene expression, catalyze chemical reactions, transport small molecules across membranes, and transmit signals across membranes. Even a viral infection is often initiated through virus-host protein interactions. Protein-protein interactions (PPIs) are the physical contacts between two or more proteins and they represent complex biological functions. Nowadays, PPIs are used to construct PPI networks to study complex pathways and reveal the functions of unknown proteins. Scientists have used PPIs to find the molecular basis of certain diseases and also some potential drug targets. In this review, we discuss how PPI networks are essential to understanding the molecular basis of virus-host relationships and describe several databases dedicated to virus-host interaction studies. Here, we present a short but comprehensive review of PPIs, including the experimental and computational methods of finding PPIs, the databases dedicated to virus-host PPIs, and the various associated applications in protein interaction networks of some lethal viruses with their hosts.
Affiliation(s)
- Qurat ul Ain Farooq
  - Faculty of Environmental and Life Sciences, Beijing University of Technology, Beijing 100124, China
- Zeeshan Shaukat
  - Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
- Sara Aiman
  - Faculty of Environmental and Life Sciences, Beijing University of Technology, Beijing 100124, China
- Chun-Hua Li
  - Faculty of Environmental and Life Sciences, Beijing University of Technology, Beijing 100124, China
6
Drew K, Wallingford JB, Marcotte EM. hu.MAP 2.0: integration of over 15,000 proteomic experiments builds a global compendium of human multiprotein assemblies. Mol Syst Biol 2021; 17:e10016. [PMID: 33973408 PMCID: PMC8111494 DOI: 10.15252/msb.202010016] [Citation(s) in RCA: 79] [Impact Index Per Article: 19.8] [Received: 09/23/2020] [Revised: 04/08/2021] [Accepted: 04/09/2021] [Indexed: 12/30/2022] Open
Abstract
A general principle of biology is the self-assembly of proteins into functional complexes. Characterizing their composition is, therefore, required for our understanding of cellular functions. Unfortunately, we lack knowledge of the comprehensive set of identities of protein complexes in human cells. To address this gap, we developed a machine learning framework to identify protein complexes in over 15,000 mass spectrometry experiments which resulted in the identification of nearly 7,000 physical assemblies. We show our resource, hu.MAP 2.0, is more accurate and comprehensive than previous state of the art high-throughput protein complex resources and gives rise to many new hypotheses, including for 274 completely uncharacterized proteins. Further, we identify 253 promiscuous proteins that participate in multiple complexes pointing to possible moonlighting roles. We have made hu.MAP 2.0 easily searchable in a web interface (http://humap2.proteincomplexes.org/), which will be a valuable resource for researchers across a broad range of interests including systems biology, structural biology, and molecular explanations of disease.
Affiliation(s)
- Kevin Drew
  - Department of Molecular Biosciences, Center for Systems and Synthetic Biology, University of Texas, Austin, TX, USA
  - Present address: Department of Biological Sciences, University of Illinois at Chicago, Chicago, IL, USA
- John B Wallingford
  - Department of Molecular Biosciences, Center for Systems and Synthetic Biology, University of Texas, Austin, TX, USA
- Edward M Marcotte
  - Department of Molecular Biosciences, Center for Systems and Synthetic Biology, University of Texas, Austin, TX, USA
7
Rohde PD, Kristensen TN, Sarup P, Muñoz J, Malmendal A. Prediction of complex phenotypes using the Drosophila melanogaster metabolome. Heredity (Edinb) 2021; 126:717-732. [PMID: 33510469 PMCID: PMC8102504 DOI: 10.1038/s41437-021-00404-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 09/18/2020] [Revised: 01/04/2021] [Accepted: 01/04/2021] [Indexed: 01/30/2023] Open
Abstract
Understanding the genotype-phenotype map and how variation at different levels of biological organization is associated are central topics in modern biology. Fast developments in sequencing technologies and other molecular omic tools enable researchers to obtain detailed information on variation at DNA level and on intermediate endophenotypes, such as RNA, proteins and metabolites. This can facilitate our understanding of the link between genotypes and molecular and functional organismal phenotypes. Here, we use the Drosophila melanogaster Genetic Reference Panel and nuclear magnetic resonance (NMR) metabolomics to investigate the ability of the metabolome to predict organismal phenotypes. We performed NMR metabolomics on four replicate pools of male flies from each of 170 different isogenic lines. Our results show that metabolite profiles are variable among the investigated lines and that this variation is highly heritable. Second, we identify genes associated with metabolome variation. Third, using the metabolome gave better prediction accuracies than genomic information for four of five quantitative traits analyzed. Our comprehensive characterization of population-scale diversity of metabolomes and its genetic basis illustrates that metabolites have large potential as predictors of organismal phenotypes. This finding is of great importance, e.g., in human medicine, evolutionary biology and animal and plant breeding.
Affiliation(s)
- Palle Duun Rohde
  - Department of Chemistry and Bioscience, Aalborg University, Aalborg, Denmark
- Torsten Nygaard Kristensen
  - Department of Chemistry and Bioscience, Aalborg University, Aalborg, Denmark
  - Department of Animal Science, Aarhus University, Tjele, Denmark
- Pernille Sarup
  - Department of Molecular Biology and Genetics, Aarhus University, Tjele, Denmark
  - Nordic Seed A/S, Odder, Denmark
- Joaquin Muñoz
  - Department of Chemistry and Bioscience, Aalborg University, Aalborg, Denmark
- Anders Malmendal
  - Department of Science and Environment, Roskilde University, Roskilde, Denmark
8
Hristov BH, Chazelle B, Singh M. uKIN Combines New and Prior Information with Guided Network Propagation to Accurately Identify Disease Genes. Cell Syst 2020; 10:470-479.e3. [PMID: 32684276 PMCID: PMC7821437 DOI: 10.1016/j.cels.2020.05.008] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 04/01/2020] [Revised: 04/24/2020] [Accepted: 05/19/2020] [Indexed: 12/23/2022]
Abstract
Protein interaction networks provide a powerful framework for identifying genes causal for complex genetic diseases. Here, we introduce a general framework, uKIN, that uses prior knowledge of disease-associated genes to guide, within known protein-protein interaction networks, random walks that are initiated from newly identified candidate genes. In large-scale testing across 24 cancer types, we demonstrate that our network propagation approach for integrating both prior and new information not only better identifies cancer driver genes than using either source of information alone but also readily outperforms other state-of-the-art network-based approaches. We also apply our approach to genome-wide association data to identify genes functionally relevant for several complex diseases. Overall, our work suggests that guided network propagation approaches that utilize both prior and new data are a powerful means to identify disease genes. uKIN is freely available for download at: https://github.com/Singh-Lab/uKIN.
Affiliation(s)
- Borislav H Hristov
  - Department of Computer Science, Princeton University, Princeton, NJ 08544, USA
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA
- Bernard Chazelle
  - Department of Computer Science, Princeton University, Princeton, NJ 08544, USA
- Mona Singh
  - Department of Computer Science, Princeton University, Princeton, NJ 08544, USA
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA
9
Hwang S, Kim CY, Yang S, Kim E, Hart T, Marcotte EM, Lee I. HumanNet v2: human gene networks for disease research. Nucleic Acids Res 2019; 47:D573-D580. [PMID: 30418591 PMCID: PMC6323914 DOI: 10.1093/nar/gky1126] [Citation(s) in RCA: 114] [Impact Index Per Article: 22.8] [Received: 08/14/2018] [Accepted: 10/25/2018] [Indexed: 12/15/2022] Open
Abstract
Human gene networks have proven useful in many aspects of disease research, with numerous network-based strategies developed for generating hypotheses about gene-disease-drug associations. The ability to predict and organize genes most relevant to a specific disease has proven especially important. We previously developed a human functional gene network, HumanNet, by integrating diverse types of omics data using a Bayesian statistics framework and demonstrated its ability to retrieve disease genes. Here, we present HumanNet v2 (http://www.inetbio.org/humannet), a database of human gene networks, which was updated by incorporating new data types, extending data sources and improving network inference algorithms. HumanNet now comprises a hierarchy of human gene networks, allowing for more flexible incorporation of network information into studies. HumanNet performs well in ranking disease-linked gene sets with minimal literature-dependent biases. We observe that incorporating model organisms’ protein–protein interactions does not markedly improve disease gene predictions, suggesting that many of the disease gene associations are now captured directly in human-derived datasets. With an improved interactive user interface for disease network analysis, we expect HumanNet will be a useful resource for network medicine.
Affiliation(s)
- Sohyun Hwang
  - Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul 03722, Korea
  - Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, University of Texas, Austin, TX 78712, USA
  - Department of Biomedical Science, College of Life Science, CHA University, Seongnam-si 13496, Korea
- Chan Yeong Kim
  - Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul 03722, Korea
- Sunmo Yang
  - Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul 03722, Korea
- Eiru Kim
  - Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX USA
- Traver Hart
  - Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX USA
- Edward M Marcotte
  - Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, University of Texas, Austin, TX 78712, USA
  - Department of Molecular Biosciences, University of Texas at Austin, TX 78712, USA
- Insuk Lee
  - Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul 03722, Korea
10
Levy M, Sporns O, MacLean JN. Network Analysis of Murine Cortical Dynamics Implicates Untuned Neurons in Visual Stimulus Coding. Cell Rep 2020; 31:107483. [PMID: 32294431 PMCID: PMC7218481 DOI: 10.1016/j.celrep.2020.03.047] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 10/07/2019] [Revised: 01/22/2020] [Accepted: 03/13/2020] [Indexed: 02/02/2023] Open
Abstract
Unbiased and dense sampling of large populations of layer 2/3 pyramidal neurons in mouse primary visual cortex (V1) reveals two functional sub-populations: neurons tuned and untuned to drifting gratings. Whether functional interactions between these two groups contribute to the representation of visual stimuli is unclear. To examine these interactions, we summarize the population partial pairwise correlation structure as a directed and weighted graph. We find that tuned and untuned neurons have distinct topological properties, with untuned neurons occupying central positions in functional networks (FNs). Implementation of a decoder that utilizes the topology of these FNs yields accurate decoding of visual stimuli. We further show that decoding performance degrades comparably following manipulations of either tuned or untuned neurons. Our results demonstrate that untuned neurons are an integral component of V1 FNs and suggest that network interactions contain information about the stimulus that is accessible to downstream elements.
Affiliation(s)
- Maayan Levy
  - Committee on Computational Neuroscience, The University of Chicago, Chicago, IL 60637, USA
- Olaf Sporns
  - Indiana University Network Science Institute, Indiana University, Bloomington, IN 47405, USA
  - Department of Psychological and Brain Sciences, Indiana University, Bloomington, IN 47405, USA
- Jason N MacLean
  - Committee on Computational Neuroscience, The University of Chicago, Chicago, IL 60637, USA
  - Department of Neurobiology, The University of Chicago, Chicago, IL 60637, USA
  - Grossman Institute for Neuroscience, Quantitative Biology and Human Behavior
11
Kim E, Bae D, Yang S, Ko G, Lee S, Lee B, Lee I. BiomeNet: a database for construction and analysis of functional interaction networks for any species with a sequenced genome. Bioinformatics 2020; 36:1584-1589. [PMID: 31599923 PMCID: PMC7703761 DOI: 10.1093/bioinformatics/btz776] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 06/28/2019] [Revised: 10/01/2019] [Accepted: 10/08/2019] [Indexed: 01/03/2023] Open
Abstract
Motivation: Owing to advanced DNA sequencing and genome assembly technology, the number of species with sequenced genomes is rapidly increasing. The aim of the recently launched Earth BioGenome Project is to sequence genomes of all eukaryotic species on Earth over the next 10 years, making it feasible to obtain genomic blueprints of the majority of animal and plant species by this time. Genetic models of the sequenced species will later be subject to functional annotation, and a comprehensive molecular network should facilitate functional analysis of individual genes and pathways. However, network databases are lagging behind genome sequencing projects as even the largest network database provides gene networks for less than 10% of sequenced eukaryotic genomes, and the knowledge gap between genomes and interactomes continues to widen. Results: We present BiomeNet, a database of 95 scored networks comprising over 8 million co-functional links, which can build and analyze gene networks for any species with a sequenced genome. BiomeNet transfers functional interactions between orthologous proteins from source networks to the target species within minutes and automatically constructs gene networks with quality comparable to that of existing networks. BiomeNet enables assembly of first-in-species gene networks not available through other databases, which are highly predictive of diverse biological processes, and can also provide network analysis by extracting subnetworks for individual biological processes and network-based gene prioritizations. These data indicate that BiomeNet could enhance the benefits of decoding the genomes of various species, thus improving our understanding of the Earth's biodiversity. Availability and implementation: BiomeNet is freely available at http://kobic.re.kr/biomenet/. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Eiru Kim
  - Department of Biotechnology, Yonsei University, Seodaemun-gu, Seoul 03722, Korea
- Dasom Bae
  - Department of Biotechnology, Yonsei University, Seodaemun-gu, Seoul 03722, Korea
- Sunmo Yang
  - Department of Biotechnology, Yonsei University, Seodaemun-gu, Seoul 03722, Korea
- Gunhwan Ko
  - Korean Bioinformation Center, KRIBB, Yuseong-gu, Daejeon 34141, Korea
- Sungho Lee
  - Department of Biotechnology, Yonsei University, Seodaemun-gu, Seoul 03722, Korea
- Byungwook Lee
  - Korean Bioinformation Center, KRIBB, Yuseong-gu, Daejeon 34141, Korea
- Insuk Lee
  - Department of Biotechnology, Yonsei University, Seodaemun-gu, Seoul 03722, Korea
12
Di Nanni N, Bersanelli M, Milanesi L, Mosca E. Network Diffusion Promotes the Integrative Analysis of Multiple Omics. Front Genet 2020; 11:106. [PMID: 32180795 PMCID: PMC7057719 DOI: 10.3389/fgene.2020.00106] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 07/31/2019] [Accepted: 01/29/2020] [Indexed: 02/01/2023] Open
Abstract
The development of integrative methods is one of the main challenges in bioinformatics. Network-based methods for the analysis of multiple gene-centered datasets take into account known and/or inferred relations between genes. In the last decades, the mathematical machinery of network diffusion (also referred to as network propagation) has been exploited in several network-based pipelines, thanks to its ability to amplify associations between genes that lie in network proximity. Indeed, network diffusion provides a quantitative estimation of network proximity between genes associated with one or more different data types, from simple binary vectors to real vectors. Therefore, this powerful data transformation method has also been increasingly used in integrative analyses of multiple collections of biological scores and/or one or more interaction networks. We present an overview of the state of the art of bioinformatics pipelines that use network diffusion processes for the integrative analysis of omics data. We discuss the fundamental ways in which network diffusion is exploited, open issues and potential developments in the field. Current trends suggest that network diffusion is a tool of broad utility in omics data analysis. It is reasonable to think that it will continue to be used and further refined as new data types arise (e.g. single cell datasets) and as the identification of system-level patterns becomes more and more important in omics data analysis.
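As an illustration of the network diffusion idea described in this abstract, the following minimal sketch implements one common variant, random walk with restart, on a toy undirected graph. It is not taken from any of the reviewed pipelines: the function name `random_walk_with_restart`, the node labels, the edge list, and the restart probability of 0.5 are all assumptions chosen for illustration.

```python
def random_walk_with_restart(edges, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Diffuse seed scores over an undirected graph (hypothetical helper).

    edges   : iterable of (node, node) pairs; every seed must appear in an edge
    seeds   : nodes carrying the initial signal (e.g. genes with a known association)
    restart : probability of jumping back to the seeds at each step (assumed value)
    """
    # Build undirected adjacency sets from the edge list.
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    nodes = sorted(neighbors)

    # Initial distribution: probability mass spread evenly over the seeds.
    p0 = {n: 0.0 for n in nodes}
    for s in seeds:
        p0[s] = 1.0 / len(seeds)

    p = dict(p0)
    for _ in range(max_iter):
        # Restart term: a fraction of the mass returns to the seeds.
        nxt = {n: restart * p0[n] for n in nodes}
        # Walk term: each node splits its remaining mass evenly among neighbors.
        for u in nodes:
            share = (1.0 - restart) * p[u] / len(neighbors[u])
            for v in neighbors[u]:
                nxt[v] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Toy path graph A - B - C - D - E with a single seed at A.
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]
scores = random_walk_with_restart(edges, seeds=["A"])
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

In this toy case the stationary score of each node falls off with its distance from the seed, which is the sense in which diffusion quantifies network proximity. Real pipelines typically use weighted edges and matrix formulations; this sketch keeps to plain dictionaries for readability.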
Affiliation(s)
- Noemi Di Nanni
- Institute of Biomedical Technologies, National Research Council, Milan, Italy; Department of Industrial and Information Engineering, University of Pavia, Pavia, Italy
- Matteo Bersanelli
- Department of Physics and Astronomy, University of Bologna, Bologna, Italy; National Institute of Nuclear Physics (INFN), Bologna, Italy
- Luciano Milanesi
- Institute of Biomedical Technologies, National Research Council, Milan, Italy
- Ettore Mosca
- Institute of Biomedical Technologies, National Research Council, Milan, Italy
13
Zhou B, Yan Y, Wang Y, You S, Freeman MR, Yang W. Quantitative proteomic analysis of prostate tissue specimens identifies deregulated protein complexes in primary prostate cancer. Clin Proteomics 2019; 16:15. [PMID: 31011308 PMCID: PMC6461817 DOI: 10.1186/s12014-019-9236-2] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 09/10/2018] [Accepted: 04/09/2019] [Indexed: 12/18/2022] Open
Abstract
Background: Prostate cancer (PCa) is the most frequently diagnosed non-skin cancer and a leading cause of mortality among males in developed countries. However, our understanding of the global changes of protein complexes within PCa tissue specimens remains very limited, although it has been well recognized that protein complexes carry out essentially all major processes in living organisms and that their deregulation drives the pathogenesis and progression of various diseases.
Methods: By coupling tandem mass tagging-synchronous precursor selection-mass spectrometry/mass spectrometry/mass spectrometry with differential expression and co-regulation analyses, the present study compared the differences between protein complexes in normal prostate, low-grade PCa, and high-grade PCa tissue specimens.
Results: Globally, a large downregulated putative protein–protein interaction (PPI) network was detected in both low-grade and high-grade PCa, yet a large upregulated putative PPI network was only detected in high-grade but not low-grade PCa, compared with normal controls. To identify specific protein complexes that are deregulated in PCa, quantified proteins were mapped to protein complexes in CORUM (v3.0), a high-quality collection of 4274 experimentally verified mammalian protein complexes. Differential expression and gene ontology (GO) enrichment analyses suggested that 13 integrin complexes involved in cell adhesion were significantly downregulated in both low- and high-grade PCa compared with normal prostate, and that four Prothymosin alpha (ProTα) complexes were significantly upregulated in high-grade PCa compared with normal prostate. Moreover, differential co-regulation and GO enrichment analyses indicated that the assembly levels of six protein complexes involved in RNA splicing were significantly increased in low-grade PCa, and those of four subcomplexes of mitochondrial complex I were significantly increased in high-grade PCa, compared with normal prostate.
Conclusions: In summary, to the best of our knowledge, the study represents the first large-scale and quantitative, albeit indirect, comparison of individual protein complexes in human PCa tissue specimens. It may serve as a useful resource for better understanding the deregulation of protein complexes in primary PCa.
Affiliation(s)
- Bo Zhou
- Division of Cancer Biology and Therapeutics, Departments of Surgery and Biomedical Sciences, Samuel Oschin Comprehensive Cancer Institute, Cedars-Sinai Medical Center, Rm. 4009, Davis Research Bldg 8700 Beverly Blvd, Los Angeles, CA 90048 USA
- Yiwu Yan
- Division of Cancer Biology and Therapeutics, Departments of Surgery and Biomedical Sciences, Samuel Oschin Comprehensive Cancer Institute, Cedars-Sinai Medical Center, Rm. 4009, Davis Research Bldg 8700 Beverly Blvd, Los Angeles, CA 90048 USA
- Yang Wang
- Division of Cancer Biology and Therapeutics, Departments of Surgery and Biomedical Sciences, Samuel Oschin Comprehensive Cancer Institute, Cedars-Sinai Medical Center, Rm. 4009, Davis Research Bldg 8700 Beverly Blvd, Los Angeles, CA 90048 USA
- Sungyong You
- Division of Cancer Biology and Therapeutics, Departments of Surgery and Biomedical Sciences, Samuel Oschin Comprehensive Cancer Institute, Cedars-Sinai Medical Center, Rm. 4009, Davis Research Bldg 8700 Beverly Blvd, Los Angeles, CA 90048 USA
- Michael R Freeman
- Division of Cancer Biology and Therapeutics, Departments of Surgery and Biomedical Sciences, Samuel Oschin Comprehensive Cancer Institute, Cedars-Sinai Medical Center, Rm. 4009, Davis Research Bldg 8700 Beverly Blvd, Los Angeles, CA 90048 USA
- Wei Yang
- Division of Cancer Biology and Therapeutics, Departments of Surgery and Biomedical Sciences, Samuel Oschin Comprehensive Cancer Institute, Cedars-Sinai Medical Center, Rm. 4009, Davis Research Bldg 8700 Beverly Blvd, Los Angeles, CA 90048 USA
14
Peng J, Guan J, Shang X. Predicting Parkinson's Disease Genes Based on Node2vec and Autoencoder. Front Genet 2019; 10:226. [PMID: 31001311 PMCID: PMC6454041 DOI: 10.3389/fgene.2019.00226] [Citation(s) in RCA: 61] [Impact Index Per Article: 10.2] [Received: 12/06/2018] [Accepted: 02/28/2019] [Indexed: 12/26/2022] Open
Abstract
Identifying genes associated with Parkinson's disease plays an extremely important role in the diagnosis and treatment of Parkinson's disease. In recent years, based on the guilt-by-association hypothesis, many methods have been proposed to predict disease-related genes, but few of these methods are designed or used for Parkinson's disease gene prediction. In this paper, we propose a novel prediction method for Parkinson's disease gene prediction, named N2A-SVM. N2A-SVM includes three parts: extracting features of genes based on the network, reducing the dimension using a deep neural network, and predicting Parkinson's disease genes using a machine learning method. The evaluation test shows that N2A-SVM performs better than existing methods. Furthermore, we evaluate the significance of each step in the N2A-SVM algorithm and the influence of the hyper-parameters on the result. In addition, we train N2A-SVM on the recent dataset and use it to predict Parkinson's disease genes. The predicted top-rank genes can be verified based on literature study.
Affiliation(s)
- Xuequn Shang
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China
15
Zhao W, Wang L, Yu Y. Gene module analysis of juvenile myelomonocytic leukemia and screening of anticancer drugs. Oncol Rep 2018; 40:3155-3170. [PMID: 30272300 PMCID: PMC6196601 DOI: 10.3892/or.2018.6709] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2017] [Accepted: 07/19/2018] [Indexed: 11/05/2022] Open
Abstract
Juvenile myelomonocytic leukemia (JMML) is a rare but severe primary hemopoietic system tumor of childhood, most frequent in children 4 years and younger. There are currently no specific anticancer therapies targeting JMML, and the underlying gene expression changes have not been revealed. To define molecular targets and possible biomarkers for early diagnosis, optimal treatment, and prognosis, we conducted microarray data analysis using the Gene Expression Omnibus, and constructed protein‑protein interaction networks of all differentially expressed genes. Modular bioinformatics analysis revealed four core functional modules for JMML. We analyzed the Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway functions associated with these modules. Using the CMap database, nine potential anticancer drugs were identified that modulate expression levels of many JMML‑associated genes. In addition, we identified possible miRNAs and transcription factors regulating these differentially expressed genes. This study defines a new research strategy for developing JMML‑targeted chemotherapies.
Affiliation(s)
- Wencheng Zhao
- Department of Paediatrics, The First Affiliated Hospital, Harbin Medical University, Harbin, Heilongjiang 150001, P.R. China
- Lin Wang
- Key Laboratory, The First Affiliated Hospital, Harbin Medical University, Harbin, Heilongjiang 150001, P.R. China
- Yongbin Yu
- Key Laboratory, The First Affiliated Hospital, Harbin Medical University, Harbin, Heilongjiang 150001, P.R. China
16
Skinnider MA, Stacey RG, Foster LJ. Genomic data integration systematically biases interactome mapping. PLoS Comput Biol 2018; 14:e1006474. [PMID: 30332399 PMCID: PMC6192561 DOI: 10.1371/journal.pcbi.1006474] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.0] [Received: 01/27/2018] [Accepted: 08/30/2018] [Indexed: 12/15/2022] Open
Abstract
Elucidating the complete network of protein-protein interactions, or interactome, is a fundamental goal of the post-genomic era, yet existing interactome maps are far from complete. To increase the throughput and resolution of interactome mapping, methods for protein-protein interaction discovery by co-migration have been introduced. However, accurate identification of interacting protein pairs within the resulting large-scale proteomic datasets is challenging. Consequently, most computational pipelines for co-migration data analysis incorporate external genomic datasets to distinguish interacting from non-interacting protein pairs. The effect of this procedure on interactome mapping is poorly understood. Here, we conduct a rigorous analysis of genomic data integration for interactome recovery across a large number of co-migration datasets, spanning diverse experimental and computational methods. We find that genomic data integration leads to an increase in the functional coherence of the resulting interactome maps, but this comes at the expense of a decrease in power to discover novel interactions. Importantly, putative novel interactions predicted by genomic data integration are no more likely to later be experimentally discovered than those predicted from co-migration data alone. Our results reveal a widespread and unappreciated limitation in a methodology that has been widely used to map the interactome of humans and model organisms.
Affiliation(s)
- R. Greg Stacey
- Michael Smith Laboratories, University of British Columbia, Vancouver, Canada
- Leonard J. Foster
- Michael Smith Laboratories, University of British Columbia, Vancouver, Canada
- Department of Biochemistry, University of British Columbia, Vancouver, Canada
17
Meyer K, Kirchner M, Uyar B, Cheng JY, Russo G, Hernandez-Miranda LR, Szymborska A, Zauber H, Rudolph IM, Willnow TE, Akalin A, Haucke V, Gerhardt H, Birchmeier C, Kühn R, Krauss M, Diecke S, Pascual JM, Selbach M. Mutations in Disordered Regions Can Cause Disease by Creating Dileucine Motifs. Cell 2018; 175:239-253.e17. [PMID: 30197081 DOI: 10.1016/j.cell.2018.08.019] [Citation(s) in RCA: 91] [Impact Index Per Article: 13.0] [Received: 01/09/2018] [Revised: 06/09/2018] [Accepted: 08/08/2018] [Indexed: 01/12/2023]
Abstract
Many disease-causing missense mutations affect intrinsically disordered regions (IDRs) of proteins, but the molecular mechanism of their pathogenicity is enigmatic. Here, we employ a peptide-based proteomic screen to investigate the impact of mutations in IDRs on protein-protein interactions. We find that mutations in disordered cytosolic regions of three transmembrane proteins (GLUT1, ITPR1, and CACNA1H) lead to an increased clathrin binding. All three mutations create dileucine motifs known to mediate clathrin-dependent trafficking. Follow-up experiments on GLUT1 (SLC2A1), the glucose transporter causative of GLUT1 deficiency syndrome, revealed that the mutated protein mislocalizes to intracellular compartments. Mutant GLUT1 interacts with adaptor proteins (APs) in vitro, and knocking down AP-2 reverts the cellular mislocalization and restores glucose transport. A systematic analysis of other known disease-causing variants revealed a significant and specific overrepresentation of gained dileucine motifs in structurally disordered cytosolic domains of transmembrane proteins. Thus, several mutations in disordered regions appear to cause "dileucineopathies."
Affiliation(s)
- Katrina Meyer
- Proteome Dynamics, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Robert-Rössle-Str. 10, 13125 Berlin, Germany
- Marieluise Kirchner
- Proteome Dynamics, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Robert-Rössle-Str. 10, 13125 Berlin, Germany
- Bora Uyar
- Bioinformatics Platform, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Robert-Rössle-Str. 10, 13125 Berlin, Germany
- Jing-Yuan Cheng
- Proteome Dynamics, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Robert-Rössle-Str. 10, 13125 Berlin, Germany
- Giulia Russo
- Molecular Pharmacology and Cell Biology, Leibniz-Forschungsinstitut für Molekulare Pharmakologie, Robert-Rössle-Str. 10, 13125 Berlin, Germany
- Luis R Hernandez-Miranda
- Developmental Biology/Signal Transduction, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Robert-Rössle-Str. 10, 13125 Berlin, Germany
- Anna Szymborska
- Integrative Vascular Biology Laboratory, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Robert-Rössle-Str. 10, 13125 Berlin, Germany; DZHK (German Centre for Cardiovascular Research) partner site, 13347 Berlin, Germany
- Henrik Zauber
- Proteome Dynamics, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Robert-Rössle-Str. 10, 13125 Berlin, Germany
- Ina-Maria Rudolph
- Molecular Cardiovascular Research, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Robert-Rössle-Str. 10, 13125 Berlin, Germany
- Thomas E Willnow
- Molecular Cardiovascular Research, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Robert-Rössle-Str. 10, 13125 Berlin, Germany
- Altuna Akalin
- Bioinformatics Platform, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Robert-Rössle-Str. 10, 13125 Berlin, Germany
- Volker Haucke
- Molecular Pharmacology and Cell Biology, Leibniz-Forschungsinstitut für Molekulare Pharmakologie, Robert-Rössle-Str. 10, 13125 Berlin, Germany
- Holger Gerhardt
- Integrative Vascular Biology Laboratory, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Robert-Rössle-Str. 10, 13125 Berlin, Germany; DZHK (German Centre for Cardiovascular Research) partner site, 13347 Berlin, Germany; Berlin Institute of Health (BIH), 10178 Berlin, Germany
- Carmen Birchmeier
- Developmental Biology/Signal Transduction, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Robert-Rössle-Str. 10, 13125 Berlin, Germany
- Ralf Kühn
- Berlin Institute of Health (BIH), 10178 Berlin, Germany; Core Facility Transgenics, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Robert-Rössle-Str. 10, 13125 Berlin, Germany
- Michael Krauss
- Molecular Pharmacology and Cell Biology, Leibniz-Forschungsinstitut für Molekulare Pharmakologie, Robert-Rössle-Str. 10, 13125 Berlin, Germany
- Sebastian Diecke
- DZHK (German Centre for Cardiovascular Research) partner site, 13347 Berlin, Germany; Berlin Institute of Health (BIH), 10178 Berlin, Germany; Core Facility Pluripotent Stem Cells, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Robert-Rössle-Str. 10, 13125 Berlin, Germany
- Juan M Pascual
- Department of Neurology and Neurotherapeutics, UT Southwestern Medical Center, 5323 Harry Hines Blvd. Dallas, TX 75390, USA
- Matthias Selbach
- Proteome Dynamics, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Robert-Rössle-Str. 10, 13125 Berlin, Germany; Charité-Universitätsmedizin Berlin, 10117 Berlin, Germany.
18
Lee D, Cho KH. Topological estimation of signal flow in complex signaling networks. Sci Rep 2018; 8:5262. [PMID: 29588498 PMCID: PMC5869720 DOI: 10.1038/s41598-018-23643-5] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 11/10/2017] [Accepted: 03/16/2018] [Indexed: 12/15/2022] Open
Abstract
In a cell, any information about extra- or intra-cellular changes is transferred and processed through a signaling network, and dysregulation of signal flow often leads to diseases such as cancer. Thus, understanding signal flow in the signaling network is critical for identifying drug targets. Owing to the development of high-throughput measurement technologies, the structure of a signaling network is becoming more available, but detailed kinetic parameter information about molecular interactions is still very limited. A question then arises as to whether we can estimate the signal flow based only on the structure information of a signaling network. To answer this question, we develop a novel algorithm that can estimate the signal flow using only the topological information and apply it to predict the direction of activity change in various signaling networks. Interestingly, we find that the average accuracy of the estimation algorithm is about 60–80% even though we only use the topological information. We also find that this predictive power collapses if we randomly alter the network topology, showing the importance of network topology. Our study provides a basis for utilizing the topological information of signaling networks in precision medicine or drug target discovery.
Affiliation(s)
- Daewon Lee
- Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), 291 Daehak-ro, Yuseong-gu, Daejeon, 34141, Republic of Korea
- Kwang-Hyun Cho
- Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), 291 Daehak-ro, Yuseong-gu, Daejeon, 34141, Republic of Korea.
19
Tien M, Fiebig A, Crosson S. Gene network analysis identifies a central post-transcriptional regulator of cellular stress survival. eLife 2018. [PMID: 29537368 PMCID: PMC5869019 DOI: 10.7554/elife.33684] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Indexed: 12/20/2022] Open
Abstract
Cells adapt to shifts in their environment by remodeling transcription. Measuring changes in transcription at the genome scale is now routine, but defining the functional significance of individual genes within large gene expression datasets remains a major challenge. We applied a network-based algorithm to interrogate publicly available gene expression data to predict genes that serve major functional roles in Caulobacter crescentus stress survival. This approach identified GsrN, a conserved small RNA that is directly activated by the general stress sigma factor, σT, and functions as a potent post-transcriptional regulator of survival across distinct conditions including osmotic and oxidative stress. Under hydrogen peroxide stress, GsrN protects cells by base pairing with the leader of katG mRNA and activating expression of KatG catalase/peroxidase protein. We conclude that GsrN convenes a post-transcriptional layer of gene expression that serves a central functional role in Caulobacter stress physiology.
Affiliation(s)
- Matthew Tien
- Department of Biochemistry and Molecular Biology, University of Chicago, Chicago, United States
- Aretha Fiebig
- Department of Biochemistry and Molecular Biology, University of Chicago, Chicago, United States
- Sean Crosson
- Department of Biochemistry and Molecular Biology, University of Chicago, Chicago, United States; Department of Microbiology, University of Chicago, Chicago, United States
20
Ryan CJ, Kennedy S, Bajrami I, Matallanas D, Lord CJ. A Compendium of Co-regulated Protein Complexes in Breast Cancer Reveals Collateral Loss Events. Cell Syst 2017; 5:399-409.e5. [PMID: 29032073 PMCID: PMC5660599 DOI: 10.1016/j.cels.2017.09.011] [Citation(s) in RCA: 38] [Impact Index Per Article: 4.8] [Received: 03/15/2017] [Revised: 07/31/2017] [Accepted: 09/18/2017] [Indexed: 12/19/2022]
Abstract
Protein complexes are responsible for the bulk of activities within the cell, but how their behavior and abundance varies across tumors remains poorly understood. By combining proteomic profiles of breast tumors with a large-scale protein-protein interaction network, we have identified a set of 285 high-confidence protein complexes whose subunits have highly correlated protein abundance across tumor samples. We used this set to identify complexes that are reproducibly under- or overexpressed in specific breast cancer subtypes. We found that mutation or deletion of one subunit of a co-regulated complex was often associated with a collateral reduction in protein expression of additional complex members. This collateral loss phenomenon was typically evident from proteomic, but not transcriptomic, profiles, suggesting post-transcriptional control. Mutation of the tumor suppressor E-cadherin (CDH1) was associated with a collateral loss of members of the adherens junction complex, an effect we validated using an engineered model of E-cadherin loss.
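One simple way to picture "subunits with highly correlated protein abundance" is to average the pairwise Pearson correlations among a complex's members across samples. The sketch below does only that. It is an illustration under stated assumptions, not the scoring procedure used in the paper: the protein names P1 to P3, the abundance values, and the function names are hypothetical, and the abstract does not specify a correlation measure or threshold.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length abundance profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def mean_subunit_correlation(abundance, subunits):
    """Average pairwise correlation among the subunits of one complex.

    abundance: dict mapping protein name to its abundance across samples
    subunits:  list of at least two protein names forming the complex
    """
    pairs = list(combinations(subunits, 2))
    return sum(pearson(abundance[a], abundance[b]) for a, b in pairs) / len(pairs)


# Hypothetical abundance profiles across four tumor samples.
abundance = {
    "P1": [1.0, 2.0, 3.0, 4.0],
    "P2": [2.0, 4.0, 6.0, 8.0],   # rises and falls together with P1
    "P3": [4.0, 3.0, 2.0, 1.0],   # moves opposite to P1 and P2
}
print(mean_subunit_correlation(abundance, ["P1", "P2"]))
print(mean_subunit_correlation(abundance, ["P1", "P2", "P3"]))
```

A candidate complex made of P1 and P2 scores high because the two profiles move together, while adding the anti-correlated P3 pulls the average down; any cutoff for calling a complex "co-regulated" would be a further assumption.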
Affiliation(s)
- Colm J Ryan
- School of Computer Science, University College Dublin, Dublin 4, Ireland; Systems Biology Ireland, School of Medicine, University College Dublin, Dublin 4, Ireland.
- Susan Kennedy
- Systems Biology Ireland, School of Medicine, University College Dublin, Dublin 4, Ireland
- Ilirjana Bajrami
- The Breast Cancer Now Toby Robins Breast Cancer Research Centre and CRUK Gene Function Laboratory, The Institute of Cancer Research, London SW3 6JB, UK
- David Matallanas
- Systems Biology Ireland, School of Medicine, University College Dublin, Dublin 4, Ireland
- Christopher J Lord
- The Breast Cancer Now Toby Robins Breast Cancer Research Centre and CRUK Gene Function Laboratory, The Institute of Cancer Research, London SW3 6JB, UK
21
Kim CY, Lee I. Functional gene networks based on the gene neighborhood in metagenomes. Anim Cells Syst (Seoul) 2017. [DOI: 10.1080/19768354.2017.1382388] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022] Open
Affiliation(s)
- Chan Yeong Kim
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul, Korea
- Insuk Lee
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul, Korea
22
The LRRK2 G2385R variant is a partial loss-of-function mutation that affects synaptic vesicle trafficking through altered protein interactions. Sci Rep 2017; 7:5377. [PMID: 28710481 PMCID: PMC5511190 DOI: 10.1038/s41598-017-05760-9] [Citation(s) in RCA: 35] [Impact Index Per Article: 4.4] [Received: 11/04/2016] [Accepted: 05/04/2017] [Indexed: 12/20/2022] Open
Abstract
Mutations in the Leucine-rich repeat kinase 2 gene (LRRK2) are associated with familial Parkinson's disease (PD). LRRK2 protein contains several functional domains, including protein-protein interaction domains at its N- and C-termini. In this study, we analyzed the functional features attributed to LRRK2 by its N- and C-terminal domains. We combined TIRF microscopy and synaptopHluorin assay to visualize synaptic vesicle trafficking. We found that N- and C-terminal domains have opposite impact on synaptic vesicle dynamics. Biochemical analysis demonstrated that different proteins are bound at the two extremities, namely β3-Cav2.1 at N-terminus part and β-Actin and Synapsin I at C-terminus domain. A sequence variant (G2385R) harboured within the C-terminal WD40 domain increases the risk for PD. Complementary biochemical and imaging approaches revealed that the G2385R variant alters strength and quality of LRRK2 interactions and increases fusion of synaptic vesicles. Our data suggest that the G2385R variant behaves like a loss-of-function mutation that mimics activity-driven events. Impaired scaffolding capabilities of mutant LRRK2 resulting in perturbed vesicular trafficking may arise as a common pathophysiological denominator through which different LRRK2 pathological mutations cause disease.
23
Drew K, Lee C, Huizar RL, Tu F, Borgeson B, McWhite CD, Ma Y, Wallingford JB, Marcotte EM. Integration of over 9,000 mass spectrometry experiments builds a global map of human protein complexes. Mol Syst Biol 2017; 13:932. [PMID: 28596423 PMCID: PMC5488662 DOI: 10.15252/msb.20167490] [Citation(s) in RCA: 150] [Impact Index Per Article: 18.8] [Indexed: 12/13/2022] Open
Abstract
Macromolecular protein complexes carry out many of the essential functions of cells, and many genetic diseases arise from disrupting the functions of such complexes. Currently, there is great interest in defining the complete set of human protein complexes, but recent published maps lack comprehensive coverage. Here, through the synthesis of over 9,000 published mass spectrometry experiments, we present hu.MAP, the most comprehensive and accurate human protein complex map to date, containing > 4,600 total complexes, > 7,700 proteins, and > 56,000 unique interactions, including thousands of confident protein interactions not identified by the original publications. hu.MAP accurately recapitulates known complexes withheld from the learning procedure, which was optimized with the aid of a new quantitative metric (k‐cliques) for comparing sets of sets. The vast majority of complexes in our map are significantly enriched with literature annotations, and the map overall shows improved coverage of many disease‐associated proteins, as we describe in detail for ciliopathies. Using hu.MAP, we predicted and experimentally validated candidate ciliopathy disease genes in vivo in a model vertebrate, discovering CCDC138, WDR90, and KIAA1328 to be new cilia basal body/centriolar satellite proteins, and identifying ANKRD55 as a novel member of the intraflagellar transport machinery. By offering significant improvements to the accuracy and coverage of human protein complexes, hu.MAP (http://proteincomplexes.org) serves as a valuable resource for better understanding the core cellular functions of human proteins and helping to determine mechanistic foundations of human disease.
Affiliation(s)
- Kevin Drew
- Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, University of Texas at Austin, Austin, TX, USA
- Chanjae Lee
- Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, University of Texas at Austin, Austin, TX, USA; Department of Molecular Biosciences, University of Texas at Austin, Austin, TX, USA
- Ryan L Huizar
- Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, University of Texas at Austin, Austin, TX, USA; Department of Molecular Biosciences, University of Texas at Austin, Austin, TX, USA
- Fan Tu
- Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, University of Texas at Austin, Austin, TX, USA; Department of Molecular Biosciences, University of Texas at Austin, Austin, TX, USA
- Blake Borgeson
- Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, University of Texas at Austin, Austin, TX, USA; Department of Molecular Biosciences, University of Texas at Austin, Austin, TX, USA
- Claire D McWhite
- Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, University of Texas at Austin, Austin, TX, USA; Department of Molecular Biosciences, University of Texas at Austin, Austin, TX, USA
- Yun Ma
- Department of Molecular Biosciences, University of Texas at Austin, Austin, TX, USA; The Otolaryngology Hospital, The First Affiliated Hospital of Sun Yat-sen University, Sun Yat-sen University, Guangzhou, China
- John B Wallingford
- Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, University of Texas at Austin, Austin, TX, USA; Department of Molecular Biosciences, University of Texas at Austin, Austin, TX, USA
- Edward M Marcotte
- Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, University of Texas at Austin, Austin, TX, USA; Department of Molecular Biosciences, University of Texas at Austin, Austin, TX, USA
24
Mirabello C, Wallner B. InterPred: A pipeline to identify and model protein-protein interactions. Proteins 2017; 85:1159-1170. [DOI: 10.1002/prot.25280] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.5] [Received: 10/07/2016] [Revised: 02/27/2017] [Accepted: 03/01/2017] [Indexed: 12/22/2022]
Affiliation(s)
- Claudio Mirabello
- Division of Bioinformatics, Department of Physics, Chemistry and Biology; Linköping University; Linköping 581 83 Sweden
- Björn Wallner
- Division of Bioinformatics, Department of Physics, Chemistry and Biology; Linköping University; Linköping 581 83 Sweden
25
Feng BJ. PERCH: A Unified Framework for Disease Gene Prioritization. Hum Mutat 2017; 38:243-251. [PMID: 27995669 PMCID: PMC5299048 DOI: 10.1002/humu.23158] [Citation(s) in RCA: 147] [Impact Index Per Article: 18.4] [Received: 10/04/2016] [Accepted: 12/12/2016] [Indexed: 12/30/2022]
Abstract
To interpret genetic variants discovered from next-generation sequencing, integration of heterogeneous information is vital for success. This article describes a framework named PERCH (Polymorphism Evaluation, Ranking, and Classification for a Heritable trait), available at http://BJFengLab.org/. It can prioritize disease genes by quantitatively unifying a new deleteriousness measure called BayesDel, an improved assessment of the biological relevance of genes to the disease, a modified linkage analysis, a novel rare-variant association test, and a converted variant call quality score. It supports data that contain various combinations of extended pedigrees, trios, and case-controls, and allows for a reduced penetrance, an elevated phenocopy rate, liability classes, and covariates. BayesDel is more accurate than PolyPhen2, SIFT, FATHMM, LRT, Mutation Taster, Mutation Assessor, PhyloP, GERP++, SiPhy, CADD, MetaLR, and MetaSVM. The overall approach is faster and more powerful than the existing quantitative method pVAAST, as shown by the simulations of challenging situations in finding the missing heritability of a complex disease. This framework can also classify variants of unknown significance (variants of uncertain significance) by quantitatively integrating allele frequencies, deleteriousness, association, and co-segregation. PERCH is a versatile tool for gene prioritization in gene discovery research and variant classification in clinical genetic testing.
Affiliation(s)
- Bing-Jian Feng
- Department of Dermatology, University of Utah, Salt Lake City, UT 84132, USA
- Huntsman Cancer Institute, University of Utah, Salt Lake City, UT 84132, USA
26
Ansari S, Voichita C, Donato M, Tagett R, Draghici S. A novel pathway analysis approach based on the unexplained disregulation of genes. Proc IEEE Inst Electr Electron Eng 2017; 105:482-495. [PMID: 30337764 PMCID: PMC6190577 DOI: 10.1109/jproc.2016.2531000] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Indexed: 06/08/2023]
Abstract
A crucial step in the understanding of any phenotype is the correct identification of the signaling pathways that are significantly impacted in that phenotype. However, most current pathway analysis methods produce both false positives and false negatives in certain circumstances. We hypothesized that such incorrect results arise because the existing methods fail to distinguish between the primary dis-regulation of a given gene itself and the effects of signaling coming from upstream. Furthermore, a modern whole-genome experiment performed with a next-generation technology spends a great deal of effort to measure the entire set of 30,000-100,000 transcripts in the genome. This is followed by the selection of a few hundred differentially expressed genes, a step that discards more than 99% of the collected data. We also hypothesized that such drastic filtering could discard many genes that play crucial roles in the phenotype. We propose a novel topology-based pathway analysis method that identifies significantly impacted pathways using the entire set of measurements, thus allowing the full use of the data provided by NGS techniques. The results obtained on 24 real data sets involving 12 different human diseases, as well as on 8 yeast knock-out data sets, show that the proposed method yields significant improvements with respect to the state-of-the-art methods SPIA, GSEA and GSA. AVAILABILITY: Primary dis-regulation analysis is implemented in R and included in the ROntoTools Bioconductor package (versions ≥ 2.0.0): https://www.bioconductor.org/packages/release/bioc/html/ROntoTools.html
Affiliation(s)
- Sahar Ansari
- Department of Computer Science, Wayne State University, Detroit, MI, USA
- Calin Voichita
- Department of Computer Science, Wayne State University, Detroit, MI, USA
- Michele Donato
- Department of Computer Science, Wayne State University, Detroit, MI, USA
- Rebecca Tagett
- Department of Computer Science, Wayne State University, Detroit, MI, USA
- Sorin Draghici
- Department of Computer Science, Wayne State University, Detroit, MI, USA
27
Shim JE, Lee T, Lee I. From sequencing data to gene functions: co-functional network approaches. Anim Cells Syst (Seoul) 2017; 21:77-83. [PMID: 30460054 PMCID: PMC6138336 DOI: 10.1080/19768354.2017.1284156] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 01/14/2017] [Accepted: 01/15/2017] [Indexed: 01/04/2023]
Abstract
Advanced high-throughput sequencing technology has accumulated massive amounts of genomics and transcriptomics data in the public databases. Owing to their high technical accessibility, DNA and RNA sequencing have huge potential for the study of gene functions in most species, including animals and crops. A proven analytic platform for converting sequencing data to gene functional information is the co-functional network. Because all genes exert their functions through interactions with others, network analysis is a legitimate way to study gene functions. The workflow of a network-based functional study is composed of three steps: (i) inferring co-functional links, (ii) evaluating and integrating the links into genome-scale networks, and (iii) generating functional hypotheses from the networks. Co-functional links can be inferred from DNA sequencing data by using phylogenetic profiling, gene neighborhood, domain profiling, and associalogs, and from RNA sequencing data by using co-expression analysis. The inferred links are then evaluated and integrated into a genome-scale network with aid from gold-standard co-functional links. Functional hypotheses can be generated from the network based on (i) network connectivity, (ii) network propagation, and (iii) subnetwork analysis. The functional analysis pipeline described here requires only sequencing data, which can be readily obtained for most species by next-generation sequencing technology. Therefore, co-functional networks will greatly potentiate the use of sequencing data for the study of genetics in any cellular organism.
Affiliation(s)
- Jung Eun Shim
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul, Korea
- Tak Lee
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul, Korea
- Insuk Lee
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul, Korea
28
Nandi S, Subramanian A, Sarkar RR. An integrative machine learning strategy for improved prediction of essential genes in Escherichia coli metabolism using flux-coupled features. Mol Biosyst 2017; 13:1584-1596. [DOI: 10.1039/c7mb00234c] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Indexed: 01/24/2023]
Abstract
We propose an integrated machine learning process to predict gene essentiality in Escherichia coli K-12 MG1655 metabolism that outperforms known methods.
Affiliation(s)
- Sutanu Nandi
- Chemical Engineering and Process Development, CSIR-National Chemical Laboratory, Pune-411008, India
- Academy of Scientific & Innovative Research (AcSIR)
- Abhishek Subramanian
- Chemical Engineering and Process Development, CSIR-National Chemical Laboratory, Pune-411008, India
- Academy of Scientific & Innovative Research (AcSIR)
- Ram Rup Sarkar
- Chemical Engineering and Process Development, CSIR-National Chemical Laboratory, Pune-411008, India
- Academy of Scientific & Innovative Research (AcSIR)
29
Shim H, Kim JH, Kim CY, Hwang S, Kim H, Yang S, Lee JE, Lee I. Function-driven discovery of disease genes in zebrafish using an integrated genomics big data resource. Nucleic Acids Res 2016; 44:9611-9623. [PMID: 27903883 PMCID: PMC5175370 DOI: 10.1093/nar/gkw897] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 07/12/2016] [Revised: 09/23/2016] [Accepted: 09/29/2016] [Indexed: 12/16/2022]
Abstract
Whole exome sequencing (WES) accelerates disease gene discovery using rare genetic variants, but further statistical and functional evidence is required to avoid false-discovery. To complement variant-driven disease gene discovery, here we present function-driven disease gene discovery in zebrafish (Danio rerio), a promising human disease model owing to its high anatomical and genomic similarity to humans. To facilitate zebrafish-based function-driven disease gene discovery, we developed a genome-scale co-functional network of zebrafish genes, DanioNet (www.inetbio.org/danionet), which was constructed by Bayesian integration of genomics big data. Rigorous statistical assessment confirmed the high prediction capacity of DanioNet for a wide variety of human diseases. To demonstrate the feasibility of the function-driven disease gene discovery using DanioNet, we predicted genes for ciliopathies and performed experimental validation for eight candidate genes. We also validated the existence of heterozygous rare variants in the candidate genes of individuals with ciliopathies yet not in controls derived from the UK10K consortium, suggesting that these variants are potentially involved in enhancing the risk of ciliopathies. These results showed that an integrated genomics big data for a model animal of diseases can expand our opportunity for harnessing WES data in disease gene discovery.
Affiliation(s)
- Hongseok Shim
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul 03722, Korea
- Ji Hyun Kim
- Department of Health Sciences & Technology, SAIHST, Sungkyunkwan University, Seoul 06351, Korea
- Chan Yeong Kim
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul 03722, Korea
- Sohyun Hwang
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul 03722, Korea
- Hyojin Kim
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul 03722, Korea
- Sunmo Yang
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul 03722, Korea
- Ji Eun Lee
- Department of Health Sciences & Technology, SAIHST, Sungkyunkwan University, Seoul 06351, Korea; Samsung Genome Institute, Samsung Medical Center, Seoul 06351, Korea
- Insuk Lee
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul 03722, Korea
30
Yang S, Kim CY, Hwang S, Kim E, Kim H, Shim H, Lee I. COEXPEDIA: exploring biomedical hypotheses via co-expressions associated with medical subject headings (MeSH). Nucleic Acids Res 2016; 45:D389-D396. [PMID: 27679477 PMCID: PMC5210615 DOI: 10.1093/nar/gkw868] [Citation(s) in RCA: 80] [Impact Index Per Article: 8.9] [Received: 08/13/2016] [Revised: 09/20/2016] [Accepted: 09/21/2016] [Indexed: 12/21/2022]
Abstract
The use of high-throughput array and sequencing technologies has produced unprecedented amounts of gene expression data in central public depositories, including the Gene Expression Omnibus (GEO). The immense amount of expression data in GEO provides both vast research opportunities and data analysis challenges. Co-expression analysis of high-dimensional expression data has proven effective for the study of gene functions, and several co-expression databases have been developed. Here, we present a new co-expression database, COEXPEDIA (www.coexpedia.org), which is distinctive from other co-expression databases in three aspects: (i) it contains only co-functional co-expressions that passed a rigorous statistical assessment for functional association, (ii) the co-expressions were inferred from individual studies, each of which was designed to investigate gene functions with respect to a particular biomedical context such as a disease and (iii) the co-expressions are associated with medical subject headings (MeSH) that provide biomedical information for anatomical, disease, and chemical relevance. COEXPEDIA currently contains approximately eight million co-expressions inferred from 384 and 248 GEO series for humans and mice, respectively. We describe how these MeSH-associated co-expressions enable the identification of diseases and drugs previously unknown to be related to a gene or a gene group of interest.
Affiliation(s)
- Sunmo Yang
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul, Korea
- Chan Yeong Kim
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul, Korea
- Sohyun Hwang
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul, Korea
- Eiru Kim
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul, Korea
- Hyojin Kim
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul, Korea
- Hongseok Shim
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul, Korea
- Insuk Lee
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul, Korea
31
PoplarGene: poplar gene network and resource for mining functional information for genes from woody plants. Sci Rep 2016; 6:31356. [PMID: 27515999 PMCID: PMC4981870 DOI: 10.1038/srep31356] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 04/21/2016] [Accepted: 07/18/2016] [Indexed: 01/05/2023]
Abstract
Poplar is not only an important resource for the production of paper, timber and other wood-based products, but it has also emerged as an ideal model system for studying woody plants. To better understand the biological processes underlying various traits in poplar, e.g., wood development, a comprehensive functional gene interaction network is highly needed. Here, we constructed a genome-wide functional gene network for poplar (covering ~70% of the 41,335 poplar genes) and created the network web service PoplarGene, offering comprehensive functional interactions and extensive poplar gene functional annotations. PoplarGene incorporates two network-based gene prioritization algorithms, neighborhood-based prioritization and context-based prioritization, which can be used to perform gene prioritization in a complementary manner. Furthermore, the co-functional information in PoplarGene can be applied to other woody plant proteomes with high efficiency via orthology transfer. In addition to poplar gene sequences, the webserver also accepts Arabidopsis reference gene as input to guide the search for novel candidate functional genes in PoplarGene. We believe that PoplarGene (http://bioinformatics.caf.ac.cn/PoplarGene and http://124.127.201.25/PoplarGene) will greatly benefit the research community, facilitating studies of poplar and other woody plants.
32
Gao B, Shao Q, Choudhry H, Marcus V, Dong K, Ragoussis J, Gao ZH. Weighted gene co-expression network analysis of colorectal cancer liver metastasis genome sequencing data and screening of anti-metastasis drugs. Int J Oncol 2016; 49:1108-18. [PMID: 27571956 DOI: 10.3892/ijo.2016.3591] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Received: 04/25/2016] [Accepted: 06/03/2016] [Indexed: 11/06/2022]
Abstract
Approximately 9% of cancer-related deaths are caused by colorectal cancer (CRC). CRC patients are prone to liver metastasis, which is the most important cause of the high CRC mortality rate. Understanding the molecular mechanism of CRC liver metastasis could help us to find novel targets for the effective treatment of this deadly disease. Using weighted gene co-expression network analysis on the sequencing data of CRC with and without metastasis, we identified 5 colorectal cancer liver metastasis related modules, which were labeled as brown, blue, grey, yellow and turquoise. In the brown module, which represents the metastatic tumor in the liver, gene ontology (GO) analysis revealed functions including the G-protein coupled receptor protein signaling pathway, epithelial cell differentiation and cell surface receptor linked signal transduction. In the blue module, which represents the primary CRC that has metastasized, GO analysis showed that the genes were mainly enriched in GO terms including G-protein coupled receptor protein signaling pathway, cell surface receptor linked signal transduction, and negative regulation of cell differentiation. In the yellow and turquoise modules, which represent the primary non-metastatic CRC, 13 downregulated CRC liver metastasis-related candidate miRNAs were identified (e.g. hsa-miR-204, hsa-miR-455). Furthermore, analyzing the DrugBank database and mining the literature identified 25 and 12 candidate drugs that could potentially block the metastatic processes of the primary tumor and inhibit the progression of metastatic tumors in the liver, respectively. Data generated from this study not only further our understanding of the genetic alterations that drive the metastatic process, but also guide the development of molecular-targeted therapy of colorectal cancer liver metastasis.
Affiliation(s)
- Bo Gao
- Department of General Surgery, The First Affiliated Hospital of Harbin Medical University, Harbin, Heilongjiang 150001, P.R. China
- Qin Shao
- Department of Pathology, The Research Institute of McGill University Health Center, Montreal, Québec H4A 3J1, Canada
- Hani Choudhry
- McGill University and Genome Quebec Innovation Centre, Montreal, Québec H3B 1S6, Canada
- Victoria Marcus
- Department of Pathology, The Research Institute of McGill University Health Center, Montreal, Québec H4A 3J1, Canada
- Kung Dong
- Department of Pathology, Beijing Youan Hospital, Capital Medical University, Beijing 100069, P.R. China
- Jiannis Ragoussis
- McGill University and Genome Quebec Innovation Centre, Montreal, Québec H3B 1S6, Canada
- Zu-Hua Gao
- Department of Pathology, The Research Institute of McGill University Health Center, Montreal, Québec H4A 3J1, Canada
33
Cho A, Shim JE, Kim E, Supek F, Lehner B, Lee I. MUFFINN: cancer gene discovery via network analysis of somatic mutation data. Genome Biol 2016; 17:129. [PMID: 27333808 PMCID: PMC4918128 DOI: 10.1186/s13059-016-0989-x] [Citation(s) in RCA: 94] [Impact Index Per Article: 10.4] [Received: 11/17/2015] [Accepted: 05/24/2016] [Indexed: 12/21/2022]
Abstract
A major challenge for distinguishing cancer-causing driver mutations from inconsequential passenger mutations is the long-tail of infrequently mutated genes in cancer genomes. Here, we present and evaluate a method for prioritizing cancer genes accounting not only for mutations in individual genes but also in their neighbors in functional networks, MUFFINN (MUtations For Functional Impact on Network Neighbors). This pathway-centric method shows high sensitivity compared with gene-centric analyses of mutation data. Notably, only a marginal decrease in performance is observed when using 10 % of TCGA patient samples, suggesting the method may potentiate cancer genome projects with small patient populations.
Affiliation(s)
- Ara Cho
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul, Korea
- Jung Eun Shim
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul, Korea
- Eiru Kim
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul, Korea
- Fran Supek
- EMBL-CRG Systems Biology Unit, Centre for Genomic Regulation (CRG), 08003, Barcelona, Spain; Universitat Pompeu Fabra (UPF), 08003, Barcelona, Spain; Division of Electronics, Rudjer Boskovic Institute, 10000, Zagreb, Croatia
- Ben Lehner
- EMBL-CRG Systems Biology Unit, Centre for Genomic Regulation (CRG), 08003, Barcelona, Spain; Universitat Pompeu Fabra (UPF), 08003, Barcelona, Spain
- Insuk Lee
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul, Korea
34
Verleyen W, Ballouz S, Gillis J. Positive and negative forms of replicability in gene network analysis. Bioinformatics 2015; 32:1065-73. [PMID: 26668004 DOI: 10.1093/bioinformatics/btv734] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 07/13/2015] [Accepted: 12/09/2015] [Indexed: 02/07/2023]
Abstract
MOTIVATION: Gene networks have become a central tool in the analysis of genomic data but are widely regarded as hard to interpret. This has motivated a great deal of comparative evaluation and research into best practices. We explore the possibility that this may lead to overfitting in the field as a whole. RESULTS: We construct a model of 'research communities' sampling from real gene network data and machine learning methods to characterize performance trends. Our analysis reveals an important principle limiting the value of replication, namely that targeting it directly causes 'easy' or uninformative replication to dominate analyses. We find that when sampling across network data and algorithms with similar variability, the relationship between replicability and accuracy is positive (Spearman's correlation, rs ∼0.33) but where no such constraint is imposed, the relationship becomes negative for a given gene function (rs ∼ -0.13). We predict factors driving replicability in some prior analyses of gene networks and show that they are unconnected with the correctness of the original result, instead reflecting replicable biases. Without these biases, the original results also vanish replicably. We show these effects can occur quite far upstream in network data and that there is a strong tendency within protein-protein interaction data for highly replicable interactions to be associated with poor quality control. AVAILABILITY AND IMPLEMENTATION: Algorithms, network data and a guide to the code are available at https://github.com/wimverleyen/AggregateGeneFunctionPrediction CONTACT: jgillis@cshl.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- W Verleyen
- Stanley Institute for Cognitive Genomics, Cold Spring Harbor Laboratory, 500 Sunnyside Boulevard Woodbury, NY 11797, USA
- S Ballouz
- Stanley Institute for Cognitive Genomics, Cold Spring Harbor Laboratory, 500 Sunnyside Boulevard Woodbury, NY 11797, USA
- J Gillis
- Stanley Institute for Cognitive Genomics, Cold Spring Harbor Laboratory, 500 Sunnyside Boulevard Woodbury, NY 11797, USA
35
Schweppe DK, Harding C, Chavez JD, Wu X, Ramage E, Singh PK, Manoil C, Bruce JE. Host-Microbe Protein Interactions during Bacterial Infection. Chem Biol 2015; 22:1521-1530. [PMID: 26548613 DOI: 10.1016/j.chembiol.2015.09.015] [Citation(s) in RCA: 102] [Impact Index Per Article: 10.2] [Received: 06/09/2015] [Revised: 09/11/2015] [Accepted: 09/24/2015] [Indexed: 12/24/2022]
Abstract
Interspecies protein-protein interactions are essential mediators of infection. While bacterial proteins required for host cell invasion and infection can be identified through bacterial mutant library screens, information about host target proteins and interspecies complex structures has been more difficult to acquire. Using an unbiased chemical crosslinking/mass spectrometry approach, we identified interspecies protein-protein interactions in human lung epithelial cells infected with Acinetobacter baumannii. These efforts resulted in identification of 3,076 crosslinked peptide pairs and 46 interspecies protein-protein interactions. Most notably, the key A. baumannii virulence factor, OmpA, was identified as crosslinked to host proteins involved in desmosomes, specialized structures that mediate host cell-to-cell adhesion. Co-immunoprecipitation and transposon mutant experiments were used to verify these interactions and demonstrate relevance for host cell invasion and acute murine lung infection. These results shed new light on A. baumannii-host protein interactions and their structural features, and the presented approach is generally applicable to other systems.
Affiliation(s)
- Devin K Schweppe
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Christopher Harding
- Departments of Medicine and Microbiology, University of Washington School of Medicine, Seattle, WA 98195, USA
- Juan D Chavez
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Xia Wu
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Elizabeth Ramage
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Pradeep K Singh
- Departments of Medicine and Microbiology, University of Washington School of Medicine, Seattle, WA 98195, USA
- Colin Manoil
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- James E Bruce
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA; Departments of Medicine and Microbiology, University of Washington School of Medicine, Seattle, WA 98195, USA; Department of Genome Sciences, University of Washington School of Medicine, 850 Republican Street, Brotman Building, Room 154, Seattle, WA 98109, USA
36
Applications of comparative evolution to human disease genetics. Curr Opin Genet Dev 2015; 35:16-24. [PMID: 26338499 DOI: 10.1016/j.gde.2015.08.004] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 05/18/2015] [Revised: 08/11/2015] [Accepted: 08/12/2015] [Indexed: 12/15/2022]
Abstract
Direct comparison of human diseases with model phenotypes allows exploration of key areas of human biology which are often inaccessible for practical or ethical reasons. We review recent developments in comparative evolutionary approaches for finding models for genetic disease, including high-throughput generation of gene/phenotype relationship data, the linking of orthologous genes and phenotypes across species, and statistical methods for linking human diseases to model phenotypes.
38
Shim JE, Hwang S, Lee I. Pathway-Dependent Effectiveness of Network Algorithms for Gene Prioritization. PLoS One 2015; 10:e0130589. [PMID: 26091506 PMCID: PMC4474432 DOI: 10.1371/journal.pone.0130589] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 08/24/2014] [Accepted: 05/22/2015] [Indexed: 01/18/2023]
Abstract
A network-based approach has proven useful for the identification of novel genes associated with complex phenotypes, including human diseases. Because network-based gene prioritization algorithms are based on propagating information of known phenotype-associated genes through networks, the pathway structure of each phenotype might significantly affect the effectiveness of algorithms. We systematically compared two popular network algorithms with distinct mechanisms – direct neighborhood which propagates information to only direct network neighbors, and network diffusion which diffuses information throughout the entire network – in prioritization of genes for worm and human phenotypes. Previous studies reported that network diffusion generally outperforms direct neighborhood for human diseases. Although prioritization power is generally measured for all ranked genes, only the top candidates are significant for subsequent functional analysis. We found that high prioritizing power of a network algorithm for all genes cannot guarantee successful prioritization of top ranked candidates for a given phenotype. Indeed, the majority of the phenotypes that were more efficiently prioritized by network diffusion showed higher prioritizing power for top candidates by direct neighborhood. We also found that connectivity among pathway genes for each phenotype largely determines which network algorithm is more effective, suggesting that the network algorithm used for each phenotype should be chosen with consideration of pathway gene connectivity.
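The contrast between the two algorithm families described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy graph, the function names, and the use of random walk with restart as the diffusion method (with an assumed restart probability of 0.5) are all illustrative assumptions.

```python
from collections import defaultdict


def build_graph(edges):
    """Return an undirected adjacency map {node: {neighbor: weight}}."""
    graph = defaultdict(dict)
    for a, b, w in edges:
        graph[a][b] = w
        graph[b][a] = w
    return dict(graph)


def direct_neighborhood(graph, seeds):
    """Score each non-seed gene by its summed edge weight to seed genes."""
    seeds = set(seeds)
    return {
        gene: sum(w for nbr, w in nbrs.items() if nbr in seeds)
        for gene, nbrs in graph.items()
        if gene not in seeds
    }


def network_diffusion(graph, seeds, restart=0.5, iterations=100):
    """Random walk with restart (one assumed form of network diffusion)."""
    seeds = set(seeds)
    p0 = {g: (1.0 / len(seeds) if g in seeds else 0.0) for g in graph}
    strength = {g: sum(nbrs.values()) for g, nbrs in graph.items()}
    p = dict(p0)
    for _ in range(iterations):
        nxt = {g: restart * p0[g] for g in graph}
        for g, nbrs in graph.items():
            for nbr, w in nbrs.items():
                # g passes its score to neighbors in proportion to edge weight
                nxt[nbr] += (1 - restart) * p[g] * w / strength[g]
        p = nxt
    return {g: s for g, s in p.items() if g not in seeds}


# Hypothetical toy network: a chain A-B-C-D plus a leaf E attached to A.
edges = [("A", "B", 1.0), ("B", "C", 1.0), ("C", "D", 1.0), ("A", "E", 1.0)]
graph = build_graph(edges)
dn = direct_neighborhood(graph, ["A"])   # only B and E receive a score
rwr = network_diffusion(graph, ["A"])    # C and D also receive a score
```

In this toy case direct neighborhood cannot rank genes beyond the seed's immediate neighbors, whereas diffusion assigns decreasing scores along the chain, which is the qualitative difference the abstract refers to.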
Affiliation(s)
- Jung Eun Shim
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul, Korea
- Sohyun Hwang
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul, Korea
- Insuk Lee
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul, Korea
39
Chen TS, Petrey D, Garzon JI, Honig B. Predicting peptide-mediated interactions on a genome-wide scale. PLoS Comput Biol 2015; 11:e1004248. [PMID: 25938916 PMCID: PMC4418708 DOI: 10.1371/journal.pcbi.1004248] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 12/03/2014] [Accepted: 03/18/2015] [Indexed: 12/20/2022]
Abstract
We describe a method to predict protein-protein interactions (PPIs) formed between structured domains and short peptide motifs. We take an integrative approach based on consensus patterns of known motifs in databases, structures of domain-motif complexes from the PDB and various sources of non-structural evidence. We combine this set of clues using a Bayesian classifier that reports the likelihood of an interaction and obtain significantly improved prediction performance when compared to individual sources of evidence and to previously reported algorithms. Our Bayesian approach was integrated into PrePPI, a structure-based PPI prediction method that, so far, has been limited to interactions formed between two structured domains. Around 80,000 new domain-motif mediated interactions were predicted, thus enhancing PrePPI’s coverage of the human protein interactome.
Complexes formed between a structured domain on one protein and an unstructured peptide on another are ubiquitous. However, they are often quite difficult to detect experimentally. The development of computational approaches to predict domain-motif interactions is therefore an important goal. We report a method to predict domain-motif interactions using a Bayesian approach to integrate evidence from a variety of sources, including three-dimensional structural and non-structural information. The method was applied to the entire human proteome and showed significant improvement over existing methods. The method was incorporated into PrePPI, a computational pipeline for the prediction of protein-protein interactions that relies heavily on structural information. Approximately 80,000 new interactions were detected. The new PrePPI database provides easy access to about 400,000 human protein-protein interactions and should thus constitute a valuable resource in a variety of biological applications including the characterization of molecular interaction networks and, more generally, in the study of interactions mediated by proteins in families that may not be extensively studied experimentally.
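As a rough illustration of how a Bayesian classifier can merge several evidence sources into one interaction probability, the sketch below multiplies per-source likelihood ratios under a naive independence assumption. The abstract does not specify this form; the function name, the independence assumption, and the example numbers are all hypothetical.

```python
import math


def combine_evidence(likelihood_ratios, prior_odds):
    """Combine independent evidence sources into a posterior probability.

    Each likelihood ratio is P(evidence | interaction) / P(evidence | no
    interaction). Treating the sources as independent (the naive Bayes
    assumption made here for illustration), the posterior odds are the prior
    odds multiplied by every ratio. Logs are summed for numerical stability.
    """
    log_odds = math.log(prior_odds)
    for lr in likelihood_ratios:
        log_odds += math.log(lr)
    odds = math.exp(log_odds)
    return odds / (1.0 + odds)


# Hypothetical values: a motif-pattern match (LR = 10) and structural
# evidence (LR = 5) applied to prior odds of 1:100.
posterior = combine_evidence([10.0, 5.0], prior_odds=0.01)
```

With these made-up numbers the posterior odds are 0.01 × 10 × 5 = 0.5, so the posterior probability is 0.5 / 1.5, about 0.33; neither source alone would lift the pair that far above the prior.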
Collapse
Affiliation(s)
- T. Scott Chen
- Howard Hughes Medical Institute, Columbia University, New York, New York, United States of America
- Department of Systems Biology, Columbia University, New York, New York, United States of America
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York, United States of America
- Center for Computational Biology and Bioinformatics, Columbia University, New York, New York, United States of America
| | - Donald Petrey
- Howard Hughes Medical Institute, Columbia University, New York, New York, United States of America
- Department of Systems Biology, Columbia University, New York, New York, United States of America
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York, United States of America
- Center for Computational Biology and Bioinformatics, Columbia University, New York, New York, United States of America
| | - Jose Ignacio Garzon
- Howard Hughes Medical Institute, Columbia University, New York, New York, United States of America
- Department of Systems Biology, Columbia University, New York, New York, United States of America
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York, United States of America
- Center for Computational Biology and Bioinformatics, Columbia University, New York, New York, United States of America
| | - Barry Honig
- Howard Hughes Medical Institute, Columbia University, New York, New York, United States of America
- Department of Systems Biology, Columbia University, New York, New York, United States of America
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York, United States of America
- Center for Computational Biology and Bioinformatics, Columbia University, New York, New York, United States of America
|
40
|
Lee T, Kim H, Lee I. Network-assisted crop systems genetics: network inference and integrative analysis. Curr Opin Plant Biol 2015; 24:61-70. [PMID: 25698380 DOI: 10.1016/j.pbi.2015.02.001] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.8] [Received: 11/08/2014] [Revised: 01/15/2015] [Accepted: 02/02/2015] [Indexed: 05/24/2023]
Abstract
Although next-generation sequencing (NGS) technology has enabled the decoding of many crop species genomes, most of the underlying genetic components for economically important crop traits remain to be determined. Network approaches have proven useful for the study of the reference plant, Arabidopsis thaliana, and the success of network-based crop genetics will also require the availability of genome-scale functional networks for crop species. In this review, we discuss how to construct functional networks and elucidate the holistic view of a crop system. The crop gene network can then be used for gene prioritization and the analysis of resequencing-based genome-wide association study (GWAS) data, the amount of which will grow rapidly in the field of crop science in the coming years.
Affiliation(s)
- Tak Lee
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul, Republic of Korea
| | - Hyojin Kim
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul, Republic of Korea
| | - Insuk Lee
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul, Republic of Korea.
|
41
|
Lee T, Oh T, Yang S, Shin J, Hwang S, Kim CY, Kim H, Shim H, Shim JE, Ronald PC, Lee I. RiceNet v2: an improved network prioritization server for rice genes. Nucleic Acids Res 2015; 43:W122-7. [PMID: 25813048 PMCID: PMC4489288 DOI: 10.1093/nar/gkv253] [Citation(s) in RCA: 53] [Impact Index Per Article: 5.3] [Received: 02/02/2015] [Accepted: 03/12/2015] [Indexed: 11/20/2022] Open
Abstract
Rice is the most important staple food crop and a model grass for studies of bioenergy crops. We previously published a genome-scale functional network server called RiceNet, constructed by integrating diverse genomics data, and demonstrated the use of the network in genetic dissection of rice biotic stress responses and its usefulness for other grass species. Since the initial construction of the network, there has been a significant increase in the amount of publicly available rice genomics data. Here, we present an updated network prioritization server for Oryza sativa ssp. japonica, RiceNet v2 (http://www.inetbio.org/ricenet), which provides a network of 25 765 genes (70.1% of the coding genome) and 1 775 000 co-functional links. RiceNet v2 also provides two complementary methods for network prioritization based on: (i) network direct neighborhood and (ii) context-associated hubs. RiceNet v2 can use genes of the related subspecies O. sativa ssp. indica and the reference plant Arabidopsis for versatility in generating hypotheses. We demonstrate that RiceNet v2 effectively identifies candidate genes involved in rice root/shoot development and defense responses, demonstrating its usefulness for the grass research community.
Affiliation(s)
- Tak Lee
- Department of Biotechnology, College of Life Sciences and Biotechnology, Yonsei University, Seoul, Korea
| | - Taeyun Oh
- The Joint Bioenergy Institute, Emeryville CA and Department of Plant Pathology and the Genome Center, University of California, Davis, CA 95616, USA
| | - Sunmo Yang
- Department of Biotechnology, College of Life Sciences and Biotechnology, Yonsei University, Seoul, Korea
| | - Junha Shin
- Department of Biotechnology, College of Life Sciences and Biotechnology, Yonsei University, Seoul, Korea
| | - Sohyun Hwang
- Department of Biotechnology, College of Life Sciences and Biotechnology, Yonsei University, Seoul, Korea
| | - Chan Yeong Kim
- Department of Biotechnology, College of Life Sciences and Biotechnology, Yonsei University, Seoul, Korea
| | - Hyojin Kim
- Department of Biotechnology, College of Life Sciences and Biotechnology, Yonsei University, Seoul, Korea
| | - Hongseok Shim
- Department of Biotechnology, College of Life Sciences and Biotechnology, Yonsei University, Seoul, Korea
| | - Jung Eun Shim
- Department of Biotechnology, College of Life Sciences and Biotechnology, Yonsei University, Seoul, Korea
| | - Pamela C Ronald
- The Joint Bioenergy Institute, Emeryville CA and Department of Plant Pathology and the Genome Center, University of California, Davis, CA 95616, USA
| | - Insuk Lee
- Department of Biotechnology, College of Life Sciences and Biotechnology, Yonsei University, Seoul, Korea
|
42
|
Porras P, Duesbury M, Fabregat A, Ueffing M, Orchard S, Gloeckner CJ, Hermjakob H. A visual review of the interactome of LRRK2: Using deep-curated molecular interaction data to represent biology. Proteomics 2015; 15:1390-404. [PMID: 25648416 PMCID: PMC4415485 DOI: 10.1002/pmic.201400390] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.5] [Received: 08/14/2014] [Revised: 01/15/2015] [Accepted: 01/29/2015] [Indexed: 02/04/2023]
Abstract
Molecular interaction databases are essential resources that enable access to a wealth of information on associations between proteins and other biomolecules. Network graphs generated from these data provide an understanding of the relationships between different proteins in the cell, and network analysis has become a widespread tool supporting –omics analysis. Meaningfully representing this information remains far from trivial and different databases strive to provide users with detailed records capturing the experimental details behind each piece of interaction evidence. A targeted curation approach is necessary to transfer published data generated by primarily low-throughput techniques into interaction databases. In this review we present an example highlighting the value of both targeted curation and the subsequent effective visualization of detailed features of manually curated interaction information. We have curated interactions involving LRRK2, a protein of largely unknown function linked to familial forms of Parkinson's disease, and hosted the data in the IntAct database. This LRRK2-specific dataset was then used to produce different visualization examples highlighting different aspects of the data: the level of confidence in the interaction based on orthogonal evidence, those interactions found under close-to-native conditions, and the enzyme–substrate relationships in different in vitro enzymatic assays. Finally, pathway annotation taken from the Reactome database was overlaid on top of interaction networks to bring biological functional context to interaction maps.
Affiliation(s)
- Pablo Porras
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
|
43
|
Caufield JH, Abreu M, Wimble C, Uetz P. Protein complexes in bacteria. PLoS Comput Biol 2015; 11:e1004107. [PMID: 25723151 PMCID: PMC4344305 DOI: 10.1371/journal.pcbi.1004107] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 07/31/2014] [Accepted: 01/02/2015] [Indexed: 01/26/2023] Open
Abstract
Large-scale analyses of protein complexes have recently become available for Escherichia coli and Mycoplasma pneumoniae, yielding 443 and 116 heteromultimeric soluble protein complexes, respectively. We have coupled the results of these mass spectrometry-characterized protein complexes with the 285 “gold standard” protein complexes identified by EcoCyc. A comparison with databases of gene orthology, conservation, and essentiality identified proteins conserved or lost in complexes of other species. For instance, of 285 “gold standard” protein complexes in E. coli, less than 10% are fully conserved among a set of 7 distantly-related bacterial “model” species. Complex conservation follows one of three models: well-conserved complexes, complexes with a conserved core, and complexes with partial conservation but no conserved core. Expanding the comparison to 894 distinct bacterial genomes illustrates fractional conservation and the limits of co-conservation among components of protein complexes: just 14 out of 285 model protein complexes are perfectly conserved across 95% of the genomes used, yet we predict more than 180 may be partially conserved across at least half of the genomes. No clear relationship between gene essentiality and protein complex conservation is observed, as even poorly conserved complexes contain a significant number of essential proteins. Finally, we identify 183 complexes containing well-conserved components and uncharacterized proteins which will be interesting targets for future experimental studies. Though more than 20,000 binary protein-protein interactions have been published for a few well-studied bacterial species, the results rarely capture the full extent to which proteins take part in complexes. Here, we use experimentally-observed protein complexes from E. coli or Mycoplasma pneumoniae, as well as gene orthology, to predict protein complexes across many species of bacteria. 
Surprisingly, the majority of protein complexes is not conserved, demonstrating an unexpected evolutionary flexibility. We also observe broader trends within protein complex conservation, especially in genome-reduced species with minimal sets of protein complexes.
Affiliation(s)
- J. Harry Caufield
- Center for the Study of Biological Complexity, Virginia Commonwealth University, Richmond, Virginia, United States of America
| | - Marco Abreu
- Center for the Study of Biological Complexity, Virginia Commonwealth University, Richmond, Virginia, United States of America
| | - Christopher Wimble
- Center for the Study of Biological Complexity, Virginia Commonwealth University, Richmond, Virginia, United States of America
| | - Peter Uetz
- Center for the Study of Biological Complexity, Virginia Commonwealth University, Richmond, Virginia, United States of America
|
44
|
Taşan M, Musso G, Hao T, Vidal M, MacRae CA, Roth FP. Selecting causal genes from genome-wide association studies via functionally coherent subnetworks. Nat Methods 2014; 12:154-9. [PMID: 25532137 DOI: 10.1038/nmeth.3215] [Citation(s) in RCA: 64] [Impact Index Per Article: 5.8] [Received: 05/01/2014] [Accepted: 11/24/2014] [Indexed: 12/27/2022]
Abstract
Genome-wide association (GWA) studies have linked thousands of loci to human diseases, but the causal genes and variants at these loci generally remain unknown. Although investigators typically focus on genes closest to the associated polymorphisms, the causal gene is often more distal. Reliance on published work to prioritize candidates is biased toward well-characterized genes. We describe a 'prix fixe' strategy and software that uses genome-scale shared-function networks to identify sets of mutually functionally related genes spanning multiple GWA loci. Using associations from ∼100 GWA studies covering ten cancer types, our approach outperformed the common alternative strategy in ranking known cancer genes. As more GWA loci are discovered, the strategy will have increased power to elucidate the causes of human disease.
Affiliation(s)
- Murat Taşan
- 1] Donnelly Centre, University of Toronto, Toronto, Ontario, Canada. [2] Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada. [3] Department of Computer Science, University of Toronto, Toronto, Ontario, Canada. [4] Center for Cancer Systems Biology (CCSB), Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, Massachusetts, USA. [5] Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, Toronto, Ontario, Canada
| | - Gabriel Musso
- 1] Department of Medicine, Harvard Medical School, Boston, Massachusetts, USA. [2] Cardiovascular Division, Brigham and Women's Hospital, Boston, Massachusetts, USA
| | - Tong Hao
- 1] Center for Cancer Systems Biology (CCSB), Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, Massachusetts, USA. [2] Department of Genetics, Harvard Medical School, Boston, Massachusetts, USA
| | - Marc Vidal
- 1] Center for Cancer Systems Biology (CCSB), Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, Massachusetts, USA. [2] Department of Genetics, Harvard Medical School, Boston, Massachusetts, USA
| | - Calum A MacRae
- 1] Department of Medicine, Harvard Medical School, Boston, Massachusetts, USA. [2] Cardiovascular Division, Brigham and Women's Hospital, Boston, Massachusetts, USA
| | - Frederick P Roth
- 1] Donnelly Centre, University of Toronto, Toronto, Ontario, Canada. [2] Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada. [3] Department of Computer Science, University of Toronto, Toronto, Ontario, Canada. [4] Center for Cancer Systems Biology (CCSB), Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, Massachusetts, USA. [5] Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, Toronto, Ontario, Canada. [6] Canadian Institute for Advanced Research, Toronto, Ontario, Canada
|
45
|
Mosca E, Alfieri R, Milanesi L. Diffusion of information throughout the host interactome reveals gene expression variations in network proximity to target proteins of hepatitis C virus. PLoS One 2014; 9:e113660. [PMID: 25461596 PMCID: PMC4251971 DOI: 10.1371/journal.pone.0113660] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 08/01/2014] [Accepted: 10/27/2014] [Indexed: 12/22/2022] Open
Abstract
Hepatitis C virus (HCV) infection is one of the most common chronic infections in the world, and hepatitis associated with HCV infection is a major risk factor for the development of cirrhosis and hepatocellular carcinoma (HCC). The rapidly growing number of viral-host and host protein-protein interactions is enabling increasingly reliable network-based analyses of viral infection supported by omics data. The study of molecular interaction networks helps to elucidate the mechanistic pathways linking HCV molecular activities and the host response that modulates the stepwise hepatocarcinogenic process from preneoplastic lesions (cirrhosis and dysplasia) to HCC. Simulating the impact of HCV-host molecular interactions throughout the host protein-protein interaction (PPI) network, we ranked the host proteins in relation to their network proximity to viral targets. We observed that the set of proteins in the neighborhood of HCV targets in the host interactome is enriched in key players of the host response to HCV infection. In contrast to the HCV targets themselves, subnetworks of proteins in network proximity to HCV targets are significantly enriched in proteins reported as differentially expressed in preneoplastic and neoplastic liver samples by two independent studies. Using multi-objective optimization, we extracted subnetworks that are simultaneously "guilt-by-association" with HCV proteins and enriched in differentially expressed proteins. These subnetworks contain established, recently proposed and novel candidate proteins for the regulation of the mechanisms of the liver cell response to chronic HCV infection.
Affiliation(s)
- Ettore Mosca
- Institute of Biomedical Technologies, National Research Council, Segrate, Milan, Italy
| | - Roberta Alfieri
- Institute of Biomedical Technologies, National Research Council, Segrate, Milan, Italy
| | - Luciano Milanesi
- Institute of Biomedical Technologies, National Research Council, Segrate, Milan, Italy
|
46
|
Verleyen W, Ballouz S, Gillis J. Measuring the wisdom of the crowds in network-based gene function inference. Bioinformatics 2014; 31:745-52. [DOI: 10.1093/bioinformatics/btu715] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Indexed: 11/14/2022] Open
|
47
|
Zhang X, Gu J, Cao L, Ma Y, Su Z, Luo F, Wang Z, Li N, Yuan G, Chen L, Xu X, Xiao W. Insights into the inhibition and mechanism of compounds against LPS-induced PGE2 production: a pathway network-based approach and molecular dynamics simulations. Integr Biol (Camb) 2014; 6:1162-9. [PMID: 25228393 DOI: 10.1039/c4ib00141a] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Indexed: 01/11/2023]
Abstract
In comparison to the current target-based screening approach, it is increasingly evident that active lead compounds based on disease-related phenotypes are more likely to be translated to clinical trials during drug development. That is because human diseases are in essence the outcome of the abnormal function of multiple genes, especially in complex diseases. Therefore, as a conventional technology in the early phase of active lead compound discovery, computational methods that can connect molecular interactions and disease-related phenotypes to evaluate the efficacy of compounds are urgently required. In this work, a computational approach that integrates molecular docking and pathway network analysis (network efficiency and network flux) was developed to evaluate the efficacy of a compound against LPS-induced prostaglandin E2 (PGE2) production. The predicted results were then validated in vitro, and a correlation with the experimental results was analyzed using linear regression. In addition, molecular dynamics (MD) simulations were performed to explore the molecular mechanism of the most potent compounds. There were 12 hits out of 28 predicted ingredients separated from Reduning injection (RDN). The predicted results agree well with the experimental inhibitory potency (IC50) (correlation coefficient = 0.80). The most potent compounds could target several proteins to regulate the pathway network. This might partly explain the molecular mechanism of RDN on fever. Meanwhile, the good correlation of the computational model with the wet experimental results might bridge the gap between molecule-target interactions and phenotypic response, especially for multi-target compounds. Therefore, it would be helpful for active lead compound discovery and for understanding the multi-target, synergistic nature of traditional Chinese medicine (TCM).
Affiliation(s)
- Xinzhuang Zhang
- State Key Laboratory of New-tech for Chinese Medicine Pharmaceutical Process, Kanion Pharmaceutical Corporation, Lianyungang City 222002, P. R. China.
|
48
|
Clancy T, Hovig E. From proteomes to complexomes in the era of systems biology. Proteomics 2014; 14:24-41. [PMID: 24243660 DOI: 10.1002/pmic.201300230] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Received: 06/12/2013] [Revised: 10/22/2013] [Accepted: 11/06/2013] [Indexed: 01/16/2023]
Abstract
Protein complexes carry out almost all signaling and functional processes in the cell. The protein complex complement of a cell, and its network of complex-complex interactions, is referred to here as the complexome. Computational methods to predict protein complexes from proteomics data, resulting in network representations of complexomes, have recently been developed. In addition, key advances have been made toward understanding the network and structural organization of complexomes. We review these bioinformatics advances, and their discovery potential, as well as the merits of integrating proteomics data with emerging methods in systems biology to study protein complex signaling. It is envisioned that improved integration of proteomics and systems biology, incorporating the dynamics of protein complexes in space and time, may lead to more predictive models of cell signaling networks for effective modulation.
Affiliation(s)
- Trevor Clancy
- Department of Tumor Biology, Institute for Cancer Research, The Norwegian Radium Hospital, Oslo University Hospital, Oslo, Norway
|
49
|
Mis-sense mutations in Tafazzin (TAZ) that escort to mild clinical symptoms of Barth syndrome is owed to the minimal inhibitory effect of the mutations on the enzyme function: In-silico evidence. Interdiscip Sci 2014; 7:21-35. [PMID: 25118650 DOI: 10.1007/s12539-013-0019-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 06/17/2013] [Revised: 09/24/2013] [Accepted: 11/06/2013] [Indexed: 01/16/2023]
Abstract
Tafazzin (EC 2.3.1.23) is a phospholipid transacylase involved in cardiolipin remodeling on the mitochondrial membrane and encoded by the TAZ gene (cytogenetic location: Xq28) in humans. Its mutations cause Barth syndrome (MIM ID: #302060)/3-methylglutaconic aciduria type II, an inborn error of metabolism often leading to foetal or infantile fatality. Nevertheless, some mis-sense mutations result in mild clinical symptoms. To evaluate the rationale of mild symptoms and to gain insight into the Tafazzin active site, sequence-based and structure-based ramifications of wild-type and mutant Tafazzins were compared in-silico. Sequence-based domain predictions, surface accessibilities on substitution and conserved catalytic sites with statistical drifts, as well as thermal stability changes for the mutations and the interaction analysis of Tafazzin were performed. The crystal structure of Tafazzin has not yet been resolved experimentally; therefore, 3D coordinates of Tafazzin and its mutants were generated through homology modeling. Energetically minimized and structurally validated models were used for comparative docking simulations. We analyzed the active site geometry of the models in addition to calculating overall substrate binding efficiencies for each of the enzyme-ligand complexes, deduced from binding energies instead of comparing only the docking scores. Also, individual binding energies of catalytic residues on the conserved HX4D motif of the acyltransferase superfamily present in Tafazzins were estimated. This work elucidates the basis of mild symptoms in patients with mis-sense mutations, identifies the most pathogenic mutant among those in the study and also reveals the critical role of the HX4D domain in successful transacylation by Tafazzin. The in-silico observations are in complete agreement with clinical findings reported for the patients with mutations.
|
50
|
Amato R, Morleo M, Giaquinto L, di Bernardo D, Franco B. A network-based approach to dissect the cilia/centrosome complex interactome. BMC Genomics 2014; 15:658. [PMID: 25102769 PMCID: PMC4137083 DOI: 10.1186/1471-2164-15-658] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Received: 12/10/2013] [Accepted: 07/31/2014] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Cilia are microtubule-based organelles protruding from almost all mammalian cells which, when dysfunctional, result in genetic disorders called "ciliopathies". High-throughput studies have revealed that cilia are composed of thousands of proteins. However, despite many efforts, much remains to be determined regarding the biological functions of this increasingly important complex organelle. RESULTS: We have derived an online tool from a systematic network-based approach to dissect the cilia/centrosome complex interactome (CCCI). The tool integrates all currently available data into a model which provides an "interaction" perspective on ciliary function. We generated a network of interactions between human proteins organized into functionally relevant "communities", which can be defined as groups of genes that are both highly inter-connected and strongly co-expressed. We then combined sequence and co-expression data in order to identify the transcription factors responsible for regulating genes within their respective communities. Our analyses have discovered communities significantly specialized for specific biological functions such as mRNA processing, protein translation, folding and degradation, processes that had never been associated with ciliary proteins until now. CONCLUSIONS: CCCI will allow us to clarify the roles of previously unknown ciliary functions, elucidate the molecular mechanisms underlying ciliary-associated phenotypes, and apply our knowledge of the functional roles of relatively uncharacterized molecular entities to disease phenotypes and new clinical applications.
Affiliation(s)
| | | | | | | | - Brunella Franco
- Telethon Institute of Genetics and Medicine (TIGEM), Naples, Italy.
|