1
|
Zitnik M, Li MM, Wells A, Glass K, Morselli Gysi D, Krishnan A, Murali TM, Radivojac P, Roy S, Baudot A, Bozdag S, Chen DZ, Cowen L, Devkota K, Gitter A, Gosline SJC, Gu P, Guzzi PH, Huang H, Jiang M, Kesimoglu ZN, Koyuturk M, Ma J, Pico AR, Pržulj N, Przytycka TM, Raphael BJ, Ritz A, Sharan R, Shen Y, Singh M, Slonim DK, Tong H, Yang XH, Yoon BJ, Yu H, Milenković T. Current and future directions in network biology. Bioinformatics Advances 2024; 4:vbae099. [PMID: 39143982 PMCID: PMC11321866 DOI: 10.1093/bioadv/vbae099] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2023] [Revised: 05/31/2024] [Accepted: 07/08/2024] [Indexed: 08/16/2024]
Abstract
Summary: Network biology is an interdisciplinary field bridging computational and biological sciences that has proved pivotal in advancing the understanding of cellular functions and diseases across biological systems and scales. Although the field has been around for two decades, it remains nascent. It has witnessed rapid evolution, accompanied by emerging challenges. These stem from various factors, notably the growing complexity and volume of data together with the increased diversity of data types describing different tiers of biological organization. We discuss prevailing research directions in network biology, focusing on molecular/cellular networks but also on other biological network types such as biomedical knowledge graphs, patient similarity networks, brain networks, and social/contact networks relevant to disease spread. In more detail, we highlight areas of inference and comparison of biological networks, multimodal data integration and heterogeneous networks, higher-order network analysis, machine learning on networks, and network-based personalized medicine. Following the overview of recent breakthroughs across these five areas, we offer a perspective on future directions of network biology. Additionally, we discuss scientific communities, educational initiatives, and the importance of fostering diversity within the field. This article establishes a roadmap for an immediate and long-term vision for network biology. Availability and implementation: Not applicable.
Affiliation(s)
- Marinka Zitnik
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, United States
| | - Michelle M Li
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, United States
| | - Aydin Wells
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, United States
- Lucy Family Institute for Data and Society, University of Notre Dame, Notre Dame, IN 46556, United States
- Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN 46556, United States
| | - Kimberly Glass
- Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115, United States
| | - Deisy Morselli Gysi
- Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115, United States
- Department of Statistics, Federal University of Paraná, Curitiba, Paraná 81530-015, Brazil
- Department of Physics, Northeastern University, Boston, MA 02115, United States
| | - Arjun Krishnan
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, United States
| | - T M Murali
- Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, United States
| | - Predrag Radivojac
- Khoury College of Computer Sciences, Northeastern University, Boston, MA 02115, United States
| | - Sushmita Roy
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53715, United States
- Wisconsin Institute for Discovery, Madison, WI 53715, United States
| | - Anaïs Baudot
- Aix Marseille Université, INSERM, MMG, Marseille, France
| | - Serdar Bozdag
- Department of Computer Science and Engineering, University of North Texas, Denton, TX 76203, United States
- Department of Mathematics, University of North Texas, Denton, TX 76203, United States
| | - Danny Z Chen
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, United States
| | - Lenore Cowen
- Department of Computer Science, Tufts University, Medford, MA 02155, United States
| | - Kapil Devkota
- Department of Computer Science, Tufts University, Medford, MA 02155, United States
| | - Anthony Gitter
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53715, United States
- Morgridge Institute for Research, Madison, WI 53715, United States
| | - Sara J C Gosline
- Biological Sciences Division, Pacific Northwest National Laboratory, Seattle, WA 98109, United States
| | - Pengfei Gu
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, United States
| | - Pietro H Guzzi
- Department of Medical and Surgical Sciences, University Magna Graecia of Catanzaro, Catanzaro, 88100, Italy
| | - Heng Huang
- Department of Computer Science, University of Maryland College Park, College Park, MD 20742, United States
| | - Meng Jiang
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, United States
| | - Ziynet Nesibe Kesimoglu
- Department of Computer Science and Engineering, University of North Texas, Denton, TX 76203, United States
- National Center of Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20814, United States
| | - Mehmet Koyuturk
- Department of Computer and Data Sciences, Case Western Reserve University, Cleveland, OH 44106, United States
| | - Jian Ma
- Ray and Stephanie Lane Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, United States
| | - Alexander R Pico
- Institute of Data Science and Biotechnology, Gladstone Institutes, San Francisco, CA 94158, United States
| | - Nataša Pržulj
- Department of Computer Science, University College London, London, WC1E 6BT, England
- ICREA, Catalan Institution for Research and Advanced Studies, Barcelona, 08010, Spain
- Barcelona Supercomputing Center (BSC), Barcelona, 08034, Spain
| | - Teresa M Przytycka
- National Center of Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20814, United States
| | - Benjamin J Raphael
- Department of Computer Science, Princeton University, Princeton, NJ 08544, United States
| | - Anna Ritz
- Department of Biology, Reed College, Portland, OR 97202, United States
| | - Roded Sharan
- School of Computer Science, Tel Aviv University, Tel Aviv, 69978, Israel
| | - Yang Shen
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, United States
| | - Mona Singh
- Department of Computer Science, Princeton University, Princeton, NJ 08544, United States
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, United States
| | - Donna K Slonim
- Department of Computer Science, Tufts University, Medford, MA 02155, United States
| | - Hanghang Tong
- Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL 61801, United States
| | - Xinan Holly Yang
- Department of Pediatrics, University of Chicago, Chicago, IL 60637, United States
| | - Byung-Jun Yoon
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, United States
- Computational Science Initiative, Brookhaven National Laboratory, Upton, NY 11973, United States
| | - Haiyuan Yu
- Department of Computational Biology, Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, United States
| | - Tijana Milenković
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, United States
- Lucy Family Institute for Data and Society, University of Notre Dame, Notre Dame, IN 46556, United States
- Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN 46556, United States
| |
|
2
|
Rout M, Kour B, Vuree S, Lulu SS, Medicherla KM, Suravajhala P. Diabetes mellitus susceptibility with varied diseased phenotypes and its comparison with phenome interactome networks. World J Clin Cases 2022; 10:5957-5964. [PMID: 35949812 PMCID: PMC9254192 DOI: 10.12998/wjcc.v10.i18.5957] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/27/2021] [Revised: 02/02/2022] [Accepted: 04/22/2022] [Indexed: 02/06/2023]
Abstract
An emerging area of interest in understanding disease phenotypes is systems genomics. Complex diseases such as diabetes have played an important role towards understanding the susceptible genes and mutations. A wide number of methods have been employed and strategies such as polygenic risk score and allele frequencies have been useful, but understanding the candidate genes harboring those mutations is an unmet goal. In this perspective, using systems genomic approaches, we highlight the application of phenome-interactome networks in diabetes and provide deep insights. LINC01128, which we previously described as candidate for diabetes, is shown as an example to discuss the approach.
Affiliation(s)
- Madhusmita Rout
- Department of Pediatrics, University of Oklahoma Health Sciences Centre, Oklahoma City, OK 73104, United States
- Department of Biotechnology and Bioinformatics, Birla Institute of Scientific Research, Jaipur 302001, Rajasthan, India
| | - Bhumandeep Kour
- Department of Biotechnology, Lovely Professional University, Phagwara 144001, Punjab, India
| | - Sugunakar Vuree
- Department of Biotechnology, Lovely Professional University, Phagwara 144001, Punjab, India
| | - Sajitha S Lulu
- Department of Biotechnology, Vellore Institute of Technology, Vellore 632014, Tamil Nadu, India
| | - Krishna Mohan Medicherla
- Department of Biotechnology and Bioinformatics, Birla Institute of Scientific Research, Jaipur 302001, Rajasthan, India
| | - Prashanth Suravajhala
- Amrita School of Biotechnology, Amrita Vishwa Vidyapeetham, Vallikavu PO, Amritapuri, Clappana, Kollam 690525, Kerala, India
| |
|
3
|
Abstract
Since the large-scale experimental characterization of protein–protein interactions (PPIs) is not possible for all species, several computational PPI prediction methods have been developed that harness existing data from other species. While PPI network prediction has been extensively used in eukaryotes, microbial network inference has lagged behind. However, bacterial interactomes can be built using the same principles and techniques; in fact, several methods are better suited to bacterial genomes. These predicted networks allow systems-level analyses in species that lack experimental interaction data. This review describes the current network inference and analysis techniques and summarizes the use of computationally-predicted microbial interactomes to date.
|
4
|
Zhang N, Zang T. A multi-network integration approach for measuring disease similarity based on ncRNA regulation and heterogeneous information. BMC Bioinformatics 2022; 23:89. [PMID: 35255810 PMCID: PMC8902705 DOI: 10.1186/s12859-022-04613-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/10/2022] [Accepted: 02/14/2022] [Indexed: 11/28/2022]
Abstract
Background: Measuring similarity between complex diseases has significant implications for revealing the pathogenesis of diseases and development in the domain of biomedicine. It has been consentaneous that functional associations between disease-related genes and semantic associations can be applied to calculate disease similarity. Currently, more and more studies have demonstrated the profound involvement of non-coding RNA in the regulation of genome organization and gene expression. Thus, taking ncRNA into account can be useful in measuring disease similarities. However, existing methods ignore the regulation functions of ncRNA in biological process. In this study, we proposed a novel deep-learning method to deduce disease similarity. Results: In this article, we proposed a novel method, ImpAESim, a framework integrating multiple networks embedding to learn compact feature representations and disease similarity calculation. We first utilize three different disease-related information networks to build up a heterogeneous network, after a network diffusion process, RWR, a compact feature learning model composed of classic Auto Encoder (AE) and improved AE model is proposed to extract constraints and low-dimensional feature representations. We finally obtain an accurate and low-dimensional feature representation of diseases, then we employed the cosine distance as the measurement of disease similarity. Conclusion: ImpAESim focuses on extracting a low-dimensional vector representation of features based on ncRNA regulation, and gene–gene interaction network. Our method can significantly reduce the calculation bias resulted from the sparse disease associations which are derived from semantic associations.
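As a rough illustration of two steps named in the abstract above (network diffusion by random walk with restart, then cosine comparison of the resulting node profiles), the following is a minimal sketch only. It is not the ImpAESim implementation: the autoencoder stage is omitted entirely, the toy adjacency matrix, the `restart` value, and all function names are assumptions made for illustration, and it reports cosine similarity rather than cosine distance.

```python
import numpy as np

def random_walk_with_restart(adj, seed, restart=0.5, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - r) * W p + r * e until convergence for one seed node."""
    adj = np.asarray(adj, dtype=float)
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0      # isolated nodes: avoid division by zero
    W = adj / col_sums                 # column-normalized transition matrix
    e = np.zeros(adj.shape[0])
    e[seed] = 1.0                      # restart vector concentrated on the seed
    p = e.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * W @ p + restart * e
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p

def cosine_similarity(u, v):
    """Cosine of the angle between two diffusion profiles."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical toy network: nodes 0, 1 and 3 are each linked only to node 2.
adj = np.array([[0, 0, 1, 0],
                [0, 0, 1, 0],
                [1, 1, 0, 1],
                [0, 0, 1, 0]])

profiles = [random_walk_with_restart(adj, s) for s in range(adj.shape[0])]
sim_01 = cosine_similarity(profiles[0], profiles[1])
```

Each profile is a probability distribution over nodes, so nodes with overlapping neighborhoods end up with similar profiles even when they are not directly connected.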
Affiliation(s)
- Ningyi Zhang
- Department of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
| | - Tianyi Zang
- Department of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
| |
|
5
|
Maudsley S, Leysen H, van Gastel J, Martin B. Systems Pharmacology: Enabling Multidimensional Therapeutics. Comprehensive Pharmacology 2022:725-769. [DOI: 10.1016/b978-0-12-820472-6.00017-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/03/2025]
|
6
|
Leysen H, Walter D, Christiaenssen B, Vandoren R, Harputluoğlu İ, Van Loon N, Maudsley S. GPCRs Are Optimal Regulators of Complex Biological Systems and Orchestrate the Interface between Health and Disease. Int J Mol Sci 2021; 22:ijms222413387. [PMID: 34948182 PMCID: PMC8708147 DOI: 10.3390/ijms222413387] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 11/24/2021] [Revised: 12/08/2021] [Accepted: 12/09/2021] [Indexed: 02/06/2023]
Abstract
GPCRs arguably represent the most effective current therapeutic targets for a plethora of diseases. GPCRs also possess a pivotal role in the regulation of the physiological balance between healthy and pathological conditions; thus, their importance in systems biology cannot be underestimated. The molecular diversity of GPCR signaling systems is likely to be closely associated with disease-associated changes in organismal tissue complexity and compartmentalization, thus enabling a nuanced GPCR-based capacity to interdict multiple disease pathomechanisms at a systemic level. GPCRs have been long considered as controllers of communication between tissues and cells. This communication involves the ligand-mediated control of cell surface receptors that then direct their stimuli to impact cell physiology. Given the tremendous success of GPCRs as therapeutic targets, considerable focus has been placed on the ability of these therapeutics to modulate diseases by acting at cell surface receptors. In the past decade, however, attention has focused upon how stable multiprotein GPCR superstructures, termed receptorsomes, both at the cell surface membrane and in the intracellular domain dictate and condition long-term GPCR activities associated with the regulation of protein expression patterns, cellular stress responses and DNA integrity management. The ability of these receptorsomes (often in the absence of typical cell surface ligands) to control complex cellular activities implicates them as key controllers of the functional balance between health and disease. A greater understanding of this function of GPCRs is likely to significantly augment our ability to further employ these proteins in a multitude of diseases.
Affiliation(s)
- Hanne Leysen
- Receptor Biology Lab, University of Antwerp, 2610 Wilrijk, Belgium
| | - Deborah Walter
- Receptor Biology Lab, University of Antwerp, 2610 Wilrijk, Belgium
| | - Bregje Christiaenssen
- Receptor Biology Lab, University of Antwerp, 2610 Wilrijk, Belgium
| | - Romi Vandoren
- Receptor Biology Lab, University of Antwerp, 2610 Wilrijk, Belgium
| | - İrem Harputluoğlu
- Receptor Biology Lab, University of Antwerp, 2610 Wilrijk, Belgium
- Department of Chemistry, Middle East Technical University, Çankaya, Ankara 06800, Turkey
| | - Nore Van Loon
- Receptor Biology Lab, University of Antwerp, 2610 Wilrijk, Belgium
| | - Stuart Maudsley
- Receptor Biology Lab, University of Antwerp, 2610 Wilrijk, Belgium
| |
|
7
|
Corpas M, Megy K, Mistry V, Metastasio A, Lehmann E. Whole Genome Interpretation for a Family of Five. Front Genet 2021; 12:535123. [PMID: 33763108 PMCID: PMC7982663 DOI: 10.3389/fgene.2021.535123] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 02/14/2020] [Accepted: 02/15/2021] [Indexed: 12/19/2022]
Abstract
Although best practices have emerged on how to analyse and interpret personal genomes, the utility of whole genome screening remains underdeveloped. A large amount of information can be gathered from various types of analyses via whole genome sequencing including pathogenicity screening, genetic risk scoring, fitness, nutrition, and pharmacogenomic analysis. We recognize different levels of confidence when assessing the validity of genetic markers and apply rigorous standards for evaluation of phenotype associations. We illustrate the application of this approach on a family of five. By applying analyses of whole genomes from different methodological perspectives, we are able to build a more comprehensive picture to assist decision making in preventative healthcare and well-being management. Our interpretation and reporting outputs provide input for a clinician to develop a healthcare plan for the individual, based on genetic and other healthcare data.
Affiliation(s)
- Manuel Corpas
- Cambridge Precision Medicine Limited, ideaSpace, University of Cambridge Biomedical Innovation Hub, Cambridge, United Kingdom; Institute of Continuing Education Madingley Hall Madingley, University of Cambridge, Cambridge, United Kingdom; Facultad de Ciencias de la Salud, Universidad Internacional de La Rioja, Madrid, Spain
| | - Karyn Megy
- Cambridge Precision Medicine Limited, ideaSpace, University of Cambridge Biomedical Innovation Hub, Cambridge, United Kingdom; Department of Haematology, University of Cambridge & National Health Service (NHS) Blood and Transplant, Cambridge, United Kingdom
| | | | - Antonio Metastasio
- Cambridge Precision Medicine Limited, ideaSpace, University of Cambridge Biomedical Innovation Hub, Cambridge, United Kingdom; Camden and Islington NHS Foundation Trust, London, United Kingdom
| | - Edmund Lehmann
- Cambridge Precision Medicine Limited, ideaSpace, University of Cambridge Biomedical Innovation Hub, Cambridge, United Kingdom
| |
|
8
|
Yang K, Lu K, Wu Y, Yu J, Liu B, Zhao Y, Chen J, Zhou X. A network-based machine-learning framework to identify both functional modules and disease genes. Hum Genet 2021; 140:897-913. [PMID: 33409574 DOI: 10.1007/s00439-020-02253-0] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 09/11/2020] [Accepted: 12/22/2020] [Indexed: 01/20/2023]
Abstract
Disease gene identification is a critical step towards uncovering the molecular mechanisms of diseases and systematically investigating complex disease phenotypes. Despite considerable efforts to develop powerful computing methods, candidate gene identification remains a severe challenge owing to the connectivity of an incomplete interactome network, which hampers the discovery of true novel candidate genes. We developed a network-based machine-learning framework to identify both functional modules and disease candidate genes. In this framework, we designed a semi-supervised non-negative matrix factorization model to obtain the functional modules related to the diseases and genes. Of note, we proposed a disease gene-prioritizing method called MapGene that integrates the correlations from both functional modules and network closeness. Our framework identified a set of functional modules with highly functional homogeneity and close gene interactions. Experiments on a large-scale benchmark dataset showed that MapGene performs significantly better than the state-of-the-art algorithms. Further analysis demonstrates MapGene can effectively relieve the impact of the incompleteness of interactome networks and obtain highly reliable rankings of candidate genes. In addition, disease cases on Parkinson's disease and diabetes mellitus confirmed the generalization of MapGene for novel candidate gene identification. This work proposed, for the first time, an integrated computing framework to predict both functional modules and disease candidate genes. The methodology and results support that our framework has the potential to help discover underlying functional modules and reliable candidate genes in human disease.
Affiliation(s)
- Kuo Yang
- School of Computer and Information Technology, Institute of Medical Intelligence, Beijing Jiaotong University, Beijing, 100044, China; Institute for TCM-X, MOE Key Laboratory of Bioinformatics / Bioinformatics Division, BNRIST, Department of Automation, Tsinghua University, Beijing, 10084, China
| | - Kezhi Lu
- School of Computer and Information Technology, Institute of Medical Intelligence, Beijing Jiaotong University, Beijing, 100044, China; imec-DistriNet, KU Leuven, Leuven, 3001, Belgium
| | - Yang Wu
- Key Laboratory of Intelligent Information Processing, Advanced Computer Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100190, China
| | - Jian Yu
- Beijing Key Laboratory of Traffic Data Analysis and Mining, School of Computer and Information Technology, Beijing Jiaotong University, Beijing, 100044, China
| | - Baoyan Liu
- Data Center of Traditional Chinese Medicine, China Academy of Chinese Medical Sciences, Beijing, 100700, China
| | - Yi Zhao
- Key Laboratory of Intelligent Information Processing, Advanced Computer Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100190, China
| | - Jianxin Chen
- Beijing University of Chinese Medicine, Beijing, 100029, China
| | - Xuezhong Zhou
- School of Computer and Information Technology, Institute of Medical Intelligence, Beijing Jiaotong University, Beijing, 100044, China; Data Center of Traditional Chinese Medicine, China Academy of Chinese Medical Sciences, Beijing, 100700, China
| |
|
9
|
Yang K, Wang R, Liu G, Shu Z, Wang N, Zhang R, Yu J, Chen J, Li X, Zhou X. HerGePred: Heterogeneous Network Embedding Representation for Disease Gene Prediction. IEEE J Biomed Health Inform 2020; 23:1805-1815. [PMID: 31283472 DOI: 10.1109/jbhi.2018.2870728] [Citation(s) in RCA: 45] [Impact Index Per Article: 9.0] [Indexed: 11/06/2022]
Abstract
The discovery of disease-causing genes is a critical step towards understanding the nature of a disease and determining a possible cure for it. In recent years, many computational methods to identify disease genes have been proposed. However, making full use of disease-related (e.g., symptoms) and gene-related (e.g., gene ontology and protein-protein interactions) information to improve the performance of disease gene prediction is still an issue. Here, we develop a heterogeneous disease-gene-related network (HDGN) embedding representation framework for disease gene prediction (called HerGePred). Based on this framework, a low-dimensional vector representation (LVR) of the nodes in the HDGN can be obtained. Then, we propose two specific algorithms, namely, an LVR-based similarity prediction and a random walk with restart on a reconstructed heterogeneous disease-gene network (RW-RDGN), to predict disease genes with high performance. First, to validate the rationality of the framework, we analyze the similarity-based overlap distribution of disease pairs and design an experiment for disease-gene association recovery, the results of which revealed that the LVR of nodes performs well at preserving the local and global network structure of the HDGN. Then, we apply tenfold cross validation and external validation to compare our methods with other well-known disease gene prediction algorithms. The experimental results show that the RW-RDGN performs better than the state-of-the-art algorithm. The prediction results of disease candidate genes are essential for molecular mechanism investigation and experimental validation. The source codes of HerGePred and experimental data are available at https://github.com/yangkuoone/HerGePred.
|
10
|
Karaman B, Sippl W. Computational Drug Repurposing: Current Trends. Curr Med Chem 2019; 26:5389-5409. [DOI: 10.2174/0929867325666180530100332] [Citation(s) in RCA: 37] [Impact Index Per Article: 6.2] [Received: 02/02/2018] [Revised: 05/06/2018] [Accepted: 05/14/2018] [Indexed: 01/31/2023]
Abstract
Biomedical discovery has been reshaped upon the exploding digitization of data which can be retrieved from a number of sources, ranging from clinical pharmacology to cheminformatics-driven databases. Now, supercomputing platforms and publicly available resources such as biological, physicochemical, and clinical data, can all be integrated to construct a detailed map of signaling pathways and drug mechanisms of action in relation to drug candidates. Recent advancements in computer-aided data mining have facilitated analyses of ‘big data’ approaches and the discovery of new indications for pre-existing drugs has been accelerated. Linking gene-phenotype associations to predict novel drug-disease signatures or incorporating molecular structure information of drugs and protein targets with other kinds of data derived from systems biology provide great potential to accelerate drug discovery and improve the success of drug repurposing attempts. In this review, we highlight commonly used computational drug repurposing strategies, including bioinformatics and cheminformatics tools, to integrate large-scale data emerging from the systems biology, and consider both the challenges and opportunities of using this approach. Moreover, we provide successful examples and case studies that combined various in silico drug-repurposing strategies to predict potential novel uses for known therapeutics.
Affiliation(s)
- Berin Karaman
- Biruni University - Department of Pharmaceutical Chemistry, Istanbul, Turkey
| | - Wolfgang Sippl
- Martin-Luther University of Halle-Wittenberg - Institute of Pharmacy, Halle (Saale), Germany
| |
|
11
|
Dozmorov MG. Disease classification: from phenotypic similarity to integrative genomics and beyond. Brief Bioinform 2019; 20:1769-1780. [DOI: 10.1093/bib/bby049] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 03/21/2018] [Revised: 05/01/2018] [Indexed: 02/06/2023]
Abstract
A fundamental challenge of modern biomedical research is understanding how diseases that are similar on the phenotypic level are similar on the molecular level. Integration of various genomic data sets with the traditionally used phenotypic disease similarity revealed novel genetic and molecular mechanisms and blurred the distinction between monogenic (Mendelian) and complex diseases. Network-based medicine has emerged as a complementary approach for identifying disease-causing genes, genetic mediators, disruptions in the underlying cellular functions and for drug repositioning. The recent development of machine and deep learning methods allow for leveraging real-life information about diseases to refine genetic and phenotypic disease relationships. This review describes the historical development and recent methodological advancements for studying disease classification (nosology).
Affiliation(s)
- Mikhail G Dozmorov
- Department of Biostatistics, Virginia Commonwealth University, 830 East Main Street, Richmond, VA, USA
| |
|
12
|
Almasi SM, Hu T. Measuring the importance of vertices in the weighted human disease network. PLoS One 2019; 14:e0205936. [PMID: 30901770 PMCID: PMC6430629 DOI: 10.1371/journal.pone.0205936] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 09/30/2018] [Accepted: 02/26/2019] [Indexed: 12/11/2022]
Abstract
Many human genetic disorders and diseases are known to be related to each other through frequently observed co-occurrences. Studying the correlations among multiple diseases provides an important avenue to better understand the common genetic background of diseases and to help develop new drugs that can treat multiple diseases. Meanwhile, network science has seen increasing applications on modeling complex biological systems, and can be a powerful tool to elucidate the correlations of multiple human diseases. In this article, known disease-gene associations were represented using a weighted bipartite network. We extracted a weighted human diseases network from such a bipartite network to show the correlations of diseases. Subsequently, we proposed a new centrality measurement for the weighted human disease network (WHDN) in order to quantify the importance of diseases. Using our centrality measurement to quantify the importance of vertices in WHDN, we were able to find a set of most central diseases. By investigating the 30 top diseases and their most correlated neighbors in the network, we identified disease linkages including known disease pairs and novel findings. Our research helps better understand the common genetic origin of human diseases and suggests top diseases that likely induce other related diseases.
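The projection step described in the abstract above (deriving a weighted disease network from a weighted disease-gene bipartite network) can be sketched roughly as follows. This is only an illustration under stated assumptions: the toy weight matrix `B`, the choice of the product `B @ B.T` as the projection weight, and the function name are hypothetical, and the weighted degree ("strength") computed at the end is a generic stand-in, not the new centrality measure proposed by the authors.

```python
import numpy as np

# Hypothetical weighted bipartite matrix: rows are diseases, columns are genes.
B = np.array([[1.0, 0.5, 0.0],   # disease 0
              [0.0, 1.0, 1.0],   # disease 1
              [0.0, 0.0, 2.0]])  # disease 2

def project_diseases(bipartite):
    """One-mode projection: two diseases are linked in proportion to shared gene weights."""
    W = bipartite @ bipartite.T  # entry (i, j) sums products of weights over shared genes
    np.fill_diagonal(W, 0.0)     # drop self-links
    return W

W = project_diseases(B)

# Generic weighted degree per disease (an assumption, not the paper's centrality).
strength = W.sum(axis=1)
most_central = int(np.argmax(strength))
```

In this toy input, disease 1 shares genes with both other diseases and therefore receives the largest weighted degree.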
Affiliation(s)
| | - Ting Hu
- Department of Computer Science, Memorial University, St. John’s, NL, Canada
| |
|
13
|
Zhang J, Zou S, Deng L. Gene Ontology-based function prediction of long non-coding RNAs using bi-random walk. BMC Med Genomics 2018; 11:99. [PMID: 30453964 PMCID: PMC6245587 DOI: 10.1186/s12920-018-0414-2] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Indexed: 02/01/2023]
Abstract
Background With the development of sequencing technology, more and more long non-coding RNAs (lncRNAs) have been identified. Some lncRNAs have been confirmed to play important roles in development through dosage compensation, epigenetic regulation, regulation of cell differentiation, and other mechanisms. However, the majority of lncRNAs have not been functionally characterized. Exploring the functions of lncRNAs and their regulatory networks has become an active research topic. Methods In this work, a network-based model named BiRWLGO is developed. The ultimate goal is to predict the probable functions of lncRNAs at large scale. The model starts by building a global network composed of three networks: an lncRNA similarity network, an lncRNA-protein association network, and a protein-protein interaction (PPI) network. It then uses a bi-random walk algorithm to explore the similarities between lncRNAs and proteins. Finally, an lncRNA is annotated with Gene Ontology (GO) terms according to its neighboring proteins. Results We compare the performance of BiRWLGO with state-of-the-art models on a manually annotated lncRNA benchmark with known GO terms. The experimental results show that BiRWLGO outperforms the other methods in terms of both maximum F-measure (Fmax) and coverage. Conclusions BiRWLGO is a relatively efficient method for predicting the functions of lncRNAs. When protein interaction data are integrated, the predictive performance of BiRWLGO improves substantially. Electronic supplementary material The online version of this article (10.1186/s12920-018-0414-2) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Jingpu Zhang
- School of Computer and Data Science, Henan University of Urban Construction, Pingdingshan, 467000, China; School of Information Science and Engineering, Central South University, Changsha, 410083, China
- Shuai Zou
- School of Information Science and Engineering, Central South University, Changsha, 410083, China
- Lei Deng
- School of Software, Central South University, Changsha, 410075, China
|
14
|
Henry S, McQuilkin A, McInnes BT. Association measures for estimating semantic similarity and relatedness between biomedical concepts. Artif Intell Med 2018; 93:1-10. [PMID: 30197305 DOI: 10.1016/j.artmed.2018.08.006] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 09/01/2017] [Revised: 03/08/2018] [Accepted: 08/24/2018] [Indexed: 12/26/2022]
Abstract
Association measures quantify how often a term pair is observed to co-occur relative to how often it would be expected to co-occur by chance. This is based both on the terms' individual occurrence frequencies and on their mutual co-occurrence frequencies. One application of association scores is estimating semantic relatedness, which is critical for many natural language processing applications, such as clustering of biomedical and clinical documents and the development of biomedical terminologies and ontologies. In this paper we propose a method of generating association scores between biomedical concepts to estimate semantic relatedness. We use co-occurrence statistics between Unified Medical Language System (UMLS) concepts to account for lexical variation at the synonymous level, and introduce a process of concept expansion that exploits hierarchical information from the UMLS to account for lexical variation at the hyponymous level. State-of-the-art results are achieved on several standard evaluation datasets, and an in-depth analysis of hyper-parameters is presented.
Affiliation(s)
- Sam Henry
- Virginia Commonwealth University, Richmond, VA, United States
- Alex McQuilkin
- Virginia Commonwealth University, Richmond, VA, United States
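The idea of comparing observed co-occurrence with chance co-occurrence, as described in the abstract above, can be made concrete with pointwise mutual information (PMI), one widely used association measure. This is a minimal sketch: the abstract does not say which measures the authors evaluate, so PMI is chosen here only as a familiar example, and the counts, the function name `pmi`, and its parameters are hypothetical. The UMLS-based concept expansion is not shown.

```python
import math


def pmi(cooccur, count_x, count_y, total):
    """Pointwise mutual information in bits.

    Compares the observed co-occurrence probability with the probability
    expected if the two terms occurred independently.
    """
    if min(cooccur, count_x, count_y) <= 0:
        return float("-inf")  # no evidence of association
    p_xy = cooccur / total
    p_x = count_x / total
    p_y = count_y / total
    return math.log2(p_xy / (p_x * p_y))


# Hypothetical counts: two concepts seen 100 and 50 times in 10,000
# contexts, co-occurring in 20 of them.
score = pmi(cooccur=20, count_x=100, count_y=50, total=10000)
```

A score near zero means the pair co-occurs about as often as chance predicts; positive scores indicate association.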
|
15
|
Gu S, Johnson J, Faisal FE, Milenković T. From homogeneous to heterogeneous network alignment via colored graphlets. Sci Rep 2018; 8:12524. [PMID: 30131590 PMCID: PMC6104050 DOI: 10.1038/s41598-018-30831-w] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 03/05/2018] [Accepted: 08/07/2018] [Indexed: 11/19/2022]
Abstract
Network alignment (NA) compares networks with the goal of finding a node mapping that uncovers highly similar (conserved) network regions. Existing NA methods are homogeneous, i.e., they can deal only with networks containing nodes and edges of one type. Due to increasing amounts of heterogeneous network data with nodes or edges of different types, we extend three recent state-of-the-art homogeneous NA methods, WAVE, MAGNA++, and SANA, to allow for heterogeneous NA for the first time. We introduce several algorithmic novelties. Namely, these existing methods compute homogeneous graphlet-based node similarities and then find high-scoring alignments with respect to these similarities, while simultaneously maximizing the number of conserved edges. Instead, we extend homogeneous graphlets to their heterogeneous counterparts, which we then use to develop a new measure of heterogeneous node similarity. Also, we extend S3, a state-of-the-art measure of edge conservation for homogeneous NA, to its heterogeneous counterpart. Then, we find high-scoring alignments with respect to our heterogeneous node similarity and edge conservation measures. In evaluations on synthetic and real-world biological networks, our proposed heterogeneous NA methods lead to higher-quality alignments and better robustness to noise in the data than their homogeneous counterparts. The software and data from this work are available at https://nd.edu/~cone/colored_graphlets/.
Affiliation(s)
- Shawn Gu
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, 46556, USA
- John Johnson
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, 46556, USA
- Fazle E Faisal
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, 46556, USA
- Eck Institute for Global Health and Interdisciplinary Center for Network Science and Applications (iCeNSA), University of Notre Dame, Notre Dame, IN, 46556, USA
- Tijana Milenković
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, 46556, USA
- Eck Institute for Global Health and Interdisciplinary Center for Network Science and Applications (iCeNSA), University of Notre Dame, Notre Dame, IN, 46556, USA
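The abstract above mentions S3 as a measure of edge conservation for homogeneous NA. The sketch below assumes the definition commonly given for homogeneous S3: the number of conserved edges divided by the number of edges in the first network plus the number of edges in the subgraph of the second network induced by the aligned nodes, minus the number of conserved edges. That formula is stated here as an assumption rather than taken from the abstract, the heterogeneous extension proposed in the paper is not shown, and the function name `s3_score` and the toy networks are hypothetical.

```python
def s3_score(edges1, edges2, mapping):
    """Edge conservation of an alignment under the assumed homogeneous S3 formula.

    edges1, edges2: iterables of undirected edges (pairs of node names).
    mapping: dict sending every node of network 1 to a distinct node of network 2.
    """
    e1 = {frozenset(e) for e in edges1}
    e2 = {frozenset(e) for e in edges2}
    image = set(mapping.values())

    # Edges of network 1 whose images are also edges of network 2.
    conserved = sum(1 for e in e1 if frozenset(mapping[u] for u in e) in e2)
    # Edges of network 2 with both endpoints among the aligned nodes.
    induced = sum(1 for e in e2 if e <= image)

    denom = len(e1) + induced - conserved
    return conserved / denom if denom else 0.0


# Hypothetical toy networks and alignment.
g1 = [("a", "b"), ("b", "c")]
g2 = [("x", "y"), ("y", "z"), ("x", "z"), ("z", "w")]
alignment = {"a": "x", "b": "y", "c": "z"}
score = s3_score(g1, g2, alignment)
```

In this toy case both edges of the first network are conserved, but the aligned region of the second network contains one extra edge, so the score is below 1.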
|
16
|
Tian Z, Guo M, Wang C, Xing L, Wang L, Zhang Y. Constructing an integrated gene similarity network for the identification of disease genes. J Biomed Semantics 2017; 8:32. [PMID: 29297379 PMCID: PMC5763299 DOI: 10.1186/s13326-017-0141-1] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Indexed: 12/28/2022]
Abstract
BACKGROUND Discovering novel genes that are involved in human diseases is a challenging task in biomedical research. In recent years, several computational approaches have been proposed to prioritize candidate disease genes. Most of these methods are based mainly on protein-protein interaction (PPI) networks. However, since these PPI networks contain false positives and cover less than half of known human genes, their reliability and coverage are very low. Therefore, it is highly necessary to fuse multiple genomic data to construct a credible gene similarity network and then infer disease genes on the whole genomic scale. RESULTS We propose a novel method, named RWRB, to infer causal genes of diseases of interest. First, we construct five individual gene (protein) similarity networks based on multiple genomic data of human genes. Then, an integrated gene similarity network (IGSN) is reconstructed using the similarity network fusion (SNF) method. Finally, we employ the random walk with restart algorithm on the phenotype-gene bilayer network, which combines the phenotype similarity network, the IGSN, and the phenotype-gene association network, to prioritize candidate disease genes. We investigate the effectiveness of RWRB in inferring phenotype-gene relationships through leave-one-out cross-validation. Results show that RWRB is more accurate than state-of-the-art methods on most evaluation metrics. Further analysis shows that the success of RWRB benefits from the IGSN, which has wider coverage and higher reliability compared with current PPI networks. Moreover, we conduct a comprehensive case study for Alzheimer's disease and predict some novel disease genes that are supported by the literature. CONCLUSIONS RWRB is an effective and reliable algorithm for prioritizing candidate disease genes on the genomic scale. Software and supplementary information are available at http://nclab.hit.edu.cn/~tianzhen/RWRB/ .
Affiliation(s)
- Zhen Tian
- School of Computer Science and Engineering, Harbin Institute of Technology, Harbin, 150001 People’s Republic of China
- Maozu Guo
- School of Computer Science and Engineering, Harbin Institute of Technology, Harbin, 150001 People’s Republic of China
- Chunyu Wang
- School of Computer Science and Engineering, Harbin Institute of Technology, Harbin, 150001 People’s Republic of China
- LinLin Xing
- School of Computer Science and Engineering, Harbin Institute of Technology, Harbin, 150001 People’s Republic of China
- Lei Wang
- Institute of Health Service and Medical Information Academy of Military Medical Sciences Beijing, Beijing, 100850 China
- Yin Zhang
- Institute of Health Service and Medical Information Academy of Military Medical Sciences Beijing, Beijing, 100850 China
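The random walk with restart step named in the abstract above can be sketched on a single small network. This is a minimal illustration of the standard iteration, in which each step moves along weighted edges with probability 1 - r and jumps back to the seed nodes with probability r. It does not reproduce RWRB's phenotype-gene bilayer network or its parameters; the function name, the restart value, and the toy graph are all hypothetical.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Return steady-state visiting probabilities for a walk restarting at `seeds`.

    adjacency: dict mapping each node to a dict {neighbor: weight}; an
    undirected graph must list each edge in both directions.
    """
    nodes = sorted(adjacency)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # Restart component.
        nxt = {n: restart * p0[n] for n in nodes}
        # Walk component: spread each node's mass over its neighbors.
        for u in nodes:
            total = sum(adjacency[u].values())
            if total == 0:
                continue  # isolated node: its mass is not propagated
            for v, w in adjacency[u].items():
                nxt[v] += (1.0 - restart) * p[u] * w / total
        change = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if change < tol:
            break
    return p


# Hypothetical path graph A - B - C - D with seed A.
graph = {
    "A": {"B": 1.0},
    "B": {"A": 1.0, "C": 1.0},
    "C": {"B": 1.0, "D": 1.0},
    "D": {"C": 1.0},
}
scores = random_walk_with_restart(graph, seeds={"A"})
ranking = sorted(scores, key=scores.get, reverse=True)
```

Candidates are then ranked by their steady-state score; here nodes closer to the seed rank higher.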
|
17
|
Liu Y, Zeng X, He Z, Zou Q. Inferring microRNA-disease associations by random walk on a heterogeneous network with multiple data sources. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2017; 14:905-915. [PMID: 27076459 DOI: 10.1109/tcbb.2016.2550432] [Citation(s) in RCA: 209] [Impact Index Per Article: 26.1] [Indexed: 05/21/2023]
Abstract
Since the discovery of the regulatory function of microRNA (miRNA), increased attention has focused on identifying the relationship between miRNA and disease. It has been suggested that computational methods are an efficient way to identify potential disease-related miRNAs for further confirmation using biological experiments. In this paper, we first highlight three limitations commonly associated with previous computational methods. To resolve these limitations, we establish a disease similarity subnetwork and a miRNA similarity subnetwork by integrating multiple data sources, where the disease similarity is composed of disease semantic similarity and disease functional similarity, and the miRNA similarity is calculated using the miRNA-target gene and miRNA-lncRNA (long non-coding RNA) associations. Then, a heterogeneous network is constructed by connecting the disease similarity subnetwork and the miRNA similarity subnetwork using the known miRNA-disease associations. We extend random walk with restart to predict miRNA-disease associations in the heterogeneous network. Leave-one-out cross-validation achieved an average area under the curve (AUC) of 0.8049 across 341 diseases and 476 miRNAs. For five-fold cross-validation, our method achieved AUCs from 0.7970 to 0.9249 for 15 human diseases. Case studies further demonstrated the feasibility of our method for discovering potential miRNA-disease associations. An online service for prediction is freely available at http://ifmda.aliapp.com.
|
18
|
Chen Y, Xu R. Context-sensitive network-based disease genetics prediction and its implications in drug discovery. Bioinformatics 2017; 33:1031-1039. [PMID: 28062449 DOI: 10.1093/bioinformatics/btw737] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 07/13/2016] [Accepted: 11/19/2016] [Indexed: 01/05/2023]
Abstract
Motivation Disease phenotype networks play an important role in computational approaches to identifying new disease-gene associations. Current disease phenotype networks often model disease relationships based on pairwise similarities and therefore ignore the specific context in which two diseases are connected. In this study, we propose a new strategy to model disease associations using context-sensitive networks (CSNs). We developed a CSN-based phenome-driven approach for disease genetics prediction, and investigated the translational potential of the predicted genes in drug discovery. Results We constructed CSNs by directly connecting diseases with associated phenotypes. Here, we constructed two CSNs using different data sources; the two networks contain 26 790 and 13 822 nodes, respectively. We integrated the CSNs with a genetic functional relationship network and predicted disease genes using a network-based ranking algorithm. For comparison, we built Similarity-Based disease Networks (SBNs) using the same disease phenotype data. In a de novo cross-validation for 3324 diseases, the CSN-based approach significantly improved the average rank of all tested genes from the top 12.6% to the top 8.8% compared with the SBN-based approach (p < e-22). The area under the receiver operating characteristic curve for the CSN approach was also significantly higher than for the SBN approach (0.91 versus 0.87, p < e-3). In addition, we predicted genes for Parkinson's disease (PD) using CSNs, and demonstrated that the top-ranked genes are highly relevant to PD pathogenesis. We pinpointed a top-ranked drug target gene for PD, and found its association with neurodegeneration supported by the literature. In summary, CSNs significantly improve disease genetics prediction compared with SBNs and provide leads for potential drug targets. Availability and Implementation nlp.case.edu/public/data/. Contact rxx@case.edu.
|
19
|
Luo J, Xiao Q. A novel approach for predicting microRNA-disease associations by unbalanced bi-random walk on heterogeneous network. J Biomed Inform 2017; 66:194-203. [PMID: 28104458 DOI: 10.1016/j.jbi.2017.01.008] [Citation(s) in RCA: 79] [Impact Index Per Article: 9.9] [Received: 05/24/2016] [Revised: 01/11/2017] [Accepted: 01/13/2017] [Indexed: 12/24/2022]
Abstract
MicroRNAs (miRNAs) play a critical role by regulating their targets at the post-transcriptional level. Identification of potential miRNA-disease associations will aid in deciphering the pathogenesis of human polygenic diseases. Several computational models have been developed to uncover novel miRNA-disease associations based on predicted target genes. However, due to the insufficient number of experimentally validated miRNA-target interactions as well as the relatively high false-positive and false-negative rates of predicted target genes, it is still challenging for these prediction models to achieve strong performance. The purpose of this study is to prioritize miRNA candidates for diseases. We first construct a heterogeneous network, which consists of a disease similarity network, a miRNA functional similarity network, and a known miRNA-disease association network. Then, an unbalanced bi-random walk-based algorithm on the heterogeneous network (BRWH) is adopted to discover potential associations by exploiting bipartite subgraphs. Based on 5-fold cross-validation, the proposed network-based method achieves AUC values ranging from 0.782 to 0.907 for the 22 human diseases and an average AUC of almost 0.846. The experiments indicate that BRWH performs better than several popular methods. In addition, case studies of some common diseases further demonstrate the superior performance of our proposed method in prioritizing disease-related miRNA candidates.
Affiliation(s)
- Jiawei Luo
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
- Qiu Xiao
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
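The AUC values reported in the abstract above summarize how well predicted scores separate known associations from the rest. The sketch below computes AUC by its pairwise (Mann-Whitney) interpretation: the fraction of positive-negative pairs in which the positive item receives the higher score, with ties counted as one half. The scores and labels are hypothetical, the function name `auc` is illustrative, and the cross-validation splitting used in the paper is not shown.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    labels: 1 for a known (positive) association, 0 otherwise.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical prediction scores for five candidate miRNAs of one disease.
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.2]
toy_labels = [1, 0, 1, 0, 0]
value = auc(toy_scores, toy_labels)
```

The double loop is quadratic in the number of candidates, which is fine for small held-out sets; a rank-based formula gives the same value for larger ones.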
|
20
|
Pouladi N, Achour I, Li H, Berghout J, Kenost C, Gonzalez-Garay ML, Lussier YA. Biomechanisms of Comorbidity: Reviewing Integrative Analyses of Multi-omics Datasets and Electronic Health Records. Yearb Med Inform 2016; 25:194-206. [PMID: 27830251 PMCID: PMC5171562 DOI: 10.15265/iy-2016-040] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 12/21/2022]
Abstract
OBJECTIVES Disease comorbidity is a pervasive phenomenon impacting patients' health outcomes, disease management, and clinical decisions. This review presents past, current, and future research directions leveraging both phenotypic and molecular information to uncover disease similarity underpinning the biology and etiology of disease comorbidity. METHODS We retrieved ~130 publications and retained 59, ranging from 2006 to 2015, that comprise a minimum of five diseases and at least one type of biomolecule. We surveyed their methods, disease similarity metrics, and calculation of comorbidities in the electronic health records, if present. RESULTS Among the surveyed studies, 44% generated or validated disease similarity metrics in the context of comorbidity, with 60% being published in the last two years. As inputs, 87% of studies utilized intragenic loci and proteins while 13% employed RNA (mRNA, lncRNA, or miRNA). Network modeling was predominantly used (35%), followed by statistics (28%), to impute similarity between these biomolecules and diseases. Studies with large numbers of biomolecules and diseases used network models or naïve overlap of disease-molecule associations, while machine learning, statistics, and information retrieval were utilized in small and moderate-sized studies. Multiscale computations comprising shared function, network topology, and phenotypes were performed exclusively on proteins. CONCLUSION This review highlights the growing number of methods for identifying the molecular mechanisms underpinning comorbidities that leverage multiscale molecular information and patterns from electronic health records. The survey reveals that intergenic polymorphisms have been overlooked for similarity imputation compared to their intragenic counterparts, offering new opportunities to bridge the mechanistic and similarity gaps of comorbidity.
Affiliation(s)
- Y A Lussier
- Dr. Yves A. Lussier, The University of Arizona, Bio5 Building, 1657 East Helen Street, Tucson, AZ 85721, USA, Fax: +1 520 626 4824, E-Mail:
|
21
|
Ni J, Koyuturk M, Tong H, Haines J, Xu R, Zhang X. Disease gene prioritization by integrating tissue-specific molecular networks using a robust multi-network model. BMC Bioinformatics 2016; 17:453. [PMID: 27829360 PMCID: PMC5103411 DOI: 10.1186/s12859-016-1317-x] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Received: 01/10/2016] [Accepted: 10/29/2016] [Indexed: 11/10/2022]
Abstract
BACKGROUND Accurately prioritizing candidate disease genes is an important and challenging problem. Various network-based methods have been developed to predict potential disease genes by utilizing the disease similarity network and molecular networks such as protein interaction or gene co-expression networks. Although successful, a common limitation of the existing methods is that they assume all diseases share the same molecular network and a single generic molecular network is used to predict candidate genes for all diseases. However, different diseases tend to manifest in different tissues, and the molecular networks in different tissues are usually different. An ideal method should be able to incorporate tissue-specific molecular networks for different diseases. RESULTS In this paper, we develop a robust and flexible method to integrate tissue-specific molecular networks for disease gene prioritization. Our method allows each disease to have its own tissue-specific network(s). We formulate the problem of candidate gene prioritization as an optimization problem based on network propagation. When there are multiple tissue-specific networks available for a disease, our method can automatically infer the relative importance of each tissue-specific network. Thus it is robust to the noisy and incomplete network data. To solve the optimization problem, we develop fast algorithms which have linear time complexities in the number of nodes in the molecular networks. We also provide rigorous theoretical foundations for our algorithms in terms of their optimality and convergence properties. Extensive experimental results show that our method can significantly improve the accuracy of candidate gene prioritization compared with the state-of-the-art methods. CONCLUSIONS In our experiments, we compare our methods with 7 popular network-based disease gene prioritization algorithms on diseases from Online Mendelian Inheritance in Man (OMIM) database. 
The experimental results demonstrate that our methods recover true associations more accurately than other methods in terms of AUC values, and the performance differences are significant (with paired t-test p-values less than 0.05). This validates the importance of integrating tissue-specific molecular networks for disease gene prioritization and shows the superiority of our network models and ranking algorithms for this purpose. The source code and datasets are available at http://nijingchao.github.io/CRstar/ .
Affiliation(s)
- Jingchao Ni
- Department of Electrical Engineering and Computer Science, Case Western Reserve University, 10900 Euclid Avenue, Cleveland, 44106, OH, USA
- Mehmet Koyuturk
- Department of Electrical Engineering and Computer Science, Case Western Reserve University, 10900 Euclid Avenue, Cleveland, 44106, OH, USA
- Hanghang Tong
- School of Computing, Informatics, Decision Systems Engineering, Arizona State University, 699 S. Mill Ave., Tempe, 85281, AZ, USA
- Jonathan Haines
- Department of Epidemiology and Biostatistics, Case Western Reserve University, 10900 Euclid Avenue, Cleveland, 44106, OH, USA
- Rong Xu
- Department of Epidemiology and Biostatistics, Case Western Reserve University, 10900 Euclid Avenue, Cleveland, 44106, OH, USA
- Xiang Zhang
- College of Information Sciences and Technology, Pennsylvania State University, 332 Information Sciences and Technology Building, University Park, 16802, PA, USA
|
22
|
Chen Y, Xu R. Phenome-based gene discovery provides information about Parkinson's disease drug targets. BMC Genomics 2016; 17 Suppl 5:493. [PMID: 27586503 PMCID: PMC5009520 DOI: 10.1186/s12864-016-2820-1] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Indexed: 01/17/2023]
Abstract
BACKGROUND Parkinson disease (PD) is a severe neurodegenerative disease without curative drugs. The highly complex and heterogeneous disease mechanisms are still unclear. Detecting novel PD-associated genes not only contributes to revealing the disease pathogenesis, but also facilitates the discovery of new drug targets. METHODS We propose a phenome-based gene prediction strategy to identify disease-associated genes for PD. We integrated multiple disease phenotype networks, a gene functional relationship network, and known PD genes to predict novel candidate genes. Then we investigated the translational potential of the predicted genes in drug discovery. RESULTS In a cross-validation analysis, the average rank for 15 known PD genes is within the top 0.8%. We also tested the algorithm with an independent validation set of 669 PD-associated genes detected by genome-wide association studies. The top-ranked genes predicted by our approach are enriched for these validation genes. In addition, our approach prioritized the target genes for FDA-approved PD drugs and for drugs that have been tested for PD in clinical trials. Pathway analysis shows that the prioritized drug target genes are closely associated with PD pathogenesis. The result provides empirical evidence that our computational gene prediction approach identifies novel candidate genes for PD, and has the potential to lead to rapid drug discovery.
Affiliation(s)
- Yang Chen
- Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, OH, USA
- Rong Xu
- Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, OH, USA
|
23
|
Jeong CS, Kim D. Inferring Crohn's disease association from exome sequences by integrating biological knowledge. BMC Med Genomics 2016; 9 Suppl 1:35. [PMID: 27535358 PMCID: PMC4989895 DOI: 10.1186/s12920-016-0189-2] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Indexed: 12/17/2022]
Abstract
Background Exome sequencing has emerged as a primary method for identifying detailed sequence variants associated with complex diseases, including Crohn’s disease, in the protein-coding regions of the human genome. However, constructing an interpretable model for exome sequencing data is challenging because of the huge diversity of genomic variation. In addition, it is known that utilizing biologically relevant information in a rigorous manner is essential for effectively extracting disease-associated information. Results In this paper, we incorporate three types of biological knowledge (predicted pathogenicity, disease gene annotation, and a functional interaction network of human genes) and integrate them with exome sequence data in a non-negative matrix tri-factorization framework. Based on the proposed method, we successfully identified Crohn’s disease patients from exome sequencing data and achieved an area under the receiver operating characteristic curve (AUC) of 0.816, while other clustering methods not using biological information achieved an AUC of 0.786. Moreover, the disease association score derived from our method showed higher correlation with Crohn’s disease genes than with other, unrelated genes. Conclusions By integrating biological information across multiple levels, such as variant, gene, and system, our method could be useful for identifying disease susceptibility and its associated genes from exome sequencing data.
Affiliation(s)
- Chan-Seok Jeong
- Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), 291 Daehak-ro, Yuseong-gu, 34141 Daejeon, Republic of Korea
- Dongsup Kim
- Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), 291 Daehak-ro, Yuseong-gu, 34141 Daejeon, Republic of Korea
|
24
|
Pevec U, Rozman N, Gorsek B, Kunej T. RASopathies: Presentation at the Genome, Interactome, and Phenome Levels. Mol Syndromol 2016; 7:72-9. [PMID: 27385963 DOI: 10.1159/000445733] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Accepted: 03/16/2016] [Indexed: 11/19/2022]
Abstract
Clinical symptoms often reflect molecular correlations between mutated proteins. Alignment between interactome and phenome levels reveals new disease genes and connections between previously unrelated diseases. Despite its great potential for novel discoveries, this approach is still rarely used in genomics. In the present study, we analyzed the data of 6 syndromes belonging to the RASopathy class of disorders (RASopathies) and presented them as a model to study associations between genome, interactome, and phenome levels. Causative genes and clinical symptoms were collected from the OMIM and NCBI GeneReviews databases for 6 syndromes: Noonan syndrome, Noonan syndrome with multiple lentigines, neurofibromatosis type 1, cardiofaciocutaneous syndrome, Legius syndrome, and Costello syndrome. The STRING tool was used for the identification of protein interactions. The six RASopathy syndromes were found to be associated with 12 causative genes. We constructed an interactome of RASopathy proteins and their neighbors and developed a database of 328 clinical symptoms. The collected data were presented at genome, interactome, and phenome levels and as an integrated network of all 3 data types. The present study provides a baseline for future studies of associations between interactome and phenome in RASopathies and could serve as a novel approach to analyzing phenotypically and genetically related diseases.
Affiliation(s)
- Urska Pevec
- Department of Animal Science, Biotechnical Faculty, University of Ljubljana, Domzale, Slovenia
- Neva Rozman
- Department of Animal Science, Biotechnical Faculty, University of Ljubljana, Domzale, Slovenia
- Blaz Gorsek
- Department of Animal Science, Biotechnical Faculty, University of Ljubljana, Domzale, Slovenia
- Tanja Kunej
- Department of Animal Science, Biotechnical Faculty, University of Ljubljana, Domzale, Slovenia
|
25
|
Wang L, Zhang C, Watkins J, Jin Y, McNutt M, Yin Y. SoftPanel: a website for grouping diseases and related disorders for generation of customized panels. BMC Bioinformatics 2016; 17:153. [PMID: 27044653 PMCID: PMC4820874 DOI: 10.1186/s12859-016-0998-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 10/29/2015] [Accepted: 03/23/2016] [Indexed: 12/16/2022]
Abstract
BACKGROUND Targeted next-generation sequencing is playing an increasingly important role in biological research and clinical diagnosis by allowing researchers to sequence high priority genes at much higher depths and at a fraction of the cost of whole genome or exome sequencing. However, in designing the panel of genes to be sequenced, investigators need to consider the tradeoff between the better sensitivity of a broad panel and the higher specificity of a potentially more relevant panel. Although tools to prioritize candidate disease genes have been developed, the great majority of these require prior knowledge and a set of seed genes as input, which is only possible for diseases with a known genetic etiology. RESULTS To meet the demands of both researchers and clinicians, we have developed a user-friendly website called SoftPanel. This website is intended to serve users by allowing them to input a single disorder or a disorder group and generate a panel of genes predicted to underlie the disorder of interest. Various methods of retrieval including a keyword search, browsing of an arborized list of International Classification of Diseases, 10th revision (ICD-10) codes or using disorder phenotypic similarities can be combined to define a group of disorders and the genes known to be associated with them. Moreover, SoftPanel enables users to expand or refine a gene list by utilizing several biological data resources. In addition to providing users with the facility to create a "hard" panel that contains an exact gene list for targeted sequencing, SoftPanel also enables generation of a "soft" panel of genes, which may be used to further filter a significantly altered set of genes identified through whole genome or whole exome sequencing. The service and data provided by SoftPanel can be accessed at http://www.isb.pku.edu.cn/SoftPanel/ . A tutorial page is included for trying out sample data and interpreting results. 
CONCLUSION SoftPanel provides a convenient and powerful tool for creating a targeted panel of potential disease genes while supporting different forms of input. SoftPanel may be utilized in both genomics research and personalized medicine.
Affiliation(s)
- Likun Wang
- Institute of Systems Biomedicine, Department of Pathology, School of Basic Medical Sciences, Beijing Key Laboratory of Tumor Systems Biology, Peking-Tsinghua Center for Life Sciences, Peking University Health Science Center, Beijing, 100191, China
- Cong Zhang
- Institute of Systems Biomedicine, Department of Pathology, School of Basic Medical Sciences, Beijing Key Laboratory of Tumor Systems Biology, Peking-Tsinghua Center for Life Sciences, Peking University Health Science Center, Beijing, 100191, China
- Johnathan Watkins
- Institute for Mathematical and Molecular Biomedicine, King's College London, Guy's Campus, London, SE1 1UL, UK; Department of Research Oncology, King's College London, Guy's Campus, Great Maze Pond, London, SE1 9RT, UK
- Yan Jin
- Institute of Systems Biomedicine, Department of Pathology, School of Basic Medical Sciences, Beijing Key Laboratory of Tumor Systems Biology, Peking-Tsinghua Center for Life Sciences, Peking University Health Science Center, Beijing, 100191, China
- Michael McNutt
- Institute of Systems Biomedicine, Department of Pathology, School of Basic Medical Sciences, Beijing Key Laboratory of Tumor Systems Biology, Peking-Tsinghua Center for Life Sciences, Peking University Health Science Center, Beijing, 100191, China
- Yuxin Yin
- Institute of Systems Biomedicine, Department of Pathology, School of Basic Medical Sciences, Beijing Key Laboratory of Tumor Systems Biology, Peking-Tsinghua Center for Life Sciences, Peking University Health Science Center, Beijing, 100191, China.
|
26
|
Abstract
MOTIVATION Discerning genetic contributions to diseases not only enhances our understanding of disease mechanisms, but also leads to translational opportunities for drug discovery. Recent computational approaches incorporate disease phenotypic similarities to improve the prediction power of disease gene discovery. However, most current studies used only one data source of human disease phenotype. We present an innovative and generic strategy for combining multiple different data sources of human disease phenotype and predicting disease-associated genes from integrated phenotypic and genomic data. RESULTS To demonstrate our approach, we explored a new phenotype database from biomedical ontologies and constructed Disease Manifestation Network (DMN). We combined DMN with mimMiner, which was a widely used phenotype database in disease gene prediction studies. Our approach achieved significantly improved performance over a baseline method, which used only one phenotype data source. In the leave-one-out cross-validation and de novo gene prediction analysis, our approach achieved the area under the curves of 90.7% and 90.3%, which are significantly higher than 84.2% (P < e(-4)) and 81.3% (P < e(-12)) for the baseline approach. We further demonstrated that our predicted genes have the translational potential in drug discovery. We used Crohn's disease as an example and ranked the candidate drugs based on the rank of drug targets. Our gene prediction approach prioritized druggable genes that are likely to be associated with Crohn's disease pathogenesis, and our rank of candidate drugs successfully prioritized the Food and Drug Administration-approved drugs for Crohn's disease. We also found literature evidence to support a number of drugs among the top 200 candidates. In summary, we demonstrated that a novel strategy combining unique disease phenotype data with system approaches can lead to rapid drug discovery. AVAILABILITY AND IMPLEMENTATION nlp.case.edu/public/data/DMN
Affiliation(s)
- Yang Chen
- Department of Electrical Engineering and Computer Science, Department of Epidemiology and Biostatistics and Department of Family Medicine and Community Health, Case Western Reserve University, Cleveland, OH 44106, USA
- Li Li
- Department of Electrical Engineering and Computer Science, Department of Epidemiology and Biostatistics and Department of Family Medicine and Community Health, Case Western Reserve University, Cleveland, OH 44106, USA
- Guo-Qiang Zhang
- Department of Electrical Engineering and Computer Science, Department of Epidemiology and Biostatistics and Department of Family Medicine and Community Health, Case Western Reserve University, Cleveland, OH 44106, USA
- Rong Xu
- Department of Electrical Engineering and Computer Science, Department of Epidemiology and Biostatistics and Department of Family Medicine and Community Health, Case Western Reserve University, Cleveland, OH 44106, USA
|
27
|
Yang J, Wu SJ, Dai WT, Li YX, Li YY. The human disease network in terms of dysfunctional regulatory mechanisms. Biol Direct 2015; 10:60. [PMID: 26450611 PMCID: PMC4599653 DOI: 10.1186/s13062-015-0088-z] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Received: 06/02/2015] [Accepted: 09/25/2015] [Indexed: 12/29/2022] Open
Abstract
BACKGROUND Elucidation of human disease similarities has emerged as an active research area, which is highly relevant to etiology, disease classification, and drug repositioning. In pioneer studies, disease similarity was commonly estimated according to clinical manifestation. Subsequently, scientists started to investigate disease similarity based on gene-phenotype knowledge, which was inevitably biased toward well-studied diseases. In recent years, estimating disease similarity according to transcriptomic behavior has significantly enhanced the probability of finding novel disease relationships, but the currently available studies usually mine expression data through differential expression analysis, which has been considered to have little chance of unraveling dysfunctional regulatory relationships, the causal pathogenesis of diseases. METHODS We developed a computational approach to measure human disease similarity based on expression data. Differential coexpression analysis, instead of differential expression analysis, was employed to calculate the differential coexpression level of every gene for each disease, which was then summarized to the pathway level. Disease similarity was eventually calculated as the partial correlation coefficients of pathways' differential coexpression values between any two diseases. The significance of disease relationships was evaluated by permutation test. RESULTS Based on mRNA expression data and a differential coexpression analysis based method, we built a human disease network involving 1326 significant Disease-Disease links among 108 diseases. Compared with disease relationships captured by a differential expression analysis based method, our disease links shared known disease genes and drugs more significantly. Some novel disease relationships were discovered, for example, Obesity and cancer, Obesity and Psoriasis, lung adenocarcinoma and S. pneumonia, which had been commonly regarded as unrelated to each other, but were recently found to share similar molecular mechanisms. Additionally, it was found that both the type of disease and the type of affected tissue influenced the degree of disease similarity. A sub-network including Allergic asthma, Type 2 diabetes and Chronic kidney disease was extracted to demonstrate the exploration of their common pathogenesis. CONCLUSION The present study produces a global view of human diseasome for the first time from the viewpoint of regulation mechanisms, which therefore could provide insightful clues to etiology and pathogenesis, and help to perform drug repositioning and design novel therapeutic interventions.
Affiliation(s)
- Jing Yang
- School of Biotechnology, East China University of Science and Technology, Shanghai, 200237, P.R. China; Shanghai Center for Bioinformation Technology, 1278 Keyuan Road, Shanghai, 201203, P.R. China; Key Laboratory of Systems Biology, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031, P.R. China.
- Su-Juan Wu
- School of Biotechnology, East China University of Science and Technology, Shanghai, 200237, P.R. China; Shanghai Center for Bioinformation Technology, 1278 Keyuan Road, Shanghai, 201203, P.R. China.
- Wen-Tao Dai
- Shanghai Center for Bioinformation Technology, 1278 Keyuan Road, Shanghai, 201203, P.R. China; Shanghai Industrial Technology Institute, 1278 Keyuan Road, Shanghai, 201203, P.R. China; Shanghai Engineering Research Center of Pharmaceutical Translation, 1278 Keyuan Road, Shanghai, 201203, P.R. China.
- Yi-Xue Li
- School of Biotechnology, East China University of Science and Technology, Shanghai, 200237, P.R. China; Shanghai Center for Bioinformation Technology, 1278 Keyuan Road, Shanghai, 201203, P.R. China; Shanghai Industrial Technology Institute, 1278 Keyuan Road, Shanghai, 201203, P.R. China; Key Laboratory of Systems Biology, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031, P.R. China; Shanghai Engineering Research Center of Pharmaceutical Translation, 1278 Keyuan Road, Shanghai, 201203, P.R. China.
- Yuan-Yuan Li
- Shanghai Center for Bioinformation Technology, 1278 Keyuan Road, Shanghai, 201203, P.R. China; Shanghai Industrial Technology Institute, 1278 Keyuan Road, Shanghai, 201203, P.R. China; Shanghai Engineering Research Center of Pharmaceutical Translation, 1278 Keyuan Road, Shanghai, 201203, P.R. China.
|
28
|
Browne F, Wang H, Zheng H. A computational framework for the prioritization of disease-gene candidates. BMC Genomics 2015; 16 Suppl 9:S2. [PMID: 26330267 PMCID: PMC4547404 DOI: 10.1186/1471-2164-16-s9-s2] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Indexed: 12/18/2022] Open
Abstract
Background The identification of genes and uncovering the role they play in diseases is an important and complex challenge. Genome-wide linkage and association studies have made advancements in identifying genetic variants that underpin human disease. An important challenge now is to identify meaningful disease-associated genes from a long list of candidate genes implicated by these analyses. The application of gene prioritization can enhance our understanding of disease mechanisms and aid in the discovery of drug targets. The integration of protein-protein interaction networks along with disease datasets and contextual information is an important tool in unraveling the molecular basis of diseases. Results In this paper we propose a computational pipeline for the prioritization of disease-gene candidates. Diverse heterogeneous data are integrated, including gene expression, protein-protein interaction networks, ontology-based similarity, topological measures and tissue-specific data. The pipeline was applied to prioritize Alzheimer's Disease (AD) genes, whereby a list of 32 prioritized genes was generated. This approach correctly identified key AD susceptibility genes: PSEN1 and TRAF1. Biological process enrichment analysis revealed that the prioritized genes are involved in processes modulated in AD pathogenesis, including regulation of neurogenesis and generation of neurons. Relatively high predictive performance (AUC: 0.70) was observed when classifying AD and normal gene expression profiles from individuals using leave-one-out cross validation. Conclusions This work provides a foundation for future investigation of diverse heterogeneous data integration for disease-gene prioritization.
|
29
|
Chen Y, Xu R. Network-based gene prediction for Plasmodium falciparum malaria towards genetics-based drug discovery. BMC Genomics 2015; 16 Suppl 7:S9. [PMID: 26099491 PMCID: PMC4474419 DOI: 10.1186/1471-2164-16-s7-s9] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Indexed: 12/21/2022] Open
Abstract
BACKGROUND Malaria is the most deadly parasitic infectious disease. Existing drug treatments have limited efficacy in malaria elimination, and the complex pathogenesis of the disease is not fully understood. Detecting novel malaria-associated genes not only contributes to revealing the disease pathogenesis, but also facilitates discovering new targets for anti-malaria drugs. METHODS In this study, we developed a network-based approach to predict malaria-associated genes. We constructed a cross-species network to integrate human-human, parasite-parasite and human-parasite protein interactions. Then we extended the random walk algorithm on this network, and used known malaria genes as the seeds to find novel candidate genes for malaria. RESULTS We validated our algorithms using 77 known malaria genes: 14 human genes and 63 parasite genes were ranked on average within the top 2% and top 4%, respectively, among human and parasite genomes. We also evaluated our method for predicting novel malaria genes using a set of 27 genes with literature supporting evidence. Our approach ranked 12 genes within the top 1% and 24 genes within the top 5%. In addition, we demonstrated that top-ranked candidate genes were enriched for drug targets, and identified commonalities underlying top-ranked malaria genes through pathway analysis. In summary, the candidate malaria-associated genes predicted by our data-driven approach have the potential to guide genetics-based anti-malaria drug discovery.
|
30
|
Jiang R. Walking on multiple disease-gene networks to prioritize candidate genes. J Mol Cell Biol 2015; 7:214-230. [PMID: 25681405 DOI: 10.1093/jmcb/mjv008] [Citation(s) in RCA: 53] [Impact Index Per Article: 5.3] [Received: 09/26/2014] [Accepted: 01/11/2015] [Indexed: 01/03/2025] Open
Abstract
Uncovering causal genes for human inherited diseases, as the primary step toward understanding the pathogenesis of these diseases, requires a combined analysis of genetic and genomic data. Although bioinformatics methods have been designed to prioritize candidate genes resulting from genetic linkage analysis or association studies, the coverage of both diseases and genes in existing methods is quite limited, thereby preventing the scan of causal genes for a significant proportion of diseases at the whole-genome level. To overcome this limitation, we propose a method named pgWalk to prioritize candidate genes by integrating multiple phenomic and genomic data. We derive three types of phenotype similarities among 7719 diseases and nine types of functional similarities among 20327 genes. Based on a pair of phenotype and gene similarities, we construct a disease-gene network and then simulate the process that a random walker wanders on such a heterogeneous network to quantify the strength of association between a candidate gene and a query disease. A weighted version of the Fisher's method with dependent correction is adopted to integrate 27 scores obtained in this way, and a final q-value is calibrated for prioritizing candidate genes. A series of validation experiments are conducted to demonstrate the superior performance of this approach. We further show the effectiveness of this method in exome sequencing studies of autism and epileptic encephalopathies. An online service and the standalone software of pgWalk can be found at http://bioinfo.au.tsinghua.edu.cn/jianglab/pgwalk.
Affiliation(s)
- Rui Jiang
- MOE Key Laboratory of Bioinformatics; Bioinformatics Division and Center for Synthetic & Systems Biology, TNLIST; Department of Automation, Tsinghua University, Beijing 100084, China; Department of Statistics, Stanford University, Stanford, CA 94305, USA
|
31
|
Xie M, Xu Y, Zhang Y, Hwang T, Kuang R. Network-based Phenome-Genome Association Prediction by Bi-Random Walk. PLoS One 2015; 10:e0125138. [PMID: 25933025 PMCID: PMC4416812 DOI: 10.1371/journal.pone.0125138] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Received: 08/14/2014] [Accepted: 03/14/2015] [Indexed: 12/15/2022] Open
Abstract
Motivation The availability of ontologies and systematic documentations of phenotypes and their genetic associations has enabled large-scale network-based global analyses of the association between the complete collection of phenotypes (phenome) and genes. To provide a fundamental understanding of how the network information is relevant to phenotype-gene associations, we analyze the circular bigraphs (CBGs) in the OMIM human disease phenotype-gene association network and the MGI mouse phenotype-gene association network, and introduce a bi-random walk (BiRW) algorithm to capture the CBG patterns in the networks for unveiling human and mouse phenome-genome association. BiRW performs separate random walks simultaneously on the gene interaction network and the phenotype similarity network to explore gene paths and phenotype paths in CBGs of different sizes to summarize their associations as predictions. Results The analysis of both OMIM and MGI associations revealed that the majority of the phenotype-gene associations are covered by CBG patterns of small path lengths, and there is a clear correlation between the CBG coverage and the predictability of the phenotype-gene associations. In the experiments on recovering known associations in cross-validations on human disease phenotypes and mouse phenotypes, BiRW effectively improved prediction performance over the compared methods. The constructed global human disease phenome-genome association map also revealed interesting new predictions and phenotype-gene modules by disease classes.
Affiliation(s)
- MaoQiang Xie
- College of Software, Nankai University, Tianjin, China
- YingJie Xu
- College of Software, Nankai University, Tianjin, China
- YaoGong Zhang
- College of Software, Nankai University, Tianjin, China
- TaeHyun Hwang
- Department of Clinical Science, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Rui Kuang
- Department of Computer Science and Engineering, University of Minnesota Twin Cities, Minneapolis, MN, USA
|
32
|
Abstract
Background Pinpointing genes involved in inherited human diseases remains a great challenge in the post-genomics era. Although approaches have been proposed either based on the guilt-by-association principle or making use of disease phenotype similarities, the low coverage of both diseases and genes in existing methods has been preventing the scan of causative genes for a significant proportion of diseases at the whole-genome level. Results To overcome this limitation, we proposed a rigorous statistical method called pgFusion to prioritize candidate genes by integrating one type of disease phenotype similarity derived from the Unified Medical Language System (UMLS) and seven types of gene functional similarities calculated from gene expression, gene ontology, pathway membership, protein sequence, protein domain, protein-protein interaction and regulation pattern, respectively. Our method covered a total of 7,719 diseases and 20,327 genes, achieving the highest coverage thus far for both diseases and genes. We performed leave-one-out cross-validation experiments to demonstrate the superior performance of our method and applied it to a real exome sequencing dataset of epileptic encephalopathies, showing the capability of this approach in finding causative genes for complex diseases. We further provided the standalone software and online services of pgFusion at http://bioinfo.au.tsinghua.edu.cn/jianglab/pgfusion. Conclusions pgFusion not only provided an effective way for prioritizing candidate genes, but also demonstrated feasible solutions to two fundamental questions in the analysis of big genomic data: the comparability of heterogeneous data and the integration of multiple types of data. Applications of this method in exome or whole genome sequencing studies would accelerate the finding of causative genes for human diseases. Other research fields in genomics could also benefit from the incorporation of our data fusion methodology.
|
33
|
Emmert-Streib F, Tripathi S, Simoes RDM, Hawwa AF, Dehmer M. The human disease network. Systems Biomedicine 2014. [DOI: 10.4161/sysb.22816] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.6] [Indexed: 11/19/2022]
|
34
|
Chen Y, Zhang X, Zhang GQ, Xu R. Comparative analysis of a novel disease phenotype network based on clinical manifestations. J Biomed Inform 2014; 53:113-20. [PMID: 25277758 DOI: 10.1016/j.jbi.2014.09.007] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.5] [Received: 06/16/2014] [Revised: 08/18/2014] [Accepted: 09/21/2014] [Indexed: 12/21/2022]
Abstract
Systems approaches to analyzing disease phenotype networks in combination with protein functional interaction networks have great potential in illuminating disease pathophysiological mechanisms. While many genetic networks are readily available, disease phenotype networks remain largely incomplete. In this study, we built a large-scale Disease Manifestation Network (DMN) from 50,543 highly accurate disease-manifestation semantic relationships in the Unified Medical Language System (UMLS). Our new phenotype network contains 2305 nodes and 373,527 weighted edges to represent the disease phenotypic similarities. We first compared DMN with the networks representing genetic relationships among diseases, and demonstrated that the phenotype clustering in DMN reflects common disease genetics. Then we compared DMN with a widely-used disease phenotype network in previous gene discovery studies, called mimMiner, which was extracted from the textual descriptions in Online Mendelian Inheritance in Man (OMIM). We demonstrated that DMN contains different knowledge from the existing phenotype data source. Finally, a case study on Marfan syndrome further proved that DMN contains useful information and can provide leads to discover unknown disease causes. Integrating DMN in systems approaches with mimMiner and other data offers the opportunities to predict novel disease genetics. We made DMN publicly available at nlp.case.edu/public/data/DMN.
Affiliation(s)
- Yang Chen
- Department of Electrical Engineering and Computer Science, Case Western Reserve University, Cleveland, OH 44106, United States; Division of Medical Informatics, School of Medicine, Case Western Reserve University, Cleveland, OH 44106, United States
- Xiang Zhang
- Department of Electrical Engineering and Computer Science, Case Western Reserve University, Cleveland, OH 44106, United States
- Guo-Qiang Zhang
- Department of Electrical Engineering and Computer Science, Case Western Reserve University, Cleveland, OH 44106, United States; Division of Medical Informatics, School of Medicine, Case Western Reserve University, Cleveland, OH 44106, United States
- Rong Xu
- Division of Medical Informatics, School of Medicine, Case Western Reserve University, Cleveland, OH 44106, United States.
|
35
|
Li Y, Xu J, Ju H, Xiao Y, Chen H, Lv J, Shao T, Bai J, Zhang Y, Wang L, Wang X, Ren H, Li X. A network-based, integrative approach to identify genes with aberrant co-methylation in colorectal cancer. MOLECULAR BIOSYSTEMS 2014; 10:180-90. [PMID: 24317156 DOI: 10.1039/c3mb70270g] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 12/11/2022]
Abstract
Epigenetic changes, including aberrations in DNA methylation, are a common hallmark of many cancers. The identification and interpretation of epigenetic changes associated with cancers may benefit from integration with protein interactomes. Based on the assumption that genes implicated in a specific tumor phenotype will show high aberrant co-methylation patterns with their interacting partners, we propose an integrated approach to uncover cancer-associated genes by integrating a DNA methylome with an interactome. Aberrant co-methylated interactions were first identified in the specific cancer, and genes were then prioritized based on their enrichment in aberrant co-methylation. By applying this to a large-scale colorectal cancer (CRC) dataset, the proposed method increases the power to capture known genes. More importantly, genes possessing high aberrant co-methylation patterns, located at the topological center of the original protein-protein interaction network (PPIN), affect several cancer-associated pathways and form hotspots that are frequently hijacked in cancer. Additionally, the top-ranked candidate genes may also be useful as an indicator of CRC diagnosis and prognosis. Five fold cross-validation of the top-ranked genes in diagnosis reveals that it can achieve an area under the receiver operating characteristic (ROC) curve ranging from 82.2% to 98.4% in three independent datasets. Five of these genes form a core repressive module. CCNA1 and ESR1 in particular are evidently silenced by promoter hypermethylation in CRC cell lines and tissues, whose re-expression markedly suppresses tumor cell survival and clonogenicity. These results show that the network-centric method could identify novel disease biomarkers and model how oncogenic lesions mediate epigenetic changes, providing important insights into tumorigenesis.
Affiliation(s)
- Yongsheng Li
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China.
|
36
|
A phenome-guided drug repositioning through a latent variable model. BMC Bioinformatics 2014; 15:267. [PMID: 25103881 PMCID: PMC4137076 DOI: 10.1186/1471-2105-15-267] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.4] [Received: 02/26/2014] [Accepted: 07/21/2014] [Indexed: 11/23/2022] Open
Abstract
Background The phenome represents a distinct set of information in the human population. It has been explored particularly in its relationship with the genome to identify correlations for diseases. The phenome has also been explored for drug repositioning with efforts focusing on the search space for the most similar candidate drugs. For a comprehensive analysis of the phenome, we assumed that all phenotypes (indications and side effects) were inter-connected with a probabilistic distribution and this characteristic may offer an opportunity to identify new therapeutic indications for a given drug. Correspondingly, we employed Latent Dirichlet Allocation (LDA), which introduces latent variables (topics) to govern the phenome distribution. Results We developed our model on the phenome information in Side Effect Resource (SIDER). We first developed an LDA model optimized based on its recovery potential through perturbing the drug-phenotype matrix for each of the drug-indication pairs, where each drug-indication relationship was switched to “unknown” one at a time and then recovered based on the remaining drug-phenotype pairs. Of the probabilistically significant pairs, 70% were successfully recovered. Next, we applied the model on the whole phenome to narrow down repositioning candidates and suggest alternative indications. We were able to retrieve approved indications of 6 drugs whose indications were not listed in SIDER. For 908 drugs that were present with their indication information, our model suggested alternative treatment options for further investigations. Several of the suggested new uses can be supported with information from the scientific literature. Conclusions The results demonstrated that the phenome can be further analyzed by a generative model, which can discover probabilistic associations between drugs and therapeutic uses. In this regard, LDA serves as an enrichment tool to explore new uses of existing drugs by narrowing down the search space.
|
37
|
Cheng L, Li J, Ju P, Peng J, Wang Y. SemFunSim: a new method for measuring disease similarity by integrating semantic and gene functional association. PLoS One 2014; 9:e99415. [PMID: 24932637 PMCID: PMC4059643 DOI: 10.1371/journal.pone.0099415] [Citation(s) in RCA: 83] [Impact Index Per Article: 7.5] [Received: 12/16/2013] [Accepted: 05/14/2014] [Indexed: 01/20/2023] Open
Abstract
Background Measuring similarity between diseases plays an important role in disease-related molecular function research. Functional associations between disease-related genes and semantic associations between diseases are often used to identify pairs of similar diseases from different perspectives. Currently, it is still a challenge to exploit both of them to calculate disease similarity. Therefore, a new method (SemFunSim) that integrates semantic and functional association is proposed to address the issue. Methods SemFunSim is designed as follows. First of all, FunSim (Functional similarity) is proposed to calculate disease similarity using disease-related gene sets in a weighted network of human gene function. Next, SemSim (Semantic Similarity) is devised to calculate disease similarity using the relationship between two diseases from Disease Ontology. Finally, FunSim and SemSim are integrated to measure disease similarity. Results The high average AUC (area under the receiver operating characteristic curve) (96.37%) shows that SemFunSim achieves a high true positive rate and a low false positive rate. 79 of the top 100 pairs of similar diseases identified by SemFunSim are annotated in the Comparative Toxicogenomics Database (CTD) as being targeted by the same therapeutic compounds, while other methods we compared could identify 35 or less such pairs among the top 100. Moreover, when using our method on diseases without annotated compounds in CTD, we could confirm many of our predicted candidate compounds from literature. This indicates that SemFunSim is an effective method for drug repositioning.
Affiliation(s)
- Liang Cheng
- Center for Bioinformatics, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
- Jie Li
- Center for Bioinformatics, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
- Peng Ju
- School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore, Singapore
- Jiajie Peng
- Center for Bioinformatics, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
| | - Yadong Wang
- Center for Bioinformatics, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
|
38
|
Wu J, Li Y, Jiang R. Integrating multiple genomic data to predict disease-causing nonsynonymous single nucleotide variants in exome sequencing studies. PLoS Genet 2014; 10:e1004237. [PMID: 24651380 PMCID: PMC3961190 DOI: 10.1371/journal.pgen.1004237] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.6] [Received: 07/26/2013] [Accepted: 01/27/2014] [Indexed: 01/06/2023] Open
Abstract
Exome sequencing has been widely used in detecting pathogenic nonsynonymous single nucleotide variants (SNVs) for human inherited diseases. However, traditional statistical genetics methods are ineffective in analyzing exome sequencing data, due to such facts as the large number of sequenced variants, the presence of non-negligible fraction of pathogenic rare variants or de novo mutations, and the limited size of affected and normal populations. Indeed, prevalent applications of exome sequencing have been appealing for an effective computational method for identifying causative nonsynonymous SNVs from a large number of sequenced variants. Here, we propose a bioinformatics approach called SPRING (Snv PRioritization via the INtegration of Genomic data) for identifying pathogenic nonsynonymous SNVs for a given query disease. Based on six functional effect scores calculated by existing methods (SIFT, PolyPhen2, LRT, MutationTaster, GERP and PhyloP) and five association scores derived from a variety of genomic data sources (gene ontology, protein-protein interactions, protein sequences, protein domain annotations and gene pathway annotations), SPRING calculates the statistical significance that an SNV is causative for a query disease and hence provides a means of prioritizing candidate SNVs. With a series of comprehensive validation experiments, we demonstrate that SPRING is valid for diseases whose genetic bases are either partly known or completely unknown and effective for diseases with a variety of inheritance styles. In applications of our method to real exome sequencing data sets, we show the capability of SPRING in detecting causative de novo mutations for autism, epileptic encephalopathies and intellectual disability. We further provide an online service, the standalone software and genome-wide predictions of causative SNVs for 5,080 diseases at http://bioinfo.au.tsinghua.edu.cn/spring. 
The detection of causative nonsynonymous single nucleotide variants (SNVs) is essential for the understanding of the pathogenesis of human inherited diseases. In this paper, we propose a statistical method called SPRING (Snv PRioritization via the INtegration of Genomic data) to combine six functional effect scores calculated by existing methods and five association scores derived from multiple genomic data sources to estimate the statistical significance that a nonsynonymous SNV is pathogenic for a query disease. We find that SPRING is effective in identifying disease-causing SNVs for diseases whose genetic bases are either partly known or completely unknown across a variety of inheritance styles. With real exome sequencing data, we show the qualified potential of SPRING in not only the detection of causative SNVs in simulation studies but also the identification of pathogenic de novo mutations for autism, epileptic encephalopathies and intellectual disability.
Affiliation(s)
- Jiaxin Wu
- MOE Key Laboratory of Bioinformatics, Bioinformatics Division and Center for Synthetic & Systems Biology, TNLIST; Department of Automation, Tsinghua University, Beijing, China
- Yanda Li
- MOE Key Laboratory of Bioinformatics, Bioinformatics Division and Center for Synthetic & Systems Biology, TNLIST; Department of Automation, Tsinghua University, Beijing, China
- Rui Jiang
- MOE Key Laboratory of Bioinformatics, Bioinformatics Division and Center for Synthetic & Systems Biology, TNLIST; Department of Automation, Tsinghua University, Beijing, China
|
39
|
Chen Y, Jacquemin T, Zhang S, Jiang R. Prioritizing protein complexes implicated in human diseases by network optimization. BMC SYSTEMS BIOLOGY 2014; 8 Suppl 1:S2. [PMID: 24565064 PMCID: PMC4080363 DOI: 10.1186/1752-0509-8-s1-s2] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Indexed: 12/31/2022]
Abstract
Background The detection of associations between protein complexes and human inherited diseases is of great importance in understanding mechanisms of diseases. Dysfunction of a protein complex usually arises from the disturbance of its members and consequently results in certain diseases. Although individual disease proteins have been widely predicted, computational methods for systematically investigating disease-related protein complexes are still lacking. Results We propose a method, MAXCOM, for the prioritization of candidate protein complexes. MAXCOM performs a maximum information flow algorithm to optimize relationships between a query disease and candidate protein complexes through a heterogeneous network that is constructed by combining protein-protein interactions and disease phenotypic similarities. Cross-validation experiments on 539 protein complexes show that MAXCOM can rank 382 (70.87%) protein complexes at the top against protein complexes constructed at random. Permutation experiments further confirm that MAXCOM is robust to the network structure and parameters involved. We further analyze protein complexes ranked among the top ten for breast cancer and demonstrate that the SWI/SNF complex is potentially associated with breast cancer. Conclusions MAXCOM is an effective method for the discovery of disease-related protein complexes based on network optimization. The high performance and robustness of this approach can facilitate not only pathologic studies of diseases, but also the design of drugs targeting multiple proteins.
|
40
|
Walking on a tissue-specific disease-protein-complex heterogeneous network for the discovery of disease-related protein complexes. BIOMED RESEARCH INTERNATIONAL 2013; 2013:732650. [PMID: 24455720 PMCID: PMC3888695 DOI: 10.1155/2013/732650] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 09/11/2013] [Accepted: 10/07/2013] [Indexed: 11/29/2022]
Abstract
Besides the pinpointing of individual disease-related genes, associating protein complexes to human inherited diseases is also of great importance, because a biological function usually arises from the cooperative behaviour of multiple proteins in a protein complex. Moreover, knowledge about disease-related protein complexes could also enhance the inference of disease genes and pathogenic genetic variants. Here, we have designed a computational systems biology approach to systematically analyse potential relationships between diseases and protein complexes. First, we construct a heterogeneous network which is composed of a disease-disease similarity layer, a tissue-specific protein-protein interaction layer, and a protein complex membership layer. Then, we propose a random walk model on this disease-protein-complex network for identifying protein complexes that are related to a query disease. With a series of leave-one-out cross-validation experiments, we show that our method not only possesses high performance but also demonstrates robustness regarding the parameters and the network structure. We further predict a landscape of associations between human diseases and protein complexes. This landscape can be used to facilitate the inference of disease genes, thereby benefiting studies on pathology of diseases.
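The random-walk idea in this abstract can be illustrated with a minimal sketch of a random walk with restart on a tiny toy graph. Everything concrete here is an assumption made for illustration: the node names (`D1`, `P1`, `C1`, ...), the unit edge weights, and the restart probability of 0.5 are hypothetical, and the paper's actual model operates on a three-layer, tissue-specific network whose details the abstract does not give.

```python
def build_graph(edges):
    """Build an undirected, unit-weight adjacency map from (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, {})[v] = 1.0
        adj.setdefault(v, {})[u] = 1.0
    return adj


def random_walk_with_restart(adj, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Iterate p <- restart * p0 + (1 - restart) * W^T p until convergence.

    adj   : node -> {neighbor: weight}; assumes no isolated nodes.
    seeds : set of nodes the walker jumps back to (e.g. the query disease).
    """
    nodes = sorted(adj)
    start = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    prob = dict(start)
    for _ in range(max_iter):
        nxt = {n: restart * start[n] for n in nodes}
        for u in nodes:
            total = sum(adj[u].values())
            for v, w in adj[u].items():
                # Spread the non-restarting mass of u over its neighbors.
                nxt[v] += (1.0 - restart) * prob[u] * w / total
        change = sum(abs(nxt[n] - prob[n]) for n in nodes)
        prob = nxt
        if change < tol:
            break
    return prob


# Hypothetical toy network: diseases (D*), proteins (P*), complexes (C*).
TOY_EDGES = [
    ("D1", "D2"), ("D1", "P1"), ("D2", "P2"), ("P1", "P2"),
    ("P2", "P3"), ("P1", "C1"), ("P2", "C1"), ("P3", "C2"),
]
toy_graph = build_graph(TOY_EDGES)
scores = random_walk_with_restart(toy_graph, {"D1"})
ranked_complexes = sorted(["C1", "C2"], key=scores.get, reverse=True)
```

In this toy setting the complex whose members lie closer to the query disease receives the larger steady-state probability, which is the ranking signal such methods rely on.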
|
41
|
Chen Y, Wu X, Jiang R. Integrating human omics data to prioritize candidate genes. BMC Med Genomics 2013; 6:57. [PMID: 24344781 PMCID: PMC3878333 DOI: 10.1186/1755-8794-6-57] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.1] [Received: 09/02/2013] [Accepted: 12/12/2013] [Indexed: 01/07/2023] Open
Abstract
Background The identification of genes involved in human complex diseases remains a great challenge in computational systems biology. Although methods have been developed to use disease phenotypic similarities with a protein-protein interaction network for the prioritization of candidate genes, other valuable omics data sources have been largely overlooked in these methods. Methods With this understanding, we proposed a method called BRIDGE to prioritize candidate genes by integrating disease phenotypic similarities with such omics data as protein-protein interactions, gene sequence similarities, gene expression patterns, gene ontology annotations, and gene pathway memberships. BRIDGE utilizes a multiple regression model with a lasso penalty to automatically weight different data sources and is capable of discovering genes associated with diseases whose genetic bases are completely unknown. Results We conducted large-scale cross-validation experiments and demonstrated that more than 60% of known disease genes can be ranked first by BRIDGE in simulated linkage intervals, suggesting the superior performance of this method. We further performed two comprehensive case studies by applying BRIDGE to predict novel genes and transcriptional networks involved in obesity and type II diabetes. Conclusion The proposed method provides an effective and scalable way of integrating multi-omics data to infer disease genes. Further applications of BRIDGE should help to uncover novel disease genes and the mechanisms underlying human diseases.
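To make the role of the lasso penalty concrete, here is a minimal sketch of lasso regression fitted by coordinate descent, showing how an uninformative data source can receive a weight of exactly zero. The abstract does not describe the regression design, so the setup is assumed: rows of `X` stand for hypothetical observations, columns for hypothetical centered scores from three data sources, and `y` for the response. The function names and the penalty value are likewise illustrative and not taken from BRIDGE.

```python
def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; return 0.0 inside [-penalty, penalty]."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(X, y, penalty, n_sweeps=100, tol=1e-12):
    """Minimize (1 / 2n) * ||y - Xw||^2 + penalty * ||w||_1 by cyclic updates."""
    n, d = len(X), len(X[0])
    weights = [0.0] * d
    for _ in range(n_sweeps):
        largest_change = 0.0
        for j in range(d):
            rho = 0.0   # correlation of feature j with the partial residual
            norm = 0.0  # mean squared value of feature j
            for i in range(n):
                pred_without_j = sum(
                    X[i][k] * weights[k] for k in range(d) if k != j
                )
                rho += X[i][j] * (y[i] - pred_without_j)
                norm += X[i][j] ** 2
            rho /= n
            norm /= n
            new_weight = soft_threshold(rho, penalty) / norm if norm > 0 else 0.0
            largest_change = max(largest_change, abs(new_weight - weights[j]))
            weights[j] = new_weight
        if largest_change < tol:
            break
    return weights


# Hypothetical toy data: three mutually orthogonal "data source" columns.
# The response depends on the first two sources only (y = 2*x1 + 1*x2).
X = [
    [1.0, 1.0, 1.0],
    [1.0, -1.0, -1.0],
    [-1.0, 1.0, -1.0],
    [-1.0, -1.0, 1.0],
]
y = [3.0, 1.0, -1.0, -3.0]
source_weights = lasso_coordinate_descent(X, y, penalty=0.1)
```

Because the toy columns are orthogonal, each fitted weight is simply the least-squares coefficient shrunk by the penalty, and the third source, which carries no signal, is dropped entirely. Real omics features are correlated, so this clean behavior should not be expected in practice.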
Affiliation(s)
- Rui Jiang
- Department of Automation, MOE Key Laboratory of Bioinformatics; Bioinformatics Division and Center for Synthetic & Systems Biology, TNLIST, Tsinghua University, Beijing 100084, China.
|
42
|
Panni S, Rombo SE. Searching for repetitions in biological networks: methods, resources and tools. Brief Bioinform 2013; 16:118-36. [PMID: 24300112 DOI: 10.1093/bib/bbt084] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.9] [Indexed: 01/11/2023] Open
Abstract
We present here a compact overview of the data, models and methods proposed for the analysis of biological networks based on the search for significant repetitions. In particular, we concentrate on three problems widely studied in the literature: 'network alignment', 'network querying' and 'network motif extraction'. We provide (i) details of the experimental techniques used to obtain the main types of interaction data, (ii) descriptions of the models and approaches introduced to solve such problems and (iii) pointers to both the available databases and software tools. The intent is to lay out a useful roadmap for identifying suitable strategies to analyse cellular data, possibly based on the joint use of different interaction data types or analysis techniques.
|
43
|
Leiserson MDM, Eldridge JV, Ramachandran S, Raphael BJ. Network analysis of GWAS data. Curr Opin Genet Dev 2013; 23:602-10. [PMID: 24287332 PMCID: PMC3867794 DOI: 10.1016/j.gde.2013.09.003] [Citation(s) in RCA: 66] [Impact Index Per Article: 5.5] [Received: 07/19/2013] [Revised: 09/19/2013] [Accepted: 09/23/2013] [Indexed: 02/07/2023]
Abstract
Genome-wide association studies (GWAS) identify genetic variants that distinguish a control population from a population with a specific trait. Two challenges in GWAS are: (1) identification of the causal variant within a longer haplotype that is associated with the trait; (2) identification of causal variants for polygenic traits that are caused by variants in multiple genes within a pathway. We review recent methods that use information in protein-protein and protein-DNA interaction networks to address these two challenges.
Affiliation(s)
- Mark D M Leiserson
- Department of Computer Science, Brown University, Providence, RI 02912, United States; Center for Computational Molecular Biology, Brown University, Providence, RI 02912, United States
|
44
|
Guo Y, Wei X, Das J, Grimson A, Lipkin S, Clark A, Yu H. Dissecting disease inheritance modes in a three-dimensional protein network challenges the "guilt-by-association" principle. Am J Hum Genet 2013; 93:78-89. [PMID: 23791107 DOI: 10.1016/j.ajhg.2013.05.022] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.6] [Received: 12/05/2012] [Revised: 05/02/2013] [Accepted: 05/23/2013] [Indexed: 10/26/2022] Open
Abstract
To better understand different molecular mechanisms by which mutations lead to various human diseases, we classified 82,833 disease-associated mutations according to their inheritance modes (recessive versus dominant) and molecular types (in-frame [missense point mutations and in-frame indels] versus truncating [nonsense mutations and frameshift indels]) and systematically examined the effects of different classes of disease mutations in a three-dimensional protein interactome network with the atomic-resolution interface resolved for each interaction. We found that although recessive mutations affecting the interaction interface of two interacting proteins tend to cause the same disease, this widely accepted "guilt-by-association" principle does not apply to dominant mutations. Furthermore, recessive truncating mutations in regions encoding the same interface are much more likely to cause the same disease, even for interfaces close to the N terminus of the protein. Conversely, dominant truncating mutations tend to be enriched in regions encoding areas between interfaces. These results suggest that a significant fraction of truncating mutations can generate functional protein products. For example, TRIM27, a known cancer-associated protein, interacts with three proteins (MID2, TRIM42, and SIRPA) through two different interfaces. A dominant truncating mutation (c.1024delT [p.Tyr342Thrfs*30]) associated with ovarian carcinoma is located between the regions encoding the two interfaces; the altered protein retains its interaction with MID2 and TRIM42 through the first interface but loses its interaction with SIRPA through the second interface. Our findings will help clarify the molecular mechanisms of thousands of disease-associated genes and their tens of thousands of mutations, especially for those carrying truncating mutations, often erroneously considered "knockout" alleles.
|
45
|
Csermely P, Korcsmáros T, Kiss HJM, London G, Nussinov R. Structure and dynamics of molecular networks: a novel paradigm of drug discovery: a comprehensive review. Pharmacol Ther 2013; 138:333-408. [PMID: 23384594 PMCID: PMC3647006 DOI: 10.1016/j.pharmthera.2013.01.016] [Citation(s) in RCA: 522] [Impact Index Per Article: 43.5] [Received: 01/22/2013] [Accepted: 01/22/2013] [Indexed: 02/02/2023]
Abstract
Despite considerable progress in genome- and proteome-based high-throughput screening methods and in rational drug design, the increase in approved drugs in the past decade did not match the increase of drug development costs. Network description and analysis not only give a systems-level understanding of drug action and disease complexity, but can also help to improve the efficiency of drug design. We give a comprehensive assessment of the analytical tools of network topology and dynamics. The state-of-the-art use of chemical similarity, protein structure, protein-protein interaction, signaling, genetic interaction and metabolic networks in the discovery of drug targets is summarized. We propose that network targeting follows two basic strategies. The "central hit strategy" selectively targets central nodes/edges of the flexible networks of infectious agents or cancer cells to kill them. The "network influence strategy" works against other diseases, where an efficient reconfiguration of rigid networks needs to be achieved by targeting the neighbors of central nodes/edges. It is shown how network techniques can help in the identification of single-target, edgetic, multi-target and allo-network drug target candidates. We review the recent boom in network methods helping hit identification, lead selection optimizing drug efficacy, as well as minimizing side-effects and drug toxicity. Successful network-based drug development strategies are shown through the examples of infections, cancer, metabolic diseases, neurodegenerative diseases and aging. Summarizing >1200 references we suggest an optimized protocol of network-aided drug development, and provide a list of systems-level hallmarks of drug quality. Finally, we highlight network-related drug development trends helping to achieve these hallmarks by a cohesive, global approach.
Affiliation(s)
- Peter Csermely
- Department of Medical Chemistry, Semmelweis University, P.O. Box 260, H-1444 Budapest 8, Hungary.
|
46
|
Abstract
Purpose
Complex networks seem to be ubiquitous objects in contemporary research, both in the natural and social sciences. An important area of research regarding the applicability and modeling of graph-theoretical approaches to complex systems is the probabilistic inference of such networks. Different methods and algorithms have been designed for this purpose; most of them are inspired by statistical mechanics and rest on information-theoretical grounds. An important shortcoming of most of these methods, when it comes to disentangling the actual structure of complex networks, is that they fail to distinguish between direct and indirect interactions. Here, we suggest a method to discover and assess such indirect interactions within the framework of information theory.
Methods
Information-theoretical measures (in particular, mutual information) are applied for the probabilistic inference of complex networks. The data processing inequality is used to find direct and indirect interactions and to assess their impact in complex networks.
Results
We outline the mathematical basis of information-theoretical assessment of complex network structure and discuss some examples of application in the fields of biological systems and social networks.
Conclusions
Information theory provides the field of complex network analysis with effective means for structural assessment, with a computational burden low enough to be useful in both biological and social network analysis.
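The combination of mutual information and the data processing inequality described in this abstract can be sketched as follows. This is an ARACNE-style pruning rule used as a stand-in, not necessarily the authors' exact procedure: for every triplet of variables, the edge with the strictly smallest mutual information is treated as indirect and dropped. The variable names and the toy sample counts (a chain X → Y → Z with noisy copies) are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations
from math import log2


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in bits from paired discrete samples."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum(
        (c / n) * log2(c * n / (px[a] * py[b])) for (a, b), c in joint.items()
    )


def dpi_prune(columns):
    """Keep edges that are never the strictly weakest edge of any triplet."""
    names = sorted(columns)
    mi = {
        frozenset((a, b)): mutual_information(columns[a], columns[b])
        for a, b in combinations(names, 2)
    }
    removed = set()
    for a, b, c in combinations(names, 3):
        edges = [frozenset((a, b)), frozenset((b, c)), frozenset((a, c))]
        weakest = min(edges, key=mi.get)
        # Data processing inequality: an indirect link cannot carry more
        # information than either of the direct links it passes through.
        if all(mi[weakest] < mi[e] for e in edges if e != weakest):
            removed.add(weakest)
    return {edge: value for edge, value in mi.items() if edge not in removed}


# Hypothetical toy samples from a chain X -> Y -> Z: Y copies X and Z copies Y,
# each with a 25% flip rate, encoded as counts of (x, y, z) triples.
counts = {
    (0, 0, 0): 9, (0, 0, 1): 3, (0, 1, 1): 3, (0, 1, 0): 1,
    (1, 1, 1): 9, (1, 1, 0): 3, (1, 0, 0): 3, (1, 0, 1): 1,
}
samples = [triple for triple, k in counts.items() for _ in range(k)]
columns = {
    "X": [s[0] for s in samples],
    "Y": [s[1] for s in samples],
    "Z": [s[2] for s in samples],
}
kept_edges = dpi_prune(columns)
```

On these toy counts the X-Z dependence is weaker than both X-Y and Y-Z, so the pruning step removes it and retains the two direct links. A practical pipeline would usually also apply a significance threshold to the mutual information values before pruning, which this sketch omits.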
|
47
|
Cannistraci CV, Ogorevc J, Zorc M, Ravasi T, Dovc P, Kunej T. Pivotal role of the muscle-contraction pathway in cryptorchidism and evidence for genomic connections with cardiomyopathy pathways in RASopathies. BMC Med Genomics 2013; 6:5. [PMID: 23410028 PMCID: PMC3626861 DOI: 10.1186/1755-8794-6-5] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.3] [Received: 02/14/2012] [Accepted: 02/06/2013] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Cryptorchidism is the most frequent congenital disorder in male children; however, the genetic causes of cryptorchidism remain poorly investigated. Comparative integratomics combined with a systems biology approach was employed to elucidate genetic factors and molecular pathways underlying testis descent. METHODS Literature mining was performed to collect genomic loci associated with cryptorchidism in seven mammalian species. Information regarding the collected candidate genes was stored in a MySQL relational database. A genomic view of the loci was presented using the Flash GViewer web tool (http://gmod.org/wiki/Flashgviewer/). DAVID Bioinformatics Resources 6.7 was used for pathway enrichment analysis. The Cytoscape plug-in PiNGO 1.11 was employed for protein-network-based prediction of novel candidate genes. Relevant protein-protein interactions were confirmed and visualized using the STRING database (version 9.0). RESULTS The developed cryptorchidism gene atlas includes 217 candidate loci (genes, regions involved in chromosomal mutations, and copy number variations) identified at the genomic, transcriptomic, and proteomic level. Human orthologs of the collected candidate loci were presented using a genomic map viewer. The cryptorchidism gene atlas is freely available online: http://www.integratomics-time.com/cryptorchidism/. Pathway analysis suggested the presence of twelve enriched pathways associated with the list of 179 literature-derived candidate genes. Additionally, a list of 43 network-predicted novel candidate genes was significantly associated with four enriched pathways. Joint pathway analysis of the collected and predicted candidate genes revealed the pivotal importance of the muscle-contraction pathway in cryptorchidism and evidence for genomic associations with cardiomyopathy pathways in RASopathies. CONCLUSIONS The developed gene atlas represents an important resource for the scientific community researching genetics of cryptorchidism.
The collected data will further facilitate development of novel genetic markers and could be of interest for functional studies in animals and humans. The proposed network-based systems biology approach elucidates molecular mechanisms underlying the co-presence of cryptorchidism and cardiomyopathy in RASopathies. Such an approach could also aid in the molecular explanation of the co-presence of diverse and apparently unrelated clinical manifestations in other syndromes.
Affiliation(s)
- Carlo V Cannistraci
- Integrative Systems Biology Laboratory, Biological and Environmental Sciences and Engineering Division, Computational Bioscience Research Center, King Abdullah University for Science and Technology, Thuwal, Saudi Arabia.
|
48
|
Ma X, Gao L. Biological network analysis: insights into structure and functions. Brief Funct Genomics 2012; 11:434-442. [PMID: 23184677 DOI: 10.1093/bfgp/els045] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.5] [Indexed: 02/02/2023] Open
Abstract
In the past two decades, great efforts have been devoted to extracting the dependence and interplay between structure and functions in biological networks because they have strong relevance to biological processes. In this article, we reviewed recent developments in biological network analysis. In detail, we first reviewed the interactome topological properties of biological networks, the methods for structure and functional patterns.
Affiliation(s)
- Xiaoke Ma
- School of Computer Science and Technology, Xidian University, No. 2 South TaiBai Road, Xi'an, Shaanxi 710071, P.R. China
|
49
|
|
50
|
Doncheva NT, Kacprowski T, Albrecht M. Recent approaches to the prioritization of candidate disease genes. WILEY INTERDISCIPLINARY REVIEWS-SYSTEMS BIOLOGY AND MEDICINE 2012; 4:429-42. [PMID: 22689539 DOI: 10.1002/wsbm.1177] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.5] [Indexed: 11/10/2022]
Abstract
Many efforts are still devoted to the discovery of genes involved with specific phenotypes, in particular, diseases. High-throughput techniques are thus applied frequently to detect dozens or even hundreds of candidate genes. However, the experimental validation of many candidates is often an expensive and time-consuming task. Therefore, a great variety of computational approaches has been developed to support the identification of the most promising candidates for follow-up studies. The biomedical knowledge already available about the disease of interest and related genes is commonly exploited to find new gene-disease associations and to prioritize candidates. In this review, we highlight recent methodological advances in this research field of candidate gene prioritization. We focus on approaches that use network information and integrate heterogeneous data sources. Furthermore, we discuss current benchmarking procedures for evaluating and comparing different prioritization methods.
|