51
|
Zitnik M, Nguyen F, Wang B, Leskovec J, Goldenberg A, Hoffman MM. Machine Learning for Integrating Data in Biology and Medicine: Principles, Practice, and Opportunities. AN INTERNATIONAL JOURNAL ON INFORMATION FUSION 2019; 50:71-91. [PMID: 30467459 PMCID: PMC6242341 DOI: 10.1016/j.inffus.2018.09.012] [Citation(s) in RCA: 262] [Impact Index Per Article: 43.7] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/10/2023]
Abstract
New technologies have enabled the investigation of biology and human health at an unprecedented scale and in multiple dimensions. These dimensions include myriad properties describing genome, epigenome, transcriptome, microbiome, phenotype, and lifestyle. No single data type, however, can capture the complexity of all the factors relevant to understanding a phenomenon such as a disease. Integrative methods that combine data from multiple technologies have thus emerged as critical statistical and computational approaches. The key challenge in developing such approaches is the identification of effective models to provide a comprehensive and relevant systems view. An ideal method can answer a biological or medical question, identifying important features and predicting outcomes, by harnessing heterogeneous data across several dimensions of biological variation. In this Review, we describe the principles of data integration and discuss current methods and available implementations. We provide examples of successful data integration in biology and medicine. Finally, we discuss current challenges in biomedical integrative methods and our perspective on the future development of the field.
Collapse
Affiliation(s)
- Marinka Zitnik
- Department of Computer Science, Stanford University,
Stanford, CA, USA
| | - Francis Nguyen
- Department of Medical Biophysics, University of Toronto,
Toronto, ON, Canada
- Princess Margaret Cancer Centre, Toronto, ON, Canada
| | - Bo Wang
- Hikvision Research Institute, Santa Clara, CA, USA
| | - Jure Leskovec
- Department of Computer Science, Stanford University,
Stanford, CA, USA
- Chan Zuckerberg Biohub, San Francisco, CA, USA
| | - Anna Goldenberg
- Genetics & Genome Biology, SickKids Research Institute,
Toronto, ON, Canada
- Department of Computer Science, University of Toronto,
Toronto, ON, Canada
- Vector Institute, Toronto, ON, Canada
| | - Michael M. Hoffman
- Department of Medical Biophysics, University of Toronto,
Toronto, ON, Canada
- Princess Margaret Cancer Centre, Toronto, ON, Canada
- Department of Computer Science, University of Toronto,
Toronto, ON, Canada
- Vector Institute, Toronto, ON, Canada
| |
Collapse
|
52
|
Malod-Dognin N, Pržulj N. Functional geometry of protein interactomes. Bioinformatics 2019; 35:3727-3734. [PMID: 30821317 DOI: 10.1093/bioinformatics/btz146] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/16/2018] [Revised: 01/15/2019] [Accepted: 02/25/2019] [Indexed: 12/14/2022] Open
Abstract
MOTIVATION Protein-protein interactions (PPIs) are usually modeled as networks. These networks have extensively been studied using graphlets, small induced subgraphs capturing the local wiring patterns around nodes in networks. They revealed that proteins involved in similar functions tend to be similarly wired. However, such simple models can only represent pairwise relationships and cannot fully capture the higher-order organization of protein interactomes, including protein complexes. RESULTS To model the multi-scale organization of these complex biological systems, we utilize simplicial complexes from computational geometry. The question is how to mine these new representations of protein interactomes to reveal additional biological information. To address this, we define simplets, a generalization of graphlets to simplicial complexes. By using simplets, we define a sensitive measure of similarity between simplicial complex representations that allows for clustering them according to their data types better than clustering them by using other state-of-the-art measures, e.g. spectral distance, or facet distribution distance. We model human and baker's yeast protein interactomes as simplicial complexes that capture PPIs and protein complexes as simplices. On these models, we show that our newly introduced simplet-based methods cluster proteins by function better than the clustering methods that use the standard PPI networks, uncovering the new underlying functional organization of the cell. We demonstrate the existence of the functional geometry in the protein interactome data and the superiority of our simplet-based methods to effectively mine for new biological information hidden in the complexity of the higher-order organization of protein interactomes. AVAILABILITY AND IMPLEMENTATION Codes and datasets are freely available at http://www0.cs.ucl.ac.uk/staff/natasa/Simplets/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Noël Malod-Dognin
- Department of Life Sciences, Barcelona Supercomputing Center, Barcelona, Spain
| | - Nataša Pržulj
- Department of Life Sciences, Barcelona Supercomputing Center, Barcelona, Spain.,ICREA, Pg. Lluís Companys 23, Barcelona, Spain
| |
Collapse
|
53
|
Vijayan V, Milenkovic T. Aligning dynamic networks with DynaWAVE. Bioinformatics 2019; 34:1795-1798. [PMID: 29300873 DOI: 10.1093/bioinformatics/btx841] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/27/2017] [Accepted: 12/27/2017] [Indexed: 11/14/2022] Open
Abstract
Motivation Network alignment (NA) aims to find similar (conserved) regions between networks, such as cellular networks of different species. Until recently, existing methods were limited to aligning static networks. However, real-world systems, including cellular functioning, are dynamic. Hence, in our previous work, we introduced the first ever dynamic NA method, DynaMAGNA++, which improved upon the traditional static NA. However, DynaMAGNA++ does not necessarily scale well to larger networks in terms of alignment quality or runtime. Results To address this, we introduce a new dynamic NA approach, DynaWAVE. We show that DynaWAVE complements DynaMAGNA++: while DynaMAGNA++ is more accurate yet slower than DynaWAVE for smaller networks, DynaWAVE is both more accurate and faster than DynaMAGNA++ for larger networks. We provide a friendly user interface and source code for DynaWAVE. Availability and implementation https://www.nd.edu/∼cone/DynaWAVE/. Contact tmilenko@nd.edu. Supplementary information Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Vipin Vijayan
- Department of Computer Science and Engineering, ECK Institute for Global Health, and Interdisciplinary Center for Network Science and Applications (iCeNSA), University of Notre Dame, Notre Dame, IN 46556, USA
| | - Tijana Milenkovic
- Department of Computer Science and Engineering, ECK Institute for Global Health, and Interdisciplinary Center for Network Science and Applications (iCeNSA), University of Notre Dame, Notre Dame, IN 46556, USA
| |
Collapse
|
54
|
Graph matching approach and generalized median graph for automatic labeling of cortical sulci with parallel and distributed algorithms. COGN SYST RES 2019. [DOI: 10.1016/j.cogsys.2018.08.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022]
|
55
|
Guzzi PH, Milenkovic T. Survey of local and global biological network alignment: the need to reconcile the two sides of the same coin. Brief Bioinform 2019; 19:472-481. [PMID: 28062413 DOI: 10.1093/bib/bbw132] [Citation(s) in RCA: 32] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/31/2016] [Indexed: 12/23/2022] Open
Abstract
Analogous to genomic sequence alignment that allows for across-species transfer of biological knowledge between conserved sequence regions, biological network alignment can be used to guide the knowledge transfer between conserved regions of molecular networks of different species. Hence, biological network alignment can be used to redefine the traditional notion of a sequence-based homology to a new notion of network-based homology. Analogous to genomic sequence alignment, there exist local and global biological network alignments. Here, we survey prominent and recent computational approaches of each network alignment type and discuss their (dis)advantages. Then, as it was recently shown that the two approach types are complementary, in the sense that they capture different slices of cellular functioning, we discuss the need to reconcile the two network alignment types and present a recent first step in this direction. We conclude with some open research problems on this topic and comment on the usefulness of network alignment in other domains besides computational biology.
Collapse
Affiliation(s)
- Pietro Hiram Guzzi
- Department of Surgical and Medical Sciences, University Magna Graecia, Catanzaro, 88100 Italy
| | - Tijana Milenkovic
- Department of Computer Science and Engineering, Interdisciplinary Center for Network Science and Applications (iCeNSA), ECK Institute for Global Health, University of Notre Dame, Notre Dame, IN 46556, USA
| |
Collapse
|
56
|
Fast Subgraph Matching Strategies Based on Pattern-Only Heuristics. Interdiscip Sci 2019; 11:21-32. [PMID: 30790228 DOI: 10.1007/s12539-019-00323-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/21/2018] [Revised: 01/28/2019] [Accepted: 02/02/2019] [Indexed: 12/22/2022]
Abstract
Many scientific applications entail solving the subgraph isomorphism problem, i.e., given an input pattern graph, find all the subgraphs of a (usually much larger) target graph that are structurally equivalent to that input. Because subgraph isomorphism is NP-complete, methods to solve it have to use heuristics. This work evaluates subgraph isomorphism methods to assess their computational behavior on a wide range of synthetic and real graphs. Surprisingly, our experiments show that, among the leading algorithms, certain heuristics based only on pattern graphs are the most efficient.
Collapse
|
57
|
Malod-Dognin N, Petschnigg J, Windels SFL, Povh J, Hemingway H, Ketteler R, Pržulj N. Towards a data-integrated cell. Nat Commun 2019; 10:805. [PMID: 30778056 PMCID: PMC6379402 DOI: 10.1038/s41467-019-08797-8] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/05/2018] [Revised: 01/18/2019] [Accepted: 01/25/2019] [Indexed: 01/01/2023] Open
Abstract
We are increasingly accumulating molecular data about a cell. The challenge is how to integrate them within a unified conceptual and computational framework enabling new discoveries. Hence, we propose a novel, data-driven concept of an integrated cell, iCell. Also, we introduce a computational prototype of an iCell, which integrates three omics, tissue-specific molecular interaction network types. We construct iCells of four cancers and the corresponding tissue controls and identify the most rewired genes in cancer. Many of them are of unknown function and cannot be identified as different in cancer in any specific molecular network. We biologically validate that they have a role in cancer by knockdown experiments followed by cell viability assays. We find additional support through Kaplan-Meier survival curves of thousands of patients. Finally, we extend this analysis to uncover pan-cancer genes. Our methodology is universal and enables integrative comparisons of diverse omics data over cells and tissues.
Collapse
Affiliation(s)
- Noël Malod-Dognin
- Department of Computer Science, University College London, London, WC1E 6BT, UK
- Department of Life Science, Barcelona Supercomputing Center (BSC), Barcelona, 08034, Spain
| | - Julia Petschnigg
- Department of Computer Science, University College London, London, WC1E 6BT, UK
| | - Sam F L Windels
- Department of Computer Science, University College London, London, WC1E 6BT, UK
| | - Janez Povh
- Faculty of Mechanical Engineering, University of Ljubljana, Ljubljana, 1000, Slovenia
| | - Harry Hemingway
- Health Data Research UK London, University College London, London, WC1E 6BT, UK
- Institute of Health Informatics, University College London, London, WC1E 6BT, UK
- The National Institute for Health Research University College London Hospitals Biomedical Research Centre, University College London, London, W1T 7DN, UK
| | - Robin Ketteler
- MRC Laboratory for Molecular Cell Biology, University College London, London, WC1E 6BT, UK
| | - Nataša Pržulj
- Department of Computer Science, University College London, London, WC1E 6BT, UK.
- Department of Life Science, Barcelona Supercomputing Center (BSC), Barcelona, 08034, Spain.
- ICREA, Pg. Lluís Companys 23, 08010, Barcelona, Spain.
| |
Collapse
|
58
|
Melckenbeeck I, Audenaert P, Van Parys T, Van De Peer Y, Colle D, Pickavet M. Optimising orbit counting of arbitrary order by equation selection. BMC Bioinformatics 2019; 20:27. [PMID: 30646859 PMCID: PMC6334470 DOI: 10.1186/s12859-018-2483-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/20/2018] [Accepted: 11/09/2018] [Indexed: 11/25/2022] Open
Abstract
Background Graphlets are useful for bioinformatics network analysis. Based on the structure of Hočevar and Demšar’s ORCA algorithm, we have created an orbit counting algorithm, named Jesse. This algorithm, like ORCA, uses equations to count the orbits, but unlike ORCA it can count graphlets of any order. To do so, it generates the required internal structures and equations automatically. Many more redundant equations are generated, however, and Jesse’s running time is highly dependent on which of these equations are used. Therefore, this paper aims to investigate which equations are most efficient, and which factors have an effect on this efficiency. Results With appropriate equation selection, Jesse’s running time may be reduced by a factor of up to 2 in the best case, compared to using randomly selected equations. Which equations are most efficient depends on the density of the graph, but barely on the graph type. At low graph density, equations with terms in their right-hand side with few arguments are more efficient, whereas at high density, equations with terms with many arguments in the right-hand side are most efficient. At a density between 0.6 and 0.7, both types of equations are about equally efficient. Conclusions Our Jesse algorithm became up to a factor 2 more efficient, by automatically selecting the best equations based on graph density. It was adapted into a Cytoscape App that is freely available from the Cytoscape App Store to ease application by bioinformaticians.
Collapse
Affiliation(s)
- Ine Melckenbeeck
- Ghent University - imec, IDLab, Technologiepark 15, Ghent, 9052, Belgium
| | - Pieter Audenaert
- Ghent University - imec, IDLab, Technologiepark 15, Ghent, 9052, Belgium. .,Bioinformatics Institute Ghent, Ghent University, Ghent, Belgium.
| | - Thomas Van Parys
- Bioinformatics Institute Ghent, Ghent University, Ghent, Belgium.,Department of Plant Systems Biology, VIB, Technologiepark 927, Ghent, 9052, Belgium.,Department of Plant Biotechnology and Bioinformatics, Ghent University, Technologiepark 927, Ghent, 9052, Belgium
| | - Yves Van De Peer
- Bioinformatics Institute Ghent, Ghent University, Ghent, Belgium.,Department of Plant Systems Biology, VIB, Technologiepark 927, Ghent, 9052, Belgium.,Department of Plant Biotechnology and Bioinformatics, Ghent University, Technologiepark 927, Ghent, 9052, Belgium.,Department of Biochemistry, Genetics and Microbiology, University of Pretoria, Pretoria 0028, South Africa
| | - Didier Colle
- Ghent University - imec, IDLab, Technologiepark 15, Ghent, 9052, Belgium.,Bioinformatics Institute Ghent, Ghent University, Ghent, Belgium
| | - Mario Pickavet
- Ghent University - imec, IDLab, Technologiepark 15, Ghent, 9052, Belgium.,Bioinformatics Institute Ghent, Ghent University, Ghent, Belgium
| |
Collapse
|
59
|
Das B, Patil AR, Mitra P. A network-based zoning for parallel whole-cell simulation. Bioinformatics 2019; 35:88-94. [PMID: 29955764 DOI: 10.1093/bioinformatics/bty530] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/21/2017] [Accepted: 06/27/2018] [Indexed: 11/12/2022] Open
Abstract
Motivation In Computational Cell Biology, whole-cell modeling and simulation is an absolute requirement to analyze and explore the cell of an organism. Despite few individual efforts on modeling, the prime obstacle hindering its development and progress is its compute-intensive nature. Towards this end, little knowledge is available on how to reduce the enormous computational overhead and which computational systems will be of use. Results In this article, we present a network-based zoning approach that could potentially be utilized in the parallelization of whole-cell simulations. Firstly, we construct the protein-protein interaction graph of the whole-cell of an organism using experimental data from various sources. Based on protein interaction information, we predict protein locality and allocate confidence score to the interactions accordingly. We then identify the modules of strictly localized interacting proteins by performing interaction graph clustering based on the confidence score of the interactions. By applying this method to Escherichia coli K12, we identified 188 spatially localized clusters. After a thorough Gene Ontology-based analysis, we proved that the clusters are also in functional proximity. We then conducted Principal Coordinates Analysis to predict the spatial distribution of the clusters in the simulation space. Our automated computational techniques can partition the entire simulation space (cell) into simulation sub-cells. Each of these sub-cells can be simulated on separate computing units of the High-Performance Computing (HPC) systems. We benchmarked our method using proteins. However, our method can be extended easily to add other cellular components like DNA, RNA and metabolites. Availability and implementation . Supplementary information Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Barnali Das
- Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur, West Bengal, India
| | - Abhijeet Rajendra Patil
- Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur, West Bengal, India
| | - Pralay Mitra
- Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur, West Bengal, India
| |
Collapse
|
60
|
Rossi RA, Zhou R, Ahmed NK. Estimation of Graphlet Counts in Massive Networks. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2019; 30:44-57. [PMID: 29994543 DOI: 10.1109/tnnls.2018.2826529] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
Graphlets are induced subgraphs of a large network and are important for understanding and modeling complex networks. Despite their practical importance, graphlets have been severely limited to applications and domains with relatively small graphs. Most previous work has focused on exact algorithms; however, it is often too expensive to compute graphlets exactly in massive networks with billions of edges, and finding an approximate count is usually sufficient for many applications. In this paper, we propose an unbiased graphlet estimation framework that is: (a) fast with large speedups compared to the state of the art; (b) parallel with nearly linear speedups; (c) accurate with less than 1% relative error; (d) scalable and space efficient for massive networks with billions of edges; and (e) effective for a variety of real-world settings as well as estimating global and local graphlet statistics (e.g., counts). On 300 networks from 20 domains, we obtain <1% relative error for all graphlets. This is vastly more accurate than the existing methods while using less data. Moreover, it takes a few seconds on billion edge graphs (as opposed to days/weeks). These are by far the largest graphlet computations to date.
Collapse
|
61
|
Wang P, Zhao J, Zhang X, Tao J, Guan X. SNOD: a fast sampling method of exploring node orbit degrees for large graphs. Knowl Inf Syst 2018. [DOI: 10.1007/s10115-018-1301-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]
|
62
|
Chang HJ, Fischer T, Petit M, Zambelli M, Demiris Y. Learning Kinematic Structure Correspondences Using Multi-Order Similarities. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2018; 40:2920-2934. [PMID: 29989982 DOI: 10.1109/tpami.2017.2777486] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
In this paper, we present a novel framework for finding the kinematic structure correspondences between two articulated objects in videos via hypergraph matching. In contrast to appearance and graph alignment based matching methods, which have been applied among two similar static images, the proposed method finds correspondences between two dynamic kinematic structures of heterogeneous objects in videos. Thus our method allows matching the structure of objects which have similar topologies or motions, or a combination of the two. Our main contributions can be summarised as follows: (i) casting the kinematic structure correspondence problem into a hypergraph matching problem by incorporating multi-order similarities with normalising weights, (ii) introducing a structural topology similarity measure by aggregating topology constrained subgraph isomorphisms, (iii) measuring kinematic correlations between pairwise nodes, and (iv) proposing a combinatorial local motion similarity measure using geodesic distance on the Riemannian manifold. We demonstrate the robustness and accuracy of our method through a number of experiments on synthetic and real data, outperforming various other state of the art methods. Our method is not limited to a specific application nor sensor, and can be used as building block in applications such as action recognition, human motion retargeting to robots, and articulated object manipulation amongst others.
Collapse
|
63
|
Vijayan V, Milenkovic T. Multiple Network Alignment via MultiMAGNA+. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2018; 15:1669-1682. [PMID: 28829315 DOI: 10.1109/tcbb.2017.2740381] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]
Abstract
Network alignment (NA) aims to find a node mapping that identifies topologically or functionally similar network regions between molecular networks of different species. Analogous to genomic sequence alignment, NA can be used to transfer biological knowledge from well- to poorly-studied species between aligned network regions. Pairwise NA (PNA) finds similar regions between two networks while multiple NA (MNA) can align more than two networks. We focus on MNA. Existing MNA methods aim to maximize total similarity over all aligned nodes (node conservation). Then, they evaluate alignment quality by measuring the amount of conserved edges, but only after the alignment is constructed. Directly optimizing edge conservation during alignment construction in addition to node conservation may result in superior alignments. Thus, we present a novel MNA method called multiMAGNA++ that can achieve this. Indeed, multiMAGNA++ outperforms or is on par with existing MNA methods, while often completing faster than existing methods. That is, multiMAGNA++ scales well to larger network data and can be parallelized effectively. During method evaluation, we also introduce new MNA quality measures to allow for more fair MNA method comparison compared to the existing alignment quality measures. The multiMAGNA++ code is available on the method's web page at http://nd.edu/~cone/multiMAGNA++/.
Collapse
|
64
|
Gaudelet T, Malod-Dognin N, Pržulj N. Higher-order molecular organization as a source of biological function. Bioinformatics 2018; 34:i944-i953. [PMID: 30423061 PMCID: PMC6129285 DOI: 10.1093/bioinformatics/bty570] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/27/2022] Open
Abstract
Motivation Molecular interactions have widely been modelled as networks. The local wiring patterns around molecules in molecular networks are linked with their biological functions. However, networks model only pairwise interactions between molecules and cannot explicitly and directly capture the higher-order molecular organization, such as protein complexes and pathways. Hence, we ask if hypergraphs (hypernetworks), that directly capture entire complexes and pathways along with protein-protein interactions (PPIs), carry additional functional information beyond what can be uncovered from networks of pairwise molecular interactions. The mathematical formalism of a hypergraph has long been known, but not often used in studying molecular networks due to the lack of sophisticated algorithms for mining the underlying biological information hidden in the wiring patterns of molecular systems modelled as hypernetworks. Results We propose a new, multi-scale, protein interaction hypernetwork model that utilizes hypergraphs to capture different scales of protein organization, including PPIs, protein complexes and pathways. In analogy to graphlets, we introduce hypergraphlets, small, connected, non-isomorphic, induced sub-hypergraphs of a hypergraph, to quantify the local wiring patterns of these multi-scale molecular hypergraphs and to mine them for new biological information. We apply them to model the multi-scale protein networks of bakers yeast and human and show that the higher-order molecular organization captured by these hypergraphs is strongly related to the underlying biology. Importantly, we demonstrate that our new models and data mining tools reveal different, but complementary biological information compared with classical PPI networks. We apply our hypergraphlets to successfully predict biological functions of uncharacterized proteins. Availability and implementation Code and data are available online at http://www0.cs.ucl.ac.uk/staff/natasa/hypergraphlets.
Collapse
Affiliation(s)
- Thomas Gaudelet
- Department of Computer Science, University College London, London, UK
| | - Noël Malod-Dognin
- Department of Computer Science, University College London, London, UK
| | - Nataša Pržulj
- Department of Computer Science, University College London, London, UK
| |
Collapse
|
65
|
Gu S, Johnson J, Faisal FE, Milenković T. From homogeneous to heterogeneous network alignment via colored graphlets. Sci Rep 2018; 8:12524. [PMID: 30131590 PMCID: PMC6104050 DOI: 10.1038/s41598-018-30831-w] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/05/2018] [Accepted: 08/07/2018] [Indexed: 11/19/2022] Open
Abstract
Network alignment (NA) compares networks with the goal of finding a node mapping that uncovers highly similar (conserved) network regions. Existing NA methods are homogeneous, i.e., they can deal only with networks containing nodes and edges of one type. Due to increasing amounts of heterogeneous network data with nodes or edges of different types, we extend three recent state-of-the-art homogeneous NA methods, WAVE, MAGNA++, and SANA, to allow for heterogeneous NA for the first time. We introduce several algorithmic novelties. Namely, these existing methods compute homogeneous graphlet-based node similarities and then find high-scoring alignments with respect to these similarities, while simultaneously maximizing the amount of conserved edges. Instead, we extend homogeneous graphlets to their heterogeneous counterparts, which we then use to develop a new measure of heterogeneous node similarity. Also, we extend S3, a state-of-the-art measure of edge conservation for homogeneous NA, to its heterogeneous counterpart. Then, we find high-scoring alignments with respect to our heterogeneous node similarity and edge conservation measures. In evaluations on synthetic and real-world biological networks, our proposed heterogeneous NA methods lead to higher-quality alignments and better robustness to noise in the data than their homogeneous counterparts. The software and data from this work is available at https://nd.edu/~cone/colored_graphlets/.
Collapse
Affiliation(s)
- Shawn Gu
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, 46556, USA
| | - John Johnson
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, 46556, USA
| | - Fazle E Faisal
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, 46556, USA
- Eck Institute for Global Health and Interdisciplinary Center for Network Science and Applications (iCeNSA), University of Notre Dame, Notre Dame, IN, 46556, USA
| | - Tijana Milenković
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, 46556, USA.
- Eck Institute for Global Health and Interdisciplinary Center for Network Science and Applications (iCeNSA), University of Notre Dame, Notre Dame, IN, 46556, USA.
| |
Collapse
|
66
|
ClueNet: Clustering a temporal network based on topological similarity rather than denseness. PLoS One 2018; 13:e0195993. [PMID: 29738568 PMCID: PMC5940177 DOI: 10.1371/journal.pone.0195993] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/04/2017] [Accepted: 04/04/2018] [Indexed: 11/19/2022] Open
Abstract
Network clustering is a very popular topic in the network science field. Its goal is to divide (partition) the network into groups (clusters or communities) of "topologically related" nodes, where the resulting topology-based clusters are expected to "correlate" well with node label information, i.e., metadata, such as cellular functions of genes/proteins in biological networks, or age or gender of people in social networks. Even for static data, the problem of network clustering is complex. For dynamic data, the problem is even more complex, due to an additional dimension of the data-their temporal (evolving) nature. Since the problem is computationally intractable, heuristic approaches need to be sought. Existing approaches for dynamic network clustering (DNC) have drawbacks. First, they assume that nodes should be in the same cluster if they are densely interconnected within the network. We hypothesize that in some applications, it might be of interest to cluster nodes that are topologically similar to each other instead of or in addition to requiring the nodes to be densely interconnected. Second, they ignore temporal information in their early steps, and when they do consider this information later on, they do so implicitly. We hypothesize that capturing temporal information earlier in the clustering process and doing so explicitly will improve results. We test these two hypotheses via our new approach called ClueNet. We evaluate ClueNet against six existing DNC methods on both social networks capturing evolving interactions between individuals (such as interactions between students in a high school) and biological networks capturing interactions between biomolecules in the cell at different ages. We find that ClueNet is superior in over 83% of all evaluation tests. As more real-world dynamic data are becoming available, DNC and thus ClueNet will only continue to gain importance.
Collapse
|
67
|
Huang J, Gong M, Ma L. A Global Network Alignment Method Using Discrete Particle Swarm Optimization. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2018; 15:705-718. [PMID: 27775534 DOI: 10.1109/tcbb.2016.2618380] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/06/2023]
Abstract
Molecular interactions data increase exponentially with the advance of biotechnology. This makes it possible and necessary to comparatively analyze the different data at a network level. Global network alignment is an important network comparison approach to identify conserved subnetworks and get insight into evolutionary relationship across species. Network alignment which is analogous to subgraph isomorphism is known to be an NP-hard problem. In this paper, we introduce a novel heuristic Particle-Swarm-Optimization based Network Aligner (PSONA), which optimizes a weighted global alignment model considering both protein sequence similarity and interaction conservations. The particle statuses and status updating rules are redefined in a discrete form by using permutation. A seed-and-extend strategy is employed to guide the searching for the superior alignment. The proposed initialization method "seeds" matches with high sequence similarity into the alignment, which guarantees the functional coherence of the mapping nodes. A greedy local search method is designed as the "extension" procedure to iteratively optimize the edge conservations. PSONA is compared with several state-of-art methods on ten network pairs combined by five species. The experimental results demonstrate that the proposed aligner can map the proteins with high functional coherence and can be used as a booster to effectively refine the well-studied aligners.
Collapse
|
68
|
Cannoodt R, Ruyssinck J, Ramon J, De Preter K, Saeys Y. IncGraph: Incremental graphlet counting for topology optimisation. PLoS One 2018; 13:e0195997. [PMID: 29698494 PMCID: PMC5919487 DOI: 10.1371/journal.pone.0195997] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/02/2017] [Accepted: 04/04/2018] [Indexed: 01/22/2023] Open
Abstract
MOTIVATION Graphlets are small network patterns that can be counted in order to characterise the structure of a network (topology). As part of a topology optimisation process, one could use graphlet counts to iteratively modify a network and keep track of the graphlet counts, in order to achieve certain topological properties. Up until now, however, graphlets were not suited as a metric for performing topology optimisation; when millions of minor changes are made to the network structure it becomes computationally intractable to recalculate all the graphlet counts for each of the edge modifications. RESULTS IncGraph is a method for calculating the differences in graphlet counts with respect to the network in its previous state, which is much more efficient than calculating the graphlet occurrences from scratch at every edge modification made. In comparison to static counting approaches, our findings show IncGraph reduces the execution time by several orders of magnitude. The usefulness of this approach was demonstrated by developing a graphlet-based metric to optimise gene regulatory networks. IncGraph is able to quickly quantify the topological impact of small changes to a network, which opens novel research opportunities to study changes in topologies in evolving or online networks, or develop graphlet-based criteria for topology optimisation. AVAILABILITY IncGraph is freely available as an open-source R package on CRAN (incgraph). The development version is also available on GitHub (rcannood/incgraph).
Collapse
Affiliation(s)
- Robrecht Cannoodt
- Data Mining and Modelling for Biomedicine group, VIB Center for Inflammation Research, Ghent, Belgium
- Center for Medical Genetics, Ghent University Hospital, Ghent, Belgium
- Cancer Research Institute Ghent (CRIG), Ghent, Belgium
| | - Joeri Ruyssinck
- IDLab, Department of Information Technology, Ghent University – imec, Ghent, Belgium
| | - Jan Ramon
- Department of Computer Science, KU Leuven, Belgium
| | - Katleen De Preter
- Center for Medical Genetics, Ghent University Hospital, Ghent, Belgium
- Cancer Research Institute Ghent (CRIG), Ghent, Belgium
| | - Yvan Saeys
- Data Mining and Modelling for Biomedicine group, VIB Center for Inflammation Research, Ghent, Belgium
- Department of Applied Mathematics and Computer Science, Ghent University, Ghent, Belgium
- * E-mail:
| |
Collapse
|
69
|
Liu S, Hachen D, Lizardo O, Poellabauer C, Striegel A, Milenković T. Network analysis of the NetHealth data: exploring co-evolution of individuals' social network positions and physical activities. APPLIED NETWORK SCIENCE 2018; 3:45. [PMID: 30465021 PMCID: PMC6223883 DOI: 10.1007/s41109-018-0103-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/10/2018] [Accepted: 09/25/2018] [Indexed: 05/03/2023]
Abstract
Understanding the relationship between individuals' social networks and health could help devise public health interventions for reducing incidence of unhealthy behaviors or increasing prevalence of healthy ones. In this context, we explore the co-evolution of individuals' social network positions and physical activities. We are able to do so because the NetHealth study at the University of Notre Dame has generated both high-resolution longitudinal social network (e.g., SMS) data and high-resolution longitudinal health-related behavioral (e.g., Fitbit physical activity) data. We examine trait differences between (i) users whose social network positions (i.e., centralities) change over time versus those whose centralities remain stable, (ii) users whose Fitbit physical activities change over time versus those whose physical activities remain stable, and (iii) users whose centralities and their physical activities co-evolve, i.e., correlate with each other over time. We find that centralities of a majority of all nodes change with time. These users do not show any trait difference compared to time-stable users. However, if out of all users whose centralities change with time we focus on those whose physical activities also change with time, then the resulting users are more likely to be introverted than time-stable users. Moreover, users whose centralities and physical activities both change with time and whose evolving centralities are significantly correlated (i.e., co-evolve) with evolving physical activities are more likely to be introverted as well as anxious compared to those users who are time-stable and do not have a co-evolution relationship. Our network analysis framework reveals several links between individuals' social network structure, health-related behaviors, and the other (e.g., personality) traits. In the future, our study could lead to development of a predictive model of social network structure from behavioral/trait information and vice versa.
Collapse
Affiliation(s)
- Shikang Liu
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, 46556 IN USA
| | - David Hachen
- Department of Sociology, University of Notre Dame, Notre Dame, 46556 IN USA
| | - Omar Lizardo
- Department of Sociology, University of Notre Dame, Notre Dame, 46556 IN USA
| | - Christian Poellabauer
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, 46556 IN USA
| | - Aaron Striegel
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, 46556 IN USA
| | - Tijana Milenković
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, 46556 IN USA
- Eck Institute for Global Health, University of Notre Dame, Notre Dame, 46556 IN USA
- Interdisciplinary Center for Network Science and Applications (iCeNSA), University of Notre Dame, Notre Dame, 46556 IN USA
| |
Collapse
|
70
|
GRAFENE: Graphlet-based alignment-free network approach integrates 3D structural and sequence (residue order) data to improve protein structural comparison. Sci Rep 2017; 7:14890. [PMID: 29097661 PMCID: PMC5668259 DOI: 10.1038/s41598-017-14411-y] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/29/2016] [Accepted: 10/11/2017] [Indexed: 12/26/2022] Open
Abstract
Initial protein structural comparisons were sequence-based. Since amino acids that are distant in the sequence can be close in the 3-dimensional (3D) structure, 3D contact approaches can complement sequence approaches. Traditional 3D contact approaches study 3D structures directly and are alignment-based. Instead, 3D structures can be modeled as protein structure networks (PSNs). Then, network approaches can compare proteins by comparing their PSNs. These can be alignment-based or alignment-free. We focus on the latter. Existing network alignment-free approaches have drawbacks: 1) They rely on naive measures of network topology. 2) They are not robust to PSN size. They cannot integrate 3) multiple PSN measures or 4) PSN data with sequence data, although this could improve comparison because the different data types capture complementary aspects of the protein structure. We address this by: 1) exploiting well-established graphlet measures via a new network alignment-free approach, 2) introducing normalized graphlet measures to remove the bias of PSN size, 3) allowing for integrating multiple PSN measures, and 4) using ordered graphlets to combine the complementary PSN data and sequence (specifically, residue order) data. We compare synthetic networks and real-world PSNs more accurately and faster than existing network (alignment-free and alignment-based), 3D contact, or sequence approaches.
Collapse
|
71
|
Aparicio D, Ribeiro P, Silva F. Extending the Applicability of Graphlets to Directed Networks. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2017; 14:1302-1315. [PMID: 27362986 DOI: 10.1109/tcbb.2016.2586046] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/06/2023]
Abstract
With recent advances in high-throughput cell biology, the amount of cellular biological data has grown drastically. Such data is often modeled as graphs (also called networks) and studying them can lead to new insights into molecule-level organization. A possible way to understand their structure is by analyzing the smaller components that constitute them, namely network motifs and graphlets. Graphlets are particularly well suited to compare networks and to assess their level of similarity due to the rich topological information that they offer but are almost always used as small undirected graphs of up to five nodes, thus limiting their applicability in directed networks. However, a large set of interesting biological networks such as metabolic, cell signaling, or transcriptional regulatory networks are intrinsically directional, and using metrics that ignore edge direction may gravely hinder information extraction. Our main purpose in this work is to extend the applicability of graphlets to directed networks by considering their edge direction, thus providing a powerful basis for the analysis of directed biological networks. We tested our approach on two network sets, one composed of synthetic graphs and another of real directed biological networks, and verified that they were more accurately grouped using directed graphlets than undirected graphlets. It is also evident that directed graphlets offer substantially more topological information than simple graph metrics such as degree distribution or reciprocity. However, enumerating graphlets in large networks is a computationally demanding task. Our implementation addresses this concern by using a state-of-the-art data structure, the g-trie, which is able to greatly reduce the necessary computation. We compared our tool to other state-of-the art methods and verified that it is the fastest general tool for graphlet counting.
Collapse
|
72
|
Meysman P, Titeca K, Eyckerman S, Tavernier J, Goethals B, Martens L, Valkenborg D, Laukens K. Protein complex analysis: From raw protein lists to protein interaction networks. MASS SPECTROMETRY REVIEWS 2017; 36:600-614. [PMID: 26709718 DOI: 10.1002/mas.21485] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/04/2015] [Accepted: 11/17/2015] [Indexed: 06/05/2023]
Abstract
The elucidation of molecular interaction networks is one of the pivotal challenges in the study of biology. Affinity purification-mass spectrometry and other co-complex methods have become widely employed experimental techniques to identify protein complexes. These techniques typically suffer from a high number of false negatives and false positive contaminants due to technical shortcomings and purification biases. To support a diverse range of experimental designs and approaches, a large number of computational methods have been proposed to filter, infer and validate protein interaction networks from experimental pull-down MS data. Nevertheless, this expansion of available methods complicates the selection of the most optimal ones to support systems biology-driven knowledge extraction. In this review, we give an overview of the most commonly used computational methods to process and interpret co-complex results, and we discuss the issues and unsolved problems that still exist within the field. © 2015 Wiley Periodicals, Inc. Mass Spec Rev 36:600-614, 2017.
Collapse
Affiliation(s)
- Pieter Meysman
- Advanced Database Research and Modelling (ADReM), Department of Mathematics and Computer Science, University of Antwerp, Antwerp, Belgium
- Biomedical Informatics Research Center Antwerp (biomina), University of Antwerp/Antwerp University Hospital, Edegem, Belgium
| | - Kevin Titeca
- Department of Medical Protein Research, VIB, B-9000 Ghent, Belgium
- Department of Biochemistry, Ghent University, B-9000 Ghent, Belgium
| | - Sven Eyckerman
- Department of Medical Protein Research, VIB, B-9000 Ghent, Belgium
- Department of Biochemistry, Ghent University, B-9000 Ghent, Belgium
| | - Jan Tavernier
- Department of Medical Protein Research, VIB, B-9000 Ghent, Belgium
- Department of Biochemistry, Ghent University, B-9000 Ghent, Belgium
| | - Bart Goethals
- Advanced Database Research and Modelling (ADReM), Department of Mathematics and Computer Science, University of Antwerp, Antwerp, Belgium
| | - Lennart Martens
- Department of Medical Protein Research, VIB, B-9000 Ghent, Belgium
- Department of Biochemistry, Ghent University, B-9000 Ghent, Belgium
| | - Dirk Valkenborg
- Flemish Institute for Technological Research (VITO), Mol, Belgium
- IBioStat, Hasselt University, Hasselt, Belgium
- CFP-CeProMa, University of Antwerp, Antwerp, Belgium
| | - Kris Laukens
- Advanced Database Research and Modelling (ADReM), Department of Mathematics and Computer Science, University of Antwerp, Antwerp, Belgium
- Biomedical Informatics Research Center Antwerp (biomina), University of Antwerp/Antwerp University Hospital, Edegem, Belgium
| |
Collapse
|
73
|
Yoo B, Faisal FE, Chen H, Milenkovic T. Improving Identification of Key Players in Aging via Network De-Noising and Core Inference. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2017; 14:1056-1069. [PMID: 26529776 DOI: 10.1109/tcbb.2015.2495170] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/05/2023]
Abstract
Current "ground truth" knowledge about human aging has been obtained by transferring aging-related knowledge from well-studied model species via sequence homology or by studying human gene expression data. Since proteins function by interacting with each other, analyzing protein-protein interaction (PPI) networks in the context of aging is promising. Unlike existing static network research of aging, since cellular functioning is dynamic, we recently integrated the static human PPI network with aging-related gene expression data to form dynamic, age-specific networks. Then, we predicted as key players in aging those proteins whose network topologies significantly changed with age. Since current networks are noisy , here, we use link prediction to de-noise the human network and predict improved key players in aging from the de-noised data. Indeed, de-noising gives more significant overlap between the predicted data and the "ground truth" aging-related data. Yet, we obtain novel predictions, which we validate in the literature. Also, we improve the predictions by an alternative strategy: removing "redundant" edges from the age-specific networks and using the resulting age-specific network "cores" to study aging. We produce new knowledge from dynamic networks encompassing multiple data types, via network de-noising or core inference, complementing the existing knowledge obtained from sequence or expression data.
Collapse
|
74
|
Abstract
MOTIVATION Network alignment (NA) aims to find a node mapping that conserves similar regions between compared networks. NA is applicable to many fields, including computational biology, where NA can guide the transfer of biological knowledge from well- to poorly-studied species across aligned network regions. Existing NA methods can only align static networks. However, most complex real-world systems evolve over time and should thus be modeled as dynamic networks. We hypothesize that aligning dynamic network representations of evolving systems will produce superior alignments compared to aligning the systems' static network representations, as is currently done. RESULTS For this purpose, we introduce the first ever dynamic NA method, DynaMAGNA ++. This proof-of-concept dynamic NA method is an extension of a state-of-the-art static NA method, MAGNA++. Even though both MAGNA++ and DynaMAGNA++ optimize edge as well as node conservation across the aligned networks, MAGNA++ conserves static edges and similarity between static node neighborhoods, while DynaMAGNA++ conserves dynamic edges (events) and similarity between evolving node neighborhoods. For this purpose, we introduce the first ever measure of dynamic edge conservation and rely on our recent measure of dynamic node conservation. Importantly, the two dynamic conservation measures can be optimized with any state-of-the-art NA method and not just MAGNA++. We confirm our hypothesis that dynamic NA is superior to static NA, on synthetic and real-world networks, in computational biology and social domains. DynaMAGNA++ is parallelized and has a user-friendly graphical interface. AVAILABILITY AND IMPLEMENTATION http://nd.edu/∼cone/DynaMAGNA++/ . CONTACT tmilenko@nd.edu. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- V Vijayan
- Department of Computer Science and Engineering, ECK Institute for Global Health, and Interdisciplinary Center for Network Science and Applications (iCeNSA), University of Notre Dame, Notre Dame, IN, USA
| | - D Critchlow
- Department of Computer Science and Engineering, ECK Institute for Global Health, and Interdisciplinary Center for Network Science and Applications (iCeNSA), University of Notre Dame, Notre Dame, IN, USA
- Department of Physics and Astronomy, Austin Peay State University, Clarksville, Tennessee, TN, USA
| | - T Milenković
- Department of Computer Science and Engineering, ECK Institute for Global Health, and Interdisciplinary Center for Network Science and Applications (iCeNSA), University of Notre Dame, Notre Dame, IN, USA
| |
Collapse
|
75
|
Wong SWH, Pastrello C, Kotlyar M, Faloutsos C, Jurisica I. Modeling tumor progression via the comparison of stage-specific graphs. Methods 2017; 132:34-41. [PMID: 28684340 DOI: 10.1016/j.ymeth.2017.06.033] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/13/2017] [Revised: 05/09/2017] [Accepted: 06/29/2017] [Indexed: 01/09/2023] Open
Abstract
Can we use graph mining algorithms to find patterns in tumor molecular mechanisms? Can we model disease progression with multiple time-specific graph comparison algorithms? In this paper, we will focus on this area. Our main contributions are 1) we proposed the Temporal-Omics (Temp-O) workflow to model tumor progression in non-small cell lung cancer (NSCLC) using graph comparisons between multiple stage-specific graphs, and 2) we showed that temporal structures are meaningful in the tumor progression of NSCLC. Other identified temporal structures that were not highlighted in this paper may also be used to gain insights to possible novel mechanisms. Importantly, the Temp-O workflow is generic; while we applied it on NSCLC, it can be applied in other cancers and diseases. We used gene expression data from tumor samples across disease stages to model lung cancer progression, creating stage-specific tumor graphs. Validating our findings in independent datasets showed that differences in temporal network structures capture diverse mechanisms in NSCLC. Furthermore, results showed that structures are consistent and potentially biologically important as we observed that genes with similar protein names were captured in the same cliques for all cliques in all datasets. Importantly, the identified temporal structures are meaningful in the tumor progression of NSCLC as they agree with the molecular mechanism in the tumor progression or carcinogenesis of NSCLC. In particular, the identified major histocompatibility complex of class II temporal structures capture mechanisms concerning carcinogenesis; the proteasome temporal structures capture mechanisms that are in early or late stages of lung cancer; the ribosomal cliques capture the role of ribosome biosynthesis in cancer development and sustainment. Further, on a large independent dataset we validated that temporal network structures identified proteins that are prognostic for overall survival in NSCLC adenocarcinoma.
Collapse
Affiliation(s)
- Serene W H Wong
- Princess Margaret Cancer Centre, UHN, 101 College Street, M5G 1L7, Toronto, Canada.
| | - Chiara Pastrello
- Princess Margaret Cancer Centre, UHN, 101 College Street, M5G 1L7, Toronto, Canada.
| | - Max Kotlyar
- Princess Margaret Cancer Centre, UHN, 101 College Street, M5G 1L7, Toronto, Canada.
| | - Christos Faloutsos
- Department of Computer Science, Carnegie Mellon University, Pittsburgh, United States.
| | - Igor Jurisica
- Princess Margaret Cancer Centre, UHN, 101 College Street, M5G 1L7, Toronto, Canada; TECHNA Institute for the Advancement of Technology for Health, UHN, 101 College Street, M5G 1L7, Toronto, Canada; Departments of Medical Biophysics and Computer Science, University of Toronto, Toronto, Canada; Institute of Neuroimmunology, Slovak Academy of Sciences, Bratislava, Slovakia.
| |
Collapse
|
76
|
Ortmann M, Brandes U. Efficient orbit-aware triad and quad census in directed and undirected graphs. APPLIED NETWORK SCIENCE 2017; 2:13. [PMID: 30443568 PMCID: PMC6214268 DOI: 10.1007/s41109-017-0027-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 07/21/2016] [Accepted: 04/20/2017] [Indexed: 06/09/2023]
Abstract
The prevalence of select substructures is an indicator of network effects in applications such as social network analysis and systems biology. Moreover, subgraph statistics are pervasive in stochastic network models, and they need to be assessed repeatedly in MCMC sampling and estimation algorithms. We present a new approach to count all induced and non-induced four-node subgraphs (the quad census) on a per-node and per-edge basis, complete with a separation into their non-automorphic roles in these subgraphs. It is the first approach to do so in a unified manner, and is based on only a clique-listing subroutine. Computational experiments indicate that, despite its simplicity, the approach outperforms previous, less general approaches. By way of the more presentable triad census, we additionally show how to extend the quad census to directed graphs. As a byproduct we obtain the asymptotically fastest triad census algorithm to date.
Collapse
Affiliation(s)
- Mark Ortmann
- Department of Computer & Information Science, University of Konstanz, Box 67, Konstanz, 78457 Germany
| | - Ulrik Brandes
- Department of Computer & Information Science, University of Konstanz, Box 67, Konstanz, 78457 Germany
| |
Collapse
|
77
|
Combinatorial algorithm for counting small induced graphs and orbits. PLoS One 2017; 12:e0171428. [PMID: 28182743 PMCID: PMC5300269 DOI: 10.1371/journal.pone.0171428] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/02/2016] [Accepted: 01/20/2017] [Indexed: 11/19/2022] Open
Abstract
Graphlet analysis is an approach to network analysis that is particularly popular in bioinformatics. We show how to set up a system of linear equations that relate the orbit counts and can be used in an algorithm that is significantly faster than the existing approaches based on direct enumeration of graphlets. The approach presented in this paper presents a generalization of the currently fastest method for counting 5-node graphlets in bioinformatics. The algorithm requires existence of a vertex with certain properties; we show that such vertex exists for graphlets of arbitrary size, except for complete graphs and a cycle with four nodes, which are treated separately. Empirical analysis of running time agrees with the theoretical results.
Collapse
|
78
|
Bonnici V, Giugno R. On the Variable Ordering in Subgraph Isomorphism Algorithms. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2017; 14:193-203. [PMID: 26761859 DOI: 10.1109/tcbb.2016.2515595] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/05/2023]
Abstract
Graphs are mathematical structures to model several biological data. Applications to analyze them require to apply solutions for the subgraph isomorphism problem, which is NP-complete. Here, we investigate the existing strategies to reduce the subgraph isomorphism algorithm running time with emphasis on the importance of the order with which the graph vertices are taken into account during the search, called variable ordering, and its incidence on the total running time of the algorithms. We focus on two recent solutions, which are based on an effective variable ordering strategy. We discuss their comparison both with the variable ordering strategies reviewed in the paper and the other algorithms present in the ICPR2014 contest on graph matching algorithms for pattern search in biological databases.
Collapse
|
79
|
SCOUT: simultaneous time segmentation and community detection in dynamic networks. Sci Rep 2016; 6:37557. [PMID: 27881879 PMCID: PMC5121586 DOI: 10.1038/srep37557] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/06/2016] [Accepted: 11/01/2016] [Indexed: 11/24/2022] Open
Abstract
Many evolving complex real-world systems can be modeled via dynamic networks. An important problem in dynamic network research is community detection, which finds groups of topologically related nodes. Typically, this problem is approached by assuming either that each time point has a distinct community organization or that all time points share a single community organization. The reality likely lies between these two extremes. To find the compromise, we consider community detection in the context of the problem of segment detection, which identifies contiguous time periods with consistent network structure. Consequently, we formulate a combined problem of segment community detection (SCD), which simultaneously partitions the network into contiguous time segments with consistent community organization and finds this community organization for each segment. To solve SCD, we introduce SCOUT, an optimization framework that explicitly considers both segmentation quality and partition quality. SCOUT addresses limitations of existing methods that can be adapted to solve SCD, which consider only one of segmentation quality or partition quality. In a thorough evaluation, SCOUT outperforms the existing methods in terms of both accuracy and computational complexity. We apply SCOUT to biological network data to study human aging.
Collapse
|
80
|
Cho H, Berger B, Peng J. Compact Integration of Multi-Network Topology for Functional Analysis of Genes. Cell Syst 2016; 3:540-548.e5. [PMID: 27889536 DOI: 10.1016/j.cels.2016.10.017] [Citation(s) in RCA: 156] [Impact Index Per Article: 17.3] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/09/2016] [Revised: 08/14/2016] [Accepted: 10/19/2016] [Indexed: 01/18/2023]
Abstract
The topological landscape of molecular or functional interaction networks provides a rich source of information for inferring functional patterns of genes or proteins. However, a pressing yet-unsolved challenge is how to combine multiple heterogeneous networks, each having different connectivity patterns, to achieve more accurate inference. Here, we describe the Mashup framework for scalable and robust network integration. In Mashup, the diffusion in each network is first analyzed to characterize the topological context of each node. Next, the high-dimensional topological patterns in individual networks are canonically represented using low-dimensional vectors, one per gene or protein. These vectors can then be plugged into off-the-shelf machine learning methods to derive functional insights about genes or proteins. We present tools based on Mashup that achieve state-of-the-art performance in three diverse functional inference tasks: protein function prediction, gene ontology reconstruction, and genetic interaction prediction. Mashup enables deeper insights into the structure of rapidly accumulating and diverse biological network data and can be broadly applied to other network science domains.
Collapse
Affiliation(s)
- Hyunghoon Cho
- Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA 02139, USA
| | - Bonnie Berger
- Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA 02139, USA; Department of Mathematics, MIT, Cambridge, MA 02139, USA.
| | - Jian Peng
- Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA 02139, USA; Department of Computer Science, University of Illinois at Urbana-Champaign, Champaign, IL 61801, USA.
| |
Collapse
|
81
|
Sarajlić A, Malod-Dognin N, Yaveroğlu ÖN, Pržulj N. Graphlet-based Characterization of Directed Networks. Sci Rep 2016; 6:35098. [PMID: 27734973 PMCID: PMC5062067 DOI: 10.1038/srep35098] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/29/2016] [Accepted: 09/26/2016] [Indexed: 01/22/2023] Open
Abstract
We are flooded with large-scale, dynamic, directed, networked data. Analyses requiring exact comparisons between networks are computationally intractable, so new methodologies are sought. To analyse directed networks, we extend graphlets (small induced sub-graphs) and their degrees to directed data. Using these directed graphlets, we generalise state-of-the-art network distance measures (RGF, GDDA and GCD) to directed networks and show their superiority for comparing directed networks. Also, we extend the canonical correlation analysis framework that enables uncovering the relationships between the wiring patterns around nodes in a directed network and their expert annotations. On directed World Trade Networks (WTNs), our methodology allows uncovering the core-broker-periphery structure of the WTN, predicting the economic attributes of a country, such as its gross domestic product, from its wiring patterns in the WTN for up-to ten years in the future. It does so by enabling us to track the dynamics of a country's positioning in the WTN over years. On directed metabolic networks, our framework yields insights into preservation of enzyme function from the network wiring patterns rather than from sequence data. Overall, our methodology enables advanced analyses of directed networked data from any area of science, allowing domain-specific interpretation of a directed network's topology.
Collapse
Affiliation(s)
- Anida Sarajlić
- Department of Computing, Imperial College London, SW7 2AZ London, UK
| | - Noël Malod-Dognin
- Department of Computer Science, University College London, WC1E 6BT London, UK
| | | | - Nataša Pržulj
- Department of Computer Science, University College London, WC1E 6BT London, UK
| |
Collapse
|
82
|
Rund SSC, Yoo B, Alam C, Green T, Stephens MT, Zeng E, George GF, Sheppard AD, Duffield GE, Milenković T, Pfrender ME. Genome-wide profiling of 24 hr diel rhythmicity in the water flea, Daphnia pulex: network analysis reveals rhythmic gene expression and enhances functional gene annotation. BMC Genomics 2016; 17:653. [PMID: 27538446 PMCID: PMC4991082 DOI: 10.1186/s12864-016-2998-2] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/20/2016] [Accepted: 08/05/2016] [Indexed: 11/16/2022] Open
Abstract
Background Marine and freshwater zooplankton exhibit daily rhythmic patterns of behavior and physiology which may be regulated directly by the light:dark (LD) cycle and/or a molecular circadian clock. One of the best-studied zooplankton taxa, the freshwater crustacean Daphnia, has a 24 h diel vertical migration (DVM) behavior whereby the organism travels up and down through the water column daily. DVM plays a critical role in resource tracking and the behavioral avoidance of predators and damaging ultraviolet radiation. However, there is little information at the transcriptional level linking the expression patterns of genes to the rhythmic physiology/behavior of Daphnia. Results Here we analyzed genome-wide temporal transcriptional patterns from Daphnia pulex collected over a 44 h time period under a 12:12 LD cycle (diel) conditions using a cosine-fitting algorithm. We used a comprehensive network modeling and analysis approach to identify novel co-regulated rhythmic genes that have similar network topological properties and functional annotations as rhythmic genes identified by the cosine-fitting analyses. Furthermore, we used the network approach to predict with high accuracy novel gene-function associations, thus enhancing current functional annotations available for genes in this ecologically relevant model species. Our results reveal that genes in many functional groupings exhibit 24 h rhythms in their expression patterns under diel conditions. We highlight the rhythmic expression of immunity, oxidative detoxification, and sensory process genes. We discuss differences in the chronobiology of D. pulex from other well-characterized terrestrial arthropods. Conclusions This research adds to a growing body of literature suggesting the genetic mechanisms governing rhythmicity in crustaceans may be divergent from other arthropod lineages including insects. Lastly, these results highlight the power of using a network analysis approach to identify differential gene expression and provide novel functional annotation. Electronic supplementary material The online version of this article (doi:10.1186/s12864-016-2998-2) contains supplementary material, which is available to authorized users.
Collapse
Affiliation(s)
- Samuel S C Rund
- Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN, 46556, USA.,Department of Biological Sciences, University of Notre Dame, Notre Dame, IN, 46556, USA.,Centre for Immunity, Infection and Evolution, Institute of Evolution, University of Edinburgh, Edinburgh, EH9 3FL, UK.,Institute of Immunology and Infection Research, School of Biological Sciences, University of Edinburgh, Edinburgh, EH9 3FL, UK
| | - Boyoung Yoo
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, 46556, USA.,Present Address: Department of Computer Science, Stanford University, Stanford, CA, 94305, USA
| | - Camille Alam
- Department of Biological Sciences, University of Notre Dame, Notre Dame, IN, 46556, USA
| | - Taryn Green
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, 46556, USA
| | - Melissa T Stephens
- Notre Dame Genomics and Bioinformatics Core Facility, University of Notre Dame, Notre Dame, IN, 46556, USA
| | - Erliang Zeng
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, 46556, USA.,Notre Dame Genomics and Bioinformatics Core Facility, University of Notre Dame, Notre Dame, IN, 46556, USA.,Present Address: Department of Biology, University of South Dakota, Vermillion, SD, 57069, USA.,Present Address: Department of Computer Science, University of South Dakota, Vermillion, SD, 57069, USA
| | - Gary F George
- Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN, 46556, USA.,Department of Biological Sciences, University of Notre Dame, Notre Dame, IN, 46556, USA
| | - Aaron D Sheppard
- Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN, 46556, USA.,Department of Biological Sciences, University of Notre Dame, Notre Dame, IN, 46556, USA
| | - Giles E Duffield
- Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN, 46556, USA.,Department of Biological Sciences, University of Notre Dame, Notre Dame, IN, 46556, USA
| | - Tijana Milenković
- Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN, 46556, USA.,Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, 46556, USA.,Interdisciplinary Center for Network Science and Applications (iCeNSA), University of Notre Dame, Notre Dame, IN, 46556, USA
| | - Michael E Pfrender
- Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN, 46556, USA. .,Department of Biological Sciences, University of Notre Dame, Notre Dame, IN, 46556, USA. .,Notre Dame Environmental Change Initiative, University of Notre Dame, Notre Dame, IN, 46556, USA.
| |
Collapse
|
83
|
Rahmani H, Blockeel H, Bender A. Using a Human Drug Network for generating novel hypotheses about drugs. INTELL DATA ANAL 2016. [DOI: 10.3233/ida-150800] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022]
Affiliation(s)
- Hossein Rahmani
- School of Computer Engineering, Iran University of Science and Technology, Tehran, Iran
- Department of Knowledge Engineering, Universiteit Maastricht, Maastricht, The Netherlands
| | - Hendrik Blockeel
- Department of Computer Science, KU Leuven, Leuven, Belgium
- Leiden Institute of Advanced Computer Science, Leiden University, CA Leiden, The Netherlands
| | - Andreas Bender
- Unilever Centre for Molecular Science Informatics, Department of Chemistry, University of Cambridge, Cambridge, UK
| |
Collapse
|
84
|
Computational Methods for Integration of Biological Data. Per Med 2016. [DOI: 10.1007/978-3-319-39349-0_8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/20/2022]
|
85
|
Roth W, Hecker D, Fava E. Systems Biology Approaches to the Study of Biological Networks Underlying Alzheimer's Disease: Role of miRNAs. Methods Mol Biol 2016; 1303:349-377. [PMID: 26235078 DOI: 10.1007/978-1-4939-2627-5_21] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/04/2023]
Abstract
MicroRNAs (miRNAs) are emerging as significant regulators of mRNA complexity in the human central nervous system (CNS) thereby controlling distinct gene expression profiles in a spatio-temporal manner during development, neuronal plasticity, aging and (age-related) neurodegeneration, including Alzheimer's disease (AD). Increasing effort is expended towards dissecting and deciphering the molecular and genetic mechanisms of neurobiological and pathological functions of these brain-enriched miRNAs. Along these lines, recent data pinpoint distinct miRNAs and miRNA networks being linked to APP splicing, processing and Aβ pathology (Lukiw et al., Front Genet 3:327, 2013), and furthermore, to the regulation of tau and its cellular subnetworks (Lau et al., EMBO Mol Med 5:1613, 2013), altogether underlying the onset and propagation of Alzheimer's disease. MicroRNA profiling studies in Alzheimer's disease suffer from poor consensus which is an acknowledged concern in the field, and constitutes one of the current technical challenges. Hence, a strong demand for experimental and computational systems biology approaches arises, to incorporate and integrate distinct levels of information and scientific knowledge into a complex system of miRNA networks in the context of the transcriptome, proteome and metabolome in a given cellular environment. Here, we will discuss the state-of-the-art technologies and computational approaches on hand that may lead to a deeper understanding of the complex biological networks underlying the pathogenesis of Alzheimer's disease.
Collapse
Affiliation(s)
- Wera Roth
- German Center for Neurodegenerative Diseases (DZNE), Ludwig-Erhard-Allee 2, 53175, Bonn, Germany
| | | | | |
Collapse
|
86
|
Rahman M, Hasan MA. Link Prediction in Dynamic Networks Using Graphlet. MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES 2016. [DOI: 10.1007/978-3-319-46128-1_25] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/29/2022]
|
87
|
Faisal FE, Meng L, Crawford J, Milenković T. The post-genomic era of biological network alignment. EURASIP JOURNAL ON BIOINFORMATICS & SYSTEMS BIOLOGY 2015; 2015:3. [PMID: 28194172 PMCID: PMC5270500 DOI: 10.1186/s13637-015-0022-9] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 01/21/2015] [Accepted: 05/18/2015] [Indexed: 11/10/2022]
Abstract
Biological network alignment aims to find regions of topological and functional (dis)similarities between molecular networks of different species. Then, network alignment can guide the transfer of biological knowledge from well-studied model species to less well-studied species between conserved (aligned) network regions, thus complementing valuable insights that have already been provided by genomic sequence alignment. Here, we review computational challenges behind the network alignment problem, existing approaches for solving the problem, ways of evaluating their alignment quality, and the approaches' biomedical applications. We discuss recent innovative efforts of improving the existing view of network alignment. We conclude with open research questions in comparative biological network research that could further our understanding of principles of life, evolution, disease, and therapeutics.
Collapse
Affiliation(s)
- Fazle E Faisal
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, 46556 USA
- Interdisciplinary Center for Network Science and Applications, University of Notre Dame, Notre Dame, IN, 46556 USA
- ECK Institute for Global Health, University of Notre Dame, Notre Dame, IN, 46556 USA
| | - Lei Meng
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, 46556 USA
| | - Joseph Crawford
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, 46556 USA
- Interdisciplinary Center for Network Science and Applications, University of Notre Dame, Notre Dame, IN, 46556 USA
- ECK Institute for Global Health, University of Notre Dame, Notre Dame, IN, 46556 USA
| | - Tijana Milenković
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, 46556 USA
- Interdisciplinary Center for Network Science and Applications, University of Notre Dame, Notre Dame, IN, 46556 USA
- ECK Institute for Global Health, University of Notre Dame, Notre Dame, IN, 46556 USA
| |
Collapse
|
88
|
Systems biology approach reveals possible evolutionarily conserved moonlighting functions for enolase. Comput Biol Chem 2015; 58:1-8. [DOI: 10.1016/j.compbiolchem.2015.04.010] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/23/2014] [Revised: 04/13/2015] [Accepted: 04/19/2015] [Indexed: 01/07/2023]
|
89
|
Hulovatyy Y, Chen H, Milenković T. Exploring the structure and function of temporal networks with dynamic graphlets. Bioinformatics 2015; 31:i171-80. [PMID: 26072480 PMCID: PMC4765862 DOI: 10.1093/bioinformatics/btv227] [Citation(s) in RCA: 62] [Impact Index Per Article: 6.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/18/2023] Open
Abstract
MOTIVATION With increasing availability of temporal real-world networks, how to efficiently study these data? One can model a temporal network as a single aggregate static network, or as a series of time-specific snapshots, each being an aggregate static network over the corresponding time window. Then, one can use established methods for static analysis on the resulting aggregate network(s), but losing in the process valuable temporal information either completely, or at the interface between different snapshots, respectively. Here, we develop a novel approach for studying a temporal network more explicitly, by capturing inter-snapshot relationships. RESULTS We base our methodology on well-established graphlets (subgraphs), which have been proven in numerous contexts in static network research. We develop new theory to allow for graphlet-based analyses of temporal networks. Our new notion of dynamic graphlets is different from existing dynamic network approaches that are based on temporal motifs (statistically significant subgraphs). The latter have limitations: their results depend on the choice of a null network model that is required to evaluate the significance of a subgraph, and choosing a good null model is non-trivial. Our dynamic graphlets overcome the limitations of the temporal motifs. Also, when we aim to characterize the structure and function of an entire temporal network or of individual nodes, our dynamic graphlets outperform the static graphlets. Clearly, accounting for temporal information helps. We apply dynamic graphlets to temporal age-specific molecular network data to deepen our limited knowledge about human aging. AVAILABILITY AND IMPLEMENTATION http://www.nd.edu/∼cone/DG.
Collapse
Affiliation(s)
- Y Hulovatyy
- Department of Computer Science and Engineering, Interdisciplinary Center for Network Science and Applications, and ECK Institute for Global Health, University of Notre Dame, Notre Dame, IN 46556, USA
| | - H Chen
- Department of Computer Science and Engineering, Interdisciplinary Center for Network Science and Applications, and ECK Institute for Global Health, University of Notre Dame, Notre Dame, IN 46556, USA
| | - T Milenković
- Department of Computer Science and Engineering, Interdisciplinary Center for Network Science and Applications, and ECK Institute for Global Health, University of Notre Dame, Notre Dame, IN 46556, USA
| |
Collapse
|
90
|
Crawford J, Sun Y, Milenković T. Fair evaluation of global network aligners. Algorithms Mol Biol 2015; 10:19. [PMID: 26060505 PMCID: PMC4460690 DOI: 10.1186/s13015-015-0050-8] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/30/2014] [Accepted: 05/10/2015] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Analogous to genomic sequence alignment, biological network alignment identifies conserved regions between networks of different species. Then, function can be transferred from well- to poorly-annotated species between aligned network regions. Network alignment typically encompasses two algorithmic components: node cost function (NCF), which measures similarities between nodes in different networks, and alignment strategy (AS), which uses these similarities to rapidly identify high-scoring alignments. Different methods use both different NCFs and different ASs. Thus, it is unclear whether the superiority of a method comes from its NCF, its AS, or both. We already showed on state-of-the-art methods, MI-GRAAL and IsoRankN, that combining NCF of one method and AS of another method can give a new superior method. Here, we evaluate MI-GRAAL against a newer approach, GHOST, by mixing-and-matching the methods' NCFs and ASs to potentially further improve alignment quality. While doing so, we approach important questions that have not been asked systematically thus far. First, we ask how much of the NCF information should come from protein sequence data compared to network topology data. Existing methods determine this parameter more-less arbitrarily, which could affect alignment quality. Second, when topological information is used in NCF, we ask how large the size of the neighborhoods of the compared nodes should be. Existing methods assume that the larger the neighborhood size, the better. RESULTS Our findings are as follows. MI-GRAAL's NCF is superior to GHOST's NCF, while the performance of the methods' ASs is data-dependent. Thus, for data on which GHOST's AS is superior to MI-GRAAL's AS, the combination of MI-GRAAL's NCF and GHOST's AS represents a new superior method. Also, which amount of sequence information is used within NCF does not affect alignment quality, while the inclusion of topological information is crucial for producing good alignments. Finally, larger neighborhood sizes are preferred, but often, it is the second largest size that is superior. Using this size instead of the largest one would decrease computational complexity. CONCLUSION Taken together, our results represent general recommendations for a fair evaluation of network alignment methods and in particular of two-stage NCF-AS approaches.
Collapse
|
91
|
Birlutiu A, d'Alché-Buc F, Heskes T. A Bayesian Framework for Combining Protein and Network Topology Information for Predicting Protein-Protein Interactions. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2015; 12:538-550. [PMID: 26357265 DOI: 10.1109/tcbb.2014.2359441] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/05/2023]
Abstract
Computational methods for predicting protein-protein interactions are important tools that can complement high-throughput technologies and guide biologists in designing new laboratory experiments. The proteins and the interactions between them can be described by a network which is characterized by several topological properties. Information about proteins and interactions between them, in combination with knowledge about topological properties of the network, can be used for developing computational methods that can accurately predict unknown protein-protein interactions. This paper presents a supervised learning framework based on Bayesian inference for combining two types of information: i) network topology information, and ii) information related to proteins and the interactions between them. The motivation of our model is that by combining these two types of information one can achieve a better accuracy in predicting protein-protein interactions, than by using models constructed from these two types of information independently.
Collapse
|
92
|
Memišević V, Zavaljevski N, Rajagopala SV, Kwon K, Pieper R, DeShazer D, Reifman J, Wallqvist A. Mining host-pathogen protein interactions to characterize Burkholderia mallei infectivity mechanisms. PLoS Comput Biol 2015; 11:e1004088. [PMID: 25738731 PMCID: PMC4349708 DOI: 10.1371/journal.pcbi.1004088] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/14/2014] [Accepted: 12/15/2014] [Indexed: 01/01/2023] Open
Abstract
Burkholderia pathogenicity relies on protein virulence factors to control and promote bacterial internalization, survival, and replication within eukaryotic host cells. We recently used yeast two-hybrid (Y2H) screening to identify a small set of novel Burkholderia proteins that were shown to attenuate disease progression in an aerosol infection animal model using the virulent Burkholderia mallei ATCC 23344 strain. Here, we performed an extended analysis of primarily nine B. mallei virulence factors and their interactions with human proteins to map out how the bacteria can influence and alter host processes and pathways. Specifically, we employed topological analyses to assess the connectivity patterns of targeted host proteins, identify modules of pathogen-interacting host proteins linked to processes promoting infectivity, and evaluate the effect of crosstalk among the identified host protein modules. Overall, our analysis showed that the targeted host proteins generally had a large number of interacting partners and interacted with other host proteins that were also targeted by B. mallei proteins. We also introduced a novel Host-Pathogen Interaction Alignment (HPIA) algorithm and used it to explore similarities between host-pathogen interactions of B. mallei, Yersinia pestis, and Salmonella enterica. We inferred putative roles of B. mallei proteins based on the roles of their aligned Y. pestis and S. enterica partners and showed that up to 73% of the predicted roles matched existing annotations. A key insight into Burkholderia pathogenicity derived from these analyses of Y2H host-pathogen interactions is the identification of eukaryotic-specific targeted cellular mechanisms, including the ubiquitination degradation system and the use of the focal adhesion pathway as a fulcrum for transmitting mechanical forces and regulatory signals. This provides the mechanisms to modulate and adapt the host-cell environment for the successful establishment of host infections and intracellular spread. Burkholderia species need to manipulate many host processes and pathways in order to establish a successful intracellular infection in eukaryotic host organisms. Burkholderia mallei uses secreted virulence factor proteins as a means to execute host-pathogen interactions and promote pathogenesis. While validated virulence factor proteins have been shown to attenuate infection in animal models, their actual roles in modifying and influencing host processes are not well understood. Here, we used host-pathogen protein-protein interactions derived from yeast two-hybrid screens to study nine known B. mallei virulence factors and map out potential virulence mechanisms. From the data, we derived both general and specific insights into Burkholderia host-pathogen infectivity pathways. We showed that B. mallei virulence factors tended to target multifunctional host proteins, proteins that interacted with each other, and host proteins with a large number of interacting partners. We also identified similarities between host-pathogen interactions of B. mallei, Yersinia pestis, and Salmonella enterica using a novel host-pathogen interactions alignment algorithm. Importantly, our data are compatible with a framework in which multiple B. mallei virulence factors broadly influence key host processes related to ubiquitin-mediated proteolysis and focal adhesion. This provides B. mallei the means to modulate and adapt the host-cell environment to advance infection.
Collapse
Affiliation(s)
- Vesna Memišević
- Department of Defense Biotechnology High Performance Computing Software Applications Institute, Telemedicine and Advanced Technology Research Center, U.S. Army Medical Research and Materiel Command, Fort Detrick, Maryland, United States of America
| | - Nela Zavaljevski
- Department of Defense Biotechnology High Performance Computing Software Applications Institute, Telemedicine and Advanced Technology Research Center, U.S. Army Medical Research and Materiel Command, Fort Detrick, Maryland, United States of America
| | | | - Keehwan Kwon
- J. Craig Venter Institute, Rockville, Maryland, United States of America
| | - Rembert Pieper
- J. Craig Venter Institute, Rockville, Maryland, United States of America
| | - David DeShazer
- Bacteriology Division, U.S. Army Medical Research Institute of Infectious Diseases, Fort Detrick, Maryland, United States of America
| | - Jaques Reifman
- Department of Defense Biotechnology High Performance Computing Software Applications Institute, Telemedicine and Advanced Technology Research Center, U.S. Army Medical Research and Materiel Command, Fort Detrick, Maryland, United States of America
- * E-mail:
| | - Anders Wallqvist
- Department of Defense Biotechnology High Performance Computing Software Applications Institute, Telemedicine and Advanced Technology Research Center, U.S. Army Medical Research and Materiel Command, Fort Detrick, Maryland, United States of America
| |
Collapse
|
93
|
Malod-Dognin N, Pržulj N. L-GRAAL: Lagrangian graphlet-based network aligner. ACTA ACUST UNITED AC 2015; 31:2182-9. [PMID: 25725498 PMCID: PMC4481854 DOI: 10.1093/bioinformatics/btv130] [Citation(s) in RCA: 69] [Impact Index Per Article: 6.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/29/2014] [Accepted: 02/25/2015] [Indexed: 12/31/2022]
Abstract
Motivation: Discovering and understanding patterns in networks of protein–protein interactions (PPIs) is a central problem in systems biology. Alignments between these networks aid functional understanding as they uncover important information, such as evolutionary conserved pathways, protein complexes and functional orthologs. A few methods have been proposed for global PPI network alignments, but because of NP-completeness of underlying sub-graph isomorphism problem, producing topologically and biologically accurate alignments remains a challenge. Results: We introduce a novel global network alignment tool, Lagrangian GRAphlet-based ALigner (L-GRAAL), which directly optimizes both the protein and the interaction functional conservations, using a novel alignment search heuristic based on integer programming and Lagrangian relaxation. We compare L-GRAAL with the state-of-the-art network aligners on the largest available PPI networks from BioGRID and observe that L-GRAAL uncovers the largest common sub-graphs between the networks, as measured by edge-correctness and symmetric sub-structures scores, which allow transferring more functional information across networks. We assess the biological quality of the protein mappings using the semantic similarity of their Gene Ontology annotations and observe that L-GRAAL best uncovers functionally conserved proteins. Furthermore, we introduce for the first time a measure of the semantic similarity of the mapped interactions and show that L-GRAAL also uncovers best functionally conserved interactions. In addition, we illustrate on the PPI networks of baker's yeast and human the ability of L-GRAAL to predict new PPIs. Finally, L-GRAAL's results are the first to show that topological information is more important than sequence information for uncovering functionally conserved interactions. Availability and implementation: L-GRAAL is coded in C++. Software is available at: http://bio-nets.doc.ic.ac.uk/L-GRAAL/. Contact:n.malod-dognin@imperial.ac.uk Supplementary information:Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
| | - Nataša Pržulj
- Department of Computing, Imperial College London, London, UK
| |
Collapse
|
94
|
Abstract
Motivation: Recently, a shift was made from using Gene Ontology (GO) to evaluate molecular network data to using these data to construct and evaluate GO. Dutkowski et al. provide the first evidence that a large part of GO can be reconstructed solely from topologies of molecular networks. Motivated by this work, we develop a novel data integration framework that integrates multiple types of molecular network data to reconstruct and update GO. We ask how much of GO can be recovered by integrating various molecular interaction data. Results: We introduce a computational framework for integration of various biological networks using penalized non-negative matrix tri-factorization (PNMTF). It takes all network data in a matrix form and performs simultaneous clustering of genes and GO terms, inducing new relations between genes and GO terms (annotations) and between GO terms themselves. To improve the accuracy of our predicted relations, we extend the integration methodology to include additional topological information represented as the similarity in wiring around non-interacting genes. Surprisingly, by integrating topologies of bakers’ yeasts protein–protein interaction, genetic interaction (GI) and co-expression networks, our method reports as related 96% of GO terms that are directly related in GO. The inclusion of the wiring similarity of non-interacting genes contributes 6% to this large GO term association capture. Furthermore, we use our method to infer new relationships between GO terms solely from the topologies of these networks and validate 44% of our predictions in the literature. In addition, our integration method reproduces 48% of cellular component, 41% of molecular function and 41% of biological process GO terms, outperforming the previous method in the former two domains of GO. Finally, we predict new GO annotations of yeast genes and validate our predictions through GIs profiling. Availability and implementation: Supplementary Tables of new GO term associations and predicted gene annotations are available at http://bio-nets.doc.ic.ac.uk/GO-Reconstruction/. Contact:natasha@imperial.ac.uk Supplementary information:Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
| | - Vuk Janjić
- Department of Computing, Imperial College London SW7 2AZ, UK
| | - Nataša Pržulj
- Department of Computing, Imperial College London SW7 2AZ, UK
| |
Collapse
|
95
|
Sun Y, Crawford J, Tang J, Milenković T. Simultaneous Optimization of both Node and Edge Conservation in Network Alignment via WAVE. LECTURE NOTES IN COMPUTER SCIENCE 2015. [DOI: 10.1007/978-3-662-48221-6_2] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/12/2022]
|
96
|
Singh O, Sawariya K, Aparoy P. Graphlet signature-based scoring method to estimate protein-ligand binding affinity. ROYAL SOCIETY OPEN SCIENCE 2014; 1:140306. [PMID: 26064572 PMCID: PMC4448774 DOI: 10.1098/rsos.140306] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 09/15/2014] [Accepted: 10/31/2014] [Indexed: 06/04/2023]
Abstract
Over the years, various computational methodologies have been developed to understand and quantify receptor-ligand interactions. Protein-ligand interactions can also be explained in the form of a network and its properties. The ligand binding at the protein-active site is stabilized by formation of new interactions like hydrogen bond, hydrophobic and ionic. These non-covalent interactions when considered as links cause non-isomorphic sub-graphs in the residue interaction network. This study aims to investigate the relationship between these induced sub-graphs and ligand activity. Graphlet signature-based analysis of networks has been applied in various biological problems; the focus of this work is to analyse protein-ligand interactions in terms of neighbourhood connectivity and to develop a method in which the information from residue interaction networks, i.e. graphlet signatures, can be applied to quantify ligand affinity. A scoring method was developed, which depicts the variability in signatures adopted by different amino acids during inhibitor binding, and was termed as GSUS (graphlet signature uniqueness score). The score is specific for every individual inhibitor. Two well-known drug targets, COX-2 and CA-II and their inhibitors, were considered to assess the method. Residue interaction networks of COX-2 and CA-II with their respective inhibitors were used. Only hydrogen bond network was considered to calculate GSUS and quantify protein-ligand interaction in terms of graphlet signatures. The correlation of the GSUS with pIC50 was consistent in both proteins and better in comparison to the Autodock results. The GSUS scoring method was better in activity prediction of molecules with similar structure and diverse activity and vice versa. This study can be a major platform in developing approaches that can be used alone or together with existing methods to predict ligand affinity from protein-ligand complexes.
Collapse
|
97
|
Li BYS, Zhan C, Yeung LF, Ko KT, Yang G. A low dimensional approach on network characterization. PLoS One 2014; 9:e109383. [PMID: 25329146 PMCID: PMC4199607 DOI: 10.1371/journal.pone.0109383] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/07/2014] [Accepted: 09/02/2014] [Indexed: 12/03/2022] Open
Abstract
In many applications, one may need to characterize a given network among a large set of base networks, and these networks are large in size and diverse in structure over the search space. In addition, the characterization algorithms are required to have low volatility and with a small circle of uncertainty. For large datasets, these algorithms are computationally intensive and inefficient. However, under the context of network mining, a major concern of some applications is speed. Hence, we are motivated to develop a fast characterization algorithm, which can be used to quickly construct a graph space for analysis purpose. Our approach is to transform a network characterization measure, commonly formulated based on similarity matrices, into simple vector form signatures. We shall show that the similarity matrix can be represented by a dyadic product of two N-dimensional signature vectors; thus the network alignment process, which is usually solved as an assignment problem, can be reduced into a simple alignment problem based on separate signature vectors.
Collapse
Affiliation(s)
- Benjamin Y. S. Li
- Department of Electronic Engineering, City University of Hong Kong, Hong Kong, Hong Kong
- * E-mail:
| | - Choujun Zhan
- Department of Electronic and Information Engineering, The Hong Kong Polytechnic University, Hong Kong, Hong Kong
| | - Lam F. Yeung
- Department of Electronic Engineering, City University of Hong Kong, Hong Kong, Hong Kong
| | - King T. Ko
- Department of Electronic Engineering, City University of Hong Kong, Hong Kong, Hong Kong
| | - Genke Yang
- Department of Automation, Shanghai Jiao Tong University, Shanghai, China
| |
Collapse
|
98
|
|
99
|
Hulovatyy Y, Solava RW, Milenković T. Revealing missing parts of the interactome via link prediction. PLoS One 2014; 9:e90073. [PMID: 24594900 PMCID: PMC3940777 DOI: 10.1371/journal.pone.0090073] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/04/2013] [Accepted: 01/29/2014] [Indexed: 12/20/2022] Open
Abstract
Protein interaction networks (PINs) are often used to "learn" new biological function from their topology. Since current PINs are noisy, their computational de-noising via link prediction (LP) could improve the learning accuracy. LP uses the existing PIN topology to predict missing and spurious links. Many of existing LP methods rely on shared immediate neighborhoods of the nodes to be linked. As such, they have limitations. Thus, in order to comprehensively study what are the topological properties of nodes in PINs that dictate whether the nodes should be linked, we introduce novel sensitive LP measures that are expected to overcome the limitations of the existing methods. We systematically evaluate the new and existing LP measures by introducing "synthetic" noise into PINs and measuring how accurate the measures are in reconstructing the original PINs. Also, we use the LP measures to de-noise the original PINs, and we measure biological correctness of the de-noised PINs with respect to functional enrichment of the predicted interactions. Our main findings are: 1) LP measures that favor nodes which are both "topologically similar" and have large shared extended neighborhoods are superior; 2) using more network topology often though not always improves LP accuracy; and 3) LP improves biological correctness of the PINs, plus we validate a significant portion of the predicted interactions in independent, external PIN data sources. Ultimately, we are less focused on identifying a superior method but more on showing that LP improves biological correctness of PINs, which is its ultimate goal in computational biology. But we note that our new methods outperform each of the existing ones with respect to at least one evaluation criterion. Alarmingly, we find that the different criteria often disagree in identifying the best method(s), which has important implications for LP communities in any domain, including social networks.
Collapse
Affiliation(s)
- Yuriy Hulovatyy
- Department of Computer Science and Engineering, ECK Institute for Global Health, and Interdisciplinary Center for Network Science and Applications, University of Notre Dame, Notre Dame, Indiana, United States of America
| | - Ryan W. Solava
- Department of Computer Science and Engineering, ECK Institute for Global Health, and Interdisciplinary Center for Network Science and Applications, University of Notre Dame, Notre Dame, Indiana, United States of America
| | - Tijana Milenković
- Department of Computer Science and Engineering, ECK Institute for Global Health, and Interdisciplinary Center for Network Science and Applications, University of Notre Dame, Notre Dame, Indiana, United States of America
- * E-mail:
| |
Collapse
|
100
|
Wang XD, Huang JL, Yang L, Wei DQ, Qi YX, Jiang ZL. Identification of human disease genes from interactome network using graphlet interaction. PLoS One 2014; 9:e86142. [PMID: 24465923 PMCID: PMC3899204 DOI: 10.1371/journal.pone.0086142] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/03/2013] [Accepted: 12/05/2013] [Indexed: 11/18/2022] Open
Abstract
Identifying genes related to human diseases, such as cancer and cardiovascular disease, etc., is an important task in biomedical research because of its applications in disease diagnosis and treatment. Interactome networks, especially protein-protein interaction networks, had been used to disease genes identification based on the hypothesis that strong candidate genes tend to closely relate to each other in some kinds of measure on the network. We proposed a new measure to analyze the relationship between network nodes which was called graphlet interaction. The graphlet interaction contained 28 different isomers. The results showed that the numbers of the graphlet interaction isomers between disease genes in interactome networks were significantly larger than random picked genes, while graphlet signatures were not. Then, we designed a new type of score, based on the network properties, to identify disease genes using graphlet interaction. The genes with higher scores were more likely to be disease genes, and all candidate genes were ranked according to their scores. Then the approach was evaluated by leave-one-out cross-validation. The precision of the current approach achieved 90% at about 10% recall, which was apparently higher than the previous three predominant algorithms, random walk, Endeavour and neighborhood based method. Finally, the approach was applied to predict new disease genes related to 4 common diseases, most of which were identified by other independent experimental researches. In conclusion, we demonstrate that the graphlet interaction is an effective tool to analyze the network properties of disease genes, and the scores calculated by graphlet interaction is more precise in identifying disease genes.
Collapse
Affiliation(s)
- Xiao-Dong Wang
- Institute of Mechanobiology and Medical Engineering, School of Life Sciences & Biotechnology, Shanghai Jiao Tong University, Shanghai, China
| | - Jia-Liang Huang
- Bioinformatics, Integrated Platform Science, GlaxoSmithKline Research and Development China, Shanghai, China
| | - Lun Yang
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Shanghai Jiao Tong University, Shanghai, China
| | - Dong-Qing Wei
- State Key Laboratory of Microbial Metabolism, School of Life Sciences & Biotechnology, Shanghai Jiao Tong University, Shanghai, China
| | - Ying-Xin Qi
- Institute of Mechanobiology and Medical Engineering, School of Life Sciences & Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- * E-mail:
| | - Zong-Lai Jiang
- Institute of Mechanobiology and Medical Engineering, School of Life Sciences & Biotechnology, Shanghai Jiao Tong University, Shanghai, China
| |
Collapse
|