2
Wang B, Huang L, Zhu Y, Kundaje A, Batzoglou S, Goldenberg A. Vicus: Exploiting local structures to improve network-based analysis of biological data. PLoS Comput Biol 2017; 13:e1005621. [PMID: 29023470 PMCID: PMC5638230 DOI: 10.1371/journal.pcbi.1005621] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 01/13/2017] [Accepted: 06/09/2017] [Indexed: 01/09/2023] Open
Abstract
Biological networks entail important topological features and patterns critical to understanding interactions within complicated biological systems. Despite great progress in understanding their structure, much more can be done to improve our inference and network analysis. Spectral methods play a key role in many network-based applications. Fundamental to spectral methods is the Laplacian, a matrix that captures the global structure of the network. Unfortunately, the Laplacian does not take into account intricacies of the network’s local structure and is sensitive to noise in the network. Local structure and noise are both fundamental features of biological networks and cannot be ignored. We propose an alternative matrix, Vicus. The Vicus matrix captures the local neighborhood structure of the network and thus is more effective at modeling biological interactions. We demonstrate the advantages of Vicus in the context of spectral methods by extensive empirical benchmarking on tasks such as single cell dimensionality reduction, protein module discovery and ranking genes for cancer subtyping. Our experiments show that, using Vicus, spectral methods achieve more accurate and robust performance in all of these tasks.
Networks are a representation of choice for many problems in biology and medicine, including protein interactions, metabolic pathways, evolutionary biology, cancer subtyping and disease modeling, to name a few. The key to much of network analysis lies in the spectral decomposition represented by the eigenvectors of the network Laplacian. While possessing many desirable algebraic properties, the Laplacian lacks the power to capture the fine-grained structure of the underlying network. Our novel matrix, Vicus, introduced in this work, takes advantage of the local structure of the network while preserving the algebraic properties of the Laplacian. We show that using Vicus in spectral methods leads to superior performance across fundamental biological tasks such as dimensionality reduction in single cell analysis, identifying genes for cancer subtyping and identifying protein modules in a PPI network. We postulate that, in tasks where it is important to take into account local network information, spectral-based methods should use the Vicus matrix in place of the Laplacian.
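The Laplacian-based spectral embedding that this abstract takes as its baseline can be illustrated on a toy graph. The following is a minimal sketch, assuming the unnormalized Laplacian L = D - A and a hypothetical helper `two_cliques` that builds the example network. It does not implement the Vicus matrix, whose construction is not given in the abstract.

```python
import numpy as np


def laplacian(adjacency):
    """Unnormalized graph Laplacian L = D - A for a symmetric adjacency matrix."""
    degrees = adjacency.sum(axis=1)
    return np.diag(degrees) - adjacency


def spectral_embedding(adjacency, k):
    """Return the k smallest non-trivial eigenvalues and their eigenvectors.

    np.linalg.eigh returns eigenvalues in ascending order; the first one is
    (numerically) zero for a connected graph and is skipped.
    """
    eigenvalues, eigenvectors = np.linalg.eigh(laplacian(adjacency))
    return eigenvalues[1:k + 1], eigenvectors[:, 1:k + 1]


def two_cliques(n):
    """Hypothetical toy network: two n-node cliques joined by a single edge."""
    a = np.zeros((2 * n, 2 * n))
    a[:n, :n] = 1.0
    a[n:, n:] = 1.0
    np.fill_diagonal(a, 0.0)
    a[n - 1, n] = a[n, n - 1] = 1.0  # the bridge between the two cliques
    return a


adjacency = two_cliques(4)
values, vectors = spectral_embedding(adjacency, k=1)
fiedler = vectors[:, 0]
# The sign of the first non-trivial eigenvector splits the graph in two.
labels = (fiedler > 0).astype(int)
```

Because the embedding depends only on the eigenvectors of the matrix passed in, any alternative matrix with the same algebraic properties could in principle be substituted in `spectral_embedding` without changing the rest of the pipeline.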
Affiliation(s)
- Bo Wang
  - Department of Computer Science, Stanford University, Stanford, California, United States of America
- Lin Huang
  - Department of Computer Science, Stanford University, Stanford, California, United States of America
- Yuke Zhu
  - Department of Computer Science, Stanford University, Stanford, California, United States of America
- Anshul Kundaje
  - Department of Computer Science, Stanford University, Stanford, California, United States of America
  - Genetics Department, Stanford University, Stanford, California, United States of America
- Serafim Batzoglou
  - Department of Computer Science, Stanford University, Stanford, California, United States of America
- Anna Goldenberg
  - SickKids Research Institute, Toronto, Ontario, Canada
  - Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
3
Bootstrap quantification of estimation uncertainties in network degree distributions. Sci Rep 2017; 7:5807. [PMID: 28724937 PMCID: PMC5517433 DOI: 10.1038/s41598-017-05885-x] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 02/15/2017] [Accepted: 06/05/2017] [Indexed: 11/21/2022] Open
Abstract
We propose a new nonparametric bootstrap method to quantify estimation uncertainties in functions of the network degree distribution in large ultra sparse networks. Both the network degree distribution and the network order are assumed to be unknown. The key idea is to adapt the “blocking” argument, developed for bootstrapping of time series and re-tiling of spatial data, to random networks. We first sample a set of multiple ego networks of varying orders that form a patch, or a network block analogue, and then resample the data within patches. To select an optimal patch size, we develop a new computationally efficient and data-driven cross-validation algorithm. The proposed fast patchwork bootstrap (FPB) methodology extends ideas developed for the case of the network mean degree to inference on a degree distribution. In addition, the FPB is substantially less computationally expensive, requires less information on a graph, and is free from nuisance parameters. In our simulation study, we show that the new bootstrap method outperforms competing approaches, providing sharper and better-calibrated confidence intervals for functions of a network degree distribution, including in the ultra sparse regime. We illustrate the FPB with applications to collaboration networks in statistics and computer science and to Wikipedia networks.
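The sample-then-resample idea described in this abstract can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the FPB procedure itself: the function names, the toy random graph, and the choice to resample both patches and the degrees within them are all hypothetical, and the cross-validated choice of patch size is omitted. Note also that nodes reached through edges are observed with probability that grows with their degree, so this naive pooled mean tends to overestimate the mean degree; no correction for that bias is attempted here.

```python
import random
from collections import deque


def random_sparse_graph(n, mean_degree, rng):
    """Toy Erdos-Renyi-style graph as an adjacency dict (illustration only)."""
    p = mean_degree / (n - 1)
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj


def ego_patch(adj, seed, waves):
    """Nodes within `waves` hops of `seed`, found by breadth-first search."""
    seen = {seed}
    frontier = deque([(seed, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == waves:
            continue
        for neighbor in adj[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return seen


def patch_bootstrap_mean_degree(adj, n_seeds, waves, n_boot, rng, alpha=0.05):
    """Percentile interval for the mean degree from ego-network patches."""
    seeds = rng.sample(sorted(adj), n_seeds)
    # Each patch stores the observed degrees of the nodes it contains.
    patches = [[len(adj[v]) for v in ego_patch(adj, s, waves)] for s in seeds]
    estimates = []
    for _ in range(n_boot):
        pooled = []
        # Resample patches with replacement, then degrees within each patch.
        for patch in rng.choices(patches, k=len(patches)):
            pooled.extend(rng.choices(patch, k=len(patch)))
        estimates.append(sum(pooled) / len(pooled))
    estimates.sort()
    lower = estimates[int(alpha / 2 * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


rng = random.Random(0)
graph = random_sparse_graph(300, 2.0, rng)
low, high = patch_bootstrap_mean_degree(
    graph, n_seeds=20, waves=1, n_boot=200, rng=rng
)
```

Only the degrees of nodes inside the sampled patches are used, which mirrors the abstract's point that the approach needs less information about the graph than methods requiring the full network.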
4
Fushing H, Hsueh CH, Heitkamp C, Matthews MA, Koehl P. Unravelling the geometry of data matrices: effects of water stress regimes on winemaking. J R Soc Interface 2016; 12:20150753. [PMID: 26468072 DOI: 10.1098/rsif.2015.0753] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/12/2022] Open
Abstract
A new method is proposed for unravelling the patterns between a set of experiments and the features that characterize those experiments. The aims are to extract these patterns in the form of a coupling between the rows and columns of the corresponding data matrix and to use this geometry as a support for model testing. These aims are reached through two key steps, namely application of an iterative geometric approach to couple the metric spaces associated with the rows and columns, and use of statistical physics to generate matrices that mimic the original data while maintaining their inherent structure, thereby providing the basis for hypothesis testing and statistical inference. The power of this new method is illustrated with a study of the impact of water stress conditions on the attributes of 'Cabernet Sauvignon' grapes, juice, wine and bottled wine from two vintages. The first step, named data mechanics, deconvolutes the intrinsic effects of grape berries and wine attributes due to the experimental irrigation conditions from the extrinsic effects of the environment. The second step provides an analysis of the associations of some attributes of the bottled wine with characteristics of either the matured grape berries or the resulting juice, thereby identifying statistically significant associations between the juice pH, yeast assimilable nitrogen, and sugar content and the bottled wine alcohol level.
Affiliation(s)
- Hsieh Fushing
  - Department of Statistics, University of California, Davis, CA 95616, USA
- Chih-Hsin Hsueh
  - Department of Statistics, University of California, Davis, CA 95616, USA
- Constantin Heitkamp
  - Department of Viticulture and Enology, University of California, Davis, CA 95616, USA
- Mark A Matthews
  - Department of Viticulture and Enology, University of California, Davis, CA 95616, USA
- Patrice Koehl
  - Department of Computer Science and Genome Center, University of California, Davis, CA 95616, USA
5
Jiao QJ, Huang Y, Shen HB. A new multi-scale method to reveal hierarchical modular structures in biological networks. Mol Biosyst 2016; 12:3724-3733. [DOI: 10.1039/c6mb00617e] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/21/2022]
Abstract
Biological networks are effective tools for studying molecular interactions.
Affiliation(s)
- Qing-Ju Jiao
  - School of Computer and Information Engineering, Anyang Normal University, Anyang 455002, China
  - Institute of Image Processing and Pattern Recognition
- Yan Huang
  - National Laboratory for Infrared Physics, Shanghai Institute of Technical Physics, Chinese Academy of Science, Shanghai 200083, China
- Hong-Bin Shen
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China
6
Thompson ME, Ramirez Ramirez LL, Lyubchich V, Gel YR. Using the bootstrap for statistical inference on random graphs. Can J Stat 2015. [DOI: 10.1002/cjs.11271] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Indexed: 11/10/2022]
Affiliation(s)
- Lilia L. Ramirez Ramirez
  - Instituto Tecnologico Autonomo de Mexico (ITAM) and Centro de Investigacion en Matematicas (CIMAT), Mexico
7
De Vico Fallani F, Richiardi J, Chavez M, Achard S. Graph analysis of functional brain networks: practical issues in translational neuroscience. Philos Trans R Soc Lond B Biol Sci 2015; 369:rstb.2013.0521. [PMID: 25180301 DOI: 10.1098/rstb.2013.0521] [Citation(s) in RCA: 214] [Impact Index Per Article: 21.4] [Indexed: 12/27/2022] Open
Abstract
The brain can be regarded as a network: a connected system where nodes, or units, represent different specialized regions and links, or connections, represent communication pathways. From a functional perspective, communication is coded by temporal dependence between the activities of different brain areas. In the last decade, the abstract representation of the brain as a graph has made it possible to visualize functional brain networks and describe their non-trivial topological properties in a compact and objective way. Today, graph analysis has become essential in translational neuroscience for quantifying brain dysfunctions in terms of aberrant reconfiguration of functional brain networks. Despite its evident impact, graph analysis of functional brain networks is not a simple toolbox that can be blindly applied to brain signals. On the one hand, it requires know-how of all the methodological steps of the pipeline that manipulate the input brain signals and extract the functional network properties. On the other hand, knowledge of the neural phenomenon under study is required to perform physiologically relevant analysis. The aim of this review is to provide practical guidance for making sense of brain network analysis and for counteracting counterproductive attitudes.
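The step from regional time series to a graph, which this abstract describes in words, can be sketched as follows. This is a minimal illustration assuming Pearson correlation as the measure of temporal dependence and a fixed absolute threshold. Both are common choices but are assumptions here, since the abstract does not prescribe them, and the signals are synthetic.

```python
import numpy as np


def functional_network(signals, threshold):
    """Binary adjacency matrix from regional time series.

    `signals` has one row per region and one column per time point. Two
    regions are linked when the absolute Pearson correlation of their
    signals reaches `threshold`.
    """
    correlation = np.corrcoef(signals)
    adjacency = (np.abs(correlation) >= threshold).astype(int)
    np.fill_diagonal(adjacency, 0)  # no self-loops
    return adjacency


# Synthetic example: regions 0-1 share one driver, regions 2-3 another.
rng = np.random.default_rng(0)
driver_a = rng.standard_normal(500)
driver_b = rng.standard_normal(500)
noise = 0.3 * rng.standard_normal((4, 500))
signals = np.vstack([driver_a, driver_a, driver_b, driver_b]) + noise

adjacency = functional_network(signals, threshold=0.5)
degrees = adjacency.sum(axis=1)  # a simple node-level graph metric
```

The threshold is exactly the kind of methodological choice the review warns about: changing it alters the density of the resulting network and therefore every graph metric computed downstream.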
Affiliation(s)
- Fabrizio De Vico Fallani
  - INRIA Paris-Rocquencourt, ARAMIS team, Paris, France
  - CNRS, UMR-7225, Paris, France
  - INSERM, U1227, Paris, France
  - Institut du Cerveau et de la Moelle épinière, Paris, France
  - Univ. Sorbonne UPMC, UMR S1127, Paris, France
- Jonas Richiardi
  - Functional Imaging in Neuropsychiatric Disorders Laboratory, Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, USA
  - Laboratory for Neuroimaging and Cognition, Department of Neurology and Department of Neurosciences, University of Geneva, Geneva, Switzerland
- Sophie Achard
  - Univ. Grenoble Alpes, GIPSA-Lab, F-38000 Grenoble, France
  - CNRS, GIPSA-Lab, F-38000 Grenoble, France
8
Fushing H, Chen C, Liu SY, Koehl P. Bootstrapping on undirected binary networks via statistical mechanics. J Stat Phys 2014; 156:823-842. [PMID: 25071295 PMCID: PMC4111278 DOI: 10.1007/s10955-014-1043-6] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 06/03/2023]
Abstract
We propose a new method inspired by statistical mechanics for extracting geometric information from undirected binary networks and generating random networks that conform to this geometry. In this method an undirected binary network is perceived as a thermodynamic system with a collection of permuted adjacency matrices as its states. The task of extracting information from the network is then reformulated as a discrete combinatorial optimization problem of searching for its ground state. To solve this problem, we apply multiple ensembles of temperature-regulated Markov chains to establish an ultrametric geometry on the network. This geometry is equipped with a tree hierarchy that captures the multiscale community structure of the network. We translate this geometry into a Parisi adjacency matrix, which has a relatively low energy level and is in the vicinity of the ground state. The Parisi adjacency matrix is then further optimized by making block permutations subject to the ultrametric geometry. The optimal matrix corresponds to the macrostate of the original network. An ensemble of random networks is then generated such that each of these networks conforms to this macrostate; the corresponding algorithm also provides an estimate of the size of this ensemble. By repeating this procedure at different scales of the ultrametric geometry of the network, it is possible to compute its evolution entropy, i.e. to estimate the evolution of its complexity as we move from a coarse to a fine description of its geometric structure. We demonstrate the performance of this method on simulated as well as real data networks.
Affiliation(s)
- Hsieh Fushing
  - Department of Statistics, University of California, Davis, 1 Shields Ave, Davis, CA 95616
- Chen Chen
  - Department of Statistics, University of California, Davis, 1 Shields Ave, Davis, CA 95616
- Shan-Yu Liu
  - Department of Statistics, University of California, Davis, 1 Shields Ave, Davis, CA 95616
- Patrice Koehl
  - Department of Computer Science, University of California, Davis, 1 Shields Ave, Davis, CA 95616