1
Zhang J, Ren H, Jiang Z, Chen Z, Yang Z, Matsubara Y, Sakurai Y. Strategic Multi-Omics Data Integration via Multi-Level Feature Contrasting and Matching. IEEE Trans Nanobioscience 2024; 23:579-590. [PMID: 39255078 DOI: 10.1109/tnb.2024.3456797] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/12/2024]
Abstract
The analysis and comprehension of multi-omics data have emerged as a prominent topic in the field of bioinformatics and data science. However, the sparsity characteristics and high dimensionality of omics data pose difficulties in terms of extracting meaningful information. Moreover, the heterogeneity inherent in multiple omics sources makes the effective integration of multi-omics data challenging. To tackle these challenges, we propose MFCC-SAtt, a multi-level feature contrast clustering model based on self-attention to extract informative features from multi-omics data. MFCC-SAtt treats each omics type as a distinct modality and employs autoencoders with self-attention for each modality to integrate and compress their respective features into a shared feature space. By utilizing a multi-level feature extraction framework along with incorporating a semantic information extractor, we mitigate optimization conflicts arising from different learning objectives. Additionally, MFCC-SAtt guides deep clustering based on multi-level features, which further enhances the quality of output labels. By conducting extensive experiments on multi-omics data, we have validated the exceptional performance of MFCC-SAtt. For instance, in a pan-cancer clustering task, MFCC-SAtt achieved an accuracy of over 80.38%.
2
Liu Y, Zhang Y, Chang X, Liu X. MDIC3: Matrix decomposition to infer cell-cell communication. Patterns (New York, N.Y.) 2024; 5:100911. [PMID: 38370122 PMCID: PMC10873161 DOI: 10.1016/j.patter.2023.100911] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 12/12/2022] [Revised: 05/31/2023] [Accepted: 12/08/2023] [Indexed: 02/20/2024]
Abstract
Crosstalk among cells is vital for maintaining the biological function and intactness of systems. Most existing methods for investigating cell-cell communications are based on ligand-receptor (L-R) expression, and they focus on the study between two cells. Thus, the final communication inference results are particularly sensitive to the completeness and accuracy of the prior biological knowledge. Because existing L-R research focuses mainly on humans, most existing methods can only examine cell-cell communication for humans. As far as we know, there is currently no effective method to overcome this species limitation. Here, we propose MDIC3 (matrix decomposition to infer cell-cell communication), an unsupervised tool to investigate cell-cell communication in any species, and the results are not limited by specific L-R pairs or signaling pathways. By comparing it with existing methods for the inference of cell-cell communication, MDIC3 obtained better performance in both humans and mice.
Affiliation(s)
- Yi Liu
  - Key Laboratory of Systems Health Science of Zhejiang Province, School of Life Science, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou 310024, China
  - School of Mathematics and Statistics, Shandong University, Weihai 364209, China
- Yuelei Zhang
  - School of Mathematics and Statistics, Shandong University, Weihai 364209, China
- Xiao Chang
  - Institute of Statistics and Applied Mathematics, Anhui University of Finance and Economics, Bengbu 233030, China
- Xiaoping Liu
  - Key Laboratory of Systems Health Science of Zhejiang Province, School of Life Science, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou 310024, China
3
Inference of Networks from Large Datasets. Systems Medicine 2021. [DOI: 10.1016/b978-0-12-801238-3.11345-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
4
Hu J, Qin H, Fan X. Can ODE gene regulatory models neglect time lag or measurement scaling? Bioinformatics 2020; 36:4058-4064. [PMID: 32324854 DOI: 10.1093/bioinformatics/btaa268] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/03/2019] [Revised: 04/14/2020] [Accepted: 04/16/2020] [Indexed: 11/13/2022]
Abstract
MOTIVATION: Many ordinary differential equation (ODE) models have been introduced to replace linear regression models for inferring gene regulatory relationships from time-course gene expression data. However, because the observed data are usually not direct measurements of the gene products, or because there is an unknown time lag in gene regulation, directly applying traditional ODE models or linear regression models is problematic. RESULTS: We introduce a lagged ODE model to infer lagged gene regulatory relationships from time-course measurements, which are modeled as a linear transformation of the gene products. A time-course microarray dataset from a yeast cell-cycle study is used for simulation assessment of the methods and real data analysis. The results show that our method, by considering both time lag and measurement scaling, performs much better than other linear and ODE models. This indicates the necessity of explicitly modeling the time lag and measurement scaling in ODE gene regulatory models. AVAILABILITY AND IMPLEMENTATION: R code is available at https://www.sta.cuhk.edu.hk/xfan/share/lagODE.zip.
Affiliation(s)
- Jie Hu
  - Department of Probability and Statistics, School of Mathematical Science, Xiamen University, Xiamen, Fujian, China
- Huihui Qin
  - Department of Applied Mathematics, Hong Kong Polytechnic University, Hong Kong SAR, China
- Xiaodan Fan
  - Department of Statistics, The Chinese University of Hong Kong, Hong Kong SAR, China
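To make the time-lag and measurement-scaling issue raised in the Hu, Qin and Fan abstract concrete, here is a minimal sketch in Python. It is not the authors' lagged ODE method (their R code is linked in the abstract); it is a simple illustrative grid search that assumes one regulator, one target, evenly spaced samples, and a lag that is a whole number of sampling steps. The function name `estimate_lag` and all parameters are hypothetical.

```python
import numpy as np

def estimate_lag(t, regulator, target, max_lag_steps):
    """Grid-search the lag (in sampling steps) for which d(target)/dt is best
    explained as a linear function of the lagged regulator.

    Illustrative assumption: d(target)/dt = slope * regulator(t - lag) + intercept.
    Returns (best_lag, slope, intercept).
    """
    n = len(t)
    dt = t[1] - t[0]
    dydt = np.gradient(target, dt)  # finite-difference derivative of the target
    # Use the same evaluation window for every lag so the fits are comparable;
    # the last point is dropped because its derivative is one-sided.
    y = dydt[max_lag_steps : n - 1]
    best = None
    for lag in range(max_lag_steps + 1):
        x = regulator[max_lag_steps - lag : n - 1 - lag]
        design = np.column_stack([x, np.ones_like(x)])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        rss = float(np.sum((design @ coef - y) ** 2))
        if best is None or rss < best[0]:
            best = (rss, lag, coef)
    _, lag, coef = best
    return lag, float(coef[0]), float(coef[1])
```

Note that if the target is observed only through an unknown linear transformation (observed = c * true + d), the fitted slope estimates c times the regulatory strength, so the two cannot be separated by this naive fit. That confounding is one reason explicit modeling of measurement scaling matters.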
5
Xia Y. Correlation and association analyses in microbiome study integrating multiomics in health and disease. Progress in Molecular Biology and Translational Science 2020; 171:309-491. [PMID: 32475527 DOI: 10.1016/bs.pmbts.2020.04.003] [Citation(s) in RCA: 37] [Impact Index Per Article: 7.4] [Indexed: 02/07/2023]
Abstract
Correlation and association analyses are among the most widely used statistical methods in research fields, including microbiome and integrative multiomics studies. Correlation and association have two implications: dependence and co-occurrence. Microbiome data are structured as a phylogenetic tree and have several unique characteristics, including high dimensionality, compositionality, sparsity with excess zeros, and heterogeneity. These unique characteristics cause several statistical issues when analyzing microbiome data and integrating multiomics data, such as large p and small n, dependency, overdispersion, and zero-inflation. In microbiome research, on the one hand, classic correlation and association methods are still applied in real studies and used for the development of new methods; on the other hand, new methods have been developed to target statistical issues arising from unique characteristics of microbiome data. Here, we first provide a comprehensive view of classic and newly developed univariate correlation and association-based methods. We discuss the appropriateness and limitations of using classic methods and demonstrate how the newly developed methods mitigate the issues of microbiome data. Second, we emphasize that concepts of correlation and association analyses have been shifted by introducing network analysis, microbe-metabolite interactions, functional analysis, etc. Third, we introduce multivariate correlation and association-based methods, which are organized by the categories of exploratory, interpretive, and discriminatory analyses and classification methods. Fourth, we focus on the hypothesis testing of univariate and multivariate regression-based association methods, including alpha and beta diversities-based, count-based, and relative abundance (or compositional)-based association analyses. We demonstrate the characteristics and limitations of each approach. Fifth, we introduce two specific microbiome-based methods: phylogenetic tree-based association analysis and testing for survival outcomes. Sixth, we provide an overall view of longitudinal methods in the analysis of microbiome and omics data, which cover standard, static, regression-based time series methods, principal trend analysis, and newly developed univariate overdispersed and zero-inflated as well as multivariate distance/kernel-based longitudinal models. Finally, we comment on current association analysis and the future direction of association analysis in microbiome and multiomics studies.
Affiliation(s)
- Yinglin Xia
  - Department of Medicine, University of Illinois at Chicago, Chicago, IL, United States
6
Carey M, Ramírez JC, Wu S, Wu H. A big data pipeline: Identifying dynamic gene regulatory networks from time-course Gene Expression Omnibus data with applications to influenza infection. Stat Methods Med Res 2019; 27:1930-1955. [PMID: 29846143 DOI: 10.1177/0962280217746719] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 01/08/2023]
Abstract
A biological host response to an external stimulus or intervention such as a disease or infection is a dynamic process, which is regulated by an intricate network of many genes and their products. Understanding the dynamics of this gene regulatory network allows us to infer the mechanisms involved in a host response to an external stimulus, and hence aids the discovery of biomarkers of phenotype and biological function. In this article, we propose a modeling/analysis pipeline for dynamic gene expression data, called Pipeline4DGEData, which consists of a series of statistical modeling techniques to construct dynamic gene regulatory networks from the large volumes of high-dimensional time-course gene expression data that are freely available in the Gene Expression Omnibus repository. This pipeline has a consistent and scalable structure that allows it to simultaneously analyze a large number of time-course gene expression data sets, and then integrate the results across different studies. We apply the proposed pipeline to influenza infection data from nine studies and demonstrate that interesting biological findings can be discovered with its implementation.
Affiliation(s)
- Michelle Carey
  - School of Mathematics and Statistics, University College Dublin, Dublin, Ireland
- Juan Camilo Ramírez
  - Department of Biostatistics, School of Public Health, University of Texas Health Science Center at Houston, Houston, TX, USA
- Hulin Wu
  - Department of Biostatistics, School of Public Health, University of Texas Health Science Center at Houston, Houston, TX, USA
7
Liang Y, Kelemen A. Dynamic modeling and network approaches for omics time course data: overview of computational approaches and applications. Brief Bioinform 2019; 19:1051-1068. [PMID: 28430854 DOI: 10.1093/bib/bbx036] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Received: 10/21/2016] [Indexed: 12/23/2022]
Abstract
Inferring networks and dynamics of genes, proteins, cells and other biological entities from high-throughput biological omics data is a central and challenging issue in computational and systems biology. This is essential for understanding the complexity of human health, disease susceptibility and pathogenesis for Predictive, Preventive, Personalized and Participatory (P4) system and precision medicine. The delineation of the possible interactions of all genes/proteins in a genome/proteome is a task for which conventional experimental techniques are ill suited. Urgently needed are rapid and inexpensive computational and statistical methods that can identify interacting candidate disease genes or drug targets out of thousands that can be further investigated or validated by experimentations. Moreover, identifying biological dynamic systems, and simultaneously estimating the important kinetic structural and functional parameters, which may not be experimentally accessible could be important directions for drug-disease-gene network studies. In this article, we present an overview and comparison of recent developments of dynamic modeling and network approaches for time-course omics data, and their applications to various biological systems, health conditions and disease statuses. Moreover, various data reduction and analytical schemes ranging from mathematical to computational to statistical methods are compared including their merits, drawbacks and limitations. The most recent software, associated web resources and other potentials for the compared methods are also presented and discussed in detail.
Affiliation(s)
- Yulan Liang
  - Department of Family and Community Health, University of Maryland, Baltimore, MD, USA
- Arpad Kelemen
  - Department of Family and Community Health, University of Maryland, Baltimore, MD, USA
8
Wu L, Qiu X, Yuan YX, Wu H. Parameter Estimation and Variable Selection for Big Systems of Linear Ordinary Differential Equations: A Matrix-Based Approach. J Am Stat Assoc 2019; 114:657-667. [PMID: 34385718 DOI: 10.1080/01621459.2017.1423074] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 10/18/2022]
Abstract
Ordinary differential equations (ODEs) are widely used to model the dynamic behavior of a complex system. Parameter estimation and variable selection for a "Big System" with linear ODEs are very challenging due to the need of nonlinear optimization in an ultra-high dimensional parameter space. In this article, we develop a parameter estimation and variable selection method based on the ideas of similarity transformation and separable least squares (SLS). Simulation studies demonstrate that the proposed matrix-based SLS method could be used to estimate the coefficient matrix more accurately and perform variable selection for a linear ODE system with thousands of dimensions and millions of parameters much better than the direct least squares (LS) method and the vector-based two-stage method that are currently available. We applied this new method to two real data sets: a yeast cell cycle gene expression data set with 30 dimensions and 930 unknown parameters and the Standard & Poor 1500 index stock price data with 1250 dimensions and 1,563,750 unknown parameters, to illustrate the utility and numerical performance of the proposed parameter estimation and variable selection method for big systems in practice.
Affiliation(s)
- Leqin Wu
  - Department of Mathematics, Jinan University, Guangzhou, China
- Xing Qiu
  - Department of Biostatistics and Computational Biology, University of Rochester, Rochester, New York, U.S.A
- Ya-Xiang Yuan
  - Academy of Mathematics and System Sciences, Chinese Academy of Sciences, Beijing, China
- Hulin Wu
  - Department of Biostatistics, University of Texas Health Science Center at Houston, Houston, TX, U.S.A
9
Xue H, Wu S, Wu Y, Idarraga JCR, Wu H. Independence screening for high dimensional nonlinear additive ODE models with applications to dynamic gene regulatory networks. Stat Med 2018; 37:2630-2644. [PMID: 29722041 PMCID: PMC6940146 DOI: 10.1002/sim.7669] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 08/08/2016] [Revised: 01/18/2018] [Accepted: 03/08/2018] [Indexed: 11/12/2022]
Abstract
Mechanism-driven low-dimensional ordinary differential equation (ODE) models are often used to model viral dynamics at cellular levels and epidemics of infectious diseases. However, low-dimensional mechanism-based ODE models are limited for modeling infectious diseases at molecular levels such as transcriptomic or proteomic levels, which is critical to understand pathogenesis of diseases. Although linear ODE models have been proposed for gene regulatory networks (GRNs), nonlinear regulations are common in GRNs. The reconstruction of large-scale nonlinear networks from time-course gene expression data remains an unresolved issue. Here, we use high-dimensional nonlinear additive ODEs to model GRNs and propose a 4-step procedure to efficiently perform variable selection for nonlinear ODEs. To tackle the challenge of high dimensionality, we couple the 2-stage smoothing-based estimation method for ODEs and a nonlinear independence screening method to perform variable selection for the nonlinear ODE models. We have shown that our method possesses the sure screening property and it can handle problems with non-polynomial dimensionality. Numerical performance of the proposed method is illustrated with simulated data and a real data example for identifying the dynamic GRN of Saccharomyces cerevisiae.
Affiliation(s)
- Hongqi Xue
  - iCardiac Technologies, 150 Allens Creek Road, Rochester, NY 14618, USA
- Shuang Wu
  - Biogen, 300 Binney Street, Cambridge, MA 02142, USA
- Yichao Wu
  - Department of Mathematics, Statistics and Computer Science, University of Illinois at Chicago, Chicago, IL 60607-7045, USA
- Hulin Wu
  - Department of Biostatistics and Data Science, School of Public Health, University of Texas Health Science Center at Houston, 1200 Pressler Street, RAS E833, Houston, TX 77030, USA
10
Anand R, Sarmah DT, Chatterjee S. Extracting proteins involved in disease progression using temporally connected networks. BMC Systems Biology 2018; 12:78. [PMID: 30045727 PMCID: PMC6060549 DOI: 10.1186/s12918-018-0600-z] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 10/06/2017] [Accepted: 07/09/2018] [Indexed: 12/13/2022]
Abstract
BACKGROUND: Metabolic disorders such as obesity and diabetes are diseases that develop gradually over time in an individual through the perturbation of genes. Systematic experiments tracking disease progression at the gene level are usually conducted, yielding temporal microarray data. There is a need to develop methods that analyze such complex data and extract important proteins that could be involved in the temporal progression of the data and hence the progression of the disease. RESULTS: In the present study, we considered temporal microarray data from an experiment conducted to study the development of obesity and diabetes in mice. We used these data along with an available protein-protein interaction network to find a network of interactions between proteins that reproduces the data at the next time point from the data at the previous time point. We show that the resulting network can be mined to identify critical nodes involved in the temporal progression of perturbations. We further show that published algorithms can be applied to such a connected network to mine important proteins, and we show an overlap between the outputs of published algorithms and our own. The importance of the identified set of proteins is supported by the literature and was further validated by comparison with the positive genes dataset from the OMIM database, which shows significant overlap. CONCLUSIONS: The critical proteins identified by the algorithms can be hypothesized to play an important role in the temporal progression of the data.
Affiliation(s)
- Rajat Anand
  - Drug Discovery Research Centre, Translational Health Science and Technology Institute, NCR Biotech science cluster, 3rd milestone, Faridabad-Gurgaon Expressway, Faridabad, 121001, India
- Dipanka Tanu Sarmah
  - Drug Discovery Research Centre, Translational Health Science and Technology Institute, NCR Biotech science cluster, 3rd milestone, Faridabad-Gurgaon Expressway, Faridabad, 121001, India
- Samrat Chatterjee
  - Drug Discovery Research Centre, Translational Health Science and Technology Institute, NCR Biotech science cluster, 3rd milestone, Faridabad-Gurgaon Expressway, Faridabad, 121001, India
11
Avelino PP, Bazeia D, Losano L, Menezes J, de Oliveira BF, Santos MA. How directional mobility affects coexistence in rock-paper-scissors models. Phys Rev E 2018; 97:032415. [PMID: 29776155 DOI: 10.1103/physreve.97.032415] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Received: 08/28/2017] [Indexed: 11/07/2022]
Abstract
This work deals with a system of three distinct species that changes in time under the presence of mobility, selection, and reproduction, as in the popular rock-paper-scissors game. The novelty of the current study is the modification of the mobility rule to the case of directional mobility, in which the species move in the direction associated with a larger (averaged) number density of selection targets in the surrounding neighborhood. Directional mobility can be used to simulate eyes that see or a nose that smells, and we show how it may help reduce the probability of coexistence.
Affiliation(s)
- P P Avelino
  - Instituto de Astrofísica e Ciências do Espaço, Universidade do Porto, CAUP, Rua das Estrelas, PT4150-762 Porto, Portugal
  - Departamento de Física e Astronomia, Faculdade de Ciências, Universidade do Porto, Rua do Campo Alegre 687, PT4169-007 Porto, Portugal
- D Bazeia
  - Departamento de Física, Universidade Federal da Paraíba 58051-900 João Pessoa, PB, Brazil
- L Losano
  - Departamento de Física, Universidade Federal da Paraíba 58051-900 João Pessoa, PB, Brazil
- J Menezes
  - Instituto de Astrofísica e Ciências do Espaço, Universidade do Porto, CAUP, Rua das Estrelas, PT4150-762 Porto, Portugal
  - Escola de Ciências e Tecnologia, Universidade Federal do Rio Grande do Norte Caixa Postal 1524, 59072-970, Natal, RN, Brazil
  - Institute for Biodiversity and Ecosystem Dynamics, University of Amsterdam, Science Park 904, 1098 XH Amsterdam, The Netherlands
- B F de Oliveira
  - Departamento de Física, Universidade Estadual de Maringá, Av. Colombo 5790, 87020-900 Maringá, PR, Brazil
- M A Santos
  - Departamento de Física, Universidade Estadual de Maringá, Av. Colombo 5790, 87020-900 Maringá, PR, Brazil
12
Abstract
High-throughput biological technologies are routinely used to generate gene expression profiling or cytogenetics data. To achieve high performance, methods available in the literature have become more specialized and often require substantial computational resources. Here, we propose a new versatile method based on the rank values obtained by ordering the data. We use linear algebra and the Perron-Frobenius theorem, and we extend a method presented earlier for searching for differentially expressed genes to the detection of recurrent copy number aberrations. A result derived from the proposed method is a one-sample Student's t-test based on rank values. The proposed method is, to our knowledge, the only one that applies to both gene expression profiling and cytogenetics data sets. This new method is fast, deterministic, and requires a low computational load. Probabilities are associated with genes to allow the selection of a statistically significant subset of the data set. Stability scores are also introduced as quality parameters. The performance and comparative analyses were carried out using real data sets. The proposed method can be accessed through an R package available from the CRAN (Comprehensive R Archive Network) website: https://cran.r-project.org/web/packages/fcros.
Affiliation(s)
- Doulaye Dembélé
  - Institut de Génétique et de Biologie Moléculaire et Cellulaire (IGBMC), CNRS UMR 7104, INSERM U 1258, Université de Strasbourg, Illkirch-Graffenstaden, France
13
Choi JY, Hwang H, Timmerman ME. Functional Parallel Factor Analysis for Functions of One- and Two-dimensional Arguments. Psychometrika 2018; 83:1-20. [PMID: 28197969 DOI: 10.1007/s11336-017-9558-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 11/04/2015] [Revised: 11/08/2016] [Indexed: 06/06/2023]
Abstract
Parallel factor analysis (PARAFAC) is a useful multivariate method for decomposing three-way data that consist of three different types of entities simultaneously. This method estimates trilinear components, each of which is a low-dimensional representation of a set of entities, often called a mode, to explain the maximum variance of the data. Functional PARAFAC permits the entities in different modes to be smooth functions or curves, varying over a continuum, rather than a collection of unconnected responses. The existing functional PARAFAC methods handle functions of a one-dimensional argument (e.g., time) only. In this paper, we propose a new extension of functional PARAFAC for handling three-way data whose responses are sequenced along both a two-dimensional domain (e.g., a plane with x- and y-axis coordinates) and a one-dimensional argument. Technically, the proposed method combines PARAFAC with basis function expansion approximations, using a set of piecewise quadratic finite element basis functions for estimating two-dimensional smooth functions and a set of one-dimensional basis functions for estimating one-dimensional smooth functions. In a simulation study, the proposed method appeared to outperform the conventional PARAFAC. We apply the method to EEG data to demonstrate its empirical usefulness.
Affiliation(s)
- Ji Yeh Choi
  - Department of Psychology, McGill University, 1205 Dr. Penfield Avenue, Montreal, QC, H3A 1B1, Canada
- Heungsun Hwang
  - Department of Psychology, McGill University, 1205 Dr. Penfield Avenue, Montreal, QC, H3A 1B1, Canada
14
Abstract
Although networks are extensively used to visualize information flow in biological, social and technological systems, translating topology into dynamic flow continues to challenge us, as similar networks exhibit fundamentally different flow patterns, driven by different interaction mechanisms. To uncover a network’s actual flow patterns, here we use a perturbative formalism, analytically tracking the contribution of all nodes/paths to the flow of information, exposing the rules that link structure and dynamic information flow for a broad range of nonlinear systems. We find that the diversity of flow patterns can be mapped into a single universal function, characterizing the interplay between the system’s topology and its dynamics, ultimately allowing us to identify the network’s main arteries of information flow. Counter-intuitively, our formalism predicts a family of frequently encountered dynamics where the flow of information avoids the hubs, favoring the network’s peripheral pathways, a striking disparity between structure and dynamics. Complex networks are a useful tool to investigate spreading processes but topology alone is insufficient to predict information flow. Here the authors propose a measure of information flow and predict its behavior from the interplay between structure and dynamics.
15
Liang Y, Kelemen A. Bayesian state space models for dynamic genetic network construction across multiple tissues. Stat Appl Genet Mol Biol 2017; 15:273-90. [PMID: 27343475 DOI: 10.1515/sagmb-2014-0055] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 11/15/2022]
Abstract
Construction of gene-gene interaction networks and potential pathways is a challenging and important problem in genomic research for complex diseases, and estimating the dynamic changes of temporal correlations and non-stationarity is key to this process. In this paper, we develop dynamic state space models with hierarchical Bayesian settings to tackle this challenge for inferring the dynamic profiles and genetic networks associated with disease treatments. We treat both the stochastic transition matrix and the observation matrix as time-variant and include temporal correlation structures in the covariance matrix estimations in the multivariate Bayesian state space models. The unevenly spaced short time courses with unseen time points are treated as hidden state variables. Hierarchical Bayesian approaches with various prior and hyper-prior models, together with Markov chain Monte Carlo and Gibbs sampling algorithms, are used to estimate the model parameters and the hidden state variables. We apply the proposed hierarchical Bayesian state space models to Affymetrix time course data sets from multiple tissues (liver, skeletal muscle, and kidney) following corticosteroid (CS) drug administration. Both simulation and real data analysis results show that the genomic changes over time and gene-gene interaction in response to CS treatment can be well captured by the proposed models. The proposed dynamic hierarchical Bayesian state space modeling approaches could be expanded and applied to other large-scale genomic data, such as next generation sequencing (NGS) data combined with real-time and time-varying electronic health records (EHR), for more comprehensive and robust systematic and network-based analysis. The aim is to transform big biomedical data into predictions and diagnostics for precision medicine and personalized healthcare with better decision making and patient outcomes.
16
An integrative method to decode regulatory logics in gene transcription. Nat Commun 2017; 8:1044. [PMID: 29051499 PMCID: PMC5715098 DOI: 10.1038/s41467-017-01193-0] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 01/20/2016] [Accepted: 08/25/2017] [Indexed: 12/27/2022]
Abstract
Modeling of transcriptional regulatory networks (TRNs) has been increasingly used to dissect the nature of gene regulation. Inference of regulatory relationships among transcription factors (TFs) and genes, especially among multiple TFs, is still challenging. In this study, we introduced an integrative method, LogicTRN, to decode TF–TF interactions that form TF logics in regulating target genes. By combining cis-regulatory logics and transcriptional kinetics into a single model framework, LogicTRN can naturally integrate dynamic gene expression data and TF-DNA-binding signals in order to identify the TF logics and to reconstruct the underlying TRNs. We evaluated the newly developed methodology using simulation, comparison and application studies, and the results not only show its consistency with existing knowledge, but also demonstrate its ability to accurately reconstruct TRNs in complex biological systems. Existing transcriptional regulatory network models fall short of deciphering the cooperation between multiple transcription factors on dynamic gene expression. Here the authors develop an integrative method that combines gene expression and transcription factor-DNA binding data to decode transcription regulatory logics.
|
17
|
Zhang Y, Ouyang Z. Joint principal trend analysis for longitudinal high-dimensional data. Biometrics 2017; 74:430-438. [DOI: 10.1111/biom.12751] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 11/01/2015] [Revised: 05/01/2017] [Accepted: 04/01/2017] [Indexed: 11/25/2022]
Affiliation(s)
- Yuping Zhang
  - Department of Statistics; University of Connecticut; Storrs, Connecticut U.S.A
  - Center for Quantitative Medicine; University of Connecticut Health Center; Farmington, Connecticut U.S.A
  - Institute for Systems Genomics, Institute for Collaboration on Health, Intervention, and Policy, CT Institute of the Brain and Cognitive Sciences; University of Connecticut; Storrs, Connecticut U.S.A
- Zhengqing Ouyang
  - The Jackson Laboratory for Genomic Medicine; Farmington, Connecticut U.S.A
  - Department of Biomedical Engineering, Institute for Systems Genomics; University of Connecticut; Storrs, Connecticut U.S.A
  - Department of Genetics and Genome Sciences; University of Connecticut Health Center; Farmington, Connecticut U.S.A
|
18
|
Lin Q, Liu Q, Lai T, Wang W. Kalman Filtering for Genetic Regulatory Networks with Missing Values. Computational and Mathematical Methods in Medicine 2017; 2017:7837109. [PMID: 28814967 PMCID: PMC5549500 DOI: 10.1155/2017/7837109] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 03/17/2017] [Accepted: 06/08/2017] [Indexed: 11/17/2022]
Abstract
The filtering problem with missing values for genetic regulatory networks (GRNs) is addressed, in which noise exists in both the state dynamics and the measurement equations; furthermore, the correlation between process noise and measurement noise is also taken into consideration. To deal with the filtering problem, a class of discrete-time GRNs with missing values, noise correlation, and time delays is established. Then a new observation model is proposed to reduce the adverse effect of the missing values and to decouple, in theory, the correlation between process noise and measurement noise. Finally, a Kalman filter is used to estimate the states of the GRNs. A typical example is provided to verify the effectiveness of the proposed method, and it shows that the concentrations of mRNA and protein can be estimated accurately.
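As a rough illustration of the general idea only, the sketch below runs a textbook scalar Kalman filter and simply skips the measurement update whenever an observation is missing. It is not the observation model proposed in the abstract: noise correlation and time delays are ignored, and the function name `kalman_filter_missing`, the model `x[t] = a*x[t-1] + w`, and all parameter values are assumptions made for this example.

```python
import random

def kalman_filter_missing(observations, a=0.9, q=0.01, r=0.04, x0=0.0, p0=1.0):
    """Scalar Kalman filter for x[t] = a*x[t-1] + w, y[t] = x[t] + v.

    q and r are the (assumed known) process and measurement noise variances.
    A missing measurement is encoded as None; at such steps only the
    prediction step runs, so the estimate is propagated by the model alone.
    """
    x, p = x0, p0
    estimates = []
    for y in observations:
        # Prediction step: propagate the state estimate and its variance.
        x = a * x
        p = a * a * p + q
        if y is not None:
            # Update step: blend prediction and measurement via the Kalman gain.
            k = p / (p + r)
            x = x + k * (y - x)
            p = (1.0 - k) * p
        estimates.append(x)
    return estimates

# Toy data (hypothetical): a decaying concentration with every fifth
# measurement missing.
rng = random.Random(0)
truth, obs = [], []
x = 1.0
for t in range(50):
    x = 0.9 * x + rng.gauss(0.0, 0.1)
    truth.append(x)
    y = x + rng.gauss(0.0, 0.2)
    obs.append(None if t % 5 == 4 else y)

est = kalman_filter_missing(obs, x0=1.0)
mean_abs_error = sum(abs(e - s) for e, s in zip(est, truth)) / len(truth)
print(f"mean absolute error: {mean_abs_error:.3f}")
```

Skipping the update is the simplest way to handle a missing value; it lets the uncertainty `p` grow until the next measurement arrives.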
Affiliation(s)
- Qiongbin Lin
  - College of Electrical Engineering and Automation, Fuzhou University, Fuzhou, Fujian 350116, China
- Qiuhua Liu
  - College of Electrical Engineering and Automation, Fuzhou University, Fuzhou, Fujian 350116, China
- Tianyue Lai
  - College of Electrical Engineering and Automation, Fuzhou University, Fuzhou, Fujian 350116, China
- Wu Wang
  - College of Electrical Engineering and Automation, Fuzhou University, Fuzhou, Fujian 350116, China
  - Fujian Key Lab of Medical Instrument and Pharmaceutical Technology, Fuzhou, Fujian 350116, China
|
19
|
Reverse engineering highlights potential principles of large gene regulatory network design and learning. NPJ Syst Biol Appl 2017. [PMID: 28649444 PMCID: PMC5481436 DOI: 10.1038/s41540-017-0019-y] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Indexed: 11/15/2022] Open
Abstract
Inferring transcriptional gene regulatory networks from transcriptomic datasets is a key challenge of systems biology, with potential impacts ranging from medicine to agronomy. Several techniques are currently used to experimentally assay transcription factor-to-target relationships, providing important information about the connections in real gene regulatory networks. These techniques include classical ChIP-seq, yeast one-hybrid, or more recently, DAP-seq or target technologies. These techniques are usually used to validate algorithm predictions. Here, we developed a reverse engineering approach based on mathematical and computer simulation to evaluate the impact that this prior knowledge on gene regulatory networks may have on training machine learning algorithms. First, we developed a gene regulatory network-simulating engine called FRANK (Fast Randomizing Algorithm for Network Knowledge) that is able to simulate large gene regulatory networks (containing 10^4 genes) with characteristics of gene regulatory networks observed in vivo. FRANK also generates stable or oscillatory gene expression directly produced by the simulated gene regulatory networks. The development of FRANK leads to important general conclusions concerning the design of large and stable gene regulatory networks harboring scale-free properties (built ex nihilo). In combination with a supervised (accepting prior knowledge) support vector machine algorithm, we (i) address biologically oriented questions concerning our capacity to accurately reconstruct gene regulatory networks, and in particular we demonstrate that prior-knowledge structure is crucial for accurate learning, and (ii) draw conclusions to inform experimental design to perform learning able to solve gene regulatory networks in the future.
By demonstrating that our predictions concerning the influence of the prior-knowledge structure on support vector machine learning capacity hold true on real data (Escherichia coli K14 network reconstruction using network and transcriptomic data), we show that the formalism used to build FRANK can to some extent be a reasonable model for gene regulatory networks in real cells. This work by Carré et al addresses central questions in biology: how are very large gene regulatory networks (GRNs) organized, how do they generate stable gene expression, and can they be learnt using machine learning algorithms? In this work the authors developed an algorithm able to simulate large GRNs. From these networks they simulate stable or oscillating gene expression and highlight some mathematical rules controlling such collective behavior (involving several thousands of genes). They discuss the consequent hypotheses concerning the organization of GRNs in real cells. Using this simulation tool, the authors also demonstrate that it is likely possible to computationally learn GRNs from transcriptomic data and prior knowledge of the network (actual known connections obtained from yeast one-hybrid or ChIP-seq, for instance). They particularly highlight the crucial importance of the prior-knowledge structure for the capacity to learn large GRNs.
|
20
|
Liang Y, Kelemen A. Computational dynamic approaches for temporal omics data with applications to systems medicine. BioData Min 2017. [PMID: 28638442 PMCID: PMC5473988 DOI: 10.1186/s13040-017-0140-x] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Indexed: 12/13/2022] Open
Abstract
Modeling and predicting dynamic biological systems and simultaneously estimating their kinetic, structural and functional parameters are extremely important in systems and computational biology. This is key for understanding the complexity of human health, drug response, disease susceptibility and pathogenesis for systems medicine. Temporal omics data, used to measure dynamic biological systems, are essential for discovering complex biological interactions and clinical mechanisms and causation. However, delineating the possible associations and causalities among genes, proteins, metabolites, cells and other biological entities from high-throughput time course omics data is challenging, and conventional experimental techniques are not suited to this task in the big omics era. In this paper, we present various recently developed dynamic trajectory and causal network approaches for temporal omics data, which are extremely useful for researchers who want to start working in this challenging research area. Moreover, applications to various biological systems, health conditions and disease status, and examples that summarize the state-of-the-art performance for different specific mining tasks, are presented. We critically discuss the merits, drawbacks and limitations of the approaches, and the associated main challenges for the years ahead. The most recent computing tools and software for analyzing specific problem types, associated platform resources, and other potentials for the dynamic trajectory and interaction methods are also presented and discussed in detail.
Affiliation(s)
- Yulan Liang
  - Department of Family and Community Health, University of Maryland, Baltimore, MD 21201 USA
- Arpad Kelemen
  - Department of Organizational Systems and Adult Health, University of Maryland, Baltimore, MD 21201 USA
|
21
|
Pazhamala LT, Purohit S, Saxena RK, Garg V, Krishnamurthy L, Verdier J, Varshney RK. Gene expression atlas of pigeonpea and its application to gain insights into genes associated with pollen fertility implicated in seed formation. Journal of Experimental Botany 2017; 68:2037-2054. [PMID: 28338822 PMCID: PMC5429002 DOI: 10.1093/jxb/erx010] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.1] [Indexed: 05/20/2023]
Abstract
Pigeonpea (Cajanus cajan) is an important grain legume of the semi-arid tropics, mainly used for its protein-rich seeds. To link the genome sequence information with agronomic traits resulting from specific developmental processes, a Cajanus cajan gene expression atlas (CcGEA) was developed using the Asha genotype. Thirty tissues/organs representing developmental stages from germination to senescence were used to generate 590.84 million paired-end RNA-Seq reads. The CcGEA revealed a compendium of 28 793 genes with differential, specific, spatio-temporal and constitutive expression during various stages of development in different tissues. To demonstrate the application of the CcGEA, a network of 28 flower-related genes, analysed for cis-regulatory elements and splicing variants, was identified. In addition, expression analysis of these candidate genes in male sterile and male fertile genotypes suggested their critical role in normal pollen development leading to seed formation. Gene network analysis also identified two regulatory genes, a pollen-specific SF3 and a sucrose-proton symporter, that could have implications for improvement of agronomic traits such as seed production and yield. In conclusion, the CcGEA provides a valuable resource for pigeonpea to identify candidate genes involved in specific developmental processes and to understand the well-orchestrated growth and developmental process in this resilient crop.
Affiliation(s)
- Lekha T Pazhamala
  - International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad 502 324, India
- Shilp Purohit
  - International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad 502 324, India
- Rachit K Saxena
  - International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad 502 324, India
- Vanika Garg
  - International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad 502 324, India
- L Krishnamurthy
  - International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad 502 324, India
- Jerome Verdier
  - INRA - Research Institute in Horticulture and Seeds (IRHS), 49071 Beaucouze, France
- Rajeev K Varshney
  - International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad 502 324, India
  - School of Plant Biology and Institute of Agriculture, University of Western Australia, 35 Stirling Highway, Crawley, WA, 6009, Australia
|
22
|
Sun X, Hu F, Wu S, Qiu X, Linel P, Wu H. Controllability and stability analysis of large transcriptomic dynamic systems for host response to influenza infection in human. Infect Dis Model 2016; 1:52-70. [PMID: 29928721 PMCID: PMC5963324 DOI: 10.1016/j.idm.2016.07.002] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 04/22/2016] [Accepted: 07/08/2016] [Indexed: 12/20/2022] Open
Abstract
Background: Gene regulatory networks are complex dynamic systems, and the reverse-engineering of such networks from high-dimensional time course transcriptomic data has attracted researchers from various fields. It is also interesting and important to study the behavior of the reconstructed networks on the basis of dynamic models and the biological mechanisms. We focus on the gene regulatory networks reconstructed using the ordinary differential equation (ODE) modelling approach and investigate the properties of these networks. Results: Controllability and stability analyses are conducted for the reconstructed gene response networks of 17 influenza-infected subjects based on ODE models. Symptomatic subjects tend to have larger numbers of driver nodes, higher proportions of critical links and lower proportions of redundant links than asymptomatic subjects. We also show that the degree distribution, rather than the structure of networks, plays an important role in controlling the network in response to influenza infection. In addition, we find that the stability of high-dimensional networks is very sensitive to randomness in the reconstructed systems introduced by errors in measurements and parameter estimation. Conclusions: The gene response networks of asymptomatic subjects are easier to control than those of symptomatic subjects. This may indicate that the regulatory systems of asymptomatic subjects recover more easily from disease stimulations, so these subjects are less likely to develop symptoms. Our results also suggest that a stability constraint should be considered in the modelling of high-dimensional networks and the estimation of network parameters.
Affiliation(s)
- Xiaodian Sun
  - Biostatistics and Bioinformatics Core, Sylvester Comprehensive Cancer Center, University of Miami, Miami, USA
- Fang Hu
  - Department of Biostatistics and Computational Biology, University of Rochester School of Medicine and Dentistry, Rochester, NY, USA
- Shuang Wu
  - Genus PLC, ABS Global, Deforest, WI, USA
- Xing Qiu
  - Department of Biostatistics and Computational Biology, University of Rochester School of Medicine and Dentistry, Rochester, NY, USA
- Hulin Wu
  - Department of Biostatistics, School of Public Health, University of Texas Health Science Center at Houston, Houston, TX, USA
|
23
|
Erdem C, Nagle AM, Casa AJ, Litzenburger BC, Wang YF, Taylor DL, Lee AV, Lezon TR. Proteomic Screening and Lasso Regression Reveal Differential Signaling in Insulin and Insulin-like Growth Factor I (IGF1) Pathways. Mol Cell Proteomics 2016; 15:3045-57. [PMID: 27364358 PMCID: PMC5013316 DOI: 10.1074/mcp.m115.057729] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 12/28/2015] [Revised: 06/23/2016] [Indexed: 01/22/2023] Open
Abstract
Insulin and insulin-like growth factor I (IGF1) influence cancer risk and progression through poorly understood mechanisms. To better understand the roles of insulin and IGF1 signaling in breast cancer, we combined proteomic screening with computational network inference to uncover differences in IGF1- and insulin-induced signaling. Using reverse phase protein arrays, we measured the levels of 134 proteins in 21 breast cancer cell lines stimulated with IGF1 or insulin for up to 48 h. We then constructed directed protein expression networks using three separate methods: (i) lasso regression, (ii) conventional matrix inversion, and (iii) entropy maximization. These networks, referred to here as time translation models, were analyzed, and the inferred interactions were ranked by differential magnitude to identify pathway differences. The two top candidates, chosen for experimental validation, were shown to regulate IGF1/insulin-induced phosphorylation events. First, acetyl-CoA carboxylase (ACC) knock-down was shown to increase the level of mitogen-activated protein kinase (MAPK) phosphorylation. Second, stable knock-down of E-Cadherin increased phospho-Akt protein levels. Both knock-down perturbations induced stronger phosphorylation responses in IGF1-stimulated cells than in insulin-stimulated cells. Overall, time translation modeling coupled with wet-lab experiments proved to be powerful in inferring differential interactions downstream of IGF1 and insulin signaling in vitro.
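To make the lasso option concrete, here is a minimal sketch of one way a directed network could be fitted from a time course: each protein's level at the next time point is regressed on all protein levels at the current time point with an L1 penalty, so that small weights are driven to exactly zero. This is an assumption about the general technique, not the authors' time translation model; `lasso_cd`, `infer_network`, the penalty `lam`, and the simulated three-protein data are all hypothetical.

```python
import random

def soft_threshold(z, lam):
    """Shrink z toward zero by lam (the proximal operator of the L1 norm)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_cd(X, y, lam, n_iter=200):
    """Minimise (1/2n)*||y - Xw||^2 + lam*||w||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    resid = list(y)  # residual y - Xw, kept up to date as w changes
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the residual that excludes column j.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_w = soft_threshold(rho, lam) / col_sq[j]
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                w[j] = new_w
    return w

def infer_network(series, lam):
    """Row k of the result holds the weights from every protein at time t
    to protein k at time t+1; nonzero entries are the inferred directed edges."""
    X = series[:-1]
    n_proteins = len(series[0])
    return [lasso_cd(X, [row[k] for row in series[1:]], lam)
            for k in range(n_proteins)]

# Simulated toy time course (hypothetical): protein 0 drives protein 1.
rng = random.Random(1)
true_W = [[0.5, 0.0, 0.0], [0.8, 0.0, 0.0], [0.0, 0.0, 0.3]]
state = [0.0, 0.0, 0.0]
series = [state]
for _ in range(200):
    state = [sum(true_W[k][j] * state[j] for j in range(3)) + rng.gauss(0.0, 1.0)
             for k in range(3)]
    series.append(state)

W = infer_network(series, lam=0.05)
for row in W:
    print([round(v, 2) for v in row])
```

Fitting the same model separately for two stimulation conditions and ranking edges by the difference in their weights would be one way to mirror the "differential magnitude" ranking described above.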
Affiliation(s)
- Cemal Erdem
  - Department of Computational & Systems Biology, University of Pittsburgh, Pittsburgh, Pennsylvania
  - University of Pittsburgh Drug Discovery Institute, University of Pittsburgh, Pittsburgh, Pennsylvania
- Alison M Nagle
  - Department of Pharmacology & Chemical Biology, University of Pittsburgh, Pittsburgh, Pennsylvania
  - Women's Cancer Research Center, University of Pittsburgh Cancer Institute, Pittsburgh, Pennsylvania
- Angelo J Casa
  - Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, Texas
- Beate C Litzenburger
  - Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, Texas
- Yu-Fen Wang
  - Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, Texas
- D Lansing Taylor
  - Department of Computational & Systems Biology, University of Pittsburgh, Pittsburgh, Pennsylvania
  - University of Pittsburgh Drug Discovery Institute, University of Pittsburgh, Pittsburgh, Pennsylvania
- Adrian V Lee
  - Department of Pharmacology & Chemical Biology, University of Pittsburgh, Pittsburgh, Pennsylvania
  - Women's Cancer Research Center, University of Pittsburgh Cancer Institute, Pittsburgh, Pennsylvania
  - Department of Human Genetics, University of Pittsburgh, Pittsburgh, Pennsylvania
- Timothy R Lezon
  - Department of Computational & Systems Biology, University of Pittsburgh, Pittsburgh, Pennsylvania
  - University of Pittsburgh Drug Discovery Institute, University of Pittsburgh, Pittsburgh, Pennsylvania
|
24
|
Cho SJ, Lee J, Lee HJ, Jo HY, Sinniah M, Kim HY, Chong CK, Song HO. A Novel Malaria Pf/Pv Ab Rapid Diagnostic Test Using a Differential Diagnostic Marker Identified by Network Biology. Int J Biol Sci 2016; 12:824-35. [PMID: 27313496 PMCID: PMC4910601 DOI: 10.7150/ijbs.14408] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 11/13/2015] [Accepted: 05/06/2016] [Indexed: 11/05/2022] Open
Abstract
Rapid diagnostic tests (RDTs) can detect anti-malaria antibodies in human blood. As they can detect parasite infection at low parasite density, they are useful in endemic areas where light infection and/or re-infection with parasites are common. Thus, malaria antibody tests can be used for screening blood in blood banks to prevent transfusion-transmitted malaria (TTM), an emerging problem in malaria-endemic areas. However, only a few malaria antibody tests are available in the microwell-based assay format, and these are not suitable for field application. A novel malaria antibody (Ab)-based RDT using a differential diagnostic marker for falciparum and vivax malaria was developed as a suitable high-throughput assay that is sensitive and practical for blood screening. The marker, merozoite surface protein 1 (MSP1), was discovered by generating a Plasmodium-specific network and analysing the hierarchical organization of modularity in the network. Clinical evaluation revealed that the novel Malaria Pf/Pv Ab RDT shows improved sensitivity (98%) and specificity (99.7%) compared with a commercial kit, SD BioLine Malaria P.f/P.v (95.1% sensitivity and 99.1% specificity). The novel Malaria Pf/Pv Ab RDT has potential for use as a cost-effective blood-screening tool for malaria and, in turn, for reducing TTM risk in endemic areas.
Affiliation(s)
- Sung Jin Cho
  - Department of Bioinformatics, College of Natural Sciences, Chungbuk National University, Cheongju, Chungbuk, Republic of Korea
- Jihoo Lee
  - Department of Biochemistry, College of Natural Sciences, Chungbuk National University, Cheongju, Chungbuk, Republic of Korea
- Hyun Jae Lee
  - Department of Bioinformatics, College of Natural Sciences, Chungbuk National University, Cheongju, Chungbuk, Republic of Korea
- Hyun-Young Jo
  - Laboratory Medicine, Chungbuk National University Hospital, Cheongju, Chungbuk, Republic of Korea
- Hak-Yong Kim
  - Department of Biochemistry, College of Natural Sciences, Chungbuk National University, Cheongju, Chungbuk, Republic of Korea
- Chom-Kyu Chong
  - GenBody Inc., Dankook Biotech Business IC, Cheonan, Chungnam, Republic of Korea
- Hyun-Ok Song
  - Department of Infection Biology, Wonkwang University School of Medicine, Iksan, Jeonbuk, Republic of Korea
|
25
|
Heimberg G, Bhatnagar R, El-Samad H, Thomson M. Low Dimensionality in Gene Expression Data Enables the Accurate Extraction of Transcriptional Programs from Shallow Sequencing. Cell Syst 2016; 2:239-250. [PMID: 27135536 DOI: 10.1016/j.cels.2016.04.001] [Citation(s) in RCA: 85] [Impact Index Per Article: 9.4] [Received: 11/30/2015] [Revised: 03/08/2016] [Accepted: 04/04/2016] [Indexed: 11/17/2022]
Abstract
A tradeoff between precision and throughput constrains all biological measurements, including sequencing-based technologies. Here, we develop a mathematical framework that defines this tradeoff between mRNA-sequencing depth and error in the extraction of biological information. We find that transcriptional programs can be reproducibly identified at 1% of conventional read depths. We demonstrate that this resilience to noise of "shallow" sequencing derives from a natural property, low dimensionality, which is a fundamental feature of gene expression data. Accordingly, our conclusions hold for ∼350 single-cell and bulk gene expression datasets across yeast, mouse, and human. In total, our approach provides quantitative guidelines for the choice of sequencing depth necessary to achieve a desired level of analytical resolution. We codify these guidelines in an open-source read depth calculator. This work demonstrates that the structure inherent in biological networks can be productively exploited to increase measurement throughput, an idea that is now common in many branches of science, such as image processing.
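The intuition that low-dimensional structure survives shallow sampling can be illustrated with a small simulation. The sketch below is not the authors' mathematical framework or their read depth calculator: it builds toy count data driven by a single transcriptional program, randomly keeps a fraction of the reads, and compares the leading principal component before and after. The matrix sizes, the 5% read fraction, and the helper names `leading_pc` and `thin` are assumptions chosen for this example.

```python
import math
import random

def leading_pc(rows, n_iter=100, seed=0):
    """Leading principal component of a samples-by-genes matrix,
    computed by power iteration on the gene-centred data."""
    n, g = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(g)]
    X = [[r[j] - means[j] for j in range(g)] for r in rows]
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(g)]
    for _ in range(n_iter):
        scores = [sum(x[j] * v[j] for j in range(g)) for x in X]            # X v
        v = [sum(X[i][j] * scores[i] for i in range(n)) for j in range(g)]  # X^T (X v)
        norm = math.sqrt(sum(c * c for c in v))
        v = [c / norm for c in v]
    return v

def thin(counts, fraction, rng):
    """Simulate shallow sequencing: keep each read with probability `fraction`."""
    return [[sum(1 for _ in range(c) if rng.random() < fraction) for c in row]
            for row in counts]

rng = random.Random(0)
n_genes, n_samples = 40, 30
# One program: half the genes go up with its activity, half go down.
loading = [1.0 if j < n_genes // 2 else -1.0 for j in range(n_genes)]
activity = [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]
deep = [[int(round(200 + 150 * loading[j] * a)) for j in range(n_genes)]
        for a in activity]
shallow = thin(deep, 0.05, rng)

pc_deep = leading_pc(deep)
pc_shallow = leading_pc(shallow)
# Absolute cosine similarity, since the sign of a principal component is arbitrary.
similarity = abs(sum(a * b for a, b in zip(pc_deep, pc_shallow)))
print(f"similarity of leading components: {similarity:.3f}")
```

Because the program's signal is spread over many genes, per-gene sampling noise largely averages out along the principal direction, which is the property the abstract attributes to low dimensionality.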
Affiliation(s)
- Graham Heimberg
  - Department of Biochemistry and Biophysics, California Institute for Quantitative Biosciences, University of California, San Francisco, San Francisco, CA 94158, USA
  - Integrative Program in Quantitative Biology, University of California, San Francisco, San Francisco, CA 94158, USA
  - Center for Systems and Synthetic Biology, University of California, San Francisco, San Francisco, CA 94158, USA
- Rajat Bhatnagar
  - Department of Biochemistry and Biophysics, California Institute for Quantitative Biosciences, University of California, San Francisco, San Francisco, CA 94158, USA
  - Center for Systems and Synthetic Biology, University of California, San Francisco, San Francisco, CA 94158, USA
- Hana El-Samad
  - Department of Biochemistry and Biophysics, California Institute for Quantitative Biosciences, University of California, San Francisco, San Francisco, CA 94158, USA
  - Center for Systems and Synthetic Biology, University of California, San Francisco, San Francisco, CA 94158, USA
- Matt Thomson
  - Center for Systems and Synthetic Biology, University of California, San Francisco, San Francisco, CA 94158, USA
|
26
|
Candia J, Cherukuri S, Guo Y, Doshi KA, Banavar JR, Civin CI, Losert W. Uncovering low-dimensional, miR-based signatures of acute myeloid and lymphoblastic leukemias with a machine-learning-driven network approach. Convergent Science Physical Oncology 2015; 1. [PMID: 27274862 DOI: 10.1088/2057-1739/1/2/025002] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Indexed: 02/07/2023]
Abstract
Complex phenotypic differences among different acute leukemias cannot be fully captured by analyzing the expression levels of one single molecule, such as a miR, at a time, but instead require systematic analysis of large sets of miRs. While a popular approach for analysis of such datasets is principal component analysis (PCA), this method is not designed to optimally discriminate different phenotypes. Moreover, PCA and other low-dimensional representation methods yield linear or non-linear combinations of all measured miRs. Global human miR expression was measured in AML, B-ALL, and T-ALL cell lines and patient RNA samples. By systematically applying support vector machines to all measured miRs taken in dyad and triad groups, we built miR networks using cell line data and validated our findings with primary patient samples. All the coordinately transcribed members of the miR-23a cluster (which also includes miR-24 and miR-27a), known to function as tumor suppressors of acute leukemias, appeared in the AML-, B-ALL- and T-ALL-centric networks. Subsequent qRT-PCR analysis showed that the most connected miR in the B-ALL-centric network, miR-708, is highly and specifically expressed in B-ALLs, suggesting that miR-708 might serve as a biomarker for B-ALL. This approach is systematic, quantitative, scalable, and unbiased. Rather than a single signature, our approach yields a network of signatures reflecting the redundant nature of biological signaling pathways. The network representation allows for visual analysis of all signatures by an expert and for future integration of additional information. Furthermore, each signature involves only small sets of miRs, such as dyads and triads, which are well suited for in-depth validation through laboratory experiments.
In particular, loss- and gain-of-function assays designed to drive changes in leukemia cell survival, proliferation and differentiation will benefit from the identification of multi-miR signatures that characterize leukemia subtypes and their normal counterpart cells of origin.
Affiliation(s)
- Julián Candia
  - Center for Human Immunology, Autoimmunity and Inflammation, National Institutes of Health, Bethesda, MD 20892, USA
  - Department of Physics, University of Maryland, College Park, MD 20742, USA
  - Center for Stem Cell Biology & Regenerative Medicine, Departments of Pediatrics and Physiology, University of Maryland School of Medicine, Baltimore MD 21201, USA
- Srujana Cherukuri
  - Center for Stem Cell Biology & Regenerative Medicine, Departments of Pediatrics and Physiology, University of Maryland School of Medicine, Baltimore MD 21201, USA
  - Noble Life Sciences, 22 Firstfield Rd, Gaithersburg, MD 20878, USA
- Yin Guo
  - Center for Stem Cell Biology & Regenerative Medicine, Departments of Pediatrics and Physiology, University of Maryland School of Medicine, Baltimore MD 21201, USA
- Kshama A Doshi
  - Center for Stem Cell Biology & Regenerative Medicine, Departments of Pediatrics and Physiology, University of Maryland School of Medicine, Baltimore MD 21201, USA
- Jayanth R Banavar
  - Department of Physics, University of Maryland, College Park, MD 20742, USA
- Curt I Civin
  - Center for Stem Cell Biology & Regenerative Medicine, Departments of Pediatrics and Physiology, University of Maryland School of Medicine, Baltimore MD 21201, USA
- Wolfgang Losert
  - Department of Physics, University of Maryland, College Park, MD 20742, USA
|
27
|
Jayavelu ND, Aasgaard LS, Bar N. Iterative sub-network component analysis enables reconstruction of large scale genetic networks. BMC Bioinformatics 2015; 16:366. [PMID: 26537518 PMCID: PMC4634733 DOI: 10.1186/s12859-015-0768-9] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 02/23/2015] [Accepted: 10/09/2015] [Indexed: 11/28/2022] Open
Abstract
Background: Network component analysis (NCA) has become a popular tool for understanding complex regulatory networks. The method uses high-throughput gene expression data and a priori topology to reconstruct transcription factor activity profiles. Current NCA algorithms are constrained by several conditions imposed on the network topology to guarantee unique reconstruction (termed compliancy). However, the restrictions these conditions impose are not necessarily true from a biological perspective, and they force network size reduction, pruning potentially important components. Results: To address this, we developed a novel Iterative Sub-Network Component Analysis (ISNCA) for reconstructing networks of any size. By dividing the initial network into smaller, compliant subnetworks, the algorithm first predicts the reconstruction of each subnetwork using standard NCA algorithms. It then subtracts from the reconstruction the contribution of the shared components from the other subnetwork. We tested the ISNCA on real, large datasets using various NCA algorithms. The size of the networks we tested and the accuracy of the reconstruction increased significantly. Importantly, FOXA1, ATF2, ATF3 and many other known key regulators in breast cancer could not be incorporated by any NCA algorithm because of the necessary conditions. However, their temporal activities could be reconstructed by our algorithm, and therefore their involvement in breast cancer could be analyzed. Conclusions: Our framework enables reconstruction of large gene expression data networks without reducing their size or pruning potentially important components, while at the same time rendering the results more biologically plausible. Our ISNCA method is not only suitable for prediction of key regulators in cancer studies, but can also be applied to any high-throughput gene expression data.
Electronic supplementary material: The online version of this article (doi:10.1186/s12859-015-0768-9) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Naresh Doni Jayavelu
  - Department of Chemical Engineering, Norwegian University of Science and Technology (NTNU), Sem Salandsvei 4, Trondheim, Norway.
- Lasse S Aasgaard
  - Department of Chemical Engineering, Norwegian University of Science and Technology (NTNU), Sem Salandsvei 4, Trondheim, Norway.
- Nadav Bar
  - Department of Chemical Engineering, Norwegian University of Science and Technology (NTNU), Sem Salandsvei 4, Trondheim, Norway.
|
28
|
Smieszek SP, Yang H, Paccanaro A, Devlin PF. Progressive promoter element combinations classify conserved orthogonal plant circadian gene expression modules. J R Soc Interface 2015; 11:rsif.2014.0535. [PMID: 25142519 PMCID: PMC4233729 DOI: 10.1098/rsif.2014.0535] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 12/27/2022] Open
Abstract
We aimed to test the proposal that progressive combinations of multiple promoter elements acting in concert may be responsible for the full range of phases observed in plant circadian output genes. In order to allow reliable selection of informative phase groupings of genes for our purpose, intrinsic cyclic patterns of expression were identified using a novel, non-biased method for the identification of circadian genes. Our non-biased approach identified two dominant, inherent orthogonal circadian trends underlying publicly available microarray data from plants maintained under constant conditions. Furthermore, these trends were highly conserved across several plant species. Four phase-specific modules of circadian genes were generated by projection onto these trends and, in order to identify potential combinatorial promoter elements that might classify genes into these groups, we used a Random Forest pipeline which merged data from multiple decision trees to look for the presence of element combinations. We identified a number of regulatory motifs which aggregated into coherent clusters capable of predicting the inclusion of genes within each phase module with very high fidelity and these motif combinations changed in a consistent, progressive manner from one phase module group to the next, providing strong support for our hypothesis.
Affiliation(s)
- Sandra P Smieszek
- School of Biological Sciences, Royal Holloway University of London, Egham TW20 0EX, UK; Centre for Systems and Synthetic Biology, Royal Holloway University of London, Egham TW20 0EX, UK
- Haixuan Yang
- Centre for Systems and Synthetic Biology, Royal Holloway University of London, Egham TW20 0EX, UK; Department of Computer Science, Royal Holloway University of London, Egham TW20 0EX, UK
- Alberto Paccanaro
- Centre for Systems and Synthetic Biology, Royal Holloway University of London, Egham TW20 0EX, UK; Department of Computer Science, Royal Holloway University of London, Egham TW20 0EX, UK
- Paul F Devlin
- School of Biological Sciences, Royal Holloway University of London, Egham TW20 0EX, UK; Centre for Systems and Synthetic Biology, Royal Holloway University of London, Egham TW20 0EX, UK
|
29
|
Zhu F, Shi L, Engel JD, Guan Y. Regulatory network inferred using expression data of small sample size: application and validation in erythroid system. Bioinformatics 2015; 31:2537-44. [PMID: 25840044 DOI: 10.1093/bioinformatics/btv186] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 11/07/2014] [Accepted: 03/27/2015] [Indexed: 11/13/2022]
Abstract
MOTIVATION Modeling regulatory networks using expression data observed in a differentiation process may help identify context-specific interactions. The outcome of the current algorithms highly depends on the quality and quantity of a single time-course dataset, and the performance may be compromised for datasets with a limited number of samples. RESULTS In this work, we report a multi-layer graphical model that is capable of leveraging many publicly available time-course datasets, as well as a cell lineage-specific data with small sample size, to model regulatory networks specific to a differentiation process. First, a collection of network inference methods are used to predict the regulatory relationships in individual public datasets. Then, the inferred directional relationships are weighted and integrated together by evaluating against the cell lineage-specific dataset. To test the accuracy of this algorithm, we collected a time-course RNA-Seq dataset during human erythropoiesis to infer regulatory relationships specific to this differentiation process. The resulting erythroid-specific regulatory network reveals novel regulatory relationships activated in erythropoiesis, which were further validated by genome-wide TR4 binding studies using ChIP-seq. These erythropoiesis-specific regulatory relationships were not identifiable by single dataset-based methods or context-independent integrations. Analysis of the predicted targets reveals that they are all closely associated with hematopoietic lineage differentiation.
Affiliation(s)
- Fan Zhu
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Lihong Shi
- State Key Laboratory of Experimental Hematology, Institute of Hematology and Blood Diseases Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Tianjin 300020, China
- Yuanfang Guan
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA, Department of Internal Medicine, and Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI 48109, USA
|
30
|
Linde J, Schulze S, Henkel SG, Guthke R. Data- and knowledge-based modeling of gene regulatory networks: an update. EXCLI JOURNAL 2015; 14:346-78. [PMID: 27047314 PMCID: PMC4817425 DOI: 10.17179/excli2015-168] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Received: 01/29/2015] [Accepted: 02/10/2015] [Indexed: 02/01/2023]
Abstract
Gene regulatory network inference is a systems biology approach which predicts interactions between genes with the help of high-throughput data. In this review, we present current and updated network inference methods focusing on novel techniques for data acquisition, network inference assessment, network inference for interacting species and the integration of prior knowledge. After the advance of Next-Generation-Sequencing of cDNAs derived from RNA samples (RNA-Seq) we discuss in detail its application to network inference. Furthermore, we present progress for large-scale or even full-genomic network inference as well as for small-scale condensed network inference and review advances in the evaluation of network inference methods by crowdsourcing. Finally, we reflect the current availability of data and prior knowledge sources and give an outlook for the inference of gene regulatory networks that reflect interacting species, in particular pathogen-host interactions.
Affiliation(s)
- Jörg Linde
- Research Group Systems Biology / Bioinformatics, Leibniz Institute for Natural Product Research and Infection Biology - Hans-Knöll-Institute, Beutenbergstr. 11a, 07745 Jena, Germany
- Sylvie Schulze
- Research Group Systems Biology / Bioinformatics, Leibniz Institute for Natural Product Research and Infection Biology - Hans-Knöll-Institute, Beutenbergstr. 11a, 07745 Jena, Germany
- Reinhard Guthke
- Research Group Systems Biology / Bioinformatics, Leibniz Institute for Natural Product Research and Infection Biology - Hans-Knöll-Institute, Beutenbergstr. 11a, 07745 Jena, Germany
|
31
|
Huang X, Zi Z. Inferring cellular regulatory networks with Bayesian model averaging for linear regression (BMALR). MOLECULAR BIOSYSTEMS 2015; 10:2023-30. [PMID: 24899235 DOI: 10.1039/c4mb00053f] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Indexed: 01/29/2023]
Abstract
Bayesian network and linear regression methods have been widely applied to reconstruct cellular regulatory networks. In this work, we propose a Bayesian model averaging for linear regression (BMALR) method to infer molecular interactions in biological systems. This method uses a new closed form solution to compute the posterior probabilities of the edges from regulators to the target gene within a hybrid framework of Bayesian model averaging and linear regression methods. We have assessed the performance of BMALR by benchmarking on both in silico DREAM datasets and real experimental datasets. The results show that BMALR achieves both high prediction accuracy and high computational efficiency across different benchmarks. A pre-processing of the datasets with the log transformation can further improve the performance of BMALR, leading to a new top overall performance. In addition, BMALR can achieve robust high performance in community predictions when it is combined with other competing methods. The proposed method BMALR is competitive compared to the existing network inference methods. Therefore, BMALR will be useful to infer regulatory interactions in biological networks. A free open source software tool for the BMALR algorithm is available at https://sites.google.com/site/bmalr4netinfer/.
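The idea of averaging over regression models to score edges can be illustrated with a small sketch. This is not the closed-form solution described in the abstract: it assumes a uniform prior over models, approximates each model's evidence with the BIC, and enumerates regulator subsets up to a fixed size. The function names, the gene names `g1` to `g3`, and the toy data are hypothetical.

```python
import itertools
import math


def fit_rss(columns, y):
    """Least-squares fit of y on an intercept plus the given columns; returns the RSS."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Augmented normal equations (X^T X | X^T y), solved by Gaussian elimination.
    # Assumes the regulators are not collinear, so no pivot is zero.
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(p)]
    for k in range(p):
        piv = max(range(k, p), key=lambda r: abs(A[r][k]))
        A[k], A[piv] = A[piv], A[k]
        for r in range(k + 1, p):
            f = A[r][k] / A[k][k]
            for c in range(k, p + 1):
                A[r][c] -= f * A[k][c]
    b = [0.0] * p
    for k in reversed(range(p)):
        b[k] = (A[k][p] - sum(A[k][c] * b[c] for c in range(k + 1, p))) / A[k][k]
    return sum((y[i] - sum(X[i][j] * b[j] for j in range(p))) ** 2 for i in range(n))


def edge_posteriors(regulators, target, max_size=2):
    """Approximate posterior probability of each regulator -> target edge."""
    n = len(target)
    names = list(regulators)
    models = []
    for size in range(max_size + 1):
        for subset in itertools.combinations(names, size):
            rss = max(fit_rss([regulators[g] for g in subset], target), 1e-12)
            k = size + 1  # slopes plus intercept
            models.append((subset, n * math.log(rss / n) + k * math.log(n)))
    best = min(bic for _, bic in models)
    # Model weight proportional to exp(-BIC / 2), shifted for numerical stability.
    weights = [(s, math.exp(-0.5 * (bic - best))) for s, bic in models]
    total = sum(w for _, w in weights)
    return {g: sum(w for s, w in weights if g in s) / total for g in names}


# Toy data: the target follows g1 plus small noise; g2 and g3 are unrelated.
regulators = {
    "g1": [0.1, 0.5, 0.9, 1.3, 1.7, 2.1, 2.5, 2.9],
    "g2": [1.0, 0.2, 0.8, 0.1, 0.9, 0.3, 0.7, 0.4],
    "g3": [0.5, 0.4, 0.6, 0.5, 0.3, 0.6, 0.4, 0.5],
}
noise = [0.05, 0.02, -0.04, -0.03, 0.03, 0.04, -0.05, -0.02]
target = [2.0 * a + e for a, e in zip(regulators["g1"], noise)]
print(edge_posteriors(regulators, target))
```

The edge score for a regulator is the summed weight of every model that contains it, which is what distinguishes model averaging from picking a single best regression.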
Affiliation(s)
- Xun Huang
- BIOSS Centre for Biological Signalling Studies, University of Freiburg, 79104, Freiburg, Germany.
|
32
|
Hagen DR, Tidor B. Efficient Bayesian estimates for discrimination among topologically different systems biology models. MOLECULAR BIOSYSTEMS 2014; 11:574-84. [PMID: 25460000 DOI: 10.1039/c4mb00276h] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/21/2022]
Abstract
A major effort in systems biology is the development of mathematical models that describe complex biological systems at multiple scales and levels of abstraction. Determining the topology (the set of interactions) of a biological system from observations of the system's behavior is an important and difficult problem. Here we present and demonstrate new methodology for efficiently computing the probability distribution over a set of topologies based on consistency with existing measurements. Key features of the new approach include derivation in a Bayesian framework, incorporation of prior probability distributions of topologies and parameters, and use of an analytically integrable linearization based on the Fisher information matrix that is responsible for large gains in efficiency. The new method was demonstrated on a collection of four biological topologies representing a kinase and phosphatase that operate in opposition to each other with either processive or distributive kinetics, giving 8-12 parameters for each topology. The linearization produced an approximate result very rapidly (CPU minutes) that was highly accurate on its own, as compared to a Monte Carlo method guaranteed to converge to the correct answer but at greater cost (CPU weeks). The Monte Carlo method developed and applied here used the linearization method as a starting point and importance sampling to approach the Bayesian answer in acceptable time. Other inexpensive methods to estimate probabilities produced poor approximations for this system, with likelihood estimation showing its well-known bias toward topologies with more parameters and the Akaike and Schwarz Information Criteria showing a strong bias toward topologies with fewer parameters. These results suggest that this linear approximation may be an effective compromise, providing an answer whose accuracy is near the true Bayesian answer, but at a cost near the common heuristics.
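For readers unfamiliar with the two heuristics named in the abstract, the sketch below shows the standard Akaike and Schwarz (Bayesian) information criteria and how their parameter penalties favour smaller models. The log-likelihoods and the sample size are made-up numbers for illustration only; only the 8 and 12 parameter counts echo the range quoted in the abstract.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Schwarz (Bayesian) information criterion: k ln n - 2 ln L (lower is better)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


# Hypothetical topologies: the larger one fits slightly better (higher ln L)
# but pays a larger complexity penalty under both criteria.
small = {"log_likelihood": -40.0, "n_params": 8}
large = {"log_likelihood": -37.0, "n_params": 12}
n_obs = 50  # assumed number of measurements

for name, m in (("small", small), ("large", large)):
    print(name, aic(**m), round(bic(n_obs=n_obs, **m), 2))
```

With these numbers both criteria prefer the 8-parameter topology even though the 12-parameter one has the higher likelihood, which is the direction of bias the abstract reports.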
Affiliation(s)
- David R Hagen
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA.
|
33
|
Lu T, Wang M. Investigate Data Dependency for Dynamic Gene Regulatory Network Identification through High-dimensional Differential Equation Approach. COMMUN STAT-SIMUL C 2014. [DOI: 10.1080/03610918.2014.902224] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/25/2022]
|
34
|
Wu H, Lu T, Xue H, Liang H. Sparse Additive Ordinary Differential Equations for Dynamic Gene Regulatory Network Modeling. J Am Stat Assoc 2014; 109:700-716. [PMID: 25061254 DOI: 10.1080/01621459.2013.859617] [Citation(s) in RCA: 53] [Impact Index Per Article: 4.8] [Indexed: 12/21/2022]
Abstract
The gene regulation network (GRN) is a high-dimensional complex system, which can be represented by various mathematical or statistical models. The ordinary differential equation (ODE) model is one of the popular dynamic GRN models. High-dimensional linear ODE models have been proposed to identify GRNs, but with a limitation of the linear regulation effect assumption. In this article, we propose a sparse additive ODE (SA-ODE) model, coupled with ODE estimation methods and adaptive group LASSO techniques, to model dynamic GRNs that could flexibly deal with nonlinear regulation effects. The asymptotic properties of the proposed method are established and simulation studies are performed to validate the proposed approach. An application example for identifying the nonlinear dynamic GRN of T-cell activation is used to illustrate the usefulness of the proposed method.
Affiliation(s)
- Hulin Wu
- Department of Biostatistics and Computational Biology, University of Rochester, School of Medicine and Dentistry, 601 Elmwood Avenue, Box 630, Rochester, NY 14642
- Tao Lu
- Department of Epidemiology and Biostatistics, State University of New York, Albany, NY 12144
- Hongqi Xue
- Department of Biostatistics and Computational Biology, University of Rochester, School of Medicine and Dentistry, 601 Elmwood Avenue, Box 630, Rochester, NY 14642
- Hua Liang
- Department of Statistics, George Washington University, 801 22nd St. NW, Washington, D.C. 20052
|
35
|
Wu S, Liu ZP, Qiu X, Wu H. Modeling genome-wide dynamic regulatory network in mouse lungs with influenza infection using high-dimensional ordinary differential equations. PLoS One 2014; 9:e95276. [PMID: 24802016 PMCID: PMC4011728 DOI: 10.1371/journal.pone.0095276] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Received: 09/29/2013] [Accepted: 03/26/2014] [Indexed: 12/20/2022]
Abstract
The immune response to viral infection is regulated by an intricate network of many genes and their products. The reverse engineering of gene regulatory networks (GRNs) using mathematical models from time course gene expression data collected after influenza infection is key to our understanding of the mechanisms involved in controlling influenza infection within a host. A five-step pipeline: detection of temporally differentially expressed genes, clustering genes into co-expressed modules, identification of network structure, parameter estimate refinement, and functional enrichment analysis, is developed for reconstructing high-dimensional dynamic GRNs from genome-wide time course gene expression data. Applying the pipeline to the time course gene expression data from influenza-infected mouse lungs, we have identified 20 distinct temporal expression patterns in the differentially expressed genes and constructed a module-based dynamic network using a linear ODE model. Both intra-module and inter-module annotations and regulatory relationships of our inferred network show some interesting findings and are highly consistent with existing knowledge about the immune response in mice after influenza infection. The proposed method is a computationally efficient, data-driven pipeline bridging experimental data, mathematical modeling, and statistical analysis. The application to the influenza infection data elucidates the potentials of our pipeline in providing valuable insights into systematic modeling of complicated biological processes.
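A linear ODE network of the kind mentioned in the abstract has the form dx/dt = A x, where x holds one expression level per module and A[i][j] is the effect of module j on module i. The sketch below only simulates such a model with forward Euler steps; it does not reproduce the paper's structure identification or parameter refinement. The two-module matrix, the step size, and the function name are assumptions chosen so that the result can be checked against the analytic solution.

```python
import math


def simulate_linear_ode(A, x0, dt=0.001, t_end=2.0):
    """Integrate dx/dt = A x from x0 with forward Euler and return the final state."""
    x = list(x0)
    for _ in range(int(round(t_end / dt))):
        dx = [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(x))]
        x = [xi + dt * dxi for xi, dxi in zip(x, dx)]
    return x


# Hypothetical two-module network: module 0 decays on its own,
# module 1 is activated by module 0 and decays.
A = [[-0.5, 0.0],
     [1.0, -1.0]]
final = simulate_linear_ode(A, [1.0, 0.0])

# Analytic solution at t = 2 for comparison:
# x0(t) = exp(-0.5 t), x1(t) = 2 (exp(-0.5 t) - exp(-t)).
exact = [math.exp(-1.0), 2.0 * (math.exp(-1.0) - math.exp(-2.0))]
print(final, exact)
```

Forward Euler is used here only because it is short; a stiff or large network would normally call for an adaptive solver.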
Affiliation(s)
- Shuang Wu
- Department of Biostatistics and Computational Biology, University of Rochester, Rochester, New York, United States of America
- Zhi-Ping Liu
- Department of Biostatistics and Computational Biology, University of Rochester, Rochester, New York, United States of America
- Xing Qiu
- Department of Biostatistics and Computational Biology, University of Rochester, Rochester, New York, United States of America
- Hulin Wu
- Department of Biostatistics and Computational Biology, University of Rochester, Rochester, New York, United States of America
|
36
|
Candia J, Banavar JR, Losert W. Understanding health and disease with multidimensional single-cell methods. JOURNAL OF PHYSICS. CONDENSED MATTER : AN INSTITUTE OF PHYSICS JOURNAL 2014; 26:073102. [PMID: 24451406 PMCID: PMC4020281 DOI: 10.1088/0953-8984/26/7/073102] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 06/03/2023]
Abstract
Current efforts in the biomedical sciences and related interdisciplinary fields are focused on gaining a molecular understanding of health and disease, which is a problem of daunting complexity that spans many orders of magnitude in characteristic length scales, from small molecules that regulate cell function to cell ensembles that form tissues and organs working together as an organism. In order to uncover the molecular nature of the emergent properties of a cell, it is essential to measure multiple-cell components simultaneously in the same cell. In turn, cell heterogeneity requires multiple-cells to be measured in order to understand health and disease in the organism. This review summarizes current efforts towards a data-driven framework that leverages single-cell technologies to build robust signatures of healthy and diseased phenotypes. While some approaches focus on multicolor flow cytometry data and other methods are designed to analyze high-content image-based screens, we emphasize the so-called Supercell/SVM paradigm (recently developed by the authors of this review and collaborators) as a unified framework that captures mesoscopic-scale emergence to build reliable phenotypes. Beyond their specific contributions to basic and translational biomedical research, these efforts illustrate, from a larger perspective, the powerful synergy that might be achieved from bringing together methods and ideas from statistical physics, data mining, and mathematics to solve the most pressing problems currently facing the life sciences.
Affiliation(s)
- Julián Candia
- Department of Physics, University of Maryland, College Park, MD 20742, USA. School of Medicine, University of Maryland, Baltimore, MD 21201, USA. IFLYSIB and CONICET, University of La Plata, 1900 La Plata, Argentina
|
37
|
Zheng Z, Christley S, Chiu WT, Blitz IL, Xie X, Cho KWY, Nie Q. Inference of the Xenopus tropicalis embryonic regulatory network and spatial gene expression patterns. BMC SYSTEMS BIOLOGY 2014; 8:3. [PMID: 24397936 PMCID: PMC3896677 DOI: 10.1186/1752-0509-8-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 04/16/2013] [Accepted: 12/19/2013] [Indexed: 11/10/2022]
Abstract
BACKGROUND During embryogenesis, signaling molecules produced by one cell population direct gene regulatory changes in neighboring cells and influence their developmental fates and spatial organization. One of the earliest events in the development of the vertebrate embryo is the establishment of three germ layers, consisting of the ectoderm, mesoderm and endoderm. Attempts to measure gene expression in vivo in different germ layers and cell types are typically complicated by the heterogeneity of cell types within biological samples (i.e., embryos), as the responses of individual cell types are intermingled into an aggregate observation of heterogeneous cell types. Here, we propose a novel method to elucidate gene regulatory circuits from these aggregate measurements in embryos of the frog Xenopus tropicalis using gene network inference algorithms and then test the ability of the inferred networks to predict spatial gene expression patterns. RESULTS We use two inference models with different underlying assumptions that incorporate existing network information, an ODE model for steady-state data and a Markov model for time series data, and contrast the performance of the two models. We apply our method to both control and knockdown embryos at multiple time points to reconstruct the core mesoderm and endoderm regulatory circuits. Those inferred networks are then used in combination with known dorsal-ventral spatial expression patterns of a subset of genes to predict spatial expression patterns for other genes. Both models are able to predict spatial expression patterns for some of the core mesoderm and endoderm genes, but interestingly of different gene subsets, suggesting that neither model is sufficient to recapitulate all of the spatial patterns, yet they are complementary for the patterns that they do capture. 
CONCLUSION The presented methodology of gene network inference combined with spatial pattern prediction provides an additional layer of validation to elucidate the regulatory circuits controlling the spatial-temporal dynamics in embryonic development.
Affiliation(s)
- Qing Nie
- Department of Mathematics, University of California, Irvine, CA 92697, USA.
|
38
|
Strakova E, Bobek J, Zikova A, Vohradsky J. Global features of gene expression on the proteome and transcriptome levels in S. coelicolor during germination. PLoS One 2013; 8:e72842. [PMID: 24039809 PMCID: PMC3767685 DOI: 10.1371/journal.pone.0072842] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 06/12/2013] [Accepted: 07/15/2013] [Indexed: 11/18/2022]
Abstract
Streptomycetes have been studied mostly as producers of secondary metabolites, while the transition from dormant spores to an exponentially growing culture has largely been ignored. Here, we focus on a comparative analysis of fluorescently and radioactively labeled proteome and microarray acquired transcriptome expressed during the germination of Streptomyces coelicolor. The time-dynamics is considered, starting from dormant spores through 5.5 hours of growth with 13 time points. Time series of the gene expressions were analyzed using correlation, principal components analysis and an analysis of coding genes utilization. Principal component analysis was used to identify principal kinetic trends in gene expression and the corresponding genes driving S. coelicolor germination. In contrast with the correlation analysis, global trends in the gene/protein expression reflected by the first principal components showed that the prominent patterns in both the protein and the mRNA domains are surprisingly well correlated. Analysis of the number of expressed genes identified functional groups activated during different time intervals of the germination.
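Extracting a dominant kinetic trend from a genes-by-time-points matrix can be sketched with power iteration. The example assumes each gene's profile is centred on its own mean and takes the leading eigenvector of the time-by-time scatter matrix as the first principal trend; the abstract does not state the preprocessing the authors used, and the three-gene, four-time-point data set is invented for illustration.

```python
import math


def leading_trend(expr, iterations=200):
    """First principal temporal trend of a genes x time-points matrix (power iteration)."""
    centered = [[v - sum(row) / len(row) for v in row] for row in expr]
    t = len(centered[0])
    # Time-by-time scatter matrix C = X^T X of the gene-centred data.
    C = [[sum(row[a] * row[b] for row in centered) for b in range(t)] for a in range(t)]
    v = [float(i + 1) for i in range(t)]  # arbitrary non-constant start vector
    for _ in range(iterations):
        w = [sum(C[a][b] * v[b] for b in range(t)) for a in range(t)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Fix the sign so the last time point is positive (eigenvectors are sign-ambiguous).
    if v[-1] < 0:
        v = [-x for x in v]
    return v


# Hypothetical expression profiles: two genes rise over time, one falls.
expr = [
    [0.0, 1.0, 2.0, 3.0],
    [0.0, 2.0, 4.0, 6.0],
    [3.0, 2.5, 2.0, 1.5],
]
print(leading_trend(expr))
```

Projecting each centred gene profile onto the returned vector gives a signed score that separates genes following the trend from genes running against it.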
Affiliation(s)
- Eva Strakova
- Laboratory of Bioinformatics, Institute of Microbiology, Academy of Sciences of the Czech Republic, Prague, Czech Republic
- Jan Bobek
- Laboratory of Bioinformatics, Institute of Microbiology, Academy of Sciences of the Czech Republic, Prague, Czech Republic
- Institute of Immunology and Microbiology, First Faculty of Medicine, Charles University in Prague, Prague, Czech Republic
- Alice Zikova
- Laboratory of Bioinformatics, Institute of Microbiology, Academy of Sciences of the Czech Republic, Prague, Czech Republic
- Jiri Vohradsky
- Laboratory of Bioinformatics, Institute of Microbiology, Academy of Sciences of the Czech Republic, Prague, Czech Republic
|
39
|
Chen BS, Li CW. Analysing microarray data in drug discovery using systems biology. Expert Opin Drug Discov 2013; 2:755-68. [PMID: 23488963 DOI: 10.1517/17460441.2.5.755] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Indexed: 12/16/2022]
Abstract
The innovation of present drug design focuses on new targets. However, compound efficacy and safety in human metabolism, including toxicity and pharmacokinetic profiles, but not target selection, are the criteria that determine which drug candidates enter the clinic. Systems biology approaches to disease are developed from the idea that disease-perturbed regulatory networks differ from their normal counterparts. Microarray data analyses reveal global changes in gene or protein expression in response to genetic and environmental changes and, accordingly, are well suited to construct the normal, disease-perturbed and drug-affected networks, which are useful for drug discovery in the pharmaceutical industry. The integration of modelling, microarray data and systems biology approaches will allow for a true breakthrough in in silico absorption, distribution, metabolism, excretion and toxicity assessment in drug design. Therefore, drug discovery through systems biology by means of microarray analyses could significantly reduce the time and cost of new drug development.
Affiliation(s)
- Bor-Sen Chen
- National Tsing Hua University, Laboratory of Control and Systems Biology, 101, Sec 2, Kuang Fu Road, Hsinchu, 300, Taiwan
|
40
|
Barzel B, Barabási AL. Universality in network dynamics. NATURE PHYSICS 2013; 9:673-681. [PMID: 24319492 PMCID: PMC3852675 DOI: 10.1038/nphys2741] [Citation(s) in RCA: 140] [Impact Index Per Article: 11.7] [Received: 05/02/2012] [Accepted: 07/30/2013] [Indexed: 05/08/2023]
Abstract
Despite significant advances in characterizing the structural properties of complex networks, a mathematical framework that uncovers the universal properties of the interplay between the topology and the dynamics of complex systems continues to elude us. Here we develop a self-consistent theory of dynamical perturbations in complex systems, allowing us to systematically separate the contribution of the network topology and dynamics. The formalism covers a broad range of steady-state dynamical processes and offers testable predictions regarding the system's response to perturbations and the development of correlations. It predicts several distinct universality classes whose characteristics can be derived directly from the continuum equation governing the system's dynamics and which are validated on several canonical network-based dynamical systems, from biochemical dynamics to epidemic spreading. Finally, we collect experimental data pertaining to social and biological systems, demonstrating that we can accurately uncover their universality class even in the absence of an appropriate continuum theory that governs the system's dynamics.
Affiliation(s)
- Baruch Barzel
- Center for Complex Network Research and Departments of Physics, Computer Science and Biology, Northeastern University, Boston, Massachusetts 02115, USA; Center for Cancer Systems Biology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, Massachusetts 02115, USA
|
41
|
Wang L, Wang X, Arkin AP, Samoilov MS. Inference of gene regulatory networks from genome-wide knockout fitness data. Bioinformatics 2012; 29:338-46. [PMID: 23271269 PMCID: PMC3562072 DOI: 10.1093/bioinformatics/bts634] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Indexed: 01/27/2023]
Abstract
Motivation: Genome-wide fitness is an emerging type of high-throughput biological data generated for individual organisms by creating libraries of knockouts, subjecting them to broad ranges of environmental conditions, and measuring the resulting clone-specific fitnesses. Since fitness is an organism-scale measure of gene regulatory network behaviour, it may offer certain advantages when insights into such phenotypical and functional features are of primary interest over individual gene expression. Previous works have shown that genome-wide fitness data can be used to uncover novel gene regulatory interactions, when compared with results of more conventional gene expression analysis. Yet, to date, few algorithms have been proposed for systematically using genome-wide mutant fitness data for gene regulatory network inference. Results: In this article, we describe a model and propose an inference algorithm for using fitness data from knockout libraries to identify underlying gene regulatory networks. Unlike most prior methods, the presented approach captures not only structural, but also dynamical and non-linear nature of biomolecular systems involved. A state–space model with non-linear basis is used for dynamically describing gene regulatory networks. Network structure is then elucidated by estimating unknown model parameters. Unscented Kalman filter is used to cope with the non-linearities introduced in the model, which also enables the algorithm to run in on-line mode for practical use. Here, we demonstrate that the algorithm provides satisfying results for both synthetic data as well as empirical measurements of GAL network in yeast Saccharomyces cerevisiae and TyrR–LiuR network in bacteria Shewanella oneidensis. Availability: MATLAB code and datasets are available to download at http://www.duke.edu/~lw174/Fitness.zip and http://genomics.lbl.gov/supplemental/fitness-bioinf/ Contact: wangx@ee.columbia.edu or mssamoilov@lbl.gov Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Liming Wang
- Department of Electrical and Computer Engineering, Duke University, Durham, NC 27708, USA
|
42
|
Gąska M, Kuśmider M, Solich J, Faron-Górecka A, Krawczyk MJ, Kułakowski K, Dziedzicka-Wasylewska M. Analysis of region-specific changes in gene expression upon treatment with citalopram and desipramine reveals temporal dynamics in response to antidepressant drugs at the transcriptome level. Psychopharmacology (Berl) 2012; 223:281-97. [PMID: 22547330 PMCID: PMC3438400 DOI: 10.1007/s00213-012-2714-0] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Received: 12/19/2011] [Accepted: 03/30/2012] [Indexed: 12/20/2022]
Abstract
RATIONALE The notion that the onset of action of antidepressant drugs (ADs) takes weeks is widely accepted; however, the sequence of events necessary for therapeutic effects still remains obscure. OBJECTIVE We aimed to evaluate a time-course of ADs-induced alterations in the expression of 95 selected genes in 4 regions of the rat brain: the prefrontal and cingulate cortices, the dentate gyrus of the hippocampus, and the amygdala. METHODS We employed RT-PCR array to evaluate changes during a time-course (1, 3, 7, 14, and 21 days) of treatments with desipramine (DMI) and citalopram (CIT). In addition to repeated treatment, we also conducted acute treatment (a single dose of drug followed by the same time intervals as the repeated doses). RESULTS Time-dependent and structure-specific changes in gene expression patterns allowed us to identify spatiotemporal differences in the molecular action of two ADs. Singular value decomposition analysis revealed differences in the global gene expression profiles between treatment types. The numbers of characteristic modes were generally smaller after CIT treatment than after DMI treatment. Analysis of the dynamics of gene expression revealed that the most significant changes concerned immediate early genes, whose expression was also visualized by in situ hybridization. Transcription factor binding site analysis revealed an over-representation of serum response factor binding sites in the promoters of genes that changed upon treatment with both ADs. CONCLUSIONS The observed gene expression patterns were highly dynamic, with oscillations and peaks at various time points of treatment. Our study also revealed novel potential targets of antidepressant action, i.e., Dbp and Id1 genes.
Affiliation(s)
- Magdalena Gąska
- Department of Pharmacology, Institute of Pharmacology Polish Academy of Sciences, Smętna 12 Street, 31-343 Krakow, Poland.
|
43
|
Ma X, Gao L. Discovering protein complexes in protein interaction networks via exploring the weak ties effect. BMC SYSTEMS BIOLOGY 2012; 6 Suppl 1:S6. [PMID: 23046740 PMCID: PMC3403613 DOI: 10.1186/1752-0509-6-s1-s6] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Indexed: 02/02/2023]
Abstract
BACKGROUND: Studying protein complexes is very important for understanding biological processes, since it helps reveal the structure-functionality relationships in biological networks, and much attention has been paid to accurately predicting protein complexes from the increasing amount of protein-protein interaction (PPI) data. Most of the available algorithms are based on the assumption that dense subgraphs correspond to complexes, failing to take into account the inherent organization within protein complexes and the roles of edges. Thus, there is a critical need to investigate the possibility of discovering protein complexes using the topological information hidden in edges. RESULTS: To investigate the roles of edges in PPI networks, we show that the edges connecting vertices that are less similar in topology are more significant in maintaining the global connectivity, indicating the weak ties phenomenon in PPI networks. We further demonstrate that there is a negative relation between the weak tie strength and the topological similarity. By using the bridges, a reliable virtual network is constructed, in which each maximal clique corresponds to the core of a complex. By this notion, the detection of the protein complexes is transformed into a classic all-clique problem. A novel core-attachment based method is developed, which detects the cores and attachments, respectively. A comprehensive comparison among the existing algorithms and our algorithm has been made by comparing the predicted complexes against benchmark complexes. CONCLUSIONS: We proved that the weak tie effect exists in the PPI network and demonstrated that the density is insufficient to characterize the topological structure of protein complexes. Furthermore, the experimental results on the yeast PPI network show that the proposed method outperforms the state-of-the-art algorithms. The analysis of the modules detected by the present algorithm suggests that most of these modules have clear biological significance in the context of complexes, suggesting that the roles of edges are critical in discovering protein complexes.
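The weak ties idea in this abstract can be illustrated on a toy graph. The sketch below assumes Jaccard overlap of neighbourhoods as the topological similarity of an edge; the abstract does not say which similarity measure the authors use, and all function names here are hypothetical.

```python
def build_adjacency(edges):
    """Undirected adjacency map: node -> set of neighbours."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def jaccard_similarity(adj, u, v):
    """Assumed similarity of edge (u, v): shared neighbours over all neighbours."""
    nu = adj[u] - {v}
    nv = adj[v] - {u}
    union = nu | nv
    return len(nu & nv) / len(union) if union else 0.0

def rank_edges_by_similarity(edges):
    """Edges sorted from least to most topologically similar endpoints."""
    adj = build_adjacency(edges)
    return sorted((jaccard_similarity(adj, u, v), (u, v)) for u, v in edges)

def is_connected(nodes, edges):
    """Depth-first search check that every node is reachable from one start node."""
    adj = build_adjacency(edges)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        n = stack.pop()
        for m in adj.get(n, ()):
            if m not in seen:
                seen.add(m)
                stack.append(m)
    return seen == set(nodes)

# Toy network: two triangles joined by the single edge (c, d).
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
nodes = {n for e in edges for n in e}

weakest_score, weakest_edge = rank_edges_by_similarity(edges)[0]
remaining = [e for e in edges if e != weakest_edge]
still_connected = is_connected(nodes, remaining)
```

In this toy case the least similar edge is the one joining the two triangles, and removing it disconnects the graph, which is the behaviour the abstract attributes to weak ties.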
Affiliation(s)
- Xiaoke Ma
- School of Computer Science and Technology, Xidian University, 710071, PR China
- Lin Gao
- School of Computer Science and Technology, Xidian University, 710071, PR China
|
44
|
Tierney L, Linde J, Müller S, Brunke S, Molina JC, Hube B, Schöck U, Guthke R, Kuchler K. An Interspecies Regulatory Network Inferred from Simultaneous RNA-seq of Candida albicans Invading Innate Immune Cells. Front Microbiol 2012; 3:85. [PMID: 22416242 PMCID: PMC3299011 DOI: 10.3389/fmicb.2012.00085] [Citation(s) in RCA: 98] [Impact Index Per Article: 7.5] [Received: 12/08/2011] [Accepted: 02/20/2012] [Indexed: 12/31/2022]
Abstract
The ability to adapt to diverse micro-environmental challenges encountered within a host is of pivotal importance to the opportunistic fungal pathogen Candida albicans. We have quantified C. albicans and M. musculus gene expression dynamics during phagocytosis by dendritic cells in a genome-wide, time-resolved analysis using simultaneous RNA-seq. A robust network inference map was generated from this dataset using NetGenerator, predicting novel interactions between the host and the pathogen. We experimentally verified predicted interdependent sub-networks comprising Hap3 in C. albicans, and Ptx3 and Mta2 in M. musculus. Remarkably, binding of recombinant Ptx3 to the C. albicans cell wall was found to regulate the expression of fungal Hap3 target genes as predicted by the network inference model. Pre-incubation of C. albicans with recombinant Ptx3 significantly altered the expression of Mta2 target cytokines such as IL-2 and IL-4 in a Hap3-dependent manner, further suggesting a role for Mta2 in host-pathogen interplay as predicted in the network inference model. We propose an integrated model for the functionality of these sub-networks during fungal invasion of immune cells, according to which binding of Ptx3 to the C. albicans cell wall induces remodeling via fungal Hap3 target genes, thereby altering the immune response to the pathogen. We show the applicability of network inference to predict interactions between host-pathogen pairs, demonstrating the usefulness of this systems biology approach to decipher mechanisms of microbial pathogenesis.
Affiliation(s)
- Lanay Tierney
- Christian Doppler Laboratory for Infection Biology, Max F. Perutz Laboratories, Medical University of Vienna, Vienna, Austria
|
45
|
Lu T, Liang H, Li H, Wu H. High Dimensional ODEs Coupled with Mixed-Effects Modeling Techniques for Dynamic Gene Regulatory Network Identification. J Am Stat Assoc 2012. [PMID: 23204614 DOI: 10.1198/jasa.2011.ap10194] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.5] [Indexed: 11/21/2022]
Abstract
Gene regulation is a complicated process. The interaction of many genes and their products forms an intricate biological network. Identification of this dynamic network will help us understand the biological process in a systematic way. However, the construction of such a dynamic network is very challenging for a high-dimensional system. In this article we propose to use a set of ordinary differential equations (ODEs), coupled with dimension reduction by clustering and mixed-effects modeling techniques, to model the dynamic gene regulatory network (GRN). The ODE models allow us to quantify both positive and negative gene regulations as well as feedback effects of one set of genes in a functional module on the dynamic expression changes of the genes in another functional module, which results in a directed graph network. A five-step procedure, Clustering, Smoothing, regulation Identification, parameter Estimates refining and Function enrichment analysis (CSIEF), is developed to identify the ODE-based dynamic GRN. In the proposed CSIEF procedure, a series of cutting-edge statistical methods and techniques are employed, including nonparametric mixed-effects models with a mixture distribution for clustering, nonparametric mixed-effects smoothing-based methods for ODE models, smoothly clipped absolute deviation (SCAD)-based variable selection, and the stochastic approximation EM (SAEM) approach for mixed-effects ODE model parameter estimation. The key step of the proposed procedure, the SCAD-based variable selection, is justified by investigating its asymptotic properties and validated by Monte Carlo simulations. We apply the proposed method to identify the dynamic GRN for yeast cell cycle progression data. We are able to annotate the identified modules through function enrichment analyses. Some interesting biological findings are discussed. The proposed procedure is a promising tool for constructing a general dynamic GRN and more complicated dynamic networks.
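The SCAD penalty used in the variable selection step has a standard closed form, which the NumPy sketch below evaluates. The function name `scad_penalty` and the default `a = 3.7` (a value commonly suggested in the SCAD literature) are choices made here for illustration; the abstract does not state the tuning values the authors used.

```python
import numpy as np

def scad_penalty(theta, lam, a=3.7):
    """Smoothly clipped absolute deviation penalty, evaluated elementwise.

    Linear (lasso-like) for |theta| <= lam, quadratic for lam < |theta| <= a*lam,
    and constant beyond a*lam, so large coefficients are not shrunk further.
    """
    t = np.abs(np.asarray(theta, dtype=float))
    linear = lam * t
    quadratic = (2 * a * lam * t - t**2 - lam**2) / (2 * (a - 1))
    constant = lam**2 * (a + 1) / 2
    return np.where(t <= lam, linear,
                    np.where(t <= a * lam, quadratic, constant))

# Illustrative coefficient values (not data from the paper).
penalties = scad_penalty(np.array([0.0, 0.5, 2.0, 10.0]), lam=1.0)
```

The three pieces join continuously at `lam` and `a * lam`. The flat tail is what separates SCAD from a plain L1 penalty: small coefficients are pushed towards zero while large ones are left nearly unbiased.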
Affiliation(s)
- Tao Lu
- Department of Biostatistics and Computational Biology, School of Medicine and Dentistry, University of Rochester, Rochester, New York 14642
|
46
|
Linde J, Hortschansky P, Fazius E, Brakhage AA, Guthke R, Haas H. Regulatory interactions for iron homeostasis in Aspergillus fumigatus inferred by a Systems Biology approach. BMC SYSTEMS BIOLOGY 2012; 6:6. [PMID: 22260221 PMCID: PMC3305660 DOI: 10.1186/1752-0509-6-6] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.5] [Received: 10/05/2011] [Accepted: 01/19/2012] [Indexed: 01/01/2023]
Abstract
BACKGROUND: In Systems Biology, iterations of wet-lab experiments followed by modelling approaches and model-inspired experiments describe a cyclic workflow. This approach is especially useful for the inference of gene regulatory networks based on high-throughput gene expression data. Experiments can verify or falsify the predicted interactions, allowing further refinement of the network model. Aspergillus fumigatus is a major human fungal pathogen. One important virulence trait is its ability to gain sufficient amounts of iron during the infection process. Even though some regulatory interactions are known, we are still far from a complete understanding of the way iron homeostasis is regulated. RESULTS: In this study, we make use of a reverse engineering strategy to infer a regulatory network controlling iron homeostasis in A. fumigatus. The inference approach utilizes the temporal change in expression data after a change from iron-depleted to iron-replete conditions. The modelling strategy is based on a set of linear differential equations and offers the possibility to integrate known regulatory interactions as prior knowledge. Moreover, it makes use of important selection criteria, such as sparseness and robustness. By compiling a list of known regulatory interactions for iron homeostasis in A. fumigatus and softly integrating them during network inference, we are able to predict new interactions between transcription factors and target genes. The proposed activation of the gene expression of hapX by the transcriptional regulator SrbA constitutes a so far unknown way of regulating iron homeostasis based on the amount of metabolically available iron. This interaction has been verified by Northern blots in a recent experimental study. In order to improve the reliability of the predicted network, the results of this experimental study have been added to the set of prior knowledge. The final network includes three SrbA target genes. Based on motif searching within the regulatory regions of these genes, we identify potential DNA-binding sites for SrbA. Our wet-lab experiments demonstrate high-affinity binding capacity of SrbA to the promoters of hapX, hemA and srbA. CONCLUSIONS: This study presents an application of the typical Systems Biology cycle and is based on cooperation between wet-lab experimentalists and in silico modellers. The results underline that using prior knowledge during network inference helps to predict biologically important interactions. Together with the experimental results, we indicate a novel iron homeostasis regulating system sensing the amount of metabolically available iron and identify the binding site of iron-related SrbA target genes. It will be of high interest to study whether these regulatory interactions are also important for close relatives of A. fumigatus and other pathogenic fungi, such as Candida albicans.
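The modelling idea in this abstract (linear differential equations with softly integrated prior knowledge) can be sketched in a few lines. This is not the authors' inference tool. It is a generic weighted-ridge stand-in in which edges listed as prior knowledge receive a weaker penalty than unsupported edges. The function name, the penalty values and the toy network are all assumptions made for illustration.

```python
import numpy as np

def infer_linear_network(X, dt, prior_mask, strong=10.0, weak=0.1):
    """Fit dx/dt = W x from a time series X (time points x genes).

    prior_mask[i, j] is True where an effect of gene j on gene i is known a priori;
    such edges get the weaker penalty, which integrates the prior softly
    (the data can still override it).
    """
    X = np.asarray(X, dtype=float)
    dX = (X[1:] - X[:-1]) / dt      # forward-difference derivative estimates
    S = X[:-1]                      # states at which the derivatives are taken
    n = X.shape[1]
    W = np.zeros((n, n))
    for i in range(n):
        penalty = np.where(prior_mask[i], weak, strong)
        A = S.T @ S + np.diag(penalty)          # per-edge ridge penalty
        W[i] = np.linalg.solve(A, S.T @ dX[:, i])
    return W

# Toy cascade: gene 0 activates gene 1, gene 1 activates gene 2, all self-degrade.
W_true = np.array([[-0.5, 0.0, 0.0],
                   [0.8, -0.5, 0.0],
                   [0.0, 0.8, -0.5]])
dt = 0.1
X = [np.array([1.0, 0.0, 0.0])]
for _ in range(50):
    X.append(X[-1] + dt * W_true @ X[-1])       # Euler simulation of the toy system
X = np.array(X)

prior_mask = W_true != 0                        # pretend the true edges are the prior
W_est = infer_linear_network(X, dt, prior_mask)
```

A ridge penalty is used here only because it has a closed-form solution. A sparseness criterion of the kind the abstract mentions would more naturally correspond to an L1-type penalty or an explicit search over network structures.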
Affiliation(s)
- Jörg Linde
- Leibniz Institute for Natural Product Research and Infection Biology-Hans Knöll Institute, Jena, Germany.
|
47
|
Hidalgo MMR, Ruiz-Medina MD. Local wavelet-vaguelette-based functional classification of gene expression data. Biom J 2012; 54:75-93. [PMID: 22213074 DOI: 10.1002/bimj.201000135] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 07/02/2010] [Revised: 03/11/2011] [Accepted: 09/08/2011] [Indexed: 11/08/2022]
Abstract
This paper focuses on the problem of functional statistical classification of gene expression curves. A local-wavelet-vaguelette-based functional logistic regression approach is presented. This approach is especially suitable for the classification of non-stationary singular (non-differentiable) curves. The performance of the proposed methodology is illustrated by implementing it for the classification of yeast cell-cycle temporal gene expression profiles. A simulation study is also carried out for comparison with other functional classification methodologies.
Affiliation(s)
- Margarita M Rincón Hidalgo
- Department of Statistics and Operational Research, Universidad de Granada, Campus Fuente Nueva s/n, E-18071, Granada, Spain
|
48
|
Wu X, Li P, Wang N, Gong P, Perkins EJ, Deng Y, Zhang C. State Space Model with hidden variables for reconstruction of gene regulatory networks. BMC SYSTEMS BIOLOGY 2011; 5 Suppl 3:S3. [PMID: 22784622 PMCID: PMC3287571 DOI: 10.1186/1752-0509-5-s3-s3] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 12/05/2022]
Abstract
Background: The State Space Model (SSM) is a relatively new approach to inferring gene regulatory networks. It requires less computational time than Dynamic Bayesian Networks (DBN). There are two types of variables in the linear SSM: observed variables and hidden variables. SSM uses an iterative method, namely Expectation-Maximization, to infer regulatory relationships from microarray datasets. The hidden variables cannot be directly observed from experiments. How the number of hidden variables is determined has a significant impact on the accuracy of network inference. In this study, we used SSM to infer gene regulatory networks (GRNs) from synthetic time series datasets, investigated Bayesian Information Criterion (BIC) and Principal Component Analysis (PCA) approaches to determining the number of hidden variables in SSM, and evaluated the performance of SSM in comparison with DBN. Method: True GRNs and synthetic gene expression datasets were generated using GeneNetWeaver. Both DBN and linear SSM were used to infer GRNs from the synthetic datasets. The inferred networks were compared with the true networks. Results: Our results show that inference precision varied with the number of hidden variables. For some regulatory networks, the inference precision of DBN was higher, but SSM performed better in other cases. Although the overall performance of the two approaches is comparable, SSM is much faster and capable of inferring much larger networks than DBN. Conclusion: This study provides useful information for handling the hidden variables and improving the inference precision.
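One plausible reading of the PCA approach to choosing the number of hidden variables is to keep the smallest number of principal components that explains a chosen share of the variance. The sketch below implements that rule. The 0.99 threshold, the function name and the synthetic data are assumptions made here, and the abstract does not specify the exact criterion the authors applied.

```python
import numpy as np

def hidden_dim_by_pca(Y, variance_threshold=0.9):
    """Smallest number of principal components whose cumulative explained
    variance reaches variance_threshold. Y is time points x genes."""
    Y = np.asarray(Y, dtype=float)
    centered = Y - Y.mean(axis=0)
    s = np.linalg.svd(centered, compute_uv=False)   # singular values
    explained = s**2 / np.sum(s**2)
    cumulative = np.cumsum(explained)
    k = int(np.searchsorted(cumulative, variance_threshold) + 1)
    return min(k, len(s))                           # guard against rounding at 1.0

# Synthetic example: 10 observed genes driven by 2 hidden factors plus small noise.
rng = np.random.default_rng(0)
hidden = rng.normal(size=(40, 2))
loading = rng.normal(size=(2, 10))
Y = hidden @ loading + 0.01 * rng.normal(size=(40, 10))

k = hidden_dim_by_pca(Y, variance_threshold=0.99)
```

A BIC-based choice would instead fit the SSM for several candidate dimensions and pick the one minimising the criterion, which costs more computation but accounts for the fitted model's likelihood.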
Affiliation(s)
- Xi Wu
- School of Computing, University of Southern Mississippi, Hattiesburg, MS 39406, USA
|
49
|
Chen M, Zaas A, Woods C, Ginsburg GS, Lucas J, Dunson D, Carin L. Predicting Viral Infection From High-Dimensional Biomarker Trajectories. J Am Stat Assoc 2011; 106:1259-1279. [PMID: 23704802 DOI: 10.1198/jasa.2011.ap10611] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.2] [Indexed: 12/16/2022]
Abstract
There is often interest in predicting an individual's latent health status based on high-dimensional biomarkers that vary over time. Motivated by time-course gene expression array data that we have collected in two influenza challenge studies performed with healthy human volunteers, we develop a novel time-aligned Bayesian dynamic factor analysis methodology. The time course trajectories in the gene expressions are related to a relatively low-dimensional vector of latent factors, which vary dynamically starting at the latent initiation time of infection. Using a nonparametric cure rate model for the latent initiation times, we allow selection of the genes in the viral response pathway, variability among individuals in infection times, and a subset of individuals who are not infected. As we demonstrate using held-out data, this statistical framework allows accurate predictions of infected individuals in advance of the development of clinical symptoms, without labeled data and even when the number of biomarkers vastly exceeds the number of individuals under study. Biological interpretation of several of the inferred pathways (factors) is provided.
Affiliation(s)
- Minhua Chen
- Minhua Chen is Ph.D. Student, Electrical and Computer Engineering Department, Aimee Zaas is Associate Professor, Christopher Woods is Associate Professor, Geoffrey S. Ginsburg is Professor and Director of Genomic Medicine, and Joseph Lucas is Assistant Research Professor, Institute for Genome Sciences and Policy & Department of Medicine, David Dunson is Professor, Department of Statistical Science, and Lawrence Carin is Professor and Department Chair, Electrical and Computer Engineering Department, Duke University, Durham, NC 27708-0291
|
50
|
Haavisto O, Hyötyniemi H, Roos C. STATE SPACE MODELING OF YEAST GENE EXPRESSION DYNAMICS. J Bioinform Comput Biol 2011; 5:31-46. [PMID: 17477490 DOI: 10.1142/s0219720007002515] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 02/20/2006] [Revised: 06/02/2006] [Accepted: 10/11/2006] [Indexed: 11/18/2022]
Abstract
The combined interaction of all the genes forms a central part of the functional system of a cell. Thus, data-based modeling of the gene expression network in particular is currently one of the main challenges in the field of systems biology. However, the problem is extremely high-dimensional and complex, so that normal identification methods are usually not applicable, especially when aiming at dynamic models. We propose in this paper a subspace identification approach, which is well suited for high-dimensional system modeling; the presented modified version can also handle the underdetermined case with fewer data samples than variables (genes). The algorithm is applied to two public stress-response data sets collected from the yeast Saccharomyces cerevisiae. The obtained dynamic state space model is tested by comparing the simulation results with the measured data. It is shown that the identified model can describe relatively well the dynamics of the general stress-related changes in the expression of the complete yeast genome. However, it seems inevitable that more precise modeling of the dynamics of the whole genome would require experiments especially designed for systemic modeling.
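To make the underdetermined setting concrete (more genes than time points), the sketch below fits a linear dynamic model in a low-dimensional subspace obtained by SVD and then simulates it for comparison with the data. This is a simple reduced-rank least-squares fit, not the subspace identification algorithm of the paper. The function names, the rank and the synthetic data are assumptions.

```python
import numpy as np

def fit_reduced_linear_model(X, rank):
    """X is genes x time points. Returns (U_r, A) with x ~ U_r z and z[t+1] ~ A z[t]."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    U_r = U[:, :rank]                          # leading expression patterns
    Z = U_r.T @ X                              # low-dimensional state trajectory
    A = Z[:, 1:] @ np.linalg.pinv(Z[:, :-1])   # least-squares one-step dynamics
    return U_r, A

def simulate(U_r, A, x0, steps):
    """Roll the reduced model forward from the measured initial profile x0."""
    z = U_r.T @ x0
    out = [U_r @ z]
    for _ in range(steps - 1):
        z = A @ z
        out.append(U_r @ z)
    return np.stack(out, axis=1)

# Synthetic data: 50 genes, 15 time points, driven by a 2-dimensional decaying rotation.
rng = np.random.default_rng(0)
theta = 0.3
A_true = 0.9 * np.array([[np.cos(theta), -np.sin(theta)],
                         [np.sin(theta), np.cos(theta)]])
C = rng.normal(size=(50, 2))
z = np.array([1.0, 0.0])
states = []
for _ in range(15):
    states.append(z)
    z = A_true @ z
X = C @ np.stack(states, axis=1)

U_r, A_est = fit_reduced_linear_model(X, rank=2)
X_sim = simulate(U_r, A_est, X[:, 0], X.shape[1])
```

Comparing `X_sim` with `X` mirrors the validation step described in the abstract, where simulation results are checked against measured data. With noisy real data the rank would need to be chosen with care.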
Affiliation(s)
- Olli Haavisto
- Control Engineering Laboratory, Helsinki University of Technology, PO Box 5500, FI-02015 TKK, Finland.
|