1
Dibaeinia P, Ojha A, Sinha S. Interpretable AI for inference of causal molecular relationships from omics data. SCIENCE ADVANCES 2025; 11:eadk0837. [PMID: 39951525 PMCID: PMC11827637 DOI: 10.1126/sciadv.adk0837] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/08/2023] [Accepted: 01/14/2025] [Indexed: 02/16/2025]
Abstract
The discovery of molecular relationships from high-dimensional data is a major open problem in bioinformatics. Machine learning and feature attribution models have shown great promise in this context but lack causal interpretation. Here, we show that a popular feature attribution model, under certain assumptions, estimates an average of a causal quantity reflecting the direct influence of one variable on another. We leverage this insight to propose a precise definition of a gene regulatory relationship and implement a new tool, CIMLA (Counterfactual Inference by Machine Learning and Attribution Models), to identify differences in gene regulatory networks between biological conditions, a problem that has received great attention in recent years. Using extensive benchmarking on simulated data, we show that CIMLA is more robust to confounding variables and is more accurate than leading methods. Last, we use CIMLA to analyze a previously published single-cell RNA sequencing dataset from subjects with and without Alzheimer's disease (AD), discovering several potential regulators of AD.
Affiliation(s)
- Payam Dibaeinia
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Abhishek Ojha
- Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA
- Saurabh Sinha
- Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA
- H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA
2
Wang Y, Zheng P, Cheng YC, Wang Z, Aravkin A. WENDY: Covariance dynamics based gene regulatory network inference. Math Biosci 2024; 377:109284. [PMID: 39168402 DOI: 10.1016/j.mbs.2024.109284] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2024] [Revised: 06/25/2024] [Accepted: 08/16/2024] [Indexed: 08/23/2024]
Abstract
Determining gene regulatory network (GRN) structure is a central problem in biology, with a variety of inference methods available for different types of data. For a widely prevalent and challenging use case, namely single-cell gene expression data measured after intervention at multiple time points with unknown joint distributions, there is only one known specifically developed method, which does not fully utilize the rich information contained in this data type. We develop an inference method for the GRN in this case, netWork infErence by covariaNce DYnamics, dubbed WENDY. The core idea of WENDY is to model the dynamics of the covariance matrix, and solve this dynamics as an optimization problem to determine the regulatory relationships. To evaluate its effectiveness, we compare WENDY with other inference methods using synthetic data and experimental data. Our results demonstrate that WENDY performs well across different data sets.
Affiliation(s)
- Yue Wang
- Irving Institute for Cancer Dynamics and Department of Statistics, Columbia University, New York, 10027, NY, USA.
- Peng Zheng
- Institute for Health Metrics and Evaluation, Seattle, 98195, WA, USA; Department of Health Metrics Sciences, University of Washington, Seattle, 98195, WA, USA
- Yu-Chen Cheng
- Department of Data Science, Dana-Farber Cancer Institute, Boston, 02215, MA, USA; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, 02115, MA, USA; Center for Cancer Evolution, Dana-Farber Cancer Institute, Boston, 02215, MA, USA; Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, 02138, MA, USA
- Zikun Wang
- Laboratory of Genetics, The Rockefeller University, New York, 10065, NY, USA
- Aleksandr Aravkin
- Department of Applied Mathematics, University of Washington, Seattle, 98195, WA, USA
3
Wang Y, He S. Inference on autoregulation in gene expression with variance-to-mean ratio. J Math Biol 2023; 86:87. [PMID: 37131095 PMCID: PMC10154285 DOI: 10.1007/s00285-023-01924-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/16/2022] [Revised: 04/14/2023] [Accepted: 04/18/2023] [Indexed: 05/04/2023]
Abstract
Some genes can promote or repress their own expressions, which is called autoregulation. Although gene regulation is a central topic in biology, autoregulation is much less studied. In general, it is extremely difficult to determine the existence of autoregulation with direct biochemical approaches. Nevertheless, some papers have observed that certain types of autoregulations are linked to noise levels in gene expression. We generalize these results by two propositions on discrete-state continuous-time Markov chains. These two propositions form a simple but robust method to infer the existence of autoregulation from gene expression data. This method only needs to compare the mean and variance of the gene expression level. Compared to other methods for inferring autoregulation, our method only requires non-interventional one-time data, and does not need to estimate parameters. Besides, our method has few restrictions on the model. We apply this method to four groups of experimental data and find some genes that might have autoregulation. Some inferred autoregulations have been verified by experiments or other theoretical works.
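The comparison of mean and variance described in this abstract can be illustrated with a minimal sketch. It only computes the variance-to-mean ratio (VMR, also called the Fano factor) for each gene; the abstract does not state the decision rule, so the reference value of 1 (the VMR of a Poisson distribution) is an assumption made for illustration, and the gene names and counts are made up.

```python
from statistics import fmean, pvariance


def variance_to_mean_ratio(counts):
    """Return variance / mean (the Fano factor) of non-negative expression counts."""
    mean = fmean(counts)
    if mean == 0:
        raise ValueError("mean expression is zero; the ratio is undefined")
    # Population variance is used here; a sample variance would also be defensible.
    return pvariance(counts, mu=mean) / mean


# Hypothetical single-cell counts for two genes (illustrative numbers only).
gene_a = [4, 5, 5, 6, 5, 4, 6, 5]    # tightly clustered around the mean
gene_b = [0, 12, 1, 9, 0, 15, 2, 1]  # widely dispersed

for name, counts in (("gene_a", gene_a), ("gene_b", gene_b)):
    vmr = variance_to_mean_ratio(counts)
    # Assumption: 1.0 is used as the reference only because it is the Poisson value.
    side = "below" if vmr < 1.0 else "at or above"
    print(f"{name}: VMR = {vmr:.3f} ({side} 1)")
```

Which side of the reference value points to autoregulation, and under which model assumptions, is given by the two propositions in the paper itself and is not reproduced here.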
Affiliation(s)
- Yue Wang
- Department of Computational Medicine, University of California, Los Angeles, CA, 90095, USA.
- Institut des Hautes Études Scientifiques (IHÉS), Bures-sur-Yvette, 91440, Essonne, France.
- Siqi He
- Simons Center for Geometry and Physics, Stony Brook University, Stony Brook, NY, 11794, USA
4
Lo LY, Wong ML, Lee KH, Leung KS. Time Delayed Causal Gene Regulatory Network Inference with Hidden Common Causes. PLoS One 2015; 10:e0138596. [PMID: 26394325 PMCID: PMC4578777 DOI: 10.1371/journal.pone.0138596] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 03/03/2015] [Accepted: 09/01/2015] [Indexed: 01/07/2023]
Abstract
Inferring the gene regulatory network (GRN) is crucial to understanding the workings of the cell. Many computational methods attempt to infer the GRN from time series expression data, instead of through expensive and time-consuming experiments. However, existing methods make the convenient but unrealistic assumption of causal sufficiency, i.e. that all the relevant factors in the causal network have been observed and there are no unobserved common causes. In the real world, it is impossible to be certain that all relevant factors or common causes have been observed, because some factors may not have been conceived of and are therefore impossible to measure. In view of this, we have developed a novel algorithm named HCC-CLINDE to infer a GRN from time series data while allowing for the presence of hidden common cause(s). We assume there is a sparse causal graph (possibly with cycles) of interest, where the variables are continuous and each causal link has a delay (possibly more than one time step). A small but unknown number of variables are not observed. Each unobserved variable has only observed variables as children and parents, with at least two children, and the children are not linked to each other. Since it is difficult to obtain very long time series, our algorithm is also capable of utilizing multiple short time series, which is more realistic. To our knowledge, our algorithm is far less restrictive than previous works. We have performed extensive experiments using synthetic data on GRNs of size up to 100, with up to 10 hidden nodes. The results show that our algorithm can adequately recover the true causal GRN and is robust to slight deviations from a Gaussian distribution in the error terms. We have also demonstrated the potential of our algorithm on small YEASTRACT subnetworks using limited real data.
Affiliation(s)
- Leung-Yau Lo
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, Hong Kong
- Man-Leung Wong
- Department of Computing and Decision Sciences, Lingnan University, Tuen Mun, Hong Kong
- Kin-Hong Lee
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, Hong Kong
- Kwong-Sak Leung
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, Hong Kong
5
Lo LY, Leung KS, Lee KH. Inferring Time-Delayed Causal Gene Network Using Time-Series Expression Data. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2015; 12:1169-1182. [PMID: 26451828 DOI: 10.1109/tcbb.2015.2394442] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 06/05/2023]
Abstract
Inferring the gene regulatory network (GRN) from microarray expression data is an important problem in bioinformatics, because knowing the GRN is an essential first step in understanding the inner workings of the cell and the related diseases. Time delays exist in the regulatory effects from one gene to another due to the time needed for transcription, translation, and the accumulation of a sufficient number of the needed proteins. It is also known that the delays are important for oscillatory phenomena. Therefore, it is crucial to develop a causal gene network model, preferably as a function of time. In this paper, we propose an algorithm, CLINDE, to infer causal directed links in a GRN, together with the time delays and regulatory effects of the links, from time-series microarray gene expression data. It is one of the most comprehensive in terms of features compared to the state-of-the-art discrete gene network models. We have tested CLINDE on synthetic data, the in vivo IRMA (On and Off) datasets and the [1] yeast expression data validated using KEGG pathways. Results show that CLINDE can effectively recover the links, the time delays and the regulatory effects in the synthetic data, and outperforms other algorithms on the IRMA in vivo datasets.
6
Layered signaling regulatory networks analysis of gene expression involved in malignant tumorigenesis of non-resolving ulcerative colitis via integration of cross-study microarray profiles. PLoS One 2013; 8:e67142. [PMID: 23825635 PMCID: PMC3692446 DOI: 10.1371/journal.pone.0067142] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Received: 01/22/2013] [Accepted: 05/15/2013] [Indexed: 01/08/2023]
Abstract
Background Ulcerative colitis (UC) is the most frequently diagnosed inflammatory bowel disease (IBD) and is closely linked to colorectal carcinogenesis. So far, the underlying mechanisms associated with the disease are still unclear. With the increasing accumulation of microarray gene expression profiles, it is profitable to gain a systematic perspective based on gene regulatory networks to better elucidate the roles of genes associated with disorders. However, a major challenge for microarray data analysis is the integration of multiple studies generated by different groups. Methodology/Principal Findings In this study, firstly, we modeled a signaling regulatory network associated with colorectal cancer (CRC) initiation via integration of cross-study microarray expression data sets using the Empirical Bayes (EB) algorithm. Secondly, a manually curated human cancer signaling map was established via comprehensive retrieval of the publicly available repositories. Finally, the co-differentially expressed genes were manually curated to portray the layered signaling regulatory networks. Results Overall, the remodeled signaling regulatory networks were separated into four major layers including extracellular, membrane, cytoplasm and nucleus, which led to the identification of five core biological processes and four signaling pathways associated with colorectal carcinogenesis. As a result, our biological interpretation highlighted the importance of the EGF/EGFR signaling pathway, the EPO signaling pathway, T cell signal transduction and members of the BCR signaling pathway, which were responsible for the malignant transition of CRC from the benign UC to the aggressive one. Conclusions The present study illustrated a standardized normalization approach for cross-study microarray expression data sets. Our model for signaling network construction was based on experimentally supported interactions and microarray co-expression modeling. Pathway-based signaling regulatory network analysis sketched a directive insight into colorectal carcinogenesis, which was of significant importance to monitor disease progression and improve therapeutic interventions.
7
Hierarchical modularity in ERα transcriptional network is associated with distinct functions and implicates clinical outcomes. Sci Rep 2012; 2:875. [PMID: 23166858 PMCID: PMC3500769 DOI: 10.1038/srep00875] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.9] [Received: 09/12/2012] [Accepted: 10/30/2012] [Indexed: 12/18/2022]
Abstract
Recent genome-wide profiling reveals highly complex regulation networks among ERα and its targets. We integrated estrogen (E2)-stimulated time-series ERα ChIP-seq and gene expression data to identify the ERα-centered transcription factor (TF) hubs and their target genes, and inferred the time-variant hierarchical network structures using a Bayesian multivariate modeling approach. With its recurrent motif patterns, we determined three embedded regulatory modules from the ERα core transcriptional network. The GO analyses revealed the distinct biological functions associated with each of the three embedded modules. The survival analysis showed that the genes in each module were able to render a significant survival correlation in breast cancer patient cohorts. In summary, our Bayesian statistical modeling and modularity analysis not only reveal the dynamic properties of the ERα-centered regulatory network and the associated distinct biological functions, but also provide a reliable and effective genomic analytical approach for the analysis of the dynamic regulatory network of any given TF.
8
de Matos Simoes R, Tripathi S, Emmert-Streib F. Organizational structure and the periphery of the gene regulatory network in B-cell lymphoma. BMC SYSTEMS BIOLOGY 2012; 6:38. [PMID: 22583750 PMCID: PMC3476434 DOI: 10.1186/1752-0509-6-38] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.6] [Received: 11/24/2011] [Accepted: 05/14/2012] [Indexed: 12/22/2022]
Abstract
Background The physical periphery of a biological cell is mainly described by signaling pathways which are triggered by transmembrane proteins and receptors that are sentinels to control the whole gene regulatory network of a cell. However, our current knowledge about the gene regulatory mechanisms that are governed by extracellular signals is severely limited. Results The purpose of this paper is threefold. First, we infer a gene regulatory network from a large-scale B-cell lymphoma expression data set using the C3NET algorithm. Second, we provide a functional and structural analysis of the largest connected component of this network, revealing that this network component corresponds to the peripheral region of a cell. Third, we analyze the hierarchical organization of network components of the whole inferred B-cell gene regulatory network by introducing a new approach which exploits the variability within the data as well as the inferential characteristics of C3NET. As a result, we find a functional bisection of the network corresponding to different cellular components. Conclusions Overall, our study highlights the peripheral gene regulatory network of B-cells and shows that it is centered around hub transmembrane proteins located at the physical periphery of the cell. In addition, we identify a variety of novel pathological transmembrane proteins such as ion channel complexes and signaling receptors in B-cell lymphoma.
Affiliation(s)
- Ricardo de Matos Simoes
- Computational Biology and Machine Learning Lab, Center for Cancer Research and Cell Biology, School of Medicine, Dentistry and Biomedical Sciences, Queen's University Belfast, Belfast, UK
9
de Matos Simoes R, Emmert-Streib F. Bagging statistical network inference from large-scale gene expression data. PLoS One 2012; 7:e33624. [PMID: 22479422 PMCID: PMC3316596 DOI: 10.1371/journal.pone.0033624] [Citation(s) in RCA: 82] [Impact Index Per Article: 6.3] [Received: 10/24/2011] [Accepted: 02/14/2012] [Indexed: 11/24/2022]
Abstract
Modern biology and medicine aim at hunting molecular and cellular causes of biological functions and diseases. Gene regulatory networks (GRN) inferred from gene expression data are considered an important aid for this research by providing a map of molecular interactions. Hence, GRNs have the potential to enable and enhance basic as well as applied research in the life sciences. In this paper, we introduce a new method called BC3NET for inferring causal gene regulatory networks from large-scale gene expression data. BC3NET is an ensemble method that is based on bagging the C3NET algorithm, which means it corresponds to a Bayesian approach with noninformative priors. In this study we demonstrate for a variety of simulated and biological gene expression data from S. cerevisiae that BC3NET is an important enhancement over other inference methods and that it is capable of sensibly capturing biochemical interactions from transcription regulation and protein-protein interaction. An implementation of BC3NET is freely available as an R package from the CRAN repository.
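The bagging idea mentioned in this abstract (resample the expression data with replacement, infer a network on each resample, then aggregate the edges) can be sketched as follows. This is not BC3NET or C3NET: the base learner `strongest_neighbor_edges` is a hypothetical stand-in that links each gene to its most correlated partner, and the `min_fraction` cutoff is an assumed aggregation rule, since the abstract does not say how edges are combined. The toy data are made up.

```python
import math
import random
from collections import Counter


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def strongest_neighbor_edges(data):
    """Stand-in base learner (an assumption, not C3NET): one edge per gene,
    to the partner with the largest absolute correlation."""
    genes = list(data)
    edges = set()
    for g in genes:
        best = max((h for h in genes if h != g),
                   key=lambda h: abs(pearson(data[g], data[h])))
        edges.add(frozenset((g, best)))
    return edges


def bagged_network(data, n_boot=50, min_fraction=0.5, seed=0):
    """Infer a network on n_boot bootstrap resamples of the samples and keep
    edges that appear in at least min_fraction of them (hypothetical cutoff)."""
    rng = random.Random(seed)
    n_samples = len(next(iter(data.values())))
    votes = Counter()
    for _ in range(n_boot):
        idx = [rng.randrange(n_samples) for _ in range(n_samples)]
        resampled = {g: [v[i] for i in idx] for g, v in data.items()}
        votes.update(strongest_neighbor_edges(resampled))
    return {e: c / n_boot for e, c in votes.items() if c / n_boot >= min_fraction}


# Toy expression data: b tracks a, d mirrors c (illustrative numbers only).
a = list(range(12))
c = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8]
data = {"a": a, "b": [2 * v + 1 for v in a], "c": c, "d": [-v for v in c]}

network = bagged_network(data, n_boot=30)
for edge, fraction in sorted(network.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(edge), fraction)
```

Resampling the samples rather than the genes keeps every gene in each bootstrap network, so edge frequencies are comparable across resamples.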
Affiliation(s)
- Frank Emmert-Streib
- Computational Biology and Machine Learning Lab, Center for Cancer Research and Cell Biology, School of Medicine, Dentistry and Biomedical Sciences, Queen's University Belfast, Belfast, United Kingdom
10
Emmert-Streib F, Glazko GV, Altay G, de Matos Simoes R. Statistical inference and reverse engineering of gene regulatory networks from observational expression data. Front Genet 2012; 3:8. [PMID: 22408642 PMCID: PMC3271232 DOI: 10.3389/fgene.2012.00008] [Citation(s) in RCA: 78] [Impact Index Per Article: 6.0] [Received: 12/07/2011] [Accepted: 01/10/2012] [Indexed: 01/04/2023]
Abstract
In this paper, we present a systematic and conceptual overview of methods for inferring gene regulatory networks from observational gene expression data. Further, we discuss two classic approaches to infer causal structures and compare them with contemporary methods by providing a conceptual categorization thereof. We complement the above by surveying global and local evaluation measures for assessing the performance of inference algorithms.
Affiliation(s)
- Frank Emmert-Streib
- Computational Biology and Machine Learning Lab, School of Medicine, Dentistry and Biomedical Sciences, Center for Cancer Research and Cell Biology, Queen's University Belfast, Belfast, UK
11
Parametric construction of episode networks from pseudoperiodic time series based on mutual information. PLoS One 2012; 6:e27733. [PMID: 22216086 PMCID: PMC3245224 DOI: 10.1371/journal.pone.0027733] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 08/23/2011] [Accepted: 10/24/2011] [Indexed: 12/02/2022]
Abstract
Recently, the construction of networks from time series data has gained widespread interest. In this paper, we develop this area further by introducing a network construction procedure for pseudoperiodic time series. We call such networks episode networks, in which an episode corresponds to a temporal interval of a time series, and which defines a node in the network. Our model includes a number of features which distinguish it from current methods. First, the proposed construction procedure is a parametric model which allows it to adapt to the characteristics of the data; the length of an episode being the parameter. As a direct consequence, networks of minimal size containing the maximal information about the time series can be obtained. In this paper, we provide an algorithm to determine the optimal value of this parameter. Second, we employ estimates of mutual information values to define the connectivity structure among the nodes in the network to exploit efficiently the nonlinearities in the time series. Finally, we apply our method to data from electroencephalogram (EEG) experiments and demonstrate that the constructed episode networks capture discriminative information from the underlying time series that may be useful for diagnostic purposes.
12
de Matos Simoes R, Emmert-Streib F. Influence of statistical estimators of mutual information and data heterogeneity on the inference of gene regulatory networks. PLoS One 2011; 6:e29279. [PMID: 22242113 PMCID: PMC3248437 DOI: 10.1371/journal.pone.0029279] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.6] [Received: 09/21/2011] [Accepted: 11/23/2011] [Indexed: 11/19/2022]
Abstract
The inference of gene regulatory networks from gene expression data is a difficult problem because the performance of the inference algorithms depends on a multitude of different factors. In this paper we study two of these. First, we investigate the influence of discrete mutual information (MI) estimators on the global and local network inference performance of the C3NET algorithm. More precisely, we study different MI estimators (Empirical, Miller-Madow, Shrink and Schürmann-Grassberger) in combination with discretization methods (equal frequency, equal width and global equal width discretization). We observe the best global and local inference performance of C3NET for the Miller-Madow estimator with an equal width discretization. Second, our numerical analysis can be considered as a systems approach because we simulate gene expression data from an underlying gene regulatory network, instead of making a distributional assumption to sample thereof. We demonstrate that despite the popularity of the latter approach, which is the traditional way of studying MI estimators, this is in fact not supported by simulated and biological expression data because of their heterogeneity. Hence, our study provides guidance for an efficient design of a simulation study in the context of network inference, supporting a systems approach.
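Two of the ingredients named in this abstract, equal-width discretization and the empirical versus Miller-Madow estimators, can be illustrated with a short sketch. It uses the standard Miller-Madow correction, which adds (m - 1) / (2N) to the empirical entropy, where m is the number of non-empty bins and N the number of samples, and computes mutual information as H(X) + H(Y) - H(X, Y). The function names, the bin count and the toy vectors are assumptions for illustration; this is not the code used in the study.

```python
import math
from collections import Counter


def equal_width_bins(values, n_bins):
    """Assign each value to one of n_bins equally wide intervals over [min, max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def entropy(labels, miller_madow=False):
    """Empirical (plug-in) entropy in nats, optionally with the Miller-Madow correction."""
    n = len(labels)
    counts = Counter(labels)
    h = -sum((c / n) * math.log(c / n) for c in counts.values())
    if miller_madow:
        h += (len(counts) - 1) / (2 * n)  # m = number of non-empty bins
    return h


def mutual_information(x, y, n_bins=4, miller_madow=False):
    """MI estimate from discretized data: H(X) + H(Y) - H(X, Y)."""
    bx = equal_width_bins(x, n_bins)
    by = equal_width_bins(y, n_bins)
    joint = list(zip(bx, by))
    return (entropy(bx, miller_madow) + entropy(by, miller_madow)
            - entropy(joint, miller_madow))


# Toy expression vectors (illustrative numbers only).
x = [0, 1, 2, 3, 4, 5, 6, 7]
y = [7, 6, 5, 4, 3, 2, 1, 0]
print(mutual_information(x, y))
print(mutual_information(x, y, miller_madow=True))
```

Because the correction is applied to each of the three entropies separately, the Miller-Madow estimate of mutual information can be either larger or smaller than the empirical one, depending on how many joint bins are occupied.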
Affiliation(s)
- Ricardo de Matos Simoes
- Computational Biology and Machine Learning Lab, Center for Cancer Research and Cell Biology, School of Medicine, Dentistry and Biomedical Sciences, Queen's University Belfast, Belfast, United Kingdom
- Frank Emmert-Streib
- Computational Biology and Machine Learning Lab, Center for Cancer Research and Cell Biology, School of Medicine, Dentistry and Biomedical Sciences, Queen's University Belfast, Belfast, United Kingdom
13
Altay G, Emmert-Streib F. Structural influence of gene networks on their inference: analysis of C3NET. Biol Direct 2011; 6:31. [PMID: 21696592 PMCID: PMC3136421 DOI: 10.1186/1745-6150-6-31] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.3] [Received: 01/10/2011] [Accepted: 06/22/2011] [Indexed: 11/10/2022]
Abstract
BACKGROUND The availability of large-scale high-throughput data poses considerable challenges for their functional analysis. For this reason, gene network inference methods have gained considerable interest. However, our current knowledge, especially about the influence of the structure of a gene network on its inference, is limited. RESULTS In this paper we present a comprehensive investigation of the structural influence of gene networks on the inferential characteristics of C3NET - a recently introduced gene network inference algorithm. We employ local as well as global performance metrics in combination with an ensemble approach. The results from our numerical study for various biological and synthetic network structures and simulation conditions, also comparing C3NET with other inference algorithms, lead to a multitude of theoretical and practical insights into the working behavior of C3NET. In addition, in order to facilitate the practical usage of C3NET we provide a user-friendly R package, called c3net, and describe its functionality. It is available from https://r-forge.r-project.org/projects/c3net and from the CRAN package repository. CONCLUSIONS The availability of gene network inference algorithms with known inferential properties opens a new era of large-scale screening experiments that could be equally beneficial for basic biological and biomedical research with auspicious prospects. The availability of our easy-to-use software package c3net may contribute to the popularization of such methods.
Affiliation(s)
- Gökmen Altay
- School of Medicine, Dentistry and Biomedical Sciences, Queen's University Belfast, Belfast, BT9 7BL, UK
14
Shen C, Huang Y, Liu Y, Wang G, Zhao Y, Wang Z, Teng M, Wang Y, Flockhart DA, Skaar TC, Yan P, Nephew KP, Huang TH, Li L. A modulated empirical Bayes model for identifying topological and temporal estrogen receptor α regulatory networks in breast cancer. BMC SYSTEMS BIOLOGY 2011; 5:67. [PMID: 21554733 PMCID: PMC3117732 DOI: 10.1186/1752-0509-5-67] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.7] [Received: 11/24/2010] [Accepted: 05/09/2011] [Indexed: 12/27/2022]
Abstract
BACKGROUND Estrogens regulate diverse physiological processes in various tissues through genomic and non-genomic mechanisms that result in activation or repression of gene expression. Transcription regulation upon estrogen stimulation is a critical biological process underlying the onset and progress of the majority of breast cancers. Dynamic gene expression changes have been shown to characterize the breast cancer cell response to estrogens, the molecular mechanism of which is still not well understood. RESULTS We developed a modulated empirical Bayes model, and constructed a novel topological and temporal transcription factor (TF) regulatory network in the MCF7 breast cancer cell line upon stimulation by 17β-estradiol. In the network, significant TF genomic hubs were identified including ER-alpha and AP-1; significant non-genomic hubs include ZFP161, TFDP1, NRF1, TFAP2A, EGR1, E2F1, and PITX2. Although the early and late networks were distinct (<5% overlap of ERα target genes between the 4 and 24 h time points), all nine hubs were significantly represented in both networks. In MCF7 cells with acquired resistance to tamoxifen, the ERα regulatory network was unresponsive to 17β-estradiol stimulation. The significant loss of hormone responsiveness was associated with marked epigenomic changes, including hyper- or hypo-methylation of promoter CpG islands and repressive histone methylations. CONCLUSIONS We identified a number of estrogen-regulated target genes and established an estrogen-regulated network that distinguishes the genomic and non-genomic actions of the estrogen receptor. Many gene targets of this network were no longer active in anti-estrogen resistant cell lines, possibly because their DNA methylation and histone acetylation patterns have changed.
Affiliation(s)
- Changyu Shen
- Center for Computational Biology, Indiana University School of Medicine, Indianapolis, IN 46202, USA
15
Abstract
Gene expression profiling provides tremendous information to help unravel the complexity of cancer. The selection of the most informative genes from huge noise for cancer classification has taken centre stage, along with predicting the function of such identified genes and the construction of direct gene regulatory networks at different system levels with a tuneable parameter. A new study by Wang and Gotoh described a novel Variable Precision Rough Sets-rooted robust soft computing method to successfully address these problems and has yielded some new insights. The significance of this progress and its perspectives will be discussed in this article.
Affiliation(s)
- Yue Zhang
- Department of Radiation Oncology, Beth Israel Deaconess Medical Center, Harvard Medical School, 99 Brookline Avenue, Boston, MA 02215, USA
16
Wang YC, Chen BS. Integrated cellular network of transcription regulations and protein-protein interactions. BMC SYSTEMS BIOLOGY 2010; 4:20. [PMID: 20211003 PMCID: PMC2848195 DOI: 10.1186/1752-0509-4-20] [Citation(s) in RCA: 52] [Impact Index Per Article: 3.5] [Received: 08/06/2009] [Accepted: 03/08/2010] [Indexed: 01/13/2023]
Abstract
Background With the increasing accumulation of omics data, a key goal of systems biology is to construct networks at different cellular levels to investigate the machinery of the cell. However, there is currently no satisfactory method to construct an integrated cellular network that combines the gene regulatory network and the signaling regulatory pathway. Results In this study, we integrated different kinds of omics data and developed a systematic method to construct the integrated cellular network based on coupling dynamic models and statistical assessments. The proposed method was applied to S. cerevisiae stress responses, elucidating the stress response mechanism of the yeast. From the resulting integrated cellular network under hyperosmotic stress, the highly connected hubs which are functionally relevant to the stress response were identified. Beyond hyperosmotic stress, the integrated networks under heat shock and oxidative stress were also constructed and the crosstalk between these networks was analyzed, specifying the significance of some transcription factors that serve as the decision-making devices at the center of the bow-tie structure and their crucial role in the rapid adaptation scheme for responding to stress. In addition, the predictive power of the proposed method was also demonstrated. Conclusions We successfully constructed the integrated cellular network, which is validated by literature evidence. The integration of transcription regulations and protein-protein interactions gives more insight into the actual biological network and is more predictive than networks without integration. The method is shown to be powerful and flexible and can be used under different conditions and for different species. The coupling dynamic models of the whole integrated cellular network are very useful for theoretical analyses and for further experiments in the fields of network biology and synthetic biology.
Affiliation(s)
- Yu-Chao Wang
- Laboratory of Control and Systems Biology, Department of Electrical Engineering, National Tsing Hua University, Hsinchu 30013, Taiwan
|
17
|
Emmert-Streib F, Dehmer M. Hierarchical coordination of periodic genes in the cell cycle of Saccharomyces cerevisiae. BMC SYSTEMS BIOLOGY 2009; 3:76. [PMID: 19619302 PMCID: PMC2721836 DOI: 10.1186/1752-0509-3-76] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 02/12/2009] [Accepted: 07/20/2009] [Indexed: 11/25/2022]
Abstract
Background Gene networks are a representation of molecular interactions among genes or products thereof and, hence, form causal networks. Despite intense study in recent years, most investigations so far focus on inferential methods to reconstruct gene networks from experimental data or on their structural properties, e.g., degree distributions. Their structural analysis to gain functional insights into organizational principles of, e.g., pathways remains underappreciated. Results In the present paper we analyze cell cycle regulated genes in S. cerevisiae. Our analysis is based on the transcriptional regulatory network, representing causal interactions and not just associations or correlations between genes, and a list of known periodic genes. No further data are used. Partitioning the transcriptional regulatory network according to a graph theoretical property leads to a hierarchy in the network and, hence, in the information flow, allowing us to identify two groups of periodic genes. This reveals a novel conceptual interpretation of the working mechanism of the cell cycle and the genes regulated by this pathway. Conclusion Aside from the obtained results for the cell cycle of yeast, our approach could be exemplary for the analysis of general pathways by exploiting the rich causal structure of inferred and/or curated gene networks including protein or signaling networks.
Affiliation(s)
- Frank Emmert-Streib
- Center for Cancer Research and Cell Biology, Queen's University Belfast, UK.
|
18
|
Janky R, Helden JV, Babu MM. Investigating transcriptional regulation: From analysis of complex networks to discovery of cis-regulatory elements. Methods 2009; 48:277-86. [DOI: 10.1016/j.ymeth.2009.04.022] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 02/06/2009] [Revised: 04/17/2009] [Accepted: 04/18/2009] [Indexed: 10/20/2022]
|
19
|
The use of logic relationships to model colon cancer gene expression networks with mRNA microarray data. J Biomed Inform 2008; 41:530-43. [DOI: 10.1016/j.jbi.2007.11.006] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 04/04/2007] [Revised: 11/04/2007] [Accepted: 11/24/2007] [Indexed: 11/17/2022]
|
20
|
Veber P, Guziolowski C, Le Borgne M, Radulescu O, Siegel A. Inferring the role of transcription factors in regulatory networks. BMC Bioinformatics 2008; 9:228. [PMID: 18460200 PMCID: PMC2422845 DOI: 10.1186/1471-2105-9-228] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Received: 05/24/2007] [Accepted: 05/06/2008] [Indexed: 11/10/2022]
Abstract
BACKGROUND Expression profiles obtained from multiple perturbation experiments are increasingly used to reconstruct transcriptional regulatory networks, from well studied, simple organisms up to higher eukaryotes. Admittedly, a key ingredient in developing a reconstruction method is its ability to integrate heterogeneous sources of information, as well as to comply with practical observability issues: measurements can be scarce or noisy. In this work, we show how to combine a network of genetic regulations with a set of expression profiles, in order to infer the functional effect of the regulations, as inducer or repressor. Our approach is based on a consistency rule between a network and the signs of variation given by expression arrays. RESULTS We evaluate our approach in several settings of increasing complexity. First, we generate artificial expression data on a transcriptional network of E. coli extracted from the literature (1529 nodes and 3802 edges), and we estimate that 30% of the regulations can be annotated with about 30 profiles. We additionally prove that at most 40.8% of the network can be inferred using our approach. Second, we use this network in order to validate the predictions obtained with a compendium of real expression profiles. We describe a filtering algorithm that generates particularly reliable predictions. Finally, we apply our inference approach to the S. cerevisiae transcriptional network (2419 nodes and 4344 interactions), by combining ChIP-chip data and 15 expression profiles. We are able to detect and isolate inconsistencies between the expression profiles and a significant portion of the model (15% of all the interactions). In addition, we report predictions for 14.5% of all interactions. CONCLUSION Our approach requires neither accurate expression levels nor time series. Nevertheless, we show on both real and artificial data that a relatively small number of perturbation experiments is enough to determine a significant portion of regulatory effects. This is a key practical asset compared to statistical methods for network reconstruction. We demonstrate that our approach is able to provide accurate predictions, even when the network is incomplete and the data is noisy.
Affiliation(s)
- Philippe Veber
- Centre INRIA Rennes Bretagne Atlantique, IRISA, Rennes, France.
|
21
|
Chen G, Larsen P, Almasri E, Dai Y. Rank-based edge reconstruction for scale-free genetic regulatory networks. BMC Bioinformatics 2008; 9:75. [PMID: 18237422 PMCID: PMC2275249 DOI: 10.1186/1471-2105-9-75] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Received: 03/19/2007] [Accepted: 01/31/2008] [Indexed: 11/12/2022]
Abstract
Background The reconstruction of genetic regulatory networks from microarray gene expression data has been a challenging task in bioinformatics. Various approaches to this problem have been proposed; however, they do not take into account the topological characteristics of the targeted networks while reconstructing them. Results In this study, an algorithm that explores the scale-free topology of networks was proposed based on the modification of a rank-based algorithm for network reconstruction. The new algorithm was evaluated with the use of both simulated and microarray gene expression data. The results demonstrated that the proposed algorithm outperforms the original rank-based algorithm. In addition, in comparison with the Bayesian Network approach, the results show that the proposed algorithm gives much better recovery of the underlying network when the sample size is much smaller than the number of genes. Conclusion The proposed algorithm is expected to be useful in the reconstruction of biological networks whose degree distributions follow the scale-free topology.
Affiliation(s)
- Guanrao Chen
- Department of Computer Science (MC152), University of Illinois at Chicago, 851 South Morgan Street, Chicago, IL 60607, USA.
|
22
|
Bellazzi R, Zupan B. Towards knowledge-based gene expression data mining. J Biomed Inform 2007; 40:787-802. [PMID: 17683991 DOI: 10.1016/j.jbi.2007.06.005] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.8] [Received: 09/29/2006] [Revised: 04/20/2007] [Accepted: 06/06/2007] [Indexed: 11/24/2022]
Abstract
The field of gene expression data analysis has grown in the past few years from being purely data-centric to integrative, aiming at complementing microarray analysis with data and knowledge from diverse available sources. In this review, we report on the plethora of gene expression data mining techniques and focus on their evolution toward knowledge-based data analysis approaches. In particular, we discuss recent developments in gene expression-based analysis methods used in association and classification studies, phenotyping and reverse engineering of gene networks.
Affiliation(s)
- Riccardo Bellazzi
- Dipartimento di Informatica e Sistemistica, Università di Pavia, via Ferrata 1, I-27100 Pavia, Italy
|
23
|
Chawade A, Bräutigam M, Lindlöf A, Olsson O, Olsson B. Putative cold acclimation pathways in Arabidopsis thaliana identified by a combined analysis of mRNA co-expression patterns, promoter motifs and transcription factors. BMC Genomics 2007; 8:304. [PMID: 17764576 PMCID: PMC2001198 DOI: 10.1186/1471-2164-8-304] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.3] [Received: 03/08/2007] [Accepted: 09/02/2007] [Indexed: 01/08/2023]
Abstract
Background With the advent of microarray technology, it has become feasible to identify virtually all genes in an organism that are induced by developmental or environmental changes. However, relying solely on gene expression data may be of limited value if the aim is to infer the underlying genetic networks. Development of computational methods to combine microarray data with other information sources is therefore necessary. Here we describe one such method. Results By means of our method, previously published Arabidopsis microarray data from cold acclimated plants at six different time points, promoter motif sequence data extracted from ~24,000 Arabidopsis promoters and known transcription factor binding sites were combined to construct a putative genetic regulatory interaction network. The inferred network includes both previously characterised and hitherto undescribed regulatory interactions between transcription factor (TF) genes and genes that encode other TFs or other proteins. Part of the obtained transcription factor regulatory network is presented here. More detailed information is available in the additional files. Conclusion The rule-based method described here can be used to infer genetic networks by combining data from microarrays, promoter sequences and known promoter binding sites. This method should in principle be applicable to any biological system. We tested the method on the cold acclimation process in Arabidopsis and could identify a more complex putative genetic regulatory network than previously described. However, it should be noted that information on specific binding sites for individual TFs was in most cases not available. Thus, gene targets for the entire TF gene families were predicted. In addition, the networks were built solely by a bioinformatics approach and experimental verifications will be necessary for their final validation. On the other hand, since our method highlights putative novel interactions, more directed experiments could now be performed.
Affiliation(s)
- Aakash Chawade
- Department of Cell and Molecular Biology, Göteborg University, Box 462, 403 20 Göteborg, Sweden
- School of Humanities and Informatics, University of Skövde, Box 408, 541 28 Skövde, Sweden
- Marcus Bräutigam
- Department of Cell and Molecular Biology, Göteborg University, Box 462, 403 20 Göteborg, Sweden
- Angelica Lindlöf
- School of Humanities and Informatics, University of Skövde, Box 408, 541 28 Skövde, Sweden
- Olof Olsson
- Department of Cell and Molecular Biology, Göteborg University, Box 462, 403 20 Göteborg, Sweden
- Björn Olsson
- School of Humanities and Informatics, University of Skövde, Box 408, 541 28 Skövde, Sweden
|
24
|
Abstract
Network analysis of living systems is an essential component of contemporary systems biology. It aims to assemble the mutual dependencies between interacting system elements into an integrated view of whole-system functioning. In the following chapter we describe the existing classification of what is referred to as biological networks and show how complex interdependencies in biological systems can be represented in the simpler form of network graphs. Further structural analysis of the assembled biological network yields knowledge of the functioning of the entire biological system. Such aspects of network structure as connectivity of network elements and connectivity degree distribution, degree of node centralities, clustering coefficient, network diameter and average path length are touched upon. Networks are analyzed as static entities, or the dynamical behavior of underlying biological systems may be considered. A description of mathematical and computational approaches for determining the dynamics of regulatory networks is provided. Causality, as another characteristic feature of a dynamically functioning biosystem, can also be accessed in the reconstruction of biological networks; we give examples of how this integration is accomplished. Further questions about network dynamics and evolution can be approached by means of network comparison. Network analysis gives rise to new global hypotheses on systems functionality and reductionist findings of novel molecular interactions, based on the reliability of network reconstructions, which has to be tested in subsequent experiments. We provide a collection of useful links to be used for the analysis of biological networks.
Affiliation(s)
- Victoria J Nikiforova
- Max-Planck-Institut für Molekulare Pflanzenphysiologie, Am Mühlenberg 1, 14476 Potsdam-Golm, Germany.
|
25
|
Ranjan S, Seshadri J, Vindal V, Yellaboina S, Ranjan A. iCR: a web tool to identify conserved targets of a regulatory protein across the multiple related prokaryotic species. Nucleic Acids Res 2006; 34:W584-7. [PMID: 16845075 PMCID: PMC1538900 DOI: 10.1093/nar/gkl202] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 11/15/2022]
Abstract
Gene regulatory circuits are often shared between two closely related organisms. Our web tool iCR (identify Conserved target of a Regulon) makes use of this fact and identifies conserved targets of a regulatory protein. iCR is a refined extension of our previous tool PredictRegulon, which predicts, genome-wide, the potential binding sites and target operons of a regulatory protein in a single user-selected genome. Like PredictRegulon, iCR accepts known binding sites of a regulatory protein as an ungapped multiple sequence alignment and provides the potential binding sites. The important differences, however, are that the user can select more than one genome at a time and the output reports the genes that are common to two or more species. In order to achieve this, iCR makes use of Cluster of Orthologous Group (COG) indices for the genes. The tool analyses the upstream regions of all user-selected prokaryote genomes and gives the output based on conservation of target orthologs. iCR also reports the functional class codes based on COG classification for the encoded proteins of downstream genes, which helps the user understand the nature of the co-regulated genes on the result page itself. iCR is freely accessible at .
Affiliation(s)
- Akash Ranjan
- To whom correspondence should be addressed. Tel: +91 40 27171442; Fax: +91 40 27171442;
|
26
|
Li W, Wang M, Irigoyen P, Gregersen PK. Inferring causal relationships among intermediate phenotypes and biomarkers: a case study of rheumatoid arthritis. Bioinformatics 2006; 22:1503-7. [PMID: 16551663 DOI: 10.1093/bioinformatics/btl100] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.9] [Indexed: 11/13/2022]
Abstract
MOTIVATION Genetic association analysis is based on statistical correlations which do not assign any cause-to-effect arrows between the two correlated variables. Normally, such assignment of cause-and-effect labels is not necessary in genetic analysis since genes are always the cause and phenotypes are always the effect. However, among intermediate phenotypes and biomarkers, assigning cause and effect becomes meaningful, and causal inference can be useful. RESULTS We show that causal inference is possible by an example in a study of rheumatoid arthritis. With the help of genotypic information, the shared epitope, the causal relationship between two biomarkers related to the disease, anti-cyclic citrullinated peptide (anti-CCP) and rheumatoid factor (RF), has been established. We emphasize the fact that the third variable must be a genotype to be able to resolve potential ambiguities in causal inference. Two non-trivial conclusions have been reached by the causal inference: (1) anti-CCP is a cause of RF and (2) it is unlikely that a third confounding factor contributes to both anti-CCP and RF.
Affiliation(s)
- Wentian Li
- The Robert S Boas Center for Genomics and Human Genetics, Feinstein Institute for Medical Research, North Shore LIJ Health System 350 Community Drive, Manhasset, NY, USA.
|