1. Cassan O, Lecellier CH, Martin A, Bréhélin L, Lèbre S. Optimizing data integration improves gene regulatory network inference in Arabidopsis thaliana. Bioinformatics 2024; 40:btae415. [PMID: 38913855 PMCID: PMC11227367 DOI: 10.1093/bioinformatics/btae415] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2024] [Revised: 06/12/2024] [Accepted: 06/21/2024] [Indexed: 06/26/2024] [Open Access]
Abstract
MOTIVATION: Gene regulatory networks (GRNs) are traditionally inferred from gene expression profiles monitoring a specific condition or treatment. In the last decade, integrative strategies have successfully emerged to guide GRN inference from gene expression with complementary prior data. However, datasets used as prior information and validation gold standards are often related and limited to a subset of genes. This lack of complete and independent evaluation calls for new criteria to robustly estimate the optimal intensity of prior data integration in the inference process.
RESULTS: We address this issue for two regression-based GRN inference models, a weighted random forest (weightedRF) and a generalized linear model estimated under a weighted LASSO penalty with stability selection (weightedLASSO). These approaches are applied to data from the root response to nitrate induction in Arabidopsis thaliana. For each gene, we measure how the integration of transcription factor binding motifs influences model prediction. We propose a new approach, DIOgene, that uses model prediction error and a simulated null hypothesis in order to optimize data integration strength in a hypothesis-driven, gene-specific manner. This integration scheme reveals a strong diversity of optimal integration intensities between genes, and offers good performance in minimizing prediction error as well as retrieving experimental interactions. Experimental results show that DIOgene compares favorably against state-of-the-art approaches and enables the recovery of master regulators of nitrate induction.
AVAILABILITY AND IMPLEMENTATION: The R code and notebooks demonstrating the use of the proposed approaches are available in the repository https://github.com/OceaneCsn/integrative_GRN_N_induction.
Affiliation(s)
- Océane Cassan
  - LIRMM, Univ Montpellier, CNRS, Montpellier, 34095, France
- Charles-Henri Lecellier
  - LIRMM, Univ Montpellier, CNRS, Montpellier, 34095, France
  - IGMM, Univ Montpellier, CNRS, Montpellier, 34090, France
- Antoine Martin
  - IPSIM, CNRS, INRAE, Institut Agro, Univ Montpellier, 34060, Montpellier, France
- Sophie Lèbre
  - LIRMM, Univ Montpellier, CNRS, Montpellier, 34095, France
  - IMAG, Univ Montpellier, CNRS, Montpellier, 34090, France
  - Université Paul-Valéry-Montpellier 3, Montpellier, 34090, France
2. Siahpirani AF, Knaack S, Chasman D, Seirup M, Sridharan R, Stewart R, Thomson J, Roy S. Dynamic regulatory module networks for inference of cell type-specific transcriptional networks. Genome Res 2022; 32:1367-1384. [PMID: 35705328 PMCID: PMC9341506 DOI: 10.1101/gr.276542.121] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/28/2021] [Accepted: 06/02/2022] [Indexed: 11/25/2022]
Abstract
Changes in transcriptional regulatory networks can significantly alter cell fate. To gain insight into transcriptional dynamics, several studies have profiled bulk multi-omic data sets with parallel transcriptomic and epigenomic measurements at different stages of a developmental process. However, integrating these data to infer cell type-specific regulatory networks is a major challenge. We present dynamic regulatory module networks (DRMNs), a novel approach to infer cell type-specific cis-regulatory networks and their dynamics. DRMN integrates expression, chromatin state, and accessibility to predict cis-regulators of context-specific expression, where context can be cell type, developmental stage, or time point, and uses multitask learning to capture network dynamics across linearly and hierarchically related contexts. We applied DRMNs to study regulatory network dynamics in three developmental processes, each showing different temporal relationships and measuring a different combination of regulatory genomic data sets: cellular reprogramming, liver dedifferentiation, and forward differentiation. DRMN identified known and novel regulators driving cell type-specific expression patterns, showing its broad applicability to examine dynamics of gene regulatory networks from linearly and hierarchically related multi-omic data sets.
Affiliation(s)
- Alireza Fotuhi Siahpirani
  - Wisconsin Institute for Discovery, University of Wisconsin, Madison, Wisconsin 53715, USA
  - Department of Computer Sciences, University of Wisconsin, Madison, Wisconsin 53715, USA
- Sara Knaack
  - Wisconsin Institute for Discovery, University of Wisconsin, Madison, Wisconsin 53715, USA
- Deborah Chasman
  - Wisconsin Institute for Discovery, University of Wisconsin, Madison, Wisconsin 53715, USA
- Morten Seirup
  - Morgridge Institute for Research, Madison, Wisconsin 53715, USA
  - Molecular and Environmental Toxicology Program, University of Wisconsin, Madison, Wisconsin 53715, USA
- Rupa Sridharan
  - Wisconsin Institute for Discovery, University of Wisconsin, Madison, Wisconsin 53715, USA
  - Department of Cell and Regenerative Biology, University of Wisconsin, Madison, Wisconsin 53715, USA
- Ron Stewart
  - Morgridge Institute for Research, Madison, Wisconsin 53715, USA
- James Thomson
  - Morgridge Institute for Research, Madison, Wisconsin 53715, USA
  - Department of Cell and Regenerative Biology, University of Wisconsin, Madison, Wisconsin 53715, USA
  - Department of Molecular, Cellular, and Developmental Biology, University of California, Santa Barbara, California 93117, USA
- Sushmita Roy
  - Wisconsin Institute for Discovery, University of Wisconsin, Madison, Wisconsin 53715, USA
  - Department of Computer Sciences, University of Wisconsin, Madison, Wisconsin 53715, USA
  - Department of Biostatistics and Medical Informatics, University of Wisconsin, Madison, Wisconsin 53715, USA
3. Chasman D, Roy S. Inference of cell type specific regulatory networks on mammalian lineages. Curr Opin Syst Biol 2017; 2:130-139. [PMID: 29082337 DOI: 10.1016/j.coisb.2017.04.001] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Indexed: 12/28/2022]
Abstract
Transcriptional regulatory networks are at the core of establishing cell type specific gene expression programs. In mammalian systems, such regulatory networks are determined by multiple levels of regulation, including by transcription factors, chromatin environment, and three-dimensional organization of the genome. Recent efforts to measure diverse regulatory genomic datasets across multiple cell types and tissues offer unprecedented opportunities to examine the context-specificity and dynamics of regulatory networks at a greater resolution and scale than before. In parallel, numerous computational approaches to analyze these data have emerged that serve as important tools for understanding mammalian cell type specific regulation. In this article, we review recent computational approaches to predict the expression and sequence-based regulators of a gene's expression level and examine long-range gene regulation. We highlight promising approaches, insights gained, and open challenges that need to be overcome to build a comprehensive picture of cell type specific transcriptional regulatory networks.
Affiliation(s)
- Deborah Chasman
  - Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI 53715
- Sushmita Roy
  - Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI 53715
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53792
4.
Abstract
Recent studies across multiple tumour types are starting to reveal a recurrent regulatory architecture in which genomic alterations cluster upstream of functional master regulator (MR) proteins, the aberrant activity of which is both necessary and sufficient to maintain tumour cell state. These proteins form small, hyperconnected and autoregulated modules (termed tumour checkpoints) that are increasingly emerging as optimal biomarkers and therapeutic targets. Crucially, as their activity is mostly dysregulated in a post-translational manner, rather than by mutations in their corresponding genes or by differential expression, the identification of MR proteins by conventional methods is challenging. In this Opinion article, we discuss novel methods for the systematic analysis of MR proteins and of the modular regulatory architecture they implement, including their use as a valuable reductionist framework to study the genetic heterogeneity of human disease and to drive key translational applications.
Affiliation(s)
- Andrea Califano
  - Department of Systems Biology, Columbia University, and the Departments of Biomedical Informatics, Biochemistry and Molecular Biophysics, JP Sulzberger Columbia Genome Center, Herbert Irving Comprehensive Cancer Center, Columbia University, New York, New York 10032, USA
- Mariano J Alvarez
  - DarwinHealth, Inc., 3960 Broadway, Suite 540, New York, New York 10032, USA
5. Alvarez MJ, Shen Y, Giorgi FM, Lachmann A, Ding BB, Ye BH, Califano A. Functional characterization of somatic mutations in cancer using network-based inference of protein activity. Nat Genet 2016; 48:838-47. [PMID: 27322546 PMCID: PMC5040167 DOI: 10.1038/ng.3593] [Citation(s) in RCA: 587] [Impact Index Per Article: 65.2] [Received: 02/21/2016] [Accepted: 05/23/2016] [Indexed: 01/05/2023]
Abstract
Identifying the multiple dysregulated oncoproteins that contribute to tumorigenesis in a given patient is crucial for developing personalized treatment plans. However, accurate inference of aberrant protein activity in biological samples is still challenging, as genetic alterations are only partially predictive and direct measurements of protein activity are generally not feasible. To address this problem, we introduce and experimentally validate a new algorithm, virtual inference of protein activity by enriched regulon analysis (VIPER), for accurate assessment of protein activity from gene expression data. We used VIPER to evaluate the functional relevance of genetic alterations in regulatory proteins across all samples in The Cancer Genome Atlas (TCGA). In addition to accurately inferring aberrant protein activity induced by established mutations, we also identified a fraction of tumors with aberrant activity of druggable oncoproteins despite a lack of mutations, and vice versa. In vitro assays confirmed that VIPER-inferred protein activity outperformed mutational analysis in predicting sensitivity to targeted inhibitors.
Affiliation(s)
- Mariano J. Alvarez
  - Department of Systems Biology, Columbia University, New York, USA
  - DarwinHealth Inc., New York, USA
- Yao Shen
  - Department of Systems Biology, Columbia University, New York, USA
  - DarwinHealth Inc., New York, USA
- B. Belinda Ding
  - Department of Cell Biology, Albert Einstein College of Medicine, New York, USA
- B. Hilda Ye
  - Department of Cell Biology, Albert Einstein College of Medicine, New York, USA
- Andrea Califano
  - Department of Systems Biology, Columbia University, New York, USA
  - Department of Biomedical Informatics, Columbia University, New York, USA
  - Department of Biochemistry & Molecular Biophysics, Columbia University, New York, USA
  - Institute for Cancer Genetics, Columbia University, New York, USA
  - Motor Neuron Center, Columbia University, New York, USA
  - Columbia Initiative in Stem Cells, Columbia University, New York, USA
6. Kakei Y, Ogo Y, Itai RN, Kobayashi T, Yamakawa T, Nakanishi H, Nishizawa NK. Development of a novel prediction method of cis-elements to hypothesize collaborative functions of cis-element pairs in iron-deficient rice. Rice (N Y) 2013; 6:22. [PMID: 24279975 PMCID: PMC4883709 DOI: 10.1186/1939-8433-6-22] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Received: 05/09/2013] [Accepted: 09/13/2013] [Indexed: 05/20/2023]
Abstract
BACKGROUND: Cis-acting elements are essential genomic sequences that control gene expression. In higher eukaryotes, a series of cis-elements function cooperatively. However, further studies are required to examine the co-regulation of multiple cis-elements on a promoter. The aim of this study was to propose a model of cis-element networks that cooperatively regulate gene expression in rice under iron (Fe) deficiency.
RESULTS: We developed a novel clustering-free method, microarray-associated motif analyzer (MAMA), to predict novel cis-acting elements based on weighted sequence similarities and gene expression profiles in microarray analyses. Simulation of gene expression was performed using a support vector machine and based on the presence of predicted motifs and motif pairs. The accuracy of simulated gene expression was used to evaluate the quality of prediction and to optimize the parameters used in this method. Based on sequences of Oryza sativa genes upregulated by Fe deficiency, MAMA returned experimentally identified cis-elements responsible for Fe deficiency in O. sativa. When this method was applied to O. sativa subjected to zinc deficiency and Arabidopsis thaliana subjected to salt stress, several novel candidate cis-acting elements that overlap with known cis-acting elements, such as ZDRE, ABRE, and DRE, were identified. After optimization, MAMA accurately simulated more than 87% of gene expression. Predicted motifs strongly co-localized in the upstream regions of regulated genes and sequences around transcription start sites. Furthermore, in many cases, the separation (in bp) between co-localized motifs was conserved, suggesting that predicted motifs and the separation between them were important in the co-regulation of gene expression.
CONCLUSIONS: Our results are suggestive of a typical sequence model for Fe deficiency-responsive promoters and some strong candidate cis-elements that function cooperatively with known cis-elements.
Affiliation(s)
- Yusuke Kakei
  - Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, 113-8657 Bunkyo-ku Tokyo, Japan
  - Plant Biotechnology Division, Yokohama City University, Kihara Institute for Biological Research Maiokacho, 641-12, Totsuka, Yokohama, Kanagawa 244-0813 Japan
- Yuko Ogo
  - Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, 113-8657 Bunkyo-ku Tokyo, Japan
  - Functional Transgenic Crops Research Unit, Genetically Modified Organism Research Center National Institute of Agrobiological Sciences, Kannondai 2-1-2, 305-8602 Tsukuba, Ibaraki Japan
- Reiko N Itai
  - Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, 113-8657 Bunkyo-ku Tokyo, Japan
- Takanori Kobayashi
  - Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, 113-8657 Bunkyo-ku Tokyo, Japan
  - Research Institute for Bioresources and Biotechnology, Ishikawa Prefectural University, 1-308 Suematsu, 921-8836 Nonoichi-machi, Ishikawa Japan
- Takashi Yamakawa
  - Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, 113-8657 Bunkyo-ku Tokyo, Japan
- Hiromi Nakanishi
  - Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, 113-8657 Bunkyo-ku Tokyo, Japan
- Naoko K Nishizawa
  - Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, 113-8657 Bunkyo-ku Tokyo, Japan
  - Research Institute for Bioresources and Biotechnology, Ishikawa Prefectural University, 1-308 Suematsu, 921-8836 Nonoichi-machi, Ishikawa Japan
7. Sahu SN, Lewis J, Patel I, Bozdag S, Lee JH, LeClerc JE, Cinar HN. Genomic analysis of immune response against Vibrio cholerae hemolysin in Caenorhabditis elegans. PLoS One 2012; 7:e38200. [PMID: 22675448 PMCID: PMC3364981 DOI: 10.1371/journal.pone.0038200] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.1] [Received: 12/02/2011] [Accepted: 05/04/2012] [Indexed: 11/18/2022] [Open Access]
Abstract
Vibrio cholerae cytolysin (VCC) is among the accessory V. cholerae virulence factors that may contribute to disease pathogenesis in humans. VCC, encoded by the hlyA gene, belongs to the most common class of bacterial toxins, known as pore-forming toxins (PFTs). V. cholerae infects and kills Caenorhabditis elegans in a cholera toxin-independent manner. VCC is required for lethality, growth retardation, and intestinal cell vacuolation during infection. However, little is known about the host gene expression responses against VCC. To address this question, we performed a microarray study in C. elegans exposed to V. cholerae strains with intact and deleted hlyA genes. Many of the VCC-regulated genes identified, including C-type lectins, prion-like (glutamine [Q]/asparagine [N]-rich) domain-containing genes, and genes regulated by the insulin/IGF-1-mediated signaling (IIS) pathway, were previously reported as mediators of the innate immune response against other bacteria in C. elegans. The protective function of a subset of the genes up-regulated by VCC was confirmed using RNAi. By means of a machine learning algorithm called FastMEDUSA, we identified several putative VCC-induced immune regulatory transcription factors and transcription factor binding motifs. Our results suggest that VCC is a major virulence factor, which induces a wide variety of immune response-related genes during V. cholerae infection in C. elegans.
Affiliation(s)
- Surasri N. Sahu
  - Division of Virulence Assessment, Food and Drug Administration, Laurel, Maryland, United States of America
  - Oak Ridge Institute for Science and Education, Oak Ridge, Tennessee, United States of America
- Jada Lewis
  - Division of Molecular Biology, Food and Drug Administration, Laurel, Maryland, United States of America
- Isha Patel
  - Division of Molecular Biology, Food and Drug Administration, Laurel, Maryland, United States of America
- Serdar Bozdag
  - Neuro-Oncology Branch, National Cancer Institute, National Institutes of Health, Bethesda, Maryland, United States of America
- Jeong H. Lee
  - Division of Virulence Assessment, Food and Drug Administration, Laurel, Maryland, United States of America
  - Kyungpook National University (KNU), Daegu, South Korea
- Joseph E. LeClerc
  - Division of Molecular Biology, Food and Drug Administration, Laurel, Maryland, United States of America
- Hediye Nese Cinar
  - Division of Virulence Assessment, Food and Drug Administration, Laurel, Maryland, United States of America
8. Tretyakov K, Laur S, Vilo J. G = MAT: linking transcription factor expression and DNA binding data. PLoS One 2011; 6:e14559. [PMID: 21297945 PMCID: PMC3031503 DOI: 10.1371/journal.pone.0014559] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 04/27/2010] [Accepted: 12/03/2010] [Indexed: 12/02/2022] [Open Access]
Abstract
Transcription factors are proteins that bind to motifs on the DNA and thus affect gene expression regulation. The qualitative description of the corresponding processes is therefore important for a better understanding of essential biological mechanisms. However, wet lab experiments targeted at the discovery of the regulatory interplay between transcription factors and binding sites are expensive. We propose a new, purely computational method for finding putative associations between transcription factors and motifs. This method is based on a linear model that combines sequence information with expression data. We present various methods for model parameter estimation and show, via experiments on simulated data, that these methods are reliable. Finally, we examine the performance of this model on biological data and conclude that it can indeed be used to discover meaningful associations. The developed software is available as a web tool and Scilab source code at http://biit.cs.ut.ee/gmat/.
Affiliation(s)
- Sven Laur
  - Institute of Computer Science, University of Tartu, Tartu, Estonia
- Jaak Vilo
  - Institute of Computer Science, University of Tartu, Tartu, Estonia
  - Quretec, Tartu, Estonia
9. Bozdag S, Li A, Wuchty S, Fine HA. FastMEDUSA: a parallelized tool to infer gene regulatory networks. Bioinformatics 2010; 26:1792-3. [PMID: 20513661 DOI: 10.1093/bioinformatics/btq275] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Indexed: 11/13/2022]
Abstract
MOTIVATION: In order to construct gene regulatory networks of higher organisms from gene expression and promoter sequence data efficiently, we developed FastMEDUSA. In this parallelized version of the regulatory network-modeling tool MEDUSA, expression and sequence data are shared among a user-defined number of processors on a single multi-core machine or cluster. Our results show that FastMEDUSA allows a more efficient utilization of computational resources. While the determination of a regulatory network of brain tumor in Homo sapiens takes 12 days with MEDUSA, FastMEDUSA obtained the same results in 6 h by utilizing 100 processors.
AVAILABILITY: Source code and documentation of FastMEDUSA are available at https://wiki.nci.nih.gov/display/NOBbioinf/FastMEDUSA
Affiliation(s)
- Serdar Bozdag
  - Neuro-Oncology Branch, National Cancer Institute, National Institute of Neurological Diseases and Stroke, Bethesda, MD 20892, USA
10. Stolovitzky G, Monroe D, Califano A. Dialogue on reverse-engineering assessment and methods: the DREAM of high-throughput pathway inference. Ann N Y Acad Sci 2007; 1115:1-22. [PMID: 17925349 DOI: 10.1196/annals.1407.021] [Citation(s) in RCA: 227] [Impact Index Per Article: 12.6] [Indexed: 12/24/2022]
Abstract
The biotechnological advances of the last decade have confronted us with an explosion of genetics, genomics, transcriptomics, proteomics, and metabolomics data. These data need to be organized and structured before they may provide a coherent biological picture. To accomplish this formidable task, the availability of an accurate map of the physical interactions in the cell that are responsible for cellular behavior and function would be exceedingly helpful, as these data are ultimately the result of such molecular interactions. However, all we have at this time is, at best, a fragmentary and only partially correct representation of the interactions between genes, their byproducts, and other cellular entities. If we want to succeed in our quest for understanding the biological whole as more than the sum of the individual parts, we need to build more comprehensive and cell-context-specific maps of the biological interaction networks. DREAM, the Dialogue on Reverse Engineering Assessment and Methods, is fostering a concerted effort by computational and experimental biologists to understand the limitations and to enhance the strengths of the efforts to reverse engineer cellular networks from high-throughput data. In this chapter we will discuss the salient arguments of the first DREAM conference. We will highlight both the state of the art in the field of reverse engineering as well as some of its challenges and opportunities.
Affiliation(s)
- Gustavo Stolovitzky
  - IBM Computational Biology Center, P.O. Box 218, Yorktown Heights, NY 10598, USA