601
Reverter A, Barris W, McWilliam S, Byrne KA, Wang YH, Tan SH, Hudson N, Dalrymple BP. Validation of alternative methods of data normalization in gene co-expression studies. Bioinformatics 2004; 21:1112-20. [PMID: 15564293 DOI: 10.1093/bioinformatics/bti124] [Citation(s) in RCA: 48] [Impact Index Per Article: 2.3] [Indexed: 02/05/2023]
Abstract
MOTIVATION: Clusters of genes encoding proteins with related functions, or in the same regulatory network, often exhibit expression patterns that are correlated over a large number of conditions. Protein associations and gene regulatory networks can be modelled from expression data. We address the question of which of several normalization methods is optimal prior to computing the correlation of the expression profiles between every pair of genes. RESULTS: We use gene expression data from five experiments with a total of 78 hybridizations and 23 diverse conditions. Nine methods of data normalization are explored based on all possible combinations of normalization techniques according to between and within gene and experiment variation. We compare the resulting empirical distribution of gene × gene correlations with the expectations and apply cross-validation to test the performance of each method in predicting accurate functional annotation. We conclude that normalization methods based on mixed-model equations are optimal.
Affiliation(s)
- Antonio Reverter
- Bioinformatics Group, CSIRO Livestock Industries, Queensland Bioscience Precinct, St Lucia, QLD 4067, Australia.
602
Abstract
In this report, we propose the use of structural equations as a tool for identifying and modeling genetic networks and genetic algorithms for searching the most likely genetic networks that best fit the data. After genetic networks are identified, it is fundamental to identify those networks influencing cell phenotypes. To accomplish this task we extend the concept of differential expression of the genes, widely used in gene expression data analysis, to genetic networks. We propose a definition for the differential expression of a genetic network and use the generalized T² statistic to measure the ability of genetic networks to distinguish different phenotypes. However, describing the differential expression of genetic networks is not enough for understanding biological systems because differences in the expression of genetic networks do not directly reflect regulatory strength between gene activities. Therefore, in this report we also introduce the concept of differentially regulated genetic networks, which has the potential to assess changes of gene regulation in response to perturbation in the environment and may provide new insights into the mechanism of diseases and biological processes. We propose five novel statistics to measure the differences in regulation of genetic networks. To illustrate the concepts and methods for reconstruction of genetic networks and identification of association of genetic networks with function, we applied the proposed models and algorithms to three data sets.
Affiliation(s)
- Momiao Xiong
- Human Genetics Center, University of Texas, Houston Health Science Center, TX 77030, USA.
603
Yamanaka T, Toyoshiba H, Sone H, Parham FM, Portier CJ. The TAO-Gen algorithm for identifying gene interaction networks with application to SOS repair in E. coli. Environ Health Perspect 2004; 112:1614-1621. [PMID: 15598612 PMCID: PMC1247658 DOI: 10.1289/txg.7105] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.5] [Received: 03/19/2004] [Accepted: 07/21/2004] [Indexed: 05/24/2023]
Abstract
One major unresolved issue in the analysis of gene expression data is the identification and quantification of gene regulatory networks. Several methods have been proposed for identifying gene regulatory networks, but these methods predominantly focus on the use of multiple pairwise comparisons to identify the network structure. In this article, we describe a method for analyzing gene expression data to determine a regulatory structure consistent with an observed set of expression profiles. Unlike other methods, this method goes beyond pairwise evaluations by using likelihood-based statistical methods to obtain the network that is most consistent with the complete data set. The proposed algorithm performs accurately for moderate-sized networks, with most errors being minor additions of linkages. However, the analysis also indicates that sample sizes may need to be increased to uniquely identify even moderate-sized networks. The method is used to evaluate interactions between genes in the SOS signaling pathway in Escherichia coli using gene expression data where each gene in the network is over-expressed using plasmid inserts.
Affiliation(s)
- Takeharu Yamanaka
- Laboratory of Computational Biology and Risk Analysis, National Institute of Environmental Health Sciences, National Institutes of Health/DHHS, 111 Alexander Drive, Research Triangle Park, NC 27709, USA
604
Zak DE, Pearson RK, Vadigepalli R, Gonye GE, Schwaber JS, Doyle FJ. Continuous-time identification of gene expression models. OMICS 2004; 7:373-86. [PMID: 14683610 DOI: 10.1089/153623103322637689] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.6] [Indexed: 12/14/2022]
Abstract
One objective of systems biology is to create predictive, quantitative models of the transcriptional regulation networks that govern numerous cellular processes. Gene expression measurements, as provided by microarrays, are commonly used in studies that attempt to infer the regulation underlying these processes. At present, most gene expression models that have been derived from microarray data are based in discrete time, which limits their applicability to common biological data sets and may impede the integration of gene expression models with other models of biological processes that are formulated as ordinary differential equations (ODEs). To overcome these difficulties, a continuous-time approach to process identification was developed for identifying gene expression models based in ODEs. The approach utilizes the modulating functions method of parameter identification. The method was applied to three simulated systems: (1) a linear gene expression model, (2) an autoregulatory gene expression model, and (3) simulated microarray data from a nonlinear transcriptional network. In general, the approach was well suited for identifying models of gene expression dynamics, capable of accurately identifying parameters for small numbers of data samples in the presence of modest experimental noise. Additionally, numerous insights about gene expression modeling were revealed by the case studies.
Affiliation(s)
- Daniel E Zak
- Department of Chemical Engineering, University of Delaware, Newark, Delaware, USA
605
Rice JJ, Tu Y, Stolovitzky G. Reconstructing biological networks using conditional correlation analysis. Bioinformatics 2004; 21:765-73. [PMID: 15486043 DOI: 10.1093/bioinformatics/bti064] [Citation(s) in RCA: 65] [Impact Index Per Article: 3.1] [Indexed: 11/14/2022]
Abstract
MOTIVATION: One of the present challenges in biological research is the organization of the data originating from high-throughput technologies. One way in which this information can be organized is in the form of networks of influences, physical or statistical, between cellular components. We propose an experimental method for probing biological networks, analyzing the resulting data and reconstructing the network architecture. METHODS: We use networks of known topology consisting of nodes (genes), directed edges (gene-gene interactions) and a dynamics for the genes' mRNA concentrations in terms of the gene-gene interactions. We proposed a network reconstruction algorithm based on the conditional correlation of the mRNA equilibrium concentration between two genes given that one of them was knocked down. Using simulated gene expression data on networks of known connectivity, we investigated how the reconstruction error is affected by noise, network topology, size, sparseness and dynamic parameters. RESULTS: Errors arise from correlation between nodes connected through intermediate nodes (false positives) and when the correlation between two directly connected nodes is obscured by noise, non-linearity or multiple inputs to the target node (false negatives). Two critical components of the method are as follows: (1) the choice of an optimal correlation threshold for predicting connections and (2) the reduction of errors arising from indirect connections (for which a novel algorithm is proposed). With these improvements, we can reconstruct networks with the topology of the transcriptional regulatory network in Escherichia coli with a reasonably low error rate.
Affiliation(s)
- John Jeremy Rice
- Computational Biology Center, IBM T.J. Watson Research Center, PO Box 218, Yorktown Heights, NY 10598, USA
606
Woolf PJ, Prudhomme W, Daheron L, Daley GQ, Lauffenburger DA. Bayesian analysis of signaling networks governing embryonic stem cell fate decisions. Bioinformatics 2004; 21:741-53. [PMID: 15479714 DOI: 10.1093/bioinformatics/bti056] [Citation(s) in RCA: 78] [Impact Index Per Article: 3.7] [Indexed: 01/22/2023]
Abstract
MOTIVATION: Signaling events that direct mouse embryonic stem (ES) cell self-renewal and differentiation are complex and accordingly difficult to understand in an integrated manner. We address this problem by adapting a Bayesian network learning algorithm to model proteomic signaling data for ES cell fate responses to external cues. Using this model we were able to characterize the signaling pathway influences as quantitative, logic-circuit type interactions. Our experimental dataset includes measurements for 28 signaling protein phosphorylation states across 16 different factorial combinations of cytokine and matrix stimuli as reported previously. RESULTS: The Bayesian network modeling approach allows us to uncover previously reported signaling activities related to mouse ES cell self-renewal, such as the roles of LIF and STAT3 in maintaining undifferentiated ES cell populations. Furthermore, the network predicts novel influences such as between ERK phosphorylation and differentiation, or RAF phosphorylation and differentiated cell proliferation. Visualization of the influences detected by the Bayesian network provides intuition about the underlying physiology of the signaling pathways. We demonstrate that the Bayesian networks can capture the linear, nonlinear and multistate logic interactions that connect extracellular cues, intracellular signals and consequent cell functional responses.
Affiliation(s)
- Peter J Woolf
- Department of Chemical Engineering, University of Michigan, Room 3320, G. G. Brown Building, 2300 Hayward Street, Ann Arbor, MI 48109-2125, USA.
607
Xia Y, Yu H, Jansen R, Seringhaus M, Baxter S, Greenbaum D, Zhao H, Gerstein M. Analyzing cellular biochemistry in terms of molecular networks. Annu Rev Biochem 2004; 73:1051-87. [PMID: 15189167 DOI: 10.1146/annurev.biochem.73.011303.073950] [Citation(s) in RCA: 87] [Impact Index Per Article: 4.1] [Indexed: 11/09/2022]
Abstract
One way to understand cells and circumscribe the function of proteins is through molecular networks. These networks take a variety of forms including webs of protein-protein interactions, regulatory circuits linking transcription factors and targets, and complex pathways of metabolic reactions. We first survey experimental techniques for mapping networks (e.g., the yeast two-hybrid screens). We then turn our attention to computational approaches for predicting networks from individual protein features, such as correlating gene expression levels or analyzing sequence coevolution. All the experimental techniques and individual predictions suffer from noise and systematic biases. These problems can be overcome to some degree through statistical integration of different experimental datasets and predictive features (e.g., within a Bayesian formalism). Next, we discuss approaches for characterizing the topology of networks, such as finding hubs and analyzing subnetworks in terms of common motifs. Finally, we close with perspectives on how network analysis represents a preliminary step toward a systems approach for modeling cells.
Affiliation(s)
- Yu Xia
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520, USA.
608
Yap Y, Zhang X, Ling MT, Wang X, Wong YC, Danchin A. Classification between normal and tumor tissues based on the pair-wise gene expression ratio. BMC Cancer 2004; 4:72. [PMID: 15469618 PMCID: PMC524507 DOI: 10.1186/1471-2407-4-72] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.0] [Received: 01/09/2004] [Accepted: 10/07/2004] [Indexed: 11/24/2022]
Abstract
Background: Precise classification of cancer types is critically important for early cancer diagnosis and treatment. Numerous efforts have been made to use gene expression profiles to improve precision of tumor classification. However, reliable cancer-related signals are generally lacking. Method: Using recent datasets on colon and prostate cancer, a data transformation procedure from single gene expression to pair-wise gene expression ratio is proposed. Making use of the internal consistency of each expression profiling dataset, this transformation improves the signal-to-noise ratio of the dataset and uncovers new relevant cancer-related signals (features). The efficiency in using the transformed dataset to perform normal/tumor classification was investigated using feature partitioning with informative features (gene annotation) as discriminating axes (single gene expression or pair-wise gene expression ratio). Classification results were compared to the original datasets for up to 10-feature model classifiers. Results: 82 and 262 genes that have high correlation to tissue phenotype were selected from the colon and prostate datasets respectively. Remarkably, data transformation of the highly noisy expression data successfully lowered the coefficient of variation (CV) for the within-class samples and improved the correlation with tissue phenotypes. The transformed dataset exhibited lower CV when compared to that of single gene expression. In the colon cancer set, the minimum CV decreased from 45.3% to 16.5%. In prostate cancer, comparable CV was achieved with and without transformation. This improvement in CV, coupled with the improved correlation between the pair-wise gene expression ratio and tissue phenotypes, yielded higher classification efficiency, especially with the colon dataset, from 87.1% to 93.5%. Over 90% of the top ten discriminating axes in both datasets showed significant improvement after data transformation. The high classification efficiency achieved suggested that there exist some cancer-related signals in the form of pair-wise gene expression ratio. Conclusion: The results from this study indicated that: 1) in the case when the pair-wise expression ratio transformation achieves lower CV and higher correlation to tissue phenotypes, a better classification of tissue type will follow; 2) the comparable classification accuracy achieved after data transformation suggested that pair-wise gene expression ratio between some pairs of genes can identify reliable markers for cancer.
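The effect of a ratio transformation on the within-class CV can be illustrated with a minimal sketch. It assumes a toy case, not taken from the paper, in which two genes share a sample-specific multiplicative factor; the function names `coefficient_of_variation` and `pairwise_ratio` and all numbers are hypothetical.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Return the CV (population standard deviation / mean) as a percentage."""
    return 100.0 * pstdev(values) / mean(values)


def pairwise_ratio(gene_a, gene_b):
    """Turn two single-gene profiles into one pair-wise expression ratio profile."""
    return [a / b for a, b in zip(gene_a, gene_b)]


# Hypothetical within-class samples: each sample carries its own scale factor
# (for example, an array-to-array intensity difference) that affects both genes.
scale = [0.5, 1.0, 1.5, 2.0]
gene_a = [10.0 * s for s in scale]
gene_b = [4.0 * s for s in scale]

cv_single = coefficient_of_variation(gene_a)
cv_ratio = coefficient_of_variation(pairwise_ratio(gene_a, gene_b))
print(f"single-gene CV: {cv_single:.1f}%, ratio CV: {cv_ratio:.1f}%")
```

In this idealized case the shared factor cancels in the ratio, so the ratio profile varies less than either gene alone. Real data would only approximate this, which may be why the abstract reports a gain for colon but not for prostate.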
Affiliation(s)
- YeeLeng Yap
- HKU-Pasteur Research Centre, Dexter H.C. Man Building, 8 Sassoon Road, Pokfulam, Hong Kong, China
- XueWu Zhang
- HKU-Pasteur Research Centre, Dexter H.C. Man Building, 8 Sassoon Road, Pokfulam, Hong Kong, China
- MT Ling
- Cancer Biology Laboratory, Department of Anatomy, Faculty of Medicine, The University of Hong Kong, China
- XiangHong Wang
- Cancer Biology Laboratory, Department of Anatomy, Faculty of Medicine, The University of Hong Kong, China
- YC Wong
- Cancer Biology Laboratory, Department of Anatomy, Faculty of Medicine, The University of Hong Kong, China
- Central Laboratory of the Institute of Molecular Technology for Drug Discovery and Synthesis, The University of Hong Kong, China
- Antoine Danchin
- Institut Pasteur, Unité de Génétique des Génomes Bactériens, 28 rue du Docteur Roux, 75724 Paris Cedex 15, France
609
Perkins TJ, Hallett M, Glass L. Inferring models of gene expression dynamics. J Theor Biol 2004; 230:289-99. [PMID: 15302539 DOI: 10.1016/j.jtbi.2004.05.022] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.2] [Received: 12/08/2003] [Revised: 05/12/2004] [Accepted: 05/13/2004] [Indexed: 11/17/2022]
Abstract
We study the problem of identifying genetic networks in which expression dynamics are modeled by a differential equation that uses logical rules to specify time derivatives. We make three main contributions. First, we describe computationally efficient procedures for identifying the structure and dynamics of such networks from expression time series. Second, we derive predictions for the expected amount of data needed to identify randomly generated networks. Third, if expression values are available for only some of the genes, we show that the structure of the network for these "visible" genes can be identified and that the size and overall complexity of the network can be estimated. We validate these procedures and predictions using simulation experiments based on randomly generated networks with up to 30,000 genes and 17 distinct regulators per gene and on a network that models floral morphogenesis in Arabidopsis thaliana.
Affiliation(s)
- Theodore J Perkins
- McGill Centre for Bioinformatics, McGill University, 3775 University St. Montreal, Quebec, Canada H3A 2B4.
610
Herrgård MJ, Covert MW, Palsson BØ. Reconstruction of microbial transcriptional regulatory networks. Curr Opin Biotechnol 2004; 15:70-7. [PMID: 15102470 DOI: 10.1016/j.copbio.2003.11.002] [Citation(s) in RCA: 103] [Impact Index Per Article: 4.9] [Indexed: 10/26/2022]
Abstract
Although metabolic networks can be readily reconstructed through comparative genomics, the reconstruction of regulatory networks has been hindered by the relatively low level of evolutionary conservation of their molecular components. Recent developments in experimental techniques have allowed the generation of vast amounts of data related to regulatory networks. This data together with literature-derived knowledge has opened the way for genome-scale reconstruction of transcriptional regulatory networks. Large-scale regulatory network reconstructions can be converted to in silico models that allow systematic analysis of network behavior in response to changes in environmental conditions. These models can further be combined with genome-scale metabolic models to build integrated models of cellular function including both metabolism and its regulation.
Affiliation(s)
- Markus J Herrgård
- Department of Bioengineering, Bioinformatics Graduate Program, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0412, USA
611
Abstract
The availability of entire genome sequences is expected to revolutionize the way in which biology and medicine are conducted for years to come. However, achieving this promise still requires significant effort in the areas of gene annotation, cloning and expression of thousands of known and heretofore unknown protein-encoding genes. Traditional technologies of manipulating genes are too cumbersome and inefficient when one is dealing with more than a few genes at a time. Entire libraries composed of all protein-encoding open reading frames (ORFs) cloned in highly flexible vectors will be needed to take full advantage of the information found in any genome sequence. The creation of such ORFeome resources using novel technologies for cloning and expressing entire proteomes constitutes an effective gateway from whole genome sequencing efforts to downstream 'omics' applications.
Affiliation(s)
- Jean-François Rual
- Center for Cancer Systems Biology and Department of Cancer Biology, Dana-Farber Cancer Institute and Department of Genetics, Harvard Medical School, 44 Binney Street, Boston, MA 02115, USA
612
Veflingstad SR, Almeida J, Voit EO. Priming nonlinear searches for pathway identification. Theor Biol Med Model 2004; 1:8. [PMID: 15367330 PMCID: PMC522751 DOI: 10.1186/1742-4682-1-8] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.5] [Received: 08/12/2004] [Accepted: 09/14/2004] [Indexed: 11/21/2022]
Abstract
BACKGROUND: Dense time series of metabolite concentrations or of the expression patterns of proteins may be available in the near future as a result of the rapid development of novel, high-throughput experimental techniques. Such time series implicitly contain valuable information about the connectivity and regulatory structure of the underlying metabolic or proteomic networks. The extraction of this information is a challenging task because it usually requires nonlinear estimation methods that involve iterative search algorithms. Priming these algorithms with high-quality initial guesses can greatly accelerate the search process. In this article, we propose to obtain such guesses by preprocessing the temporal profile data and fitting them preliminarily by multivariate linear regression. RESULTS: The results of a small-scale analysis indicate that the regression coefficients reflect the connectivity of the network quite well. Using the mathematical modeling framework of Biochemical Systems Theory (BST), we also show that the regression coefficients may be translated into constraints on the parameter values of the nonlinear BST model, thereby reducing the parameter search space considerably. CONCLUSION: The proposed method provides a good approach for obtaining a preliminary network structure from dense time series. This will be more valuable as the systems become larger, because preprocessing and effective priming can significantly limit the search space of parameters defining the network connectivity, thereby facilitating the nonlinear estimation task.
Affiliation(s)
- Siren R Veflingstad
- Department of Chemistry, Biotechnology and Food Science, Agricultural University of Norway, N-1432 Ås, Norway
- Center for Integrative Genetics (Cigene), Agricultural University of Norway, N-1432 Ås, Norway
- Jonas Almeida
- Department of Biostatistics, Bioinformatics and Epidemiology, Medical University of South Carolina, 303K Cannon Place, 135 Cannon Street, Charleston, SC 29425, USA
- Eberhard O Voit
- Department of Biostatistics, Bioinformatics and Epidemiology, Medical University of South Carolina, 303K Cannon Place, 135 Cannon Street, Charleston, SC 29425, USA
- Department of Biochemistry and Molecular Biology, Medical University of South Carolina, 303K Cannon Place, 171 Ashley Avenue, Charleston, SC 29425, USA
613
Toyoshiba H, Yamanaka T, Sone H, Parham FM, Walker NJ, Martinez J, Portier CJ. Gene interaction network suggests dioxin induces a significant linkage between aryl hydrocarbon receptor and retinoic acid receptor beta. Environ Health Perspect 2004; 112:1217-24. [PMID: 15345368 PMCID: PMC1277115 DOI: 10.1289/txg.7020] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.5] [Indexed: 05/08/2023]
Abstract
Gene expression arrays (gene chips) have enabled researchers to roughly quantify the level of mRNA expression for a large number of genes in a single sample. Several methods have been developed for the analysis of gene array data including clustering, outlier detection, and correlation studies. Most of these analyses are aimed at a qualitative identification of what is different between two samples and/or the relationship between two genes. We propose a quantitative, statistically sound methodology for the analysis of gene regulatory networks using gene expression data sets. The method is based on Bayesian networks for direct quantification of gene expression networks. Using the gene expression changes in HPL1A lung airway epithelial cells after exposure to 2,3,7,8-tetrachlorodibenzo-p-dioxin at levels of 0.1, 1.0, and 10.0 nM for 24 hr, a gene expression network was hypothesized and analyzed. The method clearly demonstrates support for the assumed network and the hypothesis linking the usual dioxin expression changes to the retinoic acid receptor system. Simulation studies demonstrated the method works well, even for small samples.
Affiliation(s)
- Hiroyoshi Toyoshiba
- Laboratory of Computational Biology and Risk Analysis, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina 27709, USA
614
Laubenbacher R, Stigler B. A computational algebra approach to the reverse engineering of gene regulatory networks. J Theor Biol 2004; 229:523-37. [PMID: 15246788 DOI: 10.1016/j.jtbi.2004.04.037] [Citation(s) in RCA: 96] [Impact Index Per Article: 4.6] [Received: 12/15/2003] [Revised: 04/21/2004] [Accepted: 04/28/2004] [Indexed: 11/20/2022]
Abstract
This paper proposes a new method to reverse engineer gene regulatory networks from experimental data. The modeling framework used is time-discrete deterministic dynamical systems, with a finite set of states for each of the variables. The simplest examples of such models are Boolean networks, in which variables have only two possible states. The use of a larger number of possible states allows a finer discretization of experimental data and more than one possible mode of action for the variables, depending on threshold values. Furthermore, with a suitable choice of state set, one can employ powerful tools from computational algebra that underlie the reverse-engineering algorithm, avoiding costly enumeration strategies. To perform well, the algorithm requires wildtype together with perturbation time courses. This makes it suitable for small to meso-scale networks rather than networks on a genome-wide scale. An analysis of the complexity of the algorithm is performed. The algorithm is validated on a recently published Boolean network model of segment polarity development in Drosophila melanogaster.
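The modeling framework named in this abstract, a time-discrete deterministic dynamical system over a finite state set, can be sketched for its simplest case, a Boolean network. The three-gene network and its update rules below are hypothetical and are not taken from the paper. The sketch only simulates such a system forward; it does not implement the computational-algebra reverse-engineering algorithm.

```python
def step(state, rules):
    """Apply every update rule synchronously to produce the next state."""
    return tuple(rule(state) for rule in rules)


def trajectory(initial, rules, n_steps):
    """Return the states visited from `initial` over `n_steps` synchronous updates."""
    states = [initial]
    for _ in range(n_steps):
        states.append(step(states[-1], rules))
    return states


# Hypothetical rules for genes (x0, x1, x2), each taking the value 0 or 1.
rules = (
    lambda s: s[2],         # x0 copies the previous value of x2
    lambda s: s[0] & s[1],  # x1 stays on only if x0 AND x1 were on
    lambda s: 1 - s[0],     # x2 is repressed by x0
)

time_course = trajectory((1, 1, 0), rules, 4)
for t, s in enumerate(time_course):
    print(t, s)
```

A time course like `time_course` is the kind of discretized data a reverse-engineering method would take as input. A perturbation time course could be mimicked here by replacing one rule with a constant, for example `lambda s: 0` for a knockout.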
Affiliation(s)
- Reinhard Laubenbacher
- Virginia Bioinformatics Institute at Virginia Tech, 1880 Pratt Drive, Building XV, Blacksburg, VA 24061, USA.
615
Isaacs FJ, Dwyer DJ, Ding C, Pervouchine DD, Cantor CR, Collins JJ. Engineered riboregulators enable post-transcriptional control of gene expression. Nat Biotechnol 2004; 22:841-7. [PMID: 15208640 DOI: 10.1038/nbt986] [Citation(s) in RCA: 403] [Impact Index Per Article: 19.2] [Received: 01/09/2004] [Accepted: 05/06/2004] [Indexed: 11/08/2022]
Abstract
Recent studies have demonstrated the important enzymatic, structural and regulatory roles of RNA in the cell. Here we present a post-transcriptional regulation system in Escherichia coli that uses RNA to both silence and activate gene expression. We inserted a complementary cis sequence directly upstream of the ribosome binding site in a target gene. Upon transcription, this cis-repressive sequence causes a stem-loop structure to form at the 5'-untranslated region of the mRNA. The stem-loop structure interferes with ribosome binding, silencing gene expression. A small noncoding RNA that is expressed in trans targets the cis-repressed RNA with high specificity, causing an alteration in the stem-loop structure that activates expression. Such engineered riboregulators may lend insight into mechanistic actions of endogenous RNA-based processes and could serve as scalable components of biological networks, able to function with any promoter or gene to directly control gene expression.
Affiliation(s)
- Farren J Isaacs
- Center for BioDynamics and Department of Biomedical Engineering, Boston University, 44 Cummington Street, Boston, Massachusetts 02215, USA
616
Abstract
With the development of trauma systems, improved resuscitation, and organ system support, survival after severe injury is common, but is often complicated by nosocomial infection and organ failure. These complications are costly, and can lead to death or disability. Although much is known about the pathophysiology of post-traumatic nosocomial infection and organ failure, findings have been limited by our ability to generate and analyse large amounts of experimental and observational data. However, technological advances in nucleic acid and protein analysis, coupled with increased computational capacity, provide an opportunity to characterise the determinants of and the responses to injury and sepsis on a genome-wide scale. New large-scale collaborative efforts aim to investigate the genome for variation (gene polymorphisms), characterise multiple levels of the biological response to injury (transcriptome and proteome), and relate these to clinical phenotypes. In this article, we summarise recent findings and explore where promising new technologies might have the greatest potential for increasing our knowledge. It will now be important to determine how these recent technological advances can be used and integrated with our existing approaches, to reduce death, disability, and the economic consequences of trauma.
Affiliation(s)
- J Perren Cobb
- Cellular Injury and Adaptation Laboratory, Department of Surgery, Washington University in St Louis, St Louis, Missouri 63110, USA.
617
Kalir S, Alon U. Using a Quantitative Blueprint to Reprogram the Dynamics of the Flagella Gene Network. Cell 2004; 117:713-20. [PMID: 15186773 DOI: 10.1016/j.cell.2004.05.010] [Citation(s) in RCA: 88] [Impact Index Per Article: 4.2] [Received: 11/20/2003] [Revised: 03/22/2004] [Accepted: 04/08/2004] [Indexed: 11/24/2022]
Abstract
Detailed understanding and control of biological networks will require a level of description similar to that of electronic engineering blueprints. Currently, however, even the best-studied systems are usually described using qualitative arrow diagrams. A quantitative blueprint requires in vivo measurements of (1) the relative strength of the interactions (numbers on the arrows) and (2) the functions that integrate multiple inputs. Here, we address this using a well-studied system, the flagella biosynthesis transcription network in Escherichia coli. We use theory and high-resolution experiments to obtain a quantitative blueprint with (1) numbers on the arrows, finding different hierarchies of activation coefficients for the two regulators, FlhDC and FliA; and (2) cis-regulatory input functions, which summate the input from the two regulators (SUM gates). We then demonstrate experimentally how this blueprint can be used to reprogram temporal expression patterns in this system, using controlled expression of the regulators or point mutations in their binding sites. The present approach can be used to define blueprints of other gene networks and to quantitatively reprogram their dynamics.
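The SUM-gate idea in this abstract can be illustrated with a minimal sketch. It assumes each promoter's activity is a weighted sum of the FlhDC and FliA inputs, with the weights standing in for the "numbers on the arrows". The promoter names, coefficient values, time series, and threshold below are all hypothetical and are not taken from the paper.

```python
def sum_gate(flhdc, flia, beta_flhdc, beta_flia):
    """Promoter activity as a weighted sum of the two regulator inputs."""
    return beta_flhdc * flhdc + beta_flia * flia


def activation_order(promoters, flhdc_series, flia_series, threshold):
    """Order promoters by the first time step at which activity reaches threshold.

    Promoters that never reach the threshold are left out of the result.
    """
    first_on = {}
    for name, (beta_flhdc, beta_flia) in promoters.items():
        for t, (x1, x2) in enumerate(zip(flhdc_series, flia_series)):
            if sum_gate(x1, x2, beta_flhdc, beta_flia) >= threshold:
                first_on[name] = t
                break
    return sorted(first_on, key=first_on.get)


# Hypothetical activation coefficients (beta_FlhDC, beta_FliA) for two promoters.
promoters = {"promoter_A": (2.0, 0.1), "promoter_B": (0.2, 1.0)}
flhdc_series = [0.0, 0.5, 1.0, 1.0, 1.0]  # FlhDC rises first
flia_series = [0.0, 0.0, 0.2, 0.6, 1.0]   # FliA follows

order = activation_order(promoters, flhdc_series, flia_series, threshold=0.75)

# "Reprogramming": swapping the coefficients reverses the temporal order.
swapped = {"promoter_A": (0.2, 1.0), "promoter_B": (2.0, 0.1)}
reprogrammed = activation_order(swapped, flhdc_series, flia_series, threshold=0.75)
```

In this toy setting, changing a coefficient (as a binding-site point mutation might) is enough to change which promoter turns on first, which is the kind of reprogramming the abstract describes.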
Affiliation(s)
- Shiraz Kalir
- Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot, Israel 76100
|
618
|
Kobayashi H, Kaern M, Araki M, Chung K, Gardner TS, Cantor CR, Collins JJ. Programmable cells: interfacing natural and engineered gene networks. Proc Natl Acad Sci U S A 2004; 101:8414-9. [PMID: 15159530 PMCID: PMC420408 DOI: 10.1073/pnas.0402940101] [Citation(s) in RCA: 398] [Impact Index Per Article: 19.0] [Indexed: 11/18/2022] Open
Abstract
Novel cellular behaviors and characteristics can be obtained by coupling engineered gene networks to the cell's natural regulatory circuitry through appropriately designed input and output interfaces. Here, we demonstrate how an engineered genetic circuit can be used to construct cells that respond to biological signals in a predetermined and programmable fashion. We employ a modular design strategy to create Escherichia coli strains where a genetic toggle switch is interfaced with: (i) the SOS signaling pathway responding to DNA damage, and (ii) a transgenic quorum sensing signaling pathway from Vibrio fischeri. The genetic toggle switch endows these strains with binary response dynamics and an epigenetic inheritance that supports a persistent phenotypic alteration in response to transient signals. These features are exploited to engineer cells that form biofilms in response to DNA-damaging agents and cells that activate protein synthesis when the cell population reaches a critical density. Our work represents a step toward the development of "plug-and-play" genetic circuitry that can be used to create cells with programmable behaviors.
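The "persistent phenotypic alteration in response to transient signals" can be illustrated with a small simulation. The sketch below assumes the commonly used dimensionless two-repressor toggle model (each repressor inhibits synthesis of the other through a Hill function); the abstract does not give equations, so the model form, the parameter values, and the way the input signal is represented (a temporary extra decay of repressor `u`) are assumptions made for illustration only.

```python
def simulate_toggle(pulse_time, total_time, dt=0.01, alpha=10.0, n=2.0,
                    pulse_decay=10.0):
    """Forward-Euler integration of a two-repressor toggle switch.

    du/dt = alpha / (1 + v**n) - u - extra * u
    dv/dt = alpha / (1 + u**n) - v
    where `extra` equals `pulse_decay` during the input pulse and 0 afterwards.
    Returns the final (u, v).
    """
    u, v = 10.0, 0.1  # start close to the "u high, v low" state
    steps = int(round(total_time / dt))
    pulse_steps = int(round(pulse_time / dt))
    for step in range(steps):
        extra = pulse_decay if step < pulse_steps else 0.0
        du = alpha / (1.0 + v ** n) - u - extra * u
        dv = alpha / (1.0 + u ** n) - v
        u += dt * du
        v += dt * dv
    return u, v


# Without a signal the switch stays in its initial state.
u_rest, v_rest = simulate_toggle(pulse_time=0.0, total_time=15.0)

# A transient signal flips the switch, and the new state persists after
# the signal is removed.
u_flip, v_flip = simulate_toggle(pulse_time=5.0, total_time=15.0)
```

The point of the sketch is the memory: the pulse lasts only the first third of the run, yet the flipped state is still present at the end.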
Affiliation(s)
- Hideki Kobayashi
- Department of Biomedical Engineering, Center for BioDynamics, and Center for Advanced Biotechnology, Boston University, 44 Cummington Street, Boston, MA 02215, USA
|
619
|
Jaskoll T, Witcher D, Toreno L, Bringas P, Moon AM, Melnick M. FGF8 dose-dependent regulation of embryonic submandibular salivary gland morphogenesis. Dev Biol 2004; 268:457-69. [PMID: 15063181 DOI: 10.1016/j.ydbio.2004.01.004] [Citation(s) in RCA: 56] [Impact Index Per Article: 2.7] [Received: 07/25/2003] [Revised: 11/20/2003] [Accepted: 01/02/2004] [Indexed: 11/21/2022]
Abstract
FGF8 has been shown to play important morphoregulatory roles during embryonic development. The observation that craniofacial, cardiovascular, pharyngeal, and neural phenotypes vary with Fgf8 gene dosage suggests that FGF8 signaling induces differences in downstream responses in a dose-dependent manner. In this study, we investigated whether FGF8 plays a dose-dependent regulatory role during embryonic submandibular salivary gland (SMG) morphogenesis. We evaluated SMG phenotypes of Fgf8 hypomorphic mice, which have decreased Fgf8 gene function throughout embryogenesis. We also evaluated SMG phenotypes of Fgf8 conditional mutants in which Fgf8 function has been completely ablated in its expression domain in the first pharyngeal arch ectoderm from the time of arch formation. Fgf8 hypomorphs have hypoplastic SMGs, whereas conditional mutant SMGs exhibit ontogenic arrest followed by involution and are absent by E18.5. SMG aplasia in Fgf8 ectoderm conditional mutants indicates that FGF8 signaling is essential for the morphogenesis and survival of Pseudoglandular Stage and older SMGs. Equally important, the presence of an initial SMG bud in Fgf8 conditional mutants indicates that initial bud formation is FGF8 independent. Mice heterozygous for either the Fgf8 null allele (Fgf8(+/N)) or the hypomorphic allele (Fgf8(+/H)) have SMGs that are indistinguishable from those of wild-type (Fgf8(+/+)) mice, which suggests that there is not only an FGF8 dose-dependent phenotypic response, but a nonlinear, threshold-like, epistatic response as well. We also found that enhanced FGF8 signaling induced, and abrogated FGF8 signaling decreased, SMG branching morphogenesis in vitro. Furthermore, since FGF10 and Shh expression is modulated by Fgf8 levels, we postulated that exogenous FGF10, Shh, or FGF10 + Shh peptide supplementation in vitro would largely "rescue" the abnormal SMG phenotype associated with decreased FGF8 signaling. This is as expected, though there is no synergistic effect with FGF10 + Shh peptide supplementation. These in vitro experiments model the principle that mutations have different effects in the context of different epigenotypes.
Affiliation(s)
- Tina Jaskoll
- Laboratory for Developmental Genetics, University of Southern California, Los Angeles, CA 90089-0641, USA.
|
620
|
Ding C, Maier E, Roscher AA, Braun A, Cantor CR. Simultaneous quantitative and allele-specific expression analysis with real competitive PCR. BMC Genet 2004; 5:8. [PMID: 15128429 PMCID: PMC411033 DOI: 10.1186/1471-2156-5-8] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.2] [Received: 02/20/2004] [Accepted: 05/05/2004] [Indexed: 11/17/2022] Open
Abstract
Background: For a diploid organism such as the human, the two alleles of a particular gene can be expressed at different levels due to X chromosome inactivation, gene imprinting, different local promoter activity, or mRNA stability. Recently, imbalanced allelic expression was found to be common in humans and can follow Mendelian inheritance. Here we present a method that employs real competitive PCR for allele-specific expression analysis. Results: A transcribed mutation such as a single nucleotide polymorphism (SNP) is used as the marker for allele-specific expression analysis. A synthetic mutation created in the competitor is close to a natural mutation site in the cDNA sequence. PCR is used to amplify the two cDNA sequences from the two alleles and the competitor. A base extension reaction with a mixture of ddNTPs/dNTP is used to generate three oligonucleotides for the two cDNAs and the competitor. The three products are identified and their ratios are calculated based on their peak areas in the MALDI-TOF mass spectrum. Several examples are given to illustrate how allele-specific gene expression can be applied in different biological studies. Conclusions: This technique can quantify the absolute expression level of each individual allele of a gene with high precision and throughput.
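The ratio calculation described in the Results can be sketched as follows. The sketch assumes that peak area is proportional to template amount and that both alleles and the competitor amplify with equal efficiency, so each allele's absolute level follows from its peak-area ratio to a competitor spiked in at a known amount. The function name, argument names, and numbers are hypothetical.

```python
def allele_expression(area_allele1, area_allele2, area_competitor,
                      competitor_copies):
    """Estimate per-allele copy numbers from three mass-spectrum peak areas.

    Returns (copies_allele1, copies_allele2, allelic_ratio), where the ratio
    is allele 1 over allele 2 (infinity if allele 2 is not detected).
    """
    if area_competitor <= 0:
        raise ValueError("competitor peak area must be positive")
    copies1 = competitor_copies * area_allele1 / area_competitor
    copies2 = competitor_copies * area_allele2 / area_competitor
    ratio = copies1 / copies2 if copies2 > 0 else float("inf")
    return copies1, copies2, ratio


# Hypothetical peak areas with 1000 competitor copies added to the reaction.
result = allele_expression(300.0, 100.0, 200.0, 1000.0)
```

Because the same competitor peak normalizes both alleles, the allelic ratio itself does not depend on the competitor amount; the competitor is what converts the relative measurement into an absolute one.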
Affiliation(s)
- Chunming Ding
- Bioinformatics Program and Center for Advanced Biotechnology, Boston University, Boston, MA 02215 USA
- Esther Maier
- Children's Hospital, University of Munich, Lindwurmstrasse 4, 80337 Munich, Germany
- Adelbert A Roscher
- Children's Hospital, University of Munich, Lindwurmstrasse 4, 80337 Munich, Germany
- Charles R Cantor
- Bioinformatics Program and Center for Advanced Biotechnology, Boston University, Boston, MA 02215 USA
- SEQUENOM, Inc., San Diego, CA 92121 USA
|
621
|
Beer MA, Tavazoie S. Predicting Gene Expression from Sequence. Cell 2004; 117:185-98. [PMID: 15084257 DOI: 10.1016/s0092-8674(04)00304-6] [Citation(s) in RCA: 416] [Impact Index Per Article: 19.8] [Received: 08/20/2003] [Revised: 02/13/2004] [Accepted: 02/18/2004] [Indexed: 11/28/2022]
Abstract
We describe a systematic genome-wide approach for learning the complex combinatorial code underlying gene expression. Our probabilistic approach identifies local DNA-sequence elements and the positional and combinatorial constraints that determine their context-dependent role in transcriptional regulation. The inferred regulatory rules correctly predict expression patterns for 73% of genes in Saccharomyces cerevisiae, utilizing microarray expression data and sequences in the 800 bp upstream of genes. Application to Caenorhabditis elegans identifies predictive regulatory elements and combinatorial rules that control the phased temporal expression of transcription factors, histones, and germline specific genes. Successful prediction requires diverse and complex rules utilizing AND, OR, and NOT logic, with significant constraints on motif strength, orientation, and relative position. This system generates a large number of mechanistic hypotheses for focused experimental validation, and establishes a predictive dynamical framework for understanding cellular behavior from genomic sequence.
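A combinatorial rule of the kind this abstract mentions (AND, OR, and NOT logic with constraints on position and orientation) can be written as a small predicate over motif occurrences. This is only an illustrative sketch: the motif names, distance cut-offs, and the rule itself are hypothetical, and the paper's probabilistic learning procedure is not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MotifHit:
    name: str
    position: int  # distance upstream of the gene start, in bp
    strand: str    # "+" or "-"


def has(hits, name, max_upstream=800, strand=None):
    """True if `name` occurs within `max_upstream` bp (optionally on `strand`)."""
    return any(
        h.name == name
        and h.position <= max_upstream
        and (strand is None or h.strand == strand)
        for h in hits
    )


def example_rule(hits):
    """(motif_A within 200 bp AND motif_B on '+' strand) AND NOT motif_C."""
    return (
        has(hits, "motif_A", max_upstream=200)
        and has(hits, "motif_B", strand="+")
        and not has(hits, "motif_C")
    )


gene_1 = [MotifHit("motif_A", 120, "+"), MotifHit("motif_B", 450, "+")]
gene_2 = [MotifHit("motif_A", 120, "+"), MotifHit("motif_B", 450, "+"),
          MotifHit("motif_C", 300, "-")]
gene_3 = [MotifHit("motif_A", 600, "+"), MotifHit("motif_B", 450, "+")]

predictions = [example_rule(g) for g in (gene_1, gene_2, gene_3)]
```

The three genes differ only in the NOT term (gene_2) and the positional constraint (gene_3), which is enough to change the prediction.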
Affiliation(s)
- Michael A Beer
- Lewis-Sigler Institute for Integrative Genomics and Department of Molecular Biology, Princeton University, Princeton, NJ 08544, USA
|
622
|
Rice J, Stolovitzky G. Making the most of it: pathway reconstruction and integrative simulation using the data at hand. ACTA ACUST UNITED AC 2004. [DOI: 10.1016/s1741-8364(04)02399-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.0] [Indexed: 10/26/2022]
|
623
|
Abstract
In this report, we propose the use of structural equations as a tool for identifying and modeling genetic networks and genetic algorithms for searching the most likely genetic networks that best fit the data. After genetic networks are identified, it is fundamental to identify those networks influencing cell phenotypes. To accomplish this task we extend the concept of differential expression of the genes, widely used in gene expression data analysis, to genetic networks. We propose a definition for the differential expression of a genetic network and use the generalized T² statistic to measure the ability of genetic networks to distinguish different phenotypes. However, describing the differential expression of genetic networks is not enough for understanding biological systems because differences in the expression of genetic networks do not directly reflect regulatory strength between gene activities. Therefore, in this report we also introduce the concept of differentially regulated genetic networks, which has the potential to assess changes of gene regulation in response to perturbation in the environment and may provide new insights into the mechanism of diseases and biological processes. We propose five novel statistics to measure the differences in regulation of genetic networks. To illustrate the concepts and methods for reconstruction of genetic networks and identification of association of genetic networks with function, we applied the proposed models and algorithms to three data sets.
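For orientation, the generalized T² statistic mentioned above is presumably of the Hotelling two-sample type. The abstract does not state the formula, so the form below is the standard textbook statistic, given as an assumption rather than as the authors' exact definition. Here the vectors of mean expression levels of the network's genes in the two phenotype groups are written as x̄₁ and x̄₂, the group sizes as n₁ and n₂, and S is the pooled covariance matrix.

```latex
T^2 = \frac{n_1 n_2}{n_1 + n_2}\,
      (\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2)^{\top}
      \mathbf{S}^{-1}
      (\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2),
\qquad
\mathbf{S} = \frac{(n_1 - 1)\,\mathbf{S}_1 + (n_2 - 1)\,\mathbf{S}_2}{n_1 + n_2 - 2}
```

A large value indicates that the joint expression of the genes in the network separates the two phenotypes, taking the correlations between genes into account.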
Affiliation(s)
- Momiao Xiong
- Human Genetics Center, University of Texas, Houston Health Science Center, Houston, Texas 77030
- Jun Li
- Human Genetics Center, University of Texas, Houston Health Science Center, Houston, Texas 77030
- Xiangzhong Fang
- Human Genetics Center, University of Texas, Houston Health Science Center, Houston, Texas 77030
|
624
|
Affiliation(s)
- Michael E Wall
- Computer and Computational Sciences Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, USA.
|
625
|
Kao KC, Yang YL, Boscolo R, Sabatti C, Roychowdhury V, Liao JC. Transcriptome-based determination of multiple transcription regulator activities in Escherichia coli by using network component analysis. Proc Natl Acad Sci U S A 2003; 101:641-6. [PMID: 14694202 PMCID: PMC327201 DOI: 10.1073/pnas.0305287101] [Citation(s) in RCA: 94] [Impact Index Per Article: 4.3] [Indexed: 11/18/2022] Open
Abstract
Cells adjust gene expression profiles in response to environmental and physiological changes through a series of signal transduction pathways. Upon activation or deactivation, the terminal regulators bind to or dissociate from DNA, respectively, and modulate transcriptional activities on particular promoters. Traditionally, individual reporter genes have been used to detect the activity of the transcription factors. This approach works well for simple, non-overlapping transcription pathways. For complex transcriptional networks, more sophisticated tools are required to deconvolute the contribution of each regulator. Here, we demonstrate the utility of network component analysis in determining multiple transcription factor activities based on transcriptome profiles and available information regarding network connectivity. We used Escherichia coli carbon source transition from glucose to acetate as a model system. Key results from this analysis were either consistent with physiology or verified by using independent measurements.
Affiliation(s)
- Katy C Kao
- Department of Chemical Engineering, University of California, Los Angeles, CA 90095, USA
|
626
|
Liao JC, Boscolo R, Yang YL, Tran LM, Sabatti C, Roychowdhury VP. Network component analysis: reconstruction of regulatory signals in biological systems. Proc Natl Acad Sci U S A 2003; 100:15522-7. [PMID: 14673099 PMCID: PMC307600 DOI: 10.1073/pnas.2136632100] [Citation(s) in RCA: 390] [Impact Index Per Article: 17.7] [Indexed: 11/18/2022] Open
Abstract
High-dimensional data sets generated by high-throughput technologies, such as DNA microarrays, are often the outputs of complex networked systems driven by hidden regulatory signals. Traditional statistical methods for computing low-dimensional or hidden representations of these data sets, such as principal component analysis and independent component analysis, ignore the underlying network structures and provide decompositions based purely on a priori statistical constraints on the computed component signals. The resulting decomposition thus provides a phenomenological model for the observed data and does not necessarily contain physically or biologically meaningful signals. Here, we develop a method, called network component analysis, for uncovering hidden regulatory signals from outputs of networked systems, when only a partial knowledge of the underlying network topology is available. The a priori network structure information is first tested for compliance with a set of identifiability criteria. For networks that satisfy the criteria, the signals from the regulatory nodes and their strengths of influence on each output node can be faithfully reconstructed. This method is first validated experimentally by using the absorbance spectra of a network of various hemoglobin species. The method is then applied to microarray data generated from the yeast Saccharomyces cerevisiae, and the activities of various transcription factors during the cell cycle are reconstructed by using recently discovered connectivity information for the underlying transcriptional regulatory networks.
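The decomposition described in this abstract can be summarized in matrix form. The notation below is an assumption chosen for illustration (the abstract gives no symbols): E is the matrix of measured outputs (for example genes by conditions), P holds the hidden regulatory signals (regulators by conditions), and A holds the strengths of influence of each regulator on each output, with zeros fixed wherever the known topology has no connection.

```latex
\mathbf{E} = \mathbf{A}\,\mathbf{P},
\qquad
\min_{\mathbf{A},\,\mathbf{P}} \; \lVert \mathbf{E} - \mathbf{A}\mathbf{P} \rVert_F^{2}
\quad \text{subject to} \quad
a_{ij} = 0 \ \text{whenever regulator } j \text{ is not connected to output } i
```

Unlike principal or independent component analysis, the constraint here comes from the network structure rather than from statistical assumptions such as orthogonality or independence, which is the contrast the abstract draws.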
Affiliation(s)
- James C Liao
- Departments of Chemical Engineering, University of California, Los Angeles, CA 90095, USA.
|
627
|
Abstract
This viewpoint comments on recent advances in understanding the design principles of biological networks. It highlights the surprising discovery of "good-engineering" principles in biochemical circuitry that evolved by random tinkering.
Affiliation(s)
- U Alon
- Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot, Israel 76100.
|
628
|
|