101
|
Huang AC, Hu L, Kauffman SA, Zhang W, Shmulevich I. Using cell fate attractors to uncover transcriptional regulation of HL60 neutrophil differentiation. BMC Systems Biology 2009; 3:20. [PMID: 19222862 PMCID: PMC2652435 DOI: 10.1186/1752-0509-3-20] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.9] [Received: 08/21/2008] [Accepted: 02/18/2009] [Indexed: 12/16/2022]
Abstract
BACKGROUND The process of cellular differentiation is governed by complex dynamical biomolecular networks consisting of a multitude of genes and their products acting in concert to determine a particular cell fate. Thus, a systems level view is necessary for understanding how a cell coordinates this process and for developing effective therapeutic strategies to treat diseases, such as cancer, in which differentiation plays a significant role. Theoretical considerations and recent experimental evidence support the view that cell fates are high dimensional attractor states of the underlying molecular networks. The temporal behavior of the network states progressing toward different cell fate attractors has the potential to elucidate the underlying molecular mechanisms governing differentiation. RESULTS Using the HL60 multipotent promyelocytic leukemia cell line, we performed experiments that ultimately led to two different cell fate attractors by two treatments of varying dosage and duration of the differentiation agent all-trans-retinoic acid (ATRA). The dosage and duration combinations of the two treatments were chosen by means of flow cytometric measurements of CD11b, a well-known early differentiation marker, such that they generated two intermediate populations that were poised at the apparently same stage of differentiation. However, the population of one treatment proceeded toward the terminally differentiated neutrophil attractor while that of the other treatment reverted back toward the undifferentiated promyelocytic attractor. We monitored the gene expression changes in the two populations after their respective treatments over a period of five days and identified a set of genes that diverged in their expression, a subset of which promotes neutrophil differentiation while the other represses cell cycle progression. 
By employing promoter-based transcription factor binding site analysis, we found, in the set of divergent genes, enrichment of transcription factors functionally linked to tumor progression, cell cycle, and development. CONCLUSION Since many of the transcription factors identified by this approach are also known to be implicated in hematopoietic differentiation and leukemia, this study points to the utility of incorporating a dynamical systems level view into a computational analysis framework for elucidating transcriptional mechanisms regulating differentiation.
|
103
|
Shmulevich I, Aitchison JD. Deterministic and stochastic models of genetic regulatory networks. Methods Enzymol 2009. [PMID: 19897099 DOI: 10.1016/s0076-6879(09)67013-67010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/08/2023]
Abstract
Traditionally molecular biology research has tended to reduce biological pathways to composite units studied as isolated parts of the cellular system. With the advent of high throughput methodologies that can capture thousands of data points, and powerful computational approaches, the reality of studying cellular processes at a systems level is upon us. As these approaches yield massive datasets, systems level analyses have drawn upon other fields such as engineering and mathematics, adapting computational and statistical approaches to decipher relationships between molecules. Guided by high quality datasets and analyses, one can begin the process of predictive modeling. The findings from such approaches are often surprising and beyond normal intuition. We discuss four classes of dynamical systems used to model genetic regulatory networks. The discussion is divided into continuous and discrete models, as well as deterministic and stochastic model classes. For each combination of these categories, a model is presented and discussed in the context of the yeast cell cycle, illustrating how different types of questions can be addressed by different model classes.
|
104
|
Abstract
A challenge in systems-level investigations of the immune response is the principled integration of disparate data sets for constructing predictive models. InnateDB (Lynn et al., 2008; http://www.innatedb.ca), a publicly available, manually curated database of experimentally verified molecular interactions and pathways involved in innate immunity, is a powerful new resource that facilitates such integrative systems-level analyses.
|
105
|
Boyle J, Cavnor C, Killcoyne S, Shmulevich I. Systems biology driven software design for the research enterprise. BMC Bioinformatics 2008; 9:295. [PMID: 18578887 PMCID: PMC2478690 DOI: 10.1186/1471-2105-9-295] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Received: 12/17/2007] [Accepted: 06/25/2008] [Indexed: 11/12/2022] Open
Abstract
Background In systems biology, and many other areas of research, there is a need for the interoperability of tools and data sources that were not originally designed to be integrated. Due to the interdisciplinary nature of systems biology, and its association with high throughput experimental platforms, there is an additional need to continually integrate new technologies. As scientists work in isolated groups, integration with other groups is rarely a consideration when building the required software tools. Results We illustrate an approach, through the discussion of a purpose-built software architecture, which allows disparate groups to reuse tools and access data sources in a common manner. The architecture allows for: the rapid development of distributed applications; interoperability, so it can be used by a wide variety of developers and computational biologists; development using standard tools, so that it is easy to maintain and does not require a large development effort; extensibility, so that new technologies and data types can be incorporated; and non-intrusive development, insofar as researchers need not adhere to a pre-existing object model. Conclusion By using a relatively simple integration strategy, based upon a common identity system and dynamically discovered interoperable services, a light-weight software architecture can become the focal point through which scientists can both access and analyse the plethora of experimentally derived data.
|
106
|
Balleza E, Alvarez-Buylla ER, Chaos A, Kauffman S, Shmulevich I, Aldana M. Critical dynamics in genetic regulatory networks: examples from four kingdoms. PLoS One 2008; 3:e2456. [PMID: 18560561 PMCID: PMC2423472 DOI: 10.1371/journal.pone.0002456] [Citation(s) in RCA: 97] [Impact Index Per Article: 6.1] [Received: 02/08/2008] [Accepted: 04/14/2008] [Indexed: 11/19/2022] Open
Abstract
The coordinated expression of the different genes in an organism is essential to sustain functionality under the random external perturbations to which the organism might be subjected. To cope with such external variability, the global dynamics of the genetic network must possess two central properties: (a) it must be robust enough to guarantee stability under a broad range of external conditions, and (b) it must be flexible enough to recognize and integrate specific external signals that may help the organism to change and adapt to different environments. This compromise between robustness and adaptability has been observed in dynamical systems operating at the brink of a phase transition between order and chaos. Such systems are termed critical. Thus, criticality, a precise, measurable, and well characterized property of dynamical systems, makes it possible for robustness and adaptability to coexist in living organisms. In this work we investigate the dynamical properties of the gene transcription networks reported for S. cerevisiae, E. coli, and B. subtilis, as well as the network of segment polarity genes of D. melanogaster, and the network of flower development of A. thaliana. We use hundreds of microarray experiments to infer the nature of the regulatory interactions among genes, and incorporate these data into the Boolean models of the genetic networks. Our results show that, to the best of the current experimental data available, the five networks under study indeed operate close to criticality. The generality of this result suggests that criticality at the genetic level might constitute a fundamental evolutionary mechanism that generates the great diversity of dynamically robust living forms that we observe around us.
|
107
|
Niemistö A, Korpelainen T, Saleem R, Yli-Harja O, Aitchison J, Shmulevich I. A K-means segmentation method for finding 2-D object areas based on 3-D image stacks obtained by confocal microscopy. Conf Proc IEEE Eng Med Biol Soc 2008; 2007:5559-62. [PMID: 18003272 DOI: 10.1109/iembs.2007.4353606] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/05/2022]
Abstract
A segmentation method for three-dimensional image stacks obtained by confocal microscopy is proposed. The method can be used to find two-dimensional object areas based on an image stack. The segmentation method is based on K-means clustering, global thresholding, and mathematical morphology. As a case study, the proposed method is applied to 244 image stacks of the yeast Saccharomyces cerevisiae. Quantitative comparisons with manually obtained results as well as with results obtained by a two-dimensional segmentation method are used to illustrate how the additional information provided by three-dimensional image stacks can improve segmentation results.
|
108
|
Lähdesmäki H, Rust AG, Shmulevich I. Probabilistic inference of transcription factor binding from multiple data sources. PLoS One 2008; 3:e1820. [PMID: 18364997 PMCID: PMC2268002 DOI: 10.1371/journal.pone.0001820] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.4] [Received: 09/17/2007] [Accepted: 02/04/2008] [Indexed: 11/21/2022] Open
Abstract
An important problem in molecular biology is to build a complete understanding of transcriptional regulatory processes in the cell. We have developed a flexible, probabilistic framework to predict TF binding from multiple data sources that differs from the standard hypothesis testing (scanning) methods in several ways. Our probabilistic modeling framework estimates the probability of binding and, thus, naturally reflects our degree of belief in binding. Probabilistic modeling also allows for easy and systematic integration of our binding predictions into other probabilistic modeling methods, such as expression-based gene network inference. The method answers the question of whether the whole analyzed promoter has a binding site, but can also be extended to estimate the binding probability at each nucleotide position. Further, we introduce an extension to model combinatorial regulation by several TFs. Most importantly, the proposed methods can make principled probabilistic inference from multiple evidence sources, such as, multiple statistical models (motifs) of the TFs, evolutionary conservation, regulatory potential, CpG islands, nucleosome positioning, DNase hypersensitive sites, ChIP-chip binding segments and other (prior) sequence-based biological knowledge. We developed both a likelihood and a Bayesian method, where the latter is implemented with a Markov chain Monte Carlo algorithm. Results on a carefully constructed test set from the mouse genome demonstrate that principled data fusion can significantly improve the performance of TF binding prediction methods. We also applied the probabilistic modeling framework to all promoters in the mouse genome and the results indicate a sparse connectivity between transcriptional regulators and their target promoters. To facilitate analysis of other sequences and additional data, we have developed an on-line web tool, ProbTF, which implements our probabilistic TF binding prediction method using multiple data sources. 
A test data set, a web tool, source code, and supplementary data are available at: http://www.probtf.org.
|
109
|
Ramsey SA, Klemm SL, Zak DE, Kennedy KA, Thorsson V, Li B, Gilchrist M, Gold ES, Johnson CD, Litvak V, Navarro G, Roach JC, Rosenberger CM, Rust AG, Yudkovsky N, Aderem A, Shmulevich I. Uncovering a macrophage transcriptional program by integrating evidence from motif scanning and expression dynamics. PLoS Comput Biol 2008; 4:e1000021. [PMID: 18369420 PMCID: PMC2265556 DOI: 10.1371/journal.pcbi.1000021] [Citation(s) in RCA: 143] [Impact Index Per Article: 8.9] [Received: 07/27/2007] [Accepted: 02/04/2008] [Indexed: 01/04/2023] Open
Abstract
Macrophages are versatile immune cells that can detect a variety of pathogen-associated molecular patterns through their Toll-like receptors (TLRs). In response to microbial challenge, the TLR-stimulated macrophage undergoes an activation program controlled by a dynamically inducible transcriptional regulatory network. Mapping a complex mammalian transcriptional network poses significant challenges and requires the integration of multiple experimental data types. In this work, we inferred a transcriptional network underlying TLR-stimulated murine macrophage activation. Microarray-based expression profiling and transcription factor binding site motif scanning were used to infer a network of associations between transcription factor genes and clusters of co-expressed target genes. The time-lagged correlation was used to analyze temporal expression data in order to identify potential causal influences in the network. A novel statistical test was developed to assess the significance of the time-lagged correlation. Several associations in the resulting inferred network were validated using targeted ChIP-on-chip experiments. The network incorporates known regulators and gives insight into the transcriptional control of macrophage activation. Our analysis identified a novel regulator (TGIF1) that may have a role in macrophage activation. Macrophages play a vital role in host defense against infection by recognizing pathogens through pattern recognition receptors, such as the Toll-like receptors (TLRs), and mounting an immune response. Stimulation of TLRs initiates a complex transcriptional program in which induced transcription factor genes dynamically regulate downstream genes. 
Microarray-based transcriptional profiling has proved useful for mapping such transcriptional programs in simpler model organisms; however, mammalian systems present difficulties such as post-translational regulation of transcription factors, combinatorial gene regulation, and a paucity of available gene-knockout expression data. Additional evidence sources, such as DNA sequence-based identification of transcription factor binding sites, are needed. In this work, we computationally inferred a transcriptional network for TLR-stimulated murine macrophages. Our approach combined sequence scanning with time-course expression data in a probabilistic framework. Expression data were analyzed using the time-lagged correlation. A novel, unbiased method was developed to assess the significance of the time-lagged correlation. The inferred network of associations between transcription factor genes and co-expressed gene clusters was validated with targeted ChIP-on-chip experiments, and yielded insights into the macrophage activation program, including a potential novel regulator. Our general approach could be used to analyze other complex mammalian systems for which time-course expression data are available.
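The time-lagged correlation mentioned in this abstract can be illustrated with a minimal sketch. It assumes equally spaced time points and a plain Pearson correlation; the function names and the toy expression profiles are hypothetical, and the significance test developed in the paper is not reproduced here.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def lagged_correlation(regulator, target, lag):
    """Correlate regulator[t] with target[t + lag] (lag counted in time points)."""
    if lag == 0:
        return pearson(regulator, target)
    return pearson(regulator[:-lag], target[lag:])


def best_lag(regulator, target, max_lag):
    """Lag in 0..max_lag with the largest absolute lagged correlation."""
    return max(range(max_lag + 1),
               key=lambda k: abs(lagged_correlation(regulator, target, k)))


# Toy profiles (made up for illustration): the target repeats the
# regulator's pattern one time point later.
tf_profile = [0, 1, 3, 2, 0, 0, 1]
target_profile = [0, 0, 1, 3, 2, 0, 0]
print(best_lag(tf_profile, target_profile, max_lag=3))
```

A high lagged correlation only suggests a possible causal influence; in the study it is combined with motif scanning and validated experimentally.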
|
110
|
Niemistö A, Selinummi J, Saleem R, Shmulevich I, Aitchison J, Yli-Harja O. Extraction of the number of peroxisomes in yeast cells by automated image analysis. Conf Proc IEEE Eng Med Biol Soc 2008; 2006:2353-6. [PMID: 17945710 DOI: 10.1109/iembs.2006.259890] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 11/10/2022]
Abstract
An automated image analysis method for extracting the number of peroxisomes in yeast cells is presented. Two images of the cell population are required for the method: a bright field microscope image from which the yeast cells are detected and the respective fluorescent image from which the number of peroxisomes in each cell is found. The segmentation of the cells is based on clustering the local mean-variance space. The watershed transformation is thereafter employed to separate cells that are clustered together. The peroxisomes are detected by thresholding the fluorescent image. The method is tested with several images of a budding yeast Saccharomyces cerevisiae population, and the results are compared with manually obtained results.
|
111
|
Korb M, Rust AG, Thorsson V, Battail C, Li B, Hwang D, Kennedy KA, Roach JC, Rosenberger CM, Gilchrist M, Zak D, Johnson C, Marzolf B, Aderem A, Shmulevich I, Bolouri H. The Innate Immune Database (IIDB). BMC Immunol 2008; 9:7. [PMID: 18321385 PMCID: PMC2268913 DOI: 10.1186/1471-2172-9-7] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.4] [Received: 06/11/2007] [Accepted: 03/05/2008] [Indexed: 02/04/2023] Open
Abstract
Background As part of a National Institute of Allergy and Infectious Diseases funded collaborative project, we have performed over 150 microarray experiments measuring the response of C57/BL6 mouse bone marrow macrophages to toll-like receptor stimuli. These microarray expression profiles are available freely from our project web site. Here, we report the development of a database of computationally predicted transcription factor binding sites and related genomic features for a set of over 2000 murine immune genes of interest. Our database, which includes microarray co-expression clusters and a host of web-based query, analysis and visualization facilities, is available freely via the internet. It provides a broad resource to the research community, and a stepping stone towards the delineation of the network of transcriptional regulatory interactions underlying the integrated response of macrophages to pathogens. Description We constructed a database indexed on genes and annotations of the immediate surrounding genomic regions. To facilitate both gene-specific and systems biology oriented research, our database provides the means to analyze individual genes or an entire genomic locus. Although our focus to date has been on mammalian toll-like receptor signaling pathways, our database structure is not limited to this subject, and is intended to be broadly applicable to immunology. By focusing on selected immune-active genes, we were able to perform computationally intensive expression and sequence analyses that would currently be prohibitive if applied to the entire genome. Using six complementary computational algorithms and methodologies, we identified transcription factor binding sites based on the Position Weight Matrices available in TRANSFAC. For one example transcription factor (ATF3) for which experimental data is available, over 50% of our predicted binding sites coincide with genome-wide chromatin immunoprecipitation (ChIP-chip) results. 
Our database can be interrogated via a web interface. Genomic annotations and binding site predictions can be automatically viewed with a customized version of the Argo genome browser. Conclusion We present the Innate Immune Database (IIDB) as a community resource for immunologists interested in gene regulatory systems underlying innate responses to pathogens. The database website can be freely accessed at .
|
112
|
Nykter M, Price ND, Larjo A, Aho T, Kauffman SA, Yli-Harja O, Shmulevich I. Critical networks exhibit maximal information diversity in structure-dynamics relationships. Physical Review Letters 2008; 100:058702. [PMID: 18352443 DOI: 10.1103/physrevlett.100.058702] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.5] [Received: 07/26/2007] [Indexed: 05/26/2023]
Abstract
Network structure strongly constrains the range of dynamic behaviors available to a complex system. These system dynamics can be classified based on their response to perturbations over time into two distinct regimes, ordered or chaotic, separated by a critical phase transition. Numerous studies have shown that the most complex dynamics arise near the critical regime. Here we use an information theoretic approach to study structure-dynamics relationships within a unified framework and show that these relationships are most diverse in the critical regime.
|
113
|
Vêncio RZN, Shmulevich I. ProbCD: enrichment analysis accounting for categorization uncertainty. BMC Bioinformatics 2007; 8:383. [PMID: 17935624 PMCID: PMC2169266 DOI: 10.1186/1471-2105-8-383] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Received: 07/11/2007] [Accepted: 10/12/2007] [Indexed: 11/10/2022] Open
Abstract
Background As in many other areas of science, systems biology makes extensive use of statistical association and significance estimates in contingency tables, a type of categorical data analysis known in this field as enrichment (also over-representation or enhancement) analysis. In spite of efforts to create probabilistic annotations, especially in the Gene Ontology context, or to deal with uncertainty in high throughput-based datasets, current enrichment methods largely ignore this probabilistic information since they are mainly based on variants of the Fisher Exact Test. Results We developed ProbCD, open-source R-based software for probabilistic categorical data analysis that does not require a static contingency table. The contingency table for the enrichment problem is built using the expectation of a Bernoulli Scheme stochastic process given the categorization probabilities. An on-line interface was created to allow usage by non-programmers and is available at: . Conclusion We present an analysis framework and software tools to address the issue of uncertainty in categorical data analysis. In particular, concerning the enrichment analysis, ProbCD can accommodate: (i) the stochastic nature of the high-throughput experimental techniques and (ii) probabilistic gene annotation.
|
114
|
Niemisto A, Hu L, Yli-Harja O, Zhang W, Shmulevich I. Quantification of in vitro cell invasion through image analysis. Conf Proc IEEE Eng Med Biol Soc 2007; 2004:1703-6. [PMID: 17272032 DOI: 10.1109/iembs.2004.1403512] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/07/2022]
Abstract
An image analysis method for quantification of in vitro cell invasion is presented. The method is designed for in vitro assays that are based on invasion of cells through a porous membrane. The images are obtained with a light microscope. The method has two major steps. The first is the detection of the well in which invasion occurs. The second is the detection of cells that have invaded through the membrane. The image processing techniques that are employed include thresholding and morphological filtering. Image processing results of in vitro invasion experiments and an analysis of robustness are presented to demonstrate the accuracy of the method.
|
115
|
Krawitz P, Shmulevich I. Entropy of complex relevant components of Boolean networks. Physical Review E, Statistical, Nonlinear, and Soft Matter Physics 2007; 76:036115. [PMID: 17930314 DOI: 10.1103/physreve.76.036115] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Received: 05/30/2007] [Revised: 08/10/2007] [Indexed: 05/25/2023]
Abstract
Boolean network models of strongly connected modules are capable of capturing the high regulatory complexity of many biological gene regulatory circuits. We study numerically the previously introduced basin entropy, a parameter for the dynamical uncertainty or information storage capacity of a network, as well as the average transient time in random relevant components, as a function of their connectivity. We also demonstrate that basin entropy can be estimated from time-series data and is therefore also applicable to nondeterministic network models.
|
116
|
Price ND, Shmulevich I. Biochemical and statistical network models for systems biology. Curr Opin Biotechnol 2007; 18:365-70. [PMID: 17681779 PMCID: PMC2034526 DOI: 10.1016/j.copbio.2007.07.009] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.5] [Received: 05/10/2007] [Accepted: 07/12/2007] [Indexed: 11/19/2022]
Abstract
The normal and abnormal behavior of a living cell is governed by complex networks of interacting biomolecules. Models of these networks allow us to make predictions about cellular behavior under a variety of environmental cues. In this review, we focus on two broad classes of such models: biochemical network models and statistical inference models. In particular, we discuss a number of modeling approaches in the context of the assumptions that they entail, the types of data required for their inference, and the range of their applicability.
|
117
|
Vêncio RZN, Varuzza L, de B Pereira CA, Brentani H, Shmulevich I. Simcluster: clustering enumeration gene expression data on the simplex space. BMC Bioinformatics 2007; 8:246. [PMID: 17625017 PMCID: PMC2147035 DOI: 10.1186/1471-2105-8-246] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Received: 03/02/2007] [Accepted: 07/11/2007] [Indexed: 02/06/2023] Open
Abstract
Background Transcript enumeration methods such as SAGE, MPSS, and sequencing-by-synthesis EST "digital northern" are important high-throughput techniques for digital gene expression measurement. As with other counting or voting processes, these measurements constitute compositional data exhibiting properties particular to the simplex space, where the summation of the components is constrained. These properties are not present in regular Euclidean spaces, on which hybridization-based microarray data is often modeled. Therefore, pattern recognition methods commonly used for microarray data analysis may be non-informative for the data generated by transcript enumeration techniques since they ignore certain fundamental properties of this space. Results Here we present a software tool, Simcluster, designed to perform clustering analysis for data on the simplex space. We present Simcluster as a stand-alone command-line C package and as a user-friendly on-line tool. Both versions are available at: http://xerad.systemsbiology.net/simcluster. Conclusion Simcluster is designed in accordance with a well-established mathematical framework for compositional data analysis, which provides principled procedures for dealing with the simplex space, and is thus applicable in a number of contexts, including enumeration-based gene expression data.
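To make the simplex constraint concrete, the sketch below uses the centred log-ratio (clr) transform and the Aitchison distance, which are standard tools in compositional data analysis. Whether Simcluster uses exactly these operations is not stated in the abstract, so this is an illustration of the general idea only; the function names and tag counts are hypothetical.

```python
import math


def closure(counts, pseudocount=0.0):
    """Rescale counts to proportions that sum to 1 (a point on the simplex).

    A positive pseudocount is one simple (assumed) way to avoid log(0)
    when some tags have zero counts.
    """
    shifted = [c + pseudocount for c in counts]
    total = sum(shifted)
    return [c / total for c in shifted]


def clr(composition):
    """Centred log-ratio transform: log of each part minus the mean log."""
    logs = [math.log(p) for p in composition]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


def aitchison_distance(x, y):
    """Euclidean distance between the clr-transformed compositions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(clr(x), clr(y))))


# Two hypothetical libraries with the same relative abundances but a
# tenfold difference in sequencing depth, plus a genuinely different one.
shallow = closure([10, 20, 70])
deep = closure([100, 200, 700])
other = closure([70, 20, 10])

print(aitchison_distance(shallow, deep))   # depth alone does not separate them
print(aitchison_distance(shallow, other))  # different composition does
```

Because only ratios between components enter the distance, total library size drops out, which is the property that ordinary Euclidean distance on raw counts lacks.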
|
118
|
Krawitz P, Shmulevich I. Basin entropy in Boolean network ensembles. Physical Review Letters 2007; 98:158701. [PMID: 17501391 DOI: 10.1103/physrevlett.98.158701] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.1] [Received: 12/22/2006] [Indexed: 05/15/2023]
Abstract
The information processing capacity of a complex dynamical system is reflected in the partitioning of its state space into disjoint basins of attraction, with state trajectories in each basin flowing towards their corresponding attractor. We introduce a novel network parameter, the basin entropy, as a measure of the complexity of information that such a system is capable of storing. By studying ensembles of random Boolean networks, we find that the basin entropy scales with system size only in critical regimes, suggesting that the informationally optimal partition of the state space is achieved when the system is operating at the critical boundary between the ordered and disordered phases.
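As a rough illustration of the idea, the sketch below enumerates the state space of a small synchronous Boolean network, assigns every state to the attractor it reaches, and computes the Shannon entropy of the resulting basin weights. It assumes that basin entropy is the entropy of the fraction of states in each basin, measured here in bits; the function names and the two-node example network are hypothetical.

```python
import math
from itertools import product


def basin_entropy(update, n_nodes):
    """Entropy (bits) of basin weights for a synchronous Boolean network.

    `update` maps a state tuple of 0/1 values to the next state tuple.
    Exhaustive enumeration, so only practical for small n_nodes.
    """
    states = list(product((0, 1), repeat=n_nodes))
    attractor_of = {}  # state -> label of the attractor it flows into
    for start in states:
        path, position = [], {}
        state = start
        # Follow the trajectory until it hits a labelled state or loops.
        while state not in attractor_of and state not in position:
            position[state] = len(path)
            path.append(state)
            state = update(state)
        if state in attractor_of:
            label = attractor_of[state]
        else:
            # New attractor: label it by the smallest state on its cycle.
            label = min(path[position[state]:])
        for visited in path:
            attractor_of[visited] = label
    basin_sizes = {}
    for label in attractor_of.values():
        basin_sizes[label] = basin_sizes.get(label, 0) + 1
    total = len(states)
    return -sum((size / total) * math.log2(size / total)
                for size in basin_sizes.values())


def swap(state):
    """Hypothetical two-node network in which each node copies the other."""
    a, b = state
    return (b, a)


# Basins: {00}, {11}, and the 2-cycle {01, 10}, with weights 1/4, 1/4, 1/2.
print(basin_entropy(swap, 2))
```

A network with a single attractor has zero basin entropy; the entropy grows as the state space is split more evenly among more basins.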
|
119
|
Price ND, Trent J, El-Naggar AK, Cogdell D, Taylor E, Hunt KK, Pollock RE, Hood L, Shmulevich I, Zhang W. Highly accurate two-gene classifier for differentiating gastrointestinal stromal tumors and leiomyosarcomas. Proc Natl Acad Sci U S A 2007; 104:3414-9. [PMID: 17360660 PMCID: PMC1805517 DOI: 10.1073/pnas.0611373104] [Citation(s) in RCA: 112] [Impact Index Per Article: 6.6] [Indexed: 11/18/2022] Open
Abstract
Gastrointestinal stromal tumor (GIST) has emerged as a clinically distinct type of sarcoma with frequent overexpression and mutation of the c-Kit oncogene and a favorable response to imatinib mesylate [also known as STI571 (Gleevec)] therapy. However, a significant diagnostic challenge remains in the differentiation of GIST from leiomyosarcomas (LMSs). To improve on the diagnostic evaluation and to complement the immunohistochemical evaluation of these tumors, we performed a whole-genome gene expression study on 68 well characterized tumor samples. Using bioinformatic approaches, we devised a two-gene relative expression classifier that distinguishes between GIST and LMS with an accuracy of 99.3% on the microarray samples and an estimated accuracy of 97.8% on future cases. We validated this classifier by using RT-PCR on 20 samples in the microarray study and on an additional 19 independent samples, with 100% accuracy. Thus, our two-gene relative expression classifier is a highly accurate diagnostic method to distinguish between GIST and LMS and has the potential to be rapidly implemented in a clinical setting. The success of this classifier is likely due to two general traits, namely that the classifier is independent of data normalization and that it uses as simple an approach as possible to achieve this independence to avoid overfitting. We expect that the use of simple marker pairs that exhibit these traits will be of significant clinical use in a variety of contexts.
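The idea of a two-gene relative expression classifier can be sketched as follows: the predicted class depends only on which of two genes is expressed at the higher level within a sample, so any monotonic per-sample normalization leaves the prediction unchanged. The pair-selection score used here (difference in ordering frequency between classes), the function names, and the expression values are assumptions for illustration; the abstract does not give the selection procedure or the gene identities.

```python
from itertools import combinations

def fit_gene_pair(samples, labels):
    """Pick the gene pair (i, j) whose ordering x[i] < x[j] best separates two classes.

    Returns (i, j, class_if_less, class_otherwise).
    """
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("exactly two classes are required")
    best = None
    for i, j in combinations(range(len(samples[0])), 2):
        freq = []
        for c in classes:
            members = [x for x, y in zip(samples, labels) if y == c]
            freq.append(sum(x[i] < x[j] for x in members) / len(members))
        score = abs(freq[0] - freq[1])
        if best is None or score > best[0]:
            # The class in which x[i] < x[j] is more frequent is predicted for that ordering.
            if freq[0] > freq[1]:
                best = (score, i, j, classes[0], classes[1])
            else:
                best = (score, i, j, classes[1], classes[0])
    return best[1:]

def predict(model, x):
    i, j, class_if_less, class_otherwise = model
    return class_if_less if x[i] < x[j] else class_otherwise

# Made-up expression values for three genes; not data from the study.
samples = [[5.0, 1.0, 3.0], [6.0, 2.0, 7.0], [9.0, 4.0, 2.0],
           [1.0, 5.0, 3.0], [2.0, 7.0, 2.5], [0.5, 3.0, 4.0]]
labels = ["GIST", "GIST", "GIST", "LMS", "LMS", "LMS"]

model = fit_gene_pair(samples, labels)
new_sample = [8.0, 2.0, 1.0]
print(model, predict(model, new_sample))
```

Because the decision rule is a single comparison, rescaling every value in a sample by the same positive factor cannot change the predicted label, which mirrors the normalization independence the abstract highlights.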
|
120
|
Niemisto A, Shmulevich I, Yli-Harja O, Chirieac LR, Hamilton SR. Automated quantification of lymph node size and number in surgical specimens of stage II colorectal cancer. CONFERENCE PROCEEDINGS : ... ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL CONFERENCE 2007; 2005:6313-6. [PMID: 17281711 DOI: 10.1109/iembs.2005.1615941] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/05/2022]
Abstract
An automated image analysis method for quantification of the size and number of lymph nodes in surgical specimens of stage II colorectal cancer is presented. The quantification is made using routine histopathologic sections of lymph nodes that have been dissected by pathologists from resection specimens. The hematoxylin and eosin stained sections on slides are imaged with a standard image scanner. Each obtained image can contain multiple slides. The first task is to detect the slides. Then, the lymph nodes are detected and their size is assessed using K-means clustering and morphological image processing. The results are found to correlate well with results that have been obtained manually. The method has proven useful for predicting survival in stage II colorectal cancer.
|
121
|
Borenstein A, Linker R, Shmulevich I, Shaviv A. Determination of soil nitrate and water content using attenuated total reflectance spectroscopy. APPLIED SPECTROSCOPY 2006; 60:1267-72. [PMID: 17132443 DOI: 10.1366/000370206778998969] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 05/12/2023]
Abstract
Direct determination of nitrate and soil moisture can significantly improve N-application management and thus reduce N-derived environmental pollution related to agriculture. Several studies have shown that Fourier transform infrared attenuated total reflectance (FT-IR/ATR) spectroscopy could be used to estimate the nitrate content of standardized soil pastes. Paste standardization appeared to be the main obstacle to in situ application of this approach, and the present study shows how FT-IR/ATR can be used to estimate both water content and nitrate concentration of field soil samples. Water content and nitrate concentration are determined sequentially using two subsamples of the initial soil sample. An a priori determined amount of highly concentrated nitrate solution is added to the first subsample and the ATR spectrum of this paste is used to estimate the sample water content. It is then possible to calculate the amount of water that should be added to the second subsample so that the resulting paste is very close to the ideal standard paste. Nitrate concentration, mg [N]/kg [dry soil], is estimated using the FT-IR/ATR spectrum of this second paste. Results are presented for a laboratory experiment with four agricultural soils, as well as for a field trial with a calcareous soil. For water content, the determination errors range from 0.01 to 0.02 g [water]/g [dry soil]. For nitrate concentration, the errors for three of the soils range from 5.9 to 8.4 mg [N]/kg [dry soil], while for the fourth, calcareous clay soil, the determination error is 13.6 mg [N]/kg [dry soil]. The determination errors obtained for the field trial are similar to the ones obtained for a similar soil under laboratory conditions, which shows the potential usefulness of the approach for improving N-application management and reducing environmental pollution.
|
122
|
Jiang R, Mircean C, Shmulevich I, Cogdell D, Jia Y, Tabus I, Aldape K, Sawaya R, Bruner JM, Fuller GN, Zhang W. Pathway alterations during glioma progression revealed by reverse phase protein lysate arrays. Proteomics 2006; 6:2964-71. [PMID: 16619307 DOI: 10.1002/pmic.200500555] [Citation(s) in RCA: 64] [Impact Index Per Article: 3.6] [Indexed: 11/11/2022]
Abstract
The progression of gliomas has been extensively studied at the genomic level using cDNA microarrays. However, systematic examinations at the protein translational and post-translational levels are far more limited. We constructed a glioma protein lysate array from 82 different primary glioma tissues, and surveyed the expression and phosphorylation of 46 different proteins involved in signaling pathways of cell proliferation, cell survival, apoptosis, angiogenesis, and cell invasion. An analysis algorithm was employed to robustly estimate the protein expressions in these samples. When ranked by their discriminating power to separate 37 glioblastomas (high-grade gliomas) from 45 lower-grade gliomas, the following 12 proteins were identified as the most powerful discriminators: IκBα, EGFRpTyr845, AKTpThr308, phosphatidylinositol 3-kinase (PI3K), BadpSer136, insulin-like growth factor binding protein (IGFBP) 2, IGFBP5, matrix metalloproteinase 9 (MMP9), vascular endothelial growth factor (VEGF), phosphorylated retinoblastoma protein (pRB), Bcl-2, and c-Abl. Clustering analysis showed a close link between PI3K and AKTpThr308, IGFBP5 and IGFBP2, and IκBα and EGFRpTyr845. Another cluster includes MMP9, Bcl-2, VEGF, and pRB. These clustering patterns may suggest functional relationships, which warrant further investigation. The marked association of phosphorylation of AKT at Thr308, but not Ser473, with glioblastoma suggests a specific event of PI3K pathway activation in glioma progression.
|
123
|
Nykter M, Hunt KK, Pollock RE, El-Naggar AK, Taylor E, Shmulevich I, Yli-Harja O, Zhang W. Unsupervised analysis uncovers changes in histopathologic diagnosis in supervised genomic studies. Technol Cancer Res Treat 2006; 5:177-82. [PMID: 16551137 DOI: 10.1177/153303460600500209] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/17/2022]
Abstract
Human gastrointestinal stromal tumors (GIST) have recently emerged as a distinct mesenchymal tumor type that has a unique phenotype characterized by gain-of-function mutations in c-kit. In contrast, leiomyosarcomas (LMS) of the gastrointestinal tract or retroperitoneum, which were previously classified together with GISTs as gastrointestinal sarcomas, have much less frequent mutations of c-kit. We performed microarray analyses to gain a comprehensive understanding of the difference between the two types of soft-tissue sarcomas at the level of gene expression. Microarray experiments were performed on 30 GISTs and 30 LMSs that were collected at the time of surgical resection. These tumors were categorized based on the histopathologic diagnosis recorded in our institutional database. Prior to our search for genes that are differentially expressed between these two types of cancers, we first carried out an unsupervised analysis using multidimensional scaling (MDS) to determine whether the two groups have marked overall differences in gene expression. Initially, the MDS did not reveal a good separation between the two groups. We then re-reviewed the histopathology of these tumors and realized that some of the cases included in our study were acquired 10 years ago when the diagnosis of gastrointestinal sarcoma was made according to histopathologic criteria alone without immunohistochemistry for c-kit. An experienced pathologist reviewed all of the specimens and this revealed that a number of the GIST cases were classified as LMS in the clinical database. Correction of the histopathologic diagnosis and relabeling of the samples resulted in a much more pronounced separation of GIST and LMS in the MDS analysis. This study underscores the need to re-review histopathology as reclassification occurs. While updating the clinical database may be desired, this is usually impractical. For molecular studies that use archival samples, it is critical to have the archival samples re-reviewed by a pathologist. Further, unsupervised analysis often proves to be a critical quality control step in identifying structural problems that may exist. Finally, MDS analysis further supports that GIST is a distinct type of sarcoma.
|
124
|
Lähdesmäki H, Hautaniemi S, Shmulevich I, Yli-Harja O. Relationships between probabilistic Boolean networks and dynamic Bayesian networks as models of gene regulatory networks. SIGNAL PROCESSING 2006; 86:814-834. [PMID: 17415411 PMCID: PMC1847796 DOI: 10.1016/j.sigpro.2005.06.008] [Citation(s) in RCA: 53] [Impact Index Per Article: 2.9] [Indexed: 05/09/2023]
Abstract
A significant amount of attention has recently been focused on modeling of gene regulatory networks. Two frequently used large-scale modeling frameworks are Bayesian networks (BNs) and Boolean networks, the latter being a special case of its recent stochastic extension, probabilistic Boolean networks (PBNs). PBNs are a promising model class that generalizes the standard rule-based interactions of Boolean networks into the stochastic setting. Dynamic Bayesian networks (DBNs) form a general and versatile model class that is able to represent complex temporal stochastic processes and has also been proposed as a model for gene regulatory systems. In this paper, we concentrate on these two model classes and demonstrate that PBNs and a certain subclass of DBNs can represent the same joint probability distribution over their common variables. The major benefit of introducing the relationships between the models is that it opens up the possibility of applying the standard tools of DBNs to PBNs and vice versa. Hence, the standard learning tools of DBNs can be applied in the context of PBNs, and the inference methods give a natural way of handling the missing values in PBNs, which are often present in gene expression measurements. Conversely, the tools for controlling the stationary behavior of the networks, tools for projecting networks onto sub-networks, and efficient learning schemes can be used for DBNs. In other words, the introduced relationships between the models extend the collection of analysis tools for both model classes.
|
125
|
Niemistö A, Dunmire V, Yli-Harja O, Zhang W, Shmulevich I. Analysis of angiogenesis using in vitro experiments and stochastic growth models. PHYSICAL REVIEW. E, STATISTICAL, NONLINEAR, AND SOFT MATTER PHYSICS 2005; 72:062902. [PMID: 16485992 DOI: 10.1103/physreve.72.062902] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 11/29/2004] [Indexed: 05/06/2023]
Abstract
The global properties of vascular networks grown with an in vitro angiogenesis assay are compared quantitatively, using automated image analysis, with the global properties of networks obtained with discrete, stochastic growth models. The model classes that are investigated are invasion percolation and diffusion limited aggregation. By matching global properties to experimental data, one can infer which model classes and parameters are most reflective of angiogenesis in experimental cells. This sheds light on large-scale emergent properties of angiogenesis from a systems perspective. It is found that invasion percolation is better than diffusion limited aggregation at matching experimental data. We also present evidence that the distribution of the lengths of real tubule complexes follows a power law.
|
126
|
Shmulevich I, Kauffman SA, Aldana M. Eukaryotic cells are dynamically ordered or critical but not chaotic. Proc Natl Acad Sci U S A 2005; 102:13439-44. [PMID: 16155121 PMCID: PMC1224670 DOI: 10.1073/pnas.0506771102] [Citation(s) in RCA: 119] [Impact Index Per Article: 6.3] [Received: 12/13/2004] [Indexed: 12/30/2022]
Abstract
Two important theoretical approaches have been developed to generically characterize the relationship between the structure and function of large genetic networks: the continuous approach, based on reaction-kinetics differential equations, and the Boolean approach, based on difference equations and discrete logical rules. These two approaches do not always coincide in their predictions for the same system. Nonetheless, both of them predict that the highly nonlinear dynamics exhibited by genetic regulatory systems can be characterized into two broad regimes, to wit, an ordered regime where the system is robust against perturbations, and a chaotic regime where the system is extremely sensitive to perturbations. It has been a plausible and long-standing hypothesis that genomic regulatory networks of real cells operate in the ordered regime or at the border between order and chaos. This hypothesis is indirectly supported by the robustness and stability observed in the phenotypic traits of living organisms under genetic perturbations. However, there has been no systematic study to determine whether the gene-expression patterns of real cells are compatible with the dynamically ordered regimes predicted by theoretical models. Using the Boolean approach, here we show what we believe to be the first direct evidence that the underlying genetic network of HeLa cells appears to operate either in the ordered regime or at the border between order and chaos but does not appear to be chaotic.
|
127
|
Fuller G, Mircean C, Tabus I, Taylor E, Sawaya R, Bruner J, Shmulevich I, Zhang W. Molecular voting for glioma classification reflecting heterogeneity in the continuum of cancer progression. Oncol Rep 2005. [DOI: 10.3892/or.14.3.651] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/05/2022]
|
128
|
Fuller GN, Mircean C, Tabus I, Taylor E, Sawaya R, Bruner JM, Shmulevich I, Zhang W. Molecular voting for glioma classification reflecting heterogeneity in the continuum of cancer progression. Oncol Rep 2005; 14:651-6. [PMID: 16077969] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/03/2023]
Abstract
Gliomas, the most common brain tumors, are generally categorized into two lineages (astrocytic and oligodendrocytic) and further classified as low-grade (astrocytoma and oligodendroglioma), mid-grade (anaplastic astrocytoma and anaplastic oligodendroglioma), and high-grade (glioblastoma multiforme) based on morphological features. A strict classification scheme has limitations because a specific glioma can be at any stage of the continuum of cancer progression and may contain mixed features. Thus, a more comprehensive classification based on molecular signatures may reflect the biological nature of specific tumors more accurately. In this study, we used microarray technology to profile the gene expression of 49 human brain tumors and applied the k-nearest neighbor algorithm for classification. We first trained the classification gene set with 19 of the most typical glioma cases and selected a set of genes that provide the lowest cross-validation classification error with k=5. We then applied this gene set to the 30 remaining cases, including several that do not belong to gliomas such as atypical meningioma. The results showed that not only does the algorithm correctly classify most of the gliomas, but the detailed voting results also provide more subtle information regarding the molecular similarities to neighboring classes. For atypical meningioma, the voting was equally split among the four classes, indicating a difficulty in placement of meningioma into the four classes of gliomas. Thus, the actual voting results, which are typically used only to decide the winning class label in k-nearest neighbor algorithms, provide a useful method for gaining deeper insight into the stage of a tumor in the continuum of cancer development.
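The point about retaining the full vote tally, rather than only the winning label, can be sketched with a small k-nearest neighbor routine. The value k=5 follows the abstract; the Euclidean metric, the two-dimensional toy coordinates, and the class names are assumptions for illustration and do not come from the study.

```python
import math
from collections import Counter

def knn_votes(train_x, train_y, query, k=5):
    """Return the vote tally among the k nearest training samples.

    Keeping the whole Counter, not just the majority label, shows how
    strongly a sample resembles each neighboring class.
    """
    ranked = sorted(zip(train_x, train_y),
                    key=lambda pair: math.dist(pair[0], query))
    return Counter(label for _, label in ranked[:k])

# Hypothetical two-feature expression profiles for two tumor classes.
train_x = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5),
           (5.0, 5.0), (5.0, 6.0), (6.0, 5.0), (6.0, 6.0), (5.5, 5.5)]
train_y = ["low-grade"] * 5 + ["high-grade"] * 5

typical = knn_votes(train_x, train_y, (0.4, 0.6))
intermediate = knn_votes(train_x, train_y, (3.2, 3.2))
print(typical)
print(intermediate)
```

A sample deep inside one class receives a unanimous vote, while a sample lying between classes receives a split vote; the split itself is the extra information the abstract describes.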
|
129
|
Niemistö A, Dunmire V, Yli-Harja O, Zhang W, Shmulevich I. Robust quantification of in vitro angiogenesis through image analysis. IEEE TRANSACTIONS ON MEDICAL IMAGING 2005; 24:549-53. [PMID: 15822812 DOI: 10.1109/tmi.2004.837339] [Citation(s) in RCA: 135] [Impact Index Per Article: 7.1] [Indexed: 05/23/2023]
Abstract
An automated image analysis method for quantification of in vitro angiogenesis is presented. The method is designed for in vitro angiogenesis assays that are based on co-culturing endothelial cells with fibroblasts. Such assays are used in many current studies in which anti-angiogenic agents for the treatment of cancer are being sought. This search requires accurate quantification of the stimulatory and inhibitory effects of the different agents. The quantification method gives lengths and sizes of the tubule complexes as well as the numbers of junctions in each of them. The method is tested with a set of test images obtained with a commercially available in vitro angiogenesis assay. The results correctly indicate the inhibitory effect of suramin and the stimulatory effect of vascular endothelial growth factor. Moreover, the image analysis method is shown to be robust against variations in illumination. We have implemented a software package that utilizes the methods. The software as well as a set of test images are available at http://www.cs.tut.fi/sgn/csb/angioquant/.
|
130
|
Lee EJ, Mircean C, Shmulevich I, Wang H, Liu J, Niemistö A, Kavanagh JJ, Lee JH, Zhang W. Insulin-like growth factor binding protein 2 promotes ovarian cancer cell invasion. Mol Cancer 2005; 4:7. [PMID: 15686601 PMCID: PMC549074 DOI: 10.1186/1476-4598-4-7] [Citation(s) in RCA: 68] [Impact Index Per Article: 3.6] [Received: 10/21/2004] [Accepted: 02/02/2005] [Indexed: 11/10/2022]
Abstract
BACKGROUND Insulin-like growth factor binding protein 2 (IGFBP2) is overexpressed in ovarian malignant tissues and in the serum and cystic fluid of ovarian cancer patients, suggesting an important role of IGFBP2 in the biology of ovarian cancer. The purpose of this study was to assess the role of increased IGFBP2 in ovarian cancer cells. RESULTS Using western blotting and tissue microarray analyses, we showed that IGFBP2 was frequently overexpressed in ovarian carcinomas compared with normal ovarian tissues. Furthermore, IGFBP2 was significantly overexpressed in invasive serous ovarian carcinomas compared with borderline serous ovarian tumors. To test whether increased IGFBP2 contributes to the highly invasive nature of ovarian cancer cells, we generated IGFBP2-overexpressing cells from an SKOV3 ovarian cancer cell line, which has a very low level of endogenous IGFBP2. A Matrigel invasion assay showed that these IGFBP2-overexpressing cells were more invasive than the control cells. We then designed small interference RNA (siRNA) molecules that attenuated IGFBP2 expression in PA-1 ovarian cancer cells, which have a high level of endogenous IGFBP2. The Matrigel invasion assay showed that the attenuation of IGFBP2 expression indeed decreased the invasiveness of PA-1 cells. CONCLUSIONS We therefore showed that IGFBP2 enhances the invasion capacity of ovarian cancer cells. Blockage of IGFBP2 may thus constitute a viable strategy for targeted cancer therapy.
|
131
|
Mircean C, Shmulevich I, Cogdell D, Choi W, Jia Y, Tabus I, Hamilton SR, Zhang W. Robust estimation of protein expression ratios with lysate microarray technology. Bioinformatics 2005; 21:1935-42. [PMID: 15647295 DOI: 10.1093/bioinformatics/bti258] [Citation(s) in RCA: 36] [Impact Index Per Article: 1.9] [Indexed: 11/13/2022]
Abstract
MOTIVATION The protein lysate microarray is a developing proteomic technology for measuring protein expression levels in a large number of biological samples simultaneously. A challenge for accurate quantification is the relatively narrow dynamic range associated with the commonly used chromogenic signal detection system. To facilitate accurate measurement of the relative expression levels, each sample is serially diluted and each diluted version is spotted on a nitrocellulose-coated slide in triplicate. Thus, each sample yields multiple measurements in different dynamic ranges of the detection system. This study aims to develop suitable algorithms that yield accurate representations of the relative expression levels in different samples from multiple data points. RESULTS We evaluated two algorithms for estimating relative protein expression in different samples on the lysate microarray by means of a cross-validation procedure. For this purpose as well as for quality control we designed a 1440-spot lysate microarray containing 80 identical samples of purified bovine serum albumin, printed in triplicate with six 2-fold dilutions. Our analysis showed that the algorithm based on a robust least squares estimator provided the most accurate quantification of the protein lysate microarray data. We also demonstrated our methods by estimating relative expression levels of p53 and p21 in either p53(+/+) or p53(-/-) HCT116 colon cancer cells after two drug treatments and their combinations on another lysate microarray. AVAILABILITY http://www.cs.tut.fi/~mirceanc/lysate_array_bioinformatics.htm
|
132
|
Toulouse T, Ao P, Shmulevich I, Kauffman S. Noise in a Small Genetic Circuit that Undergoes Bifurcation. COMPLEXITY 2005; 11:45-51. [PMID: 16670776 PMCID: PMC1456069 DOI: 10.1002/cplx.20099] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Indexed: 05/09/2023]
Abstract
Based on the consideration of Boolean dynamics, it has been hypothesized that cell types may correspond to alternative attractors of a gene regulatory network. Recent stochastic Boolean network analysis, however, raised the important question concerning the stability of such attractors. In this paper a detailed numerical analysis is performed within the framework of Langevin dynamics. While the present results confirm that the noise is indeed an important dynamical element, the cell type as represented by attractors can still be a viable hypothesis. It is found that the stability of an attractor depends on the strength of noise related to the distance of the system to the bifurcation point and it can be exponentially stable depending on biological parameters.
|
133
|
Etzion Y, Linker R, Cogan U, Shmulevich I. Determination of Protein Concentration in Raw Milk by Mid-Infrared Fourier Transform Infrared/Attenuated Total Reflectance Spectroscopy. J Dairy Sci 2004; 87:2779-88. [PMID: 15375035 DOI: 10.3168/jds.s0022-0302(04)73405-0] [Citation(s) in RCA: 119] [Impact Index Per Article: 6.0] [Indexed: 11/19/2022]
Abstract
This study investigates the potential use of attenuated total reflectance spectroscopy in the mid-infrared range for determining protein concentration in raw cow milk. The determination of protein concentration is based on the characteristic absorbance of milk proteins, which includes 2 absorbance bands in the 1500 to 1700 cm⁻¹ range, known as the amide I and amide II bands, and absorbance in the 1060 to 1100 cm⁻¹ range, which is associated with phosphate groups covalently bound to casein proteins. To minimize the influence of the strong water band (centered around 1640 cm⁻¹) that overlaps with the amide I and amide II bands, an optimized automatic procedure for accurate water subtraction was applied. Following water subtraction, the spectra were analyzed by 3 methods, namely simple band integration, partial least squares (PLS), and neural networks. For the neural network models, the spectra were first decomposed by principal component analysis (PCA), and the neural network inputs were the principal component scores of the spectra. In addition, the concentrations of 2 constituents expected to interact with the protein (i.e., fat and lactose) were also used as inputs. These approaches were tested with 235 spectra of standardized raw milk samples, corresponding to 26 protein concentrations in the 2.47 to 3.90% (weight per volume) range. The simple integration method led to very poor results, whereas PLS resulted in prediction errors of about 0.22% protein. The neural network approach led to prediction errors of 0.20% protein when based on PCA scores only, and 0.08% protein when lactose and fat concentrations were also included in the model. These results indicate the potential usefulness of Fourier transform infrared/attenuated total reflectance spectroscopy for rapid, possibly online, determination of protein concentration in raw milk.
|
134
|
Shmulevich I, Kauffman SA. Activities and sensitivities in boolean network models. PHYSICAL REVIEW LETTERS 2004; 93:048701. [PMID: 15323803 PMCID: PMC1490311 DOI: 10.1103/physrevlett.93.048701] [Citation(s) in RCA: 116] [Impact Index Per Article: 5.8] [Received: 10/24/2003] [Indexed: 05/20/2023]
Abstract
We study how the notions of importance of variables in Boolean functions as well as the sensitivities of the functions to changes in these variables impact the dynamical behavior of Boolean networks. The activity of a variable captures its influence on the output of the function and is a measure of that variable's importance. The average sensitivity of a Boolean function captures the smoothness of the function and is related to its internal homogeneity. In a random Boolean network, we show that the expected average sensitivity determines the well-known critical transition curve. We also discuss canalizing functions and the fact that the canalizing variables enjoy higher importance, as measured by their activities, than the noncanalizing variables. Finally, we demonstrate the important role of the average sensitivity in determining the dynamical behavior of a Boolean network.
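The two quantities named above can be made concrete with a short sketch. It assumes the common definitions under a uniform input distribution: the activity of variable j is the fraction of inputs for which flipping x_j changes the output, and the average sensitivity is the sum of the activities. The example function and all names are hypothetical and chosen only to show a canalizing variable.

```python
from itertools import product

def activities(f, n):
    """Activity of each variable of Boolean function f on n inputs.

    Activity of variable j = fraction of the 2**n inputs for which
    flipping bit j changes f (uniform input distribution assumed).
    """
    states = list(product((0, 1), repeat=n))
    result = []
    for j in range(n):
        changes = 0
        for x in states:
            flipped = x[:j] + (1 - x[j],) + x[j + 1:]
            changes += f(x) != f(flipped)
        result.append(changes / len(states))
    return result

def average_sensitivity(f, n):
    """Expected number of single-bit flips that change the output."""
    return sum(activities(f, n))

def canalizing_example(x):
    """x0 OR (x1 AND x2): x0 = 1 forces the output to 1, so x0 is canalizing."""
    return x[0] | (x[1] & x[2])

print(activities(canalizing_example, 3))
print(average_sensitivity(canalizing_example, 3))
```

For this function the canalizing variable x0 has activity 0.75 while x1 and x2 each have 0.25, consistent with the abstract's remark that canalizing variables carry higher importance than noncanalizing ones.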
|
135
|
Chirieac L, Suehiro Y, Niemisto A, Shmulevich I, Lunagomez S, Morris J, Hamilton SR. Size and number of examined lymph nodes impacts outcome in patients with stage II colorectal cancer. J Clin Oncol 2004. [DOI: 10.1200/jco.2004.22.90140.3507] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022]
|
136
|
Lähdesmäki H, Hao X, Sun B, Hu L, Yli-Harja O, Shmulevich I, Zhang W. Distinguishing key biological pathways between primary breast cancers and their lymph node metastases by gene function-based clustering analysis. Int J Oncol 2004; 24:1589-96. [PMID: 15138604] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/29/2023]
Abstract
In order to identify key biological pathways that can distinguish between primary breast cancers and their lymph node metastases, we employed gene expression profiling together with gene function-based clustering analysis. We first acquired gene expression profiles of 9 matched primary tumors and the corresponding metastases that contained at least 75% of tumor cells. Then, we applied a clustering algorithm to the preprocessed data. In order to focus on the most informative genes, we ranked all the genes individually based on their abilities to separate the primary breast tumor and metastases samples. Further, we separated these genes into six functional groups according to the Stanford SOURCE database: 'cell cycle,' 'apoptosis,' 'metabolism,' 'cell adhesion and migration,' 'signal transduction,' and 'transcriptional factor and DNA binding molecules.' Unsupervised clustering analysis using all of the 2,303 genes on the microarrays was not able to separate the primary and metastases samples. Clustering analysis using the most informative genes revealed that primary tumors were more tightly clustered, whereas the metastases samples were relatively heterogeneous. The clustering analysis with the genes belonging to different functional groups showed that different functional gene sets varied in their abilities to separate primary tumors and their metastases. Marked separations were found with genes involved in metabolism, signal transduction, cell cycle, and transcriptional factor and DNA binding molecules. In contrast, apoptosis and cell adhesion and migration genes did not provide a clear separation of the two groups of samples. These results suggest that metastatic cells have different metabolism and signal transduction activities, regulated by transcriptional events, from the primary tumor cells. The results also suggest that the altered cell adhesion and migration potentials that are required for tumors to metastasize already exist in the primary tumors as a whole.
|
137
|
Hao X, Sun B, Hu L, Lähdesmäki H, Dunmire V, Feng Y, Zhang SW, Wang H, Wu C, Wang H, Fuller GN, Symmans WF, Shmulevich I, Zhang W. Differential gene and protein expression in primary breast malignancies and their lymph node metastases as revealed by combined cDNA microarray and tissue microarray analysis. Cancer 2004; 100:1110-22. [PMID: 15022276 DOI: 10.1002/cncr.20095] [Citation(s) in RCA: 137] [Impact Index Per Article: 6.9] [Indexed: 12/15/2022]
Abstract
BACKGROUND Metastatic disease is a major adverse prognostic factor in breast carcinoma. Lymph node metastases often represent the first step in the metastatic process. METHODS To gain insight into the molecular events that underlie breast carcinoma metastasis, the authors compared gene expression profiles, obtained by cDNA microarray analysis, of nine matched primary tumors and metastases after screening for enrichment of tumor cells. Statistical analysis identified genes that are expressed at elevated or decreased levels in metastases relative to the corresponding primary tumors. Multidimensional scaling analysis indicated that in terms of expression levels, primary tumors were tightly clustered, whereas metastases exhibited a greater spread; this finding points to the more heterogeneous nature of metastases. Among the differentially expressed entities were the invasion- and tissue modeling-related genes IGFBP5, fibronectin, and MMP2; the cell cycle regulatory gene cyclin D1; other genes, such as enolase 2; and an expressed sequence tag similar to angiopoietin 1. To validate and extend these initial findings, the authors constructed a tissue microarray consisting of 100 primary malignancies paired with their lymph node metastases. Antibodies for the IGFBP-5, fibronectin, MMP-2, cyclin D1, and MDM-2 proteins were used to stain tissue array sections. RESULTS Consistent with microarray data, statistically significant overexpression of IGFBP-5, down-regulation of cyclin D1, and unchanged MDM-2 levels were observed in metastatic tumor cells. Nonetheless, although fibronectin and MMP2 mRNA expression levels were decreased in many metastasis specimens, expression levels of the corresponding proteins in the extracellular matrix were elevated in most metastases. Decreased expression of fibronectin and MMP2 in lymph node metastases was further confirmed by real-time polymerase chain reaction assays performed on five additional specimen pairs. 
CONCLUSIONS The results of the current study suggest that extracellular matrix protein expression and nuclear gene expression are associated via a negative-feedback regulatory mechanism. Therefore, gene expression profiling and tissue array validation should be combined to elucidate molecular events associated with the metastatic process.
|
138
|
Mircean C, Tabus I, Kobayashi T, Yamaguchi M, Shiku H, Shmulevich I, Zhang W. Pathway analysis of informative genes from microarray data reveals that metabolism and signal transduction genes distinguish different subtypes of lymphomas. Int J Oncol 2004; 24:497-504. [PMID: 14767533] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/28/2023] Open
Abstract
Recent clinicopathological studies identified a unique subgroup of diffuse large B-cell lymphoma (DLBCL) that expresses CD5 on the cell surface. This 'de novo CD5+ DLBCL' comprises 10% of all DLBCL and has a poorer prognosis than CD5- DLBCL. Comparison of gene expression profiles between de novo CD5+ DLBCLs and CD5- DLBCLs shows that de novo CD5+ DLBCL expresses high levels of integrin beta1 in tumor cells and CD36 in the vascular cells. On the other hand, comparison between mantle cell lymphomas (MCLs) and DLBCLs expectedly identified cyclin D1 as a top feature gene. To gain insight into the molecular pathway differences among the three types of lymphoma, we evaluated the functional categories of groups of genes important for the discrimination among the three groups. We first selected 280 (from 2,142) genes, according to their individual discriminatory power. We then used the gene-shaving clustering algorithm and identified 22 clusters of genes. Of the 22 clusters, six were highly correlated with the class labels of the patients and the top three clusters accounted for the major difference among the three lymphoma subtypes. A multidimensional scaling (MDS) analysis using the average genes from the top three clusters separated the three lymphoma subtypes quite well. The functions of the genes in the top three gene clusters showed a significant enrichment of metabolism and signal transduction. To further examine whether genes of particular functions reflect more faithfully the difference between the subtypes of lymphomas, we separated the 280 informative genes into six different functional groups and performed MDS analysis using each of the gene groups. Four of the gene-function groups (metabolism, signal transduction pathway, transcriptional factors, cell adhesion and migration), separated the three lymphoma subtypes well, whereas apoptosis genes and cell cycle genes did not result in good separation.
MeSH Headings
- Algorithms
- CD5 Antigens/biosynthesis
- Cell Adhesion
- Cell Movement
- Cluster Analysis
- Gene Expression Regulation, Neoplastic
- Humans
- Lymphoma, Large B-Cell, Diffuse/diagnosis
- Lymphoma, Large B-Cell, Diffuse/genetics
- Lymphoma, Large B-Cell, Diffuse/metabolism
- Lymphoma, Mantle-Cell/diagnosis
- Lymphoma, Mantle-Cell/genetics
- Lymphoma, Mantle-Cell/metabolism
- Models, Statistical
- Multigene Family
- Oligonucleotide Array Sequence Analysis
- Phenotype
- Signal Transduction
- Transcription Factors/metabolism
|
139
|
Mircean C, Tabus I, Kobayashi T, Yamaguchi M, Shiku H, Shmulevich I, Zhang W. Pathway analysis of informative genes from microarray data reveals that metabolism and signal transduction genes distinguish different subtypes of lymphomas. Int J Oncol 2004. [DOI: 10.3892/ijo.24.3.497] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/05/2022] Open
|
140
|
Hashimoto RF, Kim S, Shmulevich I, Zhang W, Bittner ML, Dougherty ER. Growing genetic regulatory networks from seed genes. Bioinformatics 2004; 20:1241-7. [PMID: 14871865 DOI: 10.1093/bioinformatics/bth074] [Citation(s) in RCA: 48] [Impact Index Per Article: 2.4] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION A number of models have been proposed for genetic regulatory networks. In principle, a network may contain any number of genes, so long as data are available to make inferences about their relationships. Nevertheless, there are two important reasons why the size of a constructed network should be limited. Computationally and mathematically, it is more feasible to model and simulate a network with a small number of genes. In addition, it is more likely that a small set of genes maintains a specific core regulatory mechanism. RESULTS Subnetworks are constructed in the context of a directed graph by beginning with a seed consisting of one or more genes believed to participate in a viable subnetwork. Functionalities and regulatory relationships among seed genes may be partially known or they may simply be of interest. Given the seed, we iteratively adjoin new genes in a manner that enhances subnetwork autonomy. The algorithm is applied using both the coefficient of determination and the Boolean-function influence among genes, and it is illustrated using a glioma gene-expression dataset. AVAILABILITY Software for the seed-growing algorithm will be available at the website for Probabilistic Boolean Networks: http://www2.mdanderson.org/app/ilya/PBN/PBN.htm
|
141
|
Shmulevich I, Lähdesmäki H, Dougherty ER, Astola J, Zhang W. The role of certain Post classes in Boolean network models of genetic networks. Proc Natl Acad Sci U S A 2003; 100:10734-9. [PMID: 12963822 PMCID: PMC202352 DOI: 10.1073/pnas.1534782100] [Citation(s) in RCA: 111] [Impact Index Per Article: 5.3] [Indexed: 11/18/2022] Open
Abstract
A topic of great interest and debate concerns the source of order and remarkable robustness observed in genetic regulatory networks. The study of the generic properties of Boolean networks has proven to be useful for gaining insight into such phenomena. The main focus, as regards ordered behavior in networks, has been on canalizing functions, internal homogeneity or bias, and network connectivity. Here we examine the role that certain classes of Boolean functions that are closed under composition play in the emergence of order in Boolean networks. The closure property implies that any gene at any number of steps in the future is guaranteed to be governed by a function from the same class. By means of Derrida curves on random Boolean networks and percolation simulations on square lattices, we demonstrate that networks constructed from functions belonging to these classes have a tendency toward ordered behavior. Thus they are not overly sensitive to initial conditions, and damage does not readily spread throughout the network. In addition, the considered classes are significantly larger than the class of canalizing functions as the connectivity increases. The functions in these classes exhibit the same kind of preference toward biased functions as do canalizing functions, meaning that functions from this class are likely to be biased. Finally, functions from this class have a natural way of ensuring robustness against noise and perturbations, thus representing plausible evolutionarily selected candidates for regulatory rules in genetic networks.
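The Derrida-curve measurement mentioned in this abstract can be illustrated with a minimal sketch. It assumes synchronous updating and generic random truth tables with a chosen bias; it does not construct the Post classes studied in the paper, and the names `random_network`, `step`, and `derrida_point` are hypothetical helpers introduced only for illustration. One point of the curve is the average normalized Hamming distance after one update, given a fixed initial distance `rho`.

```python
import numpy as np

def random_network(n, k, bias, rng):
    """Draw k distinct inputs per gene and a random truth table per gene."""
    inputs = np.array([rng.choice(n, size=k, replace=False) for _ in range(n)])
    # Each truth-table entry is 1 with probability `bias`.
    tables = (rng.random((n, 2 ** k)) < bias).astype(np.uint8)
    return inputs, tables

def step(state, inputs, tables):
    """One synchronous update of all genes."""
    weights = 1 << np.arange(inputs.shape[1])
    rows = (state[inputs] * weights).sum(axis=1)  # truth-table row per gene
    return tables[np.arange(len(state)), rows]

def derrida_point(n, k, bias, rho, trials, rng):
    """Mean normalized Hamming distance after one step, starting at distance rho."""
    flips = max(1, round(rho * n))
    total = 0.0
    for _ in range(trials):
        inputs, tables = random_network(n, k, bias, rng)
        a = rng.integers(0, 2, size=n, dtype=np.uint8)
        b = a.copy()
        b[rng.choice(n, size=flips, replace=False)] ^= 1  # perturb a copy
        total += np.mean(step(a, inputs, tables) != step(b, inputs, tables))
    return total / trials

rng = np.random.default_rng(0)
for k in (1, 2, 5):
    print(k, derrida_point(n=100, k=k, bias=0.5, rho=0.1, trials=20, rng=rng))
```

A point below the diagonal (output distance smaller than `rho`) indicates that small perturbations tend to die out, which is the sense of "ordered behavior" used in the abstract.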
|
142
|
Morikawa J, Li H, Kim S, Nishi K, Ueno S, Suh E, Dougherty E, Shmulevich I, Shiku H, Zhang W, Kobayashi T. Identification of signature genes by microarray for acute myeloid leukemia without maturation and acute promyelocytic leukemia with t(15;17)(q22;q12)(PML/RARα). Int J Oncol 2003. [DOI: 10.3892/ijo.23.3.617] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.0] [Indexed: 11/05/2022] Open
|
143
|
Morikawa J, Li H, Kim S, Nishi K, Ueno S, Suh E, Dougherty E, Shmulevich I, Shiku H, Zhang W, Kobayashi T. Identification of signature genes by microarray for acute myeloid leukemia without maturation and acute promyelocytic leukemia with t(15;17)(q22;q12)(PML/RARα). Int J Oncol 2003; 23:617-25. [PMID: 12888896] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/04/2023] Open
Abstract
Acute myeloid leukemia (AML) has distinct subgroups characterized by different maturation and specific chromosomal translocation. In order to gain insight into the gene expression activities in AML, we carried out a gene expression profiling study with 21 AML samples using cDNA microarrays, focusing on acute promyelocytic leukemia with specific translocation t(15;17)(q22;q12) [French-American-British or FAB-M3 with t(15;17)] and AML without maturation (FAB-M1) characterized by morphologically and phenotypically immature AML blasts and no recurrent chromosomal abnormalities. Using a multivariate sigma-classifier algorithm, we identified 33 strong feature genes that distinguish FAB-M3 with t(15;17) from other AML samples, and 24 strong feature genes that classify FAB-M1. A direct comparison between FAB-M3 with t(15;17) and FAB-M1 led to selection of 13 strong feature genes. Those genes include some known to be related to leukemogenesis and cell differentiation. RIN1, a gene in the ras pathway, was up-regulated in FAB-M3 with t(15;17). Growth factor-binding protein 2 gene was down-regulated in FAB-M1. Huntingtin gene was up-regulated in FAB-M1. Others include syndecan 4, interleukin-2 receptor beta, folate receptor beta, low affinity immunoglobulin gamma, Fc receptor IIC precursor, insulin-like growth factor binding protein 2, and myeloperoxidase, which are involved in cell differentiation. Overexpression of myeloperoxidase in FAB-M3 cells with t(15;17) compared to FAB-M1 cells is consistent with the conventional cytochemical staining pattern. Thus, the study revealed that a morphologically-defined FAB-M1 subtype has a distinct gene expression signature that contributes to its cell differentiation and proliferation as well as FAB-M3 with a recurrent cytogenetic abnormality t(15;17)(q22;q12).
MeSH Headings
- Algorithms
- Cell Differentiation
- Chromosomes, Human, Pair 15
- Chromosomes, Human, Pair 17
- Cluster Analysis
- DNA, Complementary/metabolism
- Down-Regulation
- Gene Expression Regulation, Neoplastic
- Humans
- Image Processing, Computer-Assisted
- Karyotyping
- Leukemia, Myeloid, Acute/genetics
- Leukemia, Promyelocytic, Acute/genetics
- Multivariate Analysis
- Neoplasm Proteins/genetics
- Nuclear Proteins
- Oligonucleotide Array Sequence Analysis
- Promyelocytic Leukemia Protein
- RNA/metabolism
- Receptors, Retinoic Acid/genetics
- Retinoic Acid Receptor alpha
- Transcription Factors/genetics
- Translocation, Genetic
- Tumor Suppressor Proteins
- Up-Regulation
|
144
|
Shmulevich I. Model selection in genomics. Environ Health Perspect 2003; 111:A328-A329. [PMID: 12760836 PMCID: PMC1241516 DOI: 10.1289/ehp.111-a328b] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/24/2023]
|
145
|
Shmulevich I, Astola J, Cogdell D, Hamilton SR, Zhang W. Data extraction from composite oligonucleotide microarrays. Nucleic Acids Res 2003; 31:e36. [PMID: 12655024 PMCID: PMC152821 DOI: 10.1093/nar/gng036] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.6] [Indexed: 11/13/2022] Open
Abstract
Microarray or DNA chip technology is revolutionizing biology by empowering researchers in the collection of broad-scope gene information. It is well known that microarray-based measurements exhibit a substantial amount of variability due to a number of possible sources, ranging from hybridization conditions to image capture and analysis. In order to make reliable inferences and carry out quantitative analysis with microarray data, it is generally advisable to have more than one measurement of each gene. The availability of both between-array and within-array replicate measurements is essential for this purpose. Although statistical considerations call for increasing the number of replicates of both types, the latter is particularly challenging in practice due to a number of limiting factors, especially for in-house spotting facilities. We propose a novel approach to design so-called composite microarrays, which allow more replicates to be obtained without increasing the number of printed spots.
|
146
|
Kobayashi T, Yamaguchi M, Kim S, Morikawa J, Ogawa S, Ueno S, Suh E, Dougherty E, Shmulevich I, Shiku H, Zhang W. Microarray reveals differences in both tumors and vascular specific gene expression in de novo CD5+ and CD5- diffuse large B-cell lymphomas. Cancer Res 2003; 63:60-6. [PMID: 12517778] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/28/2023]
Abstract
Malignant lymphoma is a heterogeneous disease with different clinical features. Among diffuse large B-cell lymphomas (DLBCLs), a unique subtype has been identified recently based on cell surface marker CD5 and clinicopathological features. These de novo CD5(+) DLBCLs account for approximately 10% of all DLBCLs and have a poorer prognosis. To better understand this subtype of DLBCLs at the molecular level and to find genes that are differentially expressed in de novo CD5(+) DLBCLs, CD5(-) DLBCLs, and mantle cell lymphomas, which also have poor prognosis, we performed gene expression profiling using cDNA microarray technology. Data from a total of 9 samples of CD5(-) DLBCLs, 11 samples of de novo CD5(+) DLBCLs, and 10 samples of mantle cell lymphomas were acquired. A series of genes were identified that distinguish these three types of lymphomas. Among DLBCL cases, integrin beta1 and/or CD36 adhesion molecules were overexpressed in most cases of CD5(+) DLBCL. An immunohistochemical confirmation study revealed that integrin beta1 was expressed on lymphoma cells, which may account for the high extranodal involvement and poor prognosis of CD5(+) DLBCLs. In contrast, CD36 was overexpressed on vascular endothelia in CD5(+) DLBCLs, although there was no difference in vascularity detected by von Willebrand factor antibody between CD5(+) and CD5(-) DLBCLs. Those results suggest that CD5(+) and CD5(-) DLBCLs have different gene expression signatures in both tumor cells and their vascular systems.
MeSH Headings
- Aged
- Aged, 80 and over
- Blood Vessels/pathology
- CD5 Antigens/genetics
- Female
- Gene Expression Regulation
- Gene Expression Regulation, Neoplastic
- Humans
- Immunophenotyping
- Lymphoma, B-Cell/classification
- Lymphoma, B-Cell/genetics
- Lymphoma, B-Cell/pathology
- Lymphoma, Large B-Cell, Diffuse/classification
- Lymphoma, Large B-Cell, Diffuse/genetics
- Lymphoma, Large B-Cell, Diffuse/pathology
- Male
- Middle Aged
- Oligonucleotide Array Sequence Analysis
- von Willebrand Factor/genetics
|
147
|
Katkovnik V, Shmulevich I. Kernel density estimation with adaptive varying window size. Pattern Recognit Lett 2002. [DOI: 10.1016/s0167-8655(02)00127-7] [Citation(s) in RCA: 37] [Impact Index Per Article: 1.7] [Indexed: 11/28/2022]
|
148
|
Kim S, Dougherty ER, Shmulevich I, Hess KR, Hamilton SR, Trent JM, Fuller GN, Zhang W. Identification of combination gene sets for glioma classification. Mol Cancer Ther 2002; 1:1229-36. [PMID: 12479704] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/28/2023]
Abstract
One goal for the gene expression profiling of cancer tissues is to identify signature genes that robustly distinguish different types or grades of tumors. Such signature genes would ideally provide a molecular basis for classification and also yield insight into the molecular events underlying different cancer phenotypes. This study applies a recently developed algorithm to identify not only single classifier genes but also gene sets (combinations) for use as glioma classifiers. Classifier genes identified by this algorithm are shown to be strong features by conservatively and collectively considering the misclassification errors of the feature sets. Applying this approach to a test set of 25 patients, we have identified the best single genes and two- to three-gene combinations for distinguishing four types of glioma: (a) oligodendroglioma; (b) anaplastic oligodendroglioma; (c) anaplastic astrocytoma; and (d) glioblastoma multiforme. Some of the identified genes, such as insulin-like growth factor-binding protein 2, have been confirmed to be associated with one of the tumor types. Using combinations of genes, the classification error rate can be significantly lowered. In many instances, neither of the individual genes of a two-gene set performs well as an accurate classifier, but the combination of the two genes forms a robust classifier with a small error rate. Two-gene and three-gene combinations thus provide robust classifiers possessing the potential to translate expression microarray results into diagnostic histopathological assays for clinical utilization.
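The observation that two genes can classify well together even when neither does alone can be sketched as follows. The abstract does not specify the algorithm used, so this sketch swaps in a plain nearest-centroid classifier scored by leave-one-out error and searches all gene sets of a given size exhaustively. The function names and the synthetic data are assumptions made only for illustration.

```python
from itertools import combinations
import numpy as np

def loo_error(X, y, genes):
    """Leave-one-out error of a nearest-centroid classifier on the chosen genes."""
    Z = X[:, list(genes)]
    classes = np.unique(y)
    errors = 0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i  # hold out sample i
        centroids = np.array([Z[keep & (y == c)].mean(axis=0) for c in classes])
        dists = ((centroids - Z[i]) ** 2).sum(axis=1)
        errors += int(classes[np.argmin(dists)] != y[i])
    return errors / len(y)

def rank_gene_sets(X, y, size):
    """Score every gene set of the given size; lowest error first."""
    n_genes = X.shape[1]
    return sorted((loo_error(X, y, g), g) for g in combinations(range(n_genes), size))

# Synthetic data (not from the paper): genes 0 and 1 share a large common
# component t, and the class only shifts their difference. Gene 2 is uninformative.
t = np.linspace(-3, 3, 10)
class0 = np.column_stack([t - 1, t + 1])
class1 = np.column_stack([t + 1, t - 1])
X = np.column_stack([np.vstack([class0, class1]), np.tile([0.0, 1.0], 10)])
y = np.array([0] * 10 + [1] * 10)

print(rank_gene_sets(X, y, 1))      # single genes
print(rank_gene_sets(X, y, 2)[0])   # best pair
```

Exhaustive search is affordable only for small set sizes, since the number of candidate sets grows combinatorially with the number of genes.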
|
149
|
Shmulevich I, Dougherty ER, Zhang W. Gene perturbation and intervention in probabilistic Boolean networks. Bioinformatics 2002; 18:1319-31. [PMID: 12376376 DOI: 10.1093/bioinformatics/18.10.1319] [Citation(s) in RCA: 206] [Impact Index Per Article: 9.4] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION A major objective of gene regulatory network modeling, in addition to gaining a deeper understanding of genetic regulation and control, is the development of computational tools for the identification and discovery of potential targets for therapeutic intervention in diseases such as cancer. We consider the general question of the potential effect of individual genes on the global dynamical network behavior, both from the view of random gene perturbation as well as intervention in order to elicit desired network behavior. RESULTS Using a recently introduced class of models, called Probabilistic Boolean Networks (PBNs), this paper develops a model for random gene perturbations and derives an explicit formula for the transition probabilities in the new PBN model. This result provides a building block for performing simulations and deriving other results concerning network dynamics. An example is provided to show how the gene perturbation model can be used to compute long-term influences of genes on other genes. Following this, the problem of intervention is addressed via the development of several computational tools based on first-passage times in Markov chains. The consequence is a methodology for finding the best gene with which to intervene in order to most likely achieve desirable network behavior. The ideas are illustrated with several examples in which the goal is to induce the network to transition into a desired state, or set of states. The corresponding issue of avoiding undesirable states is also addressed. Finally, the paper turns to the important problem of assessing the effect of gene perturbations on long-run network behavior. A bound on the steady-state probabilities is derived in terms of the perturbation probability. The result demonstrates that states of the network that are more 'easily reachable' from other states are more stable in the presence of gene perturbations. 
Consequently, these are hypothesized to correspond to cellular functional states. AVAILABILITY A library of functions written in MATLAB for simulating PBNs, constructing state-transition matrices, computing steady-state distributions, computing influences, modeling random gene perturbations, and finding optimal intervention targets, as described in this paper, is available on request from is@ieee.org.
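A small sketch may help make the perturbation model concrete. It assumes that at each step every gene flips independently with probability `p`, and that one of the constituent Boolean networks (chosen with its selection probability) is applied only when no gene flips. The abstract does not give the formula, so this is an illustration under those stated assumptions rather than the paper's derivation, and it is written in Python rather than the MATLAB library the authors mention. The toy networks and all function names are hypothetical.

```python
import itertools
import numpy as np

def pbn_transition_matrix(networks, probs, n, p):
    """State-transition matrix of a PBN with random gene perturbation.

    networks: functions mapping a state tuple to the next state tuple
    probs:    selection probability of each network (sums to 1)
    n:        number of genes; p: per-gene flip probability (assumed model)
    """
    states = list(itertools.product((0, 1), repeat=n))
    index = {s: i for i, s in enumerate(states)}
    A = np.zeros((len(states), len(states)))
    for x in states:
        i = index[x]
        # No gene flips: apply a randomly selected network.
        for f, c in zip(networks, probs):
            A[i, index[tuple(f(x))]] += (1 - p) ** n * c
        # At least one gene flips: move to the state at Hamming distance d.
        for z in states:
            d = sum(a != b for a, b in zip(x, z))
            if d > 0:
                A[i, index[z]] += p ** d * (1 - p) ** (n - d)
    return states, A

def steady_state(A):
    """Stationary distribution, solving pi A = pi with sum(pi) = 1."""
    m = A.shape[0]
    M = A.T - np.eye(m)
    M[-1, :] = 1.0            # replace one equation by the normalization
    b = np.zeros(m)
    b[-1] = 1.0
    return np.linalg.solve(M, b)

# Two toy 3-gene networks, for illustration only.
def net_a(x):
    return (x[1] & x[2], x[0], x[0] | x[1])

def net_b(x):
    return (x[1] | x[2], x[0], x[0] & x[1])

states, A = pbn_transition_matrix([net_a, net_b], [0.6, 0.4], n=3, p=0.01)
pi = steady_state(A)
for s, mass in zip(states, pi):
    print(s, round(float(mass), 4))
```

With `p > 0` every state can be reached from every other state in one step, so the chain is ergodic and the stationary distribution is unique.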
|
150
|
Shmulevich I, van der Ven AHGS. An inhibition-based stochastic countable-time decision model. Br J Math Stat Psychol 2002; 55:17-25. [PMID: 12034009 DOI: 10.1348/000711002159671] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/23/2023]
Abstract
A new stochastic model to account for reaction-time fluctuation in prolonged work tasks is presented. Transition probabilities from work periods to distraction periods and vice versa are dependent on inhibition, which increases during work and decreases during distractions. The model presented here differs from all other inhibition-based models in that transitions can take place only at certain random points in time, and is referred to as a countable-time decision model. It is argued that the proposed model is a more plausible alternative to other existing inhibition-based models, while at the same time being highly flexible in that it is able to approximate other models arbitrarily well. This model is compared to an existing inhibition-based continuous-time decision model and the probability distribution functions for work and distraction periods are derived.
|