251
Nettleton D, Recknor J, Reecy JM. Identification of differentially expressed gene categories in microarray studies using nonparametric multivariate analysis. Bioinformatics 2007; 24:192-201. [PMID: 18042553 DOI: 10.1093/bioinformatics/btm583] [Citation(s) in RCA: 49] [Impact Index Per Article: 2.9] [Indexed: 12/30/2022]
Abstract
MOTIVATION The field of microarray data analysis is shifting emphasis from methods for identifying differentially expressed genes to methods for identifying differentially expressed gene categories. The latter approaches utilize a priori information about genes to group genes into categories and enhance the interpretation of experiments aimed at identifying expression differences across treatments. While almost all of the existing approaches for identifying differentially expressed gene categories are practically useful, they suffer from a variety of drawbacks. Perhaps most notably, many popular tools are based exclusively on gene-specific statistics that cannot detect many types of multivariate expression change. RESULTS We have developed a nonparametric multivariate method for identifying gene categories whose multivariate expression distribution differs across two or more conditions. We illustrate our approach and compare its performance to several existing procedures via the analysis of a real data set and a unique data-based simulation study designed to capture the challenges and complexities of practical data analysis. We show that our method has good power for differentiating between differentially expressed and non-differentially expressed gene categories, and we utilize a resampling-based strategy for controlling the false discovery rate when testing multiple categories. AVAILABILITY R code (www.r-project.org) for implementing our approach is available from the first author by request.
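The general idea of asking whether a gene category's multivariate expression distribution differs across conditions can be illustrated with a label-permutation test. The sketch below is a minimal illustration under our own assumptions: it uses the between-condition sum of squares of the category's gene vectors as the test statistic, which is a generic stand-in and not the specific nonparametric statistic of Nettleton et al. The function names and the example data are hypothetical.

```python
import random


def between_group_ss(samples, labels):
    """Between-condition sum of squares for one gene category.

    samples: one expression vector per array (all of length g, the category size)
    labels:  the condition label of each array
    """
    n, g = len(samples), len(samples[0])
    grand = [sum(s[j] for s in samples) / n for j in range(g)]
    total = 0.0
    for lab in set(labels):
        group = [s for s, l in zip(samples, labels) if l == lab]
        mean = [sum(s[j] for s in group) / len(group) for j in range(g)]
        total += len(group) * sum((mean[j] - grand[j]) ** 2 for j in range(g))
    return total


def category_permutation_pvalue(samples, labels, n_perm=999, seed=0):
    """Shuffle condition labels across arrays and count how often the
    statistic is at least as large as the observed one."""
    rng = random.Random(seed)
    observed = between_group_ss(samples, labels)
    perm = list(labels)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(perm)
        if between_group_ss(samples, perm) >= observed:
            exceed += 1
    # Add-one correction so the p-value is never exactly zero.
    return (exceed + 1) / (n_perm + 1)


# Hypothetical category of 3 genes measured on 8 arrays (4 per condition).
samples = [[5.0, 1.0, 2.0], [5.2, 1.1, 2.1], [4.9, 0.9, 1.9], [5.1, 1.0, 2.0],
           [1.0, 3.0, 2.0], [1.2, 3.1, 2.1], [0.9, 2.9, 1.9], [1.1, 3.0, 2.0]]
labels = ["A"] * 4 + ["B"] * 4
p = category_permutation_pvalue(samples, labels)
print(f"permutation p-value: {p:.3f}")
```

Because the arrays are permuted as whole vectors, the correlation among genes in the category is preserved under the null, which is what lets such a test pick up joint shifts that gene-by-gene statistics can miss.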
Affiliation(s)
- Dan Nettleton
- Department of Statistics, Iowa State University, Ames, Iowa 50011-1210, USA.
252
Marco A, Marín I. A general strategy to determine the congruence between a hierarchical and a non-hierarchical classification. BMC Bioinformatics 2007; 8:442. [PMID: 18005402 PMCID: PMC2213689 DOI: 10.1186/1471-2105-8-442] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 04/27/2007] [Accepted: 11/15/2007] [Indexed: 11/18/2022]
Abstract
Background Classification procedures are widely used in phylogenetic inference, the analysis of expression profiles, the study of biological networks, etc. Many algorithms have been proposed to establish the similarity between two different classifications of the same elements. However, methods to determine significant coincidences between hierarchical and non-hierarchical partitions are still poorly developed, in spite of the fact that the search for such coincidences is implicit in many analyses of massive data. Results We describe a novel strategy to compare a hierarchical and a dichotomic non-hierarchical classification of elements, in order to find clusters in a hierarchical tree in which elements of a given "flat" partition are overrepresented. The key improvement of our strategy with respect to previous methods is the use of permutation analyses of ranked clusters to determine whether regions of the dendrograms present a significant enrichment. We show that this method is more sensitive than previously developed strategies and how it can be applied to several real cases, including microarray and interactome data. In particular, we use it to compare a hierarchical representation of the yeast mitochondrial interactome and a catalogue of known mitochondrial protein complexes, demonstrating a high level of congruence between those two classifications. We also discuss extensions of this method to other cases which are conceptually related. Conclusion Our method is highly sensitive and outperforms previously described strategies. A PERL script that implements it is available at .
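The idea of scoring dendrogram clusters for over-representation of a flat partition, and then judging the best-ranked cluster by permutation, can be sketched as follows. This is a generic illustration under our own assumptions (a hypergeometric p-value per cluster and a "best cluster over all clusters" permutation null); it is not the authors' PERL implementation, and the function names and toy tree are hypothetical.

```python
import math
import random


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K marked, n drawn)."""
    return sum(math.comb(K, i) * math.comb(N - K, n - i)
               for i in range(k, min(K, n) + 1)) / math.comb(N, n)


def cluster_pvalues(clusters, marked, N):
    """Over-representation p-value of the marked elements in every cluster."""
    return [hypergeom_sf(len(c & marked), N, len(marked), len(c)) for c in clusters]


def best_cluster_test(clusters, leaves, marked, n_perm=999, seed=1):
    """Rank clusters by p-value and assess the top one by permutation.

    The null keeps the tree fixed and assigns the 'marked' label to random
    leaves; comparing against the best cluster of each permutation accounts
    for having searched over all clusters of the tree.
    """
    N = len(leaves)
    pvals = cluster_pvalues(clusters, marked, N)
    best = min(range(len(clusters)), key=pvals.__getitem__)
    observed = pvals[best]
    rng = random.Random(seed)
    leaves = list(leaves)
    hits = 0
    for _ in range(n_perm):
        shuffled = set(rng.sample(leaves, len(marked)))
        if min(cluster_pvalues(clusters, shuffled, N)) <= observed:
            hits += 1
    return best, observed, (hits + 1) / (n_perm + 1)


# Hypothetical tree over 8 leaves, given as the leaf sets of its internal nodes.
leaves = "abcdefgh"
clusters = [set("ab"), set("cd"), set("ef"), set("gh"), set("abcd"), set("efgh")]
marked = set("abcd")  # elements of the "flat" partition
best, p_cluster, p_perm = best_cluster_test(clusters, leaves, marked)
print(sorted(clusters[best]), round(p_cluster, 4), p_perm)
```

The root cluster is left out on purpose: it contains every leaf, so its over-representation p-value is always 1.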
Affiliation(s)
- Antonio Marco
- Departamento de Genética, Universidad de Valencia, Burjassot, Spain.
253
Zhong S, Xie D. Gene Ontology analysis in multiple gene clusters under multiple hypothesis testing framework. Artif Intell Med 2007; 41:105-15. [PMID: 17913480 DOI: 10.1016/j.artmed.2007.08.002] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 12/02/2006] [Revised: 08/02/2007] [Accepted: 08/03/2007] [Indexed: 10/22/2022]
Abstract
OBJECTIVE Gene Ontology (GO) has become a routine resource for functional analysis of gene lists. Although a number of tools have been provided to identify enriched GO terms in one or two gene lists, two technical challenges remain. First, how to handle multiple hypothesis testing in the analysis given that the tests are heavily correlated; second, how to identify GO terms that are enriched in a gene cluster, as compared to multiple other gene clusters. We provide a statistical procedure to rigorously treat these problems and offer a software tool for applying GO to the analysis of gene clusters. METHODS We previously introduced a statistical procedure that handles hypothesis testing in a two-group comparison scenario. In this paper we extend the two-group comparison procedure into a general procedure that enables the analysis of any number of gene lists/clusters. This new procedure enables identification of GO terms enriched in any gene cluster, while it controls for multiple hypothesis testing. This procedure is implemented in a user-friendly analysis tool: GoSurfer. The current version of GoSurfer takes one or several gene lists as input, and it identifies the GO terms that are enriched in any of the input gene lists. GoSurfer estimates a conservative false discovery rate (FDR) for every GO term. The FDR estimation procedure in GoSurfer has two advantages: it does not rely on the independence assumption, and it does not assume that all hypotheses are null (the complete null). Thus GoSurfer's FDR estimates are mildly conservative rather than overly conservative. RESULTS We implemented the new procedure for GO analysis in multiple gene clusters in the GoSurfer software. We provide three examples of using GoSurfer to analyze time course gene expression data sets on the differentiation of embryonic stem cells.
In the example of analysis of multiple gene clusters, we first used a typical clustering algorithm and identified five gene clusters, representing up-regulation, down-regulation and other patterns in the differentiation time course. Taking all five gene clusters as input data, GoSurfer reports "cell adhesion" and "muscle contraction" as significant GO terms for the up-regulated cluster, "amino acids metabolism" as a significant GO term for the down-regulated gene cluster, and a number of GO terms related to RNA processing and RNA transport as significant terms for a cluster that is up-regulated at both early and late time points. This may suggest that genes for RNA processing and genes for RNA transport are coregulated in the differentiation process of embryonic stem cells. CONCLUSION The GoSurfer software is provided to analyze multiple gene clusters and identify GO terms that are enriched in any gene cluster. GoSurfer is available at: www.gosurfer.org.
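To make "controlling the FDR across many GO terms" concrete, the sketch below computes Benjamini-Hochberg adjusted p-values for a set of term-level p-values. This is a deliberately swapped-in, simpler baseline: it is not GoSurfer's estimator, and unlike GoSurfer's procedure it assumes independent or positively dependent tests. The term identifiers and p-values are made up for illustration.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that q is monotone in p.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


# Made-up p-values for four hypothetical GO terms.
terms = ["GO:term1", "GO:term2", "GO:term3", "GO:term4"]
pvals = [0.001, 0.004, 0.03, 0.5]
for term, qv in zip(terms, bh_qvalues(pvals)):
    print(term, round(qv, 4))
```

A term is then reported at FDR level alpha when its q-value is at most alpha.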
Affiliation(s)
- Sheng Zhong
- Department of Bioengineering, University of Illinois at Urbana Champaign, IL 61801, United States.
254
Gupta M, Ibrahim JG. Variable Selection in Regression Mixture Modeling for the Discovery of Gene Regulatory Networks. J Am Stat Assoc 2007. [DOI: 10.1198/016214507000000068] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.7] [Indexed: 11/21/2022]
255
Witten JT, Chen CTL, Cohen BA. Complex genetic changes in strains of Saccharomyces cerevisiae derived by selection in the laboratory. Genetics 2007; 177:449-56. [PMID: 17660538 PMCID: PMC2013722 DOI: 10.1534/genetics.107.077859] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 12/22/2022]
Abstract
Selection of model organisms in the laboratory has the potential to generate useful substrates for testing evolutionary theories. These studies generally employ relatively long-term selections with weak selective pressures to allow the accumulation of multiple adaptations. In contrast to this approach, we analyzed two strains of Saccharomyces cerevisiae that were selected for resistance to multiple stress challenges by a rapid selection scheme to test whether the variation between rapidly selected strains might also be useful in evolutionary studies. We found that resistance to oxidative stress is a multigene trait in these strains. Both derived strains possess the same major-effect adaptations to oxidative stress, but have distinct modifiers of the phenotype. Similarly, both derived strains have altered their global transcriptional responses to oxidative stress in similar ways, but do have at least some distinct differences in transcriptional regulation. We conclude that short-term laboratory selections can generate complex genetic variation that may be a useful substrate for testing evolutionary theories.
Affiliation(s)
- Joshua T Witten
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63108, USA
256
Zhou X, Su Z. EasyGO: Gene Ontology-based annotation and functional enrichment analysis tool for agronomical species. BMC Genomics 2007; 8:246. [PMID: 17645808 PMCID: PMC1940007 DOI: 10.1186/1471-2164-8-246] [Citation(s) in RCA: 119] [Impact Index Per Article: 7.0] [Received: 04/04/2007] [Accepted: 07/24/2007] [Indexed: 11/10/2022]
Abstract
BACKGROUND It is always difficult to interpret microarray results. Recently, a handful of tools have been developed to meet this need, but almost none of them were designed to support agronomical species. DESCRIPTION This paper presents EasyGO, a web server to perform Gene Ontology based functional interpretation on groups of genes or GeneChip probe sets. EasyGO makes a special contribution to the agronomical research community by supporting Affymetrix GeneChips of both crops and farm animals and by providing stronger capabilities for results visualization and user interaction. Currently it supports 11 agronomical plants, 3 farm animals, and the model plant Arabidopsis. The authors demonstrated EasyGO's ability to uncover hidden knowledge by analyzing a group of probe sets with similar expression profiles. CONCLUSION EasyGO is a good tool for helping biologists and agricultural scientists to discover enriched biological knowledge that can provide solutions or suggestions for original problems. It is freely available to all users at http://bioinformatics.cau.edu.cn/easygo/.
Affiliation(s)
- Xin Zhou
- Division of Bioinformatics, State Key Laboratory of Plant Physiology and Biochemistry, College of Biological Sciences, China Agricultural University, Beijing 100094, China
- Zhen Su
- Division of Bioinformatics, State Key Laboratory of Plant Physiology and Biochemistry, College of Biological Sciences, China Agricultural University, Beijing 100094, China
257
Reimand J, Kull M, Peterson H, Hansen J, Vilo J. g:Profiler--a web-based toolset for functional profiling of gene lists from large-scale experiments. Nucleic Acids Res 2007; 35:W193-200. [PMID: 17478515 PMCID: PMC1933153 DOI: 10.1093/nar/gkm226] [Citation(s) in RCA: 859] [Impact Index Per Article: 50.5] [Received: 01/31/2007] [Revised: 03/22/2007] [Accepted: 03/28/2007] [Indexed: 02/02/2023]
Abstract
g:Profiler (http://biit.cs.ut.ee/gprofiler/) is a public web server for characterising and manipulating gene lists resulting from mining high-throughput genomic data. g:Profiler has a simple, user-friendly web interface with powerful visualisation for capturing Gene Ontology (GO), pathway, or transcription factor binding site enrichments down to individual gene levels. Besides standard multiple testing corrections, a new improved method for estimating the true effect of multiple testing over complex structures like GO has been introduced. Interpreting ranked gene lists is supported from the same interface with very efficient algorithms. Such ordered lists may arise when studying the most significantly affected genes from high-throughput data or genes co-expressed with the query gene. Other important aspects of practical data analysis are supported by modules tightly integrated with g:Profiler. These are: g:Convert for converting between different database identifiers; g:Orth for finding orthologous genes from other species; and g:Sorter for searching a large body of public gene expression data for co-expression. g:Profiler supports 31 different species, and underlying data is updated regularly from sources like the Ensembl database. Bioinformatics communities wishing to integrate with g:Profiler can use alternative simple textual outputs.
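For ranked gene lists, one simple way to realize the idea described above is to test every top-n prefix of the list against a term with the hypergeometric upper tail and keep the most significant prefix. The sketch below is an illustration under our own assumptions, not g:Profiler's code: the function names and the toy data are hypothetical, and it omits the multiple-testing correction a real tool would apply across prefixes and terms.

```python
import math


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(background N, K term genes, n genes drawn)."""
    return sum(math.comb(K, i) * math.comb(N - K, n - i)
               for i in range(k, min(K, n) + 1)) / math.comb(N, n)


def best_prefix_enrichment(ranked_genes, term_genes, background_size):
    """Test each top-n prefix of a ranked list; return (best p-value, best n)."""
    K = len(term_genes)
    best_p, best_n = 1.0, 0
    hits = 0
    for n, gene in enumerate(ranked_genes, start=1):
        if gene in term_genes:
            hits += 1
        p = hypergeom_sf(hits, background_size, K, n)
        if p < best_p:
            best_p, best_n = p, n
    return best_p, best_n


# Hypothetical example: a 20-gene background and a term annotated to g1, g2 and g3.
ranked = ["g1", "g2", "g3", "g4", "g5"]
p, n = best_prefix_enrichment(ranked, {"g1", "g2", "g3"}, background_size=20)
print(f"most significant prefix: top {n} genes, p = {p:.2e}")
```

Scanning prefixes lets the cut-off adapt to where the term's genes actually sit in the ranking, instead of requiring the user to fix a list length in advance.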
Affiliation(s)
- Jüri Reimand
- Institute of Computer Science, University of Tartu, Liivi 2, 50409 Tartu, Estonia, Estonian Biocentre, Riia 23b, 51010 Tartu, Estonia and EGeen, Ülikooli 6a, 51003 Tartu, Estonia
- Meelis Kull
- Institute of Computer Science, University of Tartu, Liivi 2, 50409 Tartu, Estonia, Estonian Biocentre, Riia 23b, 51010 Tartu, Estonia and EGeen, Ülikooli 6a, 51003 Tartu, Estonia
- Hedi Peterson
- Institute of Computer Science, University of Tartu, Liivi 2, 50409 Tartu, Estonia, Estonian Biocentre, Riia 23b, 51010 Tartu, Estonia and EGeen, Ülikooli 6a, 51003 Tartu, Estonia
- Jaanus Hansen
- Institute of Computer Science, University of Tartu, Liivi 2, 50409 Tartu, Estonia, Estonian Biocentre, Riia 23b, 51010 Tartu, Estonia and EGeen, Ülikooli 6a, 51003 Tartu, Estonia
- Jaak Vilo
- Institute of Computer Science, University of Tartu, Liivi 2, 50409 Tartu, Estonia, Estonian Biocentre, Riia 23b, 51010 Tartu, Estonia and EGeen, Ülikooli 6a, 51003 Tartu, Estonia
258
Lerman G, Shakhnovich BE. Defining functional distance using manifold embeddings of gene ontology annotations. Proc Natl Acad Sci U S A 2007; 104:11334-9. [PMID: 17595300 PMCID: PMC2040899 DOI: 10.1073/pnas.0702965104] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.4] [Indexed: 11/18/2022]
Abstract
Although rigorous measures of similarity for sequence and structure are now well established, the problem of defining functional relationships has been particularly daunting. Here, we present several manifold embedding techniques to compute distances between Gene Ontology (GO) functional annotations and consequently estimate functional distances between protein domains. To evaluate accuracy, we correlate the functional distance to the well established measures of sequence, structural, and phylogenetic similarities. Finally, we show that manual classification of structures into folds and superfamilies is mirrored by proximity in the newly defined function space. We show how functional distances place structure-function relationships in biological context resulting in insight into divergent and convergent evolution. The methods and results in this paper can be readily generalized and applied to a wide array of biologically relevant investigations, such as accuracy of annotation transference, the relationship between sequence, structure, and function, or coherence of expression modules.
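The abstract does not spell out which embeddings were used, so the following only illustrates one generic manifold-embedding technique, a diffusion map, applied to GO annotation sets. Our assumptions, not the paper's: proteins are compared by the Jaccard similarity of their GO-term sets, the resulting affinity graph is connected, and functional distance is the Euclidean distance between the first few nontrivial diffusion coordinates. Function names and annotations are hypothetical; the sketch requires NumPy.

```python
import numpy as np


def jaccard_matrix(annotations):
    """Pairwise Jaccard similarity between GO-term sets (one set per protein)."""
    n = len(annotations)
    W = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            union = annotations[i] | annotations[j]
            W[i, j] = len(annotations[i] & annotations[j]) / len(union) if union else 1.0
    return W


def diffusion_map(W, n_components=2, t=1):
    """Diffusion-map coordinates from a symmetric, nonnegative affinity matrix W.

    Assumes the affinity graph is connected, so the Markov matrix D^-1 W has a
    single trivial eigenvalue 1 (constant eigenvector), which is dropped.
    """
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # Symmetric matrix with the same spectrum as D^-1 W, so eigh can be used.
    S = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(S)
    order = np.argsort(eigvals)[::-1]  # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    psi = d_inv_sqrt[:, None] * eigvecs  # right eigenvectors of D^-1 W
    k = slice(1, n_components + 1)  # skip the trivial eigenvector
    return psi[:, k] * eigvals[k] ** t


def functional_distances(annotations, n_components=2, t=1):
    """Euclidean distances between proteins in the diffusion-map embedding."""
    Y = diffusion_map(jaccard_matrix(annotations), n_components, t)
    diff = Y[:, None, :] - Y[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


# Hypothetical GO annotations for four proteins (all share GO:0, so the graph is connected).
annotations = [{"GO:0", "GO:1", "GO:2"}, {"GO:0", "GO:1", "GO:2"},
               {"GO:0", "GO:1", "GO:3"}, {"GO:0", "GO:4", "GO:5"}]
print(np.round(functional_distances(annotations), 3))
```

Proteins with identical annotation sets land on the same point, and a protein sharing more terms with a reference ends up closer to it than one sharing fewer, which is the qualitative behaviour a functional distance should have.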
Affiliation(s)
- Gilad Lerman
- Department of Mathematics, University of Minnesota, Minneapolis, MN 55455
- To whom correspondence may be addressed.
- Boris E. Shakhnovich
- Program in Bioinformatics, Boston University, Boston, MA 02215
- To whom correspondence may be addressed.
259
Huang DW, Sherman BT, Tan Q, Kir J, Liu D, Bryant D, Guo Y, Stephens R, Baseler MW, Lane HC, Lempicki RA. DAVID Bioinformatics Resources: expanded annotation database and novel algorithms to better extract biology from large gene lists. Nucleic Acids Res 2007; 35:W169-75. [PMID: 17576678 PMCID: PMC1933169 DOI: 10.1093/nar/gkm415] [Citation(s) in RCA: 1567] [Impact Index Per Article: 92.2] [Indexed: 12/20/2022]
Abstract
All tools in the DAVID Bioinformatics Resources aim to provide functional interpretation of large lists of genes derived from genomic studies. The newly updated DAVID Bioinformatics Resources consists of the DAVID Knowledgebase and five integrated, web-based functional annotation tool suites: the DAVID Gene Functional Classification Tool, the DAVID Functional Annotation Tool, the DAVID Gene ID Conversion Tool, the DAVID Gene Name Viewer and the DAVID NIAID Pathogen Genome Browser. The expanded DAVID Knowledgebase now integrates almost all major and well-known public bioinformatics resources centralized by the DAVID Gene Concept, a single-linkage method to agglomerate tens of millions of diverse gene/protein identifiers and annotation terms from a variety of public bioinformatics databases. For any uploaded gene list, the DAVID Resources now provides not only the typical gene-term enrichment analysis, but also new tools and functions that allow users to condense large gene lists into gene functional groups, convert between gene/protein identifiers, visualize many-genes-to-many-terms relationships, cluster redundant and heterogeneous terms into groups, search for interesting and related genes or terms, dynamically view genes from their lists on bio-pathways and more. With DAVID (http://david.niaid.nih.gov), investigators gain more power to interpret the biological mechanisms associated with large gene lists.
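The "typical gene-term enrichment analysis" mentioned above can be sketched as a one-sided Fisher exact test, which equals the upper tail of the hypergeometric distribution. DAVID also documents a more conservative variant, the EASE score; to our understanding it subtracts one gene from the list hits before testing, but treat that detail as an assumption to verify against the DAVID documentation. The function names and example counts below are hypothetical.

```python
import math


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K annotated, n in list)."""
    k = max(k, 0)
    return sum(math.comb(K, i) * math.comb(N - K, n - i)
               for i in range(k, min(K, n) + 1)) / math.comb(N, n)


def fisher_one_sided(list_hits, list_total, pop_hits, pop_total):
    """One-sided Fisher exact p-value for over-representation of a term in a gene list."""
    return hypergeom_sf(list_hits, pop_total, pop_hits, list_total)


def ease_score(list_hits, list_total, pop_hits, pop_total):
    """Assumed EASE-style penalty: drop one gene from the list hits before testing."""
    return hypergeom_sf(list_hits - 1, pop_total, pop_hits, list_total)


# Hypothetical counts: 8 of 50 list genes carry the term, vs 40 of 1000 in the background.
print(fisher_one_sided(8, 50, 40, 1000), ease_score(8, 50, 40, 1000))
```

The penalty matters most for terms supported by only one or two list genes: with a single hit, the EASE-style score is exactly 1, so such terms can never look significant.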
Affiliation(s)
- Da Wei Huang
- Laboratory of Immunopathogenesis and Bioinformatics, Advanced Biomedical Computing Center, Clinical Services Program, SAIC-Frederick, Inc., National Cancer Institute at Frederick, MD 21702, USA, Laboratory of Immunoregulation, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, 20892, USA, Bioinformatics and Scientific IT Program, NIAID Office of Technology Information Systems, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, 20892, USA
- Brad T. Sherman
- Laboratory of Immunopathogenesis and Bioinformatics, Advanced Biomedical Computing Center, Clinical Services Program, SAIC-Frederick, Inc., National Cancer Institute at Frederick, MD 21702, USA, Laboratory of Immunoregulation, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, 20892, USA, Bioinformatics and Scientific IT Program, NIAID Office of Technology Information Systems, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, 20892, USA
- Qina Tan
- Laboratory of Immunopathogenesis and Bioinformatics, Advanced Biomedical Computing Center, Clinical Services Program, SAIC-Frederick, Inc., National Cancer Institute at Frederick, MD 21702, USA, Laboratory of Immunoregulation, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, 20892, USA, Bioinformatics and Scientific IT Program, NIAID Office of Technology Information Systems, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, 20892, USA
- Joseph Kir
- Laboratory of Immunopathogenesis and Bioinformatics, Advanced Biomedical Computing Center, Clinical Services Program, SAIC-Frederick, Inc., National Cancer Institute at Frederick, MD 21702, USA, Laboratory of Immunoregulation, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, 20892, USA, Bioinformatics and Scientific IT Program, NIAID Office of Technology Information Systems, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, 20892, USA
- David Liu
- Laboratory of Immunopathogenesis and Bioinformatics, Advanced Biomedical Computing Center, Clinical Services Program, SAIC-Frederick, Inc., National Cancer Institute at Frederick, MD 21702, USA, Laboratory of Immunoregulation, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, 20892, USA, Bioinformatics and Scientific IT Program, NIAID Office of Technology Information Systems, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, 20892, USA
- David Bryant
- Laboratory of Immunopathogenesis and Bioinformatics, Advanced Biomedical Computing Center, Clinical Services Program, SAIC-Frederick, Inc., National Cancer Institute at Frederick, MD 21702, USA, Laboratory of Immunoregulation, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, 20892, USA, Bioinformatics and Scientific IT Program, NIAID Office of Technology Information Systems, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, 20892, USA
- Yongjian Guo
- Laboratory of Immunopathogenesis and Bioinformatics, Advanced Biomedical Computing Center, Clinical Services Program, SAIC-Frederick, Inc., National Cancer Institute at Frederick, MD 21702, USA, Laboratory of Immunoregulation, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, 20892, USA, Bioinformatics and Scientific IT Program, NIAID Office of Technology Information Systems, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, 20892, USA
- Robert Stephens
- Laboratory of Immunopathogenesis and Bioinformatics, Advanced Biomedical Computing Center, Clinical Services Program, SAIC-Frederick, Inc., National Cancer Institute at Frederick, MD 21702, USA, Laboratory of Immunoregulation, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, 20892, USA, Bioinformatics and Scientific IT Program, NIAID Office of Technology Information Systems, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, 20892, USA
- Michael W. Baseler
- Laboratory of Immunopathogenesis and Bioinformatics, Advanced Biomedical Computing Center, Clinical Services Program, SAIC-Frederick, Inc., National Cancer Institute at Frederick, MD 21702, USA, Laboratory of Immunoregulation, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, 20892, USA, Bioinformatics and Scientific IT Program, NIAID Office of Technology Information Systems, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, 20892, USA
- H. Clifford Lane
- Laboratory of Immunopathogenesis and Bioinformatics, Advanced Biomedical Computing Center, Clinical Services Program, SAIC-Frederick, Inc., National Cancer Institute at Frederick, MD 21702, USA, Laboratory of Immunoregulation, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, 20892, USA, Bioinformatics and Scientific IT Program, NIAID Office of Technology Information Systems, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, 20892, USA
- Richard A. Lempicki
- Laboratory of Immunopathogenesis and Bioinformatics, Advanced Biomedical Computing Center, Clinical Services Program, SAIC-Frederick, Inc., National Cancer Institute at Frederick, MD 21702, USA, Laboratory of Immunoregulation, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, 20892, USA, Bioinformatics and Scientific IT Program, NIAID Office of Technology Information Systems, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, 20892, USA
- To whom correspondence should be addressed. +1-301-846-7114; 301-846-7672
260
Kramer RW, Slagowski NL, Eze NA, Giddings KS, Morrison MF, Siggers KA, Starnbach MN, Lesser CF. Yeast functional genomic screens lead to identification of a role for a bacterial effector in innate immunity regulation. PLoS Pathog 2007; 3:e21. [PMID: 17305427 PMCID: PMC1797620 DOI: 10.1371/journal.ppat.0030021] [Citation(s) in RCA: 73] [Impact Index Per Article: 4.3] [Received: 10/16/2006] [Accepted: 01/02/2007] [Indexed: 11/18/2022]
Abstract
Numerous bacterial pathogens manipulate host cell processes to promote infection and ultimately cause disease through the action of proteins that they directly inject into host cells. Identification of the targets and molecular mechanisms of action used by these bacterial effector proteins is critical to understanding pathogenesis. We have developed a systems biological approach using the yeast Saccharomyces cerevisiae that can expedite the identification of cellular processes targeted by bacterial effector proteins. We systematically screened the viable yeast haploid deletion strain collection for mutants hypersensitive to expression of the Shigella type III effector OspF. Statistical data mining of the results identified several cellular processes, including cell wall biogenesis, which when impaired by a deletion caused yeast to be hypersensitive to OspF expression. Microarray experiments revealed that OspF expression resulted in reversed regulation of genes regulated by the yeast cell wall integrity pathway. The yeast cell wall integrity pathway is a highly conserved mitogen-activated protein kinase (MAPK) signaling pathway, normally activated in response to cell wall perturbations. Together these results led us to hypothesize and subsequently demonstrate that OspF inhibited both yeast and mammalian MAPK signaling cascades. Furthermore, inhibition of MAPK signaling by OspF is associated with attenuation of the host innate immune response to Shigella infection in a mouse model. These studies demonstrate how yeast systems biology can facilitate functional characterization of pathogenic bacterial effector proteins.
Many bacterial pathogens use specialized secretion systems to deliver effector proteins directly into host cells. The effector proteins mediate the subversion or inhibition of host cell processes to promote survival of the pathogens. Although these proteins are critical elements of pathogenesis, relatively few are well characterized. They often lack significant homology to proteins of known function, and they present special challenges, biological and practical, to study in vivo. For example, their functions often appear to be redundant or synergistic, and the organisms that produce them can be dangerous or difficult to culture, requiring special facilities. The yeast Saccharomyces cerevisiae has recently emerged as a model system to both identify and functionally characterize effector proteins. This work describes how genome-wide phenotypic screens and mRNA profiling of yeast expressing the Shigella effector OspF led to the discovery that OspF inhibits mitogen-activated protein kinase signaling in both yeast and mammalian cells. This inhibition of mitogen-activated protein kinase signaling is associated with attenuation of the host innate immune response. This study demonstrates how yeast functional genomic studies can contribute to the understanding of pathogenic effector proteins.
Affiliation(s)
- Roger W Kramer
- Department of Medicine, Division of Infectious Diseases, Massachusetts General Hospital, Harvard Medical School, Cambridge, Massachusetts, United States of America
- Naomi L Slagowski
- Department of Medicine, Division of Infectious Diseases, Massachusetts General Hospital, Harvard Medical School, Cambridge, Massachusetts, United States of America
- Ngozi A Eze
- Department of Medicine, Division of Infectious Diseases, Massachusetts General Hospital, Harvard Medical School, Cambridge, Massachusetts, United States of America
- Kara S Giddings
- Department of Microbiology and Molecular Genetics, Harvard Medical School, Boston, Massachusetts, United States of America
- Monica F Morrison
- Department of Medicine, Division of Infectious Diseases, Massachusetts General Hospital, Harvard Medical School, Cambridge, Massachusetts, United States of America
- Keri A Siggers
- Department of Medicine, Division of Infectious Diseases, Massachusetts General Hospital, Harvard Medical School, Cambridge, Massachusetts, United States of America
- Michael N Starnbach
- Department of Microbiology and Molecular Genetics, Harvard Medical School, Boston, Massachusetts, United States of America
- Cammie F Lesser
- Department of Medicine, Division of Infectious Diseases, Massachusetts General Hospital, Harvard Medical School, Cambridge, Massachusetts, United States of America
- To whom correspondence should be addressed.
261
Newton MA, Quintana FA, den Boon JA, Sengupta S, Ahlquist P. Random-set methods identify distinct aspects of the enrichment signal in gene-set analysis. Ann Appl Stat 2007. [DOI: 10.1214/07-aoas104] [Citation(s) in RCA: 175] [Impact Index Per Article: 10.3] [Indexed: 11/19/2022]
262
Vermeirssen V, Barrasa MI, Hidalgo CA, Babon JAB, Sequerra R, Doucette-Stamm L, Barabási AL, Walhout AJ. Transcription factor modularity in a gene-centered C. elegans core neuronal protein-DNA interaction network. Genome Res 2007; 17:1061-71. [PMID: 17513831 PMCID: PMC1899117 DOI: 10.1101/gr.6148107] [Citation(s) in RCA: 81] [Impact Index Per Article: 4.8] [Indexed: 12/12/2022]
Abstract
Transcription regulatory networks play a pivotal role in the development, function, and pathology of metazoan organisms. Such networks are composed of protein-DNA interactions between transcription factors (TFs) and their target genes. An important question pertains to how the architecture of such networks relates to network functionality. Here, we show that a Caenorhabditis elegans core neuronal protein-DNA interaction network is organized into two TF modules. These modules contain TFs that bind to a relatively small number of target genes and are more system-specific than the TF hubs that connect the modules. Each module relates to different functional aspects of the network. One module contains TFs involved in reproduction and target genes that are expressed in neurons as well as in other tissues. The second module is enriched for paired homeodomain TFs and connects to target genes that are often exclusively neuronal. We find that paired homeodomain TFs are specifically expressed in C. elegans and mouse neurons, indicating that the neuronal function of paired homeodomains is evolutionarily conserved. Taken together, we show that a core neuronal C. elegans protein-DNA interaction network possesses TF modules that relate to different functional aspects of the complete network.
Affiliation(s)
- Vanessa Vermeirssen
- Program in Gene Function and Expression and Program in Molecular Medicine, University of Massachusetts Medical School, Worcester, Massachusetts 01605, USA
- M. Inmaculada Barrasa
- Program in Gene Function and Expression and Program in Molecular Medicine, University of Massachusetts Medical School, Worcester, Massachusetts 01605, USA
- César A. Hidalgo
- Center for Complex Network Research, Department of Physics, University of Notre Dame, Notre Dame, Indiana 46556, USA
- Jenny Aurielle B. Babon
- Program in Gene Function and Expression and Program in Molecular Medicine, University of Massachusetts Medical School, Worcester, Massachusetts 01605, USA
- Albert-László Barabási
- Center for Complex Network Research, Department of Physics, University of Notre Dame, Notre Dame, Indiana 46556, USA
- Albertha J.M. Walhout
- Program in Gene Function and Expression and Program in Molecular Medicine, University of Massachusetts Medical School, Worcester, Massachusetts 01605, USA
- Corresponding author. Fax: (508) 856-5460
263
Ni JZ, Grate L, Donohue JP, Preston C, Nobida N, O’Brien G, Shiue L, Clark TA, Blume JE, Ares M. Ultraconserved elements are associated with homeostatic control of splicing regulators by alternative splicing and nonsense-mediated decay. Genes Dev 2007; 21:708-18. [PMID: 17369403 PMCID: PMC1820944 DOI: 10.1101/gad.1525507] [Citation(s) in RCA: 381] [Impact Index Per Article: 22.4] [Indexed: 02/03/2023]
Abstract
Many alternative splicing events create RNAs with premature stop codons, suggesting that alternative splicing coupled with nonsense-mediated decay (AS-NMD) may regulate gene expression post-transcriptionally. We tested this idea in mice by blocking NMD and measuring changes in isoform representation using splicing-sensitive microarrays. We found a striking class of highly conserved stop codon-containing exons whose inclusion renders the transcript sensitive to NMD. A genomic search for additional examples identified >50 such exons in genes with a variety of functions. These exons are unusually frequent in genes that encode splicing activators and are unexpectedly enriched in the so-called "ultraconserved" elements in the mammalian lineage. Further analysis shows that NMD of mRNAs for splicing activators such as SR proteins is triggered by splicing activation events, whereas NMD of the mRNAs for negatively acting hnRNP proteins is triggered by splicing repression, a polarity consistent with widespread homeostatic control of splicing regulator gene expression. We suggest that the extreme genomic conservation surrounding these regulatory splicing events within splicing factor genes demonstrates the evolutionary importance of maintaining tightly tuned homeostasis of RNA-binding protein levels in the vertebrate cell.
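Whether a premature stop codon makes a mammalian transcript an NMD substrate is commonly predicted with the "50-nucleotide rule": a stop codon lying more than roughly 50 to 55 nt upstream of the last exon-exon junction is expected to trigger decay. The minimal sketch below applies that general heuristic to an isoform described by its exon lengths. It is a rule of thumb, not the procedure used by Ni et al., and the function names and the 50-nt default are our assumptions.

```python
from itertools import accumulate


def exon_junctions(exon_lengths: list[int]) -> list[int]:
    """0-based mRNA coordinates of every exon-exon junction (end of each exon except the last)."""
    return list(accumulate(exon_lengths))[:-1]


def predicted_nmd_target(exon_lengths: list[int], stop_start: int, threshold: int = 50) -> bool:
    """Heuristic 50-nt rule (threshold is an assumption).

    stop_start is the 0-based mRNA position of the stop codon's first nucleotide.
    Returns True when it lies more than `threshold` nt upstream of the last junction.
    """
    junctions = exon_junctions(exon_lengths)
    if not junctions:  # single-exon transcript: no downstream junction, rule does not apply
        return False
    return junctions[-1] - stop_start > threshold


# Stop codon 250 nt upstream of the last junction (at 350): predicted NMD target.
print(predicted_nmd_target([200, 150, 300], stop_start=100))  # True
# Stop codon inside the last exon: not predicted to trigger NMD.
print(predicted_nmd_target([200, 150, 300], stop_start=400))  # False
```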
Affiliation(s)
- Julie Z. Ni
- Center for Molecular Biology of RNA and Department of Molecular, Cell, and Developmental Biology, University of California at Santa Cruz, Santa Cruz, California 95064, USA
- Leslie Grate
- Center for Molecular Biology of RNA and Department of Molecular, Cell, and Developmental Biology, University of California at Santa Cruz, Santa Cruz, California 95064, USA
- John Paul Donohue
- Center for Molecular Biology of RNA and Department of Molecular, Cell, and Developmental Biology, University of California at Santa Cruz, Santa Cruz, California 95064, USA
- Christine Preston
- Hughes Undergraduate Research Laboratory, University of California at Santa Cruz, Santa Cruz, California 95064, USA
- Naomi Nobida
- Hughes Undergraduate Research Laboratory, University of California at Santa Cruz, Santa Cruz, California 95064, USA
- Georgeann O’Brien
- Hughes Undergraduate Research Laboratory, University of California at Santa Cruz, Santa Cruz, California 95064, USA
- Lily Shiue
- Center for Molecular Biology of RNA and Department of Molecular, Cell, and Developmental Biology, University of California at Santa Cruz, Santa Cruz, California 95064, USA
- John E. Blume
- Affymetrix, Inc., Santa Clara, California 95051, USA
- Manuel Ares
- Center for Molecular Biology of RNA and Department of Molecular, Cell, and Developmental Biology, University of California at Santa Cruz, Santa Cruz, California 95064, USA
- Hughes Undergraduate Research Laboratory, University of California at Santa Cruz, Santa Cruz, California 95064, USA
- Corresponding author. Fax: (831) 459-3737
264
Carmona-Saez P, Chagoyen M, Tirado F, Carazo JM, Pascual-Montano A. GENECODIS: a web-based tool for finding significant concurrent annotations in gene lists. Genome Biol 2007; 8:R3. [PMID: 17204154 PMCID: PMC1839127 DOI: 10.1186/gb-2007-8-1-r3] [Citation(s) in RCA: 493] [Impact Index Per Article: 29.0] [Received: 07/03/2006] [Revised: 09/29/2006] [Accepted: 01/04/2007] [Indexed: 12/01/2022]
Abstract
We present GENECODIS, a web-based tool that integrates different sources of information to search for annotations that frequently co-occur in a set of genes and rank them by statistical significance. The analysis of concurrent annotations provides significant information for the biological interpretation of high-throughput experiments and may outperform standard methods for the functional analysis of gene lists. GENECODIS is publicly available at .
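The core idea, ranking combinations of annotations that co-occur in the same genes, can be illustrated with a simplified sketch that is not GENECODIS's own implementation: it enumerates pairs of terms carried together by individual genes, then scores each pair with a one-sided hypergeometric test against the annotated reference. The input dictionary and function names are hypothetical, and the gene list is assumed to be a subset of the annotated reference genes.

```python
from collections import Counter
from itertools import combinations
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N genes in total, K carrying the pair, n drawn)."""
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1)) / comb(N, n)


def pair_counts(annotations: dict[str, set[str]], genes) -> Counter:
    """For every pair of terms, count how many of `genes` carry both terms."""
    counts = Counter()
    for g in genes:
        for pair in combinations(sorted(annotations.get(g, ())), 2):
            counts[pair] += 1
    return counts


def concurrent_annotations(annotations: dict[str, set[str]], gene_list: set[str]):
    """Rank term pairs co-occurring in gene_list by hypergeometric p-value (smallest first)."""
    ref = pair_counts(annotations, annotations)  # reference: every annotated gene
    hits = pair_counts(annotations, gene_list)
    N, n = len(annotations), len(gene_list)
    scored = [(pair, k, hypergeom_sf(k, N, ref[pair], n)) for pair, k in hits.items()]
    return sorted(scored, key=lambda row: row[2])


annotations = {  # toy reference annotations (invented)
    "g1": {"kinase", "nucleus"}, "g2": {"kinase", "nucleus"},
    "g3": {"kinase"}, "g4": {"nucleus"}, "g5": {"membrane"},
}
print(concurrent_annotations(annotations, {"g1", "g2"}))  # [(('kinase', 'nucleus'), 2, 0.1)]
```

Real tools test larger combinations and correct for the many tests performed; both are omitted here for brevity.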
Affiliation(s)
- Pedro Carmona-Saez
- BioComputing Unit, National Center of Biotechnology (CNB-CSIC), C/Darwin 3, Campus Universidad Autónoma de Madrid, 28049 Madrid, Spain
- Monica Chagoyen
- BioComputing Unit, National Center of Biotechnology (CNB-CSIC), C/Darwin 3, Campus Universidad Autónoma de Madrid, 28049 Madrid, Spain
- Computer Architecture Department, Facultad de Ciencias Físicas, Universidad Complutense de Madrid, C/Avenida Complutense S/N, 28040 Madrid, Spain
- Francisco Tirado
- Computer Architecture Department, Facultad de Ciencias Físicas, Universidad Complutense de Madrid, C/Avenida Complutense S/N, 28040 Madrid, Spain
- Jose M Carazo
- BioComputing Unit, National Center of Biotechnology (CNB-CSIC), C/Darwin 3, Campus Universidad Autónoma de Madrid, 28049 Madrid, Spain
- Alberto Pascual-Montano
- Computer Architecture Department, Facultad de Ciencias Físicas, Universidad Complutense de Madrid, C/Avenida Complutense S/N, 28040 Madrid, Spain
265
Abstract
PURPOSE OF REVIEW High-dimensional lipid analysis technologies (lipidomics) provide researchers with an opportunity to measure lipids on an unprecedented scale. They do not, however, guarantee a fast track to new knowledge. The vast amount of data produced by these platforms presents a major hurdle to assembling valid knowledge and to the discovery of mechanistic biomarkers. This review examines strategies for improving the quality of high-dimensional lipid data and streamlining data analysis to increase the value of lipidomics platforms to research and commercial applications. RECENT FINDINGS Recent articles focus on careful study design and data analysis protocols. Authors offer detailed descriptions of study populations, analytical methods and data analysis, and highlight the use of practical data preprocessing and the incorporation of biological knowledge into data analysis. SUMMARY The field is moving towards more methodical and structured approaches to biomarker identification. Experimental designs focusing on well-defined outcomes have a better chance of producing biologically relevant results. The high-dimensional lipid analysis techniques available are varied, have different strengths and weaknesses, and must be chosen carefully depending on the experimental design and application. Many techniques for data analysis are available, but the most successful are those incorporating existing biological knowledge into the statistical analysis.
Affiliation(s)
- Michelle M Wiest
- Lipomics Technologies, 3410 Industrial Boulevard, Suite 103, West Sacramento, California 95691, USA.
266
Liu J, Hughes-Oliver JM, Menius JA. Domain-enhanced analysis of microarray data using GO annotations. Bioinformatics 2007; 23:1225-34. [PMID: 17379692 DOI: 10.1093/bioinformatics/btm092] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.2] [Indexed: 11/14/2022]
Abstract
MOTIVATION New biological systems technologies give scientists the ability to measure thousands of biomolecules, including genes, proteins, lipids and metabolites. We use domain knowledge, e.g. the Gene Ontology, to guide analysis of such data. By focusing on domain-aggregated results at, say, the molecular function level, results become more interpretable to biological scientists than when they are presented at the gene level. RESULTS We use a 'top-down' approach to perform domain aggregation by first combining gene expressions before testing for differentially expressed patterns. This is in contrast to the more standard 'bottom-up' approach, where genes are first tested individually and then aggregated by domain knowledge. The benefit is greater sensitivity for detecting signals. Our method, domain-enhanced analysis (DEA), is assessed and compared with other methods using simulation studies and analysis of two publicly available leukemia data sets. AVAILABILITY Our DEA method uses functions available in R (http://www.r-project.org/) and SAS (http://www.sas.com/). The two experimental data sets used in our analysis are available in R as Bioconductor packages, 'ALL' and 'golubEsets' (http://www.bioconductor.org/). SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
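The 'top-down' ordering can be sketched as follows: first collapse a GO category's genes into one score per sample, then test that score for a difference between two groups of samples. Averaging the genes and using Welch's t statistic are our assumptions for illustration; the abstract does not state which aggregation or test DEA uses.

```python
from statistics import mean, variance


def category_scores(expr: dict[str, list[float]], category: list[str]) -> list[float]:
    """Top-down step 1: average the category's genes within each sample (assumed aggregation)."""
    genes = [g for g in category if g in expr]
    n_samples = len(expr[genes[0]])
    return [mean(expr[g][s] for g in genes) for s in range(n_samples)]


def welch_t(x: list[float], y: list[float]) -> float:
    """Welch's two-sample t statistic (unequal variances)."""
    return (mean(x) - mean(y)) / (variance(x) / len(x) + variance(y) / len(y)) ** 0.5


def top_down_statistic(expr, category, group_a: list[int], group_b: list[int]) -> float:
    """Top-down step 2: test the aggregated category score between two groups of samples."""
    scores = category_scores(expr, category)
    return welch_t([scores[i] for i in group_a], [scores[i] for i in group_b])


expr = {  # toy data: 6 samples, first three in group A, last three in group B
    "g1": [1, 2, 3, 10, 11, 12],
    "g2": [2, 3, 4, 11, 12, 13],
    "g3": [5, 5, 5, 5, 5, 5],
}
print(top_down_statistic(expr, ["g1", "g2"], [0, 1, 2], [3, 4, 5]))  # strongly negative t
```

A 'bottom-up' analysis would instead call `welch_t` once per gene and aggregate the per-gene results afterwards.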
Affiliation(s)
- Jiajun Liu
- Department of Statistics, North Carolina State University, Raleigh, NC 27695-8203, USA.
267
Ye C, Eskin E. Discovering tightly regulated and differentially expressed gene sets in whole genome expression data. Bioinformatics 2007; 23:e84-90. [PMID: 17237110 DOI: 10.1093/bioinformatics/btl315] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Indexed: 02/01/2023]
Abstract
MOTIVATION Recently, a new type of expression data is being collected which aims to measure the effect of genetic variation on gene expression in pathways. In these datasets, expression profiles are constructed for multiple strains of the same model organism under the same condition. The goal of analyses of these data is to find differences in regulatory patterns due to genetic variation between strains, often without a phenotype of interest in mind. We present a new method based on notions of tight regulation and differential expression to look for sets of genes which appear to be significantly affected by genetic variation. RESULTS When we use categorical phenotype information, as in the Alzheimer's and diabetes datasets, our method finds many of the same gene sets as gene set enrichment analysis. In addition, our notion of correlated gene sets allows us to focus our efforts on biological processes subjected to tight regulation. In murine hematopoietic stem cells, we are able to discover significant gene sets independent of a phenotype of interest. Some of these gene sets are associated with several blood-related phenotypes. AVAILABILITY The programs are available by request from the authors.
Affiliation(s)
- Chun Ye
- Bioinformatics Program, University of California San Diego, La Jolla, CA 92093-0404, USA.
268
Bresell A, Servenius B, Persson B. Ontology annotation treebrowser: an interactive tool where the complementarity of medical subject headings and gene ontology improves the interpretation of gene lists. ACTA ACUST UNITED AC 2007; 5:225-36. [PMID: 17140269 DOI: 10.2165/00822942-200605040-00005] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Indexed: 11/02/2022]
Abstract
Gene expression and proteomics analysis allow the investigation of thousands of biomolecules in parallel. This results in a long list of interesting genes or proteins and a list of annotation terms on the order of thousands. It is not a trivial task to understand such a gene list, and it would require extensive effort to bring together the overwhelming amounts of associated information from the literature and databases. Thus, it is evident that we need ways of condensing and filtering this information. An excellent way to represent knowledge is to use ontologies, where it is possible to group genes or terms with overlapping context, rather than studying one-dimensional lists of keywords. Therefore, we have built the ontology annotation treebrowser (OAT) to represent, condense, filter and summarise the knowledge associated with a list of genes or proteins. The OAT system consists of two separate parts: a MySQL database named OATdb, and a treebrowser engine that is implemented as a web interface. The OAT system is implemented using Perl scripts on an Apache web server, and the gene, ontology and annotation data are stored in a relational MySQL database. In OAT, we have harmonized the two ontologies of medical subject headings (MeSH) and gene ontology (GO), to enable us to use knowledge both from the literature and the annotation projects in the same tool. OAT includes multiple gene identifier sets, which are merged internally in the OAT database. We have also generated novel MeSH annotations by mapping accession numbers to MEDLINE entries. The ontology browser OAT was created to facilitate the analysis of gene lists. It can be browsed dynamically, so that a scientist can interact with the data and govern the outcome. Test statistics show which branches are enriched. We also show that the two ontologies complement each other, with surprisingly low overlap, by mapping annotations to the Unified Medical Language System.
We have developed a novel interactive annotation browser that is the first to incorporate both MeSH and GO for improved interpretation of gene lists. With OAT, we illustrate the benefits of combining MeSH and GO for understanding gene lists. OAT is available as a public web service at: http://www.ifm.liu.se/bioinfo/oat.
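Reporting enrichment per branch relies on counting a gene for every ancestor of the terms it is annotated to (the "true path rule" of GO-style ontologies). Below is a minimal sketch of that propagation over a toy ontology stored as child-to-parents links. The data and function names are invented for illustration; OAT's own Perl/MySQL implementation is not described in the abstract.

```python
def ancestors(term: str, parents: dict[str, list[str]]) -> set[str]:
    """All ancestors of `term` (excluding the term itself) in a DAG stored as child -> parents."""
    seen: set[str] = set()
    stack = list(parents.get(term, []))
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, []))
    return seen


def branch_counts(gene_terms: dict[str, set[str]], parents: dict[str, list[str]], genes) -> dict[str, int]:
    """Genes per term after propagating annotations to ancestors.
    Each gene counts at most once per term, even if several of its terms share an ancestor."""
    counts: dict[str, int] = {}
    for g in genes:
        propagated: set[str] = set()
        for t in gene_terms.get(g, ()):
            propagated |= {t} | ancestors(t, parents)
        for t in propagated:
            counts[t] = counts.get(t, 0) + 1
    return counts


parents = {  # toy ontology (invented): two leaf processes under one root
    "apoptosis": ["cell death"],
    "cell death": ["biological_process"],
    "mitosis": ["cell cycle"],
    "cell cycle": ["biological_process"],
}
gene_terms = {"g1": {"apoptosis"}, "g2": {"mitosis"}, "g3": {"apoptosis", "mitosis"}}
print(branch_counts(gene_terms, parents, ["g1", "g2", "g3"])["biological_process"])  # 3
```

Per-term counts computed this way for the gene list and for the background are what a per-branch enrichment test would compare.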
Affiliation(s)
- Anders Bresell
- IFM Bioinformatics, Linköping University, Linköping, Sweden.
269
Schmidt MW, Houseman A, Ivanov AR, Wolf DA. Comparative proteomic and transcriptomic profiling of the fission yeast Schizosaccharomyces pombe. Mol Syst Biol 2007; 3:79. [PMID: 17299416 PMCID: PMC1828747 DOI: 10.1038/msb4100117] [Citation(s) in RCA: 97] [Impact Index Per Article: 5.7] [Received: 08/15/2006] [Accepted: 12/13/2006] [Indexed: 02/04/2023]
Abstract
The fission yeast Schizosaccharomyces pombe is a widely used model organism to study basic mechanisms of eukaryotic biology, but unlike other model organisms, its proteome remains largely uncharacterized. Using a shotgun proteomics approach based on multidimensional prefractionation and tandem mass spectrometry, we have detected ∼30% of the theoretical fission yeast proteome. Applying statistical modelling to normalize spectral counts to the number of predicted tryptic peptides, we have performed label-free quantification of 1465 proteins. The fission yeast protein data showed considerable correlations with mRNA levels and with the abundance of orthologous proteins in budding yeast. Functional pathway analysis indicated that the mRNA–protein correlation is strong for proteins involved in signalling and metabolic processes, but increasingly discordant for components of protein complexes, which clustered in groups with similar mRNA–protein ratios. Self-organizing map clustering of large-scale protein and mRNA data from fission and budding yeast revealed coordinate but not always concordant expression of components of functional pathways and protein complexes. This finding reaffirms at the protein level the considerable divergence in gene expression patterns of the two model organisms that was noticed in previous transcriptomic studies.
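Normalizing spectral counts to the number of predicted tryptic peptides compensates for proteins that yield many peptides and therefore many spectra. The sketch below uses the common trypsin rule (cleave after K or R unless the next residue is P) and a plain ratio. The 6 to 30 residue "observable" window and the simple ratio are our assumptions; the paper applied statistical modelling rather than this direct division.

```python
import re


def tryptic_peptides(seq: str, min_len: int = 6, max_len: int = 30) -> list[str]:
    """In silico trypsin digest: split after K/R not followed by P (zero-width split,
    Python 3.7+), keeping peptides inside an assumed detectable length window."""
    pieces = re.split(r"(?<=[KR])(?!P)", seq)
    return [p for p in pieces if min_len <= len(p) <= max_len]


def normalized_abundance(spectral_counts: dict[str, int], sequences: dict[str, str]) -> dict[str, float]:
    """Spectral count divided by the number of predicted observable tryptic peptides."""
    out = {}
    for protein, count in spectral_counts.items():
        n_peptides = len(tryptic_peptides(sequences[protein]))
        if n_peptides:  # skip proteins with no observable peptides rather than divide by zero
            out[protein] = count / n_peptides
    return out


print(tryptic_peptides("MAAAAAKPAAAAAARGGGGGGK"))  # ['MAAAAAKPAAAAAAR', 'GGGGGGK']
```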
Affiliation(s)
- Michael W Schmidt
- NIEHS Center for Environmental Health Proteomics Facility, Harvard School of Public Health, Boston, MA, USA
- Department of Genetics and Complex Diseases, Harvard School of Public Health, Boston, MA, USA
- Institute for Biochemistry, University of Stuttgart, Stuttgart, Germany
- Andres Houseman
- Department of Biostatistics, Harvard School of Public Health, Boston, MA, USA
- Alexander R Ivanov
- NIEHS Center for Environmental Health Proteomics Facility, Harvard School of Public Health, Boston, MA, USA
- Department of Genetics and Complex Diseases, Harvard School of Public Health, Boston, MA, USA
- Department of Genetics and Complex Diseases, Harvard School of Public Health, 665 Huntington Avenue, Boston, MA 02115, USA. Tel.: +1 617 432 2093; Fax: +1 617 432 2059;
- Dieter A Wolf
- NIEHS Center for Environmental Health Proteomics Facility, Harvard School of Public Health, Boston, MA, USA
- Department of Genetics and Complex Diseases, Harvard School of Public Health, Boston, MA, USA
- Department of Genetics and Complex Diseases, Harvard School of Public Health, 665 Huntington Avenue, Boston, MA 02115, USA. Tel.: +1 617 432 2093; Fax: +1 617 432 2059;
270
Prüfer K, Muetzel B, Do HH, Weiss G, Khaitovich P, Rahm E, Pääbo S, Lachmann M, Enard W. FUNC: a package for detecting significant associations between gene sets and ontological annotations. BMC Bioinformatics 2007; 8:41. [PMID: 17284313 PMCID: PMC1800870 DOI: 10.1186/1471-2105-8-41] [Citation(s) in RCA: 149] [Impact Index Per Article: 8.8] [Received: 09/04/2006] [Accepted: 02/06/2007] [Indexed: 11/17/2022]
Abstract
Background Genome-wide expression, sequence and association studies typically yield large sets of gene candidates, which must then be further analysed and interpreted. Information about these genes is increasingly being captured and organized in ontologies, such as the Gene Ontology. Relationships between the gene sets identified by experimental methods and biological knowledge can be made explicit and used in the interpretation of results. However, it is often difficult to assess the statistical significance of such analyses since many inter-dependent categories are tested simultaneously. Results We developed the program package FUNC that includes and expands on currently available methods to identify significant associations between gene sets and ontological annotations. It implements several tests particularly well suited to genome-wide sequence comparisons, estimates of the family-wise error rate and the false discovery rate, a sensitive estimator of the global significance of the results, and an algorithm to reduce the complexity of the results. Conclusion FUNC is a versatile and useful tool for the analysis of genome-wide data. It is freely available under the GPL license and also accessible via a web service.
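The abstract does not spell out how FUNC estimates its error rates. As a point of reference only, the widely used Benjamini-Hochberg step-up procedure, a different method from whatever FUNC implements, converts per-category p-values into FDR-adjusted values as follows. Because overlapping ontology categories make the tests inter-dependent, BH's guarantee (valid under independence or positive dependence) may not strictly hold for such data.

```python
def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """BH-adjusted p-values, returned in the original order of `pvalues`."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices from smallest to largest p
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    for rank in range(m, 0, -1):  # walk from the largest p-value down to the smallest
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))  # approximately [0.02, 0.04, 0.04, 0.02]
```

Categories whose adjusted value falls below the chosen FDR level (for example 0.05) are reported as significant.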
Affiliation(s)
- Kay Prüfer
- Max-Planck-Institute for Evolutionary Anthropology, Deutscher Platz 6, D-04103 Leipzig, Germany
- Bjoern Muetzel
- Max-Planck-Institute for Evolutionary Anthropology, Deutscher Platz 6, D-04103 Leipzig, Germany
- Hong-Hai Do
- Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstr. 16-18, D-04107, Germany
- Gunter Weiss
- Max-Planck-Institute for Evolutionary Anthropology, Deutscher Platz 6, D-04103 Leipzig, Germany
- Philipp Khaitovich
- Max-Planck-Institute for Evolutionary Anthropology, Deutscher Platz 6, D-04103 Leipzig, Germany
- Institute for Computational Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, 320 Yue Yang Road, Shanghai, 200031, China
- Erhard Rahm
- Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstr. 16-18, D-04107, Germany
- Svante Pääbo
- Max-Planck-Institute for Evolutionary Anthropology, Deutscher Platz 6, D-04103 Leipzig, Germany
- Michael Lachmann
- Max-Planck-Institute for Evolutionary Anthropology, Deutscher Platz 6, D-04103 Leipzig, Germany
- Wolfgang Enard
- Max-Planck-Institute for Evolutionary Anthropology, Deutscher Platz 6, D-04103 Leipzig, Germany
271
Henegar C, Cancello R, Rome S, Vidal H, Clément K, Zucker JD. Clustering biological annotations and gene expression data to identify putatively co-regulated biological processes. J Bioinform Comput Biol 2007; 4:833-52. [PMID: 17007070 DOI: 10.1142/s0219720006002181] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Received: 11/22/2005] [Revised: 03/04/2006] [Accepted: 03/24/2006] [Indexed: 01/04/2023]
Abstract
MOTIVATION Functional profiling is a key step of microarray gene expression data analysis. Identifying co-regulated biological processes could improve understanding of the underlying biological interactions in the system under study. RESULTS We present herein an original approach designed to search for putatively co-regulated biological processes sharing a significant number of co-expressed genes. An R language implementation named "FunCluster" was built and tested on two gene expression data sets. A discriminatory functional analysis of the first data set, related to experiments performed on separated adipocytes and stroma vascular fraction cells of human white adipose tissue, highlighted the prevalent role of nonadipose cells in the synthesis of inflammatory and immunity molecules in human adiposity. On the second data set, resulting from a model investigating insulin-coordinated regulation of gene expression in human skeletal muscle, FunCluster analysis spotlighted novel functional classes of putatively co-regulated biological processes related to protein metabolism and the regulation of muscular contraction. AVAILABILITY Supplementary information about the FunCluster tool is available on-line at http://corneliu.henegar.info/FunCluster.htm.
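The idea of grouping biological processes that share co-expressed genes can be illustrated with a deliberately simple sketch: restrict each annotation term to the co-expressed genes, then join terms whose gene overlap (Jaccard index) reaches a threshold, using single linkage via union-find. The Jaccard measure, the 0.5 threshold and the grouping rule are our assumptions, not FunCluster's published algorithm.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Shared genes divided by all genes in either set (0.0 for two empty sets)."""
    return len(a & b) / len(a | b) if a | b else 0.0


def group_terms(term_genes: dict[str, set[str]], coexpressed: set[str], threshold: float = 0.5) -> list[set[str]]:
    """Single-linkage groups of terms whose co-expressed gene sets overlap enough."""
    restricted = {t: g & coexpressed for t, g in term_genes.items() if g & coexpressed}
    parent = {t: t for t in restricted}

    def find(t: str) -> str:
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving keeps the trees shallow
            t = parent[t]
        return t

    for a, b in combinations(restricted, 2):
        if jaccard(restricted[a], restricted[b]) >= threshold:
            parent[find(a)] = find(b)  # merge the two groups
    groups: dict[str, set[str]] = {}
    for t in restricted:
        groups.setdefault(find(t), set()).add(t)
    return list(groups.values())


term_genes = {  # toy annotations (invented)
    "inflammation": {"g1", "g2", "g3"},
    "immune response": {"g1", "g2", "g4"},
    "lipid metabolism": {"g7", "g8"},
    "muscle contraction": {"g9"},
}
print(group_terms(term_genes, {"g1", "g2", "g3", "g4", "g7", "g8"}))
```

Terms with no co-expressed genes ("muscle contraction" here) drop out before grouping.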
272
Abstract
High-throughput experiments in biology often produce sets of genes of potential interest. Some of those gene sets might be of considerable size. Therefore, computer-assisted analysis is necessary for the biological interpretation of the gene sets, and for creating working hypotheses, which can be tested experimentally. One obvious way to analyze gene set data is to associate the genes with a particular biological feature, for example, a given pathway. Statistical analysis can then be used to evaluate whether a gene set is truly associated with a feature. Over the past few years many tools that perform such analysis have been created. In this chapter, using WebGestalt as an example, we explain in detail how to associate gene sets with functional annotations, pathways, publication records, and protein domains.
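A common statistical test for whether a gene set is associated with a feature such as a pathway is the one-sided hypergeometric test (equivalently, a one-sided Fisher's exact test) on the overlap between the gene set and the feature's genes, relative to a population of measured genes. A minimal standard-library sketch follows; the function name is ours and this is not WebGestalt's code.

```python
from math import comb


def enrichment_pvalue(study: set[str], population: set[str], category: set[str]) -> float:
    """P(overlap >= observed) when |study| genes are drawn from the population,
    of which |category & population| carry the annotation."""
    N = len(population)
    K = len(category & population)
    n = len(study & population)
    k = len(study & category & population)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1)) / comb(N, n)


population = {f"g{i}" for i in range(10)}
pathway = {"g0", "g1", "g2", "g3"}
print(enrichment_pvalue({"g0", "g1", "g2"}, population, pathway))  # 4/120, about 0.033
```

The choice of population matters: using all genes in the genome rather than only the genes measured on the platform tends to overstate enrichment.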
Affiliation(s)
- Stefan A Kirov
- Oak Ridge National Laboratory, University of Tennessee, USA
273
Huang DW, Sherman BT, Tan Q, Collins JR, Alvord WG, Roayaei J, Stephens R, Baseler MW, Lane HC, Lempicki RA. The DAVID Gene Functional Classification Tool: a novel biological module-centric algorithm to functionally analyze large gene lists. Genome Biol 2007; 8:R183. [PMID: 17784955 PMCID: PMC2375021 DOI: 10.1186/gb-2007-8-9-r183] [Citation(s) in RCA: 1687] [Impact Index Per Article: 99.2] [Received: 02/05/2007] [Revised: 04/20/2007] [Accepted: 09/04/2007] [Indexed: 12/16/2022]
Abstract
The DAVID Gene Functional Classification Tool http://david.abcc.ncifcrf.gov uses a novel agglomeration algorithm to condense a list of genes or associated biological terms into organized classes of related genes or biology, called biological modules. This organization is accomplished by mining the complex biological co-occurrences found in multiple sources of functional annotation. It is a powerful method to group functionally related genes and terms into a manageable number of biological modules for efficient interpretation of gene lists in a network context.
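Before genes can be agglomerated into modules, their annotation profiles must be compared. The DAVID authors describe using kappa statistics for this gene-to-gene similarity. The sketch below computes Cohen's kappa for two genes' presence/absence vectors over a shared term universe and leaves out the agglomeration step. It is not DAVID's code; the names and data are hypothetical, and the term universe is assumed non-empty.

```python
def cohens_kappa(terms_a: set[str], terms_b: set[str], all_terms: set[str]) -> float:
    """Cohen's kappa between two genes' binary annotation vectors over all_terms."""
    n = len(all_terms)
    both = len(terms_a & terms_b & all_terms)
    only_a = len((terms_a - terms_b) & all_terms)
    only_b = len((terms_b - terms_a) & all_terms)
    neither = n - both - only_a - only_b
    observed = (both + neither) / n  # fraction of terms on which the genes agree
    p_a, p_b = (both + only_a) / n, (both + only_b) / n
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)  # agreement expected by chance
    if expected == 1:  # both vectors constant and identical: treat as perfect agreement
        return 1.0
    return (observed - expected) / (1 - expected)


universe = {"t1", "t2", "t3", "t4"}
print(cohens_kappa({"t1", "t2"}, {"t1", "t2"}, universe))  # 1.0
print(cohens_kappa({"t1", "t2"}, {"t3", "t4"}, universe))  # -1.0
```

Kappa corrects raw overlap for chance agreement, so two heavily annotated genes do not look similar merely because both carry many terms.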
Affiliation(s)
- Da Wei Huang
- Laboratory of Immunopathogenesis and Bioinformatics, Clinical Services Program, SAIC-Frederick, Inc., National Cancer Institute at Frederick, Frederick, MD 21702, USA
- Brad T Sherman
- Laboratory of Immunopathogenesis and Bioinformatics, Clinical Services Program, SAIC-Frederick, Inc., National Cancer Institute at Frederick, Frederick, MD 21702, USA
- Qina Tan
- Laboratory of Immunopathogenesis and Bioinformatics, Clinical Services Program, SAIC-Frederick, Inc., National Cancer Institute at Frederick, Frederick, MD 21702, USA
- Jack R Collins
- Advanced Biomedical Computing Center, SAIC-Frederick, Inc., National Cancer Institute at Frederick, Frederick, MD 21702, USA
- W Gregory Alvord
- Computer and Statistical Services, Data Management Services, National Cancer Institute at Frederick, Frederick, MD 21702, USA
- Jean Roayaei
- Computer and Statistical Services, Data Management Services, National Cancer Institute at Frederick, Frederick, MD 21702, USA
- Robert Stephens
- Advanced Biomedical Computing Center, SAIC-Frederick, Inc., National Cancer Institute at Frederick, Frederick, MD 21702, USA
- Michael W Baseler
- Clinical Services Program, SAIC-Frederick, Inc., National Cancer Institute at Frederick, Frederick, MD 21702, USA
- H Clifford Lane
- Laboratory of Immunoregulation, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD 20892, USA
- Richard A Lempicki
- Laboratory of Immunopathogenesis and Bioinformatics, Clinical Services Program, SAIC-Frederick, Inc., National Cancer Institute at Frederick, Frederick, MD 21702, USA
274
Dopazo J. Functional interpretation of microarray experiments. OMICS 2006; 10:398-410. [PMID: 17069516 DOI: 10.1089/omi.2006.10.398] [Citation(s) in RCA: 64] [Impact Index Per Article: 3.6] [Indexed: 01/08/2023]
Abstract
Over the past few years, due to the popularisation of high-throughput methodologies such as DNA microarrays, the possibility of obtaining experimental data has increased significantly. Nevertheless, the interpretation of the results, which involves translating these data into useful biological knowledge, still remains a challenge. The methods and strategies used for this interpretation are in continuous evolution and new proposals are constantly arising. Initially, a two-step approach was used in which genes of interest were first selected, based on thresholds that consider only experimental values, and then, in a second, independent step, the enrichment of these genes in biologically relevant terms was analysed. For several reasons, these methods perform relatively poorly, and a new generation of procedures, which draw inspiration from systems biology criteria, is currently under development. Such procedures aim to directly test the behaviour of blocks of functionally related genes, instead of focusing on single genes.
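Testing a block of functionally related genes directly, without first selecting genes by a threshold, can be illustrated with a Wilcoxon rank-sum (Mann-Whitney) test: are the gene-level scores of a category's members shifted upward relative to all other genes? The rank-sum test is our choice of example, not a method prescribed by the review. The sketch uses the normal approximation and average ranks for ties, without tie-correcting the variance.

```python
from statistics import NormalDist


def rank_sum_test(scores: dict[str, float], category: set[str]) -> tuple[float, float]:
    """Mann-Whitney U for category members vs. the rest, with a one-sided p-value
    (members score higher) from the normal approximation."""
    items = sorted(scores.items(), key=lambda kv: kv[1])
    ranks: dict[str, float] = {}
    i = 0
    while i < len(items):  # assign average 1-based ranks to runs of tied scores
        j = i
        while j + 1 < len(items) and items[j + 1][1] == items[i][1]:
            j += 1
        for k in range(i, j + 1):
            ranks[items[k][0]] = (i + j) / 2 + 1
        i = j + 1
    inside = [g for g in scores if g in category]
    n1, n2 = len(inside), len(scores) - len(inside)
    u = sum(ranks[g] for g in inside) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5
    return u, 1 - NormalDist().cdf((u - mu) / sigma)


scores = {f"g{i}": float(i) for i in range(10)}  # toy per-gene statistics
print(rank_sum_test(scores, {"g7", "g8", "g9"}))  # U = 21.0 (the maximum), small p-value
```

Every gene contributes through its rank, so no arbitrary significance cut-off is needed before the functional test.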
Affiliation(s)
- Joaquín Dopazo
- Department of Bioinformatics, and Functional Genomics Node (INB), Centro de Investigación Príncipe Felipe (CIPF), Valencia, Spain.
275
McCarthy FM, Bridges SM, Wang N, Magee GB, Williams WP, Luthe DS, Burgess SC. AgBase: a unified resource for functional analysis in agriculture. Nucleic Acids Res 2006; 35:D599-603. [PMID: 17135208 PMCID: PMC1751552 DOI: 10.1093/nar/gkl936] [Citation(s) in RCA: 84] [Impact Index Per Article: 4.7] [Indexed: 11/13/2022]
Abstract
Analysis of functional genomics (transcriptomics and proteomics) datasets is hindered in agricultural species because agricultural genome sequences have relatively poor structural and functional annotation. To facilitate systems biology in these species we have established the curated, web-accessible, public resource 'AgBase' (www.agbase.msstate.edu). We have improved the structural annotation of agriculturally important genomes by experimentally confirming the in vivo expression of electronically predicted proteins and by proteogenomic mapping. Proteogenomic data are available from the AgBase proteogenomics link. We contribute Gene Ontology (GO) annotations and we provide a two-tier system of GO annotations for users. The 'GO Consortium' gene association file contains the most rigorous GO annotations based solely on experimental data. The 'Community' gene association file contains GO annotations based on expert community knowledge (annotations based directly on author statements and submitted annotations from the community) and annotations for predicted proteins. We have developed two tools for proteomics analysis and these are freely available on request. A suite of tools for analyzing functional genomics datasets using the GO is available online at the AgBase site. We encourage and publicly acknowledge GO annotations from researchers and provide an online mechanism for agricultural researchers to submit requests for GO annotations.
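A split like the two-tier one described above, with experiment-only annotations on one side, can be reproduced from a GO annotation file (GAF 2.x, tab-separated) by filtering on the evidence code in column 7. The sketch uses the six classic GO experimental evidence codes (newer high-throughput codes are left out). Whether AgBase applied exactly this rule is our assumption, and the example rows are invented.

```python
EXPERIMENTAL_CODES = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}  # classic GO experimental codes


def split_gaf(lines):
    """Split GAF 2.x rows into (experimental, other) lists of column lists,
    using the evidence code in column 7 (index 6)."""
    experimental, other = [], []
    for line in lines:
        if not line.strip() or line.startswith("!"):  # skip blank lines and '!' header comments
            continue
        cols = line.rstrip("\n").split("\t")
        target = experimental if cols[6] in EXPERIMENTAL_CODES else other
        target.append(cols)
    return experimental, other


rows = [
    "!gaf-version: 2.2",
    "AgBase\tP1\tgeneA\t\tGO:0005515\tPMID:1\tIPI\tUniProtKB:P2\tF",
    "AgBase\tP3\tgeneB\t\tGO:0008150\tGO_REF:1\tIEA\t\tP",
]
experimental, other = split_gaf(rows)
print(len(experimental), len(other))  # 1 1
```

In practice `lines` would be an open file handle; the toy rows are truncated to the first nine GAF columns.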
Affiliation(s)
- Fiona M. McCarthy
- Department of Basic Sciences, College of Veterinary Medicine, Mississippi State University, PO Box 6100, Mississippi, MS 39762, USA
- Institute for Digital Biology, Mississippi State University, MS 39762, USA
- To whom correspondence should be addressed. Tel: +1 662 325 5859; Fax: +1 662 325 1031;
- Susan M. Bridges
- Institute for Digital Biology, Mississippi State University, MS 39762, USA
- Department of Computer Science and Engineering, Bagley College of Engineering, PO Box 9637, Mississippi, MS 39762, USA
- To whom correspondence should be addressed. Tel: +1 662 325 5859; Fax: +1 662 325 1031;
- Nan Wang
- Institute for Digital Biology, Mississippi State University, MS 39762, USA
- Department of Computer Science and Engineering, Bagley College of Engineering, PO Box 9637, Mississippi, MS 39762, USA
- G. Bryce Magee
- Institute for Digital Biology, Mississippi State University, MS 39762, USA
- Department of Computer Science and Engineering, Bagley College of Engineering, PO Box 9637, Mississippi, MS 39762, USA
- W. Paul Williams
- USDA ARS Corn Host Plant Resistance Research Unit, Box 5367, Mississippi, MS 39762, USA
- Dawn S. Luthe
- Department of Crop and Soil Sciences, The Pennsylvania State University, University Park, PA 16802, USA
- Shane C. Burgess
- Department of Basic Sciences, College of Veterinary Medicine, Mississippi State University, PO Box 6100, Mississippi, MS 39762, USA
- Department of Computer Science and Engineering, Bagley College of Engineering, PO Box 9637, Mississippi, MS 39762, USA
- Mississippi Agricultural and Forestry Experiment Station, Mississippi State University, MS 39762, USA
276
Abstract
MOTIVATIONS Bi-clustering is an important approach in microarray data analysis. The underlying bases for using bi-clustering in the analysis of gene expression data are that (1) similar genes may exhibit similar behaviors only under a subset of conditions, not all conditions, and (2) genes may participate in more than one function, resulting in one regulation pattern in one context and a different pattern in another. Using bi-clustering algorithms, one can obtain sets of genes that are co-regulated under subsets of conditions. RESULTS We develop a polynomial time algorithm to find an optimal bi-cluster with the maximum similarity score. To our knowledge, this is the first formulation of bi-cluster problems that admits a polynomial time algorithm for optimal solutions. The algorithm works for a special case, where the bi-clusters are approximately squares. We then extend the algorithm to handle various other cases. Experiments on simulated and real data show that the new algorithms outperform most of the existing methods in many cases. Our new algorithms have the following advantages: (1) no discretization procedure is required, (2) they perform well for overlapping bi-clusters and (3) they work well for additive bi-clusters. AVAILABILITY The software is available at http://www.cs.cityu.edu.hk/~liuxw/msbe/help.html.
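An additive bi-cluster is one in which every entry equals an overall level plus a gene (row) effect plus a condition (column) effect. One standard way to make this concrete is the mean squared residue of Cheng and Church, which is zero for a perfectly additive submatrix. The sketch below only illustrates that property; it is not the similarity score or the polynomial time algorithm of the MSBE software, and the function name is hypothetical.

```python
from typing import Sequence


def mean_squared_residue(sub: Sequence[Sequence[float]]) -> float:
    """Cheng-Church mean squared residue of a genes x conditions submatrix.

    Mathematically zero exactly when each entry equals
    overall mean + row effect + column effect (a perfectly additive bi-cluster);
    in floating point it may differ from zero by rounding error.
    """
    n_rows = len(sub)
    if n_rows == 0 or len(sub[0]) == 0:
        raise ValueError("submatrix must be non-empty")
    n_cols = len(sub[0])
    if any(len(row) != n_cols for row in sub):
        raise ValueError("submatrix must be rectangular")

    row_means = [sum(row) / n_cols for row in sub]
    col_means = [sum(sub[i][j] for i in range(n_rows)) / n_rows for j in range(n_cols)]
    overall = sum(row_means) / n_rows

    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            # Residue: what remains after removing the row, column and overall effects
            residue = sub[i][j] - row_means[i] - col_means[j] + overall
            total += residue * residue
    return total / (n_rows * n_cols)
```

For example, `[[1, 2, 3], [2, 3, 4], [5, 6, 7]]` has rows that differ only by constant offsets, so its residue is zero up to rounding, whereas `[[1, 0], [0, 1]]` is not additive and has a residue of 0.25.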
Affiliation(s)
- Xiaowen Liu
- Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong
277
Wang H, Chua NH, Wang XJ. Prediction of trans-antisense transcripts in Arabidopsis thaliana. Genome Biol 2006; 7:R92. [PMID: 17040561 PMCID: PMC1794575 DOI: 10.1186/gb-2006-7-10-r92] [Citation(s) in RCA: 65] [Impact Index Per Article: 3.6] [Received: 08/01/2006] [Revised: 10/02/2006] [Accepted: 10/13/2006] [Indexed: 11/10/2022]
Abstract
BACKGROUND Natural antisense transcripts (NATs) are coding or non-coding RNAs with sequence complementarity to other transcripts (sense transcripts). These RNAs could potentially regulate the expression of their sense partner(s) at either the transcriptional or post-transcriptional level. Experimental and computational methods have demonstrated the widespread occurrence of NATs in eukaryotes. However, most previous studies focused only on cis-NATs, with little attention paid to NATs that originate in trans. RESULTS We have performed a genome-wide screen of trans-NATs in Arabidopsis thaliana and identified 1,320 putative trans-NAT pairs. An RNA annealing program predicted that most trans-NATs could form extended double-stranded RNA duplexes with their sense partners. Among trans-NATs with available expression data, more than 85% were found in the same tissue as their sense partners; of these, 67% were found in the same cell as their sense partners at comparable expression levels. For about 60% of Arabidopsis trans-NATs, orthologs of at least one transcript of the pair also had trans-NAT partners in either Populus trichocarpa or Oryza sativa. The observation that 430 transcripts had both putative cis- and trans-NATs points to multiple layers of regulation by antisense transcripts. The potential roles of trans-NATs in inducing post-transcriptional gene silencing and in regulating alternative splicing were also examined. CONCLUSION The Arabidopsis transcriptome contains a fairly large number of trans-NATs, whose possible functions include silencing of the corresponding sense transcripts or altering their splicing patterns. The interlaced relationships observed in some cis- and trans-NAT pairs suggest that antisense transcripts could be involved in complex regulatory networks in eukaryotes.
Affiliation(s)
- Huan Wang
- State Key Laboratory of Plant Genomics, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing 100101, China
- Graduate University of the Chinese Academy of Sciences, Beijing 100101, China
- Nam-Hai Chua
- Laboratory of Plant Molecular Biology, The Rockefeller University, New York, NY 10021, USA
- Xiu-Jie Wang
- State Key Laboratory of Plant Genomics, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing 100101, China
278
Kankainen M, Brader G, Törönen P, Palva ET, Holm L. Identifying functional gene sets from hierarchically clustered expression data: map of abiotic stress regulated genes in Arabidopsis thaliana. Nucleic Acids Res 2006; 34:e124. [PMID: 17003050 PMCID: PMC1636450 DOI: 10.1093/nar/gkl694] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Indexed: 11/14/2022]
Abstract
We present MultiGO, a web-enabled tool for the identification of biologically relevant gene sets from hierarchically clustered gene expression trees. High-throughput gene expression measurement techniques, such as microarrays, are now often used to monitor the expression of thousands of genes. Since these experiments can produce overwhelming amounts of data, computational methods that assist data analysis and interpretation are essential. MultiGO is a tool that automatically extracts the biological information for multiple clusters and determines their biological relevance, and hence facilitates the interpretation of the data. Since the entire expression tree is analysed, MultiGO is guaranteed to report all clusters that share a common enriched biological function, as defined by Gene Ontology annotations. The tool also identifies a plausible cluster set, which represents the key biological functions affected by the experiment. The performance is demonstrated by analysing drought-, cold- and abscisic acid-related expression data sets from Arabidopsis thaliana. The analysis not only identified known biological functions, but also brought into focus less established connections to defense-related gene clusters. Thus, in comparison to analyses of manually selected gene lists, the systematic analysis of every cluster can reveal unexpected biological phenomena and produce much more comprehensive biological insights into the experiment of interest.
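Analysing the entire expression tree means considering every cluster the tree defines: each internal node corresponds to the set of genes (leaves) beneath it, so a binary tree over n genes yields n - 1 clusters. A minimal sketch, assuming the tree is given as nested Python tuples of gene identifiers (a hypothetical representation, not MultiGO's input format); each returned cluster would then be tested for GO term enrichment, a step not shown here.

```python
from typing import FrozenSet, List, Tuple, Union

# Hypothetical representation: a leaf is a gene ID, an internal node is a (left, right) pair.
Tree = Union[str, Tuple["Tree", "Tree"]]


def all_clusters(tree: Tree) -> List[FrozenSet[str]]:
    """Return the gene set of every internal node, children before parents (post-order)."""
    clusters: List[FrozenSet[str]] = []

    def collect(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])  # a single gene is a leaf, not a cluster
        left, right = node
        members = collect(left) | collect(right)
        clusters.append(members)
        return members

    collect(tree)
    return clusters


tree = (("g1", "g2"), ("g3", ("g4", "g5")))
for cluster in all_clusters(tree):
    print(sorted(cluster))
```

Collecting clusters bottom-up means each gene set is built once from its two children, so the whole tree is covered in a single traversal.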
Affiliation(s)
- Matti Kankainen
- Institute of Biotechnology, PO Box 56 (Viikinkaari 5), FIN-00014, Helsinki, Finland
- Günter Brader
- Department of Biological and Environmental Sciences, Division of Genetics, University of Helsinki, PO Box 56 (Viikinkaari 5), FIN-00014, Helsinki, Finland
- Petri Törönen
- Institute of Biotechnology, PO Box 56 (Viikinkaari 5), FIN-00014, Helsinki, Finland
- E. Tapio Palva
- Department of Biological and Environmental Sciences, Division of Genetics, University of Helsinki, PO Box 56 (Viikinkaari 5), FIN-00014, Helsinki, Finland
- Liisa Holm
- Institute of Biotechnology, PO Box 56 (Viikinkaari 5), FIN-00014, Helsinki, Finland
- Department of Biological and Environmental Sciences, Division of Genetics, University of Helsinki, PO Box 56 (Viikinkaari 5), FIN-00014, Helsinki, Finland
- To whom correspondence should be addressed. Tel: +358 9 19159115; Fax: +358 9 19159079.
279
Auld KL, Hitchcock AL, Doherty HK, Frietze S, Huang LS, Silver PA. The conserved ATPase Get3/Arr4 modulates the activity of membrane-associated proteins in Saccharomyces cerevisiae. Genetics 2006; 174:215-27. [PMID: 16816426 PMCID: PMC1569774 DOI: 10.1534/genetics.106.058362] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.4] [Received: 03/17/2006] [Accepted: 06/19/2006] [Indexed: 01/09/2023]
Abstract
The regulation of cellular membrane dynamics is crucial for maintaining proper cell growth and division. The Cdc48-Npl4-Ufd1 complex is required for several regulated membrane-associated processes as part of the ubiquitin-proteasome system, including ER-associated degradation and the control of lipid composition in yeast. In this study we report the results of a genetic screen in Saccharomyces cerevisiae for extragenic suppressors of a temperature-sensitive npl4 allele and the subsequent analysis of one suppressor, GET3/ARR4. The GET3 gene encodes an ATPase with homology to the regulatory component of the bacterial arsenic pump. Mutants of GET3 rescue several phenotypes of the npl4 mutant and transcription of GET3 is coregulated with the proteasome, illustrating a functional relationship between GET3 and NPL4 in the ubiquitin-proteasome system. We have further found that Get3 biochemically interacts with the transmembrane domain proteins Get1/Mdm39 and Get2/Rmd7 and that Δget3 is able to suppress phenotypes of get1 and get2 mutants, including sporulation defects. In combination, our characterization of GET3 genetic and biochemical interactions with NPL4, GET1, and GET2 implicates Get3 in multiple membrane-dependent pathways.
Affiliation(s)
- Kathryn L Auld
- Department of Systems Biology, Harvard Medical School, Boston, Massachusetts 02115, USA
280
Antonov AV, Mewes HW. Complex functionality of gene groups identified from high-throughput data. J Mol Biol 2006; 363:289-96. [PMID: 16959266 DOI: 10.1016/j.jmb.2006.07.062] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.9] [Received: 04/12/2006] [Revised: 07/24/2006] [Accepted: 07/25/2006] [Indexed: 12/19/2022]
Abstract
Relating experimental data to biological knowledge is necessary to cope with the avalanche of new data emerging from recent developments in high-throughput technologies. Automatic functional profiling is becoming the de facto standard approach for the secondary analysis of high-throughput data. A number of tools employing available gene functional annotations have been developed for this purpose. However, current annotations are derived mostly from traditional analysis of individual gene function. The complex biological phenomena carried out by the concerted activity of many genes often require the definition of new complex functionality (related to a group of genes), which in many cases is not available in current annotation vocabularies. Functional profiling with annotation terms that describe the individual biological functions of a gene may fail to provide a reasonable interpretation of the biological relationships in a set of genes involved in complex biological phenomena. We introduce a novel procedure to profile the complex functionality of a gene set. Complex functionality is constructed as a combination of available annotation terms. By profiling ChIP-chip data from Saccharomyces cerevisiae, we demonstrate that this technique yields deeper insights into the results of high-throughput experiments, beyond the known facts described in functional classifications.
Affiliation(s)
- Alexey V Antonov
- GSF National Research Center for Environment and Health, Institute for Bioinformatics, Ingolstädter Landstrasse 1, D-85764 Neuherberg, Germany.
281
Zhong S, Tian L, Li C, Storch KF, Wong WH. Comparative analysis of gene sets in the Gene Ontology space under the multiple hypothesis testing framework. Proc IEEE Comput Syst Bioinform Conf 2006:425-35. [PMID: 16448035 DOI: 10.1109/csb.2004.1332455] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.0] [Indexed: 11/08/2022]
Abstract
The Gene Ontology (GO) resource can be used as a powerful tool to uncover the properties shared among, and specific to, a list of genes produced by high-throughput functional genomics studies, such as microarray studies. In the comparative analysis of several gene lists, researchers may be interested in knowing which GO terms are enriched in one list of genes but relatively depleted in another. Statistical tests such as Fisher's exact test or the chi-square test can be performed to search for such GO terms. However, because multiple GO terms are tested simultaneously, the p-values from individual tests do not serve as good indicators for picking GO terms. Furthermore, because these multiple tests are highly correlated, the usual multiple testing procedures that work under an independence assumption are not applicable. In this paper we introduce a procedure, based on the False Discovery Rate (FDR), to treat this correlated multiple testing problem. This procedure calculates a moderately conservative estimator of the q-value for every GO term. We identify the GO terms with q-values that satisfy a desired level as the significant GO terms. This procedure has been implemented in the GoSurfer software, a Windows-based graphical data mining tool. It is freely available at http://www.gosurfer.org.
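The per-term comparison described above can be sketched as a one-sided Fisher's exact test on a 2 × 2 table (annotated vs. not annotated, list A vs. list B), computed from the hypergeometric distribution. This is a minimal illustration with a hypothetical function name; it produces only the individual p-values that the abstract cautions against using directly, and it does not implement GoSurfer's FDR-based q-value procedure for correlated tests.

```python
from math import comb


def enrichment_p_value(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test: is a GO term over-represented in list A vs. list B?

    a: genes in list A annotated with the term   b: genes in list A not annotated
    c: genes in list B annotated with the term   d: genes in list B not annotated
    """
    if min(a, b, c, d) < 0:
        raise ValueError("counts must be non-negative")
    n_a = a + b        # size of list A
    k = a + c          # genes carrying the term in either list
    n = a + b + c + d  # all genes in both lists
    # Under the null, the number of annotated genes in A is hypergeometric(n, k, n_a);
    # sum the probabilities of all tables with at least `a` annotated genes in A.
    upper = min(k, n_a)
    tail = sum(comb(k, x) * comb(n - k, n_a - x) for x in range(a, upper + 1))
    return tail / comb(n, n_a)


# Made-up counts: term on 12 of 50 genes in list A and 3 of 60 genes in list B.
print(enrichment_p_value(12, 38, 3, 57))
```

Integer binomial coefficients keep the tail sum exact until the final division; one such p-value per GO term would then feed into a multiple-testing procedure suited to correlated tests.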
Affiliation(s)
- Sheng Zhong
- Department of Biostatistics, Harvard University, USA
282
Zheng X, Baker H, Hancock WS. Analysis of the low molecular weight serum peptidome using ultrafiltration and a hybrid ion trap-Fourier transform mass spectrometer. J Chromatogr A 2006; 1120:173-84. [PMID: 16527286 DOI: 10.1016/j.chroma.2006.01.098] [Citation(s) in RCA: 105] [Impact Index Per Article: 5.8] [Received: 07/11/2005] [Revised: 11/18/2005] [Accepted: 01/24/2006] [Indexed: 11/26/2022]
Abstract
Advances in proteomics are continuing to expand the ability to analyze the serum proteome. In recent years, it has been realized that in addition to the circulating proteins, human serum also contains a large number of peptides. Many of these peptides are believed to be fragments of larger proteins that have been at least partially degraded by various enzymes such as metalloproteases. Identifying these peptides from a small amount of serum/plasma is difficult due to the complexity of the sample, the low levels of these peptides, and the difficulty of obtaining a protein identification from a single peptide. In this study, we modified previously published protocols for centrifugal ultrafiltration and, unlike past studies, did not digest the filtrate with trypsin, with the intent of identifying endogenous peptides. The filtrate fraction was concentrated and analyzed by a reversed phase-high performance liquid chromatography system connected to a nanospray ionization hybrid ion trap-Fourier transform mass spectrometer (LTQ-FTMS). The mass accuracy of this instrument allows the protein precursors to be identified with confidence from a single peptide. The utility of this approach was demonstrated by the identification of over 300 unique peptides with 2 ppm or better mass accuracy per serum sample. With confident identifications, the origin and function of native serum peptides can be explored more seriously. Interestingly, over 34 peptide ladders were observed from over 17 serum proteins. This indicates that a cascade of proteolytic processes affects the serum peptidome. To examine whether this result was an artifact of serum, matched plasma and serum samples were analyzed, and similar peptide ladders were found in each.
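The mass accuracy quoted above is a relative error in parts per million (ppm): the difference between the observed and theoretical mass, divided by the theoretical mass, times 10^6. A minimal sketch with hypothetical function names and illustrative numbers, not values from the study; the 2 ppm default simply mirrors the threshold stated in the abstract.

```python
def mass_error_ppm(observed: float, theoretical: float) -> float:
    """Signed relative mass error in parts per million (ppm)."""
    return (observed - theoretical) / theoretical * 1e6


def within_tolerance(observed: float, theoretical: float, tol_ppm: float = 2.0) -> bool:
    """True if the absolute mass error does not exceed tol_ppm."""
    return abs(mass_error_ppm(observed, theoretical)) <= tol_ppm


# Illustrative only: a deviation of 0.001 mass units at 1000 corresponds to 1 ppm.
print(mass_error_ppm(1000.001, 1000.0))
print(within_tolerance(1000.005, 1000.0))  # 5 ppm exceeds a 2 ppm tolerance
```

Because the error is relative, the same 2 ppm tolerance corresponds to a wider absolute window for heavier peptides than for lighter ones.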
Affiliation(s)
- Xiaoyang Zheng
- Barnett Institute and Department of Chemistry, Northeastern University, 341 Mugar Hall, 360 Huntington Avenue, Boston, MA 02115, USA
283
Reiss DJ, Baliga NS, Bonneau R. Integrated biclustering of heterogeneous genome-wide datasets for the inference of global regulatory networks. BMC Bioinformatics 2006; 7:280. [PMID: 16749936 PMCID: PMC1502140 DOI: 10.1186/1471-2105-7-280] [Citation(s) in RCA: 197] [Impact Index Per Article: 10.9] [Received: 05/12/2006] [Accepted: 06/02/2006] [Indexed: 12/23/2022]
Abstract
Background The learning of global genetic regulatory networks from expression data is a severely under-constrained problem that is aided by reducing the dimensionality of the search space by means of clustering genes into putatively co-regulated groups, as opposed to those that are simply co-expressed. Because genes may be co-regulated only across a subset of all observed experimental conditions, biclustering (clustering of genes and conditions) is more appropriate than standard clustering. Co-regulated genes are also often functionally (physically, spatially, genetically, and/or evolutionarily) associated, and such a priori known or pre-computed associations can provide support for appropriately grouping genes. One important association is the presence of one or more common cis-regulatory motifs. In organisms where these motifs are not known, their de novo detection, integrated into the clustering algorithm, can help to guide the process towards more biologically parsimonious solutions. Results We have developed an algorithm, cMonkey, that detects putative co-regulated gene groupings by integrating the biclustering of gene expression data and various functional associations with the de novo detection of sequence motifs. Conclusion We have applied this procedure to the archaeon Halobacterium NRC-1, as part of our efforts to decipher its regulatory network. In addition, we used cMonkey on public data for three organisms in the other two domains of life: Helicobacter pylori, Saccharomyces cerevisiae, and Escherichia coli. The biclusters detected by cMonkey both recapitulated known biology and enabled novel predictions (some for Halobacterium were subsequently confirmed in the laboratory). For example, it identified the bacteriorhodopsin regulon, assigned additional genes to this regulon with apparently unrelated function, and detected its known promoter motif.
We have performed a thorough comparison of cMonkey results against other clustering methods, and find that cMonkey biclusters are more parsimonious with all available evidence for co-regulation.
Affiliation(s)
- David J Reiss
- Institute for Systems Biology, 1441 N. 34th St. Seattle, WA 98103-8904, USA
- Nitin S Baliga
- Institute for Systems Biology, 1441 N. 34th St. Seattle, WA 98103-8904, USA
- Richard Bonneau
- New York University Dept. of Biology, Dept. of Computer Science, New York, USA
284
Prelić A, Bleuler S, Zimmermann P, Wille A, Bühlmann P, Gruissem W, Hennig L, Thiele L, Zitzler E. A systematic comparison and evaluation of biclustering methods for gene expression data. Bioinformatics 2006. [DOI: 10.1093/bioinformatics/btl060] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/14/2022]
285
Semeiks JR, Rizki A, Bissell MJ, Mian IS. Ensemble attribute profile clustering: discovering and characterizing groups of genes with similar patterns of biological features. BMC Bioinformatics 2006; 7:147. [PMID: 16542449 PMCID: PMC1435935 DOI: 10.1186/1471-2105-7-147] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 06/28/2005] [Accepted: 03/16/2006] [Indexed: 11/17/2022]
Abstract
Background Ensemble attribute profile clustering is a novel, text-based strategy for analyzing a user-defined list of genes and/or proteins. The strategy exploits annotation data present in gene-centered corpora and utilizes ideas from statistical information retrieval to discover and characterize properties shared by subsets of the list. The practical utility of this method is demonstrated by employing it in a retrospective study of two non-overlapping sets of genes defined by a published investigation as markers for normal human breast luminal epithelial cells and myoepithelial cells. Results Each genetic locus was characterized using a finite set of biological properties and represented as a vector of features indicating attributes associated with the locus (a gene attribute profile). In this study, the vector space models for a pre-defined list of genes were constructed from the Gene Ontology (GO) terms and the Conserved Domain Database (CDD) protein domain terms assigned to the loci by the gene-centered corpus LocusLink. This data set of GO- and CDD-based gene attribute profiles, vectors of binary random variables, was used to estimate multiple finite mixture models, and each ensuing model was used to partition the profiles into clusters. The resulting partitionings were combined using a unanimous voting scheme to produce consensus clusters, sets of profiles that co-occurred consistently in the same cluster. Attributes that were important in defining the genes assigned to a consensus cluster were identified. The clusters and their attributes were inspected to ascertain the GO and CDD terms most associated with subsets of genes and, in conjunction with external knowledge such as chromosomal location, used to gain functional insights into human breast biology. The 52 luminal epithelial cell markers and 89 myoepithelial cell markers are disjoint sets of genes.
Ensemble attribute profile clustering-based analysis indicated that both lists contained groups of genes with the functional properties of membrane receptor biology/signal transduction and nucleic acid binding/transcription. A subset of the luminal markers was associated with metabolic and oxidoreductase activities, whereas a subset of myoepithelial markers was associated with protein hydrolase activity. Conclusion Given a set of genes and/or proteins associated with a phenomenon, process or system of interest, ensemble attribute profile clustering provides a simple method for collating and synthesizing the annotation data pertaining to them that are present in text-based, gene-centered corpora. The results provide information about properties common and unique to subsets of the list and hence insights into the biology of the problem under investigation.
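The unanimous voting step can be sketched directly: two profiles belong to the same consensus cluster only if every partitioning places them in the same cluster, which is equivalent to grouping profiles whose vectors of cluster labels are identical across all partitionings. This is a minimal sketch with a hypothetical function name; it assumes each partitioning is given as a list of cluster labels indexed by profile, and it does not reproduce the finite mixture model estimation that produced the partitionings.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def unanimous_consensus(partitionings: Sequence[Sequence[Hashable]]) -> List[List[int]]:
    """Group items that share a cluster in every partitioning.

    partitionings[m][i] is the cluster label of item i under partitioning m.
    Labels only need to be consistent within one partitioning.
    """
    if not partitionings:
        raise ValueError("need at least one partitioning")
    n_items = len(partitionings[0])
    if any(len(p) != n_items for p in partitionings):
        raise ValueError("all partitionings must label the same items")

    groups: Dict[Tuple[Hashable, ...], List[int]] = defaultdict(list)
    for item in range(n_items):
        # Two items co-occur in every partitioning iff their label vectors are identical
        signature = tuple(p[item] for p in partitionings)
        groups[signature].append(item)
    return sorted(groups.values())


partitionings = [
    [0, 0, 1, 1, 2],
    [5, 5, 5, 7, 7],
    ["a", "a", "b", "b", "b"],
]
print(unanimous_consensus(partitionings))
```

In this toy input only items 0 and 1 are grouped together in all three partitionings, so they form the single multi-member consensus cluster; the other items end up as singletons.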
Affiliation(s)
- JR Semeiks
- Life Sciences Division (MS 977-225A), Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA 94720, USA
- A Rizki
- Life Sciences Division (MS 977-225A), Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA 94720, USA
- MJ Bissell
- Life Sciences Division (MS 977-225A), Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA 94720, USA
- IS Mian
- Life Sciences Division (MS 74-197), Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA 94720-8265, USA
286
Edwards KD, Anderson PE, Hall A, Salathia NS, Locke JCW, Lynn JR, Straume M, Smith JQ, Millar AJ. FLOWERING LOCUS C mediates natural variation in the high-temperature response of the Arabidopsis circadian clock. Plant Cell 2006; 18:639-50. [PMID: 16473970 PMCID: PMC1383639 DOI: 10.1105/tpc.105.038315] [Citation(s) in RCA: 225] [Impact Index Per Article: 12.5] [Indexed: 05/06/2023]
Abstract
Temperature compensation contributes to the accuracy of biological timing by preventing circadian rhythms from running more quickly at high than at low temperatures. We previously identified quantitative trait loci (QTL) with temperature-specific effects on the circadian rhythm of leaf movement, including a QTL linked to the transcription factor FLOWERING LOCUS C (FLC). We have now analyzed FLC alleles in near-isogenic lines and induced mutants to eliminate other candidate genes. We showed that FLC lengthened the circadian period specifically at 27 °C, contributing to temperature compensation of the circadian clock. Known upstream regulators of FLC expression in flowering time pathways similarly controlled its circadian effect. We sought to identify downstream targets of FLC regulation in the molecular mechanism of the circadian clock using genome-wide analysis to identify FLC-responsive genes and 3503 transcripts controlled by the circadian clock. A Bayesian clustering method based on Fourier coefficients allowed us to discriminate putative regulatory genes. Among rhythmic FLC-responsive genes, transcripts of the transcription factor LUX ARRHYTHMO (LUX) correlated in peak abundance with the circadian period in flc mutants. Mathematical modeling indicated that the modest change in peak LUX RNA abundance was sufficient to cause the period change due to FLC, providing a molecular target for the crosstalk between flowering time pathways and circadian regulation.
Affiliation(s)
- Kieron D Edwards
- Institute of Molecular Plant Sciences, University of Edinburgh, Edinburgh, EH9 3JH United Kingdom
287
Auld KL, Brown CR, Casolari JM, Komili S, Silver PA. Genomic Association of the Proteasome Demonstrates Overlapping Gene Regulatory Activity with Transcription Factor Substrates. Mol Cell 2006; 21:861-71. [PMID: 16543154 DOI: 10.1016/j.molcel.2006.02.020] [Citation(s) in RCA: 91] [Impact Index Per Article: 5.1] [Received: 03/25/2005] [Revised: 10/31/2005] [Accepted: 02/21/2006] [Indexed: 12/18/2022]
Abstract
The proteasome can regulate transcription through proteolytic processing of transcription factors and via gene locus binding, but few targets of proteasomal regulation have been identified. Using genome-wide location analysis and transcriptional profiling in Saccharomyces cerevisiae, we have established which genes are bound and regulated by the proteasome and by Spt23 and Mga2, transcription factors activated by the proteasome. We observed proteasome association with gene sets that are highly transcribed, controlled by the mating type loci, and involved in lipid metabolism. At ribosomal protein (RP) genes, proteasome and RNA polymerase II (RNA Pol II) binding was enriched in a proteasome mutant, indicating a role for the proteasome in dissociating elongation complexes. The genomic occupancies of Spt23 and Mga2 overlapped significantly with the genes bound by the proteasome. Finally, the proteasome acts in two distinct ways, one dependent and one independent of Spt23/Mga2 cleavage, providing evidence for cooperative gene regulation by the proteasome and its substrates.
Affiliation(s)
- Kathryn L Auld
- Department of Systems Biology, Harvard Medical School, Boston, Massachusetts 02115, USA
288
Bachand F, Lackner DH, Bähler J, Silver PA. Autoregulation of ribosome biosynthesis by a translational response in fission yeast. Mol Cell Biol 2006; 26:1731-42. [PMID: 16478994 PMCID: PMC1430238 DOI: 10.1128/mcb.26.5.1731-1742.2006] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.2] [Received: 07/07/2005] [Revised: 08/29/2005] [Accepted: 12/05/2005] [Indexed: 11/20/2022]
Abstract
Maintaining the appropriate balance between the small and large ribosomal subunits is critical for translation and cell growth. We previously identified the 40S ribosomal protein S2 (rpS2) as a substrate of the protein arginine methyltransferase 3 (RMT3) and reported a misregulation of the 40S/60S ratio in rmt3 deletion mutants of Schizosaccharomyces pombe. For this study, using DNA microarrays, we have investigated the genome-wide biological response of rmt3-null cells to this ribosomal subunit imbalance. Whereas little change was observed at the transcriptional level, a number of genes showed significant alterations in their polysomal-to-monosomal ratios in rmt3Δ mutants. Importantly, nearly all of the 40S ribosomal protein-encoding mRNAs showed increased ribosome density in rmt3 disruptants. Sucrose gradient analysis also revealed that the ribosomal subunit imbalance detected in rmt3-null cells is due to a deficit in small-subunit levels and can be rescued by rpS2 overexpression. Our results indicate that rmt3-null fission yeast compensate for the reduced levels of small ribosomal subunits by increasing the ribosome density, and likely the translation efficiency, of 40S ribosomal protein-encoding mRNAs. Our findings support the existence of autoregulatory mechanisms that control ribosome biosynthesis and translation as an important layer of gene regulation.
Affiliation(s)
- François Bachand
- Department of Biochemistry, Université de Sherbrooke, Sherbrooke, Québec, Canada.
289
Prelić A, Bleuler S, Zimmermann P, Wille A, Bühlmann P, Gruissem W, Hennig L, Thiele L, Zitzler E. A systematic comparison and evaluation of biclustering methods for gene expression data. Bioinformatics 2006; 22:1122-9. [PMID: 16500941 DOI: 10.1093/bioinformatics/btl060] [Citation(s) in RCA: 334] [Impact Index Per Article: 18.6] [Indexed: 01/10/2023]
Abstract
MOTIVATION In recent years, there have been various efforts to overcome the limitations of standard clustering approaches for the analysis of gene expression data by grouping genes and samples simultaneously. The underlying concept, which is often referred to as biclustering, makes it possible to identify sets of genes sharing compatible expression patterns across subsets of samples, and its usefulness has been demonstrated for different organisms and datasets. Several biclustering methods have been proposed in the literature; however, it is not clear how the different techniques compare with each other with respect to the biological relevance of the clusters as well as other characteristics such as robustness and sensitivity to noise. Accordingly, no guidelines concerning the choice of the biclustering method are currently available. RESULTS First, this paper provides a methodology for comparing and validating biclustering methods that includes a simple binary reference model. Although this model captures the essential features of most biclustering approaches, it is still simple enough to exactly determine all optimal groupings; to this end, we propose a fast divide-and-conquer algorithm (Bimax). Second, we evaluate the performance of five salient biclustering algorithms together with the reference model and a hierarchical clustering method on various synthetic and real datasets for Saccharomyces cerevisiae and Arabidopsis thaliana. The comparison reveals that (1) biclustering in general has advantages over a conventional hierarchical clustering approach, (2) there are considerable performance differences between the tested methods and (3) even the simple reference model delivers relevant patterns in all settings considered.
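In the binary reference model described above, a bicluster is a set of genes and a set of samples whose entries in the binarized expression matrix are all 1, and Bimax enumerates the inclusion-maximal ones. The sketch below only checks whether a given candidate meets that definition; it is not the Bimax divide-and-conquer enumeration, and the function names are hypothetical. Checking single-row and single-column extensions suffices, because any strictly larger all-ones bicluster would contain at least one extra row or column that could be added on its own.

```python
from typing import AbstractSet, Sequence

# Binarized expression matrix: rows = genes, columns = samples, entries 0 or 1.
Matrix = Sequence[Sequence[int]]


def is_all_ones(m: Matrix, rows: AbstractSet[int], cols: AbstractSet[int]) -> bool:
    return all(m[i][j] == 1 for i in rows for j in cols)


def is_inclusion_maximal_bicluster(m: Matrix, rows: AbstractSet[int], cols: AbstractSet[int]) -> bool:
    """True if (rows, cols) is a non-empty all-ones submatrix that cannot be enlarged."""
    if not rows or not cols or not is_all_ones(m, rows, cols):
        return False
    n_rows, n_cols = len(m), len(m[0])
    # No outside gene may be 1 on every selected sample ...
    if any(i not in rows and all(m[i][j] == 1 for j in cols) for i in range(n_rows)):
        return False
    # ... and no outside sample may be 1 for every selected gene.
    if any(j not in cols and all(m[i][j] == 1 for i in rows) for j in range(n_cols)):
        return False
    return True


m = [[1, 1, 0],
     [1, 1, 0],
     [0, 1, 1]]
print(is_inclusion_maximal_bicluster(m, {0, 1}, {0, 1}))  # cannot be extended
print(is_inclusion_maximal_bicluster(m, {0}, {0, 1}))     # gene 1 could be added
```

Overlapping biclusters are allowed under this definition: in the toy matrix, genes {0, 1, 2} on sample {1} is also inclusion-maximal and shares cells with the first candidate.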
Affiliation(s)
- Amela Prelić
- Computer Engineering and Networks Laboratory, ETH Zurich, 8092 Zurich, Switzerland
290
Vêncio RZN, Koide T, Gomes SL, de B Pereira CA. BayGO: Bayesian analysis of ontology term enrichment in microarray data. BMC Bioinformatics 2006; 7:86. [PMID: 16504085 PMCID: PMC1440873 DOI: 10.1186/1471-2105-7-86] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.6] [Received: 10/17/2005] [Accepted: 02/23/2006] [Indexed: 01/21/2023]
Abstract
BACKGROUND The search for enriched (also called over-represented or enhanced) ontology terms in a list of genes obtained from microarray experiments is becoming a standard procedure for system-level analysis. This procedure summarizes the information using classification schemes such as the Gene Ontology and KEGG pathways, instead of focusing on individual genes. Although it is well known in statistics that association and significance are distinct concepts, only the latter approach has been used to deal with the ontology term enrichment problem. RESULTS BayGO implements a Bayesian approach to search for enriched terms in microarray data. The R source code is freely available at http://blasto.iq.usp.br/~tkoide/BayGO in three versions: Linux, which can be easily incorporated into pre-existing pipelines; Windows, to be controlled interactively; and as a web tool. The software was validated using a bacterial heat shock response dataset, since this stress triggers known system-level responses. CONCLUSION The Bayesian model accounts for the fact that not all genes from a given category are necessarily observable in microarray data, owing to low-intensity signal, quality filters, genes that were not spotted and so on. Moreover, BayGO allows one to measure the statistical association between generic ontology terms and differential expression, instead of relying only on the common significance analysis.
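To illustrate association in a Bayesian setting, the sketch below compares the proportion of differentially expressed genes inside and outside one ontology term. It uses independent Beta priors and Monte Carlo draws. This is a generic Beta-binomial sketch with assumed uniform priors, not BayGO's actual model. In particular, it does not model the unobservable genes that BayGO accounts for. The function name `prob_enriched` is hypothetical.

```python
import random


def prob_enriched(de_in_term, n_in_term, de_outside, n_outside,
                  draws=20000, seed=0, a=1.0, b=1.0):
    """Monte Carlo estimate of P(p_term > p_rest | data).

    p_term and p_rest are the proportions of differentially expressed (DE)
    genes inside and outside the term, each with an assumed Beta(a, b) prior.
    Generic sketch only; it is not the BayGO model.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(draws):
        # Posterior draws: Beta(a + successes, b + failures).
        p_term = rng.betavariate(a + de_in_term, b + n_in_term - de_in_term)
        p_rest = rng.betavariate(a + de_outside, b + n_outside - de_outside)
        if p_term > p_rest:
            hits += 1
    return hits / draws
```

A value near 1 suggests that the term is associated with differential expression. Unlike a p-value, the posterior draws of `p_term - p_rest` also give the size of that association.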
Affiliation(s)
- Ricardo ZN Vêncio
- BIOINFO-USP Núcleo de Pesquisas em Bioinformática, Universidade de São Paulo, Rua do Matão 1010, 05508-090 São Paulo, Brazil
- Instituto Israelita de Ensino e Pesquisa Albert Einstein, Hospital Israelita Albert Einstein, Av. Albert Einstein 627, 05651-901 São Paulo, Brazil
- Tie Koide
- Departamento de Bioquímica, Instituto de Química, Universidade de São Paulo, Av. Prof. Lineu Prestes 748, 05508-000 São Paulo, Brazil
- Suely L Gomes
- Departamento de Bioquímica, Instituto de Química, Universidade de São Paulo, Av. Prof. Lineu Prestes 748, 05508-000 São Paulo, Brazil
- Carlos A de B Pereira
- BIOINFO-USP Núcleo de Pesquisas em Bioinformática, Universidade de São Paulo, Rua do Matão 1010, 05508-090 São Paulo, Brazil
- Departamento de Estatística, Instituto de Matemática e Estatística, Universidade de São Paulo, Rua do Matão 1010, 05508-090 São Paulo, Brazil
291
Abstract
Eukaryotic transcription activation domains (ADs) are not well defined on the proteome scale. We systematically tested approximately 6000 yeast proteins for transcriptional activity using a yeast one-hybrid system and identified 451 transcriptional activators. We then determined their transcription activation strength using fusions to the Gal4 DNA-binding domain and a His3 reporter gene whose promoter contained a Gal4-binding site. Among the 132 strongest activators, 32 are known transcription factors, while another 35 have no known function. Although zinc fingers, helix-loop-helix domains and several other domains are highly overrepresented among the activators, only a few contain characterized ADs. We also found some striking correlations: the stronger the activation activity, the more acidic, glutamine-rich, proline-rich or asparagine-rich the activators were. About 29% of the activators have previously been found to interact specifically with the transcription machinery, while 10% are known to be components of transcription regulatory complexes. Based on their transcriptional activity, localization and interaction patterns, at least six previously uncharacterized proteins are suggested to be bona fide transcriptional regulators (namely YFL049W, YJR070C, YDR520C, YGL066W/Sgf73, YKR064W and YCR082W/Ahc2).
Affiliation(s)
- Tomoko Chiba
- Department of Computational Biology, Graduate School of Frontier Sciences, University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa 277-8561, Japan
- Takashi Ito
- Department of Computational Biology, Graduate School of Frontier Sciences, University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa 277-8561, Japan
- Peter Uetz
- To whom correspondence should be addressed. Tel: +49 7247 82 6103; Fax: +49 7247 82 3354
292
Lall S, Grün D, Krek A, Chen K, Wang YL, Dewey CN, Sood P, Colombo T, Bray N, Macmenamin P, Kao HL, Gunsalus KC, Pachter L, Piano F, Rajewsky N. A genome-wide map of conserved microRNA targets in C. elegans. Curr Biol 2006; 16:460-71. [PMID: 16458514 DOI: 10.1016/j.cub.2006.01.050] [Citation(s) in RCA: 346] [Impact Index Per Article: 19.2] [Received: 12/19/2005] [Revised: 01/19/2006] [Accepted: 01/24/2006] [Indexed: 12/19/2022]
Abstract
BACKGROUND Metazoan miRNAs regulate protein-coding genes by binding the 3' UTR of cognate mRNAs. Identifying targets for the 115 known C. elegans miRNAs is essential for understanding their function. RESULTS By using a new version of PicTar and sequence alignments of three nematodes, we predict that miRNAs regulate at least 10% of C. elegans genes through conserved interactions. We have developed a new experimental pipeline to assay 3' UTR-mediated posttranscriptional gene regulation via an endogenous reporter expression system amenable to high-throughput cloning, demonstrating the utility of this system using one of the most intensely studied miRNAs, let-7. Our expression analyses uncover several new potential let-7 targets and suggest a new let-7 activity in head muscle and neurons. To explore genome-wide trends in miRNA function, we analyzed functional categories of predicted target genes, finding that the predicted target gene sets of one-third of C. elegans miRNAs are enriched for specific functional annotations. We have also integrated miRNA target predictions with other functional genomic data from C. elegans. CONCLUSIONS At least 10% of C. elegans genes are predicted miRNA targets, and a number of nematode miRNAs seem to regulate biological processes by targeting functionally related genes. We have also developed and successfully used an in vivo system for testing miRNA target predictions in likely endogenous expression domains. The thousands of genome-wide miRNA target predictions for nematodes, humans, and flies are available from the PicTar website and are linked to an accessible graphical network-browsing tool that allows exploration of miRNA target predictions in the context of various functional genomic data resources.
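The simplest ingredient of miRNA target prediction is seed complementarity. The sketch below derives the 7-nt site complementary to miRNA nucleotides 2 to 8 and counts exact matches in a 3' UTR. It is an illustration under assumptions, not PicTar, which additionally uses cross-species alignments and a probabilistic scoring model. The helper names, the choice of nucleotides 2 to 8 as the seed, and the restriction to Watson-Crick pairs (no G:U wobble) are assumptions for illustration.

```python
LET7 = "UGAGGUAGUAGGUUGUAUAGUU"  # mature let-7 sequence, 5' to 3'

# Watson-Crick pairs only; G:U wobble is deliberately ignored in this sketch.
_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement_rna(seq):
    """Reverse complement of an RNA sequence."""
    return "".join(_PAIR[base] for base in reversed(seq))


def seed_site(mirna, start=2, end=8):
    """Target site complementary to miRNA nucleotides start..end (1-based, inclusive)."""
    return reverse_complement_rna(mirna[start - 1:end])


def count_seed_sites(utr, mirna):
    """Count (possibly overlapping) exact seed-site matches in a 3' UTR sequence."""
    site = seed_site(mirna)
    return sum(1 for i in range(len(utr) - len(site) + 1)
               if utr[i:i + len(site)] == site)
```

For let-7, `seed_site(LET7)` gives `CUACCUC`, the site that `count_seed_sites` scans for in a UTR string.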
Affiliation(s)
- Sabbi Lall
- Center for Comparative Functional Genomics, Department of Biology, New York University, New York, New York 10003, USA
293
Yang C, Zeng E, Li T, Narasimhan G. Clustering genes using gene expression and text literature data. Proceedings. IEEE Computational Systems Bioinformatics Conference 2006:329-40. [PMID: 16447990 DOI: 10.1109/csb.2005.23] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 11/10/2022]
Abstract
Clustering of gene expression data is a standard technique used to identify closely related genes. In this paper, we develop a new clustering algorithm, MSC (Multi-Source Clustering), to perform exploratory analysis using two or more diverse sources of data. In particular, we investigate the problem of improving the clustering by integrating information obtained from gene expression data with knowledge extracted from biomedical text literature. In each iteration of algorithm MSC, an EM-type procedure is employed to bootstrap the model obtained from one data source by starting with the cluster assignments obtained in the previous iteration using the other data sources. Upon convergence, the two individual models are used to construct the final cluster assignment. We compare the results of algorithm MSC for two data sources with the results obtained when the clustering is applied to the two sources of data separately. We also compare it with the results of the feature-level integration method, which performs the clustering after simply concatenating the features obtained from the two data sources. We show that the z-scores of the clustering results from MSC are better than those from the other methods. To further evaluate our clusters, function enrichment results are presented using terms from the Gene Ontology database. Finally, by investigating the success of motif detection programs that use the clusters, we show that our approach integrating gene expression data and text data reveals clusters that are biologically more meaningful than those identified using gene expression data alone.
Affiliation(s)
- Chengyong Yang
- Bioinformatics Research Group, School of Computer Science, Florida International University, Miami, FL 33199, USA.
294
An Improved Statistic for Detecting Over-Represented Gene Ontology Annotations in Gene Sets. Lecture Notes in Computer Science 2006. [DOI: 10.1007/11732990_9] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Indexed: 12/29/2022]
295
Sabatine MS, Liu E, Morrow DA, Heller E, McCarroll R, Wiegand R, Berriz GF, Roth FP, Gerszten RE. Metabolomic identification of novel biomarkers of myocardial ischemia. Circulation 2005; 112:3868-75. [PMID: 16344383 DOI: 10.1161/circulationaha.105.569137] [Citation(s) in RCA: 381] [Impact Index Per Article: 20.1] [Indexed: 11/16/2022]
Abstract
BACKGROUND Recognition of myocardial ischemia is critical both for the diagnosis of coronary artery disease and for the selection and evaluation of therapy. Recent advances in proteomic and metabolic profiling technologies may offer the possibility of identifying novel biomarkers and pathways activated in myocardial ischemia. METHODS AND RESULTS Blood samples were obtained before and after exercise stress testing from 36 patients, 18 of whom demonstrated inducible ischemia (cases) and 18 of whom did not (controls). Plasma was fractionated by liquid chromatography, and profiling of analytes was performed with a high-sensitivity electrospray triple-quadrupole mass spectrometer under selected reaction monitoring conditions. Lactic acid and metabolites involved in skeletal muscle AMP catabolism increased after exercise in both cases and controls. In contrast, there was significant discordant regulation of multiple metabolites that either increased or decreased in cases but remained unchanged in controls. Functional pathway trend analysis with the use of novel software revealed that 6 members of the citric acid pathway were among the 23 most changed metabolites in cases (adjusted P=0.04). Furthermore, changes in 6 metabolites, including citric acid, differentiated cases from controls with a high degree of accuracy (P<0.0001; cross-validated c-statistic=0.83). CONCLUSIONS We report the novel application of metabolomics to acute myocardial ischemia, in which we identified novel biomarkers of ischemia and, through pathway trend analysis, coordinated changes in groups of functionally related metabolites.
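The c-statistic quoted above equals the area under the ROC curve. It is the probability that a randomly chosen case receives a higher score than a randomly chosen control, with ties counted as one half. Below is a minimal sketch; the function name is hypothetical. It computes the plain statistic on given scores and does not reproduce the cross-validation or the six-metabolite model from the study.

```python
def c_statistic(case_scores, control_scores):
    """Fraction of (case, control) pairs in which the case scores higher.

    Ties count as 0.5; the result equals the area under the ROC curve.
    """
    wins = 0.0
    for x in case_scores:
        for y in control_scores:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))
```

A value of 0.5 means the score does no better than chance, and 1.0 means it separates cases from controls perfectly.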
Affiliation(s)
- Marc S Sabatine
- Cardiovascular Division, Brigham and Women's Hospital, Donald W. Reynolds Cardiovascular Clinical Research Center on Atherosclerosis, Harvard Medical School, Boston, Massachusetts, USA
296
Klekota J, Brauner E, Schreiber SL. Identifying Biologically Active Compound Classes Using Phenotypic Screening Data and Sampling Statistics. J Chem Inf Model 2005; 45:1824-36. [PMID: 16309290 DOI: 10.1021/ci050087d] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.7] [Indexed: 11/29/2022]
Abstract
Scoring the activity of compounds in phenotypic high-throughput assays presents a unique challenge because of the limited resolution and inherent measurement error of these assays. Techniques that leverage the structural similarity of compounds within an assay can be used to improve the hit-recovery rate from screening data. A technique is presented that uses clustering and sampling statistics to predict likely compound activity by scoring entire structural classes. A set of phenotypic assays performed against a commercially available compound library was used as a test set. Using the class-scoring technique, the resultant activity prediction scores were more reproducible than individual assay measurements, and class scoring recovered known active compounds more efficiently than individual assay measurements because class scoring had fewer false positives. Known biologically active compounds were recovered 87% of the time using class scores, suggesting a low false-negative rate that compared well to individual assay measurements. In addition, many weak and potentially novel classes of active compounds, overlooked by individual assay measurements, were suggested.
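One way to score a structural class with sampling statistics is to ask how often a random set of compounds of the same size has a mean assay score at least as high as the class. The sketch below assumes that resampling scheme and a mean-score statistic. It is a hypothetical illustration, not the authors' published scoring function, and the name `class_score_pvalue` is made up.

```python
import random
from statistics import mean


def class_score_pvalue(all_scores, member_indices, n_samples=10000, seed=0):
    """Empirical upper-tail p-value for a class's mean assay score.

    The null distribution comes from random same-size samples drawn without
    replacement from the whole library (assumed scheme, for illustration).
    """
    rng = random.Random(seed)
    k = len(member_indices)
    observed = mean(all_scores[i] for i in member_indices)
    as_extreme = 0
    for _ in range(n_samples):
        if mean(rng.sample(all_scores, k)) >= observed:
            as_extreme += 1
    # Add-one correction keeps the estimate strictly positive.
    return (as_extreme + 1) / (n_samples + 1)
```

Scoring the class as a whole averages out individual measurement error. That is the intuition behind the reproducibility gains reported above.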
Affiliation(s)
- Justin Klekota
- Howard Hughes Medical Institute, Harvard Institute of Chemistry and Cell Biology, Broad Institute of Harvard and MIT, Harvard University, 12 Oxford Street, Cambridge, Massachusetts 02138, USA.
297
Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES, Mesirov JP. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A 2005; 102:15545-50. [PMID: 16199517 PMCID: PMC1239896 DOI: 10.1073/pnas.0506580102] [Citation(s) in RCA: 32211] [Impact Index Per Article: 1695.3] [Indexed: 12/11/2022]
Abstract
Although genomewide RNA expression analysis has become a routine tool in biomedical research, extracting biological insight from such information remains a major challenge. Here, we describe a powerful analytical method called Gene Set Enrichment Analysis (GSEA) for interpreting gene expression data. The method derives its power by focusing on gene sets, that is, groups of genes that share common biological function, chromosomal location, or regulation. We demonstrate how GSEA yields insights into several cancer-related data sets, including leukemia and lung cancer. Notably, where single-gene analysis finds little similarity between two independent studies of patient survival in lung cancer, GSEA reveals many biological pathways in common. The GSEA method is embodied in a freely available software package, together with an initial database of 1,325 biologically defined gene sets.
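The core of GSEA is a weighted running-sum (Kolmogorov-Smirnov-like) enrichment score. Walking down the ranked gene list, the sum increases at genes in the set, in proportion to |r_j|^p, and decreases by a constant at genes outside it. The score is the maximum deviation from zero. Below is a minimal sketch with weight exponent p = 1 by default. It omits the phenotype-permutation null, the normalization and the multiple-testing steps of the full method, and the function name is hypothetical.

```python
def enrichment_score(ranked_genes, correlations, gene_set, p=1.0):
    """GSEA-style weighted running-sum enrichment score.

    ranked_genes: gene identifiers sorted from most positively to most
    negatively correlated with the phenotype.
    correlations: the matching correlation values, in the same order.
    gene_set: collection of gene identifiers defining the set.
    """
    in_set = [gene in gene_set for gene in ranked_genes]
    n_total = len(ranked_genes)
    n_hits = sum(in_set)
    if n_hits == 0 or n_hits == n_total:
        raise ValueError("gene set must contain some, but not all, ranked genes")
    # Normalizer so that the hit increments sum to 1.
    hit_norm = sum(abs(r) ** p for r, hit in zip(correlations, in_set) if hit)
    if hit_norm == 0:
        raise ValueError("all genes in the set have zero correlation")
    miss_step = 1.0 / (n_total - n_hits)  # the misses decrement to a total of 1
    running, max_dev = 0.0, 0.0
    for r, hit in zip(correlations, in_set):
        running += abs(r) ** p / hit_norm if hit else -miss_step
        if abs(running) > abs(max_dev):
            max_dev = running
    return max_dev
```

For a ranked list `a, b, c, d` with correlations `2, 1, -1, -2`, the set `{a, b}` gives a score of 1.0 and the set `{c, d}` gives -1.0. The sign shows at which end of the ranking the set is concentrated.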
Affiliation(s)
- Aravind Subramanian
- Broad Institute of Massachusetts Institute of Technology and Harvard, 320 Charles Street, Cambridge, MA 02141, USA
298
Rual JF, Venkatesan K, Hao T, Hirozane-Kishikawa T, Dricot A, Li N, Berriz GF, Gibbons FD, Dreze M, Ayivi-Guedehoussou N, Klitgord N, Simon C, Boxem M, Milstein S, Rosenberg J, Goldberg DS, Zhang LV, Wong SL, Franklin G, Li S, Albala JS, Lim J, Fraughton C, Llamosas E, Cevik S, Bex C, Lamesch P, Sikorski RS, Vandenhaute J, Zoghbi HY, Smolyar A, Bosak S, Sequerra R, Doucette-Stamm L, Cusick ME, Hill DE, Roth FP, Vidal M. Towards a proteome-scale map of the human protein-protein interaction network. Nature 2005; 437:1173-8. [PMID: 16189514 DOI: 10.1038/nature04209] [Citation(s) in RCA: 2000] [Impact Index Per Article: 105.3] [Received: 07/21/2005] [Accepted: 09/08/2005] [Indexed: 12/29/2022]
Abstract
Systematic mapping of protein-protein interactions, or 'interactome' mapping, was initiated in model organisms, starting with defined biological processes and then expanding to the scale of the proteome. Although far from complete, such maps have revealed global topological and dynamic features of interactome networks that relate to known biological properties, suggesting that a human interactome map will provide insight into development and disease mechanisms at a systems level. Here we describe an initial version of a proteome-scale map of human binary protein-protein interactions. Using a stringent, high-throughput yeast two-hybrid system, we tested pairwise interactions among the products of approximately 8,100 currently available Gateway-cloned open reading frames and detected approximately 2,800 interactions. This data set, called CCSB-HI1, has a verification rate of approximately 78% as revealed by an independent co-affinity purification assay, and correlates significantly with other biological attributes. The CCSB-HI1 data set increases by approximately 70% the set of available binary interactions within the tested space and reveals more than 300 new connections to over 100 disease-associated proteins. This work represents an important step towards a systematic and comprehensive human interactome project.
Affiliation(s)
- Jean-François Rual
- Center for Cancer Systems Biology and Department of Cancer Biology, Dana-Farber Cancer Institute, Harvard Medical School, 44 Binney Street, Boston, Massachusetts 02115, USA
299
Tian L, Greenberg SA, Kong SW, Altschuler J, Kohane IS, Park PJ. Discovering statistically significant pathways in expression profiling studies. Proc Natl Acad Sci U S A 2005; 102:13544-9. [PMID: 16174746 PMCID: PMC1200092 DOI: 10.1073/pnas.0506577102] [Citation(s) in RCA: 448] [Impact Index Per Article: 23.6] [Indexed: 11/18/2022]
Abstract
Accurate and rapid identification of perturbed pathways through the analysis of genome-wide expression profiles facilitates the generation of biological hypotheses. We propose a statistical framework for determining whether a specified group of genes for a pathway has a coordinated association with a phenotype of interest. Several issues concerning proper hypothesis-testing procedures are clarified. In particular, it is shown that differences in the correlation structure of each set of genes can lead to a biased comparison among gene sets unless a normalization procedure is applied. We propose statistical tests for two important but different aspects of association for each group of genes. This approach has more statistical power than currently available methods and can result in the discovery of statistically significant pathways that are not detected by other methods. This method is applied to data sets involving diabetes, inflammatory myopathies, and Alzheimer's disease, using gene sets we compiled from various public databases. In the case of inflammatory myopathies, we have correctly identified the known cytotoxic T lymphocyte-mediated autoimmunity in inclusion body myositis. Furthermore, we predicted the presence of dendritic cells in inclusion body myositis and of an IFN-alpha/beta response in dermatomyositis, neither of which was previously described. These predictions have subsequently been corroborated by immunohistochemistry.
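Below is a minimal sketch of one kind of gene-set test: a sample-label permutation test. It assumes the set statistic is the mean of per-gene Welch t statistics. The exact statistics, normalization and gene-permutation test of Tian et al. are not reproduced, and all names here are hypothetical. Each group needs at least two samples with non-constant values so that the standard deviations are defined and non-zero.

```python
import random
from statistics import mean, stdev


def welch_t(x, y):
    """Two-sample Welch t statistic (needs >= 2 non-constant values per group)."""
    se = (stdev(x) ** 2 / len(x) + stdev(y) ** 2 / len(y)) ** 0.5
    return (mean(x) - mean(y)) / se


def gene_set_statistic(expr, labels, gene_set):
    """Mean of per-gene t statistics (an assumed set statistic, for illustration).

    expr maps gene -> list of values per sample; labels holds 0/1 per sample.
    """
    stats = []
    for gene in gene_set:
        values = expr[gene]
        group1 = [v for v, lab in zip(values, labels) if lab == 1]
        group0 = [v for v, lab in zip(values, labels) if lab == 0]
        stats.append(welch_t(group1, group0))
    return mean(stats)


def sample_permutation_pvalue(expr, labels, gene_set, n_perm=2000, seed=0):
    """Two-sided p-value from shuffling the phenotype labels across samples."""
    rng = random.Random(seed)
    observed = abs(gene_set_statistic(expr, labels, gene_set))
    shuffled = list(labels)
    as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(gene_set_statistic(expr, shuffled, gene_set)) >= observed:
            as_extreme += 1
    return (as_extreme + 1) / (n_perm + 1)
```

Shuffling whole sample labels keeps the correlation among the genes in the set intact. This is one reason sample-permutation tests are favored when genes within a set are correlated.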
Affiliation(s)
- Lu Tian
- Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, 680 North Lake Shore Drive, Chicago, IL 60611, USA
300
Khatri P, Sellamuthu S, Malhotra P, Amin K, Done A, Draghici S. Recent additions and improvements to the Onto-Tools. Nucleic Acids Res 2005; 33:W762-5. [PMID: 15980579 PMCID: PMC1160233 DOI: 10.1093/nar/gki472] [Citation(s) in RCA: 95] [Impact Index Per Article: 5.0] [Indexed: 11/14/2022]
Abstract
The Onto-Tools suite is composed of an annotation database and six seamlessly integrated, web-accessible data mining tools: Onto-Express, Onto-Compare, Onto-Design, Onto-Translate, Onto-Miner and Pathway-Express. The Onto-Tools database has been expanded to include various types of data from 12 new databases, and it now integrates different types of genomic data from 19 sequence, gene, protein and annotation databases. The database has also been expanded to include complete Gene Ontology (GO) annotations. Using the enhanced database and GO annotations, Onto-Express now allows functional profiling for 24 organisms and supports 17 different types of input IDs. Onto-Translate has also been enhanced to fully use the capabilities of the new Onto-Tools database, with the ultimate goal of providing users with a non-redundant and complete mapping from any type of identification system to any other type. Currently, Onto-Translate allows arbitrary mappings between 29 types of IDs. Pathway-Express is a new tool that helps users find the most interesting pathways for their input list of genes. Onto-Tools are freely available at http://vortex.cs.wayne.edu/Projects.html.
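Over-representation of an annotation term in an input gene list is commonly assessed with a hypergeometric (one-sided Fisher) test. This is one of the statistics used by functional-profiling tools of this kind. Below is a minimal sketch of the upper-tail p-value using only the standard library. The function name is hypothetical, and no claim is made about Onto-Express's default test or its multiple-testing correction.

```python
from math import comb


def hypergeom_upper_tail(k, n_selected, n_annotated, n_total):
    """P(X >= k) for a hypergeometric X.

    n_selected genes are drawn without replacement from n_total genes, of
    which n_annotated carry the term; X counts annotated genes among them.
    """
    upper = min(n_selected, n_annotated)
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible configurations contribute nothing.
    favorable = sum(comb(n_annotated, i) * comb(n_total - n_annotated, n_selected - i)
                    for i in range(k, upper + 1))
    return favorable / comb(n_total, n_selected)
```

For instance, drawing 2 genes from 4, of which 2 are annotated, gives both annotated genes with probability 1/6. A term hit by many more input genes than expected gets a small upper-tail probability.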
Affiliation(s)
- Sorin Draghici
- To whom correspondence should be addressed. Tel: +1 313 577 5484; Fax: +1 313 577 6868