Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For:	Berriz GF, King OD, Bryant B, Sander C, Roth FP. Characterizing gene sets with FuncAssociate. Bioinformatics 2004;19:2502-4. [PMID: 14668247 DOI: 10.1093/bioinformatics/btg363] [Citation(s) in RCA: 372] [Impact Index Per Article: 18.6] [Indexed: 11/13/2022] Open

Number

Cited by Other Article(s)

251

Nettleton D, Recknor J, Reecy JM. Identification of differentially expressed gene categories in microarray studies using nonparametric multivariate analysis. Bioinformatics 2007;24:192-201. [PMID: 18042553 DOI: 10.1093/bioinformatics/btm583] [Citation(s) in RCA: 49] [Impact Index Per Article: 2.9] [Indexed: 12/30/2022]

252

Marco A, Marín I. A general strategy to determine the congruence between a hierarchical and a non-hierarchical classification. BMC Bioinformatics 2007;8:442. [PMID: 18005402 PMCID: PMC2213689 DOI: 10.1186/1471-2105-8-442] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 04/27/2007] [Accepted: 11/15/2007] [Indexed: 11/18/2022] Open

253

Zhong S, Xie D. Gene Ontology analysis in multiple gene clusters under multiple hypothesis testing framework. Artif Intell Med 2007;41:105-15. [PMID: 17913480 DOI: 10.1016/j.artmed.2007.08.002] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 12/02/2006] [Revised: 08/02/2007] [Accepted: 08/03/2007] [Indexed: 10/22/2022]

Abstract

OBJECTIVE

Gene Ontology (GO) has become a routine resource for functional analysis of gene lists. Although a number of tools have been provided to identify enriched GO terms in one or two gene lists, two technical challenges remain. First, how to handle multiple hypothesis testing in the analysis given that the tests are heavily correlated; second, how to identify GO terms that are enriched in a gene cluster, as compared to multiple other gene clusters. We provide a statistical procedure to rigorously treat these problems and offer a software tool for applying GO to the analysis of gene clusters.

METHODS

We previously introduced a statistical procedure that handles hypothesis testing in a two-group comparison scenario. In this paper we extend the two-group comparison procedure into a general procedure that enables the analysis of any number of gene lists or clusters. The new procedure identifies GO terms enriched in any gene cluster while controlling for multiple hypothesis testing. It is implemented in a user-friendly analysis tool, GoSurfer. The current version of GoSurfer takes one or several gene lists as input and identifies the GO terms that are enriched in any of the input gene lists. GoSurfer estimates a conservative false discovery rate (FDR) for every GO term. The FDR estimation procedure in GoSurfer has two advantages: it does not rely on an independence assumption, and it does not assume that all hypotheses are null (the complete null). Thus GoSurfer's FDR estimates are mildly conservative rather than overly conservative.

RESULTS

We implemented the new procedure for GO analysis of multiple gene clusters in the GoSurfer software. We provide three examples of using GoSurfer to analyze time course gene expression data sets on the differentiation of embryonic stem cells. In the example analysis of multiple gene clusters, we first used a typical clustering algorithm and identified five gene clusters, representing up-regulation, down-regulation and other patterns in the differentiation time course. Taking all five gene clusters as input data, GoSurfer reports "cell adhesion" and "muscle contraction" as significant GO terms for the up-regulated cluster and "amino acids metabolism" as a significant GO term for the down-regulated cluster. It also reports a number of GO terms related to RNA processing and RNA transport as significant for a cluster that is up-regulated at both early and late time points. This may suggest that genes for RNA processing and genes for RNA transport are coregulated in the differentiation process of embryonic stem cells.

CONCLUSION

The GoSurfer software is provided to analyze multiple gene clusters and identify GO terms that are enriched in any gene cluster. GoSurfer is available at: www.gosurfer.org.
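
As a rough illustration of what "a GO term enriched in a gene cluster" means, the sketch below computes a one-sided hypergeometric p-value for the overlap between a cluster and the genes annotated to one term. This is a commonly used enrichment test and is only an assumption here: the abstract does not state which test statistic GoSurfer uses, and the sketch does not reproduce GoSurfer's FDR estimation. The function name and the counts in the usage example are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, cluster, overlap):
    """One-sided p-value P(X >= overlap) under a hypergeometric null.

    population: number of genes in the background set
    annotated:  number of background genes annotated to the GO term
    cluster:    number of genes in the cluster being tested
    overlap:    number of cluster genes annotated to the GO term
    """
    upper = min(annotated, cluster)
    total = comb(population, cluster)
    # Sum the probability of every overlap at least as large as the observed one.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, cluster - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical counts, not taken from the paper.
    p = hypergeom_enrichment_p(population=6000, annotated=120, cluster=200, overlap=15)
    print(f"enrichment p-value: {p:.3g}")
```

A p-value like this is computed once per GO term and per cluster, which is why the multiple-testing and correlation issues raised in the abstract matter.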

254

Gupta M, Ibrahim JG. Variable Selection in Regression Mixture Modeling for the Discovery of Gene Regulatory Networks. J Am Stat Assoc 2007. [DOI: 10.1198/016214507000000068] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.7] [Indexed: 11/21/2022]

255

Witten JT, Chen CTL, Cohen BA. Complex genetic changes in strains of Saccharomyces cerevisiae derived by selection in the laboratory. Genetics 2007;177:449-56. [PMID: 17660538 PMCID: PMC2013722 DOI: 10.1534/genetics.107.077859] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 12/22/2022] Open

256

Zhou X, Su Z. EasyGO: Gene Ontology-based annotation and functional enrichment analysis tool for agronomical species. BMC Genomics 2007;8:246. [PMID: 17645808 PMCID: PMC1940007 DOI: 10.1186/1471-2164-8-246] [Citation(s) in RCA: 119] [Impact Index Per Article: 7.0] [Received: 04/04/2007] [Accepted: 07/24/2007] [Indexed: 11/10/2022] Open

257

Reimand J, Kull M, Peterson H, Hansen J, Vilo J. g:Profiler--a web-based toolset for functional profiling of gene lists from large-scale experiments. Nucleic Acids Res 2007;35:W193-200. [PMID: 17478515 PMCID: PMC1933153 DOI: 10.1093/nar/gkm226] [Citation(s) in RCA: 859] [Impact Index Per Article: 50.5] [Received: 01/31/2007] [Revised: 03/22/2007] [Accepted: 03/28/2007] [Indexed: 02/02/2023] Open

258

Lerman G, Shakhnovich BE. Defining functional distance using manifold embeddings of gene ontology annotations. Proc Natl Acad Sci U S A 2007;104:11334-9. [PMID: 17595300 PMCID: PMC2040899 DOI: 10.1073/pnas.0702965104] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.4] [Indexed: 11/18/2022] Open

259

Huang DW, Sherman BT, Tan Q, Kir J, Liu D, Bryant D, Guo Y, Stephens R, Baseler MW, Lane HC, Lempicki RA. DAVID Bioinformatics Resources: expanded annotation database and novel algorithms to better extract biology from large gene lists. Nucleic Acids Res 2007;35:W169-75. [PMID: 17576678 PMCID: PMC1933169 DOI: 10.1093/nar/gkm415] [Citation(s) in RCA: 1567] [Impact Index Per Article: 92.2] [Indexed: 12/20/2022] Open

Affiliation(s)

Da Wei Huang, Brad T. Sherman, Qina Tan, Joseph Kir, David Liu, David Bryant, Yongjian Guo, Robert Stephens, Michael W. Baseler, H. Clifford Lane, Richard A. Lempicki* (the same three affiliations are listed for each author):
Laboratory of Immunopathogenesis and Bioinformatics, Advanced Biomedical Computing Center, Clinical Services Program, SAIC-Frederick, Inc., National Cancer Institute at Frederick, MD 21702, USA
Laboratory of Immunoregulation, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, 20892, USA
Bioinformatics and Scientific IT Program, NIAID Office of Technology Information Systems, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, 20892, USA
*To whom correspondence should be addressed. +1-301-846-7114; 301-846-7672

260

Kramer RW, Slagowski NL, Eze NA, Giddings KS, Morrison MF, Siggers KA, Starnbach MN, Lesser CF. Yeast functional genomic screens lead to identification of a role for a bacterial effector in innate immunity regulation. PLoS Pathog 2007;3:e21. [PMID: 17305427 PMCID: PMC1797620 DOI: 10.1371/journal.ppat.0030021] [Citation(s) in RCA: 73] [Impact Index Per Article: 4.3] [Received: 10/16/2006] [Accepted: 01/02/2007] [Indexed: 11/18/2022] Open

Abstract

Numerous bacterial pathogens manipulate host cell processes to promote infection and ultimately cause disease through the action of proteins that they directly inject into host cells. Identification of the targets and molecular mechanisms of action used by these bacterial effector proteins is critical to understanding pathogenesis. We have developed a systems biological approach using the yeast Saccharomyces cerevisiae that can expedite the identification of cellular processes targeted by bacterial effector proteins. We systematically screened the viable yeast haploid deletion strain collection for mutants hypersensitive to expression of the Shigella type III effector OspF. Statistical data mining of the results identified several cellular processes, including cell wall biogenesis, which when impaired by a deletion caused yeast to be hypersensitive to OspF expression. Microarray experiments revealed that OspF expression resulted in reversed regulation of genes regulated by the yeast cell wall integrity pathway. The yeast cell wall integrity pathway is a highly conserved mitogen-activated protein kinase (MAPK) signaling pathway, normally activated in response to cell wall perturbations. Together these results led us to hypothesize and subsequently demonstrate that OspF inhibited both yeast and mammalian MAPK signaling cascades. Furthermore, inhibition of MAPK signaling by OspF is associated with attenuation of the host innate immune response to Shigella infection in a mouse model. These studies demonstrate how yeast systems biology can facilitate functional characterization of pathogenic bacterial effector proteins.

Many bacterial pathogens use specialized secretion systems to deliver effector proteins directly into host cells. The effector proteins mediate the subversion or inhibition of host cell processes to promote survival of the pathogens. Although these proteins are critical elements of pathogenesis, relatively few are well characterized. They often lack significant homology to proteins of known function, and they present special challenges, biological and practical, to study in vivo. For example, their functions often appear to be redundant or synergistic, and the organisms that produce them can be dangerous or difficult to culture, requiring special facilities. The yeast Saccharomyces cerevisiae has recently emerged as a model system to both identify and functionally characterize effector proteins. This work describes how genome-wide phenotypic screens and mRNA profiling of yeast expressing the Shigella effector OspF led to the discovery that OspF inhibits mitogen-activated protein kinase signaling in both yeast and mammalian cells. This inhibition of mitogen-activated protein kinase signaling is associated with attenuation of the host innate immune response. This study demonstrates how yeast functional genomic studies can contribute to the understanding of pathogenic effector proteins.

261

Newton MA, Quintana FA, den Boon JA, Sengupta S, Ahlquist P. Random-set methods identify distinct aspects of the enrichment signal in gene-set analysis. Ann Appl Stat 2007. [DOI: 10.1214/07-aoas104] [Citation(s) in RCA: 175] [Impact Index Per Article: 10.3] [Indexed: 11/19/2022]

262

Vermeirssen V, Barrasa MI, Hidalgo CA, Babon JAB, Sequerra R, Doucette-Stamm L, Barabási AL, Walhout AJ. Transcription factor modularity in a gene-centered C. elegans core neuronal protein-DNA interaction network. Genome Res 2007;17:1061-71. [PMID: 17513831 PMCID: PMC1899117 DOI: 10.1101/gr.6148107] [Citation(s) in RCA: 81] [Impact Index Per Article: 4.8] [Indexed: 12/12/2022]

263

Ni JZ, Grate L, Donohue JP, Preston C, Nobida N, O’Brien G, Shiue L, Clark TA, Blume JE, Ares M. Ultraconserved elements are associated with homeostatic control of splicing regulators by alternative splicing and nonsense-mediated decay. Genes Dev 2007;21:708-18. [PMID: 17369403 PMCID: PMC1820944 DOI: 10.1101/gad.1525507] [Citation(s) in RCA: 381] [Impact Index Per Article: 22.4] [Indexed: 02/03/2023]

264

Carmona-Saez P, Chagoyen M, Tirado F, Carazo JM, Pascual-Montano A. GENECODIS: a web-based tool for finding significant concurrent annotations in gene lists. Genome Biol 2007;8:R3. [PMID: 17204154 PMCID: PMC1839127 DOI: 10.1186/gb-2007-8-1-r3] [Citation(s) in RCA: 493] [Impact Index Per Article: 29.0] [Received: 07/03/2006] [Revised: 09/29/2006] [Accepted: 01/04/2007] [Indexed: 12/01/2022] Open

265

Wiest MM, Watkins SM. Biomarker discovery using high-dimensional lipid analysis. Curr Opin Lipidol 2007;18:181-6. [PMID: 17353667 DOI: 10.1097/mol.0b013e3280895d82] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.8] [Indexed: 11/26/2022]

266

Liu J, Hughes-Oliver JM, Menius JA. Domain-enhanced analysis of microarray data using GO annotations. Bioinformatics 2007;23:1225-34. [PMID: 17379692 DOI: 10.1093/bioinformatics/btm092] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.2] [Indexed: 11/14/2022] Open

267

Ye C, Eskin E. Discovering tightly regulated and differentially expressed gene sets in whole genome expression data. Bioinformatics 2007;23:e84-90. [PMID: 17237110 DOI: 10.1093/bioinformatics/btl315] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Indexed: 02/01/2023] Open

268

Bresell A, Servenius B, Persson B. Ontology annotation treebrowser: an interactive tool where the complementarity of medical subject headings and gene ontology improves the interpretation of gene lists. Appl Bioinformatics 2007;5:225-36. [PMID: 17140269 DOI: 10.2165/00822942-200605040-00005] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Indexed: 11/02/2022]

Abstract

Gene expression and proteomics analyses allow the investigation of thousands of biomolecules in parallel. This results in a long list of interesting genes or proteins and a list of annotation terms in the order of thousands. It is not a trivial task to understand such a gene list and it would require extensive efforts to bring together the overwhelming amounts of associated information from the literature and databases. Thus, it is evident that we need ways of condensing and filtering this information. An excellent way to represent knowledge is to use ontologies, where it is possible to group genes or terms with overlapping context, rather than studying one-dimensional lists of keywords. Therefore, we have built the ontology annotation treebrowser (OAT) to represent, condense, filter and summarise the knowledge associated with a list of genes or proteins. The OAT system consists of two separate parts: a MySQL database named OATdb, and a treebrowser engine that is implemented as a web interface. The OAT system is implemented using Perl scripts on an Apache web server and the gene, ontology and annotation data is stored in a relational MySQL database. In OAT, we have harmonized the two ontologies of medical subject headings (MeSH) and gene ontology (GO), to enable us to use knowledge both from the literature and the annotation projects in the same tool. OAT includes multiple gene identifier sets, which are merged internally in the OAT database. We have also generated novel MeSH annotations by mapping accession numbers to MEDLINE entries. The ontology browser OAT was created to facilitate the analysis of gene lists. It can be browsed dynamically, so that a scientist can interact with the data and govern the outcome. Test statistics show which branches are enriched. We also show that the two ontologies complement each other, with surprisingly low overlap, by mapping annotations to the Unified Medical Language System.
We have developed a novel interactive annotation browser that is the first to incorporate both MeSH and GO for improved interpretation of gene lists. With OAT, we illustrate the benefits of combining MeSH and GO for understanding gene lists. OAT is available as a public web service at: http://www.ifm.liu.se/bioinfo/oat.

269

Schmidt MW, Houseman A, Ivanov AR, Wolf DA. Comparative proteomic and transcriptomic profiling of the fission yeast Schizosaccharomyces pombe. Mol Syst Biol 2007;3:79. [PMID: 17299416 PMCID: PMC1828747 DOI: 10.1038/msb4100117] [Citation(s) in RCA: 97] [Impact Index Per Article: 5.7] [Received: 08/15/2006] [Accepted: 12/13/2006] [Indexed: 02/04/2023] Open

270

Prüfer K, Muetzel B, Do HH, Weiss G, Khaitovich P, Rahm E, Pääbo S, Lachmann M, Enard W. FUNC: a package for detecting significant associations between gene sets and ontological annotations. BMC Bioinformatics 2007;8:41. [PMID: 17284313 PMCID: PMC1800870 DOI: 10.1186/1471-2105-8-41] [Citation(s) in RCA: 149] [Impact Index Per Article: 8.8] [Received: 09/04/2006] [Accepted: 02/06/2007] [Indexed: 11/17/2022] Open

271

Henegar C, Cancello R, Rome S, Vidal H, Clément K, Zucker JD. Clustering biological annotations and gene expression data to identify putatively co-regulated biological processes. J Bioinform Comput Biol 2007;4:833-52. [PMID: 17007070 DOI: 10.1142/s0219720006002181] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Received: 11/22/2005] [Revised: 03/04/2006] [Accepted: 03/24/2006] [Indexed: 01/04/2023]

272

Kirov SA, Zhang B, Snoddy JR. Association analysis for large-scale gene set data. Methods Mol Biol 2007;408:19-33. [PMID: 18314575 DOI: 10.1007/978-1-59745-547-3_2] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Indexed: 05/26/2023]

273

Huang DW, Sherman BT, Tan Q, Collins JR, Alvord WG, Roayaei J, Stephens R, Baseler MW, Lane HC, Lempicki RA. The DAVID Gene Functional Classification Tool: a novel biological module-centric algorithm to functionally analyze large gene lists. Genome Biol 2007;8:R183. [PMID: 17784955 PMCID: PMC2375021 DOI: 10.1186/gb-2007-8-9-r183] [Citation(s) in RCA: 1687] [Impact Index Per Article: 99.2] [Received: 02/05/2007] [Revised: 04/20/2007] [Accepted: 09/04/2007] [Indexed: 12/16/2022] Open

274

Dopazo J. Functional interpretation of microarray experiments. OMICS-A JOURNAL OF INTEGRATIVE BIOLOGY 2006;10:398-410. [PMID: 17069516 DOI: 10.1089/omi.2006.10.398] [Citation(s) in RCA: 64] [Impact Index Per Article: 3.6] [Indexed: 01/08/2023]

275

McCarthy FM, Bridges SM, Wang N, Magee GB, Williams WP, Luthe DS, Burgess SC. AgBase: a unified resource for functional analysis in agriculture. Nucleic Acids Res 2006;35:D599-603. [PMID: 17135208 PMCID: PMC1751552 DOI: 10.1093/nar/gkl936] [Citation(s) in RCA: 84] [Impact Index Per Article: 4.7] [Indexed: 11/13/2022] Open

276

Liu X, Wang L. Computing the maximum similarity bi-clusters of gene expression data. Bioinformatics 2006;23:50-6. [PMID: 17090578 DOI: 10.1093/bioinformatics/btl560] [Citation(s) in RCA: 102] [Impact Index Per Article: 5.7] [Indexed: 11/14/2022] Open

277

Wang H, Chua NH, Wang XJ. Prediction of trans-antisense transcripts in Arabidopsis thaliana. Genome Biol 2006;7:R92. [PMID: 17040561 PMCID: PMC1794575 DOI: 10.1186/gb-2006-7-10-r92] [Citation(s) in RCA: 65] [Impact Index Per Article: 3.6] [Received: 08/01/2006] [Revised: 10/02/2006] [Accepted: 10/13/2006] [Indexed: 11/10/2022] Open

278

Kankainen M, Brader G, Törönen P, Palva ET, Holm L. Identifying functional gene sets from hierarchically clustered expression data: map of abiotic stress regulated genes in Arabidopsis thaliana. Nucleic Acids Res 2006;34:e124. [PMID: 17003050 PMCID: PMC1636450 DOI: 10.1093/nar/gkl694] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Indexed: 11/14/2022] Open

279

Auld KL, Hitchcock AL, Doherty HK, Frietze S, Huang LS, Silver PA. The conserved ATPase Get3/Arr4 modulates the activity of membrane-associated proteins in Saccharomyces cerevisiae. Genetics 2006;174:215-27. [PMID: 16816426 PMCID: PMC1569774 DOI: 10.1534/genetics.106.058362] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.4] [Received: 03/17/2006] [Accepted: 06/19/2006] [Indexed: 01/09/2023] Open

280

Antonov AV, Mewes HW. Complex functionality of gene groups identified from high-throughput data. J Mol Biol 2006;363:289-96. [PMID: 16959266 DOI: 10.1016/j.jmb.2006.07.062] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.9] [Received: 04/12/2006] [Revised: 07/24/2006] [Accepted: 07/25/2006] [Indexed: 12/19/2022]

281

Zhong S, Tian L, Li C, Storch KF, Wong WH. Comparative analysis of gene sets in the Gene Ontology space under the multiple hypothesis testing framework. PROCEEDINGS. IEEE COMPUTATIONAL SYSTEMS BIOINFORMATICS CONFERENCE 2006:425-35. [PMID: 16448035 DOI: 10.1109/csb.2004.1332455] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.0] [Indexed: 11/08/2022]

282

Zheng X, Baker H, Hancock WS. Analysis of the low molecular weight serum peptidome using ultrafiltration and a hybrid ion trap-Fourier transform mass spectrometer. J Chromatogr A 2006;1120:173-84. [PMID: 16527286 DOI: 10.1016/j.chroma.2006.01.098] [Citation(s) in RCA: 105] [Impact Index Per Article: 5.8] [Received: 07/11/2005] [Revised: 11/18/2005] [Accepted: 01/24/2006] [Indexed: 11/26/2022]

283

Reiss DJ, Baliga NS, Bonneau R. Integrated biclustering of heterogeneous genome-wide datasets for the inference of global regulatory networks. BMC Bioinformatics 2006;7:280. [PMID: 16749936 PMCID: PMC1502140 DOI: 10.1186/1471-2105-7-280] [Citation(s) in RCA: 197] [Impact Index Per Article: 10.9] [Received: 05/12/2006] [Accepted: 06/02/2006] [Indexed: 12/23/2022] Open

Abstract

Background

The learning of global genetic regulatory networks from expression data is a severely under-constrained problem that is aided by reducing the dimensionality of the search space by means of clustering genes into putatively co-regulated groups, as opposed to those that are simply co-expressed. Because genes may be co-regulated only across a subset of all observed experimental conditions, biclustering (clustering of genes and conditions) is more appropriate than standard clustering. Co-regulated genes are also often functionally (physically, spatially, genetically, and/or evolutionarily) associated, and such a priori known or pre-computed associations can provide support for appropriately grouping genes. One important association is the presence of one or more common cis-regulatory motifs. In organisms where these motifs are not known, their de novo detection, integrated into the clustering algorithm, can help to guide the process towards more biologically parsimonious solutions.

Results

We have developed an algorithm, cMonkey, that detects putative co-regulated gene groupings by integrating the biclustering of gene expression data and various functional associations with the de novo detection of sequence motifs.

Conclusion

We have applied this procedure to the archaeon Halobacterium NRC-1, as part of our efforts to decipher its regulatory network. In addition, we used cMonkey on public data for three organisms in the other two domains of life: Helicobacter pylori, Saccharomyces cerevisiae, and Escherichia coli. The biclusters detected by cMonkey both recapitulated known biology and enabled novel predictions (some for Halobacterium were subsequently confirmed in the laboratory). For example, it identified the bacteriorhodopsin regulon, assigned additional genes to this regulon with apparently unrelated function, and detected its known promoter motif. We have performed a thorough comparison of cMonkey results against other clustering methods, and find that cMonkey biclusters are more parsimonious with all available evidence for co-regulation.

284

Prelić A, Bleuler S, Zimmermann P, Wille A, Bühlmann P, Gruissem W, Hennig L, Thiele L, Zitzler E. A systematic comparison and evaluation of biclustering methods for gene expression data. Bioinformatics 2006. [DOI: 10.1093/bioinformatics/btl060] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open

285

Semeiks JR, Rizki A, Bissell MJ, Mian IS. Ensemble attribute profile clustering: discovering and characterizing groups of genes with similar patterns of biological features. BMC Bioinformatics 2006;7:147. [PMID: 16542449 PMCID: PMC1435935 DOI: 10.1186/1471-2105-7-147] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/28/2005] [Accepted: 03/16/2006] [Indexed: 11/17/2022] Open

Abstract

Background

Ensemble attribute profile clustering is a novel, text-based strategy for analyzing a user-defined list of genes and/or proteins. The strategy exploits annotation data present in gene-centered corpora and utilizes ideas from statistical information retrieval to discover and characterize properties shared by subsets of the list. The practical utility of this method is demonstrated by employing it in a retrospective study of two non-overlapping sets of genes defined by a published investigation as markers for normal human breast luminal epithelial cells and myoepithelial cells.

Results

Each genetic locus was characterized using a finite set of biological properties and represented as a vector of features indicating attributes associated with the locus (a gene attribute profile). In this study, the vector space models for a pre-defined list of genes were constructed from the Gene Ontology (GO) terms and the Conserved Domain Database (CDD) protein domain terms assigned to the loci by the gene-centered corpus LocusLink. This data set of GO- and CDD-based gene attribute profiles, vectors of binary random variables, was used to estimate multiple finite mixture models, and each ensuing model was utilized to partition the profiles into clusters. The resultant partitionings were combined using a unanimous voting scheme to produce consensus clusters, sets of profiles that co-occurred consistently in the same cluster. Attributes that were important in defining the genes assigned to a consensus cluster were identified. The clusters and their attributes were inspected to ascertain the GO and CDD terms most associated with subsets of genes and, in conjunction with external knowledge such as chromosomal location, used to gain functional insights into human breast biology. The 52 luminal epithelial cell markers and 89 myoepithelial cell markers are disjoint sets of genes. Ensemble attribute profile clustering-based analysis indicated that both lists contained groups of genes with the functional properties of membrane receptor biology/signal transduction and nucleic acid binding/transcription. A subset of the luminal markers was associated with metabolic and oxidoreductase activities, whereas a subset of myoepithelial markers was associated with protein hydrolase activity.

Conclusion

Given a set of genes and/or proteins associated with a phenomenon, process or system of interest, ensemble attribute profile clustering provides a simple method for collating and synthesizing the annotation data pertaining to them that are present in text-based, gene-centered corpora. The results provide information about properties common and unique to subsets of the list and hence insights into the biology of the problem under investigation.

286

Edwards KD, Anderson PE, Hall A, Salathia NS, Locke JCW, Lynn JR, Straume M, Smith JQ, Millar AJ. FLOWERING LOCUS C mediates natural variation in the high-temperature response of the Arabidopsis circadian clock. THE PLANT CELL 2006;18:639-50. [PMID: 16473970 PMCID: PMC1383639 DOI: 10.1105/tpc.105.038315] [Citation(s) in RCA: 225] [Impact Index Per Article: 12.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/06/2023]

287

Auld KL, Brown CR, Casolari JM, Komili S, Silver PA. Genomic Association of the Proteasome Demonstrates Overlapping Gene Regulatory Activity with Transcription Factor Substrates. Mol Cell 2006;21:861-71. [PMID: 16543154 DOI: 10.1016/j.molcel.2006.02.020] [Citation(s) in RCA: 91] [Impact Index Per Article: 5.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/25/2005] [Revised: 10/31/2005] [Accepted: 02/21/2006] [Indexed: 12/18/2022]

288

Bachand F, Lackner DH, Bähler J, Silver PA. Autoregulation of ribosome biosynthesis by a translational response in fission yeast. Mol Cell Biol 2006;26:1731-42. [PMID: 16478994 PMCID: PMC1430238 DOI: 10.1128/mcb.26.5.1731-1742.2006] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/07/2005] [Revised: 08/29/2005] [Accepted: 12/05/2005] [Indexed: 11/20/2022] Open

289

Prelić A, Bleuler S, Zimmermann P, Wille A, Bühlmann P, Gruissem W, Hennig L, Thiele L, Zitzler E. A systematic comparison and evaluation of biclustering methods for gene expression data. Bioinformatics 2006;22:1122-9. [PMID: 16500941 DOI: 10.1093/bioinformatics/btl060] [Citation(s) in RCA: 334] [Impact Index Per Article: 18.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/10/2023] Open

290

Vêncio RZN, Koide T, Gomes SL, de B Pereira CA. BayGO: Bayesian analysis of ontology term enrichment in microarray data. BMC Bioinformatics 2006;7:86. [PMID: 16504085 PMCID: PMC1440873 DOI: 10.1186/1471-2105-7-86] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/17/2005] [Accepted: 02/23/2006] [Indexed: 01/21/2023] Open

291

Titz B, Thomas S, Rajagopala SV, Chiba T, Ito T, Uetz P. Transcriptional activators in yeast. Nucleic Acids Res 2006;34:955-67. [PMID: 16464826 PMCID: PMC1361621 DOI: 10.1093/nar/gkj493] [Citation(s) in RCA: 74] [Impact Index Per Article: 4.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open

292

Lall S, Grün D, Krek A, Chen K, Wang YL, Dewey CN, Sood P, Colombo T, Bray N, Macmenamin P, Kao HL, Gunsalus KC, Pachter L, Piano F, Rajewsky N. A genome-wide map of conserved microRNA targets in C. elegans. Curr Biol 2006;16:460-71. [PMID: 16458514 DOI: 10.1016/j.cub.2006.01.050] [Citation(s) in RCA: 346] [Impact Index Per Article: 19.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/19/2005] [Revised: 01/19/2006] [Accepted: 01/24/2006] [Indexed: 12/19/2022]

293

Yang C, Zeng E, Li T, Narasimhan G. Clustering genes using gene expression and text literature data. PROCEEDINGS. IEEE COMPUTATIONAL SYSTEMS BIOINFORMATICS CONFERENCE 2006:329-40. [PMID: 16447990 DOI: 10.1109/csb.2005.23] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/10/2022]

294

An Improved Statistic for Detecting Over-Represented Gene Ontology Annotations in Gene Sets. LECTURE NOTES IN COMPUTER SCIENCE 2006. [DOI: 10.1007/11732990_9] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/29/2022]

295

Sabatine MS, Liu E, Morrow DA, Heller E, McCarroll R, Wiegand R, Berriz GF, Roth FP, Gerszten RE. Metabolomic identification of novel biomarkers of myocardial ischemia. Circulation 2005;112:3868-75. [PMID: 16344383 DOI: 10.1161/circulationaha.105.569137] [Citation(s) in RCA: 381] [Impact Index Per Article: 20.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]

296

Klekota J, Brauner E, Schreiber SL. Identifying Biologically Active Compound Classes Using Phenotypic Screening Data and Sampling Statistics. J Chem Inf Model 2005;45:1824-36. [PMID: 16309290 DOI: 10.1021/ci050087d] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]

297

Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES, Mesirov JP. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A 2005;102:15545-50. [PMID: 16199517 PMCID: PMC1239896 DOI: 10.1073/pnas.0506580102] [Citation(s) in RCA: 32211] [Impact Index Per Article: 1695.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/11/2022] Open

298

Rual JF, Venkatesan K, Hao T, Hirozane-Kishikawa T, Dricot A, Li N, Berriz GF, Gibbons FD, Dreze M, Ayivi-Guedehoussou N, Klitgord N, Simon C, Boxem M, Milstein S, Rosenberg J, Goldberg DS, Zhang LV, Wong SL, Franklin G, Li S, Albala JS, Lim J, Fraughton C, Llamosas E, Cevik S, Bex C, Lamesch P, Sikorski RS, Vandenhaute J, Zoghbi HY, Smolyar A, Bosak S, Sequerra R, Doucette-Stamm L, Cusick ME, Hill DE, Roth FP, Vidal M. Towards a proteome-scale map of the human protein-protein interaction network. Nature 2005;437:1173-8. [PMID: 16189514 DOI: 10.1038/nature04209] [Citation(s) in RCA: 2000] [Impact Index Per Article: 105.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/21/2005] [Accepted: 09/08/2005] [Indexed: 12/29/2022]

299

Tian L, Greenberg SA, Kong SW, Altschuler J, Kohane IS, Park PJ. Discovering statistically significant pathways in expression profiling studies. Proc Natl Acad Sci U S A 2005;102:13544-9. [PMID: 16174746 PMCID: PMC1200092 DOI: 10.1073/pnas.0506577102] [Citation(s) in RCA: 448] [Impact Index Per Article: 23.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open

300

Khatri P, Sellamuthu S, Malhotra P, Amin K, Done A, Draghici S. Recent additions and improvements to the Onto-Tools. Nucleic Acids Res 2005;33:W762-5. [PMID: 15980579 PMCID: PMC1160233 DOI: 10.1093/nar/gki472] [Citation(s) in RCA: 95] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open