1
Abstract
Transcriptomes are known to organize themselves into gene co-expression clusters or modules where groups of genes display distinct patterns of coordinated or synchronous expression across independent biological samples. The functional significance of these co-expression clusters is suggested by the fact that highly coexpressed groups of genes tend to be enriched in genes involved in common functions and biological processes. While gene co-expression is widely assumed to reflect close regulatory proximity, the validity of this assumption remains unclear. Here we use a simple synthetic gene regulatory network (GRN) model and contrast the resulting co-expression structure produced by these networks with their known regulatory architecture and with the co-expression structure measured in available human expression data. Using randomization tests, we found that the levels of co-expression observed in simulated expression data were, just as with empirical data, significantly higher than expected by chance. When examining the source of correlated expression, we found that individual regulators, both in simulated and experimental data, fail, on average, to display correlated expression with their immediate targets. However, highly correlated gene pairs tend to share at least one common regulator, while most gene pairs sharing common regulators do not necessarily display correlated expression. Our results demonstrate that widespread co-expression naturally emerges in regulatory networks, and that it is a reliable and direct indicator of active co-regulation in a given cellular context.
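The randomization test mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not specify the authors' statistic, null model, or network simulator, so everything here is an assumption for illustration: the statistic (mean absolute pairwise Pearson correlation), the null (shuffling each gene's values across samples independently), and the hypothetical helpers `toy_expression`, `mean_abs_correlation`, and `coexpression_permutation_test`.

```python
import numpy as np


def mean_abs_correlation(expr):
    """Mean absolute Pearson correlation over all distinct gene pairs.

    expr has shape (n_genes, n_samples).
    """
    corr = np.corrcoef(expr)
    upper = np.triu_indices_from(corr, k=1)  # each pair counted once
    return float(np.mean(np.abs(corr[upper])))


def coexpression_permutation_test(expr, n_perm=200, seed=0):
    """Compare observed co-expression with a shuffled-sample null (assumed design)."""
    rng = np.random.default_rng(seed)
    observed = mean_abs_correlation(expr)
    null = np.empty(n_perm)
    for i in range(n_perm):
        # Shuffle every gene independently across samples; this keeps each
        # gene's marginal distribution but destroys gene-gene dependence.
        shuffled = np.array([rng.permutation(row) for row in expr])
        null[i] = mean_abs_correlation(shuffled)
    # One-sided empirical p-value with the usual +1 correction.
    p_value = (1 + np.sum(null >= observed)) / (n_perm + 1)
    return observed, null, p_value


def toy_expression(n_samples=60, seed=1):
    """Hypothetical toy data: 5 targets of one shared regulator, 5 unrelated genes."""
    rng = np.random.default_rng(seed)
    regulator = rng.normal(size=n_samples)
    targets = [regulator + 0.5 * rng.normal(size=n_samples) for _ in range(5)]
    unrelated = [rng.normal(size=n_samples) for _ in range(5)]
    return np.vstack(targets + unrelated)


if __name__ == "__main__":
    observed, null, p_value = coexpression_permutation_test(toy_expression())
    print(f"observed={observed:.3f} null_mean={null.mean():.3f} p={p_value:.4f}")
```

In this toy setup the correlated pairs are those sharing a regulator, which loosely mirrors the abstract's observation about co-regulated gene pairs; it is not a reproduction of the study's GRN model.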
2
Rodriguez-Baena DS, Perez-Pulido AJ, Aguilar-Ruiz JS. A biclustering algorithm for extracting bit-patterns from binary datasets. Bioinformatics 2011; 27:2738-45. [PMID: 21824973 DOI: 10.1093/bioinformatics/btr464] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.2] [Indexed: 11/12/2022]
Abstract
MOTIVATION: Binary datasets represent a compact and simple way to store data about the relationships between a group of objects and their possible properties. In the last few years, different biclustering algorithms have been specially developed to be applied to binary datasets. Several approaches based on matrix factorization, suffix trees or divide-and-conquer techniques have been proposed to extract useful biclusters from binary data, and these approaches provide information about the distribution of patterns and intrinsic correlations. RESULTS: A novel approach to extracting biclusters from binary datasets, BiBit, is introduced here. The results obtained from different experiments with synthetic data reveal the excellent performance and the robustness of BiBit to density and size of input data. Also, BiBit is applied to a central nervous system embryonic tumor gene expression dataset to test the quality of the results. A novel gene expression preprocessing methodology, based on expression level layers, and the selective search performed by BiBit, based on a very fast bit-pattern processing technique, provide very satisfactory results in quality and computational cost. The power of biclustering in finding genes involved simultaneously in different cancer processes is also shown. Finally, a comparison with Bimax, one of the most cited binary biclustering algorithms, shows that BiBit is faster while providing essentially the same results. AVAILABILITY: The source and binary codes, the datasets used in the experiments and the results can be found at: http://www.upo.es/eps/bigs/BiBit.html CONTACT: dsrodbae@upo.es SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
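The idea of bit-pattern processing on binary data can be sketched as follows. This is a simplified illustration, not the published BiBit implementation: the function names (`encode_rows`, `bit_pattern_biclusters`) and the parameters `min_rows` and `min_cols` are hypothetical, and the sketch assumes that candidate patterns are formed by AND-ing pairs of rows and then collecting every row that contains the pattern.

```python
from itertools import combinations


def encode_rows(matrix):
    """Pack each binary row into one int (bit j is set when column j is 1)."""
    return [sum(1 << j for j, v in enumerate(row) if v) for row in matrix]


def bit_pattern_biclusters(matrix, min_rows=2, min_cols=2):
    """Enumerate all-ones biclusters seeded by pairwise row patterns (sketch)."""
    rows = encode_rows(matrix)
    n_cols = len(matrix[0])
    seen = set()
    result = []
    for a, b in combinations(range(len(rows)), 2):
        pattern = rows[a] & rows[b]  # columns shared by both seed rows
        if pattern in seen or bin(pattern).count("1") < min_cols:
            continue
        seen.add(pattern)
        # A row belongs to the bicluster if it contains every bit of the pattern.
        members = [i for i, r in enumerate(rows) if (r & pattern) == pattern]
        if len(members) >= min_rows:
            cols = [j for j in range(n_cols) if (pattern >> j) & 1]
            result.append((members, cols))
    return result


if __name__ == "__main__":
    data = [
        [1, 1, 0, 1],
        [1, 1, 0, 0],
        [0, 1, 1, 1],
        [1, 1, 0, 1],
    ]
    for members, cols in bit_pattern_biclusters(data):
        print("rows", members, "cols", cols)
```

Packing rows into integers lets a single AND and comparison test whether a row contains a whole column pattern, which is the kind of operation the abstract credits for the method's speed.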
3
Marco A, Konikoff C, Karr TL, Kumar S. Relationship between gene co-expression and sharing of transcription factor binding sites in Drosophila melanogaster. Bioinformatics 2009; 25:2473-7. [PMID: 19633094 PMCID: PMC2752616 DOI: 10.1093/bioinformatics/btp462] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.3] [Received: 03/15/2009] [Revised: 07/03/2009] [Accepted: 07/22/2009] [Indexed: 01/16/2023]
Abstract
MOTIVATION: In functional genomics, it is frequently useful to correlate expression levels of genes to identify transcription factor binding sites (TFBS) via the presence of common sequence motifs. The underlying assumption is that co-expressed genes are more likely to contain shared TFBS and, thus, TFBS can be identified computationally. Indeed, gene pairs with a very high expression correlation show a significant excess of shared binding sites in yeast. We have tested this assumption in a more complex organism, Drosophila melanogaster, by using experimentally determined TFBS and microarray expression data. We have also examined the reverse relationship between the expression correlation and the extent of TFBS sharing. RESULTS: Pairs of genes with shared TFBS show, on average, a higher degree of co-expression than those with no common TFBS in Drosophila. However, the reverse does not hold true: gene pairs with high expression correlations do not share significantly larger numbers of TFBS. An exception to this observation exists when comparing expression of genes from the earliest stages of embryonic development. Interestingly, semantic similarity between gene annotations (Biological Process) is much better associated with TFBS sharing than the expression correlation is. We discuss these results in light of reverse engineering approaches to computationally predict regulatory sequences by using comparative genomics.
Affiliation(s)
- Antonio Marco
- Center for Evolutionary Functional Genomics, The Biodesign Institute, Arizona State University, Tempe, AZ 85287-5301, USA.
4
Mühlberger I, Perco P, Fechete R, Mayer B, Oberbauer R. Biomarkers in renal transplantation ischemia reperfusion injury. Transplantation 2009; 88:S14-9. [PMID: 19667956 DOI: 10.1097/tp.0b013e3181af65b5] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.9] [Indexed: 12/13/2022]
Abstract
Ischemia reperfusion injury (IRI) is a choreographed process leading to delayed graft function (DGF) and reduced long-term patency of the transplanted organ. Early identification of recipients of grafts at risk would allow modification of the posttransplant management, and thereby potentially improve short- and long-term outcomes. The recently emerged "omics" technologies together with bioinformatics workup have allowed the integration and analysis of IRI-associated molecular profiles in the context of DGF. Such a systems biological approach promises qualitative information about interdependencies of complex processes such as IRI regulation, rather than offering descriptive tables of differentially regulated features on a transcriptome, proteome, or metabolome level lacking the functional, biological framework. In deceased-donor kidney transplantation as the primary causative factor resulting in IRI and DGF, a distinct signature and choreography of molecular events in the graft before harvesting seems to be associated with subsequent DGF. A systems biological assessment of these molecular changes suggests that processes along inflammation are of pivotal importance for the early stage of IRI. The causal proof of this association has been tested by a double-blinded, randomized, controlled trial of steroid or placebo infusion into deceased donors before the organs were harvested. Thorough systems biological analysis revealed a panel of biomarkers with excellent discrimination. In summary, integrated analysis of omics data has brought forward biomarker candidates and candidate panels that promise early assessment of IRI. However, the clinical utility of these markers still needs to be established in prospective trials in independent patient populations.
5
Hu ZZ, Huang H, Cheema A, Jung M, Dritschilo A, Wu CH. Integrated Bioinformatics for Radiation-Induced Pathway Analysis from Proteomics and Microarray Data. Journal of Proteomics & Bioinformatics 2008; 1:47-60. [PMID: 19088860 PMCID: PMC2603135 DOI: 10.4172/jpb.1000009] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.8] [Indexed: 12/22/2022]
Abstract
Functional analysis and interpretation of large-scale proteomics and gene expression data require effective use of bioinformatics tools and public knowledge resources coupled with expert-guided examination. An integrated bioinformatics approach was used to analyze cellular pathways in response to ionizing radiation. ATM, or ataxia-telangiectasia mutated, a serine-threonine protein kinase, plays critical roles in radiation responses, including cell cycle arrest and DNA repair. We analyzed radiation-responsive pathways based on 2D-gel/MS proteomics and microarray gene expression data from fibroblasts expressing the wild-type or mutant ATM gene. The analysis showed that metabolism was significantly affected by radiation in an ATM-dependent manner. In particular, purine metabolic pathways were differentially changed in the two cell lines. The expression of ribonucleoside-diphosphate reductase subunit M2 (RRM2) was increased in ATM-wild type cells at both mRNA and protein levels, but no changes were detected in ATM-mutated cells. Increased expression of p53 was observed 30 min after irradiation of the ATM-wild type cells. These results suggest that RRM2 is a downstream target of the ATM-p53 pathway that mediates radiation-induced DNA repair. We demonstrated that the integrated bioinformatics approach facilitated pathway analysis, hypothesis generation and target gene/protein identification.
Affiliation(s)
- Zhang-Zhi Hu
- Department of Biochemistry and Molecular & Cellular Biology, Georgetown University Medical Center, Washington, DC 20007, USA
- Hongzhan Huang
- Department of Biochemistry and Molecular & Cellular Biology, Georgetown University Medical Center, Washington, DC 20007, USA
- Amrita Cheema
- Proteomics and Metabolomics Shared Resource, Georgetown University Medical Center, Washington, DC 20007, USA
- Mira Jung
- Department of Radiation Medicine, Lombardi Comprehensive Cancer Center, Georgetown University Medical Center, Washington, DC 20007, USA
- Anatoly Dritschilo
- Department of Radiation Medicine, Lombardi Comprehensive Cancer Center, Georgetown University Medical Center, Washington, DC 20007, USA
- Cathy H. Wu
- Department of Biochemistry and Molecular & Cellular Biology, Georgetown University Medical Center, Washington, DC 20007, USA
6
Klepper K, Sandve GK, Abul O, Johansen J, Drablos F. Assessment of composite motif discovery methods. BMC Bioinformatics 2008; 9:123. [PMID: 18302777 PMCID: PMC2311304 DOI: 10.1186/1471-2105-9-123] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.7] [Received: 07/26/2007] [Accepted: 02/26/2008] [Indexed: 12/26/2022]
Abstract
BACKGROUND: Computational discovery of regulatory elements is an important area of bioinformatics research and more than a hundred motif discovery methods have been published. Traditionally, most of these methods have addressed the problem of single motif discovery, that is, discovering binding motifs for individual transcription factors. In higher organisms, however, transcription factors usually act in combination with nearby bound factors to induce specific regulatory behaviours. Hence, recent focus has shifted from single motifs to the discovery of sets of motifs bound by multiple cooperating transcription factors, so-called composite motifs or cis-regulatory modules. Given the large number and diversity of methods available, independent assessment of methods becomes important. Although there have been several benchmark studies of single motif discovery, no similar studies have previously been conducted concerning composite motif discovery. RESULTS: We have developed a benchmarking framework for composite motif discovery and used it to evaluate the performance of eight published module discovery tools. Benchmark datasets were constructed based on real genomic sequences containing experimentally verified regulatory modules, and the module discovery programs were asked to predict both the locations of these modules and to specify the single motifs involved. To aid the programs in their search, we provided position weight matrices corresponding to the binding motifs of the transcription factors involved. In addition, selections of decoy matrices were mixed with the genuine matrices on one dataset to test the response of programs to varying levels of noise. CONCLUSION: Although some of the methods tested tended to score somewhat better than others overall, there were still large variations between individual datasets and no single method performed consistently better than the rest in all situations. The variation in performance on individual datasets also shows that the new benchmark datasets represent a suitable variety of challenges to most methods for module discovery.
Affiliation(s)
- Kjetil Klepper
- Department of Cancer Research and Molecular Medicine, Norwegian University of Science and Technology, Trondheim, Norway.
7
Rapberger R, Perco P, Sax C, Pangerl T, Siehs C, Pils D, Bernthaler A, Lukas A, Mayer B, Krainer M. Linking the ovarian cancer transcriptome and immunome. BMC Systems Biology 2008; 2:2. [PMID: 18173842 PMCID: PMC2265674 DOI: 10.1186/1752-0509-2-2] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.8] [Received: 04/05/2007] [Accepted: 01/03/2008] [Indexed: 01/17/2023]
Abstract
BACKGROUND: Autoantigens have been reported in a variety of tumors, providing insight into the interplay between malignancies and the immune response, and also giving rise to novel diagnostic and therapeutic concepts. Why certain tumor-associated proteins induce an immune response remains largely elusive. RESULTS: This paper analyzes the proposed link between increased abundance of a protein in cancerous tissue and the increased potential of the protein for induction of a humoral immune response, using ovarian cancer as an example. Public domain data sources on differential gene expression and on autoantigens associated with this malignancy were extracted and compared, using bioinformatics analysis, on the levels of individual genes and proteins, transcriptional coregulation, joint functional pathways, and shared protein-protein interaction networks. Finally, a selected list of ovarian cancer-associated, differentially regulated proteins was tested experimentally for reactivity with antibodies prevalent in sera of ovarian cancer patients. Genes reported as showing differential expression in ovarian cancer exhibited only minor overlap with the public domain list of ovarian cancer autoantigens. However, experimental screening for antibodies directed against antigenic determinants from ovarian cancer-associated proteins yielded clear reactions with sera. CONCLUSION: A link between tumor protein abundance and the likelihood of induction of a humoral immune response in ovarian cancer appears evident.
Affiliation(s)
- Ronald Rapberger
- Institute for Theoretical Chemistry, University of Vienna, Währinger Strasse 17, A-1090 Vienna, Austria
- University Clinics for Internal Medicine I, Medical University of Vienna, Währinger Gürtel 18-20, A-1090 Vienna, Austria
- Paul Perco
- Institute for Theoretical Chemistry, University of Vienna, Währinger Strasse 17, A-1090 Vienna, Austria
- Cornelia Sax
- University Clinics for Internal Medicine I, Medical University of Vienna, Währinger Gürtel 18-20, A-1090 Vienna, Austria
- Thomas Pangerl
- University Clinics for Internal Medicine I, Medical University of Vienna, Währinger Gürtel 18-20, A-1090 Vienna, Austria
- Christian Siehs
- Institute for Theoretical Chemistry, University of Vienna, Währinger Strasse 17, A-1090 Vienna, Austria
- Dietmar Pils
- University Clinics for Internal Medicine I, Medical University of Vienna, Währinger Gürtel 18-20, A-1090 Vienna, Austria
- Andreas Bernthaler
- emergentec biodevelopment GmbH, Rathausstrasse 5/3, A-1010 Vienna, Austria
- Arno Lukas
- emergentec biodevelopment GmbH, Rathausstrasse 5/3, A-1010 Vienna, Austria
- Bernd Mayer
- Institute for Theoretical Chemistry, University of Vienna, Währinger Strasse 17, A-1090 Vienna, Austria
- emergentec biodevelopment GmbH, Rathausstrasse 5/3, A-1010 Vienna, Austria
- Michael Krainer
- University Clinics for Internal Medicine I, Medical University of Vienna, Währinger Gürtel 18-20, A-1090 Vienna, Austria
8
Perco P, Rapberger R, Siehs C, Lukas A, Oberbauer R, Mayer G, Mayer B. Transforming omics data into context: Bioinformatics on genomics and proteomics raw data. Electrophoresis 2006; 27:2659-75. [PMID: 16739231 DOI: 10.1002/elps.200600064] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.5] [Indexed: 11/09/2022]
Abstract
Differential gene expression analysis and proteomics have had a significant impact on the elucidation of concerted cellular processes, as simultaneous measurement of hundreds to thousands of individual objects on the level of RNA and protein ensembles became technically feasible. The availability of such data sets has promised a profound understanding of phenomena on an aggregate level, expressed as the phenotypic response (observables) of cells, e.g., in the presence of drugs, or characterization of cells and tissue displaying distinct patho-physiological states. However, the step of transforming these data into context, i.e., linking distinct expression or abundance patterns with phenotypic observables, and furthermore enabling a sound biological interpretation on the level of reaction networks and concerted pathways, is still a major shortcoming. This shortcoming certainly stems from the enormous complexity embedded in cellular reaction networks, but a variety of computational approaches have been developed over the last few years to overcome these issues. This review provides an overview of computational procedures for analysis of genomic and proteomic data, introducing a sequential analysis workflow: explorative statistics for deriving a first candidate gene/protein list that is relevant from a purely statistical viewpoint, followed by co-regulation and network analysis to biologically expand this core list toward functional networks and pathways. The review of these procedures is complemented by example applications tailored to the identification of disease-associated proteins. Optimization of the computational procedures involved, in conjunction with the continuous increase in additional biological data, clearly has the potential to boost our understanding of processes on a cell-wide level.
Affiliation(s)
- Paul Perco
- Department of Nephrology, Medical University of Vienna, Austria
9
Perco P, Blaha P, Kainz A, Mayer B, Hauser P, Wekerle T, Oberbauer R. Molecular signature of mice T lymphocytes following tolerance induction by allogeneic BMT and CD40-CD40L costimulation blockade. Transpl Int 2006; 19:146-57. [PMID: 16441364 DOI: 10.1111/j.1432-2277.2005.00241.x] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 11/27/2022]
Abstract
Tolerance induction by mixed chimerism and costimulation blockade is a promising approach to avoid immunosuppression, but the molecular basis of tolerant T lymphocytes remains elusive. We investigated the genome-wide gene expression profile of murine T lymphocytes after tolerance induction by allogeneic bone marrow transplantation (BMT) and costimulatory blockade using the anti-CD40L antibody MR1. Molecular functions, biological processes, cellular locations, and coregulation of identified genes were determined. A total of 113 unique genes exhibited a significant differential expression between the lymphocytes of MR1-treated tolerant (TOL) recipients and untreated control (CTRL) recipients. The majority of genes upregulated in the TOL group are involved in several signal transduction cascades such as members of the MAPKKK cascade (IL6, Tob2, Stk39, and Dusp24). Other genes involved in lymphocyte differentiation and highly expressed in the TOL group are lymphotactin, the estrogen receptors (ERs) and the suppressor of cytokine signaling 7. Common transcription factors such as ER 1 alpha, GATA-binding protein 1, insulin promoter factor 1, and paired-related homeobox 2 could be identified in the promoter regions of upregulated genes in the TOL group. These data suggest that T lymphocytes of tolerant mice exhibit a distinct molecular expression profile, which needs to be evaluated in other experimental tolerance models to determine whether it is a universal signature of tolerance.
Affiliation(s)
- Paul Perco
- Department of Nephrology, Medical University of Vienna, Austria