51
Bian ZR, Yin J, Sun W, Lin DJ. Microarray and network-based identification of functional modules and pathways of active tuberculosis. Microb Pathog 2017; 105:68-73. [PMID: 28189733 DOI: 10.1016/j.micpath.2017.02.012] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 12/12/2016] [Revised: 02/07/2017] [Accepted: 02/07/2017] [Indexed: 02/02/2023]
Abstract
Diagnosis of active tuberculosis (TB) is challenging, and treatment response is also difficult to monitor efficiently. The aim of this study was to apply an integrated microarray and network-based analysis to samples from publicly available datasets to obtain a diagnostic module set and pathways in active TB. Towards this goal, a background protein-protein interaction (PPI) network was generated based on global PPI information and gene expression data, followed by identification of a differential expression network (DEN) from the background PPI network. Then, ego genes were extracted according to the degree features in the DEN. Next, module collection was conducted by ego gene expansion based on the EgoNet algorithm. After that, differential expression of modules between active TB and controls was evaluated using a random permutation test. Finally, the biological significance of differential modules was assessed by pathway enrichment analysis based on the Reactome database, and Fisher's exact test was implemented to extract differential pathways for active TB. In total, 47 ego genes and 47 candidate modules were identified from the DEN. By setting the cutoff criteria of gene size >5 and classification accuracy ≥0.9, 7 ego modules (Module 4, Module 7, Module 9, Module 19, Module 25, Module 38 and Module 43) were extracted, and all of them differed significantly between active TB and controls. Then, Fisher's exact test was conducted to capture differential pathways for active TB. Interestingly, genes in Module 4, Module 25, Module 38, and Module 43 were enriched in the same pathway, formation of a pool of free 40S subunits. The significant pathway for Module 7 and Module 9 was eukaryotic translation termination, and for Module 19 it was nonsense mediated decay enhanced by the exon junction complex (EJC). Accordingly, differential modules and pathways might be potential biomarkers for treating active TB, and provide valuable clues for better understanding of the molecular mechanism of active TB.
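The pathway-enrichment step described in this abstract can be illustrated with a short sketch. The abstract names Fisher's exact test but does not give the exact setup, so the following assumes a one-sided (over-representation) test on a 2x2 table of module membership versus pathway membership; the function name `fisher_enrichment_p` and the toy gene identifiers are hypothetical.

```python
from math import comb

def fisher_enrichment_p(module_genes, pathway_genes, background_genes):
    """One-sided Fisher's exact test for over-representation.

    Returns P(X >= k), where X is hypergeometric: the number of pathway
    genes seen when drawing len(module) genes from the background.
    Assumption: genes outside the background are ignored.
    """
    background = set(background_genes)
    module = set(module_genes) & background
    pathway = set(pathway_genes) & background
    N = len(background)          # all genes considered
    K = len(pathway)             # pathway genes in the background
    n = len(module)              # module size
    k = len(module & pathway)    # observed overlap
    total = comb(N, n)
    # Sum the hypergeometric probabilities for overlaps of k or more.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / total

# Toy example with made-up gene names.
background = [f"g{i}" for i in range(20)]
pathway = ["g0", "g1", "g2", "g3", "g4"]
module = ["g0", "g1", "g2", "g3", "g10", "g11"]
p_value = fisher_enrichment_p(module, pathway, background)
```

A small p-value here would suggest that the module contains more pathway genes than expected by chance; in practice a multiple-testing correction across pathways would normally follow.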
Affiliation(s)
- Zhong-Rui Bian
- Department of Cardiology, The Second Hospital of Shandong University, Jinan 250033, Shandong Province, China
- Juan Yin
- Beijing Spirallink Medical Research Institute, Beijing 100054, China
- Wen Sun
- Beijing Spirallink Medical Research Institute, Beijing 100054, China
- Dian-Jie Lin
- Department of Respiratory Medicine, Shandong Provincial Hospital, Jinan 250021, Shandong Province, China.
52
Wong KC, Peng C, Li Y. Evolving Transcription Factor Binding Site Models From Protein Binding Microarray Data. IEEE Transactions on Cybernetics 2017; 47:415-424. [PMID: 26887021 DOI: 10.1109/tcyb.2016.2519380] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 06/05/2023]
Abstract
Protein binding microarray (PBM) is a high-throughput platform that can measure the DNA binding preference of a protein in a comprehensive and unbiased manner. In this paper, we describe the PBM motif model building problem. We apply several evolutionary computation methods and compare their performance with the interior point method, demonstrating their performance advantages. In addition, given the PBM domain knowledge, we propose and describe a novel method called kmerGA, which makes domain-specific assumptions to exploit PBM data properties and build more accurate models than the other methods. The effectiveness and robustness of kmerGA are supported by comprehensive performance benchmarking on more than 200 datasets, time complexity analysis, convergence analysis, parameter analysis, and case studies. To demonstrate its utility further, kmerGA is applied to two real world applications: 1) PBM rotation testing and 2) ChIP-Seq peak sequence prediction. The results support the biological relevance of the models learned by kmerGA, and thus its real world applicability.
53
Triska M, Ivliev A, Nikolsky Y, Tatarinova TV. Analysis of cis-Regulatory Elements in Gene Co-expression Networks in Cancer. Methods Mol Biol 2017; 1613:291-310. [PMID: 28849565 DOI: 10.1007/978-1-4939-7027-8_11] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Indexed: 01/22/2023]
Abstract
Analysis of gene co-expression networks is a powerful "data-driven" tool, invaluable for understanding cancer biology and mechanisms of tumor development. Yet, despite the completion of thousands of studies on cancer gene expression, there have been few attempts to normalize and integrate co-expression data from scattered sources in a concise "meta-analysis" framework. Here we describe an integrated approach to cancer expression meta-analysis, which combines generation of "data-driven" co-expression networks with detailed statistical detection of promoter sequence motifs within the co-expression clusters. First, we applied the Weighted Gene Co-Expression Network Analysis (WGCNA) workflow and Pearson's correlation to generate a comprehensive set of over 3000 co-expression clusters in 82 normalized microarray datasets from nine cancers of different origin. Next, we designed a genome-wide statistical approach to the detection of specific DNA sequence motifs based on similarities between the promoters of similarly expressed genes. The approach, realized as the cisExpress software module, was specifically designed for analysis of very large data sets such as those generated by publicly accessible whole genome and transcriptome projects. cisExpress uses a task farming algorithm to exploit all available computational cores within a shared memory node. We discovered that although co-expression modules are populated with different sets of genes, they share distinct stable patterns of co-regulation based on promoter sequence analysis. The number of motifs per co-expression cluster varies widely in accordance with cancer tissue of origin, with the largest number in colon (68 motifs) and the lowest in ovary (18 motifs). The top scored motifs are typically shared between several tissues; they define sets of target genes responsible for certain functionality of cancerogenesis. Both the co-expression modules and a database of precalculated motifs are publicly available and accessible for further studies.
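As a rough illustration of the correlation step that underlies a WGCNA-style workflow, the sketch below computes Pearson's correlation between gene expression profiles and raises its absolute value to a soft-thresholding power to obtain an unsigned adjacency. The power `beta=6`, the function names, and the toy expression values are assumptions for illustration; the abstract does not state the settings used by the authors.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant profile carries no co-expression signal
    return cov / sqrt(vx * vy)

def soft_adjacency(expr, beta=6):
    """Unsigned soft-threshold adjacency |r|**beta for each gene pair.

    expr maps a gene name to its expression values across samples.
    """
    genes = sorted(expr)
    return {
        (g, h): abs(pearson(expr[g], expr[h])) ** beta
        for i, g in enumerate(genes)
        for h in genes[i + 1:]
    }

# Toy data: three hypothetical genes measured in four samples.
expr = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "D": [1, 3, 2, 4]}
adjacency = soft_adjacency(expr)
```

Raising the correlation to a power suppresses weak correlations while keeping strong ones close to 1, which is why clusters are then built on the adjacency rather than on the raw correlation.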
Affiliation(s)
- Martin Triska
- Spatial Sciences Institute, University of Southern California, Los Angeles, CA, USA
- Yuri Nikolsky
- Prosapia Genetics, Solana Beach, CA, USA; School of Systems Biology, George Mason University, Fairfax, VA, USA
- Tatiana V Tatarinova
- Spatial Sciences Institute, University of Southern California, Los Angeles, CA, USA; Center for Personalized Medicine, Children's Hospital Los Angeles, 4640 Hollywood Blvd, Los Angeles, CA, 90027, USA; A.A. Kharkevich Institute for Information Transmission Problems RAS, Moscow, Russia.
54
Abstract
BACKGROUND Topic models are statistical algorithms which try to discover the structure of a set of documents according to the abstract topics contained in them. Here we try to apply this approach to the discovery of the structure of the transcription factor binding sites (TFBS) contained in a set of biological sequences, which is a fundamental problem in molecular biology research for the understanding of transcriptional regulation. Here we present two methods that make use of topic models for motif finding. First, we developed an algorithm in which first a set of biological sequences are treated as text documents, and the k-mers contained in them as words, to then build a correlated topic model (CTM) and iteratively reduce its perplexity. We also used the perplexity measurement of CTMs to improve our previous algorithm based on a genetic algorithm and several statistical coefficients. RESULTS The algorithms were tested with 56 data sets from four different species and compared to 14 other methods by the use of several coefficients both at nucleotide and site level. The results of our first approach showed a performance comparable to the other methods studied, especially at site level and in sensitivity scores, in which it scored better than any of the 14 existing tools. In the case of our previous algorithm, the new approach with the addition of the perplexity measurement clearly outperformed all of the other methods in sensitivity, both at nucleotide and site level, and in overall performance at site level. CONCLUSIONS The statistics obtained show that the performance of a motif finding method based on the use of a CTM is satisfying enough to conclude that the application of topic models is a valid method for developing motif finding algorithms. 
Moreover, the addition of topic models to a previously developed method dramatically increased its performance, suggesting that this combined algorithm can be a useful tool to successfully predict motifs in different kinds of sets of DNA sequences.
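The idea of treating sequences as documents and k-mers as words, and of scoring a model by perplexity, can be sketched as follows. This is a minimal illustration only: it substitutes a smoothed unigram model for the correlated topic model (CTM) used in the paper, and the function names, the smoothing constant `alpha`, and the toy sequences are all assumptions. The perplexity formula itself, exp(-(1/N) Σ log p(w)), is the standard definition.

```python
from collections import Counter
from math import exp, log

def kmers(seq, k):
    """Split a sequence into overlapping k-mers (the 'words' of the document)."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def unigram_perplexity(train_docs, test_doc, alpha=1.0):
    """Perplexity of test_doc under an additively smoothed unigram model.

    Stand-in for a topic model: lower perplexity means the model
    predicts the held-out k-mers better.
    """
    if not test_doc:
        raise ValueError("test_doc must contain at least one word")
    counts = Counter(w for doc in train_docs for w in doc)
    vocab = set(counts) | set(test_doc)
    total = sum(counts.values()) + alpha * len(vocab)
    log_lik = sum(log((counts[w] + alpha) / total) for w in test_doc)
    return exp(-log_lik / len(test_doc))

# Toy example: two hypothetical training sequences and one held-out sequence.
train = [kmers("ACGTACGT", 3), kmers("TTACGTAA", 3)]
held_out = kmers("ACGTA", 3)
score = unigram_perplexity(train, held_out)
```

In an iterative motif-finding scheme of the kind the abstract describes, a score like this would be recomputed after each model update and the update kept only if perplexity decreases.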
Affiliation(s)
- Josep Basha Gutierrez
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, 277-8561 Chiba, Japan
- Human Genome Center, The Institute of Medical Science, The University of Tokyo, 4-6-1 Shirokane-dai, Minato-ku, 108-8639 Tokyo, Japan
- Kenta Nakai
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, 277-8561 Chiba, Japan
- Human Genome Center, The Institute of Medical Science, The University of Tokyo, 4-6-1 Shirokane-dai, Minato-ku, 108-8639 Tokyo, Japan
55
Zinc Cluster Transcription Factors Alter Virulence in Candida albicans. Genetics 2016; 205:559-576. [PMID: 27932543 DOI: 10.1534/genetics.116.195024] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 08/19/2016] [Accepted: 11/16/2016] [Indexed: 11/18/2022] [Open Access]
Abstract
Almost all humans are colonized with Candida albicans. However, in immunocompromised individuals, this benign commensal organism becomes a serious, life-threatening pathogen. Here, we describe and analyze the regulatory networks that modulate innate responses in the host niches. We identified Zcf15 and Zcf29, two Zinc Cluster transcription Factors (ZCF) that are required for C. albicans virulence. Previous sequence analysis of clinical C. albicans isolates from immunocompromised patients indicates that both ZCF genes diverged during clonal evolution. Using in vivo animal models, ex vivo cell culture methods, and in vitro sensitivity assays, we demonstrate that knockout mutants of both ZCF15 and ZCF29 are hypersensitive to reactive oxygen species (ROS), suggesting they help neutralize the host-derived ROS produced by phagocytes, as well as establish a sustained infection in vivo. Transcriptomic analysis of mutants under resting conditions, where cells were not experiencing oxidative stress, revealed a large network that controls macro- and micronutrient homeostasis, which likely contributes to overall pathogen fitness in host niches. Under oxidative stress, both transcription factors regulate a separate set of genes involved in detoxification of ROS and down-regulating ribosome biogenesis. ChIP-seq analysis, which reveals vastly different binding partners for each transcription factor (TF) before and after oxidative stress, further confirms these results. Furthermore, the absence of a dominant binding motif likely facilitates their mobility, and supports the notion that they represent a recent expansion of the ZCF family in the pathogenic Candida species. Our analyses provide a framework for understanding new aspects of the interface between C. albicans and host defense response, and extend our understanding of how complex cell behaviors are linked to the evolution of TFs.
56
Austin RS, Hiu S, Waese J, Ierullo M, Pasha A, Wang TT, Fan J, Foong C, Breit R, Desveaux D, Moses A, Provart NJ. New BAR tools for mining expression data and exploring Cis-elements in Arabidopsis thaliana. The Plant Journal: for Cell and Molecular Biology 2016; 88:490-504. [PMID: 27401965 DOI: 10.1111/tpj.13261] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.8] [Received: 04/14/2016] [Revised: 06/23/2016] [Accepted: 07/01/2016] [Indexed: 05/21/2023]
Abstract
Identifying sets of genes that are specifically expressed in certain tissues or in response to an environmental stimulus is useful for designing reporter constructs, generating gene expression markers, or for understanding gene regulatory networks. We have developed an easy-to-use online tool for defining a desired expression profile (a modification of our Expression Angler program), which can then be used to identify genes exhibiting patterns of expression that match this profile as closely as possible. Further, we have developed another online tool, Cistome, for predicting or exploring cis-elements in the promoters of sets of co-expressed genes identified by such a method, or by other methods. We present two use cases for these tools, which are freely available on the Bio-Analytic Resource at http://BAR.utoronto.ca.
Affiliation(s)
- Ryan S Austin
- Department of Cell & Systems Biology/Centre for the Analysis of Genome Evolution and Function, University of Toronto, Toronto, ON, M5S 3B2, Canada
- Shu Hiu
- Department of Cell & Systems Biology/Centre for the Analysis of Genome Evolution and Function, University of Toronto, Toronto, ON, M5S 3B2, Canada
- Jamie Waese
- Department of Cell & Systems Biology/Centre for the Analysis of Genome Evolution and Function, University of Toronto, Toronto, ON, M5S 3B2, Canada
- Matthew Ierullo
- Department of Cell & Systems Biology/Centre for the Analysis of Genome Evolution and Function, University of Toronto, Toronto, ON, M5S 3B2, Canada
- Asher Pasha
- Department of Cell & Systems Biology/Centre for the Analysis of Genome Evolution and Function, University of Toronto, Toronto, ON, M5S 3B2, Canada
- Ting Ting Wang
- Department of Cell & Systems Biology/Centre for the Analysis of Genome Evolution and Function, University of Toronto, Toronto, ON, M5S 3B2, Canada
- Jim Fan
- Department of Cell & Systems Biology/Centre for the Analysis of Genome Evolution and Function, University of Toronto, Toronto, ON, M5S 3B2, Canada
- Curtis Foong
- Department of Cell & Systems Biology/Centre for the Analysis of Genome Evolution and Function, University of Toronto, Toronto, ON, M5S 3B2, Canada
- Robert Breit
- Department of Cell & Systems Biology/Centre for the Analysis of Genome Evolution and Function, University of Toronto, Toronto, ON, M5S 3B2, Canada
- Darrell Desveaux
- Department of Cell & Systems Biology/Centre for the Analysis of Genome Evolution and Function, University of Toronto, Toronto, ON, M5S 3B2, Canada
- Alan Moses
- Department of Cell & Systems Biology/Centre for the Analysis of Genome Evolution and Function, University of Toronto, Toronto, ON, M5S 3B2, Canada
- Nicholas J Provart
- Department of Cell & Systems Biology/Centre for the Analysis of Genome Evolution and Function, University of Toronto, Toronto, ON, M5S 3B2, Canada
57
Nikolaichik Y, Damienikan AU. SigmoID: a user-friendly tool for improving bacterial genome annotation through analysis of transcription control signals. PeerJ 2016; 4:e2056. [PMID: 27257541 PMCID: PMC4888284 DOI: 10.7717/peerj.2056] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 01/26/2016] [Accepted: 04/29/2016] [Indexed: 02/02/2023] [Open Access]
Abstract
The majority of bacterial genome annotations are currently automated and based on a 'gene by gene' approach. Regulatory signals and operon structures are rarely taken into account, which often results in incomplete and even incorrect gene function assignments. Here we present SigmoID, a cross-platform (OS X, Linux and Windows) open-source application that aims to simplify the identification of transcription regulatory sites (promoters, transcription factor binding sites and terminators) in bacterial genomes and to assist in correcting annotations in accordance with regulatory information. SigmoID combines a user-friendly graphical interface to well known command line tools with a genome browser for visualising regulatory elements in genomic context. Integrated access to online databases with regulatory information (RegPrecise and RegulonDB) and web-based search engines speeds up genome analysis and simplifies correction of genome annotation. We demonstrate some features of SigmoID by constructing a series of regulatory protein binding site profiles for two groups of bacteria: Soft Rot Enterobacteriaceae (Pectobacterium and Dickeya spp.) and Pseudomonas spp. Furthermore, we inferred over 900 transcription factor binding sites and alternative sigma factor promoters in the annotated genome of Pectobacterium atrosepticum. These regulatory signals control putative transcription units covering about 40% of the P. atrosepticum chromosome. Reviewing the annotation in cases where it didn't fit with regulatory information allowed us to correct product and gene names for over 300 loci.
Affiliation(s)
- Yevgeny Nikolaichik
- Department of Molecular Biology, Belarusian State University, Minsk, Belarus
58
Hwang SG, Kim DS, Kim JB, Hwang JE, Park HM, Kim JH, Jang CS. Transcriptome analysis of reproductive-stage Arabidopsis plants exposed gamma-ray irradiation at various doses. Int J Radiat Biol 2016; 92:451-65. [PMID: 27151538 DOI: 10.1080/09553002.2016.1178865] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Indexed: 10/21/2022]
Affiliation(s)
- Sun-Goo Hwang
- Plant Genomics Laboratory, Department of Applied Plant Sciences, Kangwon National University, Chuncheon, Korea
- Dong Sub Kim
- NJ Solar Plant Group, NJ Biopia Co., Gwangju, South Korea
- Jin-Baek Kim
- Advanced Radiation Technology Institute, Korea Atomic Energy Research Institute, Jeongeup, Jeonbuk, South Korea
- Jung Eun Hwang
- Advanced Radiation Technology Institute, Korea Atomic Energy Research Institute, Jeongeup, Jeonbuk, South Korea
- Hyun Mi Park
- Plant Genomics Laboratory, Department of Applied Plant Sciences, Kangwon National University, Chuncheon, Korea
- Jin Hyuk Kim
- Plant Genomics Laboratory, Department of Applied Plant Sciences, Kangwon National University, Chuncheon, Korea
- Cheol Seong Jang
- Plant Genomics Laboratory, Department of Applied Plant Sciences, Kangwon National University, Chuncheon, Korea
59
Ness JK, Skiles AA, Yap EH, Fajardo EJ, Fiser A, Tapinos N. Nuc-ErbB3 regulates H3K27me3 levels and HMT activity to establish epigenetic repression during peripheral myelination. Glia 2016; 64:977-92. [PMID: 27017927 PMCID: PMC5021170 DOI: 10.1002/glia.22977] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 07/22/2015] [Accepted: 02/01/2016] [Indexed: 12/04/2022]
Abstract
Nuc‐ErbB3, an alternative transcript from the ErbB3 locus, binds to a specific DNA motif and associates with Schwann cell chromatin. Here we generated a nuc‐ErbB3 knockin mouse that lacks nuc‐ErbB3 expression in the nucleus without affecting the neuregulin‐ErbB3 receptor signaling. Nuc‐ErbB3 knockin mice exhibit hypermyelination and aberrant myelination at the paranodal region. This phenotype is attributed to de‐repression of myelination‐associated gene transcription following loss of nuc‐ErbB3 and histone H3K27me3 promoter occupancy. Nuc‐ErbB3 knockin mice exhibit reduced association of H3K27me3 with myelination‐associated gene promoters and an increased RNA Pol‐II rate of transcription of these genes. In addition, nuc‐ErbB3 directly regulates levels of H3K27me3 in Schwann cells. Nuc‐ErbB3 knockin mice exhibit a significant decrease in histone H3K27me3 methyltransferase (HMT) activity and reduced levels of H3K27me3. Collectively, nuc‐ErbB3 is a master transcriptional repressor, which regulates HMT activity to establish a repressive chromatin landscape on promoters of genes during peripheral myelination. GLIA 2016;64:977–992
Nuc‐ErbB3 knock‐in mice exhibit peripheral hypermyelination. Nuc‐ErbB3 regulates total levels of H3K27me3 and HMT activity. Nuc‐ErbB3 induces transcriptional repression of myelination‐associated genes.
Affiliation(s)
- Jennifer K Ness
- Molecular Neuroscience and Neurooncology Laboratory, Geisinger Clinic, Danville, Pennsylvania
- Amanda A Skiles
- Molecular Neuroscience and Neurooncology Laboratory, Geisinger Clinic, Danville, Pennsylvania
- Eng-Hui Yap
- Department of Systems and Computational Biology, Albert Einstein College of Medicine, Bronx, New York
- Eduardo J Fajardo
- Department of Systems and Computational Biology, Albert Einstein College of Medicine, Bronx, New York
- Andras Fiser
- Department of Systems and Computational Biology, Albert Einstein College of Medicine, Bronx, New York
- Nikos Tapinos
- Molecular Neuroscience and Neurooncology Laboratory, Geisinger Clinic, Danville, Pennsylvania
60
Sato MP, Makino T, Kawata M. Natural selection in a population of Drosophila melanogaster explained by changes in gene expression caused by sequence variation in core promoter regions. BMC Evol Biol 2016; 16:35. [PMID: 26860869 PMCID: PMC4748610 DOI: 10.1186/s12862-016-0606-3] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 11/28/2015] [Accepted: 01/29/2016] [Indexed: 11/29/2022] [Open Access]
Abstract
Background Understanding the evolutionary forces that influence variation in gene regulatory regions in natural populations is an important challenge for evolutionary biology because natural selection for such variations could promote adaptive phenotypic evolution. Recently, whole-genome sequence analyses have identified regulatory regions subject to natural selection. However, these studies could not identify the relationship between sequence variation in the detected regions and change in gene expression levels. We analyzed sequence variations in core promoter regions, which are critical regions for gene regulation in higher eukaryotes, in a natural population of Drosophila melanogaster, and identified core promoter sequence variations associated with differences in gene expression levels subjected to natural selection. Results Among the core promoter regions whose sequence variation could change transcription factor binding sites and explain differences in expression levels, three core promoter regions were detected as candidates associated with purifying selection or selective sweep and seven as candidates associated with balancing selection, excluding the possibility of linkage between these regions and core promoter regions. CHKov1, which confers resistance to the sigma virus and related insecticides, was identified as having a core promoter region that has been subject to a selective sweep, although it could not be ruled out that selection for variation in core promoter regions was due to linked single nucleotide polymorphisms in the regulatory region outside core promoter regions. Nucleotide changes in core promoter regions of CHKov1 caused the loss of two basal transcription factor binding sites and acquisition of one transcription factor binding site, resulting in decreased gene expression levels. Of nine core promoter regions associated with balancing selection, brat and CG9044 are associated with neuromuscular junction development, and Nmda1 is associated with learning, behavioral plasticity, and memory. Diversity of neural and behavioral traits may have been maintained by balancing selection. Conclusions Our results revealed the evolutionary process occurring by natural selection for differences in gene expression levels caused by sequence variation in core promoter regions in a natural population. The sequences of core promoter regions were diverse even within the population, possibly providing a source for natural selection. Electronic supplementary material The online version of this article (doi:10.1186/s12862-016-0606-3) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Mitsuhiko P Sato
- Department of Ecology and Evolutionary Biology, Graduate School of Life Sciences, Tohoku University, 6-3, Aramaki Aza Aoba, Aoba-ku, Sendai, 980-8578, Japan.
- Takashi Makino
- Department of Ecology and Evolutionary Biology, Graduate School of Life Sciences, Tohoku University, 6-3, Aramaki Aza Aoba, Aoba-ku, Sendai, 980-8578, Japan.
- Masakado Kawata
- Department of Ecology and Evolutionary Biology, Graduate School of Life Sciences, Tohoku University, 6-3, Aramaki Aza Aoba, Aoba-ku, Sendai, 980-8578, Japan.
61
Deb A, Grewal RK, Kundu S. Regulatory Cross-Talks and Cascades in Rice Hormone Biosynthesis Pathways Contribute to Stress Signaling. Frontiers in Plant Science 2016; 7:1303. [PMID: 27617021 PMCID: PMC4999436 DOI: 10.3389/fpls.2016.01303] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 07/05/2016] [Accepted: 08/15/2016] [Indexed: 05/18/2023]
Abstract
Crosstalk among different hormone signaling pathways plays an important role in modulating plant response to both biotic and abiotic stress. Hormone activity is controlled by its bio-availability, which is in turn influenced by its biosynthesis. Thus, independent hormone biosynthesis pathways must be regulated and co-ordinated to mount an integrated response. One possibility is to use cis-regulatory elements (CREs) to orchestrate expression of hormone biosynthesis genes. Analysis of CREs associated with differentially expressed hormone biosynthesis related genes in rice leaf under Magnaporthe oryzae attack and drought stress enabled us to obtain insights about cross-talk among hormone biosynthesis pathways at the transcriptional level. We identified some master transcription regulators that co-ordinate different hormone biosynthesis pathways under stress. We found that Abscisic acid and Brassinosteroid regulate Cytokinin conjugation; conversely, Brassinosteroid biosynthesis is affected by both Abscisic acid and Cytokinin. Jasmonic acid and Ethylene biosynthesis may be modulated by Abscisic acid through DREB transcription factors. Jasmonic acid and Salicylic acid biosynthesis pathways are co-regulated, but they are unlikely to influence each other's production directly. Thus, multiple hormones may modulate hormone biosynthesis pathways through a complex regulatory network, where biosynthesis of one hormone is affected by several other contributing hormones.
Affiliation(s)
- Arindam Deb
- Department of Biophysics, Molecular Biology and Bioinformatics, University of Calcutta, Kolkata, India
- Rumdeep K. Grewal
- Department of Biophysics, Molecular Biology and Bioinformatics, University of Calcutta, Kolkata, India
- Computational Systems Biology Group, Center of Excellence in Systems Biology and Biomedical Engineering, University of Calcutta, Kolkata, India
- Sudip Kundu
- Department of Biophysics, Molecular Biology and Bioinformatics, University of Calcutta, Kolkata, India
- Computational Systems Biology Group, Center of Excellence in Systems Biology and Biomedical Engineering, University of Calcutta, Kolkata, India
- *Correspondence: Sudip Kundu
62
Koramutla MK, Bhatt D, Negi M, Venkatachalam P, Jain PK, Bhattacharya R. Strength, Stability, and cis-Motifs of In silico Identified Phloem-Specific Promoters in Brassica juncea (L.). Frontiers in Plant Science 2016; 7:457. [PMID: 27148290 PMCID: PMC4834444 DOI: 10.3389/fpls.2016.00457] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 11/21/2015] [Accepted: 03/24/2016] [Indexed: 05/03/2023]
Abstract
Aphids, a hemipteran group of insects, pose a serious threat to many of the major crop species including Brassica oilseeds. Transgenic strategies for developing aphid-resistant plant types necessitate phloem-bound expression of the insecticidal genes. A few known phloem-specific promoters, in spite of tissue-specific activity, fail to confer high-level gene expression. Here, we identified seven orthologues of phloem-specific promoters in B. juncea (Indian mustard), and experimentally validated their strength of expression in phloem exudates. Significant cis-motifs, globally occurring in phloem-specific promoters, showed variable distribution frequencies in these putative phloem-specific promoters of B. juncea. In an RT-qPCR based gene-expression study, the promoter of Glutamine synthetase 3A (GS3A) showed multifold higher activity compared to the others across the different growth stages of B. juncea plants. A statistical method employing four software tools was devised for rapidly analysing the stability of promoter activity across the plant developmental stages. The different statistical tools ranked these B. juncea promoters differently in terms of the stability of their promoter activity. Nevertheless, the consensus in output empirically suggested consistency in promoter activity of the six B. juncea phloem-specific promoters including GS3A. The study identified suitable endogenous promoters for high-level and consistent gene expression in B. juncea phloem exudate. The study also demonstrated a rapid method of assessing species-specific strength and stability in expression of the endogenous promoters.
Affiliation(s)
- Murali Krishna Koramutla
- National Research Centre on Plant Biotechnology, Indian Agricultural Research Institute Campus, New Delhi, India
- Deepa Bhatt
- National Research Centre on Plant Biotechnology, Indian Agricultural Research Institute Campus, New Delhi, India
- Manisha Negi
- National Research Centre on Plant Biotechnology, Indian Agricultural Research Institute Campus, New Delhi, India
- Pradeep K. Jain
- National Research Centre on Plant Biotechnology, Indian Agricultural Research Institute Campus, New Delhi, India
- Ramcharan Bhattacharya
- National Research Centre on Plant Biotechnology, Indian Agricultural Research Institute Campus, New Delhi, India
- *Correspondence: Ramcharan Bhattacharya
63
Nettling M, Treutler H, Grau J, Keilwagen J, Posch S, Grosse I. DiffLogo: a comparative visualization of sequence motifs. BMC Bioinformatics 2015; 16:387. [PMID: 26577052 PMCID: PMC4650857 DOI: 10.1186/s12859-015-0767-x] [Citation(s) in RCA: 53] [Impact Index Per Article: 5.3] [Received: 04/10/2015] [Accepted: 10/08/2015] [Indexed: 11/10/2022] [Open Access]
Abstract
BACKGROUND: For three decades, sequence logos have been the de facto standard for the visualization of sequence motifs in biology and bioinformatics. Reasons for this success story are their simplicity and clarity. The number of inferred and published motifs grows with the number of data sets and motif extraction algorithms. Hence, it becomes more and more important to perceive differences between motifs. However, motif differences are hard to detect from individual sequence logos in the case of multiple motifs for one transcription factor, highly similar binding motifs of different transcription factors, or multiple motifs for one protein domain. RESULTS: Here, we present DiffLogo, a freely available, extensible, and user-friendly R package for visualizing motif differences. DiffLogo is capable of showing differences between DNA motifs as well as protein motifs in a pair-wise manner, resulting in publication-ready figures. In the case of more than two motifs, DiffLogo is capable of visualizing pair-wise differences in a tabular form. Here, the motifs are ordered by similarity, and the difference logos are colored for clarity. We demonstrate the benefit of DiffLogo on CTCF motifs from different human cell lines, on E-box motifs of three basic helix-loop-helix transcription factors as examples for comparison of DNA motifs, and on F-box domains from three different families as an example for comparison of protein motifs. CONCLUSIONS: DiffLogo provides an intuitive visualization of motif differences. It enables the illustration and investigation of differences between highly similar motifs such as binding patterns of transcription factors for different cell types, treatments, and algorithmic approaches.
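As a rough illustration of what a pair-wise motif difference could look like numerically, the Python sketch below compares two motifs position by position. It is not DiffLogo's code (DiffLogo is an R package), and the abstract does not name the distance measure; the use of Jensen-Shannon divergence here, and the function names `js_divergence` and `motif_difference`, are assumptions made for this example.

```python
import math

def js_divergence(p, q):
    """Jensen-Shannon divergence (in bits) between two discrete distributions.

    Assumption: this measure is chosen for illustration only; the abstract
    does not state which measure DiffLogo uses.
    """
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with zero probability contribute nothing to the sum.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def motif_difference(pwm_a, pwm_b):
    """Per-position divergence between two motifs of equal width.

    Each motif is a list of columns; each column holds the probabilities
    of A, C, G, T at that position.
    """
    return [js_divergence(col_a, col_b) for col_a, col_b in zip(pwm_a, pwm_b)]

# Two hypothetical 2-column motifs: identical at position 1, disjoint at position 2.
motif_1 = [[0.7, 0.1, 0.1, 0.1], [1.0, 0.0, 0.0, 0.0]]
motif_2 = [[0.7, 0.1, 0.1, 0.1], [0.0, 1.0, 0.0, 0.0]]
differences = motif_difference(motif_1, motif_2)
```

Positions with a large value are the ones a difference logo would need to emphasize; positions near zero are shared between the two motifs.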
Affiliation(s)
- Martin Nettling
- Institute of Computer Science, Martin Luther University Halle-Wittenberg, Halle (Saale), Germany.
- Hendrik Treutler
- Leibniz Institute of Plant Biochemistry, Halle (Saale), Germany.
- Jan Grau
- Institute of Computer Science, Martin Luther University Halle-Wittenberg, Halle (Saale), Germany.
- Jens Keilwagen
- Institute for Biosafety in Plant Biotechnology, Julius Kühn-Institut (JKI), Federal Research Centre for Cultivated Plants, Quedlinburg, Germany.
- Stefan Posch
- Institute of Computer Science, Martin Luther University Halle-Wittenberg, Halle (Saale), Germany.
- Ivo Grosse
- Institute of Computer Science, Martin Luther University Halle-Wittenberg, Halle (Saale), Germany.
- German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Leipzig, Germany.
|
64
|
Gazestani VH, Salavati R. Deciphering RNA Regulatory Elements Involved in the Developmental and Environmental Gene Regulation of Trypanosoma brucei. PLoS One 2015; 10:e0142342. [PMID: 26529602 PMCID: PMC4631447 DOI: 10.1371/journal.pone.0142342] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 09/07/2015] [Accepted: 10/20/2015] [Indexed: 11/17/2022] Open
Abstract
Trypanosoma brucei is a vector-borne parasite with an intricate life cycle that can cause serious diseases in humans and animals. This pathogen relies on fine regulation of gene expression to respond and adapt to variable environments, with implications for transmission and infectivity. However, the regulatory elements involved and their mechanisms of action are largely unknown. Here, benefiting from a new graph-based approach for finding functional regulatory elements in RNA (GRAFFER), we have predicted 88 new RNA regulatory elements that are potentially involved in the gene regulatory network of T. brucei. We show that many of these newly predicted elements are responsive to both transcriptomic and proteomic changes during the life cycle of the parasite. Moreover, we found that 11 of the predicted elements strikingly resemble previously identified regulatory elements for the parasite. Additionally, comparison with previously predicted motifs in T. brucei suggested the superior performance of our approach based on the current limited knowledge of regulatory elements in T. brucei.
Affiliation(s)
- Vahid H Gazestani
- Institute of Parasitology, McGill University, 21111 Lakeshore Road, Ste. Anne de Bellevue, Montreal, Quebec, Canada
- Reza Salavati
- Institute of Parasitology, McGill University, 21111 Lakeshore Road, Ste. Anne de Bellevue, Montreal, Quebec, Canada; McGill Centre for Bioinformatics, McGill University, 3649 Promenade Sir William Osler, Montreal, Quebec, Canada; Department of Biochemistry, McGill University, McIntyre Medical Building, 3655 Promenade Sir William Osler, Montreal, Quebec, Canada
|
65
|
Li H, Li C, Hu J, Fan X. A Resampling Based Clustering Algorithm for Replicated Gene Expression Data. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2015; 12:1295-1303. [PMID: 26671802 DOI: 10.1109/tcbb.2015.2403320] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 06/05/2023]
Abstract
In gene expression data analysis, clustering is a fruitful exploratory technique to reveal the underlying molecular mechanism by identifying groups of co-expressed genes. To reduce the noise, usually multiple experimental replicates are performed. An integrative analysis of the full replicate data, instead of reducing the data to the mean profile, carries the promise of yielding more precise and robust clusters. In this paper, we propose a novel resampling based clustering algorithm for genes with replicated expression measurements. Assuming those replicates are exchangeable, we formulate the problem in the bootstrap framework, and aim to infer the consensus clustering based on the bootstrap samples of replicates. In our approach, we adopt the mixed effect model to accommodate the heterogeneous variances and implement a quasi-MCMC algorithm to conduct statistical inference. Experiments demonstrate that by taking advantage of the full replicate data, our algorithm produces more reliable clusters and has robust performance in diverse scenarios, especially when the data is subject to multiple sources of variance.
|
66
|
Karnik R, Beer MA. Identification of Predictive Cis-Regulatory Elements Using a Discriminative Objective Function and a Dynamic Search Space. PLoS One 2015; 10:e0140557. [PMID: 26465884 PMCID: PMC4605740 DOI: 10.1371/journal.pone.0140557] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 02/09/2015] [Accepted: 09/28/2015] [Indexed: 01/06/2023] Open
Abstract
The generation of genomic binding or accessibility data from massively parallel sequencing technologies such as ChIP-seq and DNase-seq continues to accelerate. Yet state-of-the-art computational approaches for the identification of DNA binding motifs often yield motifs of weak predictive power. Here we present a novel computational algorithm called MotifSpec, designed to find predictive motifs, in contrast to over-represented sequence elements. The key distinguishing feature of this algorithm is that it uses a dynamic search space and a learned threshold to find discriminative motifs, in combination with the modeling of motifs using a full PWM (position weight matrix) rather than k-mer words or regular expressions. We demonstrate that our approach finds motifs corresponding to known binding specificities in several mammalian ChIP-seq datasets, and that our PWMs classify the ChIP-seq signals with accuracy comparable to, or marginally better than, motifs from the best existing algorithms. In other datasets, our algorithm identifies novel motifs where other methods fail. Finally, we apply this algorithm to detect motifs from expression datasets in C. elegans using a dynamic expression similarity metric rather than fixed expression clusters, and find novel predictive motifs.
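To make the idea of classifying sequences with a full PWM and a score threshold concrete, here is a minimal Python sketch. It is not the MotifSpec implementation: the threshold is supplied by hand rather than learned, there is no dynamic search space, and the names `log_odds_pwm`, `best_site_score`, and `classify`, the uniform background, and the example motif are all assumptions for illustration.

```python
import math

BASES = "ACGT"

def log_odds_pwm(prob_columns, background=0.25):
    """Convert per-position base probabilities into log2-odds scores.

    Probabilities must be strictly positive (for example after adding
    pseudocounts); a uniform background is assumed here.
    """
    return [{b: math.log2(col[b] / background) for b in BASES}
            for col in prob_columns]

def best_site_score(pwm, sequence):
    """Highest PWM score over all windows of the sequence (forward strand only)."""
    width = len(pwm)
    scores = [
        sum(pwm[i][sequence[start + i]] for i in range(width))
        for start in range(len(sequence) - width + 1)
    ]
    return max(scores) if scores else float("-inf")

def classify(pwm, sequences, threshold):
    """Label a sequence as bound when its best site reaches the threshold."""
    return [best_site_score(pwm, seq) >= threshold for seq in sequences]

def column(consensus, p=0.85):
    """Hypothetical motif column favoring one base, the rest shared equally."""
    rest = (1.0 - p) / 3
    return {b: (p if b == consensus else rest) for b in BASES}

# Hypothetical 3-bp motif with consensus TGA and a hand-picked threshold.
pwm = log_odds_pwm([column("T"), column("G"), column("A")])
labels = classify(pwm, ["AATGAC", "CCCCCC", "TG"], threshold=3.0)
```

A discriminative method would tune both the matrix and the threshold so that these labels agree with the measured signal, rather than fixing the threshold in advance as this sketch does.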
Affiliation(s)
- Rahul Karnik
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, United States of America
- Michael A. Beer
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, United States of America
- McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University, Baltimore, MD, United States of America
|
67
|
De Witte D, Van de Velde J, Decap D, Van Bel M, Audenaert P, Demeester P, Dhoedt B, Vandepoele K, Fostier J. BLSSpeller: exhaustive comparative discovery of conserved cis-regulatory elements. Bioinformatics 2015; 31:3758-66. [PMID: 26254488 PMCID: PMC4653392 DOI: 10.1093/bioinformatics/btv466] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 09/09/2014] [Accepted: 08/03/2015] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION: The accurate discovery and annotation of regulatory elements remains a challenging problem. The growing number of sequenced genomes creates new opportunities for comparative approaches to motif discovery. Putative binding sites are then considered to be functional if they are conserved in orthologous promoter sequences of multiple related species. Existing methods for comparative motif discovery usually rely on pregenerated multiple sequence alignments, which are difficult to obtain for more diverged species such as plants. As a consequence, misaligned regulatory elements often remain undetected. RESULTS: We present a novel algorithm that supports both alignment-free and alignment-based motif discovery in the promoter sequences of related species. Putative motifs are exhaustively enumerated as words over the IUPAC alphabet and screened for conservation using the branch length score. Additionally, a confidence score is established in a genome-wide fashion. In order to take advantage of a cloud computing infrastructure, the MapReduce programming model is adopted. The method is applied to four monocotyledon plant species and it is shown that high-scoring motifs are significantly enriched for open chromatin regions in Oryza sativa and for transcription factor binding sites inferred through protein-binding microarrays in O. sativa and Zea mays. Furthermore, the method is shown to recover experimentally profiled ga2ox1-like KN1 binding sites in Z. mays. AVAILABILITY AND IMPLEMENTATION: BLSSpeller was written in Java. Source code and manual are available at http://bioinformatics.intec.ugent.be/blsspeller. CONTACT: Klaas.Vandepoele@psb.vib-ugent.be or jan.fostier@intec.ugent.be. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
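The alignment-free screening step can be pictured with a short Python sketch: a candidate word over the IUPAC alphabet is matched against the orthologous promoters of several species, and the species in which it occurs are collected. This is only an illustration, not BLSSpeller's Java/MapReduce code. It counts species instead of computing a branch length score (which would need a phylogenetic tree with branch lengths), and the function names, species labels, and sequences are hypothetical.

```python
# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

def matches_at(word, sequence, start):
    """True if the IUPAC word matches the sequence at the given offset."""
    return all(sequence[start + i] in IUPAC[code] for i, code in enumerate(word))

def occurs_in(word, sequence):
    """True if the word matches anywhere in the sequence (forward strand only)."""
    last_start = len(sequence) - len(word)
    return any(matches_at(word, sequence, s) for s in range(last_start + 1))

def conserved_species(word, promoters):
    """Sorted list of species whose promoter contains the word."""
    return sorted(sp for sp, seq in promoters.items() if occurs_in(word, seq))

# Hypothetical orthologous promoter fragments, keyed by made-up species labels.
promoters = {
    "species_a": "TTCACGTGAA",
    "species_b": "GGCACATGCC",
    "species_c": "AAAAAAAAAA",
}
hits = conserved_species("CACRTG", promoters)  # R stands for A or G
```

In a scoring scheme based on branch lengths, the set returned by `conserved_species` would be mapped onto the species tree and the branches connecting those species summed; here the set itself is the output.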
Affiliation(s)
- Dieter De Witte
- Department of Information Technology (INTEC), Ghent University-iMinds, Ghent, Belgium
- Jan Van de Velde
- Department of Plant Systems Biology, VIB and Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
- Dries Decap
- Department of Information Technology (INTEC), Ghent University-iMinds, Ghent, Belgium
- Michiel Van Bel
- Department of Plant Systems Biology, VIB and Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
- Pieter Audenaert
- Department of Information Technology (INTEC), Ghent University-iMinds, Ghent, Belgium
- Piet Demeester
- Department of Information Technology (INTEC), Ghent University-iMinds, Ghent, Belgium
- Bart Dhoedt
- Department of Information Technology (INTEC), Ghent University-iMinds, Ghent, Belgium
- Klaas Vandepoele
- Department of Plant Systems Biology, VIB and Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
- Jan Fostier
- Department of Information Technology (INTEC), Ghent University-iMinds, Ghent, Belgium
|
68
|
Hu P, Liu M, Zhang D, Wang J, Niu H, Liu Y, Wu Z, Han B, Zhai W, Shen Y, Chen L. Global identification of the genetic networks and cis-regulatory elements of the cold response in zebrafish. Nucleic Acids Res 2015; 43:9198-213. [PMID: 26227973 PMCID: PMC4627065 DOI: 10.1093/nar/gkv780] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.3] [Received: 12/25/2014] [Accepted: 07/20/2015] [Indexed: 12/17/2022] Open
Abstract
The transcriptional programs of ectothermic teleosts are directly influenced by water temperature. However, the cis- and trans-factors governing cold responses are not well characterized. We profiled transcriptional changes in eight zebrafish tissues exposed to mildly and severely cold temperatures using RNA-Seq. A total of 1943 differentially expressed genes (DEGs) were identified, from which 34 clusters representing distinct tissue and temperature response expression patterns were derived using the k-means fuzzy clustering algorithm. The promoter regions of the clustered DEGs that demonstrated strong co-regulation were analysed for enriched cis-regulatory elements with a motif discovery program, DREME. Seventeen motifs, ten known and seven novel, were identified, which covered 23% of the DEGs. Two motifs predicted to be the binding sites for the transcription factors Bcl6 and Jun, respectively, were chosen for experimental verification, and they demonstrated the expected cold-induced and cold-repressed patterns of gene regulation. Protein interaction modeling of the network components followed by experimental validation suggested that Jun physically interacts with Bcl6 and might be a hub factor that orchestrates the cold response in zebrafish. Thus, the methodology used and the regulatory networks uncovered in this study provide a foundation for exploring the mechanisms of cold adaptation in teleosts.
Affiliation(s)
- Peng Hu
- Key Laboratory of Aquacultural Resources and Utilization, Ministry of Education, College of Fisheries and Life Sciences, Shanghai Ocean University, Shanghai 201306, China; Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing 100101, China
- Mingli Liu
- Key Laboratory of Aquacultural Resources and Utilization, Ministry of Education, College of Fisheries and Life Sciences, Shanghai Ocean University, Shanghai 201306, China
- Dong Zhang
- Key Laboratory of Aquacultural Resources and Utilization, Ministry of Education, College of Fisheries and Life Sciences, Shanghai Ocean University, Shanghai 201306, China
- Jinfeng Wang
- Key Laboratory of Aquacultural Resources and Utilization, Ministry of Education, College of Fisheries and Life Sciences, Shanghai Ocean University, Shanghai 201306, China
- Hongbo Niu
- Key Laboratory of Aquacultural Resources and Utilization, Ministry of Education, College of Fisheries and Life Sciences, Shanghai Ocean University, Shanghai 201306, China
- Yimeng Liu
- Key Laboratory of Aquacultural Resources and Utilization, Ministry of Education, College of Fisheries and Life Sciences, Shanghai Ocean University, Shanghai 201306, China
- Zhichao Wu
- Key Laboratory of Aquacultural Resources and Utilization, Ministry of Education, College of Fisheries and Life Sciences, Shanghai Ocean University, Shanghai 201306, China
- Bingshe Han
- Key Laboratory of Aquacultural Resources and Utilization, Ministry of Education, College of Fisheries and Life Sciences, Shanghai Ocean University, Shanghai 201306, China
- Wanying Zhai
- Key Laboratory of Aquacultural Resources and Utilization, Ministry of Education, College of Fisheries and Life Sciences, Shanghai Ocean University, Shanghai 201306, China
- Yu Shen
- Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing 100101, China
- Liangbiao Chen
- Key Laboratory of Aquacultural Resources and Utilization, Ministry of Education, College of Fisheries and Life Sciences, Shanghai Ocean University, Shanghai 201306, China
|
69
|
Gilchrist MA, Chen WC, Shah P, Landerer CL, Zaretzki R. Estimating Gene Expression and Codon-Specific Translational Efficiencies, Mutation Biases, and Selection Coefficients from Genomic Data Alone. Genome Biol Evol 2015; 7:1559-79. [PMID: 25977456 PMCID: PMC4494061 DOI: 10.1093/gbe/evv087] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.6] [Indexed: 01/27/2023] Open
Abstract
Extracting biologically meaningful information from the continuing flood of genomic data is a major challenge in the life sciences. Codon usage bias (CUB) is a general feature of most genomes and is thought to reflect the effects of both natural selection for efficient translation and mutation bias. Here we present a mechanistically interpretable, Bayesian model (ribosome overhead costs Stochastic Evolutionary Model of Protein Production Rate [ROC SEMPPR]) to extract meaningful information from patterns of CUB within a genome. ROC SEMPPR is grounded in population genetics and allows us to separate the contributions of mutational biases and natural selection against translational inefficiency on a gene-by-gene and codon-by-codon basis. Until now, the primary disadvantage of similar approaches was the need for genome-scale measurements of gene expression. Here, we demonstrate that it is possible to extract accurate estimates of codon-specific mutation biases and translational efficiencies while simultaneously generating accurate estimates of gene expression, rather than requiring such information. We demonstrate the utility of ROC SEMPPR using the Saccharomyces cerevisiae S288c genome. When we compare our model fits with previous approaches we observe an exceptionally high agreement between estimates of both codon-specific parameters and gene expression levels ([Formula: see text] in all cases). We also observe strong agreement between our parameter estimates and those derived from alternative data sets. For example, our estimates of mutation bias and those from mutational accumulation experiments are highly correlated ([Formula: see text]). Our estimates of codon-specific translational inefficiencies and tRNA copy number-based estimates of ribosome pausing time ([Formula: see text]), and mRNA and ribosome profiling footprint-based estimates of gene expression ([Formula: see text]) are also highly correlated, thus supporting the hypothesis that selection against translational inefficiency is an important force driving the evolution of CUB. Surprisingly, we find that for particular amino acids, codon usage in highly expressed genes can still be largely driven by mutation bias and that failing to take mutation bias into account can lead to the misidentification of an amino acid's "optimal" codon. In conclusion, our method demonstrates that an enormous amount of biologically important information is encoded within genome-scale patterns of codon usage; accessing this information requires not gene expression measurements but carefully formulated, biologically interpretable models.
Affiliation(s)
- Michael A Gilchrist
- Department of Ecology & Evolutionary Biology, University of Tennessee, Knoxville; National Institute for Mathematical and Biological Synthesis, Knoxville, Tennessee
- Wei-Chen Chen
- Department of Ecology & Evolutionary Biology, University of Tennessee, Knoxville; Center for Biologics Evaluation and Research, Food and Drug Administration, Silver Spring, Maryland
- Premal Shah
- Department of Biology, University of Pennsylvania
- Cedric L Landerer
- Department of Ecology & Evolutionary Biology, University of Tennessee, Knoxville
- Russell Zaretzki
- National Institute for Mathematical and Biological Synthesis, Knoxville, Tennessee; Department of Business Analytics and Statistics, University of Tennessee, Knoxville
|
70
|
González-Álvarez DL, Vega-Rodríguez MA, Rubio-Largo Á. Finding Patterns in Protein Sequences by Using a Hybrid Multiobjective Teaching Learning Based Optimization Algorithm. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2015; 12:656-666. [PMID: 26357276 DOI: 10.1109/tcbb.2014.2369043] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Indexed: 06/05/2023]
Abstract
Proteins are molecules that form the mass of living beings. These proteins exist in dissociated forms like amino acids and carry out various biological functions; in fact, almost all body reactions occur with the participation of proteins. This is one of the reasons why the analysis of proteins has become a major issue in biology. More concretely, the identification of conserved patterns in a set of related protein sequences can provide relevant biological information about the functions of these proteins. In this paper, we present a novel algorithm based on teaching learning based optimization (TLBO) combined with a local search function specialized to predict common patterns in sets of protein sequences. This population-based evolutionary algorithm defines a group of individuals (solutions) that enhance their knowledge (quality) by means of different learning stages. Thus, if we correctly adapt it to the biological context of the problem, we can obtain an acceptable set of quality solutions. To evaluate the performance of the proposed technique, we have used six instances composed of different related protein sequences obtained from the PROSITE database. As we will see, the designed approach makes good predictions and improves the quality of the solutions found by other well-known biological tools.
|
71
|
Lihu A, Holban T. A review of ensemble methods for de novo motif discovery in ChIP-Seq data. Brief Bioinform 2015; 16:964-73. [DOI: 10.1093/bib/bbv022] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Received: 01/15/2015] [Indexed: 01/17/2023] Open
|
72
|
Ikebata H, Yoshida R. Repulsive parallel MCMC algorithm for discovering diverse motifs from large sequence sets. Bioinformatics 2015; 31:1561-8. [PMID: 25583120 PMCID: PMC4426842 DOI: 10.1093/bioinformatics/btv017] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Received: 09/16/2014] [Accepted: 01/06/2015] [Indexed: 11/14/2022] Open
Abstract
Motivation: The motif discovery problem consists of finding recurring patterns of short strings in a set of nucleotide sequences. This classical problem is receiving renewed attention as most early motif discovery methods lack the ability to handle the large data sets of recent genome-wide ChIP studies. New ChIP-tailored methods focus on reducing computation time and pay little regard to the accuracy of motif detection. Unlike such methods, our method focuses on increasing the detection accuracy while maintaining the computation efficiency at an acceptable level. The major advantage of our method is that it can mine diverse multiple motifs undetectable by current methods. Results: The repulsive parallel Markov chain Monte Carlo (RPMCMC) algorithm that we propose is a parallel version of the widely used Gibbs motif sampler. RPMCMC is run on parallel interacting motif samplers. A repulsive force is generated when different motifs produced by different samplers come near each other. Thus, different samplers explore different motifs. In this way, we can detect much more diverse motifs than conventional methods can. Through application to 228 transcription factor ChIP-seq datasets of the ENCODE project, we show that the RPMCMC algorithm can find many reliable cofactor interacting motifs that existing methods are unable to discover. Availability and implementation: A C++ implementation of RPMCMC and discovered cofactor motifs for the 228 ENCODE ChIP-seq datasets are available from http://daweb.ism.ac.jp/yoshidalab/motif. Contact: ikebata.hisaki@ism.ac.jp, yoshidar@ism.ac.jp. Supplementary information: Supplementary data are available from Bioinformatics online.
Affiliation(s)
- Hisaki Ikebata
- Department of Statistical Science, The Graduate University for Advanced Studies (Sokendai), 10-3 Midori-cho, Tachikawa, Tokyo 190-8562, Japan, Department of Statistical Modeling, The Institute of Statistical Mathematics, Research Organization of Information and Systems, 10-3 Midori-cho, Tachikawa, Tokyo 190-8562, Japan, JST-CREST, 10-3 Midori-cho, Tachikawa, Tokyo 190-8562, Japan, JST-ERATO Sato Live Bio-Forecasting Project, 2-2-2 Hikaridai Seika-cho, Soraku-gun, Khoto-fu 619-0288, Japan and The Thomas N. Sato BioMEC-X Laboratories, Advanced Telecommunications Research Institute International, 2-2-2 Hikaridai Seika-cho, Soraku-gun, Khoto-fu 619-0288, Japan
- Ryo Yoshida
- Department of Statistical Science, The Graduate University for Advanced Studies (Sokendai), 10-3 Midori-cho, Tachikawa, Tokyo 190-8562, Japan, Department of Statistical Modeling, The Institute of Statistical Mathematics, Research Organization of Information and Systems, 10-3 Midori-cho, Tachikawa, Tokyo 190-8562, Japan, JST-CREST, 10-3 Midori-cho, Tachikawa, Tokyo 190-8562, Japan, JST-ERATO Sato Live Bio-Forecasting Project, 2-2-2 Hikaridai Seika-cho, Soraku-gun, Khoto-fu 619-0288, Japan and The Thomas N. Sato BioMEC-X Laboratories, Advanced Telecommunications Research Institute International, 2-2-2 Hikaridai Seika-cho, Soraku-gun, Khoto-fu 619-0288, Japan
|
73
|
Comparative analysis of regulatory information and circuits across distant species. Nature 2014; 512:453-6. [PMID: 25164757 PMCID: PMC4336544 DOI: 10.1038/nature13668] [Citation(s) in RCA: 132] [Impact Index Per Article: 12.0] [Received: 11/22/2013] [Accepted: 07/10/2014] [Indexed: 12/20/2022]
Abstract
Despite the large evolutionary distances between metazoan species, they can show remarkable commonalities in their biology, and this has helped to establish fly and worm as model organisms for human biology. Although studies of individual elements and factors have explored similarities in gene regulation, a large-scale comparative analysis of basic principles of transcriptional regulatory features is lacking. Here we map the genome-wide binding locations of 165 human, 93 worm and 52 fly transcription regulatory factors, generating a total of 1,019 data sets from diverse cell types, developmental stages, or conditions in the three species, of which 498 (48.9%) are presented here for the first time. We find that structural properties of regulatory networks are remarkably conserved and that orthologous regulatory factor families recognize similar binding motifs in vivo and show some similar co-associations. Our results suggest that gene-regulatory properties previously observed for individual factors are general principles of metazoan regulation that are remarkably well-preserved despite extensive functional divergence of individual network connections. The comparative maps of regulatory circuitry provided here will drive an improved understanding of the regulatory underpinnings of model organism biology and how these relate to human biology, development and disease.
|
74
|
Bosio MC, Negri R, Dieci G. Promoter architectures in the yeast ribosomal expression program. Transcription 2014; 2:71-77. [PMID: 21468232 DOI: 10.4161/trns.2.2.14486] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.5] [Received: 11/26/2010] [Revised: 12/15/2010] [Accepted: 12/16/2010] [Indexed: 12/13/2022] Open
Abstract
Ribosome biogenesis begins with the orchestrated expression of hundreds of genes, including the three large classes of ribosomal protein, ribosome biogenesis and snoRNA genes. Current knowledge about the corresponding promoters suggests the existence of novel class-specific transcriptional strategies and crosstalk between telomere length and cell growth control.
Affiliation(s)
- Maria Cristina Bosio
- Dipartimento di Biochimica e Biologia Molecolare; Università degli Studi di Parma; Parma
|
75
|
Shi M, Gao T, Ju L, Yao Y, Gao H. Effects of FlrBC on flagellar biosynthesis of Shewanella oneidensis. Mol Microbiol 2014; 93:1269-83. [PMID: 25074236 DOI: 10.1111/mmi.12731] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.4] [Accepted: 07/24/2014] [Indexed: 11/27/2022]
Abstract
As a highly conserved, complex molecular machine made up of a large number of structural subunits, the flagellum is under tight regulation by hierarchical arrangements. Although variations in polar flagellar systems are found, most of them are restricted to multiple-copy components, such as flagellins and stators. Therefore, these features are regarded as peripheral relative to the comprehensive conservation. In this study, however, we present evidence to show that the difference in highly conserved polar flagellar systems can be surprisingly profound, even at the heart of the classical regulatory hierarchy. In Gram-negative Shewanella oneidensis, the two-component system FlrBC, whose counterpart is essential for flagellar biosynthesis and motility by directly controlling expression of class III genes in polarly flagellated bacteria such as Vibrio cholerae, is dispensable for the process. The system directly controls expression of the flaA gene, encoding a flagellin of weak motility. We further show that the ratio of two flagellins, FlaA and FlaB, determines motility of a flagellum. More strikingly, overproduction of FlrC results in a peritrichously multi-flagellated phenotype, and FlrC is likely to function as an activator in its unphosphorylated form for transcription of the flaA gene, in contrast to the previously characterized counterpart.
Affiliation(s)
- Miaomiao Shi
- Institute of Microbiology and College of Life Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China; Key Laboratory for Agro-Microbial Research and Utilization, Hangzhou, 310058, Zhejiang, China
|
76
|
Yu X, Gao H, Zheng X, Li C, Wang J. A computational method of predicting regulatory interactions in Arabidopsis based on gene expression data and sequence information. Comput Biol Chem 2014; 51:36-41. [DOI: 10.1016/j.compbiolchem.2014.04.003] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 12/18/2013] [Revised: 04/16/2014] [Accepted: 04/27/2014] [Indexed: 10/25/2022]
|
77
|
iRegulon: from a gene list to a gene regulatory network using large motif and track collections. PLoS Comput Biol 2014; 10:e1003731. [PMID: 25058159 PMCID: PMC4109854 DOI: 10.1371/journal.pcbi.1003731] [Citation(s) in RCA: 647] [Impact Index Per Article: 58.8] [Received: 02/13/2014] [Accepted: 05/27/2014] [Indexed: 01/17/2023] Open
Abstract
Identifying master regulators of biological processes and mapping their downstream gene networks are key challenges in systems biology. We developed a computational method, called iRegulon, to reverse-engineer the transcriptional regulatory network underlying a co-expressed gene set using cis-regulatory sequence analysis. iRegulon implements a genome-wide ranking-and-recovery approach to detect enriched transcription factor motifs and their optimal sets of direct targets. We increase the accuracy of network inference by using very large motif collections of up to ten thousand position weight matrices collected from various species, and linking these to candidate human TFs via a motif2TF procedure. We validate iRegulon on gene sets derived from ENCODE ChIP-seq data with increasing levels of noise, and we compare iRegulon with existing motif discovery methods. Next, we use iRegulon on more challenging types of gene lists, including microRNA target sets, protein-protein interaction networks, and genetic perturbation data. In particular, we over-activate p53 in breast cancer cells, followed by RNA-seq and ChIP-seq, and could identify an extensive up-regulated network controlled directly by p53. Similarly we map a repressive network with no indication of direct p53 regulation but rather an indirect effect via E2F and NFY. Finally, we generalize our computational framework to include regulatory tracks such as ChIP-seq data and show how motif and track discovery can be combined to map functional regulatory interactions among co-expressed genes. iRegulon is available as a Cytoscape plugin from http://iregulon.aertslab.org.
Gene regulatory networks control developmental, homeostatic, and disease processes by governing precise levels and spatio-temporal patterns of gene expression. Determining their topology can provide mechanistic insight into these processes. Gene regulatory networks consist of interactions between transcription factors and their direct target genes. Each regulatory interaction represents the binding of the transcription factor to a specific DNA binding site near its target gene. Here we present a computational method, called iRegulon, to identify master regulators and direct target genes in a human gene signature, i.e. a set of co-expressed genes. iRegulon relies on the analysis of the regulatory sequences around each gene in the gene set to detect enriched TF motifs or ChIP-seq peaks, using databases of nearly 10,000 TF motifs and 1000 ChIP-seq data sets or “tracks”. Next, it associates enriched motifs and tracks with candidate transcription factors and determines the optimal subset of direct target genes. We validate iRegulon on ENCODE data, and use it in combination with RNA-seq and ChIP-seq data to map a p53 downstream network with new predicted co-factors and targets. iRegulon is available as a Cytoscape plugin, supporting human, mouse, and Drosophila genes, and provides access to hundreds of cancer-related TF-target subnetworks or “regulons”.
|
78
|
Identification of rice genes associated with cosmic-ray response via co-expression gene network analysis. Gene 2014; 541:82-91. [DOI: 10.1016/j.gene.2014.02.060] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 08/13/2013] [Revised: 02/10/2014] [Accepted: 02/14/2014] [Indexed: 11/20/2022]
|
79
|
An improved systematic approach to predicting transcription factor target genes using support vector machine. PLoS One 2014; 9:e94519. [PMID: 24743548 PMCID: PMC3990533 DOI: 10.1371/journal.pone.0094519] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Received: 10/11/2012] [Accepted: 03/17/2014] [Indexed: 11/21/2022] Open
Abstract
Biological prediction of transcription factor binding sites and their corresponding transcription factor target genes (TFTGs) contributes greatly to understanding gene regulatory networks. However, these approaches are based on laborious and time-consuming biological experiments. Numerous computational approaches have shown great potential to circumvent laborious biological methods. However, the majority of these algorithms offer limited performance and fail to consider the structural properties of the datasets. We proposed a refined systematic computational approach for predicting TFTGs. Based on previous work on identifying auxin response factor target genes from Arabidopsis thaliana co-expression data, we adopted a novel reverse-complementary distance-sensitive n-gram profile algorithm. This algorithm converts each upstream sub-sequence into a high-dimensional vector data point and transforms the prediction task into a classification problem using a support vector machine-based classifier. Our approach showed significant improvement compared to other computational methods based on the area under curve value of the receiver operating characteristic curve using 10-fold cross validation. In addition, in light of the highly skewed structure of the dataset, we also evaluated other metrics and their associated curves, such as precision-recall curves and cost curves, which provided highly satisfactory results.
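The idea of turning an upstream sub-sequence into a fixed-length vector can be illustrated with a minimal sketch. It is a simplified stand-in, not the authors' algorithm: it only counts n-grams while treating each n-gram and its reverse complement as the same feature, and it omits the distance-sensitive component described in the abstract. The function names and the choice of n are hypothetical.

```python
from itertools import product

# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string over A, C, G, T."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical(ngram):
    """Map an n-gram and its reverse complement to one shared key."""
    return min(ngram, reverse_complement(ngram))


def rc_ngram_profile(seq, n=3):
    """Count n-grams in seq, merging each n-gram with its reverse complement.

    Returns (keys, vector): the sorted canonical n-grams and their counts.
    Windows containing characters other than A, C, G, T are skipped.
    """
    seq = seq.upper()
    keys = sorted({canonical("".join(p)) for p in product("ACGT", repeat=n)})
    counts = dict.fromkeys(keys, 0)
    for i in range(len(seq) - n + 1):
        window = seq[i:i + n]
        if set(window) <= set("ACGT"):
            counts[canonical(window)] += 1
    return keys, [counts[k] for k in keys]
```

Because the features are strand-symmetric, a sequence and its reverse complement map to the same vector, which is the property a reverse-complementary profile is meant to provide. The resulting vectors could then be passed to any standard classifier.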
|
80
|
Claussnitzer M, Dankel SN, Klocke B, Grallert H, Glunk V, Berulava T, Lee H, Oskolkov N, Fadista J, Ehlers K, Wahl S, Hoffmann C, Qian K, Rönn T, Riess H, Müller-Nurasyid M, Bretschneider N, Schroeder T, Skurk T, Horsthemke B, Spieler D, Klingenspor M, Seifert M, Kern MJ, Mejhert N, Dahlman I, Hansson O, Hauck SM, Blüher M, Arner P, Groop L, Illig T, Suhre K, Hsu YH, Mellgren G, Hauner H, Laumen H. Leveraging cross-species transcription factor binding site patterns: from diabetes risk loci to disease mechanisms. Cell 2014; 156:343-58. [PMID: 24439387 DOI: 10.1016/j.cell.2013.10.058] [Citation(s) in RCA: 81] [Impact Index Per Article: 7.4] [Received: 04/06/2013] [Revised: 09/05/2013] [Accepted: 10/30/2013] [Indexed: 10/25/2022]
Abstract
Genome-wide association studies have revealed numerous risk loci associated with diverse diseases. However, identification of disease-causing variants within association loci remains a major challenge. Divergence in gene expression due to cis-regulatory variants in noncoding regions is central to disease susceptibility. We show that integrative computational analysis of phylogenetic conservation with a complexity assessment of co-occurring transcription factor binding sites (TFBS) can identify cis-regulatory variants and elucidate their mechanistic role in disease. Analysis of established type 2 diabetes risk loci revealed a striking clustering of distinct homeobox TFBS. We identified the PRRX1 homeobox factor as a repressor of PPARG2 expression in adipose cells and demonstrate its adverse effect on lipid metabolism and systemic insulin sensitivity, dependent on the rs4684847 risk allele that triggers PRRX1 binding. Thus, cross-species conservation analysis at the level of co-occurring TFBS provides a valuable contribution to the translation of genetic association signals to disease-related molecular mechanisms.
Affiliation(s)
- Melina Claussnitzer
- Chair of Nutritional Medicine, Technische Universität München, Else Kröner-Fresenius-Center for Nutritional Medicine, 85350 Freising-Weihenstephan, Germany; Nutritional Medicine Unit, ZIEL-Research Center for Nutrition and Food Sciences, Technische Universität München, 85350 Freising-Weihenstephan, Germany; German Center for Diabetes Research (DZD), 85764 Neuherberg, Germany; Clinical Cooperation Group Nutrigenomics and Type 2 Diabetes, Helmholtz Zentrum München, German Research Center for Environmental Health, 85764 Neuherberg, Germany and Technische Universität München, 85350 Freising-Weihenstephan, Germany; Hebrew SeniorLife Institute for Aging Research, Harvard Medical School, Boston, MA 02131, USA.
- Simon N Dankel
- Department of Clinical Science, University of Bergen, 5021 Bergen, Norway; K.G. Jebsen Center for Diabetes Research, Department of Clinical Science, University of Bergen, N-5021 Bergen, Norway; Hormone Laboratory, Haukeland University Hospital, 5021 Bergen, Norway
- Harald Grallert
- German Center for Diabetes Research (DZD), 85764 Neuherberg, Germany; Research Unit of Molecular Epidemiology, Helmholtz Zentrum München, 85764 Neuherberg, Germany
- Viktoria Glunk
- Chair of Nutritional Medicine, Technische Universität München, Else Kröner-Fresenius-Center for Nutritional Medicine, 85350 Freising-Weihenstephan, Germany; Nutritional Medicine Unit, ZIEL-Research Center for Nutrition and Food Sciences, Technische Universität München, 85350 Freising-Weihenstephan, Germany; German Center for Diabetes Research (DZD), 85764 Neuherberg, Germany; Clinical Cooperation Group Nutrigenomics and Type 2 Diabetes, Helmholtz Zentrum München, German Research Center for Environmental Health, 85764 Neuherberg, Germany and Technische Universität München, 85350 Freising-Weihenstephan, Germany
- Tea Berulava
- Institut für Humangenetik, Universitätsklinikum Essen, Universität-Duisburg-Essen, 45147 Essen, Germany
- Heekyoung Lee
- Chair of Nutritional Medicine, Technische Universität München, Else Kröner-Fresenius-Center for Nutritional Medicine, 85350 Freising-Weihenstephan, Germany; Nutritional Medicine Unit, ZIEL-Research Center for Nutrition and Food Sciences, Technische Universität München, 85350 Freising-Weihenstephan, Germany; German Center for Diabetes Research (DZD), 85764 Neuherberg, Germany; Clinical Cooperation Group Nutrigenomics and Type 2 Diabetes, Helmholtz Zentrum München, German Research Center for Environmental Health, 85764 Neuherberg, Germany and Technische Universität München, 85350 Freising-Weihenstephan, Germany
- Nikolay Oskolkov
- Diabetes and Endocrinology Research Unit, Department of Clinical Sciences, Lund University, Malmö 20502, Sweden
- Joao Fadista
- Diabetes and Endocrinology Research Unit, Department of Clinical Sciences, Lund University, Malmö 20502, Sweden
- Kerstin Ehlers
- Chair of Nutritional Medicine, Technische Universität München, Else Kröner-Fresenius-Center for Nutritional Medicine, 85350 Freising-Weihenstephan, Germany; Nutritional Medicine Unit, ZIEL-Research Center for Nutrition and Food Sciences, Technische Universität München, 85350 Freising-Weihenstephan, Germany; German Center for Diabetes Research (DZD), 85764 Neuherberg, Germany; Clinical Cooperation Group Nutrigenomics and Type 2 Diabetes, Helmholtz Zentrum München, German Research Center for Environmental Health, 85764 Neuherberg, Germany and Technische Universität München, 85350 Freising-Weihenstephan, Germany
- Simone Wahl
- German Center for Diabetes Research (DZD), 85764 Neuherberg, Germany; Research Unit of Molecular Epidemiology, Helmholtz Zentrum München, 85764 Neuherberg, Germany
- Christoph Hoffmann
- Nutritional Medicine Unit, ZIEL-Research Center for Nutrition and Food Sciences, Technische Universität München, 85350 Freising-Weihenstephan, Germany; Chair of Molecular Nutritional Medicine, Technische Universität München, Else Kröner-Fresenius-Center for Nutritional Medicine, 85350 Freising-Weihenstephan, Germany
- Kun Qian
- Chair of Nutritional Medicine, Technische Universität München, Else Kröner-Fresenius-Center for Nutritional Medicine, 85350 Freising-Weihenstephan, Germany; Nutritional Medicine Unit, ZIEL-Research Center for Nutrition and Food Sciences, Technische Universität München, 85350 Freising-Weihenstephan, Germany; German Center for Diabetes Research (DZD), 85764 Neuherberg, Germany; Clinical Cooperation Group Nutrigenomics and Type 2 Diabetes, Helmholtz Zentrum München, German Research Center for Environmental Health, 85764 Neuherberg, Germany and Technische Universität München, 85350 Freising-Weihenstephan, Germany
- Tina Rönn
- Diabetes and Endocrinology Research Unit, Department of Clinical Sciences, Lund University, Malmö 20502, Sweden
- Helene Riess
- Department of Internal Medicine II-Cardiology, University of Ulm Medical Center, 89081 Ulm, Germany; Institute of Epidemiology II, Helmholtz Zentrum München-German Research Center for Environmental Health, 85764 Neuherberg, Germany
- Martina Müller-Nurasyid
- Institute of Genetic Epidemiology, Helmholtz Zentrum München-German Research Center for Environmental Health, 85764 Neuherberg, Germany; Department of Medicine I, University Hospital Grosshadern, Ludwig-Maximilians-Universität, 81377 Munich, Germany; Institute of Medical Informatics, Biometry and Epidemiology, Chair of Genetic Epidemiology, Ludwig-Maximilians-Universität, 81377 Munich, Germany
- Timm Schroeder
- Research Unit Stem Cell Dynamics, Helmholtz Center Munich-German Research Center for Environmental Health GmbH, 85764 Neuherberg, Germany; Department of Biosystems Science and Engineering (D-BSSE), ETH Zurich, 4058 Basel, Switzerland
- Thomas Skurk
- Chair of Nutritional Medicine, Technische Universität München, Else Kröner-Fresenius-Center for Nutritional Medicine, 85350 Freising-Weihenstephan, Germany; Nutritional Medicine Unit, ZIEL-Research Center for Nutrition and Food Sciences, Technische Universität München, 85350 Freising-Weihenstephan, Germany; Else Kröner-Fresenius-Center for Nutritional Medicine, Klinikum rechts der Isar, Technische Universität München, 81675 Munich, Germany
- Bernhard Horsthemke
- Institut für Humangenetik, Universitätsklinikum Essen, Universität-Duisburg-Essen, 45147 Essen, Germany
- Derek Spieler
- Institute of Human Genetics, Helmholtz Zentrum München, 85764 Neuherberg, German Research Center for Environmental Health, Germany; Department of Neurology, Klinikum rechts der Isar, Technische Universität München, 81675 Munich, Germany
- Martin Klingenspor
- Nutritional Medicine Unit, ZIEL-Research Center for Nutrition and Food Sciences, Technische Universität München, 85350 Freising-Weihenstephan, Germany; Chair of Molecular Nutritional Medicine, Technische Universität München, Else Kröner-Fresenius-Center for Nutritional Medicine, 85350 Freising-Weihenstephan, Germany
- Michael J Kern
- Department of Regenerative Medicine and Cell Biology, Medical University of South Carolina, Charleston, SC 29425, USA
- Niklas Mejhert
- Department of Medicine, Karolinska Institutet, Center for Endocrinology and Metabolism, Karolinska University Hospital Huddinge, SE-141 86 Stockholm, Sweden
- Ingrid Dahlman
- Department of Medicine, Karolinska Institutet, Center for Endocrinology and Metabolism, Karolinska University Hospital Huddinge, SE-141 86 Stockholm, Sweden
- Ola Hansson
- Diabetes and Endocrinology Research Unit, Department of Clinical Sciences, Lund University, Malmö 20502, Sweden
- Stefanie M Hauck
- German Center for Diabetes Research (DZD), 85764 Neuherberg, Germany; Research Unit Protein Science, Helmholtz Zentrum München, 85764 Neuherberg, Germany
- Matthias Blüher
- Department of Medicine, University of Leipzig, 04103 Leipzig, Germany
- Peter Arner
- Department of Medicine, Karolinska Institutet, Center for Endocrinology and Metabolism, Karolinska University Hospital Huddinge, SE-141 86 Stockholm, Sweden
- Leif Groop
- Diabetes and Endocrinology Research Unit, Department of Clinical Sciences, Lund University, Malmö 20502, Sweden
- Thomas Illig
- Research Unit of Molecular Epidemiology, Helmholtz Zentrum München, 85764 Neuherberg, Germany; Hanover Unified Biobank, Hanover Medical School, 30625 Hanover, Germany
- Karsten Suhre
- Institute of Bioinformatics and Systems Biology, Helmholtz Zentrum München, German Research Center for Environmental Health, Ingolstädter Landstraße 1, 85764 Neuherberg, Germany; Department of Physiology and Biophysics, Weill Cornell Medical College in Qatar, Education City, Qatar Foundation, PO Box 24144, Doha, Qatar
- Yi-Hsiang Hsu
- Hebrew SeniorLife Institute for Aging Research, Harvard Medical School, Boston, MA 02131, USA; Molecular and Integrative Physiological Sciences, Harvard School of Public Health, Boston, MA 02115, USA
- Gunnar Mellgren
- Department of Clinical Science, University of Bergen, 5021 Bergen, Norway; K.G. Jebsen Center for Diabetes Research, Department of Clinical Science, University of Bergen, N-5021 Bergen, Norway; Hormone Laboratory, Haukeland University Hospital, 5021 Bergen, Norway
- Hans Hauner
- Chair of Nutritional Medicine, Technische Universität München, Else Kröner-Fresenius-Center for Nutritional Medicine, 85350 Freising-Weihenstephan, Germany; Nutritional Medicine Unit, ZIEL-Research Center for Nutrition and Food Sciences, Technische Universität München, 85350 Freising-Weihenstephan, Germany; German Center for Diabetes Research (DZD), 85764 Neuherberg, Germany; Clinical Cooperation Group Nutrigenomics and Type 2 Diabetes, Helmholtz Zentrum München, German Research Center for Environmental Health, 85764 Neuherberg, Germany and Technische Universität München, 85350 Freising-Weihenstephan, Germany; Else Kröner-Fresenius-Center for Nutritional Medicine, Klinikum rechts der Isar, Technische Universität München, 81675 Munich, Germany
- Helmut Laumen
- Chair of Nutritional Medicine, Technische Universität München, Else Kröner-Fresenius-Center for Nutritional Medicine, 85350 Freising-Weihenstephan, Germany; Nutritional Medicine Unit, ZIEL-Research Center for Nutrition and Food Sciences, Technische Universität München, 85350 Freising-Weihenstephan, Germany; German Center for Diabetes Research (DZD), 85764 Neuherberg, Germany; Clinical Cooperation Group Nutrigenomics and Type 2 Diabetes, Helmholtz Zentrum München, German Research Center for Environmental Health, 85764 Neuherberg, Germany and Technische Universität München, 85350 Freising-Weihenstephan, Germany; Institute of Experimental Genetics, Helmholtz Zentrum München, Neuherberg 85764, Germany.
|
81
|
Liu G, Marras A, Nielsen J. The future of genome-scale modeling of yeast through integration of a transcriptional regulatory network. Quantitative Biology 2014. [DOI: 10.1007/s40484-014-0027-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 10/25/2022]
|
82
|
Abstract
The term “transcriptional network” refers to the mechanism(s) that underlies coordinated expression of genes, typically involving transcription factors (TFs) binding to the promoters of multiple genes, and individual genes controlled by multiple TFs. A multitude of studies in the last two decades have aimed to map and characterize transcriptional networks in the yeast Saccharomyces cerevisiae. We review the methodologies and accomplishments of these studies, as well as challenges we now face. For most yeast TFs, data have been collected on their sequence preferences, in vivo promoter occupancy, and gene expression profiles in deletion mutants. These systematic studies have led to the identification of new regulators of numerous cellular functions and shed light on the overall organization of yeast gene regulation. However, many yeast TFs appear to be inactive under standard laboratory growth conditions, and many of the available data were collected using techniques that have since been improved. Perhaps as a consequence, comprehensive and accurate mapping among TF sequence preferences, promoter binding, and gene expression remains an open challenge. We propose that the time is ripe for renewed systematic efforts toward a complete mapping of yeast transcriptional regulatory mechanisms.
|
83
|
Tran NTL, Huang CH. A survey of motif finding Web tools for detecting binding site motifs in ChIP-Seq data. Biol Direct 2014; 9:4. [PMID: 24555784 PMCID: PMC4022013 DOI: 10.1186/1745-6150-9-4] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.3] [Received: 11/07/2013] [Revised: 01/08/2014] [Accepted: 02/11/2014] [Indexed: 12/24/2022] Open
Abstract
ChIP-Seq (chromatin immunoprecipitation sequencing) offers an advantage for motif finding because ChIP-Seq experiments narrow the search to binding site locations. Recent motif finding tools facilitate motif detection by providing user-friendly Web interfaces. In this work, we reviewed nine motif finding Web tools that are capable of detecting binding site motifs in ChIP-Seq data. We showed that each motif finding Web tool has its own advantages for detecting motifs that other tools may not discover. We recommend that users apply multiple motif finding Web tools that implement different algorithms to obtain significant motifs, overlapping similar motifs, and non-overlapping motifs. Finally, we provided our suggestions for the future development of motif finding Web tools that better assist researchers in finding motifs in ChIP-Seq data. Reviewers: This article was reviewed by Prof. Sandor Pongor, Dr. Yuriy Gusev, and Dr. Shyam Prabhakar (nominated by Prof. Limsoon Wong).
Affiliation(s)
- Ngoc Tam L Tran
- Department of Computer Science and Engineering, University of Connecticut, 371 Fairfield Way, Unit 4155, Storrs, CT 06269, USA.
|
84
|
Jia C, Carson MB, Wang Y, Lin Y, Lu H. A new exhaustive method and strategy for finding motifs in ChIP-enriched regions. PLoS One 2014; 9:e86044. [PMID: 24475069 PMCID: PMC3901781 DOI: 10.1371/journal.pone.0086044] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Received: 06/03/2013] [Accepted: 12/04/2013] [Indexed: 12/22/2022] Open
Abstract
ChIP-seq, which combines chromatin immunoprecipitation (ChIP) with next-generation parallel sequencing, allows for the genome-wide identification of protein-DNA interactions. This technology poses new challenges for the development of novel motif-finding algorithms and methods for determining exact protein-DNA binding sites from ChIP-enriched sequencing data. State-of-the-art heuristic and exhaustive search algorithms are largely limited to identifying short (l, d) motifs (l ≤ 10, d ≤ 2) contained in ChIP-enriched regions. In this work we have developed a more powerful exhaustive method (FMotif) for finding long (l, d) motifs in DNA sequences. In conjunction with our method, we have adopted a simple ChIP-enriched sampling strategy for finding these motifs in large-scale ChIP-enriched regions. Empirical studies on synthetic samples and applications using several ChIP data sets, including 16 TF (transcription factor) ChIP-seq data sets and five TF ChIP-exo data sets, have demonstrated that our proposed method is capable of finding these motifs with high efficiency and accuracy. The source code for FMotif is available at http://211.71.76.45/FMotif/.
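For readers unfamiliar with the (l, d) notation used above: an (l, d) motif is usually taken to be a pattern of length l that occurs in the input sequences with at most d mismatches. The sketch below only checks that definition by brute force for a given candidate pattern; it is not FMotif and says nothing about how FMotif searches. All function names are hypothetical.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def has_instance(motif, seq, d):
    """True if some window of seq matches motif with at most d mismatches."""
    l = len(motif)
    return any(
        hamming(motif, seq[i:i + l]) <= d
        for i in range(len(seq) - l + 1)
    )


def support(motif, seqs, d):
    """Count how many sequences contain an instance of the (l, d) motif."""
    return sum(has_instance(motif, s, d) for s in seqs)


# Toy input: the pattern ACGT appears exactly once and with one mismatch once.
seqs = ["TTACGTAA", "GGACCTGG", "CCCCCCCC"]
exact = support("ACGT", seqs, d=0)
relaxed = support("ACGT", seqs, d=1)
```

With the toy input, allowing one mismatch raises the number of supporting sequences from one to two. The cost of enumerating every candidate pattern grows quickly with l and d, which is why long (l, d) motifs are hard for exhaustive methods.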
Affiliation(s)
- Caiyan Jia
- School of Computer and Information Technology & Beijing Key Lab of Traffic Data Analysis, Beijing Jiaotong University, Beijing, China
- Department of Bioengineering/Bioinformatics, University of Illinois at Chicago, Chicago, Illinois, United States of America
- Matthew B. Carson
- Center for Healthcare Studies, Institute for Public Health and Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois, United States of America
- Division of Health and Biomedical Informatics, Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois, United States of America
- Yang Wang
- School of Computer and Information Technology & Beijing Key Lab of Traffic Data Analysis, Beijing Jiaotong University, Beijing, China
- Youfang Lin
- School of Computer and Information Technology & Beijing Key Lab of Traffic Data Analysis, Beijing Jiaotong University, Beijing, China
- Hui Lu
- Department of Bioengineering/Bioinformatics, University of Illinois at Chicago, Chicago, Illinois, United States of America
- Shanghai Institute of Medical Genetics, Shanghai Children’s Hospital, Shanghai JiaoTong University, Shanghai, China
|
85
|
Wollaston-Hayden EE, Harris RBS, Liu B, Bridger R, Xu Y, Wells L. Global O-GlcNAc Levels Modulate Transcription of the Adipocyte Secretome during Chronic Insulin Resistance. Front Endocrinol (Lausanne) 2014; 5:223. [PMID: 25657638 PMCID: PMC4302944 DOI: 10.3389/fendo.2014.00223] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Received: 09/28/2014] [Accepted: 12/05/2014] [Indexed: 01/06/2023] Open
Abstract
Increased flux through the hexosamine biosynthetic pathway and the corresponding increase in intracellular glycosylation of proteins via O-linked β-N-acetylglucosamine (O-GlcNAc) is sufficient to induce insulin resistance (IR) in multiple systems. Previously, our group used shotgun proteomics to identify multiple rodent adipocytokines and secreted proteins whose levels are modulated upon the induction of IR by indirectly and directly modulating O-GlcNAc levels. We have validated the relative levels of several of these factors using immunoblotting. Since adipocytokine levels are regulated primarily at the level of transcription and O-GlcNAc alters the function of many transcription factors, we hypothesized that elevated O-GlcNAc levels on key transcription factors are modulating secreted protein expression. Here, we show that upon the elevation of O-GlcNAc levels and the induction of IR in mature 3T3-F442a adipocytes, the transcript levels of multiple secreted proteins reflect the modulation observed at the protein level. We validate the transcript levels in male mouse models of diabetes. Using inguinal fat pads from the severely IR db/db mouse model and the mildly IR diet-induced mouse model, we have confirmed that the secreted proteins regulated by O-GlcNAc modulation in cell culture are likewise modulated in the whole animal upon a shift to IR. By comparing the promoters of similarly regulated genes, we determine that Sp1 is a common cis-acting element. Furthermore, we show that the LPL and SPARC promoters are enriched for Sp1 and O-GlcNAc modified proteins during insulin resistance in adipocytes. Thus, the O-GlcNAc modification of proteins bound to promoters, including Sp1, is linked to adipocytokine transcription during insulin resistance.
Affiliation(s)
- Edith E. Wollaston-Hayden
- Complex Carbohydrate Research Center, University of Georgia, Athens, GA, USA
- Department of Biochemistry and Molecular Biology, University of Georgia, Athens, GA, USA
- Ruth B. S. Harris
- Department of Physiology, Georgia Health Sciences University, Augusta, GA, USA
- Bingqiang Liu
- Department of Biochemistry and Molecular Biology, University of Georgia, Athens, GA, USA
- Robert Bridger
- Complex Carbohydrate Research Center, University of Georgia, Athens, GA, USA
- Ying Xu
- Department of Biochemistry and Molecular Biology, University of Georgia, Athens, GA, USA
- Lance Wells
- Complex Carbohydrate Research Center, University of Georgia, Athens, GA, USA
- Department of Biochemistry and Molecular Biology, University of Georgia, Athens, GA, USA
- *Correspondence: Lance Wells, Department of Biochemistry and Molecular Biology, Complex Carbohydrate Research Center, University of Georgia, 315 Riverbend Road, Athens, GA 30602, USA e-mail:
|
86
|
Kheradpour P, Kellis M. Systematic discovery and characterization of regulatory motifs in ENCODE TF binding experiments. Nucleic Acids Res 2013; 42:2976-87. [PMID: 24335146 PMCID: PMC3950668 DOI: 10.1093/nar/gkt1249] [Citation(s) in RCA: 327] [Impact Index Per Article: 27.3] [Indexed: 01/17/2023] Open
Abstract
Recent advances in technology have led to a dramatic increase in the number of available transcription factor ChIP-seq and ChIP-chip data sets. Understanding the motif content of these data sets is an important step in understanding the underlying mechanisms of regulation. Here we provide a systematic motif analysis for 427 human ChIP-seq data sets using motifs curated from the literature and also discovered de novo using five established motif discovery tools. We use a systematic pipeline for calculating motif enrichment in each data set, providing a principled way for choosing between motif variants found in the literature and for flagging potentially problematic data sets. Our analysis confirms the known specificity of 41 of the 56 analyzed factor groups and reveals motifs of potential cofactors. We also use cell type-specific binding to find factors active in specific conditions. The resource we provide is accessible both for browsing a small number of factors and for performing large-scale systematic analyses. We provide motif matrices, instances and enrichments in each of the ENCODE data sets. The motifs discovered here have been used in parallel studies to validate the specificity of antibodies, understand cooperativity between data sets and measure the variation of motif binding across individuals and species.
Affiliation(s)
- Pouya Kheradpour
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, 32 Vassar St, Cambridge, MA 02139, USA and Broad Institute of MIT and Harvard, 7 Cambridge Center, Cambridge, MA 02139, USA
|
87
|
Zeigler RD, Cohen BA. Discrimination between thermodynamic models of cis-regulation using transcription factor occupancy data. Nucleic Acids Res 2013; 42:2224-34. [PMID: 24288374 PMCID: PMC3936720 DOI: 10.1093/nar/gkt1230] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 11/14/2022] Open
Abstract
Many studies have identified binding preferences for transcription factors (TFs), but few have yielded predictive models of how combinations of transcription factor binding sites generate specific levels of gene expression. Synthetic promoters have emerged as powerful tools for generating quantitative data to parameterize models of combinatorial cis-regulation. We sought to improve the accuracy of such models by quantifying the occupancy of TFs on synthetic promoters in vivo and incorporating these data into statistical thermodynamic models of cis-regulation. Using chromatin immunoprecipitation-seq, we measured the occupancy of Gcn4 and Cbf1 in synthetic promoter libraries composed of binding sites for Gcn4, Cbf1, Met31/Met32 and Nrg1. We measured the occupancy of these two TFs and the expression levels of all promoters in two growth conditions. Models parameterized using only expression data predicted expression but failed to identify several interactions between TFs. In contrast, models parameterized with occupancy and expression data predicted expression data, and also revealed Gcn4 self-cooperativity and a negative interaction between Gcn4 and Nrg1. Occupancy data also allowed us to distinguish between competing regulatory mechanisms for the factor Gcn4. Our framework for combining occupancy and expression data produces predictive models that better reflect the mechanisms underlying combinatorial cis-regulation of gene expression.
Affiliation(s)
- Robert D Zeigler
- Department of Genetics, Center for Genome Sciences and Systems Biology, Washington University School of Medicine in St. Louis, MO 63108, USA
|
88
|
Keren L, Zackay O, Lotan-Pompan M, Barenholz U, Dekel E, Sasson V, Aidelberg G, Bren A, Zeevi D, Weinberger A, Alon U, Milo R, Segal E. Promoters maintain their relative activity levels under different growth conditions. Mol Syst Biol 2013; 9:701. [PMID: 24169404 PMCID: PMC3817408 DOI: 10.1038/msb.2013.59] [Citation(s) in RCA: 141] [Impact Index Per Article: 11.8] [Received: 08/05/2013] [Accepted: 09/27/2013] [Indexed: 12/20/2022]
Abstract
Most genes change expression levels across conditions, but it is unclear which of these changes represents specific regulation and what determines their quantitative degree. Here, we accurately measured activities of ~900 S. cerevisiae and ~1800 E. coli promoters using fluorescent reporters. We show that in both organisms 60-90% of promoters change their expression between conditions by a constant global scaling factor that depends only on the conditions and not on the promoter's identity. Quantifying such global effects allows precise characterization of specific regulation, that is, promoters deviating from the global scale line. These are organized into a few functionally related groups that also adhere to scale lines and preserve their relative activities across conditions. Thus, only several scaling factors suffice to accurately describe genome-wide expression profiles across conditions. We present a parameter-free passive resource allocation model that quantitatively accounts for the global scaling factors. It suggests that many changes in expression across conditions result from global effects and not specific regulation, and provides means for quantitative interpretation of expression profiles.
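The idea of a single global scaling factor can be illustrated with a minimal sketch. It assumes the factor is estimated as the median of per-promoter activity ratios, which is an illustrative choice and not necessarily the estimator used by the authors; the function names, the tolerance value and the toy activities are all hypothetical.

```python
from statistics import median

def global_scaling_factor(activity_a, activity_b):
    """Estimate one scaling factor s such that activity_b ~ s * activity_a.

    Assumption: the median of per-promoter ratios is used as a robust
    estimate; promoters with zero activity in condition A are skipped.
    """
    ratios = [b / a for a, b in zip(activity_a, activity_b) if a > 0]
    return median(ratios)

def deviating_promoters(activity_a, activity_b, tolerance=0.25):
    """Indices of promoters whose ratio departs from the global scale line
    by more than `tolerance` (relative deviation, hypothetical cutoff)."""
    s = global_scaling_factor(activity_a, activity_b)
    return [
        i
        for i, (a, b) in enumerate(zip(activity_a, activity_b))
        if a > 0 and abs(b / (s * a) - 1) > tolerance
    ]

# Toy data: four promoters halve their activity, one doubles it.
cond_a = [10.0, 20.0, 40.0, 80.0, 5.0]
cond_b = [5.0, 10.0, 20.0, 40.0, 10.0]
scale = global_scaling_factor(cond_a, cond_b)
specific = deviating_promoters(cond_a, cond_b)
```

In this toy example the four promoters that follow the scale line define the factor, and the remaining promoter is flagged as a candidate for specific regulation.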
Affiliation(s)
- Leeat Keren
- Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot, Israel
- Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot, Israel
- Department of Plant Sciences, Weizmann Institute of Science, Rehovot, Israel
- Ora Zackay
- Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot, Israel
- Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot, Israel
- Maya Lotan-Pompan
- Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot, Israel
- Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot, Israel
- Uri Barenholz
- Department of Plant Sciences, Weizmann Institute of Science, Rehovot, Israel
- Erez Dekel
- Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot, Israel
- Vered Sasson
- Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot, Israel
- Guy Aidelberg
- Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot, Israel
- Anat Bren
- Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot, Israel
- Danny Zeevi
- Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot, Israel
- Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot, Israel
- Adina Weinberger
- Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot, Israel
- Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot, Israel
- Uri Alon
- Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot, Israel
- Ron Milo
- Department of Plant Sciences, Weizmann Institute of Science, Rehovot, Israel
- Eran Segal
- Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot, Israel
- Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot, Israel
|
89
|
Kumari S, Ware D. Genome-wide computational prediction and analysis of core promoter elements across plant monocots and dicots. PLoS One 2013; 8:e79011. [PMID: 24205361 PMCID: PMC3812177 DOI: 10.1371/journal.pone.0079011] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.0] [Received: 02/11/2013] [Accepted: 09/18/2013] [Indexed: 01/22/2023]
Abstract
Transcription initiation, essential to gene expression regulation, involves recruitment of basal transcription factors to the core promoter elements (CPEs). The distribution of currently known CPEs across plant genomes is largely unknown. This is the first large scale genome-wide report on the computational prediction of CPEs across eight plant genomes to help better understand the transcription initiation complex assembly. The distribution of thirteen known CPEs across four monocots (Brachypodium distachyon, Oryza sativa ssp. japonica, Sorghum bicolor, Zea mays) and four dicots (Arabidopsis thaliana, Populus trichocarpa, Vitis vinifera, Glycine max) reveals the structural organization of the core promoter in relation to the TATA-box as well as with respect to other CPEs. The distribution of known CPE motifs with respect to transcription start site (TSS) exhibited positional conservation within monocots and dicots with slight differences across all eight genomes. Further, a more refined subset of annotated genes based on orthologs of the model monocot (O. sativa ssp. japonica) and dicot (A. thaliana) genomes supported the positional distribution of these thirteen known CPEs. DNA free energy profiles provided evidence that the structural properties of promoter regions are distinctly different from those of the non-regulatory genome sequence. They also showed that monocot core promoters have lower DNA free energy than dicot core promoters. The comparison of monocot and dicot promoter sequences highlights both the similarities and differences in the core promoter architecture irrespective of the species-specific nucleotide bias. This study will be useful for future work related to genome annotation projects and can inspire research efforts aimed to better understand regulatory mechanisms of transcription.
Affiliation(s)
- Sunita Kumari
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, United States of America
- Doreen Ware
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, United States of America
- United States Department of Agriculture-Agriculture Research Service, Robert W. Holley Center for Agriculture and Health, Ithaca, New York, United States of America
|
90
|
López Y, Patil A, Nakai K. Identification of novel motif patterns to decipher the promoter architecture of co-expressed genes in Arabidopsis thaliana. BMC Syst Biol 2013; 7 Suppl 3:S10. [PMID: 24555803 PMCID: PMC3852273 DOI: 10.1186/1752-0509-7-s3-s10] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Indexed: 12/19/2022]
Abstract
Background The understanding of the mechanisms of transcriptional regulation remains a challenge for molecular biologists in the post-genome era. It is hypothesized that the regulatory regions of genes expressed in the same tissue or cell type share a similar structure. Though several studies have analyzed the promoters of genes expressed in specific metazoan tissues or cells, little research has been done in plants. Hence finding specific patterns of motifs to explain the promoter architecture of co-expressed genes in plants could shed light on their transcription mechanism. Results We identified novel patterns of sets of motifs in promoters of genes co-expressed in four different plant structures (PSs) and in the entire plant in Arabidopsis thaliana. Sets of genes expressed in four PSs (flower, seed, root, shoot) and housekeeping genes expressed in the entire plant were taken from a database of co-expressed genes in A. thaliana. PS-specific motifs were predicted using three motif-discovery algorithms, 8 of which are novel, to the best of our knowledge. A support vector machine was trained using the average upstream distance of the identified motifs from the translation start site on both strands of binding sites. The correctly classified promoters per PS were used to construct specific patterns of sets of motifs to describe the promoter architecture of those co-expressed genes. The discovered PS-specific patterns were tested in the entire A. thaliana genome, correctly identifying 77.8%, 81.2%, 70.8% and 53.7% genes expressed in petal differentiation, synergid cells, root hair and trichome, as well as 88.4% housekeeping genes. Conclusions We present five patterns of sets of motifs which describe the promoter architecture of co-expressed genes in five PSs with the ability to predict them from the entire A. thaliana genome. 
Based on these findings, we conclude that the positioning and orientation of transcription factor binding sites at specific distances from the translation start site is a reliable measure to differentiate promoters of genes expressed in different A. thaliana structures from background genomic promoters. Our method can be used to predict novel motifs and decipher a similar promoter architecture for genes co-expressed in A. thaliana under different conditions.
|
91
|
Vorontsov IE, Kulakovskiy IV, Makeev VJ. Jaccard index based similarity measure to compare transcription factor binding site models. Algorithms Mol Biol 2013; 8:23. [PMID: 24074225 PMCID: PMC3851813 DOI: 10.1186/1748-7188-8-23] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.3] [Received: 05/25/2012] [Accepted: 09/18/2013] [Indexed: 11/10/2022]
Abstract
BACKGROUND Positional weight matrix (PWM) remains the most popular model for quantification of transcription factor (TF) binding. PWM supplied with a score threshold defines a set of putative transcription factor binding sites (TFBS), thus providing a TFBS model. TF binding DNA fragments obtained by different experimental methods usually give similar but not identical PWMs. This is also common for different TFs from the same structural family. Thus it is often necessary to measure the similarity between PWMs. The popular tools compare PWMs directly using matrix elements. Yet, for log-odds PWMs, negative elements do not contribute to the scores of highly scoring TFBS and thus may be different without affecting the sets of the best recognized binding sites. Moreover, the two TFBS sets recognized by a given pair of PWMs can be more or less different depending on the score thresholds. RESULTS We propose a practical approach for comparing two TFBS models, each consisting of a PWM and the respective scoring threshold. The proposed measure is a variant of the Jaccard index between two TFBS sets. The measure defines a metric space for TFBS models of all finite lengths. The algorithm can compare TFBS models constructed using substantially different approaches, like PWMs with raw positional counts and log-odds. We present an efficient software implementation: MACRO-APE (MAtrix CompaRisOn by Approximate P-value Estimation). CONCLUSIONS MACRO-APE can be effectively used to compute the Jaccard index based similarity for two TFBS models. A two-pass scanning algorithm is presented to scan a given collection of PWMs for PWMs similar to a given query. AVAILABILITY AND IMPLEMENTATION MACRO-APE is implemented in ruby 1.9; software including source code and a manual is freely available at http://autosome.ru/macroape/ and in supplementary materials.
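The notion of comparing two TFBS models through the sets of words they recognize can be sketched as follows. This is a brute-force illustration only, not MACRO-APE's algorithm: it assumes two PWMs of equal length, enumerates every word explicitly, and weights all words equally, whereas the abstract describes a variant of the Jaccard index computed by approximate P-value estimation. The matrices, thresholds and function names are hypothetical.

```python
from itertools import product

def pwm_score(pwm, word):
    """Sum the per-position scores of `word` under `pwm` (a list of dicts)."""
    return sum(column[base] for column, base in zip(pwm, word))

def tfbs_set(pwm, threshold):
    """All words of the PWM's length that score at or above the threshold."""
    words = ("".join(w) for w in product("ACGT", repeat=len(pwm)))
    return {w for w in words if pwm_score(pwm, w) >= threshold}

def jaccard(set_a, set_b):
    """Plain Jaccard index |A & B| / |A | B|; two empty sets count as identical."""
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)

# Two toy log-odds-like matrices that differ only in one low-scoring cell.
pwm_a = [
    {"A": 2, "C": -1, "G": -1, "T": -1},
    {"A": -1, "C": 2, "G": -1, "T": -1},
    {"A": -1, "C": -1, "G": 2, "T": 0},
]
pwm_b = [
    {"A": 2, "C": -1, "G": -1, "T": -1},
    {"A": -1, "C": 2, "G": -1, "T": -1},
    {"A": -1, "C": -1, "G": 2, "T": -1},
]
sites_a = tfbs_set(pwm_a, 4)
sites_b = tfbs_set(pwm_b, 4)
similarity = jaccard(sites_a, sites_b)
```

The example shows why the threshold matters: with the same matrices, a stricter threshold of 5 would leave only ACG in both sets and the two models would become indistinguishable.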
|
92
|
Kakei Y, Ogo Y, Itai RN, Kobayashi T, Yamakawa T, Nakanishi H, Nishizawa NK. Development of a novel prediction method of cis-elements to hypothesize collaborative functions of cis-element pairs in iron-deficient rice. Rice (N Y) 2013; 6:22. [PMID: 24279975 PMCID: PMC4883709 DOI: 10.1186/1939-8433-6-22] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Received: 05/09/2013] [Accepted: 09/13/2013] [Indexed: 05/20/2023]
Abstract
BACKGROUND Cis-acting elements are essential genomic sequences that control gene expression. In higher eukaryotes, a series of cis-elements function cooperatively. However, further studies are required to examine the co-regulation of multiple cis-elements on a promoter. The aim of this study was to propose a model of cis-element networks that cooperatively regulate gene expression in rice under iron (Fe) deficiency. RESULTS We developed a novel clustering-free method, microarray-associated motif analyzer (MAMA), to predict novel cis-acting elements based on weighted sequence similarities and gene expression profiles in microarray analyses. Simulation of gene expression was performed using a support vector machine and based on the presence of predicted motifs and motif pairs. The accuracy of simulated gene expression was used to evaluate the quality of prediction and to optimize the parameters used in this method. Based on sequences of Oryza sativa genes upregulated by Fe deficiency, MAMA returned experimentally identified cis-elements responsible for Fe deficiency in O. sativa. When this method was applied to O. sativa subjected to zinc deficiency and Arabidopsis thaliana subjected to salt stress, several novel candidate cis-acting elements that overlap with known cis-acting elements, such as ZDRE, ABRE, and DRE, were identified. After optimization, MAMA accurately simulated more than 87% of gene expression. Predicted motifs strongly co-localized in the upstream regions of regulated genes and sequences around transcription start sites. Furthermore, in many cases, the separation (in bp) between co-localized motifs was conserved, suggesting that predicted motifs and the separation between them were important in the co-regulation of gene expression. CONCLUSIONS Our results are suggestive of a typical sequence model for Fe deficiency-responsive promoters and some strong candidate cis-elements that function cooperatively with known cis-elements.
Affiliation(s)
- Yusuke Kakei
- Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, 113-8657 Bunkyo-ku Tokyo, Japan
- Plant Biotechnology Division, Yokohama City University, Kihara Institute for Biological Research Maiokacho, 641-12, Totsuka, Yokohama, Kanagawa 244-0813 Japan
- Yuko Ogo
- Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, 113-8657 Bunkyo-ku Tokyo, Japan
- Functional Transgenic Crops Research Unit, Genetically Modified Organism Research Center National Institute of Agrobiological Sciences, Kannondai 2-1-2, 305-8602 Tsukuba, Ibaraki Japan
- Reiko N Itai
- Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, 113-8657 Bunkyo-ku Tokyo, Japan
- Takanori Kobayashi
- Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, 113-8657 Bunkyo-ku Tokyo, Japan
- Research Institute for Bioresources and Biotechnology, Ishikawa Prefectural University, 1-308 Suematsu, 921-8836 Nonoichi-machi, Ishikawa Japan
- Takashi Yamakawa
- Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, 113-8657 Bunkyo-ku Tokyo, Japan
- Hiromi Nakanishi
- Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, 113-8657 Bunkyo-ku Tokyo, Japan
- Naoko K Nishizawa
- Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, 113-8657 Bunkyo-ku Tokyo, Japan
- Research Institute for Bioresources and Biotechnology, Ishikawa Prefectural University, 1-308 Suematsu, 921-8836 Nonoichi-machi, Ishikawa Japan
|
93
|
Soares MPM, Barchuk AR, Simões ACQ, Dos Santos Cristino A, de Paula Freitas FC, Canhos LL, Bitondi MMG. Genes involved in thoracic exoskeleton formation during the pupal-to-adult molt in a social insect model, Apis mellifera. BMC Genomics 2013; 14:576. [PMID: 23981317 PMCID: PMC3766229 DOI: 10.1186/1471-2164-14-576] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.1] [Received: 03/12/2013] [Accepted: 08/23/2013] [Indexed: 12/04/2022]
Abstract
Background The insect exoskeleton provides shape, waterproofing, and locomotion via attached somatic muscles. The exoskeleton is renewed during molting, a process regulated by ecdysteroid hormones. The holometabolous pupa transforms into an adult during the imaginal molt, when the epidermis synthesizes the definitive exoskeleton that then differentiates progressively. An important issue in insect development concerns how the exoskeletal regions are constructed to provide their morphological, physiological and mechanical functions. We used whole-genome oligonucleotide microarrays to screen for genes involved in exoskeletal formation in the honeybee thoracic dorsum. Our analysis included three sampling times during the pupal-to-adult molt, i.e., before, during and after the ecdysteroid-induced apolysis that triggers synthesis of the adult exoskeleton. Results Gene ontology annotation based on orthologous relationships with Drosophila melanogaster genes placed the honeybee differentially expressed genes (DEGs) into distinct categories of Biological Process and Molecular Function, depending on developmental time, revealing the functional elements required for adult exoskeleton formation. Of the 1,253 unique DEGs, 547 were upregulated in the thoracic dorsum after apolysis, suggesting induction by the ecdysteroid pulse. The upregulated gene set included 20 of the 47 cuticular protein (CP) genes that were previously identified in the honeybee genome, and three novel putative CP genes that do not belong to a known CP family. In situ hybridization showed that two of the novel genes were abundantly expressed in the epidermis during adult exoskeleton formation, strongly implicating them as genuine CP genes. Conserved sequence motifs identified the CP genes as members of the CPR, Tweedle, Apidermin, CPF, CPLCP1 and Analogous-to-Peritrophins families. 
Furthermore, 28 of the 36 muscle-related DEGs were upregulated during the de novo formation of striated fibers attached to the exoskeleton. A search for cis-regulatory motifs in the 5′-untranslated region of the DEGs revealed potential binding sites for known transcription factors. Construction of a regulatory network showed that various upregulated CP- and muscle-related genes (15 and 21 genes, respectively) share common elements, suggesting co-regulation during thoracic exoskeleton formation. Conclusions These findings help reveal molecular aspects of rigid thoracic exoskeleton formation during the ecdysteroid-coordinated pupal-to-adult molt in the honeybee.
Affiliation(s)
- Michelle Prioli Miranda Soares
- Departamento de Biologia, Faculdade de Filosofia, Ciências e Letras de Ribeirão Preto, Universidade de São Paulo, Ribeirão Preto, SP, Brasil.
|
94
|
Jia C, Carson MB, Yu J. A fast weak motif-finding algorithm based on community detection in graphs. BMC Bioinformatics 2013; 14:227. [PMID: 23865838 PMCID: PMC3726413 DOI: 10.1186/1471-2105-14-227] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Received: 10/22/2012] [Accepted: 07/12/2013] [Indexed: 12/02/2022]
Abstract
BACKGROUND Identification of transcription factor binding sites (also called 'motif discovery') in DNA sequences is a basic step in understanding genetic regulation. Although many successful programs have been developed, the problem is far from being solved on account of diversity in gene expression/regulation and the low specificity of binding sites. State-of-the-art algorithms have their own constraints (e.g., high time or space complexity for finding long motifs, low precision in identification of weak motifs, or the OOPS constraint: one occurrence of the motif instance per sequence) which limit their scope of application. RESULTS In this paper, we present a novel and fast algorithm we call TFBSGroup. It is based on community detection from a graph and is used to discover long and weak (l,d) motifs under the ZOMOPS constraint (zero, one or multiple occurrence(s) of the motif instance(s) per sequence), where l is the length of a motif and d is the maximum number of mutations between a motif instance and the motif itself. Firstly, TFBSGroup transforms the (l, d) motif search in sequences to focus on the discovery of dense subgraphs within a graph. It identifies these subgraphs using a fast community detection method for obtaining coarse-grained candidate motifs. Next, it greedily refines these candidate motifs towards the true motif within their own communities. Empirical studies on synthetic (l, d) samples have shown that TFBSGroup is very efficient (e.g., it can find true (18, 6), (24, 8) motifs within 30 seconds). More importantly, the algorithm has succeeded in rapidly identifying motifs in a large data set of prokaryotic promoters generated from the Escherichia coli database RegulonDB. The algorithm has also accurately identified motifs in ChIP-seq data sets for 12 mouse transcription factors involved in ES cell pluripotency and self-renewal. 
CONCLUSIONS Our novel heuristic algorithm, TFBSGroup, is able to quickly identify nearly exact matches for long and weak (l, d) motifs in DNA sequences under the ZOMOPS constraint. It is also capable of finding motifs in real applications. The source code for TFBSGroup can be obtained from http://bioinformatics.bioengr.uic.edu/TFBSGroup/.
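The (l, d) definition and the ZOMOPS constraint can be made concrete with a small sketch. It only shows what counts as a motif instance (an l-mer within d mismatches of the motif) and that a sequence may contain zero, one or several instances; it is not the TFBSGroup community-detection algorithm, and the function names and toy sequences are hypothetical.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def find_instances(sequences, motif, d):
    """Return (sequence index, offset, l-mer) for every l-mer that lies
    within d mismatches of `motif`, where l = len(motif)."""
    l = len(motif)
    hits = []
    for i, seq in enumerate(sequences):
        for pos in range(len(seq) - l + 1):
            lmer = seq[pos:pos + l]
            if hamming(lmer, motif) <= d:
                hits.append((i, pos, lmer))
    return hits

# Toy (4, 1) example: one exact instance in the first sequence,
# none in the second, and two degenerate instances in the third.
seqs = ["TTACGTAA", "GGGGGGGG", "ACCTACGA"]
hits = find_instances(seqs, "ACGT", 1)
```

Motif discovery is the harder inverse problem: the motif itself is unknown, so the search space grows quickly with l and d, which is what motivates heuristics such as the graph-based approach described in the abstract.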
Affiliation(s)
- Caiyan Jia
- School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China.
|
95
|
Wong KC, Chan TM, Peng C, Li Y, Zhang Z. DNA motif elucidation using belief propagation. Nucleic Acids Res 2013; 41:e153. [PMID: 23814189 PMCID: PMC3763557 DOI: 10.1093/nar/gkt574] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.3] [Indexed: 12/20/2022]
Abstract
Protein-binding microarray (PBM) is a high-throughput platform that can measure the DNA-binding preference of a protein in a comprehensive and unbiased manner. A typical PBM experiment can measure binding signal intensities of a protein to all the possible DNA k-mers (k = 8∼10); such comprehensive binding affinity data usually need to be reduced and represented as motif models before they can be further analyzed and applied. Since proteins can often bind to DNA in multiple modes, one of the major challenges is to decompose the comprehensive affinity data into multimodal motif representations. Here, we describe a new algorithm that uses Hidden Markov Models (HMMs) and can derive precise and multimodal motifs using belief propagations. We describe an HMM-based approach using belief propagations (kmerHMM), which accepts and preprocesses PBM probe raw data into median-binding intensities of individual k-mers. The k-mers are ranked and aligned for training an HMM as the underlying motif representation. Multiple motifs are then extracted from the HMM using belief propagations. Comparisons of kmerHMM with other leading methods on several data sets demonstrated its effectiveness and uniqueness. In particular, it achieved the best performance on more than half of the data sets. In addition, the multiple binding modes derived by kmerHMM are biologically meaningful and will be useful in interpreting other genome-wide data such as those generated from ChIP-seq. The executables and source codes are available at the authors’ websites: e.g. http://www.cs.toronto.edu/∼wkc/kmerHMM.
Affiliation(s)
- Ka-Chun Wong
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada, Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada, Department of Integrative Biology and Physiology, University of California Los Angeles, Los Angeles, CA, USA, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology, Thuwal, Jeddah, KSA, Banting and Best Department of Medical Research, University of Toronto, Toronto, Ontario, Canada and Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
|
96
|
Yu Q, Huo H, Zhang Y, Guo H, Guo H. PairMotif+: a fast and effective algorithm for de novo motif discovery in DNA sequences. Int J Biol Sci 2013; 9:412-24. [PMID: 23678291 PMCID: PMC3654438 DOI: 10.7150/ijbs.5786] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 01/01/2013] [Accepted: 04/15/2013] [Indexed: 11/25/2022]
Abstract
The planted (l, d) motif search is one of the most widely studied problems in bioinformatics, which plays an important role in the identification of transcription factor binding sites in DNA sequences. However, it is still a challenging task to identify highly degenerate motifs, since current algorithms either output the exact results with a high computational cost or accomplish the computation in a short time but very often fall into a local optimum. In order to make a better trade-off between accuracy and efficiency, we propose a new pattern-driven algorithm, named PairMotif+. At first, some pairs of l-mers are extracted from input sequences according to probabilistic analysis and statistical method so that one or more pairs of motif instances are included in them. Then an approximate strategy for refining pairs of l-mers with high accuracy is adopted in order to avoid the verification of most candidate motifs. Experimental results on the simulated data show that PairMotif+ can solve various (l, d) problems within an hour on a PC with 2.67 GHz processor, and has a better identification accuracy than the compared algorithms MEME, AlignACE and VINE. Also, the validity of the proposed algorithm is tested on multiple real data sets.
Affiliation(s)
- Qiang Yu
- School of Computer Science and Technology, Xidian University, Xi'an 710071, China
|
97
|
Khan MAF, Soto-Jimenez LM, Howe T, Streit A, Sosinsky A, Stern CD. Computational tools and resources for prediction and analysis of gene regulatory regions in the chick genome. Genesis 2013; 51:311-24. [PMID: 23355428 PMCID: PMC3664090 DOI: 10.1002/dvg.22375] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Received: 08/31/2012] [Revised: 01/16/2013] [Accepted: 01/17/2013] [Indexed: 11/07/2022]
Abstract
The discovery of cis-regulatory elements is a challenging problem in bioinformatics, owing to distal locations and context-specific roles of these elements in controlling gene regulation. Here we review the current bioinformatics methodologies and resources available for systematic discovery of cis-acting regulatory elements and conserved transcription factor binding sites in the chick genome. In addition, we propose and make available a novel workflow using computational tools that integrate CTCF analysis to predict putative insulator elements, enhancer prediction, and TFBS analysis. To demonstrate the usefulness of this computational workflow, we then use it to analyze the locus of the gene Sox2, whose developmental expression is known to be controlled by a complex array of cis-acting regulatory elements. The workflow accurately predicts most of the experimentally verified elements along with some that have not yet been discovered. A web version of the CTCF tool, together with instructions for using the workflow, can be accessed from http://toolshed.g2.bx.psu.edu/view/mkhan1980/ctcf_analysis. For local installation of the tool, relevant Perl scripts and instructions are provided in the directory named "code" in the supplementary materials.
Affiliation(s)
- Mohsin A F Khan
- Department of Cell & Developmental Biology, University College London, London, United Kingdom
|
98
|
Wang L, Wang X. Hierarchical Dirichlet process model for gene expression clustering. EURASIP J Bioinform Syst Biol 2013; 2013:5. [PMID: 23587447 PMCID: PMC3656798 DOI: 10.1186/1687-4153-2013-5] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Received: 10/17/2012] [Accepted: 03/11/2013] [Indexed: 11/23/2022]
Abstract
Clustering is an important data processing tool for interpreting microarray data and genomic network inference. In this article, we propose a clustering algorithm based on the hierarchical Dirichlet processes (HDP). The HDP clustering introduces a hierarchical structure in the statistical model which captures the hierarchical features prevalent in biological data such as gene expression data. We develop a Gibbs sampling algorithm based on the Chinese restaurant metaphor for the HDP clustering. We apply the proposed HDP algorithm to both regulatory network segmentation and gene expression clustering. The HDP algorithm is shown to outperform several popular clustering algorithms by revealing the underlying hierarchical structure of the data. For the yeast cell cycle data, we compare the HDP result to the standard result and show that the HDP algorithm provides more information and reduces the unnecessary clustering fragments.
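The Chinese restaurant metaphor mentioned above can be illustrated with a sketch of the basic, single-level Chinese restaurant process, in which each new item joins an existing cluster with probability proportional to its size or opens a new cluster with probability proportional to a concentration parameter. This is only the prior over partitions, under assumed names and parameters; it is not the hierarchical (franchise) version and not the Gibbs sampler the authors develop.

```python
import random

def chinese_restaurant_process(n_customers, alpha, seed=0):
    """Sample a partition of n_customers from a Chinese restaurant process.

    Customer n (0-based) joins table k with probability size_k / (n + alpha)
    and opens a new table with probability alpha / (n + alpha).
    `alpha` and `seed` are illustrative parameters.
    """
    rng = random.Random(seed)
    table_sizes = []   # number of customers seated at each table
    assignments = []   # table index chosen by each customer
    for n in range(n_customers):
        r = rng.uniform(0, n + alpha)
        cumulative = 0.0
        choice = len(table_sizes)  # default: open a new table
        for k, size in enumerate(table_sizes):
            cumulative += size
            if r < cumulative:
                choice = k
                break
        if choice == len(table_sizes):
            table_sizes.append(1)
        else:
            table_sizes[choice] += 1
        assignments.append(choice)
    return assignments, table_sizes

assignments, table_sizes = chinese_restaurant_process(50, alpha=1.0)
```

The number of clusters is not fixed in advance but grows with the data, which is the property that makes this family of priors attractive for expression clustering.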
Affiliation(s)
- Liming Wang
- Department of Electrical Engineering, Columbia University, New York, NY, 10027, USA.
|
99
|
Martyanov V, Gross RH. Computational discovery of transcriptional regulatory modules in fungal ribosome biogenesis genes reveals novel sequence and function patterns. PLoS One 2013; 8:e59851. [PMID: 23555806 PMCID: PMC3612091 DOI: 10.1371/journal.pone.0059851] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/01/2013] [Accepted: 02/20/2013] [Indexed: 11/24/2022]
Abstract
Genes involved in ribosome biogenesis and assembly (RBA) are responsible for ribosome formation. In Saccharomyces cerevisiae, their transcription is regulated by two dissimilar DNA motifs. We were interested in analyzing conservation and divergence of RBA transcription regulation machinery throughout fungal evolution. We have identified orthologs of S. cerevisiae RBA genes in 39 species across fungal phylogeny and searched upstream regions of these gene sets for DNA sequences significantly similar to S. cerevisiae RBA regulatory motifs. In addition to confirming known motif arrangements comprising two different motifs in a set of S. cerevisiae close relatives or two instances of the same motif (that we refer to as modules), we have also discovered novel modules in a group of fungi closely related to Neurospora crassa. Despite a single nucleotide difference between consensus sequences of RBA motifs, modules associated with the S. cerevisiae group and the N. crassa group displayed consistently different characteristics with respect to preferred module organization and several other module properties. For a given species, we have found a correlation between the configuration of the RBA module and significant enrichment in a set of specific Gene Ontology biological processes. We have identified several likely new candidates for a role in ribosome biogenesis in S. cerevisiae based on the combined evidence of RBA module presence in the upstream regions, functional annotation information and microarray expression profiles. We believe that this approach will be useful in terms of generating hypotheses about functional roles of genes for which only fragmentary data from a single source are available.
Affiliation(s)
- Viktor Martyanov
- Department of Genetics, Geisel School of Medicine at Dartmouth, Hanover, New Hampshire, United States of America
- Robert H. Gross
- Department of Biological Sciences, Dartmouth College, Hanover, New Hampshire, United States of America
|
100
|
Curtis RE, Kim S, Woolford JL, Xu W, Xing EP. Structured association analysis leads to insight into Saccharomyces cerevisiae gene regulation by finding multiple contributing eQTL hotspots associated with functional gene modules. BMC Genomics 2013; 14:196. [PMID: 23514438 PMCID: PMC3616858 DOI: 10.1186/1471-2164-14-196] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 11/09/2012] [Accepted: 03/12/2013] [Indexed: 01/08/2023]
Abstract
Background: Association analysis using genome-wide expression quantitative trait locus (eQTL) data investigates the effect that genetic variation has on cellular pathways and leads to the discovery of candidate regulators. Traditional analysis of eQTL data via pairwise statistical significance tests or linear regression does not leverage the availability of the structural information of the transcriptome, such as presence of gene networks that reveal correlation and potentially regulatory relationships among the study genes. We employ a new eQTL mapping algorithm, GFlasso, which we have previously developed for sparse structured regression, to reanalyze a genome-wide yeast dataset. GFlasso fully takes into account the dependencies among expression traits to suppress false positives and to enhance the signal/noise ratio. Thus, GFlasso leverages the gene-interaction network to discover the pleiotropic effects of genetic loci that perturb the expression level of multiple (rather than individual) genes, which enables us to gain more power in detecting previously neglected signals that are marginally weak but pleiotropically significant.
Results: While eQTL hotspots in yeast have been reported previously as genomic regions controlling multiple genes, our analysis reveals additional novel eQTL hotspots and, more interestingly, uncovers groups of multiple contributing eQTL hotspots that affect the expression level of functional gene modules. To our knowledge, our study is the first to report this type of gene regulation stemming from multiple eQTL hotspots. Additionally, we report the results from in-depth bioinformatics analysis for three groups of these eQTL hotspots: ribosome biogenesis, telomere silencing, and retrotransposon biology. We suggest candidate regulators for the functional gene modules that map to each group of hotspots. Not only do we find that many of these candidate regulators contain mutations in the promoter and coding regions of the genes, in the case of the Ribi group, we provide experimental evidence suggesting that the identified candidates do regulate the target genes predicted by GFlasso.
Conclusions: Thus, this structured association analysis of a yeast eQTL dataset via GFlasso, coupled with extensive bioinformatics analysis, discovers a novel regulation pattern between multiple eQTL hotspots and functional gene modules. Furthermore, this analysis demonstrates the potential of GFlasso as a powerful computational tool for eQTL studies that exploit the rich structural information among expression traits due to correlation, regulation, or other forms of biological dependencies.
Affiliation(s)
- Ross E Curtis
- Joint Carnegie Mellon – University of Pittsburgh PhD Program in Computational Biology, Carnegie Mellon University, Pittsburgh, PA, USA
|