1
Chen J, Liu R, Lyv C, Wu M, Liu S, Jiang M, Zhang Y, Xu D, Hou K, Wu W. Identification of a 301 bp promoter core region of the SrUGT91D2 gene from Stevia rebaudiana that contributes to hormone and abiotic stress inducibility. BMC Plant Biology 2024; 24:921. [PMID: 39358690; PMCID: PMC11447968; DOI: 10.1186/s12870-024-05616-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/17/2024] [Accepted: 09/23/2024] [Indexed: 10/04/2024]
Abstract
BACKGROUND: The UDP-glucuronosyltransferase 91D2 (SrUGT91D2) gene is a crucial element in the biosynthetic pathway of steviol glycosides (SGs) and is responsible for creating 1,2-β-D glucosidic bonds at the C19 and C13 positions. This process plays a vital role in the synthesis of rebaudioside M (RM) and rebaudioside D (RD). The promoter, which regulates gene expression, requires functional analysis to understand gene expression regulation. However, investigations into the function of the promoter of SrUGT91D2 (pSrUGT91D2) have not been reported.
RESULTS: The pSrUGT91D2 was isolated from six S. rebaudiana lines, and subsequent multiple sequence comparisons revealed the presence of a 26 bp inDel fragment (pSrUGT91D2-B1188 type) in lines GP, GX, 110, 1114, and B1188 but not in the pSrUGT91D2 of line 023 (pSrUGT91D2-023 type). Bioinformatics analysis revealed a prevalence of significant cis-regulatory elements (CREs) within the promoter sequences, including those responsive to abscisic acid, light, anaerobic conditions, auxin, drought, low temperature, and MeJA. To verify the activity of pSrUGT91D2, the full-length promoter and a series of 5' deletion fragments (P1-P7) and a 3' deletion fragment (P8) from various lines were fused with the reporter β-glucuronidase (GUS) gene to construct the plant expression vector, pCAMBIA1300-pro∷GUS. The transcriptional activity of these genes was examined in tobacco leaves through transient transformation. GUS tissue staining analysis and enzyme activity assays demonstrated that both the full-length promoter and truncated pSrUGT91D2 were capable of initiating GUS expression in tobacco leaves. Interestingly, P8-pSrUGT91D2-B1188 (containing the inDel segment, 301 bp) exhibited enhanced activity in driving GUS gene expression. Transient expression studies of P8-pSrUGT91D2-B1188 and P8-pSrUGT91D2-023 in response to exogenous hormones (abscisic acid and indole-3-acetic acid) and light indicated the necessity of the inDel region for P8 to exhibit transcriptional activity, as it displayed strong responsiveness to abscisic acid (ABA), indole-3-acetic acid (IAA), and light induction.
CONCLUSIONS: These findings contribute to a deeper understanding of the regulatory mechanism of the upstream region of the SrUGT91D2 gene and provide a theoretical basis for future studies on the interaction between CREs of pSrUGT91D2 and related transcription factors.
Affiliation(s)
- Jinsong Chen
  - College of Agronomy, Sichuan Agricultural University, Chengdu, 611130, China
- Renlang Liu
  - College of Agronomy, Sichuan Agricultural University, Chengdu, 611130, China
- Chengcheng Lyv
  - College of Agronomy, Sichuan Agricultural University, Chengdu, 611130, China
- Mengyang Wu
  - College of Agronomy, Sichuan Agricultural University, Chengdu, 611130, China
- Siqin Liu
  - College of Agronomy, Sichuan Agricultural University, Chengdu, 611130, China
- Meiyan Jiang
  - College of Agronomy, Sichuan Agricultural University, Chengdu, 611130, China
- Yurou Zhang
  - College of Agronomy, Sichuan Agricultural University, Chengdu, 611130, China
- Dongbei Xu
  - College of Agronomy, Sichuan Agricultural University, Chengdu, 611130, China
- Kai Hou
  - College of Agronomy, Sichuan Agricultural University, Chengdu, 611130, China
- Wei Wu
  - College of Agronomy, Sichuan Agricultural University, Chengdu, 611130, China

2
Raditsa V, Tsukanov A, Bogomolov A, Levitsky V. Genomic background sequences systematically outperform synthetic ones in de novo motif discovery for ChIP-seq data. NAR Genom Bioinform 2024; 6:lqae090. [PMID: 39071850; PMCID: PMC11282361; DOI: 10.1093/nargab/lqae090] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2024] [Revised: 06/03/2024] [Accepted: 07/19/2024] [Indexed: 07/30/2024]
Abstract
Efficient de novo motif discovery from the results of genome-wide mapping of transcription factor binding sites (ChIP-seq) is dependent on the choice of background nucleotide sequences. The foreground sequences (ChIP-seq peaks) represent not only specific motifs of target transcription factors, but also the motifs overrepresented throughout the genome, such as simple sequence repeats. We performed a massive comparison of the 'synthetic' and 'genomic' approaches to generate background sequences for de novo motif discovery. The 'synthetic' approach shuffled nucleotides in peaks, while the 'genomic' approach selected sequences from the reference genome, either randomly or only from gene promoters, according to the fraction of A/T nucleotides in each sequence. We compiled the benchmark collections of ChIP-seq datasets for mouse, human and Arabidopsis, and performed de novo motif discovery. We showed that the genomic approach has both more robust detection of the known motifs of target transcription factors and more stringent exclusion of the simple sequence repeats as possible non-specific motifs. The advantage of the genomic approach over the synthetic approach was greater in plants compared to mammals. We developed the AntiNoise web service (https://denovosea.icgbio.ru/antinoise/) that implements a genomic approach to extract genomic background sequences for twelve eukaryotic genomes.
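The two background strategies contrasted in this abstract can be illustrated with a minimal Python sketch. It is not the AntiNoise implementation: the function names, the `tolerance` parameter, and the toy sequences are all hypothetical, and the matching rule (absolute difference in A/T fraction) is an assumption made only for illustration.

```python
import random


def at_fraction(seq: str) -> float:
    """Fraction of A/T nucleotides in a sequence (case-insensitive)."""
    seq = seq.upper()
    return sum(base in "AT" for base in seq) / len(seq) if seq else 0.0


def synthetic_background(peak: str, rng: random.Random) -> str:
    """'Synthetic' style: shuffle the nucleotides of one peak.

    The shuffled sequence keeps the peak's length and base composition.
    """
    bases = list(peak)
    rng.shuffle(bases)
    return "".join(bases)


def genomic_background(peak: str, candidates: list[str], tolerance: float = 0.05) -> list[str]:
    """'Genomic' style: keep candidate genome fragments whose A/T fraction
    lies within `tolerance` (a hypothetical threshold) of the peak's."""
    target = at_fraction(peak)
    return [c for c in candidates if abs(at_fraction(c) - target) <= tolerance]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the shuffle is reproducible
    peak = "ATATGCGC"  # toy peak, not real ChIP-seq data
    print(synthetic_background(peak, rng))
    print(genomic_background(peak, ["AAAATTTT", "ATGCATGC", "GGGGCCCC"]))
```

In this toy setup, candidates would stand in for fragments drawn from a reference genome or from promoters only; a real pipeline would also have to match sequence length and avoid overlap with the peaks themselves.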
Affiliation(s)
- Vladimir V Raditsa
  - Department of System Biology, Institute of Cytology and Genetics, Novosibirsk 630090, Russia
- Anton V Tsukanov
  - Department of System Biology, Institute of Cytology and Genetics, Novosibirsk 630090, Russia
- Anton G Bogomolov
  - Department of Cell Biology, Institute of Cytology and Genetics, Novosibirsk 630090, Russia
- Victor G Levitsky
  - Department of System Biology, Institute of Cytology and Genetics, Novosibirsk 630090, Russia
  - Department of Natural Science, Novosibirsk State University, Novosibirsk 630090, Russia

3
Iqbal MA, Miyamoto K, Yumoto E, Parveen S, Mutanda I, Inafuku M, Oku H. Plant hormone profile and control over isoprene biosynthesis in a tropical tree Ficus septica. Plant Biology (Stuttgart, Germany) 2022; 24:492-501. [PMID: 35050526; DOI: 10.1111/plb.13386] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2021] [Accepted: 12/08/2021] [Indexed: 06/14/2023]
Abstract
Plant hormone signalling and the circadian clock have been implicated in the transcriptional control of isoprene biosynthesis. To gain more insight into the hormonal control of isoprene biosynthesis, the present study measured plant hormone concentrations in jasmonic acid (JA)-treated leaves of our previous model study and examined their relationship with gene expression of isoprene synthase (IspS) and hormone signalling transcription factors. Of the plant hormones, IAA and JA-Ile and their related transcription factors (MYC2 and SAUR21) were significantly correlated with IspS gene expression. Concentrations of cytokinins, isopentenyladenine (iP), trans-zeatin riboside (tZR) and cis-zeatin riboside (cZR), were similarly significantly correlated with IspS expression. However, there was no significant correlation between their related transcription factor (ARR-B) and IspS expression. The circadian clock-related gene PRR7, but not the transcription factor LHY, was highly correlated with IspS expression. These results suggest that the hormonal balance between JA-Ile and IAA plays a central role in transcriptional regulation of IspS through the transcription factors MYC2 and SAUR21, the early auxin responsive genes. The putative cis-acting elements for SAUR on the IspS promoter (TGTCNN and CATATG), in addition to the G-box for MYC2, support the above proposal. These results provide insightful information on the core components of plant hormone-related regulation of IspS under coordination with the circadian clock genes.
Affiliation(s)
- Md A Iqbal
  - The United Graduate School of Agricultural Sciences, Kagoshima University, Kagoshima, Japan
- K Miyamoto
  - Department of Biosciences, Teikyo University, Utsunomiya, Tochigi, Japan
- E Yumoto
  - Advanced Instrumental Analysis Center, Teikyo University, Tochigi, Japan
- S Parveen
  - Faculty of Agriculture, Sher-e-Bangla Agricultural University, Dhaka, Bangladesh
- I Mutanda
  - School of the Environment and Safety Engineering, Biofuels Institute, Jiangsu University, Zhenjiang, Jiangsu, China
- M Inafuku
  - Faculty of Agriculture, University of the Ryukyus, Okinawa, Japan
- H Oku
  - Tropical Biosphere Research Center, University of the Ryukyus, Okinawa, Japan

4
Szymczyk P, Szymańska G, Kuźma Ł, Jeleń A, Balcerczak E. Methyl Jasmonate Activates the 2C Methyl-D-erithrytol 2,4-cyclodiphosphate Synthase Gene and Stimulates Tanshinone Accumulation in Salvia miltiorrhiza Solid Callus Cultures. Molecules 2022; 27:1772. [PMID: 35335134; PMCID: PMC8950807; DOI: 10.3390/molecules27061772] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/01/2022] [Revised: 02/25/2022] [Accepted: 03/05/2022] [Indexed: 01/25/2023]
Abstract
The present study characterizes the 5′ regulatory region of the SmMEC gene. The isolated fragment is 1559 bp long and consists of a promoter, 5′UTR and 31 nucleotide 5′ fragments of the CDS region. In silico bioinformatic analysis found that the promoter region contains repetitions of many potential cis-active elements. Cis-active elements associated with the response to methyl jasmonate (MeJa) were identified in the SmMEC gene promoter. Co-expression studies combined with earlier transcriptomic research suggest the significant role of MeJa in SmMEC gene regulation. These findings were in line with the results of the RT-PCR test showing SmMEC gene expression induction after 72 h of MeJa treatment. Biphasic total tanshinone accumulation was observed following treatment of S. miltiorrhiza solid callus cultures with 50–500 μM methyl jasmonate, with peaks observed after 10–20 and 50–60 days. An early peak of total tanshinone concentration (0.08%) occurred after 20 days of 100 μM MeJa induction, and a second, much lower one, was observed after 50 days of 50 μM MeJa stimulation (0.04%). The dominant tanshinones were cryptotanshinone (CT) and dihydrotanshinone (DHT). To better understand the inducing effect of MeJa treatment on tanshinone biosynthesis, a search was performed for methyl jasmonate-responsive cis-active motifs in the available sequences of gene proximal promoters associated with terpenoid precursor biosynthesis. The results indicate that MeJa has the potential to induce a significant proportion of the presented genes, which is in line with available transcriptomic and RT-PCR data.
Affiliation(s)
- Piotr Szymczyk
  - Department of Biology and Pharmaceutical Botany, Medical University of Łódź, Muszyńskiego 1, 90-151 Łódź, Poland
- Grażyna Szymańska
  - Department of Pharmaceutical Biotechnology, Medical University of Łódź, Muszyńskiego 1, 90-151 Łódź, Poland
- Łukasz Kuźma
  - Department of Biology and Pharmaceutical Botany, Medical University of Łódź, Muszyńskiego 1, 90-151 Łódź, Poland
- Agnieszka Jeleń
  - Department of Pharmaceutical Biochemistry and Molecular Diagnostics, Medical University of Łódź, Muszyńskiego 1, 90-151 Łódź, Poland
- Ewa Balcerczak
  - Department of Pharmaceutical Biochemistry and Molecular Diagnostics, Medical University of Łódź, Muszyńskiego 1, 90-151 Łódź, Poland

5
Menzel M, Hurka S, Glasenhardt S, Gogol-Döring A. NoPeak: k-mer-based motif discovery in ChIP-Seq data without peak calling. Bioinformatics 2021; 37:596-602. [PMID: 32991679; DOI: 10.1093/bioinformatics/btaa845] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 02/05/2020] [Accepted: 09/14/2020] [Indexed: 01/30/2023]
Abstract
MOTIVATION: The discovery of sequence motifs mediating DNA-protein binding usually implies the determination of binding sites using high-throughput sequencing and peak calling. The determination of peaks, however, depends strongly on data quality and is susceptible to noise.
RESULTS: Here, we present a novel approach to reliably identify transcription factor-binding motifs from ChIP-Seq data without peak detection. By evaluating the distributions of sequencing reads around the different k-mers in the genome, we are able to identify binding motifs in ChIP-Seq data that yield no results in traditional pipelines.
AVAILABILITY AND IMPLEMENTATION: NoPeak is published under the GNU General Public License and available as a standalone console-based Java application at https://github.com/menzel/nopeak.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Michael Menzel
  - MNI, Technische Hochschule Mittelhessen, University of Applied Sciences, Giessen 35390, Germany
- Sabine Hurka
  - Institute for Insect Biotechnology, Justus Liebig University, Giessen 35392, Germany
- Stefan Glasenhardt
  - MNI, Technische Hochschule Mittelhessen, University of Applied Sciences, Giessen 35390, Germany
- Andreas Gogol-Döring
  - MNI, Technische Hochschule Mittelhessen, University of Applied Sciences, Giessen 35390, Germany

6
Stigliani A, Martin-Arevalillo R, Lucas J, Bessy A, Vinos-Poyo T, Mironova V, Vernoux T, Dumas R, Parcy F. Capturing Auxin Response Factors Syntax Using DNA Binding Models. Molecular Plant 2019; 12:822-832. [PMID: 30336329; DOI: 10.1016/j.molp.2018.09.010] [Citation(s) in RCA: 34] [Impact Index Per Article: 5.7] [Received: 06/18/2018] [Revised: 08/31/2018] [Accepted: 09/28/2018] [Indexed: 05/03/2023]
Abstract
Auxin is a key hormone performing a wealth of functions throughout the life cycle of plants. It acts largely by regulating genes at the transcriptional level through a family of transcription factors called auxin response factors (ARFs). Even though all ARF monomers analyzed so far bind a similar DNA sequence, there is evidence that ARFs differ in their target genomic regions and regulated genes. Here, we report the use of position weight matrices (PWMs) to model ARF DNA binding specificity based on published DNA affinity purification sequencing (DAP-seq) data. We found that the genome binding of two ARFs (ARF2 and ARF5/Monopteros [MP]) differs largely because these two factors have different preferred ARF binding site (ARFbs) arrangements (orientation and spacing). We illustrated why PWMs are more versatile to reliably identify ARFbs than the widely used consensus sequences and demonstrated their power with biochemical experiments in the identification of the regulatory regions of IAA19, a well-characterized auxin-responsive gene. Finally, we combined gene regulation by auxin with ARF-bound regions and identified specific ARFbs configurations that are over-represented in auxin-upregulated genes, thus deciphering the ARFbs syntax functional for regulation. Our study provides a general method to exploit the potential of genome-wide DNA binding assays and to decode gene regulation.
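The contrast this abstract draws between PWMs and consensus sequences can be made concrete with a small, generic Python sketch of log-odds PWM scoring. It is not the authors' model: the training sites below are toy strings, and the pseudocount and uniform background frequency are assumptions chosen for illustration.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log2-odds PWM from equal-length, aligned binding sites.

    Each column maps a base to log2(observed frequency / background).
    Sites must contain only upper-case A, C, G, T.
    """
    pwm = []
    for pos in range(len(sites[0])):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2((counts[b] / total) / background) for b in BASES})
    return pwm


def score(pwm, window):
    """Sum the per-position log-odds for one window of PWM width."""
    return sum(column[base] for column, base in zip(pwm, window))


def best_hit(pwm, seq):
    """Return (score, start) of the highest-scoring window on one strand."""
    width = len(pwm)
    return max((score(pwm, seq[i:i + width]), i) for i in range(len(seq) - width + 1))


if __name__ == "__main__":
    toy_sites = ["TGTCTC", "TGTCGG", "TGTCTC", "TGTCCC"]  # hypothetical training set
    pwm = build_pwm(toy_sites)
    print(best_hit(pwm, "AAATGTCTCAAA"))
```

Unlike a fixed consensus string, the matrix gives a graded score, so a site that deviates at one weakly constrained position still ranks above an unrelated sequence. Scanning the reverse strand and modelling site orientation and spacing, which the abstract highlights, are left out of this sketch.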
Affiliation(s)
- Arnaud Stigliani
  - Univ. Grenoble Alpes, CNRS, CEA, INRA, BIG-LPCV, 38000 Grenoble, France
- Raquel Martin-Arevalillo
  - Univ. Grenoble Alpes, CNRS, CEA, INRA, BIG-LPCV, 38000 Grenoble, France
  - Laboratoire de Reproduction et Développement des Plantes, Univ. Lyon, ENS de Lyon, UCB Lyon1, CNRS, INRA, 46 allée d'Italie, 69364, Lyon, France
- Jérémy Lucas
  - Univ. Grenoble Alpes, CNRS, CEA, INRA, BIG-LPCV, 38000 Grenoble, France
- Adrien Bessy
  - Univ. Grenoble Alpes, CNRS, CEA, INRA, BIG-LPCV, 38000 Grenoble, France
- Thomas Vinos-Poyo
  - Univ. Grenoble Alpes, CNRS, CEA, INRA, BIG-LPCV, 38000 Grenoble, France
- Victoria Mironova
  - Novosibirsk State University, Pirogova Street 2, Novosibirsk, Russia
  - Institute of Cytology and Genetics SB RAS, Lavrentyeva Avenue 10, Novosibirsk, Russia
- Teva Vernoux
  - Laboratoire de Reproduction et Développement des Plantes, Univ. Lyon, ENS de Lyon, UCB Lyon1, CNRS, INRA, 46 allée d'Italie, 69364, Lyon, France
- Renaud Dumas
  - Univ. Grenoble Alpes, CNRS, CEA, INRA, BIG-LPCV, 38000 Grenoble, France
- François Parcy
  - Univ. Grenoble Alpes, CNRS, CEA, INRA, BIG-LPCV, 38000 Grenoble, France

7
Tenorio-Berrío R, Pérez-Alonso MM, Vicente-Carbajosa J, Martín-Torres L, Dreyer I, Pollmann S. Identification of Two Auxin-Regulated Potassium Transporters Involved in Seed Maturation. Int J Mol Sci 2018; 19:E2132. [PMID: 30037141; PMCID: PMC6073294; DOI: 10.3390/ijms19072132] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 05/30/2018] [Revised: 07/18/2018] [Accepted: 07/20/2018] [Indexed: 12/16/2022]
Abstract
The seed is the most important plant reproductive unit responsible for the evolutionary success of flowering plants. Aside from its essential function in the sexual reproduction of plants, the seed also represents the most economically important agricultural product worldwide, providing energy, nutrients, and raw materials for human nutrition, livestock feed, and countless manufactured goods. Hence, improvements in seed quality or size are highly valuable, due to their economic potential in agriculture. Recently, the importance of indolic compounds in regulating these traits has been reported for Arabidopsis thaliana. The transcriptional and physiological mechanisms involved, however, remain largely undisclosed. Potassium transporters have been suggested as possible mediators of embryo cell size, controlling turgor pressure during seed maturation. In addition, it has been demonstrated that the expression of K⁺ transporters is effectively regulated by auxin. Here, we provide evidence for the identification of two Arabidopsis K⁺ transporters, HAK/KT12 (At1g60160) and KUP4 (At4g23640), that are likely to be implicated in determining seed size during seed maturation and, at the same time, show a differential regulation by indole-3-acetic acid and indole-3-acetamide.
Affiliation(s)
- Rubén Tenorio-Berrío
  - Centro de Biotecnología y Genómica de Plantas, Instituto Nacional de Investigación y Tecnología Agraria y Alimentación (INIA), Universidad Politécnica de Madrid (UPM), 28223 Pozuelo de Alarcón, Spain
- Marta-Marina Pérez-Alonso
  - Centro de Biotecnología y Genómica de Plantas, Instituto Nacional de Investigación y Tecnología Agraria y Alimentación (INIA), Universidad Politécnica de Madrid (UPM), 28223 Pozuelo de Alarcón, Spain
- Jesús Vicente-Carbajosa
  - Centro de Biotecnología y Genómica de Plantas, Instituto Nacional de Investigación y Tecnología Agraria y Alimentación (INIA), Universidad Politécnica de Madrid (UPM), 28223 Pozuelo de Alarcón, Spain
- Leticia Martín-Torres
  - Centro de Biotecnología y Genómica de Plantas, Instituto Nacional de Investigación y Tecnología Agraria y Alimentación (INIA), Universidad Politécnica de Madrid (UPM), 28223 Pozuelo de Alarcón, Spain
- Ingo Dreyer
  - Centro de Biotecnología y Genómica de Plantas, Instituto Nacional de Investigación y Tecnología Agraria y Alimentación (INIA), Universidad Politécnica de Madrid (UPM), 28223 Pozuelo de Alarcón, Spain
  - Centro de Bioinformática y Simulación Molecular (CBSM), Universidad de Talca, 2 Norte 685, 3460000 Talca, Chile
- Stephan Pollmann
  - Centro de Biotecnología y Genómica de Plantas, Instituto Nacional de Investigación y Tecnología Agraria y Alimentación (INIA), Universidad Politécnica de Madrid (UPM), 28223 Pozuelo de Alarcón, Spain

8
Zemlyanskaya EV, Wiebe DS, Omelyanchuk NA, Levitsky VG, Mironova VV. Meta-analysis of transcriptome data identified TGTCNN motif variants associated with the response to plant hormone auxin in Arabidopsis thaliana L. J Bioinform Comput Biol 2017; 14:1641009. [PMID: 27122321; DOI: 10.1142/s0219720016410092] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Indexed: 11/18/2022]
Abstract
Auxin is the major regulator of plant growth and development. It regulates gene expression via a family of transcription factors (ARFs) that bind to auxin responsive elements (AuxREs) in the gene promoters. The canonical AuxREs found in regulatory regions of many auxin responsive genes contain the TGTCTC core motif, whereas the ARF binding site is a degenerate TGTCNN with TGTCGG strongly preferred. Two questions thereby arise: which TGTCNN variants are functional AuxRE cores, and do different TGTCNN variants have distinct functional roles? In this study, we performed meta-analysis of microarray data to reveal TGTCNN variants essential for auxin response and to characterize their functional features. Our results indicate that four TGTCNN motifs (TGTCTC, TGTCCC, TGTCGG, and TGTCTG) are associated with auxin up-regulation and two (TGTCGG, TGTCAT) with auxin down-regulation, but to a lesser extent. The genes having some of these motifs in their regulatory regions showed time-specific auxin response. Functional annotation of auxin up- and down-regulated genes also revealed GO terms specific for the auxin-regulated genes with certain TGTCNN variants in their promoters. Our results suggest that various TGTCNN motifs may play distinct roles in the auxin regulation of gene expression.
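As a rough illustration of what scanning a regulatory region for the degenerate TGTCNN core involves, the Python sketch below counts each hexamer variant on both strands. It is only a toy counter, not the meta-analysis described in the abstract; the function names and the example sequences are hypothetical.

```python
import re
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

# The lookahead lets overlapping occurrences be reported as well.
_TGTCNN = re.compile(r"(?=(TGTC[ACGT]{2}))")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence (returned in upper case)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def count_tgtcnn(seq: str) -> Counter:
    """Count every TGTCNN hexamer variant found on either strand of seq."""
    counts = Counter()
    for strand in (seq.upper(), reverse_complement(seq)):
        counts.update(match.group(1) for match in _TGTCNN.finditer(strand))
    return counts


if __name__ == "__main__":
    # Toy sequences, not real promoters.
    print(count_tgtcnn("TGTCGG"))
    print(count_tgtcnn("AATGTCTCGGGAGACAAT"))
```

Per-variant counts of this kind are the sort of raw input that an enrichment test against expression data would then take; that statistical step is outside the scope of this sketch.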
Affiliation(s)
- Elena V Zemlyanskaya
  - Department for Systems Biology, Institute of Cytology and Genetics SB RAS, 10 Lavrentyev Ave., Novosibirsk 630090, Russia
  - Laboratory of Computational Transcriptomics and Evolutionary Bioinformatics, Novosibirsk State University, 2 Pirogov Str., Novosibirsk 630090, Russia
- Daniil S Wiebe
  - Department for Systems Biology, Institute of Cytology and Genetics SB RAS, 10 Lavrentyev Ave., Novosibirsk 630090, Russia
  - Laboratory of Computational Transcriptomics and Evolutionary Bioinformatics, Novosibirsk State University, 2 Pirogov Str., Novosibirsk 630090, Russia
- Nadezhda A Omelyanchuk
  - Department for Systems Biology, Institute of Cytology and Genetics SB RAS, 10 Lavrentyev Ave., Novosibirsk 630090, Russia
  - Laboratory of Computational Transcriptomics and Evolutionary Bioinformatics, Novosibirsk State University, 2 Pirogov Str., Novosibirsk 630090, Russia
- Victor G Levitsky
  - Department for Systems Biology, Institute of Cytology and Genetics SB RAS, 10 Lavrentyev Ave., Novosibirsk 630090, Russia
  - Laboratory of Computational Transcriptomics and Evolutionary Bioinformatics, Novosibirsk State University, 2 Pirogov Str., Novosibirsk 630090, Russia
- Victoria V Mironova
  - Department for Systems Biology, Institute of Cytology and Genetics SB RAS, 10 Lavrentyev Ave., Novosibirsk 630090, Russia
  - Laboratory of Computational Transcriptomics and Evolutionary Bioinformatics, Novosibirsk State University, 2 Pirogov Str., Novosibirsk 630090, Russia

9
Nettling M, Treutler H, Cerquides J, Grosse I. Combining phylogenetic footprinting with motif models incorporating intra-motif dependencies. BMC Bioinformatics 2017; 18:141. [PMID: 28249564; PMCID: PMC5333389; DOI: 10.1186/s12859-017-1495-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 06/29/2016] [Accepted: 01/24/2017] [Indexed: 11/23/2022]
Abstract
Background: Transcriptional gene regulation is a fundamental process in nature, and the experimental and computational investigation of DNA binding motifs and their binding sites is a prerequisite for elucidating this process. Approaches for de-novo motif discovery can be subdivided into phylogenetic footprinting that takes into account phylogenetic dependencies in aligned sequences of more than one species and non-phylogenetic approaches based on sequences from only one species that typically take into account intra-motif dependencies. It has been shown that modeling (i) phylogenetic dependencies as well as (ii) intra-motif dependencies separately improves de-novo motif discovery, but there is no approach capable of modeling both (i) and (ii) simultaneously.
Results: Here, we present an approach for de-novo motif discovery that combines phylogenetic footprinting with motif models capable of taking into account intra-motif dependencies. We study the degree of intra-motif dependencies inferred by this approach from ChIP-seq data of 35 transcription factors. We find that significant intra-motif dependencies of orders 1 and 2 are present in all 35 datasets and that intra-motif dependencies of order 2 are typically stronger than those of order 1. We also find that the presented approach improves the classification performance of phylogenetic footprinting in all 35 datasets and that incorporating intra-motif dependencies of order 2 yields a higher classification performance than incorporating such dependencies of only order 1.
Conclusion: Combining phylogenetic footprinting with motif models incorporating intra-motif dependencies leads to an improved performance in the classification of transcription factor binding sites. This may advance our understanding of transcriptional gene regulation and its evolution.
Electronic supplementary material: The online version of this article (doi:10.1186/s12859-017-1495-1) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Martin Nettling
  - Institute of Computer Science, Martin Luther University Halle-Wittenberg, Halle, Germany
- Jesus Cerquides
  - Institut d'Investigació en Intel·ligència Artificial, IIIA-CSIC, Campus UAB, Cerdanyola, Spain
- Ivo Grosse
  - Institute of Computer Science, Martin Luther University Halle-Wittenberg, Halle, Germany
  - German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Leipzig, Germany

10
Trenner J, Poeschl Y, Grau J, Gogol-Döring A, Quint M, Delker C. Auxin-induced expression divergence between Arabidopsis species may originate within the TIR1/AFB-AUX/IAA-ARF module. Journal of Experimental Botany 2017; 68:539-552. [PMID: 28007950; DOI: 10.1093/jxb/erw457] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 05/22/2023]
Abstract
Auxin is an essential regulator of plant growth and development, and auxin signaling components are conserved among land plants. Yet, a remarkable degree of natural variation in physiological and transcriptional auxin responses has been described among Arabidopsis thaliana accessions. As intraspecies comparisons offer only limited genetic variation, we here inspect the variation of auxin responses between A. thaliana and A. lyrata. This approach allowed the identification of conserved auxin response genes including novel genes with potential relevance for auxin biology. Furthermore, promoter divergences were analyzed for putative sources of variation. De novo motif discovery identified novel elements and variants of known elements with potential relevance for auxin responses, emphasizing the complex, and yet elusive, code of element combinations accounting for the diversity in transcriptional auxin responses. Furthermore, network analysis revealed correlations of interspecies differences in the expression of AUX/IAA gene clusters and classic auxin-related genes. We conclude that variation in general transcriptional and physiological auxin responses may originate substantially from functional or transcriptional variations in the TIR1/AFB, AUX/IAA, and ARF signaling network. In that respect, AUX/IAA gene expression divergence potentially reflects differences in the manner in which different species transduce identical auxin signals into gene expression responses.
Affiliation(s)
- Jana Trenner
  - Institute of Agricultural and Nutritional Sciences, Martin Luther University Halle-Wittenberg, Betty-Heimann, Halle (Saale), Germany
  - Department of Molecular Signal Processing, Leibniz Institute of Plant Biochemistry, Weinberg 3, Halle (Saale), Germany
- Yvonne Poeschl
  - German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Deutscher Platz 5e, Leipzig, Germany
  - Institute of Computer Science, Martin Luther University Halle-Wittenberg, Von-Seckendorff-Platz 1, Halle (Saale), Germany
- Jan Grau
  - Institute of Computer Science, Martin Luther University Halle-Wittenberg, Von-Seckendorff-Platz 1, Halle (Saale), Germany
- Andreas Gogol-Döring
  - German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Deutscher Platz 5e, Leipzig, Germany
  - Institute of Computer Science, Martin Luther University Halle-Wittenberg, Von-Seckendorff-Platz 1, Halle (Saale), Germany
- Marcel Quint
  - Institute of Agricultural and Nutritional Sciences, Martin Luther University Halle-Wittenberg, Betty-Heimann, Halle (Saale), Germany
  - Department of Molecular Signal Processing, Leibniz Institute of Plant Biochemistry, Weinberg 3, Halle (Saale), Germany
- Carolin Delker
  - Institute of Agricultural and Nutritional Sciences, Martin Luther University Halle-Wittenberg, Betty-Heimann, Halle (Saale), Germany
  - Department of Molecular Signal Processing, Leibniz Institute of Plant Biochemistry, Weinberg 3, Halle (Saale), Germany

11
Abstract
Auxin is arguably the most important signaling molecule in plants, and the last few decades have seen remarkable breakthroughs in understanding its production, transport, and perception. Recent investigations have focused on transcriptional responses to auxin, providing novel insight into the functions of the domains of key transcription regulators in responses to the hormonal cue and prominently implicating chromatin regulation in these responses. In addition, studies are beginning to identify direct targets of the auxin-responsive transcription factors that underlie auxin modulation of development. Mechanisms to tune the response to different auxin levels are emerging, as are first insights into how this single hormone can trigger diverse responses. Key unanswered questions center on the mechanism for auxin-directed transcriptional repression and the identity of additional determinants of auxin response specificity. Much of what has been learned in model plants holds true in other species, including the earliest land plants.
Collapse
Affiliation(s)
- Dolf Weijers
- Laboratory of Biochemistry, Wageningen University, 6703 HA Wageningen, The Netherlands;
| | - Doris Wagner
- Department of Biology, University of Pennsylvania, Philadelphia, Pennsylvania 19104;
|
12
|
Dinesh DC, Villalobos LIAC, Abel S. Structural Biology of Nuclear Auxin Action. TRENDS IN PLANT SCIENCE 2016; 21:302-316. [PMID: 26651917 DOI: 10.1016/j.tplants.2015.10.019] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Received: 08/11/2015] [Revised: 09/29/2015] [Accepted: 10/23/2015] [Indexed: 05/23/2023]
Abstract
Auxin coordinates plant development largely via hierarchical control of gene expression. During the past decades, the study of early auxin genes, paired with the power of Arabidopsis genetics, has unraveled key nuclear components and molecular interactions that perceive the hormone and activate primary response genes. Recent research in the realm of structural biology allowed unprecedented insight into: (i) the recognition of auxin-responsive DNA elements by auxin transcription factors; (ii) the inactivation of those auxin response factors by early auxin-inducible repressors; and (iii) the activation of target genes by auxin-triggered repressor degradation. The biophysical studies reviewed here provide an impetus for elucidating the molecular determinants of the intricate interactions between core components of the nuclear auxin response module.
Affiliation(s)
- Dhurvas Chandrasekaran Dinesh
- Department of Molecular Signal Processing, Leibniz Institute of Plant Biochemistry, Weinberg 3, D-06120 Halle (Saale), Germany
| | - Luz Irina A Calderón Villalobos
- Department of Molecular Signal Processing, Leibniz Institute of Plant Biochemistry, Weinberg 3, D-06120 Halle (Saale), Germany
| | - Steffen Abel
- Department of Molecular Signal Processing, Leibniz Institute of Plant Biochemistry, Weinberg 3, D-06120 Halle (Saale), Germany; Institute of Biochemistry and Biotechnology, Martin Luther University Halle-Wittenberg, Kurt-Mothes-Strasse 3, D-06120 Halle (Saale), Germany; Department of Plant Sciences, University of California, Davis, One Shields Avenue, Davis, CA 95616, USA.
|
13
|
Lis M, Walther D. The orientation of transcription factor binding site motifs in gene promoter regions: does it matter? BMC Genomics 2016; 17:185. [PMID: 26939991 PMCID: PMC4778318 DOI: 10.1186/s12864-016-2549-x] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.1] [Received: 10/19/2015] [Accepted: 02/27/2016] [Indexed: 12/23/2022] [Open Access]
Abstract
Background Gene expression is to a large degree regulated by the specific binding of protein transcription factors to cis-regulatory transcription factor binding sites in gene promoter regions. Despite the identification of hundreds of binding site sequence motifs, the question as to whether motif orientation matters with regard to the gene expression regulation of the respective downstream genes appears surprisingly underinvestigated. Results We pursued a statistical approach by probing 293 reported non-palindromic transcription factor binding site motifs and ten core promoter motifs in Arabidopsis thaliana for evidence of any relevance of motif orientation based on mapping statistics and effects on the co-regulation of gene expression of the respective downstream genes. Although positional intervals closer to the transcription start site (TSS) were found with increased frequencies of motifs exhibiting orientation preference, a corresponding effect with regard to gene expression regulation, as evidenced by increased co-expression of genes harboring the favored orientation in their upstream sequence, could not be established. Furthermore, we identified an intrinsic orientational asymmetry of sequence regions close to the TSS as the likely source of the identified motif orientation preferences. By contrast, motif presence irrespective of orientation was found to be associated with pronounced effects on gene expression co-regulation, validating the pursued approach. Inspecting motif pairs revealed statistically preferred orientational arrangements, but no consistent effect with regard to arrangement-dependent gene expression regulation was evident.
Conclusions Our results suggest that for the motifs considered here, either no specific orientation rendering them functional across all their instances exists with orientational requirements instead depending on gene-locus specific additional factors, or that the binding orientation of transcription factors may generally not be relevant, but rather the event of binding itself. Electronic supplementary material The online version of this article (doi:10.1186/s12864-016-2549-x) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Monika Lis
- Max Planck Institute for Molecular Plant Physiology, Am Mühlenberg 1, 14476, Potsdam-Golm, Germany.
| | - Dirk Walther
- Max Planck Institute for Molecular Plant Physiology, Am Mühlenberg 1, 14476, Potsdam-Golm, Germany.
|
14
|
Eggeling R, Roos T, Myllymäki P, Grosse I. Inferring intra-motif dependencies of DNA binding sites from ChIP-seq data. BMC Bioinformatics 2015; 16:375. [PMID: 26552868 PMCID: PMC4640111 DOI: 10.1186/s12859-015-0797-4] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.6] [Received: 06/16/2015] [Accepted: 10/23/2015] [Indexed: 11/29/2022] [Open Access]
Abstract
Background Statistical modeling of transcription factor binding sites is one of the classical fields in bioinformatics. The position weight matrix (PWM) model, which assumes statistical independence among all nucleotides in a binding site, used to be the standard model for this task for more than three decades but its simple assumptions are increasingly put into question. Recent high-throughput sequencing methods have provided data sets of sufficient size and quality for studying the benefits of more complex models. However, learning more complex models typically entails the danger of overfitting, and while model classes that dynamically adapt the model complexity to data have been developed, effective model selection is to date only possible for fully observable data, but not, e.g., within de novo motif discovery. Results To address this issue, we propose a stochastic algorithm for performing robust model selection in a latent variable setting. This algorithm yields a solution without relying on hyperparameter-tuning via massive cross-validation or other computationally expensive resampling techniques. Using this algorithm for learning inhomogeneous parsimonious Markov models, we study the degree of putative higher-order intra-motif dependencies for transcription factor binding sites inferred via de novo motif discovery from ChIP-seq data. We find that intra-motif dependencies are prevalent and not limited to first-order dependencies among directly adjacent nucleotides, but that second-order models appear to be the significantly better choice. Conclusions The traditional PWM model appears to be indeed insufficient to infer realistic sequence motifs, as it is on average outperformed by more complex models that take into account intra-motif dependencies. Moreover, using such models together with an appropriate model selection procedure does not lead to a significant performance loss in comparison with the PWM model for any of the studied transcription factors. 
Hence, we find it worthwhile to recommend that any modern motif discovery algorithm should attempt to take into account intra-motif dependencies. Electronic supplementary material The online version of this article (doi:10.1186/s12859-015-0797-4) contains supplementary material, which is available to authorized users.
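The independence assumption of the PWM model mentioned in this abstract can be made concrete with a minimal sketch: a log-odds matrix is estimated from aligned sites, and a candidate sequence is scored as a plain sum of per-position terms. The toy sites, the pseudocount, and the uniform background below are assumptions chosen for illustration and are not taken from the article; the sketch does not implement the inhomogeneous parsimonious Markov models the authors study.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Estimate a log2-odds PWM (one dict per position) from aligned sites."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    pwm = []
    for pos in range(length):
        # Start every base at the pseudocount so unseen bases get a finite score.
        counts = {base: pseudocount for base in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append(
            {base: math.log2(counts[base] / total / background) for base in BASES}
        )
    return pwm


def score(pwm, seq):
    """Score a sequence as a sum of independent per-position log-odds terms."""
    if len(seq) != len(pwm):
        raise ValueError("sequence length must match the PWM length")
    return sum(column[base] for column, base in zip(pwm, seq))


# Hypothetical aligned binding sites, for illustration only.
toy_sites = ["TGTCTC", "TGTCGC", "TGTCAC", "TGTCTC"]
pwm = build_pwm(toy_sites)
```

Because each position contributes its own term, this score cannot express a preference for a particular pair of nucleotides occurring together, which is the limitation that dependency-aware models aim to address.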
Affiliation(s)
- Ralf Eggeling
- Institute of Computer Science, Martin Luther University Halle-Wittenberg, Halle, Germany; Department of Computer Science, Helsinki Institute for Information Technology HIIT, University of Helsinki, Helsinki, Finland.
| | - Teemu Roos
- Department of Computer Science, Helsinki Institute for Information Technology HIIT, University of Helsinki, Helsinki, Finland.
| | - Petri Myllymäki
- Department of Computer Science, Helsinki Institute for Information Technology HIIT, University of Helsinki, Helsinki, Finland.
| | - Ivo Grosse
- Institute of Computer Science, Martin Luther University Halle-Wittenberg, Halle, Germany; German Center for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Leipzig, Germany.
|
15
|
Ponomarenko PM, Ponomarenko MP. Sequence-based prediction of transcription upregulation by auxin in plants. J Bioinform Comput Biol 2015; 13:1540009. [PMID: 25666655 DOI: 10.1142/s0219720015400090] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Indexed: 11/18/2022]
Abstract
Auxin is one of the main regulators of growth and development in plants. Prediction of auxin response based on gene sequence is of high importance. We found the TGTCNC consensus of 111 known natural and artificially mutated auxin response elements (AuxREs) with a measured auxin-caused relative increase in genes' transcription levels, referred to as either "a response to auxin" or "an auxin response." This consensus was identical to the most cited AuxRE motif. We also found several DNA sequence features that correlate with the auxin-caused increase in genes' transcription levels, namely: the number of matches with TGTCNC, a homology score based on nucleotide frequencies at the consensus positions, and the abundances of five trinucleotides and five B-helical DNA features around these known AuxREs. We combined these correlations using a four-step empirical model of auxin response based on a gene's sequence, namely: (1) search for AuxREs with no auxin; (2) stop at the found AuxRE; (3) repression of the basal transcription of the gene having this AuxRE; and (4) manifold increase of this gene's transcription in response to auxin. Independently measured increases in transcription levels in response to auxin for 70 Arabidopsis genes were found to significantly correlate with predictions of this equation (r = 0.44, p < 0.001) as well as with TATA-binding protein (TBP)'s affinity to promoters of these genes and with nucleosome packing of these promoters (both p < 0.025). Finally, we improved our equation for prediction of a gene's transcription increase in response to auxin by taking into account TBP binding and nucleosome packing (r = 0.53, p < 10^-6). Fisher's F-test validated the significant impact of both TBP/promoter affinity and promoter nucleosome packing on auxin response in addition to that of the AuxRE (F = 4.07, p < 0.025).
This means that both the TATA box and nucleosome packing should be taken into account when recognizing transcription factor binding sites in DNA sequences: in the case of TATA-less nucleosome-rich promoters, recognition scores must be higher than in the case of TATA-containing nucleosome-free promoters at the same transcription activity.
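One of the sequence features named in this abstract, the number of matches with the TGTCNC consensus, can be illustrated with a short sketch. The code below is an assumption-laden illustration, not the authors' method: scanning both strands, counting overlapping matches, and the example sequence are all choices made here for demonstration, and none of the other features (homology score, trinucleotide abundances, B-helical DNA features) are modeled.

```python
import re

CONSENSUS = "TGTCNC"
# Minimal IUPAC table covering only the symbols used in the consensus.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def consensus_to_regex(consensus):
    """Compile a lookahead regex so that overlapping matches are all found."""
    body = "".join(IUPAC[symbol] for symbol in consensus)
    return re.compile(f"(?=({body}))")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_matches(seq, consensus=CONSENSUS):
    """Return sorted (start, strand) pairs; starts are 0-based on the given strand."""
    seq = seq.upper()
    pattern = consensus_to_regex(consensus)
    hits = [(m.start(), "+") for m in pattern.finditer(seq)]
    # Scan the reverse complement and map coordinates back to the input strand.
    width = len(consensus)
    rc = reverse_complement(seq)
    hits += [(len(seq) - m.start() - width, "-") for m in pattern.finditer(rc)]
    return sorted(hits)


# Hypothetical promoter fragment, for illustration only.
example = "AATGTCTCGGGAGACATT"
matches = find_matches(example)
```

The length of `matches` gives the raw match count for a fragment; whether both strands should be counted for a given analysis is a modeling decision that this sketch leaves open.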
Affiliation(s)
- Petr M Ponomarenko
- Children's Hospital Los Angeles, 4640 Hollywood Blvd, Los Angeles, CA 90027, USA
|
16
|
Keilwagen J, Grau J. Varying levels of complexity in transcription factor binding motifs. Nucleic Acids Res 2015; 43:e119. [PMID: 26116565 PMCID: PMC4605289 DOI: 10.1093/nar/gkv577] [Citation(s) in RCA: 46] [Impact Index Per Article: 4.6] [Received: 02/10/2015] [Revised: 05/11/2015] [Accepted: 05/21/2015] [Indexed: 11/17/2022] [Open Access]
Abstract
Binding of transcription factors to DNA is one of the keystones of gene regulation. The existence of statistical dependencies between binding site positions is widely accepted, while their relevance for computational predictions has been debated. Building probabilistic models of binding sites that may capture dependencies is still challenging, since the most successful motif discovery approaches require numerical optimization techniques, which are not suited for selecting dependency structures. To overcome this issue, we propose sparse local inhomogeneous mixture (Slim) models that combine putative dependency structures in a weighted manner allowing for numerical optimization of dependency structure and model parameters simultaneously. We find that Slim models yield a substantially better prediction performance than previous models on genomic context protein binding microarray data sets and on ChIP-seq data sets. To elucidate the reasons for the improved performance, we develop dependency logos, which allow for visual inspection of dependency structures within binding sites. We find that the dependency structures discovered by Slim models are highly diverse and highly transcription factor-specific, which emphasizes the need for flexible dependency models. The observed dependency structures range from broad heterogeneities to sparse dependencies between neighboring and non-neighboring binding site positions.
Affiliation(s)
- Jens Keilwagen
- Institute for Biosafety in Plant Biotechnology, Julius Kühn-Institut (JKI) - Federal Research Centre for Cultivated Plants, D-06484 Quedlinburg, Germany
| | - Jan Grau
- Institute of Computer Science, Martin Luther University Halle-Wittenberg, D-06099 Halle (Saale), Germany
|
17
|
Smieszek SP, Yang H, Paccanaro A, Devlin PF. Progressive promoter element combinations classify conserved orthogonal plant circadian gene expression modules. J R Soc Interface 2015; 11:rsif.2014.0535. [PMID: 25142519 PMCID: PMC4233729 DOI: 10.1098/rsif.2014.0535] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 12/27/2022] [Open Access]
Abstract
We aimed to test the proposal that progressive combinations of multiple promoter elements acting in concert may be responsible for the full range of phases observed in plant circadian output genes. In order to allow reliable selection of informative phase groupings of genes for our purpose, intrinsic cyclic patterns of expression were identified using a novel, non-biased method for the identification of circadian genes. Our non-biased approach identified two dominant, inherent orthogonal circadian trends underlying publicly available microarray data from plants maintained under constant conditions. Furthermore, these trends were highly conserved across several plant species. Four phase-specific modules of circadian genes were generated by projection onto these trends and, in order to identify potential combinatorial promoter elements that might classify genes into these groups, we used a Random Forest pipeline which merged data from multiple decision trees to look for the presence of element combinations. We identified a number of regulatory motifs which aggregated into coherent clusters capable of predicting the inclusion of genes within each phase module with very high fidelity and these motif combinations changed in a consistent, progressive manner from one phase module group to the next, providing strong support for our hypothesis.
Affiliation(s)
- Sandra P Smieszek
- School of Biological Sciences, Royal Holloway University of London, Egham TW20 0EX, UK; Centre for Systems and Synthetic Biology, Royal Holloway University of London, Egham TW20 0EX, UK
| | - Haixuan Yang
- Centre for Systems and Synthetic Biology, Royal Holloway University of London, Egham TW20 0EX, UK; Department of Computer Science, Royal Holloway University of London, Egham TW20 0EX, UK
| | - Alberto Paccanaro
- Centre for Systems and Synthetic Biology, Royal Holloway University of London, Egham TW20 0EX, UK; Department of Computer Science, Royal Holloway University of London, Egham TW20 0EX, UK
| | - Paul F Devlin
- School of Biological Sciences, Royal Holloway University of London, Egham TW20 0EX, UK; Centre for Systems and Synthetic Biology, Royal Holloway University of London, Egham TW20 0EX, UK
|
18
|
Wang S, Tie J, Wang R, Hu F, Gao L, Wang W, Wang L, Li Z, Hu S, Tang S, Li M, Wang X, Nie Y, Wu K, Fan D. SOX2, a predictor of survival in gastric cancer, inhibits cell proliferation and metastasis by regulating PTEN. Cancer Lett 2015; 358:210-219. [PMID: 25543086 DOI: 10.1016/j.canlet.2014.12.045] [Citation(s) in RCA: 61] [Impact Index Per Article: 6.1] [Received: 10/03/2014] [Revised: 12/08/2014] [Accepted: 12/19/2014] [Indexed: 01/02/2023]
Abstract
Inconsistent results of SOX2 expression have been reported in gastric cancer (GC). Here, we demonstrated that SOX2 was progressively downregulated during GC development via immunochemistry in 755 human gastric specimens. Low SOX2 levels were associated with pathological stage and clinical outcome. Multivariate analysis indicated that SOX2 protein expression served as an independent prognostic marker for GC. Gain- and loss-of-function studies showed the anti-proliferative, anti-metastatic, and pro-apoptotic effects of SOX2 in GC. PTEN was selected as a SOX2 target by cDNA microarray and ChIP-DSL, and further identified by luciferase assays, EMSA and ChIP-PCR. PTEN upregulation in response to SOX2-enforced expression suppressed GC malignancy via regulating Akt dephosphorylation. PTEN inhibition reversed SOX2-induced anticancer effects. Moreover, concordant positivity of SOX2 and PTEN proteins in nontumorous tissues that was lost in matched GC specimens predicted a worse patient prognosis. Thus, SOX2 proved to be a new marker for evaluating GC outcome.
Affiliation(s)
- Simeng Wang
- State key Laboratory of Cancer Biology and Xijing Hospital of Digestive Diseases, Xijing Hospital, Fourth Military Medical University, Xi'an, Shaanxi 710032, China
| | - Jun Tie
- State key Laboratory of Cancer Biology and Xijing Hospital of Digestive Diseases, Xijing Hospital, Fourth Military Medical University, Xi'an, Shaanxi 710032, China.
| | - Rui Wang
- State key Laboratory of Cancer Biology and Xijing Hospital of Digestive Diseases, Xijing Hospital, Fourth Military Medical University, Xi'an, Shaanxi 710032, China
| | - Fengrong Hu
- State key Laboratory of Cancer Biology and Xijing Hospital of Digestive Diseases, Xijing Hospital, Fourth Military Medical University, Xi'an, Shaanxi 710032, China
| | - Liucun Gao
- Department of Pharmacology and Toxicology, Beijing Institute of Radiation Medicine, Beijing 100850, China
| | - Wenlan Wang
- Department of Aerospace Hygiene and Health Service, School of Aerospace Medicine, Fourth Military Medical University, Xi'an, Shaanxi 710032, China
| | - Lifeng Wang
- Department of Biochemistry and Molecular Biology, The Fourth Military Medical University, Xi'an, Shaanxi 710032, China
| | - Zengshan Li
- State key Laboratory of Cancer Biology and Xijing Hospital of Digestive Diseases, Xijing Hospital, Fourth Military Medical University, Xi'an, Shaanxi 710032, China
| | - Sijun Hu
- State key Laboratory of Cancer Biology and Xijing Hospital of Digestive Diseases, Xijing Hospital, Fourth Military Medical University, Xi'an, Shaanxi 710032, China
| | - Shanhong Tang
- State key Laboratory of Cancer Biology and Xijing Hospital of Digestive Diseases, Xijing Hospital, Fourth Military Medical University, Xi'an, Shaanxi 710032, China
| | - Mengbin Li
- State key Laboratory of Cancer Biology and Xijing Hospital of Digestive Diseases, Xijing Hospital, Fourth Military Medical University, Xi'an, Shaanxi 710032, China
| | - Xin Wang
- State key Laboratory of Cancer Biology and Xijing Hospital of Digestive Diseases, Xijing Hospital, Fourth Military Medical University, Xi'an, Shaanxi 710032, China
| | - Yongzhan Nie
- State key Laboratory of Cancer Biology and Xijing Hospital of Digestive Diseases, Xijing Hospital, Fourth Military Medical University, Xi'an, Shaanxi 710032, China
| | - Kaichun Wu
- State key Laboratory of Cancer Biology and Xijing Hospital of Digestive Diseases, Xijing Hospital, Fourth Military Medical University, Xi'an, Shaanxi 710032, China
| | - Daiming Fan
- State key Laboratory of Cancer Biology and Xijing Hospital of Digestive Diseases, Xijing Hospital, Fourth Military Medical University, Xi'an, Shaanxi 710032, China.
|
19
|
Mironova VV, Omelyanchuk NA, Wiebe DS, Levitsky VG. Computational analysis of auxin responsive elements in the Arabidopsis thaliana L. genome. BMC Genomics 2014; 15 Suppl 12:S4. [PMID: 25563792 PMCID: PMC4331925 DOI: 10.1186/1471-2164-15-s12-s4] [Citation(s) in RCA: 51] [Impact Index Per Article: 4.6] [Indexed: 12/27/2022] [Open Access]
Abstract
Auxin responsive elements (AuxREs) were found in upstream regions of target genes for ARFs (auxin response factors). While ChIP-seq data for most ARFs are still unavailable, prediction of potential AuxREs is restricted by consensus models that detect too many false positive sites. Using sequence analysis of experimentally proven AuxREs, we revealed both an extended nucleotide context pattern for the AuxRE itself and three distinct types of its coupling motifs (Y-patch, AuxRE-like, and ABRE-like), which together with the AuxRE may form composite elements. Computational analysis of the genome-wide distribution of the predicted AuxREs and their impact on auxin responsive gene expression allowed us to conclude that: (1) AuxREs are enriched around the transcription start site with the maximum density in the 5'UTR; (2) AuxREs mediate auxin responsive up-regulation, not down-regulation; and (3) directly oriented single AuxREs and reverse multiple AuxREs are mostly associated with auxin responsiveness. In the composite AuxRE elements associated with auxin response, ABRE-like and Y-patch motifs are 5'-flanking or overlapping the AuxRE, whereas the AuxRE-like motif is 3'-flanking. The specificity in location and orientation of the coupling elements suggests them as potential binding sites for ARF partners.
|
20
|
Maaskola J, Rajewsky N. Binding site discovery from nucleic acid sequences by discriminative learning of hidden Markov models. Nucleic Acids Res 2014; 42:12995-3011. [PMID: 25389269 PMCID: PMC4245949 DOI: 10.1093/nar/gku1083] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Indexed: 11/13/2022] [Open Access]
Abstract
We present a discriminative learning method for pattern discovery of binding sites in nucleic acid sequences based on hidden Markov models. Sets of positive and negative example sequences are mined for sequence motifs whose occurrence frequency varies between the sets. The method offers several objective functions, but we concentrate on mutual information of condition and motif occurrence. We perform a systematic comparison of our method and numerous published motif-finding tools. Our method achieves the highest motif discovery performance, while being faster than most published methods. We present case studies of data from various technologies, including ChIP-Seq, RIP-Chip and PAR-CLIP, of embryonic stem cell transcription factors and of RNA-binding proteins, demonstrating the practicality and utility of the method. For the alternative splicing factor RBM10, our analysis finds motifs known to be splicing-relevant. The motif discovery method is implemented in the free software package Discrover. It is applicable to genome- and transcriptome-scale data, makes use of available repeat experiments and, aside from binary contrasts, can also utilize more complex data configurations.
Affiliation(s)
- Jonas Maaskola
- Laboratory for Systems Biology of Gene Regulatory Elements, Max-Delbrück-Center for Molecular Medicine, Robert-Rössle-Strasse 10, Berlin-Buch 13125, Germany
| | - Nikolaus Rajewsky
- Laboratory for Systems Biology of Gene Regulatory Elements, Max-Delbrück-Center for Molecular Medicine, Robert-Rössle-Strasse 10, Berlin-Buch 13125, Germany
|
21
|
Kamath U, De Jong K, Shehu A. Effective automated feature construction and selection for classification of biological sequences. PLoS One 2014; 9:e99982. [PMID: 25033270 PMCID: PMC4102475 DOI: 10.1371/journal.pone.0099982] [Citation(s) in RCA: 45] [Impact Index Per Article: 4.1] [Received: 09/12/2013] [Accepted: 05/21/2014] [Indexed: 11/25/2022] [Open Access]
Abstract
BACKGROUND Many open problems in bioinformatics involve elucidating underlying functional signals in biological sequences. DNA sequences, in particular, are characterized by rich architectures in which functional signals are increasingly found to combine local and distal interactions at the nucleotide level. Problems of interest include detection of regulatory regions, splice sites, exons, hypersensitive sites, and more. These problems naturally lend themselves to formulation as classification problems in machine learning. When classification is based on features extracted from the sequences under investigation, success is critically dependent on the chosen set of features. METHODOLOGY We present an algorithmic framework (EFFECT) for automated detection of functional signals in biological sequences. We focus here on classification problems involving DNA sequences, which state-of-the-art work in machine learning shows to be challenging and to involve complex combinations of local and distal features. EFFECT uses a two-stage process to first construct a set of candidate sequence-based features and then select the most effective subset for the classification task at hand. Both stages make heavy use of evolutionary algorithms to efficiently guide the search towards informative features capable of discriminating between sequences that contain a particular functional signal and those that do not. RESULTS To demonstrate its generality, EFFECT is applied to three separate problems of importance in DNA research: the recognition of hypersensitive sites, splice sites, and ALU sites. Comparisons with state-of-the-art algorithms show that the framework is both general and powerful.
In addition, a detailed analysis of the constructed features shows that they contain valuable biological information about DNA architecture, allowing biologists and other researchers to directly inspect the features and potentially use the insights obtained to assist wet-laboratory studies on retainment or modification of a specific signal. Code, documentation, and all data for the applications presented here are provided for the community at http://www.cs.gmu.edu/~ashehu/?q=OurTools.
Affiliation(s)
- Uday Kamath
- Computer Science, George Mason University, Fairfax, Virginia, United States of America
| | - Kenneth De Jong
- Computer Science, George Mason University, Fairfax, Virginia, United States of America
- Krasnow Institute, George Mason University, Fairfax, Virginia, United States of America
| | - Amarda Shehu
- Computer Science, George Mason University, Fairfax, Virginia, United States of America
- Bioengineering, George Mason University, Fairfax, Virginia, United States of America
- School of Systems Biology, George Mason University, Fairfax, Virginia, United States of America
|
22
|
Ohmiya H, Vitezic M, Frith MC, Itoh M, Carninci P, Forrest ARR, Hayashizaki Y, Lassmann T. RECLU: a pipeline to discover reproducible transcriptional start sites and their alternative regulation using capped analysis of gene expression (CAGE). BMC Genomics 2014; 15:269. [PMID: 24779366 PMCID: PMC4029093 DOI: 10.1186/1471-2164-15-269] [Citation(s) in RCA: 50] [Impact Index Per Article: 4.5] [Received: 01/16/2014] [Accepted: 04/04/2014] [Indexed: 01/25/2023] [Open Access]
Abstract
BACKGROUND Next-generation sequencing-based technologies are being extensively used to study transcriptomes. Among these, cap analysis of gene expression (CAGE) is specialized in detecting the most 5' ends of RNA molecules. After mapping the sequenced reads back to a reference genome, CAGE data highlight the transcriptional start sites (TSSs) and their usage at single-nucleotide resolution. RESULTS We propose a pipeline to group the single-nucleotide TSSs into larger reproducible peaks and compare their usage across biological states. Importantly, our pipeline discovers broad peaks as well as the fine structure of individual transcriptional start sites embedded within them. We assess the performance of our approach on a large CAGE dataset including 156 primary cell types and two cell lines with biological replicas. We demonstrate that genes have complicated structures of transcription initiation events. In particular, we discover that narrow peaks embedded in broader regions of transcriptional activity can be differentially used even if the larger region is not. CONCLUSIONS By examining the reproducible fine-scale organization of TSSs we can detect many differentially regulated peaks undetected by previous approaches.
Affiliation(s)
- Hiroko Ohmiya
- RIKEN Center for Life Science Technologies (CLST), Division of Genomic Technologies, RIKEN Yokohama Institute, 1-7-22 Suehiro-cho, Tsurumi-ku, 230-0045 Yokohama, Japan
- RIKEN Advanced Center for Computing and Communication, Preventive Medicine and Applied Genomics Unit, 1-7-22 Suehiro-cho, Tsurumi-ku, 230-0045 Yokohama, Japan
| | - Morana Vitezic
- RIKEN Center for Life Science Technologies (CLST), Division of Genomic Technologies, RIKEN Yokohama Institute, 1-7-22 Suehiro-cho, Tsurumi-ku, 230-0045 Yokohama, Japan
- Department of Cell and Molecular Biology (CMB), Karolinska Institute, SE-171 77 Stockholm, Sweden
- Bioinformatics Centre, Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, DK-2200 Copenhagen N, Denmark
| | - Martin C Frith
- Sequence Analysis Team, Computational Biology Research Center, National Institute of Advanced Industrial Science and Technology (AIST), 2-4-7 Aomi, Koto-ku, 135-0064 Tokyo, Japan
| | - Masayoshi Itoh
- RIKEN Center for Life Science Technologies (CLST), Division of Genomic Technologies, RIKEN Yokohama Institute, 1-7-22 Suehiro-cho, Tsurumi-ku, 230-0045 Yokohama, Japan
- RIKEN Preventive Medicine and Diagnosis Innovation Program (PMI), RIKEN Yokohama Institute, 1-7-22 Suehiro-cho, Tsurumi-ku, 230-0045 Yokohama, Japan
| | - Piero Carninci
- RIKEN Center for Life Science Technologies (CLST), Division of Genomic Technologies, RIKEN Yokohama Institute, 1-7-22 Suehiro-cho, Tsurumi-ku, 230-0045 Yokohama, Japan
| | - Alistair RR Forrest
- RIKEN Center for Life Science Technologies (CLST), Division of Genomic Technologies, RIKEN Yokohama Institute, 1-7-22 Suehiro-cho, Tsurumi-ku, 230-0045 Yokohama, Japan
| | - Yoshihide Hayashizaki
- RIKEN Preventive Medicine and Diagnosis Innovation Program (PMI), RIKEN Yokohama Institute, 1-7-22 Suehiro-cho, Tsurumi-ku, 230-0045 Yokohama, Japan
| | - Timo Lassmann
- RIKEN Center for Life Science Technologies (CLST), Division of Genomic Technologies, RIKEN Yokohama Institute, 1-7-22 Suehiro-cho, Tsurumi-ku, 230-0045 Yokohama, Japan
|
23
|
Computational prediction of transcription factor binding sites based on an integrative approach incorporating genomic and epigenomic features. Genes Genomics 2014. [DOI: 10.1007/s13258-013-0136-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]
|
24
|
Eggeling R, Gohr A, Keilwagen J, Mohr M, Posch S, Smith AD, Grosse I. On the value of intra-motif dependencies of human insulator protein CTCF. PLoS One 2014; 9:e85629. [PMID: 24465627 PMCID: PMC3899044 DOI: 10.1371/journal.pone.0085629] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Received: 08/09/2013] [Accepted: 12/05/2013] [Indexed: 01/08/2023]
Abstract
The binding affinity of DNA-binding proteins such as transcription factors is mainly determined by the base composition of the corresponding binding site on the DNA strand. Most proteins do not bind only a single sequence, but rather a set of sequences, which may be modeled by a sequence motif. Algorithms for de novo motif discovery differ in their promoter models, learning approaches, and other aspects, but typically use the statistically simple position weight matrix model for the motif, which assumes statistical independence among all nucleotides. However, there is no clear justification for that assumption, leading to an ongoing debate about the importance of modeling dependencies between nucleotides within binding sites. In the past, modeling statistical dependencies within binding sites has been hampered by the problem of limited data. With the rise of high-throughput technologies such as ChIP-seq, this situation has now changed, making it possible to make use of statistical dependencies effectively. In this work, we investigate the presence of statistical dependencies in binding sites of the human enhancer-blocking insulator protein CTCF by using the recently developed model class of inhomogeneous parsimonious Markov models, which is capable of modeling complex dependencies while avoiding overfitting. These findings lead to a more detailed characterization of the CTCF binding motif, which is only poorly represented by independent nucleotide frequencies at several positions, predominantly at the 3' end.
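The independence assumption discussed in this abstract can be made concrete with a small sketch of a position weight matrix. The example below is illustrative only: the function names, the pseudocount, the uniform background, and the toy sites are assumptions and are not taken from the paper. It shows that a PWM score is a plain sum of per-position terms, which is exactly why such a model cannot capture dependencies between positions.

```python
import math

BASES = "ACGT"

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PWM from aligned sites of equal length.

    Each column is estimated independently of all other columns,
    which is the independence assumption of the PWM model.
    """
    length = len(sites[0])
    pwm = []
    for pos in range(length):
        counts = {base: pseudocount for base in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2((counts[b] / total) / background) for b in BASES})
    return pwm

def score(pwm, seq):
    """Score a candidate site as the sum of per-position log-odds."""
    return sum(column[base] for column, base in zip(pwm, seq))

# Toy aligned sites (hypothetical, not real CTCF binding sites).
toy_sites = ["CCGCG", "CCACG", "CCGCG", "CCTCG"]
toy_pwm = build_pwm(toy_sites)
```

Because the score is additive, swapping a base at one position changes the total by the same amount regardless of the bases at other positions; models with intra-motif dependencies relax this restriction.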
Affiliation(s)
- Ralf Eggeling
- Institute of Computer Science, Martin Luther University Halle–Wittenberg, Halle/Saale, Germany
| | - André Gohr
- Institute of Computer Science, Martin Luther University Halle–Wittenberg, Halle/Saale, Germany
| | - Jens Keilwagen
- Institute for Biosafety in Plant Biotechnology, Julius Kühn-Institut (JKI) - Federal Research Centre for Cultivated Plants, Quedlinburg, Germany
- Department of Genebank, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Seeland OT Gatersleben, Germany
| | - Michaela Mohr
- Department of Genebank, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Seeland OT Gatersleben, Germany
| | - Stefan Posch
- Institute of Computer Science, Martin Luther University Halle–Wittenberg, Halle/Saale, Germany
| | - Andrew D. Smith
- Molecular and Computational Biology, University of Southern California, Los Angeles, United States of America
| | - Ivo Grosse
- Institute of Computer Science, Martin Luther University Halle–Wittenberg, Halle/Saale, Germany
- Department of Genebank, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Seeland OT Gatersleben, Germany
- German Center of Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Leipzig, Germany
|
25
|
Zhang Z, Chang CW, Hugo W, Cheung E, Sung WK. Simultaneously learning DNA motif along with its position and sequence rank preferences through expectation maximization algorithm. J Comput Biol 2013; 20:237-48. [PMID: 23461573 DOI: 10.1089/cmb.2012.0233] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Indexed: 01/19/2023]
Abstract
Although de novo motifs can be discovered through mining over-represented sequence patterns, this approach misses some real motifs and generates many false positives. To improve accuracy, one solution is to consider some additional binding features (i.e., position preference and sequence rank preference). This information is usually required from the user. This article presents a de novo motif discovery algorithm called SEME (sampling with expectation maximization for motif elicitation), which uses a pure probabilistic mixture model to model the motif's binding features and uses expectation maximization (EM) algorithms to simultaneously learn the sequence motif, position, and sequence rank preferences without asking for any prior knowledge from the user. SEME is both efficient and accurate thanks to two important techniques: the variable motif length extension and importance sampling. Using 75 large-scale synthetic datasets, 32 metazoan compendium benchmark datasets, and 164 chromatin immunoprecipitation sequencing (ChIP-Seq) libraries, we demonstrated the superior performance of SEME over existing programs in finding transcription factor (TF) binding sites. SEME is further applied to a more difficult problem of finding the co-regulated TF (coTF) motifs in 15 ChIP-Seq libraries. It identified significantly more correct coTF motifs and, at the same time, predicted coTF motifs with better matching to the known motifs. Finally, we show that the learned position and sequence rank preferences of each coTF reveal potential interaction mechanisms between the primary TF and the coTF within these sites. Some of these findings were further validated by the ChIP-Seq experiments of the coTFs. The application is available online.
Affiliation(s)
- ZhiZhuo Zhang
- National University of Singapore, Singapore, Singapore
|
26
|
Grau J, Posch S, Grosse I, Keilwagen J. A general approach for discriminative de novo motif discovery from high-throughput data. Nucleic Acids Res 2013; 41:e197. [PMID: 24057214 PMCID: PMC3834837 DOI: 10.1093/nar/gkt831] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.0] [Indexed: 12/21/2022]
Abstract
De novo motif discovery has been an important challenge of bioinformatics for the past two decades. Since the emergence of high-throughput techniques like ChIP-seq, ChIP-exo and protein-binding microarrays (PBMs), the focus of de novo motif discovery has shifted to runtime and accuracy on large data sets. For this purpose, specialized algorithms have been designed for discovering motifs in ChIP-seq or PBM data. However, none of the existing approaches work perfectly for all three high-throughput techniques. In this article, we propose Dimont, a general approach for fast and accurate de novo motif discovery from high-throughput data. We demonstrate that Dimont yields a higher number of correct motifs from ChIP-seq data than any of the specialized approaches and achieves a higher accuracy for predicting PBM intensities from probe sequence than any of the approaches specifically designed for that purpose. Dimont also reports the expected motifs for several ChIP-exo data sets. Investigating differences between in vitro and in vivo binding, we find that for most transcription factors, the motifs discovered by Dimont are in good accordance between techniques, but we also find notable exceptions. We also observe that modeling intra-motif dependencies may increase accuracy, which indicates that more complex motif models are a worthwhile field of research.
Affiliation(s)
- Jan Grau
- Institute of Computer Science, Martin Luther University Halle-Wittenberg, D-06099 Halle, Saale, Germany, Institute for Biosafety in Plant Biotechnology, Julius Kühn-Institut (JKI) - Federal Research Centre for Cultivated Plants, D-06484 Quedlinburg, Germany and Department of Molecular Genetics, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), D-06466 Seeland OT Gatersleben, Germany
|
27
|
Takahashi M, Kamei Y, Ehara T, Yuan X, Suganami T, Takai-Igarashi T, Hatada I, Ogawa Y. Analysis of DNA methylation change induced by Dnmt3b in mouse hepatocytes. Biochem Biophys Res Commun 2013; 434:873-8. [PMID: 23611774 DOI: 10.1016/j.bbrc.2013.04.041] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Received: 03/30/2013] [Accepted: 04/07/2013] [Indexed: 01/25/2023]
Abstract
DNA methylation is a key epigenetic contributor to gene regulation in mammals. We have recently found that in the mouse liver, the promoter region of glycerol-3-phosphate acyltransferase 1, a rate-limiting enzyme of de novo lipogenesis, is regulated by DNA methylation, which is mediated by Dnmt3b, an enzyme required for the initiation of de novo methylation. In this study, using primary cultures of mouse hepatocytes with adenoviral overexpression of Dnmt3b, we characterized Dnmt3b-dependent DNA methylation on a genome-wide basis. A genome-wide DNA methylation analysis, called microarray-based integrated analysis of methylation by isoschizomers, identified 108 genes with Dnmt3b-dependent DNA methylation. In DNA expression array analysis, expression of some genes with Dnmt3b-dependent DNA methylation was suppressed. Studies with primary mouse hepatocytes overexpressing Dnmt3b or Dnmt3a revealed that many genes with Dnmt3b-dependent methylation are not methylated by Dnmt3a, whereas those methylated by Dnmt3a are mostly methylated by Dnmt3b. Bioinformatic analysis showed that the CANAGCTG and CCGGWNCSC (N denotes A, T, G, or C; W denotes A or T; and S denotes C or G) sequences are enriched in genes methylated by overexpression of Dnmt3b and Dnmt3a, respectively. We also observed a large number of genes with Dnmt3b-dependent DNA methylation in primary cultures of mouse hepatocytes with adenoviral overexpression of Dnmt3b, suggesting that Dnmt3b is an important DNA methyltransferase in primary mouse hepatocytes, targets specific genes, and potentially plays a role in vivo.
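The degenerate consensus sequences quoted in this abstract (CANAGCTG and CCGGWNCSC) can be located in a sequence by expanding the ambiguity codes into a regular expression. The following is a minimal sketch under stated assumptions: the helper name `find_sites` and the test sequences are hypothetical, only the forward strand is scanned, and only the three ambiguity codes defined in the abstract (N, W, S) are supported.

```python
import re

# Ambiguity codes as defined in the abstract: N = A/T/G/C, W = A/T, S = C/G.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "N": "[ACGT]", "W": "[AT]", "S": "[CG]",
}

def find_sites(motif, sequence):
    """Return 0-based start positions of all (possibly overlapping) matches.

    A zero-width lookahead is used so that overlapping occurrences
    are reported rather than skipped.
    """
    body = "".join(IUPAC[code] for code in motif)
    pattern = re.compile("(?=" + body + ")")
    return [match.start() for match in pattern.finditer(sequence.upper())]

# Hypothetical sequences, for illustration only.
dnmt3b_hits = find_sites("CANAGCTG", "TTCATAGCTGAA")
dnmt3a_hits = find_sites("CCGGWNCSC", "ACCGGATCGCT")
```

An enrichment analysis of the kind the abstract describes would additionally need a background set and a statistical test; this sketch covers only the matching step.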
Affiliation(s)
- Mayumi Takahashi
- Department of Organ Network and Metabolism, Graduate School of Medical and Dental Sciences, Tokyo Medical and Dental University, 1-5-45 Yushima, Tokyo 136-8510, Japan.
|
28
|
Hosseini P, Ovcharenko I, Matthews BF. Using an ensemble of statistical metrics to quantify large sets of plant transcription factor binding sites. PLANT METHODS 2013; 9:12. [PMID: 23578135 PMCID: PMC3639912 DOI: 10.1186/1746-4811-9-12] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2012] [Accepted: 03/28/2013] [Indexed: 05/07/2023]
Abstract
BACKGROUND From initial seed germination through reproduction, plants continuously reprogram their transcriptional repertoire to facilitate growth and development. This dynamic is mediated by a diverse but inextricably linked catalog of regulatory proteins called transcription factors (TFs). Statistically quantifying TF binding site (TFBS) abundance in promoters of differentially expressed genes can be used to identify binding site patterns in promoters that are closely related to stress response. Output from today's transcriptomic assays necessitates statistically oriented software to handle large promoter-sequence sets in a computationally tractable fashion. RESULTS We present Marina, an open-source software tool for identifying over-represented TFBSs among large sets of promoter sequences, using an ensemble of 7 statistical metrics and binding-site profiles. Through software comparison, we show that Marina can identify considerably more over-represented plant TFBSs than a popular software alternative. CONCLUSIONS Marina was used to identify over-represented TFBSs in a two time-point RNA-Seq study exploring the transcriptomic interplay between soybean (Glycine max) and soybean rust (Phakopsora pachyrhizi). Marina identified numerous abundant TFBSs recognized by transcription factors that are associated with defense response, such as WRKY, HY5 and MYB2. Comparing results from Marina to those of a popular software alternative suggests that regardless of the number of promoter sequences, Marina is able to identify significantly more over-represented TFBSs.
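The abstract does not name Marina's seven metrics, so the sketch below substitutes one standard over-representation statistic, the hypergeometric upper tail, purely as an illustration of the kind of quantity such a tool computes. The function name and the toy numbers are assumptions and should not be read as Marina's implementation.

```python
from math import comb

def hypergeom_upper_tail(hits_in_set, population, hits_in_population, set_size):
    """P(X >= hits_in_set) for a hypergeometric draw.

    population          total number of promoters
    hits_in_population  promoters in the population containing the TFBS
    set_size            number of promoters in the differentially expressed set
    hits_in_set         promoters in that set containing the TFBS
    """
    upper = min(hits_in_population, set_size)
    total = comb(population, set_size)
    favourable = sum(
        comb(hits_in_population, i)
        * comb(population - hits_in_population, set_size - i)
        for i in range(hits_in_set, upper + 1)
    )
    return favourable / total

# Toy numbers: 10 promoters, 3 contain the site, 4 are differentially
# expressed, and all 3 site-containing promoters are among those 4.
toy_p = hypergeom_upper_tail(3, 10, 3, 4)
```

A small tail probability indicates that the binding site occurs in the gene set more often than random sampling from the population would suggest; with many TFBS profiles tested, a multiple-testing correction would also be needed.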
Affiliation(s)
- Parsa Hosseini
- Department of Bioinformatics and Computational Biology, George Mason University, Manassas, Virginia, USA
- Computational Biology Branch, National Center for Biotechnology Information, National Institutes of Health, Bethesda, Maryland, USA
- Soybean Genomics and Improvement Laboratory, United States Department of Agriculture, Beltsville, Maryland, USA
| | - Ivan Ovcharenko
- Computational Biology Branch, National Center for Biotechnology Information, National Institutes of Health, Bethesda, Maryland, USA
| | - Benjamin F Matthews
- Soybean Genomics and Improvement Laboratory, United States Department of Agriculture, Beltsville, Maryland, USA
|
29
|
Grau J, Keilwagen J, Gohr A, Paponov IA, Posch S, Seifert M, Strickert M, Grosse I. Dispom: a discriminative de-novo motif discovery tool based on the Jstacs library. J Bioinform Comput Biol 2013; 11:1340006. [DOI: 10.1142/s0219720013400064] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/18/2022]
Abstract
DNA-binding proteins are a main component of gene regulation as they activate or repress gene expression by binding to specific binding sites in target regions of genomic DNA. However, de-novo discovery of these binding sites in target regions obtained by wet-lab experiments is a challenging problem in computational biology, which has not yet been solved satisfactorily. Here, we present a detailed description and analysis of the de-novo motif discovery tool Dispom, which has been developed for finding binding sites of DNA-binding proteins that are differentially abundant in a set of target regions compared to a set of control regions. Two additional features of Dispom are its capability of modeling positional preferences of binding sites and adjusting the length of the motif in the learning process. Dispom yields an increased prediction accuracy compared to existing tools for de-novo motif discovery, suggesting that the combination of searching for differentially abundant motifs, inferring their positional distributions, and adjusting the motif lengths is beneficial for de-novo motif discovery. When applying Dispom to promoters of auxin-responsive genes and those of ABI3 target genes from Arabidopsis thaliana, we identify relevant binding motifs with pronounced positional distributions. These results suggest that learning motifs, their positional distributions, and their lengths by a discriminative learning principle may aid motif discovery from ChIP-chip and gene expression data. We make Dispom freely available as part of Jstacs, an open-source Java library that is tailored to statistical sequence analysis. To facilitate extensions of Dispom, we describe its implementation using Jstacs in this manuscript. In addition, we provide a stand-alone application of Dispom at http://www.jstacs.de/index.php/Dispom for instant use.
Affiliation(s)
- Jan Grau
- Institute of Computer Science, Martin Luther University Halle–Wittenberg, D-06099 Halle/Saale, Germany
| | - Jens Keilwagen
- Molecular Genetics, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), D-06466 Gatersleben, Germany
- Institute for Biosafety in Plant Biotechnology, Julius Kühn-Institut (JKI) - Federal Research Centre for Cultivated Plants, D-06484 Quedlinburg, Germany
| | - André Gohr
- Institute of Computer Science, Martin Luther University Halle–Wittenberg, D-06099 Halle/Saale, Germany
| | - Ivan A. Paponov
- Institute of Biology II / Botany, Faculty of Biology, Albert–Ludwigs–University Freiburg, D-79104 Freiburg, Germany
| | - Stefan Posch
- Institute of Computer Science, Martin Luther University Halle–Wittenberg, D-06099 Halle/Saale, Germany
| | - Michael Seifert
- Molecular Genetics, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), D-06466 Gatersleben, Germany
| | - Marc Strickert
- Center for Synthetic Microbiology, SYNMIKRO, Philipps-Universität Marburg, Germany
| | - Ivo Grosse
- Institute of Computer Science, Martin Luther University Halle–Wittenberg, D-06099 Halle/Saale, Germany
|
30
|
Weirauch MT, Cote A, Norel R, Annala M, Zhao Y, Riley TR, Saez-Rodriguez J, Cokelaer T, Vedenko A, Talukder S, DREAM5 consortium, Bussemaker HJ, Morris QD, Bulyk ML, Stolovitzky G, Hughes TR. Evaluation of methods for modeling transcription factor sequence specificity. Nat Biotechnol 2013; 31:126-34. [PMID: 23354101 PMCID: PMC3687085 DOI: 10.1038/nbt.2486] [Citation(s) in RCA: 275] [Impact Index Per Article: 22.9] [Received: 07/23/2012] [Accepted: 12/18/2012] [Indexed: 12/21/2022]
Abstract
Genomic analyses often involve scanning for potential transcription factor (TF) binding sites using models of the sequence specificity of DNA binding proteins. Many approaches have been developed to model and learn a protein's DNA-binding specificity, but these methods have not been systematically compared. Here we applied 26 such approaches to in vitro protein binding microarray data for 66 mouse TFs belonging to various families. For nine TFs, we also scored the resulting motif models on in vivo data, and found that the best in vitro-derived motifs performed similarly to motifs derived from the in vivo data. Our results indicate that simple models based on mononucleotide position weight matrices trained by the best methods perform similarly to more complex models for most TFs examined, but fall short in specific cases (<10% of the TFs examined here). In addition, the best-performing motifs typically have relatively low information content, consistent with widespread degeneracy in eukaryotic TF sequence preferences.
Affiliation(s)
- Matthew T. Weirauch
- Banting and Best Department of Medical Research and Donnelly Centre, University of Toronto, Toronto, ON, Canada
- Center for Autoimmune Genomics and Etiology (CAGE) and Divisions of Rheumatology and Biomedical Informatics, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, USA
| | - Atina Cote
- Banting and Best Department of Medical Research and Donnelly Centre, University of Toronto, Toronto, ON, Canada
| | - Raquel Norel
- IBM Computational Biology Center, Yorktown Heights, New York, NY, USA
| | - Matti Annala
- Department of Signal Processing, Tampere University of Technology, Tampere, Finland
| | - Yue Zhao
- Department of Genetics, University of Pennsylvania, Philadelphia, PA, USA
| | - Todd R. Riley
- Department of Biological Sciences, Columbia University, and Center for Computational Biology and Bioinformatics, Columbia University Medical Center, New York, NY
| | | | | | - Anastasia Vedenko
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
| | - Shaheynoor Talukder
- Banting and Best Department of Medical Research and Donnelly Centre, University of Toronto, Toronto, ON, Canada
| | | | - Harmen J. Bussemaker
- Department of Biological Sciences, Columbia University, and Center for Computational Biology and Bioinformatics, Columbia University Medical Center, New York, NY
| | - Quaid D. Morris
- Banting and Best Department of Medical Research and Donnelly Centre, University of Toronto, Toronto, ON, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
| | - Martha L. Bulyk
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Department of Pathology, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Harvard-MIT Division of Health Sciences and Technology, Harvard Medical School, Boston, MA, USA
| | | | - Timothy R. Hughes
- Banting and Best Department of Medical Research and Donnelly Centre, University of Toronto, Toronto, ON, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
|
31
|
Mehdi AM, Sehgal MSB, Kobe B, Bailey TL, Bodén M. DLocalMotif: a discriminative approach for discovering local motifs in protein sequences. Bioinformatics 2012; 29:39-46. [PMID: 23142965 DOI: 10.1093/bioinformatics/bts654] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Indexed: 11/12/2022]
Abstract
MOTIVATION Local motifs are patterns of DNA or protein sequences that occur within a sequence interval relative to a biologically defined anchor or landmark. Current protein motif discovery methods do not adequately consider such constraints to identify biologically significant motifs that are only weakly over-represented but spatially confined. Using negatives, i.e. sequences known to not contain a local motif, can further increase the specificity of their discovery. RESULTS This article introduces the method DLocalMotif that makes use of positional information and negative data for local motif discovery in protein sequences. DLocalMotif combines three scoring functions, measuring degrees of motif over-representation, entropy and spatial confinement, specifically designed to discriminatively exploit the availability of negative data. The method is shown to outperform current methods that use only a subset of these motif characteristics. We apply the method to several biological datasets. The analysis of peroxisomal targeting signals uncovers several novel motifs that occur immediately upstream of the dominant peroxisomal targeting signal-1 signal. The analysis of proline-tyrosine nuclear localization signals uncovers multiple novel motifs that overlap with C2H2 zinc finger domains. We also evaluate the method on classical nuclear localization signals and endoplasmic reticulum retention signals and find that DLocalMotif successfully recovers biologically relevant sequence properties. AVAILABILITY http://bioinf.scmb.uq.edu.au/dlocalmotif/
Affiliation(s)
- Ahmed M Mehdi
- Institute for Molecular Bioscience, The University of Queensland, Australia
|
32
|
Hartmann H, Guthöhrlein EW, Siebert M, Luehr S, Söding J. P-value-based regulatory motif discovery using positional weight matrices. Genome Res 2012; 23:181-94. [PMID: 22990209 PMCID: PMC3530678 DOI: 10.1101/gr.139881.112] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.6] [Indexed: 11/25/2022]
Abstract
To analyze gene regulatory networks, the sequence-dependent DNA/RNA binding affinities of proteins and noncoding RNAs are crucial. Often, these are deduced from sets of sequences enriched in factor binding sites. Two classes of computational approaches exist. The first class describes binding motifs by sequence patterns and searches for the patterns with the highest statistical significance of enrichment. The second class uses the more powerful position weight matrices (PWMs). Instead of maximizing the statistical significance of enrichment, methods in this class maximize a likelihood. Here we present XXmotif (eXhaustive evaluation of matriX motifs), the first PWM-based motif discovery method that can optimize PWMs by directly minimizing their P-values of enrichment. Optimization requires computing millions of enrichment P-values for thousands of PWMs. For a given PWM, the enrichment P-value is calculated efficiently from the match P-values of all possible motif placements in the input sequences using order statistics. The approach can naturally combine P-values for motif enrichment, conservation, and localization. On ChIP-chip/seq, miRNA knock-down, and coexpression data sets from yeast and metazoans, XXmotif outperformed state-of-the-art tools, both in numbers of correctly identified motifs and in the quality of PWMs. In segmentation modules of D. melanogaster, we detect the known key regulators and several new motifs. In human core promoters, XXmotif reports most previously described and eight novel motifs sharply peaked around the transcription start site, among them an Initiator motif similar to the fly and yeast versions. XXmotif's sensitivity, reliability, and usability will help to leverage the quickly accumulating wealth of functional genomics data.
Affiliation(s)
- Holger Hartmann
- Gene Center and Department of Biochemistry, Ludwig-Maximilians-Universität München, Feodor-Lynen-Straße 25, 81377 Munich, Germany
|
33
|
Abstract
Transcription factors and the short, often degenerate DNA sequences they recognize are central regulators of gene expression, but their regulatory code is challenging to dissect experimentally. Thus, computational approaches have long been used to identify putative regulatory elements from the patterns in promoter sequences. Here we present a new algorithm “POWRS” (POsition-sensitive WoRd Set) for identifying regulatory sequence motifs, specifically developed to address two common shortcomings of existing algorithms. First, POWRS uses the position-specific enrichment of regulatory elements near transcription start sites to significantly increase sensitivity, while providing new information about the preferred localization of those elements. Second, POWRS forgoes position weight matrices for a discrete motif representation that appears more resistant to over-generalization. We apply this algorithm to discover sequences related to constitutive, high-level gene expression in the model plant Arabidopsis thaliana, and then experimentally validate the importance of those elements by systematically mutating two endogenous promoters and measuring the effect on gene expression levels. This provides a foundation for future efforts to rationally engineer gene expression in plants, a problem of great importance in developing biotech crop varieties. Availability: BSD-licensed Python code at http://grassrootsbio.com/papers/powrs/.
|
34
|
Mönke G, Seifert M, Keilwagen J, Mohr M, Grosse I, Hähnel U, Junker A, Weisshaar B, Conrad U, Bäumlein H, Altschmied L. Toward the identification and regulation of the Arabidopsis thaliana ABI3 regulon. Nucleic Acids Res 2012; 40:8240-54. [PMID: 22730287 PMCID: PMC3458547 DOI: 10.1093/nar/gks594] [Citation(s) in RCA: 134] [Impact Index Per Article: 10.3] [Indexed: 12/11/2022]
Abstract
The plant-specific, B3 domain-containing transcription factor ABSCISIC ACID INSENSITIVE3 (ABI3) is an essential component of the regulatory network controlling the development and maturation of the Arabidopsis thaliana seed. Genome-wide chromatin immunoprecipitation (ChIP-chip), transcriptome analysis, quantitative reverse transcriptase–polymerase chain reaction and a transient promoter activation assay have been combined to identify a set of 98 ABI3 target genes. Most of these presumptive ABI3 targets require the presence of abscisic acid for their activation and are specifically expressed during seed maturation. ABI3 target promoters are enriched for G-box-like and RY-like elements. The general occurrence of these cis motifs in non-ABI3 target promoters suggests the existence of as yet unidentified regulatory signals, some of which may be associated with epigenetic control. Several members of the ABI3 regulon are also regulated by other transcription factors, including the seed-specific, B3 domain-containing FUS3 and LEC2. The data strengthen and extend the notion that ABI3 is essential for the protection of embryonic structures from desiccation and raise pertinent questions regarding the specificity of promoter recognition.
Affiliation(s)
- Gudrun Mönke
- Department of Molecular Genetics, Leibniz-Institute of Plant Genetics and Crop Plant Research (IPK) Corrensstr. 3, D-06466 Gatersleben, Germany
|
35
|
Ma X, Kulkarni A, Zhang Z, Xuan Z, Serfling R, Zhang MQ. A highly efficient and effective motif discovery method for ChIP-seq/ChIP-chip data using positional information. Nucleic Acids Res 2012; 40:e50. [PMID: 22228832 PMCID: PMC3326300 DOI: 10.1093/nar/gkr1135] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.0] [Indexed: 01/25/2023]
Abstract
Identification of DNA motifs from ChIP-seq/ChIP-chip [chromatin immunoprecipitation (ChIP)] data is a powerful method for understanding the transcriptional regulatory network. However, most established methods are designed for small sample sizes and are inefficient for ChIP data. Here we propose a new k-mer occurrence model to reflect the fact that functional DNA k-mers often cluster around ChIP peak summits. With this model, we introduce a new measure to discover functional k-mers. Using simulation, we demonstrate that our method is more robust against noise in ChIP data than available methods. A novel word clustering method is also implemented to group similar k-mers into position weight matrices (PWMs). Our method was applied to a diverse set of ChIP experiments to demonstrate its high sensitivity and specificity. Importantly, our method is much faster than several other methods for large sample sizes. Thus, we have developed an efficient and effective motif discovery method for ChIP experiments.
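The observation that functional k-mers cluster around ChIP peak summits can be illustrated with a simple positional summary. The sketch below is not the occurrence model or the measure proposed in the paper, which the abstract does not specify; it only records, for each k-mer, how far its occurrences lie from the summit. The function names and the toy peaks are hypothetical.

```python
from collections import defaultdict

def kmer_summit_distances(peaks, k):
    """Map each k-mer to the distances of its occurrences from the summit.

    `peaks` is a list of (sequence, summit_index) pairs; the distance is
    measured from the centre of the k-mer to the summit position.
    """
    distances = defaultdict(list)
    for sequence, summit in peaks:
        for start in range(len(sequence) - k + 1):
            centre = start + (k - 1) / 2
            distances[sequence[start:start + k]].append(abs(centre - summit))
    return distances

def mean_distance(distances, kmer):
    """Average distance of a k-mer's occurrences from the summit."""
    values = distances[kmer]
    return sum(values) / len(values)

# Toy peaks (hypothetical): GATA sits at the summit, TTTT in the flanks.
toy_peaks = [("TTTTGATATTTT", 5), ("CCGATACC", 3)]
toy_distances = kmer_summit_distances(toy_peaks, 4)
```

In this toy data `GATA` has a much smaller mean distance than the flanking `TTTT`, which is the kind of contrast a summit-aware measure would exploit; a real method would also need a null model to judge significance.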
Affiliation(s)
- Xiaotu Ma
- Department of Molecular and Cell Biology, Center for Systems Biology, University of Texas at Dallas, 800 W. Campbell Road, Richardson, TX 75080, USA
|
36
|
Walcher CL, Nemhauser JL. Bipartite promoter element required for auxin response. PLANT PHYSIOLOGY 2012; 158:273-82. [PMID: 22100645 PMCID: PMC3252081 DOI: 10.1104/pp.111.187559] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.8] [Received: 09/19/2011] [Accepted: 11/16/2011] [Indexed: 05/19/2023]
Abstract
Multiple mechanisms have been described for coordination of responses to the plant hormones auxin and brassinosteroids (Zhang et al., 2009). One unexplained phenomenon is the reliance of the auxin transcriptional response on a functional brassinosteroid pathway. In this study, we used luciferase reporters to interrogate the promoter of SMALL AUXIN-UP RNA15 (SAUR15), a well-characterized auxin and brassinosteroid early response gene in Arabidopsis (Arabidopsis thaliana). After identifying a minimal region sufficient for auxin response, we targeted predicted cis-regulatory elements contained within this sequence and found a critical subset required for hormone response. Specifically, reporter sensitivity to auxin treatment required two elements: a Hormone Up at Dawn (HUD)-type E-box and an AuxRE-related TGTCT element. Reporter response to brassinosteroid treatment relied on the same two elements. Consistent with these findings, the transcription factors BRASSINOSTEROID INSENSITIVE1-EMS SUPPRESSOR1 and MONOPTEROS (MP)/AUXIN RESPONSE FACTOR5 (ARF5) showed enhanced binding to the critical promoter region containing these elements. Treatment with auxin or brassinosteroids could enhance binding of either transcription factor, and brassinosteroid enhancement of MP/ARF5 binding required an intact HUD element. Conservation of clustered HUD elements and AuxRE-related sequences in promoters of putative SAUR15 orthologs in a number of flowering plant species, in combination with evidence for statistically significant clustering of these elements across all Arabidopsis promoters, provided further evidence of the functional importance of coordinated transcription factor binding.
|
37
|
Blomster T, Salojärvi J, Sipari N, Brosché M, Ahlfors R, Keinänen M, Overmyer K, Kangasjärvi J. Apoplastic reactive oxygen species transiently decrease auxin signaling and cause stress-induced morphogenic response in Arabidopsis. PLANT PHYSIOLOGY 2011; 157:1866-83. [PMID: 22007024 PMCID: PMC3327221 DOI: 10.1104/pp.111.181883] [Citation(s) in RCA: 122] [Impact Index Per Article: 8.7] [Received: 06/21/2011] [Accepted: 10/15/2011] [Indexed: 05/18/2023]
Abstract
Reactive oxygen species (ROS) are ubiquitous signaling molecules in plant stress and development. To gain further insight into the plant transcriptional response to apoplastic ROS, the phytotoxic atmospheric pollutant ozone was used as a model ROS inducer in Arabidopsis (Arabidopsis thaliana) and gene expression was analyzed with microarrays. In contrast to the increase in signaling via the stress hormones salicylic acid, abscisic acid, jasmonic acid (JA), and ethylene, ROS treatment caused auxin signaling to be transiently suppressed, which was confirmed with a DR5-uidA auxin reporter construct. Transcriptomic data revealed that various aspects of auxin homeostasis and signaling were modified by apoplastic ROS. Furthermore, a detailed analysis of auxin signaling showed that transcripts of several auxin receptors and Auxin/Indole-3-Acetic Acid (Aux/IAA) transcriptional repressors were reduced in response to apoplastic ROS. The ROS-derived changes in the expression of auxin signaling genes partially overlapped with abiotic stress, pathogen responses, and salicylic acid signaling. Several mechanisms known to suppress auxin signaling during biotic stress were excluded, indicating that ROS regulated auxin responses via a novel mechanism. Using mutants defective in various auxin (axr1, nit1, aux1, tir1 afb2, iaa28-1, iaa28-2) and JA (axr1, coi1-16) responses, ROS-induced cell death was found to be regulated by JA but not by auxin. Chronic ROS treatment resulted in altered leaf morphology, a stress response known as "stress-induced morphogenic response." Altered leaf shape of tir1 afb2 suggests that auxin was a negative regulator of stress-induced morphogenic response in the rosette.
|
38
|
Abstract
Accurately predicting regulatory sequences and enhancers in entire genomes is an important but difficult problem, especially in large vertebrate genomes. With the advent of ChIP-seq technology, experimental detection of genome-wide EP300/CREBBP bound regions provides a powerful platform to develop predictive tools for regulatory sequences and to study their sequence properties. Here, we develop a support vector machine (SVM) framework which can accurately identify EP300-bound enhancers using only genomic sequence and an unbiased set of general sequence features. Moreover, we find that the predictive sequence features identified by the SVM classifier reveal biologically relevant sequence elements enriched in the enhancers, but we also identify other features that are significantly depleted in enhancers. The predictive sequence features are evolutionarily conserved and spatially clustered, providing further support of their functional significance. Although our SVM is trained on experimental data, we also predict novel enhancers and show that these putative enhancers are significantly enriched in both ChIP-seq signal and DNase I hypersensitivity signal in the mouse brain and are located near relevant genes. Finally, we present results of comparisons between other EP300/CREBBP data sets using our SVM and uncover sequence elements enriched and/or depleted in the different classes of enhancers. Many of these sequence features play a role in specifying tissue-specific or developmental-stage-specific enhancer activity, but our results indicate that some features operate in a general or tissue-independent manner. In addition to providing a high confidence list of enhancer targets for subsequent experimental investigation, these results contribute to our understanding of the general sequence structure of vertebrate enhancers.
|