1
Augustijn HE, Roseboom AM, Medema MH, van Wezel GP. Harnessing regulatory networks in Actinobacteria for natural product discovery. J Ind Microbiol Biotechnol 2024; 51:kuae011. [PMID: 38569653 PMCID: PMC10996143 DOI: 10.1093/jimb/kuae011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/22/2024] [Accepted: 04/02/2024] [Indexed: 04/05/2024]
Abstract
Microbes typically live in complex habitats where they need to rapidly adapt to continuously changing growth conditions. To do so, they produce an astonishing array of natural products with diverse structures and functions. Actinobacteria stand out for their prolific production of bioactive molecules, including antibiotics, anticancer agents, antifungals, and immunosuppressants. Attention has been directed especially towards the identification of the compounds they produce and the mining of the large diversity of biosynthetic gene clusters (BGCs) in their genomes. However, the current return on investment in random screening for bioactive compounds is low, while it is hard to predict which of the millions of BGCs should be prioritized. Moreover, many of the BGCs for yet undiscovered natural products are silent or cryptic under laboratory growth conditions. To identify ways to prioritize and activate these BGCs, knowledge regarding the way their expression is controlled is crucial. Intricate regulatory networks control global gene expression in Actinobacteria, governed by a staggering number of up to 1000 transcription factors per strain. This review highlights recent advances in experimental and computational methods for characterizing and predicting transcription factor binding sites and their applications to guide natural product discovery. We propose that regulation-guided genome mining approaches will open new avenues toward eliciting the expression of BGCs, as well as prioritizing subsets of BGCs for expression using synthetic biology approaches.
ONE-SENTENCE SUMMARY: This review provides insights into advances in experimental and computational methods aimed at predicting transcription factor binding sites and their applications to guide natural product discovery.
Affiliation(s)
- Hannah E Augustijn
  - Bioinformatics Group, Wageningen University, Wageningen, The Netherlands
  - Molecular Biotechnology, Institute of Biology, Leiden University, Leiden, The Netherlands
- Anna M Roseboom
  - Molecular Biotechnology, Institute of Biology, Leiden University, Leiden, The Netherlands
- Marnix H Medema
  - Bioinformatics Group, Wageningen University, Wageningen, The Netherlands
  - Molecular Biotechnology, Institute of Biology, Leiden University, Leiden, The Netherlands
- Gilles P van Wezel
  - Molecular Biotechnology, Institute of Biology, Leiden University, Leiden, The Netherlands
  - Netherlands Institute for Ecology (NIOO-KNAW), Wageningen, The Netherlands
2
Tsukanov AV, Mironova VV, Levitsky VG. Motif models proposing independent and interdependent impacts of nucleotides are related to high and low affinity transcription factor binding sites in Arabidopsis. Frontiers in Plant Science 2022; 13:938545. [PMID: 35968123 PMCID: PMC9373801 DOI: 10.3389/fpls.2022.938545] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/07/2022] [Accepted: 07/05/2022] [Indexed: 05/15/2023]
Abstract
The position weight matrix (PWM) is the traditional motif model for transcription factor (TF) binding sites. It assumes that positions contribute independently to TF binding affinity, although this assumption does not fit the data perfectly, which helps explain why PWM hits are missing in a substantial fraction of ChIP-seq peaks. To study various modes of direct binding of plant TFs, we compiled a benchmark collection of 111 ChIP-seq datasets for Arabidopsis thaliana and applied the traditional PWM together with two alternative motif models, BaMM and SiteGA, which account for dependencies between positions. Varying the stringency of the recognition thresholds suggested that hits of the PWM, BaMM, and SiteGA models are associated with sites of high/medium, any, and low affinity, respectively. At the medium recognition threshold, about 60% of ChIP-seq peaks contain PWM hits consisting of conserved core consensuses, while BaMM and SiteGA provide hits for an additional 15% of peaks in which a weaker core consensus is compensated through intra-motif dependencies. The presence or absence of these dependencies in the motifs of the alternative and traditional models, respectively, was confirmed with the dependency logo tool DepLogo, which visualizes the position-wise partitioning of the alignments of predicted sites. We exemplify the detailed analysis of ChIP-seq profiles for the plant TFs CCA1, MYC2, and SEP3. Gene ontology (GO) enrichment analysis revealed that, among the three motif models, SiteGA yielded the highest proportion of genes with significantly enriched GO terms among all predicted genes. Both alternative motif models extended the set of sites predicted by the traditional PWM more for the TFs MYC2 and SEP3, which have condition- or tissue-specific functions, than for CCA1, which has housekeeping functions. Overall, the combined application of standard and alternative motif models helps detect various modes of direct TF-DNA interaction in the largest possible portion of ChIP-seq loci.
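The independence assumption behind the PWM can be made concrete with a small sketch. The Python below is illustrative only and is not the pipeline used in the study: the aligned sites, the pseudocount, the uniform background, and the names `build_pwm`, `score`, and `scan` are all assumptions chosen for the example. Each position gets its own log-odds column, and a candidate site is scored by summing one entry per position, so no column can compensate for or depend on its neighbour.

```python
import math

ALPHABET = "ACGT"

# Hypothetical aligned binding sites, for illustration only.
SITES = ["CACGTG", "CACGTG", "CACGTT", "CATGTG", "CACGTG"]


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PWM: one dict of base -> log2(p / background) per position."""
    pwm = []
    for pos in range(len(sites[0])):
        counts = {base: pseudocount for base in ALPHABET}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2(counts[b] / total / background) for b in ALPHABET})
    return pwm


def score(pwm, seq):
    """Sum per-position log-odds; positions contribute independently."""
    return sum(column[base] for column, base in zip(pwm, seq))


def scan(pwm, sequence, threshold):
    """Return (start, score) for every window scoring at or above the threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window_score = score(pwm, sequence[start:start + width])
        if window_score >= threshold:
            hits.append((start, window_score))
    return hits


if __name__ == "__main__":
    pwm = build_pwm(SITES)
    for start, s in scan(pwm, "TTTCACGTGAAA", threshold=6.0):
        print(start, round(s, 2))
```

Raising or lowering `threshold` corresponds to the stringency of the recognition threshold discussed in the abstract: a stricter value keeps only windows close to the core consensus.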
Affiliation(s)
- Anton V. Tsukanov
  - Department of Systems Biology, Institute of Cytology and Genetics, Novosibirsk, Russia
- Victoria V. Mironova
  - Department of Systems Biology, Institute of Cytology and Genetics, Novosibirsk, Russia
  - Department of Plant Systems Physiology, Radboud Institute for Biological and Environmental Sciences (RIBES), Radboud University, Nijmegen, Netherlands
- Victor G. Levitsky
  - Department of Systems Biology, Institute of Cytology and Genetics, Novosibirsk, Russia
  - Department of Natural Science, Novosibirsk State University, Novosibirsk, Russia
  - *Correspondence: Victor G. Levitsky
3
Ge W, Meier M, Roth C, Söding J. Bayesian Markov models improve the prediction of binding motifs beyond first order. NAR Genom Bioinform 2021; 3:lqab026. [PMID: 33928244 PMCID: PMC8057495 DOI: 10.1093/nargab/lqab026] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/02/2020] [Revised: 03/11/2021] [Accepted: 03/30/2021] [Indexed: 12/13/2022]
Abstract
Transcription factors (TFs) regulate gene expression by binding to specific DNA motifs. Accurate models for predicting binding affinities are crucial for a quantitative understanding of transcriptional regulation. Motifs are commonly described by position weight matrices, which assume that each position contributes independently to the binding energy. Models that can learn dependencies between positions, for instance, induced by DNA structure preferences, have yielded markedly improved predictions for most TFs on in vivo data. However, they are more prone to overfit the data and to learn patterns merely correlated with rather than directly involved in TF binding. We present an improved, faster version of our Bayesian Markov model software, BaMMmotif2. We compared it with state-of-the-art motif discovery tools on a large collection of ChIP-seq and HT-SELEX datasets. BaMMmotif2 models of fifth order achieved a median false-discovery-rate-averaged recall 13.6% and 12.2% higher than the next-best tool on 427 ChIP-seq datasets and 164 HT-SELEX datasets, respectively, while being 8 to 1000 times faster. BaMMmotif2 models showed no signs of overtraining in cross-cell line and cross-platform tests, with similar improvements over the next-best tool. These results demonstrate that dependencies beyond first order clearly improve binding models for most TFs.
Affiliation(s)
- Wanwan Ge
  - Quantitative and Computational Biology, Max Planck Institute for Biophysical Chemistry, Am Fassberg 11, 37077 Göttingen, Germany
- Markus Meier
  - Quantitative and Computational Biology, Max Planck Institute for Biophysical Chemistry, Am Fassberg 11, 37077 Göttingen, Germany
- Christian Roth
  - Quantitative and Computational Biology, Max Planck Institute for Biophysical Chemistry, Am Fassberg 11, 37077 Göttingen, Germany
- Johannes Söding
  - Quantitative and Computational Biology, Max Planck Institute for Biophysical Chemistry, Am Fassberg 11, 37077 Göttingen, Germany
4
Käppel S, Eggeling R, Rümpler F, Groth M, Melzer R, Theißen G. DNA-binding properties of the MADS-domain transcription factor SEPALLATA3 and mutant variants characterized by SELEX-seq. Plant Molecular Biology 2021; 105:543-557. [PMID: 33486697 PMCID: PMC7892521 DOI: 10.1007/s11103-020-01108-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 06/02/2020] [Accepted: 12/11/2020] [Indexed: 05/13/2023]
Abstract
We studied the DNA-binding profile of the MADS-domain transcription factor SEPALLATA3 and mutant variants by SELEX-seq. DNA-binding characteristics of SEPALLATA3 mutant proteins lead us to propose a novel DNA-binding mode. MIKC-type MADS-domain proteins, which function as essential transcription factors in plant development, bind as dimers to a 10-base-pair AT-rich motif termed CArG-box. However, this consensus motif cannot fully explain how the abundant family members in flowering plants can bind different target genes in specific ways. The aim of this study was to better understand the DNA-binding specificity of MADS-domain transcription factors. Also, we wanted to understand the role of a highly conserved arginine residue for binding specificity of the MADS-domain transcription factor family. Here, we studied the DNA-binding profile of the floral homeotic MADS-domain protein SEPALLATA3 by performing SELEX followed by high-throughput sequencing (SELEX-seq). We found a diverse set of bound sequences and could estimate the in vitro binding affinities of SEPALLATA3 to a huge number of different sequences. We found evidence for the preference of AT-rich motifs as flanking sequences. Whereas different CArG-boxes can act as SEPALLATA3 binding sites, our findings suggest that the preferred flanking motifs are almost always the same and thus mostly independent of the identity of the central CArG-box motif. Analysis of SEPALLATA3 proteins with a single amino acid substitution at position 3 of the DNA-binding MADS-domain further revealed that the conserved arginine residue, which has been shown to be involved in a shape readout mechanism, is especially important for the recognition of nucleotides at positions 3 and 8 of the CArG-box motif. This leads us to propose a novel DNA-binding mode for SEPALLATA3, which is different from that of other MADS-domain proteins known.
Affiliation(s)
- Sandra Käppel
  - Matthias Schleiden Institute/Genetics, Friedrich Schiller University Jena, Philosophenweg 12, 07743, Jena, Germany
- Ralf Eggeling
  - Department of Computer Science, University of Helsinki, Pietari Kalmin katu 5, 00014, Helsinki, Finland
  - Methods in Medical Informatics, Department of Computer Science, University of Tübingen, Sand 14, 72076, Tübingen, Germany
  - Institute for Biomedical Informatics, University of Tübingen, Tübingen, Germany
- Florian Rümpler
  - Matthias Schleiden Institute/Genetics, Friedrich Schiller University Jena, Philosophenweg 12, 07743, Jena, Germany
- Marco Groth
  - Leibniz Institute on Aging-Fritz Lipmann Institute (FLI), Core Facility DNA Sequencing, Beutenbergstraße 11, 07745, Jena, Germany
- Rainer Melzer
  - School of Biology and Environmental Science and Earth Institute, University College Dublin, Belfield, Dublin 4, Ireland
- Günter Theißen
  - Matthias Schleiden Institute/Genetics, Friedrich Schiller University Jena, Philosophenweg 12, 07743, Jena, Germany
5
Han YM, Kim MS, Jo J, Shin D, Kwon SH, Seo JB, Kang D, Lee BD, Ryu H, Hwang EM, Kim JM, Patel PD, Lyons DM, Schatzberg AF, Her S. Decoding the temporal nature of brain GR activity in the NFκB signal transition leading to depressive-like behavior. Mol Psychiatry 2021; 26:5087-5096. [PMID: 33483691 PMCID: PMC7821461 DOI: 10.1038/s41380-021-01016-1] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 01/03/2020] [Revised: 11/17/2020] [Accepted: 01/05/2021] [Indexed: 01/30/2023]
Abstract
The fine-tuning of neuroinflammation is crucial for brain homeostasis as well as its immune response. The transcription factor nuclear factor-κB (NFκB) is a key inflammatory player that is antagonized via anti-inflammatory actions exerted by the glucocorticoid receptor (GR). However, technical limitations have restricted our understanding of how GR is involved in the dynamics of NFκB in vivo. In this study, we used an improved lentiviral-based reporter to elucidate the time course of NFκB and GR activities during behavioral changes from sickness to depression induced by a systemic lipopolysaccharide challenge. The trajectory of NFκB activity established a behavioral basis for the NFκB signal transition across three phases: a sickness early phase, a normal middle phase, and a depressive-like late phase. The temporal shift in brain GR activity was differentially involved in the transition of NFκB signals during the normal and depressive-like phases. The middle-phase GR effectively inhibited NFκB in a glucocorticoid-dependent manner, but the late-phase GR had no inhibitory action. Furthermore, we revealed the cryptic role of basal GR activity in the early NFκB signal transition, as evidenced by the fact that blocking GR activity with RU486 led to early depressive-like episodes through the emergence of brain NFκB activity. These results highlight the inhibitory action of GR on NFκB by the basal and activated hypothalamic-pituitary-adrenal (HPA) axis during body-to-brain inflammatory spread, providing clues about the molecular mechanisms by which systemic inflammation, such as that caused by COVID-19 infection, leads to depression.
Affiliation(s)
- Young-Min Han
  - Seoul Centre, Korea Basic Science Institute, Seoul, South Korea
- Min Sun Kim
  - Seoul Centre, Korea Basic Science Institute, Seoul, South Korea
- Juyeong Jo
  - Seoul Centre, Korea Basic Science Institute, Seoul, South Korea
- Daiha Shin
  - Seoul Centre, Korea Basic Science Institute, Seoul, South Korea
- Seung-Hae Kwon
  - Seoul Centre, Korea Basic Science Institute, Seoul, South Korea
- Jong Bok Seo
  - Seoul Centre, Korea Basic Science Institute, Seoul, South Korea
- Dongmin Kang
  - Department of Life Science, Ewha Womans University, Seoul, South Korea
- Byoung Dae Lee
  - Department of Physiology, School of Medicine, Kyung Hee University, Seoul, South Korea
- Hoon Ryu
  - Neuroscience Centre, Korea Institute of Science and Technology, Seoul, South Korea
- Eun Mi Hwang
  - Center for Functional Connectomics, Korea Institute of Science and Technology, Seoul, South Korea
- Jae-Min Kim
  - Department of Psychiatry, Chonnam National University Medical School, Seoul, South Korea
- Paresh D. Patel
  - Department of Psychiatry, Molecular and Behavioral Neuroscience Institute, University of Michigan Medical Centre, Ann Arbor, MI, USA
- David M. Lyons
  - Departments of Psychiatry, Stanford University Medical Centre, Stanford, CA, USA
- Alan F. Schatzberg
  - Departments of Psychiatry, Stanford University Medical Centre, Stanford, CA, USA
- Song Her
  - Seoul Centre, Korea Basic Science Institute, Seoul, South Korea
6
Toivonen J, Das PK, Taipale J, Ukkonen E. MODER2: first-order Markov modeling and discovery of monomeric and dimeric binding motifs. Bioinformatics 2020; 36:2690-2696. [PMID: 31999322 PMCID: PMC7203737 DOI: 10.1093/bioinformatics/btaa045] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 05/21/2019] [Revised: 12/23/2019] [Accepted: 01/23/2020] [Indexed: 12/21/2022]
Abstract
MOTIVATION: Position-specific probability matrices (PPMs, also called position-specific weight matrices) have been the dominating model for transcription factor (TF)-binding motifs in DNA. There is, however, increasing recent evidence of better performance of higher order models such as Markov models of order one, also called adjacent dinucleotide matrices (ADMs). ADMs can model dependencies between adjacent nucleotides, unlike PPMs. A modeling technique and software tool that would estimate such models simultaneously both for monomers and their dimers have been missing.
RESULTS: We present an ADM-based mixture model for monomeric and dimeric TF-binding motifs and an expectation maximization algorithm MODER2 for learning such models from training data and seeds. The model is a mixture that includes monomers and dimers, built from the monomers, with a description of the dimeric structure (spacing, orientation). The technique is modular, meaning that the co-operative effect of dimerization is made explicit by evaluating the difference between expected and observed models. The model is validated using HT-SELEX and generated datasets, and by comparing to some earlier PPM and ADM techniques. The ADM models explain data slightly better than PPM models for 314 tested TFs (or their DNA-binding domains) from four families (bHLH, bZIP, ETS and Homeodomain), the ADM mixture models by MODER2 being the best on average.
AVAILABILITY AND IMPLEMENTATION: Software implementation is available from https://github.com/jttoivon/moder2.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
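The difference between a PPM and a first-order (adjacent dinucleotide) model can be shown on a toy case. The sketch below is not the MODER2 algorithm, which fits a monomer/dimer mixture by expectation maximization; it only estimates a position-specific first-order Markov model by counting over pre-aligned sites. The function names (`train_ppm`, `train_adm`, `ppm_log_prob`, `adm_log_prob`), the pseudocount, and the toy sites are all assumptions made for illustration.

```python
import math

ALPHABET = "ACGT"


def train_ppm(sites, pseudocount=0.1):
    """Per-position base probabilities (positions treated as independent)."""
    ppm = []
    for pos in range(len(sites[0])):
        counts = {b: pseudocount for b in ALPHABET}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        ppm.append({b: c / total for b, c in counts.items()})
    return ppm


def ppm_log_prob(ppm, seq):
    return sum(math.log(column[base]) for column, base in zip(ppm, seq))


def train_adm(sites, pseudocount=0.1):
    """First-order model: P(first base) and, per position, P(base | previous base)."""
    first = train_ppm([site[0] for site in sites], pseudocount)[0]
    transitions = []
    for pos in range(1, len(sites[0])):
        table = {p: {c: pseudocount for c in ALPHABET} for p in ALPHABET}
        for site in sites:
            table[site[pos - 1]][site[pos]] += 1
        for prev in ALPHABET:
            row_total = sum(table[prev].values())
            table[prev] = {c: n / row_total for c, n in table[prev].items()}
        transitions.append(table)
    return first, transitions


def adm_log_prob(model, seq):
    first, transitions = model
    log_prob = math.log(first[seq[0]])
    for pos in range(1, len(seq)):
        log_prob += math.log(transitions[pos - 1][seq[pos - 1]][seq[pos]])
    return log_prob


if __name__ == "__main__":
    # Hypothetical sites in which the second base depends on the first.
    sites = ["AC"] * 5 + ["GT"] * 5
    ppm, adm = train_ppm(sites), train_adm(sites)
    for seq in ("AC", "AT"):
        print(seq, round(ppm_log_prob(ppm, seq), 2), round(adm_log_prob(adm, seq), 2))
```

In this toy set the PPM cannot tell the observed site "AC" from the unobserved "AT", because each column is half A/G and half C/T, whereas the first-order model assigns "AT" a much lower probability.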
Affiliation(s)
- Jarkko Toivonen
  - Department of Computer Science, University of Helsinki, Helsinki FI-00014, Finland
- Pratyush K Das
  - Applied Tumor Genomics, Research Programs Unit, University of Helsinki, Helsinki FI-00014, Finland
- Jussi Taipale
  - Department of Biochemistry, University of Cambridge, CB2 1GA Cambridge, UK
  - Division of Functional Genomics and Systems Biology, Department of Medical Biochemistry and Biophysics, SE 141 83 Stockholm, Sweden
  - Department of Biosciences and Nutrition, Karolinska Institutet, SE 141 83 Stockholm, Sweden
  - Genome-Scale Biology Program, University of Helsinki, Helsinki FI-00014, Finland
- Esko Ukkonen
  - Department of Computer Science, University of Helsinki, Helsinki FI-00014, Finland
7
Eggeling R, Grosse I, Koivisto M. Algorithms for learning parsimonious context trees. Mach Learn 2018. [DOI: 10.1007/s10994-018-5770-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/27/2022]