1
Luo S, Xiao B, Geng J, Hu S. multiMotif: a generalized tool for scanning and visualization of diverse and distant multiple motifs. J Genet Genomics 2024; 51:1342-1345. [PMID: 38992773 DOI: 10.1016/j.jgg.2024.07.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/23/2024] [Revised: 06/25/2024] [Accepted: 07/01/2024] [Indexed: 07/13/2024]
Affiliation(s)
- Sainan Luo
- State Key Laboratory of Microbial Resources, Institute of Microbiology, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100101, China
- Binghan Xiao
- State Key Laboratory of Microbial Resources, Institute of Microbiology, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100101, China; Sino-Danish College, University of Chinese Academy of Sciences, Beijing 100101, China
- Jianing Geng
- State Key Laboratory of Microbial Resources, Institute of Microbiology, Chinese Academy of Sciences, Beijing 100101, China.
- Songnian Hu
- State Key Laboratory of Microbial Resources, Institute of Microbiology, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100101, China.
2
Djordjevic M, Djordjevic M, Zdobnov E. Scoring Targets of Transcription in Bacteria Rather than Focusing on Individual Binding Sites. Front Microbiol 2017; 8:2314. [PMID: 29213263 PMCID: PMC5702782 DOI: 10.3389/fmicb.2017.02314] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 07/31/2017] [Accepted: 11/09/2017] [Indexed: 11/13/2022]
Abstract
Reliable identification of targets of bacterial regulators is necessary to understand bacterial gene expression regulation. These targets are commonly predicted by searching for high-scoring binding sites in the upstream genomic regions, which typically leads to a large number of false positives. In contrast to the common approach, here we propose a novel concept, where overrepresentation of the scoring distribution that corresponds to the entire searched region is assessed, as opposed to predicting individual binding sites. We explore two implementations of this concept, based on Kolmogorov-Smirnov (KS) and Anderson-Darling (AD) tests, which both provide straightforward P-value estimates for predicted targets. This approach is implemented for pleiotropic bacterial regulators, including σ70 (bacterial housekeeping σ factor) target predictions, which is a classical bioinformatics problem characterized by low specificity. We show that the KS-based approach is both faster and more accurate, departing from the current paradigm of AD being slower but more accurate. Moreover, the KS approach leads to a significant increase in the search accuracy compared to the standard approach, while at the same time straightforwardly assigning well-established P-values to each potential target. Consequently, the new KS-based method proposed here, which assigns P-values to fixed-length upstream regions, provides a fast and accurate approach for predicting bacterial transcription targets.
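The idea of scoring a whole upstream region rather than individual sites can be illustrated with a minimal sketch. It assumes that every window of a region has already been given a binding score and that a set of background scores is available; the function names, the two-sided form of the statistic, and the asymptotic P-value are illustrative assumptions, not the authors' implementation.

```python
import math
from bisect import bisect_right


def ks_statistic(region_scores, background_scores):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(region_scores), sorted(background_scores)
    d = 0.0
    for x in a + b:  # the gap can only change at observed values
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def ks_pvalue(d, n, m, terms=100):
    """Asymptotic two-sided P-value from the Kolmogorov distribution."""
    lam = d * math.sqrt(n * m / (n + m))
    if lam < 0.2:  # the series converges slowly here; the P-value is ~1
        return 1.0
    s = sum((-1) ** (k - 1) * math.exp(-2.0 * (k * lam) ** 2)
            for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * s))


# Hypothetical scores: one upstream region versus a background sample.
region = [2.1, 3.4, 5.0, 6.2, 7.8]
background = [0.3, 0.9, 1.1, 1.8, 2.0, 2.5]
d = ks_statistic(region, background)
print(d, ks_pvalue(d, len(region), len(background)))
```

The asymptotic P-value is only a rough guide for samples this small; an exact or permutation-based P-value would be the safer choice in practice.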
Affiliation(s)
- Marko Djordjevic
- Institute of Physiology and Biochemistry, Faculty of Biology, University of Belgrade, Belgrade, Serbia
- Evgeny Zdobnov
- Swiss Institute of Bioinformatics and Department of Genetic Medicine and Development, University of Geneva, Geneva, Switzerland
3
Acevedo-Luna N, Mariño-Ramírez L, Halbert A, Hansen U, Landsman D, Spouge JL. Most of the tight positional conservation of transcription factor binding sites near the transcription start site reflects their co-localization within regulatory modules. BMC Bioinformatics 2016; 17:479. [PMID: 27871221 PMCID: PMC5117513 DOI: 10.1186/s12859-016-1354-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2016] [Accepted: 11/11/2016] [Indexed: 11/24/2022]
Abstract
Background: Transcription factors (TFs) form complexes that bind regulatory modules (RMs) within DNA, to control specific sets of genes. Some transcription factor binding sites (TFBSs) near the transcription start site (TSS) display tight positional preferences relative to the TSS. Furthermore, near the TSS, RMs can co-localize TFBSs with each other and the TSS. The proportion of TFBS positional preferences due to TFBS co-localization within RMs is unknown, however. ChIP experiments confirm co-localization of some TFBSs genome-wide, including near the TSS, but they typically examine only a few TFs at a time, using non-physiological conditions that can vary from lab to lab. In contrast, sequence analysis can examine many TFs uniformly and methodically, broadly surveying the co-localization of TFBSs with tight positional preferences relative to the TSS.
Results: Our statistics found 43 significant sets of human motifs in the JASPAR TF Database with positional preferences relative to the TSS, with 38 preferences tight (±5 bp). Each set of motifs corresponded to a gene group of 135 to 3304 genes, with 42/43 (98%) gene groups independently validated by DAVID, a gene ontology database, with FDR < 0.05. Motifs corresponding to two TFBSs in a RM should co-occur more than by chance alone, enriching the intersection of the gene groups corresponding to the two TFs. Thus, a gene-group intersection systematically enriched beyond chance alone provides evidence that the two TFs participate in an RM. Of the 903 = 43*42/2 intersections of the 43 significant gene groups, we found 768/903 (85%) pairs of gene groups with significantly enriched intersections, with 564/768 (73%) intersections independently validated by DAVID with FDR < 0.05. A user-friendly web site at http://go.usa.gov/3kjsH permits biologists to explore the interaction network of our TFBSs to identify candidate subunit RMs.
Conclusions: Gene duplication and convergent evolution within a genome provide obvious biological mechanisms for replicating an RM near the TSS that binds a particular TF subunit. Of all intersections of our 43 significant gene groups, 85% were significantly enriched, with 73% of the significant enrichments independently validated by gene ontology. The co-localization of TFBSs within RMs therefore likely explains much of the tight TFBS positional preferences near the TSS.
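The pair count quoted in this abstract, and the general notion of an intersection enriched beyond chance, can be sketched as follows. The hypergeometric upper tail is a standard overlap test used here as an assumption; the abstract does not state which statistic the authors applied, and the gene counts in the usage lines are hypothetical.

```python
from math import comb


def n_pairs(k):
    """Number of unordered pairs among k gene groups."""
    return k * (k - 1) // 2


def overlap_pvalue(n_genes, size_a, size_b, overlap):
    """P(X >= overlap) for the overlap X of two random gene groups.

    Hypergeometric upper tail: group A is fixed, group B is drawn
    uniformly at random from n_genes genes.
    """
    total = comb(n_genes, size_b)
    tail = sum(comb(size_a, i) * comb(n_genes - size_a, size_b - i)
               for i in range(overlap, min(size_a, size_b) + 1))
    return tail / total


print(n_pairs(43))  # 903, matching 43*42/2 in the abstract
# Hypothetical example: 20000 genes, groups of 300 and 500, 40 shared.
print(overlap_pvalue(20000, 300, 500, 40))
```

With 903 pairs tested, any such P-values would also need a multiple-testing correction before being called significant.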
Affiliation(s)
- Natalia Acevedo-Luna
- Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA, 50011, USA
- Leonardo Mariño-Ramírez
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
- Armand Halbert
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
- Ulla Hansen
- Department of Biology, Boston University, 5 Cummington Mall, Boston, MA, 02215, USA
- David Landsman
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
- John L Spouge
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA.
4
Jayaram N, Usvyat D, R Martin AC. Evaluating tools for transcription factor binding site prediction. BMC Bioinformatics 2016; 17:547. [PMID: 27806697 PMCID: PMC6889335 DOI: 10.1186/s12859-016-1298-9] [Citation(s) in RCA: 63] [Impact Index Per Article: 7.0] [Received: 06/19/2016] [Accepted: 10/20/2016] [Indexed: 12/21/2022]
Abstract
Background: Binding of transcription factors to transcription factor binding sites (TFBSs) is key to the mediation of transcriptional regulation. Information on experimentally validated functional TFBSs is limited and consequently there is a need for accurate prediction of TFBSs for gene annotation and in applications such as evaluating the effects of single nucleotide variations in causing disease. TFBSs are generally recognized by scanning a position weight matrix (PWM) against DNA using one of a number of available computer programs. Thus we set out to evaluate the best tools that can be used locally (and are therefore suitable for large-scale analyses) for creating PWMs from high-throughput ChIP-Seq data and for scanning them against DNA.
Results: We evaluated a set of de novo motif discovery tools that could be downloaded and installed locally using ENCODE-ChIP-Seq data and showed that rGADEM was the best-performing tool. TFBS prediction tools used to scan PWMs against DNA fall into two classes: those that predict individual TFBSs and those that identify clusters. Our evaluation showed that FIMO and MCAST performed best, respectively.
Conclusions: Selection of the best-performing tools for generating PWMs from ChIP-Seq data and for scanning PWMs against DNA has the potential to improve prediction of precise transcription factor binding sites within regions identified by ChIP-Seq experiments for gene finding, understanding regulation and in evaluating the effects of single nucleotide variations in causing disease.
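As a rough illustration of what "scanning a PWM against DNA" means, the sketch below builds a log-odds matrix from per-position base counts and slides it along one strand of a sequence. The pseudocount, the uniform background, the threshold, and the function names are assumptions made for the example; tools such as FIMO add P-value calibration and reverse-strand scanning that are not shown here.

```python
import math


def log_odds_pwm(counts, pseudocount=1.0, background=0.25):
    """Turn per-position base counts into a log2-odds matrix."""
    pwm = []
    for col in counts:
        total = sum(col.values()) + 4 * pseudocount
        pwm.append({
            base: math.log2(((col.get(base, 0) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm


def scan(seq, pwm, threshold):
    """Return (position, score) for every window scoring >= threshold."""
    width = len(pwm)
    hits = []
    for i in range(len(seq) - width + 1):
        score = sum(pwm[j][seq[i + j]] for j in range(width))
        if score >= threshold:
            hits.append((i, score))
    return hits


# Hypothetical motif "TATA" observed in 10 aligned sites.
counts = [{"T": 10}, {"A": 10}, {"T": 10}, {"A": 10}]
pwm = log_odds_pwm(counts)
print(scan("GGTATAGG", pwm, threshold=0.0))
```

Only the forward strand is scanned, and the input is assumed to contain only the letters A, C, G and T.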
Affiliation(s)
- Narayan Jayaram
- Institute of Structural and Molecular Biology, Division of Biosciences, University College London, Darwin Building, Gower Street, London, WC1E 6BT, UK
- Daniel Usvyat
- Institute of Structural and Molecular Biology, Division of Biosciences, University College London, Darwin Building, Gower Street, London, WC1E 6BT, UK
- Andrew C R Martin
- Institute of Structural and Molecular Biology, Division of Biosciences, University College London, Darwin Building, Gower Street, London, WC1E 6BT, UK.
5
Boeva V. Analysis of Genomic Sequence Motifs for Deciphering Transcription Factor Binding and Transcriptional Regulation in Eukaryotic Cells. Front Genet 2016; 7:24. [PMID: 26941778 PMCID: PMC4763482 DOI: 10.3389/fgene.2016.00024] [Citation(s) in RCA: 97] [Impact Index Per Article: 10.8] [Received: 10/30/2015] [Accepted: 02/05/2016] [Indexed: 12/27/2022]
Abstract
Eukaryotic genomes contain a variety of structured patterns: repetitive elements, binding sites of DNA and RNA associated proteins, splice sites, and so on. Often, these structured patterns can be formalized as motifs and described using a proper mathematical model such as position weight matrix and IUPAC consensus. Two key tasks are typically carried out for motifs in the context of the analysis of genomic sequences. These are: identification in a set of DNA regions of over-represented motifs from a particular motif database, and de novo discovery of over-represented motifs. Here we describe existing methodology to perform these two tasks for motifs characterizing transcription factor binding. When applied to the output of ChIP-seq and ChIP-exo experiments, or to promoter regions of co-modulated genes, motif analysis techniques allow for the prediction of transcription factor binding events and enable identification of transcriptional regulators and co-regulators. The usefulness of motif analysis is further exemplified in this review by how motif discovery improves peak calling in ChIP-seq and ChIP-exo experiments and, when coupled with information on gene expression, allows insights into physical mechanisms of transcriptional modulation.
Affiliation(s)
- Valentina Boeva
- Centre de Recherche, Institut Curie, Paris, France; INSERM, U900, Paris, France; Mines ParisTech, Fontainebleau, France; PSL Research University, Paris, France; Department of Development, Reproduction and Cancer, Institut Cochin, Paris, France; INSERM, U1016, Paris, France; Centre National de la Recherche Scientifique UMR 8104, Paris, France; Université Paris Descartes UMR-S1016, Paris, France
6
Schulze S, Henkel SG, Driesch D, Guthke R, Linde J. Computational prediction of molecular pathogen-host interactions based on dual transcriptome data. Front Microbiol 2015; 6:65. [PMID: 25705211 PMCID: PMC4319478 DOI: 10.3389/fmicb.2015.00065] [Citation(s) in RCA: 45] [Impact Index Per Article: 4.5] [Received: 12/11/2014] [Accepted: 01/19/2015] [Indexed: 11/13/2022]
Abstract
Inference of inter-species gene regulatory networks based on gene expression data is an important computational method to predict pathogen-host interactions (PHIs). Both the experimental setup and the nature of PHIs exhibit certain characteristics. First, besides an environmental change, the battle between pathogen and host leads to a constantly changing environment and thus complex gene expression patterns. Second, there might be a delay until one of the organisms reacts. Third, toward later time points only one organism may survive, leading to missing gene expression data of the other organism. Here, we account for PHI characteristics by extending NetGenerator, a network inference tool that predicts gene regulatory networks from gene expression time series data. We tested multiple modeling scenarios regarding the stimuli functions of the interaction network based on a benchmark example. We show that modeling perturbation of a PHI network by multiple stimuli better represents the underlying biological phenomena. Furthermore, we utilized the benchmark example to test the influence of missing data points on the inference performance. Our results suggest that PHI network inference with missing data is possible, but we recommend providing complete time series data. Finally, we extended the NetGenerator tool to incorporate gene- and time-point-specific variances, because complex PHIs may lead to high variance in expression data. Sample variances are directly considered in the objective function of NetGenerator and indirectly by testing the robustness of interactions based on variance-dependent disturbance of gene expression values. We evaluated the method of variance incorporation on dual RNA sequencing (RNA-Seq) data of Mus musculus dendritic cells incubated with Candida albicans and validated our method by predicting previously verified PHIs as robust interactions.
Affiliation(s)
- Sylvie Schulze
- Department of Systems Biology and Bioinformatics, Leibniz-Institute for Natural Product Research and Infection Biology - Hans-Knoell-Institute Jena, Germany
- Reinhard Guthke
- Department of Systems Biology and Bioinformatics, Leibniz-Institute for Natural Product Research and Infection Biology - Hans-Knoell-Institute Jena, Germany
- Jörg Linde
- Department of Systems Biology and Bioinformatics, Leibniz-Institute for Natural Product Research and Infection Biology - Hans-Knoell-Institute Jena, Germany
7
Wartenberg A, Linde J, Martin R, Schreiner M, Horn F, Jacobsen ID, Jenull S, Wolf T, Kuchler K, Guthke R, Kurzai O, Forche A, d'Enfert C, Brunke S, Hube B. Microevolution of Candida albicans in macrophages restores filamentation in a nonfilamentous mutant. PLoS Genet 2014; 10:e1004824. [PMID: 25474009 PMCID: PMC4256171 DOI: 10.1371/journal.pgen.1004824] [Citation(s) in RCA: 62] [Impact Index Per Article: 5.6] [Received: 05/08/2014] [Accepted: 10/15/2014] [Indexed: 11/30/2022]
Abstract
Following antifungal treatment, Candida albicans and other human pathogenic fungi can undergo microevolution, which leads to the emergence of drug resistance. However, the capacity for microevolutionary adaptation of fungi goes beyond the development of resistance against antifungals. Here we used an experimental microevolution approach to show that one of the central pathogenicity mechanisms of C. albicans, the yeast-to-hyphae transition, can be subject to experimental evolution. The C. albicans cph1Δ/efg1Δ mutant is nonfilamentous, as central signaling pathways linking environmental cues to hyphal formation are disrupted. We subjected this mutant to constant selection pressure in the hostile environment of the macrophage phagosome. In a comparatively short time-frame, the mutant evolved the ability to escape macrophages by filamentation. In addition, the evolved mutant exhibited hyper-virulence in a murine infection model and an altered cell wall composition compared to the cph1Δ/efg1Δ strain. Moreover, the transcriptional regulation of hyphae-associated and other pathogenicity-related genes became re-responsive to environmental cues in the evolved strain. We went on to identify the causative missense mutation via whole-genome and transcriptome sequencing: a single nucleotide exchange took place within SSN3, which encodes a component of the Cdk8 module of the Mediator complex, which links transcription factors with the general transcription machinery. This mutation was responsible for the reconnection of the hyphal growth program with environmental signals in the evolved strain and was sufficient to bypass Efg1/Cph1-dependent filamentation. These data demonstrate that even central transcriptional networks can be remodeled very quickly under appropriate selection pressure.
Pathogenic microbes often evolve complex traits to adapt to their respective hosts, and this evolution is ongoing: for example, microorganisms are developing resistance to antimicrobial compounds in the clinical setting. The ability of the common human pathogenic fungus, Candida albicans, to switch from yeast to hyphal (filamentous) growth is considered a central virulence attribute. For example, hyphal formation allows C. albicans to escape from macrophages following phagocytosis. A well-investigated signaling network integrates different environmental cues to induce and maintain hyphal growth. In fact, deletion of two central transcription factors in this network results in a mutant that is both nonfilamentous and avirulent. We used experimental evolution to study the adaptation capability of this mutant by continuous co-incubation within macrophages. We found that this selection regime led to a relatively rapid re-connection of signaling between environmental cues and the hyphal growth program. Indeed, the evolved mutant regained the ability to filament and its virulence in vivo. This bypass of central transcription factors was based on a single nucleotide exchange in a gene encoding a component of the general transcription regulation machinery. Our results show that even a complex regulatory network, such as the transcriptional network which governs hyphal growth, can be remodeled via microevolution.
Affiliation(s)
- Anja Wartenberg
- Department of Microbial Pathogenicity Mechanisms, Leibniz Institute for Natural Product Research and Infection Biology – Hans Knoell Institute Jena (HKI), Jena, Germany
- Jörg Linde
- Research Group Systems Biology & Bioinformatics, Leibniz Institute for Natural Product Research and Infection Biology – Hans Knoell Institute Jena (HKI), Jena, Germany
- Ronny Martin
- Septomics Research Center, Friedrich Schiller University and Leibniz Institute for Natural Product Research and Infection Biology – Hans Knoell Institute, Jena, Germany
- Maria Schreiner
- Department of Microbial Pathogenicity Mechanisms, Leibniz Institute for Natural Product Research and Infection Biology – Hans Knoell Institute Jena (HKI), Jena, Germany
- Fabian Horn
- Research Group Systems Biology & Bioinformatics, Leibniz Institute for Natural Product Research and Infection Biology – Hans Knoell Institute Jena (HKI), Jena, Germany
- Ilse D. Jacobsen
- Department of Microbial Pathogenicity Mechanisms, Leibniz Institute for Natural Product Research and Infection Biology – Hans Knoell Institute Jena (HKI), Jena, Germany
- Research Group Microbial Immunology, Leibniz Institute for Natural Product Research and Infection Biology – Hans Knoell Institute Jena (HKI), Jena, Germany
- Sabrina Jenull
- Medical University Vienna, Max F. Perutz Laboratories, Department of Medical Biochemistry, Vienna, Austria
- Thomas Wolf
- Research Group Systems Biology & Bioinformatics, Leibniz Institute for Natural Product Research and Infection Biology – Hans Knoell Institute Jena (HKI), Jena, Germany
- Karl Kuchler
- Medical University Vienna, Max F. Perutz Laboratories, Department of Medical Biochemistry, Vienna, Austria
- Reinhard Guthke
- Research Group Systems Biology & Bioinformatics, Leibniz Institute for Natural Product Research and Infection Biology – Hans Knoell Institute Jena (HKI), Jena, Germany
- Oliver Kurzai
- Septomics Research Center, Friedrich Schiller University and Leibniz Institute for Natural Product Research and Infection Biology – Hans Knoell Institute, Jena, Germany
- Anja Forche
- Department of Biology, Bowdoin College, Brunswick, Maine, United States of America
- Christophe d'Enfert
- Institut Pasteur, Unité Biologie et Pathogénicité Fongiques, Département Génomes et Génétique, Paris, France
- INRA, USC2019, Paris, France
- Sascha Brunke
- Department of Microbial Pathogenicity Mechanisms, Leibniz Institute for Natural Product Research and Infection Biology – Hans Knoell Institute Jena (HKI), Jena, Germany
- Integrated Research and Treatment Center, Sepsis und Sepsisfolgen, Center for Sepsis Control and Care (CSCC), Universitätsklinikum Jena, Germany
- Bernhard Hube
- Department of Microbial Pathogenicity Mechanisms, Leibniz Institute for Natural Product Research and Infection Biology – Hans Knoell Institute Jena (HKI), Jena, Germany
- Integrated Research and Treatment Center, Sepsis und Sepsisfolgen, Center for Sepsis Control and Care (CSCC), Universitätsklinikum Jena, Germany
- Friedrich Schiller University, Jena, Germany
8
Vinga S. Information theory applications for biological sequence analysis. Brief Bioinform 2014; 15:376-89. [PMID: 24058049 PMCID: PMC7109941 DOI: 10.1093/bib/bbt068] [Citation(s) in RCA: 71] [Impact Index Per Article: 6.5] [Received: 07/11/2013] [Accepted: 08/17/2013] [Indexed: 01/13/2023]
Abstract
Information theory (IT) addresses the analysis of communication systems and has been widely applied in molecular biology. In particular, alignment-free sequence analysis and comparison greatly benefited from concepts derived from IT, such as entropy and mutual information. This review covers several aspects of IT applications, ranging from genome global analysis and comparison, including block-entropy estimation and resolution-free metrics based on iterative maps, to local analysis, comprising the classification of motifs, prediction of transcription factor binding sites and sequence characterization based on linguistic complexity and entropic profiles. IT has also been applied to high-level correlations that combine DNA, RNA or protein features with sequence-independent properties, such as gene mapping and phenotype analysis, and has also provided models based on communication systems theory to describe information transmission channels at the cell level and also during evolutionary processes. While not exhaustive, this review attempts to categorize existing methods and to indicate their relation with broader transversal topics such as genomic signatures, data compression and complexity, time series analysis and phylogenetic classification, providing a resource for future developments in this promising area.
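Two of the concepts named in this abstract, entropy and its use in characterizing motifs, can be made concrete with a small sketch. It computes the Shannon entropy of each column of a set of aligned DNA sites and the corresponding information content (2 bits minus the entropy), assuming a uniform background and no small-sample correction; the function names and example sites are illustrative only.

```python
import math
from collections import Counter


def shannon_entropy(column):
    """Shannon entropy, in bits, of the symbols in one alignment column."""
    n = len(column)
    return -sum((c / n) * math.log2(c / n) for c in Counter(column).values())


def information_content(sites):
    """Per-position information content (bits) of aligned DNA sites."""
    return [2.0 - shannon_entropy(col) for col in zip(*sites)]


# Hypothetical aligned sites: three conserved positions, one variable.
sites = ["ACGT", "ACGA", "ACGC", "ACGG"]
print(information_content(sites))
```

Fully conserved columns reach the 2-bit maximum for a four-letter alphabet, while a column with all four bases equally frequent carries no information.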
Affiliation(s)
- Susana Vinga
- IDMEC, Instituto Superior Técnico - Universidade de Lisboa (IST-UL), Av. Rovisco Pais, 1049-001 Lisboa, Portugal. Tel.: +351-218419504; Fax: +351-218498097;
9
Horn F, Rittweger M, Taubert J, Lysenko A, Rawlings C, Guthke R. Interactive exploration of integrated biological datasets using context-sensitive workflows. Front Genet 2014; 5:21. [PMID: 24600467 PMCID: PMC3929842 DOI: 10.3389/fgene.2014.00021] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 07/26/2013] [Accepted: 01/21/2014] [Indexed: 11/16/2022]
Abstract
Network inference utilizes experimental high-throughput data for the reconstruction of molecular interaction networks where new relationships between the network entities can be predicted. Despite the increasing amount of experimental data, the parameters of each modeling technique cannot be optimized based on the experimental data alone; instead, the resulting network must be qualitatively assessed to determine whether its components describe the experimental setting. Candidate list prioritization and validation build upon data integration and data visualization. The application of tools supporting this procedure is limited to the exploration of smaller information networks because the display and interpretation of large amounts of information are challenging in terms of the computational effort and the users' experience. The Ondex software framework was extended with customizable context-sensitive menus which allow additional integration and data analysis options for a selected set of candidates during interactive data exploration. We provide new functionalities for on-the-fly data integration using InterProScan, PubMed Central literature search, and sequence-based homology search. We applied the Ondex system to the integration of publicly available data for Aspergillus nidulans and analyzed transcriptome data. We demonstrate the advantages of our approach by proposing new hypotheses for the functional annotation of specific genes of differentially expressed fungal gene clusters. Our extension of the Ondex framework makes it possible to overcome the separation between data integration and interactive analysis. More specifically, computationally demanding calculations can be performed on selected sub-networks without losing any information from the whole network. Furthermore, our extensions allow for direct access to online biological databases, which helps to keep the integrated information up-to-date.
Affiliation(s)
- Fabian Horn
- Systems Biology/Bioinformatics, Leibniz Institute for Natural Product Research and Infection Biology - Hans Knöll Institute Jena, Germany
- Martin Rittweger
- Systems Biology/Bioinformatics, Leibniz Institute for Natural Product Research and Infection Biology - Hans Knöll Institute Jena, Germany
- Jan Taubert
- Department of Computational and Systems Biology, Rothamsted Research, Harpenden, UK
- Artem Lysenko
- Department of Computational and Systems Biology, Rothamsted Research, Harpenden, UK
- Christopher Rawlings
- Department of Computational and Systems Biology, Rothamsted Research, Harpenden, UK
- Reinhard Guthke
- Systems Biology/Bioinformatics, Leibniz Institute for Natural Product Research and Infection Biology - Hans Knöll Institute Jena, Germany
10
LASAGNA-Search: an integrated web tool for transcription factor binding site search and visualization. Biotechniques 2013; 54:141-53. [PMID: 23599922 DOI: 10.2144/000113999] [Citation(s) in RCA: 95] [Impact Index Per Article: 7.9] [Indexed: 11/23/2022]
Abstract
The release of ChIP-seq data from the ENCyclopedia Of DNA Elements (ENCODE) and Model Organism ENCyclopedia Of DNA Elements (modENCODE) projects has significantly increased the amount of transcription factor (TF) binding affinity information available to researchers. However, scientists still routinely use TF binding site (TFBS) search tools to scan unannotated sequences for TFBSs, particularly when searching for lesser-known TFs or TFs in organisms for which ChIP-seq data are unavailable. The sequence analysis often involves multiple steps such as TF model collection, promoter sequence retrieval, and visualization; thus, several different tools are required. We have developed a novel integrated web tool named LASAGNA-Search that allows users to perform TFBS searches without leaving the web site. LASAGNA-Search uses the LASAGNA (Length-Aware Site Alignment Guided by Nucleotide Association) algorithm for TFBS alignment. Important features of LASAGNA-Search include (i) acceptance of unaligned variable-length TFBSs, (ii) a collection of 1726 TF models, (iii) automatic promoter sequence retrieval, (iv) visualization in the UCSC Genome Browser, and (v) gene regulatory network inference and visualization based on binding specificities. LASAGNA-Search is freely available at http://biogrid.engr.uconn.edu/lasagna_search/.
11
Lee C, Huang CH. LASAGNA: a novel algorithm for transcription factor binding site alignment. BMC Bioinformatics 2013; 14:108. [PMID: 23522376 PMCID: PMC3747862 DOI: 10.1186/1471-2105-14-108] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Received: 07/24/2012] [Accepted: 03/08/2013] [Indexed: 12/11/2022]
Abstract
Background: Scientists routinely scan DNA sequences for transcription factor (TF) binding sites (TFBSs). Most of the available tools rely on position-specific scoring matrices (PSSMs) constructed from aligned binding sites. Because of the resolutions of assays used to obtain TFBSs, databases such as TRANSFAC, ORegAnno and PAZAR store unaligned variable-length DNA segments containing binding sites of a TF. These DNA segments need to be aligned to build a PSSM. While the TRANSFAC database provides scoring matrices for TFs, nearly 78% of the TFs in the public release do not have matrices available. As work on TFBS alignment algorithms has been limited, it is highly desirable to have an alignment algorithm tailored to TFBSs.
Results: We designed a novel algorithm named LASAGNA, which is aware of the lengths of input TFBSs and utilizes position dependence. Results on 189 TFs of 5 species in the TRANSFAC database showed that our method significantly outperformed ClustalW2 and MEME. We further compared a PSSM method dependent on LASAGNA to an alignment-free TFBS search method. Results on 89 TFs whose binding sites can be located in genomes showed that our method is significantly more precise at fixed recall rates. Finally, we described LASAGNA-ChIP, a more sophisticated version for ChIP (Chromatin immunoprecipitation) experiments. Under the one-per-sequence model, it showed comparable performance with MEME in discovering motifs in ChIP-seq peak sequences.
Conclusions: We conclude that the LASAGNA algorithm is simple and effective in aligning variable-length binding sites. It has been integrated into a user-friendly webtool for TFBS search and visualization called LASAGNA-Search. The tool currently stores precomputed PSSM models for 189 TFs and 133 TFs built from TFBSs in the TRANSFAC Public database (release 7.0) and the ORegAnno database (08Nov10 dump), respectively. The webtool is available at http://biogrid.engr.uconn.edu/lasagna_search/.
Affiliation(s)
- Chih Lee
- Department of Computer Science and Engineering, University of Connecticut, Fairfield Road, Storrs, CT 06269, USA
- Chun-Hsi Huang
- Department of Computer Science and Engineering, University of Connecticut, Fairfield Road, Storrs, CT 06269, USA
12
Quader S, Huang CH. Effect of positional dependence and alignment strategy on modeling transcription factor binding sites. BMC Res Notes 2012; 5:340. [PMID: 22748199 PMCID: PMC3465234 DOI: 10.1186/1756-0500-5-340] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 04/14/2012] [Accepted: 06/07/2012] [Indexed: 11/29/2022] Open
Abstract
Background Many consensus-based and Position Weight Matrix-based methods for recognizing transcription factor binding sites (TFBS) are not well suited to the variability in the lengths of binding sites. Besides, many methods discard known binding sites while building the model. Moreover, the impact of Information Content (IC) and the positional dependence of nucleotides within an aligned set of TFBS has not been well researched for modeling variable-length binding sites. In this paper, we propose ML-Consensus (Mixed-Length Consensus): a consensus model for variable-length TFBS which does not exclude any reported binding sites. Methods We consider Pairwise Score (PS) as a measure of positional dependence of nucleotides within an alignment of TFBS. We investigate how the prediction accuracy of ML-Consensus is affected by the incorporation of IC and PS with a particular binding site alignment strategy. We perform cross-validations for datasets of six species from the TRANSFAC public database, and analyze the results using ROC curves and the Wilcoxon matched-pair signed-ranks test. Results We observe that the incorporation of IC and PS in ML-Consensus results in statistically significant improvement in the prediction accuracy of the model. Moreover, the existence of a core region among the known binding sites (of any length) is witnessed by the pairwise coexistence of nucleotides within the core length. Conclusions These observations suggest the possibility of an efficient multiple sequence alignment algorithm for aligning TFBS, accommodating known binding sites of any length, for optimal (or near-optimal) TFBS prediction. However, designing such an algorithm is a matter of further investigation.
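As a rough illustration of the Information Content (IC) measure mentioned in this abstract, the sketch below computes the standard Shannon-based IC per alignment column for DNA, i.e. log2(4) minus the column entropy. It is an assumption that this matches the IC definition used in ML-Consensus, and the Pairwise Score is not reproduced here. The function names are hypothetical, and the input sites are assumed to be aligned to equal length without gaps.

```python
import math

def column_information_content(column, alphabet="ACGT"):
    """Shannon-based information content (in bits) of one alignment column."""
    n = len(column)
    entropy = 0.0
    for base in alphabet:
        p = column.count(base) / n
        if p > 0:
            entropy -= p * math.log2(p)
    # Maximum entropy for a 4-letter alphabet is log2(4) = 2 bits.
    return math.log2(len(alphabet)) - entropy

def alignment_information_content(aligned_sites):
    """Per-column IC for a list of equal-length aligned sites."""
    return [column_information_content(col) for col in zip(*aligned_sites)]

# Hypothetical alignment: three fully conserved columns and one uniform column.
example_ic = alignment_information_content(["AAAA", "AACA", "AAGA", "AATA"])
```

A fully conserved column scores 2 bits and a column with all four bases equally frequent scores 0 bits, which is why IC is often used to locate a conserved core within an alignment.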
Affiliation(s)
- Saad Quader
- Department of Computer Science & Engineering, University of Connecticut, Storrs, 06269-2155, USA
13
Horn F, Heinekamp T, Kniemeyer O, Pollmächer J, Valiante V, Brakhage AA. Systems biology of fungal infection. Front Microbiol 2012; 3:108. [PMID: 22485108 PMCID: PMC3317178 DOI: 10.3389/fmicb.2012.00108] [Citation(s) in RCA: 53] [Impact Index Per Article: 4.1] [Received: 12/12/2011] [Accepted: 03/05/2012] [Indexed: 12/26/2022] Open
Abstract
Elucidation of pathogenicity mechanisms of the most important human-pathogenic fungi, Aspergillus fumigatus and Candida albicans, has gained great interest in the light of the steadily increasing number of cases of invasive fungal infections. A key feature of these infections is the interaction of the different fungal morphotypes with epithelial and immune effector cells in the human host. Because of the high level of complexity, it is necessary to describe and understand invasive fungal infection by taking a systems biological approach, i.e., by a comprehensive quantitative analysis of the non-linear and selective interactions of a large number of functionally diverse, and frequently multifunctional, sets of elements, e.g., genes, proteins, metabolites, which produce coherent and emergent behaviors in time and space. The recent advances in systems biology will now make it possible to uncover the structure and dynamics of molecular and cellular cause-effect relationships within these pathogenic interactions. We review current efforts to integrate omics and image-based data of host-pathogen interactions into network and spatio-temporal models. The modeling will help to elucidate pathogenicity mechanisms and to identify diagnostic biomarkers and potential drug targets for therapy and could thus pave the way for novel intervention strategies based on novel antifungal drugs and cell therapy.
Affiliation(s)
- Fabian Horn
- Systems Biology/Bioinformatics, Leibniz Institute for Natural Product Research and Infection Biology – Hans Knöll Institute, Jena, Germany
- Thorsten Heinekamp
- Molecular and Applied Microbiology, Leibniz Institute for Natural Product Research and Infection Biology – Hans Knöll Institute, Jena, Germany
- Olaf Kniemeyer
- Molecular and Applied Microbiology, Leibniz Institute for Natural Product Research and Infection Biology – Hans Knöll Institute, Jena, Germany
- Johannes Pollmächer
- Applied Systems Biology, Leibniz Institute for Natural Product Research and Infection Biology – Hans Knöll Institute, Jena, Germany
- Vito Valiante
- Molecular and Applied Microbiology, Leibniz Institute for Natural Product Research and Infection Biology – Hans Knöll Institute, Jena, Germany
- Axel A. Brakhage
- Molecular and Applied Microbiology, Leibniz Institute for Natural Product Research and Infection Biology – Hans Knöll Institute, Jena, Germany
- Department of Microbiology and Molecular Biology, Institute of Microbiology, Friedrich Schiller University, Jena, Germany
14
Linde J, Hortschansky P, Fazius E, Brakhage AA, Guthke R, Haas H. Regulatory interactions for iron homeostasis in Aspergillus fumigatus inferred by a Systems Biology approach. BMC Syst Biol 2012; 6:6. [PMID: 22260221 PMCID: PMC3305660 DOI: 10.1186/1752-0509-6-6] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.5] [Received: 10/05/2011] [Accepted: 01/19/2012] [Indexed: 01/01/2023]
Abstract
BACKGROUND In Systems Biology, iterations of wet-lab experiments followed by modelling approaches and model-inspired experiments describe a cyclic workflow. This approach is especially useful for the inference of gene regulatory networks based on high-throughput gene expression data. Experiments can verify or falsify the predicted interactions, allowing further refinement of the network model. Aspergillus fumigatus is a major human fungal pathogen. One important virulence trait is its ability to gain sufficient amounts of iron during the infection process. Even though some regulatory interactions are known, we are still far from a complete understanding of the way iron homeostasis is regulated. RESULTS In this study, we make use of a reverse engineering strategy to infer a regulatory network controlling iron homeostasis in A. fumigatus. The inference approach utilizes the temporal change in expression data after a change from iron-depleted to iron-replete conditions. The modelling strategy is based on a set of linear differential equations and offers the possibility to integrate known regulatory interactions as prior knowledge. Moreover, it makes use of important selection criteria, such as sparseness and robustness. By compiling a list of known regulatory interactions for iron homeostasis in A. fumigatus and softly integrating them during network inference, we are able to predict new interactions between transcription factors and target genes. The proposed activation of the gene expression of hapX by the transcriptional regulator SrbA constitutes a so far unknown way of regulating iron homeostasis based on the amount of metabolically available iron. This interaction has been verified by Northern blots in a recent experimental study. In order to improve the reliability of the predicted network, the results of this experimental study have been added to the set of prior knowledge. The final network includes three SrbA target genes. Based on motif searching within the regulatory regions of these genes, we identify potential DNA-binding sites for SrbA. Our wet-lab experiments demonstrate high-affinity binding capacity of SrbA to the promoters of hapX, hemA and srbA. CONCLUSIONS This study presents an application of the typical Systems Biology circle and is based on cooperation between wet-lab experimentalists and in silico modellers. The results underline that using prior knowledge during network inference helps to predict biologically important interactions. Together with the experimental results, we indicate a novel iron homeostasis regulating system sensing the amount of metabolically available iron and identify the binding site of iron-related SrbA target genes. It will be of high interest to study whether these regulatory interactions are also important for close relatives of A. fumigatus and other pathogenic fungi, such as Candida albicans.
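To make the phrase "a set of linear differential equations" concrete, the sketch below simulates a generic linear network model of the form dx_i/dt = sum_j w_ij * x_j + b_i * u with forward Euler steps. It only runs a given model forward and does not perform the network inference described in the abstract. The function name, the two-gene example and all weights are hypothetical, and the model form itself is an assumption about what such a linear model typically looks like.

```python
def simulate_linear_network(weights, inputs, x0, stimulus, dt=0.01, steps=1000):
    """Forward-Euler simulation of dx_i/dt = sum_j weights[i][j]*x_j + inputs[i]*stimulus.

    weights[i][j] is the assumed influence of gene j on gene i; a zero entry
    means no edge, which is how a sparse network shows up in this form.
    """
    x = list(x0)
    n = len(x)
    trajectory = [list(x)]
    for _ in range(steps):
        dx = [
            sum(weights[i][j] * x[j] for j in range(n)) + inputs[i] * stimulus
            for i in range(n)
        ]
        x = [x[i] + dt * dx[i] for i in range(n)]
        trajectory.append(list(x))
    return trajectory

# Hypothetical two-gene network: gene 0 responds to the external stimulus,
# gene 1 is activated by gene 0; both decay at rate 1. Values are arbitrary.
example_weights = [[-1.0, 0.0],
                   [0.8, -1.0]]
example_inputs = [1.0, 0.0]
example_trajectory = simulate_linear_network(
    example_weights, example_inputs, x0=[0.0, 0.0], stimulus=1.0
)
example_final = example_trajectory[-1]
```

With these arbitrary weights the simulated levels approach a steady state where gene 0 is near 1.0 and gene 1 is near 0.8. Inference works in the opposite direction, fitting the weights to measured time series, optionally with known interactions as prior knowledge.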
Affiliation(s)
- Jörg Linde
- Leibniz Institute for Natural Product Research and Infection Biology-Hans Knöll Institute, Jena, Germany.