1. Reconstruction and analysis of transcriptome regulatory network of Methanobrevibacter ruminantium M1. Gene Reports 2022. [DOI: 10.1016/j.genrep.2021.101489] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022]

2. Bacterial Transcriptional Regulators: A Road Map for Functional, Structural, and Biophysical Characterization. Int J Mol Sci 2022; 23:ijms23042179. [PMID: 35216300 PMCID: PMC8879271 DOI: 10.3390/ijms23042179] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 01/17/2022] [Revised: 02/11/2022] [Accepted: 02/11/2022] [Indexed: 12/12/2022]
Abstract
The different niches through which bacteria move during their life cycle require a fast response to the many environmental cues they encounter. The sensing of these stimuli and the correct response to them are driven primarily by transcriptional regulators. This kind of protein is involved in sensing a wide array of chemical species, a process that ultimately leads to the regulation of gene transcription. The allosteric-coupling mechanism of sensing and regulation is a central aspect of biological systems and has become an important field of research during the last decades. In this review, we summarize the state-of-the-art techniques applied to unravel these complex mechanisms. We introduce a roadmap that may serve for experimental design, depending on the answers we seek and the initial information we have about the system of study. We also provide information on databases containing available structural information on each family of transcriptional regulators. Finally, we discuss the recent results of research about the allosteric mechanisms of sensing and regulation involving many transcriptional regulators of interest, highlighting multipronged strategies and novel experimental techniques. The aim of the experiments discussed here was to provide a better understanding at a molecular level of how bacteria adapt to the different environmental threats they face.

3. Kılıç S, Sánchez-Osuna M, Collado-Padilla A, Barbé J, Erill I. Flexible comparative genomics of prokaryotic transcriptional regulatory networks. BMC Genomics 2020; 21:466. [PMID: 33327941 PMCID: PMC7739468 DOI: 10.1186/s12864-020-06838-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 05/07/2020] [Accepted: 06/16/2020] [Indexed: 11/25/2022]
Abstract
Background: Comparative genomics methods enable the reconstruction of bacterial regulatory networks using available experimental data. In spite of their potential for accelerating research into the composition and evolution of bacterial regulons, few comparative genomics suites have been developed for the automated analysis of these regulatory systems. Available solutions typically rely on precomputed databases for operon and ortholog predictions, limiting the scope of analyses to processed complete genomes, and several key issues such as the transfer of experimental information or the integration of regulatory information in a probabilistic setting remain largely unaddressed.
Results: Here we introduce CGB, a flexible platform for comparative genomics of prokaryotic regulons. CGB has few external dependencies and enables fully customized analyses of newly available genome data. The platform automates the merging of experimental information and uses a gene-centered, Bayesian framework to generate and integrate easily interpretable results. We demonstrate its flexibility and power by analyzing the evolution of type III secretion system regulation in pathogenic Proteobacteria and by characterizing the SOS regulon of a new bacterial phylum, the Balneolaeota.
Conclusions: Our results demonstrate the applicability of the CGB pipeline in multiple settings. CGB’s ability to automatically integrate experimental information from multiple sources and use complete and draft genomic data, coupled with its non-reliance on precomputed databases and its easily interpretable display of gene-centered posterior probabilities of regulation, provides users with an unprecedented level of flexibility in launching comparative genomics analyses of prokaryotic transcriptional regulatory networks. The analyses of type III secretion and SOS response regulatory networks illustrate instances of convergent and divergent evolution of these regulatory systems, showcasing the power of formal ancestral state reconstruction at inferring the evolutionary history of regulatory networks.
Affiliation(s)
- Sefa Kılıç
  - University of Maryland Baltimore County, Baltimore, MD, 21250, USA
- Jordi Barbé
  - Universitat Autònoma de Barcelona, 08193, Bellaterra, Spain
- Ivan Erill
  - University of Maryland Baltimore County, Baltimore, MD, 21250, USA

4. Cao H, Ma Q, Chen X, Xu Y. DOOR: a prokaryotic operon database for genome analyses and functional inference. Brief Bioinform 2020; 20:1568-1577. [PMID: 28968679 DOI: 10.1093/bib/bbx088] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 05/02/2017] [Revised: 06/13/2017] [Indexed: 11/14/2022]
Abstract
The rapid accumulation of fully sequenced prokaryotic genomes provides unprecedented information for biological studies of bacterial and archaeal organisms in a systematic manner. Operons are the basic functional units for conducting such studies. Here, we review an operon database DOOR (the Database of prOkaryotic OpeRons) that we have previously developed and continue to update. Currently, the database contains 6 975 454 computationally predicted operons in 2072 complete genomes. In addition, the database also contains the following information: (i) transcriptional units for 24 genomes derived using publicly available transcriptomic data; (ii) orthologous gene mapping across genomes; (iii) 6408 cis-regulatory motifs for transcriptional factors of some operons for 203 genomes; (iv) 3 456 718 Rho-independent terminators for 2072 genomes; as well as (v) a suite of tools in support of applications of the predicted operons. In this review, we will explain how such data are computationally derived and demonstrate how they can be used to derive a wide range of higher-level information needed for systems biology studies to tackle complex and fundamental biology questions.

5. Xie J, Ma A, Fennell A, Ma Q, Zhao J. It is time to apply biclustering: a comprehensive review of biclustering applications in biological and biomedical data. Brief Bioinform 2020; 20:1449-1464. [PMID: 29490019 DOI: 10.1093/bib/bby014] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 11/15/2017] [Revised: 01/16/2018] [Indexed: 12/12/2022]
Abstract
Biclustering is a powerful data mining technique that allows clustering of rows and columns, simultaneously, in a matrix-format data set. It was first applied to gene expression data in 2000, aiming to identify co-expressed genes under a subset of all the conditions/samples. During the past 17 years, tens of biclustering algorithms and tools have been developed to enhance the ability to make sense out of large data sets generated in the wake of high-throughput omics technologies. These algorithms and tools have been applied to a wide variety of data types, including but not limited to, genomes, transcriptomes, exomes, epigenomes, phenomes and pharmacogenomes. However, there is still a considerable gap between biclustering methodology development and comprehensive data interpretation, mainly because of the lack of knowledge for the selection of appropriate biclustering tools and further supporting computational techniques in specific studies. Here, we first deliver a brief introduction to the existing biclustering algorithms and tools in public domain, and then systematically summarize the basic applications of biclustering for biological data and more advanced applications of biclustering for biomedical data. This review will assist researchers to effectively analyze their big data and generate valuable biological knowledge and novel insights with higher efficiency.
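To make the idea of clustering rows and columns simultaneously concrete, the sketch below scores a candidate bicluster with the mean squared residue, the coherence measure used in the 2000 gene-expression formulation by Cheng and Church. It is an illustration only and is not taken from the review itself; the function name `mean_squared_residue` and the toy matrix `expr` are hypothetical.

```python
def mean_squared_residue(matrix, rows, cols):
    """Mean squared residue of the submatrix picked out by `rows` and `cols`.

    A score of 0 means the selected genes (rows) follow a perfectly additive
    pattern across the selected conditions (columns).
    """
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n_rows, n_cols = len(rows), len(cols)

    row_means = [sum(r) / n_cols for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    overall_mean = sum(row_means) / n_rows

    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            residue = sub[i][j] - row_means[i] - col_means[j] + overall_mean
            total += residue ** 2
    return total / (n_rows * n_cols)


# Hypothetical toy expression matrix: genes 0-2 are coherent under
# conditions 0-2, while gene 3 and condition 3 break the pattern.
expr = [
    [1.0, 2.0, 3.0, 9.0],
    [2.0, 3.0, 4.0, 0.5],
    [5.0, 6.0, 7.0, 4.0],
    [7.0, 1.0, 8.0, 2.0],
]

coherent_score = mean_squared_residue(expr, [0, 1, 2], [0, 1, 2])
full_score = mean_squared_residue(expr, [0, 1, 2, 3], [0, 1, 2, 3])
```

A biclustering algorithm would search for row and column subsets that keep this score low while keeping the submatrix large; the search strategy is where the tools surveyed in the review differ.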

6. Prathiviraj R, Chellapandi P. Modeling a global regulatory network of Methanothermobacter thermautotrophicus strain ∆H. Network Modeling Analysis in Health Informatics and Bioinformatics 2020. [DOI: 10.1007/s13721-020-0223-3] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 12/25/2022]

7. Genome-scale exploration of transcriptional regulation in the nisin Z producer Lactococcus lactis subsp. lactis IO-1. Sci Rep 2020; 10:3787. [PMID: 32123183 PMCID: PMC7051946 DOI: 10.1038/s41598-020-59731-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 08/27/2019] [Accepted: 01/13/2020] [Indexed: 02/06/2023]
Abstract
Transcription is one of the most crucial steps of gene expression in bacteria, and its regulation guarantees the bacteria's ability to adapt to varying environmental conditions. Discovering the molecular basis and genomic principles of transcriptional regulation is thus one of the most important tasks in cellular and molecular biology. Here, a comprehensive phylogenetic footprinting framework was implemented to predict maximal regulons of Lactococcus lactis subsp. lactis IO-1, a lactic acid bacterium known for its high potential in nisin Z production as well as efficient xylose consumption, which have made it a promising biotechnological strain. A total set of 321 regulons covering more than 90% of all the bacterium's operons has been elucidated and validated according to available data. Multiple novel biologically relevant members were introduced, among which arsC, mtlA and the mtl operon can be named for the BusR, MtlR and XylR regulons, respectively. Moreover, the effect of riboflavin on nisin biosynthesis was assessed in vitro and a negative correlation was observed. It is believed that insights from such networks not only can be useful for studying the transcriptional regulatory potential of the target organism but also can be applied in biotechnology to rationally design favorable production conditions.

8. Ledezma-Tejeida D, Altamirano-Pacheco L, Fajardo V, Collado-Vides J. Limits to a classic paradigm: most transcription factors in E. coli regulate genes involved in multiple biological processes. Nucleic Acids Res 2020; 47:6656-6667. [PMID: 31194874 PMCID: PMC6649764 DOI: 10.1093/nar/gkz525] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 04/19/2019] [Revised: 05/29/2019] [Accepted: 06/04/2019] [Indexed: 01/12/2023]
Abstract
Transcription factors (TFs) are important drivers of cellular decision-making. When bacteria encounter a change in the environment, TFs alter the expression of a defined set of genes in order to adequately respond. It is commonly assumed that genes regulated by the same TF are involved in the same biological process. Examples of this are methods that rely on coregulation to infer function of not-yet-annotated genes. We have previously shown that only 21% of TFs involved in metabolism regulate functionally homogeneous genes, based on the proximity of the gene products’ catalyzed reactions in the metabolic network. Here, we provide more evidence to support the claim that a 1-TF/1-process relationship is not a general property. We show that the observed functional heterogeneity of regulons is not a result of the quality of the annotation of regulatory interactions, nor the absence of protein–metabolite interactions, and that it is also present when function is defined by Gene Ontology terms. Furthermore, the observed functional heterogeneity is different from the one expected by chance, supporting the notion that it is a biological property. To further explore the relationship between transcriptional regulation and metabolism, we analyzed five other types of regulatory groups and identified complex regulons (i.e. genes regulated by the same combination of TFs) as the most functionally homogeneous, and this is supported by coexpression data. Whether higher levels of related functions exist beyond metabolism and current functional annotations remains an open question.
Affiliation(s)
- Daniela Ledezma-Tejeida
  - Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, Mexico
  - Department of Biology, Institute of Molecular Systems Biology, ETH Zürich, Zurich, Switzerland
- Luis Altamirano-Pacheco
  - Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, Mexico
- Vicente Fajardo
  - Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, Mexico
- Julio Collado-Vides
  - Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, Mexico
  - Department of Biomedical Engineering, Boston University, Boston, MA, USA

9. 2CS-CHXT Operon Signature of Chlorhexidine Tolerance among Enterococcus faecium Isolates. Appl Environ Microbiol 2019; 85:AEM.01589-19. [PMID: 31562170 DOI: 10.1128/aem.01589-19] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 07/13/2019] [Accepted: 09/20/2019] [Indexed: 11/20/2022]
Abstract
Chlorhexidine (CHX) is a broad-spectrum antiseptic widely used in community and clinical contexts for many years that has recently acquired higher relevance in nosocomial infection control worldwide. Despite this, CHX tolerance among Enterococcus faecium bacteria, representing one of the leading agents causing nosocomial infections, has been poorly understood. This study provides new phenotypic and molecular data for better identification of CHX-tolerant E. faecium subpopulations in community and clinical contexts. The chlorhexidine MIC (MICCHX) distribution of 106 E. faecium isolates suggested the occurrence of tolerant subpopulations in diverse sources (human, animal, food, environment) and phylogenomic backgrounds (clades A1/A2/B), with predominance in clade A1. They carried a specific variant of the 2CS-CHXT operon, identified here. It encodes glucose and amino acid-polyamine-organocation family transporters, besides the DNA-binding response regulator ChtR, with a P102H mutation previously described only in CHX-tolerant clade A1 E. faecium, and the ChtS sensor. 2CS-CHXT seems to be associated with three regulons modulating diverse bacterial biological functions. Combined data from normal MIC distribution and 2CS-CHXT operon characterization support a tentative epidemiological cutoff (ECOFF) of 8 mg/liter to CHX, which is useful to detect tolerant E. faecium populations in future surveillance studies. The spread of tolerant E. faecium in diverse epidemiological backgrounds calls for the prudent use of CHX in multiple contexts.
IMPORTANCE: Chlorhexidine is one of the substances included in the World Health Organization's list of essential medicines, which comprises the safest and most effective medicines needed in global health systems. Although it has been widely applied as a disinfectant and antiseptic in health care (skin, hands, mouthwashes, eye drops) since the 1950s, its use in hospitals to prevent nosocomial infections has increased worldwide in recent years. Here, we provide a comprehensive study on chlorhexidine tolerance among strains of Enterococcus faecium, one of the leading nosocomial agents worldwide, and identify a novel 2CS-CHXT operon as a signature of tolerant strains occurring in diverse phylogenomic groups. Our data allowed for the proposal of a tentative epidemiological cutoff of 8 mg/liter, which is useful to detect tolerant E. faecium populations in surveillance studies in community and clinical contexts. The prediction of 2CS-CHXT regulons will also facilitate the design of future experimental studies to better uncover chlorhexidine tolerance among E. faecium bacteria.

10. Liu B, Han L, Liu X, Wu J, Ma Q. Computational Prediction of Sigma-54 Promoters in Bacterial Genomes by Integrating Motif Finding and Machine Learning Strategies. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2019; 16:1211-1218. [PMID: 29993815 DOI: 10.1109/tcbb.2018.2816032] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Indexed: 06/08/2023]
Abstract
The sigma factor, as a subunit of the RNA polymerase holoenzyme, is a critical factor in the process of gene transcriptional regulation. It recognizes specific DNA sites and brings the core enzyme of RNA polymerase to the upstream regions of target genes. Therefore, the prediction of the promoters for a particular sigma factor is essential for interpreting functional genomic data and observations. This paper develops a new method to predict sigma-54 promoters in bacterial genomes. The new method integrates motif finding and machine learning strategies to capture the intrinsic features of sigma-54 promoters. Experiments on an E. coli benchmark test set show that our method has good capability to distinguish sigma-54 promoters from surrounding or randomly selected DNA sequences. Applications to three other bacterial genomes indicate the potential robustness and applicative power of our method on a large number of bacterial genomes. The source code of our method can be freely downloaded at https://github.com/maqin2001/PromotePredictor.

11. Chen X, Ma A, McDermaid A, Zhang H, Liu C, Cao H, Ma Q. RECTA: Regulon Identification Based on Comparative Genomics and Transcriptomics Analysis. Genes (Basel) 2018; 9:genes9060278. [PMID: 29849014 PMCID: PMC6027394 DOI: 10.3390/genes9060278] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 03/30/2018] [Revised: 05/19/2018] [Accepted: 05/25/2018] [Indexed: 11/16/2022]
Abstract
Regulons, which serve as co-regulated gene groups contributing to the transcriptional regulation of microbial genomes, have the potential to aid in understanding of underlying regulatory mechanisms. In this study, we designed a novel computational pipeline, regulon identification based on comparative genomics and transcriptomics analysis (RECTA), for regulon prediction related to the gene regulatory network under certain conditions. To demonstrate the effectiveness of this tool, we implemented RECTA on Lactococcus lactis MG1363 data to elucidate acid-response regulons. A total of 51 regulons were identified, 14 of which have computational-verified significance. Among these 14 regulons, five of them were computationally predicted to be connected with acid stress response. Validated by literature, 33 genes in Lactococcus lactis MG1363 were found to have orthologous genes which were associated with six regulons. An acid response related regulatory network was constructed, involving two trans-membrane proteins, eight regulons (llrA, llrC, hllA, ccpA, NHP6A, rcfB, regulons #8 and #39), nine functional modules, and 33 genes with orthologous genes known to be associated with acid stress. The predicted response pathways could serve as promising candidates for better acid tolerance engineering in Lactococcus lactis. Our RECTA pipeline provides an effective way to construct a reliable gene regulatory network through regulon elucidation, and has strong application power and can be effectively applied to other bacterial genomes where the elucidation of the transcriptional regulation network is needed.
Affiliation(s)
- Xin Chen
  - Center for Applied Mathematics, Tianjin University, Tianjin 300072, China
- Anjun Ma
  - Bioinformatics and Mathematical Biosciences Lab, Department of Agronomy, Horticulture and Plant Science, South Dakota State University, Brookings, SD 57006, USA
  - Department of Mathematics and Statistics, South Dakota State University, Brookings, SD 57006, USA
- Adam McDermaid
  - Bioinformatics and Mathematical Biosciences Lab, Department of Agronomy, Horticulture and Plant Science, South Dakota State University, Brookings, SD 57006, USA
  - Department of Mathematics and Statistics, South Dakota State University, Brookings, SD 57006, USA
- Hanyuan Zhang
  - College of Computer Science and Engineering, University of Nebraska Lincoln, Lincoln, NE 68588, USA
- Chao Liu
  - Shandong Provincial Hospital affiliated to Shandong University, Jinan 250021, China
- Huansheng Cao
  - Center for Fundamental and Applied Microbiomics, Biodesign Institute, Arizona State University, Tempe, AZ 85287, USA
- Qin Ma
  - Bioinformatics and Mathematical Biosciences Lab, Department of Agronomy, Horticulture and Plant Science, South Dakota State University, Brookings, SD 57006, USA
  - Department of Mathematics and Statistics, South Dakota State University, Brookings, SD 57006, USA

12. Matern WM, Rifat D, Bader JS, Karakousis PC. Gene Enrichment Analysis Reveals Major Regulators of Mycobacterium tuberculosis Gene Expression in Two Models of Antibiotic Tolerance. Front Microbiol 2018; 9:610. [PMID: 29670589 PMCID: PMC5893760 DOI: 10.3389/fmicb.2018.00610] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 10/18/2017] [Accepted: 03/15/2018] [Indexed: 01/10/2023]
Abstract
The development of antibiotic tolerance is believed to be a major factor in the lengthy duration of current tuberculosis therapies. In the current study, we have modeled antibiotic tolerance in vitro by exposing Mycobacterium tuberculosis to two distinct stress conditions: progressive hypoxia and nutrient starvation [phosphate-buffered saline (PBS)]. We then studied the bacterial transcriptional response using RNA-seq and employed a bioinformatics approach to identify important transcriptional regulators, which was facilitated by a novel Regulon Enrichment Test (RET). A total of 17 transcription factor (TF) regulons were enriched in the hypoxia gene set and 16 regulons were enriched in the nutrient starvation, with 12 regulons enriched in both conditions. Using the same approach to analyze previously published gene expression datasets, we found that three M. tuberculosis regulons (Rv0023, SigH, and Crp) were commonly induced in both stress conditions and were also among the regulons enriched in our data. These regulators are worthy of further study to determine their potential role in the development and maintenance of antibiotic tolerance in M. tuberculosis following stress exposure.
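The abstract does not say which statistic the Regulon Enrichment Test (RET) uses. As a generic illustration of regulon enrichment, and not a reconstruction of RET, the sketch below applies a one-sided hypergeometric over-representation test, a common choice for gene-set enrichment. The function name and all gene counts are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(population, successes, draws, observed):
    """P(X >= observed) for a hypergeometric variable X.

    population: total number of genes in the genome
    successes:  genes belonging to the regulon of interest
    draws:      genes in the differentially expressed set
    observed:   regulon genes found in that set
    """
    max_overlap = min(successes, draws)
    favourable = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(observed, max_overlap + 1)
    )
    return favourable / comb(population, draws)


# Hypothetical counts, chosen only to show the call: a 4000-gene genome,
# a 50-gene regulon, 200 genes induced under stress, 12 of them in the regulon.
p_value = hypergeom_upper_tail(population=4000, successes=50,
                               draws=200, observed=12)
```

When many regulons are tested against the same gene set, the resulting p-values would normally be corrected for multiple testing before any regulon is called enriched.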
Affiliation(s)
- William M Matern
  - Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, United States
  - Department of Biomedical Engineering and High-Throughput Biology Center, Johns Hopkins University School of Medicine, Baltimore, MD, United States
- Dalin Rifat
  - Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, United States
- Joel S Bader
  - Department of Biomedical Engineering and High-Throughput Biology Center, Johns Hopkins University School of Medicine, Baltimore, MD, United States
- Petros C Karakousis
  - Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, United States
  - Department of International Health, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, United States

13. Liu B, Yang J, Li Y, McDermaid A, Ma Q. An algorithmic perspective of de novo cis-regulatory motif finding based on ChIP-seq data. Brief Bioinform 2017; 19:1069-1081. [DOI: 10.1093/bib/bbx026] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Received: 12/20/2016] [Indexed: 01/06/2023]
Affiliation(s)
- Bingqiang Liu
  - School of Mathematics, Shandong University, Jinan Shandong, P. R. China
- Jinyu Yang
  - Department of Mathematics and Statistics, South Dakota State University, Brookings, SD, USA
- Yang Li
  - School of Mathematics, Shandong University, Jinan Shandong, P. R. China
- Adam McDermaid
  - Department of Mathematics and Statistics, South Dakota State University, Brookings, SD, USA
- Qin Ma
  - Department of Agronomy, Horticulture and Plant Science, South Dakota State University, Brookings, SD, USA

14. Liu B, Zhang H, Zhou C, Li G, Fennell A, Wang G, Kang Y, Liu Q, Ma Q. An integrative and applicable phylogenetic footprinting framework for cis-regulatory motifs identification in prokaryotic genomes. BMC Genomics 2016; 17:578. [PMID: 27507169 PMCID: PMC4977642 DOI: 10.1186/s12864-016-2982-x] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 04/05/2016] [Accepted: 07/29/2016] [Indexed: 11/10/2022]
Abstract
Background: Phylogenetic footprinting is an important computational technique for identifying cis-regulatory motifs in orthologous regulatory regions from multiple genomes, as motifs tend to evolve slower than their surrounding non-functional sequences. Its application, however, has several difficulties for optimizing the selection of orthologous data and reducing the false positives in motif prediction.
Results: Here we present an integrative phylogenetic footprinting framework for accurate motif predictions in prokaryotic genomes (MP3). The framework includes a new orthologous data preparation procedure, an additional promoter scoring and pruning method and an integration of six existing motif finding algorithms as basic motif search engines. Specifically, we collected orthologous genes from available prokaryotic genomes and built the orthologous regulatory regions based on sequence similarity of promoter regions. This procedure made full use of the large-scale genomic data and taxonomy information and filtered out the promoters with limited contribution to produce a high quality orthologous promoter set. The promoter scoring and pruning is implemented through motif voting by a set of complementary predicting tools that mine as many motif candidates as possible and simultaneously eliminate the effect of random noise. We have applied the framework to Escherichia coli k12 genome and evaluated the prediction performance through comparison with seven existing programs. This evaluation was systematically carried out at the nucleotide and binding site level, and the results showed that MP3 consistently outperformed other popular motif finding tools. We have integrated MP3 into our motif identification and analysis server DMINDA, allowing users to efficiently identify and analyze motifs in 2,072 completely sequenced prokaryotic genomes.
Conclusion: The performance evaluation indicated that MP3 is effective for predicting regulatory motifs in prokaryotic genomes. Its application may enhance progress in elucidating transcription regulation mechanism, thus provide benefit to the genomic research community and prokaryotic genome researchers in particular.
Electronic supplementary material: The online version of this article (doi:10.1186/s12864-016-2982-x) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Bingqiang Liu
  - School of Mathematics, Shandong University, Jinan, 250100, China
- Hanyuan Zhang
  - Systems Biology and Biomedical Informatics (SBBI) Laboratory University of Nebraska-Lincoln, Lincoln, NE, 68588-0115, USA
- Chuan Zhou
  - School of Mathematics, Shandong University, Jinan, 250100, China
- Guojun Li
  - School of Mathematics, Shandong University, Jinan, 250100, China
- Anne Fennell
  - Department of Agronomy, Horticulture, and Plant Science, South Dakota State University, Brookings, SD, 57007, USA
  - BioSNTR, Brookings, SD, USA
- Guanghui Wang
  - School of Mathematics, Shandong University, Jinan, 250100, China
- Yu Kang
  - CAS Key Laboratory of Genome Sciences and information, Beijing Institute of Genomics of CAS, Beijing, 100101, People's Republic of China
- Qi Liu
  - Department of Bioinformatics, School of Life Sciences and Technology, Tongji University, Shanghai, China
- Qin Ma
  - Department of Agronomy, Horticulture, and Plant Science, South Dakota State University, Brookings, SD, 57007, USA
  - BioSNTR, Brookings, SD, USA