1
Waldburger L, Thompson MG, Weisberg AJ, Lee N, Chang JH, Keasling JD, Shih PM. Transcriptome architecture of the three main lineages of agrobacteria. mSystems 2023; 8:e0033323. [PMID: 37477440; PMCID: PMC10469942; DOI: 10.1128/msystems.00333-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2023] [Accepted: 06/15/2023] [Indexed: 07/22/2023] Open
Abstract
Agrobacteria are a diverse, polyphyletic group of prokaryotes with multipartite genomes capable of transferring DNA into the genomes of host plants, making them an essential tool in plant biotechnology. Despite their utility in plant transformation, genome-wide transcriptional regulation is not well understood across the three main lineages of agrobacteria. Transcription start sites (TSSs) are a necessary component of gene expression and regulation. In this study, we used differential RNA-seq and a TSS identification algorithm optimized on manually annotated TSS, then validated with existing TSS to identify thousands of TSS with nucleotide resolution for representatives of each lineage. We extend upon the 356 TSSs previously reported in Agrobacterium fabrum C58 by identifying 1,916 TSSs. In addition, we completed genomes and phenotyping of Rhizobium rhizogenes C16/80 and Allorhizobium vitis T60/94, identifying 2,650 and 2,432 TSSs, respectively. Parameter optimization was crucial for an accurate, high-resolution view of genome and transcriptional dynamics, highlighting the importance of algorithm optimization in genome-wide TSS identification and genomics at large. The optimized algorithm reduced the number of TSSs identified internal and antisense to the coding sequence on average by 90.5% and 91.9%, respectively. Comparison of TSS conservation between orthologs of the three lineages revealed differences in cell cycle regulation of ctrA as well as divergence of transcriptional regulation of chemotaxis-related genes when grown in conditions that simulate the plant environment. These results provide a framework to elucidate the mechanistic basis and evolution of pathology across the three main lineages of agrobacteria. IMPORTANCE Transcription start sites (TSSs) are fundamental for understanding gene expression and regulation. 
Agrobacteria, a group of prokaryotes with the ability to transfer DNA into the genomes of host plants, are widely used in plant biotechnology. However, the genome-wide transcriptional regulation of agrobacteria is not well understood, especially in less-studied lineages. Differential RNA-seq and an optimized algorithm enabled identification of thousands of TSSs with nucleotide resolution for representatives of each lineage. The results of this study provide a framework for elucidating the mechanistic basis and evolution of pathology across the three main lineages of agrobacteria. The optimized algorithm also highlights the importance of parameter optimization in genome-wide TSS identification and genomics at large.
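The abstract above distinguishes TSSs that fall inside a coding sequence (internal) from those on the opposite strand (antisense). As a rough illustration of that kind of labeling, and not the authors' algorithm, the Python sketch below assigns labels from gene coordinates alone. The `Gene` record, the `classify_tss` function, the `upstream` and `orphan` labels, and the 300-nt upstream window are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene record; coordinates are 1-based and inclusive."""
    name: str
    start: int   # leftmost coordinate, regardless of strand
    end: int     # rightmost coordinate, regardless of strand
    strand: str  # "+" or "-"


def classify_tss(pos, strand, genes, upstream_window=300):
    """Label one TSS relative to annotated coding sequences.

    upstream_window is an assumed cutoff, not a value taken from the study.
    A TSS can receive several labels when genes overlap.
    """
    labels = set()
    for g in genes:
        inside = g.start <= pos <= g.end
        if strand == g.strand:
            if inside:
                # Same strand, within the coding sequence.
                labels.add("internal")
            else:
                # Distance from the TSS to the first coding base,
                # positive only when the TSS lies upstream of the gene.
                dist = g.start - pos if g.strand == "+" else pos - g.end
                if 0 < dist <= upstream_window:
                    labels.add("upstream")
        elif inside:
            # Opposite strand, overlapping the coding sequence.
            labels.add("antisense")
    return labels or {"orphan"}


if __name__ == "__main__":
    genes = [Gene("geneA", 1000, 2000, "+"), Gene("geneB", 3000, 4000, "-")]
    for pos, strand in [(950, "+"), (1500, "+"), (1500, "-"), (2500, "+")]:
        print(pos, strand, sorted(classify_tss(pos, strand, genes)))
```

A real pipeline would also have to handle circular replicons, leaderless transcripts, and read-coverage evidence, none of which this sketch attempts.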
Affiliation(s)
- Lucas Waldburger
  - Department of Bioengineering, University of California, Berkeley, California, USA
  - Joint BioEnergy Institute, Emeryville, California, USA
  - Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA
  - Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA
- Mitchell G. Thompson
  - Joint BioEnergy Institute, Emeryville, California, USA
  - Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA
- Alexandra J. Weisberg
  - Department of Botany and Plant Pathology, Oregon State University, Corvallis, Oregon, USA
- Namil Lee
  - Joint BioEnergy Institute, Emeryville, California, USA
  - Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA
  - Department of Chemical and Biomolecular Engineering, University of California, Berkeley, California, USA
- Jeff H. Chang
  - Department of Botany and Plant Pathology, Oregon State University, Corvallis, Oregon, USA
- Jay D. Keasling
  - Joint BioEnergy Institute, Emeryville, California, USA
  - Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA
  - Department of Chemical and Biomolecular Engineering, University of California, Berkeley, California, USA
  - Institute for Quantitative Biosciences, University of California, Berkeley, California, USA
  - Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kongens Lyngby, Denmark
  - Center for Synthetic Biochemistry, Institute for Synthetic Biology, Shenzhen Institutes for Advanced Technologies, Shenzhen, China
- Patrick M. Shih
  - Joint BioEnergy Institute, Emeryville, California, USA
  - Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA
  - Department of Plant and Microbial Biology, University of California, Berkeley, California, USA
2
Barbero-Aparicio JA, Olivares-Gil A, Díez-Pastor JF, García-Osorio C. Deep learning and support vector machines for transcription start site identification. PeerJ Comput Sci 2023; 9:e1340. [PMID: 37346545; PMCID: PMC10280436; DOI: 10.7717/peerj-cs.1340] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2022] [Accepted: 03/21/2023] [Indexed: 06/23/2023]
Abstract
Recognizing transcription start sites is key to gene identification. Several approaches have been employed in related problems such as detecting translation initiation sites or promoters, many of the most recent ones being based on machine learning. Deep learning methods have proven exceptionally effective for this task, but their use in transcription start site identification has not yet been explored in depth. Also, the very few existing works neither compare their methods to support vector machines (SVMs), the most established technique in this area of study, nor provide the curated dataset used in the study. The small number of published papers on this specific problem could be explained by this lack of datasets. Given that both support vector machines and deep neural networks have been applied to related problems with remarkable results, we compared their performance in transcription start site prediction, concluding that SVMs are computationally much slower and that deep learning methods, especially long short-term memory neural networks (LSTMs), are better suited than SVMs to working with sequences. For this purpose, we used the reference human genome GRCh38. Additionally, we studied two different aspects related to data processing: the proper way to generate training examples and the imbalanced nature of the data. Furthermore, the generalization performance of the models studied was also tested using the mouse genome, where the LSTM neural network stood out from the rest of the algorithms. To sum up, this article provides an analysis of the best architecture choices in transcription start site identification, as well as a method to generate transcription start site datasets, including negative instances, for any species available in Ensembl. We found that deep learning methods are better suited than SVMs to solving this problem, being more efficient and better adapted to long sequences and large amounts of data. We also create a transcription start site (TSS) dataset large enough to be used in deep learning experiments.
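The abstract mentions two data-processing questions: how to generate training examples and how to handle class imbalance. The Python sketch below shows one simple way such a dataset could be assembled. It is an illustration under stated assumptions, not the procedure used in the article: the function name `make_windows`, the 0-based coordinates, the window size (`flank`), the exclusion distance around known TSSs (`min_gap`), and the negative-to-positive ratio (`neg_per_pos`) are all hypothetical.

```python
import random


def make_windows(sequence, tss_positions, flank=5, neg_per_pos=2,
                 min_gap=10, seed=0):
    """Build labelled windows around TSS (label 1) and non-TSS (label 0) centres.

    Positions are 0-based indices into `sequence`. Negative centres are
    sampled at least `min_gap` bases away from every annotated TSS, and
    `neg_per_pos` sets the (assumed) class imbalance of the result.
    Returns a list of (centre, window, label) tuples.
    """
    rng = random.Random(seed)  # seeded for reproducible sampling
    tss = sorted(set(tss_positions))
    lo, hi = flank, len(sequence) - flank - 1  # centres whose window fits

    def window(centre):
        return sequence[centre - flank:centre + flank + 1]

    positives = [(p, window(p), 1) for p in tss if lo <= p <= hi]
    candidates = [c for c in range(lo, hi + 1)
                  if all(abs(c - p) >= min_gap for p in tss)]
    n_neg = min(len(candidates), neg_per_pos * len(positives))
    negatives = [(c, window(c), 0) for c in rng.sample(candidates, n_neg)]
    return positives + negatives


if __name__ == "__main__":
    toy_sequence = "ACGT" * 25  # 100-nt toy sequence, not real genomic data
    for centre, win, label in make_windows(toy_sequence, [20, 60]):
        print(centre, win, label)
```

Raising `neg_per_pos` reproduces a more heavily imbalanced dataset, which is the situation the abstract says needs separate study.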
Affiliation(s)
- Alicia Olivares-Gil
  - Departamento de Ingeniería Informática, Universidad de Burgos, Burgos, Spain
- José F. Díez-Pastor
  - Departamento de Ingeniería Informática, Universidad de Burgos, Burgos, Spain
- César García-Osorio
  - Departamento de Ingeniería Informática, Universidad de Burgos, Burgos, Spain
3
Barbero-Aparicio JA, Cuesta-Lopez S, García-Osorio CI, Pérez-Rodríguez J, García-Pedrajas N. Nonlinear physics opens a new paradigm for accurate transcription start site prediction. BMC Bioinformatics 2022; 23:565. [PMID: 36585618; PMCID: PMC9801560; DOI: 10.1186/s12859-022-05129-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2022] [Accepted: 12/27/2022] [Indexed: 12/31/2022] Open
Abstract
There is evidence that DNA breathing (spontaneous opening of the DNA strands) plays a relevant role in the interactions of DNA with other molecules, and in particular in the transcription process. Therefore, having physical models that can predict these openings is of interest. However, this source of information has not been used before either in transcription start sites (TSSs) or promoter prediction. In this article, one such model is used as an additional information source that, when used by a machine learning (ML) model, improves the results of current methods for the prediction of TSSs. In addition, we provide evidence on the validity of the physical model, as it is able by itself to predict TSSs with high accuracy. This opens an exciting avenue of research at the intersection of statistical mechanics and ML, where ML models in bioinformatics can be improved using physical models of DNA as feature extractors.
Affiliation(s)
- José Antonio Barbero-Aparicio
  - Departamento de Informática, Universidad de Burgos, Avda. de Cantabria s/n, 09006 Burgos, Spain (GRID: grid.23520.36; ISNI: 0000 0000 8569 1592)
- Santiago Cuesta-Lopez
  - Universidad de Burgos, Hospital del Rey, s/n, 09001 Burgos, Spain (GRID: grid.23520.36; ISNI: 0000 0000 8569 1592)
  - ICAMCyL Foundation, International Center for Advanced Materials and Raw Materials of Castilla y León, León Technology Park, main building, first floor, offices 106-108, C/Julia Morros s/n, Armunia, 24009 León, Spain
- César Ignacio García-Osorio
  - Departamento de Informática, Universidad de Burgos, Avda. de Cantabria s/n, 09006 Burgos, Spain (GRID: grid.23520.36; ISNI: 0000 0000 8569 1592)
- Javier Pérez-Rodríguez
  - Departamento de Métodos Cuantitativos, Universidad de Loyola Andalucía, Escritor Castilla Aguayo, 4, 14004 Córdoba, Spain (GRID: grid.449008.1; ISNI: 0000 0004 1795 4150)
- Nicolás García-Pedrajas
  - Department of Computing and Numerical Analysis, University of Córdoba, Edificio Albert Einstein, Campus de Rabanales, 14071 Córdoba, Spain (GRID: grid.411901.c; ISNI: 0000 0001 2183 9102)
4
Forquet R, Jiang X, Nasser W, Hommais F, Reverchon S, Meyer S. Mapping the Complex Transcriptional Landscape of the Phytopathogenic Bacterium Dickeya dadantii. mBio 2022; 13:e0052422. [PMID: 35491820; PMCID: PMC9239193; DOI: 10.1128/mbio.00524-22] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 02/24/2022] [Accepted: 04/07/2022] [Indexed: 11/21/2022] Open
Abstract
Dickeya dadantii is a phytopathogenic bacterium that causes soft rot in a wide range of plant hosts worldwide and a model organism for studying virulence gene regulation. The present study provides a comprehensive and annotated transcriptomic map of D. dadantii obtained by a computational method combining five independent transcriptomic data sets: (i) paired-end RNA sequencing (RNA-seq) data for a precise reconstruction of the RNA landscape; (ii) DNA microarray data providing transcriptional responses to a broad variety of environmental conditions; (iii) long-read Nanopore native RNA-seq data for isoform-level transcriptome validation and determination of transcription termination sites; (iv) differential RNA sequencing (dRNA-seq) data for the precise mapping of transcription start sites; (v) in planta DNA microarray data for a comparison of gene expression profiles between in vitro experiments and the early stages of plant infection. Our results show that transcription units sometimes coincide with predicted operons but are generally longer, most of them comprising internal promoters and terminators that generate alternative transcripts of variable gene composition. We characterize the occurrence of transcriptional read-through at terminators, which might play a basal regulation role and explain the extent of transcription beyond the scale of operons. We finally highlight the presence of noncontiguous operons and excludons in the D. dadantii genome, novel genomic arrangements that might contribute to the basal coordination of transcription. The highlighted transcriptional organization may allow D. dadantii to finely adjust its gene expression program for a rapid adaptation to fast-changing environments. IMPORTANCE This is the first transcriptomic map of a Dickeya species. It may therefore significantly contribute to further progress in the field of phytopathogenicity. 
It is also one of the first reported applications of long-read Nanopore native RNA-seq in prokaryotes. Our findings yield insights into basal rules of coordination of transcription that might be valid for other bacteria and may raise interest in the field of microbiology in general. In particular, we demonstrate that gene expression is coordinated at the scale of transcription units rather than operons, which are larger functional genomic units capable of generating transcripts with variable gene composition for a fine-tuning of gene expression in response to environmental changes. In line with recent studies, our findings indicate that the canonical operon model is insufficient to explain the complexity of bacterial transcriptomes.
Affiliation(s)
- Raphaël Forquet
  - Université de Lyon, INSA-Lyon, Université Claude Bernard Lyon 1, CNRS UMR5240, Laboratoire de Microbiologie, Adaptation, Pathogénie, Villeurbanne, France
- Xuejiao Jiang
  - Université de Lyon, INSA-Lyon, Université Claude Bernard Lyon 1, CNRS UMR5240, Laboratoire de Microbiologie, Adaptation, Pathogénie, Villeurbanne, France
- William Nasser
  - Université de Lyon, INSA-Lyon, Université Claude Bernard Lyon 1, CNRS UMR5240, Laboratoire de Microbiologie, Adaptation, Pathogénie, Villeurbanne, France
- Florence Hommais
  - Université de Lyon, INSA-Lyon, Université Claude Bernard Lyon 1, CNRS UMR5240, Laboratoire de Microbiologie, Adaptation, Pathogénie, Villeurbanne, France
- Sylvie Reverchon
  - Université de Lyon, INSA-Lyon, Université Claude Bernard Lyon 1, CNRS UMR5240, Laboratoire de Microbiologie, Adaptation, Pathogénie, Villeurbanne, France
- Sam Meyer
  - Université de Lyon, INSA-Lyon, Université Claude Bernard Lyon 1, CNRS UMR5240, Laboratoire de Microbiologie, Adaptation, Pathogénie, Villeurbanne, France
5
Hamde F, Dinka H, Naimuddin M. In silico analysis of promoter regions to identify regulatory elements in TetR family transcriptional regulatory genes of Mycobacterium colombiense CECT 3035. J Genet Eng Biotechnol 2022; 20:53. [PMID: 35357597; PMCID: PMC8971250; DOI: 10.1186/s43141-022-00331-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/30/2021] [Accepted: 03/09/2022] [Indexed: 12/18/2022]
Abstract
Background: Mycobacterium colombiense is an acid-fast, non-motile, rod-shaped mycobacterium confirmed to cause respiratory disease and disseminated infection in immune-compromised patients, and lymphadenopathy in immune-competent children. It has virulence mechanisms that allow it to adapt, survive, replicate, and cause disease in the host. To tackle the diseases caused by M. colombiense, an understanding of the regulatory mechanisms of its genes is important. This paper, therefore, analyzes transcription start sites, promoter regions, motifs, transcription factors, and CpG islands in TetR family transcriptional regulatory (TFTR) genes of M. colombiense CECT 3035 using neural network promoter prediction, the MEME and TOMTOM algorithms, and evolutionary analysis with the help of MEGA-X.
Results: The analysis of 22 protein-coding TFTR genes of M. colombiense CECT 3035 showed that 86.36% and 13.64% of the gene sequences had one and two TSSs, respectively. Using MEME, we identified five motifs (MTF1, MTF2, MTF3, MTF4, and MTF5), and MTF1 was revealed as the common promoter motif for 100% of the TFTR genes of M. colombiense CECT 3035, which may serve as a binding site for transcription factors that shared a minimum homology of 95.45%. MTF1 was compared to the registered prokaryotic motifs and found to match 15 of them. MTF1 serves as the binding site mainly for the AraC, LexA, and bacterial histone-like protein families. Other protein families such as MATP, RR, σ-70 factor, TetR, LytTR, LuxR, and NAP also appear to be binding candidates for MTF1. These families are known to have functions in virulence mechanisms, metabolism, quorum sensing, cell division, and antibiotic resistance. Furthermore, it was found that TFTR genes of M. colombiense CECT 3035 have many CpG islands with several fragments in their CpG islands. Molecular evolutionary genetic analysis showed a close relationship among the genes.
Conclusion: We believe these findings will provide a better understanding of the regulation of TFTR genes in M. colombiense CECT 3035 involved in vital processes such as cell division, pathogenesis, and drug resistance and are likely to provide insights for drug development important to tackle the diseases caused by this mycobacterium. We believe this is the first report of in silico analyses of the transcriptional regulation of M. colombiense TFTR genes.
Affiliation(s)
- Feyissa Hamde
  - Department of Applied Biology, School of Applied Natural Science, Adama Science and Technology University, P.O. Box 1888, Adama, Ethiopia
- Hunduma Dinka
  - Department of Applied Biology, School of Applied Natural Science, Adama Science and Technology University, P.O. Box 1888, Adama, Ethiopia
- Mohammed Naimuddin
  - Department of Applied Biology, School of Applied Natural Science, Adama Science and Technology University, P.O. Box 1888, Adama, Ethiopia
6
Webb IUC, Xu J, Sánchez-Cañizares C, Karunakaran R, Ramachandran VK, Rutten PJ, East AK, Huang WE, Watmough NJ, Poole PS. Regulation and Characterization of Mutants of fixABCX in Rhizobium leguminosarum. Molecular Plant-Microbe Interactions: MPMI 2021; 34:1167-1180. [PMID: 34110256; DOI: 10.1094/mpmi-02-21-0037-r] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 06/12/2023]
Abstract
Symbiosis between Rhizobium leguminosarum and Pisum sativum requires tight control of redox balance in order to maintain respiration under the microaerobic conditions required for nitrogenase while still producing the eight electrons and sixteen molecules of ATP needed for nitrogen fixation. FixABCX, a cluster of electron transfer flavoproteins essential for nitrogen fixation, is encoded on the Sym plasmid (pRL10), immediately upstream of nifA, which encodes the general transcriptional regulator of nitrogen fixation. There is a symbiotically regulated NifA-dependent promoter upstream of fixA (PnifA1), as well as an additional basal constitutive promoter driving background expression of nifA (PnifA2). These were confirmed by 5'-end mapping of transcription start sites using differential RNA-seq. Complementation of polar fixAB and fixX mutants (Fix- strains) confirmed expression of nifA from PnifA1 in symbiosis. Electron microscopy combined with single-cell Raman microspectroscopy characterization of fixAB mutants revealed previously unknown heterogeneity in bacteroid morphology within a single nodule. Two morphotypes of mutant fixAB bacteroids were observed. One was larger than wild-type bacteroids and contained high levels of polyhydroxy-3-butyrate, a complex energy/reductant storage product. A second bacteroid phenotype was morphologically and compositionally different and resembled wild-type infection thread cells. From these two characteristic fixAB mutant bacteroid morphotypes, inferences can be drawn on the metabolism of wild-type nitrogen-fixing bacteroids. Copyright © 2021 The Author(s). This is an open access article distributed under the CC BY 4.0 International license.
Affiliation(s)
- Isabel U C Webb
  - Department of Plant Sciences, University of Oxford, South Parks Road, Oxford OX1 3RB, U.K.
  - Department of Molecular Microbiology, John Innes Centre, Norwich Research Park, Norwich NR4 7UH, U.K.
- Jiabao Xu
  - Department of Engineering, University of Oxford, Parks Road, Oxford OX1 3PJ, U.K.
- Ramakrishnan Karunakaran
  - Department of Molecular Microbiology, John Innes Centre, Norwich Research Park, Norwich NR4 7UH, U.K.
- Vinoy K Ramachandran
  - Department of Plant Sciences, University of Oxford, South Parks Road, Oxford OX1 3RB, U.K.
- Paul J Rutten
  - Department of Plant Sciences, University of Oxford, South Parks Road, Oxford OX1 3RB, U.K.
- Alison K East
  - Department of Plant Sciences, University of Oxford, South Parks Road, Oxford OX1 3RB, U.K.
- Wei E Huang
  - Department of Engineering, University of Oxford, Parks Road, Oxford OX1 3PJ, U.K.
- Nicholas J Watmough
  - School of Biological Sciences, University of East Anglia, Norwich Research Park, Norwich, Norfolk NR4 7TJ, U.K.
- Philip S Poole
  - Department of Plant Sciences, University of Oxford, South Parks Road, Oxford OX1 3RB, U.K.
  - Department of Molecular Microbiology, John Innes Centre, Norwich Research Park, Norwich NR4 7UH, U.K.
7
Cervantes-Rivera R, Puhar A. Whole-genome Identification of Transcriptional Start Sites by Differential RNA-seq in Bacteria. Bio Protoc 2020; 10:e3757. [PMID: 33659416; PMCID: PMC7842792; DOI: 10.21769/bioprotoc.3757] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 05/26/2020] [Revised: 07/25/2020] [Accepted: 07/23/2020] [Indexed: 11/02/2022] Open
Abstract
Gene transcription in bacteria often starts some nucleotides upstream of the start codon. Identifying the specific transcriptional start site (TSS) is essential for genetic manipulation, because in many cases sequence elements involved in the regulation of gene expression lie upstream of the start codon. Taking into account the classical gene structure, we can identify two kinds of transcriptional start site: primary and secondary. A primary transcriptional start site is located some nucleotides upstream of the translational start site, while a secondary transcriptional start site is located within the coding sequence of the gene. Here, we present a step-by-step protocol for genome-wide determination of transcriptional start sites by differential RNA-sequencing (dRNA-seq), using the enteric pathogen Shigella flexneri serotype 5a strain M90T as a model. The method can, however, be employed in any other bacterial species of choice. In the first steps, total RNA is purified from bacterial cultures using the hot phenol method. Ribosomal RNA (rRNA) is specifically depleted via hybridization probes using a commercial kit. An RNA library treated with a 5'-monophosphate-dependent exonuclease (TEX), and therefore enriched in primary transcripts, is then prepared for comparison with a library that has not undergone TEX treatment, followed by ligation of an RNA linker adaptor of known sequence that allows TSSs to be determined with single-nucleotide precision. Finally, the RNA is processed for Illumina sequencing library preparation and sequenced as a purchased service. TSSs are identified by in-house bioinformatic analysis. Our protocol is cost-effective, as it minimizes the use of commercial kits and employs freely available software.
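The core comparison in dRNA-seq is between the TEX-treated library, which is enriched in primary transcripts, and the untreated library. The Python sketch below illustrates that comparison on per-position counts of read 5' ends. It is a simplified stand-in for the in-house analysis the protocol refers to, not a reproduction of it: the function name `call_tss`, the input format (dictionaries mapping position to read-start count), the library-size scaling, and the thresholds `min_starts`, `min_enrichment`, and `pseudocount` are all assumptions.

```python
def call_tss(tex_plus, tex_minus, min_starts=10, min_enrichment=2.0,
             pseudocount=1.0):
    """Report positions whose read starts are enriched after TEX treatment.

    tex_plus / tex_minus: dict mapping genomic position -> number of reads
    whose 5' end maps there, for the treated and untreated library.
    All thresholds are illustrative assumptions, not protocol values.
    Returns a list of (position, enrichment_ratio) tuples.
    """
    total_plus = sum(tex_plus.values()) or 1
    total_minus = sum(tex_minus.values()) or 1
    # Scale untreated counts to the sequencing depth of the treated library.
    scale = total_plus / total_minus

    calls = []
    for pos, n_plus in sorted(tex_plus.items()):
        if n_plus < min_starts:
            continue  # too few read starts to support a call
        n_minus = tex_minus.get(pos, 0) * scale
        ratio = (n_plus + pseudocount) / (n_minus + pseudocount)
        if ratio >= min_enrichment:
            calls.append((pos, round(ratio, 2)))
    return calls


if __name__ == "__main__":
    # Toy counts: position 100 is enriched after TEX treatment, position 250
    # is depleted (as a processed 5' end would be), position 400 is too sparse.
    treated = {100: 80, 250: 15, 400: 5}
    untreated = {100: 20, 250: 60, 400: 20}
    print(call_tss(treated, untreated))
```

Processed 5'-monophosphate ends are degraded by TEX, so they lose signal in the treated library, while primary 5'-triphosphate ends are retained; that asymmetry is what the ratio in the sketch is meant to capture.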
Affiliation(s)
- Ramón Cervantes-Rivera
  - The Laboratory for Molecular Infection Medicine Sweden (MIMS), Sweden
  - Umeå Centre for Microbial Research (UCMR), Umeå University, 90 187 Umeå, Sweden
  - Department of Molecular Biology, Umeå University, 90 187 Umeå, Sweden
- Andrea Puhar
  - The Laboratory for Molecular Infection Medicine Sweden (MIMS), Sweden
  - Umeå Centre for Microbial Research (UCMR), Umeå University, 90 187 Umeå, Sweden
  - Department of Molecular Biology, Umeå University, 90 187 Umeå, Sweden
8
Soutourina O, Dubois T, Monot M, Shelyakin PV, Saujet L, Boudry P, Gelfand MS, Dupuy B, Martin-Verstraete I. Genome-Wide Transcription Start Site Mapping and Promoter Assignments to a Sigma Factor in the Human Enteropathogen Clostridioides difficile. Front Microbiol 2020; 11:1939. [PMID: 32903654; PMCID: PMC7438776; DOI: 10.3389/fmicb.2020.01939] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Received: 04/28/2020] [Accepted: 07/23/2020] [Indexed: 12/12/2022] Open
Abstract
The emerging human enteropathogen Clostridioides difficile is the main cause of diarrhea associated with antibiotherapy. Regulatory pathways underlying its adaptive responses remain understudied, and a global view of C. difficile promoter structure is still missing. The genome of C. difficile 630 contains 22 genes encoding sigma factors, suggesting a complex pattern of transcription in this bacterium. We present here the first transcriptional map of the C. difficile genome, resulting from the identification of transcriptional start sites (TSS), promoter motifs and operon structures. Using a 5′-end RNA-seq approach, we mapped more than 1000 TSS upstream of genes. In addition to these primary TSS, this analysis revealed a complex structure of transcriptional units, such as alternative and internal promoters, potential RNA processing events and 5′ untranslated regions. By following an in silico iterative strategy that used previously published consensus sequences and transcriptomic analysis as input, we identified candidate promoters upstream of most protein-coding and non-coding RNA genes. This strategy also allowed us to refine the consensus sequences of promoters recognized by the major sigma factors of C. difficile. Detailed analysis focuses on transcription in the pathogenicity locus and regulatory genes, as well as on the regulons of transition phase and sporulation sigma factors, as important components of the C. difficile regulatory network governing toxin gene expression and spore formation. Among the still uncharacterized regulons of the major sigma factors of C. difficile, we defined the SigL regulon by combining transcriptome and in silico analyses. We showed that the SigL regulon is largely involved in amino-acid degradation, a metabolism crucial for C. difficile gut colonization. Finally, we combined our TSS mapping, in silico identification of promoters and RNA-seq data to improve gene annotation and to suggest operon organization in C. difficile. These data will considerably improve our knowledge of the global regulatory circuits controlling gene expression in C. difficile and will serve as a rich resource for the scientific community, both for the detailed analysis of specific genes and for systems biology studies.
Affiliation(s)
- Olga Soutourina
  - Laboratoire Pathogenèses des Bactéries Anaérobies, Institut Pasteur, UMR CNRS 2001, Université de Paris, Paris, France
  - Institut Universitaire de France, Paris, France
  - Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), Gif-sur-Yvette, France
- Thomas Dubois
  - Laboratoire Pathogenèses des Bactéries Anaérobies, Institut Pasteur, UMR CNRS 2001, Université de Paris, Paris, France
- Marc Monot
  - Laboratoire Pathogenèses des Bactéries Anaérobies, Institut Pasteur, UMR CNRS 2001, Université de Paris, Paris, France
- Laure Saujet
  - Laboratoire Pathogenèses des Bactéries Anaérobies, Institut Pasteur, UMR CNRS 2001, Université de Paris, Paris, France
- Pierre Boudry
  - Laboratoire Pathogenèses des Bactéries Anaérobies, Institut Pasteur, UMR CNRS 2001, Université de Paris, Paris, France
- Mikhail S Gelfand
  - Institute for Information Transmission Problems, Moscow, Russia
  - Skolkovo Institute of Science and Technology, Moscow, Russia
- Bruno Dupuy
  - Laboratoire Pathogenèses des Bactéries Anaérobies, Institut Pasteur, UMR CNRS 2001, Université de Paris, Paris, France
- Isabelle Martin-Verstraete
  - Laboratoire Pathogenèses des Bactéries Anaérobies, Institut Pasteur, UMR CNRS 2001, Université de Paris, Paris, France
  - Institut Universitaire de France, Paris, France
9
de la Fuente L, Arzalluz-Luque Á, Tardáguila M, Del Risco H, Martí C, Tarazona S, Salguero P, Scott R, Lerma A, Alastrue-Agudo A, Bonilla P, Newman JRB, Kosugi S, McIntyre LM, Moreno-Manzano V, Conesa A. tappAS: a comprehensive computational framework for the analysis of the functional impact of differential splicing. Genome Biol 2020; 21:119. [PMID: 32423416; PMCID: PMC7236505; DOI: 10.1186/s13059-020-02028-w] [Citation(s) in RCA: 37] [Impact Index Per Article: 7.4] [Received: 11/20/2019] [Accepted: 04/23/2020] [Indexed: 12/26/2022] Open
Abstract
Recent advances in long-read sequencing solve inaccuracies in alternative transcript identification of full-length transcripts in short-read RNA-Seq data, which encourages the development of methods for isoform-centered functional analysis. Here, we present tappAS, the first framework to enable a comprehensive Functional Iso-Transcriptomics (FIT) analysis, which is effective at revealing the functional impact of context-specific post-transcriptional regulation. tappAS uses isoform-resolved annotation of coding and non-coding functional domains, motifs, and sites, in combination with novel analysis methods to interrogate different aspects of the functional readout of transcript variants and isoform regulation. tappAS software and documentation are available at https://app.tappas.org.
Affiliation(s)
- Lorena de la Fuente
  - Genomics of Gene Expression Laboratory, Prince Felipe Research Center, Valencia, Spain
  - Present Address: Bioinformatics Unit, IIS Fundación Jiménez Díaz, Madrid, Spain
- Ángeles Arzalluz-Luque
  - Department of Statistics and Operational Research, Polytechnical University of Valencia, Valencia, Spain
- Manuel Tardáguila
  - Department of Microbiology and Cell Science, Institute for Food and Agricultural Sciences, University of Florida, Gainesville, FL, USA
  - Present Address: Human Genetics Department, Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK
- Héctor Del Risco
  - Department of Microbiology and Cell Science, Institute for Food and Agricultural Sciences, University of Florida, Gainesville, FL, USA
- Cristina Martí
  - Genomics of Gene Expression Laboratory, Prince Felipe Research Center, Valencia, Spain
- Sonia Tarazona
  - Department of Statistics and Operational Research, Polytechnical University of Valencia, Valencia, Spain
- Pedro Salguero
  - Genomics of Gene Expression Laboratory, Prince Felipe Research Center, Valencia, Spain
- Raymond Scott
  - Department of Microbiology and Cell Science, Institute for Food and Agricultural Sciences, University of Florida, Gainesville, FL, USA
- Alberto Lerma
  - Genomics of Gene Expression Laboratory, Prince Felipe Research Center, Valencia, Spain
- Ana Alastrue-Agudo
  - Present Address: Human Genetics Department, Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK
- Pablo Bonilla
  - Present Address: Human Genetics Department, Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK
- Jeremy R B Newman
  - Genetics Institute, University of Florida, Gainesville, FL, USA
  - Department of Pathology, University of Florida, Gainesville, FL, USA
- Shunichi Kosugi
  - Genetics Institute, University of Florida, Gainesville, FL, USA
  - Laboratory for Statistical and Translational Genetics, Center for Integrative Medical Sciences, RIKEN, Wako, Japan
- Lauren M McIntyre
  - Genetics Institute, University of Florida, Gainesville, FL, USA
  - Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL, USA
- Ana Conesa
  - Department of Microbiology and Cell Science, Institute for Food and Agricultural Sciences, University of Florida, Gainesville, FL, USA
  - Genetics Institute, University of Florida, Gainesville, FL, USA
10
|
Oliveira PH, Fang G. Conserved DNA Methyltransferases: A Window into Fundamental Mechanisms of Epigenetic Regulation in Bacteria. Trends Microbiol 2020; 29:28-40. [PMID: 32417228 DOI: 10.1016/j.tim.2020.04.007] [Citation(s) in RCA: 59] [Impact Index Per Article: 11.8] [Received: 01/02/2020] [Revised: 03/19/2020] [Accepted: 04/10/2020] [Indexed: 12/14/2022]
Abstract
An increasing number of studies have reported that bacterial DNA methylation has important functions beyond the roles in restriction-modification systems, including the ability of affecting clinically relevant phenotypes such as virulence, host colonization, sporulation, biofilm formation, among others. Although insightful, such studies have a largely ad hoc nature and would benefit from a systematic strategy enabling a joint functional characterization of bacterial methylomes by the microbiology community. In this opinion article, we propose that highly conserved DNA methyltransferases (MTases) represent a unique opportunity for bacterial epigenomic studies. These MTases are rather common in bacteria, span various taxonomic scales, and are present in multiple human pathogens. Apart from well-characterized core DNA MTases, like those from Vibrio cholerae, Salmonella enterica, Clostridioides difficile, or Streptococcus pyogenes, multiple highly conserved DNA MTases are also found in numerous human pathogens, including those belonging to the genera Burkholderia and Acinetobacter. We discuss why and how these MTases can be prioritized to enable a community-wide, integrative approach for functional epigenomic studies. Ultimately, we discuss how some highly conserved DNA MTases may emerge as promising targets for the development of novel epigenetic inhibitors for biomedical applications.
Affiliation(s)
- Pedro H Oliveira
  - Department of Genetics and Genomic Sciences, Institute for Genomics and Multiscale Biology, Mount Sinai School of Medicine, New York, NY, USA
- Gang Fang
  - Department of Genetics and Genomic Sciences, Institute for Genomics and Multiscale Biology, Mount Sinai School of Medicine, New York, NY, USA

11
Yu SH, Vogel J, Förstner KU. ANNOgesic: a Swiss army knife for the RNA-seq based annotation of bacterial/archaeal genomes. Gigascience 2018; 7:5087959. [PMID: 30169674 PMCID: PMC6123526 DOI: 10.1093/gigascience/giy096] [Citation(s) in RCA: 41] [Impact Index Per Article: 5.9] [Received: 01/27/2018] [Accepted: 08/23/2018] [Indexed: 11/13/2022] Open
Abstract
To understand the gene regulation of an organism of interest, a comprehensive genome annotation is essential. While some features, such as coding sequences, can be computationally predicted with high accuracy based purely on the genomic sequence, others, such as promoter elements or noncoding RNAs, are harder to detect. RNA sequencing (RNA-seq) has proven to be an efficient method to identify these genomic features and to improve genome annotations. However, processing and integrating RNA-seq data in order to generate high-resolution annotations is challenging, time consuming, and requires numerous steps. We have constructed a powerful and modular tool called ANNOgesic that provides the required analyses and simplifies RNA-seq-based bacterial and archaeal genome annotation. It can integrate data from conventional RNA-seq and differential RNA-seq and predicts and annotates numerous features, including small noncoding RNAs, with high precision. The software is available under an open source license (ISCL) at https://pypi.org/project/ANNOgesic/.
Affiliation(s)
- Sung-Huan Yu
  - Institute of Molecular Infection Biology (IMIB), University of Würzburg, Josef-Schneider-Straße 2, 97080 Würzburg, Germany
- Jörg Vogel
  - Institute of Molecular Infection Biology (IMIB), University of Würzburg, Josef-Schneider-Straße 2, 97080 Würzburg, Germany
  - Helmholtz Institute for RNA-based Infection Research (HIRI), Josef-Schneider-Straße 2, 97080 Würzburg, Germany
- Konrad U Förstner
  - Institute of Molecular Infection Biology (IMIB), University of Würzburg, Josef-Schneider-Straße 2, 97080 Würzburg, Germany
  - ZB MED - Information Center for Life Sciences, Informationservices, Gleueler Straße 60, 50931 Cologne (Köln), Germany
  - Technical University of Cologne, Faculty for Information and Communication Sciences, Claudiusstraße 1, 50678 Cologne (Köln), Germany

12
Le Scornet A, Redder P. Post-transcriptional control of virulence gene expression in Staphylococcus aureus. Biochimica et Biophysica Acta - Gene Regulatory Mechanisms 2018; 1862:734-741. [PMID: 29705591 DOI: 10.1016/j.bbagrm.2018.04.004] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 01/15/2018] [Revised: 04/25/2018] [Accepted: 04/25/2018] [Indexed: 12/12/2022]
Abstract
Opportunistic pathogens have to be ready to change life-style whenever the occasion arises, and therefore need to keep tight control over the expression of their virulence factors. Doubly so for commensal bacteria, such as Staphylococcus aureus, which should avoid harming their hosts when they are in a state of peaceful co-existence. S. aureus carries very few sigma factors to help define the transcriptional programs, but instead uses a plethora of small RNA molecules and RNA-RNA interactions to regulate gene expression post-transcriptionally. The endoribonucleases RNase III and RNase Y contribute to this regulatory diversity, and provide a link to RNA-decay and intra-cellular spatiotemporal control of expression. In this review we describe some of these post-transcriptional mechanisms as well as some of the novel transcriptomic approaches that have been used to find and to study them.
Affiliation(s)
- Alexandre Le Scornet
  - LMGM, Centre de Biologie Integrative, Paul Sabatier University, 118, Route de Narbonne, 31062 Toulouse, France
- Peter Redder
  - LMGM, Centre de Biologie Integrative, Paul Sabatier University, 118, Route de Narbonne, 31062 Toulouse, France

13
Promworn Y, Kaewprommal P, Shaw PJ, Intarapanich A, Tongsima S, Piriyapongsa J. ToNER: A tool for identifying nucleotide enrichment signals in feature-enriched RNA-seq data. PLoS One 2017; 12:e0178483. [PMID: 28542466 PMCID: PMC5444824 DOI: 10.1371/journal.pone.0178483] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 01/31/2017] [Accepted: 05/12/2017] [Indexed: 11/25/2022] Open
Abstract
Background: Biochemical methods are available for enriching 5′ ends of RNAs in prokaryotes, which are employed in the differential RNA-seq (dRNA-seq) and the more recent Cappable-seq protocols. Computational methods are needed to locate RNA 5′ ends from these data by statistical analysis of the enrichment. Although statistical-based analysis methods have been developed for dRNA-seq, they may not be suitable for Cappable-seq data. The more efficient enrichment method employed in Cappable-seq compared with dRNA-seq could affect data distribution and thus algorithm performance. Results: We present Transformation of Nucleotide Enrichment Ratios (ToNER), a tool for statistical modeling of enrichment from RNA-seq data obtained from enriched and unenriched libraries. The tool calculates nucleotide enrichment scores and determines the global transformation for fitting to the normal distribution using the Box-Cox procedure. From the transformed distribution, sites of significant enrichment are identified. To increase power of detection, meta-analysis across experimental replicates is offered. We tested the tool on Cappable-seq and dRNA-seq data for identifying Escherichia coli transcript 5′ ends and compared the results with those from the TSSAR tool, which is designed for analyzing dRNA-seq data. When combining results across Cappable-seq replicates, ToNER detects more known transcript 5′ ends than TSSAR. In general, the transcript 5′ ends detected by ToNER but not TSSAR occur in regions which cannot be locally modeled by TSSAR. Conclusion: ToNER uses a simple yet robust statistical modeling approach, which can be used for detecting RNA 5′ ends from Cappable-seq data, in particular when combining information from experimental replicates. The ToNER tool could potentially be applied for analyzing other RNA-seq datasets in which enrichment for other structural features of RNA is employed.
The program is freely available for download at ToNER webpage (http://www4a.biotec.or.th/GI/tools/toner) and GitHub repository (https://github.com/PavitaKae/ToNER).
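The enrichment-modeling idea summarized in the abstract above (per-nucleotide enrichment scores, a global Box-Cox transformation toward normality, then a significance call on the transformed values) can be illustrated with a minimal Python sketch. This is not ToNER's code. The pseudocount-based ratio, the grid search for the Box-Cox parameter, the one-sided normal test, and all function names are assumptions made for illustration only.

```python
import math
from statistics import mean, pstdev


def boxcox(x, lam):
    """Box-Cox transform of a positive value x with parameter lam."""
    return math.log(x) if abs(lam) < 1e-12 else (x ** lam - 1.0) / lam


def boxcox_loglik(values, lam):
    """Profile log-likelihood of lam under a normal model (constants dropped)."""
    n = len(values)
    t = [boxcox(v, lam) for v in values]
    m = mean(t)
    var = sum((y - m) ** 2 for y in t) / n
    if var <= 0.0:
        raise ValueError("values must not all be identical")
    return -0.5 * n * math.log(var) + (lam - 1.0) * sum(math.log(v) for v in values)


def fit_lambda(values):
    """Pick lam on a coarse grid from -2 to 2 (hypothetical choice of grid)."""
    grid = [i / 20.0 for i in range(-40, 41)]
    return max(grid, key=lambda lam: boxcox_loglik(values, lam))


def enrichment_zscores(enriched, unenriched, pseudocount=1.0):
    """Z-scores of Box-Cox-transformed enrichment ratios, one per position.

    The ratio (enriched + pseudocount) / (unenriched + pseudocount) is an
    assumed stand-in for the tool's enrichment score.
    """
    ratios = [(e + pseudocount) / (u + pseudocount)
              for e, u in zip(enriched, unenriched)]
    lam = fit_lambda(ratios)
    t = [boxcox(r, lam) for r in ratios]
    m, s = mean(t), pstdev(t)
    return [(y - m) / s for y in t]


def significant_sites(zscores, alpha=0.01):
    """Positions whose one-sided upper-tail normal p-value is below alpha."""
    hits = []
    for pos, z in enumerate(zscores):
        p = 0.5 * math.erfc(z / math.sqrt(2.0))
        if p < alpha:
            hits.append(pos)
    return hits
```

A call such as `significant_sites(enrichment_zscores(enriched_counts, unenriched_counts))` would return candidate 5′-end positions for one pair of libraries. Combining replicates, which the abstract mentions as a meta-analysis step, is left out because the abstract does not say how it is done.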
Affiliation(s)
- Yuttachon Promworn
  - National Center for Genetic Engineering and Biotechnology (BIOTEC), National Science and Technology Development Agency (NSTDA), Pathum Thani, Thailand
- Pavita Kaewprommal
  - National Center for Genetic Engineering and Biotechnology (BIOTEC), National Science and Technology Development Agency (NSTDA), Pathum Thani, Thailand
- Philip J. Shaw
  - National Center for Genetic Engineering and Biotechnology (BIOTEC), National Science and Technology Development Agency (NSTDA), Pathum Thani, Thailand
- Apichart Intarapanich
  - National Electronics and Computer Technology Center (NECTEC), National Science and Technology Development Agency (NSTDA), Pathum Thani, Thailand
- Sissades Tongsima
  - National Center for Genetic Engineering and Biotechnology (BIOTEC), National Science and Technology Development Agency (NSTDA), Pathum Thani, Thailand
- Jittima Piriyapongsa
  - National Center for Genetic Engineering and Biotechnology (BIOTEC), National Science and Technology Development Agency (NSTDA), Pathum Thani, Thailand

14
James K, Cockell SJ, Zenkin N. Deep sequencing approaches for the analysis of prokaryotic transcriptional boundaries and dynamics. Methods 2017; 120:76-84. [PMID: 28434904 DOI: 10.1016/j.ymeth.2017.04.016] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 12/14/2016] [Revised: 04/13/2017] [Accepted: 04/18/2017] [Indexed: 01/13/2023] Open
Abstract
The identification of the protein-coding regions of a genome is straightforward due to the universality of start and stop codons. However, the boundaries of the transcribed regions, conditional operon structures, non-coding RNAs and the dynamics of transcription, such as pausing of elongation, are non-trivial to identify, even in the comparatively simple genomes of prokaryotes. Traditional methods for the study of these areas, such as tiling arrays, are noisy, labour-intensive and lack the resolution required for densely-packed bacterial genomes. Recently, deep sequencing has become increasingly popular for the study of the transcriptome due to its lower costs, higher accuracy and single nucleotide resolution. These methods have revolutionised our understanding of prokaryotic transcriptional dynamics. Here, we review the deep sequencing and data analysis techniques that are available for the study of transcription in prokaryotes, and discuss the bioinformatic considerations of these analyses.
Affiliation(s)
- Katherine James
  - Centre for Bacterial Cell Biology, Institute for Cell and Molecular Bioscience, Newcastle University, Baddiley-Clark Building, Richardson Road, Newcastle Upon Tyne NE2 4AX, UK
- Simon J Cockell
  - Bioinformatics Support Unit, Newcastle University, William Leech Building, Framlington Place, Newcastle Upon Tyne NE2 4HH, UK
- Nikolay Zenkin
  - Centre for Bacterial Cell Biology, Institute for Cell and Molecular Bioscience, Newcastle University, Baddiley-Clark Building, Richardson Road, Newcastle Upon Tyne NE2 4AX, UK

15
Hilker R, Stadermann KB, Schwengers O, Anisiforov E, Jaenicke S, Weisshaar B, Zimmermann T, Goesmann A. ReadXplorer 2 - detailed read mapping analysis and visualization from one single source. Bioinformatics 2016; 32:3702-3708. [PMID: 27540267 PMCID: PMC5167064 DOI: 10.1093/bioinformatics/btw541] [Citation(s) in RCA: 69] [Impact Index Per Article: 7.7] [Received: 03/25/2016] [Revised: 08/02/2016] [Accepted: 08/15/2016] [Indexed: 01/29/2023] Open
Abstract
MOTIVATION: The vast amount of already available and currently generated read mapping data requires comprehensive visualization, and should benefit from bioinformatics tools offering a wide spectrum of analysis functionality from just one source. Appropriate handling of multiple mapped reads during mapping analyses remains an issue that demands improvement. RESULTS: The capabilities of the read mapping analysis and visualization tool ReadXplorer were vastly enhanced. Here, we present an even finer granulated read mapping classification, improving the level of detail for analyses and visualizations. The spectrum of automatic analysis functions has been broadened to include genome rearrangement detection as well as correlation analysis between two mapping data sets. Existing functions were refined and enhanced, namely the computation of differentially expressed genes, the read count and normalization analysis and the transcription start site detection. Additionally, ReadXplorer 2 features a highly improved support for large eukaryotic data sets and a command line version, enabling its integration into workflows. Finally, the new version is now able to display any kind of tabular results from other bioinformatics tools. AVAILABILITY AND IMPLEMENTATION: http://www.readxplorer.org CONTACT: readxplorer@computational.bio.uni-giessen.de SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Rolf Hilker
  - Bioinformatics and Systems Biology, Faculty of Biology and Chemistry, Justus-Liebig-University, Giessen 35392, Germany
- Kai Bernd Stadermann
  - Faculty of Biology, Chair of Genome Research, Bielefeld University, Bielefeld 33615, Germany
- Oliver Schwengers
  - Bioinformatics and Systems Biology, Faculty of Biology and Chemistry, Justus-Liebig-University, Giessen 35392, Germany
- Evgeny Anisiforov
  - Bioinformatics and Systems Biology, Faculty of Biology and Chemistry, Justus-Liebig-University, Giessen 35392, Germany
- Sebastian Jaenicke
  - Bioinformatics and Systems Biology, Faculty of Biology and Chemistry, Justus-Liebig-University, Giessen 35392, Germany
- Bernd Weisshaar
  - Faculty of Biology, Chair of Genome Research, Bielefeld University, Bielefeld 33615, Germany
- Tobias Zimmermann
  - Bioinformatics and Systems Biology, Faculty of Biology and Chemistry, Justus-Liebig-University, Giessen 35392, Germany
- Alexander Goesmann
  - Bioinformatics and Systems Biology, Faculty of Biology and Chemistry, Justus-Liebig-University, Giessen 35392, Germany

16
Choi SC. On the study of microbial transcriptomes using second- and third-generation sequencing technologies. J Microbiol 2016; 54:527-36. [PMID: 27480632 DOI: 10.1007/s12275-016-6233-2] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 05/20/2016] [Revised: 06/16/2016] [Accepted: 06/16/2016] [Indexed: 12/19/2022]
Abstract
Second-generation sequencing technologies transformed the study of microbial transcriptomes. They helped reveal the transcription start sites and antisense transcripts of microbial species, improving the microbial genome annotation. Quantification of genome-wide gene expression levels allowed for functional studies of microbial research. Ever-evolving sequencing technologies are reshaping approaches to studying microbial transcriptomes. Recently, Oxford Nanopore Technologies delivered a sequencing platform called MinION, a third-generation sequencing technology, to the research community. We expect it to be the next sequencing technology that enables breakthroughs in life science fields. The studies of microbial transcriptomes will be no exception. In this paper, we review microbial transcriptomics studies using second-generation sequencing technology. We also discuss the prospect of microbial transcriptomics studies with third-generation sequencing.
Affiliation(s)
- Sang Chul Choi
  - Department of Biology, Sungshin Women's University, Seoul, 01133, Republic of Korea

17
Cohen O, Doron S, Wurtzel O, Dar D, Edelheit S, Karunker I, Mick E, Sorek R. Comparative transcriptomics across the prokaryotic tree of life. Nucleic Acids Res 2016; 44:W46-53. [PMID: 27154273 PMCID: PMC4987935 DOI: 10.1093/nar/gkw394] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.3] [Received: 02/08/2016] [Accepted: 04/28/2016] [Indexed: 12/23/2022] Open
Abstract
Whole-transcriptome sequencing studies from recent years revealed an unexpected complexity in transcriptomes of bacteria and archaea, including abundant non-coding RNAs, cis-antisense transcription and regulatory untranslated regions (UTRs). Understanding the functional relevance of the plethora of non-coding RNAs in a given organism is challenging, especially since some of these RNAs were attributed to ‘transcriptional noise’. To allow the search for conserved transcriptomic elements we produced comparative transcriptome maps for multiple species across the microbial tree of life. These transcriptome maps are detailed in annotations, comparable by gene families, and BLAST-searchable by user provided sequences. Our transcriptome collection includes 18 model organisms spanning 10 phyla/subphyla of bacteria and archaea that were sequenced using standardized RNA-seq methods. The utility of the comparative approach, as implemented in our web server, is demonstrated by highlighting genes with exceptionally long 5′UTRs across species, which correspond to many known riboswitches and further suggest novel putative regulatory elements. Our study provides a standardized reference transcriptome to major clinically and environmentally important microbial phyla. The viewer is available at http://exploration.weizmann.ac.il/TCOL, setting a framework for comparative studies of the microbial non-coding genome.
Affiliation(s)
- Ofir Cohen
  - Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 76100, Israel
  - Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
- Shany Doron
  - Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 76100, Israel
- Omri Wurtzel
  - Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 76100, Israel
  - Whitehead Institute for Biomedical Research, Cambridge, MA 02142, USA
- Daniel Dar
  - Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 76100, Israel
- Sarit Edelheit
  - Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 76100, Israel
- Iris Karunker
  - Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 76100, Israel
- Eran Mick
  - Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 76100, Israel
  - Department of Systems Biology, Harvard Medical School, Boston, MA 02115, USA
- Rotem Sorek
  - Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 76100, Israel

18
van der Meulen SB, de Jong A, Kok J. Transcriptome landscape of Lactococcus lactis reveals many novel RNAs including a small regulatory RNA involved in carbon uptake and metabolism. RNA Biol 2016; 13:353-66. [PMID: 26950529 PMCID: PMC4829306 DOI: 10.1080/15476286.2016.1146855] [Citation(s) in RCA: 42] [Impact Index Per Article: 4.7] [Indexed: 11/02/2022] Open
Abstract
RNA sequencing has revolutionized genome-wide transcriptome analyses, and the identification of non-coding regulatory RNAs in bacteria has thus increased concurrently. Here we reveal the transcriptome map of the lactic acid bacterial paradigm Lactococcus lactis MG1363 by employing differential RNA sequencing (dRNA-seq) and a combination of manual and automated transcriptome mining. This resulted in a high-resolution genome annotation of L. lactis and the identification of 60 cis-encoded antisense RNAs (asRNAs), 186 trans-encoded putative regulatory RNAs (sRNAs) and 134 novel small ORFs. Based on the putative targets of asRNAs, a novel classification is proposed. Several transcription factor DNA binding motifs were identified in the promoter sequences of (a)sRNAs, providing insight in the interplay between lactococcal regulatory RNAs and transcription factors. The presence and lengths of 14 putative sRNAs were experimentally confirmed by differential Northern hybridization, including the abundant RNA 6S that is differentially expressed depending on the available carbon source. For another sRNA, LLMGnc_147, functional analysis revealed that it is involved in carbon uptake and metabolism. L. lactis contains 13% leaderless mRNAs (lmRNAs) that, from an analysis of overrepresentation in GO classes, seem predominantly involved in nucleotide metabolism and DNA/RNA binding. Moreover, an A-rich sequence motif immediately following the start codon was uncovered, which could provide novel insight in the translation of lmRNAs. Altogether, this first experimental genome-wide assessment of the transcriptome landscape of L. lactis and subsequent sRNA studies provide an extensive basis for the investigation of regulatory RNAs in L. lactis and related lactococcal species.
Affiliation(s)
- Sjoerd B van der Meulen
  - Department of Molecular Genetics, Groningen Biomolecular Sciences and Biotechnology Institute, University of Groningen, Groningen, The Netherlands
  - Top Institute Food and Nutrition (TIFN), Wageningen, The Netherlands
- Anne de Jong
  - Department of Molecular Genetics, Groningen Biomolecular Sciences and Biotechnology Institute, University of Groningen, Groningen, The Netherlands
  - Top Institute Food and Nutrition (TIFN), Wageningen, The Netherlands
- Jan Kok
  - Department of Molecular Genetics, Groningen Biomolecular Sciences and Biotechnology Institute, University of Groningen, Groningen, The Netherlands
  - Top Institute Food and Nutrition (TIFN), Wageningen, The Netherlands

19
Stazic D, Voß B. The complexity of bacterial transcriptomes. J Biotechnol 2015; 232:69-78. [PMID: 26450562 DOI: 10.1016/j.jbiotec.2015.09.041] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Received: 05/29/2015] [Revised: 09/07/2015] [Accepted: 09/29/2015] [Indexed: 01/09/2023]
Abstract
For eukaryotes there seems to be no doubt that differences on the transcriptomic level substantially contribute to the process of species diversification, whereas for bacteria this is thought to be less important. Recent years saw a significant increase in full transcriptome studies for bacteria, which provided deep insight into the architecture of bacterial transcriptomes. Most notably, it became evident that, in contrast to previous scientific consensus, bacterial transcriptomes are quite complex. There exist a large number of cis-antisense RNAs, non-coding RNAs, overlapping transcripts and RNA elements that regulate transcription, such as riboswitches. Furthermore, processing and degradation of RNA has gained interest, because it has a significant impact on the composition of the transcriptome. In this review, we summarize recent findings and put them into a broader context with respect to the complexity of bacterial transcriptomes and its putative biological meanings.
Affiliation(s)
- D Stazic
  - University of Freiburg, Faculty of Biology, Computational Transcriptomics, Schänzlestr. 1, 79104 Freiburg, Germany
- B Voß
  - University of Freiburg, Faculty of Biology, Computational Transcriptomics, Schänzlestr. 1, 79104 Freiburg, Germany

20
Bischler T, Tan HS, Nieselt K, Sharma CM. Differential RNA-seq (dRNA-seq) for annotation of transcriptional start sites and small RNAs in Helicobacter pylori. Methods 2015; 86:89-101. [PMID: 26091613 DOI: 10.1016/j.ymeth.2015.06.012] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.7] [Received: 04/08/2015] [Revised: 06/07/2015] [Accepted: 06/09/2015] [Indexed: 12/29/2022] Open
Abstract
The global mapping of transcription boundaries is a key step in the elucidation of the full complement of transcriptional features of an organism. It facilitates the annotation of operons and untranslated regions as well as novel transcripts, including cis- and trans-encoded small RNAs (sRNAs). So called RNA sequencing (RNA-seq) based on deep sequencing of cDNAs has greatly facilitated transcript mapping with single nucleotide resolution. However, conventional RNA-seq approaches typically cannot distinguish between primary and processed transcripts. Here we describe the recently developed differential RNA-seq (dRNA-seq) approach, which facilitates the annotation of transcriptional start sites (TSS) based on deep sequencing of two differentially treated cDNA library pairs, with one library being enriched for primary transcripts. Using the human pathogen Helicobacter pylori as a model organism, we describe the application of dRNA-seq together with an automated TSS annotation approach for generation of a genome-wide TSS map in bacteria. Besides a description of transcriptome and regulatory features that can be identified by this approach, we discuss the impact of different library preparation protocols and sequencing platforms as well as manual and automated TSS annotation. Moreover, we have set up an easily accessible online browser for visualization of the H. pylori transcriptome data from this and our previous H. pylori dRNA-seq study.
Affiliation(s)
- Thorsten Bischler
  - Research Center for Infectious Diseases (ZINF), University of Würzburg, Josef-Schneider-Str. 2/Bau D15, 97080 Würzburg, Germany
- Hock Siew Tan
  - Research Center for Infectious Diseases (ZINF), University of Würzburg, Josef-Schneider-Str. 2/Bau D15, 97080 Würzburg, Germany
- Kay Nieselt
  - Integrative Transcriptomics, ZBIT (Center for Bioinformatics Tübingen), University of Tübingen, Sand 14, D-72076 Tübingen, Germany
- Cynthia M Sharma
  - Research Center for Infectious Diseases (ZINF), University of Würzburg, Josef-Schneider-Str. 2/Bau D15, 97080 Würzburg, Germany

21
Redder P. Using EMOTE to map the exact 5'-ends of processed RNA on a transcriptome-wide scale. Methods Mol Biol 2015; 1259:69-85. [PMID: 25579580 DOI: 10.1007/978-1-4939-2214-7_5] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Indexed: 01/05/2023]
Abstract
The presence or absence of structure in an RNA is often crucial to its function. This is evident for highly structured RNAs such as rRNA, tRNA, or riboswitches, but it is also the case for many mRNAs, where secondary structures in the 5' or 3' UTR can determine the efficiency of translation or the half-life of the RNA. There are paths to modify such secondary structures, (1) by the action of a helicase that allows an alternative RNA structure to form, (2) by the formation of a duplex with another RNA, or (3) by cleavage of the RNA in a way that favors a different secondary structure. None of the three exclude the others, and in vivo it is common that two or all three work together to remodel an RNA to the desired form. However, while the first two solutions can be reversible, the cleavage of RNA is final, and there is no chance to go back. In this chapter, a method for tracking the 5' end created by RNA processing on a transcriptome-wide scale is presented. The Exact Mapping Of Transcriptome Ends (EMOTE) allows the large-scale identification of mono-phosphorylated RNA 5'-ends and provides the exact processing sites.
Affiliation(s)
- Peter Redder
  - Faculty of Medicine, University of Geneva, Rue Michel-Servet 1, 1211, Geneve 4, Switzerland

22
Creecy JP, Conway T. Quantitative bacterial transcriptomics with RNA-seq. Curr Opin Microbiol 2014; 23:133-40. [PMID: 25483350 DOI: 10.1016/j.mib.2014.11.011] [Citation(s) in RCA: 70] [Impact Index Per Article: 6.4] [Received: 06/13/2014] [Revised: 11/11/2014] [Accepted: 11/12/2014] [Indexed: 02/06/2023]
Abstract
RNA sequencing has emerged as the premier approach to study bacterial transcriptomes. While the earliest published studies analyzed the data qualitatively, the data are readily digitized and lend themselves to quantitative analysis. High-resolution RNA sequence (RNA-seq) data allows transcriptional features (promoters, terminators, operons, among others) to be pinpointed on any bacterial transcriptome. Once the transcriptome is mapped, the activity of transcriptional features can be quantified. Here we highlight how quantitative transcriptome analysis can reveal biological insights and briefly discuss some of the challenges to be faced by the field of bacterial transcriptomics in the near future.
Affiliation(s)
- James P Creecy
  - Department of Microbiology and Plant Biology, University of Oklahoma, Norman, OK 73019, United States
  - Department of Biology, University of Central Oklahoma, Edmond, OK 73034, United States
- Tyrrell Conway
  - Department of Microbiology and Plant Biology, University of Oklahoma, Norman, OK 73019, United States

23
Zaramela LS, Vêncio RZN, ten-Caten F, Baliga NS, Koide T. Transcription start site associated RNAs (TSSaRNAs) are ubiquitous in all domains of life. PLoS One 2014; 9:e107680. [PMID: 25238539 PMCID: PMC4169567 DOI: 10.1371/journal.pone.0107680] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Received: 03/09/2014] [Accepted: 08/18/2014] [Indexed: 01/06/2023] Open
Abstract
A plethora of non-coding RNAs has been discovered using high-resolution transcriptomics tools, indicating that transcriptional and post-transcriptional regulation is much more complex than previously appreciated. Small RNAs associated with transcription start sites of annotated coding regions (TSSaRNAs) are pervasive in both eukaryotes and bacteria. Here, we provide evidence for existence of TSSaRNAs in several archaeal transcriptomes including: Halobacterium salinarum, Pyrococcus furiosus, Methanococcus maripaludis, and Sulfolobus solfataricus. We validated TSSaRNAs from the model archaeon Halobacterium salinarum NRC-1 by deep sequencing two independent small-RNA enriched (RNA-seq) and a primary-transcript enriched (dRNA-seq) strand-specific libraries. We identified 652 transcripts, of which 179 were shown to be primary transcripts (∼7% of the annotated genome). Distinct growth-associated expression patterns between TSSaRNAs and their cognate genes were observed, indicating a possible role in environmental responses that may result from RNA polymerase with varying pausing rhythms. This work shows that TSSaRNAs are ubiquitous across all domains of life.
Affiliation(s)
- Livia S. Zaramela
- Department Biochemistry and Immunology, Ribeirão Preto Medical School, University of São Paulo, Ribeirão Preto, Brazil
- Ricardo Z. N. Vêncio
- Department of Computing and Mathematics, Faculdade de Filosofia Ciências e Letras de Ribeirão Preto, University of São Paulo, Ribeirão Preto, Brazil
- Felipe ten-Caten
- Department Biochemistry and Immunology, Ribeirão Preto Medical School, University of São Paulo, Ribeirão Preto, Brazil
- Nitin S. Baliga
- Institute for Systems Biology, Seattle, Washington, United States of America
- Tie Koide
- Department Biochemistry and Immunology, Ribeirão Preto Medical School, University of São Paulo, Ribeirão Preto, Brazil
|
24
|
Sharma CM, Vogel J. Differential RNA-seq: the approach behind and the biological insight gained. Curr Opin Microbiol 2014; 19:97-105. [PMID: 25024085 DOI: 10.1016/j.mib.2014.06.010] [Citation(s) in RCA: 151] [Impact Index Per Article: 13.7] [Received: 05/28/2014] [Revised: 06/15/2014] [Accepted: 06/19/2014] [Indexed: 01/14/2023]
Abstract
RNA-sequencing has revolutionized the quantitative and qualitative analysis of transcriptomes in both prokaryotes and eukaryotes. It provides a generic approach for gene expression profiling, annotation of transcript boundaries and operons, as well as identifying novel transcripts including small noncoding RNA molecules and antisense RNAs. We recently developed a differential RNA-seq (dRNA-seq) method which in addition to the above, yields information as to whether a given RNA is a primary or processed transcript. Originally applied to describe the primary transcriptome of the gastric pathogen Helicobacter pylori, dRNA-seq has since provided global maps of transcriptional start sites in diverse species, informed new biology in the CRISPR-Cas9 system, advanced to a tool for comparative transcriptomics, and inspired simultaneous RNA-seq of pathogen and host.
Affiliation(s)
- Cynthia M Sharma
- University of Würzburg, Institute for Molecular Infection Biology & Research Center for Infectious Diseases, Josef-Schneider-Straße 2/D15, D-97080 Würzburg, Germany.
- Jörg Vogel
- University of Würzburg, Institute for Molecular Infection Biology & Research Center for Infectious Diseases, Josef-Schneider-Straße 2/D15, D-97080 Würzburg, Germany.
|
25
|
Morton T, Petricka J, Corcoran DL, Li S, Winter CM, Carda A, Benfey PN, Ohler U, Megraw M. Paired-end analysis of transcription start sites in Arabidopsis reveals plant-specific promoter signatures. Plant Cell 2014; 26:2746-60. [PMID: 25035402 PMCID: PMC4145111 DOI: 10.1105/tpc.114.125617] [Citation(s) in RCA: 89] [Impact Index Per Article: 8.1] [Received: 03/24/2014] [Revised: 06/03/2014] [Accepted: 06/24/2014] [Indexed: 05/19/2023]
Abstract
Understanding plant gene promoter architecture has long been a challenge due to the lack of relevant large-scale data sets and analysis methods. Here, we present a publicly available, large-scale transcription start site (TSS) data set in plants using a high-resolution method for analysis of 5' ends of mRNA transcripts. Our data set is produced using the paired-end analysis of transcription start sites (PEAT) protocol, providing millions of TSS locations from wild-type Columbia-0 Arabidopsis thaliana whole root samples. Using this data set, we grouped TSS reads into "TSS tag clusters" and categorized clusters into three spatial initiation patterns: narrow peak, broad with peak, and weak peak. We then designed a machine learning model that predicts the presence of TSS tag clusters with outstanding sensitivity and specificity for all three initiation patterns. We used this model to analyze the transcription factor binding site content of promoters exhibiting these initiation patterns. In contrast to the canonical notions of TATA-containing and more broad "TATA-less" promoters, the model shows that, in plants, the vast majority of transcription start sites are TATA free and are defined by a large compendium of known DNA sequence binding elements. We present results on the usage of these elements and provide our Plant PEAT Peaks (3PEAT) model that predicts the presence of TSSs directly from sequence.
Affiliation(s)
- Taj Morton
- Department of Electrical Engineering and Computer Science, Oregon State University, Corvallis, Oregon 97331
- Jalean Petricka
- Institute for Genome Sciences and Policy, Duke University, Durham, North Carolina 27708; Department of Biology, HHMI and Center for Systems Biology, Duke University, Durham, North Carolina 27708; Department of Biology, Carleton College, Northfield, Minnesota 55057
- David L Corcoran
- Institute for Genome Sciences and Policy, Duke University, Durham, North Carolina 27708
- Song Li
- Institute for Genome Sciences and Policy, Duke University, Durham, North Carolina 27708
- Cara M Winter
- Institute for Genome Sciences and Policy, Duke University, Durham, North Carolina 27708; Department of Biology, HHMI and Center for Systems Biology, Duke University, Durham, North Carolina 27708
- Alexa Carda
- Institute for Genome Sciences and Policy, Duke University, Durham, North Carolina 27708
- Philip N Benfey
- Institute for Genome Sciences and Policy, Duke University, Durham, North Carolina 27708; Department of Biology, HHMI and Center for Systems Biology, Duke University, Durham, North Carolina 27708
- Uwe Ohler
- Institute for Genome Sciences and Policy, Duke University, Durham, North Carolina 27708; Department of Computer Science, Duke University, 308 Research Drive, Durham, North Carolina 27708; Department of Biostatistics and Bioinformatics, Duke University, Durham, North Carolina 27710; Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine, 13125 Berlin, Germany
- Molly Megraw
- Department of Electrical Engineering and Computer Science, Oregon State University, Corvallis, Oregon 97331; Institute for Genome Sciences and Policy, Duke University, Durham, North Carolina 27708; Department of Botany and Plant Pathology, Oregon State University, Corvallis, Oregon 97331; Center for Genome Research and Biocomputing, Oregon State University, Corvallis, Oregon 97331
|