1
Adams PP, Storz G. Prevalence of small base-pairing RNAs derived from diverse genomic loci. Biochim Biophys Acta Gene Regul Mech 2020; 1863:194524. [PMID: 32147527 DOI: 10.1016/j.bbagrm.2020.194524] [Citation(s) in RCA: 45] [Impact Index Per Article: 9.0] [Received: 11/27/2019] [Revised: 03/03/2020] [Accepted: 03/03/2020] [Indexed: 12/21/2022]
Abstract
Small RNAs (sRNAs) that act by base-pairing have been shown to play important roles in fine-tuning the levels and translation of their target transcripts across a variety of model and pathogenic organisms. Work from many different groups in a wide range of bacterial species has provided evidence for the importance and complexity of sRNA regulatory networks, which allow bacteria to quickly respond to changes in their environment. However, despite the expansive literature, much remains to be learned about all aspects of sRNA-mediated regulation, particularly in bacteria beyond the well-characterized Escherichia coli and Salmonella enterica species. Here we discuss what is known, and what remains to be learned, about the identification of regulatory base-pairing RNAs produced from diverse genomic loci including how their expression is regulated. This article is part of a Special Issue entitled: RNA and gene control in bacteria edited by Dr. M. Guillier and F. Repoila.
Affiliation(s)
- Philip P Adams
- Division of Molecular and Cellular Biology, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Bethesda, MD 20892-5430, USA; Postdoctoral Research Associate Program, National Institute of General Medical Sciences, National Institutes of Health, Bethesda, MD 20892-6200, USA.
- Gisela Storz
- Division of Molecular and Cellular Biology, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Bethesda, MD 20892-5430, USA
2
Emamjomeh A, Zahiri J, Asadian M, Behmanesh M, Fakheri BA, Mahdevar G. Identification, Prediction and Data Analysis of Noncoding RNAs: A Review. Med Chem 2019; 15:216-230. [PMID: 30484409 DOI: 10.2174/1573406414666181015151610] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 11/12/2017] [Revised: 06/03/2018] [Accepted: 09/30/2018] [Indexed: 12/13/2022]
Abstract
BACKGROUND Noncoding RNAs (ncRNAs), which play important roles in various cellular processes, are important in medicine as well as in drug design strategies. Different studies have shown that ncRNAs are dysregulated in cancer cells and play an important role in human tumorigenesis. It is therefore important to identify such molecules experimentally and to predict them computationally. Because experimental methods are expensive, computational algorithms have been developed for accurate and fast prediction of ncRNAs. OBJECTIVE The aim of this review was to introduce the experimental and computational methods used to identify ncRNAs and predict their structure, and to briefly explain the roles of ncRNAs in cellular processes and drug design. METHOD In this survey, we introduce ncRNAs and their roles in biological and medicinal processes. Then, some important laboratory techniques for identifying ncRNAs are studied. Finally, state-of-the-art models and algorithms are introduced along with important tools and databases. RESULTS The results showed that integrating experimental and computational approaches improves the identification of ncRNAs. Moreover, highly accurate databases, algorithms and tools for predicting ncRNAs were compared. CONCLUSION ncRNA prediction is an exciting research field, but it faces several difficulties. It requires accurate and reliable algorithms and tools, and the computational costs of such algorithms, including running time and memory usage, are very important. Finally, some suggestions are presented to improve computational methods for ncRNA gene and structure prediction.
Affiliation(s)
- Abbasali Emamjomeh
- Laboratory of Computational Biotechnology and Bioinformatics (CBB), Department of Plant Breeding and Biotechnology (PBB), University of Zabol, Zabol, Iran
- Javad Zahiri
- Bioinformatics and Computational Omics Lab (BioCOOL), Department of Biophysics, Faculty of Biological Sciences, Tarbiat Modares University, Tehran, Iran
- Mehrdad Asadian
- Department of Plant Breeding and Biotechnology (PBB), Faculty of Agriculture, University of Zabol, Zabol, Iran
- Mehrdad Behmanesh
- Department of Genetics, Faculty of Biological Sciences, Tarbiat Modares University, Tehran, Iran
- Barat A Fakheri
- Department of Plant Breeding and Biotechnology (PBB), Faculty of Agriculture, University of Zabol, Zabol, Iran
- Ghasem Mahdevar
- Department of Mathematics, Faculty of Sciences, University of Isfahan, Isfahan, Iran
3
Mustoe AM, Busan S, Rice GM, Hajdin CE, Peterson BK, Ruda VM, Kubica N, Nutiu R, Baryza JL, Weeks KM. Pervasive Regulatory Functions of mRNA Structure Revealed by High-Resolution SHAPE Probing. Cell 2018; 173:181-195.e18. [PMID: 29551268 DOI: 10.1016/j.cell.2018.02.034] [Citation(s) in RCA: 189] [Impact Index Per Article: 27.0] [Received: 11/12/2017] [Revised: 01/02/2018] [Accepted: 02/15/2018] [Indexed: 11/25/2022]
Abstract
mRNAs can fold into complex structures that regulate gene expression. Resolving such structures de novo has remained challenging and has limited our understanding of the prevalence and functions of mRNA structure. We use SHAPE-MaP experiments in living E. coli cells to derive quantitative, nucleotide-resolution structure models for 194 endogenous transcripts encompassing approximately 400 genes. Individual mRNAs have exceptionally diverse architectures, and most contain well-defined structures. Active translation destabilizes mRNA structure in cells. Nevertheless, mRNA structure remains similar between in-cell and cell-free environments, indicating broad potential for structure-mediated gene regulation. We find that the translation efficiency of endogenous genes is regulated by unfolding kinetics of structures overlapping the ribosome binding site. We discover conserved structured elements in 35% of UTRs, several of which we validate as novel protein binding motifs. RNA structure regulates every gene studied here in a meaningful way, implying that most functional structures remain to be discovered.
Affiliation(s)
- Anthony M Mustoe
- Department of Chemistry, University of North Carolina, Chapel Hill, NC, USA.
- Steven Busan
- Department of Chemistry, University of North Carolina, Chapel Hill, NC, USA
- Greggory M Rice
- Department of Chemistry, University of North Carolina, Chapel Hill, NC, USA; Novartis Institutes for Biomedical Research, Inc., Cambridge, MA, USA
- Brant K Peterson
- Novartis Institutes for Biomedical Research, Inc., Cambridge, MA, USA
- Vera M Ruda
- Novartis Institutes for Biomedical Research, Inc., Cambridge, MA, USA
- Neil Kubica
- Novartis Institutes for Biomedical Research, Inc., Cambridge, MA, USA
- Razvan Nutiu
- Novartis Institutes for Biomedical Research, Inc., Cambridge, MA, USA
- Jeremy L Baryza
- Novartis Institutes for Biomedical Research, Inc., Cambridge, MA, USA
- Kevin M Weeks
- Department of Chemistry, University of North Carolina, Chapel Hill, NC, USA.
4
Chen Q, Lan C, Chen B, Wang L, Li J, Zhang C. Exploring Consensus RNA Substructural Patterns Using Subgraph Mining. IEEE/ACM Trans Comput Biol Bioinform 2017; 14:1134-1146. [PMID: 28026781 DOI: 10.1109/tcbb.2016.2645202] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/06/2023]
Abstract
Frequently recurring RNA structural motifs play important roles in the RNA folding process and in interactions with other molecules. Traditional index-based and shape-based schemas are useful in modeling RNA secondary structures but ignore the structural discrepancies among individual RNA family members. Further, in-depth analysis of underlying substructure patterns is insufficient due to varied and unnormalized substructure data. This prevents us from understanding RNA functions and their inherent synergistic regulation networks. This article therefore proposes a novel labeled graph-based algorithm, RnaGraph, to uncover frequent RNA substructure patterns. Attribute data and graph data are combined to characterize diverse substructures and their correlations, respectively. Further, a top-k graph pattern mining algorithm is developed to extract interesting substructure motifs by integrating frequency and similarity. The experimental results show that our methods assist not only in modeling complex RNA secondary structures but also in identifying hidden but interesting RNA substructure patterns.
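RnaGraph's labeled-graph representation and its combined frequency-and-similarity scoring are not reproduced here. As a much simpler illustration of mining recurring substructures, the sketch below parses dot-bracket secondary structures into base pairs with a stack, extracts hairpin loop sequences, and reports the top-k most frequent loops by plain counting. The dot-bracket input format and the function names `parse_pairs`, `hairpin_loops` and `top_k_loops` are hypothetical choices for this example, not part of the paper.

```python
from collections import Counter


def parse_pairs(structure: str) -> list[tuple[int, int]]:
    """Return sorted (i, j) base pairs from a dot-bracket string using a stack."""
    stack: list[int] = []
    pairs: list[tuple[int, int]] = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.append((stack.pop(), i))
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


def hairpin_loops(sequence: str, structure: str) -> list[str]:
    """Return loop sequences enclosed by a pair (i, j) with only unpaired bases between them."""
    loops = []
    for i, j in parse_pairs(structure):
        inner = structure[i + 1 : j]
        if inner and set(inner) == {"."}:
            loops.append(sequence[i + 1 : j])
    return loops


def top_k_loops(records: list[tuple[str, str]], k: int) -> list[tuple[str, int]]:
    """Count hairpin loops across (sequence, structure) records and keep the k most frequent."""
    counts: Counter[str] = Counter()
    for seq, struct in records:
        counts.update(hairpin_loops(seq, struct))
    return counts.most_common(k)
```

For example, across the records `("GGGAAACCC", "(((...)))")`, `("CCCAAAGGG", "(((...)))")` and `("GCGAAAGC", "((....))")`, the loop `AAA` occurs twice and `GAAA` once, so `top_k_loops(records, 1)` returns `[("AAA", 2)]`. Exact-sequence counting ignores the structural similarity that a graph-mining approach is designed to capture.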
5
Transcriptional Variation of Diverse Enteropathogenic Escherichia coli Isolates under Virulence-Inducing Conditions. mSystems 2017; 2:mSystems00024-17. [PMID: 28766584 PMCID: PMC5527300 DOI: 10.1128/msystems.00024-17] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 03/28/2017] [Accepted: 05/06/2017] [Indexed: 12/23/2022]
Abstract
Enteropathogenic Escherichia coli (EPEC) bacteria are a diverse group of pathogens that cause moderate to severe diarrhea in young children in developing countries. EPEC isolates can be further subclassified as typical EPEC (tEPEC) isolates that contain the bundle-forming pilus (BFP) or as atypical EPEC (aEPEC) isolates that do not contain BFP. Comparative genomics studies have recently highlighted the considerable genomic diversity among EPEC isolates. In the current study, we used RNA sequencing (RNA-Seq) to characterize the global transcriptomes of eight tEPEC isolates representing the identified genomic diversity, as well as one aEPEC isolate. The global transcriptomes were determined for the EPEC isolates under conditions of laboratory growth that are known to induce expression of virulence-associated genes. The findings demonstrate that unique genes of EPEC isolates from diverse phylogenomic lineages contribute to variation in their global transcriptomes. There were also phylogroup-specific differences in the global transcriptomes, including genes involved in iron acquisition, which had significant differential expression in the EPEC isolates belonging to phylogroup B2. Also, three EPEC isolates from the same phylogenomic lineage (EPEC8) had greater levels of similarity in their genomic content and exhibited greater similarities in their global transcriptomes than EPEC from other lineages; however, even among closely related isolates there were isolate-specific differences among their transcriptomes. These findings highlight the transcriptional variability that correlates with the previously unappreciated genomic diversity of EPEC. IMPORTANCE Recent studies have demonstrated that there is considerable genomic diversity among EPEC isolates; however, it is unknown if this genomic diversity leads to differences in their global transcription. This study used RNA-Seq to compare the global transcriptomes of EPEC isolates from diverse phylogenomic lineages. 
We demonstrate that there are lineage- and isolate-specific differences in the transcriptomes of genomically diverse EPEC isolates during growth under in vitro virulence-inducing conditions. This study addressed biological variation among isolates of a single pathovar in an effort to demonstrate that while each of these isolates is considered an EPEC isolate, there is significant transcriptional diversity among members of this pathovar. Future studies should consider whether this previously undescribed transcriptional variation may play a significant role in isolate-specific variability of EPEC clinical presentations.
6
A Review on Recent Computational Methods for Predicting Noncoding RNAs. Biomed Res Int 2017; 2017:9139504. [PMID: 28553651 PMCID: PMC5434267 DOI: 10.1155/2017/9139504] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Received: 11/29/2016] [Revised: 02/06/2017] [Accepted: 02/15/2017] [Indexed: 12/20/2022]
Abstract
Noncoding RNAs (ncRNAs) play important roles in various cellular activities and diseases. In this paper, we present a comprehensive review of computational methods for ncRNA prediction, which are generally grouped into four categories: (1) homology-based methods, that is, comparative methods involving evolutionarily conserved RNA sequences and structures; (2) de novo methods using RNA sequence and structure features; (3) methods based on transcriptome sequencing and assembly, that is, methods designed for single-end and paired-end reads generated by next-generation RNA sequencing; and (4) RNA family-specific methods, for example, methods specific to microRNAs and long noncoding RNAs. Finally, we summarize the advantages and limitations of these methods and point out a few possible future directions for ncRNA prediction. In conclusion, many computational methods have been demonstrated to be effective in predicting ncRNAs for further experimental validation. They are critical in reducing the huge number of potential ncRNAs and pointing the community to high-confidence candidates. In the future, highly efficient mapping technology and more intrinsic sequence features (e.g., motif and k-mer frequencies) and structure features (e.g., minimum free energy, conserved stem-loops, or graph structures) are suggested to be combined with next- and third-generation sequencing platforms to improve ncRNA prediction.
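The review lists k-mer frequencies among the intrinsic sequence features used for ncRNA prediction. A minimal sketch of such a feature vector follows, assuming an RNA alphabet of A, C, G, U (T is converted to U), overlapping windows, and that windows containing ambiguous bases such as N are skipped; the function name `kmer_frequencies` is hypothetical and not taken from any tool covered by the review.

```python
from itertools import product


def kmer_frequencies(sequence: str, k: int = 3) -> dict[str, float]:
    """Return relative frequencies of all 4**k k-mers over the alphabet ACGU.

    T is treated as U; overlapping windows containing any other symbol (e.g. N) are skipped.
    """
    seq = sequence.upper().replace("T", "U")
    kmers = ["".join(p) for p in product("ACGU", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        word = seq[i : i + k]
        if word in counts:
            counts[word] += 1
            total += 1
    if total == 0:
        # No valid windows: return an all-zero vector of fixed length.
        return {w: 0.0 for w in kmers}
    return {w: c / total for w, c in counts.items()}
```

Because every sequence maps to the same fixed-length vector of 4**k values in a fixed key order, the output can be fed directly to standard classifiers; for `"ACGUACGU"` with `k=2`, there are seven windows and `AC` has frequency 2/7.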
7
Barman RK, Mukhopadhyay A, Das S. An improved method for identification of small non-coding RNAs in bacteria using support vector machine. Sci Rep 2017; 7:46070. [PMID: 28383059 PMCID: PMC5382675 DOI: 10.1038/srep46070] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 09/22/2016] [Accepted: 03/08/2017] [Indexed: 12/25/2022]
Abstract
Bacterial small non-coding RNAs (sRNAs) are not translated into proteins but act as functional RNAs. They are involved in diverse biological processes such as virulence, stress response and quorum sensing. Several high-throughput techniques have enabled identification of sRNAs in bacteria, but experimental detection remains challenging and grossly incomplete for most species. Thus, there is a need to develop computational tools to predict bacterial sRNAs. Here, we propose a computational method to identify sRNAs in bacteria using a support vector machine (SVM) classifier. The primary sequence and secondary structure features of experimentally validated sRNAs of Salmonella Typhimurium LT2 (SLT2) were used to build the optimal SVM model. We found that a tri-nucleotide composition feature of sRNAs achieved an accuracy of 88.35% for SLT2. We also validated the SVM model on the experimentally detected sRNAs of E. coli and Salmonella Typhi. The proposed model robustly attained accuracies of 81.25% and 88.82% for E. coli K-12 and S. Typhi Ty2, respectively. We confirmed that this method significantly improved the identification of sRNAs in bacteria. Furthermore, we used a sliding window-based method and identified sRNAs from the complete genomes of SLT2, S. Typhi Ty2 and E. coli K-12 with sensitivities of 89.09%, 83.33% and 67.39%, respectively.
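The abstract mentions a sliding window-based genome scan but does not give the window size, step, or how positive windows were combined. The sketch below shows one plausible shape for such a scan, assuming a trained classifier is available as a callable that labels a subsequence; `classify`, `window`, `step` and the default values are hypothetical, and merging overlapping positive windows into candidate regions is an illustrative choice rather than the authors' procedure.

```python
from typing import Callable, Iterator


def sliding_windows(genome: str, window: int, step: int) -> Iterator[tuple[int, str]]:
    """Yield (start, subsequence) for each full-length window, advancing by `step`."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    for start in range(0, len(genome) - window + 1, step):
        yield start, genome[start : start + window]


def scan_genome(
    genome: str,
    classify: Callable[[str], bool],  # hypothetical stand-in for a trained SVM
    window: int = 100,
    step: int = 50,
) -> list[tuple[int, int]]:
    """Return (start, end) regions covered by windows that `classify` labels as sRNA.

    Overlapping or adjacent positive windows are merged into one candidate region.
    """
    regions: list[tuple[int, int]] = []
    for start, sub in sliding_windows(genome, window, step):
        if not classify(sub):
            continue
        end = start + window
        if regions and start <= regions[-1][1]:
            regions[-1] = (regions[-1][0], max(regions[-1][1], end))
        else:
            regions.append((start, end))
    return regions
```

With a toy rule such as `lambda s: s.count("G") >= 5` on a 60-nt sequence of 20 A, 20 G and 20 A, a 10-nt window and a step of 5 yield the single merged region `(15, 45)`. The step size trades scan cost against how precisely region boundaries are located.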
Affiliation(s)
- Ranjan Kumar Barman
- Biomedical Informatics Centre, National Institute of Cholera and Enteric Diseases, Kolkata, West Bengal, India
- Anirban Mukhopadhyay
- Department of Computer Science and Engineering, University of Kalyani, Kalyani, West Bengal, India
- Santasabuj Das
- Biomedical Informatics Centre, National Institute of Cholera and Enteric Diseases, Kolkata, West Bengal, India; Division of Clinical Medicine, National Institute of Cholera and Enteric Diseases, Kolkata, West Bengal, India
8
A Review of Computational Methods for Finding Non-Coding RNA Genes. Genes (Basel) 2016; 7:genes7120113. [PMID: 27918472 PMCID: PMC5192489 DOI: 10.3390/genes7120113] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Received: 08/30/2016] [Revised: 11/04/2016] [Accepted: 11/17/2016] [Indexed: 12/19/2022]
Abstract
Finding non-coding RNA (ncRNA) genes has emerged over the past few years as a cutting-edge trend in bioinformatics. The annotation and interpretation of ncRNAs pose numerous computational intelligence (CI) challenges because they require domain-related expert knowledge of CI techniques. Moreover, many predicted classes have not yet been experimentally verified. Recently, researchers have applied many CI methods to predict the classes of ncRNAs. However, these diverse CI approaches lack a definitive classification framework that would take advantage of past studies. A few review papers have attempted to summarize CI approaches but focused on particular methodological viewpoints. Accordingly, in this article, we summarize the CI techniques for finding ncRNA genes in greater detail than previously available. We distinguish this review from the existing body of research and concisely discuss the technical merits of the various techniques. Lastly, we review the limitations of ncRNA gene-finding CI methods with a view toward the development of new computational tools.
9
Pian C, Zhang G, Chen Z, Chen Y, Zhang J, Yang T, Zhang L. LncRNApred: Classification of Long Non-Coding RNAs and Protein-Coding Transcripts by the Ensemble Algorithm with a New Hybrid Feature. PLoS One 2016; 11:e0154567. [PMID: 27228152 PMCID: PMC4882039 DOI: 10.1371/journal.pone.0154567] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.3] [Received: 09/18/2015] [Accepted: 04/15/2016] [Indexed: 12/31/2022]
Abstract
As a novel class of noncoding RNAs, long noncoding RNAs (lncRNAs) have been verified to be associated with various diseases. As large-scale transcript data are generated every year, it is important to identify lncRNAs accurately and quickly among thousands of assembled transcripts. To discover new lncRNAs accurately, we developed a random forest (RF) classification tool named LncRNApred based on a new hybrid feature set. This hybrid feature set includes three newly proposed features: MaxORF, RMaxORF and SNR. LncRNApred classifies lncRNAs and protein-coding transcripts accurately and quickly. Moreover, our RF model requires training only on human coding and non-coding transcripts, yet transcripts from other species can also be predicted with LncRNApred. The results show that our method is more effective than the Coding Potential Calculator (CPC). The web server of LncRNApred is available for free at http://mm20132014.wicp.net:57203/LncRNApred/home.jsp.
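The abstract names MaxORF and RMaxORF without defining them. The sketch below assumes MaxORF is the length of the longest ATG-initiated, stop-terminated open reading frame on the forward strand, and RMaxORF is that length divided by the transcript length; both definitions are assumptions for illustration, not the paper's exact specification, and the SNR feature is not covered. The function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_length(seq: str) -> int:
    """Length in nt (stop codon included) of the longest ATG...stop ORF in the three forward frames.

    ORFs that run off the 3' end without a stop codon are ignored (an assumption).
    """
    seq = seq.upper().replace("U", "T")
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i : i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG in this frame opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None
    return best


def orf_features(seq: str) -> tuple[int, float]:
    """Return (MaxORF, RMaxORF), assuming RMaxORF = MaxORF / transcript length."""
    max_orf = longest_orf_length(seq)
    return max_orf, (max_orf / len(seq) if seq else 0.0)
```

For `"CCATGAAATAGCC"` (13 nt), the only ORF is `ATGAAATAG` in the third frame, so `orf_features` returns `(9, 9 / 13)`. Protein-coding transcripts tend to have long ORFs covering much of the transcript, which is why such ratios are informative for separating them from lncRNAs.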
Affiliation(s)
- Cong Pian
- Department of Mathematics, College of Science, Nanjing Agricultural University, Nanjing, Jiangsu, People’s Republic of China
- Guangle Zhang
- Department of Mathematics, College of Science, Nanjing Agricultural University, Nanjing, Jiangsu, People’s Republic of China
- Zhi Chen
- Department of Mathematics, College of Science, Nanjing Agricultural University, Nanjing, Jiangsu, People’s Republic of China
- Yuanyuan Chen
- Department of Mathematics, College of Science, Nanjing Agricultural University, Nanjing, Jiangsu, People’s Republic of China
- Jin Zhang
- Department of Mathematics, College of Science, Nanjing Agricultural University, Nanjing, Jiangsu, People’s Republic of China
- Tao Yang
- Department of Mathematics, College of Science, Nanjing Agricultural University, Nanjing, Jiangsu, People’s Republic of China
- Liangyun Zhang
- Department of Mathematics, College of Science, Nanjing Agricultural University, Nanjing, Jiangsu, People’s Republic of China
10
Computational Detection of piRNA in Human Using Support Vector Machine. Avicenna J Med Biotechnol 2016; 8:36-41. [PMID: 26855734 PMCID: PMC4717465] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
Abstract
BACKGROUND Piwi-interacting RNAs (piRNAs) are recently discovered small non-coding RNAs (ncRNAs), about 24-32 nucleotides long. These ncRNAs play important roles in germline development, transposon silencing, epigenetic regulation, protecting the genome from invasive transposable elements, and the pathophysiology of diseases such as cancer. piRNA identification is challenging due to the lack of conserved piRNA sequences and structural elements. METHODS To detect piRNAs, a feature set comprising 8 diverse feature groups was used to encode each RNA, and a Support Vector Machine (SVM) classifier with optimized parameters was used for RNA classification. According to the results, classification performance using the optimized feature subsets was much higher than in previously published studies. RESULTS Our results revealed 98% accuracy, a Matthews correlation coefficient of 98% and 99% specificity in discriminating piRNAs from other RNAs. The results also show that the proposed method outperforms its competitors. CONCLUSION In this paper, a prediction method was proposed to identify piRNAs in human. In total, 48 heterogeneous features (sequence and structural features) were used to encode RNAs. To assess the performance of the method, a benchmark dataset containing 515 piRNAs and 1206 other RNAs was constructed. Our method reached an accuracy of 99% on the benchmark dataset. Our analysis also revealed that structural features contribute most to piRNA prediction.
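The abstract reports accuracy, specificity and the Matthews correlation coefficient (MCC). For reference, the sketch below computes these, together with sensitivity, from the four confusion-matrix counts of a binary classifier. The counts used in the example are hypothetical and do not reproduce the paper's results; the function name `binary_metrics` is also hypothetical.

```python
import math


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Accuracy, sensitivity, specificity and MCC from confusion-matrix counts.

    Ratios with a zero denominator are reported as 0.0.
    """
    total = tp + fp + tn + fn
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }
```

With hypothetical counts `tp=40, fp=10, tn=45, fn=5`, accuracy is 0.85, sensitivity 40/45 and specificity 45/55. On an imbalanced benchmark such as 515 positives versus 1206 negatives, MCC is often preferred over accuracy alone because it uses all four cells of the confusion matrix.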
11
Rau MH, Bojanovič K, Nielsen AT, Long KS. Differential expression of small RNAs under chemical stress and fed-batch fermentation in E. coli. BMC Genomics 2015; 16:1051. [PMID: 26653712 PMCID: PMC4676190 DOI: 10.1186/s12864-015-2231-8] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Received: 04/28/2015] [Accepted: 11/18/2015] [Indexed: 01/03/2023]
Abstract
Background Bacterial small RNAs (sRNAs) are recognized as posttranscriptional regulators involved in the control of bacterial lifestyle and adaptation to stressful conditions. Although chemical stress due to the toxicity of precursor and product compounds is frequently encountered in microbial bioprocessing applications, the involvement of sRNAs in this process is not well understood. We have used RNA sequencing to map sRNA expression in E. coli under chemical stress and high cell density fermentation conditions with the aim of identifying sRNAs involved in the transcriptional response and those with potential roles in stress tolerance. Results RNA sequencing libraries were prepared from RNA isolated from E. coli K-12 MG1655 cells grown under high cell density fermentation conditions or subjected to chemical stress with twelve compounds including four organic solvent-like compounds, four organic acids, two amino acids, geraniol and decanoic acid. We have discovered 253 novel intergenic transcripts with this approach, adding to the roughly 200 intergenic sRNAs previously reported in E. coli. There are eighty-four differentially expressed sRNAs during fermentation, of which the majority are novel, supporting possible regulatory roles for these transcripts in adaptation during different fermentation stages. There are a total of 139 differentially expressed sRNAs under chemical stress conditions, where twenty-nine exhibit significant expression changes in multiple tested conditions, suggesting that they may be involved in a more general chemical stress response. Among those with known functions are sRNAs involved in regulation of outer membrane proteins, iron availability, maintaining envelope homeostasis, as well as sRNAs incorporated into complex networks controlling motility and biofilm formation. Conclusions This study has used deep sequencing to reveal a wealth of hitherto undescribed sRNAs in E. coli and provides an atlas of sRNA expression during seventeen different growth and stress conditions. Although the number of novel sRNAs with regulatory functions is unknown, several exhibit specific expression patterns during high cell density fermentation and are differentially expressed in the presence of multiple chemicals, suggesting they may play regulatory roles during these stress conditions. These novel sRNAs, together with specific known sRNAs, are candidates for improving stress tolerance and our understanding of the E. coli regulatory network during fed-batch fermentation. Electronic supplementary material The online version of this article (doi:10.1186/s12864-015-2231-8) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Martin Holm Rau
- Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kogle Allé 6, 2970, Hørsholm, Denmark.
- Klara Bojanovič
- Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kogle Allé 6, 2970, Hørsholm, Denmark.
- Alex Toftgaard Nielsen
- Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kogle Allé 6, 2970, Hørsholm, Denmark.
- Katherine S Long
- Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kogle Allé 6, 2970, Hørsholm, Denmark.
12
Li D, Shao F, Lu S. Identification and characterization of mRNA-like noncoding RNAs in Salvia miltiorrhiza. Planta 2015; 241:1131-43. [PMID: 25601000 DOI: 10.1007/s00425-015-2246-z] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 10/27/2014] [Accepted: 01/09/2015] [Indexed: 05/20/2023]
Abstract
Identification and characterization of 5,446 mlncRNAs from Salvia miltiorrhiza showed that the majority of identified mlncRNAs were stress responsive, providing a framework for elucidating mlncRNA functions in S. miltiorrhiza. mRNA-like noncoding RNAs (mlncRNAs) are transcribed by RNA polymerase II and are polyadenylated, capped and spliced. They play important roles in plant development and defense responses. However, there is no information available for mlncRNAs in Salvia miltiorrhiza Bunge, the first Chinese medicinal material entering the international market. To perform a transcriptome-wide identification of S. miltiorrhiza mlncRNAs, we assembled over 8 million RNA-seq reads from GenBank database and 5,624 ESTs from PlantGDB into 44422 unigenes. Using a computational identification pipeline, we identified 5446 S. miltiorrhiza mlncRNA candidates from the assembled unigenes. Of the 5446 mlncRNAs, 2 are primary transcripts of conserved miRNAs, and 2030 can be grouped into 470 families with at least two members in a family. Quantitative real-time PCR analysis of mlncRNAs with at least 900 nt showed that the majority were differentially expressed in roots, stems, leaves and flowers and responsive to methyl jasmonate (MeJA) treatment in S. miltiorrhiza. Analysis of published RNA-seq data showed that a total of 3,044 mlncRNAs were expressed in hairy roots of S. miltiorrhiza and the expression of 1,904 of the 3,044 mlncRNAs was altered by yeast extract and Ag(+) treatment. The results indicate that the majority of mlncRNAs are involved in plant response to stress. This study provides a framework for understanding the roles of mlncRNAs in S. miltiorrhiza.
Affiliation(s)
- Dongqiao Li
- Institute of Medicinal Plant Development, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
13
Manzourolajdad A, Arnold J. Secondary structural entropy in RNA switch (Riboswitch) identification. BMC Bioinformatics 2015; 16:133. [PMID: 25928324 PMCID: PMC4448311 DOI: 10.1186/s12859-015-0523-2] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 07/31/2014] [Accepted: 03/02/2015] [Indexed: 01/10/2023]
Abstract
BACKGROUND RNA regulatory elements play a significant role in gene regulation. Riboswitches, a widespread group of regulatory RNAs, are vital components of many bacterial genomes. These regulatory elements generally function by forming a ligand-induced alternative fold that controls access to ribosome binding sites or other regulatory sites in RNA. Riboswitch-mediated mechanisms are ubiquitous across bacterial genomes. Each class of riboswitch typically has its own unique structural and biological complexity, making de novo riboswitch identification a formidable task. Traditionally, riboswitches have been identified through comparative genomics based on sequence and structural homology. The limitations of structural-homology-based approaches, coupled with the assumption that a great diversity of riboswitches remains undiscovered, suggest the need for alternative methods of riboswitch identification, possibly based on features intrinsic to their structure. No such reliable method has yet been proposed. RESULTS We used the structural entropy of riboswitch sequences as a measure of their secondary structural dynamics. Entropy values of a diverse set of riboswitches were compared to those of their mutants, their dinucleotide shuffles, and their reverse complement sequences under different stochastic context-free grammar folding models. The significance of our results was evaluated by comparison to other approaches, such as base-pairing entropy and energy landscape dynamics. Classifiers based on structural entropy optimized via sequence and structural features were devised as riboswitch identifiers and tested on Bacillus subtilis, Escherichia coli, and Synechococcus elongatus as an exploration of structural entropy-based approaches. The unusually long untranslated region of cotH in Bacillus subtilis, as well as upstream regions of certain genes, such as sucC, were associated with significant structural entropy values in genome-wide examinations.
CONCLUSIONS Various tests show that there is in fact a relationship between higher structural entropy and the potential for the RNA sequence to have alternative structures, within the limitations of our methodology. This relationship, though modest, is consistent across various tests. Understanding the behavior of structural entropy as a fairly new feature for RNA conformational dynamics, however, may require extensive exploratory investigation both across RNA sequences and folding models.
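The abstract compares structural entropy with base-pairing entropy. The sketch below illustrates only the latter: given a base-pair probability matrix (assumed to come from a separate folding tool; the toy matrix in the example is hypothetical), it computes the Shannon entropy at each nucleotide over its possible partners plus the "unpaired" outcome, in nats. It is not the SCFG-based structural entropy computed in the paper, and the function name `positional_entropy` is hypothetical.

```python
import math


def positional_entropy(pair_prob: list[list[float]]) -> list[float]:
    """Per-nucleotide Shannon entropy (nats) of pairing partners, including 'unpaired'.

    pair_prob is assumed symmetric, with pair_prob[i][j] = P(i pairs with j).
    The unpaired probability of i is 1 - sum_j pair_prob[i][j].
    """
    n = len(pair_prob)
    entropies = []
    for i in range(n):
        probs = [pair_prob[i][j] for j in range(n) if j != i]
        unpaired = 1.0 - sum(probs)
        if unpaired < -1e-9:
            raise ValueError(f"pairing probabilities at position {i} sum to more than 1")
        probs.append(max(unpaired, 0.0))
        entropies.append(-sum(p * math.log(p) for p in probs if p > 0))
    return entropies
```

In a 4-nt toy matrix where positions 0 and 3 always pair and positions 1 and 2 pair with probability 0.5, the result is `[0.0, ln 2, ln 2, 0.0]`. Positions whose pairing is uncertain get higher entropy, which is the intuition behind using entropy as a signal for alternative folds.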
Affiliation(s)
- Amirhossein Manzourolajdad
- Institute of Bioinformatics, University of Georgia, Davison Life Sciences Bldg, Room B118B, 120 Green St, Athens, 30602, USA; National Center for Biotechnology Information (NCBI), NIH, Building 38A, RM 6S614K, 8600 Rockville Pike, Bethesda, 20894, USA.
- Jonathan Arnold
- Institute of Bioinformatics, University of Georgia, Davison Life Sciences Bldg, Room B118B, 120 Green St, Athens, 30602, USA; Department of Genetics, University of Georgia, Davison Life Sciences Bldg, 120 Green St, Athens, 30602, USA.
14
Lertampaiporn S, Thammarongtham C, Nukoolkit C, Kaewkamnerdpong B, Ruengjitchatchawalya M. Identification of non-coding RNAs with a new composite feature in the Hybrid Random Forest Ensemble algorithm. Nucleic Acids Res 2014; 42:e93. [PMID: 24771344 PMCID: PMC4066759 DOI: 10.1093/nar/gku325] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.8] [Received: 02/10/2014] [Revised: 04/02/2014] [Accepted: 04/07/2014] [Indexed: 12/13/2022]
Abstract
To identify non-coding RNA (ncRNA) signals within genomic regions, a classification tool was developed based on a hybrid random forest (RF) combined with a logistic regression model to efficiently discriminate both short ncRNA sequences and long, complex ncRNA sequences. This RF-based classifier was trained on a well-balanced dataset with a discriminative set of features and achieved an accuracy, sensitivity and specificity of 92.11%, 90.7% and 93.5%, respectively. The selected feature set includes a newly proposed feature, SCORE, which is generated by a logistic regression function that combines five significant features (structure, sequence, modularity, structural robustness and coding potential) to improve the characterization of long ncRNA (lncRNA) elements. The use of SCORE improved the performance of the RF-based classifier in identifying Rfam lncRNA families. A genome-wide ncRNA classification framework was applied to a wide variety of organisms, with an emphasis on those of economic, social, public health, environmental and agricultural significance, such as various bacterial genomes, the Arthrospira (Spirulina) genome, and rice and human genomic regions. Our framework identified known ncRNAs with sensitivities of more than 90% and 77.7% for prokaryotic and eukaryotic sequences, respectively. Our classifier is available at http://ncrna-pred.com/HLRF.htm.
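The three reported figures all derive from a confusion matrix of predicted versus true labels. The short Python sketch below computes them from hypothetical counts (a balanced set of 1,000 ncRNAs and 1,000 negatives, chosen only for illustration, not the study's data). On an exactly balanced dataset, accuracy reduces to the mean of sensitivity and specificity, and (90.7% + 93.5%) / 2 = 92.1% is close to the reported accuracy of 92.11%.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts.

    tp/fn: true ncRNAs predicted as ncRNA / missed.
    tn/fp: negatives correctly rejected / wrongly called ncRNA.
    """
    sensitivity = tp / (tp + fn)   # recall on the positive (ncRNA) class
    specificity = tn / (tn + fp)   # recall on the negative class
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return accuracy, sensitivity, specificity

# Hypothetical balanced test set: 1,000 positives and 1,000 negatives.
acc, sens, spec = classification_metrics(tp=907, fn=93, tn=935, fp=65)
print(f"accuracy={acc:.3f} sensitivity={sens:.3f} specificity={spec:.3f}")
# prints: accuracy=0.921 sensitivity=0.907 specificity=0.935
```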
Affiliation(s)
- Supatcha Lertampaiporn
- Biological Engineering Program, Faculty of Engineering, King Mongkut's University of Technology Thonburi, 126 Pracha Uthit Rd, Bangmod, Thung Khru, Bangkok 10140, Thailand
- Chinae Thammarongtham
- Biochemical Engineering and Pilot Plant Research and Development Unit, National Center for Genetic Engineering and Biotechnology at King Mongkut's University of Technology Thonburi (Bang Khun Thian Campus), 49 Soi Thian Thale 25, Bang Khun Thian Chai Thale Rd, Tha Kham, Bangkok 10150, Thailand
- Chakarida Nukoolkit
- School of Information Technology, King Mongkut's University of Technology Thonburi, 126 Pracha Uthit Rd, Bangmod, Thung Khru, Bangkok 10140, Thailand
- Boonserm Kaewkamnerdpong
- Biological Engineering Program, Faculty of Engineering, King Mongkut's University of Technology Thonburi, 126 Pracha Uthit Rd, Bangmod, Thung Khru, Bangkok 10140, Thailand
- Marasri Ruengjitchatchawalya
- Biotechnology Program, School of Bioresources and Technology, King Mongkut's University of Technology Thonburi (Bang Khun Thian Campus), 49 Soi Thian Thale 25, Bang Khun Thian Chai Thale Rd, Tha Kham, Bangkok 10150, Thailand; Bioinformatics and Systems Biology Program, King Mongkut's University of Technology Thonburi (Bang Khun Thian Campus), 49 Soi Thian Thale 25, Bang Khun Thian Chai Thale Rd, Tha Kham, Bangkok 10150, Thailand
15
Lopes IDON, Schliep A, de Carvalho ACPDLF. The discriminant power of RNA features for pre-miRNA recognition. BMC Bioinformatics 2014; 15:124. [PMID: 24884650 PMCID: PMC4046174 DOI: 10.1186/1471-2105-15-124] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.1] [Received: 10/25/2013] [Accepted: 04/08/2014] [Indexed: 12/26/2022]
Abstract
BACKGROUND Computational discovery of microRNAs (miRNAs) is based on predetermined sets of features from miRNA precursors (pre-miRNAs). Some feature sets are composed of sequence-structure patterns commonly found in pre-miRNAs, while others combine more sophisticated RNA features. In this work, we analyze the discriminant power of seven feature sets, which are used in six pre-miRNA prediction tools. The analysis is based on the classification performance achieved with these feature sets by the training algorithms used in these tools. We also evaluate feature discrimination through the F-score and feature importance in the induction of random forests. RESULTS Only small or non-significant differences were found among the estimated classification performances of classifiers induced from feature sets of diverse composition, despite wide differences in their dimension. Inspired by these results, we obtained a lower-dimensional feature set, which achieved a sensitivity of 90% and a specificity of 95%. These estimates are within 0.1% of the maximal values obtained with any feature set (SELECT, Section "Results and discussion"), while it is 34 times faster to compute. Even compared with another feature set (FS2, see Section "Results and discussion"), which is the computationally least expensive feature set from the literature that performs within 0.1% of the maximal values, it is 34 times faster to compute. Of the six tools used as references in the experiments, five showed lower sensitivity or specificity. CONCLUSION In miRNA discovery, the number of putative miRNA loci is on the order of millions. Analysis of putative pre-miRNAs using a computationally expensive feature set would be wasteful or even infeasible for large genomes.
In this work, we propose a relatively inexpensive feature set and explore most of the learning aspects implemented in current ab initio pre-miRNA prediction tools, which may lead to the development of efficient ab initio pre-miRNA discovery tools. The material to reproduce the main results from this paper can be downloaded from http://bioinformatics.rutgers.edu/Static/Software/discriminant.tar.gz.
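The F-score used for feature discrimination ranks one feature at a time by how far apart the class means lie relative to the spread within each class. The Python sketch below follows the two-class F-score definition popularized by Chen and Lin for feature selection; the abstract does not spell out the formula, so treating it as the authors' exact definition is an assumption. The feature values in the example are hypothetical, minimum-free-energy-like numbers, not data from the study.

```python
from statistics import mean

def f_score(pos, neg):
    """Two-class F-score of a single feature (larger = more discriminative).

    F = [(m+ - m)^2 + (m- - m)^2] / [var+ + var-], where m is the mean over
    all samples, m+/m- are the class means, and var+/var- are the sample
    variances (n - 1 denominator) within each class.
    """
    if len(pos) < 2 or len(neg) < 2:
        raise ValueError("each class needs at least two samples")
    m_all = mean(list(pos) + list(neg))
    m_pos, m_neg = mean(pos), mean(neg)
    between = (m_pos - m_all) ** 2 + (m_neg - m_all) ** 2
    within = (sum((x - m_pos) ** 2 for x in pos) / (len(pos) - 1)
              + sum((x - m_neg) ** 2 for x in neg) / (len(neg) - 1))
    if within == 0:
        # Constant within each class: perfectly separable unless means coincide.
        return 0.0 if between == 0 else float("inf")
    return between / within

# Hypothetical feature values for pre-miRNAs (pos) and pseudo-hairpins (neg).
print(f_score([-30.0, -28.0, -32.0], [-10.0, -12.0, -8.0]))  # 25.0
print(f_score([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))              # 0.0 (uninformative)
```

Because each feature is scored independently, the F-score is cheap to compute but ignores interactions between features, which is one reason to complement it with random-forest importance as the abstract describes.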
Affiliation(s)
- Ivani de O N Lopes
- Empresa Brasileira de Pesquisa Agropecuária, Embrapa Soja, Caixa Postal 231, Londrina-PR, CEP 86001-970, Brasil.
16
Wang C, Wei L, Guo M, Zou Q. Computational approaches in detecting non-coding RNA. Curr Genomics 2014; 14:371-7. [PMID: 24396270 PMCID: PMC3861888 DOI: 10.2174/13892029113149990005] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.4] [Received: 04/26/2013] [Revised: 07/18/2013] [Accepted: 07/18/2013] [Indexed: 12/21/2022]
Abstract
The important role of non-coding RNAs (ncRNAs) in the cell has made their identification a critical issue in biological research. However, traditional approaches such as RT-PCR and Northern blotting are costly. With recent progress in bioinformatics and computational prediction technology, the discovery of ncRNAs has become realistically possible. This paper introduces the major computational approaches to ncRNA identification, including homology search, de novo prediction and mining of deep-sequencing data. In addition, related software tools are compared and reviewed, along with a discussion of future improvements.
Affiliation(s)
- Chunyu Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
- Leyi Wei
- School of Information Science and Technology, Xiamen University, Xiamen 361005, China
- Maozu Guo
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
- Quan Zou
- School of Information Science and Technology, Xiamen University, Xiamen 361005, China
17
Khoo JS, Chai SF, Mohamed R, Nathan S, Firdaus-Raih M. Computational discovery and RT-PCR validation of novel Burkholderia conserved and Burkholderia pseudomallei unique sRNAs. BMC Genomics 2012; 13 Suppl 7:S13. [PMID: 23282220 PMCID: PMC3521395 DOI: 10.1186/1471-2164-13-s7-s13] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.0] [Indexed: 11/10/2022]
Abstract
BACKGROUND The sRNAs of bacterial pathogens are known to be involved in various cellular roles, including environmental adaptation as well as regulation of virulence and pathogenicity. sRNAs are expected to have similar functions in Burkholderia pseudomallei, a soil bacterium that can adapt to diverse environmental conditions, causes the disease melioidosis and is able to infect a wide variety of hosts. RESULTS By integrating several proven sRNA prediction programs into a computational pipeline, we screened the available Burkholderia spp. genomes to identify sRNA gene candidates. Orthologous sRNA candidates were then identified via comparative analysis. Of the total predictions, 21 candidates were found to have Rfam homologs. RT-PCR and sequencing of candidate sRNA genes of unknown function revealed six putative sRNAs that are highly conserved in Burkholderia spp. and two that are unique to B. pseudomallei, present in the transcriptome under normal culture conditions. The validated sRNAs include potential cis-acting elements associated with the modulation of methionine metabolism and one B. pseudomallei-specific sRNA that is expected to bind the Hfq protein. CONCLUSIONS The pipeline developed in this study and the subsequent comparative analysis successfully aided in the discovery and shortlisting of sRNA gene candidates for validation. This integrated approach identified 29 B. pseudomallei sRNA genes, of which 21 have Rfam homologs and 8 are novel.
Affiliation(s)
- Jia-Shiun Khoo
- School of Biosciences and Biotechnology, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, 43600 UKM Bangi, Malaysia
18
Li W, Ying X, Lu Q, Chen L. Predicting sRNAs and their targets in bacteria. GENOMICS PROTEOMICS & BIOINFORMATICS 2012. [PMID: 23200137 PMCID: PMC5054197 DOI: 10.1016/j.gpb.2012.09.004] [Citation(s) in RCA: 44] [Impact Index Per Article: 3.4] [Indexed: 11/25/2022]
Abstract
Bacterial small RNAs (sRNAs) are an emerging class of regulatory RNAs, about 40–500 nucleotides in length, that bind target mRNAs or proteins and are involved in many biological processes, such as sensing environmental changes and regulating gene expression. Thus, identification of bacterial sRNAs and their targets has become an important part of sRNA biology. Current strategies for discovering sRNAs and their targets usually involve bioinformatic prediction followed by experimental validation, which emphasizes the key role of bioinformatic prediction. Here, we therefore provide an overview of prediction methods, focusing on the merits and limitations of each class of models. Finally, we present our thoughts on developing related bioinformatics models in the future.
Affiliation(s)
- Wuju Li
- Beijing Institute of Basic Medical Sciences, Beijing 100850, China.
19
Chen XS, Brown CM. Computational identification of new structured cis-regulatory elements in the 3'-untranslated region of human protein coding genes. Nucleic Acids Res 2012; 40:8862-73. [PMID: 22821558 PMCID: PMC3467077 DOI: 10.1093/nar/gks684] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 04/19/2012] [Revised: 06/15/2012] [Accepted: 06/20/2012] [Indexed: 01/14/2023]
Abstract
Messenger ribonucleic acids (mRNAs) contain a large number of cis-regulatory RNA elements that function in many types of post-transcriptional regulation. These cis-regulatory elements are often characterized by conserved structures and/or sequences. Although some classes are well known, the wide range of RNA-interacting proteins in eukaryotes makes it likely that many new classes of cis-regulatory elements remain to be discovered. One approach is to use computational methods, which have the advantage of analysing genomic data, particularly comparative data, on a large scale. In this study, a set of structural discovery algorithms was applied, followed by support vector machine (SVM) classification. We trained a new classification model (CisRNA-SVM) on a set of known structured cis-regulatory elements from 3'-untranslated regions (UTRs); it successfully distinguished these, as well as groups of cis-regulatory elements it had not been trained on, from control genomic and shuffled sequences. The new method outperformed previous methods in the classification of cis-regulatory RNA elements. This model was then used to predict new elements from cross-species conserved regions of human 3'-UTRs. Clustering of these elements identified new classes of potential cis-regulatory elements. The model, training and testing sets and novel human predictions are available at: http://mRNA.otago.ac.nz/CisRNA-SVM.
Affiliation(s)
- Xiaowei Sylvia Chen
- Department of Biochemistry and Genetics Otago, University of Otago, Dunedin 9054, New Zealand.
20
Khandelwal G, Jayaram B. DNA-water interactions distinguish messenger RNA genes from transfer RNA genes. J Am Chem Soc 2012; 134:8814-6. [PMID: 22551381 DOI: 10.1021/ja3020956] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Indexed: 11/28/2022]
Abstract
Physicochemical properties of DNA sequences have received little attention as a guide to gaining insight into genome organization. Here, we use the energetics of DNA to further our understanding of its language at the molecular level. Specifically, we ask whether the physicochemical properties of different functional units in genomes differ. We extract intramolecular and solvation energies of the different DNA base-pair steps from a comprehensive set of molecular dynamics simulations. We then investigate the solvation behavior of DNA sequences coding for mRNAs and tRNAs. Distinguishing mRNA genes from tRNA genes is a tricky problem in genome annotation when no assumptions are made about DNA length or the secondary structure of the transcription product. We find that the solvation energetics of DNA is an extremely efficient property for discriminating 2,063,537 genes coding for mRNAs from 56,251 genes coding for tRNAs in all (~1500) completely sequenced prokaryotic genomes.
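A per-step energy becomes a sequence-level score once it is summed or averaged over every dinucleotide step of the sequence. The Python sketch below shows only that bookkeeping, under two stated assumptions: each step is first mapped to one of the 10 unique base-pair steps of duplex DNA (a step and its reverse complement, such as AA and TT, are the same step), and the per-step values are supplied by the caller. The toy table in the example is a hypothetical stand-in (the G/C count of each step), not the solvation energies derived in the study.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def canonical_step(step):
    """Map a dinucleotide to one of the 10 unique duplex base-pair steps."""
    reverse_complement = step.translate(COMPLEMENT)[::-1]
    return min(step, reverse_complement)

def mean_step_value(seq, step_table):
    """Average a per-step property over all dinucleotide steps of `seq`.

    `step_table` maps canonical steps to values; real values (for example
    solvation energies from simulations) must come from the caller.
    """
    seq = seq.upper()
    if len(seq) < 2 or set(seq) - set("ACGT"):
        raise ValueError("need an A/C/G/T sequence of length >= 2")
    steps = [canonical_step(seq[i:i + 2]) for i in range(len(seq) - 1)]
    return sum(step_table[s] for s in steps) / len(steps)

# Hypothetical stand-in table: number of G/C bases in each step.
toy_table = {canonical_step(a + b): float((a in "GC") + (b in "GC"))
             for a in "ACGT" for b in "ACGT"}
print(len(toy_table))                          # 10 unique steps
print(mean_step_value("GGCCAATT", toy_table))  # 1.0
```

Because both strands map to the same canonical steps, the score of a sequence equals the score of its reverse complement, which is the behavior one would want for a property of the double helix.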
Affiliation(s)
- Garima Khandelwal
- Department of Chemistry, Indian Institute of Technology Delhi, Hauz Khas, New Delhi-110016, India
21
Wu B, Li Y, Yan H, Ma Y, Luo H, Yuan L, Chen S, Lu S. Comprehensive transcriptome analysis reveals novel genes involved in cardiac glycoside biosynthesis and mlncRNAs associated with secondary metabolism and stress response in Digitalis purpurea. BMC Genomics 2012; 13:15. [PMID: 22233149 PMCID: PMC3269984 DOI: 10.1186/1471-2164-13-15] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.3] [Received: 09/05/2011] [Accepted: 01/10/2012] [Indexed: 11/10/2022]
Abstract
Conclusions Through comprehensive transcriptome analysis, we not only identified 29 novel gene families potentially involved in the biosynthesis of cardiac glycosides but also characterized a large number of mlncRNAs. Our results suggest the importance of mlncRNAs in secondary metabolism and stress response in D. purpurea.
Affiliation(s)
- Bin Wu
- Institute of Medicinal Plant Development, Chinese Academy of Medical Sciences & Peking Union Medical College, No. 151, Malianwa North Road, Haidian District, Beijing 100193, China
22
Chang X, Li Y, Ping J, Xing XB, Sun H, Jia P, Wang C, Li YY, Li YX. EcoBrowser: a web-based tool for visualizing transcriptome data of Escherichia coli. BMC Res Notes 2011; 4:405. [PMID: 21992408 PMCID: PMC3203075 DOI: 10.1186/1756-0500-4-405] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 04/01/2011] [Accepted: 10/13/2011] [Indexed: 11/23/2022]
Abstract
Background Escherichia coli has been extensively studied as a prokaryotic model organism, and its whole genome was sequenced in 1997. However, it is difficult to identify all the gene products involved in diverse functions from whole-genome sequences alone. High-resolution transcriptome mapping using tiling arrays has proved effective for improving the annotation of transcription units and discovering new ncRNA transcripts. While abundant tiling-array data have been generated, a lack of appropriate visualization tools to accommodate and integrate multiple data sources has become apparent. Findings EcoBrowser is a web-based tool for visualizing genome annotations and transcriptome data of E. coli. Important tiling-array data of E. coli from different experimental platforms are collected and processed for query. An AJAX-based genome browser is embedded for visualization. Thus, genome annotations can be compared with transcript profiling and genome occupancy profiling from independent experiments. This will help in discovering new transcripts, including novel mRNAs and ncRNAs, and in generating a detailed description of transcription unit architecture, providing further clues for investigating prokaryotic transcriptional regulation, which has proved to be far more complex than previously thought. Conclusions With the help of EcoBrowser, users can obtain a systemic view of the data, both vertically and in parallel, as well as inspiration for the design of new experiments that will expand our understanding of regulatory mechanisms.
Affiliation(s)
- Xiao Chang
- Bioinformatics Center, Key Lab of Systems Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, 320 Yueyang Road, Shanghai 200031, China.
23
Febrer M, McLay K, Caccamo M, Twomey KB, Ryan RP. Advances in bacterial transcriptome and transposon insertion-site profiling using second-generation sequencing. Trends Biotechnol 2011; 29:586-94. [PMID: 21764162 DOI: 10.1016/j.tibtech.2011.06.004] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.5] [Received: 02/09/2011] [Revised: 05/25/2011] [Accepted: 06/09/2011] [Indexed: 12/20/2022]
Abstract
The arrival of second-generation sequencing has revolutionized the study of bacteria within a short period. The sequence information generated from these platforms has helped in our understanding of bacterial development, adaptation and diversity and how bacteria cause disease. Furthermore, these technologies have quickly been adapted for high-throughput studies that were previously performed using DNA cloning or microarray-based applications. This has facilitated a more comprehensive study of bacterial transcriptomes through RNA sequencing (RNA-Seq) and the systematic determination of gene function by 'transposon monitoring'. In this review, we provide an outline of these powerful tools and the in silico analyses used in their application, and also highlight the biological questions being addressed in these approaches.
Affiliation(s)
- Melanie Febrer
- The Genome Analysis Centre, Norwich Research Park, Colney Lane, Norwich NR4 7UH, UK
24
Abstract
The intergenic regions in bacterial genomes can contain regulatory leader sequences and small RNAs (sRNAs), which both serve to modulate gene expression. Computational analyses have predicted the presence of hundreds of these noncoding regulatory RNAs in Escherichia coli; however, only about 80 have been experimentally validated. By applying a deep-sequencing approach, we detected and quantified the vast majority of the previously validated regulatory elements and identified 10 new sRNAs and nine new regulatory leader sequences in the intergenic regions of E. coli. Half of the newly discovered sRNAs displayed enhanced stability in the presence of the RNA-binding protein Hfq, which is vital to the function of many of the known E. coli sRNAs. Whereas previous methods have often relied on phylogenetic conservation to identify regulatory leader sequences, only five of the newly discovered E. coli leader sequences were present in the genomes of other enteric species. For those newly identified regulatory elements having orthologs in Salmonella, evolutionary analyses showed that these regions encoded new noncoding elements rather than small, unannotated protein-coding transcripts. In addition to discovering new noncoding regulatory elements, we validated 53 sRNAs that were previously predicted but never detected and showed that the presence, within intergenic regions, of σ70 promoters and sequences with compensatory mutations that maintain stable RNA secondary structures across related species is a good predictor of novel sRNAs.
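Since σ70 promoters in intergenic regions are reported to help predict novel sRNAs, a rough illustration of what "presence of a σ70 promoter" can mean computationally may help. The Python sketch below scans one strand for the textbook σ70 consensus hexamers (-35 TTGACA and -10 TATAAT) separated by a 15 to 19 nt spacer, allowing a few mismatches in each box. This simplified consensus scan is an assumption made for illustration, not the promoter-detection method used in the study; practical tools typically score position weight matrices on both strands.

```python
MINUS35, MINUS10 = "TTGACA", "TATAAT"   # textbook sigma70 consensus hexamers

def mismatches(a, b):
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def find_sigma70_like(seq, max_mm=2, spacers=range(15, 20)):
    """Naive consensus scan for sigma70-like promoters on one strand.

    Returns (start of -35 box, spacer length, total mismatches) tuples for
    every -35/-10 pair with at most `max_mm` mismatches in each hexamer.
    `spacers` must be in ascending order.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 5):
        mm35 = mismatches(seq[i:i + 6], MINUS35)
        if mm35 > max_mm:
            continue
        for spacer in spacers:
            j = i + 6 + spacer              # start of the -10 box
            if j + 6 > len(seq):
                break                       # longer spacers will not fit either
            mm10 = mismatches(seq[j:j + 6], MINUS10)
            if mm10 <= max_mm:
                hits.append((i, spacer, mm35 + mm10))
    return hits

# Hypothetical test sequence with a perfect consensus promoter, 17-nt spacer.
promoter = "GG" + "TTGACA" + "A" * 17 + "TATAAT" + "GG"
print(find_sigma70_like(promoter, max_mm=0))
```

Raising `max_mm` trades specificity for sensitivity; real σ70 promoters rarely match both hexamers perfectly, which is why weighted scoring is normally preferred.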
25
Herbig A, Nieselt K. nocoRNAc: characterization of non-coding RNAs in prokaryotes. BMC Bioinformatics 2011; 12:40. [PMID: 21281482 PMCID: PMC3230914 DOI: 10.1186/1471-2105-12-40] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.2] [Received: 08/17/2010] [Accepted: 01/31/2011] [Indexed: 11/10/2022]
Abstract
Background Interest in non-coding RNAs (ncRNAs) has risen steadily in the past few years because of the wide spectrum of biological processes in which they are involved. This has led to the discovery of numerous ncRNA genes across many species. However, for most organisms the non-coding transcriptome remains largely unexplored. Various experimental techniques for identifying ncRNA transcripts are available, but because these methods are costly and time-consuming, computational methods are needed that detect functional RNAs in complete genomes in order to suggest elements for further experiments. Several programs for the genome-wide prediction of functional RNAs have been developed, but most of them predict a genomic locus without indicating whether the element is transcribed. Results We present NOCORNAc, a program for the genome-wide prediction of ncRNA transcripts in bacteria. NOCORNAc incorporates various procedures for detecting transcriptional features, which are then integrated with functional ncRNA loci to determine transcript coordinates. We applied RNAz and NOCORNAc to the genome of Streptomyces coelicolor and detected more than 800 putative ncRNA transcripts, most of them located antisense to protein-coding regions. Using a custom-designed microarray, we profiled the expression of about 400 of these elements and found more than 300 to be transcribed, 38 of which are predicted novel ncRNA genes in intergenic regions. The expression patterns of many ncRNAs are as complex as those of protein-coding genes; in particular, many antisense ncRNAs show a high expression correlation with their protein-coding partners. Conclusions We have developed NOCORNAc, a framework that facilitates the automated characterization of functional ncRNAs. NOCORNAc increases the confidence of predicted ncRNA loci, especially if they contain transcribed ncRNAs.
NOCORNAc is not restricted to intergenic regions; it can be applied to the prediction of ncRNA transcripts in whole microbial genomes. The software, a user guide and example data are available at http://www.zbit.uni-tuebingen.de/pas/nocornac.htm.
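The observation that many antisense ncRNAs correlate in expression with their protein-coding partners rests on comparing two expression profiles measured over the same conditions. The abstract does not name the correlation measure, so the Python sketch below assumes Pearson's correlation coefficient; the profiles in the example are hypothetical values, not microarray data from the study.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("a constant profile has undefined correlation")
    return cov / (sx * sy)

# Hypothetical log-expression across five time points.
antisense_ncrna = [1.2, 2.0, 3.1, 3.9, 5.2]
coding_partner = [2.1, 3.8, 6.3, 7.7, 10.4]
print(round(pearson(antisense_ncrna, coding_partner), 3))
```

A value near +1 indicates that the ncRNA and its partner rise and fall together; a value near -1 would instead suggest reciprocal regulation.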
Affiliation(s)
- Alexander Herbig
- Center for Bioinformatics Tübingen, University of Tübingen, Sand 14, 72076 Tübingen, Germany
26
Pánek J, Krásny L, Bobek J, Jezková E, Korelusová J, Vohradsky J. The suboptimal structures find the optimal RNAs: homology search for bacterial non-coding RNAs using suboptimal RNA structures. Nucleic Acids Res 2010; 39:3418-26. [PMID: 21193488 PMCID: PMC3082871 DOI: 10.1093/nar/gkq1186] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.8] [Indexed: 11/14/2022]
Abstract
Non-coding RNAs (ncRNAs) are regulatory molecules encoded in the intergenic or intragenic regions of the genome. In prokaryotes, computational identification of homologs of known ncRNAs in other species often fails because sequence, structure, synteny and genome localization are only weakly conserved, except among evolutionarily closely related species. To overcome this weak conservation, we focused on RNA structure, the most conserved ncRNA property. Analysis of the structure of one of the few well-studied bacterial ncRNAs, 6S RNA, demonstrated that, unlike optimal and consensus structures, suboptimal structures can capture RNA homology even in divergent bacterial species. We created a computational procedure for identifying homologous ncRNAs using suboptimal structures. When applied to strongly divergent bacterial species, the procedure was able to identify homologous ncRNAs.
Affiliation(s)
- Josef Pánek
- Laboratory of Bioinformatics, Institute of Microbiology, Academy of Sciences of the Czech Republic, Vídeňská 1073, 14220 Prague, Czech Republic.
27
Stead MB, Marshburn S, Mohanty BK, Mitra J, Pena Castillo L, Ray D, van Bakel H, Hughes TR, Kushner SR. Analysis of Escherichia coli RNase E and RNase III activity in vivo using tiling microarrays. Nucleic Acids Res 2010; 39:3188-203. [PMID: 21149258 PMCID: PMC3082872 DOI: 10.1093/nar/gkq1242] [Citation(s) in RCA: 99] [Impact Index Per Article: 6.6] [Indexed: 11/23/2022]
Abstract
Tiling microarrays have proven to be a valuable tool for gaining insight into the transcriptomes of microbial organisms grown under various nutritional or stress conditions. Here, we describe the use of such an array, constructed at 20-nt resolution for the Escherichia coli MG1655 genome, to observe genome-wide changes in steady-state RNA levels in mutants defective in either RNase E or RNase III. The array data were validated by comparison with previously published results for a variety of specific transcripts, as well as by independent northern analysis of additional mRNAs and sRNAs. In the absence of RNase E, 60% of the annotated coding sequences showed either increases or decreases in their steady-state levels. In contrast, only 12% of the coding sequences were affected in the absence of RNase III. Unexpectedly, many coding sequences showed decreased abundance in the RNase E mutant, while more than half of the annotated sRNAs showed changes in abundance. Furthermore, the steady-state levels of many transcripts showed overlapping effects of both ribonucleases. Data are also presented demonstrating how the arrays were used to identify potential new genes, RNase III cleavage sites and the direct or indirect control of specific biological pathways.
Affiliation(s)
- Mark B Stead
- Department of Genetics, University of Georgia, Athens, GA 30605, USA
28
Abstract
Using an oligonucleotide microarray, we searched for previously unrecognized transcription units in intergenic regions in the genome of Bacillus subtilis, with an emphasis on identifying small genes activated during spore formation. Nineteen transcription units were identified, 11 of which were shown to depend on one or more sporulation-regulatory proteins for their expression. A high proportion of the transcription units contained small, functional open reading frames (ORFs). One such newly identified ORF is a member of a family of six structurally similar genes that are transcribed under the control of the sporulation transcription factors σE or σK. A multiple mutant lacking all six genes was found to sporulate with slightly higher efficiency than the wild type, suggesting that under standard laboratory conditions the expression of these genes imposes a small cost on the production of heat-resistant spores. Finally, three of the transcription units specified small, noncoding RNAs; one of these was under the control of the sporulation transcription factor σE, and another was under the control of the motility sigma factor σD.
29
Chen F, Chen YPP. Exploring the ncRNA-ncRNA patterns based on bridging rules. J Biomed Inform 2010; 43:569-77. [PMID: 20152932 DOI: 10.1016/j.jbi.2010.02.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2009] [Revised: 11/11/2009] [Accepted: 02/05/2010] [Indexed: 10/19/2022]
Abstract
ncRNAs play an important role in the regulation of gene expression. However, many of their functions have not yet been fully discovered. There are complicated relationships between ncRNAs in different categories, and finding these relationships can help identify ncRNAs' functions and properties. We extend the association rule to represent the relationship between two ncRNAs. Based on this rule, we can speculate on an ncRNA's function when it interacts with other ncRNAs. We propose two measures to explore the relationships between ncRNAs in different categories: entropy theory is used to calculate how close two ncRNAs are, and the association rule is used to represent the interactions between ncRNAs. We use three datasets from miRBase and RNAdb. The two from miRBase are designed for finding relationships between miRNAs; the one from RNAdb is designed for relationships among miRNA, snoRNA and piRNA. We evaluate our measures from both biological-significance and performance perspectives. All the cross-species patterns involving miRNAs that we found are confirmed by miRNAMap 2.0. In addition, we find novel cross-genome patterns such as (hsa-mir-190b-->hsa-mir-153-2). Based on the patterns we find, we can (1) explore one ncRNA's function from another with a known function and (2) speculate on the functions of both based on their relationship even when neither is understood. Our methods also have the merits that (1) they are suitable for any ncRNA dataset and (2) they are not sensitive to the parameters.
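Association rules of the form X --> Y are conventionally scored by support (how often X and Y occur together) and confidence (how often Y occurs given X). The Python sketch below computes these two standard measures over hypothetical sets of ncRNAs observed together. It illustrates the classical rule that the authors extend, not their bridging-rule or entropy measures, and the ncRNA identifiers are placeholders.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)

def confidence(transactions, lhs, rhs):
    """Confidence of the rule lhs --> rhs: support(lhs | rhs) / support(lhs)."""
    s_lhs = support(transactions, lhs)
    if s_lhs == 0:
        return 0.0
    return support(transactions, set(lhs) | set(rhs)) / s_lhs

# Hypothetical co-occurrence records (each a set of ncRNA identifiers).
records = [
    {"ncRNA-A", "ncRNA-B"},
    {"ncRNA-A", "ncRNA-B", "ncRNA-C"},
    {"ncRNA-A"},
    {"ncRNA-C"},
]
print(support(records, {"ncRNA-A", "ncRNA-B"}))       # 0.5
print(confidence(records, {"ncRNA-A"}, {"ncRNA-B"}))  # about 0.667
```

Here two of the three records containing ncRNA-A also contain ncRNA-B, which is what the confidence of the rule ncRNA-A --> ncRNA-B expresses.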
Affiliation(s)
- Feng Chen
- Faculty of Science, Technology and Engineering, La Trobe University, Bundoora, Vic. 3086, Australia