1
Karan A, Rivas E. All-at-once RNA folding with 3D motif prediction framed by evolutionary information. Research Square 2025:rs.3.rs-5664139. [PMID: 40195991 PMCID: PMC11974997 DOI: 10.21203/rs.3.rs-5664139/v1] [Indexed: 04/09/2025]
Abstract
Structural RNAs exhibit a vast array of recurrent short 3D elements involving non-Watson-Crick interactions that help arrange canonical double helices into tertiary structures. We present CaCoFold-R3D, a probabilistic grammar that predicts these RNA 3D motifs (also termed modules) jointly with RNA secondary structure over a sequence or alignment. CaCoFold-R3D uses evolutionary information present in an RNA alignment to reliably identify canonical helices (including pseudoknots) by covariation. We further introduce the R3D grammars, which also exploit helix covariation that constrains the positioning of the mostly non-covarying RNA 3D motifs. Our method runs predictions over an almost-exhaustive list of over fifty known RNA motifs (everything). Motifs can appear in any non-helical loop region (including 3-way, 4-way and higher junctions) (everywhere). All structural motifs as well as the canonical helices are arranged into one single structure predicted by one single joint probabilistic grammar (all-at-once). Our results demonstrate that CaCoFold-R3D is a valid alternative for predicting the all-residue interactions present in an RNA 3D structure. Furthermore, CaCoFold-R3D is fast and easily customizable for novel motif discovery.
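The abstract relies on covariation in an alignment to identify canonical helices. As a minimal illustration of the idea only (not CaCoFold-R3D's own covariation test, which this sketch does not reproduce), mutual information between two alignment columns is high when the two positions change together, as in compensatory base-pair substitutions, and zero when both columns are fully conserved. The column strings in the example are hypothetical toy data.

```python
import math
from collections import Counter

# Watson-Crick and G-U wobble pairs count as canonical here.
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
GAPS = {"-", "."}

def column_pair_stats(col_i, col_j):
    """Return (mutual information in bits, fraction of canonical pairs) for two
    aligned columns, skipping sequences with a gap in either column."""
    pairs = [(a, b) for a, b in zip(col_i.upper(), col_j.upper())
             if a not in GAPS and b not in GAPS]
    n = len(pairs)
    if n == 0:
        return 0.0, 0.0
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        # Each term compares the joint frequency with the product of the marginals.
        mi += p_ab * math.log2(p_ab / ((left[a] / n) * (right[b] / n)))
    canonical = sum(1 for p in pairs if p in CANONICAL) / n
    return mi, canonical

# Four sequences with compensatory changes (G-C, C-G, A-U, U-A) versus a conserved G-C pair.
print(column_pair_stats("GCAU", "CGUA"))
print(column_pair_stats("GGGG", "CCCC"))
```

The conserved G-C pair scores 0 bits even though every pair is canonical, which is why covariation alone cannot support a helix that never varies across the alignment.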
2
Karan A, Rivas E. All-at-once RNA folding with 3D motif prediction framed by evolutionary information. bioRxiv: The Preprint Server for Biology 2024:2024.12.17.628809. [PMID: 39764046 PMCID: PMC11702757 DOI: 10.1101/2024.12.17.628809] [Indexed: 01/15/2025]
Abstract
Structural RNAs exhibit a vast array of recurrent short 3D elements involving non-Watson-Crick interactions that help arrange canonical double helices into tertiary structures. We present CaCoFold-R3D, a probabilistic grammar that predicts these RNA 3D motifs (also termed modules) jointly with RNA secondary structure over a sequence or alignment. CaCoFold-R3D uses evolutionary information present in an RNA alignment to reliably identify canonical helices (including pseudoknots) by covariation. We further introduce the R3D grammars, which also exploit helix covariation that constrains the positioning of the mostly non-covarying RNA 3D motifs. Our method runs predictions over an almost-exhaustive list of over fifty known RNA motifs (everything). Motifs can appear in any non-helical loop region (including 3-way, 4-way and higher junctions) (everywhere). All structural motifs as well as the canonical helices are arranged into one single structure predicted by one single joint probabilistic grammar (all-at-once). Our results demonstrate that CaCoFold-R3D is a valid alternative for predicting the all-residue interactions present in an RNA 3D structure. Furthermore, CaCoFold-R3D is fast and easily customizable for novel motif discovery.
3
Zhang Y, Wu Q, Forsythe S, Liu C, Chen N, Li Y, Zhang J, Wang J, Ding Y. The cascade regulation of small RNA and quorum sensing system: Focusing on biofilm formation of foodborne pathogens in food industry. Food Biosci 2023. [DOI: 10.1016/j.fbio.2023.102472] [Indexed: 02/11/2023]
4
Komarova ES, Dontsova OA, Pyshnyi DV, Kabilov MR, Sergiev PV. Flow-Seq Method: Features and Application in Bacterial Translation Studies. Acta Naturae 2022; 14:20-37. [PMID: 36694903 PMCID: PMC9844084 DOI: 10.32607/actanaturae.11820] [Received: 10/04/2022] [Accepted: 11/11/2022] [Indexed: 01/22/2023]
Abstract
The Flow-seq method uses reporter construct libraries in which an element regulating the expression of a fluorescent reporter protein is represented by many thousands of variants. The libraries are introduced into cells, the cells are sorted according to their fluorescence level, and the sorted populations are then subjected to next-generation sequencing. This makes it possible to identify patterns that determine expression efficiency from tens to hundreds of thousands of reporter constructs in a single experiment. The method has become a common way to evaluate the efficiency of protein synthesis for many mRNA variants simultaneously; however, its potential is not confined to this area. In this review, Flow-seq is compared with alternative approaches used to evaluate mRNA translation efficiency, and the features of its application and the results obtained with it are considered.
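Flow-seq turns read counts per fluorescence bin into an expression estimate per variant. A minimal sketch of one common estimator, assuming the bin read totals, cell counts per bin and bin fluorescence values are known (the function name and inputs are hypothetical, and individual studies covered by the review may estimate expression differently): reads are first rescaled to estimated cell numbers per bin, then the bin fluorescence values are averaged with those numbers as weights.

```python
def flowseq_expression(reads, bin_reads_total, bin_cells, bin_fluor):
    """Estimate expression per variant from sorted-bin sequencing data.

    reads: dict mapping variant id -> list of read counts, one per bin.
    bin_reads_total: total reads sequenced from each bin.
    bin_cells: number of cells sorted into each bin.
    bin_fluor: representative fluorescence of each bin (e.g. bin mean).
    Returns dict variant -> weighted mean fluorescence (None if unobserved).
    """
    n_bins = len(bin_fluor)
    if not (len(bin_reads_total) == len(bin_cells) == n_bins):
        raise ValueError("bin descriptors must all have one entry per bin")
    estimates = {}
    for variant, counts in reads.items():
        if len(counts) != n_bins:
            raise ValueError(f"variant {variant!r} has the wrong number of bins")
        # Share of the bin's reads times cells sorted into the bin = estimated cells.
        cells = [c / tot * n if tot else 0.0
                 for c, tot, n in zip(counts, bin_reads_total, bin_cells)]
        total = sum(cells)
        if total == 0:
            estimates[variant] = None  # variant never observed in any bin
            continue
        estimates[variant] = sum(w * f for w, f in zip(cells, bin_fluor)) / total
    return estimates

# Two bins of equal size: variant "A" only in the dim bin, "B" split evenly.
print(flowseq_expression({"A": [100, 0], "B": [50, 50]},
                         [1000, 1000], [10000, 10000], [100.0, 1000.0]))
```

Because reporter fluorescence is often roughly log-normal, many pipelines average log-fluorescence instead; passing log values as `bin_fluor` and exponentiating the result gives that variant of the estimator.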
Affiliation(s)
- E. S. Komarova
- Institute of Functional Genomics, Lomonosov Moscow State University, Moscow, 119234 Russia
- O. A. Dontsova
- Department of Chemistry, Lomonosov Moscow State University, Moscow, 119234 Russia
- Skolkovo Institute of Science and Technology, Moscow, 121205 Russia
- Belozersky Institute of Physico-Chemical Biology, Lomonosov Moscow State University, Moscow, 119234 Russia
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, Moscow, 117437 Russia
- D. V. Pyshnyi
- Institute of Chemical Biology and Fundamental Medicine, Siberian Branch of the Russian Academy of Sciences, Novosibirsk, 630090 Russia
- M. R. Kabilov
- Institute of Chemical Biology and Fundamental Medicine, Siberian Branch of the Russian Academy of Sciences, Novosibirsk, 630090 Russia
- P. V. Sergiev
- Institute of Functional Genomics, Lomonosov Moscow State University, Moscow, 119234 Russia
- Department of Chemistry, Lomonosov Moscow State University, Moscow, 119234 Russia
- Skolkovo Institute of Science and Technology, Moscow, 121205 Russia
- Belozersky Institute of Physico-Chemical Biology, Lomonosov Moscow State University, Moscow, 119234 Russia
5
González-Tortuero E, Anthon C, Havgaard JH, Geissler AS, Breüner A, Hjort C, Gorodkin J, Seemann SE. The Bacillaceae-1 RNA motif comprises two distinct classes. Gene 2022; 841:146756. [PMID: 35905857 DOI: 10.1016/j.gene.2022.146756] [Received: 05/16/2022] [Revised: 06/10/2022] [Accepted: 07/24/2022] [Indexed: 11/04/2022]
Abstract
Non-coding RNAs are key regulatory players in bacteria. Many computationally predicted non-coding RNAs, however, lack functional associations. One example is the Bacillaceae-1 RNA motif, whose Rfam model consists of two hairpin loops. We find the motif conserved in nine of 13 non-pathogenic strains of the genus Bacillus but in only one pathogenic strain. To elucidate its functional characteristics, we studied 118 hits of the Rfam model in 11 Bacillus spp. and found two distinct classes based on the ensemble diversity of their RNA secondary structure and their genomic context relative to the ribosomal RNA (rRNA) cluster. Forty hits are associated with the rRNA cluster; all 19 of these that flank the 16S rRNA upstream have a reverse complementary structure of low structural diversity. Fifty-two hits have large ensemble diversity, 38 of which are located between two coding genes. For eight hits in Bacillus subtilis, we investigated public expression data under various conditions and observed either the forward or the reverse complementary motif being expressed. Five of these hits are associated with the rRNA cluster. Four of them are located upstream of the 16S rRNA and are not transcriptionally active; instead, their reverse complements, which have low structural diversity, are expressed together with the rRNA cluster. The other three hits are located between two coding genes in non-conserved genomic loci. Two of them are expressed independently of their surrounding genes and are structurally diverse. In summary, we found that Bacillaceae-1 RNA motifs flanking ribosomal RNA clusters upstream tend to have one stable structure, with the reverse complementary motif expressed in B. subtilis. In contrast, a subgroup of intergenic motifs has the thermodynamic potential for structural switches.
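The two classes above are separated partly by the ensemble diversity of their secondary structure. One common definition is the expected base-pair distance between two structures drawn independently from the Boltzmann ensemble, which works out to the sum over pairs i < j of 2 p_ij (1 - p_ij), where p_ij is the probability that positions i and j pair. The sketch below computes that quantity from a base-pair probability table assumed to come from a partition-function folding tool; it does not reproduce the paper's pipeline, and the probabilities in the example are made up.

```python
def ensemble_diversity(bpp):
    """Expected base-pair distance between two independent Boltzmann samples.

    bpp: dict mapping (i, j) with i < j to the pairing probability p_ij.
    A pair contributes when it is present in one sample but not the other,
    which happens with probability 2 * p_ij * (1 - p_ij).
    """
    total = 0.0
    for (i, j), p in bpp.items():
        if i >= j:
            raise ValueError(f"expected i < j, got ({i}, {j})")
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range for ({i}, {j}): {p}")
        total += 2.0 * p * (1.0 - p)
    return total

# A single certain pair has zero diversity; two 50/50 pairs give 1.0.
print(ensemble_diversity({(1, 10): 1.0}))
print(ensemble_diversity({(1, 10): 0.5, (2, 9): 0.5}))
```

Folding packages may report this quantity with a different normalisation, so values are best compared within one tool; low values correspond to the "one stable structure" class, high values to the structurally diverse one.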
Affiliation(s)
- Enrique González-Tortuero
- Center for non-coding RNA in Technology and Health (RTH), Department of Veterinary and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Frederiksberg, Denmark
- Christian Anthon
- Center for non-coding RNA in Technology and Health (RTH), Department of Veterinary and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Frederiksberg, Denmark
- Jakob H Havgaard
- Center for non-coding RNA in Technology and Health (RTH), Department of Veterinary and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Frederiksberg, Denmark
- Adrian S Geissler
- Center for non-coding RNA in Technology and Health (RTH), Department of Veterinary and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Frederiksberg, Denmark
- Jan Gorodkin
- Center for non-coding RNA in Technology and Health (RTH), Department of Veterinary and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Frederiksberg, Denmark
- Stefan E Seemann
- Center for non-coding RNA in Technology and Health (RTH), Department of Veterinary and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Frederiksberg, Denmark
6
Bhandari BK, Lim CS, Remus DM, Chen A, van Dolleweerd C, Gardner PP. Analysis of 11,430 recombinant protein production experiments reveals that protein yield is tunable by synonymous codon changes of translation initiation sites. PLoS Comput Biol 2021; 17:e1009461. [PMID: 34610008 PMCID: PMC8519471 DOI: 10.1371/journal.pcbi.1009461] [Received: 06/02/2021] [Revised: 10/15/2021] [Accepted: 09/19/2021] [Indexed: 12/16/2022]
Abstract
Recombinant protein production is a key process for generating proteins of interest in the pharmaceutical industry and biomedical research. However, about 50% of recombinant proteins fail to be expressed in a variety of host cells. Here we show that the accessibility of translation initiation sites, modelled using mRNA base-unpairing across the Boltzmann ensemble, significantly outperforms alternative features. This approach accurately predicts the successes or failures of expression experiments that utilised Escherichia coli cells to express 11,430 recombinant proteins from over 189 diverse species. On this basis, we develop TIsigner, which uses simulated annealing to modify up to the first nine codons of mRNAs with synonymous substitutions. We show that accessibility captures the key propensity beyond the target region (initiation sites in this case), as a modest number of synonymous changes is sufficient to tune recombinant protein expression levels. We build a stochastic simulation model and show that higher accessibility leads to higher protein production and slower cell growth, supporting the idea of protein cost, in which cell growth is constrained by protein circuits during overexpression.
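TIsigner is described as simulated annealing over synonymous substitutions in up to the first nine codons. The sketch below shows the general shape of such a search under loud assumptions: the codon table is a hypothetical partial subset, and the objective is a crude stand-in (GC count of the first nine codons) rather than the Boltzmann-ensemble accessibility score the abstract describes, which would require an RNA folding library. Only the search scaffold (synonymous moves, Metropolis acceptance, geometric cooling, keeping the best design seen) is meant to carry over.

```python
import math
import random

# Hypothetical, partial synonymous-codon table (DNA alphabet), enough for the example.
SYNONYMS = {
    "M": ["ATG"],
    "K": ["AAA", "AAG"],
    "L": ["CTG", "CTC", "CTT", "CTA", "TTA", "TTG"],
    "A": ["GCC", "GCG", "GCA", "GCT"],
    "S": ["AGC", "TCT", "TCC", "TCA", "TCG", "AGT"],
    "G": ["GGC", "GGT", "GGA", "GGG"],
    "E": ["GAA", "GAG"],
    "R": ["CGT", "CGC", "CGA", "CGG", "AGA", "AGG"],
}
CODON_TO_AA = {codon: aa for aa, codons in SYNONYMS.items() for codon in codons}

def proxy_cost(codons, window=9):
    """Stand-in objective (assumption): GC count in the first `window` codons; lower is better."""
    return sum(base in "GC" for codon in codons[:window] for base in codon)

def anneal(codons, window=9, steps=2000, t_start=2.0, t_end=0.01, seed=0):
    """Simulated annealing over synonymous codon swaps restricted to the first `window` codons."""
    rng = random.Random(seed)
    current = list(codons)
    cur_cost = proxy_cost(current, window)
    best, best_cost = list(current), cur_cost
    # Only positions inside the window whose amino acid has alternative codons can move.
    editable = [i for i in range(min(window, len(current)))
                if len(SYNONYMS[CODON_TO_AA[current[i]]]) > 1]
    if not editable:
        return best, best_cost
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        i = rng.choice(editable)
        alternatives = [c for c in SYNONYMS[CODON_TO_AA[current[i]]] if c != current[i]]
        candidate = list(current)
        candidate[i] = rng.choice(alternatives)
        cost = proxy_cost(candidate, window)
        # Metropolis rule: accept improvements always, worse moves with probability exp(-delta / t).
        if cost <= cur_cost or rng.random() < math.exp((cur_cost - cost) / t):
            current, cur_cost = candidate, cost
            if cost < best_cost:
                best, best_cost = list(candidate), cost
    return best, best_cost

seq = ["ATG", "AAG", "CTG", "GCC", "AGC", "GGC", "GAG", "CGC", "GCG", "AAA", "GGG"]
print(anneal(seq))
```

Every move swaps a codon for a synonym, so the encoded protein never changes, and codons after the ninth are never touched; replacing `proxy_cost` with a real accessibility score is the part this sketch deliberately leaves out.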
Affiliation(s)
- Bikash K. Bhandari
- Department of Biochemistry, School of Biomedical Sciences, University of Otago, Dunedin, New Zealand
- Chun Shen Lim
- Department of Biochemistry, School of Biomedical Sciences, University of Otago, Dunedin, New Zealand
- Daniela M. Remus
- Callaghan Innovation Protein Science and Engineering, University of Canterbury, Christchurch, New Zealand
- Augustine Chen
- Department of Biochemistry, School of Biomedical Sciences, University of Otago, Dunedin, New Zealand
- Craig van Dolleweerd
- Biomolecular Interaction Center, University of Canterbury, Christchurch, New Zealand
- Paul P. Gardner
- Department of Biochemistry, School of Biomedical Sciences, University of Otago, Dunedin, New Zealand
- Biomolecular Interaction Center, University of Canterbury, Christchurch, New Zealand
7
Bhandari BK, Lim CS, Gardner PP. TISIGNER.com: web services for improving recombinant protein production. Nucleic Acids Res 2021; 49:W654-W661. [PMID: 33744969 PMCID: PMC8265118 DOI: 10.1093/nar/gkab175] [Received: 01/18/2021] [Revised: 02/17/2021] [Accepted: 03/03/2021] [Indexed: 12/25/2022]
Abstract
Experiments that are planned using accurate prediction algorithms will mitigate failures in recombinant protein production. We have developed TISIGNER (https://tisigner.com) with the aim of addressing technical challenges to recombinant protein production. We offer three web services, TIsigner (Translation Initiation coding region designer), SoDoPE (Soluble Domain for Protein Expression) and Razor, which are specialised in synonymous optimisation of recombinant protein expression, solubility and signal peptide analysis, respectively. Importantly, TIsigner, SoDoPE and Razor are linked, which allows users to switch between the tools when optimising genes of interest.
Affiliation(s)
- Bikash K Bhandari
- Department of Biochemistry, School of Biomedical Sciences, University of Otago, Dunedin 9054, New Zealand
- Chun Shen Lim
- Department of Biochemistry, School of Biomedical Sciences, University of Otago, Dunedin 9054, New Zealand
- Paul P Gardner
- Department of Biochemistry, School of Biomedical Sciences, University of Otago, Dunedin 9054, New Zealand
- Biomolecular Interaction Centre, University of Canterbury, Christchurch 8140, New Zealand
8
Spetale FE, Murillo J, Villanova GV, Bulacio P, Tapia E. FGGA-lnc: automatic gene ontology annotation of lncRNA sequences based on secondary structures. Interface Focus 2021; 11:20200064. [PMID: 34123354 PMCID: PMC8193470 DOI: 10.1098/rsfs.2020.0064] [Accepted: 04/13/2021] [Indexed: 02/01/2023]
Abstract
The study of long non-coding RNAs (lncRNAs), which are longer than 200 nucleotides, is central to understanding the development and progression of many complex diseases. Unlike that of proteins, the functionality of lncRNAs is only subtly encoded in their primary sequence. Current in silico lncRNA annotation methods mostly rely on annotations inferred from interaction networks, but building these networks requires extensive experimental studies. In this work, we present a graph-based machine learning method called FGGA-lnc for the automatic gene ontology (GO) annotation of lncRNAs across the three GO subdomains. We build upon FGGA (factor graph GO annotation), a computational method originally developed to annotate protein sequences from non-model organisms. In FGGA-lnc, a coding-based approach is introduced to fuse the primary sequence and secondary structure information of lncRNA molecules. As a result, lncRNA sequences become sequences over a higher-order alphabet, allowing supervised learning methods to assess individual GO-term annotations. Raw GO annotations obtained in this way are unaware of the GO structure and are therefore likely to be inconsistent with it. The message-passing algorithm embodied by factor graph models overcomes this problem. Evaluations of FGGA-lnc on lncRNA data from model and non-model organisms showed promising results, suggesting it as a candidate to help meet the huge demand for functional annotations arising from high-throughput sequencing technologies.
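The abstract says FGGA-lnc fuses primary sequence and secondary structure so that lncRNAs become sequences over a higher-order alphabet. A minimal sketch of one such fusion, assuming a dot-bracket structure is already available: each nucleotide is tagged as paired (uppercase) or unpaired (lowercase), giving an 8-letter alphabet, and k-mer counts over that alphabet become features a supervised classifier could use. The paired/unpaired convention, the function names and the k-mer features are hypothetical choices, not FGGA-lnc's published coding scheme.

```python
from collections import Counter

PAIRED_CHARS = set("()[]{}<>")  # dot-bracket symbols treated as "paired"

def encode(seq, dotbracket):
    """Fuse each nucleotide with its structure state into one symbol.
    Uppercase = paired, lowercase = unpaired (hypothetical convention)."""
    if len(seq) != len(dotbracket):
        raise ValueError("sequence and structure lengths differ")
    rna = seq.upper().replace("T", "U")
    return "".join(base if state in PAIRED_CHARS else base.lower()
                   for base, state in zip(rna, dotbracket))

def kmer_features(encoded, k=3):
    """Count overlapping k-mers over the fused 8-letter alphabet."""
    return Counter(encoded[i:i + k] for i in range(len(encoded) - k + 1))

fused = encode("GGGAAACCC", "(((...)))")
print(fused)                      # stem letters uppercase, loop letters lowercase
print(kmer_features(fused, k=3))
```

Separating "G in a stem" from "G in a loop" is what lets a sequence-only learner see structure; richer codings would split the unpaired state further (hairpin, internal loop, bulge, and so on).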
Affiliation(s)
- Flavio E. Spetale
- CIFASIS-Conicet-UNR, 27 de Febrero 210 bis, S2000EZP Rosario, Santa Fe, Argentina
- Facultad de Ciencias Exactas, Ingeniería y Agrimensura, Universidad Nacional de Rosario, Riobamba 245 bis, S2000EZP Rosario, Argentina
- Javier Murillo
- CIFASIS-Conicet-UNR, 27 de Febrero 210 bis, S2000EZP Rosario, Santa Fe, Argentina
- Facultad de Ciencias Exactas, Ingeniería y Agrimensura, Universidad Nacional de Rosario, Riobamba 245 bis, S2000EZP Rosario, Argentina
- Gabriela V. Villanova
- Laboratorio Mixto de Biotecnología Acuática (FCByF-UNR), Av. Eduardo Carrasco S/N, S2000EZP Rosario, Argentina
- Pilar Bulacio
- CIFASIS-Conicet-UNR, 27 de Febrero 210 bis, S2000EZP Rosario, Santa Fe, Argentina
- Facultad de Ciencias Exactas, Ingeniería y Agrimensura, Universidad Nacional de Rosario, Riobamba 245 bis, S2000EZP Rosario, Argentina
- Elizabeth Tapia
- CIFASIS-Conicet-UNR, 27 de Febrero 210 bis, S2000EZP Rosario, Santa Fe, Argentina
- Facultad de Ciencias Exactas, Ingeniería y Agrimensura, Universidad Nacional de Rosario, Riobamba 245 bis, S2000EZP Rosario, Argentina
9
Sarrazin-Gendron R, Reinharz V, Oliver CG, Moitessier N, Waldispühl J. Automated, customizable and efficient identification of 3D base pair modules with BayesPairing. Nucleic Acids Res 2019; 47:3321-3332. [PMID: 30828711 PMCID: PMC6468301 DOI: 10.1093/nar/gkz102] [Received: 10/03/2018] [Revised: 02/06/2019] [Accepted: 02/28/2019] [Indexed: 12/12/2022]
Abstract
RNA structures possess multiple levels of structural organization. A secondary structure, made of Watson–Crick helices connected by loops, forms a scaffold for the tertiary structure. The 3D structures adopted by these loops are therefore critical determinants shaping the global 3D architecture. Earlier studies showed that these local 3D structures can be described as conserved sets of ordered non-Watson–Crick base pairs called RNA structural modules. Unfortunately, the computational efficiency and scope of current 3D module identification methods are still too limited to benefit from all the knowledge accumulated in module databases. We present BayesPairing, an automated, efficient and customizable tool for (i) building Bayesian networks representing RNA 3D modules and (ii) rapidly identifying 3D modules in sequences. BayesPairing uses a flexible definition of RNA 3D modules that allows us to consider complex architectures such as multi-branched loops, and it features multiple algorithmic improvements. We benchmarked our methods using cross-validation techniques on 3409 RNA chains and show that BayesPairing achieves up to ∼70% identification accuracy on module positions and base pair interactions. BayesPairing can handle a broader range of motifs (versatility) and offers considerable running time improvements (efficiency), opening the door to a broad range of large-scale applications.
Affiliation(s)
- Vladimir Reinharz
- Center for Soft and Living Matter, Institute for Basic Science, Ulsan 44919, Republic of Korea
- Carlos G Oliver
- School of Computer Science, McGill University, Montreal, QC H3A 0E9, Canada
- Nicolas Moitessier
- Department of Chemistry, McGill University, Montreal, QC H3A 0B8, Canada
- Jérôme Waldispühl
- School of Computer Science, McGill University, Montreal, QC H3A 0E9, Canada
10
Sanchez IM, Purwin TJ, Chervoneva I, Erkes DA, Nguyen MQ, Davies MA, Nathanson KL, Kemper K, Peeper DS, Aplin AE. In Vivo ERK1/2 Reporter Predictively Models Response and Resistance to Combined BRAF and MEK Inhibitors in Melanoma. Mol Cancer Ther 2019; 18:1637-1648. [PMID: 31270153 PMCID: PMC6726573 DOI: 10.1158/1535-7163.mct-18-1056] [Received: 09/18/2018] [Revised: 05/02/2019] [Accepted: 06/25/2019] [Indexed: 01/08/2023]
Abstract
Combined BRAF and MEK inhibition is a standard of care in patients with advanced BRAF-mutant melanoma, but acquired resistance remains a challenge that limits response durability. Here, we quantitated in vivo ERK1/2 activity and tumor response associated with resistance to combined BRAF and MEK inhibition in mutant BRAF xenografts. We found that ERK1/2 pathway reactivation preceded the growth of resistant tumors. Moreover, we detected a subset of cells that not only persisted throughout long-term treatment but also restored ERK1/2 signaling and grew upon drug removal. Cell lines derived from combination-resistant tumors (CRTs) exhibited elevated ERK1/2 phosphorylation and were sensitive to ERK1/2 inhibition. In some CRTs, we detected a tandem duplication of the BRAF kinase domain. Monitoring ERK1/2 activity in vivo was effective in predicting tumor response during intermittent treatment. We observed maintained expression of the mitotic regulator polo-like kinase 1 (Plk1) in melanoma resistant to BRAF and MEK inhibitors. Plk1 inhibition induced apoptosis in CRTs, leading to slowed growth of BRAF and MEK inhibitor-resistant tumors in vivo. These data demonstrate the utility of in vivo ERK1/2 pathway reporting as a tool to optimize clinical dosing schemes and establish suppression of Plk1 as a potential salvage therapy for BRAF inhibitor- and MEK inhibitor-resistant melanoma.
Affiliation(s)
- Ileine M Sanchez
- Department of Cancer Biology, Thomas Jefferson University, Philadelphia, Pennsylvania
- Timothy J Purwin
- Department of Cancer Biology, Thomas Jefferson University, Philadelphia, Pennsylvania
- Inna Chervoneva
- Division of Biostatistics, Sidney Kimmel Cancer Center at Jefferson, Philadelphia, Pennsylvania
- Dan A Erkes
- Department of Cancer Biology, Thomas Jefferson University, Philadelphia, Pennsylvania
- Mai Q Nguyen
- Department of Cancer Biology, Thomas Jefferson University, Philadelphia, Pennsylvania
- Michael A Davies
- Department of Melanoma Medical Oncology, Division of Cancer Medicine, University of Texas MD Anderson Cancer Center, Houston, Texas
- Katherine L Nathanson
- Translational Medicine and Human Genetics, Department of Medicine, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, Pennsylvania
- Abramson Cancer Center, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, Pennsylvania
- Kristel Kemper
- Division of Molecular Oncology & Immunology, Netherlands Cancer Institute, Amsterdam, the Netherlands
- Daniel S Peeper
- Division of Molecular Oncology & Immunology, Netherlands Cancer Institute, Amsterdam, the Netherlands
- Andrew E Aplin
- Department of Cancer Biology, Thomas Jefferson University, Philadelphia, Pennsylvania
- Department of Pharmacology and Experimental Therapeutics, Sidney Kimmel Cancer Center at Jefferson, Philadelphia, Pennsylvania
11
Genome Analysis of Hypomyces perniciosus, the Causal Agent of Wet Bubble Disease of Button Mushroom (Agaricus bisporus). Genes (Basel) 2019; 10:genes10060417. [PMID: 31146507 PMCID: PMC6627653 DOI: 10.3390/genes10060417] [Received: 04/04/2019] [Revised: 05/17/2019] [Accepted: 05/27/2019] [Indexed: 12/14/2022]
Abstract
The mycoparasitic fungus Hypomyces perniciosus causes wet bubble disease of mushrooms, particularly Agaricus bisporus. The genome of a highly virulent strain, H. perniciosus HP10, was sequenced and compared with those of three other fungi from the order Hypocreales that cause disease on A. bisporus. The H. perniciosus genome is ~44 Mb, encodes 10,077 genes and is enriched in transposable elements (up to 25.3%). Phylogenetic analysis revealed that H. perniciosus is closely related to Cladobotryum protrusum and that the two diverged from their common ancestor ~156.7 million years ago. H. perniciosus has few secreted proteins compared with C. protrusum and Trichoderma virens, but it has significantly expanded protein families of transporters, protein kinases, CAZymes (GH 18), peptidases, cytochrome P450 and SMs that are essential for mycoparasitism and adaptation to harsh environments. This study provides insights into H. perniciosus evolution and pathogenesis and will contribute to the development of effective disease management strategies to control wet bubble disease.
12
Kalvari I, Nawrocki EP, Argasinska J, Quinones-Olvera N, Finn RD, Bateman A, Petrov AI. Non-Coding RNA Analysis Using the Rfam Database. Curr Protoc Bioinformatics 2018; 62:e51. [PMID: 29927072 DOI: 10.1002/cpbi.51] [Indexed: 01/03/2023]
Abstract
Rfam is a database of non-coding RNA families in which each family is represented by a multiple sequence alignment, a consensus secondary structure, and a covariance model. Using a combination of manual and literature-based curation and a custom software pipeline, Rfam converts descriptions of RNA families found in the scientific literature into computational models that can be used to annotate RNAs belonging to those families in any DNA or RNA sequence. Valuable research outputs that are often locked up in figures and supplementary information files are encapsulated in Rfam entries and made accessible through the Rfam Web site. The data produced by Rfam have a broad application, from genome annotation to providing training sets for algorithm development. This article gives an overview of how to search and navigate the Rfam Web site, and how to annotate sequences with RNA families. The Rfam database is freely available at http://rfam.org. © 2018 by John Wiley & Sons, Inc.
Affiliation(s)
- Ioanna Kalvari
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
- Eric P Nawrocki
- National Center for Biotechnology Information, National Institutes of Health, Department of Health and Human Services, Bethesda, Maryland
- Joanna Argasinska
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
- Robert D Finn
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
- Alex Bateman
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
- Anton I Petrov
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
13
Ledda M, Aviran S. PATTERNA: transcriptome-wide search for functional RNA elements via structural data signatures. Genome Biol 2018; 19:28. [PMID: 29495968 PMCID: PMC5833111 DOI: 10.1186/s13059-018-1399-z] [Received: 09/06/2017] [Accepted: 01/30/2018] [Indexed: 02/08/2023]
Abstract
Establishing a link between RNA structure and function remains a great challenge in RNA biology. The emergence of high-throughput structure profiling experiments is revolutionizing our ability to decipher structure, yet principled approaches for extracting information on structural elements directly from these data sets are lacking. We present PATTERNA, an unsupervised pattern recognition algorithm that rapidly mines RNA structure motifs from profiling data. We demonstrate that PATTERNA detects motifs with an accuracy comparable to commonly used thermodynamic models and highlight its utility in automating data-directed structure modeling from large data sets. PATTERNA is versatile and compatible with diverse profiling techniques and experimental conditions.
Affiliation(s)
- Mirko Ledda
- Department of Biomedical Engineering and Genome Center, UC Davis, 1 Shields Ave, Davis, 95616 USA
- Integrative Genetics and Genomics Graduate Group, UC Davis, 1 Shields Ave, Davis, 95616 USA
- Sharon Aviran
- Department of Biomedical Engineering and Genome Center, UC Davis, 1 Shields Ave, Davis, 95616 USA
14
Bayrak CS, Kim N, Schlick T. Using sequence signatures and kink-turn motifs in knowledge-based statistical potentials for RNA structure prediction. Nucleic Acids Res 2017; 45:5414-5422. [PMID: 28158755 PMCID: PMC5435971 DOI: 10.1093/nar/gkx045] [Received: 11/15/2016] [Accepted: 01/22/2017] [Indexed: 12/15/2022]
Abstract
Kink-turns are widely occurring motifs in RNA that are located in internal loops and associated with many biological functions, including translation, regulation and splicing. The associated sequence pattern, a 3-nt bulge together with G-A and A-G base pairs, generates an angle of ∼50° along the helical axis due to A-minor interactions. The conserved sequence and distinct secondary structures of kink-turns (k-turns) suggest computational folding rules to predict k-turn-like topologies from sequence. Here, we annotate observed k-turn motifs within a non-redundant RNA dataset based on sequence signatures and geometrical features, analyze bending and torsion angles, and determine distinct knowledge-based potentials with and without k-turn motifs. We apply these scoring potentials to our RAGTOP (RNA-As-Graph-Topologies) graph sampling protocol to construct and sample coarse-grained graph representations of RNAs from a given secondary structure. We present graph-sampling results for 35 RNAs, including 12 k-turn and 23 non-k-turn internal loops, and compare the results to solved structures and to RAGTOP results without special k-turn potentials. Significant improvements are observed with the updated scoring potentials compared to the k-turn-free potentials. Because k-turns represent a classic example of a sequence/structure motif, our study suggests that other such motifs with sequence signatures and unique geometrical features can similarly be utilized for RNA structure prediction and design.
Affiliation(s)
- Cigdem Sevim Bayrak
- Department of Chemistry and Courant Institute of Mathematical Sciences, New York University, 251 Mercer Street, New York, NY 10012, USA
- Namhee Kim
- Department of Chemistry and Courant Institute of Mathematical Sciences, New York University, 251 Mercer Street, New York, NY 10012, USA
- Tamar Schlick
- Department of Chemistry and Courant Institute of Mathematical Sciences, New York University, 251 Mercer Street, New York, NY 10012, USA
15
Fakhry CT, Kulkarni P, Chen P, Kulkarni R, Zarringhalam K. Prediction of bacterial small RNAs in the RsmA (CsrA) and ToxT pathways: a machine learning approach. BMC Genomics 2017; 18:645. [PMID: 28830349 PMCID: PMC5568370 DOI: 10.1186/s12864-017-4057-z] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 05/04/2017] [Accepted: 08/14/2017] [Indexed: 01/01/2023]
Abstract
BACKGROUND Small RNAs (sRNAs) constitute an important class of post-transcriptional regulators that control critical cellular processes in bacteria. Recent research using high-throughput transcriptomic approaches has led to a dramatic increase in the discovery of bacterial sRNAs. However, it is generally believed that the currently identified sRNAs constitute a limited subset of the bacterial sRNA repertoire. In several cases, sRNAs belonging to a specific class are already known and the challenge is to identify additional sRNAs belonging to the same class. In such cases, machine-learning approaches can be used to predict novel sRNAs in a given class. METHODS In this work, we develop novel bioinformatics approaches that integrate sequence and structure-based features to train machine-learning models for the discovery of bacterial sRNAs. We show that features derived from recurrent structural motifs in the ensemble of low energy secondary structures can distinguish the RNA classes with high accuracy. RESULTS We apply this approach to predict new members in two broad classes of bacterial small RNAs: 1) sRNAs that bind to the RNA-binding protein RsmA/CsrA in diverse bacterial species and 2) sRNAs regulated by the master regulator of virulence, ToxT, in Vibrio cholerae. CONCLUSION The involvement of sRNAs in bacterial adaptation to changing environments is an increasingly recurring theme in current research in microbiology. It is likely that future research, combining experimental and computational approaches, will discover many more examples of sRNAs as components of critical regulatory pathways in bacteria. We have developed a novel approach for prediction of small RNA regulators in important bacterial pathways. This approach can be applied to specific classes of sRNAs for which several members have been identified and the challenge is to identify additional sRNAs.
Affiliation(s)
- Carl Tony Fakhry
- Department of Computer Science, University of Massachusetts Boston, 100 Morrissey Boulevard, Boston, 02125 MA USA
- Prajna Kulkarni
- Department of Physics, University of Massachusetts Boston, 100 Morrissey Boulevard, Boston, 02125 MA USA
- Ping Chen
- Department of Engineering, University of Massachusetts Boston, 100 Morrissey Boulevard, Boston, 02125 MA USA
- Rahul Kulkarni
- Department of Physics, University of Massachusetts Boston, 100 Morrissey Boulevard, Boston, 02125 MA USA
- Kourosh Zarringhalam
- Department of Mathematics, University of Massachusetts Boston, 100 Morrissey Boulevard, Boston, 02125 MA USA
16
Abstract
Protein-coding RNAs represent only a small fraction of the transcriptional output in higher eukaryotes. The remaining RNA species encompass a broad range of molecular functions and regulatory roles, a consequence of the structural polyvalence of RNA polymers. Although several classes of small noncoding RNAs are relatively well characterized, the accessibility of affordable high-throughput sequencing is generating a wealth of novel, unannotated transcripts, especially long noncoding RNAs (lncRNAs) derived from genomic regions that are antisense, intronic, intergenic, or overlapping protein-coding loci. Parsing and characterizing the functions of noncoding RNAs, lncRNAs in particular, is one of the great challenges of modern genome biology. Here we discuss concepts and computational methods for the identification of structural domains in lncRNAs from genomic and transcriptomic data. In the first part, we briefly review how to identify RNA structural motifs in individual lncRNAs. In the second part, we describe how to leverage the evolutionary dynamics of structured RNAs in a computationally efficient screen to detect putative functional lncRNA motifs using comparative genomics.
Affiliation(s)
- Martin A Smith
- RNA Biology and Plasticity Laboratory, Garvan Institute of Medical Research, 384 Victoria St, Darlinghurst, NSW, 2010, Australia
- St-Vincent's Clinical School, Faculty of Medicine, UNSW Australia, Sydney, NSW, 2052, Australia
- John S Mattick
- RNA Biology and Plasticity Laboratory, Garvan Institute of Medical Research, 384 Victoria St, Darlinghurst, NSW, 2010, Australia
- St-Vincent's Clinical School, Faculty of Medicine, UNSW Australia, Sydney, NSW, 2052, Australia
17
Barquist L, Burge SW, Gardner PP. Studying RNA Homology and Conservation with Infernal: From Single Sequences to RNA Families. CURRENT PROTOCOLS IN BIOINFORMATICS 2016; 54:12.13.1-12.13.25. [PMID: 27322404 PMCID: PMC5010141 DOI: 10.1002/cpbi.4] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Indexed: 11/09/2022]
Abstract
Emerging high-throughput technologies have led to a deluge of putative non-coding RNA (ncRNA) sequences identified in a wide variety of organisms. Systematic characterization of these transcripts will be a tremendous challenge. Homology detection is critical to making maximal use of functional information gathered about ncRNAs: identifying homologous sequence allows us to transfer information gathered in one organism to another quickly and with a high degree of confidence. ncRNA presents a challenge for homology detection, as the primary sequence is often poorly conserved and de novo secondary structure prediction and search remain difficult. This unit introduces methods developed by the Rfam database for identifying "families" of homologous ncRNAs starting from single "seed" sequences, using manually curated sequence alignments to build powerful statistical models of sequence and structure conservation known as covariance models (CMs), implemented in the Infernal software package. We provide a step-by-step iterative protocol for identifying ncRNA homologs and then constructing an alignment and corresponding CM. We also work through an example for the bacterial small RNA MicA, discovering a previously unreported family of divergent MicA homologs in genus Xenorhabdus in the process. © 2016 by John Wiley & Sons, Inc.
Affiliation(s)
- Lars Barquist
- Institute for Molecular Infection Biology, University of Würzburg, Würzburg, D-97080 Germany
- Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1SA United Kingdom; Fax: +44 (0)1223 494919
- Sarah W. Burge
- Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1SA United Kingdom; Fax: +44 (0)1223 494919
- Paul P. Gardner
- School of Biological Sciences, University of Canterbury, Private Bag 4800, Christchurch, New Zealand
- Biomolecular Interaction Centre, University of Canterbury, Private Bag 4800, Christchurch, New Zealand
18
Reinharz V, Ponty Y, Waldispühl J. Combining structure probing data on RNA mutants with evolutionary information reveals RNA-binding interfaces. Nucleic Acids Res 2016; 44:e104. [PMID: 27095200 PMCID: PMC4914100 DOI: 10.1093/nar/gkw217] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 06/23/2015] [Accepted: 03/21/2016] [Indexed: 01/28/2023]
Abstract
Systematic structure probing experiments (e.g. SHAPE) on RNA mutants, such as the mutate-and-map (MaM) protocol, give direct access to the genetic robustness of ncRNA structures. Comparative studies of homologous sequences provide a distinct, yet complementary, approach to analyzing the structural and functional properties of non-coding RNAs. In this paper, we introduce a formal framework to combine the biochemical signal collected from MaM experiments with the evolutionary information available in multiple sequence alignments. We apply neutral theory principles to detect complex long-range dependencies between nucleotides of a single-stranded RNA, and implement these ideas in a software tool called aRNhAck. We illustrate the biological significance of this signal and show that the nucleotide networks calculated with aRNhAck are correlated with nucleotides located in RNA–RNA, RNA–protein, RNA–DNA and RNA–ligand interfaces. aRNhAck is freely available at http://csb.cs.mcgill.ca/arnhack.
Affiliation(s)
- Vladimir Reinharz
- School of Computer Science, McGill University, Montreal, Québec H3A 0E9, Canada
- Yann Ponty
- Laboratoire d'informatique, École Polytechnique, 91128 Palaiseau, France
- Jérôme Waldispühl
- School of Computer Science, McGill University, Montreal, Québec H3A 0E9, Canada
19
Van Roey K, Davey NE. Motif co-regulation and co-operativity are common mechanisms in transcriptional, post-transcriptional and post-translational regulation. Cell Commun Signal 2015; 13:45. [PMID: 26626130 PMCID: PMC4666095 DOI: 10.1186/s12964-015-0123-9] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 07/07/2015] [Accepted: 11/24/2015] [Indexed: 01/01/2023]
Abstract
A substantial portion of the regulatory interactions in the higher eukaryotic cell are mediated by simple sequence motifs in the regulatory segments of genes and (pre-)mRNAs, and in the intrinsically disordered regions of proteins. Although these regulatory modules are physicochemically distinct, they share an evolutionary plasticity that has facilitated a rapid growth of their use and resulted in their ubiquity in complex organisms. The ease of motif acquisition simplifies access to basal housekeeping functions, facilitates the co-regulation of multiple biomolecules allowing them to respond in a coordinated manner to changes in the cell state, and supports the integration of multiple signals for combinatorial decision-making. Consequently, motifs are indispensable for temporal, spatial, conditional and basal regulation at the transcriptional, post-transcriptional and post-translational level. In this review, we highlight that many of the key regulatory pathways of the cell are recruited by motifs and that the ease of motif acquisition has resulted in large networks of co-regulated biomolecules. We discuss how co-operativity allows simple static motifs to perform the conditional regulation that underlies decision-making in higher eukaryotic biological systems. We observe that each gene and its products have a unique set of DNA, RNA or protein motifs that encode a regulatory program to define the logical circuitry that guides the life cycle of these biomolecules, from transcription to degradation. Finally, we contrast the regulatory properties of protein motifs and the regulatory elements of DNA and (pre-)mRNAs, advocating that co-regulation, co-operativity, and motif-driven regulatory programs are common mechanisms that emerge from the use of simple, evolutionarily plastic regulatory modules.
Affiliation(s)
- Kim Van Roey
- Structural and Computational Biology Unit, European Molecular Biology Laboratory (EMBL), 69117, Heidelberg, Germany.
- Health Services Research Unit, Operational Direction Public Health and Surveillance, Scientific Institute of Public Health (WIV-ISP), 1050, Brussels, Belgium.
- Norman E Davey
- Conway Institute of Biomolecular and Biomedical Sciences, University College Dublin, Dublin 4, Ireland.
20
Zirbel CL, Roll J, Sweeney BA, Petrov AI, Pirrung M, Leontis NB. Identifying novel sequence variants of RNA 3D motifs. Nucleic Acids Res 2015; 43:7504-20. [PMID: 26130723 PMCID: PMC4551918 DOI: 10.1093/nar/gkv651] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.0] [Received: 01/15/2015] [Accepted: 05/29/2015] [Indexed: 02/06/2023]
Abstract
Predicting RNA 3D structure from sequence is a major challenge in biophysics. An important sub-goal is accurately identifying recurrent 3D motifs from RNA internal and hairpin loop sequences extracted from secondary structure (2D) diagrams. We have developed and validated new probabilistic models for 3D motif sequences based on hybrid Stochastic Context-Free Grammars and Markov Random Fields (SCFG/MRF). The SCFG/MRF models are constructed using atomic-resolution RNA 3D structures. To parameterize each model, we use all instances of each motif found in the RNA 3D Motif Atlas and annotations of pairwise nucleotide interactions generated by the FR3D software. Isostericity relations between non-Watson–Crick basepairs are used in scoring sequence variants. SCFG techniques model nested pairs and insertions, while MRF ideas handle crossing interactions and base triples. We use test sets of randomly-generated sequences to set acceptance and rejection thresholds for each motif group and thus control the false positive rate. Validation was carried out by comparing results for four motif groups to RMDetect. The software developed for sequence scoring (JAR3D) is structured to automatically incorporate new motifs as they accumulate in the RNA 3D Motif Atlas when new structures are solved and is available free for download.
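The abstract mentions setting per-motif-group acceptance thresholds from the scores of randomly generated sequences in order to control the false-positive rate. The sketch below illustrates only that general calibration idea and is not JAR3D's code or its SCFG/MRF scoring model. The scorer is an assumption supplied by the caller; `gc_fraction` is a hypothetical placeholder, and all function names here are illustrative. The cutoff is chosen so that at most a fraction `alpha` of the random-sequence scores lie strictly above it.

```python
import random
from typing import Callable, List


def random_sequences(n: int, length: int, seed: int = 0) -> List[str]:
    """Return n uniformly random RNA sequences (A/C/G/U) of a fixed length."""
    rng = random.Random(seed)
    return ["".join(rng.choice("ACGU") for _ in range(length)) for _ in range(n)]


def acceptance_threshold(null_scores: List[float], alpha: float) -> float:
    """Empirical cutoff: at most a fraction alpha of null scores lie strictly above it."""
    if not null_scores:
        raise ValueError("need at least one null score")
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")
    ranked = sorted(null_scores, reverse=True)
    k = int(alpha * len(ranked))  # number of null scores allowed above the cutoff
    return ranked[k]


def calibrate(score: Callable[[str], float], length: int, alpha: float,
              n_null: int = 1000, seed: int = 0) -> float:
    """Score random sequences with a caller-supplied (hypothetical) scorer; return the cutoff."""
    null = [score(s) for s in random_sequences(n_null, length, seed)]
    return acceptance_threshold(null, alpha)


def gc_fraction(seq: str) -> float:
    """Placeholder scorer for illustration only; not a real motif model."""
    return sum(c in "GC" for c in seq) / len(seq)


if __name__ == "__main__":
    cutoff = calibrate(gc_fraction, length=12, alpha=0.05)
    # A candidate loop sequence is accepted only if it scores strictly above the cutoff.
    print(cutoff, gc_fraction("GGCGCAUGCCGC") > cutoff)
```

Because ties at the cutoff are rejected, the realized false-positive rate on the random set can fall below `alpha` but never above it.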
Affiliation(s)
- Craig L Zirbel
- Department of Mathematics and Statistics, Bowling Green State University, Bowling Green, OH 43403, USA
- James Roll
- Department of Mathematics and Statistics, Bowling Green State University, Bowling Green, OH 43403, USA
- Blake A Sweeney
- Department of Biology, Bowling Green State University, Bowling Green, OH 43403, USA
- Anton I Petrov
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Meg Pirrung
- Department of Pharmacology, University of Colorado Denver, Aurora, CO 80045, USA
- Neocles B Leontis
- Department of Chemistry, Bowling Green State University, Bowling Green, OH 43403, USA
21
Nagarajan R, Chothani SP, Ramakrishnan C, Sekijima M, Gromiha MM. Structure based approach for understanding organism specific recognition of protein-RNA complexes. Biol Direct 2015; 10:8. [PMID: 25886642 PMCID: PMC4352265 DOI: 10.1186/s13062-015-0039-8] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Received: 11/19/2014] [Accepted: 02/03/2015] [Indexed: 12/11/2022]
Abstract
Background Protein-RNA interactions perform diverse functions within the cell. Understanding the recognition mechanism of protein-RNA complexes has been a challenging task in molecular and computational biology. In earlier works, the recognition mechanisms have been studied for a specific complex or using a set of non-redundant complexes. In this work, we have constructed 18 sets of the same protein-RNA complexes belonging to different organisms from the Protein Data Bank (PDB). The similarities and differences in each set of complexes have been revealed in terms of various sequence- and structure-based features such as root mean square deviation, sequence homology, propensity of binding site residues, variance, conservation at binding sites, binding segments, binding motifs of amino acid residues and nucleotides, preferred amino acid-nucleotide pairs and the influence of neighboring residues on binding. Results We found that the proteins of mesophilic organisms have more binding sites than those of thermophiles and that the binding propensities of amino acid residues are distinct in E. coli, H. sapiens, S. cerevisiae, thermophiles and archaea. Proteins prefer to bind RNA using a single residue segment in all the organisms, while RNA prefers to use a stretch of up to six nucleotides for binding with proteins. We have developed amino acid residue-nucleotide pair potentials for different organisms, which could be used for predicting binding specificity. Further, molecular dynamics simulation studies on aspartyl-tRNA synthetase complexed with tRNA(Asp) showed specific modes of recognition in E. coli, T. thermophilus and S. cerevisiae. Conclusion Based on structural analysis and molecular dynamics simulations, we suggest that the mode of recognition depends on the type of organism in a protein-RNA complex. Reviewers This article was reviewed by Sandor Pongor, Gajendra Raghava and Narayanaswamy Srinivasan.
Electronic supplementary material The online version of this article (doi:10.1186/s13062-015-0039-8) contains supplementary material, which is available to authorized users.
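The abstract refers to amino acid residue-nucleotide pair potentials derived per organism. A common way to build such knowledge-based potentials is the log-ratio of observed pair frequencies to those expected if residues and nucleotides paired independently. The sketch below assumes that form; the authors' exact definition (reference state, units, smoothing) may differ. The input, a list of `(residue, nucleotide)` contacts extracted from complexes of one organism, is hypothetical, as are the function and type names.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Tuple

Pair = Tuple[str, str]  # (amino-acid residue, nucleotide)


def pair_potentials(contacts: Iterable[Pair]) -> Dict[Pair, float]:
    """Statistical pair potential E(a, n) = -ln[P_obs(a, n) / (P(a) * P(n))].

    Marginals P(a) and P(n) are taken over the same contact list, i.e. the
    reference state assumes residues and nucleotides pair independently.
    Only pairs that were actually observed receive an energy; negative values
    mean the pair occurs more often than expected (favorable).
    """
    contacts = list(contacts)
    total = len(contacts)
    if total == 0:
        raise ValueError("no contacts given")
    pair_counts = Counter(contacts)
    aa_counts = Counter(a for a, _ in contacts)
    nt_counts = Counter(n for _, n in contacts)
    # (c/N) / ((c_a/N) * (c_n/N)) simplifies to c * N / (c_a * c_n)
    return {
        (a, n): -math.log(c * total / (aa_counts[a] * nt_counts[n]))
        for (a, n), c in pair_counts.items()
    }
```

Running this separately on the contact lists of each organism yields one table per organism, which is one way such tables could be compared across species.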
Affiliation(s)
- Raju Nagarajan
- Department of Biotechnology, Bhupat Jyoti Metha School of Biosciences, Indian Institute of Technology Madras, Chennai, 600036, Tamilnadu, India.
- Sonia Pankaj Chothani
- Department of Biotechnology, Bhupat Jyoti Metha School of Biosciences, Indian Institute of Technology Madras, Chennai, 600036, Tamilnadu, India.
- Philips Research North America, 345 Scarborough Road, Briarcliff Manor, NY, 10510, USA.
- Chandrasekaran Ramakrishnan
- Department of Biotechnology, Bhupat Jyoti Metha School of Biosciences, Indian Institute of Technology Madras, Chennai, 600036, Tamilnadu, India.
- Masakazu Sekijima
- Global Scientific Information and Computing Center (GSIC), Tokyo Institute of Technology, 2-12-1 Ookayama, Meguro-ku, Tokyo, 152-8550, Japan.
- M Michael Gromiha
- Department of Biotechnology, Bhupat Jyoti Metha School of Biosciences, Indian Institute of Technology Madras, Chennai, 600036, Tamilnadu, India.