1
Trinity L, Stege U, Jabbari H. Tying the knot: Unraveling the intricacies of the coronavirus frameshift pseudoknot. PLoS Comput Biol 2024; 20:e1011787. [PMID: 38713726, PMCID: PMC11108256, DOI: 10.1371/journal.pcbi.1011787] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/27/2023] [Revised: 05/21/2024] [Accepted: 04/27/2024] [Indexed: 05/09/2024] [Open access]
Abstract
Understanding and targeting functional RNA structures towards treatment of coronavirus infection can help us to prepare for novel variants of SARS-CoV-2 (the virus causing COVID-19), and any other coronaviruses that could emerge via human-to-human transmission or potential zoonotic (inter-species) events. Leveraging the fact that all coronaviruses use a mechanism known as -1 programmed ribosomal frameshifting (-1 PRF) to replicate, we apply algorithms to predict the most energetically favourable secondary structures (each nucleotide involved in at most one pairing) that may be involved in regulating the -1 PRF event in coronaviruses, especially SARS-CoV-2. We compute previously unknown most stable structure predictions for the frameshift site of coronaviruses via hierarchical folding, a biologically motivated framework where initial non-crossing structure folds first, followed by subsequent, possibly crossing (pseudoknotted), structures. Using mutual information from 181 coronavirus sequences, in conjunction with the algorithm KnotAli, we compute secondary structure predictions for the frameshift site of different coronaviruses. We then utilize the Shapify algorithm to obtain most stable SARS-CoV-2 secondary structure predictions guided by frameshift sequence-specific and genome-wide experimental data. We build on our previous secondary structure investigation of the singular SARS-CoV-2 68 nt frameshift element sequence, by using Shapify to obtain predictions for 132 extended sequences and including covariation information. Previous investigations have not applied hierarchical folding to extended length SARS-CoV-2 frameshift sequences. By doing so, we simulate the effects of ribosome interaction with the frameshift site, providing insight to biological function. 
We contribute in-depth discussion to contextualize secondary structure dual-graph motifs for SARS-CoV-2, highlighting the energetic stability of the previously identified 3_8 motif alongside the known dominant 3_3 and 3_6 (native-type) -1 PRF structures. Using a combination of thermodynamic methods and sequence covariation, our novel predictions suggest function of the attenuator hairpin via previously unknown pseudoknotted base pairing. While certain initial RNA folding is consistent, other pseudoknotted base pairs form which indicate potential conformational switching between the two structures.
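To make the terminology of this abstract concrete, here is a minimal Python sketch of the two structural notions it relies on: a secondary structure in which each nucleotide takes part in at most one pairing, and crossing (pseudoknotted) base pairs. This is only an illustration, not the authors' method, and it is unrelated to the KnotAli or Shapify implementations. The function names and the example base pairs are hypothetical.

```python
from itertools import combinations


def is_valid_secondary_structure(pairs):
    """Return True if every position appears in at most one base pair."""
    seen = set()
    for i, j in pairs:
        if i == j or i in seen or j in seen:
            return False
        seen.update((i, j))
    return True


def crossing_pairs(pairs):
    """Return every couple of base pairs (i, j), (k, l) with i < k < j < l."""
    # Normalize so each pair is (5' index, 3' index) and pairs are sorted.
    normalized = sorted(tuple(sorted(p)) for p in pairs)
    crossings = []
    for (i, j), (k, l) in combinations(normalized, 2):
        if i < k < j < l:
            crossings.append(((i, j), (k, l)))
    return crossings


def is_pseudoknotted(pairs):
    """A structure is pseudoknotted if at least two base pairs cross."""
    return bool(crossing_pairs(pairs))


# Hypothetical examples: a nested hairpin and an H-type pseudoknot.
hairpin = [(0, 9), (1, 8), (2, 7)]
h_type = [(0, 10), (1, 9), (5, 15), (6, 14)]
print(is_pseudoknotted(hairpin), is_pseudoknotted(h_type))
```

In the hierarchical folding framework the abstract describes, a non-crossing structure such as `hairpin` would form first, and crossing pairs such as those in `h_type` would be added afterwards.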
Affiliation(s)
- Luke Trinity
  - Department of Computer Science, University of Victoria, Victoria, British Columbia, Canada
- Ulrike Stege
  - Department of Computer Science, University of Victoria, Victoria, British Columbia, Canada
- Hosna Jabbari
  - Department of Biomedical Engineering, University of Alberta, Edmonton, Alberta, Canada
  - Institute on Aging and Lifelong Health, Victoria, British Columbia, Canada
2
Mediati DG, Dan W, Lalaouna D, Dinh H, Pokhrel A, Rowell KN, Michie KA, Stinear TP, Cain AK, Tree JJ. The 3' UTR of vigR is required for virulence in Staphylococcus aureus and has expanded through STAR sequence repeat insertions. Cell Rep 2024; 43:114082. [PMID: 38583155, DOI: 10.1016/j.celrep.2024.114082] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/22/2023] [Revised: 01/17/2024] [Accepted: 03/25/2024] [Indexed: 04/09/2024] [Open access]
Abstract
Infections caused by methicillin-resistant Staphylococcus aureus (MRSA) are alarmingly common, and treatment is confined to last-line antibiotics. Vancomycin is the treatment of choice for MRSA bacteremia, and treatment failure is often associated with vancomycin-intermediate S. aureus isolates. The regulatory 3' UTR of the vigR mRNA contributes to vancomycin tolerance and upregulates the autolysin IsaA. Using MS2-affinity purification coupled with RNA sequencing, we find that the vigR 3' UTR also regulates dapE, a succinyl-diaminopimelate desuccinylase required for lysine and peptidoglycan synthesis, suggesting a broader role in controlling cell wall metabolism and vancomycin tolerance. Deletion of the 3' UTR increased virulence, while the isaA mutant is completely attenuated in a wax moth larvae model. Sequence and structural analyses of vigR indicated that the 3' UTR has expanded through the acquisition of Staphylococcus aureus repeat insertions that contribute sequence for the isaA interaction seed and may functionalize the 3' UTR.
Affiliation(s)
- Daniel G Mediati
  - School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW, Australia
  - Australian Institute for Microbiology and Infection, University of Technology Sydney, Ultimo, NSW, Australia
- William Dan
  - School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW, Australia
- David Lalaouna
  - Université de Strasbourg, CNRS, ARN UPR 9002, Strasbourg, France
- Hue Dinh
  - School of Natural Sciences, ARC Centre of Excellence in Synthetic Biology, Macquarie University, Sydney, NSW, Australia
- Alaska Pokhrel
  - Australian Institute for Microbiology and Infection, University of Technology Sydney, Ultimo, NSW, Australia
  - School of Natural Sciences, ARC Centre of Excellence in Synthetic Biology, Macquarie University, Sydney, NSW, Australia
- Keiran N Rowell
  - Structural Biology Facility, University of New South Wales, Sydney, NSW, Australia
- Katharine A Michie
  - Structural Biology Facility, University of New South Wales, Sydney, NSW, Australia
- Timothy P Stinear
  - Department of Microbiology and Immunology, Peter Doherty Institute, University of Melbourne, Melbourne, VIC, Australia
- Amy K Cain
  - School of Natural Sciences, ARC Centre of Excellence in Synthetic Biology, Macquarie University, Sydney, NSW, Australia
- Jai J Tree
  - School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW, Australia
3
Rouse WB, Tompkins VS, O'Leary CA, Moss WN. The RNA secondary structure of androgen receptor-FL and V7 transcripts reveals novel regulatory regions. Nucleic Acids Res 2024:gkae220. [PMID: 38554103, DOI: 10.1093/nar/gkae220] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2023] [Accepted: 03/18/2024] [Indexed: 04/01/2024] [Open access]
Abstract
The androgen receptor (AR) is a ligand-dependent nuclear transcription factor belonging to the steroid hormone nuclear receptor family. Due to its roles in regulating cell proliferation and differentiation, AR is tightly regulated to maintain proper levels of itself and the many genes it controls. AR dysregulation is a driver of many human diseases including prostate cancer. Though this dysregulation often occurs at the RNA level, there are many unknowns surrounding post-transcriptional regulation of AR mRNA, particularly the role that RNA secondary structure plays. Thus, a comprehensive analysis of AR transcript secondary structure is needed. We address this through the computational and experimental analyses of two key isoforms, full length (AR-FL) and truncated (AR-V7). Here, a combination of in-cell RNA secondary structure probing experiments (targeted DMS-MaPseq) and computational predictions were used to characterize the static structural landscape and conformational dynamics of both isoforms. Additionally, in-cell assays were used to identify functionally relevant structures in the 5' and 3' UTRs of AR-FL. A notable example is a conserved stem loop structure in the 5'UTR of AR-FL that can bind to Poly(RC) Binding Protein 2 (PCBP2). Taken together, our results reveal novel features that regulate AR expression.
Affiliation(s)
- Warren B Rouse
  - Roy J. Carver Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, IA 50011, USA
- Van S Tompkins
  - Roy J. Carver Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, IA 50011, USA
- Collin A O'Leary
  - Roy J. Carver Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, IA 50011, USA
  - Current Address: Departments of Biology and Chemistry, Cornell College, Mount Vernon, IA 52314, USA
- Walter N Moss
  - Roy J. Carver Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, IA 50011, USA
4
Sumi S, Hamada M, Saito H. Deep generative design of RNA family sequences. Nat Methods 2024; 21:435-443. [PMID: 38238559, DOI: 10.1038/s41592-023-02148-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/01/2023] [Accepted: 12/07/2023] [Indexed: 03/13/2024]
Abstract
RNA engineering has immense potential to drive innovation in biotechnology and medicine. Despite its importance, a versatile platform for the automated design of functional RNA is still lacking. Here, we propose RNA family sequence generator (RfamGen), a deep generative model that designs RNA family sequences in a data-efficient manner by explicitly incorporating alignment and consensus secondary structure information. RfamGen can generate novel and functional RNA family sequences by sampling points from a semantically rich and continuous representation. We have experimentally demonstrated the versatility of RfamGen using diverse RNA families. Furthermore, we confirmed the high success rate of RfamGen in designing functional ribozymes through a quantitative massively parallel assay. Notably, RfamGen successfully generates artificial sequences with higher activity than natural sequences. Overall, RfamGen significantly improves our ability to design functional RNA and opens up new potential for generative RNA engineering in synthetic biology.
Affiliation(s)
- Shunsuke Sumi
  - Department of Life Science Frontiers, Center for iPS Cell Research and Application (CiRA), Kyoto University, Kyoto, Japan
  - Graduate School of Medicine, Kyoto University, Kyoto, Japan
  - Graduate School of Advanced Science and Engineering, Waseda University, Tokyo, Japan
- Michiaki Hamada
  - Graduate School of Advanced Science and Engineering, Waseda University, Tokyo, Japan
  - Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan
  - Graduate School of Medicine, Nippon Medical School, Tokyo, Japan
- Hirohide Saito
  - Department of Life Science Frontiers, Center for iPS Cell Research and Application (CiRA), Kyoto University, Kyoto, Japan
  - Graduate School of Medicine, Kyoto University, Kyoto, Japan
5
Backofen R, Gorodkin J, Hofacker IL, Stadler PF. Comparative RNA Genomics. Methods Mol Biol 2024; 2802:347-393. [PMID: 38819565, DOI: 10.1007/978-1-0716-3838-5_12] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/01/2024]
Abstract
Over the last quarter of a century it has become clear that RNA is much more than just a boring intermediate in protein expression. Ancient RNAs still appear in the core information metabolism and comprise a surprisingly large component in bacterial gene regulation. A common theme with these types of mostly small RNAs is their reliance on conserved secondary structures. Large-scale sequencing projects, on the other hand, have profoundly changed our understanding of eukaryotic genomes. Pervasively transcribed, they give rise to a plethora of large and evolutionarily extremely flexible non-coding RNAs that exert a vastly diverse array of molecular functions. In this chapter we provide a (necessarily incomplete) overview of the current state of comparative analysis of non-coding RNAs, emphasizing computational approaches as a means to gain a global picture of the modern RNA world.
Affiliation(s)
- Rolf Backofen
  - Bioinformatics Group, Department of Computer Science, University of Freiburg, Freiburg, Germany
  - Center for Non-coding RNA in Technology and Health, University of Copenhagen, Frederiksberg, Denmark
- Jan Gorodkin
  - Center for Non-coding RNA in Technology and Health, Department of Veterinary and Animal Sciences, University of Copenhagen, Frederiksberg, Denmark
- Ivo L Hofacker
  - Institute for Theoretical Chemistry, University of Vienna, Wien, Austria
  - Bioinformatics and Computational Biology research group, University of Vienna, Vienna, Austria
  - Center for Non-coding RNA in Technology and Health, University of Copenhagen, Frederiksberg, Denmark
- Peter F Stadler
  - Bioinformatics Group, Department of Computer Science, University of Leipzig, Leipzig, Germany
  - Interdisciplinary Center for Bioinformatics, University of Leipzig, Leipzig, Germany
  - Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany
  - Universidad Nacional de Colombia, Bogotá, Colombia
  - Institute for Theoretical Chemistry, University of Vienna, Wien, Austria
  - Center for Non-coding RNA in Technology and Health, University of Copenhagen, Frederiksberg, Denmark
  - Santa Fe Institute, Santa Fe, NM, USA
6
Peterson JM, O'Leary CA, Coppenbarger EC, Tompkins VS, Moss WN. Discovery of RNA secondary structural motifs using sequence-ordered thermodynamic stability and comparative sequence analysis. MethodsX 2023; 11:102275. [PMID: 37448951, PMCID: PMC10336498, DOI: 10.1016/j.mex.2023.102275] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2022] [Accepted: 06/28/2023] [Indexed: 07/18/2023] [Open access]
Abstract
Major advances in RNA secondary structural motif prediction have been achieved in the last few years; however, few methods harness the predictive power of multiple approaches to deliver in-depth characterizations of local RNA motifs and their potential functionality. Additionally, most available methods do not predict RNA pseudoknots. This work combines complementary bioinformatic systems into one robust discovery pipeline where:
- RNA sequences are folded to search for thermodynamically favorable motifs utilizing ScanFold.
- Motifs are expanded and refolded into alternate pseudoknot conformations by Knotty/Iterative HFold.
- All conformations are evaluated for covariance via the cm-builder pipeline (Infernal and R-scape).
7
Escamilla-Gutiérrez A, Córdova-Espinoza MG, Sánchez-Monciváis A, Tecuatzi-Cadena B, Regalado-García AG, Medina-Quero K. In silico selection of aptamers for bacterial toxins detection. J Biomol Struct Dyn 2023; 41:10909-10918. [PMID: 36546716, DOI: 10.1080/07391102.2022.2159529] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2022] [Accepted: 12/10/2022] [Indexed: 12/24/2022]
Abstract
The most commonly used toxins in biological warfare are staphylococcal enterotoxin B (3SEB), cholera toxin (1XTC), and botulinum toxin (3BTA). Uncovering novel strategies for identifying these toxins is paramount; therefore, aptamers are used for this purpose. Aptamers are single-stranded DNA or RNA oligonucleotides selected via Systematic Evolution of Ligands by Exponential Enrichment (SELEX) with high binding affinity and specificity against target molecules. However, SELEX in vitro is tedious; hence, adopting alternative in silico molecular docking approaches is necessary. We aimed to conduct molecular docking with accessible tools and obtain RNA aptamers. First, 4,820,095 sequences obtained from an initial library of 9.5 × 10^9 Python script sequences were used. The GraphClust program was used to create representative groups or clusters, and the DoGSiteScorer (https://proteins.plus/) was used to conduct binding site detection of the proteins: 5DO4 (thrombin), 3SEB, 1XTC, and 3BTA. rDock, HDock, and PatchDock were adopted, combining different docking program results (consensus scoring), to improve receptor-ligand prediction. An analysis of the poses and root mean square deviation (RMSD) was performed, and 468 structurally different aptamers were obtained. The DoGSiteScorer program predicted the binding site of each protein to direct the interaction with the aptamer. Candidate aptamers for 3SEB, 1XTC, and 3BTA were selected according to the pose value considering the closeness of the interaction with a lower mean of 45.923 Å, 45.854 Å, and 72.490 Å, respectively. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Alejandro Escamilla-Gutiérrez
  - Laboratorio de Bacteriología Médica, Departamento de Microbiología, Escuela Nacional de Ciencias Biológicas, Instituto Politécnico Nacional, Ciudad de México, México
  - Hospital General, Instituto Mexicano del Seguro Social IMSS, Ciudad de México, México
- María Guadalupe Córdova-Espinoza
  - Laboratorio de Bacteriología Médica, Departamento de Microbiología, Escuela Nacional de Ciencias Biológicas, Instituto Politécnico Nacional, Ciudad de México, México
  - Laboratorio de Inmunología, Escuela Militar de Graduados de Sanidad, Secretaría de la Defensa Nacional, Ciudad de México, México
- Anahí Sánchez-Monciváis
  - Laboratorio de Inmunología, Escuela Militar de Graduados de Sanidad, Secretaría de la Defensa Nacional, Ciudad de México, México
- Brenda Tecuatzi-Cadena
  - Laboratorio de Inmunología, Escuela Militar de Graduados de Sanidad, Secretaría de la Defensa Nacional, Ciudad de México, México
- Ana Gabriela Regalado-García
  - Laboratorio de Inmunología, Escuela Militar de Graduados de Sanidad, Secretaría de la Defensa Nacional, Ciudad de México, México
- Karen Medina-Quero
  - Laboratorio de Inmunología, Escuela Militar de Graduados de Sanidad, Secretaría de la Defensa Nacional, Ciudad de México, México
8
Fremin BJ, Bhatt AS, Kyrpides NC. Identification of over ten thousand candidate structured RNAs in viruses and phages. Comput Struct Biotechnol J 2023; 21:5630-5639. [PMID: 38047235, PMCID: PMC10690425, DOI: 10.1016/j.csbj.2023.11.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2023] [Revised: 11/03/2023] [Accepted: 11/03/2023] [Indexed: 12/05/2023] [Open access]
Abstract
Structured RNAs play crucial roles in viruses, exerting influence over both viral and host gene expression. However, the extensive diversity of structured RNAs and their ability to act in cis or trans positions pose challenges for predicting and assigning their functions. While comparative genomics approaches have successfully predicted candidate structured RNAs in microbes on a large scale, similar efforts for viruses have been lacking. In this study, we screened over 5 million DNA and RNA viral sequences, resulting in the prediction of 10,006 novel candidate structured RNAs. These predictions are widely distributed across taxonomy and ecosystem. We found transcriptional evidence for 206 of these candidate structured RNAs in the human fecal microbiome. These candidate RNAs exhibited evidence of nucleotide covariation, indicative of selective pressure maintaining the predicted secondary structures. Our analysis revealed a diverse repertoire of candidate structured RNAs, encompassing a substantial number of putative tRNAs or tRNA-like structures, Rho-independent transcription terminators, and potentially cis-regulatory structures consistently positioned upstream of genes. In summary, our findings shed light on the extensive diversity of structured RNAs in viruses, offering a valuable resource for further investigations into their functional roles and implications in viral gene expression and pave the way for a deeper understanding of the intricate interplay between viruses and their hosts at the molecular level.
Affiliation(s)
- Brayon J. Fremin
  - Department of Energy, Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
  - Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Ami S. Bhatt
  - Department of Medicine (Hematology, Blood and Marrow Transplantation) and Genetics, Stanford University, Stanford, CA, USA
  - Department of Genetics, Stanford University, Stanford, CA, USA
- Nikos C. Kyrpides
  - Department of Energy, Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
  - Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
  - Lead Contact, USA
9
Rivas E. RNA covariation at helix-level resolution for the identification of evolutionarily conserved RNA structure. PLoS Comput Biol 2023; 19:e1011262. [PMID: 37450549, PMCID: PMC10370758, DOI: 10.1371/journal.pcbi.1011262] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/08/2023] [Accepted: 06/12/2023] [Indexed: 07/18/2023] [Open access]
Abstract
Many biologically important RNAs fold into specific 3D structures conserved through evolution. Knowing when an RNA sequence includes a conserved RNA structure that could lead to new biology is not trivial and depends on clues left behind by conservation in the form of covariation and variation. For that purpose, the R-scape statistical test was created to identify from alignments of RNA sequences, the base pairs that significantly covary above phylogenetic expectation. R-scape treats base pairs as independent units. However, RNA base pairs do not occur in isolation. The Watson-Crick (WC) base pairs stack together forming helices that constitute the scaffold that facilitates the formation of the non-WC base pairs, and ultimately the complete 3D structure. The helix-forming WC base pairs carry most of the covariation signal in an RNA structure. Here, I introduce a new measure of statistically significant covariation at helix-level by aggregation of the covariation significance and covariation power calculated at base-pair-level resolution. Performance benchmarks show that helix-level aggregated covariation increases sensitivity in the detection of evolutionarily conserved RNA structure without sacrificing specificity. This additional helix-level sensitivity reveals an artifact that results from using covariation to build an alignment for a hypothetical structure and then testing the alignment for whether its covariation significantly supports the structure. Helix-level reanalysis of the evolutionary evidence for a selection of long non-coding RNAs (lncRNAs) reinforces the evidence against these lncRNAs having a conserved secondary structure.
Affiliation(s)
- Elena Rivas
  - Department of Molecular and Cellular Biology, Harvard University, Cambridge, Massachusetts, United States of America
10
Kumar A, Daripa P, Maiti S, Jain N. Interaction of hnRNPB1 with Helix-12 of hHOTAIR Reveals the Distinctive Mode of RNA Recognition That Enables the Structural Rearrangement by LCD. Biochemistry 2023; 62:2041-2054. [PMID: 37307069, DOI: 10.1021/acs.biochem.3c00181] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
The lncRNA human Hox transcript antisense intergenic RNA (hHOTAIR) regulates gene expression by recruiting chromatin modifiers. The prevailing model suggests that hHOTAIR recruits hnRNPB1 to facilitate intermolecular RNA-RNA interactions between the lncRNA HOTAIR and its target gene transcripts. This B1-mediated RNA-RNA interaction modulates the structure of hHOTAIR, attenuates its inhibitory effect on polycomb repression complex 2, and enhances its methyl transferase activity. However, the molecular details by which the nuclear hnRNPB1 protein assembles on the lncRNA HOTAIR have not yet been described. Here, we investigate the molecular interactions between hnRNPB1 and Helix-12 (hHOTAIR). We show that the low-complexity domain segment (LCD) of hnRNPB1 interacts with a strong affinity for Helix-12. Our studies revealed that unbound Helix-12 folds into a specific base-pairing pattern and contains an internal loop that, as determined by thermal melting and NMR studies, exhibits hydrogen bonding between strands and forms the recognition site for the LCD segment. In addition, mutation studies show that the secondary structure of Helix-12 makes an important contribution by acting as a landing pad for hnRNPB1. The secondary structure of Helix-12 is involved in specific interactions with different domains of hnRNPB1. Finally, we show that the LCD unwinds Helix-12 locally, indicating its importance in the hHOTAIR restructuring mechanism.
Affiliation(s)
- Ajit Kumar
  - CSIR Institute of Genomics and Integrative Biology, New Delhi 110025, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, India
- Purba Daripa
  - CSIR Institute of Genomics and Integrative Biology, New Delhi 110025, India
- Souvik Maiti
  - CSIR Institute of Genomics and Integrative Biology, New Delhi 110025, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, India
- Niyati Jain
  - CSIR Institute of Genomics and Integrative Biology, New Delhi 110025, India
11
Gao W, Yang A, Rivas E. Thirteen dubious ways to detect conserved structural RNAs. IUBMB Life 2023; 75:471-492. [PMID: 36495545, DOI: 10.1002/iub.2694] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 10/03/2022] [Accepted: 10/24/2022] [Indexed: 12/14/2022]
Abstract
Covariation induced by compensatory base substitutions in RNA alignments is a great way to deduce conserved RNA structure, in principle. In practice, success depends on many factors, importantly the quality and depth of the alignment and the choice of covariation statistic. Measuring covariation between pairs of aligned positions is easy. However, using covariation to infer evolutionarily conserved RNA structure is complicated by other extraneous sources of covariation such as that resulting from homologous sequences having evolved from a common ancestor. In order to provide evidence of evolutionarily conserved RNA structure, a method to distinguish covariation due to sources other than RNA structure is necessary. Moreover, there are several sorts of artifactually generated covariation signals that can further confound the analysis. Additionally, some covariation signal is difficult to detect due to incomplete comparative data. Here, we investigate and critically discuss the practice of inferring conserved RNA structure by comparative sequence analysis. We provide new methods on how to approach and decide which of the numerous long non-coding RNAs (lncRNAs) have biologically relevant structures.
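The abstract notes that measuring covariation between pairs of aligned positions is easy. As a minimal sketch of one such measurement, the Python function below computes the mutual information between two alignment columns. It is an assumption-laden illustration, not the statistic or code used by the authors or by R-scape: the function name, the gap handling, and the toy alignment are hypothetical. As the abstract stresses, a raw score like this does not separate structural covariation from covariation caused by shared ancestry.

```python
from collections import Counter
from math import log2


def column_mutual_information(alignment, i, j):
    """Mutual information (in bits) between alignment columns i and j.

    Sequences with a gap ('-') in either column are skipped; this gap
    handling is an assumption made for the sketch.
    """
    pairs = [(s[i], s[j]) for s in alignment if s[i] != "-" and s[j] != "-"]
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), count in joint.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts.
        mi += (count / n) * log2(count * n / (left[a] * right[b]))
    return mi


# Toy alignment (hypothetical): columns 0 and 3 vary together while
# columns 1 and 2 are fully conserved.
toy_alignment = ["GAAC", "CAAG", "AAAU", "UAAA"]
print(column_mutual_information(toy_alignment, 0, 3))
print(column_mutual_information(toy_alignment, 1, 2))
```

Fully conserved columns score zero because they carry no variation, which is one reason covariation and variation have to be considered together.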
Affiliation(s)
- William Gao
  - Department of Genetics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Ann Yang
  - Department of Molecular and Cellular Biology, Harvard University, Cambridge, Massachusetts, USA
- Elena Rivas
  - Department of Molecular and Cellular Biology, Harvard University, Cambridge, Massachusetts, USA
12
Rivas E. RNA covariation at helix-level resolution for the identification of evolutionarily conserved RNA structure. bioRxiv 2023:2023.04.14.536965. [PMID: 37131783, PMCID: PMC10153129, DOI: 10.1101/2023.04.14.536965] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Abstract
Many biologically important RNAs fold into specific 3D structures conserved through evolution. Knowing when an RNA sequence includes a conserved RNA structure that could lead to new biology is not trivial and depends on clues left behind by conservation in the form of covariation and variation. For that purpose, the R-scape statistical test was created to identify from alignments of RNA sequences, the base pairs that significantly covary above phylogenetic expectation. R-scape treats base pairs as independent units. However, RNA base pairs do not occur in isolation. The Watson-Crick (WC) base pairs stack together forming helices that constitute the scaffold that facilitates the formation of the non-WC base pairs, and ultimately the complete 3D structure. The helix-forming WC base pairs carry most of the covariation signal in an RNA structure. Here, I introduce a new measure of statistically significant covariation at helix-level by aggregation of the covariation significance and covariation power calculated at base-pair-level resolution. Performance benchmarks show that helix-level aggregated covariation increases sensitivity in the detection of evolutionarily conserved RNA structure without sacrificing specificity. This additional helix-level sensitivity reveals an artifact that results from using covariation to build an alignment for a hypothetical structure and then testing the alignment for whether its covariation significantly supports the structure. Helix-level reanalysis of the evolutionary evidence for a selection of long non-coding RNAs (lncRNAs) reinforces the evidence against these lncRNAs having a conserved secondary structure.
Availability: Helix aggregated E-values are integrated in the R-scape software package (version 2.0.0.p and higher). The R-scape web server eddylab.org/R-scape includes a link to download the source code.
Contact: elenarivas@fas.harvard.edu.
Supplementary information: Supplementary data and code are provided with this manuscript at rivaslab.org.
13
Mattick JS. RNA out of the mist. Trends Genet 2023; 39:187-207. [PMID: 36528415, DOI: 10.1016/j.tig.2022.11.001] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 05/26/2022] [Revised: 11/08/2022] [Accepted: 11/27/2022] [Indexed: 12/23/2022]
Abstract
RNA has long been regarded primarily as the intermediate between genes and proteins. It was a surprise then to discover that eukaryotic genes are mosaics of mRNA sequences interrupted by large tracts of transcribed but untranslated sequences, and that multicellular organisms also express many long 'intergenic' and antisense noncoding RNAs (lncRNAs). The identification of small RNAs that regulate mRNA translation and half-life did not disturb the prevailing view that animals and plant genomes are full of evolutionary debris and that their development is mainly supervised by transcription factors. Gathering evidence to the contrary involved addressing the low conservation, expression, and genetic visibility of lncRNAs, demonstrating their cell-specific roles in cell and developmental biology, and their association with chromatin-modifying complexes and phase-separated domains. The emerging picture is that most lncRNAs are the products of genetic loci termed 'enhancers', which marshal generic effector proteins to their sites of action to control cell fate decisions during development.
Affiliation(s)
- John S Mattick
  - School of Biotechnology and Biomolecular Sciences, UNSW, Sydney, NSW 2052, Australia; UNSW RNA Institute, UNSW, Sydney, NSW 2052, Australia.
|
14
|
O’Leary CA, Tompkins VS, Rouse WB, Nam G, Moss W. Thermodynamic and structural characterization of an EBV infected B-cell lymphoma transcriptome. NAR Genom Bioinform 2022; 4:lqac082. [PMID: 36285286 PMCID: PMC9585548 DOI: 10.1093/nargab/lqac082] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2022] [Revised: 09/30/2022] [Accepted: 10/06/2022] [Indexed: 11/12/2022] Open
Abstract
Epstein-Barr virus (EBV) is a widely prevalent human herpes virus infecting over 95% of all adults and is associated with a variety of B-cell cancers and induction of multiple sclerosis. EBV accomplishes this in part by expression of coding and noncoding RNAs and alteration of the host cell transcriptome. To better understand the structures which are forming in the viral and host transcriptomes of infected cells, the RNA structure probing technique Structure-seq2 was applied to the BJAB-B1 cell line (an EBV infected B-cell lymphoma). This resulted in reactivity profiles and secondary structural analyses for over 10000 human mRNAs and lncRNAs, along with 19 lytic and latent EBV transcripts. We report in-depth structural analyses for the human MYC mRNA and the human lncRNA CYTOR. Additionally, we provide a new model for the EBV noncoding RNA EBER2 and provide the first reported model for the EBV tandem terminal repeat RNA. In-depth thermodynamic and structural analyses were carried out with the motif discovery tool ScanFold and RNAfold prediction tool; subsequent covariation analyses were performed on resulting models finding various levels of support. ScanFold results for all analyzed transcripts are made available for viewing and download on the user-friendly RNAStructuromeDB.
Affiliation(s)
- Collin A O’Leary
  - Roy J. Carver Department of Biophysics, Biochemistry and Molecular Biology, Iowa State University, Ames, IA 50011, USA
- Van S Tompkins
  - Roy J. Carver Department of Biophysics, Biochemistry and Molecular Biology, Iowa State University, Ames, IA 50011, USA
- Warren B Rouse
  - Roy J. Carver Department of Biophysics, Biochemistry and Molecular Biology, Iowa State University, Ames, IA 50011, USA
- Gijong Nam
  - Roy J. Carver Department of Biophysics, Biochemistry and Molecular Biology, Iowa State University, Ames, IA 50011, USA
- Walter N Moss
  - Roy J. Carver Department of Biophysics, Biochemistry and Molecular Biology, Iowa State University, Ames, IA 50011, USA
|
15
|
rMSA: a sequence search and alignment algorithm to improve RNA structure modeling. J Mol Biol 2022. [DOI: 10.1016/j.jmb.2022.167904] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/04/2022]
|
16
|
Andrews RJ, Rouse WB, O’Leary CA, Booher NJ, Moss WN. ScanFold 2.0: a rapid approach for identifying potential structured RNA targets in genomes and transcriptomes. PeerJ 2022; 10:e14361. [PMID: 36389431 PMCID: PMC9651051 DOI: 10.7717/peerj.14361] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 07/27/2022] [Accepted: 10/18/2022] [Indexed: 11/11/2022] Open
Abstract
A major limiting factor in target discovery for both basic research and therapeutic intervention is the identification of structural and/or functional RNA elements in genomes and transcriptomes. This was the impetus for the original ScanFold algorithm, which provides maps of local RNA structural stability, evidence of sequence-ordered (potentially evolved) structure, and unique model structures comprised of recurring base pairs with the greatest structural bias. A key step in quantifying this propensity for ordered structure is the prediction of secondary structural stability for randomized sequences which, in the original implementation of ScanFold, is explicitly evaluated. This slow process has limited the rapid identification of ordered structures in large genomes/transcriptomes, which we seek to overcome in this current work introducing ScanFold 2.0. In this revised version of ScanFold, we no longer explicitly evaluate randomized sequence folding energy, but rather estimate it using a machine learning approach. For high randomization numbers, this can increase prediction speeds over 100-fold compared to ScanFold 1.0, allowing for the analysis of large sequences, as well as the use of additional folding algorithms that may be computationally expensive. In the testing of ScanFold 2.0, we re-evaluate the Zika, HIV, and SARS-CoV-2 genomes and compare both the consistency of results and the time of each run to ScanFold 1.0. We also re-evaluate the SARS-CoV-2 genome to assess the quality of ScanFold 2.0 predictions vs several biochemical structure probing datasets and compare the results to those of the original ScanFold program.
Affiliation(s)
- Ryan J. Andrews
  - Department of Biochemistry, University of Utah, Salt Lake City, UT, United States
- Warren B. Rouse
  - The Roy J Carver Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, Iowa, United States
- Collin A. O’Leary
  - The Roy J Carver Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, Iowa, United States
- Nicholas J. Booher
  - Infrastructure and Research IT Services, Iowa State University, Ames, IA, United States
- Walter N. Moss
  - The Roy J Carver Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, Iowa, United States
|
17
|
Childs-Disney JL, Yang X, Gibaut QMR, Tong Y, Batey RT, Disney MD. Targeting RNA structures with small molecules. Nat Rev Drug Discov 2022; 21:736-762. [PMID: 35941229 PMCID: PMC9360655 DOI: 10.1038/s41573-022-00521-4] [Citation(s) in RCA: 132] [Impact Index Per Article: 66.0] [Accepted: 06/17/2022] [Indexed: 01/07/2023]
Abstract
RNA adopts 3D structures that confer varied functional roles in human biology and dysfunction in disease. Approaches to therapeutically target RNA structures with small molecules are being actively pursued, aided by key advances in the field including the development of computational tools that predict evolutionarily conserved RNA structures, as well as strategies that expand mode of action and facilitate interactions with cellular machinery. Existing RNA-targeted small molecules use a range of mechanisms, including directing splicing by acting as molecular glues with cellular proteins (such as branaplam and the FDA-approved risdiplam), inhibiting translation of undruggable proteins and deactivating functional structures in noncoding RNAs. Here, we describe strategies to identify, validate and optimize small molecules that target the functional transcriptome, laying out a roadmap to advance these agents into the next decade.
Affiliation(s)
- Xueyi Yang
  - Department of Chemistry, Scripps Research, Jupiter, FL, USA
- Yuquan Tong
  - Department of Chemistry, Scripps Research, Jupiter, FL, USA
- Robert T Batey
  - Department of Biochemistry, University of Colorado, Boulder, CO, USA.
|
18
|
Zhang J, Fei Y, Sun L, Zhang QC. Advances and opportunities in RNA structure experimental determination and computational modeling. Nat Methods 2022; 19:1193-1207. [PMID: 36203019 DOI: 10.1038/s41592-022-01623-y] [Citation(s) in RCA: 26] [Impact Index Per Article: 13.0] [Received: 06/10/2022] [Accepted: 08/23/2022] [Indexed: 11/09/2022]
Abstract
Beyond transferring genetic information, RNAs are molecules with diverse functions that include catalyzing biochemical reactions and regulating gene expression. Most of these activities depend on RNAs' specific structures. Therefore, accurately determining RNA structure is integral to advancing our understanding of RNA functions. Here, we summarize the state-of-the-art experimental and computational technologies developed to evaluate RNA secondary and tertiary structures. We also highlight how the rapid increase of experimental data facilitates the integrative modeling approaches for better resolving RNA structures. Finally, we provide our thoughts on the latest advances and challenges in RNA structure determination methods, as well as on future directions for both experimental approaches and artificial intelligence-based computational tools to model RNA structure. Ultimately, we hope the technological advances will deepen our understanding of RNA biology and facilitate RNA structure-based biomedical research such as designing specific RNA structures for therapeutics and deploying RNA-targeting small-molecule drugs.
Affiliation(s)
- Jinsong Zhang
  - MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing, China; Beijing Advanced Innovation Center for Structural Biology & Frontier Research Center for Biological Structure, School of Life Sciences, Tsinghua University, Beijing, China; Tsinghua-Peking Center for Life Sciences, Beijing, China
- Yuhan Fei
  - MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing, China; Beijing Advanced Innovation Center for Structural Biology & Frontier Research Center for Biological Structure, School of Life Sciences, Tsinghua University, Beijing, China; Tsinghua-Peking Center for Life Sciences, Beijing, China
- Lei Sun
  - MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing, China; Beijing Advanced Innovation Center for Structural Biology & Frontier Research Center for Biological Structure, School of Life Sciences, Tsinghua University, Beijing, China; Tsinghua-Peking Center for Life Sciences, Beijing, China.
- Qiangfeng Cliff Zhang
  - MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing, China; Beijing Advanced Innovation Center for Structural Biology & Frontier Research Center for Biological Structure, School of Life Sciences, Tsinghua University, Beijing, China; Tsinghua-Peking Center for Life Sciences, Beijing, China.
|
19
|
Variant-Specific Analysis Reveals a Novel Long-Range RNA-RNA Interaction in SARS-CoV-2 Orf1a. Int J Mol Sci 2022; 23:ijms231911050. [PMID: 36232353 PMCID: PMC9570297 DOI: 10.3390/ijms231911050] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2022] [Revised: 09/05/2022] [Accepted: 09/08/2022] [Indexed: 01/08/2023] Open
Abstract
Since the start of the COVID-19 pandemic, understanding the pathology of the SARS-CoV-2 RNA virus and its life cycle has been the priority of many researchers. Currently, new variants of the virus have emerged with various levels of pathogenicity and abundance within the human-host population. Although much of viral pathogenicity is attributed to the viral Spike protein’s binding affinity to human lung cells’ ACE2 receptor, comprehensive knowledge on the distinctive features of viral variants that might affect their life cycle and pathogenicity is yet to be attained. Recent in vivo studies into the RNA structure of the SARS-CoV-2 genome have revealed certain long-range RNA-RNA interactions (RRIs). Using in silico predictions and a large population of SARS-CoV-2 sequences, we observed variant-specific evolutionary changes for certain long-range RRIs. We also found statistical evidence for the existence of one of the thermodynamic-based RRI predictions, namely Comp1, in the Beta variant sequences. A similar test that disregarded sequence variant information did not, however, lead to significant results. When performing population-based analyses, aggregate tests may fail to identify novel interactions due to variant-specific changes. Variant-specific analyses can result in de novo RRI identification.
|
20
|
False-positive IRESes from Hoxa9 and other genes resulting from errors in mammalian 5' UTR annotations. Proc Natl Acad Sci U S A 2022; 119:e2122170119. [PMID: 36037358 PMCID: PMC9456764 DOI: 10.1073/pnas.2122170119] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Indexed: 12/12/2022] Open
Abstract
Hyperconserved genomic sequences have great promise for understanding core biological processes. It has been recently proposed that scores of hyperconserved 5' untranslated regions (UTRs), also known as transcript leaders (hTLs), encode internal ribosome entry sites (IRESes) that drive cap-independent translation, in part, via interactions with ribosome expansion segments. However, the direct functional significance of such interactions has not yet been definitively demonstrated. We provide evidence that the putative IRESes previously reported in Hox gene hTLs are rarely included in transcript leaders. Instead, these regions function independently as transcriptional promoters. In addition, we find the proposed RNA structure of the putative Hoxa9 IRES is not conserved. Instead, sequences previously shown to be essential for putative IRES activity encode a hyperconserved transcription factor binding site (E-box) that contributes to its promoter activity and is bound by several transcription factors, including USF1 and USF2. Similar E-box sequences enhance the promoter activities of other putative Hoxa gene IRESes. Moreover, we provide evidence that the vast majority of hTLs with putative IRES activity overlap transcriptional promoters, enhancers, and 3' splice sites that are most likely responsible for their reported IRES activities. These results argue strongly against recently reported widespread IRES-like activities from hTLs and contradict proposed interactions between ribosomal expansion segment ES9S and putative IRESes. Furthermore, our work underscores the importance of accurate transcript annotations, controls in bicistronic reporter assays, and the power of synthesizing publicly available data from multiple sources.
|
21
|
Omoru OB, Pereira F, Janga SC, Manzourolajdad A. A Putative long-range RNA-RNA interaction between ORF8 and Spike of SARS-CoV-2. PLoS One 2022; 17:e0260331. [PMID: 36048827 PMCID: PMC9436084 DOI: 10.1371/journal.pone.0260331] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/04/2021] [Accepted: 06/22/2022] [Indexed: 12/15/2022] Open
Abstract
SARS-CoV-2 has affected people worldwide as the causative agent of COVID-19. The virus is related to the highly lethal SARS-CoV-1 responsible for the 2002-2003 SARS outbreak in Asia. Research is ongoing to understand why the two viruses have different spreading capacities and mortality rates. As in other betacoronaviruses, RNA-RNA interactions occur between different parts of the viral genomic RNA, resulting in discontinuous transcription and production of various sub-genomic RNAs. These sub-genomic RNAs are then translated into other viral proteins. In this work, we performed a comparative analysis for novel long-range RNA-RNA interactions that may involve the Spike region. Comparing in silico fragment-based predictions between reference sequences of SARS-CoV-1 and SARS-CoV-2 revealed several predictions, among which was a thermodynamically stable long-range RNA-RNA interaction between (23660-23703 Spike) and (28025-28060 ORF8) that is unique to SARS-CoV-2. The patterns of sequence variation using data gathered worldwide further supported the predicted stability of the sub-interacting region (23679-23690 Spike) and (28031-28042 ORF8). Such RNA-RNA interactions can potentially impact the viral life cycle, including sub-genomic RNA production rates.
Affiliation(s)
- Okiemute Beatrice Omoru
  - Department of Biohealth Informatics, School of Informatics and Computing, Indiana University Purdue University, Indianapolis, IN, United States of America
- Filipe Pereira
  - Centre for Functional Ecology, Department of Life Sciences, University of Coimbra, Coimbra, Portugal
  - IDENTIFICA Genetic Testing, Maia, Portugal
- Sarath Chandra Janga
  - Department of Biohealth Informatics, School of Informatics and Computing, Indiana University Purdue University, Indianapolis, IN, United States of America
  - Department of Medical and Molecular Genetics, Indiana University School of Medicine, Medical Research and Library Building, Indianapolis, Indiana, United States of America
  - Centre for Computational Biology and Bioinformatics, Indiana University School of Medicine, 5021 Health Information and Translational Sciences (HITS), Indianapolis, Indiana, United States of America
- Amirhossein Manzourolajdad
  - Department of Biohealth Informatics, School of Informatics and Computing, Indiana University Purdue University, Indianapolis, IN, United States of America
  - Department of Computer Science, Colgate University, Hamilton, NY, United States of America
|
22
|
Westhof E. Data, data, burning deep, in the forests of the net. Biochem Biophys Res Commun 2022; 633:42-44. [DOI: 10.1016/j.bbrc.2022.09.030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2022] [Accepted: 09/07/2022] [Indexed: 11/28/2022]
|
23
|
Ponting CP, Haerty W. Genome-Wide Analysis of Human Long Noncoding RNAs: A Provocative Review. Annu Rev Genomics Hum Genet 2022; 23:153-172. [PMID: 35395170 DOI: 10.1146/annurev-genom-112921-123710] [Citation(s) in RCA: 37] [Impact Index Per Article: 18.5] [Indexed: 11/09/2022]
Abstract
Do long noncoding RNAs (lncRNAs) contribute little or substantively to human biology? To address how lncRNA loci and their transcripts, structures, interactions, and functions contribute to human traits and disease, we adopt a genome-wide perspective. We intend to provoke alternative interpretation of questionable evidence and thorough inquiry into unsubstantiated claims. We discuss pitfalls of lncRNA experimental and computational methods as well as opposing interpretations of their results. The majority of evidence, we argue, indicates that most lncRNA transcript models reflect transcriptional noise or provide minor regulatory roles, leaving relatively few human lncRNAs that contribute centrally to human development, physiology, or behavior. These important few tend to be spliced and better conserved but lack a simple syntax relating sequence to structure and mechanism, and so resist simple categorization. This genome-wide view should help investigators prioritize individual lncRNAs based on their likely contribution to human biology.
Affiliation(s)
- Chris P Ponting
  - MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, United Kingdom
|
24
|
Rouse WB, O'Leary CA, Booher NJ, Moss WN. Expansion of the RNAStructuromeDB to include secondary structural data spanning the human protein-coding transcriptome. Sci Rep 2022; 12:14515. [PMID: 36008510 PMCID: PMC9403969 DOI: 10.1038/s41598-022-18699-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2022] [Accepted: 08/17/2022] [Indexed: 11/22/2022] Open
Abstract
RNA plays vital functional roles in almost every component of biology, and these functional roles are often influenced by its folding into secondary and tertiary structures. An important role of RNA secondary structure is in maintaining proper gene regulation; therefore, making accurate predictions of the structures involved in these processes is important. In this study, we have expanded on our previous work that led to the creation of the RNAStructuromeDB. Unlike this previous study that analyzed the human genome at low resolution, we have now scanned the protein-coding human transcriptome at high (single nt) resolution. This provides more robust structure predictions for over 100,000 isoforms of known protein-coding genes. Notably, we also utilize the motif identification tool, ScanFold, to model structures with high propensity for ordered/evolved stability. All data have been uploaded to the RNAStructuromeDB, allowing for easy searching of transcripts, visualization of data tracks (via the Integrative Genomics Viewer or IGV), and download of ScanFold data—including unique highly-ordered motifs. Herein, we provide an example analysis of MAT2A to demonstrate the utility of ScanFold at finding known and novel secondary structures, highlighting regions of potential functionality, and guiding generation of functional hypotheses through use of the data.
Affiliation(s)
- Warren B Rouse
  - Roy J. Carver Department of Biophysics, Biochemistry and Molecular Biology, Iowa State University, Ames, IA, 50011, USA
- Collin A O'Leary
  - Roy J. Carver Department of Biophysics, Biochemistry and Molecular Biology, Iowa State University, Ames, IA, 50011, USA
- Nicholas J Booher
  - Infrastructure and Research IT Services, Iowa State University, Ames, IA, 50011, USA
- Walter N Moss
  - Roy J. Carver Department of Biophysics, Biochemistry and Molecular Biology, Iowa State University, Ames, IA, 50011, USA.
|
25
|
Ross CJ, Ulitsky I. Discovering functional motifs in long noncoding RNAs. WILEY INTERDISCIPLINARY REVIEWS. RNA 2022; 13:e1708. [PMID: 34981665 DOI: 10.1002/wrna.1708] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 09/01/2021] [Revised: 11/19/2021] [Accepted: 12/04/2021] [Indexed: 12/27/2022]
Abstract
Long noncoding RNAs (lncRNAs) are products of pervasive transcription that closely resemble messenger RNAs on the molecular level, yet function through largely unknown modes of action. The current model is that the function of lncRNAs often relies on specific, typically short, conserved elements, connected by linkers in which specific sequences and/or structures are less important. This notion has fueled the development of both computational and experimental methods focused on the discovery of functional elements within lncRNA genes, based on diverse signals such as evolutionary conservation, predicted structural elements, or the ability to rescue loss-of-function phenotypes. In this review, we outline the main challenges that the different methods need to overcome, describe the recently developed approaches, and discuss their respective limitations. This article is categorized under: RNA Evolution and Genomics > Computational Analyses of RNA; RNA Interactions with Proteins and Other Molecules > Protein-RNA Interactions: Functional Implications; Regulatory RNAs/RNAi/Riboswitches > Regulatory RNAs.
Affiliation(s)
- Caroline Jane Ross
  - Biological Regulation and Molecular Neuroscience, Weizmann Institute of Science, Rehovot, Israel
- Igor Ulitsky
  - Biological Regulation and Molecular Neuroscience, Weizmann Institute of Science, Rehovot, Israel
|
26
|
Mestre MR, Gao LA, Shah SA, López-Beltrán A, González-Delgado A, Martínez-Abarca F, Iranzo J, Redrejo-Rodríguez M, Zhang F, Toro N. UG/Abi: a highly diverse family of prokaryotic reverse transcriptases associated with defense functions. Nucleic Acids Res 2022; 50:6084-6101. [PMID: 35648479 PMCID: PMC9226505 DOI: 10.1093/nar/gkac467] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 11/17/2021] [Revised: 04/11/2022] [Accepted: 05/17/2022] [Indexed: 11/20/2022] Open
Abstract
Reverse transcriptases (RTs) are enzymes capable of synthesizing DNA using RNA as a template. Within the last few years, a burst of research has led to the discovery of novel prokaryotic RTs with diverse antiviral properties, such as DRTs (Defense-associated RTs), which belong to the so-called group of unknown RTs (UG) and are closely related to the Abortive Infection system (Abi) RTs. In this work, we performed a systematic analysis of UG and Abi RTs, increasing the number of UG/Abi members up to 42 highly diverse groups, most of which are predicted to be functionally associated with other gene(s) or domain(s). Based on this information, we classified these systems into three major classes. In addition, we reveal that most of these groups are associated with defense functions and/or mobile genetic elements, and demonstrate the antiphage role of four novel groups. Besides, we highlight the presence of one of these systems in novel families of human gut viruses infecting members of the Bacteroidetes and Firmicutes phyla. This work lays the foundation for a comprehensive and unified understanding of these highly diverse RTs with enormous biotechnological potential.
Affiliation(s)
- Mario Rodríguez Mestre
  - Departamento de Bioquímica, Universidad Autónoma de Madrid (UAM) and Instituto de Investigaciones Biomédicas Alberto Sols (CSIC-UAM), Madrid, Spain
- Linyi Alex Gao
  - Howard Hughes Medical Institute, Cambridge, MA, USA; Broad Institute of MIT and Harvard, Massachusetts Institute of Technology, Cambridge, MA, USA; McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA; Society of Fellows, Harvard University, Cambridge, MA 02138, USA
- Shiraz A Shah
  - Copenhagen Prospective Studies on Asthma in Childhood, Copenhagen University Hospital, Herlev-Gentofte, Ledreborg Allé 34, DK-2820 Gentofte, Denmark
- Adrián López-Beltrán
  - Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA), Madrid, Spain
- Alejandro González-Delgado
  - Department of Soil Microbiology and Symbiotic Systems, Estación Experimental del Zaidín, Consejo Superior de Investigaciones Científicas, Structure, Dynamics and Function of Rhizobacterial Genomes, Grupo de Ecología Genética de la Rizosfera, Spain
- Francisco Martínez-Abarca
  - Department of Soil Microbiology and Symbiotic Systems, Estación Experimental del Zaidín, Consejo Superior de Investigaciones Científicas, Structure, Dynamics and Function of Rhizobacterial Genomes, Grupo de Ecología Genética de la Rizosfera, Spain
- Jaime Iranzo
  - Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA), Madrid, Spain; Institute for Biocomputation and Physics of Complex Systems (BIFI), University of Zaragoza, Zaragoza, Spain
- Modesto Redrejo-Rodríguez
  - Departamento de Bioquímica, Universidad Autónoma de Madrid (UAM) and Instituto de Investigaciones Biomédicas Alberto Sols (CSIC-UAM), Madrid, Spain
- Feng Zhang
  - Howard Hughes Medical Institute, Cambridge, MA, USA; Broad Institute of MIT and Harvard, Massachusetts Institute of Technology, Cambridge, MA, USA; McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA; Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA; Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
- Nicolás Toro
  - Department of Soil Microbiology and Symbiotic Systems, Estación Experimental del Zaidín, Consejo Superior de Investigaciones Científicas, Structure, Dynamics and Function of Rhizobacterial Genomes, Grupo de Ecología Genética de la Rizosfera, Spain
|
27
|
Gray M, Chester S, Jabbari H. KnotAli: informed energy minimization through the use of evolutionary information. BMC Bioinformatics 2022; 23:159. [PMID: 35505276 PMCID: PMC9063079 DOI: 10.1186/s12859-022-04673-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2021] [Accepted: 04/05/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Improving the prediction of structures, especially those containing pseudoknots (structures with crossing base pairs), is an ongoing challenge. Homology-based methods utilize structural similarities within a family to predict the structure. However, their prediction is limited to the consensus structure, and by the quality of the alignment. Minimum free energy (MFE) based methods, on the other hand, do not rely on familial information and can predict structures of novel RNA molecules. Their prediction normally suffers from inaccuracies due to their underlying energy parameters. RESULTS: We present a new method for prediction of RNA pseudoknotted secondary structures that combines the strengths of MFE prediction and alignment-based methods. KnotAli takes a multiple RNA sequence alignment as input and uses covariation and thermodynamic energy minimization to predict possibly pseudoknotted secondary structures for each individual sequence in the alignment. We compared KnotAli's performance to that of three other alignment-based programs, two that can handle pseudoknotted structures and one control, on a large data set of 3034 RNA sequences with varying lengths and levels of sequence conservation from 10 families with pseudoknotted and pseudoknot-free reference structures. We produced sequence alignments for each family using two well-known sequence aligners (MUSCLE and MAFFT). CONCLUSIONS: We found KnotAli's performance to be superior in 6 of the 10 families for MUSCLE and 7 of the 10 for MAFFT. While both KnotAli and CaCoFold use background noise correction strategies, we found KnotAli's predictions to be less dependent on the alignment quality. KnotAli can be found online at the Zenodo image: https://doi.org/10.5281/zenodo.5794719.
Affiliation(s)
- Mateo Gray
  - Department of Computer Science, University of Victoria, Victoria, Canada
- Sean Chester
  - Department of Computer Science, University of Victoria, Victoria, Canada
- Hosna Jabbari
  - Department of Computer Science, University of Victoria, Victoria, Canada; Institute on Aging and Lifelong Health, University of Victoria, Victoria, Canada.
|
28
|
Abstract
Noncoding RNAs with secondary structures play important roles in CRISPR-Cas systems. Many of these structures likely remain undiscovered. We used a large-scale comparative genomics approach to predict 156 novel candidate structured RNAs from 36,111 CRISPR-Cas systems. A number of these were found to overlap with coding genes, including palindromic candidates that overlapped with a variety of Cas genes in type I and III systems. Among these 156 candidates, we identified 46 new models of CRISPR direct repeats and 1 tracrRNA. This tracrRNA model occasionally overlapped with predicted cas9 coding regions, emphasizing the importance of expanding our search windows for novel structured RNAs in coding regions. We also demonstrated that the antirepeat sequence in this tracrRNA model can be used to accurately assign thousands of predicted CRISPR arrays to type II-C systems. This study highlights the importance of unbiased identification of candidate structured RNAs across CRISPR-Cas systems.
Affiliation(s)
- Brayon J. Fremin
  - Department of Energy, Joint Genome Institute, Berkeley, CA, USA
  - Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Nikos C. Kyrpides
  - Department of Energy, Joint Genome Institute, Berkeley, CA, USA
  - Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
29
Prince S, Munoz C, Filion-Bienvenue F, Rioux P, Sarrasin M, Lang BF. Refining Mitochondrial Intron Classification With ERPIN: Identification Based on Conservation of Sequence Plus Secondary Structure Motifs. Front Microbiol 2022; 13:866187. [PMID: 35369492 PMCID: PMC8971849 DOI: 10.3389/fmicb.2022.866187] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 01/30/2022] [Accepted: 02/28/2022] [Indexed: 12/02/2022] Open
Abstract
Mitochondrial genomes, in particular those of fungi, often encode genes with a large number of Group I and Group II introns that are conserved at both the sequence and the RNA structure level. They provide a rich resource for the investigation of intron and gene structure, self- and protein-guided splicing mechanisms, and intron evolution. Yet, the degree of sequence conservation of introns is limited, and the primary sequence differs considerably among the distinct intron sub-groups. This makes intron identification, classification, structural modeling, and the inference of gene models a most challenging and error-prone task, frequently passed on to an “expert” for manual intervention. To reduce the need for manual curation of intron structures and mitochondrial gene models, computational methods using ERPIN sequence profiles were initially developed in 2007. Here we present a refinement of search models and alignments using the now abundant publicly available fungal mtDNA sequences. In addition, we have tested how far members of the originally proposed sub-groups are clearly distinguished and validated by our computational approach. We confirm clearly distinct mitochondrial Group I sub-groups IA1, IA3, IB3, IC1, IC2, and ID. Yet, the IB1, IB2, and IB4 ERPIN models overlap substantially in their predictions, and are therefore combined and reported as IB. We have further explored the conversion of our ERPIN profiles into covariance models (CM). Current limitations and prospects of the CM approach will be discussed.
30
Tompkins VS, Rouse WB, O’Leary CA, Andrews RJ, Moss WN. Analyses of human cancer driver genes uncovers evolutionarily conserved RNA structural elements involved in posttranscriptional control. PLoS One 2022; 17:e0264025. [PMID: 35213597 PMCID: PMC8880891 DOI: 10.1371/journal.pone.0264025] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 09/10/2021] [Accepted: 02/01/2022] [Indexed: 12/02/2022] Open
Abstract
Experimental breakthroughs have provided unprecedented insights into the genes involved in cancer. The identification of such cancer driver genes is a major step in gaining a fuller understanding of oncogenesis and provides novel lists of potential therapeutic targets. A key area that requires additional study is the posttranscriptional control mechanisms at work in cancer driver genes. This is important not only for basic insights into the biology of cancer, but also to advance new therapeutic modalities that target RNA—an emerging field with great promise toward the treatment of various cancers. In the current study we performed an in silico analysis on the transcripts associated with 800 cancer driver genes (10,390 unique transcripts) that identified 179,190 secondary structural motifs with evidence of evolutionarily ordered structures with unusual thermodynamic stability. Narrowing to one transcript per gene, 35,426 predicted structures were subjected to phylogenetic comparisons of sequence and structural conservation. This identified 7,001 RNA secondary structures embedded in transcripts with evidence of covariation between paired sites, supporting structure models and suggesting functional significance. A select set of seven structures were tested in vitro for their ability to regulate gene expression; all were found to have significant effects. These results indicate potentially widespread roles for RNA structure in posttranscriptional control of human cancer driver genes.
Affiliation(s)
- Van S. Tompkins
  - Roy J. Carver Department of Biophysics, Biochemistry and Molecular Biology, Iowa State University, Ames, IA, United States of America
- Warren B. Rouse
  - Roy J. Carver Department of Biophysics, Biochemistry and Molecular Biology, Iowa State University, Ames, IA, United States of America
- Collin A. O’Leary
  - Roy J. Carver Department of Biophysics, Biochemistry and Molecular Biology, Iowa State University, Ames, IA, United States of America
- Ryan J. Andrews
  - Roy J. Carver Department of Biophysics, Biochemistry and Molecular Biology, Iowa State University, Ames, IA, United States of America
- Walter N. Moss
  - Roy J. Carver Department of Biophysics, Biochemistry and Molecular Biology, Iowa State University, Ames, IA, United States of America
31
Soszynska-Jozwiak M, Ruszkowska A, Kierzek R, O’Leary CA, Moss WN, Kierzek E. Secondary Structure of Subgenomic RNA M of SARS-CoV-2. Viruses 2022; 14:322. [PMID: 35215915 PMCID: PMC8878378 DOI: 10.3390/v14020322] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/10/2021] [Revised: 01/25/2022] [Accepted: 01/31/2022] [Indexed: 02/06/2023] Open
Abstract
SARS-CoV-2 belongs to the Coronaviridae family. Like other coronaviruses, SARS-CoV-2 is enveloped and possesses a positive-sense, single-stranded RNA genome of ~30 kb. Genomic RNA is used as the template for replication and transcription. During these processes, positive-sense genomic RNA (gRNA) and subgenomic RNAs (sgRNAs) are created. Several studies have shown the importance of the genomic RNA secondary structure in SARS-CoV-2 replication. However, the structure of sgRNAs has remained largely unresolved so far. In this study, we probed the sgRNA M model of SARS-CoV-2 in vitro. The presented model molecule includes the 5' UTR and the coding sequence of gene M. This is the first experimentally informed secondary structure model of sgRNA M, which presents features likely to be important in sgRNA M function. The knowledge of sgRNA M structure provides insights to better understand virus biology and could be used for designing new therapeutics.
Affiliation(s)
- Marta Soszynska-Jozwiak
  - Institute of Bioorganic Chemistry, Polish Academy of Sciences, 61-704 Poznan, Poland
- Agnieszka Ruszkowska
  - Institute of Bioorganic Chemistry, Polish Academy of Sciences, 61-704 Poznan, Poland
- Ryszard Kierzek
  - Institute of Bioorganic Chemistry, Polish Academy of Sciences, 61-704 Poznan, Poland
- Collin A. O’Leary
  - Roy J. Carver Department of Biophysics, Biochemistry and Molecular Biology, Iowa State University, Ames, IA 50011, USA
- Walter N. Moss
  - Roy J. Carver Department of Biophysics, Biochemistry and Molecular Biology, Iowa State University, Ames, IA 50011, USA
- Elzbieta Kierzek
  - Institute of Bioorganic Chemistry, Polish Academy of Sciences, 61-704 Poznan, Poland
32
Peterson JM, O'Leary CA, Moss WN. In silico analysis of local RNA secondary structure in influenza virus A, B and C finds evidence of widespread ordered stability but little evidence of significant covariation. Sci Rep 2022; 12:310. [PMID: 35013354 PMCID: PMC8748542 DOI: 10.1038/s41598-021-03767-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/24/2021] [Accepted: 12/02/2021] [Indexed: 12/13/2022] Open
Abstract
Influenza virus is a persistent threat to human health; indeed, the deadliest modern pandemic was in 1918 when an H1N1 virus killed an estimated 50 million people globally. The intent of this work is to better understand influenza from an RNA-centric perspective to provide local, structural motifs with likely significance to the influenza infectious cycle for therapeutic targeting. To accomplish this, we analyzed over four hundred thousand RNA sequences spanning three major clades: influenza A, B and C. We scanned influenza segments for local secondary structure, identified/modeled motifs of likely functionality, and coupled the results to an analysis of evolutionary conservation. We discovered 185 significant regions of predicted ordered stability, yet evidence of sequence covariation was limited to 7 motifs, where 3 (found in influenza C) had higher than expected amounts of sequence covariation.
Affiliation(s)
- Jake M Peterson
  - Roy J. Carver Department of Biophysics, Biochemistry and Molecular Biology, Iowa State University, Ames, IA, 50011, USA
- Collin A O'Leary
  - Roy J. Carver Department of Biophysics, Biochemistry and Molecular Biology, Iowa State University, Ames, IA, 50011, USA
- Walter N Moss
  - Roy J. Carver Department of Biophysics, Biochemistry and Molecular Biology, Iowa State University, Ames, IA, 50011, USA
33
Dingle K, Ghaddar F, Šulc P, Louis AA. Phenotype Bias Determines How Natural RNA Structures Occupy the Morphospace of All Possible Shapes. Mol Biol Evol 2022; 39:msab280. [PMID: 34542628 PMCID: PMC8763027 DOI: 10.1093/molbev/msab280] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Indexed: 12/15/2022] Open
Abstract
Morphospaces, representations of phenotypic characteristics, are often populated unevenly, leaving large parts unoccupied. Such patterns are typically ascribed to contingency, or else to natural selection disfavoring certain parts of the morphospace. The extent to which developmental bias, the tendency of certain phenotypes to preferentially appear as potential variation, also explains these patterns is hotly debated. Here we demonstrate quantitatively that developmental bias is the primary explanation for the occupation of the morphospace of RNA secondary structure (SS) shapes. Upon random mutations, some RNA SS shapes (the frequent ones) are much more likely to appear than others. By using the RNAshapes method to define coarse-grained SS classes, we can directly compare the frequencies with which noncoding RNA SS shapes appear in the RNAcentral database to frequencies obtained upon a random sampling of sequences. We show that: 1) only the most frequent structures appear in nature; the vast majority of possible structures in the morphospace have not yet been explored; 2) remarkably small numbers of random sequences are needed to produce all the RNA SS shapes found in nature so far; and 3) perhaps most surprisingly, the natural frequencies are accurately predicted, over several orders of magnitude in variation, by the likelihood that structures appear upon a uniform random sampling of sequences. The ultimate cause of these patterns is not natural selection, but rather a strong phenotype bias in the RNA genotype-phenotype map, a type of developmental bias or "findability constraint," which limits evolutionary dynamics to a hugely reduced subset of structures that are easy to "find."
Affiliation(s)
- Kamaludin Dingle
  - Centre for Applied Mathematics and Bioinformatics, Department of Mathematics and Natural Sciences, Gulf University for Science and Technology, Hawally, Kuwait
- Fatme Ghaddar
  - Centre for Applied Mathematics and Bioinformatics, Department of Mathematics and Natural Sciences, Gulf University for Science and Technology, Hawally, Kuwait
- Petr Šulc
  - School of Molecular Sciences and Center for Molecular Design and Biomimetics at the Biodesign Institute, Arizona State University, Tempe, AZ, USA
- Ard A Louis
  - Rudolf Peierls Centre for Theoretical Physics, University of Oxford, Oxford, United Kingdom
34
Zhao J, Kennedy SD, Turner DH. Nuclear Magnetic Resonance Spectra and AMBER OL3 and ROC-RNA Simulations of UCUCGU Reveal Force Field Strengths and Weaknesses for Single-Stranded RNA. J Chem Theory Comput 2022; 18:1241-1254. [PMID: 34990548 DOI: 10.1021/acs.jctc.1c00643] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Indexed: 12/16/2022]
Abstract
Single-stranded regions of RNA are important for folding of sequences into 3D structures and for design of therapeutics targeting RNA. Prediction of ensembles of 3D structures for single-stranded regions often involves classical mechanical approximations of interactions defined by quantum mechanical calculations on small model systems. Nuclear magnetic resonance (NMR) spectra and molecular dynamics (MD) simulations of short single strands provide tests for how well the approximations model many of the interactions. Here, the NMR spectra for UCUCGU at 2, 15, and 30 °C are compared to simulations with the AMBER force fields, OL3 and ROC-RNA. This is the first such comparison to an oligoribonucleotide containing an internal guanosine nucleotide (G). G is particularly interesting because of its many H-bonding groups, large dipole moment, and proclivity for both syn and anti conformations. Results reveal formation of a G amino to phosphate non-bridging oxygen H-bond. The results also demonstrate dramatic differences in details of the predicted structures. The variations emphasize the dependence of predictions on individual parameters and their balance with the rest of the force field. The NMR data can serve as a benchmark for future force fields.
35
Seemann SE, Mirza AH, Bang-Berthelsen CH, Garde C, Christensen-Dalsgaard M, Workman CT, Pociot F, Tommerup N, Gorodkin J, Ruzzo WL. OUP accepted manuscript. Nucleic Acids Res 2022; 50:2452-2463. [PMID: 35188540 PMCID: PMC8934657 DOI: 10.1093/nar/gkac067] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2021] [Revised: 01/07/2022] [Accepted: 01/25/2022] [Indexed: 12/01/2022] Open
Abstract
Accelerated evolution of any portion of the genome is of significant interest, potentially signaling positive selection of phenotypic traits and adaptation. Accelerated evolution remains understudied for structured RNAs, despite the fact that an RNA’s structure is often key to its function. RNA structures are typically characterized by compensatory (structure-preserving) basepair changes that are unexpected given the underlying sequence variation, i.e., they have evolved through negative selection on structure. We address the question of how fast the primary sequence of an RNA can change through evolution while conserving its structure. Specifically, we consider predicted and known structures in vertebrate genomes. After careful control of false discovery rates, we obtain 13 de novo structures (and three known Rfam structures) that we predict to have rapidly evolving sequences—defined as structures where the primary sequences of human and mouse have diverged at least twice as fast (1.5 times for Rfam) as nearby neutrally evolving sequences. Two of the three known structures function in translation inhibition related to infection and immune response. We conclude that rapid sequence divergence does not preclude RNA structure conservation in vertebrates, although these events are relatively rare.
Affiliation(s)
- Aashiq H Mirza
  - Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, Denmark
  - Steno Diabetes Center Copenhagen, Gentofte, Denmark
- Claus H Bang-Berthelsen
  - Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, Denmark
  - National Food Institute, Technical University of Denmark, Kgs. Lyngby, Denmark
- Christian Garde
  - Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, Denmark
- Christopher T Workman
  - Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, Denmark
  - Center for Biological Sequence Analysis, Technical University of Denmark, Denmark
- Flemming Pociot
  - Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, Denmark
  - Steno Diabetes Center Copenhagen, Gentofte, Denmark
- Niels Tommerup
  - Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, Denmark
  - Department of Cellular and Molecular Medicine (ICMM), University of Copenhagen, Denmark
- Jan Gorodkin
  - Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, Denmark
  - Department of Veterinary and Animal Sciences, University of Copenhagen, Denmark
- Walter L Ruzzo
  - Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, Denmark
  - Computer Science and Engineering and Genome Sciences, University of Washington, USA
  - Fred Hutchinson Cancer Research Center, Seattle, USA
36
Chen L, Zhu QH. The evolutionary landscape and expression pattern of plant lincRNAs. RNA Biol 2022; 19:1190-1207. [PMID: 36382947 PMCID: PMC9673970 DOI: 10.1080/15476286.2022.2144609] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022] Open
Abstract
Long intergenic non-coding RNAs (lincRNAs) are important regulators of cellular processes, including development and stress response. Many lincRNAs have been bioinformatically identified in plants, but their evolutionary dynamics and expression characteristics are still elusive. Here, we systematically identified thousands of lincRNAs in 26 plant species, including 6 non-flowering plants, investigated the conservation of the identified lincRNAs at different levels of plant lineages based on sequence and/or synteny homology, and explored characteristics of the conserved lincRNAs during plant evolution and their co-expression relationship with protein-coding genes (PCGs). In addition to confirming the features well documented in the literature for lincRNAs, such as species specificity, fewer exons, tissue-specific expression patterns and lower expression levels, we revealed that histone modification signals and/or binding sites of transcription factors were enriched in the conserved lincRNAs, implying their biological functionality, as demonstrated by identifying conserved lincRNAs related to flower development in both the Brassicaceae and grass families and ancient lincRNAs potentially functioning in meristem development of non-flowering plants. Compared to PCGs, lincRNAs are more likely to be associated with transposable elements (TEs), but with different characteristics in different evolutionary lineages, for instance, the types of TEs and the variable level of association in lincRNAs with different levels of conservation. Together, these results provide a comprehensive view of the evolutionary landscape of plant lincRNAs and offer new insights into the conservation and functionality of plant lincRNAs.
Affiliation(s)
- Li Chen
  - School of Life Sciences, Westlake University, Hangzhou, China
  - Institute for Biology, Plant Cell and Molecular Biology, Humboldt-Universität Zu Berlin, Berlin, Germany
- Qian-Hao Zhu
  - CSIRO Agriculture and Food, Canberra, ACT 2601, Australia
37
Li S, Zhang H, Zhang L, Liu K, Liu B, Mathews DH, Huang L. LinearTurboFold: Linear-time global prediction of conserved structures for RNA homologs with applications to SARS-CoV-2. Proc Natl Acad Sci U S A 2021; 118:e2116269118. [PMID: 34887342 PMCID: PMC8719904 DOI: 10.1073/pnas.2116269118] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Accepted: 11/05/2021] [Indexed: 12/26/2022] Open
Abstract
The constant emergence of COVID-19 variants reduces the effectiveness of existing vaccines and test kits. Therefore, it is critical to identify conserved structures in severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genomes as potential targets for variant-proof diagnostics and therapeutics. However, the algorithms to predict these conserved structures, which simultaneously fold and align multiple RNA homologs, scale at best cubically with sequence length and are thus infeasible for coronaviruses, which possess the longest genomes (∼30,000 nt) among RNA viruses. As a result, existing efforts on modeling SARS-CoV-2 structures resort to single-sequence folding as well as local folding methods with short window sizes, which inevitably neglect long-range interactions that are crucial in RNA functions. Here we present LinearTurboFold, an efficient algorithm for folding RNA homologs that scales linearly with sequence length, enabling unprecedented global structural analysis on SARS-CoV-2. Surprisingly, on a group of SARS-CoV-2 and SARS-related genomes, LinearTurboFold's purely in silico prediction not only is close to experimentally guided models for local structures, but also goes far beyond them by capturing the end-to-end pairs between 5' and 3' untranslated regions (UTRs) (∼29,800 nt apart) that match perfectly with a purely experimental work. Furthermore, LinearTurboFold identifies undiscovered conserved structures and conserved accessible regions as potential targets for designing efficient and mutation-insensitive small-molecule drugs, antisense oligonucleotides, small interfering RNAs (siRNAs), CRISPR-Cas13 guide RNAs, and RT-PCR primers. LinearTurboFold is a general technique that can also be applied to other RNA viruses and full-length genome studies and will be a useful tool in fighting the current and future pandemics.
Affiliation(s)
- Sizhen Li
  - School of Electrical Engineering & Computer Science, Oregon State University, Corvallis, OR 97331
- He Zhang
  - Baidu Research, Sunnyvale, CA 94089
  - School of Electrical Engineering & Computer Science, Oregon State University, Corvallis, OR 97331
- Liang Zhang
  - School of Electrical Engineering & Computer Science, Oregon State University, Corvallis, OR 97331
  - Baidu Research, Sunnyvale, CA 94089
- Kaibo Liu
  - Baidu Research, Sunnyvale, CA 94089
  - School of Electrical Engineering & Computer Science, Oregon State University, Corvallis, OR 97331
- David H Mathews
  - Department of Biochemistry & Biophysics, University of Rochester Medical Center, Rochester, NY 14642
  - Center for RNA Biology, University of Rochester Medical Center, Rochester, NY 14642
  - Department of Biostatistics & Computational Biology, University of Rochester Medical Center, Rochester, NY 14642
- Liang Huang
  - School of Electrical Engineering & Computer Science, Oregon State University, Corvallis, OR 97331
  - Baidu Research, Sunnyvale, CA 94089
38
Soszynska-Jozwiak M, Pszczola M, Piasecka J, Peterson JM, Moss WN, Taras-Goslinska K, Kierzek R, Kierzek E. Universal and strain specific structure features of segment 8 genomic RNA of influenza A virus-application of 4-thiouridine photocrosslinking. J Biol Chem 2021; 297:101245. [PMID: 34688660 PMCID: PMC8666676 DOI: 10.1016/j.jbc.2021.101245] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/30/2021] [Revised: 09/22/2021] [Accepted: 09/23/2021] [Indexed: 11/24/2022] Open
Abstract
RNA structure in the influenza A virus (IAV) has been the focus of several studies that have shown connections between conserved secondary structure motifs and their biological function in the virus replication cycle. Questions have arisen on how to best recognize and understand the pandemic properties of IAV strains from an RNA perspective, but determination of the RNA secondary structure has been challenging. Herein, we used chemical mapping to determine the secondary structure of segment 8 viral RNA (vRNA) of the pandemic A/California/04/2009 (H1N1) strain of IAV. Additionally, this long, naturally occurring RNA served as a model to evaluate RNA mapping with 4-thiouridine (4sU) crosslinking. We explored 4-thiouridine as a probe of nucleotides in close proximity, through its incorporation into newly transcribed RNA and subsequent photoactivation. RNA secondary structural features both universal to type A strains and unique to the A/California/04/2009 (H1N1) strain were recognized. 4sU mapping confirmed and facilitated RNA structure prediction, according to several rules: 4sU photocross-linking forms efficiently in the double-stranded region of RNA with some flexibility, in the ends of helices, and across bulges and loops when their structural mobility is permitted. This method highlighted three-dimensional properties of segment 8 vRNA secondary structure motifs and allowed us to propose several long-range three-dimensional interactions. 4sU mapping combined with chemical mapping and bioinformatic analysis could be used to enhance RNA structure determination as well as recognition of target regions for antisense strategies or viral RNA detection.
Affiliation(s)
- Maciej Pszczola
  - Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland
- Julita Piasecka
  - Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland
- Jake M Peterson
  - Roy J. Carver Department of Biophysics, Biochemistry and Molecular Biology, Iowa State University, Ames, Iowa, USA
- Walter N Moss
  - Roy J. Carver Department of Biophysics, Biochemistry and Molecular Biology, Iowa State University, Ames, Iowa, USA
- Ryszard Kierzek
  - Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland
- Elzbieta Kierzek
  - Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland
39
Bonilla SL, Sherlock ME, MacFadden A, Kieft JS. A viral RNA hijacks host machinery using dynamic conformational changes of a tRNA-like structure. Science 2021; 374:955-960. [PMID: 34793227 PMCID: PMC9033304 DOI: 10.1126/science.abe8526] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Indexed: 12/29/2022]
Abstract
Viruses require multifunctional structured RNAs to hijack their host’s biochemistry, but their mechanisms can be obscured by the difficulty of solving conformationally dynamic RNA structures. Using cryo–electron microscopy (cryo-EM), we visualized the structure of the mysterious viral transfer RNA (tRNA)–like structure (TLS) from the brome mosaic virus, which affects replication, translation, and genome encapsidation. Structures in isolation and those bound to tyrosyl-tRNA synthetase (TyrRS) show that this ~55-kilodalton purported tRNA mimic undergoes large conformational rearrangements to bind TyrRS in a form that differs substantially from that of tRNA. Our study reveals how viral RNAs can use a combination of static and dynamic RNA structures to bind host machinery through highly noncanonical interactions, and we highlight the utility of cryo-EM for visualizing small, conformationally dynamic structured RNAs.
Affiliation(s)
- Steve L. Bonilla
  - Department of Biochemistry and Molecular Genetics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Madeline E. Sherlock
  - Department of Biochemistry and Molecular Genetics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Andrea MacFadden
  - Department of Biochemistry and Molecular Genetics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Jeffrey S. Kieft
  - Department of Biochemistry and Molecular Genetics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
  - RNA BioScience Initiative, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
40
Li S, Zhang H, Zhang L, Liu K, Liu B, Mathews DH, Huang L. LinearTurboFold: Linear-Time Global Prediction of Conserved Structures for RNA Homologs with Applications to SARS-CoV-2. bioRxiv 2021:2020.11.23.393488. [PMID: 34816262 PMCID: PMC8609897 DOI: 10.1101/2020.11.23.393488] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/20/2022]
Abstract
The constant emergence of COVID-19 variants reduces the effectiveness of existing vaccines and test kits. Therefore, it is critical to identify conserved structures in SARS-CoV-2 genomes as potential targets for variant-proof diagnostics and therapeutics. However, the algorithms to predict these conserved structures, which simultaneously fold and align multiple RNA homologs, scale at best cubically with sequence length, and are thus infeasible for coronaviruses, which possess the longest genomes (∼30,000 nt) among RNA viruses. As a result, existing efforts on modeling SARS-CoV-2 structures resort to single sequence folding as well as local folding methods with short window sizes, which inevitably neglect long-range interactions that are crucial in RNA functions. Here we present LinearTurboFold, an efficient algorithm for folding RNA homologs that scales linearly with sequence length, enabling unprecedented global structural analysis on SARS-CoV-2. Surprisingly, on a group of SARS-CoV-2 and SARS-related genomes, LinearTurboFold's purely in silico prediction not only is close to experimentally-guided models for local structures, but also goes far beyond them by capturing the end-to-end pairs between 5' and 3' UTRs (∼29,800 nt apart) that match perfectly with a purely experimental work. Furthermore, LinearTurboFold identifies novel conserved structures and conserved accessible regions as potential targets for designing efficient and mutation-insensitive small-molecule drugs, antisense oligonucleotides, siRNAs, CRISPR-Cas13 guide RNAs and RT-PCR primers. LinearTurboFold is a general technique that can also be applied to other RNA viruses and full-length genome studies, and will be a useful tool in fighting the current and future pandemics.
SIGNIFICANCE STATEMENT: Conserved RNA structures are critical for designing diagnostic and therapeutic tools for many diseases including COVID-19. However, existing algorithms are much too slow to model the global structures of full-length RNA viral genomes. We present LinearTurboFold, a linear-time algorithm that is orders of magnitude faster, making it the first method to simultaneously fold and align whole genomes of SARS-CoV-2 variants, the longest known RNA virus (∼30 kilobases). Our work enables unprecedented global structural analysis and captures long-range interactions that are out of reach for existing algorithms but crucial for RNA functions. LinearTurboFold is a general technique for full-length genome studies and can help fight the current and future pandemics.
Affiliation(s)
- Sizhen Li
  - School of Electrical Engineering & Computer Science, Oregon State University, Corvallis, OR
- He Zhang
  - School of Electrical Engineering & Computer Science, Oregon State University, Corvallis, OR
  - Baidu Research, Sunnyvale, CA
- Liang Zhang
  - School of Electrical Engineering & Computer Science, Oregon State University, Corvallis, OR
  - Baidu Research, Sunnyvale, CA
- Kaibo Liu
  - School of Electrical Engineering & Computer Science, Oregon State University, Corvallis, OR
  - Baidu Research, Sunnyvale, CA
- David H. Mathews
  - Department of Biochemistry & Biophysics, Center for RNA Biology, and Department of Biostatistics & Computational Biology, University of Rochester Medical Center, Rochester, NY
- Liang Huang
  - School of Electrical Engineering & Computer Science, Oregon State University, Corvallis, OR
  - Baidu Research, Sunnyvale, CA
|
41
|
Gao W, Jones TA, Rivas E. Discovery of 17 conserved structural RNAs in fungi. Nucleic Acids Res 2021; 49:6128-6143. [PMID: 34086938 PMCID: PMC8216456 DOI: 10.1093/nar/gkab355] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 02/03/2021] [Revised: 03/25/2021] [Accepted: 04/21/2021] [Indexed: 11/13/2022] Open
Abstract
Many non-coding RNAs with known functions are structurally conserved: their intramolecular secondary and tertiary interactions are maintained across evolutionary time. Consequently, the presence of conserved structure in multiple sequence alignments can be used to identify candidate functional non-coding RNAs. Here, we present a bioinformatics method that couples iterative homology search with covariation analysis to assess whether a genomic region has evidence of conserved RNA structure. We used this method to examine all unannotated regions of five well-studied fungal genomes (Saccharomyces cerevisiae, Candida albicans, Neurospora crassa, Aspergillus fumigatus, and Schizosaccharomyces pombe). We identified 17 novel structurally conserved non-coding RNA candidates, which include four H/ACA box small nucleolar RNAs, four intergenic RNAs and nine RNA structures located within the introns and untranslated regions (UTRs) of mRNAs. For the two structures in the 3' UTRs of the metabolic genes GLY1 and MET13, we performed experiments that provide evidence against them being eukaryotic riboswitches.
Affiliation(s)
- William Gao
  - Department of Molecular and Cellular Biology, Harvard University, Cambridge, USA
- Thomas A Jones
  - Department of Molecular and Cellular Biology, Harvard University, Cambridge, USA
  - Howard Hughes Medical Institute, Harvard University, Cambridge, USA
- Elena Rivas
  - Department of Molecular and Cellular Biology, Harvard University, Cambridge, USA
|
42
|
Chen SC, Olsthoorn RCL, Yu CH. Structural phylogenetic analysis reveals lineage-specific RNA repetitive structural motifs in all coronaviruses and associated variations in SARS-CoV-2. Virus Evol 2021; 7:veab021. [PMID: 34141447 PMCID: PMC8206606 DOI: 10.1093/ve/veab021] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 12/23/2022] Open
Abstract
In many single-stranded (ss) RNA viruses, the cis-acting packaging signal that confers selectivity of genome packaging usually encompasses short structured RNA repeats. These structural units, termed repetitive structural motifs (RSMs), potentially mediate capsid assembly by specific RNA–protein interactions. However, general knowledge of the conservation and/or the diversity of RSMs in the positive-sense ssRNA coronaviruses (CoVs) is limited. By performing structural phylogenetic analysis, we identified a variety of RSMs in nearly all CoV genomic RNAs, which are exclusively located in the 5′-untranslated regions (UTRs) and/or in the inter-domain regions of poly-protein 1ab coding sequences in a lineage-specific manner. In all alpha- and beta-CoVs, except for Embecovirus spp., two to four copies of 5′-gUUYCGUc-3′ RSMs displaying conserved hexa-loop sequences were generally identified in Stem-loop 5 (SL5) located in the 5′-UTRs of genomic RNAs. In Embecovirus spp., however, two to eight copies of 5′-agc-3′/guAAu RSMs were found in the coding regions of non-structural protein (NSP) 3 and/or NSP15 in open reading frame (ORF) 1ab. In gamma- and delta-CoVs, other types of RSMs were found in several clustered structural elements in 5′-UTRs and/or ORF1ab. The identification of RSM-encompassing structural elements in all CoVs suggests that these RNA elements play fundamental roles in the life cycle of CoVs. In the recently emerged SARS-CoV-2, beta-CoV-specific RSMs are also found in its SL5, displaying two copies of 5′-gUUUCGUc-3′ motifs. However, multiple sequence alignment reveals that the majority of SARS-CoV-2 possesses a variant RSM harboring SL5b C241U, and intriguingly, several variations in the coding sequences of viral proteins, such as Nsp12 P323L, S protein D614G, and N protein R203K-G204R, are concurrently found with such variant RSM. In conclusion, the comprehensive exploration of RSMs reveals phylogenetic insights into the RNA structural elements in CoVs as a whole and provides a new perspective on variations currently found in SARS-CoV-2.
Affiliation(s)
- Shih-Cheng Chen
  - Department of Biochemistry and Molecular Biology, College of Medicine, National Cheng-Kung University, No.1, University Road, Tainan City 701, Taiwan
- René C L Olsthoorn
  - Department of Supramolecular Biomaterials Chemistry, Leiden Institute of Chemistry, Gorlaeus Laboratories, Leiden University, Einsteinweg 55, 2333 CC, Leiden, The Netherlands
- Chien-Hung Yu
  - Department of Biochemistry and Molecular Biology, College of Medicine, National Cheng-Kung University, No.1, University Road, Tainan City 701, Taiwan
|
43
|
Multi-omics annotation of human long non-coding RNAs. Biochem Soc Trans 2021; 48:1545-1556. [PMID: 32756901 DOI: 10.1042/bst20191063] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 05/14/2020] [Revised: 07/05/2020] [Accepted: 07/07/2020] [Indexed: 12/12/2022]
Abstract
LncRNAs (long non-coding RNAs) are pervasively transcribed in the human genome and also extensively involved in a variety of essential biological processes and human diseases. The comprehensive annotation of human lncRNAs is of great significance in navigating the functional landscape of the human genome and deepening the understanding of the multi-featured RNA world. However, the unique characteristics of lncRNAs as well as their enormous quantity have complicated and challenged the annotation of lncRNAs. Advances in high-throughput sequencing technologies give rise to a large volume of omics data that are generated at an unprecedented rate and scale, providing possibilities in the identification, characterization and functional annotation of lncRNAs. Here, we review the recent important discoveries of human lncRNAs through analysis of various omics data and summarize specialized lncRNA database resources. Moreover, we highlight the multi-omics integrative analysis as a powerful strategy to efficiently discover and characterize the functional lncRNAs and elucidate their potential molecular mechanisms.
|
44
|
Andrews RJ, O’Leary CA, Tompkins VS, Peterson JM, Haniff H, Williams C, Disney MD, Moss WN. A map of the SARS-CoV-2 RNA structurome. NAR Genom Bioinform 2021; 3:lqab043. [PMID: 34046592 PMCID: PMC8140738 DOI: 10.1093/nargab/lqab043] [Citation(s) in RCA: 36] [Impact Index Per Article: 12.0] [Received: 02/17/2021] [Revised: 04/06/2021] [Accepted: 04/28/2021] [Indexed: 12/11/2022] Open
Abstract
SARS-CoV-2 has exploded throughout the human population. To facilitate efforts to gain insights into SARS-CoV-2 biology and to target the virus therapeutically, it is essential to have a roadmap of likely functional regions embedded in its RNA genome. In this report, we used a bioinformatics approach, ScanFold, to deduce the local RNA structural landscape of the SARS-CoV-2 genome with the highest likelihood of being functional. We recapitulate previously-known elements of RNA structure and provide a model for the folding of an essential frameshift signal. Our results find that SARS-CoV-2 is greatly enriched in unusually stable and likely evolutionarily ordered RNA structure, which provides a large reservoir of potential drug targets for RNA-binding small molecules. Results are enhanced via the re-analyses of publicly-available genome-wide biochemical structure probing datasets that are broadly in agreement with our models. Additionally, ScanFold was updated to incorporate experimental data as constraints in the analysis to facilitate comparisons between ScanFold and other RNA modelling approaches. Ultimately, ScanFold was able to identify eight highly structured/conserved motifs in SARS-CoV-2 that agree with experimental data, without explicitly using these data. All results are made available via a public database (the RNAStructuromeDB: https://structurome.bb.iastate.edu/sars-cov-2) and model comparisons are readily viewable at https://structurome.bb.iastate.edu/sars-cov-2-global-model-comparisons.
Affiliation(s)
- Ryan J Andrews
  - Roy J. Carver Department of Biophysics, Biochemistry and Molecular Biology, Iowa State University, Ames, IA 50011, USA
- Collin A O’Leary
  - Roy J. Carver Department of Biophysics, Biochemistry and Molecular Biology, Iowa State University, Ames, IA 50011, USA
- Van S Tompkins
  - Roy J. Carver Department of Biophysics, Biochemistry and Molecular Biology, Iowa State University, Ames, IA 50011, USA
- Jake M Peterson
  - Roy J. Carver Department of Biophysics, Biochemistry and Molecular Biology, Iowa State University, Ames, IA 50011, USA
- Hafeez S Haniff
  - Department of Chemistry, The Scripps Research Institute, Jupiter, FL 33458, USA
- Matthew D Disney
  - Department of Chemistry, The Scripps Research Institute, Jupiter, FL 33458, USA
- Walter N Moss
  - Roy J. Carver Department of Biophysics, Biochemistry and Molecular Biology, Iowa State University, Ames, IA 50011, USA
|
45
|
Langeberg CJ, Sherlock ME, MacFadden A, Kieft JS. An expanded class of histidine-accepting viral tRNA-like structures. RNA (New York, N.Y.) 2021; 27:653-664. [PMID: 33811147 PMCID: PMC8127992 DOI: 10.1261/rna.078550.120] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/03/2020] [Accepted: 03/30/2021] [Indexed: 05/12/2023]
Abstract
Structured RNA elements are common in the genomes of RNA viruses, often playing critical roles during viral infection. Some viral RNA elements use forms of tRNA mimicry, but the diverse ways this mimicry can be achieved are poorly understood. Histidine-accepting tRNA-like structures (TLSHis) are examples found at the 3' termini of some positive-sense single-stranded RNA (+ssRNA) viruses, where they interact with several host proteins, induce histidylation of the RNA genome, and facilitate processes important for infection, including genome replication. As only five TLSHis examples had been reported, we explored the possible larger phylogenetic distribution and diversity of this TLS class using bioinformatic approaches. We identified many new examples of TLSHis, yielding a rigorous consensus sequence and secondary structure model that we validated by chemical probing of representative TLSHis RNAs. We confirmed new examples as authentic TLSHis by demonstrating their ability to be histidylated in vitro, then used mutational analyses to infer a tertiary interaction that is likely analogous to the D- and T-loop interaction found in canonical tRNAs. These results expand our understanding of how diverse RNA sequences achieve tRNA-like structure and function in the context of viral RNA genomes and lay the groundwork for high-resolution structural studies of tRNA mimicry by histidine-accepting TLSs.
Affiliation(s)
- Conner J Langeberg
  - Department of Biochemistry and Molecular Genetics, University of Colorado Denver School of Medicine, Aurora, Colorado 80045, USA
- Madeline E Sherlock
  - Department of Biochemistry and Molecular Genetics, University of Colorado Denver School of Medicine, Aurora, Colorado 80045, USA
- Andrea MacFadden
  - Department of Biochemistry and Molecular Genetics, University of Colorado Denver School of Medicine, Aurora, Colorado 80045, USA
- Jeffrey S Kieft
  - Department of Biochemistry and Molecular Genetics, University of Colorado Denver School of Medicine, Aurora, Colorado 80045, USA
  - RNA BioScience Initiative, University of Colorado Denver School of Medicine, Aurora, Colorado 80045, USA
|
46
|
Conserved long-range base pairings are associated with pre-mRNA processing of human genes. Nat Commun 2021; 12:2300. [PMID: 33863890 PMCID: PMC8052449 DOI: 10.1038/s41467-021-22549-7] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 05/13/2020] [Accepted: 03/20/2021] [Indexed: 02/07/2023] Open
Abstract
The ability of nucleic acids to form double-stranded structures is essential for all living systems on Earth. Current knowledge on functional RNA structures is focused on locally-occurring base pairs. However, crosslinking and proximity ligation experiments demonstrated that long-range RNA structures are highly abundant. Here, we present the most complete to-date catalog of conserved complementary regions (PCCRs) in human protein-coding genes. PCCRs tend to occur within introns, suppress intervening exons, and obstruct cryptic and inactive splice sites. Double-stranded structure of PCCRs is supported by decreased icSHAPE nucleotide accessibility, high abundance of RNA editing sites, and frequent occurrence of forked eCLIP peaks. Introns with PCCRs show a distinct splicing pattern in response to RNAPII slowdown suggesting that splicing is widely affected by co-transcriptional RNA folding. The enrichment of 3'-ends within PCCRs raises the intriguing hypothesis that coupling between RNA folding and splicing could mediate co-transcriptional suppression of premature pre-mRNA cleavage and polyadenylation.
|
47
|
Fremin BJ, Bhatt AS. Comparative genomics identifies thousands of candidate structured RNAs in human microbiomes. Genome Biol 2021; 22:100. [PMID: 33845850 PMCID: PMC8040213 DOI: 10.1186/s13059-021-02319-w] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 03/27/2020] [Accepted: 03/19/2021] [Indexed: 02/02/2023] Open
Abstract
BACKGROUND Structured RNAs play varied bioregulatory roles within microbes. To date, hundreds of candidate structured RNAs have been predicted using informatic approaches that search for motif structures in genomic sequence data. The human microbiome contains thousands of species and strains of microbes. Yet, much of the metagenomic data from the human microbiome remains unmined for structured RNA motifs primarily due to computational limitations. RESULTS We sought to apply a large-scale, comparative genomics approach to these organisms to identify candidate structured RNAs. With a carefully constructed, though computationally intensive automated analysis, we identify 3161 conserved candidate structured RNAs in intergenic regions, as well as 2022 additional candidate structured RNAs that may overlap coding regions. We validate the RNA expression of 177 of these candidate structures by analyzing small fragment RNA-seq data from four human fecal samples. CONCLUSIONS This approach identifies a wide variety of candidate structured RNAs, including tmRNAs, antitoxins, and likely ribosome protein leaders, from a wide variety of taxa. Overall, our pipeline enables conservative predictions of thousands of novel candidate structured RNAs from human microbiomes.
Affiliation(s)
- Brayon J Fremin
  - Department of Genetics, Stanford University, Stanford, CA, 94305, USA
- Ami S Bhatt
  - Department of Genetics, Stanford University, Stanford, CA, 94305, USA
  - Department of Medicine (Hematology), Stanford University, Stanford, CA, 94305, USA
|
48
|
Functional and structural basis of extreme conservation in vertebrate 5' untranslated regions. Nat Genet 2021; 53:729-741. [PMID: 33821006 PMCID: PMC8825242 DOI: 10.1038/s41588-021-00830-1] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 07/10/2020] [Accepted: 02/26/2021] [Indexed: 01/07/2023]
Abstract
The lack of knowledge about extreme conservation in genomes remains a major gap in our understanding of the evolution of gene regulation. Here, we reveal an unexpected role of extremely conserved 5' untranslated regions (UTRs) in noncanonical translational regulation that is linked to the emergence of essential developmental features in vertebrate species. Endogenous deletion of conserved elements within these 5' UTRs decreased gene expression, and extremely conserved 5' UTRs possess cis-regulatory elements that promote cell-type-specific regulation of translation. We further developed in-cell mutate-and-map (icM2), a new methodology that maps RNA structure inside cells. Using icM2, we determined that an extremely conserved 5' UTR encodes multiple alternative structures and that each single nucleotide within the conserved element maintains the balance of alternative structures important to control the dynamic range of protein expression. These results explain how extreme sequence conservation can lead to RNA-level biological functions encoded in the untranslated regions of vertebrate genomes.
|
49
|
Rivas E. Evolutionary conservation of RNA sequence and structure. Wiley Interdisciplinary Reviews-RNA 2021; 12:e1649. [PMID: 33754485 PMCID: PMC8250186 DOI: 10.1002/wrna.1649] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 12/17/2020] [Revised: 02/24/2021] [Accepted: 02/25/2021] [Indexed: 12/22/2022]
Abstract
An RNA structure prediction from a single-sequence RNA folding program is not evidence for an RNA whose structure is important for function. Random sequences have plausible and complex predicted structures not easily distinguishable from those of structural RNAs. How to tell when an RNA has a conserved structure is a question that requires looking at the evolutionary signature left by the conserved RNA. This question is important not just for long noncoding RNAs which usually lack an identified function, but also for RNA binding protein motifs which can be single stranded RNAs or structures. Here we review recent advances using sequence and structural analysis to determine when RNA structure is conserved or not. Although covariation measures assess structural RNA conservation, one must distinguish covariation due to RNA structure from covariation due to independent phylogenetic substitutions. We review a statistical test to measure false positives expected under the null hypothesis of phylogenetic covariation alone (specificity). We also review a complementary test that measures power, that is, expected covariation derived from sequence variation alone (sensitivity). Power in the absence of covariation signals the absence of a conserved RNA structure. We analyze artifacts that falsely identify conserved RNA structure such as the misuse of programs that do not assess significance, the use of inappropriate statistics confounded by signals other than covariation, or misalignments that induce spurious covariation. Among artifacts that obscure the signal of a conserved RNA structure, we discuss the inclusion of pseudogenes in alignments which increase power but destroy covariation. This article is categorized under: RNA Structure and Dynamics > RNA Structure, Dynamics and Chemistry; RNA Evolution and Genomics > Computational Analyses of RNA; RNA Evolution and Genomics > RNA and Ribonucleoprotein Evolution.
Affiliation(s)
- Elena Rivas
  - Department of Molecular and Cellular Biology, Harvard University, Cambridge, Massachusetts, USA
|
50
|
Abzhanova A, Hirschi A, Reiter NJ. An exon-biased biophysical approach and NMR spectroscopy define the secondary structure of a conserved helical element within the HOTAIR long non-coding RNA. J Struct Biol 2021; 213:107728. [PMID: 33753203 DOI: 10.1016/j.jsb.2021.107728] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/16/2020] [Revised: 02/16/2021] [Accepted: 03/17/2021] [Indexed: 11/16/2022]
Abstract
HOTAIR is a large, multi-exon spliced non-coding RNA proposed to function as a molecular scaffold and to compete with chromatin for binding to histone modification enzymes. Previous sequence analysis and biochemical experiments identified potential conserved regions and characterized the full length HOTAIR secondary structure. Here, we examine the thermodynamic folding properties and structural propensity of the individual exonic regions of HOTAIR using an array of biophysical methods and NMR spectroscopy. We demonstrate that different exons of HOTAIR contain variable degrees of heterogeneity, and identify one exonic region, exon 4, that adopts a stable and compact fold under low magnesium concentrations. Close agreement of NMR spectroscopy and chemical probing unambiguously confirms conserved base pair interactions within the structural element, termed helix 10 of exon 4, located within domain I of human HOTAIR. This combined exon-biased and integrated biophysical approach introduces a new strategy to examine conformational heterogeneity in lncRNAs and emphasizes NMR as a key method to validate base pair interactions and corroborate large RNA secondary structures.
Affiliation(s)
- Ainur Abzhanova
  - Department of Chemistry, Marquette University, Milwaukee 53233, WI, United States
- Alexander Hirschi
  - Department of Biochemistry, Vanderbilt University Medical Center, Nashville 37205-0146, TN, United States
- Nicholas J Reiter
  - Department of Chemistry, Marquette University, Milwaukee 53233, WI, United States
|