1
Mittal A, Ali SE, Mathews DH. Using the RNAstructure Software Package to Predict Conserved RNA Structures. Curr Protoc 2024; 4:e70054. [PMID: 39540715 DOI: 10.1002/cpz1.70054]
Abstract
The structures of many non-coding RNAs (ncRNAs) are conserved by evolution to a greater extent than their sequences. Predicting the structure conserved across two or more homologous sequences can improve the accuracy of secondary structure prediction relative to prediction for a single sequence. Here, we provide protocols for the use of four programs in the RNAstructure suite to predict conserved structures: Multilign, TurboFold, Dynalign, and PARTS. TurboFold iteratively aligns multiple homologous sequences and estimates the pairing probabilities for the conserved structure. Dynalign, PARTS, and Multilign are dynamic programming algorithms that simultaneously align sequences and identify the common secondary structure. Dynalign uses a pair of homologs and finds the lowest free energy common structure. PARTS uses a pair of homologs and estimates pairing probabilities from the base pairing probabilities estimated for each sequence. Multilign uses two or more homologs and finds the lowest free energy common structure using multiple pairwise calculations with Dynalign; Multilign scales linearly with the number of sequences. We outline the strengths of each program. These programs can be run through web servers, on the command line, or with graphical user interfaces. © 2024 Wiley Periodicals LLC.
Basic Protocol 1: Predicting a structure conserved in three or more sequences with the RNAstructure web server
Basic Protocol 2: Predicting a structure conserved in two sequences with the RNAstructure web server
Alternative Protocol 1: Predicting a structure conserved in multiple sequences in the RNAstructure graphical user interface
Alternative Protocol 2: Predicting a structure conserved in two sequences with Dynalign in the RNAstructure graphical user interface
Alternative Protocol 3: Running TurboFold on the command line.
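Dynalign, PARTS, and Multilign are described above as dynamic programming algorithms. For readers new to the idea, the following is a minimal sketch of the classic Nussinov base-pair maximization recursion on a single sequence, which fills a table of optimal scores for every subsequence and then traces back one optimal structure. It is a simplified stand-in, not how RNAstructure works: it maximizes the number of nested pairs instead of minimizing nearest-neighbor free energy, and it does not align homologs. The names `nussinov`, `PAIRS`, and `MIN_LOOP` (an assumed three-nucleotide minimum hairpin loop) are hypothetical.

```python
"""Toy Nussinov base-pair maximization (illustration only, not RNAstructure's method)."""

# Assumed pairing rules: Watson-Crick pairs plus G-U wobble pairs.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MIN_LOOP = 3  # assumed minimum number of unpaired nucleotides closed by a hairpin


def nussinov(seq: str) -> tuple[int, str]:
    """Return the maximum number of nested base pairs and one optimal dot-bracket structure."""
    seq = seq.upper()
    n = len(seq)
    # dp[i][j] is the best pair count for seq[i..j] (inclusive); short spans stay 0.
    dp = [[0] * n for _ in range(n)]
    for span in range(MIN_LOOP + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: position j is unpaired
            for k in range(i, j - MIN_LOOP):  # case 2: j pairs with some k
                if (seq[k], seq[j]) in PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + dp[k + 1][j - 1] + 1)
            dp[i][j] = best

    # Traceback: recover one structure that achieves dp[0][n - 1].
    structure = ["."] * n
    stack = [(0, n - 1)] if n else []
    while stack:
        i, j = stack.pop()
        if j - i <= MIN_LOOP:
            continue  # too short to contain any pair
        if dp[i][j] == dp[i][j - 1]:
            stack.append((i, j - 1))  # j was left unpaired
            continue
        for k in range(i, j - MIN_LOOP):
            if (seq[k], seq[j]) in PAIRS:
                left = dp[i][k - 1] if k > i else 0
                if dp[i][j] == left + dp[k + 1][j - 1] + 1:
                    structure[k], structure[j] = "(", ")"
                    if k > i:
                        stack.append((i, k - 1))
                    stack.append((k + 1, j - 1))
                    break
    return (dp[0][n - 1] if n else 0), "".join(structure)
```

For example, `nussinov("GGGAAAUCC")` returns `(3, "(((...)))")`: three stacked pairs closing an AAA hairpin loop.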
Affiliation(s)
- Abhinav Mittal
  - Department of Biochemistry & Biophysics and Center for RNA Biology, University of Rochester Medical Center, Rochester, New York
- Sara E Ali
  - Department of Biochemistry & Biophysics and Center for RNA Biology, University of Rochester Medical Center, Rochester, New York
- David H Mathews
  - Department of Biochemistry & Biophysics and Center for RNA Biology, University of Rochester Medical Center, Rochester, New York
2
Dhar D, Mehanovic S, Moss W, Miller CL. Sequences at gene segment termini inclusive of untranslated regions and partial open reading frames play a critical role in mammalian orthoreovirus S gene packaging. PLoS Pathog 2024; 20:e1012037. [PMID: 38394338 PMCID: PMC10917250 DOI: 10.1371/journal.ppat.1012037] [Received: 09/13/2023] [Revised: 03/06/2024] [Accepted: 02/08/2024]
Abstract
Mammalian orthoreovirus (MRV) is a prototypic member of the Spinareoviridae family and has ten double-stranded RNA segments. One copy of each segment must be faithfully packaged into the mature virion, and prior literature suggests that nucleotides (nts) at the terminal ends of each gene likely facilitate their packaging. However, little is known about the precise packaging sequences required or how the packaging process is coordinated. Using a novel approach, we have determined that 200 nts at each terminus, inclusive of untranslated regions (UTRs) and parts of the open reading frame (ORF), are sufficient for packaging S gene segments (S1-S4) individually and together into replicating virus. Further, we mapped the minimal sequences required for packaging the S1 gene segment into replicating virus to the terminal 25 5' nts and 50 3' nts. The S1 UTRs, while not sufficient, were necessary for efficient packaging, as mutations of the 5' or 3' UTRs led to a complete loss of virus recovery. Using a second novel assay, we determined that the terminal 50 5' nts and 50 3' nts of S1 are sufficient to package a non-viral gene segment into MRV. The 5' and 3' termini of the S1 gene are predicted to form a panhandle structure, and specific mutations within the stem of the predicted panhandle led to a significant decrease in viral recovery. Additionally, mutation of six nts that are conserved across the three major serotypes of MRV and are predicted to form an unpaired loop in the S1 3' UTR led to a complete loss of viral recovery. Overall, our data provide strong experimental proof that MRV packaging signals lie at the terminal ends of the S gene segments and support the conclusion that the sequence requirements for efficient packaging of the S1 segment include a predicted panhandle structure and specific sequences within an unpaired loop in the 3' UTR.
Affiliation(s)
- Debarpan Dhar
  - Interdepartmental Microbiology Graduate Program, Iowa State University, Ames, Iowa, United States of America
  - Department of Veterinary Microbiology and Preventive Medicine, College of Veterinary Medicine, Iowa State University, Ames, Iowa, United States of America
- Samir Mehanovic
  - Department of Veterinary Microbiology and Preventive Medicine, College of Veterinary Medicine, Iowa State University, Ames, Iowa, United States of America
- Walter Moss
  - Roy J. Carver Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, Iowa, United States of America
- Cathy L. Miller
  - Interdepartmental Microbiology Graduate Program, Iowa State University, Ames, Iowa, United States of America
  - Department of Veterinary Microbiology and Preventive Medicine, College of Veterinary Medicine, Iowa State University, Ames, Iowa, United States of America
3
Tieng FYF, Abdullah-Zawawi MR, Md Shahri NAA, Mohamed-Hussein ZA, Lee LH, Mutalib NSA. A Hitchhiker's guide to RNA-RNA structure and interaction prediction tools. Brief Bioinform 2023; 25:bbad421. [PMID: 38040490 PMCID: PMC10753535 DOI: 10.1093/bib/bbad421] [Received: 07/31/2023] [Revised: 10/16/2023] [Accepted: 10/26/2023]
Abstract
RNA biology has risen to prominence following the discovery of the diverse functions of noncoding RNA (ncRNA). Many untranslated transcripts exert their regulatory functions by forming RNA-RNA complexes through base pairing with complementary sequences in other RNAs. Interplay between RNAs is essential, with diverse functional roles in human cells, including translation, RNA splicing, editing, ribosomal RNA maturation, RNA degradation and the regulation of metabolic pathways/riboswitches. Moreover, the pervasive transcription of the human genome allows for the discovery of novel genomic functions through investigation of the RNA interactome. Advances in experimental procedures have produced an explosion of documented data, necessitating the development of efficient and precise computational tools and algorithms. This review provides an extensive update on RNA-RNA interaction (RRI) analysis via thermodynamic- and comparative-based RNA secondary structure prediction (RSP) and RNA-RNA interaction prediction (RIP) tools and their general functions. We also highlight the current knowledge of RRIs and the limitations of RNA interactome mapping via experimental data. Then, the gap between RSP and RIP, the importance of RNA homologues, and the relationship between pseudoknots and RNA folding thermodynamics are discussed. It is hoped that these emerging prediction tools will deepen the understanding of RNA-associated interactions in human diseases and hasten treatment processes.
Affiliation(s)
- Francis Yew Fu Tieng
  - UKM Medical Molecular Biology Institute (UMBI), Universiti Kebangsaan Malaysia (UKM), Kuala Lumpur 56000, Malaysia
- Nur Alyaa Afifah Md Shahri
  - UKM Medical Molecular Biology Institute (UMBI), Universiti Kebangsaan Malaysia (UKM), Kuala Lumpur 56000, Malaysia
- Zeti-Azura Mohamed-Hussein
  - Institute of Systems Biology (INBIOSIS), UKM, Selangor 43600, Malaysia
  - Department of Applied Physics, Faculty of Science and Technology, UKM, Selangor 43600, Malaysia
- Learn-Han Lee
  - Sunway Microbiomics Centre, School of Medical and Life Sciences, Sunway University, Sunway City 47500, Malaysia
  - Novel Bacteria and Drug Discovery Research Group, Microbiome and Bioresource Research Strength, Jeffrey Cheah School of Medicine and Health Sciences, Monash University of Malaysia, Selangor 47500, Malaysia
- Nurul-Syakima Ab Mutalib
  - UKM Medical Molecular Biology Institute (UMBI), Universiti Kebangsaan Malaysia (UKM), Kuala Lumpur 56000, Malaysia
  - Novel Bacteria and Drug Discovery Research Group, Microbiome and Bioresource Research Strength, Jeffrey Cheah School of Medicine and Health Sciences, Monash University of Malaysia, Selangor 47500, Malaysia
  - Faculty of Health Sciences, UKM, Kuala Lumpur 50300, Malaysia
4
Abstract
RNAstructure is a user-friendly program for the prediction and analysis of RNA secondary structure. It is available as a web server, as a program with a graphical user interface, or as a set of command line tools, for Microsoft Windows, macOS, and Linux. This article provides protocols for predicting RNA secondary structure (using the web server, the graphical user interface, or the command line) and for predicting high-affinity oligonucleotide binding sites on a structured RNA target (using the graphical user interface). © 2023 Wiley Periodicals LLC.
Basic Protocol 1: Predicting RNA secondary structure using the RNAstructure web server
Alternate Protocol 1: Predicting secondary structure and base pair probabilities using the RNAstructure graphical user interface
Alternate Protocol 2: Predicting secondary structure and base pair probabilities using the RNAstructure command line interface
Basic Protocol 2: Predicting binding affinities of oligonucleotides complementary to an RNA target using OligoWalk.
Affiliation(s)
- Sara E. Ali
  - Department of Biochemistry & Biophysics and Center for RNA Biology, University of Rochester Medical Center, 601 Elmwood Avenue, Box 712, Rochester, New York 14642
- Abhinav Mittal
  - Department of Biochemistry & Biophysics and Center for RNA Biology, University of Rochester Medical Center, 601 Elmwood Avenue, Box 712, Rochester, New York 14642
- David H. Mathews
  - Department of Biochemistry & Biophysics and Center for RNA Biology, University of Rochester Medical Center, 601 Elmwood Avenue, Box 712, Rochester, New York 14642
5
Dhar D, Mehanovic S, Moss W, Miller CL. Sequences at gene segment termini inclusive of untranslated regions and partial open reading frames play a critical role in mammalian orthoreovirus S gene packaging. bioRxiv: the preprint server for biology 2023:2023.05.25.542362. [PMID: 37292944 PMCID: PMC10245979 DOI: 10.1101/2023.05.25.542362]
Abstract
Mammalian orthoreovirus (MRV) is a prototypic member of the Spinareoviridae family and has ten double-stranded RNA segments. One copy of each segment must be faithfully packaged into the mature virion, and prior literature suggests that nucleotides (nts) at the terminal ends of each gene likely facilitate their packaging. However, little is known about the precise packaging sequences required or how the packaging process is coordinated. Using a novel approach, we have determined that 200 nts at each terminus, inclusive of untranslated regions (UTRs) and parts of the open reading frame (ORF), are sufficient for packaging each S gene segment (S1-S4) individually and together into replicating virus. Further, we mapped the minimal sequences required for packaging the S1 gene segment to the terminal 25 5' nts and 50 3' nts. The S1 UTRs alone are not sufficient but are necessary for packaging, as mutations of the 5' or 3' UTRs led to a complete loss of virus recovery. Using a second novel assay, we determined that the terminal 50 5' nts and 50 3' nts of S1 are sufficient to package a non-viral gene segment into MRV. The 5' and 3' termini of the S1 gene are predicted to form a panhandle structure, and specific mutations within the predicted stem of the panhandle region led to a significant decrease in viral recovery. Additionally, mutation of six nts that are conserved in the three major serotypes of MRV and are predicted to form an unpaired loop in the S1 3' UTR led to a complete loss of viral recovery. Overall, our data provide strong experimental proof that MRV packaging signals lie at the terminal ends of the S gene segments and support the conclusion that the sequence requirements for efficient packaging of the S1 segment include a predicted panhandle structure and specific sequences within an unpaired loop in the 3' UTR.
6
Hollar A, Bursey H, Jabbari H. Pseudoknots in RNA Structure Prediction. Curr Protoc 2023; 3:e661. [PMID: 36779804 DOI: 10.1002/cpz1.661]
Abstract
RNA molecules play active roles in the cell and are important for numerous applications in biotechnology and medicine. The function of an RNA molecule stems from its structure. Determining RNA structure experimentally is time-consuming, challenging, and expensive. Thus, much research has been directed at predicting RNA structure computationally. Many of these methods focus primarily on the secondary structure of the molecule and ignore the possibility of pseudoknotted structures. However, pseudoknots are known to play functional roles in many RNA molecules and in how they interact with other molecules. Improving the accuracy and efficiency of computational methods that predict pseudoknots is an ongoing challenge for single RNA molecules, RNA-RNA interactions, and RNA-protein interactions. To improve prediction accuracy, many methods focus on specific applications while restricting the length and the class of the pseudoknotted structures they can identify. In recent years, computational methods for structure prediction have begun to catch up with the impressive developments seen in biotechnology. Here, we provide a non-comprehensive overview of available pseudoknot prediction methods and their best-use cases. © 2023 Wiley Periodicals LLC.
Affiliation(s)
- Andrew Hollar
  - Department of Computer Science, University of Victoria, Victoria, Canada
- Hunter Bursey
  - Department of Computer Science, University of Victoria, Victoria, Canada
- Hosna Jabbari
  - Department of Computer Science, University of Victoria, Victoria, Canada
7
O’Leary CA, Tompkins VS, Rouse WB, Nam G, Moss W. Thermodynamic and structural characterization of an EBV infected B-cell lymphoma transcriptome. NAR Genom Bioinform 2022; 4:lqac082. [PMID: 36285286 PMCID: PMC9585548 DOI: 10.1093/nargab/lqac082] [Received: 09/13/2022] [Revised: 09/30/2022] [Accepted: 10/06/2022]
Abstract
Epstein-Barr virus (EBV) is a widely prevalent human herpesvirus that infects over 95% of all adults and is associated with a variety of B-cell cancers and with the induction of multiple sclerosis. EBV accomplishes this in part through expression of coding and noncoding RNAs and alteration of the host cell transcriptome. To better understand the structures that form in the viral and host transcriptomes of infected cells, the RNA structure probing technique Structure-seq2 was applied to the BJAB-B1 cell line (an EBV-infected B-cell lymphoma). This yielded reactivity profiles and secondary structural analyses for over 10,000 human mRNAs and lncRNAs, along with 19 lytic and latent EBV transcripts. We report in-depth structural analyses for the human MYC mRNA and the human lncRNA CYTOR. Additionally, we provide a new model for the EBV noncoding RNA EBER2 and the first reported model for the EBV tandem terminal repeat RNA. In-depth thermodynamic and structural analyses were carried out with the motif discovery tool ScanFold and the RNAfold prediction tool; subsequent covariation analyses of the resulting models found various levels of support. ScanFold results for all analyzed transcripts are available for viewing and download on the user-friendly RNAStructuromeDB.
Affiliation(s)
- Collin A O’Leary
  - Roy J. Carver Department of Biophysics, Biochemistry and Molecular Biology, Iowa State University, Ames, IA 50011, USA
- Van S Tompkins
  - Roy J. Carver Department of Biophysics, Biochemistry and Molecular Biology, Iowa State University, Ames, IA 50011, USA
- Warren B Rouse
  - Roy J. Carver Department of Biophysics, Biochemistry and Molecular Biology, Iowa State University, Ames, IA 50011, USA
- Gijong Nam
  - Roy J. Carver Department of Biophysics, Biochemistry and Molecular Biology, Iowa State University, Ames, IA 50011, USA
- Walter N Moss
  - Roy J. Carver Department of Biophysics, Biochemistry and Molecular Biology, Iowa State University, Ames, IA 50011, USA
8
Biesiada M, Hu MY, Williams LD, Purzycka KJ, Petrov AS. rRNA expansion segment 7 in eukaryotes: from Signature Fold to tentacles. Nucleic Acids Res 2022; 50:10717-10732. [PMID: 36200812 PMCID: PMC9561286 DOI: 10.1093/nar/gkac844] [Received: 12/27/2021] [Revised: 09/13/2022] [Accepted: 09/22/2022]
Abstract
The ribosomal core is universally conserved across the tree of life. However, eukaryotic ribosomes contain diverse rRNA expansion segments (ESs) on their surfaces. Sites of ES insertions are predicted from sites of insertion of micro-ESs in archaea. Expansion segment 7 (ES7) is one of the most diverse regions of the ribosome, emanating from a short stem loop and ranging to over 750 nucleotides in mammals. We present secondary and full-atom 3D structures of ES7 from species spanning eukaryotic diversity. Our results are based on experimental 3D structures, the accretion model of ribosomal evolution, phylogenetic relationships, multiple sequence alignments, RNA folding algorithms and 3D modeling by RNAComposer. ES7 contains a distinct motif, the 'ES7 Signature Fold', which is generally invariant in 2D topology and 3D structure in all eukaryotic ribosomes. We establish a model in which ES7 developed over evolution through a series of elementary and recursive growth events. The data are sufficient to support an atomic-level accretion path for rRNA growth. The non-monophyletic distribution of some ES7 features across the phylogeny suggests acquisition via convergent processes. And finally, illustrating the power of our approach, we constructed the 2D and 3D structure of the entire LSU rRNA of Mus musculus.
Affiliation(s)
- Marcin Biesiada
  - Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan 61-704, Poland
- Michael Y Hu
  - Center for the Origins of Life, Georgia Institute of Technology, Atlanta, GA 30332, USA
  - School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, GA 30332, USA
- Loren Dean Williams
  - Center for the Origins of Life, Georgia Institute of Technology, Atlanta, GA 30332, USA
  - School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, GA 30332, USA
- Katarzyna J Purzycka
  - Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan 61-704, Poland
- Anton S Petrov
  - Center for the Origins of Life, Georgia Institute of Technology, Atlanta, GA 30332, USA
  - School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, GA 30332, USA
9
Yang SL, Ponti RD, Wan Y, Huber RG. Computational and Experimental Approaches to Study the RNA Secondary Structures of RNA Viruses. Viruses 2022; 14:1795. [PMID: 36016417 PMCID: PMC9415818 DOI: 10.3390/v14081795] [Received: 06/20/2022] [Revised: 08/12/2022] [Accepted: 08/13/2022]
Abstract
Most pandemics of recent decades can be traced to RNA viruses, including HIV, SARS, influenza, dengue, Zika, and SARS-CoV-2. These RNA viruses impose considerable social and economic burdens, causing many deaths and high treatment costs. Because the RNA genomes of these viruses are important at different stages of the viral life cycle, including replication, translation, and packaging, studying how the genome folds is important for understanding virus function. In this review, we summarize recent advances in computational and high-throughput RNA structure-mapping approaches and their use in understanding structures within RNA virus genomes. In particular, we focus on the genome structures of the dengue, Zika, and SARS-CoV-2 viruses due to recent significant outbreaks of these viruses around the world.
Affiliation(s)
- Siwy Ling Yang
  - Genome Institute of Singapore, Agency for Science, Technology and Research (A*STAR), Singapore 138672, Singapore
- Riccardo Delli Ponti
  - Bioinformatics Institute, Agency for Science, Technology and Research (A*STAR), Singapore 138671, Singapore
- Yue Wan
  - Genome Institute of Singapore, Agency for Science, Technology and Research (A*STAR), Singapore 138672, Singapore
- Roland G. Huber
  - Bioinformatics Institute, Agency for Science, Technology and Research (A*STAR), Singapore 138671, Singapore
10
Chen JC, Chen JP, Shen MW, Wornow M, Bae M, Yeh WH, Hsu A, Liu DR. Generating experimentally unrelated target molecule-binding highly functionalized nucleic-acid polymers using machine learning. Nat Commun 2022; 13:4541. [PMID: 35927274 PMCID: PMC9352670 DOI: 10.1038/s41467-022-31955-4] [Received: 07/08/2021] [Accepted: 07/11/2022]
Abstract
In vitro selection queries large combinatorial libraries for sequence-defined polymers with target binding and reaction catalysis activity. While the total sequence space of these libraries can extend beyond 10²² sequences, practical considerations limit starting sequences to ≤~10¹⁵ distinct molecules. Selection-induced sequence convergence and limited sequencing depth further constrain experimentally observable sequence space. To address these limitations, we integrate experimental and machine learning approaches to explore regions of sequence space unrelated to experimentally derived variants. We perform in vitro selections to discover highly side-chain-functionalized nucleic acid polymers (HFNAPs) with potent affinities for a target small molecule (daunomycin KD = 5-65 nM). We then use the selection data to train a conditional variational autoencoder (CVAE) machine learning model to generate diverse and unique HFNAP sequences with high daunomycin affinities (KD = 9-26 nM), even though they are unrelated in sequence to experimental polymers. Coupling in vitro selection with a machine learning model thus enables direct generation of active variants, demonstrating a new approach to the discovery of functional biopolymers.
Affiliation(s)
- Jonathan C. Chen
  - Merkin Institute of Transformative Technologies in Healthcare, Broad Institute of Harvard and MIT, Cambridge, MA USA
  - Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA USA
  - Howard Hughes Medical Institute, Harvard University, Cambridge, MA USA
- Jonathan P. Chen
  - Work conducted at Uber AI Labs, Uber Technologies, Inc., San Francisco, CA USA
  - Meta Platforms, Menlo Park, CA USA
- Max W. Shen
  - Merkin Institute of Transformative Technologies in Healthcare, Broad Institute of Harvard and MIT, Cambridge, MA USA
  - Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA USA
  - Howard Hughes Medical Institute, Harvard University, Cambridge, MA USA
  - Computational and Systems Biology Program, Massachusetts Institute of Technology, Cambridge, MA USA
- Michael Wornow
  - Merkin Institute of Transformative Technologies in Healthcare, Broad Institute of Harvard and MIT, Cambridge, MA USA
  - Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA USA
- Minwoo Bae
  - Merkin Institute of Transformative Technologies in Healthcare, Broad Institute of Harvard and MIT, Cambridge, MA USA
  - Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA USA
- Wei-Hsi Yeh
  - Merkin Institute of Transformative Technologies in Healthcare, Broad Institute of Harvard and MIT, Cambridge, MA USA
  - Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA USA
  - Howard Hughes Medical Institute, Harvard University, Cambridge, MA USA
  - Program in Speech and Hearing Bioscience and Technology, Harvard Medical School, Boston, MA USA
- Alvin Hsu
  - Merkin Institute of Transformative Technologies in Healthcare, Broad Institute of Harvard and MIT, Cambridge, MA USA
  - Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA USA
  - Howard Hughes Medical Institute, Harvard University, Cambridge, MA USA
- David R. Liu
  - Merkin Institute of Transformative Technologies in Healthcare, Broad Institute of Harvard and MIT, Cambridge, MA USA
  - Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA USA
  - Howard Hughes Medical Institute, Harvard University, Cambridge, MA USA
11
Gray M, Chester S, Jabbari H. KnotAli: informed energy minimization through the use of evolutionary information. BMC Bioinformatics 2022; 23:159. [PMID: 35505276 PMCID: PMC9063079 DOI: 10.1186/s12859-022-04673-3] [Received: 07/08/2021] [Accepted: 04/05/2022]
Abstract
BACKGROUND Improving the prediction of RNA structures, especially those containing pseudoknots (structures with crossing base pairs), is an ongoing challenge. Homology-based methods utilize structural similarities within a family to predict the structure. However, their predictions are limited to the consensus structure and by the quality of the alignment. Minimum free energy (MFE) based methods, on the other hand, do not rely on familial information and can predict structures of novel RNA molecules, but their predictions often suffer from inaccuracies in the underlying energy parameters. RESULTS We present a new method for prediction of RNA pseudoknotted secondary structures that combines the strengths of MFE prediction and alignment-based methods. KnotAli takes a multiple RNA sequence alignment as input and uses covariation and thermodynamic energy minimization to predict possibly pseudoknotted secondary structures for each individual sequence in the alignment. We compared KnotAli's performance to that of three other alignment-based programs, two that can handle pseudoknotted structures and one control, on a large data set of 3034 RNA sequences with varying lengths and levels of sequence conservation from 10 families with pseudoknotted and pseudoknot-free reference structures. We produced sequence alignments for each family using two well-known sequence aligners (MUSCLE and MAFFT). CONCLUSIONS We found KnotAli's performance to be superior in 6 of the 10 families for MUSCLE and 7 of the 10 for MAFFT. While both KnotAli and CaCoFold use background noise correction strategies, we found KnotAli's predictions to be less dependent on the alignment quality. KnotAli can be found online at the Zenodo image: https://doi.org/10.5281/zenodo.5794719.
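KnotAli combines covariation with thermodynamic energy minimization. As a rough illustration of what a covariation signal is, the sketch below computes the textbook mutual information between two alignment columns: compensatory changes that preserve pairing raise the score, while perfectly conserved columns score zero. This is not KnotAli's scoring function, which the abstract does not specify (and which includes background noise correction); the function name `column_mi` and the choice to skip gapped rows are assumptions.

```python
"""Mutual information between two alignment columns: a textbook covariation score."""
import math
from collections import Counter


def column_mi(col_a: str, col_b: str) -> float:
    """Mutual information (in bits) between two equal-length alignment columns.

    Each column lists one nucleotide per aligned sequence. Rows with a gap ('-')
    in either column are skipped, an assumption made for this illustration.
    """
    if len(col_a) != len(col_b):
        raise ValueError("columns must have the same length")
    rows = [(a, b) for a, b in zip(col_a, col_b) if a != "-" and b != "-"]
    n = len(rows)
    if n == 0:
        return 0.0
    joint = Counter(rows)                  # counts of each (a, b) combination
    freq_a = Counter(a for a, _ in rows)   # marginal counts, column A
    freq_b = Counter(b for _, b in rows)   # marginal counts, column B
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        p_a = freq_a[a] / n
        p_b = freq_b[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi
```

For instance, `column_mi("GCAU", "CGUA")` gives 2.0 bits for four distinct complementary pairs, whereas `column_mi("GGGG", "CCCC")` gives 0.0 even though every row can pair, one reason covariation is usually combined with other evidence such as thermodynamics.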
Affiliation(s)
- Mateo Gray
  - Department of Computer Science, University of Victoria, Victoria, Canada
- Sean Chester
  - Department of Computer Science, University of Victoria, Victoria, Canada
- Hosna Jabbari
  - Department of Computer Science, University of Victoria, Victoria, Canada
  - Institute on Aging and Lifelong Health, University of Victoria, Victoria, Canada
12
Yang TH, Lin YC, Hsia M, Liao ZY. SSRTool: a web tool for evaluating RNA secondary structure predictions based on species-specific functional interpretability. Comput Struct Biotechnol J 2022; 20:2473-2483. [PMID: 35664227 PMCID: PMC9136272 DOI: 10.1016/j.csbj.2022.05.028] [Received: 02/19/2022] [Revised: 05/13/2022] [Accepted: 05/13/2022]
Abstract
RNA secondary structures can carry out essential cellular functions alone or interact with one another to form hierarchical tertiary structures. Experimental structure identification approaches can reveal the in vitro structures of RNA molecules. However, they usually have limited resolution and are costly. In silico structure prediction tools are thus relied on primarily for pre-experiment analysis. Various structure prediction models have been developed over the decades. Since these tools are usually used before the actual RNA structures are known, evaluating and ranking the many secondary structure predictions for a given sequence is essential in computational analysis. In this research, we implemented a web service called SSRTool (RNA Secondary Structure prediction Ranking Tool) to assist in ranking and evaluating the predicted structures generated for a given sequence. Based on the computed species-specific interpretability significance in four common RNA structure–function aspects, SSRTool provides three functions along with visualization interfaces: (1) ranking user-generated predictions; (2) providing an automated pipeline of structure prediction and ranking for a given sequence; and (3) inferring the functional aspects of a given structure. We demonstrated the applicability of SSRTool via real case studies and reported similar trends between the computed species-specific rankings and the corresponding prediction F1 values. The SSRTool web service is available online at https://cobisHSS0.im.nuk.edu.tw/SSRTool/, http://cosbi3.ee.ncku.edu.tw/SSRTool/, or the redirecting site https://github.com/cobisLab/SSRTool/.
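SSRTool's rankings are compared with prediction F1 values. The sketch below shows one common way a base-pair F1 score can be computed from dot-bracket strings, by counting exact (i, j) pair matches between a predicted and a reference structure. It is an illustration under stated assumptions, not SSRTool's code: it handles only pseudoknot-free '(' and ')' notation (every other character is treated as unpaired), gives no credit for slipped pairs, and returns 0.0 when no pairs are shared. The function names `pairs_from_dot_bracket` and `pair_f1` are hypothetical.

```python
"""Base-pair F1 between predicted and reference dot-bracket structures (illustrative)."""


def pairs_from_dot_bracket(structure: str) -> set[tuple[int, int]]:
    """Convert a pseudoknot-free dot-bracket string into a set of (i, j) base pairs.

    Only '(' and ')' are read; every other character is treated as unpaired.
    """
    stack: list[int] = []
    pairs: set[tuple[int, int]] = set()
    for j, char in enumerate(structure):
        if char == "(":
            stack.append(j)
        elif char == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {j}")
            pairs.add((stack.pop(), j))
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return pairs


def pair_f1(predicted: str, reference: str) -> float:
    """Harmonic mean of base-pair precision and recall (exact pair matches only)."""
    pred = pairs_from_dot_bracket(predicted)
    ref = pairs_from_dot_bracket(reference)
    true_pos = len(pred & ref)
    if true_pos == 0:
        return 0.0  # convention chosen here, including when both structures are unpaired
    precision = true_pos / len(pred)
    recall = true_pos / len(ref)
    return 2 * precision * recall / (precision + recall)
```

With a known reference, candidate predictions could then be ordered with `sorted(candidates, key=lambda s: pair_f1(s, reference), reverse=True)`; for example, `pair_f1("((.....))", "(((...)))")` is 0.8 (precision 1.0, recall 2/3).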
Affiliation(s)
- Tzu-Hsien Yang
- Department of Information Management, National University of Kaohsiung, Kaohsiung University Rd, 811 Kaohsiung, Taiwan
- Corresponding author.
- Yu-Cian Lin
- Department of Information Management, National University of Kaohsiung, Kaohsiung University Rd, 811 Kaohsiung, Taiwan
- Min Hsia
- Department of Information Management, National University of Kaohsiung, Kaohsiung University Rd, 811 Kaohsiung, Taiwan
- Zhan-Yi Liao
- Department of Information Management, National University of Kaohsiung, Kaohsiung University Rd, 811 Kaohsiung, Taiwan
13
Winkler J, Urgese G, Ficarra E, Reinert K. LaRA 2: parallel and vectorized program for sequence-structure alignment of RNA sequences. BMC Bioinformatics 2022; 23:18. [PMID: 34991448 PMCID: PMC8734264 DOI: 10.1186/s12859-021-04532-7] [Received: 09/08/2020] [Accepted: 12/13/2021]
Abstract
BACKGROUND The function of non-coding RNA sequences is largely determined by their spatial conformation, namely the secondary structure of the molecule, formed by Watson-Crick interactions between nucleotides. Hence, modern RNA alignment algorithms routinely take structural information into account. In order to discover yet unknown RNA families and infer their possible functions, the structural alignment of RNAs is an essential task. This task demands substantial computational resources, especially for aligning many long sequences, and it therefore requires efficient algorithms that use modern hardware when available. A subset of secondary structures contains overlapping interactions (called pseudoknots), which add complexity to the problem and are often ignored in available software. RESULTS We present the SeqAn-based software LaRA 2, which is significantly faster than comparable software for accurate pairwise and multiple alignments of structured RNA sequences. In contrast to other programs, our approach can handle arbitrary pseudoknots. As an improved re-implementation of the LaRA tool for structural alignments, LaRA 2 uses multi-threading and vectorization for parallel execution and a new heuristic for computing a lower bound of the solution. Our algorithmic improvements yield a program that is up to 130 times faster than the previous version. CONCLUSIONS With LaRA 2 we provide a tool to analyse large sets of RNA secondary structures in relatively short time, based on structural alignment. The produced alignments can be used to derive structural motifs for the search in genomic databases.
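Pseudoknots, the "overlapping interactions" described above, have a simple combinatorial definition: two base pairs (i, j) and (k, l) cross when i < k < j < l. The sketch below checks a list of 1-based base pairs for crossings by brute force. It is not LaRA 2's algorithm and only illustrates the definition. The function names are hypothetical.

```python
from itertools import combinations


def crossing_pairs(pairs):
    """Return every pair of base pairs ((i, j), (k, l)) with i < k < j < l.

    A structure containing at least one crossing is pseudoknotted and cannot be
    written in plain (single-bracket-type) dot-bracket notation.
    """
    # Put the 5' index first in every pair, then sort by 5' index so that i <= k below.
    norm = sorted((min(a, b), max(a, b)) for a, b in pairs)
    crossings = []
    for (i, j), (k, l) in combinations(norm, 2):
        if i < k < j < l:  # k opens inside (i, j) but its partner l lies outside
            crossings.append(((i, j), (k, l)))
    return crossings


def is_pseudoknotted(pairs):
    return bool(crossing_pairs(pairs))


nested = [(1, 10), (2, 9), (3, 8)]            # a simple hairpin stem
knotted = [(1, 8), (2, 7), (5, 12), (6, 11)]  # two stems whose pairs interleave
print(is_pseudoknotted(nested))               # False
print(len(crossing_pairs(knotted)))           # 4
```

This quadratic check is fine for short structures. Alignment tools that handle pseudoknots must reason about such crossings inside their scoring, which is far more involved than detecting them.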
Affiliation(s)
- Jörg Winkler
- Department of Mathematics and Computer Science, Free University Berlin, Takustraße 9, 14195 Berlin, Germany
- Max Planck Institute for Molecular Genetics, Ihnestraße 63-73, 14195 Berlin, Germany
- Gianvito Urgese
- Interuniversity Department of Regional and Urban Studies and Planning, Politecnico di Torino, C.so Duca degli Abruzzi 24, 10129 Turin, Italy
- Elisa Ficarra
- Department of Control and Computer Science, Politecnico di Torino, C.so Duca degli Abruzzi 24, 10129 Turin, Italy
- Knut Reinert
- Department of Mathematics and Computer Science, Free University Berlin, Takustraße 9, 14195 Berlin, Germany
- Max Planck Institute for Molecular Genetics, Ihnestraße 63-73, 14195 Berlin, Germany
14
Zambrano RAI, Hernandez-Perez C, Takahashi MK. RNA Structure Prediction, Analysis, and Design: An Introduction to Web-Based Tools. Methods Mol Biol 2022; 2518:253-269. [PMID: 35666450 DOI: 10.1007/978-1-0716-2421-0_15]
Abstract
Understanding RNA structure has become critical to studying the roles of RNA as mediators of biological processes. To aid in these studies, computational algorithms that use thermodynamics have been developed to predict RNA secondary structure. Because intermolecular interactions are important, the algorithms have been extended to predict RNA-RNA hybridization. This chapter discusses popular web servers that offer tools for RNA secondary structure prediction, RNA-RNA hybridization, and design. We address the key features that distinguish programs with similar functions, and the purposes each serves for the user. Ultimately, we hope this review highlights web-based tools that researchers can take advantage of in their investigations of RNA structure and function.
Affiliation(s)
- Melissa K Takahashi
- Department of Biology, California State University Northridge, Northridge, CA, USA.
15
Li S, Zhang H, Zhang L, Liu K, Liu B, Mathews DH, Huang L. LinearTurboFold: Linear-time global prediction of conserved structures for RNA homologs with applications to SARS-CoV-2. Proc Natl Acad Sci U S A 2021; 118:e2116269118. [PMID: 34887342 PMCID: PMC8719904 DOI: 10.1073/pnas.2116269118] [Accepted: 11/05/2021]
Abstract
The constant emergence of COVID-19 variants reduces the effectiveness of existing vaccines and test kits. Therefore, it is critical to identify conserved structures in severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genomes as potential targets for variant-proof diagnostics and therapeutics. However, the algorithms to predict these conserved structures, which simultaneously fold and align multiple RNA homologs, scale at best cubically with sequence length and are thus infeasible for coronaviruses, which possess the longest genomes (∼30,000 nt) among RNA viruses. As a result, existing efforts on modeling SARS-CoV-2 structures resort to single-sequence folding as well as local folding methods with short window sizes, which inevitably neglect long-range interactions that are crucial in RNA functions. Here we present LinearTurboFold, an efficient algorithm for folding RNA homologs that scales linearly with sequence length, enabling unprecedented global structural analysis on SARS-CoV-2. Surprisingly, on a group of SARS-CoV-2 and SARS-related genomes, LinearTurboFold's purely in silico prediction not only is close to experimentally guided models for local structures, but also goes far beyond them by capturing the end-to-end pairs between 5' and 3' untranslated regions (UTRs) (∼29,800 nt apart) that match perfectly with a purely experimental work. Furthermore, LinearTurboFold identifies undiscovered conserved structures and conserved accessible regions as potential targets for designing efficient and mutation-insensitive small-molecule drugs, antisense oligonucleotides, small interfering RNAs (siRNAs), CRISPR-Cas13 guide RNAs, and RT-PCR primers. LinearTurboFold is a general technique that can also be applied to other RNA viruses and full-length genome studies and will be a useful tool in fighting the current and future pandemics.
Affiliation(s)
- Sizhen Li
- School of Electrical Engineering & Computer Science, Oregon State University, Corvallis, OR 97331
- He Zhang
- Baidu Research, Sunnyvale, CA 94089
- School of Electrical Engineering & Computer Science, Oregon State University, Corvallis, OR 97331
- Liang Zhang
- School of Electrical Engineering & Computer Science, Oregon State University, Corvallis, OR 97331
- Baidu Research, Sunnyvale, CA 94089
- Kaibo Liu
- Baidu Research, Sunnyvale, CA 94089
- School of Electrical Engineering & Computer Science, Oregon State University, Corvallis, OR 97331
- David H Mathews
- Department of Biochemistry & Biophysics, University of Rochester Medical Center, Rochester, NY 14642
- Center for RNA Biology, University of Rochester Medical Center, Rochester, NY 14642
- Department of Biostatistics & Computational Biology, University of Rochester Medical Center, Rochester, NY 14642
- Liang Huang
- School of Electrical Engineering & Computer Science, Oregon State University, Corvallis, OR 97331
- Baidu Research, Sunnyvale, CA 94089
16
Li S, Zhang H, Zhang L, Liu K, Liu B, Mathews DH, Huang L. LinearTurboFold: Linear-Time Global Prediction of Conserved Structures for RNA Homologs with Applications to SARS-CoV-2. bioRxiv 2021:2020.11.23.393488. [PMID: 34816262 PMCID: PMC8609897 DOI: 10.1101/2020.11.23.393488]
Abstract
The constant emergence of COVID-19 variants reduces the effectiveness of existing vaccines and test kits. Therefore, it is critical to identify conserved structures in SARS-CoV-2 genomes as potential targets for variant-proof diagnostics and therapeutics. However, the algorithms to predict these conserved structures, which simultaneously fold and align multiple RNA homologs, scale at best cubically with sequence length, and are thus infeasible for coronaviruses, which possess the longest genomes (∼30,000 nt) among RNA viruses. As a result, existing efforts on modeling SARS-CoV-2 structures resort to single-sequence folding as well as local folding methods with short window sizes, which inevitably neglect long-range interactions that are crucial in RNA functions. Here we present LinearTurboFold, an efficient algorithm for folding RNA homologs that scales linearly with sequence length, enabling unprecedented global structural analysis on SARS-CoV-2. Surprisingly, on a group of SARS-CoV-2 and SARS-related genomes, LinearTurboFold's purely in silico prediction not only is close to experimentally guided models for local structures, but also goes far beyond them by capturing the end-to-end pairs between 5' and 3' UTRs (∼29,800 nt apart) that match perfectly with a purely experimental work. Furthermore, LinearTurboFold identifies novel conserved structures and conserved accessible regions as potential targets for designing efficient and mutation-insensitive small-molecule drugs, antisense oligonucleotides, siRNAs, CRISPR-Cas13 guide RNAs and RT-PCR primers. LinearTurboFold is a general technique that can also be applied to other RNA viruses and full-length genome studies, and will be a useful tool in fighting the current and future pandemics.
SIGNIFICANCE STATEMENT Conserved RNA structures are critical for designing diagnostic and therapeutic tools for many diseases, including COVID-19. However, existing algorithms are much too slow to model the global structures of full-length RNA viral genomes. We present LinearTurboFold, a linear-time algorithm that is orders of magnitude faster, making it the first method to simultaneously fold and align whole genomes of SARS-CoV-2 variants, the longest known RNA virus (∼30 kilobases). Our work enables unprecedented global structural analysis and captures long-range interactions that are out of reach for existing algorithms but crucial for RNA functions. LinearTurboFold is a general technique for full-length genome studies and can help fight the current and future pandemics.
Affiliation(s)
- Sizhen Li
- School of Electrical Engineering & Computer Science, Oregon State University, Corvallis, OR
- He Zhang
- School of Electrical Engineering & Computer Science, Oregon State University, Corvallis, OR
- Baidu Research, Sunnyvale, CA
- Liang Zhang
- School of Electrical Engineering & Computer Science, Oregon State University, Corvallis, OR
- Baidu Research, Sunnyvale, CA
- Kaibo Liu
- School of Electrical Engineering & Computer Science, Oregon State University, Corvallis, OR
- Baidu Research, Sunnyvale, CA
- David H. Mathews
- Department of Biochemistry & Biophysics, Center for RNA Biology, and Department of Biostatistics & Computational Biology, University of Rochester Medical Center, Rochester, NY
- Liang Huang
- School of Electrical Engineering & Computer Science, Oregon State University, Corvallis, OR
- Baidu Research, Sunnyvale, CA
17
Abdelsattar AS, Mansour Y, Aboul-Ela F. The Perturbed Free-Energy Landscape: Linking Ligand Binding to Biomolecular Folding. Chembiochem 2021; 22:1499-1516. [PMID: 33351206 DOI: 10.1002/cbic.202000695] [Received: 10/06/2020] [Revised: 12/19/2020]
Abstract
The effects of ligand binding on biomolecular conformation are crucial in drug design, enzyme mechanisms, the regulation of gene expression, and other biological processes. Descriptive models such as "lock and key", "induced fit", and "conformation selection" are common ways to interpret such interactions. Another historical model, linked equilibria, proposes that the free-energy landscape (FEL) is perturbed by the addition of ligand binding energy for the bound population of biomolecules. This principle leads to a unified, quantitative theory of ligand-induced conformation change, building upon the FEL concept. We call the map of binding free energy over biomolecular conformational space the "binding affinity landscape" (BAL). The perturbed FEL predicts and explains ligand-induced conformational changes consistent with all common descriptive models. We review recent experimental and computational studies that exemplify the perturbed FEL, with emphasis on RNA. This way of understanding ligand-induced conformation dynamics motivates new experimental and theoretical approaches to ligand design, structural biology and systems biology.
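The linked-equilibria idea can be made concrete with a minimal sketch. It assumes single-site binding with one dissociation constant per conformation. Under that assumption, ligand at free concentration [L] lowers each conformation's free energy by RT ln(1 + [L]/K_d), and the populations follow from Boltzmann weighting of the perturbed energies. The function name, the units (kcal/mol, molar), the 37 °C default and the two-state example are illustrative assumptions. They are not the review's notation.

```python
import math

R = 1.987204e-3  # gas constant, kcal/(mol*K)


def populations(free_energies, kds, ligand, temp=310.15):
    """Boltzmann populations of conformations on a ligand-perturbed landscape.

    free_energies: ligand-free free energy of each conformation (kcal/mol)
    kds: ligand dissociation constant for each conformation (M); use math.inf
         for a conformation that does not bind
    ligand: free ligand concentration (M)
    """
    rt = R * temp
    weights = []
    for g, kd in zip(free_energies, kds):
        # Single-site binding polynomial: G_eff = G - RT ln(1 + [L]/Kd).
        # A non-binding conformation (Kd = inf) is left unchanged.
        g_eff = g - rt * math.log1p(ligand / kd)
        weights.append(math.exp(-g_eff / rt))
    total = sum(weights)
    return [w / total for w in weights]


# Conformation B is 1 kcal/mol less stable than A but is the only one that binds.
print(populations([0.0, 1.0], [math.inf, 1e-8], 0.0))   # B is about 16% populated
print(populations([0.0, 1.0], [math.inf, 1e-8], 1e-6))  # B is about 95% populated
```

In this toy case, 1 µM ligand binding only the minor conformation with K_d = 10 nM shifts the landscape enough to make that conformation dominant. This is the conformational-selection picture expressed as a perturbed free-energy landscape.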
Affiliation(s)
- Abdallah S Abdelsattar
- Center for X-Ray Determination of the Structure of Matter, Zewail City of Science and Technology, Ahmed Zewail Road, October Gardens, 12578, Giza, Egypt
- Youssef Mansour
- Center for X-Ray Determination of the Structure of Matter, Zewail City of Science and Technology, Ahmed Zewail Road, October Gardens, 12578, Giza, Egypt
- Fareed Aboul-Ela
- Center for X-Ray Determination of the Structure of Matter, Zewail City of Science and Technology, Ahmed Zewail Road, October Gardens, 12578, Giza, Egypt
18
Tickner ZJ, Zhong G, Sheptack KR, Farzan M. Selection of High-Affinity RNA Aptamers That Distinguish between Doxycycline and Tetracycline. Biochemistry 2020; 59:3473-3486. [PMID: 32857495 DOI: 10.1021/acs.biochem.0c00586]
Abstract
Oligonucleotide aptamers are found in prokaryotes and eukaryotes, and they can be selected from large synthetic libraries to bind protein or small-molecule ligands with high affinities and specificities. Aptamers can function as biosensors, as protein recognition elements, and as components of riboswitches allowing ligand-dependent control of gene expression. One of the best studied laboratory-selected aptamers binds the antibiotic tetracycline, but it binds with a much lower affinity to the closely related but more bioavailable antibiotic doxycycline. Here we report enrichment of doxycycline-binding aptamers from a selectively randomized library of tetracycline aptamer variants over four selection rounds. Selected aptamers distinguish between doxycycline, which they bind with dissociation constants of approximately 7 nM, and tetracycline, to which they show no detectable binding. They thus function as orthogonal complements to the original tetracycline aptamer. Unexpectedly, doxycycline aptamers adopt a conformation distinct from that of the tetracycline aptamer and depend on constant regions originally installed as primer binding sites. We show that the fluorescence emission intensity of doxycycline increases upon aptamer binding, permitting their use as biosensors. This new class of aptamers can be used in multiple contexts where doxycycline detection, or doxycycline-mediated regulation, is necessary.
Affiliation(s)
- Zachary J Tickner
- Department of Immunology and Microbiology, The Scripps Research Institute, Jupiter, Florida 33458, United States
- Guocai Zhong
- Department of Immunology and Microbiology, The Scripps Research Institute, Jupiter, Florida 33458, United States
- Kelly R Sheptack
- Department of Immunology and Microbiology, The Scripps Research Institute, Jupiter, Florida 33458, United States
- Michael Farzan
- Department of Immunology and Microbiology, The Scripps Research Institute, Jupiter, Florida 33458, United States
19
Bayegan AH, Clote P. RNAmountAlign: Efficient software for local, global, semiglobal pairwise and multiple RNA sequence/structure alignment. PLoS One 2020; 15:e0227177. [PMID: 31978147 PMCID: PMC6980424 DOI: 10.1371/journal.pone.0227177] [Received: 08/10/2018] [Accepted: 12/13/2019]
Abstract
Alignment of structural RNAs is an important problem with a wide range of applications. Since function is often determined by molecular structure, RNA alignment programs should take into account both sequence and base-pairing information for structural homology identification. This paper describes C++ software, RNAmountAlign, for RNA sequence/structure alignment that runs in O(n^3) time and O(n^2) space for two sequences of length n; moreover, our software returns a p-value (transformable to expect value E) based on Karlin-Altschul statistics for local alignment, as well as parameter fitting for local and global alignment. Using incremental mountain height, a representation of structural information computable in cubic time, RNAmountAlign implements quadratic time pairwise local, global and global/semiglobal (query search) alignment using a weighted combination of sequence and structural similarity. RNAmountAlign is capable of performing progressive multiple alignment as well. Benchmarking of RNAmountAlign against LocARNA, LARA, FOLDALIGN, DYNALIGN, STRAL, MXSCARNA, and MUSCLE shows that RNAmountAlign has reasonably good accuracy and faster run time supporting all alignment types. Additionally, our extension of RNAmountAlign, called RNAmountAlignScan, which scans a target genome sequence to find hits having high sequence and structural similarity to a given query sequence, outperforms RSEARCH and sequence-only query scans and runs faster than FOLDALIGN query scan.
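Mountain heights can be illustrated both for a single dot-bracket structure and for base-pair probabilities. In the sketch below, the incremental height at position k is the probability that k opens a pair with a downstream partner minus the probability that it closes a pair with an upstream partner. This definition is an assumption for illustration. RNAmountAlign's exact definition and weighting may differ. For a single structure, the increments reduce to +1, -1 and 0 for '(', ')' and '.'. The function names are hypothetical.

```python
def pairs_from_dot_bracket(db):
    """Map each base pair (i, j), 1-based, to probability 1.0 (assumes balanced brackets)."""
    stack, pairs = [], {}
    for pos, ch in enumerate(db, start=1):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            pairs[(stack.pop(), pos)] = 1.0
    return pairs


def incremental_heights(n, pair_probs):
    """Incremental mountain height for positions 1..n from {(i, j): p} with i < j.

    Position i gains p for opening a pair; position j loses p for closing it,
    so each value lies in [-1, 1] when the probabilities are consistent.
    """
    inc = [0.0] * (n + 1)  # index 0 unused; positions are 1-based
    for (i, j), p in pair_probs.items():
        inc[i] += p
        inc[j] -= p
    return inc[1:]


def mountain(increments):
    """Running sum of the increments: the (expected) mountain height at each position."""
    heights, h = [], 0.0
    for d in increments:
        h += d
        heights.append(h)
    return heights


inc = incremental_heights(7, pairs_from_dot_bracket("((..))."))
print(inc)            # [1.0, 1.0, 0.0, 0.0, -1.0, -1.0, 0.0]
print(mountain(inc))  # [1.0, 2.0, 2.0, 2.0, 1.0, 0.0, 0.0]
```

Because each position is summarized by one number, two such profiles can be compared position by position in a standard sequence-alignment dynamic program. This is one way to read the abstract's point that alignment runs in quadratic time once the structural representation is computed.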
Affiliation(s)
- Amir H. Bayegan
- Biology Department, Boston College, Chestnut Hill, MA, United States of America
- Peter Clote
- Biology Department, Boston College, Chestnut Hill, MA, United States of America
20
Andrews RJ, Moss WN. Computational approaches for the discovery of splicing regulatory RNA structures. Biochim Biophys Acta Gene Regul Mech 2019; 1862:194380. [PMID: 31048028 DOI: 10.1016/j.bbagrm.2019.04.007] [Received: 03/26/2019] [Revised: 04/15/2019] [Accepted: 04/16/2019]
Abstract
Global RNA structure and local functional motifs mediate interactions important in determining the rates and patterns of mRNA splicing. In this review, we survey approaches for the computational prediction of RNA secondary structure, with special emphasis on the discovery of motifs important to RNA splicing. The process of identifying and modeling potential splicing regulatory structures is illustrated using a recently developed approach for RNA structural motif discovery, the ScanFold pipeline, which is applied to the identification of a known splicing regulatory structure in influenza virus.
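A ScanFold-style scan can be outlined as sliding windows plus a thermodynamic z-score:

z = (ΔG_native - mean(ΔG_shuffled)) / sd(ΔG_shuffled)

A strongly negative z suggests that a window is more stable than random sequences of the same base composition. The sketch below assumes mononucleotide shuffling. It leaves the free-energy calculation to an external folding program, so only the windowing, shuffling and z-score arithmetic are shown. All names and parameters are illustrative, not ScanFold's interface.

```python
import random
import statistics


def windows(seq, size, step):
    """Yield (1-based start, window) for every full-length window."""
    for start in range(0, len(seq) - size + 1, step):
        yield start + 1, seq[start:start + size]


def shuffled_copies(window, n, seed=0):
    """n mononucleotide shuffles of a window (base composition is preserved)."""
    rng = random.Random(seed)
    copies = []
    for _ in range(n):
        chars = list(window)
        rng.shuffle(chars)
        copies.append("".join(chars))
    return copies


def z_score(native_energy, shuffled_energies):
    """(native - mean) / sample standard deviation; needs at least two shuffled energies."""
    mean = statistics.mean(shuffled_energies)
    sd = statistics.stdev(shuffled_energies)
    if sd == 0:
        raise ValueError("shuffled energies have zero spread; z-score undefined")
    return (native_energy - mean) / sd


print(list(windows("ACGUACGU", 4, 2)))  # [(1, 'ACGU'), (3, 'GUAC'), (5, 'ACGU')]
print(z_score(-30.0, [-20.0, -22.0, -18.0]))  # -5.0
```

In practice, each window and each of its shuffles would be folded with a minimum free energy program. The resulting energies would then be passed to `z_score`, and windows with consistently low z-scores would be flagged as candidate structured regions.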
Affiliation(s)
- Ryan J Andrews
- Roy J. Carver Department of Biochemistry, Biophysics, and Molecular Biology, Iowa State University, 2437 Pammel Drive, Ames, IA 50011, USA
- Walter N Moss
- Roy J. Carver Department of Biochemistry, Biophysics, and Molecular Biology, Iowa State University, 2437 Pammel Drive, Ames, IA 50011, USA.
21
Mathews DH. How to benchmark RNA secondary structure prediction accuracy. Methods 2019; 162-163:60-67. [PMID: 30951834 DOI: 10.1016/j.ymeth.2019.04.003] [Received: 12/31/2018] [Revised: 03/24/2019] [Accepted: 04/01/2019]
Abstract
RNA secondary structure prediction is widely used. As new methods are developed, these are often benchmarked for accuracy against existing methods. This review discusses good practices for performing these benchmarks, including the choice of benchmarking structures, metrics to quantify accuracy, the importance of allowing flexibility for pairs in the accepted structure, and the importance of statistical testing for significance.
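The accuracy metrics discussed in such benchmarks can be sketched as follows:

- Sensitivity is the fraction of accepted pairs that are predicted.
- Positive predictive value (PPV) is the fraction of predicted pairs that appear in the accepted structure.
- F1 is the harmonic mean of sensitivity and PPV.

The optional flexibility rule below counts a pair (i, j) as correct if (i±1, j) or (i, j±1) matches. This is an illustrative assumption about how "flexibility for pairs" can be implemented, not the review's exact procedure. The function names are hypothetical.

```python
def _matches(pair, reference, slip):
    """True if pair (i, j) is in reference, optionally allowing one-nucleotide slippage."""
    i, j = pair
    if (i, j) in reference:
        return True
    if slip:
        neighbours = [(i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)]
        return any(n in reference for n in neighbours)
    return False


def score(predicted, accepted, slip=True):
    """Return (sensitivity, PPV, F1) for two collections of base pairs."""
    pred = {(min(a, b), max(a, b)) for a, b in predicted}
    acc = {(min(a, b), max(a, b)) for a, b in accepted}
    found = sum(_matches(p, pred, slip) for p in acc)    # accepted pairs recovered
    correct = sum(_matches(p, acc, slip) for p in pred)  # predicted pairs supported
    sensitivity = found / len(acc) if acc else 0.0
    ppv = correct / len(pred) if pred else 0.0
    denom = sensitivity + ppv
    f1 = 2 * sensitivity * ppv / denom if denom else 0.0
    return sensitivity, ppv, f1


accepted = [(1, 20), (2, 19), (3, 18), (4, 17)]
predicted = [(1, 20), (2, 19), (3, 17), (5, 10)]
print(score(predicted, accepted, slip=False))  # (0.5, 0.5, 0.5)
print(score(predicted, accepted))              # sensitivity 1.0, PPV 0.75, F1 about 0.857
```

With slippage, one predicted pair can be credited against more than one accepted pair, as (3, 17) is here. Whether that is acceptable, and whether the difference between two methods is statistically significant, are exactly the benchmarking choices the review argues should be made explicit.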
Affiliation(s)
- David H Mathews
- Center for RNA Biology, Department of Biochemistry & Biophysics, and Department of Biostatistics & Computational Biology, University of Rochester Medical Center, 601 Elmwood Avenue, Box 712, Rochester, NY 14642, United States.
22
Zhao M, Li W, Liu K, Li H, Lan X. C4-HSL aptamers for blocking qurom sensing and inhibiting biofilm formation in Pseudomonas aeruginosa and its structure prediction and analysis. PLoS One 2019; 14:e0212041. [PMID: 30779754 PMCID: PMC6380626 DOI: 10.1371/journal.pone.0212041] [Received: 08/20/2018] [Accepted: 01/25/2019]
Abstract
This study aimed to screen DNA aptamers against the signal molecule C4-HSL of the rhl system to inhibit biofilm formation by Pseudomonas aeruginosa, using an improved systematic evolution of ligands by exponential enrichment (SELEX) method based on a structure-switching fluorescent activating bead. Aptamers against C4-HSL with high affinity and specificity were successfully obtained and evaluated in real time by this method. In vitro biofilm inhibition experiments showed that the aptamers reduced P. aeruginosa biofilm formation to about one-third of that in groups without the aptamers. Independent secondary structure simulation and computer-aided tertiary structure prediction (3dRNA) showed that the aptamers contained a highly conserved Y-shaped structural unit. This study therefore benefits the search for new methods for detecting and treating P. aeruginosa biofilm formation.
Affiliation(s)
- Meng Zhao
- Second Military Medical University, Shanghai, China
- Institute for Laboratory Medicine, The 900th Hospital of Joint Service Support Force, Fuzhou, Fujian, China
- Weibin Li
- Institute for Laboratory Medicine, The 900th Hospital of Joint Service Support Force, Fuzhou, Fujian, China
- Kuancan Liu
- Institute for Laboratory Medicine, The 900th Hospital of Joint Service Support Force, Fuzhou, Fujian, China
- Huiling Li
- Institute for Laboratory Medicine, The 900th Hospital of Joint Service Support Force, Fuzhou, Fujian, China
- Xiaopeng Lan
- Second Military Medical University, Shanghai, China
- Institute for Laboratory Medicine, The 900th Hospital of Joint Service Support Force, Fuzhou, Fujian, China
23
Lin L, McKerrow WH, Richards B, Phonsom C, Lawrence CE. Characterization and visualization of RNA secondary structure Boltzmann ensemble via information theory. BMC Bioinformatics 2018; 19:82. [PMID: 29506466 PMCID: PMC5836418 DOI: 10.1186/s12859-018-2078-5] [Received: 09/15/2017] [Accepted: 02/20/2018]
Abstract
Background The nearest neighbor model and associated dynamic programming algorithms allow for the efficient estimation of the RNA secondary structure Boltzmann ensemble. However, because a given RNA secondary structure contains only a fraction of the possible helices that could form from a given sequence, the Boltzmann ensemble is multimodal. Several methods exist for clustering structures and finding those modes. However, less attention has been given to the underlying reason for this multimodality: the presence of conflicting basepairs. Information theory, or more specifically mutual information, provides a method to identify the basepairs that are key to the secondary structure. Results To this end, we find the most informative basepairs and visualize their effect on the secondary structure. Knowing whether a most informative basepair is present tells us not only the status of that particular pair but also a large amount of information about which other pairs are or are not present. We find that a few basepairs account for a large amount of the structural uncertainty. The identification of these pairs indicates small changes to sequence or stability that will have a large effect on structure. Conclusion We provide a novel algorithm that uses mutual information to identify the key basepairs that lead to a multimodal Boltzmann distribution. We then visualize the effect of these pairs on the overall Boltzmann ensemble.
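Mutual information between two basepairs can be illustrated with a sample of structures, for example from stochastic sampling of the Boltzmann ensemble. Each basepair is treated as a binary variable that is present or absent in a structure. The sketch below is a plug-in estimate from samples. It is an assumption for illustration only: the paper's algorithm may compute these quantities from the ensemble directly rather than from samples. The function name is hypothetical.

```python
import math
from collections import Counter


def mutual_information(samples, pair_a, pair_b):
    """Plug-in estimate, in bits, of I(A; B), where A and B indicate whether
    pair_a and pair_b are present in a structure.

    samples: list of sampled structures, each a set of (i, j) basepairs.
    """
    n = len(samples)
    joint = Counter((pair_a in s, pair_b in s) for s in samples)
    marg_a = Counter(pair_a in s for s in samples)
    marg_b = Counter(pair_b in s for s in samples)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((marg_a[a] / n) * (marg_b[b] / n)))
    return mi


# Two mutually exclusive (conflicting) pairs, each present in half of the samples.
conflicting = [{(1, 10)}, {(5, 15)}] * 2
# Two pairs that occur independently of each other.
independent = [{(1, 10), (5, 15)}, {(1, 10)}, {(5, 15)}, set()]
print(mutual_information(conflicting, (1, 10), (5, 15)))  # 1.0
print(mutual_information(independent, (1, 10), (5, 15)))  # 0.0
```

Conflicting pairs that split the ensemble into two modes carry a full bit of information about each other. This is why a small number of such pairs can account for much of the structural uncertainty.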
Affiliation(s)
- Luan Lin
- Center for Devices and Radiological Health, U.S. Food and Drug Administration, Silver Spring, 20993, MD, USA
- Wilson H McKerrow
- Division of Applied Mathematics, Brown University, Providence, 02912, RI, USA
- Chukiat Phonsom
- Department of Mathematics, University of Southern California, Los Angeles, 90089, CA, USA
- Charles E Lawrence
- Division of Applied Mathematics, Brown University, Providence, 02912, RI, USA.
24
Identification and functional characterization of bacterial small non-coding RNAs and their target: A review. Gene Rep 2018. [DOI: 10.1016/j.genrep.2018.01.001]
25
Arslan AN, Anandan J, Fry E, Monschke K, Ganneboina N, Bowerman J. Efficient RNA structure comparison algorithms. J Bioinform Comput Biol 2017; 15:1740009. [DOI: 10.1142/s0219720017400091]
Abstract
Recently proposed relative addressing-based ([Formula: see text]) RNA secondary structure representation has important features by which an RNA structure database can be stored into a suffix array. A fast substructure search algorithm has been proposed based on binary search on this suffix array. Using this substructure search algorithm, we present a fast algorithm that finds the largest common substructure of given multiple RNA structures in [Formula: see text] format. The multiple RNA structure comparison problem is NP-hard in its general formulation. We introduced a new problem for comparing multiple RNA structures. This problem has more strict similarity definition and objective, and we propose an algorithm that solves this problem efficiently. We also develop another comparison algorithm that iteratively calls this algorithm to locate nonoverlapping large common substructures in compared RNAs. With the new resulting tools, we improved the RNASSAC website (linked from http://faculty.tamuc.edu/aarslan ). This website now also includes two drawing tools: one specialized for preparing RNA substructures that can be used as input by the search tool, and another one for automatically drawing the entire RNA structure from a given structure sequence.
Affiliation(s)
- Abdullah N. Arslan
- Department of Computer Science, Texas A&M University-Commerce, Commerce, TX 75428, USA
- Jithendar Anandan
- Department of Computer Science, Texas A&M University-Commerce, Commerce, TX 75428, USA
- Eric Fry
- Department of Computer Science, Texas A&M University-Commerce, Commerce, TX 75428, USA
- Keith Monschke
- Department of Computer Science, Texas A&M University-Commerce, Commerce, TX 75428, USA
- Nitin Ganneboina
- Department of Computer Science, Texas A&M University-Commerce, Commerce, TX 75428, USA
- Jason Bowerman
- Department of Computer Science, Texas A&M University-Commerce, Commerce, TX 75428, USA
26
Tan Z, Fu Y, Sharma G, Mathews DH. TurboFold II: RNA structural alignment and secondary structure prediction informed by multiple homologs. Nucleic Acids Res 2017; 45:11570-11581. [PMID: 29036420 PMCID: PMC5714223 DOI: 10.1093/nar/gkx815] [Received: 02/06/2017] [Accepted: 09/12/2017]
Abstract
This paper presents TurboFold II, an extension of the TurboFold algorithm for predicting secondary structures for multiple RNA homologs. TurboFold II augments the structure prediction capabilities of TurboFold by additionally providing multiple sequence alignments. Probabilities for alignment of nucleotide positions between all pairs of input sequences are iteratively estimated in TurboFold II by incorporating information from both the sequence identity and secondary structures. A multiple sequence alignment is obtained from these probabilities by using a probabilistic consistency transformation and a hierarchically computed guide tree. To assess TurboFold II, its sequence alignment and structure predictions were compared with leading tools, including methods that focus on alignment alone and methods that provide both alignment and structure prediction. TurboFold II has comparable alignment accuracy with MAFFT and higher accuracy than other tools. TurboFold II also has comparable structure prediction accuracy as the original TurboFold algorithm, which is one of the most accurate methods. TurboFold II is part of the RNAstructure software package, which is freely available for download at http://rna.urmc.rochester.edu under a GPL license.
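The probabilistic consistency transformation mentioned above can be sketched in its common unweighted form. The alignment probability of x_i with y_j is re-estimated by averaging, over every sequence z in the set (including x and y themselves), the sum over k of P(x_i ~ z_k) · P(z_k ~ y_j). Here P(x ~ x) is the identity. This is an assumption for illustration. TurboFold II's exact formulation, including any weighting or iteration with structure information, may differ. The function name and the matrix layout are hypothetical.

```python
def consistency_transform(post, lengths):
    """One round of unweighted probabilistic consistency transformation.

    post[(x, y)]: len_x-by-len_y nested list of posterior probabilities that
    position i of sequence x aligns with position j of sequence y. For every
    x != y, at least one orientation, (x, y) or (y, x), must be present.
    Returns transformed matrices for every ordered pair x != y.
    """
    n_seq = len(lengths)

    def get(x, z):
        if x == z:
            # A sequence aligns with itself only along the diagonal.
            return [[1.0 if i == k else 0.0 for k in range(lengths[x])]
                    for i in range(lengths[x])]
        if (x, z) in post:
            return post[(x, z)]
        # Only the other orientation is stored: transpose it.
        return [list(col) for col in zip(*post[(z, x)])]

    new = {}
    for x in range(n_seq):
        for y in range(n_seq):
            if x == y:
                continue
            acc = [[0.0] * lengths[y] for _ in range(lengths[x])]
            for z in range(n_seq):
                pxz, pzy = get(x, z), get(z, y)
                for i in range(lengths[x]):
                    for k in range(lengths[z]):
                        if pxz[i][k] == 0.0:
                            continue  # zero terms contribute nothing
                        for j in range(lengths[y]):
                            acc[i][j] += pxz[i][k] * pzy[k][j]
            new[(x, y)] = [[v / n_seq for v in row] for row in acc]
    return new


identity = [[1.0, 0.0], [0.0, 1.0]]
swap = [[0.0, 1.0], [1.0, 0.0]]
post = {(0, 1): identity, (0, 2): identity, (1, 2): swap}  # an inconsistent triple
out = consistency_transform(post, [2, 2, 2])
print(out[(0, 1)])  # roughly [[0.667, 0.333], [0.333, 0.667]]
```

In the toy example, sequence 2's disagreeing evidence softens the direct 0-to-1 alignment. This is the intended effect of consistency: pairwise alignment probabilities are reconciled with the information carried by the other homologs before a multiple alignment is built.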
Affiliation(s)
- Zhen Tan
- Department of Biochemistry and Biophysics, University of Rochester Medical Center, 601 Elmwood Avenue, Box 712, Rochester, NY 14642, USA
- Center for RNA Biology, University of Rochester Medical Center, 601 Elmwood Avenue, Box 712, Rochester, NY 14642, USA
- Yinghan Fu
- Department of Biochemistry and Biophysics, University of Rochester Medical Center, 601 Elmwood Avenue, Box 712, Rochester, NY 14642, USA
- Center for RNA Biology, University of Rochester Medical Center, 601 Elmwood Avenue, Box 712, Rochester, NY 14642, USA
- Gaurav Sharma
- Center for RNA Biology, University of Rochester Medical Center, 601 Elmwood Avenue, Box 712, Rochester, NY 14642, USA
- Department of Electrical and Computer Engineering, University of Rochester, Hopeman 204, RC Box 270126, Rochester, NY 14627, USA
- Department of Biostatistics and Computational Biology, University of Rochester Medical Center, 601 Elmwood Avenue, Box 630, Rochester, NY 14642, USA
- David H Mathews
- Department of Biochemistry and Biophysics, University of Rochester Medical Center, 601 Elmwood Avenue, Box 712, Rochester, NY 14642, USA
- Center for RNA Biology, University of Rochester Medical Center, 601 Elmwood Avenue, Box 712, Rochester, NY 14642, USA
- Department of Biostatistics and Computational Biology, University of Rochester Medical Center, 601 Elmwood Avenue, Box 630, Rochester, NY 14642, USA
27
Abstract
The secondary structure of an RNA molecule represents the base-pairing interactions within the molecule and fundamentally determines its overall structure. In this chapter, we give an overview of the main approaches and existing tools for predicting RNA secondary structures, as well as methods for identifying noncoding RNAs from genomic sequences or RNA sequencing data. We then focus on the identification of a well-known class of small noncoding RNAs, the microRNAs, which play important roles in many biological processes through post-transcriptional regulation of gene expression and whose dysregulation has been implicated in several human diseases.
Affiliation(s)
- Fariza Tahi
- IBISC, UEVE/Genopole, 23 bv. de France, 91000, Evry, France.
- IPS2, University of Paris-Saclay, 91190, Gif-sur-Yvette, France.
- Van Du T Tran
- Vital-IT group, SIB Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
- Anouar Boucheham
- IBISC, UEVE/Genopole, 23 bv. de France, 91000, Evry, France
- College of NTIC, Constantine University 2, Constantine, Algeria
28
Abstract
Deciphering the folding pathways and predicting the structures of complex three-dimensional biomolecules is central to elucidating biological function. RNA is single-stranded, which gives it the freedom to fold into complex secondary and tertiary structures. These structures endow RNA with the ability to perform complex chemistries and functions ranging from enzymatic activity to gene regulation. Given that RNA is involved in many essential cellular processes, it is critical to understand how it folds and functions in vivo. Within the last few years, methods have been developed to probe RNA structures in vivo and genome-wide. These studies reveal that RNA often adopts very different structures in vivo and in vitro, and provide profound insights into RNA biology. Nonetheless, both in vitro and in vivo approaches have limitations: studies in the complex and uncontrolled cellular environment make it difficult to obtain insight into RNA folding pathways and thermodynamics, and studies in vitro often lack direct cellular relevance, leaving a gap in our knowledge of RNA folding in vivo. This gap is being bridged by biophysical and mechanistic studies of RNA structure and function under conditions that mimic the cellular environment. To date, most artificial cytoplasms have used various polymers as molecular crowding agents and a series of small molecules as cosolutes. Studies under such in vivo-like conditions are yielding fresh insights, such as cooperative folding of functional RNAs and increased activity of ribozymes. These observations are accounted for in part by molecular crowding effects and interactions with other molecules. In this review, we report milestones in RNA folding in vitro and in vivo and discuss ongoing experimental and computational efforts to bridge the gap between these two conditions in order to understand how RNA folds in the cell.
29
Abstract
Experimental probing data can be used to improve the accuracy of RNA secondary structure prediction. The software package RNAstructure can take advantage of enzymatic cleavage data, FMN cleavage data, traditional chemical modification reactivity data, and SHAPE reactivity data for secondary structure modeling. This chapter provides protocols for using experimental probing data with RNAstructure to restrain or constrain RNA secondary structure prediction.
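SHAPE reactivities are commonly converted into a per-nucleotide pseudo-free energy change, ΔG_SHAPE(i) = m ln[SHAPE(i) + 1] + b, which is applied to nucleotides in stacked base pairs during structure prediction (Deigan et al., 2009). The minimal sketch below computes only that term. The slope m = 2.6 kcal/mol and intercept b = -0.8 kcal/mol are assumptions taken from that original work; the defaults of a given RNAstructure release should be checked in its documentation. Treating missing data as no restraint and clamping negative reactivities to zero are further assumptions of this sketch, and the function name is hypothetical.

```python
import math
from typing import Optional


def shape_pseudo_energy(reactivity: Optional[float],
                        slope: float = 2.6,
                        intercept: float = -0.8) -> float:
    """Pseudo-free energy (kcal/mol) for one nucleotide: slope * ln(reactivity + 1) + intercept.

    slope and intercept defaults are assumed values, not verified RNAstructure defaults.
    """
    if reactivity is None:
        return 0.0  # assumption: no data means no restraint
    r = max(reactivity, 0.0)  # assumption: negative reactivities treated as zero
    return slope * math.log(r + 1.0) + intercept


for value in (0.0, 0.5, 2.0):
    print(value, round(shape_pseudo_energy(value), 3))
```

A reactivity of zero yields the intercept, a small bonus that favors pairing, while a high reactivity yields a positive term that discourages pairing of that nucleotide.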
Affiliation(s)
- Zhenjiang Zech Xu
- Department of Biochemistry & Biophysics, University of Rochester Medical Center, 601 Elmwood Avenue, Box 712, Rochester, NY, 14642, USA
- Center for RNA Biology, University of Rochester Medical Center, 601 Elmwood Avenue, Box 712, Rochester, NY, 14642, USA
- David H Mathews
- Department of Biochemistry & Biophysics, University of Rochester Medical Center, 601 Elmwood Avenue, Box 712, Rochester, NY, 14642, USA.
- Center for RNA Biology, University of Rochester Medical Center, 601 Elmwood Avenue, Box 712, Rochester, NY, 14642, USA.
- Department of Biostatistics & Computational Biology, University of Rochester Medical Center, 601 Elmwood Avenue, Box 712, Rochester, NY, 14642, USA.
30
Abstract
RNA secondary structure is often predicted using folding thermodynamics. RNAstructure is a software package that includes structure prediction by free energy minimization, prediction of base pairing probabilities, prediction of structures composed of highly probable base pairs, and prediction of structures with pseudoknots. A user-friendly graphical user interface is provided, and this interface works on Windows, Apple OS X, and Linux. This chapter provides protocols for using RNAstructure for structure prediction.
31
Abstract
RNA structure is conserved by evolution to a greater extent than sequence. Predicting the conserved structure for multiple homologous sequences can be much more accurate than predicting the structure for a single sequence. RNAstructure is a software package that includes the programs Dynalign, Multilign, TurboFold, and PARTS for predicting conserved RNA secondary structure. This chapter provides protocols for using these programs.
32
Moss WN, Steitz JA. In silico discovery and modeling of non-coding RNA structure in viruses. Methods 2015; 91:48-56. [PMID: 26116541 DOI: 10.1016/j.ymeth.2015.06.015] [Citation(s) in RCA: 5] [Received: 04/06/2015] [Revised: 06/17/2015] [Accepted: 06/22/2015]
Abstract
This review covers several computational methods for discovering structured non-coding RNAs in viruses and modeling their putative secondary structures. Here we use examples from two target viruses to highlight these approaches: influenza A virus, a relatively small, segmented RNA virus; and Epstein-Barr virus, a relatively large DNA virus with a complex transcriptome. Each system has unique challenges to overcome and unique characteristics to exploit. From these particular cases, generally useful approaches can be derived for the study of additional viral targets.
Affiliation(s)
- Walter N Moss
- Department of Molecular Biophysics and Biochemistry, Howard Hughes Medical Institute, Yale University School of Medicine, New Haven, CT 06536, USA
- Joan A Steitz
- Department of Molecular Biophysics and Biochemistry, Howard Hughes Medical Institute, Yale University School of Medicine, New Haven, CT 06536, USA.
33
Fu Y, Xu ZZ, Lu ZJ, Zhao S, Mathews DH. Discovery of Novel ncRNA Sequences in Multiple Genome Alignments on the Basis of Conserved and Stable Secondary Structures. PLoS One 2015; 10:e0130200. [PMID: 26075601 PMCID: PMC4468099 DOI: 10.1371/journal.pone.0130200] [Citation(s) in RCA: 23] [Received: 02/23/2015] [Accepted: 05/17/2015]
Abstract
Recently, non-coding RNAs (ncRNAs) have been discovered with novel functions, and it has been appreciated that there is pervasive transcription of genomes. Moreover, many novel ncRNAs are not conserved on the primary sequence level. Therefore, de novo computational ncRNA detection that is accurate and efficient is desirable. The purpose of this study is to develop a ncRNA detection method based on conservation of structure in more than two genomes. A new method called Multifind, using Multilign, was developed. Multilign predicts the common secondary structure for multiple input sequences. Multifind then uses measures of structure conservation to estimate the probability that the input sequences are a conserved ncRNA using a classification support vector machine. Multilign is based on Dynalign, which folds and aligns two sequences simultaneously using a scoring scheme that does not include sequence identity; its structure prediction quality is therefore not affected by input sequence diversity. Additionally, ensemble defect was introduced to Multifind as an additional discriminating feature that quantifies the compactness of the folding space for a sequence. Benchmarks showed Multifind performs better than RNAz and LocARNATE+RNAz, a method that uses RNAz on structure alignments generated by LocARNATE, on testing sequences extracted from the Rfam database. For de novo ncRNA discovery in three genomes, Multifind and LocARNATE+RNAz had an advantage over RNAz in low similarity regions of genome alignments. Additionally, Multifind and LocARNATE+RNAz found different subsets of known ncRNA sequences, suggesting the two approaches are complementary.
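The ensemble defect named above can be computed from base pairing probabilities and a reference structure. This sketch uses the definition from NUPACK-based nucleic acid design: the expected number of nucleotides whose pairing state differs from the reference, n = N - Σ_i P(i, s_i), where s_i is the partner of nucleotide i in the reference (or "unpaired"), and the unpaired probability of i is 1 minus its total pairing probability. The abstract does not say how Multifind chooses the reference structure or normalizes the value, so the dot-bracket input and the division by N are assumptions; the function names are hypothetical.

```python
from typing import Dict, Tuple


def partner_map(structure: str) -> Dict[int, int]:
    """Map each paired position to its partner (0-based, nested brackets only)."""
    stack, partners = [], {}
    for i, c in enumerate(structure):
        if c == "(":
            stack.append(i)
        elif c == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            j = stack.pop()
            partners[i], partners[j] = j, i
    if stack:
        raise ValueError("unmatched '(' in structure")
    return partners


def ensemble_defect(bpp: Dict[Tuple[int, int], float], structure: str,
                    normalized: bool = True) -> float:
    """bpp[(i, j)] with i < j is the probability that i pairs with j."""
    n = len(structure)
    paired = [0.0] * n  # total pairing probability of each nucleotide
    for (i, j), p in bpp.items():
        paired[i] += p
        paired[j] += p
    partners = partner_map(structure)
    expected_correct = 0.0
    for i in range(n):
        if i in partners:
            j = partners[i]
            expected_correct += bpp.get((min(i, j), max(i, j)), 0.0)
        else:
            expected_correct += 1.0 - paired[i]  # probability i is unpaired
    defect = n - expected_correct
    return defect / n if normalized else defect
```

A sequence whose ensemble is dominated by the reference structure has an ensemble defect near zero; a diffuse folding space, in which no single structure dominates, gives a larger value.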
Affiliation(s)
- Yinghan Fu
- Department of Biochemistry & Biophysics and Center for RNA Biology, University of Rochester Medical Center, Rochester, New York, United States of America
- Zhenjiang Zech Xu
- Department of Biochemistry & Biophysics and Center for RNA Biology, University of Rochester Medical Center, Rochester, New York, United States of America
- Zhi J. Lu
- MOE Key Lab of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing, China
- Shan Zhao
- Department of Biochemistry & Biophysics and Center for RNA Biology, University of Rochester Medical Center, Rochester, New York, United States of America
- David H. Mathews
- Department of Biochemistry & Biophysics and Center for RNA Biology, University of Rochester Medical Center, Rochester, New York, United States of America
- Department of Biostatistics & Computational Biology, University of Rochester Medical Center, Rochester, New York, United States of America
34
Fu Y, Sharma G, Mathews DH. Dynalign II: common secondary structure prediction for RNA homologs with domain insertions. Nucleic Acids Res 2015; 42:13939-48. [PMID: 25416799 PMCID: PMC4267632 DOI: 10.1093/nar/gku1172] [Citation(s) in RCA: 34]
Abstract
Homologous non-coding RNAs frequently exhibit domain insertions, where a branch of secondary structure is inserted in a sequence with respect to its homologs. Dynamic programming algorithms for common secondary structure prediction of multiple RNA homologs, however, do not account for these domain insertions. This paper introduces a novel dynamic programming methodology that explicitly accounts for the possibility of inserted domains when predicting common RNA secondary structures. The algorithm is implemented as Dynalign II, an update to the Dynalign software package for predicting the common secondary structure of two RNA homologs. This update is accomplished with a negligible increase in computational cost. Benchmarks on ncRNA families with domain insertions validate the method. Over base pairs occurring in inserted domains, Dynalign II improves accuracy over Dynalign, attaining 80.8% sensitivity (compared with 14.4% for Dynalign) and 91.4% positive predictive value (PPV) for tRNA; 66.5% sensitivity (compared with 38.9% for Dynalign) and 57.0% PPV for RNase P RNA; and 50.1% sensitivity (compared with 24.3% for Dynalign) and 58.5% PPV for SRP RNA. Compared with Dynalign, Dynalign II also exhibits statistically significant improvements in overall sensitivity and PPV. Dynalign II is available as a component of RNAstructure, which can be downloaded from http://rna.urmc.rochester.edu/RNAstructure.html.
Affiliation(s)
- Yinghan Fu
- Department of Biochemistry and Biophysics, University of Rochester Medical Center, 601 Elmwood Avenue, Box 712, Rochester, NY 14642, USA
- Center for RNA Biology, University of Rochester Medical Center, 601 Elmwood Avenue, Box 712, Rochester, NY 14642, USA
- Gaurav Sharma
- Center for RNA Biology, University of Rochester Medical Center, 601 Elmwood Avenue, Box 712, Rochester, NY 14642, USA
- Department of Electrical and Computer Engineering, University of Rochester, Hopeman 204, RC Box 270126, Rochester, NY 14627, USA
- Department of Biostatistics and Computational Biology, University of Rochester Medical Center, 601 Elmwood Avenue, Box 630, Rochester, NY 14642, USA
- To whom correspondence should be addressed. Tel: +1 585 275 1734; Fax: +1 585 275 6007;
- David H. Mathews
- Department of Biochemistry and Biophysics, University of Rochester Medical Center, 601 Elmwood Avenue, Box 712, Rochester, NY 14642, USA
- Center for RNA Biology, University of Rochester Medical Center, 601 Elmwood Avenue, Box 712, Rochester, NY 14642, USA
- Department of Biostatistics and Computational Biology, University of Rochester Medical Center, 601 Elmwood Avenue, Box 630, Rochester, NY 14642, USA
- To whom correspondence should be addressed. Tel: +1 585 275 1734; Fax: +1 585 275 6007;
35
The RNA structurome: transcriptome-wide structure probing with next-generation sequencing. Trends Biochem Sci 2015; 40:221-32. [DOI: 10.1016/j.tibs.2015.02.005] [Citation(s) in RCA: 122] [Received: 12/24/2014] [Revised: 02/16/2015] [Accepted: 02/17/2015]
36
Sloma MF, Mathews DH. Improving RNA secondary structure prediction with structure mapping data. Methods Enzymol 2015; 553:91-114. [PMID: 25726462 DOI: 10.1016/bs.mie.2014.10.053] [Citation(s) in RCA: 42]
Abstract
Methods to probe RNA secondary structure, such as small molecule modifying agents, secondary structure-specific nucleases, inline probing, and SHAPE chemistry, are widely used to study the structure of functional RNA. Computational secondary structure prediction programs can incorporate probing data to predict structure with high accuracy. In this chapter, an overview of current methods for probing RNA secondary structure is provided, including modern high-throughput methods. Methods for guiding secondary structure prediction algorithms using these data are explained, and best practices for using these data are provided. This chapter concludes by listing a number of open questions about how to best use probing data, and what these data can provide.
Affiliation(s)
- Michael F Sloma
- Department of Biochemistry & Biophysics, University of Rochester Medical Center, Box 712, Rochester, New York, USA; Center for RNA Biology, University of Rochester Medical Center, Box 712, Rochester, New York, USA
- David H Mathews
- Department of Biochemistry & Biophysics, University of Rochester Medical Center, Box 712, Rochester, New York, USA; Center for RNA Biology, University of Rochester Medical Center, Box 712, Rochester, New York, USA.
37
Affiliation(s)
- David H. Mathews
- Department of Biochemistry & Biophysics and Center for RNA Biology, University of Rochester Medical Center Rochester New York
38
Mathews DH. Using the RNAstructure Software Package to Predict Conserved RNA Structures. Curr Protoc Bioinformatics 2014; 46:12.4.1-12.4.22. [PMID: 24939126 DOI: 10.1002/0471250953.bi1204s46] [Citation(s) in RCA: 26]
Abstract
The structures of many non-coding RNAs (ncRNAs) are conserved by evolution to a greater extent than their sequences. By predicting the conserved structure of two or more homologous sequences, the accuracy of secondary structure prediction can be improved as compared to structure prediction for a single sequence. This unit provides protocols for the use of four programs in the RNAstructure suite for prediction of conserved structures: Multilign, TurboFold, Dynalign, and PARTS. These programs can be run via Web servers, on the command line, or with graphical interfaces.
Affiliation(s)
- David H Mathews
- Department of Biochemistry & Biophysics and Center for RNA Biology, University of Rochester Medical Center, Rochester, New York
39
Jabbari H, Condon A. A fast and robust iterative algorithm for prediction of RNA pseudoknotted secondary structures. BMC Bioinformatics 2014; 15:147. [PMID: 24884954 PMCID: PMC4064103 DOI: 10.1186/1471-2105-15-147] [Citation(s) in RCA: 25] [Received: 01/07/2014] [Accepted: 05/08/2014]
Abstract
Background: Improving accuracy and efficiency of computational methods that predict pseudoknotted RNA secondary structures is an ongoing challenge. Existing methods based on free energy minimization tend to be very slow and are limited in the types of pseudoknots that they can predict. Incorporating known structural information can improve prediction accuracy; however, there are not many methods for prediction of pseudoknotted structures that can incorporate structural information as input. There is even less understanding of the relative robustness of these methods with respect to partial information.
Results: We present a new method, Iterative HFold, for pseudoknotted RNA secondary structure prediction. Iterative HFold takes as input a pseudoknot-free structure, and produces a possibly pseudoknotted structure whose energy is at least as low as that of any (density-2) pseudoknotted structure containing the input structure. Iterative HFold leverages strengths of earlier methods, namely the fast running time of HFold, a method that is based on the hierarchical folding hypothesis, and the energy parameters of HotKnots V2.0. Our experimental evaluation on a large data set shows that Iterative HFold is robust with respect to partial information, with average accuracy on pseudoknotted structures steadily increasing from roughly 54% to 79% as the user provides up to 40% of the input structure. Iterative HFold is much faster than HotKnots V2.0, while having comparable accuracy. Iterative HFold also has significantly better accuracy than IPknot on our HK-PK and IP-pk168 data sets.
Conclusions: Iterative HFold is a robust method for prediction of pseudoknotted RNA secondary structures, whose accuracy with more than 5% information about true pseudoknot-free structures is better than that of IPknot, and with about 35% information about true pseudoknot-free structures compares well with that of HotKnots V2.0 while being significantly faster. Iterative HFold and all data used in this work are freely available at http://www.cs.ubc.ca/~hjabbari/software.php.
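A pseudoknot, in the sense used above, arises when two base pairs cross: pairs (i, j) and (k, l) with i < k < j < l. The following minimal sketch only detects such crossings in a list of 0-based base pairs; it does not classify structures by density, as the density-2 class in Iterative HFold does, and the function names are hypothetical.

```python
from itertools import combinations
from typing import List, Tuple

Pair = Tuple[int, int]


def crossing_pairs(pairs: List[Pair]) -> List[Tuple[Pair, Pair]]:
    """Return every pair of base pairs that cross (i < k < j < l)."""
    ordered = sorted((min(i, j), max(i, j)) for i, j in pairs)
    crossings = []
    for (i, j), (k, l) in combinations(ordered, 2):
        # Sorted order guarantees i <= k; the pairs cross iff i < k < j < l
        if i < k < j < l:
            crossings.append(((i, j), (k, l)))
    return crossings


def is_pseudoknotted(pairs: List[Pair]) -> bool:
    return bool(crossing_pairs(pairs))


print(is_pseudoknotted([(0, 10), (1, 9), (2, 8)]))  # nested helix
print(is_pseudoknotted([(0, 5), (3, 8)]))           # crossing pairs
```

Nested pairs such as (0, 10) and (1, 9), or side-by-side pairs such as (0, 3) and (5, 9), never satisfy the crossing condition, which is why pseudoknot-free structures can be written in plain dot-bracket notation.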
Affiliation(s)
- Hosna Jabbari
- Department of Computer Science, University of British Columbia, 2366 Main Mall, Vancouver, Canada.
40
The structural and phylogenetic profile of the 3' terminus of coxsackievirus B3 negative strand. Virus Res 2014; 188:81-9. [PMID: 24675276 DOI: 10.1016/j.virusres.2014.03.020] [Citation(s) in RCA: 3] [Received: 12/28/2013] [Revised: 03/14/2014] [Accepted: 03/16/2014]
Abstract
In the replication process of RNA(+) viruses, both the positive-strand template and the newly synthesized negative strand appear in a double-stranded form, RF. It has been shown for poliovirus that, prior to the initiation of positive-strand synthesis, the 5'-terminus of the positive strand must adopt a cloverleaf structure. When that happens, the 3'-terminal region of the negative strand is released from the RF form and is able to fold into its own defined structure. To determine the secondary structure of this region, a comprehensive approach combining experimental mapping methods, phylogenetic analysis, and computer predictions was applied. Here we propose the first structural model of the 3'-terminal region of the coxsackievirus B3 (CV-B3) negative strand, approximately 450 nucleotides in length. The region folds into three highly defined structural domains, I'-III'. The most 3'-terminal part of this region is domain I', which folds into a cloverleaf structure similar to that found in the positive-polarity viral RNA strand. Remarkably, this motif is conserved among all analyzed viral isolates of CV-B3 despite the observed sequence diversity. Several other conserved structural motifs within the 3'-terminal region of the viral negative strand were also identified. The structure of this region may be crucial for assembly of the replication complex.
41
Stern HA, Mathews DH. Accelerating calculations of RNA secondary structure partition functions using GPUs. Algorithms Mol Biol 2013; 8:29. [PMID: 24180434 PMCID: PMC4175106 DOI: 10.1186/1748-7188-8-29] [Citation(s) in RCA: 6] [Received: 01/31/2013] [Accepted: 10/14/2013]
Abstract
Background: RNA performs many diverse functions in the cell in addition to its role as a messenger of genetic information. These functions depend on its ability to fold to a unique three-dimensional structure determined by the sequence. The conformation of RNA is in part determined by its secondary structure, or the particular set of contacts between pairs of complementary bases. Prediction of the secondary structure of RNA from its sequence is therefore of great interest, but can be computationally expensive. In this work we accelerate computations of base-pair probabilities using parallel graphics processing units (GPUs).
Results: Calculation of the probabilities of base pairs in RNA secondary structures using nearest-neighbor standard free energy change parameters has been implemented using CUDA to run on hardware with multiprocessor GPUs. A modified set of recursions was introduced, which reduces memory usage by about 25%. GPUs are fastest in single precision, and some hardware is restricted to single precision. This may introduce significant roundoff error. However, deviations in base-pair probabilities calculated using single precision were found to be negligible compared to those resulting from shifting the nearest-neighbor parameters by a random amount of magnitude similar to their experimental uncertainties. For large sequences running on our particular hardware, the GPU implementation reduces execution time by a factor of close to 60 compared with an optimized serial implementation, and by a factor of 116 compared with the original code.
Conclusions: Using GPUs can greatly accelerate computation of RNA secondary structure partition functions, allowing calculation of base-pair probabilities for large sequences in a reasonable amount of time, with a negligible compromise in accuracy due to working in single precision. The source code is integrated into the RNAstructure software package and available for download at http://rna.urmc.rochester.edu.
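The comparison between roundoff error and parameter uncertainty can be illustrated with a toy two-state ensemble whose Boltzmann weights are exp(-ΔG/RT). This sketch is not the RNAstructure CUDA code: it emulates single precision by rounding each intermediate result through Python's struct module (the exponential itself is evaluated in double and then rounded, so this only approximates GPU arithmetic), and the two free energies and the 0.1 kcal/mol shift are arbitrary illustrative values.

```python
import math
import struct

R = 0.0019872  # gas constant, kcal/(mol*K)
T = 310.15     # 37 degrees C in kelvin


def to_single(x: float) -> float:
    """Round a Python float (double) to the nearest IEEE-754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def two_state_probability(dg1: float, dg2: float, single: bool = False) -> float:
    """Probability of state 1 given free energies dg1 and dg2 (kcal/mol)."""
    rnd = to_single if single else float  # float() leaves a double unchanged
    rt = rnd(R * T)
    w1 = rnd(math.exp(rnd(-dg1 / rt)))  # Boltzmann weight of state 1
    w2 = rnd(math.exp(rnd(-dg2 / rt)))  # Boltzmann weight of state 2
    return rnd(w1 / rnd(w1 + w2))


p_double = two_state_probability(-10.0, -9.0)
p_single = two_state_probability(-10.0, -9.0, single=True)
p_shifted = two_state_probability(-10.1, -9.0)  # one parameter shifted by 0.1 kcal/mol
print(f"single-precision deviation: {abs(p_single - p_double):.2e}")
print(f"deviation from 0.1 kcal/mol shift: {abs(p_shifted - p_double):.2e}")
```

For these values the single-precision deviation is several orders of magnitude smaller than the change caused by the 0.1 kcal/mol shift, a miniature version of the argument made in the abstract.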
42
Priore SF, Kierzek E, Kierzek R, Baman JR, Moss WN, Dela-Moss LI, Turner DH. Secondary structure of a conserved domain in the intron of influenza A NS1 mRNA. PLoS One 2013; 8:e70615. [PMID: 24023714 PMCID: PMC3759394 DOI: 10.1371/journal.pone.0070615] [Citation(s) in RCA: 27] [Received: 01/28/2013] [Accepted: 06/22/2013]
Abstract
Influenza A virus is a segmented single-stranded (−)RNA virus that causes substantial annual morbidity and mortality. The transcriptome of influenza A is predicted to have extensive RNA secondary structure. The smallest genome segment, segment 8, encodes two proteins, NS1 and NEP, via alternative splicing. A conserved RNA domain in the intron of segment 8 may be important for regulating production of NS1. Two different multi-branch loop structures have been proposed for this region. A combination of in vitro chemical mapping and isoenergetic microarray techniques demonstrates that the consensus sequence for this region folds into a hairpin conformation. These results provide an alternative folding for this region and a foundation for designing experiments to probe its functional role in the influenza life cycle.
Affiliation(s)
- Salvatore F. Priore
- Department of Chemistry and Center for RNA Biology, University of Rochester, Rochester, New York, United States of America
- Elzbieta Kierzek
- Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland
- Ryszard Kierzek
- Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland
- Jayson R. Baman
- Department of Chemistry and Center for RNA Biology, University of Rochester, Rochester, New York, United States of America
- Walter N. Moss
- Department of Chemistry and Center for RNA Biology, University of Rochester, Rochester, New York, United States of America
- Lumbini I. Dela-Moss
- Department of Chemistry and Center for RNA Biology, University of Rochester, Rochester, New York, United States of America
- Douglas H. Turner
- Department of Chemistry and Center for RNA Biology, University of Rochester, Rochester, New York, United States of America
43
Bellaousov S, Reuter JS, Seetin MG, Mathews DH. RNAstructure: Web servers for RNA secondary structure prediction and analysis. Nucleic Acids Res 2013; 41:W471-4. [PMID: 23620284 PMCID: PMC3692136 DOI: 10.1093/nar/gkt290] [Citation(s) in RCA: 298]
Abstract
RNAstructure is a software package for RNA secondary structure prediction and analysis. This contribution describes a new set of web servers to provide its functionality. The web server offers RNA secondary structure prediction, including free energy minimization, maximum expected accuracy structure prediction and pseudoknot prediction. Bimolecular secondary structure prediction is also provided. Additionally, the server can predict secondary structures conserved in either two homologs or more than two homologs. Folding free energy changes can be predicted for a given RNA structure using nearest neighbor rules. Secondary structures can be compared using circular plots or scored by sensitivity and positive predictive value. Structure drawings can be rendered as SVG, PostScript, JPEG, or PDF. The web server is freely available for public use at: http://rna.urmc.rochester.edu/RNAstructureWeb.
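Sensitivity and positive predictive value, the two scores mentioned above, can be sketched for structures written in dot-bracket notation. Sensitivity is the fraction of reference base pairs that are predicted, and PPV is the fraction of predicted pairs that appear in the reference. This minimal version counts only exact pair matches and handles only nested "(" and ")" brackets; RNAstructure's own scoring may apply a more lenient match criterion, so its numbers can differ. The function names are hypothetical.

```python
from typing import Set, Tuple


def pairs_from_dot_bracket(structure: str) -> Set[Tuple[int, int]]:
    """Return 0-based base pairs (i, j), i < j, from nested dot-bracket notation."""
    stack, pairs = [], set()
    for j, c in enumerate(structure):
        if c == "(":
            stack.append(j)
        elif c == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {j}")
            pairs.add((stack.pop(), j))
    if stack:
        raise ValueError("unmatched '(' in structure")
    return pairs


def sensitivity_ppv(predicted: str, reference: str) -> Tuple[float, float]:
    """Exact-match sensitivity and positive predictive value of a predicted structure."""
    if len(predicted) != len(reference):
        raise ValueError("structures must describe the same sequence length")
    pred = pairs_from_dot_bracket(predicted)
    ref = pairs_from_dot_bracket(reference)
    true_positives = len(pred & ref)
    sensitivity = true_positives / len(ref) if ref else 0.0
    ppv = true_positives / len(pred) if pred else 0.0
    return sensitivity, ppv


# Three of the four reference pairs are predicted and no predicted pair is spurious
print(sensitivity_ppv("(((.....)))", "((((...))))"))  # (0.75, 1.0)
```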
Affiliation(s)
- Stanislav Bellaousov
- Department of Biochemistry and Biophysics, University of Rochester Medical Center, 601 Elmwood Avenue, Box 712, Rochester, NY 14642, USA
44
Puton T, Kozlowski LP, Rother KM, Bujnicki JM. CompaRNA: a server for continuous benchmarking of automated methods for RNA secondary structure prediction. Nucleic Acids Res 2013; 41:4307-23. [PMID: 23435231 PMCID: PMC3627593 DOI: 10.1093/nar/gkt101] [Citation(s) in RCA: 81]
Abstract
We present a continuous benchmarking approach for the assessment of RNA secondary structure prediction methods implemented in the CompaRNA web server. As of 3 October 2012, the performance of 28 single-sequence and 13 comparative methods has been evaluated on RNA sequences/structures released weekly by the Protein Data Bank. We also provide a static benchmark generated on RNA 2D structures derived from the RNAstrand database. Benchmarks on both data sets offer insight into the relative performance of RNA secondary structure prediction methods on RNAs of different size and with respect to different types of structure. According to our tests, on average, the most accurate predictions obtained by a comparative approach are generated by CentroidAlifold, MXScarna, RNAalifold and TurboFold. On average, the most accurate predictions obtained by single-sequence analyses are generated by CentroidFold, ContextFold and IPknot. The best comparative methods typically outperform the best single-sequence methods if an alignment of homologous RNA sequences is available. This article presents the results of our benchmarks as of 3 October 2012, whereas the rankings presented online are continuously updated. We will gladly include new prediction methods and new measures of accuracy in the new editions of CompaRNA benchmarks.
Affiliation(s)
- Tomasz Puton
- Bioinformatics Laboratory, Institute for Molecular Biology and Biotechnology, Faculty of Biology, Adam Mickiewicz University, ul. Umultowska 89, 61-614 Poznan, Poland
45
Fu Y, Xu Z, Lu ZJ, Zhao S, Mathews DH. 31 Discovery of novel ncRNA by scanning multiple genome alignments. J Biomol Struct Dyn 2013. [DOI: 10.1080/07391102.2013.786463] [Citation(s) in RCA: 0]
46
Ghiselli F, Milani L, Guerra D, Chang PL, Breton S, Nuzhdin SV, Passamonti M. Structure, transcription, and variability of metazoan mitochondrial genome: perspectives from an unusual mitochondrial inheritance system. Genome Biol Evol 2013; 5:1535-54. [PMID: 23882128 PMCID: PMC3762199 DOI: 10.1093/gbe/evt112] [Citation(s) in RCA: 57] [Accepted: 07/18/2013]
Abstract
Despite its functional conservation, the mitochondrial genome (mtDNA) presents strikingly different features among eukaryotes, such as size, rearrangements, and amount of intergenic regions. Nonadaptive processes such as random genetic drift and mutation rate play a fundamental role in shaping mtDNA: the mitochondrial bottleneck and the number of germ line replications are critical factors, and different patterns of germ line differentiation could be responsible for the mtDNA diversity observed in eukaryotes. Among metazoans, bivalve mollusc mtDNAs show unusual features, such as hypervariable gene arrangements, high mutation rates, large intergenic regions, and, in some species, a unique inheritance system, doubly uniparental inheritance (DUI). The DUI system offers the possibility to study the evolutionary dynamics of mtDNAs that, despite being in the same organism, experience different genetic drift and selective pressures. We used the DUI species Ruditapes philippinarum to study intergenic mtDNA functions, mitochondrial transcription, and polymorphism in gonads. We observed: 1) the presence of conserved functional elements and novel open reading frames (ORFs) that could explain the evolutionary persistence of intergenic regions and may be involved in DUI-specific features; 2) that mtDNA transcription is lineage-specific and independent of the nuclear background; and 3) that male-transmitted and female-transmitted mtDNAs have a similar amount of polymorphism but of different kinds, due to different population sizes and selection efficiency. Our results are consistent with the hypotheses that mtDNA evolution is strongly dependent on the dynamics of germ line formation, and that the establishment of a male-transmitted mtDNA lineage can increase male fitness through selection on sperm function.
Affiliation(s)
- Fabrizio Ghiselli
- Dipartimento di Scienze Biologiche, Geologiche ed Ambientali (BiGeA), Università di Bologna, Bologna, Italy.
47
Abstract
RNA is now appreciated to serve numerous cellular roles, and understanding RNA structure is important for understanding its mechanism of action. This contribution discusses the methods available for predicting RNA structure. Secondary structure is the set of canonical base pairs, and it can be accurately determined by comparative sequence analysis. Secondary structure can also be predicted; the most commonly used method is free energy minimization. The accuracy of structure prediction is improved either by using experimental mapping data or by predicting a structure conserved in a set of homologous sequences. Additionally, tertiary structure, the three-dimensional arrangement of atoms, can be modeled with guidance from comparative analysis and experimental techniques. New approaches are also available for predicting tertiary structure.
Affiliation(s)
- Matthew G Seetin
- Department of Biochemistry & Biophysics, University of Rochester Medical Center, Rochester, NY, USA
48
Xu Z, Almudevar A, Mathews DH. Statistical evaluation of improvement in RNA secondary structure prediction. Nucleic Acids Res 2011; 40:e26. [PMID: 22139940 PMCID: PMC3287165 DOI: 10.1093/nar/gkr1081] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.6] [Indexed: 11/13/2022]
Abstract
With the discovery of diverse roles for RNA, its centrality in cellular functions has become increasingly apparent. A number of algorithms have been developed to predict RNA secondary structure, and their performance has been benchmarked by comparing predictions to reference secondary structures. Generally, algorithms are compared against each other and one is selected as best without statistical testing to determine whether the improvement is significant. In this work, it is demonstrated that the prediction accuracies of different methods correlate with each other over sets of sequences. One possible reason for this correlation is that many algorithms use the same underlying principles. As an example, a previously published set of benchmarks for programs that predict a structure common to three or more sequences is statistically analyzed to show that it can be rigorously evaluated using paired two-sample t-tests. Finally, a pipeline of statistical analyses is proposed to guide the choice of data set size and performance assessment for benchmarks of structure prediction. The pipeline is applied using 5S rRNA sequences as an example.
Affiliation(s)
- Zhenjiang Xu
- Department of Biochemistry and Biophysics and Center for RNA Biology, University of Rochester Medical Center, Rochester, NY, USA
49
Abstract
RNAstructure is a user-friendly program for the prediction and analysis of RNA secondary structure under Microsoft Windows. This unit provides protocols for RNA secondary structure prediction and prediction of high-affinity oligonucleotide binding sites to a structured RNA target.
Affiliation(s)
- David H Mathews
- University of Rochester Medical Center, Rochester, New York, USA