1
Automatic recognition of complementary strands: lessons regarding machine learning abilities in RNA folding. Front Genet 2023; 14:1254226. [PMID: 37732325 PMCID: PMC10507318 DOI: 10.3389/fgene.2023.1254226] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/06/2023] [Accepted: 08/16/2023] [Indexed: 09/22/2023] Open
Abstract
Introduction: Prediction of RNA secondary structure from single sequences still needs substantial improvement. The application of machine learning (ML) to this problem has become increasingly popular. However, ML algorithms are prone to overfitting, limiting the ability to learn more about the inherent mechanisms governing RNA folding. It is natural to use high-capacity models when solving such a difficult task, but poor generalization is expected when too few examples are available. Methods: Here, we report the relation between capacity and performance on a fundamental related problem: determining whether two sequences are fully complementary. Our analysis focused on the impact of model architecture and capacity, as well as dataset size and nature, on classification accuracy. Results: We observed that low-capacity models are better suited for learning with mislabelled training examples, while large capacities improve the ability to generalize to structurally dissimilar data. It turns out that neural networks struggle to grasp the fundamental concept of base complementarity, especially in a lengthwise extrapolation context. Discussion: Given a more complex task like RNA folding, it comes as no surprise that the scarcity of usable examples hinders the applicability of machine learning techniques to this field.
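The classification target described in this abstract, full complementarity between two strands, can be written as a short deterministic rule, which makes a useful ground-truth labeller for such experiments. The sketch below is an illustration only: it assumes RNA alphabets, antiparallel pairing, and strict Watson-Crick pairs (no G-U wobble), none of which is specified in the abstract, and the function names are hypothetical.

```python
# Watson-Crick partners for RNA. Assumption: G-U wobble pairs are NOT counted.
WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the strand that pairs with `seq` in antiparallel orientation."""
    return "".join(WATSON_CRICK[base] for base in reversed(seq.upper()))


def fully_complementary(seq_a: str, seq_b: str) -> bool:
    """True when every base of seq_a pairs with seq_b, read antiparallel."""
    return len(seq_a) == len(seq_b) and seq_b.upper() == reverse_complement(seq_a)


if __name__ == "__main__":
    print(fully_complementary("GGAUC", "GAUCC"))
    print(fully_complementary("GGAUC", "GAUCG"))
```

A rule like this generalizes to any sequence length by construction, which is the property the abstract reports neural networks struggling to learn.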
2
D-ORB: A Web Server to Extract Structural Features of Related But Unaligned RNA Sequences. J Mol Biol 2023; 435:168181. [PMID: 37468182 DOI: 10.1016/j.jmb.2023.168181] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/04/2022] [Revised: 06/02/2023] [Accepted: 06/06/2023] [Indexed: 07/21/2023]
Abstract
Identifying the common structural elements of functionally related RNA sequences (family) is usually based on an alignment of the sequences, which is often subject to human bias and may not be accurate. The resulting covariance model (CM) provides probabilities for each base to covary with another, which lends evolutionary support to the formation of double-helical regions and possibly pseudoknots. The coexistence of alternative folds in RNA, resulting from its dynamic nature, may cause the CM to omit motifs. To overcome this limitation, we present D-ORB, a system of algorithms that identifies overrepresented motifs in the secondary conformational landscapes of a family when compared to those of unrelated sequences. The algorithms are bundled into an easy-to-use website allowing users to submit a family, and optionally provide unrelated sequences. D-ORB produces a non-pseudoknotted secondary structure based on the overrepresented motifs, a deep neural network classifier and two decision trees. When used to model an Rfam family, D-ORB fits overrepresented motifs in the corresponding Rfam structure; more than a hundred Rfam families have been modeled. The statistical approach behind D-ORB derives the structural composition of an RNA family, making it a valuable tool for analyzing and modeling it. Its easy-to-use interface and advanced algorithms make it an essential resource for researchers studying RNA structure. D-ORB is available at https://d-orb.major.iric.ca/.
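To make the idea of "overrepresented motifs" concrete, the following toy sketch scores each motif by how much more often it occurs in family sequences than in unrelated ones. It is not D-ORB's statistical method, which the abstract does not detail: the motif labels, the per-sequence counting, the pseudocount smoothing, and the function name `motif_enrichment` are all assumptions made for illustration.

```python
from collections import Counter


def motif_enrichment(family_motifs, background_motifs, pseudocount=1.0):
    """Score motifs by family-vs-background frequency ratio.

    Each argument is a list with one entry per sequence; each entry is the
    collection of motif labels found in that sequence's structures.
    A motif is counted at most once per sequence. Ratios above 1 suggest
    overrepresentation in the family (hypothetical scoring, with smoothing).
    """
    fam = Counter(m for motifs in family_motifs for m in set(motifs))
    bg = Counter(m for motifs in background_motifs for m in set(motifs))
    n_fam, n_bg = len(family_motifs), len(background_motifs)
    return {
        m: ((fam[m] + pseudocount) / (n_fam + pseudocount))
        / ((bg[m] + pseudocount) / (n_bg + pseudocount))
        for m in fam
    }


if __name__ == "__main__":
    family = [["hairpin4", "bulge1"], ["hairpin4"], ["hairpin4", "stem5"]]
    unrelated = [["stem5"], ["bulge1", "stem5"], ["stem5"]]
    for motif, score in sorted(motif_enrichment(family, unrelated).items()):
        print(motif, score)
```

A real analysis would replace the raw ratio with a proper significance test; the ratio is used here only because it is easy to read.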
3
The DynaSig-ML Python package: automated learning of biomolecular dynamics-function relationships. Bioinformatics 2023; 39:7133737. [PMID: 37079725 PMCID: PMC10130421 DOI: 10.1093/bioinformatics/btad180] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/31/2022] [Revised: 03/09/2023] [Accepted: 03/30/2023] [Indexed: 04/22/2023] Open
Abstract
The DynaSig-ML ("Dynamical Signatures-Machine Learning") Python package allows the efficient, user-friendly exploration of 3D dynamics-function relationships in biomolecules, using datasets of experimental measures from large numbers of sequence variants. It does so by predicting 3D structural dynamics for every variant using the Elastic Network Contact Model (ENCoM), a sequence-sensitive coarse-grained normal mode analysis model. Dynamical Signatures represent the fluctuation at every position in the biomolecule and are used as features fed into machine learning models of the user's choice. Once trained, these models can be used to predict experimental outcomes for theoretical variants. The whole pipeline can be run with just a few lines of Python and modest computational resources. The compute-intensive steps are easily parallelized in the case of either large biomolecules or large numbers of sequence variants. As an example application, we use the DynaSig-ML package to predict the maturation efficiency of human microRNA miR-125a variants from high-throughput enzymatic assays. Availability: DynaSig-ML is open-source software available at https://github.com/gregorpatof/dynasigml_package. Supplementary information: Supplementary data are available at Bioinformatics online.
4
Sequence-sensitive elastic network captures dynamical features necessary for miR-125a maturation. PLoS Comput Biol 2022; 18:e1010777. [PMID: 36516216 PMCID: PMC9797095 DOI: 10.1371/journal.pcbi.1010777] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2022] [Revised: 12/28/2022] [Accepted: 11/29/2022] [Indexed: 12/15/2022] Open
Abstract
The Elastic Network Contact Model (ENCoM) is a coarse-grained normal mode analysis (NMA) model unique in its all-atom sensitivity to the sequence of the studied macromolecule and thus to the effect of mutations. We adapted ENCoM to simulate the dynamics of ribonucleic acid (RNA) molecules, benchmarked its performance against other popular NMA models and used it to study the 3D structural dynamics of human microRNA miR-125a, leveraging high-throughput experimental maturation efficiency data of over 26 000 sequence variants. We also introduce a novel way of using dynamical information from NMA to train multivariate linear regression models, with the purpose of highlighting the most salient contributions of dynamics to function. ENCoM has a similar performance profile on RNA as on proteins when compared to the Anisotropic Network Model (ANM), the most widely used coarse-grained NMA model; it has the advantage in predicting large-scale motions, while ANM performs better at B-factor prediction. A stringent benchmark from the miR-125a maturation dataset, in which the training set contains no sequence information in common with the testing set, reveals that ENCoM is the only tested model able to capture signal beyond the sequence. This ability translates to better predictive power on a second benchmark in which sequence features are shared between the train and test sets. When training the linear regression model using all available data, the dynamical features identified as necessary for miR-125a maturation point to known patterns but also offer new insights into the biogenesis of microRNAs. Our novel approach combining NMA with multivariate linear regression is generalizable to any macromolecule for which relatively high-throughput mutational data are available.
5
CAMAP: Artificial neural networks unveil the role of codon arrangement in modulating MHC-I peptides presentation. PLoS Comput Biol 2021; 17:e1009482. [PMID: 34679099 PMCID: PMC8577786 DOI: 10.1371/journal.pcbi.1009482] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2021] [Revised: 11/09/2021] [Accepted: 09/27/2021] [Indexed: 12/02/2022] Open
Abstract
MHC-I associated peptides (MAPs) play a central role in the elimination of virus-infected and neoplastic cells by CD8 T cells. However, accurately predicting the MAP repertoire remains difficult, because only a fraction of the transcriptome generates MAPs. In this study, we investigated whether codon arrangement (usage and placement) regulates MAP biogenesis. We developed an artificial neural network called Codon Arrangement MAP Predictor (CAMAP), predicting MAP presentation solely from mRNA sequences flanking the MAP-coding codons (MCCs), while excluding the MCC per se. CAMAP predictions were significantly more accurate when using original codon sequences than shuffled codon sequences which reflect amino acid usage. Furthermore, predictions were independent of mRNA expression and MAP binding affinity to MHC-I molecules and applied to several cell types and species. Combining MAP ligand scores, transcript expression level and CAMAP scores was particularly useful to increase MAP prediction accuracy. Using an in vitro assay, we showed that varying the synonymous codons in the regions flanking the MCCs (without changing the amino acid sequence) resulted in significant modulation of MAP presentation at the cell surface. Taken together, our results demonstrate the role of codon arrangement in the regulation of MAP presentation and support integration of both translational and post-translational events in predictive algorithms to improve modeling of the immunopeptidome.
MHC-I associated peptides (MAPs) are small fragments of intracellular proteins presented at the surface of cells and used by the immune system to detect and eliminate cancerous or virus-infected cells. While it is theoretically possible to predict which portions of the intracellular proteins will be naturally processed by the cells to ultimately reach the surface, current methodologies have prohibitively high false discovery rates. Here we introduce an artificial neural network called Codon Arrangement MAP Predictor (CAMAP), which integrates information from mRNA-to-protein translation with other factors regulating MAP biogenesis (e.g. MAP ligand score and transcript expression levels) to improve MAP prediction accuracy. While most MAP predictive approaches focus on MAP sequences per se, CAMAP's novelty is to analyze the MAP-flanking mRNA sequences, thereby providing completely independent information for MAP prediction. We show on several datasets that the integration of CAMAP scores with other known factors involved in MAP presentation (i.e. MAP ligand score and mRNA expression) significantly improves MAP prediction accuracy, and further validate CAMAP's learned features using an in vitro assay. These findings may have major implications for the design of vaccines against cancers and viruses, and in times of pandemics could accelerate the identification of relevant MAPs of viral origin.
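The input described above, codons flanking the MAP-coding codons with the MCCs themselves excluded, can be illustrated with a small extraction helper. This is a sketch under stated assumptions rather than CAMAP's preprocessing: the function name, the in-frame coding sequence input, the codon-based indexing, and the context size are all hypothetical, since the abstract gives none of these details.

```python
def flanking_codons(cds: str, mcc_start: int, mcc_len: int, context: int):
    """Return (upstream, downstream) codon lists around a MAP-coding region.

    cds       -- coding sequence, assumed to start in frame
    mcc_start -- index, in codons, of the first MAP-coding codon (MCC)
    mcc_len   -- number of MCCs
    context   -- codons to keep on each side (hypothetical window size)
    The MCCs themselves are excluded from the output.
    """
    usable = len(cds) - len(cds) % 3  # drop any trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    upstream = codons[max(0, mcc_start - context):mcc_start]
    mcc_end = mcc_start + mcc_len
    downstream = codons[mcc_end:mcc_end + context]
    return upstream, downstream


if __name__ == "__main__":
    cds = "".join(["ATG", "GCT", "GAA", "AAA", "CTG", "TTT", "GGG", "TAA"])
    print(flanking_codons(cds, mcc_start=3, mcc_len=2, context=2))
```

Near the ends of a transcript the returned lists are simply shorter than `context`; a model with fixed-size input would need padding, which is left out here.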
6
From transient recognition to efficient silencing: a RISCky business. Nat Struct Mol Biol 2020; 27:519-520. [PMID: 32472108 DOI: 10.1038/s41594-020-0451-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/08/2022]
7
A transcriptome-based approach to identify functional modules within and across primary human immune cells. PLoS One 2020; 15:e0233543. [PMID: 32469933 PMCID: PMC7259617 DOI: 10.1371/journal.pone.0233543] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 10/23/2019] [Accepted: 05/07/2020] [Indexed: 11/20/2022] Open
Abstract
Genome-wide transcriptomic analyses have provided valuable insight into fundamental biology and disease pathophysiology. Many studies have taken advantage of the correlation in the expression patterns of the transcriptome to infer a potential biologic function of uncharacterized genes, and multiple groups have examined the relationship between co-expression, co-regulation, and gene function on a broader scale. Given the unique characteristics of immune cells circulating in the blood, we were interested in determining whether it was possible to identify functional co-expression modules in human immune cells. Specifically, we sequenced the transcriptome of nine immune cell types from peripheral blood cells of healthy donors and, using a combination of global and targeted analyses of genes within co-expression modules, we were able to determine functions for these modules that were cell lineage-specific or shared among multiple cell lineages. In addition, our analyses identified transcription factors likely important for immune cell lineage commitment and/or maintenance.
8
RNA-MoIP: prediction of RNA secondary structure and local 3D motifs from sequence data. Nucleic Acids Res 2019; 45:W440-W444. [PMID: 28525607 PMCID: PMC5793723 DOI: 10.1093/nar/gkx429] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 03/03/2017] [Accepted: 05/12/2017] [Indexed: 11/13/2022] Open
Abstract
RNA structures are hierarchically organized. The secondary structure is articulated around sophisticated local three-dimensional (3D) motifs shaping the full 3D architecture of the molecule. Recent contributions have identified and organized recurrent local 3D motifs, but the application of this knowledge for predictive purposes is still in its infancy. We recently developed a computational framework, named RNA-MoIP, to reconcile RNA secondary structure and local 3D motif information available in databases. In this paper, we introduce a web service using our software for predicting RNA hybrid 2D–3D structures from sequence data only. Optionally, it can be used for (i) local 3D motif prediction or (ii) the refinement of user-defined secondary structures. Importantly, our web server automatically generates a script for the MC-Sym software, which can be immediately used to quickly predict all-atom RNA 3D models. The web server is available at http://rnamoip.cs.mcgill.ca.
9
Apoptotic endothelial cells release small extracellular vesicles loaded with immunostimulatory viral-like RNAs. Sci Rep 2019; 9:7203. [PMID: 31076589 PMCID: PMC6510763 DOI: 10.1038/s41598-019-43591-y] [Citation(s) in RCA: 40] [Impact Index Per Article: 8.0] [Received: 12/13/2018] [Accepted: 04/26/2019] [Indexed: 02/07/2023] Open
Abstract
Endothelial cells have multifaceted interactions with the immune system, both as initiators and targets of immune responses. In vivo, apoptotic endothelial cells release two types of extracellular vesicles upon caspase-3 activation: apoptotic bodies and exosome-like nanovesicles (ApoExos). Only ApoExos are immunogenic: their injection causes inflammation and autoimmunity in mice. Based on deep sequencing of total RNA, we report that apoptotic bodies and ApoExos are loaded with divergent RNA cargos that are not released by healthy endothelial cells. Apoptotic bodies, like endothelial cells, contain mainly ribosomal RNA whereas ApoExos essentially contain non-ribosomal non-coding RNAs. Endogenous retroelements, bearing viral-like features, represented half of total ApoExos RNA content. ApoExos also contained several copies of unedited Alu repeats and large amounts of non-coding RNAs with a demonstrated role in autoimmunity such as U1 RNA and Y RNA. Moreover, ApoExos RNAs had a unique nucleotide composition and secondary structure characterized by strong enrichment in U-rich motifs and unstably folded RNAs. Globally, ApoExos were therefore loaded with RNAs that can stimulate a variety of RIG-I-like receptors and endosomal TLRs. Hence, apoptotic endothelial cells selectively sort in ApoExos a diversified repertoire of immunostimulatory "self RNAs" that are tailor-made for initiation of innate immune responses and autoimmunity.
10
The sequence features that define efficient and specific hAGO2-dependent miRNA silencing guides. Nucleic Acids Res 2018; 46:8181-8196. [PMID: 30239883 PMCID: PMC6144789 DOI: 10.1093/nar/gky546] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 04/07/2018] [Revised: 05/10/2018] [Accepted: 06/05/2018] [Indexed: 01/18/2023] Open
Abstract
MicroRNAs (miRNAs) are ribonucleic acids (RNAs) of ∼21 nucleotides that interfere with the translation of messenger RNAs (mRNAs) and play significant roles in development and diseases. In bilaterian animals, the specificity of miRNA targeting is determined by sequence complementarity involving the seed. However, the role of the remaining nucleotides (non-seed) is only vaguely defined, impacting negatively on our ability to efficiently use miRNAs exogenously to control gene expression. Here, using reporter assays, we deciphered the role of the base pairs formed between the non-seed region and target mRNA. We used molecular modeling to reveal that this mechanism corresponds to the formation of base pairs mediated by ordered motions of the miRNA-induced silencing complex. Subsequently, we developed an algorithm based on this distinctive recognition to predict from sequence the levels of mRNA downregulation with high accuracy (r² > 0.5, P-value < 10⁻¹²). Overall, our discovery improves the design of miRNA-guide sequences used to simultaneously downregulate the expression of multiple predetermined target genes.
11
Corrigendum: RNA-MoIP: prediction of RNA secondary structure and local 3D motifs from sequence data. Nucleic Acids Res 2017; 45:W573. [PMID: 28666332 PMCID: PMC5570209 DOI: 10.1093/nar/gkx575] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/17/2022] Open
12
RNA-Puzzles Round III: 3D RNA structure prediction of five riboswitches and one ribozyme. RNA (NEW YORK, N.Y.) 2017; 23:655-672. [PMID: 28138060 PMCID: PMC5393176 DOI: 10.1261/rna.060368.116] [Citation(s) in RCA: 106] [Impact Index Per Article: 15.1] [Received: 12/11/2016] [Accepted: 01/26/2017] [Indexed: 05/21/2023]
Abstract
RNA-Puzzles is a collective experiment in blind 3D RNA structure prediction. We report here a third round of RNA-Puzzles. Five puzzles (4, 8, 12, 13 and 14), all structures of riboswitch aptamers, and puzzle 7, a ribozyme structure, are included in this round of the experiment. The riboswitch structures include biological binding sites for small molecules (S-adenosyl methionine, cyclic diadenosine monophosphate, 5-amino 4-imidazole carboxamide riboside 5'-triphosphate, glutamine) and proteins (YbxF), and one set describes large conformational changes between ligand-free and ligand-bound states. The Varkud satellite ribozyme is the most recently solved structure of a known large ribozyme. All puzzles have established biological functions and require structural understanding to appreciate their molecular mechanisms. Through the use of fast-track experimental data, including multidimensional chemical mapping, and accurate prediction of RNA secondary structure, a large portion of the contacts in 3D have been predicted correctly, leading to similar topologies for the top-ranking predictions. Template-based and homology-derived predictions could predict structures to particularly high accuracies. However, achieving biological insights from de novo prediction of RNA 3D structures still depends on the size and complexity of the RNA. Blind computational predictions of RNA structures already appear to provide useful structural information in many cases. As in the previous RNA-Puzzles Round II experiment, the prediction of non-Watson-Crick interactions and the observed high atomic clash scores reveal a notable need for algorithmic improvement. All prediction models and assessment results are available at http://ahsoka.u-strasbg.fr/rnapuzzles/.
13
Structural dynamics control the MicroRNA maturation pathway. Nucleic Acids Res 2016; 44:9956-9964. [PMID: 27651454 PMCID: PMC5175353 DOI: 10.1093/nar/gkw793] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 06/19/2015] [Revised: 08/26/2016] [Accepted: 08/29/2016] [Indexed: 12/25/2022] Open
Abstract
MicroRNAs (miRNAs) are crucial gene expression regulators and first-order suspects in the development and progression of many diseases. Comparative analysis of cancer cell expression data highlights many deregulated miRNAs. Low expression of miR-125a was related to poor breast cancer prognosis. Interestingly, a single nucleotide polymorphism (SNP) in miR-125a was located within a minor allele expressed by breast cancer patients. The SNP is not predicted to affect the ground state structure of the primary transcript or precursor, but neither the precursor nor mature product is detected by RT-qPCR. How this SNP modulates the maturation of miR-125a is poorly understood. Here, building upon a model of RNA dynamics derived from nuclear magnetic resonance studies, we developed a quantitative model enabling the visualization and comparison of networks of transient structures. We observed a high correlation between the distances separating the networks of variants from those of their respective wild types and the variants' degrees of maturation relative to the wild types, suggesting an important role of transient structures in miRNA homeostasis. We classified the human miRNAs according to pairwise distances between their networks of transient structures.
14
MiRBooking simulates the stoichiometric mode of action of microRNAs. Nucleic Acids Res 2015; 43:6730-8. [PMID: 26089388 PMCID: PMC4538818 DOI: 10.1093/nar/gkv619] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 11/13/2014] [Accepted: 06/02/2015] [Indexed: 12/21/2022] Open
Abstract
In eukaryotes, gene expression is regulated by microRNAs (miRNAs) which bind to messenger RNAs (mRNAs) and interfere with their translation into proteins, either by promoting their degradation or inducing their repression. We study the effect of miRNA interference on each gene using experimental methods, such as microarrays and RNA-seq at the mRNA level, or luciferase reporter assays and variations of SILAC at the protein level. Alternatively, computational predictions would provide clear benefits. However, no algorithm for this task has ever been proposed. Here, we introduce a new algorithm to predict genome-wide expression data from initial transcriptome abundance. The algorithm simulates the miRNA and mRNA hybridization competition that occurs in given cellular conditions, and derives the whole set of miRNA::mRNA interactions at equilibrium (microtargetome). Interestingly, solving the competition improves the accuracy of miRNA target predictions. Furthermore, this model implements a previously reported and fundamental property of the microtargetome: the binding between a miRNA and a mRNA depends on their sequence complementarity, but also on the abundance of all RNAs expressed in the cell, i.e. the stoichiometry of all the miRNA sites and all the miRNAs given their respective abundance. This model generalizes the miRNA-induced synchronistic silencing previously observed, and described as sponges and competitive endogenous RNAs.
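The stoichiometric property stated above, that binding depends on abundance as well as complementarity, can be illustrated with a deliberately simple toy. The sketch below is not the miRBooking algorithm and does not solve an equilibrium: it splits the copies of a single miRNA among competing target sites in proportion to abundance times an affinity weight, capped by the number of available sites. The function name, the affinity weights, and the single-pass allocation are all assumptions made for illustration.

```python
def allocate_mirna(mirna_copies, targets):
    """Split copies of one miRNA among competing target sites (toy model).

    targets maps a target name to (abundance, affinity), where affinity is a
    hypothetical non-negative weight standing in for sequence complementarity.
    Each target's share is proportional to abundance * affinity and cannot
    exceed the target's own abundance. Leftover copies are not redistributed.
    """
    weights = {name: ab * aff for name, (ab, aff) in targets.items()}
    total = sum(weights.values())
    if total == 0:
        return {name: 0.0 for name in targets}
    return {
        name: min(float(targets[name][0]), mirna_copies * w / total)
        for name, w in weights.items()
    }


if __name__ == "__main__":
    # Same affinity, different abundance: the more abundant target binds more.
    print(allocate_mirna(100, {"A": (300, 1.0), "B": (100, 1.0)}))
```

Even in this toy, raising the abundance of one target lowers the share bound to the others, which is the sponge-like behaviour the abstract refers to.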
15
RNA-Puzzles Round II: assessment of RNA structure prediction programs applied to three large RNA structures. RNA (NEW YORK, N.Y.) 2015; 21:1066-84. [PMID: 25883046 PMCID: PMC4436661 DOI: 10.1261/rna.049502.114] [Citation(s) in RCA: 107] [Impact Index Per Article: 11.9] [Received: 01/10/2015] [Accepted: 02/12/2015] [Indexed: 05/04/2023]
Abstract
This paper is a report of a second round of RNA-Puzzles, a collective and blind experiment in three-dimensional (3D) RNA structure prediction. Three puzzles, Puzzles 5, 6, and 10, represented sequences of three large RNA structures with limited or no homology with previously solved RNA molecules. A lariat-capping ribozyme, as well as riboswitches complexed to adenosylcobalamin and tRNA, were predicted by seven groups using RNAComposer, ModeRNA/SimRNA, Vfold, Rosetta, DMD, MC-Fold, 3dRNA, and AMBER refinement. Some groups derived models using data from state-of-the-art chemical-mapping methods (SHAPE, DMS, CMCT, and mutate-and-map). The comparisons between the predictions and the three subsequently released crystallographic structures, solved at diffraction resolutions of 2.5-3.2 Å, were carried out automatically using various sets of quality indicators. The comparisons clearly demonstrate the state of present-day de novo prediction abilities as well as the limitations of these state-of-the-art methods. All of the best prediction models have similar topologies to the native structures, which suggests that computational methods for RNA structure prediction can already provide useful structural information for biological problems. However, the prediction accuracy for non-Watson-Crick interactions, key to proper folding of RNAs, is low and some predicted models had high Clash Scores. These two difficulties point to some of the continuing bottlenecks in RNA structure prediction. All submitted models are available for download at http://ahsoka.u-strasbg.fr/rnapuzzles/.
16
Computational identification of RNA functional determinants by three-dimensional quantitative structure-activity relationships. Nucleic Acids Res 2014; 42:11261-71. [PMID: 25200082 PMCID: PMC4176186 DOI: 10.1093/nar/gku816] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/12/2022] Open
Abstract
Anti-infection drugs target vital functions of infectious agents, including their ribosome and other essential non-coding RNAs. One reason infectious agents become resistant to drugs is the emergence of mutations that eliminate drug-binding affinity while maintaining vital elements. Identifying these elements is based on the determination of viable and lethal mutants and associated structures. However, determining the structure of enough mutants at high resolution is not always possible. Here, we introduce a new computational method, MC-3DQSAR, to determine the vital elements of target RNA structure from mutagenesis and available high-resolution data. We applied the method to further characterize the structural determinants of the bacterial 23S ribosomal RNA sarcin–ricin loop (SRL), as well as those of the lead-activated and hammerhead ribozymes. The method was accurate in confirming experimentally determined essential structural elements and predicting the viability of new SRL variants, which were either observed in bacteria or validated in bacterial growth assays. Our results indicate that MC-3DQSAR could be used systematically to evaluate the drug-target potentials of any RNA sites using current high-resolution structural data.
17
Towards 3D structure prediction of large RNA molecules: an integer programming framework to insert local 3D motifs in RNA secondary structure. Bioinformatics 2013; 28:i207-14. [PMID: 22689763 PMCID: PMC3371858 DOI: 10.1093/bioinformatics/bts226] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.5] [Indexed: 01/03/2023] Open
Abstract
Motivation: The prediction of an RNA 3D structure from its sequence alone is a milestone toward RNA function analysis and prediction. In recent years, many methods have addressed this challenge, ranging from cycle decomposition and fragment assembly to molecular dynamics simulations. However, their predictions remain fragile and limited to small RNAs. To expand the range and accuracy of these techniques, we need to develop algorithms that will enable us to use all the structural information available. In particular, the energetic contribution of secondary structure interactions is now well documented, but the quantification of non-canonical interactions—those shaping the tertiary structure—is poorly understood. Nonetheless, even if a complete RNA tertiary structure energy model is currently unavailable, we now have catalogues of local 3D structural motifs including non-canonical base pairings. A practical objective is thus to develop techniques enabling us to use this knowledge for robust RNA tertiary structure predictors. Results: In this work, we introduce RNA-MoIP, a program that benefits from the progress made over the last 30 years in the field of RNA secondary structure prediction and expands these methods to incorporate the novel local motif information available in databases. Using an integer programming framework, our method refines predicted secondary structures (i.e. removes incorrect canonical base pairs) to accommodate the insertion of RNA 3D motifs (i.e. hairpins, internal loops and k-way junctions). Then, we use predictions as templates to generate complete 3D structures with the MC-Sym program. We benchmarked RNA-MoIP on a set of 9 RNAs with sizes varying from 53 to 128 nucleotides. We show that our approach (i) improves the accuracy of canonical base pair predictions; (ii) identifies the best secondary structures in a pool of suboptimal structures; and (iii) predicts accurate 3D structures of large RNA molecules.
Availability: RNA-MoIP is publicly available at: http://csb.cs.mcgill.ca/RNAMoIP. Contact: jeromew@cs.mcgill.ca
18
Abstract
We report the results of a first, collective, blind experiment in RNA three-dimensional (3D) structure prediction, encompassing three prediction puzzles. The goals are to assess the leading edge of RNA structure prediction techniques; compare existing methods and tools; and evaluate their relative strengths, weaknesses, and limitations in terms of sequence length and structural complexity. The results should give potential users insight into the suitability of available methods for different applications and support the RNA structure prediction community in its ongoing efforts to improve prediction tools. We also report the creation of an automated evaluation pipeline to facilitate the analysis of future RNA structure prediction exercises.
19
Determining RNA three-dimensional structures using low-resolution data. J Struct Biol 2012; 179:252-60. [PMID: 22387042 DOI: 10.1016/j.jsb.2011.12.024] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Received: 08/15/2011] [Revised: 11/29/2011] [Accepted: 12/06/2011] [Indexed: 11/25/2022]
Abstract
Knowing the 3-D structure of an RNA is fundamental to understanding its biological function. Nowadays X-ray crystallography and NMR spectroscopy are systematically applied to newly discovered RNAs. However, the application of these high-resolution techniques is not always possible, and thus scientists must turn to lower resolution alternatives. Here, we introduce a pipeline to systematically generate atomic resolution 3-D structures that are consistent with low-resolution data sets. We compare and evaluate the discriminative power of a number of low-resolution experimental techniques to reproduce the structure of the Escherichia coli tRNA(VAL) and the P4-P6 domain of the Tetrahymena thermophila group I intron. We test, singly and in combination, the most accessible low-resolution techniques, i.e. hydroxyl radical footprinting (OH), methidiumpropyl-EDTA (MPE), multiplexed hydroxyl radical cleavage (MOHCA), and small-angle X-ray scattering (SAXS). We show that OH-derived constraints are accurate enough to discriminate structures at the atomic level, whereas EDTA-based constraints apply to global shape determination. We provide a guide for choosing which experimental technique, or combination thereof, is best in which context. The pipeline represents an important step towards high-throughput low-resolution RNA structure determination.
|
20
|
Approaches Targeting KV10.1 Open a Novel Window for Cancer Diagnosis and Therapy. Curr Med Chem 2012; 19:675-82. [DOI: 10.2174/092986712798992011] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.1] [Received: 08/17/2011] [Accepted: 10/19/2011] [Indexed: 11/22/2022]
|
21
|
RNA G-Quadruplexes in the model plant species Arabidopsis thaliana: prevalence and possible functional roles. Nucleic Acids Res 2010; 38:8149-63. [PMID: 20860998 PMCID: PMC3001093 DOI: 10.1093/nar/gkq804] [Citation(s) in RCA: 78] [Impact Index Per Article: 5.6] [Received: 07/15/2010] [Revised: 08/24/2010] [Accepted: 08/30/2010] [Indexed: 01/25/2023] Open
Abstract
Tandem stretches of guanines can associate in hydrogen-bonded arrays to form G-quadruplexes, which are stabilized by K+ ions. Using computational methods, we searched for G-Quadruplex Sequence (GQS) patterns in the model plant species Arabidopsis thaliana. We found ~1200 GQS with a G3 repeat sequence motif, most of which are located in the intergenic region. Using a Markov modeled genome, we determined that GQS are significantly underrepresented in the genome. Additionally, we found ~43,000 GQS with a G2 repeat sequence motif; notably, 80% of these were located in genic regions, suggesting that these sequences may fold at the RNA level. Gene Ontology functional analysis revealed that GQS are overrepresented in genes encoding proteins of certain functional categories, including enzyme activity. Conversely, GQS are underrepresented in other categories of genes, notably those for non-coding RNAs such as tRNAs and rRNAs. We also find that genes that are differentially regulated by drought are significantly more likely to contain a GQS. CD-detected K+ titrations performed on representative RNAs verified formation of quadruplexes at physiological K+ concentrations. Overall, this study indicates that GQS are present at unique locations in Arabidopsis and that folding of RNA GQS may play important roles in regulating gene expression.
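A pattern search of this kind can be sketched with a regular expression. The snippet below is a minimal illustration and not the authors' pipeline: the function name `find_gqs`, the four-run layout, and the loop length of 1 to 7 nucleotides are assumptions borrowed from common G-quadruplex search conventions; the abstract itself only names the G3 and G2 repeat motifs.

```python
import re


def find_gqs(seq, g_run=3, max_loop=7):
    """Return (start, end, text) for each non-overlapping GQS-like match.

    Assumed pattern (hypothetical defaults): four runs of at least `g_run`
    guanines separated by three loops of 1..max_loop nucleotides.
    """
    run = "G{%d,}" % g_run
    loop = "[ACGTU]{1,%d}" % max_loop
    # Three (G-run + loop) units followed by a closing G-run.
    pattern = re.compile("(?:%s%s){3}%s" % (run, loop, run))
    return [(m.start(), m.end(), m.group()) for m in pattern.finditer(seq.upper())]
```

Calling `find_gqs(sequence, g_run=2)` would correspond to the looser G2 motif; with either setting, the hits are candidates only and say nothing about whether the sequence actually folds.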
|
22
|
Abstract
Exploiting the experimental information from small-angle X-ray solution scattering (SAXS) in conjunction with structure prediction algorithms can be advantageous in the case of ribonucleic acids (RNA), where global restraints on the 3D fold are often lacking. Traditional usage of SAXS data often starts by attempting to reconstruct the molecular shape ab initio, which is subsequently used to assess the quality of a model. Here, an alternative strategy is explored whereby the models from a very large decoy set are directly sorted according to their fit to the SAXS data. For rapid computation of SAXS patterns, the method developed here makes use of a coarse-grained representation of RNA. It also accounts for the explicit treatment of the contribution to the scattering of water molecules and ions surrounding the RNA. The method, called Fast-SAXS-RNA, is first calibrated using a tRNA (tRNA-val) and then tested on the P4-P6 fragment of group I intron (P4-P6). Fast-SAXS-RNA is then used as a filter for decoy models generated by the MC-Fold and MC-Sym pipeline, a suite of RNA 3D all-atom structure algorithms that encode and exploit RNA 3D architectural principles. The ability of Fast-SAXS-RNA to discriminate native folds is tested against three widely used RNA molecules in molecular modeling benchmarks: the tRNA, the P4-P6, and a synthetic hairpin suspected to assemble into a homodimer. For each molecule, a large pool of decoys is generated, scored, and ranked using Fast-SAXS-RNA. The method is able to identify low-RMSD models among top ranking structures, for both tRNA and P4-P6. For the hairpin, the approach correctly identifies the dimeric state as the solution structure over the monomeric state and alternative secondary structures. The method offers a powerful strategy for recognizing native RNA conformations as well as multimeric assemblies and alternative secondary structures, thus enabling high-throughput RNA structure determination using SAXS data.
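Sorting decoys by their fit to a scattering curve can be sketched as follows. This is an illustrative assumption, not the Fast-SAXS-RNA implementation: the reduced chi-square with a least-squares scale factor is a generic goodness-of-fit choice, and the names `chi_square` and `rank_decoys` are hypothetical. Each profile is assumed to be sampled on the same grid of scattering vectors as the experimental data.

```python
def chi_square(i_exp, sigma, i_calc):
    """Reduced chi-square between experimental and computed intensities.

    The computed profile is first rescaled by the least-squares factor that
    minimizes the weighted residuals (an assumed, generic fitting choice).
    """
    num = sum(e * c / (s * s) for e, c, s in zip(i_exp, i_calc, sigma))
    den = sum(c * c / (s * s) for c, s in zip(i_calc, sigma))
    if den == 0:
        raise ValueError("computed profile is all zeros")
    scale = num / den
    residuals = (((e - scale * c) / s) ** 2 for e, c, s in zip(i_exp, i_calc, sigma))
    return sum(residuals) / len(i_exp)


def rank_decoys(i_exp, sigma, decoy_profiles):
    """Return decoy names ordered from best to worst fit (lowest chi-square first)."""
    scored = [(chi_square(i_exp, sigma, profile), name)
              for name, profile in decoy_profiles.items()]
    return [name for _, name in sorted(scored)]
```

In a filter of this kind, only the top-ranked names would be passed on for closer inspection.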
|
23
|
Designing small multiple-target artificial RNAs. Nucleic Acids Res 2010; 38:e140. [PMID: 20453028 PMCID: PMC2910070 DOI: 10.1093/nar/gkq354] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.3] [Received: 12/16/2009] [Revised: 04/19/2010] [Accepted: 04/22/2010] [Indexed: 12/20/2022] Open
Abstract
MicroRNAs (miRNAs) are naturally occurring small RNAs that regulate the expression of several genes. MiRNAs' targeting rules are based on sequence complementarity between their mature products and targeted genes' mRNAs. Based on our present understanding of those rules, we developed an algorithm to design artificial miRNAs that simultaneously target a set of predetermined genes. To validate our algorithm in silico, we tested different sets of genes known to be targeted by a single miRNA. The algorithm finds the seed of the corresponding miRNA among the solutions, which also include the seeds of new artificial miRNA sequences potentially capable of targeting these genes as well. We also validated the functionality of some artificial miRNAs designed to simultaneously target members of the E2F family. These artificial miRNAs reproduced the effects of E2F inhibition in both normal human fibroblasts and prostate cancer cells, where they inhibited cell proliferation and induced cellular senescence. We conclude that the current miRNA targeting rules based on the seed sequence work to design multiple-target artificial miRNAs. This approach may find applications in both research and therapeutics.
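The core idea of seed-based multi-target design can be illustrated with a short sketch. It is not the published algorithm: the 7-nucleotide seed length, the restriction to perfect Watson-Crick matches, and the function names are assumptions made for illustration. The sketch lists every seed whose complementary site occurs in all of the supplied target sequences.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Reverse complement of an RNA string made of A, C, G, U."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def shared_seed_candidates(utrs, k=7):
    """Return sorted k-mer seeds whose complementary site occurs in every UTR.

    `k=7` is an assumed seed length; sequences may be given as DNA or RNA.
    """
    site_sets = []
    for utr in utrs:
        utr = utr.upper().replace("T", "U")
        kmers = {utr[i:i + k] for i in range(len(utr) - k + 1)}
        # Skip windows containing ambiguous symbols such as N.
        site_sets.append({s for s in kmers if all(b in COMPLEMENT for b in s)})
    shared_sites = set.intersection(*site_sets) if site_sets else set()
    return sorted(reverse_complement(site) for site in shared_sites)
```

A real design procedure would also have to choose the rest of the miRNA sequence and screen for off-target genes, which this sketch ignores.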
|
24
|
Abstract
Increasingly sophisticated knowledge about RNA structure and function requires an inclusive knowledge representation that facilitates the integration of independently generated information arising from such efforts as genome sequencing projects, microarray analyses, structure determination and RNA SELEX experiments. While RNAML, an XML-based representation, has been proposed as an exchange format for a select subset of information, it lacks domain-specific semantics that are essential for answering questions that require expert knowledge. Here, we describe an RNA knowledge base (RKB) for structure-based knowledge using RDF/OWL Semantic Web technologies. RKB extends a number of ontologies and contains basic terminology for nucleic acid composition along with context/model-specific structural features such as sugar conformations, base pairings and base stackings. RKB (available at http://semanticscience.org/projects/rkb) is populated with PDB entries and MC-Annotate structural annotation. We show queries to the RKB using description logic reasoning, thus opening the door to question answering over independently published RNA knowledge using Semantic Web technologies.
|
25
|
Enzyme activities and genetic polymorphisms in the mouse as a model animal under the influence of selection for endurance and protein deposition [original title: Enzymaktivitäten und genetische Polymorphismen bei der Maus als Modelltier unter dem Einfluß der Selektion auf Belastbarkeit und Proteinansatz]. ACTA ACUST UNITED AC 2010. [DOI: 10.1111/j.1439-0388.1981.tb00325.x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/29/2022]
|
26
|
Preliminary selection results for the combination of total protein deposition and endurance in mice. ACTA ACUST UNITED AC 2010. [DOI: 10.1111/j.1439-0388.1979.tb00217.x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/30/2022]
|
27
|
Enzyme activities and genetic polymorphisms in the mouse as a model animal under the influence of selection for endurance and protein deposition [original title: Enzymaktivitäten und genetische Polymorphismen bei der Maus als Modelltier unter dem Einfluß der Selektion auf Belastbarkeit und Proteinansatz]. ACTA ACUST UNITED AC 2010. [DOI: 10.1111/j.1439-0388.1977.tb01534.x] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 11/29/2022]
|
28
|
|
29
|
New metrics for comparing and assessing discrepancies between RNA 3D structures and models. RNA (NEW YORK, N.Y.) 2009; 15:1875-85. [PMID: 19710185 PMCID: PMC2743038 DOI: 10.1261/rna.1700409] [Citation(s) in RCA: 115] [Impact Index Per Article: 7.7] [Indexed: 05/14/2023]
Abstract
To benchmark progress made in RNA three-dimensional modeling and assess newly developed techniques, reliable and meaningful comparison metrics and associated tools are necessary. Generally, the average root-mean-square deviations (RMSDs) are quoted. However, RMSD can be misleading since errors are spread over the whole molecule and do not account for the specificity of RNA base interactions. Here, we introduce two new metrics that are particularly suitable to RNAs: the deformation index and deformation profile. The deformation index is calibrated by the interaction network fidelity, which considers base-base-stacking and base-base-pairing interactions within the target structure. The deformation profile highlights dissimilarities between structures at the nucleotide scale for both intradomain and interdomain interactions. Our results show that there is little correlation between RMSD and interaction network fidelity. The deformation profile is a tool that allows for rapid assessment of the origins of discrepancies.
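The relationship between these quantities can be sketched in a few lines. The definitions used here are assumptions, since the abstract does not state them: interaction network fidelity is taken as the geometric mean of precision and sensitivity over annotated interactions, and the deformation index as RMSD divided by that fidelity. Coordinates are assumed to be already superimposed, and all function names are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z), assumed superimposed."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equally long")
    total = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
                for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))


def interaction_network_fidelity(reference, model):
    """sqrt(precision * sensitivity) over sets of interactions (assumed definition)."""
    reference, model = set(reference), set(model)
    true_pos = len(reference & model)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(model)
    sensitivity = true_pos / len(reference)
    return math.sqrt(precision * sensitivity)


def deformation_index(rmsd_value, inf_value):
    """RMSD scaled by interaction fidelity (assumed definition); inf when INF is 0."""
    return math.inf if inf_value == 0 else rmsd_value / inf_value
```

Under these assumptions, two models with the same RMSD receive different deformation indices when one of them reproduces more of the target's base pairing and stacking interactions.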
|
30
|
Recognition and coupling of A-to-I edited sites are determined by the tertiary structure of the RNA. Nucleic Acids Res 2009; 37:6916-26. [PMID: 19740768 PMCID: PMC2777444 DOI: 10.1093/nar/gkp731] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.9] [Indexed: 11/12/2022] Open
Abstract
Adenosine-to-inosine (A-to-I) editing has been shown to be an important mechanism that increases protein diversity in the brain of organisms from human to fly. The family of ADAR enzymes converts some adenosines of RNA duplexes to inosines through hydrolytic deamination. The adenosine recognition mechanism is still largely unknown. Here, to investigate it, we analyzed a set of selectively edited substrates with a cluster of edited sites. We used a large set of individual transcripts sequenced by the 454 sequencing technique. On average, we analyzed 570 single transcripts per edited region at four different developmental stages from embryogenesis to adulthood. To our knowledge, this is the first time large-scale sequencing has been used to determine synchronous editing events. We demonstrate that edited sites are only coupled within specific distances from each other. Furthermore, our results show that the coupled sites of editing are positioned on the same side of a helix, indicating that the three-dimensional structure is key in ADAR enzyme substrate recognition. Finally, we propose that editing by the ADAR enzymes is initiated by their attraction to one principal site in the substrate.
|
31
|
Molecular basis of TRAP-5'SL RNA interaction in the Bacillus subtilis trp operon transcription attenuation mechanism. RNA (NEW YORK, N.Y.) 2009; 15:55-66. [PMID: 19033375 PMCID: PMC2612762 DOI: 10.1261/rna.1314409] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 08/13/2008] [Accepted: 10/14/2008] [Indexed: 05/27/2023]
Abstract
Expression of the Bacillus subtilis trpEDCFBA operon is regulated by the interaction of tryptophan-activated TRAP with 11 (G/U)AG trinucleotide repeats that lie in the leader region of the nascent trp transcript. Bound TRAP prevents folding of an antiterminator structure and favors formation of an overlapping intrinsic terminator hairpin upstream of the trp operon structural genes. A 5'-stem-loop (5'SL) structure that forms just upstream of the triplet repeat region increases the affinity of TRAP-trp RNA interaction, thereby increasing the efficiency of transcription termination. Single-stranded nucleotides in the internal loop and in the hairpin loop of the 5'SL are important for TRAP binding. We show here that altering the distance between these two loops suggests that G7, A8, and A9 from the internal loop and A19 and G20 from the hairpin loop constitute two structurally discrete TRAP-binding regions. Photochemical cross-linking experiments also show that the hairpin loop of the 5'SL is in close proximity to the flexible loop region of TRAP during TRAP-5'SL interaction. The dimensions of B. subtilis TRAP and of a three-dimensional model of the 5'SL generated using the MC-Sym and MC-Fold pipeline imply that the 5'SL binds the protein in an orientation where the helical axis of the 5'SL is perpendicular to the plane of TRAP. This interaction not only increases the affinity of TRAP-trp leader RNA interaction, but also orients the downstream triplet repeats for interaction with the 11 KKR motifs that lie on TRAP's perimeter, increasing the likelihood that TRAP will bind in time to promote termination.
|
32
|
The MC-Fold and MC-Sym pipeline infers RNA structure from sequence data. Nature 2008; 452:51-5. [PMID: 18322526 DOI: 10.1038/nature06684] [Citation(s) in RCA: 580] [Impact Index Per Article: 36.3] [Received: 01/07/2007] [Accepted: 01/11/2008] [Indexed: 12/17/2022]
Abstract
The classical RNA secondary structure model considers A.U and G.C Watson-Crick as well as G.U wobble base pairs. Here we replace it with a new one, in which sets of nucleotide cyclic motifs define RNA structures. This model allows us to unify all base pairing energetic contributions in an effective scoring function to tackle the problem of RNA folding. We show how pipelining two computer algorithms based on nucleotide cyclic motifs, MC-Fold and MC-Sym, reproduces a series of experimentally determined RNA three-dimensional structures from the sequence. This demonstrates how crucial the consideration of all base-pairing interactions is in filling the gap between sequence and structure. We use the pipeline to define rules of precursor microRNA folding in double helices, despite the presence of a number of presumed mismatches and bulges, and to propose a new model of the human immunodeficiency virus-1 -1 frame-shifting element.
|
33
|
Abstract
Substrate recognition by the VS ribozyme involves a magnesium-dependent loop/loop interaction between the SLI substrate and the SLV hairpin from the catalytic domain. Recent NMR studies of SLV demonstrated that magnesium ions stabilize a U-turn loop structure and trigger a conformational change for the extruded loop residue U700, suggesting a role for U700 in SLI recognition. Here, we kinetically characterized VS ribozyme mutants to evaluate the contribution of U700 and other SLV loop residues to SLI recognition. To help interpret the kinetic data, we structurally characterized the SLV mutants by NMR spectroscopy and generated a three-dimensional model of the SLI/SLV complex by homology modeling with MC-Sym. We demonstrated that the mutation of U700 by A, C, or G does not significantly affect ribozyme activity, whereas deletion of U700 dramatically impairs this activity. The U700 backbone is likely important for SLI recognition, but does not appear to be required for either the structural integrity of the SLV loop or for direct interactions with SLI. Thus, deletion of U700 may affect other aspects of SLI recognition, such as magnesium ion binding and SLV loop dynamics. As part of our NMR studies, we developed a convenient assay based on detection of unusual 31P and 15N N7 chemical shifts to probe the formation of U-turn structures in RNAs. Our model of the SLI/SLV complex, which is compatible with biochemical data, leads us to propose novel interactions at the loop I/loop V interface.
|
34
|
A comparative analysis of the triloops in all high-resolution RNA structures reveals sequence structure relationships. RNA (NEW YORK, N.Y.) 2007; 13:1537-45. [PMID: 17652406 PMCID: PMC1950765 DOI: 10.1261/rna.597507] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.2] [Indexed: 05/16/2023]
Abstract
Despite an increasing number of experimentally determined RNA structures, the gap between the number of structures and that of RNA families is still growing. To overcome this limitation, efficient and reliable RNA modeling methodologies must be developed. Toward this goal, we show here how triloop sequence-structure relationships have been inferred through a systematic analysis of all triloops found in available high-resolution structures. The structural annotation of all triloops allowed us to define discrete states of the triloop's conformational space, and therefore an explicit sequence-to-structure relation. The sequence-structure relationships inferred from this explicit relation are presented in a convenient modeling table that provides a limited set of possible three-dimensional structures given any triloop sequence. The table is indexed by the two nucleotides that form the triloop's flanking base pair, since they are shown to provide the most information about the triloop three-dimensional structures. We also report observations of important conformational variations in the X-ray crystallographic structures, which we believe might be the result of RNA dynamics.
|
35
|
Abstract
The formation of beta-sheet domains in proteins involves five energetically important factors: the formation of networks of hydrogen bonds and hydrophobic faces, and the residue propensities, or preferences, to be found at the edges of the beta-sheet, to adopt the extended conformation, and to make contact with other residues. These relative energy contributions define a potential energy function. Here, we show how optimizing this potential energy function reveals the formation of hydrophobic faces as the most important factor. The potential energy function was optimized to minimize the Z-scores of the native topologies among the exhaustive sets of over 400 different beta-sheets. These results corroborate experimental data showing that the environment of a protein is an important modulator of beta-sheet folding. The contact propensities were found to be the least important, which could explain the poor predictive power of beta-strand alignment methods based on pair-wise contact matrices.
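The Z-score criterion mentioned above can be illustrated with a minimal sketch. The details are assumptions: the abstract does not say whether the native topology is included in the reference set or which standard deviation is used, so this version scores the native energy against a separate list of alternative-topology energies using the population standard deviation. The function name is hypothetical.

```python
import math
import statistics


def native_z_score(native_energy, alternative_energies):
    """Z-score of the native energy relative to alternative topologies.

    A strongly negative value means the native topology is well separated
    from the alternatives (lower energy is assumed to be better).
    """
    mean = statistics.mean(alternative_energies)
    spread = statistics.pstdev(alternative_energies)
    if math.isclose(spread, 0.0):
        raise ValueError("alternative energies have zero spread")
    return (native_energy - mean) / spread
```

An optimizer would then adjust the weights of the energy terms so that this value becomes as negative as possible across the set of proteins considered.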
|
36
|
Staufen1 regulates diverse classes of mammalian transcripts. EMBO J 2007; 26:2670-81. [PMID: 17510634 PMCID: PMC1888674 DOI: 10.1038/sj.emboj.7601712] [Citation(s) in RCA: 158] [Impact Index Per Article: 9.3] [Received: 02/28/2007] [Accepted: 04/05/2007] [Indexed: 11/10/2022] Open
Abstract
It is currently unknown how extensively the double-stranded RNA-binding protein Staufen (Stau)1 is utilized by mammalian cells to regulate gene expression. To date, Stau1 binding to the 3'-untranslated region (3'-UTR) of ADP ribosylation factor (ARF)1 mRNA has been shown to target ARF1 mRNA for Stau1-mediated mRNA decay (SMD). ARF1 SMD depends on translation and recruitment of the nonsense-mediated mRNA decay factor Upf1 to the ARF1 3'-UTR by Stau1. Here, we demonstrate that Stau1 binds to a complex structure within the ARF1 3'-UTR. We also use microarrays to show that 1.1 and 1.0% of the 11 569 HeLa-cell transcripts that were analyzed are upregulated and downregulated, respectively, at least two-fold upon Stau1 depletion in three independently performed experiments. We localize the Stau1 binding site to the 3'-UTR of four mRNAs that we define as natural SMD targets. Additionally, we provide evidence that the efficiency of SMD increases during the differentiation of C2C12 myoblasts to myotubes. We propose that Stau1 influences the expression of a wide variety of physiologic transcripts and metabolic pathways.
|
37
|
Abstract
A new approach, graph-grammars, to encode RNA tertiary structure patterns is introduced and exemplified with the classical sarcin–ricin motif. The sarcin–ricin motif is found in the stem of the crucial ribosomal loop E (also referred to as the sarcin–ricin loop), which is sensitive to the α-sarcin and ricin toxins. Here, we generate a graph-grammar for the sarcin–ricin motif and apply it to derive putative sequences that would fold in this motif. The biological relevance of the derived sequences is confirmed by a comparison with those found in known sarcin–ricin sites in an alignment of over 800 bacterial 23S ribosomal RNAs. The comparison raised alternative alignments in a few sarcin–ricin sites, which were assessed using tertiary structure predictions and 3D modeling. The sarcin–ricin motif graph-grammar was built with indivisible nucleotide interaction cycles that were recently observed in structured RNAs. A comparison of the sequences and 3D structures of each cycle that constitutes the sarcin–ricin motif gave us additional insights about RNA sequence–structure relationships. In particular, this analysis revealed that the sequence space of an RNA motif depends on a structural context that goes beyond the single base pairing and base-stacking interactions.
|
38
|
Abstract
The E2F family of transcription factors is essential in the regulation of the cell cycle and apoptosis. While the activity of E2F1-3 is tightly controlled by the retinoblastoma family of proteins, the expression of these factors is also regulated at the level of transcription, post-translational modifications and protein stability. Recently, a new level of regulation of E2Fs has been identified, where micro-RNAs (miRNAs) from the mir-17-92 cluster influence the translation of the E2F1 mRNA. We now report that miR-20a, a member of the mir-17-92 cluster, modulates the translation of the E2F2 and E2F3 mRNAs via binding sites in their 3'-untranslated region. We also found that the endogenous E2F1, E2F2, and E2F3 directly bind the promoter of the mir-17-92 cluster activating its transcription, suggesting an autoregulatory feedback loop between E2F factors and miRNAs from the mir-17-92 cluster. Our data also point toward an anti-apoptotic role for miR-20a, since overexpression of this miRNA decreased apoptosis in a prostate cancer cell line, while inhibition of miR-20a by an antisense oligonucleotide resulted in increased cell death after doxorubicin treatment. This anti-apoptotic role of miR-20a may explain some of the oncogenic capacities of the mir-17-92 cluster. Altogether, these results suggest that the autoregulation between E2F1-3 and miR-20a is important for preventing an abnormal accumulation of E2F1-3 and may play a role in the regulation of cellular proliferation and apoptosis.
|
39
|
Abstract
A minimum cycle basis of the tertiary structure of a large ribosomal subunit (LSU) X-ray crystal structure was analyzed. Most cycles are small, as they are composed of 3 to 5 nt, and are repeated across the LSU tertiary structure. We used hierarchical clustering to quantify and classify the 4 nt cycles. One class is defined by the GNRA tetraloop motif. The inspection of the GNRA class revealed peculiar instances in sequence. First is the presence of UA, CA, UC and CC base pairs that substitute for the usual sheared GA base pair. Second is the revelation of GNR(Xn)A tetraloops, where Xn is bulged out of the classical GNRA structure, and of GN/RA formed by the two strands of interior-loops. We were able to unambiguously characterize the cycle classes using base stacking and base pairing annotations. The cycles identified correspond to small and cyclic motifs that compose most of the LSU RNA tertiary structure and contribute to its thermodynamic stability. Consequently, the RNA minimum cycles could well be used as the basic elements of RNA tertiary structure prediction methods.
|
40
|
Abstract
Systematic protein folding studies depend on protein three-dimensional structure annotation, the assignment of amino acid structural types from atomic coordinates. Significant stabilizing factors between adjacent beta-sheet peptide chains have recently been characterized and were not considered during the development of previously published annotation methods. To produce an accurate beta-sheet domain catalog and to encompass the full beta-sheet spectacle, we developed a method, beta-Spider, which evaluates a packing energy between adjacent peptide chains in accordance with the newly discovered stabilizing factors. While considering important energetic factors, our approach also minimizes the use of subjective criteria, such as (phi,psi) boundaries and sets of H-bonding motifs that are used in other existing methods. As a result of the application of beta-Spider to a set of available high-resolution X-ray crystal structures, we present here a new beta-sheet catalog that differs considerably from the one produced by the most acclaimed DSSP method. The catalog includes new H-bonding motifs that were never reported.
|
41
|
Abstract
The aim of the RNA Ontology Consortium (ROC) is to create an integrated conceptual framework, an RNA Ontology (RO), with a common, dynamic, controlled, and structured vocabulary to describe and characterize RNA sequences, secondary structures, three-dimensional structures, and dynamics pertaining to RNA function. The RO should produce tools for clear communication about RNA structure and function for multiple uses, including the integration of RNA electronic resources into the Semantic Web. These tools should allow the accurate description in computer-interpretable form of the coupling between RNA architecture, function, and evolution. The purposes for creating the RO are, therefore, (1) to integrate sequence and structural databases; (2) to allow different computational tools to interoperate; (3) to create powerful software tools that bring advanced computational methods to the bench scientist; and (4) to facilitate precise searches for all relevant information pertaining to RNA. For example, one initial objective of the ROC is to define, identify, and classify RNA structural motifs described in the literature or appearing in databases and to agree on a computer-interpretable definition for each of these motifs. To achieve these aims, the ROC will foster communication and promote collaboration among RNA scientists by coordinating frequent face-to-face workshops to discuss, debate, and resolve difficult conceptual issues. These meeting opportunities will create new directions at various levels of RNA research. The ROC will work closely with the PDB/NDB structural databases and the Gene, Sequence, and Open Biomedical Ontology Consortia to integrate the RO with existing biological ontologies to extend existing content while maintaining interoperability.
|
42
|
Identification of two distinct intracellular localization signals in STT3-B. Arch Biochem Biophys 2005; 445:108-14. [PMID: 16297371 DOI: 10.1016/j.abb.2005.10.007] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 09/08/2005] [Revised: 09/30/2005] [Accepted: 10/04/2005] [Indexed: 12/25/2022]
Abstract
The STT3 subunit of the oligosaccharyltransferase complex plays a critical role in the N-glycosylation process. From Arabidopsis thaliana to Homo sapiens, two functional STT3 isoforms have been identified, STT3-A and STT3-B. We report that the last transmembrane (TM) segment of STT3-B corresponds to a topogenic determinant that is sufficient for proper integration and orientation of the STT3-B C-terminal domain. Notably, the last TM segments of the STT3-A and -B isoforms present major differences in amino acid sequence and predicted 3D structure. We also identified a bipartite nuclear targeting sequence in the C-terminal tail of STT3-B that is absent in STT3-A. The latter sequence is sufficient to induce nucleolar localization of a reporter protein. Our results show that STT3-A and -B display two structural differences that may have a drastic influence on their function and might account for the remarkable evolutionary conservation of the two STT3 paralogs.
|
43
|
A 6374 unigene set corresponding to low abundance transcripts expressed following fertilization in Solanum chacoense Bitt, and characterization of 30 receptor-like kinases. PLANT MOLECULAR BIOLOGY 2005; 59:515-32. [PMID: 16235114 DOI: 10.1007/s11103-005-0536-8] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Received: 05/16/2005] [Accepted: 07/04/2005] [Indexed: 05/04/2023]
Abstract
To characterize regulatory genes that are expressed in ovule tissues after fertilization, we have undertaken an EST sequencing project in Solanum chacoense, a self-incompatible wild potato species. Two cDNA libraries made from ovule tissues covering embryo development from zygote to late torpedo-stage were constructed and plated at high density on nylon membranes. To decrease EST redundancy and enrich for transcripts corresponding to weakly expressed genes, a self-probe subtraction method was used to select the colonies harboring the genes to be sequenced. 7741 good sequences were obtained and, from these, 6374 unigenes were isolated. Thus, the self-probe subtraction resulted in a strong enrichment in singletons, a decrease in the number of clones per contig, and concomitantly, an enrichment in the total number of unigenes obtained (82%). To gain insights into signal transduction events occurring during embryo development, all the receptor-like kinases (or protein receptor kinases) were analyzed by quantitative real-time RT-PCR. Interestingly, 28 out of the 30 RLKs isolated were predominantly expressed in ovary tissues or young developing fruits, and 23 were transcriptionally induced following fertilization. Thus, the self-probe subtraction did not select for genes weakly expressed in the target tissue while being highly expressed elsewhere in the plant. Of the receptor-like kinase (RLK) genes isolated, the leucine-rich repeat (LRR) family of RLKs was by far the most represented, with 25 members covering 11 LRR classes.
|
44
|
Identification of a conserved RNA motif essential for She2p recognition and mRNA localization to the yeast bud. Mol Cell Biol 2005; 25:4752-66. [PMID: 15899876 PMCID: PMC1140632 DOI: 10.1128/mcb.25.11.4752-4766.2005] [Citation(s) in RCA: 80] [Impact Index Per Article: 4.2] [Indexed: 11/20/2022] Open
Abstract
In Saccharomyces cerevisiae, over twenty mRNAs localize to the bud tip of daughter cells, playing roles in processes as different as mating type switching and plasma membrane targeting. The localization of these transcripts depends on interactions between cis-acting localization elements, or zipcodes, and the RNA-binding protein She2p. While previous studies identified four different localization elements in the bud-localized ASH1 mRNA, the main determinants for She2p recognition are still unknown. To investigate the RNA-binding specificity of She2p, we isolated She2p-binding RNAs by in vivo selection from libraries of partially randomized ASH1 localization elements. The RNAs isolated contained a similar loop-stem-loop structure with a highly conserved CGA triplet in one loop and a single conserved cytosine in the other loop. Mutating these conserved nucleotides or the stem separating them resulted in the loss of She2p binding and in the delocalization of a reporter mRNA. Using this information, we identified the same RNA motif in two other known bud-localized transcripts, suggesting that this motif is conserved among bud-localized mRNAs. These results show that mRNAs with zipcodes lacking primary sequence similarity can rely on a few conserved nucleotides properly oriented in their three-dimensional structure in order to be recognized by the same localization machinery.
|
45
|
Abstract
The PrP-like Doppel (Dpl) protein causes apoptotic death of cerebellar neurons in transgenic mice, a process prevented by expression of the wild type (wt) cellular prion protein, PrP(C). Internally deleted forms of PrP(C) resembling Dpl such as PrPDelta32-121 produce a similar PrP(C)-sensitive pro-apoptotic phenotype in transgenic mice. Here we demonstrate that these phenotypic attributes of wt Dpl, wt PrP(C), and PrPDelta32-121 can be accurately recapitulated by transfected mouse cerebellar granule cell cultures. This system was then explored by mutagenesis of the co-expressed prion proteins to reveal functional determinants. By this means, neuroprotective activity of wt PrP(C) was shown to be nullified by a deletion of the N-terminal charged region implicated in endocytosis and retrograde axonal transport (PrPDelta23-28), by deletion of all five octarepeats (PrPDelta51-90), or by glycine replacement of four octarepeat histidine residues required for selective binding of copper ions (Prnp"H/G"). In the case of Dpl, overlapping deletions defined a requirement for the gene interval encoding helices B and B' (DplDelta101-125). These data suggest contributions of copper binding and neuronal trafficking to wt PrP(C) function in vivo and place constraints upon current hypotheses to explain Dpl/PrP(C) antagonism by competitive ligand binding. Further implementation of this assay should provide a fuller understanding of the attributes and subcellular localizations required for activity of these enigmatic proteins.
|
46
|
Abstract
ERPIN is an RNA motif identification program that takes an RNA sequence alignment as input and identifies related sequences using a profile-based dynamic programming algorithm. ERPIN differs from other RNA motif search programs in its ability to capture subtle biases in the training set and produce highly specific and sensitive searches, while keeping CPU requirements at a practical level. In its latest version, ERPIN also computes E-values, which tell biologists how likely they are to encounter a specific sequence match by chance, a useful indication of biological significance. We present here the ERPIN online search interface (http://tagc.univ-mrs.fr/erpin/). This web server automatically performs ERPIN searches for different RNA genes or motifs, using predefined training sets and search parameters. With a couple of clicks, users can analyze an entire bacterial genome or a genomic segment of up to 5 Mb for the presence of tRNAs, 5S rRNAs, SRP RNA, C/D box snoRNAs, hammerhead motifs, miRNAs and other motifs. Search results are displayed with sequence, score, position, E-value and secondary structure graphics. An example of a complete genome scan is provided, as well as an evaluation of run times and specificity/sensitivity information for all available motifs.
|
47
|
Modifications and deletions of helices within the hairpin ribozyme-substrate complex: an active ribozyme lacking helix 1. RNA (New York, N.Y.) 2004; 10:395-402. [PMID: 14970385 PMCID: PMC1370935 DOI: 10.1261/rna.5650904] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 04/07/2003] [Accepted: 11/18/2003] [Indexed: 05/24/2023]
Abstract
Within the hairpin ribozyme, structural elements required for formation of the active tertiary structure are localized in two independently folding domains, each consisting of an internal loop flanked by helical elements. Here, we present results of a systematic examination of the relationship between the structure of the helical elements and the ability of the RNA to form the catalytically active tertiary structure. Deletions and mutational analyses indicate that helix 1 (H1) in domain A can be entirely eliminated, while segments of helices 2, 3, and 4 can also be deleted. From these results, we derive a new active minimal ribozyme that contains three helical elements, an internal loop, and a terminal loop. A three-dimensional model of this truncated ribozyme was generated using MC-SYM, and confirms that the catalytic core of the minimized construct can adopt a tertiary structure that is very similar to that of the nontruncated version. A new strategy is described to study the functional importance of various residues and chemical groups and to identify specific interdomain interactions. This approach uses two physically separated and truncated domains derived from the minimal motif.
|
48
|
|
49
|
RNA canonical and non-canonical base pairing types: a recognition method and complete repertoire. Nucleic Acids Res 2002; 30:4250-63. [PMID: 12364604 PMCID: PMC140540 DOI: 10.1093/nar/gkf540] [Citation(s) in RCA: 109] [Impact Index Per Article: 5.0] [Indexed: 11/15/2022] Open
Abstract
The problem of systematic and objective identification of canonical and non-canonical base pairs in RNA three-dimensional (3D) structures was studied. A probabilistic approach was applied, and an algorithm and its implementation in a computer program that detects and analyzes all the base pairs contained in RNA 3D structures were developed. The algorithm objectively distinguishes among canonical and non-canonical base pairing types formed by three, two and one hydrogen bonds (H-bonds), as well as those containing bifurcated and C-H...X H-bonds. The nodes of a bipartite graph are used to encode the donor and acceptor atoms of a 3D structure. The capacities of the edges correspond to probabilities computed from the geometry of the donor and acceptor groups to form H-bonds. The maximum flow from donors to acceptors directly identifies base pairs and their types. A complete repertoire of base pairing types was built from the detected H-bonds of all X-ray crystal structures of a resolution of 3.0 Å or better, including the large and small ribosomal subunits. The base pairing types are labeled using an extension of the nomenclature recently introduced by Leontis and Westhof. The probabilistic method was implemented in MC-Annotate, an RNA structure analysis computer program used to determine the base pairing parameters of the 3D modeling system MC-Sym.
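The donor-to-acceptor maximum-flow idea described in this abstract can be illustrated with a minimal sketch. This is not MC-Annotate's implementation: the node names, the edge probabilities, the added `source` and `sink` nodes, and the unit capacity per donor and acceptor are all assumptions made for illustration, and a generic Edmonds-Karp routine stands in for whatever flow algorithm the authors used.

```python
from collections import deque


def max_flow(capacity, source, sink, eps=1e-12):
    """Edmonds-Karp maximum flow on a dict-of-dicts capacity graph."""
    # Build the residual graph, adding zero-capacity reverse edges.
    residual = {source: {}, sink: {}}
    for u, neighbours in capacity.items():
        for v, c in neighbours.items():
            residual.setdefault(u, {})
            residual.setdefault(v, {})
            residual[u][v] = residual[u].get(v, 0.0) + c
            residual[v].setdefault(u, 0.0)

    total = 0.0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > eps and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total, residual

        # Find the bottleneck capacity along the path.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]

        # Push the bottleneck flow along the path.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        total += bottleneck


# Hypothetical toy instance: two donor atoms (D1, D2), two acceptor atoms
# (A1, A2). Donor-to-acceptor capacities play the role of H-bond
# probabilities; the values below are invented, not taken from the paper.
graph = {
    "source": {"D1": 1.0, "D2": 1.0},   # assumed unit capacity per donor
    "D1": {"A1": 0.9, "A2": 0.2},
    "D2": {"A2": 0.8},
    "A1": {"sink": 1.0},                # assumed unit capacity per acceptor
    "A2": {"sink": 1.0},
}

total, residual = max_flow(graph, "source", "sink")
print(f"total flow from donors to acceptors: {total:.2f}")
```

In this toy setting, donor-acceptor edges that carry substantial flow would be the candidate H-bonds; how flows are thresholded and mapped to base pairing types is specific to the published method and is not reproduced here.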
|
50
|
Abstract
RNA is an important component of many biological processes, including DNA encapsidation of bacteriophage phi29 of Bacillus subtilis. Interestingly, the prohead RNA is involved in this encapsidation, and was found in monomer, dimer, pentamer and hexamer conformations. This article presents and debates current knowledge about the prohead RNA structures, mechanisms, and roles in DNA encapsidation. A new dimer structure is presented, and its specific role in DNA encapsidation is discussed.
|