1
|
RNA Secondary Structures with Given Motif Specification: Combinatorics and Algorithms. Bull Math Biol 2023; 85:21. [PMID: 36780044 DOI: 10.1007/s11538-023-01128-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/17/2021] [Accepted: 01/26/2023] [Indexed: 02/14/2023]
Abstract
The study of native motifs of RNA secondary structures helps us better understand the formation and eventually the functions of these molecules. Commonly known structural motifs include helices, hairpin loops, bulges, interior loops, exterior loops and multiloops. However, enumerative results and generating algorithms taking into account the joint distribution of these motifs are sparse. In this paper, we present progress on deriving such distributions employing a tree-bijection of RNA secondary structures obtained by Schmitt and Waterman and a novel rake decomposition of plane trees. The key feature of the latter is that the derived components encode motifs of the RNA secondary structures without pseudoknots associated with the plane trees very well. As an application, we present an algorithm (RakeSamp) generating uniformly random secondary structures without pseudoknots that satisfy fine motif specifications on the length and degree of various types of loops as well as helices.
Collapse
|
2
|
Guo Q, Jin Y, Li M, Sun LH, Xu Y. On the Number of Saturated and Optimal Extended 2-Regular Simple Stacks in the Nussinov-Jacobson Energy Model. J Comput Biol 2022; 29:425-440. [PMID: 35353583 DOI: 10.1089/cmb.2021.0421] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open
Abstract
It is known that both RNA secondary structure and protein contact map can be presented using combinatorial diagrams, the combinatorial enumeration and related problems of which have been studied extensively. Motivated by previous enumeration works on saturated RNA secondary structures and extended stack structures of protein contact maps, we are interested in the enumeration problems of saturated and optimal extended stacks in the Nussinov-Jacobson energy model, in which each base pair contributes energy -1. Then optimal structures are those with most arcs, and locally optimal structures are exactly the saturated structures, in which no more arcs can be added without violating the structure definition. For saturated extended 2-regular simple stacks, whose degree configuration is related to the protein fold in two-dimensional honeycomb lattice, we obtain generating function equation and asymptotic formula for its number. Moreover, an explicit formula for the number of optimal extended 2-regular simple stacks is also obtained.
Collapse
Affiliation(s)
- Qianghui Guo
- School of Mathematical Sciences, The Key Laboratory of Pure Mathematics and Combinatorics of Ministry of Education of China (LPMC), Nankai University, Tianjin, P.R. China
| | - Yinglie Jin
- School of Mathematical Sciences, The Key Laboratory of Pure Mathematics and Combinatorics of Ministry of Education of China (LPMC), Nankai University, Tianjin, P.R. China
| | - Mengqin Li
- School of Mathematical Sciences, The Key Laboratory of Pure Mathematics and Combinatorics of Ministry of Education of China (LPMC), Nankai University, Tianjin, P.R. China
| | - Lisa Hui Sun
- Center for Combinatorics, The Key Laboratory of Pure Mathematics and Combinatories of Ministry of Education of China (LPMC), Nankai University, Tianjin, P.R. China
| | - Yanyan Xu
- School of Mathematical Sciences, The Key Laboratory of Pure Mathematics and Combinatorics of Ministry of Education of China (LPMC), Nankai University, Tianjin, P.R. China
| |
Collapse
|
3
|
Ebrahimi P, Kaur S, Baronti L, Petzold K, Chen AA. A two-dimensional replica-exchange molecular dynamics method for simulating RNA folding using sparse experimental restraints. Methods 2019; 162-163:96-107. [PMID: 31059830 DOI: 10.1016/j.ymeth.2019.05.001] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/30/2019] [Revised: 04/26/2019] [Accepted: 05/01/2019] [Indexed: 10/26/2022] Open
Abstract
We present a 2D replica exchange protocol incorporating secondary structure information to dramatically improve 3D RNA folding using molecular dynamics simulations. We show that incorporating base-pairing restraints into all-atom, explicit solvent simulations enables the accurate recapitulation of the global tertiary fold for 4 representative RNAs ranging in length from 24 to 68 nt. This method can potentially utilize base-pairing information from a wide variety of experimental inputs to predict complex RNA tertiary folds including pseudoknots, multi-loop junctions, and non-canonical interactions.
Collapse
Affiliation(s)
- Parisa Ebrahimi
- Department of Chemistry, University at Albany, State University of New York, 1400 Washington Avenue, Albany, NY 12222, USA
| | - Simi Kaur
- Department of Chemistry, University at Albany, State University of New York, 1400 Washington Avenue, Albany, NY 12222, USA
| | - Lorenzo Baronti
- Department of Medical Biochemistry and Biophysics, Karolinska Institutet, Stockholm, Sweden
| | - Katja Petzold
- Department of Medical Biochemistry and Biophysics, Karolinska Institutet, Stockholm, Sweden
| | - Alan A Chen
- Department of Chemistry, University at Albany, State University of New York, 1400 Washington Avenue, Albany, NY 12222, USA; The RNA Institute, University at Albany, State University of New York, Albany, NY, USA.
| |
Collapse
|
4
|
Guo QH, Sun LH. Combinatorics of Contacts in Protein Contact Maps. Bull Math Biol 2017; 80:385-403. [PMID: 29230703 DOI: 10.1007/s11538-017-0380-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/04/2016] [Accepted: 12/05/2017] [Indexed: 10/18/2022]
Abstract
Contacts play a fundamental role in the study of protein structure and folding problems. The contact map of a protein can be represented by arranging its amino acids on a horizontal line and drawing an arc between two residues if they form a contact. In this paper, we are mainly concerned with the combinatorial enumeration of the arcs in m-regular linear stack, an elementary structure of the protein contact map, which was introduced by Chen et al. (J Comput Biol 21(12):915-935, 2014). We modify the generating function for m-regular linear stacks by introducing a new variable y regarding to the number of arcs and obtain an equation satisfied by the generating function for m-regular linear stacks with n vertices and k arcs. Consequently, we also derive an equation satisfied by the generating function of the overall number of arcs in m-regular linear stacks with n vertices.
Collapse
Affiliation(s)
- Qiang-Hui Guo
- Center for Combinatorics, LPMC, Nankai University, Tianjin, 300071, People's Republic of China
| | - Lisa H Sun
- Center for Combinatorics, LPMC, Nankai University, Tianjin, 300071, People's Republic of China.
| |
Collapse
|
5
|
Albrecht AA, Day L, Abdelhadi Ep Souki O, Steinhöfel K. A new heuristic method for approximating the number of local minima in partial RNA energy landscapes. Comput Biol Chem 2015; 60:43-52. [PMID: 26657221 DOI: 10.1016/j.compbiolchem.2015.11.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/07/2015] [Revised: 10/17/2015] [Accepted: 11/10/2015] [Indexed: 11/28/2022]
Abstract
The analysis of energy landscapes plays an important role in mathematical modelling, simulation and optimisation. Among the main features of interest are the number and distribution of local minima within the energy landscape. Granier and Kallel proposed in 2002 a new sampling procedure for estimating the number of local minima. In the present paper, we focus on improved heuristic implementations of the general framework devised by Granier and Kallel with regard to run-time behaviour and accuracy of predictions. The new heuristic method is demonstrated for the case of partial energy landscapes induced by RNA secondary structures. While the computation of minimum free energy RNA secondary structures has been studied for a long time, the analysis of folding landscapes has gained momentum over the past years in the context of co-transcriptional folding and deeper insights into cell processes. The new approach has been applied to ten RNA instances of length between 99 nt and 504 nt and their respective partial energy landscapes defined by secondary structures within an energy offset ΔE above the minimum free energy conformation. The number of local minima within the partial energy landscapes ranges from 1440 to 3441. Our heuristic method produces for the best approximations on average a deviation below 3.0% from the true number of local minima.
Collapse
Affiliation(s)
| | - Luke Day
- King's College London, Informatics Department, London, UK
| | | | | |
Collapse
|
6
|
Clote P, Kranakis E, Krizanc D. Asymptotic structural properties of quasi-random saturated structures of RNA. Algorithms Mol Biol 2013; 8:24. [PMID: 24156624 PMCID: PMC3818986 DOI: 10.1186/1748-7188-8-24] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/03/2011] [Accepted: 09/21/2013] [Indexed: 01/18/2023] Open
Abstract
Background RNA folding depends on the distribution of kinetic traps in the landscape of all secondary structures. Kinetic traps in the Nussinov energy model are precisely those secondary structures that are saturated, meaning that no base pair can be added without introducing either a pseudoknot or base triple. In previous work, we investigated asymptotic combinatorics of both random saturated structures and of quasi-random saturated structures, where the latter are constructed by a natural stochastic process. Results We prove that for quasi-random saturated structures with the uniform distribution, the asymptotic expected number of external loops is O(logn) and the asymptotic expected maximum stem length is O(logn), while under the Zipf distribution, the asymptotic expected number of external loops is O(log2n) and the asymptotic expected maximum stem length is O(logn/log logn). Conclusions Quasi-random saturated structures are generated by a stochastic greedy method, which is simple to implement. Structural features of random saturated structures appear to resemble those of quasi-random saturated structures, and the latter appear to constitute a class for which both the generation of sampled structures as well as a combinatorial investigation of structural features may be simpler to undertake.
Collapse
|
7
|
Clote P, Kranakis E, Krizanc D. Asymptotic number of hairpins of saturated RNA secondary structures. Bull Math Biol 2013; 75:2410-30. [PMID: 24142625 DOI: 10.1007/s11538-013-9899-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/21/2012] [Accepted: 08/22/2013] [Indexed: 10/26/2022]
Abstract
In the absence of chaperone molecules, RNA folding is believed to depend on the distribution of kinetic traps in the energy landscape of all secondary structures. Kinetic traps in the Nussinov energy model are precisely those secondary structures that are saturated, meaning that no base pair can be added without introducing either a pseudoknot or base triple. In this paper, we compute the asymptotic expected number of hairpins in saturated structures. For instance, if every hairpin is required to contain at least θ=3 unpaired bases and the probability that any two positions can base-pair is p=3/8, then the asymptotic number of saturated structures is 1.34685[Symbol: see text]n (-3/2)[Symbol: see text]1.62178 (n) , and the asymptotic expected number of hairpins follows a normal distribution with mean [Formula: see text]. Similar results are given for values θ=1,3, and p=1,1/2,3/8; for instance, when θ=1 and p=1, the asymptotic expected number of hairpins in saturated secondary structures is 0.123194[Symbol: see text]n, a value greater than the asymptotic expected number 0.105573[Symbol: see text]n of hairpins over all secondary structures. Since RNA binding targets are often found in hairpin regions, it follows that saturated structures present potentially more binding targets than nonsaturated structures, on average. Next, we describe a novel algorithm to compute the hairpin profile of a given RNA sequence: given RNA sequence a 1,…,a n , for each integer k, we compute that secondary structure S k having minimum energy in the Nussinov energy model, taken over all secondary structures having k hairpins. We expect that an extension of our algorithm to the Turner energy model may provide more accurate structure prediction for particular RNAs, such as tRNAs and purine riboswitches, known to have a particular number of hairpins. Mathematica(™) computations, C and Python source code, and additional supplementary information are available at the website http://bioinformatics.bc.edu/clotelab/RNAhairpinProfile/ .
Collapse
Affiliation(s)
- Peter Clote
- Department of Biology, Boston College, Chestnut Hill, MA, 02467, USA,
| | | | | |
Collapse
|
8
|
Combinatorics of locally optimal RNA secondary structures. J Math Biol 2012; 68:341-75. [PMID: 23263300 DOI: 10.1007/s00285-012-0631-9] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/18/2011] [Revised: 11/19/2012] [Indexed: 01/26/2023]
Abstract
It is a classical result of Stein and Waterman that the asymptotic number of RNA secondary structures is 1.104366∙n-3/2∙2.618034n. Motivated by the kinetics of RNA secondary structure formation, we are interested in determining the asymptotic number of secondary structures that are locally optimal, with respect to a particular energy model. In the Nussinov energy model, where each base pair contributes -1 towards the energy of the structure, locally optimal structures are exactly the saturated structures, for which we have previously shown that asymptotically, there are 1.07427∙n-3/2∙2.35467n many saturated structures for a sequence of length n. In this paper, we consider the base stacking energy model, a mild variant of the Nussinov model, where each stacked base pair contributes -1 toward the energy of the structure. Locally optimal structures with respect to the base stacking energy model are exactly those secondary structures, whose stems cannot be extended. Such structures were first considered by Evers and Giegerich, who described a dynamic programming algorithm to enumerate all locally optimal structures. In this paper, we apply methods from enumerative combinatorics to compute the asymptotic number of such structures. Additionally, we consider analogous combinatorial problems for secondary structures with annotated single-stranded, stacking nucleotides (dangles).
Collapse
|
9
|
Lorenz WA, Clote P. Computing the partition function for kinetically trapped RNA secondary structures. PLoS One 2011; 6:e16178. [PMID: 21297972 PMCID: PMC3030561 DOI: 10.1371/journal.pone.0016178] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/15/2010] [Accepted: 12/15/2010] [Indexed: 12/17/2022] Open
Abstract
An RNA secondary structure is locally optimal if there is no lower energy structure that can be obtained by the addition or removal of a single base pair, where energy is defined according to the widely accepted Turner nearest neighbor model. Locally optimal structures form kinetic traps, since any evolution away from a locally optimal structure must involve energetically unfavorable folding steps. Here, we present a novel, efficient algorithm to compute the partition function over all locally optimal secondary structures of a given RNA sequence. Our software, RNAlocopt runs in time and space. Additionally, RNAlocopt samples a user-specified number of structures from the Boltzmann subensemble of all locally optimal structures. We apply RNAlocopt to show that (1) the number of locally optimal structures is far fewer than the total number of structures – indeed, the number of locally optimal structures approximately equal to the square root of the number of all structures, (2) the structural diversity of this subensemble may be either similar to or quite different from the structural diversity of the entire Boltzmann ensemble, a situation that depends on the type of input RNA, (3) the (modified) maximum expected accuracy structure, computed by taking into account base pairing frequencies of locally optimal structures, is a more accurate prediction of the native structure than other current thermodynamics-based methods. The software RNAlocopt constitutes a technical breakthrough in our study of the folding landscape for RNA secondary structures. For the first time, locally optimal structures (kinetic traps in the Turner energy model) can be rapidly generated for long RNA sequences, previously impossible with methods that involved exhaustive enumeration. Use of locally optimal structure leads to state-of-the-art secondary structure prediction, as benchmarked against methods involving the computation of minimum free energy and of maximum expected accuracy. Web server and source code available at http://bioinformatics.bc.edu/clotelab/RNAlocopt/.
Collapse
Affiliation(s)
- William A. Lorenz
- Department of Mathematics and Computer Science, Denison University, Granville, Ohio, United States of America
| | - Peter Clote
- Biology Department, Boston College, Chestnut Hill, Massachusetts, United States of America
- * E-mail:
| |
Collapse
|
10
|
Clote P, Kranakis E, Krizanc D, Salvy B. Asymptotics of canonical and saturated RNA secondary structures. J Bioinform Comput Biol 2010; 7:869-93. [PMID: 19785050 DOI: 10.1142/s0219720009004333] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/20/2008] [Revised: 05/17/2009] [Accepted: 06/13/2009] [Indexed: 11/18/2022]
Abstract
It is a classical result of Stein and Waterman that the asymptotic number of RNA secondary structures is 1.104366 . n(-3/2) . 2.618034(n). In this paper, we study combinatorial asymptotics for two special subclasses of RNA secondary structures - canonical and saturated structures. Canonical secondary structures are defined to have no lonely (isolated) base pairs. This class of secondary structures was introduced by Bompfünewerer et al., who noted that the run time of Vienna RNA Package is substantially reduced when restricting computations to canonical structures. Here we provide an explanation for the speed-up, by proving that the asymptotic number of canonical RNA secondary structures is 2.1614 . n(-3/2) . 1.96798(n) and that the expected number of base pairs in a canonical secondary structure is 0.31724 . n. The asymptotic number of canonical secondary structures was obtained much earlier by Hofacker, Schuster and Stadler using a different method. Saturated secondary structures have the property that no base pairs can be added without violating the definition of secondary structure (i.e. introducing a pseudoknot or base triple). Here we show that the asymptotic number of saturated structures is 1.07427 . n(-3/2) . 2.35467(n), the asymptotic expected number of base pairs is 0.337361 . n, and the asymptotic number of saturated stem-loop structures is 0.323954 . 1.69562(n), in contrast to the number 2(n - 2) of (arbitrary) stem-loop structures as classically computed by Stein and Waterman. Finally, we apply the work of Drmota to show that the density of states for [all resp. canonical resp. saturated] secondary structures is asymptotically Gaussian. We introduce a stochastic greedy method to sample random saturated structures, called quasi-random saturated structures, and show that the expected number of base pairs is 0.340633 . n.
Collapse
Affiliation(s)
- Peter Clote
- Department of Biology, Boston College, Chestnut Hill, MA 02467, USA.
| | | | | | | |
Collapse
|
11
|
Does Hybridization Increase Evolutionary Rate? Data from the 28S-rDNA D8 Domain in Echinoderms. J Mol Evol 2008; 67:539-50. [DOI: 10.1007/s00239-008-9171-8] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/10/2008] [Revised: 08/21/2008] [Accepted: 09/22/2008] [Indexed: 11/28/2022]
|
12
|
Smit S, Rother K, Heringa J, Knight R. From knotted to nested RNA structures: a variety of computational methods for pseudoknot removal. RNA (NEW YORK, N.Y.) 2008; 14:410-6. [PMID: 18230758 PMCID: PMC2248259 DOI: 10.1261/rna.881308] [Citation(s) in RCA: 55] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/07/2023]
Abstract
Pseudoknots are abundant in RNA structures. Many computational analyses require pseudoknot-free structures, which means that some of the base pairs in the knotted structure must be disregarded to obtain a nested structure. There is a surprising diversity of methods to perform this pseudoknot removal task, but these methods are often poorly described and studies can therefore be difficult to reproduce (in part, because different procedures may be intuitively obvious to different investigators). Here we provide a variety of algorithms for pseudoknot removal, some of which can incorporate sequence or alignment information in the removal process. We demonstrate that different methods lead to different results, which might affect structure-based analyses. This work thus provides a starting point for discussion of the extent to which these different methods recapture the underlying biological reality. We provide access to reference implementations through a web interface (at http://www.ibi.vu.nl/programs/k2nwww), and the source code is available in the PyCogent project.
Collapse
Affiliation(s)
- Sandra Smit
- Centre for Integrative Bioinformatics VU (IBIVU), Vrije Universiteit Amsterdam, 1081 HV Amsterdam, The Netherlands
| | | | | | | |
Collapse
|
13
|
Affiliation(s)
- W.A. Lorenz
- Biology Department, Boston College, Chestnut Hill, MA 02467.
| | - Y. Ponty
- Biology Department, Boston College, Chestnut Hill, MA 02467.
| | - P. Clote
- Biology Department, Boston College, Chestnut Hill, MA 02467.
| |
Collapse
|
14
|
Waldispühl J, Clote P. Computing the Partition Function and Sampling for Saturated Secondary Structures of RNA, with Respect to the Turner Energy Model. J Comput Biol 2007; 14:190-215. [PMID: 17456015 DOI: 10.1089/cmb.2006.0012] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/30/2023] Open
Abstract
An RNA secondary structure is saturated if no base pairs can be added without violating the definition of secondary structure. Here we describe a new algorithm, RNAsat, which for a given RNA sequence a, an integral temperature 0 <or= T <or= 100 in degrees Celsius, and for all integers k, computes the Boltzmann partition function Z(k)(T)(a) = SigmaSepsilonSAT(k)(a) exp(-E(S)/RT), where the sum is over all saturated secondary structures of a which have exactly k base pairs, R is the universal gas constant and E(S) denotes the free energy with respect to the Turner nearest neighbor energy model. By dynamic programming, we compute Z(k)(T)simultaneously for all values of k in time O(n(5)) and space O(n(3)).Additionally, RNAsat computes the partition function Q(k)(T)(a) = SigmaSepsilonS(k)(a) exp(-E(S)/RT), where the sum is over all secondary structures of a which have k base pairs; the latter computation is performed simultaneously for all values of k in O(n(4)) time and O(n(3)) space. Lastly, using the partition function Z(k)(T) [resp. Q(k)(T)] with stochastic backtracking, RNAsat rigorously samples the collection of saturated secondary structures [resp. secondary structures] having k base pairs; for Q(k)(T) this provides a parametrized form of Sfold sampling (Ding and Lawrence, 2003). Using RNAsat, (i) we compute the ensemble free energy for saturated secondary structures having k base pairs, (ii) show cooperativity of the Turner model, (iii) demonstrate a temperature-dependent phase transition, (iv) illustrate the predictive advantage of RNAsat for precursor microRNA cel-mir-72 of C. elegans and for the pseudoknot PKB 00152 of Pseudobase (van Batenburg et al., 2001), (v) illustrate the RNA shapes (Giegerich et al., 2004) of sampled secondary structures [resp. saturated structures] having exactly k base pairs. A web server for RNAsat is under construction at bioinformatics.bc.edu/clotelab/RNAsat/.
Collapse
Affiliation(s)
- J Waldispühl
- Department of Biology, Boston College, Chestnut Hill, Massachusetts, USA
| | | |
Collapse
|