1
|
Guo Q, Jin Y, Li M, Sun LH, Xu Y. On the Number of Saturated and Optimal Extended 2-Regular Simple Stacks in the Nussinov-Jacobson Energy Model. J Comput Biol 2022; 29:425-440. [PMID: 35353583 DOI: 10.1089/cmb.2021.0421] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open
Abstract
It is known that both RNA secondary structure and protein contact map can be presented using combinatorial diagrams, the combinatorial enumeration and related problems of which have been studied extensively. Motivated by previous enumeration works on saturated RNA secondary structures and extended stack structures of protein contact maps, we are interested in the enumeration problems of saturated and optimal extended stacks in the Nussinov-Jacobson energy model, in which each base pair contributes energy -1. Then optimal structures are those with most arcs, and locally optimal structures are exactly the saturated structures, in which no more arcs can be added without violating the structure definition. For saturated extended 2-regular simple stacks, whose degree configuration is related to the protein fold in two-dimensional honeycomb lattice, we obtain generating function equation and asymptotic formula for its number. Moreover, an explicit formula for the number of optimal extended 2-regular simple stacks is also obtained.
Collapse
Affiliation(s)
- Qianghui Guo
- School of Mathematical Sciences, The Key Laboratory of Pure Mathematics and Combinatorics of Ministry of Education of China (LPMC), Nankai University, Tianjin, P.R. China
| | - Yinglie Jin
- School of Mathematical Sciences, The Key Laboratory of Pure Mathematics and Combinatorics of Ministry of Education of China (LPMC), Nankai University, Tianjin, P.R. China
| | - Mengqin Li
- School of Mathematical Sciences, The Key Laboratory of Pure Mathematics and Combinatorics of Ministry of Education of China (LPMC), Nankai University, Tianjin, P.R. China
| | - Lisa Hui Sun
- Center for Combinatorics, The Key Laboratory of Pure Mathematics and Combinatories of Ministry of Education of China (LPMC), Nankai University, Tianjin, P.R. China
| | - Yanyan Xu
- School of Mathematical Sciences, The Key Laboratory of Pure Mathematics and Combinatorics of Ministry of Education of China (LPMC), Nankai University, Tianjin, P.R. China
| |
Collapse
|
2
|
Zhou T, Wang H, Zeng C, Zhao Y. RPocket: an intuitive database of RNA pocket topology information with RNA-ligand data resources. BMC Bioinformatics 2021; 22:428. [PMID: 34496744 PMCID: PMC8424408 DOI: 10.1186/s12859-021-04349-4] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/10/2021] [Accepted: 08/27/2021] [Indexed: 11/13/2022] Open
Abstract
Background RNA regulates a variety of biological functions by interacting with other molecules. The ligand often binds in the RNA pocket to trigger structural changes or functions. Thus, it is essential to explore and visualize the RNA pocket to elucidate the structural and recognition mechanism for the RNA-ligand complex formation. Results In this work, we developed one user-friendly bioinformatics tool, RPocket. This database provides geometrical size, centroid, shape, secondary structure element for RNA pocket, RNA-ligand interaction information, and functional sites. We extracted 240 RNA pockets from 94 non-redundant RNA-ligand complex structures. We developed RPDescriptor to calculate the pocket geometrical property quantitatively. The geometrical information was then subjected to RNA-ligand binding analysis by incorporating the sequence, secondary structure, and geometrical combinations. This new approach takes advantage of both the atom-level precision of the structure and the nucleotide-level tertiary interactions. The results show that the higher-level topological pattern indeed improves the tertiary structure prediction. We also proposed a potential mechanism for RNA-ligand complex formation. The electrostatic interactions are responsible for long-range recognition, while the Van der Waals and hydrophobic contacts for short-range binding and optimization. These interaction pairs can be considered as distance constraints to guide complex structural modeling and drug design. Conclusion RPocket database would facilitate RNA-ligand engineering to regulate the complex formation for biological or medical applications. RPocket is available at http://zhaoserver.com.cn/RPocket/RPocket.html. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-021-04349-4.
Collapse
Affiliation(s)
- Ting Zhou
- Department of Physics, Institute of Biophysics, Central China Normal University, Wuhan, 430079, China
| | - Huiwen Wang
- Department of Physics, Institute of Biophysics, Central China Normal University, Wuhan, 430079, China
| | - Chen Zeng
- Department of Physics, George Washington University, Washington, DC, 20052, USA
| | - Yunjie Zhao
- Department of Physics, Institute of Biophysics, Central China Normal University, Wuhan, 430079, China.
| |
Collapse
|
3
|
Deep Learning Method for RNA Secondary Structure Prediction with Pseudoknots Based on Large-Scale Data. JOURNAL OF HEALTHCARE ENGINEERING 2021. [DOI: 10.1155/2021/6699996] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/17/2022]
Abstract
Traditional machine learning methods are widely used in the field of RNA secondary structure prediction and have achieved good results. However, with the emergence of large-scale data, deep learning methods have more advantages than traditional machine learning methods. As the number of network layers increases in deep learning, there will often be problems such as increased parameters and overfitting. We used two deep learning models, GoogLeNet and TCN, to predict RNA secondary results. And from the perspective of the depth and width of the network, improvements are made based on the neural network model, which can effectively improve the computational efficiency while extracting more feature information. We process the existing real RNA data through experiments, use deep learning models to extract useful features from a large amount of RNA sequence data and structure data, and then predict the extracted features to obtain each base’s pairing probability. The characteristics of RNA secondary structure and dynamic programming methods are used to process the base prediction results, and the structure with the largest sum of the probability of each base pairing is obtained, and this structure will be used as the optimal RNA secondary structure. We, respectively, evaluated GoogLeNet and TCN models based on 5sRNA, tRNA data, and tmRNA data, and compared them with other standard prediction algorithms. The sensitivity and specificity of the GoogLeNet model on the 5sRNA and tRNA data sets are about 16% higher than the best prediction results in other algorithms. The sensitivity and specificity of the GoogLeNet model on the tmRNA dataset are about 9% higher than the best prediction results in other algorithms. As deep learning algorithms’ performance is related to the size of the data set, as the scale of RNA data continues to expand, the prediction accuracy of deep learning methods for RNA secondary structure will continue to improve.
Collapse
|
4
|
Huang F, Reidys C, Rezazadegan R. Fatgraph models of RNA structure. COMPUTATIONAL AND MATHEMATICAL BIOPHYSICS 2017. [DOI: 10.1515/mlbmb-2017-0001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022] Open
Abstract
Abstract In this review paper we discuss fatgraphs as a conceptual framework for RNA structures. We discuss various notions of coarse-grained RNA structures and relate them to fatgraphs.We motivate and discuss the main intuition behind the fatgraph model and showcase its applicability to canonical as well as noncanonical base pairs. Recent discoveries regarding novel recursions of pseudoknotted (pk) configurations as well as their translation into context-free grammars for pk-structures are discussed. This is shown to allow for extending the concept of partition functions of sequences w.r.t. a fixed structure having non-crossing arcs to pk-structures. We discuss minimum free energy folding of pk-structures and combine these above results outlining how to obtain an inverse folding algorithm for PK structures.
Collapse
Affiliation(s)
- Fenix Huang
- 1Biocomplexity Institute of Virginia Tech, 1015 Life Science Circle, VA 24060, Blacksburg, United States of America
| | - Christian Reidys
- 1Biocomplexity Institute of Virginia Tech, 1015 Life Science Circle, VA 24060, Blacksburg, United States of America
| | - Reza Rezazadegan
- 2Biocomplexity Institute of Virginia Tech, 1015 Life Science Circle, VA 24060, Blacksburg, U.S.A., United States of America
| |
Collapse
|
5
|
Regular Simple Queues of Protein Contact Maps. Bull Math Biol 2016; 79:21-35. [PMID: 27844300 DOI: 10.1007/s11538-016-0212-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/24/2016] [Accepted: 09/21/2016] [Indexed: 10/20/2022]
Abstract
A protein fold can be viewed as a self-avoiding walk in certain lattice model, and its contact map is a graph that represents the patterns of contacts in the fold. Goldman, Istrail, and Papadimitriou showed that a contact map in the 2D square lattice can be decomposed into at most two stacks and one queue. In the terminology of combinatorics, stacks and queues are noncrossing and nonnesting partitions, respectively. In this paper, we are concerned with 2-regular and 3-regular simple queues, for which the degree of each vertex is at most one and the arc lengths are at least 2 and 3, respectively. We show that 2-regular simple queues are in one-to-one correspondence with hill-free Motzkin paths, which have been enumerated by Barcucci, Pergola, Pinzani, and Rinaldi by using the Enumerating Combinatorial Objects method. We derive a recurrence relation for the generating function of Motzkin paths with [Formula: see text] peaks at level i, which reduces to the generating function for hill-free Motzkin paths. Moreover, we show that 3-regular simple queues are in one-to-one correspondence with Motzkin paths avoiding certain patterns. Then we obtain a formula for the generating function of 3-regular simple queues. Asymptotic formulas for 2-regular and 3-regular simple queues are derived based on the generating functions.
Collapse
|
6
|
Abstract
The contact map of a protein fold is a graph that represents the patterns of contacts in the fold. It is known that the contact map can be decomposed into stacks and queues. RNA secondary structures are special stacks in which the degree of each vertex is at most one and each arc has length of at least two. Waterman and Smith derived a formula for the number of RNA secondary structures of length n with exactly k arcs. Höner zu Siederdissen et al. developed a folding algorithm for extended RNA secondary structures in which each vertex has maximum degree two. An equation for the generating function of extended RNA secondary structures was obtained by Müller and Nebel by using a context-free grammar approach, which leads to an asymptotic formula. In this article, we consider m-regular linear stacks, where each arc has length at least m and the degree of each vertex is bounded by two. Extended RNA secondary structures are exactly 2-regular linear stacks. For any m ≥ 2, we obtain an equation for the generating function of the m-regular linear stacks. For given m, we deduce a recurrence relation and an asymptotic formula for the number of m-regular linear stacks on n vertices. To establish the equation, we use the reduction operation of Chen, Deng, and Du to transform an m-regular linear stack to an m-reduced zigzag (or alternating) stack. Then we find an equation for m-reduced zigzag stacks leading to an equation for m-regular linear stacks.
Collapse
Affiliation(s)
- William Y C Chen
- 1 Center for Combinatorics, The Key Laboratory of Pure Mathematics and Combinatorics of Ministry of Education of China and Tianjin Key Laboratory of Combinatorics (LPMC-TJKLC), Nankai University , Tianjin, P.R. China
| | | | | | | |
Collapse
|
7
|
Bhadola P, Deo N. Genus distribution and thermodynamics of a random matrix model of RNA with Penner interaction. PHYSICAL REVIEW. E, STATISTICAL, NONLINEAR, AND SOFT MATTER PHYSICS 2013; 88:032706. [PMID: 24125293 DOI: 10.1103/physreve.88.032706] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/26/2013] [Revised: 08/05/2013] [Indexed: 06/02/2023]
Abstract
The nonlinear Penner external interaction is introduced and studied in the random matrix model of homo RNA. A numerical technique is developed to study the partition function, and a general formula is obtained for all lengths. The genus distribution function for the system is obtained, plotted, and compared with the genus distribution for the real RNA structures found from the protein databank. The genus distribution shows that the nonlinear interaction favors the formation of low genus structures and matches the result for real RNA structures. The distribution of structure with temperature suggests that nonlinear interaction is biased toward the planar structures. The variation of chemical potential with temperature and interaction strength indicates the presence of additional molecules in the system other than the magnesium ions and possibly represents a phase transition. The specific heat has a bump and its derivatives shows a double-peak behavior at a particular temperature. On analyzing the specific heat and derivatives for each genus separately, the planar structure (genus zero) is shown to contribute the most to the bump and double peak. This observation in the nonlinear model is similar to that observed in the unfolding experiments on RNA.
Collapse
Affiliation(s)
- Pradeep Bhadola
- Department of Physics and Astrophysics, University of Delhi, Delhi 110007, India
| | | |
Collapse
|
8
|
Abstract
Recently, Yoffe and colleagues observed that the average distances between 5'-3' ends of RNA molecules are very small and largely independent of sequence length. This observation is based on numerical computations as well as theoretical arguments maximizing certain entropy functionals. In this article, we compute the exact distribution of 5'-3' distances of RNA secondary structures for any finite n. Furthermore, we compute the limit distribution and show that for n = 30 the exact distribution and the limit distribution are very close. Our results show that the distances of random RNA secondary structures are distinctively lower than those of minimum free energy structures of random RNA sequences.
Collapse
Affiliation(s)
- Hillary S W Han
- Institut for Matematik og Datalogi, University of Southern Denmark, Odense, Denmark
| | | |
Collapse
|
9
|
Nebel ME, Reidys CM, Wang RR. Loops in canonical RNA pseudoknot structures. J Comput Biol 2011; 18:1793-806. [PMID: 21417777 DOI: 10.1089/cmb.2010.0022] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open
Abstract
In this article, we compute the limit distributions of the numbers of hairpin-loops, interior-loops and bulges in k-noncrossing RNA structures. The latter are coarse-grained RNA structures allowing for cross-serial interactions, subject to the constraint that there are at most k - 1 mutually crossing arcs in the diagram representation of the molecule. We prove central limit theorems by means of studying the corresponding bivariate generating functions. These generating functions are obtained by symbolic inflation of [Formula: see text]-shapes introduced by Reidys and Wang (2009).
Collapse
|
10
|
Reidys CM, Huang FWD, Andersen JE, Penner RC, Stadler PF, Nebel ME. Topology and prediction of RNA pseudoknots. Bioinformatics 2011; 27:1076-85. [DOI: 10.1093/bioinformatics/btr090] [Citation(s) in RCA: 72] [Impact Index Per Article: 5.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/06/2023] Open
|
11
|
|
12
|
Gao JZM, Li LYM, Reidys CM. Inverse folding of RNA pseudoknot structures. Algorithms Mol Biol 2010; 5:27. [PMID: 20573197 PMCID: PMC2909241 DOI: 10.1186/1748-7188-5-27] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/05/2009] [Accepted: 06/23/2010] [Indexed: 11/30/2022] Open
Abstract
Background RNA exhibits a variety of structural configurations. Here we consider a structure to be tantamount to the noncrossing Watson-Crick and G-U-base pairings (secondary structure) and additional cross-serial base pairs. These interactions are called pseudoknots and are observed across the whole spectrum of RNA functionalities. In the context of studying natural RNA structures, searching for new ribozymes and designing artificial RNA, it is of interest to find RNA sequences folding into a specific structure and to analyze their induced neutral networks. Since the established inverse folding algorithms, RNAinverse, RNA-SSD as well as INFO-RNA are limited to RNA secondary structures, we present in this paper the inverse folding algorithm Inv which can deal with 3-noncrossing, canonical pseudoknot structures. Results In this paper we present the inverse folding algorithm Inv. We give a detailed analysis of Inv, including pseudocodes. We show that Inv allows to design in particular 3-noncrossing nonplanar RNA pseudoknot 3-noncrossing RNA structures-a class which is difficult to construct via dynamic programming routines. Inv is freely available at http://www.combinatorics.cn/cbpc/inv.html. Conclusions The algorithm Inv extends inverse folding capabilities to RNA pseudoknot structures. In comparison with RNAinverse it uses new ideas, for instance by considering sets of competing structures. As a result, Inv is not only able to find novel sequences even for RNA secondary structures, it does so in the context of competing structures that potentially exhibit cross-serial interactions.
Collapse
|
13
|
Abstract
In this paper, we introduce a combinatorial framework that provides an interpretation of RNA pseudoknot structures as sampling paths of a Markov process. Our results facilitate a variety of applications ranging from the energy-based sampling of pseudoknot structures as well as the ab initio folding via hidden Markov models. Our main result is an algorithm that generates RNA pseudoknot structures with uniform probability. This algorithm serves as a steppingstone to sequence-specific as well as energy-based transition probabilities. The approach employs a correspondence between pseudoknot structures, parametrized in terms of the maximal number of mutually crossing arcs and certain tableau sequences. The latter can be viewed as lattice paths. The main idea of this paper is to view each such lattice path as a sampling path of a stochastic process and to make use of D-finiteness for the efficient computation of the corresponding transition probabilities.
Collapse
|
14
|
Abstract
In this paper, we study irreducibility in RNA structures. By RNA structure, we mean RNA secondary as well as RNA pseudoknot structures as abstract contact structures. We give an analysis contrasting random and minimum free energy (mfe) configurations and secondary versus pseudoknots structures. In the process, we compute various distributions: the numbers of irreducible substructures and their locations and sizes, parameterized in terms of the maximal number of mutually crossing arcs, k-1, and the minimal size of stacks sigma. In particular, we analyze the size of the largest irreducible substructure for random and mfe structures, which is the key factor for the folding time of mfe configurations. We show that the largest irreducible substructure is typically unique and contains almost all nucleotides.
Collapse
|
15
|
Huang FW, Peng WW, Reidys CM. Folding 3-Noncrossing RNA Pseudoknot Structures. J Comput Biol 2009; 16:1549-75. [DOI: 10.1089/cmb.2008.0194] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open
Affiliation(s)
| | - Wade W.J. Peng
- Center for Combinatorics, LPMC-TJKLC, Tianjin, P.R. China
| | - Christian M. Reidys
- Center for Combinatorics, LPMC-TJKLC, Tianjin, P.R. China
- College of Life Sciences, Nankai University, Tianjin, P.R. China
| |
Collapse
|
16
|
Abstract
In this paper we study the distribution of stacks/loops in k-non-crossing, tau-canonical RNA pseudoknot structures (k,tau-structures). Here, an RNA structure is called k-non-crossing if it has no more than k-1 mutually crossing arcs and tau-canonical if each arc is contained in a stack of length at least tau. Based on the ordinary generating function of k,tau-structures [G. Ma, C.M. Reidys, Canonical RNA pseudoknot structures, J. Comput. Biol. 15 (10) (2008) 1257] we derive the bivariate generating function T(k, tau)(x, u) = Sigma(n>or=0)Sigma(0<or=t<or=n/2 T(k,tau)(n, t)u(t)x(n), where T(k,tau)(n, t) is the number of k,tau-structures having exactly t stacks and study its singularities. We show that for a specific parametrization of the variable u, T(k, tau)(x, u) exhibits a unique, dominant singularity. The particular shift of this singularity parametrized by u implies a central limit theorem for the distribution of stack-numbers. Our results are of importance for understanding the 'language' of minimum-free energy RNA pseudoknot structures, generated by computer folding algorithms.
Collapse
Affiliation(s)
- Hillary S W Han
- Center for Combinatorics, LPMC-TJKLC, Nankai University, Tianjin 300071, PR China
| | | |
Collapse
|
17
|
Abstract
Background The analysis of sequence-structure relations of RNA is based on a specific notion and folding of RNA structure. The notion of coarse grained structure employed here is that of canonical RNA pseudoknot contact-structures with at most two mutually crossing bonds (3-noncrossing). These structures are folded by a novel, ab initio prediction algorithm, cross, capable of searching all 3-noncrossing RNA structures. The algorithm outputs the minimum free energy structure. Results After giving some background on RNA pseudoknot structures and providing an outline of the folding algorithm being employed, we present in this paper various, statistical results on the mapping from RNA sequences into 3-noncrossing RNA pseudoknot structures. We study properties, like the fraction of pseudoknot structures, the dominant pseudoknot-shapes, neutral walks, neutral neighbors and local connectivity. We then put our results into context of molecular evolution of RNA. Conclusion Our results imply that, in analogy to RNA secondary structures, 3-noncrossing pseudoknot RNA represents a molecular phenotype that is well suited for molecular and in particular neutral evolution. We can conclude that extended, percolating neutral networks of pseudoknot RNA exist.
Collapse
Affiliation(s)
- Fenix W D Huang
- Center for Combinatorics, LPMC-TJKLC, Nankai University, Tianjin 300071, PR China.
| | | | | |
Collapse
|
18
|
Local Connectivity of Neutral Networks. Bull Math Biol 2008; 71:265-90. [DOI: 10.1007/s11538-008-9356-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/06/2008] [Accepted: 09/02/2008] [Indexed: 11/27/2022]
|
19
|
Affiliation(s)
- Gang Ma
- Center for Combinatorics, Nankai University, Tianjin, P.R. China
| | | |
Collapse
|
20
|
Abstract
In this article, we study k-noncrossing RNA structures with minimum arc-length 4 and at most k - 1 mutually crossing bonds. Let T(k)([4])(n) denote the number of k-noncrossing RNA structures with arc-length > or =4 over n vertices. We (a) prove a functional equation for the generating function summation operator(n> or =0) T(k)([4])(n)z(n) and (b) derive for 4 < or = k < or = 9 the asymptotic formula T(k)([4])(n) approximately c(k) n(-((k-1)(2)+(k-1)/2)) gamma(k)(-n). Furthermore, we explicitly compute the exponential growth rates gamma(k)(-1) and asymptotic formulas for 4 < or = k < or = 9.
Collapse
Affiliation(s)
- Hillary S W Han
- Center for Combinatorics, Nankai University, Tianjin, PR China
| | | |
Collapse
|
21
|
Asymptotic Enumeration of RNA Structures with Pseudoknots. Bull Math Biol 2008; 70:951-70. [DOI: 10.1007/s11538-007-9265-2] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/21/2007] [Accepted: 08/22/2007] [Indexed: 10/22/2022]
|
22
|
Jin EY, Qin J, Reidys CM. Neutral networks of sequence to shape maps. J Theor Biol 2008; 250:484-97. [DOI: 10.1016/j.jtbi.2007.09.012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/06/2007] [Revised: 09/10/2007] [Accepted: 09/10/2007] [Indexed: 10/22/2022]
|
23
|
Jin EY, Reidys CM. Central and local limit theorems for RNA structures. J Theor Biol 2007; 250:547-59. [PMID: 18045620 DOI: 10.1016/j.jtbi.2007.09.020] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/29/2007] [Revised: 09/15/2007] [Accepted: 09/17/2007] [Indexed: 11/25/2022]
Abstract
A k-noncrossing RNA pseudoknot structure is a graph over {1,...,n} without 1-arcs, i.e. arcs of the form (i,i+1) and in which there exists no k-set of mutually intersecting arcs. In particular, RNA secondary structures are 2-noncrossing RNA structures. In this paper we prove a central and a local limit theorem for the distribution of the number of 3-noncrossing RNA structures over n nucleotides with exactly h bonds. Our analysis employs the generating function of k-noncrossing RNA pseudoknot structures and the asymptotics for the coefficients. The results of this paper explain the findings on the number of arcs of RNA secondary structures obtained by molecular folding algorithms and are of relevance for prediction algorithms of k-noncrossing RNA structures.
Collapse
Affiliation(s)
- Emma Y Jin
- Center for Combinatorics, LPMC-TJKLC, Nankai University, Tianjin 300071, PR China
| | | |
Collapse
|