1
Barrett C, Bura A, He Q, Huang F, Reidys C. The arithmetic topology of genetic alignments. J Math Biol 2023; 86:34. [PMID: 36695949 PMCID: PMC9875784 DOI: 10.1007/s00285-023-01868-x] [Received: 04/24/2022] [Revised: 01/03/2023] [Accepted: 01/06/2023] [Indexed: 01/26/2023]
Abstract
We propose a novel mathematical paradigm for the study of genetic variation in sequence alignments. This framework originates from extending the notion of pairwise relations, upon which current analysis is based, to k-ary dissimilarity. This dissimilarity naturally leads to a generalization of simplicial complexes by endowing simplices with weights that are compatible with the boundary operator. We introduce the notions of k-stances and dissimilarity complexes, the former encapsulating the arithmetic as well as the topological structure expressing these k-ary relations. We study basic mathematical properties of dissimilarity complexes and show how this approach captures watershed moments of viral dynamics in the context of SARS-CoV-2 and H1N1 flu genomic data.
Affiliation(s)
- Christopher Barrett
- Biocomplexity Institute, University of Virginia, 994 Research Park Boulevard, Charlottesville, VA 22911, USA; Department of Computer Science, University of Virginia, 351 McCormick Road, Charlottesville, VA 22904, USA
- Andrei Bura
- Biocomplexity Institute, University of Virginia, 994 Research Park Boulevard, Charlottesville, VA 22911, USA
- Qijun He
- Biocomplexity Institute, University of Virginia, 994 Research Park Boulevard, Charlottesville, VA 22911, USA
- Fenix Huang
- Biocomplexity Institute, University of Virginia, 994 Research Park Boulevard, Charlottesville, VA 22911, USA
- Christian Reidys
- Biocomplexity Institute, University of Virginia, 994 Research Park Boulevard, Charlottesville, VA 22911, USA; Department of Mathematics, University of Virginia, 141 Cabell Drive, Charlottesville, VA 22904, USA
2
Pevzner P, Vingron M, Reidys C, Sun F, Istrail S. Michael Waterman's Contributions to Computational Biology and Bioinformatics. J Comput Biol 2022; 29:601-615. [PMID: 35727100 DOI: 10.1089/cmb.2022.29066.pp] [Indexed: 11/13/2022]
Abstract
On the occasion of Dr. Michael Waterman's 80th birthday, we review his major contributions to the field of computational biology and bioinformatics, including the famous Smith-Waterman algorithm for sequence alignment, the probability and statistics theory related to sequence alignment, algorithms for sequence assembly, the Lander-Waterman model for genome physical mapping, combinatorics and predictions of ribonucleic acid structures, word counting statistics in molecular sequences, alignment-free sequence comparison, and algorithms for haplotype block partition and tagSNP selection related to the International HapMap Project. His books, Introduction to Computational Biology: Maps, Sequences and Genomes, for graduate students, and Computational Genome Analysis: An Introduction, geared toward undergraduate students, played key roles in computational biology and bioinformatics education. We also highlight his efforts in building the computational biology and bioinformatics community as the founding editor of the Journal of Computational Biology and a founding member of the International Conference on Research in Computational Molecular Biology (RECOMB).
Affiliation(s)
- Pavel Pevzner
- Department of Computer Science and Engineering, University of California San Diego, San Diego, California, USA
- Martin Vingron
- Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, Germany
- Christian Reidys
- Department of Mathematics, Biocomplexity Institute & Initiative, University of Virginia, Charlottesville, Virginia, USA
- Fengzhu Sun
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, California, USA
- Sorin Istrail
- Department of Computer Science, Center for Computational Molecular Biology, Brown University, Providence, Rhode Island, USA
3
Abstract
BACKGROUND: The neutral theory of Motoo Kimura stipulates that evolution is mostly driven by neutral mutations. However, adaptive pressure eventually leads to changes in phenotype that involve non-neutral mutations. The relation between neutrality and adaptation has been studied before in the context of RNA; here we further study transitional mutations in the context of degenerate (plastic) RNA sequences and genetic assimilation. We propose quasineutral mutations, i.e., mutations which preserve an element of the phenotype set, as minimal mutations and study their properties. We also propose a general probabilistic interpretation of genetic assimilation and specialize it to the Boltzmann ensemble of RNA sequences. RESULTS: We show that degenerate sequences, i.e., sequences with more than one structure at the minimum free energy (MFE) level, have the highest evolvability among all sequences and are central to evolutionary innovation. Degenerate sequences also tend to cluster together in sequence space. The selective pressure in an evolutionary simulation causes the population to move towards regions with more degenerate sequences, i.e., regions at the intersection of different neutral networks, and this causes the number of such sequences to increase well beyond the average percentage of degenerate sequences in sequence space. We also observe that evolution by quasineutral mutations tends to conserve the number of base pairs in structures and thereby maintains structural integrity even in the presence of pressure to the contrary. CONCLUSIONS: We conclude that degenerate RNA sequences play a major role in evolutionary adaptation.
Affiliation(s)
- Reza Rezazadegan
- University of Virginia Biocomplexity Institute, 995 Research Park Boulevard, Charlottesville, VA 22911, USA
- Christian Reidys
- University of Virginia Biocomplexity Institute, 995 Research Park Boulevard, Charlottesville, VA 22911, USA
- Department of Mathematics, University of Virginia, 141 Cabell Drive, Charlottesville, VA 22904, USA
4
Huang F, Reidys C, Rezazadegan R. Fatgraph models of RNA structure. Computational and Mathematical Biophysics 2017. [DOI: 10.1515/mlbmb-2017-0001] [Indexed: 11/15/2022]
Abstract
In this review paper we discuss fatgraphs as a conceptual framework for RNA structures. We discuss various notions of coarse-grained RNA structures and relate them to fatgraphs. We motivate and discuss the main intuition behind the fatgraph model and showcase its applicability to canonical as well as noncanonical base pairs. Recent discoveries regarding novel recursions of pseudoknotted (pk) configurations, as well as their translation into context-free grammars for pk-structures, are discussed. This is shown to allow for extending the concept of partition functions of sequences w.r.t. a fixed structure having non-crossing arcs to pk-structures. We discuss minimum free energy folding of pk-structures and combine the above results, outlining how to obtain an inverse folding algorithm for pk-structures.
Affiliation(s)
- Fenix Huang
- Biocomplexity Institute of Virginia Tech, 1015 Life Science Circle, Blacksburg, VA 24060, United States of America
- Christian Reidys
- Biocomplexity Institute of Virginia Tech, 1015 Life Science Circle, Blacksburg, VA 24060, United States of America
- Reza Rezazadegan
- Biocomplexity Institute of Virginia Tech, 1015 Life Science Circle, Blacksburg, VA 24060, United States of America
5
Abstract
Folding of RNA sequences into secondary structures is viewed as a map that assigns a uniquely defined base pairing pattern to every sequence. The mapping is non-invertible since many sequences fold into the same minimum free energy (secondary) structure or shape. The pre-images of this map, called neutral networks, are uniquely associated with the shapes and vice versa. Random graph theory is used to construct networks in sequence space which are suitable models for neutral networks. The theory of molecular quasispecies has been applied to replication and mutation on single-peak fitness landscapes. This concept is extended by considering evolution on degenerate multi-peak landscapes which originate from neutral networks by assuming that one particular shape is fitter than all the others. On such a single-shape landscape the superior fitness value is assigned to all sequences belonging to the master shape. All other shapes are lumped together and their fitness values are averaged in a way that is reminiscent of mean field theory. Replication and mutation on neutral networks are modeled by phenomenological rate equations as well as by a stochastic birth-and-death model. In analogy to the error threshold in sequence space, the phenotypic error threshold separates two scenarios: (i) a stationary (fittest) master shape surrounded by closely related shapes and (ii) populations drifting through shape space by a diffusion-like process. The error classes of the quasispecies model are replaced by distance classes between the master shape and the other structures. Analytical results are derived for single-shape landscapes; in particular, simple expressions are obtained for the mean fraction of master shapes in a population and for phenotypic error thresholds. The analytical results are complemented by data obtained from computer simulation of the underlying stochastic processes.
The predictions of the phenomenological approach on the single-shape landscape are very well reproduced by replication and mutation kinetics of tRNA(phe). Simulation of the stochastic process at a resolution of individual distance classes yields data which are in excellent agreement with the results derived from the birth-and-death model.
Affiliation(s)
- C Reidys
- Los Alamos National Laboratory, Los Alamos, NM 87545, USA
6
Abstract
Random graph theory is used to model and analyse the relationships between sequences and secondary structures of RNA molecules, which are understood as mappings from sequence space into shape space. These maps are non-invertible since there are always many orders of magnitude more sequences than structures. Sequences folding into identical structures form neutral networks. A neutral network is embedded in the set of sequences that are compatible with the given structure. Networks are modeled as graphs and constructed by random choice of vertices from the space of compatible sequences. The theory characterizes neutral networks by the mean fraction of neutral neighbors (λ). The networks are connected and percolate sequence space if the fraction of neutral nearest neighbors exceeds a threshold value (λ > λ*). Below threshold (λ < λ*), the networks are partitioned into a largest "giant" component and several smaller components. Structures are classified as "common" or "rare" according to the sizes of their pre-images, i.e., according to the fractions of sequences folding into them. The neutral networks of any two different common structures almost touch each other and, as expressed by the conjecture of shape space covering, sequences folding into almost all common structures can be found in a small ball around an arbitrary location in sequence space. The results from random graph theory are compared to data obtained by folding large samples of RNA sequences. Differences are explained in terms of specific features of RNA molecular structures.
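The random-graph construction described in this abstract can be illustrated with a small sketch. It is not the authors' model: as a simplifying assumption it samples from the full sequence space over the alphabet ACGU rather than from the set of sequences compatible with a structure, and it keeps each sequence independently with probability `lam`, a hypothetical stand-in for the mean fraction of neutral neighbors (lambda). The function names are illustrative only. The sketch lists the connected component sizes of the induced subgraph, so the split into a giant component and smaller components can be inspected for different values of `lam`.

```python
import itertools
import random
from collections import deque

ALPHABET = "ACGU"  # assumed four-letter RNA alphabet


def sample_network(n, lam, rng):
    """Keep each length-n sequence independently with probability lam."""
    all_sequences = map("".join, itertools.product(ALPHABET, repeat=n))
    return {s for s in all_sequences if rng.random() < lam}


def neighbors(seq):
    """Yield every sequence at Hamming distance one from seq."""
    for i, current in enumerate(seq):
        for letter in ALPHABET:
            if letter != current:
                yield seq[:i] + letter + seq[i + 1:]


def component_sizes(vertices):
    """Return connected component sizes of the induced subgraph, largest first."""
    seen = set()
    sizes = []
    for start in vertices:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:  # breadth-first search restricted to sampled vertices
            v = queue.popleft()
            size += 1
            for w in neighbors(v):
                if w in vertices and w not in seen:
                    seen.add(w)
                    queue.append(w)
        sizes.append(size)
    return sorted(sizes, reverse=True)


if __name__ == "__main__":
    rng = random.Random(0)
    for lam in (0.2, 0.5, 0.8):
        sizes = component_sizes(sample_network(6, lam, rng))
        print(lam, sizes[:5])  # the five largest components
```

Sampling vertices independently is only one way to realize "random choice of vertices"; it is chosen here because the expected fraction of sampled neighbors of any sequence then equals `lam`.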
Affiliation(s)
- C Reidys
- Santa Fe Institute, NM 87501, USA
7
Kopp S, Reidys C, Schuster P. Insights into evolution of RNA-structures. ORIGINS LIFE EVOL B 1996. [DOI: 10.1007/bf02459837] [Indexed: 10/24/2022]
8
Abstract
Shapes of biological macromolecules (RNA, DNA, and proteins) can be represented by abstract algebraic structures provided that a suitably coarse resolution is chosen. These abstract structures, for instance partially ordered sets and permutation groups, can be used for deriving new metric distances between biomolecular shapes and for proving surprising theorems on sequence-structure relations.
Affiliation(s)
- C Reidys
- Institut für Molekulare Biotechnologie, Beutenbergstrasse 11, PF 100813, D-07708 Jena, Germany