1
Douglas J, Bouckaert R, Harris SC, Carter CW, Wills PR. Evolution is coupled with branching across many granularities of life. Proc Biol Sci 2025; 292:20250182. [PMID: 40425161 DOI: 10.1098/rspb.2025.0182] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2025] [Revised: 03/15/2025] [Accepted: 03/17/2025] [Indexed: 05/29/2025] Open
Abstract
Across many scales of life, the rate of evolutionary change is often accelerated at the time when one lineage splits into two. The emergence of novel protein function can be facilitated by gene duplication (neofunctionalization); rapid morphological change is often accompanied by speciation (punctuated equilibrium); and the establishment of cultural identity is frequently driven by sociopolitical division (schismogenesis). In each case, the changes resist re-homogenization, promoting assortment into distinct lineages that are susceptible to different selective pressures, leading to rapid divergence. The traditional gradualistic view of evolution struggles to detect this phenomenon. We propose a probabilistic framework that constructs phylogenies, tests for saltative branching and improves divergence time estimation by estimating the independent contributions of gradual and abrupt change on each lineage. We provide evidence of saltative branching for proteins (aminoacyl transfer RNA (tRNA) synthetases), animal morphologies (cephalopods) and human languages (Indo-European). These three cases provide unique insights: for aminoacyl-tRNA synthetases, the trees are substantially different from those obtained under gradualist models; we estimate that 99% of cephalopod morphological changes coincided with speciation events; and Indo-European dispersal is estimated to have started around 6000 BCE, corroborating the recently proposed hybrid explanation. Our open-source code is available under a General Public License.
Affiliation(s)
- Remco Bouckaert
- Computer Science, University of Auckland, Auckland, New Zealand
- Max Planck Institute for the Science of Human History, Jena, Germany
- Simon C Harris
- Statistics, University of Auckland, Auckland, New Zealand
- Charles W Carter
- Biochemistry and Biophysics, University of North Carolina at Chapel Hill, North Carolina, USA
- Peter R Wills
- Physics, University of Auckland, Auckland, New Zealand
- Integrative Transcriptomics, University of Tübingen, Tübingen, Germany
2
Patra SK, Randolph N, Kuhlman B, Dieckhaus H, Betts L, Douglas J, Wills PR, Carter CW. Aminoacyl-tRNA synthetase urzymes optimized by deep learning behave as a quasispecies. Structural Dynamics (Melville, N.Y.) 2025; 12:024701. [PMID: 40290414 PMCID: PMC12033045 DOI: 10.1063/4.0000294] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2025] [Accepted: 03/19/2025] [Indexed: 04/30/2025]
Abstract
Protein design plays a key role in our efforts to work out how genetic coding began. That effort entails urzymes. Urzymes are small, conserved excerpts from full-length aminoacyl-tRNA synthetases that remain active. Urzymes require design to connect disjoint pieces and repair naked nonpolar patches created by removing large domains. Rosetta allowed us to create the first urzymes, but those urzymes were only sparingly soluble. We could measure activity, but it was hard to concentrate those samples to levels required for structural biology. Here, we used the deep learning algorithms ProteinMPNN and AlphaFold2 to redesign a set of optimized LeuAC urzymes derived from leucyl-tRNA synthetase. We select a balanced, representative subset of eight variants for testing using principal component analysis. Most tested variants are much more soluble than the original LeuAC. They also span a range of catalytic proficiency and amino acid specificity. The data enable detailed statistical analyses of the sources of both solubility and specificity. In that way, we show how to begin to unwrap the elements of protein chemistry that were hidden within the neural networks. Deep learning networks have thus helped us surmount several vexing obstacles to further investigations into the nature of ancestral proteins. Finally, we discuss how the eight variants might resemble a sample drawn from a population similar to one subject to natural selection.
Affiliation(s)
- Sourav Kumar Patra
- Department of Biochemistry and Biophysics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599-7260, USA
- Nicholas Randolph
- Department of Biochemistry and Biophysics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599-7260, USA
- Laurie Betts
- Department of Biochemistry and Biophysics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599-7260, USA
- Jordan Douglas
- Department of Physics, University of Auckland, Auckland, New Zealand
- Peter R. Wills
- Department of Physics, University of Auckland, Auckland, New Zealand
3
Wills PR. Origins of Genetic Coding: Self-Guided Molecular Self-Organisation. Entropy (Basel, Switzerland) 2023; 25:1281. [PMID: 37761580 PMCID: PMC10527755 DOI: 10.3390/e25091281] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/10/2023] [Revised: 08/22/2023] [Accepted: 08/28/2023] [Indexed: 09/29/2023]
Abstract
The origin of genetic coding is characterised as an event of cosmic significance in which quantum mechanical causation was transcended by constructive computation. Computational causation entered the physico-chemical processes of the pre-biotic world by the incidental satisfaction of a condition of reflexivity between polymer sequence information and system elements able to facilitate their own production through translation of that information. This event, which has previously been modelled in the dynamics of Gene-Replication-Translation systems, is properly described as a process of self-guided self-organisation. The spontaneous emergence of a primordial genetic code between two-letter alphabets of nucleotide triplets and amino acids is easily possible, starting with random peptide synthesis that is RNA-sequence-dependent. The evident self-organising mechanism is the simultaneous quasi-species bifurcation of the populations of information-carrying genes and enzymes with aminoacyl-tRNA synthetase-like activities. This mechanism allowed the code to evolve very rapidly to the ~20 amino acid limit apparent for the reflexive differentiation of amino acid properties using protein catalysts. The self-organisation of semantics in this domain of physical chemistry conferred on emergent molecular biology exquisite computational control over the nanoscopic events needed for its self-construction.
Affiliation(s)
- Peter R Wills
- Department of Physics, University of Auckland, Auckland PB 92019, New Zealand
4
Reflexivity, coding and quantum biology. Biosystems 2019; 185:104027. [PMID: 31494127 DOI: 10.1016/j.biosystems.2019.104027] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 06/29/2019] [Revised: 08/29/2019] [Accepted: 08/31/2019] [Indexed: 12/31/2022]
Abstract
Biological systems are fundamentally computational in that they process information in an apparently purposeful fashion rather than just transferring bits of it in a purely syntactical manner. Biological information, such as genetic information stored in DNA sequences, has semantic content. It carries meaning that is defined by the molecular context of its cellular environment. Information processing in biological systems displays an inherent reflexivity, a tendency for the computational information-processing to be "about" the behaviour of the molecules that participate in the computational process. This is most evident in the operation of the genetic code, where the specificity of the reactions catalysed by the aminoacyl-tRNA synthetase (aaRS) enzymes is required to be self-sustaining. A cell's suite of aaRS enzymes completes a reflexively autocatalytic set of molecular components capable of making themselves through the operation of the code. This set requires the existence of a body of reflexive information to be stored in an organism's genome. The genetic code is a reflexively self-organised mapping of the chemical properties of amino acid sidechains onto codon "tokens". It is a highly evolved symbolic system of chemical self-description. Although molecular biological coding is generally portrayed in terms of classical bit-transfer events, various biochemical events explicitly require quantum coherence for their occurrence. Whether the implicit transfer of quantum information, qubits, is indicative of wide-ranging quantum computation in living systems is currently the subject of extensive investigation and speculation in the field of Quantum Biology.
5
Carter CW, Wills PR. Interdependence, Reflexivity, Fidelity, Impedance Matching, and the Evolution of Genetic Coding. Mol Biol Evol 2018; 35:269-286. [PMID: 29077934 PMCID: PMC5850816 DOI: 10.1093/molbev/msx265] [Citation(s) in RCA: 36] [Impact Index Per Article: 5.1] [Indexed: 02/07/2023] Open
Abstract
Genetic coding is generally thought to have required ribozymes whose functions were taken over by polypeptide aminoacyl-tRNA synthetases (aaRS). Two discoveries about aaRS and their interactions with tRNA substrates now furnish a unifying rationale for the opposite conclusion: that the key processes of the Central Dogma of molecular biology emerged simultaneously and naturally from simple origins in a peptide•RNA partnership, eliminating the epistemological utility of a prior RNA world. First, the two aaRS classes likely arose from opposite strands of the same ancestral gene, implying a simple genetic alphabet. The resulting inversion symmetries in aaRS structural biology would have stabilized the initial and subsequent differentiation of coding specificities, rapidly promoting diversity in the proteome. Second, amino acid physical chemistry maps onto tRNA identity elements, establishing reflexive, nanoenvironmental sensing in protein aaRS. Bootstrapping of increasingly detailed coding is thus intrinsic to polypeptide aaRS, but impossible in an RNA world. These notions underline the following concepts that contradict gradual replacement of ribozymal aaRS by polypeptide aaRS: 1) aaRS enzymes must be interdependent; 2) reflexivity intrinsic to polypeptide aaRS production dynamics promotes bootstrapping; 3) takeover of RNA-catalyzed aminoacylation by enzymes will necessarily degrade specificity; and 4) the Central Dogma's emergence is most probable when replication and translation error rates remain comparable. These characteristics are necessary and sufficient for the essentially de novo emergence of a coupled gene-replicase-translatase system of genetic coding that would have continuously preserved the functional meaning of genetically encoded protein genes whose phylogenetic relationships match those observed today.
Affiliation(s)
- Charles W Carter
- Department of Biochemistry and Biophysics, University of North Carolina at Chapel Hill, Chapel Hill, NC
- Peter R Wills
- Department of Physics, University of Auckland, Auckland, New Zealand
6
Wills PR, Carter CW. Insuperable problems of the genetic code initially emerging in an RNA world. Biosystems 2018; 164:155-166. [PMID: 28903058 PMCID: PMC5895081 DOI: 10.1016/j.biosystems.2017.09.006] [Citation(s) in RCA: 41] [Impact Index Per Article: 5.9] [Received: 06/12/2017] [Revised: 09/04/2017] [Accepted: 09/07/2017] [Indexed: 11/23/2022]
Abstract
Differential equations for error-prone information transfer (template replication, transcription or translation) are developed in order to consider, within the theory of autocatalysis, the advent of coded protein synthesis. Variations of these equations furnish a basis for comparing the plausibility of contrasting scenarios for the emergence of specific tRNA aminoacylation, ultimately by enzymes, and the relationship of this process with the origin of the universal system of molecular biological information processing embodied in the Central Dogma. The hypothetical RNA World does not furnish an adequate basis for explaining how this system came into being, but principles of self-organisation that transcend Darwinian natural selection furnish an unexpectedly robust basis for a rapid, concerted transition to genetic coding from a peptide·RNA world.
Affiliation(s)
- Peter R Wills
- Department of Physics, University of Auckland, PB 92109, Auckland 1142, New Zealand; Department of Biochemistry and Biophysics, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States.
- Charles W Carter
- Department of Biochemistry and Biophysics, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
7
Wills PR, Nieselt K, McCaskill JS. Emergence of coding and its specificity as a physico-informatic problem. Origins Life Evol B 2015; 45:249-55. [PMID: 25813662 DOI: 10.1007/s11084-015-9434-5] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Indexed: 10/23/2022]
Abstract
We explore the origin-of-life consequences of the view that biological systems are demarcated from inanimate matter by their possession of referential information, which is processed computationally to control choices of specific physico-chemical events. Cells are cybernetic: they use genetic information in processes of communication and control, subjecting physical events to a system of integrated governance. The genetic code is the most obvious example of how cells use information computationally, but the historical origin of the usefulness of molecular information is not well understood. Genetic coding made information useful because it imposed a modular metric on the evolutionary search and thereby offered a general solution to the problem of finding catalysts of any specificity. We use the term "quasispecies symmetry breaking" to describe the iterated process of self-organisation whereby the alphabets of distinguishable codons and amino acids increased, step by step.
Affiliation(s)
- Peter R Wills
- Department of Physics, University of Auckland, PB 92019, Auckland, 1142, New Zealand
8
Wills PR. Spontaneous mutual ordering of nucleic acids and proteins. Origins Life Evol B 2014; 44:293-8. [PMID: 25585807 DOI: 10.1007/s11084-014-9396-z] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Received: 09/29/2014] [Accepted: 11/04/2014] [Indexed: 11/29/2022]
Abstract
It is proposed that the prebiotic ordering of nucleic acid and peptide sequences was a cooperative process in which nearly random populations of both kinds of polymers went through a codependent series of self-organisation events that simultaneously refined not only the accuracy of genetic replication and coding but also the functional specificity of protein catalysts, especially nascent aminoacyl-tRNA synthetase "urzymes".
Affiliation(s)
- Peter R Wills
- Department of Physics, University of Auckland, PB 92019, Auckland, 1142, New Zealand
9
Wills PR. Informed Generation: Physical origin and biological evolution of genetic codescript interpreters. J Theor Biol 2009; 257:345-58. [DOI: 10.1016/j.jtbi.2008.12.030] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Received: 01/10/2008] [Revised: 11/18/2008] [Accepted: 12/17/2008] [Indexed: 11/26/2022]
10
Salzberg C. From machine and tape to structure and function: formulation of a reflexively computing system. Artificial Life 2006; 12:487-512. [PMID: 16953782 DOI: 10.1162/artl.2006.12.4.487] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 05/11/2023]
Abstract
The relationship between structure and function is explored via a system of labeled directed graph structures upon which a single elementary read/write rule is applied locally. Boundaries between static (information-carrying) and active (information-processing) objects, imposed by mandate of the rules or physics in earlier models, emerge instead as a result of a structure-function dynamic that is reflexive: objects may operate directly on their own structure. A representation of an arbitrary Turing machine is reproduced in terms of structural constraints by means of a simple mapping from tape squares and machine states to a uniform medium of nodes and links, establishing computation universality. Exploiting flexibility of the formulation, examples of other unconventional "self-computing" structures are demonstrated. A straightforward representation of a kinematic machine system based on the model devised by Laing is also reproduced in detail. Implications of the findings are discussed in terms of their relation to other formal models of computation and construction. It is argued that reflexivity of the structure-function relationship is a critical informational dynamic in biochemical systems, overlooked in previous models but well captured by the proposed formulation.
Affiliation(s)
- Chris Salzberg
- Ikegami Lab, Department of General Systems Studies, Graduate School of Arts and Sciences, The University of Tokyo, 3-8-1 Komaba, Meguro-ku, Tokyo 153-8902, Japan.
11
Abstract
Autocatalytic self-construction in macromolecular systems requires the existence of a reflexive relationship between structural components and the functional operations they perform to synthesise themselves. The possibility of reflexivity depends on formal, semiotic features of the catalytic structure-function relationship, that is, the embedding of catalytic functions in the space of polymeric structures. Reflexivity is a semiotic property of some genetic sequences. Such sequences may serve as the basis for the evolution of coding as a result of autocatalytic self-organisation in a population of assignment catalysts. Autocatalytic selection is a mechanism whereby matter becomes differentiated in primitive biochemical systems. In the case of coding self-organisation, it corresponds to the creation of symbolic information. Prions are present-day entities whose replication through autocatalysis reflects aspects of biological semiotics less obvious than genetic coding.
Affiliation(s)
- P R Wills
- Department of Physics, University of Auckland, Private Bag 92019, Auckland, New Zealand.