1
Carter CW. Simultaneous codon usage, the origin of the proteome, and the emergence of de-novo proteins. Curr Opin Struct Biol 2021; 68:142-148. [PMID: 33529785 DOI: 10.1016/j.sbi.2021.01.004] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 11/04/2020] [Accepted: 01/05/2021] [Indexed: 12/21/2022]
Abstract
Genetic coding generally uses only one of a gene's two strands, its complement serving as the template for replication. Aminoacyl-tRNA synthetases (aaRS) apparently first emerged as pairs on bidirectional genes, in which anticodons in the template strand served as codons for an entirely different protein. Interpreting both strands in frame constrained such genes so severely that bidirectional coding was rapidly superseded, leaving only traces in the elevated pairing between codon middle bases in antiparallel alignments. Codon assignments actually promote using information from both strands in multiple reading frames. Related phenomena, known as overprinting, are widely associated with viruses. In-frame bidirectional coding and overprinting nevertheless imply different structural and functional relationships, and different roles in generating folded proteins throughout the evolution of the proteome.
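In-frame bidirectional coding can be made concrete with a short sketch. The Python below is illustrative only and is not taken from the paper: it reads a hypothetical toy gene and its antiparallel complement in the same frame and translates both with the standard genetic code. Because each antisense codon is the reverse complement of the sense codon opposite it, the two codons always carry complementary middle bases. The helper names `reverse_complement` and `translate` are hypothetical.

```python
# Standard genetic code, codons enumerated in TCAG order (first, second, third base).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the antiparallel complementary strand, written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate frame 0 of a DNA sequence; '*' marks stops, 'X' unknown codons."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")
        for i in range(0, len(seq) - len(seq) % 3, 3)
    )


if __name__ == "__main__":
    sense = "ATGGCTAAA"  # hypothetical toy gene; length divisible by 3 keeps both strands in frame
    antisense = reverse_complement(sense)
    print(translate(sense), translate(antisense))  # MAK FSH
```

Here sense codon ATG faces antisense codon CAT, GCT faces AGC, and AAA faces TTT; in every pair the middle bases (T/A, C/G, A/T) are Watson-Crick complements.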
Affiliation(s)
- Charles W Carter
- Department of Biochemistry and Biophysics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599-7260, United States.
2
Carter CW. Coding of Class I and II Aminoacyl-tRNA Synthetases. Adv Exp Med Biol 2017; 966:103-148. [PMID: 28828732 PMCID: PMC5927602 DOI: 10.1007/5584_2017_93] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.1] [Indexed: 12/20/2022]
Abstract
The aminoacyl-tRNA synthetases and their cognate transfer RNAs translate the universal genetic code. The twenty canonical amino acids are sufficiently diverse to create a selective advantage for dividing amino acid activation between two distinct, apparently unrelated superfamilies of synthetases, Class I amino acids being generally larger and less polar, Class II amino acids smaller and more polar. Biochemical, bioinformatic, and protein engineering experiments support the hypothesis that the two Classes descended from opposite strands of the same ancestral gene. Parallel experimental deconstructions of Class I and II synthetases reveal parallel losses in catalytic proficiency at two novel modular levels (protozymes and Urzymes) associated with the evolution of catalytic activity. Bi-directional coding supports an important unification of the proteome; affords a genetic relatedness metric (middle base-pairing frequencies in sense/antisense alignments) that probes more deeply into the evolutionary history of translation than do single multiple sequence alignments; and has facilitated the analysis of hitherto unknown coding relationships in tRNA sequences. Reconstruction of native synthetases by modular thermodynamic cycles facilitated by domain engineering emphasizes the subtlety associated with achieving high specificity, shedding new light on allosteric relationships in contemporary synthetases. Synthetase Urzyme structural biology suggests that they are catalytically active molten globules, broadening the potential manifold of polypeptide catalysts accessible to primitive genetic coding and motivating revisions of the origins of catalysis. Finally, bi-directional genetic coding of some of the oldest genes in the proteome places major limitations on the likelihood that any RNA World preceded the origins of coded proteins.
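The relatedness metric named above (middle base-pairing frequency in sense/antisense alignments) can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's published procedure: both inputs are hypothetical in-frame coding sequences written 5'->3' and already aligned to the same codon count, gaps are marked `-`, and only Watson-Crick pairs are counted. The function names are hypothetical.

```python
# Watson-Crick pairs only; wobble (G.T / G.U) is deliberately not counted (an assumption).
WC_PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def codons(seq: str) -> list[str]:
    """Split a sequence into frame-0 codons, dropping any trailing partial codon."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def middle_base_pairing_frequency(sense: str, other: str) -> float:
    """Fraction of antiparallel codon pairs whose middle bases are complementary.

    Codon i of `sense` faces codon n-1-i of `other` when the two strands are laid
    antiparallel, so the second codon list is reversed before pairing.
    """
    a = codons(sense)
    b = codons(other)[::-1]
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same number of codons")
    compared = paired = 0
    for ca, cb in zip(a, b):
        if ca[1] == "-" or cb[1] == "-":
            continue  # skip codon pairs with a gapped middle position
        compared += 1
        paired += (ca[1], cb[1]) in WC_PAIRS
    return paired / compared if compared else 0.0
```

A sequence compared with its own reverse complement scores 1.0. For random sequences with equal base frequencies the expected score is about 0.25, since 4 of the 16 possible base combinations are Watson-Crick pairs; real alignments would be judged against such a background.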
Affiliation(s)
- Charles W Carter
- Department of Biochemistry and Biophysics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599-7260, USA.
3
Caetano-Anollés D, Caetano-Anollés G. Piecemeal Buildup of the Genetic Code, Ribosomes, and Genomes from Primordial tRNA Building Blocks. Life (Basel) 2016; 6:life6040043. [PMID: 27918435 PMCID: PMC5198078 DOI: 10.3390/life6040043] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.2] [Received: 10/31/2016] [Revised: 11/21/2016] [Accepted: 11/29/2016] [Indexed: 01/10/2023]
Abstract
The origin of biomolecular machinery likely centered around an ancient and central molecule capable of interacting with emergent macromolecular complexity. tRNA is the oldest and most central nucleic acid molecule of the cell. Its co-evolutionary interactions with aminoacyl-tRNA synthetase protein enzymes define the specificities of the genetic code and those with the ribosome their accurate biosynthetic interpretation. Phylogenetic approaches that focus on molecular structure allow reconstruction of evolutionary timelines that describe the history of RNA and protein structural domains. Here we review phylogenomic analyses that reconstruct the early history of the synthetase enzymes and the ribosome, their interactions with RNA, and the inception of amino acid charging and codon specificities in tRNA that are responsible for the genetic code. We also trace the age of domains and tRNA onto ancient tRNA homologies that were recently identified in rRNA. Our findings reveal a timeline of recruitment of tRNA building blocks for the formation of a functional ribosome, which holds both the biocatalytic functions of protein biosynthesis and the ability to store genetic memory in primordial RNA genomic templates.
Affiliation(s)
- Derek Caetano-Anollés
- Department of Evolutionary Genetics, Max-Planck-Institut für Evolutionsbiologie, 24306 Plön, Germany.
- Gustavo Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA.
4
Cheregi O, Vermaas W, Funk C. The search for new chlorophyll-binding proteins in the cyanobacterium Synechocystis sp. PCC 6803. J Biotechnol 2012; 162:124-33. [PMID: 22759916 DOI: 10.1016/j.jbiotec.2012.06.022] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 12/22/2011] [Revised: 06/21/2012] [Accepted: 06/25/2012] [Indexed: 01/24/2023]
Abstract
Light harvesting poses a major challenge in the production of biofuels from microorganisms: while sunlight provides the energy necessary for biomass/biofuel production, it also damages the cells. The genome of Synechocystis sp. PCC 6803 was searched for open reading frames that might code for as yet unidentified chlorophyll-binding proteins of low molecular mass that could be involved in stress adaptation. Amongst 9167 hypothetical ORFs corresponding to potential polypeptides of 100 amino acids or less, two were identified as potentially pigment-binding because they (i) encoded a potential transmembrane region, (ii) showed sequence similarity with known chlorophyll-binding domains, (iii) were conserved in other cyanobacterial species, and (iv) had a codon adaptation index indicating significant translation probability. The two ORFs were located complementary (antisense) and internal to the ferrochelatase (hemH) and the pyruvate dehydrogenase (pdh) genes and were therefore named a-fch and a-pdh, respectively. Transcription of both genes was confirmed; however, no translated proteins could be detected immunologically. Although mutations within a-pdh or a-fch did not lead to any obvious phenotype, it is clear that transcripts and proteins beyond the currently known set may play a role in defining the physiology of cyanobacteria and other organisms.
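The codon adaptation index (CAI) used as criterion (iv) is usually computed following Sharp and Li: each codon gets a relative adaptiveness w, its count in a reference set of highly expressed genes divided by the count of the most used synonymous codon, and a gene's CAI is the geometric mean of w over its codons. The sketch below assumes that definition; the reference genes, the 0.5 pseudocount for unseen codons, and the function names are hypothetical choices, not details from this study.

```python
import math
from collections import Counter

# Standard genetic code in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def codons(seq):
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def relative_adaptiveness(reference_genes, pseudocount=0.5):
    """Map each codon to w = (count + pseudocount) / max over its synonymous family."""
    counts = Counter(c for g in reference_genes for c in codons(g) if c in CODON_TABLE)
    weights = {}
    for aa in set(CODON_TABLE.values()):
        if aa in "*MW":
            continue  # stops and single-codon amino acids carry no usage information
        family = [c for c, a in CODON_TABLE.items() if a == aa]
        best = max(counts[c] + pseudocount for c in family)
        for c in family:
            weights[c] = (counts[c] + pseudocount) / best
    return weights


def cai(gene, weights):
    """Geometric mean of w over the gene's informative codons (NaN if there are none)."""
    logs = [math.log(weights[c]) for c in codons(gene) if c in weights]
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")
```

Values near 1 mean a gene uses the codons favoured in the reference set; the pseudocount only keeps codons absent from the reference set from producing log(0).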
Affiliation(s)
- Otilia Cheregi
- Department of Chemistry, Umeå University, SE 90187 Umeå, Sweden.
5
The phylogenomic roots of modern biochemistry: origins of proteins, cofactors and protein biosynthesis. J Mol Evol 2012; 74:1-34. [PMID: 22210458 DOI: 10.1007/s00239-011-9480-1] [Citation(s) in RCA: 57] [Impact Index Per Article: 4.4] [Received: 04/19/2011] [Accepted: 12/12/2011] [Indexed: 12/20/2022]
Abstract
The complexity of modern biochemistry developed gradually on early Earth as new molecules and structures populated the emerging cellular systems. Here, we generate a historical account of the gradual discovery of primordial proteins, cofactors, and molecular functions using phylogenomic information in the sequence of 420 genomes. We focus on structural and functional annotations of the 54 most ancient protein domains. We show how primordial functions are linked to folded structures and how their interaction with cofactors expanded the functional repertoire. We also reveal that protocell membranes played a crucial role in early protein evolution and show that translation started with RNA and thioester cofactor-mediated aminoacylation. Our findings allow elaboration of an evolutionary model of early biochemistry that is firmly grounded in phylogenomic information and biochemical, biophysical, and structural knowledge. The model describes how primordial α-helical bundles stabilized membranes, how these were decorated by layered arrangements of β-sheets and α-helices, and how these arrangements became globular. Ancient forms of aminoacyl-tRNA synthetase (aaRS) catalytic domains and ancient non-ribosomal protein synthetase (NRPS) modules gave rise to primordial protein synthesis and the ability to generate a code for specificity in their active sites. These structures diversified, producing cofactor-binding molecular switches and barrel structures. Accretion of domains and molecules gave rise to modern aaRSs, NRPS, and ribosomal ensembles, first organized around novel emerging cofactors (tRNA and carrier proteins) and then around more complex cofactor structures (rRNA). The model explains how the generation of protein structures acted as a scaffold for nucleic acids and resulted in the crystallization of modern translation.
6
Huether R, Liu ZJ, Xu H, Wang BC, Pletnev VZ, Mao Q, Duax WL, Umland TC. Sequence fingerprint and structural analysis of the SCOR enzyme A3DFK9 from Clostridium thermocellum. Proteins 2010; 78:603-13. [PMID: 19774618 DOI: 10.1002/prot.22584] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/10/2022]
Abstract
We have identified a highly conserved fingerprint of 40 residues in the TGYK subfamily of the short-chain oxidoreductase enzymes. The TGYK subfamily is defined by the presence of an N-terminal TGxxxGxG motif and a catalytic YxxxK motif. This subfamily contains more than 12,000 members, with individual members displaying unique substrate specificities. The 40 fingerprint residues are critical to catalysis, cofactor binding, protein folding, and oligomerization but are substrate independent. Their conservation provides critical insight into evolution of the folding and function of TGYK enzymes. Substrate specificity is determined by distinct combinations of residues in three flexible loops that make up the substrate-binding pocket. Here, we report the structure determinations of the TGYK enzyme A3DFK9 from Clostridium thermocellum in its apo form and with bound NAD(+) cofactor. The function of this protein is unknown, but our analysis of the substrate-binding loops putatively identifies A3DFK9 as a carbohydrate or polyalcohol metabolizing enzyme. C. thermocellum has potential commercial applications because of its ability to convert biomaterial into ethanol. A3DFK9 contains 31 of the 40 TGYK subfamily fingerprint residues. The most significant variations are the substitution of a cysteine (Cys84) for a highly conserved glycine within a characteristic VNNAG motif, and the substitution of a glycine (Gly106) for a highly conserved asparagine residue at a helical kink. Both of these variations occur at positions typically participating in the formation of a catalytically important proton transfer network. An alternate means of stabilizing this proton wire was observed in the A3DFK9 crystal structures.
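The two defining motifs of the TGYK subfamily lend themselves to a simple pattern scan. The sketch below is a hypothetical first-pass filter, not the fingerprint analysis described in the paper: it checks for TGxxxGxG (x = any residue) within an assumed N-terminal window and for YxxxK anywhere in the sequence. The 60-residue window and the function name are illustrative assumptions.

```python
import re

COFACTOR_MOTIF = re.compile(r"TG...G.G")  # TGxxxGxG, '.' matches any residue
CATALYTIC_MOTIF = re.compile(r"Y...K")    # YxxxK


def looks_like_tgyk(protein: str, n_terminal_window: int = 60) -> bool:
    """True if TGxxxGxG occurs near the N terminus and YxxxK occurs anywhere.

    `n_terminal_window` is a hypothetical cutoff for what counts as "N-terminal".
    """
    seq = protein.upper()
    has_cofactor = COFACTOR_MOTIF.search(seq[:n_terminal_window]) is not None
    has_catalytic = CATALYTIC_MOTIF.search(seq) is not None
    return has_cofactor and has_catalytic
```

A motif match of this kind only flags candidates; the 40-residue fingerprint and the substrate-binding loops described above are what actually characterize members such as A3DFK9.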
Affiliation(s)
- Robert Huether
- Department of Structural Biology, SUNY at Buffalo, Buffalo, New York 14203, USA
7
Rodin AS, Rodin SN, Carter CW. On primordial sense-antisense coding. J Mol Evol 2009; 69:555-67. [PMID: 19956936 PMCID: PMC2853367 DOI: 10.1007/s00239-009-9288-4] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.2] [Received: 08/14/2009] [Accepted: 09/18/2009] [Indexed: 11/29/2022]
Abstract
The genetic code is implemented by aminoacyl-tRNA synthetases (aaRS). These 20 enzymes are divided into two classes that, despite performing the same function, have nothing in common structurally. The mystery of this striking partition of aaRSs might have been concealed in their sterically complementary modes of tRNA recognition that, as we recently found, protect tRNAs with complementary anticodons from confusion in translation. This finding implies that, in the beginning, life increased its coding repertoire by pairs of complementary codons (rather than one by one) and used both complementary strands of genes as templates for translation. The class I and class II aaRSs may represent one of the most important examples of such primordial sense-antisense (SAS) coding (Rodin and Ohno, Orig Life Evol Biosph 25:565-589, 1995). In this report, we address the issue of SAS coding in a wider scope. We suggest a variety of advantages that such coding would have had in exploring a wider sequence space before translation became highly specific. In particular, we confirm that in Achlya klebsiana a single gene might originally have coded for an HSP70 chaperonin (class II aaRS homolog) and an NAD-specific GDH-like enzyme (class I aaRS homolog) via its sense and antisense strands. Thus, in contrast to the conclusions in Williams et al. (Mol Biol Evol 26:445-450, 2009), this could indeed be a "Rosetta stone" gene (Carter and Duax, Mol Cell 10:705-708, 2002), albeit somewhat eroded, for the SAS origin of the two aaRS classes.
Affiliation(s)
- Andrei S Rodin
- Human Genetics Center, School of Public Health, University of Texas, Houston, TX 77225, USA.
8
9
Biro JC. The Proteomic Code: a molecular recognition code for proteins. Theor Biol Med Model 2007; 4:45. [PMID: 17999762 PMCID: PMC2206014 DOI: 10.1186/1742-4682-4-45] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.4] [Received: 09/02/2007] [Accepted: 11/13/2007] [Indexed: 11/30/2022]
Abstract
Background: The Proteomic Code is a set of rules by which information in genetic material is transferred into the physico-chemical properties of amino acids. It determines how individual amino acids interact with each other during folding and in specific protein-protein interactions. The Proteomic Code is part of the redundant Genetic Code.
Review: The 25-year-old history of this concept is reviewed from the first independent suggestions by Biro and Mekler, through the works of Blalock, Root-Bernstein, Siemion, Miller and others, followed by the discovery of a Common Periodic Table of Codons and Nucleic Acids in 2003 and culminating in the recent conceptualization of partial complementary coding of interacting amino acids as well as the theory of nucleic acid-assisted protein folding.
Methods and conclusions: A novel cloning method for the design and production of specific, high-affinity-reacting proteins (SHARP) is presented. This method is based on the concept of proteomic codes and is suitable for large-scale, industrial production of specifically interacting peptides.
Affiliation(s)
- Jan C Biro
- Homulus Foundation, 88 Howard, #1205, San Francisco, CA 94105, USA.
10
Pham Y, Li L, Kim A, Erdogan O, Weinreb V, Butterfoss GL, Kuhlman B, Carter CW. A minimal TrpRS catalytic domain supports sense/antisense ancestry of class I and II aminoacyl-tRNA synthetases. Mol Cell 2007; 25:851-62. [PMID: 17386262 DOI: 10.1016/j.molcel.2007.02.010] [Citation(s) in RCA: 76] [Impact Index Per Article: 4.2] [Received: 09/02/2006] [Revised: 01/03/2007] [Accepted: 02/05/2007] [Indexed: 10/23/2022]
Abstract
The emergence of polypeptide catalysts for amino acid activation, the slowest step in protein synthesis, poses a significant puzzle associated with the origin of biology. This problem is compounded because the 20 contemporary aminoacyl-tRNA synthetases belong to two quite distinct families. We describe here the use of protein design to show experimentally that a minimal class I aminoacyl-tRNA synthetase active site might have functioned in the distant past. We deleted the anticodon-binding domain from tryptophanyl-tRNA synthetase and fused the discontinuous segments comprising its active site. The resulting 130-residue minimal catalytic domain activates tryptophan. This residual catalytic activity constitutes the first experimental evidence that the conserved class I signature sequences, HIGH and KMSKS, might have arisen in frame, opposite class II motifs 2 and 1, as complementary sense and antisense strands of the same ancestral gene.
Affiliation(s)
- Yen Pham
- Department of Biochemistry and Biophysics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
11
Abstract
From recent developments in early evolution theory, it follows that the earliest mRNAs were short (approximately 20 nt), (G+C)-rich polynucleotides. These short sequences could form hairpins, which would have been of high evolutionary advantage because of the stability and uniqueness of their conformations. Owing to mutations accumulated over billions of years of evolution, the hypothesized earliest hairpins would have largely lost their initial complementarity. Some of the original complementary base-to-base contacts, however, may have survived. Computational analysis of modern prokaryotic mRNA sequences reveals an excess of the expected short-range complementarities. The derived size of the earliest mRNA hairpins corresponds fully to the predicted size of ancient coding duplexes. The repertoire of surviving hairpins traced in modern mRNA confirms the duplex structure of the earliest mRNA suggested by the early molecular evolution theory.
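Short-range complementarity of the kind described above can be enumerated with a naive inverted-repeat scan. The Python below is a sketch under stated assumptions, not the authors' method: it reports every pair of positions (i, j) where a stem of `stem` bases is followed, after a loop of `min_loop` to `max_loop` bases, by its exact reverse complement. The stem and loop bounds are hypothetical parameters; a study like this one would compare such counts in real mRNAs against shuffled controls to detect an excess.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_hairpin_stems(seq: str, stem: int = 4, min_loop: int = 3, max_loop: int = 8):
    """Return (i, j) pairs where seq[j:j+stem] is the reverse complement of seq[i:i+stem].

    RNA input is accepted: U is converted to T before scanning. Only perfect
    Watson-Crick stems are reported (no mismatches, bulges, or G.U pairs).
    """
    s = seq.upper().replace("U", "T")
    hits = []
    for i in range(len(s) - stem + 1):
        target = reverse_complement(s[i:i + stem])
        for loop in range(min_loop, max_loop + 1):
            j = i + stem + loop  # start of the downstream half of the stem
            if j + stem > len(s):
                break
            if s[j:j + stem] == target:
                hits.append((i, j))
    return hits
```

For example, `find_hairpin_stems("GGGCAAAAGCCC")` finds the single stem GGGC...GCCC closing a four-base loop and returns `[(0, 8)]`.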
Affiliation(s)
- Idan Gabdank
- Department of Computer Science, Ben Gurion University of the Negev, P.O.B 653, Be'er Sheva 84105, Israel.