1
McConnell BS, Parker MW. Protein intrinsically disordered regions have a non-random, modular architecture. Bioinformatics 2023; 39:btad732. [PMID: 38039154 PMCID: PMC10719218 DOI: 10.1093/bioinformatics/btad732] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/29/2023] [Revised: 11/03/2023] [Accepted: 11/30/2023] [Indexed: 12/03/2023]
Abstract
MOTIVATION: Protein sequences can be broadly categorized into two classes: those which adopt stable secondary structure and fold into a domain (i.e. globular proteins), and those that do not. The sequences belonging to this latter class are conformationally heterogeneous and are described as being intrinsically disordered. Decades of investigation into the structure and function of globular proteins has resulted in a suite of computational tools that enable their sub-classification by domain type, an approach that has revolutionized how we understand and predict protein functionality. Conversely, it is unknown if sequences of disordered protein regions are subject to broadly generalizable organizational principles that would enable their sub-classification.
RESULTS: Here, we report the development of a statistical approach that quantifies linear variance in amino acid composition across a sequence. With multiple examples, we provide evidence that intrinsically disordered regions are organized into statistically non-random modules of unique compositional bias. Modularity is observed for both low and high-complexity sequences and, in some cases, we find that modules are organized in repetitive patterns. These data demonstrate that disordered sequences are non-randomly organized into modular architectures and motivate future experiments to comprehensively classify module types and to determine the degree to which modules constitute functionally separable units analogous to the domains of globular proteins.
AVAILABILITY AND IMPLEMENTATION: The source code, documentation, and data to reproduce all figures are freely available at https://github.com/MWPlabUTSW/Chi-Score-Analysis.git. The analysis is also available as a Google Colab Notebook (https://colab.research.google.com/github/MWPlabUTSW/Chi-Score-Analysis/blob/main/ChiScore_Analysis.ipynb).
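The idea of quantifying how amino acid composition varies along a sequence can be illustrated with a small sketch. The code below is not the authors' Chi-Score implementation (that lives in the repository linked above); it is a minimal illustration that assumes a plain Pearson chi-square statistic on residue counts, and the names `composition_chi_square` and `boundary_scan`, the window size, and the toy sequence are all hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition_chi_square(seg_a: str, seg_b: str) -> float:
    """Pearson chi-square statistic for a 2 x 20 table of residue counts."""
    counts_a, counts_b = Counter(seg_a.upper()), Counter(seg_b.upper())
    # Row totals count only standard residues, so other symbols are ignored.
    n_a = sum(counts_a[aa] for aa in AMINO_ACIDS)
    n_b = sum(counts_b[aa] for aa in AMINO_ACIDS)
    if n_a == 0 or n_b == 0:
        raise ValueError("each segment needs at least one standard residue")
    total = n_a + n_b
    chi2 = 0.0
    for aa in AMINO_ACIDS:
        column = counts_a[aa] + counts_b[aa]
        if column == 0:
            continue  # a residue absent from both segments contributes nothing
        for observed, row_total in ((counts_a[aa], n_a), (counts_b[aa], n_b)):
            expected = row_total * column / total
            chi2 += (observed - expected) ** 2 / expected
    return chi2


def boundary_scan(seq: str, window: int) -> list[float]:
    """Score each position by comparing the windows to its left and right."""
    return [
        composition_chi_square(seq[i - window:i], seq[i:i + window])
        for i in range(window, len(seq) - window + 1)
    ]


if __name__ == "__main__":
    toy = "SSSSSSGGGGGG"  # hypothetical sequence with one compositional switch
    scores = boundary_scan(toy, window=3)
    # Report the position whose flanking windows differ most in composition.
    print(scores.index(max(scores)) + 3)
```

A high score at a position suggests that the residues on either side are drawn from different compositions, which is one simple way to propose a module boundary. Assessing whether such a boundary is statistically non-random would need a null model, which this sketch does not include.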
Affiliation(s)
- Brendan S McConnell
  - Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX 75235, United States
- Matthew W Parker
  - Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX 75235, United States
2
Nevers Y, Glover NM, Dessimoz C, Lecompte O. Protein length distribution is remarkably uniform across the tree of life. Genome Biol 2023; 24:135. [PMID: 37291671 PMCID: PMC10251718 DOI: 10.1186/s13059-023-02973-2] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 02/03/2022] [Accepted: 05/16/2023] [Indexed: 06/10/2023]
Abstract
BACKGROUND: In every living species, the function of a protein depends on its organization of structural domains, and the length of a protein is a direct reflection of this. Because every species evolved under different evolutionary pressures, the protein length distribution, much like other genomic features, is expected to vary across species but has so far been scarcely studied.
RESULTS: Here we evaluate this diversity by comparing protein length distribution across 2326 species (1688 bacteria, 153 archaea, and 485 eukaryotes). We find that proteins tend to be on average slightly longer in eukaryotes than in bacteria or archaea, but that the variation of length distribution across species is low, especially compared to the variation of other genomic features (genome size, number of proteins, gene length, GC content, isoelectric points of proteins). Moreover, most cases of atypical protein length distribution appear to be due to artifactual gene annotation, suggesting the actual variation of protein length distribution across species is even smaller.
CONCLUSIONS: These results open the way for developing a genome annotation quality metric based on protein length distribution to complement conventional quality measures. Overall, our findings show that protein length distribution between living species is more uniform than previously thought. Furthermore, we also provide evidence for a universal selection on protein length, yet its mechanism and fitness effect remain intriguing open questions.
Affiliation(s)
- Yannis Nevers
  - Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
  - Swiss Institute for Bioinformatics, University of Lausanne, Lausanne, Switzerland
- Natasha M Glover
  - Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
  - Swiss Institute for Bioinformatics, University of Lausanne, Lausanne, Switzerland
- Christophe Dessimoz
  - Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
  - Swiss Institute for Bioinformatics, University of Lausanne, Lausanne, Switzerland
  - Department of Computer Science, University College London, London, UK
  - Centre for Life's Origins and Evolution, Department of Genetics, Evolution and Environment, University College London, London, UK
- Odile Lecompte
  - Department of Computer Science, Centre de Recherche en Biomédecine de Strasbourg, ICube, UMR 7357, University of Strasbourg, CNRS, Strasbourg, France
3
Virtual 2D map of cyanobacterial proteomes. PLoS One 2022; 17:e0275148. [PMID: 36190972 PMCID: PMC9529120 DOI: 10.1371/journal.pone.0275148] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/03/2022] [Accepted: 09/12/2022] [Indexed: 11/05/2022]
Abstract
Cyanobacteria are prokaryotic Gram-negative organisms prevalent in nearly all habitats. A detailed proteomics study of Cyanobacteria has not been conducted despite extensive study of their genome sequences. Therefore, we conducted a proteome-wide analysis of Cyanobacteria and found that Calothrix desertica has the largest proteome (680331.825 kDa) and Candidatus Synechococcus spongiarum the smallest (42726.77 kDa) of the cyanobacterial kingdom. A Cyanobacterial proteome encodes, on average, 312.018 amino acids per protein, with a molecular weight of 182173.1324 kDa per proteome. The isoelectric point (pI) of the Cyanobacterial proteome ranges from 2.13 to 13.32. The Cyanobacterial proteome was found to encode a greater number of acidic-pI proteins, and the average pI is 6.437. Proteins with higher pI are likely to contain repetitive amino acids. A virtual 2D map of the Cyanobacterial proteome showed a bimodal distribution of molecular weight and pI. Several proteins within the Cyanobacterial proteome were found to contain the amino acid selenocysteine (Sec), while pyrrolysine was not detected. The study can enable us to generate a high-resolution cell map to monitor proteomic dynamics. Through this computational analysis, we can gain a better understanding of the bias in codon usage by analyzing the amino acid composition of the Cyanobacterial proteome.
4
Tretyachenko V, Vymětal J, Neuwirthová T, Vondrášek J, Fujishima K, Hlouchová K. Modern and prebiotic amino acids support distinct structural profiles in proteins. Open Biol 2022; 12:220040. [PMID: 35728622 PMCID: PMC9213115 DOI: 10.1098/rsob.220040] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 01/04/2023]
Abstract
The earliest proteins had to rely on amino acids available on early Earth before the biosynthetic pathways for more complex amino acids evolved. In extant proteins, a significant fraction of the 'late' amino acids (such as Arg, Lys, His, Cys, Trp and Tyr) belong to essential catalytic and structure-stabilizing residues. How (or if) early proteins could sustain an early biosphere has been a major puzzle. Here, we analysed two combinatorial protein libraries representing proxies of the available sequence space at two different evolutionary stages. The first is composed of the entire alphabet of 20 amino acids while the second one consists of only 10 residues (ASDGLIPTEV) representing a consensus view of plausibly available amino acids through prebiotic chemistry. We show that compact conformations resistant to proteolysis are surprisingly similarly abundant in both libraries. In addition, the early alphabet proteins are inherently more soluble and refoldable, independent of the general Hsp70 chaperone activity. By contrast, chaperones significantly increase the otherwise poor solubility of the modern alphabet proteins suggesting their coevolution with the amino acid repertoire. Our work indicates that while both early and modern amino acids are predisposed to supporting protein structure, they do so with different biophysical properties and via different mechanisms.
Affiliation(s)
- Vyacheslav Tretyachenko
  - Department of Cell Biology, Faculty of Science, Charles University, Prague 12843, Czech Republic
  - Department of Biochemistry, Faculty of Science, Charles University, Prague 12843, Czech Republic
- Jiří Vymětal
  - Institute of Organic Chemistry and Biochemistry, Czech Academy of Sciences, Prague 16610, Czech Republic
- Tereza Neuwirthová
  - Department of Cell Biology, Faculty of Science, Charles University, Prague 12843, Czech Republic
- Jiří Vondrášek
  - Institute of Organic Chemistry and Biochemistry, Czech Academy of Sciences, Prague 16610, Czech Republic
- Kosuke Fujishima
  - Earth-Life Science Institute, Tokyo Institute of Technology, Tokyo 1528550, Japan
  - Graduate School of Media and Governance, Keio University, Fujisawa 2520882, Japan
- Klára Hlouchová
  - Department of Cell Biology, Faculty of Science, Charles University, Prague 12843, Czech Republic
  - Institute of Organic Chemistry and Biochemistry, Czech Academy of Sciences, Prague 16610, Czech Republic
5
Fried SD, Fujishima K, Makarov M, Cherepashuk I, Hlouchova K. Peptides before and during the nucleotide world: an origins story emphasizing cooperation between proteins and nucleic acids. J R Soc Interface 2022; 19:20210641. [PMID: 35135297 PMCID: PMC8833103 DOI: 10.1098/rsif.2021.0641] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Received: 08/08/2021] [Accepted: 01/05/2022] [Indexed: 12/14/2022]
Abstract
Recent developments in Origins of Life research have focused on substantiating the narrative of an abiotic emergence of nucleic acids from organic molecules of low molecular weight, a paradigm that typically sidelines the roles of peptides. Nevertheless, the simple synthesis of amino acids, the facile nature of their activation and condensation, their ability to recognize metals and cofactors and their remarkable capacity to self-assemble make peptides (and their analogues) favourable candidates for one of the earliest functional polymers. In this mini-review, we explore the ramifications of this hypothesis. Diverse lines of research in molecular biology, bioinformatics, geochemistry, biophysics and astrobiology provide clues about the progression and early evolution of proteins, and lend credence to the idea that early peptides served many central prebiotic roles before they were encodable by a polynucleotide template, in a putative 'peptide-polynucleotide stage'. For example, early peptides and mini-proteins could have served as catalysts, compartments and structural hubs. In sum, we shed light on the role of early peptides and small proteins before and during the nucleotide world, in which nascent life fully grasped the potential of primordial proteins, and which has left an imprint on the idiosyncratic properties of extant proteins.
Affiliation(s)
- Stephen D. Fried
  - Department of Chemistry, Johns Hopkins University, Baltimore, MD 21212, USA
  - Department of Biophysics, Johns Hopkins University, Baltimore, MD 21212, USA
- Kosuke Fujishima
  - Earth-Life Science Institute, Tokyo Institute of Technology, Tokyo 1528550, Japan
  - Graduate School of Media and Governance, Keio University, Fujisawa 2520882, Japan
- Mikhail Makarov
  - Department of Cell Biology, Faculty of Science, Charles University, BIOCEV, Prague 12800, Czech Republic
- Ivan Cherepashuk
  - Department of Cell Biology, Faculty of Science, Charles University, BIOCEV, Prague 12800, Czech Republic
- Klara Hlouchova
  - Department of Cell Biology, Faculty of Science, Charles University, BIOCEV, Prague 12800, Czech Republic
  - Institute of Organic Chemistry and Biochemistry, Czech Academy of Sciences, Prague 16610, Czech Republic
6
Wang G, Sun S, Zhang Z. Randomness in Sequence Evolution Increases over Time. PLoS One 2016; 11:e0155935. [PMID: 27224236 PMCID: PMC4880282 DOI: 10.1371/journal.pone.0155935] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 03/31/2016] [Accepted: 05/06/2016] [Indexed: 12/02/2022]
Abstract
The second law of thermodynamics states that entropy, as a measure of randomness in a system, increases over time. Although studies have investigated biological sequence randomness from different aspects, it remains unknown whether sequence randomness changes over time and whether this change is consistent with the second law of thermodynamics. To capture the dynamics of randomness in molecular sequence evolution, here we detect sequence randomness based on a collection of eight statistical tests of randomness and investigate the randomness variation of coding sequences with an application to Escherichia coli. Given that core/essential genes are more ancient than specific/non-essential genes, our results clearly show that core/essential genes are more random than specific/non-essential genes and accordingly indicate that sequence randomness indeed increases over time, consistent with the second law of thermodynamics. We further find that an increase in sequence randomness leads to increasing randomness of GC content and longer sequence length. Taken together, our study presents an important finding, for the first time, that sequence randomness increases over time, which may provide profound insights for unveiling the underlying mechanisms of molecular sequence evolution.
Affiliation(s)
- Guangyu Wang
  - CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics (BIG), Chinese Academy of Sciences, Beijing 100101, China
  - BIG Data Center, Beijing Institute of Genomics (BIG), Chinese Academy of Sciences, Beijing 100101, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Shixiang Sun
  - CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics (BIG), Chinese Academy of Sciences, Beijing 100101, China
  - BIG Data Center, Beijing Institute of Genomics (BIG), Chinese Academy of Sciences, Beijing 100101, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Zhang Zhang
  - CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics (BIG), Chinese Academy of Sciences, Beijing 100101, China
  - BIG Data Center, Beijing Institute of Genomics (BIG), Chinese Academy of Sciences, Beijing 100101, China
7
tRNA acceptor stem and anticodon bases form independent codes related to protein folding. Proc Natl Acad Sci U S A 2015; 112:7489-94. [PMID: 26034281 DOI: 10.1073/pnas.1507569112] [Citation(s) in RCA: 43] [Impact Index Per Article: 4.3] [Indexed: 11/18/2022]
Abstract
Aminoacyl-tRNA synthetases recognize tRNA anticodon and 3' acceptor stem bases. Synthetase Urzymes acylate cognate tRNAs even without anticodon-binding domains, in keeping with the possibility that acceptor stem recognition preceded anticodon recognition. Representing tRNA identity elements with two bits per base, we show that the anticodon encodes the hydrophobicity of each amino acid side-chain as represented by its water-to-cyclohexane distribution coefficient, and this relationship holds true over the entire temperature range of liquid water. The acceptor stem codes preferentially for the surface area or size of each side-chain, as represented by its vapor-to-cyclohexane distribution coefficient. These orthogonal experimental properties are both necessary to account satisfactorily for the exposed surface area of amino acids in folded proteins. Moreover, the acceptor stem codes correctly for β-branched and carboxylic acid side-chains, whereas the anticodon codes for a wider range of such properties, but not for size or β-branching. These and other results suggest that genetic coding of 3D protein structures evolved in distinct stages, based initially on the size of the amino acid and later on its compatibility with globular folding in water.
8
Prokunina-Olsson L, Muchmore B, Tang W, Pfeiffer RM, Park H, Dickensheets H, Hergott D, Porter-Gill P, Mumy A, Kohaar I, Chen S, Brand N, Tarway M, Liu L, Sheikh F, Astemborski J, Bonkovsky HL, Edlin BR, Howell CD, Morgan TR, Thomas DL, Rehermann B, Donnelly RP, O'Brien TR. A variant upstream of IFNL3 (IL28B) creating a new interferon gene IFNL4 is associated with impaired clearance of hepatitis C virus. Nat Genet 2013; 45:164-71. [PMID: 23291588 DOI: 10.1038/ng.2521] [Citation(s) in RCA: 755] [Impact Index Per Article: 62.9] [Received: 09/11/2012] [Accepted: 12/07/2012] [Indexed: 02/06/2023]
Abstract
Chronic infection with hepatitis C virus (HCV) is a common cause of liver cirrhosis and cancer. We performed RNA sequencing in primary human hepatocytes activated with synthetic double-stranded RNA to mimic HCV infection. Upstream of IFNL3 (IL28B) on chromosome 19q13.13, we discovered a new transiently induced region that harbors a dinucleotide variant ss469415590 (TT or ΔG), which is in high linkage disequilibrium with rs12979860, a genetic marker strongly associated with HCV clearance. ss469415590[ΔG] is a frameshift variant that creates a novel gene, designated IFNL4, encoding the interferon-λ4 protein (IFNL4), which is moderately similar to IFNL3. Compared to rs12979860, ss469415590 is more strongly associated with HCV clearance in individuals of African ancestry, although it provides comparable information in Europeans and Asians. Transient overexpression of IFNL4 in a hepatoma cell line induced STAT1 and STAT2 phosphorylation and the expression of interferon-stimulated genes. Our findings provide new insights into the genetic regulation of HCV clearance and its clinical management.
Affiliation(s)
- Ludmila Prokunina-Olsson
  - Laboratory of Translational Genomics, Division of Cancer Epidemiology and Genetics, National Cancer Institute, US National Institutes of Health, Bethesda, Maryland, USA
9
Tiessen A, Pérez-Rodríguez P, Delaye-Arredondo LJ. Mathematical modeling and comparison of protein size distribution in different plant, animal, fungal and microbial species reveals a negative correlation between protein size and protein number, thus providing insight into the evolution of proteomes. BMC Res Notes 2012; 5:85. [PMID: 22296664 PMCID: PMC3296660 DOI: 10.1186/1756-0500-5-85] [Citation(s) in RCA: 81] [Impact Index Per Article: 6.2] [Received: 05/20/2011] [Accepted: 02/01/2012] [Indexed: 11/29/2022]
Abstract
BACKGROUND: The sizes of proteins are relevant to their biochemical structure and to their biological function. The statistical distribution of protein lengths across a diverse set of taxa can provide hints about the evolution of proteomes.
RESULTS: Using the full genomic sequences of over 1,302 prokaryotic and 140 eukaryotic species, two datasets containing 1.2 and 6.1 million proteins were generated and analyzed statistically. The lengthwise distribution of proteins can be roughly described with a gamma type or log-normal model, depending on the species. However, the shape parameter of the gamma model does not have a fixed value of 2, as previously suggested, but varies between 1.5 and 3 in different species. A gamma model with unrestricted shape parameter best described the distributions in ~48% of the species, whereas the log-normal distribution better described the observed protein sizes in 42% of the species. The gamma restricted function and the sum of exponentials distribution gave a better fit in only ~5% of the species. Eukaryotic proteins have an average size of 472 aa, whereas bacterial (320 aa) and archaeal (283 aa) proteins are significantly smaller (33-40% on average). Average protein sizes in different phylogenetic groups were: Alveolata (628 aa), Amoebozoa (533 aa), Fornicata (543 aa), Placozoa (453 aa), Eumetazoa (486 aa), Fungi (487 aa), Stramenopila (486 aa), Viridiplantae (392 aa). Amino acid composition is biased according to protein size. Protein length correlated negatively with %C, %M, %K, %F, %R, %W, %Y and positively with %D, %E, %Q, %S and %T. Prokaryotic proteins had a different protein size bias for %E, %G, %K and %M as compared to eukaryotes.
CONCLUSIONS: Mathematical modeling of empirical protein length distributions can be used to assess the quality of small ORF annotation in genomic releases (detection of too many false positive small ORFs). There is a negative correlation between average protein size and total number of proteins among eukaryotes but not in prokaryotes. The %GC content is positively correlated to total protein number and protein size in prokaryotes but not in eukaryotes. Small proteins have a different amino acid bias than larger proteins. Compared to prokaryotic species, the evolution of eukaryotic proteomes was characterized by increased protein number (massive gene duplication) and substantial changes of protein size (domain addition/subtraction).
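The comparison between a gamma model and a log-normal model for protein lengths can be sketched with the standard library alone. The code below is an illustration, not the authors' fitting procedure, which the abstract does not specify: it assumes a method-of-moments estimate for the gamma parameters and the usual maximum-likelihood estimate for the log-normal parameters, and it compares the two by log-likelihood. All function names and the example lengths are hypothetical.

```python
import math
from statistics import fmean, pvariance


def fit_gamma_moments(lengths):
    """Method-of-moments gamma fit: shape = mean^2 / var, scale = var / mean."""
    mean = fmean(lengths)
    var = pvariance(lengths)
    return mean * mean / var, var / mean


def gamma_loglik(lengths, shape, scale):
    """Log-likelihood of the data under a gamma(shape, scale) density."""
    return sum(
        (shape - 1.0) * math.log(x) - x / scale
        - math.lgamma(shape) - shape * math.log(scale)
        for x in lengths
    )


def fit_lognormal(lengths):
    """Maximum-likelihood log-normal fit: mean and std of the log lengths."""
    logs = [math.log(x) for x in lengths]
    return fmean(logs), math.sqrt(pvariance(logs))


def lognormal_loglik(lengths, mu, sigma):
    """Log-likelihood of the data under a log-normal(mu, sigma) density."""
    return sum(
        -math.log(x * sigma * math.sqrt(2.0 * math.pi))
        - (math.log(x) - mu) ** 2 / (2.0 * sigma ** 2)
        for x in lengths
    )


if __name__ == "__main__":
    # Hypothetical protein lengths in amino acids, for illustration only.
    lengths = [98, 152, 210, 264, 301, 355, 420, 512, 640, 890]
    shape, scale = fit_gamma_moments(lengths)
    mu, sigma = fit_lognormal(lengths)
    print("gamma shape:", round(shape, 2))
    print("gamma log-likelihood:", round(gamma_loglik(lengths, shape, scale), 2))
    print("log-normal log-likelihood:", round(lognormal_loglik(lengths, mu, sigma), 2))
```

Because both models here have two free parameters, the higher log-likelihood can be read directly as the better fit for a given species. A restricted gamma model with the shape fixed at 2 has one parameter fewer, so comparing it fairly would call for a penalized criterion such as AIC.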
Affiliation(s)
- Axel Tiessen
  - Departamento de Ingeniería Genética, CINVESTAV Irapuato, Irapuato, CP 36821, Mexico
10
Bohr J, Bohr H, Brunak S. Protein folding and wring resonances. Biophys Chem 1997; 63:97-105. [PMID: 17029822 DOI: 10.1016/s0301-4622(96)02249-1] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.2] [Received: 04/11/1996] [Revised: 08/19/1996] [Accepted: 10/17/1996] [Indexed: 10/18/2022]
Abstract
The polypeptide chain of a protein is shown to obey topological constraints which enable long range excitations in the form of wring modes of the protein backbone. Wring modes of proteins of specific lengths can therefore resonate with molecular modes present in the cell. It is suggested that protein folding takes place when the amplitude of a wring excitation becomes so large that it is energetically favorable to bend the protein backbone. The condition under which such structural transformations can occur is found, and it is shown that both cold and hot denaturation (the unfolding of proteins) are natural consequences of the suggested wring mode model. Native (folded) proteins are found to possess an intrinsic standing wring mode.
Affiliation(s)
- J Bohr
  - Physics Department, Building 307, The Technical University of Denmark, DK-2800 Lyngby, Denmark
11
Abstract
Reverse transcription has been an important mediator of genomic change. This influence dates back more than three billion years, when the RNA genome was converted into the DNA genome. While the current cellular role(s) of reverse transcriptase are not yet completely understood, it has become clear over the last few years that this enzyme is still responsible for generating significant genomic change and that its activities are one of the driving forces of evolution. Reverse transcriptase generates, for example, extra gene copies (retrogenes), using as a template mature messenger RNAs. Such retrogenes do not always end up as nonfunctional pseudogenes but form, after reinsertion into the genome, new unions with resident promoter elements that may alter the gene's temporal and/or spatial expression levels. More frequently, reverse transcriptase produces copies of nonmessenger RNAs, such as small nuclear or cytoplasmic RNAs. Extremely high copy numbers can be generated by this process. The resulting reinserted DNA copies are therefore referred to as short interspersed repetitive elements (SINEs). SINEs have long been considered selfish DNA, littering the genome via exponential propagation but not contributing to the host's fitness. Many SINEs, however, can give rise to novel genes encoding small RNAs, and are the migrant carriers of numerous control elements and sequence motifs that can equip resident genes with novel regulatory elements [Brosius J. and Gould S.J., Proc Natl Acad Sci USA 89, 10706-10710, 1992]. Retrosequences, such as SINEs and portions of retroelements (e.g., long terminal repeats, LTRs), are capable of donating sequence motifs for nucleosome positioning, DNA methylation, transcriptional enhancers and silencers, poly(A) addition sequences, determinants of RNA stability or transport, splice sites, and even amino acid codons for incorporation into open reading frames as novel protein domains. 
Retroposition can therefore be considered as a major pacemaker for evolution (including speciation). Retroposons, with their unique properties and actions, form the molecular basis of important evolutionary concepts, such as exaptation [Gould S.J. and Vrba E., Paleobiology 8, 4-15, 1982] and punctuated equilibrium [Eldredge N. and Gould S.J. in Schopf T.J.M. (ed). Models in Paleobiology. Freeman, Cooper, San Francisco, 1972, pp. 82-115].
Affiliation(s)
- J Brosius
  - Institute for Experimental Pathology, ZMBE University of Münster, Germany
12
Trifonov EN. Segmented structure of protein sequences and early evolution of genome by combinatorial fusion of DNA elements. J Mol Evol 1995; 40:337-42. [PMID: 7723061 DOI: 10.1007/bf00163239] [Citation(s) in RCA: 23] [Impact Index Per Article: 0.8] [Indexed: 01/26/2023]
Abstract
A theory of an early stage of genome evolution by combinatorial fusion of circular DNA units is suggested, based on protein sequence "fossil" evidence. The evidence includes preference of protein sequence lengths for certain sizes--multiples of 123 aa for eukaryotes and multiples of 152 aa for prokaryotes. At the DNA level these sizes correspond to 350-450 base pairs--the known optimal range for DNA ring closure. The methionine residues repeatedly appear along the sequences with the same period of about 120 aa (in eukaryotes), presumably marking the sites of insertion of the early genes--rings of protein-coding DNA. The absence of torsional constraint in this DNA results in a very sharp estimate of the helical periodicity of the early DNA, indistinguishable from the experimental mean value for extant DNA. According to the combinatorial fusion theory, based on the above evidence, in the pregenomic, prerecombinational stage the genes and the noncoding sequences existed in the form of autonomously replicating DNA rings of close to standard size, randomly segregating between dividing cells, as modern plasmids do. In the recombinational early genomic stage the rings started to fuse, forming larger DNA molecules consisting of several unit genes connected in various combinations and forming long protein-coding sequences (combinatorial fusion). This process, which involved, perhaps, noncoding sequences as well, eventually resulted in the formation of large genomes. The dispersed circular DNA--or, rather, evolutionarily advanced derivatives thereof--may still exist in the form of various mobile DNA elements.
Affiliation(s)
- E N Trifonov
  - Department of Structural Biology, Weizmann Institute of Science, Rehovot, Israel
13
Muller AW. Were the first organisms heat engines? A new model for biogenesis and the early evolution of biological energy conversion. Prog Biophys Mol Biol 1995; 63:193-231. [PMID: 7542789 DOI: 10.1016/0079-6107(95)00004-7] [Citation(s) in RCA: 38] [Impact Index Per Article: 1.3] [Indexed: 01/25/2023]
Affiliation(s)
- A W Muller
  - E.C. Slater Institute, BioCentrum Amsterdam, Universiteit van Amsterdam, The Netherlands