1
Potera K, Tomala K. Using yeasts for the studies of nonfunctional factors in protein evolution. Yeast 2024; 41:529-536. [PMID: 38895906 DOI: 10.1002/yea.3970] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2024] [Revised: 05/08/2024] [Accepted: 06/06/2024] [Indexed: 06/21/2024] Open
Abstract
The evolution of protein sequence is driven not only by factors directly related to protein function and shape but also by nonfunctional factors. Such factors in protein evolution might be categorized as those connected to energetic costs, synthesis efficiency, and avoidance of misfolding and toxicity. A common approach to studying them is correlational analysis contrasting them with some characteristics of the protein, like amino acid composition, but these features are interdependent. To avoid possible bias, empirical studies are needed, and not enough work has been done to date. In this review, we describe the role of nonfunctional factors in protein evolution and present an experimental approach using yeast as a suitable model organism. The focus of the proposed approach is on the potential negative impact on the fitness of mutations that change protein properties not related to function and the frequency of mutations that change these properties. Experimental results of testing the misfolding avoidance hypothesis as an explanation for why highly expressed proteins evolve slowly are inconsistent with correlational research results. Therefore, more efforts should be made to empirically test the effects of nonfunctional factors in protein evolution and to contrast these results with the results of the correlational analysis approach.
Affiliation(s)
- Katarzyna Potera
  - Faculty of Biology, Institute of Environmental Sciences, Jagiellonian University, Krakow, Poland
  - Doctoral School of Exact and Natural Sciences, Jagiellonian University, Krakow, Poland
- Katarzyna Tomala
  - Faculty of Biology, Institute of Environmental Sciences, Jagiellonian University, Krakow, Poland
2
Tanoz I, Timsit Y. Protein Fold Usages in Ribosomes: Another Glance to the Past. Int J Mol Sci 2024; 25:8806. [PMID: 39201491 PMCID: PMC11354259 DOI: 10.3390/ijms25168806] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2024] [Revised: 08/07/2024] [Accepted: 08/08/2024] [Indexed: 09/02/2024] Open
Abstract
The analysis of protein fold usage, similar to codon usage, offers profound insights into the evolution of biological systems and the origins of modern proteomes. While previous studies have examined fold distribution in modern genomes, our study focuses on the comparative distribution and usage of protein folds in ribosomes across bacteria, archaea, and eukaryotes. We identify the prevalence of certain 'super-ribosome folds,' such as the OB fold in bacteria and the SH3 domain in archaea and eukaryotes. The observed protein fold distribution in the ribosomes announces the future power-law distribution where only a few folds are highly prevalent, and most are rare. Additionally, we highlight the presence of three copies of proto-Rossmann folds in ribosomes across all kingdoms, showing its ancient and fundamental role in ribosomal structure and function. Our study also explores early mechanisms of molecular convergence, where different protein folds bind equivalent ribosomal RNA structures in ribosomes across different kingdoms. This comparative analysis enhances our understanding of ribosomal evolution, particularly the distinct evolutionary paths of the large and small subunits, and underscores the complex interplay between RNA and protein components in the transition from the RNA world to modern cellular life. Transcending the concept of folds also makes it possible to group a large number of ribosomal proteins into five categories of urfolds or metafolds, which could attest to their ancestral character and common origins. This work also demonstrates that the gradual acquisition of extensions by simple but ordered folds constitutes an inexorable evolutionary mechanism. This observation supports the idea that simple but structured ribosomal proteins preceded the development of their disordered extensions.
Affiliation(s)
- Inzhu Tanoz
  - Aix-Marseille Université, Université de Toulon, IRD, CNRS, Mediterranean Institute of Oceanography (MIO), UM 110, 13288 Marseille, France;
- Youri Timsit
  - Aix-Marseille Université, Université de Toulon, IRD, CNRS, Mediterranean Institute of Oceanography (MIO), UM 110, 13288 Marseille, France;
  - Research Federation for the Study of Global Ocean Systems Ecology and Evolution, FR2022/Tara GOSEE, 3 Rue Michel-Ange, 75016 Paris, France
3
Seppi M, Pasqualini J, Facchin S, Savarino EV, Suweis S. Emergent Functional Organization of Gut Microbiomes in Health and Diseases. Biomolecules 2023; 14:5. [PMID: 38275746 PMCID: PMC10813293 DOI: 10.3390/biom14010005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/07/2023] [Revised: 12/13/2023] [Accepted: 12/14/2023] [Indexed: 01/27/2024] Open
Abstract
Continuous and significant progress in sequencing technologies and bioinformatics pipelines has revolutionized our comprehension of microbial communities, especially for human microbiomes. However, most studies have focused on studying the taxonomic composition of the microbiomes and we are still not able to characterize dysbiosis and unveil the underlying ecological consequences. This study explores the emergent organization of functional abundances and correlations of gut microbiomes in health and disease. Leveraging metagenomic sequences, taxonomic and functional tables are constructed, enabling comparative analysis. First, we show that emergent taxonomic and functional patterns are not useful to characterize dysbiosis. Then, through differential abundance analyses applied to functions, we reveal distinct functional compositions in healthy versus unhealthy microbiomes. In addition, we inquire into the functional correlation structure, revealing significant differences between the healthy and unhealthy groups, which may significantly contribute to understanding dysbiosis. Our study demonstrates that scrutinizing the functional organization in the microbiome provides novel insights into the underlying state of the microbiome. The shared data structure underlying the functional and taxonomic compositions allows for a comprehensive macroecological examination. Our findings not only shed light on dysbiosis, but also underscore the importance of studying functional interrelationships for a nuanced understanding of the dynamics of the microbial community. This research proposes a novel approach, bridging the gap between microbial ecology and functional analyses, promising a deeper understanding of the intricate world of the gut microbiota and its implications for human health.
Affiliation(s)
- Marcello Seppi
  - Laboratory of Interdisciplinary Physics (LIPh), Physics and Astronomy Department, University of Padua, Via Marzolo 8, 35131 Padua, Italy; (M.S.); (J.P.)
- Jacopo Pasqualini
  - Laboratory of Interdisciplinary Physics (LIPh), Physics and Astronomy Department, University of Padua, Via Marzolo 8, 35131 Padua, Italy; (M.S.); (J.P.)
- Sonia Facchin
  - Department of Surgery, Oncology and Gastroenterology (DiSCOG), University of Padua, Via Giustiniani 2, 35121 Padua, Italy; (S.F.); (E.V.S.)
- Edoardo Vincenzo Savarino
  - Department of Surgery, Oncology and Gastroenterology (DiSCOG), University of Padua, Via Giustiniani 2, 35121 Padua, Italy; (S.F.); (E.V.S.)
- Samir Suweis
  - Laboratory of Interdisciplinary Physics (LIPh), Physics and Astronomy Department, University of Padua, Via Marzolo 8, 35131 Padua, Italy; (M.S.); (J.P.)
4
Rosandić M, Paar V. The Supersymmetry Genetic Code Table and Quadruplet Symmetries of DNA Molecules Are Unchangeable and Synchronized with Codon-Free Energy Mapping during Evolution. Genes (Basel) 2023; 14:2200. [PMID: 38137022 PMCID: PMC10743133 DOI: 10.3390/genes14122200] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2023] [Revised: 12/03/2023] [Accepted: 12/09/2023] [Indexed: 12/24/2023] Open
Abstract
The Supersymmetry Genetic Code (SSyGC) table is based on five physicochemical symmetries: (1) double mirror symmetry on the principle of the horizontal and vertical mirror symmetry axis between all bases (purines [A, G] and pyrimidines [U, C]) and (2) of bases in the form of codons; (3) direct-complement like codon/anticodon symmetry in the sixteen alternating boxes of the genetic code columns; (4) A + T-rich and C + G-rich alternate codons in the same row between both columns of the genetic code; (5) the same position between divided and undivided codon boxes in relation to horizontal mirror symmetry axis. The SSyGC table has a unique physicochemical purine-pyrimidine symmetry net, which is the core symmetry common to all of the more than thirty different nuclear and mitochondrial genetic codes. This net is present in the SSyGC table of all RNA and DNA living species. None of these symmetries are present in the Standard Genetic Code (SGC) table, which is constructed on the alphabetic horizontal and vertical U-C-A-G order of bases. Here, we show that the free energy value of each codon incorporated as fundamentally mapping the "energy code" in the SSyGC table is compatible with mirror symmetry. On the other hand, in the SGC table, the same free energy values of codons are dispersed and a mirror symmetry between them is not recognizable. At the same time, the mirror symmetry of the SSyGC table and the DNA quadruplets together with our classification of codons/trinucleotides are perfectly embedded in the mirror symmetry energy mapping of codons/trinucleotides and point in favor of maintaining the integrity of the genetic code and DNA genome. We also argue that physicochemical symmetries of the SSyGC table in the manner of the purine-pyrimidine symmetry net, the quadruplet symmetry of DNA molecule, and the free energy of codons have remained unchanged during all of evolution. The unchangeable and universal symmetry properties of the genetic code, DNA molecules, and the energy code are decreasing disorder between codons/trinucleotides and shed a new light on evolution. Diversity in all living species on Earth is broad, but the symmetries of the Supersymmetry Genetic Code as the code of life and the DNA quadruplets related to the "energy code" are unique, unchangeable, and have the power of natural laws.
Affiliation(s)
- Marija Rosandić
  - Department of Internal Medicine, University Hospital Centre Zagreb (Ret.), 10000 Zagreb, Croatia
  - Croatian Academy of Sciences and Arts, 10000 Zagreb, Croatia;
- Vladimir Paar
  - Croatian Academy of Sciences and Arts, 10000 Zagreb, Croatia;
  - Department of Physics, Faculty of Science, University of Zagreb, 10000 Zagreb, Croatia
5
Rosandić M, Paar V. The Evolution of Life Is a Road Paved with the DNA Quadruplet Symmetry and the Supersymmetry Genetic Code. Int J Mol Sci 2023; 24:12029. [PMID: 37569405 PMCID: PMC10418607 DOI: 10.3390/ijms241512029] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/13/2023] [Revised: 07/19/2023] [Accepted: 07/24/2023] [Indexed: 08/13/2023] Open
Abstract
Symmetries have not been completely determined and explained from the discovery of the DNA structure in 1953 and the genetic code in 1961. We show, during 10 years of investigation and research, our discovery of the Supersymmetry Genetic Code table in the form of 2 × 8 codon boxes, quadruplet DNA symmetries, and the classification of trinucleotides/codons, all built with the same physicochemical double mirror symmetry and Watson-Crick pairing. We also show that single-stranded RNA had the complete code of life in the form of the Supersymmetry Genetic Code table simultaneously with instructions of codons' relationship as to how to develop the DNA molecule on the principle of Watson-Crick pairing. We show that the same symmetries between the genetic code and DNA quadruplet are highly conserved during the whole evolution even between phylogenetically distant organisms. In this way, decreasing disorder and entropy enabled the evolution of living beings up to sophisticated species with cognitive features. Our hypothesis that all twenty amino acids are necessary for the origin of life on the Earth, which entirely changes our view on evolution, confirms the evidence of organic natural amino acids from the extra-terrestrial asteroid Ryugu, which is nearly as old as our solar system.
Affiliation(s)
- Marija Rosandić
  - Department of Internal Medicine, University Hospital Centre Zagreb, (Ret.), 10000 Zagreb, Croatia
  - Croatian Academy of Sciences and Arts, 10000 Zagreb, Croatia;
- Vladimir Paar
  - Croatian Academy of Sciences and Arts, 10000 Zagreb, Croatia;
  - Physics Department, Faculty of Science, University of Zagreb, 10000 Zagreb, Croatia
6
Roberts M, Josephs EB. Weaker selection on genes with treatment-specific expression consistent with a limit on plasticity evolution in Arabidopsis thaliana. Genetics 2023; 224:iyad074. [PMID: 37094602 PMCID: PMC10484170 DOI: 10.1093/genetics/iyad074] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2023] [Revised: 03/06/2023] [Accepted: 04/07/2023] [Indexed: 04/26/2023] Open
Abstract
Differential gene expression between environments often underlies phenotypic plasticity. However, environment-specific expression patterns are hypothesized to relax selection on genes, and thus limit plasticity evolution. We collated over 27 terabases of RNA-sequencing data on Arabidopsis thaliana from over 300 peer-reviewed studies and 200 treatment conditions to investigate this hypothesis. Consistent with relaxed selection, genes with more treatment-specific expression have higher levels of nucleotide diversity and divergence at nonsynonymous sites but lack stronger signals of positive selection. This result persisted even after controlling for expression level, gene length, GC content, the tissue specificity of expression, and technical variation between studies. Overall, our investigation supports the existence of a hypothesized trade-off between the environment specificity of a gene's expression and the strength of selection on said gene in A. thaliana. Future studies should leverage multiple genome-scale datasets to tease apart the contributions of many variables in limiting plasticity evolution.
Affiliation(s)
- Miles Roberts
  - Genetics and Genome Sciences Program, Michigan State University, East Lansing, MI 48824, USA
- Emily B Josephs
  - Department of Plant Biology, Michigan State University, East Lansing, MI 48824, USA
  - Ecology, Evolution, and Behavior Program, Michigan State University, East Lansing, MI 48824, USA
7
Lazzardi S, Valle F, Mazzolini A, Scialdone A, Caselle M, Osella M. Emergent statistical laws in single-cell transcriptomic data. Phys Rev E 2023; 107:044403. [PMID: 37198814 DOI: 10.1103/physreve.107.044403] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/21/2022] [Accepted: 03/24/2023] [Indexed: 05/19/2023]
Abstract
Large-scale data on single-cell gene expression have the potential to unravel the specific transcriptional programs of different cell types. The structure of these expression datasets suggests a similarity with several other complex systems that can be analogously described through the statistics of their basic building blocks. Transcriptomes of single cells are collections of messenger RNA abundances transcribed from a common set of genes just as books are different collections of words from a shared vocabulary, genomes of different species are specific compositions of genes belonging to evolutionary families, and ecological niches can be described by their species abundances. Following this analogy, we identify several emergent statistical laws in single-cell transcriptomic data closely similar to regularities found in linguistics, ecology, or genomics. A simple mathematical framework can be used to analyze the relations between different laws and the possible mechanisms behind their ubiquity. Importantly, treatable statistical models can be useful tools in transcriptomics to disentangle the actual biological variability from general statistical effects present in most component systems and from the consequences of the sampling process inherent to the experimental technique.
Affiliation(s)
- Silvia Lazzardi
  - Department of Physics, University of Turin and INFN, via P. Giuria 1, 10125 Turin, Italy
- Filippo Valle
  - Department of Physics, University of Turin and INFN, via P. Giuria 1, 10125 Turin, Italy
- Andrea Mazzolini
  - Laboratoire de Physique de l'École Normale Supérieure (PSL University), CNRS, Sorbonne Université and Université de Paris, 75005 Paris, France
- Antonio Scialdone
  - Institute of Epigenetics and Stem Cells, Helmholtz Zentrum München, Feodor-Lynen-Straße 21, 81377 München, Germany and Institute of Functional Epigenetics and Institute of Computational Biology, Helmholtz Zentrum München, Ingolstädter Landstraße 1, 85764 Neuherberg, Germany
- Michele Caselle
  - Department of Physics, University of Turin and INFN, via P. Giuria 1, 10125 Turin, Italy
- Matteo Osella
  - Department of Physics, University of Turin and INFN, via P. Giuria 1, 10125 Turin, Italy
8
Kozyrev S. Learning by Population Genetics and Matrix Riccati Equation. Entropy (Basel) 2023; 25:348. [PMID: 36832714 PMCID: PMC9955902 DOI: 10.3390/e25020348] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2022] [Revised: 01/26/2023] [Accepted: 02/12/2023] [Indexed: 06/18/2023]
Abstract
A model of learning as a generalization of Eigen's quasispecies model in population genetics is introduced. Eigen's model is considered as a matrix Riccati equation. The error catastrophe in Eigen's model (when the purifying selection becomes ineffective) is discussed as the divergence of the Perron-Frobenius eigenvalue of the Riccati model in the limit of large matrices. A known estimate for the Perron-Frobenius eigenvalue provides an explanation for observed patterns of genomic evolution. We propose to consider the error catastrophe in Eigen's model as an analog of overfitting in learning theory; this gives a criterion for the presence of overfitting in learning.
Affiliation(s)
- Sergei Kozyrev
  - Steklov Mathematical Institute of Russian Academy of Sciences, Gubkina St. 8, 119991 Moscow, Russia
9
Getting higher on rugged landscapes: Inversion mutations open access to fitter adaptive peaks in NK fitness landscapes. PLoS Comput Biol 2022; 18:e1010647. [PMID: 36315581 PMCID: PMC9648849 DOI: 10.1371/journal.pcbi.1010647] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2022] [Revised: 11/10/2022] [Accepted: 10/09/2022] [Indexed: 11/12/2022] Open
Abstract
Molecular evolution is often conceptualised as adaptive walks on rugged fitness landscapes, driven by mutations and constrained by incremental fitness selection. It is well known that epistasis shapes the ruggedness of the landscape’s surface, outlining their topography (with high-fitness peaks separated by valleys of lower fitness genotypes). However, within the strong selection weak mutation (SSWM) limit, once an adaptive walk reaches a local peak, natural selection restricts passage through downstream paths and hampers any possibility of reaching higher fitness values. Here, in addition to the widely used point mutations, we introduce a minimal model of sequence inversions to simulate adaptive walks. We use the well known NK model to instantiate rugged landscapes. We show that adaptive walks can reach higher fitness values through inversion mutations, which, compared to point mutations, allows the evolutionary process to escape local fitness peaks. To elucidate the effects of this chromosomal rearrangement, we use a graph-theoretical representation of accessible mutants and show how new evolutionary paths are uncovered. The present model suggests a simple mechanistic rationale to analyse escapes from local fitness peaks in molecular evolution driven by (intragenic) structural inversions and reveals some consequences of the limits of point mutations for simulations of molecular evolution.
Ninety years ago, Wright translated Darwin’s core idea of survival of the fittest into rugged landscapes—a highly influential metaphor—with peaks representing high values of fitness separated by valleys of lower fitness. In this picture, once a population has reached a local peak, the adaptive dynamics may stall as further adaptation requires crossing a valley. At the DNA level, adaptation is often modelled as a space of genotypes that is explored through point mutations. Therefore, once a local peak is reached, any genotype fitter than that of the peak will be away from the neighbourhood of genotypes accessible through point mutations. Here we present a simple computational model for inversion mutations, one of the most frequent structural variations, and show that adaptive processes in rugged landscapes can escape from local peaks through intragenic inversion mutations. This new escape mechanism reveals the innovative role of inversions at the DNA level and provides a step towards more realistic models of adaptive dynamics, beyond the dominance of point mutations in theories of molecular evolution.
10
Rosandić M, Vlahović I, Pilaš I, Glunčić M, Paar V. An Explanation of Exceptions from Chargaff's Second Parity Rule/Strand Symmetry of DNA Molecules. Genes (Basel) 2022; 13:1929. [PMID: 36360166 PMCID: PMC9689577 DOI: 10.3390/genes13111929] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 09/18/2022] [Revised: 10/12/2022] [Accepted: 10/17/2022] [Indexed: 11/04/2022] Open
Abstract
In this article, we show that mono/oligonucleotide quadruplets, as basic structures of DNA, along with our classification of trinucleotides, disclose an organization of genomes based on purine-pyrimidine symmetry. Moreover, the structure and stability of DNA are influenced by the Watson-Crick pairing and the natural law of DNA creation and conservation, according to which the same mono- or oligonucleotide insertion must be inserted simultaneously into both strands of DNA. Taken together, they lead to quadruplets with central mirror symmetry and bidirectional DNA strand orientation and are incorporated into Chargaff's second parity rule (CSPR). Performing our quadruplet frequency analysis of all human chromosomes and of Neuroblastoma BreakPoint Family (NBPF) genes, which code Olduvai protein domains in the human genome, we show that the coding part of DNA violates CSPR. This may shed new light and give rise to a novel hypothesis on DNA creation and its evolution. In this framework, the logarithmic relationship between oligonucleotide order and minimal DNA sequence length, to establish the validity of CSPR, automatically follows from the quadruplet structure of the genomic sequence. The problem of the violation of CSPR in rare symbionts is discussed.
Affiliation(s)
- Marija Rosandić
  - University Hospital Centre Zagreb (Ret.), 10000 Zagreb, Croatia
  - Croatian Academy of Sciences and Arts, 10000 Zagreb, Croatia
- Ines Vlahović
  - Faculty of Science, Algebra University College, 10000 Zagreb, Croatia
- Ivan Pilaš
  - Forest Research Institute, 10450 Jastrebarsko, Croatia
- Matko Glunčić
  - Physics Department, Faculty of Science, University of Zagreb, 10000 Zagreb, Croatia
- Vladimir Paar
  - Croatian Academy of Sciences and Arts, 10000 Zagreb, Croatia
  - Physics Department, Faculty of Science, University of Zagreb, 10000 Zagreb, Croatia
11
Garza DR, von Meijenfeldt FAB, van Dijk B, Boleij A, Huynen MA, Dutilh BE. Nutrition or nature: using elementary flux modes to disentangle the complex forces shaping prokaryote pan-genomes. BMC Ecol Evol 2022; 22:101. [PMID: 35974327 PMCID: PMC9382767 DOI: 10.1186/s12862-022-02052-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2021] [Accepted: 07/22/2022] [Indexed: 11/15/2022] Open
Abstract
BACKGROUND Microbial pan-genomes are shaped by a complex combination of stochastic and deterministic forces. Even closely related genomes exhibit extensive variation in their gene content. Understanding what drives this variation requires exploring the interactions of gene products with each other and with the organism's external environment. However, to date, conceptual models of pan-genome dynamics often represent genes as independent units and provide limited information about their mechanistic interactions. RESULTS We simulated the stochastic process of gene-loss using the pooled genome-scale metabolic reaction networks of 46 taxonomically diverse bacterial and archaeal families as proxies for their pan-genomes. The frequency by which reactions are retained in functional networks when stochastic gene loss is simulated in diverse environments allowed us to disentangle the metabolic reactions whose presence depends on the metabolite composition of the external environment (constrained by "nutrition") from those that are independent of the environment (constrained by "nature"). By comparing the frequency of reactions from the first group with their observed frequencies in bacterial and archaeal families, we predicted the metabolic niches that shaped the genomic composition of these lineages. Moreover, we found that the lineages that were shaped by a more diverse metabolic niche also occur in more diverse biomes as assessed by global environmental sequencing datasets. CONCLUSION We introduce a computational framework for analyzing and interpreting pan-reactomes that provides novel insights into the ecological and evolutionary drivers of pan-genome dynamics.
Affiliation(s)
- Daniel R Garza
  - Centre for Molecular and Biomolecular Informatics, Radboud Institute for Molecular Life Sciences, Radboud University Medical Centre, Geert Grooteplein 28, 6525 GA, Nijmegen, The Netherlands.
  - Microbial Systems Biology, Laboratory of Molecular Bacteriology, Department of Microbiology, Immunology and Transplantation, Rega Institute, KU Leuven, Louvain, Belgium.
- F A Bastiaan von Meijenfeldt
  - Department of Marine Microbiology and Biogeochemistry (MMB), NIOZ Royal Netherlands Institute for Sea Research, PO Box 59, 1790 AB, Den Burg, The Netherlands
- Bram van Dijk
  - Department of Microbial Population Biology, Max Planck Institute for Evolutionary Biology, 24306, Plön, Germany
- Annemarie Boleij
  - Department of Pathology, Radboud Institute for Molecular Life Sciences (RIMLS), Radboud University Medical Center, Geert Grooteplein-Zuid 10, 6525 GA, Nijmegen, The Netherlands
- Martijn A Huynen
  - Centre for Molecular and Biomolecular Informatics, Radboud Institute for Molecular Life Sciences, Radboud University Medical Centre, Geert Grooteplein 28, 6525 GA, Nijmegen, The Netherlands
- Bas E Dutilh
  - Centre for Molecular and Biomolecular Informatics, Radboud Institute for Molecular Life Sciences, Radboud University Medical Centre, Geert Grooteplein 28, 6525 GA, Nijmegen, The Netherlands
  - Theoretical Biology and Bioinformatics, Utrecht University, Padualaan 8, 3584 CH, Utrecht, The Netherlands
  - Institute of Biodiversity, Faculty of Biology, Cluster of Excellence Balance of the Microverse, Friedrich Schiller University, Jena, Germany
12
Almirantis Y, Provata A, Li W. Noether's Theorem as a Metaphor for Chargaff's 2nd Parity Rule in Genomics. J Mol Evol 2022; 90:231-238. [PMID: 35704064 DOI: 10.1007/s00239-022-10062-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2022] [Accepted: 05/18/2022] [Indexed: 10/18/2022]
Abstract
In the present note, the genomic compositional rule largely known as 'Chargaff's 2nd parity rule' (asserting equimolarity between Adenine-Thymine and Guanine-Cytosine in any of the two DNA strands) is regarded in association with Noether's theorem linking symmetries with conservation laws in physics. In the case of the genome, the strict physical and mathematical prerequisites of Noether's theorem do not hold. However, we conclude that a metaphor can be established with Noether's theorem, as inter-strand symmetry concerning DNA functionality engenders specific features in genome composition. Inversely, when inter-strand symmetry does not hold, the corresponding quantitative relations fail to appear. This association is also considered from the point of view of the existence of emergent laws and properties in evolutionary genomics.
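As an illustration of the rule named in this abstract, the following minimal Python sketch counts bases on a single strand and reports how far the A/T and G/C counts are from equality. It is not taken from the cited paper: the function names `strand_composition` and `parity_deviation`, and the choice of relative difference as the deviation measure, are assumptions made here for illustration.

```python
from collections import Counter


def strand_composition(seq: str) -> dict:
    """Count A, C, G and T on one strand (case-insensitive; other symbols ignored)."""
    counts = Counter(seq.upper())
    return {base: counts.get(base, 0) for base in "ACGT"}


def parity_deviation(seq: str) -> tuple:
    """Return (A/T deviation, G/C deviation) for a single strand.

    Each deviation is |x - y| / (x + y), an illustrative measure chosen here:
    0.0 means the two bases are equally frequent, as Chargaff's second parity
    rule asserts for sufficiently long sequences; 1.0 means only one of the
    pair is present. A pair that is absent entirely is reported as 0.0.
    """
    c = strand_composition(seq)

    def relative_difference(x: int, y: int) -> float:
        total = x + y
        return abs(x - y) / total if total else 0.0

    return (relative_difference(c["A"], c["T"]),
            relative_difference(c["G"], c["C"]))


if __name__ == "__main__":
    # Toy input only; the rule is a statistical statement about long genomic sequences.
    print(parity_deviation("ATGCATGC"))
```

On short toy strings the deviations are not meaningful; the rule concerns long sequences, so a real check would read a whole chromosome or genome from a sequence file.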
Affiliation(s)
- Yannis Almirantis
  - Theoretical Biology and Computational Genomics Laboratory, Institute of Bioscience and Applications, National Center for Scientific Research "Demokritos", 15341, Athens, Greece.
- Astero Provata
  - Statistical Mechanics and Dynamical Systems Laboratory, Institute of Nanoscience and Nanotechnology, National Center for Scientific Research, "Demokritos", 15341, Athens, Greece
- Wentian Li
  - The Robert S. Boas Center for Genomics and Human Genetics, The Feinstein Institutes for Medical Research, Northwell Health, Manhasset, NY, USA
13
Hsu TK, Asmussen J, Koire A, Choi BK, Gadhikar MA, Huh E, Lin CH, Konecki DM, Kim YW, Pickering CR, Kimmel M, Donehower LA, Frederick MJ, Myers JN, Katsonis P, Lichtarge O. A general calculus of fitness landscapes finds genes under selection in cancers. Genome Res 2022; 32:916-929. [PMID: 35301263 PMCID: PMC9104707 DOI: 10.1101/gr.275811.121] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 05/25/2021] [Accepted: 03/14/2022] [Indexed: 11/24/2022]
Abstract
Genetic variants drive the evolution of traits and diseases. We previously modeled these variants as small displacements in fitness landscapes and estimated their functional impact by differentiating the evolutionary relationship between genotype and phenotype. Conversely, here we integrate these derivatives to identify genes steering specific traits. Over cancer cohorts, integration identified 460 likely tumor-driving genes. Many have literature and experimental support but had eluded prior genomic searches for positive selection in tumors. Beyond providing cancer insights, these results introduce a general calculus of evolution to quantify the genotype-phenotype relationship and discover genes associated with complex traits and diseases.
Affiliation(s)
- Teng-Kuei Hsu
- Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, Texas 77030, USA
- Jennifer Asmussen
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
- Amanda Koire
- Program in Quantitative and Computational Biosciences, Baylor College of Medicine, Houston, Texas 77030, USA
- Byung-Kwon Choi
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
- Mayur A Gadhikar
- Department of Head and Neck Surgery, The University of Texas M.D. Anderson Cancer Center, Houston, Texas 77030, USA
- Eunna Huh
- Department of Pharmacology and Chemical Biology, Baylor College of Medicine, Houston, Texas 77030, USA
- Chih-Hsu Lin
- Program in Quantitative and Computational Biosciences, Baylor College of Medicine, Houston, Texas 77030, USA
- Daniel M Konecki
- Program in Quantitative and Computational Biosciences, Baylor College of Medicine, Houston, Texas 77030, USA
- Young Won Kim
- Program in Integrative Molecular and Biomedical Sciences, Baylor College of Medicine, Houston, Texas 77030, USA
- Curtis R Pickering
- Department of Head and Neck Surgery, The University of Texas M.D. Anderson Cancer Center, Houston, Texas 77030, USA
- Marek Kimmel
- Departments of Statistics and Bioengineering, Rice University, Houston, Texas 77005, USA
- Department of Systems Engineering and Biology, Silesian University of Technology, 44-100 Gliwice, Poland
- Lawrence A Donehower
- Department of Molecular Virology and Microbiology, Baylor College of Medicine, Houston, Texas 77030, USA
- Mitchell J Frederick
- Department of Otolaryngology-Head and Neck Surgery, Baylor College of Medicine, Houston, Texas 77030, USA
- Jeffrey N Myers
- Department of Head and Neck Surgery, The University of Texas M.D. Anderson Cancer Center, Houston, Texas 77030, USA
- Panagiotis Katsonis
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
- Olivier Lichtarge
- Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, Texas 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
- Program in Quantitative and Computational Biosciences, Baylor College of Medicine, Houston, Texas 77030, USA
- Department of Pharmacology and Chemical Biology, Baylor College of Medicine, Houston, Texas 77030, USA
- Program in Integrative Molecular and Biomedical Sciences, Baylor College of Medicine, Houston, Texas 77030, USA
- Computational and Integrative Biomedical Research Center, Baylor College of Medicine, Houston, Texas 77030, USA
|
14
|
Urchueguía A, Galbusera L, Chauvin D, Bellement G, Julou T, van Nimwegen E. Genome-wide gene expression noise in Escherichia coli is condition-dependent and determined by propagation of noise through the regulatory network. PLoS Biol 2021; 19:e3001491. [PMID: 34919538 PMCID: PMC8719677 DOI: 10.1371/journal.pbio.3001491] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 04/26/2021] [Revised: 12/31/2021] [Accepted: 11/23/2021] [Indexed: 11/21/2022] Open
Abstract
Although it is well appreciated that gene expression is inherently noisy and that transcriptional noise is encoded in a promoter’s sequence, little is known about the extent to which noise levels of individual promoters vary across growth conditions. Using flow cytometry, we here quantify transcriptional noise in Escherichia coli genome-wide across 8 growth conditions and find that noise levels systematically decrease with growth rate, with a condition-dependent lower bound on noise. Whereas constitutive promoters consistently exhibit low noise in all conditions, regulated promoters are both more noisy on average and more variable in noise across conditions. Moreover, individual promoters show highly distinct variation in noise across conditions. We show that a simple model of noise propagation from regulators to their targets can explain a significant fraction of the variation in relative noise levels and identifies TFs that most contribute to both condition-specific and condition-independent noise propagation. In addition, analysis of the genome-wide correlation structure of various gene properties shows that gene regulation, expression noise, and noise plasticity are all positively correlated genome-wide and vary independently of variations in absolute expression, codon bias, and evolutionary rate. Together, our results show that while absolute expression noise tends to decrease with growth rate, relative noise levels of genes are highly condition-dependent and determined by the propagation of noise through the gene regulatory network. Genome-wide flow cytometry measurements reveal that gene expression noise in bacteria is highly condition-dependent; while absolute noise levels of all genes decrease with growth-rate, theoretical modeling shows that the relative noise levels of different genes are determined by the propagation of noise through the gene regulatory network (GRN). Thus GRN structure controls not only mean expression but also noise levels.
Affiliation(s)
- Arantxa Urchueguía
- Biozentrum, University of Basel, Basel, Switzerland
- Swiss Institute of Bioinformatics, Basel, Switzerland
- Luca Galbusera
- Biozentrum, University of Basel, Basel, Switzerland
- Swiss Institute of Bioinformatics, Basel, Switzerland
- Dany Chauvin
- Biozentrum, University of Basel, Basel, Switzerland
- Swiss Institute of Bioinformatics, Basel, Switzerland
- Gwendoline Bellement
- Biozentrum, University of Basel, Basel, Switzerland
- Swiss Institute of Bioinformatics, Basel, Switzerland
- Thomas Julou
- Biozentrum, University of Basel, Basel, Switzerland
- Swiss Institute of Bioinformatics, Basel, Switzerland
- Erik van Nimwegen
- Biozentrum, University of Basel, Basel, Switzerland
- Swiss Institute of Bioinformatics, Basel, Switzerland
|
15
|
Desvignes T, Sydes J, Montfort J, Bobe J, Postlethwait JH. Evolution after Whole-Genome Duplication: Teleost MicroRNAs. Mol Biol Evol 2021; 38:3308-3331. [PMID: 33871629 PMCID: PMC8321539 DOI: 10.1093/molbev/msab105] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Indexed: 12/12/2022] Open
Abstract
MicroRNAs (miRNAs) are important gene expression regulators implicated in many biological processes, but we lack a global understanding of how miRNA genes evolve and contribute to developmental canalization and phenotypic diversification. Whole-genome duplication events likely provide a substrate for species divergence and phenotypic change by increasing gene numbers and relaxing evolutionary pressures. To understand the consequences of genome duplication on miRNA evolution, we studied miRNA genes following the teleost genome duplication (TGD). Analysis of miRNA genes in four teleosts and in spotted gar, whose lineage diverged before the TGD, revealed that miRNA genes were retained in ohnologous pairs more frequently than protein-coding genes, and that gene losses occurred rapidly after the TGD. Genomic context influenced retention rates, with clustered miRNA genes retained more often than nonclustered miRNA genes and intergenic miRNA genes retained more frequently than intragenic miRNA genes, which often shared the evolutionary fate of their protein-coding host. Expression analyses revealed both conserved and divergent expression patterns across species in line with miRNA functions in phenotypic canalization and diversification, respectively. Finally, major strands of miRNA genes experienced stronger purifying selection, especially in their seeds and 3'-complementary regions, compared with minor strands, which nonetheless also displayed evolutionary features compatible with constrained function. This study provides the first genome-wide, multispecies analysis of the mechanisms influencing metazoan miRNA evolution after whole-genome duplication.
Affiliation(s)
- Thomas Desvignes
- Institute of Neuroscience, University of Oregon, Eugene, OR, USA
- Jason Sydes
- Institute of Neuroscience, University of Oregon, Eugene, OR, USA
|
16
|
Droghetti R, Agier N, Fischer G, Gherardi M, Cosentino Lagomarsino M. An evolutionary model identifies the main evolutionary biases for the evolution of genome-replication profiles. eLife 2021; 10:63542. [PMID: 34013887 PMCID: PMC8213407 DOI: 10.7554/elife.63542] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/28/2020] [Accepted: 05/20/2021] [Indexed: 12/13/2022] Open
Abstract
Recent results comparing the temporal program of genome replication of yeast species belonging to the Lachancea clade support the scenario that the evolution of the replication timing program could be mainly driven by correlated acquisition and loss events of active replication origins. Using these results as a benchmark, we develop an evolutionary model defined as a birth-death process for replication origins and use it to identify the evolutionary biases that shape the replication timing profiles. Comparing different evolutionary models with data, we find that replication origin birth and death events are mainly driven by two evolutionary pressures: the first penalizes events leading to a higher double-stall probability of replication forks, while the second makes less efficient origins more prone to evolutionary loss. This analysis provides an empirically grounded predictive framework for quantitative evolutionary studies of the replication timing program.
Affiliation(s)
- Rossana Droghetti
- Dipartimento di Fisica, Università degli Studi di Milano, via Celoria 16, Milan, Italy
- Nicolas Agier
- Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratory of Computational and Quantitative Biology, Paris, France
- Gilles Fischer
- Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratory of Computational and Quantitative Biology, Paris, France
- Marco Gherardi
- Dipartimento di Fisica, Università degli Studi di Milano, via Celoria 16, Milan, Italy and INFN sezione di Milano, Milan, Italy
- Marco Cosentino Lagomarsino
- Dipartimento di Fisica, Università degli Studi di Milano, via Celoria 16, Milan, Italy and INFN sezione di Milano, Milan, Italy; IFOM Foundation, FIRC Institute for Molecular Oncology, via Adamello 16, Milan, Italy
|
17
|
Thompson CL, Alberti M, Barve S, Battistuzzi FU, Drake JL, Goncalves GC, Govaert L, Partridge C, Yang Y. Back to the future: Reintegrating biology to understand how past eco-evolutionary change can predict future outcomes. Integr Comp Biol 2021; 61:2218-2232. [PMID: 33964141 DOI: 10.1093/icb/icab068] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/19/2022] Open
Abstract
During the last few decades, biologists have made remarkable progress in understanding the fundamental processes that shape life. But despite the unprecedented level of knowledge now available, large gaps still remain in our understanding of the complex interplay of eco-evolutionary mechanisms across scales of life. Rapidly changing environments on Earth provide a pressing need to understand the potential implications of eco-evolutionary dynamics, which can be achieved by improving existing eco-evolutionary models and fostering convergence among the sub-fields of biology. We propose a new, data-driven approach that harnesses our knowledge of the functioning of biological systems to expand current conceptual frameworks and develop corresponding models that can more accurately represent and predict future eco-evolutionary outcomes. We suggest a roadmap toward achieving this goal. This long-term vision will move biology in a direction that can wield these predictive models for scientific applications that benefit humanity and increase the resilience of natural biological systems. We identify short, medium, and long-term key objectives to connect our current state of knowledge to this long-term vision, iteratively progressing across three stages: 1) utilizing knowledge of biological systems to better inform eco-evolutionary models, 2) generating models with more accurate predictions, and 3) applying predictive models to benefit the biosphere. Within each stage, we outline avenues of investigation and scientific applications related to the timescales over which evolution occurs, the parameter space of eco-evolutionary processes, and the dynamic interactions between these mechanisms. The ability to accurately model, monitor, and anticipate eco-evolutionary changes would be transformational to humanity's interaction with the global environment, providing novel tools to benefit human health, protect the natural world, and manage our planet's biosphere.
Affiliation(s)
- Marina Alberti
- Department of Urban Design and Planning, University of Washington
- Sahas Barve
- Smithsonian National Museum of Natural History
- Jeana L Drake
- Department of Earth, Planetary, and Space Sciences, University of California Los Angeles
- Lynn Govaert
- Department of Evolutionary Biology and Environmental Studies, University of Zurich; Department of Aquatic Ecology, Swiss Federal Institute of Aquatic Science and Technology, URPP Global Change and Biodiversity, University of Zurich
- Ya Yang
- Department of Plant and Microbial Biology, University of Minnesota
|
18
|
Koonin EV, Makarova KS, Wolf YI. Evolution of Microbial Genomics: Conceptual Shifts over a Quarter Century. Trends Microbiol 2021; 29:582-592. [PMID: 33541841 DOI: 10.1016/j.tim.2021.01.005] [Citation(s) in RCA: 40] [Impact Index Per Article: 10.0] [Received: 12/03/2020] [Revised: 01/07/2021] [Accepted: 01/08/2021] [Indexed: 12/20/2022]
Abstract
Prokaryote genomics started in earnest in 1995, with the complete sequences of two small bacterial genomes, those of Haemophilus influenzae and Mycoplasma genitalium. During the next quarter century, the prokaryote genome database has been growing exponentially, with no saturation in sight. For most of these 25 years, genome sequencing remained limited to cultivable microbes. Together with next-generation sequencing methods, advances in metagenomics and single-cell genomics have lifted this limitation, providing for an increasingly unbiased characterization of the global prokaryote diversity. Advances in computational genomics followed the progress of genome sequencing, even if occasionally lagging behind. Several major new branches of bacteria and archaea were discovered, including Asgard archaea, the apparent closest relatives of eukaryotes and expansive groups of bacteria and archaea with small genomes thought to be symbionts of other prokaryotes. Comparative analysis of numerous prokaryote genomes spanning a wide range of evolutionary distances changed the conceptual foundations of microbiology, supplanting the notion of species genomes with fixed gene sets with that of dynamic pangenomes and the notion of a single Tree of Life (ToL) with a statistical tree-like trend among individual gene trees. Strides were also made towards a theory and quantitative laws of prokaryote genome evolution.
Affiliation(s)
- Eugene V Koonin
- National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD 20894, USA
- Kira S Makarova
- National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD 20894, USA
- Yuri I Wolf
- National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD 20894, USA
|
19
|
Heng J, Heng HH. Genome chaos: Creating new genomic information essential for cancer macroevolution. Semin Cancer Biol 2020; 81:160-175. [PMID: 33189848 DOI: 10.1016/j.semcancer.2020.11.003] [Citation(s) in RCA: 52] [Impact Index Per Article: 10.4] [Received: 08/21/2020] [Revised: 10/26/2020] [Accepted: 11/04/2020] [Indexed: 12/15/2022]
Abstract
Cancer research has traditionally focused on the characterization of individual molecular mechanisms that can contribute to cancer. Due to the multiple levels of genomic and non-genomic heterogeneity, however, an overwhelming number of molecular mechanisms have been identified, most with low clinical predictability. It is thus necessary to search for new concepts to unify these diverse mechanisms and develop better strategies to understand and treat cancer. In recent years, two-phased cancer evolution (comprised of the genome reorganization-mediated punctuated phase and gene mutation-mediated stepwise phase), initially described by tracing karyotype evolution, was confirmed by the Cancer Genome Project. In particular, genome chaos, the process of rapid and massive genome reorganization, has been commonly detected in various cancers, especially during key phase transitions (including cellular transformation, metastasis, and drug resistance), suggesting the importance of genome-level changes in cancer evolution. In this Perspective, genome chaos is used as a discussion point to illustrate new genome-mediated somatic evolutionary frameworks. By rephrasing cancer as a new system emergent from normal tissue, we present the multiple levels (or scales) of genomic and non-genomic information. Of these levels, evolutionary studies at the chromosomal level are determined to be of ultimate importance, since altered genomes change the karyotype coding and karyotype change is the key event for punctuated cellular macroevolution. Using this lens, we differentiate and analyze developmental processes and cancer evolution, as well as compare the informational relationship between genome chaos and its various subtypes in the context of macroevolution under crisis.
Furthermore, the process of deterministic genome chaos is discussed to interpret apparently random events (including stressors, chromosomal variation subtypes, surviving cells with new karyotypes, and emergent stable cellular populations) as nonrandom patterns, which supports the new cancer evolutionary model that unifies genome and gene contributions during different phases of cancer evolution. Finally, the new perspective of using cancer as a model for organismal evolution is briefly addressed, emphasizing the Genome Theory as a new and necessary conceptual framework for future research and its practical implications, not only in cancer but evolutionary biology as a whole.
Affiliation(s)
- Julie Heng
- Harvard College, 86 Brattle Street Cambridge, MA, 02138, USA
- Henry H Heng
- Center for Molecular Medicine and Genomics, Wayne State University School of Medicine, Detroit, MI, 48201, USA; Department of Pathology, Wayne State University School of Medicine, Detroit, MI, 48201, USA
|
20
|
Weisman CM, Murray AW, Eddy SR. Many, but not all, lineage-specific genes can be explained by homology detection failure. PLoS Biol 2020; 18:e3000862. [PMID: 33137085 PMCID: PMC7660931 DOI: 10.1371/journal.pbio.3000862] [Citation(s) in RCA: 100] [Impact Index Per Article: 20.0] [Received: 02/27/2020] [Revised: 11/12/2020] [Accepted: 09/21/2020] [Indexed: 12/21/2022] Open
Abstract
Genes for which homologs can be detected only in a limited group of evolutionarily related species, called “lineage-specific genes,” are pervasive: Essentially every lineage has them, and they often comprise a sizable fraction of the group’s total genes. Lineage-specific genes are often interpreted as “novel” genes, representing genetic novelty born anew within that lineage. Here, we develop a simple method to test an alternative null hypothesis: that lineage-specific genes do have homologs outside of the lineage that, even while evolving at a constant rate in a novelty-free manner, have merely become undetectable by search algorithms used to infer homology. We show that this null hypothesis is sufficient to explain the lack of detected homologs of a large number of lineage-specific genes in fungi and insects. However, we also find that a minority of lineage-specific genes in both clades are not well explained by this novelty-free model. The method provides a simple way of identifying which lineage-specific genes call for special explanations beyond homology detection failure, highlighting them as interesting candidates for further study. Lineage-specific gene families may arise from evolutionary innovations such as de novo gene origination, or may simply mean that a similarity search program failed to identify more distant homologs. A new computational method for modeling the expected decay of similarity search scores with evolutionary distance allows distinction between the two explanations.
Affiliation(s)
- Caroline M. Weisman
- Department of Molecular & Cellular Biology, Harvard University, Cambridge, Massachusetts, United States of America
- Andrew W. Murray
- Department of Molecular & Cellular Biology, Harvard University, Cambridge, Massachusetts, United States of America
- Sean R. Eddy
- Department of Molecular & Cellular Biology, Harvard University, Cambridge, Massachusetts, United States of America
- Howard Hughes Medical Institute, Harvard University, Cambridge, Massachusetts, United States of America
- John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, Massachusetts, United States of America
|
21
|
Hämälä T, Tiffin P. Biased Gene Conversion Constrains Adaptation in Arabidopsis thaliana. Genetics 2020; 215:831-846. [PMID: 32414868 PMCID: PMC7337087 DOI: 10.1534/genetics.120.303335] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 03/22/2020] [Accepted: 05/14/2020] [Indexed: 02/01/2023] Open
Abstract
Reduction of fitness due to deleterious mutations imposes a limit to adaptive evolution. By characterizing features that influence this genetic load we may better understand constraints on responses to both natural and human-mediated selection. Here, using whole-genome, transcriptome, and methylome data from >600 Arabidopsis thaliana individuals, we set out to identify important features influencing selective constraint. Our analyses reveal that multiple factors underlie the accumulation of maladaptive mutations, including gene expression level, gene network connectivity, and gene-body methylation. We then focus on a feature with major effect, nucleotide composition. The ancestral vs. derived status of segregating alleles suggests that GC-biased gene conversion, a recombination-associated process that increases the frequency of G and C nucleotides regardless of their fitness effects, shapes sequence patterns in A. thaliana. Through estimation of mutational effects, we present evidence that biased gene conversion hinders the purging of deleterious mutations and contributes to a genome-wide signal of decreased efficacy of selection. By comparing these results to two outcrossing relatives, Arabidopsis lyrata and Capsella grandiflora, we find that protein evolution in A. thaliana is as strongly affected by biased gene conversion as in the outcrossing species. Last, we perform simulations to show that natural levels of outcrossing in A. thaliana are sufficient to facilitate biased gene conversion despite increased homozygosity due to selfing. Together, our results show that even predominantly selfing taxa are susceptible to biased gene conversion, suggesting that it may constitute an important constraint to adaptation among plant species.
Affiliation(s)
- Tuomas Hämälä
- Department of Plant and Microbial Biology, University of Minnesota, St. Paul, Minnesota 55108
- Peter Tiffin
- Department of Plant and Microbial Biology, University of Minnesota, St. Paul, Minnesota 55108
|
22
|
Kozyrev SV. Learning Problem for Functional Programming and Model of Biological Evolution. p-Adic Numbers, Ultrametric Analysis and Applications 2020. [DOI: 10.1134/s207004662002003x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/22/2022]
|
23
|
Rosandić M, Vlahović I, Paar V. Novel look at DNA and life-Symmetry as evolutionary forcing. J Theor Biol 2019; 483:109985. [PMID: 31469987 DOI: 10.1016/j.jtbi.2019.08.016] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 02/08/2018] [Revised: 06/21/2018] [Accepted: 08/22/2019] [Indexed: 11/20/2022]
Abstract
Although Chargaff's first parity rule has been explained in terms of Watson-Crick base-pairing between the two DNA strands, Chargaff's second parity rule for each strand of DNA (also named strand symmetry), which cannot be explained by Watson-Crick base-pairing alone, has remained a challenging issue for fifty years. We show that during evolution DNA preserves its identity in the form of quadruplet A+T and C+G rich matrices based on purine-pyrimidine mirror symmetries of trinucleotides. Identical symmetries are present in our classification of trinucleotides and the genetic code table. All eukaryotes and almost all prokaryotes (bacteria and archaea) have quadruplet mirror symmetries in structural form and frequencies following the principle of Chargaff's second parity rule and Natural symmetry law of DNA creation and conservation. Some rare symbionts have mirror symmetry only in their structural form within each DNA strand. Based on our matrix analysis of closely related species, humans and Neanderthals, we find that the circular cycle of inverse proportionality between trinucleotides preserves identical relative frequencies of trinucleotides in each quadruplet and in the whole genome. According to our calculations, a change in frequencies in quadruplet matrices could lead to the creation of new species. Violation of quadruplet symmetries is practically inconsistent with life. DNA symmetries provide a key for understanding the restriction of disorder (entropy) due to mutations in the evolution of DNA.
Affiliation(s)
- Marija Rosandić
- Croatian Academy of Sciences and Arts, 10000 Zagreb, Croatia; University hospital centre Zagreb (ret.), Zagreb, Croatia
- Ines Vlahović
- Department of Physics, Faculty of Science, University of Zagreb, 10000 Zagreb, Croatia; Algebra University College, 10000 Zagreb, Croatia
- Vladimir Paar
- Croatian Academy of Sciences and Arts, 10000 Zagreb, Croatia; Department of Physics, Faculty of Science, University of Zagreb, 10000 Zagreb, Croatia
|
24
|
Shelyakin PV, Bochkareva OO, Karan AA, Gelfand MS. Micro-evolution of three Streptococcus species: selection, antigenic variation, and horizontal gene inflow. BMC Evol Biol 2019; 19:83. [PMID: 30917781 PMCID: PMC6437910 DOI: 10.1186/s12862-019-1403-6] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 12/22/2017] [Accepted: 02/25/2019] [Indexed: 02/07/2023] Open
Abstract
Background: The genus Streptococcus comprises pathogens that strongly influence the health of humans and animals. Genome sequencing of multiple Streptococcus strains demonstrated high variability in gene content and order even in closely related strains of the same species and created a newly emerged object for genomic analysis, the pan-genome. Here we analysed the genome evolution of 25 strains of Streptococcus suis, 50 strains of Streptococcus pyogenes and 28 strains of Streptococcus pneumoniae.
Results: Fractions of the pan-genome (unique, periphery, and universal genes) differ in size, functional composition, the level of nucleotide substitutions, and predisposition to horizontal gene transfer and genomic rearrangements. The density of substitutions in intergenic regions appears to be correlated with selection acting on adjacent genes, implying that more conserved genes tend to have more conserved regulatory regions. The total pan-genome of the genus is open, but only due to strain-specific genes, whereas other pan-genome fractions reach saturation. We have identified the set of genes with phylogenies inconsistent with species and non-conserved location in the chromosome; these genes are rare in at least one species and have likely experienced recent horizontal transfer between species. The strain-specific fraction is enriched with mobile elements and hypothetical proteins, but also contains a number of candidate virulence-related genes, so it may have a strong impact on adaptability and pathogenicity. Mapping the rearrangements to the phylogenetic tree revealed large parallel inversions in all species. A parallel inversion of length 15 kB with breakpoints formed by genes encoding surface antigen proteins PhtD and PhtB in S. pneumoniae leads to replacement of gene fragments that likely indicates the action of an antigen variation mechanism.
Conclusions: Members of the genus Streptococcus have a highly dynamic, open pan-genome that potentially confers on them the ability to adapt to changing environmental conditions, i.e. antibiotic resistance or transmission between different hosts. Hence, integrated analysis of all aspects of genome evolution is important for the identification of potential pathogens and design of drugs and vaccines.
Affiliation(s)
- Pavel V Shelyakin
- Vavilov Institute of General Genetics Russian Academy of Sciences, Gubkina str. 3, Moscow, 119991, Russia; Kharkevich Institute for Information Transmission Problems, 19, Bolshoy Karetny per., Moscow, 127051, Russia; Center of Life Sciences, Skolkovo Institute of Science and Technology, Moscow, Russia
- Olga O Bochkareva
- Kharkevich Institute for Information Transmission Problems, 19, Bolshoy Karetny per., Moscow, 127051, Russia; Center of Life Sciences, Skolkovo Institute of Science and Technology, Moscow, Russia
- Anna A Karan
- Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Moscow, Russia
- Mikhail S Gelfand
- Kharkevich Institute for Information Transmission Problems, 19, Bolshoy Karetny per., Moscow, 127051, Russia; Center of Life Sciences, Skolkovo Institute of Science and Technology, Moscow, Russia; Faculty of Computer Science, Higher School of Economics, Moscow, Russia
|
25
|
Abstract
Genomes appear similar to natural language texts, and protein domains can be treated as analogs of words. To investigate the linguistic properties of genomes further, we calculated the complexity of the “protein languages” in all major branches of life and identified a nearly universal value of information gain associated with the transition from a random domain arrangement to the current protein domain architecture. An exploration of the evolutionary relationship of the protein languages identified the domain combinations that discriminate between the major branches of cellular life. We conclude that there exists a “quasi-universal grammar” of protein domains and that the nearly constant information gain we identified corresponds to the minimal complexity required to maintain a functional cell. From an abstract, informational perspective, protein domains appear analogous to words in natural languages in which the rules of word association are dictated by linguistic rules, or grammar. Such rules exist for protein domains as well, because only a small fraction of all possible domain combinations is viable in evolution. We employ a popular linguistic technique, n-gram analysis, to probe the “proteome grammar”—that is, the rules of association of domains that generate various domain architectures of proteins. Comparison of the complexity measures of “protein languages” in major branches of life shows that the relative entropy difference (information gain) between the observed domain architectures and random domain combinations is highly conserved in evolution and is close to being a universal constant, at ∼1.2 bits. Substantial deviations from this constant are observed in only two major groups of organisms: a subset of Archaea that appears to be cells simplified to the limit, and animals that display extreme complexity. We also identify the n-grams that represent signatures of the major branches of cellular life. 
The results of this analysis bolster the analogy between genomes and natural language and show that a “quasi-universal grammar” underlies the evolution of domain architectures in all divisions of cellular life. The nearly universal value of information gain by the domain architectures could reflect the minimum complexity of signal processing that is required to maintain a functioning cell.
|
26
|
|
27
|
Marek A, Tomala K. The Contribution of Purifying Selection, Linkage, and Mutation Bias to the Negative Correlation between Gene Expression and Polymorphism Density in Yeast Populations. Genome Biol Evol 2018; 10:2986-2996. [PMID: 30321329 PMCID: PMC6250307 DOI: 10.1093/gbe/evy225] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Accepted: 10/10/2018] [Indexed: 11/13/2022] Open
Abstract
The negative correlation between the rate of protein evolution and the expression level of a gene has been recognized as a universal law of evolutionary biology (Koonin 2011). In our study, we apply a population-based approach to systematically investigate the relative importance of unequal mutation rate, linkage, and selection in the origin of the expression-polymorphism anticorrelation. We analyzed the DNA sequence of protein-coding genes of 24 Saccharomyces cerevisiae and 58 Schizosaccharomyces pombe strains. We found that highly expressed genes had a substantially decreased number of polymorphic sites when compared with genes transcribed less extensively. This expression-dependent reduction was especially strong in the nonsynonymous sites, although it was also present in the synonymous sites and untranslated regions, both upstream and downstream of a gene. Most importantly, no such trend was found in introns. We used these observations, as well as analyses of site frequency spectra and data from mutation accumulation experiments, to show that purifying selection acting on nonsynonymous sites was the main, but not exclusive, factor impeding molecular evolution within the coding sequences of highly expressed genes. Linkage could not fully explain the observed pattern of polymorphism within the untranslated regions and synonymous sites, although the contribution of selection acting directly on synonymous variants was extremely small. Finally, we found that the impact of mutational bias was negligible.
Affiliation(s)
- Agnieszka Marek
- Institute of Environmental Sciences, Jagiellonian University, Krakow, Poland
| | - Katarzyna Tomala
- Institute of Environmental Sciences, Jagiellonian University, Krakow, Poland
| |
|
28
|
Mazzolini A, Grilli J, De Lazzari E, Osella M, Lagomarsino MC, Gherardi M. Zipf and Heaps laws from dependency structures in component systems. Phys Rev E 2018; 98:012315. [PMID: 30110773 DOI: 10.1103/physreve.98.012315] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 03/02/2018] [Indexed: 06/08/2023]
Abstract
Complex natural and technological systems can be considered, on a coarse-grained level, as assemblies of elementary components: for example, genomes as sets of genes or texts as sets of words. On one hand, the joint occurrence of components emerges from architectural and specific constraints in such systems. On the other hand, general regularities may unify different systems, such as the broadly studied Zipf and Heaps laws, respectively concerning the distribution of component frequencies and their number as a function of system size. Dependency structures (i.e., directed networks encoding the dependency relations between the components in a system) were proposed recently as possible organizing principles underlying some of the regularities observed. However, the consequences of this assumption were explored only in binary component systems, where solely the presence or absence of components is considered, and multiple copies of the same component are not allowed. Here we consider a simple model that generates, from a given ensemble of dependency structures, a statistical ensemble of sets of components, allowing for components to appear with any multiplicity. Our model is a minimal extension that is memoryless and therefore accessible to analytical calculations. A mean-field analytical approach (analogous to the "Zipfian ensemble" in the linguistics literature) captures the relevant laws describing the component statistics, as we show by comparison with numerical computations. In particular, we recover a power-law Zipf rank plot, with a set of core components, and a Heaps law displaying three consecutive regimes (linear, sublinear, and saturating) that we characterize quantitatively.
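The two empirical laws named in this abstract can be illustrated with a short sketch. This is not the authors' dependency-structure model; it only shows how a Zipf rank-frequency list and a Heaps curve are computed from one sequence of components. The function names and the toy data are hypothetical.

```python
from collections import Counter


def zipf_rank_frequency(components):
    """Return component counts sorted from most to least frequent (rank 1 first)."""
    return sorted(Counter(components).values(), reverse=True)


def heaps_curve(components):
    """Return the number of distinct components seen after each position."""
    seen = set()
    curve = []
    for component in components:
        seen.add(component)
        curve.append(len(seen))
    return curve


# Toy "system": a genome written as a list of gene-family labels (made-up data).
toy = ["A", "B", "A", "C", "A", "B", "D", "A"]
print(zipf_rank_frequency(toy))  # [4, 2, 1, 1]
print(heaps_curve(toy))          # [1, 2, 2, 3, 3, 3, 4, 4]
```

Plotting the first list against rank on log-log axes gives the Zipf plot; plotting the second against position gives the Heaps curve, whose regimes the abstract describes as linear, sublinear, and saturating.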
Affiliation(s)
- Andrea Mazzolini
- Dipartimento di Fisica and INFN, Università degli Studi di Torino, Via Pietro Giuria 1, 10125 Torino, Italy
| | - Jacopo Grilli
- Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, New Mexico 87501, USA
| | - Eleonora De Lazzari
- Sorbonne Universités, UPMC Univ Paris 06, UMR 7238, Computational and Quantitative Biology, 4 Place Jussieu, Paris, France
| | - Matteo Osella
- Dipartimento di Fisica and INFN, Università degli Studi di Torino, Via Pietro Giuria 1, 10125 Torino, Italy
| | - Marco Cosentino Lagomarsino
- Sorbonne Universités, UPMC Univ Paris 06, UMR 7238, Computational and Quantitative Biology, 4 Place Jussieu, Paris, France
- CNRS, UMR 7238, Paris, France
- IFOM, Milan, Italy
| | - Marco Gherardi
- Sorbonne Universités, UPMC Univ Paris 06, UMR 7238, Computational and Quantitative Biology, 4 Place Jussieu, Paris, France
- Dipartimento di Fisica, Università degli Studi di Milano, via Celoria 16, 20133 Milano, Italy
| |
|
29
|
Aguirre J, Catalán P, Cuesta JA, Manrubia S. On the networked architecture of genotype spaces and its critical effects on molecular evolution. Open Biol 2018; 8:180069. [PMID: 29973397 PMCID: PMC6070719 DOI: 10.1098/rsob.180069] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Received: 04/18/2018] [Accepted: 06/12/2018] [Indexed: 12/26/2022] Open
Abstract
Evolutionary dynamics is often viewed as a subtle process of change accumulation that causes a divergence among organisms and their genomes. However, this interpretation is an inheritance of a gradualistic view that has been challenged at the macroevolutionary, ecological and molecular level. Actually, when the complex architecture of genotype spaces is taken into account, the evolutionary dynamics of molecular populations becomes intrinsically non-uniform, sharing deep qualitative and quantitative similarities with slowly driven physical systems: nonlinear responses analogous to critical transitions, sudden state changes or hysteresis, among others. Furthermore, the phenotypic plasticity inherent to genotypes transforms classical fitness landscapes into multiscapes where adaptation in response to an environmental change may be very fast. The quantitative nature of adaptive molecular processes is deeply dependent on a network-of-networks multilayered structure of the map from genotype to function that we begin to unveil.
Affiliation(s)
- Jacobo Aguirre
- Grupo Interdisciplinar de Sistemas Complejos (GISC), Madrid, Spain
- Programa de Biología de Sistemas, Centro Nacional de Biotecnología (CSIC), Madrid, Spain
| | - Pablo Catalán
- Grupo Interdisciplinar de Sistemas Complejos (GISC), Madrid, Spain
- Departamento de Matemáticas, Universidad Carlos III de Madrid, Leganés, Madrid, Spain
| | - José A Cuesta
- Grupo Interdisciplinar de Sistemas Complejos (GISC), Madrid, Spain
- Departamento de Matemáticas, Universidad Carlos III de Madrid, Leganés, Madrid, Spain
- Instituto de Biocomputación y Física de Sistemas Complejos (BIFI), Universidad de Zaragoza, Zaragoza, Spain
- UC3M-BS Institute of Financial Big Data (IFiBiD), Universidad Carlos III de Madrid, Getafe, Madrid, Spain
| | - Susanna Manrubia
- Grupo Interdisciplinar de Sistemas Complejos (GISC), Madrid, Spain
- Programa de Biología de Sistemas, Centro Nacional de Biotecnología (CSIC), Madrid, Spain
| |
|
30
|
Škrlj B, Kunej T, Konc J. Insights from Ion Binding Site Network Analysis into Evolution and Functions of Proteins. Mol Inform 2018; 37:e1700144. [PMID: 29418080 DOI: 10.1002/minf.201700144] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 11/27/2017] [Accepted: 02/01/2018] [Indexed: 01/05/2023]
Abstract
Many biological phenomena can be represented as complex networks. Using a protein binding site comparison approach, we generated a network of ion binding sites on the scale of all known protein structures from the Protein Data Bank. We found that this ion binding site similarity network is scale-free, indicating a network in which a few ion binding site scaffolds are the network hubs, and these are connected to hundreds of nodes, whereas the vast majority of nodes have only a few neighbors. Enrichment and statistical analysis of the network components and communities yielded insights into underlying processes from the functional and the structural perspective. The largest components and communities were observed to be closely related to basic metabolic processes and some of the most common structural folds, which, from the evolutionary point of view, indicates that they may be the oldest ones. Further, we derived the first comprehensive map of ion interchangeability, based on binding site similarity. Several highly interchangeable protein-ion binding site pairs emerged (e.g., Ca2+ and Mg2+), as well as structurally distinct ones. The constructed network of ion binding site similarities will aid in understanding the general principles of protein-ion binding site structure, function and evolution. We demonstrate potential uses of the network on proteins involved in cancer development and immune response, where individual ions play prominent roles in disease development.
Affiliation(s)
- Blaž Škrlj
- Department of molecular modeling, National Institute of Chemistry, Hajdrihova 19, Ljubljana, Slovenia; Jožef Stefan International Postgraduate School, Jamova cesta 39, 1000, Ljubljana, Slovenia
| | - Tanja Kunej
- Department of Animal Science, Biotechnical Faculty, University of Ljubljana, Slovenia
| | - Janez Konc
- Department of molecular modeling, National Institute of Chemistry, Hajdrihova 19, Ljubljana, Slovenia
| |
|
31
|
Nikitin D, Penzar D, Garazha A, Sorokin M, Tkachev V, Borisov N, Poltorak A, Prassolov V, Buzdin AA. Profiling of Human Molecular Pathways Affected by Retrotransposons at the Level of Regulation by Transcription Factor Proteins. Front Immunol 2018; 9:30. [PMID: 29441061 PMCID: PMC5797644 DOI: 10.3389/fimmu.2018.00030] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Received: 10/19/2017] [Accepted: 01/04/2018] [Indexed: 12/22/2022] Open
Abstract
Endogenous retroviruses and retrotransposons, also termed retroelements (REs), are mobile genetic elements that were active until recently in human genome evolution. REs regulate gene expression by actively reshaping chromatin structure or by directly providing transcription factor binding sites (TFBSs). We aimed to identify molecular processes most deeply impacted by the REs in human cells at the level of TFBS regulation. By using ENCODE data, we identified ~2 million TFBS overlapping with putatively regulation-competent human REs located in the 5-kb gene promoter neighborhood (~17% of all TFBS in promoter neighborhoods; ~9% of all RE-linked TFBS). Most REs hosting TFBS were highly diverged repeats, and for the evolutionarily young (0–8% diverged) elements we identified only ~7% of all RE-linked TFBS. The gene-specific distributions of RE-linked TFBS generally correlated with the distributions for all TFBS. However, several groups of molecular processes were highly enriched in the RE-linked TFBS regulation. They were strongly connected with immunity and response to pathogens, with the negative regulation of gene transcription, ubiquitination, and protein degradation, extracellular matrix organization, regulation of STAT signaling, fatty acid metabolism, regulation of GTPase activity, protein targeting to Golgi, regulation of cell division and differentiation, and development and functioning of perception organs and the reproductive system. By contrast, the processes most weakly affected by the REs were linked with the conservative aspects of embryo development. We also identified differences in the regulation features by the younger and older fractions of the REs. The regulation by the older fraction of the REs was linked mainly with immunity, cell adhesion, cAMP, IGF1R, Notch, Wnt, and integrin signaling, neuronal development, chondroitin sulfate and heparin metabolism, and endocytosis.
The younger REs regulate other aspects of immunity, cell cycle progression and apoptosis, PDGF, TGF beta, EGFR, and p38 signaling, transcriptional repression, structure of nuclear lumen, catabolism of phospholipids, and heterocyclic molecules, insulin and AMPK signaling, retrograde Golgi-ER transport, and estrogen signaling. The immunity-linked pathways were highly represented in both categories, but their functional roles were different and did not overlap. Our results point to the most quickly evolving molecular pathways in the recent and ancient evolution of human genome.
Affiliation(s)
- Daniil Nikitin
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia; D. Rogachev Federal Research Center of Pediatric Hematology, Oncology and Immunology, Moscow, Russia
| | - Dmitry Penzar
- The Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Moscow, Russia
| | - Andrew Garazha
- D. Rogachev Federal Research Center of Pediatric Hematology, Oncology and Immunology, Moscow, Russia; OmicsWay Corp., Walnut, CA, United States
| | - Maxim Sorokin
- OmicsWay Corp., Walnut, CA, United States; National Research Centre Kurchatov Institute, Centre for Convergence of Nano-, Bio-, Information and Cognitive Sciences and Technologies, Moscow, Russia; Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Moscow, Russia
| | | | - Nicolas Borisov
- OmicsWay Corp., Walnut, CA, United States; National Research Centre Kurchatov Institute, Centre for Convergence of Nano-, Bio-, Information and Cognitive Sciences and Technologies, Moscow, Russia
| | - Alexander Poltorak
- Program in Immunology, Sackler Graduate School, Tufts University, Boston, MA, United States
| | - Vladimir Prassolov
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia
| | - Anton A Buzdin
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia; D. Rogachev Federal Research Center of Pediatric Hematology, Oncology and Immunology, Moscow, Russia; OmicsWay Corp., Walnut, CA, United States; National Research Centre Kurchatov Institute, Centre for Convergence of Nano-, Bio-, Information and Cognitive Sciences and Technologies, Moscow, Russia
| |
|
32
|
De Lazzari E, Grilli J, Maslov S, Cosentino Lagomarsino M. Family-specific scaling laws in bacterial genomes. Nucleic Acids Res 2017; 45:7615-7622. [PMID: 28605556 PMCID: PMC5737699 DOI: 10.1093/nar/gkx510] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 03/07/2017] [Accepted: 05/30/2017] [Indexed: 01/21/2023] Open
Abstract
Among several quantitative invariants found in evolutionary genomics, one of the most striking is the scaling of the overall abundance of proteins, or protein domains, sharing a specific functional annotation across genomes of given size. The sizes of these functional categories change, on average, as power laws in the total number of protein-coding genes. Here, we show that such regularities are not restricted to the overall behavior of high-level functional categories, but also exist systematically at the level of single evolutionary families of protein domains. Specifically, the number of proteins within each family follows family-specific scaling laws with genome size. Functionally similar sets of families tend to follow similar scaling laws, but this is not always the case. To understand this systematically, we provide a comprehensive classification of families based on their scaling properties. Additionally, we develop a quantitative score for the heterogeneity of the scaling of families belonging to a given category or predefined group. Under the common reasonable assumption that selection is driven solely or mainly by biological function, these findings point to fine-tuned and interdependent functional roles of specific protein domains, beyond our current functional annotations. This analysis provides a deeper view on the links between evolutionary expansion of protein families and the functional constraints shaping the gene repertoire of bacterial genomes.
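A family-specific scaling law of the form n_family = a * n_genes^b can be estimated in several ways. The abstract does not state the fitting procedure, so the sketch below assumes ordinary least squares in log-log space, which is one common choice. The function name and the numbers are hypothetical, and genomes where the family is absent (count zero) would have to be excluded or handled separately.

```python
import math


def fit_power_law(genome_sizes, family_sizes):
    """Least-squares fit of log(family) = log(a) + b * log(genome); returns (a, b)."""
    xs = [math.log(x) for x in genome_sizes]
    ys = [math.log(y) for y in family_sizes]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                        # scaling exponent
    a = math.exp(mean_y - b * mean_x)    # prefactor
    return a, b


# Made-up data following family = 2e-5 * genes**2 exactly.
genes = [1000, 2000, 4000, 8000]
family = [20, 80, 320, 1280]
a, b = fit_power_law(genes, family)
print(round(b, 6))  # approximately 2.0, a quadratic scaling in this toy case
```

Comparing the fitted exponents b across families is the kind of quantity a classification by scaling properties would rest on.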
Affiliation(s)
- Eleonora De Lazzari
- Sorbonne Universités, UPMC Université Paris 06, UMR 7238 Computational and Quantitative Biology, Genomic Physics Group, 4 Place Jussieu, Paris 75005, France
| | - Jacopo Grilli
- Department of Ecology and Evolution, University of Chicago, 1101 E 57th st 60637 Chicago, IL, USA
| | - Sergei Maslov
- Department of Bioengineering, Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- To whom correspondence should be addressed. Tel: +33 144277341. Correspondence may also be addressed to Sergei Maslov. Tel: +1 217 265 5705.
| | - Marco Cosentino Lagomarsino
- Sorbonne Universités, UPMC Université Paris 06, UMR 7238 Computational and Quantitative Biology, Genomic Physics Group, 4 Place Jussieu, Paris 75005, France
- CNRS, UMR 7238, Paris, France
- FIRC Institute of Molecular Oncology (IFOM), 20139 Milan, Italy
- To whom correspondence should be addressed. Tel: +33 144277341. Correspondence may also be addressed to Sergei Maslov. Tel: +1 217 265 5705.
| |
|
33
|
Mans BJ, Featherston J, de Castro MH, Pienaar R. Gene Duplication and Protein Evolution in Tick-Host Interactions. Front Cell Infect Microbiol 2017; 7:413. [PMID: 28993800 PMCID: PMC5622192 DOI: 10.3389/fcimb.2017.00413] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Received: 05/27/2017] [Accepted: 09/06/2017] [Indexed: 01/01/2023] Open
Abstract
Ticks modulate their hosts' defense responses by secreting a biopharmacopoeia of hundreds to thousands of proteins and bioactive chemicals into the feeding site (tick-host interface). These molecules and their functions evolved over millions of years as ticks adapted to blood-feeding, tick lineages diverged, and host shifts occurred. The evolution of new proteins with new functions is mainly dependent on gene duplication events. Central questions around this are the rates of gene duplication, when the duplications occurred, and how new functions evolve after gene duplication. The current review investigates these questions in the light of tick biology and considers the possibilities of ancient genome duplication, lineage-specific expansion events, and the role that positive selection played in the evolution of tick protein function. It contrasts current views in tick biology regarding adaptive evolution with the more general view that neutral evolution may account for the majority of biological innovations observed in ticks.
Affiliation(s)
- Ben J Mans
- Epidemiology, Parasites and Vectors, Agricultural Research Council-Onderstepoort Veterinary Research, Onderstepoort, South Africa; Department of Veterinary Tropical Diseases, University of Pretoria, Pretoria, South Africa; Department of Life and Consumer Sciences, University of South Africa, Pretoria, South Africa
| | - Jonathan Featherston
- Agricultural Research Council-The Biotechnology Platform, Onderstepoort, South Africa
| | - Minique H de Castro
- Epidemiology, Parasites and Vectors, Agricultural Research Council-Onderstepoort Veterinary Research, Onderstepoort, South Africa; Department of Life and Consumer Sciences, University of South Africa, Pretoria, South Africa; Agricultural Research Council-The Biotechnology Platform, Onderstepoort, South Africa
| | - Ronel Pienaar
- Epidemiology, Parasites and Vectors, Agricultural Research Council-Onderstepoort Veterinary Research, Onderstepoort, South Africa
| |
|
34
|
Lee J, Konc J, Janežič D, Brooks BR. Global organization of a binding site network gives insight into evolution and structure-function relationships of proteins. Sci Rep 2017; 7:11652. [PMID: 28912495 PMCID: PMC5599562 DOI: 10.1038/s41598-017-10412-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 04/26/2017] [Accepted: 08/07/2017] [Indexed: 01/06/2023] Open
Abstract
The global organization of protein binding sites is analyzed by constructing a weighted network of binding sites based on their structural similarities and detecting communities of structurally similar binding sites based on the minimum description length principle. The analysis reveals that there are two central binding site communities that play the roles of the network hubs of smaller peripheral communities. The sizes of communities follow a power-law distribution, which indicates that the binding sites included in larger communities may be older and have been evolutionary structural scaffolds of more recent ones. Structurally similar binding sites in the same community bind to diverse ligands promiscuously and they are also embedded in diverse domain structures. Understanding the general principles of binding site interplay will pave the way for improved drug design and protein design.
Affiliation(s)
- Juyong Lee
- Department of Chemistry, Kangwon National University, 1 Kangwondaehak-gil, Chuncheon, 24341, Republic of Korea; Laboratory of Computational Biology, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, Maryland, 20892, United States
| | - Janez Konc
- Faculty of Mathematics, Natural Sciences and Information Technologies, University of Primorska, Glagoljaška 8, SI-6000, Koper, Slovenia; National Institute of Chemistry, Hajdrihova 19, SI-1000, Ljubljana, Slovenia
| | - Dušanka Janežič
- Faculty of Mathematics, Natural Sciences and Information Technologies, University of Primorska, Glagoljaška 8, SI-6000, Koper, Slovenia
| | - Bernard R Brooks
- Laboratory of Computational Biology, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, Maryland, 20892, United States
| |
|
35
|
The Role of Evolutionary Selection in the Dynamics of Protein Structure Evolution. Biophys J 2017; 112:1350-1365. [PMID: 28402878 DOI: 10.1016/j.bpj.2017.02.029] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Received: 10/31/2016] [Revised: 02/16/2017] [Accepted: 02/22/2017] [Indexed: 02/05/2023] Open
Abstract
Homology modeling is a powerful tool for predicting a protein's structure. This approach is successful because proteins whose sequences are only 30% identical still adopt the same structure, while structure similarity rapidly deteriorates beyond the 30% threshold. By studying the divergence of protein structure as sequence evolves in real proteins and in evolutionary simulations, we show that this nonlinear sequence-structure relationship emerges as a result of selection for protein folding stability in divergent evolution. Fitness constraints prevent the emergence of unstable protein evolutionary intermediates, thereby enforcing evolutionary paths that preserve protein structure despite broad sequence divergence. However, on longer timescales, evolution is punctuated by rare events where the fitness barriers obstructing structure evolution are overcome and discovery of new structures occurs. We outline biophysical and evolutionary rationale for broad variation in protein family sizes, prevalence of compact structures among ancient proteins, and more rapid structure evolution of proteins with lower packing density.
|
36
|
Takemoto K, Imoto M. Exosomes in mammals with greater habitat variability contain more proteins and RNAs. R Soc Open Sci 2017; 4:170162. [PMID: 28484642 PMCID: PMC5414279 DOI: 10.1098/rsos.170162] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 02/21/2017] [Accepted: 03/22/2017] [Indexed: 05/08/2023]
Abstract
Factors determining habitat variability are poorly understood despite possible explanations based on genome and physiology. This is because previous studies only focused on primary measures such as genome size and body size. In this study, we hypothesize that specific gene functions determine habitat variability in order to explore new factors beyond primary measures. We comprehensively evaluate the relationship between gene functions and the climate envelope while statistically controlling for potentially confounding effects by using data on the habitat range, genome, body size and metabolism of various mammals. Our analyses show that the number of proteins and RNAs contained in exosomes is predominantly associated with the climate envelope. This finding indicates the importance of exosomes to habitat range expansion of mammals and provides a new hypothesis for the relationship between the genome and habitat variability.
|
37
|
Limiting fitness distributions in evolutionary dynamics. J Theor Biol 2017; 416:68-80. [PMID: 28069447 DOI: 10.1016/j.jtbi.2017.01.005] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 02/24/2016] [Revised: 12/12/2016] [Accepted: 01/04/2017] [Indexed: 11/24/2022]
Abstract
Natural selection works on variation in fitness, but how should we measure "variation" to predict the rate of future evolution? Fisher's fundamental theorem of natural selection provides the short-run answer: the instantaneous rate of growth of a population's mean fitness is its variance in fitness. This identity captures an important feature of the evolutionary process, but, because it does not specify how the variance itself evolves in time, it cannot be used to predict evolutionary dynamics in the long run. In this paper we reconsider the problem of computing evolutionary trajectories from limited statistical information. We identify the feature of fitness distributions which controls their late-time evolution: their (suitably defined) tail indices. We show that the location, scale and shape of the fitness distribution can be predicted far into the future from the measurement of this tail index at some initial time. Unlike the "fitness waves" studied in the literature, this pattern encompasses both positive and negative selection and is not restricted to rapidly adapting populations. Our results are well supported by numerical simulations, both from the Wright-Fisher model and from a less structured genetic algorithm.
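The identity quoted in this abstract can be written out explicitly. The notation below is an assumption made for illustration, not taken from the paper: a continuous-time model in which type i has frequency p_i and a constant Malthusian fitness m_i, with selection as the only force.

```latex
\frac{dp_i}{dt} = p_i\,(m_i - \bar{m}), \qquad \bar{m} = \sum_i p_i m_i
\quad\Longrightarrow\quad
\frac{d\bar{m}}{dt} = \sum_i m_i\,p_i\,(m_i - \bar{m})
                    = \sum_i p_i\,(m_i - \bar{m})^2
                    = \operatorname{Var}(m).
```

Under the same assumptions, the variance itself changes at a rate equal to the third central moment of the fitness distribution, and each moment depends on the next in the same way. This is why the identity alone does not close the dynamics, as the abstract notes.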
|
38
|
Smith RJ. Explanations for adaptations, just-so stories, and limitations on evidence in evolutionary biology. Evol Anthropol 2016; 25:276-287. [DOI: 10.1002/evan.21495] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 05/31/2016] [Indexed: 02/01/2023]
|
39
|
Li W, Fontanelli O, Miramontes P. Size distribution of function-based human gene sets and the split-merge model. R Soc Open Sci 2016; 3:160275. [PMID: 27853602 PMCID: PMC5108952 DOI: 10.1098/rsos.160275] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 04/22/2016] [Accepted: 07/01/2016] [Indexed: 06/06/2023]
Abstract
The sizes of paralogues (gene families produced by ancestral duplication) are known to follow a power-law distribution. We examine the size distribution of gene sets or gene families where genes are grouped by a similar function or share a common property. The size distribution of Human Gene Nomenclature Committee (HGNC) gene sets deviates from the power law and can be fitted much better by a beta rank function. We propose a simple mechanism to break a power-law size distribution by a combination of splitting and merging operations. The largest gene sets are split into two to account for the subfunctional categories, and a small proportion of other gene sets are merged into larger sets as new common themes might be realized. These operations are not uncommon for a curator of gene sets. A simulation shows that iteration of these operations changes the size distribution of Ensembl paralogues and could lead to a distribution fitted by a beta rank function. We further illustrate the application of the beta rank function with the example of the distribution of transcription factors and drug target genes among HGNC gene families.
Affiliation(s)
- Wentian Li
- The Robert S. Boas Center for Genomics and Human Genetics, The Feinstein Institute for Medical Research, Northwell Health, Manhasset, NY, USA
| | - Oscar Fontanelli
- Departamento de Matemáticas, Facultad de Ciencias, Universidad Nacional Autónoma de México, Circuito Exterior, Ciudad Universitaria, México 04510 DF, México
| | - Pedro Miramontes
- Departamento de Matemáticas, Facultad de Ciencias, Universidad Nacional Autónoma de México, Circuito Exterior, Ciudad Universitaria, México 04510 DF, México
- Bioinformatics Group and Interdisciplinary Center for Bioinformatics, University of Leipzig, Haertelstrasse 16–18, 04107 Leipzig, Germany
| |
|
40
|
Gates DJ, Strickler SR, Mueller LA, Olson BJSC, Smith SD. Diversification of R2R3-MYB Transcription Factors in the Tomato Family Solanaceae. J Mol Evol 2016; 83:26-37. [PMID: 27364496 DOI: 10.1007/s00239-016-9750-z] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 11/01/2015] [Accepted: 06/15/2016] [Indexed: 11/26/2022]
Abstract
MYB transcription factors play an important role in regulating key plant developmental processes involving defense, cell shape, pigmentation, and root formation. Within this gene family, sequences containing an R2R3 MYB domain are the most abundant type and exhibit a wide diversity of functions. In this study, we identify 559 R2R3 MYB genes using whole genome data from four species of Solanaceae and reconstruct their evolutionary relationships. We compare the Solanaceae R2R3 MYBs to the well-characterized Arabidopsis thaliana sequences to estimate functional diversity and to identify gains and losses of MYB clades in the Solanaceae. We identify numerous R2R3 MYBs that do not appear closely related to Arabidopsis MYBs, and thus may represent clades of genes that have been lost along the Arabidopsis lineage or gained after the divergence of Rosid and Asterid lineages. Despite differences in the distribution of R2R3 MYBs across functional subgroups and species, the overall size of the R2R3 subfamily has changed relatively little over the roughly 50 million-year history of Solanaceae. We added our information regarding R2R3 MYBs in Solanaceae to other data and performed a meta-analysis to trace the evolution of subfamily size across land plants. The results reveal many shifts in the number of R2R3 genes, including a 54% increase along the angiosperm stem lineage. The variation in R2R3 subfamily size across land plants is weakly positively correlated with genome size and strongly positively correlated with total number of genes. The retention of such a large number of R2R3 copies over long evolutionary time periods suggests that they have acquired new functions and been maintained by selection. Discovering the nature of this functional diversity will require integrating forward and reverse genetic approaches on an -omics scale.
Affiliation(s)
- Daniel J Gates
  - School of Biological Sciences, University of Nebraska, Lincoln, 68588, USA
  - Department of Ecology and Evolutionary Biology, University of Colorado, Boulder, 80309, USA
- Lukas A Mueller
  - Boyce Thompson Institute for Plant Research, Ithaca, NY, 14853, USA
- Bradley J S C Olson
  - Division of Molecular, Cellular and Developmental Biology, Kansas State University, Manhattan, KS, 66506, USA
- Stacey D Smith
  - Department of Ecology and Evolutionary Biology, University of Colorado, Boulder, 80309, USA
41
Rosandić M, Vlahović I, Glunčić M, Paar V. Trinucleotide's quadruplet symmetries and natural symmetry law of DNA creation ensuing Chargaff's second parity rule. J Biomol Struct Dyn 2016; 34:1383-94. [PMID: 26524490 DOI: 10.1080/07391102.2015.1080628] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Indexed: 10/22/2022]
Abstract
For almost 50 years the conclusive explanation of Chargaff's second parity rule (CSPR), the equality of frequencies of nucleotides A=T and C=G or the equality of direct and reverse complement trinucleotides in the same DNA strand, has not been determined yet. Here, we relate CSPR to the interstrand mirror symmetry in 20 symbolic quadruplets of trinucleotides (direct, reverse complement, complement, and reverse) mapped to double-stranded genome. The symmetries of Q-box corresponding to quadruplets can be obtained as a consequence of Watson-Crick base pairing and CSPR together. Alternatively, assuming Natural symmetry law for DNA creation that each trinucleotide in one strand of DNA must simultaneously appear also in the opposite strand automatically leads to Q-box direct-reverse mirror symmetry which in conjunction with Watson-Crick base pairing generates CSPR. We demonstrate quadruplet's symmetries in chromosomes of wide range of organisms, from Escherichia coli to Neanderthal and human genomes, introducing novel quadruplet-frequency histograms and 3D-diagrams with combined interstrand frequencies. These "landscapes" are mutually similar in all mammals, including extinct Neanderthals, and somewhat different in most of older species. In human chromosomes 1-12, and X, Y the "landscapes" are almost identical and slightly different in the remaining smaller and telocentric chromosomes. Quadruplet frequencies could provide a new robust tool for characterization and classification of genomes and their evolutionary trajectories.
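The quadruplet construction described in this abstract can be illustrated with a minimal Python sketch. The names `quadruplet`, `count_trinucleotides`, and `ALL_QUADRUPLETS` are hypothetical and are not taken from the paper; the sketch assumes an A/C/G/T alphabet and overlapping trinucleotide windows read along a single strand.

```python
from collections import Counter
from itertools import product

# Watson-Crick complement table (assumption: only A, C, G, T occur).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def quadruplet(tri: str) -> tuple:
    """Return (direct, reverse complement, complement, reverse) of a trinucleotide."""
    direct = tri.upper()
    complement = direct.translate(COMPLEMENT)
    return (direct, complement[::-1], complement, direct[::-1])


def count_trinucleotides(strand: str) -> Counter:
    """Count overlapping trinucleotides along one strand."""
    s = strand.upper()
    return Counter(s[i:i + 3] for i in range(len(s) - 2))


# Group all 64 trinucleotides into sets that share the same quadruplet.
ALL_QUADRUPLETS = {
    frozenset(quadruplet("".join(t))) for t in product("ACGT", repeat=3)
}
```

Under these assumptions, grouping the 64 trinucleotides yields 20 distinct quadruplets, the number quoted in the abstract. Comparing `count_trinucleotides(strand)[tri]` with the count of its reverse complement on the same strand is one simple way to inspect how closely a sequence follows CSPR.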
Affiliation(s)
- Marija Rosandić
  - Croatian Academy of Sciences and Arts, HAZU, Bioinformatics and Biological Physics, Zrinski trg 11, 10000 Zagreb, Croatia
- Ines Vlahović
  - Faculty of Science, University of Zagreb, Bijenicka 32, 10000 Zagreb, Croatia
- Matko Glunčić
  - Faculty of Science, University of Zagreb, Bijenicka 32, 10000 Zagreb, Croatia
- Vladimir Paar
  - Croatian Academy of Sciences and Arts, HAZU, Bioinformatics and Biological Physics, Zrinski trg 11, 10000 Zagreb, Croatia
  - Faculty of Science, University of Zagreb, Bijenicka 32, 10000 Zagreb, Croatia
42
Kemen AC, Agler MT, Kemen E. Host-microbe and microbe-microbe interactions in the evolution of obligate plant parasitism. New Phytol 2015; 206:1207-28. [PMID: 25622918 DOI: 10.1111/nph.13284] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.7] [Received: 09/04/2014] [Accepted: 12/12/2014] [Indexed: 05/03/2023]
Abstract
Research on obligate biotrophic plant parasites, which reproduce only on living hosts, has revealed a broad diversity of filamentous microbes that have independently acquired complex morphological structures, such as haustoria. Genome studies have also demonstrated a concerted loss of genes for metabolism and lytic enzymes, and gain of diversity of genes coding for effectors involved in host defense suppression. So far, these traits converge in all known obligate biotrophic parasites, but unexpected genome plasticity remains. This plasticity is manifested as transposable element (TE)-driven increases in genome size, observed to be associated with the diversification of virulence genes under selection pressure. Genome expansion could result from the governing of the pathogen response to ecological selection pressures, such as host or nutrient availability, or to microbial interactions, such as competition, hyperparasitism and beneficial cooperations. Expansion is balanced by alternating sexual and asexual cycles, as well as selfing and outcrossing, which operate to control transposon activity in populations. In turn, the prevalence of these balancing mechanisms seems to be correlated with external biotic factors, suggesting a complex, interconnected evolutionary network in host-pathogen-microbe interactions. Therefore, the next phase of obligate biotrophic pathogen research will need to uncover how this network, including multitrophic interactions, shapes the evolution and diversity of pathogens.
Affiliation(s)
- Ariane C Kemen
  - Max Planck Research Group Fungal Biodiversity, Max Planck Institute for Plant Breeding Research, Carl-von-Linne Weg 10, 50829, Cologne, Germany
- Matthew T Agler
  - Max Planck Research Group Fungal Biodiversity, Max Planck Institute for Plant Breeding Research, Carl-von-Linne Weg 10, 50829, Cologne, Germany
- Eric Kemen
  - Max Planck Research Group Fungal Biodiversity, Max Planck Institute for Plant Breeding Research, Carl-von-Linne Weg 10, 50829, Cologne, Germany
43
Hatton L, Warr G. Protein structure and evolution: are they constrained globally by a principle derived from information theory? PLoS One 2015; 10:e0125663. [PMID: 25970335 PMCID: PMC4429977 DOI: 10.1371/journal.pone.0125663] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 12/08/2014] [Accepted: 02/19/2015] [Indexed: 01/01/2023] Open
Abstract
That the physicochemical properties of amino acids constrain the structure, function and evolution of proteins is not in doubt. However, principles derived from information theory may also set bounds on the structure (and thus also the evolution) of proteins. Here we analyze the global properties of the full set of proteins in release 13-11 of the SwissProt database, showing by experimental test of predictions from information theory that their collective structure exhibits properties that are consistent with their being guided by a conservation principle. This principle (Conservation of Information) defines the global properties of systems composed of discrete components each of which is in turn assembled from discrete smaller pieces. In the system of proteins, each protein is a component, and each protein is assembled from amino acids. Central to this principle is the inter-relationship of the unique amino acid count and total length of a protein and its implications for both average protein length and occurrence of proteins with specific unique amino acid counts. The unique amino acid count is simply the number of distinct amino acids (including those that are post-translationally modified) that occur in a protein, and is independent of the number of times that the particular amino acid occurs in the sequence. Conservation of Information does not operate at the local level (it is independent of the physicochemical properties of the amino acids) where the influences of natural selection are manifest in the variety of protein structure and function that is well understood. Rather, this analysis implies that Conservation of Information would define the global bounds within which the whole system of proteins is constrained; thus it appears to be acting to constrain evolution at a level different from natural selection, a conclusion that appears counter-intuitive but is supported by the studies described herein.
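The two quantities this abstract relates, total length and unique amino acid count, can be computed with a short Python sketch. The function names are hypothetical, not the authors'. One assumption matters: the sketch works on one-letter residue codes, so post-translationally modified residues, which the abstract counts as distinct, are not distinguished here.

```python
def unique_amino_acid_count(sequence: str) -> int:
    """Number of distinct residues, regardless of how often each one occurs."""
    return len(set(sequence.upper()))


def length_and_unique_count(sequence: str) -> tuple:
    """Return (total length, unique amino acid count) for one protein."""
    return (len(sequence), unique_amino_acid_count(sequence))
```

For example, the toy sequence `MKKLLLA` has length 7 but a unique amino acid count of 4 (M, K, L, A), because repeated residues are counted once.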
Affiliation(s)
- Leslie Hatton
  - Faculty of Science, Engineering and Computing, Kingston University, London, UK
- Gregory Warr
  - Medical University of South Carolina, Charleston, South Carolina, USA
44
Takemoto K, Kawakami Y. The proportion of genes in a functional category is linked to mass-specific metabolic rate and lifespan. Sci Rep 2015; 5:10008. [PMID: 25943793 PMCID: PMC4421859 DOI: 10.1038/srep10008] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 12/15/2014] [Accepted: 03/26/2015] [Indexed: 11/08/2022] Open
Abstract
Metabolic rate and lifespan are important biological parameters that are studied in a wide range of research fields. They are known to correlate with body mass, but their association with gene (protein) functions is poorly understood. In this study, we collected data on the metabolic rate and lifespan of various organisms and investigated the relationship of these parameters with their genomes. We showed that the proportion of genes in a functional category, but not genome size, was correlated with mass-specific metabolic rate and maximal lifespan. In particular, the proportion of genes in oxic reactions (which occur in the presence of oxygen) was significantly associated with these two biological parameters. Additionally, we found that temperature, taxonomy, and mode-of-life traits had little effect on the observed associations. Our findings emphasize the importance of considering the biological functions of genes when investigating the relationships between genome, metabolic rate, and lifespan. Moreover, this provides further insights into these relationships, and may be useful for estimating metabolic rate and lifespan in individuals and the ecosystem using a combination of body mass measurements and genomic data.
Affiliation(s)
- Kazuhiro Takemoto
  - Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, Iizuka, Fukuoka 820-8502, Japan
- Yuko Kawakami
  - Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, Iizuka, Fukuoka 820-8502, Japan
45
Faure G, Koonin EV. Universal distribution of mutational effects on protein stability, uncoupling of protein robustness from sequence evolution and distinct evolutionary modes of prokaryotic and eukaryotic proteins. Phys Biol 2015; 12:035001. [PMID: 25927823 DOI: 10.1088/1478-3975/12/3/035001] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Indexed: 01/04/2023]
Abstract
Robustness to destabilizing effects of mutations is thought of as a key factor of protein evolution. The connections between two measures of robustness, the relative core size and the computationally estimated effect of mutations on protein stability (ΔΔG), protein abundance and the selection pressure on protein-coding genes (dN/dS) were analyzed for the organisms with a large number of available protein structures including four eukaryotes, two bacteria and one archaeon. The distribution of the effects of mutations in the core on protein stability is universal and indistinguishable in eukaryotes and bacteria, centered at slightly destabilizing amino acid replacements, and with a heavy tail of more strongly destabilizing replacements. The distribution of mutational effects in the hyperthermophilic archaeon Thermococcus gammatolerans is significantly shifted toward strongly destabilizing replacements which is indicative of stronger constraints that are imposed on proteins in hyperthermophiles. The median effect of mutations is strongly, positively correlated with the relative core size, in evidence of the congruence between the two measures of protein robustness. However, both measures show only limited correlations to the expression level and selection pressure on protein-coding genes. Thus, the degree of robustness reflected in the universal distribution of mutational effects appears to be a fundamental, ancient feature of globular protein folds whereas the observed variations are largely neutral and uncoupled from short term protein evolution. A weak anticorrelation between protein core size and selection pressure is observed only for surface residues in prokaryotes but a stronger anticorrelation is observed for all residues in eukaryotic proteins. This substantial difference between proteins of prokaryotes and eukaryotes is likely to stem from the demonstrable higher compactness of prokaryotic proteins.
Affiliation(s)
- Guilhem Faure
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
46
Slater N, Louzoun Y, Gragert L, Maiers M, Chatterjee A, Albrecht M. Power laws for heavy-tailed distributions: modeling allele and haplotype diversity for the national marrow donor program. PLoS Comput Biol 2015; 11:e1004204. [PMID: 25901749 PMCID: PMC4406525 DOI: 10.1371/journal.pcbi.1004204] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Received: 06/19/2014] [Accepted: 02/19/2015] [Indexed: 01/29/2023] Open
Abstract
Measures of allele and haplotype diversity, which are fundamental properties in population genetics, often follow heavy tailed distributions. These measures are of particular interest in the field of hematopoietic stem cell transplant (HSCT). Donor/Recipient suitability for HSCT is determined by Human Leukocyte Antigen (HLA) similarity. Match predictions rely upon a precise description of HLA diversity, yet classical estimates are inaccurate given the heavy-tailed nature of the distribution. This directly affects HSCT matching and diversity measures in broader fields such as species richness. We, therefore, have developed a power-law based estimator to measure allele and haplotype diversity that accommodates heavy tails using the concepts of regular variation and occupancy distributions. Application of our estimator to 6.59 million donors in the Be The Match Registry revealed that haplotypes follow a heavy tail distribution across all ethnicities: for example, 44.65% of the European American haplotypes are represented by only 1 individual. Indeed, our discovery rate of all U.S. European American haplotypes is estimated at 23.45% based upon sampling 3.97% of the population, leaving a large number of unobserved haplotypes. Population coverage, however, is much higher at 99.4% given that 90% of European Americans carry one of the 4.5% most frequent haplotypes. Alleles were found to be less diverse suggesting the current registry represents most alleles in the population. Thus, for HSCT registries, haplotype discovery will remain high with continued recruitment to a very deep level of sampling, but population coverage will not. Finally, we compared the convergence of our power-law versus classical diversity estimators such as Capture recapture, Chao, ACE and Jackknife methods. When fit to the haplotype data, our estimator displayed favorable properties in terms of convergence (with respect to sampling depth) and accuracy (with respect to diversity estimates). 
This suggests that power-law based estimators offer a valid alternative to classical diversity estimators and may have broad applicability in the field of population genetics.
The distribution of haplotypes and species tends to be heavy-tailed. The heavy tail is expected from theoretical considerations and is observed in most populations. Accurate measures of diversity are difficult to achieve given that a limited number of common haplotypes represent the majority of the population, whereas the major contributor to haplotype diversity comes from unique haplotypes that are “rare” and present in only a fraction of the population. A major issue for unrelated HSCT donor registries is estimating population coverage with respect to servicing the public need. We here use a power-law methodology that accommodates heavy tails to estimate both the population coverage by ethnicity in the US and the genetic diversity of alleles and haplotypes. For the European American population, which has the deepest sampling amongst ethnicities, we show that registry population coverage is better than 99%, but the diversity of this sample only represents 40% of the unique haplotypes expected to be found in the population. Population coverage for other ethnicities was poorer and ranged down to 92%, as was the case for Native Americans, who had the worst coverage. We further show that the formalism developed here produces better estimates of the population properties than existing methods.
Affiliation(s)
- Noa Slater
  - Gonda Brain Research Center, Bar-Ilan University, Ramat Gan, Israel
- Yoram Louzoun
  - Gonda Brain Research Center, Bar-Ilan University, Ramat Gan, Israel
  - Department of Mathematics, Bar-Ilan University, Ramat Gan, Israel
- Loren Gragert
  - National Marrow Donor Program, Minneapolis, Minnesota, United States of America
- Martin Maiers
  - National Marrow Donor Program, Minneapolis, Minnesota, United States of America
- Ansu Chatterjee
  - School of Statistics, University of Minnesota, Minneapolis, Minnesota, United States of America
- Mark Albrecht
  - National Marrow Donor Program, Minneapolis, Minnesota, United States of America
47
Li X, Scanlon MJ, Yu J. Evolutionary patterns of DNA base composition and correlation to polymorphisms in DNA repair systems. Nucleic Acids Res 2015; 43:3614-25. [PMID: 25765652 PMCID: PMC4402523 DOI: 10.1093/nar/gkv197] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 10/16/2014] [Accepted: 02/24/2015] [Indexed: 11/15/2022] Open
Abstract
DNA base composition is a fundamental genome feature. However, the evolutionary pattern of base composition and its potential causes have not been well understood. Here, we report findings from comparative analysis of base composition at the whole-genome level across 2210 species, the polymorphic-site level across eight population comparison sets, and the mutation-site level in 12 mutation-tracking experiments. We first demonstrate that base composition follows the individual-strand base equality rule at the genome, chromosome and polymorphic-site levels. More intriguingly, clear separation of base-composition values calculated across polymorphic sites was consistently observed between basal and derived groups, suggesting common underlying mechanisms. Individuals in the derived groups show an A&T-increase/G&C-decrease pattern compared with the basal groups. Spontaneous and induced mutation experiments indicated these patterns of base composition change can emerge across mutation sites. With base-composition across polymorphic sites as a genome phenotype, genome scans with human 1000 Genomes and HapMap3 data identified a set of significant genomic regions enriched with Gene Ontology terms for DNA repair. For three DNA repair genes (BRIP1, PMS2P3 and TTDN), ENCODE data provided evidence for interaction between genomic regions containing these genes and regions containing the significant SNPs. Our findings provide insights into the mechanisms of genome evolution.
Affiliation(s)
- Xianran Li
  - Department of Agronomy, Iowa State University, Ames, IA 50011, USA
- Michael J Scanlon
  - Plant Biology Section, School of Integrative Plant Science, Cornell University, Ithaca, NY 14853, USA
- Jianming Yu
  - Department of Agronomy, Iowa State University, Ames, IA 50011, USA
48
A philosophical evaluation of adaptationism as a heuristic strategy. Acta Biotheor 2014; 62:479-98. [PMID: 24992988 DOI: 10.1007/s10441-014-9232-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 03/06/2013] [Accepted: 06/25/2014] [Indexed: 12/27/2022]
Abstract
Adaptationism has prompted many a debate in philosophy of biology but the focus is usually on empirical and explanatory issues rather than methodological adaptationism (MA). Likewise, the context of evolutionary biology has provided the grounding for most discussions of the heuristic role of adaptationism. This paper extends the debate by drawing on case studies from physiology and systems biology to discuss the productive and problematic aspects of adaptationism in functional as well as evolutionary studies at different levels of biological organization. Gould and Lewontin's Spandrels-paper famously criticized adaptationist methodology for implying a risk of generating 'blind spots' with respect to non-selective effects on evolution. Some have claimed that this bias can be accommodated through the testing of evolutionary hypotheses. Although this is an important aspect of overcoming the pitfalls of adaptationism, I argue that the issue of methodological biases is broader than the question of testability. I demonstrate the productivity of adaptationist heuristics but also discuss the deeper problematic aspects associated with the imperialistic tendencies of the strong account of MA.
49
Abstract
The widespread exchange of genes between bacteria must have consequences on the global architecture of their genomes, which are being found in the abundant genomic data available today. Most of the expansion of bacterial protein families can be attributed to transfer events, which are positively biased for smaller evolutionary distances between genomes, and more frequent for classes that are larger, when summed over all known bacteria. Moreover, “innovation” events where horizontal transfers carry exogenous evolutionary families appear to be less frequent for larger genomes. This dynamic expansion of evolutionary families is interconnected with the acquisition of new biological functions and thus with the size and distribution of the genes’ functional categories found on a genome. This commentary presents our recent contributions to this line of work and possible future directions.
Affiliation(s)
- Luigi Grassi
  - Dipartimento di Fisica, Sapienza Università di Roma, Rome, Italy
50
Guo Z, Jiang W, Lages N, Borcherds W, Wang D. Relationship between gene duplicability and diversifiability in the topology of biochemical networks. BMC Genomics 2014; 15:577. [PMID: 25005725 PMCID: PMC4129122 DOI: 10.1186/1471-2164-15-577] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 01/29/2014] [Accepted: 06/26/2014] [Indexed: 01/21/2023] Open
Abstract
Background: Selective gene duplicability, the extensive expansion of a small number of gene families, is universal. Quantitatively, the number of genes (P(K)) with K duplicates in a genome decreases precipitously as K increases, and often follows a power law (P(K) ∝ K^(-α)). Functional diversification, either neo- or sub-functionalization, is a major evolution route for duplicate genes. Results: Using three lines of genomic datasets, we studied the relationship between gene duplicability and diversifiability in the topology of biochemical networks. First, we explored a scenario where two pathways in the biochemical networks antagonize each other. Synthetic knockout of respective genes for the two pathways rescues the phenotypic defects of each individual knockout. We identified duplicate gene pairs with sufficient divergences that represent this antagonism relationship in the yeast S. cerevisiae. Such pairs overwhelmingly belong to large gene families, thus tend to have high duplicability. Second, we used distances between proteins of duplicate genes in the protein interaction network as a metric of their diversification. The higher a gene’s duplicate count, the further the proteins of this gene and its duplicates drift away from one another in the networks, which is especially true for genetically antagonizing duplicate genes. Third, we computed a sequence-homology-based clustering coefficient to quantify sequence diversifiability among duplicate genes – the lower the coefficient, the more the sequences have diverged. Duplicate count (K) of a gene is negatively correlated with the clustering coefficient of its duplicates, suggesting that gene duplicability is related to the extent of sequence divergence within the duplicate gene family. Conclusion: Thus, a positive correlation exists between gene diversifiability and duplicability in the context of biochemical networks – an improvement of our understanding of gene duplicability.
Affiliation(s)
- Degeng Wang
  - Greehey Children's Cancer Research Institute, University of Texas Health Science Center at San Antonio, 8403 Floyd Curl Drive, San Antonio, TX 78229-3900, USA