1. Mughal F, Caetano-Anollés G. Evolution of intrinsic disorder in the structural domains of viral and cellular proteomes. Sci Rep 2025; 15:2878. [PMID: 39843714 PMCID: PMC11754631 DOI: 10.1038/s41598-025-86045-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/04/2024] [Accepted: 01/07/2025] [Indexed: 01/24/2025]
Abstract
Intrinsically disordered regions are flexible regions that complement the typical structured regions of proteins. Little is known, however, about their evolution. Here we leverage a comparative and evolutionary genomics approach to analyze intrinsic disorder in the structural domains of thousands of proteomes. Our analysis revealed that viral and cellular proteomes employ similar strategies to increase disorder but achieve different goals. Viral proteomes evolve disorder for economy of genomic material and multifunctionality. On the other hand, cellular proteomes evolve disorder to advance functionality with increasing genomic complexity. Remarkably, phylogenomic analysis of intrinsic disorder showed that ancient domains were ordered and that disorder evolved as a benefit acquired later in evolution. Evolutionary chronologies of domains indexed with disorder levels and distributions across Archaea, Bacteria, Eukarya and viruses revealed six evolutionary phases, the oldest two harboring only ordered and moderately disordered domains. A biphasic spectrum of disorder versus proteome makeup captured the dichotomy in the evolutionary trajectories of viral and cellular ancestors, one following reductive evolution driven by viral spread of molecular wealth and the other following expansive evolutionary trends to advance functionality through massive domain-forming co-option of disordered loop regions.
Affiliation(s)
- Fizza Mughal
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana, IL, 61801, USA
- Gustavo Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana, IL, 61801, USA.
- C.R. Woese Institute for Genomic Biology, University of Illinois, Urbana, IL, 61801, USA.
2. Mughal F, Caetano-Anollés G. Evolution of Intrinsic Disorder in Protein Loops. Life (Basel) 2023; 13:2055. [PMID: 37895436 PMCID: PMC10608553 DOI: 10.3390/life13102055] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 09/13/2023] [Revised: 10/08/2023] [Accepted: 10/10/2023] [Indexed: 10/29/2023]
Abstract
Intrinsic disorder accounts for the flexibility of protein loops, molecular building blocks that are largely responsible for the processes and molecular functions of the living world. While loops likely represent early structural forms that served as intermediates in the emergence of protein structural domains, their origin and evolution remain poorly understood. Here, we conduct a phylogenomic survey of disorder in loop prototypes sourced from the ArchDB classification. Tracing prototypes associated with protein fold families along an evolutionary chronology revealed that ancient prototypes tended to be more disordered than their derived counterparts, with ordered prototypes developing later in evolution. This highlights the central evolutionary role of disorder and flexibility. While mean disorder increased with time, a minority of ordered prototypes exist that emerged early in evolutionary history, possibly driven by the need to preserve specific molecular functions. We also revealed the percolation of evolutionary constraints from higher to lower levels of organization. Percolation resulted in trade-offs between flexibility and rigidity that impacted prototype structure and geometry. Our findings provide a deep evolutionary view of the link between structure, disorder, flexibility, and function, as well as insights into the evolutionary role of intrinsic disorder in loops and their contribution to protein structure and function.
Affiliation(s)
- Fizza Mughal
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana, IL 61801, USA
- Gustavo Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana, IL 61801, USA
- C.R. Woese Institute for Genomic Biology, University of Illinois, Urbana, IL 61801, USA
3. Caetano-Anollés G, Claverie JM, Nasir A. A critical analysis of the current state of virus taxonomy. Front Microbiol 2023; 14:1240993. [PMID: 37601376 PMCID: PMC10435761 DOI: 10.3389/fmicb.2023.1240993] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 06/15/2023] [Accepted: 07/20/2023] [Indexed: 08/22/2023]
Abstract
Taxonomical classification has preceded evolutionary understanding. For that reason, taxonomy has become a battleground fueled by knowledge gaps, technical limitations, and a priorism. Here we assess the current state of this challenging field, focusing on fallacies that are common in viral classification. We emphasize that viruses are crucial contributors to the genomic and functional makeup of holobionts, organismal communities that behave as units of biological organization. Consequently, viruses cannot be considered taxonomic units because they challenge crucial concepts of organismality and individuality. Instead, they should be considered processes that integrate virions and their hosts into life cycles. Viruses harbor phylogenetic signatures of genetic transfer that compromise monophyly and the validity of deep taxonomic ranks. A focus on building phylogenetic networks using alignment-free methodologies and molecular structure can help mitigate the impasse, at least in part. Finally, structural phylogenomic analysis challenges the polyphyletic scenario of multiple viral origins adopted by virus taxonomy and instead supports an ancient cellular origin of viruses. We therefore call for abandoning deep ranks and urgently reevaluating the validity of taxonomic units and principles of virus classification.
Affiliation(s)
- Gustavo Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences and C.R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL, United States
- Jean-Michel Claverie
- Structural and Genomic Information Laboratory (UMR7256), Mediterranean Institute of Microbiology (FR3479), IM2B, IOM, Aix Marseille University, CNRS, Marseille, France
4.
Abstract
Biomolecular communication demands that interactions between parts of a molecular system act as scaffolds for message transmission. It also requires an organized system of signs, a communicative agency, for creating and transmitting meaning. The emergence of agency, the capacity to act in a given context and generate end-directed behaviors, has baffled evolutionary biologists for centuries. Here, I explore its emergence with knowledge grounded in over two decades of evolutionary genomic and bioinformatic exploration. Biphasic processes of growth and diversification exist that generate hierarchy and modularity in biological systems at widely ranging time scales. Similarly, a biphasic process exists in communication that constructs a message before it can be transmitted for interpretation. Transmission dissipates matter-energy and information and involves computation. Agency emerges when molecular machinery generates hierarchical layers of vocabularies in an entangled communication network clustered around the universal Turing machine of the ribosome. Computations canalize biological systems to perform biological functions in a dissipative quest to structure long-lived occurrents. This occurs within the confines of a "triangle of persistence" that maximizes invariance with trade-offs between economy, flexibility, and robustness. Thus, learning from previous historical and circumstantial experiences unifies modules in a hierarchy that expands the agency of systems.
Affiliation(s)
- Gustavo Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences and C. R. Woese Institute for Genomic Biology, University of Illinois, Urbana, Illinois, USA
5. Caetano-Anollés G. The Compressed Vocabulary of Microbial Life. Front Microbiol 2021; 12:655990. [PMID: 34305827 PMCID: PMC8292947 DOI: 10.3389/fmicb.2021.655990] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 01/19/2021] [Accepted: 04/27/2021] [Indexed: 12/22/2022]
Abstract
Communication is an undisputed central activity of life that requires an evolving molecular language. It conveys meaning through messages and vocabularies. Here, I explore the existence of a growing vocabulary in the molecules and molecular functions of the microbial world. There are clear correspondences between the lexicon, syntax, semantics, and pragmatics of language organization and the module, structure, function, and fitness paradigms of molecular biology. These correspondences are constrained by universal laws and engineering principles. Macromolecular structure, for example, follows quantitative linguistic patterns arising from statistical laws that are likely universal, including Zipf's law, a special case of the scale-free distribution; Heaps' law, which describes the sublinear growth typical of economies of scale; and the Menzerath-Altmann law, which imposes size-dependent patterns of decreasing returns. Trade-off solutions between principles of economy, flexibility, and robustness define a "triangle of persistence" describing the impact of the environment on a biological system. The pragmatic landscape of the triangle interfaces with the syntax and semantics of molecular languages, which, together with comparative and evolutionary genomic data, can explain global patterns of diversification of cellular life. The vocabularies of proteins (proteomes) and functions (functionomes) revealed a significant universal lexical core supporting a universal common ancestor, an ancestral evolutionary link between Bacteria and Eukarya, and distinct reductive evolutionary strategies of language compression in Archaea and Bacteria. A "causal" word cloud strategy inspired by the dependency grammar paradigm used in catenae unfolded the evolution of lexical units associated with Gene Ontology terms at different levels of ontological abstraction. While Archaea holds the smallest, oldest, and most homogeneous vocabulary of all superkingdoms, Bacteria heterogeneously apportions a more complex vocabulary, and Eukarya pushes functional innovation through mechanisms of flexibility and robustness.
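For orientation, the three statistical laws named in this abstract are commonly written in the standard forms below. The symbols are generic choices for illustration and are not taken from the cited paper; the parameter ranges are the conventional ones, not values fitted there. Here $f(r)$ is the frequency of the $r$-th most frequent unit, $V(n)$ is the vocabulary size after $n$ tokens (sublinear because $\beta < 1$), and $y(x)$ is the mean constituent size of a construct with $x$ constituents, with $b$ typically negative so that larger constructs have smaller parts.

```latex
\begin{aligned}
&\text{Zipf's law:} && f(r) \propto r^{-\alpha}, \quad \alpha \approx 1 \\
&\text{Heaps' law:} && V(n) = K\, n^{\beta}, \quad 0 < \beta < 1 \\
&\text{Menzerath--Altmann law:} && y(x) = a\, x^{b}\, e^{-c x}
\end{aligned}
```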
Affiliation(s)
- Gustavo Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, and C. R. Woese Institute for Genomic Biology, University of Illinois, Urbana, IL, United States
6. Nasir A, Mughal F, Caetano-Anollés G. The tree of life describes a tripartite cellular world. Bioessays 2021; 43:e2000343. [PMID: 33837594 DOI: 10.1002/bies.202000343] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 12/25/2020] [Revised: 03/11/2021] [Accepted: 03/15/2021] [Indexed: 12/28/2022]
Abstract
The canonical view of a 3-domain (3D) tree of life was recently challenged by the discovery of Asgardarchaeota encoding eukaryote signature proteins (ESPs), which were treated as missing links of a 2-domain (2D) tree. Here we revisit the debate. We discuss methodological limitations of building trees with alignment-dependent approaches, which often fail to address the problem of "gaps" satisfactorily. In addition, most phylogenies are reconstructed unrooted, neglecting the power of direct rooting methods. Alignment-free methodologies lift most difficulties but require employing realistic evolutionary models. We argue that the discoveries of Asgards and ESPs, by themselves, do not rule out the 3D tree, which is strongly supported by comparative and evolutionary genomic analyses and vast genomic and biochemical superkingdom distinctions. Given uncertainties of retrodiction and interpretation difficulties, we conclude that the 3D view has not been falsified but instead has been strengthened by genomic analyses. In turn, the objections to the 2D model have not been lifted. The debate remains open. Also see the video abstract here: https://youtu.be/-6TBN0bubI8.
Affiliation(s)
- Arshan Nasir
- Theoretical Biology and Biophysics (T-6), Los Alamos National Laboratory, Los Alamos, New Mexico, USA
- Fizza Mughal
- Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA
- Gustavo Caetano-Anollés
- Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA
7. Mughal F, Nasir A, Caetano-Anollés G. The origin and evolution of viruses inferred from fold family structure. Arch Virol 2020; 165:2177-2191. [PMID: 32748179 PMCID: PMC7398281 DOI: 10.1007/s00705-020-04724-1] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 04/02/2020] [Accepted: 05/30/2020] [Indexed: 12/16/2022]
Abstract
The canonical frameworks of viral evolution describe viruses as cellular predecessors, reduced forms of cells, or entities that escaped cellular control. The discovery of giant viruses has changed these standard paradigms. Their genetic, proteomic and structural complexities resemble those of cells, prompting a redefinition and reclassification of viruses. In a previous genome-wide analysis of the evolution of structural domains in proteomes, with domains defined at the fold superfamily level, we found the origins of viruses intertwined with those of ancient cells. Here, we extend these data-driven analyses to the study of fold families confirming the co-evolution of viruses and ancient cells and the genetic ability of viruses to foster molecular innovation. The results support our suggestion that viruses arose by genomic reduction from ancient cells and validate a co-evolutionary ‘symbiogenic’ model of viral origins.
Affiliation(s)
- Fizza Mughal
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Illinois Informatics Institute, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Arshan Nasir
- Theoretical Biology and Biophysics Group, Los Alamos National Laboratory, Los Alamos, NM, USA
- Department of Biosciences, COMSATS University Islamabad, Islamabad, Pakistan
- Gustavo Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, IL, USA.
- Illinois Informatics Institute, University of Illinois at Urbana-Champaign, Urbana, IL, USA.
8. Demongeot J, Seligmann H. Comparisons between small ribosomal RNA and theoretical minimal RNA ring secondary structures confirm phylogenetic and structural accretion histories. Sci Rep 2020; 10:7693. [PMID: 32376895 PMCID: PMC7203183 DOI: 10.1038/s41598-020-64627-8] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 09/10/2019] [Accepted: 04/01/2020] [Indexed: 12/16/2022]
Abstract
Ribosomal RNAs are complex structures that presumably evolved by tRNA accretions. Statistical properties of tRNA secondary structures correlate with genetic code integration orders of their cognate amino acids. Ribosomal RNA secondary structures resemble those of tRNAs with recent cognates. Hence, rRNAs presumably evolved from ancestral tRNAs. Here, analyses compare secondary structure subcomponents of small ribosomal RNA subunits with secondary structures of theoretical minimal RNA rings, presumed proto-tRNAs. Two independent methods determined different accretion orders of rRNA structural subelements: (a) classical comparative homology and phylogenetic reconstruction, and (b) a structural hypothesis assuming an inverted onion ring growth in which the ribosome's three-dimensional core is most ancient and peripheral elements most recent. Comparisons of the accretion orders from (a) and (b) with RNA ring secondary structure scales show that recent rRNA subelements are: 1. more similar to RNA rings with recent cognates, indicating ongoing coevolution between tRNA and rRNA secondary structures; 2. less similar to theoretical minimal RNA rings with ancient cognates. Our method agrees with both (a) and (b) in all examined organisms, more so with (a) than with (b). Results stress the need to integrate independent methods. Theoretical minimal RNA rings are potential evolutionary references for any sequence-based evolutionary analyses, independent of the focal data from that study.
Affiliation(s)
- Jacques Demongeot
- Université Grenoble Alpes, Faculty of Medicine, Laboratory AGEIS EA 7407, Team Tools for e-Gnosis Medical & Labcom CNRS/UGA/OrangeLabs Telecoms4Health, F-38700, La Tronche, France.
- Hervé Seligmann
- Université Grenoble Alpes, Faculty of Medicine, Laboratory AGEIS EA 7407, Team Tools for e-Gnosis Medical & Labcom CNRS/UGA/OrangeLabs Telecoms4Health, F-38700, La Tronche, France
- The National Natural History Collections, The Hebrew University of Jerusalem, 91404, Jerusalem, Israel
9. Bokhari RH, Amirjan N, Jeong H, Kim KM, Caetano-Anollés G, Nasir A. Bacterial Origin and Reductive Evolution of the CPR Group. Genome Biol Evol 2020; 12:103-121. [PMID: 32031619 PMCID: PMC7093835 DOI: 10.1093/gbe/evaa024] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Accepted: 01/31/2020] [Indexed: 12/24/2022]
Abstract
The candidate phyla radiation (CPR) is a proposed subdivision within the bacterial domain comprising several candidate phyla. CPR organisms are united by small genome and physical sizes, lack several metabolic enzymes, and populate deep branches within the bacterial subtree of life. These features raise intriguing questions regarding their origin and mode of evolution. In this study, we performed a comparative and phylogenomic analysis to investigate CPR origin and evolution. Unlike previous gene/protein sequence-based reports of CPR evolution, we used protein domain superfamilies classified by protein structure databases to resolve the evolutionary relationships of CPR with non-CPR bacteria, Archaea, Eukarya, and viruses. Across all supergroups, CPR shared the largest number of superfamilies with non-CPR bacteria and were placed as deep branching bacteria in most phylogenomic trees. CPR contributed 1.22% of new superfamilies to bacteria, including the ribosomal protein L19e, and encoded four core superfamilies that are likely involved in cell-to-cell interaction and in establishing episymbiotic lifestyles. Although CPR and non-CPR bacterial proteomes gained common superfamilies over the course of evolution, CPR and Archaea had more common losses. These losses mostly involved metabolic superfamilies. In fact, phylogenies built from only metabolic protein superfamilies separated CPR and non-CPR bacteria. These findings indicate that CPR are bacterial organisms that have probably evolved in an Archaea-like manner via the early loss of metabolic functions. We also discovered that phylogenies built from metabolic and informational superfamilies gave contrasting views of the groupings among Archaea, Bacteria, and Eukarya, adding to the current debate on the evolutionary relationships among superkingdoms.
Affiliation(s)
- Nooreen Amirjan
- Department of Biosciences, COMSATS University Islamabad, Pakistan
- Hyeonsoo Jeong
- School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA
- Kyung Mo Kim
- Division of Polar Life Sciences, Korea Polar Research Institute, Incheon, Republic of Korea
- Gustavo Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana
- Arshan Nasir
- Department of Biosciences, COMSATS University Islamabad, Pakistan
- Theoretical Biology & Biophysics Group, Los Alamos National Laboratory, Los Alamos, New Mexico
10. Demongeot J, Seligmann H. Accretion history of large ribosomal subunits deduced from theoretical minimal RNA rings is congruent with histories derived from phylogenetic and structural methods. Gene 2020; 738:144436. [PMID: 32027954 DOI: 10.1016/j.gene.2020.144436] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 11/07/2019] [Revised: 01/24/2020] [Accepted: 02/01/2020] [Indexed: 12/17/2022]
Abstract
Accretions of tRNAs presumably formed the large complex ribosomal RNA structures. Similarities of tRNA secondary structures with rRNA secondary structures increase with the integration order of their cognate amino acid in the genetic code, indicating tRNA evolution towards rRNA-like structures. Here, analyses rank secondary structure subelements of three large ribosomal RNAs (Prokaryota: Archaea: Thermus thermophilus; Bacteria: Escherichia coli; Eukaryota: Saccharomyces cerevisiae) in relation to their similarities with secondary structures formed by presumed proto-tRNAs, represented by 25 theoretical minimal RNA rings. These ranks, which assign a relative evolutionary age to each rRNA substructure, are compared to those derived from two independent methods: (a) cladistic phylogenetic analyses and (b) 3D crystallography, where core subelements are presumed ancient and peripheral ones recent. Comparisons of rRNA secondary structure subelements with RNA ring secondary structures show congruence between ranks deduced by this method and both (a) and (b) (more with (a) than (b)), especially for RNA rings with predicted ancient cognate amino acids. Reconstruction of accretion histories of large rRNAs will gain from adequately integrating information from independent methods. Theoretical minimal RNA rings, sequences deterministically designed in silico according to specific coding constraints, might produce adequate scales for prebiotic and early life molecular evolution.
Affiliation(s)
- Jacques Demongeot
- Université Grenoble Alpes, Faculty of Medicine, Laboratory AGEIS EA 7407, Team Tools for e-Gnosis Medical & Labcom CNRS/UGA/OrangeLabs Telecoms4Health, F-38700 La Tronche, France.
- Hervé Seligmann
- Université Grenoble Alpes, Faculty of Medicine, Laboratory AGEIS EA 7407, Team Tools for e-Gnosis Medical & Labcom CNRS/UGA/OrangeLabs Telecoms4Health, F-38700 La Tronche, France; The National Natural History Collections, The Hebrew University of Jerusalem, 91404 Jerusalem, Israel.
11. Demongeot J, Seligmann H. More Pieces of Ancient than Recent Theoretical Minimal Proto-tRNA-Like RNA Rings in Genes Coding for tRNA Synthetases. J Mol Evol 2019; 87:152-174. [DOI: 10.1007/s00239-019-09892-6] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Received: 12/19/2018] [Accepted: 03/22/2019] [Indexed: 12/19/2022]
12. Seligmann H. Giant viruses: spore-like missing links between Rickettsia and mitochondria? Ann N Y Acad Sci 2019; 1447:69-79. [DOI: 10.1111/nyas.14022] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 11/17/2018] [Revised: 01/10/2019] [Accepted: 01/16/2019] [Indexed: 12/27/2022]
Affiliation(s)
- Hervé Seligmann
- The National Natural History Collections, The Hebrew University of Jerusalem, Jerusalem, Israel
13. Estrada-Peña A, Cabezas-Cruz A. Phyloproteomic and functional analyses do not support a split in the genus Borrelia (phylum Spirochaetes). BMC Evol Biol 2019; 19:54. [PMID: 30760200 PMCID: PMC6375133 DOI: 10.1186/s12862-019-1379-2] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 10/11/2018] [Accepted: 01/31/2019] [Indexed: 12/14/2022]
Abstract
Background: The evolutionary history of a species is frequently derived from molecular sequences, and the resulting phylogenetic trees do not include explicit functional information. Here, we aimed to assess the functional relationships among bacteria in the Spirochaetes phylum, based on the biological processes of 42,489 proteins in reference proteomes of 34 Spirochaetes species. We tested the hypothesis that the species in the genus Borrelia might be sufficiently different to warrant splitting them into two separate genera. Results: A detrended canonical analysis demonstrated that the presence/absence of biological processes among selected bacteria contained a strong phylogenetic signal, which did not separate species of Borrelia. We examined the ten biological processes in which most proteins were consistently involved. This analysis demonstrated that species in Borrelia were more similar to each other than to free-living species (Sediminispirochaeta, Spirochaeta, Sphaerochaeta) or to pathogenic species without vectors (Leptospira, Treponema, Brachyspira), which are highly divergent. A dendrogram based on the presence/absence of proteins in the reference proteomes demonstrated that distances between species of the same genus among free-living or non-vectored pathogenic species were greater than the distances between the 19 species (27 strains) of Borrelia. A phyloproteomic network supported the close functional association between species of Borrelia. In the proteomes of the 27 strains of Borrelia, only a few proteins had evolved separately in the relapsing fever and Lyme borreliosis groups. The most prominent Borrelia proteins and processes were a subset of those also found in free-living and non-vectored pathogenic species. In addition, the functional innovation (i.e., unique biological processes or proteins) of Borrelia was very low compared to other genera of Spirochaetes. Conclusions: We found only marginal functional differences among Borrelia species. Phyloproteomic networks that included all pairwise combinations between species, proteins, and processes were more effective than other methods for evaluating the evolutionary relationships among taxa. Within the limitations of data availability, our results did not support a split of the arthropod-transmitted spirochaetes into the proposed genera Borrelia and Borreliella. Electronic supplementary material: The online version of this article (10.1186/s12862-019-1379-2) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Agustín Estrada-Peña
- Department of Animal Pathology, Faculty of Veterinary Medicine, Miguel Servet, 177, 50013, Zaragoza, Spain.
- Alejandro Cabezas-Cruz
- UMR BIPAR, INRA, ANSES, Ecole Nationale Vétérinaire d'Alfort, Université Paris-Est, 94700, Maisons-Alfort, France
14. Caetano-Anollés G, Nasir A, Kim KM, Caetano-Anollés D. Rooting Phylogenies and the Tree of Life While Minimizing Ad Hoc and Auxiliary Assumptions. Evol Bioinform Online 2018; 14:1176934318805101. [PMID: 30364468 PMCID: PMC6196624 DOI: 10.1177/1176934318805101] [Citation(s) in RCA: 40] [Impact Index Per Article: 5.7] [Received: 01/25/2018] [Accepted: 09/05/2018] [Indexed: 12/25/2022]
Abstract
Phylogenetic methods unearth evolutionary history when supported by three starting points of reason: (1) the continuity axiom requires the existence of a "model" of evolutionary change, (2) the singularity axiom defines the historical ground plan (phylogeny) in which biological entities (taxa) evolve, and (3) the memory axiom demands identification of biological attributes (characters) with historical information. Axiom consequences are interlinked, making the retrodiction enterprise an endeavor of reciprocal fulfillment. In particular, establishing the direction of evolutionary change (character polarization) roots phylogenies and enables testing the existence of historical memory (homology). Unfortunately, the rooting of phylogenies, especially the "tree of life," generally follows narratives instead of integrating empirical and theoretical knowledge of retrodictive exploration. This stems mostly from a focus on molecular sequence analysis and uncertainties about rooting methods. Here, we review available rooting criteria, highlighting the need to minimize both ad hoc and auxiliary assumptions, especially argumentative ad hocness. We show that while the outgroup comparison method has been widely adopted, the generality criterion of nesting and additive phylogenetic change embodied in Weston's rule offers the most powerful rooting approach. We also propose a change of focus, from phylogenies that describe the evolution of biological systems to those that describe the evolution of parts of those systems. This weakens violation of character independence, helps formalize the generality criterion of rooting, and provides new ways to study the problem of evolution.
Affiliation(s)
- Gustavo Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Arshan Nasir
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Department of Biosciences, COMSATS University Islamabad, Islamabad, Pakistan
- Kyung Mo Kim
- Division of Polar Life Sciences, Korea Polar Research Institute, Incheon, Republic of Korea
- Derek Caetano-Anollés
- Department of Evolutionary Genetics, Max-Planck-Institut für Evolutionsbiologie, Plön, Germany
15. Staley JT, Caetano-Anollés G. Archaea-First and the Co-Evolutionary Diversification of Domains of Life. Bioessays 2018; 40:e1800036. [PMID: 29944192 DOI: 10.1002/bies.201800036] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 02/19/2018] [Revised: 05/12/2018] [Indexed: 12/13/2022]
Abstract
The origins and evolution of the Archaea, Bacteria, and Eukarya remain controversial. Genome-wide phylogenomic studies of molecular features that are evolutionarily conserved, such as protein structural domains, suggest that Archaea is the first domain of life to diversify from a stem line of descent. This line embodies the last universal common ancestor of cellular life. Here, we propose that ancestors of Euryarchaeota co-evolved with those of Bacteria prior to the diversification of Eukarya. This co-evolutionary scenario is supported by comparative genomic and phylogenomic analyses of the distributions of fold families of domains in the proteomes of free-living organisms, which show horizontal gene recruitments and informational process homologies. It also benefits from the molecular study of cell physiologies responsible for membrane phospholipids, methanogenesis, methane oxidation, cell division, gas vesicles, and the cell wall. Our theory, however, challenges popular cell fusion and two-domain of life scenarios derived from sequence analysis, demanding phylogenetic reconciliation. Also see the video abstract here: https://youtu.be/9yVWn_Q9faY.
Affiliation(s)
- James T Staley
- Department of Microbiology and Astrobiology Program, University of Washington, Seattle, WA, 98195, USA
- Gustavo Caetano-Anollés
- Department of Crop Sciences, C. R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
16
Abstract
The popular textbook image of viruses as noxious and selfish genetic parasites greatly underestimates the beneficial contributions of viruses to the biosphere. Given that viruses depend crucially on an intracellular environment to reproduce, viruses that engage in excessive killing (lysis) can drive their cellular hosts to extinction and will not survive. The lytic mode of virus propagation must, therefore, be tempered and balanced by non-lytic modes of virus latency and symbiosis. Here, we review recent bioinformatics and metagenomic studies to argue that viral endogenization and domestication may be more frequent mechanisms of virus persistence than lysis. We use a triangle diagram to describe the three major virus persistence strategies (lysis, latency and virus-cell symbiosis) that together explain the global scope of virus-cell interactions. This paradigm can help identify novel directions in virology research where scientists could artificially gain control over switching between lytic and beneficial viral lifestyles. Also see the Video Abstract: http://youtu.be/GwXWz4N8o8.
Affiliation(s)
- Arshan Nasir
- Department of Biosciences, COMSATS Institute of Information Technology, Islamabad, Pakistan
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana, IL, USA
- Kyung Mo Kim
- Division of Polar Life Sciences, Korea Polar Research Institute, Incheon, Republic of Korea
- Gustavo Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana, IL, USA
17
Koç I, Caetano-Anollés G. The natural history of molecular functions inferred from an extensive phylogenomic analysis of gene ontology data. PLoS One 2017; 12:e0176129. [PMID: 28467492 PMCID: PMC5414959 DOI: 10.1371/journal.pone.0176129] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 08/02/2016] [Accepted: 04/05/2017] [Indexed: 11/18/2022]
Abstract
The origin and natural history of molecular functions hold the key to the emergence of cellular organization and modern biochemistry. Here we use a genomic census of Gene Ontology (GO) terms to reconstruct phylogenies at the three highest (1, 2 and 3) and the lowest (terminal) levels of the hierarchy of molecular functions, which reflect the broadest and the most specific GO definitions, respectively. These phylogenies define evolutionary timelines of functional innovation. We analyzed 249 free-living organisms comprising the three superkingdoms of life, Archaea, Bacteria, and Eukarya. Phylogenies indicate catalytic, binding and transport functions were the oldest, suggesting a 'metabolism-first' origin scenario for biochemistry. Metabolism made use of increasingly complicated organic chemistry. Primordial features of ancient molecular functions and functional recruitments were further distilled by studying the oldest child terms of the oldest level 1 GO definitions. Network analyses showed the existence of an hourglass pattern of enzyme recruitment in the directed acyclic graph of molecular functions. Older high-level molecular functions were thoroughly recruited at younger lower levels, while very young high-level functions were used throughout the timeline. This pattern repeated in every one of the three mappings, which gave a criss-cross pattern. The timelines and their mappings were remarkable. They revealed the progressive evolutionary development of functional toolkits, starting with the early rise of metabolic activities, followed chronologically by the rise of macromolecular biosynthesis, the establishment of controlled interactions with the environment and self, adaptation to oxygen, and enzyme coordinated regulation, and ending with the rise of structural and cellular complexity. This historical account holds important clues for dissection of the emergence of biocomplexity and life.
Affiliation(s)
- Ibrahim Koç
- Molecular Biology and Genetics, Gebze Technical University, Kocaeli, Turkey
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, IL, United States of America
- Gustavo Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, IL, United States of America
18
Falda M, Lavezzo E, Fontana P, Bianco L, Berselli M, Formentin E, Toppo S. Eliciting the Functional Taxonomy from protein annotations and taxa. Sci Rep 2016; 6:31971. [PMID: 27534507 PMCID: PMC4989186 DOI: 10.1038/srep31971] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 05/12/2016] [Accepted: 08/01/2016] [Indexed: 11/30/2022]
Abstract
Advances in omics technologies have triggered the production of an enormous volume of data coming from thousands of species. Meanwhile, joint international efforts like the Gene Ontology (GO) consortium have worked to provide functional information for a vast number of proteins. With these data available, we have developed FunTaxIS, a tool that is the first attempt to infer functional taxonomy (i.e. how functions are distributed over taxa) by combining functional and taxonomic information. FunTaxIS is able to define a taxon-specific functional space by exploiting annotation frequencies in order to establish whether a function can or cannot be used to annotate a certain species. The tool generates constraints between GO terms and taxa and then propagates these relations over the taxonomic tree and the GO graph. Since these constraints cover nearly the whole taxonomy, it is possible to obtain the mapping of a function over the taxonomy. FunTaxIS can be used to make functional comparative analyses among taxa, to detect improper associations between taxa and functions, and to discover how functional knowledge is either distributed or missing. A benchmark test set based on six different model species has been devised to gain useful insights into the generated taxonomic rules.
Affiliation(s)
- Marco Falda
- Department of Molecular Medicine, University of Padova, Padova, 35131, Italy
- Enrico Lavezzo
- Department of Molecular Medicine, University of Padova, Padova, 35131, Italy
- Paolo Fontana
- Istituto Agrario San Michele all'Adige Research and Innovation Centre, Foundation Edmund Mach, Trento, 38010, Italy
- Luca Bianco
- Istituto Agrario San Michele all'Adige Research and Innovation Centre, Foundation Edmund Mach, Trento, 38010, Italy
- Michele Berselli
- Department of Molecular Medicine, University of Padova, Padova, 35131, Italy
- Elide Formentin
- Department of Biology, University of Padova, Padova, 35131, Italy
- Stefano Toppo
- Department of Molecular Medicine, University of Padova, Padova, 35131, Italy
19
Nasir A, Caetano-Anollés G. A phylogenomic data-driven exploration of viral origins and evolution. Sci Adv 2015; 1:e1500527. [PMID: 26601271 PMCID: PMC4643759 DOI: 10.1126/sciadv.1500527] [Citation(s) in RCA: 124] [Impact Index Per Article: 12.4] [Received: 04/27/2015] [Accepted: 06/30/2015] [Indexed: 05/05/2023]
Abstract
The origin of viruses remains mysterious because of their diverse and patchy molecular and functional makeup. Although numerous hypotheses have attempted to explain viral origins, none is backed by substantive data. We take full advantage of the wealth of available protein structural and functional data to explore the evolution of the proteomic makeup of thousands of cells and viruses. Despite the extremely reduced nature of viral proteomes, we established an ancient origin of the "viral supergroup" and the existence of widespread episodes of horizontal transfer of genetic information. Viruses harboring different replicon types and infecting distantly related hosts shared many metabolic and informational protein structural domains of ancient origin that were also widespread in cellular proteomes. Phylogenomic analysis uncovered a universal tree of life and revealed that modern viruses reduced from multiple ancient cells that harbored segmented RNA genomes and coexisted with the ancestors of modern cells. The model for the origin and evolution of viruses and cells is backed by strong genomic and structural evidence and can be reconciled with existing models of viral evolution if one considers viruses to have originated from ancient cells and not from modern counterparts.
20
Caetano-Anollés G, Caetano-Anollés D. Computing the origin and evolution of the ribosome from its structure - Uncovering processes of macromolecular accretion benefiting synthetic biology. Comput Struct Biotechnol J 2015; 13:427-47. [PMID: 27096056 PMCID: PMC4823900 DOI: 10.1016/j.csbj.2015.07.003] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.5] [Received: 04/09/2015] [Revised: 07/16/2015] [Accepted: 07/19/2015] [Indexed: 12/11/2022]
Abstract
Accretion occurs pervasively in nature at widely different timeframes. The process also manifests in the evolution of macromolecules. Here we review recent computational and structural biology studies of evolutionary accretion that make use of the ideographic (historical, retrodictive) and nomothetic (universal, predictive) scientific frameworks. Computational studies uncover explicit timelines of accretion of structural parts in molecular repertoires and molecules. Phylogenetic trees of protein structural domains and proteomes and their molecular functions were built from a genomic census of millions of encoded proteins and associated terminal Gene Ontology terms. Trees reveal a ‘metabolic-first’ origin of proteins, the late development of translation, and a patchwork distribution of proteins in biological networks mediated by molecular recruitment. Similarly, the natural history of ancient RNA molecules inferred from trees of molecular substructures built from a census of molecular features shows patchwork-like accretion patterns. Ideographic analyses of ribosomal history uncover the early appearance of structures supporting mRNA decoding and tRNA translocation, the coevolution of ribosomal proteins and RNA, and a first evolutionary transition that brings ribosomal subunits together into a processive protein biosynthetic complex. Nomothetic structural biology studies of tertiary interactions and ancient insertions in rRNA complement these findings, once concentric layering assumptions are removed. Patterns of coaxial helical stacking reveal a frustrated dynamics of outward and inward ribosomal growth possibly mediated by structural grafting. The early rise of the ribosomal ‘turnstile’ suggests an evolutionary transition in natural biological computation. Results make explicit the need to understand processes of molecular growth and information transfer of macromolecules.
Affiliation(s)
- Gustavo Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois at Urbana-Champaign, 1101 W. Peabody Drive, Urbana, IL 61801, USA
- C.R. Woese Institute for Genomic Biology, University of Illinois, Urbana, IL 61801, USA
- Derek Caetano-Anollés
- C.R. Woese Institute for Genomic Biology, University of Illinois, Urbana, IL 61801, USA
21
McFall-Ngai MJ. Giving microbes their due – animal life in a microbially dominant world. J Exp Biol 2015; 218:1968-73. [DOI: 10.1242/jeb.115121] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.2] [Indexed: 01/11/2023]
Abstract
The new technology of next-generation sequencing is changing our perceptions of the form and function of the biological world. The emerging data reveal an array of microbes that is more vast and more central to all biological processes than previously appreciated. Further, evidence is accumulating that the alliances of microbes with one another and with constituents of the macrobiological world are critical for the health of the biosphere. This contribution summarizes the basic arguments as to why, when considering the biochemical adaptations of animals, we should integrate the roles of their microbial partners.
22
Nasir A, Sun FJ, Kim KM, Caetano-Anollés G. Untangling the origin of viruses and their impact on cellular evolution. Ann N Y Acad Sci 2015; 1341:61-74. [PMID: 25758413 DOI: 10.1111/nyas.12735] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Indexed: 11/29/2022]
Abstract
The origin and evolution of viruses remain mysterious. Here, we focus on the distribution of viral replicons in host organisms, their morphological features, and the evolution of highly conserved protein and nucleic acid structures. The apparent inability of RNA viral replicons to infect contemporary akaryotic species suggests an early origin of RNA viruses and their subsequent loss in akaryotes. A census of virion morphotypes reveals that advanced forms were unique to viruses infecting a specific supergroup, while simpler forms were observed in viruses infecting organisms in all forms of cellular life. Results hint toward an ancient origin of viruses from an ancestral virus harboring either filamentous or spherical virions. Finally, phylogenetic trees built from protein domain and tRNA structures in thousands of genomes suggest that viruses evolved via reductive evolution from ancient cells. The analysis presents a complete account of the evolutionary history of cells and viruses and identifies viruses as crucial agents influencing cellular evolution.
Affiliation(s)
- Arshan Nasir
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences and Illinois Informatics Institute, University of Illinois, Urbana, Illinois
23
A phylogenomic census of molecular functions identifies modern thermophilic archaea as the most ancient form of cellular life. Archaea 2014; 2014:706468. [PMID: 25249790 PMCID: PMC4164138 DOI: 10.1155/2014/706468] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Received: 07/06/2013] [Revised: 11/20/2013] [Accepted: 01/17/2014] [Indexed: 12/30/2022]
Abstract
The origins of diversified life remain mysterious despite considerable efforts devoted to untangling the roots of the universal tree of life. Here we reconstructed phylogenies that described the evolution of molecular functions and the evolution of species directly from a genomic census of gene ontology (GO) definitions. We sampled 249 free-living genomes spanning organisms in the three superkingdoms of life, Archaea, Bacteria, and Eukarya, and used the abundance of GO terms as molecular characters to produce rooted phylogenetic trees. Results revealed an early thermophilic origin of Archaea that was followed by genome reduction events in microbial superkingdoms. Eukaryal genomes displayed extraordinary functional diversity and were enriched with hundreds of novel molecular activities not detected in the akaryotic microbial cells. Remarkably, the majority of these novel functions appeared quite late in evolution, synchronized with the diversification of the eukaryal superkingdom. The distribution of GO terms in superkingdoms confirms that Archaea appears to be the simplest and most ancient form of cellular life, while Eukarya is the most diverse and recent.