1
Lucaci AG, Zehr JD, Enard D, Thornton JW, Kosakovsky Pond SL. Evolutionary Shortcuts via Multinucleotide Substitutions and Their Impact on Natural Selection Analyses. Mol Biol Evol 2023; 40:msad150. [PMID: 37395787 PMCID: PMC10336034 DOI: 10.1093/molbev/msad150] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 02/27/2023] [Revised: 06/15/2023] [Accepted: 06/26/2023] [Indexed: 07/04/2023] Open
Abstract
Inference and interpretation of evolutionary processes, in particular of the types and targets of natural selection affecting coding sequences, are critically influenced by the assumptions built into statistical models and tests. If certain aspects of the substitution process (even when they are not of direct interest) are presumed absent or are modeled with too crude of a simplification, estimates of key model parameters can become biased, often systematically, and lead to poor statistical performance. Previous work established that failing to accommodate multinucleotide (or multihit, MH) substitutions strongly biases dN/dS-based inference towards false-positive inferences of diversifying episodic selection, as does failing to model variation in the rate of synonymous substitution (SRV) among sites. Here, we develop an integrated analytical framework and software tools to simultaneously incorporate these sources of evolutionary complexity into selection analyses. We found that both MH and SRV are ubiquitous in empirical alignments, and incorporating them has a strong effect on whether or not positive selection is detected (1.4-fold reduction) and on the distributions of inferred evolutionary rates. With simulation studies, we show that this effect is not attributable to reduced statistical power caused by using a more complex model. After a detailed examination of 21 benchmark alignments and a new high-resolution analysis showing which parts of the alignment provide support for positive selection, we show that MH substitutions occurring along shorter branches in the tree explain a significant fraction of discrepant results in selection detection. Our results add to the growing body of literature which examines decades-old modeling assumptions (including MH) and finds them to be problematic for comparative genomic data analysis. 
Because multinucleotide substitutions have a significant impact on natural selection detection even at the level of an entire gene, we recommend that selection analyses of this type consider their inclusion as a matter of routine. To facilitate this procedure, we developed, implemented, and benchmarked a simple and well-performing model testing selection detection framework able to screen an alignment for positive selection with two biologically important confounding processes: site-to-site synonymous rate variation, and multinucleotide instantaneous substitutions.
Affiliation(s)
- Alexander G Lucaci
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA
- Jordan D Zehr
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA
- David Enard
  - Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, Arizona
- Joseph W Thornton
  - Department of Human Genetics, University of Chicago, Chicago, Illinois
  - Department of Ecology & Evolution, University of Chicago, Chicago, Illinois
2
Borges DGF, Carvalho DS, Bomfim GC, Ramos PIP, Brzozowski J, Góes-Neto A, Andrade RFS, El-Hani C. On the origin of mitochondria: a multilayer network approach. PeerJ 2023; 11:e14571. [PMID: 36632145 PMCID: PMC9828282 DOI: 10.7717/peerj.14571] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/21/2021] [Accepted: 11/28/2022] [Indexed: 01/08/2023] Open
Abstract
Background: The endosymbiotic theory is widely accepted to explain the origin of mitochondria from a bacterial ancestor. While ample evidence supports the intimate connection of Alphaproteobacteria to the mitochondrial ancestor, pinpointing its closest relative within sampled Alphaproteobacteria is still an open evolutionary debate. Many different phylogenetic methods and approaches have been used to answer this challenging question, which is further compounded by the heterogeneity of sampled taxa, varying evolutionary rates of mitochondrial proteins, and the inherent biases in each method, all factors that can produce phylogenetic artifacts. By harnessing the simplicity and interpretability of protein similarity networks, herein we re-evaluated the origin of mitochondria within an enhanced multilayer framework, which is an extension and improvement of a previously developed method.
Methods: We used a dataset of eight proteins found in mitochondria (N = 6 organisms) and bacteria (N = 80 organisms). The sequences were aligned and the resulting identity matrices were combined to generate an eight-layer multiplex network. Each layer corresponded to a protein network, where nodes represented organisms and edges were placed following mutual sequence identity. The Multi-Newman-Girvan algorithm was applied to evaluate community structure, and bifurcation events linked to network partition allowed us to trace patterns of divergence between the studied taxa.
Results: In our network-based analysis, we first examined the topology of the 8-layer multiplex when mitochondrial sequences disconnected from the main alphaproteobacterial cluster. The resulting topology lent firm support toward an Alphaproteobacteria-sister placement for mitochondria, reinforcing the hypothesis that mitochondria diverged from the common ancestor of all Alphaproteobacteria. Additionally, we observed that the divergence of Rickettsiales was an early event in the evolutionary history of alphaproteobacterial clades.
Conclusion: By applying complex network methods to the challenging question of circumscribing mitochondrial origin, we suggest that the entire Alphaproteobacteria clade is the closest relative to mitochondria (Alphaproteobacterial-sister hypothesis), echoing recent findings based on different datasets and methodologies.
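The layer construction described in the Methods can be illustrated with a minimal sketch. It assumes that an edge is placed whenever pairwise identity meets a fixed cutoff; the abstract does not state the exact rule, so the `threshold` parameter, the function names, and the toy matrices are all hypothetical, and the Multi-Newman-Girvan community analysis is not reproduced here.

```python
from itertools import combinations


def build_layer(identity, threshold):
    """Edge set for one protein layer: organism pairs whose identity meets the cutoff.

    `identity` is a symmetric matrix (list of lists) of percent identities.
    The fixed-cutoff rule is an assumption made for illustration only.
    """
    n = len(identity)
    return {(i, j) for i, j in combinations(range(n), 2) if identity[i][j] >= threshold}


def build_multiplex(identity_by_protein, threshold):
    """One layer per protein; all layers share the same node set (organisms)."""
    return {name: build_layer(matrix, threshold) for name, matrix in identity_by_protein.items()}


def edge_support(multiplex):
    """Count in how many layers each organism pair is connected."""
    support = {}
    for edges in multiplex.values():
        for edge in edges:
            support[edge] = support.get(edge, 0) + 1
    return support


# Hypothetical toy data: 3 organisms, 2 proteins (values are made up).
toy = {
    "protein_A": [[100, 80, 30], [80, 100, 40], [30, 40, 100]],
    "protein_B": [[100, 75, 55], [75, 100, 20], [55, 20, 100]],
}
multiplex = build_multiplex(toy, threshold=50)
support = edge_support(multiplex)
```

With the toy cutoff of 50, organisms 0 and 1 are linked in both layers, while organisms 0 and 2 are linked only in the second. Raising the cutoff step by step and recording when groups of nodes disconnect is one way to picture the bifurcation events the abstract mentions.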
Affiliation(s)
- Daniel S. Carvalho
  - Institute of Biological Sciences, Federal University of Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Gilberto C. Bomfim
  - Institute of Biology, Federal University of Bahia, Salvador, Bahia, Brazil
- Jerzy Brzozowski
  - Philosophy Department, Federal University of Santa Catarina, Florianópolis, Santa Catarina, Brazil
- Aristóteles Góes-Neto
  - Institute of Biological Sciences, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
  - Graduate Program in Bioinformatics, Institute of Biological Sciences, Federal University of Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Roberto F. S. Andrade
  - Institute of Physics, Federal University of Bahia, Salvador, Bahia, Brazil
  - National Institute of Science and Technology in Interdisciplinary and Transdisciplinary Studies in Ecology and Evolution (INCT IN-TREE), Salvador, Bahia, Brazil
- Charbel El-Hani
  - Institute of Biology, Federal University of Bahia, Salvador, Bahia, Brazil
  - National Institute of Science and Technology in Interdisciplinary and Transdisciplinary Studies in Ecology and Evolution (INCT IN-TREE), Salvador, Bahia, Brazil
3
Adam PS, Kolyfetis GE, Bornemann TLV, Vorgias CE, Probst AJ. Genomic remnants of ancestral methanogenesis and hydrogenotrophy in Archaea drive anaerobic carbon cycling. Science Advances 2022; 8:eabm9651. [PMID: 36332026 PMCID: PMC9635834 DOI: 10.1126/sciadv.abm9651] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 10/25/2021] [Accepted: 09/19/2022] [Indexed: 05/19/2023]
Abstract
Anaerobic methane metabolism is among the hallmarks of Archaea, originating very early in their evolution. Here, we show that the ancestor of methane metabolizers was an autotrophic CO2-reducing hydrogenotrophic methanogen that possessed the two main complexes, methyl-CoM reductase (Mcr) and tetrahydromethanopterin-CoM methyltransferase (Mtr), the anaplerotic hydrogenases Eha and Ehb, and a set of other genes collectively called "methanogenesis markers" but could not oxidize alkanes. Overturning recent inferences, we demonstrate that methyl-dependent hydrogenotrophic methanogenesis has emerged multiple times independently, either due to a loss of Mtr while Mcr is inherited vertically or from an ancient lateral acquisition of Mcr. Even if Mcr is lost, Mtr, Eha, Ehb, and the markers can persist, resulting in mixotrophic metabolisms centered around the Wood-Ljungdahl pathway. Through their methanogenesis remnants, Thorarchaeia and two newly reconstructed order-level lineages in Archaeoglobi and Bathyarchaeia act as metabolically versatile players in carbon cycling of anoxic environments across the globe.
Affiliation(s)
- Panagiotis S. Adam
  - Environmental Microbiology and Biotechnology, Faculty of Chemistry, University of Duisburg-Essen, Universitätsstraße 5, 45141 Essen, Germany
  - Corresponding author.
- George E. Kolyfetis
  - Environmental Microbiology and Biotechnology, Faculty of Chemistry, University of Duisburg-Essen, Universitätsstraße 5, 45141 Essen, Germany
  - Department of Biochemistry and Molecular Biology, Faculty of Biology, National and Kapodistrian University of Athens, Panepistimiopolis Zografou, 15784 Athens, Greece
- Till L. V. Bornemann
  - Environmental Microbiology and Biotechnology, Faculty of Chemistry, University of Duisburg-Essen, Universitätsstraße 5, 45141 Essen, Germany
- Constantinos E. Vorgias
  - Department of Biochemistry and Molecular Biology, Faculty of Biology, National and Kapodistrian University of Athens, Panepistimiopolis Zografou, 15784 Athens, Greece
- Alexander J. Probst
  - Environmental Microbiology and Biotechnology, Faculty of Chemistry, University of Duisburg-Essen, Universitätsstraße 5, 45141 Essen, Germany
  - Centre for Water and Environmental Research (ZWU), University of Duisburg-Essen, Universitätsstraße 5, 45141 Essen, Germany
  - Research Center One Health Ruhr, Research Alliance Ruhr, Environmental Metagenomics, University of Duisburg-Essen, Universitätsstraße 5, 45141 Essen, Germany
4
Abstract
The reconstruction of genetic material of ancestral organisms constitutes a powerful application of evolutionary biology. A fundamental step in this inference is the ancestral sequence reconstruction (ASR), which can be performed with diverse methodologies implemented in computer frameworks. However, most of these methodologies ignore evolutionary properties frequently observed in microbes, such as genetic recombination and complex selection processes, that can bias the traditional ASR. From a practical perspective, here I review methodologies for the reconstruction of ancestral DNA and protein sequences, with particular focus on microbes, and including biases, recommendations, and software implementations. I conclude that microbial ASR is a complex analysis that should be carefully performed and that there is a need for methods to infer more realistic ancestral microbial sequences.
Affiliation(s)
- Miguel Arenas
  - Biomedical Research Center (CINBIO), University of Vigo, Vigo, Spain.
  - Department of Biochemistry, Genetics and Immunology, University of Vigo, Vigo, Spain.
  - Galicia Sur Health Research Institute (IIS Galicia Sur), Vigo, Spain.
5
Spielman SJ, Miraglia ML. Relative model selection of evolutionary substitution models can be sensitive to multiple sequence alignment uncertainty. BMC Ecol Evol 2021; 21:214. [PMID: 34844571 PMCID: PMC8628390 DOI: 10.1186/s12862-021-01931-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/19/2021] [Accepted: 10/06/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Multiple sequence alignments (MSAs) represent the fundamental unit of data input to most comparative sequence analyses. In phylogenetic analyses in particular, errors in MSA construction have the potential to induce further errors in downstream analyses such as phylogenetic reconstruction itself, ancestral state reconstruction, and divergence time estimation. In addition to providing phylogenetic methods with an MSA to analyze, researchers must also specify a suitable evolutionary model for the given analysis. Most commonly, researchers apply relative model selection to select a model from a candidate set and then provide both the MSA and the selected model as input to subsequent analyses. While the influence of MSA errors has been explored for most stages of phylogenetics pipelines, the potential effects of MSA uncertainty on the relative model selection procedure itself have not been explored.
RESULTS: We assessed the consistency of relative model selection when presented with multiple perturbed versions of a given MSA. We find that while relative model selection is mostly robust to MSA uncertainty, in a substantial proportion of circumstances, relative model selection identifies distinct best-fitting models from different MSAs created from the same set of sequences. We find that this issue is more pervasive for nucleotide data compared to amino-acid data. However, we also find that it is challenging to predict whether relative model selection will be robust or sensitive to uncertainty in a given MSA.
CONCLUSIONS: We find that MSA uncertainty can affect virtually all steps of phylogenetic analysis pipelines to a greater extent than has previously been recognized, including relative model selection.
Affiliation(s)
- Molly L Miraglia
  - Department of Molecular and Cellular Biosciences, Rowan University, Glassboro, NJ, 08028, USA
  - Fox Chase Cancer Center, Philadelphia, PA, 19111, USA
6
Spielman SJ. Relative Model Fit Does Not Predict Topological Accuracy in Single-Gene Protein Phylogenetics. Mol Biol Evol 2021; 37:2110-2123. [PMID: 32191313 PMCID: PMC7306691 DOI: 10.1093/molbev/msaa075] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Indexed: 12/12/2022] Open
Abstract
It is regarded as best practice in phylogenetic reconstruction to perform relative model selection to determine an appropriate evolutionary model for the data. This procedure ranks a set of candidate models according to their goodness of fit to the data, commonly using an information theoretic criterion. Users then specify the best-ranking model for inference. Although it is often assumed that better-fitting models translate to increased accuracy, recent studies have shown that the specific model employed may not substantially affect inferences. We examine whether there is a systematic relationship between relative model fit and topological inference accuracy in protein phylogenetics, using simulations and real sequences. Simulations employed site-heterogeneous mechanistic codon models that are distinct from protein-level phylogenetic inference models, allowing us to investigate how protein models perform when they are misspecified to the data, as will be the case for any real sequence analysis. We broadly find that phylogenies inferred across models with vastly different fits to the data produce highly consistent topologies. We additionally find that all models infer similar proportions of false-positive splits, raising the possibility that all available models of protein evolution are similarly misspecified. Moreover, we find that the parameter-rich GTR (general time reversible) model, whose amino acid exchangeabilities are free parameters, performs similarly to models with fixed exchangeabilities, although the inference precision associated with GTR models was not examined. We conclude that, although relative model selection may not hinder phylogenetic analysis on protein data, it may not offer specific predictable improvements and is not a reliable proxy for accuracy.
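The ranking step mentioned in the abstract can be sketched as follows, assuming the criterion is AIC (2k - 2 lnL) or BIC (k ln n - 2 lnL), two widely used information theoretic criteria. The abstract does not say which criterion was used; the model names, log-likelihoods, and parameter counts below are hypothetical placeholders, not results from the study.

```python
import math


def aic(log_likelihood, num_params):
    """Akaike information criterion: lower is better."""
    return 2 * num_params - 2 * log_likelihood


def bic(log_likelihood, num_params, sample_size):
    """Bayesian information criterion; sample_size is often taken as alignment length."""
    return num_params * math.log(sample_size) - 2 * log_likelihood


def rank_models(fits):
    """Sort candidate models by AIC, best (lowest) first.

    `fits` maps a model name to a (log_likelihood, num_params) pair.
    """
    scored = [(name, aic(lnl, k)) for name, (lnl, k) in fits.items()]
    return sorted(scored, key=lambda item: item[1])


# Hypothetical fits for three candidate models (values are made up).
fits = {
    "model_A": (-1000.0, 2),
    "model_B": (-990.0, 10),
    "model_C": (-995.0, 3),
}
ranking = rank_models(fits)
best_model = ranking[0][0]
```

In this toy case the model with the highest likelihood (`model_B`) is not ranked first, because its extra parameters are penalized. A ranking of this kind measures fit relative to the other candidates only, which is consistent with the abstract's point that it need not track topological accuracy.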
7
Tao Q, Barba-Montoya J, Huuki LA, Durnan MK, Kumar S. Relative Efficiencies of Simple and Complex Substitution Models in Estimating Divergence Times in Phylogenomics. Mol Biol Evol 2021; 37:1819-1831. [PMID: 32119075 PMCID: PMC7253201 DOI: 10.1093/molbev/msaa049] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Indexed: 12/13/2022] Open
Abstract
The conventional wisdom in molecular evolution is to apply parameter-rich models of nucleotide and amino acid substitutions for estimating divergence times. However, the actual extent of the difference between time estimates produced by highly complex models compared with those from simple models is yet to be quantified for contemporary data sets that frequently contain sequences from many species and genes. In a reanalysis of many large multispecies alignments from diverse groups of taxa, we found that the use of the simplest models can produce divergence time estimates and credibility intervals similar to those obtained from the complex models applied in the original studies. This result is surprising because the use of simple models underestimates sequence divergence for all the data sets analyzed. We found three fundamental reasons for the observed robustness of time estimates to model complexity in many practical data sets. First, the estimates of branch lengths and node-to-tip distances under the simplest model show an approximately linear relationship with those produced by using the most complex models applied on data sets with many sequences. Second, relaxed clock methods automatically adjust rates on branches that experience considerable underestimation of sequence divergences, resulting in time estimates that are similar to those from complex models. And, third, the inclusion of even a few good calibrations in an analysis can reduce the difference in time estimates from simple and complex models. The robustness of time estimates to model complexity in these empirical data analyses is encouraging, because all phylogenomics studies use statistical models that are oversimplified descriptions of actual evolutionary substitution processes.
Affiliation(s)
- Qiqing Tao
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA
  - Department of Biology, Temple University, Philadelphia, PA
- Jose Barba-Montoya
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA
  - Department of Biology, Temple University, Philadelphia, PA
- Louise A Huuki
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA
- Mary Kathleen Durnan
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA
  - Department of Biology, Temple University, Philadelphia, PA
- Sudhir Kumar
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA
  - Department of Biology, Temple University, Philadelphia, PA
  - Center for Excellence in Genome Medicine and Research, King Abdulaziz University, Jeddah, Saudi Arabia
8
Abstract
Proteins are commonly used as molecular targets against pathogens such as viruses and bacteria. However, pathogens can evolve rapidly, permitting their populations to increase in protein diversity over time and thus escape the activity of a molecular therapy. Consequently, to design more durable and robust therapies, as well as to understand viral evolution in a host and subsequent transmission, it is central to understand the evolution of pathogen proteins. This understanding can enable the detection of protein regions that are potential targets for therapies and the prediction of emerging molecular resistance against therapies. In this direction, two articles published recently in the Journal of Molecular Evolution investigated the evolution of proteomes of diverse flaviviruses, including Zika virus, Dengue virus and West Nile virus. Here I discuss the importance of considering the evolution of viral proteins, using models and methods that mimic protein evolution as realistically as possible, to improve the design of antiviral therapies.
9
Abadi S, Azouri D, Pupko T, Mayrose I. Model selection may not be a mandatory step for phylogeny reconstruction. Nat Commun 2019; 10:934. [PMID: 30804347 PMCID: PMC6389923 DOI: 10.1038/s41467-019-08822-w] [Citation(s) in RCA: 204] [Impact Index Per Article: 34.0] [Received: 06/06/2018] [Accepted: 01/29/2019] [Indexed: 11/29/2022] Open
Abstract
Determining the most suitable model for phylogeny reconstruction constitutes a fundamental step in numerous evolutionary studies. Over the years, various criteria for model selection have been proposed, leading to debate over which criterion is preferable. However, the necessity of this procedure has not been questioned to date. Here, we demonstrate that although incongruency regarding the selected model is frequent over empirical and simulated data, all criteria lead to very similar inferences. When topologies and ancestral sequence reconstruction are the desired output, choosing one criterion over another is not crucial. Moreover, skipping model selection and using instead the most parameter-rich model, GTR+I+G, leads to similar inferences, thus rendering this time-consuming step nonessential, at least under current strategies of model selection.
Affiliation(s)
- Shiran Abadi
  - School of Plant Sciences and Food Security, Tel Aviv University, Ramat Aviv, Tel-Aviv, 69978, Israel
- Dana Azouri
  - School of Plant Sciences and Food Security, Tel Aviv University, Ramat Aviv, Tel-Aviv, 69978, Israel
  - School of Molecular Cell Biology & Biotechnology, Tel Aviv University, Ramat Aviv, Tel-Aviv, 69978, Israel
- Tal Pupko
  - School of Molecular Cell Biology & Biotechnology, Tel Aviv University, Ramat Aviv, Tel-Aviv, 69978, Israel
- Itay Mayrose
  - School of Plant Sciences and Food Security, Tel Aviv University, Ramat Aviv, Tel-Aviv, 69978, Israel
10
Echave J. Beyond Stability Constraints: A Biophysical Model of Enzyme Evolution with Selection on Stability and Activity. Mol Biol Evol 2018; 36:613-620. [DOI: 10.1093/molbev/msy244] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Indexed: 12/14/2022] Open
Affiliation(s)
- Julian Echave
  - Escuela de Ciencia y Tecnología, Universidad Nacional de San Martín (UNSAM), Buenos Aires, Argentina
11
Large-Scale Analyses of Site-Specific Evolutionary Rates across Eukaryote Proteomes Reveal Confounding Interactions between Intrinsic Disorder, Secondary Structure, and Functional Domains. Genes (Basel) 2018; 9:genes9110553. [PMID: 30441862 PMCID: PMC6265720 DOI: 10.3390/genes9110553] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 10/01/2018] [Revised: 11/09/2018] [Accepted: 11/09/2018] [Indexed: 12/31/2022] Open
Abstract
Various structural and functional constraints govern the evolution of protein sequences. As a result, the relative rates of amino acid replacement among sites within a protein can vary significantly. Previous large-scale work on Metazoan (Animal) protein sequence alignments indicated that amino acid replacement rates are partially driven by a complex interaction among three factors: intrinsic disorder propensity; secondary structure; and functional domain involvement. Here, we use sequence-based predictors to evaluate the effects of these factors on site-specific sequence evolutionary rates within four eukaryotic lineages: Metazoans; Plants; Saccharomycete Fungi; and Alveolate Protists. Our results show broad, consistent trends across all four Eukaryote groups. In all four lineages, there is a significant increase in amino acid replacement rates when comparing: (i) disordered vs. ordered sites; (ii) random coil sites vs. sites in secondary structures; and (iii) inter-domain linker sites vs. sites in functional domains. Additionally, within Metazoans, Plants, and Saccharomycetes, there is a strong confounding interaction between intrinsic disorder and secondary structure: alignment sites exhibiting both high disorder propensity and involvement in secondary structures have very low average rates of sequence evolution. Analysis of gene ontology (GO) terms revealed that in all four lineages, a high fraction of sequences containing these conserved, disordered-structured sites are involved in nucleic acid binding. We also observe notable differences in the statistical trends of Alveolates, where intrinsically disordered sites are more variable than in other Eukaryotes and the statistical interactions between disorder and other factors are less pronounced.