1
Martin DP, Weaver S, Tegally H, San JE, Shank SD, Wilkinson E, Lucaci AG, Giandhari J, Naidoo S, Pillay Y, Singh L, Lessells RJ, Gupta RK, Wertheim JO, Nekrutenko A, Murrell B, Harkins GW, Lemey P, MacLean OA, Robertson DL, de Oliveira T, Kosakovsky Pond SL. The emergence and ongoing convergent evolution of the SARS-CoV-2 N501Y lineages. Cell 2021; 184:5189-5200.e7. [PMID: 34537136 PMCID: PMC8421097 DOI: 10.1016/j.cell.2021.09.003] [Citation(s) in RCA: 160] [Impact Index Per Article: 40.0] [Received: 02/24/2021] [Revised: 07/05/2021] [Accepted: 09/01/2021] [Indexed: 12/18/2022]
Abstract
The independent emergence late in 2020 of the B.1.1.7, B.1.351, and P.1 lineages of SARS-CoV-2 prompted renewed concerns about the evolutionary capacity of this virus to overcome public health interventions and rising population immunity. Here, by examining patterns of synonymous and non-synonymous mutations that have accumulated in SARS-CoV-2 genomes since the pandemic began, we find that the emergence of these three "501Y lineages" coincided with a major global shift in the selective forces acting on various SARS-CoV-2 genes. Following their emergence, the adaptive evolution of 501Y lineage viruses has involved repeated selectively favored convergent mutations at 35 genome sites, mutations we refer to as the 501Y meta-signature. The ongoing convergence of viruses in many other lineages on this meta-signature suggests that it includes multiple mutation combinations capable of promoting the persistence of diverse SARS-CoV-2 lineages in the face of mounting host immune recognition.
Affiliation(s)
- Darren P Martin
- Institute of Infectious Diseases and Molecular Medicine, Division of Computational Biology, Department of Integrative Biomedical Sciences, University of Cape Town, Cape Town 7701, South Africa
- Steven Weaver
- Institute for Genomics and Evolutionary Medicine, Department of Biology, Temple University, Philadelphia, PA 19122, USA
- Houriiyah Tegally
- KwaZulu-Natal Research Innovation and Sequencing Platform, School of Laboratory Medicine & Medical Sciences, University of KwaZulu-Natal, Durban 4001, South Africa
- James Emmanuel San
- KwaZulu-Natal Research Innovation and Sequencing Platform, School of Laboratory Medicine & Medical Sciences, University of KwaZulu-Natal, Durban 4001, South Africa
- Stephen D Shank
- Institute for Genomics and Evolutionary Medicine, Department of Biology, Temple University, Philadelphia, PA 19122, USA
- Eduan Wilkinson
- KwaZulu-Natal Research Innovation and Sequencing Platform, School of Laboratory Medicine & Medical Sciences, University of KwaZulu-Natal, Durban 4001, South Africa
- Alexander G Lucaci
- Institute for Genomics and Evolutionary Medicine, Department of Biology, Temple University, Philadelphia, PA 19122, USA
- Jennifer Giandhari
- KwaZulu-Natal Research Innovation and Sequencing Platform, School of Laboratory Medicine & Medical Sciences, University of KwaZulu-Natal, Durban 4001, South Africa
- Sureshnee Naidoo
- KwaZulu-Natal Research Innovation and Sequencing Platform, School of Laboratory Medicine & Medical Sciences, University of KwaZulu-Natal, Durban 4001, South Africa
- Yeshnee Pillay
- KwaZulu-Natal Research Innovation and Sequencing Platform, School of Laboratory Medicine & Medical Sciences, University of KwaZulu-Natal, Durban 4001, South Africa
- Lavanya Singh
- KwaZulu-Natal Research Innovation and Sequencing Platform, School of Laboratory Medicine & Medical Sciences, University of KwaZulu-Natal, Durban 4001, South Africa
- Richard J Lessells
- KwaZulu-Natal Research Innovation and Sequencing Platform, School of Laboratory Medicine & Medical Sciences, University of KwaZulu-Natal, Durban 4001, South Africa
- Ravindra K Gupta
- Clinical Microbiology, University of Cambridge, Cambridge CB2 1TN, UK; Africa Health Research Institute, KwaZulu-Natal 4013, South Africa
- Joel O Wertheim
- Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Anton Nekrutenko
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, State College, PA 16802, USA
- Ben Murrell
- Department of Microbiology, Tumor and Cell Biology, Karolinska Institutet, Stockholm 141 83, Sweden
- Gordon W Harkins
- South African Medical Research Council Capacity Development Unit, South African National Bioinformatics Institute, University of the Western Cape, Bellville 7635, South Africa
- Philippe Lemey
- Department of Microbiology, Immunology and Transplantation, Rega Institute, KU Leuven, Leuven 3000, Belgium
- Oscar A MacLean
- MRC-University of Glasgow Centre for Virus Research, Glasgow G12 8QQ, Scotland, UK
- David L Robertson
- MRC-University of Glasgow Centre for Virus Research, Glasgow G12 8QQ, Scotland, UK
- Tulio de Oliveira
- KwaZulu-Natal Research Innovation and Sequencing Platform, School of Laboratory Medicine & Medical Sciences, University of KwaZulu-Natal, Durban 4001, South Africa; Department of Global Health, University of Washington, Seattle, WA 98195-4550, USA
- Sergei L Kosakovsky Pond
- Institute for Genomics and Evolutionary Medicine, Department of Biology, Temple University, Philadelphia, PA 19122, USA
2
Kosakovsky Pond SL, Wisotsky SR, Escalante A, Magalis BR, Weaver S. Contrast-FEL-A Test for Differences in Selective Pressures at Individual Sites among Clades and Sets of Branches. Mol Biol Evol 2021; 38:1184-1198. [PMID: 33064823 PMCID: PMC7947784 DOI: 10.1093/molbev/msaa263] [Citation(s) in RCA: 38] [Impact Index Per Article: 9.5] [Indexed: 12/27/2022]
Abstract
A number of evolutionary hypotheses can be tested by comparing selective pressures among sets of branches in a phylogenetic tree. When the question of interest is to identify specific sites within genes that may be evolving differently, a common approach is to perform separate analyses on subsets of sequences and compare parameter estimates in a post hoc fashion. This approach is statistically suboptimal and not always applicable. Here, we develop a simple extension of a popular fixed effects likelihood method in the context of codon-based evolutionary phylogenetic maximum likelihood testing, Contrast-FEL. It is suitable for identifying individual alignment sites where any among the K≥2 sets of branches in a phylogenetic tree have detectably different ω ratios, indicative of different selective regimes. Using extensive simulations, we show that Contrast-FEL delivers good power, exceeding 90% for sufficiently large differences, while maintaining tight control over false positive rates, when the model is correctly specified. We conclude by applying Contrast-FEL to data from five previously published studies spanning a diverse range of organisms and focusing on different evolutionary questions.
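The per-site comparison described in this abstract can be illustrated with a generic likelihood ratio test. The sketch below is not the Contrast-FEL implementation: the function name, the per-site log-likelihood values, and the choice of a chi-square reference distribution with one degree of freedom (two branch sets, one extra ω parameter) are all assumptions made for illustration.

```python
import math


def lrt_pvalue_df1(loglik_null: float, loglik_alt: float) -> float:
    """P-value of a likelihood ratio test against a chi-square with 1 df.

    loglik_null: site log-likelihood with one omega shared by both branch sets.
    loglik_alt:  site log-likelihood with a separate omega per branch set.
    """
    # The test statistic is twice the log-likelihood gain; clamp at zero
    # to guard against small negative values from numerical optimization.
    stat = max(0.0, 2.0 * (loglik_alt - loglik_null))
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(stat / 2.0))


# Hypothetical per-site log-likelihoods (null, alternative); not real data.
sites = {"site_12": (-35.20, -31.10), "site_47": (-28.75, -28.60)}
for name, (l0, l1) in sites.items():
    p = lrt_pvalue_df1(l0, l1)
    verdict = "omega differs" if p < 0.05 else "no evidence of a difference"
    print(name, f"p={p:.4f}", verdict)
```

With more than two branch sets, the degrees of freedom would grow with the number of extra ω parameters, and a general chi-square tail function would be needed in place of the `erfc` shortcut.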
Affiliation(s)
- Sadie R Wisotsky
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA
- Ananias Escalante
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA
- Brittany Rife Magalis
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA
- Emerging Pathogens Institute, University of Florida, Gainesville, FL
- Steven Weaver
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA
3
Puller V, Sagulenko P, Neher RA. Efficient inference, potential, and limitations of site-specific substitution models. Virus Evol 2020; 6:veaa066. [PMID: 33343922 PMCID: PMC7733610 DOI: 10.1093/ve/veaa066] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/21/2022]
Abstract
Natural selection imposes a complex filter on which variants persist in a population, resulting in evolutionary patterns that vary greatly along the genome. Some sites evolve close to neutrally, while others are highly conserved, allow only specific states, or only change in concert with other sites. On the one hand, such constraints on sequence evolution can be used to infer biological function; on the other hand, they need to be accounted for in phylogenetic reconstruction. Phylogenetic models often account for this complexity by partitioning sites into a small number of discrete classes with different rates and/or state preferences. Appropriate model complexity is typically determined by model selection procedures. Here, we present an efficient algorithm to estimate more complex models that allow for different preferences at every site and explore the accuracy at which such models can be estimated from simulated data. Our iterative approximate maximum likelihood scheme uses information in the data efficiently and accurately estimates site-specific preferences from large data sets with moderately diverged sequences and known topology. However, the joint estimation of site-specific rates, site-specific preferences, and phylogenetic branch lengths can suffer from identifiability problems, while ignoring variation in preferences across sites results in branch length underestimates. Site-specific preferences estimated from large HIV pol alignments show qualitative concordance with intra-host estimates of fitness costs. Analysis of these substitution models suggests near saturation of divergence after a few hundred years. Such saturation can explain the inability to infer deep divergence times of HIV and SIVs using molecular clock approaches and time-dependent rate estimates.
Affiliation(s)
- Vadim Puller
- Biozentrum, University of Basel, Klingelbergstrasse 50/70, 4056 Basel, Switzerland; SIB Swiss Institute of Bioinformatics, Klingelbergstrasse 61, Basel, Switzerland
- Pavel Sagulenko
- Max Planck Institute for Developmental Biology, Max-Planck-Ring 5, 72076 Tübingen, Germany
- Richard A Neher
- Biozentrum, University of Basel, Klingelbergstrasse 50/70, 4056 Basel, Switzerland; SIB Swiss Institute of Bioinformatics, Klingelbergstrasse 61, Basel, Switzerland
4
Spielman SJ, Kosakovsky Pond SL. Relative Evolutionary Rates in Proteins Are Largely Insensitive to the Substitution Model. Mol Biol Evol 2018; 35:2307-2317. [PMID: 29924340 PMCID: PMC6107055 DOI: 10.1093/molbev/msy127] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Indexed: 12/15/2022]
Abstract
The relative evolutionary rates at individual sites in proteins are informative measures of conservation or adaptation. Often used as evolutionarily aware conservation scores, relative rates reveal key functional or strongly selected residues. Estimating rates in a phylogenetic context requires specifying a protein substitution model, which is typically a phenomenological model trained on a large empirical data set. A strong emphasis has traditionally been placed on selecting the "best-fit" model, with the implicit understanding that suboptimal or otherwise ill-fitting models might bias inferences. However, the pervasiveness and degree of such bias has not been systematically examined. We investigated how model choice impacts site-wise relative rates in a large set of empirical protein alignments. We compared models designed for use on any general protein, models designed for specific domains of life, and the simple equal-rates Jukes Cantor-style model (JC). As expected, information theoretic measures showed overwhelming evidence that some models fit the data decidedly better than others. By contrast, estimates of site-specific evolutionary rates were impressively insensitive to the substitution model used, revealing an unexpected degree of robustness to potential model misspecification. A deeper examination of the fewer than 5% of sites for which model inferences differed in a meaningful way showed that the JC model could uniquely identify rapidly evolving sites that models with empirically derived exchangeabilities failed to detect. We conclude that relative protein rates appear robust to the applied substitution model, and any sensible model of protein evolution, regardless of its fit to the data, should produce broadly consistent evolutionary rates.
Affiliation(s)
- Stephanie J Spielman
- Department of Biology, Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA
- Sergei L Kosakovsky Pond
- Department of Biology, Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA
5
Spielman SJ, Kosakovsky Pond SL. Relative evolutionary rate inference in HyPhy with LEISR. PeerJ 2018; 6:e4339. [PMID: 29423346 PMCID: PMC5804317 DOI: 10.7717/peerj.4339] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 12/15/2017] [Accepted: 01/18/2018] [Indexed: 01/10/2023]
Abstract
We introduce LEISR (Likelihood Estimation of Individual Site Rates, pronounced “laser”), a tool to infer relative evolutionary rates from protein and nucleotide data, implemented in HyPhy. LEISR is based on the popular Rate4Site (Pupko et al., 2002) approach for inferring relative site-wise evolutionary rates, primarily from protein data. We extend the original method for more general use in several key ways: (i) we increase the support for nucleotide data with additional models, (ii) we allow for datasets of arbitrary size, (iii) we support analysis of site-partitioned datasets to correct for the presence of recombination breakpoints, (iv) we produce rate estimates at all sites rather than at just a subset of sites, and (v) we implemented LEISR as MPI-enabled to support rapid, high-throughput analysis. LEISR is available in HyPhy starting with version 2.3.8, and it is accessible as an option in the HyPhy analysis menu (“Relative evolutionary rate inference”), which calls the HyPhy batchfile LEISR.bf.
Affiliation(s)
- Stephanie J Spielman
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, United States of America
- Sergei L Kosakovsky Pond
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, United States of America
6
Spielman SJ, Wilke CO. Extensively Parameterized Mutation-Selection Models Reliably Capture Site-Specific Selective Constraint. Mol Biol Evol 2016; 33:2990-3002. [PMID: 27512115 DOI: 10.1093/molbev/msw171] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Indexed: 02/07/2023]
Abstract
The mutation-selection model of coding sequence evolution has received renewed attention for its use in estimating site-specific amino acid propensities and selection coefficient distributions. Two computationally tractable mutation-selection inference frameworks have been introduced: One framework employs a fixed-effects, highly parameterized maximum likelihood approach, whereas the other employs a random-effects Bayesian Dirichlet Process approach. While both implementations follow the same model, they appear to make distinct predictions about the distribution of selection coefficients. The fixed-effects framework estimates a large proportion of highly deleterious substitutions, whereas the random-effects framework estimates that all substitutions are either nearly neutral or weakly deleterious. It remains unknown, however, how accurately each method infers evolutionary constraints at individual sites. Indeed, selection coefficient distributions pool all site-specific inferences, thereby obscuring a precise assessment of site-specific estimates. Therefore, in this study, we use a simulation-based strategy to determine how accurately each approach recapitulates the selective constraint at individual sites. We find that the fixed-effects approach, despite its extensive parameterization, consistently and accurately estimates site-specific evolutionary constraint. By contrast, the random-effects Bayesian approach systematically underestimates the strength of natural selection, particularly for slowly evolving sites. We also find that, despite the strong differences between their inferred selection coefficient distributions, the fixed- and random-effects approaches yield surprisingly similar inferences of site-specific selective constraint. We conclude that the fixed-effects mutation-selection framework provides the more reliable software platform for model application and future development.
Affiliation(s)
- Stephanie J Spielman
- Department of Integrative Biology, Center for Computational Biology and Bioinformatics, The University of Texas at Austin, Austin, TX; Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, TX; Present address: Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA
- Claus O Wilke
- Department of Integrative Biology, Center for Computational Biology and Bioinformatics, The University of Texas at Austin, Austin, TX; Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, TX
7
Mingrone J, Susko E, Bielawski J. Smoothed Bootstrap Aggregation for Assessing Selection Pressure at Amino Acid Sites. Mol Biol Evol 2016; 33:2976-2989. [PMID: 27486222 DOI: 10.1093/molbev/msw160] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Indexed: 11/14/2022]
Abstract
To detect positive selection at individual amino acid sites, most methods use an empirical Bayes approach. After parameters of a Markov process of codon evolution are estimated via maximum likelihood, they are passed to Bayes formula to compute the posterior probability that a site evolved under positive selection. A difficulty with this approach is that parameter estimates with large errors can negatively impact Bayesian classification. By assigning priors to some parameters, Bayes Empirical Bayes (BEB) mitigates this problem. However, as implemented, it imposes uniform priors, which causes it to be overly conservative in some cases. When standard regularity conditions are not met and parameter estimates are unstable, inference, even under BEB, can be negatively impacted. We present an alternative to BEB called smoothed bootstrap aggregation (SBA), which bootstraps site patterns from an alignment of protein coding DNA sequences to accommodate the uncertainty in the parameter estimates. We show that deriving the correction for parameter uncertainty from the data in hand, in combination with kernel smoothing techniques, improves site specific inference of positive selection. We compare BEB to SBA by simulation and real data analysis. Simulation results show that SBA balances accuracy and power at least as well as BEB, and when parameter estimates are unstable, the performance gap between BEB and SBA can widen in favor of SBA. SBA is applicable to a wide variety of other inference problems in molecular evolution.
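The empirical Bayes step mentioned at the start of this abstract (plugging maximum likelihood estimates into Bayes formula) can be sketched as follows. This is a minimal illustration, not the BEB or SBA procedure from the paper: the function name, the three ω classes, and all numeric values are hypothetical.

```python
from typing import Sequence


def posterior_positive(
    site_likelihoods: Sequence[float],
    class_weights: Sequence[float],
    positive_index: int,
) -> float:
    """Posterior probability that a site belongs to the positive-selection class.

    site_likelihoods: likelihood of the site's data under each omega class.
    class_weights:    estimated proportion of sites in each class (sums to 1).
    positive_index:   position of the class with omega > 1.
    """
    # Bayes formula: prior weight times likelihood, normalized over all classes.
    joint = [w * lik for w, lik in zip(class_weights, site_likelihoods)]
    return joint[positive_index] / sum(joint)


# Hypothetical estimates for three classes: purifying, neutral, positive.
weights = [0.7, 0.2, 0.1]
likelihoods = [1e-6, 4e-6, 3e-5]
print(round(posterior_positive(likelihoods, weights, 2), 4))
```

Because the weights and likelihoods are treated as known, any error in those estimates carries straight into the posterior, which is the weakness the abstract attributes to this approach.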
Affiliation(s)
- Joseph Mingrone
- Department of Mathematics and Statistics, Dalhousie University, Halifax, NS, Canada; Centre for Comparative Genomics and Evolutionary Bioinformatics, Dalhousie University, Halifax, NS, Canada
- Edward Susko
- Department of Mathematics and Statistics, Dalhousie University, Halifax, NS, Canada; Centre for Comparative Genomics and Evolutionary Bioinformatics, Dalhousie University, Halifax, NS, Canada
- Joseph Bielawski
- Department of Mathematics and Statistics, Dalhousie University, Halifax, NS, Canada; Centre for Comparative Genomics and Evolutionary Bioinformatics, Dalhousie University, Halifax, NS, Canada; Department of Biology, Dalhousie University, Halifax, NS, Canada
8
Smith MD, Wertheim JO, Weaver S, Murrell B, Scheffler K, Kosakovsky Pond SL. Less is more: an adaptive branch-site random effects model for efficient detection of episodic diversifying selection. Mol Biol Evol 2015; 32:1342-53. [PMID: 25697341 DOI: 10.1093/molbev/msv022] [Citation(s) in RCA: 501] [Impact Index Per Article: 50.1] [Indexed: 01/21/2023]
Abstract
Over the past two decades, comparative sequence analysis using codon-substitution models has been honed into a powerful and popular approach for detecting signatures of natural selection from molecular data. A substantial body of work has focused on developing a class of "branch-site" models which permit selective pressures on sequences, quantified by the ω ratio, to vary among both codon sites and individual branches in the phylogeny. We develop and present a method in this class, adaptive branch-site random effects likelihood (aBSREL), whose key innovation is variable parametric complexity chosen with an information theoretic criterion. By applying models of different complexity to different branches in the phylogeny, aBSREL delivers statistical performance matching or exceeding best-in-class existing approaches, while running an order of magnitude faster. Based on simulated data analysis, we offer guidelines for what extent and strength of diversifying positive selection can be detected reliably and suggest that there is a natural limit on the optimal parametric complexity for "branch-site" models. An aBSREL analysis of 8,893 Euteleostomes gene alignments demonstrates that over 80% of branches in typical gene phylogenies can be adequately modeled with a single ω ratio model, that is, current models are unnecessarily complicated. However, there are a relatively small number of key branches, whose identities are derived from the data using a model selection procedure, for which it is essential to accurately model evolutionary complexity.
Affiliation(s)
- Martin D Smith
- Graduate Program in Bioinformatics and Systems Biology, University of California San Diego
- Steven Weaver
- Department of Medicine, University of California San Diego
- Ben Murrell
- Department of Medicine, University of California San Diego
- Konrad Scheffler
- Department of Medicine, University of California San Diego; Department of Mathematical Sciences, Stellenbosch University, Stellenbosch, South Africa
9
Wertheim JO, Murrell B, Smith MD, Kosakovsky Pond SL, Scheffler K. RELAX: detecting relaxed selection in a phylogenetic framework. Mol Biol Evol 2014; 32:820-32. [PMID: 25540451 DOI: 10.1093/molbev/msu400] [Citation(s) in RCA: 464] [Impact Index Per Article: 42.2] [Indexed: 12/12/2022]
Abstract
Relaxation of selective strength, manifested as a reduction in the efficiency or intensity of natural selection, can drive evolutionary innovation and presage lineage extinction or loss of function. Mechanisms through which selection can be relaxed range from the removal of an existing selective constraint to a reduction in effective population size. Standard methods for estimating the strength and extent of purifying or positive selection from molecular sequence data are not suitable for detecting relaxed selection, because they lack power and can mistake an increase in the intensity of positive selection for relaxation of both purifying and positive selection. Here, we present a general hypothesis testing framework (RELAX) for detecting relaxed selection in a codon-based phylogenetic framework. Given two subsets of branches in a phylogeny, RELAX can determine whether selective strength was relaxed or intensified in one of these subsets relative to the other. We establish the validity of our test via simulations and show that it can distinguish between increased positive selection and a relaxation of selective strength. We also demonstrate the power of RELAX in a variety of biological scenarios where relaxation of selection has been hypothesized or demonstrated previously. We find that obligate and facultative γ-proteobacteria endosymbionts of insects are under relaxed selection compared with their free-living relatives and obligate endosymbionts are under relaxed selection compared with facultative endosymbionts. Selective strength is also relaxed in asexual Daphnia pulex lineages, compared with sexual lineages. Endogenous, nonfunctional, bornavirus-like elements are found to be under relaxed selection compared with exogenous Borna viruses. 
Finally, selection on the short-wavelength sensitive, SWS1, opsin genes in echolocating and nonecholocating bats is relaxed only in lineages in which this gene underwent pseudogenization; however, selection on the functional medium/long-wavelength sensitive opsin, M/LWS1, is found to be relaxed in all echolocating bats compared with nonecholocating bats.
Affiliation(s)
- Ben Murrell
- Department of Medicine, University of California, San Diego
- Martin D Smith
- Bioinformatics and Systems Biology Graduate Program, University of California, San Diego
- Konrad Scheffler
- Department of Medicine, University of California, San Diego; Department of Mathematical Sciences, Stellenbosch University, Stellenbosch, South Africa
10
Wertheim JO, Smith MD, Smith DM, Scheffler K, Kosakovsky Pond SL. Evolutionary origins of human herpes simplex viruses 1 and 2. Mol Biol Evol 2014; 31:2356-64. [PMID: 24916030 PMCID: PMC4137711 DOI: 10.1093/molbev/msu185] [Citation(s) in RCA: 98] [Impact Index Per Article: 8.9] [Indexed: 12/13/2022]
Abstract
Herpesviruses have been infecting and codiverging with their vertebrate hosts for hundreds of millions of years. The primate simplex viruses exemplify this pattern of virus-host codivergence, at a minimum, as far back as the most recent common ancestor of New World monkeys, Old World monkeys, and apes. Humans are the only primate species known to be infected with two distinct herpes simplex viruses: HSV-1 and HSV-2. Human herpes simplex viruses are ubiquitous, with over two-thirds of the human population infected by at least one virus. Here, we investigated whether the additional human simplex virus is the result of ancient viral lineage duplication or cross-species transmission. We found that standard phylogenetic models of nucleotide substitution are inadequate for distinguishing among these competing hypotheses; the extent of synonymous substitutions causes a substantial underestimation of the lengths of some of the branches in the phylogeny, consistent with observations in other viruses (e.g., avian influenza, Ebola, and coronaviruses). To more accurately estimate ancient viral divergence times, we applied a branch-site random effects likelihood model of molecular evolution that allows the strength of natural selection to vary across both the viral phylogeny and the gene alignment. This selection-informed model favored a scenario in which HSV-1 is the result of ancient codivergence and HSV-2 arose from a cross-species transmission event from the ancestor of modern chimpanzees to an extinct Homo precursor of modern humans, around 1.6 Ma. These results provide a new framework for understanding human herpes simplex virus evolution and demonstrate the importance of using selection-informed models of sequence evolution when investigating viral origin hypotheses.
Affiliation(s)
- Martin D Smith
- Bioinformatics and Systems Biology Graduate Program, University of California, San Diego
- Davey M Smith
- Department of Medicine, University of California, San Diego; Veterans Affairs San Diego Healthcare System, San Diego, CA
- Konrad Scheffler
- Department of Medicine, University of California, San Diego; Department of Mathematical Sciences, Stellenbosch University, Stellenbosch, South Africa