1
Thiltgen G, Dos Reis M, Goldstein RA. Finding Direction in the Search for Selection. J Mol Evol 2016; 84:39-50. [PMID: 27913840 PMCID: PMC5253163 DOI: 10.1007/s00239-016-9765-5] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 02/29/2016] [Accepted: 11/10/2016] [Indexed: 11/24/2022]
Abstract
Tests for positive selection have mostly been developed to look for diversifying selection where change away from the current amino acid is often favorable. However, in many cases we are interested in directional selection where there is a shift toward specific amino acids, resulting in increased fitness in the species. Recently, a few methods have been developed to detect and characterize directional selection on a molecular level. Using the results of evolutionary simulations as well as HIV drug resistance data as models of directional selection, we compare two such methods with each other, as well as against a standard method for detecting diversifying selection. We find that the method to detect diversifying selection also detects directional selection under certain conditions. One method developed for detecting directional selection is powerful and accurate for a wide range of conditions, while the other can generate an excessive number of false positives.
Affiliation(s)
- Grant Thiltgen
- Institute of Child Health, University College London, London, UK
- Mario Dos Reis
- The School of Biological and Chemical Sciences, Queen Mary University of London, London, UK
2
Huang YF, Golding GB. Inferring sequence regions under functional divergence in duplicate genes. Bioinformatics 2011; 28:176-83. [DOI: 10.1093/bioinformatics/btr635] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Indexed: 11/13/2022]
3
Roure B, Philippe H. Site-specific time heterogeneity of the substitution process and its impact on phylogenetic inference. BMC Evol Biol 2011; 11:17. [PMID: 21235782 PMCID: PMC3034684 DOI: 10.1186/1471-2148-11-17] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.6] [Received: 09/15/2010] [Accepted: 01/14/2011] [Indexed: 11/13/2022]
Abstract
Background: Model violations constitute the major limitation in inferring accurate phylogenies. Characterizing properties of the data that are not being correctly handled by current models is therefore of prime importance. One of the properties of protein evolution is the variation of the relative rate of substitutions across sites and over time; the latter phenomenon is called heterotachy. Its effect on phylogenetic inference has recently received considerable attention, which led to the development of new models of sequence evolution. However, thus far the focus has been on the quantitative heterogeneity of the evolutionary process, thereby overlooking more qualitative variations.
Results: We studied the importance of variation of the site-specific amino-acid substitution process over time and its possible impact on phylogenetic inference. We used the CAT model to define an infinite mixture of substitution processes characterized by equilibrium frequencies over the twenty amino acids, a useful proxy for qualitatively estimating the evolutionary process. Using two large datasets, we show that qualitative changes in site-specific substitution properties over time occurred significantly. To test whether this unaccounted qualitative variation can lead to an erroneous phylogenetic tree, we analyzed a concatenation of mitochondrial proteins in which Cnidaria and Porifera were erroneously grouped. The progressive removal of the sites with the most heterogeneous CAT profiles across clades led to the recovery of the monophyly of Eumetazoa (Cnidaria+Bilateria), suggesting that this heterogeneity can negatively influence phylogenetic inference.
Conclusion: The time-heterogeneity of the amino-acid replacement process is therefore an important evolutionary aspect that should be incorporated in future models of sequence change.
Affiliation(s)
- Béatrice Roure
- Département de Biochimie, Centre Robert-Cedergren, Université de Montréal, Succursale Centre-Ville, Québec, Canada
4
Kamneva OK, Liberles DA, Ward NL. Genome-wide influence of indel substitutions on evolution of bacteria of the PVC superphylum, revealed using a novel computational method. Genome Biol Evol 2010; 2:870-86. [PMID: 21048002 PMCID: PMC3000692 DOI: 10.1093/gbe/evq071] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.6] [Indexed: 12/03/2022]
Abstract
Whole-genome scans for positive Darwinian selection are widely used to detect evolution of genome novelty. Most approaches are based on evaluation of nonsynonymous to synonymous substitution rate ratio across evolutionary lineages. These methods are sensitive to saturation of synonymous sites and thus cannot be used to study evolution of distantly related organisms. In contrast, indels occur less frequently than amino acid replacements, accumulate more slowly, and can be employed to characterize evolution of diverged organisms. As indels are also subject to the forces of natural selection, they can generate functional changes through positive selection. Here, we present a new computational approach to detect selective constraints on indel substitutions at the whole-genome level for distantly related organisms. Our method is based on ancestral sequence reconstruction, takes into account the varying susceptibility of different types of secondary structure to indels, and according to simulation studies is conservative. We applied this newly developed framework to characterize the evolution of organisms of the Planctomycetes, Verrucomicrobia, Chlamydiae (PVC) bacterial superphylum. The superphylum contains organisms with unique cell biology, physiology, and diverse lifestyles. It includes bacteria with simple cell organization and more complex eukaryote-like compartmentalization. Lifestyles range from free-living organisms to obligate pathogens. In this study, we conducted a whole-genome level analysis of indel substitutions specific to evolutionary lineages of the PVC superphylum and found that indels evolved under positive selection on up to 12% of gene tree branches. We also analyzed possible functional consequences for several case studies of predicted indel events.
Affiliation(s)
- Naomi L. Ward
- Department of Molecular Biology, University of Wyoming
- Department of Botany, University of Wyoming
- Program in Ecology, University of Wyoming
5
Hu XB, Yue QH, Zhang XQ, Xu XQ, Wen Y, Chen YZ, Cheng XD, Yang L, Mu SJ. Hepatitis B virus genotypes and evolutionary profiles from blood donors from the northwest region of China. Virol J 2009; 6:199. [PMID: 19917138 PMCID: PMC2781008 DOI: 10.1186/1743-422x-6-199] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 07/21/2009] [Accepted: 11/17/2009] [Indexed: 12/12/2022]
Abstract
Hepatitis B virus (HBV) is prevalent in China and screening of blood donors is mandatory. Up to now, ELISA has been universally used by the China blood bank. However, this strategy has sometimes failed due to the high frequency of nucleic acid mutations. Understanding HBV evolution and strain diversity could help devise a better screening system for blood donors. However, this kind of information in China, especially in the northwest region, is lacking. In the present study, serological markers and the HBV DNA load of 11 samples from blood donor candidates from northwest China were determined. The HBV strains mostly clustered into B and C genotypes and could not be clustered into similar types from reference sequences. Subsequent testing showed liver function impairment and increasing virus load in the positive donors. These HBV evolutionary data for China will allow for better ELISA and NAT screening efficiency in the blood bank of China, especially in the northwest region.
Affiliation(s)
- Xing-Bin Hu
- Department of Blood Transfusion, Xijing Hospital, the Fourth Military Medical University, Xi'an 710032, PR China.
6
Tamuri AU, dos Reis M, Hay AJ, Goldstein RA. Identifying changes in selective constraints: host shifts in influenza. PLoS Comput Biol 2009; 5:e1000564. [PMID: 19911053 PMCID: PMC2770840 DOI: 10.1371/journal.pcbi.1000564] [Citation(s) in RCA: 96] [Impact Index Per Article: 6.4] [Received: 05/27/2009] [Accepted: 10/15/2009] [Indexed: 11/19/2022]
Abstract
The natural reservoir of Influenza A is waterfowl. Normally, waterfowl viruses are not adapted to infect and spread in the human population. Sometimes, through reassortment or through whole host shift events, genetic material from waterfowl viruses is introduced into the human population causing worldwide pandemics. Identifying which mutations allow viruses from avian origin to spread successfully in the human population is of great importance in predicting and controlling influenza pandemics. Here we describe a novel approach to identify such mutations. We use a sitewise non-homogeneous phylogenetic model that explicitly takes into account differences in the equilibrium frequencies of amino acids in different hosts and locations. We identify 172 amino acid sites with strong support and 518 sites with moderate support of different selection constraints in human and avian viruses. The sites that we identify provide an invaluable resource to experimental virologists studying adaptation of avian flu viruses to the human host. Identification of the sequence changes necessary for host shifts would help us predict the pandemic potential of various strains. The method is of broad applicability to investigating changes in selective constraints when the timing of the changes is known.
Influenza A's natural reservoir is waterfowl. Sometimes avian virus genomic segments are able to shift to a human host, either in toto or by combining with those that underwent a previous host shift event. Such host shift events can cause worldwide pandemics in their immunologically naive hosts. In order for these host shifts to establish a stable lineage, the virus has to adapt to the new host. Identifying the changes that have occurred in the past can provide important clues about how this process happens, and how surveillance for new influenza threats should be targeted. Unfortunately, it is difficult to determine whether an amino acid has changed due to adaptation to the new host or whether the change occurred through random drift. Here we describe a novel phylogenetic approach to identifying locations where the nature of the selective pressure exerted on the location has changed corresponding to the host shift event. We identify a set of locations on a number of the genomic segments. The approach we describe is of wide applicability when the timing of the change of selective constraints is known in advance.
Affiliation(s)
- Asif U. Tamuri
- National Institute for Medical Research, London, United Kingdom
- Mario dos Reis
- National Institute for Medical Research, London, United Kingdom
- Alan J. Hay
- National Institute for Medical Research, London, United Kingdom
7
Zhou Y, Brinkmann H, Rodrigue N, Lartillot N, Philippe H. A Dirichlet Process Covarion Mixture Model and Its Assessments Using Posterior Predictive Discrepancy Tests. Mol Biol Evol 2009; 27:371-84. [DOI: 10.1093/molbev/msp248] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.0] [Indexed: 11/13/2022]
8
Wang HC, Susko E, Roger AJ. PROCOV: maximum likelihood estimation of protein phylogeny under covarion models and site-specific covarion pattern analysis. BMC Evol Biol 2009; 9:225. [PMID: 19737395 PMCID: PMC2758850 DOI: 10.1186/1471-2148-9-225] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Received: 03/24/2009] [Accepted: 09/08/2009] [Indexed: 11/12/2022]
Abstract
Background: The covarion hypothesis of molecular evolution holds that selective pressures on a given amino acid or nucleotide site are dependent on the identity of other sites in the molecule that change throughout time, resulting in changes of evolutionary rates of sites along the branches of a phylogenetic tree. At the sequence level, covarion-like evolution at a site manifests as conservation of nucleotide or amino acid states among some homologs where the states are not conserved in other homologs (or groups of homologs). Covarion-like evolution has been shown to relate to changes in functions at sites in different clades, and, if ignored, can adversely affect the accuracy of phylogenetic inference.
Results: PROCOV (protein covarion analysis) is a software tool that implements a number of previously proposed covarion models of protein evolution for phylogenetic inference in a maximum likelihood framework. Several algorithmic and implementation improvements in this tool over previous versions make computationally expensive tree searches with covarion models more efficient and analyses of large phylogenomic data sets tractable. PROCOV can be used to identify covarion sites by comparing the site likelihoods under the covarion process to the corresponding site likelihoods under a rates-across-sites (RAS) process. Those sites with the greatest log-likelihood difference between a 'covarion' and an RAS process were found to be of functional or structural significance in a dataset of bacterial and eukaryotic elongation factors.
Conclusion: Covarion models implemented in PROCOV may be especially useful for phylogenetic estimation when ancient divergences between sequences have occurred and rates of evolution at sites are likely to have changed over the tree. It can also be used to study lineage-specific functional shifts in protein families that result in changes in the patterns of site variability among subtrees.
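The site-ranking idea in this abstract, comparing each site's log-likelihood under a covarion process with its log-likelihood under an RAS process, can be sketched roughly as below. This is an illustrative Python sketch only, not PROCOV's code or interface: the function name `rank_sites_by_lnl_gain` and all numbers are hypothetical, and the per-site log-likelihoods are assumed to have been computed beforehand by some other tool.

```python
def rank_sites_by_lnl_gain(covarion_site_lnl, ras_site_lnl, top=5):
    """Return (site_index, delta) pairs, largest covarion-minus-RAS gain first.

    Both inputs are per-site log-likelihoods for the same alignment columns.
    The name and interface are hypothetical, chosen for illustration.
    """
    if len(covarion_site_lnl) != len(ras_site_lnl):
        raise ValueError("both models must report one log-likelihood per site")
    deltas = [
        (site, cov - ras)
        for site, (cov, ras) in enumerate(zip(covarion_site_lnl, ras_site_lnl))
    ]
    # Sites whose likelihood improves most under the covarion process come first.
    deltas.sort(key=lambda pair: pair[1], reverse=True)
    return deltas[:top]


# Made-up per-site log-likelihoods for a four-column toy alignment.
covarion = [-3.5, -7.0, -2.0, -9.0]
ras = [-4.0, -7.25, -5.0, -9.0]
ranking = rank_sites_by_lnl_gain(covarion, ras, top=2)
print(ranking)
```

With these toy values, site 2 ranks first because its log-likelihood improves most under the covarion process. A real analysis would also need a criterion for how large a difference is worth reporting, which the abstract does not specify.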
Affiliation(s)
- Huai-Chun Wang
- Department of Mathematics and Statistics, Dalhousie University, Halifax, NS, Canada.
9
Penn O, Stern A, Rubinstein ND, Dutheil J, Bacharach E, Galtier N, Pupko T. Evolutionary modeling of rate shifts reveals specificity determinants in HIV-1 subtypes. PLoS Comput Biol 2008; 4:e1000214. [PMID: 18989394 PMCID: PMC2566816 DOI: 10.1371/journal.pcbi.1000214] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.8] [Received: 03/14/2008] [Accepted: 09/23/2008] [Indexed: 11/19/2022]
Abstract
A hallmark of the human immunodeficiency virus 1 (HIV-1) is its rapid rate of evolution within and among its various subtypes. Two complementary hypotheses are suggested to explain the sequence variability among HIV-1 subtypes. The first suggests that the functional constraints at each site remain the same across all subtypes, and the differences among subtypes are a direct reflection of random substitutions, which have occurred during the time elapsed since their divergence. The alternative hypothesis suggests that the functional constraints themselves have evolved, and thus sequence differences among subtypes in some sites reflect shifts in function. To determine the contribution of each of these two alternatives to HIV-1 subtype evolution, we have developed a novel Bayesian method for testing and detecting site-specific rate shifts. The RAte Shift EstimatoR (RASER) method determines whether or not site-specific functional shifts characterize the evolution of a protein and, if so, points to the specific sites and lineages in which these shifts have most likely occurred. Applying RASER to a dataset composed of large samples of HIV-1 sequences from different group M subtypes, we reveal rampant evolutionary shifts throughout the HIV-1 proteome. Most of these rate shifts have occurred during the divergence of the major subtypes, establishing that subtype divergence occurred together with functional diversification. We report further evidence for the emergence of a new sub-subtype, characterized by abundant rate-shifting sites. When focusing on the rate-shifting sites detected, we find that many are associated with known function relating to viral life cycle and drug resistance. Finally, we discuss mechanisms of covariation of rate-shifting sites.
The AIDS epidemic, inflicted by the human immunodeficiency virus (HIV), has already claimed 25 million lives, thus posing a global threat. Since its discovery, several HIV subtypes have emerged, characterized by distinct genomic sequences and variable geographic locations. Here, we investigate the nature of the genetic differences among the subtypes. The neutral theory of evolution suggests that most genetic differences marginally affect the function of the encoded proteins (hence neutral) and thus occur randomly. Alternatively, changes in protein function are reflected by a pattern of nonrandom genetic differences. To address this issue, we developed a computational method, which studies the differences between sequences of different HIV subtypes, and estimates which of the explanations is more likely. Using a large sample of HIV protein sequences, we discovered that part of the variability among the subtypes is not random and possibly reflects different functional constraints imposed on the subtypes during the course of their evolution. An in-depth inspection of these nonrandom changes revealed a correlation with biological traits, such as drug resistance and mechanisms facilitating viral entry into the host cell. Interestingly, nonrandom changes are also characteristic of a viral strain that recently emerged in the former Soviet Union.
Affiliation(s)
- Osnat Penn
- Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Adi Stern
- Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Nimrod D. Rubinstein
- Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Julien Dutheil
- BiRC (Bioinformatics Research Center), University of Aarhus, Århus, Denmark
- Eran Bacharach
- Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Nicolas Galtier
- Institut des Sciences de l'Evolution, CC64, Centre National de la Recherche Scientifique, Université Montpellier 2, Montpellier, France
- Tal Pupko
- Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
10
Zhou Y, Rodrigue N, Lartillot N, Philippe H. Evaluation of the models handling heterotachy in phylogenetic inference. BMC Evol Biol 2007; 7:206. [PMID: 17974035 PMCID: PMC2248194 DOI: 10.1186/1471-2148-7-206] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.0] [Received: 04/20/2007] [Accepted: 11/01/2007] [Indexed: 11/30/2022]
Abstract
Background: The evolutionary rate at a given homologous position varies across time. When sufficiently pronounced, this phenomenon, called heterotachy, may produce artefactual phylogenetic reconstructions under the commonly used models of sequence evolution. These observations have motivated the development of models that explicitly recognize heterotachy, with research directions proposed along two main axes: 1) the covarion approach, where sites switch from variable to invariable states; and 2) the mixture of branch lengths (MBL) approach, where alignment patterns are assumed to arise from one of several sets of branch lengths, under a given phylogeny.
Results: Here, we report the first statistical comparisons contrasting the performance of covarion and MBL modeling strategies. Using simulations under heterotachous conditions, we explore the properties of three model comparison methods: the Akaike information criterion, the Bayesian information criterion, and cross validation. Although more time consuming, cross validation appears more reliable than AIC and BIC as it directly measures the predictive power of a model on 'future' data. We also analyze three large datasets (nuclear proteins of animals, mitochondrial proteins of mammals, and plastid proteins of plants), and find the optimal number of components of the MBL model to be two for all datasets, indicating that this model is preferred over the standard homogeneous model. However, the covarion model is always favored over the optimal MBL model.
Conclusion: We demonstrated, using three large datasets, that the covarion model is more efficient at handling heterotachy than the MBL model. This is probably due to the fact that the MBL model requires a serious increase in the number of parameters, as compared to two supplementary parameters of the covarion approach. Further improvements of both the mixture and the covarion approaches might be obtained by modeling heterogeneous behavior both along time and across sites.
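For readers unfamiliar with the two information criteria named in this abstract, the standard definitions are AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L, where k is the number of free parameters, ln L the maximized log-likelihood, and n the sample size. The Python sketch below only illustrates these formulas; the model labels and all numbers are made up and are not results from the paper, and using the number of alignment sites as n is an assumption.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_sites):
    """Bayesian information criterion: k ln n - 2 ln L (lower is better).

    Taking n as the number of alignment sites is an assumption of this sketch.
    """
    return n_params * math.log(n_sites) - 2 * log_likelihood


# Hypothetical fits: (maximized log-likelihood, number of free parameters).
fits = {
    "model_a": (-1000.0, 10),
    "model_b": (-990.0, 12),
    "model_c": (-985.0, 30),
}
n_sites = 500  # made-up alignment length

aic_scores = {name: aic(lnl, k) for name, (lnl, k) in fits.items()}
bic_scores = {name: bic(lnl, k, n_sites) for name, (lnl, k) in fits.items()}
best_aic = min(aic_scores, key=aic_scores.get)
best_bic = min(bic_scores, key=bic_scores.get)
print(best_aic, best_bic)
```

In this toy setup the model with the highest likelihood (`model_c`) is not selected, because both criteria penalize its larger parameter count. Cross validation, which the abstract reports as more reliable, is not shown here because it requires refitting models on held-out data.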
Affiliation(s)
- Yan Zhou
- Canadian Institute for Advanced Research, Département de Biochimie, Université de Montréal, Succursale Centre-Ville, Montréal, Québec H3C3J7, Canada.
11
Anisimova M, Liberles DA. The quest for natural selection in the age of comparative genomics. Heredity (Edinb) 2007; 99:567-79. [PMID: 17848974 DOI: 10.1038/sj.hdy.6801052] [Citation(s) in RCA: 68] [Impact Index Per Article: 4.0] [Indexed: 11/08/2022]
Abstract
Continued genome sequencing has fueled progress in statistical methods for understanding the action of natural selection at the molecular level. This article reviews various statistical techniques (and their applicability) for detecting adaptation events and the functional divergence of proteins. As large-scale automated studies become more frequent, they provide a useful resource for generating biological null hypotheses for further experimental and statistical testing. Furthermore, they shed light on typical patterns of lineage-specific evolution of organisms, on the functional and structural evolution of protein families and on the interplay between the two. More complex models are being developed to better reflect the underlying biological and chemical processes and to complement simpler statistical models. Linking molecular processes to their statistical signatures in genomes can be demanding, and the proper application of statistical models is discussed.
Affiliation(s)
- M Anisimova
- Department of Biology, University College London, London, UK
12
Philippe H, Blanchette M. Proceedings of the First International Conference on Phylogenomics. March 15-19, 2006. Quebec, Canada. BMC Evol Biol 2007; 7 Suppl 1:S1-16. [PMID: 17288567 PMCID: PMC1796603 DOI: 10.1186/1471-2148-7-s1-s1] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 11/10/2022]
Abstract
The First Phylogenomics Conference was held in Ste-Adèle (Québec, Canada) in March 2006. Selected papers appear in this special issue of BMC Evolutionary Biology. Here, we give an introduction to the field and provide an overview of the articles presented in this issue.
Affiliation(s)
- Hervé Philippe
- Canadian Institute for Advanced Research, Centre Robert Cedergren, Département de Biochimie, Université de Montréal, 2900 Boulevard Édouard-Montpetit, Montréal, Québec, H3T 1J4, Canada
- Mathieu Blanchette
- McGill Centre for Bioinformatics, McGill University, 3775 University Street, Montréal, Québec, H3A 2B4, Canada