1
Ferreiro D, Branco C, Arenas M. Selection among site-dependent structurally constrained substitution models of protein evolution by approximate Bayesian computation. Bioinformatics 2024; 40:btae096. [PMID: 38374231 PMCID: PMC10914458 DOI: 10.1093/bioinformatics/btae096] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2023] [Revised: 01/15/2024] [Accepted: 02/16/2024] [Indexed: 02/21/2024] Open
Abstract
MOTIVATION: The selection among substitution models of molecular evolution is fundamental for obtaining accurate phylogenetic inferences. At the protein level, evolutionary analyses are traditionally based on empirical substitution models, but these models make unrealistic assumptions and are being surpassed by structurally constrained substitution (SCS) models. SCS models often consider site-dependent evolution, a process that provides realism but complicates their implementation into the likelihood functions that are commonly used for substitution model selection.
RESULTS: We present a method to perform selection among site-dependent SCS models, and also between empirical and site-dependent SCS models, based on the approximate Bayesian computation (ABC) approach, together with its implementation in the computational framework ProteinModelerABC. The framework implements ABC with and without regression adjustments and includes diverse empirical and site-dependent SCS models of protein evolution. Using extensive simulated data, we found that it provides selection among SCS and empirical models with acceptable accuracy. As illustrative examples, we applied the framework to analyze a variety of protein families, observing that SCS models fit them better than the corresponding best-fitting empirical substitution models.
AVAILABILITY AND IMPLEMENTATION: ProteinModelerABC is freely available from https://github.com/DavidFerreiro/ProteinModelerABC, can run in parallel and includes a graphical user interface. The framework is distributed with detailed documentation and ready-to-use examples.
Affiliation(s)
- David Ferreiro
- CINBIO, Universidade de Vigo, 36310 Vigo, Spain
- Department of Biochemistry, Genetics and Immunology, Universidade de Vigo, 36310 Vigo, Spain
- Catarina Branco
- CINBIO, Universidade de Vigo, 36310 Vigo, Spain
- Department of Biochemistry, Genetics and Immunology, Universidade de Vigo, 36310 Vigo, Spain
- Miguel Arenas
- CINBIO, Universidade de Vigo, 36310 Vigo, Spain
- Department of Biochemistry, Genetics and Immunology, Universidade de Vigo, 36310 Vigo, Spain
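
The ABC-based model selection described in the abstract of entry 1 can be illustrated with a toy rejection scheme. The following is a minimal sketch and not ProteinModelerABC's implementation: the two candidate models, the Gaussian simulator, the single summary statistic, and every function and parameter name are hypothetical stand-ins chosen for illustration. Regression adjustment, which the abstract also mentions, is not shown.

```python
import random
from collections import Counter

# Hypothetical toy models: each "substitution model" is reduced to the mean
# of a Gaussian from which per-site values are drawn (an assumption made
# only to keep the sketch small).
MODEL_MEANS = {"model_A": 0.0, "model_B": 1.0}


def simulate_summary(model, n_sites, rng):
    """Simulate n_sites values under `model` and return their mean
    (the single summary statistic used in this sketch)."""
    mean = MODEL_MEANS[model]
    return sum(rng.gauss(mean, 1.0) for _ in range(n_sites)) / n_sites


def abc_model_selection(observed_summary, models, n_sims=2000,
                        accept_frac=0.05, n_sites=50, seed=0):
    """Rejection ABC: draw a model from a uniform prior, simulate, and keep
    the simulations whose summary is closest to the observed one. The
    posterior probability of each model is approximated by its share of
    the accepted simulations."""
    rng = random.Random(seed)
    sims = []
    for _ in range(n_sims):
        model = rng.choice(models)              # uniform prior over models
        summary = simulate_summary(model, n_sites, rng)
        sims.append((abs(summary - observed_summary), model))
    sims.sort(key=lambda pair: pair[0])         # smallest distance first
    n_keep = max(1, int(accept_frac * n_sims))  # accepted simulations
    counts = Counter(model for _, model in sims[:n_keep])
    return {model: counts[model] / n_keep for model in models}


posterior = abc_model_selection(observed_summary=1.0,
                                models=["model_A", "model_B"])
print(posterior)
```

In a realistic analysis the simulator would evolve protein sequences under each candidate substitution model, and several summary statistics would be compared rather than one.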
2
Liu L, Yu L, Wu S, Arnold J, Whalen C, Davis C, Edwards S. Short branch attraction in phylogenomic inference under the multispecies coalescent. Front Ecol Evol 2023; 11:1134764. [PMID: 39233780 PMCID: PMC11372852 DOI: 10.3389/fevo.2023.1134764] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/06/2024] Open
Abstract
Accurate reconstruction of species trees often relies on the quality of input gene trees estimated from molecular sequences. Previous studies suggested that if the sequence length is fixed, the maximum likelihood may produce biased gene trees which subsequently mislead inference of species trees. Two key questions need to be answered in this context: what are the scenarios that may result in consistently biased gene trees? and for those scenarios, are there any remedies that may remove or at least reduce the misleading effects of consistently biased gene trees? In this article, we establish a theoretical framework to address these questions. Considering a scenario where the true gene tree is a 4-taxon star tree $T^* = (S_1, S_2, S_3, S_4)$ with two short branches leading to the species $S_1$ and $S_2$, we demonstrate that maximum likelihood significantly favors the wrong bifurcating tree $((S_1, S_2), S_3, S_4)$ grouping the two species $S_1$ and $S_2$ with short branches. We name this inconsistent behavior short branch attraction, which may occur in real-world data involving a 4-taxon bifurcating gene tree with a short internal branch. If no mutation occurs along the internal branch, which is likely if the internal branch is short, the 4-taxon bifurcating tree is equivalent to the 4-taxon star tree and thus will suffer the same misleading effect of short branch attraction. Theoretical and simulation results further demonstrate that short branch attraction may occur in gene trees and species trees of arbitrary size. Moreover, short branch attraction is primarily caused by a lack of phylogenetic information in sequence data, suggesting that converting short internal branches to polytomies in the estimated gene trees can significantly reduce artifacts induced by short branch attraction.
Affiliation(s)
- Liang Liu
- Department of Statistics and Institute of Bioinformatics, University of Georgia, Athens, GA, United States
- Lili Yu
- Department of Biostatistics, Georgia Southern University, Statesboro, GA, United States
- Shaoyuan Wu
- Jiangsu Key Laboratory of Phylogenomics and Comparative Genomics, Jiangsu International Joint Center of Genomics, School of Life Sciences, Jiangsu Normal University, Xuzhou, Jiangsu, China
- Jonathan Arnold
- Department of Genetics, University of Georgia, Athens, GA, United States
- Christopher Whalen
- Department of Epidemiology and Biostatistics, College of Public Health, University of Georgia, Athens, GA, United States
- Charles Davis
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, United States
- Scott Edwards
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, United States
3
Arenas M. ProteinEvolverABC: coestimation of recombination and substitution rates in protein sequences by approximate Bayesian computation. Bioinformatics 2021; 38:58-64. [PMID: 34450622 PMCID: PMC8696103 DOI: 10.1093/bioinformatics/btab617] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 05/15/2021] [Revised: 07/24/2021] [Accepted: 08/24/2021] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION: The evolutionary processes of mutation and recombination, upon which selection operates, are fundamental to understanding the observed molecular diversity. Unlike for nucleotide sequences, the estimation of the recombination rate in protein sequences has been little explored and has not been implemented in evolutionary frameworks, even though protein sequencing methods are widely used.
RESULTS: In order to accommodate this need, here I present a computational framework, called ProteinEvolverABC, to jointly estimate recombination and substitution rates from alignments of protein sequences. The framework implements the approximate Bayesian computation approach, with and without regression adjustments, and includes a variety of substitution models of protein evolution, demographics and longitudinal sampling. It also implements several nuisance parameters, such as amino acid frequencies and rates of change that are heterogeneous among sites, and the proportion of invariable sites. The framework produces accurate coestimation of recombination and substitution rates under diverse evolutionary scenarios. As illustrative examples of usage, I applied it to several viral protein families, including coronaviruses, showing heterogeneous substitution and recombination rates.
AVAILABILITY AND IMPLEMENTATION: ProteinEvolverABC is freely available from https://github.com/miguelarenas/proteinevolverabc and includes a graphical user interface to help specify the input settings, extensive documentation and ready-to-use examples. Conveniently, the simulations can run in parallel on multicore machines.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Miguel Arenas
- CINBIO, Universidade de Vigo, 36310 Vigo, Spain
- Universidade de Vigo, Departamento de Bioquimica, Xenetica e Inmunoloxia, 36310 Vigo, Spain
- Galicia Sur Health Research Institute (IIS Galicia Sur), 36310 Vigo, Spain
4
Kreger J, Garcia J, Zhang H, Komarova NL, Wodarz D, Levy DN. Quantifying the dynamics of viral recombination during free virus and cell-to-cell transmission in HIV-1 infection. Virus Evol 2021; 7:veab026. [PMID: 34012557 PMCID: PMC8117450 DOI: 10.1093/ve/veab026] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 12/30/2022] Open
Abstract
Recombination has been shown to contribute to human immunodeficiency virus-1 (HIV-1) evolution in vivo, but the underlying dynamics are extremely complex, depending on the nature of the fitness landscapes and of epistatic interactions. A less well-studied determinant of recombinant evolution is the mode of virus transmission in the cell population. HIV-1 can spread by free virus transmission, resulting largely in singly infected cells, and also by direct cell-to-cell transmission, resulting in the simultaneous infection of cells with multiple viruses. We investigate the contribution of these two transmission pathways to recombinant evolution by applying mathematical models to in vitro experimental data on the growth of fluorescent reporter viruses under static conditions (where both transmission pathways operate) and under gentle shaking conditions (where cell-to-cell transmission is largely inhibited). The parameterized mathematical models are then used to extrapolate the viral evolutionary dynamics beyond the experimental settings. Assuming a fixed basic reproductive ratio of the virus (independent of transmission pathway), we find that recombinant evolution is fastest if virus spread is driven only by cell-to-cell transmission and slows down if both transmission pathways operate. Recombinant evolution is slowest if all virus spread occurs through free virus transmission. This is due to cell-to-cell transmission (1) increasing infection multiplicity; (2) promoting the co-transmission of different virus strains from cell to cell; and (3) increasing the rate at which point mutations are generated as a result of more reverse transcription events. This study further resulted in the estimation of various parameters that characterize these evolutionary processes. For example, we estimate that during cell-to-cell transmission, an average of three viruses successfully integrated into the target cell, which can significantly raise the infection multiplicity compared to free virus transmission. In general, our study points towards the importance of infection multiplicity and cell-to-cell transmission for HIV evolution.
Affiliation(s)
- Jesse Kreger
- Department of Mathematics, Rowland Hall, University of California, Irvine, CA 92697, USA
- Josephine Garcia
- Department of Basic Science, New York University College of Dentistry, 921 Schwartz Building, 345 East 24th Street, New York, NY 10010-9403, USA
- Hongtao Zhang
- Department of Basic Science, New York University College of Dentistry, 921 Schwartz Building, 345 East 24th Street, New York, NY 10010-9403, USA
- Natalia L Komarova
- Department of Mathematics, Rowland Hall, University of California, Irvine, CA 92697, USA
- Dominik Wodarz
- Department of Mathematics, Rowland Hall, University of California, Irvine, CA 92697, USA; Department of Population Health and Disease Prevention, Program in Public Health, Susan and Henry Samueli College of Health Sciences, University of California, Irvine, CA 92697, USA
- David N Levy
- Department of Basic Science, New York University College of Dentistry, 921 Schwartz Building, 345 East 24th Street, New York, NY 10010-9403, USA
5
Del Amparo R, Vicens A, Arenas M. The influence of heterogeneous codon frequencies along sequences on the estimation of molecular adaptation. Bioinformatics 2020; 36:430-436. [PMID: 31304972 DOI: 10.1093/bioinformatics/btz558] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2019] [Revised: 07/08/2019] [Accepted: 07/11/2019] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION: The nonsynonymous/synonymous substitution rate ratio (dN/dS) is a commonly used parameter to quantify molecular adaptation in protein-coding data. It is known that the estimation of dN/dS can be biased if some evolutionary processes are ignored. In this regard, common maximum-likelihood (ML) methods to estimate dN/dS assume invariable codon frequencies among sites, although this characteristic is rare in nature, and this assumption could bias the estimation of this parameter.
RESULTS: Here we studied the influence of variable codon frequencies among genetic regions on the estimation of dN/dS. We explored scenarios varying the number of genetic regions that differ in codon frequencies, the amount of variability of codon frequencies among regions and the nucleotide frequencies at each codon position among regions. We found that ignoring heterogeneous codon frequencies among regions overall leads to underestimation of dN/dS and that the bias increases with the level of heterogeneity of codon frequencies. Interestingly, we also found that varying nucleotide frequencies among regions at the first or second codon position leads to underestimation of dN/dS, while variation at the third codon position leads to overestimation of dN/dS. Next, we present a methodology to reduce this bias, based on the analysis of partitions presenting similar codon frequencies, and we applied it to analyze four real datasets. We conclude that accounting for heterogeneous codon frequencies along sequences is required to obtain realistic estimates of molecular adaptation through this relevant evolutionary parameter.
AVAILABILITY AND IMPLEMENTATION: The applied frameworks for the computer simulations of protein-coding data and estimation of molecular adaptation are SGWE and PAML, respectively. Both are publicly available and referenced in the study.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Roberto Del Amparo
- Department of Biochemistry, Genetics and Immunology; Biomedical Research Center (CINBIO), University of Vigo, 36310 Vigo, Spain
- Alberto Vicens
- Department of Biochemistry, Genetics and Immunology; Biomedical Research Center (CINBIO), University of Vigo, 36310 Vigo, Spain
- Miguel Arenas
- Department of Biochemistry, Genetics and Immunology; Biomedical Research Center (CINBIO), University of Vigo, 36310 Vigo, Spain
6
Arenas M, Lorenzo-Redondo R, Lopez-Galindez C. Influence of mutation and recombination on HIV-1 in vitro fitness recovery. Mol Phylogenet Evol 2015; 94:264-70. [PMID: 26358613 DOI: 10.1016/j.ympev.2015.09.001] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.6] [Received: 03/31/2015] [Revised: 08/31/2015] [Accepted: 09/01/2015] [Indexed: 10/23/2022]
Abstract
The understanding of the evolutionary processes underlying HIV-1 fitness recovery is fundamental for HIV-1 pathogenesis, antiretroviral treatment and vaccine design. It is known that HIV-1 can present very high mutation and recombination rates; however, the specific contribution of these evolutionary forces to in vitro viral fitness recovery has not been simultaneously quantified. To this end, we analyzed substitution, recombination and molecular adaptation rates in a variety of HIV-1 biological clones derived from a viral isolate after severe population bottlenecks and a number of large population cell culture passages. These clones presented an overall but uneven fitness gain, with a mean of 3-fold, with respect to the initial passage values. We found a significant relationship between the fitness increase and the appearance and fixation of mutations. In addition, these fixed mutations presented molecular signatures of positive selection through the accumulation of non-synonymous substitutions. Interestingly, viral recombination correlated with fitness recovery in most of the studied viral quasispecies. The genetic diversity generated by these evolutionary processes was positively correlated with the viral fitness. We conclude that HIV-1 fitness recovery can be derived from the genetic heterogeneity generated through both mutation and recombination, and under diversifying molecular adaptation. The findings also suggest nonrandom evolutionary pathways for in vitro fitness recovery.
Affiliation(s)
- Miguel Arenas
- Institute of Molecular Pathology and Immunology of the University of Porto (IPATIMUP), Porto, Portugal; Centre for Molecular Biology "Severo Ochoa", Consejo Superior de Investigaciones Científicas (CSIC), Madrid, Spain.
- Ramon Lorenzo-Redondo
- Centro Nacional de Microbiología (CNM), Instituto de Salud Carlos III, Majadahonda, Madrid, Spain.
- Cecilio Lopez-Galindez
- Centro Nacional de Microbiología (CNM), Instituto de Salud Carlos III, Majadahonda, Madrid, Spain.
7
Arenas M, Lopes JS, Beaumont MA, Posada D. CodABC: a computational framework to coestimate recombination, substitution, and molecular adaptation rates by approximate Bayesian computation. Mol Biol Evol 2015; 32:1109-12. [PMID: 25577191 PMCID: PMC4379410 DOI: 10.1093/molbev/msu411] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Indexed: 01/05/2023] Open
Abstract
The estimation of substitution and recombination rates can provide important insights into the molecular evolution of protein-coding sequences. Here, we present a new computational framework, called "CodABC," to jointly estimate recombination, substitution and synonymous and nonsynonymous rates from coding data. CodABC uses approximate Bayesian computation with and without regression adjustment and implements a variety of codon models, intracodon recombination, and longitudinal sampling. CodABC can provide accurate joint parameter estimates from recombining coding sequences, often outperforming maximum-likelihood methods based on more approximate models. In addition, CodABC allows for the inclusion of several nuisance parameters such as those representing codon frequencies, transition matrices, heterogeneity across sites or invariable sites. CodABC is freely available from http://code.google.com/p/codabc/, includes a GUI, extensive documentation and ready-to-use examples, and can run in parallel on multicore machines.
Affiliation(s)
- Miguel Arenas
- Centre for Molecular Biology "Severo Ochoa," Consejo Superior de Investigaciones Científicas (CSIC), Madrid, Spain; Departamento de Bioquímica, Genética e Inmunología, Universidad de Vigo, Vigo, Spain
- Joao S Lopes
- Instituto Gulbenkian de Ciencia, Oeiras, Portugal
- Mark A Beaumont
- School of Mathematical Sciences and School of Biological Sciences, University of Bristol, University Walk, Bristol, United Kingdom
- David Posada
- Departamento de Bioquímica, Genética e Inmunología, Universidad de Vigo, Vigo, Spain
8
Arenas M, Posada D. Simulation of genome-wide evolution under heterogeneous substitution models and complex multispecies coalescent histories. Mol Biol Evol 2014; 31:1295-301. [PMID: 24557445 PMCID: PMC3995339 DOI: 10.1093/molbev/msu078] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.6] [Indexed: 01/28/2023] Open
Abstract
Genomic evolution can be highly heterogeneous. Here, we introduce a new framework to simulate genome-wide sequence evolution under a variety of substitution models that may change along the genome and the phylogeny, following complex multispecies coalescent histories that can include recombination, demographics, longitudinal sampling, population subdivision/species history, and migration. A key aspect of our simulation strategy is that the heterogeneity of the whole evolutionary process can be parameterized according to statistical prior distributions specified by the user. We used this framework to carry out a study of the impact of variable codon frequencies across genomic regions on the estimation of the genome-wide nonsynonymous/synonymous ratio. We found that both variable codon frequencies across genes and rate variation among sites and regions can lead to severe underestimation of the global dN/dS values. The program SGWE—Simulation of Genome-Wide Evolution—is freely available from http://code.google.com/p/sgwe-project/, including extensive documentation and detailed examples.
Affiliation(s)
- Miguel Arenas
- Centre for Molecular Biology "Severo Ochoa," Consejo Superior de Investigaciones Científicas (CSIC), Madrid, Spain
9
Coestimation of recombination, substitution and molecular adaptation rates by approximate Bayesian computation. Heredity (Edinb) 2013; 112:255-64. [PMID: 24149652 DOI: 10.1038/hdy.2013.101] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.1] [Received: 04/18/2013] [Revised: 08/22/2013] [Accepted: 09/17/2013] [Indexed: 11/08/2022] Open
Abstract
The estimation of parameters in molecular evolution may be biased when some processes are not considered. For example, the estimation of selection at the molecular level using codon-substitution models can have an upward bias when recombination is ignored. Here we address the joint estimation of recombination, molecular adaptation and substitution rates from coding sequences using approximate Bayesian computation (ABC). We describe the implementation of a regression-based strategy for choosing subsets of summary statistics for coding data, and show that this approach can accurately infer recombination allowing for intracodon recombination breakpoints, molecular adaptation and codon substitution rates. We demonstrate that our ABC approach can outperform other analytical methods under a variety of evolutionary scenarios. We also show that although the choice of the codon-substitution model is important, our inferences are robust to a moderate degree of model misspecification. In addition, we demonstrate that our approach can accurately choose the evolutionary model that best fits the data, providing an alternative for when the use of full-likelihood methods is impracticable. Finally, we applied our ABC method to co-estimate recombination, substitution and molecular adaptation rates from 24 published human immunodeficiency virus 1 coding data sets.
10
Affiliation(s)
- Miguel Arenas
- Computational and Molecular Population Genetics Lab-CMPG, Institute of Ecology and Evolution, University of Bern, Bern, Switzerland.
11
Castro-Nallar E, Pérez-Losada M, Burton GF, Crandall KA. The evolution of HIV: inferences using phylogenetics. Mol Phylogenet Evol 2012; 62:777-92. [PMID: 22138161 PMCID: PMC3258026 DOI: 10.1016/j.ympev.2011.11.019] [Citation(s) in RCA: 63] [Impact Index Per Article: 4.8] [Received: 06/28/2011] [Revised: 11/17/2011] [Accepted: 11/21/2011] [Indexed: 12/02/2022]
Abstract
Molecular phylogenetics has revolutionized the study of not only evolution but also disparate fields such as genomics, bioinformatics, epidemiology, ecology, microbiology, molecular biology and biochemistry. Particularly significant are its achievements in population genetics as a result of the development of coalescent theory, which have contributed to more accurate model-based parameter estimation and explicit hypothesis testing. The study of the evolution of many microorganisms, and HIV in particular, has benefited from these new methodologies. HIV is well suited for such sophisticated population analyses because of its large population sizes, short generation times, high substitution rates and relatively small genomes. All these factors make HIV an ideal and fascinating model to study molecular evolution in real time. Here we review the significant advances made in HIV evolution through the application of phylogenetic approaches. We first examine the relative roles of mutation and recombination in the molecular evolution of HIV and its adaptive response to drug therapy and tissue allocation. We then review some of the fundamental questions in HIV evolution in relation to its origin and diversification and describe some of the insights gained using phylogenies. Finally, we show how phylogenetic analysis has advanced our knowledge of HIV dynamics (i.e., phylodynamics).
Affiliation(s)
- Eduardo Castro-Nallar
- Department of Biology, 401 Widtsoe Building, Brigham Young University, Provo, UT 84602-5181, USA.
12
Abstract
Throughout the living world, genetic recombination and nucleotide substitution are the primary processes that create the genetic variation upon which natural selection acts. Just as analyses of substitution patterns can reveal a great deal about evolution, so too can analyses of recombination. Evidence of genetic recombination within the genomes of apparently asexual species can equate with evidence of cryptic sexuality. In sexually reproducing species, nonrandom patterns of sequence exchange can provide direct evidence of population subdivisions that prevent certain individuals from mating. Although an interesting topic in its own right, an important reason for analysing recombination is to account for its potentially disruptive influences on various phylogenetic-based molecular evolution analyses. Specifically, the evolutionary histories of recombinant sequences cannot be accurately described by standard bifurcating phylogenetic trees. Taking recombination into account can therefore be pivotal to the success of selection, molecular clock and various other analyses that require adequate modelling of shared ancestry and draw increased power from accurately inferred phylogenetic trees. Here, we review various computational approaches to studying recombination and provide guidelines both on how to gain insights into this important evolutionary process and on how it can be properly accounted for during molecular evolution studies.
Affiliation(s)
- Darren P Martin
- Computational Biology Group, Institute of Infectious Diseases and Molecular Medicine, University of Cape Town, Cape Town, South Africa
13
Abstract
While a variety of methods exist to reconstruct ancestral sequences, all of them assume that a single phylogeny underlies all the positions in the alignment and therefore that recombination has not taken place. Using computer simulations we show that recombination can severely bias ancestral sequence reconstruction (ASR), and quantify this effect. If recombination is ignored, the ancestral sequences recovered can be quite distinct from the grand most recent common ancestor (GMRCA) of the sample and better resemble the concatenate of partial most recent common ancestors (MRCAs) at each recombination fragment. When independent phylogenetic trees are assumed for the different recombinant segments, the estimation of the fragment MRCAs improves significantly. Importantly, we show that recombination can change the biological predictions derived from ASRs carried out with real data. Given that recombination is widespread in nuclear genes and, in particular, in RNA viruses and some bacteria, the reconstruction of ancestral sequences in these cases should consider the potential impact of recombination and ideally be carried out using approaches that accommodate recombination.
14
Comparative analysis of American Dengue virus type 1 full-genome sequences. Virus Genes 2009; 40:60-6. [PMID: 19997970 DOI: 10.1007/s11262-009-0428-0] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Received: 08/10/2009] [Accepted: 11/26/2009] [Indexed: 10/20/2022]
Abstract
Dengue virus (DENV; Genus Flavivirus, Family Flaviviridae) has been circulating in Brazil since at least the mid-1980s and continues to be responsible for sporadic cases of Dengue fever and Dengue hemorrhagic fever throughout this country. Here, we describe the full genomes of two new Brazilian DENV-serotype 1 (DENV-1) variants and analyze these together with all other available American DENV-1 full-genome sequences. Besides confirming the existence of various country-specific DENV-1 founder effects that have produced a high degree of geographical structure in the American DENV-1 population, we also identify that one of the new viruses is one of only three detectable intra-American DENV-1 recombinants. Although such obvious evidence of genetic exchange among epidemiologically unlinked Latin American DENV-1 sequences is relatively rare, we find that at the population-scale there exists substantial evidence of pervasive recombination that most likely occurs between viruses that are so genetically similar that it is not possible to reliably distinguish and characterize individual recombination events.
15
Struchiner CJ, Massad E, Tu Z, Ribeiro JMC. The tempo and mode of evolution of transposable elements as revealed by molecular phylogenies reconstructed from mosquito genomes. Evolution 2009; 63:3136-46. [PMID: 19656180 PMCID: PMC2789996 DOI: 10.1111/j.1558-5646.2009.00788.x] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Indexed: 12/24/2022]
Abstract
Although many mathematical models exist predicting the dynamics of transposable elements (TEs), there is a lack of available empirical data to validate these models and inherent assumptions. Genomes can provide a snapshot of several TE families in a single organism, and these could have their demographics inferred by coalescent analysis, allowing for the testing of theories on TE amplification dynamics. Using the available genomes of the mosquitoes Aedes aegypti and Anopheles gambiae, we indicate that such an approach is feasible. Our analysis follows four steps: (1) mining the two mosquito genomes currently available in search of TE families; (2) fitting, to selected families found in (1), a phylogeny tree under the general time-reversible (GTR) nucleotide substitution model with an uncorrelated lognormal (UCLN) relaxed clock and a nonparametric demographic model; (3) fitting a nonparametric coalescent model to the tree generated in (2); and (4) fitting parametric models motivated by ecological theories to the curve generated in (3).
Affiliation(s)
- Claudio J Struchiner
- ENSP/FIOCRUZ and IMS/UERJ, Av. Brasil, 4365, Rio de Janeiro, Brazil 21040 360, Brazil.
16
Abstract
The coalescent with recombination is a very useful tool in molecular population genetics. Under this framework, genealogies often represent the evolution of the substitution unit, and because of this, the few coalescent algorithms implemented for the simulation of coding sequences force recombination to occur only between codons. However, it is clear that recombination is expected to occur most often within codons. Here we have developed an algorithm that can evolve coding sequences under an ancestral recombination graph that represents the genealogies at each nucleotide site, thereby allowing for intracodon recombination. The algorithm is a modification of Hudson's coalescent in which, in addition to keeping track of events occurring in the ancestral material that reaches the sample, we need to keep track of events occurring in ancestral material that does not reach the sample but that is produced by intracodon recombination. We are able to show that at typical substitution rates the number of nonsynonymous changes induced by intracodon recombination is small and that intracodon recombination does not generally result in inflated estimates of the overall nonsynonymous/synonymous substitution ratio (omega). On the other hand, recombination can bias the estimation of omega at particular codons, resulting in apparent rate variation among sites and in the spurious identification of positively selected sites. Importantly, in this case, allowing for variable synonymous rates across sites greatly reduces the false-positive rate and recovers statistical power. Finally, coalescent simulations with intracodon recombination could be used to better represent the evolution of nuclear coding genes or fast-evolving pathogens such as HIV-1.We have implemented this algorithm in a computer program called NetRecodon, freely available at http://darwin.uvigo.es.
17
Lee HY, Perelson AS, Park SC, Leitner T. Dynamic correlation between intrahost HIV-1 quasispecies evolution and disease progression. PLoS Comput Biol 2008; 4:e1000240. [PMID: 19079613 PMCID: PMC2602878 DOI: 10.1371/journal.pcbi.1000240] [Citation(s) in RCA: 59] [Impact Index Per Article: 3.5] [Received: 03/05/2008] [Accepted: 10/31/2008] [Indexed: 11/19/2022] Open
Abstract
Quantifying the dynamics of intrahost HIV-1 sequence evolution is one means of uncovering information about the interaction between HIV-1 and the host immune system. In the chronic phase of infection, common dynamics of sequence divergence and diversity have been reported. We developed an HIV-1 sequence evolution model that simulated the effects of mutation and fitness of sequence variants. The amount of evolution was described by the distance from the founder strain, and fitness was described by the number of offspring a parent sequence produces. Analysis of the model suggested that the previously observed saturation of divergence and decrease of diversity in later stages of infection can be explained by a decrease in the proportion of offspring that are mutants as the distance from the founder strain increases rather than due to an increase of viral fitness. The prediction of the model was examined by performing phylogenetic analysis to estimate the change in the rate of evolution during infection. In agreement with our modeling, in 13 out of 15 patients (followed for 3-12 years) we found that the rate of intrahost HIV-1 evolution was not constant but rather slowed down at a rate correlated with the rate of CD4+ T-cell decline. The correlation between the dynamics of the evolutionary rate and the rate of CD4+ T-cell decline, coupled with our HIV-1 sequence evolution model, explains previously conflicting observations of the relationships between the rate of HIV-1 quasispecies evolution and disease progression.
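The two quantities discussed in this abstract can be made concrete with a small sketch. Here divergence is taken as the mean Hamming distance from the founder sequence and diversity as the mean pairwise Hamming distance within a sample. These definitions and function names are assumptions chosen for illustration and may differ from the exact measures used by the authors.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two aligned sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def divergence(founder: str, sample: list[str]) -> float:
    """Mean distance of sampled sequences from the founder strain."""
    return sum(hamming(founder, s) for s in sample) / len(sample)


def diversity(sample: list[str]) -> float:
    """Mean pairwise distance among sampled sequences."""
    pairs = list(combinations(sample, 2))
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    founder = "AAAA"
    sample = ["AAAT", "AATT", "AAAA"]
    print(divergence(founder, sample), diversity(sample))
```

Tracking both values over successive time points is one way to see the pattern the abstract describes, in which divergence saturates while diversity declines.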
Affiliation(s)
- Ha Youn Lee
- Department of Biostatistics and Computational Biology, University of Rochester Medical Center, NY, USA.
18
The Yin and Yang of linkage disequilibrium: mapping of genes and nucleotides conferring insecticide resistance in insect disease vectors. Adv Exp Med Biol 2008; 627:71-83. [PMID: 18510015 DOI: 10.1007/978-0-387-78225-6_6] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Indexed: 05/18/2023]
Abstract
Genetic technologies developed in the last 20 years have led to novel and exciting methods to identify genes and specific nucleotides within genes that control phenotypes in field-collected organisms. In this review we define and explain two of these methods: linkage disequilibrium (LD) mapping and quantitative trait nucleotide (QTN) mapping. The power to detect valid genotype-phenotype associations with LD or QTN mapping depends critically on the extent to which segregating sites in a genome assort independently. LD mapping depends on markers being in disequilibrium with the genes that condition expression of the phenotype. In contrast, QTN mapping depends critically upon most proximal loci being at equilibrium. We show that both patterns actually exist in the genome of Anopheles gambiae, the most important malaria vector in sub-Saharan Africa, while segregating sites appear to be largely in equilibrium throughout the genome of Aedes aegypti, the vector of Dengue and Yellow fever flaviviruses. We discuss additional approaches that will be needed to identify genes and nucleotides that control phenotypes in field-collected organisms, focusing specifically on ongoing studies of genes conferring resistance to insecticides.
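As a reminder of what "disequilibrium" and "equilibrium" mean quantitatively, the sketch below computes the standard two-locus coefficient D and the squared correlation r² from a haplotype frequency and two allele frequencies. The function name and the example frequencies are illustrative assumptions and are not data from the review.

```python
def ld_stats(p_ab: float, p_a: float, p_b: float) -> tuple[float, float]:
    """Return (D, r^2) for alleles A and B at two biallelic loci.

    p_ab: frequency of the A-B haplotype
    p_a, p_b: frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else float("nan")
    return d, r2


if __name__ == "__main__":
    # Complete association: A always travels with B.
    print(ld_stats(0.5, 0.5, 0.5))
    # Independent assortment: haplotype frequency equals the product.
    print(ld_stats(0.25, 0.5, 0.5))
```

LD mapping relies on markers with r² well above zero relative to the causal site, whereas the QTN approach described above assumes that most nearby sites have r² close to zero.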
19
Carvajal-Rodríguez A. Detecting recombination and diversifying selection in human alpha-papillomavirus. Infect Genet Evol 2008; 8:689-92. [PMID: 18675939 DOI: 10.1016/j.meegid.2008.07.002] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.3] [Received: 05/21/2008] [Revised: 07/04/2008] [Accepted: 07/08/2008] [Indexed: 11/26/2022]
Abstract
Intragenic recombination and selection analyses were performed in DNA sequences of human alpha-papillomavirus. Recombination was estimated and the corresponding breakpoints obtained by re-analyzing data grouped by phylogenetic and epidemiological criteria, using different alignment methods. Diversifying or positive selection has been scarcely studied in these viruses probably due to the high divergence rates. We have applied maximum likelihood, empirical Bayesian and maximum parsimony methods to detect the presence of positive selection. Within the HPV 16 type, significant positive selection was detected at the time of the separation of the African 1 and African 2 branches from the other populations. At the inter-type level, positive selection can be traced in some codons of the gene L2 of the high and low risk groups. These results indicate that positive selection could have been important in the evolution of HPV both at inter- and intra-type levels.
Affiliation(s)
- A Carvajal-Rodríguez
- Dpto. de Bioquímica, Genética e Inmunología, Facultad de Biología, Universidad de Vigo, 36310 Vigo, Spain.
20
Carvajal-Rodríguez A. GENOMEPOP: a program to simulate genomes in populations. BMC Bioinformatics 2008; 9:223. [PMID: 18447924 PMCID: PMC2386491 DOI: 10.1186/1471-2105-9-223] [Citation(s) in RCA: 50] [Impact Index Per Article: 2.9] [Received: 02/05/2008] [Accepted: 04/30/2008] [Indexed: 11/17/2022] Open
Abstract
BACKGROUND There are several situations in population biology research where simulating DNA sequences is useful. Simulation of biological populations under different evolutionary genetic models can be undertaken using backward or forward strategies. Backward simulations, also called coalescent-based simulations, are computationally efficient. The reason is that they are based on the history of lineages with surviving offspring in the current population. On the contrary, forward simulations are less efficient because the entire population is simulated from past to present. However, the coalescent framework imposes some limitations that forward simulation does not. Hence, there is an increasing interest in forward population genetic simulation and efficient new tools have been developed recently. Software tools that allow efficient simulation of large DNA fragments under complex evolutionary models will be very helpful when trying to better understand the trace left on the DNA by the different interacting evolutionary forces. Here I will introduce GenomePop, a forward simulation program that fulfills the above requirements. The use of the program is demonstrated by studying the impact of intracodon recombination on global and site-specific dN/dS estimation. RESULTS I have developed algorithms and written software to efficiently simulate, forward in time, different Markovian nucleotide or codon models of DNA mutation. Such models can be combined with recombination, at inter and intra codon levels, fitness-based selection and complex demographic scenarios. CONCLUSION GenomePop has many interesting characteristics for simulating SNPs or DNA sequences under complex evolutionary and demographic models. These features make it unique with respect to other simulation tools. 
Namely, it supports forward simulation under General Time Reversible (GTR) mutation or GTRxMG94 codon models with intra-codon recombination; arbitrary, user-defined migration patterns; diploid or haploid models; and constant or variable population sizes, among other options. It also allows simulation of fitness-based selection under different distributions of mutational effects. Under the 2-allele model it allows the simulation of recombination hot-spots, the definition of different frequencies in different populations, etc. GenomePop can also manage large DNA fragments. In addition, it has a scaling option to save computation time when simulating large sequences and population sizes under complex demographic and evolutionary situations. These and many other features are detailed in its web page [1].
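The forward-in-time strategy described in this abstract can be sketched with a minimal haploid Wright-Fisher loop, in which every individual of every generation is simulated explicitly. This is not GenomePop's implementation: the function name, the equal-rates mutation model (a simplification of GTR), and the absence of recombination, selection and migration are all assumptions made to keep the example short.

```python
import random

ALPHABET = "ACGT"


def wright_fisher(pop_size: int, generations: int, mut_rate: float,
                  seq_len: int, seed: int = 0) -> list[str]:
    """Evolve a haploid population forward in time and return the final sequences.

    Each generation, every child picks a random parent (drift) and each site
    mutates to a different base with probability mut_rate (equal rates).
    """
    rng = random.Random(seed)
    population = ["A" * seq_len] * pop_size
    for _ in range(generations):
        next_gen = []
        for _ in range(pop_size):
            parent = rng.choice(population)
            child = [
                rng.choice(ALPHABET.replace(base, "")) if rng.random() < mut_rate else base
                for base in parent
            ]
            next_gen.append("".join(child))
        population = next_gen
    return population


if __name__ == "__main__":
    final = wright_fisher(pop_size=20, generations=10, mut_rate=0.01, seq_len=30, seed=1)
    print(len(set(final)), "distinct haplotypes in the final generation")
```

The cost grows with population size times generations times sequence length, which is why forward simulators are described as less efficient than coalescent ones and why scaling options are attractive.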
21
Conflict amongst chloroplast DNA sequences obscures the phylogeny of a group of Asplenium ferns. Mol Phylogenet Evol 2008; 48:176-87. [PMID: 18462954 DOI: 10.1016/j.ympev.2008.02.023] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.2] [Received: 10/30/2007] [Revised: 02/21/2008] [Accepted: 02/26/2008] [Indexed: 11/24/2022]
Abstract
A previous study of the relationships amongst three subgroups of the Austral Asplenium ferns found conflicting signal between the two chloroplast loci investigated. Because organelle genomes like those of chloroplasts and mitochondria are thought to be non-recombining, with a single evolutionary history, we sequenced four additional chloroplast loci with the expectation that this would resolve these relationships. Instead, the conflict was only magnified. Although tree-building analyses favoured one of the three possible trees, one of the alternative trees actually had one more supporting site (six versus five) and received greater support in spectral and neighbor-net analyses. Simulations suggested that chance alone was unlikely to produce strong support for two of the possible trees and none for the third. Likelihood permutation tests indicated that the concatenated chloroplast sequence data appeared to have experienced recombination. However, recombination between the chloroplast genomes of different species would be highly atypical, and corollary supporting observations, like chloroplast heteroplasmy, are lacking. Wider taxon sampling clarified the composition of the Austral group, but the conflicting signal meant analyses (e.g., morphological evolution, biogeographic) conditional on a well-supported phylogeny could not be performed.
22
Disease progression and evolution of the HIV-1 env gene in 24 infected infants. Infect Genet Evol 2008; 8:110-20. [DOI: 10.1016/j.meegid.2007.10.009] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.2] [Received: 08/03/2007] [Revised: 10/23/2007] [Accepted: 10/24/2007] [Indexed: 11/23/2022]
23
Li N. The promise of composite likelihood methods for addressing computationally intensive challenges. Adv Genet 2008; 60:637-654. [PMID: 18358335 DOI: 10.1016/s0065-2660(07)00422-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 05/26/2023]
Abstract
High-dimensional genetic data, due to its complex correlation structure, poses an enormous challenge to standard likelihood-based methods for making statistical inference. As an approximation, composite likelihood has proved to be a successful strategy for some genetic applications. It has the potential to see even wider application and much research is needed. We first give a brief description of composite likelihood. The advantage of this method and potential challenges in inference are noted. Next, its applications in genetic studies are reviewed, specifically in estimating population genetics parameters such as recombination rate, and in multi-locus linkage disequilibrium mapping of disease genes with some discussion about future research directions.
Affiliation(s)
- Na Li
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455, USA
24
Bisharat N, Cohen DI, Maiden MC, Crook DW, Peto T, Harding RM. The evolution of genetic structure in the marine pathogen, Vibrio vulnificus. Infect Genet Evol 2007; 7:685-93. [PMID: 17716955 DOI: 10.1016/j.meegid.2007.07.007] [Citation(s) in RCA: 49] [Impact Index Per Article: 2.7] [Received: 11/29/2006] [Revised: 05/22/2007] [Accepted: 07/13/2007] [Indexed: 11/22/2022]
Abstract
Multi-locus sequence types (MLST) from a global collection of Vibrio vulnificus isolates were analysed for the contribution of recombination to the evolution of two divergent clusters of strains and a human-pathogenic hybrid genotype, which caused a disease outbreak in Israel. Recombination contributes more substantially than mutation to generating strain diversity. For allelic diversity within loci, the ratio of recombination to mutation events is approximately 2:1. The role of recombination relative to mutation in the generation of new MLST variants of V. vulnificus within the clusters is comparable to that of other highly recombining bacteria such as Neisseria meningitidis. However, across the divide between the two major clusters of V. vulnificus strains, there is substantial linkage disequilibrium, lower estimates for recombination rates and shorter estimates of recombination tract length. We account for these differences between V. vulnificus and N. meningitidis by attributing them to the presence of the unusual genetic structure within V. vulnificus. The reason for the presence of distinct and divergent genomes remains unresolved. Two possible explanations put forward for future study are first, ecologically based population structure within V. vulnificus and second, a recombination donor from a phenotypically differentiated species.
Affiliation(s)
- Naiel Bisharat
- Department of Epidemiology and Preventive Medicine, Sackler Faculty of Medicine, Tel Aviv University, Ramat Aviv, Israel.
25
Arenas M, Posada D. Recodon: coalescent simulation of coding DNA sequences with recombination, migration and demography. BMC Bioinformatics 2007; 8:458. [PMID: 18028540 PMCID: PMC2206059 DOI: 10.1186/1471-2105-8-458] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.3] [Received: 08/02/2007] [Accepted: 11/20/2007] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Coalescent simulations have proven very useful in many population genetics studies. In order to arrive at meaningful conclusions, it is important that these simulations resemble the process of molecular evolution as much as possible. To date, no single coalescent program is able to simulate codon sequences sampled from populations with recombination, migration and growth. RESULTS We introduce a new coalescent program, called Recodon, which is able to simulate samples of coding DNA sequences under complex scenarios in which several evolutionary forces can interact simultaneously (namely, recombination, migration and demography). The basic codon model implemented is an extension to the general time-reversible model of nucleotide substitution with a proportion of invariable sites and among-site rate variation. In addition, the program implements non-reversible processes and mixtures of different codon models. CONCLUSION Recodon is a flexible tool for the simulation of coding DNA sequences under realistic evolutionary models. These simulations can be used to build parameter distributions for testing evolutionary hypotheses using experimental data. Recodon is written in C, can run in parallel, and is freely available from .
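For readers unfamiliar with coalescent simulation, the following sketch draws the waiting times between coalescence events under the basic Kingman coalescent (constant population size, no recombination or migration). It is far simpler than Recodon and is not taken from it; the function names are hypothetical, and time is measured in the usual coalescent units in which a pair of lineages coalesces at rate 1.

```python
import random


def coalescent_waiting_times(n: int, rng: random.Random) -> list[float]:
    """Waiting times while k = n, n-1, ..., 2 lineages remain.

    With k lineages, the next coalescence arrives after an exponential
    time with rate k * (k - 1) / 2.
    """
    return [rng.expovariate(k * (k - 1) / 2) for k in range(n, 1, -1)]


def expected_tmrca(n: int) -> float:
    """Expected time to the most recent common ancestor: 2 * (1 - 1/n)."""
    return 2 * (1 - 1 / n)


if __name__ == "__main__":
    rng = random.Random(42)
    reps = 20000
    mean_tmrca = sum(sum(coalescent_waiting_times(5, rng)) for _ in range(reps)) / reps
    print(mean_tmrca, expected_tmrca(5))
```

Because only the ancestors of the sample are tracked, the work depends on the sample size rather than on the population size, which is the efficiency advantage of backward simulation.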
Affiliation(s)
- Miguel Arenas
- Departamento de Bioquímica, Genética e Inmunología, Universidad de Vigo, 36310 Vigo, Spain.
26
Hoggart CJ, Chadeau-Hyam M, Clark TG, Lampariello R, Whittaker JC, De Iorio M, Balding DJ. Sequence-level population simulations over large genomic regions. Genetics 2007; 177:1725-31. [PMID: 17947444 PMCID: PMC2147962 DOI: 10.1534/genetics.106.069088] [Citation(s) in RCA: 80] [Impact Index Per Article: 4.4] [Received: 12/01/2006] [Accepted: 08/30/2007] [Indexed: 11/18/2022] Open
Abstract
Simulation is an invaluable tool for investigating the effects of various population genetics modeling assumptions on resulting patterns of genetic diversity, and for assessing the performance of statistical techniques, for example those designed to detect and measure the genomic effects of selection. It is also used to investigate the effectiveness of various design options for genetic association studies. Backward-in-time simulation methods are computationally efficient and have become widely used since their introduction in the 1980s. The forward-in-time approach has substantial advantages in terms of accuracy and modeling flexibility, but at greater computational cost. We have developed flexible and efficient simulation software and a rescaling technique to aid computational efficiency that together allow the simulation of sequence-level data over large genomic regions in entire diploid populations under various scenarios for demography, mutation, selection, and recombination, the latter including hotspots and gene conversion. Our forward evolution of genomic regions (FREGENE) software is freely available from www.ebi.ac.uk/projects/BARGEN together with an ancillary program to generate phenotype labels, either binary or quantitative. In this article we discuss limitations of coalescent-based simulation, introduce the rescaling technique that makes large-scale forward-in-time simulation feasible, and demonstrate the utility of various features of FREGENE, many not previously available.
Affiliation(s)
- Clive J Hoggart
- Department of Epidemiology and Public Health, Imperial College, London W2 1PG, United Kingdom.
27
Boni MF, Posada D, Feldman MW. An exact nonparametric method for inferring mosaic structure in sequence triplets. Genetics 2007; 176:1035-47. [PMID: 17409078 PMCID: PMC1894573 DOI: 10.1534/genetics.106.068874] [Citation(s) in RCA: 599] [Impact Index Per Article: 33.3] [Received: 11/27/2006] [Accepted: 03/18/2007] [Indexed: 11/18/2022] Open
Abstract
Statistical tests for detecting mosaic structure or recombination among nucleotide sequences usually rely on identifying a pattern or a signal that would be unlikely to appear under clonal reproduction. Dozens of such tests have been described, but many are hampered by long running times, confounding of selection and recombination, and/or inability to isolate the mosaic-producing event. We introduce a test that is exact, nonparametric, rapidly computable, free of the infinite-sites assumption, able to distinguish between recombination and variation in mutation/fixation rates, and able to identify the breakpoints and sequences involved in the mosaic-producing event. Our test considers three sequences at a time: two parent sequences that may have recombined, with one or two breakpoints, to form the third sequence (the child sequence). Excess similarity of the child sequence to a candidate recombinant of the parents is a sign of recombination; we take the maximum value of this excess similarity as our test statistic Delta(m,n,b). We present a method for rapidly calculating the distribution of Delta(m,n,b) and demonstrate that it has comparable power to and a much improved running time over previous methods, especially in detecting recombination in large data sets.
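One way to picture the triplet test is as a walk over informative sites: step up when the child matches one parent, step down when it matches the other, and look for the largest drop. The sketch below computes that maximum descent. It is a simplified reading of the abstract, not the authors' implementation: the function names are hypothetical, and the exact null distribution of the statistic, which is the paper's main contribution, is not computed here.

```python
def informative_steps(parent_p: str, parent_q: str, child: str) -> list[int]:
    """+1 where the child matches only parent P, -1 where it matches only parent Q."""
    steps = []
    for p, q, c in zip(parent_p, parent_q, child):
        if p == q:
            continue  # uninformative: parents agree at this site
        if c == p:
            steps.append(+1)
        elif c == q:
            steps.append(-1)
    return steps


def max_descent(steps: list[int]) -> int:
    """Largest drop from any earlier height of the walk to a later height."""
    height = peak = best = 0
    for s in steps:
        height += s
        peak = max(peak, height)
        best = max(best, peak - height)
    return best


if __name__ == "__main__":
    p, q = "AAAAAAAA", "CCCCCCCC"
    print(max_descent(informative_steps(p, q, "AAAACCCC")))  # clean mosaic
    print(max_descent(informative_steps(p, q, "ACACACAC")))  # no clear breakpoint
```

A child that resembles P on one side of a breakpoint and Q on the other produces a long uninterrupted descent, whereas a child whose similarities are interleaved produces only short ones.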
Affiliation(s)
- Maciej F Boni
- Stanford Genome Technology Center, Palo Alto, California 94304, USA.
28
Evidence of recombination within human alpha-papillomavirus. Virol J 2007; 4:33. [PMID: 17391520 PMCID: PMC1847806 DOI: 10.1186/1743-422x-4-33] [Citation(s) in RCA: 35] [Impact Index Per Article: 1.9] [Received: 02/05/2007] [Accepted: 03/28/2007] [Indexed: 11/21/2022] Open
Abstract
BACKGROUND Human papillomavirus (HPV) has a causal role in cervical cancer with almost half a million new cases occurring each year. Presence of the carcinogenic HPV is necessary for the development of the invasive carcinoma of the genital tract. Therefore, persistent infection with carcinogenic HPV causes virtually all cervical cancers. Some aspects of the molecular evolution of this virus, such as the putative importance of recombination in its evolutionary history, remain an open question. In addition, recombination could also be a significant issue nowadays since co-infection with more than one HPV type is not rare and, thus, new recombinant types could currently be generated. RESULTS We have used human alpha-PV sequences from the public database at Los Alamos National Laboratory to report evidence that recombination may exist in this virus. A model-based population genetic approach was used to infer the recombination signal from the HPV DNA sequences grouped according to phylogenetic and epidemiological information, as well as to clinical manifestations. Our results agree with recently published ones that use a different methodology to detect recombination associated with the gene L2. In addition, we have detected significant recombination signal in the genes E6, E7, L2 and L1 in different groups, and importantly within the high-risk type HPV16. The method used has recently been shown to be one of the most powerful and reliable procedures to detect the recombination signal. CONCLUSION We provide new support to the recent evidence of recombination in HPV. Additionally, we performed the recombination estimation assuming the best-fit model of nucleotide substitution and rate variation among sites, of the HPV DNA sequence sets. We found that the gene with recombination in most of the groups is L2 but the highest values were detected in L1 and E6. Gene E7 was recombinant only within the HPV16 type.
The topic deserves further study because recombination is an important evolutionary mechanism that could have high impact both in pharmacogenomics (i.e. on the influence of genetic variation on the response to drugs) and for vaccine development.
29
Lemey P, Kosakovsky Pond SL, Drummond AJ, Pybus OG, Shapiro B, Barroso H, Taveira N, Rambaut A. Synonymous substitution rates predict HIV disease progression as a result of underlying replication dynamics. PLoS Comput Biol 2007; 3:e29. [PMID: 17305421 PMCID: PMC1797821 DOI: 10.1371/journal.pcbi.0030029] [Citation(s) in RCA: 144] [Impact Index Per Article: 8.0] [Received: 09/05/2006] [Accepted: 12/29/2006] [Indexed: 12/02/2022] Open
Abstract
Upon HIV transmission, some patients develop AIDS in only a few months, while others remain disease free for 20 or more years. This variation in the rate of disease progression is poorly understood and has been attributed to host genetics, host immune responses, co-infection, viral genetics, and adaptation. Here, we develop a new “relaxed-clock” phylogenetic method to estimate absolute rates of synonymous and nonsynonymous substitution through time. We identify an unexpected association between the synonymous substitution rate of HIV and disease progression parameters. Since immune activation is the major determinant of HIV disease progression, we propose that this process can also determine viral generation times, by creating favourable conditions for HIV replication. These conclusions may apply more generally to HIV evolution, since we also observed an overall low synonymous substitution rate for HIV-2, which is known to be less pathogenic than HIV-1 and capable of tempering the detrimental effects of immune activation. Humoral immune responses, on the other hand, are the major determinant of nonsynonymous rate changes through time in the envelope gene, and our relaxed-clock estimates support a decrease in selective pressure as a consequence of immune system collapse. During the clinical course of HIV infection, an asymptomatic phase always precedes the acquired immunodeficiency syndrome (AIDS). The duration of this asymptomatic phase is highly variable among patients and reflects the rate at which the immune system gradually deteriorates. Although humoral and cell-mediated immune responses are mounted against HIV, continuous replication and adaptation allows the virus to escape host immune responses. To gain a better understanding of the role of viral evolution in disease progression, we developed a new computational technique that can estimate changes in the absolute rates of synonymous and nonsynonymous divergence through time from molecular sequences. 
Using this type of evolutionary inference, we have identified a previously unknown association between the “silent” evolutionary rate of HIV and the rate of disease progression in infected individuals. This finding demonstrates that cellular immune processes, which are already known to determine HIV pathogenesis, also determine viral replication rates and therefore impose important constraints on HIV evolution.
Affiliation(s)
- Philippe Lemey
- Department of Zoology, University of Oxford, Oxford, United Kingdom.
30
Pérez-Losada M, Porter ML, Tazi L, Crandall KA. New methods for inferring population dynamics from microbial sequences. Infect Genet Evol 2007; 7:24-43. [PMID: 16627010 PMCID: PMC1949847 DOI: 10.1016/j.meegid.2006.03.004] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.9] [Received: 11/28/2005] [Revised: 03/13/2006] [Accepted: 03/14/2006] [Indexed: 11/26/2022]
Abstract
The reduced cost of high throughput sequencing, increasing automation, and the amenability of sequence data for evolutionary analysis are making DNA data (or the corresponding amino acid sequences) the molecular marker of choice for studying microbial population genetics and phylogenetics. Concomitantly, due to the ever-increasing computational power, new, more accurate (and sometimes faster), sequence-based analytical approaches are being developed and applied to these new data. Here we review some commonly used, recently improved, and newly developed methodologies for inferring population dynamics and evolutionary relationships using nucleotide and amino acid sequence data, including: alignment, model selection, bifurcating and network phylogenetic approaches, and methods for estimating demographic history, population structure, and population parameters (recombination, genetic diversity, growth, and natural selection). Because of the extensive literature published on these topics this review cannot be comprehensive in its scope. Instead, for all the methods discussed we introduce the approaches we think are particularly useful for analyses of microbial sequences and where possible, include references to recent and more inclusive reviews.
Affiliation(s)
- Marcos Pérez-Losada
- Department of Integrative Biology, 157 Widtsoe Building, Brigham Young University, Provo, UT 84602, USA.
31
Ruderfer DM, Pratt SC, Seidel HS, Kruglyak L. Population genomic analysis of outcrossing and recombination in yeast. Nat Genet 2006; 38:1077-81. [PMID: 16892060 DOI: 10.1038/ng1859] [Citation(s) in RCA: 172] [Impact Index Per Article: 9.1] [Received: 05/03/2006] [Accepted: 07/10/2006] [Indexed: 11/09/2022]
Abstract
The budding yeast Saccharomyces cerevisiae has been used by humans for millennia to make wine, beer and bread. More recently, it became a key model organism for studies of eukaryotic biology and for genomic analysis. However, relatively little is known about the natural lifestyle and population genetics of yeast. One major question is whether genetically diverse yeast strains mate and recombine in the wild. We developed a method to infer the evolutionary history of a species from genome sequences of multiple individuals and applied it to whole-genome sequence data from three strains of Saccharomyces cerevisiae and the sister species Saccharomyces paradoxus. We observed a pattern of sequence variation among yeast strains in which ancestral recombination events lead to a mosaic of segments with shared genealogy. Based on sequence divergence and the inferred median size of shared segments (approximately 2,000 bp), we estimated that although any two strains have undergone approximately 16 million cell divisions since their last common ancestor, only 314 outcrossing events have occurred during this time (roughly one every 50,000 divisions). Local correlations in polymorphism rates indicate that linkage disequilibrium in yeast should extend over kilobases. Our results provide the initial foundation for population studies of association between genotype and phenotype in S. cerevisiae.
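The "roughly one every 50,000 divisions" figure follows directly from the two counts quoted in the abstract. A short check, using a hypothetical helper name:

```python
def divisions_per_outcrossing(cell_divisions: float, outcrossing_events: int) -> float:
    """Average number of cell divisions between successive outcrossing events."""
    return cell_divisions / outcrossing_events


if __name__ == "__main__":
    # Values as stated in the abstract: ~16 million divisions, 314 outcrossing events.
    print(divisions_per_outcrossing(16_000_000, 314))
```

The quotient is a little under 51,000, which the abstract rounds to one outcrossing event per 50,000 divisions.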
Affiliation(s)
- Douglas M Ruderfer
- Lewis-Sigler Institute for Integrative Genomics and Department of Ecology and Evolutionary Biology, Princeton University, Princeton, New Jersey 08544, USA
32
Kosakovsky Pond SL, Posada D, Gravenor MB, Woelk CH, Frost SDW. Automated Phylogenetic Detection of Recombination Using a Genetic Algorithm. Mol Biol Evol 2006; 23:1891-901. [PMID: 16818476 DOI: 10.1093/molbev/msl051] [Citation(s) in RCA: 723] [Impact Index Per Article: 38.1] [Indexed: 01/15/2023] Open
Abstract
The evolution of homologous sequences affected by recombination or gene conversion cannot be adequately explained by a single phylogenetic tree. Many tree-based methods for sequence analysis, for example, those used for detecting sites evolving nonneutrally, have been shown to fail if such phylogenetic incongruity is ignored. However, it may be possible to propose several phylogenies that can correctly model the evolution of nonrecombinant fragments. We propose a model-based framework that uses a genetic algorithm to search a multiple-sequence alignment for putative recombination break points, quantifies the level of support for their locations, and identifies sequences or clades involved in putative recombination events. The software implementation can be run quickly and efficiently in a distributed computing environment, and various components of the methods can be chosen for computational expediency or statistical rigor. We evaluate the performance of the new method on simulated alignments and on an array of published benchmark data sets. Finally, we demonstrate that prescreening alignments with our method allows one to analyze recombinant sequences for positive selection.