1
Ji J, Roberts T, Flouri T, Yang Z. Inference of Cross-Species Gene Flow Using Genomic Data Depends on the Methods: Case Study of Gene Flow in Drosophila. Syst Biol 2025:syaf019. [PMID: 40421982 DOI: 10.1093/sysbio/syaf019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2024] [Revised: 10/25/2024] [Accepted: 03/11/2025] [Indexed: 05/28/2025]
Abstract
Analysis of genomic data in the past two decades has highlighted the prevalence of introgression as an important evolutionary force in both plants and animals. The genus Drosophila has received much attention recently, with an analysis of genomic sequence data revealing widespread introgression across the species phylogeny for the genus. However, the methods used in the study are based on data summaries for species triplets and are unable to infer gene flow between sister lineages or to identify the direction of gene flow. Hence, we reanalyze a subset of the data using the Bayesian program bpp, which is a full-likelihood implementation of the multispecies coalescent model and can provide more powerful inference of gene flow between species, including its direction, timing, and strength. While our analysis supports the presence of gene flow in the species group, the results differ from the previous study: we infer gene flow between sister lineages undetected previously whereas most gene-flow events inferred in the previous study are rejected in our tests. To verify our conclusions, we performed simulations to examine the properties of Bayesian and summary methods. Bpp was found to have high power to detect gene flow, high accuracy in estimated rates of gene flow, and robustness under misspecification of the mode of gene flow. In contrast, summary methods had low power and produced biased estimates of introgression probability. Our results highlight an urgent need for improving the statistical properties of summary methods and the computational efficiency of likelihood methods for inferring gene flow using genomic sequence data.
Affiliation(s)
- Jiayi Ji
  - Department of Genetics, Evolution, and Environment, University College London, Gower Street, London WC1E 6BT, UK
- Thomas Roberts
  - Department of Genetics, Evolution, and Environment, University College London, Gower Street, London WC1E 6BT, UK
- Tomáš Flouri
  - Department of Genetics, Evolution, and Environment, University College London, Gower Street, London WC1E 6BT, UK
- Ziheng Yang
  - Department of Genetics, Evolution, and Environment, University College London, Gower Street, London WC1E 6BT, UK
2
Magalhães FDM, Oliveira EF, Garda AA, Burbrink FT, Gehara M. Genomic data support reticulate evolution in whiptail lizards from the Brazilian Caatinga. Mol Phylogenet Evol 2025; 204:108280. [PMID: 39725181 DOI: 10.1016/j.ympev.2024.108280] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/01/2024] [Revised: 12/16/2024] [Accepted: 12/21/2024] [Indexed: 12/28/2024]
Abstract
Species relationships have traditionally been represented by phylogenetic trees, but not all evolutionary histories fit into bifurcating divergence models. Introgressive hybridization challenges this assumption by sometimes [or maybe often] leading to mitochondrial introgression, wherein one species' mitochondrial genome is entirely replaced by another's (mitochondrial capture). Such processes result in mitonuclear discrepancies, complicating species delimitation and phylogenetic inference. In our study, we used ultraconserved elements (UCE) and mitogenomic data to investigate the evolutionary history of the Ameivula ocellifera complex, a group of South American whiptail lizards widely distributed in semiarid environments of the Caatinga Domain in Brazil. We examine mitonuclear discordances, assessing reticulate evolution, evaluating species limits, and testing for adaptive mitochondrial capture that could explain higher introgression in the mitochondrial genome compared to nuclear DNA. Our findings support the occurrence of an ancient reticulation event during the diversification of these lizards, driven by introgressive hybridization, leading to mitochondrial capture, and explaining mitonuclear discrepancies. Overall, we did not find clear evidence of positive selection across mitochondrial protein-coding genes suggesting adaptive mitochondrial capture of individuals with introgressed mtDNA. Thus, the genetic diversification and mitogenome evolution could be neutral, with selection against hybridization in the autosomal loci only, or even mediated by mitonuclear incompatibilities. Analyses of mtDNA genomes alongside network and species delimitation methods were crucial for identifying and validating individuals with introgressed mtDNA as a distinct species, demonstrating the potential of genome sampling, and using innovative analytical techniques for elucidating speciation processes in the presence of introgressive hybridization.
Affiliation(s)
- Felipe de M Magalhães
  - Department of Earth and Environmental Sciences, Rutgers University, Newark, NJ, USA; Programa de Pós-Graduação em Ciências Biológicas, Centro de Ciências Exatas e da Natureza, Universidade Federal da Paraíba, João Pessoa, Paraíba, Brazil
- Eliana F Oliveira
  - Instituto de Biociências, Universidade Federal de Mato Grosso do Sul, Campo Grande, Mato Grosso do Sul, Brazil
- Adrian A Garda
  - Laboratório de Anfíbios e Répteis (LAR), Departamento de Botânica e Zoologia da Universidade Federal do Rio Grande do Norte, Natal, Rio Grande do Norte, Brazil
- Frank T Burbrink
  - Department of Herpetology, The American Museum of Natural History, New York, NY, USA
- Marcelo Gehara
  - Department of Earth and Environmental Sciences, Rutgers University, Newark, NJ, USA
3
Kornai D, Jiao X, Ji J, Flouri T, Yang Z. Hierarchical Heuristic Species Delimitation Under the Multispecies Coalescent Model with Migration. Syst Biol 2024; 73:1015-1037. [PMID: 39180155 PMCID: PMC11637770 DOI: 10.1093/sysbio/syae050] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/04/2023] [Revised: 08/12/2024] [Accepted: 08/20/2024] [Indexed: 08/26/2024]
Abstract
The multispecies coalescent (MSC) model accommodates genealogical fluctuations across the genome and provides a natural framework for comparative analysis of genomic sequence data from closely related species to infer the history of species divergence and gene flow. Given a set of populations, hypotheses of species delimitation (and species phylogeny) may be formulated as instances of MSC models (e.g., MSC for 1 species versus MSC for 2 species) and compared using Bayesian model selection. This approach, implemented in the program bpp, has been found to be prone to over-splitting. Alternatively, heuristic criteria based on population parameters (such as population split times, population sizes, and migration rates) estimated from genomic data may be used to delimit species. Here, we develop hierarchical merge and split algorithms for heuristic species delimitation based on the genealogical divergence index (gdi) and implement them in a Python pipeline called hhsd. We characterize the behavior of the gdi under a few simple scenarios of gene flow. We apply the new approaches to a dataset simulated under a model of isolation by distance as well as 3 empirical datasets. Our tests suggest that the new approaches produced sensible results and were less prone to oversplitting. We discuss possible strategies for accommodating paraphyletic species in the hierarchical algorithm, as well as the challenges of species delimitation based on heuristic criteria.
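As a rough illustration of the heuristic criterion mentioned in this abstract, the sketch below computes the genealogical divergence index in its commonly cited no-migration form, gdi = 1 - exp(-2τ/θ), where τ is the population split time and θ the population size parameter. The formula, the rule-of-thumb thresholds (0.2 and 0.7), and the function names are assumptions for illustration; they are not taken from the abstract, and the hhsd pipeline may compute and apply the gdi differently, in particular under models with migration.

```python
import math


def gdi(tau: float, theta: float) -> float:
    """Genealogical divergence index, assuming no gene flow.

    tau   -- population split time (expected substitutions per site)
    theta -- population size parameter (4 * N * mu) of the focal population
    """
    if tau < 0 or theta <= 0:
        raise ValueError("tau must be >= 0 and theta must be > 0")
    return 1.0 - math.exp(-2.0 * tau / theta)


def classify(value: float, lower: float = 0.2, upper: float = 0.7) -> str:
    """Apply hypothetical rule-of-thumb thresholds to a gdi value."""
    if value < lower:
        return "single species"
    if value > upper:
        return "distinct species"
    return "ambiguous"


# Example: split time equal to theta gives gdi = 1 - exp(-2)
example = gdi(0.01, 0.01)
print(round(example, 4), classify(example))
```

Values between the two thresholds are left as "ambiguous" here, which mirrors the idea that a heuristic index does not always yield a clear decision.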
Affiliation(s)
- Daniel Kornai
  - Department of Genetics, Evolution, and Environment, University College London, Gower Street, London WC1E 6BT, UK
- Xiyun Jiao
  - Department of Statistics and Data Science, China Southern University of Science and Technology, Shenzhen, Guangdong 518055, China
- Jiayi Ji
  - Department of Genetics, Evolution, and Environment, University College London, Gower Street, London WC1E 6BT, UK
- Tomáš Flouri
  - Department of Genetics, Evolution, and Environment, University College London, Gower Street, London WC1E 6BT, UK
- Ziheng Yang
  - Department of Genetics, Evolution, and Environment, University College London, Gower Street, London WC1E 6BT, UK
4
Fonseca LHM, Asselman P, Goodrich KR, Nge FJ, Soulé V, Mercier K, Couvreur TLP, Chatrou LW. Truly the best of both worlds: Merging lineage-specific and universal probe kits to maximize phylogenomic inference. Applications in Plant Sciences 2024; 12:e11615. [PMID: 39628541 PMCID: PMC11610415 DOI: 10.1002/aps3.11615] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2023] [Revised: 05/07/2024] [Accepted: 05/14/2024] [Indexed: 12/06/2024]
Abstract
Premise: Hybridization capture kits are now commonly used for reduced representation approaches in genomic sequencing, with both universal and clade-specific kits available. Here, we present a probe kit targeting 799 low-copy genes for the plant family Annonaceae. Methods: This new version of the kit combines the original 469 genes from the previous Annonaceae kit with 334 genes from the universal Angiosperms353 kit. We also compare the results obtained using the original Angiosperms353 kit with our custom approach using a subset of specimens. Parsimony-informative sites and the results of maximum likelihood phylogenetic inference were assessed for combined matrices using the genera Asimina and Deeringothamnus. Results: The Annonaceae799 genes derived from the Angiosperms353 kit have extremely high recovery rates. Off-target reads were also detected. When evaluating size, the proportion of on- and off-target regions, and the number of parsimony-informative sites, the genes incorporated from the Angiosperms353 panel generally outperformed the genes from the original Annonaceae probe kit. Discussion: We demonstrated that the new sequences from the Angiosperms353 probe set are variable and relevant for future studies on species-level phylogenomics and within-species studies in the Annonaceae. The integration of kits also establishes a connection between projects and makes new genes available for phylogenetic and population studies.
Affiliation(s)
- Luiz Henrique M. Fonseca
  - Systematic and Evolutionary Botany Laboratory, Department of Biology, Ghent University, Ghent, Belgium
- Pieter Asselman
  - Systematic and Evolutionary Botany Laboratory, Department of Biology, Ghent University, Ghent, Belgium
- Francis J. Nge
  - Institute of Research for Development (IRD), UMR DIADE, Université de Montpellier, Montpellier, France
- Vincent Soulé
  - Institute of Research for Development (IRD), UMR DIADE, Université de Montpellier, Montpellier, France
- Kathryn Mercier
  - Department of Biology, City College of New York, New York, New York, USA
  - The Graduate Center of the City University of New York, New York, New York, USA
- Thomas L. P. Couvreur
  - Institute of Research for Development (IRD), UMR DIADE, Université de Montpellier, Montpellier, France
- Lars W. Chatrou
  - Systematic and Evolutionary Botany Laboratory, Department of Biology, Ghent University, Ghent, Belgium
5
Herrig DK, Ridenbaugh RD, Vertacnik KL, Everson KM, Sim SB, Geib SM, Weisrock DW, Linnen CR. Whole Genomes Reveal Evolutionary Relationships and Mechanisms Underlying Gene-Tree Discordance in Neodiprion Sawflies. Syst Biol 2024; 73:839-860. [PMID: 38970484 DOI: 10.1093/sysbio/syae036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/06/2023] [Revised: 07/04/2024] [Accepted: 07/05/2024] [Indexed: 07/08/2024]
Abstract
Rapidly evolving taxa are excellent models for understanding the mechanisms that give rise to biodiversity. However, developing an accurate historical framework for comparative analysis of such lineages remains a challenge due to ubiquitous incomplete lineage sorting (ILS) and introgression. Here, we use a whole-genome alignment, multiple locus-sampling strategies, and summary-tree and single nucleotide polymorphism-based species-tree methods to infer a species tree for eastern North American Neodiprion species, a clade of pine-feeding sawflies (Order: Hymenoptera; Family: Diprionidae). We recovered a well-supported species tree that, except for three uncertain relationships, was robust to different strategies for analyzing whole-genome data. Nevertheless, underlying gene-tree discordance was high. To understand this genealogical variation, we used multiple linear regression to model site concordance factors estimated in 50-kb windows as a function of several genomic predictor variables. We found that site concordance factors tended to be higher in regions of the genome with more parsimony-informative sites, fewer singletons, less missing data, lower GC content, more genes, lower recombination rates, and lower D-statistics (less introgression). Together, these results suggest that ILS, introgression, and genotyping error all shape the genomic landscape of gene-tree discordance in Neodiprion. More generally, our findings demonstrate how combining phylogenomic analysis with knowledge of local genomic features can reveal mechanisms that produce topological heterogeneity across genomes.
Affiliation(s)
- Danielle K Herrig
  - Department of Biology, University of Kentucky, 195 Huguelet Dr., Lexington, KY 40508, USA
- Ryan D Ridenbaugh
  - Department of Biology, University of Kentucky, 195 Huguelet Dr., Lexington, KY 40508, USA
- Kim L Vertacnik
  - Department of Biology, University of Kentucky, 195 Huguelet Dr., Lexington, KY 40508, USA
- Kathryn M Everson
  - Department of Natural Resources and Environmental Science, University of Nevada, 1664 N. Virginia St., Reno, NV 89557, USA
  - Department of Integrative Biology, Oregon State University, 4575 SW Research Way, Corvallis, OR 97333, USA
- Sheina B Sim
  - USDA-ARS Daniel K. Inouye US Pacific Basin Agricultural Research Center, Tropical Pest Genetics and Molecular Biology Research Unit, 64 Nowelo St., Hilo, HI 96720, USA
- Scott M Geib
  - USDA-ARS Daniel K. Inouye US Pacific Basin Agricultural Research Center, Tropical Pest Genetics and Molecular Biology Research Unit, 64 Nowelo St., Hilo, HI 96720, USA
- David W Weisrock
  - Department of Biology, University of Kentucky, 195 Huguelet Dr., Lexington, KY 40508, USA
- Catherine R Linnen
  - Department of Biology, University of Kentucky, 195 Huguelet Dr., Lexington, KY 40508, USA
6
Zhang Z, Liu G, Li M. Incomplete lineage sorting and gene flow within Allium (Amayllidaceae). Mol Phylogenet Evol 2024; 195:108054. [PMID: 38471599 DOI: 10.1016/j.ympev.2024.108054] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2023] [Revised: 02/01/2024] [Accepted: 03/07/2024] [Indexed: 03/14/2024]
Abstract
The phylogeny and systematics of the genus Allium have been studied with a variety of data types, including an increasing amount of molecular data. However, strong phylogenetic discordance and high levels of uncertainty have prevented the identification of a consistent phylogeny. The difficulty in establishing phylogenetic consensus and the evidence for genealogical discordance make Allium a compelling test case to assess the relative contributions of incomplete lineage sorting (ILS), gene flow, and gene tree estimation error to phylogenetic reconstruction. In this study, we obtained 75 transcriptomes of 38 Allium species across 10 subgenera. Whole plastid genomes, single-copy genes, and consensus CDS were generated to estimate phylogenetic trees using both coalescence and concatenation methods. Multiple approaches, including coalescence simulation, quartet sampling, reticulate network inference, sequence simulation, theta of ILS, and reticulation index, were carried out across the CDS gene trees to investigate the degrees of ILS, gene flow, and gene tree estimation error. Afterward, a regression analysis was used to test the relative contributions of each of these forms of uncertainty to the final phylogeny. Despite extensive topological discordance among gene trees, we found a fully supported species tree that agrees with most well-accepted relationships and establishes monophyly of the genus Allium. We presented clear evidence for substantial ILS across the phylogeny of Allium. Further, we identified two ancient hybridization events for the formation of the second evolutionary line and subg. Butomissa as well as several introgression events between recently diverged species. Our regression analysis revealed that gene tree inference error and gene flow were the two most dominant factors explaining the overall gene tree variation, although the effects of ILS and gene tree estimation error were difficult to disentangle because of a positive correlation between them. Based on our efforts to mitigate the methodological errors in reconstructing trees, we believe ILS and gene flow are two principal reasons for the oft-reported phylogenetic heterogeneity of Allium. This study presents a strongly supported and well-resolved phylogenetic backbone for the sampled Allium species, and exemplifies how to untangle heterogeneity in phylogenetic signal and reconstruct the true evolutionary history of the target taxa.
Affiliation(s)
- ZengZhu Zhang
  - State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou 730000, People's Republic of China
- Gang Liu
  - State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou 730000, People's Republic of China
- Minjie Li
  - State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou 730000, People's Republic of China
7
Pang XX, Zhang DY. Detection of Ghost Introgression Requires Exploiting Topological and Branch Length Information. Syst Biol 2024; 73:207-222. [PMID: 38224495 PMCID: PMC11129598 DOI: 10.1093/sysbio/syad077] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/01/2023] [Revised: 12/17/2023] [Accepted: 12/27/2023] [Indexed: 01/17/2024]
Abstract
In recent years, the study of hybridization and introgression has made significant progress, with ghost introgression (the transfer of genetic material from extinct or unsampled lineages to extant species) emerging as a key area for research. Accurately identifying ghost introgression, however, presents a challenge. To address this issue, we focused on simple cases involving 3 species with a known phylogenetic tree. Using mathematical analyses and simulations, we evaluated the performance of popular phylogenetic methods, including HyDe and PhyloNet/MPL, and the full-likelihood method, Bayesian Phylogenetics and Phylogeography (BPP), in detecting ghost introgression. Our findings suggest that heuristic approaches relying on site-pattern counts or gene-tree topologies struggle to differentiate ghost introgression from introgression between sampled non-sister species, frequently leading to incorrect identification of donor and recipient species. The full-likelihood method BPP, by contrast, uses multilocus sequence alignments directly, hence taking into account both gene-tree topologies and branch lengths, and is capable of detecting ghost introgression in phylogenomic datasets. We analyzed a real-world phylogenomic dataset of 14 species of Jaltomata (Solanaceae) to showcase the potential of full-likelihood methods for accurate inference of introgression.
Affiliation(s)
- Xiao-Xu Pang
  - Ministry of Education Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, Beijing 100875, China
- Da-Yong Zhang
  - Ministry of Education Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, Beijing 100875, China
8
Frankel LE, Ané C. Summary Tests of Introgression Are Highly Sensitive to Rate Variation Across Lineages. Syst Biol 2023; 72:1357-1369. [PMID: 37698548 DOI: 10.1093/sysbio/syad056] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2023] [Revised: 07/07/2023] [Accepted: 08/29/2023] [Indexed: 09/13/2023]
Abstract
The evolutionary implications and frequency of hybridization and introgression are increasingly being recognized across the tree of life. To detect hybridization from multi-locus and genome-wide sequence data, a popular class of methods is based on summary statistics from subsets of 3 or 4 taxa. However, these methods often carry the assumption of a constant substitution rate across lineages and genes, which is commonly violated in many groups. In this work, we quantify the effects of rate variation on the D test (also known as the ABBA-BABA test), the D3 test, and HyDe. All 3 tests are used widely across a range of taxonomic groups, in part because they are very fast to compute. We consider rate variation across species lineages, across genes, their lineage-by-gene interaction, and rate variation across gene-tree edges. We simulated species networks according to a birth-death-hybridization process, so as to capture a range of realistic species phylogenies. For all 3 methods tested, we found a marked increase in the false discovery of reticulation (type-1 error rate) when there is rate variation across species lineages. The D3 test was the most sensitive, with around 80% type-1 error, such that D3 appears to be more sensitive to a departure from the clock than to the presence of reticulation. For all 3 tests, the power to detect hybridization events decreased as the number of hybridization events increased, indicating that multiple hybridization events can obscure one another if they occur within a small subset of taxa. Our study highlights the need to consider rate variation when using site-based summary statistics, and points to the advantages of methods that do not require assumptions on evolutionary rates across lineages or across genes.
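For readers unfamiliar with the D test named in this abstract, the following minimal sketch counts ABBA and BABA site patterns in a four-taxon alignment and computes D = (nABBA - nBABA) / (nABBA + nBABA). The function name, the one-sequence-per-taxon input, and the handling of gaps are illustrative assumptions; real analyses typically use allele frequencies and a block jackknife for significance, and this sketch is not the implementation evaluated in the study.

```python
def d_statistic(p1: str, p2: str, p3: str, outgroup: str):
    """Return (D, n_abba, n_baba) for aligned sequences of taxa ((P1, P2), P3), O.

    A site is ABBA when P2 and P3 share an allele that differs from the
    one shared by P1 and the outgroup; BABA when P1 and P3 share it instead.
    Sites with gaps or ambiguity codes are skipped.
    """
    if not (len(p1) == len(p2) == len(p3) == len(outgroup)):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    n_abba = n_baba = 0
    for a, b, c, o in zip(p1.upper(), p2.upper(), p3.upper(), outgroup.upper()):
        if not {a, b, c, o} <= valid:
            continue  # skip gaps and ambiguous bases
        if c == o:
            continue  # P3 carries the outgroup allele: uninformative site
        if a == o and b == c:
            n_abba += 1
        elif b == o and a == c:
            n_baba += 1
    total = n_abba + n_baba
    d = (n_abba - n_baba) / total if total else float("nan")
    return d, n_abba, n_baba


# Toy alignment: two ABBA sites, one BABA site, one invariant site
d, n_abba, n_baba = d_statistic("AACA", "CCAA", "CCCA", "AAAA")
print(d, n_abba, n_baba)
```

Because D depends only on the imbalance between the two discordant patterns, unequal substitution rates in the P1 and P2 lineages can shift the counts without any gene flow, which is the sensitivity the abstract describes.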
Affiliation(s)
- Lauren E Frankel
  - Department of Botany, University of Wisconsin-Madison, Madison, WI 53706, USA
- Cécile Ané
  - Department of Botany, University of Wisconsin-Madison, Madison, WI 53706, USA
  - Department of Statistics, University of Wisconsin-Madison, Madison, WI 53706, USA
9
Thawornwattana Y, Seixas F, Yang Z, Mallet J. Major patterns in the introgression history of Heliconius butterflies. eLife 2023; 12:RP90656. [PMID: 38108819 PMCID: PMC10727504 DOI: 10.7554/elife.90656] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/19/2023]
Abstract
Gene flow between species, although usually deleterious, is an important evolutionary process that can facilitate adaptation and lead to species diversification. It also makes estimation of species relationships difficult. Here, we use the full-likelihood multispecies coalescent (MSC) approach to estimate species phylogeny and major introgression events in Heliconius butterflies from whole-genome sequence data. We obtain a robust estimate of species branching order among major clades in the genus, including the 'melpomene-silvaniform' group, which shows extensive historical and ongoing gene flow. We obtain chromosome-level estimates of key parameters in the species phylogeny, including species divergence times, present-day and ancestral population sizes, as well as the direction, timing, and intensity of gene flow. Our analysis leads to a phylogeny with introgression events that differ from those obtained in previous studies. We find that Heliconius aoede most likely represents the earliest-branching lineage of the genus and that 'silvaniform' species are paraphyletic within the melpomene-silvaniform group. Our phylogeny provides new, parsimonious histories for the origins of key traits in Heliconius, including pollen feeding and an inversion involved in wing pattern mimicry. Our results demonstrate the power and feasibility of the full-likelihood MSC approach for estimating species phylogeny and key population parameters despite extensive gene flow. The methods used here should be useful for analysis of other difficult species groups with high rates of introgression.
Affiliation(s)
- Fernando Seixas
  - Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, United States
- Ziheng Yang
  - Department of Genetics, Evolution and Environment, University College London, London, United Kingdom
- James Mallet
  - Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, United States
10
Flouri T, Jiao X, Huang J, Rannala B, Yang Z. Efficient Bayesian inference under the multispecies coalescent with migration. Proc Natl Acad Sci U S A 2023; 120:e2310708120. [PMID: 37871206 PMCID: PMC10622872 DOI: 10.1073/pnas.2310708120] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/25/2023] [Accepted: 08/15/2023] [Indexed: 10/25/2023]
Abstract
Analyses of genome sequence data have revealed pervasive interspecific gene flow and enriched our understanding of the role of gene flow in speciation and adaptation. Inference of gene flow using genomic data requires powerful statistical methods. Yet current likelihood-based methods involve heavy computation and are feasible for small datasets only. Here, we implement the multispecies-coalescent-with-migration model in the Bayesian program bpp, which can be used to test for gene flow and estimate migration rates, as well as species divergence times and population sizes. We develop Markov chain Monte Carlo algorithms for efficient sampling from the posterior, enabling the analysis of genome-scale datasets with thousands of loci. Implementation of both introgression and migration models in the same program allows us to test whether gene flow occurred continuously over time or in pulses. Analyses of genomic data from Anopheles mosquitoes demonstrate rich information in typical genomic datasets about the mode and rate of gene flow.
Affiliation(s)
- Tomáš Flouri
  - Department of Genetics, Evolution, and Environment, University College London, London WC1E 6BT, United Kingdom
- Xiyun Jiao
  - Department of Statistics and Data Science, China Southern University of Science and Technology, Shenzhen 518055, China
- Jun Huang
  - Department of Intelligent Medical Engineering, School of Biomedical Engineering, Capital Medical University, Beijing 100069, China
- Bruce Rannala
  - Department of Evolution and Ecology, University of California, Davis, CA 95616
- Ziheng Yang
  - Department of Genetics, Evolution, and Environment, University College London, London WC1E 6BT, United Kingdom
11
Thawornwattana Y, Huang J, Flouri T, Mallet J, Yang Z. Inferring the Direction of Introgression Using Genomic Sequence Data. Mol Biol Evol 2023; 40:msad178. [PMID: 37552932 PMCID: PMC10439365 DOI: 10.1093/molbev/msad178] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/14/2023] [Revised: 08/01/2023] [Accepted: 08/02/2023] [Indexed: 08/10/2023]
Abstract
Genomic data are informative about the history of species divergence and interspecific gene flow, including the direction, timing, and strength of gene flow. However, gene flow in opposite directions generates similar patterns in multilocus sequence data, such as reduced sequence divergence between the hybridizing species. As a result, inference of the direction of gene flow is challenging. Here, we investigate the information about the direction of gene flow present in genomic sequence data using likelihood-based methods under the multispecies-coalescent-with-introgression model. We analyze the case of two species, and use simulation to examine cases with three or four species. We find that it is easier to infer gene flow from a small population to a large one than in the opposite direction, and easier to infer inflow (gene flow from outgroup species to an ingroup species) than outflow (gene flow from an ingroup species to an outgroup species). It is also easier to infer gene flow if there is a longer time of separate evolution between the initial divergence and subsequent introgression. When introgression is assumed to occur in the wrong direction, the time of introgression tends to be correctly estimated and the Bayesian test of gene flow is often significant, while estimates of introgression probability can be even greater than the true probability. We analyze genomic sequences from Heliconius butterflies to demonstrate that typical genomic datasets are informative about the direction of interspecific gene flow, as well as its timing and strength.
Affiliation(s)
- Jun Huang
  - School of Biomedical Engineering, Capital Medical University, Beijing 100069, P.R. China
- Tomáš Flouri
  - Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK
- James Mallet
  - Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
  - Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK
- Ziheng Yang
  - Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK
12
Ji J, Jackson DJ, Leaché AD, Yang Z. Power of Bayesian and Heuristic Tests to Detect Cross-Species Introgression with Reference to Gene Flow in the Tamias quadrivittatus Group of North American Chipmunks. Syst Biol 2023; 72:446-465. [PMID: 36504374 PMCID: PMC10275556 DOI: 10.1093/sysbio/syac077] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 12/07/2021] [Revised: 11/15/2022] [Accepted: 12/01/2022] [Indexed: 10/25/2023]
Abstract
In the past two decades, genomic data have been widely used to detect historical gene flow between species in a variety of plants and animals. The Tamias quadrivittatus group of North American chipmunks, which originated through a series of rapid speciation events, are known to undergo massive amounts of mitochondrial introgression. Yet in a recent analysis of targeted nuclear loci from the group, no evidence for cross-species introgression was detected, indicating widespread cytonuclear discordance. The study used the heuristic method HYDE to detect gene flow, which may suffer from low power. Here we use the Bayesian method implemented in the program BPP to re-analyze these data. We develop a Bayesian test of introgression, calculating the Bayes factor via the Savage-Dickey density ratio using the Markov chain Monte Carlo (MCMC) sample under the model of introgression. We take a stepwise approach to constructing an introgression model by adding introgression events onto a well-supported binary species tree. The analysis detected robust evidence for multiple ancient introgression events affecting the nuclear genome, with introgression probabilities reaching 63%. We estimate population parameters and highlight the fact that species divergence times may be seriously underestimated if ancient cross-species gene flow is ignored in the analysis. We examine the assumptions and performance of HYDE and demonstrate that it lacks power if gene flow occurs between sister lineages or if the mode of gene flow does not match the assumed hybrid-speciation model with symmetrical population sizes. Our analyses highlight the power of likelihood-based inference of cross-species gene flow using genomic sequence data. [Bayesian test; BPP; chipmunks; introgression; MSci; multispecies coalescent; Savage-Dickey density ratio.]
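The Savage-Dickey idea mentioned in this abstract can be sketched as follows: the Bayes factor in favor of introgression is approximated by the ratio of prior to posterior mass in a small null region near zero for the introgression probability, with the posterior mass estimated from an MCMC sample. The uniform U(0, 1) prior, the null-region width `eps`, and the function name below are assumptions made for illustration; the abstract does not specify them, and BPP's own procedure may differ in detail.

```python
def savage_dickey_b10(phi_samples, eps: float = 0.01) -> float:
    """Approximate Bayes factor B10 (introgression vs. no introgression).

    phi_samples -- MCMC sample of the introgression probability, drawn
                   under the introgression model
    eps         -- width of the null region [0, eps) standing in for phi = 0

    Assumes a U(0, 1) prior on phi, so the prior mass of the null region is eps.
    """
    n = len(phi_samples)
    if n == 0:
        raise ValueError("phi_samples must not be empty")
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie in (0, 1)")
    prior_mass = eps  # P(phi < eps) under the assumed uniform prior
    posterior_mass = sum(1 for phi in phi_samples if phi < eps) / n
    if posterior_mass == 0.0:
        return float("inf")  # no sampled value falls in the null region
    return prior_mass / posterior_mass


# Toy sample: half of the posterior mass sits near zero, so B10 < 1
toy = [0.001] * 500 + [0.5] * 500
print(savage_dickey_b10(toy))
```

An infinite value only means that the sample contains no draws in the null region; with a finite MCMC sample it should be read as "larger than can be resolved" rather than as a literal estimate.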
Affiliation(s)
- Jiayi Ji
  - Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK
- Donavan J Jackson
  - Department of Biology and Burke Museum of Natural History and Culture, University of Washington, Box 351800, Seattle, WA 98195-1800, USA
- Adam D Leaché
  - Department of Biology and Burke Museum of Natural History and Culture, University of Washington, Box 351800, Seattle, WA 98195-1800, USA
- Ziheng Yang
  - Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK
13
Huang J, Thawornwattana Y, Flouri T, Mallet J, Yang Z. Inference of Gene Flow between Species under Misspecified Models. Mol Biol Evol 2022; 39:6783212. [PMID: 36317198 PMCID: PMC9729068 DOI: 10.1093/molbev/msac237] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 11/05/2022] Open
Abstract
Genomic sequence data provide a rich source of information about the history of species divergence and interspecific hybridization or introgression. Despite recent advances in genomics and statistical methods, it remains challenging to infer gene flow, and as a result, one may have to estimate introgression rates and times under misspecified models. Here we use mathematical analysis and computer simulation to examine estimation bias and issues of interpretation when the model of gene flow is misspecified in analysis of genomic datasets, for example, if introgression is assigned to the wrong lineages. In the case of two species, we establish a correspondence between the migration rate in the continuous migration model and the introgression probability in the introgression model. When gene flow occurs continuously through time but in the analysis is assumed to occur at a fixed time point, common evolutionary parameters such as species divergence times are surprisingly well estimated. However, the time of introgression tends to be estimated towards the recent end of the period of continuous gene flow. When introgression events are assigned incorrectly to the parental or daughter lineages, introgression times tend to collapse onto species divergence times, with introgression probabilities underestimated. Overall, our analyses suggest that the simple introgression model is useful for extracting information concerning interspecific gene flow and divergence even when the model may be misspecified. However, for reliable inference of gene flow it is important to include multiple samples per species, in particular, from hybridizing species.
Affiliation(s)
- Tomáš Flouri
  - Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, United Kingdom
- James Mallet
  - Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138
14
LeMay M, Libeskind-Hadas R, Wu YC. A Polynomial-Time Algorithm for Minimizing the Deep Coalescence Cost for Level-1 Species Networks. IEEE/ACM Trans Comput Biol Bioinform 2022; 19:2642-2653. [PMID: 34406946 DOI: 10.1109/tcbb.2021.3105922] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/13/2023]
Abstract
Phylogenetic analyses commonly assume that the species history can be represented as a tree. However, in the presence of hybridization, the species history is more accurately captured as a network. Despite several advances in modeling phylogenetic networks, there is no known polynomial-time algorithm for parsimoniously reconciling gene trees with species networks while accounting for incomplete lineage sorting. To address this issue, we present a polynomial-time algorithm for the case of level-1 networks, in which no hybrid species is the direct ancestor of another hybrid species. This work enables more efficient reconciliation of gene trees with species networks, which in turn, enables more efficient reconstruction of species networks.
15
Out of chaos: Phylogenomics of Asian Sonerileae. Mol Phylogenet Evol 2022; 175:107581. [PMID: 35810973 DOI: 10.1016/j.ympev.2022.107581] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2022] [Revised: 05/23/2022] [Accepted: 05/26/2022] [Indexed: 11/22/2022]
Abstract
Sonerileae is a diverse Melastomataceae lineage comprising ca. 1000 species in 44 genera, with >70% of genera and species distributed in Asia. Asian Sonerileae are taxonomically intractable with obscure generic circumscriptions. The backbone phylogeny of this group remains poorly resolved, possibly due to complexity caused by rapid species radiation in early and middle Miocene, which hampers further systematic study. Here, we used genome resequencing data to reconstruct the phylogeny of Asian Sonerileae. Three parallel datasets, viz. single-copy ortholog (SCO), genomic SNPs, and whole plastome, were assembled from genome resequencing data of 205 species for this purpose. Based on these genome-scale data, we provided the first well resolved phylogeny of Asian Sonerileae, with 34 major clades identified and 74% of the interclade relationships consistently resolved by both SCO and genomic data. Meanwhile, widespread phylogenetic discordance was detected among SCO gene trees as well as species trees reconstructed using different tree estimation methods (concatenation/site-based coalescent method/summary method) or different datasets (SCO/genomic/plastome). We explored sources of discordance using multiple approaches and found that the observed discordance in Asian Sonerileae was mainly caused by a combination of biased distribution of missing data, random noise from uninformative genes, incomplete lineage sorting, and hybridization/introgression. Exploration of these sources can enable us to generate hypotheses for future testing, which is the first step towards understanding the evolution of Asian Sonerileae. We also detected high levels of homoplasy for some characters traditionally used in taxonomy, which explains current chaotic generic delimitations. The backbone phylogeny of Asian Sonerileae revealed in this study offers a solid basis for future taxonomic revision at the generic level.
16
Pang XX, Zhang DY. Impact of Ghost Introgression on Coalescent-based Species Tree Inference and Estimation of Divergence Time. Syst Biol 2022; 72:35-49. [PMID: 35799362 DOI: 10.1093/sysbio/syac047] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 01/17/2022] [Revised: 06/25/2022] [Accepted: 07/05/2022] [Indexed: 11/15/2022] Open
Abstract
The species studied in any evolutionary investigation generally constitute a small proportion of all the species currently existing or that have gone extinct. It is therefore likely that introgression, which is widespread across the tree of life, involves "ghosts," i.e., unsampled, unknown, or extinct lineages. However, the impact of ghost introgression on estimations of species trees has rarely been studied and is poorly understood. Here, we use mathematical analysis and simulations to examine the robustness of species tree methods based on the multispecies coalescent model to introgression from a ghost or extant lineage. We found that many results originally obtained for introgression between extant species can easily be extended to ghost introgression, such as the strongly interactive effects of incomplete lineage sorting (ILS) and introgression on the occurrence of anomalous gene trees (AGTs). The relative performance of the summary species tree method (ASTRAL) and the full-likelihood method (*BEAST) varies under different introgression scenarios, with the former being more robust to gene flow between non-sister species and the latter performing better under certain conditions of ghost introgression. When an outgroup ghost (defined as a lineage that diverged before the most basal species under investigation) acts as the donor of the introgressed genes, the time of root divergence among the investigated species was generally overestimated, whereas ingroup introgression, as commonly perceived, can only lead to underestimation. In many cases of ingroup introgression that may or may not involve ghost lineages, the stronger the ILS, the higher the accuracy achieved in estimating the time of root divergence, although the topology of the species tree is more prone to be biased by the effect of introgression.
Affiliation(s)
- Xiao-Xu Pang
  - State Key Laboratory of Earth Surface Processes and Resource Ecology and Ministry of Education Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, Beijing 100875, China
- Da-Yong Zhang
  - State Key Laboratory of Earth Surface Processes and Resource Ecology and Ministry of Education Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, Beijing 100875, China
17
Kong S, Pons JC, Kubatko L, Wicke K. Classes of explicit phylogenetic networks and their biological and mathematical significance. J Math Biol 2022; 84:47. [PMID: 35503141 DOI: 10.1007/s00285-022-01746-y] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 09/21/2021] [Revised: 01/18/2022] [Accepted: 03/31/2022] [Indexed: 11/24/2022]
Abstract
The evolutionary relationships among organisms have traditionally been represented using rooted phylogenetic trees. However, due to reticulate processes such as hybridization or lateral gene transfer, evolution cannot always be adequately represented by a phylogenetic tree, and rooted phylogenetic networks that describe such complex processes have been introduced as a generalization of rooted phylogenetic trees. In fact, estimating rooted phylogenetic networks from genomic sequence data and analyzing their structural properties is one of the most important tasks in contemporary phylogenetics. Over the last two decades, several subclasses of rooted phylogenetic networks (characterized by certain structural constraints) have been introduced in the literature, either to model specific biological phenomena or to enable tractable mathematical and computational analyses. In the present manuscript, we provide a thorough review of these network classes, as well as provide a biological interpretation of the structural constraints underlying these networks where possible. In addition, we discuss how imposing structural constraints on the network topology can be used to address the scalability and identifiability challenges faced in the estimation of phylogenetic networks from empirical data.
Affiliation(s)
- Sungsik Kong
  - Department of Evolution, Ecology, and Organismal Biology, The Ohio State University, Columbus, OH, USA
- Joan Carles Pons
  - Department of Mathematics and Computer Science, University of the Balearic Islands, Palma, 07122, Spain
- Laura Kubatko
  - Department of Evolution, Ecology, and Organismal Biology, The Ohio State University, Columbus, OH, USA
  - Department of Statistics, The Ohio State University, Columbus, OH, USA
- Kristina Wicke
  - Department of Mathematics, The Ohio State University, Columbus, OH, USA
18
Yang Z, Flouri T. Estimation of Cross-Species Introgression Rates Using Genomic Data Despite Model Unidentifiability. Mol Biol Evol 2022; 39:msac083. [PMID: 35417543 PMCID: PMC9087891 DOI: 10.1093/molbev/msac083] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022] Open
Abstract
Full-likelihood implementations of the multispecies coalescent with introgression (MSci) model treat genealogical fluctuations across the genome as a major source of information to infer the history of species divergence and gene flow using multilocus sequence data. However, MSci models are known to have unidentifiability issues, whereby different models or parameters make the same predictions about the data and cannot be distinguished by the data. Previous studies of unidentifiability have focused on heuristic methods based on gene trees and do not make an efficient use of the information in the data. Here we study the unidentifiability of MSci models under the full-likelihood methods. We characterize the unidentifiability of the bidirectional introgression (BDI) model, which assumes that gene flow occurs in both directions. We derive simple rules for arbitrary BDI models, which create unidentifiability of the label-switching type. In general, an MSci model with k BDI events has 2^k unidentifiable modes or towers in the posterior, with each BDI event between sister species creating within-model parameter unidentifiability and each BDI event between nonsister species creating between-model unidentifiability. We develop novel algorithms for processing Markov chain Monte Carlo samples to remove label-switching problems and implement them in the bpp program. We analyze real and synthetic data to illustrate the utility of the BDI models and the new algorithms. We discuss the unidentifiability of heuristic methods and provide guidelines for the use of MSci models to infer gene flow using genomic data.
Affiliation(s)
- Ziheng Yang
  - Department of Genetics, Evolution and Environment, University College London, Gower Street, London WC1E 6BT, UK
19
Identifiability of species network topologies from genomic sequences using the logDet distance. J Math Biol 2022; 84:35. [PMID: 35385988 DOI: 10.1007/s00285-022-01734-2] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 08/03/2021] [Revised: 01/12/2022] [Accepted: 03/02/2022] [Indexed: 10/18/2022]
Abstract
Inference of network-like evolutionary relationships between species from genomic data must address the interwoven signals from both gene flow and incomplete lineage sorting. The heavy computational demands of standard approaches to this problem severely limit the size of datasets that may be analyzed, in both the number of species and the number of genetic loci. Here we provide a theoretical pointer to more efficient methods, by showing that logDet distances computed from genomic-scale sequences retain sufficient information to recover network relationships in the level-1 ultrametric case. This result is obtained under the Network Multispecies Coalescent model combined with a mixture of General Time-Reversible sequence evolution models across individual gene trees. It applies to both unlinked site data, such as for SNPs, and to sequence data in which many contiguous sites may have evolved on a common tree, such as concatenated gene sequences. Thus under standard stochastic models statistically justifiable inference of network relationships from sequences can be accomplished without consideration of individual genes or gene trees.
20
Zhu T, Flouri T, Yang Z. A simulation study to examine the impact of recombination on phylogenomic inferences under the multispecies coalescent model. Mol Ecol 2022; 31:2814-2829. [PMID: 35313033 PMCID: PMC9321900 DOI: 10.1111/mec.16433] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 08/17/2021] [Revised: 01/25/2022] [Accepted: 02/28/2022] [Indexed: 11/28/2022]
Affiliation(s)
- Tianqi Zhu
  - Institute of Applied Mathematics, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
  - Key Laboratory of Random Complex Structures and Data Science, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
- Tomáš Flouri
  - Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK
- Ziheng Yang
  - Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK
21
Hibbins MS, Hahn MW. Phylogenomic approaches to detecting and characterizing introgression. Genetics 2022; 220:iyab173. [PMID: 34788444 PMCID: PMC9208645 DOI: 10.1093/genetics/iyab173] [Citation(s) in RCA: 78] [Impact Index Per Article: 26.0] [Received: 07/16/2021] [Accepted: 10/02/2021] [Indexed: 12/26/2022] Open
Abstract
Phylogenomics has revealed the remarkable frequency with which introgression occurs across the tree of life. These discoveries have been enabled by the rapid growth of methods designed to detect and characterize introgression from whole-genome sequencing data. A large class of phylogenomic methods makes use of data across species to infer and characterize introgression based on expectations from the multispecies coalescent. These methods range from simple tests, such as the D-statistic, to model-based approaches for inferring phylogenetic networks. Here, we provide a detailed overview of the various signals that different modes of introgression are expected to leave in the genome, and how current methods are designed to detect them. We discuss the strengths and pitfalls of these approaches and identify areas for future development, highlighting the different signals of introgression, and the power of each method to detect them. We conclude with a discussion of current challenges in inferring introgression and how they could potentially be addressed.
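For readers unfamiliar with the D-statistic named in this abstract, the following is a minimal sketch of the usual ABBA-BABA definition, D = (nABBA - nBABA) / (nABBA + nBABA), for the taxon order (P1, P2, P3, outgroup). It assumes one aligned sequence per taxon and treats the outgroup allele as ancestral. Sites that are not biallelic or that contain characters other than A, C, G, T are skipped. The function name `d_statistic` is hypothetical and is not taken from any of the reviewed software.

```python
def d_statistic(p1, p2, p3, outgroup):
    """Count ABBA and BABA site patterns and return (n_abba, n_baba, D).

    Assumptions: sequences are aligned and of equal length, and the
    outgroup allele at each site is the ancestral state 'A'.
    """
    if not (len(p1) == len(p2) == len(p3) == len(outgroup)):
        raise ValueError("sequences must have equal length")
    valid = set("ACGT")
    n_abba = n_baba = 0
    for a, b, c, o in zip(p1.upper(), p2.upper(), p3.upper(), outgroup.upper()):
        site = {a, b, c, o}
        # Keep only biallelic sites made of unambiguous nucleotides.
        if len(site) != 2 or not site <= valid:
            continue
        if a == o and b == c:
            n_abba += 1  # P2 and P3 share the derived allele
        elif b == o and a == c:
            n_baba += 1  # P1 and P3 share the derived allele
    total = n_abba + n_baba
    d = (n_abba - n_baba) / total if total else 0.0
    return n_abba, n_baba, d


# Toy alignment: two ABBA sites, one BABA site, one invariant site.
print(d_statistic("AAGA", "GGAA", "GGGA", "AAAA"))
```

Under incomplete lineage sorting alone the two patterns are expected to be equally frequent, so D near zero is the null expectation. A real analysis would add a significance test, for example a block jackknife, which this sketch omits.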
Affiliation(s)
- Mark S Hibbins
  - Department of Biology, Indiana University, Bloomington, IN 47405, USA
- Matthew W Hahn
  - Department of Biology, Indiana University, Bloomington, IN 47405, USA
  - Department of Computer Science, Indiana University, Bloomington, IN 47405, USA
22
Cardoni S, Piredda R, Denk T, Grimm GW, Papageorgiou AC, Schulze E, Scoppola A, Salehi Shanjani P, Suyama Y, Tomaru N, Worth JRP, Cosimo Simeone M. 5S-IGS rDNA in wind-pollinated trees (Fagus L.) encapsulates 55 million years of reticulate evolution and hybrid origins of modern species. Plant J 2022; 109:909-926. [PMID: 34808015 PMCID: PMC9299691 DOI: 10.1111/tpj.15601] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 02/17/2021] [Revised: 11/02/2021] [Accepted: 11/18/2021] [Indexed: 05/31/2023]
Abstract
Standard models of plant speciation assume strictly dichotomous genealogies in which a species, the ancestor, is replaced by two offspring species. The reality in wind-pollinated trees with long evolutionary histories is more complex: species evolve from other species through isolation when genetic drift exceeds gene flow; lineage mixing can give rise to new species (hybrid taxa such as nothospecies and allopolyploids). The multi-copy, potentially multi-locus 5S rDNA is one of few gene regions conserving signal from dichotomous and reticulate evolutionary processes down to the level of intra-genomic recombination. Therefore, it can provide unique insights into the dynamic speciation processes of lineages that diversified tens of millions of years ago. Here, we provide the first high-throughput sequencing (HTS) of the 5S intergenic spacers (5S-IGS) for a lineage of wind-pollinated subtropical to temperate trees, the Fagus crenata - F. sylvatica s.l. lineage, and its distant relative F. japonica. The observed 4963 unique 5S-IGS variants reflect a complex history of hybrid origins, lineage sorting, mixing via secondary gene flow, and intra-genomic competition between two or more paralogous-homoeologous 5S rDNA lineages. We show that modern species are genetic mosaics and represent a striking case of ongoing reticulate evolution during the past 55 million years.
Affiliation(s)
- Simone Cardoni
  - Department of Agricultural and Forestry Science (DAFNE), Università degli studi della Tuscia, Viterbo 01100, Italy
- Roberta Piredda
  - Department of Veterinary Medicine, University of Bari ‘Aldo Moro’, Valenzano 70010, Italy
- Thomas Denk
  - Swedish Museum of Natural History, Stockholm 10405, Sweden
- Anna Scoppola
  - Department of Agricultural and Forestry Science (DAFNE), Università degli studi della Tuscia, Viterbo 01100, Italy
- Parvin Salehi Shanjani
  - Natural Resources Gene Bank, Research Institute of Forests and Rangelands, Agricultural Research, Education and Extension Organization, Tehran, Iran
- Yoshihisa Suyama
  - Graduate School of Agricultural Science, Tohoku University, Osaki, Miyagi 989-6711, Japan
- Nobuhiro Tomaru
  - Graduate School of Bioagricultural Sciences, Nagoya University, Nagoya, Aichi 464-8601, Japan
- James R. P. Worth
  - Ecological Genetics Laboratory, Forestry and Forest Products Research Institute (FFPRI), Tsukuba, Ibaraki 305-8687, Japan
- Marco Cosimo Simeone
  - Department of Agricultural and Forestry Science (DAFNE), Università degli studi della Tuscia, Viterbo 01100, Italy
23
Jiao X, Flouri T, Yang Z. Multispecies coalescent and its applications to infer species phylogenies and cross-species gene flow. Natl Sci Rev 2022; 8:nwab127. [PMID: 34987842 PMCID: PMC8692950 DOI: 10.1093/nsr/nwab127] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Received: 06/02/2021] [Revised: 07/10/2021] [Accepted: 07/11/2021] [Indexed: 02/06/2023] Open
Abstract
Multispecies coalescent (MSC) is the extension of the single-population coalescent model to multiple species. It integrates the phylogenetic process of species divergences and the population genetic process of coalescent, and provides a powerful framework for a number of inference problems using genomic sequence data from multiple species, including estimation of species divergence times and population sizes, estimation of species trees accommodating discordant gene trees, inference of cross-species gene flow and species delimitation. In this review, we introduce the major features of the MSC model, discuss full-likelihood and heuristic methods of species tree estimation and summarize recent methodological advances in inference of cross-species gene flow. We discuss the statistical and computational challenges in the field and research directions where breakthroughs may be likely in the next few years.
Affiliation(s)
- Xiyun Jiao
  - Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK
- Tomáš Flouri
  - Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK
- Ziheng Yang
  - Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK
24
Thawornwattana Y, Seixas FA, Yang Z, Mallet J. OUP accepted manuscript. Syst Biol 2022; 71:1159-1177. [PMID: 35169847 PMCID: PMC9366460 DOI: 10.1093/sysbio/syac009] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 07/07/2021] [Revised: 02/01/2022] [Accepted: 02/08/2022] [Indexed: 11/21/2022] Open
Abstract
Introgressive hybridization plays a key role in adaptive evolution and species diversification in many groups of species. However, frequent hybridization and gene flow between species make estimation of the species phylogeny and key population parameters challenging. Here, we show that by accounting for phasing and using full-likelihood methods, introgression histories and population parameters can be estimated reliably from whole-genome sequence data. We employ the multispecies coalescent (MSC) model with and without gene flow to infer the species phylogeny and cross-species introgression events using genomic data from six members of the erato-sara clade of Heliconius butterflies. The methods naturally accommodate random fluctuations in genealogical history across the genome due to deep coalescence. To avoid heterozygote phasing errors in haploid sequences commonly produced by genome assembly methods, we process and compile unphased diploid sequence alignments and use analytical methods to average over uncertainties in heterozygote phase resolution. There is robust evidence for introgression across the genome, both among distantly related species deep in the phylogeny and between sister species in shallow parts of the tree. We obtain chromosome-specific estimates of key population parameters such as introgression directions, times and probabilities, as well as species divergence times and population sizes for modern and ancestral species. We confirm ancestral gene flow between the sara clade and an ancestral population of Heliconius telesiphe, a likely hybrid speciation origin for Heliconius hecalesia, and gene flow between the sister species Heliconius erato and Heliconius himera. Inferred introgression among ancestral species also explains the history of two chromosomal inversions deep in the phylogeny of the group. 
This study illustrates how a full-likelihood approach based on the MSC makes it possible to extract rich historical information of species divergence and gene flow from genomic data. [3s; bpp; gene flow; Heliconius; hybrid speciation; introgression; inversion; multispecies coalescent]
Affiliation(s)
- Yuttapong Thawornwattana
  - Correspondence to be sent to: Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA (Y.T. and J.M.); Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK (Z.Y.)
- Fernando A Seixas
  - Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
- Ziheng Yang
  - Correspondence to be sent to: Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA (Y.T. and J.M.); Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK (Z.Y.)
- James Mallet
  - Correspondence to be sent to: Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA (Y.T. and J.M.); Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK (Z.Y.)
25
Abstract
Alleles that introgress between species can influence the evolutionary and ecological fate of species exposed to novel environments. Hybrid offspring of different species are often unfit, and yet it has long been argued that introgression can be a potent force in evolution, especially in plants. Over the last two decades, genomic data have increasingly provided evidence that introgression is a critically important source of genetic variation and that this additional variation can be useful in adaptive evolution of both animals and plants. Here, we review factors that influence the probability that foreign genetic variants provide long-term benefits (so-called adaptive introgression) and discuss their potential benefits. We find that introgression plays an important role in adaptive evolution, particularly when a species is far from its fitness optimum, such as when they expand their range or are subject to changing environments.
Affiliation(s)
- Nathaniel B Edelman
  - Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts 02138, USA
  - Current affiliation: Yale Institute for Biospheric Studies and Yale School of the Environment, Yale University, New Haven, Connecticut 06511, USA
- James Mallet
  - Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts 02138, USA
26
Mirarab S, Nakhleh L, Warnow T. Multispecies Coalescent: Theory and Applications in Phylogenetics. Annu Rev Ecol Evol Syst 2021. [DOI: 10.1146/annurev-ecolsys-012121-095340] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 11/09/2022]
Abstract
Species tree estimation is a basic part of many biological research projects, ranging from answering basic evolutionary questions (e.g., how did a group of species adapt to their environments?) to addressing questions in functional biology. Yet, species tree estimation is very challenging, due to processes such as incomplete lineage sorting, gene duplication and loss, horizontal gene transfer, and hybridization, which can make gene trees differ from each other and from the overall evolutionary history of the species. Over the last 10–20 years, there has been tremendous growth in methods and mathematical theory for estimating species trees and phylogenetic networks, and some of these methods are now in wide use. In this survey, we provide an overview of the current state of the art, identify the limitations of existing methods and theory, and propose additional research problems and directions.
Affiliation(s)
- Siavash Mirarab
  - Electrical and Computer Engineering Department, University of California, San Diego, La Jolla, California 92093, USA
- Luay Nakhleh
  - Department of Computer Science, Rice University, Houston, Texas 77005, USA
- Tandy Warnow
  - Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, Illinois 61801, USA
27
Yan Z, Cao Z, Liu Y, Ogilvie HA, Nakhleh L. Maximum Parsimony Inference of Phylogenetic Networks in the Presence of Polyploid Complexes. Syst Biol 2021; 71:706-720. [PMID: 34605924 PMCID: PMC9017653 DOI: 10.1093/sysbio/syab081] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 10/02/2020] [Revised: 09/26/2021] [Accepted: 09/29/2021] [Indexed: 12/18/2022] Open
Abstract
Phylogenetic networks provide a powerful framework for modeling and analyzing reticulate evolutionary histories. While polyploidy has been shown to be prevalent not only in plants but also in other groups of eukaryotic species, most work done thus far on phylogenetic network inference assumes diploid hybridization. These inference methods have been applied, with varying degrees of success, to data sets with polyploid species, even though polyploidy violates the mathematical assumptions underlying these methods. Statistical methods were developed recently for handling specific types of polyploids and so were parsimony methods that could handle polyploidy more generally yet while excluding processes such as incomplete lineage sorting. In this article, we introduce a new method for inferring most parsimonious phylogenetic networks on data that include polyploid species. Taking gene tree topologies as input, the method seeks a phylogenetic network that minimizes deep coalescences while accounting for polyploidy. We demonstrate the performance of the method on both simulated and biological data. The inference method as well as a method for evaluating evolutionary hypotheses in the form of phylogenetic networks are implemented and publicly available in the PhyloNet software package. [Incomplete lineage sorting; minimizing deep coalescences; multilabeled trees; multispecies network coalescent; phylogenetic networks; polyploidy.]
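The "minimizing deep coalescences" criterion named in the abstract above can be illustrated in its simplest setting. The following is a minimal sketch, assuming a species tree (not a network), no polyploidy, and exactly one sampled allele per species with the same leaf names in both trees; it is not the method of the paper and not PhyloNet code. Trees are nested tuples of leaf names, and all function names are hypothetical. For each non-root internal species-tree branch, the number of extra lineages is the number of maximal gene-tree clades contained in that branch's cluster, minus one.

```python
def leaf_set(tree):
    """Return the set of leaf names under a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def maximal_clades_within(gene_tree, cluster):
    """Count maximal gene-tree clades whose leaves all lie inside `cluster`."""
    if leaf_set(gene_tree) <= cluster:
        return 1
    if isinstance(gene_tree, str):
        return 0
    return sum(maximal_clades_within(child, cluster) for child in gene_tree)


def extra_lineages(species_tree, gene_tree):
    """Sum of extra lineages over non-root internal species-tree branches.

    Assumption: both trees have the same leaf names, one allele per species.
    """
    total = 0
    # Start below the root: the root has no branch above it to count.
    stack = [] if isinstance(species_tree, str) else list(species_tree)
    while stack:
        node = stack.pop()
        if isinstance(node, str):
            continue  # a single allele leaves a leaf branch: no extra lineage
        total += maximal_clades_within(gene_tree, leaf_set(node)) - 1
        stack.extend(node)
    return total


if __name__ == "__main__":
    species = (("A", "B"), ("C", "D"))
    print(extra_lineages(species, (("A", "B"), ("C", "D"))))
    print(extra_lineages(species, (("A", "C"), ("B", "D"))))
```

A parsimony search would sum this score over all input gene trees and prefer the candidate history with the smallest total; the network and polyploid cases need additional machinery not shown here.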
Affiliation(s)
- Zhi Yan
  - Department of Computer Science, Rice University, 6100 Main Street, Houston, TX 77005, USA
- Zhen Cao
  - Department of Computer Science, Rice University, 6100 Main Street, Houston, TX 77005, USA
- Yushu Liu
  - Department of Computer Science, Rice University, 6100 Main Street, Houston, TX 77005, USA
- Huw A Ogilvie
  - Department of Computer Science, Rice University, 6100 Main Street, Houston, TX 77005, USA
- Luay Nakhleh
  - Department of Computer Science, Rice University, 6100 Main Street, Houston, TX 77005, USA
  - Department of Biosciences, Rice University, 6100 Main Street, Houston, TX 77005, USA
|
28
|
Rabier CE, Berry V, Stoltz M, Santos JD, Wang W, Glaszmann JC, Pardi F, Scornavacca C. On the inference of complex phylogenetic networks by Markov Chain Monte-Carlo. PLoS Comput Biol 2021; 17:e1008380. [PMID: 34478440 PMCID: PMC8445492 DOI: 10.1371/journal.pcbi.1008380] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 10/04/2020] [Revised: 09/16/2021] [Accepted: 07/13/2021] [Indexed: 11/19/2022] Open
Abstract
For various species, high quality sequences and complete genomes are nowadays available for many individuals. This makes data analysis challenging, as methods need not only to be accurate, but also time efficient given the tremendous amount of data to process. In this article, we introduce an efficient method to infer the evolutionary history of individuals under the multispecies coalescent model in networks (MSNC). Phylogenetic networks are an extension of phylogenetic trees that can contain reticulate nodes, which allow to model complex biological events such as horizontal gene transfer, hybridization and introgression. We present a novel way to compute the likelihood of biallelic markers sampled along genomes whose evolution involved such events. This likelihood computation is at the heart of a Bayesian network inference method called SnappNet, as it extends the Snapp method inferring evolutionary trees under the multispecies coalescent model, to networks. SnappNet is available as a package of the well-known beast 2 software. Recently, the MCMC_BiMarkers method, implemented in PhyloNet, also extended Snapp to networks. Both methods take biallelic markers as input, rely on the same model of evolution and sample networks in a Bayesian framework, though using different methods for computing priors. However, SnappNet relies on algorithms that are exponentially more time-efficient on non-trivial networks. Using simulations, we compare performances of SnappNet and MCMC_BiMarkers. We show that both methods enjoy similar abilities to recover simple networks, but SnappNet is more accurate than MCMC_BiMarkers on more complex network scenarios. Also, on complex networks, SnappNet is found to be extremely faster than MCMC_BiMarkers in terms of time required for the likelihood computation. We finally illustrate SnappNet performances on a rice data set. SnappNet infers a scenario that is consistent with previous results and provides additional understanding of rice evolution.
Affiliation(s)
- Charles-Elie Rabier
  - Institut des Sciences de l’Evolution (ISEM), Université de Montpellier, CNRS, EPHE, IRD, Montpellier, France
  - Laboratoire d’Informatique, de Robotique et de Microélectronique de Montpellier (LIRMM), Université de Montpellier, CNRS, Montpellier, France
  - Institut Montpelliérain Alexander Grothendieck (IMAG), Université de Montpellier, CNRS, Montpellier, France
- Vincent Berry
  - Laboratoire d’Informatique, de Robotique et de Microélectronique de Montpellier (LIRMM), Université de Montpellier, CNRS, Montpellier, France
- Marnus Stoltz
  - Institut des Sciences de l’Evolution (ISEM), Université de Montpellier, CNRS, EPHE, IRD, Montpellier, France
- João D. Santos
  - CIRAD, UMR AGAP, Montpellier, France
  - Amélioration Génétique et Adaptation des Plantes méditerranéennes et tropicales (AGAP), Université de Montpellier, CIRAD, INRAE, Institut Agro, Montpellier, France
- Wensheng Wang
  - Institute of Crop Sciences (ICS), Chinese Academy of Agricultural Sciences, Beijing, China
- Jean-Christophe Glaszmann
  - CIRAD, UMR AGAP, Montpellier, France
  - Amélioration Génétique et Adaptation des Plantes méditerranéennes et tropicales (AGAP), Université de Montpellier, CIRAD, INRAE, Institut Agro, Montpellier, France
- Fabio Pardi
  - Laboratoire d’Informatique, de Robotique et de Microélectronique de Montpellier (LIRMM), Université de Montpellier, CNRS, Montpellier, France
- Celine Scornavacca
  - Institut des Sciences de l’Evolution (ISEM), Université de Montpellier, CNRS, EPHE, IRD, Montpellier, France
|
29
|
Wang Y, Cao Z, Ogilvie HA, Nakhleh L. Phylogenomic assessment of the role of hybridization and introgression in trait evolution. PLoS Genet 2021; 17:e1009701. [PMID: 34407067 PMCID: PMC8405015 DOI: 10.1371/journal.pgen.1009701] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 01/27/2021] [Revised: 08/30/2021] [Accepted: 07/07/2021] [Indexed: 11/30/2022] Open
Abstract
Trait evolution among a set of species, a central theme in evolutionary biology, has long been understood and analyzed with respect to a species tree. However, the field of phylogenomics, which has been propelled by advances in sequencing technologies, has ushered in the era of species/gene tree incongruence and, consequently, a more nuanced understanding of trait evolution. For a trait whose states are incongruent with the branching patterns in the species tree, the same state could have arisen independently in different species (homoplasy) or followed the branching patterns of gene trees, incongruent with the species tree (hemiplasy). Another evolutionary process whose extent and significance are better revealed by phylogenomic studies is gene flow between different species. In this work, we present a phylogenomic method for assessing the role of hybridization and introgression in the evolution of polymorphic or monomorphic binary traits. We apply the method to simulated evolutionary scenarios to demonstrate the interplay between the parameters of the evolutionary history and the role of introgression in a binary trait's evolution (which we call xenoplasy). Very importantly, we demonstrate, including on a biological data set, that inferring a species tree and using it for trait evolution analysis in the presence of gene flow could lead to misleading hypotheses about trait evolution.
Affiliation(s)
- Yaxuan Wang
  - Department of Computer Science, Rice University, Houston, Texas, United States of America
- Zhen Cao
  - Department of Computer Science, Rice University, Houston, Texas, United States of America
- Huw A. Ogilvie
  - Department of Computer Science, Rice University, Houston, Texas, United States of America
- Luay Nakhleh
  - Department of Computer Science, Rice University, Houston, Texas, United States of America
  - Department of BioSciences, Rice University, Houston, Texas, United States of America
|
30
|
Debray K, Le Paslier MC, Bérard A, Thouroude T, Michel G, Marie-Magdelaine J, Bruneau A, Foucher F, Malécot V. Unveiling the Patterns of Reticulated Evolutionary Processes with Phylogenomics: Hybridization and Polyploidy in the genus Rosa. Syst Biol 2021; 71:547-569. [PMID: 34329460 DOI: 10.1093/sysbio/syab064] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 03/24/2020] [Revised: 06/23/2021] [Accepted: 06/30/2021] [Indexed: 11/13/2022] Open
Abstract
Reticulation, caused by hybridization and allopolyploidization, is considered an important and frequent phenomenon in the evolution of numerous plant lineages. Although both processes represent important driving forces of evolution, they are mostly ignored in phylogenetic studies involving a large number of species. Indeed only a scattering of methods exists to recover a comprehensive reticulated evolutionary history for a broad taxon sampling. Among these methods, comparisons of topologies obtained from plastid markers with those from a few nuclear sequences are favored, even though they restrict in-depth studies of hybridization and polyploidization. The genus Rosa encompasses c. 150 species widely distributed throughout the northern hemisphere and represents a challenging taxonomic group in which hybridization and polyploidization are prominent. Our main objective was to develop a general framework that would take patterns of reticulation into account in the study of the phylogenetic relationships among Rosa species. Using amplicon sequencing we targeted allele variation in the nuclear genome as well as haploid sequences in the chloroplast genome. We successfully recovered robust plastid and nuclear phylogenies and performed in-depth tests for several scenarios of hybridization using a maximum pseudo-likelihood approach on taxon subsets. Our diploid-first approach followed by hybrid and polyploid grafting resolved most of the evolutionary relationships among Rosa subgenera, sections, and selected species. Based on these results, we provide new directions for a future revision of the infrageneric classification in Rosa. The stepwise strategy proposed here can be used to reconstruct the phylogenetic relationships of other challenging taxonomic groups with large numbers of hybrid and polyploid taxa.
Affiliation(s)
- Kevin Debray
  - Univ Angers, Institut Agro, INRAE, IRHS, SFR QUASAV, F-49000 Angers, France
- Aurélie Bérard
  - Etude du Polymorphisme des Génomes Végétaux (EPGV), INRA, Université Paris-Saclay, 91000 Evry, France
- Tatiana Thouroude
  - Univ Angers, Institut Agro, INRAE, IRHS, SFR QUASAV, F-49000 Angers, France
- Gilles Michel
  - Univ Angers, Institut Agro, INRAE, IRHS, SFR QUASAV, F-49000 Angers, France
- Anne Bruneau
  - Institut de recherche en biologie végétale and Département de Sciences biologiques, Université de Montréal, 4101 Sherbrooke Est, Montréal, QC, H1X 2B2, Canada
- Fabrice Foucher
  - Univ Angers, Institut Agro, INRAE, IRHS, SFR QUASAV, F-49000 Angers, France
- Valéry Malécot
  - Institut Agro, Univ Angers, INRAE, IRHS, SFR QUASAV, F-49000 Angers, France
|
31
|
Richards A, Kubatko L. Bayesian-Weighted Triplet and Quartet Methods for Species Tree Inference. Bull Math Biol 2021; 83:93. [PMID: 34297209 DOI: 10.1007/s11538-021-00918-z] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 10/13/2020] [Accepted: 06/03/2021] [Indexed: 11/26/2022]
Abstract
Inference of the evolutionary histories of species, commonly represented by a species tree, is complicated by the divergent evolutionary history of different parts of the genome. Different loci on the genome can have different histories from the underlying species tree (and each other) due to processes such as incomplete lineage sorting (ILS), gene duplication and loss, and horizontal gene transfer. The multispecies coalescent is a commonly used model for performing inference on species and gene trees in the presence of ILS. This paper introduces Lily-T and Lily-Q, two new methods for species tree inference under the multispecies coalescent. We then compare them to two frequently used methods, SVDQuartets and ASTRAL, using simulated and empirical data. Both methods generally showed improvement over SVDQuartets, and Lily-Q was superior to Lily-T for most simulation settings. The comparison to ASTRAL was more mixed: Lily-Q tended to be better than ASTRAL when the length of recombination-free loci was short, when the coalescent population parameter θ was small, or when the internal branch lengths were longer.
Affiliation(s)
- Andrew Richards
  - Department of Statistics, The Ohio State University, Columbus, USA
- Laura Kubatko
  - Department of Statistics, The Ohio State University, Columbus, USA
  - Department of Evolution, Ecology and Organismal Biology, The Ohio State University, Columbus, USA
|
32
|
Ogilvie HA, Mendes FK, Vaughan TG, Matzke NJ, Stadler T, Welch D, Drummond AJ. Novel Integrative Modeling of Molecules and Morphology across Evolutionary Timescales. Syst Biol 2021; 71:208-220. [PMID: 34228807 PMCID: PMC8677526 DOI: 10.1093/sysbio/syab054] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/19/2021] [Revised: 06/23/2021] [Accepted: 06/29/2021] [Indexed: 11/13/2022] Open
Abstract
Evolutionary models account for either population- or species-level processes but usually not both. We introduce a new model, the FBD-MSC, which makes it possible for the first time to integrate both the genealogical and fossilization phenomena, by means of the multispecies coalescent (MSC) and the fossilized birth–death (FBD) processes. Using this model, we reconstruct the phylogeny representing all extant and many fossil Caninae, recovering both the relative and absolute time of speciation events. We quantify known inaccuracy issues with divergence time estimates using the popular strategy of concatenating molecular alignments and show that the FBD-MSC solves them. Our new integrative method and empirical results advance the paradigm and practice of probabilistic total evidence analyses in evolutionary biology. [Caninae; fossilized birth–death; molecular clock; multispecies coalescent; phylogenetics; species trees.]
Affiliation(s)
- Huw A Ogilvie
  - Department of Computer Science, Rice University, Houston TX, 77005, USA
- Fábio K Mendes
  - Centre for Computational Evolution, The University of Auckland, Auckland, 1010, New Zealand
  - School of Biological Sciences, The University of Auckland, Auckland, 1010, New Zealand
- Timothy G Vaughan
  - Department of Biosystems Science and Engineering, ETH Zürich, Basel, 4058, Switzerland
  - SIB Swiss Institute of Bioinformatics, Lausanne, 1015, Switzerland
- Nicholas J Matzke
  - Centre for Computational Evolution, The University of Auckland, Auckland, 1010, New Zealand
  - School of Biological Sciences, The University of Auckland, Auckland, 1010, New Zealand
- Tanja Stadler
  - Department of Biosystems Science and Engineering, ETH Zürich, Basel, 4058, Switzerland
  - SIB Swiss Institute of Bioinformatics, Lausanne, 1015, Switzerland
- David Welch
  - Centre for Computational Evolution, The University of Auckland, Auckland, 1010, New Zealand
  - School of Computer Science, The University of Auckland, Auckland, 1010, New Zealand
- Alexei J Drummond
  - Centre for Computational Evolution, The University of Auckland, Auckland, 1010, New Zealand
  - School of Computer Science, The University of Auckland, Auckland, 1010, New Zealand
  - School of Biological Sciences, The University of Auckland, Auckland, 1010, New Zealand
|
33
|
Huang J, Bennett J, Flouri T, Leaché AD, Yang Z. Phase Resolution of Heterozygous Sites in Diploid Genomes is Important to Phylogenomic Analysis under the Multispecies Coalescent Model. Syst Biol 2021; 71:334-352. [PMID: 34143216 PMCID: PMC8977997 DOI: 10.1093/sysbio/syab047] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 10/28/2020] [Revised: 06/03/2021] [Accepted: 06/21/2021] [Indexed: 01/01/2023] Open
Abstract
Genome sequencing projects routinely generate haploid consensus sequences from diploid genomes, which are effectively chimeric sequences with the phase at heterozygous sites resolved at random. The impact of phasing errors on phylogenomic analyses under the multispecies coalescent (MSC) model is largely unknown. Here, we conduct a computer simulation to evaluate the performance of four phase-resolution strategies (the true phase resolution, the diploid analytical integration algorithm which averages over all phase resolutions, computational phase resolution using the program PHASE, and random resolution) on estimation of the species tree and evolutionary parameters in analysis of multilocus genomic data under the MSC model. We found that species tree estimation is robust to phasing errors when species divergences were much older than average coalescent times but may be affected by phasing errors when the species tree is shallow. Estimation of parameters under the MSC model with and without introgression is affected by phasing errors. In particular, random phase resolution causes serious overestimation of population sizes for modern species and biased estimation of cross-species introgression probability. In general, the impact of phasing errors is greater when the mutation rate is higher, the data include more samples per species, and the species tree is shallower with recent divergences. Use of phased sequences inferred by the PHASE program produced small biases in parameter estimates. We analyze two real data sets, one of East Asian brown frogs and another of Rocky Mountains chipmunks, to demonstrate that heterozygote phase-resolution strategies have similar impacts on practical data analyses. We suggest that genome sequencing projects should produce unphased diploid genotype sequences if fully phased data are too challenging to generate, and avoid haploid consensus sequences, which have heterozygous sites phased at random. In case the analytical integration algorithm is computationally unfeasible, computational phasing prior to population genomic analyses is an acceptable alternative. [BPP; introgression; multispecies coalescent; phase; species tree.]
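To make concrete what "all phase resolutions" and "random resolution" mean in the abstract above, here is a minimal sketch that enumerates the haplotype pairs compatible with one unphased diploid sequence. It assumes heterozygous sites are written with the standard two-base IUPAC ambiguity codes; the function names are hypothetical, and this is an illustration only, not the integration algorithm implemented in BPP. A sequence with k heterozygous sites has 2^(k-1) distinct unordered resolutions, which is why enumeration becomes costly as k grows.

```python
import random
from itertools import product

# Standard IUPAC codes for two-base heterozygotes.
IUPAC_HET = {"R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC"}


def phase_resolutions(genotype):
    """List all unordered haplotype pairs compatible with an unphased sequence."""
    het_sites = [i for i, base in enumerate(genotype) if base in IUPAC_HET]
    if not het_sites:
        return [(genotype, genotype)]
    first, rest = het_sites[0], het_sites[1:]
    pairs = []
    # Fix the orientation of the first heterozygous site so that each
    # unordered pair is produced exactly once.
    for flips in product((False, True), repeat=len(rest)):
        hap1, hap2 = list(genotype), list(genotype)
        hap1[first], hap2[first] = IUPAC_HET[genotype[first]]
        for site, flip in zip(rest, flips):
            a, b = IUPAC_HET[genotype[site]]
            hap1[site], hap2[site] = (b, a) if flip else (a, b)
        pairs.append(("".join(hap1), "".join(hap2)))
    return pairs


def random_resolution(genotype, rng):
    """Pick one phase resolution uniformly at random (the 'random' strategy)."""
    return rng.choice(phase_resolutions(genotype))


if __name__ == "__main__":
    for pair in phase_resolutions("ARCY"):
        print(pair)
    print(random_resolution("ARCY", random.Random(1)))
```

A haploid consensus sequence corresponds to keeping only one haplotype of a single randomly chosen pair, which is the practice the abstract advises against.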
Affiliation(s)
- Jun Huang
  - Department of Genetics, Evolution and Environment, University College London, Gower Street, London WC1E 6BT, UK
  - Department of Mathematics, Beijing Jiaotong University, Beijing, 100044, P.R. China
- Jeremy Bennett
  - Department of Genetics, Evolution and Environment, University College London, Gower Street, London WC1E 6BT, UK
  - Department of Ecology and Evolutionary Biology, University of Connecticut, 75 N. Eagleville Road, Unit 3043, Storrs, CT 06269-3043, USA
- Tomáš Flouri
  - Department of Genetics, Evolution and Environment, University College London, Gower Street, London WC1E 6BT, UK
- Adam D Leaché
  - Department of Biology & Burke Museum of Natural History and Culture, University of Washington, Seattle, WA 98195-1800, USA
- Ziheng Yang
  - Department of Genetics, Evolution and Environment, University College London, Gower Street, London WC1E 6BT, UK
|
34
|
Cai R, Ané C. Assessing the fit of the multi-species network coalescent to multi-locus data. Bioinformatics 2021; 37:634-641. [PMID: 33027508 DOI: 10.1093/bioinformatics/btaa863] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 05/26/2020] [Revised: 09/14/2020] [Accepted: 09/22/2020] [Indexed: 01/25/2023] Open
Abstract
MOTIVATION: With growing genome-wide molecular datasets from next-generation sequencing, phylogenetic networks can be estimated using a variety of approaches. These phylogenetic networks include events like hybridization, gene flow or horizontal gene transfer explicitly. However, the most accurate network inference methods are computationally heavy. Methods that scale to larger datasets do not calculate a full likelihood, such that traditional likelihood-based tools for model selection are not applicable to decide how many past hybridization events best fit the data. We propose here a goodness-of-fit test to quantify the fit between data observed from genome-wide multi-locus data, and patterns expected under the multi-species coalescent model on a candidate phylogenetic network. RESULTS: We identified weaknesses in the previously proposed TICR test, and proposed corrections. The performance of our new test was validated by simulations on real-world phylogenetic networks. Our test provides one of the first rigorous tools for model selection, to select the adequate network complexity for the data at hand. The test can also work for identifying poorly inferred areas on a network. AVAILABILITY AND IMPLEMENTATION: Software for the goodness-of-fit test is available as a Julia package at https://github.com/cecileane/QuartetNetworkGoodnessFit.jl. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
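The general idea of comparing observed quartet frequencies with those expected under a candidate model can be sketched for a single four-taxon set. The example below is an illustration under stated assumptions only: it uses a textbook Pearson chi-square statistic with fixed expected concordance factors, which is neither the TICR test nor the corrected test proposed in the paper, and the function name and inputs are hypothetical. With three topology classes and no estimated parameters the statistic has 2 degrees of freedom, for which the chi-square tail probability is exactly exp(-x/2).

```python
import math


def quartet_pearson_test(counts, expected_cfs):
    """Pearson chi-square test for one four-taxon set.

    counts       -- numbers of gene trees showing each of the 3 quartet topologies
    expected_cfs -- expected concordance factors from the candidate model
                    (must be positive and sum to 1)
    Returns (statistic, p_value) using a chi-square with 2 degrees of freedom.
    """
    if len(counts) != 3 or len(expected_cfs) != 3:
        raise ValueError("exactly three quartet topologies are expected")
    if any(e <= 0 for e in expected_cfs) or abs(sum(expected_cfs) - 1.0) > 1e-9:
        raise ValueError("expected CFs must be positive and sum to 1")
    n = sum(counts)
    statistic = sum(
        (obs - n * exp) ** 2 / (n * exp) for obs, exp in zip(counts, expected_cfs)
    )
    # Survival function of the chi-square distribution with 2 df.
    p_value = math.exp(-statistic / 2.0)
    return statistic, p_value


if __name__ == "__main__":
    # 100 gene trees; the model predicts 60% / 20% / 20%.
    print(quartet_pearson_test((60, 20, 20), (0.6, 0.2, 0.2)))
    print(quartet_pearson_test((40, 40, 20), (0.6, 0.2, 0.2)))
```

A whole-network assessment would have to combine evidence across many overlapping four-taxon sets, which are not independent; handling that dependence is outside this sketch.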
Affiliation(s)
- Ruoyi Cai
  - Department of Statistics, University of Wisconsin - Madison, Madison, WI 53706, USA
- Cécile Ané
  - Department of Statistics, University of Wisconsin - Madison, Madison, WI 53706, USA
  - Department of Botany, University of Wisconsin - Madison, Madison, WI 53706, USA
|
35
|
Wang Y, Ogilvie HA, Nakhleh L. Practical Speedup of Bayesian Inference of Species Phylogenies by Restricting the Space of Gene Trees. Mol Biol Evol 2021; 37:1809-1818. [PMID: 32077947 PMCID: PMC7253205 DOI: 10.1093/molbev/msaa045] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 12/24/2022] Open
Abstract
Species tree inference from multilocus data has emerged as a powerful paradigm in the postgenomic era, both in terms of the accuracy of the species tree it produces as well as in terms of elucidating the processes that shaped the evolutionary history. Bayesian methods for species tree inference are desirable in this area as they have been shown not only to yield accurate estimates, but also to naturally provide measures of confidence in those estimates. However, the heavy computational requirements of Bayesian inference have limited the applicability of such methods to very small data sets. In this article, we show that the computational efficiency of Bayesian inference under the multispecies coalescent can be improved in practice by restricting the space of the gene trees explored during the random walk, without sacrificing accuracy as measured by various metrics. The idea is to first infer constraints on the trees of the individual loci in the form of unresolved gene trees, and then to restrict the sampler to consider only resolutions of the constrained trees. We demonstrate the improvements gained by such an approach on both simulated and biological data.
Affiliation(s)
- Yaxuan Wang
  - Computer Science Department, Rice University, Houston, TX
- Huw A Ogilvie
  - Computer Science Department, Rice University, Houston, TX
- Luay Nakhleh
  - Computer Science Department, Rice University, Houston, TX
|
36
|
Flouri T, Jiao X, Rannala B, Yang Z. A Bayesian Implementation of the Multispecies Coalescent Model with Introgression for Phylogenomic Analysis. Mol Biol Evol 2021; 37:1211-1223. [PMID: 31825513 PMCID: PMC7086182 DOI: 10.1093/molbev/msz296] [Citation(s) in RCA: 81] [Impact Index Per Article: 20.3] [Indexed: 12/19/2022] Open
Abstract
Recent analyses suggest that cross-species gene flow or introgression is common in nature, especially during species divergences. Genomic sequence data can be used to infer introgression events and to estimate the timing and intensity of introgression, providing an important means to advance our understanding of the role of gene flow in speciation. Here, we implement the multispecies-coalescent-with-introgression model, an extension of the multispecies-coalescent model to incorporate introgression, in our Bayesian Markov chain Monte Carlo program Bpp. The multispecies-coalescent-with-introgression model accommodates deep coalescence (or incomplete lineage sorting) and introgression and provides a natural framework for inference using genomic sequence data. Computer simulation confirms the good statistical properties of the method, although hundreds or thousands of loci are typically needed to estimate introgression probabilities reliably. Reanalysis of data sets from the purple cone spruce confirms the hypothesis of homoploid hybrid speciation. We estimated the introgression probability using the genomic sequence data from six mosquito species in the Anopheles gambiae species complex, which varies considerably across the genome, likely driven by differential selection against introgressed alleles.
Affiliation(s)
- Tomáš Flouri
  - Department of Genetics, Evolution and Environment, University College London, London, United Kingdom
- Xiyun Jiao
  - Department of Genetics, Evolution and Environment, University College London, London, United Kingdom
- Bruce Rannala
  - Department of Evolution and Ecology, University of California, Davis, Davis, CA
- Ziheng Yang
  - Department of Genetics, Evolution and Environment, University College London, London, United Kingdom
|
37
|
A molecular phylogeny of forktail damselflies (genus Ischnura) reveals a dynamic macroevolutionary history of female colour polymorphisms. Mol Phylogenet Evol 2021; 160:107134. [PMID: 33677008 DOI: 10.1016/j.ympev.2021.107134] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 05/25/2020] [Revised: 02/11/2021] [Accepted: 02/26/2021] [Indexed: 12/18/2022]
Abstract
Colour polymorphisms are popular study systems among biologists interested in evolutionary dynamics, genomics, sexual selection and sexual conflict. In many damselfly groups, such as in the globally distributed genus Ischnura (forktails), sex-limited female colour polymorphisms occur in multiple species. Female-polymorphic species contain two or three female morphs, one of which phenotypically matches the male (androchrome or male mimic) and the other(s) which are phenotypically distinct from the male (heterochrome). These female colour polymorphisms are thought to be maintained by frequency-dependent sexual conflict, but their macroevolutionary histories are unknown, due to the lack of a robust molecular phylogeny. Here, we present the first time-calibrated phylogeny of Ischnura, using a multispecies coalescent approach (StarBEAST2) and incorporating both molecular and fossil data for 41 extant species (55% of the genus). We estimate the age of Ischnura to be between 13.8 and 23.4 millions of years, i.e. Miocene. We infer the ancestral state of this genus as female monomorphism with heterochrome females, with multiple gains and losses of female polymorphisms, evidence of trans-species female polymorphisms and a significant positive relationship between female polymorphism incidence and current geographic range size. Our study provides a robust phylogenetic framework for future research on the dynamic macroevolutionary history of this clade with its extraordinary diversity of sex-limited female polymorphisms.
|
38
|
Westbury MV, Le Duc D, Duchêne DA, Krishnan A, Prost S, Rutschmann S, Grau JH, Dalen L, Weyrich A, Norén K, Werdelin L, Dalerum F, Schöneberg T, Hofreiter M. Ecological Specialisation and Evolutionary Reticulation in Extant Hyaenidae. Mol Biol Evol 2021; 38:3884-3897. [PMID: 34426844 PMCID: PMC8382907 DOI: 10.1093/molbev/msab055] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 02/06/2023] Open
Abstract
During the Miocene, Hyaenidae was a highly diverse family of Carnivora that has since been severely reduced to four species: the bone-cracking spotted, striped, and brown hyenas, and the specialized insectivorous aardwolf. Previous studies investigated the evolutionary histories of the spotted and brown hyenas, but little is known about the remaining two species. Moreover, the genomic underpinnings of scavenging and insectivory, defining traits of the extant species, remain elusive. Here, we generated an aardwolf genome and analyzed it together with the remaining three species to reveal their evolutionary relationships, genomic underpinnings of their scavenging and insectivorous lifestyles, and their respective genetic diversities and demographic histories. High levels of phylogenetic discordance suggest gene flow between the aardwolf lineage and the ancestral brown/striped hyena lineage. Genes related to immunity and digestion in the bone-cracking hyenas and craniofacial development in the aardwolf showed the strongest signals of selection, suggesting putative key adaptations to carrion and termite feeding, respectively. A family-wide expansion in olfactory receptor genes suggests that an acute sense of smell was a key early adaptation. Finally, we report very low levels of genetic diversity within the brown and striped hyenas despite no signs of inbreeding, putatively linked to their similarly slow decline in effective population size over the last ∼2 million years. High levels of genetic diversity and more stable population sizes through time are seen in the spotted hyena and aardwolf. Taken together, our findings highlight how ecological specialization can impact the evolutionary history, demographics, and adaptive genetic changes of an evolutionary lineage.
Collapse
Affiliation(s)
- M V Westbury
- University of Potsdam, Institute of Biochemistry and Biology, Karl-Liebknecht-Str. 24-25, Potsdam, 14476, Germany.,Section for Evolutionary Genomics, The GLOBE Institute, University of Copenhagen, Øster Voldgade 5-7, Copenhagen, Denmark
| | - Diana Le Duc
- Institute of Human Genetics, University Medical Center Leipzig, Leipzig, 04103, Germany.,Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, 04103, Germany
| | - David A Duchêne
- Section for Evolutionary Genomics, The GLOBE Institute, University of Copenhagen, Øster Voldgade 5-7, Copenhagen, Denmark.,Research School of Biology, Australian National University, Canberra, ACT 2601, Australia
| | - Arunkumar Krishnan
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.,Department of Biological Sciences, Indian Institute of Science Education and Research (IISER) Berhampur, Odisha, 760010, India
| | - Stefan Prost
- LOEWE-Center for Translational Biodiversity Genomics, Senckenberg, 60325, Germany. Frankfurt.,South African National Biodiversity Institute, National Zoological Garden, Pretoria, 0184, South Africa
| | - Sereina Rutschmann
- University of Potsdam, Institute of Biochemistry and Biology, Karl-Liebknecht-Str. 24-25, Potsdam, 14476, Germany
| | - Jose H Grau
- University of Potsdam, Institute of Biochemistry and Biology, Karl-Liebknecht-Str. 24-25, Potsdam, 14476, Germany.,amedes Genetics, amedes Medizinische Dienstleistungen, Berlin, Germany
| | - Love Dalen
- Centre for Palaeogenetics, Svante Arrhenius väg 20C, Stockholm, 10691, Sweden
- Department of Bioinformatics and Genetics, Swedish Museum of Natural History, Box 50007, Stockholm, 10405, Sweden
| | - Alexandra Weyrich
- Department of Evolutionary Genetics, Leibniz Institute for Zoo and Wildlife Research (IZW), Berlin, 10315, Germany
| | - Karin Norén
- Department of Zoology, Stockholm University, Stockholm, 106 91, Sweden
| | - Lars Werdelin
- Department of Palaeobiology, Swedish Museum of Natural History, Box 50007, Stockholm, SE-10405, Sweden
| | - Fredrik Dalerum
- Department of Zoology, Stockholm University, Stockholm, 106 91, Sweden
- Research Unit of Biodiversity (UO-CSIC-PA), Mieres Campus, University of Oviedo, Mieres, Asturias, 33600, Spain
- Mammal Research Institute, Department of Zoology and Entomology, University of Pretoria, South Africa
| | - Torsten Schöneberg
- Rudolf Schönheimer Institute of Biochemistry, Molecular Biochemistry, Medical Faculty, Johannisallee 30, Leipzig, 04103, Germany
| | - Michael Hofreiter
- University of Potsdam, Institute of Biochemistry and Biology, Karl-Liebknecht-Str. 24-25, Potsdam, 14476, Germany
| |
|
39
|
Allman ES, Mitchell JD, Rhodes JA. Gene tree discord, simplex plots, and statistical tests under the coalescent. Syst Biol 2021; 71:929-942. [PMID: 33560348 DOI: 10.1093/sysbio/syab008] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 02/12/2020] [Revised: 01/31/2021] [Accepted: 02/03/2021] [Indexed: 02/06/2023] Open
Abstract
A simple graphical device, the simplex plot of quartet concordance factors, is introduced to aid in the exploration of a collection of gene trees on a common set of taxa. A single plot summarizes all gene tree discord, and allows for visual comparison to the expected discord from the multispecies coalescent model (MSC) of incomplete lineage sorting on a species tree. A formal statistical procedure is described that can quantify the deviation from expectation for each subset of four taxa, suggesting when the data is not in accord with the MSC, and thus that either gene tree inference error is substantial or a more complex model such as that on a network may be required. If the collection of gene trees is in accord with the MSC, the plots reveal when substantial incomplete lineage sorting is present. Applications to both simulated and empirical multilocus data sets illustrate the insights provided.
Affiliation(s)
- Elizabeth S Allman
- Department of Mathematics and Statistics, University of Alaska Fairbanks, Fairbanks, AK 99709, USA
| | - Jonathan D Mitchell
- Department of Mathematics and Statistics, University of Alaska Fairbanks, Fairbanks, AK 99709, USA
- Unité Bioinformatique Evolutive, C3BI USR 3756, Institut Pasteur & CNRS, Paris, France
| | - John A Rhodes
- Department of Mathematics and Statistics, University of Alaska Fairbanks, Fairbanks, AK 99709, USA
| |
|
40
|
Lopes F, Oliveira LR, Kessler A, Beux Y, Crespo E, Cárdenas-Alayza S, Majluf P, Sepúlveda M, Brownell RL, Franco-Trecu V, Páez-Rosas D, Chaves J, Loch C, Robertson BC, Acevedo-Whitehouse K, Elorriaga-Verplancken FR, Kirkman SP, Peart CR, Wolf JBW, Bonatto SL. Phylogenomic Discordance in the Eared Seals is best explained by Incomplete Lineage Sorting following Explosive Radiation in the Southern Hemisphere. Syst Biol 2020; 70:786-802. [PMID: 33367817 DOI: 10.1093/sysbio/syaa099] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 07/29/2020] [Revised: 12/02/2020] [Accepted: 12/08/2020] [Indexed: 12/25/2022] Open
Abstract
The phylogeny and systematics of fur seals and sea lions (Otariidae) have long been studied with diverse data types, including an increasing amount of molecular data. However, only a few phylogenetic relationships have reached acceptance because of strong gene tree-species tree discordance. Divergence time estimates in the group also vary largely between studies. These uncertainties impeded the understanding of the biogeographical history of the group, such as when and how trans-equatorial dispersal and subsequent speciation events occurred. Here, we used high-coverage genome-wide sequencing for 14 of the 15 species of Otariidae to elucidate the phylogeny of the family and its bearing on the taxonomy and biogeographical history. Despite extreme topological discordance among gene trees, we found a fully supported species tree that agrees with the few well-accepted relationships and establishes monophyly of the genus Arctocephalus. Our data support a relatively recent trans-hemispheric dispersal at the base of a southern clade, which rapidly diversified into six major lineages between 3 and 2.5 Ma. Otaria diverged first, followed by Phocarctos and then four major lineages within Arctocephalus. However, we found Zalophus to be nonmonophyletic, with California (Zalophus californianus) and Steller sea lions (Eumetopias jubatus) grouping closer than the Galapagos sea lion (Zalophus wollebaeki), with evidence for introgression between the two genera. Overall, the high degree of genealogical discordance was best explained by incomplete lineage sorting resulting from quasi-simultaneous speciation within the southern clade, with introgression playing a subordinate role in explaining the incongruence among and within prior phylogenetic studies of the family. [Hybridization; ILS; phylogenomics; Pleistocene; Pliocene; monophyly.].
Affiliation(s)
- Fernando Lopes
- Escola de Ciências da Saúde e da Vida, Pontifícia Universidade Católica do Rio Grande do Sul, 90619-900 Porto Alegre, RS, Brazil
- Laboratório de Ecologia de Mamíferos, Universidade do Vale do Rio dos Sinos, São Leopoldo, RS, Brazil
| | - Larissa R Oliveira
- Laboratório de Ecologia de Mamíferos, Universidade do Vale do Rio dos Sinos, São Leopoldo, RS, Brazil
- GEMARS, Grupo de Estudos de Mamíferos Aquáticos do Rio Grande do Sul, 95560-000 Torres, RS, Brazil
| | - Amanda Kessler
- Escola de Ciências da Saúde e da Vida, Pontifícia Universidade Católica do Rio Grande do Sul, 90619-900 Porto Alegre, RS, Brazil
| | - Yago Beux
- Escola de Ciências da Saúde e da Vida, Pontifícia Universidade Católica do Rio Grande do Sul, 90619-900 Porto Alegre, RS, Brazil
| | - Enrique Crespo
- Centro Nacional Patagónico - CENPAT, CONICET, Puerto Madryn, Argentina
| | - Susana Cárdenas-Alayza
- Centro para la Sostenibilidad Ambiental, Universidad Peruana Cayetano Heredia, Lima, Peru
| | - Patricia Majluf
- Centro para la Sostenibilidad Ambiental, Universidad Peruana Cayetano Heredia, Lima, Peru
| | - Maritza Sepúlveda
- Centro de Investigación y Gestión de Recursos Naturales (CIGREN), Facultad de Ciencias, Universidad de Valparaíso, Valparaíso, Chile
| | - Robert L Brownell
- Southwest Fisheries Science Center, National Oceanic and Atmospheric Administration, NOAA, La Jolla, USA
| | - Valentina Franco-Trecu
- Departamento de Ecología y Evolución, Facultad de Ciencias, Universidad de la República, Montevideo, Uruguay
| | - Diego Páez-Rosas
- Colegio de Ciencias Biológicas y Ambientales, COCIBA, Universidad San Francisco de Quito, Quito, Ecuador
| | - Jaime Chaves
- Colegio de Ciencias Biológicas y Ambientales, COCIBA, Universidad San Francisco de Quito, Quito, Ecuador
- Department of Biology, San Francisco State University, 1800 Holloway Ave, San Francisco, CA, USA
| | - Carolina Loch
- Sir John Walsh Research Institute, Faculty of Dentistry, University of Otago, Dunedin, New Zealand
| | | | - Karina Acevedo-Whitehouse
- Unit for Basic and Applied Microbiology, School of Natural Sciences, Universidad Autónoma de Querétaro, Querétaro, Mexico
| | | | - Stephen P Kirkman
- Department of Environmental Affairs, Oceans and Coasts, Cape Town, South Africa
| | - Claire R Peart
- Department Biologie II, Division of Evolutionary Biology, Ludwig-Maximilians-Universität München, Munich, Germany
| | - Jochen B W Wolf
- Department Biologie II, Division of Evolutionary Biology, Ludwig-Maximilians-Universität München, Munich, Germany
| | - Sandro L Bonatto
- Escola de Ciências da Saúde e da Vida, Pontifícia Universidade Católica do Rio Grande do Sul, 90619-900 Porto Alegre, RS, Brazil
| |
|
41
|
Koch H, DeGiorgio M. Maximum Likelihood Estimation of Species Trees from Gene Trees in the Presence of Ancestral Population Structure. Genome Biol Evol 2020; 12:3977-3995. [PMID: 32022857 PMCID: PMC7061232 DOI: 10.1093/gbe/evaa022] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Accepted: 01/23/2020] [Indexed: 11/12/2022] Open
Abstract
Though large multilocus genomic data sets have led to overall improvements in phylogenetic inference, they have posed the new challenge of addressing conflicting signals across the genome. In particular, ancestral population structure, which has been uncovered in a number of diverse species, can skew gene tree frequencies, thereby hindering the performance of species tree estimators. Here we develop a novel maximum likelihood method, termed TASTI (Taxa with Ancestral structure Species Tree Inference), that can infer phylogenies under such scenarios, and find that it has increasing accuracy with increasing numbers of input gene trees, contrasting with the relatively poor performances of methods not tailored for ancestral structure. Moreover, we propose a supertree approach that allows TASTI to scale computationally with increasing numbers of input taxa. We use genetic simulations to assess TASTI's performance in the three- and four-taxon settings and demonstrate the application of TASTI on a six-species Afrotropical mosquito data set. Finally, we have implemented TASTI in an open-source software package for ease of use by the scientific community.
Affiliation(s)
- Hillary Koch
- Department of Statistics, Pennsylvania State University
| | - Michael DeGiorgio
- Department of Computer and Electrical Engineering and Computer Science, Florida Atlantic University
| |
|
42
|
Blair C, Ané C. Phylogenetic Trees and Networks Can Serve as Powerful and Complementary Approaches for Analysis of Genomic Data. Syst Biol 2020; 69:593-601. [PMID: 31432090 DOI: 10.1093/sysbio/syz056] [Citation(s) in RCA: 56] [Impact Index Per Article: 11.2] [Received: 03/03/2019] [Accepted: 08/15/2019] [Indexed: 11/14/2022] Open
Abstract
Genomic data have had a profound impact on nearly every biological discipline. In systematics and phylogenetics, the thousands of loci that are now being sequenced can be analyzed under the multispecies coalescent model (MSC) to explicitly account for gene tree discordance due to incomplete lineage sorting (ILS). However, the MSC assumes no gene flow post divergence, calling for additional methods that can accommodate this limitation. Explicit phylogenetic network methods have emerged, which can simultaneously account for ILS and gene flow by representing evolutionary history as a directed acyclic graph. In this point of view, we highlight some of the strengths and limitations of phylogenetic networks and argue that tree-based inference should not be blindly abandoned in favor of networks simply because they represent more parameter rich models. Attention should be given to model selection of reticulation complexity, and the most robust conclusions regarding evolutionary history are likely obtained when combining tree- and network-based inference.
Affiliation(s)
- Christopher Blair
- Department of Biological Sciences, New York City College of Technology, The City University of New York, 285 Jay Street, Brooklyn, NY 11201, USA
- Biology PhD Program, CUNY Graduate Center, 365 5th Ave., New York, NY 10016, USA
| | - Cécile Ané
- Department of Botany, University of Wisconsin - Madison, 1300 University Ave, Madison, WI 53706, USA
- Department of Statistics, University of Wisconsin - Madison, 1300 University Ave, Madison, WI 53706, USA
| |
|
43
|
Huang J, Flouri T, Yang Z. A Simulation Study to Examine the Information Content in Phylogenomic Data Sets under the Multispecies Coalescent Model. Mol Biol Evol 2020; 37:3211-3224. [DOI: 10.1093/molbev/msaa166] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 01/05/2023] Open
Abstract
We use computer simulation to examine the information content in multilocus data sets for inference under the multispecies coalescent model. Inference problems considered include estimation of evolutionary parameters (such as species divergence times, population sizes, and cross-species introgression probabilities), species tree estimation, and species delimitation based on Bayesian comparison of delimitation models. We found that the number of loci is the most influential factor for almost all inference problems examined. Although the number of sequences per species does not appear to be important to species tree estimation, it is very influential to species delimitation. Increasing the number of sites and the per-site mutation rate both increase the mutation rate for the whole locus and these have the same effect on estimation of parameters, but the sequence length has a greater effect than the per-site mutation rate for species tree estimation. We discuss the computational costs when the data size increases and provide guidelines concerning the subsampling of genomic data to enable the application of full-likelihood methods of inference.
Affiliation(s)
- Jun Huang
- Department of Genetics, Evolution and Environment, University College London, London, United Kingdom
- Department of Mathematics, Beijing Jiaotong University, Beijing, P.R. China
| | - Tomáš Flouri
- Department of Genetics, Evolution and Environment, University College London, London, United Kingdom
| | - Ziheng Yang
- Department of Genetics, Evolution and Environment, University College London, London, United Kingdom
| |
|
44
|
Poelstra JW, Salmona J, Tiley GP, Schüßler D, Blanco MB, Andriambeloson JB, Bouchez O, Campbell CR, Etter PD, Hohenlohe PA, Hunnicutt KE, Iribar A, Johnson EA, Kappeler PM, Larsen PA, Manzi S, Ralison JM, Randrianambinina B, Rasoloarison RM, Rasolofoson DW, Stahlke AR, Weisrock DW, Williams RC, Chikhi L, Louis EE, Radespiel U, Yoder AD. Cryptic Patterns of Speciation in Cryptic Primates: Microendemic Mouse Lemurs and the Multispecies Coalescent. Syst Biol 2020; 70:203-218. [PMID: 32642760 DOI: 10.1093/sysbio/syaa053] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Received: 10/24/2019] [Revised: 06/13/2020] [Accepted: 06/23/2020] [Indexed: 12/21/2022] Open
Abstract
Mouse lemurs (Microcebus) are a radiation of morphologically cryptic primates distributed throughout Madagascar for which the number of recognized species has exploded in the past two decades. This taxonomic revision has prompted understandable concern that there has been substantial oversplitting in the mouse lemur clade. Here, we investigate mouse lemur diversity in a region in northeastern Madagascar with high levels of microendemism and predicted habitat loss. We analyzed RADseq data with multispecies coalescent (MSC) species delimitation methods for two pairs of sister lineages that include three named species and an undescribed lineage previously identified to have divergent mtDNA. Marked differences in effective population sizes, levels of gene flow, patterns of isolation-by-distance, and species delimitation results were found among the two pairs of lineages. Whereas all tests support the recognition of the presently undescribed lineage as a separate species, the species-level distinction of two previously described species, M. mittermeieri and M. lehilahytsara, is not supported, a result that is particularly striking when using the genealogical discordance index (gdi). Nonsister lineages occur sympatrically in two of the localities sampled for this study, despite an estimated divergence time of less than 1 Ma. This suggests rapid evolution of reproductive isolation in the focal lineages and in the mouse lemur clade generally. The divergence time estimates reported here are based on the MSC calibrated with pedigree-based mutation rates and are considerably more recent than previously published fossil-calibrated relaxed-clock estimates. We discuss the possible explanations for this discrepancy, noting that there are theoretical justifications for preferring the MSC estimates in this case. [Cryptic species; effective population size; microendemism; multispecies coalescent; speciation; species delimitation.].
Affiliation(s)
| | - Jordi Salmona
- CNRS, Université Paul Sabatier, IRD; UMR5174 EDB (Laboratoire Évolution & Diversité Biologique), 118 route de Narbonne, 31062 Toulouse, France
| | - George P Tiley
- Department of Biology, Duke University, Durham, NC 27708, USA
| | - Dominik Schüßler
- Research Group Ecology and Environmental Education, Department of Biology, University of Hildesheim, Universitaetsplatz 1, 31141 Hildesheim, Germany
| | - Marina B Blanco
- Department of Biology, Duke University, Durham, NC 27708, USA
- Duke Lemur Center, Duke University, Durham, NC 27705, USA
| | - Jean B Andriambeloson
- Department of Zoology and Animal Biodiversity, University of Antananarivo, Antananarivo 101, Madagascar
| | - Olivier Bouchez
- INRA, US 1426, GeT-PlaGe, Genotoul, Castanet-Tolosan, France
| | - C Ryan Campbell
- Department of Biology, Duke University, Durham, NC 27708, USA
- Department of Evolutionary Anthropology, Duke University, Durham, NC 27708, USA
| | - Paul D Etter
- Institute of Molecular Biology, University of Oregon, Eugene, OR, USA
| | - Paul A Hohenlohe
- Department of Biological Sciences, Institute for Bioinformatics and Evolutionary Studies, University of Idaho, Moscow, ID 83844, USA
| | - Kelsie E Hunnicutt
- Department of Biology, Duke University, Durham, NC 27708, USA
- Department of Biological Sciences, University of Denver, Denver, CO 80208, USA
| | - Amaia Iribar
- CNRS, Université Paul Sabatier, IRD; UMR5174 EDB (Laboratoire Évolution & Diversité Biologique), 118 route de Narbonne, 31062 Toulouse, France
| | - Eric A Johnson
- Institute of Molecular Biology, University of Oregon, Eugene, OR, USA
| | - Peter M Kappeler
- Behavioral Ecology and Sociobiology Unit, German Primate Center, Kellnerweg 6, 37077 Göttingen, Germany
| | - Peter A Larsen
- Department of Biology, Duke University, Durham, NC 27708, USA
- Department of Veterinary and Biomedical Sciences, University of Minnesota, Saint Paul, MN 55108, USA
| | - Sophie Manzi
- CNRS, Université Paul Sabatier, IRD; UMR5174 EDB (Laboratoire Évolution & Diversité Biologique), 118 route de Narbonne, 31062 Toulouse, France
| | - José M Ralison
- Department of Zoology and Animal Biodiversity, University of Antananarivo, Antananarivo 101, Madagascar
| | - Blanchard Randrianambinina
- Groupe d'Etude et de Recherche sur les Primates de Madagascar (GERP), BP 779, Antananarivo 101, Madagascar
- Faculté des Sciences, University of Mahajanga, Mahajanga, Madagascar
| | - Rodin M Rasoloarison
- Behavioral Ecology and Sociobiology Unit, German Primate Center, Kellnerweg 6, 37077 Göttingen, Germany
| | - David W Rasolofoson
- Groupe d'Etude et de Recherche sur les Primates de Madagascar (GERP), BP 779, Antananarivo 101, Madagascar
| | - Amanda R Stahlke
- Department of Biological Sciences, Institute for Bioinformatics and Evolutionary Studies, University of Idaho, Moscow, ID 83844, USA
| | - David W Weisrock
- Department of Biology, University of Kentucky, Lexington, KY, 40506, USA
| | - Rachel C Williams
- Department of Biology, Duke University, Durham, NC 27708, USA
- Duke Lemur Center, Duke University, Durham, NC 27705, USA
| | - Lounès Chikhi
- CNRS, Université Paul Sabatier, IRD; UMR5174 EDB (Laboratoire Évolution & Diversité Biologique), 118 route de Narbonne, 31062 Toulouse, France
- Instituto Gulbenkian de Ciência, Rua da Quinta Grande, 6, 2780-156 Oeiras, Portugal
| | - Edward E Louis
- Grewcock Center for Conservation and Research, Omaha's Henry Doorly Zoo and Aquarium, Omaha, NE, USA
| | - Ute Radespiel
- Institute of Zoology, University of Veterinary Medicine Hannover, Buenteweg 17, 30559 Hannover, Germany
Jelmer Poelstra, Jordi Salmona, and George P. Tiley are the joint first authors. Ute Radespiel and Anne D. Yoder are the joint senior authors.
| | - Anne D Yoder
- Department of Biology, Duke University, Durham, NC 27708, USA
| |
|
45
|
Jiao X, Yang Z. Defining Species When There is Gene Flow. Syst Biol 2020; 70:108-119. [PMID: 32617579 DOI: 10.1093/sysbio/syaa052] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Received: 01/27/2020] [Revised: 06/23/2020] [Accepted: 06/23/2020] [Indexed: 12/20/2022] Open
Abstract
Whatever one's definition of species, it is generally expected that individuals of the same species should be genetically more similar to each other than they are to individuals of another species. Here, we show that in the presence of cross-species gene flow, this expectation may be incorrect. We use the multispecies coalescent model with continuous-time migration or episodic introgression to study the impact of gene flow on genetic differences within and between species and highlight a surprising but plausible scenario in which different population sizes and asymmetrical migration rates cause a genetic sequence to be on average more closely related to a sequence from another species than to a sequence from the same species. Our results highlight the extraordinary impact that even a small amount of gene flow may have on the genetic history of the species. We suggest that contrasting long-term migration rate and short-term hybridization rate, both of which can be estimated using genetic data, may be a powerful approach to detecting the presence of reproductive barriers and to defining species boundaries. [Gene flow; introgression; migration; multispecies coalescent; species concept; species delimitation.].
Affiliation(s)
- Xiyun Jiao
- Department of Genetics, Evolution and Environment, University College London, Gower Street, London WC1E 6BT, UK
| | - Ziheng Yang
- Department of Genetics, Evolution and Environment, University College London, Gower Street, London WC1E 6BT, UK
| |
|
46
|
Zhu J, Liu X, Ogilvie HA, Nakhleh LK. A divide-and-conquer method for scalable phylogenetic network inference from multilocus data. Bioinformatics 2020; 35:i370-i378. [PMID: 31510688 PMCID: PMC6612858 DOI: 10.1093/bioinformatics/btz359] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Indexed: 11/13/2022] Open
Abstract
Motivation: Reticulate evolutionary histories, such as those arising in the presence of hybridization, are best modeled as phylogenetic networks. Recently developed methods allow for statistical inference of phylogenetic networks while also accounting for other processes, such as incomplete lineage sorting. However, these methods can only handle a small number of loci from a handful of genomes. Results: In this article, we introduce a novel two-step method for scalable inference of phylogenetic networks from the sequence alignments of multiple, unlinked loci. The method infers networks on subproblems and then merges them into a network on the full set of taxa. To reduce the number of trinets to infer, we formulate a Hitting Set version of the problem of finding a small number of subsets, and implement a simple heuristic to solve it. We studied their performance, in terms of both running time and accuracy, on simulated as well as on biological datasets. The two-step method accurately infers phylogenetic networks at a scale that is infeasible with existing methods. The results are a significant and promising step towards accurate, large-scale phylogenetic network inference. Availability and implementation: We implemented the algorithms in the publicly available software package PhyloNet (https://bioinfocs.rice.edu/PhyloNet). Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jiafan Zhu
- Department of Computer Science, Rice University, Houston, TX, USA
| | - Xinhao Liu
- Department of Computer Science, Rice University, Houston, TX, USA
| | - Huw A Ogilvie
- Department of Computer Science, Rice University, Houston, TX, USA
| | - Luay K Nakhleh
- Department of Computer Science, Rice University, Houston, TX, USA
- Department of BioSciences, Rice University, Houston, TX, USA
| |
|
47
|
Abstract
Knowing phylogenetic relationships among species is fundamental for many studies in biology. An accurate phylogenetic tree underpins our understanding of the major transitions in evolution, such as the emergence of new body plans or metabolism, and is key to inferring the origin of new genes, detecting molecular adaptation, understanding morphological character evolution and reconstructing demographic changes in recently diverged species. Although data are ever more plentiful and powerful analysis methods are available, there remain many challenges to reliable tree building. Here, we discuss the major steps of phylogenetic analysis, including identification of orthologous genes or proteins, multiple sequence alignment, and choice of substitution models and inference methodologies. Understanding the different sources of errors and the strategies to mitigate them is essential for assembling an accurate tree of life.
|
48
|
Bernhardt N, Brassac J, Dong X, Willing EM, Poskar CH, Kilian B, Blattner FR. Genome-wide sequence information reveals recurrent hybridization among diploid wheat wild relatives. Plant J 2020; 102:493-506. [PMID: 31821649 DOI: 10.1111/tpj.14641] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 07/01/2019] [Revised: 11/13/2019] [Accepted: 11/28/2019] [Indexed: 05/07/2023]
Abstract
Many conflicting hypotheses regarding the relationships among crops and wild species closely related to wheat (the genera Aegilops, Amblyopyrum, and Triticum) have been postulated. The contribution of hybridization to the evolution of these taxa is intensely discussed. To determine possible causes for this, and provide a phylogeny of the diploid taxa based on genome-wide sequence information, independent data were obtained from genotyping-by-sequencing and a target-enrichment experiment that returned 244 low-copy nuclear loci. The data were analyzed using Bayesian, likelihood and coalescent-based methods. D statistics were used to test if incomplete lineage sorting alone or together with hybridization is the source for incongruent gene trees. Here we present the phylogeny of all diploid species of the wheat wild relatives. We hypothesize that most of the wheat-group species were shaped by a primordial homoploid hybrid speciation event involving the ancestral Triticum and Am. muticum lineages to form all other species except Ae. speltoides. This hybridization event was followed by multiple introgressions affecting all taxa except Triticum. Mostly progenitors of the extant species were involved in these processes, while recent interspecific gene flow seems insignificant. The composite nature of many genomes of wheat-group taxa results in complicated patterns of diploid contributions when these lineages are involved in polyploid formation, which is, for example, the case for tetraploid and hexaploid wheats. Our analysis provides phylogenetic relationships and a testable hypothesis for the genome compositions in the basic evolutionary units within the wheat group of Triticeae.
Affiliation(s)
- Nadine Bernhardt
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), 06466, Gatersleben, Germany
| | - Jonathan Brassac
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), 06466, Gatersleben, Germany
| | - Xue Dong
- Max Planck Institute for Plant Breeding Research, 50829, Cologne, Germany
- Plant Germplasm and Genomics Centre, Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences, 650201, Kunming, Yunnan, China
| | - Eva-Maria Willing
- Max Planck Institute for Plant Breeding Research, 50829, Cologne, Germany
| | - C Hart Poskar
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), 06466, Gatersleben, Germany
| | - Benjamin Kilian
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), 06466, Gatersleben, Germany
- Global Crop Diversity Trust, 53113, Bonn, Germany
| | - Frank R Blattner
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), 06466, Gatersleben, Germany
- German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, 04103, Leipzig, Germany
| |
|
49
|
Tidwell H, Nakhleh L. Integrated likelihood for phylogenomics under a no-common-mechanism model. BMC Genomics 2020; 21:219. [PMID: 32299348 PMCID: PMC7161099 DOI: 10.1186/s12864-020-6608-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022] Open
Abstract
Background: Multi-locus species phylogeny inference is based on models of sequence evolution on gene trees as well as models of gene tree evolution within the branches of species phylogenies. Almost all statistical methods for this inference task assume a common mechanism across all loci as captured by a single value of each branch length of the species phylogeny. Results: In this paper, we pursue a "no common mechanism" (NCM) model, where every gene tree evolves according to its own parameters of the species phylogeny. Based on this model, we derive an analytically integrated likelihood of both species trees and networks given the gene trees of multiple loci under an NCM model. We demonstrate the performance of inference under this integrated likelihood on both simulated and biological data. Conclusions: The model presented here will afford opportunities for exploring connections among various criteria for estimating species phylogenies from multiple, independent loci. Moreover, further development of this model could potentially result in more efficient methods for searching the space of species phylogenies by focusing solely on the topology of the phylogeny.
|
50
|
Hedin M, Foldi S, Rajah-Boyer B. Evolutionary divergences mirror Pleistocene paleodrainages in a rapidly-evolving complex of oasis-dwelling jumping spiders (Salticidae, Habronattus tarsalis). Mol Phylogenet Evol 2020; 144:106696. [DOI: 10.1016/j.ympev.2019.106696] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 08/19/2019] [Revised: 11/14/2019] [Accepted: 11/27/2019] [Indexed: 10/25/2022]
|