1
Bernstein JM, Francioli YZ, Schield DR, Adams RH, Perry BW, Farleigh K, Smith CF, Meik JM, Mackessy SP, Castoe TA. Disentangling a genome-wide mosaic of conflicting phylogenetic signals in Western Rattlesnakes. Mol Phylogenet Evol 2025; 206:108309. [PMID: 39938672 DOI: 10.1016/j.ympev.2025.108309] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2024] [Revised: 02/04/2025] [Accepted: 02/08/2025] [Indexed: 02/14/2025]
Abstract
Species tree inference is often assumed to become more accurate as datasets increase in size, with whole genomes representing the best-case scenario for estimating a single, most-likely speciation history with high confidence. However, genomes may harbor a complex mixture of evolutionary histories among loci, which amplifies the opportunity for model misspecification and affects phylogenetic inference. Accordingly, multiple distinct and well-supported phylogenetic trees are often recovered from genome-scale data, and biologically interpreting these distinct signatures is a major challenge for evolutionary biology in the age of genomics. Here, we analyze 32 whole genomes of nine taxa and two outgroups from the Western Rattlesnake species complex. Using concordance factors, topology weighting, and concatenated and species tree analyses with a chromosome-level reference genome, we characterize the distribution of phylogenetic signal across the genomic landscape. We find that concatenated and species tree analyses of autosomes, the Z (sex) chromosome, and mitochondrial genome yield distinct, yet strongly supported phylogenies. Analyses of site-specific likelihoods show additional patterns consistent with rampant model misspecification, a likely consequence of several evolutionary processes. Together, our results suggest that a combination of historic and recent introgression, along with natural selection, recombination rate variation, and cytonuclear co-evolution of nuclear-encoded mitochondrial genes, underlie genome-wide variation in phylogenetic signal. Our results highlight both the power and complexity of interpreting whole genomes in a phylogenetic context and illustrate how patterns of phylogenetic discordance can reveal the impacts of different evolutionary processes that contribute to genome-wide variation in phylogenetic signal.
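Concordance factors summarize how much of the data supports a given branch. As a minimal sketch (not the authors' pipeline), the gene concordance factor (gCF) of a branch can be computed as the fraction of gene trees containing the same bipartition. The function names, and the representation of each gene tree as a list of its bipartitions, are hypothetical choices for illustration; dedicated tools such as IQ-TREE also handle gene trees with missing taxa, which this sketch assumes away.

```python
from typing import FrozenSet, List, Set


def canonical_split(side: Set[str], taxa: Set[str]) -> FrozenSet[str]:
    """Encode a bipartition by the side that does NOT contain a fixed reference
    taxon, so that {A,B}|{C,D} and {C,D}|{A,B} compare equal."""
    ref = min(taxa)
    chosen = taxa - side if ref in side else side
    return frozenset(chosen)


def gene_concordance_factor(focal: Set[str], gene_trees: List[List[Set[str]]],
                            taxa: Set[str]) -> float:
    """Fraction of gene trees whose bipartitions include the focal split.

    Simplifying assumption: every gene tree contains every taxon, so every
    gene tree is decisive for the focal branch.
    """
    target = canonical_split(focal, taxa)
    hits = sum(
        1 for tree in gene_trees
        if target in {canonical_split(split, taxa) for split in tree}
    )
    return hits / len(gene_trees)


# Toy data (hypothetical): three 4-taxon gene trees, each given by its one
# nontrivial bipartition.
taxa = {"A", "B", "C", "D"}
gene_trees = [[{"A", "B"}], [{"C", "D"}], [{"A", "C"}]]
print(gene_concordance_factor({"A", "B"}, gene_trees, taxa))
```

The first two gene trees encode the same split AB|CD from opposite sides, which is why the split is canonicalized before comparison.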
Affiliation(s)
- Justin M Bernstein
- Department of Biology, University of Texas at Arlington, Arlington, TX 76019, USA
- Yannick Z Francioli
- Department of Biology, University of Texas at Arlington, Arlington, TX 76019, USA
- Drew R Schield
- Department of Biology, University of Virginia, Charlottesville, VA 22903, USA
- Richard H Adams
- Department of Entomology and Plant Pathology, University of Arkansas Agricultural Experimental Station, University of Arkansas, Fayetteville, AR 72701, USA
- Blair W Perry
- Department of Ecology and Evolutionary Biology, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Keaka Farleigh
- Department of Biology, University of Virginia, Charlottesville, VA 22903, USA
- Cara F Smith
- Department of Biochemistry and Molecular Genetics, 12801 East 17th Avenue, University of Colorado Denver, Aurora, CO 80045, USA
- Jesse M Meik
- Department of Biological Sciences, Tarleton State University, Stephenville, TX 76402, USA
- Stephen P Mackessy
- School of Biological Sciences, University of Northern Colorado, Greeley, CO 80639, USA
- Todd A Castoe
- Department of Biology, University of Texas at Arlington, Arlington, TX 76019, USA
2
Adams R, Lozano JR, Duncan M, Green J, Assis R, DeGiorgio M. A Tale of Too Many Trees: A Conundrum for Phylogenetic Regression. Mol Biol Evol 2025; 42:msaf032. [PMID: 39930867 PMCID: PMC11884811 DOI: 10.1093/molbev/msaf032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/03/2024] [Revised: 12/20/2024] [Accepted: 01/21/2025] [Indexed: 03/08/2025]
Abstract
Just exactly which tree(s) should we assume when testing evolutionary hypotheses? This question has plagued comparative biologists for decades. Though all phylogenetic comparative methods require input trees, we seldom know with certainty whether even a perfectly estimated tree (if this is possible in practice) is appropriate for our studied traits. Yet, we also know that phylogenetic conflict is ubiquitous in modern comparative biology, and we are still learning about its dangers when testing evolutionary hypotheses. Here, we investigate the consequences of tree-trait mismatch for phylogenetic regression in the presence of gene tree-species tree conflict. Our simulation experiments reveal excessively high false positive rates for mismatched models with both small and large trees, simple and complex traits, and known and estimated phylogenies. In some cases, we find evidence of a directionality of error: assuming a species tree for traits that evolved according to a gene tree sometimes fares worse than the opposite. We also explore the impacts of tree choice using an expansive, cross-species gene expression dataset as an arguably "best-case" scenario in which one may have a better chance of matching tree with trait. Offering a potential path forward, we find promise in applying a robust estimator as a potential, albeit imperfect, solution to some issues raised by tree mismatch. Collectively, our results emphasize the importance of careful study design for comparative methods, highlighting the need to fully appreciate the role of accurate and thoughtful phylogenetic modeling.
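Phylogenetic regression is commonly fitted by generalized least squares (GLS), with the residual covariance matrix V derived from a tree. The sketch below assumes Brownian motion, so V[i][j] is the branch length shared by tips i and j from the root, and it illustrates that tree-trait mismatch amounts to plugging a different V into the same estimator. It is a hypothetical pure-Python illustration, not the authors' simulation code, and it omits the robust estimator they discuss; the trees and trait values are made up.

```python
from typing import List

Matrix = List[List[float]]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def dot(p: List[float], q: List[float]) -> float:
    return sum(pi * qi for pi, qi in zip(p, q))


def pgls(x: List[float], y: List[float], v: Matrix) -> List[float]:
    """Return [intercept, slope] minimising (y - Xb)' V^-1 (y - Xb), X = [1, x]."""
    ones = [1.0] * len(y)
    v_inv_1, v_inv_x, v_inv_y = solve(v, ones), solve(v, x), solve(v, y)
    # Normal equations (X' V^-1 X) b = X' V^-1 y for the two-column design.
    xtvx = [[dot(ones, v_inv_1), dot(ones, v_inv_x)],
            [dot(x, v_inv_1), dot(x, v_inv_x)]]
    xtvy = [dot(ones, v_inv_y), dot(x, v_inv_y)]
    return solve(xtvx, xtvy)


# Brownian-motion covariance for a hypothetical species tree ((A:1,B:1):1,(C:1,D:1):1).
v_species = [[2.0, 1.0, 0.0, 0.0],
             [1.0, 2.0, 0.0, 0.0],
             [0.0, 0.0, 2.0, 1.0],
             [0.0, 0.0, 1.0, 2.0]]
# Covariance for a hypothetical discordant gene tree ((A:1,C:1):1,(B:1,D:1):1).
v_gene = [[2.0, 0.0, 1.0, 0.0],
          [0.0, 2.0, 0.0, 1.0],
          [1.0, 0.0, 2.0, 0.0],
          [0.0, 1.0, 0.0, 2.0]]
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 2.0, 5.0]
print(pgls(x, y, v_species), pgls(x, y, v_gene))
```

The two calls use identical trait data and differ only in V, so any difference in the estimated coefficients comes from the choice of tree; standard errors (not computed here) depend on V in the same way.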
Affiliation(s)
- Richard Adams
- Department of Entomology and Plant Pathology, University of Arkansas, Fayetteville, AR, USA
- Center for Agricultural Data Analytics, University of Arkansas, Fayetteville, AR, USA
- Jenniffer Roa Lozano
- Department of Entomology and Plant Pathology, University of Arkansas, Fayetteville, AR, USA
- Center for Agricultural Data Analytics, University of Arkansas, Fayetteville, AR, USA
- Mataya Duncan
- Department of Entomology and Plant Pathology, University of Arkansas, Fayetteville, AR, USA
- Center for Agricultural Data Analytics, University of Arkansas, Fayetteville, AR, USA
- Jack Green
- Department of Entomology and Plant Pathology, University of Arkansas, Fayetteville, AR, USA
- Center for Agricultural Data Analytics, University of Arkansas, Fayetteville, AR, USA
- Raquel Assis
- Department of Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL, USA
- Institute for Human Health and Disease Intervention, Florida Atlantic University, Boca Raton, FL, USA
- Michael DeGiorgio
- Department of Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL, USA
3
Herrig DK, Ridenbaugh RD, Vertacnik KL, Everson KM, Sim SB, Geib SM, Weisrock DW, Linnen CR. Whole Genomes Reveal Evolutionary Relationships and Mechanisms Underlying Gene-Tree Discordance in Neodiprion Sawflies. Syst Biol 2024; 73:839-860. [PMID: 38970484 DOI: 10.1093/sysbio/syae036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/06/2023] [Revised: 07/04/2024] [Accepted: 07/05/2024] [Indexed: 07/08/2024]
Abstract
Rapidly evolving taxa are excellent models for understanding the mechanisms that give rise to biodiversity. However, developing an accurate historical framework for comparative analysis of such lineages remains a challenge due to ubiquitous incomplete lineage sorting (ILS) and introgression. Here, we use a whole-genome alignment, multiple locus-sampling strategies, and summary-tree and single nucleotide polymorphism-based species-tree methods to infer a species tree for eastern North American Neodiprion species, a clade of pine-feeding sawflies (Order: Hymenoptera; Family: Diprionidae). We recovered a well-supported species tree that, except for three uncertain relationships, was robust to different strategies for analyzing whole-genome data. Nevertheless, underlying gene-tree discordance was high. To understand this genealogical variation, we used multiple linear regression to model site concordance factors estimated in 50-kb windows as a function of several genomic predictor variables. We found that site concordance factors tended to be higher in regions of the genome with more parsimony-informative sites, fewer singletons, less missing data, lower GC content, more genes, lower recombination rates, and lower D-statistics (less introgression). Together, these results suggest that ILS, introgression, and genotyping error all shape the genomic landscape of gene-tree discordance in Neodiprion. More generally, our findings demonstrate how combining phylogenomic analysis with knowledge of local genomic features can reveal mechanisms that produce topological heterogeneity across genomes.
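Site concordance factors (sCF) measure the share of informative sites that agree with a branch. The sketch below handles a single quartet of aligned sequences, counting parsimony-informative (2+2) sites that support each of the three unrooted resolutions. Tools such as IQ-TREE implement a version that averages over many quartets sampled around a branch, and window-level values like the 50-kb estimates above would be computed per alignment window; the function names and toy sequences here are hypothetical.

```python
from typing import Dict

RESOLUTIONS = ("AB|CD", "AC|BD", "AD|BC")


def quartet_site_support(a: str, b: str, c: str, d: str) -> Dict[str, int]:
    """Count parsimony-informative sites supporting each unrooted quartet topology.

    With four sequences, a site is informative for the quartet only when it has
    exactly two states, each shared by two taxa (a 2+2 pattern).
    """
    counts = {r: 0 for r in RESOLUTIONS}
    for sa, sb, sc, sd in zip(a.upper(), b.upper(), c.upper(), d.upper()):
        site = (sa, sb, sc, sd)
        if any(s not in "ACGT" for s in site) or len(set(site)) != 2:
            continue  # gaps/ambiguity codes, invariant sites, or 3+ states
        if sa == sb and sc == sd:
            counts["AB|CD"] += 1
        elif sa == sc and sb == sd:
            counts["AC|BD"] += 1
        elif sa == sd and sb == sc:
            counts["AD|BC"] += 1
        # otherwise a 3+1 pattern, which is uninformative for a single quartet
    return counts


def site_concordance_factor(counts: Dict[str, int], focal: str = "AB|CD") -> float:
    """Share of informative sites that support the focal resolution."""
    informative = sum(counts.values())
    return counts[focal] / informative if informative else float("nan")


counts = quartet_site_support("AAAT", "AGAT", "GAAC", "GGCC")
print(counts, site_concordance_factor(counts))
```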
Affiliation(s)
- Danielle K Herrig
- Department of Biology, University of Kentucky, 195 Huguelet Dr., Lexington, KY 40508, USA
- Ryan D Ridenbaugh
- Department of Biology, University of Kentucky, 195 Huguelet Dr., Lexington, KY 40508, USA
- Kim L Vertacnik
- Department of Biology, University of Kentucky, 195 Huguelet Dr., Lexington, KY 40508, USA
- Kathryn M Everson
- Department of Natural Resources and Environmental Science, University of Nevada, 1664 N. Virginia St., Reno, NV 89557, USA
- Department of Integrative Biology, Oregon State University, 4575 SW Research Way, Corvallis, OR 97333, USA
- Sheina B Sim
- USDA-ARS Daniel K. Inouye US Pacific Basin Agricultural Research Center, Tropical Pest Genetics and Molecular Biology Research Unit, 64 Nowelo St., Hilo, HI 96720, USA
- Scott M Geib
- USDA-ARS Daniel K. Inouye US Pacific Basin Agricultural Research Center, Tropical Pest Genetics and Molecular Biology Research Unit, 64 Nowelo St., Hilo, HI 96720, USA
- David W Weisrock
- Department of Biology, University of Kentucky, 195 Huguelet Dr., Lexington, KY 40508, USA
- Catherine R Linnen
- Department of Biology, University of Kentucky, 195 Huguelet Dr., Lexington, KY 40508, USA
4
Baake E, Cordero F, Di Gaspero E. The mutation process on the ancestral line under selection. Theor Popul Biol 2024; 158:60-75. [PMID: 38641140 DOI: 10.1016/j.tpb.2024.04.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/02/2023] [Revised: 03/07/2024] [Accepted: 04/12/2024] [Indexed: 04/21/2024]
Abstract
We consider the Moran model of population genetics with two types, mutation, and selection, and investigate the line of descent of a randomly sampled individual from a contemporary population. We trace this ancestral line back into the distant past, far beyond the most recent common ancestor of the population (thus connecting population genetics to phylogeny), and analyse the mutation process along this line. To this end, we use the pruned lookdown ancestral selection graph (Lenz et al., 2015), which consists of a set of potential ancestors of the sampled individual at any given time. Relative to the neutral case (that is, without selection), we obtain a general bias towards the beneficial type, an increase in the beneficial mutation rate, and a decrease in the deleterious mutation rate. This sheds new light on previous analytical results. We discuss our findings in the light of a well-known observation at the interface of phylogeny and population genetics, namely, the difference in the mutation rates (or, more precisely, mutation fluxes) estimated via phylogenetic methods relative to those observed in pedigree studies.
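For orientation, the forward-in-time dynamics of a two-type Moran model with mutation and selection can be simulated with a Gillespie algorithm. This sketch tracks only the number of beneficial-type individuals; it does not implement the ancestral line or the pruned lookdown ancestral selection graph analysed in the paper. The rate conventions (an extra selective reproduction rate s for the beneficial type, and mutation at rate u to type 0 with probability nu0 or to type 1 otherwise) and all parameter names are assumptions stated in the docstring.

```python
import random


def simulate_moran(n: int, k0: int, s: float, u: float, nu0: float,
                   t_max: float, seed: int = 0) -> int:
    """Gillespie simulation of a two-type Moran model; returns k at time t_max.

    k counts type-0 (beneficial) individuals. Assumed conventions: every
    individual reproduces at rate 1, type 0 has an extra selective rate s,
    the offspring replaces a uniformly chosen individual, and each individual
    mutates at rate u to type 0 (prob nu0) or type 1 (prob 1 - nu0).
    """
    rng = random.Random(seed)
    nu1 = 1.0 - nu0
    k, t = k0, 0.0
    while True:
        # k -> k+1: a type-0 birth replaces a type-1 individual, or 1 mutates to 0.
        up = (1.0 + s) * k * (n - k) / n + u * nu0 * (n - k)
        # k -> k-1: a type-1 birth replaces a type-0 individual, or 0 mutates to 1.
        down = k * (n - k) / n + u * nu1 * k
        total = up + down
        if total == 0.0:
            return k  # absorbing state: no event can change k
        t += rng.expovariate(total)  # exponential waiting time to the next event
        if t > t_max:
            return k
        k += 1 if rng.random() < up / total else -1


print(simulate_moran(n=50, k0=25, s=0.1, u=0.01, nu0=0.5, t_max=100.0, seed=1))
```

Silent events (for example a type-0 individual "mutating" to type 0) are simply not simulated, since they do not change k.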
Affiliation(s)
- E Baake
- Faculty of Technology, Bielefeld University, Postbox 100131, 33501 Bielefeld, Germany
- F Cordero
- Faculty of Technology, Bielefeld University, Postbox 100131, 33501 Bielefeld, Germany; BOKU University, Department of Integrative Biology and Biodiversity Research, Institute of Mathematics, Gregor-Mendel-Strasse 33, 1180 Vienna, Austria
- E Di Gaspero
- Faculty of Technology, Bielefeld University, Postbox 100131, 33501 Bielefeld, Germany
5
Zhang Q, Folk RA, Mo ZQ, Ye H, Zhang ZY, Peng H, Zhao JL, Yang SX, Yu XQ. Phylotranscriptomic analyses reveal deep gene tree discordance in Camellia (Theaceae). Mol Phylogenet Evol 2023; 188:107912. [PMID: 37648181 DOI: 10.1016/j.ympev.2023.107912] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2023] [Revised: 08/09/2023] [Accepted: 08/27/2023] [Indexed: 09/01/2023]
Abstract
Gene tree discordance is a significant legacy of biological evolution. Multiple factors can cause incongruence among genes, such as introgression, incomplete lineage sorting (ILS), and gene duplication or loss. Resolving the causes of gene tree discordance is a critical step toward uncovering the process of species diversification. Camellia, the largest genus in Theaceae, has a controversial taxonomy and systematics, due in part to its complex evolutionary history. We used 60 transcriptomes of 55 species, representing 15 sections of Camellia, to investigate its phylogeny and the possible causes of gene tree discordance. We conducted gene tree discordance analysis based on 1,617 orthologous low-copy nuclear genes, primarily using coalescent species trees and polytomy tests to distinguish hard and soft conflict. A selective pressure analysis was also performed to assess the impact of selection on phylogenetic topology reconstruction. Our results detected different levels of gene tree discordance in the backbone of Camellia, and recovered rapid diversification as one of the possible causes of gene tree discordance. Furthermore, we confirmed that none of the currently proposed sections of Camellia was monophyletic. Comparisons among datasets partitioned under different selective pressure regimes showed that integrating all orthologous genes provided the best phylogenetic resolution of the species tree of Camellia. The findings of this study reveal rapid diversification as a major source of gene tree discordance in Camellia and will facilitate future investigation of reticulate relationships at the species level in this important plant genus.
Affiliation(s)
- Qiong Zhang
- CAS Key Laboratory for Plant Diversity and Biogeography of East Asia, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming 650201, China; College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Ryan A Folk
- Department of Biological Sciences, Mississippi State University, MS 39762, United States
- Zhi-Qiong Mo
- CAS Key Laboratory for Plant Diversity and Biogeography of East Asia, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming 650201, China
- Hang Ye
- Guangxi Key Laboratory of Special Non-wood Forest Cultivation and Utilization, Guangxi Forestry Research Institute, Nanning 530002, Guangxi, China
- Zhao-Yuan Zhang
- Guangxi Key Laboratory of Special Non-wood Forest Cultivation and Utilization, Guangxi Forestry Research Institute, Nanning 530002, Guangxi, China
- Hua Peng
- CAS Key Laboratory for Plant Diversity and Biogeography of East Asia, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming 650201, China
- Jian-Li Zhao
- Yunnan Key Laboratory of Plant Reproductive Adaptation and Evolutionary Ecology and Institute of Biodiversity, School of Ecology and Environmental Science, Yunnan University, Kunming 650091, China
- Shi-Xiong Yang
- CAS Key Laboratory for Plant Diversity and Biogeography of East Asia, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming 650201, China
- Xiang-Qin Yu
- CAS Key Laboratory for Plant Diversity and Biogeography of East Asia, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming 650201, China
6
Dean LL, Magalhaes IS, D’Agostino D, Hohenlohe P, MacColl ADC. On the Origins of Phenotypic Parallelism in Benthic and Limnetic Stickleback. Mol Biol Evol 2023; 40:msad191. [PMID: 37652053 PMCID: PMC10490448 DOI: 10.1093/molbev/msad191] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/02/2023] [Revised: 07/24/2023] [Accepted: 08/16/2023] [Indexed: 09/02/2023]
Abstract
Rapid evolution of similar phenotypes in similar environments, giving rise to in situ parallel adaptation, is an important hallmark of ecological speciation. However, what appears to be in situ adaptation can also arise by dispersal of divergent lineages from elsewhere. We test whether two contrasting phenotypes repeatedly evolved in parallel, or have a single origin, in an archetypal example of ecological adaptive radiation: benthic-limnetic three-spined stickleback (Gasterosteus aculeatus) across species pair and solitary lakes in British Columbia. We identify two genomic clusters across freshwater populations, which differ in benthic-limnetic divergent phenotypic traits and separate benthic from limnetic individuals in species pair lakes. Phylogenetic reconstruction and niche evolution modeling both suggest a single evolutionary origin for each of these clusters. We detected strong phylogenetic signal in benthic-limnetic divergent traits, suggesting that they are ancestrally retained. Accounting for ancestral state retention, we identify local adaptation of body armor due to the presence of an intraguild predator, the sculpin (Cottus asper), and environmental effects of lake depth and pH on body size. Taken together, our results imply a predominant role for retention of ancestral characteristics in driving trait distribution, with further selection imposed on some traits by environmental factors.
Affiliation(s)
- Laura L Dean
- School of Life Sciences, The University of Nottingham, University Park, Nottingham, UK
- Isabel Santos Magalhaes
- School of Life Sciences, The University of Nottingham, University Park, Nottingham, UK
- Department of Life Sciences, School of Health and Life Sciences, Whitelands College, University of Roehampton, London, UK
- Daniele D’Agostino
- School of Life Sciences, The University of Nottingham, University Park, Nottingham, UK
- Water Research Center, New York University Abu Dhabi, Abu Dhabi, United Arab Emirates
- Paul Hohenlohe
- Institute for Bioinformatics and Evolutionary Studies, Department of Biological Sciences, University of Idaho, Moscow, ID, USA
- Andrew D C MacColl
- School of Life Sciences, The University of Nottingham, University Park, Nottingham, UK
7
Adams R, DeGiorgio M. Likelihood-Based Tests of Species Tree Hypotheses. Mol Biol Evol 2023; 40:msad159. [PMID: 37440530 PMCID: PMC10368450 DOI: 10.1093/molbev/msad159] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/17/2023] [Revised: 06/20/2023] [Accepted: 07/06/2023] [Indexed: 07/15/2023]
Abstract
Likelihood-based tests of phylogenetic trees are a foundation of modern systematics. Over the past decade, an enormous wealth and diversity of model-based approaches have been developed for phylogenetic inference of both gene trees and species trees. However, while many techniques exist for conducting formal likelihood-based tests of gene trees, such frameworks are comparatively underdeveloped and underutilized for testing species tree hypotheses. To date, widely used tests of tree topology are designed to assess the fit of classical models of molecular sequence data and individual gene trees and thus are not readily applicable to the problem of species tree inference. To address this issue, we derive several analogous likelihood-based approaches for testing topologies using modern species tree models and heuristic algorithms that use gene tree topologies as input for maximum likelihood estimation under the multispecies coalescent. For the purpose of comparing support for species trees, these tests leverage the statistical procedures of their original gene tree-based counterparts that have an extended history for testing phylogenetic hypotheses at a single locus. We discuss and demonstrate a number of applications, limitations, and important considerations of these tests using simulated and empirical phylogenomic data sets that include both bifurcating topologies and reticulate network models of species relationships. Finally, we introduce the open-source R package SpeciesTopoTestR (Species Topology Tests in R) that includes a suite of functions for conducting formal likelihood-based tests of species topologies given a set of input gene tree topologies.
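Under the multispecies coalescent (MSC), a rooted three-taxon species tree with internal branch length t (in coalescent units) produces the matching gene tree with probability 1 - (2/3)e^{-t} and each of the two discordant gene trees with probability (1/3)e^{-t}. The sketch below uses these classical probabilities to compute the maximized log-likelihood of each candidate species topology from gene tree counts, the kind of quantity a likelihood-based topology test compares. It is a hypothetical illustration, not code from SpeciesTopoTestR; a formal test would add a resampling step to judge whether likelihood differences are significant, and the counts are made up.

```python
import math
from typing import Dict, Tuple

TOPOLOGIES = ("((A,B),C)", "((A,C),B)", "((B,C),A)")


def triplet_loglik(counts: Dict[str, int], species_tree: str) -> Tuple[float, float]:
    """Maximised log-likelihood (up to a multinomial constant) and MLE of t.

    counts maps each rooted gene tree topology to how many loci show it;
    it must contain all three keys and have a positive total.
    """
    n = sum(counts.values())
    n_match = counts[species_tree]
    n_disc = n - n_match
    frac_disc = n_disc / n
    if n_disc == 0:
        t_hat = math.inf                    # every gene tree matches
    elif frac_disc >= 2.0 / 3.0:
        t_hat = 0.0                         # boundary: no support for t > 0
    else:
        t_hat = -math.log(1.5 * frac_disc)  # solves e^{-t} = (3/2) n_disc / n
    e = math.exp(-t_hat)
    p_match = 1.0 - (2.0 / 3.0) * e         # P(gene tree matches species tree)
    p_disc = e / 3.0                        # P(each discordant gene tree)
    loglik = 0.0
    if n_match:
        loglik += n_match * math.log(p_match)
    if n_disc:
        loglik += n_disc * math.log(p_disc)
    return loglik, t_hat


def best_topology(counts: Dict[str, int]) -> str:
    """Candidate species topology with the highest maximised log-likelihood."""
    return max(TOPOLOGIES, key=lambda tree: triplet_loglik(counts, tree)[0])


counts = {"((A,B),C)": 70, "((A,C),B)": 15, "((B,C),A)": 15}
for tree in TOPOLOGIES:
    print(tree, triplet_loglik(counts, tree))
print("best:", best_topology(counts))
```

Setting the derivative of the log-likelihood to zero gives e^{-t} = (3/2)(n_disc/n), so the estimate of t is truncated at 0 when more than two-thirds of gene trees are discordant with the candidate topology.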
Affiliation(s)
- Richard Adams
- Agricultural Statistics Laboratory, University of Arkansas, Fayetteville, AR
- Department of Entomology and Plant Pathology, University of Arkansas, Fayetteville, AR
- Michael DeGiorgio
- Department of Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL
8
Liu L, Yu L, Wu S, Arnold J, Whalen C, Davis C, Edwards S. Short branch attraction in phylogenomic inference under the multispecies coalescent. Front Ecol Evol 2023; 11:1134764. [PMID: 39233780 PMCID: PMC11372852 DOI: 10.3389/fevo.2023.1134764] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/06/2024]
Abstract
Accurate reconstruction of species trees often relies on the quality of input gene trees estimated from molecular sequences. Previous studies suggested that if the sequence length is fixed, maximum likelihood may produce biased gene trees, which subsequently mislead species tree inference. Two key questions need to be answered in this context: which scenarios may result in consistently biased gene trees, and, for those scenarios, are there remedies that may remove or at least reduce the misleading effects of consistently biased gene trees? In this article, we establish a theoretical framework to address these questions. Considering a scenario where the true gene tree is a 4-taxon star tree $T^{*} = (S_1, S_2, S_3, S_4)$ with two short branches leading to the species $S_1$ and $S_2$, we demonstrate that maximum likelihood significantly favors the wrong bifurcating tree $((S_1, S_2), (S_3, S_4))$, which groups the two species $S_1$ and $S_2$ with short branches. We name this inconsistent behavior short branch attraction, which may occur in real-world data involving a 4-taxon bifurcating gene tree with a short internal branch. If no mutation occurs along the internal branch, which is likely if the internal branch is short, the 4-taxon bifurcating tree is equivalent to the 4-taxon star tree and thus suffers the same misleading effect of short branch attraction. Theoretical and simulation results further demonstrate that short branch attraction may occur in gene trees and species trees of arbitrary size. Moreover, short branch attraction is primarily caused by a lack of phylogenetic information in sequence data, suggesting that converting short internal branches to polytomies in the estimated gene trees can significantly reduce artifacts induced by short branch attraction.
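Liu et al. suggest converting short internal branches of estimated gene trees into polytomies. A minimal sketch of that operation on a toy tree type follows. The Node class, the threshold value, and the choice to add the removed branch length to the grafted descendants (so root-to-tip distances are preserved) are illustrative assumptions, not part of the paper.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Toy rooted tree node (hypothetical); length is the branch to the parent."""
    name: Optional[str] = None
    length: float = 0.0
    children: List["Node"] = field(default_factory=list)


def collapse_short_branches(node: Node, threshold: float) -> Node:
    """Turn internal branches shorter than threshold into polytomies, in place."""
    new_children: List[Node] = []
    for child in node.children:
        collapse_short_branches(child, threshold)  # handle deeper branches first
        if child.children and child.length < threshold:
            # Remove the short internal branch: graft the child's children onto
            # this node, adding the removed length to keep root-to-tip distances.
            for grandchild in child.children:
                grandchild.length += child.length
                new_children.append(grandchild)
        else:
            new_children.append(child)
    node.children = new_children
    return node


def to_newick(node: Node) -> str:
    """Topology-only Newick-style string (branch lengths omitted for brevity)."""
    if not node.children:
        return node.name or ""
    return "(" + ",".join(to_newick(c) for c in node.children) + ")"


def example_tree() -> Node:
    """((A,B),C,D) with a very short internal branch above the (A,B) clade."""
    return Node(children=[
        Node(length=0.001, children=[Node("A", 1.0), Node("B", 1.0)]),
        Node("C", 1.0),
        Node("D", 1.0),
    ])


print(to_newick(collapse_short_branches(example_tree(), threshold=0.01)))
```

With a threshold below the internal branch length (for example 1e-4) the (A,B) clade is kept; choosing the threshold is an analysis decision that the sketch leaves to the user.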
Affiliation(s)
- Liang Liu
- Department of Statistics and Institute of Bioinformatics, University of Georgia, Athens, GA, United States
- Lili Yu
- Department of Biostatistics, Georgia Southern University, Statesboro, GA, United States
- Shaoyuan Wu
- Jiangsu Key Laboratory of Phylogenomics and Comparative Genomics, Jiangsu International Joint Center of Genomics, School of Life Sciences, Jiangsu Normal University, Xuzhou, Jiangsu, China
- Jonathan Arnold
- Department of Genetics, University of Georgia, Athens, GA, United States
- Christopher Whalen
- Department of Epidemiology and Biostatistics, College of Public Health, University of Georgia, Athens, GA, United States
- Charles Davis
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, United States
- Scott Edwards
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, United States
9
On the effects of selection and mutation on species tree inference. Mol Phylogenet Evol 2023; 179:107650. [PMID: 36441104 DOI: 10.1016/j.ympev.2022.107650] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2022] [Revised: 10/17/2022] [Accepted: 10/18/2022] [Indexed: 11/24/2022]
Abstract
How selection acting on regions of the genome affects the accuracy of species-level phylogenetic inference, when using methods that do not explicitly model selection, is an open question relevant to most, if not all, phylogenomic studies. To address this, we derive a mathematical approximation to the Wright-Fisher model with mutation and selection in the limit as the population size becomes large. In contrast to previous approximations based on diffusion processes, our approximation can be used to study the distribution of coalescent times for an arbitrary number of lineages, allowing calculation of the probability distribution of gene genealogies under the coalescent model. We use these calculations to show that direct selection at strengths typically encountered in practice has only a small effect on the distribution of coalescent times, and hence on the distribution of gene trees. This implies that many coalescent-based methods for estimating the species tree topology will be robust to the presence of selection in a subset of the underlying genes. Selection will, however, bias the estimation of speciation times, causing the true speciation times to be underestimated. Our model captures the effects of selection on the genealogies that generate the observed sequence data, but does not model selective pressures that act only on the subsequent sequences or that negatively impact gene tree estimation.
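As a neutral baseline for the coalescent times discussed above, the expected time to the most recent common ancestor of n lineages under the standard Kingman coalescent is the sum of the mean waiting times 1/C(k, 2) while k lineages remain, which equals 2(1 - 1/n) in coalescent units. The sketch computes this sum exactly with rational arithmetic; it does not implement the selection-aware approximation derived in the paper, whose finding is that typical selection strengths perturb this neutral distribution only slightly.

```python
from fractions import Fraction


def expected_tmrca(n: int) -> Fraction:
    """Expected time to the MRCA of n >= 2 lineages, neutral Kingman coalescent.

    Time is in coalescent units, where each pair of lineages coalesces at rate 1,
    so with k lineages the waiting time is exponential with rate k(k-1)/2.
    """
    total = Fraction(0)
    for k in range(2, n + 1):
        total += Fraction(2, k * (k - 1))  # mean of an Exp(k choose 2) waiting time
    return total


for n in (2, 5, 10, 100):
    print(n, expected_tmrca(n), float(expected_tmrca(n)))
```

The telescoping sum shows why the expectation approaches 2 as n grows: most of the expected depth comes from the last two lineages, whose mean waiting time is 1.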
10
Hibbins MS, Hahn MW. Phylogenomic approaches to detecting and characterizing introgression. Genetics 2022; 220:iyab173. [PMID: 34788444 PMCID: PMC9208645 DOI: 10.1093/genetics/iyab173] [Citation(s) in RCA: 78] [Impact Index Per Article: 26.0] [Received: 07/16/2021] [Accepted: 10/02/2021] [Indexed: 12/26/2022]
Abstract
Phylogenomics has revealed the remarkable frequency with which introgression occurs across the tree of life. These discoveries have been enabled by the rapid growth of methods designed to detect and characterize introgression from whole-genome sequencing data. A large class of phylogenomic methods makes use of data across species to infer and characterize introgression based on expectations from the multispecies coalescent. These methods range from simple tests, such as the D-statistic, to model-based approaches for inferring phylogenetic networks. Here, we provide a detailed overview of the various signals that different modes of introgression are expected to leave in the genome, and how current methods are designed to detect them. We discuss the strengths and pitfalls of these approaches and identify areas for future development, highlighting the different signals of introgression and the power of each method to detect them. We conclude with a discussion of current challenges in inferring introgression and how they could potentially be addressed.
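The D-statistic mentioned above (the ABBA-BABA test) compares two site patterns on the four-taxon tree (((P1, P2), P3), O): ABBA sites, where P2 and P3 share a derived allele, and BABA sites, where P1 and P3 do. Under incomplete lineage sorting alone the two patterns are expected in equal numbers, so a D = (ABBA - BABA) / (ABBA + BABA) significantly different from zero suggests introgression, with positive values pointing to excess allele sharing between P2 and P3. The sketch counts these patterns from aligned sequences, taking the outgroup allele as ancestral; the function name and toy sequences are hypothetical, and significance testing (commonly a block jackknife) is omitted.

```python
from typing import Tuple


def abba_baba(p1: str, p2: str, p3: str, out: str) -> Tuple[int, int, float]:
    """Count ABBA and BABA sites for the tree (((P1, P2), P3), O) and return D.

    Only biallelic sites without gaps or ambiguity codes are used; the outgroup
    allele is taken as ancestral ("A") and the other allele as derived ("B").
    """
    abba = baba = 0
    for a1, a2, a3, o in zip(p1.upper(), p2.upper(), p3.upper(), out.upper()):
        site = (a1, a2, a3, o)
        if any(x not in "ACGT" for x in site) or len(set(site)) != 2:
            continue
        if a1 == o and a2 != o and a3 != o:
            abba += 1  # P2 and P3 share the derived allele
        elif a1 != o and a2 == o and a3 != o:
            baba += 1  # P1 and P3 share the derived allele
    total = abba + baba
    d = (abba - baba) / total if total else 0.0
    return abba, baba, d


print(abba_baba("AAGA", "GAAT", "GGGT", "AAAA"))
```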
Affiliation(s)
- Mark S Hibbins
- Department of Biology, Indiana University, Bloomington, IN 47405, USA
- Matthew W Hahn
- Department of Biology, Indiana University, Bloomington, IN 47405, USA
- Department of Computer Science, Indiana University, Bloomington, IN 47405, USA
11
Borges R, Boussau B, Szöllősi GJ, Kosiol C. Nucleotide Usage Biases Distort Inferences of the Species Tree. Genome Biol Evol 2022; 14:6496956. [PMID: 34983052 PMCID: PMC8829901 DOI: 10.1093/gbe/evab290] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Accepted: 12/27/2021] [Indexed: 12/15/2022]
Abstract
Despite the importance of natural selection in species’ evolutionary history, phylogenetic methods that take into account population-level processes typically ignore selection. The assumption of neutrality is often based on the idea that selection occurs at a minority of loci in the genome and is unlikely to compromise phylogenetic inferences significantly. However, genome-wide processes like GC-bias and some variation segregating in coding regions are known to evolve in the nearly neutral range. As we are now using genome-wide data to estimate species trees, it is natural to ask whether weak but pervasive selection is likely to blur species tree inferences. We developed a polymorphism-aware phylogenetic model tailored for measuring signatures of nucleotide usage biases to test the impact of selection on the species tree. Our analyses indicate that although the inferred relationships among species are not significantly compromised, the genetic distances are systematically underestimated in a node-height-dependent manner: that is, deeper nodes tend to be more underestimated than shallow ones. Such biases have implications for molecular dating. We dated the evolutionary history of 30 worldwide fruit fly populations, and we found signatures of GC-bias considerably affecting the divergence times estimated under the neutral model (by up to 23%). Our findings highlight the need to account for selection when quantifying divergence or dating species evolution.
Affiliation(s)
- Rui Borges
- Institut für Populationsgenetik, Vetmeduni Vienna, Wien, Austria
- Bastien Boussau
- Université de Lyon, Université Claude Bernard Lyon 1, CNRS UMR 5558, LBBE, Villeurbanne, France
- Gergely J Szöllősi
- Department of Biological Physics, Eötvös University, Budapest, Hungary; MTA-ELTE "Lendület" Evolutionary Genomics Research Group, Budapest, Hungary; Evolutionary Systems Research Group, Centre for Ecological Research, Hungarian Academy of Sciences, Tihany, Hungary
- Carolin Kosiol
- Institut für Populationsgenetik, Vetmeduni Vienna, Wien, Austria; Centre for Biological Diversity, University of St Andrews, St Andrews, United Kingdom
12
Jorna J, Linde JB, Searle PC, Jackson AC, Nielsen M, Nate MS, Saxton NA, Grewe F, Herrera-Campos MDLA, Spjut RW, Wu H, Ho B, Lumbsch HT, Leavitt SD. Species boundaries in the messy middle: A genome-scale validation of species delimitation in a recently diverged lineage of coastal fog desert lichen fungi. Ecol Evol 2021; 11:18615-18632. [PMID: 35003697 PMCID: PMC8717302 DOI: 10.1002/ece3.8467] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 03/30/2021] [Revised: 11/01/2021] [Accepted: 11/16/2021] [Indexed: 12/05/2022]
Abstract
Species delimitation among closely related species is challenging because traditional phenotype-based approaches, for example, those using morphological, ecological, or chemical characteristics, may not coincide with natural groupings. With the advent of high-throughput sequencing, it has become increasingly cost-effective to acquire genome-scale data that can resolve previously ambiguous species boundaries. As the availability of genome-scale data has increased, numerous species delimitation analyses, such as BPP and SNAPP+Bayes factor delimitation (BFD*), have been developed to delimit species boundaries. However, even empirical molecular species delimitation approaches can be biased by confounding evolutionary factors, such as hybridization/introgression and incomplete lineage sorting, and by computational limitations. Here, we investigate species boundaries and the potential for micro-endemism in a lineage of lichen-forming fungi, Niebla Rundel & Bowler, in the family Ramalinaceae, by analyzing single-locus and genome-scale data using (a) single-locus species delimitation analysis with ASAP, (b) maximum likelihood-based phylogenetic tree inference, (c) genome-scale species delimitation models, e.g., BPP and SNAPP+BFD, and (d) species validation using the genealogical divergence index (gdi). We specifically use these methods to cross-validate results between genome-scale and single-locus datasets and between differently sampled subsets of genomic data, and to control for population-level genetic divergence. Our species delimitation models tend to support more speciose groupings that were inconsistent with traditional taxonomy, supporting a hypothesis of micro-endemism, which may include morphologically cryptic species. However, the models did not converge on robust, consistent species delimitations.
While the results of our analysis are somewhat ambiguous in terms of species boundaries, they provide a valuable perspective on how to use these empirical species delimitation methods in a nonmodel system. This study thus highlights the challenges inherent in delimiting species, particularly in groups such as Niebla with complex, relatively recent phylogeographic histories.
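The following is a rough illustration of the gdi validation step. It is a minimal sketch that assumes the commonly used form gdi = 1 − exp(−2τ/θ), where τ is the divergence time between two candidate lineages and θ is the population size parameter of the focal lineage, both as BPP-style estimates in expected substitutions per site. Read this way, gdi is the probability that two sequences sampled from the focal lineage coalesce before their lineages reach the split.

The cut-offs 0.2 and 0.7 are assumptions here: they are widely quoted rules of thumb, and the abstract does not say which values this study applied. The function names `gdi` and `classify_gdi` are hypothetical.

```python
import math


def gdi(tau: float, theta: float) -> float:
    """Genealogical divergence index, assumed form gdi = 1 - exp(-2 * tau / theta).

    tau:   divergence time between the two candidate lineages (substitutions/site)
    theta: population size parameter (4 * N * mu) of the focal lineage
    """
    if tau < 0 or theta <= 0:
        raise ValueError("tau must be >= 0 and theta must be > 0")
    # 2 * tau / theta converts the divergence time into coalescent units;
    # 1 - exp(-t) is the chance two lineages coalesce within t units.
    return 1.0 - math.exp(-2.0 * tau / theta)


def classify_gdi(value: float, low: float = 0.2, high: float = 0.7) -> str:
    """Label a gdi value using hypothetical rule-of-thumb thresholds."""
    if value < low:
        return "single population"
    if value > high:
        return "distinct species"
    return "ambiguous"


if __name__ == "__main__":
    # Illustrative (made-up) parameter pairs, not estimates from the study.
    for tau, theta in [(0.001, 0.01), (0.003, 0.01), (0.01, 0.01)]:
        g = gdi(tau, theta)
        print(f"tau={tau}, theta={theta}: gdi={g:.3f} -> {classify_gdi(g)}")
```

Because the index depends only on the ratio τ/θ, a shallow split in a small population can score as high as a deep split in a large one. This is one reason gdi is used to control for population-level divergence rather than raw genetic distance.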
Affiliation(s)
- Jesse Jorna
- Department of Biology, Brigham Young University, Provo, Utah, USA
- Felix Grewe
- Science & Education, The Grainger Bioinformatics Center, The Field Museum, Chicago, Illinois, USA
- Huini Wu
- Science & Education, The Grainger Bioinformatics Center, The Field Museum, Chicago, Illinois, USA
- Brian Ho
- Science & Education, The Grainger Bioinformatics Center, The Field Museum, Chicago, Illinois, USA
- H. Thorsten Lumbsch
- Science & Education, The Grainger Bioinformatics Center, The Field Museum, Chicago, Illinois, USA
- Steven D. Leavitt
- Department of Biology, Brigham Young University, Provo, Utah, USA
- Monte L. Bean Life Science Museum, Brigham Young University, Provo, Utah, USA
13
Adams RH, Castoe TA, DeGiorgio M. PhyloWGA: chromosome-aware phylogenetic interrogation of whole genome alignments. Bioinformatics 2021; 37:1923-1925. [PMID: 33051672 DOI: 10.1093/bioinformatics/btaa884] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/11/2019] [Revised: 09/16/2020] [Accepted: 09/29/2020] [Indexed: 11/13/2022]
Abstract
SUMMARY: Here, we present PhyloWGA, an open-source R package for conducting phylogenetic analysis and investigation of whole genome data. AVAILABILITY AND IMPLEMENTATION: Available on GitHub (https://github.com/radamsRHA/PhyloWGA). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Richard H Adams
- Department of Computer and Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL 33431, USA
- Todd A Castoe
- Department of Biology, University of Texas at Arlington, Arlington, TX 76019, USA
- Michael DeGiorgio
- Department of Computer and Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL 33431, USA
14
Abstract
In this review, we discuss the current status and future challenges for fully elucidating the fungal tree of life. In the last 15 years, advances in genomic technologies have revolutionized fungal systematics, ushering the field into the phylogenomic era. This has made the unthinkable possible, namely access to the entire genetic record of all known extant taxa. We first review the current status of the fungal tree and highlight areas where additional effort will be required. We then review the analytical challenges imposed by the volume of data and discuss methods to recover the most accurate species tree given the sea of gene trees. Highly resolved and deeply sampled trees are being leveraged in novel ways to study fungal radiations, species delimitation, and metabolic evolution. Finally, we discuss the critical issue of incorporating the unnamed and uncultured dark matter taxa that represent the vast majority of fungal diversity.
Affiliation(s)
- Timothy Y James
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, Michigan 48109, USA
- Jason E Stajich
- Department of Microbiology and Plant Pathology, Institute for Integrative Genome Biology, University of California, Riverside, California 92521, USA
- Chris Todd Hittinger
- Laboratory of Genetics, DOE Great Lakes Bioenergy Research Center, Wisconsin Energy Institute, Center for Genomic Science and Innovation, J.F. Crow Institute for the Study of Evolution, University of Wisconsin-Madison, Madison, Wisconsin 53726, USA
- Antonis Rokas
- Department of Biological Sciences, Vanderbilt University, Nashville, Tennessee 37235, USA
15
Reyes-Velasco J, Adams RH, Boissinot S, Parkinson CL, Campbell JA, Castoe TA, Smith EN. Genome-wide SNPs clarify lineage diversity confused by coloration in coralsnakes of the Micrurus diastema species complex (Serpentes: Elapidae). Mol Phylogenet Evol 2020; 147:106770. [PMID: 32084510 DOI: 10.1016/j.ympev.2020.106770] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 05/22/2019] [Revised: 02/06/2020] [Accepted: 02/14/2020] [Indexed: 01/04/2023]
Abstract
New World coralsnakes of the genus Micrurus are a diverse radiation of highly venomous and brightly colored snakes that range from North Carolina to Argentina. Species in this group have played central roles in developing and testing hypotheses about the evolution of mimicry and aposematism. Despite their diversity and prominence as model systems, surprisingly little is known about species boundaries and phylogenetic relationships within Micrurus, which has substantially hindered meaningful analyses of their evolutionary history. Here we use mitochondrial genes together with thousands of nuclear genomic loci obtained via ddRADseq to study the phylogenetic relationships and population genomics of a subclade of the genus Micrurus: the M. diastema species complex. Our results indicate that prior species and species-group inferences based on morphology and color pattern have grossly misguided taxonomy, and that the M. diastema complex is not monophyletic. Based on our analyses of molecular data, we infer the phylogenetic relationships among species and populations and provide a revised taxonomy for the group. Two non-sister species complexes with similar color patterns are recognized, the M. distans and the M. diastema complexes, the first being basal to the monadal Micrurus and the second encompassing most North American monadal taxa. We examined all 13 species and their respective subspecies, a total of 24 recognized taxa in the M. diastema species complex. Our analyses suggest that a reduction to 10 species, with no subspecific designations warranted, is a more likely estimate of species diversity, namely M. apiatus, M. browni, M. diastema, M. distans, M. ephippifer, M. fulvius, M. michoacanensis, M. oliveri, M. tener, and one undescribed species.
Affiliation(s)
- Jacobo Reyes-Velasco
- Department of Biology, University of Texas at Arlington, 501 S. Nedderman Drive, 337 Life Science, Arlington, TX 76010, USA; New York University Abu Dhabi, Saadiyat Island, Abu Dhabi, United Arab Emirates
- Richard H Adams
- Department of Biology, University of Texas at Arlington, 501 S. Nedderman Drive, 337 Life Science, Arlington, TX 76010, USA
- Stephane Boissinot
- New York University Abu Dhabi, Saadiyat Island, Abu Dhabi, United Arab Emirates
- Christopher L Parkinson
- Department of Biological Sciences and Department of Forestry and Environmental Conservation, Clemson University, 190 Collins St., Clemson, SC 29634, USA
- Jonathan A Campbell
- Department of Biology, University of Texas at Arlington, 501 S. Nedderman Drive, 337 Life Science, Arlington, TX 76010, USA
- Todd A Castoe
- Department of Biology, University of Texas at Arlington, 501 S. Nedderman Drive, 337 Life Science, Arlington, TX 76010, USA
- Eric N Smith
- Department of Biology, University of Texas at Arlington, 501 S. Nedderman Drive, 337 Life Science, Arlington, TX 76010, USA
16
Adams RH, Castoe TA. Probabilistic Species Tree Distances: Implementing the Multispecies Coalescent to Compare Species Trees Within the Same Model-Based Framework Used to Estimate Them. Syst Biol 2020; 69:194-207. [PMID: 31086978 DOI: 10.1093/sysbio/syz031] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 09/04/2018] [Accepted: 05/02/2019] [Indexed: 11/14/2022]
Abstract
Despite the ubiquitous use of statistical models for phylogenomic and population genomic inference, this model-based rigor is rarely applied to post hoc comparison of trees. In a recent study, Garba et al. derived new methods for measuring the distance between two gene trees, computed as the difference in their site pattern probability distributions. Unlike traditional metrics that compare trees solely in terms of geometry, these measures treat gene trees and their associated parameters as probabilistic models that can be compared using standard information-theoretic approaches. Consequently, probabilistic measures of phylogenetic tree distance can be far more informative than simple comparisons of topology and/or branch lengths. However, in their current form, these distance measures are not suitable for comparing species tree models in the presence of gene tree heterogeneity. Here, we demonstrate how the theory of Garba et al. (2018), which is based on gene tree distances, can be extended naturally to the comparison of species tree models. Multispecies coalescent (MSC) models parameterize the discrete probability distribution of gene trees conditioned on a species tree with a particular topology and set of divergence times (in coalescent units), and thus provide a framework for measuring distances between species tree models in terms of their corresponding gene tree topology probabilities. We describe the computation of probabilistic species tree distances in the context of standard MSC models, which assume complete genetic isolation after speciation, as well as recent theoretical extensions to the MSC in the form of network-based MSC models that relax this assumption and permit hybridization among taxa. We demonstrate these metrics using simulations and empirical species tree estimates and discuss both the benefits and limitations of these approaches.
We make our species tree distance approach available as an R package called pSTDistanceR, for open use by the community.
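The idea of comparing species trees through the gene tree distributions they imply can be made concrete with the simplest case: three taxa under the standard MSC, with no gene flow. This is a minimal sketch under those assumptions. It relies on the textbook result that a rooted species tree whose internal branch has length t (coalescent units) gives the matching gene tree probability 1 − (2/3)e^(−t), and each of the two discordant trees probability (1/3)e^(−t).

The sketch uses the Hellinger distance as one standard information-theoretic choice. It is not the pSTDistanceR implementation, and the function names are hypothetical.

```python
import math
from itertools import combinations

# Three-taxon case only; a rooted gene tree is identified by its sister pair.
TAXA = ("A", "B", "C")


def gene_tree_probs(sister_pair, t):
    """Rooted gene tree topology probabilities under the standard MSC.

    sister_pair: the two taxa that are sisters in the species tree, e.g. ("A", "B")
    t: internal branch length of the species tree, in coalescent units
    Returns a dict mapping frozenset(sister pair of gene tree) -> probability.
    """
    concordant = frozenset(sister_pair)
    if t < 0:
        raise ValueError("t must be non-negative")
    if len(concordant) != 2 or not concordant <= set(TAXA):
        raise ValueError("sister_pair must name two distinct taxa from TAXA")
    # Each discordant topology arises only via incomplete lineage sorting.
    discordant_each = math.exp(-t) / 3.0
    probs = {}
    for pair in combinations(TAXA, 2):
        key = frozenset(pair)
        probs[key] = 1.0 - 2.0 * discordant_each if key == concordant else discordant_each
    return probs


def hellinger(p, q):
    """Hellinger distance between two discrete distributions over the same keys."""
    s = sum((math.sqrt(p[k]) - math.sqrt(q[k])) ** 2 for k in p)
    return math.sqrt(s / 2.0)


if __name__ == "__main__":
    for t in (0.0, 1.0, 5.0):
        d = hellinger(gene_tree_probs(("A", "B"), t), gene_tree_probs(("A", "C"), t))
        print(f"t={t}: distance between ((A,B),C) and ((A,C),B) = {d:.4f}")
```

The usage loop shows why a model-based distance can be more informative than topology alone. With a zero-length internal branch, the two topologies predict the same uniform gene tree distribution, so their distance is 0, even though a purely topological metric would call them different. As t grows, the distance between the conflicting trees approaches its maximum of 1.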
Affiliation(s)
- Richard H Adams
- Department of Biology, University of Texas at Arlington, 501 S. Nedderman Dr., Arlington, TX 76019, USA
- Todd A Castoe
- Department of Biology, University of Texas at Arlington, 501 S. Nedderman Dr., Arlington, TX 76019, USA
17
He C, Liang D, Zhang P. Asymmetric Distribution of Gene Trees Can Arise under Purifying Selection If Differences in Population Size Exist. Mol Biol Evol 2019; 37:881-892. [DOI: 10.1093/molbev/msz232] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Indexed: 12/16/2022]
Abstract
Incomplete lineage sorting (ILS) is an important cause of gene tree discordance. For gene trees of three species, under neutrality, random mating, and the absence of interspecific gene flow, ILS creates a symmetric distribution of gene trees: the gene tree that matches the species tree has the highest frequency, and the two discordant trees are equally frequent. If the neutral condition is violated, the impact of ILS may change, altering the gene tree distribution. Here, we show that under purifying selection, even assuming that the fitness effect of mutations is constant throughout the species tree, asymmetric distributions of gene trees will arise if population sizes differ among species, contrary to the expectation under neutrality. In extreme cases, one of the discordant trees, rather than the concordant tree, becomes the most frequent gene tree. In addition, we found that in a real case, the position of Scandentia relative to Primates and Glires, the symmetry of the gene tree distribution can be influenced by the strength of purifying selection. In current phylogenetic inference, researchers rarely consider the impact of purifying selection on the gene tree distribution. This study highlights the necessity of considering this impact.
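For reference, the neutral baseline against which this study's asymmetry is measured can be written out. Under the standard multispecies coalescent (neutrality, random mating, no gene flow), take a rooted species tree ((A,B),C) whose internal branch has length T in coalescent units. The lineages from A and B fail to coalesce along that branch with probability e^{-T}. If they fail, the three lineages entering the ancestral population are equally likely to coalesce in any pair first. This gives the symmetric distribution described above:

$$
\begin{aligned}
P\big(((A,B),C)\big) &= \left(1 - e^{-T}\right) + \tfrac{1}{3}e^{-T} = 1 - \tfrac{2}{3}e^{-T},\\
P\big(((A,C),B)\big) = P\big(((B,C),A)\big) &= \tfrac{1}{3}e^{-T}.
\end{aligned}
$$

Since 1 − (2/3)e^{-T} ≥ 1/3 ≥ (1/3)e^{-T} for all T ≥ 0, with equality only at T = 0, the concordant tree is never less frequent than either discordant tree under these assumptions. A discordant tree becoming the most frequent, as reported here under purifying selection with unequal population sizes, is therefore a departure from this neutral expectation.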
Affiliation(s)
- Chong He
- State Key Laboratory of Biocontrol, College of Ecology and Evolution, School of Life Sciences, Sun Yat-Sen University, Guangzhou, China
- Dan Liang
- State Key Laboratory of Biocontrol, College of Ecology and Evolution, School of Life Sciences, Sun Yat-Sen University, Guangzhou, China
- Peng Zhang
- State Key Laboratory of Biocontrol, College of Ecology and Evolution, School of Life Sciences, Sun Yat-Sen University, Guangzhou, China
18
Adams RH, Schield DR, Castoe TA. Recent Advances in the Inference of Gene Flow from Population Genomic Data. ACTA ACUST UNITED AC 2019. [DOI: 10.1007/s40610-019-00120-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 12/21/2022]
19
Pasquesi GIM, Adams RH, Card DC, Schield DR, Corbin AB, Perry BW, Reyes-Velasco J, Ruggiero RP, Vandewege MW, Shortt JA, Castoe TA. Squamate reptiles challenge paradigms of genomic repeat element evolution set by birds and mammals. Nat Commun 2018; 9:2774. [PMID: 30018307 PMCID: PMC6050309 DOI: 10.1038/s41467-018-05279-1] [Citation(s) in RCA: 84] [Impact Index Per Article: 12.0] [Received: 09/29/2017] [Accepted: 06/25/2018] [Indexed: 12/14/2022]
Abstract
Broad paradigms of vertebrate genomic repeat element evolution have been largely shaped by analyses of mammalian and avian genomes. Here, based on analyses of genomes sequenced from over 60 squamate reptiles (lizards and snakes), we show that patterns of genomic repeat landscape evolution in squamates challenge such paradigms. Despite low variance in genome size, squamate genomes exhibit surprisingly high variation among species in abundance (ca. 25–73% of the genome) and composition of identifiable repeat elements. We also demonstrate that snake genomes have experienced microsatellite seeding by transposable elements at a scale unparalleled among eukaryotes, leading to some snake genomes containing the highest microsatellite content of any known eukaryote. Our analyses of transposable element evolution across squamates also suggest that lineage-specific variation in mechanisms of transposable element activity and silencing, rather than variation in species-specific demography, may play a dominant role in driving variation in repeat element landscapes across squamate phylogeny. Large-scale patterns of genomic repeat element evolution have been studied mainly in birds and mammals. Here, the authors analyze the genomes of over 60 squamate reptiles and show high variation in repeat elements compared to mammals and birds, and particularly high microsatellite seeding in snakes.
Affiliation(s)
- Giulia I M Pasquesi
- Department of Biology, University of Texas at Arlington, 501 S. Nedderman Drive, Arlington, TX, 76019, USA
- Richard H Adams
- Department of Biology, University of Texas at Arlington, 501 S. Nedderman Drive, Arlington, TX, 76019, USA
- Daren C Card
- Department of Biology, University of Texas at Arlington, 501 S. Nedderman Drive, Arlington, TX, 76019, USA
- Drew R Schield
- Department of Biology, University of Texas at Arlington, 501 S. Nedderman Drive, Arlington, TX, 76019, USA
- Andrew B Corbin
- Department of Biology, University of Texas at Arlington, 501 S. Nedderman Drive, Arlington, TX, 76019, USA
- Blair W Perry
- Department of Biology, University of Texas at Arlington, 501 S. Nedderman Drive, Arlington, TX, 76019, USA
- Jacobo Reyes-Velasco
- Department of Biology, University of Texas at Arlington, 501 S. Nedderman Drive, Arlington, TX, 76019, USA; Department of Biology, New York University Abu Dhabi, Saadiyat Island, United Arab Emirates
- Robert P Ruggiero
- Department of Biology, New York University Abu Dhabi, Saadiyat Island, United Arab Emirates
- Michael W Vandewege
- Department of Biology, Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, 19122, USA
- Jonathan A Shortt
- Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Aurora, CO, 80045, USA
- Todd A Castoe
- Department of Biology, University of Texas at Arlington, 501 S. Nedderman Drive, Arlington, TX, 76019, USA