1
|
Simmons MP, Goloboff PA, Stöver BC, Springer MS, Gatesy J. Quantification of congruence among gene trees with polytomies using overall success of resolution for phylogenomic coalescent analyses. Cladistics 2023; 39:418-436. [PMID: 37096985 DOI: 10.1111/cla.12540] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/27/2022] [Revised: 02/22/2023] [Accepted: 03/24/2023] [Indexed: 04/26/2023] Open
Abstract
Gene-tree-inference error can cause species-tree-inference artefacts in summary phylogenomic coalescent analyses. Here we integrate two ways of accommodating these inference errors: collapsing arbitrarily or dubiously resolved gene-tree branches, and subsampling gene trees based on their pairwise congruence. We tested the effect of collapsing gene-tree branches with 0% approximate-likelihood-ratio-test (SH-like aLRT) support in likelihood analyses and strict consensus trees for parsimony, and then subsampled those partially resolved trees based on congruence measures that do not penalize polytomies. For this purpose we developed a new TNT script for congruence sorting (congsort), and used it to calculate topological incongruence for eight phylogenomic datasets using three distance measures: standard Robinson-Foulds (RF) distances; overall success of resolution (OSR), which is based on counting both matching and contradicting clades; and RF contradictions, which only counts contradictory clades. As expected, we found that gene-tree incongruence was often concentrated in clades that are arbitrarily or dubiously resolved and that there was greater congruence between the partially collapsed gene trees and the coalescent and concatenation topologies inferred from those genes. Coalescent branch lengths typically increased as the most incongruent gene trees were excluded, although branch supports typically did not. We investigated two successful and complementary approaches to prioritizing genes for investigation of alignment or homology errors. Coalescent-tree clades that contradicted concatenation-tree clades were generally less robust to gene-tree subsampling than congruent clades. Our preferred approach to collapsing likelihood gene-tree clades (0% SH-like aLRT support) and subsampling those trees (OSR) generally outperformed competing approaches for a large fungal dataset with respect to branch lengths, support and congruence. We recommend widespread application of this approach (and strict consensus trees for parsimony-based analyses) for improving quantification of gene-tree congruence/conflict, estimating coalescent branch lengths, testing robustness of coalescent analyses to gene-tree-estimation error, and improving topological robustness of summary coalescent analyses. This approach is quick and easy to implement, even for huge datasets.
Collapse
Affiliation(s)
- Mark P Simmons
- Department of Biology, Colorado State University, Fort Collins, CO, 80523, USA
| | - Pablo A Goloboff
- CONICET, INSUE, Fundación Miguel Lillo, Miguel Lillo 251, 4000, S.M. de Tucumán, Argentina
| | - Ben C Stöver
- Institute for Evolution and Biodiversity, WMU Münster, 48149, Münster, Germany
| | - Mark S Springer
- Department of Evolution, Ecology, and Organismal Biology, University of California, Riverside, CA, 92521, USA
| | - John Gatesy
- Division of Vertebrate Zoology, American Museum of Natural History, New York, NY, 10024, USA
| |
Collapse
|
2
|
Simmons MP, Springer MS, Gatesy J. Gene-tree misrooting drives conflicts in phylogenomic coalescent analyses of palaeognath birds. Mol Phylogenet Evol 2021; 167:107344. [PMID: 34748873 DOI: 10.1016/j.ympev.2021.107344] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/13/2021] [Revised: 10/08/2021] [Accepted: 11/02/2021] [Indexed: 10/19/2022]
Abstract
Phylogenomic analyses of ancient rapid radiations can produce conflicting results that are driven by differential sampling of taxa and characters as well as the limitations of alternative analytical methods. We re-examine basal relationships of palaeognath birds (ratites and tinamous) using recently published datasets of nucleotide characters from 20,850 loci as well as 4301 retroelement insertions. The original studies attributed conflicting resolutions of rheas in their inferred coalescent and concatenation trees to concatenation failing in the anomaly zone. By contrast, we find that the coalescent-based resolution of rheas is premised upon extensive gene-tree estimation errors. Furthermore, retroelement insertions contain much more conflict than originally reported and multiple insertion loci support the basal position of rheas found in concatenation trees, while none were reported in the original publication. We demonstrate how even remarkable congruence in phylogenomic studies may be driven by long-branch misplacement of a divergent outgroup, highly incongruent gene trees, differential taxon sampling that can result in gene-tree misrooting errors that bias species-tree inference, and gross homology errors. What was previously interpreted as broad, robustly supported corroboration for a single resolution in coalescent analyses may instead indicate a common bias that taints phylogenomic results across multiple genome-scale datasets. The updated retroelement dataset now supports a species tree with branch lengths that suggest an ancient anomaly zone, and both concatenation and coalescent analyses of the huge nucleotide datasets fail to yield coherent, reliable results in this challenging phylogenetic context.
Collapse
Affiliation(s)
- Mark P Simmons
- Department of Biology, Colorado State University, Fort Collins, CO 80523, USA.
| | - Mark S Springer
- Department of Evolution, Ecology, and Organismal Biology, University of California, Riverside, CA 92521, USA
| | - John Gatesy
- Division of Vertebrate Zoology and Sackler Institute for Comparative Genomics, American Museum of Natural History, New York, NY 10024, USA
| |
Collapse
|
3
|
Molloy EK, Gatesy J, Springer MS. Theoretical and practical considerations when using retroelement insertions to estimate species trees in the anomaly zone. Syst Biol 2021; 71:721-740. [PMID: 34677617 DOI: 10.1093/sysbio/syab086] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/02/2020] [Accepted: 10/11/2021] [Indexed: 11/13/2022] Open
Abstract
A potential shortcoming of concatenation methods for species tree estimation is their failure to account for incomplete lineage sorting. Coalescent methods address this problem but make various assumptions that, if violated, can result in worse performance than concatenation. Given the challenges of analyzing DNA sequences with both concatenation and coalescent methods, retroelement insertions (RIs) have emerged as powerful phylogenomic markers for species tree estimation. Here, we show that two recently proposed quartet-based methods, SDPquartets and ASTRAL_BP, are statistically consistent estimators of the unrooted species tree topology under the coalescent when RIs follow a neutral infinite-sites model of mutation and the expected number of new RIs per generation is constant across the species tree. The accuracy of these (and other) methods for inferring species trees from RIs has yet to be assessed on simulated data sets, where the true species tree topology is known. Therefore, we evaluated eight methods given RIs simulated from four model species trees, all of which have short branches and at least three of which are in the anomaly zone. In our simulation study, ASTRAL_BP and SDPquartets always recovered the correct species tree topology when given a sufficiently large number of RIs, as predicted. A distance-based method (ASTRID_BP) and Dollo parsimony also performed well in recovering the species tree topology. In contrast, unordered, polymorphism, and Camin-Sokal parsimony typically fail to recover the correct species tree topology in anomaly zone situations with more than four ingroup taxa. Of the methods studied, only ASTRAL_BP automatically estimates internal branch lengths (in coalescent units) and support values (i.e. local posterior probabilities). We examined the accuracy of branch length estimation, finding that estimated lengths were accurate for short branches but upwardly biased otherwise. This led us to derive the maximum likelihood (branch length) estimate for when RIs are given as input instead of binary gene trees; this corrected formula produced accurate estimates of branch lengths in our simulation study, provided that a sufficiently large number of RIs were given as input. Lastly, we evaluated the impact of data quantity on species tree estimation by repeating the above experiments with input sizes varying from 100 to 100 000 parsimony-informative RIs. We found that, when given just 1 000 parsimony-informative RIs as input, ASTRAL_BP successfully reconstructed major clades (i.e clades separated by branches > 0.3 CUs) with high support and identified rapid radiations (i.e. shorter connected branches), although not their precise branching order. The local posterior probability was effective for controlling false positive branches in these scenarios.
Collapse
Affiliation(s)
- Erin K Molloy
- Department of Computer Science, University of Maryland, College Park, College Park, 20742, USA
| | - John Gatesy
- Sackler Institute for Comparative Genomics, American Museum of Natural History, New York, 10024, USA
| | - Mark S Springer
- Department of Evolution, Ecology, and Organismal Biology, University of California, Riverside, Riverside, 92521, USA
| |
Collapse
|
4
|
Collapsing dubiously resolved gene-tree branches in phylogenomic coalescent analyses. Mol Phylogenet Evol 2021; 158:107092. [PMID: 33545272 DOI: 10.1016/j.ympev.2021.107092] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/10/2020] [Revised: 12/30/2020] [Accepted: 01/28/2021] [Indexed: 01/15/2023]
Abstract
In two-step coalescent analyses of phylogenomic data, gene-tree topologies are treated as fixed prior to species-tree inference. Although all gene-tree conflict is assumed to be caused by lineage sorting when applying these methods, in empirical datasets much of the conflict can be caused by estimation error. Weakly supported and even arbitrarily resolved clades are important sources of this estimation error for gene trees inferred from few informative characters relative to the number of sampled terminals, and the resulting extraneous conflict among gene trees can negatively impact species-tree inference. In this study, we quantified the relative severity of alternative methods for collapsing gene-tree branches for seven empirical datasets and quantified their effects on species-tree inference. The branch-collapsing methods that we employed were based on the strict consensus of optimal topologies, various bootstrap thresholds, and 0% approximate likelihood ratio test (SH-like aLRT) support. Up to 86% of internal gene-tree branches are dubiously or arbitrarily resolved in reanalyses of these published phylogenomic datasets, and collapsing these branches increased inferred species-tree coalescent branch lengths by up to 455%. For two datasets, the longer inferred branch lengths sometimes impacted inference of anomaly-zone conditions. Although branch-collapsing methods did not consistently affect the species-tree topology, they often increased branch support. The more severe and clearly justified gene-tree branch-collapsing methods, which we recommend be broadly applied for two-step coalescent analyses, are use of the strict consensus in parsimony analyses and the collapse clades with 0% SH-like aLRT support in likelihood analyses. Collapsing dubiously or arbitrarily resolved branches in gene trees sometimes improved congruence between coalescent-based results and concatenation trees. In such cases, we contend that the resolution provided by concatenation should be preferred and that incomplete lineage sorting is a poor explanation for the initial conflict between phylogenetic approaches.
Collapse
|
5
|
Springer MS, Molloy EK, Sloan DB, Simmons MP, Gatesy J. ILS-Aware Analysis of Low-Homoplasy Retroelement Insertions: Inference of Species Trees and Introgression Using Quartets. J Hered 2019; 111:147-168. [DOI: 10.1093/jhered/esz076] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/13/2019] [Accepted: 12/12/2019] [Indexed: 12/20/2022] Open
Abstract
Abstract
DNA sequence alignments have provided the majority of data for inferring phylogenetic relationships with both concatenation and coalescent methods. However, DNA sequences are susceptible to extensive homoplasy, especially for deep divergences in the Tree of Life. Retroelement insertions have emerged as a powerful alternative to sequences for deciphering evolutionary relationships because these data are nearly homoplasy-free. In addition, retroelement insertions satisfy the “no intralocus-recombination” assumption of summary coalescent methods because they are singular events and better approximate neutrality relative to DNA loci commonly sampled in phylogenomic studies. Retroelements have traditionally been analyzed with parsimony, distance, and network methods. Here, we analyze retroelement data sets for vertebrate clades (Placentalia, Laurasiatheria, Balaenopteroidea, Palaeognathae) with 2 ILS-aware methods that operate by extracting, weighting, and then assembling unrooted quartets into a species tree. The first approach constructs a species tree from retroelement bipartitions with ASTRAL, and the second method is based on split-decomposition with parsimony. We also develop a Quartet-Asymmetry test to detect hybridization using retroelements. Both ILS-aware methods recovered the same species-tree topology for each data set. The ASTRAL species trees for Laurasiatheria have consecutive short branch lengths in the anomaly zone whereas Palaeognathae is outside of this zone. For the Balaenopteroidea data set, which includes rorquals (Balaenopteridae) and gray whale (Eschrichtiidae), both ILS-aware methods resolved balaeonopterids as paraphyletic. Application of the Quartet-Asymmetry test to this data set detected 19 different quartets of species for which historical introgression may be inferred. Evidence for introgression was not detected in the other data sets.
Collapse
Affiliation(s)
- Mark S Springer
- Department of Evolution, Ecology, and Organismal Biology, University of California, Riverside, CA
| | - Erin K Molloy
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL
| | - Daniel B Sloan
- Department of Biology, Colorado State University, Fort Collins, CO
| | - Mark P Simmons
- Department of Biology, Colorado State University, Fort Collins, CO
| | - John Gatesy
- Division of Vertebrate Zoology and Sackler Institute for Comparative Genomics, American Museum of Natural History, New York, NY
| |
Collapse
|
6
|
Springer MS, Foley NM, Brady PL, Gatesy J, Murphy WJ. Evolutionary Models for the Diversification of Placental Mammals Across the KPg Boundary. Front Genet 2019; 10:1241. [PMID: 31850081 PMCID: PMC6896846 DOI: 10.3389/fgene.2019.01241] [Citation(s) in RCA: 30] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/28/2019] [Accepted: 11/08/2019] [Indexed: 01/29/2023] Open
Abstract
Deciphering the timing of the placental mammal radiation is a longstanding problem in evolutionary biology, but consensus on the tempo and mode of placental diversification remains elusive. Nevertheless, an accurate timetree is essential for understanding the role of important events in Earth history (e.g., Cretaceous Terrestrial Revolution, KPg mass extinction) in promoting the taxonomic and ecomorphological diversification of Placentalia. Archibald and Deutschman described three competing models for the diversification of placental mammals, which are the Explosive, Long Fuse, and Short Fuse Models. More recently, the Soft Explosive Model and Trans-KPg Model have emerged as additional hypotheses for the placental radiation. Here, we review molecular and paleontological evidence for each of these five models including the identification of general problems that can negatively impact divergence time estimates. The Long Fuse Model has received more support from relaxed clock studies than any of the other models, but this model is not supported by morphological cladistic studies that position Cretaceous eutherians outside of crown Placentalia. At the same time, morphological cladistics has a poor track record of reconstructing higher-level relationships among the orders of placental mammals including the results of new pseudoextinction analyses that we performed on the largest available morphological data set for mammals (4,541 characters). We also examine the strengths and weaknesses of different timetree methods (node dating, tip dating, and fossilized birth-death dating) that may now be applied to estimate the timing of the placental radiation. While new methods such as tip dating are promising, they also have problems that must be addressed if these methods are to effectively discriminate among competing hypotheses for placental diversification. Finally, we discuss the complexities of timetree estimation when the signal of speciation times is impacted by incomplete lineage sorting (ILS) and hybridization. Not accounting for ILS results in dates that are older than speciation events. Hybridization, in turn, can result in dates than are younger or older than speciation dates. Disregarding this potential variation in "gene" history across the genome can distort phylogenetic branch lengths and divergence estimates when multiple unlinked genomic loci are combined together in a timetree analysis.
Collapse
Affiliation(s)
- Mark S. Springer
- Department of Evolution, Ecology, and Evolutionary Biology, University of California, Riverside, Riverside, CA, United States
| | - Nicole M. Foley
- Department of Veterinary Integrative Biosciences, Texas A&M University, College Station, TX, United States
| | - Peggy L. Brady
- Department of Evolution, Ecology, and Evolutionary Biology, University of California, Riverside, Riverside, CA, United States
| | - John Gatesy
- Division of Vertebrate Zoology, American Museum of Natural History, New York, NY, United States
| | - William J. Murphy
- Department of Veterinary Integrative Biosciences, Texas A&M University, College Station, TX, United States
| |
Collapse
|
7
|
Gatesy J, Sloan DB, Warren JM, Baker RH, Simmons MP, Springer MS. Partitioned coalescence support reveals biases in species-tree methods and detects gene trees that determine phylogenomic conflicts. Mol Phylogenet Evol 2019; 139:106539. [DOI: 10.1016/j.ympev.2019.106539] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/03/2018] [Revised: 06/10/2019] [Accepted: 06/17/2019] [Indexed: 12/26/2022]
|
8
|
Affiliation(s)
- Mark S. Springer
- Department of Biology, University of California, Riverside, CA 92521, USA
| | - John Gatesy
- Division of Vertebrate Zoology and Sackler Institute for Comparative Genomics, American Museum of Natural History, New York, NY 10024, USA
| |
Collapse
|