51
Wen D, Nakhleh L. Coestimating Reticulate Phylogenies and Gene Trees from Multilocus Sequence Data. Syst Biol 2017; 67:439-457. [DOI: 10.1093/sysbio/syx085] [Citation(s) in RCA: 90] [Impact Index Per Article: 11.3] [Received: 04/24/2017] [Accepted: 10/24/2017] [Indexed: 11/13/2022] Open
Affiliation(s)
- Luay Nakhleh
- Department of Computer Science
- Department of BioSciences, Rice University, 6100 Main Street, Houston, TX 77005, USA
52
Molloy EK, Warnow T. To Include or Not to Include: The Impact of Gene Filtering on Species Tree Estimation Methods. Syst Biol 2017; 67:285-303. [DOI: 10.1093/sysbio/syx077] [Citation(s) in RCA: 138] [Impact Index Per Article: 17.3] [Received: 12/08/2016] [Accepted: 09/13/2017] [Indexed: 01/27/2023] Open
Affiliation(s)
- Erin K Molloy
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Tandy Warnow
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA
53
Bhattacharyya S, Mukherjee J. IDXL: Species Tree Inference Using Internode Distance and Excess Gene Leaf Count. J Mol Evol 2017; 85:57-78. [PMID: 28835989 DOI: 10.1007/s00239-017-9807-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 12/22/2016] [Accepted: 08/09/2017] [Indexed: 11/28/2022]
Abstract
We propose an extension of the distance matrix methods NJst and ASTRID to infer species trees from incongruent gene trees having Incomplete Lineage Sorting. Both approaches consider the average internode distance (ID) between individual taxa pairs as the distance measure. The measure ID does not use the root of a tree, and thus may not always infer the relative position of a taxon with respect to the root. We define a novel distance measure excess gene leaf count (XL) between individual couplets. The XL measure is computed using the root of a tree. It is proved to be additive, and is shown to infer the relative order of divergence among individual couplets better. We propose a novel method IDXL which uses both the XL and ID measures for species tree construction. IDXL is shown to perform better than NJst and other distance matrix approaches for most of the biological and simulated datasets. Having the same computational complexity as NJst, IDXL can be applied for species tree inference on large-scale biological datasets.
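The average internode distance that the abstract attributes to NJst and ASTRID can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' implementation: gene trees are written as nested Python tuples with leaf names as strings, the function names are hypothetical, and every internal node of the rooted tuple (including its root) is counted, whereas the published methods work on unrooted trees and may count nodes differently.

```python
from collections import defaultdict
from itertools import combinations


def leaf_paths(tree, path=()):
    """Yield (leaf_name, path) pairs; path lists the child indices from the root."""
    if isinstance(tree, str):
        yield tree, path
    else:
        for index, child in enumerate(tree):
            yield from leaf_paths(child, path + (index,))


def internode_distance(path_a, path_b):
    """Count the internal nodes on the path between two leaves of one tree."""
    shared = 0  # length of the common prefix = depth of the last common ancestor
    for step_a, step_b in zip(path_a, path_b):
        if step_a != step_b:
            break
        shared += 1
    edges = (len(path_a) - shared) + (len(path_b) - shared)
    return edges - 1


def average_internode_distances(gene_trees):
    """Average the internode distance of each taxon pair over the trees containing it."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for tree in gene_trees:
        paths = dict(leaf_paths(tree))
        for a, b in combinations(sorted(paths), 2):
            totals[(a, b)] += internode_distance(paths[a], paths[b])
            counts[(a, b)] += 1
    return {pair: totals[pair] / counts[pair] for pair in totals}


# Two hypothetical, incongruent gene trees on the same four taxa.
gene_trees = [
    ((("A", "B"), "C"), "D"),
    (("A", "C"), ("B", "D")),
]
distances = average_internode_distances(gene_trees)
print(distances[("A", "B")])  # 2.0
```

The resulting pairwise matrix is the kind of input a distance method such as neighbor joining would take. Because the measure never looks at which node is the root, it cannot by itself order divergences relative to the root, which is the limitation the abstract says the XL measure is meant to address.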
Affiliation(s)
- Sourya Bhattacharyya
- Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur, Kharagpur, WB, 721302, India.
- Jayanta Mukherjee
- Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur, Kharagpur, WB, 721302, India
54
Léveillé-Bourret É, Starr JR, Ford BA, Moriarty Lemmon E, Lemmon AR. Resolving Rapid Radiations within Angiosperm Families Using Anchored Phylogenomics. Syst Biol 2017; 67:94-112. [DOI: 10.1093/sysbio/syx050] [Citation(s) in RCA: 70] [Impact Index Per Article: 8.8] [Received: 11/09/2016] [Accepted: 04/28/2017] [Indexed: 11/13/2022] Open
55
Affiliation(s)
- Scott V. Edwards
- Department of Organismic and Evolutionary Biology and Museum of Comparative Zoology Harvard University Cambridge MA 02138 USA
56
Shen XX, Salichos L, Rokas A. A Genome-Scale Investigation of How Sequence, Function, and Tree-Based Gene Properties Influence Phylogenetic Inference. Genome Biol Evol 2016; 8:2565-80. [PMID: 27492233 PMCID: PMC5010910 DOI: 10.1093/gbe/evw179] [Citation(s) in RCA: 54] [Impact Index Per Article: 6.0] [Accepted: 07/25/2016] [Indexed: 12/13/2022] Open
Abstract
Molecular phylogenetic inference is inherently dependent on choices in both methodology and data. Many insightful studies have shown how choices in methodology, such as the model of sequence evolution or optimality criterion used, can strongly influence inference. In contrast, much less is known about the impact of choices in the properties of the data, typically genes, on phylogenetic inference. We investigated the relationships between 52 gene properties (24 sequence-based, 19 function-based, and 9 tree-based) with each other and with three measures of phylogenetic signal in two assembled data sets of 2,832 yeast and 2,002 mammalian genes. We found that most gene properties, such as evolutionary rate (measured through the percent average of pairwise identity across taxa) and total tree length, were highly correlated with each other. Similarly, several gene properties, such as gene alignment length, Guanine-Cytosine content, and the proportion of tree distance on internal branches divided by relative composition variability (treeness/RCV), were strongly correlated with phylogenetic signal. Analysis of partial correlations between gene properties and phylogenetic signal, in which gene evolutionary rate and alignment length were simultaneously controlled, showed similar patterns of correlations, albeit weaker in strength. Examination of the relative importance of each gene property on phylogenetic signal identified gene alignment length, along with the number of parsimony-informative sites and variable sites, as the most important predictors. Interestingly, the subsets of gene properties that optimally predicted phylogenetic signal differed considerably across our three phylogenetic measures and two data sets; however, gene alignment length and RCV were consistently included as predictors of all three phylogenetic measures in both yeasts and mammals. These results suggest that a handful of sequence-based gene properties are reliable predictors of phylogenetic signal and could be useful in guiding the choice of phylogenetic markers.
Affiliation(s)
- Xing-Xing Shen
- Department of Biological Sciences, Vanderbilt University
- Leonidas Salichos
- Department of Biological Sciences, Vanderbilt University; Department of Molecular Biophysics and Biochemistry, Yale University
- Antonis Rokas
- Department of Biological Sciences, Vanderbilt University
57
Foley NM, Springer MS, Teeling EC. Mammal madness: is the mammal tree of life not yet resolved? Philos Trans R Soc Lond B Biol Sci 2016; 371:20150140. [PMID: 27325836 PMCID: PMC4920340 DOI: 10.1098/rstb.2015.0140] [Citation(s) in RCA: 159] [Impact Index Per Article: 17.7] [Accepted: 04/27/2016] [Indexed: 11/12/2022] Open
Abstract
Most molecular phylogenetic studies place all placental mammals into four superordinal groups, Laurasiatheria (e.g. dogs, bats, whales), Euarchontoglires (e.g. humans, rodents, colugos), Xenarthra (e.g. armadillos, anteaters) and Afrotheria (e.g. elephants, sea cows, tenrecs), and estimate that these clades last shared a common ancestor 90-110 million years ago. This phylogeny has provided a framework for numerous functional and comparative studies. Despite the high level of congruence among most molecular studies, questions still remain regarding the position and divergence time of the root of placental mammals, and certain 'hard nodes' such as the Laurasiatheria polytomy and Paenungulata that seem impossible to resolve. Here, we explore recent consensus and conflict among mammalian phylogenetic studies and explore the reasons for the remaining conflicts. The question of whether the mammal tree of life is or can be ever resolved is also addressed. This article is part of the themed issue 'Dating species divergences using rocks and clocks'.
Affiliation(s)
- Nicole M Foley
- School of Biology and Environmental Science, Science Centre East, University College Dublin, Dublin 4, Ireland
- Mark S Springer
- Department of Biology, University of California, Riverside, CA 92521, USA
- Emma C Teeling
- School of Biology and Environmental Science, Science Centre East, University College Dublin, Dublin 4, Ireland
58
Linkem CW, Minin VN, Leaché AD. Detecting the Anomaly Zone in Species Trees and Evidence for a Misleading Signal in Higher-Level Skink Phylogeny (Squamata: Scincidae). Syst Biol 2016; 65:465-77. [PMID: 26738927 PMCID: PMC6383586 DOI: 10.1093/sysbio/syw001] [Citation(s) in RCA: 67] [Impact Index Per Article: 7.4] [Received: 11/26/2014] [Accepted: 12/29/2015] [Indexed: 01/28/2023] Open
Abstract
The anomaly zone, defined by the presence of gene tree topologies that are more probable than the true species tree, presents a major challenge to the accurate resolution of many parts of the Tree of Life. This discrepancy can result from consecutive rapid speciation events in the species tree. Similar to the problem of long-branch attraction, including more data via loci concatenation will only reinforce the support for the incorrect species tree. Empirical phylogenetic studies often employ coalescent-based species tree methods to avoid the anomaly zone, but to this point these studies have not had a method for providing any direct evidence that the species tree is actually in the anomaly zone. In this study, we use 16 species of lizards in the family Scincidae to investigate whether nodes that are difficult to resolve place the species tree within the anomaly zone. We analyze new phylogenomic data (429 loci), using both concatenation and coalescent-based species tree estimation, to locate conflicting topological signal. We then use the unifying principle of the anomaly zone, together with estimates of ancestral population sizes and species persistence times, to determine whether the observed phylogenetic conflict is a result of the anomaly zone. We identify at least three regions of the Scincidae phylogeny that provide demographic signatures consistent with the anomaly zone, and this new information helps reconcile the phylogenetic conflict in previously published studies on these lizards. The anomaly zone presents a real problem in phylogenetics, and our new framework for identifying anomalous relationships will help empiricists leverage their resources appropriately for investigating and overcoming this challenge.
Affiliation(s)
- Vladimir N Minin
- Department of Biology, University of Washington, Seattle WA; Department of Statistics, University of Washington, Seattle WA
- Adam D Leaché
- Department of Biology, University of Washington, Seattle WA; Burke Museum of Natural History and Culture, University of Washington, Seattle, WA, 98195, USA
59
Larson ER, Castelin M, Williams BW, Olden JD, Abbott CL. Phylogenetic species delimitation for crayfishes of the genus Pacifastacus. PeerJ 2016; 4:e1915. [PMID: 27114875 PMCID: PMC4841241 DOI: 10.7717/peerj.1915] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.1] [Received: 05/31/2015] [Accepted: 03/18/2016] [Indexed: 12/20/2022] Open
Abstract
Molecular genetic approaches are playing an increasing role in conservation science by identifying biodiversity that may not be evident by morphology-based taxonomy and systematics. So-called cryptic species are particularly prevalent in freshwater environments, where isolation of dispersal-limited species, such as crayfishes, within dendritic river networks often gives rise to high intra- and inter-specific genetic divergence. We apply here a multi-gene molecular approach to investigate relationships among extant species of the crayfish genus Pacifastacus, representing the first comprehensive phylogenetic study of this taxonomic group. Importantly, Pacifastacus includes both the widely invasive signal crayfish Pacifastacus leniusculus, as well as several species of conservation concern like the Shasta crayfish Pacifastacus fortis. Our analysis used 83 individuals sampled across the four extant Pacifastacus species (omitting the extinct Pacifastacus nigrescens), representing the known taxonomic diversity and geographic distributions within this genus as comprehensively as possible. We reconstructed phylogenetic trees from mitochondrial (16S, COI) and nuclear genes (GAPDH), both separately and using a combined or concatenated dataset, and performed several species delimitation analyses (PTP, ABGD, GMYC) on the COI phylogeny to propose Primary Species Hypotheses (PSHs) within the genus. All phylogenies recovered the genus Pacifastacus as monophyletic, within which we identified a range of six to 21 PSHs; more abundant PSHs delimitations from GMYC and ABGD were always nested within PSHs delimited by the more conservative PTP method. Pacifastacus leniusculus included the majority of PSHs and was not monophyletic relative to the other Pacifastacus species considered. Several of these highly distinct P. leniusculus PSHs likely require urgent conservation attention. 
Our results identify research needs and conservation priorities for Pacifastacus crayfishes in western North America, and may inform better understanding and management of P. leniusculus in regions where it is invasive, such as Europe and Japan.
Affiliation(s)
- Eric R Larson
- Department of Natural Resources and Environmental Sciences, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States
- Magalie Castelin
- Pacific Biological Station, Fisheries and Oceans Canada, Nanaimo, British Columbia, Canada
- Bronwyn W Williams
- North Carolina Museum of Natural Sciences, Raleigh, North Carolina, United States
- Julian D Olden
- School of Aquatic and Fishery Sciences, University of Washington, Seattle, Washington, United States
- Cathryn L Abbott
- Pacific Biological Station, Fisheries and Oceans Canada, Nanaimo, British Columbia, Canada
60
Comer JR, Zomlefer WB, Barrett CF, Stevenson DW, Heyduk K, Leebens-Mack JH. Nuclear phylogenomics of the palm subfamily Arecoideae (Arecaceae). Mol Phylogenet Evol 2016; 97:32-42. [DOI: 10.1016/j.ympev.2015.12.015] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Received: 09/16/2015] [Revised: 12/04/2015] [Accepted: 12/23/2015] [Indexed: 02/02/2023]
61
Meiklejohn KA, Faircloth BC, Glenn TC, Kimball RT, Braun EL. Analysis of a Rapid Evolutionary Radiation Using Ultraconserved Elements: Evidence for a Bias in Some Multispecies Coalescent Methods. Syst Biol 2016; 65:612-27. [DOI: 10.1093/sysbio/syw014] [Citation(s) in RCA: 114] [Impact Index Per Article: 12.7] [Received: 05/25/2015] [Accepted: 01/25/2016] [Indexed: 01/30/2023] Open
62
Ogilvie HA, Heled J, Xie D, Drummond AJ. Computational Performance and Statistical Accuracy of *BEAST and Comparisons with Other Methods. Syst Biol 2016; 65:381-96. [PMID: 26821913 PMCID: PMC4851174 DOI: 10.1093/sysbio/syv118] [Citation(s) in RCA: 70] [Impact Index Per Article: 7.8] [Received: 06/09/2015] [Accepted: 12/07/2015] [Indexed: 01/02/2023] Open
Abstract
Under the multispecies coalescent model of molecular evolution, gene trees have independent evolutionary histories within a shared species tree. In comparison, supermatrix concatenation methods assume that gene trees share a single common genealogical history, thereby equating gene coalescence with species divergence. The multispecies coalescent is supported by previous studies which found that its predicted distributions fit empirical data, and that concatenation is not a consistent estimator of the species tree. *BEAST, a fully Bayesian implementation of the multispecies coalescent, is popular but computationally intensive, so the increasing size of phylogenetic data sets is both a computational challenge and an opportunity for better systematics. Using simulation studies, we characterize the scaling behavior of *BEAST, and enable quantitative prediction of the impact increasing the number of loci has on both computational performance and statistical accuracy. Follow-up simulations over a wide range of parameters show that the statistical performance of *BEAST relative to concatenation improves both as branch length is reduced and as the number of loci is increased. Finally, using simulations based on estimated parameters from two phylogenomic data sets, we compare the performance of a range of species tree and concatenation methods to show that using *BEAST with tens of loci can be preferable to using concatenation with thousands of loci. Our results provide insight into the practicalities of Bayesian species tree estimation, the number of loci required to obtain a given level of accuracy and the situations in which supermatrix or summary methods will be outperformed by the fully Bayesian multispecies coalescent.
Affiliation(s)
- Huw A Ogilvie
- Evolution, Ecology and Genetics, Research School of Biology, The Australian National University, Canberra, Australia
- Joseph Heled
- Department of Computer Science, University of Auckland, Auckland, New Zealand; Allan Wilson Centre for Molecular Ecology and Evolution, University of Auckland, Auckland, New Zealand
- Dong Xie
- Department of Computer Science, University of Auckland, Auckland, New Zealand; Allan Wilson Centre for Molecular Ecology and Evolution, University of Auckland, Auckland, New Zealand
- Alexei J Drummond
- Department of Computer Science, University of Auckland, Auckland, New Zealand; Allan Wilson Centre for Molecular Ecology and Evolution, University of Auckland, Auckland, New Zealand
63
Simmons MP, Sloan DB, Gatesy J. The effects of subsampling gene trees on coalescent methods applied to ancient divergences. Mol Phylogenet Evol 2016; 97:76-89. [PMID: 26768112 DOI: 10.1016/j.ympev.2015.12.013] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.4] [Received: 10/09/2015] [Revised: 12/03/2015] [Accepted: 12/20/2015] [Indexed: 10/22/2022]
Abstract
Gene-tree-estimation error is a major concern for coalescent methods of phylogenetic inference. We sampled eight empirical studies of ancient lineages with diverse numbers of taxa and genes for which the original authors applied one or more coalescent methods. We found that the average pairwise congruence among gene trees varied greatly both between studies and also often within a study. We recommend that presenting plots of pairwise congruence among gene trees in a dataset be treated as a standard practice for empirical coalescent studies so that readers can readily assess the extent and distribution of incongruence among gene trees. ASTRAL-based coalescent analyses generally outperformed MP-EST and STAR with respect to both internal consistency (congruence between analyses of subsamples of genes with the complete dataset of all genes) and congruence with the concatenation-based topology. We evaluated the approach of subsampling gene trees that are, on average, more congruent with other gene trees as a method to reduce artifacts caused by gene-tree-estimation errors on coalescent analyses. We suggest that this method is well suited to testing whether gene-tree-estimation error is a primary cause of incongruence between concatenation- and coalescent-based results, to reconciling conflicting phylogenetic results based on different coalescent methods, and to identifying genes affected by artifacts that may then be targeted for reciprocal illumination. We provide scripts that automate the process of calculating pairwise gene-tree incongruence and subsampling trees while accounting for differential taxon sampling among genes. Finally, we assert that multiple tree-search replicates should be implemented as a standard practice for empirical coalescent studies that apply MP-EST.
Affiliation(s)
- Mark P Simmons
- Department of Biology, Colorado State University, Fort Collins, CO 80523, USA.
- Daniel B Sloan
- Department of Biology, Colorado State University, Fort Collins, CO 80523, USA
- John Gatesy
- Department of Biology, University of California, Riverside, CA 92521, USA
64
Springer MS, Gatesy J. The gene tree delusion. Mol Phylogenet Evol 2016; 94:1-33. [DOI: 10.1016/j.ympev.2015.07.018] [Citation(s) in RCA: 198] [Impact Index Per Article: 22.0] [Received: 03/19/2015] [Revised: 06/04/2015] [Accepted: 07/22/2015] [Indexed: 10/23/2022]
65
Richart CH, Hayashi CY, Hedin M. Phylogenomic analyses resolve an ancient trichotomy at the base of Ischyropsalidoidea (Arachnida, Opiliones) despite high levels of gene tree conflict and unequal minority resolution frequencies. Mol Phylogenet Evol 2015; 95:171-82. [PMID: 26691642 DOI: 10.1016/j.ympev.2015.11.010] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Received: 03/13/2015] [Revised: 09/16/2015] [Accepted: 11/13/2015] [Indexed: 11/19/2022]
Abstract
Phylogenetic resolution of ancient rapid radiations has remained problematic despite major advances in statistical approaches and DNA sequencing technologies. Here we report on a combined phylogenetic approach utilizing transcriptome data in conjunction with Sanger sequence data to investigate a tandem of ancient divergences in the harvestmen superfamily Ischyropsalidoidea (Arachnida, Opiliones, Dyspnoi). We rely on Sanger sequences to resolve nodes within and between closely related genera, and use RNA-seq data from a subset of taxa to resolve a short and ancient internal branch. We use several analytical approaches to explore this succession of ancient diversification events, including concatenated and coalescent-based analyses and maximum likelihood gene trees for each locus. We evaluate the robustness of phylogenetic inferences using a randomized locus sub-sampling approach, and find congruence across these methods despite considerable incongruence across gene trees. Incongruent gene trees are not recovered in frequencies expected from a simple multispecies coalescent model, and we reject incomplete lineage sorting as the sole contributor to gene tree conflict. Using these approaches we attain robust support for higher-level phylogenetic relationships within Ischyropsalidoidea.
Affiliation(s)
- Casey H Richart
- Department of Biology, San Diego State University, 5500 Campanile Drive, San Diego, CA 92182, USA; Department of Biology, University of California, Riverside, CA 92521, USA.
- Cheryl Y Hayashi
- Department of Biology, University of California, Riverside, CA 92521, USA
- Marshal Hedin
- Department of Biology, San Diego State University, 5500 Campanile Drive, San Diego, CA 92182, USA
66
Xi Z, Liu L, Davis CC. The Impact of Missing Data on Species Tree Estimation. Mol Biol Evol 2015; 33:838-60. [DOI: 10.1093/molbev/msv266] [Citation(s) in RCA: 101] [Impact Index Per Article: 10.1] [Indexed: 01/03/2023] Open
67
Mallo D, De Oliveira Martins L, Posada D. SimPhy: Phylogenomic Simulation of Gene, Locus, and Species Trees. Syst Biol 2015; 65:334-44. [PMID: 26526427 PMCID: PMC4748750 DOI: 10.1093/sysbio/syv082] [Citation(s) in RCA: 82] [Impact Index Per Article: 8.2] [Received: 06/30/2015] [Accepted: 10/20/2015] [Indexed: 11/14/2022] Open
Abstract
We present a fast and flexible software package--SimPhy--for the simulation of multiple gene families evolving under incomplete lineage sorting, gene duplication and loss, horizontal gene transfer--all three potentially leading to species tree/gene tree discordance--and gene conversion. SimPhy implements a hierarchical phylogenetic model in which the evolution of species, locus, and gene trees is governed by global and local parameters (e.g., genome-wide, species-specific, locus-specific), that can be fixed or be sampled from a priori statistical distributions. SimPhy also incorporates comprehensive models of substitution rate variation among lineages (uncorrelated relaxed clocks) and the capability of simulating partitioned nucleotide, codon, and protein multilocus sequence alignments under a plethora of substitution models using the program INDELible. We validate SimPhy's output using theoretical expectations and other programs, and show that it scales extremely well with complex models and/or large trees, being an order of magnitude faster than the most similar program (DLCoal-Sim). In addition, we demonstrate how SimPhy can be useful to understand interactions among different evolutionary processes, conducting a simulation study to characterize the systematic overestimation of the duplication time when using standard reconciliation methods. SimPhy is available at https://github.com/adamallo/SimPhy, where users can find the source code, precompiled executables, a detailed manual and example cases.
Affiliation(s)
- Diego Mallo
- Department of Biochemistry, Genetics and Immunology, University of Vigo, Vigo 36310, Spain
- David Posada
- Department of Biochemistry, Genetics and Immunology, University of Vigo, Vigo 36310, Spain
68
Liu L, Edwards SV. Comment on "Statistical binning enables an accurate coalescent-based estimation of the avian tree". Science 2015; 350:171. [PMID: 26450203 DOI: 10.1126/science.aaa7343] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Indexed: 11/02/2022]
Abstract
Mirarab et al. (Research Article, 12 December 2014, p. 1250463) introduced statistical binning to improve the signal in phylogenetic methods using the multispecies coalescent model. We show that all forms of binning (naïve, statistical, and weighted statistical) display poor performance and are statistically inconsistent in large regions of parameter space, unlike unbinned sequence data used with species tree methods.
Affiliation(s)
- Liang Liu
- Department of Statistics, University of Georgia, Athens, GA 30602, USA
- Scott V Edwards
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA.
69
A comprehensive phylogeny of birds (Aves) using targeted next-generation DNA sequencing. Nature 2015; 526:569-73. [DOI: 10.1038/nature15697] [Citation(s) in RCA: 965] [Impact Index Per Article: 96.5] [Received: 05/03/2015] [Accepted: 09/09/2015] [Indexed: 12/20/2022]
70
Chou J, Gupta A, Yaduvanshi S, Davidson R, Nute M, Mirarab S, Warnow T. A comparative study of SVDquartets and other coalescent-based species tree estimation methods. BMC Genomics 2015; 16 Suppl 10:S2. [PMID: 26449249 PMCID: PMC4602346 DOI: 10.1186/1471-2164-16-s10-s2] [Citation(s) in RCA: 95] [Impact Index Per Article: 9.5] [Indexed: 02/06/2023] Open
Abstract
BACKGROUND: Species tree estimation is challenging in the presence of incomplete lineage sorting (ILS), which can make gene trees different from the species tree. Because ILS is expected to occur and the standard concatenation approach can return incorrect trees with high support in the presence of ILS, "coalescent-based" summary methods (which first estimate gene trees and then combine gene trees into a species tree) have been developed that have theoretical guarantees of robustness to arbitrarily high amounts of ILS. Some studies have suggested that summary methods should only be used on "c-genes" (i.e., recombination-free loci) that can be extremely short (sometimes fewer than 100 sites). However, gene trees estimated on short alignments can have high estimation error, and summary methods tend to have high error on short c-genes. To address this problem, Chifman and Kubatko introduced SVDquartets, a new coalescent-based method. SVDquartets takes multi-locus unlinked single-site data, infers the quartet trees for all subsets of four species, and then combines the set of quartet trees into a species tree using a quartet amalgamation heuristic. Yet, the relative accuracy of SVDquartets to leading coalescent-based methods has not been assessed. RESULTS: We compared SVDquartets to two leading coalescent-based methods (ASTRAL-2 and NJst), and to concatenation using maximum likelihood. We used a collection of simulated datasets, varying ILS levels, numbers of taxa, and number of sites per locus. Although SVDquartets was sometimes more accurate than ASTRAL-2 and NJst, most often the best results were obtained using ASTRAL-2, even on the shortest gene sequence alignments we explored (with only 10 sites per locus). Finally, concatenation was the most accurate of all methods under low ILS conditions. CONCLUSIONS: ASTRAL-2 generally had the best accuracy under higher ILS conditions, and concatenation had the best accuracy under the lowest ILS conditions. However, SVDquartets was competitive with the best methods under conditions with low ILS and small numbers of sites per locus. The good performance under many conditions of ASTRAL-2 in comparison to SVDquartets is surprising given the known vulnerability of ASTRAL-2 and similar methods to short gene sequences.
Affiliation(s)
- Jed Chou
- Department of Mathematics, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
- Ashu Gupta
- Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
- Shashank Yaduvanshi
- Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
- Ruth Davidson
- Department of Mathematics, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
- Mike Nute
- Department of Statistics, University of Illinois Urbana-Champaign, Champaign, IL 61820, USA
- Siavash Mirarab
- Department of Electrical and Computer Engineering, University of California San Diego, La Jolla, CA 92093, USA
- Department of Computer Science, University of Texas at Austin, Austin, TX 78712, USA
- Tandy Warnow
- Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
- Department of Computer Science, University of Texas at Austin, Austin, TX 78712, USA
71
Davidson R, Vachaspati P, Mirarab S, Warnow T. Phylogenomic species tree estimation in the presence of incomplete lineage sorting and horizontal gene transfer. BMC Genomics 2015; 16 Suppl 10:S1. [PMID: 26450506 PMCID: PMC4603753 DOI: 10.1186/1471-2164-16-s10-s1] [Citation(s) in RCA: 56] [Impact Index Per Article: 5.6] [Indexed: 11/17/2022] Open
Abstract
BACKGROUND Species tree estimation is challenged by gene tree heterogeneity resulting from biological processes such as duplication and loss, hybridization, incomplete lineage sorting (ILS), and horizontal gene transfer (HGT). Mathematical theory about reconstructing species trees in the presence of HGT alone or ILS alone suggests that quartet-based species tree methods (known to be statistically consistent under ILS, or under bounded amounts of HGT) might be effective techniques for estimating species trees when both HGT and ILS are present. RESULTS We evaluated several publicly available coalescent-based methods and concatenation under maximum likelihood on simulated datasets with moderate ILS and varying levels of HGT. Our study shows that two quartet-based species tree estimation methods (ASTRAL-2 and weighted Quartets MaxCut) are both highly accurate, even on datasets with high rates of HGT. In contrast, although NJst and concatenation using maximum likelihood are highly accurate under low HGT, they are less robust to high HGT rates. CONCLUSION Our study shows that quartet-based species-tree estimation methods can be highly accurate in the presence of both HGT and ILS. The study suggests the possibility that some quartet-based methods might be statistically consistent under phylogenomic models of gene tree heterogeneity with both HGT and ILS.
Affiliation(s)
- Ruth Davidson
- Department of Mathematics, University of Illinois at Urbana-Champaign, 1409 W. Green Street, 61801 Urbana, IL, USA
| | - Pranjal Vachaspati
- Department of Computer Science, University of Illinois at Urbana-Champaign, 201 North Goodwin Avenue, 61801 Urbana, IL, USA
| | - Siavash Mirarab
- Department of Computer Science, University of Texas at Austin, 2317 Speedway, Stop D9500, 78712 Austin, TX, USA
- Department of Electrical and Computer Engineering, University of California at San Diego, 9500 Gilman Drive, 92093, La Jolla, CA, USA
| | - Tandy Warnow
- Department of Computer Science, University of Illinois at Urbana-Champaign, 201 North Goodwin Avenue, 61801 Urbana, IL, USA
- Department of Bioengineering, University of Illinois at Urbana-Champaign, 1270 Digital Computer Laboratory, MC-278, 61801 Urbana, IL, USA
|
72
|
Abstract
BACKGROUND Incomplete lineage sorting (ILS), modelled by the multi-species coalescent (MSC), is known to create discordance between gene trees and species trees, and lead to inaccurate species tree estimations unless appropriate methods are used to estimate the species tree. While many statistically consistent methods have been developed to estimate the species tree in the presence of ILS, only ASTRAL-2 and NJst have been shown to have good accuracy on large datasets. Yet, NJst is generally slower and less accurate than ASTRAL-2, and cannot run on some datasets. RESULTS We have redesigned NJst to enable it to run on all datasets, and we have expanded its design space so that it can be used with different distance-based tree estimation methods. The resultant method, ASTRID, is statistically consistent under the MSC model, and has accuracy that is competitive with ASTRAL-2. Furthermore, ASTRID is much faster than ASTRAL-2, completing in minutes on some datasets for which ASTRAL-2 used hours. CONCLUSIONS ASTRID is a new coalescent-based method for species tree estimation that is competitive with the best current method in terms of accuracy, while being much faster. ASTRID is available in open source form on github.
Affiliation(s)
- Pranjal Vachaspati
- Department of Computer Science, University of Illinois at Urbana-Champaign, 201 N. Goodwin Avenue, Urbana, IL, 61801 USA
| | - Tandy Warnow
- Department of Computer Science, University of Illinois at Urbana-Champaign, 201 N. Goodwin Avenue, Urbana, IL, 61801 USA
|
73
|
Simmons MP, Gatesy J. Coalescence vs. concatenation: Sophisticated analyses vs. first principles applied to rooting the angiosperms. Mol Phylogenet Evol 2015; 91:98-122. [DOI: 10.1016/j.ympev.2015.05.011] [Citation(s) in RCA: 64] [Impact Index Per Article: 6.4] [Received: 02/12/2015] [Revised: 05/01/2015] [Accepted: 05/14/2015] [Indexed: 11/24/2022]
|
74
|
Chen MY, Liang D, Zhang P. Selecting Question-Specific Genes to Reduce Incongruence in Phylogenomics: A Case Study of Jawed Vertebrate Backbone Phylogeny. Syst Biol 2015; 64:1104-20. [PMID: 26276158 DOI: 10.1093/sysbio/syv059] [Citation(s) in RCA: 78] [Impact Index Per Article: 7.8] [Received: 01/12/2015] [Accepted: 08/10/2015] [Indexed: 11/13/2022] Open
Abstract
Incongruence between different phylogenomic analyses is the main challenge faced by phylogeneticists in the genomic era. To reduce incongruence, phylogenomic studies normally adopt some data filtering approaches, such as reducing missing data or using slowly evolving genes, to improve the signal quality of data. Here, we assembled a phylogenomic data set of 58 jawed vertebrate taxa and 4682 genes to investigate the backbone phylogeny of jawed vertebrates under both concatenation and coalescent-based frameworks. To evaluate the efficiency of extracting phylogenetic signals among different data filtering methods, we chose six highly intractable internodes within the backbone phylogeny of jawed vertebrates as our test questions. We found that our phylogenomic data set exhibits substantial conflicting signal among genes for these questions. Our analyses showed that non-specific data sets that are generated without bias toward specific questions are not sufficient to produce consistent results when there are several difficult nodes within a phylogeny. Moreover, phylogenetic accuracy based on non-specific data is considerably influenced by the size of data and the choice of tree inference methods. To address such incongruences, we selected genes that resolve a given internode but not the entire phylogeny. Notably, not only can this strategy yield correct relationships for the question, but it also reduces inconsistency associated with data sizes and inference methods. Our study highlights the importance of gene selection in phylogenomic analyses, suggesting that simply using a large amount of data cannot guarantee correct results. Constructing question-specific data sets may be more powerful for resolving problematic nodes.
Affiliation(s)
- Meng-Yun Chen
- State Key Laboratory of Biocontrol, College of Ecology and Evolution, School of Life Sciences, Sun Yat-Sen University, Guangzhou 510006, China
| | - Dan Liang
- State Key Laboratory of Biocontrol, College of Ecology and Evolution, School of Life Sciences, Sun Yat-Sen University, Guangzhou 510006, China
| | - Peng Zhang
- State Key Laboratory of Biocontrol, College of Ecology and Evolution, School of Life Sciences, Sun Yat-Sen University, Guangzhou 510006, China
|
75
|
Smith SA, Moore MJ, Brown JW, Yang Y. Analysis of phylogenomic datasets reveals conflict, concordance, and gene duplications with examples from animals and plants. BMC Evol Biol 2015; 15:150. [PMID: 26239519 PMCID: PMC4524127 DOI: 10.1186/s12862-015-0423-0] [Citation(s) in RCA: 275] [Impact Index Per Article: 27.5] [Received: 03/23/2015] [Accepted: 06/25/2015] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The use of transcriptomic and genomic datasets for phylogenetic reconstruction has become increasingly common as researchers attempt to resolve recalcitrant nodes with increasing amounts of data. The large size and complexity of these datasets introduce significant phylogenetic noise and conflict into subsequent analyses. The sources of conflict may include hybridization, incomplete lineage sorting, or horizontal gene transfer, and may vary across the phylogeny. For phylogenetic analysis, this noise and conflict has been accommodated in one of several ways: by binning gene regions into subsets to isolate consistent phylogenetic signal; by using gene-tree methods for reconstruction, where conflict is presumed to be explained by incomplete lineage sorting (ILS); or through concatenation, where noise is presumed to be the dominant source of conflict. The results provided herein emphasize that analysis of individual homologous gene regions can greatly improve our understanding of the underlying conflict within these datasets. RESULTS Here we examined two published transcriptomic datasets, the angiosperm group Caryophyllales and the aculeate Hymenoptera, for the presence of conflict, concordance, and gene duplications in individual homologs across the phylogeny. We found significant conflict throughout the phylogeny in both datasets and in particular along the backbone. While some nodes in each phylogeny showed patterns of conflict similar to what might be expected with ILS alone, the backbone nodes also exhibited low levels of phylogenetic signal. In addition, certain nodes, especially in the Caryophyllales, had highly elevated levels of strongly supported conflict that cannot be explained by ILS alone. CONCLUSION This study demonstrates that phylogenetic signal is highly variable in phylogenomic data sampled across related species and poses challenges when conducting species tree analyses on large genomic and transcriptomic datasets. 
Further insight into the conflict and processes underlying these complex datasets is necessary to improve and develop adequate models for sequence analysis and downstream applications. To aid this effort, we developed the open source software phyparts ( https://bitbucket.org/blackrim/phyparts ), which calculates unique, conflicting, and concordant bipartitions, maps gene duplications, and outputs summary statistics such as internode certainty (ICA) scores and node-specific counts of gene duplications.
Affiliation(s)
- Stephen A Smith
- Department of Ecology and Evolutionary Biology, University of Michigan, S State St, Ann Arbor, 48109, MI, USA.
| | - Michael J Moore
- Department of Biology, Oberlin College, W Lorain St, Oberlin, 44074, OH, USA.
| | - Joseph W Brown
- Department of Ecology and Evolutionary Biology, University of Michigan, S State St, Ann Arbor, 48109, MI, USA.
| | - Ya Yang
- Department of Ecology and Evolutionary Biology, University of Michigan, S State St, Ann Arbor, 48109, MI, USA.
|
76
|
Xi Z, Liu L, Davis CC. Genes with minimal phylogenetic information are problematic for coalescent analyses when gene tree estimation is biased. Mol Phylogenet Evol 2015; 92:63-71. [PMID: 26115844 DOI: 10.1016/j.ympev.2015.06.009] [Citation(s) in RCA: 73] [Impact Index Per Article: 7.3] [Received: 01/27/2015] [Revised: 04/23/2015] [Accepted: 06/16/2015] [Indexed: 11/30/2022]
Abstract
The development and application of coalescent methods are undergoing rapid changes. One little explored area that bears on the application of gene-tree-based coalescent methods to species tree estimation is gene informativeness. Here, we investigate the accuracy of these coalescent methods when genes have minimal phylogenetic information, including the implementation of the multilocus bootstrap approach. Using simulated DNA sequences, we demonstrate that genes with minimal phylogenetic information can produce unreliable gene trees (i.e., high error in gene tree estimation), which may in turn reduce the accuracy of species tree estimation using gene-tree-based coalescent methods. We demonstrate that this problem can be alleviated by sampling more genes, as is commonly done in large-scale phylogenomic analyses. This applies even when these genes are minimally informative. If gene tree estimation is biased, however, gene-tree-based coalescent analyses will produce inconsistent results, which cannot be remedied by increasing the number of genes. In this case, it is not the gene-tree-based coalescent methods that are flawed, but rather the input data (i.e., estimated gene trees). Along these lines, the commonly used program PhyML has a tendency to infer one particular bifurcating topology even though it is best represented as a polytomy. We additionally corroborate these findings by analyzing the 183-locus mammal data set assembled by McCormack et al. (2012) using ultra-conserved elements (UCEs) and flanking DNA. Lastly, we demonstrate that when employing the multilocus bootstrap approach on this 183-locus data set, there is no strong conflict between species trees estimated from concatenation and gene-tree-based coalescent analyses, as has been previously suggested by Gatesy and Springer (2014).
Affiliation(s)
- Zhenxiang Xi
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
| | - Liang Liu
- Department of Statistics, University of Georgia, Athens, GA 30602, USA; Institute of Bioinformatics, University of Georgia, Athens, GA 30602, USA
| | - Charles C Davis
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA.
|
77
|
Bayzid MS, Mirarab S, Boussau B, Warnow T. Weighted Statistical Binning: Enabling Statistically Consistent Genome-Scale Phylogenetic Analyses. PLoS One 2015; 10:e0129183. [PMID: 26086579 PMCID: PMC4472720 DOI: 10.1371/journal.pone.0129183] [Citation(s) in RCA: 71] [Impact Index Per Article: 7.1] [Received: 12/16/2014] [Accepted: 05/05/2015] [Indexed: 11/19/2022] Open
Abstract
Because biological processes can result in different loci having different evolutionary histories, species tree estimation requires multiple loci from across multiple genomes. While many processes can result in discord between gene trees and species trees, incomplete lineage sorting (ILS), modeled by the multi-species coalescent, is considered to be a dominant cause for gene tree heterogeneity. Coalescent-based methods have been developed to estimate species trees, many of which operate by combining estimated gene trees, and so are called "summary methods". Because summary methods are generally fast (and much faster than more complicated coalescent-based methods that co-estimate gene trees and species trees), they have become very popular techniques for estimating species trees from multiple loci. However, recent studies have established that summary methods can have reduced accuracy in the presence of gene tree estimation error, and also that many biological datasets have substantial gene tree estimation error, so that summary methods may not be highly accurate in biologically realistic conditions. Mirarab et al. (Science 2014) presented the "statistical binning" technique to improve gene tree estimation in multi-locus analyses, and showed that it improved the accuracy of MP-EST, one of the most popular coalescent-based summary methods. Statistical binning, which uses a simple heuristic to evaluate "combinability" and then uses the larger sets of genes to re-calculate gene trees, has good empirical performance, but using statistical binning within a phylogenomic pipeline does not have the desirable property of being statistically consistent. We show that weighting the re-calculated gene trees by the bin sizes makes statistical binning statistically consistent under the multispecies coalescent, and maintains the good empirical performance. 
Thus, "weighted statistical binning" enables highly accurate genome-scale species tree estimation, and is also statistically consistent under the multi-species coalescent model. New data used in this study are available at DOI: http://dx.doi.org/10.6084/m9.figshare.1411146, and the software is available at https://github.com/smirarab/binning.
Affiliation(s)
| | - Siavash Mirarab
- Department of Computer Science, University of Texas at Austin, Austin, Texas, USA
| | - Bastien Boussau
- Laboratoire de Biométrie et Biologie Évolutive, Université de Lyon, France
| | - Tandy Warnow
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA
|
78
|
Warnow T. Concatenation Analyses in the Presence of Incomplete Lineage Sorting. PLOS CURRENTS 2015; 7:ecurrents.currents.tol.8d41ac0f13d1abedf4c4a59f5d17b1f7. [PMID: 26064786 PMCID: PMC4450984 DOI: 10.1371/currents.tol.8d41ac0f13d1abedf4c4a59f5d17b1f7] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Indexed: 11/19/2022]
Abstract
Incomplete lineage sorting (ILS), modelled by the multi-species coalescent, is a process that results in a gene tree being different from the species tree. Because ILS is expected to occur for at least some loci within genome-scale analyses, the evaluation of species tree estimation methods in the presence of ILS is of great interest. Performance on simulated and biological data has suggested that concatenation analyses can result in the wrong tree with high support under some conditions, and a recent theoretical result by Roch and Steel proved that concatenation using unpartitioned maximum likelihood analysis can be statistically inconsistent in the presence of ILS. In this study, we survey the major species tree estimation methods, including the newly proposed "statistical binning" methods, and discuss their theoretical properties. We also note that there are two interpretations of the term "statistical consistency", and discuss the theoretical results proven under both interpretations.
Affiliation(s)
- Tandy Warnow
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA
|
79
|
Giarla TC, Esselstyn JA. The Challenges of Resolving a Rapid, Recent Radiation: Empirical and Simulated Phylogenomics of Philippine Shrews. Syst Biol 2015; 64:727-40. [DOI: 10.1093/sysbio/syv029] [Citation(s) in RCA: 113] [Impact Index Per Article: 11.3] [Received: 02/24/2015] [Accepted: 05/07/2015] [Indexed: 01/30/2023] Open
|
80
|
Heyduk K, Trapnell DW, Barrett CF, Leebens-Mack J. Phylogenomic analyses of species relationships in the genus Sabal (Arecaceae) using targeted sequence capture. Biol J Linn Soc Lond 2015. [DOI: 10.1111/bij.12551] [Citation(s) in RCA: 79] [Impact Index Per Article: 7.9] [Indexed: 01/05/2023]
Affiliation(s)
- Karolina Heyduk
- Department of Plant Biology; University of Georgia; Athens GA 30602 USA
| | | | - Craig F. Barrett
- Department of Biological Sciences; California State University; Los Angeles CA 90032 USA
| | - Jim Leebens-Mack
- Department of Plant Biology; University of Georgia; Athens GA 30602 USA
|
81
|
Liu L, Xi Z, Wu S, Davis CC, Edwards SV. Estimating phylogenetic trees from genome-scale data. Ann N Y Acad Sci 2015; 1360:36-53. [DOI: 10.1111/nyas.12747] [Citation(s) in RCA: 116] [Impact Index Per Article: 11.6] [Indexed: 11/30/2022]
Affiliation(s)
- Liang Liu
- Department of Statistics; University of Georgia; Athens Georgia
- Institute of Bioinformatics; University of Georgia; Athens Georgia
| | - Zhenxiang Xi
- Department of Organismic and Evolutionary Biology; Harvard University; Cambridge Massachusetts
| | - Shaoyuan Wu
- Department of Biochemistry and Molecular Biology & Tianjin Key Laboratory of Medical Epigenetics, School of Basic Medical Sciences; Tianjin Medical University; Tianjin China
| | - Charles C. Davis
- Department of Organismic and Evolutionary Biology; Harvard University; Cambridge Massachusetts
| | - Scott V. Edwards
- Department of Organismic and Evolutionary Biology; Harvard University; Cambridge Massachusetts
|
82
|
Roch S, Warnow T. On the Robustness to Gene Tree Estimation Error (or lack thereof) of Coalescent-Based Species Tree Methods. Syst Biol 2015; 64:663-76. [PMID: 25813358 DOI: 10.1093/sysbio/syv016] [Citation(s) in RCA: 104] [Impact Index Per Article: 10.4] [Received: 12/11/2014] [Accepted: 03/20/2015] [Indexed: 11/13/2022] Open
Abstract
The estimation of species trees using multiple loci has become increasingly common. Because different loci can have different phylogenetic histories (reflected in different gene tree topologies) for multiple biological causes, new approaches to species tree estimation have been developed that take gene tree heterogeneity into account. Among these multiple causes, incomplete lineage sorting (ILS), modeled by the multi-species coalescent, is potentially the most common cause of gene tree heterogeneity, and much of the focus of the recent literature has been on how to estimate species trees in the presence of ILS. Despite progress in developing statistically consistent techniques for estimating species trees when gene trees can differ due to ILS, there is substantial controversy in the systematics community as to whether to use the new coalescent-based methods or the traditional concatenation methods. One of the key issues that has been raised is understanding the impact of gene tree estimation error on coalescent-based methods that operate by combining gene trees. Here we explore the mathematical guarantees of coalescent-based methods when analyzing estimated rather than true gene trees. Our results provide some insight into the differences between the promise of coalescent-based methods in theory and their performance in practice.
Affiliation(s)
- Sebastien Roch
- Department of Mathematics, University of Wisconsin at Madison, 480 Lincoln Dr., Madison, Wisconsin, 53706, USA and Departments of Bioengineering and Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
| | - Tandy Warnow
- Department of Mathematics, University of Wisconsin at Madison, 480 Lincoln Dr., Madison, Wisconsin, 53706, USA and Departments of Bioengineering and Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
|
83
|
Tonini J, Moore A, Stern D, Shcheglovitova M, Ortí G. Concatenation and Species Tree Methods Exhibit Statistically Indistinguishable Accuracy under a Range of Simulated Conditions. PLOS CURRENTS 2015; 7. [PMID: 25901289 PMCID: PMC4391732 DOI: 10.1371/currents.tol.34260cc27551a527b124ec5f6334b6be] [Citation(s) in RCA: 52] [Impact Index Per Article: 5.2] [Indexed: 11/18/2022]
Abstract
Phylogeneticists have long understood that several biological processes can cause a gene tree to disagree with its species tree. In recent years, molecular phylogeneticists have increasingly foregone traditional supermatrix approaches in favor of species tree methods that account for one such source of error, incomplete lineage sorting (ILS). While gene tree-species tree discordance no doubt poses a significant challenge to phylogenetic inference with molecular data, researchers have only recently begun to systematically evaluate the relative accuracy of traditional and ILS-sensitive methods. Here, we report on simulations demonstrating that concatenation can perform as well as or better than methods that attempt to account for sources of error introduced by ILS. Based on these and similar results from other researchers, we argue that concatenation remains a useful component of the phylogeneticist’s toolbox and highlight that phylogeneticists should continue to make explicit comparisons of results produced by contemporaneous and classical methods.
Affiliation(s)
- João Tonini
- Department of Biological Sciences, The George Washington University, Washington, District of Columbia, USA
| | - Andrew Moore
- Department of Biological Sciences, The George Washington University, Washington, District of Columbia, USA
| | - David Stern
- Computational Biology Institute, Department of Biological Sciences, The George Washington University, Washington, District of Columbia, USA
| | - Maryia Shcheglovitova
- Department of Geography & Environmental Systems, University of Maryland Baltimore County, Baltimore, MD, USA
| | - Guillermo Ortí
- Department of Biological Sciences, The George Washington University, Washington, District of Columbia, USA
|
84
|
Likelihood-based tree reconstruction on a concatenation of aligned sequence data sets can be statistically inconsistent. Theor Popul Biol 2015; 100C:56-62. [DOI: 10.1016/j.tpb.2014.12.005] [Citation(s) in RCA: 174] [Impact Index Per Article: 17.4] [Received: 09/06/2014] [Revised: 11/10/2014] [Accepted: 12/18/2014] [Indexed: 01/14/2023]
|
85
|
Meyer BS, Matschiner M, Salzburger W. A tribal level phylogeny of Lake Tanganyika cichlid fishes based on a genomic multi-marker approach. Mol Phylogenet Evol 2015; 83:56-71. [PMID: 25433288 PMCID: PMC4334724 DOI: 10.1016/j.ympev.2014.10.009] [Citation(s) in RCA: 50] [Impact Index Per Article: 5.0] [Received: 11/25/2013] [Revised: 10/05/2014] [Accepted: 10/08/2014] [Indexed: 11/14/2022]
Abstract
The species-flocks of cichlid fishes in the East African Great Lakes Victoria, Malawi and Tanganyika constitute the most diverse extant adaptive radiations in vertebrates. Lake Tanganyika, the oldest of the lakes, harbors the morphologically and genetically most diverse assemblage of cichlids and contains the highest number of endemic cichlid genera of all African lakes. Based on morphological grounds, the Tanganyikan cichlid species have been grouped into 12-16 distinct lineages, so-called tribes. While the monophyly of most of the tribes is well established, the phylogenetic relationships among the tribes remain largely elusive. Here, we present a new tribal level phylogenetic hypothesis for the cichlid fishes of Lake Tanganyika that is based on the so far largest set of nuclear markers and a total alignment length of close to 18kb. Using next-generation amplicon sequencing with the 454 pyrosequencing technology, we compiled a dataset consisting of 42 nuclear loci in 45 East African cichlid species, which we subjected to maximum likelihood and Bayesian inference phylogenetic analyses. We analyzed the entire concatenated dataset and each marker individually, and performed a Bayesian concordance analysis and gene tree discordance tests. Overall, we find strong support for a position of the Oreochromini, Boulengerochromini, Bathybatini and Trematocarini outside of a clade combining the substrate spawning Lamprologini and the mouthbrooding tribes of the 'H-lineage', which are both strongly supported to be monophyletic. The Eretmodini are firmly placed within the 'H-lineage', as sister-group to the most species-rich tribe of cichlids, the Haplochromini. The phylogenetic relationships at the base of the 'H-lineage' received less support, which is likely due to high speciation rates in the early phase of the radiation. 
Discordance among gene trees and marker sets further suggests the occurrence of past hybridization and/or incomplete lineage sorting in the cichlid fishes of Lake Tanganyika.
Affiliation(s)
- Britta S Meyer
- Zoological Institute, University of Basel, Vesalgasse 1, 4051 Basel, Switzerland; Evolutionary Ecology of Marine Fishes, GEOMAR Helmholtz Centre for Ocean Research Kiel, Düsternbrooker Weg 20, 24105 Kiel, Germany.
| | - Michael Matschiner
- Zoological Institute, University of Basel, Vesalgasse 1, 4051 Basel, Switzerland; Centre for Ecological and Evolutionary Synthesis (CEES), Department of Biosciences, University of Oslo, Oslo, Norway
| | - Walter Salzburger
- Zoological Institute, University of Basel, Vesalgasse 1, 4051 Basel, Switzerland; Centre for Ecological and Evolutionary Synthesis (CEES), Department of Biosciences, University of Oslo, Oslo, Norway.
|
86
|
Mirarab S, Reaz R, Bayzid MS, Zimmermann T, Swenson MS, Warnow T. ASTRAL: genome-scale coalescent-based species tree estimation. Bioinformatics 2015; 30:i541-8. [PMID: 25161245 PMCID: PMC4147915 DOI: 10.1093/bioinformatics/btu462] [Citation(s) in RCA: 776] [Impact Index Per Article: 77.6] [Indexed: 11/12/2022]
Abstract
MOTIVATION Species trees provide insight into basic biology, including the mechanisms of evolution and how it modifies biomolecular function and structure, biodiversity and co-evolution between genes and species. Yet, gene trees often differ from species trees, creating challenges to species tree estimation. One of the most frequent causes for conflicting topologies between gene trees and species trees is incomplete lineage sorting (ILS), which is modelled by the multi-species coalescent. While many methods have been developed to estimate species trees from multiple genes, some of which have statistical guarantees under the multi-species coalescent model, existing methods are too computationally intensive for use with genome-scale analyses or have been shown to have poor accuracy under some realistic conditions. RESULTS We present ASTRAL, a fast method for estimating species trees from multiple genes. ASTRAL is statistically consistent, can run on datasets with thousands of genes and has outstanding accuracy, improving on MP-EST and the population tree from BUCKy, two statistically consistent leading coalescent-based methods. ASTRAL is often more accurate than concatenation using maximum likelihood, except when ILS levels are low or there are too few gene trees. AVAILABILITY AND IMPLEMENTATION ASTRAL is available in open source form at https://github.com/smirarab/ASTRAL/. Datasets studied in this article are available at http://www.cs.utexas.edu/users/phylo/datasets/astral. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- S Mirarab
- Department of Computer Science, The University of Texas at Austin, Austin, TX 78712, USA, Departement d'informatique, Ecole Normale Superieure, 45 Rue d'Ulm, F-75230 Paris Cedex 05, France and Department of Electrical Engineering, The University of Southern California, Los Angeles, CA 90089, USA
| | - R Reaz
- Department of Computer Science, The University of Texas at Austin, Austin, TX 78712, USA, Departement d'informatique, Ecole Normale Superieure, 45 Rue d'Ulm, F-75230 Paris Cedex 05, France and Department of Electrical Engineering, The University of Southern California, Los Angeles, CA 90089, USA
| | - Md S Bayzid
- Department of Computer Science, The University of Texas at Austin, Austin, TX 78712, USA, Departement d'informatique, Ecole Normale Superieure, 45 Rue d'Ulm, F-75230 Paris Cedex 05, France and Department of Electrical Engineering, The University of Southern California, Los Angeles, CA 90089, USA
| | - T Zimmermann
- Department of Computer Science, The University of Texas at Austin, Austin, TX 78712, USA, Departement d'informatique, Ecole Normale Superieure, 45 Rue d'Ulm, F-75230 Paris Cedex 05, France and Department of Electrical Engineering, The University of Southern California, Los Angeles, CA 90089, USA
| | - M S Swenson
- Department of Computer Science, The University of Texas at Austin, Austin, TX 78712, USA, Departement d'informatique, Ecole Normale Superieure, 45 Rue d'Ulm, F-75230 Paris Cedex 05, France and Department of Electrical Engineering, The University of Southern California, Los Angeles, CA 90089, USA
| | - T Warnow
- Department of Computer Science, The University of Texas at Austin, Austin, TX 78712, USA, Departement d'informatique, Ecole Normale Superieure, 45 Rue d'Ulm, F-75230 Paris Cedex 05, France and Department of Electrical Engineering, The University of Southern California, Los Angeles, CA 90089, USA
| |
|
87
|
Jarvis ED, Mirarab S, Aberer AJ, Li B, Houde P, Li C, Ho SYW, Faircloth BC, Nabholz B, Howard JT, Suh A, Weber CC, da Fonseca RR, Li J, Zhang F, Li H, Zhou L, Narula N, Liu L, Ganapathy G, Boussau B, Bayzid MS, Zavidovych V, Subramanian S, Gabaldón T, Capella-Gutiérrez S, Huerta-Cepas J, Rekepalli B, Munch K, Schierup M, Lindow B, Warren WC, Ray D, Green RE, Bruford MW, Zhan X, Dixon A, Li S, Li N, Huang Y, Derryberry EP, Bertelsen MF, Sheldon FH, Brumfield RT, Mello CV, Lovell PV, Wirthlin M, Schneider MPC, Prosdocimi F, Samaniego JA, Vargas Velazquez AM, Alfaro-Núñez A, Campos PF, Petersen B, Sicheritz-Ponten T, Pas A, Bailey T, Scofield P, Bunce M, Lambert DM, Zhou Q, Perelman P, Driskell AC, Shapiro B, Xiong Z, Zeng Y, Liu S, Li Z, Liu B, Wu K, Xiao J, Yinqi X, Zheng Q, Zhang Y, Yang H, Wang J, Smeds L, Rheindt FE, Braun M, Fjeldsa J, Orlando L, Barker FK, Jønsson KA, Johnson W, Koepfli KP, O'Brien S, Haussler D, Ryder OA, Rahbek C, Willerslev E, Graves GR, Glenn TC, McCormack J, Burt D, Ellegren H, Alström P, Edwards SV, Stamatakis A, Mindell DP, Cracraft J, Braun EL, Warnow T, Jun W, Gilbert MTP, Zhang G. Whole-genome analyses resolve early branches in the tree of life of modern birds. Science 2014; 346:1320-31. [PMID: 25504713 PMCID: PMC4405904 DOI: 10.1126/science.1253451] [Citation(s) in RCA: 1171] [Impact Index Per Article: 106.5] [Indexed: 12/12/2022]
Abstract
To better determine the history of modern birds, we performed a genome-scale phylogenetic analysis of 48 species representing all orders of Neoaves using phylogenomic methods created to handle genome-scale data. We recovered a highly resolved tree that confirms previously controversial sister or close relationships. We identified the first divergence in Neoaves, two groups we named Passerea and Columbea, representing independent lineages of diverse and convergently evolved land and water bird species. Among Passerea, we infer the common ancestor of core landbirds to have been an apex predator and confirm independent gains of vocal learning. Among Columbea, we identify pigeons and flamingoes as belonging to sister clades. Even with whole genomes, some of the earliest branches in Neoaves proved challenging to resolve, which was best explained by massive protein-coding sequence convergence and high levels of incomplete lineage sorting that occurred during a rapid radiation after the Cretaceous-Paleogene mass extinction event about 66 million years ago.
Affiliation(s)
- Erich D Jarvis
- Department of Neurobiology, Howard Hughes Medical Institute (HHMI), and Duke University Medical Center, Durham, NC 27710, USA.
| | - Siavash Mirarab
- Department of Computer Science, The University of Texas at Austin, Austin, TX 78712, USA
| | - Andre J Aberer
- Scientific Computing Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
| | - Bo Li
- China National GeneBank, BGI-Shenzhen, Shenzhen 518083, China. College of Medicine and Forensics, Xi'an Jiaotong University Xi'an 710061, China. Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Øster Voldgade 5-7, 1350 Copenhagen, Denmark
| | - Peter Houde
- Department of Biology, New Mexico State University, Las Cruces, NM 88003, USA
| | - Cai Li
- China National GeneBank, BGI-Shenzhen, Shenzhen 518083, China. Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Øster Voldgade 5-7, 1350 Copenhagen, Denmark
| | - Simon Y W Ho
- School of Biological Sciences, University of Sydney, Sydney, New South Wales 2006, Australia
| | - Brant C Faircloth
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, CA 90095, USA. Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803, USA
| | - Benoit Nabholz
- CNRS UMR 5554, Institut des Sciences de l'Evolution de Montpellier, Université Montpellier II Montpellier, France
| | - Jason T Howard
- Department of Neurobiology, Howard Hughes Medical Institute (HHMI), and Duke University Medical Center, Durham, NC 27710, USA
| | - Alexander Suh
- Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, SE-752 36 Uppsala Sweden
| | - Claudia C Weber
- Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, SE-752 36 Uppsala Sweden
| | - Rute R da Fonseca
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Øster Voldgade 5-7, 1350 Copenhagen, Denmark
| | - Jianwen Li
- China National GeneBank, BGI-Shenzhen, Shenzhen 518083, China
| | - Fang Zhang
- China National GeneBank, BGI-Shenzhen, Shenzhen 518083, China
| | - Hui Li
- China National GeneBank, BGI-Shenzhen, Shenzhen 518083, China
| | - Long Zhou
- China National GeneBank, BGI-Shenzhen, Shenzhen 518083, China
| | - Nitish Narula
- Department of Biology, New Mexico State University, Las Cruces, NM 88003, USA. Biodiversity and Biocomplexity Unit, Okinawa Institute of Science and Technology Onna-son, Okinawa 904-0495, Japan
| | - Liang Liu
- Department of Statistics and Institute of Bioinformatics, University of Georgia, Athens, GA 30602, USA
| | - Ganesh Ganapathy
- Department of Neurobiology, Howard Hughes Medical Institute (HHMI), and Duke University Medical Center, Durham, NC 27710, USA
| | - Bastien Boussau
- Laboratoire de Biométrie et Biologie Evolutive, Centre National de la Recherche Scientifique, Université de Lyon, F-69622 Villeurbanne, France
| | - Md Shamsuzzoha Bayzid
- Department of Computer Science, The University of Texas at Austin, Austin, TX 78712, USA
| | - Volodymyr Zavidovych
- Department of Neurobiology, Howard Hughes Medical Institute (HHMI), and Duke University Medical Center, Durham, NC 27710, USA
| | - Sankar Subramanian
- Environmental Futures Research Institute, Griffith University, Nathan, Queensland 4111, Australia
| | - Toni Gabaldón
- Bioinformatics and Genomics Programme, Centre for Genomic Regulation, Dr. Aiguader 88, 08003 Barcelona, Spain. Universitat Pompeu Fabra, Barcelona, Spain. Institució Catalana de Recerca i Estudis Avançats, Barcelona, Spain
| | - Salvador Capella-Gutiérrez
- Bioinformatics and Genomics Programme, Centre for Genomic Regulation, Dr. Aiguader 88, 08003 Barcelona, Spain. Universitat Pompeu Fabra, Barcelona, Spain
| | - Jaime Huerta-Cepas
- Bioinformatics and Genomics Programme, Centre for Genomic Regulation, Dr. Aiguader 88, 08003 Barcelona, Spain. Universitat Pompeu Fabra, Barcelona, Spain
| | - Bhanu Rekepalli
- Joint Institute for Computational Sciences, The University of Tennessee, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA
| | - Kasper Munch
- Bioinformatics Research Centre, Aarhus University, DK-8000 Aarhus C, Denmark
| | - Mikkel Schierup
- Bioinformatics Research Centre, Aarhus University, DK-8000 Aarhus C, Denmark
| | - Bent Lindow
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Øster Voldgade 5-7, 1350 Copenhagen, Denmark
| | - Wesley C Warren
- The Genome Institute, Washington University School of Medicine, St Louis, MO 63108, USA
| | - David Ray
- Department of Biochemistry, Molecular Biology, Entomology and Plant Pathology, Mississippi State University, Mississippi State, MS 39762, USA. Institute for Genomics, Biocomputing and Biotechnology, Mississippi State University, Mississippi State, MS 39762, USA. Department of Biological Sciences, Texas Tech University, Lubbock, TX 79409, USA
| | - Richard E Green
- Department of Ecology and Evolutionary Biology, University of California Santa Cruz (UCSC), Santa Cruz, CA 95064, USA
| | - Michael W Bruford
- Organisms and Environment Division, Cardiff School of Biosciences, Cardiff University Cardiff CF10 3AX, Wales, UK
| | - Xiangjiang Zhan
- Organisms and Environment Division, Cardiff School of Biosciences, Cardiff University Cardiff CF10 3AX, Wales, UK. Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
| | - Andrew Dixon
- International Wildlife Consultants, Carmarthen SA33 5YL, Wales, UK
| | - Shengbin Li
- College of Medicine and Forensics, Xi'an Jiaotong University Xi'an, 710061, China
| | - Ning Li
- State Key Laboratory for Agrobiotechnology, China Agricultural University, Beijing 100094, China
| | - Yinhua Huang
- State Key Laboratory for Agrobiotechnology, China Agricultural University, Beijing 100094, China
| | - Elizabeth P Derryberry
- Department of Ecology and Evolutionary Biology, Tulane University, New Orleans, LA 70118, USA. Museum of Natural Science and Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803, USA
| | - Mads Frost Bertelsen
- Center for Zoo and Wild Animal Health, Copenhagen Zoo Roskildevej 38, DK-2000 Frederiksberg, Denmark
| | - Frederick H Sheldon
- Museum of Natural Science and Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803, USA
| | - Robb T Brumfield
- Museum of Natural Science and Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803, USA
| | - Claudio V Mello
- Department of Behavioral Neuroscience, Oregon Health and Science University, Portland, OR 97239, USA. Brazilian Avian Genome Consortium (CNPq/FAPESPA-SISBIO Aves), Federal University of Para, Belem, Para, Brazil
| | - Peter V Lovell
- Department of Behavioral Neuroscience, Oregon Health and Science University, Portland, OR 97239, USA
| | - Morgan Wirthlin
- Department of Behavioral Neuroscience, Oregon Health and Science University, Portland, OR 97239, USA
| | - Maria Paula Cruz Schneider
- Brazilian Avian Genome Consortium (CNPq/FAPESPA-SISBIO Aves), Federal University of Para, Belem, Para, Brazil. Institute of Biological Sciences, Federal University of Para, Belem, Para, Brazil
| | - Francisco Prosdocimi
- Brazilian Avian Genome Consortium (CNPq/FAPESPA-SISBIO Aves), Federal University of Para, Belem, Para, Brazil. Institute of Medical Biochemistry Leopoldo de Meis, Federal University of Rio de Janeiro, Rio de Janeiro RJ 21941-902, Brazil
| | - José Alfredo Samaniego
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Øster Voldgade 5-7, 1350 Copenhagen, Denmark
| | - Amhed Missael Vargas Velazquez
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Øster Voldgade 5-7, 1350 Copenhagen, Denmark
| | - Alonzo Alfaro-Núñez
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Øster Voldgade 5-7, 1350 Copenhagen, Denmark
| | - Paula F Campos
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Øster Voldgade 5-7, 1350 Copenhagen, Denmark
| | - Bent Petersen
- Centre for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark Kemitorvet 208, 2800 Kgs Lyngby, Denmark
| | - Thomas Sicheritz-Ponten
- Centre for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark Kemitorvet 208, 2800 Kgs Lyngby, Denmark
| | - An Pas
- Breeding Centre for Endangered Arabian Wildlife, Sharjah, United Arab Emirates
| | - Tom Bailey
- Dubai Falcon Hospital, Dubai, United Arab Emirates
| | - Paul Scofield
- Canterbury Museum Rolleston Avenue, Christchurch 8050, New Zealand
| | - Michael Bunce
- Trace and Environmental DNA Laboratory Department of Environment and Agriculture, Curtin University, Perth, Western Australia 6102, Australia
| | - David M Lambert
- Environmental Futures Research Institute, Griffith University, Nathan, Queensland 4111, Australia
| | - Qi Zhou
- Department of Integrative Biology, University of California, Berkeley, CA 94720, USA
| | - Polina Perelman
- Laboratory of Genomic Diversity, National Cancer Institute Frederick, MD 21702, USA. Institute of Molecular and Cellular Biology, SB RAS and Novosibirsk State University, Novosibirsk, Russia
| | - Amy C Driskell
- Smithsonian Institution National Museum of Natural History, Washington, DC 20013, USA
| | - Beth Shapiro
- Department of Ecology and Evolutionary Biology, University of California Santa Cruz (UCSC), Santa Cruz, CA 95064, USA
| | - Zijun Xiong
- China National GeneBank, BGI-Shenzhen, Shenzhen 518083, China
| | - Yongli Zeng
- China National GeneBank, BGI-Shenzhen, Shenzhen 518083, China
| | - Shiping Liu
- China National GeneBank, BGI-Shenzhen, Shenzhen 518083, China
| | - Zhenyu Li
- China National GeneBank, BGI-Shenzhen, Shenzhen 518083, China
| | - Binghang Liu
- China National GeneBank, BGI-Shenzhen, Shenzhen 518083, China
| | - Kui Wu
- China National GeneBank, BGI-Shenzhen, Shenzhen 518083, China
| | - Jin Xiao
- China National GeneBank, BGI-Shenzhen, Shenzhen 518083, China
| | - Xiong Yinqi
- China National GeneBank, BGI-Shenzhen, Shenzhen 518083, China
| | - Qiuemei Zheng
- China National GeneBank, BGI-Shenzhen, Shenzhen 518083, China
| | - Yong Zhang
- China National GeneBank, BGI-Shenzhen, Shenzhen 518083, China
| | | | - Jian Wang
- BGI-Shenzhen, Shenzhen 518083, China
| | - Linnea Smeds
- Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, SE-752 36 Uppsala Sweden
| | - Frank E Rheindt
- Department of Biological Sciences, National University of Singapore, Republic of Singapore
| | - Michael Braun
- Department of Vertebrate Zoology, National Museum of Natural History, Smithsonian Suitland, MD 20746, USA
| | - Jon Fjeldsa
- Center for Macroecology, Evolution and Climate, Natural History Museum of Denmark, University of Copenhagen, Universitetsparken 15, DK-2100 Copenhagen Ø, Denmark
| | - Ludovic Orlando
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Øster Voldgade 5-7, 1350 Copenhagen, Denmark
| | - F Keith Barker
- Bell Museum of Natural History, University of Minnesota, Saint Paul, MN 55108, USA
| | - Knud Andreas Jønsson
- Center for Macroecology, Evolution and Climate, Natural History Museum of Denmark, University of Copenhagen, Universitetsparken 15, DK-2100 Copenhagen Ø, Denmark. Department of Life Sciences, Natural History Museum, Cromwell Road, London SW7 5BD, UK. Department of Life Sciences, Imperial College London, Silwood Park Campus, Ascot SL5 7PY, UK
| | - Warren Johnson
- Smithsonian Conservation Biology Institute, National Zoological Park, Front Royal, VA 22630, USA
| | - Klaus-Peter Koepfli
- Smithsonian Conservation Biology Institute, National Zoological Park, Washington, DC 20008, USA
| | - Stephen O'Brien
- Theodosius Dobzhansky Center for Genome Bioinformatics, St. Petersburg State University, St. Petersburg, Russia 199004. Oceanographic Center, Nova Southeastern University, Ft Lauderdale, FL 33004, USA
| | - David Haussler
- Center for Biomolecular Science and Engineering, UCSC, Santa Cruz, CA 95064, USA
| | - Oliver A Ryder
- San Diego Zoo Institute for Conservation Research, Escondido, CA 92027, USA
| | - Carsten Rahbek
- Center for Macroecology, Evolution and Climate, Natural History Museum of Denmark, University of Copenhagen, Universitetsparken 15, DK-2100 Copenhagen Ø, Denmark. Department of Life Sciences, Imperial College London, Silwood Park Campus, Ascot SL5 7PY, UK
| | - Eske Willerslev
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Øster Voldgade 5-7, 1350 Copenhagen, Denmark
| | - Gary R Graves
- Center for Macroecology, Evolution and Climate, Natural History Museum of Denmark, University of Copenhagen, Universitetsparken 15, DK-2100 Copenhagen Ø, Denmark. Department of Vertebrate Zoology, MRC-116, National Museum of Natural History, Smithsonian Institution, Washington, DC 20013, USA
| | - Travis C Glenn
- Department of Environmental Health Science, University of Georgia, Athens, GA 30602, USA
| | - John McCormack
- Moore Laboratory of Zoology and Department of Biology, Occidental College, Los Angeles, CA 90041, USA
| | - Dave Burt
- Department of Genomics and Genetics, The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush Campus, Midlothian EH25 9RG, UK
| | - Hans Ellegren
- Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, SE-752 36 Uppsala Sweden
| | - Per Alström
- Swedish Species Information Centre, Swedish University of Agricultural Sciences Box 7007, SE-750 07 Uppsala, Sweden. Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
| | - Scott V Edwards
- Department of Organismic and Evolutionary Biology and Museum of Comparative Zoology, Harvard University, Cambridge, MA 02138, USA
| | - Alexandros Stamatakis
- Scientific Computing Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany. Institute of Theoretical Informatics, Department of Informatics, Karlsruhe Institute of Technology, D- 76131 Karlsruhe, Germany
| | - David P Mindell
- Department of Biochemistry and Biophysics, University of California, San Francisco, CA 94158, USA
| | - Joel Cracraft
- Department of Ornithology, American Museum of Natural History, New York, NY 10024, USA
| | - Edward L Braun
- Department of Biology and Genetics Institute, University of Florida, Gainesville, FL 32611, USA
| | - Tandy Warnow
- Department of Computer Science, The University of Texas at Austin, Austin, TX 78712, USA. Departments of Bioengineering and Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA.
| | - Wang Jun
- BGI-Shenzhen, Shenzhen 518083, China. Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, 2200 Copenhagen, Denmark. Princess Al Jawhara Center of Excellence in the Research of Hereditary Disorders, King Abdulaziz University, Jeddah 21589, Saudi Arabia. Macau University of Science and Technology, Avenida Wai long, Taipa, Macau 999078, China. Department of Medicine, University of Hong Kong, Hong Kong.
| | - M Thomas P Gilbert
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Øster Voldgade 5-7, 1350 Copenhagen, Denmark. Trace and Environmental DNA Laboratory Department of Environment and Agriculture, Curtin University, Perth, Western Australia 6102, Australia.
| | - Guojie Zhang
- China National GeneBank, BGI-Shenzhen, Shenzhen 518083, China. Centre for Social Evolution, Department of Biology, Universitetsparken 15, University of Copenhagen, DK-2100 Copenhagen, Denmark.
| |
|
88
|
Mirarab S, Bayzid MS, Boussau B, Warnow T. Statistical binning enables an accurate coalescent-based estimation of the avian tree. Science 2014; 346:1250463. [PMID: 25504728 DOI: 10.1126/science.1250463] [Citation(s) in RCA: 164] [Impact Index Per Article: 14.9] [Indexed: 12/12/2022]
Abstract
Gene tree incongruence arising from incomplete lineage sorting (ILS) can reduce the accuracy of concatenation-based estimations of species trees. Although coalescent-based species tree estimation methods can have good accuracy in the presence of ILS, they are sensitive to gene tree estimation error. We propose a pipeline that uses bootstrapping to evaluate whether two genes are likely to have the same tree, then it groups genes into sets using a graph-theoretic optimization and estimates a tree on each subset using concatenation, and finally produces an estimated species tree from these trees using the preferred coalescent-based method. Statistical binning improves the accuracy of MP-EST, a popular coalescent-based method, and we use it to produce the first genome-scale coalescent-based avian tree of life.
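The grouping step described in this abstract can be illustrated with a small sketch. It assumes the bootstrap comparison has already produced a set of gene pairs judged unlikely to share a tree (the `conflicts` input, a hypothetical representation), and it uses a simple greedy grouping as a stand-in for the graph-theoretic optimization, which the abstract does not specify. The function name `greedy_bins` is likewise illustrative and not taken from the paper.

```python
from typing import FrozenSet, List, Set


def greedy_bins(n_genes: int, conflicts: Set[FrozenSet[int]]) -> List[List[int]]:
    """Group genes 0..n_genes-1 so that no bin holds a conflicting pair.

    `conflicts` holds unordered gene pairs (as frozensets) that the
    bootstrap test judged unlikely to share a tree. This greedy pass is
    an assumption made for illustration, not the published optimization.
    """
    bins: List[List[int]] = []
    for gene in range(n_genes):
        for current in bins:
            # Place the gene in the first bin it is compatible with.
            if all(frozenset((gene, other)) not in conflicts for other in current):
                current.append(gene)
                break
        else:
            # No compatible bin exists, so open a new one.
            bins.append([gene])
    return bins


example_conflicts = {frozenset((0, 1)), frozenset((1, 2))}
example_bins = greedy_bins(4, example_conflicts)
```

In the pipeline the abstract describes, each resulting bin would then be concatenated and analysed to give one tree per bin, and those trees would be passed to a coalescent-based summary method; neither of those steps is shown here.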
Affiliation(s)
- Siavash Mirarab
- Department of Computer Science, University of Texas at Austin, Austin, TX 78712, USA
| | - Md Shamsuzzoha Bayzid
- Department of Computer Science, University of Texas at Austin, Austin, TX 78712, USA
| | - Bastien Boussau
- Laboratoire de Biométrie et Biologie Evolutive, CNRS, UMR5558, Université Lyon 1, 69622, Villeurbanne, France
| | - Tandy Warnow
- Department of Computer Science, University of Texas at Austin, Austin, TX 78712, USA. Department of Bioengineering and Computer Science, University of Illinois Urbana-Champaign, Champaign, IL 61820, USA.
| |
|
89
|
Liu L, Xi Z, Davis CC. Coalescent Methods Are Robust to the Simultaneous Effects of Long Branches and Incomplete Lineage Sorting. Mol Biol Evol 2014; 32:791-805. [DOI: 10.1093/molbev/msu331] [Citation(s) in RCA: 59] [Impact Index Per Article: 5.4] [Indexed: 11/13/2022] Open
|
90
|
Gatesy J, Springer MS. Phylogenetic analysis at deep timescales: Unreliable gene trees, bypassed hidden support, and the coalescence/concatalescence conundrum. Mol Phylogenet Evol 2014; 80:231-66. [DOI: 10.1016/j.ympev.2014.08.013] [Citation(s) in RCA: 219] [Impact Index Per Article: 19.9] [Received: 05/10/2014] [Revised: 07/26/2014] [Accepted: 08/10/2014] [Indexed: 11/16/2022]
|
91
|
Abstract
Motivation: With the rapid growth rate of newly sequenced genomes, species tree inference from multiple genes has become a basic bioinformatics task in comparative and evolutionary biology. However, accurate species tree estimation is difficult in the presence of gene tree discordance, which is often due to incomplete lineage sorting (ILS), modelled by the multi-species coalescent. Several highly accurate coalescent-based species tree estimation methods have been developed over the last decade, including MP-EST. However, the running time for MP-EST increases rapidly as the number of species grows. Results: We present divide-and-conquer techniques that improve the scalability of MP-EST so that it can run efficiently on large datasets. Surprisingly, this technique also improves the accuracy of species trees estimated by MP-EST, as our study shows on a collection of simulated and biological datasets.
|
92
|
Zimmermann T, Mirarab S, Warnow T. BBCA: Improving the scalability of *BEAST using random binning. BMC Genomics 2014; 15 Suppl 6:S11. [PMID: 25572469 PMCID: PMC4239591 DOI: 10.1186/1471-2164-15-s6-s11] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.8] [Indexed: 11/21/2022] Open
Abstract
Background: Species tree estimation can be challenging in the presence of gene tree conflict due to incomplete lineage sorting (ILS), which can occur when the time between speciation events is short relative to the population size. Of the many methods that have been developed to estimate species trees in the presence of ILS, *BEAST, a Bayesian method that co-estimates the species tree and gene trees given sequence alignments on multiple loci, has generally been shown to have the best accuracy. However, *BEAST is extremely computationally intensive so that it cannot be used with large numbers of loci; hence, *BEAST is not suitable for genome-scale analyses. Results: We present BBCA (boosted binned coalescent-based analysis), a method that can be used with *BEAST (and other such co-estimation methods) to improve scalability. BBCA partitions the loci randomly into subsets, uses *BEAST on each subset to co-estimate the gene trees and species tree for the subset, and then combines the newly estimated gene trees together using MP-EST, a popular coalescent-based summary method. We compare time-restricted versions of BBCA and *BEAST on simulated datasets, and show that BBCA is at least as accurate as *BEAST, and achieves better convergence rates for large numbers of loci. Conclusions: Phylogenomic analysis using *BEAST is currently limited to datasets with a small number of loci, and analyses with even just 100 loci can be computationally challenging. BBCA uses a very simple divide-and-conquer approach that makes it possible to use *BEAST on datasets containing hundreds of loci. This study shows that BBCA provides excellent accuracy and is highly scalable.
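The random partitioning step that this abstract describes can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `random_bins`, the `bin_size` and `seed` parameters, and the locus labels are hypothetical and not taken from the BBCA software. Running a co-estimation tool on each bin and pooling the gene trees for a summary method are outside the sketch.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def random_bins(loci: Sequence[T], bin_size: int, seed: int = 0) -> List[List[T]]:
    """Shuffle the loci and cut them into consecutive bins of at most bin_size.

    Hypothetical helper for illustration; a fixed seed keeps the
    partition reproducible between runs.
    """
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    shuffled = list(loci)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i:i + bin_size] for i in range(0, len(shuffled), bin_size)]


# Example: 100 placeholder loci split into bins of 25.
example_loci = [f"locus{i}" for i in range(100)]
example_partition = random_bins(example_loci, bin_size=25)
```

Each bin would then be analysed independently, which is what makes the approach easy to parallelise; the last bin may be smaller when the number of loci is not a multiple of `bin_size`.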
|
93
|
A time-calibrated, multi-locus phylogeny of piranhas and pacus (Characiformes: Serrasalmidae) and a comparison of species tree methods. Mol Phylogenet Evol 2014; 81:242-57. [PMID: 25261120 DOI: 10.1016/j.ympev.2014.06.018] [Citation(s) in RCA: 53] [Impact Index Per Article: 4.8] [Received: 09/12/2013] [Revised: 06/17/2014] [Accepted: 06/18/2014] [Indexed: 12/13/2022]
Abstract
The phylogeny of piranhas, pacus, and relatives (family Serrasalmidae) was inferred on the basis of DNA sequences from eleven gene fragments that include the mitochondrial control region plus 10 nuclear genes (two exons and eight introns). The new data were obtained for a representative sampling of 53 specimens, collected from all major South American rivers, accounting for over 40% of the valid species and all genera excluding Utiaritichthys. Two fossil calibration points and relaxed-clock Bayesian analyses were used to estimate the timing of diversification. The new multilocus dataset also is used to compare several species-tree approaches against the results obtained using the concatenated alignment analyzed under maximum likelihood and Bayesian inference. Individual gene trees showed substantial topological discordance, but analyses based on concatenation and Bayesian and maximum likelihood-based species trees approaches converged onto a single phylogeny. The resulting phylogenetic hypothesis is robust and supports a division of the family into three major clades, consistent with previous results based on mitochondrial DNA alone. The earliest branching event separated a "pacu" clade (Colossoma, Mylossoma and Piaractus) from the rest of the family in the Late Cretaceous (over 68 Ma). The other two clades, that contain most of the diversity, are formed by the "true piranhas" (Metynnis, Pygopristis, Pygocentrus, Pristobrycon, Catoprion, and Serrasalmus) and the Myleus-like pacus (the Myleus clade). The "true" piranha clade originated during the Eocene (∼53 Ma) but the most recent diversification of flesh-eating piranhas within the genera Serrasalmus and Pygocentrus did not start until the Miocene (∼17 Ma). 
A comparison of species tree approaches indicates that most methods tested are consistent with results obtained by concatenation, suggesting that the gene-tree incongruence observed is mild and will not produce misleading results under simple concatenation analysis. Non-monophyly of several genera (Pristobrycon, Tometes, Myloplus, Mylesinus) and putative species (Serrasalmus rhombeus) was obtained, suggesting that further study of this family is necessary.
|
94
|
Mirarab S, Bayzid MS, Warnow T. Evaluating Summary Methods for Multilocus Species Tree Estimation in the Presence of Incomplete Lineage Sorting. Syst Biol 2014; 65:366-80. [PMID: 25164915 DOI: 10.1093/sysbio/syu063] [Citation(s) in RCA: 179] [Impact Index Per Article: 16.3] [Received: 12/16/2013] [Accepted: 08/18/2014] [Indexed: 12/13/2022] Open
Abstract
Species tree estimation is complicated by processes, such as gene duplication and loss and incomplete lineage sorting (ILS), that cause discordance between gene trees and the species tree. Furthermore, while concatenation, a traditional approach to tree estimation, has excellent performance under many conditions, the expectation is that the best accuracy will be obtained through the use of species tree estimation methods that are specifically designed to address gene tree discordance. In this article, we report on a study to evaluate MP-EST (one of the most popular species tree estimation methods designed to address ILS) as well as concatenation under maximum likelihood, the greedy consensus, and two supertree methods (Matrix Representation with Parsimony and Matrix Representation with Likelihood). Our study shows that several factors impact the absolute and relative accuracy of methods, including the number of gene trees, the accuracy of the estimated gene trees, and the amount of ILS. Concatenation can be more accurate than the best summary methods in some cases (mostly when the gene trees have poor phylogenetic signal or when the level of ILS is low), but summary methods are generally more accurate than concatenation when there are an adequate number of sufficiently accurate gene trees. Our study suggests that coalescent-based species tree methods may be key to estimating highly accurate species trees from multiple loci.
Affiliation(s)
- Siavash Mirarab
- Department of Computer Science, University of Texas at Austin, Austin, TX, 78712, USA; and
| | - Md Shamsuzzoha Bayzid
- Department of Computer Science, University of Texas at Austin, Austin, TX, 78712, USA; and
| | - Tandy Warnow
- Department of Computer Science, University of Texas at Austin, Austin, TX, 78712, USA; and Departments of Bioengineering and Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA.
| |
|
95
|
Huang H, Tran LAP, Knowles LL. Do estimated and actual species phylogenies match? Evaluation of East African cichlid radiations. Mol Phylogenet Evol 2014; 78:56-65. [PMID: 24837624 DOI: 10.1016/j.ympev.2014.05.010] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 03/20/2014] [Revised: 05/02/2014] [Accepted: 05/06/2014] [Indexed: 10/25/2022]
Abstract
A large number of published phylogenetic estimates are based on a single locus or the concatenation of multiple loci, even though genealogies of single or concatenated loci may not accurately reflect the true history of species diversification (i.e., the species tree). The increased availability of genomic data, coupled with new computational methods, improves resolution of species relationships beyond what was possible in the past. Such developments will no doubt benefit future phylogenetic studies. It remains unclear how robust phylogenies that predate these developments (i.e., the bulk of phylogenetic studies) are to departures from the assumption of strict gene tree-species tree concordance. Here, we present a parametric bootstrap (PBST) approach that assesses the reliability of past phylogenetic estimates in which gene tree-species tree discord was ignored. We focus on a universal cause of discord-the random loss of gene lineages from genetic drift-and apply the method in a meta-analysis of East African cichlids, a group encompassing historical scenarios that are particularly challenging for phylogenetic estimation. Although we identify some evolutionary relationships that are robust to gene tree discord, many past phylogenetic estimates of cichlids are not. We discuss the utility of the PBST method for evaluating the robustness of gene tree-based phylogenetic estimations in general as well as for testing the clade-specific performance of species tree estimation methods and designing sampling strategies that increase the accuracy of estimated species relationships.
Affiliation(s)
- Huateng Huang
- Department of Ecology and Evolutionary Biology, Museum of Zoology, University of Michigan, Ann Arbor, MI 48109-1079, USA.
| | - Lucy A P Tran
- Department of Ecology and Evolutionary Biology, Museum of Zoology, University of Michigan, Ann Arbor, MI 48109-1079, USA.
| | - L Lacey Knowles
- Department of Ecology and Evolutionary Biology, Museum of Zoology, University of Michigan, Ann Arbor, MI 48109-1079, USA.
|
96
|
Zhong B, Liu L, Penny D. The multispecies coalescent model and land plant origins: a reply to Springer and Gatesy. Trends Plant Sci 2014; 19:270-272. [PMID: 24641876 DOI: 10.1016/j.tplants.2014.02.011] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Received: 01/11/2014] [Accepted: 02/20/2014] [Indexed: 06/03/2023]
Affiliation(s)
- Bojian Zhong
- Institute of Fundamental Sciences, Massey University, Palmerston North, New Zealand.
| | - Liang Liu
- Department of Statistics and Institute of Bioinformatics, University of Georgia, Athens, GA 30606, USA.
| | - David Penny
- Institute of Fundamental Sciences, Massey University, Palmerston North, New Zealand
|
97
|
Springer MS, Gatesy J. Land plant origins and coalescence confusion. Trends Plant Sci 2014; 19:267-269. [PMID: 24641875 DOI: 10.1016/j.tplants.2014.02.012] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.0] [Received: 12/19/2013] [Revised: 01/30/2014] [Accepted: 02/20/2014] [Indexed: 06/03/2023]
Affiliation(s)
- Mark S Springer
- Department of Biology, University of California, Riverside, CA 92521, USA.
| | - John Gatesy
- Department of Biology, University of California, Riverside, CA 92521, USA.
|
98
|
Molecular evidence for the monophyly of flatfishes (Carangimorpharia: Pleuronectiformes). Mol Phylogenet Evol 2014; 73:18-22. [DOI: 10.1016/j.ympev.2014.01.006] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.5] [Received: 08/29/2013] [Revised: 01/07/2014] [Accepted: 01/09/2014] [Indexed: 11/18/2022]
|
99
|
Capella-Gutierrez S, Kauff F, Gabaldón T. A phylogenomics approach for selecting robust sets of phylogenetic markers. Nucleic Acids Res 2014; 42:e54. [PMID: 24476915 PMCID: PMC3985644 DOI: 10.1093/nar/gku071] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.2] [Indexed: 12/28/2022] Open
Abstract
Reconstructing the evolutionary relationships of species is a major goal in biology. Despite the increasing number of completely sequenced genomes, a large number of phylogenetic projects rely on targeted sequencing and analysis of a relatively small sample of marker genes. The selection of these phylogenetic markers should ideally be based on accurate predictions of their combined, rather than individual, potential to accurately resolve the phylogeny of interest. Here we present and validate a new phylogenomics strategy to efficiently select a minimal set of stable markers able to reconstruct the underlying species phylogeny. In contrast to previous approaches, our methodology does not only rely on the ability of individual genes to reconstruct a known phylogeny, but it also explores the combined power of sets of concatenated genes to accurately infer phylogenetic relationships of species not previously analyzed. We applied our approach to two broad sets of cyanobacterial and ascomycetous fungal species, and provide two minimal sets of six and four genes, respectively, necessary to fully resolve the target phylogenies. This approach paves the way for the informed selection of phylogenetic markers in the effort of reconstructing the tree of life.
Affiliation(s)
- Salvador Capella-Gutierrez
- Bioinformatics and Genomics Programme. Centre for Genomic Regulation (CRG) and UPF. Doctor Aiguader, 88. 08003 Barcelona, Spain, Universitat Pompeu Fabra (UPF). 08003 Barcelona, Spain, University of Kaiserslautern, Molecular Phylogenetics, Postfach 3049, 67653 Kaiserslautern, Germany and Institució Catalana de Recerca i Estudis Avançats (ICREA), Pg. Lluís Companys 23, 08010 Barcelona, Spain
|
100
|
Betancur-R. R, Naylor GJ, Ortí G. Conserved Genes, Sampling Error, and Phylogenomic Inference. Syst Biol 2014; 63:257-262. [DOI: 10.1093/sysbio/syt073] [Citation(s) in RCA: 56] [Impact Index Per Article: 5.1] [Indexed: 11/14/2022] Open
Affiliation(s)
- Ricardo Betancur-R.
- Department of Biological Sciences, The George Washington University, 2023 G St. NW, Washington, DC 20052, USA; and College of Charleston, Hollings Marine Lab, 331 Fort Johnson Rd., Charleston, SC 29412, USA
| | - Gavin J.P. Naylor
- Department of Biological Sciences, The George Washington University, 2023 G St. NW, Washington, DC 20052, USA; and College of Charleston, Hollings Marine Lab, 331 Fort Johnson Rd., Charleston, SC 29412, USA
| | - Guillermo Ortí
- Department of Biological Sciences, The George Washington University, 2023 G St. NW, Washington, DC 20052, USA; and College of Charleston, Hollings Marine Lab, 331 Fort Johnson Rd., Charleston, SC 29412, USA
|