751
Davidson R, Vachaspati P, Mirarab S, Warnow T. Phylogenomic species tree estimation in the presence of incomplete lineage sorting and horizontal gene transfer. BMC Genomics 2015; 16 Suppl 10:S1. [PMID: 26450506 PMCID: PMC4603753 DOI: 10.1186/1471-2164-16-s10-s1] [Citation(s) in RCA: 56] [Impact Index Per Article: 5.6] [Indexed: 11/17/2022]
Abstract
BACKGROUND Species tree estimation is challenged by gene tree heterogeneity resulting from biological processes such as duplication and loss, hybridization, incomplete lineage sorting (ILS), and horizontal gene transfer (HGT). Mathematical theory about reconstructing species trees in the presence of HGT alone or ILS alone suggests that quartet-based species tree methods (known to be statistically consistent under ILS, or under bounded amounts of HGT) might be effective techniques for estimating species trees when both HGT and ILS are present. RESULTS We evaluated several publicly available coalescent-based methods and concatenation under maximum likelihood on simulated datasets with moderate ILS and varying levels of HGT. Our study shows that two quartet-based species tree estimation methods (ASTRAL-2 and weighted Quartets MaxCut) are both highly accurate, even on datasets with high rates of HGT. In contrast, although NJst and concatenation using maximum likelihood are highly accurate under low HGT, they are less robust to high HGT rates. CONCLUSION Our study shows that quartet-based species-tree estimation methods can be highly accurate under the presence of both HGT and ILS. The study suggests the possibility that some quartet-based methods might be statistically consistent under phylogenomic models of gene tree heterogeneity with both HGT and ILS.
Affiliation(s)
- Ruth Davidson
  - Department of Mathematics, University of Illinois at Urbana-Champaign, 1409 W. Green Street, 61801 Urbana, IL, USA
- Pranjal Vachaspati
  - Department of Computer Science, University of Illinois at Urbana-Champaign, 201 North Goodwin Avenue, 61801 Urbana, IL, USA
- Siavash Mirarab
  - Department of Computer Science, University of Texas at Austin, 2317 Speedway, Stop D9500, 78712 Austin, TX, USA
  - Department of Electrical and Computer Engineering, University of California at San Diego, 9500 Gilman Drive, 92093, La Jolla, CA, USA
- Tandy Warnow
  - Department of Computer Science, University of Illinois at Urbana-Champaign, 201 North Goodwin Avenue, 61801 Urbana, IL, USA
  - Department of Bioengineering, University of Illinois at Urbana-Champaign, 1270 Digital Computer Laboratory, MC-278, 61801 Urbana, IL, USA
752
Abstract
BACKGROUND Incomplete lineage sorting (ILS), modelled by the multi-species coalescent (MSC), is known to create discordance between gene trees and species trees, and lead to inaccurate species tree estimations unless appropriate methods are used to estimate the species tree. While many statistically consistent methods have been developed to estimate the species tree in the presence of ILS, only ASTRAL-2 and NJst have been shown to have good accuracy on large datasets. Yet, NJst is generally slower and less accurate than ASTRAL-2, and cannot run on some datasets. RESULTS We have redesigned NJst to enable it to run on all datasets, and we have expanded its design space so that it can be used with different distance-based tree estimation methods. The resultant method, ASTRID, is statistically consistent under the MSC model, and has accuracy that is competitive with ASTRAL-2. Furthermore, ASTRID is much faster than ASTRAL-2, completing in minutes on some datasets for which ASTRAL-2 used hours. CONCLUSIONS ASTRID is a new coalescent-based method for species tree estimation that is competitive with the best current method in terms of accuracy, while being much faster. ASTRID is available in open source form on github.
Affiliation(s)
- Pranjal Vachaspati
  - Department of Computer Science, University of Illinois at Urbana-Champaign, 201 N. Goodwin Avenue, Urbana, IL, 61801 USA
- Tandy Warnow
  - Department of Computer Science, University of Illinois at Urbana-Champaign, 201 N. Goodwin Avenue, Urbana, IL, 61801 USA
753
Simmons MP, Gatesy J. Coalescence vs. concatenation: Sophisticated analyses vs. first principles applied to rooting the angiosperms. Mol Phylogenet Evol 2015; 91:98-122. [DOI: 10.1016/j.ympev.2015.05.011] [Citation(s) in RCA: 64] [Impact Index Per Article: 6.4] [Received: 02/12/2015] [Revised: 05/01/2015] [Accepted: 05/14/2015] [Indexed: 11/24/2022]
754
Nicholls JA, Pennington RT, Koenen EJM, Hughes CE, Hearn J, Bunnefeld L, Dexter KG, Stone GN, Kidner CA. Using targeted enrichment of nuclear genes to increase phylogenetic resolution in the neotropical rain forest genus Inga (Leguminosae: Mimosoideae). Frontiers in Plant Science 2015; 6:710. [PMID: 26442024 PMCID: PMC4584976 DOI: 10.3389/fpls.2015.00710] [Citation(s) in RCA: 76] [Impact Index Per Article: 7.6] [Received: 05/22/2015] [Accepted: 08/25/2015] [Indexed: 05/20/2023]
Abstract
Evolutionary radiations are prominent and pervasive across many plant lineages in diverse geographical and ecological settings; in neotropical rainforests there is growing evidence suggesting that a significant fraction of species richness is the result of recent radiations. Understanding the evolutionary trajectories and mechanisms underlying these radiations demands much greater phylogenetic resolution than is currently available for these groups. The neotropical tree genus Inga (Leguminosae) is a good example, with ~300 extant species and a crown age of 2-10 MY, yet over 6 kb of plastid and nuclear DNA sequence data gives only poor phylogenetic resolution among species. Here we explore the use of larger-scale nuclear gene data obtained through targeted enrichment to increase phylogenetic resolution within Inga. Transcriptome data from three Inga species were used to select 264 nuclear loci for targeted enrichment and sequencing. Following quality control to remove probable paralogs from these sequence data, the final dataset comprised 259,313 bases from 194 loci for 24 accessions representing 22 Inga species and an outgroup (Zygia). Bayesian phylogenies reconstructed using either all loci concatenated or a gene-tree/species-tree approach yielded highly resolved phylogenies. We used coalescent approaches to show that the same targeted enrichment data also have significant power to discriminate among alternative within-species population histories within the widespread species I. umbellifera. In either application, targeted enrichment simplifies the informatics challenge of identifying orthologous loci associated with de novo genome sequencing. We conclude that targeted enrichment provides the large volumes of phylogenetically-informative sequence data required to resolve relationships within recent plant species radiations, both at the species level and for within-species phylogeographic studies.
Affiliation(s)
- James A. Nicholls
  - Ashworth Labs, Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Edinburgh, UK
  - Royal Botanic Garden Edinburgh, Edinburgh, UK
- Erik J. M. Koenen
  - Institute of Systematic Botany, University of Zurich, Zürich, Switzerland
- Colin E. Hughes
  - Institute of Systematic Botany, University of Zurich, Zürich, Switzerland
- Jack Hearn
  - Ashworth Labs, Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Edinburgh, UK
- Lynsey Bunnefeld
  - Ashworth Labs, Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Edinburgh, UK
- Kyle G. Dexter
  - School of Geosciences, University of Edinburgh, Edinburgh, UK
- Graham N. Stone
  - Ashworth Labs, Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Edinburgh, UK
- Catherine A. Kidner
  - Royal Botanic Garden Edinburgh, Edinburgh, UK
  - Institute of Molecular Plant Sciences, School of Biological Sciences, University of Edinburgh, Edinburgh, UK
755
Washburn JD, Schnable JC, Davidse G, Pires JC. Phylogeny and photosynthesis of the grass tribe Paniceae. American Journal of Botany 2015; 102:1493-505. [PMID: 26373976 DOI: 10.3732/ajb.1500222] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.7] [Received: 05/15/2015] [Accepted: 08/18/2015] [Indexed: 05/08/2023]
Abstract
PREMISE OF THE STUDY The grass tribe Paniceae includes important food, forage, and bioenergy crops such as switchgrass, napiergrass, various millet species, and economically important weeds. Paniceae are also valuable for answering scientific and evolutionary questions about C4 photosynthetic evolution, drought tolerance, and spikelet variation. However, the phylogeny of the tribe remains incompletely resolved. METHODS Forty-five taxa were selected from across the tribe Paniceae and outgroups for genome survey sequencing (GSS). These data were used to build a phylogenetic tree of the Paniceae based on 102 markers (78 chloroplast, 22 mitochondrial, 2 nrDNA). Ancestral state reconstruction analyses were also performed within the Paniceae using both the traditional and two subtype classification systems to test hypotheses of C4 subtype evolution. KEY RESULTS The phylogenetic tree resolves many areas of the Paniceae with high support and provides insight into the origin and number of C4 evolution events within the tribe. The recovered phylogeny and ancestral state reconstructions support between four and seven independent origins of C4 photosynthesis within the tribe and indicate which species are potentially the closest C3 sister taxa of each of these events. CONCLUSIONS Although the sequence of evolutionary events that produced multiple C4 subtypes within the Paniceae remains undetermined, the results presented here are consistent with only a subset of currently proposed models. The species used in this study constitute a panel of C3 and C4 grasses that are suitable for further studies on C4 photosynthesis, bioenergy, food and forage crops, and various developmental features of the Paniceae.
Affiliation(s)
- Jacob D Washburn
  - Division of Biological Sciences, University of Missouri, 311 Bond Life Sciences Center, Columbia, Missouri 65211 USA
- James C Schnable
  - Agronomy & Horticulture, University of Nebraska-Lincoln, Beadle Center E207, Lincoln, Nebraska 68583-0660 USA
- Gerrit Davidse
  - Missouri Botanical Garden, P.O. Box 299, St. Louis, Missouri 63166-0299 USA
- J Chris Pires
  - Division of Biological Sciences, University of Missouri, 371b Bond Life Sciences Center, Columbia, Missouri 65211 USA
756
Streicher JW, Schulte JA, Wiens JJ. How Should Genes and Taxa be Sampled for Phylogenomic Analyses with Missing Data? An Empirical Study in Iguanian Lizards. Syst Biol 2015; 65:128-45. [PMID: 26330450 DOI: 10.1093/sysbio/syv058] [Citation(s) in RCA: 111] [Impact Index Per Article: 11.1] [Received: 11/14/2014] [Accepted: 08/04/2015] [Indexed: 11/12/2022]
Abstract
Targeted sequence capture is becoming a widespread tool for generating large phylogenomic data sets to address difficult phylogenetic problems. However, this methodology often generates data sets in which increasing the number of taxa and loci increases amounts of missing data. Thus, a fundamental (but still unresolved) question is whether sampling should be designed to maximize sampling of taxa or genes, or to minimize the inclusion of missing data cells. Here, we explore this question for an ancient, rapid radiation of lizards, the pleurodont iguanians. Pleurodonts include many well-known clades (e.g., anoles, basilisks, iguanas, and spiny lizards) but relationships among families have proven difficult to resolve strongly and consistently using traditional sequencing approaches. We generated up to 4921 ultraconserved elements with sampling strategies including 16, 29, and 44 taxa, from 1179 to approximately 2.4 million characters per matrix and approximately 30% to 60% total missing data. We then compared mean branch support for interfamilial relationships under these 15 different sampling strategies for both concatenated (maximum likelihood) and species tree (NJst) approaches (after showing that mean branch support appears to be related to accuracy). We found that both approaches had the highest support when including loci with up to 50% missing taxa (matrices with ~40-55% missing data overall). Thus, our results show that simply excluding all missing data may be highly problematic as the primary guiding principle for the inclusion or exclusion of taxa and genes. The optimal strategy was somewhat different for each approach, a pattern that has not been shown previously. For concatenated analyses, branch support was maximized when including many taxa (44) but fewer characters (1.1 million). For species-tree analyses, branch support was maximized with minimal taxon sampling (16) but many loci (4789 of 4921). 
We also show that the choice of these sampling strategies can be critically important for phylogenomic analyses, since some strategies lead to demonstrably incorrect inferences (using the same method) that have strong statistical support. Our preferred estimate provides strong support for most interfamilial relationships in this important but phylogenetically challenging group.
Affiliation(s)
- Jeffrey W Streicher
  - Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ 85721, USA; Department of Life Sciences, The Natural History Museum, London SW7 5BD, UK
- James A Schulte
  - Department of Biology, Clarkson University, Potsdam, NY 13699, USA
- John J Wiens
  - Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ 85721, USA
757
DaCosta JM, Sorenson MD. ddRAD-seq phylogenetics based on nucleotide, indel, and presence-absence polymorphisms: Analyses of two avian genera with contrasting histories. Mol Phylogenet Evol 2015; 94:122-35. [PMID: 26279345 DOI: 10.1016/j.ympev.2015.07.026] [Citation(s) in RCA: 58] [Impact Index Per Article: 5.8] [Received: 02/09/2015] [Revised: 07/22/2015] [Accepted: 07/29/2015] [Indexed: 11/16/2022]
Abstract
Genotype-by-sequencing (GBS) methods have revolutionized the field of molecular ecology, but their application in molecular phylogenetics remains somewhat limited. In addition, most phylogenetic studies based on large GBS data sets have relied on analyses of concatenated data rather than species tree methods that explicitly account for genealogical stochasticity among loci. We explored the utility of "double-digest" restriction site-associated DNA sequencing (ddRAD-seq) for phylogenetic analyses of the Lagonosticta firefinches (family Estrildidae) and the Vidua brood parasitic finches (family Viduidae). As expected, the number of homologous loci shared among samples was negatively correlated with genetic distance due to the accumulation of restriction site polymorphisms. Nonetheless, for each genus, we obtained data sets of ∼3000 loci shared in common among all samples, including a more distantly related outgroup taxon. For all samples combined, we obtained >1000 homologous loci despite ∼20my divergence between estrildid and parasitic finches. In addition to nucleotide polymorphisms, the ddRAD-seq data yielded large sets of indel and locus presence-absence polymorphisms, all of which had higher consistency indices than mtDNA sequence data in the context of concatenated parsimony analyses. Species tree methods, using individual gene trees or single nucleotide polymorphisms as input, generated results broadly consistent with analyses of concatenated data, particularly for Lagonosticta, which appears to have a well resolved, bifurcating history. Results for Vidua were also generally consistent across methods and data sets, although nodal support and results from different species tree methods were more variable. Lower gene tree congruence in Vidua is likely the result of its unique evolutionary history, which includes rapid speciation by host shift and occasional hybridization and introgression due to incomplete reproductive isolation. 
We conclude that ddRAD-seq is a cost-effective method for generating robust phylogenetic data sets, particularly for analyses of closely related species and genera.
758
Andrade SC, Novo M, Kawauchi GY, Worsaae K, Pleijel F, Giribet G, Rouse GW. Articulating “Archiannelids”: Phylogenomics and Annelid Relationships, with Emphasis on Meiofaunal Taxa. Mol Biol Evol 2015. [DOI: 10.1093/molbev/msv157] [Citation(s) in RCA: 110] [Impact Index Per Article: 11.0] [Indexed: 01/01/2023]
759
Rothfels CJ, Li FW, Sigel EM, Huiet L, Larsson A, Burge DO, Ruhsam M, Deyholos M, Soltis DE, Stewart CN, Shaw SW, Pokorny L, Chen T, dePamphilis C, DeGironimo L, Chen L, Wei X, Sun X, Korall P, Stevenson DW, Graham SW, Wong GKS, Pryer KM. The evolutionary history of ferns inferred from 25 low-copy nuclear genes. American Journal of Botany 2015. [PMID: 26199366 DOI: 10.3732/ajb.1500089] [Citation(s) in RCA: 84] [Impact Index Per Article: 8.4] [Indexed: 05/06/2023]
Abstract
PREMISE OF THE STUDY Understanding fern (monilophyte) phylogeny and its evolutionary timescale is critical for broad investigations of the evolution of land plants, and for providing the point of comparison necessary for studying the evolution of the fern sister group, seed plants. Molecular phylogenetic investigations have revolutionized our understanding of fern phylogeny, however, to date, these studies have relied almost exclusively on plastid data. METHODS Here we take a curated phylogenomics approach to infer the first broad fern phylogeny from multiple nuclear loci, by combining broad taxon sampling (73 ferns and 12 outgroup species) with focused character sampling (25 loci comprising 35877 bp), along with rigorous alignment, orthology inference and model selection. KEY RESULTS Our phylogeny corroborates some earlier inferences and provides novel insights; in particular, we find strong support for Equisetales as sister to the rest of ferns, Marattiales as sister to leptosporangiate ferns, and Dennstaedtiaceae as sister to the eupolypods. Our divergence-time analyses reveal that divergences among the extant fern orders all occurred prior to ∼200 MYA. Finally, our species-tree inferences are congruent with analyses of concatenated data, but generally with lower support. Those cases where species-tree support values are higher than expected involve relationships that have been supported by smaller plastid datasets, suggesting that deep coalescence may be reducing support from the concatenated nuclear data. CONCLUSIONS Our study demonstrates the utility of a curated phylogenomics approach to inferring fern phylogeny, and highlights the need to consider underlying data characteristics, along with data quantity, in phylogenetic studies.
Affiliation(s)
- Carl J Rothfels
  - Department of Zoology & Biodiversity Research Centre, University of British Columbia, Vancouver, British Columbia V6J 3S7, Canada
- Fay-Wei Li
  - Department of Biology, Duke University, Durham, North Carolina 27708 USA
- Erin M Sigel
  - Department of Botany (MRC 166), National Museum of Natural History, Smithsonian Institution, P.O. Box 37012 Washington, District of Columbia 20013-7012 USA
- Layne Huiet
  - Department of Biology, Duke University, Durham, North Carolina 27708 USA
- Anders Larsson
  - Systematic Biology, Department of Organismal Biology, Evolutionary Biology Centre, Uppsala University, Norbyv. 18D, SE-752 36 Uppsala, Sweden
- Dylan O Burge
  - California Academy of Sciences, 55 Music Concourse Drive, San Francisco, California 94118 USA
- Markus Ruhsam
  - Royal Botanic Garden Edinburgh, 20A Inverleith Row, Edinburgh EH3 5LR, Scotland, UK
- Michael Deyholos
  - Department of Biology, University of British Columbia, Okanagan Campus, 1177 Research Road, Kelowna, British Columbia V1V 1V7, Canada
- Douglas E Soltis
  - Florida Museum of Natural History, Department of Biology, and the Genetics Institute, University of Florida, Gainesville, Florida 32611 USA
- C Neal Stewart
  - Department of Plant Sciences, University of Tennessee, Knoxville, Tennessee 37996, USA
- Lisa Pokorny
  - Departamento de Biodiversidad y Conservación, Real Jardín Botánico-Consejo Superior de Investigaciones Científicas, 28014 Madrid, Spain
- Tao Chen
  - Shenzhen Fairy Lake Botanical Garden, The Chinese Academy of Sciences, Shenzhen, Guangdong 518004, China
- Claude dePamphilis
  - Department of Biology, Pennsylvania State University, University Park, Pennsylvania 16802 USA
- Lisa DeGironimo
  - The New York Botanical Garden, 2900 Southern Blvd., Bronx, New York 10458 USA
- Li Chen
  - BGI-Shenzhen, Beishan Industrial Zone, Yantian District, Shenzhen 518083, China
- Xiaofeng Wei
  - BGI-Shenzhen, Beishan Industrial Zone, Yantian District, Shenzhen 518083, China
- Xiao Sun
  - BGI-Shenzhen, Beishan Industrial Zone, Yantian District, Shenzhen 518083, China
- Petra Korall
  - Systematic Biology, Department of Organismal Biology, Evolutionary Biology Centre, Uppsala University, Norbyv. 18D, SE-752 36 Uppsala, Sweden
- Dennis W Stevenson
  - The New York Botanical Garden, 2900 Southern Blvd., Bronx, New York 10458 USA
- Sean W Graham
  - Department of Botany & Biodiversity Research Centre, University of British Columbia, Vancouver, British Columbia V6J 3S7, Canada
- Gane K-S Wong
  - Department of Biological Sciences, University of Alberta, Edmonton, Alberta T6G 2E9, Canada
  - Department of Medicine, University of Alberta, Edmonton, Alberta T6G 2E1, Canada
- Kathleen M Pryer
  - Department of Biology, Duke University, Durham, North Carolina 27708 USA
760
Xi Z, Liu L, Davis CC. Genes with minimal phylogenetic information are problematic for coalescent analyses when gene tree estimation is biased. Mol Phylogenet Evol 2015; 92:63-71. [PMID: 26115844 DOI: 10.1016/j.ympev.2015.06.009] [Citation(s) in RCA: 73] [Impact Index Per Article: 7.3] [Received: 01/27/2015] [Revised: 04/23/2015] [Accepted: 06/16/2015] [Indexed: 11/30/2022]
Abstract
The development and application of coalescent methods are undergoing rapid changes. One little explored area that bears on the application of gene-tree-based coalescent methods to species tree estimation is gene informativeness. Here, we investigate the accuracy of these coalescent methods when genes have minimal phylogenetic information, including the implementation of the multilocus bootstrap approach. Using simulated DNA sequences, we demonstrate that genes with minimal phylogenetic information can produce unreliable gene trees (i.e., high error in gene tree estimation), which may in turn reduce the accuracy of species tree estimation using gene-tree-based coalescent methods. We demonstrate that this problem can be alleviated by sampling more genes, as is commonly done in large-scale phylogenomic analyses. This applies even when these genes are minimally informative. If gene tree estimation is biased, however, gene-tree-based coalescent analyses will produce inconsistent results, which cannot be remedied by increasing the number of genes. In this case, it is not the gene-tree-based coalescent methods that are flawed, but rather the input data (i.e., estimated gene trees). Along these lines, the commonly used program PhyML has a tendency to infer one particular bifurcating topology even though it is best represented as a polytomy. We additionally corroborate these findings by analyzing the 183-locus mammal data set assembled by McCormack et al. (2012) using ultra-conserved elements (UCEs) and flanking DNA. Lastly, we demonstrate that when employing the multilocus bootstrap approach on this 183-locus data set, there is no strong conflict between species trees estimated from concatenation and gene-tree-based coalescent analyses, as has been previously suggested by Gatesy and Springer (2014).
Affiliation(s)
- Zhenxiang Xi
  - Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
- Liang Liu
  - Department of Statistics, University of Georgia, Athens, GA 30602, USA; Institute of Bioinformatics, University of Georgia, Athens, GA 30602, USA
- Charles C Davis
  - Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
761
Bayzid MS, Mirarab S, Boussau B, Warnow T. Weighted Statistical Binning: Enabling Statistically Consistent Genome-Scale Phylogenetic Analyses. PLoS One 2015; 10:e0129183. [PMID: 26086579 PMCID: PMC4472720 DOI: 10.1371/journal.pone.0129183] [Citation(s) in RCA: 71] [Impact Index Per Article: 7.1] [Received: 12/16/2014] [Accepted: 05/05/2015] [Indexed: 11/19/2022]
Abstract
Because biological processes can result in different loci having different evolutionary histories, species tree estimation requires multiple loci from across multiple genomes. While many processes can result in discord between gene trees and species trees, incomplete lineage sorting (ILS), modeled by the multi-species coalescent, is considered to be a dominant cause for gene tree heterogeneity. Coalescent-based methods have been developed to estimate species trees, many of which operate by combining estimated gene trees, and so are called "summary methods". Because summary methods are generally fast (and much faster than more complicated coalescent-based methods that co-estimate gene trees and species trees), they have become very popular techniques for estimating species trees from multiple loci. However, recent studies have established that summary methods can have reduced accuracy in the presence of gene tree estimation error, and also that many biological datasets have substantial gene tree estimation error, so that summary methods may not be highly accurate in biologically realistic conditions. Mirarab et al. (Science 2014) presented the "statistical binning" technique to improve gene tree estimation in multi-locus analyses, and showed that it improved the accuracy of MP-EST, one of the most popular coalescent-based summary methods. Statistical binning, which uses a simple heuristic to evaluate "combinability" and then uses the larger sets of genes to re-calculate gene trees, has good empirical performance, but using statistical binning within a phylogenomic pipeline does not have the desirable property of being statistically consistent. We show that weighting the re-calculated gene trees by the bin sizes makes statistical binning statistically consistent under the multispecies coalescent, and maintains the good empirical performance. 
Thus, "weighted statistical binning" enables highly accurate genome-scale species tree estimation, and is also statistically consistent under the multi-species coalescent model. New data used in this study are available at DOI: http://dx.doi.org/10.6084/m9.figshare.1411146, and the software is available at https://github.com/smirarab/binning.
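The weighting step described in this abstract can be illustrated with a minimal sketch. It assumes the bins have already been computed and a tree has already been re-estimated for each bin. The function name `weight_bin_trees`, the dictionary layout, and the Newick strings are hypothetical, and repeating each bin tree once per gene in its bin is just one simple way to weight by bin size; it is not taken from the authors' software.

```python
from typing import Dict, List


def weight_bin_trees(bins: Dict[str, List[str]], bin_trees: Dict[str, str]) -> List[str]:
    """Repeat each bin's re-estimated tree once per gene in that bin.

    bins      -- hypothetical mapping: bin id -> list of gene names in the bin
    bin_trees -- hypothetical mapping: bin id -> Newick string for the bin's tree
    """
    weighted: List[str] = []
    for bin_id, genes in bins.items():
        if bin_id not in bin_trees:
            raise KeyError(f"no re-estimated tree for bin {bin_id!r}")
        # A bin with k genes contributes k copies of its tree, so larger
        # bins carry proportionally more weight in the summary method input.
        weighted.extend([bin_trees[bin_id]] * len(genes))
    return weighted


# Toy input (made up for illustration): one bin of three genes, one singleton bin.
bins = {"bin1": ["geneA", "geneB", "geneC"], "bin2": ["geneD"]}
bin_trees = {"bin1": "((a,b),(c,d));", "bin2": "((a,c),(b,d));"}
weighted = weight_bin_trees(bins, bin_trees)
print(len(weighted))
```

The resulting list has one tree per original gene, which is the form a summary method would typically take as input.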
Affiliation(s)
- Siavash Mirarab
  - Department of Computer Science, University of Texas at Austin, Austin, Texas, USA
- Bastien Boussau
  - Laboratoire de Biométrie et Biologie Évolutive, Université de Lyon, France
- Tandy Warnow
  - Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA
762
Eytan RI, Evans BR, Dornburg A, Lemmon AR, Lemmon EM, Wainwright PC, Near TJ. Are 100 enough? Inferring acanthomorph teleost phylogeny using Anchored Hybrid Enrichment. BMC Evol Biol 2015; 15:113. [PMID: 26071950 PMCID: PMC4465735 DOI: 10.1186/s12862-015-0415-0] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.1] [Received: 11/04/2014] [Accepted: 06/08/2015] [Indexed: 11/10/2022]
Abstract
BACKGROUND The past decade has witnessed remarkable progress towards resolution of the Tree of Life. However, despite the increased use of genomic scale datasets, some phylogenetic relationships remain difficult to resolve. Here we employ anchored phylogenomics to capture 107 nuclear loci in 29 species of acanthomorph teleost fishes, with 25 of these species sampled from the recently delimited clade Ovalentaria. Previous studies employing multilocus nuclear exon datasets have not been able to resolve the nodes at the base of the Ovalentaria tree with confidence. Here we test whether a phylogenomic approach will provide better support for these nodes, and if not, why this may be. RESULTS After using a novel method to account for paralogous loci, we estimated phylogenies with maximum likelihood and species tree methods using DNA sequence alignments of over 80,000 base pairs. Several key relationships within Ovalentaria are well resolved, including 1) the sister taxon relationship between Cichlidae and Pholidichthys, 2) a clade containing blennies, grammas, clingfishes, and jawfishes, and 3) monophyly of Atherinomorpha (topminnows, flyingfishes, and silversides). However, many nodes in the phylogeny associated with the early diversification of Ovalentaria are poorly resolved in several analyses. Through the use of rarefaction curves we show that limited phylogenetic resolution among the earliest nodes in the Ovalentaria phylogeny does not appear to be due to a deficiency of data, as average global node support ceases to increase when only 1/3rd of the sampled loci are used in analyses. Instead this lack of resolution may be driven by model misspecification as a Bayesian mixed model analysis of the amino acid dataset provided good support for parts of the base of the Ovalentaria tree. 
CONCLUSIONS Although it does not appear that the limited phylogenetic resolution among the earliest nodes in the Ovalentaria phylogeny is due to a deficiency of data, it may be that both stochastic and systematic error resulting from model misspecification play a role in the poor resolution at the base of the Ovalentaria tree as a Bayesian approach was able to resolve some of the deeper nodes, where the other methods failed.
Collapse
Affiliation(s)
- Ron I Eytan
- Department of Ecology & Evolutionary Biology and Peabody Museum of Natural History, Yale University, New Haven, 06520, CT, USA.
- Department of Marine Biology, Texas A&M University at Galveston, Galveston, 77553, TX, USA.
- Benjamin R Evans
- Department of Ecology & Evolutionary Biology and Peabody Museum of Natural History, Yale University, New Haven, 06520, CT, USA.
- Alex Dornburg
- Department of Ecology & Evolutionary Biology and Peabody Museum of Natural History, Yale University, New Haven, 06520, CT, USA.
- Alan R Lemmon
- Department of Scientific Computing, Florida State University, Dirac Science Library, Tallahassee, 32306, FL, USA.
- Emily Moriarty Lemmon
- Department of Biological Science, Florida State University, Biomedical Research Facility, Tallahassee, 32306, FL, USA.
- Peter C Wainwright
- Department of Evolution & Ecology, University of California, One Shields Avenue, Davis, 95616, CA, USA.
- Thomas J Near
- Department of Ecology & Evolutionary Biology and Peabody Museum of Natural History, Yale University, New Haven, 06520, CT, USA.
|
763
|
Warnow T. Concatenation Analyses in the Presence of Incomplete Lineage Sorting. PLOS CURRENTS 2015; 7:ecurrents.currents.tol.8d41ac0f13d1abedf4c4a59f5d17b1f7. [PMID: 26064786 PMCID: PMC4450984 DOI: 10.1371/currents.tol.8d41ac0f13d1abedf4c4a59f5d17b1f7] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Indexed: 11/19/2022]
Abstract
Incomplete lineage sorting (ILS), modelled by the multi-species coalescent, is a process that results in a gene tree being different from the species tree. Because ILS is expected to occur for at least some loci within genome-scale analyses, the evaluation of species tree estimation methods in the presence of ILS is of great interest. Performance on simulated and biological data has suggested that concatenation analyses can result in the wrong tree with high support under some conditions, and a recent theoretical result by Roch and Steel proved that concatenation using unpartitioned maximum likelihood analysis can be statistically inconsistent in the presence of ILS. In this study, we survey the major species tree estimation methods, including the newly proposed "statistical binning" methods, and discuss their theoretical properties. We also note that there are two interpretations of the term "statistical consistency", and discuss the theoretical results proven under both interpretations.
Affiliation(s)
- Tandy Warnow
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA
|
764
|
Dentinger BTM, Gaya E, O'Brien H, Suz LM, Lachlan R, Díaz-Valderrama JR, Koch RA, Aime MC. Tales from the crypt: genome mining from fungarium specimens improves resolution of the mushroom tree of life. Biol J Linn Soc Lond 2015. [DOI: 10.1111/bij.12553] [Citation(s) in RCA: 65] [Impact Index Per Article: 6.5] [Indexed: 01/17/2023]
Affiliation(s)
- Bryn T. M. Dentinger
- Jodrell Laboratory; Royal Botanic Gardens; Kew TW9 3DS UK
- Institute of Biological, Environmental and Rural Sciences; Aberystwyth University; Cledwyn Building Penglais Aberystwyth SY23 3DD UK
- Ester Gaya
- Jodrell Laboratory; Royal Botanic Gardens; Kew TW9 3DS UK
- Heath O'Brien
- School of Biological Sciences; University of Bristol; Life Sciences Building 24 Tyndall Avenue Bristol BS8 1TQ UK
- Laura M. Suz
- Jodrell Laboratory; Royal Botanic Gardens; Kew TW9 3DS UK
- Robert Lachlan
- Department of Psychology; Queen Mary University of London; Mile End Road London E1 4NS UK
- Jorge R. Díaz-Valderrama
- Department of Botany and Plant Pathology; Purdue University; 915 W. State St. West Lafayette IN 47907 USA
- Rachel A. Koch
- Department of Botany and Plant Pathology; Purdue University; 915 W. State St. West Lafayette IN 47907 USA
- M. Catherine Aime
- Department of Botany and Plant Pathology; Purdue University; 915 W. State St. West Lafayette IN 47907 USA
|
765
|
Giarla TC, Esselstyn JA. The Challenges of Resolving a Rapid, Recent Radiation: Empirical and Simulated Phylogenomics of Philippine Shrews. Syst Biol 2015; 64:727-40. [DOI: 10.1093/sysbio/syv029] [Citation(s) in RCA: 113] [Impact Index Per Article: 11.3] [Received: 02/24/2015] [Accepted: 05/07/2015] [Indexed: 01/30/2023] Open
|
766
|
Liu L, Xi Z, Wu S, Davis CC, Edwards SV. Estimating phylogenetic trees from genome-scale data. Ann N Y Acad Sci 2015; 1360:36-53. [DOI: 10.1111/nyas.12747] [Citation(s) in RCA: 116] [Impact Index Per Article: 11.6] [Indexed: 11/30/2022]
Affiliation(s)
- Liang Liu
- Department of Statistics; University of Georgia; Athens Georgia
- Institute of Bioinformatics; University of Georgia; Athens Georgia
- Zhenxiang Xi
- Department of Organismic and Evolutionary Biology; Harvard University; Cambridge Massachusetts
- Shaoyuan Wu
- Department of Biochemistry and Molecular Biology & Tianjin Key Laboratory of Medical Epigenetics, School of Basic Medical Sciences; Tianjin Medical University; Tianjin China
- Charles C. Davis
- Department of Organismic and Evolutionary Biology; Harvard University; Cambridge Massachusetts
- Scott V. Edwards
- Department of Organismic and Evolutionary Biology; Harvard University; Cambridge Massachusetts
|
767
|
Yang Y, Moore MJ, Brockington SF, Soltis DE, Wong GKS, Carpenter EJ, Zhang Y, Chen L, Yan Z, Xie Y, Sage RF, Covshoff S, Hibberd JM, Nelson MN, Smith SA. Dissecting Molecular Evolution in the Highly Diverse Plant Clade Caryophyllales Using Transcriptome Sequencing. Mol Biol Evol 2015; 32:2001-14. [PMID: 25837578 PMCID: PMC4833068 DOI: 10.1093/molbev/msv081] [Citation(s) in RCA: 126] [Impact Index Per Article: 12.6] [Indexed: 12/11/2022] Open
Abstract
Many phylogenomic studies based on transcriptomes have been limited to “single-copy” genes due to methodological challenges in homology and orthology inferences. Only a relatively small number of studies have explored analyses beyond reconstructing species relationships. We sampled 69 transcriptomes in the hyperdiverse plant clade Caryophyllales and 27 outgroups from annotated genomes across eudicots. Using a combined similarity- and phylogenetic tree-based approach, we recovered 10,960 homolog groups, where each was represented by at least eight ingroup taxa. By decomposing these homolog trees, and taking gene duplications into account, we obtained 17,273 ortholog groups, where each was represented by at least ten ingroup taxa. We reconstructed the species phylogeny using a 1,122-gene data set with a gene occupancy of 92.1%. From the homolog trees, we found that both synonymous and nonsynonymous substitution rates in herbaceous lineages are up to three times as fast as in their woody relatives. This is the first time such a pattern has been shown across thousands of nuclear genes with dense taxon sampling. We also pinpointed regions of the Caryophyllales tree that were characterized by relatively high frequencies of gene duplication, including three previously unrecognized whole-genome duplications. By further combining information from homolog tree topology and synonymous distance between paralog pairs, phylogenetic locations for 13 putative genome duplication events were identified. Genes that experienced the greatest gene family expansion were concentrated among those involved in signal transduction and oxidoreduction, including a cytochrome P450 gene that encodes a key enzyme in the betalain synthesis pathway. Our study demonstrates a new approach for functional phylogenomic analysis in nonmodel species that is based on homolog groups in addition to inferred ortholog groups.
Affiliation(s)
- Ya Yang
- Department of Ecology & Evolutionary Biology, University of Michigan
- Michael J Moore
- Department of Biology, Oberlin College, Science Center K111, Oberlin, OH
- Samuel F Brockington
- Department of Plant Sciences, University of Cambridge, Cambridge, United Kingdom
- Douglas E Soltis
- Department of Biology, University of Florida; Florida Museum of Natural History, University of Florida; Genetics Institute, University of Florida
- Gane Ka-Shu Wong
- Department of Biological Sciences, University of Alberta, Edmonton, AB, Canada; Department of Medicine, University of Alberta, Edmonton, AB, Canada; BGI-Shenzhen, Beishan Industrial Zone, Yantian District, Shenzhen, China
- Eric J Carpenter
- Department of Biological Sciences, University of Alberta, Edmonton, AB, Canada
- Yong Zhang
- BGI-Shenzhen, Beishan Industrial Zone, Yantian District, Shenzhen, China
- Li Chen
- BGI-Shenzhen, Beishan Industrial Zone, Yantian District, Shenzhen, China
- Zhixiang Yan
- BGI-Shenzhen, Beishan Industrial Zone, Yantian District, Shenzhen, China
- Yinlong Xie
- BGI-Shenzhen, Beishan Industrial Zone, Yantian District, Shenzhen, China
- Rowan F Sage
- Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, ON, Canada
- Sarah Covshoff
- Department of Plant Sciences, University of Cambridge, Cambridge, United Kingdom
- Julian M Hibberd
- Department of Plant Sciences, University of Cambridge, Cambridge, United Kingdom
- Matthew N Nelson
- School of Plant Biology, The University of Western Australia, Crawley, WA, Australia
- Stephen A Smith
- Department of Ecology & Evolutionary Biology, University of Michigan
|
768
|
Roch S, Warnow T. On the Robustness to Gene Tree Estimation Error (or lack thereof) of Coalescent-Based Species Tree Methods. Syst Biol 2015; 64:663-76. [PMID: 25813358 DOI: 10.1093/sysbio/syv016] [Citation(s) in RCA: 104] [Impact Index Per Article: 10.4] [Received: 12/11/2014] [Accepted: 03/20/2015] [Indexed: 11/13/2022] Open
Abstract
The estimation of species trees using multiple loci has become increasingly common. Because different loci can have different phylogenetic histories (reflected in different gene tree topologies) owing to multiple biological causes, new approaches to species tree estimation have been developed that take gene tree heterogeneity into account. Among these multiple causes, incomplete lineage sorting (ILS), modeled by the multi-species coalescent, is potentially the most common cause of gene tree heterogeneity, and much of the focus of the recent literature has been on how to estimate species trees in the presence of ILS. Despite progress in developing statistically consistent techniques for estimating species trees when gene trees can differ due to ILS, there is substantial controversy in the systematics community as to whether to use the new coalescent-based methods or the traditional concatenation methods. One of the key issues that has been raised is understanding the impact of gene tree estimation error on coalescent-based methods that operate by combining gene trees. Here we explore the mathematical guarantees of coalescent-based methods when analyzing estimated rather than true gene trees. Our results provide some insight into the differences between the promise of coalescent-based methods in theory and their performance in practice.
Affiliation(s)
- Sebastien Roch
- Department of Mathematics, University of Wisconsin at Madison, 480 Lincoln Dr., Madison, Wisconsin, 53706, USA and Departments of Bioengineering and Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
- Tandy Warnow
- Department of Mathematics, University of Wisconsin at Madison, 480 Lincoln Dr., Madison, Wisconsin, 53706, USA and Departments of Bioengineering and Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
|
769
|
Laumer CE, Hejnol A, Giribet G. Nuclear genomic signals of the 'microturbellarian' roots of platyhelminth evolutionary innovation. eLife 2015; 4:e05503. [PMID: 25764302 PMCID: PMC4398949 DOI: 10.7554/elife.05503] [Citation(s) in RCA: 124] [Impact Index Per Article: 12.4] [Received: 11/05/2014] [Accepted: 03/06/2015] [Indexed: 12/25/2022] Open
Abstract
Flatworms number among the most diverse invertebrate phyla and represent the most biomedically significant branch of the major bilaterian clade Spiralia, but to date, deep evolutionary relationships within this group have been studied using only a single locus (the rRNA operon), leaving the origins of many key clades unclear. In this study, using a survey of genomes and transcriptomes representing all free-living flatworm orders, we provide resolution of platyhelminth interrelationships based on hundreds of nuclear protein-coding genes, exploring phylogenetic signal through concatenation as well as recently developed consensus approaches. These analyses robustly support a modern hypothesis of flatworm phylogeny, one which emphasizes the primacy of the often-overlooked 'microturbellarian' groups in understanding the major evolutionary transitions within Platyhelminthes: perhaps most notably, we propose a novel scenario for the interrelationships between free-living and vertebrate-parasitic flatworms, providing new opportunities to shed light on the origins and biological consequences of parasitism in these iconic invertebrates.
Affiliation(s)
- Christopher E Laumer
- Museum of Comparative Zoology, Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, United States
- Andreas Hejnol
- Sars International Centre for Marine Molecular Biology, University of Bergen, Bergen, Norway
- Gonzalo Giribet
- Museum of Comparative Zoology, Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, United States
|
770
|
Tonini J, Moore A, Stern D, Shcheglovitova M, Ortí G. Concatenation and Species Tree Methods Exhibit Statistically Indistinguishable Accuracy under a Range of Simulated Conditions. PLOS CURRENTS 2015; 7. [PMID: 25901289 PMCID: PMC4391732 DOI: 10.1371/currents.tol.34260cc27551a527b124ec5f6334b6be] [Citation(s) in RCA: 52] [Impact Index Per Article: 5.2] [Indexed: 11/18/2022]
Abstract
Phylogeneticists have long understood that several biological processes can cause a gene tree to disagree with its species tree. In recent years, molecular phylogeneticists have increasingly foregone traditional supermatrix approaches in favor of species tree methods that account for one such source of error, incomplete lineage sorting (ILS). While gene tree-species tree discordance no doubt poses a significant challenge to phylogenetic inference with molecular data, researchers have only recently begun to systematically evaluate the relative accuracy of traditional and ILS-sensitive methods. Here, we report on simulations demonstrating that concatenation can perform as well as or better than methods that attempt to account for sources of error introduced by ILS. Based on these and similar results from other researchers, we argue that concatenation remains a useful component of the phylogeneticist’s toolbox and highlight that phylogeneticists should continue to make explicit comparisons of results produced by contemporaneous and classical methods.
Affiliation(s)
- João Tonini
- Department of Biological Sciences, The George Washington University, Washington, District of Columbia, USA
- Andrew Moore
- Department of Biological Sciences, The George Washington University, Washington, District of Columbia, USA
- David Stern
- Computational Biology Institute, Department of Biological Sciences, The George Washington University, Washington, District of Columbia, USA
- Maryia Shcheglovitova
- Department of Geography & Environmental Systems, University of Maryland Baltimore County, Baltimore, MD, USA
- Guillermo Ortí
- Department of Biological Sciences, The George Washington University, Washington, District of Columbia, USA
|
771
|
Nicholls JA, Pennington RT, Koenen EJM, Hughes CE, Hearn J, Bunnefeld L, Dexter KG, Stone GN, Kidner CA. Using targeted enrichment of nuclear genes to increase phylogenetic resolution in the neotropical rain forest genus Inga (Leguminosae: Mimosoideae). FRONTIERS IN PLANT SCIENCE 2015. [PMID: 26442024 DOI: 10.5061/dryad.r9c12] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/02/2023]
Abstract
Evolutionary radiations are prominent and pervasive across many plant lineages in diverse geographical and ecological settings; in neotropical rainforests there is growing evidence suggesting that a significant fraction of species richness is the result of recent radiations. Understanding the evolutionary trajectories and mechanisms underlying these radiations demands much greater phylogenetic resolution than is currently available for these groups. The neotropical tree genus Inga (Leguminosae) is a good example, with ~300 extant species and a crown age of 2-10 MY, yet over 6 kb of plastid and nuclear DNA sequence data gives only poor phylogenetic resolution among species. Here we explore the use of larger-scale nuclear gene data obtained through targeted enrichment to increase phylogenetic resolution within Inga. Transcriptome data from three Inga species were used to select 264 nuclear loci for targeted enrichment and sequencing. Following quality control to remove probable paralogs from these sequence data, the final dataset comprised 259,313 bases from 194 loci for 24 accessions representing 22 Inga species and an outgroup (Zygia). Bayesian phylogenies reconstructed using either all loci concatenated or a gene-tree/species-tree approach yielded highly resolved phylogenies. We used coalescent approaches to show that the same targeted enrichment data also have significant power to discriminate among alternative within-species population histories within the widespread species I. umbellifera. In either application, targeted enrichment simplifies the informatics challenge of identifying orthologous loci associated with de novo genome sequencing. We conclude that targeted enrichment provides the large volumes of phylogenetically informative sequence data required to resolve relationships within recent plant species radiations, both at the species level and for within-species phylogeographic studies.
Affiliation(s)
- James A Nicholls
- Ashworth Labs, Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Edinburgh, UK; Royal Botanic Garden Edinburgh, Edinburgh, UK
- Erik J M Koenen
- Institute of Systematic Botany, University of Zurich, Zürich, Switzerland
- Colin E Hughes
- Institute of Systematic Botany, University of Zurich, Zürich, Switzerland
- Jack Hearn
- Ashworth Labs, Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Edinburgh, UK
- Lynsey Bunnefeld
- Ashworth Labs, Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Edinburgh, UK
- Kyle G Dexter
- School of Geosciences, University of Edinburgh, Edinburgh, UK
- Graham N Stone
- Ashworth Labs, Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Edinburgh, UK
- Catherine A Kidner
- Royal Botanic Garden Edinburgh, Edinburgh, UK; Institute of Molecular Plant Sciences, School of Biological Sciences, University of Edinburgh, Edinburgh, UK
|
772
|
Liu L, Xi Z, Davis CC. Coalescent Methods Are Robust to the Simultaneous Effects of Long Branches and Incomplete Lineage Sorting. Mol Biol Evol 2014; 32:791-805. [DOI: 10.1093/molbev/msu331] [Citation(s) in RCA: 59] [Impact Index Per Article: 5.4] [Indexed: 11/13/2022] Open
|
773
|
Wickett NJ, Mirarab S, Nguyen N, Warnow T, Carpenter E, Matasci N, Ayyampalayam S, Barker MS, Burleigh JG, Gitzendanner MA, Ruhfel BR, Wafula E, Der JP, Graham SW, Mathews S, Melkonian M, Soltis DE, Soltis PS, Miles NW, Rothfels CJ, Pokorny L, Shaw AJ, DeGironimo L, Stevenson DW, Surek B, Villarreal JC, Roure B, Philippe H, dePamphilis CW, Chen T, Deyholos MK, Baucom RS, Kutchan TM, Augustin MM, Wang J, Zhang Y, Tian Z, Yan Z, Wu X, Sun X, Wong GKS, Leebens-Mack J. Phylotranscriptomic analysis of the origin and early diversification of land plants. Proc Natl Acad Sci U S A 2014; 111:E4859-68. [PMID: 25355905 PMCID: PMC4234587 DOI: 10.1073/pnas.1323926111] [Citation(s) in RCA: 803] [Impact Index Per Article: 73.0] [Indexed: 01/17/2023] Open
Abstract
Reconstructing the origin and evolution of land plants and their algal relatives is a fundamental problem in plant phylogenetics, and is essential for understanding how critical adaptations arose, including the embryo, vascular tissue, seeds, and flowers. Despite advances in molecular systematics, some hypotheses of relationships remain weakly resolved. Inferring deep phylogenies with bouts of rapid diversification can be problematic; however, genome-scale data should significantly increase the number of informative characters for analyses. Recent phylogenomic reconstructions focused on the major divergences of plants have resulted in promising but inconsistent results. One limitation is sparse taxon sampling, likely resulting from the difficulty and cost of data generation. To address this limitation, transcriptome data for 92 streptophyte taxa were generated and analyzed along with 11 published plant genome sequences. Phylogenetic reconstructions were conducted using up to 852 nuclear genes and 1,701,170 aligned sites. Sixty-nine analyses were performed to test the robustness of phylogenetic inferences to permutations of the data matrix or to phylogenetic method, including supermatrix, supertree, and coalescent-based approaches, maximum-likelihood and Bayesian methods, partitioned and unpartitioned analyses, and amino acid versus DNA alignments. Among other results, we find robust support for a sister-group relationship between land plants and one group of streptophyte green algae, the Zygnematophyceae. Strong and robust support for a clade comprising liverworts and mosses is inconsistent with a widely accepted view of early land plant evolution, and suggests that phylogenetic hypotheses used to understand the evolution of fundamental plant traits should be reevaluated.
Affiliation(s)
- Norman J Wickett
- Chicago Botanic Garden, Glencoe, IL 60022; Program in Biological Sciences, Northwestern University, Evanston, IL 60208
- Siavash Mirarab
- Department of Computer Science, University of Texas, Austin, TX 78712
- Nam Nguyen
- Department of Computer Science, University of Texas, Austin, TX 78712
- Tandy Warnow
- Department of Computer Science, University of Texas, Austin, TX 78712
- Eric Carpenter
- Department of Biological Sciences, University of Alberta, Edmonton, AB, Canada T6G 2E9
- Naim Matasci
- iPlant Collaborative, Tucson, AZ 85721; Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ 85721
- Michael S Barker
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ 85721
- Matthew A Gitzendanner
- Department of Biology and Genetics Institute, University of Florida, Gainesville, FL 32611
- Brad R Ruhfel
- Department of Biology and Department of Biological Sciences, Eastern Kentucky University, Richmond, KY 40475; Florida Museum of Natural History, Gainesville, FL 32611
- Eric Wafula
- Department of Biology, Pennsylvania State University, University Park, PA 16803
- Joshua P Der
- Department of Biology, Pennsylvania State University, University Park, PA 16803
- Sarah Mathews
- Arnold Arboretum of Harvard University, Cambridge, MA 02138
- Douglas E Soltis
- Department of Biology and Genetics Institute, University of Florida, Gainesville, FL 32611; Florida Museum of Natural History, Gainesville, FL 32611
- Pamela S Soltis
- Department of Biology and Genetics Institute, University of Florida, Gainesville, FL 32611; Florida Museum of Natural History, Gainesville, FL 32611
- Carl J Rothfels
- Department of Biology, Duke University, Durham, NC 27708; Department of Zoology, University of British Columbia, Vancouver, BC, Canada V6T 1Z4
- Lisa Pokorny
- Department of Biology, Duke University, Durham, NC 27708; Department of Biodiversity and Conservation, Real Jardín Botánico-Consejo Superior de Investigaciones Científicas, 28014 Madrid, Spain
- Barbara Surek
- Botanical Institute, Universität zu Köln, Cologne D-50674, Germany
- Juan Carlos Villarreal
- Department für Biologie, Systematische Botanik und Mykologie, Ludwig-Maximilians-Universität, 80638 Munich, Germany
- Béatrice Roure
- Département de Biochimie, Centre Robert-Cedergren, Université de Montréal, Succursale Centre-Ville, Montreal, QC, Canada H3C 3J7
- Hervé Philippe
- Département de Biochimie, Centre Robert-Cedergren, Université de Montréal, Succursale Centre-Ville, Montreal, QC, Canada H3C 3J7; CNRS, Station d'Ecologie Expérimentale du CNRS, Moulis, 09200, France
- Tao Chen
- Shenzhen Fairy Lake Botanical Garden, The Chinese Academy of Sciences, Shenzhen, Guangdong 518004, China
- Michael K Deyholos
- Department of Biological Sciences, University of Alberta, Edmonton, AB, Canada T6G 2E9
- Regina S Baucom
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI 48109
- Toni M Kutchan
- Donald Danforth Plant Science Center, St. Louis, MO 63132
- Jun Wang
- BGI-Shenzhen, Bei shan Industrial Zone, Yantian District, Shenzhen 518083, China
- Yong Zhang
- CNRS, Station d'Ecologie Expérimentale du CNRS, Moulis, 09200, France
- Zhijian Tian
- BGI-Shenzhen, Bei shan Industrial Zone, Yantian District, Shenzhen 518083, China
- Zhixiang Yan
- BGI-Shenzhen, Bei shan Industrial Zone, Yantian District, Shenzhen 518083, China
- Xiaolei Wu
- BGI-Shenzhen, Bei shan Industrial Zone, Yantian District, Shenzhen 518083, China
- Xiao Sun
- BGI-Shenzhen, Bei shan Industrial Zone, Yantian District, Shenzhen 518083, China
- Gane Ka-Shu Wong
- Department of Biological Sciences, University of Alberta, Edmonton, AB, Canada T6G 2E9; BGI-Shenzhen, Bei shan Industrial Zone, Yantian District, Shenzhen 518083, China; and Department of Medicine, University of Alberta, Edmonton, AB, Canada T6G 2E1
|
774
|
Abstract
Motivation With the rapid growth rate of newly sequenced genomes, species tree inference from multiple genes has become a basic bioinformatics task in comparative and evolutionary biology. However, accurate species tree estimation is difficult in the presence of gene tree discordance, which is often due to incomplete lineage sorting (ILS), modelled by the multi-species coalescent. Several highly accurate coalescent-based species tree estimation methods have been developed over the last decade, including MP-EST. However, the running time for MP-EST increases rapidly as the number of species grows. Results We present divide-and-conquer techniques that improve the scalability of MP-EST so that it can run efficiently on large datasets. Surprisingly, these techniques also improve the accuracy of species trees estimated by MP-EST, as our study shows on a collection of simulated and biological datasets.
|
775
|
Zimmermann T, Mirarab S, Warnow T. BBCA: Improving the scalability of *BEAST using random binning. BMC Genomics 2014; 15 Suppl 6:S11. [PMID: 25572469 PMCID: PMC4239591 DOI: 10.1186/1471-2164-15-s6-s11] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.8] [Indexed: 11/21/2022] Open
Abstract
Background Species tree estimation can be challenging in the presence of gene tree conflict due to incomplete lineage sorting (ILS), which can occur when the time between speciation events is short relative to the population size. Of the many methods that have been developed to estimate species trees in the presence of ILS, *BEAST, a Bayesian method that co-estimates the species tree and gene trees given sequence alignments on multiple loci, has generally been shown to have the best accuracy. However, *BEAST is extremely computationally intensive so that it cannot be used with large numbers of loci; hence, *BEAST is not suitable for genome-scale analyses. Results We present BBCA (boosted binned coalescent-based analysis), a method that can be used with *BEAST (and other such co-estimation methods) to improve scalability. BBCA partitions the loci randomly into subsets, uses *BEAST on each subset to co-estimate the gene trees and species tree for the subset, and then combines the newly estimated gene trees together using MP-EST, a popular coalescent-based summary method. We compare time-restricted versions of BBCA and *BEAST on simulated datasets, and show that BBCA is at least as accurate as *BEAST, and achieves better convergence rates for large numbers of loci. Conclusions Phylogenomic analysis using *BEAST is currently limited to datasets with a small number of loci, and analyses with even just 100 loci can be computationally challenging. BBCA uses a very simple divide-and-conquer approach that makes it possible to use *BEAST on datasets containing hundreds of loci. This study shows that BBCA provides excellent accuracy and is highly scalable.
|
776
|
Mirarab S, Bayzid MS, Warnow T. Evaluating Summary Methods for Multilocus Species Tree Estimation in the Presence of Incomplete Lineage Sorting. Syst Biol 2014; 65:366-80. [PMID: 25164915 DOI: 10.1093/sysbio/syu063] [Citation(s) in RCA: 179] [Impact Index Per Article: 16.3] [Received: 12/16/2013] [Accepted: 08/18/2014] [Indexed: 12/13/2022] Open
Abstract
Species tree estimation is complicated by processes, such as gene duplication and loss and incomplete lineage sorting (ILS), that cause discordance between gene trees and the species tree. Furthermore, while concatenation, a traditional approach to tree estimation, has excellent performance under many conditions, the expectation is that the best accuracy will be obtained through the use of species tree estimation methods that are specifically designed to address gene tree discordance. In this article, we report on a study to evaluate MP-EST, one of the most popular species tree estimation methods designed to address ILS, as well as concatenation under maximum likelihood, the greedy consensus, and two supertree methods (Matrix Representation with Parsimony and Matrix Representation with Likelihood). Our study shows that several factors impact the absolute and relative accuracy of methods, including the number of gene trees, the accuracy of the estimated gene trees, and the amount of ILS. Concatenation can be more accurate than the best summary methods in some cases (mostly when the gene trees have poor phylogenetic signal or when the level of ILS is low), but summary methods are generally more accurate than concatenation when there is an adequate number of sufficiently accurate gene trees. Our study suggests that coalescent-based species tree methods may be key to estimating highly accurate species trees from multiple loci.
Affiliation(s)
- Siavash Mirarab
- Department of Computer Science, University of Texas at Austin, Austin, TX, 78712, USA
- Md Shamsuzzoha Bayzid
- Department of Computer Science, University of Texas at Austin, Austin, TX, 78712, USA
- Tandy Warnow
- Department of Computer Science, University of Texas at Austin, Austin, TX, 78712, USA; and Departments of Bioengineering and Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA.
|