1
St. John K. Review Paper: The Shape of Phylogenetic Treespace. Syst Biol 2017; 66:e83-e94. [PMID: 28173538 PMCID: PMC5837343 DOI: 10.1093/sysbio/syw025] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 11/20/2015] [Revised: 12/16/2015] [Accepted: 03/22/2016] [Indexed: 11/23/2022] Open
Abstract
Trees are a canonical structure for representing evolutionary histories. Many popular criteria used to infer optimal trees are computationally hard, and the number of possible tree shapes grows super-exponentially in the number of taxa. The underlying structure of the spaces of trees yields rich insights that can improve the search for optimal trees, both in accuracy and in running time, and the analysis and visualization of results. We review the past work on analyzing and comparing trees by their shape as well as recent work that incorporates trees with weighted branch lengths.
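To make the super-exponential growth concrete, here is a minimal sketch that counts unrooted, fully resolved, leaf-labelled tree topologies using the standard double-factorial formula (2n-5)!!. The abstract does not say which count it has in mind, so the choice of this particular formula, and the function name `count_unrooted_binary_trees`, are assumptions made for illustration.

```python
def count_unrooted_binary_trees(n_taxa: int) -> int:
    """Count unrooted binary leaf-labelled topologies: (2n-5)!! for n >= 3."""
    if n_taxa < 3:
        raise ValueError("the formula applies to three or more taxa")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n-5 (the product is empty for n = 3).
    for k in range(3, 2 * n_taxa - 5 + 1, 2):
        count *= k
    return count


if __name__ == "__main__":
    for n in (4, 5, 10, 20, 50):
        print(n, count_unrooted_binary_trees(n))
```

Each added taxon multiplies the count by the next odd number, which is why exhaustive enumeration stops being practical after a few dozen taxa.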
Affiliation(s)
- Katherine St. John
- Department of Mathematics and Computer Science, Lehman College, NY 10034, USA
2
Abstract
Finding the optimal evolutionary history for a set of taxa is a challenging computational problem, even when restricting possible solutions to be "tree-like" and focusing on the maximum-parsimony optimality criterion. This has led to much work on using heuristic tree searches to find approximate solutions. We present an approach for finding exact optimal solutions that employs and complements the current heuristic methods for finding optimal trees. Given a set of taxa and a set of aligned sequences of characters, there may be subsets of characters that are compatible, and for each such subset there is an associated (possibly partially resolved) phylogeny with edges corresponding to each character state change. These perfect phylogenies serve as anchor trees for our constrained search space. We show that, for sequences with compatible sites, the parsimony score of any tree T is at least the parsimony score of the anchor trees plus the number of inferred changes between T and the anchor trees. As the maximum-parsimony optimality score is additive, the sum of the lower bounds on compatible character partitions provides a lower bound on the complete alignment of characters. This yields a region in the space of trees within which the best tree is guaranteed to be found; limiting the search for the optimal tree to this region can significantly reduce the number of trees that must be examined in a search of the space of trees. We analyze this method empirically using four different biological data sets as well as surveying 400 data sets from the TreeBASE repository, demonstrating the effectiveness of our technique in reducing the number of steps in exact heuristic searches for trees under the maximum-parsimony optimality criterion.
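The additivity that the abstract relies on can be illustrated with a short sketch. It uses the standard Fitch algorithm to score one character on a binary tree and then sums over sites. This is not the authors' anchor-tree method. The nested-tuple tree encoding, the function names `fitch_score` and `alignment_score`, and the toy alignment are all assumptions made for this example.

```python
from typing import Dict, Set, Tuple, Union

# Hypothetical encoding: a leaf is a taxon name, an internal node is a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_score(tree: Tree, states: Dict[str, str]) -> int:
    """Minimum number of state changes for one character on a binary tree (Fitch)."""

    def visit(node: Tree) -> Tuple[Set[str], int]:
        if isinstance(node, str):
            return {states[node]}, 0
        left_set, left_cost = visit(node[0])
        right_set, right_cost = visit(node[1])
        common = left_set & right_set
        if common:
            # The children can agree on a state, so no extra change is needed here.
            return common, left_cost + right_cost
        # The children disagree: take the union and count one change.
        return left_set | right_set, left_cost + right_cost + 1

    return visit(tree)[1]


def alignment_score(tree: Tree, alignment: Dict[str, str]) -> int:
    """Parsimony score of an alignment: the sum of per-site Fitch scores."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch_score(tree, {taxon: seq[i] for taxon, seq in alignment.items()})
        for i in range(n_sites)
    )


if __name__ == "__main__":
    tree = (("A", "B"), ("C", "D"))
    alignment = {"A": "00", "B": "01", "C": "10", "D": "11"}
    site0 = {t: s[0] for t, s in alignment.items()}
    site1 = {t: s[1] for t, s in alignment.items()}
    # Scoring each site separately and summing gives the whole-alignment score.
    print(fitch_score(tree, site0), fitch_score(tree, site1), alignment_score(tree, alignment))
```

Because the score is a sum over characters, any lower bound derived separately for each partition of characters can be added to bound the score of the whole alignment, which is the property the abstract's argument depends on.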
Affiliation(s)
- Eric Ford
- Department of Computer Science, Graduate Center, CUNY, New York, NY, 10016, USA; Department of Mathematics and Computer Science, Lehman College, CUNY, Bronx, NY, 10468, USA; and Division of Invertebrate Zoology, American Museum of Natural History, New York, NY, 10024, USA
- Katherine St John
- Department of Computer Science, Graduate Center, CUNY, New York, NY, 10016, USA; Department of Mathematics and Computer Science, Lehman College, CUNY, Bronx, NY, 10468, USA; and Division of Invertebrate Zoology, American Museum of Natural History, New York, NY, 10024, USA
- Ward C Wheeler
- Department of Computer Science, Graduate Center, CUNY, New York, NY, 10016, USA; Department of Mathematics and Computer Science, Lehman College, CUNY, Bronx, NY, 10468, USA; and Division of Invertebrate Zoology, American Museum of Natural History, New York, NY, 10024, USA
3
Radel D, Sand A, Steel M. Hide and seek: placing and finding an optimal tree for thousands of homoplasy-rich sequences. Mol Phylogenet Evol 2013; 69:1186-9. [PMID: 23939134 DOI: 10.1016/j.ympev.2013.08.001] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 06/08/2013] [Revised: 07/30/2013] [Accepted: 08/01/2013] [Indexed: 10/26/2022]
Abstract
Finding optimal evolutionary trees from sequence data is typically an intractable problem, and there is usually no way of knowing how close to optimal the best tree from some search truly is. The problem would seem to be particularly acute when we have many taxa and when the data have high levels of homoplasy, in which the individual characters require many changes to fit on the best tree. However, a recent mathematical result has provided a precise tool to generate a small number of high-homoplasy characters for any given tree, so that this tree is provably the optimal tree under the maximum parsimony criterion. This provides, for the first time, a rigorous way to test tree search algorithms on homoplasy-rich data, where we know in advance what the 'best' tree is. In this short note we consider just one search program (TNT) but show that it is able to locate the globally optimal tree correctly for 32,768 taxa, even though the characters in the dataset require, on average, 1148 state-changes each to fit on this tree, and the number of characters is only 57.
Affiliation(s)
- Dietrich Radel
- Biomathematics Research Centre, University of Canterbury, Christchurch, New Zealand
4
Corser CA, McLenachan PA, Pierson MJ, Harrison GLA, Penny D. The Q2 mitochondrial haplogroup in Oceania. PLoS One 2013; 7:e52022. [PMID: 23284859 PMCID: PMC3527380 DOI: 10.1371/journal.pone.0052022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2010] [Accepted: 11/09/2012] [Indexed: 12/03/2022] Open
Abstract
Many details surrounding the origins of the peoples of Oceania remain to be resolved, and as a step towards this we report seven new complete mitochondrial genomes from the Q2a haplogroup, from Papua New Guinea, Fiji and Kiribati. This brings the total to eleven Q2 genomes now available. The Q haplogroup (that includes Q2) is an old and diverse lineage in Near Oceania, and is reasonably common; within our sample set of 430, 97 are of the Q haplogroup. However, only 8 are Q2, and we report 7 here. The tree with all complete Q genomes is proven to be minimal. The dating estimate for the origin of Q2 (around 35 Kya) reinforces the understanding that humans have been in Near Oceania for tens of thousands of years; nevertheless the Polynesian maternal haplogroups remain distinctive. A major focus now, with regard to Polynesian ancestry, is to address the differences and timing of the ‘Melanesian’ contribution to the maternal and paternal lineages as people moved further and further into Remote Oceania. Input from other fields such as anthropology, history and linguistics is required for a better understanding and interpretation of the genetic data.
Affiliation(s)
- Chris A. Corser
- Institute of Molecular BioSciences, Massey University, Palmerston North, New Zealand
- Melanie J. Pierson
- Department of Anthropology, University of Auckland, Auckland, New Zealand
- G. L. Abby Harrison
- Institute of Molecular BioSciences, Massey University, Palmerston North, New Zealand
- Peter Medawar Building for Pathogen Research, Oxford University, Oxford, United Kingdom
- David Penny
- Institute of Molecular BioSciences, Massey University, Palmerston North, New Zealand
5
Saurabh K, Holland BR, Gibb GC, Penny D. Gaps: an elusive source of phylogenetic information. Syst Biol 2012; 61:1075-82. [PMID: 22438330 DOI: 10.1093/sysbio/sys043] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Indexed: 01/28/2023] Open
Affiliation(s)
- Kumar Saurabh
- IMBS/IFS/INR, Massey University, Palmerston North 4442, New Zealand
7
Joly S, Stevens MI, van Vuuren BJ. Haplotype Networks Can Be Misleading in the Presence of Missing Data. Syst Biol 2007; 56:857-62. [DOI: 10.1080/10635150701633153] [Citation(s) in RCA: 56] [Impact Index Per Article: 3.3] [Indexed: 10/22/2022] Open
Affiliation(s)
- Simon Joly
- Allan Wilson Centre for Molecular Ecology and Evolution, Massey University, Private Bag 11222, Palmerston North 4442, New Zealand
- Mark I. Stevens
- Allan Wilson Centre for Molecular Ecology and Evolution, Massey University, Private Bag 11222, Palmerston North 4442, New Zealand
- School of Biological Sciences, Monash University, Clayton 3800 Victoria, Australia
- Bettine Jansen van Vuuren
- DST-NRF Centre of Excellence for Invasion Biology, Department of Botany and Zoology, Stellenbosch University, Private Bag X1, Matieland 7602, South Africa
8
Pierson MJ, Martinez-Arias R, Holland BR, Gemmell NJ, Hurles ME, Penny D. Deciphering past human population movements in Oceania: provably optimal trees of 127 mtDNA genomes. Mol Biol Evol 2006; 23:1966-75. [PMID: 16855009 PMCID: PMC2674580 DOI: 10.1093/molbev/msl063] [Citation(s) in RCA: 53] [Impact Index Per Article: 2.9] [Indexed: 11/13/2022] Open
Abstract
The settlement of the many island groups of Remote Oceania occurred relatively late in prehistory, beginning approximately 3,000 years ago when people sailed eastwards into the Pacific from Near Oceania, where evidence of human settlement dates from as early as 40,000 years ago. Archeological and linguistic analyses have suggested the settlers of Remote Oceania had ancestry in Taiwan, as descendants of a proposed Neolithic expansion that began approximately 5,500 years ago. Other researchers have suggested that the settlers were descendants of peoples from Island Southeast Asia or the existing inhabitants of Near Oceania alone. To explore patterns of maternal descent in Oceania, we have assembled and analyzed a data set of 137 mitochondrial DNA (mtDNA) genomes from Oceania, Australia, Island Southeast Asia, and Taiwan that includes 19 sequences generated for this project. Using the MinMax Squeeze Approach (MMS), we report the consensus network of 165 most parsimonious trees for the Oceanic data set, increasing by many orders of magnitude the numbers of trees for which a provable minimal solution has been found. The new mtDNA sequences highlight the limitations of partial sequencing for assigning sequences to haplogroups and dating recent divergence events. The provably optimal trees found for the entire mtDNA sequences using the MMS method provide a reliable and robust framework for the interpretation of evolutionary relationships and confirm that the female settlers of Remote Oceania descended from both the existing inhabitants of Near Oceania and more recent migrants into the region.
Affiliation(s)
- Melanie J Pierson
- Allan Wilson Centre for Molecular Ecology and Evolution, Massey University, Palmerston North, New Zealand.