1
|
Warnow T, Tabatabaee Y, Evans SN. Advances in Estimating Level-1 Phylogenetic Networks from Unrooted SNPs. J Comput Biol 2025; 32:3-27. [PMID: 39582425 DOI: 10.1089/cmb.2024.0710] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2024] Open
Abstract
We address the problem of how to estimate a phylogenetic network when given single-nucleotide polymorphisms (i.e., SNPs, or bi-allelic markers that have evolved under the infinite sites assumption). We focus on level-1 phylogenetic networks (i.e., networks where the cycles are node-disjoint), since more complex networks are unidentifiable. We provide a polynomial time quartet-based method that we prove correct for reconstructing the semi-directed level-1 phylogenetic network N, if we are given a set of SNPs that covers all the bipartitions of N, even if the ancestral state is not known, provided that the cycles are of length at least 5; we also prove that an algorithm developed by Dan Gusfield in the Journal of Computer and System Sciences in 2005 correctly recovers semi-directed level-1 phylogenetic networks in polynomial time in this case. We present a stochastic model for DNA evolution, and we prove that the two methods (our quartet-based method and Gusfield's method) are statistically consistent estimators of the semi-directed level-1 phylogenetic network. For the case of multi-state homoplasy-free characters, we prove that our quartet-based method correctly constructs semi-directed level-1 networks under the required conditions (all cycles of length at least five), while Gusfield's algorithm cannot be used in that case. These results assume that we have access to an oracle for indicating which sites in the DNA alignment are homoplasy-free, and we show that the methods are robust, under some conditions, to oracle errors.
Collapse
Affiliation(s)
- Tandy Warnow
- Siebel School of Computing and Data Science, University of Illinois Urbana-Champaign, Urbana, Illinois, USA
| | - Yasamin Tabatabaee
- Siebel School of Computing and Data Science, University of Illinois Urbana-Champaign, Urbana, Illinois, USA
| | - Steven N Evans
- Department of Statistics, University of California at Berkeley, Berkeley, California, USA
| |
Collapse
|
2
|
Hellmuth M, Schaller D, Stadler PF. Clustering systems of phylogenetic networks. Theory Biosci 2023; 142:301-358. [PMID: 37573261 PMCID: PMC10564800 DOI: 10.1007/s12064-023-00398-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/28/2022] [Accepted: 06/25/2023] [Indexed: 08/14/2023]
Abstract
Rooted acyclic graphs appear naturally when the phylogenetic relationship of a set X of taxa involves not only speciations but also recombination, horizontal transfer, or hybridization that cannot be captured by trees. A variety of classes of such networks have been discussed in the literature, including phylogenetic, level-1, tree-child, tree-based, galled tree, regular, or normal networks as models of different types of evolutionary processes. Clusters arise in models of phylogeny as the sets [Formula: see text] of descendant taxa of a vertex v. The clustering system [Formula: see text] comprising the clusters of a network N conveys key information on N itself. In the special case of rooted phylogenetic trees, T is uniquely determined by its clustering system [Formula: see text]. Although this is no longer true for networks in general, it is of interest to relate properties of N and [Formula: see text]. Here, we systematically investigate the relationships of several well-studied classes of networks and their clustering systems. The main results are correspondences of classes of networks and clustering systems of the following form: If N is a network of type [Formula: see text], then [Formula: see text] satisfies [Formula: see text], and conversely if [Formula: see text] is a clustering system satisfying [Formula: see text] then there is network N of type [Formula: see text] such that [Formula: see text].This, in turn, allows us to investigate the mutual dependencies between the distinct types of networks in much detail.
Collapse
Affiliation(s)
- Marc Hellmuth
- Department of Mathematics, Faculty of Science, Stockholm University, Albanovägen 28, 10691 Stockholm, Sweden
| | - David Schaller
- Bioinformatics Group, Department of Computer Science and Interdisciplinary Center for Bioinformatics, Leipzig University, Härtelstraße 16-18, 04107 Leipzig, Germany
| | - Peter F. Stadler
- Bioinformatics Group, Department of Computer Science and Interdisciplinary Center for Bioinformatics, Leipzig University, Härtelstraße 16-18, 04107 Leipzig, Germany
- Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, 04103 Leipzig, Germany
- Department of Theoretical Chemistry, University of Vienna, Währingerstraße 17, 1090 Vienna, Austria
- Facultad de Ciencias, Universidad National de Colombia, Bogotá, Colombia
- Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe, NM 87501 USA
| |
Collapse
|
3
|
van Iersel L, Kole S, Moulton V, Nipius L. An algorithm for reconstructing level-2 phylogenetic networks from trinets. INFORM PROCESS LETT 2022. [DOI: 10.1016/j.ipl.2022.106300] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/05/2022]
|
4
|
Kong S, Pons JC, Kubatko L, Wicke K. Classes of explicit phylogenetic networks and their biological and mathematical significance. J Math Biol 2022; 84:47. [PMID: 35503141 DOI: 10.1007/s00285-022-01746-y] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/21/2021] [Revised: 01/18/2022] [Accepted: 03/31/2022] [Indexed: 11/24/2022]
Abstract
The evolutionary relationships among organisms have traditionally been represented using rooted phylogenetic trees. However, due to reticulate processes such as hybridization or lateral gene transfer, evolution cannot always be adequately represented by a phylogenetic tree, and rooted phylogenetic networks that describe such complex processes have been introduced as a generalization of rooted phylogenetic trees. In fact, estimating rooted phylogenetic networks from genomic sequence data and analyzing their structural properties is one of the most important tasks in contemporary phylogenetics. Over the last two decades, several subclasses of rooted phylogenetic networks (characterized by certain structural constraints) have been introduced in the literature, either to model specific biological phenomena or to enable tractable mathematical and computational analyses. In the present manuscript, we provide a thorough review of these network classes, as well as provide a biological interpretation of the structural constraints underlying these networks where possible. In addition, we discuss how imposing structural constraints on the network topology can be used to address the scalability and identifiability challenges faced in the estimation of phylogenetic networks from empirical data.
Collapse
Affiliation(s)
- Sungsik Kong
- Department of Evolution, Ecology, and Organismal Biology, The Ohio State University, Columbus, OH, USA
| | - Joan Carles Pons
- Department of Mathematics and Computer Science, University of the Balearic Islands, Palma, 07122, Spain
| | - Laura Kubatko
- Department of Evolution, Ecology, and Organismal Biology, The Ohio State University, Columbus, OH, USA.,Department of Statistics, The Ohio State University, Columbus, OH, USA
| | - Kristina Wicke
- Department of Mathematics, The Ohio State University, Columbus, OH, USA.
| |
Collapse
|
5
|
Gross E, van Iersel L, Janssen R, Jones M, Long C, Murakami Y. Distinguishing level-1 phylogenetic networks on the basis of data generated by Markov processes. J Math Biol 2021; 83:32. [PMID: 34482446 PMCID: PMC8418599 DOI: 10.1007/s00285-021-01653-8] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/12/2020] [Revised: 07/07/2021] [Accepted: 08/16/2021] [Indexed: 11/24/2022]
Abstract
Phylogenetic networks can represent evolutionary events that cannot be described by phylogenetic trees. These networks are able to incorporate reticulate evolutionary events such as hybridization, introgression, and lateral gene transfer. Recently, network-based Markov models of DNA sequence evolution have been introduced along with model-based methods for reconstructing phylogenetic networks. For these methods to be consistent, the network parameter needs to be identifiable from data generated under the model. Here, we show that the semi-directed network parameter of a triangle-free, level-1 network model with any fixed number of reticulation vertices is generically identifiable under the Jukes–Cantor, Kimura 2-parameter, or Kimura 3-parameter constraints.
Collapse
Affiliation(s)
- Elizabeth Gross
- Department of Mathematics, University of Hawai'i at Mānoa, 2565 McCarthy Mall, Honolulu, HI, 96822, USA
| | - Leo van Iersel
- Delft Institute of Applied Mathematics, Delft University of Technology, Mekelweg 4, 2628 CD, Delft, The Netherlands
| | - Remie Janssen
- Delft Institute of Applied Mathematics, Delft University of Technology, Mekelweg 4, 2628 CD, Delft, The Netherlands
| | - Mark Jones
- Delft Institute of Applied Mathematics, Delft University of Technology, Mekelweg 4, 2628 CD, Delft, The Netherlands
| | - Colby Long
- The College of Wooster, 1189 Beall Avenue, Wooster, OH, 44691, USA
| | - Yukihiro Murakami
- Delft Institute of Applied Mathematics, Delft University of Technology, Mekelweg 4, 2628 CD, Delft, The Netherlands.
| |
Collapse
|
6
|
Trinets encode orchard phylogenetic networks. J Math Biol 2021; 83:28. [PMID: 34420100 DOI: 10.1007/s00285-021-01654-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/03/2020] [Revised: 05/29/2021] [Accepted: 08/16/2021] [Indexed: 10/20/2022]
Abstract
Rooted triples, rooted binary phylogenetic trees on three leaves, are sufficient to encode rooted binary phylogenetic trees. That is, if [Formula: see text] and [Formula: see text] are rooted binary phylogenetic X-trees that infer the same set of rooted triples, then [Formula: see text] and [Formula: see text] are isomorphic. However, in general, this sufficiency does not extend to rooted binary phylogenetic networks. In this paper, we show that trinets, phylogenetic network analogues of rooted triples, are sufficient to encode rooted binary orchard networks. Rooted binary orchard networks naturally generalise rooted binary tree-child networks. Moreover, we present a polynomial-time algorithm for building a rooted binary orchard network from its set of trinets. As a consequence, this algorithm affirmatively answers a previously-posed question of whether there is a polynomial-time algorithm for building a rooted binary tree-child network from the set of trinets it infers.
Collapse
|
7
|
Caterpillars on three and four leaves are sufficient to reconstruct binary normal networks. J Math Biol 2020; 81:961-980. [DOI: 10.1007/s00285-020-01533-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/03/2020] [Revised: 07/15/2020] [Indexed: 10/23/2022]
|
8
|
A class of phylogenetic networks reconstructable from ancestral profiles. Math Biosci 2019; 313:33-40. [DOI: 10.1016/j.mbs.2019.04.009] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/13/2019] [Revised: 04/30/2019] [Accepted: 04/30/2019] [Indexed: 11/18/2022]
|
9
|
Jansson J, Rajaby R, Sung WK. An Efficient Algorithm for the Rooted Triplet Distance Between Galled Trees. J Comput Biol 2019; 26:893-907. [PMID: 30990336 DOI: 10.1089/cmb.2019.0033] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open
Abstract
The previous fastest algorithm for computing the rooted triplet distance between two input galled trees (i.e., phylogenetic networks whose cycles are vertex-disjoint) runs in [Formula: see text] time, where n is the cardinality of the leaf label set. In this article, we present an [Formula: see text]-time solution. Our strategy is to transform the input so that the answer can be obtained by applying an existing [Formula: see text]-time algorithm for the simpler case of two phylogenetic trees a constant number of times. The new algorithm has been implemented, and applying it to pairs of randomly generated galled trees with up to [Formula: see text] leaves confirms that it is fast in practice.
Collapse
Affiliation(s)
- Jesper Jansson
- Department of Computing, The Hong Kong Polytechnic University, Hung Hom, Hong Kong
| | - Ramesh Rajaby
- School of Computing, National University of Singapore, Singapore, Singapore.,NUS Graduate School for Integrative Sciences and Engineering, National University of Singapore, Singapore, Singapore
| | - Wing-Kin Sung
- School of Computing, National University of Singapore, Singapore, Singapore.,Genome Institute of Singapore, Singapore, Singapore
| |
Collapse
|
10
|
Cardona G, Pons JC. Reconstruction of LGT networks from tri-LGT-nets. J Math Biol 2017; 75:1669-1692. [PMID: 28451760 DOI: 10.1007/s00285-017-1131-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/08/2016] [Revised: 02/16/2017] [Indexed: 01/06/2023]
Abstract
Phylogenetic networks have gained attention from the scientific community due to the evidence of the existence of evolutionary events that cannot be represented using trees. A variant of phylogenetic networks, called LGT networks, models specifically lateral gene transfer events, which cannot be properly represented with generic phylogenetic networks. In this paper we treat the problem of the reconstruction of LGT networks from substructures induced by three leaves, which we call tri-LGT-nets. We first restrict ourselves to a class of LGT networks that are both mathematically treatable and biologically significant, called BAN-LGT networks. Then, we study the decomposition of such networks in subnetworks with three leaves and ask whether or not this decomposition determines the network. The answer to this question is negative, but if we further impose time-consistency (species involved in a later gene transfer must coexist) the answer is affirmative, up to some redundancy that can never be recovered but is fully characterized.
Collapse
Affiliation(s)
- Gabriel Cardona
- Department of Mathematics and Computer Science, University of the Balearic Islands, 07122, Palma, Spain.
| | - Joan Carles Pons
- Department of Mathematics and Computer Science, University of the Balearic Islands, 07122, Palma, Spain
| |
Collapse
|
11
|
Abstract
The need for structures capable of accommodating complex evolutionary signals such as those found in, for example, wheat has fueled research into phylogenetic networks. Such structures generalize the standard model of a phylogenetic tree by also allowing for cycles and have been introduced in rooted and unrooted form. In contrast to phylogenetic trees or their unrooted versions, rooted phylogenetic networks are notoriously difficult to understand. To help alleviate this, recent work on them has also centered on their "uprooted" versions. By focusing on such graphs and the combinatorial concept of a split system which underpins an unrooted phylogenetic network, we show that not only can a so-called (uprooted) 1-nested network N be obtained from the Buneman graph (sometimes also called a median network) associated with the split system [Formula: see text] induced on the set of leaves of N but also that that graph is, in a well-defined sense, optimal. Along the way, we establish the 1-nested analogue of the fundamental "splits equivalence theorem" for phylogenetic trees and characterize maximal circular split systems.
Collapse
Affiliation(s)
- P. Gambette
- LIGM (UMR 8049), UPEM, CNRS, ESIEE, ENPC, Université Paris-Est, 77454 Marne-la-Vallée, France
| | - K. T. Huber
- School of Computing Sciences, University of East Anglia, Norwich, UK
| | - G. E. Scholz
- School of Computing Sciences, University of East Anglia, Norwich, UK
| |
Collapse
|
12
|
Abstract
The rooted triplet distance is a measure of the dissimilarity of two phylogenetic trees with identical leaf label sets. An algorithm by Brodal et al. that computes it in [Formula: see text] time and [Formula: see text] space, where n is the number of leaf labels, has recently been implemented in the software package tqDist. In this article, we show that replacing the hierarchical decomposition tree used in Brodal et al.'s algorithm by a centroid paths-based data structure yields an [Formula: see text]-time and [Formula: see text]-space algorithm that, although slower in theory, is faster in practice as well as less memory consuming. Simulations for values of n up to 4,000,000 support our claims experimentally.
Collapse
Affiliation(s)
- Jesper Jansson
- 1 Laboratory of Mathematical Bioinformatics, Institute for Chemical Research, Kyoto University , Kyoto, Japan
| | - Ramesh Rajaby
- 2 NUS Graduate School for Integrative Sciences and Engineering, National University of Singapore , Singapore, Singapore .,3 University of Milano-Bicocca , Milano, Italy
| |
Collapse
|
13
|
On the challenge of reconstructing level-1 phylogenetic networks from triplets and clusters. J Math Biol 2016; 74:1729-1751. [PMID: 27800561 PMCID: PMC5420025 DOI: 10.1007/s00285-016-1068-3] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/26/2014] [Revised: 04/05/2016] [Indexed: 11/16/2022]
Abstract
Phylogenetic networks have gained prominence over the years due to their ability to represent complex non-treelike evolutionary events such as recombination or hybridization. Popular combinatorial objects used to construct them are triplet systems and cluster systems, the motivation being that any network N induces a triplet system \documentclass[12pt]{minimal}
\usepackage{amsmath}
\usepackage{wasysym}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{amsbsy}
\usepackage{mathrsfs}
\usepackage{upgreek}
\setlength{\oddsidemargin}{-69pt}
\begin{document}$$\mathcal R(N)$$\end{document}R(N) and a softwired cluster system \documentclass[12pt]{minimal}
\usepackage{amsmath}
\usepackage{wasysym}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{amsbsy}
\usepackage{mathrsfs}
\usepackage{upgreek}
\setlength{\oddsidemargin}{-69pt}
\begin{document}$$\mathcal S(N)$$\end{document}S(N). Since in real-world studies it cannot be guaranteed that all triplets/softwired clusters induced by a network are available, it is of particular interest to understand whether subsets of \documentclass[12pt]{minimal}
\usepackage{amsmath}
\usepackage{wasysym}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{amsbsy}
\usepackage{mathrsfs}
\usepackage{upgreek}
\setlength{\oddsidemargin}{-69pt}
\begin{document}$$\mathcal R(N)$$\end{document}R(N) or \documentclass[12pt]{minimal}
\usepackage{amsmath}
\usepackage{wasysym}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{amsbsy}
\usepackage{mathrsfs}
\usepackage{upgreek}
\setlength{\oddsidemargin}{-69pt}
\begin{document}$$\mathcal S(N)$$\end{document}S(N) allow one to uniquely reconstruct the underlying network N. Here we show that even within the highly restricted yet biologically interesting space of level-1 phylogenetic networks it is not always possible to uniquely reconstruct a level-1 network N, even when all triplets in \documentclass[12pt]{minimal}
\usepackage{amsmath}
\usepackage{wasysym}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{amsbsy}
\usepackage{mathrsfs}
\usepackage{upgreek}
\setlength{\oddsidemargin}{-69pt}
\begin{document}$$\mathcal R(N)$$\end{document}R(N) or all clusters in \documentclass[12pt]{minimal}
\usepackage{amsmath}
\usepackage{wasysym}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{amsbsy}
\usepackage{mathrsfs}
\usepackage{upgreek}
\setlength{\oddsidemargin}{-69pt}
\begin{document}$$\mathcal S(N)$$\end{document}S(N) are available. On the positive side, we introduce a reasonably large subclass of level-1 networks the members of which are uniquely determined by their induced triplet/softwired cluster systems. Along the way, we also establish various enumerative results, both positive and negative, including results which show that certain special subclasses of level-1 networks N can be uniquely reconstructed from proper subsets of \documentclass[12pt]{minimal}
\usepackage{amsmath}
\usepackage{wasysym}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{amsbsy}
\usepackage{mathrsfs}
\usepackage{upgreek}
\setlength{\oddsidemargin}{-69pt}
\begin{document}$$\mathcal R(N)$$\end{document}R(N) and \documentclass[12pt]{minimal}
\usepackage{amsmath}
\usepackage{wasysym}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{amsbsy}
\usepackage{mathrsfs}
\usepackage{upgreek}
\setlength{\oddsidemargin}{-69pt}
\begin{document}$$\mathcal S(N)$$\end{document}S(N). We anticipate these results to be of use in the design of algorithms for phylogenetic network inference.
Collapse
|
14
|
Oldman J, Wu T, van Iersel L, Moulton V. TriLoNet: Piecing Together Small Networks to Reconstruct Reticulate Evolutionary Histories. Mol Biol Evol 2016; 33:2151-62. [PMID: 27189565 DOI: 10.1093/molbev/msw068] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open
Abstract
Phylogenetic networks are a generalization of evolutionary trees that can be used to represent reticulate processes such as hybridization and recombination. Here, we introduce a new approach called TriLoNet (Trinet Level- one Network algorithm) to construct such networks directly from sequence alignments which works by piecing together smaller phylogenetic networks. More specifically, using a bottom up approach similar to Neighbor-Joining, TriLoNet constructs level-1 networks (networks that are somewhat more general than trees) from smaller level-1 networks on three taxa. In simulations, we show that TriLoNet compares well with Lev1athan, a method for reconstructing level-1 networks from three-leaved trees. In particular, in simulations we find that Lev1athan tends to generate networks that overestimate the number of reticulate events as compared with those generated by TriLoNet. We also illustrate TriLoNet's applicability using simulated and real sequence data involving recombination, demonstrating that it has the potential to reconstruct informative reticulate evolutionary histories. TriLoNet has been implemented in JAVA and is freely available at https://www.uea.ac.uk/computing/TriLoNet.
Collapse
Affiliation(s)
- James Oldman
- School of Computing Sciences, University of East Anglia, Norwich, United Kingdom
| | - Taoyang Wu
- School of Computing Sciences, University of East Anglia, Norwich, United Kingdom
| | - Leo van Iersel
- Delft Institute of Applied Mathematics, Delft University of Technology, Delft, The Netherlands
| | - Vincent Moulton
- School of Computing Sciences, University of East Anglia, Norwich, United Kingdom
| |
Collapse
|
15
|
Abstract
Background Several phylogenomic analyses have recently demonstrated the need to account simultaneously for incomplete lineage sorting (ILS) and hybridization when inferring a species phylogeny. A maximum likelihood approach was introduced recently for inferring species phylogenies in the presence of both processes, and showed very good results. However, computing the likelihood of a model in this case is computationally infeasible except for very small data sets. Results Inspired by recent work on the pseudo-likelihood of species trees based on rooted triples, we introduce the pseudo-likelihood of a phylogenetic network, which, when combined with a search heuristic, provides a statistical method for phylogenetic network inference in the presence of ILS. Unlike trees, networks are not always uniquely encoded by a set of rooted triples. Therefore, even when given sufficient data, the method might converge to a network that is equivalent under rooted triples to the true one, but not the true one itself. The method is computationally efficient and has produced very good results on the data sets we analyzed. The method is implemented in PhyloNet, which is publicly available in open source. Conclusions Maximum pseudo-likelihood allows for inferring species phylogenies in the presence of hybridization and ILS, while scaling to much larger data sets than is currently feasible under full maximum likelihood. The nonuniqueness of phylogenetic networks encoded by a system of rooted triples notwithstanding, the proposed method infers the correct network under certain scenarios, and provides candidates for further exploration under other criteria and/or data in other scenarios.
Collapse
|
16
|
Huber KT, Linz S, Moulton V, Wu T. Spaces of phylogenetic networks from generalized nearest-neighbor interchange operations. J Math Biol 2015; 72:699-725. [PMID: 26037483 DOI: 10.1007/s00285-015-0899-7] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/05/2014] [Revised: 05/04/2015] [Indexed: 11/29/2022]
Abstract
Phylogenetic networks are a generalization of evolutionary or phylogenetic trees that are used to represent the evolution of species which have undergone reticulate evolution. In this paper we consider spaces of such networks defined by some novel local operations that we introduce for converting one phylogenetic network into another. These operations are modeled on the well-studied nearest-neighbor interchange operations on phylogenetic trees, and lead to natural generalizations of the tree spaces that have been previously associated to such operations. We present several results on spaces of some relatively simple networks, called level-1 networks, including the size of the neighborhood of a fixed network, and bounds on the diameter of the metric defined by taking the smallest number of operations required to convert one network into another. We expect that our results will be useful in the development of methods for systematically searching for optimal phylogenetic networks using, for example, likelihood and Bayesian approaches.
Collapse
Affiliation(s)
- Katharina T Huber
- School of Computing Sciences, University of East Anglia, Norwich, NR4 7TJ, UK.
| | - Simone Linz
- Department of Computer Science, University of Auckland, Auckland, New Zealand.
| | - Vincent Moulton
- School of Computing Sciences, University of East Anglia, Norwich, NR4 7TJ, UK.
| | - Taoyang Wu
- School of Computing Sciences, University of East Anglia, Norwich, NR4 7TJ, UK.
| |
Collapse
|
17
|
Pardi F, Scornavacca C. Reconstructible phylogenetic networks: do not distinguish the indistinguishable. PLoS Comput Biol 2015; 11:e1004135. [PMID: 25849429 PMCID: PMC4388854 DOI: 10.1371/journal.pcbi.1004135] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/25/2014] [Accepted: 01/19/2015] [Indexed: 12/21/2022] Open
Abstract
Phylogenetic networks represent the evolution of organisms that have undergone reticulate events, such as recombination, hybrid speciation or lateral gene transfer. An important way to interpret a phylogenetic network is in terms of the trees it displays, which represent all the possible histories of the characters carried by the organisms in the network. Interestingly, however, different networks may display exactly the same set of trees, an observation that poses a problem for network reconstruction: from the perspective of many inference methods such networks are "indistinguishable". This is true for all methods that evaluate a phylogenetic network solely on the basis of how well the displayed trees fit the available data, including all methods based on input data consisting of clades, triples, quartets, or trees with any number of taxa, and also sequence-based approaches such as popular formalisations of maximum parsimony and maximum likelihood for networks. This identifiability problem is partially solved by accounting for branch lengths, although this merely reduces the frequency of the problem. Here we propose that network inference methods should only attempt to reconstruct what they can uniquely identify. To this end, we introduce a novel definition of what constitutes a uniquely reconstructible network. For any given set of indistinguishable networks, we define a canonical network that, under mild assumptions, is unique and thus representative of the entire set. Given data that underwent reticulate evolution, only the canonical form of the underlying phylogenetic network can be uniquely reconstructed. While on the methodological side this will imply a drastic reduction of the solution space in network inference, for the study of reticulate evolution this is a fundamental limitation that will require an important change of perspective when interpreting phylogenetic networks.
Collapse
Affiliation(s)
- Fabio Pardi
- Laboratoire d’Informatique, de Robotique et de Microélectronique de Montpellier (LIRMM, UMR 5506) CNRS, Université de Montpellier, France
- Institut de Biologie Computationnelle, Montpellier, France
| | - Celine Scornavacca
- Institut des Sciences de l’Evolution de Montpellier (ISE-M, UMR 5554) CNRS, IRD, Université de Montpellier, France
- Institut de Biologie Computationnelle, Montpellier, France
| |
Collapse
|
18
|
Huber KT, Van Iersel L, Moulton V, Wu T. How much information is needed to infer reticulate evolutionary histories? Syst Biol 2014; 64:102-11. [PMID: 25236959 PMCID: PMC4265143 DOI: 10.1093/sysbio/syu076] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/19/2022] Open
Abstract
Phylogenetic networks are a generalization of evolutionary trees and are an important tool for analyzing reticulate evolutionary histories. Recently, there has been great interest in developing new methods to construct rooted phylogenetic networks, that is, networks whose internal vertices correspond to hypothetical ancestors, whose leaves correspond to sampled taxa, and in which vertices with more than one parent correspond to taxa formed by reticulate evolutionary events such as recombination or hybridization. Several methods for constructing evolutionary trees use the strategy of building up a tree from simpler building blocks (such as triplets or clusters), and so it is natural to look for ways to construct networks from smaller networks. In this article, we shall demonstrate a fundamental issue with this approach. Namely, we show that even if we are given all of the subnetworks induced on all proper subsets of the leaves of some rooted phylogenetic network, we still do not have all of the information required to completely determine that network. This implies that even if all of the building blocks for some reticulate evolutionary history were to be taken as the input for any given network building method, the method might still output an incorrect history. We also discuss some potential consequences of this result for constructing phylogenetic networks.
Collapse
Affiliation(s)
- Katharina T Huber
- School of Computing Sciences, University of East Anglia, Norwich, UK, and Centrum Wiskunde & Informatica (CWI), Amsterdam, Netherlands
| | - Leo Van Iersel
- School of Computing Sciences, University of East Anglia, Norwich, UK, and Centrum Wiskunde & Informatica (CWI), Amsterdam, Netherlands
| | - Vincent Moulton
- School of Computing Sciences, University of East Anglia, Norwich, UK, and Centrum Wiskunde & Informatica (CWI), Amsterdam, Netherlands
| | - Taoyang Wu
- School of Computing Sciences, University of East Anglia, Norwich, UK, and Centrum Wiskunde & Informatica (CWI), Amsterdam, Netherlands
| |
Collapse
|
19
|
Jansson J, Lingas A. Computing the rooted triplet distance between galled trees by counting triangles. ACTA ACUST UNITED AC 2014. [DOI: 10.1016/j.jda.2013.10.002] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/26/2022]
|
20
|
Trinets encode tree-child and level-2 phylogenetic networks. J Math Biol 2013; 68:1707-29. [PMID: 23680992 DOI: 10.1007/s00285-013-0683-5] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/29/2012] [Revised: 03/20/2013] [Indexed: 10/26/2022]
Abstract
Phylogenetic networks generalize evolutionary trees, and are commonly used to represent evolutionary histories of species that undergo reticulate evolutionary processes such as hybridization, recombination and lateral gene transfer. Recently, there has been great interest in trying to develop methods to construct rooted phylogenetic networks from triplets, that is rooted trees on three species. However, although triplets determine or encode rooted phylogenetic trees, they do not in general encode rooted phylogenetic networks, which is a potential issue for any such method. Motivated by this fact, Huber and Moulton recently introduced trinets as a natural extension of rooted triplets to networks. In particular, they showed that [Formula: see text] phylogenetic networks are encoded by their trinets, and also conjectured that all "recoverable" rooted phylogenetic networks are encoded by their trinets. Here we prove that recoverable binary level-2 networks and binary tree-child networks are also encoded by their trinets. To do this we prove two decomposition theorems based on trinets which hold for all recoverable binary rooted phylogenetic networks. Our results provide some additional evidence in support of the conjecture that trinets encode all recoverable rooted phylogenetic networks, and could also lead to new approaches to construct phylogenetic networks from trinets.
Collapse
|