1
|
Francis A, Marchei D, Steel M. Phylogenetic network classes through the lens of expanding covers. J Math Biol 2024; 88:58. [PMID: 38584237 PMCID: PMC10999392 DOI: 10.1007/s00285-024-02075-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/23/2023] [Revised: 11/18/2023] [Accepted: 03/05/2024] [Indexed: 04/09/2024]
Abstract
It was recently shown that a large class of phylogenetic networks, the 'labellable' networks, is in bijection with the set of 'expanding' covers of finite sets. In this paper, we show how several prominent classes of phylogenetic networks can be characterised purely in terms of properties of their associated covers. These classes include the tree-based, tree-child, orchard, tree-sibling, and normal networks. In the opposite direction, we give an example of how a restriction on the set of expanding covers can define a new class of networks, which we call 'spinal' phylogenetic networks.
Collapse
Affiliation(s)
- Andrew Francis
- Centre for Research in Mathematics and Data Science, Western Sydney University, Sydney, Australia.
- School of Mathematics and Statistics, University of New South Wales, Sydney, Australia.
| | - Daniele Marchei
- Centre for Research in Mathematics and Data Science, Western Sydney University, Sydney, Australia
- Computer Science, University of Camerino, Camerino, Italy
| | - Mike Steel
- Biomathematics Research Centre, University of Canterbury, Christchurch, New Zealand
| |
Collapse
|
2
|
Abstract
Phylogenetic networks are mathematical representations of evolutionary history that are able to capture both tree-like evolutionary processes (speciations) and non-tree-like 'reticulate' processes such as hybridization or horizontal gene transfer. The additional complexity that comes with this capacity, however, makes networks harder to infer from data, and more complicated to work with as mathematical objects. In this paper, we define a new, large class of phylogenetic networks, that we call labellable, and show that they are in bijection with the set of 'expanding covers' of finite sets. This correspondence is a generalisation of the encoding of phylogenetic forests by partitions of finite sets. Labellable networks can be characterised by a simple combinatorial condition, and we describe the relationship between this large class and other commonly studied classes. Furthermore, we show that all phylogenetic networks have a quotient network that is labellable.
Collapse
Affiliation(s)
- Andrew Francis
- Centre for Research in Mathematics and Data Science, Western Sydney University, Penrith, Australia.
| | - Mike Steel
- Biomathematics Research Centre, Canterbury University, Christchurch, New Zealand
| |
Collapse
|
3
|
van Iersel L, Janssen R, Jones M, Murakami Y. Orchard Networks are Trees with Additional Horizontal Arcs. Bull Math Biol 2022; 84:76. [PMID: 35727410 PMCID: PMC9213324 DOI: 10.1007/s11538-022-01037-z] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/20/2021] [Accepted: 05/30/2022] [Indexed: 11/30/2022]
Abstract
Phylogenetic networks are used in biology to represent evolutionary histories. The class of orchard phylogenetic networks was recently introduced for their computational benefits, without any biological justification. Here, we show that orchard networks can be interpreted as trees with additional horizontal arcs. Therefore, they are closely related to tree-based networks, where the difference is that in tree-based networks the additional arcs do not need to be horizontal. Then, we use this new characterization to show that the space of orchard networks on n leaves with k reticulations is connected under the rNNI rearrangement move with diameter \documentclass[12pt]{minimal}
\usepackage{amsmath}
\usepackage{wasysym}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{amsbsy}
\usepackage{mathrsfs}
\usepackage{upgreek}
\setlength{\oddsidemargin}{-69pt}
\begin{document}$$O(kn+n\log (n))$$\end{document}O(kn+nlog(n)).
Collapse
Affiliation(s)
- Leo van Iersel
- Delft Institute of Applied Mathematics, Delft University of Technology, Mekelweg 4, 2628 CD, Delft, South Holland, The Netherlands
| | - Remie Janssen
- Delft Institute of Applied Mathematics, Delft University of Technology, Mekelweg 4, 2628 CD, Delft, South Holland, The Netherlands
| | - Mark Jones
- Delft Institute of Applied Mathematics, Delft University of Technology, Mekelweg 4, 2628 CD, Delft, South Holland, The Netherlands
| | - Yukihiro Murakami
- Delft Institute of Applied Mathematics, Delft University of Technology, Mekelweg 4, 2628 CD, Delft, South Holland, The Netherlands.
| |
Collapse
|
4
|
Kong S, Pons JC, Kubatko L, Wicke K. Classes of explicit phylogenetic networks and their biological and mathematical significance. J Math Biol 2022; 84:47. [PMID: 35503141 DOI: 10.1007/s00285-022-01746-y] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/21/2021] [Revised: 01/18/2022] [Accepted: 03/31/2022] [Indexed: 11/24/2022]
Abstract
The evolutionary relationships among organisms have traditionally been represented using rooted phylogenetic trees. However, due to reticulate processes such as hybridization or lateral gene transfer, evolution cannot always be adequately represented by a phylogenetic tree, and rooted phylogenetic networks that describe such complex processes have been introduced as a generalization of rooted phylogenetic trees. In fact, estimating rooted phylogenetic networks from genomic sequence data and analyzing their structural properties is one of the most important tasks in contemporary phylogenetics. Over the last two decades, several subclasses of rooted phylogenetic networks (characterized by certain structural constraints) have been introduced in the literature, either to model specific biological phenomena or to enable tractable mathematical and computational analyses. In the present manuscript, we provide a thorough review of these network classes, as well as provide a biological interpretation of the structural constraints underlying these networks where possible. In addition, we discuss how imposing structural constraints on the network topology can be used to address the scalability and identifiability challenges faced in the estimation of phylogenetic networks from empirical data.
Collapse
Affiliation(s)
- Sungsik Kong
- Department of Evolution, Ecology, and Organismal Biology, The Ohio State University, Columbus, OH, USA
| | - Joan Carles Pons
- Department of Mathematics and Computer Science, University of the Balearic Islands, Palma, 07122, Spain
| | - Laura Kubatko
- Department of Evolution, Ecology, and Organismal Biology, The Ohio State University, Columbus, OH, USA.,Department of Statistics, The Ohio State University, Columbus, OH, USA
| | - Kristina Wicke
- Department of Mathematics, The Ohio State University, Columbus, OH, USA.
| |
Collapse
|
5
|
Abstract
The Robinson-Foulds (RF) distance, one of the most widely used metrics for comparing phylogenetic trees, has the advantage of being intuitive, with a natural interpretation in terms of common splits, and it can be computed in linear time, but it has a very low resolution, and it may become trivial for phylogenetic trees with overlapping taxa, that is, phylogenetic trees that share some but not all of their leaf labels. In this article, we study the properties of the Generalized Robinson-Foulds (GRF) distance, a recently proposed metric for comparing any structures that can be described by multisets of multisets of labels, when applied to rooted phylogenetic trees with overlapping taxa, which are described by sets of clusters, that is, by sets of sets of labels. We show that the GRF distance has a very high resolution, it can also be computed in linear time, and it is not (uniformly) equivalent to the RF distance.
Collapse
Affiliation(s)
- Mercè Llabrés
- Department of Mathematics and Computer Science, University of the Balearic Islands, Palma de Mallorca, Spain
- Balearic Islands Health Research Institute (IdISBa), Palma, Spain
| | - Francesc Rosselló
- Department of Mathematics and Computer Science, University of the Balearic Islands, Palma de Mallorca, Spain
- Balearic Islands Health Research Institute (IdISBa), Palma, Spain
| | - Gabriel Valiente
- Department of Computer Science, Technical University of Catalonia, Barcelona, Spain
| |
Collapse
|
6
|
Bai A, Erdős PL, Semple C, Steel M. Defining phylogenetic networks using ancestral profiles. Math Biosci 2021; 332:108537. [PMID: 33453221 DOI: 10.1016/j.mbs.2021.108537] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/30/2020] [Revised: 01/06/2021] [Accepted: 01/06/2021] [Indexed: 11/16/2022]
Abstract
Rooted phylogenetic networks provide a more complete representation of the ancestral relationship between species than phylogenetic trees when reticulate evolutionary processes are at play. One way to reconstruct a phylogenetic network is to consider its 'ancestral profile' (the number of paths from each ancestral vertex to each leaf). In general, this information does not uniquely determine the underlying phylogenetic network. A recent paper considered a new class of phylogenetic networks called 'orchard networks' where this uniqueness was claimed to hold. Here we show that an additional restriction on the network, that of being 'stack-free', is required in order for the original uniqueness claim to hold. On the other hand, if the additional stack-free restriction is lifted, we establish an alternative result; namely, there is uniqueness within the class of orchard networks up to the resolution of vertices of high in-degree.
Collapse
Affiliation(s)
- Allan Bai
- School of Mathematics and Statistics, University of Canterbury, Christchurch, New Zealand.
| | - Péter L Erdős
- Alfréd Rényi Institute of Mathematics, Budapest, Hungary.
| | - Charles Semple
- School of Mathematics and Statistics, University of Canterbury, Christchurch, New Zealand.
| | - Mike Steel
- School of Mathematics and Statistics, University of Canterbury, Christchurch, New Zealand.
| |
Collapse
|
7
|
Pons JC, Scornavacca C, Cardona G. Generation of Level- k LGT Networks. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2020; 17:158-164. [PMID: 30703035 DOI: 10.1109/tcbb.2019.2895344] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]
Abstract
Phylogenetic networks provide a mathematical model to represent the evolution of a set of species where, apart from speciation, reticulate evolutionary events have to be taken into account. Among these events, lateral gene transfers need special consideration due to the asymmetry in the roles of the species involved in such an event. To take into account this asymmetry, LGT networks were introduced. Contrarily to the case of phylogenetic trees, the combinatorial structure of phylogenetic networks is much less known and difficult to describe. One of the approaches in the literature is to classify them according to their level and find generators of the given level that can be used to recursively generate all networks. In this paper, we adapt the concept of generators to the case of LGT networks. We show how these generators, classified by their level, give rise to simple LGT networks of the specified level, and how any LGT network can be obtained from these simple networks, that act as building blocks of the generic structure. The stochastic models of evolution of phylogenetic networks are also much less studied than those for phylogenetic trees. In this setting, we introduce a novel two-parameter model that generates LGT networks. Finally, we present some computer simulations using this model in order to investigate the complexity of the generated networks, depending on the parameters of the model.
Collapse
|
8
|
Cardona G, Pons JC, Scornavacca C. Generation of Binary Tree-Child phylogenetic networks. PLoS Comput Biol 2019; 15:e1007347. [PMID: 31509525 PMCID: PMC6756559 DOI: 10.1371/journal.pcbi.1007347] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/14/2019] [Revised: 09/23/2019] [Accepted: 08/20/2019] [Indexed: 11/18/2022] Open
Abstract
Phylogenetic networks generalize phylogenetic trees by allowing the modelization of events of reticulate evolution. Among the different kinds of phylogenetic networks that have been proposed in the literature, the subclass of binary tree-child networks is one of the most studied ones. However, very little is known about the combinatorial structure of these networks. In this paper we address the problem of generating all possible binary tree-child (BTC) networks with a given number of leaves in an efficient way via reduction/augmentation operations that extend and generalize analogous operations for phylogenetic trees, and are biologically relevant. Since our solution is recursive, this also provides us with a recurrence relation giving an upper bound on the number of such networks. We also show how the operations introduced in this paper can be employed to extend the evolutive history of a set of sequences, represented by a BTC network, to include a new sequence. An implementation in python of the algorithms described in this paper, along with some computational experiments, can be downloaded from https://github.com/bielcardona/TCGenerators.
Collapse
Affiliation(s)
- Gabriel Cardona
- Department of Mathematics and Computer Science, University of the Balearic Islands, Ctra. de Valldemossa Ctra. de Valldemossa km. 7.5. 07122 - Palma, Spain
| | - Joan Carles Pons
- Department of Mathematics and Computer Science, University of the Balearic Islands, Ctra. de Valldemossa Ctra. de Valldemossa km. 7.5. 07122 - Palma, Spain
| | - Celine Scornavacca
- Institut des Sciences de l’Evolution (ISE-M), Université de Montpellier, CNRS, IRD, EPHE, 34095 Montpellier Cedex 5, France
| |
Collapse
|
9
|
A class of phylogenetic networks reconstructable from ancestral profiles. Math Biosci 2019; 313:33-40. [DOI: 10.1016/j.mbs.2019.04.009] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/13/2019] [Revised: 04/30/2019] [Accepted: 04/30/2019] [Indexed: 11/18/2022]
|
10
|
Tree-based networks: characterisations, metrics, and support trees. J Math Biol 2018; 78:899-918. [PMID: 30283985 DOI: 10.1007/s00285-018-1296-9] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/24/2017] [Revised: 08/25/2018] [Indexed: 10/28/2022]
Abstract
Phylogenetic networks generalise phylogenetic trees and allow for the accurate representation of the evolutionary history of a set of present-day species whose past includes reticulate events such as hybridisation and lateral gene transfer. One way to obtain such a network is by starting with a (rooted) phylogenetic tree T, called a base tree, and adding arcs between arcs of T. The class of phylogenetic networks that can be obtained in this way is called tree-based networks and includes the prominent classes of tree-child and reticulation-visible networks. Initially defined for binary phylogenetic networks, tree-based networks naturally extend to arbitrary phylogenetic networks. In this paper, we generalise recent tree-based characterisations and associated proximity measures for binary phylogenetic networks to arbitrary phylogenetic networks. These characterisations are in terms of matchings in bipartite graphs, path partitions, and antichains. Some of the generalisations are straightforward to establish using the original approach, while others require a very different approach. Furthermore, for an arbitrary tree-based network N, we characterise the support trees of N, that is, the tree-based embeddings of N. We use this characterisation to give an explicit formula for the number of support trees of N when N is binary. This formula is written in terms of the components of a bipartite graph.
Collapse
|
11
|
Wang J, Guo M. A review of metrics measuring dissimilarity for rooted phylogenetic networks. Brief Bioinform 2018; 20:1972-1980. [DOI: 10.1093/bib/bby062] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/22/2018] [Revised: 06/20/2018] [Indexed: 11/14/2022] Open
Abstract
Abstract
A rooted phylogenetic network is an important structure in the description of evolutionary relationships. Computing the distance (topological dissimilarity) between two rooted phylogenetic networks is a fundamental in phylogenic analysis. During the past few decades, several polynomial-time computable metrics have been described. Here, we give a comprehensive review and analysis on those metrics, including the correlation among metrics and the distribution of distance values computed by each metric. Moreover, we describe the software and website, CDRPN (Computing Distance for Rooted Phylogenetic Networks), for measuring the topological dissimilarity between rooted phylogenetic networks.
Availability
http://bioinformatics.imu.edu.cn/distance/
Contact
guomaozu@bucea.edu.cn
Collapse
Affiliation(s)
- Juan Wang
- School of Computer Science, Inner Mongolia University, Hohhot, Inner Mongolia 010021, P.R. China
| | - Maozu Guo
- School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing 100044, P.R. ChinaBeijing Key Laboratory of Intelligent Processing for Building Big Data, Beijing 100044, P.R. China
| |
Collapse
|
12
|
Wang J, Guo M. A Metric on the Space of kth-order reduced Phylogenetic Networks. Sci Rep 2017; 7:3189. [PMID: 28600511 DOI: 10.1038/s41598-017-03363-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/05/2016] [Accepted: 04/27/2017] [Indexed: 11/09/2022] Open
Abstract
Phylogenetic networks can be used to describe the evolutionary history of species which experience a certain number of reticulate events, and represent conflicts in phylogenetic trees that may be due to inadequacies of the evolutionary model used in the construction of the trees. Measuring the dissimilarity between two phylogenetic networks is at the heart of our understanding of the evolutionary history of species. This paper proposes a new metric, i.e. kth-distance, for the space of kth-order reduced phylogenetic networks that can be calculated in polynomial time in the size of the compared networks.
Collapse
Affiliation(s)
- Juan Wang
- School of Computer Science, Inner Mongolia University, Hohhot, 010021, P.R. China
| | - Maozu Guo
- School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing, 100044, P.R. China.
| |
Collapse
|
13
|
On the challenge of reconstructing level-1 phylogenetic networks from triplets and clusters. J Math Biol 2016; 74:1729-1751. [PMID: 27800561 PMCID: PMC5420025 DOI: 10.1007/s00285-016-1068-3] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/26/2014] [Revised: 04/05/2016] [Indexed: 11/16/2022]
Abstract
Phylogenetic networks have gained prominence over the years due to their ability to represent complex non-treelike evolutionary events such as recombination or hybridization. Popular combinatorial objects used to construct them are triplet systems and cluster systems, the motivation being that any network N induces a triplet system \documentclass[12pt]{minimal}
\usepackage{amsmath}
\usepackage{wasysym}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{amsbsy}
\usepackage{mathrsfs}
\usepackage{upgreek}
\setlength{\oddsidemargin}{-69pt}
\begin{document}$$\mathcal R(N)$$\end{document}R(N) and a softwired cluster system \documentclass[12pt]{minimal}
\usepackage{amsmath}
\usepackage{wasysym}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{amsbsy}
\usepackage{mathrsfs}
\usepackage{upgreek}
\setlength{\oddsidemargin}{-69pt}
\begin{document}$$\mathcal S(N)$$\end{document}S(N). Since in real-world studies it cannot be guaranteed that all triplets/softwired clusters induced by a network are available, it is of particular interest to understand whether subsets of \documentclass[12pt]{minimal}
\usepackage{amsmath}
\usepackage{wasysym}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{amsbsy}
\usepackage{mathrsfs}
\usepackage{upgreek}
\setlength{\oddsidemargin}{-69pt}
\begin{document}$$\mathcal R(N)$$\end{document}R(N) or \documentclass[12pt]{minimal}
\usepackage{amsmath}
\usepackage{wasysym}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{amsbsy}
\usepackage{mathrsfs}
\usepackage{upgreek}
\setlength{\oddsidemargin}{-69pt}
\begin{document}$$\mathcal S(N)$$\end{document}S(N) allow one to uniquely reconstruct the underlying network N. Here we show that even within the highly restricted yet biologically interesting space of level-1 phylogenetic networks it is not always possible to uniquely reconstruct a level-1 network N, even when all triplets in \documentclass[12pt]{minimal}
\usepackage{amsmath}
\usepackage{wasysym}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{amsbsy}
\usepackage{mathrsfs}
\usepackage{upgreek}
\setlength{\oddsidemargin}{-69pt}
\begin{document}$$\mathcal R(N)$$\end{document}R(N) or all clusters in \documentclass[12pt]{minimal}
\usepackage{amsmath}
\usepackage{wasysym}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{amsbsy}
\usepackage{mathrsfs}
\usepackage{upgreek}
\setlength{\oddsidemargin}{-69pt}
\begin{document}$$\mathcal S(N)$$\end{document}S(N) are available. On the positive side, we introduce a reasonably large subclass of level-1 networks the members of which are uniquely determined by their induced triplet/softwired cluster systems. Along the way, we also establish various enumerative results, both positive and negative, including results which show that certain special subclasses of level-1 networks N can be uniquely reconstructed from proper subsets of \documentclass[12pt]{minimal}
\usepackage{amsmath}
\usepackage{wasysym}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{amsbsy}
\usepackage{mathrsfs}
\usepackage{upgreek}
\setlength{\oddsidemargin}{-69pt}
\begin{document}$$\mathcal R(N)$$\end{document}R(N) and \documentclass[12pt]{minimal}
\usepackage{amsmath}
\usepackage{wasysym}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{amsbsy}
\usepackage{mathrsfs}
\usepackage{upgreek}
\setlength{\oddsidemargin}{-69pt}
\begin{document}$$\mathcal S(N)$$\end{document}S(N). We anticipate these results to be of use in the design of algorithms for phylogenetic network inference.
Collapse
|
14
|
Gambette P, van Iersel L, Kelk S, Pardi F, Scornavacca C. Do Branch Lengths Help to Locate a Tree in a Phylogenetic Network? Bull Math Biol 2016; 78:1773-1795. [DOI: 10.1007/s11538-016-0199-4] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/10/2016] [Accepted: 08/16/2016] [Indexed: 12/22/2022]
|
15
|
Affiliation(s)
- Louxin Zhang
- Department of Mathematics, National University of Singapore, Singapore
| |
Collapse
|
16
|
Wang J. A Metric on the Space of Partly Reduced Phylogenetic Networks. BIOMED RESEARCH INTERNATIONAL 2016; 2016:7534258. [PMID: 27419137 PMCID: PMC4935902 DOI: 10.1155/2016/7534258] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 03/30/2016] [Accepted: 05/23/2016] [Indexed: 11/17/2022]
Abstract
Phylogenetic networks are a generalization of phylogenetic trees that allow for the representation of evolutionary events acting at the population level, such as recombination between genes, hybridization between lineages, and horizontal gene transfer. The researchers have designed several measures for computing the dissimilarity between two phylogenetic networks, and each measure has been proven to be a metric on a special kind of phylogenetic networks. However, none of the existing measures is a metric on the space of partly reduced phylogenetic networks. In this paper, we provide a metric, d e -distance, on the space of partly reduced phylogenetic networks, which is polynomial-time computable.
Collapse
Affiliation(s)
- Juan Wang
- School of Computer Science, Inner Mongolia University, Hohhot 010021, China
| |
Collapse
|
17
|
Huber KT, Moulton V, Steel M, Wu T. Folding and unfolding phylogenetic trees and networks. J Math Biol 2016; 73:1761-1780. [PMID: 27107869 PMCID: PMC5061860 DOI: 10.1007/s00285-016-0993-5] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/08/2015] [Revised: 12/16/2015] [Indexed: 12/29/2022]
Abstract
Phylogenetic networks are rooted, labelled directed acyclic graphswhich are commonly used to represent reticulate evolution. There is a close relationship between phylogenetic networks and multi-labelled trees (MUL-trees). Indeed, any phylogenetic network N can be “unfolded” to obtain a MUL-tree U(N) and, conversely, a MUL-tree T can in certain circumstances be “folded” to obtain aphylogenetic network F(T) that exhibits T. In this paper, we study properties of the operations U and F in more detail. In particular, we introduce the class of stable networks, phylogenetic networks N for which F(U(N)) is isomorphic to N, characterise such networks, and show that they are related to the well-known class of tree-sibling networks. We also explore how the concept of displaying a tree in a network N can be related to displaying the tree in the MUL-tree U(N). To do this, we develop aphylogenetic analogue of graph fibrations. This allows us to view U(N) as the analogue of the universal cover of a digraph, and to establish a close connection between displaying trees in U(N) and reconciling phylogenetic trees with networks.
Collapse
Affiliation(s)
- Katharina T Huber
- School of Computing Sciences, University of East Anglia, Norwich, NR4 7TJ, UK
| | - Vincent Moulton
- School of Computing Sciences, University of East Anglia, Norwich, NR4 7TJ, UK
| | - Mike Steel
- School of Mathematics and Statistics, University of Canterbury, Christchurch, New Zealand
| | - Taoyang Wu
- School of Computing Sciences, University of East Anglia, Norwich, NR4 7TJ, UK.
| |
Collapse
|
18
|
Pardi F, Scornavacca C. Reconstructible phylogenetic networks: do not distinguish the indistinguishable. PLoS Comput Biol 2015; 11:e1004135. [PMID: 25849429 PMCID: PMC4388854 DOI: 10.1371/journal.pcbi.1004135] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/25/2014] [Accepted: 01/19/2015] [Indexed: 12/21/2022] Open
Abstract
Phylogenetic networks represent the evolution of organisms that have undergone reticulate events, such as recombination, hybrid speciation or lateral gene transfer. An important way to interpret a phylogenetic network is in terms of the trees it displays, which represent all the possible histories of the characters carried by the organisms in the network. Interestingly, however, different networks may display exactly the same set of trees, an observation that poses a problem for network reconstruction: from the perspective of many inference methods such networks are "indistinguishable". This is true for all methods that evaluate a phylogenetic network solely on the basis of how well the displayed trees fit the available data, including all methods based on input data consisting of clades, triples, quartets, or trees with any number of taxa, and also sequence-based approaches such as popular formalisations of maximum parsimony and maximum likelihood for networks. This identifiability problem is partially solved by accounting for branch lengths, although this merely reduces the frequency of the problem. Here we propose that network inference methods should only attempt to reconstruct what they can uniquely identify. To this end, we introduce a novel definition of what constitutes a uniquely reconstructible network. For any given set of indistinguishable networks, we define a canonical network that, under mild assumptions, is unique and thus representative of the entire set. Given data that underwent reticulate evolution, only the canonical form of the underlying phylogenetic network can be uniquely reconstructed. While on the methodological side this will imply a drastic reduction of the solution space in network inference, for the study of reticulate evolution this is a fundamental limitation that will require an important change of perspective when interpreting phylogenetic networks.
Collapse
Affiliation(s)
- Fabio Pardi
- Laboratoire d’Informatique, de Robotique et de Microélectronique de Montpellier (LIRMM, UMR 5506) CNRS, Université de Montpellier, France
- Institut de Biologie Computationnelle, Montpellier, France
| | - Celine Scornavacca
- Institut des Sciences de l’Evolution de Montpellier (ISE-M, UMR 5554) CNRS, IRD, Université de Montpellier, France
- Institut de Biologie Computationnelle, Montpellier, France
| |
Collapse
|
19
|
Cordue P, Linz S, Semple C. Phylogenetic networks that display a tree twice. Bull Math Biol 2014; 76:2664-79. [PMID: 25245396 DOI: 10.1007/s11538-014-0032-x] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/11/2014] [Accepted: 09/17/2014] [Indexed: 11/26/2022]
Abstract
In the last decade, the use of phylogenetic networks to analyze the evolution of species whose past is likely to include reticulation events, such as horizontal gene transfer or hybridization, has gained popularity among evolutionary biologists. Nevertheless, the evolution of a particular gene can generally be described without reticulation events and therefore be represented by a phylogenetic tree. While this is not in contrast to each other, it places emphasis on the necessity of algorithms that analyze and summarize the tree-like information that is contained in a phylogenetic network. We contribute to the toolbox of such algorithms by investigating the question of whether or not a phylogenetic network embeds a tree twice and give a quadratic-time algorithm to solve this problem for a class of networks that is more general than tree-child networks.
Collapse
Affiliation(s)
- Paul Cordue
- Biomathematics Research Centre, Department of Mathematics and Statistics, University of Canterbury, Christchurch, New Zealand,
| | | | | |
Collapse
|
20
|
The comparison of tree-sibling time consistent phylogenetic networks is graph isomorphism-complete. ScientificWorldJournal 2014; 2014:254279. [PMID: 24982934 PMCID: PMC3996867 DOI: 10.1155/2014/254279] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/27/2013] [Accepted: 01/21/2014] [Indexed: 11/17/2022] Open
Abstract
Several polynomial time computable metrics on the class of semibinary tree-sibling time consistent phylogenetic networks are available in the literature; in particular, the problem of deciding if two networks of this kind are isomorphic is in P. In this paper, we show that if we remove the semibinarity condition, then the problem becomes much harder. More precisely, we prove that the isomorphism problem for generic tree-sibling time consistent phylogenetic networks is polynomially equivalent to the graph isomorphism problem. Since the latter is believed not to belong to P, the chances are that it is impossible to define a metric on the class of all tree-sibling time consistent phylogenetic networks that can be computed in polynomial time.
Collapse
|
21
|
Wang J, Guo M, Xing L, Che K, Liu X, Wang C. BIMLR: a method for constructing rooted phylogenetic networks from rooted phylogenetic trees. Gene 2013; 527:344-51. [PMID: 23816409 DOI: 10.1016/j.gene.2013.06.036] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/14/2013] [Revised: 06/10/2013] [Accepted: 06/17/2013] [Indexed: 10/26/2022]
Abstract
Rooted phylogenetic trees constructed from different datasets (e.g. from different genes) are often conflicting with one another, i.e. they cannot be integrated into a single phylogenetic tree. Phylogenetic networks have become an important tool in molecular evolution, and rooted phylogenetic networks are able to represent conflicting rooted phylogenetic trees. Hence, the development of appropriate methods to compute rooted phylogenetic networks from rooted phylogenetic trees has attracted considerable research interest of late. The CASS algorithm proposed by van Iersel et al. is able to construct much simpler networks than other available methods, but it is extremely slow, and the networks it constructs are dependent on the order of the input data. Here, we introduce an improved CASS algorithm, BIMLR. We show that BIMLR is faster than CASS and less dependent on the input data order. Moreover, BIMLR is able to construct much simpler networks than almost all other methods. BIMLR is available at http://nclab.hit.edu.cn/wangjuan/BIMLR/.
Collapse
Affiliation(s)
- Juan Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang 150001, PR China.
| | | | | | | | | | | |
Collapse
|
22
|
Trinets encode tree-child and level-2 phylogenetic networks. J Math Biol 2013; 68:1707-29. [PMID: 23680992 DOI: 10.1007/s00285-013-0683-5] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/29/2012] [Revised: 03/20/2013] [Indexed: 10/26/2022]
Abstract
Phylogenetic networks generalize evolutionary trees, and are commonly used to represent evolutionary histories of species that undergo reticulate evolutionary processes such as hybridization, recombination and lateral gene transfer. Recently, there has been great interest in trying to develop methods to construct rooted phylogenetic networks from triplets, that is rooted trees on three species. However, although triplets determine or encode rooted phylogenetic trees, they do not in general encode rooted phylogenetic networks, which is a potential issue for any such method. Motivated by this fact, Huber and Moulton recently introduced trinets as a natural extension of rooted triplets to networks. In particular, they showed that [Formula: see text] phylogenetic networks are encoded by their trinets, and also conjectured that all "recoverable" rooted phylogenetic networks are encoded by their trinets. Here we prove that recoverable binary level-2 networks and binary tree-child networks are also encoded by their trinets. To do this we prove two decomposition theorems based on trinets which hold for all recoverable binary rooted phylogenetic networks. Our results provide some additional evidence in support of the conjecture that trinets encode all recoverable rooted phylogenetic networks, and could also lead to new approaches to construct phylogenetic networks from trinets.
Collapse
|
23
|
Asano T, Jansson J, Sadakane K, Uehara R, Valiente G. Faster computation of the Robinson–Foulds distance between phylogenetic networks. Inf Sci (N Y) 2012. [DOI: 10.1016/j.ins.2012.01.038] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
|
24
|
Willson SJ. CSD homomorphisms between phylogenetic networks. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2012; 9:1128-1138. [PMID: 22487988 DOI: 10.1109/tcbb.2012.52] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/31/2023]
Abstract
Since Darwin, species trees have been used as a simplified description of the relationships which summarize the complicated network N of reality. Recent evidence of hybridization and lateral gene transfer, however, suggest that there are situations where trees are inadequate. Consequently it is important to determine properties that characterize networks closely related to N and possibly more complicated than trees but lacking the full complexity of N. A connected surjective digraph map (CSD) is a map f from one network N to another network M such that every arc is either collapsed to a single vertex or is taken to an arc, such that f is surjective, and such that the inverse image of a vertex is always connected. CSD maps are shown to behave well under composition. It is proved that if there is a CSD map from N to M, then there is a way to lift an undirected version of M into N, often with added resolution. A CSD map from N to M puts strong constraints on N. In general, it may be useful to study classes of networks such that, for any N, there exists a CSD map from N to some standard member of that class.
Collapse
Affiliation(s)
- Stephen J Willson
- Department of Mathematics, Iowa State University, Ames, IA 50011, USA.
| |
Collapse
|
25
|
On encodings of phylogenetic networks of bounded level. J Math Biol 2011; 65:157-80. [DOI: 10.1007/s00285-011-0456-y] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/29/2009] [Revised: 06/22/2011] [Indexed: 10/18/2022]
|
26
|
Huber KT, van Iersel L, Kelk S, Suchecki R. A practical algorithm for reconstructing level-1 phylogenetic networks. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2011; 8:635-649. [PMID: 21393651 DOI: 10.1109/tcbb.2010.17] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/30/2023]
Abstract
Recently, much attention has been devoted to the construction of phylogenetic networks which generalize phylogenetic trees in order to accommodate complex evolutionary processes. Here, we present an efficient, practical algorithm for reconstructing level-1 phylogenetic networks--a type of network slightly more general than a phylogenetic tree--from triplets. Our algorithm has been made publicly available as the program LEV1ATHAN. It combines ideas from several known theoretical algorithms for phylogenetic tree and network reconstruction with two novel subroutines. Namely, an exponential-time exact and a greedy algorithm both of which are of independent theoretical interest. Most importantly, LEV1ATHAN runs in polynomial time and always constructs a level-1 network. If the data are consistent with a phylogenetic tree, then the algorithm constructs such a tree. Moreover, if the input triplet set is dense and, in addition, is fully consistent with some level-1 network, it will find such a network. The potential of LEV1ATHAN is explored by means of an extensive simulation study and a biological data set. One of our conclusions is that LEV1ATHAN is able to construct networks consistent with a high percentage of input triplets, even when these input triplets are affected by a low to moderate level of noise.
Collapse
Affiliation(s)
- Katharina T Huber
- School of Computing Sciences, University of East Anglia, Norwich, NR4 7TJ, UK.
| | | | | | | |
Collapse
|
27
|
Willson SJ. Regular networks can be uniquely constructed from their trees. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2011; 8:785-796. [PMID: 20714025 DOI: 10.1109/tcbb.2010.69] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/29/2023]
Abstract
A rooted acyclic digraph N with labeled leaves displays a tree T when there exists a way to select a unique parent of each hybrid vertex resulting in the tree T. Let Tr(N) denote the set of all trees displayed by the network N. In general, there may be many other networks M, such that Tr(M) = Tr(N). A network is regular if it is isomorphic with its cover digraph. If N is regular and D is a collection of trees displayed by N, this paper studies some procedures to try to reconstruct N given D. If the input is D = Tr(N), one procedure is described, which will reconstruct N. Hence, if N and M are regular networks and Tr(N) = Tr(M), it follows that N = M, proving that a regular network is uniquely determined by its displayed trees. If D is a (usually very much smaller) collection of displayed trees that satisfies certain hypotheses, modifications of the procedure will still reconstruct N given D.
Collapse
Affiliation(s)
- Stephen J Willson
- Department of Mathematics, Iowa State University, Ames, IA 50011, USA.
| |
Collapse
|
28
|
Cardona G, Llabrés M, Rosselló F, Valiente G. Comparison of galled trees. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2011; 8:410-427. [PMID: 20660951 DOI: 10.1109/tcbb.2010.60] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/29/2023]
Abstract
Galled trees, directed acyclic graphs that model evolutionary histories with isolated hybridization events, have become very popular due to both their biological significance and the existence of polynomial-time algorithms for their reconstruction. In this paper, we establish to which extent several distance measures for the comparison of evolutionary networks are metrics for galled trees, and hence, when they can be safely used to evaluate galled tree reconstruction methods.
Collapse
Affiliation(s)
- Gabriel Cardona
- Department of Mathematics and Computer Science, University of the Balearic Islands, E-07122 Palma de Mallorca, Spain.
| | | | | | | |
Collapse
|
29
|
|
30
|
Arenas M, Patricio M, Posada D, Valiente G. Characterization of phylogenetic networks with NetTest. BMC Bioinformatics 2010; 11:268. [PMID: 20487540 PMCID: PMC2880032 DOI: 10.1186/1471-2105-11-268] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/09/2010] [Accepted: 05/20/2010] [Indexed: 11/13/2022] Open
Abstract
Background Typical evolutionary events like recombination, hybridization or gene transfer make necessary the use of phylogenetic networks to properly depict the evolution of DNA and protein sequences. Although several theoretical classes have been proposed to characterize these networks, they make stringent assumptions that will likely not be met by the evolutionary process. We have recently shown that the complexity of simulated networks is a function of the population recombination rate, and that at moderate and large recombination rates the resulting networks cannot be categorized. However, we do not know whether these results extend to networks estimated from real data. Results We introduce a web server for the categorization of explicit phylogenetic networks, including the most relevant theoretical classes developed so far. Using this tool, we analyzed statistical parsimony phylogenetic networks estimated from ~5,000 DNA alignments, obtained from the NCBI PopSet and Polymorphix databases. The level of characterization was correlated to nucleotide diversity, and a high proportion of the networks derived from these data sets could be formally characterized. Conclusions We have developed a public web server, NetTest (freely available from the software section at http://darwin.uvigo.es), to formally characterize the complexity of phylogenetic networks. Using NetTest we found that most statistical parsimony networks estimated with the program TCS could be assigned to a known network class. The level of network characterization was correlated to nucleotide diversity and dependent upon the intra/interspecific levels, although no significant differences were detected among genes. More research on the properties of phylogenetic networks is clearly needed.
Collapse
Affiliation(s)
- Miguel Arenas
- Department of Biochemistry, Genetics and Immunology, University of Vigo, E-36310 Vigo, Spain.
| | | | | | | |
Collapse
|
31
|
Nakhleh L. A metric on the space of reduced phylogenetic networks. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2010; 7:218-222. [PMID: 20431142 DOI: 10.1109/tcbb.2009.2] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/29/2023]
Abstract
Phylogenetic networks are leaf-labeled, rooted, acyclic, and directed graphs that are used to model reticulate evolutionary histories. Several measures for quantifying the topological dissimilarity between two phylogenetic networks have been devised, each of which was proven to be a metric on certain restricted classes of phylogenetic networks. A biologically motivated class of phylogenetic networks, namely, reduced phylogenetic networks, was recently introduced. None of the existing measures is a metric on the space of reduced phylogenetic networks. In this paper, we provide a metric on the space of reduced phylogenetic networks that is computable in time polynomial in the size of the networks.
Collapse
Affiliation(s)
- Luay Nakhleh
- Department of Computer Science, Rice University, 6100 Main Street, MS 132, Houston, TX 77005, USA.
| |
Collapse
|
32
|
Maetschke SR, Kassahn KS, Dunn JA, Han SP, Curley EZ, Stacey KJ, Ragan MA. A visual framework for sequence analysis using n-grams and spectral rearrangement. ACTA ACUST UNITED AC 2010; 26:737-44. [PMID: 20130028 DOI: 10.1093/bioinformatics/btq042] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022]
Abstract
MOTIVATION Protein sequences are often composed of regions that have distinct evolutionary histories as a consequence of domain shuffling, recombination or gene conversion. New approaches are required to discover, visualize and analyze these sequence regions and thus enable a better understanding of protein evolution. RESULTS Here, we have developed an alignment-free and visual approach to analyze sequence relationships. We use the number of shared n-grams between sequences as a measure of sequence similarity and rearrange the resulting affinity matrix applying a spectral technique. Heat maps of the affinity matrix are employed to identify and visualize clusters of related sequences or outliers, while n-gram-based dot plots and conservation profiles allow detailed analysis of similarities among selected sequences. Using this approach, we have identified signatures of domain shuffling in an otherwise poorly characterized family, and homology clusters in another. We conclude that this approach may be generally useful as a framework to analyze related, but highly divergent protein sequences. It is particularly useful as a fast method to study sequence relationships prior to much more time-consuming multiple sequence alignment and phylogenetic analysis. AVAILABILITY A software implementation (MOSAIC) of the framework described here can be downloaded from http://bioinformatics.org.au/mosaic/ CONTACT m.ragan@uq.edu.au SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Stefan R Maetschke
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD 4072, Australia
| | | | | | | | | | | | | |
Collapse
|
33
|
|
34
|
Cardona G, Llabrés M, Rosselló F, Valiente G. On Nakhleh's metric for reduced phylogenetic networks. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2009; 6:629-638. [PMID: 19875861 DOI: 10.1109/tcbb.2009.33] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/28/2023]
Abstract
We prove that Nakhleh's metric for reduced phylogenetic networks is also a metric on the classes of tree-child phylogenetic networks, semibinary tree-sibling time consistent phylogenetic networks, and multilabeled phylogenetic trees. We also prove that it separates distinguishable phylogenetic networks. In this way, it becomes the strongest dissimilarity measure for phylogenetic networks available so far. Furthermore, we propose a generalization of that metric that separates arbitrary phylogenetic networks.
Collapse
Affiliation(s)
- Gabriel Cardona
- Department of Mathematics and Computer Science, University of the Balearic Islands, E-07122 Palma de-Mallorca, Spain. gabriel.cardona
| | | | | | | |
Collapse
|
35
|
Willson SJ. Properties of Normal Phylogenetic Networks. Bull Math Biol 2009; 72:340-58. [DOI: 10.1007/s11538-009-9449-z] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/11/2008] [Accepted: 08/14/2009] [Indexed: 11/30/2022]
|
36
|
Cardona G, Llabrés M, Rosselló F, Valiente G. Metrics for phylogenetic networks II: nodal and triplets metrics. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2009; 6:454-469. [PMID: 19644173 DOI: 10.1109/tcbb.2008.127] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/28/2023]
Abstract
The assessment of phylogenetic network reconstruction methods requires the ability to compare phylogenetic networks. This is the second in a series of papers devoted to the analysis and comparison of metrics for tree-child time consistent phylogenetic networks on the same set of taxa. In this paper, we generalize to phylogenetic networks two metrics that have already been introduced in the literature for phylogenetic trees: the nodal distance and the triplets distance. We prove that they are metrics on any class of tree-child time consistent phylogenetic networks on the same set of taxa, as well as some basic properties for them. To prove these results, we introduce a reduction/expansion procedure that can be used not only to establish properties of tree-child time consistent phylogenetic networks by induction, but also to generate all tree-child time consistent phylogenetic networks with a given number of leaves.
Collapse
Affiliation(s)
- Gabriel Cardona
- Department of Mathematics and Computer Science, University of the Balearic Islands, Palma de Mallorca, Spain. gabriel.cardona@@uib.es
| | | | | | | |
Collapse
|
37
|
Cardona G, Llabrés M, Rosselló F, Valiente G. Metrics for phylogenetic networks I: generalizations of the Robinson-Foulds metric. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2009; 6:46-61. [PMID: 19179698 DOI: 10.1109/tcbb.2008.70] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/27/2023]
Abstract
The assessment of phylogenetic network reconstruction methods requires the ability to compare phylogenetic networks. This is the first in a series of papers devoted to the analysis and comparison of metrics for tree-child time consistent phylogenetic networks on the same set of taxa. In this paper, we study three metrics that have already been introduced in the literature: the Robinson-Foulds distance, the tripartitions distance and the mu-distance. They generalize to networks the classical Robinson-Foulds or partition distance for phylogenetic trees. We analyze the behavior of these metrics by studying their least and largest values and when they achieve them. As a by-product of this study, we obtain tight bounds on the size of a tree-child time consistent phylogenetic network.
Collapse
Affiliation(s)
- Gabriel Cardona
- University of the Balearic Islands, Palma de Mallorca, Spain.
| | | | | | | |
Collapse
|
38
|
Arenas M, Valiente G, Posada D. Characterization of reticulate networks based on the coalescent with recombination. Mol Biol Evol 2008; 25:2517-20. [PMID: 18927089 PMCID: PMC2582979 DOI: 10.1093/molbev/msn219] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open
Abstract
Phylogenetic networks aim to represent the evolutionary history of taxa. Within these, reticulate networks are explicitly able to accommodate evolutionary events like recombination, hybridization, or lateral gene transfer. Although several metrics exist to compare phylogenetic networks, they make several assumptions regarding the nature of the networks that are not likely to be fulfilled by the evolutionary process. In order to characterize the potential disagreement between the algorithms and the biology, we have used the coalescent with recombination to build the type of networks produced by reticulate evolution and classified them as regular, tree sibling, tree child, or galled trees. We show that, as expected, the complexity of these reticulate networks is a function of the population recombination rate. At small recombination rates, most of the networks produced are already more complex than regular or tree sibling networks, whereas with moderate and large recombination rates, no network fit into any of the standard classes. We conclude that new metrics still need to be devised in order to properly compare two phylogenetic networks that have arisen from reticulating evolutionary process.
Collapse
|