1
Xu J, Ané C. Identifiability of local and global features of phylogenetic networks from average distances. J Math Biol 2022; 86:12. [PMID: 36481927 DOI: 10.1007/s00285-022-01847-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 10/20/2021] [Revised: 11/17/2022] [Accepted: 11/22/2022] [Indexed: 12/12/2022]
Abstract
Phylogenetic networks extend phylogenetic trees to model non-vertical inheritance, by which a lineage inherits material from multiple parents. The computational complexity of estimating phylogenetic networks from genome-wide data with likelihood-based methods limits the size of networks that can be handled. Methods based on pairwise distances could offer faster alternatives. We study here the information that average pairwise distances contain on the underlying phylogenetic network, by characterizing local and global features that can or cannot be identified. For general networks, we clarify that the root and edge lengths adjacent to reticulations are not identifiable, and then focus on the class of zipped-up semidirected networks. We provide a criterion to swap subgraphs locally, such as 3-cycles, resulting in indistinguishable networks. We propose the "distance split tree", which can be constructed from pairwise distances, and prove that it is a refinement of the network's tree of blobs, capturing the tree-like features of the network. For level-1 networks, this distance split tree is equal to the tree of blobs refined to separate polytomies from blobs, and we prove that the mixed representation of the network is identifiable. The information loss is localized around 4-cycles, for which the placement of the reticulation is unidentifiable. The mixed representation combines split edges for 4-cycles, regular tree and hybrid edges from the semidirected network, and edge parameters that encode all information identifiable from average pairwise distances.
Affiliation(s)
- Jingcheng Xu
  - Department of Statistics, University of Wisconsin - Madison, Madison, WI, 53706, USA
- Cécile Ané
  - Department of Statistics, University of Wisconsin - Madison, Madison, WI, 53706, USA
  - Department of Botany, University of Wisconsin - Madison, Madison, WI, 53706, USA
2
Kong S, Pons JC, Kubatko L, Wicke K. Classes of explicit phylogenetic networks and their biological and mathematical significance. J Math Biol 2022; 84:47. [PMID: 35503141 DOI: 10.1007/s00285-022-01746-y] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 09/21/2021] [Revised: 01/18/2022] [Accepted: 03/31/2022] [Indexed: 11/24/2022]
Abstract
The evolutionary relationships among organisms have traditionally been represented using rooted phylogenetic trees. However, due to reticulate processes such as hybridization or lateral gene transfer, evolution cannot always be adequately represented by a phylogenetic tree, and rooted phylogenetic networks that describe such complex processes have been introduced as a generalization of rooted phylogenetic trees. In fact, estimating rooted phylogenetic networks from genomic sequence data and analyzing their structural properties is one of the most important tasks in contemporary phylogenetics. Over the last two decades, several subclasses of rooted phylogenetic networks (characterized by certain structural constraints) have been introduced in the literature, either to model specific biological phenomena or to enable tractable mathematical and computational analyses. In the present manuscript, we provide a thorough review of these network classes, as well as provide a biological interpretation of the structural constraints underlying these networks where possible. In addition, we discuss how imposing structural constraints on the network topology can be used to address the scalability and identifiability challenges faced in the estimation of phylogenetic networks from empirical data.
Affiliation(s)
- Sungsik Kong
  - Department of Evolution, Ecology, and Organismal Biology, The Ohio State University, Columbus, OH, USA
- Joan Carles Pons
  - Department of Mathematics and Computer Science, University of the Balearic Islands, Palma, 07122, Spain
- Laura Kubatko
  - Department of Evolution, Ecology, and Organismal Biology, The Ohio State University, Columbus, OH, USA
  - Department of Statistics, The Ohio State University, Columbus, OH, USA
- Kristina Wicke
  - Department of Mathematics, The Ohio State University, Columbus, OH, USA
3
Tan M, Long H, Liao B, Cao Z, Yuan D, Tian G, Zhuang J, Yang J. QS-Net: Reconstructing Phylogenetic Networks Based on Quartet and Sextet. Front Genet 2019; 10:607. [PMID: 31396256 PMCID: PMC6667645 DOI: 10.3389/fgene.2019.00607] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 09/24/2018] [Accepted: 06/11/2019] [Indexed: 01/27/2023]
Abstract
Phylogenetic networks are used to estimate evolutionary relationships among biological entities or taxa involving reticulate events such as horizontal gene transfer, hybridization, recombination, and reassortment. In the past decade, many phylogenetic tree and network reconstruction methods have been proposed. Although they are highly accurate in reconstructing simple to moderately complex reticulate events, their performance decreases when several reticulate events are present simultaneously. In this paper, we propose QS-Net, a phylogenetic network reconstruction method that takes advantage of information on the relationships among six taxa. To evaluate the performance of QS-Net, we conducted experiments on three artificial sequence data sets simulated from an evolutionary tree, an evolutionary network involving three reticulate events, and a complex evolutionary network involving five reticulate events. Comparison with popular phylogenetic methods including Neighbor-Joining, Split-Decomposition, Neighbor-Net, and Quartet-Net suggests that QS-Net is comparable with other methods in reconstructing tree-like evolutionary histories, while it outperforms them in reconstructing reticulate events. In addition, we applied QS-Net to real data, including a bacterial taxonomy data set consisting of 36 bacterial species and the whole genome sequences of 22 H7N9 influenza A viruses. The results indicate that QS-Net is capable of inferring commonly accepted bacterial taxonomy and influenza evolution as well as identifying novel reticulate events. The software QS-Net is publicly available at https://github.com/Tmyiri/QS-Net.
Affiliation(s)
- Ming Tan
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
- Haixia Long
  - School of Information Science and Technology, Hainan Normal University, Haikou, China
- Bo Liao
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
  - School of Information Science and Technology, Hainan Normal University, Haikou, China
- Zhi Cao
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
- Dawei Yuan
  - Geneis (Beijing) Co. Ltd., Beijing, China
- Geng Tian
  - Geneis (Beijing) Co. Ltd., Beijing, China
- Jujuan Zhuang
  - Department of Mathematics, Dalian Maritime University, Dalian, China
- Jialiang Yang
  - School of Information Science and Technology, Hainan Normal University, Haikou, China
  - Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, United States
4
Determining phylogenetic networks from inter-taxa distances. J Math Biol 2015; 73:283-303. [DOI: 10.1007/s00285-015-0950-8] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 09/26/2014] [Revised: 08/18/2015] [Indexed: 11/27/2022]
5
Francis AR, Steel M. Tree-like reticulation networks--when do tree-like distances also support reticulate evolution? Math Biosci 2014; 259:12-9. [PMID: 25447812 DOI: 10.1016/j.mbs.2014.10.008] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 08/06/2014] [Revised: 09/22/2014] [Accepted: 10/31/2014] [Indexed: 10/24/2022]
Abstract
Hybrid evolution and horizontal gene transfer (HGT) are processes where evolutionary relationships may more accurately be described by a reticulated network than by a tree. In such a network, there will often be several paths between any two extant species, reflecting the possible pathways that genetic material may have been passed down from a common ancestor to these species. These paths will typically have different lengths but an 'average distance' can still be calculated between any two taxa. In this article, we ask whether this average distance is able to distinguish reticulate evolution from pure tree-like evolution. We consider two types of reticulation networks: hybridisation networks and HGT networks. For the former, we establish a general result which shows that average distances between extant taxa can appear tree-like, but only under a single hybridisation event near the root; in all other cases, the two forms of evolution can be distinguished by average distances. For HGT networks, we demonstrate some analogous but more intricate results.
Affiliation(s)
- Andrew R Francis
  - Centre for Research in Mathematics, School of Computing, Engineering and Mathematics, University of Western Sydney, Sydney, New South Wales, Australia
- Mike Steel
  - Biomathematics Research Centre, University of Canterbury, Christchurch, Canterbury, New Zealand
6
Huber KT, Van Iersel L, Moulton V, Wu T. How much information is needed to infer reticulate evolutionary histories? Syst Biol 2014; 64:102-11. [PMID: 25236959 PMCID: PMC4265143 DOI: 10.1093/sysbio/syu076] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Indexed: 12/19/2022]
Abstract
Phylogenetic networks are a generalization of evolutionary trees and are an important tool for analyzing reticulate evolutionary histories. Recently, there has been great interest in developing new methods to construct rooted phylogenetic networks, that is, networks whose internal vertices correspond to hypothetical ancestors, whose leaves correspond to sampled taxa, and in which vertices with more than one parent correspond to taxa formed by reticulate evolutionary events such as recombination or hybridization. Several methods for constructing evolutionary trees use the strategy of building up a tree from simpler building blocks (such as triplets or clusters), and so it is natural to look for ways to construct networks from smaller networks. In this article, we shall demonstrate a fundamental issue with this approach. Namely, we show that even if we are given all of the subnetworks induced on all proper subsets of the leaves of some rooted phylogenetic network, we still do not have all of the information required to completely determine that network. This implies that even if all of the building blocks for some reticulate evolutionary history were to be taken as the input for any given network building method, the method might still output an incorrect history. We also discuss some potential consequences of this result for constructing phylogenetic networks.
Affiliation(s)
- Katharina T Huber
  - School of Computing Sciences, University of East Anglia, Norwich, UK, and Centrum Wiskunde & Informatica (CWI), Amsterdam, Netherlands
- Leo Van Iersel
  - School of Computing Sciences, University of East Anglia, Norwich, UK, and Centrum Wiskunde & Informatica (CWI), Amsterdam, Netherlands
- Vincent Moulton
  - School of Computing Sciences, University of East Anglia, Norwich, UK, and Centrum Wiskunde & Informatica (CWI), Amsterdam, Netherlands
- Taoyang Wu
  - School of Computing Sciences, University of East Anglia, Norwich, UK, and Centrum Wiskunde & Informatica (CWI), Amsterdam, Netherlands