1
Khayatian E, Valiente G, Zhang L. The k-Robinson-Foulds Dissimilarity Measures for Comparison of Labeled Trees. J Comput Biol 2024; 31:328-344. [PMID: 38271573 PMCID: PMC11057537 DOI: 10.1089/cmb.2023.0312] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/27/2024]
Abstract
Understanding the mutational history of tumor cells is a critical endeavor in unraveling the mechanisms that drive the onset and progression of cancer. Modeling tumor cell evolution with labeled trees motivates researchers to develop different measures to compare labeled trees. Although the Robinson-Foulds (RF) distance is widely used for comparing species trees, it has certain limitations when applied to labeled trees. This study introduces the k-RF dissimilarity measures, tailored to address the challenges of labeled tree comparison. The RF distance is succinctly expressed as n-RF in the space of labeled trees with n nodes. Like the RF distance, the k-RF is a pseudometric for multiset-labeled trees and becomes a metric in the space of 1-labeled trees. By setting k to a small value, the k-RF dissimilarity can capture analogous local regions in two labeled trees of different sizes or with different labels.
Affiliation(s)
- Elahe Khayatian
- Department of Mathematics, National University of Singapore, Singapore, Singapore
- Gabriel Valiente
- Department of Computer Science, Technical University of Catalonia, Barcelona, Spain
- Louxin Zhang
- Department of Mathematics, National University of Singapore, Singapore, Singapore
2
Chon NL, Tran S, Miller CS, Lin H, Knight JD. A conserved electrostatic membrane-binding surface in synaptotagmin-like proteins revealed using molecular phylogenetic analysis and homology modeling. Protein Sci 2024; 33:e4850. [PMID: 38038838 PMCID: PMC10731544 DOI: 10.1002/pro.4850] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2023] [Revised: 10/29/2023] [Accepted: 11/28/2023] [Indexed: 12/02/2023]
Abstract
Protein structure prediction has emerged as a core technology for understanding biomolecules and their interactions. Here, we combine homology-based structure prediction with molecular phylogenetic analysis to study the evolution of electrostatic membrane binding among the vertebrate synaptotagmin-like protein (Slp) family. Slp family proteins play key roles in the membrane trafficking of large dense-core secretory vesicles. Our previous experimental and computational study found that the C2A domain of Slp-4 (also called granuphilin) binds with high affinity to anionic phospholipids in the cytoplasmic leaflet of the plasma membrane through a large positively charged protein surface centered on a cluster of phosphoinositide-binding lysine residues. Because this surface contributes greatly to Slp-4 C2A domain membrane binding, we hypothesized that the net charge on the surface might be evolutionarily conserved. To test this hypothesis, the known C2A sequences of Slp-4 among vertebrates were organized by class (from Mammalia to Pisces) using molecular phylogenetic analysis. Consensus sequences for each class were then identified and used to generate homology structures, from which Poisson-Boltzmann electrostatic potentials were calculated. For comparison, homology structures and electrostatic potentials were also calculated for the five human Slp protein family members. The results demonstrate that the charge on the membrane-binding surface is highly conserved throughout the evolution of Slp-4, and more highly conserved than many individual residues among the human Slp family paralogs. Such molecular phylogenetic-driven computational analysis can help describe the evolution of electrostatic protein-membrane interactions, which are crucial for protein function.
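The class-consensus step can be illustrated with a plain majority-rule consensus over aligned sequences. The Python sketch below is an illustration under stated assumptions, not the authors' pipeline: it assumes the sequences are already aligned, and the function name `majority_consensus`, the 50% threshold, the gap symbol `-` and the ambiguity code `X` are hypothetical choices. The toy sequences are invented and are not Slp-4 data.

```python
from collections import Counter


def majority_consensus(alignment, gap="-", ambiguous="X", threshold=0.5):
    """Majority-rule consensus of equal-length aligned sequences (hypothetical helper).

    A column contributes its most frequent symbol only if that symbol occurs in
    more than `threshold` of the sequences; otherwise `ambiguous` is emitted.
    Columns whose majority symbol is a gap are dropped from the consensus.
    """
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to the same length")

    consensus = []
    for column in zip(*alignment):
        symbol, count = Counter(column).most_common(1)[0]
        if count / len(column) <= threshold:
            consensus.append(ambiguous)  # no clear majority in this column
        elif symbol != gap:
            consensus.append(symbol)     # majority residue
        # a majority gap contributes nothing
    return "".join(consensus)


# Toy alignment (invented sequences).
print(majority_consensus(["MKV-L", "MKVAL", "MRVAL"]))  # MKVAL
print(majority_consensus(["MK", "MR", "MQ"]))          # MX
```

Raising the threshold trades coverage for confidence: columns without a clear majority are flagged rather than guessed.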
Affiliation(s)
- Nara L. Chon
- Department of Chemistry, University of Colorado Denver, Denver, Colorado, USA
- Sherleen Tran
- Department of Chemistry, University of Colorado Denver, Denver, Colorado, USA
- Hai Lin
- Department of Chemistry, University of Colorado Denver, Denver, Colorado, USA
3
Chon NL, Tran S, Miller CS, Lin H, Knight JD. A Conserved Electrostatic Membrane-Binding Surface in Synaptotagmin-Like Proteins Revealed Using Molecular Phylogenetic Analysis and Homology Modeling. bioRxiv 2023:2023.07.13.548768. [PMID: 37502952 PMCID: PMC10369986 DOI: 10.1101/2023.07.13.548768] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/29/2023]
Abstract
Protein structure prediction has emerged as a core technology for understanding biomolecules and their interactions. Here, we combine homology-based structure prediction with molecular phylogenetic analysis to study the evolution of electrostatic membrane binding among vertebrate synaptotagmin-like proteins (Slps). Slp family proteins play key roles in the membrane trafficking of large dense-core secretory vesicles. Our previous experimental and computational study found that the C2A domain of Slp-4 (also called granuphilin) binds with high affinity to anionic phospholipids in the cytoplasmic leaflet of the plasma membrane through a large positively charged protein surface centered on a cluster of phosphoinositide-binding lysine residues. Because this surface contributes greatly to Slp-4 C2A domain membrane binding, we hypothesized that the net charge on the surface might be evolutionarily conserved. To test this hypothesis, the known C2A sequences of Slp-4 among vertebrates were organized by class (from Mammalia to Pisces) using molecular phylogenetic analysis. Consensus sequences for each class were then identified and used to generate homology structures, from which Poisson-Boltzmann electrostatic potentials were calculated. For comparison, homology structures and electrostatic potentials were also calculated for the five human Slp protein family members. The results demonstrate that the charge on the membrane-binding surface is highly conserved throughout the evolution of Slp-4, and more highly conserved than many individual residues among the human Slp family paralogs. Such molecular phylogenetic-driven computational analysis can help describe the evolution of electrostatic protein-membrane interactions, which are crucial for protein function.
Affiliation(s)
- Nara L. Chon
- Department of Chemistry, University of Colorado Denver
- Sherleen Tran
- Department of Chemistry, University of Colorado Denver
- Hai Lin
- Department of Chemistry, University of Colorado Denver
4
van Iersel L, Janssen R, Jones M, Murakami Y. Orchard Networks are Trees with Additional Horizontal Arcs. Bull Math Biol 2022; 84:76. [PMID: 35727410 PMCID: PMC9213324 DOI: 10.1007/s11538-022-01037-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 10/20/2021] [Accepted: 05/30/2022] [Indexed: 11/30/2022]
Abstract
Phylogenetic networks are used in biology to represent evolutionary histories. The class of orchard phylogenetic networks was recently introduced for their computational benefits, without any biological justification. Here, we show that orchard networks can be interpreted as trees with additional horizontal arcs. Therefore, they are closely related to tree-based networks, where the difference is that in tree-based networks the additional arcs do not need to be horizontal. Then, we use this new characterization to show that the space of orchard networks on n leaves with k reticulations is connected under the rNNI rearrangement move with diameter $O(kn + n\log n)$.
Affiliation(s)
- Leo van Iersel
- Delft Institute of Applied Mathematics, Delft University of Technology, Mekelweg 4, 2628 CD, Delft, South Holland, The Netherlands
- Remie Janssen
- Delft Institute of Applied Mathematics, Delft University of Technology, Mekelweg 4, 2628 CD, Delft, South Holland, The Netherlands
- Mark Jones
- Delft Institute of Applied Mathematics, Delft University of Technology, Mekelweg 4, 2628 CD, Delft, South Holland, The Netherlands
- Yukihiro Murakami
- Delft Institute of Applied Mathematics, Delft University of Technology, Mekelweg 4, 2628 CD, Delft, South Holland, The Netherlands
5
Singh KN, Narzary D. Heavy metal tolerance of bacterial isolates associated with overburden strata of an opencast coal mine of Assam (India). Environ Sci Pollut Res Int 2021; 28:63111-63126. [PMID: 34218386 DOI: 10.1007/s11356-021-15153-1] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 03/16/2021] [Accepted: 06/23/2021] [Indexed: 05/21/2023]
Abstract
Coal overburden strata (OBS) vary in thickness, geochemical composition, and physical properties from stratum to stratum. Here, we enumerated the cultivable bacterial diversity and its distribution in different OBS collected from the opencast mine of Tikok colliery, Assam. The pH of the coal OBS ranged from 2.46 to 7.93, and 73% of the OBS samples were acidic. The OBS samples were mostly shale, except for a few that were sandstone, mudstone, or red soil. Bacterial counts varied widely, ranging from 52 to 57.4 × 10⁴ CFU per gram of OBS. A total of 79 bacterial pure-culture isolates belonging to 19 genera, 12 families, and 3 phyla (Actinobacteria, Firmicutes, and Proteobacteria) were recovered on nutrient agar plates. Firmicutes appeared to be dominant. All isolates were screened for heavy metal tolerance in broth culture supplemented separately with five different metals (Ni²⁺, Cu²⁺, Cr⁶⁺, As³⁺, and Cd²⁺). The percentage of isolates showing tolerance was 95% for Cr⁶⁺, 69.6% for Ni²⁺, 50.6% each for As³⁺ and Cu²⁺, and 7.6% for Cd²⁺. Isolates with high metal tolerance (5 to 12 mM) could be promising for bioremediation of Ni²⁺, Cu²⁺, Cr⁶⁺, and As³⁺ at sites contaminated with these heavy metals.
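The tolerance percentages can be converted back into isolate counts as a quick consistency check. This minimal Python sketch assumes that every percentage is a share of all 79 screened isolates, which the abstract implies but does not state explicitly; the resulting counts are inferred, not values reported by the study. Ion names are written in plain ASCII inside the code.

```python
TOTAL_ISOLATES = 79  # isolates screened, as stated in the abstract

# Share of isolates tolerant to each metal ion, as reported (%).
REPORTED_PERCENT = {
    "Cr6+": 95.0,
    "Ni2+": 69.6,
    "As3+": 50.6,
    "Cu2+": 50.6,
    "Cd2+": 7.6,
}


def implied_counts(percentages, total):
    """Back-calculate the whole number of isolates behind each percentage."""
    return {ion: round(pct / 100 * total) for ion, pct in percentages.items()}


counts = implied_counts(REPORTED_PERCENT, TOTAL_ISOLATES)
for ion, n in counts.items():
    # Recompute the percentage from the integer count to check the rounding.
    print(f"{ion}: {n}/{TOTAL_ISOLATES} = {100 * n / TOTAL_ISOLATES:.1f}%")
```

Under that assumption the recomputed shares match the abstract to one decimal place for Ni²⁺, As³⁺, Cu²⁺ and Cd²⁺, while the 95% for Cr⁶⁺ corresponds to 75 of 79 isolates (about 94.9%).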
Affiliation(s)
- Khomdram Niren Singh
- Microbiology and Molecular Systematics Laboratory, Department of Botany, Gauhati University, Guwahati, Assam, 781014, India
- Diganta Narzary
- Microbiology and Molecular Systematics Laboratory, Department of Botany, Gauhati University, Guwahati, Assam, 781014, India
6
Jahn K, Beerenwinkel N, Zhang L. The Bourque distances for mutation trees of cancers. Algorithms Mol Biol 2021; 16:9. [PMID: 34112201 PMCID: PMC8193869 DOI: 10.1186/s13015-021-00188-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/01/2021] [Accepted: 06/02/2021] [Indexed: 12/02/2022]
Abstract
Background: Mutation trees are rooted trees in which nodes are of arbitrary degree and labeled with a mutation set. These trees, also referred to as clonal trees, are used in computational oncology to represent the mutational history of tumours. Classical tree metrics such as the popular Robinson–Foulds distance are of limited use for the comparison of mutation trees. One reason is that mutation trees inferred with different methods or for different patients often contain different sets of mutation labels.
Results: We generalize the Robinson–Foulds distance into a set of distance metrics called Bourque distances for comparing mutation trees. We show that the basic version of the Bourque distance for mutation trees can be computed in linear time. We also make a connection between the Robinson–Foulds distance and the nearest neighbor interchange distance.
Supplementary Information: The online version contains supplementary material available at 10.1186/s13015-021-00188-3.
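The Robinson–Foulds idea that this work generalizes can be sketched for node-labeled trees by summarizing every edge with the bipartition of mutation labels it induces. This is a minimal Python sketch under stated assumptions, not the Bourque distances as defined in the paper: trees are given as hypothetical `(edges, labels)` pairs, bipartitions are collected as a set, and the distance is the size of their symmetric difference (some authors halve this count). Node ids and mutation names are invented.

```python
from collections import defaultdict


def edge_bipartitions(edges, labels):
    """Set of label bipartitions induced by the edges of a rooted tree.

    edges  : list of (parent, child) node ids
    labels : dict mapping node id -> set of mutation labels on that node
    Cutting edge (parent, child) separates the labels in child's subtree
    from all remaining labels.
    """
    children = defaultdict(list)
    nodes = set(labels)
    for parent, child in edges:
        children[parent].append(child)
        nodes.update((parent, child))
    all_labels = frozenset().union(*(labels.get(v, ()) for v in nodes))

    below = {}  # cache: node -> frozenset of labels in its subtree

    def subtree_labels(v):
        if v not in below:
            acc = set(labels.get(v, ()))
            for c in children[v]:
                acc |= subtree_labels(c)
            below[v] = frozenset(acc)
        return below[v]

    parts = set()
    for _, child in edges:
        side = subtree_labels(child)
        parts.add(frozenset({side, all_labels - side}))
    return parts


def rf_style_distance(tree1, tree2):
    """Number of bipartitions present in exactly one of the two trees."""
    return len(edge_bipartitions(*tree1) ^ edge_bipartitions(*tree2))


# Toy mutation trees (invented): the root carries the truncal mutation "a".
labels = {"r": {"a"}, "u": {"b"}, "v": {"c"}, "w": {"d"}}
T1 = ([("r", "u"), ("u", "v"), ("r", "w")], labels)  # c occurs below b
T2 = ([("r", "u"), ("r", "v"), ("r", "w")], labels)  # c attached to the root
print(rf_style_distance(T1, T2))  # 2
```

Moving the node carrying mutation c from below b to directly below the root changes one bipartition in each tree, so the distance is 2. Subtree label sets are cached, so each node's set is built only once.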
7
Gaba S, Kumari A, Medema M, Kaushik R. Pan-genome analysis and ancestral state reconstruction of class halobacteria: probability of a new super-order. Sci Rep 2020; 10:21205. [PMID: 33273480 PMCID: PMC7713125 DOI: 10.1038/s41598-020-77723-6] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 08/09/2018] [Accepted: 09/25/2020] [Indexed: 12/13/2022]
Abstract
Halobacteria, a class of Euryarchaeota, are extremely halophilic archaea that can adapt to a wide range of salt concentrations, generally from 10% NaCl up to saturation (32% NaCl). The class comprises the orders Halobacteriales, Haloferacales and Natrialbales. A pan-genome analysis of class Halobacteria was performed to explore the core (300) and variable components (softcore: 998; cloud: 36,531; shell: 11,784). The core component contained genes for replication, transcription, translation and repair, whereas a major portion of the variable component was related to environmental information processing. The pan-gene matrix was mapped onto the core-gene tree to identify the ancestral (44.8%) and derived (55.1%) genes of the last common ancestor of Halobacteria. The high percentage of derived genes, together with the presence of transformation and conjugation genes, indicates that horizontal gene transfer occurred during the evolution of Halobacteria. Core- and pan-gene trees were also constructed to infer a phylogeny, which suggests a new super-order comprising Natrialbales and Halobacteriales.
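The core, softcore, shell and cloud categories are usually assigned by how many genomes carry each gene cluster. The Python sketch below bins clusters that way; the cut-offs (core in at least 99% of genomes, softcore 95% to 99%, shell 15% to 95%, cloud below 15%) follow the bins Roary reports in its summary output and are an assumption, since the abstract does not state the thresholds used in the study. The function name and the presence data are hypothetical.

```python
def classify_gene_clusters(presence, n_genomes, core=99, softcore=95, shell=15):
    """Bin gene clusters by the percentage of genomes that carry them.

    presence : dict mapping cluster id -> set of genome ids containing it
    core, softcore, shell : lower bounds in percent (assumed Roary-style cut-offs)
    """
    bins = {"core": [], "softcore": [], "shell": [], "cloud": []}
    for cluster, carriers in presence.items():
        # Compare 100*k with pct*n (integers) instead of k/n with pct/100 (floats).
        scaled = 100 * len(carriers)
        if scaled >= core * n_genomes:
            bins["core"].append(cluster)
        elif scaled >= softcore * n_genomes:
            bins["softcore"].append(cluster)
        elif scaled >= shell * n_genomes:
            bins["shell"].append(cluster)
        else:
            bins["cloud"].append(cluster)
    return bins


genome_ids = [f"g{i}" for i in range(20)]
toy = {
    "c1": set(genome_ids),       # 20/20 = 100% -> core
    "c2": set(genome_ids[:19]),  # 19/20 = 95%  -> softcore
    "c3": set(genome_ids[:10]),  # 10/20 = 50%  -> shell
    "c4": set(genome_ids[:2]),   #  2/20 = 10%  -> cloud
}
print(classify_gene_clusters(toy, len(genome_ids)))
# {'core': ['c1'], 'softcore': ['c2'], 'shell': ['c3'], 'cloud': ['c4']}
```

Integer comparisons keep the boundaries exact, so a cluster in exactly 95% of genomes lands in softcore without floating-point surprises.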
Affiliation(s)
- Sonam Gaba
- Division of Microbiology, ICAR-Indian Agricultural Research Institute, New Delhi, India
- Amity Institute of Biotechnology, Amity University, Noida, Uttar Pradesh, India
- Abha Kumari
- Amity Institute of Biotechnology, Amity University, Noida, Uttar Pradesh, India
- Marnix Medema
- Bioinformatics Group, Wageningen University, Wageningen, The Netherlands
- Rajeev Kaushik
- Division of Microbiology, ICAR-Indian Agricultural Research Institute, New Delhi, India
8
Muñoz-Leal S, Domínguez L, Armstrong BA, Labruna MB, Bermúdez C S. Ornithodoros capensis sensu stricto (Ixodida: Argasidae) in Coiba National Park: first report for Panama, with notes on the O. capensis group in Panamanian shores and Costa Rica. Exp Appl Acarol 2020; 81:469-481. [PMID: 32607963 DOI: 10.1007/s10493-020-00516-z] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 01/28/2020] [Accepted: 06/18/2020] [Indexed: 06/11/2023]
Abstract
Ornithodoros capensis sensu lato (s.l.) is a group of morphologically similar soft ticks that mostly parasitize seabirds in continental and offshore territories worldwide. Ornithodoros capensis sensu stricto (s.s.) has previously been recorded on many islands and at coastal localities along the American continent; however, some records from Central America remain obscure. In this work, we performed morphological and molecular analyses on soft ticks collected in Coiba National Park, an archipelago in the Pacific Ocean off the coast of Panama, confirming the occurrence of O. capensis s.s. in this country for the first time. In addition, a morphological examination of museum specimens collected in Costa Rica and at a further locality in Panama confirmed that O. capensis s.l. is established in Costa Rica and that its distribution along Panamanian shores is likely wider.
Affiliation(s)
- Sebastián Muñoz-Leal
- Departamento de Medicina Veterinária Preventiva e Saúde Animal, Faculdade de Medicina Veterinária e Zootecnia, Universidade de São Paulo, São Paulo, Brazil
- Lillian Domínguez
- Medical Entomology Department, Gorgas Memorial Institute of Health Studies, Panama City, Panama
- Brittany A Armstrong
- Department of Molecular Virology and Microbiology, Baylor College of Medicine, Houston, TX, USA
- Marcelo B Labruna
- Departamento de Medicina Veterinária Preventiva e Saúde Animal, Faculdade de Medicina Veterinária e Zootecnia, Universidade de São Paulo, São Paulo, Brazil
- Sergio Bermúdez C
- Medical Entomology Department, Gorgas Memorial Institute of Health Studies, Panama City, Panama
- Coiba Scientific Station, Coiba AIP, City of Knowledge, Panama City, Panama
9
Chen ZZ, Ueta S, Li J, Wang L. Computing a Consensus Phylogeny via Leaf Removal. J Comput Biol 2020; 27:175-188. [PMID: 31638413 DOI: 10.1089/cmb.2019.0269] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022]
Abstract
Given a set [Formula: see text] of phylogenetic trees with the same leaf-label set X, we wish to remove some leaves from the trees so that there is a tree T with leaf-label set X displaying all the resulting trees. Note that the labels of leaves removed from one input tree may differ from those removed from another input tree. One objective is to minimize the total number of leaves removed from the trees; the other is to minimize the maximum number of leaves removed from any single input tree. Chauve et al. refer to the problem with the first (respectively, second) objective as AST-LR (respectively, AST-LR-d), and they show that both problems are NP-hard, where NP is the class of problems solvable in non-deterministic polynomial time. They further present algorithms for the parameterized versions of both problems. In this article, we point out that their algorithm for the parameterized version of AST-LR is flawed and present a new algorithm. Since neither Chauve et al.'s algorithm for AST-LR-d nor our new algorithm for AST-LR appears practical, we further design integer linear programming (ILP) models for AST-LR and AST-LR-d, and we discuss speedup issues when using popular ILP solvers (e.g., GUROBI or CPLEX) to solve the models. Our experimental results show that our ILP approach is quite efficient.
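To make the leaf-removal setting concrete, here is a brute-force check that a candidate tree T displays a pruned input tree. It is a minimal Python sketch that assumes rooted binary trees written as nested tuples and relies on the fact that a rooted binary tree is determined by its rooted triplets; it is not the ILP models or the parameterized algorithms from the article, all helper names are hypothetical, and the O(n³) triplet enumeration only suits small examples.

```python
from itertools import combinations


def leaf_paths(tree, path=()):
    """Map each leaf of a nested-tuple rooted tree to its root-to-leaf child-index path."""
    if not isinstance(tree, tuple):
        return {tree: path}
    paths = {}
    for i, child in enumerate(tree):
        paths.update(leaf_paths(child, path + (i,)))
    return paths


def lca_depth(p, q):
    """Depth of the lowest common ancestor = length of the shared path prefix."""
    depth = 0
    for a, b in zip(p, q):
        if a != b:
            break
        depth += 1
    return depth


def triplets(tree, keep=None):
    """Resolved triplets (frozenset({x, y}), z), meaning xy|z, on the leaves in `keep`."""
    paths = leaf_paths(tree)
    leaves = sorted(paths if keep is None else keep)
    found = set()
    for a, b, c in combinations(leaves, 3):
        dab = lca_depth(paths[a], paths[b])
        dac = lca_depth(paths[a], paths[c])
        dbc = lca_depth(paths[b], paths[c])
        if dab > max(dac, dbc):
            found.add((frozenset((a, b)), c))
        elif dac > max(dab, dbc):
            found.add((frozenset((a, c)), b))
        elif dbc > max(dab, dac):
            found.add((frozenset((b, c)), a))
        # all depths equal: the three leaves form an unresolved fan
    return found


def remove_leaves(tree, drop):
    """Delete the leaves in `drop` and suppress internal nodes left with one child."""
    if not isinstance(tree, tuple):
        return None if tree in drop else tree
    kept = [t for t in (remove_leaves(c, drop) for c in tree) if t is not None]
    if not kept:
        return None
    return kept[0] if len(kept) == 1 else tuple(kept)


def displays(big, small):
    """True if `big` restricted to the leaves of binary tree `small` equals `small`."""
    small_leaves = set(leaf_paths(small))
    if not small_leaves <= set(leaf_paths(big)):
        return False
    return triplets(small) <= triplets(big, small_leaves)


T = ((("a", "b"), "c"), "d")   # candidate tree on X = {a, b, c, d}
T1 = ((("a", "c"), "b"), "d")  # input tree that conflicts with T
print(displays(T, T1))                          # False
print(displays(T, remove_leaves(T1, {"c"})))    # True
```

Removing the single leaf c from T1 is enough for T to display it, so this toy instance costs one removed leaf under either objective.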
Affiliation(s)
- Zhi-Zhong Chen
- Division of Information System Design, Tokyo Denki University, Hatoyama, Saitama, Japan
- Shohei Ueta
- Division of Information System Design, Tokyo Denki University, Hatoyama, Saitama, Japan
- Jingyu Li
- Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong SAR
- Lusheng Wang
- Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong SAR
10
Muñoz-Leal S, Marcili A, Fuentes-Castillo D, Ayala M, Labruna MB. A relapsing fever Borrelia and spotted fever Rickettsia in ticks from an Andean valley, central Chile. Exp Appl Acarol 2019; 78:403-420. [PMID: 31165944 DOI: 10.1007/s10493-019-00389-x] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 01/31/2019] [Accepted: 05/31/2019] [Indexed: 06/09/2023]
Abstract
In humans, emerging infectious diseases are mostly zoonoses, with ticks playing an important role as vectors. Tick-borne relapsing fever Borrelia and spotted fever Rickettsia occur in endemic foci across tropical and subtropical regions of the globe. However, both are widely neglected etiologic agents. In this study, we performed molecular analyses to assess the presence of Borrelia and Rickettsia DNA in ticks infesting small mammals within a National Reserve located in the Andes Mountains, central Chile. While hard ticks were negative for both agents, sequences of four rickettsial (gltA, htrA, ompA, ompB) and two borrelial (16S rRNA and flaB) genes were obtained from larvae of an Ornithodoros sp. morphologically related to Ornithodoros atacamensis. Phylogenetic analyses indicated that the detected Borrelia and Rickettsia spp. belong to the relapsing fever and spotted fever groups, respectively. Moreover, the agents formed monophyletic clades with Rickettsia amblyommatis and "Candidatus Borrelia johnsonii." As the positive ticks parasitize rodents within a highly visited National Reserve where outdoor activities are common, the risk of human parasitism should not be dismissed.
Affiliation(s)
- Sebastián Muñoz-Leal
- Departamento de Medicina Veterinária Preventiva e Saúde Animal, Faculdade de Medicina Veterinária e Zootecnia, Universidade de São Paulo, São Paulo, Brazil
- Arlei Marcili
- Departamento de Medicina Veterinária Preventiva e Saúde Animal, Faculdade de Medicina Veterinária e Zootecnia, Universidade de São Paulo, São Paulo, Brazil
- Mestrado em Medicina e Bem estar animal, Universidade Santo Amaro, São Paulo, São Paulo, Brazil
- Danny Fuentes-Castillo
- Departamento de Patologia Experimental e Comparada, Faculdade de Medicina Veterinária e Zootecnia, Universidade de São Paulo, São Paulo, Brazil
- Marcelo B Labruna
- Departamento de Medicina Veterinária Preventiva e Saúde Animal, Faculdade de Medicina Veterinária e Zootecnia, Universidade de São Paulo, São Paulo, Brazil
11
Muñoz-Leal S, Macedo C, Gonçalves TC, Dias Barreira J, Labruna MB, de Lemos ERS, Ogrzewalska M. Detected microorganisms and new geographic records of Ornithodoros rietcorreai (Acari: Argasidae) from northern Brazil. Ticks Tick Borne Dis 2019; 10:853-861. [DOI: 10.1016/j.ttbdis.2019.04.004] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 10/10/2018] [Revised: 03/27/2019] [Accepted: 04/08/2019] [Indexed: 10/27/2022]
12
Exploring the Tiers of Rooted Phylogenetic Network Space Using Tail Moves. Bull Math Biol 2018; 80:2177-2208. [PMID: 29948885 PMCID: PMC6061524 DOI: 10.1007/s11538-018-0452-0] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 08/17/2017] [Accepted: 06/04/2018] [Indexed: 11/25/2022]
Abstract
Popular methods for exploring the space of rooted phylogenetic trees use rearrangement moves such as rooted Nearest Neighbour Interchange (rNNI) and rooted Subtree Prune and Regraft (rSPR). Recently, these moves were generalized to rooted phylogenetic networks, which are a more suitable representation of reticulate evolutionary histories, and it was shown that any two rooted phylogenetic networks of the same complexity are connected by a sequence of either rSPR or rNNI moves. Here, we show that this is possible using only tail moves, which are a restricted version of rSPR moves on networks that are more closely related to rSPR moves on trees. The connectedness still holds even when we restrict to distance-1 tail moves (a localized version of tail moves). Moreover, we give bounds on the number of (distance-1) tail moves necessary to turn one network into another, which in turn yield new bounds for rSPR, rNNI and SPR (i.e. the equivalent of rSPR on unrooted networks). The upper bounds are constructive, meaning that we can actually find a sequence with at most this length for any pair of networks. Finally, we show that finding a shortest sequence of tail or rSPR moves is NP-hard.
13
Cha IT, Cho ES, Yoo Y, Seok YJ, Park I, Lim HS, Park JM, Roh SW, Nam YD, Choi HJ, Lee YK, Seo MJ. Paenibacillus arcticus sp. nov., isolated from Arctic soil. Int J Syst Evol Microbiol 2017; 67:4385-4389. [DOI: 10.1099/ijsem.0.002299] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Indexed: 11/18/2022]
Affiliation(s)
- In-Tae Cha
- Division of Bioengineering, Incheon National University, Incheon 22012, Republic of Korea
- Eui-Sang Cho
- Department of Bioengineering and Nano-Bioengineering, Graduate School of Incheon National University, Incheon 22012, Republic of Korea
- Yesol Yoo
- Department of Life Sciences, Graduate School of Incheon National University, Incheon 22012, Republic of Korea
- Yoon Ji Seok
- Department of Life Sciences, Graduate School of Incheon National University, Incheon 22012, Republic of Korea
- Inhye Park
- Department of Life Sciences, Graduate School of Incheon National University, Incheon 22012, Republic of Korea
- Hee Seon Lim
- Department of Life Sciences, Graduate School of Incheon National University, Incheon 22012, Republic of Korea
- Jung-Min Park
- Korean Culture Center of Microorganisms, Seoul 03641, Republic of Korea
- Seong Woon Roh
- Microbiology and Functionality Research Group, World Institute of Kimchi, Gwangju 61755, Republic of Korea
- Young-Do Nam
- Research Group of Gut Microbiome, Korea Food Research Institute, Seongnam 13539, Republic of Korea
- Hak-Jong Choi
- Microbiology and Functionality Research Group, World Institute of Kimchi, Gwangju 61755, Republic of Korea
- Yoo Kyung Lee
- Division of Life Sciences, Korea Polar Research Institute, Incheon 21990, Republic of Korea
- Myung-Ji Seo
- Division of Bioengineering, Incheon National University, Incheon 22012, Republic of Korea
- Department of Bioengineering and Nano-Bioengineering, Graduate School of Incheon National University, Incheon 22012, Republic of Korea
14
Gorecki P, Paszek J, Eulenstein O. Unconstrained Diameters for Deep Coalescence. IEEE/ACM Trans Comput Biol Bioinform 2017; 14:1002-1012. [PMID: 26887001 DOI: 10.1109/tcbb.2016.2520937] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/05/2023]
Abstract
The minimizing-deep-coalescence (MDC) approach infers a median (species) tree for a given set of gene trees under the deep coalescence cost. This cost accounts for the minimum number of deep coalescences needed to reconcile a gene tree with a species tree, where the leaf-genes are mapped to the leaf-species through a function called leaf labeling. To better understand the MDC approach, we investigate here the diameter of a gene tree, which is an important property of the deep coalescence cost. This diameter is the maximal deep coalescence cost for a given gene tree under all leaf labelings for each possible species tree topology. While we prove that this diameter is generally infinite, this result relies on the diameter's unrealistic assumption that species trees can be of infinite size. Providing a more practical definition, we introduce a natural extension of the gene tree diameter that constrains the species tree size by a given constant. For this new diameter, we describe an exact formula, present a complete classification of the trees yielding this diameter, derive formulas for its mean and variance, and demonstrate its utility in comparative studies.
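The deep coalescence cost on which the diameter is built can be evaluated for one species tree and one leaf labeling by counting extra gene lineages under the LCA mapping. This Python sketch is an illustration under stated assumptions, not code from the article: trees are nested tuples, the leaf labeling is a dict from gene leaves to species, helper names are hypothetical, and the cost is taken as the usual sum over species-tree branches of (number of gene lineages on the branch minus 1). It does not compute the diameter itself, and its quadratic running time only suits toy trees.

```python
def clusters(tree):
    """All clusters (leaf sets) of a nested-tuple tree; the last entry is the root's."""
    if not isinstance(tree, tuple):
        return [frozenset([tree])]
    found, here = [], frozenset()
    for child in tree:
        sub = clusters(child)
        found.extend(sub)
        here |= sub[-1]  # sub[-1] is the child's own cluster
    found.append(here)
    return found


def species_set(gene_tree, leaf_map):
    """Species below a gene-tree node (its LCA image is the smallest cluster containing it)."""
    if not isinstance(gene_tree, tuple):
        return frozenset([leaf_map[gene_tree]])
    return frozenset().union(*(species_set(c, leaf_map) for c in gene_tree))


def extra_lineages(gene_tree, species_tree, leaf_map):
    """Deep coalescence cost as the total number of extra lineages."""
    records = []  # (species set of gene node, species set of its parent or None)

    def walk(node, parent_species):
        here = species_set(node, leaf_map)
        records.append((here, parent_species))
        if isinstance(node, tuple):
            for child in node:
                walk(child, here)

    walk(gene_tree, None)
    total = 0
    for cluster in clusters(species_tree):
        # Lineages leaving the top of the branch above `cluster`: gene nodes mapped
        # inside the cluster whose parent (if any) is mapped strictly above it.
        k = sum(1 for sp, parent in records
                if sp <= cluster and (parent is None or not parent <= cluster))
        total += max(k - 1, 0)
    return total


S = (("A", "B"), "C")
leaf_map = {"a1": "A", "b1": "B", "c1": "C"}
print(extra_lineages((("a1", "b1"), "c1"), S, leaf_map))  # 0 (congruent)
print(extra_lineages((("a1", "c1"), "b1"), S, leaf_map))  # 1 (one extra lineage)
```

In the incongruent case, two gene lineages (from a1 and b1) coexist on the branch above the (A, B) cluster, which is the single extra lineage counted.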
15
Abstract
Phylogenetic networks are a generalization of phylogenetic trees that allow for representation of reticulate evolution. Recently, a space of unrooted phylogenetic networks was introduced, where such a network is a connected graph in which every vertex has degree 1 or 3 and whose leaf-set is a fixed set X of taxa. This space, denoted [Formula: see text], is defined in terms of two operations on networks, the nearest neighbor interchange and triangle operations, which can be used to transform any network with leaf set X into any other network with that leaf set. In particular, it gives rise to a metric d on [Formula: see text] which is given by the smallest number of operations required to transform one network in [Formula: see text] into another in [Formula: see text]. The metric generalizes the well-known NNI-metric on phylogenetic trees which has been intensively studied in the literature. In this paper, we derive a bound for the metric d as well as a related metric [Formula: see text] which arises when restricting d to the subset of [Formula: see text] consisting of all networks with [Formula: see text] vertices, [Formula: see text]. We also introduce two new metrics on networks, the SPR and TBR metrics, which generalize the metrics on phylogenetic trees with the same name and give bounds for these new metrics. We expect our results to eventually have applications to the development and understanding of network search algorithms.
16
Gambette P, van Iersel L, Jones M, Lafond M, Pardi F, Scornavacca C. Rearrangement moves on rooted phylogenetic networks. PLoS Comput Biol 2017; 13:e1005611. [PMID: 28763439 PMCID: PMC5557604 DOI: 10.1371/journal.pcbi.1005611] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 01/16/2017] [Revised: 08/15/2017] [Accepted: 05/27/2017] [Indexed: 12/05/2022]
Abstract
Phylogenetic tree reconstruction is usually done by local search heuristics that explore the space of the possible tree topologies via simple rearrangements of their structure. Tree rearrangement heuristics have been used in combination with practically all optimization criteria in use, from maximum likelihood and parsimony to distance-based principles, and in a Bayesian context. Their basic components are rearrangement moves that specify all possible ways of generating alternative phylogenies from a given one, and whose fundamental property is to be able to transform, by repeated application, any phylogeny into any other phylogeny. Despite their long tradition in tree-based phylogenetics, very little research has gone into studying similar rearrangement operations for phylogenetic networks, that is, phylogenies explicitly representing scenarios that include reticulate events such as hybridization, horizontal gene transfer, population admixture, and recombination. To fill this gap, we propose “horizontal” moves that ensure that every network of a certain complexity can be reached from any other network of the same complexity, and “vertical” moves that ensure reachability between networks of different complexities. When applied to phylogenetic trees, our horizontal moves, named rNNI and rSPR, reduce to the best-known moves on rooted phylogenetic trees, nearest-neighbor interchange and rooted subtree pruning and regrafting. Besides a number of reachability results (separating the contributions of horizontal and vertical moves), we prove that rNNI moves are local versions of rSPR moves, and provide bounds on the sizes of the rNNI neighborhoods. The paper focuses on the most biologically meaningful versions of phylogenetic networks, where edges are oriented and reticulation events clearly identified. Moreover, our rearrangement moves are robust to the fact that networks with higher complexity usually allow a better fit with the data. Our goal is to provide a solid basis for practical phylogenetic network reconstruction.
Phylogenetic networks are used to represent reticulate evolution, that is, cases in which the tree-of-life metaphor for evolution breaks down because some of its branches have merged at one or several points in the past. This may occur, for example, when some organisms in the phylogeny are hybrids. In this paper, we deal with an elementary question for the reconstruction of phylogenetic networks: how to explore the space of all possible networks. The fundamental component for this is the set of operations that should be employed to generate alternative hypotheses for what happened in the past, which serve as building blocks for optimization techniques such as hill-climbing. Although these approaches have a long tradition in classic tree-based phylogenetics, their application to networks that explicitly represent reticulate evolution is relatively unexplored. This paper provides the fundamental definitions and theoretical results for subsequent work in practical methods for phylogenetic network reconstruction: we subdivide networks into layers, according to a generally accepted measure of their complexity, and provide operations that allow both full exploration of each layer and movement across different layers. These operations constitute natural generalizations of well-known operations for exploring the space of phylogenetic trees, the lowest layer in the hierarchy described above.
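On trees, where rNNI reduces to rooted nearest-neighbor interchange, the move and the distance it induces can be sketched in a few lines. This Python sketch covers rooted binary trees only (not networks); trees are nested tuples, and the helper names `canon`, `rnni_neighbors` and `rnni_distance` are hypothetical. The breadth-first search explores a space that grows exponentially with the number of leaves, so it is only meant for tiny examples.

```python
from collections import deque


def canon(tree):
    """Canonical form of a rooted tree as nested tuples (child order ignored)."""
    if not isinstance(tree, tuple):
        return tree
    return tuple(sorted((canon(c) for c in tree), key=repr))


def rnni_neighbors(tree):
    """All trees one rooted NNI move away from a rooted binary tree."""
    if not isinstance(tree, tuple):
        return set()
    left, right = tree  # binary trees only
    out = set()
    # Internal edge (u, v): v = (a, b) is a child of u and c is v's sibling.
    # The two NNI moves swap c with b, or c with a.
    for v, c in ((left, right), (right, left)):
        if isinstance(v, tuple):
            a, b = v
            out.add(canon(((a, c), b)))
            out.add(canon(((b, c), a)))
    # Moves on internal edges deeper in the tree.
    for nb in rnni_neighbors(left):
        out.add(canon((nb, right)))
    for nb in rnni_neighbors(right):
        out.add(canon((left, nb)))
    return out


def rnni_distance(t1, t2):
    """Breadth-first search for the fewest rNNI moves turning t1 into t2."""
    start, goal = canon(t1), canon(t2)
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        tree, dist = queue.popleft()
        if tree == goal:
            return dist
        for nb in rnni_neighbors(tree):
            if nb not in seen:
                seen.add(nb)
                queue.append((nb, dist + 1))
    return None  # unreachable, e.g. different leaf sets


cat = ((("a", "b"), "c"), "d")  # four-leaf caterpillar
print(len(rnni_neighbors(cat)))                     # 4
print(rnni_distance(cat, ((("a", "c"), "b"), "d")))  # 1
```

Each internal edge below the root yields two rearrangements, so the four-leaf caterpillar, with two such edges, has four rNNI neighbors.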
Affiliation(s)
- Philippe Gambette
- Laboratoire d’Informatique Gaspard-Monge (LIGM), Université Paris-Est, CNRS, ENPC, ESIEE Paris, UPEM, F-77454, Marne-la-Vallée, France
- Leo van Iersel
- Delft Institute of Applied Mathematics, Delft University of Technology, Postbus 5031, 2628 CD Delft, The Netherlands
- Mark Jones
- Delft Institute of Applied Mathematics, Delft University of Technology, Postbus 5031, 2628 CD Delft, The Netherlands
- Manuel Lafond
- Department of Mathematics and Statistics, University of Ottawa, K1N 6N5 Ottawa, Canada
- Fabio Pardi
- Laboratoire d’Informatique, de Robotique et de Microélectronique de Montpellier (LIRMM), Université de Montpellier, CNRS, 34095 Montpellier Cedex 5, France
- Institut de Biologie Computationnelle (IBC), 34095 Montpellier, France
- Celine Scornavacca
- Institut de Biologie Computationnelle (IBC), 34095 Montpellier, France
- Institut des Sciences de l’Evolution (ISE-M), Université de Montpellier, CNRS, IRD, EPHE, 34095 Montpellier Cedex 5, France
17
Huber KT, Linz S, Moulton V, Wu T. Spaces of phylogenetic networks from generalized nearest-neighbor interchange operations. J Math Biol 2015; 72:699-725. [PMID: 26037483 DOI: 10.1007/s00285-015-0899-7] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 12/05/2014] [Revised: 05/04/2015] [Indexed: 11/29/2022]
Abstract
Phylogenetic networks are a generalization of evolutionary or phylogenetic trees that are used to represent the evolution of species which have undergone reticulate evolution. In this paper we consider spaces of such networks defined by some novel local operations that we introduce for converting one phylogenetic network into another. These operations are modeled on the well-studied nearest-neighbor interchange operations on phylogenetic trees, and lead to natural generalizations of the tree spaces that have been previously associated to such operations. We present several results on spaces of some relatively simple networks, called level-1 networks, including the size of the neighborhood of a fixed network, and bounds on the diameter of the metric defined by taking the smallest number of operations required to convert one network into another. We expect that our results will be useful in the development of methods for systematically searching for optimal phylogenetic networks using, for example, likelihood and Bayesian approaches.
Affiliation(s)
- Katharina T Huber
- School of Computing Sciences, University of East Anglia, Norwich, NR4 7TJ, UK.
- Simone Linz
- Department of Computer Science, University of Auckland, Auckland, New Zealand.
- Vincent Moulton
- School of Computing Sciences, University of East Anglia, Norwich, NR4 7TJ, UK.
- Taoyang Wu
- School of Computing Sciences, University of East Anglia, Norwich, NR4 7TJ, UK.
18
Bastkowski S, Moulton V, Spillner A, Wu T. Neighborhoods of trees in circular orderings. Bull Math Biol 2014; 77:46-70. [PMID: 25477080 DOI: 10.1007/s11538-014-0049-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2014] [Accepted: 11/25/2014] [Indexed: 11/26/2022]
Abstract
In phylogenetics, a common strategy for constructing an evolutionary tree for a set of species is to search the space of all such trees for one that optimizes a given score function (such as the minimum evolution, parsimony, or likelihood score). As this can be computationally intensive, it was recently proposed to restrict such searches to the trees that are compatible with some circular ordering of the species set. To inform the design of efficient algorithms for such searches, it is therefore of interest to bound the number of trees compatible with a fixed ordering in the neighborhood of a tree, where the neighborhood is determined by one of the operations commonly used in tree search: nearest neighbor interchange (NNI), subtree prune and regraft (SPR), and tree bisection and reconnection (TBR). We show that the size of such an NNI neighborhood of a binary tree is independent of the tree's topology, but that this is not the case for the SPR and TBR operations. We also give tight upper and lower bounds for the size of the SPR and TBR neighborhoods of a binary tree and characterize the trees for which these bounds are attained.
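The abstract concerns neighborhoods restricted to trees compatible with a circular ordering. As a baseline only, the sketch below enumerates the unrestricted NNI neighborhood of an unrooted binary tree, which classically has 2(n - 3) members for n leaves whatever the topology (two swaps around each of the n - 3 internal edges). The circular-ordering restriction studied in the paper is not modelled; the adjacency-set representation and all function names are assumptions for illustration.

```python
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

Adj = Dict[str, Set[str]]  # undirected adjacency sets; leaves have degree 1
SplitSet = FrozenSet[FrozenSet[FrozenSet[str]]]


def build_tree(edges: Iterable[Tuple[str, str]]) -> Adj:
    """Adjacency sets of an unrooted tree given as an edge list."""
    adj: Adj = {}
    for x, y in edges:
        adj.setdefault(x, set()).add(y)
        adj.setdefault(y, set()).add(x)
    return adj


def leaf_side(adj: Adj, u: str, v: str) -> FrozenSet[str]:
    """Leaves reachable from v once the edge {u, v} is deleted."""
    stack, seen, found = [v], {u, v}, set()
    while stack:
        x = stack.pop()
        if len(adj[x]) == 1:
            found.add(x)
        for y in adj[x]:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return frozenset(found)


def split_set(adj: Adj) -> SplitSet:
    """Nontrivial splits, one per internal edge; they identify the topology."""
    leaves = frozenset(x for x in adj if len(adj[x]) == 1)
    splits = set()
    for u in adj:
        for v in adj[u]:
            if len(adj[u]) > 1 and len(adj[v]) > 1:
                side = leaf_side(adj, u, v)
                splits.add(frozenset({side, leaves - side}))
    return frozenset(splits)


def swap(adj: Adj, u: str, x: str, v: str, y: str) -> Adj:
    """Copy of adj in which x (attached to u) and y (attached to v) trade places."""
    new = {k: set(nb) for k, nb in adj.items()}
    new[u].remove(x)
    new[x].remove(u)
    new[v].remove(y)
    new[y].remove(v)
    new[u].add(y)
    new[y].add(u)
    new[v].add(x)
    new[x].add(v)
    return new


def nni_neighbors(adj: Adj) -> List[Adj]:
    """The two NNI rearrangements around every internal edge {u, v}."""
    result = []
    for u in adj:
        for v in adj[u]:
            if u < v and len(adj[u]) == 3 and len(adj[v]) == 3:
                b = sorted(adj[u] - {v})[1]  # one subtree hanging off u
                c, d = sorted(adj[v] - {u})  # the two subtrees hanging off v
                result.append(swap(adj, u, b, v, c))
                result.append(swap(adj, u, b, v, d))
    return result


# Two different 6-leaf topologies (internal node names are arbitrary).
caterpillar = build_tree([("a", "i1"), ("b", "i1"), ("i1", "i2"), ("c", "i2"),
                          ("i2", "i3"), ("d", "i3"), ("i3", "i4"),
                          ("e", "i4"), ("f", "i4")])
snowflake = build_tree([("i0", "i1"), ("i0", "i2"), ("i0", "i3"),
                        ("a", "i1"), ("b", "i1"), ("c", "i2"),
                        ("d", "i2"), ("e", "i3"), ("f", "i3")])

n = 6
for tree in (caterpillar, snowflake):
    distinct = {split_set(t) for t in nni_neighbors(tree)} - {split_set(tree)}
    print(len(distinct), 2 * (n - 3))  # 6 6 for both topologies
```

Each NNI replaces exactly one split, and the two swaps around an edge produce different new splits, which is why the count does not depend on the tree's shape.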
19
Lin Y, Rajan V, Moret BME. A metric for phylogenetic trees based on matching. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2012; 9:1014-1022. [PMID: 22184263 DOI: 10.1109/tcbb.2011.157] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.0] [Indexed: 05/31/2023]
Abstract
Comparing two or more phylogenetic trees is a fundamental task in computational biology. The simplest outcome of such a comparison is a pairwise measure of similarity, dissimilarity, or distance. A large number of such measures have been proposed, but so far all suffer from problems ranging from computational cost to lack of robustness; many can be shown to behave unexpectedly under certain plausible inputs. For instance, the widely used Robinson-Foulds distance is poorly distributed and thus affords little discrimination, while also lacking robustness in the face of very small changes: reattaching a single leaf elsewhere in a tree of any size can instantly maximize the distance. In this paper, we introduce a new pairwise distance measure, based on matching, for phylogenetic trees. We prove that our measure induces a metric on the space of trees, show how to compute it in low polynomial time, verify through statistical testing that it is robust, and finally note that it does not exhibit unexpected behavior under the same inputs that cause problems with other measures. We also illustrate its usefulness in clustering trees, demonstrating significant improvements in the quality of hierarchical clustering compared with the same collections of trees clustered using the Robinson-Foulds distance.
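The claim that reattaching a single leaf can maximize the Robinson-Foulds distance can be checked on a small case. The sketch below is an illustration under assumed conventions, not the authors' code: it represents an unrooted binary caterpillar by the order of its leaves along the spine, lists its nontrivial splits, and takes the RF distance as the size of the symmetric difference of the two split sets. Moving leaf a from one end of the caterpillar to the other (one prune-and-regraft) yields the largest possible value, 2(n - 3).

```python
from typing import FrozenSet, List, Set

Split = FrozenSet[FrozenSet[str]]  # an unordered pair {A, B} of leaf sets


def caterpillar_splits(order: List[str]) -> Set[Split]:
    """Nontrivial splits of an unrooted binary caterpillar whose leaves appear
    in `order` along the spine (the two end pairs form cherries)."""
    leaves = frozenset(order)
    splits: Set[Split] = set()
    # Cutting the spine after position i, for 2 <= i <= n - 2, leaves at
    # least two leaves on each side, i.e. a nontrivial split.
    for i in range(2, len(order) - 1):
        side = frozenset(order[:i])
        splits.add(frozenset({side, leaves - side}))
    return splits


def rf_distance(s1: Set[Split], s2: Set[Split]) -> int:
    """Robinson-Foulds distance: number of splits found in exactly one tree."""
    return len(s1 ^ s2)


n = 6
t1 = caterpillar_splits(list("abcdef"))
t2 = caterpillar_splits(list("bcdefa"))  # leaf a pruned and regrafted at the far end
print(rf_distance(t1, t2), 2 * (n - 3))  # 6 6
```

An unrooted binary tree on n leaves has n - 3 nontrivial splits, so 2(n - 3) is the maximum; the same one-leaf move attains it for every n >= 4.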
Affiliation(s)
- Yu Lin
- Laboratory for Computational Biology and Bioinformatics, School of Computer and Communication Sciences, Swiss Federal Institute of Technology-EPFL, INJ 211, Station 14, Lausanne CH-1015, Switzerland.
20
Bogdanowicz D, Giaro K. Matching split distance for unrooted binary phylogenetic trees. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2012; 9:150-160. [PMID: 21383415 DOI: 10.1109/tcbb.2011.48] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.3] [Indexed: 05/30/2023]
Abstract
The reconstruction of evolutionary trees is one of the primary objectives in phylogenetics. Such a tree represents the historical evolutionary relationship between different species or organisms. Tree comparisons are used for multiple purposes, from unveiling the history of species to deciphering evolutionary associations among organisms and geographical areas. In this paper, we propose a new method of defining distances between unrooted binary phylogenetic trees that is especially useful for relatively large phylogenetic trees. Next, we investigate in detail the properties of one example of these metrics, called the Matching Split distance, and describe how the general method can be extended to nonbinary trees.
21
Huber KT, Spillner A, Suchecki R, Moulton V. Metrics on multilabeled trees: interrelationships and diameter bounds. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2011; 8:1029-1040. [PMID: 21116046 DOI: 10.1109/tcbb.2010.122] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Indexed: 05/30/2023]
Abstract
Multilabeled trees or MUL-trees, for short, are trees whose leaves are labeled by elements of some nonempty finite set X such that more than one leaf may be labeled by the same element of X. This class of trees includes phylogenetic trees and tree shapes. MUL-trees arise naturally in, for example, biogeography and gene evolution studies and also in the area of phylogenetic network reconstruction. In this paper, we introduce novel metrics which may be used to compare MUL-trees, most of which generalize well-known metrics on phylogenetic trees and tree shapes. These metrics can be used, for example, to better understand the space of MUL-trees or to help visualize collections of MUL-trees. In addition, we describe some relationships between the MUL-tree metrics that we present and also give some novel diameter bounds for these metrics. We conclude by briefly discussing some open problems as well as pointing out how MUL-tree metrics may be used to define metrics on the space of phylogenetic networks.
Affiliation(s)
- Katharina T Huber
- School of Computing Sciences, University of East Anglia, Norwich, NR4 7TJ, UK.
22
Beiko RG, Hamilton N. Phylogenetic identification of lateral genetic transfer events. BMC Evol Biol 2006; 6:15. [PMID: 16472400 PMCID: PMC1431587 DOI: 10.1186/1471-2148-6-15] [Citation(s) in RCA: 93] [Impact Index Per Article: 5.2] [Received: 09/02/2005] [Accepted: 02/11/2006] [Indexed: 11/12/2022]
Abstract
BACKGROUND Lateral genetic transfer can lead to disagreements among phylogenetic trees comprising sequences from the same set of taxa. Where topological discordance is thought to have arisen through genetic transfer events, tree comparisons can be used to identify the lineages that may have shared genetic information. An 'edit path' of one or more transfer events can be represented with a series of subtree prune and regraft (SPR) operations, but finding an optimal set of such operations is NP-hard for comparisons between rooted trees, and may be so for unrooted trees as well.
RESULTS Efficient Evaluation of Edit Paths (EEEP) is a new tree comparison algorithm that uses evolutionarily reasonable constraints to identify and eliminate many unproductive search avenues, reducing the time required to solve many edit path problems. The performance of EEEP compares favourably to that of other algorithms when applied to strictly bifurcating trees with specified numbers of SPR operations. We also used EEEP to recover edit paths from over 19,000 unrooted, incompletely resolved protein trees containing up to 144 taxa as part of a large phylogenomic study. While inferred protein trees were far more similar to a reference supertree than random trees were to each other, the phylogenetic distance spanned by random versus inferred transfer events was similar, suggesting that real transfer events occur most frequently between closely related organisms but can span large phylogenetic distances as well. Although most of the protein trees examined here were very similar to the reference supertree, requiring zero or one edit operations for reconciliation, some trees implied up to 40 transfer events within a single orthologous set of proteins.
CONCLUSION Since sequence trees typically have no implied root and may contain unresolved or multifurcating nodes, the strategy implemented in EEEP is the most appropriate for phylogenomic analyses.
The high degree of consistency among inferred protein trees shows that vertical inheritance is the dominant pattern of evolution, at least for the set of organisms considered here. However, the edit paths inferred using EEEP suggest an important role for genetic transfer in the evolution of microbial genomes as well.
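Beiko and Hamilton represent each transfer event as one subtree prune and regraft (SPR) operation. The sketch below shows only a single, restricted SPR on a rooted binary tree written as nested tuples: one leaf is pruned (its parent is suppressed) and regrafted beside another leaf. General SPR moves whole subtrees onto arbitrary edges, and EEEP's search for shortest edit paths is not attempted; all names here are hypothetical.

```python
from typing import FrozenSet, Optional, Set, Union

Tree = Union[str, tuple]  # a leaf label, or a tuple of child subtrees


def prune_leaf(t: Tree, x: str) -> Optional[Tree]:
    """Delete leaf x and suppress any node left with a single child."""
    if isinstance(t, str):
        return None if t == x else t
    kids = [k for k in (prune_leaf(c, x) for c in t) if k is not None]
    return kids[0] if len(kids) == 1 else tuple(kids)


def regraft_leaf(t: Tree, x: str, y: str) -> Tree:
    """Insert leaf x on the edge above leaf y, making x and y siblings."""
    if isinstance(t, str):
        return (t, x) if t == y else t
    return tuple(regraft_leaf(c, x, y) for c in t)


def leaf_spr(t: Tree, x: str, y: str) -> Tree:
    """One restricted SPR move: prune leaf x, regraft it next to leaf y."""
    if x == y:
        raise ValueError("x and y must be different leaves")
    pruned = prune_leaf(t, x)
    if pruned is None:
        raise ValueError("cannot prune the only leaf of the tree")
    return regraft_leaf(pruned, x, y)


def clusters(t: Tree) -> Set[FrozenSet[str]]:
    """Leaf sets below every internal node, the root included."""
    found: Set[FrozenSet[str]] = set()

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(walk(c) for c in node))
        found.add(below)
        return below

    walk(t)
    return found


species = ((("A", "B"), "C"), "D")
moved = leaf_spr(species, "A", "D")  # model a hypothetical transfer placing A beside D
print(moved)                                     # (('B', 'C'), ('D', 'A'))
print(len(clusters(species) ^ clusters(moved)))  # 4
```

A single move already changes two of the three clusters here, which hints at why recovering the shortest sequence of such moves between larger trees is hard.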
Affiliation(s)
- Robert G Beiko
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, Australia and ARC Centre in Bioinformatics, Australia
- Nicholas Hamilton
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, Australia and ARC Centre in Bioinformatics, Australia
- Advanced Computational Modelling Centre, The University of Queensland, Brisbane, Australia
24
Ané C, Sanderson M. Missing the Forest for the Trees: Phylogenetic Compression and Its Implications for Inferring Complex Evolutionary Histories. Syst Biol 2005; 54:146-57. [PMID: 15805016 DOI: 10.1080/10635150590905984] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.1] [Indexed: 10/25/2022]
Abstract
Phylogenetic tree reconstruction is difficult in the presence of lateral gene transfer and other processes generating conflicting signals. We develop a new approach to this problem using ideas borrowed from algorithmic information theory. It selects the hypothesis that simultaneously minimizes the descriptive complexity of the tree(s) plus the data when encoded using those tree(s). In practice this is the hypothesis that can compress the data the most. We show not only that phylogenetic compression is an efficient method for encoding most phylogenetic data sets and is more efficient than compression schemes designed for single sequences, but also that it provides a clear information theoretic rule for determining when a collection of conflicting trees is a better explanation of the data than a single tree. By casting the parsimony problem in this more general framework, we also conclude that the so-called total-evidence tree--the tree constructed from all the data simultaneously--is not always the most economical explanation of the data.
Affiliation(s)
- Cécile Ané
- Section of Evolution and Ecology, University of California, Davis, California 95616, USA.
25
Hon WK, Kao MY, Lam TW, Sung WK, Yiu SM. Non-shared edges and nearest neighbor interchanges revisited. Inf Process Lett 2004. [DOI: 10.1016/j.ipl.2004.04.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2022]
26
Ganapathy G, Ramachandran V, Warnow T. Better Hill-Climbing Searches for Parsimony. Lecture Notes in Computer Science 2003. [DOI: 10.1007/978-3-540-39763-2_19] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.8] [Indexed: 01/31/2023]
28
Some Approximation Results for the Maximum Agreement Forest Problem. Approximation, Randomization, and Combinatorial Optimization: Algorithms and Techniques 2001. [DOI: 10.1007/3-540-44666-4_19] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.5] [Indexed: 12/13/2022]