1
|
Wheeler WC, Varón A. Phylogenetic minimum description length: an optimality criterion based on algorithmic complexity. Cladistics 2025; 41:193-211. [PMID: 39956947 DOI: 10.1111/cla.12603] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2024] [Revised: 11/07/2024] [Accepted: 12/05/2024] [Indexed: 02/18/2025]
Abstract
Phylogenetic minimum description length (PMDL) is proposed as an optimality criterion for phylogenetic analysis. PMDL is based on algorithmic (Kolmogorov) information and the minimum description length principle. This criterion generates natural weighting functions (i.e. not being externally specified) for a diversity of phylogenetic graph, data and model types. PMDL is a generalized criterion that converges on existing forms of inference (i.e. parsimony, likelihood, Bayesian) in specific circumstances. Furthermore, as opposed to existing criteria, PMDL includes graph complexity allowing for the competition of hypotheses with myriad types of phylogenetic graphs (e.g. trees, networks, forests). Owing to its compound nature, PMDL allows for analytical model choice along with phylogenetic graph hypothesis while avoiding over-parameterization. Although uncomputable, heuristic methods are presented for the calculation of upper bounds on the algorithmic information content of a phylogenetic hypothesis. Examples are presented demonstrating the approach.
Affiliation(s)
- Ward C Wheeler
- Division of Invertebrate Zoology, American Museum of Natural History, 200 Central Park West, New York, NY, 10024, USA
- Andres Varón
- Division of Invertebrate Zoology, American Museum of Natural History, 200 Central Park West, New York, NY, 10024, USA
|
2
|
Wheeler WC. Multi-armed bandits, Thomson sampling and unsupervised machine learning in phylogenetic graph search. Cladistics 2024; 40:430-437. [PMID: 38415802 DOI: 10.1111/cla.12572] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/03/2023] [Revised: 01/19/2024] [Accepted: 01/29/2024] [Indexed: 02/29/2024]
Abstract
A phylogenetic graph search relies on a large number of highly parameterized search procedures (e.g. branch-swapping, perturbation, simulated annealing, genetic algorithm). These procedures vary in effectiveness over datasets and at alternative points in analytical pipelines. The multi-armed bandit problem is applied to phylogenetic graph searching to more effectively utilize these procedures. Thompson sampling is applied to a collection of search and optimization "bandits" to favour productive search strategies over those that are less successful. This adaptive random sampling strategy is shown to be more effective in producing heuristically optimal phylogenetic graphs and more time efficient than existing uniform probability randomized search strategies. The strategy acts as a form of unsupervised machine learning that can be applied to a diversity of phylogenetic datasets without prior knowledge of their properties.
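The adaptive selection idea in this abstract can be illustrated with a minimal Beta-Bernoulli Thompson sampling sketch. This is not the PhyG implementation: the function names, the uniform Beta(1, 1) prior, and the definition of a "success" (a search procedure returning an improved graph) are all assumptions made for illustration, and the search procedures are simulated by fixed success rates.

```python
import random


def thompson_select(successes, failures, rng):
    """Return the index of the arm with the largest Beta posterior sample."""
    samples = [rng.betavariate(s + 1, f + 1)  # Beta(1, 1) prior (assumed)
               for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=samples.__getitem__)


def run_bandit(true_rates, rounds, seed=0):
    """Simulate `rounds` pulls; each arm stands in for one search procedure.

    true_rates[i] is the hypothetical probability that procedure i
    improves the current graph. Returns the number of pulls per arm.
    """
    rng = random.Random(seed)
    successes = [0] * len(true_rates)
    failures = [0] * len(true_rates)
    for _ in range(rounds):
        arm = thompson_select(successes, failures, rng)
        if rng.random() < true_rates[arm]:
            successes[arm] += 1   # procedure found an improvement
        else:
            failures[arm] += 1    # procedure did not help this time
    return [s + f for s, f in zip(successes, failures)]


if __name__ == "__main__":
    print(run_bandit([0.05, 0.10, 0.60], rounds=2000))
```

Under these assumptions, pulls concentrate on the arm with the highest success rate while the other arms are still sampled occasionally, which is the contrast with uniform random selection that the abstract describes.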
Affiliation(s)
- Ward C Wheeler
- Division of Invertebrate Zoology, American Museum of Natural History, 200 Central Park West, New York, NY, 10024, USA
|
3
|
Caetano-Anollés G. Are Viruses Taxonomic Units? A Protein Domain and Loop-Centric Phylogenomic Assessment. Viruses 2024; 16:1061. [PMID: 39066224 PMCID: PMC11281659 DOI: 10.3390/v16071061] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/05/2024] [Revised: 06/26/2024] [Accepted: 06/27/2024] [Indexed: 07/28/2024]
Abstract
Virus taxonomy uses a Linnaean-like subsumption hierarchy to classify viruses into taxonomic units at species and higher rank levels. Virus species are considered monophyletic groups of mobile genetic elements (MGEs) often delimited by the phylogenetic analysis of aligned genomic or metagenomic sequences. Taxonomic units are assumed to be independent organizational, functional and evolutionary units that follow a 'natural history' rationale. Here, I use phylogenomic and other arguments to show that viruses are not self-standing genetically-driven systems acting as evolutionary units. Instead, they are crucial components of holobionts, which are units of biological organization that dynamically integrate the genetics, epigenetic, physiological and functional properties of their co-evolving members. Remarkably, phylogenomic analyses show that viruses share protein domains and loops with cells throughout history via massive processes of reticulate evolution, helping spread evolutionary innovations across a wider taxonomic spectrum. Thus, viruses are not merely MGEs or microbes. Instead, their genomes and proteomes conduct cellularly integrated processes akin to those cataloged by the GO Consortium. This prompts the generation of compositional hierarchies that replace the 'is-a-kind-of' by a 'is-a-part-of' logic to better describe the mereology of integrated cellular and viral makeup. My analysis demands a new paradigm that integrates virus taxonomy into a modern evolutionarily centered taxonomy of organisms.
Affiliation(s)
- Gustavo Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, C. R. Woese Institute for Genomic Biology, University of Illinois, Urbana, IL 61801, USA
|
4
|
Wheeler WC, Washburn A, Crowley LM. PhylogeneticGraph (PhyG) a new phylogenetic graph search and optimization program. Cladistics 2024; 40:97-105. [PMID: 37855442 DOI: 10.1111/cla.12560] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 05/15/2023] [Revised: 09/01/2023] [Accepted: 09/11/2023] [Indexed: 10/20/2023]
Abstract
We present Phylogenetic Graph (PhyG), an open-source, phylogenetic search tool for diverse data types and graphs, including softwired and hardwired networks, in addition to trees. This allows for analysis of horizontal transfer and hybridization scenarios, as well as the necessary vertical inheritance of trees. PhyG is the successor to POY5 in performing combined data tree-alignment with enhancements in heuristic optimality (up to 7% in example data) and execution time (up to a factor of 200). Input data may exhibit a practically unlimited number of character states in qualitative or sequence (aligned and unaligned) types. Novel graph construction and refinement algorithms have been implemented and integrated into a variety of search procedures. Currently, PhyG implements parsimony and No-Common-Mechanism Likelihood optimization.
Affiliation(s)
- Ward C Wheeler
- Division of Invertebrate Zoology, American Museum of Natural History, 200 Central Park West, New York, NY, 10024, USA
- Alexander Washburn
- Department of Computer Science, CUNY Graduate Center, 365 Fifth Avenue, New York, NY, 10016, USA
- Louise M Crowley
- Division of Invertebrate Zoology, American Museum of Natural History, 200 Central Park West, New York, NY, 10024, USA
|
5
|
Wheeler WC, Washburn AJ. Parsimony optimization of phylogenetic networks. Cladistics 2023; 39:456-474. [PMID: 37466283 DOI: 10.1111/cla.12552] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 02/17/2023] [Revised: 05/04/2023] [Accepted: 06/15/2023] [Indexed: 07/20/2023]
Abstract
An algorithm is described for the optimization of character data (e.g. qualitative, nucleic acid sequence) on softwired phylogenetic networks. The algorithm presented here is an extension of those developed for trees under the parsimony criterion and can form the basis for phylogenetic network search procedures. Although the problem is (in general) an NP-Hard optimization, the resolution-based algorithm we describe here capitalizes on the significant amount of shared structure in sub-graphs containing network edges, reducing the execution time and allowing for the analysis of empirical datasets.
Affiliation(s)
- Ward C Wheeler
- Division of Invertebrate Zoology, American Museum of Natural History, 200 Central Park West, New York, 10024, NY, USA
- Alexander J Washburn
- Department of Computer Science, City University of New York, 365 5th Avenue, New York, 10016, NY, USA
|
6
|
Caetano-Anollés G, Claverie JM, Nasir A. A critical analysis of the current state of virus taxonomy. Front Microbiol 2023; 14:1240993. [PMID: 37601376 PMCID: PMC10435761 DOI: 10.3389/fmicb.2023.1240993] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 06/15/2023] [Accepted: 07/20/2023] [Indexed: 08/22/2023]
Abstract
Taxonomical classification has preceded evolutionary understanding. For that reason, taxonomy has become a battleground fueled by knowledge gaps, technical limitations, and a priorism. Here we assess the current state of the challenging field, focusing on fallacies that are common in viral classification. We emphasize that viruses are crucial contributors to the genomic and functional makeup of holobionts, organismal communities that behave as units of biological organization. Consequently, viruses cannot be considered taxonomic units because they challenge crucial concepts of organismality and individuality. Instead, they should be considered processes that integrate virions and their hosts into life cycles. Viruses harbor phylogenetic signatures of genetic transfer that compromise monophyly and the validity of deep taxonomic ranks. A focus on building phylogenetic networks using alignment-free methodologies and molecular structure can help mitigate the impasse, at least in part. Finally, structural phylogenomic analysis challenges the polyphyletic scenario of multiple viral origins adopted by virus taxonomy, defeating a polyphyletic origin and supporting instead an ancient cellular origin of viruses. We therefore, prompt abandoning deep ranks and urgently reevaluating the validity of taxonomic units and principles of virus classification.
Affiliation(s)
- Gustavo Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences and C.R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL, United States
- Jean-Michel Claverie
- Structural and Genomic Information Laboratory (UMR7256), Mediterranean Institute of Microbiology (FR3479), IM2B, IOM, Aix Marseille University, CNRS, Marseille, France
|
7
|
Villalobos-Cid M, Dorn M, Contreras Á, Inostroza-Ponta M. An evolutionary algorithm based on parsimony for the multiobjective phylogenetic network inference problem. Appl Soft Comput 2023. [DOI: 10.1016/j.asoc.2023.110270] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/04/2023]
|
8
|
Scornavacca C, Weller M. Treewidth-based algorithms for the small parsimony problem on networks. Algorithms Mol Biol 2022; 17:15. [PMID: 35987645 PMCID: PMC9392953 DOI: 10.1186/s13015-022-00216-w] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 01/24/2022] [Accepted: 03/17/2022] [Indexed: 12/04/2022]
Abstract
Background: Phylogenetic reconstruction is one of the paramount challenges of contemporary bioinformatics. A subtask of existing tree reconstruction algorithms is modeled by the Small Parsimony problem: given a tree T and an assignment of character-states to its leaves, assign states to the internal nodes of T such as to minimize the parsimony score, that is, the number of edges of T connecting nodes with different states. While this problem is polynomial-time solvable on trees, the matter is more complicated if T contains reticulate events such as hybridizations or recombinations, i.e. when T is a network. Indeed, three different versions of the parsimony score on networks have been proposed and each of them is NP-hard to decide. Existing parameterized algorithms focus on combining the number c of possible character-states with the number of reticulate events (per biconnected component). Results: We consider the parameter treewidth t of the underlying undirected graph of the input network, presenting dynamic programming algorithms for (slight generalizations of) all three versions of the parsimony problem on size-n networks running in times $c^t n^{O(1)}$, $(3c)^t n^{O(1)}$, and $6^{tc} n^{O(1)}$, respectively. Our algorithms use a formulation of the treewidth that may facilitate formalizing treewidth-based dynamic programming algorithms on phylogenetic networks for other problems. Conclusions: Our algorithms allow the computation of the three popular parsimony scores, modeling the evolutionary development of a (multistate) character on a given phylogenetic network of low treewidth. Our results subsume and improve previously known algorithm for all three variants. While our results rely on being given a “good” tree-decomposition of the input, encouraging theoretical results as well as practical implementations producing them are publicly available. We present a reformulation of tree decompositions in terms of “agreeing trees” on the same set of nodes. As this formulation may come more natural to researchers and engineers developing algorithms for phylogenetic networks, we hope to render exploiting the input network’s treewidth as parameter more accessible to this audience.
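For the tree case that the abstract calls polynomial-time solvable, the parsimony score of a single unordered character can be computed with Fitch's classic bottom-up pass. The sketch below covers rooted binary trees only; it is not the treewidth-based network algorithm of the paper. The nested-tuple tree encoding and the function name are assumptions chosen for illustration.

```python
def fitch_score(tree, leaf_states):
    """Return (candidate_states, score) for a rooted binary tree.

    tree: a leaf name (str) or a pair (left_subtree, right_subtree).
    leaf_states: dict mapping each leaf name to its character state.
    score is the minimum number of edges joining nodes with different
    states over all assignments of states to internal nodes.
    """
    if isinstance(tree, str):
        return {leaf_states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_score(left, leaf_states)
    right_set, right_cost = fitch_score(right, leaf_states)
    shared = left_set & right_set
    if shared:
        # Children can agree on a state: no extra change at this node.
        return shared, left_cost + right_cost
    # Children disagree: one change is unavoidable on an edge below here.
    return left_set | right_set, left_cost + right_cost + 1


if __name__ == "__main__":
    tree = (("A", "B"), ("C", "D"))
    print(fitch_score(tree, {"A": 0, "B": 0, "C": 1, "D": 1})[1])
    print(fitch_score(tree, {"A": 0, "B": 1, "C": 0, "D": 1})[1])
```

Each node is visited once, so the pass is linear in the number of nodes for a fixed number of states. Reticulate nodes with two parents break this simple recursion, which is where the network variants discussed in the abstract become hard.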
|
9
|
How to Study Classification. Cladistics 2020. [DOI: 10.1017/9781139047678.017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
|
10
|
Classification. Cladistics 2020. [DOI: 10.1017/9781139047678.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
|
11
|
Systematics Association Special Volumes. Cladistics 2020. [DOI: 10.1017/9781139047678.025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
|
12
|
Relationship Diagrams. Cladistics 2020. [DOI: 10.1017/9781139047678.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
|
13
|
The Separation of Classification and Phylogenetics. Cladistics 2020. [DOI: 10.1017/9781139047678.021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
|
14
|
Beyond Classification. Cladistics 2020. [DOI: 10.1017/9781139047678.019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
|
15
|
The Interrelationships of Organisms. Cladistics 2020. [DOI: 10.1017/9781139047678.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
|
16
|
How to Study Classification. Cladistics 2020. [DOI: 10.1017/9781139047678.018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
|
17
|
Modern Artificial Methods and Raw Data. Cladistics 2020. [DOI: 10.1017/9781139047678.014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
|
18
|
Further Myths and More Misunderstandings. Cladistics 2020. [DOI: 10.1017/9781139047678.022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
|
19
|
Afterword. Cladistics 2020. [DOI: 10.1017/9781139047678.023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
|
20
|
Systematics: Exposing Myths. Cladistics 2020. [DOI: 10.1017/9781139047678.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
|
21
|
Essentialism and Typology. Cladistics 2020. [DOI: 10.1017/9781139047678.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
|
22
|
Beyond Classification: How to Study Phylogeny. Cladistics 2020. [DOI: 10.1017/9781139047678.020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
|
23
|
How to Study Classification: ‘Total Evidence’ vs. ‘Consensus’, Character Congruence vs. Taxonomic Congruence, Simultaneous Analysis vs. Partitioned Data. Cladistics 2020. [DOI: 10.1017/9781139047678.016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/05/2022]
|
24
|
What This Book Is About. Cladistics 2020. [DOI: 10.1017/9781139047678.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
|
25
|
How to Study Classification. Cladistics 2020. [DOI: 10.1017/9781139047678.013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
|
26
|
The Cladistic Programme. Cladistics 2020. [DOI: 10.1017/9781139047678.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
|
27
|
Index. Cladistics 2020. [DOI: 10.1017/9781139047678.024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
|
28
|
Parameters of Classification: Ordo Ab Chao. Cladistics 2020. [DOI: 10.1017/9781139047678.012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
|
29
|
Monothetic and Polythetic Taxa. Cladistics 2020. [DOI: 10.1017/9781139047678.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
|
30
|
How to Study Classification: Consensus Techniques and General Classifications. Cladistics 2020. [DOI: 10.1017/9781139047678.015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
|
31
|
Non-taxa or the Absence of –Phyly: Paraphyly and Aphyly. Cladistics 2020. [DOI: 10.1017/9781139047678.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
|
32
|
Introduction: Carving Nature at Its Joints, or Why Birds Are Not Dinosaurs and Men Are Not Apes. Cladistics 2020. [DOI: 10.1017/9781139047678.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
|
33
|
Preface. Cladistics 2020. [DOI: 10.1017/9781139047678.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
|
34
|
Block alignment: New representation and comparison method to study evolution of genomes. Genomics 2019; 111:1590-1603. [DOI: 10.1016/j.ygeno.2018.11.003] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 07/15/2018] [Revised: 10/13/2018] [Accepted: 11/05/2018] [Indexed: 01/22/2023]
|
35
|
Miyagi M, Wheeler WC. Comparing and displaying phylogenetic trees using edge union networks. Cladistics 2019; 35:688-694. [PMID: 34618927 DOI: 10.1111/cla.12374] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Accepted: 12/14/2018] [Indexed: 11/28/2022]
Abstract
The general problem of representing collections of trees as a single graph has led to many tree summary techniques. Many consensus approaches take sets of trees (either inferred as separate gene trees or gleaned from the posterior of a Bayesian analysis) and produce a single "best" tree. In scenarios where horizontal gene transfer or hybridization are suspected, networks may be preferred, which allow for nodes to have two parents, representing the fusion of lineages. One such construct is the cluster union network (CUN), which is constructed using the union of all clusters in the input trees. The CUN has a number of mathematically desirable properties, but can also present edges not observed in the input trees. In this paper we define a new network construction, the edge union network (EUN), which displays edges if and only if they are contained in the input trees. We also demonstrate that this object can be constructed with polynomial time complexity given arbitrary phylogenetic input trees, and so can be used in conjunction with network analysis techniques for further phylogenetic hypothesis testing.
Affiliation(s)
- Miriam Miyagi
- Department of Organismic and Evolutionary Biology, Harvard University, 26 Oxford St, Cambridge, MA, 02138, USA
- Ward C Wheeler
- Division of Invertebrate Zoology, American Museum of Natural History, 200 Central Park West, New York, NY, 10024-5192, USA
|
36
|
Janies D. Phylogenetic Concepts and Tools Applied to Epidemiologic Investigations of Infectious Diseases. Microbiol Spectr 2019; 7:10.1128/microbiolspec.ame-0006-2018. [PMID: 31325287 PMCID: PMC10956736 DOI: 10.1128/microbiolspec.ame-0006-2018] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 06/13/2018] [Indexed: 01/13/2023]
Abstract
In this review, which is a part of the Microbiology Spectrum Curated Collection: Advances in Molecular Epidemiology of Infectious Diseases, I present an overview of the principles used to classify organisms in the field of phylogenetics, highlight the methods used to infer the interrelationships of organisms, and summarize how these concepts are applied to molecular epidemiologic analyses. I present steps in analyses that come downstream of the assembly of a set of genomes or genes and the production of a multiple-sequence alignment or other matrices of putative orthologs for comparison. I focus on the history of the problem of phylogenetic reconstruction and debates within the field about the most appropriate methods. I illustrate methods that bridge the gap between molecular epidemiology and traditional epidemiology, including phylogenetic character evolution and geographic visualization. Finally, I provide practical advice on how to conduct an example analysis in the appendix. *This article is part of a curated collection.
Affiliation(s)
- Daniel Janies
- Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC 28223
|
37
|
Caetano-Anollés G, Nasir A, Kim KM, Caetano-Anollés D. Rooting Phylogenies and the Tree of Life While Minimizing Ad Hoc and Auxiliary Assumptions. Evol Bioinform Online 2018; 14:1176934318805101. [PMID: 30364468 PMCID: PMC6196624 DOI: 10.1177/1176934318805101] [Citation(s) in RCA: 40] [Impact Index Per Article: 5.7] [Received: 01/25/2018] [Accepted: 09/05/2018] [Indexed: 12/25/2022]
Abstract
Phylogenetic methods unearth evolutionary history when supported by three starting points of reason: (1) the continuity axiom begs the existence of a "model" of evolutionary change, (2) the singularity axiom defines the historical ground plan (phylogeny) in which biological entities (taxa) evolve, and (3) the memory axiom demands identification of biological attributes (characters) with historical information. Axiom consequences are interlinked, making the retrodiction enterprise an endeavor of reciprocal fulfillment. In particular, establishing direction of evolutionary change (character polarization) roots phylogenies and enables testing the existence of historical memory (homology). Unfortunately, rooting phylogenies, especially the "tree of life," generally follow narratives instead of integrating empirical and theoretical knowledge of retrodictive exploration. This stems mostly from a focus on molecular sequence analysis and uncertainties about rooting methods. Here, we review available rooting criteria, highlighting the need to minimize both ad hoc and auxiliary assumptions, especially argumentative ad hocness. We show that while the outgroup comparison method has been widely adopted, the generality criterion of nesting and additive phylogenetic change embodied in Weston rule offers the most powerful rooting approach. We also propose a change of focus, from phylogenies that describe the evolution of biological systems to those that describe the evolution of parts of those systems. This weakens violation of character independence, helps formalize the generality criterion of rooting, and provides new ways to study the problem of evolution.
Affiliation(s)
- Gustavo Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Arshan Nasir
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Department of Biosciences, COMSATS University Islamabad, Islamabad, Pakistan
- Kyung Mo Kim
- Division of Polar Life Sciences, Korea Polar Research Institute, Incheon, Republic of Korea
- Derek Caetano-Anollés
- Department of Evolutionary Genetics, Max-Planck-Institut für Evolutionsbiologie, Plön, Germany
|
38
|
Pagán I. The diversity, evolution and epidemiology of plant viruses: A phylogenetic view. Infect Genet Evol 2018; 65:187-199. [PMID: 30055330 DOI: 10.1016/j.meegid.2018.07.033] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Received: 06/18/2018] [Revised: 07/24/2018] [Accepted: 07/24/2018] [Indexed: 10/28/2022]
Abstract
During the past four decades, the scientific community has seen an exponential advance in the number, sophistication, and quality of molecular techniques and bioinformatics tools for the genetic characterization of plant virus populations. Predating these advances, the field of Phylogenetics has significantly contributed to understand important aspects of plant virus evolution. This review aims at summarizing the impact of Phylogenetics in the current knowledge on three major aspects of plant virus evolution that have benefited from the development of phylogenetic inference: (1) The identification and classification of plant virus diversity. (2) The mechanisms and forces shaping the evolution of plant virus populations. (3) The understanding of the interaction between plant virus evolution, epidemiology and ecology. The work discussed here highlights the important role of phylogenetic approaches in the study of the dynamics of plant virus populations.
Affiliation(s)
- Israel Pagán
- Centro de Biotecnología y Genómica de Plantas UPM-INIA, E.T.S. Ingeniería Agronómica, Alimentaria y de Biosistemas, Universidad Politécnica de Madrid, Madrid 28223, Spain.
|
39
|
Bryant C, Fischer M, Linz S, Semple C. On the quirks of maximum parsimony and likelihood on phylogenetic networks. J Theor Biol 2017; 417:100-108. [PMID: 28087420 DOI: 10.1016/j.jtbi.2017.01.013] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 08/29/2016] [Revised: 10/28/2016] [Accepted: 01/06/2017] [Indexed: 11/16/2022]
Abstract
Maximum parsimony is one of the most frequently-discussed tree reconstruction methods in phylogenetic estimation. However, in recent years it has become more and more apparent that phylogenetic trees are often not sufficient to describe evolution accurately. For instance, processes like hybridization or lateral gene transfer that are commonplace in many groups of organisms and result in mosaic patterns of relationships cannot be represented by a single phylogenetic tree. This is why phylogenetic networks, which can display such events, are becoming of more and more interest in phylogenetic research. It is therefore necessary to extend concepts like maximum parsimony from phylogenetic trees to networks. Several suggestions for possible extensions can be found in recent literature, for instance the softwired and the hardwired parsimony concepts. In this paper, we analyze the so-called big parsimony problem under these two concepts, i.e. we investigate maximum parsimonious networks and analyze their properties. In particular, we show that finding a softwired maximum parsimony network is possible in polynomial time. We also show that the set of maximum parsimony networks for the hardwired definition always contains at least one phylogenetic tree. Lastly, we investigate some parallels of parsimony to different likelihood concepts on phylogenetic networks.
Affiliation(s)
- Mareike Fischer
- Department for Mathematics and Computer Science, Ernst Moritz Arndt University, Greifswald, Germany.
- Simone Linz
- Department of Computer Science, University of Auckland, New Zealand.
- Charles Semple
- School of Mathematics and Statistics, University of Canterbury, Christchurch, New Zealand.
|