1
Kabir ER, Mustafa N, Nausheen N, Sharif Siam MK, Syed EU. Exploring existing drugs: proposing potential compounds in the treatment of COVID-19. Heliyon 2021; 7:e06284. [PMID: 33655082 PMCID: PMC7906017 DOI: 10.1016/j.heliyon.2021.e06284] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 10/22/2020] [Revised: 12/13/2020] [Accepted: 02/10/2021] [Indexed: 01/08/2023]
Abstract
The COVID-19 situation escalated into an unprecedented global crisis in just a few weeks. On the 30th of January 2020, the World Health Organization officially declared the COVID-19 epidemic a public health emergency of international concern. According to the Johns Hopkins University dashboard on the 7th of February 2021, confirmed cases exceeded 105,856,046 globally, with a death toll above 2,311,048, though the actual figures may be much higher. Conserved regions of the South Asian strains were used to construct a phylogenetic tree to find evolutionary relationships among strains of the novel virus. Off-target similarities with previously reported microorganisms were searched for using the Basic Local Alignment Search Tool (BLAST). The conserved regions did not match any previously reported microorganisms or viruses, which confirmed the novelty of SARS-CoV-2. Currently there is no approved drug for the prevention and treatment of COVID-19, but researchers globally are attempting to come up with one or more soon. Therapeutic strategies need to be addressed urgently to combat COVID-19. Drug repurposing uses old and safe drugs, is time-effective, and requires lower development costs, and was thus considered for the study. Molecular docking was used for repurposing drugs from our own comprehensive database of approximately 300 highly characterized, existing drugs with known safety profiles, to identify compounds that will inhibit the chosen molecular targets: SARS-CoV-2, ACE2, and TMPRSS2. The study has identified and proposed twenty-seven candidates for further in vitro and in vivo studies for the treatment of SARS-CoV-2 infection.
2
Cai L, Xi Z, Lemmon EM, Lemmon AR, Mast A, Buddenhagen CE, Liu L, Davis CC. The Perfect Storm: Gene Tree Estimation Error, Incomplete Lineage Sorting, and Ancient Gene Flow Explain the Most Recalcitrant Ancient Angiosperm Clade, Malpighiales. Syst Biol 2020; 70:491-507. [PMID: 33169797 DOI: 10.1093/sysbio/syaa083] [Citation(s) in RCA: 38] [Impact Index Per Article: 9.5] [Received: 03/19/2019] [Revised: 10/20/2020] [Accepted: 10/28/2020] [Indexed: 12/20/2022]
Abstract
The genomic revolution offers renewed hope of resolving rapid radiations in the Tree of Life. The development of the multispecies coalescent model and improved gene tree estimation methods can better accommodate gene tree heterogeneity caused by incomplete lineage sorting (ILS) and gene tree estimation error stemming from short internal branches. However, the relative influence of these factors in species tree inference is not well understood. Using anchored hybrid enrichment, we generated a data set including 423 single-copy loci from 64 taxa representing 39 families to infer the species tree of the flowering plant order Malpighiales. This order includes 9 of the top 10 most unstable nodes in angiosperms, which have been hypothesized to arise from the rapid radiation during the Cretaceous. Here, we show that coalescent-based methods do not resolve the backbone of Malpighiales and concatenation methods yield inconsistent estimations, providing evidence that gene tree heterogeneity is high in this clade. Despite high levels of ILS and gene tree estimation error, our simulations demonstrate that these two factors alone are insufficient to explain the lack of resolution in this order. To explore this further, we examined triplet frequencies among empirical gene trees and discovered that some of them deviated significantly from those attributed to ILS and estimation error, suggesting gene flow as an additional and previously unappreciated phenomenon promoting gene tree variation in Malpighiales. Finally, we applied a novel method to quantify the relative contribution of these three primary sources of gene tree heterogeneity and demonstrated that ILS, gene tree estimation error, and gene flow contributed to 10.0%, 34.8%, and 21.4% of the variation, respectively. Together, our results suggest that a perfect storm of factors likely influences this lack of resolution, and further indicate that recalcitrant phylogenetic relationships like the backbone of Malpighiales may be better represented as phylogenetic networks. Thus, reducing such groups solely to existing models that adhere strictly to bifurcating trees greatly oversimplifies reality, and obscures our ability to more clearly discern the process of evolution. [Coalescent; concatenation; flanking region; hybrid enrichment; introgression; phylogenomics; rapid radiation; triplet frequency.]
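The triplet-frequency check mentioned in the abstract can be illustrated with a small tally. Under ILS alone, the two minority topologies of a rooted three-taxon tree are expected at roughly equal frequency, so a strong asymmetry between them is one signal worth investigating. The following is only a minimal sketch, assuming gene trees encoded as nested Python tuples; the function names and the toy trees are hypothetical and are not the authors' pipeline or data.

```python
from collections import Counter


def leaves(tree):
    """Return the set of leaf names under a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def triplet_topology(tree, a, b, c):
    """Return the sister pair (as a frozenset) in the rooted triplet induced
    on three distinct taxa, or None if it is unresolved or a taxon is absent."""
    target = {a, b, c}
    if not target <= leaves(tree):
        return None
    node = tree
    while True:
        # Keep only the children that contain at least one target taxon.
        parts = [(child, leaves(child) & target) for child in node]
        parts = [(child, hit) for child, hit in parts if hit]
        if len(parts) == 1:
            node = parts[0][0]  # all three taxa sit below one child: descend
            continue
        if len(parts) == 2:
            # One child holds two taxa (the sisters), the other holds one.
            return frozenset(next(hit for _, hit in parts if len(hit) == 2))
        return None  # polytomy: each taxon in a separate child


def triplet_counts(gene_trees, a, b, c):
    """Tally the resolved triplet topologies across a list of gene trees."""
    counts = Counter()
    for tree in gene_trees:
        topo = triplet_topology(tree, a, b, c)
        if topo is not None:
            counts[topo] += 1
    return counts


# Hypothetical toy gene trees, for illustration only.
toy_trees = [
    ((("A", "B"), "C"), "D"),
    ((("A", "C"), "B"), "D"),
    (("A", "B"), ("C", "D")),
    ((("B", "C"), "A"), "D"),
]
counts = triplet_counts(toy_trees, "A", "B", "C")
```

In a real analysis the counts for the two minority topologies would be compared with a statistical test across many loci; this sketch stops at the tally.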
Affiliation(s)
- Liming Cai
  - Department of Organismic and Evolutionary Biology, Harvard University Herbaria, Cambridge, MA 02138, USA
  - Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu 610065, China
- Zhenxiang Xi
  - Department of Organismic and Evolutionary Biology, Harvard University Herbaria, Cambridge, MA 02138, USA
  - Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu 610065, China
- Emily Moriarty Lemmon
  - Department of Biological Sciences, Florida State University, Tallahassee, FL 32306, USA
- Alan R Lemmon
  - Department of Scientific Computing, Florida State University, Tallahassee, FL 32306, USA
- Austin Mast
  - Department of Biological Sciences, Florida State University, Tallahassee, FL 32306, USA
- Christopher E Buddenhagen
  - Department of Biological Sciences, Florida State University, Tallahassee, FL 32306, USA
  - AgResearch, 10 Bisley Road, Hamilton 3214, New Zealand
- Liang Liu
  - Department of Statistics and Institute of Bioinformatics, University of Georgia, Athens, GA 30602, USA
- Charles C Davis
  - Department of Organismic and Evolutionary Biology, Harvard University Herbaria, Cambridge, MA 02138, USA
3
Zanne AE, Powell JR, Flores-Moreno H, Kiers ET, van 't Padje A, Cornwell WK. Finding fungal ecological strategies: Is recycling an option? Fungal Ecol 2020. [DOI: 10.1016/j.funeco.2019.100902] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 10/25/2022]
4
Franz NM, Musher LJ, Brown JW, Yu S, Ludäscher B. Verbalizing phylogenomic conflict: Representation of node congruence across competing reconstructions of the neoavian explosion. PLoS Comput Biol 2019; 15:e1006493. [PMID: 30768597 PMCID: PMC6395011 DOI: 10.1371/journal.pcbi.1006493] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 12/13/2017] [Revised: 02/28/2019] [Accepted: 09/10/2018] [Indexed: 11/24/2022]
Abstract
Phylogenomic research is accelerating the publication of landmark studies that aim to resolve deep divergences of major organismal groups. Meanwhile, systems for identifying and integrating the products of phylogenomic inference, such as newly supported clade concepts, have not kept pace. However, the ability to verbalize node concept congruence and conflict across multiple, in effect simultaneously endorsed phylogenomic hypotheses is a prerequisite for building synthetic data environments for biological systematics and other domains impacted by these conflicting inferences. Here we develop a novel solution to the conflict verbalization challenge, based on a logic representation and reasoning approach that utilizes the language of Region Connection Calculus (RCC-5) to produce consistent alignments of node concepts endorsed by incongruent phylogenomic studies. The approach employs clade concept labels to individuate concepts used by each source, even if these carry identical names. Indirect RCC-5 modeling of intensional (property-based) node concept definitions, facilitated by the local relaxation of coverage constraints, allows parent concepts to attain congruence in spite of their differentially sampled children. To demonstrate the feasibility of this approach, we align two recent phylogenomic reconstructions of higher-level avian groups that entail strong conflict in the "neoavian explosion" region. According to our representations, this conflict is constituted by 26 instances of input "whole concept" overlap. These instances are further resolvable in the output labeling schemes and visualizations as "split concepts", which provide the labels and relations needed to build truly synthetic phylogenomic data environments. Because the RCC-5 alignments fundamentally reflect the trained, logic-enabled judgments of systematic experts, future designs for such environments need to promote a culture where experts routinely assess the intensionalities of node concepts published by our peers, even and especially when we are not in agreement with each other.
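RCC-5 distinguishes exactly five relations between two regions: congruence, proper inclusion, inverse proper inclusion, overlap, and exclusion. As a rough illustration only, the sketch below classifies two clade concepts by comparing the sets of terminal taxa they contain. This purely extensional comparison is a simplification assumed here; the abstract describes intensional, logic-based reasoning, which this does not reproduce. The relation symbols, function name, and toy concepts are likewise illustrative choices.

```python
def rcc5_relation(concept_a, concept_b):
    """Classify two non-empty taxon sets into one of the five RCC-5 relations.

    Symbols used here (an illustrative notation):
      "=="  congruent, "<" properly included in, ">" properly includes,
      "><"  overlapping, "!" disjoint.
    """
    a, b = set(concept_a), set(concept_b)
    if not a or not b:
        raise ValueError("RCC-5 relations are defined for non-empty regions")
    if a == b:
        return "=="
    if a < b:
        return "<"
    if a > b:
        return ">"
    if a & b:
        return "><"
    return "!"


# Hypothetical clade concepts from two sources; taxon names are placeholders.
source1_clade_x = {"t1", "t2", "t3"}
source2_clade_x = {"t2", "t3", "t4"}   # same name, different membership
source2_clade_y = {"t1", "t2", "t3", "t5"}

relation_same_name = rcc5_relation(source1_clade_x, source2_clade_x)
relation_nested = rcc5_relation(source1_clade_x, source2_clade_y)
```

Keeping the source label attached to each concept mirrors the point made in the abstract that identically named clades from different studies need not be congruent.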
Affiliation(s)
- Nico M. Franz
  - School of Life Sciences, Arizona State University, Tempe, Arizona, United States of America
- Lukas J. Musher
  - Richard Gilder Graduate School and Department of Ornithology, American Museum of Natural History, New York, New York, United States of America
- Joseph W. Brown
  - Department of Animal and Plant Sciences, University of Sheffield, Sheffield, United Kingdom
- Shizhuo Yu
  - Department of Computer Science, University of California at Davis, Davis, California, United States of America
- Bertram Ludäscher
  - School of Information Sciences, University of Illinois at Urbana-Champaign, Champaign, Illinois, United States of America
5
Jamil HM. Optimizing Phylogenetic Queries for Performance. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2018; 15:1692-1705. [PMID: 28858810 DOI: 10.1109/tcbb.2017.2743706] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/07/2023]
Abstract
The vast majority of phylogenetic databases do not support declarative querying through which their contents can be flexibly and conveniently accessed, and the template-based query interfaces they do support do not allow arbitrary speculative queries. They therefore also do not support query optimization that leverages unique phylogeny properties. While a small number of graph query languages such as XQuery, Cypher, and GraphQL exist for computer-savvy users, most are too general and complex to be useful for biologists, and too inefficient for querying large phylogenies. In this paper, we discuss a recently introduced visual query language, called PhyQL, that leverages phylogeny-specific properties to support essential and powerful constructs for a large class of phylogenetic queries. We develop a range of pruning aids, and propose a substantial set of query optimization strategies using these aids that are suitable for querying large phylogenies. A hybrid optimization technique that exploits a set of indices and "graphlet" partitioning is discussed. A "fail soonest" strategy is used to avoid hopeless processing and is shown to produce dividends. Possible novel optimization techniques yet to be explored are also discussed.
6
Smith SA, Brown JW. Constructing a broadly inclusive seed plant phylogeny. American Journal of Botany 2018; 105:302-314. [PMID: 29746720 DOI: 10.1002/ajb2.1019] [Citation(s) in RCA: 349] [Impact Index Per Article: 58.2] [Received: 08/08/2017] [Accepted: 10/19/2017] [Indexed: 05/03/2023]
Abstract
PREMISE OF THE STUDY: Large phylogenies can help shed light on macroevolutionary patterns that inform our understanding of fundamental processes that shape the tree of life. These phylogenies also serve as tools that facilitate other systematic, evolutionary, and ecological analyses. Here we combine genetic data from public repositories (GenBank) with phylogenetic data (Open Tree of Life project) to construct a dated phylogeny for seed plants. METHODS: We conducted a hierarchical clustering analysis of publicly available molecular data for major clades within the Spermatophyta. We constructed phylogenies of major clades, estimated divergence times, and incorporated data from the Open Tree of Life project, resulting in a seed plant phylogeny. We estimated diversification rates, excluding those taxa without molecular data. We also summarized topological uncertainty and data overlap for each major clade. KEY RESULTS: The trees constructed for Spermatophyta consisted of 79,881 and 353,185 terminal taxa; the latter included the Open Tree of Life taxa for which we could not include molecular data from GenBank. The diversification analyses demonstrated nested patterns of rate shifts throughout the phylogeny. Data overlap and inference uncertainty show significant variation throughout and demonstrate the continued need for data collection across seed plants. CONCLUSIONS: This study demonstrates a means for combining available resources to construct a dated phylogeny for plants. However, this approach is an early step, and more developments are needed to add data, better incorporate underlying uncertainty, and improve resolution. The methods discussed here can also be applied to other major clades in the tree of life.
Affiliation(s)
- Stephen A Smith
  - Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, Michigan, 48109, USA
- Joseph W Brown
  - Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, Michigan, 48109, USA
7
Chesters D. Construction of a Species-Level Tree of Life for the Insects and Utility in Taxonomic Profiling. Syst Biol 2018; 66:426-439. [PMID: 27798407 DOI: 10.1093/sysbio/syw099] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 09/20/2015] [Accepted: 10/18/2016] [Indexed: 12/31/2022]
Abstract
Although comprehensive phylogenies have proven an invaluable tool in ecology and evolution, their construction is made increasingly challenging both by the scale and structure of publicly available sequences. The distinct partition between gene-rich (genomic) and species-rich (DNA barcode) data is a feature of data that has been largely overlooked, yet presents a key obstacle to scaling supermatrix analysis. I present a phyloinformatics framework for draft construction of a species-level phylogeny of insects (Class Insecta). Matrix-building requires separately optimized pipelines for nuclear transcriptomic, mitochondrial genomic, and species-rich markers, whereas tree-building requires hierarchical inference in order to capture species-breadth while retaining deep-level resolution. The phylogeny of insects contains 49,358 species, 13,865 genera, and 760 families. Deep-level splits largely reflected previous findings for sections of the tree that are data rich or unambiguous, such as inter-ordinal Endopterygota and Dictyoptera, the recently evolved and relatively homogeneous Lepidoptera, Hymenoptera, Brachycera (Diptera), and Cucujiformia (Coleoptera). However, analysis of bias, matrix construction, and gene-tree variation suggests confidence in some relationships (such as in Polyneoptera) is less than has been indicated by the matrix bootstrap method. To assess the utility of the insect tree as a tool in query profiling, several tree-based taxonomic assignment methods are compared. Using test data sets with existing taxonomic annotations, a tendency is observed for greater accuracy of species-level assignments when using a fixed comprehensive tree of life, in contrast to methods generating smaller de novo reference trees. Described herein is a solution to the discrepancy in the way data are fit into supermatrices. The resulting tree facilitates wider studies of insect diversification and application of advanced descriptions of diversity in community studies, among other presumed applications. [Data integration; data mining; insects; phylogenomics; phyloinformatics; tree of life.]
Affiliation(s)
- Douglas Chesters
  - Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
8
Antonelli A, Hettling H, Condamine FL, Vos K, Nilsson RH, Sanderson MJ, Sauquet H, Scharn R, Silvestro D, Töpel M, Bacon CD, Oxelman B, Vos RA. Toward a Self-Updating Platform for Estimating Rates of Speciation and Migration, Ages, and Relationships of Taxa. Syst Biol 2018; 66:152-166. [PMID: 27616324 PMCID: PMC5410925 DOI: 10.1093/sysbio/syw066] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 11/30/2015] [Accepted: 07/19/2016] [Indexed: 01/06/2023]
Abstract
Rapidly growing biological data—including molecular sequences and fossils—hold an unprecedented potential to reveal how evolutionary processes generate and maintain biodiversity. However, researchers often have to develop their own idiosyncratic workflows to integrate and analyze these data for reconstructing time-calibrated phylogenies. In addition, divergence times estimated under different methods and assumptions, and based on data of various quality and reliability, should not be combined without proper correction. Here we introduce a modular framework termed SUPERSMART (Self-Updating Platform for Estimating Rates of Speciation and Migration, Ages, and Relationships of Taxa), and provide a proof of concept for dealing with the moving targets of evolutionary and biogeographical research. This framework assembles comprehensive data sets of molecular and fossil data for any taxa and infers dated phylogenies using robust species tree methods, also allowing for the inclusion of genomic data produced through next-generation sequencing techniques. We exemplify the application of our method by presenting phylogenetic and dating analyses for the mammal order Primates and for the plant family Arecaceae (palms). We believe that this framework will provide a valuable tool for a wide range of hypothesis-driven research questions in systematics, biogeography, and evolution. SUPERSMART will also accelerate the inference of a “Dated Tree of Life” where all node ages are directly comparable.
Affiliation(s)
- Alexandre Antonelli
  - Department of Biological and Environmental Sciences, University of Gothenburg, Box 461, SE-405 30 Göteborg, Sweden
  - Gothenburg Botanical Garden, Carl Skottsbergs Gata 22A, SE-41319 Göteborg, Sweden
- Hannes Hettling
  - Naturalis Biodiversity Center, Darwinweg 4, 2333 CR Leiden, The Netherlands
- Fabien L Condamine
  - Department of Biological and Environmental Sciences, University of Gothenburg, Box 461, SE-405 30 Göteborg, Sweden
  - CNRS, UMR 5554 Institut des Sciences de l'Evolution (Université de Montpellier), Place Eugéne Bataillon, 34095 Montpellier, France
- Karin Vos
  - Department of Biological and Environmental Sciences, University of Gothenburg, Box 461, SE-405 30 Göteborg, Sweden
- R Henrik Nilsson
  - Department of Biological and Environmental Sciences, University of Gothenburg, Box 461, SE-405 30 Göteborg, Sweden
- Michael J Sanderson
  - Department of Ecology and Evolutionary Biology, University of Arizona, 1041 E. Lowell, Tucson, AZ 85721, USA
- Hervé Sauquet
  - Université Paris-Sud, Laboratoire Écologie, Systématique, Évolution, CNRS UMR 8079, 91405 Orsay, France
- Ruud Scharn
  - Department of Biological and Environmental Sciences, University of Gothenburg, Box 461, SE-405 30 Göteborg, Sweden
- Daniele Silvestro
  - Department of Biological and Environmental Sciences, University of Gothenburg, Box 461, SE-405 30 Göteborg, Sweden
  - Department of Ecology and Evolution, University of Lausanne, 1015 Lausanne, Switzerland
- Mats Töpel
  - Swedish Bioinformatics Infrastructure for Life Sciences, Department of Biological and Environmental Sciences, University of Gothenburg, Box 463, SE-405 30, Göteborg, Sweden
  - Department of Marine Sciences, University of Gothenburg, Box 460, SE-405 30 Göteborg, Sweden
- Christine D Bacon
  - Department of Biological and Environmental Sciences, University of Gothenburg, Box 461, SE-405 30 Göteborg, Sweden
- Bengt Oxelman
  - Department of Biological and Environmental Sciences, University of Gothenburg, Box 461, SE-405 30 Göteborg, Sweden
- Rutger A Vos
  - Naturalis Biodiversity Center, Darwinweg 4, 2333 CR Leiden, The Netherlands
9
Tripp EA, Zhang N, Schneider H, Huang Y, Mueller GM, Hu Z, Häggblom M, Bhattacharya D. Reshaping Darwin's Tree: Impact of the Symbiome. Trends Ecol Evol 2017; 32:552-555. [PMID: 28601483 DOI: 10.1016/j.tree.2017.05.002] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Received: 11/12/2016] [Revised: 02/10/2017] [Accepted: 05/06/2017] [Indexed: 12/30/2022]
Abstract
Much of the undescribed biodiversity on Earth is microbial, often in mutualistic or pathogenic associations. Physically associated and coevolving life forms comprise a symbiome. We propose that systematics research can accelerate progress in science by introducing a new framework for phylogenetic analysis of symbiomes, here termed SYMPHY (symbiome phylogenetics).
Affiliation(s)
- Erin A Tripp
  - Department of Ecology and Evolutionary Biology and Museum of Natural History, University of Colorado, Boulder, Colorado, USA
- Ning Zhang
  - Department of Plant Biology, Rutgers University, New Brunswick, New Jersey, USA
  - Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, New Jersey, USA
- Harald Schneider
  - Department of Ecology, School of Life Sciences, Sun Yat-Sen University, Guangzhou, China
  - Department of Life Sciences, Natural History Museum, London, UK
- Ying Huang
  - State Key Laboratory of Microbial Resources, Institute of Microbiology, Chinese Academy of Sciences, Beijing, China
- Zhihong Hu
  - State Key Laboratory for Virology, Wuhan Institute of Virology, Chinese Academy of Sciences, Wuhan, China
- Max Häggblom
  - Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, New Jersey, USA
- Debashish Bhattacharya
  - Department of Ecology, Evolution and Natural Resources and Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, New Jersey, USA
10
Das JK, Pal Choudhury P. Chemical property based sequence characterization of PpcA and its homolog proteins PpcB-E: A mathematical approach. PLoS One 2017; 12:e0175031. [PMID: 28362850 PMCID: PMC5376323 DOI: 10.1371/journal.pone.0175031] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 12/16/2016] [Accepted: 03/20/2017] [Indexed: 11/19/2022]
Abstract
Periplasmic c7 type cytochrome A (PpcA) protein is found in Geobacter sulfurreducens along with its four other homologs (PpcB-E). Crystal structures show that PpcA can bind deoxycholate (DXCA), while its homologs do not. However, the reason behind this has yet to be established with certainty from primary protein sequence information. This study is based on primary protein sequence analysis through the chemical basis of the embedded amino acids. Firstly, we look at the chemical group-specific scores of amino acids. Along with this, we have developed a new methodology for phylogenetic analysis based on chemical group dissimilarities of amino acids. This new methodology is applied to the cytochrome c7 family members and pinpoints how a particular sequence differs from the others. Secondly, we build a graph-theoretic model using amino acid sequences, which is also applied to the cytochrome c7 family members, and some unique characteristics and their domains are highlighted. Thirdly, we search for unique patterns as subsequences that are common among the group or specific to an individual member. In all cases, we are able to show some distinct features of PpcA that set it apart from its homologs and that may relate to its binding with deoxycholate. Similarly, some notable features of the structurally dissimilar protein PpcD compared to the other homologs are also brought out. Further, because the five members of the cytochrome family are homologous proteins, they must share some significant common features, which are also enumerated in this study.
Affiliation(s)
- Jayanta Kumar Das
  - Applied Statistics Unit, Indian Statistical Institute, 203 B.T Road, Kolkata-700108, West Bengal, India
- Pabitra Pal Choudhury
  - Applied Statistics Unit, Indian Statistical Institute, 203 B.T Road, Kolkata-700108, West Bengal, India
11
Deng Y, Fernández-Baca D. An efficient algorithm for testing the compatibility of phylogenies with nested taxa. Algorithms Mol Biol 2017; 12:7. [PMID: 28331536 PMCID: PMC5356459 DOI: 10.1186/s13015-017-0099-7] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 12/23/2016] [Accepted: 03/04/2017] [Indexed: 11/23/2022]
Abstract
Background: Semi-labeled trees generalize ordinary phylogenetic trees, allowing internal nodes to be labeled by higher-order taxa. Taxonomies are examples of semi-labeled trees. Suppose we are given a collection $\mathcal{P}$ of semi-labeled trees over various subsets of a set of taxa. The ancestral compatibility problem asks whether there is a semi-labeled tree that respects the clusterings and the ancestor/descendant relationships implied by the trees in $\mathcal{P}$. The running time and space usage of the best previous algorithm for testing ancestral compatibility depend on the degrees of the nodes in the trees in $\mathcal{P}$. Results: We give an algorithm for the ancestral compatibility problem that runs in $O(M_{\mathcal{P}} \log^2 M_{\mathcal{P}})$ time and uses $O(M_{\mathcal{P}})$ space, where $M_{\mathcal{P}}$ is the total number of nodes and edges in the trees in $\mathcal{P}$. Conclusions: Taxonomies enable researchers to expand greatly the taxonomic coverage of their phylogenetic analyses. The running time of our method does not depend on the degrees of the nodes in the trees in $\mathcal{P}$. This characteristic is important when taxonomies, which can have nodes of high degree, are used.
12
Das JK, Das P, Ray KK, Choudhury PP, Jana SS. Mathematical Characterization of Protein Sequences Using Patterns as Chemical Group Combinations of Amino Acids. PLoS One 2016; 11:e0167651. [PMID: 27930687 PMCID: PMC5145171 DOI: 10.1371/journal.pone.0167651] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Received: 09/26/2016] [Accepted: 11/17/2016] [Indexed: 01/08/2023]
Abstract
Comparison of amino acid sequence similarity is the fundamental concept behind protein phylogenetic tree formation. By virtue of this method, we can explain the evolutionary relationships, but further explanations are not possible unless sequences are studied through the chemical nature of individual amino acids. Here we develop a new methodology to characterize protein sequences on the basis of the chemical nature of the amino acids. We design various algorithms for studying the variation of chemical group transitions and various chemical group combinations as patterns in the protein sequences. The amino acid sequences of the conventional myosin II head domain from 14 family members are taken to illustrate this new approach. We find two blocks of maximum length 6 aa, 'FPKATD' and 'Y/FTNEKL', without repetition of the same chemical nature, and one block of maximum length 20 aa with repetition of chemical nature, which are common among all 14 members. We also check commonality with another motor protein sub-family, kinesin KIF1A. Based on our analysis we find a common block of length 8 aa in both myosin II and KIF1A. This motif is located in the neck linker region, which could be responsible for the generation of mechanical force, enabling us to find the unique blocks which remain chemically conserved across the family. We also validate our methodology with different protein families such as MYOI, Myosin light chain kinase (MLCK) and Rho-associated protein kinase (ROCK), Na+/K+-ATPase and Ca2+-ATPase. Altogether, our studies provide a new methodology for investigating conserved amino acid patterns in different proteins.
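The idea of reading a protein sequence as a series of chemical groups can be sketched in a few lines. The grouping of the 20 amino acids below is an assumption made for illustration and may differ from the classification used in the paper; the function names are hypothetical as well. The sketch maps residues to groups, counts adjacent group-to-group transitions, and checks whether a block avoids repeating any chemical group, which is the property the abstract reports for the 6 aa block 'FPKATD'.

```python
from collections import Counter

# Assumed illustrative grouping of amino acids by side-chain chemistry.
CHEMICAL_GROUPS = {
    "acidic": "DE",
    "basic": "KRH",
    "aromatic": "FWY",
    "aliphatic": "AGILV",
    "cyclic": "P",
    "hydroxyl": "ST",
    "sulfur": "CM",
    "amide": "NQ",
}
# Invert to a residue -> group lookup table.
RESIDUE_TO_GROUP = {
    residue: group
    for group, residues in CHEMICAL_GROUPS.items()
    for residue in residues
}


def to_groups(sequence):
    """Translate a one-letter amino acid sequence into chemical group names."""
    return [RESIDUE_TO_GROUP[residue] for residue in sequence.upper()]


def group_transitions(sequence):
    """Count transitions between the groups of adjacent residues."""
    groups = to_groups(sequence)
    return Counter(zip(groups, groups[1:]))


def all_groups_distinct(block):
    """True if no chemical group occurs more than once in the block."""
    groups = to_groups(block)
    return len(set(groups)) == len(groups)


block_groups = to_groups("FPKATD")
block_is_distinct = all_groups_distinct("FPKATD")
transitions = group_transitions("DEK")
```

Under this assumed grouping, 'FPKATD' maps to six different groups, which agrees with the description in the abstract; a different grouping could give a different answer.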
Collapse
Affiliation(s)
- Jayanta Kumar Das
  - Applied Statistics Unit, Indian Statistical Institute, 203 B.T. Road, Kolkata-700108, West Bengal, India
- Provas Das
  - Department of Biological Chemistry, Indian Association for the Cultivation of Science, 2A & 2B Raja S. C. Mullick Road, Kolkata-700032, West Bengal, India
- Korak Kumar Ray
  - Department of Chemistry, Indian Institute of Technology-Bombay, IIT Bombay, Powai, Mumbai-400076, Maharashtra, India
- Pabitra Pal Choudhury
  - Applied Statistics Unit, Indian Statistical Institute, 203 B.T. Road, Kolkata-700108, West Bengal, India
- Siddhartha Sankar Jana
  - Department of Biological Chemistry, Indian Association for the Cultivation of Science, 2A & 2B Raja S. C. Mullick Road, Kolkata-700032, West Bengal, India
13
Pennell MW, FitzJohn RG, Cornwell WK. A simple approach for maximizing the overlap of phylogenetic and comparative data. Methods Ecol Evol 2016. [DOI: 10.1111/2041-210x.12517] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.0] [Indexed: 12/23/2022]
Affiliation(s)
- Matthew W. Pennell
  - Institute for Bioinformatics and Evolutionary Studies, University of Idaho, Moscow, ID 83844, USA
  - Department of Zoology, Biodiversity Research Centre, University of British Columbia, Vancouver, BC V6T 1Z4, Canada
- Richard G. FitzJohn
  - Department of Biological Sciences, Macquarie University, Sydney, NSW 2109, Australia
- William K. Cornwell
  - Ecology and Evolution Research Centre, School of Biological, Earth and Environmental Sciences, University of New South Wales, Sydney, NSW 2052, Australia
  - Centre for Ecosystem Science, School of Biological, Earth and Environmental Sciences, University of New South Wales, Sydney, NSW 2052, Australia
14
Implementing and testing the multispecies coalescent model: A valuable paradigm for phylogenomics. Mol Phylogenet Evol 2016; 94:447-62. [DOI: 10.1016/j.ympev.2015.10.027] [Citation(s) in RCA: 265] [Impact Index Per Article: 33.1] [Indexed: 01/19/2023]
15
Hinchliff CE, Smith SA, Allman JF, Burleigh JG, Chaudhary R, Coghill LM, Crandall KA, Deng J, Drew BT, Gazis R, Gude K, Hibbett DS, Katz LA, Laughinghouse HD, McTavish EJ, Midford PE, Owen CL, Ree RH, Rees JA, Soltis DE, Williams T, Cranston KA. Synthesis of phylogeny and taxonomy into a comprehensive tree of life. Proc Natl Acad Sci U S A 2015; 112:12764-9. [PMID: 26385966 PMCID: PMC4611642 DOI: 10.1073/pnas.1423041112] [Citation(s) in RCA: 372] [Impact Index Per Article: 41.3] [Indexed: 12/20/2022]
Abstract
Reconstructing the phylogenetic relationships that unite all lineages (the tree of life) is a grand challenge. The paucity of homologous character data across disparately related lineages currently renders direct phylogenetic inference untenable. To reconstruct a comprehensive tree of life, we therefore synthesized published phylogenies, together with taxonomic classifications for taxa never incorporated into a phylogeny. We present a draft tree containing 2.3 million tips, the Open Tree of Life. Realization of this tree required the assembly of two additional community resources: (i) a comprehensive global reference taxonomy and (ii) a database of published phylogenetic trees mapped to this taxonomy. Our open source framework facilitates community comment and contribution, enabling the tree to be continuously updated when new phylogenetic and taxonomic data become digitally available. Although data coverage and phylogenetic conflict across the Open Tree of Life illuminate gaps in both the underlying data available for phylogenetic reconstruction and the publication of trees as digital objects, the tree provides a compelling starting point for community contribution. This comprehensive tree will fuel fundamental research on the nature of biological diversity, ultimately providing up-to-date phylogenies for downstream applications in comparative biology, ecology, conservation biology, climate change, agriculture, and genomics.
Affiliation(s)
- Cody E Hinchliff
  - Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI 48109
- Stephen A Smith
  - Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI 48109
- Ruchi Chaudhary
  - Department of Biology, University of Florida, Gainesville, FL 32611
- Keith A Crandall
  - Computational Biology Institute, George Washington University, Ashburn, VA 20147
- Jiabin Deng
  - Department of Biology, University of Florida, Gainesville, FL 32611
- Bryan T Drew
  - Department of Biology, University of Nebraska-Kearney, Kearney, NE 68849
- Romina Gazis
  - Department of Biology, Clark University, Worcester, MA 01610
- Karl Gude
  - School of Journalism, Michigan State University, East Lansing, MI 48824
- David S Hibbett
  - Department of Biology, Clark University, Worcester, MA 01610
- Laura A Katz
  - Biological Science, Clark Science Center, Smith College, Northampton, MA 01063
- Emily Jane McTavish
  - Department of Ecology and Evolutionary Biology, University of Kansas, Lawrence, KS 66045
- Jonathan A Rees
  - National Evolutionary Synthesis Center, Duke University, Durham, NC 27705
- Douglas E Soltis
  - Department of Biology, University of Florida, Gainesville, FL 32611
  - Florida Museum of Natural History, University of Florida, Gainesville, FL 32611
- Tiffani Williams
  - Computer Science and Engineering, Texas A&M University, College Station, TX 77843
- Karen A Cranston
  - National Evolutionary Synthesis Center, Duke University, Durham, NC 27705
16
Owen CL, Bracken-Grissom H, Stern D, Crandall KA. A synthetic phylogeny of freshwater crayfish: insights for conservation. Philos Trans R Soc Lond B Biol Sci 2015; 370:20140009. [PMID: 25561670 DOI: 10.1098/rstb.2014.0009] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.7] [Indexed: 01/22/2023]
Abstract
Phylogenetic systematics is heading for a renaissance where we shift from considering our phylogenetic estimates as a static image in a published paper and taxonomies as a hardcopy checklist to treating both the phylogenetic estimate and dynamic taxonomies as metadata for further analyses. The Open Tree of Life project (opentreeoflife.org) is developing synthesis tools for harnessing the power of phylogenetic inference and robust taxonomy to develop a synthetic tree of life. We capitalize on this approach to estimate a synthesis tree for the freshwater crayfish. The crayfish make an exceptional group to demonstrate the utility of the synthesis approach, as there recently have been a number of phylogenetic studies on the crayfishes along with a robust underlying taxonomic framework. Importantly, the crayfish have also been extensively assessed by an IUCN Red List team and therefore have accurate and up-to-date area and conservation status data available for analysis within a phylogenetic context. Here, we develop a synthesis phylogeny for the world's freshwater crayfish and examine the phylogenetic distribution of threat. We also estimate a molecular phylogeny based on all available GenBank crayfish sequences and use this tree to estimate divergence times and test for divergence rate variation. Finally, we conduct EDGE and HEDGE analyses and identify a number of species of freshwater crayfish of highest priority in conservation efforts.
Affiliation(s)
- Christopher L Owen
  - Computational Biology Institute, George Washington University, Ashburn, VA 20132, USA
- Heather Bracken-Grissom
  - Department of Biological Sciences, Florida International University, Biscayne Bay Campus, North Miami, FL 33181, USA
- David Stern
  - Computational Biology Institute, George Washington University, Ashburn, VA 20132, USA
- Keith A Crandall
  - Computational Biology Institute, George Washington University, Ashburn, VA 20132, USA
  - Department of Invertebrate Zoology, National Museum of Natural History, Smithsonian Institution, Washington, DC 20013, USA
17
Liu L, Xi Z, Wu S, Davis CC, Edwards SV. Estimating phylogenetic trees from genome-scale data. Ann N Y Acad Sci 2015; 1360:36-53. [DOI: 10.1111/nyas.12747] [Citation(s) in RCA: 129] [Impact Index Per Article: 14.3] [Indexed: 11/30/2022]
Affiliation(s)
- Liang Liu
  - Department of Statistics, University of Georgia, Athens, Georgia
  - Institute of Bioinformatics, University of Georgia, Athens, Georgia
- Zhenxiang Xi
  - Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts
- Shaoyuan Wu
  - Department of Biochemistry and Molecular Biology & Tianjin Key Laboratory of Medical Epigenetics, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, China
- Charles C. Davis
  - Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts
- Scott V. Edwards
  - Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts
18
Building the avian tree of life using a large-scale, sparse supermatrix. Mol Phylogenet Evol 2015; 84:53-63. [DOI: 10.1016/j.ympev.2014.12.003] [Citation(s) in RCA: 98] [Impact Index Per Article: 10.9] [Received: 07/09/2014] [Revised: 12/03/2014] [Accepted: 12/05/2014] [Indexed: 11/20/2022]
19
Franz NM, Chen M, Yu S, Kianmajd P, Bowers S, Ludäscher B. Reasoning over taxonomic change: exploring alignments for the Perelleschus use case. PLoS One 2015; 10:e0118247. [PMID: 25700173 PMCID: PMC4336294 DOI: 10.1371/journal.pone.0118247] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Received: 04/02/2014] [Accepted: 01/02/2015] [Indexed: 11/19/2022]
Abstract
Classifications and phylogenetic inferences of organismal groups change in light of new insights. Over time these changes can result in an imperfect tracking of taxonomic perspectives through the re-/use of Code-compliant or informal names. To mitigate these limitations, we introduce a novel approach for aligning taxonomies through the interaction of human experts and logic reasoners. We explore the performance of this approach with the Perelleschus use case of Franz & Cardona-Duque (2013). The use case includes six taxonomies published from 1936 to 2013, 54 taxonomic concepts (i.e., circumscriptions of names individuated according to their respective source publications), and 75 expert-asserted Region Connection Calculus articulations (e.g., congruence, proper inclusion, overlap, or exclusion). An Open Source reasoning toolkit is used to analyze 13 paired Perelleschus taxonomy alignments under heterogeneous constraints and interpretations. The reasoning workflow optimizes the logical consistency and expressiveness of the input and infers the set of maximally informative relations among the entailed taxonomic concepts. The latter are then used to produce merge visualizations that represent all congruent and non-congruent taxonomic elements among the aligned input trees. In this small use case with 6-53 input concepts per alignment, the information gained through the reasoning process is on average one order of magnitude greater than in the input. The approach offers scalable solutions for tracking provenance among succeeding taxonomic perspectives that may have differential biases in naming conventions, phylogenetic resolution, ingroup and outgroup sampling, or ostensive (member-referencing) versus intensional (property-referencing) concepts and articulations.
Affiliation(s)
- Nico M. Franz
  - School of Life Sciences, Arizona State University, Tempe, Arizona, United States of America
- Mingmin Chen
  - Department of Computer Science, University of California Davis, Davis, California, United States of America
- Shizhuo Yu
  - Department of Computer Science, University of California Davis, Davis, California, United States of America
- Parisa Kianmajd
  - Department of Computer Science, University of California Davis, Davis, California, United States of America
- Shawn Bowers
  - Department of Computer Science, Gonzaga University, Spokane, Washington, United States of America
- Bertram Ludäscher
  - Department of Computer Science, University of California Davis, Davis, California, United States of America
20
Hinchliff CE, Smith SA. Some limitations of public sequence data for phylogenetic inference (in plants). PLoS One 2014; 9:e98986. [PMID: 24999823 PMCID: PMC4085032 DOI: 10.1371/journal.pone.0098986] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.7] [Received: 12/05/2013] [Accepted: 05/09/2014] [Indexed: 11/24/2022]
Abstract
The GenBank database contains essentially all of the nucleotide sequence data generated for published molecular systematic studies, but for the majority of taxa these data remain sparse. GenBank has value for phylogenetic methods that leverage data–mining and rapidly improving computational methods, but the limits imposed by the sparse structure of the data are not well understood. Here we present a tree representing 13,093 land plant genera—an estimated 80% of extant plant diversity—to illustrate the potential of public sequence data for broad phylogenetic inference in plants, and we explore the limits to inference imposed by the structure of these data using theoretical foundations from phylogenetic data decisiveness. We find that despite very high levels of missing data (over 96%), the present data retain the potential to inform over 86.3% of all possible phylogenetic relationships. Most of these relationships, however, are informed by small amounts of data—approximately half are informed by fewer than four loci, and more than 99% are informed by fewer than fifteen. We also apply an information theoretic measure of branch support to assess the strength of phylogenetic signal in the data, revealing many poorly supported branches concentrated near the tips of the tree, where data are sparse and the limiting effects of this sparseness are stronger. We argue that limits to phylogenetic inference and signal imposed by low data coverage may pose significant challenges for comprehensive phylogenetic inference at the species level. Computational requirements provide additional limits for large reconstructions, but these may be overcome by methodological advances, whereas insufficient data coverage can only be remedied by additional sampling effort. 
We conclude that public databases have exceptional value for modern systematics and evolutionary biology, and that a continued emphasis on expanding taxonomic and genomic coverage will play a critical role in developing these resources to their full potential.
Affiliation(s)
- Cody E. Hinchliff
  - Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, Michigan, United States of America
- Stephen Andrew Smith
  - Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, Michigan, United States of America