1
PhyloFisher: A phylogenomic package for resolving eukaryotic relationships. PLoS Biol 2021; 19:e3001365. [PMID: 34358228 PMCID: PMC8345874 DOI: 10.1371/journal.pbio.3001365] [Citation(s) in RCA: 52] [Impact Index Per Article: 13.0] [Received: 08/21/2020] [Accepted: 07/15/2021] [Indexed: 11/19/2022] Open
Abstract
Phylogenomic analyses of hundreds of protein-coding genes aimed at resolving phylogenetic relationships is now a common practice. However, no software currently exists that includes tools for dataset construction and subsequent analysis with diverse validation strategies to assess robustness. Furthermore, there are no publicly available high-quality curated databases designed to assess deep (>100 million years) relationships in the tree of eukaryotes. To address these issues, we developed an easy-to-use software package, PhyloFisher (https://github.com/TheBrownLab/PhyloFisher), written in Python 3. PhyloFisher includes a manually curated database of 240 protein-coding genes from 304 eukaryotic taxa covering known eukaryotic diversity, a novel tool for ortholog selection, and utilities that will perform diverse analyses required by state-of-the-art phylogenomic investigations. Through phylogenetic reconstructions of the tree of eukaryotes and of the Saccharomycetaceae clade of budding yeasts, we demonstrate the utility of the PhyloFisher workflow and the provided starting database to address phylogenetic questions across a large range of evolutionary time points for diverse groups of organisms. We also demonstrate that undetected paralogy can remain in phylogenomic "single-copy orthogroup" datasets constructed using widely accepted methods such as all vs. all BLAST searches followed by Markov Cluster Algorithm (MCL) clustering and application of automated tree pruning algorithms. Finally, we show how the PhyloFisher workflow helps detect inadvertent paralog inclusions, allowing the user to make more informed decisions regarding orthology assignments, leading to a more accurate final dataset.
2
Gulbrandsen ØS, Andresen IJ, Krabberød AK, Bråte J, Shalchian-Tabrizi K. Phylogenomic analysis restructures the Ulvophyceae. J Phycol 2021; 57:1223-1233. [PMID: 33721355 DOI: 10.1111/jpy.13168] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 06/19/2020] [Revised: 01/26/2021] [Accepted: 02/03/2021] [Indexed: 06/12/2023]
Abstract
Here, we present new transcriptome sequencing data from seven species of Dasycladales (Ulvophyceae) and a phylogenomic analysis of the Chlorophyta with a particular focus on Ulvophyceae. We have focused on a broad selection of green algal groups and carefully selected genes suitable for reconstructing deep eukaryote evolutionary histories. Increasing the taxon sampling of Dasycladales restructures the Ulvophyceae by identifying Dasycladales as closely related to Scotinosphaerales and Oltmannsiellopsidales. Contrary to previous studies, we do not find support for a close relationship between Dasycladales and a group with Cladophorales and Trentepohliales. Instead, the latter group is sister to the remainder of the Ulvophyceae. Furthermore, our analyses show high and consistent statistical support for a sister relationship between Bryopsidales and Chlorophyceae in trees generated with both homogeneous and heterogeneous (heterotachy) evolutionary models. Our study provides a new framework for interpreting the evolutionary history of Ulvophyceae and the evolution of cellular morphologies.
Affiliation(s)
- Øyvind Saetren Gulbrandsen
  - Section for Genetics and Evolutionary Biology (EVOGENE), Department of Biosciences, University of Oslo, Kristine Bonnevies Hus, Blindernveien 31, 0316, Oslo, Norway
  - Department of Animal and Aquacultural Sciences, Faculty of Biosciences, Centre for Integrative Genetics, Norwegian University of Life Sciences, Ås, Norway
- Ina Jungersen Andresen
  - Section for Genetics and Evolutionary Biology (EVOGENE), Department of Biosciences, University of Oslo, Kristine Bonnevies Hus, Blindernveien 31, 0316, Oslo, Norway
- Anders Kristian Krabberød
  - Section for Genetics and Evolutionary Biology (EVOGENE), Department of Biosciences, University of Oslo, Kristine Bonnevies Hus, Blindernveien 31, 0316, Oslo, Norway
- Jon Bråte
  - Section for Genetics and Evolutionary Biology (EVOGENE), Department of Biosciences, University of Oslo, Kristine Bonnevies Hus, Blindernveien 31, 0316, Oslo, Norway
  - Department of Virology, Norwegian Institute of Public Health, Oslo, Norway
- Kamran Shalchian-Tabrizi
  - Centre for Integrative Microbial Evolution (CIME), Centre for Epigenetics, Development and Evolution (CEDE), Centre for Ecological and Evolutionary Synthesis (CEES), Department of Biosciences, University of Oslo, Kristine Bonnevies Hus, Blindernveien 31, 0316, Oslo, Norway
3
Cerón-Romero MA, Maurer-Alcalá XX, Grattepanche JD, Yan Y, Fonseca MM, Katz LA. PhyloToL: A Taxon/Gene-Rich Phylogenomic Pipeline to Explore Genome Evolution of Diverse Eukaryotes. Mol Biol Evol 2019; 36:1831-1842. [PMID: 31062861 PMCID: PMC6657734 DOI: 10.1093/molbev/msz103] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Indexed: 12/11/2022] Open
Abstract
Estimating multiple sequence alignments (MSAs) and inferring phylogenies are essential for many aspects of comparative biology. Yet, many bioinformatics tools for such analyses have focused on specific clades, with greatest attention paid to plants, animals, and fungi. The rapid increase in high-throughput sequencing (HTS) data from diverse lineages now provides opportunities to estimate evolutionary relationships and gene family evolution across the eukaryotic tree of life. At the same time, these types of data are known to be error-prone (e.g., substitutions, contamination). To address these opportunities and challenges, we have refined a phylogenomic pipeline, now named PhyloToL, to allow easy incorporation of data from HTS studies, to automate production of both MSAs and gene trees, and to identify and remove contaminants. PhyloToL is designed for phylogenomic analyses of diverse lineages across the tree of life (i.e., at scales of >100 My). We demonstrate the power of PhyloToL by assessing stop codon usage in Ciliophora, identifying contamination in a taxon- and gene-rich database and exploring the evolutionary history of chromosomes in the kinetoplastid parasite Trypanosoma brucei, the causative agent of African sleeping sickness. Benchmarking PhyloToL’s homology assessment against that of OrthoMCL and a published paper on superfamilies of bacterial and eukaryotic organellar outer membrane pore-forming proteins demonstrates the power of our approach for determining gene family membership and inferring gene trees. PhyloToL is highly flexible and allows users to easily explore HTS data, test hypotheses about phylogeny and gene family evolution and combine outputs with third-party tools (e.g., PhyloChromoMap, iGTP).
Affiliation(s)
- Mario A Cerón-Romero
  - Department of Biological Sciences, Smith College, Northampton, MA
  - Program in Organismic and Evolutionary Biology, University of Massachusetts Amherst, Amherst, MA
- Xyrus X Maurer-Alcalá
  - Department of Biological Sciences, Smith College, Northampton, MA
  - Program in Organismic and Evolutionary Biology, University of Massachusetts Amherst, Amherst, MA
  - Institute of Cell Biology, University of Bern, Bern, Switzerland
- Jean-David Grattepanche
  - Department of Biological Sciences, Smith College, Northampton, MA
  - Biology Department, Temple University, Philadelphia, PA
- Ying Yan
  - Department of Biological Sciences, Smith College, Northampton, MA
- Miguel M Fonseca
  - CIIMAR - Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Porto, Portugal
- L A Katz
  - Department of Biological Sciences, Smith College, Northampton, MA
  - Program in Organismic and Evolutionary Biology, University of Massachusetts Amherst, Amherst, MA
4
Tekle KM, Gundersen S, Klepper K, Bongo LA, Raknes IA, Li X, Zhang W, Andreetta C, Mulugeta TD, Kalaš M, Rye MB, Hjerde E, Antony Samy JK, Fornous G, Azab A, Våge DI, Hovig E, Willassen NP, Drabløs F, Nygård S, Petersen K, Jonassen I. Norwegian e-Infrastructure for Life Sciences (NeLS). F1000Res 2018; 7:ELIXIR-968. [PMID: 30271575 PMCID: PMC6137412 DOI: 10.12688/f1000research.15119.1] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Accepted: 06/13/2018] [Indexed: 12/26/2022] Open
Abstract
The Norwegian e-Infrastructure for Life Sciences (NeLS) has been developed by ELIXIR Norway to provide its users with a system enabling data storage, sharing, and analysis in a project-oriented fashion. The system is available through easy-to-use web interfaces, including the Galaxy workbench for data analysis and workflow execution. Users confident with a command-line interface and programming may also access it through Secure Shell (SSH) and application programming interfaces (APIs). NeLS has been in production since 2015, with training and support provided by the help desk of ELIXIR Norway. Through collaboration with NorSeq, the national consortium for high-throughput sequencing, an integrated service is offered so that sequencing data generated in a research project is provided to the involved researchers through NeLS. Sensitive data, such as individual genomic sequencing data, are handled using the TSD (Services for Sensitive Data) platform provided by Sigma2 and the University of Oslo. NeLS integrates national e-infrastructure storage and computing resources, and is also integrated with the SEEK platform in order to store large data files produced by experiments described in SEEK. In this article, we outline the architecture of NeLS and discuss possible directions for further development.
Affiliation(s)
- Kidane M. Tekle
  - Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
- Kjetil Klepper
  - Department of Clinical and Molecular Medicine, Norwegian University of Science and Technology, Trondheim, Norway
- Lars Ailo Bongo
  - University of Tromsø - The Arctic University of Norway, Tromsø, Norway
- Xiaxi Li
  - Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
- Wei Zhang
  - Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
- Christian Andreetta
  - Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
- Teshome Dagne Mulugeta
  - Department of Animal and Aquacultural Sciences, Faculty of Biosciences, Norwegian University of Life Sciences, Ås, Norway
- Matúš Kalaš
  - Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
- Morten B. Rye
  - Department of Clinical and Molecular Medicine, Norwegian University of Science and Technology, Trondheim, Norway
- Erik Hjerde
  - University of Tromsø - The Arctic University of Norway, Tromsø, Norway
- Jeevan Karloss Antony Samy
  - Department of Animal and Aquacultural Sciences, Faculty of Biosciences, Norwegian University of Life Sciences, Ås, Norway
- Dag Inge Våge
  - Department of Animal and Aquacultural Sciences, Faculty of Biosciences, Norwegian University of Life Sciences, Ås, Norway
- Finn Drabløs
  - Department of Clinical and Molecular Medicine, Norwegian University of Science and Technology, Trondheim, Norway
- Kjell Petersen
  - Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
- Inge Jonassen
  - Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
5
Krabberød AK, Orr RJS, Bråte J, Kristensen T, Bjørklund KR, Shalchian-Tabrizi K. Single Cell Transcriptomics, Mega-Phylogeny, and the Genetic Basis of Morphological Innovations in Rhizaria. Mol Biol Evol 2017; 34:1557-1573. [PMID: 28333264 PMCID: PMC5455982 DOI: 10.1093/molbev/msx075] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.3] [Indexed: 01/26/2023] Open
Abstract
The innovation of the eukaryote cytoskeleton enabled phagocytosis, intracellular transport, and cytokinesis, and is largely responsible for the diversity of morphologies among eukaryotes. Still, the relationship between phenotypic innovations in the cytoskeleton and their underlying genotype is poorly understood. To explore the genetic mechanism of morphological evolution of the eukaryotic cytoskeleton, we provide the first single cell transcriptomes from uncultured, free-living unicellular eukaryotes: the polycystine radiolarian Lithomelissa setosa (Nassellaria) and Sticholonche zanclea (Taxopodida). A phylogenomic approach using 255 genes finds Radiolaria and Foraminifera as separate monophyletic groups (together as Retaria), while Cercozoa is shown to be paraphyletic where Endomyxa is sister to Retaria. Analysis of the genetic components of the cytoskeleton and mapping of the evolution of these on the revised phylogeny of Rhizaria reveal lineage-specific gene duplications and neofunctionalization of α and β tubulin in Retaria, actin in Retaria and Endomyxa, and Arp2/3 complex genes in Chlorarachniophyta. We show how genetic innovations have shaped cytoskeletal structures in Rhizaria, and how single cell transcriptomics can be applied for resolving deep phylogenies and studying gene evolution in uncultured protist species.
Affiliation(s)
- Anders K Krabberød
  - Department of Biosciences, Centre for Integrative Microbial Evolution (CIME) and Centre for Epigenetics Development and Evolution (CEDE), University of Oslo, Oslo, Norway
- Russell J S Orr
  - Department of Biosciences, Centre for Integrative Microbial Evolution (CIME) and Centre for Epigenetics Development and Evolution (CEDE), University of Oslo, Oslo, Norway
- Jon Bråte
  - Department of Biosciences, Centre for Integrative Microbial Evolution (CIME) and Centre for Epigenetics Development and Evolution (CEDE), University of Oslo, Oslo, Norway
- Tom Kristensen
  - Department of Biosciences, Centre for Integrative Microbial Evolution (CIME) and Centre for Epigenetics Development and Evolution (CEDE), University of Oslo, Oslo, Norway
- Kjell R Bjørklund
  - Department of Research and Collections, Natural History Museum, University of Oslo, Oslo, Norway
- Kamran Shalchian-Tabrizi
  - Department of Biosciences, Centre for Integrative Microbial Evolution (CIME) and Centre for Epigenetics Development and Evolution (CEDE), University of Oslo, Oslo, Norway
6
Freeman MA, Fuss J, Kristmundsson Á, Bjorbækmo MF, Mangot JF, del Campo J, Keeling PJ, Shalchian-Tabrizi K, Bass D. X-Cells Are Globally Distributed, Genetically Divergent Fish Parasites Related to Perkinsids and Dinoflagellates. Curr Biol 2017; 27:1645-1651.e3. [DOI: 10.1016/j.cub.2017.04.045] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Received: 01/02/2017] [Revised: 03/07/2017] [Accepted: 04/21/2017] [Indexed: 11/17/2022]
7
Guy L. phyloSkeleton: taxon selection, data retrieval and marker identification for phylogenomics. Bioinformatics 2017; 33:1230-1232. [PMID: 28057682 PMCID: PMC5408842 DOI: 10.1093/bioinformatics/btw824] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 07/29/2016] [Accepted: 12/27/2016] [Indexed: 11/13/2022] Open
Abstract
Summary: With the wealth of available genome sequences, a difficult and tedious part of inferring phylogenomic trees is now to select genomes with an appropriate taxon density in the different parts of the tree. The package described here offers tools to easily select the most representative organisms, following a set of simple rules based on taxonomy and assembly quality, to retrieve the genomes from public databases (NCBI, JGI), to annotate them if necessary, to identify given markers in these, and to prepare files for multiple sequence alignment. Availability and Implementation: phyloSkeleton is a Perl module and is freely available under GPLv3 at https://bitbucket.org/lionelguy/phyloskeleton/. Contact: lionel.guy@imbim.uu.se.
Affiliation(s)
- Lionel Guy
  - Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
  - To whom correspondence should be addressed.