1
Gupta A, Mirarab S, Turakhia Y. Accurate, scalable, and fully automated inference of species trees from raw genome assemblies using ROADIES. Proc Natl Acad Sci U S A 2025; 122:e2500553122. [PMID: 40314967 DOI: 10.1073/pnas.2500553122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/09/2025] [Accepted: 03/31/2025] [Indexed: 05/03/2025]
Abstract
Current genome sequencing initiatives across a wide range of life forms offer significant potential to enhance our understanding of evolutionary relationships and support transformative biological and medical applications. Species trees play a central role in many of these applications; however, despite the widespread availability of genome assemblies, accurate inference of species trees remains challenging due to the limited automation, substantial domain expertise, and computational resources required by conventional methods. To address this limitation, we present ROADIES, a fully automated pipeline to infer species trees starting from raw genome assemblies. In contrast to the prominent approach, ROADIES incorporates a unique strategy of randomly sampling segments of the input genomes to generate gene trees. This eliminates the need for predefining a set of loci, limiting the analyses to a fixed number of genes, and performing the cumbersome gene annotation and/or whole genome alignment steps. ROADIES also eliminates the need to infer orthology by leveraging existing discordance-aware methods that allow multicopy genes. Using the genomic datasets from large-scale sequencing efforts across four diverse life forms (placental mammals, pomace flies, birds, and budding yeasts), we show that ROADIES infers species trees that are comparable in quality to the state-of-the-art studies but in a fraction of the time and effort, including on challenging datasets with rampant gene tree discordance and complex polyploidy. With its speed, accuracy, and automation, ROADIES has the potential to vastly simplify species tree inference, making it accessible to a broader range of scientists and applications.
Affiliation(s)
- Anshu Gupta
  - Department of Computer Science and Engineering, University of California, San Diego, CA 92093
- Siavash Mirarab
  - Department of Electrical and Computer Engineering, University of California, San Diego, CA 92093
- Yatish Turakhia
  - Department of Electrical and Computer Engineering, University of California, San Diego, CA 92093
2
Schiffer PH, Natsidis P, Leite DJ, Robertson HE, Lapraz F, Marlétaz F, Fromm B, Baudry L, Simpson F, Høye E, Zakrzewski AC, Kapli P, Hoff KJ, Müller S, Marbouty M, Marlow H, Copley RR, Koszul R, Sarkies P, Telford MJ. Insights into early animal evolution from the genome of the xenacoelomorph worm Xenoturbella bocki. eLife 2024; 13:e94948. [PMID: 39109482 PMCID: PMC11521371 DOI: 10.7554/elife.94948] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/30/2023] [Accepted: 07/03/2024] [Indexed: 10/30/2024]
Abstract
The evolutionary origins of Bilateria remain enigmatic. One of the more enduring proposals highlights similarities between a cnidarian-like planula larva and simple acoel-like flatworms. This idea is based in part on the view of the Xenacoelomorpha as an outgroup to all other bilaterians which are themselves designated the Nephrozoa (protostomes and deuterostomes). Genome data can provide important comparative data and help understand the evolution and biology of enigmatic species better. Here, we assemble and analyze the genome of the simple, marine xenacoelomorph Xenoturbella bocki, a key species for our understanding of early bilaterian evolution. Our highly contiguous genome assembly of X. bocki has a size of ~111 Mbp in 18 chromosome-like scaffolds, with repeat content and intron, exon, and intergenic space comparable to other bilaterian invertebrates. We find X. bocki to have a similar number of genes to other bilaterians and to have retained ancestral metazoan synteny. Key bilaterian signaling pathways are also largely complete and most bilaterian miRNAs are present. Overall, we conclude that X. bocki has a complex genome typical of bilaterians, which does not reflect the apparent simplicity of its body plan that has been so important to proposals that the Xenacoelomorpha are the simple sister group of the rest of the Bilateria.
Affiliation(s)
- Philipp H Schiffer
  - Center for Life’s Origins and Evolution, Department of Genetics, Evolution and Environment, University College London, London, United Kingdom
  - worm~lab, Institute of Zoology, University of Cologne, Cologne, Germany
- Paschalis Natsidis
  - Center for Life’s Origins and Evolution, Department of Genetics, Evolution and Environment, University College London, London, United Kingdom
- Daniel J Leite
  - Center for Life’s Origins and Evolution, Department of Genetics, Evolution and Environment, University College London, London, United Kingdom
  - Department of Biosciences, Durham University, Durham, United Kingdom
- Helen E Robertson
  - Center for Life’s Origins and Evolution, Department of Genetics, Evolution and Environment, University College London, London, United Kingdom
- François Lapraz
  - Center for Life’s Origins and Evolution, Department of Genetics, Evolution and Environment, University College London, London, United Kingdom
  - Université Côte D'Azur, CNRS, Inserm, iBV, Nice, France
- Ferdinand Marlétaz
  - Center for Life’s Origins and Evolution, Department of Genetics, Evolution and Environment, University College London, London, United Kingdom
- Bastian Fromm
  - The Arctic University Museum of Norway, UiT – The Arctic University of Norway, Tromsø, Norway
- Liam Baudry
  - Collège Doctoral, Sorbonne Université, Paris, France
- Fraser Simpson
  - Center for Life’s Origins and Evolution, Department of Genetics, Evolution and Environment, University College London, London, United Kingdom
- Eirik Høye
  - Department of Tumor Biology, Institute for Cancer Research, The Norwegian Radium Hospital, Oslo University Hospital, Oslo, Norway
  - Institute of Clinical Medicine, Medical Faculty, University of Oslo, Oslo, Norway
- Anne C Zakrzewski
  - Center for Life’s Origins and Evolution, Department of Genetics, Evolution and Environment, University College London, London, United Kingdom
  - Museum für Naturkunde, Leibniz Institute for Evolution and Biodiversity Science, Berlin, Germany
- Paschalia Kapli
  - Center for Life’s Origins and Evolution, Department of Genetics, Evolution and Environment, University College London, London, United Kingdom
- Katharina J Hoff
  - University of Greifswald, Institute for Mathematics and Computer Science, Greifswald, Germany
  - University of Greifswald, Center for Functional Genomics of Microbes, Greifswald, Germany
- Steven Müller
  - Center for Life’s Origins and Evolution, Department of Genetics, Evolution and Environment, University College London, London, United Kingdom
  - Royal Brompton Hospital, Guy's and St Thomas' NHS Foundation Trust, London, United Kingdom
- Martial Marbouty
  - Institut Pasteur, Université de Paris, CNRS UMR3525, Unité Régulation Spatiale des Génomes, Paris, France
- Heather Marlow
  - The University of Chicago, Division of Biological Sciences, Chicago, United States
- Richard R Copley
  - Laboratoire de Biologie du Développement de Villefranche-sur-mer (LBDV), Sorbonne Université, Villefranche-sur-mer, France
- Romain Koszul
  - Institut Pasteur, Université de Paris, CNRS UMR3525, Unité Régulation Spatiale des Génomes, Paris, France
- Peter Sarkies
  - Department of Biochemistry, University of Oxford, Oxford, United Kingdom
- Maximilian J Telford
  - Center for Life’s Origins and Evolution, Department of Genetics, Evolution and Environment, University College London, London, United Kingdom
3
Gupta A, Mirarab S, Turakhia Y. Accurate, scalable, and fully automated inference of species trees from raw genome assemblies using ROADIES. bioRxiv 2024:2024.05.27.596098. [PMID: 38854139 PMCID: PMC11160643 DOI: 10.1101/2024.05.27.596098] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2024]
Abstract
Inference of species trees plays a crucial role in advancing our understanding of evolutionary relationships and has immense significance for diverse biological and medical applications. Extensive genome sequencing efforts are currently in progress across a broad spectrum of life forms, holding the potential to unravel the intricate branching patterns within the tree of life. However, estimating species trees starting from raw genome sequences is quite challenging, and the current cutting-edge methodologies require a series of error-prone steps that are neither entirely automated nor standardized. In this paper, we present ROADIES, a novel pipeline for species tree inference from raw genome assemblies that is fully automated, easy to use, scalable, free from reference bias, and provides flexibility to adjust the tradeoff between accuracy and runtime. The ROADIES pipeline eliminates the need to align whole genomes, choose a single reference species, or pre-select loci such as functional genes found using cumbersome annotation steps. Moreover, it leverages recent advances in phylogenetic inference to allow multi-copy genes, eliminating the need to detect orthology. Using the genomic datasets released from large-scale sequencing consortia across three diverse life forms (placental mammals, pomace flies, and birds), we show that ROADIES infers species trees that are comparable in quality with the state-of-the-art approaches but in a fraction of the time. By incorporating optimal approaches and automating all steps from assembled genomes to species and gene trees, ROADIES is poised to improve the accuracy, scalability, and reproducibility of phylogenomic analyses.
Affiliation(s)
- Anshu Gupta
  - Department of Computer Science and Engineering, University of California, San Diego; San Diego, CA 92093, USA
- Siavash Mirarab
  - Department of Electrical and Computer Engineering, University of California, San Diego; San Diego, CA 92093, USA
- Yatish Turakhia
  - Department of Electrical and Computer Engineering, University of California, San Diego; San Diego, CA 92093, USA
4
Glez-Peña D, López-Fernández H, Duque P, Vieira CP, Vieira J. Inferences on the evolution of the ascorbic acid synthesis pathway in insects using Phylogenetic Tree Collapser (PTC), a tool for the automated collapsing of phylogenetic trees using taxonomic information. J Integr Bioinform 2024; 21:jib-2023-0051. [PMID: 39054685 PMCID: PMC11377030 DOI: 10.1515/jib-2023-0051] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2023] [Accepted: 06/05/2024] [Indexed: 07/27/2024]
Abstract
When inferring the evolution of a gene/gene family, it is advisable to use all available coding sequences (CDS) from as many species genomes as possible in order to infer and date all gene duplications and losses. Nowadays, this means using hundreds or even thousands of CDSs, which makes the inferred phylogenetic trees difficult to visualize and interpret. Therefore, it is useful to have an automated way of collapsing large phylogenetic trees according to a taxonomic term decided by the user (family, class, or order, for instance), in order to highlight the minimal set of sequences that should be used to recapitulate the full history of the gene/gene family being studied at that taxonomic level, that can be refined using additional software. Here we present the Phylogenetic Tree Collapser (PTC) program (https://github.com/pegi3s/phylogenetic-tree-collapser), a flexible tool for automated tree collapsing using taxonomic information, that can be easily used by researchers without a background in informatics, since it only requires the installation of Docker, Podman or Singularity. The utility of PTC is demonstrated by addressing the evolution of the ascorbic acid synthesis pathway in insects. A Docker image is available at Docker Hub (https://hub.docker.com/r/pegi3s/phylogenetic-tree-collapser) with PTC installed and ready-to-run.
Affiliation(s)
- Daniel Glez-Peña
  - ESEI: Escuela Superior de Ingeniería Informática, University of Vigo, Edificio Politécnico, Campus Universitario As Lagoas s/n, 32004, Ourense, Spain
  - CINBIO: Centro de Investigaciones Biomédicas, University of Vigo, Campus Universitario Lagoas-Marcosende, 36310, Vigo, Spain
  - SING Research Group, Galicia Sur Health Research Institute (IIS Galicia Sur), SERGAS-UVIGO, 36213 Vigo, Spain
- Hugo López-Fernández
  - ESEI: Escuela Superior de Ingeniería Informática, University of Vigo, Edificio Politécnico, Campus Universitario As Lagoas s/n, 32004, Ourense, Spain
  - CINBIO: Centro de Investigaciones Biomédicas, University of Vigo, Campus Universitario Lagoas-Marcosende, 36310, Vigo, Spain
  - SING Research Group, Galicia Sur Health Research Institute (IIS Galicia Sur), SERGAS-UVIGO, 36213 Vigo, Spain
- Pedro Duque
  - Instituto de Investigação e Inovação em Saúde (I3S), Universidade do Porto, Rua Alfredo Allen, 208, 4200-135 Porto, Portugal
  - Instituto de Biologia Molecular e Celular (IBMC), Rua Alfredo Allen, 208, 4200-135 Porto, Portugal
  - Instituto de Ciências Biomédicas Abel Salazar (ICBAS), Universidade do Porto, Rua de Jorge Viterbo Ferreira, 228, 4050-313 Porto, Portugal
  - Faculdade de Ciências da Universidade do Porto (FCUP), Rua do Campo Alegre, s/n, 4169-007 Porto, Portugal
- Cristina P Vieira
  - Instituto de Investigação e Inovação em Saúde (I3S), Universidade do Porto, Rua Alfredo Allen, 208, 4200-135 Porto, Portugal
  - Instituto de Biologia Molecular e Celular (IBMC), Rua Alfredo Allen, 208, 4200-135 Porto, Portugal
- Jorge Vieira
  - Instituto de Investigação e Inovação em Saúde (I3S), Universidade do Porto, Rua Alfredo Allen, 208, 4200-135 Porto, Portugal
  - Instituto de Biologia Molecular e Celular (IBMC), Rua Alfredo Allen, 208, 4200-135 Porto, Portugal
5
Gàlvez-Morante A, Guéguen L, Natsidis P, Telford MJ, Richter DJ. Dollo Parsimony Overestimates Ancestral Gene Content Reconstructions. Genome Biol Evol 2024; 16:evae062. [PMID: 38518756 PMCID: PMC10995720 DOI: 10.1093/gbe/evae062] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2024] [Revised: 03/15/2024] [Accepted: 03/19/2024] [Indexed: 03/24/2024]
Abstract
Ancestral reconstruction is a widely used technique that has been applied to understand the evolutionary history of gain and loss of gene families. Ancestral gene content can be reconstructed via different phylogenetic methods, but many current and previous studies employ Dollo parsimony. We hypothesize that Dollo parsimony is not appropriate for ancestral gene content reconstruction inferences based on sequence homology, as Dollo parsimony is derived from the assumption that a complex character cannot be regained. This premise does not accurately model molecular sequence evolution, in which false orthology can result from sequence convergence or lateral gene transfer. The aim of this study is to test Dollo parsimony's suitability for ancestral gene content reconstruction and to compare its inferences with a maximum likelihood-based approach that allows a gene family to be gained more than once within a tree. We first compared the performance of the two approaches on a series of artificial data sets each of 5,000 genes that were simulated according to a spectrum of evolutionary rates without gene gain or loss, so that inferred deviations from the true gene count would arise only from errors in orthology inference and ancestral reconstruction. Next, we reconstructed protein domain evolution on a phylogeny representing known eukaryotic diversity. We observed that Dollo parsimony produced numerous ancestral gene content overestimations, especially at nodes closer to the root of the tree. These observations led us to the conclusion that, confirming our hypothesis, Dollo parsimony is not an appropriate method for ancestral reconstruction studies based on sequence homology.
Affiliation(s)
- Alex Gàlvez-Morante
  - Institut de Biologia Evolutiva (CSIC-Universitat Pompeu Fabra), Barcelona 08003, Spain
- Laurent Guéguen
  - LBBE, UMR 5558, CNRS, Université Claude Bernard Lyon 1, Villeurbanne 69622, France
- Paschalis Natsidis
  - Centre for Life's Origins and Evolution, Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK
- Maximilian J Telford
  - Centre for Life's Origins and Evolution, Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK
- Daniel J Richter
  - Institut de Biologia Evolutiva (CSIC-Universitat Pompeu Fabra), Barcelona 08003, Spain
6
Domazet-Lošo M, Široki T, Šimičević K, Domazet-Lošo T. Macroevolutionary dynamics of gene family gain and loss along multicellular eukaryotic lineages. Nat Commun 2024; 15:2663. [PMID: 38531970 DOI: 10.1038/s41467-024-47017-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2023] [Accepted: 03/11/2024] [Indexed: 03/28/2024]
Abstract
The gain and loss of genes fluctuate over evolutionary time in major eukaryotic clades. However, the full profile of these macroevolutionary trajectories is still missing. To give a more inclusive view on the changes in genome complexity across the tree of life, here we recovered the evolutionary dynamics of gene family gain and loss ranging from the ancestor of cellular organisms to 352 eukaryotic species. We show that in all considered lineages the gene family content follows a common evolutionary pattern, where the number of gene families reaches the highest value at a major evolutionary and ecological transition, and then gradually decreases towards extant organisms. This supports theoretical predictions and suggests that the genome complexity is often decoupled from commonly perceived organismal complexity. We conclude that simplification by gene family loss is a dominant force in Phanerozoic genomes of various lineages, probably underpinned by intense ecological specializations and functional outsourcing.
Affiliation(s)
- Mirjana Domazet-Lošo
  - Department of Applied Computing, Faculty of Electrical Engineering and Computing, University of Zagreb, Unska 3, HR-10000, Zagreb, Croatia
- Tin Široki
  - Department of Applied Computing, Faculty of Electrical Engineering and Computing, University of Zagreb, Unska 3, HR-10000, Zagreb, Croatia
- Korina Šimičević
  - Department of Applied Computing, Faculty of Electrical Engineering and Computing, University of Zagreb, Unska 3, HR-10000, Zagreb, Croatia
- Tomislav Domazet-Lošo
  - Laboratory of Evolutionary Genetics, Division of Molecular Biology, Ruđer Bošković Institute, Bijenička cesta 54, HR-10000, Zagreb, Croatia
  - School of Medicine, Catholic University of Croatia, Ilica 242, HR-10000, Zagreb, Croatia
7
Dylus D, Altenhoff A, Majidian S, Sedlazeck FJ, Dessimoz C. Inference of phylogenetic trees directly from raw sequencing reads using Read2Tree. Nat Biotechnol 2024; 42:139-147. [PMID: 37081138 PMCID: PMC10791578 DOI: 10.1038/s41587-023-01753-4] [Citation(s) in RCA: 18] [Impact Index Per Article: 18.0] [Received: 04/18/2022] [Accepted: 03/16/2023] [Indexed: 04/22/2023]
Abstract
Current methods for inference of phylogenetic trees require running complex pipelines at substantial computational and labor costs, with additional constraints in sequencing coverage, assembly and annotation quality, especially for large datasets. To overcome these challenges, we present Read2Tree, which directly processes raw sequencing reads into groups of corresponding genes and bypasses traditional steps in phylogeny inference, such as genome assembly, annotation and all-versus-all sequence comparisons, while retaining accuracy. In a benchmark encompassing a broad variety of datasets, Read2Tree is 10-100 times faster than assembly-based approaches and in most cases more accurate, the exception being when sequencing coverage is high and reference species very distant. Here, to illustrate the broad applicability of the tool, we reconstruct a yeast tree of life of 435 species spanning 590 million years of evolution. We also apply Read2Tree to >10,000 Coronaviridae samples, accurately classifying highly diverse animal samples and near-identical severe acute respiratory syndrome coronavirus 2 sequences on a single tree. The speed, accuracy and versatility of Read2Tree enable comparative genomics at scale.
Affiliation(s)
- David Dylus
  - Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
  - SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
  - F. Hoffmann-La Roche Ltd, Immunology, Infectious Disease, and Ophthalmology (I2O), Roche Pharmaceutical Research and Early Development (pRED), Basel, Switzerland
- Adrian Altenhoff
  - SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
  - Department of Computer Science, ETH, Zurich, Switzerland
- Sina Majidian
  - Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
  - SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Fritz J Sedlazeck
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
  - Department of Computer Science, Rice University, Houston, TX, USA
- Christophe Dessimoz
  - Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
  - SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
  - Department of Computer Science, University College London, London, UK
  - Centre for Life's Origins and Evolution, Department of Genetics, Evolution and Environment, University College London, London, UK
8
Kapli P, Kotari I, Telford MJ, Goldman N, Yang Z. DNA Sequences Are as Useful as Protein Sequences for Inferring Deep Phylogenies. Syst Biol 2023; 72:1119-1135. [PMID: 37366056 PMCID: PMC10627555 DOI: 10.1093/sysbio/syad036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2022] [Indexed: 06/28/2023]
Abstract
Inference of deep phylogenies has almost exclusively used protein rather than DNA sequences based on the perception that protein sequences are less prone to homoplasy and saturation or to issues of compositional heterogeneity than DNA sequences. Here, we analyze a model of codon evolution under an idealized genetic code and demonstrate that those perceptions may be misconceptions. We conduct a simulation study to assess the utility of protein versus DNA sequences for inferring deep phylogenies, with protein-coding data generated under models of heterogeneous substitution processes across sites in the sequence and among lineages on the tree, and then analyzed using nucleotide, amino acid, and codon models. Analysis of DNA sequences under nucleotide-substitution models (possibly with the third codon positions excluded) recovered the correct tree at least as often as analysis of the corresponding protein sequences under modern amino acid models. We also applied the different data-analysis strategies to an empirical dataset to infer the metazoan phylogeny. Our results from both simulated and real data suggest that DNA sequences may be as useful as proteins for inferring deep phylogenies and should not be excluded from such analyses. Analysis of DNA data under nucleotide models has a major computational advantage over protein-data analysis, potentially making it feasible to use advanced models that account for among-site and among-lineage heterogeneity in the nucleotide-substitution process in inference of deep phylogenies.
Affiliation(s)
- Paschalia Kapli
  - Department of Genetics, University College London, Gower Street, London WC1E 6BT, UK
- Ioanna Kotari
  - Department of Genetics, University College London, Gower Street, London WC1E 6BT, UK
  - Institut für Populationsgenetik, Vetmeduni Vienna, Vienna, 1210, Austria
- Maximilian J Telford
  - Department of Genetics, University College London, Gower Street, London WC1E 6BT, UK
- Nick Goldman
  - European Molecular Biology Laboratory - European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Ziheng Yang
  - Department of Genetics, University College London, Gower Street, London WC1E 6BT, UK
9
Yadav IS, Rawat N, Chhuneja P, Kaur S, Uauy C, Lazo G, Gu YQ, Doležel J, Tiwari VK. Comparative genomic analysis of 5Mg chromosome of Aegilops geniculata and 5Uu chromosome of Aegilops umbellulata reveal genic diversity in the tertiary gene pool. Front Plant Sci 2023; 14:1144000. [PMID: 37521926 PMCID: PMC10373596 DOI: 10.3389/fpls.2023.1144000] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/13/2023] [Accepted: 06/22/2023] [Indexed: 08/01/2023]
Abstract
Wheat is one of the most important cereal crops for global food security. Due to its narrow genetic base, modern bread wheat cultivars face challenges from increasing abiotic and biotic stresses. Since genetic improvement is the most sustainable approach, finding novel genes and alleles is critical for enhancing the genetic diversity of wheat. The tertiary gene pool of wheat is considered a gold mine for genetic diversity as novel genes and alleles can be identified and transferred to wheat cultivars. Aegilops geniculata and Ae. umbellulata are key members of the tertiary gene pool of wheat and harbor important genes against abiotic and biotic stresses. Homoeologous-group five chromosomes (5Uu and 5Mg) from Ae. geniculata and Ae. umbellulata have been extensively studied as they harbor several important genes including Lr57, Lr76, Yr40, Yr70, Sr53 and chromosomal pairing loci. In the present study, using chromosome DNA sequencing and RNAseq datasets, we performed comparative analysis to study homoeologous gene evolution in 5Mg, 5Uu, and group 5 wheat chromosomes. Our findings highlight the diversity of transcription factors and resistance genes, resulting from the differential expansion of the gene families. Both chromosomes were found to be enriched with the "response to stimulus" category of genes providing resistance against biotic and abiotic stress. A phylogenetic study positioned the M genome closer to the D genome, with higher proximity to the A genome than the B genome. Over 4000 genes were impacted by SNPs on 5D, with 4-5% of those genes displaying non-disruptive variations that affect gene function.
Affiliation(s)
- Inderjit S. Yadav
  - Department of Plant Sciences and Landscape Architecture, University of Maryland, College Park, MD, United States
  - School of Agricultural Biotechnology, Punjab Agricultural University, Ludhiana, India
- Nidhi Rawat
  - Department of Plant Sciences and Landscape Architecture, University of Maryland, College Park, MD, United States
- Parveen Chhuneja
  - School of Agricultural Biotechnology, Punjab Agricultural University, Ludhiana, India
- Satinder Kaur
  - School of Agricultural Biotechnology, Punjab Agricultural University, Ludhiana, India
- Gerard Lazo
  - Agricultural Research Service, United States Department of Agriculture (USDA), Albany, CA, United States
- Yong Q. Gu
  - Agricultural Research Service, United States Department of Agriculture (USDA), Albany, CA, United States
- Jaroslav Doležel
  - Centre of Plant Structural and Functional Genomics, Institute of Experimental Botany, Olomouc, Czechia
- Vijay K. Tiwari
  - Department of Plant Sciences and Landscape Architecture, University of Maryland, College Park, MD, United States
10
Glick L, Mayrose I. The Effect of Methodological Considerations on the Construction of Gene-Based Plant Pan-genomes. Genome Biol Evol 2023; 15:evad121. [PMID: 37401440 PMCID: PMC10340445 DOI: 10.1093/gbe/evad121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2023] [Revised: 06/21/2023] [Accepted: 06/28/2023] [Indexed: 07/05/2023]
Abstract
Pan-genomics is an emerging approach for studying the genetic diversity within plant populations. In contrast to common resequencing studies that compare whole genome sequencing data with a single reference genome, the construction of a pan-genome (PG) involves the direct comparison of multiple genomes to one another, thereby enabling the detection of genomic sequences and genes not present in the reference, as well as the analysis of gene content diversity. Although multiple studies describing PGs of various plant species have been published in recent years, a better understanding regarding the effect of the computational procedures used for PG construction could guide researchers in making more informed methodological decisions. Here, we examine the effect of several key methodological factors on the obtained gene pool and on gene presence-absence detections by constructing and comparing multiple PGs of Arabidopsis thaliana and cultivated soybean, as well as conducting a meta-analysis on published PGs. These factors include the construction method, the sequencing depth, and the extent of input data used for gene annotation. We observe substantial differences between PGs constructed using three common procedures (de novo assembly and annotation, map-to-pan, and iterative assembly) and that results are dependent on the extent of the input data. Specifically, we report low agreement between the gene content inferred using different procedures and input data. Our results should increase the awareness of the community to the consequences of methodological decisions made during the process of PG construction and emphasize the need for further investigation of commonly applied methodologies.
Affiliation(s)
- Lior Glick
  - Department of Life Sciences, School of Plant Sciences and Food Security, Tel-Aviv University, Tel Aviv, Israel
- Itay Mayrose
  - Department of Life Sciences, School of Plant Sciences and Food Security, Tel-Aviv University, Tel Aviv, Israel
11
Ruperti F, Papadopoulos N, Musser JM, Mirdita M, Steinegger M, Arendt D. Cross-phyla protein annotation by structural prediction and alignment. Genome Biol 2023; 24:113. [PMID: 37173746 PMCID: PMC10176882 DOI: 10.1186/s13059-023-02942-9] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 08/02/2022] [Accepted: 04/18/2023] [Indexed: 05/15/2023]
Abstract
BACKGROUND Protein annotation is a major goal in molecular biology, yet experimentally determined knowledge is typically limited to a few model organisms. In non-model species, the sequence-based prediction of gene orthology can be used to infer protein identity; however, this approach loses predictive power at longer evolutionary distances. Here we propose a workflow for protein annotation using structural similarity, exploiting the fact that similar protein structures often reflect homology and are more conserved than protein sequences. RESULTS We propose a workflow of openly available tools for the functional annotation of proteins via structural similarity (MorF: MorphologFinder) and use it to annotate the complete proteome of a sponge. Sponges are highly relevant for inferring the early history of animals, yet their proteomes remain sparsely annotated. MorF accurately predicts the functions of proteins with known homology in [Formula: see text] cases and annotates an additional [Formula: see text] of the proteome beyond standard sequence-based methods. We uncover new functions for sponge cell types, including extensive FGF, TGF, and Ephrin signaling in sponge epithelia, and redox metabolism and control in myopeptidocytes. Notably, we also annotate genes specific to the enigmatic sponge mesocytes, proposing they function to digest cell walls. CONCLUSIONS Our work demonstrates that structural similarity is a powerful approach that complements and extends sequence similarity searches to identify homologous proteins over long evolutionary distances. We anticipate this will be a powerful approach that boosts discovery in numerous -omics datasets, especially for non-model organisms.
Affiliation(s)
- Fabian Ruperti
- Developmental Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- Faculty of Biosciences, Collaboration for joint Ph.D. degree between EMBL and Heidelberg University, Heidelberg, Germany
- Nikolaos Papadopoulos
- Developmental Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- Department for Evolutionary Biology, University of Vienna, Vienna, Austria
- Jacob M Musser
- Developmental Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- Milot Mirdita
- School of Biological Sciences, Seoul National University, Seoul, South Korea
- Martin Steinegger
- School of Biological Sciences, Seoul National University, Seoul, South Korea
- Artificial Intelligence Institute, Seoul National University, Seoul, South Korea
- Detlev Arendt
- Developmental Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany.
- Centre for Organismal Studies, University of Heidelberg, Heidelberg, Germany.
|
12
|
Walden N, Schranz ME. Synteny Identifies Reliable Orthologs for Phylogenomics and Comparative Genomics of the Brassicaceae. Genome Biol Evol 2023; 15:7059155. [PMID: 36848527 PMCID: PMC10016055 DOI: 10.1093/gbe/evad034] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 09/09/2022] [Revised: 01/27/2023] [Accepted: 02/17/2023] [Indexed: 03/01/2023] Open
Abstract
Large genomic data sets are becoming the new normal in phylogenetic research, but the identification of true orthologous genes and the exclusion of problematic paralogs is still challenging when applying commonly used sequencing methods such as target enrichment. Here, we compared conventional ortholog detection using OrthoFinder with ortholog detection through genomic synteny in a data set of 11 representative diploid Brassicaceae whole-genome sequences spanning the entire phylogenetic space. Then, we evaluated the resulting gene sets regarding gene number, functional annotation, and gene and species tree resolution. Finally, we used the syntenic gene sets for comparative genomics and ancestral genome analysis. The use of synteny resulted in considerably more orthologs and also allowed us to reliably identify paralogs. Surprisingly, we did not detect notable differences between species trees reconstructed from syntenic orthologs when compared with other gene sets, including the Angiosperms353 set and a Brassicaceae-specific target enrichment gene set. However, the synteny data set comprised a multitude of gene functions, strongly suggesting that this method of marker selection for phylogenomics is suitable for studies that value downstream gene function analysis, gene interaction, and network studies. Finally, we present the first ancestral genome reconstruction for the Core Brassicaceae, which predates the Brassicaceae lineage diversification ∼25 million years ago.
Affiliation(s)
- Nora Walden
- Biosystematics Group, Wageningen University, Wageningen, The Netherlands; Centre for Organismal Studies, Heidelberg University, Heidelberg, Germany
|
13
|
McCarthy CGP, Mulhair PO, Siu-Ting K, Creevey CJ, O’Connell MJ. Improving Orthologous Signal and Model Fit in Datasets Addressing the Root of the Animal Phylogeny. Mol Biol Evol 2023; 40:6989790. [PMID: 36649189 PMCID: PMC9848061 DOI: 10.1093/molbev/msac276] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 12/09/2021] [Revised: 12/19/2022] [Accepted: 12/23/2022] [Indexed: 01/18/2023] Open
Abstract
There is conflicting evidence as to whether Porifera (sponges) or Ctenophora (comb jellies) comprise the root of the animal phylogeny. Support for either a Porifera-sister or Ctenophora-sister tree has been extensively examined in the context of model selection, taxon sampling, and outgroup selection. The influence of dataset construction is comparatively understudied. We re-examine five animal phylogeny datasets that have supported either root hypothesis using an approach designed to enrich orthologous signal in phylogenomic datasets. We find that many component orthogroups in animal datasets fail to recover major lineages as monophyletic with the exception of Ctenophora, regardless of the supported root. Enriching these datasets to retain orthogroups recovering ≥3 major lineages reduces dataset size by up to 50% while retaining underlying phylogenetic information and taxon sampling. Site-heterogeneous phylogenomic analysis of these enriched datasets recovers both Porifera-sister and Ctenophora-sister positions, even with additional constraints on outgroup sampling. Two datasets which previously supported Ctenophora-sister support Porifera-sister upon enrichment. All enriched datasets display improved model fitness under posterior predictive analysis. While not conclusively rooting animals at either Porifera or Ctenophora, we do see an increase in signal for Porifera-sister and a decrease in signal for Ctenophora-sister when data are filtered for orthologous signal. Our results indicate that dataset size and construction as well as model fit influence animal root inference.
Affiliation(s)
- Karen Siu-Ting
- Institute for Global Food Security, School of Biological Sciences, Queen's University Belfast, Belfast BT9 5DL, United Kingdom
- Christopher J Creevey
- Institute for Global Food Security, School of Biological Sciences, Queen's University Belfast, Belfast BT9 5DL, United Kingdom
|
14
|
Juravel K, Porras L, Höhna S, Pisani D, Wörheide G. Exploring genome gene content and morphological analysis to test recalcitrant nodes in the animal phylogeny. PLoS One 2023; 18:e0282444. [PMID: 36952565 PMCID: PMC10035847 DOI: 10.1371/journal.pone.0282444] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 11/04/2022] [Accepted: 02/14/2023] [Indexed: 03/25/2023] Open
Abstract
An accurate phylogeny of animals is needed to clarify their evolution, ecology, and impact on shaping the biosphere. Although datasets of several hundred thousand amino acids are nowadays routinely used to test phylogenetic hypotheses, key deep nodes in the metazoan tree remain unresolved: the root of animals, the root of Bilateria, and the monophyly of Deuterostomia. Instead of using the standard approach of amino acid datasets, we performed analyses of newly assembled genome gene content and morphological datasets to investigate these recalcitrant nodes in the phylogeny of animals. We extensively explored the choices for assembling the genome gene content dataset and the model choices of morphological analyses. Our results are robust to these choices and provide additional insights into the early evolution of animals. They are consistent with sponges as the sister group of all the other animals and with the worm-like bilaterian lineage Xenacoelomorpha as the sister group of the other Bilateria, and they tentatively support monophyletic Deuterostomia.
Affiliation(s)
- Ksenia Juravel
- Department of Earth and Environmental Sciences, Paleontology & Geobiology, Ludwig-Maximilians-Universität München, München, Germany
- Luis Porras
- Department of Earth and Environmental Sciences, Paleontology & Geobiology, Ludwig-Maximilians-Universität München, München, Germany
- Sebastian Höhna
- Department of Earth and Environmental Sciences, Paleontology & Geobiology, Ludwig-Maximilians-Universität München, München, Germany
- GeoBio-Center, Ludwig-Maximilians-Universität München, München, Germany
- Davide Pisani
- Bristol Palaeobiology Group, School of Biological Sciences and School of Earth Sciences, University of Bristol, Bristol, United Kingdom
- Gert Wörheide
- Department of Earth and Environmental Sciences, Paleontology & Geobiology, Ludwig-Maximilians-Universität München, München, Germany
- GeoBio-Center, Ludwig-Maximilians-Universität München, München, Germany
- SNSB-Bayerische Staatssammlung für Paläontologie und Geologie, München, Germany
|
15
|
Dylus D, Altenhoff A, Majidian S, Sedlazeck FJ, Dessimoz C. Read2Tree: scalable and accurate phylogenetic trees from raw reads. bioRxiv 2022:2022.04.18.488678. [PMID: 36561179 PMCID: PMC9774205 DOI: 10.1101/2022.04.18.488678] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/24/2022]
Abstract
The inference of phylogenetic trees is foundational to biology. However, state-of-the-art phylogenomics requires running complex pipelines, at significant computational and labour costs, with additional constraints in sequencing coverage, assembly and annotation quality. To overcome these challenges, we present Read2Tree, which directly processes raw sequencing reads into groups of corresponding genes. In a benchmark encompassing a broad variety of datasets, our assembly-free approach was 10-100x faster than conventional approaches, and in most cases more accurate, the exception being when sequencing coverage was high and reference species very distant. To illustrate the broad applicability of the tool, we reconstructed a yeast tree of life of 435 species spanning 590 million years of evolution. Applied to Coronaviridae samples, Read2Tree accurately classified highly diverse animal samples and near-identical SARS-CoV-2 sequences on a single tree, thereby exhibiting remarkable breadth and depth. The speed, accuracy, and versatility of Read2Tree enable comparative genomics at scale.
Affiliation(s)
- David Dylus
- Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- present address: F. Hoffmann-La Roche Ltd, Immunology, Infectious Disease, and Ophthalmology (I2O), Roche Pharmaceutical Research and Early Development (pRED), Basel, 4070, Switzerland
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Adrian Altenhoff
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Department of Computer Science, ETH, 8092 Zurich, Switzerland
- Sina Majidian
- Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA
- Department of Computer Science, Rice University, Houston, TX, 77005, USA
- Christophe Dessimoz
- Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Department of Computer Science, University College London, London WC1E 6BT, UK
- Centre for Life’s Origins and Evolution, Department of Genetics, Evolution and Environment, University College London, London WC1E, UK
|
16
|
Mulhair PO, McCarthy CGP, Siu-Ting K, Creevey CJ, O'Connell MJ. Filtering artifactual signal increases support for Xenacoelomorpha and Ambulacraria sister relationship in the animal tree of life. Curr Biol 2022; 32:5180-5188.e3. [PMID: 36356574 DOI: 10.1016/j.cub.2022.10.036] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 12/13/2021] [Revised: 08/09/2022] [Accepted: 10/18/2022] [Indexed: 11/10/2022]
Abstract
Conflicting studies place a group of bilaterian invertebrates containing xenoturbellids and acoelomorphs, the Xenacoelomorpha, as either the primary emerging bilaterian phylum [1-6] or within Deuterostomia, sister to Ambulacraria [7-11]. Although their placement as sister to the rest of Bilateria supports relatively simple morphology in the ancestral bilaterian, their alternative placement within Deuterostomia suggests a morphologically complex ancestral bilaterian along with extensive loss of major phenotypic traits in the Xenacoelomorpha. Recent studies have questioned whether Deuterostomia should be considered monophyletic at all [10,12,13]. Hidden paralogy and poor phylogenetic signal present a major challenge for reconstructing species phylogenies [14-18]. Here, we assess whether these issues have contributed to the conflict over the placement of Xenacoelomorpha. We reanalyzed published datasets, enriching for orthogroups whose gene trees support well-resolved clans elsewhere in the animal tree [16]. We find that most genes in previously published datasets violate incontestable clans, suggesting that hidden paralogy and low phylogenetic signal affect the ability to reconstruct branching patterns at deep nodes in the animal tree. We demonstrate that removing orthogroups that cannot recapitulate incontestable relationships alters the final topology that is inferred, while simultaneously improving the fit of the model to the data. We discover increased, but ultimately not conclusive, support for the existence of Xenambulacraria in our set of filtered orthogroups. At a time when we are progressing toward sequencing all life on the planet, we argue that long-standing contentious issues in the tree of life will be resolved using smaller amounts of better quality data that can be modeled adequately [19].
Affiliation(s)
- Peter O Mulhair
- Computational and Molecular Evolutionary Biology Research Group, School of Life Sciences, Faculty of Medicine and Health Sciences, University of Nottingham, Nottingham NG7 2RD, UK; Computational and Molecular Evolutionary Biology Research Group, School of Biology, Faculty of Biological Sciences, University of Leeds, Leeds LS2 9JT, UK
- Charley G P McCarthy
- Computational and Molecular Evolutionary Biology Research Group, School of Life Sciences, Faculty of Medicine and Health Sciences, University of Nottingham, Nottingham NG7 2RD, UK
- Karen Siu-Ting
- Institute for Global Food Security, School of Biological Sciences, Queen's University Belfast, Belfast BT9 5DL, UK
- Christopher J Creevey
- Institute for Global Food Security, School of Biological Sciences, Queen's University Belfast, Belfast BT9 5DL, UK
- Mary J O'Connell
- Computational and Molecular Evolutionary Biology Research Group, School of Life Sciences, Faculty of Medicine and Health Sciences, University of Nottingham, Nottingham NG7 2RD, UK; Computational and Molecular Evolutionary Biology Research Group, School of Biology, Faculty of Biological Sciences, University of Leeds, Leeds LS2 9JT, UK.
|
17
|
Leite DJ, Piovani L, Telford MJ. Genome assembly of the polyclad flatworm Prostheceraeus crozieri. Genome Biol Evol 2022; 14:6678951. [PMID: 36040059 PMCID: PMC9469890 DOI: 10.1093/gbe/evac133] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 08/25/2022] [Indexed: 11/24/2022] Open
Abstract
Polyclad flatworms are widely thought to be one of the least derived of the flatworm classes and, as such, are well placed to investigate evolutionary and developmental features such as spiral cleavage and larval diversification lost in other platyhelminths. Prostheceraeus crozieri (formerly Maritigrella crozieri) is an emerging model polyclad flatworm that already has some useful transcriptome data but, to date, no sequenced genome. We have used high molecular weight DNA extraction and long-read PacBio sequencing to assemble the highly repetitive (67.9%) P. crozieri genome (2.07 Gb). We have annotated 43,325 genes, with 89.7% BUSCO completeness. Perhaps reflecting its large genome, introns were considerably larger than those of other free-living flatworms, but evidence of abundant transposable elements suggests genome expansion has been principally via transposable element activity. This genome resource will be of great use for future developmental and phylogenomic research.
Affiliation(s)
- Daniel J Leite
- Department of Biosciences, Durham University, Durham DH1 3LE, UK; Centre for Life's Origins and Evolution, Department of Genetics, Evolution, and Environment, University College London, Gower Street, London WC1E 6BT, UK
- Laura Piovani
- Centre for Life's Origins and Evolution, Department of Genetics, Evolution, and Environment, University College London, Gower Street, London WC1E 6BT, UK
- Maximilian J Telford
- Centre for Life's Origins and Evolution, Department of Genetics, Evolution, and Environment, University College London, Gower Street, London WC1E 6BT, UK
|
18
|
Santander MD, Maronna MM, Ryan JF, Andrade SCS. The state of Medusozoa genomics: current evidence and future challenges. Gigascience 2022; 11:6586816. [PMID: 35579552 PMCID: PMC9112765 DOI: 10.1093/gigascience/giac036] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 12/10/2021] [Revised: 02/18/2022] [Accepted: 03/15/2022] [Indexed: 12/13/2022] Open
Abstract
Medusozoa is a widely distributed ancient lineage that harbors one-third of Cnidaria diversity divided into 4 classes. This clade is characterized by the succession of stages and modes of reproduction during metagenic lifecycles, and includes some of the most plastic body plans and life cycles among animals. The characterization of traditional genomic features, such as chromosome numbers and genome sizes, was rather overlooked in Medusozoa and many evolutionary questions still remain unanswered. Modern genomic DNA sequencing in this group started in 2010 with the publication of the Hydra vulgaris genome and has experienced an exponential increase in the past 3 years. Therefore, an update of the state of Medusozoa genomics is warranted. We reviewed different sources of evidence, including cytogenetic records and high-throughput sequencing projects. We focused on 4 main topics that would be relevant for the broad Cnidaria research community: (i) taxonomic coverage of genomic information; (ii) continuity, quality, and completeness of high-throughput sequencing datasets; (iii) overview of the Medusozoa specific research questions approached with genomics; and (iv) the accessibility of data and metadata. We highlight a lack of standardization in genomic projects and their reports, and reinforce a series of recommendations to enhance future collaborative research.
Affiliation(s)
- Mylena D Santander
- Correspondence address: Departamento de Genética e Biologia Evolutiva, Instituto de Biociências, Universidade São Paulo, 277 Rua do Matão, Cidade Universitária, São Paulo 05508-090, Brazil
- Maximiliano M Maronna
- Correspondence address: Departamento de Zoologia, Instituto de Biociências, Universidade de São Paulo, 101 Rua do Matão Cidade Universitária, São Paulo 05508-090, Brazil
- Joseph F Ryan
- Whitney Laboratory for Marine Bioscience, University of Florida, 9505 Ocean Shore Blvd, St. Augustine, FL 32080, USA; Department of Biology, University of Florida, 220 Bartram Hall, Gainesville, FL 32611, USA
- Sónia C S Andrade
- Departamento de Genética e Biologia Evolutiva, Instituto de Biociências, Universidade São Paulo, 277 Rua do Matão, Cidade Universitária, São Paulo 05508-090, Brazil
|
19
|
Merrikh H, Merrikh C. Reply to: Testing the adaptive hypothesis of lagging-strand encoding in bacterial genomes. Nat Commun 2022; 13:2627. [PMID: 35551437 PMCID: PMC9098457 DOI: 10.1038/s41467-022-30014-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/06/2020] [Accepted: 03/08/2022] [Indexed: 11/09/2022] Open
Affiliation(s)
- Houra Merrikh
- Department of Biochemistry, Vanderbilt University, Nashville, TN, USA.
|
20
|
A Thermodynamic Model for Water Activity and Redox Potential in Evolution and Development. J Mol Evol 2022; 90:182-199. [DOI: 10.1007/s00239-022-10051-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/09/2021] [Accepted: 02/22/2022] [Indexed: 10/18/2022]
|
21
|
Tihelka E, Cai C, Giacomelli M, Lozano-Fernandez J, Rota-Stabelli O, Huang D, Engel MS, Donoghue PCJ, Pisani D. The evolution of insect biodiversity. Curr Biol 2021; 31:R1299-R1311. [PMID: 34637741 DOI: 10.1016/j.cub.2021.08.057] [Citation(s) in RCA: 38] [Impact Index Per Article: 9.5] [Indexed: 11/24/2022]
Abstract
Insects comprise over half of all described animal species. Together with the Protura (coneheads), Collembola (springtails) and Diplura (two-pronged bristletails), insects form the Hexapoda, a terrestrial arthropod lineage characterised by possessing six legs. Exponential growth of genome-scale data for the hexapods has substantially altered our understanding of the origin and evolution of insect biodiversity. Phylogenomics has provided a new framework for reconstructing insect evolutionary history, resolving their position among the arthropods and some long-standing internal controversies such as the placement of the termites, twisted-winged insects, lice and fleas. However, despite the greatly increased size of phylogenomic datasets, contentious relationships among key insect clades remain unresolved. Further advances in insect phylogeny cannot rely on increased depth and breadth of genome and taxon sequencing. Improved modelling of the substitution process is fundamental to countering tree-reconstruction artefacts, while gene content, modelling of duplications and deletions, and comparative morphology all provide complementary lines of evidence to test hypotheses emerging from the analysis of sequence data. Finally, the integration of molecular and morphological data is key to the incorporation of fossil species within insect phylogeny. The emerging integrated framework of insect evolution will help explain the origins of insect megadiversity in terms of the evolution of their body plan, species diversity and ecology. Future studies of insect phylogeny should build upon an experimental, hypothesis-driven approach where the robustness of hypotheses generated is tested against increasingly realistic evolutionary models as well as complementary sources of phylogenetic evidence.
Affiliation(s)
- Erik Tihelka
- School of Earth Sciences, University of Bristol, Bristol, UK; State Key Laboratory of Palaeobiology and Stratigraphy, Nanjing Institute of Geology and Palaeontology, and Centre for Excellence in Life and Paleoenvironment, Chinese Academy of Sciences, Nanjing, China.
- Chenyang Cai
- School of Earth Sciences, University of Bristol, Bristol, UK; State Key Laboratory of Palaeobiology and Stratigraphy, Nanjing Institute of Geology and Palaeontology, and Centre for Excellence in Life and Paleoenvironment, Chinese Academy of Sciences, Nanjing, China.
- Jesus Lozano-Fernandez
- School of Biological Sciences, University of Bristol, Bristol, UK; Institute of Evolutionary Biology (CSIC-UPF), Barcelona, Spain
- Omar Rota-Stabelli
- Research and Innovation Centre, Fondazione Edmund Mach, 38010 San Michele all Adige, Italy; Center Agriculture Food Environment, University of Trento, 38010 San Michele all Adige, Italy
- Diying Huang
- State Key Laboratory of Palaeobiology and Stratigraphy, Nanjing Institute of Geology and Palaeontology, and Centre for Excellence in Life and Paleoenvironment, Chinese Academy of Sciences, Nanjing, China
- Michael S Engel
- Division of Entomology, Natural History Museum, University of Kansas, Lawrence, KS, USA; Department of Ecology and Evolutionary Biology, University of Kansas, Lawrence, KS, USA
- Davide Pisani
- School of Earth Sciences, University of Bristol, Bristol, UK; School of Biological Sciences, University of Bristol, Bristol, UK.
|