1
Zhang C, Nielsen R, Mirarab S. CASTER: Direct species tree inference from whole-genome alignments. Science 2025; 387:eadk9688. [PMID: 39847611 PMCID: PMC12038793 DOI: 10.1126/science.adk9688] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/25/2023] [Revised: 08/05/2024] [Accepted: 12/04/2024] [Indexed: 01/25/2025]
Abstract
Genomes contain mosaics of discordant evolutionary histories, challenging the accurate inference of the tree of life. Although genome-wide data are routinely used for discordance-aware phylogenomic analyses, because of modeling and scalability limitations, the current practice leaves out large chunks of genomes. As more high-quality genomes become available, we urgently need discordance-aware methods to infer the tree directly from a multiple genome alignment. In this study, we introduce Coalescence-Aware Alignment-Based Species Tree Estimator (CASTER), a theoretically justified site-based method that eliminates the need to predefine recombination-free loci. CASTER is scalable to hundreds of mammalian whole genomes. We demonstrate the accuracy and scalability of CASTER in simulations that include recombination and apply CASTER to several biological datasets, showing that its per-site scores can reveal both biological and artifactual patterns of discordance across the genome.
Affiliation(s)
- Chao Zhang
- Bioinformatics and Systems Biology, University of California San Diego, 9500 Gilman Drive, La Jolla, 92093, CA, USA
- Integrative Biology Department, University of California Berkeley, 110 Sproul Hall, Berkeley, 94704, CA, USA
- Globe Institute, University of Copenhagen, Øster Voldgade 5-7, Copenhagen, 1350, Denmark
- Rasmus Nielsen
- Integrative Biology Department, University of California Berkeley, 110 Sproul Hall, Berkeley, 94704, CA, USA
- Globe Institute, University of Copenhagen, Øster Voldgade 5-7, Copenhagen, 1350, Denmark
- Siavash Mirarab
- Electrical and Computer Engineering, University of California San Diego, 9500 Gilman Drive, La Jolla, 92093, CA, USA
2
Kapli P, Kotari I, Telford MJ, Goldman N, Yang Z. DNA Sequences Are as Useful as Protein Sequences for Inferring Deep Phylogenies. Syst Biol 2023; 72:1119-1135. [PMID: 37366056 PMCID: PMC10627555 DOI: 10.1093/sysbio/syad036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2022] [Indexed: 06/28/2023]
Abstract
Inference of deep phylogenies has almost exclusively used protein rather than DNA sequences based on the perception that protein sequences are less prone to homoplasy and saturation or to issues of compositional heterogeneity than DNA sequences. Here, we analyze a model of codon evolution under an idealized genetic code and demonstrate that those perceptions may be misconceptions. We conduct a simulation study to assess the utility of protein versus DNA sequences for inferring deep phylogenies, with protein-coding data generated under models of heterogeneous substitution processes across sites in the sequence and among lineages on the tree, and then analyzed using nucleotide, amino acid, and codon models. Analysis of DNA sequences under nucleotide-substitution models (possibly with the third codon positions excluded) recovered the correct tree at least as often as analysis of the corresponding protein sequences under modern amino acid models. We also applied the different data-analysis strategies to an empirical dataset to infer the metazoan phylogeny. Our results from both simulated and real data suggest that DNA sequences may be as useful as proteins for inferring deep phylogenies and should not be excluded from such analyses. Analysis of DNA data under nucleotide models has a major computational advantage over protein-data analysis, potentially making it feasible to use advanced models that account for among-site and among-lineage heterogeneity in the nucleotide-substitution process in inference of deep phylogenies.
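The abstract mentions analysing DNA sequences under nucleotide models with the third codon positions excluded. A minimal sketch of that filtering step follows, assuming an in-frame codon alignment stored as a dict of equal-length strings; the function name `drop_third_codon_positions` is hypothetical and not taken from the paper.

```python
from typing import Dict


def drop_third_codon_positions(alignment: Dict[str, str]) -> Dict[str, str]:
    """Keep codon positions 1 and 2 of every sequence in a codon alignment.

    Assumption (hypothetical setup): every sequence is aligned, starts at the
    first codon position, and has a length that is a multiple of 3.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("expected a non-empty alignment of equal-length sequences")
    (length,) = lengths
    if length % 3 != 0:
        raise ValueError("alignment length must be a multiple of 3")
    # With 0-based column indices, i % 3 == 2 marks a third codon position.
    return {
        name: "".join(base for i, base in enumerate(seq) if i % 3 != 2)
        for name, seq in alignment.items()
    }


if __name__ == "__main__":
    codons = {"taxonA": "ATGGCC", "taxonB": "ATAGCT"}
    print(drop_third_codon_positions(codons))
```

Run as a script, the example prints `{'taxonA': 'ATGC', 'taxonB': 'ATGC'}`: the two taxa differ only at third codon positions, so the filtered columns are identical.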
Affiliation(s)
- Paschalia Kapli
- Department of Genetics, University College London, Gower Street, London WC1E 6BT, UK
- Ioanna Kotari
- Department of Genetics, University College London, Gower Street, London WC1E 6BT, UK
- Institut für Populationsgenetik, Vetmeduni Vienna, Vienna, 1210, Austria
- Maximilian J Telford
- Department of Genetics, University College London, Gower Street, London WC1E 6BT, UK
- Nick Goldman
- European Molecular Biology Laboratory - European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Ziheng Yang
- Department of Genetics, University College London, Gower Street, London WC1E 6BT, UK
3
Casanellas M, Fernández-Sánchez J, Garrote-López M, Sabaté-Vidales M. Designing Weights for Quartet-Based Methods When Data are Heterogeneous Across Lineages. Bull Math Biol 2023; 85:68. [PMID: 37310552 DOI: 10.1007/s11538-023-01167-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/02/2022] [Accepted: 05/15/2023] [Indexed: 06/14/2023]
Abstract
Homogeneity across lineages is a general assumption in phylogenetics according to which nucleotide substitution rates are common to all lineages. Many phylogenetic methods relax this hypothesis but keep a simple enough model to make the process of sequence evolution more tractable. On the other hand, dealing successfully with the general case (heterogeneity of rates across lineages) is one of the key features of phylogenetic reconstruction methods based on algebraic tools. The goal of this paper is twofold. First, we present a new weighting system for quartets (ASAQ) based on algebraic and semi-algebraic tools, and therefore especially suited to data evolving under heterogeneous rates. This method combines the weights of two previous methods by means of a test based on the positivity of the branch lengths estimated with the paralinear distance. ASAQ is statistically consistent when applied to data generated under the general Markov model, considers rate and base composition heterogeneity among lineages, and assumes neither stationarity nor time-reversibility. Second, we test and compare the performance of several quartet-based methods for phylogenetic tree reconstruction (namely QFM, wQFM, quartet puzzling, weight optimization and Willson's method) in combination with several systems of weights, including ASAQ weights and other weights based on algebraic and semi-algebraic methods or on the paralinear distance. These tests are applied to both simulated and real data and support weight optimization with ASAQ weights as a reliable and successful reconstruction method that improves upon the accuracy of global methods (such as neighbor-joining or maximum likelihood) in the presence of long branches or on mixtures of distributions on trees.
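The abstract refers to branch lengths estimated with the paralinear distance. As a minimal sketch, assuming Lake's standard definition d = -1/4 [ln det F - 1/2 ln(det Π_x det Π_y)], where F is the 4 × 4 divergence matrix of aligned nucleotide pairs and Π_x, Π_y are diagonal matrices of the base compositions, the code below computes that distance and a four-point estimate of a quartet's internal branch length. All helper names are hypothetical, and the way ASAQ actually combines a positivity check with its two underlying weighting schemes is not reproduced here.

```python
import math
from typing import Callable, List, Sequence, Tuple

BASES = "ACGT"


def determinant(matrix: List[List[float]]) -> float:
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def divergence_matrix(x: str, y: str) -> List[List[float]]:
    """F[i][j]: frequency of base i in x aligned with base j in y (gaps skipped)."""
    if len(x) != len(y):
        raise ValueError("sequences must be aligned")
    index = {b: i for i, b in enumerate(BASES)}
    pairs = [(a, b) for a, b in zip(x, y) if a in index and b in index]
    if not pairs:
        raise ValueError("no comparable sites")
    f = [[0.0] * 4 for _ in range(4)]
    for a, b in pairs:
        f[index[a]][index[b]] += 1.0 / len(pairs)
    return f


def paralinear_distance(x: str, y: str) -> float:
    """d = -1/4 * [ln det F - 1/2 * ln(det Pi_x * det Pi_y)] (assumed Lake form)."""
    f = divergence_matrix(x, y)
    det_f = determinant(f)
    det_pi_x = math.prod(sum(row) for row in f)  # row sums: composition of x
    det_pi_y = math.prod(sum(f[i][j] for i in range(4)) for j in range(4))  # columns: y
    if det_f <= 0.0 or det_pi_x <= 0.0 or det_pi_y <= 0.0:
        raise ValueError("paralinear distance undefined for these sequences")
    return -0.25 * (math.log(det_f) - 0.5 * math.log(det_pi_x * det_pi_y))


def internal_branch_length(
    dist: Callable[[str, str], float],
    split: Tuple[Sequence[str], Sequence[str]],
) -> float:
    """Four-point estimate of the internal edge of quartet ab|cd."""
    (a, b), (c, d) = split
    cross = dist(a, c) + dist(b, d) + dist(a, d) + dist(b, c)
    return cross / 4.0 - (dist(a, b) + dist(c, d)) / 2.0
```

On exactly additive distances, the true split returns its internal edge length, while each of the two alternative splits returns minus half of it. A positive estimate is therefore one natural form of the branch-length positivity signal the abstract describes; identical sequences give a paralinear distance of zero whatever their base composition.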
Affiliation(s)
- Marta Casanellas
- Institut de Matematiques de la UPC-BarcelonaTech (IMTech), Universitat Politècnica de Catalunya and Centre de Recerca Matemàtica, Av. Diagonal 647, 08028, Barcelona, Spain
- Jesús Fernández-Sánchez
- Institut de Matematiques de la UPC-BarcelonaTech (IMTech), Universitat Politècnica de Catalunya and Centre de Recerca Matemàtica, Av. Diagonal 647, 08028, Barcelona, Spain
4
Belton S, Lamari N, Jermiin LS, Mariscal V, Flores E, McCabe PF, Ng CKY. Genetic and lipidomic analyses suggest that Nostoc punctiforme, a plant-symbiotic cyanobacterium, does not produce sphingolipids. Access Microbiol 2022; 4:000306. [PMID: 35252750 PMCID: PMC8895605 DOI: 10.1099/acmi.0.000306] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2021] [Accepted: 11/23/2021] [Indexed: 11/21/2022]
Abstract
Sphingolipids, a class of amino-alcohol-based lipids, are well characterized in eukaryotes and in some anaerobic bacteria. However, the only sphingolipids so far identified in cyanobacteria are two ceramides (i.e., an acetylsphingomyelin and a cerebroside), both based on unbranched, long-chain base (LCB) sphingolipids in Scytonema julianum and Moorea producens, respectively. The first step in de novo sphingolipid biosynthesis is the condensation of L-serine with palmitoyl-CoA to produce 3-ketodihydrosphingosine (KDS). This reaction is catalyzed by serine palmitoyltransferase (SPT), which belongs to a small family of pyridoxal phosphate-dependent α-oxoamine synthase (AOS) enzymes. Based on sequence similarity to molecularly characterized bacterial SPT peptides, we identified a putative SPT (Npun_R3567) from the model nitrogen-fixing, plant-symbiotic cyanobacterium, Nostoc punctiforme strain PCC 73102 (ATCC 29133). Gene expression analysis revealed that Npun_R3567 is induced during late-stage diazotrophic growth in N. punctiforme. However, Npun_R3567 could not produce the SPT reaction product, KDS, when heterologously expressed in Escherichia coli. This agreed with a sphingolipidomic analysis of N. punctiforme cells, which revealed that no LCBs or ceramides were present. To gain a better understanding of Npun_R3567, we inferred the phylogenetic position of Npun_R3567 relative to other bacterial AOS peptides. Rather than clustering with other bacterial SPTs, Npun_R3567 and the other cyanobacterial BioF homologues formed a separate, monophyletic group. Given that N. punctiforme does not appear to possess any other gene encoding an AOS enzyme, it is altogether unlikely that N. punctiforme is capable of synthesizing sphingolipids. In the context of cross-kingdom symbiosis signalling in which sphingolipids are emerging as important regulators, it appears unlikely that sphingolipids from N. punctiforme play a regulatory role during its symbiotic association with plants.
Affiliation(s)
- Samuel Belton
- UCD School of Biology and Environmental Science, University College Dublin, Belfield, Dublin D4, Ireland
- Present address: DBN Plant Molecular Biology Lab, National Botanic Gardens of Ireland, Dublin, Ireland
- Nadia Lamari
- Present address: Philip Morris International, Quai Jeanrenaud 3, 2000, Neuchâtel, Switzerland
- UCD Earth Institute, O’Brien Centre for Science, University College Dublin, Belfield, Dublin D4, Ireland
- UCD School of Biology and Environmental Science, University College Dublin, Belfield, Dublin D4, Ireland
- Lars S. Jermiin
- Research School of Biology, Australian National University, Canberra, ACT 2600, Australia
- UCD Earth Institute, O’Brien Centre for Science, University College Dublin, Belfield, Dublin D4, Ireland
- UCD School of Biology and Environmental Science, University College Dublin, Belfield, Dublin D4, Ireland
- Vicente Mariscal
- Instituto de Bioquímica Vegetal y Fotosíntesis, Consejo Superior de Investigaciones Científicas and Universidad de Sevilla, cicCartuja, Avda. Américo Vespucio 49, 41092 Seville, Spain
- Enrique Flores
- Instituto de Bioquímica Vegetal y Fotosíntesis, Consejo Superior de Investigaciones Científicas and Universidad de Sevilla, cicCartuja, Avda. Américo Vespucio 49, 41092 Seville, Spain
- Paul F. McCabe
- UCD Earth Institute, O’Brien Centre for Science, University College Dublin, Belfield, Dublin D4, Ireland
- UCD Centre for Plant Science, University College Dublin, Belfield, Dublin D4, Ireland
- UCD School of Biology and Environmental Science, University College Dublin, Belfield, Dublin D4, Ireland
- Carl K. Y. Ng
- UCD Earth Institute, O’Brien Centre for Science, University College Dublin, Belfield, Dublin D4, Ireland
- UCD School of Biology and Environmental Science, University College Dublin, Belfield, Dublin D4, Ireland
- UCD Centre for Plant Science, University College Dublin, Belfield, Dublin D4, Ireland