1
Barthe M, Rancilhac L, Arteaga MC, Feijó A, Tilak MK, Justy F, Loughry WJ, McDonough CM, de Thoisy B, Catzeflis F, Billet G, Hautier L, Benoit N, Delsuc F. Exon Capture Museomics Deciphers the Nine-Banded Armadillo Species Complex and Identifies a New Species Endemic to the Guiana Shield. Syst Biol 2025; 74:177-197. [PMID: 38907999 PMCID: PMC11958936 DOI: 10.1093/sysbio/syae027] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/10/2023] [Revised: 05/24/2024] [Accepted: 06/19/2024] [Indexed: 06/24/2024]
Abstract
The nine-banded armadillo (Dasypus novemcinctus) is the most widespread xenarthran species across the Americas. Recent studies have suggested it is composed of 4 morphologically and genetically distinct lineages of uncertain taxonomic status. To address this issue, we used a museomic approach to sequence 80 complete mitogenomes and capture 997 nuclear loci for 71 Dasypus individuals sampled across the entire distribution. We carefully cleaned up potential genotyping errors and cross-contaminations that could blur species boundaries by mimicking gene flow. Our results unambiguously support 4 distinct lineages within the D. novemcinctus complex. We found cases of mito-nuclear phylogenetic discordance but only limited contemporary gene flow confined to the margins of the lineage distributions. All available evidence including the restricted gene flow, phylogenetic reconstructions based on both mitogenomes and nuclear loci, and phylogenetic delimitation methods consistently supported the 4 lineages within D. novemcinctus as 4 distinct species. Comparable genetic differentiation values to other recognized Dasypus species further reinforced their status as valid species. Considering congruent morphological results from previous studies, we provide an integrative taxonomic view to recognize 4 species within the D. novemcinctus complex: D. novemcinctus, D. fenestratus, D. mexicanus, and D. guianensis sp. nov., a new species endemic to the Guiana Shield that we describe here. The 2 available individuals of D. mazzai and D. sabanicola were consistently nested within the D. novemcinctus lineage and their status remains to be assessed. The present work offers a case study illustrating the power of museomics to reveal cryptic species diversity within a widely distributed and emblematic species of mammals.
Affiliation(s)
- Mathilde Barthe
  - Institut des Sciences de l’Evolution de Montpellier (ISEM), Univ. Montpellier, CNRS, IRD, Place E. Bataillon, 34095 Montpellier Cedex 05, France
- Loïs Rancilhac
  - Institut des Sciences de l’Evolution de Montpellier (ISEM), Univ. Montpellier, CNRS, IRD, Place E. Bataillon, 34095 Montpellier Cedex 05, France
  - Animal Ecology, Department of Ecology and Genetics, Uppsala University, P.O. Box 256, SE-751 05 Uppsala, Sweden
  - Department of biology, University of Cyprus, P.O. Box 20537, CY-1678 Nicosia, Cyprus
- Maria C Arteaga
  - Department of Conservation Biology, CICESE, Carretera Ensenada, Tijuana No. 3918, Zona Playitas, CP. 22860, Ensenada, Baja California, México
- Anderson Feijó
  - Negaunee Integrative Research Center, Field Museum of Natural History, 1400 S Lake Shore Dr, Chicago, IL 60605, United States
  - Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, 1 Beichen West Road, Chaoyang District, Beijing 100101, China
- Marie-Ka Tilak
  - Institut des Sciences de l’Evolution de Montpellier (ISEM), Univ. Montpellier, CNRS, IRD, Place E. Bataillon, 34095 Montpellier Cedex 05, France
- Fabienne Justy
  - Institut des Sciences de l’Evolution de Montpellier (ISEM), Univ. Montpellier, CNRS, IRD, Place E. Bataillon, 34095 Montpellier Cedex 05, France
- William J Loughry
  - Department of Biology, Valdosta State University, 1500 North Patterson Street, Valdosta, GA 31698, United States
- Colleen M McDonough
  - Department of Biology, Valdosta State University, 1500 North Patterson Street, Valdosta, GA 31698, United States
- Benoit de Thoisy
  - Institut Pasteur de la Guyane, 23 Avenue Pasteur, BP 6010, Cayenne Cedex 97306, French Guiana
  - Kwata NGO, 16 Avenue Pasteur, 97300 Cayenne, French Guiana
- François Catzeflis
  - Institut des Sciences de l’Evolution de Montpellier (ISEM), Univ. Montpellier, CNRS, IRD, Place E. Bataillon, 34095 Montpellier Cedex 05, France
- Guillaume Billet
  - Centre de Recherche en Paléontologie – Paris (CR2P), CNRS/MNHN/Sorbonne Université, Muséum national d’Histoire naturelle, 43 Rue Buffon, 75005 Paris, France
- Lionel Hautier
  - Institut des Sciences de l’Evolution de Montpellier (ISEM), Univ. Montpellier, CNRS, IRD, Place E. Bataillon, 34095 Montpellier Cedex 05, France
  - Mammal Section, Life Sciences, Vertebrate Division, The Natural History Museum, Cromwell Road London, SW7 5BD, London, United Kingdom
- Nabholz Benoit
  - Institut des Sciences de l’Evolution de Montpellier (ISEM), Univ. Montpellier, CNRS, IRD, Place E. Bataillon, 34095 Montpellier Cedex 05, France
  - Institut universitaire de France, 1 Rue Descartes, 75231 Paris Cedex 05, France
- Frédéric Delsuc
  - Institut des Sciences de l’Evolution de Montpellier (ISEM), Univ. Montpellier, CNRS, IRD, Place E. Bataillon, 34095 Montpellier Cedex 05, France
2
Latrille T, Joseph J, Hartasánchez DA, Salamin N. Estimating the proportion of beneficial mutations that are not adaptive in mammals. PLoS Genet 2024; 20:e1011536. [PMID: 39724093 PMCID: PMC11709321 DOI: 10.1371/journal.pgen.1011536] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/29/2024] [Revised: 01/08/2025] [Accepted: 12/10/2024] [Indexed: 12/28/2024]
Abstract
Mutations can be beneficial by bringing innovation to their bearer, allowing them to adapt to environmental change. These mutations are typically unpredictable since they respond to an unforeseen change in the environment. However, mutations can also be beneficial because they are simply restoring a state of higher fitness that was lost due to genetic drift in a stable environment. In contrast to adaptive mutations, these beneficial non-adaptive mutations can be predicted if the underlying fitness landscape is stable and known. The contribution of such non-adaptive mutations to molecular evolution has been widely neglected mainly because their detection is very challenging. We have here reconstructed protein-coding gene fitness landscapes shared between mammals, using mutation-selection models and multi-species alignments across 87 mammals. These fitness landscapes have allowed us to predict the fitness effect of polymorphisms found in 28 mammalian populations. Using methods that quantify selection at the population level, we have confirmed that beneficial non-adaptive mutations are indeed positively selected in extant populations. Our work confirms that deleterious substitutions are accumulating in mammals and are being reverted, generating a balance in which genomes are damaged and restored simultaneously at different loci. We observe that beneficial non-adaptive mutations represent between 15% and 45% of all beneficial mutations in 24 of 28 populations analyzed, suggesting that a substantial part of ongoing positive selection is not driven solely by adaptation to environmental change in mammals.
Affiliation(s)
- Thibault Latrille
  - Department of Computational Biology, Université de Lausanne, Lausanne, Switzerland
- Julien Joseph
  - Laboratoire de Biométrie et Biologie Evolutive, UMR5558, Université Lyon 1, Villeurbanne, France
- Nicolas Salamin
  - Department of Computational Biology, Université de Lausanne, Lausanne, Switzerland
3
Zheng W, Gojobori J, Suh A, Satta Y. Different Host-Endogenous Retrovirus Relationships between Mammals and Birds Reflected in Genome-Wide Evolutionary Interaction Patterns. Genome Biol Evol 2024; 16:evae065. [PMID: 38527852 PMCID: PMC11005779 DOI: 10.1093/gbe/evae065] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2023] [Revised: 02/25/2024] [Accepted: 03/21/2024] [Indexed: 03/27/2024]
Abstract
Mammals and birds differ largely in their average endogenous retrovirus loads, namely the proportion of endogenous retrovirus in the genome. The host-endogenous retrovirus relationships, including conflict and co-option, have been hypothesized among the causes of this difference. However, there have been no studies of the genomic evolutionary signal of constant host-endogenous retrovirus interactions on a long-term scale and how such interactions could lead to the endogenous retrovirus load difference. Through a phylogeny-controlled correlation analysis on ∼5,000 genes between the dN/dS ratio of each gene and the load of endogenous retrovirus in 12 mammals and 21 birds, separately, we detected genes that may have evolved in association with endogenous retrovirus loads. Birds have a higher proportion of genes with strong correlation between dN/dS and the endogenous retrovirus load than mammals. Strong evidence of association is found between the dN/dS of the coding gene for leucine-rich repeat-containing protein 23 and endogenous retrovirus load in birds. Gene set enrichment analysis shows that gene silencing rather than immunity and DNA recombination may have a larger contribution to the association between dN/dS and the endogenous retrovirus load for both mammals and birds. Together, these results show different evolutionary patterns between bird and mammal genes, which can partially explain the apparently lower endogenous retrovirus loads of birds, while gene silencing may be a universal mechanism that plays a remarkable role in the evolutionary interaction between the host and endogenous retrovirus. In summary, our study presents signals that the host genes might have driven or responded to endogenous retrovirus load changes in long-term evolution.
Affiliation(s)
- Wanjing Zheng
  - Department of Evolutionary Studies of Biosystems, School of Advanced Sciences, SOKENDAI (The Graduate University for Advanced Studies), Kanagawa 240-0193, Japan
  - School of Life Sciences, Fudan University, Shanghai 200438, China
- Jun Gojobori
  - Department of Evolutionary Studies of Biosystems, School of Advanced Sciences, SOKENDAI (The Graduate University for Advanced Studies), Kanagawa 240-0193, Japan
  - Research Center for Integrative Evolutionary Science, SOKENDAI (The Graduate University for Advanced Studies), Kanagawa 240-0193, Japan
- Alexander Suh
  - Department of Organismal Biology—Systematic Biology, Evolutionary Biology Centre (EBC), Uppsala University, Uppsala 75236, Sweden
  - School of Biological Sciences—Organisms and the Environment, University of East Anglia, Norwich, UK
- Yoko Satta
  - Department of Evolutionary Studies of Biosystems, School of Advanced Sciences, SOKENDAI (The Graduate University for Advanced Studies), Kanagawa 240-0193, Japan
  - Research Center for Integrative Evolutionary Science, SOKENDAI (The Graduate University for Advanced Studies), Kanagawa 240-0193, Japan
4
Allio R, Delsuc F, Belkhir K, Douzery EJP, Ranwez V, Scornavacca C. OrthoMaM v12: a database of curated single-copy ortholog alignments and trees to study mammalian evolutionary genomics. Nucleic Acids Res 2024; 52:D529-D535. [PMID: 37843103 PMCID: PMC10767847 DOI: 10.1093/nar/gkad834] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2023] [Revised: 09/19/2023] [Accepted: 09/26/2023] [Indexed: 10/17/2023]
Abstract
To date, the databases built to gather information on gene orthology do not provide end-users with descriptors of the molecular evolution information and phylogenetic pattern of these orthologues. In this context, we developed OrthoMaM, a database of ORTHOlogous MAmmalian Markers describing the evolutionary dynamics of coding sequences in mammalian genomes. OrthoMaM version 12 includes 15,868 alignments of orthologous coding sequences (CDS) from the 190 complete mammalian genomes currently available. All annotations and 1-to-1 orthology assignments are based on NCBI. Orthologous CDS can be mined for potential informative markers at the different taxonomic levels of the mammalian tree. To this end, several evolutionary descriptors of DNA sequences are provided for querying purposes (e.g. base composition and relative substitution rate). The graphical web interface allows the user to easily browse and sort the results of combined queries. The corresponding multiple sequence alignments and ML trees, inferred using state-of-the-art approaches, are available for download both at the nucleotide and amino acid levels. OrthoMaM v12 can be used by researchers interested either in reconstructing the phylogenetic relationships of mammalian taxa or in understanding the evolutionary dynamics of coding sequences in their genomes. OrthoMaM is available for browsing, querying and complete or filtered download at https://orthomam.mbb.cnrs.fr/.
Affiliation(s)
- Rémi Allio
  - CBGP, INRAE, CIRAD, IRD, Institut Agro, Univ. Montpellier, Montpellier, 34988, France
  - ISEM, Univ. Montpellier, CNRS, IRD, Montpellier, 34095, France
- Frédéric Delsuc
  - ISEM, Univ. Montpellier, CNRS, IRD, Montpellier, 34095, France
- Khalid Belkhir
  - ISEM, Univ. Montpellier, CNRS, IRD, Montpellier, 34095, France
- Vincent Ranwez
  - AGAP, Univ. Montpellier, CIRAD, INRAE, Institut Agro, Montpellier, 34398, France
5
Fang W, Li K, Ma S, Wei F, Hu Y. Natural selection and convergent evolution of the HOX gene family in Carnivora. Front Ecol Evol 2023. [DOI: 10.3389/fevo.2023.1107034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/18/2023]
Abstract
HOX genes play a central role in the development and regulation of limb patterns. For mammals in the order Carnivora, limbs have evolved in different forms, and there are interesting cases of phenotypic convergence, such as the pseudothumb of the giant and red pandas, and the flippers or specialized limbs of the pinnipeds and sea otter. However, the molecular bases of limb development remain largely unclear. Here, we studied the molecular evolution of the HOX9 ~ 13 genes of 14 representative species in Carnivora and explored the molecular evolution of other HOX genes. We found that only one limb development gene, HOXC10, underwent convergent evolution between giant and red pandas and was thus an important candidate gene related to the development of pseudothumbs. No signals of amino acid convergence and natural selection were found in HOX9 ~ 13 genes between pinnipeds and sea otter, but there was evidence of positive selection and rapid evolution in four pinniped species. Overall, few HOX genes evolve via natural selection or convergent evolution, and these could be important candidate genes for further functional validation. Our findings provide insights into potential molecular mechanisms of the development of specialized pseudothumbs and flippers (or specialized limbs).
6
Chakraborty A, Ay F, Davuluri RV. ExTraMapper: Exon- and Transcript-level mappings for orthologous gene pairs. Bioinformatics 2021; 37:3412-3420. [PMID: 34014317 PMCID: PMC8545320 DOI: 10.1093/bioinformatics/btab393] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 08/15/2020] [Revised: 04/27/2021] [Accepted: 05/19/2021] [Indexed: 12/13/2022]
Abstract
MOTIVATION: Access to large-scale genomics and transcriptomics data from various tissues and cell lines allowed the discovery of wide-spread alternative splicing events and alternative promoter usage in mammals. Between human and mouse, gene-level orthology is currently present for nearly 16k protein-coding genes spanning a diverse repertoire of over 200k total transcript isoforms. RESULTS: Here, we describe a novel method, ExTraMapper, which leverages sequence conservation between exons of a pair of organisms and identifies a fine-scale orthology mapping at the exon and then transcript level. ExTraMapper identifies more than 350k exon mappings, as well as 30k transcript mappings between human and mouse using only sequence and gene annotation information. We demonstrate that ExTraMapper identifies a larger number of exon and transcript mappings compared to previous methods. Further, it identifies exon fusions, splits, and losses due to splice site mutations, and finds mappings between microexons that were previously missed. By reanalysis of RNA-seq data from 13 matched human and mouse tissues, we show that ExTraMapper improves the correlation of transcript-specific expression levels suggesting a more accurate mapping of human and mouse transcripts. We also applied the method to detect conserved exon and transcript pairs between human and rhesus macaque genomes to highlight the point that ExTraMapper is applicable to any pair of organisms that have orthologous gene pairs. AVAILABILITY: The source code and the results are available at https://github.com/ay-lab/ExTraMapper and http://ay-lab-tools.lji.org/extramapper. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ferhat Ay
  - La Jolla Institute for Immunology, La Jolla, CA, 92037, USA
  - Department of Pediatrics, UC San Diego - School of Medicine, La Jolla, 92093, CA, USA
- Ramana V Davuluri
  - Department of Biomedical Informatics, Stony Brook University, Stony Brook, NY, 11794, USA
7
Abadi S, Avram O, Rosset S, Pupko T, Mayrose I. ModelTeller: Model Selection for Optimal Phylogenetic Reconstruction Using Machine Learning. Mol Biol Evol 2021; 37:3338-3352. [PMID: 32585030 DOI: 10.1093/molbev/msaa154] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Indexed: 02/03/2023]
Abstract
Statistical criteria have long been the standard for selecting the best model for phylogenetic reconstruction and downstream statistical inference. Although model selection is regarded as a fundamental step in phylogenetics, existing methods for this task consume computational resources for long processing time, they are not always feasible, and sometimes depend on preliminary assumptions which do not hold for sequence data. Moreover, although these methods are dedicated to revealing the processes that underlie the sequence data, they do not always produce the most accurate trees. Notably, phylogeny reconstruction consists of two related tasks, topology reconstruction and branch-length estimation. It was previously shown that in many cases the most complex model, GTR+I+G, leads to topologies that are as accurate as using existing model selection criteria, but overestimates branch lengths. Here, we present ModelTeller, a computational methodology for phylogenetic model selection, devised within the machine-learning framework, optimized to predict the most accurate nucleotide substitution model for branch-length estimation. We demonstrate that ModelTeller leads to more accurate branch-length inference than current model selection criteria on data sets simulated under realistic processes. ModelTeller relies on a readily implemented machine-learning model and thus the prediction according to features extracted from the sequence data results in a substantial decrease in running time compared with existing strategies. By harnessing the machine-learning framework, we distinguish between features that mostly contribute to branch-length optimization, concerning the extent of sequence divergence, and features that are related to estimates of the model parameters that are important for the selection made by current criteria.
Affiliation(s)
- Shiran Abadi
  - School of Plant Sciences and Food Security, Tel-Aviv University, Tel-Aviv, Israel
- Oren Avram
  - School of Molecular Cell Biology & Biotechnology, Tel-Aviv University, Tel-Aviv, Israel
- Saharon Rosset
  - Department of Statistics and Operations Research, School of Mathematical Sciences, Tel-Aviv University, Tel-Aviv, Israel
- Tal Pupko
  - School of Molecular Cell Biology & Biotechnology, Tel-Aviv University, Tel-Aviv, Israel
- Itay Mayrose
  - School of Plant Sciences and Food Security, Tel-Aviv University, Tel-Aviv, Israel
8
Seoighe C, Kiniry SJ, Peters A, Baranov PV, Yang H. Selection Shapes Synonymous Stop Codon Use in Mammals. J Mol Evol 2020; 88:549-561. [PMID: 32617614 DOI: 10.1007/s00239-020-09957-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 04/03/2020] [Accepted: 06/19/2020] [Indexed: 12/15/2022]
Abstract
Phylogenetic models of the evolution of protein-coding sequences can provide insights into the selection pressures that have shaped them. In the application of these models, synonymous nucleotide substitutions, which do not alter the encoded amino acid, are often assumed to have limited functional consequences and used as a proxy for the neutral rate of evolution. The ratio of nonsynonymous to synonymous substitution rates is then used to categorize the selective regime that applies to the protein (e.g., purifying selection, neutral evolution, diversifying selection). Here, we extend the Muse and Gaut model of codon evolution to explore the extent of purifying selection acting on substitutions between synonymous stop codons. Using a large collection of coding sequence alignments, we estimate that a high proportion (approximately 57%) of mammalian genes are affected by selection acting on stop codon preference. This proportion varies substantially by codon, with UGA stop codons far more likely to be conserved. Genes with evidence of selection acting on synonymous stop codons have distinctive characteristics, compared to unconserved genes with the same stop codon, including longer 3′ untranslated regions (UTRs) and shorter mRNA half-life. The coding regions of these genes are also much more likely to be under strong purifying selection pressure. Our results suggest that the preference for UGA stop codons found in many multicellular eukaryotes is selective rather than mutational in origin.
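The categorization by the ratio of nonsynonymous to synonymous substitution rates mentioned in this abstract can be illustrated with a minimal sketch. The function name `selective_regime` and the tolerance `tol` around a ratio of 1 are assumptions made for illustration only; they are not part of the authors' codon model, which estimates these rates by likelihood rather than applying a fixed cutoff.

```python
def selective_regime(dn: float, ds: float, tol: float = 0.05) -> str:
    """Label a selective regime from nonsynonymous (dn) and synonymous (ds) rates.

    tol is a hypothetical tolerance band around a ratio of 1, chosen
    only for this illustration.
    """
    if ds <= 0:
        raise ValueError("ds must be positive to form the ratio")
    omega = dn / ds  # ratio of nonsynonymous to synonymous rates
    if omega < 1 - tol:
        return "purifying selection"
    if omega > 1 + tol:
        return "diversifying selection"
    return "neutral evolution"
```

In practice a statistical test against the neutral expectation would replace the fixed tolerance used here.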
Affiliation(s)
- Cathal Seoighe
  - School of Mathematics, Statistics and Applied Mathematics, National University of Ireland Galway, Galway, Ireland
- Stephen J Kiniry
  - School of Biochemistry and Cell Biology, University College Cork, Cork, Ireland
- Andrew Peters
  - School of Mathematics, Statistics and Applied Mathematics, National University of Ireland Galway, Galway, Ireland
- Pavel V Baranov
  - School of Biochemistry and Cell Biology, University College Cork, Cork, Ireland
- Haixuan Yang
  - School of Mathematics, Statistics and Applied Mathematics, National University of Ireland Galway, Galway, Ireland
9
Freitas L, Mesquita RD, Schrago CG. Survey for positively selected coding regions in the genome of the hematophagous tsetse fly Glossina morsitans identifies candidate genes associated with feeding habits and embryonic development. Genet Mol Biol 2020; 43:e20180311. [PMID: 32555940 PMCID: PMC7288665 DOI: 10.1590/1678-4685-gmb-2018-0311] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 11/03/2018] [Accepted: 08/23/2019] [Indexed: 11/22/2022]
Abstract
Tsetse flies are responsible for the transmission of Trypanosoma sp. to vertebrate animals in Africa causing huge health issues and economic loss. The availability of the genome sequence of Glossina morsitans enabled the discovery of several genes related to medically important phenotypes and novel physiological features. However, a genome-wide scan for coding regions that underwent positive selection is still missing, which is surprising given the evolution of traits associated with the hematophagy in this lineage. In this study, we employed an experimental design that controlled for the rate of false positives and we performed a scan of 3,318 G. morsitans genes. We found 145 genes with significant historical signal of positive selection. These genes were categorized into 18 functional classes after careful manual annotation. Based on their attributed functions, we identified candidate genes related to feeding habits and embryonic development. When our results were contrasted with gene expression data, we confirmed that most genes that underwent adaptive molecular evolution were frequently expressed in organs associated with key physiological evolutionary innovations in the G. morsitans lineage, namely, the salivary gland, the midgut, fat body tissue, and in the spermatophore.
Affiliation(s)
- Lucas Freitas
  - Universidade Federal do Rio de Janeiro, Departamento de Genética, Rio de Janeiro, RJ, Brazil
  - Universidade Federal do Rio de Janeiro, Instituto de Química, Departamento de Bioquímica, Laboratório de Bioinformática, Rio de Janeiro, RJ, Brazil
  - Instituto Nacional de Ciência e Tecnologia em Entomologia Molecular, Rio de Janeiro, RJ, Brazil
- Rafael D Mesquita
  - Universidade Federal do Rio de Janeiro, Instituto de Química, Departamento de Bioquímica, Laboratório de Bioinformática, Rio de Janeiro, RJ, Brazil
  - Instituto Nacional de Ciência e Tecnologia em Entomologia Molecular, Rio de Janeiro, RJ, Brazil
- Carlos G Schrago
  - Universidade Federal do Rio de Janeiro, Departamento de Genética, Rio de Janeiro, RJ, Brazil
10
Borges R, Machado JP, Gomes C, Rocha AP, Antunes A. Measuring phylogenetic signal between categorical traits and phylogenies. Bioinformatics 2020; 35:1862-1869. [PMID: 30358816 DOI: 10.1093/bioinformatics/bty800] [Citation(s) in RCA: 45] [Impact Index Per Article: 9.0] [Received: 10/19/2017] [Revised: 08/18/2018] [Accepted: 10/24/2018] [Indexed: 12/21/2022]
Abstract
MOTIVATION: Determining whether a trait and phylogeny share some degree of phylogenetic signal is a flagship goal in evolutionary biology. Signatures of phylogenetic signal can assist the resolution of a broad range of evolutionary questions regarding the tempo and mode of phenotypic evolution. However, despite the considerable number of strategies to measure it, few and limited approaches exist for categorical traits. Here, we used the concept of Shannon entropy and propose the δ statistic for evaluating the degree of phylogenetic signal between a phylogeny and categorical traits. RESULTS: We validated δ as a measure of phylogenetic signal: the higher the δ-value the higher the degree of phylogenetic signal between a given tree and a trait. Based on simulated data we proposed a threshold-based classification test to pinpoint cases of phylogenetic signal. The assessment of the test's specificity and sensitivity suggested that the δ approach should only be applied to 20 or more species. We have further tested the performance of δ in scenarios of branch length and topology uncertainty, unbiased and biased trait evolution and trait saturation. Our results showed that δ may be applied in a wide range of phylogenetic contexts. Finally, we investigated our method in 14 360 mammalian gene trees and found that olfactory receptor genes are significantly associated with the mammalian activity patterns, a result that is congruent with expectations and experiments from the literature. Our application shows that δ can successfully detect molecular signatures of phenotypic evolution. We conclude that δ represents a useful measure of phylogenetic signal since many phenotypes can only be measured in categories. AVAILABILITY AND IMPLEMENTATION: https://github.com/mrborges23/delta_statistic. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
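Since the abstract builds on the concept of Shannon entropy, a minimal sketch of that quantity for a list of categorical trait states may help. This is plain Shannon entropy of observed state frequencies, not the δ statistic itself; the function name `shannon_entropy` and the example trait labels are assumptions for illustration, and the authors' implementation is the one at the repository linked in the abstract.

```python
import math
from collections import Counter

def shannon_entropy(states):
    """Shannon entropy (in nats) of the frequencies of categorical states."""
    if not states:
        raise ValueError("states must contain at least one observation")
    counts = Counter(states)
    n = len(states)
    # H = -sum(p * ln p) over the observed state frequencies p
    return -sum((c / n) * math.log(c / n) for c in counts.values())

# Hypothetical activity-pattern labels for four species
example = ["diurnal", "diurnal", "nocturnal", "nocturnal"]
print(shannon_entropy(example))
```

Entropy is zero when every observation shares one state and largest when the states are equally frequent.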
Affiliation(s)
- Rui Borges
  - CIIMAR/CIMAR, Interdisciplinary Centre of Marine and Environmental Research, Terminal de Cruzeiros do Porto de Leixões, Matosinhos, Portugal
  - Department of Biology, Faculty of Sciences of the University of Porto, FCUP, Porto, Portugal
  - CMUP, Centre of Mathematics of the University of Porto, Porto, Portugal
- João Paulo Machado
  - CIIMAR/CIMAR, Interdisciplinary Centre of Marine and Environmental Research, Terminal de Cruzeiros do Porto de Leixões, Matosinhos, Portugal
- Cidália Gomes
  - CIIMAR/CIMAR, Interdisciplinary Centre of Marine and Environmental Research, Terminal de Cruzeiros do Porto de Leixões, Matosinhos, Portugal
- Ana Paula Rocha
  - Department of Biology, Faculty of Sciences of the University of Porto, FCUP, Porto, Portugal
  - CMUP, Centre of Mathematics of the University of Porto, Porto, Portugal
- Agostinho Antunes
  - CIIMAR/CIMAR, Interdisciplinary Centre of Marine and Environmental Research, Terminal de Cruzeiros do Porto de Leixões, Matosinhos, Portugal
  - Department of Biology, Faculty of Sciences of the University of Porto, FCUP, Porto, Portugal
11
Mason VC, Helgen KM, Murphy WJ. Comparative Phylogeography of Forest-Dependent Mammals Reveals Paleo-Forest Corridors throughout Sundaland. J Hered 2020; 110:158-172. [PMID: 30247638 DOI: 10.1093/jhered/esy046] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 05/08/2018] [Accepted: 08/27/2018] [Indexed: 11/13/2022]
Abstract
The evolutionary history of the colugo, a gliding arboreal mammal distributed throughout Sundaland, was influenced by the location of and connections between forest habitats. By comparing colugo phylogenetic patterns, species ecology, sample distributions, and times of divergence to those of other Sundaic taxa with different life-history traits and dispersal capabilities, we inferred the probable distribution of paleo-forest corridors and their influence on observed biogeographic patterns. We identified a consistent pattern of early diversification between east and west Bornean lineages in colugos, lesser mouse deer, and Sunda pangolins, but not in greater mouse deer. This deep east-west split within Borneo has not been commonly described in mammals. Colugos on West Borneo diverged from colugos in Peninsular Malaysia and Sumatra in the late Pliocene; however, most other mammalian populations distributed across these same geographic regions diverged from a common ancestor more recently in the Pleistocene. Low genetic divergence between colugos on large landmasses and their neighboring satellite islands indicated that past forest distributions were recently much larger than present refugial distributions. Our analysis of colugo evolutionary history reconstructs Borneo as the most likely ancestral area of origin for Sunda colugos, and suggests that forests present during the middle Pliocene within the Sunda Shelf were more evergreen and contiguous, while forests were more fragmented, transient, seasonal, or with lower density canopies in the Pleistocene.
Affiliation(s)
- Victor C Mason
  - Department of Veterinary Integrative Biosciences, Interdisciplinary Program in Genetics, Texas A&M University, College Station, TX
  - Victor C. Mason is now at Department of Clinical Veterinary Medicine, Swiss Institute of Equine Medicine, Vetsuisse Faculty, University of Bern, Länggassstrasse, Bern, Switzerland
- Kristofer M Helgen
  - School of Biological Sciences, Environment Institute, and Centre for Applied Conservation Science, University of Adelaide, Adelaide, SA, Australia
- William J Murphy
  - Department of Veterinary Integrative Biosciences, Interdisciplinary Program in Genetics, Texas A&M University, College Station, TX
|
12
|
Rousselle M, Simion P, Tilak MK, Figuet E, Nabholz B, Galtier N. Is adaptation limited by mutation? A timescale-dependent effect of genetic diversity on the adaptive substitution rate in animals. PLoS Genet 2020; 16:e1008668. [PMID: 32251427 PMCID: PMC7162527 DOI: 10.1371/journal.pgen.1008668] [Citation(s) in RCA: 34] [Impact Index Per Article: 6.8] [Received: 12/18/2019] [Revised: 04/16/2020] [Accepted: 02/14/2020] [Indexed: 12/16/2022]
Abstract
Whether adaptation is limited by the beneficial mutation supply is a long-standing question of evolutionary genetics, which is more generally related to the determination of the adaptive substitution rate and its relationship with species effective population size (Ne) and genetic diversity. Empirical evidence reported so far is equivocal, with some but not all studies supporting a higher adaptive substitution rate in large-Ne than in small-Ne species. We gathered coding sequence polymorphism data and estimated the adaptive amino-acid substitution rate ωa in 50 species from ten distant groups of animals with markedly different population mutation rates θ. We reveal the existence of a complex, timescale-dependent relationship between species adaptive substitution rate and genetic diversity. We find a positive relationship between ωa and θ among closely related species, indicating that adaptation is indeed limited by the mutation supply, but this was only true in relatively low-θ taxa. In contrast, we uncover no significant correlation between ωa and θ at a larger taxonomic scale, suggesting that the proportion of beneficial mutations scales negatively with species' long-term Ne.
Affiliation(s)
- Paul Simion
  - ISEM, Univ. Montpellier, CNRS, EPHE, IRD, Montpellier, France
  - LEGE, Department of Biology, University of Namur, Namur, Belgium
- Marie-Ka Tilak
  - ISEM, Univ. Montpellier, CNRS, EPHE, IRD, Montpellier, France
- Emeric Figuet
  - ISEM, Univ. Montpellier, CNRS, EPHE, IRD, Montpellier, France
- Benoit Nabholz
  - ISEM, Univ. Montpellier, CNRS, EPHE, IRD, Montpellier, France
- Nicolas Galtier
  - ISEM, Univ. Montpellier, CNRS, EPHE, IRD, Montpellier, France
|
13
|
Karin BR, Gamble T, Jackman TR. Optimizing Phylogenomics with Rapidly Evolving Long Exons: Comparison with Anchored Hybrid Enrichment and Ultraconserved Elements. Mol Biol Evol 2020; 37:904-922. [PMID: 31710677 PMCID: PMC7038749 DOI: 10.1093/molbev/msz263] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Indexed: 12/17/2022]
Abstract
Marker selection has emerged as an important component of phylogenomic study design due to rising concerns about the effects of gene tree estimation error, model misspecification, and data-type differences. Researchers must balance various trade-offs associated with locus length and evolutionary rate, among other factors. The most commonly used reduced representation data sets for phylogenomics are ultraconserved elements (UCEs) and Anchored Hybrid Enrichment (AHE). Here, we introduce Rapidly Evolving Long Exon Capture (RELEC), a new set of loci that targets single exons that are both rapidly evolving (evolutionary rate faster than RAG1) and relatively long (>1,500 bp), while at the same time avoiding paralogy issues across amniotes. We compare the RELEC data set to UCEs and AHE in squamate reptiles by aligning and analyzing orthologous sequences from 17 squamate genomes, composed of 10 snakes and 7 lizards. The RELEC data set (179 loci) outperforms AHE and UCEs by maximizing per-locus genetic variation while maintaining presence and orthology across a range of evolutionary scales. RELEC markers show higher phylogenetic informativeness than UCE and AHE loci, and RELEC gene trees show greater similarity to the species tree than AHE or UCE gene trees. Furthermore, with fewer loci, RELEC remains computationally tractable for full Bayesian coalescent species tree analyses. We contrast RELEC with comparable methods, discuss important aspects of those methods, and demonstrate how RELEC may be the most effective set of loci for resolving difficult nodes and rapid radiations. We provide several resources for capturing or extracting RELEC loci from other amniote groups.
Affiliation(s)
- Benjamin R Karin
  - Department of Biology, Villanova University, Villanova, PA
  - Museum of Vertebrate Zoology and Department of Integrative Biology, University of California, Berkeley, CA
- Tony Gamble
  - Department of Biological Sciences, Marquette University, Milwaukee, WI
  - Milwaukee Public Museum, Milwaukee, WI
  - Bell Museum of Natural History, University of Minnesota, St. Paul, MN
- Todd R Jackman
  - Department of Biology, Villanova University, Villanova, PA
|
14
|
Islam M, Sarker K, Das T, Reaz R, Bayzid MS. STELAR: a statistically consistent coalescent-based species tree estimation method by maximizing triplet consistency. BMC Genomics 2020; 21:136. [PMID: 32039704 PMCID: PMC7011378 DOI: 10.1186/s12864-020-6519-y] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 09/22/2019] [Accepted: 01/20/2020] [Indexed: 12/14/2022]
Abstract
Background: Species tree estimation is frequently based on phylogenomic approaches that use multiple genes from throughout the genome. However, estimating a species tree from a collection of gene trees can be complicated due to the presence of gene tree incongruence resulting from incomplete lineage sorting (ILS), which is modelled by the multi-species coalescent process. Maximum likelihood and Bayesian MCMC methods can potentially result in accurate trees, but they do not scale well to large datasets.
Results: We present STELAR (Species Tree Estimation by maximizing tripLet AgReement), a new fast and highly accurate statistically consistent coalescent-based method for estimating species trees from a collection of gene trees. We formalized the constrained triplet consensus (CTC) problem and showed that the solution to the CTC problem is a statistically consistent estimate of the species tree under the multi-species coalescent (MSC) model. STELAR is an efficient dynamic programming based solution to the CTC problem which is highly accurate and scalable. We evaluated the accuracy of STELAR in comparison with SuperTriplets, which is an alternate fast and highly accurate triplet-based supertree method, and with MP-EST and ASTRAL, two of the most popular and accurate coalescent-based methods. Experimental results suggest that STELAR matches the accuracy of ASTRAL and improves on MP-EST and SuperTriplets.
Conclusions: Theoretical and empirical results (on both simulated and real biological datasets) suggest that STELAR is a valuable technique for species tree estimation from gene tree distributions.
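The quantity that a triplet-based method maximizes can be illustrated with a small sketch. The code below is not STELAR's dynamic programming algorithm; it only counts, by brute force, how many rooted triplets induced by a set of gene trees are also displayed by a candidate species tree. Trees are written as nested Python tuples, and all function names and the toy trees are assumptions made for this illustration.

```python
from itertools import combinations


def leaves(tree):
    """Return the set of leaf labels of a rooted tree given as nested tuples."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clusters(tree):
    """Return the leaf set below every internal node of the tree."""
    if isinstance(tree, str):
        return []
    found = [leaves(tree)]
    for child in tree:
        found.extend(clusters(child))
    return found


def triplets(tree):
    """Return resolved triplets as (frozenset({a, b}), c), meaning ab|c."""
    clades = clusters(tree)
    result = set()
    for trio in combinations(sorted(leaves(tree)), 3):
        for outgroup in trio:
            pair = frozenset(trio) - {outgroup}
            # ab|c holds when some clade contains a and b but excludes c.
            if any(pair <= clade and outgroup not in clade for clade in clades):
                result.add((pair, outgroup))
    return result


def triplet_agreement(species_tree, gene_trees):
    """Count gene-tree triplets that the candidate species tree also displays."""
    species_triplets = triplets(species_tree)
    return sum(len(triplets(g) & species_triplets) for g in gene_trees)


# Hypothetical toy data: two candidate species trees and three gene trees.
candidate_1 = (("A", "B"), ("C", "D"))
candidate_2 = (("A", "C"), ("B", "D"))
gene_trees = [candidate_1, ((("A", "B"), "C"), "D"), candidate_2]

print(triplet_agreement(candidate_1, gene_trees))  # 6
print(triplet_agreement(candidate_2, gene_trees))  # 5
```

Under this score the first candidate would be preferred. Enumerating all triplets scales cubically with the number of taxa, which is why a practical method needs a more efficient search than this sketch.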
Affiliation(s)
- Mazharul Islam
  - Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, 1205, Bangladesh
- Kowshika Sarker
  - Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, 1205, Bangladesh
- Trisha Das
  - Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, 1205, Bangladesh
- Rezwana Reaz
  - Department of Computer Science, The University of Texas at Austin, Texas, 78712, USA
- Md Shamsuzzoha Bayzid
  - Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, 1205, Bangladesh
|
15
|
Zou Z, Zhang J. Amino acid exchangeabilities vary across the tree of life. Sci Adv 2019; 5:eaax3124. [PMID: 31840062 PMCID: PMC6892623 DOI: 10.1126/sciadv.aax3124] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 03/12/2019] [Accepted: 09/24/2019] [Indexed: 05/05/2023]
Abstract
Different amino acid pairs have drastically different relative exchangeabilities (REs), and accounting for this variation is an important and common practice in inferring phylogenies, testing selection, and predicting mutational effects, among other analyses. In all such endeavors, REs have been generally considered invariant among species; this assumption, however, has not been scrutinized. Using maximum likelihood to analyze 180 genome sequences, we estimated REs from 90 clades representing all three domains of life, and found numerous instances of substantial between-clade differences in REs. REs show more differences between orthologous proteins of different clades than unrelated proteins of the same clade, suggesting that REs are genome-wide, clade-specific features, probably a result of proteome-wide evolutionary changes in the physicochemical environments of amino acid residues. The discovery of among-clade RE variations cautions against assuming constant REs in various analyses and demonstrates a higher-than-expected complexity in mechanisms of proteome evolution.
|
16
|
He C, Liang D, Zhang P. Asymmetric Distribution of Gene Trees Can Arise under Purifying Selection If Differences in Population Size Exist. Mol Biol Evol 2019; 37:881-892. [DOI: 10.1093/molbev/msz232] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Indexed: 12/16/2022]
Abstract
Incomplete lineage sorting (ILS) is an important factor that causes gene tree discordance. For gene trees of three species, under neutrality, random mating, and the absence of interspecific gene flow, ILS creates a symmetric distribution of gene trees: the gene tree that accords with the species tree has the highest frequency, and the two discordant trees are equally frequent. If the neutral condition is violated, the impact of ILS may change, altering the gene tree distribution. Here, we show that under purifying selection, even assuming that the fitness effect of mutations is constant throughout the species tree, if differences in population size exist among species, asymmetric distributions of gene trees will arise, which is different from the expectation under neutrality. In extreme cases, one of the discordant trees rather than the concordant tree becomes the most frequent gene tree. In addition, we found that in a real case, the position of Scandentia relative to Primates and Glires, the symmetry in the gene tree distribution can be influenced by the strength of purifying selection. In current phylogenetic inference, the impact of purifying selection on the gene tree distribution is rarely considered by researchers. This study highlights the necessity of considering this impact.
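The neutral expectation described in this abstract has a standard closed form for three species: with an internal branch of length T in coalescent units, each discordant gene tree has probability exp(-T)/3 and the concordant tree has the remainder. The sketch below computes these values and adds a simple exact binomial check of whether two observed discordant counts depart from the 1:1 expectation. It illustrates the neutral baseline only and is not the selection model analyzed by the authors; the function names are assumptions.

```python
import math


def neutral_triplet_probabilities(t):
    """Gene tree probabilities for three species under the neutral coalescent.

    t is the internal branch length in coalescent units.
    Returns (concordant, discordant_1, discordant_2).
    """
    if t < 0:
        raise ValueError("branch length must be non-negative")
    discordant = math.exp(-t) / 3.0
    concordant = 1.0 - 2.0 * discordant
    return concordant, discordant, discordant


def asymmetry_p_value(n1, n2):
    """Exact two-sided binomial test that two discordant counts are 1:1."""
    n = n1 + n2
    probs = [math.comb(n, k) * 0.5 ** n for k in range(n + 1)]
    observed = probs[n1]
    # Sum the probability of every outcome at least as unlikely as the data.
    total = sum(p for p in probs if p <= observed * (1 + 1e-9))
    return min(1.0, total)


print(neutral_triplet_probabilities(1.0))
print(asymmetry_p_value(10, 0))  # 0.001953125
```

A small p-value from the second function would indicate that the symmetry expected under neutrality, random mating, and no gene flow is violated, without saying which of those assumptions is responsible.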
Affiliation(s)
- Chong He
  - State Key Laboratory of Biocontrol, College of Ecology and Evolution, School of Life Sciences, Sun Yat-Sen University, Guangzhou, China
- Dan Liang
  - State Key Laboratory of Biocontrol, College of Ecology and Evolution, School of Life Sciences, Sun Yat-Sen University, Guangzhou, China
- Peng Zhang
  - State Key Laboratory of Biocontrol, College of Ecology and Evolution, School of Life Sciences, Sun Yat-Sen University, Guangzhou, China
|
17
|
Roycroft EJ, Moussalli A, Rowe KC. Phylogenomics Uncovers Confidence and Conflict in the Rapid Radiation of Australo-Papuan Rodents. Syst Biol 2019; 69:431-444. [DOI: 10.1093/sysbio/syz044] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Received: 08/18/2018] [Accepted: 06/12/2019] [Indexed: 11/13/2022]
Abstract
The estimation of robust and accurate measures of branch support has proven challenging in the era of phylogenomics. In data sets of potentially millions of sites, bootstrap support for bifurcating relationships around very short internal branches can be inappropriately inflated. Such overestimation of branch support may be particularly problematic in rapid radiations, where phylogenetic signal is low and incomplete lineage sorting severe. Here, we explore this issue by comparing various branch support estimates under both concatenated and coalescent frameworks, in the recent radiation of Australo-Papuan murine rodents (Muridae: Hydromyini). Using nucleotide sequence data from 1245 independent loci and several phylogenomic inference methods, we unequivocally resolve the majority of genus-level relationships within Hydromyini. However, at four nodes we recover inconsistency in branch support estimates both within and among concatenated and coalescent approaches. In most cases, concatenated likelihood approaches using standard fast bootstrap algorithms did not detect any uncertainty at these four nodes, regardless of partitioning strategy. However, we found this could be overcome with two-stage resampling, that is, across genes and sites within genes (using -bsam GENESITE in IQ-TREE). In addition, low confidence at recalcitrant nodes was recovered using UFBoot2, a recent revision to the bootstrap protocol in IQ-TREE, but this depended on partitioning strategy. Summary coalescent approaches also failed to detect uncertainty under some circumstances. For each of four recalcitrant nodes, an equivalent (or close to equivalent) number of genes were in strong support (>75% bootstrap) of both the primary and at least one alternative topological hypothesis, suggesting notable phylogenetic conflict among loci not detected using some standard branch support metrics.
Recent debate has focused on the appropriateness of concatenated versus multigenealogical approaches to resolving species relationships, but less so on accurately estimating uncertainty in large data sets. Our results demonstrate the importance of employing multiple approaches when assessing confidence and highlight the need for greater attention to the development of robust measures of uncertainty in the era of phylogenomics.
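The two-stage resampling mentioned in this abstract (across genes, then across sites within each sampled gene) can be sketched in a few lines. This is only an illustration of the resampling scheme under assumed data structures, not IQ-TREE's implementation; the function name and the dictionary layout (gene name to taxon to aligned sequence) are hypothetical.

```python
import random


def gene_site_bootstrap(alignments, rng):
    """Build one bootstrap replicate by resampling genes, then sites.

    alignments maps gene name -> {taxon: aligned sequence}.
    Returns a list of (gene name, resampled alignment) pairs.
    """
    genes = list(alignments)
    replicate = []
    for _ in range(len(genes)):
        # Stage 1: draw a gene with replacement.
        gene = rng.choice(genes)
        alignment = alignments[gene]
        n_sites = len(next(iter(alignment.values())))
        # Stage 2: draw alignment columns with replacement within that gene.
        columns = [rng.randrange(n_sites) for _ in range(n_sites)]
        resampled = {
            taxon: "".join(seq[i] for i in columns)
            for taxon, seq in alignment.items()
        }
        replicate.append((gene, resampled))
    return replicate


# Hypothetical toy alignments for two loci and two taxa.
toy = {
    "locus1": {"taxonA": "ACGTAC", "taxonB": "ACGTTC"},
    "locus2": {"taxonA": "GGCA", "taxonB": "GACA"},
}
for gene, aln in gene_site_bootstrap(toy, random.Random(42)):
    print(gene, aln)
```

Because whole genes can be dropped or duplicated in a replicate, conflict among loci contributes to the variance across replicates, which site-only resampling of a concatenated matrix does not capture.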
Affiliation(s)
- Emily J Roycroft
  - School of BioSciences, The University of Melbourne, Parkville, VIC 3010, Australia
  - Department of Science, Museums Victoria, GPO Box 666, Melbourne, VIC 3001, Australia
- Adnan Moussalli
  - School of BioSciences, The University of Melbourne, Parkville, VIC 3010, Australia
  - Department of Science, Museums Victoria, GPO Box 666, Melbourne, VIC 3001, Australia
- Kevin C Rowe
  - School of BioSciences, The University of Melbourne, Parkville, VIC 3010, Australia
  - Department of Science, Museums Victoria, GPO Box 666, Melbourne, VIC 3001, Australia
|
18
|
Rey C, Lanore V, Veber P, Guéguen L, Lartillot N, Sémon M, Boussau B. Detecting adaptive convergent amino acid evolution. Philos Trans R Soc Lond B Biol Sci 2019; 374:20180234. [PMID: 31154974 PMCID: PMC6560273 DOI: 10.1098/rstb.2018.0234] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Accepted: 02/25/2019] [Indexed: 11/12/2022]
Abstract
In evolutionary genomics, researchers have taken an interest in identifying substitutions that subtend convergent phenotypic adaptations. This is a difficult question that requires distinguishing foreground convergent substitutions that are involved in the convergent phenotype from background convergent substitutions. The latter may be linked to other adaptations, may be neutral or may be the consequence of mutational biases. Furthermore, there is no generally accepted definition of convergent substitutions. Various methods that use different definitions have been proposed in the literature, resulting in different sets of candidate foreground convergent substitutions. In this article, we first describe the processes that can generate foreground convergent substitutions in coding sequences, separating adaptive from non-adaptive processes. Second, we review methods that have been proposed to detect foreground convergent substitutions in coding sequences and expose the assumptions that underlie them. Finally, we examine their power on simulations of convergent changes (including in the presence of a change in the efficacy of selection) and on empirical alignments. This article is part of the theme issue 'Convergent evolution in the genomics era: new insights and directions'.
Affiliation(s)
- Carine Rey
  - ENS de Lyon, CNRS UMR 5239, INSERM U1210, LBMC, Univ Lyon, Université Claude Bernard Lyon 1, F-69007 Lyon, France
- Vincent Lanore
  - CNRS UMR 5558, LBBE, Univ Lyon, Université Claude Bernard Lyon 1, F-69100 Villeurbanne, France
- Philippe Veber
  - CNRS UMR 5558, LBBE, Univ Lyon, Université Claude Bernard Lyon 1, F-69100 Villeurbanne, France
- Laurent Guéguen
  - CNRS UMR 5558, LBBE, Univ Lyon, Université Claude Bernard Lyon 1, F-69100 Villeurbanne, France
- Nicolas Lartillot
  - CNRS UMR 5558, LBBE, Univ Lyon, Université Claude Bernard Lyon 1, F-69100 Villeurbanne, France
- Marie Sémon
  - ENS de Lyon, CNRS UMR 5239, INSERM U1210, LBMC, Univ Lyon, Université Claude Bernard Lyon 1, F-69007 Lyon, France
- Bastien Boussau
  - CNRS UMR 5558, LBBE, Univ Lyon, Université Claude Bernard Lyon 1, F-69100 Villeurbanne, France
|
19
|
Ashkenazy H, Levy Karin E, Mertens Z, Cartwright RA, Pupko T. SpartaABC: a web server to simulate sequences with indel parameters inferred using an approximate Bayesian computation algorithm. Nucleic Acids Res 2019; 45:W453-W457. [PMID: 28460062 PMCID: PMC5570005 DOI: 10.1093/nar/gkx322] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 02/06/2017] [Accepted: 04/15/2017] [Indexed: 11/22/2022]
Abstract
Many analyses for the detection of biological phenomena rely on a multiple sequence alignment as input. The results of such analyses are often further studied through parametric bootstrap procedures, using sequence simulators. One of the problems with conducting such simulation studies is that users currently have no means to decide which insertion and deletion (indel) parameters to choose, so that the resulting sequences mimic biological data. Here, we present SpartaABC, a web server that aims to solve this issue. SpartaABC implements an approximate-Bayesian-computation rejection algorithm to infer indel parameters from sequence data. It does so by extracting summary statistics from the input. It then performs numerous sequence simulations under randomly sampled indel parameters. By computing a distance between the summary statistics extracted from the input and each simulation, SpartaABC retains only parameters behind simulations close to the real data. As output, SpartaABC provides point estimates and approximate posterior distributions of the indel parameters. In addition, SpartaABC allows simulating sequences with the inferred indel parameters. To this end, the sequence simulators Dawg 2.0 and INDELible were integrated. Using SpartaABC, we demonstrate the differences in indel dynamics among three protein-coding genes across mammalian orthologs. SpartaABC is freely available for use at http://spartaabc.tau.ac.il/webserver.
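The rejection scheme outlined in this abstract (sample parameters, simulate, compare summary statistics, keep the closest simulations) can be sketched generically. The code below is a toy illustration and not SpartaABC's model: the simulator draws indel lengths from a geometric distribution, the single summary statistic is the mean length, and the prior range, the number of simulations, and all function names are assumptions.

```python
import random


def simulate_gap_lengths(p, n_gaps, rng):
    """Toy simulator: draw n_gaps indel lengths from a geometric distribution."""
    lengths = []
    for _ in range(n_gaps):
        length = 1
        while rng.random() > p:  # extend the indel with probability 1 - p
            length += 1
        lengths.append(length)
    return lengths


def summary(lengths):
    """Summary statistic used for the comparison: mean indel length."""
    return sum(lengths) / len(lengths)


def abc_rejection(observed, n_sims=2000, keep=0.05, seed=0):
    """Return parameter values whose simulations lie closest to the data."""
    rng = random.Random(seed)
    target = summary(observed)
    draws = []
    for _ in range(n_sims):
        p = rng.uniform(0.05, 0.95)  # sample from an assumed uniform prior
        simulated = simulate_gap_lengths(p, len(observed), rng)
        draws.append((abs(summary(simulated) - target), p))
    draws.sort()  # smallest distance first
    n_keep = max(1, int(keep * n_sims))
    return [p for _, p in draws[:n_keep]]


# Hypothetical "observed" data generated with p = 0.5.
observed = simulate_gap_lengths(0.5, 200, random.Random(1))
accepted = abc_rejection(observed)
print(len(accepted), sum(accepted) / len(accepted))
```

The accepted values approximate the posterior distribution of the parameter, and their mean serves as a point estimate. A real application would use several summary statistics and a weighted distance rather than the single statistic used here.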
Affiliation(s)
- Haim Ashkenazy
  - Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Eli Levy Karin
  - Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
  - Department of Molecular Biology and Ecology of Plants, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Zach Mertens
  - The Biodesign Institute, Arizona State University, Tempe, AZ 85287-5301, USA
- Reed A Cartwright
  - The Biodesign Institute, Arizona State University, Tempe, AZ 85287-5301, USA
  - School of Life Sciences, Arizona State University, Tempe, AZ 85287-5301, USA
- Tal Pupko
  - Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
|
20
|
Scornavacca C, Belkhir K, Lopez J, Dernat R, Delsuc F, Douzery EJP, Ranwez V. OrthoMaM v10: Scaling-Up Orthologous Coding Sequence and Exon Alignments with More than One Hundred Mammalian Genomes. Mol Biol Evol 2019; 36:861-862. [PMID: 30698751 PMCID: PMC6445298 DOI: 10.1093/molbev/msz015] [Citation(s) in RCA: 41] [Impact Index Per Article: 6.8] [Indexed: 12/26/2022]
Abstract
We present version 10 of OrthoMaM, a database of orthologous mammalian markers. OrthoMaM is already 11 years old and has kept improving since the outset, providing high-quality alignments and phylogenetic trees computed with state-of-the-art methods on up-to-date data. The main contribution of this version is the increase in the number of taxa: 116 mammalian genomes for 14,509 one-to-one orthologous genes. This has been made possible by the combination of genomic data deposited in Ensembl complemented by additional good-quality genomes only available in NCBI. Version 10 users will benefit from pipeline improvements and a completely redesigned web interface.
Affiliation(s)
- Celine Scornavacca
  - Institut des Sciences de l'Evolution (ISEM), Univ. Montpellier, CNRS, EPHE, IRD, Montpellier, France
- Khalid Belkhir
  - Institut des Sciences de l'Evolution (ISEM), Univ. Montpellier, CNRS, EPHE, IRD, Montpellier, France
- Jimmy Lopez
  - Institut des Sciences de l'Evolution (ISEM), Univ. Montpellier, CNRS, EPHE, IRD, Montpellier, France
- Rémy Dernat
  - Institut des Sciences de l'Evolution (ISEM), Univ. Montpellier, CNRS, EPHE, IRD, Montpellier, France
- Frédéric Delsuc
  - Institut des Sciences de l'Evolution (ISEM), Univ. Montpellier, CNRS, EPHE, IRD, Montpellier, France
- Emmanuel J P Douzery
  - Institut des Sciences de l'Evolution (ISEM), Univ. Montpellier, CNRS, EPHE, IRD, Montpellier, France
- Vincent Ranwez
  - AGAP, Univ. Montpellier, CIRAD, INRA, Montpellier SupAgro, Montpellier, France
|
21
|
Weber CC, Whelan S. Physicochemical Amino Acid Properties Better Describe Substitution Rates in Large Populations. Mol Biol Evol 2019; 36:679-690. [PMID: 30668757 DOI: 10.1093/molbev/msz003] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Indexed: 12/12/2022]
Abstract
Substitutions between chemically distant amino acids are known to occur less frequently than those between more similar amino acids. This knowledge, however, is not reflected in most codon substitution models, which treat all nonsynonymous changes as if they were equivalent in terms of impact on the protein. A variety of methods for integrating chemical distances into models have been proposed, with a common approach being to divide substitutions into radical or conservative categories. Nevertheless, it remains unclear whether the resulting models describe sequence evolution better than their simpler counterparts. We propose a parametric codon model that distinguishes between radical and conservative substitutions, allowing us to assess if radical substitutions are preferentially removed by selection. Applying our new model to a range of phylogenomic data, we find differentiating between radical and conservative substitutions provides significantly better fit for large populations, but see no equivalent improvement for smaller populations. Comparing codon and amino acid models using these same data shows that alignments from large populations tend to select phylogenetic models containing information about amino acid exchangeabilities, whereas the structure of the genetic code is more important for smaller populations. Our results suggest selection against radical substitutions is, on average, more pronounced in large populations than smaller ones. The reduced observable effect of selection in smaller populations may be due to stronger genetic drift making it more challenging to detect preferences. Our results imply an important connection between the life history of a phylogenetic group and the model that best describes its evolution.
Affiliation(s)
- Claudia C Weber
  - Center for Computational Genetics and Genomics, Department of Biology, Temple University, Philadelphia, PA
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, United Kingdom
- Simon Whelan
  - Evolutionary Biology Center, Uppsala University, Uppsala, Sweden
|
22
|
Ranwez V, Douzery EJP, Cambon C, Chantret N, Delsuc F. MACSE v2: Toolkit for the Alignment of Coding Sequences Accounting for Frameshifts and Stop Codons. Mol Biol Evol 2019; 35:2582-2584. [PMID: 30165589 PMCID: PMC6188553 DOI: 10.1093/molbev/msy159] [Citation(s) in RCA: 318] [Impact Index Per Article: 53.0] [Indexed: 12/15/2022]
Abstract
Multiple sequence alignment is a prerequisite for many evolutionary analyses. Multiple Alignment of Coding Sequences (MACSE) is a multiple sequence alignment program that explicitly accounts for the underlying codon structure of protein-coding nucleotide sequences. Its unique characteristic allows building reliable codon alignments even in the presence of frameshifts. This facilitates downstream analyses such as selection pressure estimation based on the ratio of nonsynonymous to synonymous substitutions. Here, we present MACSE v2, a major update with an improved version of the initial algorithm enriched with a complete toolkit to handle multiple alignments of protein-coding sequences. A graphical interface now provides user-friendly access to the different subprograms.
Affiliation(s)
- Vincent Ranwez
  - AGAP, Université de Montpellier, CIRAD, INRA, Montpellier SupAgro, Montpellier, France
- Emmanuel J P Douzery
  - Institut des Sciences de l'Evolution de Montpellier (ISEM), UMR 5554, CNRS, EPHE, IRD, Université de Montpellier, Montpellier, France
- Cédric Cambon
  - AGAP, Université de Montpellier, CIRAD, INRA, Montpellier SupAgro, Montpellier, France
  - Institut des Sciences de l'Evolution de Montpellier (ISEM), UMR 5554, CNRS, EPHE, IRD, Université de Montpellier, Montpellier, France
- Nathalie Chantret
  - AGAP, Université de Montpellier, CIRAD, INRA, Montpellier SupAgro, Montpellier, France
- Frédéric Delsuc
  - Institut des Sciences de l'Evolution de Montpellier (ISEM), UMR 5554, CNRS, EPHE, IRD, Université de Montpellier, Montpellier, France
|
23
|
Khrameeva E, Kurochkin I, Bozek K, Giavalisco P, Khaitovich P. Lipidome Evolution in Mammalian Tissues. Mol Biol Evol 2019; 35:1947-1957. [PMID: 29762743 PMCID: PMC6063302 DOI: 10.1093/molbev/msy097] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Indexed: 12/15/2022]
Abstract
Lipids are essential structural and functional components of cells. Little is known, however, about the evolution of lipid composition in different tissues. Here, we report a large-scale analysis of lipidome evolution in six tissues of 32 species representing primates, rodents, and bats. While changes in genes’ sequence and expression accumulate proportionally to the phylogenetic distances, <2% of the lipidome evolves this way. Yet, lipids constituting this 2% cluster in specific functions shared among all tissues. Among species, humans show the largest amount of species-specific lipidome differences. Many of the uniquely human lipidome features localize in the brain cortex and cluster in specific pathways implicated in cognitive disorders.
Affiliation(s)
- Ekaterina Khrameeva
  - Center for Data-Intensive Biomedicine and Biotechnology, Skolkovo Institute of Science and Technology, Moscow, Russia
  - A.A. Kharkevich Institute for Information Transmission Problems, Russian Academy of Sciences, Moscow, Russia
- Ilia Kurochkin
  - Center for Data-Intensive Biomedicine and Biotechnology, Skolkovo Institute of Science and Technology, Moscow, Russia
- Katarzyna Bozek
  - Biological Physics Theory Unit, Okinawa Institute of Science and Technology, Graduate University, Onna-Son, Kunigami-Gun, Okinawa, Japan
- Patrick Giavalisco
  - Max Planck Institute of Molecular Plant Physiology, Potsdam, Germany
  - Current affiliation: Max Planck Institute for Biology of Ageing, Cologne, Germany
- Philipp Khaitovich
  - Center for Data-Intensive Biomedicine and Biotechnology, Skolkovo Institute of Science and Technology, Moscow, Russia
  - Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
  - CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai, China
|
24
|
Mello B, Schrago CG. The Estimated Pacemaker for Great Apes Supports the Hominoid Slowdown Hypothesis. Evol Bioinform Online 2019; 15:1176934319855988. [PMID: 31223232 PMCID: PMC6566470 DOI: 10.1177/1176934319855988] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 02/26/2019] [Accepted: 05/17/2019] [Indexed: 11/16/2022]
Abstract
The recent surge of genomic data has prompted the investigation of substitution rate variation across the genome, as well as among lineages. Evolutionary trees inferred from distinct genomic regions may display branch lengths that differ between loci by simple proportionality constants, indicating that rate variation follows a pacemaker model, which may be attributed to lineage effects. Analyses of genes from diverse biological clades produced contrasting results, supporting either this model or alternative scenarios where multiple pacemakers exist. So far, an evaluation of the pacemaker hypothesis for all great apes has never been carried out. In this work, we tested whether the evolutionary rates of hominids conform to pacemakers, which were inferred accounting for gene tree/species tree discordance. For higher precision, substitution rates in branches were estimated with a calibration-free approach, the relative rate framework. A predominant evolutionary trend in great apes was evidenced by the recovery of a large pacemaker, encompassing most hominid genomic regions. In addition, the majority of genes followed a pace of evolution that was closely related to the strict molecular clock. However, slight rate decreases were recovered in the internal branches leading to humans, corroborating the hominoid slowdown hypothesis. Our findings suggest that in great apes, life history traits were the major drivers of substitution rate variation across the genome.
Affiliation(s)
- Beatriz Mello
  - Department of Genetics, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
- Carlos G Schrago
  - Department of Genetics, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
25
Fan H, Hu Y, Shan L, Yu L, Wang B, Li M, Wu Q, Wei F. Synteny search identifies carnivore Y chromosome for evolution of male specific genes. Integr Zool 2019; 14:224-234. [PMID: 30019860 DOI: 10.1111/1749-4877.12352] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 11/30/2022]
Abstract
The explosive accumulation of mammalian genomes has provided a valuable resource to characterize the evolution of the Y chromosome. Unexpectedly, the Y-chromosome sequence has been characterized in only a small handful of species, with the majority being model organisms. Thus, identification of Y-linked scaffolds from unordered genome sequences is becoming more important. Here, we used a syntenic-based approach to generate the scaffolds of the male-specific region of the Y chromosome (MSY) from the genome sequence of 6 male carnivore species. Our results identified 14, 15, 9, 28, 14 and 11 Y-linked scaffolds in polar bears, Pacific walruses, red pandas, cheetahs, ferrets and tigers, covering 1.55 Mb, 2.62 Mb, 964 kb, 1.75 Mb, 2.17 Mb and 1.84 Mb of MSY, respectively. All the candidate Y-linked scaffolds in 3 selected species (red pandas, polar bears and tigers) were successfully verified using polymerase chain reaction. We re-annotated 8 carnivore MSYs including these 6 Y-linked scaffolds and domestic dog and cat MSY; a total of 11 orthologous genes conserved in at least 7 of the 8 carnivores were identified. These 11 Y-linked genes have significantly higher evolutionary rates compared with their X-linked counterparts, indicating less purifying selection for MSY genes. Taken together, our study shows that the approach of synteny search is a reliable and easily affordable strategy to identify Y-linked scaffolds from unordered carnivore genomes and provides a preliminary evolutionary study for carnivore MSY genes.
Affiliation(s)
- Huizhong Fan
  - Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
  - University of Chinese Academy of Sciences, Beijing, China
- Yibo Hu
  - Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
  - Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, China
- Lei Shan
  - Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- Lijun Yu
  - Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
  - University of Chinese Academy of Sciences, Beijing, China
- Bing Wang
  - Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- Min Li
  - Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- Qi Wu
  - Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- Fuwen Wei
  - Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
  - University of Chinese Academy of Sciences, Beijing, China
  - Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, China
26
Vialle RA, Tamuri AU, Goldman N. Alignment Modulates Ancestral Sequence Reconstruction Accuracy. Mol Biol Evol 2019; 35:1783-1797. [PMID: 29618097 PMCID: PMC5995191 DOI: 10.1093/molbev/msy055] [Citation(s) in RCA: 51] [Impact Index Per Article: 8.5] [Indexed: 12/17/2022] Open
Abstract
Accurate reconstruction of ancestral states is a critical evolutionary analysis when studying ancient proteins and comparing biochemical properties between parental or extinct species and their extant relatives. It relies on multiple sequence alignment (MSA) which may introduce biases, and it remains unknown how MSA methodological approaches impact ancestral sequence reconstruction (ASR). Here, we investigate how MSA methodology modulates ASR using a simulation study of various evolutionary scenarios. We evaluate the accuracy of ancestral protein sequence reconstruction for simulated data and compare reconstruction outcomes using different alignment methods. Our results reveal biases introduced not only by aligner algorithms and assumptions, but also tree topology and the rate of insertions and deletions. Under many conditions we find no substantial differences between the MSAs. However, increasing the difficulty for the aligners can significantly impact ASR. The MAFFT consistency aligners and PRANK variants exhibit the best performance, whereas FSA displays limited performance. We also discover a bias towards reconstructed sequences longer than the true ancestors, deriving from a preference for inferring insertions, in almost all MSA methodological approaches. In addition, we find measures of MSA quality generally correlate highly with reconstruction accuracy. Thus, we show MSA methodological differences can affect the quality of reconstructions and propose MSA methods should be selected with care to accurately determine ancestral states with confidence.
Affiliation(s)
- Ricardo Assunção Vialle
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, United Kingdom
  - Department of Biochemistry and Immunology, Federal University of Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
  - Department of Genetics and Molecular Biology, Laboratory of Human and Medical Genetics, Federal University of Pará, Belém, Pará, Brazil
- Asif U Tamuri
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, United Kingdom
  - Research IT Services, University College London, London, United Kingdom
- Nick Goldman
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, United Kingdom
27
Chavez DE, Gronau I, Hains T, Kliver S, Koepfli KP, Wayne RK. Comparative genomics provides new insights into the remarkable adaptations of the African wild dog (Lycaon pictus). Sci Rep 2019; 9:8329. [PMID: 31171819 PMCID: PMC6554312 DOI: 10.1038/s41598-019-44772-5] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 02/07/2019] [Accepted: 05/22/2019] [Indexed: 12/02/2022] Open
Abstract
Within the Canidae, the African wild dog (Lycaon pictus) is the most specialized with regard to cursorial adaptations (specialized for running), having only four digits on its forefeet. In addition, this species is one of the few canids considered to be an obligate meat-eater, possessing a robust dentition for taking down large prey, and displays one of the most variable coat colorations amongst mammals. Here, we used comparative genomic analysis to investigate the evolutionary history and genetic basis for adaptations associated with cursoriality, hypercarnivory, and coat color variation in African wild dogs. Genome-wide scans revealed unique amino acid deletions that suggest a mode of evolutionary digit loss through expanded apoptosis in the developing first digit. African wild dog-specific signals of positive selection also uncovered a putative mechanism of molar cusp modification through changes in genes associated with the sonic hedgehog (SHH) signaling pathway, required for spatial patterning of teeth, and three genes associated with pigmentation. Divergence time analyses suggest the suite of genomic changes we identified evolved ~1.7 Mya, coinciding with the diversification of large-bodied ungulates. Our results show that comparative genomics is a powerful tool for identifying the genetic basis of evolutionary changes in Canidae.
Affiliation(s)
- Daniel E Chavez
  - Department of Ecology and Evolutionary Biology, University of California, Los Angeles, California, 90095, USA
- Ilan Gronau
  - Efi Arazi School of Computer Science, Herzliya Interdisciplinary Center (IDC), Herzliya, 46150, Israel
- Taylor Hains
  - Environmental Science and Policy, Johns Hopkins University, Washington, D.C., 20036, USA
- Sergei Kliver
  - Institute of Molecular and Cellular Biology, Novosibirsk, 630090, Russian Federation
- Klaus-Peter Koepfli
  - Smithsonian Conservation Biology Institute, National Zoological Park, Washington, D.C., 20008, USA
  - Theodosius Dobzhansky Center for Genome Bioinformatics, Saint Petersburg State University, Saint Petersburg, 199034, Russian Federation
- Robert K Wayne
  - Department of Ecology and Evolutionary Biology, University of California, Los Angeles, California, 90095, USA
28
Glémin S, Scornavacca C, Dainat J, Burgarella C, Viader V, Ardisson M, Sarah G, Santoni S, David J, Ranwez V. Pervasive hybridizations in the history of wheat relatives. Sci Adv 2019; 5:eaav9188. [PMID: 31049399 PMCID: PMC6494498 DOI: 10.1126/sciadv.aav9188] [Citation(s) in RCA: 61] [Impact Index Per Article: 10.2] [Received: 10/31/2018] [Accepted: 03/20/2019] [Indexed: 05/18/2023]
Abstract
Cultivated wheats are derived from an intricate history of three genomes, A, B, and D, present in both diploid and polyploid species. It was recently proposed that the D genome originated from an ancient hybridization between the A and B lineages. However, this result has been questioned, and a robust phylogeny of wheat relatives is still lacking. Using transcriptome data from all diploid species and a new methodological approach, our comprehensive phylogenomic analysis revealed that more than half of the species descend from an ancient hybridization event but with a more complex scenario involving a different parent than previously thought (Aegilops mutica, an overlooked wild species) instead of the B genome. We also detected other extensive gene flow events that could explain long-standing controversies in the classification of wheat relatives.
Affiliation(s)
- Sylvain Glémin
  - CNRS, Univ Rennes, ECOBIO (Ecosystèmes, biodiversité, évolution)–UMR 6553, F-35042 Rennes, France
  - Department of Ecology and Genetics, Evolutionary Biology Center, Uppsala University, Norbyvägen 18D, 752 36 Uppsala, Sweden
- Celine Scornavacca
  - Institut des Sciences de l’Evolution Université de Montpellier, CNRS, IRD, EPHE CC 064, Place Eugène Bataillon, 34095 Montpellier, cedex 05, France
- Jacques Dainat
  - National Bioinformatics Infrastructure Sweden (NBIS), SciLifeLab, Uppsala Biomedicinska Centrum (BMC), Husargatan 3, S-751 23 Uppsala, Sweden
  - IMBIM–Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala Biomedicinska Centrum (BMC), Husargatan 3, Box 582, S-751 23 Uppsala, Sweden
- Concetta Burgarella
  - AGAP, Univ Montpellier, CIRAD, INRA, Montpellier SupAgro, Montpellier, France
  - CIRAD, UMR AGAP, F-34398 Montpellier, France
- Véronique Viader
  - AGAP, Univ Montpellier, CIRAD, INRA, Montpellier SupAgro, Montpellier, France
- Morgane Ardisson
  - AGAP, Univ Montpellier, CIRAD, INRA, Montpellier SupAgro, Montpellier, France
- Gautier Sarah
  - AGAP, Univ Montpellier, CIRAD, INRA, Montpellier SupAgro, Montpellier, France
  - South Green Bioinformatics Platform, BIOVERSITY, CIRAD, INRA, IRD, Montpellier SupAgro, Montpellier, France
- Sylvain Santoni
  - AGAP, Univ Montpellier, CIRAD, INRA, Montpellier SupAgro, Montpellier, France
- Jacques David
  - AGAP, Univ Montpellier, CIRAD, INRA, Montpellier SupAgro, Montpellier, France
- Vincent Ranwez
  - AGAP, Univ Montpellier, CIRAD, INRA, Montpellier SupAgro, Montpellier, France
29
Laurin-Lemay S, Philippe H, Rodrigue N. Multiple Factors Confounding Phylogenetic Detection of Selection on Codon Usage. Mol Biol Evol 2019; 35:1463-1472. [PMID: 29596640 DOI: 10.1093/molbev/msy047] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Indexed: 02/06/2023] Open
Abstract
Detecting selection on codon usage (CU) is a difficult task, since CU can be shaped by both the mutational process and selective constraints operating at the DNA, RNA, and protein levels. Yang and Nielsen (2008) developed a test (which we call CUYN) for detecting selection on CU using two competing mutation-selection models of codon substitution. The null model assumes that CU is determined by the mutation bias alone, whereas the alternative model assumes that both mutation bias and/or selection act on CU. In applications on mammalian-scale alignments, the CUYN test detects selection on CU for numerous genes. This is surprising, given the small effective population size of mammals, and prompted us to use simulations to evaluate the robustness of the test to model violations. Simulations using a modest level of CpG hypermutability completely mislead the test, with 100% false positives. Surprisingly, a high level of false positives (56.1%) resulted simply from using the HKY mutation-level parameterization within the CUYN test on simulations conducted with a GTR mutation-level parameterization. Finally, by using a crude optimization procedure on a parameter controlling the CpG hypermutability rate, we find that this mutational property could explain a very large part of the observed mammalian CU. Altogether, our work emphasizes the need to evaluate the potential impact of model violations on statistical tests in the field of molecular phylogenetic analysis. The source code of the simulator and the mammalian genes used are available as a GitHub repository (https://github.com/Simonll/LikelihoodFreePhylogenetics.git).
Affiliation(s)
- Simon Laurin-Lemay
  - Department of Biochemistry and Molecular Medicine, Robert-Cedergren Center for Bioinformatics and Genomics, Faculty of Medicine, Université de Montréal, Montréal, QC, Canada
- Hervé Philippe
  - Department of Biochemistry and Molecular Medicine, Robert-Cedergren Center for Bioinformatics and Genomics, Faculty of Medicine, Université de Montréal, Montréal, QC, Canada
  - Centre de Théorisation et de Modélisation de la Biodiversité, Station d'Écologie Théorique et Expérimentale, UMR CNRS 5321, Moulis, Ariège, France
- Nicolas Rodrigue
  - Department of Biology, Institute of Biochemistry, and School of Mathematics and Statistics, Carleton University, Ottawa, ON, Canada
30
Digging for the spiny rat and hutia phylogeny using a gene capture approach, with the description of a new mammal subfamily. Mol Phylogenet Evol 2019; 136:241-253. [PMID: 30885830 DOI: 10.1016/j.ympev.2019.03.007] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 11/08/2018] [Revised: 03/13/2019] [Accepted: 03/14/2019] [Indexed: 02/07/2023]
Abstract
Next generation sequencing (NGS) and genomic database mining allow biologists to gather and select large molecular datasets well suited to address phylogenomics and molecular evolution questions. Here we applied this approach to a mammal family, the Echimyidae, for which generic relationships have been difficult to recover and often referred to as a star phylogeny. These South-American spiny rats represent a family of caviomorph rodents exhibiting a striking diversity of species and life history traits. Using a NGS exon capture protocol, we isolated and sequenced ca. 500 nuclear DNA exons for 35 species belonging to all major echimyid and capromyid clades. Exons were carefully selected to encompass as much diversity as possible in terms of rate of evolution, heterogeneity in the distribution of site-variation and nucleotide composition. Supermatrix inferences and coalescence-based approaches were subsequently applied to infer this family's phylogeny. The inferred topologies were the same for both approaches, and support was maximal for each node, entirely resolving the ambiguous relationships of previous analyses. Fast-evolving nuclear exons tended to yield more reliable phylogenies, as slower-evolving sequences were not informative enough to disentangle the short branches of the Echimyidae radiation. Based on this resolved phylogeny and on molecular and morphological evidence, we confirm the rank of the Caribbean hutias - formerly placed in the Capromyidae family - as Capromyinae, a clade nested within Echimyidae. We also name and define Carterodontinae, a new subfamily of Echimyidae, comprising the extant monotypic genus Carterodon from Brazil, which is the closest living relative of West Indies Capromyinae.
31
Abstract
C-to-U RNA editing enzymatically converts the base C to U in RNA molecules and could lead to nonsynonymous changes when occurring in coding regions. Hundreds to thousands of coding sites were recently found to be C-to-U edited or editable in humans, but the biological significance of this phenomenon is elusive. Here, we test the prevailing hypothesis that nonsynonymous editing is beneficial because it provides a means for tissue- or time-specific regulation of protein function that may be hard to accomplish by mutations due to pleiotropy. The adaptive hypothesis predicts that the fraction of sites edited and the median proportion of RNA molecules edited (i.e., editing level) are both higher for nonsynonymous than synonymous editing. However, our empirical observations are opposite to these predictions. Furthermore, the frequency of nonsynonymous editing, relative to that of synonymous editing, declines as genes become functionally more important or evolutionarily more constrained, and the nonsynonymous editing level at a site is negatively correlated with the evolutionary conservation of the site. Together, these findings refute the adaptive hypothesis; they instead indicate that the reported C-to-U coding RNA editing is mostly slightly deleterious or neutral, probably resulting from off-target activities of editing enzymes. Along with similar conclusions on the more prevalent A-to-I editing and m6A modification of coding RNAs, our study suggests that, at least in humans, most events of each type of posttranscriptional coding RNA modification likely manifest cellular errors rather than adaptations, demanding a paradigm shift in the research of posttranscriptional modification.
Affiliation(s)
- Zhen Liu
  - State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
  - Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI
- Jianzhi Zhang
  - Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI
32
Rousselle M, Laverré A, Figuet E, Nabholz B, Galtier N. Influence of Recombination and GC-biased Gene Conversion on the Adaptive and Nonadaptive Substitution Rate in Mammals versus Birds. Mol Biol Evol 2019; 36:458-471. [PMID: 30590692 PMCID: PMC6389324 DOI: 10.1093/molbev/msy243] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Indexed: 12/22/2022] Open
Abstract
Recombination is expected to affect functional sequence evolution in several ways. On the one hand, recombination is thought to improve the efficiency of multilocus selection by dissipating linkage disequilibrium. On the other hand, natural selection can be counteracted by recombination-associated transmission distorters such as GC-biased gene conversion (gBGC), which tends to promote G and C alleles irrespective of their fitness effect in high-recombining regions. It has been suggested that gBGC might impact coding sequence evolution in vertebrates, and particularly the ratio of nonsynonymous to synonymous substitution rates (dN/dS). However, distinctive gBGC patterns have been reported in mammals and birds, maybe reflecting the documented contrasts in evolutionary dynamics of recombination rate between these two taxa. Here, we explore how recombination and gBGC affect coding sequence evolution in mammals and birds by analyzing proteome-wide data in six species of Galloanserae (fowls) and six species of catarrhine primates. We estimated the dN/dS ratio and rates of adaptive and nonadaptive evolution in bins of genes of increasing recombination rate, separately analyzing AT → GC, GC → AT, and G ↔ C/A ↔ T mutations. We show that in both taxa, recombination and gBGC entail a decrease in dN/dS. Our analysis indicates that recombination enhances the efficiency of purifying selection by lowering Hill-Robertson effects, whereas gBGC leads to an overestimation of the adaptive rate of AT → GC mutations. Finally, we report a mutagenic effect of recombination, which is independent of gBGC.
Affiliation(s)
- Alexandre Laverré
  - ISEM, Université de Montpellier, CNRS, IRD, EPHE, Montpellier, France
- Emeric Figuet
  - ISEM, Université de Montpellier, CNRS, IRD, EPHE, Montpellier, France
- Benoit Nabholz
  - ISEM, Université de Montpellier, CNRS, IRD, EPHE, Montpellier, France
- Nicolas Galtier
  - ISEM, Université de Montpellier, CNRS, IRD, EPHE, Montpellier, France
33
Bravo GA, Antonelli A, Bacon CD, Bartoszek K, Blom MPK, Huynh S, Jones G, Knowles LL, Lamichhaney S, Marcussen T, Morlon H, Nakhleh LK, Oxelman B, Pfeil B, Schliep A, Wahlberg N, Werneck FP, Wiedenhoeft J, Willows-Munro S, Edwards SV. Embracing heterogeneity: coalescing the Tree of Life and the future of phylogenomics. PeerJ 2019; 7:e6399. [PMID: 30783571 PMCID: PMC6378093 DOI: 10.7717/peerj.6399] [Citation(s) in RCA: 67] [Impact Index Per Article: 11.2] [Received: 02/05/2018] [Accepted: 01/07/2019] [Indexed: 12/23/2022] Open
Abstract
Building the Tree of Life (ToL) is a major challenge of modern biology, requiring advances in cyberinfrastructure, data collection, theory, and more. Here, we argue that phylogenomics stands to benefit by embracing the many heterogeneous genomic signals emerging from the first decade of large-scale phylogenetic analysis spawned by high-throughput sequencing (HTS). Such signals include those most commonly encountered in phylogenomic datasets, such as incomplete lineage sorting, but also those reticulate processes emerging with greater frequency, such as recombination and introgression. Here we focus specifically on how phylogenetic methods can accommodate the heterogeneity incurred by such population genetic processes; we do not discuss phylogenetic methods that ignore such processes, such as concatenation or supermatrix approaches or supertrees. We suggest that methods of data acquisition and the types of markers used in phylogenomics will remain restricted until a posteriori methods of marker choice are made possible with routine whole-genome sequencing of taxa of interest. We discuss limitations and potential extensions of a model supporting innovation in phylogenomics today, the multispecies coalescent model (MSC). Macroevolutionary models that use phylogenies, such as character mapping, often ignore the heterogeneity on which building phylogenies increasingly relies, and we suggest that assimilating such heterogeneity is an important goal moving forward. Finally, we argue that an integrative cyberinfrastructure linking all steps of the process of building the ToL, from specimen acquisition in the field to publication and tracking of phylogenomic data, as well as a culture that values contributors at each step, are essential for progress.
Affiliation(s)
- Gustavo A. Bravo
  - Department of Organismic and Evolutionary Biology, Museum of Comparative Zoology, Harvard University, Cambridge, MA, USA
- Alexandre Antonelli
  - Department of Organismic and Evolutionary Biology, Museum of Comparative Zoology, Harvard University, Cambridge, MA, USA
  - Gothenburg Global Biodiversity Centre, Göteborg, Sweden
  - Department of Biological and Environmental Sciences, University of Gothenburg, Göteborg, Sweden
  - Gothenburg Botanical Garden, Göteborg, Sweden
- Christine D. Bacon
  - Gothenburg Global Biodiversity Centre, Göteborg, Sweden
  - Department of Biological and Environmental Sciences, University of Gothenburg, Göteborg, Sweden
- Krzysztof Bartoszek
  - Department of Computer and Information Science, Linköping University, Linköping, Sweden
- Mozes P. K. Blom
  - Department of Bioinformatics and Genetics, Swedish Museum of Natural History, Stockholm, Sweden
- Stella Huynh
  - Institut de Biologie, Université de Neuchâtel, Neuchâtel, Switzerland
- Graham Jones
  - Department of Biological and Environmental Sciences, University of Gothenburg, Göteborg, Sweden
- L. Lacey Knowles
  - Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI, USA
- Sangeet Lamichhaney
  - Department of Organismic and Evolutionary Biology, Museum of Comparative Zoology, Harvard University, Cambridge, MA, USA
- Thomas Marcussen
  - Centre for Ecological and Evolutionary Synthesis, University of Oslo, Oslo, Norway
- Hélène Morlon
  - Institut de Biologie, Ecole Normale Supérieure de Paris, Paris, France
- Luay K. Nakhleh
  - Department of Computer Science, Rice University, Houston, TX, USA
- Bengt Oxelman
  - Gothenburg Global Biodiversity Centre, Göteborg, Sweden
  - Department of Biological and Environmental Sciences, University of Gothenburg, Göteborg, Sweden
- Bernard Pfeil
  - Department of Biological and Environmental Sciences, University of Gothenburg, Göteborg, Sweden
- Alexander Schliep
  - Department of Computer Science and Engineering, Chalmers University of Technology and University of Gothenburg, Göteborg, Sweden
- Fernanda P. Werneck
  - Coordenação de Biodiversidade, Programa de Coleções Científicas Biológicas, Instituto Nacional de Pesquisa da Amazônia, Manaus, AM, Brazil
- John Wiedenhoeft
  - Department of Computer Science and Engineering, Chalmers University of Technology and University of Gothenburg, Göteborg, Sweden
  - Department of Computer Science, Rutgers University, Piscataway, NJ, USA
- Sandi Willows-Munro
  - School of Life Sciences, University of Kwazulu-Natal, Pietermaritzburg, South Africa
- Scott V. Edwards
  - Department of Organismic and Evolutionary Biology, Museum of Comparative Zoology, Harvard University, Cambridge, MA, USA
  - Gothenburg Centre for Advanced Studies in Science and Technology, Chalmers University of Technology and University of Gothenburg, Göteborg, Sweden
34
Hughes GM, Teeling EC. AGILE: an assembled genome mining pipeline. Bioinformatics 2018; 35:1252-1254. [DOI: 10.1093/bioinformatics/bty781] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 03/23/2018] [Revised: 07/10/2018] [Accepted: 09/03/2018] [Indexed: 11/13/2022] Open
Affiliation(s)
- Graham M Hughes
  - School of Biology and Environmental Science, University College Dublin, Dublin 4, Ireland
- Emma C Teeling
  - School of Biology and Environmental Science, University College Dublin, Dublin 4, Ireland
35
Botero-Castro F, Tilak MK, Justy F, Catzeflis F, Delsuc F, Douzery EJP. In Cold Blood: Compositional Bias and Positive Selection Drive the High Evolutionary Rate of Vampire Bats Mitochondrial Genomes. Genome Biol Evol 2018; 10:2218-2239. [PMID: 29931241 PMCID: PMC6127110 DOI: 10.1093/gbe/evy120] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Accepted: 06/18/2018] [Indexed: 12/24/2022] Open
Abstract
Mitochondrial genomes of animals have long been considered to evolve under the action of purifying selection. Nevertheless, there is increasing evidence that they can also undergo episodes of positive selection in response to shifts in physiological or environmental demands. Vampire bats experienced such a shift, as they are the only mammals feeding exclusively on blood and possessing anatomical adaptations to deal with the associated physiological requirements (e.g., ingestion of high amounts of liquid water and iron). We sequenced eight new chiropteran mitogenomes including two species of vampire bats, five representatives of other lineages of phyllostomids and one close outgroup. Conducting detailed comparative mitogenomic analyses, we found evidence for accelerated evolutionary rates at the nucleotide and amino acid levels in vampires. Moreover, the mitogenomes of vampire bats are characterized by an increased cytosine (C) content mirrored by a decrease in thymine (T) compared with other chiropterans. Proteins encoded by the vampire bat mitogenomes also exhibit a significant increase in threonine (Thr) and slight reductions in frequency of the hydrophobic residues isoleucine (Ile), valine (Val), methionine (Met), and phenylalanine (Phe). We show that these peculiar substitution patterns can be explained by the co-occurrence of both neutral (mutational bias) and adaptive (positive selection) processes. We propose that vampire bat mitogenomes may have been impacted by selection on mitochondrial proteins to accommodate the metabolism and nutritional qualities of blood meals.
Affiliation(s)
- Fidel Botero-Castro
  - Institut des Sciences de l'Evolution (ISEM), Univ. Montpellier, CNRS, EPHE, IRD, Montpellier, France
  - Division of Evolutionary Biology, Faculty of Biology II, Ludwig-Maximilians-Universität München, Planegg-Martinsried, Germany
- Marie-Ka Tilak
  - Institut des Sciences de l'Evolution (ISEM), Univ. Montpellier, CNRS, EPHE, IRD, Montpellier, France
- Fabienne Justy
  - Institut des Sciences de l'Evolution (ISEM), Univ. Montpellier, CNRS, EPHE, IRD, Montpellier, France
- François Catzeflis
  - Institut des Sciences de l'Evolution (ISEM), Univ. Montpellier, CNRS, EPHE, IRD, Montpellier, France
- Frédéric Delsuc
  - Institut des Sciences de l'Evolution (ISEM), Univ. Montpellier, CNRS, EPHE, IRD, Montpellier, France
- Emmanuel J P Douzery
  - Institut des Sciences de l'Evolution (ISEM), Univ. Montpellier, CNRS, EPHE, IRD, Montpellier, France
36
Affiliation(s)
- Amy Willis
  - Department of Biostatistics, University of Washington, Seattle, WA
- Rayna Bell
  - Smithsonian Institution, National Museum of Natural History, Washington, DC
37
Chen MY, Liang D, Zhang P. Phylogenomic Resolution of the Phylogeny of Laurasiatherian Mammals: Exploring Phylogenetic Signals within Coding and Noncoding Sequences. Genome Biol Evol 2018; 9:1998-2012. [PMID: 28830116 PMCID: PMC5737624 DOI: 10.1093/gbe/evx147] [Citation(s) in RCA: 49] [Impact Index Per Article: 7.0] [Accepted: 07/30/2017] [Indexed: 12/12/2022] Open
Abstract
The interordinal relationships of Laurasiatherian mammals are currently one of the most controversial questions in mammalian phylogenetics. Previous studies mainly relied on coding sequences (CDS) and seldom used noncoding sequences. Here, by data mining public genome data, we compiled an intron data set of 3,638 genes (all introns from a protein-coding gene are considered as a gene) (19,055,073 bp) and a CDS data set of 10,259 genes (20,994,285 bp), covering all major lineages of Laurasiatheria (except Pholidota). We found that the intron data contained stronger and more congruent phylogenetic signals than the CDS data. In agreement with this observation, concatenation and species-tree analyses of the intron data set yielded well-resolved and identical phylogenies, whereas the CDS data set produced weakly supported and incongruent results. Further analyses showed that the phylogeny inferred from the intron data is highly robust to data subsampling and change in outgroup, but the CDS data produced unstable results under the same conditions. Interestingly, gene tree statistical results showed that the most frequently observed gene tree topologies for the CDS and intron data are identical, suggesting that the major phylogenetic signal within the CDS data is actually congruent with that within the intron data. Our final result of Laurasiatheria phylogeny is (Eulipotyphla,((Chiroptera, Perissodactyla),(Carnivora, Cetartiodactyla))), favoring a close relationship between Chiroptera and Perissodactyla. Our study 1) provides a well-supported phylogenetic framework for Laurasiatheria, representing a step towards ending the long-standing "hard" polytomy and 2) argues that intron within genome data is a promising data resource for resolving rapid radiation events across the tree of life.
Affiliation(s)
- Meng-Yun Chen
- State Key Laboratory of Biocontrol, College of Ecology and Evolution, School of Life Sciences, Sun Yat-Sen University, Guangzhou, China
| | - Dan Liang
- State Key Laboratory of Biocontrol, College of Ecology and Evolution, School of Life Sciences, Sun Yat-Sen University, Guangzhou, China
| | - Peng Zhang
- State Key Laboratory of Biocontrol, College of Ecology and Evolution, School of Life Sciences, Sun Yat-Sen University, Guangzhou, China
| |
|
38
|
Allio R, Donega S, Galtier N, Nabholz B. Large Variation in the Ratio of Mitochondrial to Nuclear Mutation Rate across Animals: Implications for Genetic Diversity and the Use of Mitochondrial DNA as a Molecular Marker. Mol Biol Evol 2018; 34:2762-2772. [PMID: 28981721 DOI: 10.1093/molbev/msx197] [Citation(s) in RCA: 208] [Impact Index Per Article: 29.7] [Indexed: 12/25/2022] Open
Abstract
It is commonly assumed that mitochondrial DNA (mtDNA) evolves at a faster rate than nuclear DNA (nuDNA) in animals. This has contributed to the popularity of mtDNA as a molecular marker in evolutionary studies. Analyzing 121 multilocus data sets and four phylogenomic data sets encompassing 4,676 species of animals, we demonstrate that the ratio of mitochondrial over nuclear mutation rate is highly variable among animal taxa. In nonvertebrates, such as insects and arachnids, the ratio of mtDNA over nuDNA mutation rate varies between 2 and 6, whereas it is above 20, on average, in vertebrates such as scaled reptiles and birds. Interestingly, this variation is sufficient to explain the previous report of a similar level of mitochondrial polymorphism, on average, between vertebrates and nonvertebrates, which was originally interpreted as reflecting the effect of pervasive positive selection. Our analysis rather indicates that the among-phyla homogeneity in within-species mtDNA diversity is due to a negative correlation between mtDNA per-generation mutation rate and effective population size, irrespective of the action of natural selection. Finally, we explore the variation in the absolute per-year mutation rate of both mtDNA and nuDNA using a reduced data set for which fossil calibration is available, and discuss the potential determinants of mutation rate variation across genomes and taxa. This study has important implications regarding DNA-based identification methods in predicting that mtDNA barcoding should be less reliable in nonvertebrates than in vertebrates.
Affiliation(s)
- Remi Allio
- ISEM, Univ. Montpellier, CNRS, IRD, EPHE, Montpellier, France
| | - Stefano Donega
- ISEM, Univ. Montpellier, CNRS, IRD, EPHE, Montpellier, France
| | - Nicolas Galtier
- ISEM, Univ. Montpellier, CNRS, IRD, EPHE, Montpellier, France
| | - Benoit Nabholz
- ISEM, Univ. Montpellier, CNRS, IRD, EPHE, Montpellier, France
| |
|
39
|
Springer MS, Gatesy J. Delimiting Coalescence Genes (C-Genes) in Phylogenomic Data Sets. Genes (Basel) 2018; 9:genes9030123. [PMID: 29495400 PMCID: PMC5867844 DOI: 10.3390/genes9030123] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Received: 12/01/2017] [Revised: 02/02/2018] [Accepted: 02/19/2018] [Indexed: 02/07/2023] Open
Abstract
Summary coalescence methods have emerged as a popular alternative for inferring species trees with large genomic datasets, because these methods explicitly account for incomplete lineage sorting. However, statistical consistency of summary coalescence methods is not guaranteed unless several model assumptions are true, including the critical assumption that recombination occurs freely among but not within coalescence genes (c-genes), which are the fundamental units of analysis for these methods. Each c-gene has a single branching history, and large sets of these independent gene histories should be the input for genome-scale coalescence estimates of phylogeny. By contrast, numerous studies have reported the results of coalescence analyses in which complete protein-coding sequences are treated as c-genes even though exons for these loci can span more than a megabase of DNA. Empirical estimates of recombination breakpoints suggest that c-genes may be much shorter, especially when large clades with many species are the focus of analysis. Although this idea has been challenged recently in the literature, the inverse relationship between c-gene size and increased taxon sampling in a dataset (the 'recombination ratchet') is a fundamental property of c-genes. For taxonomic groups characterized by genes with long intron sequences, complete protein-coding sequences are likely not valid c-genes and are inappropriate units of analysis for summary coalescence methods unless they occur in recombination deserts that are devoid of incomplete lineage sorting (ILS). Finally, it has been argued that coalescence methods are robust when the no-recombination within loci assumption is violated, but recombination must matter at some scale because ILS, a by-product of recombination, is the raison d'etre for coalescence methods. That is, extensive recombination is required to yield the large number of independently segregating c-genes used to infer a species tree. If coalescent methods are powerful enough to infer the correct species tree for difficult phylogenetic problems in the anomaly zone, where concatenation is expected to fail because of ILS, then there should be a decreasing probability of inferring the correct species tree using longer loci with many intralocus recombination breakpoints (i.e., increased levels of concatenation).
Affiliation(s)
- Mark S Springer
- Department of Evolution, Ecology, and Organismal Biology, University of California, Riverside, CA 92521, USA.
| | - John Gatesy
- Division of Vertebrate Zoology and Sackler Institute for Comparative Genomics, American Museum of Natural History, New York, NY 10024, USA.
| |
|
40
|
Scornavacca C, Galtier N. Incomplete Lineage Sorting in Mammalian Phylogenomics. Syst Biol 2018; 66:112-120. [PMID: 28173480 DOI: 10.1093/sysbio/syw082] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.7] [Received: 01/18/2016] [Revised: 03/25/2016] [Accepted: 09/04/2016] [Indexed: 01/05/2023] Open
Abstract
The impact of incomplete lineage sorting (ILS) on phylogenetic conflicts among genes, and the related issue of whether to account for ILS in species tree reconstruction, are matters of intense controversy. Here, focusing on full-genome data in placental mammals, we empirically test two assumptions underlying current usage of tree-building methods that account for ILS. We show that in this data set (i) distinct exons from a common gene do not share a common genealogy, and (ii) ILS is only a minor determinant of the existing phylogenetic conflict. These results shed new light on the relevance and conditions of applicability of ILS-aware methods in phylogenomic analyses of protein coding sequences.
Affiliation(s)
- Celine Scornavacca
- UMR 5554-Institute of Evolutionary Sciences, University Montpellier, CNRS, IRD, EPHE, Place E. Bataillon-CC64, Montpellier, France
| | - Nicolas Galtier
- UMR 5554-Institute of Evolutionary Sciences, University Montpellier, CNRS, IRD, EPHE, Place E. Bataillon-CC64, Montpellier, France
| |
|
41
|
Eldridge RA, Achmadi AS, Giarla TC, Rowe KC, Esselstyn JA. Geographic isolation and elevational gradients promote diversification in an endemic shrew on Sulawesi. Mol Phylogenet Evol 2018; 118:306-317. [DOI: 10.1016/j.ympev.2017.09.018] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 12/02/2016] [Revised: 08/08/2017] [Accepted: 09/25/2017] [Indexed: 01/06/2023]
|
42
|
Zou Z, Zhang J. Gene Tree Discordance Does Not Explain Away the Temporal Decline of Convergence in Mammalian Protein Sequence Evolution. Mol Biol Evol 2017; 34:1682-1688. [PMID: 28379570 DOI: 10.1093/molbev/msx109] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Indexed: 12/28/2022] Open
Abstract
Several authors reported lower frequencies of protein sequence convergence between more distantly related evolutionary lineages and attributed this trend to epistasis, which renders the acceptable amino acids at a site more different and convergence less likely in more divergent lineages. A recent primate study, however, suggested that this trend is at least partially and potentially entirely an artifact of gene tree discordance (GTD). Here, we demonstrate in a genome-wide data set from 17 mammals that the temporal trend remains (1) upon the control of the GTD level, (2) in genes whose genealogies are concordant with the species tree, and (3) for convergent changes, which are extremely unlikely to be caused by GTD. Similar results are observed in a comparable data set of 12 fruit flies in some but not all of these tests. We conclude that, at least in some cases, the temporal decline of convergence is genuine, reflecting an impact of epistasis on protein evolution.
Affiliation(s)
- Zhengting Zou
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI
| | - Jianzhi Zhang
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI
| |
|
43
|
Springer MS, Gatesy J. Pinniped Diphyly and Bat Triphyly: More Homology Errors Drive Conflicts in the Mammalian Tree. J Hered 2017; 109:297-307. [DOI: 10.1093/jhered/esx089] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 08/21/2017] [Accepted: 10/07/2017] [Indexed: 11/14/2022] Open
Affiliation(s)
- Mark S Springer
- Department of Evolution, Ecology, and Organismal Biology, University of California, Riverside, CA
| | - John Gatesy
- Division of Vertebrate Zoology, American Museum of Natural History, New York, NY
| |
|
44
|
Ivancevic AM, Kortschak RD, Bertozzi T, Adelson DL. LINEs between Species: Evolutionary Dynamics of LINE-1 Retrotransposons across the Eukaryotic Tree of Life. Genome Biol Evol 2016; 8:3301-3322. [PMID: 27702814 PMCID: PMC5203782 DOI: 10.1093/gbe/evw243] [Citation(s) in RCA: 51] [Impact Index Per Article: 5.7] [Indexed: 12/19/2022] Open
Abstract
LINE-1 (L1) retrotransposons are dynamic elements. They have the potential to cause great genomic change because of their ability to ‘jump’ around the genome and amplify themselves, resulting in the duplication and rearrangement of regulatory DNA. Active L1, in particular, are often thought of as tightly constrained, homologous and ubiquitous elements with well-characterized domain organization. For the past 30 years, model organisms have been used to define L1s as 6–8 kb sequences containing a 5′-UTR, two open reading frames working harmoniously in cis, and a 3′-UTR with a polyA tail. In this study, we demonstrate the remarkable and overlooked diversity of L1s via a comprehensive phylogenetic analysis of elements from over 500 species from widely divergent branches of the tree of life. The rapid and recent growth of L1 elements in mammalian species is juxtaposed against the diverse lineages found in other metazoans and plants. In fact, some of these previously unexplored mammalian species (e.g. snub-nosed monkey, minke whale) exhibit L1 retrotranspositional ‘hyperactivity’ far surpassing that of human or mouse. In contrast, non-mammalian L1s have become so varied that the current classification system seems to inadequately capture their structural characteristics. Our findings illustrate how both long-term inherited evolutionary patterns and random bursts of activity in individual species can significantly alter genomes, highlighting the importance of L1 dynamics in eukaryotes.
Affiliation(s)
- Atma M Ivancevic
- School of Biological Sciences, University of Adelaide, Adelaide, South Australia, Australia
| | - R Daniel Kortschak
- School of Biological Sciences, University of Adelaide, Adelaide, South Australia, Australia
| | - Terry Bertozzi
- School of Biological Sciences, University of Adelaide, Adelaide, South Australia, Australia; Evolutionary Biology Unit, South Australian Museum, Adelaide, South Australia, Australia
| | - David L Adelson
- School of Biological Sciences, University of Adelaide, Adelaide, South Australia, Australia
| |
|
45
|
Mason VC, Li G, Minx P, Schmitz J, Churakov G, Doronina L, Melin AD, Dominy NJ, Lim NTL, Springer MS, Wilson RK, Warren WC, Helgen KM, Murphy WJ. Genomic analysis reveals hidden biodiversity within colugos, the sister group to primates. Sci Adv 2016; 2:e1600633. [PMID: 27532052 PMCID: PMC4980104 DOI: 10.1126/sciadv.1600633] [Citation(s) in RCA: 43] [Impact Index Per Article: 4.8] [Received: 03/24/2016] [Accepted: 07/13/2016] [Indexed: 05/25/2023]
Abstract
Colugos are among the most poorly studied mammals despite their centrality to resolving supraordinal primate relationships. Two described species of these gliding mammals are the sole living members of the order Dermoptera, distributed throughout Southeast Asia. We generated a draft genome sequence for a Sunda colugo and a Philippine colugo reference alignment, and used these to identify colugo-specific genetic changes that were enriched in sensory and musculoskeletal-related genes that likely underlie their nocturnal and gliding adaptations. Phylogenomic analysis and catalogs of rare genomic changes overwhelmingly support the contested hypothesis that colugos are the sister group to primates (Primatomorpha), to the exclusion of treeshrews. We captured ~140 kb of orthologous sequence data from colugo museum specimens sampled across their range and identified large genetic differences between many geographically isolated populations that may result in a >300% increase in the number of recognized colugo species. Our results identify conservation units to mitigate future losses of this enigmatic mammalian order.
Affiliation(s)
- Victor C. Mason
- Department of Veterinary Integrative Biosciences, Interdisciplinary Program in Genetics, Texas A&M University, College Station, TX 77843, USA
| | - Gang Li
- Department of Veterinary Integrative Biosciences, Interdisciplinary Program in Genetics, Texas A&M University, College Station, TX 77843, USA
| | - Patrick Minx
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO 63108, USA
| | - Jürgen Schmitz
- Institute of Experimental Pathology (ZMBE), University of Münster, D-48149 Münster, Germany
| | - Gennady Churakov
- Institute of Experimental Pathology (ZMBE), University of Münster, D-48149 Münster, Germany
- Institute of Evolution and Biodiversity, University of Münster, D-48149 Münster, Germany
| | - Liliya Doronina
- Institute of Experimental Pathology (ZMBE), University of Münster, D-48149 Münster, Germany
| | | | - Nathaniel J. Dominy
- Departments of Anthropology and Biological Sciences, Dartmouth College, Hanover, NH 03755, USA
| | - Norman T-L. Lim
- Natural Sciences and Science Education, National Institute of Education, Nanyang Technological University, Singapore 637616, Singapore
- Lee Kong Chian Natural History Museum, National University of Singapore, Singapore 117377, Singapore
| | - Mark S. Springer
- Department of Biology, University of California, Riverside, Riverside, CA 92521, USA
| | - Richard K. Wilson
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO 63108, USA
| | - Wesley C. Warren
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO 63108, USA
| | - Kristofer M. Helgen
- Division of Mammals, Smithsonian Institution, National Museum of Natural History, Washington, DC 20013, USA
| | - William J. Murphy
- Department of Veterinary Integrative Biosciences, Interdisciplinary Program in Genetics, Texas A&M University, College Station, TX 77843, USA
| |
|
46
|
Ballesteros JA, Hormiga G. A New Orthology Assessment Method for Phylogenomic Data: Unrooted Phylogenetic Orthology. Mol Biol Evol 2016; 33:2117-34. [PMID: 27189539 DOI: 10.1093/molbev/msw069] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.9] [Indexed: 12/12/2022] Open
Abstract
Current sequencing technologies are making available unprecedented amounts of genetic data for a large variety of species including nonmodel organisms. Although many phylogenomic surveys spend considerable time finding orthologs from the wealth of sequence data, these results do not transcend the original study and after being processed for specific phylogenetic purposes these orthologs do not become stable orthology hypotheses. We describe a procedure to detect and document the phylogenetic distribution of orthologs allowing researchers to use this information to guide selection of loci best suited to test specific evolutionary questions. At the core of this pipeline is a new phylogenetic orthology method that is neither affected by the position of the root nor requires explicit assignment of outgroups. We discuss the properties of this new orthology assessment method and exemplify its utility for phylogenomics using a small insects dataset. In addition, we exemplify the pipeline to identify and document stable orthologs for the group of orb-weaving spiders (Araneoidea) using RNAseq data. The scripts used in this study, along with sample files and additional documentation, are available at https://github.com/ballesterus/UPhO.
Affiliation(s)
| | - Gustavo Hormiga
- Department of Biological Sciences, The George Washington University
| |
|
47
|
Figuet E, Nabholz B, Bonneau M, Mas Carrio E, Nadachowska-Brzyska K, Ellegren H, Galtier N. Life History Traits, Protein Evolution, and the Nearly Neutral Theory in Amniotes. Mol Biol Evol 2016; 33:1517-27. [DOI: 10.1093/molbev/msw033] [Citation(s) in RCA: 58] [Impact Index Per Article: 6.4] [Indexed: 12/14/2022] Open
|
48
|
Binet M, Gascuel O, Scornavacca C, Douzery EJP, Pardi F. Fast and accurate branch lengths estimation for phylogenomic trees. BMC Bioinformatics 2016; 17:23. [PMID: 26744021 PMCID: PMC4705742 DOI: 10.1186/s12859-015-0821-8] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 05/19/2015] [Accepted: 11/02/2015] [Indexed: 01/26/2023] Open
Abstract
Background: Branch lengths are an important attribute of phylogenetic trees, providing essential information for many studies in evolutionary biology. Yet, part of the current methodology to reconstruct a phylogeny from genomic information (namely supertree methods) focuses on the topology or structure of the phylogenetic tree, rather than the evolutionary divergences associated with it. Moreover, accurate methods to estimate branch lengths (typically based on probabilistic analysis of a concatenated alignment) are limited by large demands in memory and computing time, and may become impractical when the data sets are too large. Results: Here, we present a novel phylogenomic distance-based method, named ERaBLE (Evolutionary Rates and Branch Length Estimation), to estimate the branch lengths of a given reference topology, and the relative evolutionary rates of the genes employed in the analysis. ERaBLE uses as input data a potentially very large collection of distance matrices, where each matrix is obtained from a different genomic region, either directly from its sequence alignment or indirectly from a gene tree inferred from the alignment. Our experiments show that ERaBLE is very fast and fairly accurate when compared to other possible approaches for the same tasks. Specifically, it efficiently and accurately deals with large data sets, such as the OrthoMaM v8 database, composed of 6,953 exons from up to 40 mammals. Conclusions: ERaBLE may be used as a complement to supertree methods, or it may provide an efficient alternative to maximum likelihood analysis of concatenated alignments, to estimate branch lengths from phylogenomic data sets.
Affiliation(s)
- Manuel Binet
- Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier (LIRMM), CNRS, Université de Montpellier, Montpellier, France; Institut de Biologie Computationnelle, Montpellier, France; Institut des Sciences de l'Evolution de Montpellier, CNRS, IRD, EPHE, Université de Montpellier, France.
| | - Olivier Gascuel
- Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier (LIRMM), CNRS, Université de Montpellier, Montpellier, France; Institut de Biologie Computationnelle, Montpellier, France.
| | - Celine Scornavacca
- Institut de Biologie Computationnelle, Montpellier, France; Institut des Sciences de l'Evolution de Montpellier, CNRS, IRD, EPHE, Université de Montpellier, France.
| | - Emmanuel J P Douzery
- Institut des Sciences de l'Evolution de Montpellier, CNRS, IRD, EPHE, Université de Montpellier, France.
| | - Fabio Pardi
- Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier (LIRMM), CNRS, Université de Montpellier, Montpellier, France; Institut de Biologie Computationnelle, Montpellier, France.
| |
|
49
|
Levy Karin E, Rabin A, Ashkenazy H, Shkedy D, Avram O, Cartwright RA, Pupko T. Inferring Indel Parameters using a Simulation-based Approach. Genome Biol Evol 2015; 7:3226-38. [PMID: 26537226 PMCID: PMC4700945 DOI: 10.1093/gbe/evv212] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 11/24/2022] Open
Abstract
In this study, we present a novel methodology to infer indel parameters from multiple sequence alignments (MSAs) based on simulations. Our algorithm searches for the set of evolutionary parameters describing indel dynamics which best fits a given input MSA. In each step of the search, we use parametric bootstraps and the Mahalanobis distance to estimate how well a proposed set of parameters fits input data. Using simulations, we demonstrate that our methodology can accurately infer the indel parameters for a large variety of plausible settings. Moreover, using our methodology, we show that indel parameters substantially vary between three genomic data sets: Mammals, bacteria, and retroviruses. Finally, we demonstrate how our methodology can be used to simulate MSAs based on indel parameters inferred from real data sets.
Affiliation(s)
- Eli Levy Karin
- Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel-Aviv University, Tel-Aviv, Israel
| | - Avigayel Rabin
- Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel-Aviv University, Tel-Aviv, Israel
| | - Haim Ashkenazy
- Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel-Aviv University, Tel-Aviv, Israel
| | - Dafna Shkedy
- Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel-Aviv University, Tel-Aviv, Israel
| | - Oren Avram
- Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel-Aviv University, Tel-Aviv, Israel; The Blavatnik School of Computer Science, Tel-Aviv University, Tel-Aviv, Israel
| | - Reed A Cartwright
- The Biodesign Institute, Arizona State University, Tempe; School of Life Sciences, Arizona State University, Tempe
| | - Tal Pupko
- Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel-Aviv University, Tel-Aviv, Israel
| |
|
50
|
Nguyen LP, Galtier N, Nabholz B. Gene expression, chromosome heterogeneity and the fast-X effect in mammals. Biol Lett 2015; 11:20150010. [PMID: 25716091 DOI: 10.1098/rsbl.2015.0010] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Indexed: 01/02/2023] Open
Abstract
The higher rate of non-synonymous over synonymous substitutions (dN/dS) of the X chromosome compared with autosomes is often interpreted as a consequence of X hemizygosity. However, other factors, such as gene expression, are also known to vary between X and autosomes. Analysing 4800 orthologues in six mammals, we found that gene expression levels, associated with GC content, fully account for the variation in dN/dS between X and autosomes with no detectable effect of hemizygosity. We also report an extensive variance in dN/dS and gene expression between autosomes.
Affiliation(s)
- Linh-Phuong Nguyen
- Institut des Sciences de l'Evolution, CC64, Université Montpellier II, Place Eugène Bataillon, Montpellier cedex 5 34095, France
| | - Nicolas Galtier
- Institut des Sciences de l'Evolution, CC64, Université Montpellier II, Place Eugène Bataillon, Montpellier cedex 5 34095, France
| | - Benoit Nabholz
- Institut des Sciences de l'Evolution, CC64, Université Montpellier II, Place Eugène Bataillon, Montpellier cedex 5 34095, France
| |
|