1
Szasz-Green T, Shores K, Vanga V, Zacharias L, Lawton AK, Dapper AL. Comparative Phylogenetics Reveal Clade-specific Drivers of Recombination Rate Evolution Across Vertebrates. Mol Biol Evol 2025; 42:msaf100. [PMID: 40331240 PMCID: PMC12100477 DOI: 10.1093/molbev/msaf100] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2024] [Revised: 03/06/2025] [Accepted: 04/11/2025] [Indexed: 05/08/2025] Open
Abstract
Meiotic recombination is an integral cellular process, required for the production of viable gametes. Recombination rate is a fundamental genomic parameter, modulating genomic responses to selection. Our increasingly detailed understanding of its molecular underpinnings raises the prospect that we can gain insight into trait divergence by examining the molecular evolution of recombination genes from a pathway perspective, as in mammals, where protein-coding changes in later stages of the recombination pathway are connected to divergence in intra-clade recombination rate. Here, we leverage increased availability of avian and teleost genomes to reconstruct the evolution of the recombination pathway across two additional vertebrate clades: birds, which have higher and more variable rates of recombination and similar divergence times to mammals, and teleost fish, which have much deeper divergence times. Rates of molecular evolution of recombination genes are highly correlated between vertebrate clades and significantly elevated compared to control panels, suggesting that they experience similar selective pressures. Avian recombination genes are significantly more likely to exhibit signatures of positive selection than other clades, unrestricted to later stages of the pathway. Signatures of positive selection in genes linked to recombination rate variation in mammalian populations and those with signatures of positive selection across the avian phylogeny are highly correlated. In contrast, teleost fish recombination genes have significantly less evidence of positive selection despite high intra-clade recombination rate variability. Gaining clade-specific understanding of patterns of variation in recombination genes can elucidate drivers of recombination rate and thus, factors influencing genetic diversity, selection efficacy, and species divergence.
Affiliation(s)
- Taylor Szasz-Green
  - Department of Biological Sciences, Mississippi State University, Mississippi State, MS 39762, USA
- Katherynne Shores
  - Department of Biological Sciences, Mississippi State University, Mississippi State, MS 39762, USA
- Vineel Vanga
  - Department of Biological Sciences, Mississippi State University, Mississippi State, MS 39762, USA
- Luke Zacharias
  - Department of Biological Sciences, Mississippi State University, Mississippi State, MS 39762, USA
- Andrew K Lawton
  - Department of Biology, Appalachian State University, Boone, NC 28608, USA
- Amy L Dapper
  - Department of Biological Sciences, Mississippi State University, Mississippi State, MS 39762, USA

2
Robert Kolar M, Mitra D, Kobzarenko V. Efficient discovery of frequently co-occurring mutations in a sequence database with matrix factorization. PLoS Comput Biol 2025; 21:e1012391. [PMID: 40273414 DOI: 10.1371/journal.pcbi.1012391] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/13/2024] [Accepted: 04/08/2025] [Indexed: 04/26/2025] Open
Abstract
We have developed a robust method for efficiently tracking multiple co-occurring mutations in a sequence database. Evolution often hinges on the interaction of several mutations to produce significant phenotypic changes that lead to the proliferation of a variant. However, identifying numerous simultaneous mutations across a vast database of sequences poses a significant computational challenge. Our approach leverages a matrix factorization technique to automatically and efficiently pinpoint subsets of positions where co-mutations occur, appearing in a substantial number of sequences within the database. We validated our method using SARS-CoV-2 receptor-binding domains, comprising approximately seven hundred thousand sequences of the Spike protein, demonstrating superior performance compared to a reasonably exhaustive brute-force method. Furthermore, we explore the biological significance of the identified co-mutational positions (CMPs) and their potential impact on the virus's evolution and functionality, identifying key mutations in Delta and Omicron variants. This analysis underscores the significant role of identified CMPs in understanding the evolutionary trajectory. By tracking the "birth" and "death" of CMPs, we can elucidate the persistence and impact of specific groups of mutations across different viral strains, providing valuable insights into the virus's adaptability and thus possibly aiding vaccine design strategies.
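The general idea of finding co-mutating positions by factorizing a sequence-by-position matrix can be illustrated with a minimal sketch. The abstract does not say which factorization the authors use, so the code below substitutes a plain rank-1 factorization computed by power iteration. The toy matrix, the function names, and the 0.5 loading threshold are all assumptions made for illustration, not the published method.

```python
# Illustrative sketch only: rank-1 factorization of a binary mutation matrix
# by power iteration. This is NOT the authors' algorithm; the toy data,
# function names, and the 0.5 threshold are assumptions.

def leading_position_loadings(matrix, iterations=100):
    """Approximate the leading right singular vector of `matrix`
    (rows = sequences, columns = positions) by power iteration on M^T M."""
    n_positions = len(matrix[0])
    v = [1.0] * n_positions
    for _ in range(iterations):
        # u = M v: one score per sequence
        u = [sum(row[j] * v[j] for j in range(n_positions)) for row in matrix]
        # w = M^T u: one score per position
        w = [sum(matrix[i][j] * u[i] for i in range(len(matrix)))
             for j in range(n_positions)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:  # no mutations at all
            return w
        v = [x / norm for x in w]
    return v


def co_mutating_positions(matrix, threshold=0.5):
    """Positions whose loading is at least `threshold` times the largest one."""
    loadings = leading_position_loadings(matrix)
    top = max(loadings)
    if top == 0.0:
        return []
    return [j for j, x in enumerate(loadings) if x >= threshold * top]


def support(matrix, positions):
    """Number of sequences that carry a mutation at every listed position."""
    return sum(1 for row in matrix if all(row[j] == 1 for j in positions))


# Toy database: 8 sequences x 5 positions, 1 = mutated relative to a reference.
TOY = [
    [0, 1, 0, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 0, 1, 1],
    [1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]

positions = co_mutating_positions(TOY)
print(positions, support(TOY, positions))  # prints [1, 3] 5
```

In this toy case the dominant factor picks out positions 1 and 3, which are mutated together in five of the eight sequences. A single factor finds only one group, so recovering several co-mutation sets would need a higher-rank factorization or repeated deflation.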
Affiliation(s)
- Michael Robert Kolar
  - BiC Lab, Department of Electrical Engineering and Computer Science, Florida Institute of Technology, Melbourne, Florida, United States of America
- Debasis Mitra
  - BiC Lab, Department of Electrical Engineering and Computer Science, Florida Institute of Technology, Melbourne, Florida, United States of America
- Valerie Kobzarenko
  - BiC Lab, Department of Electrical Engineering and Computer Science, Florida Institute of Technology, Melbourne, Florida, United States of America

3
Verdonk H, Pivirotto A, Pavinato V, Hey J, Pond SLK. A New Comparative Framework for Estimating Selection on Synonymous Substitutions. Mol Biol Evol 2025; 42:msaf068. [PMID: 40129111 PMCID: PMC11979333 DOI: 10.1093/molbev/msaf068] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/02/2024] [Revised: 01/07/2025] [Accepted: 02/19/2025] [Indexed: 03/26/2025] Open
Abstract
Selection on synonymous codon usage is a well-known and widespread phenomenon, yet existing models often do not account for it or its effect on synonymous substitution rates. In this article, we develop and expand the capabilities of multiclass synonymous substitution (MSS) models, which account for such selection by partitioning synonymous substitutions into 2 or more classes and estimating a relative substitution rate for each class, while accounting for important confounders like mutation bias. We identify extensive heterogeneity among relative synonymous substitution rates in an empirical dataset of ∼12,000 gene alignments from 12 Drosophila species. We validate model performance using data simulated under a forward population genetic simulation, demonstrating that MSS models are robust to model misspecification. MSS rates are significantly correlated with other covariates of selection on codon usage (population-level polymorphism data and tRNA abundance data), suggesting that models can detect weak signatures of selection on codon usage. With the MSS model, we can now study selection on synonymous substitutions in diverse taxa, independent of any a priori assumptions about the forces driving that selection.
Affiliation(s)
- Hannah Verdonk
  - Department of Biology, Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA
- Alyssa Pivirotto
  - Department of Biology, Center for Computational Genetics and Genomics, Temple University, Philadelphia, PA, USA
- Vitor Pavinato
  - Department of Biology, Center for Computational Genetics and Genomics, Temple University, Philadelphia, PA, USA
- Jody Hey
  - Department of Biology, Center for Computational Genetics and Genomics, Temple University, Philadelphia, PA, USA
- Sergei L K Pond
  - Department of Biology, Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA

4
Selberg A, Clark NL, Sackton TB, Muse SV, Lucaci AG, Weaver S, Nekrutenko A, Chikina M, Pond SLK. Minus the Error: Testing for Positive Selection in the Presence of Residual Alignment Errors. bioRxiv 2025:2024.11.13.620707. [PMID: 39605407 PMCID: PMC11601313 DOI: 10.1101/2024.11.13.620707] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2024]
Abstract
Positive selection is an evolutionary process which increases the frequency of advantageous mutations because they confer a fitness benefit. Inferring the past action of positive selection on protein-coding sequences is fundamental for deciphering phenotypic diversity and the emergence of novel traits. With the advent of genome-wide comparative genomic datasets, researchers can analyze selection not only at the level of individual genes but also globally, delivering systems-level insights into evolutionary dynamics. However, genome-scale datasets are generated with automated pipelines and imperfect curation that does not eliminate all sequencing, annotation, and alignment errors. Positive selection inference methods are highly sensitive to such errors. We present BUSTED-E: a method designed to detect positive selection for amino acid diversification while concurrently identifying some alignment errors. This method builds on the flexible branch-site random effects model (BUSTED) for fitting distributions of dN/dS, with a critical modification: it incorporates an "error-sink" component to represent an abiological evolutionary regime. Using several genome-scale biological datasets that were extensively filtered using state-of-the-art automated alignment tools, we show that BUSTED-E identifies pervasive residual alignment errors, produces more realistic estimates of positive selection, reduces bias, and improves biological interpretation. The BUSTED-E model promises to be a more stringent filter to identify positive selection in genome-wide contexts, thus enabling further characterization and validation of the most biologically relevant cases.
Affiliation(s)
- Avery Selberg
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA
  - Department of Biology, Temple University, Philadelphia, PA, USA
- Nathan L Clark
  - Department of Biological Sciences, University of Pittsburgh, Pittsburgh, PA, USA
- Spencer V Muse
  - Bioinformatics Research Center, North Carolina State University, Raleigh, NC, USA
- Alexander G Lucaci
  - Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, USA
  - Weill Cornell Medicine, The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, New York, NY, USA
- Steven Weaver
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA
  - Department of Biology, Temple University, Philadelphia, PA, USA
- Anton Nekrutenko
  - Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA, USA
- Maria Chikina
  - Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA, USA
- Sergei L. Kosakovsky Pond
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA
  - Department of Biology, Temple University, Philadelphia, PA, USA

5
Verdonk H, Pivirotto A, Pavinato V, Hey J, Pond SLK. A new comparative framework for estimating selection on synonymous substitutions. bioRxiv 2025:2024.09.17.613331. [PMID: 39975314 PMCID: PMC11838523 DOI: 10.1101/2024.09.17.613331] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/21/2025]
Abstract
Selection on synonymous codon usage is a well known and widespread phenomenon, yet existing models often do not account for it or its effect on synonymous substitution rates. In this article, we develop and expand the capabilities of Multiclass Synonymous Substitution (MSS) models, which account for such selection by partitioning synonymous substitutions into two or more classes and estimating a relative substitution rate for each class, while accounting for important confounders like mutation bias. We identify extensive heterogeneity among relative synonymous substitution rates in an empirical dataset of ~12,000 gene alignments from twelve Drosophila species. We validate model performance using data simulated under a forward population genetic simulation, demonstrating that MSS models are robust to model misspecification. MSS rates are significantly correlated with other covariates of selection on codon usage (population-level polymorphism data and tRNA abundance data), suggesting that models can detect weak signatures of selection on codon usage. With the MSS model, we can now study selection on synonymous substitutions in diverse taxa, independent of any a priori assumptions about the forces driving that selection.
Affiliation(s)
- Hannah Verdonk
  - Institute for Genomics and Evolutionary Medicine, Department of Biology, Temple University, Philadelphia, Pennsylvania, USA
- Alyssa Pivirotto
  - Center for Computational Genetics and Genomics, Department of Biology, Temple University, Philadelphia, Pennsylvania, USA
- Vitor Pavinato
  - Center for Computational Genetics and Genomics, Department of Biology, Temple University, Philadelphia, Pennsylvania, USA
- Jody Hey
  - Center for Computational Genetics and Genomics, Department of Biology, Temple University, Philadelphia, Pennsylvania, USA
- Sergei LK Pond
  - Institute for Genomics and Evolutionary Medicine, Department of Biology, Temple University, Philadelphia, Pennsylvania, USA

6
Lucaci AG, Brew WE, Lamanna J, Selberg A, Carnevale V, Moore AR, Kosakovsky Pond SL. The evolution of mammalian Rem2: unraveling the impact of purifying selection and coevolution on protein function, and implications for human disorders. Frontiers in Bioinformatics 2024; 4:1381540. [PMID: 38978817 PMCID: PMC11228553 DOI: 10.3389/fbinf.2024.1381540] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2024] [Accepted: 05/28/2024] [Indexed: 07/10/2024] Open
Abstract
Rad And Gem-Like GTP-Binding Protein 2 (Rem2), a member of the RGK family of Ras-like GTPases, is implicated in Huntington's disease and Long QT Syndrome and is highly expressed in the brain and endocrine cells. We examine the evolutionary history of Rem2 identified in various mammalian species, focusing on the role of purifying selection and coevolution in shaping its sequence and protein structural constraints. Our analysis of Rem2 sequences across 175 mammalian species found evidence for strong purifying selection in 70% of non-invariant codon sites, which is characteristic of essential proteins that play critical roles in biological processes and is consistent with Rem2's role in the regulation of neuronal development and function. We inferred epistatic effects in 50 pairs of codon sites in Rem2, some of which are predicted to have deleterious effects on human health. Additionally, we reconstructed the ancestral evolutionary history of mammalian Rem2 using protein structure prediction of extinct and extant sequences, which revealed the dynamics of how substitutions that change the gene sequence of Rem2 can impact protein structure in variable regions while maintaining core functional mechanisms. By understanding the selective pressures and the protein and gene interactions that have shaped the sequence and structure of the Rem2 protein, we gain a stronger understanding of its biological and functional constraints.
Affiliation(s)
- Alexander G Lucaci
  - Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, United States
  - Weill Cornell Medicine, The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, New York, NY, United States
- William E Brew
  - Department of Biology, Temple University, Philadelphia, PA, United States
- Jason Lamanna
  - Department of Biology, Temple University, Philadelphia, PA, United States
  - Institute for Computational Molecular Science, Temple University, Philadelphia, PA, United States
- Avery Selberg
  - Department of Biology, Temple University, Philadelphia, PA, United States
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, United States
- Vincenzo Carnevale
  - Department of Biology, Temple University, Philadelphia, PA, United States
  - Institute for Computational Molecular Science, Temple University, Philadelphia, PA, United States
- Anna R Moore
  - Department of Biology, Temple University, Philadelphia, PA, United States
- Sergei L Kosakovsky Pond
  - Department of Biology, Temple University, Philadelphia, PA, United States
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, United States

7
Lucaci AG, Pond SLK. AOC: Analysis of Orthologous Collections - an application for the characterization of natural selection in protein-coding sequences. arXiv 2024:arXiv:2406.09522v1. [PMID: 38947939 PMCID: PMC11213150] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/02/2024]
Abstract
Motivation: Modern molecular sequence analysis increasingly relies on automated and robust software tools for interpretation, annotation, and biological insight. The Analysis of Orthologous Collections (AOC) application automates the identification of genomic sites and species/lineages influenced by natural selection in coding sequence analysis. AOC quantifies different types of selection: negative, diversifying or directional positive, or differential selection between groups of branches. We include all steps necessary to go from unaligned homologous sequences to complete results and interactive visualizations that are designed to aid in the useful interpretation and contextualization.
Results: We are motivated by a desire to make evolutionary analyses as simple as possible, and to close the disparity in the literature between genes which draw a significant amount of interest and those that are largely overlooked and underexplored. We believe that such underappreciated and understudied genetic datasets can hold rich biological information and offer substantial insights into the diverse patterns and processes of evolution, especially if domain experts are able to perform the analyses themselves.
Availability and implementation: A Snakemake [Mölder et al., 2021] application implementation is publicly available on GitHub at https://github.com/aglucaci/AnalysisOfOrthologousCollections and is accompanied by software documentation and a tutorial.
Affiliation(s)
- Alexander G Lucaci
  - Department of Physiology and Biophysics, Weill Cornell Medicine, Cornell University, New York, NY 10021, USA
  - The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY 10021, USA

8
Bowman J, Lynch VJ. Rapid evolution of genes with anti-cancer functions during the origins of large bodies and cancer resistance in elephants. bioRxiv 2024:2024.02.27.582135. [PMID: 38463968 PMCID: PMC10925141 DOI: 10.1101/2024.02.27.582135] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 03/12/2024]
Abstract
Elephants have emerged as a model system to study the evolution of body size and cancer resistance because, despite their immense size, they have a very low prevalence of cancer. Previous studies have found that duplication of tumor suppressors at least partly contributes to the evolution of anti-cancer cellular phenotypes in elephants. Still, many other mechanisms must have contributed to their augmented cancer resistance. Here, we use a suite of codon-based maximum-likelihood methods and a dataset of 13,310 protein-coding gene alignments from 261 Eutherian mammals to identify positively selected and rapidly evolving elephant genes. We found 496 genes (3.73% of alignments tested) with statistically significant evidence for positive selection and 660 genes (4.96% of alignments tested) that likely evolved rapidly in elephants. Positively selected and rapidly evolving genes are statistically enriched in gene ontology terms and biological pathways related to regulated cell death mechanisms, DNA damage repair, cell cycle regulation, epidermal growth factor receptor (EGFR) signaling, and immune functions, particularly neutrophil granules and degranulation. All of these biological factors are plausibly related to the evolution of cancer resistance. Thus, these positively selected and rapidly evolving genes are promising candidates for genes contributing to elephant-specific traits, including the evolution of molecular and cellular characteristics that enhance cancer resistance.
Affiliation(s)
- Jacob Bowman
  - Department of Biological Sciences, University at Buffalo, SUNY, 551 Cooke Hall, Buffalo, NY, 14260, USA
- Vincent J. Lynch
  - Department of Biological Sciences, University at Buffalo, SUNY, 551 Cooke Hall, Buffalo, NY, 14260, USA

9
Yan H, Hu Z, Thomas GWC, Edwards SV, Sackton TB, Liu JS. PhyloAcc-GT: A Bayesian Method for Inferring Patterns of Substitution Rate Shifts on Targeted Lineages Accounting for Gene Tree Discordance. Mol Biol Evol 2023; 40:msad195. [PMID: 37665177 PMCID: PMC10540510 DOI: 10.1093/molbev/msad195] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 12/23/2022] [Revised: 08/15/2023] [Accepted: 09/01/2023] [Indexed: 09/05/2023] Open
Abstract
An important goal of evolutionary genomics is to identify genomic regions whose substitution rates differ among lineages. For example, genomic regions experiencing accelerated molecular evolution in some lineages may provide insight into links between genotype and phenotype. Several comparative genomics methods have been developed to identify genomic accelerations between species, including a Bayesian method called PhyloAcc, which models shifts in substitution rate in multiple target lineages on a phylogeny. However, few methods consider the possibility of discordance between the trees of individual loci and the species tree due to incomplete lineage sorting, which might cause false positives. Here, we present PhyloAcc-GT, which extends PhyloAcc by modeling gene tree heterogeneity. Given a species tree, we adopt the multispecies coalescent model as the prior distribution of gene trees, use Markov chain Monte Carlo (MCMC) for inference, and design novel MCMC moves to sample gene trees efficiently. Through extensive simulations, we show that PhyloAcc-GT outperforms PhyloAcc and other methods in identifying target lineage-specific accelerations and detecting complex patterns of rate shifts, and is robust to specification of population size parameters. PhyloAcc-GT is usually more conservative than PhyloAcc in calling convergent rate shifts because it identifies more accelerations on ancestral than on terminal branches. We apply PhyloAcc-GT to two examples of convergent evolution: flightlessness in ratites and marine mammal adaptations, and show that PhyloAcc-GT is a robust tool to identify shifts in substitution rate associated with specific target lineages while accounting for incomplete lineage sorting.
Affiliation(s)
- Han Yan
  - Department of Statistics, Harvard University, Cambridge, MA, USA
- Zhirui Hu
  - Department of Statistics, Harvard University, Cambridge, MA, USA
  - Gladstone Institute of Data Science and Biotechnology, San Francisco, CA, USA
- Scott V Edwards
  - Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Jun S Liu
  - Department of Statistics, Harvard University, Cambridge, MA, USA

10
Lucaci AG, Zehr JD, Enard D, Thornton JW, Kosakovsky Pond SL. Evolutionary Shortcuts via Multinucleotide Substitutions and Their Impact on Natural Selection Analyses. Mol Biol Evol 2023; 40:msad150. [PMID: 37395787 PMCID: PMC10336034 DOI: 10.1093/molbev/msad150] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 02/27/2023] [Revised: 06/15/2023] [Accepted: 06/26/2023] [Indexed: 07/04/2023] Open
Abstract
Inference and interpretation of evolutionary processes, in particular of the types and targets of natural selection affecting coding sequences, are critically influenced by the assumptions built into statistical models and tests. If certain aspects of the substitution process (even when they are not of direct interest) are presumed absent or are modeled with too crude a simplification, estimates of key model parameters can become biased, often systematically, and lead to poor statistical performance. Previous work established that failing to accommodate multinucleotide (or multihit, MH) substitutions strongly biases dN/dS-based inference towards false-positive inferences of diversifying episodic selection, as does failing to model variation in the rate of synonymous substitution (SRV) among sites. Here, we develop an integrated analytical framework and software tools to simultaneously incorporate these sources of evolutionary complexity into selection analyses. We found that both MH and SRV are ubiquitous in empirical alignments, and incorporating them has a strong effect on whether or not positive selection is detected (1.4-fold reduction) and on the distributions of inferred evolutionary rates. With simulation studies, we show that this effect is not attributable to reduced statistical power caused by using a more complex model. After a detailed examination of 21 benchmark alignments and a new high-resolution analysis showing which parts of the alignment provide support for positive selection, we show that MH substitutions occurring along shorter branches in the tree explain a significant fraction of discrepant results in selection detection. Our results add to the growing body of literature which examines decades-old modeling assumptions (including MH) and finds them to be problematic for comparative genomic data analysis.
Because multinucleotide substitutions have a significant impact on natural selection detection even at the level of an entire gene, we recommend that selection analyses of this type consider their inclusion as a matter of routine. To facilitate this procedure, we developed, implemented, and benchmarked a simple and well-performing model testing selection detection framework able to screen an alignment for positive selection with two biologically important confounding processes: site-to-site synonymous rate variation, and multinucleotide instantaneous substitutions.
Affiliation(s)
- Alexander G Lucaci
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA
- Jordan D Zehr
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA
- David Enard
  - Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, Arizona
- Joseph W Thornton
  - Department of Human Genetics, University of Chicago, Chicago, Illinois
  - Department of Ecology & Evolution, University of Chicago, Chicago, Illinois

11
Silva SR, Miranda VFO, Michael TP, Płachno BJ, Matos RG, Adamec L, Pond SLK, Lucaci AG, Pinheiro DG, Varani AM. The phylogenomics and evolutionary dynamics of the organellar genomes in carnivorous Utricularia and Genlisea species (Lentibulariaceae). Mol Phylogenet Evol 2023; 181:107711. [PMID: 36693533 DOI: 10.1016/j.ympev.2023.107711] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/03/2022] [Revised: 01/13/2023] [Accepted: 01/18/2023] [Indexed: 01/22/2023]
Abstract
Utricularia and Genlisea are highly specialized carnivorous plants whose phylogenetic history has been poorly explored using phylogenomic methods. Additional sampling and genomic data are needed to advance our phylogenetic and taxonomic knowledge of this group of plants. Within a comparative framework, we present a characterization of plastome (PT) and mitochondrial (MT) genes of 26 Utricularia and six Genlisea species, with representatives of all subgenera and growth habits. All PT genomes maintain similar gene content, showing minor variation across the genes located between the PT junctions. One exception is a major variation related to different patterns in the presence and absence of ndh genes in the small single copy region, which appears to follow the phylogenetic history of the species rather than their lifestyle. All MT genomes exhibit similar gene content, with most differences related to lineage-specific pseudogenes. We find evidence for episodic positive diversifying selection in PT and for most of the Utricularia MT genes that may be related to the current hypothesis that bladderworts' nuclear DNA is under constant ROS oxidative DNA damage and unusual DNA repair mechanisms, or even low-fidelity polymerases that bypass lesions, which could also be affecting the organellar genomes. Finally, both PT and MT phylogenetic trees were well resolved and highly supported, providing a congruent phylogenomic hypothesis for the Utricularia and Genlisea clade given the study sampling.
Collapse
Affiliation(s)
- Saura R Silva
- UNESP - São Paulo State University, School of Agricultural and Veterinarian Sciences, Department of Agricultural and Environmental Biotechnology, Campus Jaboticabal, CEP 14884-900 SP, Brazil.
- Vitor F O Miranda
- UNESP - São Paulo State University, School of Agricultural and Veterinarian Sciences, Department of Biology, Laboratory of Plant Systematics, Campus Jaboticabal, CEP 14884-900 SP, Brazil.
- Todd P Michael
- Plant Molecular and Cellular Biology Laboratory, The Salk Institute for Biological Studies, La Jolla, CA 92037, USA.
- Bartosz J Płachno
- Department of Plant Cytology and Embryology, Institute of Botany, Faculty of Biology, Jagiellonian University in Kraków, Gronostajowa 9 St., 30-387 Cracow, Poland.
- Ramon G Matos
- UNESP - São Paulo State University, School of Agricultural and Veterinarian Sciences, Department of Biology, Laboratory of Plant Systematics, Campus Jaboticabal, CEP 14884-900 SP, Brazil.
- Lubomir Adamec
- Department of Experimental and Functional Morphology, Institute of Botany CAS, Dukelská 135, CZ-379 01 Třeboň, Czech Republic.
- Sergei L K Pond
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA 19122, USA.
- Alexander G Lucaci
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA 19122, USA.
- Daniel G Pinheiro
- UNESP - São Paulo State University, School of Agricultural and Veterinarian Sciences, Department of Agricultural and Environmental Biotechnology, Campus Jaboticabal, CEP 14884-900 SP, Brazil.
- Alessandro M Varani
- UNESP - São Paulo State University, School of Agricultural and Veterinarian Sciences, Department of Agricultural and Environmental Biotechnology, Campus Jaboticabal, CEP 14884-900 SP, Brazil.
|
12
|
Lucaci AG, Zehr JD, Shank SD, Bouvier D, Ostrovsky A, Mei H, Nekrutenko A, Martin DP, Kosakovsky Pond SL. RASCL: Rapid Assessment of Selection in CLades through molecular sequence analysis. PLoS One 2022; 17:e0275623. [PMID: 36322581 PMCID: PMC9629619 DOI: 10.1371/journal.pone.0275623] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 03/03/2022] [Accepted: 09/20/2022] [Indexed: 11/06/2022] Open
Abstract
An important unmet need revealed by the COVID-19 pandemic is the near-real-time identification of potentially fitness-altering mutations within rapidly growing SARS-CoV-2 lineages. Although powerful molecular sequence analysis methods are available to detect and characterize patterns of natural selection within modestly sized gene-sequence datasets, the computational complexity of these methods and their sensitivity to sequencing errors render them effectively inapplicable in large-scale genomic surveillance contexts. Motivated by the need to analyze new lineage evolution in near-real time using large numbers of genomes, we developed the Rapid Assessment of Selection within CLades (RASCL) pipeline. RASCL applies state-of-the-art phylogenetic comparative methods to evaluate selective processes acting at individual codon sites and across whole genes. RASCL is scalable and produces regular, automatically updated, lineage-specific selection analysis reports, even for lineages that include tens or hundreds of thousands of sampled genome sequences. Key to this performance is (i) generation of automatically subsampled high-quality datasets of gene/ORF sequences drawn from a selected "query" viral lineage; (ii) contextualization of these query sequences in codon alignments that include high-quality "background" sequences representative of global SARS-CoV-2 diversity; and (iii) the extensive parallelization of a suite of computationally intensive selection analysis tests. Within hours of being deployed to analyze a novel rapidly growing lineage of interest, RASCL will begin yielding JavaScript Object Notation (JSON)-formatted reports that can be either imported into third-party analysis software or explored in standard web browsers using the premade RASCL interactive data visualization dashboard. By enabling the rapid detection of genome sites evolving under different selective regimes, RASCL is well-suited for near-real-time monitoring of the population-level selective processes that will likely underlie the emergence of future variants of concern in measurably evolving pathogens with extensive genomic surveillance.
Affiliation(s)
- Alexander G. Lucaci
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, Pennsylvania, United States of America
- Jordan D. Zehr
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, Pennsylvania, United States of America
- Stephen D. Shank
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, Pennsylvania, United States of America
- Dave Bouvier
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA, United States of America
- Alexander Ostrovsky
- Krieger School of Arts and Sciences, Johns Hopkins University, Baltimore, MD, United States of America
- Han Mei
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA, United States of America
- Anton Nekrutenko
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA, United States of America
- Darren P. Martin
- Division of Computational Biology, Department of Integrative Biomedical Sciences, Institute of Infectious Diseases and Molecular Medicine, University of Cape Town, Cape Town, South Africa
- Sergei L. Kosakovsky Pond
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, Pennsylvania, United States of America
|
13
|
Lucaci AG, Notaras MJ, Kosakovsky Pond SL, Colak D. The evolution of BDNF is defined by strict purifying selection and prodomain spatial coevolution, but what does it mean for human brain disease? Transl Psychiatry 2022; 12:258. [PMID: 35732627 PMCID: PMC9217794 DOI: 10.1038/s41398-022-02021-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2022] [Revised: 05/24/2022] [Accepted: 06/07/2022] [Indexed: 11/09/2022] Open
Abstract
Brain-Derived Neurotrophic Factor (BDNF) is an essential mediator of brain assembly, development, and maturation. BDNF has been implicated in a variety of brain disorders such as neurodevelopmental disorders (e.g., autism spectrum disorder), neuropsychiatric disorders (e.g., anxiety, depression, PTSD, and schizophrenia), and various neurodegenerative disorders (e.g., Parkinson's, Alzheimer's, etc.). To better understand the role of BDNF in disease, we sought to define the evolution of BDNF within Mammalia. We conducted sequence alignment and phylogenetic reconstruction of BDNF across a diverse selection of >160 mammalian species spanning ~177 million years of evolution. The selective evolutionary change was examined via several independent computational models of codon evolution including FEL (pervasive diversifying selection), MEME (episodic selection), and BGM (structural coevolution of sites within a single molecule). We report strict purifying selection in the main functional domain of BDNF (NGF domain, essentially comprising the mature BDNF protein). Additionally, we discover six sites in our homologous alignment which are under episodic selection in early regulatory regions (i.e. the prodomain) and 23 pairs of coevolving sites that are distributed across the entirety of BDNF. Coevolving BDNF sites exhibited complex spatial relationships and geometric features including triangular relations, acyclic graph networks, double-linked sites, and triple-linked sites, although the most notable pattern to emerge was that changes in the mature region of BDNF tended to coevolve along with sites in the prodomain. Thus, we propose that the discovery of both local and distal sites of coevolution likely reflects 'evolutionary fine-tuning' of BDNF's underlying regulation and function in mammals. 
This tracks with the observation that BDNF's mature domain (which encodes mature BDNF protein) is largely conserved, while the prodomain (which is linked to regulation and its own unique functionality) exhibits more pervasive and diversifying evolutionary selection. That said, negative purifying selection also occurs in BDNF's prodomain, which indicates that this region contains critical sites of sensitivity and partially explains its disease relevance (via Val66Met and other prodomain variants). Taken together, these computational evolutionary analyses provide important context as to the origins and sensitivity of genetic changes within BDNF that may help to deconvolute the role of BDNF polymorphisms in human brain disorders.
Affiliation(s)
- Alexander G. Lucaci
- Institute for Genomics and Evolutionary Medicine, Science & Education Research Center, Temple University, Philadelphia, PA, USA
- Michael J. Notaras
- Center for Neurogenetics, Brain & Mind Research Institute, Weill Medical College, Cornell University, New York, New York, USA
- Sergei L. Kosakovsky Pond
- Institute for Genomics and Evolutionary Medicine, Science & Education Research Center, Temple University, Philadelphia, PA, USA
- Dilek Colak
- Center for Neurogenetics, Brain & Mind Research Institute, Weill Medical College, Cornell University, New York, New York, USA; Gale and Ira Drukier Institute for Children's Health, Weill Cornell Medical College, Cornell University, New York, NY, USA.
|
14
|
Freitas L, Nery MF. Positive selection in multiple salivary gland proteins of Anophelinae reveals potential targets for vector control. Infect Genet Evol 2022; 100:105271. [PMID: 35339698 DOI: 10.1016/j.meegid.2022.105271] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/18/2021] [Revised: 03/11/2022] [Accepted: 03/14/2022] [Indexed: 06/14/2023]
Abstract
Anopheles is a genus belonging to the Culicidae family, which has great medical importance due to its role as a vector of Plasmodium, the causative agent of malaria. Considerable attention has been given to salivary gland proteins (SGPs) in Anopheles functional genomics. This class of proteins is essential to blood-feeding behavior, as they have vasodilatory and anti-clotting properties. Recently, a comprehensive review on Anopheles SGPs was performed; however, the authors did not deeply explore the adaptive molecular evolution of these genes. In this context, this work aimed to perform a more detailed analysis of the adaptive molecular evolution of SGPs in Anopheles, carrying out positive selection and gene family evolution analysis on 824 SGPs. Our results show that most SGPs have positively selected codon sites that can be used as targets in developing new strategies for vector control and that younger SGPs evolve at a faster rate than older SGPs. Notably, we could not find any evidence of an accelerated shift in SGPs' rates of gene gain and loss compared with other proteins, as suggested in previous works.
Affiliation(s)
- Lucas Freitas
- Laboratório de Genômica Evolutiva, Departamento de Genética, Evolução, Microbiologia e Imunologia, Universidade Estadual de Campinas, Campinas, São Paulo, Brazil.
- Mariana F Nery
- Laboratório de Genômica Evolutiva, Departamento de Genética, Evolução, Microbiologia e Imunologia, Universidade Estadual de Campinas, Campinas, São Paulo, Brazil.
|
15
|
Steward RA, de Jong MA, Oostra V, Wheat CW. Alternative splicing in seasonal plasticity and the potential for adaptation to environmental change. Nat Commun 2022; 13:755. [PMID: 35136048 PMCID: PMC8825856 DOI: 10.1038/s41467-022-28306-8] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 06/08/2021] [Accepted: 01/19/2022] [Indexed: 12/15/2022] Open
Abstract
Seasonal plasticity is accomplished via tightly regulated developmental cascades that translate environmental cues into trait changes. Little is known about how alternative splicing and other posttranscriptional molecular mechanisms contribute to plasticity or how these mechanisms impact how plasticity evolves. Here, we use transcriptomic and genomic data from the butterfly Bicyclus anynana, a model system for seasonal plasticity, to compare the extent of differential expression and splicing and test how these axes of transcriptional plasticity differ in their potential for evolutionary change. Between seasonal morphs, we find that differential splicing affects a smaller but functionally unique set of genes compared to differential expression. Further, we find strong support for the novel hypothesis that spliced genes are more susceptible than differentially expressed genes to erosion of genetic variation due to selection on seasonal plasticity. Our results suggest that splicing plasticity is especially likely to experience genetic constraints that could affect the potential of wild populations to respond to rapidly changing environments.
Affiliation(s)
- Vicencio Oostra
- Institute of Infection, Veterinary and Ecological Sciences, University of Liverpool, Liverpool, UK
|
16
|
MacLean OA, Lytras S, Weaver S, Singer JB, Boni MF, Lemey P, Kosakovsky Pond SL, Robertson DL. Natural selection in the evolution of SARS-CoV-2 in bats created a generalist virus and highly capable human pathogen. PLoS Biol 2021; 19:e3001115. [PMID: 33711012 PMCID: PMC7990310 DOI: 10.1371/journal.pbio.3001115] [Citation(s) in RCA: 126] [Impact Index Per Article: 31.5] [Received: 10/21/2020] [Revised: 03/24/2021] [Accepted: 01/25/2021] [Indexed: 02/08/2023] Open
Abstract
Virus host shifts are generally associated with novel adaptations to exploit the cells of the new host species optimally. Surprisingly, Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) has apparently required little to no significant adaptation to humans from the start of the Coronavirus Disease 2019 (COVID-19) pandemic through October 2020. Here we assess the types of natural selection taking place in Sarbecoviruses in horseshoe bats versus the early SARS-CoV-2 evolution in humans. While there is moderate evidence of diversifying positive selection in SARS-CoV-2 in humans, it is limited to the early phase of the pandemic, and purifying selection is much weaker in SARS-CoV-2 than in related bat Sarbecoviruses. In contrast, our analysis detects evidence for significant positive episodic diversifying selection acting at the base of the bat virus lineage SARS-CoV-2 emerged from, accompanied by an adaptive depletion in CpG composition presumed to be linked to the action of antiviral mechanisms in these ancestral bat hosts. The closest bat virus to SARS-CoV-2, RmYN02 (sharing a common ancestor around 1976), is a recombinant with a structure that includes differential CpG content in Spike, which is clear evidence of coinfection and evolution in bats without involvement of other species. While an undiscovered "facilitating" intermediate species cannot be discounted, collectively, our results support the progenitor of SARS-CoV-2 being capable of efficient human-human transmission as a consequence of its adaptive evolutionary history in bats, not humans, which created a relatively generalist virus.
Affiliation(s)
- Oscar A. MacLean
- MRC-University of Glasgow Centre for Virus Research, Glasgow, United Kingdom
- Spyros Lytras
- MRC-University of Glasgow Centre for Virus Research, Glasgow, United Kingdom
- Steven Weaver
- Temple University, Institute for Genomics and Evolutionary Medicine, Philadelphia, Pennsylvania, United States of America
- Joshua B. Singer
- MRC-University of Glasgow Centre for Virus Research, Glasgow, United Kingdom
- Maciej F. Boni
- Center for Infectious Disease Dynamics, Department of Biology, Pennsylvania State University, University Park, Pennsylvania, United States of America
- Philippe Lemey
- Department of Microbiology, Immunology and Transplantation, Rega Institute, KU Leuven, Leuven, Belgium
- Sergei L. Kosakovsky Pond
- Temple University, Institute for Genomics and Evolutionary Medicine, Philadelphia, Pennsylvania, United States of America
- David L. Robertson
- MRC-University of Glasgow Centre for Virus Research, Glasgow, United Kingdom
|