1. Martí-Gómez C, Zhou J, Chen WC, Kinney JB, McCandlish DM. Inference and visualization of complex genotype-phenotype maps with gpmap-tools. bioRxiv 2025:2025.03.09.642267. [PMID: 40161830 PMCID: PMC11952336 DOI: 10.1101/2025.03.09.642267] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/02/2025]
Abstract
Multiplex assays of variant effect (MAVEs) allow the functional characterization of an unprecedented number of sequence variants in both gene regulatory regions and protein coding sequences. This has enabled the study of nearly complete combinatorial libraries of mutational variants and revealed the widespread influence of higher-order genetic interactions that arise when multiple mutations are combined. However, the lack of appropriate tools for exploratory analysis of these high-dimensional data limits our overall understanding of the main qualitative properties of complex genotype-phenotype maps. To fill this gap, we have developed gpmap-tools (https://github.com/cmarti/gpmap-tools), a Python library that integrates Gaussian process models for inference, phenotypic imputation, and error estimation from incomplete and noisy MAVE data and collections of natural sequences, together with methods for summarizing patterns of higher-order epistasis and non-linear dimensionality reduction techniques that allow visualization of genotype-phenotype maps containing up to millions of genotypes. Here, we used gpmap-tools to study the genotype-phenotype map of the Shine-Dalgarno sequence, a motif that modulates binding of the 16S rRNA to the 5' untranslated region (UTR) of mRNAs through base pair complementarity during translation initiation in prokaryotes. We inferred full combinatorial landscapes containing 262,144 different sequences from the sequences of 5,311 5'UTRs in the E. coli genome and from experimental MAVE data. Visualizations of the inferred landscapes were largely consistent with each other, and unveiled a simple molecular mechanism underlying the highly epistatic genotype-phenotype map of the Shine-Dalgarno sequence.
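As a quick check on the landscape size quoted in this abstract: 262,144 equals 4^9, which is consistent with a complete sequence space of 9 nucleotides over the four RNA bases. The length of 9 is inferred here from that count and is not stated in the abstract. The sketch below uses only the Python standard library and is not the gpmap-tools API; the function names are hypothetical.

```python
from itertools import islice, product

ALPHABET = "ACGU"  # the four RNA nucleotides
SEQ_LENGTH = 9     # assumption: inferred from 262,144 == 4**9


def n_genotypes(alphabet_size: int, length: int) -> int:
    """Size of a complete combinatorial sequence space."""
    return alphabet_size ** length


def iter_genotypes(alphabet: str, length: int):
    """Lazily yield every sequence of the given length over the alphabet."""
    for letters in product(alphabet, repeat=length):
        yield "".join(letters)


total = n_genotypes(len(ALPHABET), SEQ_LENGTH)
first_three = list(islice(iter_genotypes(ALPHABET, SEQ_LENGTH), 3))

print(total)        # 262144
print(first_three)  # ['AAAAAAAAA', 'AAAAAAAAC', 'AAAAAAAAG']
```

Iterating lazily avoids building all sequences in memory at once, which matters more as the alphabet or sequence length grows.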
Affiliation(s)
- Carlos Martí-Gómez
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724
- Juannan Zhou
  - Department of Biology, University of Florida, Gainesville, FL, 32611
- Wei-Chia Chen
  - Department of Physics, National Chung Cheng University, Chiayi 62102, Taiwan, Republic of China
- Justin B Kinney
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724
- David M McCandlish
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724
2. Malekpour SA, Kalirad A, Majidian S. Inferring the Selective History of CNVs Using a Maximum Likelihood Model. Genome Biol Evol 2025; 17:evaf050. [PMID: 40100752 PMCID: PMC11950529 DOI: 10.1093/gbe/evaf050] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/15/2024] [Revised: 02/27/2025] [Accepted: 03/13/2025] [Indexed: 03/20/2025] Open
Abstract
Copy number variations (CNVs), structural variations generated by deletion and/or duplication that result in a change in DNA dosage, are prevalent in nature. CNVs can drastically affect the phenotype of an organism and have been shown both to be involved in genetic disorders and to serve as raw material in adaptive evolution. Unlike single-nucleotide variations, the often large and varied effects of CNVs on phenotype hinder our ability to infer their selective advantage from population genetics data. Here, we present a likelihood-based approach, dubbed PoMoCNV (POlymorphism-aware phylogenetic MOdel for CNVs), that estimates evolutionary parameters such as mutation rates among different copy numbers and relative fitness loss per copy deletion at a genomic locus based on population genetics data. As a case study, we analyze the genomics data of 40 strains of Caenorhabditis elegans, representing four different populations. We take advantage of the data on chromatin accessibility to interpret the mutation rate and fitness of copy numbers, as inferred by PoMoCNV, specifically in open or closed chromatin loci. We further test the reliability of PoMoCNV by estimating the evolutionary parameters of CNVs for mutation-accumulation experiments in C. elegans with varying levels of genetic drift.
Affiliation(s)
- Seyed Amir Malekpour
  - School of Biological Sciences, Institute for Research in Fundamental Sciences (IPM), Tehran 19395-5746, Iran
- Ata Kalirad
  - Department for Integrative Evolutionary Biology, Max Planck Institute for Biology Tübingen, Tübingen 72076, Germany
- Sina Majidian
  - SIB Swiss Institute of Bioinformatics, Lausanne 1015, Switzerland
  - Department of Computational Biology, University of Lausanne, Lausanne 1015, Switzerland
3. Ma K, Yang X, Mao Y. Advancing evolutionary medicine with complete primate genomes and advanced biotechnologies. Trends Genet 2025; 41:201-217. [PMID: 39627062 DOI: 10.1016/j.tig.2024.11.001] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/26/2024] [Revised: 11/03/2024] [Accepted: 11/06/2024] [Indexed: 03/06/2025]
Abstract
Evolutionary medicine, which integrates evolutionary biology and medicine, significantly enhances our understanding of human traits and disease susceptibility. However, previous studies in this field have often focused on single-nucleotide variants due to technological limitations in characterizing complex genomic regions, hindering comprehensive analysis of their evolutionary origins and clinical significance. In this review, we summarize recent advancements in complete telomere-to-telomere (T2T) primate genomes and other primate resources, and illustrate how these resources facilitate research on complex regions. We focus on several biomedically relevant regions to examine the relationship between primate genome evolution and human diseases. We also highlight the potential of high-throughput functional genomic technologies for assessing candidate loci. Finally, we discuss future directions for primate research within the context of evolutionary medicine.
Affiliation(s)
- Kaiyue Ma
  - Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
- Xiangyu Yang
  - Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
- Yafei Mao
  - Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
  - Center for Genomic Research, International Institutes of Medicine, Fourth Affiliated Hospital, Zhejiang University, Yiwu, Zhejiang, China
4. Nelson MG, Talavera D. Identification of coevolving positions by ancestral reconstruction. Commun Biol 2025; 8:329. [PMID: 40021815 PMCID: PMC11871020 DOI: 10.1038/s42003-025-07676-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2024] [Accepted: 02/05/2025] [Indexed: 03/03/2025] Open
Abstract
Coevolution within proteins occurs when changes in one position affect the selective pressure in another position to preserve the protein structure or function. The identification of coevolving positions within proteins remains contentious, with most methods disregarding the phylogenetic information. Here, we present a time-efficient approach for detecting coevolving pairs, which is almost perfect in terms of precision and specificity. It is based on maximum parsimony-based ancestral reconstruction followed by the identification of pairs with a depletion of separate changes when compared to their number of concurrent changes. Our analysis of a previously characterised biological dataset shows that the coevolving pairs that we identified tend to be close in the protein sequence and structure, slightly less solvent exposed and have a higher mutation rate. We also show how the ancestral reconstruction can be used to detect favourable and unfavourable amino acid combinations. Altogether, we demonstrate how this approach is essential for identifying pairs of positions with weak covariation patterns.
Affiliation(s)
- Michael G Nelson
  - Division of Cardiovascular Sciences, School of Medical Sciences, The University of Manchester, Oxford Road, Manchester, UK
- David Talavera
  - Division of Cardiovascular Sciences, School of Medical Sciences, The University of Manchester, Oxford Road, Manchester, UK
5. Duan B, Qiu C, Sze SH, Kaplan C. Widespread epistasis shapes RNA Polymerase II active site function and evolution. bioRxiv 2025:2023.02.27.530048. [PMID: 36909581 PMCID: PMC10002619 DOI: 10.1101/2023.02.27.530048] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 03/04/2023]
Abstract
Multi-subunit RNA Polymerases (msRNAPs) are responsible for transcription in all kingdoms of life. These enzymes rely on dynamic, highly conserved active site domains such as the so-called "trigger loop" (TL) to accomplish steps in the transcription cycle. Mutations in the RNA polymerase II (Pol II) TL confer a spectrum of biochemical and genetic phenotypes that suggest two main classes, which decrease or increase catalysis or other nucleotide addition cycle (NAC) events. The Pol II active site relies on networks of residue interactions to function and mutations likely perturb these networks in ways that may alter mechanisms. We have undertaken a structural genetics approach to reveal residue interactions within and surrounding the Pol II TL - determining its "interaction landscape" - by deep mutational scanning in Saccharomyces cerevisiae Pol II. This analysis reveals connections between TL residues and surrounding domains, demonstrating that TL function is tightly coupled to its specific enzyme context.
Affiliation(s)
- Bingbing Duan
  - Department of Biological Sciences, University of Pittsburgh, Pittsburgh, PA 15260, USA
- Chenxi Qiu
  - Department of Genetics, Harvard Medical School, Boston, MA 02215, USA
- Sing-Hoi Sze
  - Department of Computer Science and Engineering, Texas A&M University, College Station, TX 77843, USA
  - Department of Biochemistry & Biophysics, Texas A&M University, College Station, TX 77843, USA
- Craig Kaplan
  - Department of Biological Sciences, University of Pittsburgh, Pittsburgh, PA 15260, USA
6. Dibyachintan S, Dubé AK, Bradley D, Lemieux P, Dionne U, Landry CR. Cryptic genetic variation shapes the fate of gene duplicates in a protein interaction network. Nat Commun 2025; 16:1530. [PMID: 39934115 PMCID: PMC11814230 DOI: 10.1038/s41467-025-56597-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/16/2024] [Accepted: 01/20/2025] [Indexed: 02/13/2025] Open
Abstract
Paralogous genes are often functionally redundant for long periods of time. While their functions are preserved, paralogs accumulate cryptic changes in sequence and expression, which could modulate the impact of future mutations through epistasis. We examine the impact of mutations on redundant myosin proteins that have maintained the same binding preference despite having accumulated differences in expression levels and amino acid substitutions in the last 100 million years. By quantifying the impact of all single-amino acid substitutions in their SH3 domains on the physical interaction with their interaction partners, we show that the same mutations in the paralogous SH3s change binding in a paralog-specific and interaction partner-specific manner. This contingency is explained by the difference in promoter strength of the two paralogous myosin genes and epistatic interactions between the mutations introduced and cryptic divergent sites within the SH3s. One significant consequence of this contingency is that while some mutations would be sufficient to nonfunctionalize one paralog, they would have minimal impact on the other. Our results reveal how cryptic divergence, which accumulates while maintaining functional redundancy in cellular networks, could bias gene duplicates to specific fates.
Affiliation(s)
- Soham Dibyachintan
  - PROTEO-Regroupement Québécois de Recherche sur la Fonction, l'Ingénierie et les Applications des Protéines, Québec, QC, Canada
  - Centre de Recherche en Données Massives de l'Université Laval, Université Laval, Québec, QC, Canada
  - Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, QC, Canada
  - Département de Biochimie, de Microbiologie et de Bio-Informatique, Université Laval, Québec, QC, Canada
- Alexandre K Dubé
  - PROTEO-Regroupement Québécois de Recherche sur la Fonction, l'Ingénierie et les Applications des Protéines, Québec, QC, Canada
  - Centre de Recherche en Données Massives de l'Université Laval, Université Laval, Québec, QC, Canada
  - Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, QC, Canada
  - Département de Biochimie, de Microbiologie et de Bio-Informatique, Université Laval, Québec, QC, Canada
  - Département de Biologie, Université Laval, Québec, QC, Canada
- David Bradley
  - PROTEO-Regroupement Québécois de Recherche sur la Fonction, l'Ingénierie et les Applications des Protéines, Québec, QC, Canada
  - Centre de Recherche en Données Massives de l'Université Laval, Université Laval, Québec, QC, Canada
  - Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, QC, Canada
  - Département de Biochimie, de Microbiologie et de Bio-Informatique, Université Laval, Québec, QC, Canada
  - Département de Biologie, Université Laval, Québec, QC, Canada
- Pascale Lemieux
  - PROTEO-Regroupement Québécois de Recherche sur la Fonction, l'Ingénierie et les Applications des Protéines, Québec, QC, Canada
  - Centre de Recherche en Données Massives de l'Université Laval, Université Laval, Québec, QC, Canada
  - Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, QC, Canada
  - Département de Biochimie, de Microbiologie et de Bio-Informatique, Université Laval, Québec, QC, Canada
- Ugo Dionne
  - PROTEO-Regroupement Québécois de Recherche sur la Fonction, l'Ingénierie et les Applications des Protéines, Québec, QC, Canada
  - Centre de Recherche en Données Massives de l'Université Laval, Université Laval, Québec, QC, Canada
  - Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, QC, Canada
  - Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON, Canada
- Christian R Landry
  - PROTEO-Regroupement Québécois de Recherche sur la Fonction, l'Ingénierie et les Applications des Protéines, Québec, QC, Canada
  - Centre de Recherche en Données Massives de l'Université Laval, Université Laval, Québec, QC, Canada
  - Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, QC, Canada
  - Département de Biochimie, de Microbiologie et de Bio-Informatique, Université Laval, Québec, QC, Canada
  - Département de Biologie, Université Laval, Québec, QC, Canada
7. Dutta A, Schacherer J. The dynamics of loss of heterozygosity events in genomes. EMBO Rep 2025; 26:602-612. [PMID: 39747660 PMCID: PMC11811284 DOI: 10.1038/s44319-024-00353-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2024] [Revised: 11/18/2024] [Accepted: 12/09/2024] [Indexed: 01/04/2025] Open
Abstract
Genomic instability is a hallmark of tumorigenesis, yet it also plays an essential role in evolution. Large-scale population genomics studies have highlighted the importance of loss of heterozygosity (LOH) events, which have long been overlooked in the context of genetic diversity and instability. Among various types of genomic mutations, LOH events are the most common and affect a larger portion of the genome. They typically arise from recombination-mediated repair of double-strand breaks (DSBs) or from lesions that are processed into DSBs. LOH events are critical drivers of genetic diversity, enabling rapid phenotypic variation and contributing to tumorigenesis. Understanding the accumulation of LOH, along with its underlying mechanisms, distribution, and phenotypic consequences, is therefore crucial. In this review, we explore the spectrum of LOH events, their mechanisms, and their impact on fitness and phenotype, drawing insights from Saccharomyces cerevisiae to cancer. We also emphasize the role of LOH in genomic instability, disease, and genome evolution.
Affiliation(s)
- Abhishek Dutta
  - Université de Strasbourg, CNRS, GMGM UMR 7156, Strasbourg, France
- Joseph Schacherer
  - Université de Strasbourg, CNRS, GMGM UMR 7156, Strasbourg, France
  - Institut Universitaire de France (IUF), Paris, France
8. Duan B, Qiu C, Lockless SW, Sze SH, Kaplan CD. Higher-order epistasis within Pol II trigger loop haplotypes. Genetics 2024; 228:iyae172. [PMID: 39446980 PMCID: PMC11631520 DOI: 10.1093/genetics/iyae172] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/21/2024] [Accepted: 10/22/2024] [Indexed: 10/26/2024] Open
Abstract
RNA polymerase II (Pol II) has a highly conserved domain, the trigger loop (TL), that controls transcription fidelity and speed. We previously probed pairwise genetic interactions between residues within and surrounding the TL to understand functional interactions between residues and how individual mutants might alter TL function. We identified widespread incompatibility between TLs of different species when placed in the Saccharomyces cerevisiae Pol II context, indicating species-specific interactions between otherwise highly conserved TLs and their surroundings. These interactions represent epistasis between TL residues and the rest of Pol II. We sought to understand why certain TL sequences are incompatible with S. cerevisiae Pol II and to dissect the nature of genetic interactions within multiply substituted TLs as a window on higher-order epistasis in this system. We identified both positive and negative higher-order residue interactions within example TL haplotypes. Intricate higher-order epistasis formed by TL residues was sometimes only apparent from analysis of intermediate genotypes, emphasizing the complexity of epistatic interactions. Furthermore, we distinguished TL substitutions with distinct classes of epistatic patterns, suggesting specific TL residues that potentially influence TL evolution. Our examples of complex residue interactions suggest possible pathways for epistasis to facilitate Pol II evolution.
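To illustrate why intermediate genotypes are needed to expose higher-order epistasis, here is a minimal sketch of the textbook inclusion-exclusion interaction term measured relative to a wild-type background on an additive phenotype scale. This is a generic definition, not necessarily the scoring used by the authors; the mutation labels and phenotype values are made up for illustration.

```python
from itertools import combinations


def epistasis(phenotypes: dict, mutations: tuple) -> float:
    """Interaction term among `mutations`, relative to wild type.

    `phenotypes` maps a frozenset of mutation labels to a phenotype on an
    additive scale; the empty frozenset is the wild type. Every intermediate
    genotype (every subset of `mutations`) must be present.
    """
    n = len(mutations)
    total = 0.0
    for k in range(n + 1):
        sign = (-1) ** (n - k)  # alternate signs by subset size
        for subset in combinations(mutations, k):
            total += sign * phenotypes[frozenset(subset)]
    return total


# Hypothetical phenotypes for three substitutions a, b, c (illustrative only).
toy = {
    frozenset(): 0.0,
    frozenset("a"): -1.0,
    frozenset("b"): -1.0,
    frozenset("c"): -1.0,
    frozenset("ab"): -1.0,   # less severe than additive expectation (-2.0)
    frozenset("ac"): -2.0,   # additive
    frozenset("bc"): -2.0,   # additive
    frozenset("abc"): -1.0,
}

pair_ab = epistasis(toy, ("a", "b"))           # f_ab - f_a - f_b + f_wt
third_order = epistasis(toy, ("a", "b", "c"))  # needs all eight genotypes
print(pair_ab, third_order)  # 1.0 1.0
```

The third-order term cannot be computed from the wild type and the triple mutant alone: it requires all single and double mutants, which is the sense in which intermediate genotypes matter.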
Affiliation(s)
- Bingbing Duan
  - Department of Biological Sciences, University of Pittsburgh, Pittsburgh, PA 15260, USA
- Chenxi Qiu
  - Department of Genetics, Harvard Medical School, Boston, MA 02215, USA
- Steve W Lockless
  - Department of Biology, Texas A&M University, College Station, TX 77843, USA
- Sing-Hoi Sze
  - Department of Computer Science & Engineering, Texas A&M University, College Station, TX 77843, USA
  - Department of Biochemistry & Biophysics, Texas A&M University, College Station, TX 77843, USA
- Craig D Kaplan
  - Department of Biological Sciences, University of Pittsburgh, Pittsburgh, PA 15260, USA
9. Duan B, Qiu C, Lockless SW, Sze SH, Kaplan CD. Higher-order epistasis within Pol II trigger loop haplotypes. bioRxiv 2024:2024.01.20.576280. [PMID: 38293233 PMCID: PMC10827151 DOI: 10.1101/2024.01.20.576280] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/01/2024]
Abstract
RNA polymerase II (Pol II) has a highly conserved domain, the trigger loop (TL), that controls transcription fidelity and speed. We previously probed pairwise genetic interactions between residues within and surrounding the TL to understand functional interactions between residues and how individual mutants might alter TL function. We identified widespread incompatibility between TLs of different species when placed in the Saccharomyces cerevisiae Pol II context, indicating species-specific interactions between otherwise highly conserved TLs and their surroundings. These interactions represent epistasis between TL residues and the rest of Pol II. We sought to understand why certain TL sequences are incompatible with S. cerevisiae Pol II and to dissect the nature of genetic interactions within multiply substituted TLs as a window on higher-order epistasis in this system. We identified both positive and negative higher-order residue interactions within example TL haplotypes. Intricate higher-order epistasis formed by TL residues was sometimes only apparent from analysis of intermediate genotypes, emphasizing the complexity of epistatic interactions. Furthermore, we distinguished TL substitutions with distinct classes of epistatic patterns, suggesting specific TL residues that potentially influence TL evolution. Our examples of complex residue interactions suggest possible pathways for epistasis to facilitate Pol II evolution.
Affiliation(s)
- Bingbing Duan
  - Department of Biological Sciences, University of Pittsburgh, Pittsburgh, PA 15260
- Chenxi Qiu
  - Department of Genetics, Harvard Medical School, Boston, MA 02215
- Steve W Lockless
  - Department of Biology, Texas A&M University, College Station, TX 77843
- Sing-Hoi Sze
  - Department of Computer Science & Engineering, Texas A&M University, College Station, TX 77843
  - Department of Biochemistry & Biophysics, Texas A&M University, College Station, TX 77843
- Craig D Kaplan
  - Department of Biological Sciences, University of Pittsburgh, Pittsburgh, PA 15260
10. Park Y, Metzger BPH, Thornton JW. The simplicity of protein sequence-function relationships. Nat Commun 2024; 15:7953. [PMID: 39261454 PMCID: PMC11390738 DOI: 10.1038/s41467-024-51895-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2024] [Accepted: 08/20/2024] [Indexed: 09/13/2024] Open
Abstract
How complex are the rules by which a protein's sequence determines its function? High-order epistatic interactions among residues are thought to be pervasive, suggesting an idiosyncratic and unpredictable sequence-function relationship. But many prior studies may have overestimated epistasis, because they analyzed sequence-function relationships relative to a single reference sequence (which causes measurement noise and local idiosyncrasies to snowball into high-order epistasis) or they did not fully account for global nonlinearities. Here we present a reference-free method that jointly infers specific epistatic interactions and global nonlinearity using a bird's-eye view of sequence space. This technique yields the simplest explanation of sequence-function relationships and is more robust than existing methods to measurement noise, missing data, and model misspecification. We reanalyze 20 experimental datasets and find that context-independent amino acid effects and pairwise interactions, along with a simple nonlinearity to account for limited dynamic range, explain a median of 96% of phenotypic variance and over 92% in every case. Only a tiny fraction of genotypes are strongly affected by higher-order epistasis. Sequence-function relationships are also sparse: a minuscule fraction of amino acids and interactions account for 90% of phenotypic variance. Sequence-function causality across these datasets is therefore simple, opening the way for tractable approaches to characterize proteins' genetic architecture.
Affiliation(s)
- Yeonwoo Park
  - Committee on Genetics, Genomics, and Systems Biology, University of Chicago, Chicago, IL, USA
  - Center for RNA Research, Institute for Basic Science, Seoul, Republic of Korea
- Brian P H Metzger
  - Department of Ecology and Evolution, University of Chicago, Chicago, IL, USA
  - Department of Biological Sciences, Purdue University, West Lafayette, IN, USA
- Joseph W Thornton
  - Department of Ecology and Evolution, University of Chicago, Chicago, IL, USA
  - Department of Human Genetics, University of Chicago, Chicago, IL, USA
11. Dereli O, Kuru N, Akkoyun E, Bircan A, Tastan O, Adebali O. PHACTboost: A Phylogeny-Aware Pathogenicity Predictor for Missense Mutations via Boosting. Mol Biol Evol 2024; 41:msae136. [PMID: 38934805 PMCID: PMC11251492 DOI: 10.1093/molbev/msae136] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 12/12/2023] [Revised: 05/30/2024] [Accepted: 06/24/2024] [Indexed: 06/28/2024] Open
Abstract
Most algorithms that are used to predict the effects of variants rely on evolutionary conservation. However, a majority of such techniques compute evolutionary conservation by solely using the alignment of multiple sequences while overlooking the evolutionary context of substitution events. We previously introduced PHACT, a scoring-based pathogenicity predictor for missense mutations that can leverage phylogenetic trees. Building on this foundation, we now propose PHACTboost, a gradient boosting tree-based classifier that combines PHACT scores with information from multiple sequence alignments, phylogenetic trees, and ancestral reconstruction. By learning from data, PHACTboost outperforms PHACT. Furthermore, the results of comprehensive experiments on carefully constructed sets of variants demonstrated that PHACTboost can outperform 40 prevalent pathogenicity predictors reported in the dbNSFP, including conventional tools, metapredictors, and deep learning-based approaches as well as more recent tools such as AlphaMissense, EVE, and CPT-1. The superiority of PHACTboost over these methods was particularly evident in the case of hard variants for which different pathogenicity predictors offered conflicting results. We provide predictions of 215 million amino acid alterations over 20,191 proteins. PHACTboost is available at https://github.com/CompGenomeLab/PHACTboost. PHACTboost can improve our understanding of genetic diseases and facilitate more accurate diagnoses.
Affiliation(s)
- Onur Dereli
  - Faculty of Engineering and Natural Sciences, Sabanci University, Istanbul 34956, Turkey
- Nurdan Kuru
  - Faculty of Engineering and Natural Sciences, Sabanci University, Istanbul 34956, Turkey
- Emrah Akkoyun
  - Faculty of Engineering and Natural Sciences, Sabanci University, Istanbul 34956, Turkey
  - Network Technologies Department, TÜBİTAK-ULAKBİM Turkish Academic Network and Information Center, Ankara 06530, Turkey
- Aylin Bircan
  - Faculty of Engineering and Natural Sciences, Sabanci University, Istanbul 34956, Turkey
- Oznur Tastan
  - Faculty of Engineering and Natural Sciences, Sabanci University, Istanbul 34956, Turkey
- Ogün Adebali
  - Faculty of Engineering and Natural Sciences, Sabanci University, Istanbul 34956, Turkey
  - Biological Sciences, TÜBİTAK Research Institute for Fundamental Sciences, Gebze 41470, Turkey
12. Schneemann H, De Sanctis B, Welch JJ. Fisher's Geometric Model as a Tool to Study Speciation. Cold Spring Harb Perspect Biol 2024; 16:a041442. [PMID: 38253415 PMCID: PMC11216183 DOI: 10.1101/cshperspect.a041442] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/24/2024]
Abstract
Interactions between alleles and across environments play an important role in the fitness of hybrids and are at the heart of the speciation process. Fitness landscapes capture these interactions and can be used to model hybrid fitness, helping us to interpret empirical observations and clarify verbal models. Here, we review recent progress in understanding hybridization outcomes through Fisher's geometric model, an intuitive and analytically tractable fitness landscape that captures many fitness patterns observed across taxa. We use case studies to show how the model parameters can be estimated from different types of data and discuss how these estimates can be used to make inferences about the divergence history and genetic architecture. We also highlight some areas where the model's predictions differ from alternative incompatibility-based models, such as the snowball effect and outlier patterns in genome scans.
Affiliation(s)
- Hilde Schneemann
  - Department of Genetics, University of Cambridge, Cambridge CB2 3EH, United Kingdom
- Bianca De Sanctis
  - Department of Genetics, University of Cambridge, Cambridge CB2 3EH, United Kingdom
  - Department of Zoology, University of Cambridge, Cambridge CB2 3EJ, United Kingdom
- John J Welch
  - Department of Genetics, University of Cambridge, Cambridge CB2 3EH, United Kingdom
13. Nguyen TN, Ingle C, Thompson S, Reynolds KA. The genetic landscape of a metabolic interaction. Nat Commun 2024; 15:3351. [PMID: 38637543 PMCID: PMC11026382 DOI: 10.1038/s41467-024-47671-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2023] [Accepted: 04/09/2024] [Indexed: 04/20/2024] Open
Abstract
While much prior work has explored the constraints on protein sequence and evolution induced by physical protein-protein interactions, the sequence-level constraints emerging from non-binding functional interactions in metabolism remain unclear. To quantify how variation in the activity of one enzyme constrains the biochemical parameters and sequence of another, we focus on dihydrofolate reductase (DHFR) and thymidylate synthase (TYMS), a pair of enzymes catalyzing consecutive reactions in folate metabolism. We use deep mutational scanning to quantify the growth rate effect of 2696 DHFR single mutations in 3 TYMS backgrounds under conditions selected to emphasize biochemical epistasis. Our data are well-described by a relatively simple enzyme velocity to growth rate model that quantifies how metabolic context tunes enzyme mutational tolerance. Together our results reveal the structural distribution of epistasis in a metabolic enzyme and establish a foundation for the design of multi-enzyme systems.
Affiliation(s)
- Thuy N Nguyen
- The Green Center for Systems Biology, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
- The Lyda Hill Department of Bioinformatics, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
- The Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
- Form Bio, Dallas, TX, 75226, USA
| | - Christine Ingle
- The Green Center for Systems Biology, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
- The Lyda Hill Department of Bioinformatics, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
- The Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
| | - Samuel Thompson
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA, 94158, USA
- Department of Bioengineering, Stanford University, Stanford, CA, 94305, USA
| | - Kimberly A Reynolds
- The Green Center for Systems Biology, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA.
- The Lyda Hill Department of Bioinformatics, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA.
- The Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA.
| |
|
14
|
Dibyachintan S, Dube AK, Bradley D, Lemieux P, Dionne U, Landry CR. Cryptic genetic variation shapes the fate of gene duplicates in a protein interaction network. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.02.23.581840. [PMID: 38464075 PMCID: PMC10925128 DOI: 10.1101/2024.02.23.581840] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/12/2024]
Abstract
Paralogous genes are often redundant for long periods of time before they diverge in function. While their functions are preserved, paralogous proteins can accumulate mutations that, through epistasis, could impact their fate in the future. By quantifying the impact of all single-amino acid substitutions on the binding of two myosin proteins to their interaction partners, we find that the future evolution of these proteins is highly contingent on their regulatory divergence and the mutations that have silently accumulated in their protein binding domains. Differences in the promoter strength of the two paralogs amplify the impact of mutations on binding in the lowly expressed one. While some mutations would be sufficient to non-functionalize one paralog, they would have minimal impact on the other. Our results reveal how functionally equivalent protein domains could be destined to specific fates by regulatory and cryptic coding sequence changes that currently have little to no functional impact.
Affiliation(s)
- Soham Dibyachintan
- PROTEO-Regroupement Québécois de Recherche sur la Fonction, l'Ingénierie et les Applications des Protéines, Québec, QC, Canada
- Centre de Recherche en Données Massives de l'Université Laval, Université Laval, Québec, QC, Canada
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, QC, Canada
- Département de Biochimie, de Microbiologie et de Bio-Informatique, Université Laval, Québec, QC, Canada
| | - Alexandre K Dube
- PROTEO-Regroupement Québécois de Recherche sur la Fonction, l'Ingénierie et les Applications des Protéines, Québec, QC, Canada
- Centre de Recherche en Données Massives de l'Université Laval, Université Laval, Québec, QC, Canada
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, QC, Canada
- Département de Biochimie, de Microbiologie et de Bio-Informatique, Université Laval, Québec, QC, Canada
- Département de Biologie, Université Laval, Québec, QC, Canada
| | - David Bradley
- PROTEO-Regroupement Québécois de Recherche sur la Fonction, l'Ingénierie et les Applications des Protéines, Québec, QC, Canada
- Centre de Recherche en Données Massives de l'Université Laval, Université Laval, Québec, QC, Canada
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, QC, Canada
- Département de Biochimie, de Microbiologie et de Bio-Informatique, Université Laval, Québec, QC, Canada
- Département de Biologie, Université Laval, Québec, QC, Canada
| | - Pascale Lemieux
- PROTEO-Regroupement Québécois de Recherche sur la Fonction, l'Ingénierie et les Applications des Protéines, Québec, QC, Canada
- Centre de Recherche en Données Massives de l'Université Laval, Université Laval, Québec, QC, Canada
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, QC, Canada
- Département de Biochimie, de Microbiologie et de Bio-Informatique, Université Laval, Québec, QC, Canada
| | - Ugo Dionne
- PROTEO-Regroupement Québécois de Recherche sur la Fonction, l'Ingénierie et les Applications des Protéines, Québec, QC, Canada
- Centre de Recherche en Données Massives de l'Université Laval, Université Laval, Québec, QC, Canada
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, QC, Canada
- Current affiliation: Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON, Canada
| | - Christian R Landry
- PROTEO-Regroupement Québécois de Recherche sur la Fonction, l'Ingénierie et les Applications des Protéines, Québec, QC, Canada
- Centre de Recherche en Données Massives de l'Université Laval, Université Laval, Québec, QC, Canada
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, QC, Canada
- Département de Biochimie, de Microbiologie et de Bio-Informatique, Université Laval, Québec, QC, Canada
- Département de Biologie, Université Laval, Québec, QC, Canada
| |
|
15
|
Lei H, Li J, Zhao B, Kou SH, Xiao F, Chen T, Wang SM. Evolutionary origin of germline pathogenic variants in human DNA mismatch repair genes. Hum Genomics 2024; 18:5. [PMID: 38287404 PMCID: PMC10823654 DOI: 10.1186/s40246-024-00573-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/27/2023] [Accepted: 01/17/2024] [Indexed: 01/31/2024] Open
Abstract
BACKGROUND: The mismatch repair (MMR) system is evolutionarily conserved for the maintenance of genome stability. Germline pathogenic variants (PVs) in MMR genes that lead to MMR functional deficiency are associated with high cancer risk. Knowing the evolutionary origin of germline PVs in human MMR genes will facilitate understanding of the biological basis of MMR deficiency in cancer. However, systematic knowledge on this issue is lacking. In this study, we performed a comprehensive analysis to determine the evolutionary origin of human MMR PVs.
METHODS: We retrieved MMR gene variants from the ClinVar database. The genomes of 100 vertebrates were collected from the UCSC genome browser, and ancient human sequencing data were obtained through comprehensive data mining. Cross-species conservation analysis was performed based on the phylogenetic relationship among the 100 vertebrates. Rescaled ancient sequencing data were used to perform variant calling for archeological analysis.
RESULTS: Using the phylogenetic approach, we traced the 3369 MMR PVs identified in modern humans in 99 non-human vertebrate genomes but found no evidence for cross-species conservation as the source of human MMR PVs. Using the archeological approach, we searched for the human MMR PVs in over 5000 ancient human genomes dated from 45,045 to 100 years before present and identified a group of MMR PVs shared between modern and ancient humans, mostly within 10,000 years, with similar quantitative patterns.
CONCLUSION: Our study reveals that MMR PVs in modern humans arose within recent human evolutionary history.
Affiliation(s)
- Huijun Lei
- Ministry of Education Frontiers Science Center for Precision Oncology, Cancer Centre and Institute of Translational Medicine, Faculty of Health Sciences, University of Macau, Taipa, Macau SAR, 999078, China
- Hangzhou Institute of Medicine (HIM), Chinese Academy of Sciences, Hangzhou, 310018, Zhejiang, China
- Department of Cancer Prevention, Zhejiang Cancer Hospital, Hangzhou, 310022, Zhejiang, China
| | - Jiaheng Li
- Ministry of Education Frontiers Science Center for Precision Oncology, Cancer Centre and Institute of Translational Medicine, Faculty of Health Sciences, University of Macau, Taipa, Macau SAR, 999078, China
| | - Bojin Zhao
- Ministry of Education Frontiers Science Center for Precision Oncology, Cancer Centre and Institute of Translational Medicine, Faculty of Health Sciences, University of Macau, Taipa, Macau SAR, 999078, China
| | - Si Hoi Kou
- Ministry of Education Frontiers Science Center for Precision Oncology, Cancer Centre and Institute of Translational Medicine, Faculty of Health Sciences, University of Macau, Taipa, Macau SAR, 999078, China
| | - Fengxia Xiao
- Ministry of Education Frontiers Science Center for Precision Oncology, Cancer Centre and Institute of Translational Medicine, Faculty of Health Sciences, University of Macau, Taipa, Macau SAR, 999078, China
| | - Tianhui Chen
- Hangzhou Institute of Medicine (HIM), Chinese Academy of Sciences, Hangzhou, 310018, Zhejiang, China.
- Department of Cancer Prevention, Zhejiang Cancer Hospital, Hangzhou, 310022, Zhejiang, China.
| | - San Ming Wang
- Ministry of Education Frontiers Science Center for Precision Oncology, Cancer Centre and Institute of Translational Medicine, Faculty of Health Sciences, University of Macau, Taipa, Macau SAR, 999078, China.
| |
|
16
|
Nguyen TN, Ingle C, Thompson S, Reynolds KA. The Genetic Landscape of a Metabolic Interaction. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.05.28.542639. [PMID: 37645784 PMCID: PMC10461916 DOI: 10.1101/2023.05.28.542639] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 08/31/2023]
Abstract
Enzyme abundance, catalytic activity, and ultimately sequence are all shaped by the need of growing cells to maintain metabolic flux while minimizing accumulation of deleterious intermediates. While much prior work has explored the constraints on protein sequence and evolution induced by physical protein-protein interactions, the sequence-level constraints emerging from non-binding functional interactions in metabolism remain unclear. To quantify how variation in the activity of one enzyme constrains the biochemical parameters and sequence of another, we focused on dihydrofolate reductase (DHFR) and thymidylate synthase (TYMS), a pair of enzymes catalyzing consecutive reactions in folate metabolism. We used deep mutational scanning to quantify the growth rate effect of 2,696 DHFR single mutations in 3 TYMS backgrounds under conditions selected to emphasize biochemical epistasis. Our data are well-described by a relatively simple enzyme velocity to growth rate model that quantifies how metabolic context tunes enzyme mutational tolerance. Together our results reveal the structural distribution of epistasis in a metabolic enzyme and establish a foundation for the design of multi-enzyme systems.
Affiliation(s)
- Thuy N. Nguyen
- The Green Center for Systems Biology, University of Texas Southwestern Medical Center, Dallas, USA, 75390
- Department of Bioinformatics, University of Texas Southwestern Medical Center, Dallas, USA, 75390
- Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, USA, 75390
| | - Christine Ingle
- The Green Center for Systems Biology, University of Texas Southwestern Medical Center, Dallas, USA, 75390
- Department of Bioinformatics, University of Texas Southwestern Medical Center, Dallas, USA, 75390
- Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, USA, 75390
| | - Samuel Thompson
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA 94158
| | - Kimberly A. Reynolds
- The Green Center for Systems Biology, University of Texas Southwestern Medical Center, Dallas, USA, 75390
- Department of Bioinformatics, University of Texas Southwestern Medical Center, Dallas, USA, 75390
- Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, USA, 75390
| |
|
17
|
Gao H, Hamp T, Ede J, Schraiber JG, McRae J, Singer-Berk M, Yang Y, Dietrich ASD, Fiziev PP, Kuderna LFK, Sundaram L, Wu Y, Adhikari A, Field Y, Chen C, Batzoglou S, Aguet F, Lemire G, Reimers R, Balick D, Janiak MC, Kuhlwilm M, Orkin JD, Manu S, Valenzuela A, Bergman J, Rousselle M, Silva FE, Agueda L, Blanc J, Gut M, de Vries D, Goodhead I, Harris RA, Raveendran M, Jensen A, Chuma IS, Horvath JE, Hvilsom C, Juan D, Frandsen P, de Melo FR, Bertuol F, Byrne H, Sampaio I, Farias I, do Amaral JV, Messias M, da Silva MNF, Trivedi M, Rossi R, Hrbek T, Andriaholinirina N, Rabarivola CJ, Zaramody A, Jolly CJ, Phillips-Conroy J, Wilkerson G, Abee C, Simmons JH, Fernandez-Duque E, Kanthaswamy S, Shiferaw F, Wu D, Zhou L, Shao Y, Zhang G, Keyyu JD, Knauf S, Le MD, Lizano E, Merker S, Navarro A, Bataillon T, Nadler T, Khor CC, Lee J, Tan P, Lim WK, Kitchener AC, Zinner D, Gut I, Melin A, Guschanski K, Schierup MH, Beck RMD, Umapathy G, Roos C, Boubli JP, Lek M, Sunyaev S, O'Donnell-Luria A, Rehm HL, Xu J, Rogers J, Marques-Bonet T, Farh KKH. The landscape of tolerated genetic variation in humans and primates. Science 2023; 380:eabn8153. [PMID: 37262156 DOI: 10.1126/science.abn8197] [Citation(s) in RCA: 78] [Impact Index Per Article: 39.0] [Received: 12/31/2021] [Accepted: 03/22/2023] [Indexed: 06/03/2023]
Abstract
Personalized genome sequencing has revealed millions of genetic differences between individuals, but our understanding of their clinical relevance remains largely incomplete. To systematically decipher the effects of human genetic variants, we obtained whole-genome sequencing data for 809 individuals from 233 primate species and identified 4.3 million common protein-altering variants with orthologs in humans. We show that these variants can be inferred to have nondeleterious effects in humans based on their presence at high allele frequencies in other primate populations. We use this resource to classify 6% of all possible human protein-altering variants as likely benign and impute the pathogenicity of the remaining 94% of variants with deep learning, achieving state-of-the-art accuracy for diagnosing pathogenic variants in patients with genetic diseases.
Affiliation(s)
- Hong Gao
- Illumina Artificial Intelligence Laboratory, Illumina Inc., Foster City, CA, 94404, USA
| | - Tobias Hamp
- Illumina Artificial Intelligence Laboratory, Illumina Inc., Foster City, CA, 94404, USA
| | - Jeffrey Ede
- Illumina Artificial Intelligence Laboratory, Illumina Inc., Foster City, CA, 94404, USA
| | - Joshua G Schraiber
- Illumina Artificial Intelligence Laboratory, Illumina Inc., Foster City, CA, 94404, USA
| | - Jeremy McRae
- Illumina Artificial Intelligence Laboratory, Illumina Inc., Foster City, CA, 94404, USA
| | - Moriel Singer-Berk
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Boston, MA, 02142, USA
| | - Yanshen Yang
- Illumina Artificial Intelligence Laboratory, Illumina Inc., Foster City, CA, 94404, USA
| | | | - Petko P Fiziev
- Illumina Artificial Intelligence Laboratory, Illumina Inc., Foster City, CA, 94404, USA
| | - Lukas F K Kuderna
- Illumina Artificial Intelligence Laboratory, Illumina Inc., Foster City, CA, 94404, USA
- Institute of Evolutionary Biology (UPF-CSIC), PRBB, Dr. Aiguader 88, 08003 Barcelona, Spain
| | - Laksshman Sundaram
- Illumina Artificial Intelligence Laboratory, Illumina Inc., Foster City, CA, 94404, USA
| | - Yibing Wu
- Illumina Artificial Intelligence Laboratory, Illumina Inc., Foster City, CA, 94404, USA
| | - Aashish Adhikari
- Illumina Artificial Intelligence Laboratory, Illumina Inc., Foster City, CA, 94404, USA
| | - Yair Field
- Illumina Artificial Intelligence Laboratory, Illumina Inc., Foster City, CA, 94404, USA
| | - Chen Chen
- Illumina Artificial Intelligence Laboratory, Illumina Inc., Foster City, CA, 94404, USA
| | - Serafim Batzoglou
- Illumina Artificial Intelligence Laboratory, Illumina Inc., Foster City, CA, 94404, USA
| | - Francois Aguet
- Illumina Artificial Intelligence Laboratory, Illumina Inc., Foster City, CA, 94404, USA
| | - Gabrielle Lemire
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Boston, MA, 02142, USA
- Division of Genetics and Genomics, Department of Pediatrics, Boston Children's Hospital, Harvard Medical School, Boston, MA, 02115, USA
| | - Rebecca Reimers
- Division of Genetics and Genomics, Department of Pediatrics, Boston Children's Hospital, Harvard Medical School, Boston, MA, 02115, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, 02115, USA
| | - Daniel Balick
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, 02115, USA
- Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, 02115, USA
| | - Mareike C Janiak
- School of Science, Engineering & Environment, University of Salford, Salford M5 4WT, UK
| | - Martin Kuhlwilm
- Institute of Evolutionary Biology (UPF-CSIC), PRBB, Dr. Aiguader 88, 08003 Barcelona, Spain
- Department of Evolutionary Anthropology, University of Vienna, Djerassiplatz 1, 1030 Vienna, Austria
- Human Evolution and Archaeological Sciences (HEAS), University of Vienna, 1030 Vienna, Austria
| | - Joseph D Orkin
- Institute of Evolutionary Biology (UPF-CSIC), PRBB, Dr. Aiguader 88, 08003 Barcelona, Spain
- Département d'anthropologie, Université de Montréal, 3150 Jean-Brillant, Montréal, QC H3T 1N8, Canada
| | - Shivakumara Manu
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, India
- Laboratory for the Conservation of Endangered Species, CSIR-Centre for Cellular and Molecular Biology, Hyderabad 500007, India
| | - Alejandro Valenzuela
- Institute of Evolutionary Biology (UPF-CSIC), PRBB, Dr. Aiguader 88, 08003 Barcelona, Spain
| | - Juraj Bergman
- Bioinformatics Research Centre, Aarhus University, Aarhus 8000, Denmark
- Section for Ecoinformatics & Biodiversity, Department of Biology, Aarhus University, 8000 Aarhus, Denmark
| | | | - Felipe Ennes Silva
- Research Group on Primate Biology and Conservation, Mamirauá Institute for Sustainable Development, Estrada da Bexiga 2584, Tefé, Amazonas, CEP 69553-225, Brazil
- Evolutionary Biology and Ecology (EBE), Département de Biologie des Organismes, Université libre de Bruxelles (ULB), Av. Franklin D. Roosevelt 50, CP 160/12, B-1050 Brussels, Belgium
| | - Lidia Agueda
- CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, 08028 Barcelona, Spain
| | - Julie Blanc
- CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, 08028 Barcelona, Spain
| | - Marta Gut
- CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, 08028 Barcelona, Spain
| | - Dorien de Vries
- School of Science, Engineering & Environment, University of Salford, Salford M5 4WT, UK
| | - Ian Goodhead
- School of Science, Engineering & Environment, University of Salford, Salford M5 4WT, UK
| | - R Alan Harris
- Human Genome Sequencing Center and Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Muthuswamy Raveendran
- Human Genome Sequencing Center and Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Axel Jensen
- Department of Ecology and Genetics, Animal Ecology, Uppsala University, SE-75236 Uppsala, Sweden
| | | | - Julie E Horvath
- North Carolina Museum of Natural Sciences, Raleigh, NC 27601, USA
- Department of Biological and Biomedical Sciences, North Carolina Central University, Durham, NC 27707, USA
- Department of Biological Sciences, North Carolina State University, Raleigh, NC 27695, USA
- Department of Evolutionary Anthropology, Duke University, Durham, NC 27708, USA
- Renaissance Computing Institute, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | | | - David Juan
- Institute of Evolutionary Biology (UPF-CSIC), PRBB, Dr. Aiguader 88, 08003 Barcelona, Spain
| | | | | | - Fabrício Bertuol
- Universidade Federal do Amazonas, Departamento de Genética, Laboratório de Evolução e Genética Animal (LEGAL), Manaus, Amazonas, 69080-900, Brazil
| | - Hazel Byrne
- Department of Anthropology, University of Utah, Salt Lake City, UT 84102, USA
| | - Iracilda Sampaio
- Universidade Federal do Para, Guamá, Belém - PA, 66075-110, Brazil
| | - Izeni Farias
- Universidade Federal do Amazonas, Departamento de Genética, Laboratório de Evolução e Genética Animal (LEGAL), Manaus, Amazonas, 69080-900, Brazil
| | - João Valsecchi do Amaral
- Research Group on Terrestrial Vertebrate Ecology, Mamirauá Institute for Sustainable Development, Tefé, Amazonas, 69553-225, Brazil
- Rede de Pesquisa para Estudos sobre Diversidade, Conservação e Uso da Fauna na Amazônia - RedeFauna, Manaus, Amazonas, 69080-900, Brazil
- Comunidad de Manejo de Fauna Silvestre en la Amazonía y en Latinoamérica - ComFauna, Iquitos, Loreto, 16001, Peru
| | - Mariluce Messias
- Universidade Federal de Rondonia, Porto Velho, Rondônia, 78900-000, Brazil
- PPGREN - Programa de Pós-Graduação "Conservação e Uso dos Recursos Naturais" and BIONORTE - Programa de Pós-Graduação em Biodiversidade e Biotecnologia da Rede BIONORTE, Universidade Federal de Rondonia, Porto Velho, Rondônia, 78900-000, Brazil
| | - Maria N F da Silva
- Instituto Nacional de Pesquisas da Amazonia, Petrópolis, Manaus - AM, 69067-375, Brazil
| | - Mihir Trivedi
- Laboratory for the Conservation of Endangered Species, CSIR-Centre for Cellular and Molecular Biology, Hyderabad 500007, India
| | - Rogerio Rossi
- Universidade Federal do Mato Grosso, Boa Esperança, Cuiabá - MT, 78060-900, Brazil
| | - Tomas Hrbek
- Universidade Federal do Amazonas, Departamento de Genética, Laboratório de Evolução e Genética Animal (LEGAL), Manaus, Amazonas, 69080-900, Brazil
- Department of Biology, Trinity University, San Antonio, TX 78212, USA
| | - Nicole Andriaholinirina
- Life Sciences and Environment, Technology and Environment of Mahajanga, University of Mahajanga, Mahajanga, 401, Madagascar
| | - Clément J Rabarivola
- Life Sciences and Environment, Technology and Environment of Mahajanga, University of Mahajanga, Mahajanga, 401, Madagascar
| | - Alphonse Zaramody
- Life Sciences and Environment, Technology and Environment of Mahajanga, University of Mahajanga, Mahajanga, 401, Madagascar
| | | | | | - Gregory Wilkerson
- Keeling Center for Comparative Medicine and Research, MD Anderson Cancer Center, Houston, TX 77030, USA
| | - Christian Abee
- Keeling Center for Comparative Medicine and Research, MD Anderson Cancer Center, Houston, TX 77030, USA
| | - Joe H Simmons
- Keeling Center for Comparative Medicine and Research, MD Anderson Cancer Center, Houston, TX 77030, USA
| | - Eduardo Fernandez-Duque
- Yale University, New Haven, CT 06520, USA
- Universidad Nacional de Formosa, Argentina Fundacion ECO, Formosa, Argentina
| | | | - Fekadu Shiferaw
- Guinea Worm Eradication Program, The Carter Center Ethiopia, PoB 16316, Addis Ababa 1000, Ethiopia
| | - Dongdong Wu
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan 650223, China
| | - Long Zhou
- Center for Evolutionary & Organismal Biology, Zhejiang University School of Medicine, Hangzhou 310058, China
| | - Yong Shao
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan 650223, China
| | - Guojie Zhang
- Center for Evolutionary & Organismal Biology, Zhejiang University School of Medicine, Hangzhou 310058, China
- Villum Center for Biodiversity Genomics, Section for Ecology and Evolution, Department of Biology, University of Copenhagen, DK-2100 Copenhagen, Denmark
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan 650223, China
- Liangzhu Laboratory, Zhejiang University Medical Center, 1369 West Wenyi Road, Hangzhou 311121, China
- Women's Hospital, School of Medicine, Zhejiang University, 1 Xueshi Road, Shangcheng District, Hangzhou 310006, China
| | - Julius D Keyyu
- Tanzania Wildlife Research Institute (TAWIRI), Head Office, P.O. Box 661, Arusha, Tanzania
| | - Sascha Knauf
- Institute of International Animal Health/One Health, Friedrich-Loeffler-Institut, Federal Research Institute for Animal Health, 17493 Greifswald - Insel Riems, Germany
| | - Minh D Le
- Department of Environmental Ecology, Faculty of Environmental Sciences, University of Science and Central Institute for Natural Resources and Environmental Studies, Vietnam National University, Hanoi 100000, Vietnam
| | - Esther Lizano
- Institute of Evolutionary Biology (UPF-CSIC), PRBB, Dr. Aiguader 88, 08003 Barcelona, Spain
- Catalan Institution of Research and Advanced Studies (ICREA), Passeig de Lluís Companys, 23, 08010 Barcelona, Spain
| | - Stefan Merker
- Department of Zoology, State Museum of Natural History Stuttgart, 70191 Stuttgart, Germany
| | - Arcadi Navarro
- Institute of Evolutionary Biology (UPF-CSIC), PRBB, Dr. Aiguader 88, 08003 Barcelona, Spain
- Institut Català de Paleontologia Miquel Crusafont, Universitat Autònoma de Barcelona, Edifici ICTA-ICP, c/ Columnes s/n, 08193 Cerdanyola del Vallès, Barcelona, Spain
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Av. Doctor Aiguader, N88, 08003 Barcelona, Spain
- BarcelonaBeta Brain Research Center, Pasqual Maragall Foundation, C. Wellington 30, 08005 Barcelona, Spain
| | - Thomas Bataillon
- Bioinformatics Research Centre, Aarhus University, Aarhus 8000, Denmark
| | - Tilo Nadler
- Cuc Phuong Commune, Nho Quan District, Ninh Binh Province 430000, Vietnam
| | - Chiea Chuen Khor
- Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), 60 Biopolis Street, Genome, Singapore 138672, Republic of Singapore
| | - Jessica Lee
- Mandai Nature, 80 Mandai Lake Road, Singapore 729826, Republic of Singapore
| | - Patrick Tan
- Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), 60 Biopolis Street, Genome, Singapore 138672, Republic of Singapore
- SingHealth Duke-NUS Institute of Precision Medicine (PRISM), Singapore 168582, Republic of Singapore
- Cancer and Stem Cell Biology Program, Duke-NUS Medical School, Singapore 168582, Republic of Singapore
| | - Weng Khong Lim
- SingHealth Duke-NUS Institute of Precision Medicine (PRISM), Singapore 168582, Republic of Singapore
- Cancer and Stem Cell Biology Program, Duke-NUS Medical School, Singapore 168582, Republic of Singapore
- SingHealth Duke-NUS Genomic Medicine Centre, Singapore 168582, Republic of Singapore
| | - Andrew C Kitchener
- Department of Natural Sciences, National Museums Scotland, Chambers Street, Edinburgh EH1 1JF, UK
- School of Geosciences, University of Edinburgh, Drummond Street, Edinburgh EH8 9XP, UK
| | - Dietmar Zinner
- Cognitive Ethology Laboratory, German Primate Center, Leibniz Institute for Primate Research, 37077 Göttingen, Germany
- Department of Primate Cognition, Georg-August-Universität Göttingen, 37077 Göttingen, Germany
- Leibniz Science Campus Primate Cognition, 37077 Göttingen, Germany
| | - Ivo Gut
- CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, 08028 Barcelona, Spain
- Universitat Pompeu Fabra, Pg. Luís Companys 23, 08010 Barcelona, Spain
| | - Amanda Melin
- Department of Anthropology & Archaeology, University of Calgary, 2500 University Dr NW, Calgary, AB T2N 1N4, Canada
- Department of Medical Genetics, 3330 Hospital Drive NW, HMRB 202, Calgary, AB T2N 4N1, Canada
- Alberta Children's Hospital Research Institute, University of Calgary, 2500 University Dr NW, Calgary, AB T2N 1N4, Canada
| | - Katerina Guschanski
- Department of Ecology and Genetics, Animal Ecology, Uppsala University, SE-75236 Uppsala, Sweden
- Institute of Ecology and Evolution, School of Biological Sciences, University of Edinburgh, Edinburgh EH8 9XP, UK
| | | | - Robin M D Beck
- School of Science, Engineering & Environment, University of Salford, Salford M5 4WT, UK
| | - Govindhaswamy Umapathy
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, India
- Laboratory for the Conservation of Endangered Species, CSIR-Centre for Cellular and Molecular Biology, Hyderabad 500007, India
| | - Christian Roos
- Gene Bank of Primates and Primate Genetics Laboratory, German Primate Center, Leibniz Institute for Primate Research, Kellnerweg 4, 37077 Göttingen, Germany
| | - Jean P Boubli
- School of Science, Engineering & Environment, University of Salford, Salford M5 4WT, UK
| | - Monkol Lek
- Department of Genetics, Yale School of Medicine, New Haven, CT 06520, USA
| | - Shamil Sunyaev
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, 02115, USA
- Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, 02115, USA
| | - Anne O'Donnell-Luria
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Boston, MA, 02142, USA
- Division of Genetics and Genomics, Department of Pediatrics, Boston Children's Hospital, Harvard Medical School, Boston, MA, 02115, USA
- Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02115, USA
| | - Heidi L Rehm
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Boston, MA, 02142, USA
- Department of Genetics, Yale School of Medicine, New Haven, CT 06520, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
| | - Jinbo Xu
- Illumina Artificial Intelligence Laboratory, Illumina Inc., Foster City, CA, 94404, USA
- Toyota Technological Institute at Chicago, Chicago, IL 60637, USA
| | - Jeffrey Rogers
- Human Genome Sequencing Center and Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Tomas Marques-Bonet
- Institute of Evolutionary Biology (UPF-CSIC), PRBB, Dr. Aiguader 88, 08003 Barcelona, Spain
- CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, 08028 Barcelona, Spain
- Catalan Institution of Research and Advanced Studies (ICREA), Passeig de Lluís Companys, 23, 08010 Barcelona, Spain
- Institut Català de Paleontologia Miquel Crusafont, Universitat Autònoma de Barcelona, Edifici ICTA-ICP, c/ Columnes s/n, 08193 Cerdanyola del Vallès, Barcelona, Spain
| | - Kyle Kai-How Farh
- Illumina Artificial Intelligence Laboratory, Illumina Inc., Foster City, CA, 94404, USA
| |
|
18
|
Gao H, Hamp T, Ede J, Schraiber JG, McRae J, Singer-Berk M, Yang Y, Dietrich A, Fiziev P, Kuderna L, Sundaram L, Wu Y, Adhikari A, Field Y, Chen C, Batzoglou S, Aguet F, Lemire G, Reimers R, Balick D, Janiak MC, Kuhlwilm M, Orkin JD, Manu S, Valenzuela A, Bergman J, Rouselle M, Silva FE, Agueda L, Blanc J, Gut M, de Vries D, Goodhead I, Harris RA, Raveendran M, Jensen A, Chuma IS, Horvath J, Hvilsom C, Juan D, Frandsen P, de Melo FR, Bertuol F, Byrne H, Sampaio I, Farias I, do Amaral JV, Messias M, da Silva MNF, Trivedi M, Rossi R, Hrbek T, Andriaholinirina N, Rabarivola CJ, Zaramody A, Jolly CJ, Phillips-Conroy J, Wilkerson G, Abee C, Simmons JH, Fernandez-Duque E, Kanthaswamy S, Shiferaw F, Wu D, Zhou L, Shao Y, Zhang G, Keyyu JD, Knauf S, Le MD, Lizano E, Merker S, Navarro A, Batallion T, Nadler T, Khor CC, Lee J, Tan P, Lim WK, Kitchener AC, Zinner D, Gut I, Melin A, Guschanski K, Schierup MH, Beck RMD, Umapathy G, Roos C, Boubli JP, Lek M, Sunyaev S, O’Donnell A, Rehm H, Xu J, Rogers J, Marques-Bonet T, Kai-How Farh K. The landscape of tolerated genetic variation in humans and primates. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.05.01.538953. [PMID: 37205491 PMCID: PMC10187174 DOI: 10.1101/2023.05.01.538953] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/21/2023]
Abstract
Personalized genome sequencing has revealed millions of genetic differences between individuals, but our understanding of their clinical relevance remains largely incomplete. To systematically decipher the effects of human genetic variants, we obtained whole genome sequencing data for 809 individuals from 233 primate species, and identified 4.3 million common protein-altering variants with orthologs in human. We show that these variants can be inferred to have non-deleterious effects in human based on their presence at high allele frequencies in other primate populations. We use this resource to classify 6% of all possible human protein-altering variants as likely benign and impute the pathogenicity of the remaining 94% of variants with deep learning, achieving state-of-the-art accuracy for diagnosing pathogenic variants in patients with genetic diseases.
One Sentence Summary: Deep learning classifier trained on 4.3 million common primate missense variants predicts variant pathogenicity in humans.
Affiliation(s)
- Hong Gao
- Illumina Artificial Intelligence Laboratory, Illumina Inc.; Foster City, California, 94404, USA
| | - Tobias Hamp
- Illumina Artificial Intelligence Laboratory, Illumina Inc.; Foster City, California, 94404, USA
| | - Jeffrey Ede
- Illumina Artificial Intelligence Laboratory, Illumina Inc.; Foster City, California, 94404, USA
| | - Joshua G. Schraiber
- Illumina Artificial Intelligence Laboratory, Illumina Inc.; Foster City, California, 94404, USA
| | - Jeremy McRae
- Illumina Artificial Intelligence Laboratory, Illumina Inc.; Foster City, California, 94404, USA
| | - Moriel Singer-Berk
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard; Boston, Massachusetts, 02142, USA
| | - Yanshen Yang
- Illumina Artificial Intelligence Laboratory, Illumina Inc.; Foster City, California, 94404, USA
| | - Anastasia Dietrich
- Illumina Artificial Intelligence Laboratory, Illumina Inc.; Foster City, California, 94404, USA
| | - Petko Fiziev
- Illumina Artificial Intelligence Laboratory, Illumina Inc.; Foster City, California, 94404, USA
| | - Lukas Kuderna
- Illumina Artificial Intelligence Laboratory, Illumina Inc.; Foster City, California, 94404, USA
- Institute of Evolutionary Biology (UPF-CSIC); PRBB, Dr. Aiguader 88, 08003 Barcelona, Spain
| | - Laksshman Sundaram
- Illumina Artificial Intelligence Laboratory, Illumina Inc.; Foster City, California, 94404, USA
| | - Yibing Wu
- Illumina Artificial Intelligence Laboratory, Illumina Inc.; Foster City, California, 94404, USA
| | - Aashish Adhikari
- Illumina Artificial Intelligence Laboratory, Illumina Inc.; Foster City, California, 94404, USA
| | - Yair Field
- Illumina Artificial Intelligence Laboratory, Illumina Inc.; Foster City, California, 94404, USA
| | - Chen Chen
- Illumina Artificial Intelligence Laboratory, Illumina Inc.; Foster City, California, 94404, USA
| | - Serafim Batzoglou
- Illumina Artificial Intelligence Laboratory, Illumina Inc.; Foster City, California, 94404, USA
| | - Francois Aguet
- Illumina Artificial Intelligence Laboratory, Illumina Inc.; Foster City, California, 94404, USA
| | - Gabrielle Lemire
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard; Boston, Massachusetts, 02142, USA
- Division of Genetics and Genomics, Department of Pediatrics, Boston Children’s Hospital, Harvard Medical School; Boston, Massachusetts, 02115, USA
| | - Rebecca Reimers
- Division of Genetics and Genomics, Department of Pediatrics, Boston Children’s Hospital, Harvard Medical School; Boston, Massachusetts, 02115, USA
| | - Daniel Balick
- Division of Genetics, Brigham and Women’s Hospital, Harvard Medical School; Boston, Massachusetts, 02115, USA
| | - Mareike C. Janiak
- School of Science, Engineering & Environment, University of Salford; Salford, M5 4WT, United Kingdom
| | - Martin Kuhlwilm
- Institute of Evolutionary Biology (UPF-CSIC); PRBB, Dr. Aiguader 88, 08003 Barcelona, Spain
- Department of Evolutionary Anthropology, University of Vienna; Djerassiplatz 1, 1030, Vienna, Austria
- Human Evolution and Archaeological Sciences (HEAS), University of Vienna; 1030, Vienna, Austria
| | - Joseph D. Orkin
- Institute of Evolutionary Biology (UPF-CSIC); PRBB, Dr. Aiguader 88, 08003 Barcelona, Spain
- Département d’anthropologie, Université de Montréal; 3150 Jean-Brillant, Montréal, QC, H3T 1N8, Canada
| | - Shivakumara Manu
- Academy of Scientific and Innovative Research (AcSIR); Ghaziabad, 201002, India
- Laboratory for the Conservation of Endangered Species, CSIR-Centre for Cellular and Molecular Biology; Hyderabad, 500007, India
| | - Alejandro Valenzuela
- Institute of Evolutionary Biology (UPF-CSIC); PRBB, Dr. Aiguader 88, 08003 Barcelona, Spain
| | - Juraj Bergman
- Bioinformatics Research Centre, Aarhus University; Aarhus, 8000, Denmark
- Section for Ecoinformatics & Biodiversity, Department of Biology, Aarhus University; Aarhus, 8000, Denmark
| | | | - Felipe Ennes Silva
- Research Group on Primate Biology and Conservation, Mamirauá Institute for Sustainable Development; Estrada da Bexiga 2584, Tefé, Amazonas, CEP 69553-225, Brazil
- Faculty of Sciences, Department of Organismal Biology, Unit of Evolutionary Biology and Ecology, Université Libre de Bruxelles (ULB); Avenue Franklin D. Roosevelt 50, 1050, Brussels, Belgium
| | - Lidia Agueda
- CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST); Baldiri i Reixac 4, 08028, Barcelona, Spain
| | - Julie Blanc
- CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST); Baldiri i Reixac 4, 08028, Barcelona, Spain
| | - Marta Gut
- CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST); Baldiri i Reixac 4, 08028, Barcelona, Spain
| | - Dorien de Vries
- School of Science, Engineering & Environment, University of Salford; Salford, M5 4WT, United Kingdom
| | - Ian Goodhead
- School of Science, Engineering & Environment, University of Salford; Salford, M5 4WT, United Kingdom
| | - R. Alan Harris
- Human Genome Sequencing Center and Department of Molecular and Human Genetics, Baylor College of Medicine; Houston, Texas, 77030, USA
| | - Muthuswamy Raveendran
- Human Genome Sequencing Center and Department of Molecular and Human Genetics, Baylor College of Medicine; Houston, Texas, 77030, USA
| | - Axel Jensen
- Department of Ecology and Genetics, Animal Ecology, Uppsala University; SE-75236, Uppsala, Sweden
| | | | - Julie Horvath
- North Carolina Museum of Natural Sciences; Raleigh, North Carolina, 27601, USA
- Department of Biological and Biomedical Sciences, North Carolina Central University; Durham, North Carolina, 27707, USA
- Department of Biological Sciences, North Carolina State University; Raleigh, North Carolina, 27695, USA
- Department of Evolutionary Anthropology, Duke University; Durham, North Carolina, 27708, USA
- Renaissance Computing Institute, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | | | - David Juan
- Institute of Evolutionary Biology (UPF-CSIC); PRBB, Dr. Aiguader 88, 08003 Barcelona, Spain
| | | | | | - Fabricio Bertuol
- Universidade Federal do Amazonas, Departamento de Genética, Laboratório de Evolução e Genética Animal (LEGAL); Manaus, Amazonas, 69080-900, Brazil
| | - Hazel Byrne
- Department of Anthropology, University of Utah; Salt Lake City, Utah, 84102, USA
| | - Iracilda Sampaio
- Universidade Federal do Para; Guamá, Belém - PA, 66075-110, Brazil
| | - Izeni Farias
- Universidade Federal do Amazonas, Departamento de Genética, Laboratório de Evolução e Genética Animal (LEGAL); Manaus, Amazonas, 69080-900, Brazil
| | - João Valsecchi do Amaral
- Research Group on Terrestrial Vertebrate Ecology, Mamirauá Institute for Sustainable Development; Tefé, Amazonas, 69553-225, Brazil
- Rede de Pesquisa para Estudos sobre Diversidade, Conservação e Uso da Fauna na Amazônia – RedeFauna; Manaus, Amazonas, 69080-900, Brazil
- Comunidad de Manejo de Fauna Silvestre en la Amazonía y en Latinoamérica – ComFauna; Iquitos, Loreto, 16001, Peru
| | - Mariluce Messias
- Universidade Federal de Rondonia; Porto Velho, Rondônia, 78900-000, Brazil
- PPGREN - Programa de Pós-Graduação “Conservação e Uso dos Recursos Naturais” and BIONORTE - Programa de Pós-Graduação em Biodiversidade e Biotecnologia da Rede BIONORTE, Universidade Federal de Rondonia; Porto Velho, Rondônia, 78900-000, Brazil
| | - Maria N. F. da Silva
- Instituto Nacional de Pesquisas da Amazonia; Petrópolis, Manaus - AM, 69067-375, Brazil
| | - Mihir Trivedi
- Laboratory for the Conservation of Endangered Species, CSIR-Centre for Cellular and Molecular Biology; Hyderabad, 500007, India
| | - Rogerio Rossi
- Universidade Federal do Mato Grosso; Boa Esperança, Cuiabá - MT, 78060-900, Brazil
| | - Tomas Hrbek
- Universidade Federal do Amazonas, Departamento de Genética, Laboratório de Evolução e Genética Animal (LEGAL); Manaus, Amazonas, 69080-900, Brazil
- Department of Biology, Trinity University; San Antonio, Texas, 78212, USA
| | - Nicole Andriaholinirina
- Life Sciences and Environment, Technology and Environment of Mahajanga, University of Mahajanga; Mahajanga, 401, Madagascar
| | - Clément J. Rabarivola
- Life Sciences and Environment, Technology and Environment of Mahajanga, University of Mahajanga; Mahajanga, 401, Madagascar
| | - Alphonse Zaramody
- Life Sciences and Environment, Technology and Environment of Mahajanga, University of Mahajanga; Mahajanga, 401, Madagascar
| | | | | | - Gregory Wilkerson
- Keeling Center for Comparative Medicine and Research, MD Anderson Cancer Center; Houston, Texas, 77030, USA
| | | | - Joe H. Simmons
- Keeling Center for Comparative Medicine and Research, MD Anderson Cancer Center; Houston, Texas, 77030, USA
| | - Eduardo Fernandez-Duque
- Yale University; New Haven, Connecticut, 06520, USA
- Universidad Nacional de Formosa, Argentina Fundacion ECO, Formosa, Argentina
| | | | | | - Dongdong Wu
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences; Kunming, Yunnan, 650223, China
| | - Long Zhou
- Center for Evolutionary & Organismal Biology, Zhejiang University School of Medicine, Hangzhou, 310058, China
| | - Yong Shao
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences; Kunming, Yunnan, 650223, China
| | - Guojie Zhang
- Center for Evolutionary & Organismal Biology, Zhejiang University School of Medicine, Hangzhou, 310058, China
- Villum Center for Biodiversity Genomics, Section for Ecology and Evolution, Department of Biology, University of Copenhagen; Copenhagen, DK-2100, Denmark
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, 650223, China
- Liangzhu Laboratory, Zhejiang University Medical Center; 1369 West Wenyi Road, Hangzhou, 311121, China
- Women’s Hospital, School of Medicine, Zhejiang University; 1 Xueshi Road, Shangcheng District, Hangzhou, 310006, China
| | - Julius D. Keyyu
- Tanzania Wildlife Research Institute (TAWIRI), Head Office; P.O.Box 661, Arusha, Tanzania
| | - Sascha Knauf
- Institute of International Animal Health/One Health, Friedrich-Loeffler-Institut, Federal Research Institute for Animal Health; 17493 Greifswald - Isle of Riems, Germany
| | - Minh D. Le
- Department of Environmental Ecology, Faculty of Environmental Sciences, University of Science and Central Institute for Natural Resources and Environmental Studies, Vietnam National University; Hanoi, 100000, Vietnam
| | - Esther Lizano
- Institute of Evolutionary Biology (UPF-CSIC); PRBB, Dr. Aiguader 88, 08003 Barcelona, Spain
- Institut Català de Paleontologia Miquel Crusafont, Universitat Autònoma de Barcelona, Barcelona, Spain; Catalan Institution of Research and Advanced Studies (ICREA), Barcelona, Spain
| | - Stefan Merker
- Department of Zoology, State Museum of Natural History Stuttgart; 70191 Stuttgart, Germany
| | - Arcadi Navarro
- Institute of Evolutionary Biology (UPF-CSIC); PRBB, Dr. Aiguader 88, 08003 Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA) and Universitat Pompeu Fabra, Pg. Luís Companys 23, Barcelona, 08010, Spain
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology; Av. Doctor Aiguader, N88, Barcelona, 08003, Spain
- BarcelonaBeta Brain Research Center, Pasqual Maragall Foundation; C. Wellington 30, Barcelona, 08005, Spain
| | - Thomas Batallion
- Bioinformatics Research Centre, Aarhus University; Aarhus, 8000, Denmark
| | - Tilo Nadler
- Cuc Phuong Commune; Nho Quan District, Ninh Binh Province, 430000, Vietnam
| | - Chiea Chuen Khor
- Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), 60 Biopolis Street, Genome, Singapore 138672, Republic of Singapore
| | - Jessica Lee
- Mandai Nature; 80 Mandai Lake Road, Singapore 729826, Republic of Singapore
| | - Patrick Tan
- Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), 60 Biopolis Street, Genome, Singapore 138672, Republic of Singapore
- SingHealth Duke-NUS Institute of Precision Medicine (PRISM); Singapore 168582, Republic of Singapore
- Cancer and Stem Cell Biology Program, Duke-NUS Medical School; Singapore 168582, Republic of Singapore
| | - Weng Khong Lim
- SingHealth Duke-NUS Institute of Precision Medicine (PRISM); Singapore 168582, Republic of Singapore
- Cancer and Stem Cell Biology Program, Duke-NUS Medical School; Singapore 168582, Republic of Singapore
- SingHealth Duke-NUS Genomic Medicine Centre; Singapore 168582, Republic of Singapore
| | - Andrew C. Kitchener
- Department of Natural Sciences, National Museums Scotland; Chambers Street, Edinburgh, EH1 1JF, UK
- School of Geosciences, University of Edinburgh; Drummond Street, Edinburgh, EH8 9XP, UK
| | - Dietmar Zinner
- Cognitive Ethology Laboratory, German Primate Center, Leibniz Institute for Primate Research; 37077 Göttingen, Germany
- Department of Primate Cognition, Georg-August-Universität Göttingen; 37077 Göttingen, Germany
| | - Ivo Gut
- CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST); Baldiri i Reixac 4, 08028, Barcelona, Spain
- Universitat Pompeu Fabra, Pg. Luís Companys 23, Barcelona, 08010, Spain
| | - Amanda Melin
- Leibniz Science Campus Primate Cognition; 37077 Göttingen, Germany
- Department of Anthropology & Archaeology and Department of Medical Genetics
| | - Katerina Guschanski
- Department of Ecology and Genetics, Animal Ecology, Uppsala University; SE-75236, Uppsala, Sweden
- Alberta Children’s Hospital Research Institute; University of Calgary; 2500 University Dr NW T2N 1N4, Calgary, Alberta, Canada
| | | | - Robin M. D. Beck
- School of Science, Engineering & Environment, University of Salford; Salford, M5 4WT, United Kingdom
| | - Govindhaswamy Umapathy
- Academy of Scientific and Innovative Research (AcSIR); Ghaziabad, 201002, India
- Laboratory for the Conservation of Endangered Species, CSIR-Centre for Cellular and Molecular Biology; Hyderabad, 500007, India
| | - Christian Roos
- Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh; Edinburgh, EH8 9XP, UK
| | - Jean P. Boubli
- School of Science, Engineering & Environment, University of Salford; Salford, M5 4WT, United Kingdom
| | - Monkol Lek
- Gene Bank of Primates and Primate Genetics Laboratory, German Primate Center, Leibniz Institute for Primate Research; Kellnerweg 4, 37077 Göttingen, Germany
| | - Shamil Sunyaev
- Division of Genetics, Brigham and Women’s Hospital, Harvard Medical School; Boston, Massachusetts, 02115, USA
- Department of Genetics, Yale School of Medicine; New Haven, Connecticut, 06520, USA
| | - Anne O’Donnell
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard; Boston, Massachusetts, 02142, USA
- Division of Genetics and Genomics, Department of Pediatrics, Boston Children’s Hospital, Harvard Medical School; Boston, Massachusetts, 02115, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, 02115, USA
| | - Heidi Rehm
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard; Boston, Massachusetts, 02142, USA
- Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital and Harvard Medical School; Boston, Massachusetts, 02115, USA
| | - Jinbo Xu
- Illumina Artificial Intelligence Laboratory, Illumina Inc.; Foster City, California, 94404, USA
- Toyota Technological Institute at Chicago; Chicago, Illinois, 60637, USA
| | - Jeffrey Rogers
- Human Genome Sequencing Center and Department of Molecular and Human Genetics, Baylor College of Medicine; Houston, Texas, 77030, USA
| | - Tomas Marques-Bonet
- Institute of Evolutionary Biology (UPF-CSIC); PRBB, Dr. Aiguader 88, 08003 Barcelona, Spain
- CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST); Baldiri i Reixac 4, 08028, Barcelona, Spain
- Institut Català de Paleontologia Miquel Crusafont, Universitat Autònoma de Barcelona, Barcelona, Spain; Catalan Institution of Research and Advanced Studies (ICREA), Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA) and Universitat Pompeu Fabra, Pg. Luís Companys 23, Barcelona, 08010, Spain
| | - Kyle Kai-How Farh
- Illumina Artificial Intelligence Laboratory, Illumina Inc.; Foster City, California, 94404, USA
19
Luppino F, Adzhubei IA, Cassa CA, Toth-Petroczy A. DeMAG predicts the effects of variants in clinically actionable genes by integrating structural and evolutionary epistatic features. Nat Commun 2023; 14:2230. [PMID: 37076482 PMCID: PMC10115847 DOI: 10.1038/s41467-023-37661-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 04/25/2022] [Accepted: 03/27/2023] [Indexed: 04/21/2023] Open
Abstract
Despite the increasing use of genomic sequencing in clinical practice, the interpretation of rare genetic variants remains challenging even in well-studied disease genes, resulting in many patients with Variants of Uncertain Significance (VUSs). Computational Variant Effect Predictors (VEPs) provide valuable evidence in variant assessment, but they are prone to misclassifying benign variants, contributing to false positives. Here, we develop Deciphering Mutations in Actionable Genes (DeMAG), a supervised classifier for missense variants trained using extensive diagnostic data available in 59 actionable disease genes (American College of Medical Genetics and Genomics Secondary Findings v2.0, ACMG SF v2.0). DeMAG improves performance over existing VEPs by reaching balanced specificity (82%) and sensitivity (94%) on clinical data, and includes a novel epistatic feature, the 'partners score', which leverages evolutionary and structural partnerships of residues. The 'partners score' provides a general framework for modeling epistatic interactions, integrating both clinical and functional information. We provide our tool and predictions for all missense variants in 316 clinically actionable disease genes (demag.org) to facilitate the interpretation of variants and improve clinical decision-making.
Affiliation(s)
- Federica Luppino
- Max Planck Institute of Molecular Cell Biology and Genetics, 01307, Dresden, Germany
- Center for Systems Biology Dresden, 01307, Dresden, Germany
| | - Ivan A Adzhubei
- Brigham and Women's Hospital Division of Genetics, Harvard Medical School, Boston, MA, 02115, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, 02115, USA
| | - Christopher A Cassa
- Brigham and Women's Hospital Division of Genetics, Harvard Medical School, Boston, MA, 02115, USA
| | - Agnes Toth-Petroczy
- Max Planck Institute of Molecular Cell Biology and Genetics, 01307, Dresden, Germany
- Center for Systems Biology Dresden, 01307, Dresden, Germany
- Cluster of Excellence Physics of Life, TU Dresden, 01062, Dresden, Germany
20
Huh E, Agosto MA, Wensel TG, Lichtarge O. Coevolutionary signals in metabotropic glutamate receptors capture residue contacts and long-range functional interactions. J Biol Chem 2023; 299:103030. [PMID: 36806686 PMCID: PMC10060750 DOI: 10.1016/j.jbc.2023.103030] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 08/30/2022] [Revised: 02/09/2023] [Accepted: 02/10/2023] [Indexed: 02/18/2023] Open
Abstract
Upon ligand binding to a G protein-coupled receptor, extracellular signals are transmitted into a cell through sets of residue interactions that translate ligand binding into structural rearrangements. These interactions needed for functions impose evolutionary constraints so that, on occasion, mutations in one position may be compensated by other mutations at functionally coupled positions. To quantify the impact of amino acid substitutions in the context of major evolutionary divergence in the G protein-coupled receptor subfamily of metabotropic glutamate receptors (mGluRs), we combined two phylogenetic-based algorithms, Evolutionary Trace and covariation Evolutionary Trace, to infer potential structure-function couplings and roles in mGluRs. We found a subset of evolutionarily important residues at known functional sites and evidence of coupling among distinct structural clusters in mGluR. In addition, experimental mutagenesis and functional assays confirmed that some highly covariant residues are coupled, revealing their synergy. Collectively, these findings inform a critical step toward understanding the molecular and structural basis of amino acid variation patterns within mGluRs and provide insight for drug development, protein engineering, and analysis of naturally occurring variants.
Affiliation(s)
- Eunna Huh
- Department of Pharmacology and Chemical Biology, Baylor College of Medicine, Houston, Texas, USA
| | - Melina A Agosto
- Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, Texas, USA; Retina and Optic Nerve Research Laboratory, Department of Physiology and Biophysics, Dalhousie University, Halifax, Canada
| | - Theodore G Wensel
- Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, Texas, USA
| | - Olivier Lichtarge
- Department of Pharmacology and Chemical Biology, Baylor College of Medicine, Houston, Texas, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
21
Lai HY, Yu YH, Jhou YT, Liao CW, Leu JY. Multiple intermolecular interactions facilitate rapid evolution of essential genes. Nat Ecol Evol 2023; 7:745-755. [PMID: 36997737 PMCID: PMC10172115 DOI: 10.1038/s41559-023-02029-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/06/2022] [Accepted: 02/21/2023] [Indexed: 04/01/2023]
Abstract
Essential genes are commonly assumed to function in basic cellular processes and to change slowly. However, it remains unclear whether all essential genes are similarly conserved or if their evolutionary rates can be accelerated by specific factors. To address these questions, we replaced 86 essential genes of Saccharomyces cerevisiae with orthologues from four other species that diverged from S. cerevisiae about 50, 100, 270 and 420 Myr ago. We identify a group of fast-evolving genes that often encode subunits of large protein complexes, including anaphase-promoting complex/cyclosome (APC/C). Incompatibility of fast-evolving genes is rescued by simultaneously replacing interacting components, suggesting it is caused by protein co-evolution. Detailed investigation of APC/C further revealed that co-evolution involves not only primary interacting proteins but also secondary ones, suggesting the evolutionary impact of epistasis. Multiple intermolecular interactions in protein complexes may provide a microenvironment facilitating rapid evolution of their subunits.
Affiliation(s)
- Huei-Yi Lai
- Institute of Molecular Biology, Academia Sinica, Taipei, Taiwan
| | - Yen-Hsin Yu
- Institute of Molecular Biology, Academia Sinica, Taipei, Taiwan
| | - Yu-Ting Jhou
- Institute of Molecular Biology, Academia Sinica, Taipei, Taiwan
| | - Chia-Wei Liao
- Institute of Molecular Biology, Academia Sinica, Taipei, Taiwan
| | - Jun-Yi Leu
- Institute of Molecular Biology, Academia Sinica, Taipei, Taiwan
22
Densi A, Iyer RS, Bhat PJ. Synonymous and Nonsynonymous Substitutions in Dictyostelium discoideum Ammonium Transporter amtA Are Necessary for Functional Complementation in Saccharomyces cerevisiae. Microbiol Spectr 2023; 11:e0384722. [PMID: 36840598 PMCID: PMC10100761 DOI: 10.1128/spectrum.03847-22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2022] [Accepted: 01/24/2023] [Indexed: 02/24/2023] Open
Abstract
Ammonium transporters are present in all three domains of life. They have undergone extensive horizontal gene transfer (HGT), gene duplication, and functional diversification and therefore offer an excellent paradigm to study protein evolution. We attempted to complement a mep1Δmep2Δmep3Δ strain of Saccharomyces cerevisiae (triple-deletion strain), which otherwise cannot grow on ammonium as a sole nitrogen source at concentrations of <3 mM, with amtA of Dictyostelium discoideum, an orthologue of S. cerevisiae MEP2. We observed that amtA did not complement the triple-deletion strain of S. cerevisiae for growth on low-ammonium medium. We isolated two mutant derivatives of amtA (amtA M1 and amtA M2) from a PCR-generated mutant plasmid library that complemented the triple-deletion strain of S. cerevisiae. amtA M1 bears three nonsynonymous and two synonymous substitutions, which are necessary for its functionality. amtA M2 bears two nonsynonymous substitutions and one synonymous substitution, all of which are necessary for functionality. Interestingly, AmtA M1 transports ammonium but does not confer methylamine toxicity, while AmtA M2 transports ammonium and confers methylamine toxicity, demonstrating functional diversification. Preliminary biochemical analyses indicated that the mutants differ in their conformations as well as their mechanisms of ammonium transport. These intriguing results clearly point out that protein evolution cannot be fathomed by studying nonsynonymous and synonymous substitutions in isolation. The above-described observations have significant implications for various facets of biological processes and are discussed in detail.
IMPORTANCE: Functional diversification following gene duplication is one of the major driving forces of protein evolution. While the role of nonsynonymous substitutions in the functional diversification of proteins is well recognized, knowledge of the role of synonymous substitutions in protein evolution is in its infancy. Using functional complementation, we isolated two functional alleles of the D. discoideum ammonium transporter gene (amtA), which otherwise does not function in S. cerevisiae as an ammonium transporter. One of them is an ammonium transporter, while the other is an ammonium transporter that also confers methylammonium (ammonium analogue) toxicity, suggesting functional diversification. Surprisingly, both alleles require a combination of synonymous and nonsynonymous substitutions for their functionality. These results bring out a hitherto-unknown pathway of protein evolution and pave the way for not only understanding protein evolution but also interpreting single nucleotide polymorphisms (SNPs).
Affiliation(s)
- Asha Densi
- Department of Biosciences and Bioengineering, Indian Institute of Technology Bombay, Mumbai, India
| | - Revathi S. Iyer
- Department of Biosciences and Bioengineering, Indian Institute of Technology Bombay, Mumbai, India
| | - Paike Jayadeva Bhat
- Department of Biosciences and Bioengineering, Indian Institute of Technology Bombay, Mumbai, India
23
Azbukina N, Zharikova A, Ramensky V. Intragenic compensation through the lens of deep mutational scanning. Biophys Rev 2022; 14:1161-1182. [PMID: 36345285 PMCID: PMC9636336 DOI: 10.1007/s12551-022-01005-w] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 08/14/2022] [Accepted: 09/26/2022] [Indexed: 12/20/2022] Open
Abstract
A significant fraction of mutations in proteins are deleterious and result in adverse consequences for protein function, stability, or interaction with other molecules. Intragenic compensation is a specific case of positive epistasis when a neutral missense mutation cancels the effect of a deleterious mutation in the same protein. Permissive compensatory mutations facilitate protein evolution, since without them all sequences would be extremely conserved. Understanding compensatory mechanisms is an important scientific challenge at the intersection of protein biophysics and evolution. In human genetics, intragenic compensatory interactions are important since they may result in variable penetrance of pathogenic mutations or fixation of pathogenic human alleles in orthologous proteins from related species. The latter phenomenon complicates computational and clinical inference of an allele's pathogenicity. Deep mutational scanning is a relatively new technique that enables experimental studies of functional effects of thousands of mutations in proteins. We review the important aspects of the field and discuss existing limitations of current datasets. We reviewed ten published DMS datasets with quantified functional effects of single and double mutations and described rates and patterns of intragenic compensation in eight of them.
Supplementary Information: The online version contains supplementary material available at 10.1007/s12551-022-01005-w.
Affiliation(s)
- Nadezhda Azbukina
- Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, 1-73, Leninskie Gory, 119991 Moscow, Russia
| | - Anastasia Zharikova
- Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, 1-73, Leninskie Gory, 119991 Moscow, Russia
- National Medical Research Center for Therapy and Preventive Medicine, Petroverigsky per., 10, Bld.3, 101000 Moscow, Russia
| | - Vasily Ramensky
- Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, 1-73, Leninskie Gory, 119991 Moscow, Russia
- National Medical Research Center for Therapy and Preventive Medicine, Petroverigsky per., 10, Bld.3, 101000 Moscow, Russia
24
Vigué L, Croce G, Petitjean M, Ruppé E, Tenaillon O, Weigt M. Deciphering polymorphism in 61,157 Escherichia coli genomes via epistatic sequence landscapes. Nat Commun 2022; 13:4030. [PMID: 35821377 PMCID: PMC9276797 DOI: 10.1038/s41467-022-31643-3] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 11/19/2021] [Accepted: 06/27/2022] [Indexed: 12/05/2022] Open
Abstract
Characterizing the effect of mutations is key to understanding the evolution of protein sequences and to separating neutral amino-acid changes from deleterious ones. Epistatic interactions between residues can lead to a context dependence of mutation effects. Context dependence constrains the amino-acid changes that can contribute to polymorphism in the short term, and the ones that can accumulate between species in the long term. We use computational approaches to accurately predict the polymorphisms segregating in a panel of 61,157 Escherichia coli genomes from the analysis of distant homologues. By comparing a context-aware Direct-Coupling Analysis modelling to a non-epistatic approach, we show that the genetic context strongly constrains the tolerable amino acids in 30% to 50% of amino-acid sites. The study of more distant species suggests the gradual build-up of genetic context over long evolutionary timescales by the accumulation of small epistatic contributions.
Affiliation(s)
- Lucile Vigué
  - Université Paris Cité and Université Sorbonne Paris Nord, Inserm, IAME, F-75018, Paris, France
- Giancarlo Croce
  - Department of Oncology, Ludwig Institute for Cancer Research Lausanne, University of Lausanne, Lausanne, Switzerland
  - Swiss Institute of Bioinformatics-SIB, Lausanne, Switzerland
- Marie Petitjean
  - Université Paris Cité and Université Sorbonne Paris Nord, Inserm, IAME, F-75018, Paris, France
- Etienne Ruppé
  - Université Paris Cité and Université Sorbonne Paris Nord, Inserm, IAME, F-75018, Paris, France
  - Laboratoire de Bactériologie, Hôpital Bichat, APHP, Paris, France
- Olivier Tenaillon
  - Université Paris Cité and Université Sorbonne Paris Nord, Inserm, IAME, F-75018, Paris, France
- Martin Weigt
  - Sorbonne Université, CNRS, Institut de Biologie Paris Seine, Computational and Quantitative Biology-LCQB, Paris, France
|
25
|
Pfennig A, Lachance J. Hybrid fitness effects modify fixation probabilities of introgressed alleles. G3 Genes|Genomes|Genetics 2022; 12:6583188. [PMID: 35536195 PMCID: PMC9258535 DOI: 10.1093/g3journal/jkac113] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/04/2021] [Accepted: 04/28/2022] [Indexed: 11/12/2022]
Abstract
Hybridization is a common occurrence in natural populations, and introgression is a major source of genetic variation. Despite the evolutionary importance of adaptive introgression, classical population genetics theory does not take into account hybrid fitness effects. Specifically, heterosis (i.e. hybrid vigor) and Dobzhansky–Muller incompatibilities influence the fates of introgressed alleles. Here, we explicitly account for polygenic, unlinked hybrid fitness effects when tracking a rare introgressed marker allele. These hybrid fitness effects quickly decay over time due to repeated backcrossing, enabling a separation-of-timescales approach. Using diffusion and branching process theory in combination with computer simulations, we formalize the intuition behind how hybrid fitness effects affect introgressed alleles. We find that hybrid fitness effects can significantly hinder or boost the fixation probability of introgressed alleles, depending on the relative strength of heterosis and Dobzhansky–Muller incompatibility effects. We show that the inclusion of a correction factor (α, representing the compounded effects of hybrid fitness effects over time) into classic population genetics theory yields accurate fixation probabilities. Despite having a strong impact on the probability of fixation, hybrid fitness effects only subtly change the distribution of fitness effects of introgressed alleles that reach fixation. Although strong Dobzhansky–Muller incompatibility effects may expedite the loss of introgressed alleles, fixation times are largely unchanged by hybrid fitness effects.
Affiliation(s)
- Aaron Pfennig
  - School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA 30332, USA
- Joseph Lachance
  - School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA 30332, USA
|
26
|
Park Y, Metzger BPH, Thornton JW. Epistatic drift causes gradual decay of predictability in protein evolution. Science 2022; 376:823-830. [PMID: 35587978 DOI: 10.1126/science.abn6895] [Citation(s) in RCA: 55] [Impact Index Per Article: 18.3] [Indexed: 12/11/2022]
Abstract
Epistatic interactions can make the outcomes of evolution unpredictable, but no comprehensive data are available on the extent and temporal dynamics of changes in the effects of mutations as protein sequences evolve. Here, we use phylogenetic deep mutational scanning to measure the functional effect of every possible amino acid mutation in a series of ancestral and extant steroid receptor DNA binding domains. Across 700 million years of evolution, epistatic interactions caused the effects of most mutations to become decorrelated from their initial effects and their windows of evolutionary accessibility to open and close transiently. Most effects changed gradually and without bias at rates that were largely constant across time, indicating a neutral process caused by many weak epistatic interactions. Our findings show that protein sequences drift inexorably into contingency and unpredictability, but that the process is statistically predictable, given sufficient phylogenetic and experimental data.
Affiliation(s)
- Yeonwoo Park
  - Committee on Genetics, Genomics, and Systems Biology, University of Chicago, Chicago, IL, USA
- Brian P H Metzger
  - Department of Ecology and Evolution, University of Chicago, Chicago, IL, USA
- Joseph W Thornton
  - Committee on Genetics, Genomics, and Systems Biology, University of Chicago, Chicago, IL, USA
  - Department of Ecology and Evolution, University of Chicago, Chicago, IL, USA
  - Department of Human Genetics, University of Chicago, Chicago, IL, USA
|
27
|
Local and Global Protein Interactions Contribute to Residue Entrenchment in Beta-Lactamase TEM-1. Antibiotics (Basel) 2022; 11:antibiotics11050652. [PMID: 35625296 PMCID: PMC9137480 DOI: 10.3390/antibiotics11050652] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/12/2022] [Revised: 04/29/2022] [Accepted: 05/05/2022] [Indexed: 11/22/2022] Open
Abstract
Due to their rapid evolution and their impact on healthcare, beta-lactamases, proteins that degrade beta-lactam antibiotics, are used as generic models of protein evolution. Therefore, we investigated the mutation effects in two distant beta-lactamases, TEM-1 and CTX-M-15. Interestingly, we found a site with a complex pattern of genetic interactions. Mutation G251W in TEM-1 inactivates the protein’s function, just as the reciprocal mutation, W251G, does in CTX-M-15. The phylogenetic analysis revealed that G has been entrenched in TEM-1’s background: while rarely observed throughout the phylogeny, it is essential in TEM-1. Using a rescue experiment in the TEM-1 G251W mutant, we identified sites that alleviate the effect of the change from G to W. While a few of these mutations could potentially involve local interactions, most of them were found on distant residues in the 3D structure. Many well-known mutations that have an impact on protein stability, such as M182T, were recovered. Our results therefore suggest that entrenchment of an amino acid may rely on diffuse interactions among multiple sites, with a major impact on protein stability.
|
28
|
Stolyarova AV, Neretina TV, Zvyagina EA, Fedotova AV, Kondrashov A, Bazykin GA. Complex fitness landscape shapes variation in a hyperpolymorphic species. eLife 2022; 11:76073. [PMID: 35532122 PMCID: PMC9187340 DOI: 10.7554/elife.76073] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 12/03/2021] [Accepted: 05/09/2022] [Indexed: 11/13/2022] Open
Abstract
It is natural to assume that patterns of genetic variation in hyperpolymorphic species can reveal large-scale properties of the fitness landscape that are hard to detect by studying species with ordinary levels of genetic variation. Here, we study such patterns in a fungus Schizophyllum commune, the most polymorphic species known. Throughout the genome, short-range linkage disequilibrium (LD) caused by attraction of minor alleles is higher between pairs of nonsynonymous than of synonymous variants. This effect is especially pronounced for pairs of sites that are located within the same gene, especially if a large fraction of the gene is covered by haploblocks, genome segments where the gene pool consists of two highly divergent haplotypes, which is a signature of balancing selection. Haploblocks are usually shorter than 1000 nucleotides, and collectively cover about 10% of the S. commune genome. LD tends to be substantially higher for pairs of nonsynonymous variants encoding amino acids that interact within the protein. There is a substantial correlation between LDs at the same pairs of nonsynonymous mutations in the USA and the Russian populations. These patterns indicate that selection in S. commune involves positive epistasis due to compensatory interactions between nonsynonymous alleles. When less polymorphic species are studied, analogous patterns can be detected only through interspecific comparisons.
Changes to DNA known as mutations may alter how the proteins and other components of a cell work, and thus play an important role in allowing living things to evolve new traits and abilities over many generations. Whether a mutation is beneficial or harmful may differ depending on the genetic background of the individual – that is, depending on other mutations present in other positions within the same gene – due to a phenomenon called epistasis.
Epistasis is known to affect how various species accumulate differences in their DNA compared to each other over time. For example, a mutation that is rare in humans and known to cause disease may be widespread in other primates because its negative effect is canceled out by another mutation that is standard for these species but absent in humans. However, it remains unclear whether epistasis plays a significant part in shaping genetic differences between individuals of the same species. A type of fungus known as Schizophyllum commune lives on rotting wood and is found across the world. It is one of the most genetically diverse species currently known, so there is a higher chance of pairs of compensatory mutations occurring and persisting for a long time in S. commune than in most other species, providing a unique opportunity to study epistasis. Here, Stolyarova et al. studied two distinct populations of S. commune, one from the USA and one from Russia. The team found that – unlike in humans, flies and other less genetically diverse species – epistasis maintains combinations of mutations in S. commune that individually would be harmful to the fungus but together compensate for each other. For example, pairs of mutations affecting specific molecules known as amino acids – the building blocks of proteins – that physically interact with each other tended to be found together in the same individuals. One potential downside of having pairs of compensatory mutations in the genome is that when the organism reproduces, the process of making sex cells may split up these pairs so that harmful mutations are inherited without their partner mutations. Thus, epistasis may have helped shape the way S. commune and other genetically diverse species have evolved.
Affiliation(s)
- Tatiana V Neretina
  - Biological Faculty, Lomonosov Moscow State University, Moscow, Russian Federation
- Elena A Zvyagina
  - Biological Faculty, Lomonosov Moscow State University, Moscow, Russian Federation
- Anna V Fedotova
  - Skolkovo Institute of Science and Technology, Moscow, Russian Federation
- Alexey Kondrashov
  - Department of Ecology and Evolutionary Biology, University of Michigan-Ann Arbor, Ann Arbor, United States
- Georgii A Bazykin
  - Institute for Information Transmission Problems (Kharkevich Institute), Russian Academy of Sciences, Moscow, Russian Federation
|
29
|
Ding D, Green AG, Wang B, Lite TLV, Weinstein EN, Marks DS, Laub MT. Co-evolution of interacting proteins through non-contacting and non-specific mutations. Nat Ecol Evol 2022; 6:590-603. [PMID: 35361892 PMCID: PMC9090974 DOI: 10.1038/s41559-022-01688-0] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 09/25/2021] [Accepted: 01/31/2022] [Indexed: 01/08/2023]
Abstract
Proteins often accumulate neutral mutations that do not affect current functions but can profoundly influence future mutational possibilities and functions. Understanding such hidden potential has major implications for protein design and evolutionary forecasting but has been limited by a lack of systematic efforts to identify potentiating mutations. Here, through the comprehensive analysis of a bacterial toxin-antitoxin system, we identified all possible single substitutions in the toxin that enable it to tolerate otherwise interface-disrupting mutations in its antitoxin. Strikingly, the majority of enabling mutations in the toxin do not contact and promote tolerance non-specifically to many different antitoxin mutations, despite covariation in homologues occurring primarily between specific pairs of contacting residues across the interface. In addition, the enabling mutations we identified expand future mutational paths that both maintain old toxin-antitoxin interactions and form new ones. These non-specific mutations are missed by widely used covariation and machine learning methods. Identifying such enabling mutations will be critical for ensuring continued binding of therapeutically relevant proteins, such as antibodies, aimed at evolving targets.
Affiliation(s)
- David Ding
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA
  - Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Anna G Green
  - Department of Systems Biology, Harvard Medical School, Boston, MA, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Boyuan Wang
  - Department of Pharmacology, UT Southwestern Medical Center, Dallas, TX, USA
- Thuy-Lan Vo Lite
  - Harvard-MIT Division of Health Sciences and Technology, Harvard Medical School, Boston, MA, USA
- Debora S Marks
  - Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Michael T Laub
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA
  - Howard Hughes Medical Institute, Massachusetts Institute of Technology, Cambridge, MA, USA
|
30
|
Phylogenetic inference of changes in amino acid propensities with single-position resolution. PLoS Comput Biol 2022; 18:e1009878. [PMID: 35180226 PMCID: PMC9106220 DOI: 10.1371/journal.pcbi.1009878] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2021] [Revised: 05/13/2022] [Accepted: 01/28/2022] [Indexed: 11/19/2022] Open
Abstract
Fitness conferred by the same allele may differ between genotypes and environments, and these differences shape variation and evolution. Changes in amino acid propensities at protein sites over the course of evolution have been inferred from sequence alignments statistically, but the existing methods are data-intensive and aggregate multiple sites. Here, we develop an approach to detect individual amino acids that confer different fitness in different groups of species from combined sequence and phylogenetic data. Using the fact that the probability of a substitution to an amino acid depends on its fitness, our method looks for amino acids such that substitutions to them occur more frequently in one group of lineages than in another. We validate our method using simulated evolution of a protein site under different scenarios and show that it has high specificity for a wide range of assumptions regarding the underlying changes in selection, while its sensitivity differs between scenarios. We apply our method to the env gene of two HIV-1 subtypes, A and B, and to the HA gene of two influenza A subtypes, H1 and H3, and show that the inferred fitness changes are consistent with the fitness differences observed in deep mutational scanning experiments. We find that changes in relative fitness of different amino acid variants within a site do not always trigger episodes of positive selection and therefore may not result in an overall increase in the frequency of substitutions, but can still be detected from changes in relative frequencies of different substitutions.
|
31
|
Zeng Z, Aptekmann AA, Bromberg Y. Decoding the effects of synonymous variants. Nucleic Acids Res 2021; 49:12673-12691. [PMID: 34850938 PMCID: PMC8682775 DOI: 10.1093/nar/gkab1159] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 06/23/2021] [Revised: 11/02/2021] [Accepted: 11/08/2021] [Indexed: 12/12/2022] Open
Abstract
Synonymous single nucleotide variants (sSNVs) are common in the human genome but are often overlooked. However, sSNVs can have significant biological impact and may lead to disease. Existing computational methods for evaluating the effect of sSNVs suffer from the lack of gold-standard training/evaluation data and exhibit over-reliance on sequence conservation signals. We developed synVep (synonymous Variant effect predictor), a machine learning-based method that overcomes both of these limitations. Our training data was a combination of variants reported by gnomAD (observed) and those unreported, but possible in the human genome (generated). We used positive-unlabeled learning to purify the generated variant set of any likely unobservable variants. We then trained two sequential extreme gradient boosting models to identify subsets of the remaining variants putatively enriched and depleted in effect. Our method attained 90% precision/recall on a previously unseen set of variants. Furthermore, although synVep does not explicitly use conservation, its scores correlated with evolutionary distances between orthologs in cross-species variation analysis. synVep was also able to differentiate pathogenic vs. benign variants, as well as splice-site disrupting variants (SDV) vs. non-SDVs. Thus, synVep provides an important improvement in annotation of sSNVs, allowing users to focus on variants that most likely harbor effects.
Affiliation(s)
- Zishuo Zeng
  - Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, NJ 08873, USA
- Ariel A Aptekmann
  - Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, NJ 08873, USA
- Yana Bromberg
  - Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, NJ 08873, USA
  - Department of Genetics, Rutgers University, Piscataway, NJ 08854, USA
|
32
|
Dahanayaka BA, Vaghefi N, Snyman L, Martin A. Investigating In Vitro Mating Preference Between or Within the Two Forms of Pyrenophora teres and Its Hybrids. Phytopathology 2021; 111:2278-2286. [PMID: 34033506 DOI: 10.1094/phyto-02-21-0058-r] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/12/2023]
Abstract
Net blotch diseases result in significant yield losses to barley industries worldwide. They occur as net-form and spot-form net blotch caused by Pyrenophora teres f. teres and P. teres f. maculata, respectively. Hybridization between the forms was proposed to be rare, but recent identifications of field hybrids have renewed interest in the frequency and mechanisms underlying hybridization. This study investigates the mating preference of P. teres f. teres, P. teres f. maculata, and laboratory-produced hybrids in vitro, using 24 different isolates and four different experimental setups. Two crosses in our study produced ascospores during two intervals separated by a 32- to 35-day period of no ascospore production. For these crosses, P. teres f. teres isolates mated with isolates of the same form during the early ascospore production interval, and produced hybrids during the later interval. P. teres f. maculata isolates did not mate with isolates of the same form, but instead hybridized with P. teres f. teres isolates. Analyses based on DArTseq markers confirmed that laboratory-produced hybrids, when given the choice to mate with both P. teres f. teres and P. teres f. maculata, mated with P. teres f. teres isolates. These results reveal a novel concept: P. teres f. teres seems to have greater reproductive vigor than P. teres f. maculata, which could lead to an increased prevalence of hybrids in vivo.
Affiliation(s)
- Buddhika A Dahanayaka
  - Centre for Crop Health, University of Southern Queensland, Toowoomba, QLD 4350, Australia
- Niloofar Vaghefi
  - Centre for Crop Health, University of Southern Queensland, Toowoomba, QLD 4350, Australia
- Lislé Snyman
  - Department of Agriculture and Fisheries Queensland, Hermitage Research Facility, Warwick, QLD 4370, Australia
- Anke Martin
  - Centre for Crop Health, University of Southern Queensland, Toowoomba, QLD 4350, Australia
|
33
|
Behdenna A, Godfroid M, Petot P, Pothier J, Lambert A, Achaz G. A minimal yet flexible likelihood framework to assess correlated evolution. Syst Biol 2021; 71:823-838. [PMID: 34792608 DOI: 10.1093/sysbio/syab092] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/17/2020] [Revised: 11/04/2021] [Accepted: 11/09/2021] [Indexed: 11/14/2022] Open
Abstract
An evolutionary process is reflected in the sequence of changes of any trait (e.g. morphological or molecular) through time. Yet a better understanding of evolution would be gained by characterizing correlated evolution, that is, when two or more evolutionary processes interact. Previously developed parametric methods often require significant computing time as they rely on the estimation of many parameters. Here we propose a minimal likelihood framework modelling the joint evolution of two traits on a known phylogenetic tree. The type and strength of correlated evolution are characterized by a few parameters tuning mutation rates of each trait and interdependencies between these rates. The framework can be applied to study any discrete trait or character ranging from nucleotide substitution to gain or loss of a biological function. More specifically, it can be used to 1) test for independence between two evolutionary processes, 2) identify the type of interaction between them and 3) estimate parameter values of the most likely model of interaction. In the current implementation, the method takes as input a phylogenetic tree with discrete evolutionary events mapped on its branches. The method then maximizes the likelihood for one or several chosen scenarios. The strengths and limits of the method, as well as its relative power compared to a few other methods, are assessed using both simulations and data from 16S rRNA sequences in a sample of 54 γ-enterobacteria. We show that, even with datasets of fewer than 100 species, the method performs well in parameter estimation and in evolutionary model selection.
Affiliation(s)
- Abdelkader Behdenna
  - Institut de Systématique, Évolution, Biodiversité (ISYEB), Muséum National d'Histoire Naturelle, CNRS UMR 7205, Sorbonne Université, École Pratique des Hautes Études, Université des Antilles, 45 rue Buffon, 75005 Paris, France
  - SMILE Group, Center for Interdisciplinary Research in Biology (CIRB), Collège de France, CNRS, INSERM, Université PSL, 11, place Marcellin Berthelot, 75005 Paris, France
  - Epigene Labs, 7 Square Gabriel Fauré, 75017 Paris, France
- Maxime Godfroid
  - SMILE Group, Center for Interdisciplinary Research in Biology (CIRB), Collège de France, CNRS, INSERM, Université PSL, 11, place Marcellin Berthelot, 75005 Paris, France
- Patrice Petot
  - Institut de Systématique, Évolution, Biodiversité (ISYEB), Muséum National d'Histoire Naturelle, CNRS UMR 7205, Sorbonne Université, École Pratique des Hautes Études, Université des Antilles, 45 rue Buffon, 75005 Paris, France
  - SMILE Group, Center for Interdisciplinary Research in Biology (CIRB), Collège de France, CNRS, INSERM, Université PSL, 11, place Marcellin Berthelot, 75005 Paris, France
- Joël Pothier
  - Institut de Systématique, Évolution, Biodiversité (ISYEB), Muséum National d'Histoire Naturelle, CNRS UMR 7205, Sorbonne Université, École Pratique des Hautes Études, Université des Antilles, 45 rue Buffon, 75005 Paris, France
- Amaury Lambert
  - SMILE Group, Center for Interdisciplinary Research in Biology (CIRB), Collège de France, CNRS, INSERM, Université PSL, 11, place Marcellin Berthelot, 75005 Paris, France
  - Laboratoire de Probabilités, Statistique et Modélisation (LPSM), Sorbonne Université, CNRS UMR 8001, Université de Paris, 4, place Jussieu, 75005 Paris, France
- Guillaume Achaz
  - Institut de Systématique, Évolution, Biodiversité (ISYEB), Muséum National d'Histoire Naturelle, CNRS UMR 7205, Sorbonne Université, École Pratique des Hautes Études, Université des Antilles, 45 rue Buffon, 75005 Paris, France
  - SMILE Group, Center for Interdisciplinary Research in Biology (CIRB), Collège de France, CNRS, INSERM, Université PSL, 11, place Marcellin Berthelot, 75005 Paris, France
  - Éco-anthropologie, Muséum National d'Histoire Naturelle, CNRS UMR 7206, Université de Paris, place du Trocadéro, 75016 Paris, France
|
34
|
Youssef N, Susko E, Roger AJ, Bielawski JP. Shifts in amino acid preferences as proteins evolve: A synthesis of experimental and theoretical work. Protein Sci 2021; 30:2009-2028. [PMID: 34322924 PMCID: PMC8442975 DOI: 10.1002/pro.4161] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 05/13/2021] [Revised: 07/19/2021] [Accepted: 07/26/2021] [Indexed: 11/08/2022]
Abstract
Amino acid preferences vary across sites and time. While variation across sites is widely accepted, the extent and frequency of temporal shifts are contentious. Our understanding of the drivers of amino acid preference change is incomplete: To what extent are temporal shifts driven by adaptive versus nonadaptive evolutionary processes? We review phenomena that cause preferences to vary (e.g., evolutionary Stokes shift, contingency, and entrenchment) and clarify how they differ. To determine the extent and prevalence of shifted preferences, we review experimental and theoretical studies. Analyses of natural sequence alignments often detect decreases in homoplasy (convergence and reversions) rates, and variation in replacement rates with time: signals that are consistent with temporally changing preferences. While approaches inferring shifts in preferences from patterns in natural alignments are valuable, they are indirect since multiple mechanisms (both adaptive and nonadaptive) could lead to the observed signal. Alternatively, site-directed mutagenesis experiments allow for a more direct assessment of shifted preferences. They corroborate evidence from multiple sequence alignments, revealing that the preference for an amino acid at a site varies depending on the background sequence. However, shifts in preferences are usually minor in magnitude and sites with significantly shifted preferences are low in frequency. The small yet consistent perturbations in preferences could, nevertheless, jeopardize the accuracy of inference procedures, which assume constant preferences. We conclude by discussing if and how such shifts in preferences might influence widely used time-homogenous inference procedures and potential ways to mitigate such effects.
Affiliation(s)
- Noor Youssef
  - Department of Biology, Dalhousie University, Halifax, Nova Scotia, Canada
- Edward Susko
  - Department of Mathematics and Statistics, Dalhousie University, Halifax, Nova Scotia, Canada
- Andrew J. Roger
  - Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, Nova Scotia, Canada
- Joseph P. Bielawski
  - Department of Biology, Dalhousie University, Halifax, Nova Scotia, Canada
  - Department of Mathematics and Statistics, Dalhousie University, Halifax, Nova Scotia, Canada
|
35
|
Castel B, Fairhead S, Furzer OJ, Redkar A, Wang S, Cevik V, Holub EB, Jones JDG. Evolutionary trade-offs at the Arabidopsis WRR4A resistance locus underpin alternate Albugo candida race recognition specificities. The Plant Journal: for Cell and Molecular Biology 2021; 107:1490-1502. [PMID: 34181787 DOI: 10.1111/tpj.15396] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 03/31/2021] [Revised: 06/18/2021] [Accepted: 06/21/2021] [Indexed: 06/13/2023]
Abstract
The oomycete Albugo candida causes white rust of Brassicaceae, including vegetable and oilseed crops, and wild relatives such as Arabidopsis thaliana. Novel White Rust Resistance (WRR) genes from Arabidopsis enable new insights into plant/parasite co-evolution. WRR4A from Arabidopsis accession Columbia (Col-0) provides resistance to many but not all white rust races, and encodes a nucleotide-binding, leucine-rich repeat immune receptor. Col-0 WRR4A resistance is broken by AcEx1, an isolate of A. candida. We identified an allele of WRR4A in Arabidopsis accession Øystese-0 (Oy-0) and other accessions that confers full resistance to AcEx1. The Oy-0 WRR4A allele carries a C-terminal extension required for recognition of AcEx1, but shows reduced recognition of several effectors recognized by the Col-0 WRR4A allele. The Oy-0 WRR4A allele confers full resistance to AcEx1 when expressed in the oilseed crop Camelina sativa.
Affiliation(s)
- Baptiste Castel
  - The Sainsbury Laboratory, University of East Anglia, Norwich Research Park, NR4 7UH, Norwich, United Kingdom
  - Department of Biological Sciences, National University of Singapore, Singapore
- Sebastian Fairhead
  - The Sainsbury Laboratory, University of East Anglia, Norwich Research Park, NR4 7UH, Norwich, United Kingdom
  - Warwick Crop Centre, School of Life Sciences, University of Warwick, CV35 9EF, Wellesbourne, United Kingdom
- Oliver J Furzer
  - The Sainsbury Laboratory, University of East Anglia, Norwich Research Park, NR4 7UH, Norwich, United Kingdom
  - Department of Biology, University of North Carolina, Chapel Hill, NC, 27599, USA
- Amey Redkar
  - The Sainsbury Laboratory, University of East Anglia, Norwich Research Park, NR4 7UH, Norwich, United Kingdom
  - Department of Genetics, University of Cordoba, 14071, Cordoba, Spain
- Shanshan Wang
  - The Sainsbury Laboratory, University of East Anglia, Norwich Research Park, NR4 7UH, Norwich, United Kingdom
- Volkan Cevik
  - The Sainsbury Laboratory, University of East Anglia, Norwich Research Park, NR4 7UH, Norwich, United Kingdom
  - Department of Biology and Biochemistry, The Milner Centre for Evolution, University of Bath, BA2 7AY, Bath, United Kingdom
- Eric B Holub
  - Warwick Crop Centre, School of Life Sciences, University of Warwick, CV35 9EF, Wellesbourne, United Kingdom
- Jonathan D G Jones
  - The Sainsbury Laboratory, University of East Anglia, Norwich Research Park, NR4 7UH, Norwich, United Kingdom
|
36
|
Serrano C, Teixeira CSS, Cooper DN, Carneiro J, Lopes-Marques M, Stenson PD, Amorim A, Prata MJ, Sousa SF, Azevedo L. Compensatory epistasis explored by molecular dynamics simulations. Hum Genet 2021; 140:1329-1342. [PMID: 34173867 DOI: 10.1007/s00439-021-02307-x] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 04/16/2021] [Accepted: 06/20/2021] [Indexed: 11/24/2022]
Abstract
A non-negligible proportion of human pathogenic variants are known to be present as wild type in at least some non-human mammalian species. The standard explanation for this finding is that molecular mechanisms of compensatory epistasis can alleviate the mutations' otherwise pathogenic effects. Examples of compensated variants have been described in the literature but the interacting residue(s) postulated to play a compensatory role have rarely been ascertained. In this study, the examination of five human X-chromosomally encoded proteins (FIX, GLA, HPRT1, NDP and OTC) allowed us to identify several candidate compensated variants. Strong evidence for a compensated/compensatory pair of amino acids in the coagulation FIXa protein (involving residues 270 and 271) was found in a variety of mammalian species. Both amino acid residues are located within the 60-loop, spatially close to the 39-loop that performs a key role in coagulation serine proteases. To understand the nature of the underlying interactions, molecular dynamics simulations were performed. The predicted conformational change in the 39-loop consequent to the Glu270Lys substitution (associated with hemophilia B) appears to impair the protein's interaction with its substrate but, importantly, such steric hindrance is largely mitigated in those proteins that carry the compensatory residue (Pro271) at the neighboring amino acid position.
Affiliation(s)
- Catarina Serrano
- i3S, Instituto de Investigação e Inovação em Saúde, Population Genetics and Evolution Group, Universidade do Porto, Rua Alfredo Allen 208, 4200-135, Porto, Portugal
- IPATIMUP-Institute of Molecular Pathology and Immunology, University of Porto, Rua Júlio Amaral de Carvalho 45, 4200-135, Porto, Portugal
- Department of Biology, Faculty of Sciences, University of Porto, Rua Do Campo Alegre, s/n, 4169-007, Porto, Portugal
| | - Carla S S Teixeira
- UCIBIO/REQUIMTE, BioSIM, Departamento de Biomedicina, Faculdade de Medicina da Universidade do Porto, Porto, Portugal
| | - David N Cooper
- Institute of Medical Genetics, School of Medicine, Cardiff University, Cardiff, CF14 4XN, UK
| | - João Carneiro
- CIIMAR, Interdisciplinary Centre of Marine and Environmental Research, Terminal de Cruzeiros do Porto de Leixões, Av. General Norton de Matos, s/n, 4450-208, Matosinhos, Portugal
| | - Mónica Lopes-Marques
- i3S, Instituto de Investigação e Inovação em Saúde, Population Genetics and Evolution Group, Universidade do Porto, Rua Alfredo Allen 208, 4200-135, Porto, Portugal
- IPATIMUP-Institute of Molecular Pathology and Immunology, University of Porto, Rua Júlio Amaral de Carvalho 45, 4200-135, Porto, Portugal
- Department of Biology, Faculty of Sciences, University of Porto, Rua Do Campo Alegre, s/n, 4169-007, Porto, Portugal
| | - Peter D Stenson
- Institute of Medical Genetics, School of Medicine, Cardiff University, Cardiff, CF14 4XN, UK
| | - António Amorim
- i3S, Instituto de Investigação e Inovação em Saúde, Population Genetics and Evolution Group, Universidade do Porto, Rua Alfredo Allen 208, 4200-135, Porto, Portugal
- IPATIMUP-Institute of Molecular Pathology and Immunology, University of Porto, Rua Júlio Amaral de Carvalho 45, 4200-135, Porto, Portugal
- Department of Biology, Faculty of Sciences, University of Porto, Rua Do Campo Alegre, s/n, 4169-007, Porto, Portugal
| | - Maria J Prata
- i3S, Instituto de Investigação e Inovação em Saúde, Population Genetics and Evolution Group, Universidade do Porto, Rua Alfredo Allen 208, 4200-135, Porto, Portugal
- IPATIMUP-Institute of Molecular Pathology and Immunology, University of Porto, Rua Júlio Amaral de Carvalho 45, 4200-135, Porto, Portugal
- Department of Biology, Faculty of Sciences, University of Porto, Rua Do Campo Alegre, s/n, 4169-007, Porto, Portugal
| | - Sérgio F Sousa
- UCIBIO/REQUIMTE, BioSIM, Departamento de Biomedicina, Faculdade de Medicina da Universidade do Porto, Porto, Portugal.
| | - Luísa Azevedo
- i3S, Instituto de Investigação e Inovação em Saúde, Population Genetics and Evolution Group, Universidade do Porto, Rua Alfredo Allen 208, 4200-135, Porto, Portugal.
- IPATIMUP-Institute of Molecular Pathology and Immunology, University of Porto, Rua Júlio Amaral de Carvalho 45, 4200-135, Porto, Portugal.
- Department of Biology, Faculty of Sciences, University of Porto, Rua Do Campo Alegre, s/n, 4169-007, Porto, Portugal.
| |
|
37
|
Miton CM, Buda K, Tokuriki N. Epistasis and intramolecular networks in protein evolution. Curr Opin Struct Biol 2021; 69:160-168. [PMID: 34077895 DOI: 10.1016/j.sbi.2021.04.007] [Citation(s) in RCA: 40] [Impact Index Per Article: 10.0] [Received: 01/18/2021] [Revised: 04/01/2021] [Accepted: 04/21/2021] [Indexed: 12/01/2022]
Abstract
Proteins are molecular machines composed of complex, highly connected amino acid networks. Their functional optimization requires the reorganization of these intramolecular networks by evolution. In this review, we discuss the mechanisms by which epistasis, that is, the dependence of the effect of a mutation on the genetic background, rewires intramolecular interactions to alter protein function. Deciphering the biophysical basis of epistasis is crucial to our understanding of evolutionary dynamics and the elucidation of sequence-structure-function relationships. We featured recent studies that provide insights into the molecular mechanisms giving rise to epistasis, particularly at the structural level. These studies illustrate the convoluted and fascinating nature of the intramolecular networks co-opted by epistasis during the evolution of protein function.
Affiliation(s)
- Charlotte M Miton
- Michael Smith Laboratories, University of British Columbia, Vancouver, V6T 1Z4, BC, Canada
| | - Karol Buda
- Michael Smith Laboratories, University of British Columbia, Vancouver, V6T 1Z4, BC, Canada
| | - Nobuhiko Tokuriki
- Michael Smith Laboratories, University of British Columbia, Vancouver, V6T 1Z4, BC, Canada.
| |
|
38
|
Yan Y, Li Z, Li Y, Wu Z, Yang R. Correlated Evolution of Large DNA Fragments in the 3D Genome of Arabidopsis thaliana. Mol Biol Evol 2021; 37:1621-1636. [PMID: 32044988 DOI: 10.1093/molbev/msaa031] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022] Open
Abstract
In eukaryotes, the three-dimensional (3D) conformation of the genome is far from random, and this nonrandom chromatin organization is strongly correlated with gene expression and protein function, which are two critical determinants of the selective constraints and evolutionary rates of genes. However, whether genes and other elements that are located close to each other in the 3D genome evolve in a coordinated way has not been investigated in any organism. To address this question, we constructed chromatin interaction networks (CINs) in Arabidopsis thaliana based on high-throughput chromosome conformation capture data and demonstrated that adjacent large DNA fragments in the CIN indeed exhibit more similar levels of polymorphism and evolutionary rates than random fragment pairs. Using simulations that account for the linear distance between fragments, we proved that the 3D chromosomal organization plays a role in the observed correlated evolution. Spatially interacting fragments also exhibit more similar mutation rates and functional constraints in both coding and noncoding regions than the random expectations, indicating that the correlated evolution between 3D neighbors is a result of combined evolutionary forces. A collection of 39 genomic and epigenomic features can explain much of the variance in genetic diversity and evolutionary rates across the genome. Moreover, features that have a greater effect on the evolution of regional sequences tend to show higher similarity between neighboring fragments in the CIN, suggesting a pivotal role of epigenetic modifications and chromatin organization in determining the correlated evolution of large DNA fragments in the 3D genome.
Affiliation(s)
- Yubin Yan
- College of Life Sciences, Northwest A&F University, Yangling, Shaanxi, China
| | - Zhaohong Li
- College of Life Sciences, Northwest A&F University, Yangling, Shaanxi, China
| | - Ye Li
- College of Life Sciences, Northwest A&F University, Yangling, Shaanxi, China
| | - Zefeng Wu
- College of Life Sciences, Northwest A&F University, Yangling, Shaanxi, China
| | - Ruolin Yang
- College of Life Sciences, Northwest A&F University, Yangling, Shaanxi, China
| |
|
39
|
Mahlich Y, Miller M, Zeng Z, Bromberg Y. Low Diversity of Human Variation Despite Mostly Mild Functional Impact of De Novo Variants. Front Mol Biosci 2021; 8:635382. [PMID: 33816556 PMCID: PMC8012514 DOI: 10.3389/fmolb.2021.635382] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/30/2020] [Accepted: 02/01/2021] [Indexed: 01/07/2023] Open
Abstract
Non-synonymous Single Nucleotide Variants (nsSNVs), resulting in single amino acid variants (SAVs), are important drivers of evolutionary adaptation across the tree of life. Humans carry on average over 10,000 SAVs per individual genome, many of which likely have little to no impact on the function of the protein they affect. Experimental evidence for protein function changes as a result of SAVs remains sparse – a situation that can be somewhat alleviated by predicting their impact using computational methods. Here, we used SNAP to examine both observed and in silico generated human variation in a set of 1,265 proteins that are consistently found across a number of diverse species. The number of SAVs that are predicted to have any functional effect on these proteins is smaller than expected, suggesting sequence/function optimization over evolutionary timescales. Additionally, we find that only a few of the yet-unobserved SAVs could drastically change the function of these proteins, while nearly a quarter would have only a mild functional effect. We observed that variants common in the human population localized to less conserved protein positions and carried mild to moderate functional effects more frequently than rare variants. As expected, rare variants carried severe effects more frequently than common variants. In line with current assumptions, we demonstrated that the change of the human reference sequence amino acid to the reference of another species (a cross-species variant) is unlikely to significantly impact protein function. However, we also observed that many cross-species variants may be weakly non-neutral for the purposes of quick adaptation to environmental changes, but may not be identified as such by current state-of-the-art methodology.
Affiliation(s)
- Yannick Mahlich
- Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, NJ, United States
| | - Maximillian Miller
- Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, NJ, United States
| | - Zishuo Zeng
- Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, NJ, United States
| | - Yana Bromberg
- Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, NJ, United States
- Department of Genetics, Rutgers University, Piscataway, NJ, United States
| |
|
40
|
Lai J, Sarkar IN. A Phylogenetic Approach to Analyze the Conservativeness of BRCA1 and BRCA2 Mutations. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2021; 2020:677-686. [PMID: 33936442 PMCID: PMC8075528] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/12/2023]
Abstract
Identifying pathogenic mutations in BRCA1 and BRCA2 is a critical step for breast cancer prediction. Genome-wide association studies (GWAS) are the most commonly used method for inferring pathogenic mutations. However, identifying pathogenic mutations using GWAS can be difficult. The hypothesis of this study is that the pathogenic mutations in human BRCA1/BRCA2, which are present in many species, are more likely to be located at evolutionarily conserved sites. This study defines evolutionary conservativeness based on the previously developed Characteristic Attribute Organization System (CAOS) software. ClinVar is used to identify human pathogenic mutations in BRCA1 and BRCA2. Statistical tests suggest that, compared to non-pathogenic mutations, human pathogenic mutations were more likely to be located at evolutionarily conserved positions. The approach presented in this study shows promise in identifying pathogenic mutations in humans, suggesting that the methodology may be applied to other disease-related genes to identify putative pathogenic mutations.
Affiliation(s)
- Jiaying Lai
- Center for Biomedical Informatics, Brown University, Providence, RI, USA
| | - Indra Neil Sarkar
- Center for Biomedical Informatics, Brown University, Providence, RI, USA
- Rhode Island Quality Institute, Providence, RI, USA
| |
|
41
|
Neverov AD, Popova AV, Fedonin GG, Cheremukhin EA, Klink GV, Bazykin GA. Episodic evolution of coadapted sets of amino acid sites in mitochondrial proteins. PLoS Genet 2021; 17:e1008711. [PMID: 33493156 PMCID: PMC7861529 DOI: 10.1371/journal.pgen.1008711] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 03/03/2020] [Revised: 02/04/2021] [Accepted: 12/07/2020] [Indexed: 11/19/2022] Open
Abstract
The rate of evolution differs between protein sites and changes with time. However, the link between these two phenomena remains poorly understood. Here, we design a phylogenetic approach for distinguishing pairs of amino acid sites that evolve concordantly, i.e., such that substitutions at one site trigger subsequent substitutions at the other; and also pairs of sites that evolve discordantly, so that substitutions at one site impede subsequent substitutions at the other. We distinguish groups of amino acid sites that undergo coordinated evolution and evolve discordantly from other such groups. In mitochondrion-encoded proteins of metazoans and fungi, we show that concordantly evolving sites are clustered in protein structures. By analysing the phylogenetic patterns of substitutions at concordantly and discordantly evolving site pairs, we find that concordant evolution has two distinct causes: epistatic interactions between amino acid substitutions and episodes of selection independently affecting substitutions at different sites. The rate of substitutions at concordantly evolving groups of protein sites changes in the course of evolution, indicating episodes of selection limited to some of the lineages. The phylogenetic positions of these changes are consistent between proteins, suggesting common selective forces underlying them. The mode and rate of evolution of a protein site depends on the effect of its mutations on protein fitness. The fitness effect of a mutation itself can change in the course of evolution for at least two reasons. First, it can be modulated by substitutions occurring at other sites, a phenomenon called epistasis. Second, changes in selection can be non-epistatic, affecting sites independently of one another. Here, we analyse substitutions accumulated by the evolving lineages of the five proteins encoded by the mitochondrial genomes of thousands of species of metazoans and fungi. 
We show that substitutions at different amino acid sites occur in a coordinated fashion, and this coordination is caused both by epistasis and by episodes of selection affecting groups of sites. We partition each protein into several groups of concordantly evolving sites such that evolution of sites from different groups is discordant, and show that the proteins encoded by the mitochondrial genome consist of coevolving structural blocks. Some of these blocks have a clear functional specialization, e.g. are associated with interfaces between proteins composing respiratory complexes. Together, our results reveal a previously unrecognized complexity in the causes of variation in evolutionary rates between protein sites.
Affiliation(s)
- Alexey D. Neverov
- Department of Molecular Diagnostics, Central Research Institute for Epidemiology, Moscow, Russia
| | - Anfisa V. Popova
- Department of Molecular Diagnostics, Central Research Institute for Epidemiology, Moscow, Russia
| | - Gennady G. Fedonin
- Department of Molecular Diagnostics, Central Research Institute for Epidemiology, Moscow, Russia
- Institute for Information Transmission Problems (Kharkevich Institute), Russian Academy of Sciences, Moscow, Russia
- Moscow Institute of Physics and Technology, Dolgoprudny, Moscow region, Russia
| | - Galya V. Klink
- Institute for Information Transmission Problems (Kharkevich Institute), Russian Academy of Sciences, Moscow, Russia
| | - Georgii A. Bazykin
- Institute for Information Transmission Problems (Kharkevich Institute), Russian Academy of Sciences, Moscow, Russia
- Skolkovo Institute of Science and Technology, Skolkovo, Russia
| |
|
42
|
Castellana S, Biagini T, Petrizzelli F, Parca L, Panzironi N, Caputo V, Vescovi AL, Carella M, Mazza T. MitImpact 3: modeling the residue interaction network of the Respiratory Chain subunits. Nucleic Acids Res 2021; 49:D1282-D1288. [PMID: 33300029 PMCID: PMC7779045 DOI: 10.1093/nar/gkaa1032] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Received: 08/08/2020] [Revised: 10/14/2020] [Accepted: 12/08/2020] [Indexed: 12/26/2022] Open
Abstract
Numerous lines of evidence have shown that the interaction between the nuclear and mitochondrial genomes ensures the efficient functioning of the OXPHOS complexes, with substantial implications in bioenergetics, adaptation, and disease. Their interaction is a fascinating and complex trait of the eukaryotic cell that MitImpact explores with its third major release. MitImpact expands its collection of genomic, clinical, and functional annotations of all non-synonymous substitutions of the human mitochondrial genome with new information on putative Compensated Pathogenic Deviations and co-varying amino acid sites of the Respiratory Chain subunits. It further provides evidence of energetic and structural residue compensation by techniques of molecular dynamics simulation. MitImpact is freely accessible at http://mitimpact.css-mendel.it.
Affiliation(s)
- Stefano Castellana
- Laboratory of Bioinformatics, IRCCS Casa Sollievo della Sofferenza, San Giovanni Rotondo (FG), 71013, Italy
| | - Tommaso Biagini
- Laboratory of Bioinformatics, IRCCS Casa Sollievo della Sofferenza, San Giovanni Rotondo (FG), 71013, Italy
| | - Francesco Petrizzelli
- Laboratory of Bioinformatics, IRCCS Casa Sollievo della Sofferenza, San Giovanni Rotondo (FG), 71013, Italy
- Department of Experimental Medicine, Sapienza University of Rome, Rome 00161, Italy
| | - Luca Parca
- Laboratory of Bioinformatics, IRCCS Casa Sollievo della Sofferenza, San Giovanni Rotondo (FG), 71013, Italy
| | - Noemi Panzironi
- Department of Experimental Medicine, Sapienza University of Rome, Rome 00161, Italy
| | - Viviana Caputo
- Department of Experimental Medicine, Sapienza University of Rome, Rome 00161, Italy
| | - Angelo Luigi Vescovi
- ISBReMIT Institute for Stem Cell Biology, Regenerative Medicine and Innovative Therapies, IRCCS Casa Sollievo della Sofferenza, San Giovanni Rotondo (FG), 71013, Italy
| | - Massimo Carella
- Laboratory of Medical Genetics, IRCCS Casa Sollievo della Sofferenza, San Giovanni Rotondo (FG) 71013, Italy
| | - Tommaso Mazza
- Laboratory of Bioinformatics, IRCCS Casa Sollievo della Sofferenza, San Giovanni Rotondo (FG), 71013, Italy
| |
|
43
|
Hill T, Unckless RL. Recurrent evolution of high virulence in isolated populations of a DNA virus. eLife 2020; 9:e58931. [PMID: 33112738 PMCID: PMC7685711 DOI: 10.7554/elife.58931] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 05/14/2020] [Accepted: 10/28/2020] [Indexed: 12/30/2022] Open
Abstract
Hosts and viruses are constantly evolving in response to each other: as a host attempts to suppress a virus, the virus attempts to evade and suppress the host's immune system. Here, we describe the recurrent evolution of a virulent strain of a DNA virus, which infects multiple Drosophila species. Specifically, we identified two distinct viral types that differ 100-fold in viral titer in infected individuals, with similar differences observed in multiple species. Our analysis suggests that one of the viral types recurrently evolved at least four times in the past ~30,000 years, three times in Arizona and once in another geographically distinct species. This recurrent evolution may be facilitated by an effective mutation rate which increases as each prior mutation increases viral titer and effective population size. The higher titer viral type suppresses the host immune system and has an increased virulence compared to the low viral titer type.
Affiliation(s)
- Tom Hill
- The Department of Molecular Biosciences, University of Kansas, Lawrence, United States
| | - Robert L Unckless
- The Department of Molecular Biosciences, University of Kansas, Lawrence, United States
| |
|
44
|
Rochman ND, Wolf YI, Koonin EV. Deep phylogeny of cancer drivers and compensatory mutations. Commun Biol 2020; 3:551. [PMID: 33009502 PMCID: PMC7532533 DOI: 10.1038/s42003-020-01276-7] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 04/20/2020] [Accepted: 09/03/2020] [Indexed: 12/14/2022] Open
Abstract
Driver mutations (DM) are the genetic impetus for most cancers. The DM are assumed to be deleterious in species evolution, being eliminated by purifying selection unless compensated by other mutations. We present deep phylogenies for 84 cancer driver genes and investigate the prevalence of 434 DM across gene-species trees. The DM are rare in species evolution, and 181 are completely absent, validating their negative fitness effect. The DM are more common in unicellular than in multicellular eukaryotes, suggesting a link between these mutations and cell proliferation control. 18 DM appear as the ancestral state in one or more major clades, including 3 among mammals. We identify within-gene, compensatory mutations for 98 DM and infer likely interactions between the DM and compensatory sites in protein structures. These findings elucidate the evolutionary status of DM and are expected to advance the understanding of the functions and evolution of oncogenes and tumor suppressors. Rochman et al. present deep phylogenies for 84 cancer driver genes and examine the prevalence of driver mutations across gene-species trees. Their results show that driver mutations are rare in species evolution and give insight into the evolution of driver mutations and oncogenes.
Affiliation(s)
- Nash D Rochman
- National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD, 20894, USA
| | - Yuri I Wolf
- National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD, 20894, USA
| | - Eugene V Koonin
- National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD, 20894, USA.
| |
|
45
|
Stolyarova AV, Nabieva E, Ptushenko VV, Favorov AV, Popova AV, Neverov AD, Bazykin GA. Senescence and entrenchment in evolution of amino acid sites. Nat Commun 2020; 11:4603. [PMID: 32929079 PMCID: PMC7490271 DOI: 10.1038/s41467-020-18366-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 10/10/2019] [Accepted: 08/20/2020] [Indexed: 01/01/2023] Open
Abstract
Amino acid propensities at a site change in the course of protein evolution. This may happen for two reasons. Changes may be triggered by substitutions at epistatically interacting sites elsewhere in the genome. Alternatively, they may arise due to environmental changes that are external to the genome. Here, we design a framework for distinguishing between these alternatives. Using analytical modelling and simulations, we show that they cause opposite dynamics of the fitness of the allele currently occupying the site: it tends to increase with the time since its origin due to epistasis ("entrenchment"), but to decrease due to random environmental fluctuations ("senescence"). By analysing the genomes of vertebrates and insects, we show that the amino acids originating at negatively selected sites experience strong entrenchment. By contrast, the amino acids originating at positively selected sites experience senescence. We propose that senescence of the current allele is a cause of adaptive evolution.
Affiliation(s)
- A V Stolyarova
- Center of Life Sciences, Skolkovo Institute of Science and Technology, Skolkovo, 143028, Russia.
| | - E Nabieva
- Center of Life Sciences, Skolkovo Institute of Science and Technology, Skolkovo, 143028, Russia
- Institute for Information Transmission Problems (Kharkevich Institute), Russian Academy of Sciences, Moscow, 127051, Russia
| | - V V Ptushenko
- Department of Photochemistry and Photobiology, N. M. Emanuel Institute of Biochemical Physics of Russian Academy of Sciences, Moscow, 119334, Russia
- A. N. Belozersky Institute of Physical-Chemical Biology, M. V. Lomonosov Moscow State University, Moscow, 119992, Russia
| | - A V Favorov
- Division of Biostatistics and Bioinformatics, Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins School of Medicine, Baltimore, MD, 21205, USA
- Laboratory of System Biology and Computational Genetics, Vavilov Institute of General Genetics, Moscow, 119991, Russia
| | - A V Popova
- Department of Molecular Diagnostics, Central Research Institute for Epidemiology, Moscow, 111123, Russia
| | - A D Neverov
- Department of Molecular Diagnostics, Central Research Institute for Epidemiology, Moscow, 111123, Russia
| | - G A Bazykin
- Center of Life Sciences, Skolkovo Institute of Science and Technology, Skolkovo, 143028, Russia
- Institute for Information Transmission Problems (Kharkevich Institute), Russian Academy of Sciences, Moscow, 127051, Russia
| |
|
46
|
Schober AF, Mathis AD, Ingle C, Park JO, Chen L, Rabinowitz JD, Junier I, Rivoire O, Reynolds KA. A Two-Enzyme Adaptive Unit within Bacterial Folate Metabolism. Cell Rep 2020; 27:3359-3370.e7. [PMID: 31189117 DOI: 10.1016/j.celrep.2019.05.030] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Received: 10/05/2017] [Revised: 04/05/2019] [Accepted: 05/09/2019] [Indexed: 11/29/2022] Open
Abstract
Enzyme function and evolution are influenced by the larger context of a metabolic pathway. Deleterious mutations or perturbations in one enzyme can often be compensated by mutations to others. We used comparative genomics and experiments to examine evolutionary interactions with the essential metabolic enzyme dihydrofolate reductase (DHFR). Analyses of synteny and co-occurrence across bacterial species indicate that DHFR is coupled to thymidylate synthase (TYMS) but relatively independent from the rest of folate metabolism. Using quantitative growth rate measurements and forward evolution in Escherichia coli, we demonstrate that the two enzymes adapt as a relatively independent unit in response to antibiotic stress. Metabolomic profiling revealed that TYMS activity must not exceed DHFR activity to prevent the depletion of reduced folates and the accumulation of the intermediate dihydrofolate. Comparative genomics analyses identified >200 gene pairs with similar statistical signatures of modular co-evolution, suggesting that cellular pathways may be decomposable into small adaptive units.
Affiliation(s)
- Andrew F Schober
- The Green Center for Systems Biology, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA; Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
| | - Andrew D Mathis
- The Green Center for Systems Biology, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
| | - Christine Ingle
- The Green Center for Systems Biology, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
| | - Junyoung O Park
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA; Department of Chemical and Biological Engineering, Princeton University, Princeton, NJ 08544, USA
| | - Li Chen
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA; Department of Chemistry, Princeton University, Princeton, NJ 08544, USA
| | - Joshua D Rabinowitz
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA; Department of Chemistry, Princeton University, Princeton, NJ 08544, USA
| | - Ivan Junier
- Centre National de la Recherche Scientifique, Université Grenoble Alpes, TIMC-IMAG, F-38000 Grenoble, France
| | - Olivier Rivoire
- Center for Interdisciplinary Research in Biology (CIRB), Collège de France, CNRS, INSERM, PSL Research University, F-75005 Paris, France
| | - Kimberly A Reynolds
- The Green Center for Systems Biology, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA; Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA.
| |
|
47
|
Monteiro AN, Bouwman P, Kousholt AN, Eccles DM, Millot GA, Masson JY, Schmidt MK, Sharan SK, Scully R, Wiesmüller L, Couch F, Vreeswijk MPG. Variants of uncertain clinical significance in hereditary breast and ovarian cancer genes: best practices in functional analysis for clinical annotation. J Med Genet 2020; 57:509-518. [PMID: 32152249 DOI: 10.1136/jmedgenet-2019-106368] [Citation(s) in RCA: 31] [Impact Index Per Article: 6.2] [Received: 06/12/2019] [Revised: 10/28/2019] [Accepted: 12/01/2019] [Indexed: 12/16/2022]
Affiliation(s)
- Alvaro N Monteiro
- Cancer Epidemiology, H. Lee Moffitt Cancer Center and Research Institute, Tampa, Florida, USA
- Peter Bouwman
- Division of Molecular Pathology, Oncode Institute, The Netherlands Cancer Institute, Amsterdam, The Netherlands
- Arne N Kousholt
- Division of Molecular Pathology, Oncode Institute, The Netherlands Cancer Institute, Amsterdam, The Netherlands
- Diana M Eccles
- Cancer Sciences, University of Southampton Faculty of Medicine, Southampton, UK
- Gael A Millot
- Hub-DBC, Institut Pasteur, USR 3756 CNRS, Paris, France
- Jean-Yves Masson
- CHU de Québec-Université Laval, Oncology Division, Laval University Cancer Research Center, Quebec City, Quebec, Canada
- Marjanka K Schmidt
- Division of Molecular Pathology, Oncode Institute, The Netherlands Cancer Institute, Amsterdam, The Netherlands
- Shyam K Sharan
- National Cancer Institute at Frederick, Frederick, Maryland, USA
- Ralph Scully
- Beth Israel Deaconess Medical Center, Boston, Massachusetts, USA
|
48
|
Sharma V, Hiller M. Losses of human disease-associated genes in placental mammals. NAR Genom Bioinform 2019; 2:lqz012. [PMID: 33575564 PMCID: PMC7671337 DOI: 10.1093/nargab/lqz012] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 07/04/2019] [Revised: 08/24/2019] [Accepted: 10/08/2019] [Indexed: 02/07/2023]
Abstract
We systematically investigate whether losses of human disease-associated genes occurred in other mammals during evolution. We first show that genes lost in any of 62 non-human mammals generally have a lower degree of pleiotropy, and are highly depleted in essential and disease-associated genes. Despite this under-representation, we discovered multiple genes implicated in human disease that are truly lost in non-human mammals. In most cases, traits resembling human disease symptoms are present but not deleterious in gene-loss species, exemplified by losses of genes causing human eye or teeth disorders in poor-vision or enamel-less mammals. We also found widespread losses of PCSK9 and CETP genes, where loss-of-function mutations in humans protect from atherosclerosis. Unexpectedly, we discovered losses of disease genes (TYMP, TBX22, ABCG5, ABCG8, MEFV, CTSE) where deleterious phenotypes do not manifest in the respective species. A remarkable example is the uric acid-degrading enzyme UOX, which we found to be inactivated in elephants and manatees. While UOX loss in hominoids led to high serum uric acid levels and a predisposition for gout, elephants and manatees exhibit low uric acid levels, suggesting alternative ways of metabolizing uric acid. Together, our results highlight numerous mammals that are 'natural knockouts' of human disease genes.
Affiliation(s)
- Virag Sharma
- Max Planck Institute of Molecular Cell Biology and Genetics, 01307 Dresden, Germany; Max Planck Institute for the Physics of Complex Systems, 01187 Dresden, Germany; Center for Systems Biology Dresden, 01307 Dresden, Germany
- Michael Hiller
- Max Planck Institute of Molecular Cell Biology and Genetics, 01307 Dresden, Germany; Max Planck Institute for the Physics of Complex Systems, 01187 Dresden, Germany; Center for Systems Biology Dresden, 01307 Dresden, Germany
|
49
|
Hoehe MR, Herwig R, Mao Q, Peters BA, Drmanac R, Church GM, Huebsch T. Significant abundance of cis configurations of coding variants in diploid human genomes. Nucleic Acids Res 2019; 47:2981-2995. [PMID: 30698752 PMCID: PMC6451136 DOI: 10.1093/nar/gkz031] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 08/27/2018] [Revised: 12/05/2018] [Accepted: 01/15/2019] [Indexed: 12/12/2022]
Abstract
To fully understand human genetic variation and its functional consequences, the specific distribution of variants between the two chromosomal homologues of genes must be known. The 'phase' of variants can significantly impact gene function and phenotype. To assess patterns of phase at large scale, we have analyzed 18 121 autosomal genes in 1092 statistically phased genomes from the 1000 Genomes Project and 184 experimentally phased genomes from the Personal Genome Project. Here we show that genes with cis-configurations of coding variants are more frequent than genes with trans-configurations in a genome, with global cis/trans ratios of ∼60:40. Significant cis-abundance was observed in virtually all genomes in all populations. Moreover, we identified a large group of genes exhibiting cis-configurations of protein-changing variants in excess, so-called 'cis-abundant genes', and a smaller group of 'trans-abundant genes'. These two gene categories were functionally distinguishable, and exhibited strikingly different distributional patterns of protein-changing variants. Underlying these phenomena was a shared set of phase-sensitive genes of importance for adaptation and evolution. This work establishes common patterns of phase as key characteristics of diploid human exomes and provides evidence for their functional significance, highlighting the importance of phase for the interpretation of protein-coding genetic variation and gene function.
Affiliation(s)
- Margret R Hoehe
- Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, 14195 Berlin, Germany
- Ralf Herwig
- Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, 14195 Berlin, Germany
- Qing Mao
- Complete Genomics, Inc., San Jose, CA 95112, USA
- Brock A Peters
- Complete Genomics, Inc., San Jose, CA 95112, USA; BGI-Shenzhen, Shenzhen 518083, China
- Radoje Drmanac
- Complete Genomics, Inc., San Jose, CA 95112, USA; BGI-Shenzhen, Shenzhen 518083, China
- George M Church
- Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
- Thomas Huebsch
- Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, 14195 Berlin, Germany
|
50
|
Zheng J, Payne JL, Wagner A. Cryptic genetic variation accelerates evolution by opening access to diverse adaptive peaks. Science 2019; 365:347-353. [DOI: 10.1126/science.aax1837] [Citation(s) in RCA: 67] [Impact Index Per Article: 11.2] [Received: 03/01/2019] [Accepted: 06/06/2019] [Indexed: 12/13/2022]
Abstract
Cryptic genetic variation can facilitate adaptation in evolving populations. To elucidate the underlying genetic mechanisms, we used directed evolution in Escherichia coli to accumulate variation in populations of yellow fluorescent proteins and then evolved these proteins toward the new phenotype of green fluorescence. Populations with cryptic variation evolved adaptive genotypes with greater diversity and higher fitness than populations without cryptic variation, which converged on similar genotypes. Populations with cryptic variation accumulated neutral or deleterious mutations that break the constraints on the order in which adaptive mutations arise. In doing so, cryptic variation opens paths to adaptive genotypes, creates historical contingency, and reduces the predictability of evolution by allowing different replicate populations to climb different adaptive peaks and explore otherwise-inaccessible regions of an adaptive landscape.
Affiliation(s)
- Jia Zheng
- Department of Evolutionary Biology and Environmental Studies, University of Zurich, Zurich, Switzerland
- Swiss Institute of Bioinformatics, Quartier Sorge-Batiment Genopode, Lausanne, Switzerland
- Joshua L. Payne
- Swiss Institute of Bioinformatics, Quartier Sorge-Batiment Genopode, Lausanne, Switzerland
- Institute of Integrative Biology, ETH Zurich, Zurich, Switzerland
- Andreas Wagner
- Department of Evolutionary Biology and Environmental Studies, University of Zurich, Zurich, Switzerland
- Swiss Institute of Bioinformatics, Quartier Sorge-Batiment Genopode, Lausanne, Switzerland
- The Santa Fe Institute, Santa Fe, NM, USA
|