51
Zhou J, McCandlish DM. Minimum epistasis interpolation for sequence-function relationships. Nat Commun 2020; 11:1782. [PMID: 32286265 PMCID: PMC7156698 DOI: 10.1038/s41467-020-15512-5] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 08/02/2019] [Accepted: 03/12/2020] [Indexed: 12/17/2022] Open
Abstract
Massively parallel phenotyping assays have provided unprecedented insight into how multiple mutations combine to determine biological function. While such assays can measure phenotypes for thousands to millions of genotypes in a single experiment, in practice these measurements are not exhaustive, so that there is a need for techniques to impute values for genotypes whose phenotypes have not been directly assayed. Here, we present an imputation method based on inferring the least epistatic possible sequence-function relationship compatible with the data. In particular, we infer the reconstruction where mutational effects change as little as possible across adjacent genetic backgrounds. The resulting models can capture complex higher-order genetic interactions near the data, but approach additivity where data is sparse or absent. We apply the method to high-throughput transcription factor binding assays and use it to explore a fitness landscape for protein G.
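The idea of choosing the least epistatic landscape compatible with the data can be illustrated on a toy case. The sketch below is not the authors' implementation: it assumes biallelic sites (genotypes are tuples of 0/1), at least two sites, and noise-free observations that are held fixed. The function name `min_epistasis_impute` and its arguments are hypothetical. It fills in the missing genotypes by least squares so that the sum of squared pairwise epistatic coefficients over all faces of the hypercube is as small as possible.

```python
from itertools import combinations, product

import numpy as np


def min_epistasis_impute(n_sites, observed):
    """Impute unobserved phenotypes on a biallelic hypercube (n_sites >= 2).

    observed: dict mapping genotype tuples of 0/1 to measured phenotypes.
    Returns a dict with a value for every genotype; observed values are kept.
    """
    genotypes = list(product((0, 1), repeat=n_sites))
    index = {g: k for k, g in enumerate(genotypes)}

    # One row per face: epsilon = f(bg) - f(bg+i) - f(bg+j) + f(bg+i+j).
    rows = []
    for i, j in combinations(range(n_sites), 2):
        for g in genotypes:
            if g[i] == 0 and g[j] == 0:
                a = list(g); a[i] = 1
                b = list(g); b[j] = 1
                ab = list(g); ab[i] = 1; ab[j] = 1
                row = np.zeros(len(genotypes))
                row[index[g]] = 1.0
                row[index[tuple(a)]] = -1.0
                row[index[tuple(b)]] = -1.0
                row[index[tuple(ab)]] = 1.0
                rows.append(row)
    A = np.array(rows)

    known = [index[g] for g in genotypes if g in observed]
    unknown = [index[g] for g in genotypes if g not in observed]
    y = np.array([observed[genotypes[k]] for k in known])

    full = np.empty(len(genotypes))
    full[known] = y
    if unknown:
        # Minimise ||A_known y + A_unknown x||^2 over the unknown values x.
        x, *_ = np.linalg.lstsq(A[:, unknown], -A[:, known] @ y, rcond=None)
        full[unknown] = x
    return dict(zip(genotypes, full))
```

With two sites and a missing double mutant, this reduces to the additive prediction, which matches the abstract's statement that the reconstruction approaches additivity where data are absent. If the observations do not pin down a unique minimiser, `numpy.linalg.lstsq` returns the minimum-norm solution, which is a choice of this sketch rather than of the paper.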
Affiliation(s)
- Juannan Zhou
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724, USA
- David M McCandlish
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724, USA

52
Zhang Z, Xiong P, Zhang T, Wang J, Zhan J, Zhou Y. Accurate inference of the full base-pairing structure of RNA by deep mutational scanning and covariation-induced deviation of activity. Nucleic Acids Res 2020; 48:1451-1465. [PMID: 31872260 PMCID: PMC7026644 DOI: 10.1093/nar/gkz1192] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 09/17/2019] [Revised: 12/10/2019] [Accepted: 12/11/2019] [Indexed: 11/12/2022] Open
Abstract
Despite the large number of noncoding RNAs in the human genome and their roles in many diseases including cancer, we know very little about them due to a lack of structural clues. The centerpiece of the structural clues is the full RNA base-pairing structure of secondary and tertiary contacts that can be precisely obtained only from costly and time-consuming 3D structure determination. Here, we performed deep mutational scanning of the self-cleaving CPEB3 ribozyme by error-prone PCR and showed that a library of <5 × 10^4 single-to-triple mutants is sufficient to infer 25 of 26 base pairs including non-nested, nonhelical, and noncanonical base pairs with both sensitivity and precision at 96%. Such accurate inference was further confirmed by a twister ribozyme at 100% precision with only noncanonical base pairs as false negatives. This performance resulted from analyzing covariation-induced deviation of activity by utilizing both functional and nonfunctional variants for unsupervised classification, followed by Monte Carlo (MC) simulated annealing with mutation-derived scores. Highly accurate inference can also be obtained by combining MC with evolution/direct coupling analysis, R-scape or epistasis analysis. The results highlight the usefulness of deep mutational scanning for high-accuracy structural inference of self-cleaving ribozymes with implications for other structured RNAs that permit high-throughput functional selections.
Affiliation(s)
- Zhe Zhang
  - High Magnetic Field Laboratory, Key Laboratory of High Magnetic Field and Ion Beam Physical Biology, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei 230031, Anhui, P. R. China
  - University of Chinese Academy of Sciences, Beijing 101408, P. R. China
  - Institute for Glycomics, Griffith University, Parklands Drive, Southport, QLD 4222, Australia
- Peng Xiong
  - Institute for Glycomics, Griffith University, Parklands Drive, Southport, QLD 4222, Australia
- Tongchuan Zhang
  - Institute for Glycomics, Griffith University, Parklands Drive, Southport, QLD 4222, Australia
- Junfeng Wang
  - High Magnetic Field Laboratory, Key Laboratory of High Magnetic Field and Ion Beam Physical Biology, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei 230031, Anhui, P. R. China
  - Institute of Physical Science and Information Technology, Anhui University, Hefei 230031, Anhui, P. R. China
- Jian Zhan
  - Institute for Glycomics, Griffith University, Parklands Drive, Southport, QLD 4222, Australia
- Yaoqi Zhou
  - Institute for Glycomics, Griffith University, Parklands Drive, Southport, QLD 4222, Australia
  - School of Information and Communication Technology, Griffith University, Parklands Drive, Southport, QLD 4222, Australia

53
Esteban L, Lonishin LR, Bobrovskiy DM, Leleytner G, Bogatyreva NS, Kondrashov FA, Ivankov DN. HypercubeME: two hundred million combinatorially complete datasets from a single experiment. Bioinformatics 2019; 36:btz841. [PMID: 31742320 PMCID: PMC7703787 DOI: 10.1093/bioinformatics/btz841] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 08/29/2019] [Revised: 11/01/2019] [Accepted: 11/07/2019] [Indexed: 11/17/2022] Open
Abstract
MOTIVATION: Epistasis, the context-dependence of the contribution of an amino acid substitution to fitness, is common in evolution. To detect epistasis, fitness must be measured for at least four genotypes: the reference genotype, two different single mutants and a double mutant with both of the single mutations. For higher-order epistasis of the order n, fitness has to be measured for all 2^n genotypes of an n-dimensional hypercube in genotype space forming a "combinatorially complete dataset". So far, only a handful of such datasets have been produced by manual curation. Concurrently, random mutagenesis experiments have produced measurements of fitness and other phenotypes in a high-throughput manner, potentially containing a number of combinatorially complete datasets. RESULTS: We present an effective recursive algorithm for finding all hypercube structures in random mutagenesis experimental data. To test the algorithm, we applied it to the data from a recent HIS3 protein dataset and found all 199,847,053 unique combinatorially complete genotype combinations of dimensionality ranging from two to twelve. The algorithm may be useful for researchers looking for higher-order epistasis in their high-throughput experimental data. AVAILABILITY: https://github.com/ivankovlab/HypercubeME.git. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
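The four-genotype requirement described in this abstract can be made concrete with a small sketch. It is a brute-force illustration for the two-dimensional case only, not the recursive HypercubeME algorithm. It assumes each genotype is a `frozenset` of mutation labels relative to a reference, with each label at a distinct site; the function names and the example data are hypothetical.

```python
from itertools import combinations


def pairwise_epistasis(f_ref, f_a, f_b, f_ab):
    """Deviation of the double mutant from the additive expectation."""
    return f_ab - f_a - f_b + f_ref


def find_squares(fitness):
    """Return (background, mut_a, mut_b, epistasis) for every complete 2-cube.

    fitness maps a frozenset of mutation labels to a measured fitness value.
    A 2-cube is complete when the background, both single mutants and the
    double mutant have all been measured.
    """
    squares = []
    for top in fitness:
        for a, b in combinations(sorted(top), 2):
            background = top - {a, b}
            corners = (background, background | {a}, background | {b}, top)
            if all(c in fitness for c in corners):
                eps = pairwise_epistasis(*(fitness[c] for c in corners))
                squares.append((background, a, b, eps))
    return squares


# Hypothetical measurements: reference, three single mutants, one double mutant.
example = {
    frozenset(): 1.0,
    frozenset({"A"}): 1.2,
    frozenset({"B"}): 0.9,
    frozenset({"C"}): 1.1,
    frozenset({"A", "B"}): 1.5,
}
squares = find_squares(example)
```

Only the A/B pair forms a complete square here, because no double mutant involving C was measured. Each square is reported once, keyed by its most-mutated corner; extending the same check to n mutations requires all 2^n corners, which is where a brute-force scan becomes expensive.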
Affiliation(s)
- Lyubov R Lonishin
  - Faculty of Medical Physics, Institute of Biomedical System and Technologies, Peter the Great Saint Petersburg Polytechnic University, Saint Petersburg 195251, Russia
- Daniil M Bobrovskiy
  - Faculty of Bioengineering and Bioinformatics, Moscow State University, Moscow 119234, Russia
- Gregory Leleytner
  - Department of Innovation and High Technology, Moscow Institute of Physics and Technology, Moscow 141701, Russia
- Natalya S Bogatyreva
  - Universitat Pompeu Fabra (UPF), Barcelona 08003, Spain
  - Bioinformatics and Genomics Programme, Center for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, 08003 Barcelona, Spain
  - Laboratory of Protein Physics, Institute of Protein Research of the Russian Academy of Sciences, Moscow 142290, Russia
- Dmitry N Ivankov
  - Center of Life Sciences, Skolkovo Institute of Science and Technology, Moscow 121205, Russia

54
Esposito D, Weile J, Shendure J, Starita LM, Papenfuss AT, Roth FP, Fowler DM, Rubin AF. MaveDB: an open-source platform to distribute and interpret data from multiplexed assays of variant effect. Genome Biol 2019; 20:223. [PMID: 31679514 PMCID: PMC6827219 DOI: 10.1186/s13059-019-1845-6] [Citation(s) in RCA: 154] [Impact Index Per Article: 25.7] [Received: 03/31/2019] [Accepted: 10/01/2019] [Indexed: 11/10/2022] Open
Abstract
Multiplex assays of variant effect (MAVEs), such as deep mutational scans and massively parallel reporter assays, test thousands of sequence variants in a single experiment. Despite the importance of MAVE data for basic and clinical research, there is no standard resource for their discovery and distribution. Here, we present MaveDB ( https://www.mavedb.org ), a public repository for large-scale measurements of sequence variant impact, designed for interoperability with applications to interpret these datasets. We also describe the first such application, MaveVis, which retrieves, visualizes, and contextualizes variant effect maps. Together, the database and applications will empower the community to mine these powerful datasets.
Affiliation(s)
- Daniel Esposito
  - Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, Australia
- Jochen Weile
  - The Donnelly Centre, University of Toronto, Toronto, ON, Canada
  - Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, ON, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
  - Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Jay Shendure
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
  - Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
  - Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
- Lea M Starita
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
  - Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
- Anthony T Papenfuss
  - Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, Australia
  - Department of Medical Biology, University of Melbourne, Melbourne, VIC, Australia
  - Bioinformatics and Cancer Genomics Laboratory, Peter MacCallum Cancer Centre, Melbourne, VIC, Australia
  - Sir Peter MacCallum Department of Oncology, University of Melbourne, Melbourne, VIC, Australia
  - Department of Mathematics and Statistics, University of Melbourne, Melbourne, VIC, Australia
- Frederick P Roth
  - The Donnelly Centre, University of Toronto, Toronto, ON, Canada
  - Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, ON, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
  - Department of Computer Science, University of Toronto, Toronto, ON, Canada
  - Canadian Institute for Advanced Research, Toronto, ON, Canada
- Douglas M Fowler
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
  - Canadian Institute for Advanced Research, Toronto, ON, Canada
  - Department of Bioengineering, University of Washington, Seattle, WA, USA
- Alan F Rubin
  - Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, Australia
  - Department of Medical Biology, University of Melbourne, Melbourne, VIC, Australia
  - Bioinformatics and Cancer Genomics Laboratory, Peter MacCallum Cancer Centre, Melbourne, VIC, Australia

55
Adaptive walks on high-dimensional fitness landscapes and seascapes with distance-dependent statistics. Theor Popul Biol 2019; 130:13-49. [PMID: 31605706 DOI: 10.1016/j.tpb.2019.09.011] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 04/12/2019] [Revised: 09/07/2019] [Accepted: 09/12/2019] [Indexed: 11/21/2022]
Abstract
The dynamics of evolution is intimately shaped by epistasis - interactions between genetic elements which cause the fitness-effect of combinations of mutations to be non-additive. Analyzing evolutionary dynamics that involves large numbers of epistatic mutations is intrinsically difficult. A crucial feature is that the fitness landscape in the vicinity of the current genome depends on the evolutionary history. A key step is thus developing models that enable study of the effects of past evolution on future evolution. In this work, we introduce a broad class of high-dimensional random fitness landscapes for which the correlations between fitnesses of genomes are a general function of genetic distance. Their Gaussian character allows for tractable computational as well as analytic understanding. We study the properties of these landscapes focusing on the simplest evolutionary process: random adaptive (uphill) walks. Conventional measures of "ruggedness" are shown to not much affect such adaptive walks. Instead, the long-distance statistics of epistasis cause all properties to be highly conditional on past evolution, determining the statistics of the local landscape (the distribution of fitness-effects of available mutations and combinations of these), as well as the global geometry of evolutionary trajectories. In order to further explore the effects of conditioning on past evolution, we model the effects of slowly changing environments. At long times, such fitness "seascapes" cause a statistical steady state with highly intermittent evolutionary dynamics: populations undergo bursts of rapid adaptation, interspersed with periods in which adaptive mutations are rare and the population waits for more new directions to be opened up by changes in the environment. Finally, we discuss prospects for studying more complex evolutionary dynamics and on broader classes of high-dimensional landscapes and seascapes.

56
Kemble H, Nghe P, Tenaillon O. Recent insights into the genotype-phenotype relationship from massively parallel genetic assays. Evol Appl 2019; 12:1721-1742. [PMID: 31548853 PMCID: PMC6752143 DOI: 10.1111/eva.12846] [Citation(s) in RCA: 34] [Impact Index Per Article: 5.7] [Received: 03/27/2019] [Revised: 06/21/2019] [Accepted: 07/02/2019] [Indexed: 12/20/2022] Open
Abstract
With the molecular revolution in biology, a mechanistic understanding of the genotype-phenotype relationship became possible. Recently, advances in DNA synthesis and sequencing have enabled the development of deep mutational scanning assays, capable of scoring comprehensive libraries of genotypes for fitness and a variety of phenotypes in massively parallel fashion. The resulting empirical genotype-fitness maps pave the way to predictive models, potentially accelerating our ability to anticipate the behaviour of pathogen and cancerous cell populations from sequencing data. Besides cellular fitness, phenotypes of direct application in industry (e.g. enzyme activity) and medicine (e.g. antibody binding) can be quantified and even selected directly by these assays. This review discusses the technological basis of and recent developments in massively parallel genetics, along with the trends it is uncovering in the genotype-phenotype relationship (distribution of mutation effects, epistasis), their possible mechanistic bases and future directions for advancing towards the goal of predictive genetics.
Affiliation(s)
- Harry Kemble
  - Infection, Antimicrobials, Modelling, Evolution, INSERM, Unité Mixte de Recherche 1137, Université Paris Diderot, Université Paris Nord, Paris, France
  - École Supérieure de Physique et de Chimie Industrielles de la Ville de Paris (ESPCI Paris), UMR CNRS-ESPCI CBI 8231, PSL Research University, Paris Cedex 05, France
- Philippe Nghe
  - École Supérieure de Physique et de Chimie Industrielles de la Ville de Paris (ESPCI Paris), UMR CNRS-ESPCI CBI 8231, PSL Research University, Paris Cedex 05, France
- Olivier Tenaillon
  - Infection, Antimicrobials, Modelling, Evolution, INSERM, Unité Mixte de Recherche 1137, Université Paris Diderot, Université Paris Nord, Paris, France

57
Li X, Lalić J, Baeza-Centurion P, Dhar R, Lehner B. Changes in gene expression predictably shift and switch genetic interactions. Nat Commun 2019; 10:3886. [PMID: 31467279 PMCID: PMC6715729 DOI: 10.1038/s41467-019-11735-3] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 04/10/2019] [Accepted: 07/29/2019] [Indexed: 11/18/2022] Open
Abstract
Non-additive interactions between mutations occur extensively and also change across conditions, making genetic prediction a difficult challenge. To better understand the plasticity of genetic interactions (epistasis), we combine mutations in a single protein performing a single function (a transcriptional repressor inhibiting a target gene). Even in this minimal system, genetic interactions switch from positive (suppressive) to negative (enhancing) as the expression of the gene changes. These seemingly complicated changes can be predicted using a mathematical model that propagates the effects of mutations on protein folding to the cellular phenotype. More generally, changes in gene expression should be expected to alter the effects of mutations and how they interact whenever the relationship between expression and a phenotype is nonlinear, which is the case for most genes. These results have important implications for understanding genotype-phenotype maps and illustrate how changes in genetic interactions can often (but not always) be predicted by hierarchical mechanistic models.

Non-additive genetic interactions are plastic and can complicate genetic prediction. Here, using deep mutagenesis of the lambda repressor, Li et al. reveal that changes in gene expression can alter the strength and direction of genetic interactions between mutations in many genes and develop mathematical models for predicting them.
Affiliation(s)
- Xianghua Li
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona, 08003, Spain
- Jasna Lalić
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona, 08003, Spain
- Pablo Baeza-Centurion
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona, 08003, Spain
- Riddhiman Dhar
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona, 08003, Spain
- Ben Lehner
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona, 08003, Spain
  - Universitat Pompeu Fabra (UPF), Barcelona, Spain
  - ICREA, Pg. Luis Companys 23, Barcelona, 08010, Spain

58
Large-effect flowering time mutations reveal conditionally adaptive paths through fitness landscapes in Arabidopsis thaliana. Proc Natl Acad Sci U S A 2019; 116:17890-17899. [PMID: 31420516 PMCID: PMC6731683 DOI: 10.1073/pnas.1902731116] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Indexed: 12/20/2022] Open
Abstract
Mutations are often assumed to be largely detrimental to fitness, but they may also be beneficial, and mutations with large phenotypic effects can persist in nature. One explanation for these observations is that mutations may be beneficial in specific environments because these conditions shift trait expression toward higher fitness. This hypothesis is rarely tested due to the difficulty of replicating mutants in multiple natural environments and measuring their phenotypes. We did so by planting Arabidopsis thaliana genotypes with large-effect flowering time mutations in field sites across the species’ European climate range. We quantified the adaptive value of mutant traits, finding that certain mutations increased fitness in some environments but not in others.

Contrary to previous assumptions that most mutations are deleterious, there is increasing evidence for persistence of large-effect mutations in natural populations. A possible explanation for these observations is that mutant phenotypes and fitness may depend upon the specific environmental conditions to which a mutant is exposed. Here, we tested this hypothesis by growing large-effect flowering time mutants of Arabidopsis thaliana in multiple field sites and seasons to quantify their fitness effects in realistic natural conditions. By constructing environment-specific fitness landscapes based on flowering time and branching architecture, we observed that a subset of mutations increased fitness, but only in specific environments. These mutations increased fitness via different paths: through shifting flowering time, branching, or both. Branching was under stronger selection, but flowering time was more genetically variable, pointing to the importance of indirect selection on mutations through their pleiotropic effects on multiple phenotypes. Finally, mutations in hub genes with greater connectedness in their regulatory networks had greater effects on both phenotypes and fitness. Together, these findings indicate that large-effect mutations may persist in populations because they influence traits that are adaptive only under specific environmental conditions. Understanding their evolutionary dynamics therefore requires measuring their effects in multiple natural environments.

59
Polaski JT, Kletzien OA, Drogalis LK, Batey RT. A functional genetic screen reveals sequence preferences within a key tertiary interaction in cobalamin riboswitches required for ligand selectivity. Nucleic Acids Res 2018; 46:9094-9105. [PMID: 29945209 PMCID: PMC6158498 DOI: 10.1093/nar/gky539] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 04/10/2018] [Accepted: 05/30/2018] [Indexed: 01/14/2023] Open
Abstract
Riboswitches are structured mRNA sequences that regulate gene expression by directly binding intracellular metabolites. Generating the appropriate regulatory response requires the RNA rapidly and stably acquire higher-order structure to form the binding pocket, bind the appropriate effector molecule and undergo a structural transition to inform the expression machinery. These requirements place riboswitches under strong kinetic constraints, likely restricting the sequence space accessible by recurrent structural modules such as the kink turn and the T-loop. Class-II cobalamin riboswitches contain two T-loop modules: one directing global folding of the RNA and another buttressing the ligand binding pocket. While the T-loop module directing folding is highly conserved, the T-loop associated with binding is substantially less so, with no clear consensus sequence. To further understand the functional role of the binding-associated module, a functional genetic screen of a library of riboswitches with the T-loop and its interacting nucleotides was used to build an experimental phylogeny comprised of sequences that possess a wide range of cobalamin-dependent regulatory activity. Our results reveal conservation patterns of the T-loop and its interaction with the binding core that allow for rapid tertiary structure formation and demonstrate its importance for generating strong ligand-dependent repression of mRNA expression.
Affiliation(s)
- Jacob T Polaski
  - Department of Chemistry and Biochemistry, University of Colorado, Boulder, CO 80309, USA
- Otto A Kletzien
  - Department of Chemistry and Biochemistry, University of Colorado, Boulder, CO 80309, USA
- Lea K Drogalis
  - Department of Chemistry and Biochemistry, University of Colorado, Boulder, CO 80309, USA
- Robert T Batey
  - Department of Chemistry and Biochemistry, University of Colorado, Boulder, CO 80309, USA

60
Zheng J, Payne JL, Wagner A. Cryptic genetic variation accelerates evolution by opening access to diverse adaptive peaks. Science 2019; 365:347-353. [DOI: 10.1126/science.aax1837] [Citation(s) in RCA: 67] [Impact Index Per Article: 11.2] [Received: 03/01/2019] [Accepted: 06/06/2019] [Indexed: 12/13/2022]
Abstract
Cryptic genetic variation can facilitate adaptation in evolving populations. To elucidate the underlying genetic mechanisms, we used directed evolution in Escherichia coli to accumulate variation in populations of yellow fluorescent proteins and then evolved these proteins toward the new phenotype of green fluorescence. Populations with cryptic variation evolved adaptive genotypes with greater diversity and higher fitness than populations without cryptic variation, which converged on similar genotypes. Populations with cryptic variation accumulated neutral or deleterious mutations that break the constraints on the order in which adaptive mutations arise. In doing so, cryptic variation opens paths to adaptive genotypes, creates historical contingency, and reduces the predictability of evolution by allowing different replicate populations to climb different adaptive peaks and explore otherwise-inaccessible regions of an adaptive landscape.
Affiliation(s)
- Jia Zheng
  - Department of Evolutionary Biology and Environmental Studies, University of Zurich, Zurich, Switzerland
  - Swiss Institute of Bioinformatics, Quartier Sorge-Batiment Genopode, Lausanne, Switzerland
- Joshua L. Payne
  - Swiss Institute of Bioinformatics, Quartier Sorge-Batiment Genopode, Lausanne, Switzerland
  - Institute of Integrative Biology, ETH Zurich, Zurich, Switzerland
- Andreas Wagner
  - Department of Evolutionary Biology and Environmental Studies, University of Zurich, Zurich, Switzerland
  - Swiss Institute of Bioinformatics, Quartier Sorge-Batiment Genopode, Lausanne, Switzerland
  - The Santa Fe Institute, Santa Fe, NM, USA

61
Aris-Brosou S, Parent L, Ibeh N. Viral Long-Term Evolutionary Strategies Favor Stability over Proliferation. Viruses 2019; 11:v11080677. [PMID: 31344814 PMCID: PMC6722887 DOI: 10.3390/v11080677] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 05/30/2019] [Revised: 07/12/2019] [Accepted: 07/20/2019] [Indexed: 02/01/2023] Open
Abstract
Viruses are known to have some of the highest and most diverse mutation rates found in any biological replicator, with single-stranded (ss) RNA viruses evolving the fastest, and double-stranded (ds) DNA viruses having rates approaching those of bacteria. As mutation rates are tightly and negatively correlated with genome size, selection is a clear driver of viral evolution. However, the role of intragenomic interactions as drivers of viral evolution is still unclear. To understand how these two processes affect the long-term evolution of viruses infecting humans, we comprehensively analyzed ssRNA, ssDNA, dsRNA, and dsDNA viruses, to find which virus types and which functions show evidence for episodic diversifying selection and correlated evolution. We show that selection mostly affects single stranded viruses, that correlated evolution is more prevalent in DNA viruses, and that both processes, taken independently, mostly affect viral replication. However, the genes that are jointly affected by both processes are involved in key aspects of their life cycle, favoring viral stability over proliferation. We further show that both evolutionary processes are intimately linked at the amino acid level, which suggests that it is the joint action of selection and correlated evolution, and not just selection, that shapes the evolutionary trajectories of viruses—and possibly of their epidemiological potential.
Affiliation(s)
- Stéphane Aris-Brosou
  - Department of Biology, University of Ottawa, Ottawa, ON K1N 6N5, Canada
  - Department of Mathematics and Statistics, University of Ottawa, Ottawa, ON K1N 6N5, Canada
- Louis Parent
  - Department of Biology, University of Ottawa, Ottawa, ON K1N 6N5, Canada
- Neke Ibeh
  - Department of Biology, University of Ottawa, Ottawa, ON K1N 6N5, Canada

62
Abstract
Evolvability is the ability of a biological system to produce phenotypic variation that is both heritable and adaptive. It has long been the subject of anecdotal observations and theoretical work. In recent years, however, the molecular causes of evolvability have been an increasing focus of experimental work. Here, we review recent experimental progress in areas as different as the evolution of drug resistance in cancer cells and the rewiring of transcriptional regulation circuits in vertebrates. This research reveals the importance of three major themes: multiple genetic and non-genetic mechanisms to generate phenotypic diversity, robustness in genetic systems, and adaptive landscape topography. We also discuss the mounting evidence that evolvability can evolve and the question of whether it evolves adaptively.

63
Wu X, Bartel DP. kpLogo: positional k-mer analysis reveals hidden specificity in biological sequences. Nucleic Acids Res 2017; 45:W534-W538. [PMID: 28460012 PMCID: PMC5570168 DOI: 10.1093/nar/gkx323] [Citation(s) in RCA: 89] [Impact Index Per Article: 14.8] [Received: 02/12/2017] [Accepted: 04/13/2017] [Indexed: 12/26/2022] Open
Abstract
Motifs of only 1–4 letters can play important roles when present at key locations within macromolecules. Because existing motif-discovery tools typically miss these position-specific short motifs, we developed kpLogo, a probability-based logo tool for integrated detection and visualization of position-specific ultra-short motifs from a set of aligned sequences. kpLogo also overcomes the limitations of conventional motif-visualization tools in handling positional interdependencies and utilizing ranked or weighted sequences increasingly available from high-throughput assays. kpLogo can be found at http://kplogo.wi.mit.edu/.
Affiliation(s)
- Xuebing Wu
  - Howard Hughes Medical Institute and Whitehead Institute for Biomedical Research, Cambridge, MA 02142, USA
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- David P Bartel
  - Howard Hughes Medical Institute and Whitehead Institute for Biomedical Research, Cambridge, MA 02142, USA
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139, USA

64
Rollins NJ, Brock KP, Poelwijk FJ, Stiffler MA, Gauthier NP, Sander C, Marks DS. Inferring protein 3D structure from deep mutation scans. Nat Genet 2019; 51:1170-1176. [PMID: 31209393 PMCID: PMC7295002 DOI: 10.1038/s41588-019-0432-9] [Citation(s) in RCA: 96] [Impact Index Per Article: 16.0] [Received: 10/09/2018] [Accepted: 04/29/2019] [Indexed: 11/09/2022]
Abstract
We describe an experimental method of three-dimensional (3D) structure determination that exploits the increasing ease of high-throughput mutational scans. Inspired by the success of using natural, evolutionary sequence covariation to compute protein and RNA folds, we explored whether 'laboratory', synthetic sequence variation might also yield 3D structures. We analyzed five large-scale mutational scans and discovered that the pairs of residues with the largest positive epistasis in the experiments are sufficient to determine the 3D fold. We show that the strongest epistatic pairings from genetic screens of three proteins, a ribozyme and a protein interaction reveal 3D contacts within and between macromolecules. Using these experimental epistatic pairs, we compute ab initio folds for a GB1 domain (within 1.8 Å of the crystal structure) and a WW domain (2.1 Å). We propose strategies that reduce the number of mutants needed for contact prediction, suggesting that genomics-based techniques can efficiently predict 3D structure.
Affiliation(s)
- Nathan J Rollins
- Department of Systems Biology, Harvard Medical School, Boston, MA, USA
| | - Kelly P Brock
- Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Department of Cell Biology, Harvard Medical School, Boston, MA, USA
| | - Frank J Poelwijk
- cBio Center, Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
| | - Michael A Stiffler
- cBio Center, Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
| | - Nicholas P Gauthier
- Department of Cell Biology, Harvard Medical School, Boston, MA, USA
- cBio Center, Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
| | - Chris Sander
- Department of Cell Biology, Harvard Medical School, Boston, MA, USA
- cBio Center, Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
| | - Debora S Marks
- Department of Systems Biology, Harvard Medical School, Boston, MA, USA.
- Broad Institute of Harvard and MIT, Cambridge, MA, USA.
| |
|
65
|
Wei X, Zhang J. Patterns and Mechanisms of Diminishing Returns from Beneficial Mutations. Mol Biol Evol 2019; 36:1008-1021. [PMID: 30903691 DOI: 10.1093/molbev/msz035] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Indexed: 01/22/2023] Open
Abstract
Diminishing returns epistasis makes the benefit of the same advantageous mutation smaller in fitter genotypes and is frequently observed in experimental evolution. However, its occurrence in other contexts, environment dependence, and mechanistic basis are unclear. Here, we address these questions using 1,005 sequenced segregants generated from a yeast cross. Under each of 47 examined environments, 66-92% of tested polymorphisms exhibit diminishing returns epistasis. Surprisingly, improving environment quality also reduces the benefits of advantageous mutations even when fitness is controlled for, indicating the necessity to revise the global epistasis hypothesis. We propose that diminishing returns originates from the modular organization of life where the contribution of each functional module to fitness is determined jointly by the genotype and environment and has an upper limit, and demonstrate that our model predictions match empirical observations. These findings broaden the concept of diminishing returns epistasis, reveal its generality and potential cause, and have important evolutionary implications.
Affiliation(s)
- Xinzhu Wei
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI
| | - Jianzhi Zhang
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI
| |
|
66
|
Schmiedel JM, Lehner B. Determining protein structures using deep mutagenesis. Nat Genet 2019; 51:1177-1186. [PMID: 31209395 PMCID: PMC7610650 DOI: 10.1038/s41588-019-0431-x] [Citation(s) in RCA: 96] [Impact Index Per Article: 16.0] [Received: 10/09/2018] [Accepted: 04/29/2019] [Indexed: 12/12/2022]
Abstract
Determining the three-dimensional structures of macromolecules is a major goal of biological research, because of the close relationship between structure and function; however, thousands of protein domains still have unknown structures. Structure determination usually relies on physical techniques including X-ray crystallography, NMR spectroscopy and cryo-electron microscopy. Here we present a method that allows the high-resolution three-dimensional backbone structure of a biological macromolecule to be determined only from measurements of the activity of mutant variants of the molecule. This genetic approach to structure determination relies on the quantification of genetic interactions (epistasis) between mutations and the discrimination of direct from indirect interactions. This provides an alternative experimental strategy for structure determination, with the potential to reveal functional and in vivo structures.
Affiliation(s)
- Jörn M Schmiedel
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
| | - Ben Lehner
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain.
- Universitat Pompeu Fabra (UPF), Barcelona, Spain.
- ICREA, Barcelona, Spain.
| |
|
67
|
Currin A, Kwok J, Sadler JC, Bell EL, Swainston N, Ababi M, Day P, Turner NJ, Kell DB. GeneORator: An Effective Strategy for Navigating Protein Sequence Space More Efficiently through Boolean OR-Type DNA Libraries. ACS Synth Biol 2019; 8:1371-1378. [PMID: 31132850 PMCID: PMC7007284 DOI: 10.1021/acssynbio.9b00063] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Indexed: 12/11/2022]
Abstract
Directed evolution requires the creation of genetic diversity and subsequent screening or selection for improved variants. For DNA mutagenesis, conventional site-directed methods implicitly utilize the Boolean AND operator (creating all mutations simultaneously), producing a combinatorial explosion in the number of genetic variants as the number of mutations increases. We introduce GeneORator, a novel strategy for creating DNA libraries based on the Boolean logical OR operator. Here, a single library is divided into many subsets, each containing different combinations of the desired mutations. Consequently, the effect of adding more mutations on the number of genetic combinations is additive (Boolean OR logic) and not exponential (AND logic). We demonstrate this strategy with large-scale mutagenesis studies, using monoamine oxidase-N (Aspergillus niger) as the exemplar target. First, we mutated every residue in the secondary structure-containing regions (276 out of a total 495 amino acids) to screen for improvements in kcat. Second, combinatorial OR-type libraries permitted screening of diverse mutation combinations in the enzyme active site to detect activity toward novel substrates. In both examples, OR-type libraries effectively reduced the number of variants searched up to 10^10-fold, dramatically reducing the screening effort required to discover variants with improved and/or novel activity. Importantly, this approach enables the screening of a greater diversity of mutation combinations, accessing a larger area of a protein's sequence space. OR-type libraries can be applied to any biological engineering objective requiring DNA mutagenesis, and the approach has wide ranging applications in, for example, enzyme engineering, antibody engineering, and synthetic biology.
Affiliation(s)
- Andrew Currin
- Manchester Centre for Synthetic Biology of Fine and Speciality Chemicals (SYNBIOCHEM), Manchester Institute of Biotechnology, The University of Manchester, Manchester M1 7DN, United Kingdom
- School of Chemistry, The University of Manchester, Manchester M13 9PL, United Kingdom
| | - Jane Kwok
- School of Chemistry, The University of Manchester, Manchester M13 9PL, United Kingdom
| | - Joanna C. Sadler
- School of Chemistry, The University of Manchester, Manchester M13 9PL, United Kingdom
| | - Elizabeth L. Bell
- Faculty of Biology, Medicine and Health, The University of Manchester, Manchester M13 9PL, United Kingdom
| | - Neil Swainston
- Manchester Centre for Synthetic Biology of Fine and Speciality Chemicals (SYNBIOCHEM), Manchester Institute of Biotechnology, The University of Manchester, Manchester M1 7DN, United Kingdom
- School of Chemistry, The University of Manchester, Manchester M13 9PL, United Kingdom
| | - Maria Ababi
- Faculty of Biology, Medicine and Health, The University of Manchester, Manchester M13 9PL, United Kingdom
- School of Computer Science, The University of Manchester, Manchester M13 9PL, United Kingdom
| | - Philip Day
- Faculty of Biology, Medicine and Health, The University of Manchester, Manchester M13 9PL, United Kingdom
| | - Nicholas J. Turner
- Manchester Centre for Synthetic Biology of Fine and Speciality Chemicals (SYNBIOCHEM), Manchester Institute of Biotechnology, The University of Manchester, Manchester M1 7DN, United Kingdom
- School of Chemistry, The University of Manchester, Manchester M13 9PL, United Kingdom
| | - Douglas B. Kell
- Manchester Centre for Synthetic Biology of Fine and Speciality Chemicals (SYNBIOCHEM), Manchester Institute of Biotechnology, The University of Manchester, Manchester M1 7DN, United Kingdom
- School of Chemistry, The University of Manchester, Manchester M13 9PL, United Kingdom
| |
|
68
|
Rockne RC, Hawkins-Daarud A, Swanson KR, Sluka JP, Glazier JA, Macklin P, Hormuth DA, Jarrett AM, Lima EABF, Tinsley Oden J, Biros G, Yankeelov TE, Curtius K, Al Bakir I, Wodarz D, Komarova N, Aparicio L, Bordyuh M, Rabadan R, Finley SD, Enderling H, Caudell J, Moros EG, Anderson ARA, Gatenby RA, Kaznatcheev A, Jeavons P, Krishnan N, Pelesko J, Wadhwa RR, Yoon N, Nichol D, Marusyk A, Hinczewski M, Scott JG. The 2019 mathematical oncology roadmap. Phys Biol 2019; 16:041005. [PMID: 30991381 PMCID: PMC6655440 DOI: 10.1088/1478-3975/ab1a09] [Citation(s) in RCA: 108] [Impact Index Per Article: 18.0] [Indexed: 02/05/2023]
Abstract
Whether the nom de guerre is Mathematical Oncology, Computational or Systems Biology, Theoretical Biology, Evolutionary Oncology, Bioinformatics, or simply Basic Science, there is no denying that mathematics continues to play an increasingly prominent role in cancer research. Mathematical Oncology, defined here simply as the use of mathematics in cancer research, complements and overlaps with a number of other fields that rely on mathematics as a core methodology. As a result, Mathematical Oncology has a broad scope, ranging from theoretical studies to clinical trials designed with mathematical models. This Roadmap differentiates Mathematical Oncology from related fields and demonstrates specific areas of focus within this unique field of research. The dominant theme of this Roadmap is the personalization of medicine through mathematics, modelling, and simulation. This is achieved through the use of patient-specific clinical data to: develop individualized screening strategies to detect cancer earlier; make predictions of response to therapy; design adaptive, patient-specific treatment plans to overcome therapy resistance; and establish domain-specific standards to share model predictions and to make models and simulations reproducible. The cover art for this Roadmap was chosen as an apt metaphor for the beautiful, strange, and evolving relationship between mathematics and cancer.
Affiliation(s)
- Russell C Rockne
- Department of Computational and Quantitative Medicine, Division of Mathematical Oncology, City of Hope National Medical Center, Duarte, CA 91010, United States of America. Author to whom any correspondence should be addressed
| |
|
69
|
Bendixsen DP, Collet J, Østman B, Hayden EJ. Genotype network intersections promote evolutionary innovation. PLoS Biol 2019; 17:e3000300. [PMID: 31136568 PMCID: PMC6555535 DOI: 10.1371/journal.pbio.3000300] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Received: 12/31/2018] [Revised: 06/07/2019] [Accepted: 05/15/2019] [Indexed: 12/27/2022] Open
Abstract
Evolutionary innovations are qualitatively novel traits that emerge through evolution and increase biodiversity. The genetic mechanisms of innovation remain poorly understood. A systems view of innovation requires the analysis of genotype networks—the vast networks of genetic variants that produce the same phenotype. Innovations can occur at the intersection of two different genotype networks. However, the experimental characterization of genotype networks has been hindered by the vast number of genetic variants that need to be functionally analyzed. Here, we use high-throughput sequencing to study the fitness landscape at the intersection of the genotype networks of two catalytic RNA molecules (ribozymes). We determined the ability of numerous neighboring RNA sequences to catalyze two different chemical reactions, and we use these data as a proxy for a genotype to fitness map where two functions come in close proximity. We find extensive functional overlap, and numerous genotypes can catalyze both functions. We demonstrate through evolutionary simulations that these numerous points of intersection facilitate the discovery of a new function. However, the rate of adaptation of the new function depends upon the local ruggedness around the starting location in the genotype network. As a consequence, one direction of adaptation is more rapid than the other. We find that periods of neutral evolution increase rates of adaptation to the new function by allowing populations to spread out in their genotype network. Our study reveals the properties of a fitness landscape where genotype networks intersect and the consequences for evolutionary innovations. Our results suggest that historic innovations in natural systems may have been facilitated by overlapping genotype networks. 
The determination of the empirical fitness landscape at the genotypic intersection between two different catalytic RNA (ribozyme) functions reveals details about how novel traits can emerge through evolutionary innovation.
Affiliation(s)
- Devin P. Bendixsen
- Biomolecular Sciences Graduate Programs, Boise State University, Boise, Idaho, United States of America
| | - James Collet
- Department of Biological Science, Boise State University, Boise, Idaho, United States of America
| | - Bjørn Østman
- Keck Graduate Institute, Claremont, California, United States of America
| | - Eric J. Hayden
- Biomolecular Sciences Graduate Programs, Boise State University, Boise, Idaho, United States of America
- Department of Biological Science, Boise State University, Boise, Idaho, United States of America
| |
|
70
|
Kinney JB, McCandlish DM. Massively Parallel Assays and Quantitative Sequence-Function Relationships. Annu Rev Genomics Hum Genet 2019; 20:99-127. [PMID: 31091417 DOI: 10.1146/annurev-genom-083118-014845] [Citation(s) in RCA: 96] [Impact Index Per Article: 16.0] [Indexed: 11/09/2022]
Abstract
Over the last decade, a rich variety of massively parallel assays have revolutionized our understanding of how biological sequences encode quantitative molecular phenotypes. These assays include deep mutational scanning, high-throughput SELEX, and massively parallel reporter assays. Here, we review these experimental methods and how the data they produce can be used to quantitatively model sequence-function relationships. In doing so, we touch on a diverse range of topics, including the identification of clinically relevant genomic variants, the modeling of transcription factor binding to DNA, the functional and evolutionary landscapes of proteins, and cis-regulatory mechanisms in both transcription and mRNA splicing. We further describe a unified conceptual framework and a core set of mathematical modeling strategies that studies in these diverse areas can make use of. Finally, we highlight key aspects of experimental design and mathematical modeling that are important for the results of such studies to be interpretable and reproducible.
Affiliation(s)
- Justin B Kinney
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
| | - David M McCandlish
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
| |
|
71
|
Domingo J, Baeza-Centurion P, Lehner B. The Causes and Consequences of Genetic Interactions (Epistasis). Annu Rev Genomics Hum Genet 2019; 20:433-460. [PMID: 31082279 DOI: 10.1146/annurev-genom-083118-014857] [Citation(s) in RCA: 150] [Impact Index Per Article: 25.0] [Indexed: 11/09/2022]
Abstract
The same mutation can have different effects in different individuals. One important reason for this is that the outcome of a mutation can depend on the genetic context in which it occurs. This dependency is known as epistasis. In recent years, there has been a concerted effort to quantify the extent of pairwise and higher-order genetic interactions between mutations through deep mutagenesis of proteins and RNAs. This research has revealed two major components of epistasis: nonspecific genetic interactions caused by nonlinearities in genotype-to-phenotype maps, and specific interactions between particular mutations. Here, we provide an overview of our current understanding of the mechanisms causing epistasis at the molecular level, the consequences of genetic interactions for evolution and genetic prediction, and the applications of epistasis for understanding biology and determining macromolecular structures.
Affiliation(s)
- Júlia Domingo
- Systems Biology Program, Centre for Genomic Regulation, Barcelona Institute of Science and Technology, 08003 Barcelona, Spain
| | - Pablo Baeza-Centurion
- Systems Biology Program, Centre for Genomic Regulation, Barcelona Institute of Science and Technology, 08003 Barcelona, Spain
| | - Ben Lehner
- Systems Biology Program, Centre for Genomic Regulation, Barcelona Institute of Science and Technology, 08003 Barcelona, Spain; Universitat Pompeu Fabra, 08003 Barcelona, Spain; Institució Catalana de Recerca i Estudis Avançats (ICREA), 08010 Barcelona, Spain
| |
|
72
|
Abstract
For nearly a century adaptive landscapes have provided overviews of the evolutionary process and yet they remain metaphors. We redefine adaptive landscapes in terms of biological processes rather than descriptive phenomenology. We focus on the underlying mechanisms that generate emergent properties such as epistasis, dominance, trade-offs and adaptive peaks. We illustrate the utility of landscapes in predicting the course of adaptation and the distribution of fitness effects. We abandon aged arguments concerning landscape ruggedness in favor of empirically determining landscape architecture. In so doing, we transform the landscape metaphor into a scientific framework within which causal hypotheses can be tested.
Affiliation(s)
- Xiao Yi
- BioTechnology Institute, University of Minnesota, St. Paul, MN
| | - Antony M Dean
- BioTechnology Institute, University of Minnesota, St. Paul, MN
- Department of Ecology, Evolution, and Behavior, University of Minnesota, St. Paul, MN
| |
|
73
|
Qiu C, Kaplan CD. Functional assays for transcription mechanisms in high-throughput. Methods 2019; 159-160:115-123. [PMID: 30797033 PMCID: PMC6589137 DOI: 10.1016/j.ymeth.2019.02.017] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 01/29/2019] [Accepted: 02/18/2019] [Indexed: 01/12/2023] Open
Abstract
Dramatic increases in the scale of programmed synthesis of nucleic acid libraries coupled with deep sequencing have powered advances in understanding nucleic acid and protein biology. Biological systems centering on nucleic acids or encoded proteins greatly benefit from such high-throughput studies, given that large DNA variant pools can be synthesized and DNA, or RNA products of transcription, can be easily analyzed by deep sequencing. Here we review the scope of various high-throughput functional assays for studies of nucleic acids and proteins in general, followed by discussion of how these types of study have yielded insights into the RNA Polymerase II (Pol II) active site as an example. We discuss methodological considerations in the design and execution of these experiments that should be valuable to studies in any system.
Affiliation(s)
- Chenxi Qiu
- Department of Medicine, Division of Translational Therapeutics, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA 02215, USA; Cancer Research Institute, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA 02215, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA.
| | - Craig D Kaplan
- Department of Biological Sciences, University of Pittsburgh, Pittsburgh, PA 15260, USA.
| |
|
74
|
Pokusaeva VO, Usmanova DR, Putintseva EV, Espinar L, Sarkisyan KS, Mishin AS, Bogatyreva NS, Ivankov DN, Akopyan AV, Avvakumov SY, Povolotskaya IS, Filion GJ, Carey LB, Kondrashov FA. An experimental assay of the interactions of amino acids from orthologous sequences shaping a complex fitness landscape. PLoS Genet 2019; 15:e1008079. [PMID: 30969963 PMCID: PMC6476524 DOI: 10.1371/journal.pgen.1008079] [Citation(s) in RCA: 57] [Impact Index Per Article: 9.5] [Received: 08/21/2018] [Revised: 04/22/2019] [Accepted: 03/11/2019] [Indexed: 11/18/2022] Open
Abstract
Characterizing the fitness landscape, a representation of fitness for a large set of genotypes, is key to understanding how genetic information is interpreted to create functional organisms. Here we determined the evolutionarily-relevant segment of the fitness landscape of His3, a gene coding for an enzyme in the histidine synthesis pathway, focusing on combinations of amino acid states found at orthologous sites of extant species. Just 15% of amino acids found in yeast His3 orthologues were always neutral while the impact on fitness of the remaining 85% depended on the genetic background. Furthermore, at 67% of sites, amino acid replacements were under sign epistasis, having both strongly positive and negative effect in different genetic backgrounds. 46% of sites were under reciprocal sign epistasis. The fitness impact of amino acid replacements was influenced by only a few genetic backgrounds but involved interaction of multiple sites, shaping a rugged fitness landscape in which many of the shortest paths between highly fit genotypes are inaccessible. An intuitive understanding of protein evolution dictates that, with the exception of adaptive substitutions, amino acid states should be freely exchangeable between the same gene from different species. However, the extent to which this assertion holds true has not been tested in a controlled experiment. Here, we show that whether an amino acid state can be exchanged between orthologues depends on other amino acid states in the same protein. Furthermore, we show that the mode of interaction of amino acid states is multidimensional. Assuming that amino acid replacements influence the protein in several independent ways substantially improves our ability to predict the effect of an amino acid state in a protein sequence that has not been observed in nature.
Affiliation(s)
| | - Dinara R. Usmanova
- Department of Systems Biology, Columbia University, New York, NY, United States of America
| | | | - Lorena Espinar
- Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), 88 Dr. Aiguader, Barcelona, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
| | - Karen S. Sarkisyan
- Institute of Science and Technology Austria, Am Campus 1, Klosterneuburg, Austria
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Moscow, Russia
- Medical Research Council London Institute of Medical Sciences, Imperial College London, London, United Kingdom
| | | | - Natalya S. Bogatyreva
- Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), 88 Dr. Aiguader, Barcelona, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Laboratory of Protein Physics, Institute of Protein Research of the Russian Academy of Sciences, Pushchino, Moscow region, Russia
| | - Dmitry N. Ivankov
- Institute of Science and Technology Austria, Am Campus 1, Klosterneuburg, Austria
- Laboratory of Protein Physics, Institute of Protein Research of the Russian Academy of Sciences, Pushchino, Moscow region, Russia
| | - Arseniy V. Akopyan
- Institute of Science and Technology Austria, Am Campus 1, Klosterneuburg, Austria
| | - Sergey Ya. Avvakumov
- Institute of Science and Technology Austria, Am Campus 1, Klosterneuburg, Austria
| | - Inna S. Povolotskaya
- Veltischev Research and Clinical Institute for Pediatrics of the Pirogov Russian National Research Medical University, Moscow, Russia
| | - Guillaume J. Filion
- Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), 88 Dr. Aiguader, Barcelona, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
| | - Lucas B. Carey
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Center for Quantitative Biology and Peking-Tsinghua Joint Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
| | - Fyodor A. Kondrashov
- Institute of Science and Technology Austria, Am Campus 1, Klosterneuburg, Austria
| |
|
75
|
Knapp DJHF, Michaels YS, Jamilly M, Ferry QRV, Barbosa H, Milne TA, Fulga TA. Decoupling tRNA promoter and processing activities enables specific Pol-II Cas9 guide RNA expression. Nat Commun 2019; 10:1490. [PMID: 30940799 PMCID: PMC6445147 DOI: 10.1038/s41467-019-09148-3] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.8] [Received: 08/14/2018] [Accepted: 02/22/2019] [Indexed: 11/08/2022] Open
Abstract
Spatial/temporal control of Cas9 guide RNA expression could considerably expand the utility of CRISPR-based technologies. Current approaches based on tRNA processing offer a promising strategy but suffer from high background. Here, to address this limitation, we present a screening platform which allows simultaneous measurements of the promoter strength, 5', and 3' processing efficiencies across a library of tRNA variants. This analysis reveals that the sequence determinants underlying these activities, while overlapping, are dissociable. Rational design based on the ensuing principles allowed us to engineer an improved tRNA scaffold that enables highly specific guide RNA production from a Pol-II promoter. When benchmarked against other reported systems this tRNA scaffold is superior to most alternatives, and is equivalent in function to an optimized version of the Csy4-based guide RNA release system. The results and methods described in this manuscript enable avenues of research both in genome engineering and basic tRNA biology.
MESH Headings
- CRISPR-Associated Protein 9/metabolism
- Gene Editing
- Gene Expression Regulation
- Humans
- Nucleic Acid Conformation
- Promoter Regions, Genetic
- RNA Polymerase II/genetics
- RNA Polymerase II/metabolism
- RNA, Guide, CRISPR-Cas Systems/chemistry
- RNA, Guide, CRISPR-Cas Systems/genetics
- RNA, Guide, CRISPR-Cas Systems/metabolism
- RNA, Transfer/chemistry
- RNA, Transfer/genetics
- RNA, Transfer/metabolism
Affiliation(s)
- David J H F Knapp
- Weatherall Institute of Molecular Medicine, Radcliffe Department of Medicine, University of Oxford, Oxford, OX3 9DS, UK.
| | - Yale S Michaels
- Weatherall Institute of Molecular Medicine, Radcliffe Department of Medicine, University of Oxford, Oxford, OX3 9DS, UK
| | - Max Jamilly
- Weatherall Institute of Molecular Medicine, Radcliffe Department of Medicine, University of Oxford, Oxford, OX3 9DS, UK
| | - Quentin R V Ferry
- Weatherall Institute of Molecular Medicine, Radcliffe Department of Medicine, University of Oxford, Oxford, OX3 9DS, UK
| | - Hector Barbosa
- Weatherall Institute of Molecular Medicine, Radcliffe Department of Medicine, University of Oxford, Oxford, OX3 9DS, UK
| | - Thomas A Milne
- Weatherall Institute of Molecular Medicine, MRC Molecular Haematology Unit, NIHR Oxford Biomedical Research Centre Programme, University of Oxford, Oxford, OX3 9DS, UK
| | - Tudor A Fulga
- Weatherall Institute of Molecular Medicine, Radcliffe Department of Medicine, University of Oxford, Oxford, OX3 9DS, UK.
| |
|
76
|
Computational Complexity as an Ultimate Constraint on Evolution. Genetics 2019; 212:245-265. [PMID: 30833289 DOI: 10.1534/genetics.119.302000] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Received: 07/24/2018] [Accepted: 02/22/2019] [Indexed: 01/28/2023] Open
Abstract
Experiments show that evolutionary fitness landscapes can have a rich combinatorial structure due to epistasis. For some landscapes, this structure can produce a computational constraint that prevents evolution from finding local fitness optima, thus overturning the traditional assumption that local fitness peaks can always be reached quickly if no other evolutionary forces challenge natural selection. Here, I introduce a distinction between easy landscapes of traditional theory where local fitness peaks can be found in a moderate number of steps, and hard landscapes where finding local optima requires an infeasible amount of time. Hard examples exist even among landscapes with no reciprocal sign epistasis; on these semismooth fitness landscapes, strong selection weak mutation dynamics cannot find the unique peak in polynomial time. More generally, on hard rugged fitness landscapes that include reciprocal sign epistasis, no evolutionary dynamics, even ones that do not follow adaptive paths, can find a local fitness optimum quickly. Moreover, on hard landscapes, the fitness advantage of nearby mutants cannot drop off exponentially fast but has to follow a power-law that long-term evolution experiments have associated with unbounded growth in fitness. Thus, the constraint of computational complexity enables open-ended evolution on finite landscapes. Knowing this constraint allows us to use the tools of theoretical computer science and combinatorial optimization to characterize the fitness landscapes that we expect to see in nature. I present candidates for hard landscapes at scales from single genes, to microbes, to complex organisms with costly learning (Baldwin effect) or maintained cooperation (Hankshaw effect). Just how ubiquitous hard landscapes (and the corresponding ultimate constraint on evolution) are in nature becomes an open empirical question.
|
77
|
Baeza-Centurion P, Miñana B, Schmiedel JM, Valcárcel J, Lehner B. Combinatorial Genetics Reveals a Scaling Law for the Effects of Mutations on Splicing. Cell 2019; 176:549-563.e23. [PMID: 30661752 DOI: 10.1016/j.cell.2018.12.010] [Citation(s) in RCA: 73] [Impact Index Per Article: 12.2] [Received: 02/26/2018] [Revised: 08/29/2018] [Accepted: 12/07/2018] [Indexed: 02/08/2023]
Abstract
Despite a wealth of molecular knowledge, quantitative laws for accurate prediction of biological phenomena remain rare. Alternative pre-mRNA splicing is an important regulated step in gene expression frequently perturbed in human disease. To understand the combined effects of mutations during evolution, we quantified the effects of all possible combinations of exonic mutations accumulated during the emergence of an alternatively spliced human exon. This revealed that mutation effects scale non-monotonically with the inclusion level of an exon, with each mutation having maximum effect at a predictable intermediate inclusion level. This scaling is observed genome-wide for cis and trans perturbations of splicing, including for natural and disease-associated variants. Mathematical modeling suggests that competition between alternative splice sites is sufficient to cause this non-linearity in the genotype-phenotype map. Combining the global scaling law with specific pairwise interactions between neighboring mutations allows accurate prediction of the effects of complex genotype changes involving >10 mutations.
Affiliation(s)
- Pablo Baeza-Centurion
- Systems Biology Program, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr Aiguader 88, 08003 Barcelona, Spain
| | - Belén Miñana
- Gene Regulation, Stem Cells and Cancer Program, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr Aiguader 88, 08003 Barcelona, Spain
| | - Jörn M Schmiedel
- Systems Biology Program, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr Aiguader 88, 08003 Barcelona, Spain
| | - Juan Valcárcel
- Gene Regulation, Stem Cells and Cancer Program, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr Aiguader 88, 08003 Barcelona, Spain; Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain; Institució Catalana de Recerca i Estudis Avançats (ICREA), Pg. Lluís Companys 23, 08010 Barcelona, Spain.
| | - Ben Lehner
- Systems Biology Program, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr Aiguader 88, 08003 Barcelona, Spain; Gene Regulation, Stem Cells and Cancer Program, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr Aiguader 88, 08003 Barcelona, Spain; Institució Catalana de Recerca i Estudis Avançats (ICREA), Pg. Lluís Companys 23, 08010 Barcelona, Spain.
| |
|
78
|
Blanco C, Janzen E, Pressman A, Saha R, Chen IA. Molecular Fitness Landscapes from High-Coverage Sequence Profiling. Annu Rev Biophys 2019; 48:1-18. [PMID: 30601678 DOI: 10.1146/annurev-biophys-052118-115333] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.8] [Indexed: 11/09/2022]
Abstract
The function of fitness (or molecular activity) in the space of all possible sequences is known as the fitness landscape. Evolution is a random walk on the fitness landscape, with a bias toward climbing hills. Mapping the topography of real fitness landscapes is fundamental to understanding evolution, but previous efforts were hampered by the difficulty of obtaining large, quantitative data sets. The accessibility of high-throughput sequencing (HTS) has transformed this study, enabling large-scale enumeration of fitness for many mutants and even complete sequence spaces in some cases. We review the progress of high-throughput studies in mapping molecular fitness landscapes, both in vitro and in vivo, as well as opportunities for future research. Such studies are rapidly growing in number. HTS is expected to have a profound effect on the understanding of real molecular fitness landscapes.
Affiliation(s)
- Celia Blanco
- Department of Chemistry and Biochemistry, University of California, Santa Barbara, California 93106, USA
| | - Evan Janzen
- Department of Chemistry and Biochemistry, University of California, Santa Barbara, California 93106, USA.,Biomolecular Science and Engineering Program, University of California, Santa Barbara, California 93106, USA
| | - Abe Pressman
- Department of Chemistry and Biochemistry, University of California, Santa Barbara, California 93106, USA.,Department of Chemical Engineering, University of California, Santa Barbara, California 93106, USA
| | - Ranajay Saha
- Department of Chemistry and Biochemistry, University of California, Santa Barbara, California 93106, USA
| | - Irene A Chen
- Biomolecular Science and Engineering Program, University of California, Santa Barbara, California 93106, USA
| |
|
79
|
Singhal S, Gomez SM, Burch CL. Recombination drives the evolution of mutational robustness. Curr Opin Syst Biol 2019; 13:142-149. [PMID: 31572829 DOI: 10.1016/j.coisb.2018.12.003] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 10/27/2022]
Abstract
Recombination can impose fitness costs as beneficial parental combinations of alleles are broken apart, a phenomenon known as recombination load. Computational models suggest that populations may evolve a reduced recombination load by reducing either the likelihood of recombination events (bring interacting loci in physical proximity) or the strength of interactions between loci (make loci more independent of one another). We review evidence for each of these possibilities and their consequences for the genotype-fitness relationship. In particular, we expect that reducing interaction strengths between loci will lead to genomes that are also robust to mutational perturbations, but reducing recombination rates alone will not. We note that both mechanisms most likely played a role in the evolution of extant populations, and that both can result in the frequently-observed pattern of physical linkage between interacting loci.
Affiliation(s)
- Sonia Singhal
- Biology Department, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Shawn M Gomez
- Curriculum in Bioinformatics and Computational Biology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27514.,Department of Pharmacology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27514.,Joint Department of Biomedical Engineering at University of North Carolina at Chapel Hill and North Carolina State University, Chapel Hill, NC, USA
| | - Christina L Burch
- Biology Department, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| |
|
80
|
Slomka S, Pilpel Y. Meiotic Recombination: Genetics' Good Old Scalpel. Cell 2018; 172:391-392. [PMID: 29373827 DOI: 10.1016/j.cell.2018.01.017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
Abstract
In the era of genome engineering, a new study returns to classical genetics to decipher genotype-phenotype relationships in unprecedented throughput and with unprecedented accuracy. Capitalizing on natural variation in yeast strains and frequent meiotic recombination, She and Jarosz (2018) dissect and map to nucleotide resolution, simple and complex determinants of diverse phenotypic traits.
Affiliation(s)
- Shai Slomka
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, 76100, Israel
| | - Yitzhak Pilpel
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, 76100, Israel.
| |
|
81
|
Hartman EC, Lobba MJ, Favor AH, Robinson SA, Francis MB, Tullman-Ercek D. Experimental Evaluation of Coevolution in a Self-Assembling Particle. Biochemistry 2018; 58:1527-1538. [DOI: 10.1021/acs.biochem.8b00948] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Indexed: 12/20/2022]
Affiliation(s)
- Emily C. Hartman
- Department of Chemistry, University of California, Berkeley, California 94720-1460, United States
| | - Marco J. Lobba
- Department of Chemistry, University of California, Berkeley, California 94720-1460, United States
| | - Andrew H. Favor
- Department of Chemistry, University of California, Berkeley, California 94720-1460, United States
| | - Stephanie A. Robinson
- Department of Chemistry, University of California, Berkeley, California 94720-1460, United States
| | - Matthew B. Francis
- Department of Chemistry, University of California, Berkeley, California 94720-1460, United States
- Materials Sciences Division, Lawrence Berkeley National Laboratories, Berkeley, California 94720-1460, United States
| | - Danielle Tullman-Ercek
- Department of Chemical and Biological Engineering, Northwestern University, 2145 Sheridan Road, Technological Institute E136, Evanston, Illinois 60208-3120, United States
| |
|
82
|
Hilton SK, Bloom JD. Modeling site-specific amino-acid preferences deepens phylogenetic estimates of viral sequence divergence. Virus Evol 2018; 4:vey033. [PMID: 30425841 PMCID: PMC6220371 DOI: 10.1093/ve/vey033] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Indexed: 02/06/2023] Open
Abstract
Molecular phylogenetics is often used to estimate the time since the divergence of modern gene sequences. For highly diverged sequences, such phylogenetic techniques sometimes estimate surprisingly recent divergence times. In the case of viruses, independent evidence indicates that the estimates of deep divergence times from molecular phylogenetics are sometimes too recent. This discrepancy is caused in part by inadequate models of purifying selection leading to branch-length underestimation. Here we examine the effect on branch-length estimation of using models that incorporate experimental measurements of purifying selection. We find that models informed by experimentally measured site-specific amino-acid preferences estimate longer deep branches on phylogenies of influenza virus hemagglutinin. This lengthening of branches is due to more realistic stationary states of the models, and is mostly independent of the branch-length extension from modeling site-to-site variation in amino-acid substitution rate. The branch-length extension from experimentally informed site-specific models is similar to that achieved by other approaches that allow the stationary state to vary across sites. However, the improvements from all of these site-specific but time homogeneous and site independent models are limited by the fact that a protein’s amino-acid preferences gradually shift as it evolves. Overall, our work underscores the importance of modeling site-specific amino-acid preferences when estimating deep divergence times—but also shows the inherent limitations of approaches that fail to account for how these preferences shift over time.
Affiliation(s)
- Sarah K Hilton
- Basic Sciences and Computational Biology Program, Fred Hutchinson Cancer Research Center.,Department of Genome Sciences, University of Washington, USA
| | - Jesse D Bloom
- Basic Sciences and Computational Biology Program, Fred Hutchinson Cancer Research Center.,Department of Genome Sciences, University of Washington, USA.,Howard Hughes Medical Institute, Seattle, WA, USA
| |
|
83
|
Riesselman AJ, Ingraham JB, Marks DS. Deep generative models of genetic variation capture the effects of mutations. Nat Methods 2018; 15:816-822. [PMID: 30250057 DOI: 10.1038/s41592-018-0138-4] [Citation(s) in RCA: 320] [Impact Index Per Article: 45.7] [Received: 05/01/2018] [Accepted: 07/29/2018] [Indexed: 01/05/2023]
Abstract
The functions of proteins and RNAs are defined by the collective interactions of many residues, and yet most statistical models of biological sequences consider sites nearly independently. Recent approaches have demonstrated benefits of including interactions to capture pairwise covariation, but leave higher-order dependencies out of reach. Here we show how it is possible to capture higher-order, context-dependent constraints in biological sequences via latent variable models with nonlinear dependencies. We found that DeepSequence (https://github.com/debbiemarkslab/DeepSequence), a probabilistic model for sequence families, predicted the effects of mutations across a variety of deep mutational scanning experiments substantially better than existing methods based on the same evolutionary data. The model, learned in an unsupervised manner solely on the basis of sequence information, is grounded with biologically motivated priors, reveals the latent organization of sequence families, and can be used to explore new parts of sequence space.
Affiliation(s)
- Adam J Riesselman
- Department of Systems Biology, Harvard Medical School, Boston, MA, USA.,Program in Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - John B Ingraham
- Department of Systems Biology, Harvard Medical School, Boston, MA, USA.,Program in Systems Biology, Harvard University, Cambridge, MA, USA
| | - Debora S Marks
- Department of Systems Biology, Harvard Medical School, Boston, MA, USA.
| |
|
84
|
Duan C, Huan Q, Chen X, Wu S, Carey LB, He X, Qian W. Reduced intrinsic DNA curvature leads to increased mutation rate. Genome Biol 2018; 19:132. [PMID: 30217230 PMCID: PMC6138893 DOI: 10.1186/s13059-018-1525-y] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 04/12/2018] [Accepted: 09/05/2018] [Indexed: 01/24/2023] Open
Abstract
BACKGROUND: Mutation rates vary across the genome. Many trans factors that influence mutation rates have been identified, as have specific sequence motifs at the 1-7-bp scale, but cis elements remain poorly characterized. The lack of understanding regarding why different sequences have different mutation rates hampers our ability to identify positive selection in evolution and to identify driver mutations in tumorigenesis.
RESULTS: Here, we use a combination of synthetic genes and sequences of thousands of isolated yeast colonies to show that intrinsic DNA curvature is a major cis determinant of mutation rate. Mutation rate negatively correlates with DNA curvature within genes, and a 10% decrease in curvature results in a 70% increase in mutation rate. Consistently, both yeast and humans accumulate mutations in regions with small curvature. We further show that this effect is due to differences in the intrinsic mutation rate, likely due to differences in mutagen sensitivity and not due to differences in the local activity of DNA repair.
CONCLUSIONS: Our study establishes a framework for understanding the cis properties of DNA sequence in modulating the local mutation rate and identifies a novel causal source of non-uniform mutation rates across the genome.
Affiliation(s)
- Chaorui Duan
- State Key Laboratory of Plant Genomics, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, 100101, China.,Key Laboratory of Genetic Network Biology, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, 100101, China.,University of Chinese Academy of Sciences, Beijing, 100049, China
| | - Qing Huan
- State Key Laboratory of Plant Genomics, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, 100101, China.,Key Laboratory of Genetic Network Biology, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, 100101, China
| | - Xiaoshu Chen
- Human Genome Research Institute and Department of Medical Genetics, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, 510080, China
| | - Shaohuan Wu
- State Key Laboratory of Plant Genomics, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, 100101, China.,Key Laboratory of Genetic Network Biology, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, 100101, China.,University of Chinese Academy of Sciences, Beijing, 100049, China
| | - Lucas B Carey
- Department of Experimental and Health Sciences, Universitat Pompeu Fabra, 08003, Barcelona, Spain
| | - Xionglei He
- State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, 510275, China
| | - Wenfeng Qian
- State Key Laboratory of Plant Genomics, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, 100101, China.,Key Laboratory of Genetic Network Biology, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, 100101, China.,University of Chinese Academy of Sciences, Beijing, 100049, China.
| |
|
85
|
Zimmerman SM, Kon Y, Hauke AC, Ruiz BY, Fields S, Phizicky EM. Conditional accumulation of toxic tRNAs to cause amino acid misincorporation. Nucleic Acids Res 2018; 46:7831-7843. [PMID: 30007351 PMCID: PMC6125640 DOI: 10.1093/nar/gky623] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Received: 05/18/2018] [Revised: 06/22/2018] [Accepted: 07/01/2018] [Indexed: 12/16/2022] Open
Abstract
To develop a system for conditional amino acid misincorporation, we engineered tRNAs in the yeast Saccharomyces cerevisiae to be substrates of the rapid tRNA decay (RTD) pathway, such that they accumulate when RTD is turned off. We used this system to test the effects on growth of a library of tRNA(Ser) variants with all possible anticodons, and show that many are lethal when RTD is inhibited and the tRNA accumulates. Using mass spectrometry, we measured serine misincorporation in yeast containing each of six tRNA variants, and for five of them identified hundreds of peptides with serine substitutions at the targeted amino acid sites. Unexpectedly, we found that there is not a simple correlation between toxicity and the level of serine misincorporation; in particular, high levels of serine misincorporation can occur at cysteine residues without obvious growth defects. We also showed that toxic tRNAs can be used as a tool to identify sequence variants that reduce tRNA function. Finally, we generalized this method to another tRNA species, and generated conditionally toxic tRNA(Tyr) variants in a similar manner. This method should facilitate the study of tRNA biology and provide a tool to probe the effects of amino acid misincorporation on cellular physiology.
Affiliation(s)
| | - Yoshiko Kon
- Department of Biochemistry and Biophysics and Center for RNA Biology, University of Rochester School of Medicine, Rochester, NY 14642, USA
| | - Alayna C Hauke
- Department of Biochemistry and Biophysics and Center for RNA Biology, University of Rochester School of Medicine, Rochester, NY 14642, USA
| | - Bianca Y Ruiz
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
| | - Stanley Fields
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
- Department of Medicine, University of Washington, Seattle, WA 98195, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA
| | - Eric M Phizicky
- Department of Biochemistry and Biophysics and Center for RNA Biology, University of Rochester School of Medicine, Rochester, NY 14642, USA
| |
|
86
|
Abstract
The pool of transfer RNA (tRNA) molecules in cells allows the ribosome to decode genetic information. This repertoire of molecular decoders is positioned in the crossroad of the genome, the transcriptome, and the proteome. Omics and systems biology now allow scientists to explore the entire repertoire of tRNAs of many organisms, revealing basic exciting biology. The tRNA gene set of hundreds of species is now characterized, in addition to the tRNA genes of organelles and viruses. Genes encoding tRNAs for certain anticodon types appear in dozens of copies in a genome, while others are universally absent from any genome. Transcriptome measurement of tRNAs is challenging, but in recent years new technologies have allowed researchers to determine the dynamic expression patterns of tRNAs. These advances reveal that availability of ready-to-translate tRNA molecules is highly controlled by several transcriptional and posttranscriptional regulatory processes. This regulation shapes the proteome according to the cellular state. The tRNA pool profoundly impacts many aspects of cellular and organismal life, including protein expression level, translation accuracy, adequacy of folding, and even mRNA stability. As a result, the shape of the tRNA pool affects organismal health and may participate in causing conditions such as cancer and neurological conditions.
Affiliation(s)
- Roni Rak
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, 76100 Israel;
| | - Orna Dahan
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, 76100 Israel;
| | - Yitzhak Pilpel
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, 76100 Israel;
| |
|
87
|
Lyons DM, Lauring AS. Mutation and Epistasis in Influenza Virus Evolution. Viruses 2018; 10:E407. [PMID: 30081492 PMCID: PMC6115771 DOI: 10.3390/v10080407] [Citation(s) in RCA: 64] [Impact Index Per Article: 9.1] [Received: 07/17/2018] [Revised: 07/30/2018] [Accepted: 07/30/2018] [Indexed: 12/25/2022] Open
Abstract
Influenza remains a persistent public health challenge, because the rapid evolution of influenza viruses has led to marginal vaccine efficacy, antiviral resistance, and the annual emergence of novel strains. This evolvability is driven, in part, by the virus's capacity to generate diversity through mutation and reassortment. Because many new traits require multiple mutations and mutations are frequently combined by reassortment, epistatic interactions between mutations play an important role in influenza virus evolution. While mutation and epistasis are fundamental to the adaptability of influenza viruses, they also constrain the evolutionary process in important ways. Here, we review recent work on mutational effects and epistasis in influenza viruses.
Affiliation(s)
- Daniel M Lyons
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI 48109, USA.
| | - Adam S Lauring
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI 48109, USA.
- Division of Infectious Diseases, Department of Internal Medicine, University of Michigan, Ann Arbor, MI 48109, USA.
- Department of Microbiology and Immunology, University of Michigan, Ann Arbor, MI 48109, USA.
| |
|
88
|
Multiplexed assays of variant effects contribute to a growing genotype-phenotype atlas. Hum Genet 2018; 137:665-678. [PMID: 30073413 PMCID: PMC6153521 DOI: 10.1007/s00439-018-1916-x] [Citation(s) in RCA: 89] [Impact Index Per Article: 12.7] [Received: 06/21/2018] [Accepted: 07/21/2018] [Indexed: 12/12/2022]
Abstract
Given the constantly improving cost and speed of genome sequencing, it is reasonable to expect that personal genomes will soon be known for many millions of humans. This stands in stark contrast with our limited ability to interpret the sequence variants which we find. Although it is, perhaps, easiest to interpret variants in coding regions, the functional impact is unknown for the vast majority of missense variants. While many computational approaches can predict the impact of coding variants, they are given little weight in the current guidelines for interpreting clinical variants. Laboratory assays produce comparatively more trustworthy results, but until recently did not scale to the space of all possible mutations. The development of deep mutational scanning and other multiplexed assays of variant effect has now brought feasibility of this endeavour within view. Here, we review progress in this field over the last decade, break down the different approaches into their components, and compare methodological differences.
|
89
|
Aguilar‐Rodríguez J, Peel L, Stella M, Wagner A, Payne JL. The architecture of an empirical genotype-phenotype map. Evolution 2018; 72:1242-1260. [PMID: 29676774 PMCID: PMC6055911 DOI: 10.1111/evo.13487] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Received: 05/16/2017] [Accepted: 04/03/2018] [Indexed: 12/15/2022]
Abstract
Recent advances in high-throughput technologies are bringing the study of empirical genotype-phenotype (GP) maps to the fore. Here, we use data from protein-binding microarrays to study an empirical GP map of transcription factor (TF)-binding preferences. In this map, each genotype is a DNA sequence. The phenotype of this DNA sequence is its ability to bind one or more TFs. We study this GP map using genotype networks, in which nodes represent genotypes with the same phenotype, and edges connect nodes if their genotypes differ by a single small mutation. We describe the structure and arrangement of genotype networks within the space of all possible binding sites for 525 TFs from three eukaryotic species encompassing three kingdoms of life (animal, plant, and fungi). We thus provide a high-resolution depiction of the architecture of an empirical GP map. Among a number of findings, we show that these genotype networks are "small-world" and assortative, and that they ubiquitously overlap and interface with one another. We also use polymorphism data from Arabidopsis thaliana to show how genotype network structure influences the evolution of TF-binding sites in vivo. We discuss our findings in the context of regulatory evolution.
Affiliation(s)
- José Aguilar‐Rodríguez
- Department of Evolutionary Biology and Environmental Studies, University of Zurich, Zurich, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Current Address: Department of Biology, Stanford University, Stanford, CA, USA; Department of Chemical and Systems Biology, Stanford University, Stanford, CA, USA
| | - Leto Peel
- Institute of Information and Communication Technologies, Electronics and Applied Mathematics, Université Catholique de Louvain, Louvain‐la‐Neuve, Belgium
- Namur Center for Complex Systems, University of Namur, Namur, Belgium
| | - Massimo Stella
- Institute for Complex Systems Simulation, Department of Electronics and Computer Science, University of Southampton, Southampton, United Kingdom
| | - Andreas Wagner
- Department of Evolutionary Biology and Environmental Studies, University of Zurich, Zurich, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- The Santa Fe Institute, Santa Fe, New Mexico, USA
| | - Joshua L. Payne
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Institute for Integrative Biology, ETH, Zurich, Switzerland
| |
|
90
|
Pairwise and higher-order genetic interactions during the evolution of a tRNA. Nature 2018; 558:117-121. [PMID: 29849145 PMCID: PMC6193533 DOI: 10.1038/s41586-018-0170-7] [Citation(s) in RCA: 63] [Impact Index Per Article: 9.0] [Received: 05/29/2017] [Accepted: 04/09/2018] [Indexed: 01/09/2023]
Abstract
A central question in genetics and evolution is the extent to which the outcomes of mutations change depending on the genetic context in which they occur. Pairwise interactions between mutations have been systematically mapped within and between genes, and have been shown to contribute substantially to phenotypic variation among individuals. However, the extent to which genetic interactions themselves are stable or dynamic across genotypes is unclear. Here we quantify more than 45,000 genetic interactions between the same 87 pairs of mutations across more than 500 closely related genotypes of a yeast tRNA. Notably, all pairs of mutations interacted in at least 9% of genetic backgrounds and all pairs switched from interacting positively to interacting negatively in different genotypes (false discovery rate < 0.1). Higher-order interactions are also abundant and dynamic across genotypes. The epistasis in this tRNA means that all individual mutations switch from detrimental to beneficial, even in closely related genotypes. As a consequence, accurate genetic prediction requires mutation effects to be measured across different genetic backgrounds and the use of higher-order epistatic terms.
|
91
|
Li C, Zhang J. Multi-environment fitness landscapes of a tRNA gene. Nat Ecol Evol 2018; 2:1025-1032. [PMID: 29686238 PMCID: PMC5966336 DOI: 10.1038/s41559-018-0549-8] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.0] [Received: 11/19/2017] [Accepted: 03/27/2018] [Indexed: 11/09/2022]
Abstract
A fitness landscape (FL) describes the genotype-fitness relationship in a given environment. To explain and predict evolution, it is imperative to measure the FL in multiple environments because the natural environment changes frequently. Using a high-throughput method that combines precise gene replacement with next-generation sequencing, we determine the in vivo FL of a yeast tRNA gene comprising over 23,000 genotypes in four environments. Although genotype-by-environment interaction (G×E) is abundantly detected, its pattern is so simple that we can transform an existing FL to that in a new environment with fitness measures of only a few genotypes in the new environment. Under each environment, we observe prevalent, negatively biased epistasis between mutations (G×G). Epistasis-by-environment interaction (G×G×E) is also prevalent, but trends in epistasis difference between environments are predictable. Our study thus reveals simple rules underlying seemingly complex FLs, opening the door to understanding and predicting FLs in general.
Affiliation(s)
- Chuan Li
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI, USA.,Department of Biology, Stanford University, Stanford, CA, USA
| | - Jianzhi Zhang
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI, USA.
| |
|
92
|
Abstract
Cells regulate the activity of genes in a variety of ways. For example, they regulate transcription through DNA binding proteins called transcription factors, and they regulate mRNA stability and processing through RNA binding proteins. Based on current knowledge, transcriptional regulation is more widespread and is involved in many more evolutionary adaptations than posttranscriptional regulation. The reason could be that transcriptional regulation is studied more intensely. We suggest instead that transcriptional regulation harbors an intrinsic evolutionary advantage: when mutations change transcriptional regulation, they are more likely to bring forth novel patterns of such regulation. That is, transcriptional regulation is more evolvable. Our analysis suggests a reason why a specific kind of gene regulation is especially abundant in the living world.
Much of gene regulation is carried out by proteins that bind DNA or RNA molecules at specific sequences. One class of such proteins is transcription factors, which bind short DNA sequences to regulate transcription. Another class is RNA binding proteins, which bind short RNA sequences to regulate RNA maturation, transport, and stability. Here, we study the robustness and evolvability of these regulatory mechanisms. To this end, we use experimental binding data from 172 human and fruit fly transcription factors and RNA binding proteins as well as human polymorphism data to study the evolution of binding sites in vivo. We find little difference between the robustness of regulatory protein–RNA interactions and transcription factor–DNA interactions to DNA mutations. In contrast, we find that RNA-mediated regulation is less evolvable than transcriptional regulation, because mutations are less likely to create interactions of an RNA molecule with a new RNA binding protein than they are to create interactions of a gene regulatory region with a new transcription factor. Our observations are consistent with the high level of conservation observed for interactions between RNA binding proteins and their target molecules as well as the evolutionary plasticity of regulatory regions bound by transcription factors. They may help explain why transcriptional regulation is implicated in many more evolutionary adaptations and innovations than RNA-mediated gene regulation.
|
93
|
Payea MJ, Sloma MF, Kon Y, Young DL, Guy MP, Zhang X, De Zoysa T, Fields S, Mathews DH, Phizicky EM. Widespread temperature sensitivity and tRNA decay due to mutations in a yeast tRNA. RNA (NEW YORK, N.Y.) 2018; 24:410-422. [PMID: 29259051 PMCID: PMC5824359 DOI: 10.1261/rna.064642.117] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 10/24/2017] [Accepted: 12/14/2017] [Indexed: 05/14/2023]
Abstract
Microorganisms have universally adapted their RNAs and proteins to survive at a broad range of temperatures and growth conditions. However, for RNAs, there is little quantitative understanding of the effects of mutations on function at high temperatures. To understand how variant tRNA function is affected by temperature change, we used the tRNA nonsense suppressor SUP4oc of the yeast Saccharomyces cerevisiae to perform a high-throughput quantitative screen of tRNA function at two different growth temperatures. This screen yielded comparative values for 9243 single and double variants. Surprisingly, despite the ability of S. cerevisiae to grow at temperatures as low as 15°C and as high as 39°C, the vast majority of variants that could be scored lost half or more of their function when evaluated at 37°C relative to 28°C. Moreover, temperature sensitivity of a tRNA variant was highly associated with its susceptibility to the rapid tRNA decay (RTD) pathway, implying that RTD is responsible for most of the loss of function of variants at higher temperature. Furthermore, RTD may also operate in a met22Δ strain, which was previously thought to fully inhibit RTD. Consistent with RTD acting to degrade destabilized tRNAs, the stability of a tRNA molecule can be used to predict temperature sensitivity with high confidence. These findings offer a new perspective on the stability of tRNA molecules and their quality control at high temperature.
Affiliation(s)
- Matthew J Payea
- Department of Biochemistry and Biophysics and Center for RNA Biology, University of Rochester School of Medicine, Rochester, New York 14642, USA
| | - Michael F Sloma
- Department of Biochemistry and Biophysics and Center for RNA Biology, University of Rochester School of Medicine, Rochester, New York 14642, USA
| | - Yoshiko Kon
- Department of Biochemistry and Biophysics and Center for RNA Biology, University of Rochester School of Medicine, Rochester, New York 14642, USA
| | - David L Young
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
| | - Michael P Guy
- Department of Biochemistry and Biophysics and Center for RNA Biology, University of Rochester School of Medicine, Rochester, New York 14642, USA
| | - Xiaoju Zhang
- Department of Biochemistry and Biophysics and Center for RNA Biology, University of Rochester School of Medicine, Rochester, New York 14642, USA
| | - Thareendra De Zoysa
- Department of Biochemistry and Biophysics and Center for RNA Biology, University of Rochester School of Medicine, Rochester, New York 14642, USA
| | - Stanley Fields
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
- Department of Medicine, University of Washington, Seattle, Washington 98195, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, Washington 98195, USA
| | - David H Mathews
- Department of Biochemistry and Biophysics and Center for RNA Biology, University of Rochester School of Medicine, Rochester, New York 14642, USA
| | - Eric M Phizicky
- Department of Biochemistry and Biophysics and Center for RNA Biology, University of Rochester School of Medicine, Rochester, New York 14642, USA
| |
|
94
|
Lundin E, Tang PC, Guy L, Näsvall J, Andersson DI. Experimental Determination and Prediction of the Fitness Effects of Random Point Mutations in the Biosynthetic Enzyme HisA. Mol Biol Evol 2018; 35:704-718. [PMID: 29294020 PMCID: PMC5850734 DOI: 10.1093/molbev/msx325] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Indexed: 01/15/2023] Open
Abstract
The distribution of fitness effects of mutations is a factor of fundamental importance in evolutionary biology. We determined the distribution of fitness effects of 510 mutants that each carried between 1 and 10 mutations (synonymous and nonsynonymous) in the hisA gene, encoding an essential enzyme in the L-histidine biosynthesis pathway of Salmonella enterica. For the full set of mutants, the distribution was bimodal with many apparently neutral mutations and many lethal mutations. For a subset of 81 single, nonsynonymous mutants most mutations appeared neutral at high expression levels, whereas at low expression levels only a few mutations were neutral. Furthermore, we examined how the magnitude of the observed fitness effects was correlated to several measures of biophysical properties and phylogenetic conservation. We conclude that for HisA: (i) The effect of mutations can be masked by high expression levels, such that mutations that are deleterious to the function of the protein can still be neutral with regard to organism fitness if the protein is expressed at a sufficiently high level; (ii) the shape of the fitness distribution is dependent on the extent to which the protein is rate-limiting for growth; (iii) negative epistatic interactions, on an average, amplified the combined effect of nonsynonymous mutations; and (iv) no single sequence-based predictor could confidently predict the fitness effects of mutations in HisA, but a combination of multiple predictors could predict the effect with a SD of 0.04 resulting in 80% of the mutations predicted within 12% of their observed selection coefficients.
Affiliation(s)
- Erik Lundin
- Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
| | - Po-Cheng Tang
- Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
| | - Lionel Guy
- Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
| | - Joakim Näsvall
- Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
| | - Dan I Andersson
- Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
| |
|
95
|
Bunzel HA, Garrabou X, Pott M, Hilvert D. Speeding up enzyme discovery and engineering with ultrahigh-throughput methods. Curr Opin Struct Biol 2018; 48:149-156. [PMID: 29413955 DOI: 10.1016/j.sbi.2017.12.010] [Citation(s) in RCA: 84] [Impact Index Per Article: 12.0] [Received: 10/26/2017] [Accepted: 12/26/2017] [Indexed: 01/24/2023]
Abstract
Exploring the sequence space of enzyme catalysts is ultimately a numbers game. Ultrahigh-throughput screening methods for rapid analysis of millions of variants are therefore increasingly important for investigating sequence-function relationships, searching large metagenomic libraries for interesting activities, and accelerating enzyme evolution in the laboratory. Recent applications of such technologies are reviewed here, with a particular focus on the practical benefits of droplet-based microfluidics for the directed evolution of natural and artificial enzymes. Broader implementation of such rapid, cost-effective screening technologies is likely to redefine the way enzymes are studied and engineered for academic and industrial purposes.
Affiliation(s)
- Hans Adrian Bunzel
- Laboratory of Organic Chemistry, ETH Zurich, Zurich CH-8093, Switzerland
| | - Xavier Garrabou
- Laboratory of Organic Chemistry, ETH Zurich, Zurich CH-8093, Switzerland
| | - Moritz Pott
- Laboratory of Organic Chemistry, ETH Zurich, Zurich CH-8093, Switzerland
| | - Donald Hilvert
- Laboratory of Organic Chemistry, ETH Zurich, Zurich CH-8093, Switzerland.
| |
|
96
|
Evolutionary mechanisms studied through protein fitness landscapes. Curr Opin Struct Biol 2018; 48:141-148. [DOI: 10.1016/j.sbi.2018.01.001] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.0] [Received: 11/16/2017] [Revised: 12/26/2017] [Accepted: 01/01/2018] [Indexed: 12/15/2022]
|
97
|
Khromov P, Malliaris CD, Morozov AV. Generalization of the Ewens sampling formula to arbitrary fitness landscapes. PLoS One 2018; 13:e0190186. [PMID: 29324850 PMCID: PMC5764269 DOI: 10.1371/journal.pone.0190186] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 04/13/2017] [Accepted: 12/08/2017] [Indexed: 11/30/2022] Open
Abstract
In considering evolution of transcribed regions, regulatory sequences, and other genomic loci, we are often faced with a situation in which the number of allelic states greatly exceeds the size of the population. In this limit, the population eventually adopts a steady state characterized by mutation-selection-drift balance. Although new alleles continue to be explored through mutation, the statistics of the population, and in particular the probabilities of seeing specific allelic configurations in samples taken from the population, do not change with time. In the absence of selection, the probabilities of allelic configurations are given by the Ewens sampling formula, widely used in population genetics to detect deviations from neutrality. Here we develop an extension of this formula to arbitrary fitness distributions. Although our approach is general, we focus on the class of fitness landscapes, inspired by recent high-throughput genotype-phenotype maps, in which alleles can be in several distinct phenotypic states. This class of landscapes yields sampling probabilities that are computationally more tractable and can form a basis for inference of selection signatures from genomic data. Using an efficient numerical implementation of the sampling probabilities, we demonstrate that, for a sizable range of mutation rates and selection coefficients, the steady-state allelic diversity is not neutral. Therefore, it may be used to infer selection coefficients, as well as other evolutionary parameters from population data. We also carry out numerical simulations to challenge various approximations involved in deriving our sampling formulas, such as the infinite-allele limit and the “full connectivity” assumption inherent in the Ewens theory, in which each allele can mutate into any other allele. We find that, at least for the specific numerical examples studied, our theory remains sufficiently accurate even if these assumptions are relaxed. 
Thus our framework establishes both theoretical and practical foundations for inferring selection signatures from population-level genomic sequence samples.
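For orientation, the classical neutral Ewens sampling formula that the abstract takes as its starting point can be evaluated in a few lines. The sketch below is not taken from the paper and does not implement its extension to fitness landscapes. It covers only the standard neutral case, P(a_1, ..., a_n) = n! / (θ(θ+1)...(θ+n-1)) × Π_j θ^(a_j) / (j^(a_j) a_j!), where a_j is the number of allele types seen exactly j times in a sample of size n and θ is the scaled mutation rate. The function name and argument layout are hypothetical.

```python
from math import factorial


def ewens_probability(counts, theta):
    """Neutral Ewens sampling probability of an allelic configuration.

    counts[j - 1] is a_j, the number of allele types observed exactly
    j times in the sample; theta is the scaled mutation rate.
    (Hypothetical helper for illustration only.)
    """
    # Sample size n is implied by the configuration: n = sum_j j * a_j.
    n = sum(j * a for j, a in enumerate(counts, start=1))

    # Rising factorial theta * (theta + 1) * ... * (theta + n - 1).
    rising = 1.0
    for i in range(n):
        rising *= theta + i

    prob = factorial(n) / rising
    for j, a in enumerate(counts, start=1):
        prob *= theta ** a / (j ** a * factorial(a))
    return prob


# All configurations of a sample of size 3: three singletons,
# one singleton plus one doubleton, or a single allele seen three times.
configurations = [[3, 0, 0], [1, 1, 0], [0, 0, 1]]
total = sum(ewens_probability(c, theta=2.5) for c in configurations)
```

Summing over every configuration of a fixed sample size should give 1, which is a convenient sanity check on any implementation.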
Affiliation(s)
- Pavel Khromov
- Department of Physics and Astronomy and Center for Quantitative Biology, Rutgers University, Piscataway, New Jersey, United States of America
| | - Constantin D. Malliaris
- Department of Physics and Astronomy and Center for Quantitative Biology, Rutgers University, Piscataway, New Jersey, United States of America
| | - Alexandre V. Morozov
- Department of Physics and Astronomy and Center for Quantitative Biology, Rutgers University, Piscataway, New Jersey, United States of America
| |
|
98
|
Obolski U, Ram Y, Hadany L. Key issues review: evolution on rugged adaptive landscapes. REPORTS ON PROGRESS IN PHYSICS. PHYSICAL SOCIETY (GREAT BRITAIN) 2018; 81:012602. [PMID: 29051394 DOI: 10.1088/1361-6633/aa94d4] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Indexed: 06/07/2023]
Abstract
Adaptive landscapes represent a mapping between genotype and fitness. Rugged adaptive landscapes contain two or more adaptive peaks: allele combinations with higher fitness than any of their neighbors in the genetic space. How do populations evolve on such rugged landscapes? Evolutionary biologists have struggled with this question since it was first introduced in the 1930s by Sewall Wright. Discoveries in the fields of genetics and biochemistry inspired various mathematical models of adaptive landscapes. The development of landscape models led to numerous theoretical studies analyzing evolution on rugged landscapes under different biological conditions. The large body of theoretical work suggests that adaptive landscapes are major determinants of the progress and outcome of evolutionary processes. Recent technological advances in molecular biology and microbiology allow experimenters to measure adaptive values of large sets of allele combinations and construct empirical adaptive landscapes for the first time. Such empirical landscapes have already been generated in bacteria, yeast, viruses, and fungi, and are contributing to new insights about evolution on adaptive landscapes. In this Key Issues Review we will: (i) introduce the concept of adaptive landscapes; (ii) review the major theoretical studies of evolution on rugged landscapes; (iii) review some of the recently obtained empirical adaptive landscapes; (iv) discuss recent mathematical and statistical analyses motivated by empirical adaptive landscapes, as well as provide the reader with instructions and source code to implement simulations of evolution on adaptive landscapes; and (v) discuss possible future directions for this exciting field.
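The definition of an adaptive peak given in the abstract (an allele combination fitter than all of its neighbors in genetic space) can be illustrated with a minimal sketch. This is not the source code distributed with the review. It assumes biallelic loci, treats genotypes that differ at exactly one locus as neighbors, and uses made-up fitness values; the function name is hypothetical.

```python
def adaptive_peaks(fitness):
    """Return genotypes that are fitter than every one-mutation neighbor.

    fitness maps equal-length tuples of 0/1 alleles to fitness values and
    is assumed to contain every genotype of that length.
    (Hypothetical helper for illustration only.)
    """
    peaks = []
    for genotype, w in fitness.items():
        # Neighbors differ from the focal genotype at exactly one locus.
        neighbors = [
            genotype[:i] + (1 - genotype[i],) + genotype[i + 1:]
            for i in range(len(genotype))
        ]
        if all(w > fitness[nb] for nb in neighbors):
            peaks.append(genotype)
    return sorted(peaks)


# Illustrative two-locus landscape (values are invented): each single
# mutant is less fit than both the wild type and the double mutant.
toy_landscape = {(0, 0): 1.0, (0, 1): 0.8, (1, 0): 0.7, (1, 1): 1.2}
peaks = adaptive_peaks(toy_landscape)
```

On this toy landscape both `(0, 0)` and `(1, 1)` are peaks, so it is rugged in the sense used above: a population at `(0, 0)` must cross a fitness valley to reach the higher peak.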
|
99
|
Negative Epistasis in Experimental RNA Fitness Landscapes. J Mol Evol 2017; 85:159-168. [PMID: 29127445 DOI: 10.1007/s00239-017-9817-5] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Received: 07/14/2017] [Accepted: 10/28/2017] [Indexed: 10/18/2022]
Abstract
Mutations and their effects on fitness are a fundamental component of evolution. The effects of some mutations change in the presence of other mutations, and this is referred to as epistasis. Epistasis can occur between mutations in different genes or within the same gene. A systematic study of epistasis requires the analysis of numerous mutations and their combinations, which has recently become feasible with advancements in DNA synthesis and sequencing. Here we review the mutational effects and epistatic interactions within RNA molecules revealed by several recent high-throughput mutational studies involving two ribozymes studied in vitro, as well as a tRNA and a snoRNA studied in yeast. The data allow an analysis of the distribution of fitness effects of individual mutations as well as combinations of two or more mutations. Two different approaches to measuring epistasis in the data both reveal a predominance of negative epistasis, such that higher combinations of two or more mutations are typically lower in fitness than expected from the effect of each individual mutation. These data are in contrast to past studies of epistasis that used computationally predicted secondary structures of RNA that revealed a predominance of positive epistasis. The RNA data reviewed here are more similar to that found from mutational experiments on individual protein enzymes, suggesting that a common thermodynamic framework may explain negative epistasis between mutations within macromolecules.
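As a worked illustration of negative epistasis, the sketch below scores a double mutant against a multiplicative expectation on a log scale. This is one common convention and is assumed here for illustration; the abstract does not specify which two measures the review uses. Fitness values are taken relative to a wild type of 1, the numbers are invented, and the function name is hypothetical.

```python
import math


def pairwise_epistasis(w_a, w_b, w_ab):
    """Log-scale epistasis of a double mutant under a multiplicative null.

    w_a and w_b are the single-mutant fitnesses and w_ab the double-mutant
    fitness, all relative to a wild type of 1. A negative value means the
    double mutant is less fit than expected from the single mutants.
    (Hypothetical helper; the multiplicative null is an assumption.)
    """
    expected_log = math.log(w_a) + math.log(w_b)
    return math.log(w_ab) - expected_log


# Invented example: singles at 0.8 and 0.5 predict a double mutant at 0.4,
# but the observed double mutant has fitness 0.2.
epsilon = pairwise_epistasis(0.8, 0.5, 0.2)
```

Here `epsilon` equals log(0.2 / 0.4) = log(0.5), a negative value, matching the pattern the abstract describes as predominant in the RNA data.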
|
100
|
Dawn of the in vivo RNA structurome and interactome. Biochem Soc Trans 2017; 44:1395-1410. [PMID: 27911722 DOI: 10.1042/bst20160075] [Citation(s) in RCA: 35] [Impact Index Per Article: 4.4] [Received: 03/22/2016] [Revised: 06/19/2016] [Accepted: 07/04/2016] [Indexed: 12/11/2022]
Abstract
RNA is one of the most fascinating biomolecules in living systems given its structural versatility to fold into elaborate architectures for important biological functions such as gene regulation, catalysis, and information storage. Knowledge of RNA structures and interactions can provide deep insights into their functional roles in vivo. For decades, RNA structural studies have been conducted on a transcript-by-transcript basis. The advent of next-generation sequencing (NGS) has enabled the development of transcriptome-wide structural probing methods to profile the global landscape of RNA structures and interactions, also known as the RNA structurome and interactome, which transformed our understanding of the RNA structure-function relationship on a transcriptomic scale. In this review, molecular tools and NGS methods used for RNA structure probing are presented, novel insights uncovered by RNA structurome and interactome studies are highlighted, and perspectives on current challenges and potential future directions are discussed. A more complete understanding of the RNA structures and interactions in vivo will help illuminate the novel roles of RNA in gene regulation, development, and diseases.
|