1
Chen WC, Zhou J, McCandlish DM. Density estimation for ordinal biological sequences and its applications. ArXiv 2024:arXiv:2404.11228v1. [PMID: 38699164 PMCID: PMC11065051]
Abstract
Biological sequences do not occur at random. Instead, they appear with particular frequencies that reflect properties of the associated system or phenomenon. Knowing how biological sequences are distributed in sequence space is thus a natural first step toward understanding the underlying mechanisms. Here we propose a new method for inferring the probability distribution from which a sample of biological sequences was drawn, for the case where the sequences are composed of elements that admit a natural ordering. Our method is based on Bayesian field theory, a physics-based machine learning approach, and can be regarded as a nonparametric extension of the traditional maximum entropy estimate. As an example, we use it to analyze the aneuploidy data pertaining to gliomas from The Cancer Genome Atlas project. In addition, we demonstrate two follow-up analyses that can be performed with the resulting probability distribution. One is to investigate the associations among the sequence sites, which provides a way to infer the governing biological grammar. The other is to study the global geometry of the probability landscape, which allows us to look at the problem from an evolutionary point of view. This methodology thus enables us to learn, from a sample of sequences, how a biological system or phenomenon in the real world works.
2
Livesey BJ, Badonyi M, Dias M, Frazer J, Kumar S, Lindorff-Larsen K, McCandlish DM, Orenbuch R, Shearer CA, Muffley L, Foreman J, Glazer AM, Lehner B, Marks DS, Roth FP, Rubin AF, Starita LM, Marsh JA. Guidelines for releasing a variant effect predictor. ArXiv 2024:arXiv:2404.10807v1. [PMID: 38699161 PMCID: PMC11065047]
Abstract
Computational methods for assessing the likely impacts of mutations, known as variant effect predictors (VEPs), are widely used in the assessment and interpretation of human genetic variation, as well as in other applications like protein engineering. Many different VEPs have been released to date, and there is tremendous variability in their underlying algorithms and outputs, and in the ways in which the methodologies and predictions are shared. This leads to considerable challenges for end users in knowing which VEPs to use and how to use them. Here, to address these issues, we provide guidelines and recommendations for the release of novel VEPs. Emphasising open-source availability, transparent methodologies, clear variant effect score interpretations, standardised scales, accessible predictions, and rigorous training data disclosure, we aim to improve the usability and interpretability of VEPs, and promote their integration into analysis and evaluation pipelines. We also provide a large, categorised list of currently available VEPs, aiming to facilitate the discovery and encourage the usage of novel methods within the scientific community.
Affiliation(s)
- Benjamin J. Livesey
- MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, UK
- Mihaly Badonyi
- MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, UK
- Mafalda Dias
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Jonathan Frazer
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Sushant Kumar
- Department of Medical Biophysics, University of Toronto; Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada
- Kresten Lindorff-Larsen
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- David M. McCandlish
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Rose Orenbuch
- Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Lara Muffley
- Department of Genome Sciences, University of Washington and the Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
- Julia Foreman
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Ben Lehner
- Wellcome Sanger Institute, Cambridge, UK; Universitat Pompeu Fabra (UPF), Barcelona, Spain; Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
- Debora S. Marks
- Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Broad Institute of MIT and Harvard, Boston, MA, USA
- Frederick P. Roth
- Department of Computational and Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
- Alan F. Rubin
- Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research; Department of Medical Biology, University of Melbourne, Parkville, Australia
- Lea M. Starita
- Department of Genome Sciences, University of Washington and the Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
- Joseph A. Marsh
- MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, UK
3
Seitz EE, McCandlish DM, Kinney JB, Koo PK. Interpreting cis-regulatory mechanisms from genomic deep neural networks using surrogate models. bioRxiv 2024:2023.11.14.567120. [PMID: 38013993 PMCID: PMC10680760 DOI: 10.1101/2023.11.14.567120]
Abstract
Deep neural networks (DNNs) have greatly advanced the ability to predict genome function from sequence. Interpreting genomic DNNs in terms of biological mechanisms, however, remains difficult. Here we introduce SQUID, a genomic DNN interpretability framework based on surrogate modeling. SQUID approximates genomic DNNs in user-specified regions of sequence space using surrogate models, i.e., simpler models that are mechanistically interpretable. Importantly, SQUID removes the confounding effects that nonlinearities and heteroscedastic noise in functional genomics data can have on model interpretation. Benchmarking analysis on multiple genomic DNNs shows that SQUID, when compared to established interpretability methods, identifies motifs that are more consistent across genomic loci and yields improved single-nucleotide variant-effect predictions. SQUID also supports surrogate models that quantify epistatic interactions within and between cis-regulatory elements. SQUID thus advances the ability to mechanistically interpret genomic DNNs.
Affiliation(s)
- Evan E Seitz
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- David M McCandlish
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Justin B Kinney
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Peter K Koo
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
4
Ishigami Y, Wong MS, Martí-Gómez C, Ayaz A, Kooshkbaghi M, Hanson SM, McCandlish DM, Krainer AR, Kinney JB. Specificity, synergy, and mechanisms of splice-modifying drugs. Nat Commun 2024; 15:1880. [PMID: 38424098 PMCID: PMC10904865 DOI: 10.1038/s41467-024-46090-5] [Received: 02/22/2023] [Accepted: 02/10/2024]
Abstract
Drugs that target pre-mRNA splicing hold great therapeutic potential, but the quantitative understanding of how these drugs work is limited. Here we introduce mechanistically interpretable quantitative models for the sequence-specific and concentration-dependent behavior of splice-modifying drugs. Using massively parallel splicing assays, RNA-seq experiments, and precision dose-response curves, we obtain quantitative models for two small-molecule drugs, risdiplam and branaplam, developed for treating spinal muscular atrophy. The results quantitatively characterize the specificities of risdiplam and branaplam for 5' splice site sequences, suggest that branaplam recognizes 5' splice sites via two distinct interaction modes, and contradict the prevailing two-site hypothesis for risdiplam activity at SMN2 exon 7. The results also show that anomalous single-drug cooperativity, as well as multi-drug synergy, are widespread among small-molecule drugs and antisense-oligonucleotide drugs that promote exon inclusion. Our quantitative models thus clarify the mechanisms of existing treatments and provide a basis for the rational development of new therapies.
Affiliation(s)
- Yuma Ishigami
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724, USA
- Mandy S Wong
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724, USA
- Beam Therapeutics, Cambridge, MA, 02142, USA
- Andalus Ayaz
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724, USA
- Mahdi Kooshkbaghi
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724, USA
- The Estée Lauder Companies, New York, NY, 10153, USA
- Adrian R Krainer
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724, USA
- Justin B Kinney
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724, USA
5
Avizemer Z, Martí-Gómez C, Hoch SY, McCandlish DM, Fleishman SJ. Evolutionary paths that link orthogonal pairs of binding proteins. Res Sq 2023:rs.3.rs-2836905. [PMID: 37131620 PMCID: PMC10153392 DOI: 10.21203/rs.3.rs-2836905/v1]
Abstract
Some protein binding pairs exhibit extreme specificities that functionally insulate them from homologs. Such pairs evolve mostly by accumulating single-point mutations, and mutants are selected if their affinity exceeds the threshold required for function[1-4]. Thus, homologous and high-specificity binding pairs bring to light an evolutionary conundrum: how does a new specificity evolve while maintaining the required affinity in each intermediate[5,6]? Until now, a fully functional single-mutation path that connects two orthogonal pairs has only been described where the pairs were mutationally close, thus enabling experimental enumeration of all intermediates[2]. We present an atomistic and graph-theoretical framework for discovering low-molecular-strain single-mutation paths that connect two extant pairs, enabling enumeration beyond experimental capability. We apply it to two orthogonal bacterial colicin endonuclease-immunity pairs separated by 17 interface mutations[7]. We were not able to find a strain-free and functional path in the sequence space defined by the two extant pairs. However, including mutations that bridge amino acids that cannot be exchanged through single-nucleotide mutations led us to a strain-free 19-mutation trajectory that is completely viable in vivo. Our experiments show that the specificity switch is remarkably abrupt, resulting from only one radical mutation on each partner. Furthermore, each of the critical specificity-switch mutations increases fitness, demonstrating that functional divergence could be driven by positive Darwinian selection. These results reveal how even radical functional changes in an epistatic fitness landscape may evolve.
Affiliation(s)
- Ziv Avizemer
- Department of Biomolecular Sciences, Weizmann Institute of Science, 7610001, Rehovot, Israel
- Carlos Martí-Gómez
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
- Shlomo Yakir Hoch
- Department of Biomolecular Sciences, Weizmann Institute of Science, 7610001, Rehovot, Israel
- David M. McCandlish
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
- Sarel J. Fleishman
- Department of Biomolecular Sciences, Weizmann Institute of Science, 7610001, Rehovot, Israel
6
Aguirre L, Hendelman A, Hutton SF, McCandlish DM, Lippman ZB. Idiosyncratic and dose-dependent epistasis drives variation in tomato fruit size. Science 2023; 382:315-320. [PMID: 37856609 PMCID: PMC10602613 DOI: 10.1126/science.adi5222] [Received: 05/05/2023] [Accepted: 09/06/2023]
Abstract
Epistasis between genes is traditionally studied with mutations that eliminate protein activity, but most natural genetic variation is in cis-regulatory DNA and influences gene expression and function quantitatively. In this study, we used natural and engineered cis-regulatory alleles in a plant stem-cell circuit to systematically evaluate epistatic relationships controlling tomato fruit size. Combining a promoter allelic series with two other loci, we collected over 30,000 phenotypic data points from 46 genotypes to quantify how allele strength transforms epistasis. We revealed a saturating dose-dependent relationship but also allele-specific idiosyncratic interactions, including between alleles driving a step change in fruit size during domestication. Our approach and findings expose an underexplored dimension of epistasis, in which cis-regulatory allelic diversity within gene regulatory networks elicits nonlinear, unpredictable interactions that shape phenotypes.
Affiliation(s)
- Lyndsey Aguirre
- Cold Spring Harbor Laboratory, School of Biological Sciences, Cold Spring Harbor, NY, USA
- Anat Hendelman
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Samuel F. Hutton
- Gulf Coast Research and Education Center, University of Florida, Wimauma, FL, USA
- Zachary B. Lippman
- Cold Spring Harbor Laboratory, School of Biological Sciences, Cold Spring Harbor, NY, USA
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Howard Hughes Medical Institute, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
7
Gitschlag BL, Cano AV, Payne JL, McCandlish DM, Stoltzfus A. Mutation and Selection Induce Correlations between Selection Coefficients and Mutation Rates. Am Nat 2023; 202:534-557. [PMID: 37792926 DOI: 10.1086/726014]
Abstract
The joint distribution of selection coefficients and mutation rates is a key determinant of the genetic architecture of molecular adaptation. Three different distributions are of immediate interest: (1) the "nominal" distribution of possible changes, prior to mutation or selection; (2) the "de novo" distribution of realized mutations; and (3) the "fixed" distribution of selectively established mutations. Here, we formally characterize the relationships between these joint distributions under the strong-selection/weak-mutation (SSWM) regime. The de novo distribution is enriched relative to the nominal distribution for the highest rate mutations, and the fixed distribution is further enriched for the most highly beneficial mutations. Whereas mutation rates and selection coefficients are often assumed to be uncorrelated, we show that even with no correlation in the nominal distribution, the resulting de novo and fixed distributions can have correlations with any combination of signs. Nonetheless, we suggest that natural systems with a finite number of beneficial mutations will frequently have the kind of nominal distribution that induces negative correlations in the fixed distribution. We apply our mathematical framework, along with population simulations, to explore joint distributions of selection coefficients and mutation rates from deep mutational scanning and cancer informatics. Finally, we consider the evolutionary implications of these joint distributions together with two additional joint distributions relevant to parallelism and the rate of adaptation.
8
Cano AV, Gitschlag BL, Rozhoňová H, Stoltzfus A, McCandlish DM, Payne JL. Mutation bias and the predictability of evolution. Philos Trans R Soc Lond B Biol Sci 2023; 378:20220055. [PMID: 37004719 PMCID: PMC10067271 DOI: 10.1098/rstb.2022.0055]
Abstract
Predicting evolutionary outcomes is an important research goal in a diversity of contexts. The focus of evolutionary forecasting is usually on adaptive processes, and efforts to improve prediction typically focus on selection. However, adaptive processes often rely on new mutations, which can be strongly influenced by predictable biases in mutation. Here, we provide an overview of existing theory and evidence for such mutation-biased adaptation and consider the implications of these results for the problem of prediction, in regard to topics such as the evolution of infectious diseases, resistance to biochemical agents, as well as cancer and other kinds of somatic evolution. We argue that empirical knowledge of mutational biases is likely to improve in the near future, and that this knowledge is readily applicable to the challenges of short-term prediction. This article is part of the theme issue 'Interdisciplinary approaches to predicting evolutionary biology'.
Affiliation(s)
- Alejandro V Cano
- Institute of Integrative Biology, ETH Zurich, 8092 Zurich, Switzerland
- Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Bryan L Gitschlag
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
- Hana Rozhoňová
- Institute of Integrative Biology, ETH Zurich, 8092 Zurich, Switzerland
- Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Arlin Stoltzfus
- Office of Data and Informatics, Material Measurement Laboratory, National Institute of Standards and Technology, Rockville, MD 20899, USA
- Institute for Bioscience and Biotechnology Research, Rockville, MD 20850, USA
- David M McCandlish
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
- Joshua L Payne
- Institute of Integrative Biology, ETH Zurich, 8092 Zurich, Switzerland
- Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
9
Weinstein JY, Martí-Gómez C, Lipsh-Sokolik R, Hoch SY, Liebermann D, Nevo R, Weissman H, Petrovich-Kopitman E, Margulies D, Ivankov D, McCandlish DM, Fleishman SJ. Designed active-site library reveals thousands of functional GFP variants. Nat Commun 2023; 14:2890. [PMID: 37210560 DOI: 10.1038/s41467-023-38099-z] [Received: 11/02/2022] [Accepted: 04/13/2023]
Abstract
Mutations in a protein active site can lead to dramatic and useful changes in protein activity. The active site, however, is sensitive to mutations due to a high density of molecular interactions, substantially reducing the likelihood of obtaining functional multipoint mutants. We introduce an atomistic and machine-learning-based approach, called high-throughput Functional Libraries (htFuncLib), that designs a sequence space in which mutations form low-energy combinations that mitigate the risk of incompatible interactions. We apply htFuncLib to the GFP chromophore-binding pocket, and, using fluorescence readout, recover >16,000 unique designs encoding as many as eight active-site mutations. Many designs exhibit substantial and useful diversity in functional thermostability (up to 96 °C), fluorescence lifetime, and quantum yield. By eliminating incompatible active-site mutations, htFuncLib generates a large diversity of functional sequences. We envision that htFuncLib will be used in one-shot optimization of activity in enzymes, binders, and other proteins.
Affiliation(s)
- Carlos Martí-Gómez
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724, USA
- Rosalie Lipsh-Sokolik
- Department of Biomolecular Sciences, Weizmann Institute of Science, Rehovot, 7610001, Israel
- Shlomo Yakir Hoch
- Department of Biomolecular Sciences, Weizmann Institute of Science, Rehovot, 7610001, Israel
- Demian Liebermann
- Department of Chemical and Biological Physics, Weizmann Institute of Science, Rehovot, 7610001, Israel
- Reinat Nevo
- Department of Biomolecular Sciences, Weizmann Institute of Science, Rehovot, 7610001, Israel
- Haim Weissman
- Department of Molecular Chemistry and Materials Science, Weizmann Institute of Science, Rehovot, 7610001, Israel
- David Margulies
- Department of Chemical and Structural Biology, Weizmann Institute of Science, Rehovot, 7610001, Israel
- Dmitry Ivankov
- Center of Life Sciences, Skolkovo Institute of Science and Technology, Moscow, Russia
- David M McCandlish
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724, USA
- Sarel J Fleishman
- Department of Biomolecular Sciences, Weizmann Institute of Science, Rehovot, 7610001, Israel
10
Ishigami Y, Wong MS, Aldaravi CMG, Kooshkbaghi M, Ayaz A, McCandlish DM, Krainer AR, Kinney JB. Specificity, cooperativity, synergy, and mechanisms of splice-modifying drugs. Biophys J 2023; 122:271a. [PMID: 36783339 DOI: 10.1016/j.bpj.2022.11.1547]
Affiliation(s)
- Yuma Ishigami
- Cold Spring Harbor Laboratory, Laurel Hollow, NY, USA
- Carlos M G Aldaravi
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Laurel Hollow, NY, USA
- Mahdi Kooshkbaghi
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Laurel Hollow, NY, USA
- Andalus Ayaz
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Laurel Hollow, NY, USA
- David M McCandlish
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Laurel Hollow, NY, USA
- Justin B Kinney
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Laurel Hollow, NY, USA
11
Tareen A, Kooshkbaghi M, Posfai A, Ireland WT, McCandlish DM, Kinney JB. MAVE-NN: learning genotype-phenotype maps from multiplex assays of variant effect. Genome Biol 2022; 23:98. [PMID: 35428271 PMCID: PMC9011994 DOI: 10.1186/s13059-022-02661-7] [Received: 07/16/2021] [Accepted: 03/24/2022]
Abstract
Multiplex assays of variant effect (MAVEs) are a family of methods that includes deep mutational scanning experiments on proteins and massively parallel reporter assays on gene regulatory sequences. Despite their increasing popularity, a general strategy for inferring quantitative models of genotype-phenotype maps from MAVE data is lacking. Here we introduce MAVE-NN, a neural-network-based Python package that implements a broadly applicable information-theoretic framework for learning genotype-phenotype maps—including biophysically interpretable models—from MAVE datasets. We demonstrate MAVE-NN in multiple biological contexts, and highlight the ability of our approach to deconvolve mutational effects from otherwise confounding experimental nonlinearities and noise.
12
Inam H, Sokirniy I, Rao Y, Shah A, Naeemikia F, O'Brien E, Dong C, McCandlish DM, Pritchard JR. Genomic and experimental evidence that ALKATI does not predict single agent sensitivity to ALK inhibitors. iScience 2021; 24:103343. [PMID: 34825133 PMCID: PMC8603052 DOI: 10.1016/j.isci.2021.103343] [Received: 10/22/2020] [Revised: 06/17/2021] [Accepted: 10/22/2021]
Abstract
Genomic data can facilitate personalized treatment decisions by enabling therapeutic hypotheses in individual patients. Mutual exclusivity has been an empirically useful signal for identifying activating mutations that respond to single-agent targeted therapies. However, a low mutation frequency can underpower this signal for rare variants. We develop a resampling-based method for the direct pairwise comparison of conditional selection between sets of gene pairs. We apply this method to a transcript variant of anaplastic lymphoma kinase (ALK) in melanoma, termed ALKATI, which was suggested to predict sensitivity to ALK inhibitors, and we find that it is not mutually exclusive with key melanoma oncogenes. Furthermore, we find that ALKATI is not likely to be sufficient for cellular transformation or growth, and it does not predict single-agent therapeutic dependency. Our work strongly disfavors the role of ALKATI as a targetable oncogenic driver that might be sensitive to single-agent ALK treatment.
Affiliation(s)
- Haider Inam
- Department of Biomedical Engineering, 211 Wartik Lab, The Pennsylvania State University, University Park, PA 16802, USA
- Ivan Sokirniy
- The Huck Institute for the Life Sciences, The Pennsylvania State University, University Park, PA 16802, USA
- Yiyun Rao
- The Huck Institute for the Life Sciences, The Pennsylvania State University, University Park, PA 16802, USA
- Anushka Shah
- Department of Biology, The Pennsylvania State University, University Park, PA 16802, USA
- Farnaz Naeemikia
- Department of Biomedical Engineering, 211 Wartik Lab, The Pennsylvania State University, University Park, PA 16802, USA
- Edward O'Brien
- Department of Chemistry, The Pennsylvania State University, University Park, PA 16802, USA
- Cheng Dong
- Department of Biomedical Engineering, 211 Wartik Lab, The Pennsylvania State University, University Park, PA 16802, USA
- David M. McCandlish
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
- Justin R. Pritchard
- Department of Biomedical Engineering, 211 Wartik Lab, The Pennsylvania State University, University Park, PA 16802, USA
- The Huck Institute for the Life Sciences, The Pennsylvania State University, University Park, PA 16802, USA
13
McCandlish DM. System-specificity of genotype-phenotype map structure: Comment on "From genotypes to organisms: State-of-the-art and perspectives of a cornerstone in evolutionary dynamics" by Susanna Manrubia et al. Phys Life Rev 2021; 39:73-75. [PMID: 34538592 DOI: 10.1016/j.plrev.2021.08.005] [Received: 08/27/2021] [Accepted: 08/29/2021]
Affiliation(s)
- David M McCandlish
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY, USA
14
Affiliation(s)
- David M McCandlish
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724, USA
- Gregory I Lang
- Department of Biological Sciences, Lehigh University, Bethlehem, PA, 18015, USA
15
Storz JF, Natarajan C, Signore AV, Witt CC, McCandlish DM, Stoltzfus A. The role of mutation bias in adaptive molecular evolution: insights from convergent changes in protein function. Philos Trans R Soc Lond B Biol Sci 2019; 374:20180238. [PMID: 31154983 DOI: 10.1098/rstb.2018.0238]
Abstract
An underexplored question in evolutionary genetics concerns the extent to which mutational bias in the production of genetic variation influences outcomes and pathways of adaptive molecular evolution. In the genomes of at least some vertebrate taxa, an important form of mutation bias involves changes at CpG dinucleotides: if the DNA nucleotide cytosine (C) is immediately 5' to guanine (G) on the same coding strand, then, depending on methylation status, point mutations at both sites occur at an elevated rate relative to mutations at non-CpG sites. Here, we examine experimental data from case studies in which it has been possible to identify the causative substitutions that are responsible for adaptive changes in the functional properties of vertebrate haemoglobin (Hb). Specifically, we examine the molecular basis of convergent increases in Hb-O2 affinity in high-altitude birds. Using a dataset of experimentally verified, affinity-enhancing mutations in the Hbs of highland avian taxa, we tested whether causative changes are enriched for mutations at CpG dinucleotides relative to the frequency of CpG mutations among all possible missense mutations. The tests revealed that a disproportionate number of causative amino acid replacements were attributable to CpG mutations, suggesting that mutation bias can influence outcomes of molecular adaptation. This article is part of the theme issue 'Convergent evolution in the genomics era: new insights and directions'.
Collapse
Affiliation(s)
- Jay F Storz
- 1 School of Biological Sciences, University of Nebraska, Lincoln, NE 68588, USA
- Anthony V Signore
- 1 School of Biological Sciences, University of Nebraska, Lincoln, NE 68588, USA
- Christopher C Witt
- 2 Department of Biology, University of New Mexico, Albuquerque, NM 87131, USA; 3 Museum of Southwestern Biology, University of New Mexico, Albuquerque, NM 87131, USA
- Arlin Stoltzfus
- 5 Office of Data and Informatics, Material Measurement Laboratory, NIST, and Institute for Bioscience and Biotechnology Research, Rockville, MD 20850, USA
16
Abstract
Over the last decade, a rich variety of massively parallel assays have revolutionized our understanding of how biological sequences encode quantitative molecular phenotypes. These assays include deep mutational scanning, high-throughput SELEX, and massively parallel reporter assays. Here, we review these experimental methods and how the data they produce can be used to quantitatively model sequence-function relationships. In doing so, we touch on a diverse range of topics, including the identification of clinically relevant genomic variants, the modeling of transcription factor binding to DNA, the functional and evolutionary landscapes of proteins, and cis-regulatory mechanisms in both transcription and mRNA splicing. We further describe a unified conceptual framework and a core set of mathematical modeling strategies that studies in these diverse areas can make use of. Finally, we highlight key aspects of experimental design and mathematical modeling that are important for the results of such studies to be interpretable and reproducible.
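The simplest of the sequence-function modeling strategies referred to above is an additive model, where each position contributes independently to the phenotype. A minimal sketch, assuming a position-specific additive model over DNA (a common starting point in this literature, not necessarily the review's notation); the function name and parameter values are hypothetical:

```python
from typing import Dict, Sequence

def additive_phenotype(seq: str, theta0: float,
                       theta: Sequence[Dict[str, float]]) -> float:
    """Additive model: y = theta0 + sum_l theta[l][seq[l]].

    theta[l] maps each character at position l to its contribution.
    """
    if len(seq) != len(theta):
        raise ValueError("sequence length must match number of positions")
    return theta0 + sum(theta[l][c] for l, c in enumerate(seq))

# Hypothetical 3-position model: only 'A' at position 0 and 'T' at position 2
# change the phenotype.
theta = [
    {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0},
    {"A": 0.0, "C": 0.0, "G": 0.0, "T": 0.0},
    {"A": 0.0, "C": 0.0, "G": 0.0, "T": -0.5},
]
print(additive_phenotype("ACT", theta0=0.25, theta=theta))  # 0.75
```

Fitting theta to assay measurements (for example, by least squares) and then adding pairwise terms is the usual way to ask how far a dataset departs from additivity.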
Affiliation(s)
- Justin B Kinney
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
- David M McCandlish
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
17
Posfai A, Zhou J, Plotkin JB, Kinney JB, McCandlish DM. Selection for Protein Stability Enriches for Epistatic Interactions. Genes (Basel) 2018; 9:E423. [PMID: 30134605 PMCID: PMC6162820 DOI: 10.3390/genes9090423] [Received: 06/04/2018] [Revised: 07/30/2018] [Accepted: 08/14/2018]
Abstract
A now classical argument for the marginal thermodynamic stability of proteins explains the distribution of observed protein stabilities as a consequence of an entropic pull in protein sequence space. In particular, most sequences that are sufficiently stable to fold will have stabilities near the folding threshold. Here, we extend this argument to consider its predictions for epistatic interactions for the effects of mutations on the free energy of folding. Although there is abundant evidence to indicate that the effects of mutations on the free energy of folding are nearly additive and conserved over evolutionary time, we show that these observations are compatible with the hypothesis that a non-additive contribution to the folding free energy is essential for observed proteins to maintain their native structure. In particular, through both simulations and analytical results, we show that even very small departures from additivity are sufficient to drive this effect.
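The key step in the argument is conditioning: among random sequences, those that clear the folding threshold are enriched for whatever lowers the free energy, including a small non-additive term. A toy Monte Carlo sketch, assuming the aggregate additive and non-additive contributions are independent Gaussians and that lower free energy means more stable (this is not the paper's model, and all parameters are hypothetical):

```python
import random
from statistics import fmean

def mean_epistasis(n: int = 100_000, sd_add: float = 1.0, sd_epi: float = 0.2,
                   threshold: float = -2.0, seed: int = 1):
    """Compare the mean non-additive (epistatic) term over all random sequences
    with its mean over those stable enough to fold (dG < threshold).

    Toy assumption: dG = additive + epistatic, both drawn as Gaussians.
    """
    rng = random.Random(seed)
    all_epi, folded_epi = [], []
    for _ in range(n):
        add = rng.gauss(0.0, sd_add)   # aggregate additive contribution
        epi = rng.gauss(0.0, sd_epi)   # small aggregate non-additive contribution
        all_epi.append(epi)
        if add + epi < threshold:      # sequence folds in this toy model
            folded_epi.append(epi)
    return fmean(all_epi), fmean(folded_epi), len(folded_epi)

overall, folded, n_folded = mean_epistasis()
print(f"mean epistatic term: all={overall:.3f}, folded={folded:.3f} (n={n_folded})")
```

Even with a non-additive spread one fifth of the additive one, the folded subset has a clearly stabilizing (negative) mean epistatic term, in line with the abstract's point that small departures from additivity suffice.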
Affiliation(s)
- Anna Posfai
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA.
- Juannan Zhou
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA.
- Joshua B Plotkin
- Department of Biology, University of Pennsylvania, Philadelphia, PA 19104, USA.
- Justin B Kinney
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA.
- David M McCandlish
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA.
18
Abstract
While mutational biases strongly influence neutral molecular evolution, the role of mutational biases in shaping the course of adaptation is less clear. Here we consider the frequency of transitions relative to transversions among adaptive substitutions. Because mutation rates for transitions are higher than those for transversions, if mutational biases influence the dynamics of adaptation, then transitions should be overrepresented among documented adaptive substitutions. To test this hypothesis, we assembled two sets of data on putatively adaptive amino acid replacements that have occurred in parallel during evolution, either in nature or in the laboratory. We find that the frequency of transitions in these data sets is much higher than would be predicted under a null model where mutation has no effect. Our results are qualitatively similar even if we restrict ourselves to changes that have occurred not merely twice, but three or more times. These results suggest that the course of adaptation is biased by mutation.
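To make the null model concrete: if mutation had no effect, every single-nucleotide missense change would be equally likely, so the expected transition share is simply the fraction of transitions among the available missense changes. A sketch under that simplifying assumption (the paper's null model may weight codons or sites differently); the function names are hypothetical, and the table is the standard genetic code:

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons ordered TTT, TTC, TTA, ..., GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
PURINES = {"A", "G"}

def is_transition(a: str, b: str) -> bool:
    """Transitions are A<->G (purines) or C<->T (pyrimidines)."""
    return a != b and ((a in PURINES) == (b in PURINES))

def missense_transition_fraction(codon: str) -> float:
    """Share of transitions among all single-nucleotide missense changes of a codon.

    Synonymous changes and changes to a stop codon are excluded.
    """
    ts = total = 0
    for pos, new in product(range(3), BASES):
        if new == codon[pos]:
            continue
        mutant = codon[:pos] + new + codon[pos + 1:]
        if CODE[mutant] in ("*", CODE[codon]):
            continue
        total += 1
        ts += is_transition(codon[pos], new)
    return ts / total

print(missense_transition_fraction("TGG"))  # tryptophan: 1 transition among 7 missense changes
```

Averaging this fraction over the ancestral codons of the observed replacements gives one simple null expectation to compare against the observed transition share.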
Affiliation(s)
- Arlin Stoltzfus
- Genome-scale Measurements Group, Material Measurement Laboratory, NIST, and Institute for Bioscience and Biotechnology Research, Rockville, MD 20850, USA
- David M McCandlish
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
19
Bienvenu F, Akçay E, Legendre S, McCandlish DM. The genealogical decomposition of a matrix population model with applications to the aggregation of stages. Theor Popul Biol 2017; 115:69-80. [PMID: 28476403 DOI: 10.1016/j.tpb.2017.04.002] [Received: 08/04/2016] [Revised: 04/19/2017] [Accepted: 04/26/2017]
Abstract
Matrix projection models are a central tool in many areas of population biology. In most applications, one starts from the projection matrix to quantify the asymptotic growth rate of the population (the dominant eigenvalue), the stable stage distribution, and the reproductive values (the dominant right and left eigenvectors, respectively). Any primitive projection matrix also has an associated ergodic Markov chain that contains information about the genealogy of the population. In this paper, we show that these facts can be used to specify any matrix population model as a triple consisting of the ergodic Markov matrix, the dominant eigenvalue and one of the corresponding eigenvectors. This decomposition of the projection matrix separates properties associated with lineages from those associated with individuals. It also clarifies the relationships between many quantities commonly used to describe such models, including the relationship between eigenvalue sensitivities and elasticities. We illustrate the utility of such a decomposition by introducing a new method for aggregating classes in a matrix population model to produce a simpler model with a smaller number of classes. Unlike the standard method, our method has the advantage of preserving reproductive values and elasticities. It also has conceptually satisfying properties such as commuting with changes of units.
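The decomposition above can be made concrete for a small projection matrix. A sketch assuming one standard construction of the genealogical chain, P_ij = a_ij w_j / (lam w_i) (the probability that the parent of a class-i individual is in class j); the paper's notation and conventions may differ, and the function names and matrix are hypothetical. With these definitions the elasticities equal the stationary flows pi_i P_ij of that chain, and A is recovered from the triple (P, lam, w):

```python
from typing import List, Tuple

Matrix = List[List[float]]

def dominant_eig(A: Matrix, left: bool = False, iters: int = 2000) -> Tuple[float, List[float]]:
    """Power iteration for the dominant eigenvalue and positive eigenvector of a
    primitive non-negative matrix (right eigenvector, or left one if left=True).
    The eigenvector is normalized to sum to 1."""
    n = len(A)
    x = [1.0 / n] * n
    lam = 0.0
    for _ in range(iters):
        if left:
            y = [sum(x[i] * A[i][j] for i in range(n)) for j in range(n)]
        else:
            y = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        lam = sum(y)              # x sums to 1, so at convergence sum(y) = lam
        x = [v / lam for v in y]
    return lam, x

def genealogical_decomposition(A: Matrix):
    """Return (lam, w, v, P, E, pi) for a projection matrix A, where a_ij is the
    contribution of class j at time t to class i at time t+1."""
    n = len(A)
    lam, w = dominant_eig(A)              # growth rate, stable stage distribution
    _, v = dominant_eig(A, left=True)     # reproductive values (up to scale)
    vw = sum(v[i] * w[i] for i in range(n))
    # Backward (genealogical) chain; rows sum to 1 because A w = lam w.
    P = [[A[i][j] * w[j] / (lam * w[i]) for j in range(n)] for i in range(n)]
    # Eigenvalue elasticities e_ij = (a_ij / lam) * d(lam)/d(a_ij).
    E = [[A[i][j] * v[i] * w[j] / (lam * vw) for j in range(n)] for i in range(n)]
    pi = [v[i] * w[i] / vw for i in range(n)]   # stationary distribution of P
    return lam, w, v, P, E, pi

A = [[0.5, 2.0],
     [0.5, 0.0]]   # hypothetical two-class model
lam, w, v, P, E, pi = genealogical_decomposition(A)
# The triple (P, lam, w) recovers A: a_ij = lam * w_i * P_ij / w_j.
A_back = [[lam * w[i] * P[i][j] / w[j] for j in range(2)] for i in range(2)]
print(round(lam, 4))
```

Power iteration is used only to keep the sketch dependency-free; it assumes A is primitive, as the abstract does.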
Affiliation(s)
- François Bienvenu
- Institut de Biologie de l'Ecole Normale Supérieure (IBENS), CNRS, INSERM, Ecole Normale Supérieure, PSL Research University, F-75005 Paris, France; University of Pennsylvania Biology Department, Philadelphia, PA 19104, USA; Center for Interdisciplinary Research in Biology (CIRB), Collège de France, CNRS, INSERM, PSL Research University, F-75005 Paris, France.
- Erol Akçay
- University of Pennsylvania Biology Department, Philadelphia, PA 19104, USA
- Stéphane Legendre
- Institut de Biologie de l'Ecole Normale Supérieure (IBENS), CNRS, INSERM, Ecole Normale Supérieure, PSL Research University, F-75005 Paris, France
- David M McCandlish
- University of Pennsylvania Biology Department, Philadelphia, PA 19104, USA; Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA.
20
Newberry MG, McCandlish DM, Plotkin JB. Assortative mating can impede or facilitate fixation of underdominant alleles. Theor Popul Biol 2016; 112:14-21. [PMID: 27497738 DOI: 10.1016/j.tpb.2016.07.003] [Received: 05/02/2016] [Revised: 07/08/2016] [Accepted: 07/21/2016]
Abstract
Underdominant mutations have fixed between divergent species, yet classical models suggest that rare underdominant alleles are purged quickly except in small or subdivided populations. We predict that underdominant alleles that also influence mate choice, such as those affecting coloration patterns visible to mates and predators alike, can fix more readily. We analyze a mechanistic model of positive assortative mating in which individuals have n chances to sample compatible mates. This one-parameter model naturally spans random mating (n=1) and complete assortment (n→∞), yet it produces sexual selection whose strength depends non-monotonically on n. This sexual selection interacts with viability selection to either inhibit or facilitate fixation. As mating opportunities increase, underdominant alleles fix as frequently as neutral mutations, even though sexual selection and underdominance independently each suppress rare alleles. This mechanism allows underdominant alleles to fix in large populations and illustrates how life history can affect evolutionary change.
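Why one parameter n can span random mating and complete assortment is easiest to see from a sampling rule. A sketch assuming a hypothetical rule (a chooser samples partners one at a time, accepts the first same-type partner, and after n incompatible draws mates with the n-th partner anyway); the paper's mechanistic model may differ in detail, and the function name is illustrative:

```python
def prob_assortative(q: float, n: int) -> float:
    """Probability that a chooser whose compatible type has frequency q ends up
    with a same-type mate under the hypothetical n-draw rule described above."""
    if not 0.0 <= q <= 1.0 or n < 1:
        raise ValueError("need 0 <= q <= 1 and n >= 1")
    # The chooser mates with an incompatible partner only if all n draws fail.
    return 1.0 - (1.0 - q) ** n

for n in (1, 2, 5, 50):
    print(n, round(prob_assortative(0.1, n), 4))
```

With n = 1 the probability equals q (random mating); as n grows it tends to 1 (complete assortment). Rare types gain the most from extra draws, which is one route by which n enters the strength of sexual selection.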
Affiliation(s)
- Joshua B Plotkin
- University of Pennsylvania, Biology Department, Philadelphia, PA, USA.
21
McCandlish DM, Otwinowski J, Plotkin JB. Detecting epistasis from an ensemble of adapting populations. Evolution 2015; 69:2359-70. [PMID: 26194030 DOI: 10.1111/evo.12735] [Received: 10/09/2014] [Accepted: 07/07/2015]
Abstract
The role that epistasis plays during adaptation remains an outstanding problem, which has received considerable attention in recent years. Most of the recent empirical studies are based on ensembles of replicate populations that adapt in a fixed, laboratory controlled condition. Researchers often seek to infer the presence and form of epistasis in the fitness landscape from the time evolution of various statistics averaged across the ensemble of populations. Here, we provide a rigorous analysis of what quantities, drawn from time series of such ensembles, can be used to infer epistasis for populations evolving under weak mutation on finite-site fitness landscapes. First, we analyze the mean fitness trajectory, that is, the time course of the ensemble average fitness. We show that for any epistatic fitness landscape and starting genotype, there always exists a non-epistatic fitness landscape that produces the exact same mean fitness trajectory. Thus, the presence of epistasis is not identifiable from the mean fitness trajectory. By contrast, we show that two other ensemble statistics, the time evolution of the fitness variance across populations and the time evolution of the mean number of substitutions, can detect certain forms of epistasis in the underlying fitness landscape.
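Under weak mutation each replicate population is a random walk on genotypes, so the ensemble statistics above are moments of the distribution that walk induces. A minimal discrete-time sketch, assuming the transition matrix P is already given (building P from mutation rates and fixation probabilities is omitted); the function name and the example chain are hypothetical:

```python
from typing import List, Sequence, Tuple

def ensemble_fitness_moments(P: Sequence[Sequence[float]], fitness: Sequence[float],
                             start: int, steps: int) -> List[Tuple[float, float]]:
    """Mean and across-population variance of fitness at each step for an ensemble
    of replicates that all start at genotype `start` and follow the chain P
    (P[i][j] = per-step probability that a population at genotype i moves to j)."""
    n = len(P)
    p = [0.0] * n
    p[start] = 1.0
    out = []
    for _ in range(steps + 1):
        mean = sum(pi * f for pi, f in zip(p, fitness))
        var = sum(pi * (f - mean) ** 2 for pi, f in zip(p, fitness))
        out.append((mean, var))
        p = [sum(p[i] * P[i][j] for i in range(n)) for j in range(n)]   # p_{t+1} = p_t P
    return out

# Hypothetical 3-genotype chain 0 -> 1 -> 2 with increasing fitness.
P = [[0.9, 0.1, 0.0],
     [0.0, 0.9, 0.1],
     [0.0, 0.0, 1.0]]
fitness = [1.0, 1.1, 1.2]
traj = ensemble_fitness_moments(P, fitness, start=0, steps=50)
```

The first component of each pair is the mean fitness trajectory, which by the abstract's result cannot by itself reveal epistasis; the second is the variance across populations, one of the statistics that can.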
Affiliation(s)
- David M McCandlish
- Department of Biology, University of Pennsylvania, Philadelphia, Pennsylvania, 19104.
- Jakub Otwinowski
- Department of Biology, University of Pennsylvania, Philadelphia, Pennsylvania, 19104
- Joshua B Plotkin
- Department of Biology, University of Pennsylvania, Philadelphia, Pennsylvania, 19104
22
McCandlish DM, Epstein CL, Plotkin JB. Formal properties of the probability of fixation: identities, inequalities and approximations. Theor Popul Biol 2014; 99:98-113. [PMID: 25450112 DOI: 10.1016/j.tpb.2014.11.004] [Received: 04/10/2014] [Revised: 11/03/2014] [Accepted: 11/11/2014]
Abstract
The formula for the probability of fixation of a new mutation is widely used in theoretical population genetics and molecular evolution. Here we derive a series of identities, inequalities and approximations for the exact probability of fixation of a new mutation under the Moran process (equivalent results hold for the approximate probability of fixation under the Wright-Fisher process, after an appropriate change of variables). We show that the logarithm of the fixation probability has particularly simple behavior when the selection coefficient is measured as a difference of Malthusian fitnesses, and we exploit this simplicity to derive inequalities and approximations. We also present a comprehensive comparison of both existing and new approximations for the fixation probability, highlighting those approximations that induce a reversible Markov chain when used to describe the dynamics of evolution under weak mutation. To demonstrate the power of these results, we consider the classical problem of determining the total substitution rate across an ensemble of biallelic loci and prove that, at equilibrium, a strict majority of substitutions are due to drift rather than selection.
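The simplicity mentioned above can be seen from the exact Moran formula when s is a difference of Malthusian fitnesses (relative fitness e^s). One such identity is log pi(s) - log pi(-s) = (N - 1)s, a detailed-balance relation of the kind behind reversible weak-mutation chains. This sketch illustrates that identity under these assumptions and is not a summary of the paper's full set of results; the function name is hypothetical:

```python
from math import exp, expm1, isclose

def moran_fixation_probability(s: float, N: int) -> float:
    """Exact fixation probability of one new mutant in a Moran population of size N,
    with s the difference in Malthusian fitness:
        pi(s) = (1 - e^{-s}) / (1 - e^{-N s}),  pi(0) = 1/N."""
    if s == 0.0:
        return 1.0 / N
    if s < 0.0:
        # pi(s) = pi(-s) * e^{(N-1)s}; avoids overflow for strongly deleterious mutants.
        return moran_fixation_probability(-s, N) * exp((N - 1) * s)
    # expm1 keeps precision when s is small.
    return expm1(-s) / expm1(-N * s)

N, s = 100, 0.01
ratio = moran_fixation_probability(s, N) / moran_fixation_probability(-s, N)
print(isclose(ratio, exp((N - 1) * s)))  # True
```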
Affiliation(s)
- David M McCandlish
- Department of Biology, University of Pennsylvania, Philadelphia, PA, United States.
- Charles L Epstein
- Department of Mathematics, University of Pennsylvania, Philadelphia, PA, United States
- Joshua B Plotkin
- Department of Biology, University of Pennsylvania, Philadelphia, PA, United States
23
Abstract
Many models of evolution calculate the rate of evolution by multiplying the rate at which new mutations originate within a population by a probability of fixation. Here we review the historical origins, contemporary applications, and evolutionary implications of these "origin-fixation" models, which are widely used in evolutionary genetics, molecular evolution, and phylogenetics. Origin-fixation models were first introduced in 1969, in association with an emerging view of "molecular" evolution. Early origin-fixation models were used to calculate an instantaneous rate of evolution across a large number of independently evolving loci; in the 1980s and 1990s, a second wave of origin-fixation models emerged to address a sequence of fixation events at a single locus. Although origin-fixation models have been applied to a broad array of problems in contemporary evolutionary research, their rise in popularity has not been accompanied by an increased appreciation of their restrictive assumptions or their distinctive implications. We argue that origin-fixation models constitute a coherent theory of mutation-limited evolution that contrasts sharply with theories of evolution that rely on the presence of standing genetic variation. A major unsolved question in evolutionary biology is the degree to which these models provide an accurate approximation of evolution in natural populations.
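The product that defines an origin-fixation model (rate of origin times probability of fixation) is short to write down. A minimal sketch, assuming a diploid Wright-Fisher population whose effective size equals its census size N and Kimura's diffusion approximation for the fixation probability (one classical choice among those the review discusses); the function names are hypothetical:

```python
from math import exp, expm1

def kimura_fixation(s: float, N: int) -> float:
    """Kimura's diffusion approximation for a new mutant in a diploid population
    of size N with selection coefficient s:
        u(s) = (1 - e^{-2s}) / (1 - e^{-4Ns}),  u(0) = 1/(2N)."""
    if s == 0.0:
        return 1.0 / (2 * N)
    if s < 0.0:
        # u(s) = u(-s) * e^{(4N-2)s}; avoids overflow for deleterious mutants.
        return kimura_fixation(-s, N) * exp((4 * N - 2) * s)
    return expm1(-2 * s) / expm1(-4 * N * s)

def origin_fixation_rate(mu: float, s: float, N: int) -> float:
    """Substitution rate = (new mutations per generation, 2*N*mu) x (fixation probability)."""
    return 2 * N * mu * kimura_fixation(s, N)

mu, N = 1e-8, 10_000
for s in (0.0, 1e-4, -1e-4):
    print(s, origin_fixation_rate(mu, s, N) / mu)   # rate relative to the neutral rate mu
```

In the neutral case the population size cancels and the rate reduces to mu, the familiar result that the neutral substitution rate equals the mutation rate.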
24
McCandlish DM, Epstein CL, Plotkin JB. The inevitability of unconditionally deleterious substitutions during adaptation. Evolution 2014; 68:1351-64. [DOI: 10.1111/evo.12350] [Received: 09/27/2013] [Accepted: 12/17/2013]
Affiliation(s)
- David M McCandlish
- Department of Biology, University of Pennsylvania, Philadelphia, Pennsylvania 19104
- Charles L Epstein
- Department of Mathematics, University of Pennsylvania, Philadelphia, Pennsylvania 19104
- Joshua B Plotkin
- Department of Biology, University of Pennsylvania, Philadelphia, Pennsylvania 19104
25
McCandlish DM, Rajon E, Shah P, Ding Y, Plotkin JB. The role of epistasis in protein evolution. Nature 2013; 497:E1-2; discussion E2-3. [PMID: 23719465 DOI: 10.1038/nature12219] [Received: 12/20/2012] [Accepted: 04/18/2013]
Affiliation(s)
- David M McCandlish
- Department of Biology, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
26
Abstract
Can we define a measure that describes how easy or difficult it is for a population to evolve to a specific genotype? For populations evolving under weak mutation on a time-invariant fitness landscape, I argue that one appropriate measure is the expected waiting time, starting from equilibrium, for a population to become fixed for a given genotype. Under this definition of the "findability" of genotypes, I show that for any pair of genotypes (1) a population at equilibrium is always more likely to become fixed for the more findable genotype before the less findable one and (2) the expected time to evolve from the more findable to the less findable genotype is always greater than the expected time to evolve in the opposite direction. Although increasing the fitness of a genotype always increases its findability, in general there is no simple relationship between the rank ordering of genotypes by fitness and the rank ordering of genotypes by findability. I also present a method for quantifying the relative contributions of mutation, selection, substitution rate, and probability of reversion to a genotype's findability.
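Findability as defined above is an expected hitting time averaged over the stationary distribution of the weak-mutation Markov chain. A sketch assuming a discrete-time chain given as a transition matrix P (constructing P from mutation and fixation probabilities is left out, and the paper may use a continuous-time formulation); function names and the example chain are hypothetical:

```python
from typing import List

def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small dense system A x = b."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def stationary(P: List[List[float]]) -> List[float]:
    """Solve pi P = pi with sum(pi) = 1 (last balance equation replaced by normalization)."""
    n = len(P)
    A = [[P[j][i] - (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    A[-1] = [1.0] * n
    return solve(A, [0.0] * (n - 1) + [1.0])

def hitting_times(P: List[List[float]], target: int) -> List[float]:
    """Expected steps to first reach `target` from each state: h_i = 1 + sum_k P_ik h_k."""
    others = [i for i in range(len(P)) if i != target]
    A = [[(1.0 if i == k else 0.0) - P[i][k] for k in others] for i in others]
    h = solve(A, [1.0] * len(others))
    times = [0.0] * len(P)
    for i, t in zip(others, h):
        times[i] = t
    return times

def findability(P: List[List[float]], target: int) -> float:
    """Expected waiting time from equilibrium to reach `target` (smaller = more findable)."""
    return sum(p * t for p, t in zip(stationary(P), hitting_times(P, target)))

P = [[0.5, 0.5],
     [0.25, 0.75]]   # hypothetical two-genotype chain
print(findability(P, 1), findability(P, 0))
```

In this toy chain genotype 1 is more findable (2/3 versus 8/3 steps), and going from it to genotype 0 takes 4 steps on average versus 2 in the other direction, consistent with statement (2) above.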
Affiliation(s)
- David M McCandlish
- Biology Department, Duke University, Box 90338, Durham, North Carolina, 27708; Current Address: Lynch Labs, Room 204K, Department of Biology, University of Pennsylvania, Philadelphia, Pennsylvania, 19104.
27
Abstract
Fitness landscapes are a classical concept for thinking about the relationship between genotype and fitness. However, because the space of genotypes is typically high-dimensional, the structure of fitness landscapes can be difficult to understand and the heuristic approach of thinking about fitness landscapes as low-dimensional, continuous surfaces may be misleading. Here, I present a rigorous method for creating low-dimensional representations of fitness landscapes. The basic idea is to plot the genotypes in a manner that reflects the ease or difficulty of evolving from one genotype to another. Such a layout can be constructed using the eigenvectors of the transition matrix describing the evolution of a population on the fitness landscape when mutation is weak. In addition, the eigendecomposition of this transition matrix provides a new, high-level view of evolution on a fitness landscape. I demonstrate these techniques by visualizing the fitness landscape for selection for the amino acid serine and by visualizing a neutral network derived from the RNA secondary structure genotype-phenotype map.
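The layout described above places genotypes using eigenvectors of the weak-mutation transition matrix. A simplified sketch of the first such coordinate, assuming a reversible discrete-time chain P with known stationary distribution pi. It uses power iteration on the symmetrized lazy chain (I + S)/2 with the top eigenvector removed, and omits the eigenvalue-dependent scaling and additional coordinates the paper's layout uses. Function names and the example chain are hypothetical:

```python
from math import sqrt
from typing import List, Tuple

def second_eigenvector(P: List[List[float]], pi: List[float],
                       iters: int = 5000) -> Tuple[float, List[float]]:
    """Subdominant eigenvalue and right eigenvector of a reversible transition
    matrix P with stationary distribution pi."""
    n = len(P)
    r = [sqrt(p) for p in pi]   # unit-norm top eigenvector of S
    # S = D^{1/2} P D^{-1/2} is symmetric for reversible P; (I + S)/2 has
    # non-negative eigenvalues, so power iteration finds the largest remaining one.
    lazy = [[0.5 * ((1.0 if i == j else 0.0) + r[i] * P[i][j] / r[j])
             for j in range(n)] for i in range(n)]
    x = [float(i + 1) for i in range(n)]   # arbitrary starting vector
    for _ in range(iters):
        dot = sum(a * b for a, b in zip(r, x))
        x = [xi - dot * ri for xi, ri in zip(x, r)]   # project out sqrt(pi)
        x = [sum(lazy[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = sqrt(sum(v * v for v in x))
        x = [v / norm for v in x]
    mu = sum(x[i] * sum(lazy[i][j] * x[j] for j in range(n)) for i in range(n))
    # Undo the lazy shift and the symmetrization to get P's right eigenvector.
    return 2.0 * mu - 1.0, [x[i] / r[i] for i in range(n)]

P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]   # hypothetical reversible chain on 3 genotypes
pi = [0.25, 0.5, 0.25]
lam2, coord = second_eigenvector(P, pi)
print(round(lam2, 6), [round(c, 6) for c in coord])
```

Genotypes that are hard to evolve between receive distant values of coord. Here the two end states land on opposite sides, and the middle state sits at 0.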
Affiliation(s)
- David M McCandlish
- Department of Biology, Duke University, Box 90338, Durham, North Carolina 27708, USA.
28
Tolstorukov MY, Colasanti AV, McCandlish DM, Olson WK, Zhurkin VB. A novel roll-and-slide mechanism of DNA folding in chromatin: implications for nucleosome positioning. J Mol Biol 2007; 371:725-38. [PMID: 17585938 PMCID: PMC2000845 DOI: 10.1016/j.jmb.2007.05.048] [Received: 02/02/2007] [Revised: 05/17/2007] [Accepted: 05/18/2007]
Abstract
How eukaryotic genomes encode the folding of DNA into nucleosomes and how this intrinsic organization of chromatin guides biological function are questions of wide interest. The physical basis of nucleosome positioning lies in the sequence-dependent propensity of DNA to adopt the tightly bent configuration imposed by the binding of the histone proteins. Traditionally, only DNA bending and twisting deformations are considered, while the effects of the lateral displacements of adjacent base pairs are neglected. We demonstrate, however, that these displacements have a much more important structural role than ever imagined. Specifically, the lateral Slide deformations observed at sites of local anisotropic bending of DNA define its superhelical trajectory in chromatin. Furthermore, the computed cost of deforming DNA on the nucleosome is sequence-specific: in optimally positioned sequences the most easily deformed base-pair steps (CA:TG and TA) occur at sites of large positive Slide and negative Roll (where the DNA bends into the minor groove). These conclusions rest upon a treatment of DNA that goes beyond the conventional ribbon model, incorporating all essential degrees of freedom of "real" duplexes in the estimation of DNA deformation energies. Indeed, only after lateral Slide displacements are considered are we able to account for the sequence-specific folding of DNA found in nucleosome structures. The close correspondence between the predicted and observed nucleosome locations demonstrates the potential advantage of our "structural" approach in the computer mapping of nucleosome positioning.
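The deformation cost described above comes from treating each base-pair step as an elastic element over all six step parameters (including Slide), rather than only bending and twisting. A minimal sketch, assuming a harmonic energy per step with sequence-dependent rest values and stiffness. All numbers below are placeholders, not the paper's knowledge-based parameters, and units are left unspecified; function names are hypothetical:

```python
from typing import Dict, Sequence

PARAMS = ("Shift", "Slide", "Rise", "Tilt", "Roll", "Twist")   # order of the six step parameters

def step_energy(observed: Sequence[float], rest: Sequence[float],
                stiffness: Sequence[Sequence[float]]) -> float:
    """E = 1/2 * sum_ij F_ij * d_i * d_j with d = observed - rest, over the six parameters."""
    d = [o - r for o, r in zip(observed, rest)]
    return 0.5 * sum(stiffness[i][j] * d[i] * d[j] for i in range(6) for j in range(6))

def total_energy(seq: str, path: Sequence[Sequence[float]],
                 rest: Dict[str, Sequence[float]],
                 stiffness: Dict[str, Sequence[Sequence[float]]]) -> float:
    """Sum step energies along seq; path[k] holds the six parameters imposed on
    step k (e.g. by a nucleosome-bound trajectory). Rest values and stiffness
    are looked up by dinucleotide, which is where sequence dependence enters."""
    steps = [seq[k:k + 2] for k in range(len(seq) - 1)]
    return sum(step_energy(path[k], rest[s], stiffness[s]) for k, s in enumerate(steps))

I6 = [[1.0 if i == j else 0.0 for j in range(6)] for i in range(6)]   # placeholder stiffness
rest = {s: [0.0, 0.0, 3.4, 0.0, 0.0, 34.0] for s in ("CA", "AT")}     # placeholder rest values
stiff = {s: I6 for s in rest}
bent = [0.0, 1.0, 3.4, 0.0, -2.0, 34.0]   # positive Slide, negative Roll
print(total_energy("CAT", [bent, bent], rest, stiff))  # 5.0
```

With realistic, sequence-specific stiffness and rest values, sliding the same path along a sequence and recording the total energy gives a simple positioning profile of the kind the abstract compares against observed nucleosome locations.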