1
Jonas F, Navon Y, Barkai N. Intrinsically disordered regions as facilitators of the transcription factor target search. Nat Rev Genet 2025; 26:424-435. [PMID: 39984675 DOI: 10.1038/s41576-025-00816-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 01/14/2025] [Indexed: 02/23/2025]
Abstract
Transcription factors (TFs) contribute to organismal development and function by regulating gene expression. Despite decades of research, the factors determining the specificity and speed at which eukaryotic TFs detect their target binding sites remain poorly understood. Recent studies have pointed to intrinsically disordered regions (IDRs) within TFs as key regulators of the process by which TFs find their target sites on DNA (the TF target search). However, IDRs are challenging to study because they can confer specificity despite low sequence complexity and can be functionally conserved despite rapid sequence divergence. Nevertheless, emerging computational and experimental approaches are beginning to elucidate the sequence-function relationship within the IDRs of TFs. Additional insights are informing potential mechanisms underlying the IDR-directed search for the DNA targets of TFs, including incorporation into biomolecular condensates, facilitating TF co-localization, and the hypothesis that IDRs recognize and directly interact with specific genomic regions.
Affiliation(s)
- Felix Jonas
- School of Science, Constructor University, Bremen, Germany.
- Yoav Navon
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, Israel
- Naama Barkai
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, Israel.
2
Lobel JH, Ingolia NT. Deciphering disordered regions controlling mRNA decay in high-throughput. Nature 2025:10.1038/s41586-025-08919-x. [PMID: 40269159 DOI: 10.1038/s41586-025-08919-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/08/2024] [Accepted: 03/19/2025] [Indexed: 04/25/2025]
Abstract
Intrinsically disordered regions within proteins drive specific molecular functions despite lacking a defined structure [1,2]. Although disordered regions are integral to controlling mRNA stability and translation, the mechanisms underlying these regulatory effects remain unclear [3]. Here we reveal the molecular determinants of this activity using high-throughput functional profiling. Systematic mutagenesis across hundreds of regulatory disordered elements, combined with machine learning, reveals a complex pattern of molecular features important for their activity. The presence and arrangement of aromatic residues strongly predicts the ability of seemingly diverse protein sequences to influence mRNA stability and translation. We further show how many of these regulatory elements exert their effects by engaging core mRNA decay machinery. Our results define molecular features and biochemical pathways that explain how disordered regions control mRNA expression and shed light on broader principles within functional, unstructured proteins.
Affiliation(s)
- Joseph H Lobel
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, USA
- Nicholas T Ingolia
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, USA.
- Center for Computational Biology and California Institute for Quantitative Biosciences, University of California, Berkeley, Berkeley, CA, USA.
3
Halpin JC, Keating AE. PairK: Pairwise k-mer alignment for quantifying protein motif conservation in disordered regions. Protein Sci 2025; 34:e70004. [PMID: 39720898 DOI: 10.1002/pro.70004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2024] [Revised: 11/19/2024] [Accepted: 12/05/2024] [Indexed: 12/26/2024]
Abstract
Protein-protein interactions are often mediated by a modular peptide recognition domain binding to a short linear motif (SLiM) in the disordered region of another protein. To understand the features of SLiMs that are important for binding and to identify motif instances that are important for biological function, it is useful to examine the evolutionary conservation of motifs across homologous proteins. However, the intrinsically disordered regions (IDRs) in which SLiMs reside evolve rapidly. Consequently, multiple sequence alignment (MSA) of IDRs often misaligns SLiMs and underestimates their conservation. We present PairK (pairwise k-mer alignment), an MSA-free method to align and quantify the relative local conservation of subsequences within an IDR. Lacking a ground truth for conservation, we tested PairK on the task of distinguishing biologically important motif instances from background motifs, under the assumption that biologically important motifs are more conserved. The method outperforms both standard MSA-based conservation scores and a modern LLM-based conservation score predictor. PairK can quantify conservation over wider phylogenetic distances than MSAs, indicating that some SLiMs are more conserved than MSA-based metrics imply. PairK is available as an open-source python package at https://github.com/jacksonh1/pairk. It is designed to be easily adapted for use with other SLiM tools and for diverse applications.
Affiliation(s)
- Amy E Keating
- Department of Biology, MIT, Cambridge, Massachusetts, USA
- Department of Biological Engineering, MIT, Cambridge, Massachusetts, USA
- Koch Institute for Integrative Cancer Research, Cambridge, Massachusetts, USA
4
Chow CFW, Ghosh S, Hadarovich A, Toth-Petroczy A. SHARK enables sensitive detection of evolutionary homologs and functional analogs in unalignable and disordered sequences. Proc Natl Acad Sci U S A 2024; 121:e2401622121. [PMID: 39383002 PMCID: PMC11494347 DOI: 10.1073/pnas.2401622121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2024] [Accepted: 08/30/2024] [Indexed: 10/11/2024] Open
Abstract
Intrinsically disordered regions (IDRs) are structurally flexible protein segments with regulatory functions in multiple contexts, such as in the assembly of biomolecular condensates. Since IDRs undergo more rapid evolution than ordered regions, identifying homology of such poorly conserved regions remains challenging for state-of-the-art alignment-based methods that rely on position-specific conservation of residues. Thus, systematic functional annotation and evolutionary analysis of IDRs have been limited, despite them comprising ~21% of proteins. To accurately assess homology between unalignable sequences, we developed an alignment-free sequence comparison algorithm, SHARK (Similarity/Homology Assessment by Relating K-mers). We trained SHARK-dive, a machine learning homology classifier, which achieved superior performance to standard alignment-based approaches in assessing evolutionary homology in unalignable sequences. Furthermore, it correctly identified dissimilar but functionally analogous IDRs in IDR-replacement experiments reported in the literature, whereas alignment-based tools were incapable of detecting such functional relationships. SHARK-dive not only predicts functionally similar IDRs at a proteome-wide scale but also identifies cryptic sequence properties and motifs that drive remote homology and analogy, thereby providing interpretable and experimentally verifiable hypotheses of the sequence determinants that underlie such relationships. SHARK-dive acts as an alternative to alignment to facilitate systematic analysis and functional annotation of the unalignable protein universe.
Affiliation(s)
- Chi Fung Willis Chow
- Max Planck Institute of Molecular Cell Biology and Genetics, Dresden 01307, Germany
- Center for Systems Biology Dresden, Dresden 01307, Germany
- Cluster of Excellence Physics of Life, Technische Universität Dresden, Dresden 01062, Germany
- Soumyadeep Ghosh
- Max Planck Institute of Molecular Cell Biology and Genetics, Dresden 01307, Germany
- Center for Systems Biology Dresden, Dresden 01307, Germany
- Anna Hadarovich
- Max Planck Institute of Molecular Cell Biology and Genetics, Dresden 01307, Germany
- Center for Systems Biology Dresden, Dresden 01307, Germany
- Agnes Toth-Petroczy
- Max Planck Institute of Molecular Cell Biology and Genetics, Dresden 01307, Germany
- Center for Systems Biology Dresden, Dresden 01307, Germany
- Cluster of Excellence Physics of Life, Technische Universität Dresden, Dresden 01062, Germany
5
Halpin JC, Keating AE. PairK: Pairwise k-mer alignment for quantifying protein motif conservation in disordered regions. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.07.23.604860. [PMID: 39091826 PMCID: PMC11291154 DOI: 10.1101/2024.07.23.604860] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/04/2024]
Abstract
Protein-protein interactions are often mediated by a modular peptide recognition domain binding to a short linear motif (SLiM) in the disordered region of another protein. The ability to predict domain-SLiM interactions would allow researchers to map protein interaction networks, predict the effects of perturbations to those networks, and develop biologically meaningful hypotheses. Unfortunately, sequence database searches for SLiMs generally yield mostly biologically irrelevant motif matches or false positives. To improve the prediction of novel SLiM interactions, researchers employ filters to discriminate between biologically relevant and improbable motif matches. One promising criterion for identifying biologically relevant SLiMs is the sequence conservation of the motif, exploiting the fact that functional motifs are more likely to be conserved than spurious motif matches. However, the difficulty of aligning disordered regions has significantly hampered the utility of this approach. We present PairK (pairwise k-mer alignment), an MSA-free method to quantify motif conservation in disordered regions. PairK outperforms both standard MSA-based conservation scores and a modern LLM-based conservation score predictor on the task of identifying biologically important motif instances. PairK can quantify conservation over wider phylogenetic distances than MSAs, indicating that SLiMs may be more conserved than is implied by MSA-based metrics. PairK is available as open-source code at https://github.com/jacksonh1/pairk.
Affiliation(s)
- Jackson C. Halpin
- MIT Department of Biology, 77 Massachusetts Ave., Cambridge, MA 02139
- Amy E. Keating
- MIT Department of Biology, 77 Massachusetts Ave., Cambridge, MA 02139
- MIT Department of Biological Engineering, 77 Massachusetts Ave., Cambridge, MA 02139
- Koch Institute for Integrative Cancer Research, 77 Massachusetts Ave., Cambridge, MA 02139
6
Zarin T, Lehner B. A complete map of specificity encoding for a partially fuzzy protein interaction. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.04.25.591103. [PMID: 38712134 PMCID: PMC11071492 DOI: 10.1101/2024.04.25.591103] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/08/2024]
Abstract
Thousands of human proteins function by binding short linear motifs embedded in intrinsically disordered regions. How affinity and specificity are encoded in these binding domains and the motifs themselves is not well understood. The evolvability of binding specificity - how rapidly and extensively it can change upon mutation - is also largely unexplored, as is the contribution of 'fuzzy' dynamic residues to affinity and specificity in protein-protein interactions. Here we report the first complete map of specificity encoding for a globular protein domain. Quantifying >200,000 energetic interactions between a PDZ domain and its ligand identifies 20 major energetically coupled pairs of sites that control specificity. These are organized into six modules, with most mutations in each module reprogramming specificity for a single position in the ligand. Nine of the major energetic couplings controlling specificity are between structural contacts and 11 have an allosteric mechanism of action. The dynamic tail of the ligand is more robust to mutation than the structured residues but contributes additively to binding affinity and communicates with structured residues to enable changes in specificity. Our results quantify the binding specificities of >1,800 globular proteins to reveal how specificity is encoded and provide a direct comparison of the encoding of affinity and specificity in structured and dynamic molecular recognition.
Affiliation(s)
- Taraneh Zarin
- Centre for Genomic Regulation (CRG), Barcelona Institute for Science and Technology (BIST), Barcelona, Spain
- Ben Lehner
- Centre for Genomic Regulation (CRG), Barcelona Institute for Science and Technology (BIST), Barcelona, Spain
- Wellcome Sanger Institute, Cambridge, UK
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
7
Idrees S, Paudel KR. Proteome-wide assessment of human interactome as a source of capturing domain-motif and domain-domain interactions. J Cell Commun Signal 2024; 18:e12014. [PMID: 38545252 PMCID: PMC10964934 DOI: 10.1002/ccs3.12014] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 11/19/2023] [Accepted: 12/11/2023] [Indexed: 06/29/2024] Open
Abstract
Protein-protein interactions (PPIs) play a crucial role in various biological processes by establishing domain-motif (DMI) and domain-domain interactions (DDIs). While the existence of real DMIs/DDIs is generally assumed, it is rarely tested; therefore, this study extensively compared high-throughput methods and public PPI repositories as sources for DMI and DDI prediction based on the assumption that the human interactome provides sufficient data for the reliable identification of DMIs and DDIs. Different datasets from leading high-throughput methods (Yeast two-hybrid [Y2H], Affinity Purification coupled Mass Spectrometry [AP-MS], and Co-fractionation-coupled Mass Spectrometry) were assessed for their ability to capture DMIs and DDIs using known DMI/DDI information. High-throughput methods were not notably worse than PPI databases and, in some cases, appeared better. In conclusion, all PPI datasets demonstrated significant enrichment in DMIs and DDIs (p-value <0.001), establishing Y2H and AP-MS as reliable methods for predicting these interactions. This study provides valuable insights for biologists in selecting appropriate methods for predicting DMIs, ultimately aiding in SLiM discovery.
Affiliation(s)
- Sobia Idrees
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, New South Wales, Australia
- Centre for Inflammation, Centenary Institute and the University of Technology Sydney, School of Life Sciences, Faculty of Science, Sydney, New South Wales, Australia
- Keshav Raj Paudel
- Centre for Inflammation, Centenary Institute and the University of Technology Sydney, School of Life Sciences, Faculty of Science, Sydney, New South Wales, Australia
8
Lotthammer JM, Ginell GM, Griffith D, Emenecker RJ, Holehouse AS. Direct prediction of intrinsically disordered protein conformational properties from sequence. Nat Methods 2024; 21:465-476. [PMID: 38297184 PMCID: PMC10927563 DOI: 10.1038/s41592-023-02159-5] [Citation(s) in RCA: 66] [Impact Index Per Article: 66.0] [Received: 05/28/2023] [Accepted: 12/20/2023] [Indexed: 02/02/2024]
Abstract
Intrinsically disordered regions (IDRs) are ubiquitous across all domains of life and play a range of functional roles. While folded domains are generally well described by a stable three-dimensional structure, IDRs exist in a collection of interconverting states known as an ensemble. This structural heterogeneity means that IDRs are largely absent from the Protein Data Bank, contributing to a lack of computational approaches to predict ensemble conformational properties from sequence. Here we combine rational sequence design, large-scale molecular simulations and deep learning to develop ALBATROSS, a deep-learning model for predicting ensemble dimensions of IDRs, including the radius of gyration, end-to-end distance, polymer-scaling exponent and ensemble asphericity, directly from sequences at a proteome-wide scale. ALBATROSS is lightweight, easy to use and accessible as both a locally installable software package and a point-and-click-style interface via Google Colab notebooks. We first demonstrate the applicability of our predictors by examining the generalizability of sequence-ensemble relationships in IDRs. Then, we leverage the high-throughput nature of ALBATROSS to characterize the sequence-specific biophysical behavior of IDRs within and between proteomes.
Affiliation(s)
- Jeffrey M Lotthammer
- Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, St. Louis, MO, USA
- Center for Biomolecular Condensates, Washington University in St. Louis, St. Louis, MO, USA
- Garrett M Ginell
- Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, St. Louis, MO, USA
- Center for Biomolecular Condensates, Washington University in St. Louis, St. Louis, MO, USA
- Daniel Griffith
- Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, St. Louis, MO, USA
- Center for Biomolecular Condensates, Washington University in St. Louis, St. Louis, MO, USA
- Ryan J Emenecker
- Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, St. Louis, MO, USA
- Center for Biomolecular Condensates, Washington University in St. Louis, St. Louis, MO, USA
- Alex S Holehouse
- Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, St. Louis, MO, USA.
- Center for Biomolecular Condensates, Washington University in St. Louis, St. Louis, MO, USA.
9
Daybog I, Kolodny O. A computational framework for resolving the microbiome diversity conundrum. Nat Commun 2023; 14:7977. [PMID: 38042865 PMCID: PMC10693575 DOI: 10.1038/s41467-023-42768-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/24/2022] [Accepted: 10/20/2023] [Indexed: 12/04/2023] Open
Abstract
Recent empirical studies offer conflicting findings regarding the relation between host fitness and the composition of its microbiome, a conflict which we term 'the microbial β-diversity conundrum'. The microbiome is crucial for host wellbeing and survival. Surprisingly, different healthy individuals' microbiome compositions, even in the same population, often differ dramatically, contrary to the notion that a vital trait should be highly conserved. Moreover, gnotobiotic individuals exhibit highly deleterious phenotypes, supporting the view that the microbiome is paramount to host fitness. However, the introduction of almost arbitrarily selected microbiota into the system often achieves a significant rescue effect of the deleterious phenotypes. This is true even for microbiota from soil or phylogenetically distant host species, highlighting an apparent paradox. We suggest several solutions to the paradox using a computational framework, simulating the population dynamics of hosts and their microbiomes over multiple generations. The answers invoke factors such as host population size, the specific mode of microbial contribution to host fitness, and typical microbiome richness, offering solutions to the conundrum by highlighting scenarios where even when a host's fitness is determined in full by its microbiome composition, this composition has little effect on the natural selection dynamics of the population.
Affiliation(s)
- Itay Daybog
- Department of Ecology, Evolution and Behavior, The A. Silberman Institute of Life Sciences, The Hebrew University of Jerusalem, Jerusalem, 9190401, Israel.
- Oren Kolodny
- Department of Ecology, Evolution and Behavior, The A. Silberman Institute of Life Sciences, The Hebrew University of Jerusalem, Jerusalem, 9190401, Israel.
10
Gonzalez JP, Frandsen KEH, Kesten C. The role of intrinsic disorder in binding of plant microtubule-associated proteins to the cytoskeleton. Cytoskeleton (Hoboken) 2023; 80:404-436. [PMID: 37578201 DOI: 10.1002/cm.21773] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/02/2023] [Revised: 07/28/2023] [Accepted: 07/30/2023] [Indexed: 08/15/2023]
Abstract
Microtubules (MTs) represent one of the main components of the eukaryotic cytoskeleton and support numerous critical cellular functions. MTs are in principle tube-like structures that can grow and shrink in a highly dynamic manner; a process largely controlled by microtubule-associated proteins (MAPs). Plant MAPs are a phylogenetically diverse group of proteins that nonetheless share many common biophysical characteristics and often contain large stretches of intrinsic protein disorder. These intrinsically disordered regions are determinants of many MAP-MT interactions, in which structural flexibility enables low-affinity protein-protein interactions that enable a fine-tuned regulation of MT cytoskeleton dynamics. Notably, intrinsic disorder is one of the major obstacles in functional and structural studies of MAPs and represents the principal present-day challenge to decipher how MAPs interact with MTs. Here, we review plant MAPs from an intrinsic protein disorder perspective, by providing a complete and up-to-date summary of all currently known members, and address the current and future challenges in functional and structural characterization of MAPs.
Affiliation(s)
- Jordy Perez Gonzalez
- Department for Plant and Environmental Sciences, University of Copenhagen, Frederiksberg C, Denmark
- Kristian E H Frandsen
- Department for Plant and Environmental Sciences, University of Copenhagen, Frederiksberg C, Denmark
- Christopher Kesten
- Department for Plant and Environmental Sciences, University of Copenhagen, Frederiksberg C, Denmark
11
Alderson TR, Pritišanac I, Kolarić Đ, Moses AM, Forman-Kay JD. Systematic identification of conditionally folded intrinsically disordered regions by AlphaFold2. Proc Natl Acad Sci U S A 2023; 120:e2304302120. [PMID: 37878721 PMCID: PMC10622901 DOI: 10.1073/pnas.2304302120] [Citation(s) in RCA: 65] [Impact Index Per Article: 32.5] [Received: 03/22/2023] [Accepted: 08/30/2023] [Indexed: 10/27/2023] Open
Abstract
The AlphaFold Protein Structure Database contains predicted structures for millions of proteins. For the majority of human proteins that contain intrinsically disordered regions (IDRs), which do not adopt a stable structure, it is generally assumed that these regions have low AlphaFold2 confidence scores that reflect low-confidence structural predictions. Here, we show that AlphaFold2 assigns confident structures to nearly 15% of human IDRs. By comparison to experimental NMR data for a subset of IDRs that are known to conditionally fold (i.e., upon binding or under other specific conditions), we find that AlphaFold2 often predicts the structure of the conditionally folded state. Based on databases of IDRs that are known to conditionally fold, we estimate that AlphaFold2 can identify conditionally folding IDRs at a precision as high as 88% at a 10% false positive rate, which is remarkable considering that conditionally folded IDR structures were minimally represented in its training data. We find that human disease mutations are nearly fivefold enriched in conditionally folded IDRs over IDRs in general and that up to 80% of IDRs in prokaryotes are predicted to conditionally fold, compared to less than 20% of eukaryotic IDRs. These results indicate that a large majority of IDRs in the proteomes of human and other eukaryotes function in the absence of conditional folding, but the regions that do acquire folds are more sensitive to mutations. We emphasize that the AlphaFold2 predictions do not reveal functionally relevant structural plasticity within IDRs and cannot offer realistic ensemble representations of conditionally folded IDRs.
Affiliation(s)
- T. Reid Alderson
- Department of Biochemistry, University of Toronto, Toronto, ON M5S 1A8, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 1A8, Canada
- Iva Pritišanac
- Department of Cell and Systems Biology, University of Toronto, Toronto, ON M5S 35G, Canada
- Molecular Medicine Program, The Hospital for Sick Children, Toronto, ON M5G 0A4, Canada
- Department of Molecular Biology and Biochemistry, Gottfried Schatz Research Center for Cell Signaling, Metabolism and Aging, Medical University of Graz, Graz 8010, Austria
- Đesika Kolarić
- Department of Molecular Biology and Biochemistry, Gottfried Schatz Research Center for Cell Signaling, Metabolism and Aging, Medical University of Graz, Graz 8010, Austria
- Alan M. Moses
- Department of Cell and Systems Biology, University of Toronto, Toronto, ON M5S 35G, Canada
- Julie D. Forman-Kay
- Department of Biochemistry, University of Toronto, Toronto, ON M5S 1A8, Canada
- Molecular Medicine Program, The Hospital for Sick Children, Toronto, ON M5G 0A4, Canada
12
Idrees S, Paudel KR, Sadaf T, Hansbro PM. How different viruses perturb host cellular machinery via short linear motifs. EXCLI JOURNAL 2023; 22:1113-1128. [PMID: 38054205 PMCID: PMC10694346 DOI: 10.17179/excli2023-6328] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/10/2023] [Accepted: 10/18/2023] [Indexed: 12/07/2023]
Abstract
The virus interacts with its hosts by developing protein-protein interactions. Most viruses employ protein interactions to imitate the host protein: A viral protein with the same amino acid sequence or structure as the host protein attaches to the host protein's binding partner and interferes with the host protein's pathways. Being opportunistic, viruses have evolved to manipulate host cellular mechanisms by mimicking short linear motifs. In this review, we shed light on the current understanding of mimicry via short linear motifs and focus on viral mimicry by genetically different viral subtypes by providing recent examples of mimicry evidence and how high-throughput methods can be a reliable source to study SLiM-mediated viral mimicry.
Affiliation(s)
- Sobia Idrees
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW, Australia
- Centre for Inflammation, Centenary Institute and the University of Technology Sydney, School of Life Sciences, Faculty of Science, Sydney, New South Wales, Australia
- Keshav Raj Paudel
- Centre for Inflammation, Centenary Institute and the University of Technology Sydney, School of Life Sciences, Faculty of Science, Sydney, New South Wales, Australia
- Tayyaba Sadaf
- Centre for Inflammation, Centenary Institute and the University of Technology Sydney, School of Life Sciences, Faculty of Science, Sydney, New South Wales, Australia
- Philip M. Hansbro
- Centre for Inflammation, Centenary Institute and the University of Technology Sydney, School of Life Sciences, Faculty of Science, Sydney, New South Wales, Australia
13
Papageorgiou AC, Pospisilova M, Cibulka J, Ashraf R, Waudby CA, Kadeřávek P, Maroz V, Kubicek K, Prokop Z, Krejci L, Tripsianes K. Recognition and coacervation of G-quadruplexes by a multifunctional disordered region in RECQ4 helicase. Nat Commun 2023; 14:6751. [PMID: 37875529 PMCID: PMC10598209 DOI: 10.1038/s41467-023-42503-z] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 02/09/2022] [Accepted: 10/12/2023] [Indexed: 10/26/2023] Open
Abstract
Biomolecular polyelectrolyte complexes can be formed between oppositely charged intrinsically disordered regions (IDRs) of proteins or between IDRs and nucleic acids. Highly charged IDRs are abundant in the nucleus, yet few have been functionally characterized. Here, we show that a positively charged IDR within the human ATP-dependent DNA helicase Q4 (RECQ4) forms coacervates with G-quadruplexes (G4s). We describe a three-step model of charge-driven coacervation by integrating equilibrium and kinetic binding data in a global numerical model. The oppositely charged IDR and G4 molecules form a complex in the solution that follows a rapid nucleation-growth mechanism leading to a dynamic equilibrium between dilute and condensed phases. We also discover a physical interaction with Replication Protein A (RPA) and demonstrate that the IDR can switch between the two extremes of the structural continuum of complexes. The structural, kinetic, and thermodynamic profile of its interactions revealed a dynamic disordered complex with nucleic acids and a static ordered complex with RPA protein. The two mutually exclusive binding modes suggest a regulatory role for the IDR in RECQ4 function by enabling molecular handoffs. Our study extends the functional repertoire of IDRs and demonstrates a role of polyelectrolyte complexes involved in G4 binding.
Affiliation(s)
- Anna C Papageorgiou
- CEITEC-Central European Institute of Technology, Masaryk University, Brno, Czech Republic
- Michaela Pospisilova
- National Centre for Biomolecular Research, Faculty of Science, Masaryk University, Brno, Czech Republic
- Department of Biology, Faculty of Medicine, Masaryk University, Brno, Czech Republic
- Jakub Cibulka
- Department of Biology, Faculty of Medicine, Masaryk University, Brno, Czech Republic
- Raghib Ashraf
- National Centre for Biomolecular Research, Faculty of Science, Masaryk University, Brno, Czech Republic
- Christopher A Waudby
- Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK
- School of Pharmacy, University College London, London, WC1N 1AX, UK
- Pavel Kadeřávek
- CEITEC-Central European Institute of Technology, Masaryk University, Brno, Czech Republic
- Volha Maroz
- National Centre for Biomolecular Research, Faculty of Science, Masaryk University, Brno, Czech Republic
- Department of Biology, Faculty of Medicine, Masaryk University, Brno, Czech Republic
- Karel Kubicek
- CEITEC-Central European Institute of Technology, Masaryk University, Brno, Czech Republic
- National Centre for Biomolecular Research, Faculty of Science, Masaryk University, Brno, Czech Republic
- Department of Condensed Matter Physics, Faculty of Science, Masaryk University, Brno, Czech Republic
- Zbynek Prokop
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Brno, Czech Republic
- International Clinical Research Center, St Anne's University Hospital, Brno, Czech Republic
- Lumir Krejci
- National Centre for Biomolecular Research, Faculty of Science, Masaryk University, Brno, Czech Republic.
- Department of Biology, Faculty of Medicine, Masaryk University, Brno, Czech Republic.
- International Clinical Research Center, St Anne's University Hospital, Brno, Czech Republic.
|
14
|
Jonas F, Carmi M, Krupkin B, Steinberger J, Brodsky S, Jana T, Barkai N. The molecular grammar of protein disorder guiding genome-binding locations. Nucleic Acids Res 2023; 51:4831-4844. [PMID: 36938874 PMCID: PMC10250222 DOI: 10.1093/nar/gkad184] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 09/15/2022] [Revised: 01/25/2023] [Accepted: 03/15/2023] [Indexed: 03/21/2023]
Abstract
Intrinsically disordered regions (IDRs) direct transcription factors (TFs) towards selected genomic occurrences of their binding motif, as exemplified by budding yeast's Msn2. However, the sequence basis of IDR-directed TF binding selectivity remains unknown. To reveal this sequence grammar, we analyze the genomic localizations of >100 designed IDR mutants, each carrying up to 122 mutations within this 567-AA region. Our data point to multivalent interactions, carried by hydrophobic (mostly aliphatic) residues dispersed within a disordered environment and independent of linear sequence motifs, as the key determinants of Msn2 genomic localization. The implications of our results for the mechanistic basis of IDR-based TF binding preferences are discussed.
Affiliation(s)
- Felix Jonas
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 76100, Israel
| | - Miri Carmi
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 76100, Israel
| | - Beniamin Krupkin
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 76100, Israel
| | - Joseph Steinberger
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 76100, Israel
| | - Sagie Brodsky
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 76100, Israel
| | - Tamar Jana
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 76100, Israel
| | - Naama Barkai
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 76100, Israel
| |
|
15
|
Mitrea DM, Mittasch M, Gomes BF, Klein IA, Murcko MA. Modulating biomolecular condensates: a novel approach to drug discovery. Nat Rev Drug Discov 2022; 21:841-862. [PMID: 35974095 PMCID: PMC9380678 DOI: 10.1038/s41573-022-00505-4] [Citation(s) in RCA: 167] [Impact Index Per Article: 55.7] [Accepted: 06/08/2022] [Indexed: 12/12/2022]
Abstract
In the past decade, membraneless assemblies known as biomolecular condensates have been reported to play key roles in many cellular functions by compartmentalizing specific proteins and nucleic acids in subcellular environments with distinct properties. Furthermore, growing evidence supports the view that biomolecular condensates often form by phase separation, in which a single-phase system demixes into a two-phase system consisting of a condensed phase and a dilute phase of particular biomolecules. Emerging understanding of condensate function in normal and aberrant cellular states, and of the mechanisms of condensate formation, is providing new insights into human disease and revealing novel therapeutic opportunities. In this Perspective, we propose that such insights could enable a previously unexplored drug discovery approach based on identifying condensate-modifying therapeutics (c-mods), and we discuss the strategies, techniques and challenges involved.
|
16
|
Chaudhary A, Chaurasia PK, Kushwaha S, Chauhan P, Chawade A, Mani A. Correlating multi-functional role of cold shock domain proteins with intrinsically disordered regions. Int J Biol Macromol 2022; 220:743-753. [PMID: 35987358 DOI: 10.1016/j.ijbiomac.2022.08.100] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 05/09/2022] [Revised: 07/26/2022] [Accepted: 08/14/2022] [Indexed: 11/05/2022]
Abstract
Cold shock proteins (CSPs) are an ancient and conserved family of proteins. They are renowned for their role in the response to low-temperature stress in bacteria and for their nucleic acid binding activities. In prokaryotes, cold-inducible and non-cold-inducible CSPs are involved in various cellular and metabolic processes such as growth and development, osmotic oxidation, starvation, stress tolerance, and host cell invasion. In prokaryotes, cold shock conditions reduce the efficiency of transcription and translation. Eukaryotic cold shock domain (CSD) proteins are an evolved form of prokaryotic CSPs in which the CSD is flanked by N- and C-terminal domains. Eukaryotic CSPs are multi-functional proteins. CSPs also act as nucleic acid chaperones by preventing the formation of secondary structures in mRNA at low temperatures. In humans, CSD proteins play a crucial role in the progression of breast cancer, colon cancer, lung cancer, and Alzheimer's disease. A well-defined three-dimensional structure of the intrinsically disordered regions of CSP family members is still undetermined. In this article, the intrinsically disordered regions of CSPs are explored systematically to understand the pleiotropic role of the cold shock family of proteins.
Affiliation(s)
- Amit Chaudhary
- Department of Metallurgical Engineering & Materials Science, Indian Institute of Technology Bombay
| | - Pankaj Kumar Chaurasia
- PG Department of Chemistry, L.S. College, Babasaheb Bhimrao Ambedkar Bihar University, Muzaffarpur, Bihar 842001, India
| | - Sandeep Kushwaha
- National Institute of Animal Biotechnology, Hyderabad 500032, India.
| | | | - Aakash Chawade
- Department of Plant Breeding, Swedish University of Agricultural Sciences, 230 53 Alnarp, Sweden.
| | - Ashutosh Mani
- Department of Biotechnology, Motilal Nehru National Institute of Technology Allahabad, Prayagraj 211004, India.
| |
|
17
|
Sangster AG, Zarin T, Moses AM. Evolution of short linear motifs and disordered proteins Topic: yeast as model system to study evolution. Curr Opin Genet Dev 2022; 76:101964. [PMID: 35939968 DOI: 10.1016/j.gde.2022.101964] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 05/04/2022] [Revised: 06/29/2022] [Accepted: 07/08/2022] [Indexed: 11/26/2022]
Abstract
Evolutionary preservation of protein structure had a major influence on the field of molecular evolution: changes in individual amino acids that did not disrupt protein folding would either have no effect or subtly change the 'lock' so that it could fit a new 'key'. Homology of individual amino acids could be confidently assigned through sequence alignments, and models of evolution could be tested. This view of molecular evolution excluded large regions of proteins that could not be confidently aligned, such as intrinsically disordered regions (IDRs) that do not fold into stable structures. In the last decade, major progress has been made in understanding the evolution of IDRs, much of it facilitated by new experimental and computational approaches in yeast. Here, we review this progress as well as several still outstanding questions.
Affiliation(s)
- Ami G Sangster
- Cell & Systems Biology, University of Toronto, 25 Harbord St., Toronto, ON M5S 3G5, Canada
| | - Taraneh Zarin
- Cell & Systems Biology, University of Toronto, 25 Harbord St., Toronto, ON M5S 3G5, Canada. https://twitter.com/@taraneh_z
| | - Alan M Moses
- Cell & Systems Biology, University of Toronto, 25 Harbord St., Toronto, ON M5S 3G5, Canada.
| |
|
18
|
Lu AX, Lu AX, Pritišanac I, Zarin T, Forman-Kay JD, Moses AM. Discovering molecular features of intrinsically disordered regions by using evolution for contrastive learning. PLoS Comput Biol 2022; 18:e1010238. [PMID: 35767567 PMCID: PMC9275697 DOI: 10.1371/journal.pcbi.1010238] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 08/04/2021] [Revised: 07/12/2022] [Accepted: 05/23/2022] [Indexed: 02/07/2023]
Abstract
A major challenge to the characterization of intrinsically disordered regions (IDRs), which are widespread in the proteome, but relatively poorly understood, is the identification of molecular features that mediate functions of these regions, such as short motifs, amino acid repeats and physicochemical properties. Here, we introduce a proteome-scale feature discovery approach for IDRs. Our approach, which we call "reverse homology", exploits the principle that important functional features are conserved over evolution. We use this as a contrastive learning signal for deep learning: given a set of homologous IDRs, the neural network has to correctly choose a held-out homolog from another set of IDRs sampled randomly from the proteome. We pair reverse homology with a simple architecture and standard interpretation techniques, and show that the network learns conserved features of IDRs that can be interpreted as motifs, repeats, or bulk features like charge or amino acid propensities. We also show that our model can be used to produce visualizations of what residues and regions are most important to IDR function, generating hypotheses for uncharacterized IDRs. Our results suggest that feature discovery using unsupervised neural networks is a promising avenue to gain systematic insight into poorly understood protein sequences.
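The contrastive task described in this abstract (pick the held-out homolog out of a pool of randomly sampled IDRs) can be illustrated with a generic softmax cross-entropy loss over similarity scores. The sketch below is only an illustration under stated assumptions: the embeddings are plain lists of floats, similarity is a dot product, and the function name `contrastive_loss` is hypothetical. It is not the authors' network or training code.

```python
import math

def contrastive_loss(query, candidates, positive_index):
    """Cross-entropy loss for choosing the true homolog among candidates.

    query          -- embedding summarising a set of homologous IDRs (assumed given)
    candidates     -- embeddings of the held-out homolog plus random IDRs
    positive_index -- position of the held-out homolog in `candidates`
    """
    # Dot-product similarity between the query and every candidate.
    scores = [sum(q * c for q, c in zip(query, cand)) for cand in candidates]
    # Numerically stable log-sum-exp over all candidate scores.
    top = max(scores)
    log_norm = top + math.log(sum(math.exp(s - top) for s in scores))
    # Negative log-probability assigned to the true homolog.
    return log_norm - scores[positive_index]
```

The loss is small when the held-out homolog scores well above the random IDRs, and equals the logarithm of the number of candidates when all scores are identical.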
Affiliation(s)
- Alex X. Lu
- Department of Computer Science, University of Toronto, Toronto, Canada
| | - Amy X. Lu
- Department of Computer Science, University of Toronto, Toronto, Canada
| | - Iva Pritišanac
- Department of Cell and Systems Biology, University of Toronto, Toronto, Canada
- Program in Molecular Medicine, Hospital for Sick Children, Toronto, Canada
| | - Taraneh Zarin
- Department of Cell and Systems Biology, University of Toronto, Toronto, Canada
| | - Julie D. Forman-Kay
- Program in Molecular Medicine, Hospital for Sick Children, Toronto, Canada
- Department of Biochemistry, University of Toronto, Toronto, Canada
| | - Alan M. Moses
- Department of Computer Science, University of Toronto, Toronto, Canada
- Department of Cell and Systems Biology, University of Toronto, Toronto, Canada
| |
|
19
|
Abstract
Since the large-scale experimental characterization of protein–protein interactions (PPIs) is not possible for all species, several computational PPI prediction methods have been developed that harness existing data from other species. While PPI network prediction has been extensively used in eukaryotes, microbial network inference has lagged behind. However, bacterial interactomes can be built using the same principles and techniques; in fact, several methods are better suited to bacterial genomes. These predicted networks allow systems-level analyses in species that lack experimental interaction data. This review describes the current network inference and analysis techniques and summarizes the use of computationally-predicted microbial interactomes to date.
|
20
|
Dennis EM, Garcia DM. Biochemical Principles in Prion-Based Inheritance. EPIGENOMES 2022; 6:4. [PMID: 35225957 PMCID: PMC8883993 DOI: 10.3390/epigenomes6010004] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/25/2021] [Revised: 01/13/2022] [Accepted: 01/20/2022] [Indexed: 12/14/2022]
Abstract
Prions are proteins that can stably fold into alternative structures that frequently alter their activities. They can self-template their alternate structures and are inherited across cell divisions and generations. While they have been studied for more than four decades, their enigmatic nature has limited their discovery. In the last decade, we have learned just how widespread they are in nature, the many beneficial phenotypes that they confer, while also learning more about their structures and modes of inheritance. Here, we provide a brief review of the biochemical principles of prion proteins, including their sequences, characteristics and structures, and what is known about how they self-template, citing examples from multiple organisms. Prion-based inheritance is the most understudied segment of epigenetics. Here, we lay a biochemical foundation and share a framework for how to define these molecules, as new examples are unearthed throughout nature.
Affiliation(s)
- Emily M. Dennis
- Department of Chemistry and Biochemistry, Institute of Molecular Biology, University of Oregon, Eugene, OR 97403, USA
| | - David M. Garcia
- Department of Biology, Institute of Molecular Biology, University of Oregon, Eugene, OR 97403, USA
| |
|
21
|
Zarin T, Strome B, Peng G, Pritišanac I, Forman-Kay JD, Moses AM. Identifying molecular features that are associated with biological function of intrinsically disordered protein regions. eLife 2021; 10:e60220. [PMID: 33616531 PMCID: PMC7932695 DOI: 10.7554/elife.60220] [Citation(s) in RCA: 57] [Impact Index Per Article: 14.3] [Received: 06/23/2020] [Accepted: 02/22/2021] [Indexed: 12/17/2022]
Abstract
In previous work, we showed that intrinsically disordered regions (IDRs) of proteins contain sequence-distributed molecular features that are conserved over evolution, despite little sequence similarity that can be detected in alignments (Zarin et al., 2019). Here, we aim to use these molecular features to predict specific biological functions for individual IDRs and identify the molecular features within them that are associated with these functions. We find that the predictable functions are diverse. Examining the associated molecular features, we note some that are consistent with previous reports and identify others that were previously unknown. We experimentally confirm that elevated isoelectric point and hydrophobicity, features that are positively associated with mitochondrial localization, are necessary for mitochondrial targeting function. Remarkably, increasing isoelectric point in a synthetic IDR restores weak mitochondrial targeting. We believe feature analysis represents a new systematic approach to understand how biological functions of IDRs are specified by their protein sequences.
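As a rough illustration of the bulk sequence features mentioned in this abstract, the sketch below computes two simple proxies for an IDR sequence: mean hydropathy on the Kyte-Doolittle scale and net charge counted as (K + R) minus (D + E). These are assumptions made for illustration. The abstract refers to isoelectric point and hydrophobicity without saying how they are computed, and net charge is a cruder quantity than a true isoelectric point. The function name `idr_features` is hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def idr_features(sequence):
    """Return mean hydropathy and a simple net-charge proxy for a sequence.

    Raises ValueError for an empty sequence and KeyError for any
    character outside the 20 standard amino acids.
    """
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    mean_hydropathy = sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)
    # Lys/Arg count as +1 and Asp/Glu as -1; histidine is ignored here.
    net_charge = sum((aa in "KR") - (aa in "DE") for aa in seq)
    return {"mean_hydropathy": mean_hydropathy, "net_charge": net_charge}
```

A sequence rich in lysine and arginine gets a high net charge and low hydropathy, while an aliphatic stretch such as ILV scores high on hydropathy with zero net charge.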
Affiliation(s)
- Taraneh Zarin
- Department of Cell and Systems Biology, University of Toronto, Toronto, Canada
| | - Bob Strome
- Department of Cell and Systems Biology, University of Toronto, Toronto, Canada
| | - Gang Peng
- Department of Cell and Systems Biology, University of Toronto, Toronto, Canada
| | - Iva Pritišanac
- Department of Cell and Systems Biology, University of Toronto, Toronto, Canada
- Program in Molecular Medicine, Hospital for Sick Children, Toronto, Canada
| | - Julie D Forman-Kay
- Program in Molecular Medicine, Hospital for Sick Children, Toronto, Canada
- Department of Biochemistry, University of Toronto, Toronto, Canada
| | - Alan M Moses
- Department of Cell and Systems Biology, University of Toronto, Toronto, Canada
| |
|
22
|
Newaz K, Wright G, Piland J, Li J, Clark PL, Emrich SJ, Milenković T. Network analysis of synonymous codon usage. Bioinformatics 2020; 36:4876-4884. [PMID: 32609328 DOI: 10.1093/bioinformatics/btaa603] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 07/06/2019] [Revised: 05/05/2020] [Accepted: 06/22/2020] [Indexed: 12/25/2022]
Abstract
Motivation: Most amino acids are encoded by multiple synonymous codons, some of which are used more rarely than others. Analyses of positions of such rare codons in protein sequences revealed that rare codons can impact co-translational protein folding and that positions of some rare codons are evolutionarily conserved. Analyses of their positions in protein 3-dimensional structures, which are richer in biochemical information than sequences alone, might further explain the role of rare codons in protein folding. Results: We model protein structures as networks and use network centrality to measure the structural position of an amino acid. We first validate that amino acids buried within the structural core are network-central, and those on the surface are not. Then, we study potential differences between network centralities and thus structural positions of amino acids encoded by conserved rare, non-conserved rare and commonly used codons. We find that in 84% of proteins, the three codon categories occupy significantly different structural positions. We examine protein groups showing different codon centrality trends, i.e. different relationships between structural positions of the three codon categories. We see several cases of all proteins from our data with some structural or functional property being in the same group. Also, we see a case of all proteins in some group having the same property. Our work shows that codon usage is linked to the final protein structure and thus possibly to co-translational protein folding. Availability and implementation: https://nd.edu/~cone/CodonUsage/. Supplementary information: Supplementary data are available at Bioinformatics online.
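The idea of modelling a protein structure as a network and reading structural position from centrality can be sketched as follows. This is a minimal illustration under assumptions that the abstract does not state: residues are nodes, two residues are linked when their coordinates lie within a distance cutoff (8.0 is an arbitrary default here), and closeness centrality stands in for whichever centrality measures the authors actually used. The function names are hypothetical.

```python
from collections import deque
from math import dist

def contact_network(coords, cutoff=8.0):
    """Build an undirected residue contact network from 3D coordinates."""
    adjacency = {i: set() for i in range(len(coords))}
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if dist(coords[i], coords[j]) <= cutoff:
                adjacency[i].add(j)
                adjacency[j].add(i)
    return adjacency

def closeness_centrality(adjacency, source):
    """Closeness of `source`: reachable nodes divided by summed path lengths."""
    hops = {source: 0}
    queue = deque([source])
    while queue:  # breadth-first search gives shortest path lengths
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in hops:
                hops[neighbour] = hops[node] + 1
                queue.append(neighbour)
    total = sum(hops.values())
    return (len(hops) - 1) / total if total else 0.0
```

On a toy chain of five residues spaced 5 units apart with a cutoff of 6, the middle residue has closeness 4/6 and an end residue 4/10, matching the intuition that buried, well-connected positions are more central than peripheral ones.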
Affiliation(s)
- Khalique Newaz
- Department of Computer Science and Engineering; Center for Network and Data Science; Eck Institute for Global Health
| | - Gabriel Wright
- Department of Computer Science and Engineering; Eck Institute for Global Health
| | - Jacob Piland
- Department of Computer Science and Engineering; Center for Network and Data Science; Eck Institute for Global Health
| | - Jun Li
- Department of Applied and Computational Mathematics and Statistics
| | - Patricia L Clark
- Department of Chemistry and Biochemistry, University of Notre Dame, Notre Dame, IN 46556, USA
| | - Scott J Emrich
- Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, TN 37996, USA
| | - Tijana Milenković
- Department of Computer Science and Engineering; Center for Network and Data Science; Eck Institute for Global Health
| |
|
23
|
Awan MG, Deslippe J, Buluc A, Selvitopi O, Hofmeyr S, Oliker L, Yelick K. ADEPT: a domain independent sequence alignment strategy for GPU architectures. BMC Bioinformatics 2020; 21:406. [PMID: 32933482 PMCID: PMC7493400 DOI: 10.1186/s12859-020-03720-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 05/20/2020] [Accepted: 08/21/2020] [Indexed: 12/28/2022]
Abstract
Background: Bioinformatic workflows frequently make use of automated genome assembly and protein clustering tools. At the core of most of these tools, a significant portion of execution time is spent determining the optimal local alignment between two sequences. This task is performed with the Smith-Waterman algorithm, a dynamic programming based method. With the advent of modern sequencing technologies and the increasing size of both genome and protein databases, a need for faster Smith-Waterman implementations has emerged. Multiple SIMD strategies for the Smith-Waterman algorithm are available for CPUs. However, with the move of HPC facilities towards accelerator-based architectures, a need for an efficient GPU-accelerated strategy has emerged. Existing GPU-based strategies have been optimized either for a specific type of characters (nucleotides or amino acids) or for only a handful of application use cases. Results: In this paper, we present ADEPT, a new sequence alignment strategy for GPU architectures that is domain independent, supporting alignment of sequences from both genomes and proteins. Our proposed strategy uses GPU-specific optimizations that do not rely on the nature of the sequence. We demonstrate the feasibility of this strategy by implementing the Smith-Waterman algorithm and comparing it to similar CPU strategies as well as the fastest known GPU methods for each domain. ADEPT's driver enables it to scale across multiple GPUs and allows easy integration into software pipelines that utilize large-scale computational systems. We have shown that the ADEPT-based Smith-Waterman algorithm demonstrates a peak performance of 360 GCUPS and 497 GCUPS for protein-based and DNA-based datasets, respectively, on a single GPU node (8 GPUs) of the Cori Supercomputer. Overall, ADEPT shows 10x faster performance in a node-to-node comparison against a corresponding SIMD CPU implementation.
Conclusions: ADEPT demonstrates performance that is comparable to or better than existing GPU strategies. We demonstrated the efficacy of ADEPT in supporting existing bioinformatics software pipelines by integrating ADEPT into MetaHipMer, a high-performance de novo metagenome assembler, and PASTIS, a high-performance protein similarity graph construction pipeline. Our results show a 10% and 30% boost in performance in MetaHipMer and PASTIS, respectively.
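For readers unfamiliar with the dynamic programming recurrence that ADEPT accelerates, a plain CPU sketch of Smith-Waterman local alignment scoring is given below. It is a textbook version with a linear gap penalty and arbitrary illustrative scores (match 2, mismatch -1, gap -2), not ADEPT's GPU implementation or its scoring scheme. The helper `gcups` simply converts a cell count and runtime into giga cell updates per second, the throughput unit quoted in the abstract. Both function names are hypothetical.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between sequences a and b (linear gaps)."""
    previous = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Extend the diagonal with a match or mismatch.
            diagonal = previous[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below zero, which makes the alignment local.
            current[j] = max(0, diagonal, previous[j] + gap, current[j - 1] + gap)
            best = max(best, current[j])
        previous = current
    return best

def gcups(cell_updates, seconds):
    """Throughput in giga cell updates per second."""
    return cell_updates / seconds / 1e9
```

Because the recurrence only compares characters for equality, the same code scores DNA and protein strings alike, which is the sense in which an alignment kernel can be domain independent. A realistic protein aligner would replace the match/mismatch constants with a substitution matrix.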
Affiliation(s)
- Muaaz G Awan
- Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, USA.
| | - Jack Deslippe
- Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, USA
| | - Aydin Buluc
- Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, USA
| | - Oguz Selvitopi
- Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, USA
| | - Steven Hofmeyr
- Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, USA
| | - Leonid Oliker
- Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, USA
| | - Katherine Yelick
- Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, USA
| |
|
24
|
Kuzmin E, VanderSluis B, Nguyen Ba AN, Wang W, Koch EN, Usaj M, Khmelinskii A, Usaj MM, van Leeuwen J, Kraus O, Tresenrider A, Pryszlak M, Hu MC, Varriano B, Costanzo M, Knop M, Moses A, Myers CL, Andrews BJ, Boone C. Exploring whole-genome duplicate gene retention with complex genetic interaction analysis. Science 2020; 368:eaaz5667. [PMID: 32586993 PMCID: PMC7539174 DOI: 10.1126/science.aaz5667] [Citation(s) in RCA: 72] [Impact Index Per Article: 14.4] [Received: 09/19/2019] [Accepted: 05/06/2020] [Indexed: 12/25/2022]
Abstract
Whole-genome duplication has played a central role in the genome evolution of many organisms, including the human genome. Most duplicated genes are eliminated, and factors that influence the retention of persisting duplicates remain poorly understood. We describe a systematic complex genetic interaction analysis with yeast paralogs derived from the whole-genome duplication event. Mapping of digenic interactions for a deletion mutant of each paralog, and of trigenic interactions for the double mutant, provides insight into their roles and a quantitative measure of their functional redundancy. Trigenic interaction analysis distinguishes two classes of paralogs: a more functionally divergent subset and another that retained more functional overlap. Gene feature analysis and modeling suggest that evolutionary trajectories of duplicated genes are dictated by combined functional and structural entanglement factors.
Affiliation(s)
- Elena Kuzmin
- Donnelly Centre, University of Toronto, Toronto, Ontario M5S 3E1, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5S 3E1, Canada
| | - Benjamin VanderSluis
- Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN 55455, USA
| | - Alex N Nguyen Ba
- Department of Cell and Systems Biology, University of Toronto, Toronto, Ontario, Canada
- Center for Analysis of Evolution and Function, University of Toronto, Toronto, Ontario, Canada
| | - Wen Wang
- Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN 55455, USA
| | - Elizabeth N Koch
- Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN 55455, USA
| | - Matej Usaj
- Donnelly Centre, University of Toronto, Toronto, Ontario M5S 3E1, Canada
| | - Anton Khmelinskii
- Zentrum für Molekulare Biologie der Universität Heidelberg (ZMBH), DKFZ-ZMBH Alliance, 69120 Heidelberg, Germany
| | | | | | - Oren Kraus
- Donnelly Centre, University of Toronto, Toronto, Ontario M5S 3E1, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5S 3E1, Canada
| | - Amy Tresenrider
- Department of Molecular and Cell Biology, University of California, Berkeley, CA, USA
| | - Michael Pryszlak
- Donnelly Centre, University of Toronto, Toronto, Ontario M5S 3E1, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5S 3E1, Canada
| | - Ming-Che Hu
- Donnelly Centre, University of Toronto, Toronto, Ontario M5S 3E1, Canada
| | - Brenda Varriano
- Donnelly Centre, University of Toronto, Toronto, Ontario M5S 3E1, Canada
| | - Michael Costanzo
- Donnelly Centre, University of Toronto, Toronto, Ontario M5S 3E1, Canada
| | - Michael Knop
- Zentrum für Molekulare Biologie der Universität Heidelberg (ZMBH), DKFZ-ZMBH Alliance, 69120 Heidelberg, Germany
- Cell Morphogenesis and Signal Transduction, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
| | - Alan Moses
- Department of Cell and Systems Biology, University of Toronto, Toronto, Ontario, Canada
- Center for Analysis of Evolution and Function, University of Toronto, Toronto, Ontario, Canada
- Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, Ontario, Canada
| | - Chad L Myers
- Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN 55455, USA.
| | - Brenda J Andrews
- Donnelly Centre, University of Toronto, Toronto, Ontario M5S 3E1, Canada.
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5S 3E1, Canada
| | - Charles Boone
- Donnelly Centre, University of Toronto, Toronto, Ontario M5S 3E1, Canada.
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5S 3E1, Canada
| |
|
25
|
Erijman A, Kozlowski L, Sohrabi-Jahromi S, Fishburn J, Warfield L, Schreiber J, Noble WS, Söding J, Hahn S. A High-Throughput Screen for Transcription Activation Domains Reveals Their Sequence Features and Permits Prediction by Deep Learning. Mol Cell 2020; 78:890-902.e6. [PMID: 32416068 PMCID: PMC7275923 DOI: 10.1016/j.molcel.2020.04.020] [Citation(s) in RCA: 74] [Impact Index Per Article: 14.8] [Received: 12/03/2019] [Revised: 03/11/2020] [Accepted: 04/15/2020] [Indexed: 01/03/2023]
Abstract
Acidic transcription activation domains (ADs) are encoded by a wide range of seemingly unrelated amino acid sequences, making it difficult to recognize features that promote their dynamic behavior, "fuzzy" interactions, and target specificity. We screened a large set of random 30-mer peptides for AD function in yeast and trained a deep neural network (ADpred) on the AD-positive and -negative sequences. ADpred identifies known acidic ADs within transcription factors and accurately predicts the consequences of mutations. Our work reveals that strong acidic ADs contain multiple clusters of hydrophobic residues near acidic side chains, explaining why ADs often have a biased amino acid composition. ADs likely use a binding mechanism similar to avidity where a minimum number of weak dynamic interactions are required between activator and target to generate biologically relevant affinity and in vivo function. This mechanism explains the basis for fuzzy binding observed between acidic ADs and targets.
Affiliation(s)
- Ariel Erijman
- Fred Hutchinson Cancer Research Center, Seattle, WA, USA
| | - Lukasz Kozlowski
- Quantitative and Computational Biology, Max Planck Institute for Biophysical Chemistry, Göttingen, Germany
| | - Salma Sohrabi-Jahromi
- Quantitative and Computational Biology, Max Planck Institute for Biophysical Chemistry, Göttingen, Germany
| | - James Fishburn
- Fred Hutchinson Cancer Research Center, Seattle, WA, USA
| | - Linda Warfield
- Fred Hutchinson Cancer Research Center, Seattle, WA, USA
| | - Jacob Schreiber
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA
| | - William S Noble
- Department of Genome Sciences, University of Washington, Seattle, WA, USA; Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA
| | - Johannes Söding
- Quantitative and Computational Biology, Max Planck Institute for Biophysical Chemistry, Göttingen, Germany.
| | - Steven Hahn
- Fred Hutchinson Cancer Research Center, Seattle, WA, USA.
| |
|
26
|
James K, Olson PD. The tapeworm interactome: inferring confidence scored protein-protein interactions from the proteome of Hymenolepis microstoma. BMC Genomics 2020; 21:346. [PMID: 32380953 PMCID: PMC7204028 DOI: 10.1186/s12864-020-6710-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 06/17/2019] [Accepted: 03/30/2020] [Indexed: 12/14/2022]
Abstract
Background: Reference genome and transcriptome assemblies of helminths have reached a level of completion whereby secondary analyses that rely on accurate gene estimation or syntenic relationships can now be conducted with a high level of confidence. Recent public release of the v.3 assembly of the mouse bile-duct tapeworm, Hymenolepis microstoma, provides chromosome-level characterisation of the genome and a stabilised set of protein coding gene models underpinned by bioinformatic and empirical data. However, interactome data have not been produced. Conserved protein-protein interactions in other organisms, termed interologs, can be used to transfer interactions between species, allowing systems-level analysis in non-model organisms. Results: Here, we describe a probabilistic, integrated network of interologs for the H. microstoma proteome, based on conserved protein interactions found in eukaryote model species. Almost a third of the 10,139 gene models in the v.3 assembly could be assigned interaction data, and assessment of the resulting network indicates that topologically important proteins are related to essential cellular pathways, and that the network clusters into biologically meaningful components. Moreover, network parameters are similar to those of single-species interaction networks that we constructed in the same way for S. cerevisiae, C. elegans and H. sapiens, demonstrating that information-rich, system-level analyses can be conducted even on species separated by a large phylogenetic distance from the major model organisms on which most protein interaction evidence is based. Using the interolog network, we then focused on sub-networks of interactions assigned to discrete suites of genes of interest, including signalling components and transcription factors, germline multipotency genes, and genes differentially expressed between larval and adult worms. Results show not only an expected bias toward highly conserved proteins, such as components of intracellular signal transduction, but in some cases predicted interactions with transcription factors that aid in identifying their target genes. Conclusions: With key helminth genomes now complete, systems-level analyses can provide an important predictive framework to guide basic and applied research on helminths and will become increasingly informative as new protein-protein interaction data accumulate.
Affiliation(s)
- Katherine James
- Department of Applied Sciences, Northumbria University, Newcastle Upon Tyne, UK; Department of Life Sciences, The Natural History Museum, Cromwell Road, London, UK.
| | - Peter D Olson
- Department of Life Sciences, The Natural History Museum, Cromwell Road, London, UK
| |
|
27
|
Shafee T, Bacic A, Johnson K. Evolution of Sequence-Diverse Disordered Regions in a Protein Family: Order within the Chaos. Mol Biol Evol 2020; 37:2155-2172. [DOI: 10.1093/molbev/msaa096] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 12/11/2022] Open
Abstract
Approaches for studying the evolution of globular proteins are now well established yet are unsuitable for disordered sequences. Our understanding of the evolution of proteins containing disordered regions therefore lags that of globular proteins, limiting our capacity to estimate their evolutionary history, classify paralogs, and identify potential sequence–function relationships. Here, we overcome these limitations by using new analytical approaches that project representations of sequence space to dissect the evolution of proteins with both ordered and disordered regions, and the correlated changes between these. We use the fasciclin-like arabinogalactan proteins (FLAs) as a model family, since they contain a variable number of globular fasciclin domains as well as several distinct types of disordered regions: proline (Pro)-rich arabinogalactan (AG) regions and longer Pro-depleted regions.
Sequence space projections of fasciclin domains from 2019 FLAs from 78 species identified distinct clusters corresponding to different types of fasciclin domains. Clusters can be similarly identified in the seemingly random Pro-rich AG and Pro-depleted disordered regions. Sequence features of the globular and disordered regions clearly correlate with one another, implying coevolution of these distinct regions, as well as with the N-linked and O-linked glycosylation motifs. We reconstruct the overall evolutionary history of the FLAs, annotated with the changing domain architectures, glycosylation motifs, number and length of AG regions, and disordered region sequence features. Mapping these features onto the functionally characterized FLAs therefore enables their sequence–function relationships to be interrogated. These findings will inform research on the abundant disordered regions in protein families from all kingdoms of life.
Affiliation(s)
- Thomas Shafee
- Department of Animal, Plant and Soil Sciences, La Trobe Institute for Agriculture & Food, La Trobe University, Melbourne, VIC, Australia
| | - Antony Bacic
- Department of Animal, Plant and Soil Sciences, La Trobe Institute for Agriculture & Food, La Trobe University, Melbourne, VIC, Australia
- Sino-Australia Plant Cell Wall Research Centre, College of Forestry and Biotechnology, Zhejiang Agriculture and Forestry University, Lin’an, Hangzhou, China
| | - Kim Johnson
- Department of Animal, Plant and Soil Sciences, La Trobe Institute for Agriculture & Food, La Trobe University, Melbourne, VIC, Australia
- Sino-Australia Plant Cell Wall Research Centre, College of Forestry and Biotechnology, Zhejiang Agriculture and Forestry University, Lin’an, Hangzhou, China
| |
|
28
|
Iserman C, Desroches Altamirano C, Jegers C, Friedrich U, Zarin T, Fritsch AW, Mittasch M, Domingues A, Hersemann L, Jahnel M, Richter D, Guenther UP, Hentze MW, Moses AM, Hyman AA, Kramer G, Kreysing M, Franzmann TM, Alberti S. Condensation of Ded1p Promotes a Translational Switch from Housekeeping to Stress Protein Production. Cell 2020; 181:818-831.e19. [PMID: 32359423 PMCID: PMC7237889 DOI: 10.1016/j.cell.2020.04.009] [Citation(s) in RCA: 143] [Impact Index Per Article: 28.6] [Received: 06/18/2019] [Revised: 11/16/2019] [Accepted: 04/06/2020] [Indexed: 11/24/2022]
Abstract
Cells sense elevated temperatures and mount an adaptive heat shock response that involves changes in gene expression, but the underlying mechanisms, particularly on the level of translation, remain unknown. Here we report that, in budding yeast, the essential translation initiation factor Ded1p undergoes heat-induced phase separation into gel-like condensates. Using ribosome profiling and an in vitro translation assay, we reveal that condensate formation inactivates Ded1p and represses translation of housekeeping mRNAs while promoting translation of stress mRNAs. Testing a variant of Ded1p with altered phase behavior as well as Ded1p homologs from diverse species, we demonstrate that Ded1p condensation is adaptive and fine-tuned to the maximum growth temperature of the respective organism. We conclude that Ded1p condensation is an integral part of an extended heat shock response that selectively represses translation of housekeeping mRNAs to promote survival under conditions of severe heat stress.
Affiliation(s)
- Christiane Iserman
- Max Planck Institute of Molecular Cell Biology and Genetics, Pfotenhauerstraße 108, 01307 Dresden, Germany
| | - Christine Desroches Altamirano
- Max Planck Institute of Molecular Cell Biology and Genetics, Pfotenhauerstraße 108, 01307 Dresden, Germany; BIOTEC and CMCB, Technische Universität Dresden, Tatzberg 47/48, 01307 Dresden, Germany
| | - Ceciel Jegers
- Max Planck Institute of Molecular Cell Biology and Genetics, Pfotenhauerstraße 108, 01307 Dresden, Germany
| | - Ulrike Friedrich
- Center for Molecular Biology of the University of Heidelberg, German Cancer Research Center, DKFZ-ZMBH Alliance, 69120 Heidelberg, Germany
| | - Taraneh Zarin
- Department of Cell and Systems Biology, University of Toronto, Toronto, ON M5S 3G5, Canada
| | - Anatol W Fritsch
- Max Planck Institute of Molecular Cell Biology and Genetics, Pfotenhauerstraße 108, 01307 Dresden, Germany; Center for Systems Biology Dresden, 01307 Dresden, Germany
| | - Matthäus Mittasch
- Max Planck Institute of Molecular Cell Biology and Genetics, Pfotenhauerstraße 108, 01307 Dresden, Germany; Center for Systems Biology Dresden, 01307 Dresden, Germany
| | - Antonio Domingues
- Max Planck Institute of Molecular Cell Biology and Genetics, Pfotenhauerstraße 108, 01307 Dresden, Germany; Center for Systems Biology Dresden, 01307 Dresden, Germany
| | - Lena Hersemann
- Max Planck Institute of Molecular Cell Biology and Genetics, Pfotenhauerstraße 108, 01307 Dresden, Germany; Center for Systems Biology Dresden, 01307 Dresden, Germany
| | - Marcus Jahnel
- Max Planck Institute of Molecular Cell Biology and Genetics, Pfotenhauerstraße 108, 01307 Dresden, Germany; BIOTEC and CMCB, Technische Universität Dresden, Tatzberg 47/48, 01307 Dresden, Germany
| | - Doris Richter
- Max Planck Institute of Molecular Cell Biology and Genetics, Pfotenhauerstraße 108, 01307 Dresden, Germany; BIOTEC and CMCB, Technische Universität Dresden, Tatzberg 47/48, 01307 Dresden, Germany
| | - Ulf-Peter Guenther
- DKMS Life Science Lab GmbH, St. Petersburger Str. 2, 01069 Dresden, Germany
| | - Matthias W Hentze
- EMBL Heidelberg, Director's Research Unit, Meyerhofstr. 1, 69117 Heidelberg, Germany
| | - Alan M Moses
- Department of Cell and Systems Biology, University of Toronto, Toronto, ON M5S 3G5, Canada; Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, ON M5S 3B2, Canada
| | - Anthony A Hyman
- Max Planck Institute of Molecular Cell Biology and Genetics, Pfotenhauerstraße 108, 01307 Dresden, Germany
| | - Günter Kramer
- Center for Molecular Biology of the University of Heidelberg, German Cancer Research Center, DKFZ-ZMBH Alliance, 69120 Heidelberg, Germany
| | - Moritz Kreysing
- Max Planck Institute of Molecular Cell Biology and Genetics, Pfotenhauerstraße 108, 01307 Dresden, Germany; Center for Systems Biology Dresden, 01307 Dresden, Germany
| | - Titus M Franzmann
- Max Planck Institute of Molecular Cell Biology and Genetics, Pfotenhauerstraße 108, 01307 Dresden, Germany; BIOTEC and CMCB, Technische Universität Dresden, Tatzberg 47/48, 01307 Dresden, Germany
| | - Simon Alberti
- Max Planck Institute of Molecular Cell Biology and Genetics, Pfotenhauerstraße 108, 01307 Dresden, Germany; BIOTEC and CMCB, Technische Universität Dresden, Tatzberg 47/48, 01307 Dresden, Germany.
| |
|
29
|
Tuttle LM, Pacheco D, Warfield L, Luo J, Ranish J, Hahn S, Klevit RE. Gcn4-Mediator Specificity Is Mediated by a Large and Dynamic Fuzzy Protein-Protein Complex. Cell Rep 2019; 22:3251-3264. [PMID: 29562181 PMCID: PMC5908246 DOI: 10.1016/j.celrep.2018.02.097] [Citation(s) in RCA: 96] [Impact Index Per Article: 16.0] [Received: 12/06/2017] [Revised: 01/17/2018] [Accepted: 02/25/2018] [Indexed: 11/12/2022] Open
Abstract
Transcription activation domains (ADs) are inherently disordered proteins that often target multiple coactivator complexes, but the specificity of these interactions is not understood. Efficient transcription activation by yeast Gcn4 requires its tandem ADs and four activator-binding domains (ABDs) on its target, the Mediator subunit Med15. Multiple ABDs are a common feature of coactivator complexes. We find that the large Gcn4-Med15 complex is heterogeneous and contains nearly all possible AD-ABD interactions. Gcn4-Med15 forms via a dynamic fuzzy protein-protein interface, where ADs bind the ABDs in multiple orientations via hydrophobic regions that gain helicity. This combinatorial mechanism allows individual low-affinity and specificity interactions to generate a biologically functional, specific, and higher affinity complex despite lacking a defined protein-protein interface. This binding strategy is likely representative of many activators that target multiple coactivators, as it allows great flexibility in combinations of activators that can cooperate to regulate genes with variable coactivator requirements.
Affiliation(s)
- Lisa M Tuttle
- Division of Basic Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA; Department of Biochemistry, University of Washington, Seattle, WA 98195, USA
| | - Derek Pacheco
- Division of Basic Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
| | - Linda Warfield
- Division of Basic Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
| | - Jie Luo
- The Institute for Systems Biology, Seattle, WA 98109, USA
| | - Jeff Ranish
- The Institute for Systems Biology, Seattle, WA 98109, USA
| | - Steven Hahn
- Division of Basic Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA.
| | - Rachel E Klevit
- Department of Biochemistry, University of Washington, Seattle, WA 98195, USA.
| |
|
30
|
Entropy and Information within Intrinsically Disordered Protein Regions. Entropy 2019; 21:e21070662. [PMID: 33267376 PMCID: PMC7515160 DOI: 10.3390/e21070662] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.8] [Received: 05/31/2019] [Revised: 06/27/2019] [Accepted: 07/01/2019] [Indexed: 02/06/2023]
Abstract
Bioinformatics and biophysical studies of intrinsically disordered proteins and regions (IDRs) note the high entropy at individual sequence positions and in conformations sampled in solution. This prevents application of the canonical sequence-structure-function paradigm to IDRs and motivates the development of new methods to extract information from IDR sequences. We argue that the information in IDR sequences cannot be fully revealed through positional conservation, which largely measures stable structural contacts and interaction motifs. Instead, considerations of evolutionary conservation of molecular features can reveal the full extent of information in IDRs. Experimental quantification of the large conformational entropy of IDRs is challenging but can be approximated through the extent of conformational sampling measured by a combination of NMR spectroscopy and lower-resolution structural biology techniques, which can be further interpreted with simulations. Conformational entropy and other biophysical features can be modulated by post-translational modifications that provide functional advantages to IDRs by tuning their energy landscapes and enabling a variety of functional interactions and modes of regulation. The diverse mosaic of functional states of IDRs and their conformational features within complexes demands novel metrics of information, which will reflect the complicated sequence-conformational ensemble-function relationship of IDRs.
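As an illustration of the "high entropy at individual sequence positions" mentioned in this abstract, the following is a minimal sketch of per-column Shannon entropy over a multiple sequence alignment. The function names and the toy alignment are hypothetical and are not taken from the cited paper; gaps are simply ignored, which is one of several possible conventions.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of one alignment column, ignoring gap characters."""
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    counts = Counter(residues)
    total = len(residues)
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


def positional_entropies(alignment):
    """Entropy of every column of an alignment given as equal-length strings."""
    return [column_entropy(col) for col in zip(*alignment)]


# Hypothetical toy alignment: column 1 is invariant, column 4 is fully variable.
alignment = ["SPEQ", "SPDK", "SAEN", "SPQR"]
entropies = positional_entropies(alignment)
```

A fully conserved column yields zero entropy and a column with four distinct residues yields two bits, which is why positional conservation alone says little about rapidly diverging disordered regions.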
|
31
|
Zarin T, Strome B, Nguyen Ba AN, Alberti S, Forman-Kay JD, Moses AM. Proteome-wide signatures of function in highly diverged intrinsically disordered regions. eLife 2019; 8:e46883. [PMID: 31264965 PMCID: PMC6634968 DOI: 10.7554/elife.46883] [Citation(s) in RCA: 122] [Impact Index Per Article: 20.3] [Received: 03/15/2019] [Accepted: 07/01/2019] [Indexed: 12/24/2022] Open
Abstract
Intrinsically disordered regions make up a large part of the proteome, but the sequence-to-function relationship in these regions is poorly understood, in part because the primary amino acid sequences of these regions are poorly conserved in alignments. Here we use an evolutionary approach to detect molecular features that are preserved in the amino acid sequences of orthologous intrinsically disordered regions. We find that most disordered regions contain multiple molecular features that are preserved, and we define these as 'evolutionary signatures' of disordered regions. We demonstrate that intrinsically disordered regions with similar evolutionary signatures can rescue function in vivo, and that groups of intrinsically disordered regions with similar evolutionary signatures are strongly enriched for functional annotations and phenotypes. We propose that evolutionary signatures can be used to predict function for many disordered regions from their amino acid sequences.
Affiliation(s)
- Taraneh Zarin
- Department of Cell and Systems Biology, University of Toronto, Toronto, Canada
| | - Bob Strome
- Department of Cell and Systems Biology, University of Toronto, Toronto, Canada
| | - Alex N Nguyen Ba
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, United States
| | - Simon Alberti
- Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany
- Center for Molecular and Cellular Bioengineering, Biotechnology Center, Technische Universität Dresden, Dresden, Germany
| | - Julie D Forman-Kay
- Program in Molecular Medicine, Hospital for Sick Children, Toronto, Canada
- Department of Biochemistry, University of Toronto, Toronto, Canada
| | - Alan M Moses
- Department of Cell and Systems Biology, University of Toronto, Toronto, Canada
- Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, Canada
- Department of Computer Science, University of Toronto, Toronto, Canada
| |
|
32
|
Krystkowiak I, Davey NE. SLiMSearch: a framework for proteome-wide discovery and annotation of functional modules in intrinsically disordered regions. Nucleic Acids Res 2019; 45:W464-W469. [PMID: 28387819 PMCID: PMC5570202 DOI: 10.1093/nar/gkx238] [Citation(s) in RCA: 92] [Impact Index Per Article: 15.3] [Received: 01/28/2017] [Accepted: 04/05/2017] [Indexed: 12/12/2022] Open
Abstract
The extensive intrinsically disordered regions of higher eukaryotic proteomes contain vast numbers of functional interaction modules known as short linear motifs (SLiMs). Here, we present SLiMSearch, a motif discovery tool that scans a motif consensus, representing the specificity determinants of a motif-binding domain, against a proteome to discover putative novel motif instances. SLiMSearch applies several distinct and complementary approaches exploiting the common properties of SLiMs to predict novel motifs. Consensus matches are annotated with overlapping sequence annotation, including feature information describing protein modular architecture, post-translational modification, structure, sequence variation and experimental characterisation of functional regions. Discriminatory motif attributes such as conservation and accessibility are also calculated. In addition, SLiMSearch provides functional enrichment and evolutionary analysis tools. The enrichment tool analyses GO terms, keywords and interacting partner enrichment to indicate possible motif function. The evolutionary tool evaluates motif taxonomic range and the conservation of motif sequence context. Consensus matches can be filtered based on motif attributes such as accessibility and taxonomic range; or by the localisation, interacting partners or ontology annotation of the peptide-containing protein. SLiMSearch supports a range of species of experimental and therapeutic relevance and is available online at http://slim.ucd.ie/slimsearch/.
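The core idea of scanning a motif consensus against a proteome can be sketched with a regular expression. This is only an illustration of the general technique, not SLiMSearch's implementation or API; the function name, the toy proteome, and the choice of the KEN consensus are assumptions made for the example.

```python
import re


def scan_consensus(proteome, consensus):
    """Return (protein_id, 1-based start, matched peptide) for every consensus match.

    A zero-width lookahead is used so that overlapping matches are also reported.
    """
    pattern = re.compile(f"(?=({consensus}))")
    hits = []
    for protein_id, sequence in proteome.items():
        for match in pattern.finditer(sequence):
            hits.append((protein_id, match.start() + 1, match.group(1)))
    return hits


# Hypothetical two-protein "proteome"; the consensus is written in regex syntax.
proteome = {"protA": "MSPLKPALDLSKEN", "protB": "MKENAAKENLL"}
hits = scan_consensus(proteome, "KEN")
```

A real search would additionally filter matches by attributes such as conservation and accessibility, as the abstract describes; the sketch stops at raw consensus matching.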
Affiliation(s)
- Izabella Krystkowiak
- Conway Institute of Biomolecular & Biomedical Research, University College Dublin, Belfield, Dublin 4, Ireland; UCD School of Medicine & Medical Science, University College Dublin, Belfield, Dublin 4, Ireland
| | - Norman E Davey
- Conway Institute of Biomolecular & Biomedical Research, University College Dublin, Belfield, Dublin 4, Ireland; UCD School of Medicine & Medical Science, University College Dublin, Belfield, Dublin 4, Ireland
| |
|
33
|
Brace JL, Doerfler MD, Weiss EL. A cell separation checkpoint that enforces the proper order of late cytokinetic events. J Cell Biol 2019; 218:150-170. [PMID: 30455324 PMCID: PMC6314563 DOI: 10.1083/jcb.201805100] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 05/17/2018] [Revised: 08/28/2018] [Accepted: 10/05/2018] [Indexed: 01/28/2023] Open
Abstract
Eukaryotic cell division requires dependency relationships in which late processes commence only after early ones are appropriately completed. We have discovered a system that blocks late events of cytokinesis until early ones are successfully accomplished. In budding yeast, cytokinetic actomyosin ring contraction and membrane ingression are coupled with deposition of an extracellular septum that is selectively degraded in its primary septum immediately after its completion by secreted enzymes. We find this secretion event is linked to septum completion and forestalled when the process is slowed. Delay of septum degradation requires Fir1, an intrinsically disordered protein localized to the cytokinesis site that is degraded upon septum completion but stabilized when septation is aberrant. Fir1 protects cytokinesis in part by inhibiting a separation-specific exocytosis function of the NDR/LATS kinase Cbk1, a key component of "hippo" signaling that induces mother-daughter separation. We term this system enforcement of cytokinesis order, a checkpoint ensuring proper temporal sequence of mechanistically incompatible processes of cytokinesis.
Affiliation(s)
- Jennifer L Brace
- Department of Molecular Biosciences, Northwestern University, Evanston, IL
| | - Matthew D Doerfler
- Department of Molecular Biosciences, Northwestern University, Evanston, IL
| | - Eric L Weiss
- Department of Molecular Biosciences, Northwestern University, Evanston, IL
| |
|
34
|
Strome B, Hsu IS, Li Cheong Man M, Zarin T, Nguyen Ba A, Moses AM. Short linear motifs in intrinsically disordered regions modulate HOG signaling capacity. BMC Systems Biology 2018; 12:75. [PMID: 29970070 PMCID: PMC6029073 DOI: 10.1186/s12918-018-0597-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 10/25/2017] [Accepted: 06/22/2018] [Indexed: 02/04/2023]
Abstract
Background: The effort to characterize intrinsically disordered regions of signaling proteins is rapidly expanding. An important class of disordered interaction modules are ubiquitous and functionally diverse elements known as short linear motifs (SLiMs). Results: To further examine the role of SLiMs in signal transduction, we used a previously devised bioinformatics method to predict evolutionarily conserved SLiMs within a well-characterized pathway in S. cerevisiae. Using a single cell, reporter-based flow cytometry assay in conjunction with a fluorescent reporter driven by a pathway-specific promoter, we quantitatively assessed pathway output via systematic deletions of individual motifs. We found that, when deleted, 34% (10/29) of predicted SLiMs displayed a significant decrease in pathway output, providing evidence that these motifs play a role in signal transduction. Assuming that mutations in SLiMs have quantitative effects on mechanisms of signaling, we show that perturbations of parameters in a previously published stochastic model of HOG signaling could reproduce the quantitative effects of 4 out of 7 mutations in previously unknown SLiMs. Conclusions: Our study suggests that, even in well-characterized pathways, large numbers of functional elements remain undiscovered, and that challenges remain for application of systems biology models to interpret the effects of mutations in signaling pathways.
Affiliation(s)
- Bob Strome
- Department of Cell & Systems Biology, University of Toronto, 25 Willcocks Street, Toronto, ON, M5S 3B2, Canada
| | - Ian Shenyen Hsu
- Department of Cell & Systems Biology, University of Toronto, 25 Willcocks Street, Toronto, ON, M5S 3B2, Canada
| | - Mitchell Li Cheong Man
- Department of Cell & Systems Biology, University of Toronto, 25 Willcocks Street, Toronto, ON, M5S 3B2, Canada
| | - Taraneh Zarin
- Department of Cell & Systems Biology, University of Toronto, 25 Willcocks Street, Toronto, ON, M5S 3B2, Canada
| | - Alex Nguyen Ba
- Department of Cell & Systems Biology, University of Toronto, 25 Willcocks Street, Toronto, ON, M5S 3B2, Canada
| | - Alan M Moses
- Department of Cell & Systems Biology, University of Toronto, 25 Willcocks Street, Toronto, ON, M5S 3B2, Canada; Center for Analysis of Genome Evolution and Function, 25 Willcocks Street, Toronto, ON, M5S 3B2, Canada.
| |
|
35
|
Krystkowiak I, Manguy J, Davey NE. PSSMSearch: a server for modeling, visualization, proteome-wide discovery and annotation of protein motif specificity determinants. Nucleic Acids Res 2018; 46:W235-W241. [PMID: 29873773 PMCID: PMC6030969 DOI: 10.1093/nar/gky426] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.0] [Received: 02/05/2018] [Revised: 04/11/2018] [Accepted: 05/15/2018] [Indexed: 11/29/2022] Open
Abstract
There is a pressing need for in silico tools that can aid in the identification of the complete repertoire of protein binding (SLiMs, MoRFs, miniMotifs) and modification (moiety attachment/removal, isomerization, cleavage) motifs. We have created PSSMSearch, an interactive web-based tool for rapid statistical modeling, visualization, discovery and annotation of protein motif specificity determinants to discover novel motifs in a proteome-wide manner. PSSMSearch analyses proteomes for regions with significant similarity to a motif specificity determinant model built from a set of aligned motif-containing peptides. Multiple scoring methods are available to build a position-specific scoring matrix (PSSM) describing the motif specificity determinant model. This model can then be modified by a user to add prior knowledge of specificity determinants through an interactive PSSM heatmap. PSSMSearch includes a statistical framework to calculate the significance of specificity determinant model matches against a proteome of interest. PSSMSearch also includes the SLiMSearch framework's annotation, motif functional analysis and filtering tools to highlight relevant discriminatory information. Additional tools to annotate statistically significant shared keywords and GO terms, or experimental evidence of interaction with a motif-recognizing protein have been added. Finally, PSSM-based conservation metrics have been created for taxonomic range analyses. The PSSMSearch web server is available at http://slim.ucd.ie/pssmsearch/.
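To make the notion of a position-specific scoring matrix built from aligned motif-containing peptides concrete, here is a minimal sketch using log-odds scores with a pseudocount and a uniform background. This is one simple scoring scheme chosen for illustration; it is not claimed to be any of the scoring methods PSSMSearch offers, and the function names, peptides, and query sequence are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(peptides, pseudocount=1.0):
    """Log2-odds PSSM from equal-length aligned peptides (uniform background assumed)."""
    length = len(peptides[0])
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for pos in range(length):
        column = [peptide[pos] for peptide in peptides]
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: math.log2(((column.count(aa) + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def best_window(pssm, sequence):
    """Return (score, 0-based start) of the highest-scoring window in the sequence.

    The sequence is assumed to contain only the 20 standard amino acids.
    """
    width = len(pssm)
    scored = [
        (sum(pssm[i][sequence[start + i]] for i in range(width)), start)
        for start in range(len(sequence) - width + 1)
    ]
    return max(scored)


# Hypothetical aligned peptides sharing a KEN core, and a toy query sequence.
peptides = ["PKEN", "SKEN", "AKEN"]
pssm = build_pssm(peptides)
score, start = best_window(pssm, "MAGSKENLLT")
```

Assessing whether such a score is statistically significant against a whole proteome, as the abstract describes, would require a background score distribution that this sketch does not compute.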
Affiliation(s)
- Izabella Krystkowiak
- Conway Institute of Biomolecular & Biomedical Research, University College Dublin, Belfield, Dublin 4, Ireland
- UCD School of Medicine & Medical Science, University College Dublin, Belfield, Dublin 4, Ireland
| | - Jean Manguy
- Conway Institute of Biomolecular & Biomedical Research, University College Dublin, Belfield, Dublin 4, Ireland
- UCD School of Medicine & Medical Science, University College Dublin, Belfield, Dublin 4, Ireland
- Food for Health Ireland, University College Dublin, Belfield, Dublin 4, Ireland
| | - Norman E Davey
- Conway Institute of Biomolecular & Biomedical Research, University College Dublin, Belfield, Dublin 4, Ireland
- UCD School of Medicine & Medical Science, University College Dublin, Belfield, Dublin 4, Ireland
| |
|
36
|
A disordered acidic domain in GPIHBP1 harboring a sulfated tyrosine regulates lipoprotein lipase. Proc Natl Acad Sci U S A 2018; 115:E6020-E6029. [PMID: 29899144 DOI: 10.1073/pnas.1806774115] [Citation(s) in RCA: 53] [Impact Index Per Article: 7.6] [Indexed: 12/18/2022] Open
Abstract
The intravascular processing of triglyceride-rich lipoproteins depends on lipoprotein lipase (LPL) and GPIHBP1, a membrane protein of endothelial cells that binds LPL within the subendothelial spaces and shuttles it to the capillary lumen. In the absence of GPIHBP1, LPL remains mislocalized within the subendothelial spaces, causing severe hypertriglyceridemia (chylomicronemia). The N-terminal domain of GPIHBP1, an intrinsically disordered region (IDR) rich in acidic residues, is important for stabilizing LPL's catalytic domain against spontaneous and ANGPTL4-catalyzed unfolding. Here, we define several important properties of GPIHBP1's IDR. First, a conserved tyrosine in the middle of the IDR is posttranslationally modified by O-sulfation; this modification increases both the affinity of GPIHBP1-LPL interactions and the ability of GPIHBP1 to protect LPL against ANGPTL4-catalyzed unfolding. Second, the acidic IDR of GPIHBP1 increases the probability of a GPIHBP1-LPL encounter via electrostatic steering, increasing the association rate constant (kon) for LPL binding by >250-fold. Third, we show that LPL accumulates near capillary endothelial cells even in the absence of GPIHBP1. In wild-type mice, we expect that the accumulation of LPL in close proximity to capillaries would increase interactions with GPIHBP1. Fourth, we found that GPIHBP1's IDR is not a key factor in the pathogenicity of chylomicronemia in patients with the GPIHBP1 autoimmune syndrome. Finally, based on biophysical studies, we propose that the negatively charged IDR of GPIHBP1 traverses a vast space, facilitating capture of LPL by capillary endothelial cells and simultaneously contributing to GPIHBP1's ability to preserve LPL structure and activity.
|
37
|
Transcription Activation Domains of the Yeast Factors Met4 and Ino2: Tandem Activation Domains with Properties Similar to the Yeast Gcn4 Activator. Mol Cell Biol 2018; 38:MCB.00038-18. [PMID: 29507182 DOI: 10.1128/mcb.00038-18] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 01/22/2018] [Accepted: 02/24/2018] [Indexed: 11/20/2022] Open
Abstract
Eukaryotic transcription activation domains (ADs) are intrinsically disordered polypeptides that typically interact with coactivator complexes, leading to stimulation of transcription initiation, elongation, and chromatin modifications. Here we examined the properties of two strong and conserved yeast ADs: Met4 and Ino2. Both factors have tandem ADs that were identified by conserved sequence and functional studies. While the AD function of both factors depended on hydrophobic residues, Ino2 further required key conserved acidic and polar residues for optimal function. Binding studies showed that the ADs bound multiple Med15 activator-binding domains (ABDs) with similar orders of micromolar affinity and similar but distinct thermodynamic properties. Protein cross-linking data show that no unique complex was formed upon Met4-Med15 binding. Rather, we observed heterogeneous AD-ABD contacts with nearly every possible AD-ABD combination. Many of these properties are similar to those observed with yeast activator Gcn4, which forms a large heterogeneous, dynamic, and fuzzy complex with Med15. We suggest that this molecular behavior is common among eukaryotic activators.
|
38
|
Functional Analysis of Human Hub Proteins and Their Interactors Involved in the Intrinsic Disorder-Enriched Interactions. Int J Mol Sci 2017; 18:ijms18122761. [PMID: 29257115 PMCID: PMC5751360 DOI: 10.3390/ijms18122761] [Citation(s) in RCA: 80] [Impact Index Per Article: 10.0] [Received: 11/01/2017] [Revised: 12/13/2017] [Accepted: 12/15/2017] [Indexed: 12/15/2022] Open
Abstract
Some of the intrinsically disordered proteins and protein regions are promiscuous interactors that are involved in one-to-many and many-to-one binding. Several studies have analyzed enrichment of intrinsic disorder among the promiscuous hub proteins. We extended these works by providing a detailed functional characterization of the disorder-enriched hub protein-protein interactions (PPIs), including both hubs and their interactors, and by analyzing their enrichment among disease-associated proteins. We focused on the human interactome, given its high degree of completeness and relevance to the analysis of the disease-linked proteins. We quantified and investigated numerous functional and structural characteristics of the disorder-enriched hub PPIs, including protein binding, structural stability, evolutionary conservation, several categories of functional sites, and presence of over twenty types of posttranslational modifications (PTMs). We showed that the disorder-enriched hub PPIs have a significantly enlarged number of disordered protein binding regions and long intrinsically disordered regions. They also include high numbers of targeting, catalytic, and many types of PTM sites. We empirically demonstrated that these hub PPIs are significantly enriched among 11 out of 18 considered classes of human diseases that are associated with at least 100 human proteins. Finally, we also illustrated how over a dozen specific human hubs utilize intrinsic disorder for their promiscuous PPIs.
|
39
|
Sequence conservation of protein binding segments in intrinsically disordered regions. Biochem Biophys Res Commun 2017; 494:602-607. [DOI: 10.1016/j.bbrc.2017.10.099] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 10/12/2017] [Accepted: 10/18/2017] [Indexed: 12/11/2022]
|
40
|
Tromer E, Bade D, Snel B, Kops GJPL. Phylogenomics-guided discovery of a novel conserved cassette of short linear motifs in BubR1 essential for the spindle checkpoint. Open Biol 2017; 6:rsob.160315. [PMID: 28003474 PMCID: PMC5204127 DOI: 10.1098/rsob.160315] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.9] [Received: 11/23/2016] [Accepted: 12/01/2016] [Indexed: 11/12/2022]
Abstract
The spindle assembly checkpoint (SAC) maintains genomic integrity by preventing progression of mitotic cell division until all chromosomes are stably attached to spindle microtubules. The SAC critically relies on the paralogues Bub1 and BubR1/Mad3, which integrate kinetochore–spindle attachment status with generation of the anaphase inhibitory complex MCC. We previously reported on the widespread occurrences of independent gene duplications of an ancestral ‘MadBub’ gene in eukaryotic evolution and the striking parallel subfunctionalization that led to loss of kinase function in BubR1/Mad3-like paralogues. Here, we present an elaborate subfunctionalization analysis of the Bub1/BubR1 gene family and perform de novo sequence discovery in a comparative phylogenomics framework to trace the distribution of ancestral sequence features to extant paralogues throughout the eukaryotic tree of life. We show that known ancestral sequence features are consistently retained in the same functional paralogue: GLEBS/CMI/CDII/kinase in the Bub1-like and KEN1/KEN2/D-Box in the BubR1/Mad3-like. The recently described ABBA motif can be found in either or both paralogues. We however discovered two additional ABBA motifs that flank KEN2. This cassette of ABBA1-KEN2-ABBA2 forms a strictly conserved module in all ancestral and BubR1/Mad3-like proteins, suggestive of a specific and crucial SAC function. Indeed, deletion of the ABBA motifs in human BUBR1 abrogates the SAC and affects APC/C–Cdc20 interactions. Our detailed comparative genomics analyses thus enabled discovery of a conserved cassette of motifs essential for the SAC and show how this approach can be used to uncover hitherto unrecognized functional protein features.
Affiliation(s)
- Eelco Tromer
- Hubrecht Institute-KNAW (Royal Netherlands Academy of Arts and Sciences), Uppsalalaan 8, 3584 CT, Utrecht, The Netherlands; Theoretical Biology and Bioinformatics, Department of Biology, Science Faculty, Utrecht University, 3584 CH, Utrecht, The Netherlands
| | - Debora Bade
- Hubrecht Institute-KNAW (Royal Netherlands Academy of Arts and Sciences), Uppsalalaan 8, 3584 CT, Utrecht, The Netherlands
| | - Berend Snel
- Theoretical Biology and Bioinformatics, Department of Biology, Science Faculty, Utrecht University, 3584 CH, Utrecht, The Netherlands
| | - Geert J P L Kops
- Hubrecht Institute-KNAW (Royal Netherlands Academy of Arts and Sciences), Uppsalalaan 8, 3584 CT, Utrecht, The Netherlands; Cancer Genomics Netherlands, University Medical Center Utrecht, 3584 CG, Utrecht, The Netherlands; Center for Molecular Medicine, University Medical Center Utrecht, 3584 CG, Utrecht, The Netherlands
| |
|
41
|
Wang S, Ma J, Xu J. AUCpreD: proteome-level protein disorder prediction by AUC-maximized deep convolutional neural fields. Bioinformatics 2017; 32:i672-i679. [PMID: 27587688 DOI: 10.1093/bioinformatics/btw446] [Citation(s) in RCA: 89] [Impact Index Per Article: 11.1] [Indexed: 12/17/2022]
Abstract
MOTIVATION: Protein intrinsically disordered regions (IDRs) play an important role in many biological processes. Two key properties of IDRs are that (i) they occur proteome-wide and (ii) the ratio of disordered residues is about 6%, which makes it challenging to accurately predict IDRs. Most IDR prediction methods use sequence profiles to improve accuracy, which prevents their application to proteome-wide prediction since generating sequence profiles is time-consuming. On the other hand, methods that do not use sequence profiles fare much worse than those that do. METHOD: This article formulates IDR prediction as a sequence labeling problem and employs a new machine learning method called Deep Convolutional Neural Fields (DeepCNF) to solve it. DeepCNF is an integration of deep convolutional neural networks (DCNN) and conditional random fields (CRF); it can model not only complex sequence-structure relationships in a hierarchical manner, but also correlation among adjacent residues. To deal with the highly imbalanced order/disorder ratio, instead of training DeepCNF by the widely used maximum-likelihood approach, we develop a novel approach to train it by maximizing area under the ROC curve (AUC), which is an unbiased measure for class-imbalanced data. RESULTS: Our experimental results show that our IDR prediction method AUCpreD outperforms existing popular disorder predictors. More importantly, AUCpreD works very well even without sequence profiles, comparing favorably to or even outperforming many methods that use sequence profiles. Therefore, our method works for proteome-wide disorder prediction while yielding similar or better accuracy than the others. AVAILABILITY AND IMPLEMENTATION: http://raptorx2.uchicago.edu/StructurePropertyPred/predict/ CONTACT: wangsheng@uchicago.edu, jinboxu@gmail.com SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Sheng Wang
- Toyota Technological Institute at Chicago, Chicago, IL, USA; Department of Human Genetics, University of Chicago, Chicago, IL, USA
| | - Jianzhu Ma
- Toyota Technological Institute at Chicago, Chicago, IL, USA
| | - Jinbo Xu
- Toyota Technological Institute at Chicago, Chicago, IL, USA
| |
|
42
|
Kelil A, Dubreuil B, Levy ED, Michnick SW. Exhaustive search of linear information encoding protein-peptide recognition. PLoS Comput Biol 2017; 13:e1005499. [PMID: 28426660 PMCID: PMC5417721 DOI: 10.1371/journal.pcbi.1005499] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 10/20/2015] [Revised: 05/04/2017] [Accepted: 04/04/2017] [Indexed: 11/24/2022]
Abstract
High-throughput in vitro methods have been extensively applied to identify linear information that encodes peptide recognition. However, these methods are limited in number of peptides, sequence variation, and length of peptides that can be explored, and often produce solutions that are not found in the cell. Despite the large number of methods developed to attempt addressing these issues, the exhaustive search of linear information encoding protein-peptide recognition has been so far physically unfeasible. Here, we describe a strategy, called DALEL, for the exhaustive search of linear sequence information encoded in proteins that bind to a common partner. We applied DALEL to explore binding specificity of SH3 domains in the budding yeast Saccharomyces cerevisiae. Using only the polypeptide sequences of SH3 domain binding proteins, we succeeded in identifying the majority of known SH3 binding sites previously discovered either in vitro or in vivo. Moreover, we discovered a number of sites with both non-canonical sequences and distinct properties that may serve ancillary roles in peptide recognition. We compared DALEL to a variety of state-of-the-art algorithms in the blind identification of known binding sites of the human Grb2 SH3 domain. We also benchmarked DALEL on curated biological motifs derived from the ELM database to evaluate the effect of increasing/decreasing the enrichment of the motifs. Our strategy can be applied in conjunction with experimental data of proteins interacting with a common partner to identify binding sites among them. Yet, our strategy can also be applied to any group of proteins of interest to identify enriched linear motifs or to exhaustively explore the space of linear information encoded in a polypeptide sequence. Finally, we have developed a webserver located at http://michnick.bcm.umontreal.ca/dalel, offering user-friendly interface and providing different scenarios utilizing DALEL. 
Here we describe the first strategy for the exhaustive search of the linear information encoding protein-peptide recognition; an approach that has previously been physically unfeasible because the combinatorial space of polypeptide sequences is too vast. The search covers the entire space of sequences with no restriction on motif length or composition, and includes all possible combinations of amino acids at distinct positions of each sequence, as well as positions with correlated preferences for amino acids.
Affiliation(s)
- Abdellali Kelil
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada
- Department of Biochemistry and Molecular Medicine, University of Montreal, Montreal, Quebec, Canada
| | - Benjamin Dubreuil
- Department of Structural Biology, Weizmann Institute of Science, Rehovot, Israel
| | - Emmanuel D. Levy
- Department of Biochemistry and Molecular Medicine, University of Montreal, Montreal, Quebec, Canada
- Department of Structural Biology, Weizmann Institute of Science, Rehovot, Israel
| | - Stephen W. Michnick
- Department of Biochemistry and Molecular Medicine, University of Montreal, Montreal, Quebec, Canada
| |
|
43
|
Nguyen Ba AN, Strome B, Osman S, Legere EA, Zarin T, Moses AM. Parallel reorganization of protein function in the spindle checkpoint pathway through evolutionary paths in the fitness landscape that appear neutral in laboratory experiments. PLoS Genet 2017; 13:e1006735. [PMID: 28410373 PMCID: PMC5409178 DOI: 10.1371/journal.pgen.1006735] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 11/07/2016] [Revised: 04/28/2017] [Accepted: 04/05/2017] [Indexed: 11/22/2022]
Abstract
Regulatory networks often increase in complexity during evolution through gene duplication and divergence of component proteins. Two models that explain this increase in complexity are: 1) adaptive changes after gene duplication, such as resolution of adaptive conflicts, and 2) non-adaptive processes such as duplication, degeneration and complementation. Both of these models predict complementary changes in the retained duplicates, but they can be distinguished by direct fitness measurements in organisms with short generation times. Previously, it has been observed that repeated duplication of an essential protein in the spindle checkpoint pathway has occurred multiple times over the eukaryotic tree of life, leading to convergent protein domain organization in its duplicates. Here, we replace the paralog pair in S. cerevisiae with a single-copy protein from a species that did not undergo gene duplication. Surprisingly, using quantitative fitness measurements in laboratory conditions stressful for the spindle-checkpoint pathway, we find no evidence that reorganization of protein function after gene duplication is beneficial. We then reconstruct several evolutionary intermediates from the inferred ancestral network to the extant one, and find that, at the resolution of our assay, there exist stepwise mutational paths from the single protein to the divergent pair of extant proteins with no apparent fitness defects. Parallel evolution has been taken as strong evidence for natural selection, but our results suggest that even in these cases, reorganization of protein function after gene duplication may be explained by neutral processes.
Affiliation(s)
- Alex N. Nguyen Ba
- Department of Cell & Systems Biology, University of Toronto, Toronto, Ontario, Canada
- Center for Analysis of Evolution and Function, University of Toronto, Toronto, Ontario, Canada
| | - Bob Strome
- Department of Cell & Systems Biology, University of Toronto, Toronto, Ontario, Canada
| | - Selma Osman
- Department of Cell & Systems Biology, University of Toronto, Toronto, Ontario, Canada
| | - Elizabeth-Ann Legere
- Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, Ontario, Canada
| | - Taraneh Zarin
- Department of Cell & Systems Biology, University of Toronto, Toronto, Ontario, Canada
| | - Alan M. Moses
- Department of Cell & Systems Biology, University of Toronto, Toronto, Ontario, Canada
- Center for Analysis of Evolution and Function, University of Toronto, Toronto, Ontario, Canada
- Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, Ontario, Canada
| |
|
44
|
Functional Analysis of Kinases and Transcription Factors in Saccharomyces cerevisiae Using an Integrated Overexpression Library. G3-GENES GENOMES GENETICS 2017; 7:911-921. [PMID: 28122947 PMCID: PMC5345721 DOI: 10.1534/g3.116.038471] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Indexed: 01/08/2023]
Abstract
Kinases and transcription factors (TFs) are key modulators of important signaling pathways and their activities underlie the proper function of many basic cellular processes such as cell division, differentiation, and development. Changes in kinase and TF dosage are often associated with disease, yet a systematic assessment of the cellular phenotypes caused by the combined perturbation of kinases and TFs has not been undertaken. We used a reverse-genetics approach to study the phenotypic consequences of kinase and TF overexpression (OE) in the budding yeast, Saccharomyces cerevisiae. We constructed a collection of strains expressing stably integrated inducible alleles of kinases and TFs and used a variety of assays to characterize the phenotypes caused by TF and kinase OE. We used the Synthetic Genetic Array (SGA) method to examine dosage-dependent genetic interactions (GIs) between 239 gain-of-function (OE) alleles of TFs and six loss-of-function (LOF) and seven OE kinase alleles, the former identifying Synthetic Dosage Lethal (SDL) interactions and the latter testing a GI we call Double Dosage Lethality (DDL). We identified and confirmed 94 GIs between 65 OE alleles of TFs and 9 kinase alleles. Follow-up experiments validated regulatory relationships between genetically interacting pairs (Cdc28–Stb1 and Pho85–Pdr1), suggesting that GI studies involving OE alleles of regulatory proteins will be a rich source of new functional information.
|
45
|
Selection maintains signaling function of a highly diverged intrinsically disordered region. Proc Natl Acad Sci U S A 2017; 114:E1450-E1459. [PMID: 28167781 DOI: 10.1073/pnas.1614787114] [Citation(s) in RCA: 51] [Impact Index Per Article: 6.4] [Indexed: 01/05/2023]
Abstract
Intrinsically disordered regions (IDRs) are characterized by their lack of stable secondary or tertiary structure and comprise a large part of the eukaryotic proteome. Although these regions play a variety of signaling and regulatory roles, they appear to be rapidly evolving at the primary sequence level. To understand the functional implications of this rapid evolution, we focused on a highly diverged IDR in Saccharomyces cerevisiae that is involved in regulating multiple conserved MAPK pathways. We hypothesized that under stabilizing selection, the functional output of orthologous IDRs could be maintained, such that diverse genotypes could lead to similar function and fitness. Consistent with the stabilizing selection hypothesis, we find that diverged, orthologous IDRs can mostly recapitulate wild-type function and fitness in S. cerevisiae. We also find that the electrostatic charge of the IDR is correlated with signaling output and, using phylogenetic comparative methods, find evidence for selection maintaining this quantitative molecular trait despite underlying genotypic divergence.
|
46
|
Viscardi LH, Tovo-Rodrigues L, Paré P, Fagundes NJR, Salzano FM, Paixão-Côrtes VR, Bau CHD, Bortolini MC. FOXP in Tetrapoda: Intrinsically Disordered Regions, Short Linear Motifs and their evolutionary significance. Genet Mol Biol 2017; 40:181-190. [PMID: 28257525 PMCID: PMC5409772 DOI: 10.1590/1678-4685-gmb-2016-0115] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 05/25/2016] [Accepted: 10/03/2016] [Indexed: 11/22/2022]
Abstract
The FOXP subfamily is probably the most extensively characterized subfamily of the forkhead superfamily, playing important roles in development and homeostasis in vertebrates. Intrinsically disordered protein regions (IDRs) are protein segments that exhibit multiple physical interactions and play critical roles in various biological processes, including regulation and signaling. IDRs in proteins may play an important role in the evolvability of genetic systems. In this study, we analyzed 77 orthologous FOXP genes/proteins from Tetrapoda, regarding protein disorder content and evolutionary rate. We also predicted the number and type of short linear motifs (SLIMs) in the IDRs. Similar levels of protein disorder (approximately 70%) were found for FOXP1, FOXP2, and FOXP4. However, for FOXP3, which is shorter in length and has a more specific function, the disordered content was lower (30%). Mammals showed higher protein disorder for FOXP1 and FOXP4 than non-mammals. Specific analyses related to linear motifs in the four genes also showed a clear differentiation between FOXPs in mammals and non-mammals. We predicted for the first time the role of IDRs and SLIMs in the FOXP gene family associated with possible adaptive novelties within Tetrapoda. For instance, we found gain and loss of important phosphorylation sites in the Homo sapiens FOXP2 IDRs, with possible implications for the evolution of human speech.
Affiliation(s)
- Lucas Henriques Viscardi
- Programa de Pós-Graduação em Genética e Biologia Molecular, Departamento de Genética, Universidade Federal do Rio Grande do Sul, Porto Alegre, RS, Brazil
| | - Luciana Tovo-Rodrigues
- Programa de Pós-Graduação em Epidemiologia, Universidade Federal de Pelotas, Pelotas, RS, Brazil
| | - Pamela Paré
- Programa de Pós-Graduação em Genética e Biologia Molecular, Departamento de Genética, Universidade Federal do Rio Grande do Sul, Porto Alegre, RS, Brazil
| | - Nelson Jurandi Rosa Fagundes
- Programa de Pós-Graduação em Genética e Biologia Molecular, Departamento de Genética, Universidade Federal do Rio Grande do Sul, Porto Alegre, RS, Brazil
| | - Francisco Mauro Salzano
- Programa de Pós-Graduação em Genética e Biologia Molecular, Departamento de Genética, Universidade Federal do Rio Grande do Sul, Porto Alegre, RS, Brazil
| | - Vanessa Rodrigues Paixão-Côrtes
- Programa de Pós-Graduação em Genética e Biodiversidade, Instituto de Biologia, Universidade Federal da Bahia, Salvador, BA, Brazil
| | - Claiton Henrique Dotto Bau
- Programa de Pós-Graduação em Genética e Biologia Molecular, Departamento de Genética, Universidade Federal do Rio Grande do Sul, Porto Alegre, RS, Brazil
| | - Maria Cátira Bortolini
- Programa de Pós-Graduação em Genética e Biologia Molecular, Departamento de Genética, Universidade Federal do Rio Grande do Sul, Porto Alegre, RS, Brazil
| |
|
47
|
A bioinformatics pipeline to search functional motifs within whole-proteome data: a case study of poxviruses. Virus Genes 2016; 53:173-178. [PMID: 28000080 PMCID: PMC5357487 DOI: 10.1007/s11262-016-1416-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 03/30/2016] [Accepted: 12/01/2016] [Indexed: 12/19/2022]
Abstract
Proteins harbor domains or short linear motifs, which facilitate their functions and interactions. Finding functional motifs in protein sequences could predict the putative cellular roles or characteristics of hypothetical proteins. In this study, we present Shetti-Motif, which is an interactive tool to (i) map UniProt and PROSITE flat files, (ii) search for multiple pre-defined consensus patterns or experimentally validated functional motifs in large datasets of protein sequences (proteome-wide), and (iii) search for motifs containing repeated residues (low-complexity regions, e.g., Leu-, SR-, PEST-rich motifs, etc.). As proof of principle, using this comparative proteomics pipeline, eleven proteomes encoded by members of the Poxviridae family were searched against about 100 experimentally validated functional motifs. Closely related viruses and viruses that infect the same host cells (e.g. vaccinia and variola viruses) show similar profiles of motif-containing proteins. The motifs encoded by these viruses are correlated, which explains why poxviruses are able to interact with a wide range of host cells. In conclusion, this in silico analysis is useful to establish a dataset(s) or potential proteins for further investigation or to compare between species.
|
48
|
Evolution of domain-peptide interactions to coadapt specificity and affinity to functional diversity. Proc Natl Acad Sci U S A 2016; 113:E3862-71. [PMID: 27317745 DOI: 10.1073/pnas.1518469113] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.6] [Indexed: 11/18/2022]
Abstract
Evolution of complexity in eukaryotic proteomes has arisen, in part, through emergence of modular independently folded domains mediating protein interactions via binding to short linear peptides in proteins. Over 30 years, structural properties and sequence preferences of these peptides have been extensively characterized. Less successful, however, were efforts to establish relationships between physicochemical properties and functions of domain-peptide interactions. To our knowledge, we have devised the first strategy to exhaustively explore the binding specificity of protein domain-peptide interactions. We applied the strategy to SH3 domains to determine the properties of their binding peptides starting from various experimental data. The strategy identified the majority (∼70%) of experimentally determined SH3 binding sites. We discovered mutual relationships among binding specificity, binding affinity, and structural properties and evolution of linear peptides. Remarkably, we found that these properties are also related to functional diversity, defined by depth of proteins within hierarchies of gene ontologies. Our results revealed that linear peptides evolved to coadapt specificity and affinity to functional diversity of domain-peptide interactions. Thus, domain-peptide interactions follow human-constructed gene ontologies, which suggests that our understanding of biological process hierarchies reflects the way chemical and thermodynamic properties of linear peptides and their interaction networks, in general, have evolved.
|
49
|
Uhart M, Flores G, Bustos DM. Controllability of protein-protein interaction phosphorylation-based networks: Participation of the hub 14-3-3 protein family. Sci Rep 2016; 6:26234. [PMID: 27195976 PMCID: PMC4872533 DOI: 10.1038/srep26234] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 12/11/2015] [Accepted: 04/28/2016] [Indexed: 12/26/2022]
Abstract
Posttranslational regulation of protein function is a ubiquitous mechanism in eukaryotic cells. Here, we analyzed biological properties of nodes and edges of a human protein-protein interaction phosphorylation-based network, especially of those nodes critical for the network controllability. We found that the minimal number of critical nodes needed to control the whole network is 29%, which is considerably lower compared to other real networks. These critical nodes are more regulated by posttranslational modifications and contain more binding domains to these modifications than other kinds of nodes in the network, suggesting an intra-group fast regulation. Also, when we analyzed the characteristics of the edges that connect critical and non-critical nodes, we found that the former are enriched in domain-to-eukaryotic linear motif interactions, whereas the latter are enriched in domain-domain interactions. Our findings suggest a possible structure for protein-protein interaction networks with a densely interconnected and self-regulated central core, composed of critical nodes with a high participation in the controllability of the full network, and less regulated peripheral nodes. Our study offers a deeper understanding of complex network control and bridges the controllability theorems for complex networks and biological protein-protein interaction phosphorylation-based networked systems.
Affiliation(s)
- Marina Uhart
- Cell Signal Integration Lab, Instituto de Histología y Embriología “Dr. Mario H. Burgos” CCT CONICET Mendoza Facultad de Ciencias Médicas U.N. Cuyo P.O. Box 56 - Mendoza - ZIP 5500 Argentina
| | - Gabriel Flores
- Eventioz/Eventbrite Company, Adolfo A Calle 1853, Dorrego, Guaymallén, Mendoza, Argentina
| | - Diego M. Bustos
- Cell Signal Integration Lab, Instituto de Histología y Embriología “Dr. Mario H. Burgos” CCT CONICET Mendoza Facultad de Ciencias Médicas U.N. Cuyo P.O. Box 56 - Mendoza - ZIP 5500 Argentina
| |
|
50
|
Banerjee S, Chakraborty S, De RK. Deciphering the cause of evolutionary variance within intrinsically disordered regions in human proteins. J Biomol Struct Dyn 2016; 35:233-249. [PMID: 26790343 DOI: 10.1080/07391102.2016.1143877] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Indexed: 02/07/2023]
Abstract
Why intrinsically disordered regions evolve within the human proteome has been an interesting question for a decade. To date, it remains an unsolved yet intriguing issue why some of the disordered regions evolve rapidly while the rest are highly conserved across mammalian species. Identifying the key biological factors responsible for the variation in the conservation rate of different disordered regions within the human proteome may shed light on this issue. We emphasized that, among the biological features (multifunctionality, gene essentiality, protein connectivity, number of unique domains, gene expression level and expression breadth) considered in our study, the number of unique protein domains acts as a strong determinant that negatively influences the conservation of disordered regions. In this context, we justified that proteins having fewer types of domains preferentially need to conserve their disordered regions to enhance their structural flexibility, which in turn will facilitate their molecular interactions. In contrast, the selection pressure acting on the stretches of disordered regions is not so strong in the case of multi-domain proteins. Therefore, we reasoned that the presence of conserved disordered stretches may compensate for the functions of multiple domains within a single-domain protein. Interestingly, we noticed that the unique domain number and expression level influence the evolution of disordered regions differently from that of well-structured ones.
Affiliation(s)
- Sanghita Banerjee
- Machine Intelligence Unit, Indian Statistical Institute, 203 Barrackpore Trunk Road, Kolkata 700108, India
| | | | - Rajat K De
- Machine Intelligence Unit, Indian Statistical Institute, 203 Barrackpore Trunk Road, Kolkata 700108, India
| |
|