1
Balderson B, Fane M, Harvey TJ, Piper M, Smith A, Bodén M. Systematic analysis of the transcriptional landscape of melanoma reveals drug-target expression plasticity. Brief Funct Genomics 2024:elad055. [PMID: 38183207 DOI: 10.1093/bfgp/elad055] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2023] [Revised: 10/25/2023] [Accepted: 12/04/2023] [Indexed: 01/07/2024] Open
Abstract
Metastatic melanoma originates from melanocytes of the skin. Melanoma metastasis results in poor treatment prognosis for patients and is associated with epigenetic and transcriptional changes that reflect the developmental program of melanocyte differentiation from neural crest stem cells. Several studies have explored melanoma transcriptional heterogeneity using microarray, bulk and single-cell RNA-sequencing technologies to derive data-driven models of the transcriptional-state change which occurs during melanoma progression. No study has systematically examined how different models of melanoma progression derived from different data types, technologies and biological conditions compare. Here, we perform a cross-sectional study to identify averaging effects of bulk-based studies that mask and distort apparent melanoma transcriptional heterogeneity; we describe new transcriptionally distinct melanoma cell states, identify differential co-expression of genes between studies and examine the effects of predicted drug susceptibilities of different cell states between studies. Importantly, we observe considerable variability in drug-target gene expression between studies, indicating potential transcriptional plasticity of melanoma to down-regulate these drug targets and thereby circumvent treatment. Overall, observed differences in gene co-expression and predicted drug susceptibility between studies suggest bulk-based transcriptional measurements do not reliably gauge heterogeneity and that melanoma transcriptional plasticity is greater than described when studies are considered in isolation.
Affiliation(s)
- Brad Balderson
  - School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, 4072 Queensland, Australia
- Mitchell Fane
  - Fox Chase Cancer Centre, Philadelphia, 19019 Pennsylvania, United States of America
- Tracey J Harvey
  - School of Biomedical Sciences, University of Queensland, Brisbane, 4072 Queensland, Australia
- Michael Piper
  - School of Biomedical Sciences, University of Queensland, Brisbane, 4072 Queensland, Australia
- Aaron Smith
  - School of Biomedical Sciences, Queensland University of Technology, Brisbane, 4072 Queensland, Australia
- Mikael Bodén
  - School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, 4072 Queensland, Australia
2
Sun Y, Shim WJ, Shen S, Sinniah E, Pham D, Su Z, Mizikovsky D, White MD, Ho JK, Nguyen Q, Bodén M, Palpant N. Inferring cell diversity in single cell data using consortium-scale epigenetic data as a biological anchor for cell identity. Nucleic Acids Res 2023; 51:e62. [PMID: 37125641 PMCID: PMC10287941 DOI: 10.1093/nar/gkad307] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2022] [Accepted: 04/28/2023] [Indexed: 05/02/2023] Open
Abstract
Methods for cell clustering and gene expression from single-cell RNA sequencing (scRNA-seq) data are essential for biological interpretation of cell processes. Here, we present TRIAGE-Cluster which uses genome-wide epigenetic data from diverse bio-samples to identify genes demarcating cell diversity in scRNA-seq data. By integrating patterns of repressive chromatin deposited across diverse cell types with weighted density estimation, TRIAGE-Cluster determines cell type clusters in a 2D UMAP space. We then present TRIAGE-ParseR, a machine learning method which evaluates gene expression rank lists to define gene groups governing the identity and function of cell types. We demonstrate the utility of this two-step approach using atlases of in vivo and in vitro cell diversification and organogenesis. We also provide a web accessible dashboard for analysis and download of data and software. Collectively, genome-wide epigenetic repression provides a versatile strategy to define cell diversity and study gene regulation of scRNA-seq data.
Affiliation(s)
- Yuliangzi Sun
  - Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, Australia
- Woo Jun Shim
  - Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, Australia
- Sophie Shen
  - Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, Australia
- Enakshi Sinniah
  - Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, Australia
- Duy Pham
  - Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, Australia
- Zezhuo Su
  - School of Biomedical Sciences, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Pokfulam, Hong Kong SAR, China
  - Laboratory of Data Discovery for Health Limited (D24H), Hong Kong Science Park, Hong Kong SAR, China
- Dalia Mizikovsky
  - Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, Australia
- Melanie D White
  - Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, Australia
- Joshua W K Ho
  - School of Biomedical Sciences, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Pokfulam, Hong Kong SAR, China
  - Laboratory of Data Discovery for Health Limited (D24H), Hong Kong Science Park, Hong Kong SAR, China
- Quan Nguyen
  - Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, Australia
- Mikael Bodén
  - School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, QLD, Australia
- Nathan J Palpant
  - Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, Australia
3
Foley G, Mora A, Ross CM, Bottoms S, Sützl L, Lamprecht ML, Zaugg J, Essebier A, Balderson B, Newell R, Thomson RES, Kobe B, Barnard RT, Guddat L, Schenk G, Carsten J, Gumulya Y, Rost B, Haltrich D, Sieber V, Gillam EMJ, Bodén M. Engineering indel and substitution variants of diverse and ancient enzymes using Graphical Representation of Ancestral Sequence Predictions (GRASP). PLoS Comput Biol 2022; 18:e1010633. [PMID: 36279274 PMCID: PMC9632902 DOI: 10.1371/journal.pcbi.1010633] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 01/07/2022] [Revised: 11/03/2022] [Accepted: 10/04/2022] [Indexed: 11/06/2022] Open
Abstract
Ancestral sequence reconstruction is a technique that is gaining widespread use in molecular evolution studies and protein engineering. Accurate reconstruction requires the ability to handle appropriately large numbers of sequences, as well as insertion and deletion (indel) events, but available approaches exhibit limitations. To address these limitations, we developed Graphical Representation of Ancestral Sequence Predictions (GRASP), which efficiently implements maximum likelihood methods to enable the inference of ancestors of families with more than 10,000 members. GRASP implements partial order graphs (POGs) to represent and infer insertion and deletion events across ancestors, enabling the identification of building blocks for protein engineering. To validate the capacity to engineer novel proteins from realistic data, we predicted ancestor sequences across three distinct enzyme families: glucose-methanol-choline (GMC) oxidoreductases, cytochromes P450, and dihydroxy/sugar acid dehydratases (DHAD). All tested ancestors demonstrated enzymatic activity. Our study demonstrates the ability of GRASP (1) to support large data sets over 10,000 sequences and (2) to employ insertions and deletions to identify building blocks for engineering biologically active ancestors, by exploring variation over evolutionary time.
Affiliation(s)
- Gabriel Foley
  - School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia
- Ariane Mora
  - School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia
- Connie M. Ross
  - School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia
- Scott Bottoms
  - Campus Straubing for Biotechnology and Sustainability, Technische Universität München, Straubing, Germany
- Leander Sützl
  - Institut für Lebensmitteltechnologie, Universität für Bodenkultur Wien, Vienna, Austria
- Marnie L. Lamprecht
  - School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia
- Julian Zaugg
  - School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia
- Alexandra Essebier
  - School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia
- Brad Balderson
  - School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia
- Rhys Newell
  - School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia
- Raine E. S. Thomson
  - School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia
- Bostjan Kobe
  - School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia
  - Institute for Molecular Bioscience and Australian Infectious Diseases Research Centre, The University of Queensland, Brisbane, Australia
- Ross T. Barnard
  - School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia
- Luke Guddat
  - School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia
- Gerhard Schenk
  - School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia
  - Sustainable Minerals Institute, The University of Queensland, Brisbane, Australia
- Jörg Carsten
  - Zentralinstitut für Katalyseforschung, Technische Universität München, Munich, Germany
- Yosephine Gumulya
  - School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia
- Burkhard Rost
  - Fakultät für Informatik, Technische Universität München, Munich, Germany
- Dietmar Haltrich
  - Institut für Lebensmitteltechnologie, Universität für Bodenkultur Wien, Vienna, Austria
- Volker Sieber
  - School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia
  - Campus Straubing for Biotechnology and Sustainability, Technische Universität München, Straubing, Germany
  - Zentralinstitut für Katalyseforschung, Technische Universität München, Munich, Germany
- Elizabeth M. J. Gillam
  - School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia
- Mikael Bodén
  - School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia
4
Harris KL, Thomson RES, Gumulya Y, Foley G, Carrera-Pacheco SE, Syed P, Janosik T, Sandinge AS, Andersson S, Jurva U, Bodén M, Gillam EMJ. Ancestral sequence reconstruction of a cytochrome P450 family involved in chemical defence reveals the functional evolution of a promiscuous, xenobiotic-metabolizing enzyme in vertebrates. Mol Biol Evol 2022; 39:6593376. [PMID: 35639613 PMCID: PMC9185370 DOI: 10.1093/molbev/msac116] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Indexed: 11/17/2022] Open
Abstract
The cytochrome P450 family 1 enzymes (CYP1s) are a diverse family of hemoprotein monooxygenases, which metabolize many xenobiotics including numerous environmental carcinogens. However, their historical function and evolution remain largely unstudied. Here we investigate CYP1 evolution via the reconstruction and characterization of the vertebrate CYP1 ancestors. Younger ancestors and extant forms generally demonstrated higher activity toward typical CYP1 xenobiotic and steroid substrates than older ancestors, suggesting significant diversification away from the original CYP1 function. Caffeine metabolism appears to be a recently evolved trait of the CYP1A subfamily, observed in the mammalian CYP1A lineage, and may parallel the recent evolution of caffeine synthesis in multiple separate plant species. Likewise, the aryl hydrocarbon receptor agonist, 6-formylindolo[3,2-b]carbazole (FICZ) was metabolized to a greater extent by certain younger ancestors and extant forms, suggesting that activity toward FICZ increased in specific CYP1 evolutionary branches, a process that may have occurred in parallel to the exploitation of land where UV-exposure was higher than in aquatic environments. As observed with previous reconstructions of P450 enzymes, thermostability correlated with evolutionary age; the oldest ancestor was up to 35 °C more thermostable than the extant forms, with a 10T50 (temperature at which 50% of the hemoprotein remains intact after 10 min) of 71 °C. This robustness may have facilitated evolutionary diversification of the CYP1s by buffering the destabilizing effects of mutations that conferred novel functions, a phenomenon which may also be useful in exploiting the catalytic versatility of these ancestral enzymes for commercial application as biocatalysts.
Affiliation(s)
- Kurt L Harris
  - School of Chemistry and Molecular Biosciences, The University of Queensland, St. Lucia, Brisbane, 4072 Australia
- Raine E S Thomson
  - School of Chemistry and Molecular Biosciences, The University of Queensland, St. Lucia, Brisbane, 4072 Australia
- Yosephine Gumulya
  - School of Chemistry and Molecular Biosciences, The University of Queensland, St. Lucia, Brisbane, 4072 Australia
- Gabriel Foley
  - School of Chemistry and Molecular Biosciences, The University of Queensland, St. Lucia, Brisbane, 4072 Australia
- Saskya E Carrera-Pacheco
  - Centro de Investigación Biomédica, Facultad de Ciencias de la Salud Eugenio Espejo, Universidad UTE, Quito 170147, Ecuador
- Parnayan Syed
  - School of Chemistry and Molecular Biosciences, The University of Queensland, St. Lucia, Brisbane, 4072 Australia
- Tomasz Janosik
  - RISE Research Institutes of Sweden, Division Bioeconomy and Health, Chemical Process and Pharmaceutical Development, Södertälje, Sweden
- Ann-Sofie Sandinge
  - DMPK, Early Cardiovascular, Renal and Metabolism, BioPharmaceuticals R&D, Astrazeneca, Gothenburg, Sweden
- Shalini Andersson
  - Discovery Sciences, BioPharmaceuticals R&D, Astrazeneca, Gothenburg, Sweden
- Ulrik Jurva
  - DMPK, Early Cardiovascular, Renal and Metabolism, BioPharmaceuticals R&D, Astrazeneca, Gothenburg, Sweden
- Mikael Bodén
  - School of Chemistry and Molecular Biosciences, The University of Queensland, St. Lucia, Brisbane, 4072 Australia
- Elizabeth M J Gillam
  - School of Chemistry and Molecular Biosciences, The University of Queensland, St. Lucia, Brisbane, 4072 Australia
5
Yaghmaeian Salmani B, Balderson B, Bauer S, Ekman H, Starkenberg A, Perlmann T, Piper M, Bodén M, Thor S. Selective requirement for polycomb repressor complex 2 in the generation of specific hypothalamic neuronal subtypes. Development 2022; 149:274592. [PMID: 35245348 PMCID: PMC8959139 DOI: 10.1242/dev.200076] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/04/2021] [Accepted: 01/18/2022] [Indexed: 11/20/2022]
Abstract
The hypothalamus displays staggering cellular diversity, chiefly established during embryogenesis by the interplay of several signalling pathways and a battery of transcription factors. However, the contribution of epigenetic cues to hypothalamus development remains unclear. We mutated the polycomb repressor complex 2 gene Eed in the developing mouse hypothalamus, which resulted in the loss of H3K27me3, a fundamental epigenetic repressor mark. This triggered ectopic expression of posteriorly expressed regulators (e.g. Hox homeotic genes), upregulation of cell cycle inhibitors and reduced proliferation. Surprisingly, despite these effects, single cell transcriptomic analysis revealed that most neuronal subtypes were still generated in Eed mutants. However, we observed an increase in glutamatergic/GABAergic double-positive cells, as well as loss/reduction of dopamine, hypocretin and Tac2-Pax6 neurons. These findings indicate that many aspects of the hypothalamic gene regulatory flow can proceed without the key H3K27me3 epigenetic repressor mark, but point to a unique sensitivity of particular neuronal subtypes to a disrupted epigenomic landscape. Summary: Polycomb repressor complex 2 inactivation results in selective effects on mouse hypothalamic development, increasing glutamatergic/GABA cells, while reducing dopamine, Hcrt and Tac2-Pax6 cells.
Affiliation(s)
- Behzad Yaghmaeian Salmani
  - Department of Clinical and Experimental Medicine, Linkoping University, SE-58185 Linkoping, Sweden
  - Department of Cell and Molecular Biology, Karolinska Institute, SE-17177 Stockholm, Sweden
- Brad Balderson
  - School of Chemistry and Molecular Biosciences, University of Queensland, St Lucia, QLD 4072, Australia
- Susanne Bauer
  - Department of Clinical and Experimental Medicine, Linkoping University, SE-58185 Linkoping, Sweden
- Helen Ekman
  - Department of Clinical and Experimental Medicine, Linkoping University, SE-58185 Linkoping, Sweden
- Annika Starkenberg
  - Department of Clinical and Experimental Medicine, Linkoping University, SE-58185 Linkoping, Sweden
- Thomas Perlmann
  - Department of Cell and Molecular Biology, Karolinska Institute, SE-17177 Stockholm, Sweden
- Michael Piper
  - School of Biomedical Sciences, University of Queensland, St Lucia, QLD 4072, Australia
- Mikael Bodén
  - School of Chemistry and Molecular Biosciences, University of Queensland, St Lucia, QLD 4072, Australia
- Stefan Thor
  - Department of Clinical and Experimental Medicine, Linkoping University, SE-58185 Linkoping, Sweden
  - School of Biomedical Sciences, University of Queensland, St Lucia, QLD 4072, Australia
6
Mora A, Rakar J, Cobeta IM, Salmani BY, Starkenberg A, Thor S, Bodén M. Variational autoencoding of gene landscapes during mouse CNS development uncovers layered roles of Polycomb Repressor Complex 2. Nucleic Acids Res 2022; 50:1280-1296. [PMID: 35048973 PMCID: PMC8860581 DOI: 10.1093/nar/gkac006] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/15/2021] [Revised: 12/22/2021] [Accepted: 01/05/2022] [Indexed: 12/13/2022] Open
Abstract
A prominent aspect of most, if not all, central nervous systems (CNSs) is that anterior regions (brain) are larger than posterior ones (spinal cord). Studies in Drosophila and mouse have revealed that Polycomb Repressor Complex 2 (PRC2), a protein complex responsible for applying key repressive histone modifications, acts by several mechanisms to promote anterior CNS expansion. However, it is unclear what the full spectrum of PRC2 action is during embryonic CNS development and how PRC2 intersects with the epigenetic landscape. We removed PRC2 function from the developing mouse CNS, by mutating the key gene Eed, and generated spatio-temporal transcriptomic data. To decode the role of PRC2, we developed a method that incorporates standard statistical analyses with probabilistic deep learning to integrate the transcriptomic response to PRC2 inactivation with epigenetic data. This multi-variate analysis corroborates the central involvement of PRC2 in anterior CNS expansion, and also identifies several unanticipated cohorts of genes, such as proliferation and immune response genes. Furthermore, the analysis reveals specific profiles of regulation via PRC2 upon these gene cohorts. These findings uncover a differential logic for the role of PRC2 upon functionally distinct gene cohorts that drive CNS anterior expansion. To support the analysis of emerging multi-modal datasets, we provide a novel bioinformatics package that integrates transcriptomic and epigenetic datasets to identify regulatory underpinnings of heterogeneous biological processes.
Affiliation(s)
- Ariane Mora
  - School of Chemistry and Molecular Biosciences, University of Queensland, St Lucia, QLD 4072, Australia
- Jonathan Rakar
  - Department of Clinical and Experimental Medicine, Linköping University, SE-58185 Linköping, Sweden
- Ignacio Monedero Cobeta
  - Department of Clinical and Experimental Medicine, Linköping University, SE-58185 Linköping, Sweden
  - Department of Physiology, Universidad Autonoma de Madrid, Madrid, Spain
- Behzad Yaghmaeian Salmani
  - Department of Clinical and Experimental Medicine, Linköping University, SE-58185 Linköping, Sweden
  - Department of Cell and Molecular Biology, Karolinska Institute, SE-171 65 Stockholm, Sweden
- Annika Starkenberg
  - Department of Clinical and Experimental Medicine, Linköping University, SE-58185 Linköping, Sweden
- Stefan Thor
  - Department of Clinical and Experimental Medicine, Linköping University, SE-58185 Linköping, Sweden
  - School of Biomedical Sciences, University of Queensland, St Lucia, QLD 4072, Australia
- Mikael Bodén
  - School of Chemistry and Molecular Biosciences, University of Queensland, St Lucia, QLD 4072, Australia
7
Kojic M, Gawda T, Gaik M, Begg A, Salerno-Kochan A, Kurniawan ND, Jones A, Drożdżyk K, Kościelniak A, Chramiec-Głąbik A, Hediyeh-Zadeh S, Kasherman M, Shim WJ, Sinniah E, Genovesi LA, Abrahamsen RK, Fenger CD, Madsen CG, Cohen JS, Fatemi A, Stark Z, Lunke S, Lee J, Hansen JK, Boxill MF, Keren B, Marey I, Saenz MS, Brown K, Alexander SA, Mureev S, Batzilla A, Davis MJ, Piper M, Bodén M, Burne THJ, Palpant NJ, Møller RS, Glatt S, Wainwright BJ. Elp2 mutations perturb the epitranscriptome and lead to a complex neurodevelopmental phenotype. Nat Commun 2021; 12:2678. [PMID: 33976153 PMCID: PMC8113450 DOI: 10.1038/s41467-021-22888-5] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 05/31/2020] [Accepted: 03/24/2021] [Indexed: 02/03/2023] Open
Abstract
Intellectual disability (ID) and autism spectrum disorder (ASD) are the most common neurodevelopmental disorders and are characterized by substantial impairment in intellectual and adaptive functioning, with their genetic and molecular basis remaining largely unknown. Here, we identify biallelic variants in the gene encoding one of the Elongator complex subunits, ELP2, in patients with ID and ASD. Modelling the variants in mice recapitulates the patient features, with brain imaging and tractography analysis revealing microcephaly, loss of white matter tract integrity and an aberrant functional connectome. We show that the Elp2 mutations negatively impact the activity of the complex and its function in translation via tRNA modification. Further, we elucidate that the mutations perturb protein homeostasis leading to impaired neurogenesis, myelin loss and neurodegeneration. Collectively, our data demonstrate an unexpected role for tRNA modification in the pathogenesis of monogenic ID and ASD and define Elp2 as a key regulator of brain development.
Affiliation(s)
- Marija Kojic
  - The University of Queensland Diamantina Institute, Translational Research Institute, The University of Queensland, Brisbane, QLD, Australia
  - Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, Australia
- Tomasz Gawda
  - Malopolska Centre of Biotechnology, Jagiellonian University, Krakow, Poland
- Monika Gaik
  - Malopolska Centre of Biotechnology, Jagiellonian University, Krakow, Poland
- Alexander Begg
  - Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, Australia
- Anna Salerno-Kochan
  - Malopolska Centre of Biotechnology, Jagiellonian University, Krakow, Poland
  - Postgraduate School of Molecular Medicine, Warsaw, Poland
- Nyoman D Kurniawan
  - Centre for Advanced Imaging, The University of Queensland, Brisbane, QLD, Australia
- Alun Jones
  - Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, Australia
- Katarzyna Drożdżyk
  - Malopolska Centre of Biotechnology, Jagiellonian University, Krakow, Poland
- Anna Kościelniak
  - Malopolska Centre of Biotechnology, Jagiellonian University, Krakow, Poland
- Soroor Hediyeh-Zadeh
  - Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, Melbourne, VIC, Australia
  - Department of Medical Biology, Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne, Melbourne, VIC, Australia
- Maria Kasherman
  - School of Biomedical Sciences, The University of Queensland, Brisbane, QLD, Australia
- Woo Jun Shim
  - Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, Australia
  - School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, QLD, Australia
- Enakshi Sinniah
  - Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, Australia
- Laura A Genovesi
  - The University of Queensland Diamantina Institute, Translational Research Institute, The University of Queensland, Brisbane, QLD, Australia
  - Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, Australia
- Rannvá K Abrahamsen
  - Department of Epilepsy Genetics and Personalized Medicine, Danish Epilepsy Centre, Dianalund, Denmark
- Christina D Fenger
  - Department of Epilepsy Genetics and Personalized Medicine, Danish Epilepsy Centre, Dianalund, Denmark
- Camilla G Madsen
  - Centre for Functional and Diagnostic Imaging and Research, Hvidovre Hospital, Hvidovre, Denmark
- Julie S Cohen
  - Department of Neurology and Developmental Medicine, Division of Neurogenetics, Kennedy Krieger Institute, Baltimore, MD, USA
  - Department of Neurology, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Ali Fatemi
  - Department of Neurology and Developmental Medicine, Division of Neurogenetics, Kennedy Krieger Institute, Baltimore, MD, USA
  - Department of Neurology, Johns Hopkins University School of Medicine, Baltimore, MD, USA
  - Department of Pediatrics, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Zornitza Stark
  - Department of Paediatrics, The University of Melbourne, Melbourne, VIC, Australia
  - Victorian Clinical Genetics Services, Murdoch Children's Research Institute, Melbourne, VIC, Australia
  - Australian Genomics Health Alliance, Parkville, VIC, Australia
- Sebastian Lunke
  - Victorian Clinical Genetics Services, Murdoch Children's Research Institute, Melbourne, VIC, Australia
  - Australian Genomics Health Alliance, Parkville, VIC, Australia
  - The University of Melbourne, Melbourne, VIC, Australia
- Joy Lee
  - Department of Paediatrics, The University of Melbourne, Melbourne, VIC, Australia
  - Department of Metabolic Medicine, Royal Children's Hospital, Parkville, VIC, Australia
- Jonas K Hansen
  - Department of Paediatrics, Regional Hospital Viborg, Viborg, Denmark
- Martin F Boxill
  - Department of Paediatrics, Regional Hospital Viborg, Viborg, Denmark
- Boris Keren
  - Department of Genetics, Pitié-Salpêtrière Hospital, AP-HP, Paris, France
- Isabelle Marey
  - Department of Genetics, Pitié-Salpêtrière Hospital, AP-HP, Paris, France
- Margarita S Saenz
  - The University of Colorado Anschutz, Children's Hospital Colorado, Aurora, CO, USA
- Kathleen Brown
  - The University of Colorado Anschutz, Children's Hospital Colorado, Aurora, CO, USA
- Suzanne A Alexander
  - Queensland Brain Institute, The University of Queensland, Brisbane, QLD, Australia
  - Queensland Centre for Mental Health Research, The Park Centre for Mental Health, Brisbane, QLD, Australia
- Sergey Mureev
  - CSIRO-QUT Synthetic Biology Alliance, Centre for Tropical Crops and Bio-commodities, Queensland University of Technology, Brisbane, QLD, Australia
- Alina Batzilla
  - Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, Australia
  - The Ruprecht Karl University of Heidelberg, Heidelberg, Germany
- Melissa J Davis
  - Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, Melbourne, VIC, Australia
  - Department of Medical Biology, Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne, Melbourne, VIC, Australia
  - Department of Clinical Pathology, Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne, Melbourne, VIC, Australia
- Michael Piper
  - School of Biomedical Sciences, The University of Queensland, Brisbane, QLD, Australia
- Mikael Bodén
  - School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, QLD, Australia
- Thomas H J Burne
  - Queensland Brain Institute, The University of Queensland, Brisbane, QLD, Australia
  - Queensland Centre for Mental Health Research, The Park Centre for Mental Health, Brisbane, QLD, Australia
- Nathan J Palpant
  - Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, Australia
- Rikke S Møller
  - Department of Epilepsy Genetics and Personalized Medicine, Danish Epilepsy Centre, Dianalund, Denmark
  - Department for Regional Health Research, The University of Southern Denmark, Odense, Denmark
- Sebastian Glatt
  - Malopolska Centre of Biotechnology, Jagiellonian University, Krakow, Poland
- Brandon J Wainwright
  - The University of Queensland Diamantina Institute, Translational Research Institute, The University of Queensland, Brisbane, QLD, Australia
  - Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, Australia
8
O'Connor T, Grant CE, Bodén M, Bailey TL. T-Gene: improved target gene prediction. Bioinformatics 2020; 36:3902-3904. [PMID: 32246829 DOI: 10.1093/bioinformatics/btaa227] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 01/14/2020] [Revised: 03/04/2020] [Accepted: 03/30/2020] [Indexed: 12/24/2022] Open
Abstract
MOTIVATION: Identifying the genes regulated by a given transcription factor (TF) (its 'target genes') is a key step in developing a comprehensive understanding of gene regulation. Previously, we developed a method (CisMapper) for predicting the target genes of a TF based solely on the correlation between a histone modification at the TF's binding site and the expression of the gene across a set of tissues or cell lines. That approach is limited to organisms for which extensive histone and expression data are available, and does not explicitly incorporate the genomic distance between the TF and the gene.
RESULTS: We present the T-Gene algorithm, which overcomes these limitations. It can be used to predict which genes are most likely to be regulated by a TF, and which of the TF's binding sites are most likely involved in regulating particular genes. T-Gene calculates a novel score that combines distance and histone/expression correlation, and we show that this score accurately predicts when a regulatory element bound by a TF is in contact with a gene's promoter, achieving median precision above 60%. T-Gene is easy to use via its web server or as a command-line tool, and can also make accurate predictions (median precision above 40%) based on distance alone when extensive histone/expression data is not available for the organism. T-Gene provides an estimate of the statistical significance of each of its predictions.
AVAILABILITY AND IMPLEMENTATION: The T-Gene web server, source code, histone/expression data and genome annotation files are provided at http://meme-suite.org.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
| | - Charles E Grant
- Department of Genome Sciences, University of Washington, Seattle, WA 98195-5065
| | - Mikael Bodén
- School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane 4072, Australia
| | - Timothy L Bailey
- Department of Pharmacology, University of Nevada, Reno, NV 89557, USA
|
9
|
Shim WJ, Sinniah E, Xu J, Vitrinel B, Alexanian M, Andreoletti G, Shen S, Sun Y, Balderson B, Boix C, Peng G, Jing N, Wang Y, Kellis M, Tam PPL, Smith A, Piper M, Christiaen L, Nguyen Q, Bodén M, Palpant NJ. Conserved Epigenetic Regulatory Logic Infers Genes Governing Cell Identity. Cell Syst 2020; 11:625-639.e13. [PMID: 33278344 PMCID: PMC7781436 DOI: 10.1016/j.cels.2020.11.001] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 03/16/2020] [Revised: 08/31/2020] [Accepted: 11/09/2020] [Indexed: 01/06/2023]
Abstract
Determining genes that orchestrate cell differentiation in development and disease remains a fundamental goal of cell biology. This study establishes a genome-wide metric based on the gene-repressive trimethylation of histone H3 at lysine 27 (H3K27me3) across hundreds of diverse cell types to identify genetic regulators of cell differentiation. We introduce a computational method, TRIAGE, which uses discordance between gene-repressive tendency and expression to identify genetic drivers of cell identity. We apply TRIAGE to millions of genome-wide single-cell transcriptomes, diverse omics platforms, and eukaryotic cells and tissue types. Using a wide range of data, we validate the performance of TRIAGE in identifying cell-type-specific regulatory factors across diverse species including human, mouse, boar, bird, fish, and tunicate. Using CRISPR gene editing, we use TRIAGE to experimentally validate RNF220 as a regulator of Ciona cardiopharyngeal development and SIX3 as required for differentiation of endoderm in human pluripotent stem cells. A record of this paper’s transparent peer review process is included in the Supplemental Information. Perturbing genes that control cell decisions has major implications in development or disease. However, identifying key regulatory genes from the thousands expressed in a cell is challenging. TRIAGE is a computational method that distills patterns of epigenetic repression across diverse cell types to infer regulatory genes using input gene expression data from any cell type. Demonstrating its utility, we combine single-cell RNA-seq and TRIAGE to identify and experimentally confirm novel regulators of heart development in evolutionarily distant species.
Affiliation(s)
- Woo Jun Shim
- School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia
| | - Enakshi Sinniah
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, Australia
| | - Jun Xu
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, Australia
| | - Burcu Vitrinel
- Center for Developmental Genetics, Department of Biology, New York University, New York, NY, USA
| | - Michael Alexanian
- Gladstone Institute of Cardiovascular Disease, San Francisco, CA, USA
| | - Gaia Andreoletti
- Institute for Computational Health Sciences, University of California, San Francisco, CA 94158, USA
| | - Sophie Shen
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, Australia
| | - Yuliangzi Sun
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, Australia
| | - Brad Balderson
- School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia
| | - Carles Boix
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Guangdun Peng
- CAS Key Laboratory of Regenerative Biology, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, University of Chinese Academy of Sciences and Bioland Laboratory (Guangzhou Regenerative Medicine and Health Guangdong Laboratory), Guangzhou, China; State Key Laboratory of Cell Biology, CAS Center for Excellence in Molecular Cell Science, Shanghai Institute of Biochemistry and Cell Biology, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Shanghai, China
| | - Naihe Jing
- CAS Key Laboratory of Regenerative Biology, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, University of Chinese Academy of Sciences and Bioland Laboratory (Guangzhou Regenerative Medicine and Health Guangdong Laboratory), Guangzhou, China; State Key Laboratory of Cell Biology, CAS Center for Excellence in Molecular Cell Science, Shanghai Institute of Biochemistry and Cell Biology, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Shanghai, China
| | - Yuliang Wang
- Paul G. Allen School of Computer Science and Engineering and Institute for Stem Cell & Regenerative Medicine, University of Washington, Seattle, WA, USA
| | | | - Patrick P L Tam
- The University of Sydney, Children's Medical Research Institute, and School of Medical Sciences, Faculty of Medicine and Health, Westmead, NSW 2145, Australia
| | - Aaron Smith
- Institute of Health and Biomedical Innovation, School of Biomedical Sciences, Queensland University of Technology, Brisbane, Australia; Translational Research Institute, Woolloongabba, Brisbane, Australia
| | - Michael Piper
- School of Biomedical Sciences, The University of Queensland, Brisbane, Australia; Queensland Brain Institute, The University of Queensland, Brisbane, Australia
| | - Lionel Christiaen
- Center for Developmental Genetics, Department of Biology, New York University, New York, NY, USA
| | - Quan Nguyen
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, Australia
| | - Mikael Bodén
- School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia.
| | - Nathan J Palpant
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, Australia.
|
10
|
Lai JS, Rost B, Kobe B, Bodén M. Evolutionary model of protein secondary structure capable of revealing new biological relationships. Proteins 2020; 88:1251-1259. [PMID: 32394426 DOI: 10.1002/prot.25898] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2019] [Revised: 10/24/2019] [Accepted: 04/27/2020] [Indexed: 11/09/2022]
Abstract
Ancestral sequence reconstruction has had recent success in decoding the origins and the determinants of complex protein functions. However, phylogenetic analyses of remote homologues must handle extreme amino acid sequence diversity resulting from extended periods of evolutionary change. We exploited the wealth of protein structures to develop an evolutionary model based on protein secondary structure. The approach follows the differences between discrete secondary structure states observed in modern proteins and those hypothesized in their immediate ancestors. We implemented maximum likelihood-based phylogenetic inference to reconstruct ancestral secondary structure. The predictive accuracy from the use of the evolutionary model surpasses that of comparative modeling and sequence-based prediction; the reconstruction extracts information not available from modern structures or the ancestral sequences alone. Based on a phylogenetic analysis of a sequence-diverse protein family, we showed that the model can highlight relationships that are evolutionarily rooted in structure and not evident in amino acid-based analysis.
Affiliation(s)
- Jhih-Siang Lai
- School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Queensland, Australia
| | - Burkhard Rost
- Department of Informatics, Institute of Advanced Studies (TUM-IAS), School of Life Sciences (WZW), Technical University of Munich (TUM), Garching, Bavaria, Germany
| | - Bostjan Kobe
- School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Queensland, Australia; Australian Infectious Diseases Research Centre, The University of Queensland, Brisbane, Queensland, Australia; Institute for Molecular Bioscience, The University of Queensland, Brisbane, Queensland, Australia
| | - Mikael Bodén
- School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Queensland, Australia; Institute for Molecular Bioscience, The University of Queensland, Brisbane, Queensland, Australia
|
11
|
Fraser J, Essebier A, Brown AS, Davila RA, Harkins D, Zalucki O, Shapiro LP, Penzes P, Wainwright BJ, Scott MP, Gronostajski RM, Bodén M, Piper M, Harvey TJ. Common Regulatory Targets of NFIA, NFIX and NFIB during Postnatal Cerebellar Development. Cerebellum 2020; 19:89-101. [PMID: 31838646 PMCID: PMC7815246 DOI: 10.1007/s12311-019-01089-3] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Indexed: 02/07/2023]
Abstract
Transcriptional regulation plays a central role in controlling neural stem and progenitor cell proliferation and differentiation during neurogenesis. For instance, transcription factors from the nuclear factor I (NFI) family have been shown to co-ordinate neural stem and progenitor cell differentiation within multiple regions of the embryonic nervous system, including the neocortex, hippocampus, spinal cord and cerebellum. Knockout of individual Nfi genes culminates in similar phenotypes, suggestive of common target genes for these transcription factors. However, whether or not the NFI family regulates common suites of genes remains poorly defined. Here, we use granule neuron precursors (GNPs) of the postnatal murine cerebellum as a model system to analyse regulatory targets of three members of the NFI family: NFIA, NFIB and NFIX. By integrating transcriptomic profiling (RNA-seq) of Nfia- and Nfix-deficient GNPs with epigenomic profiling (ChIP-seq against NFIA, NFIB and NFIX, and DNase I hypersensitivity assays), we reveal that these transcription factors share a large set of potential transcriptional targets, suggestive of complementary roles for these NFI family members in promoting neural development.
Affiliation(s)
- James Fraser
- The School of Biomedical Sciences, The University of Queensland, Brisbane, 4072, Australia
| | - Alexandra Essebier
- The School of Chemistry and Molecular Bioscience, The University of Queensland, Brisbane, 4072, Australia
| | - Alexander S Brown
- Department of Developmental Biology, School of Medicine, Stanford University, Stanford, CA, USA
| | - Raul Ayala Davila
- The School of Biomedical Sciences, The University of Queensland, Brisbane, 4072, Australia
| | - Danyon Harkins
- The School of Biomedical Sciences, The University of Queensland, Brisbane, 4072, Australia
| | - Oressia Zalucki
- The School of Biomedical Sciences, The University of Queensland, Brisbane, 4072, Australia
| | - Lauren P Shapiro
- Department of Physiology, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
| | - Peter Penzes
- Department of Physiology, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
| | - Brandon J Wainwright
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, 4072, Australia
| | - Matthew P Scott
- Department of Developmental Biology, School of Medicine, Stanford University, Stanford, CA, USA
| | - Richard M Gronostajski
- Department of Biochemistry, Program in Genetics, Genomics and Bioinformatics, Center of Excellence in Bioinformatics and Life Sciences, State University of New York at Buffalo, Buffalo, NY, USA
| | - Mikael Bodén
- The School of Chemistry and Molecular Bioscience, The University of Queensland, Brisbane, 4072, Australia
| | - Michael Piper
- The School of Biomedical Sciences, The University of Queensland, Brisbane, 4072, Australia.
- Queensland Brain Institute, The University of Queensland, Brisbane, 4072, Australia.
| | - Tracey J Harvey
- The School of Biomedical Sciences, The University of Queensland, Brisbane, 4072, Australia.
|
12
|
Littmann M, Goldberg T, Seitz S, Bodén M, Rost B. Correction to: Detailed prediction of protein sub-nuclear localization. BMC Bioinformatics 2019; 20:727. [PMID: 31861997 PMCID: PMC6925513 DOI: 10.1186/s12859-019-3305-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022] Open
Affiliation(s)
- Maria Littmann
- Department of Informatics, Bioinformatics & Computational Biology - i12, TUM (Technical University of Munich), Boltzmannstr. 3, 85748, Garching/Munich, Germany.
| | - Tatyana Goldberg
- Department of Informatics, Bioinformatics & Computational Biology - i12, TUM (Technical University of Munich), Boltzmannstr. 3, 85748, Garching/Munich, Germany
| | - Sebastian Seitz
- Department of Informatics, Bioinformatics & Computational Biology - i12, TUM (Technical University of Munich), Boltzmannstr. 3, 85748, Garching/Munich, Germany
| | - Mikael Bodén
- School of Chemistry and Molecular Biosciences, UQ (University of Queensland), Cooper Rd, Brisbane City, QLD 4072, Australia
| | - Burkhard Rost
- Department of Informatics, Bioinformatics & Computational Biology - i12, TUM (Technical University of Munich), Boltzmannstr. 3, 85748, Garching/Munich, Germany; Institute for Advanced Study (TUM-IAS), Lichtenbergstr 2a, 85748, Garching/Munich, Germany; TUM School of Life Sciences Weihenstephan (WZW), Alte Akademie 8, Freising, Germany; Department of Biochemistry and Molecular Biophysics & New York Consortium on Membrane Protein Structure (NYCOMPS), Columbia University, 701 West, 168th Street, New York, NY, 10032, USA
|
13
|
Foley G, Sützl L, D'Cunha SA, Gillam EM, Bodén M. SeqScrub: a web tool for automatic cleaning and annotation of FASTA file headers for bioinformatic applications. Biotechniques 2019; 67:50-54. [PMID: 31218882 DOI: 10.2144/btn-2018-0188] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 11/23/2022] Open
Abstract
Data consistency is necessary for effective bioinformatic analysis. SeqScrub is a web tool that parses and maintains consistent information about protein and DNA sequences in FASTA file format, checks if records are current, and adds taxonomic information by matching identifiers against entries in authoritative biological sequence databases. SeqScrub provides a powerful, yet simple workflow for managing, enriching and exchanging data, which is crucial to establish a record of provenance for sequences found from broad and varied searches; for example, using BLAST on continually updated genome sequence sets. Headers standardized using SeqScrub can be parsed by a majority of bioinformatic tools, stay uniformly named between collaborators and contain informative labels to aid management of reproducible, scientific data. SeqScrub is available at http://bioinf.scmb.uq.edu.au/seqscrub.
Affiliation(s)
- Gabriel Foley
- School of Chemistry & Molecular Biosciences, The University of Queensland, Brisbane, QLD 4072, Australia
| | - Leander Sützl
- Food Biotechnology Laboratory, Department of Food Sciences & Technology, BOKU University of Natural Resources & Life Sciences, Vienna, Austria
| | - Stephlina A D'Cunha
- School of Chemistry & Molecular Biosciences, The University of Queensland, Brisbane, QLD 4072, Australia
| | - Elizabeth Mj Gillam
- School of Chemistry & Molecular Biosciences, The University of Queensland, Brisbane, QLD 4072, Australia
| | - Mikael Bodén
- School of Chemistry & Molecular Biosciences, The University of Queensland, Brisbane, QLD 4072, Australia
|
14
|
Sützl L, Foley G, Gillam EMJ, Bodén M, Haltrich D. The GMC superfamily of oxidoreductases revisited: analysis and evolution of fungal GMC oxidoreductases. Biotechnol Biofuels 2019; 12:118. [PMID: 31168323 PMCID: PMC6509819 DOI: 10.1186/s13068-019-1457-0] [Citation(s) in RCA: 69] [Impact Index Per Article: 13.8] [Received: 03/12/2019] [Accepted: 05/02/2019] [Indexed: 05/03/2023]
Abstract
BACKGROUND The glucose-methanol-choline (GMC) superfamily is a large and functionally diverse family of oxidoreductases that share a common structural fold. Fungal members of this superfamily that are characterised and relevant for lignocellulose degradation include aryl-alcohol oxidoreductase, alcohol oxidase, cellobiose dehydrogenase, glucose oxidase, glucose dehydrogenase, pyranose dehydrogenase, and pyranose oxidase, which together form family AA3 of the auxiliary activities in the CAZy database of carbohydrate-active enzymes. Overall, little is known about the extant sequence space of these GMC oxidoreductases and their phylogenetic relations. Although some individual forms are well characterised, it is still unclear how they compare in respect of the complete enzyme class and, therefore, also how generalizable their characteristics are. RESULTS To improve the understanding of the GMC superfamily as a whole, we used sequence similarity networks to cluster large numbers of fungal GMC sequences and annotate them according to functionality. Subsequently, different members of the GMC superfamily were analysed in detail with regard to their sequences and phylogeny. This allowed us to define the currently characterised sequence space and show that complete clades of some enzymes have not been studied in any detail to date. Finally, we interpret our results from an evolutionary perspective, where we could show, for example, that pyranose dehydrogenase evolved from aryl-alcohol oxidoreductase after a change in substrate specificity and that the cytochrome domain of cellobiose dehydrogenase was regularly lost during evolution. CONCLUSIONS This study offers new insights into the sequence variation and phylogenetic relationships of fungal GMC/AA3 sequences. Certain clades of these GMC enzymes identified in our phylogenetic analyses are completely uncharacterised to date, and might include enzyme activities of varying specificities and/or activities that are hitherto unstudied.
Affiliation(s)
- Leander Sützl
- Food Biotechnology Laboratory, Department of Food Science and Technology, BOKU-University of Natural Resources and Life Sciences Vienna, Vienna, Austria
- Doctoral Programme BioToP-Biomolecular Technology of Proteins, BOKU-University of Natural Resources and Life Sciences Vienna, Vienna, Austria
| | - Gabriel Foley
- School of Chemistry & Molecular Biosciences, The University of Queensland, Brisbane, Australia
| | - Elizabeth M J Gillam
- School of Chemistry & Molecular Biosciences, The University of Queensland, Brisbane, Australia
| | - Mikael Bodén
- School of Chemistry & Molecular Biosciences, The University of Queensland, Brisbane, Australia
| | - Dietmar Haltrich
- Food Biotechnology Laboratory, Department of Food Science and Technology, BOKU-University of Natural Resources and Life Sciences Vienna, Vienna, Austria
- Doctoral Programme BioToP-Biomolecular Technology of Proteins, BOKU-University of Natural Resources and Life Sciences Vienna, Vienna, Austria
|
15
|
Littmann M, Goldberg T, Seitz S, Bodén M, Rost B. Detailed prediction of protein sub-nuclear localization. BMC Bioinformatics 2019; 20:205. [PMID: 31014229 PMCID: PMC6480651 DOI: 10.1186/s12859-019-2790-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 09/27/2018] [Accepted: 04/02/2019] [Indexed: 12/21/2022] Open
Abstract
Background Sub-nuclear structures or locations are associated with various nuclear processes. Proteins localized in these substructures are important to understand the interior nuclear mechanisms. Despite advances in high-throughput methods, experimental protein annotations remain limited. Predictions of cellular compartments have become very accurate, largely at the expense of leaving out substructures inside the nucleus, making a fine-grained analysis impossible. Results Here, we present a new method (LocNuclei) that predicts nuclear substructures from sequence alone. LocNuclei uses a string-based Profile Kernel with Support Vector Machines (SVMs). It distinguishes sub-nuclear localization in 13 distinct substructures and distinguishes between nuclear proteins confined to the nucleus and those that are also native to other compartments (traveler proteins). High performance was achieved by implicitly leveraging a large biological knowledge-base in creating predictions by homology-based inference through BLAST. Using this approach, the performance reached AUC = 0.70–0.74 and Q13 = 59–65%. Travelling proteins (nucleus and other) were identified at Q2 = 70–74%. A Gene Ontology (GO) analysis of the enrichment of biological processes revealed that the predicted sub-nuclear compartments matched the expected functionality. Analysis of protein-protein interactions (PPI) shows that formation of compartments and functionality of proteins in these compartments rely heavily on interactions between proteins. This suggested that the LocNuclei predictions carry important information about function. The source code and data sets are available through GitHub: https://github.com/Rostlab/LocNuclei. Conclusions LocNuclei predicts subnuclear compartments and traveler proteins accurately. These predictions carry important information about functionality and PPIs.
Affiliation(s)
- Maria Littmann
- Department of Informatics, Bioinformatics & Computational Biology - i12, TUM (Technical University of Munich), Boltzmannstr. 3, 85748, Garching/Munich, Germany.
| | - Tatyana Goldberg
- Department of Informatics, Bioinformatics & Computational Biology - i12, TUM (Technical University of Munich), Boltzmannstr. 3, 85748, Garching/Munich, Germany
| | - Sebastian Seitz
- Department of Informatics, Bioinformatics & Computational Biology - i12, TUM (Technical University of Munich), Boltzmannstr. 3, 85748, Garching/Munich, Germany
| | - Mikael Bodén
- School of Chemistry and Molecular Biosciences, UQ (University of Queensland), Cooper Rd, Brisbane City, QLD, 4072, Australia
| | - Burkhard Rost
- Department of Informatics, Bioinformatics & Computational Biology - i12, TUM (Technical University of Munich), Boltzmannstr. 3, 85748, Garching/Munich, Germany; Institute for Advanced Study (TUM-IAS), Lichtenbergstr 2a, 85748, Garching/Munich, Germany; TUM School of Life Sciences Weihenstephan (WZW), Alte Akademie 8, Freising, Germany; Department of Biochemistry and Molecular Biophysics & New York Consortium on Membrane Protein Structure (NYCOMPS), Columbia University, 701 West, 168th Street, New York, NY, 10032, USA
|
16
|
O'Connor T, Bodén M, Bailey TL. CisMapper: predicting regulatory interactions from transcription factor ChIP-seq data. Nucleic Acids Res 2018; 45:e19. [PMID: 28204599 PMCID: PMC5389714 DOI: 10.1093/nar/gkw956] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 03/08/2016] [Revised: 09/30/2016] [Accepted: 10/10/2016] [Indexed: 12/18/2022] Open
Abstract
Identifying the genomic regions and regulatory factors that control the transcription of genes is an important, unsolved problem. The current method of choice predicts transcription factor (TF) binding sites using chromatin immunoprecipitation followed by sequencing (ChIP-seq), and then links the binding sites to putative target genes solely on the basis of the genomic distance between them. Evidence from chromatin conformation capture experiments shows that this approach is inadequate due to long-distance regulation via chromatin looping. We present CisMapper, which predicts the regulatory targets of a TF using the correlation between a histone mark at the TF's bound sites and the expression of each gene across a panel of tissues. Using both chromatin conformation capture and differential expression data, we show that CisMapper is more accurate at predicting the target genes of a TF than the distance-based approaches currently used, and is particularly advantageous for predicting the long-range regulatory interactions typical of tissue-specific gene expression. CisMapper also predicts which TF binding sites regulate a given gene more accurately than using genomic distance. Unlike distance-based methods, CisMapper can predict which transcription start site of a gene is regulated by a particular binding site of the TF.
Affiliation(s)
| | - Mikael Bodén
- School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane 4072, Australia
| | - Timothy L Bailey
- Department of Pharmacology, University of Nevada School of Medicine, Reno, NV 89557-0357, USA
|
17
|
Patrick R, Kobe B, Lê Cao KA, Bodén M. PhosphoPICK-SNP: quantifying the effect of amino acid variants on protein phosphorylation. Bioinformatics 2018; 33:1773-1781. [PMID: 28186228 DOI: 10.1093/bioinformatics/btx072] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 04/22/2016] [Accepted: 02/07/2017] [Indexed: 12/15/2022] Open
Abstract
Motivation Genome-wide association studies are identifying single nucleotide variants (SNVs) linked to various diseases; however, the functional effect caused by these variants is often unknown. One potential functional effect, the loss or gain of protein phosphorylation sites, can be induced through variations in key amino acids that disrupt or introduce valid kinase binding patterns. Current methods for predicting the effect of SNVs on phosphorylation operate on the sequence content of reference and variant proteins. However, consideration of the amino acid sequence alone is insufficient for predicting phosphorylation change, as context factors determine kinase-substrate selection. Results We present here a method for quantifying the effect of SNVs on protein phosphorylation through an integrated system of motif analysis and context-based assessment of kinase targets. By predicting the effect that known variants across the proteome have on phosphorylation, we are able to use this background of proteome-wide variant effects to quantify the significance of novel variants for modifying phosphorylation. We validate our method on a manually curated set of phosphorylation change-causing variants from the primary literature, showing that the method predicts known examples of phosphorylation change at high levels of specificity. We apply our approach to data-sets of variants in phosphorylation site regions, showing that variants causing predicted phosphorylation loss are over-represented among disease-associated variants. Availability and Implementation The method is freely available as a web-service at the website http://bioinf.scmb.uq.edu.au/phosphopick/snp. Contact m.boden@uq.edu.au. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ralph Patrick
- School of Chemistry and Molecular Biosciences, The University of Queensland, St Lucia, Australia
| | - Bostjan Kobe
- School of Chemistry and Molecular Biosciences, The University of Queensland, St Lucia, Australia; Institute for Molecular Bioscience, The University of Queensland, St Lucia, Australia; Australian Infectious Diseases Research Centre, The University of Queensland, St Lucia, Australia
| | - Kim-Anh Lê Cao
- The University of Queensland Diamantina Institute, Translational Research Institute, Woolloongabba, QLD, Australia
| | - Mikael Bodén
- School of Chemistry and Molecular Biosciences, The University of Queensland, St Lucia, Australia; Institute for Molecular Bioscience, The University of Queensland, St Lucia, Australia
|
18
|
Affiliation(s)
- Julian Zaugg
- School of Chemistry and Molecular Biosciences, University of Queensland, 4072 Brisbane, Australia
| | - Yosephine Gumulya
- School of Chemistry and Molecular Biosciences, University of Queensland, 4072 Brisbane, Australia
| | - Mikael Bodén
- School of Chemistry and Molecular Biosciences, University of Queensland, 4072 Brisbane, Australia
- Institute for Molecular Bioscience, University of Queensland, 4072 Brisbane, Australia
| | - Alan E. Mark
- School of Chemistry and Molecular Biosciences, University of Queensland, 4072 Brisbane, Australia
- Institute for Molecular Bioscience, University of Queensland, 4072 Brisbane, Australia
| | - Alpeshkumar K. Malde
- School of Chemistry and Molecular Biosciences, University of Queensland, 4072 Brisbane, Australia
|
19
|
Essebier A, Lamprecht M, Piper M, Bodén M. Bioinformatics approaches to predict target genes from transcription factor binding data. Methods 2017; 131:111-119. [DOI: 10.1016/j.ymeth.2017.09.001] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 06/09/2017] [Revised: 08/29/2017] [Accepted: 09/03/2017] [Indexed: 12/28/2022] Open
|
20
|
Patrick R, Horin C, Kobe B, Cao KAL, Bodén M. Prediction of kinase-specific phosphorylation sites through an integrative model of protein context and sequence. Biochim Biophys Acta 2016; 1864:1599-608. [PMID: 27507704 DOI: 10.1016/j.bbapap.2016.08.001] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 04/29/2016] [Revised: 07/08/2016] [Accepted: 08/03/2016] [Indexed: 01/17/2023]
Abstract
Identifying kinase substrates and the specific phosphorylation sites they regulate is an important factor in understanding protein function regulation and signalling pathways. Computational prediction of kinase targets - assigning kinases to putative substrates, and selecting from protein sequence the sites that kinases can phosphorylate - requires the consideration of both the cellular context that kinases operate in, as well as their binding affinity. This consideration enables investigation of how phosphorylation influences a range of biological processes. We report here a novel probabilistic model for classifying kinase-specific phosphorylation sites from sequence across three model organisms: human, mouse and yeast. The model incorporates position-specific amino acid frequencies, and counts of co-occurring amino acids from kinase binding sites. We show how this model can be seamlessly integrated with protein interactions and cell-cycle abundance profiles. When evaluating the prediction accuracy of our method, PhosphoPICK, on an independent hold-out set of kinase-specific phosphorylation sites, it achieved an average specificity of 97%, with 32% sensitivity. We compared PhosphoPICK's ability, through cross-validation, to predict kinase-specific phosphorylation sites with alternative methods, and show that at high levels of specificity PhosphoPICK obtains greater sensitivity for most comparisons made. We investigated the relationship between kinase-specific phosphorylation sites and nuclear localisation signals. We show that kinases PKA, Akt1 and AurB have an over-representation of predicted binding sites at particular positions downstream from predicted nuclear localisation signals, demonstrating an important role for these kinases in regulating the nuclear import of proteins. PhosphoPICK is freely available as a web-service at http://bioinf.scmb.uq.edu.au/phosphopick.
Affiliation(s)
- Ralph Patrick
- School of Chemistry and Molecular Biosciences, The University of Queensland, St Lucia 4072, Australia.
- Coralie Horin
- Polytech Nice-Sophia, Université Nice Sophia-Antipolis, Nice 06103, France
- Bostjan Kobe
- School of Chemistry and Molecular Biosciences, The University of Queensland, St Lucia 4072, Australia; Institute for Molecular Bioscience, The University of Queensland, St Lucia 4072, Australia; Australian Infectious Diseases Research Centre, The University of Queensland, St Lucia 4072, Australia
- Kim-Anh Lê Cao
- The University of Queensland Diamantina Institute, Translational Research Institute, Woolloongabba, QLD 4102, Australia
- Mikael Bodén
- School of Chemistry and Molecular Biosciences, The University of Queensland, St Lucia 4072, Australia; Institute for Molecular Bioscience, The University of Queensland, St Lucia 4072, Australia
|
21
|
Chang CW, Couñago RM, Williams SJ, Bodén M, Kobe B. Distinctive Conformation of Minor Site-Specific Nuclear Localization Signals Bound to Importin-α. Traffic 2016; 17:704. [DOI: 10.1111/tra.12395] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]
|
22
|
Essebier A, Vera Wolf P, Cao MD, Carroll BJ, Balasubramanian S, Bodén M. Statistical Enrichment of Epigenetic States Around Triplet Repeats that Can Undergo Expansions. Front Neurosci 2016; 10:92. [PMID: 27013954 PMCID: PMC4782033 DOI: 10.3389/fnins.2016.00092] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 12/09/2015] [Accepted: 02/23/2016] [Indexed: 12/18/2022] Open
Abstract
More than 30 human genetic diseases are linked to tri-nucleotide repeat expansions. There is no known mechanism that explains repeat expansions in full, but changes in the epigenetic state of the associated locus have been implicated in the disease pathology for a growing number of examples. A comprehensive comparative analysis of the genomic features associated with diverse repeat expansions has been lacking. Here, in an effort to decipher the propensity of repeats to undergo expansion and result in a disease state, we determine the genomic coordinates of tri-nucleotide repeat tracts at base pair resolution and computationally establish epigenetic profiles around them. Using three complementary statistical tests, we reveal that several epigenetic states are enriched around repeats that are associated with disease, even in cells that do not harbor expansion, relative to a carefully stratified background. Analysis of over one hundred cell types reveals that epigenetic states generally tend to vary widely between genic regions and cell types. However, there is qualified consistency in the epigenetic signatures of repeats associated with disease, suggesting that changes to the chromatin and the DNA around an expanding repeat locus are likely to be similar. These epigenetic signatures may be exploited further to develop models that could explain the propensity of repeats to undergo expansions.
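The first step described in this abstract, locating tri-nucleotide repeat tracts at base-pair resolution, can be sketched with a regular expression. This is an illustrative sketch only, not the authors' pipeline: the function name, the `min_units` threshold, the exclusion of homopolymer units and the 0-based half-open coordinates are assumptions made here.

```python
import re


def find_trinucleotide_repeats(sequence, min_units=5):
    """Return (start, end, unit, n_units) for perfect tri-nucleotide tracts.

    Coordinates are 0-based and half-open. `min_units` is an assumed
    threshold for how many consecutive copies count as a tract.
    """
    # A 3-base unit followed by at least (min_units - 1) exact copies of itself.
    pattern = re.compile(r"([ACGT]{3})\1{%d,}" % (min_units - 1))
    tracts = []
    for match in pattern.finditer(sequence.upper()):
        unit = match.group(1)
        if len(set(unit)) == 1:
            continue  # skip homopolymer runs such as AAAAAA...
        n_units = (match.end() - match.start()) // 3
        tracts.append((match.start(), match.end(), unit, n_units))
    return tracts


# Toy sequence with six CAG units, invented for illustration.
print(find_trinucleotide_repeats("TTT" + "CAG" * 6 + "GGA"))
```

Only perfect repeats are reported; interrupted tracts would need a more tolerant matcher.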
Affiliation(s)
- Alexandra Essebier
- School of Chemistry and Molecular Biosciences, The University of Queensland St Lucia, QLD, Australia
- Patricia Vera Wolf
- School of Chemistry and Molecular Biosciences, The University of Queensland St Lucia, QLD, Australia
- Minh Duc Cao
- School of Chemistry and Molecular Biosciences, The University of Queensland St Lucia, QLD, Australia
- Bernard J Carroll
- School of Chemistry and Molecular Biosciences, The University of Queensland St Lucia, QLD, Australia
- Mikael Bodén
- School of Chemistry and Molecular Biosciences, The University of Queensland St Lucia, QLD, Australia
|
23
|
Abstract
Methods for measuring genetic distances in phylogenetics are known to be sensitive to the evolutionary model assumed. However, there is a lack of established methodology to accommodate the trade-off between incorporating sufficient biological reality and avoiding model overfitting. In addition, as traditional methods measure distances based on the observed number of substitutions, they tend to underestimate distances between diverged sequences due to backward and parallel substitutions. Various techniques have been proposed to correct this, but they lack robustness to sequences that are distantly related and of unequal base frequencies. In this article, we present a novel genetic distance estimate based on information theory that overcomes the above two hurdles. Instead of examining the observed number of substitutions, this method estimates genetic distances using Shannon's mutual information. This naturally provides an effective framework for balancing model complexity and goodness of fit. Our distance estimate is shown to be approximately linear in elapsed time and hence is less sensitive to the divergence of sequence data and to compositionally biased sequences. Using extensive simulation data, we show that our method 1) consistently reconstructs more accurate phylogeny topologies than existing methods, 2) is robust in extreme conditions such as diverged phylogenies, unequal base frequencies, and heterogeneous mutation patterns, and 3) scales well with large phylogenies.
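To make the central quantity concrete, the sketch below computes a plug-in estimate of Shannon's mutual information between two aligned sequences from their column-wise symbol pairs. It is only an illustration of the quantity named in the abstract: how the authors turn mutual information into a distance, and how they model the sequences, is not stated here, and the function name and toy inputs are assumptions.

```python
import math
from collections import Counter


def mutual_information(seq_a, seq_b):
    """Plug-in mutual information (in bits) between two aligned sequences.

    Each alignment column contributes one (x, y) observation. This simple
    frequency-based estimate is an assumption of the sketch, not the
    estimator used in the paper.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    n = len(seq_a)
    joint = Counter(zip(seq_a, seq_b))
    marg_a = Counter(seq_a)
    marg_b = Counter(seq_b)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log2(p_xy / ((marg_a[x] / n) * (marg_b[y] / n)))
    return mi


# Identical sequences share all their information; these unrelated ones share none.
print(mutual_information("ACGT", "ACGT"))
print(mutual_information("AACC", "ACAC"))
```

Higher mutual information indicates more closely related sequences, so any distance built on it would decrease as this value grows.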
Affiliation(s)
- Minh Duc Cao
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, Australia; Clayton School of Information Technology, Monash University, Clayton, VIC, Australia
- Lloyd Allison
- Clayton School of Information Technology, Monash University, Clayton, VIC, Australia
- Trevor I Dix
- Clayton School of Information Technology, Monash University, Clayton, VIC, Australia
- Mikael Bodén
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, Australia; School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, QLD, Australia
|
24
|
Róna G, Borsos M, Ellis JJ, Mehdi AM, Christie M, Környei Z, Neubrandt M, Tóth J, Bozóky Z, Buday L, Madarász E, Bodén M, Kobe B, Vértessy BG. Dynamics of re-constitution of the human nuclear proteome after cell division is regulated by NLS-adjacent phosphorylation. Cell Cycle 2015; 13:3551-64. [PMID: 25483092 DOI: 10.4161/15384101.2014.960740] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Indexed: 11/19/2022] Open
Abstract
Phosphorylation by the cyclin-dependent kinase 1 (Cdk1) adjacent to nuclear localization signals (NLSs) is an important mechanism of regulation of nucleocytoplasmic transport. However, no systematic survey has yet been performed in human cells to analyze this regulatory process, and the corresponding cell-cycle dynamics have not yet been investigated. Here, we focused on the human proteome and found that numerous proteins, previously not identified in this context, are associated with Cdk1-dependent phosphorylation sites adjacent to their NLSs. Interestingly, these proteins are involved in key regulatory events of DNA repair, epigenetics, or RNA editing and splicing. This finding indicates that cell-cycle dependent events of genome editing and gene expression profiling may be controlled by nucleocytoplasmic trafficking. For in-depth investigations, we selected a number of these proteins and analyzed how point mutations, expected to modify the phosphorylation ability of the NLS segments, perturb nucleocytoplasmic localization. In each case, we found that mutations mimicking hyper-phosphorylation abolish nuclear import processes. To understand the mechanism underlying these phenomena, we performed a video microscopy-based kinetic analysis to obtain information on cell-cycle dynamics on a model protein, dUTPase. We show that the NLS-adjacent phosphorylation by Cdk1 of human dUTPase, an enzyme essential for genomic integrity, results in dynamic cell cycle-dependent distribution of the protein. Non-phosphorylatable mutants have drastically altered protein re-import characteristics into the nucleus during the G1 phase. Our results suggest a dynamic Cdk1-driven mechanism of regulation of the nuclear proteome composition during the cell cycle.
Key Words
- Cdc28, cyclin-dependent protein kinase (Cdk) encoded by CDC28
- Cdk1, cyclin-dependent kinase 1
- GO, gene ontology
- NES, nuclear export signal
- NLS, nuclear localization signal
- SNP, single nucleotide polymorphisms
- SV40, Simian virus 40
- UBA1, Ubiquitin-activating enzyme E1
- UNG2, Human Uracil-DNA glycosylase 2
- cNLS, classical nuclear localization signal
- cell cycle
- dNTP, deoxyribonucleotide triphosphate
- dTTP, deoxythymidine triphosphate
- dUMP, deoxyuridine monophosphate
- dUTP, deoxyuridine triphosphate
- dUTPase
- importin
- phosphorylation
- trafficking
Affiliation(s)
- Gergely Róna
- Institute of Enzymology, RCNS, Hungarian Academy of Sciences, Budapest, Hungary
|
25
|
Abstract
MOTIVATION The determinants of kinase-substrate phosphorylation can be found both in the substrate sequence and the surrounding cellular context. Cell cycle progression, interactions with mediating proteins and even prior phosphorylation events are necessary for kinases to maintain substrate specificity. While much work has focussed on the use of sequence-based methods to predict phosphorylation sites, there has been very little work invested into the application of systems biology to understand phosphorylation. Lack of specificity in many kinase substrate binding motifs means that sequence methods for predicting kinase binding sites are susceptible to high false-positive rates. RESULTS We present here a model that takes into account protein-protein interaction information, and protein abundance data across the cell cycle to predict kinase substrates for 59 human kinases that are representative of important biological pathways. The model shows high accuracy for substrate prediction (with an average AUC of 0.86) across the 59 kinases tested. When using the model to complement sequence-based kinase-specific phosphorylation site prediction, we found that the additional information increased prediction performance for most comparisons made, particularly on kinases from the CMGC family. We then used our model to identify functional overlaps between predicted CDK2 substrates and targets from the E2F family of transcription factors. Our results demonstrate that a model harnessing context data can account for the short-falls in sequence information and provide a robust description of the cellular events that regulate protein phosphorylation. AVAILABILITY AND IMPLEMENTATION The method is freely available online as a web server at the website http://bioinf.scmb.uq.edu.au/phosphopick. CONTACT m.boden@uq.edu.au SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ralph Patrick
- School of Chemistry and Molecular Biosciences and Queensland Facility for Advanced Bioinformatics, The University of Queensland, St Lucia 4072, Translational Research Institute, The University of Queensland Diamantina Institute, Brisbane, St Lucia 4102, Institute for Molecular Bioscience and Australian Infectious Diseases Research Centre, The University of Queensland, St Lucia, 4072, Australia
- Kim-Anh Lê Cao
- School of Chemistry and Molecular Biosciences and Queensland Facility for Advanced Bioinformatics, The University of Queensland, St Lucia 4072, Translational Research Institute, The University of Queensland Diamantina Institute, Brisbane, St Lucia 4102, Institute for Molecular Bioscience and Australian Infectious Diseases Research Centre, The University of Queensland, St Lucia, 4072, Australia
- Bostjan Kobe
- School of Chemistry and Molecular Biosciences and Queensland Facility for Advanced Bioinformatics, The University of Queensland, St Lucia 4072, Translational Research Institute, The University of Queensland Diamantina Institute, Brisbane, St Lucia 4102, Institute for Molecular Bioscience and Australian Infectious Diseases Research Centre, The University of Queensland, St Lucia, 4072, Australia
- Mikael Bodén
- School of Chemistry and Molecular Biosciences and Queensland Facility for Advanced Bioinformatics, The University of Queensland, St Lucia 4072, Translational Research Institute, The University of Queensland Diamantina Institute, Brisbane, St Lucia 4102, Institute for Molecular Bioscience and Australian Infectious Diseases Research Centre, The University of Queensland, St Lucia, 4072, Australia
|
26
|
Chang CW, Couñago R, Williams S, Bodén M, Kobe B. Importin alpha and nonclassical nuclear localization signal. Acta Crystallogr A Found Adv 2014. [DOI: 10.1107/s205327331408365x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022] Open
Abstract
In the classical nuclear import pathway, the specific recognition between the nuclear import receptor (importin-α) and nuclear localization signals (NLSs) plays an essential role in facilitating cargo import. Importin-α has two separate NLS-binding sites (the major and the minor sites) that accommodate NLSs comprising one (monopartite) or two (bipartite) clusters of basic residues, the latter connected by a 10-12 residue linker. The major NLS-binding site is the preferential binding site for most of the monopartite NLSs characterized to date. By screening random peptide libraries using importin-α variants as bait, the bound NLS sequences could be divided into six classes [1]. The class-3 minor site-specific NLSs and class-5 plant-specific NLSs feature a shorter basic cluster. The molecular basis of the specific binding between these non-classical NLSs and importin-α was not known and, in particular, there was a lack of crystal structures of plant importin-α. Here, we present the first crystal structure of plant importin-α and explain the differential binding specificity between the class-5 plant-specific NLSs and importin-α variants [2]. The binding conformation of the class-3 minor site-specific NLSs features an α-helical turn that is distinct from the other NLSs characterized structurally [3]. Comparative bioinformatic screens not only indicate that both plant-specific and minor site-specific NLSs are much less prevalent than the classical NLSs, but also reveal a greater prevalence of these two classes of non-classical NLSs in the rice proteome compared to those of yeast, mammals, and even other plant species. Together, our data can help to characterize novel proteins containing non-classical NLSs destined for the cell nucleus by the classical nuclear import pathway.
|
27
|
Abstract
Protein synthesis is finely regulated across all organisms, from bacteria to humans, and its integrity underpins many important processes. Emerging evidence suggests that the dynamic range of protein abundance is greater than that observed at the transcript level. Technological breakthroughs now mean that sequencing-based measurement of mRNA levels is routine, but protocols for measuring protein abundance remain both complex and expensive. This paper introduces a Bayesian network that integrates transcriptomic and proteomic data to predict protein abundance and to model the effects of its determinants. We aim to use this model to follow a molecular response over time, from condition-specific data, in order to understand adaptation during processes such as the cell cycle. With microarray data now available for many conditions, the general utility of a protein abundance predictor is broad. Whereas most quantitative proteomics studies have focused on higher organisms, we developed a predictive model of protein abundance for both Saccharomyces cerevisiae and Schizosaccharomyces pombe to explore the latitude at the protein level. Our predictor primarily relies on mRNA level, mRNA-protein interaction, mRNA folding energy and half-life, and tRNA adaptation. The combination of key features, allowing for the low certainty and uneven coverage of experimental observations, gives comparatively minor but robust prediction accuracy. The model substantially improved the analysis of protein regulation during the cell cycle: predicted protein abundance identified twice as many cell-cycle-associated proteins as experimental mRNA levels. Predicted protein abundance was more dynamic than observed mRNA expression, agreeing with experimental protein abundance from a human cell line. We illustrate how the same model can be used to predict the folding energy of mRNA when protein abundance is available, lending credence to the emerging view that mRNA folding affects translation efficiency. 
The software and data used in this research are available at http://bioinf.scmb.uq.edu.au/proteinabundance/.
Affiliation(s)
- Ahmed M Mehdi
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, 4072, Australia
|
28
|
Abstract
Directed evolution methods have proved to be highly effective in the design of novel proteins and in the generation of large libraries of diverse sequences. However, searching through the vast number of mutants produced during such experiments in order to find the best represents a daunting and difficult task. In recent years, a number of computational tools have been developed to provide guidance during this exploratory process. It can, however, be unclear as to which tool or tools best complement the chosen library design strategy. In this review, we describe and critically evaluate some of the more notable tools in this area, discussing the rationale behind each, the requirements for their implementation, and potential issues faced when using them. Some examples of their application in an experimental setting are also provided. The tools have been classified based on contrasting strategies as to how they function: prospective tools SCHEMA and OPTCOMB use extant sequence and structural data to predict optimal locations for crossover sites, whereas retrospective tools ProSAR and ASRA use property data from the mutant library to predict beneficial mutations and features. From our evaluation, we suggest that each tool can play a role in the design process; however this is largely dictated by the data available and the desired experimental strategy for the project.
Affiliation(s)
- Julian Zaugg
- School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia
|
29
|
Cao MD, Tasker E, Willadsen K, Imelfort M, Vishwanathan S, Sureshkumar S, Balasubramanian S, Bodén M. Inferring short tandem repeat variation from paired-end short reads. Nucleic Acids Res 2013; 42:e16. [PMID: 24353318 PMCID: PMC3919575 DOI: 10.1093/nar/gkt1313] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.5] [Indexed: 12/31/2022] Open
Abstract
Advances in high-throughput sequencing offer an unprecedented opportunity to study genetic variation. This is challenged by the difficulty of resolving variant calls in repetitive DNA regions. We present a Bayesian method to estimate repeat-length variation from paired-end sequence read data. The method makes variant calls based on deviations in sequence fragment sizes, allowing the analysis of repeats at lengths of relevance to a range of phenotypes. We demonstrate the method’s ability to detect and quantify changes in repeat lengths from short read genomic sequence data across genotypes. We use the method to estimate repeat variation among 12 strains of Arabidopsis thaliana and demonstrate experimentally that our method compares favourably against existing methods. Using this method, we have identified all repeats across the genome that are likely to be polymorphic. In addition, our predicted polymorphic repeats included the only known repeat expansion in A. thaliana, suggesting an ability to discover potential unstable repeats.
Affiliation(s)
- Minh Duc Cao
- School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, St Lucia QLD 4072, Australia, Clayton School of Information Technology, Monash University, Clayton, VIC 3800, Australia, School of Biological Sciences, Monash University, Melbourne, Australia and Advanced Water Management Centre, The University of Queensland, Queensland, Australia
|
30
|
Chang CW, Couñago RM, Williams SJ, Bodén M, Kobe B. Distinctive conformation of minor site-specific nuclear localization signals bound to importin-α. Traffic 2013; 14:1144-54. [PMID: 23910026 DOI: 10.1111/tra.12098] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.4] [Received: 05/07/2013] [Revised: 07/31/2013] [Accepted: 08/02/2013] [Indexed: 11/30/2022]
Abstract
Nuclear localization signals (NLSs) contain one or two clusters of basic residues and are recognized by the import receptor importin-α. There are two NLS-binding sites (major and minor) on importin-α and the major NLS-binding site is considered to be the primary binding site. Here, we used crystallographic and biochemical methods to investigate the binding between importin-α and predicted 'minor site-specific' NLSs: four peptide library-derived peptides, and the NLS from mouse RNA helicase II/Guα. The crystal structures reveal that these atypical NLSs indeed preferentially bind to the minor NLS-binding site. Unlike previously characterized NLSs, the C-terminal residues of these NLSs form an α-helical turn, stabilized by internal H-bond and cation-π interactions between the aromatic residues from the NLSs and the positively charged residues from importin-α. This helical turn sterically hinders binding at the major NLS-binding site, explaining the minor-site preference. Our data suggest the sequence RXXKR[K/X][F/Y/W]XXAF as the optimal minor NLS-binding site-specific motif, which may help identify novel proteins with atypical NLSs.
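The proposed motif RXXKR[K/X][F/Y/W]XXAF can be turned into a simple sequence scan. The sketch below is an illustration under stated assumptions, not a tool from the paper: it reads X as any of the 20 standard amino acids, reads [K/X] as "K or any residue" (so effectively any residue), and uses a hypothetical function name and an invented test sequence.

```python
import re

ANY = "[ACDEFGHIKLMNPQRSTVWY]"  # X: any standard amino acid (assumed reading)

# R X X K R [K/X] [F/Y/W] X X A F, wrapped in a lookahead so that
# overlapping occurrences are also reported.
MINOR_SITE_MOTIF = re.compile(f"(?=(R{ANY}{{2}}KR{ANY}[FYW]{ANY}{{2}}AF))")


def find_minor_site_motifs(protein):
    """Return (0-based start, matched 11-residue segment) for each motif hit."""
    return [
        (match.start(1), match.group(1))
        for match in MINOR_SITE_MOTIF.finditer(protein.upper())
    ]


# Invented sequence containing one motif instance starting at index 2.
print(find_minor_site_motifs("MSRAAKRKFGGAFTT"))
```

A match only flags a candidate; the abstract presents the motif as an aid to identifying proteins with atypical NLSs, not as proof of minor-site binding.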
Affiliation(s)
- Chiung-Wen Chang
- School of Chemistry and Molecular Biosciences and Institute for Molecular Bioscience, University of Queensland, Brisbane, Qld, 4072, Australia; Australian Infectious Diseases Research Centre, University of Queensland, Brisbane, Qld, 4072, Australia
|
31
|
Oyarzún P, Ellis JJ, Bodén M, Kobe B. PREDIVAC: CD4+ T-cell epitope prediction for vaccine design that covers 95% of HLA class II DR protein diversity. BMC Bioinformatics 2013; 14:52. [PMID: 23409948 PMCID: PMC3598884 DOI: 10.1186/1471-2105-14-52] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.0] [Received: 10/11/2012] [Accepted: 01/31/2013] [Indexed: 12/18/2022] Open
Abstract
Background CD4+ T-cell epitopes play a crucial role in eliciting vigorous protective immune responses during peptide (epitope)-based vaccination. The prediction of these epitopes focuses on the peptide binding process by MHC class II proteins. The ability to account for MHC class II polymorphism is critical for epitope-based vaccine design tools, as different allelic variants can have different peptide repertoires. In addition, the specificity of CD4+ T-cells is often directed to a very limited set of immunodominant peptides in pathogen proteins. The ability to predict which epitopes are most likely to dominate an immune response remains a challenge. Results We developed the computational tool Predivac to predict CD4+ T-cell epitopes. Predivac can make predictions for 95% of all MHC class II protein variants (allotypes), a substantial advance over other available methods. Predivac bases its prediction on the concept of specificity-determining residues. The performance of the method was assessed both for high-affinity HLA class II peptide binding and CD4+ T-cell epitope prediction. In terms of epitope prediction, Predivac outperformed three available pan-specific approaches (delivering the highest specificity). A central finding was the high accuracy delivered by the method in the identification of immunodominant and promiscuous CD4+ T-cell epitopes, which play an essential role in epitope-based vaccine design. Conclusions The comprehensive HLA class II allele coverage along with the high specificity in identifying immunodominant CD4+ T-cell epitopes makes Predivac a valuable tool to aid epitope-based vaccine design in the context of a genetically heterogeneous human population. The tool is available at: http://predivac.biosci.uq.edu.au/.
Affiliation(s)
- Patricio Oyarzún
- School of Chemistry and Molecular Biosciences, Institute for Molecular Bioscience and Australian Infectious Diseases Research Centre, University of Queensland, Brisbane, QLD 4072, Australia.
|
32
|
Willadsen K, Cao MD, Wiles J, Balasubramanian S, Bodén M. Repeat-encoded poly-Q tracts show statistical commonalities across species. BMC Genomics 2013; 14:76. [PMID: 23374135 PMCID: PMC3617014 DOI: 10.1186/1471-2164-14-76] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 09/07/2012] [Accepted: 01/18/2013] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Among repetitive genomic sequences, the class of tri-nucleotide repeats has received much attention due to its association with human diseases. Tri-nucleotide repeat diseases are caused by excessive sequence length variability; diseases such as Huntington's disease and Fragile X syndrome are tied to an increase in the number of repeat units in a tract. Motivated by the recent discovery of a tri-nucleotide repeat associated genetic defect in Arabidopsis thaliana, this study takes a cross-species approach to investigating these repeat tracts, with the goal of using commonalities between species to identify potential disease-related properties. RESULTS We find that statistical enrichment in regulatory function associations for coding region repeats - previously observed in human - is consistent across multiple organisms. By distinguishing between homo-amino acid tracts that are encoded by tri-nucleotide repeats, and those encoded by varying codons, we show that amino acid repeats - not tri-nucleotide repeats - fully explain these regulatory associations. Using this same separation between repeat- and non-repeat-encoded homo-amino acid tracts, we show that poly-glutamine tracts are disproportionately encoded by tri-nucleotide repeats, and those tracts that are encoded by tri-nucleotide repeats are also significantly longer; these results are consistent across multiple species. CONCLUSION These findings establish similarities in tri-nucleotide repeats across species at the level of protein functionality and protein sequence. The tendency of tri-nucleotide repeats to encode longer poly-glutamine tracts indicates a link with the poly-glutamine repeat diseases. The cross-species nature of this tendency suggests that unknown repeat diseases are yet to be uncovered in other species. Future discoveries of new non-human repeat associated defects may provide the breadth of information needed to unravel the mechanisms that underpin this class of human disease.
Affiliation(s)
- Kai Willadsen
- School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane QLD 4072, Australia
|
33
|
|
34
|
Chang CW, Couñago RLM, Williams SJ, Bodén M, Kobe B. Crystal structure of rice importin-α and structural basis of its interaction with plant-specific nuclear localization signals. Plant Cell 2012; 24:5074-88. [PMID: 23250448 PMCID: PMC3556976 DOI: 10.1105/tpc.112.104422] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.8] [Received: 08/24/2012] [Revised: 10/22/2012] [Accepted: 11/26/2012] [Indexed: 05/22/2023]
Abstract
In the classical nucleocytoplasmic import pathway, nuclear localization signals (NLSs) in cargo proteins are recognized by the import receptor importin-α. Importin-α has two separate NLS binding sites (the major and the minor site), both of which recognize positively charged amino acid clusters in NLSs. Little is known about the molecular basis of the unique features of the classical nuclear import pathway in plants. We determined the crystal structure of rice (Oryza sativa) importin-α1a at 2-Å resolution. The structure reveals that the autoinhibitory mechanism mediated by the importin-β binding domain of importin-α operates in plants, with NLS-mimicking sequences binding to both minor and major NLS binding sites. Consistent with yeast and mammalian proteins, rice importin-α binds the prototypical NLS from simian virus 40 large T-antigen preferentially at the major NLS binding site. We show that two NLSs, previously described as plant specific, bind to and are functional with plant, mammalian, and yeast importin-α proteins but interact with rice importin-α more strongly. The crystal structures of their complexes with rice importin-α show that they bind to the minor NLS binding site. By contrast, the crystal structures of their complexes with mouse (Mus musculus) importin-α show preferential binding to the major NLS binding site. Our results reveal the molecular basis of a number of features of the classical nuclear transport pathway specific to plants.
Affiliation(s)
- Chiung-Wen Chang
- School of Chemistry and Molecular Biosciences and Institute for Molecular Bioscience, University of Queensland, Brisbane Qld 4072, Australia
- Australian Infectious Diseases Research Centre, University of Queensland, Brisbane Qld 4072, Australia
- Rafael Lemos Miguez Couñago
- School of Chemistry and Molecular Biosciences and Institute for Molecular Bioscience, University of Queensland, Brisbane Qld 4072, Australia
- Australian Infectious Diseases Research Centre, University of Queensland, Brisbane Qld 4072, Australia
- Simon J. Williams
- School of Chemistry and Molecular Biosciences and Institute for Molecular Bioscience, University of Queensland, Brisbane Qld 4072, Australia
- Australian Infectious Diseases Research Centre, University of Queensland, Brisbane Qld 4072, Australia
- Mikael Bodén
- School of Chemistry and Molecular Biosciences and Institute for Molecular Bioscience, University of Queensland, Brisbane Qld 4072, Australia
- School of Information Technology and Electrical Engineering, University of Queensland, Brisbane Qld 4072, Australia
- Boštjan Kobe
- School of Chemistry and Molecular Biosciences and Institute for Molecular Bioscience, University of Queensland, Brisbane Qld 4072, Australia
- Australian Infectious Diseases Research Centre, University of Queensland, Brisbane Qld 4072, Australia
- Address correspondence to
|
35
|
Mehdi AM, Sehgal MSB, Kobe B, Bailey TL, Bodén M. DLocalMotif: a discriminative approach for discovering local motifs in protein sequences. Bioinformatics 2012; 29:39-46. [PMID: 23142965 DOI: 10.1093/bioinformatics/bts654] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Indexed: 11/12/2022]
Abstract
MOTIVATION Local motifs are patterns of DNA or protein sequences that occur within a sequence interval relative to a biologically defined anchor or landmark. Current protein motif discovery methods do not adequately consider such constraints to identify biologically significant motifs that are only weakly over-represented but spatially confined. Using negatives, i.e. sequences known to not contain a local motif, can further increase the specificity of their discovery. RESULTS This article introduces the method DLocalMotif that makes use of positional information and negative data for local motif discovery in protein sequences. DLocalMotif combines three scoring functions, measuring degrees of motif over-representation, entropy and spatial confinement, specifically designed to discriminatively exploit the availability of negative data. The method is shown to outperform current methods that use only a subset of these motif characteristics. We apply the method to several biological datasets. The analysis of peroxisomal targeting signals uncovers several novel motifs that occur immediately upstream of the dominant peroxisomal targeting signal-1 signal. The analysis of proline-tyrosine nuclear localization signals uncovers multiple novel motifs that overlap with C2H2 zinc finger domains. We also evaluate the method on classical nuclear localization signals and endoplasmic reticulum retention signals and find that DLocalMotif successfully recovers biologically relevant sequence properties. AVAILABILITY http://bioinf.scmb.uq.edu.au/dlocalmotif/
Affiliation(s)
- Ahmed M Mehdi
- Institute for Molecular Bioscience, The University of Queensland, Australia
|
36
|
Willadsen K, Mohamad N, Bodén M. NSort/DB: an intranuclear compartment protein database. Genomics Proteomics Bioinformatics 2012; 10:226-9. [PMID: 23084778 PMCID: PMC5054713 DOI: 10.1016/j.gpb.2012.07.001] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 03/13/2012] [Accepted: 04/13/2012] [Indexed: 11/19/2022]
Abstract
Distinct substructures within the nucleus are associated with a wide variety of important nuclear processes. Structures such as chromatin and nuclear pores have specific roles, while others such as Cajal bodies are more functionally varied. Understanding the roles of these membraneless intra-nuclear compartments requires extensive data sets covering nuclear and compartment-associated proteins. NSort/DB is a database providing access to intra- or sub-nuclear compartment associations for the mouse nuclear proteome. Based on resources ranging from large-scale curated data sets to detailed experiments, this data set provides a high-quality set of annotations of non-exclusive association of nuclear proteins with structures such as promyelocytic leukaemia bodies and chromatin. The database is searchable by protein identifier or compartment, and has a documented web service API. The search interface, web service and data download are all freely available online at http://www.nsort.org/db/. Availability of this data set will enable systematic analyses of the protein complements of nuclear compartments, improving our understanding of the diverse functional repertoire of these structures.
Affiliation(s)
- Kai Willadsen
- School of Chemistry and Molecular Biosciences, The University of Queensland, St. Lucia, Australia.
|
37
|
Patrick R, Cao KAL, Davis M, Kobe B, Bodén M. Mapping the stabilome: a novel computational method for classifying metabolic protein stability. BMC Syst Biol 2012; 6:60. [PMID: 22682214 PMCID: PMC3439251 DOI: 10.1186/1752-0509-6-60] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 12/20/2011] [Accepted: 05/16/2012] [Indexed: 11/30/2022]
Abstract
Background The half-life of a protein is regulated by a range of system properties, including the abundance of components of the degradative machinery and protein modifiers. It is also influenced by protein-specific properties, such as a protein’s structural make-up and interaction partners. New experimental techniques coupled with powerful data integration methods now enable us to not only investigate what features govern protein stability in general, but also to build models that identify what properties determine each protein’s metabolic stability. Results In this work we present five groups of features useful for predicting protein stability: (1) post-translational modifications, (2) domain types, (3) structural disorder, (4) the identity of a protein’s N-terminal residue and (5) amino acid sequence. We incorporate these features into a predictive model with promising accuracy. At a 20% false positive rate, the model exhibits an 80% true positive rate, outperforming the only previously proposed stability predictor. We also investigate the impact of N-terminal protein tagging as used to generate the data set, in particular the impact it may have on the measurements for secreted and transmembrane proteins; we train and test our model on a subset of the data with those proteins removed, and show that the model sustains high accuracy. Finally, we estimate system-wide metabolic stability by surveying the whole human proteome. Conclusions We describe a variety of protein features that are significantly over- or under-represented in stable and unstable proteins, including phosphorylation, acetylation and destabilizing N-terminal residues. Bayesian networks are ideal for combining these features into a predictive model with superior accuracy and transparency compared to the only other proposed stability predictor. Furthermore, our stability predictions of the human proteome will find application in the analysis of functionally related proteins, shedding new light on regulation by protein synthesis and degradation.
Affiliation(s)
- Ralph Patrick
- School of Chemistry and Molecular Biosciences, The University of Queensland, St Lucia, Australia
|
38
|
Affiliation(s)
- Praveen K. Madala
- Institute for Molecular Bioscience, ‡School of Chemistry and Molecular Biosciences, and §School of Information Technology and Electrical Engineering, The University of Queensland, St. Lucia, QLD 4072, Australia
- David P. Fairlie
- Institute for Molecular Bioscience, ‡School of Chemistry and Molecular Biosciences, and §School of Information Technology and Electrical Engineering, The University of Queensland, St. Lucia, QLD 4072, Australia
- Mikael Bodén
- Institute for Molecular Bioscience, ‡School of Chemistry and Molecular Biosciences, and §School of Information Technology and Electrical Engineering, The University of Queensland, St. Lucia, QLD 4072, Australia
|
39
|
Suksawatchon J, Lursinsap C, Bodén M. COMPUTING THE REVERSAL DISTANCE BETWEEN GENOMES IN THE PRESENCE OF MULTI-GENE FAMILIES VIA BINARY INTEGER PROGRAMMING. J Bioinform Comput Biol 2011; 5:117-33. [PMID: 17477494 DOI: 10.1142/s0219720007002552] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 01/17/2006] [Revised: 06/16/2006] [Accepted: 10/30/2006] [Indexed: 11/18/2022]
Abstract
Hannenhalli and Pevzner developed the first polynomial-time algorithm for the combinatorial problem of sorting signed genomic data. Their algorithm determines the minimum number of reversals required for rearranging a genome to another — but only in the absence of gene duplicates. However, duplicates often account for 40% of a genome. In this paper, we show how to extend Hannenhalli and Pevzner's approach to deal with genomes with multi-gene families. We propose a new heuristic algorithm to compute the nearest reversal distance between two genomes with multi-gene families via binary integer programming. The experimental results on both synthetic and real biological data demonstrate that the proposed algorithm is able to find the reversal distance with high accuracy.
Affiliation(s)
- Jakkarin Suksawatchon
- Advanced Virtual and Intelligent Computing Center, Department of Mathematics, Chulalongkorn University, Bangkok 10330, Thailand.
|
40
|
Hawkins J, Bodén M. DETECTING AND SORTING TARGETING PEPTIDES WITH NEURAL NETWORKS AND SUPPORT VECTOR MACHINES. J Bioinform Comput Biol 2011; 4:1-18. [PMID: 16568539 DOI: 10.1142/s0219720006001771] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.5] [Received: 05/18/2005] [Revised: 07/30/2005] [Accepted: 07/31/2005] [Indexed: 11/18/2022]
Abstract
This paper presents a composite multi-layer classifier system for predicting the subcellular localization of proteins based on their amino acid sequence. The work is an extension of our previous predictor PProwler v1.1, which is itself built upon the series of predictors SignalP and TargetP. In this study we outline experiments conducted to improve the classifier design. The major improvement came from using support vector machines as a "smart gate" sorting the outputs of several different targeting peptide detection networks. Our final model (PProwler v1.2) gives MCC values of 0.873 for non-plant and 0.849 for plant proteins. The model improves upon the accuracy of our previous subcellular localization predictor (PProwler v1.1) by 2% for plant data (which represents a 7.5% improvement upon TargetP).
Affiliation(s)
- John Hawkins
- School of Information Technology and Electrical Engineering, The University of Queensland, QLD 4072, Australia.
|
41
|
Abstract
Motivation: Quantitative experimental analyses of the nuclear interior reveal a morphologically structured yet dynamic mix of membraneless compartments. Major nuclear events depend on the functional integrity and timely assembly of these intra-nuclear compartments. Yet, unknown drivers of protein mobility ensure that they are in the right place at the time when they are needed. Results: This study investigates determinants of associations between eight intra-nuclear compartments and their proteins in heterogeneous genome-wide data. We develop a model based on a range of candidate determinants, capable of mapping the intra-nuclear organization of proteins. The model integrates protein interactions, protein domains, post-translational modification sites and protein sequence data. The predictions of our model are accurate with a mean AUC (over all compartments) of 0.71. We present a complete map of the association of 3567 mouse nuclear proteins with intra-nuclear compartments. Each decision is explained in terms of essential interactions and domains, and qualified with a false discovery assessment. Using this resource, we uncover the collective role of transcription factors in each of the compartments. We create diagrams illustrating the outcomes of a Gene Ontology enrichment analysis. Associated with an extensive range of transcription factors, the analysis suggests that PML bodies coordinate regulatory immune responses. Contact: m.boden@uq.edu.au Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Denis C Bauer
- Queensland Brain Institute, School of Chemistry and Molecular Biosciences, Queensland Facility for Advanced Bioinformatics, The University of Queensland, St Lucia, Australia
|
42
|
Abstract
MOTIVATION Nucleo-cytoplasmic trafficking of proteins is a core regulatory process that sustains the integrity of the nuclear space of eukaryotic cells via an interplay between numerous factors. Despite progress on experimentally characterizing a number of nuclear localization signals, their presence alone remains an unreliable indicator of actual translocation. RESULTS This article introduces a probabilistic model that explicitly recognizes a variety of nuclear localization signals, and integrates relevant amino acid sequence and interaction data for any candidate nuclear protein. In particular, we develop and incorporate scoring functions based on distinct classes of classical nuclear localization signals. Our empirical results show that the model accurately predicts whether a protein is imported into the nucleus, surpassing the classification accuracy of similar predictors when evaluated on the mouse and yeast proteomes (area under the receiver operator characteristic curve of 0.84 and 0.80, respectively). The model also predicts the sequence position of a nuclear localization signal and whether it interacts with importin-α. AVAILABILITY http://pprowler.itee.uq.edu.au/NucImport
Affiliation(s)
- Ahmed M Mehdi
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, Australia
|
43
|
Marfori M, Mynott A, Ellis JJ, Mehdi AM, Saunders NFW, Curmi PM, Forwood JK, Bodén M, Kobe B. Molecular basis for specificity of nuclear import and prediction of nuclear localization. Biochim Biophys Acta 2010; 1813:1562-77. [PMID: 20977914 DOI: 10.1016/j.bbamcr.2010.10.013] [Citation(s) in RCA: 303] [Impact Index Per Article: 21.6] [Received: 06/15/2010] [Revised: 10/15/2010] [Accepted: 10/19/2010] [Indexed: 01/03/2023]
Abstract
Although proteins are translated on cytoplasmic ribosomes, many of these proteins play essential roles in the nucleus, mediating key cellular processes including but not limited to DNA replication and repair as well as transcription and RNA processing. Thus, understanding how these critical nuclear proteins are accurately targeted to the nucleus is of paramount importance in biology. Interaction and structural studies in recent years have jointly revealed some general rules on the specificity determinants of the recognition of nuclear targeting signals by their specific receptors, at least for two nuclear import pathways: (i) the classical pathway, which involves the classical nuclear localization sequences (cNLSs) and the receptors importin-α/karyopherin-α and importin-β/karyopherin-β1; and (ii) the karyopherin-β2 pathway, which employs the proline-tyrosine (PY)-NLSs and the receptor transportin-1/karyopherin-β2. The understanding of specificity rules allows the prediction of protein nuclear localization. We review the current understanding of the molecular determinants of the specificity of nuclear import, focusing on the importin-α•cargo recognition, as well as the currently available databases and predictive tools relevant to nuclear localization. This article is part of a Special Issue entitled: Regulation of Signaling and Cellular Fate through Modulation of Nuclear Protein Import.
Affiliation(s)
- Mary Marfori
- School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, Queensland 4072, Australia
|
44
|
You L, Brusic V, Gallagher M, Bodén M. Using Gaussian process with test rejection to detect T-cell epitopes in pathogen genomes. IEEE/ACM Trans Comput Biol Bioinform 2010; 7:741-751. [PMID: 21030740 DOI: 10.1109/tcbb.2008.131] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 05/30/2023]
Abstract
A major challenge in the development of peptide-based vaccines is finding the right immunogenic element, with efficient and long-lasting immunization effects, from large potential targets encoded by pathogen genomes. Computer models are convenient tools for scanning pathogen genomes to preselect candidate immunogenic peptides for experimental validation. Current methods predict many false positives resulting from a low prevalence of true positives. We develop a test reject method based on the prediction uncertainty estimates determined by Gaussian process regression. This method filters false positives among predicted epitopes from a pathogen genome. The performance of stand-alone Gaussian process regression is compared to other state-of-the-art methods using cross validation on 11 benchmark data sets. The results show that the Gaussian process method has the same accuracy as the top performing algorithms. The combination of Gaussian process regression with the proposed test reject method is used to detect true epitopes from the Vaccinia virus genome. The test rejection increases the prediction accuracy by reducing the number of false positives without sacrificing the method's sensitivity. We show that the Gaussian process in combination with test rejection is an effective method for prediction of T-cell epitopes in large and diverse pathogen genomes, where false positives are of concern.
Affiliation(s)
- Liwen You
- Department of Theoretical Physics, University of Lund, Lund, Sweden.
|
45
|
Bodén M, Dellaire G, Burrage K, Bailey TL. A Bayesian network model of proteins' association with promyelocytic leukemia (PML) nuclear bodies. J Comput Biol 2010; 17:617-30. [PMID: 20426694 DOI: 10.1089/cmb.2009.0140] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 02/05/2023] Open
Abstract
The modularity that nuclear organization brings has the potential to explain the function of aggregates of proteins and RNA. Promyelocytic leukemia nuclear bodies are implicated in important regulatory processes. To understand the complement of proteins associated with these intra-nuclear bodies, we construct a Bayesian network model that integrates sequence and protein-protein interaction data. The model predicts association with promyelocytic leukemia nuclear bodies accurately when interaction data is available. At a false positive rate of 10%, the true positive rate is almost 50%, indicated by an independent nuclear proteome reference set. The model provides strong support for further expanding the protein complement with several important regulators and a richer functional repertoire. Using special support vector machine (SVM)-nodes (equipped with string kernels), the Bayesian network is also able to produce predictions on the basis of sequence only, with an accuracy superior to that of baseline models. Supplementary Material is available online at www.liebertonline.com.
Affiliation(s)
- Mikael Bodén
- Institute for Molecular Bioscience, University of Queensland, St. Lucia, Queensland, Australia.
|
46
|
Bauer DC, Buske FA, Bailey TL, Bodén M. Predicting SUMOylation sites in developmental transcription factors of Drosophila melanogaster. Neurocomputing 2010. [DOI: 10.1016/j.neucom.2010.01.022] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 12/26/2022]
|
47
|
Mohamad N, Bodén M. The proteins of intra-nuclear bodies: a data-driven analysis of sequence, interaction and expression. BMC Syst Biol 2010; 4:44. [PMID: 20388198 PMCID: PMC2859750 DOI: 10.1186/1752-0509-4-44] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Received: 09/04/2009] [Accepted: 04/13/2010] [Indexed: 12/21/2022]
Abstract
Background Cajal bodies, nucleoli, PML nuclear bodies, and nuclear speckles are morphologically distinct intra-nuclear structures that dynamically respond to cellular cues. Such nuclear bodies are hypothesized to play important regulatory roles, e.g. by sequestering and releasing transcription factors in a timely manner. While the nucleolus and nuclear speckles have received more attention experimentally, the PML nuclear body and the Cajal body are still incompletely characterized in terms of their roles and protein complement. Results By collating recent experimentally verified data, we find that almost 1000 proteins in the mouse nuclear proteome are known to associate with one or more of the nuclear bodies. Their gene ontology terms highlight their regulatory roles: splicing is confirmed to be a core activity of speckles and PML nuclear bodies house a range of proteins involved in DNA repair. We train support-vector machines to show that nuclear proteins contain discriminative sequence features that can be used to identify their intra-nuclear body associations. Prediction accuracy is highest for nucleoli and nuclear speckles. The trained models are also used to estimate the full protein complement of each nuclear body. Protein interactions are found primarily to link proteins in the nuclear speckles with proteins from other compartments. Cell cycle expression data provide support for increased activity in nucleoli, nuclear speckles and PML nuclear bodies especially during S and G2 phases. Conclusions The large-scale analysis of the mouse nuclear proteome sheds light on the functional organization of physically embodied intra-nuclear compartments. We observe partial support for the hypothesis that the physical organization of the nucleus mirrors functional modularity. However, we are unable to unambiguously identify proteins' intra-nuclear destination, suggesting that critical drivers of intra-nuclear translocation are yet to be identified.
Affiliation(s)
- Nurul Mohamad
- Institute for Molecular Bioscience, The University of Queensland QLD 4072, Australia
|
48
|
Bailey TL, Bodén M, Whitington T, Machanick P. The value of position-specific priors in motif discovery using MEME. BMC Bioinformatics 2010; 11:179. [PMID: 20380693 PMCID: PMC2868008 DOI: 10.1186/1471-2105-11-179] [Citation(s) in RCA: 74] [Impact Index Per Article: 5.3] [Received: 02/10/2010] [Accepted: 04/09/2010] [Indexed: 11/23/2022] Open
Abstract
BACKGROUND Position-specific priors have been shown to be a flexible and elegant way to extend the power of Gibbs sampler-based motif discovery algorithms. Information of many types, including sequence conservation, nucleosome positioning, and negative examples, can be converted into a prior over the location of motif sites, which then guides the sequence motif discovery algorithm. This approach has been shown to confer many of the benefits of conservation-based and discriminative motif discovery approaches on Gibbs sampler-based motif discovery methods, but has not previously been studied with methods based on expectation maximization (EM). RESULTS We extend the popular EM-based MEME algorithm to utilize position-specific priors and demonstrate their effectiveness for discovering transcription factor (TF) motifs in yeast and mouse DNA sequences. Utilizing a discriminative, conservation-based prior dramatically improves MEME's ability to discover motifs in 156 yeast TF ChIP-chip datasets, more than doubling the number of datasets where it finds the correct motif. On these datasets, MEME using the prior has a higher success rate than eight other conservation-based motif discovery approaches. We also show that the same type of prior improves the accuracy of motifs discovered by MEME in mouse TF ChIP-seq data, and that the motifs tend to be of slightly higher quality than those found by a Gibbs sampling algorithm using the same prior. CONCLUSIONS We conclude that using position-specific priors can substantially increase the power of EM-based motif discovery algorithms such as MEME.
Affiliation(s)
- Timothy L Bailey
- Institute for Molecular Bioscience, The University of Queensland, Brisbane 4072, Queensland, Australia
- Mikael Bodén
- Institute for Molecular Bioscience, The University of Queensland, Brisbane 4072, Queensland, Australia
- Tom Whitington
- Institute for Molecular Bioscience, The University of Queensland, Brisbane 4072, Queensland, Australia
- Philip Machanick
- Institute for Molecular Bioscience, The University of Queensland, Brisbane 4072, Queensland, Australia
|
49
|
Abstract
Motivation: Transcription factors (TFs) are crucial during the lifetime of the cell. Their functional roles are defined by the genes they regulate. Uncovering these roles not only sheds light on the TF at hand but puts it into the context of the complete regulatory network. Results: Here, we present an alignment- and threshold-free comparative genomics approach for assigning functional roles to DNA regulatory motifs. We incorporate our approach into the Gomo algorithm, a computational tool for detecting associations between a user-specified DNA regulatory motif [expressed as a position weight matrix (PWM)] and Gene Ontology (GO) terms. Incorporating multiple species into the analysis significantly improves Gomo's ability to identify GO terms associated with the regulatory targets of TFs. Including three comparative species in the process of predicting TF roles in Saccharomyces cerevisiae and Homo sapiens increases the number of significant predictions by 75% and 200%, respectively. The predicted GO terms are also more specific, yielding deeper biological insight into the role of the TF. Adjusting motif (binding) affinity scores for individual sequence composition proves to be essential for avoiding false positive associations. We describe a novel DNA sequence-scoring algorithm that compensates a thermodynamic measure of DNA-binding affinity for individual sequence base composition. Gomo's prediction accuracy proves to be relatively insensitive to how promoters are defined. Because Gomo uses a threshold-free form of gene set analysis, there are no free parameters to tune. Biologists can investigate the potential roles of DNA regulatory motifs of interest using Gomo via the web (http://meme.nbcr.net). Contact: t.bailey@uq.edu.au Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Fabian A Buske
- Institute for Molecular Bioscience, The University of Queensland, Brisbane QLD 4072, Australia
|
50
|
Buske FA, Their R, Gillam EMJ, Bodén M. In silico characterization of protein chimeras: Relating sequence and function within the same fold. Proteins 2009; 77:111-20. [DOI: 10.1002/prot.22422] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 11/08/2022]
|