1. Jagadeesh J, Vembar SS. Evolution of sequence, structural and functional diversity of the ubiquitous DNA/RNA-binding Alba domain. Sci Rep 2024; 14:30363. [PMID: 39638848 PMCID: PMC11621453 DOI: 10.1038/s41598-024-79937-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2024] [Accepted: 11/13/2024] [Indexed: 12/07/2024] Open
Abstract
The DNA/RNA-binding Alba domain is prevalent across all kingdoms of life. First discovered in archaea, this protein domain has evolved from RNA- to DNA-binding, with a concomitant expansion in the range of cellular processes that it regulates. Despite its widespread presence, the full extent of its sequence, structural, and functional diversity remains unexplored. In this study, we employed iterative searches in PSI-BLAST to identify 15,161 unique Alba domain-containing proteins from the NCBI non-redundant protein database. Sequence similarity network (SSN) analysis clustered them into 13 distinct subgroups, including the archaeal Alba and eukaryotic Rpp20/Pop7 and Rpp25/Pop6 groups, as well as novel fungal and Plasmodium-specific Albas. Sequence and structural conservation analysis of the subgroups indicated high preservation of the dimer interface, with Alba domains from unicellular eukaryotes notably exhibiting structural deviations towards their C-terminal end. Finally, phylogenetic analysis, while supporting SSN clustering, revealed the evolutionary branchpoint at which the eukaryotic Rpp20- and Rpp25-like clades emerged from archaeal Albas, and the subsequent taxonomic lineage-based divergence within each clade. Taken together, this comprehensive analysis enhances our understanding of the evolutionary history of Alba domain-containing proteins across diverse organisms.
Affiliation(s)
- Jaiganesh Jagadeesh
  - Institute of Bioinformatics and Applied Biotechnology, Bengaluru, Karnataka, India
2. Yadav AJ, Bhagat K, Padhi AK. Integrated computational characterization of valosin-containing protein double-psi β-barrel domain: Insights into structural stability, binding mechanisms, and evolutionary significance. Int J Biol Macromol 2024; 283:137865. [PMID: 39566806 DOI: 10.1016/j.ijbiomac.2024.137865] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/16/2024] [Revised: 11/13/2024] [Accepted: 11/17/2024] [Indexed: 11/22/2024]
Abstract
Valosin-containing protein (VCP) plays a crucial role in various cellular processes, yet the molecular mechanisms and structural dynamics of its double-psi β-barrel (DPBB) domain, particularly in humans, remain insufficiently explored. While previous studies have characterized the VCP_DPBB domain in other organisms, such as Thermoplasma acidophilum and Methanopyrus kandleri, its evolutionary conservation, binding potential, and stability in humans require further investigation. To address this gap, we first employed all-atom molecular dynamics (AAMD) simulations to examine the structural dynamics of the human VCP_DPBB domain. We also assessed its amino acid interaction energies, stability, folding enthalpy, evolutionary conservation, solubility, and crystallizability using various computational frameworks. Additionally, to uncover the plausible biological function, protein-peptide docking was performed to evaluate the interactions between the DPBB domain and the C-terminal gp78 peptide of the E3 ubiquitin ligase. Further, AAMD and coarse-grained molecular dynamics (CGMD) simulations explored the binding preferences, fluctuations, and stability of human VCP_DPBB-gp78 complexes. Our findings indicate that, while Thermoplasma acidophilum VCP_DPBB-gp78 showed stronger initial binding, the human VCP_DPBB-gp78 complex exhibited superior stability, binding affinity, and more stabilizing interactions. This integrated analysis provides valuable insights into the evolutionary significance and functionality of the DPBB domain, with potential therapeutic implications for VCP-related diseases.
Affiliation(s)
- Amar Jeet Yadav
  - Laboratory for Computational Biology & Biomolecular Design, School of Biochemical Engineering, Indian Institute of Technology (BHU), Varanasi 221005, Uttar Pradesh, India
- Khushboo Bhagat
  - Laboratory for Computational Biology & Biomolecular Design, School of Biochemical Engineering, Indian Institute of Technology (BHU), Varanasi 221005, Uttar Pradesh, India
- Aditya K Padhi
  - Laboratory for Computational Biology & Biomolecular Design, School of Biochemical Engineering, Indian Institute of Technology (BHU), Varanasi 221005, Uttar Pradesh, India
3. Yang YM, Fekete A, Arsenault J, Sengar AS, Aitoubah J, Grande G, Li A, Salter EW, Wang A, Mark MD, Herlitze S, Egan SE, Salter MW, Wang LY. Intersectin-1 enhances calcium-dependent replenishment of the readily releasable pool of synaptic vesicles during development. J Physiol 2024. [PMID: 39383250 DOI: 10.1113/jp286462] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/03/2024] [Accepted: 09/06/2024] [Indexed: 10/11/2024] Open
Abstract
Intersectin-1 (Itsn1) is a scaffold protein that plays a key role in coupling exocytosis and endocytosis of synaptic vesicles (SVs). However, it is unclear whether and how Itsn1 regulates these processes to support efficient neurotransmission during development. To address this, we examined the calyx of Held synapse in the auditory brainstem of wild-type and Itsn1 mutant mice before (immature) and after (mature) the onset of hearing. Itsn1 was present in the pre- and postsynaptic compartments at both developmental stages. Loss of function of Itsn1 did not alter presynaptic action potentials, Ca2+ entry via voltage-gated Ca2+ channels (VGCCs), transmitter release or short-term depression (STD) induced by depletion of SVs in the readily releasable pool (RRP) in either age group. Yet, fast Ca2+-dependent recovery from STD was attenuated in mature mutant synapses, while it was unchanged in immature mutant synapses. This deficit at mature synapses was rescued by introducing the DH-PH domains of Itsn1 into the presynaptic terminals. Inhibition of dynamin, which interacts with Itsn1 during endocytosis, had no effect on STD recovery. Interestingly, we found a developmental enrichment of Itsn1 near VGCCs, which may underlie the Itsn1-mediated fast replenishment of the RRP. Consequently, the absence of Itsn1 in mature synapses led to a higher failure rate of postsynaptic spiking during high-frequency synaptic transmission. Taken together, our findings suggest that Itsn1 translocation to the vicinity of VGCCs during development is crucial for accelerating Ca2+-dependent RRP replenishment and sustaining high-fidelity neurotransmission.
KEY POINTS:
- Itsn1 is expressed in the pre- and postsynaptic compartments of the calyx of Held synapse.
- Developmental upregulation of vesicular glutamate transporter-1 is Itsn1 dependent.
- Itsn1 does not affect basal synaptic transmission at different developmental stages.
- Itsn1 is required for Ca2+-dependent recovery from short-term depression in mature synapses.
- Itsn1 mediates the recovery through its DH-PH domains, independent of its interactive partner dynamin.
- Itsn1 translocates to the vicinity of presynaptic Ca2+ channels during development.
- Itsn1 supports high-fidelity neurotransmission by enabling rapid recovery from vesicular depletion during repetitive activity.
Affiliation(s)
- Yi-Mei Yang
  - Neurosciences and Mental Health, SickKids Research Institute, Toronto, Ontario, Canada
  - Department of Physiology, University of Toronto, Toronto, Ontario, Canada
  - Department of Biomedical Sciences, University of Minnesota, Duluth, Minnesota, USA
- Adam Fekete
  - Neurosciences and Mental Health, SickKids Research Institute, Toronto, Ontario, Canada
  - Department of Physiology, University of Toronto, Toronto, Ontario, Canada
- Jason Arsenault
  - Neurosciences and Mental Health, SickKids Research Institute, Toronto, Ontario, Canada
  - Department of Physiology, University of Toronto, Toronto, Ontario, Canada
- Ameet S Sengar
  - Neurosciences and Mental Health, SickKids Research Institute, Toronto, Ontario, Canada
- Jamila Aitoubah
  - Neurosciences and Mental Health, SickKids Research Institute, Toronto, Ontario, Canada
  - Department of Physiology, University of Toronto, Toronto, Ontario, Canada
- Giovanbattista Grande
  - Neurosciences and Mental Health, SickKids Research Institute, Toronto, Ontario, Canada
  - Department of Physiology, University of Toronto, Toronto, Ontario, Canada
- Angela Li
  - Neurosciences and Mental Health, SickKids Research Institute, Toronto, Ontario, Canada
- Eric W Salter
  - Neurosciences and Mental Health, SickKids Research Institute, Toronto, Ontario, Canada
  - Department of Physiology, University of Toronto, Toronto, Ontario, Canada
  - Department of Neuroscience, Brown University, Providence, Rhode Island, USA
- Alex Wang
  - Neurosciences and Mental Health, SickKids Research Institute, Toronto, Ontario, Canada
  - Department of Neuroscience, Yale University, New Haven, Connecticut, USA
- Melanie D Mark
  - Department of Zoology and Neurobiology, Ruhr-University Bochum, Bochum, Germany
- Stefan Herlitze
  - Department of Zoology and Neurobiology, Ruhr-University Bochum, Bochum, Germany
- Sean E Egan
  - Cell Biology, SickKids Research Institute, Toronto, Ontario, Canada
  - Department of Molecular Genetics, University of Toronto, Ontario, Canada
- Michael W Salter
  - Neurosciences and Mental Health, SickKids Research Institute, Toronto, Ontario, Canada
  - Department of Physiology, University of Toronto, Toronto, Ontario, Canada
- Lu-Yang Wang
  - Neurosciences and Mental Health, SickKids Research Institute, Toronto, Ontario, Canada
  - Department of Physiology, University of Toronto, Toronto, Ontario, Canada
4. Erckert K, Rost B. Assessing the role of evolutionary information for enhancing protein language model embeddings. Sci Rep 2024; 14:20692. [PMID: 39237735 PMCID: PMC11377704 DOI: 10.1038/s41598-024-71783-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2024] [Accepted: 08/30/2024] [Indexed: 09/07/2024] Open
Abstract
Embeddings from protein Language Models (pLMs) are replacing evolutionary information from multiple sequence alignments (MSAs) as the most successful input for protein prediction. Is this because embeddings capture evolutionary information? We tested various approaches to explicitly incorporate evolutionary information into embeddings on various protein prediction tasks. While older pLMs (SeqVec, ProtBert) significantly improved through MSAs, the more recent pLM ProtT5 did not benefit. For most tasks, pLM-based outperformed MSA-based methods, and the combination of both even decreased performance for some (intrinsic disorder). We highlight the effectiveness of pLM-based methods and find limited benefits from integrating MSAs.
Affiliation(s)
- Kyra Erckert
  - TUM School of Computation, Information and Technology, Bioinformatics & Computational Biology - i12, Boltzmannstr. 3, 85748, Garching/Munich, Germany
  - TUM Graduate School, Center of Doctoral Studies in Informatics and Its Applications (CeDoSIA), Boltzmannstr. 11, 85748, Garching, Germany
- Burkhard Rost
  - TUM School of Computation, Information and Technology, Bioinformatics & Computational Biology - i12, Boltzmannstr. 3, 85748, Garching/Munich, Germany
  - Institute for Advanced Study (TUM-IAS), Lichtenbergstr. 2a, 85748, Garching/Munich, Germany
  - TUM School of Life Sciences Weihenstephan (TUM-WZW), Alte Akademie 8, Freising, Germany
5. Tanoz I, Timsit Y. Protein Fold Usages in Ribosomes: Another Glance to the Past. Int J Mol Sci 2024; 25:8806. [PMID: 39201491 PMCID: PMC11354259 DOI: 10.3390/ijms25168806] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2024] [Revised: 08/07/2024] [Accepted: 08/08/2024] [Indexed: 09/02/2024] Open
Abstract
The analysis of protein fold usage, similar to codon usage, offers profound insights into the evolution of biological systems and the origins of modern proteomes. While previous studies have examined fold distribution in modern genomes, our study focuses on the comparative distribution and usage of protein folds in ribosomes across bacteria, archaea, and eukaryotes. We identify the prevalence of certain 'super-ribosome folds,' such as the OB fold in bacteria and the SH3 domain in archaea and eukaryotes. The observed protein fold distribution in the ribosomes foreshadows the later power-law distribution, in which only a few folds are highly prevalent and most are rare. Additionally, we highlight the presence of three copies of proto-Rossmann folds in ribosomes across all kingdoms, showing their ancient and fundamental role in ribosomal structure and function. Our study also explores early mechanisms of molecular convergence, where different protein folds bind equivalent ribosomal RNA structures in ribosomes across different kingdoms. This comparative analysis enhances our understanding of ribosomal evolution, particularly the distinct evolutionary paths of the large and small subunits, and underscores the complex interplay between RNA and protein components in the transition from the RNA world to modern cellular life. Transcending the concept of folds also makes it possible to group a large number of ribosomal proteins into five categories of urfolds or metafolds, which could attest to their ancestral character and common origins. This work also demonstrates that the gradual acquisition of extensions by simple but ordered folds constitutes an inexorable evolutionary mechanism. This observation supports the idea that simple but structured ribosomal proteins preceded the development of their disordered extensions.
Affiliation(s)
- Inzhu Tanoz
  - Aix-Marseille Université, Université de Toulon, IRD, CNRS, Mediterranean Institute of Oceanography (MIO), UM 110, 13288 Marseille, France
- Youri Timsit
  - Aix-Marseille Université, Université de Toulon, IRD, CNRS, Mediterranean Institute of Oceanography (MIO), UM 110, 13288 Marseille, France
  - Research Federation for the Study of Global Ocean Systems Ecology and Evolution, FR2022/Tara GOSEE, 3 Rue Michel-Ange, 75016 Paris, France
6. Yang K, Whitehouse RL, Dawson SL, Zhang L, Martin JG, Johnson DS, Paulo JA, Gygi SP, Yu Q. Accelerating multiplexed profiling of protein-ligand interactions: High-throughput plate-based reactive cysteine profiling with minimal input. Cell Chem Biol 2024; 31:565-576.e4. [PMID: 38118439 PMCID: PMC10960705 DOI: 10.1016/j.chembiol.2023.11.015] [Citation(s) in RCA: 20] [Impact Index Per Article: 20.0] [Received: 08/08/2023] [Revised: 11/07/2023] [Accepted: 11/28/2023] [Indexed: 12/22/2023]
Abstract
Chemoproteomics has made significant progress in investigating small-molecule-protein interactions. However, the proteome-wide profiling of cysteine ligandability remains challenging to adapt for high-throughput applications, primarily due to a lack of platforms capable of achieving the desired depth using low input in 96- or 384-well plates. Here, we introduce a revamped, plate-based platform which enables routine interrogation of either ∼18,000 or ∼24,000 reactive cysteines based on starting amounts of 10 or 20 μg, respectively. This represents a 5-10X reduction in input and 2-3X improved coverage. We applied the platform to screen 192 electrophiles in the native HEK293T proteome, mapping the ligandability of 38,450 reactive cysteines from 8,274 human proteins. We further applied the platform to characterize new cellular targets of established drugs, uncovering that ARS-1620, a KRASG12C inhibitor, binds to and inhibits an off-target adenosine kinase ADK. The platform represents a major step forward to high-throughput proteome-wide evaluation of reactive cysteines.
Affiliation(s)
- Ka Yang
  - Department of Cell Biology, Harvard Medical School, Boston, MA 02115, USA
- Shane L Dawson
  - Department of Cell Biology, Harvard Medical School, Boston, MA 02115, USA
- Lu Zhang
  - Biogen, Cambridge, MA 02142, USA
- Joao A Paulo
  - Department of Cell Biology, Harvard Medical School, Boston, MA 02115, USA
- Steven P Gygi
  - Department of Cell Biology, Harvard Medical School, Boston, MA 02115, USA
- Qing Yu
  - Department of Cell Biology, Harvard Medical School, Boston, MA 02115, USA
7. Liu Y, Xu C, Zhou H, Wang W, Liu B, Li Y, Hu X, Yu F, He J. The crystal structures of Sau3AI with and without bound DNA suggest a self-activation-based DNA cleavage mechanism. Structure 2023; 31:1463-1472.e2. [PMID: 37652002 DOI: 10.1016/j.str.2023.08.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2022] [Revised: 06/12/2023] [Accepted: 08/04/2023] [Indexed: 09/02/2023]
Abstract
The type II restriction endonuclease Sau3AI cleaves the sequence 5'-GATC-3' in double-strand DNA producing two sticky ends. Sau3AI cuts both DNA strands regardless of methylation status. Here, we report the crystal structures of the active site mutant Sau3AI-E64A and the C-terminal domain Sau3AI-C with a bound GATC substrate. Interestingly, the catalytic site of the N-terminal domain (Sau3AI-N) is spatially blocked by the C-terminal domain, suggesting a potential self-inhibition of the enzyme. Interruption of Sau3AI-C binding to substrate DNA disrupts Sau3AI function, suggesting a functional linkage between the N- and C-terminal domains. We propose that Sau3AI-C behaves as an allosteric effector binding one GATC substrate, which triggers a conformational change to open the N-terminal catalytic site, resulting in the subsequent GATC recognition by Sau3AI-N and cleavage of the second GATC site. Our data indicate that Sau3AI and UbaLAI might represent a new subclass of type IIE restriction enzymes.
Affiliation(s)
- Yahui Liu
  - Department of Pathogen Biology, School of Basic Medicine, Tongji Medical College, Huazhong University of Science and Technology, 13 Hangkong Road, Wuhan, Hubei 430030, China
- Chunyan Xu
  - Shanghai Synchrotron Radiation Facility, Shanghai Advanced Research Institute, Chinese Academy of Sciences, Shanghai 201204, China
- Huan Zhou
  - Shanghai Synchrotron Radiation Facility, Shanghai Advanced Research Institute, Chinese Academy of Sciences, Shanghai 201204, China
- Weiwei Wang
  - Shanghai Synchrotron Radiation Facility, Shanghai Advanced Research Institute, Chinese Academy of Sciences, Shanghai 201204, China
- Bing Liu
  - Department of Laboratory Medicine, the First Affiliated Hospital of Xi'an Jiaotong University, Xi'an 710061, China
- Yan Li
  - Department of Pathogen Biology, School of Basic Medicine, Tongji Medical College, Huazhong University of Science and Technology, 13 Hangkong Road, Wuhan, Hubei 430030, China
  - Department of Pediatrics, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei 430030, China
- Xiaojian Hu
  - School of Life Sciences, Fudan University, Shanghai 200433, China
- Feng Yu
  - Shanghai Synchrotron Radiation Facility, Shanghai Advanced Research Institute, Chinese Academy of Sciences, Shanghai 201204, China
- Jianhua He
  - The Institute for Advanced Studies, Wuhan University, Wuhan 430072, China
8. Mathony J, Aschenbrenner S, Becker P, Niopek D. Dissecting the Determinants of Domain Insertion Tolerance and Allostery in Proteins. Advanced Science (Weinheim, Baden-Wurttemberg, Germany) 2023; 10:e2303496. [PMID: 37562980 PMCID: PMC10558690 DOI: 10.1002/advs.202303496] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2023] [Revised: 07/21/2023] [Indexed: 08/12/2023]
Abstract
Domain insertion engineering is a promising approach to recombine the functions of evolutionarily unrelated proteins. Insertion of light-switchable receptor domains into a selected effector protein, for instance, can yield allosteric effectors with light-dependent activity. However, the parameters that determine domain insertion tolerance and allostery are poorly understood. Here, an unbiased screen is used to systematically assess the domain insertion permissibility of several evolutionarily unrelated proteins. Training machine learning models on the resulting data makes it possible to dissect features informative for domain insertion tolerance and reveals sequence conservation statistics as the strongest indicators of suitable insertion sites. Finally, extending the experimental pipeline toward the identification of switchable hybrids results in opto-chemogenetic derivatives of the transcription factor AraC that function as single-protein Boolean logic gates. The study reveals determinants of domain insertion tolerance and yields multimodally switchable proteins with unique functional properties.
Affiliation(s)
- Jan Mathony
  - Center for Synthetic Biology, Technical University of Darmstadt, 64287 Darmstadt, Germany
  - Department of Biology, Technical University of Darmstadt, 64287 Darmstadt, Germany
  - Institute of Pharmacy and Molecular Biotechnology (IPMB), Faculty of Engineering Sciences, Heidelberg University, 69120 Heidelberg, Germany
- Sabine Aschenbrenner
  - Institute of Pharmacy and Molecular Biotechnology (IPMB), Faculty of Engineering Sciences, Heidelberg University, 69120 Heidelberg, Germany
- Philipp Becker
  - Center for Synthetic Biology, Technical University of Darmstadt, 64287 Darmstadt, Germany
  - Department of Biology, Technical University of Darmstadt, 64287 Darmstadt, Germany
  - Department of Biotechnology and Biomedicine, Technical University of Denmark, Kongens Lyngby 2800, Denmark
- Dominik Niopek
  - Institute of Pharmacy and Molecular Biotechnology (IPMB), Faculty of Engineering Sciences, Heidelberg University, 69120 Heidelberg, Germany
9. Mohammadi E, Joshi SY, Deshmukh SA. Development, Validation, and Applications of Nonbonded Interaction Parameters between Coarse-Grained Amino Acid and Water Models. Biomacromolecules 2023; 24:4078-4092. [PMID: 37603467 DOI: 10.1021/acs.biomac.3c00441] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/23/2023]
Abstract
Interactions between amino acids and water play an important role in determining the stability and folding/unfolding, in aqueous solution, of many biological macromolecules, which affects their function. Thus, understanding the molecular-level interactions between water and amino acids is crucial to tune their function in aqueous solutions. Herein, we have developed nonbonded interaction parameters between the coarse-grained (CG) models of 20 amino acids and the one-site CG water model. The nonbonded parameters, represented using the 12-6 Lennard Jones (LJ) potential form, have been optimized using an artificial neural network (ANN)-assisted particle swarm optimization (PSO) (ANN-assisted PSO) method. All-atom (AA) molecular dynamics (MD) simulations of dipeptides in TIP3P water molecules were performed to calculate the Gibbs hydration free energies. The nonbonded force-field (FF) parameters between CG amino acids and the one-site CG water model were developed to accurately reproduce these energies. Furthermore, to test the transferability of these newly developed parameters, we calculated the hydration free energies of the analogues of the amino acid side chains, which showed good agreement with reported experimental data. Additionally, we show the applicability of these models by performing self-assembly simulations of peptide amphiphiles. Overall, these models are transferable and can be used to study the self-assembly of various biomaterials and biomolecules to develop a mechanistic understanding of these processes.
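For reference, the 12-6 Lennard-Jones form named in this abstract is conventionally written as shown below. The symbols are standard textbook notation assumed here, not values or names taken from the paper: $\varepsilon_{ij}$ is the well depth, $\sigma_{ij}$ is the separation at which the potential crosses zero, and $r_{ij}$ is the distance between coarse-grained beads $i$ and $j$.

```latex
% Standard 12-6 Lennard-Jones pair potential (textbook form; symbols are assumed notation)
\[
  V_{\mathrm{LJ}}(r_{ij}) \;=\; 4\,\varepsilon_{ij}
  \left[
    \left(\frac{\sigma_{ij}}{r_{ij}}\right)^{12}
    - \left(\frac{\sigma_{ij}}{r_{ij}}\right)^{6}
  \right]
\]
% The minimum lies at r_{ij} = 2^{1/6}\,\sigma_{ij}, where V_{\mathrm{LJ}} = -\varepsilon_{ij}.
```

In this form, fitting nonbonded parameters for an amino acid-water bead pair presumably amounts to choosing $\varepsilon_{ij}$ and $\sigma_{ij}$; how the paper parameterizes or constrains them is not stated in the abstract.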
Affiliation(s)
- Esmat Mohammadi
  - Department of Chemical Engineering, Virginia Tech, Blacksburg, Virginia 24061, United States
- Soumil Y Joshi
  - Department of Chemical Engineering, Virginia Tech, Blacksburg, Virginia 24061, United States
- Sanket A Deshmukh
  - Department of Chemical Engineering, Virginia Tech, Blacksburg, Virginia 24061, United States
10. Suskiewicz MJ, Munnur D, Strømland Ø, Yang JC, Easton L, Chatrin C, Zhu K, Baretić D, Goffinont S, Schuller M, Wu WF, Elkins J, Ahel D, Sanyal S, Neuhaus D, Ahel I. Updated protein domain annotation of the PARP protein family sheds new light on biological function. Nucleic Acids Res 2023; 51:8217-8236. [PMID: 37326024 PMCID: PMC10450202 DOI: 10.1093/nar/gkad514] [Citation(s) in RCA: 24] [Impact Index Per Article: 12.0] [Received: 02/10/2023] [Revised: 05/09/2023] [Accepted: 06/03/2023] [Indexed: 06/17/2023] Open
Abstract
AlphaFold2 and related computational tools have greatly aided studies of structural biology through their ability to accurately predict protein structures. In the present work, we explored AF2 structural models of the 17 canonical members of the human PARP protein family and supplemented this analysis with new experiments and an overview of recent published data. PARP proteins are typically involved in the modification of proteins and nucleic acids through mono or poly(ADP-ribosyl)ation, but this function can be modulated by the presence of various auxiliary protein domains. Our analysis provides a comprehensive view of the structured domains and long intrinsically disordered regions within human PARPs, offering a revised basis for understanding the function of these proteins. Among other functional insights, the study provides a model of PARP1 domain dynamics in the DNA-free and DNA-bound states and enhances the connection between ADP-ribosylation and RNA biology and between ADP-ribosylation and ubiquitin-like modifications by predicting putative RNA-binding domains and E2-related RWD domains in certain PARPs. In line with the bioinformatic analysis, we demonstrate for the first time PARP14's RNA-binding capability and RNA ADP-ribosylation activity in vitro. While our insights align with existing experimental data and are probably accurate, they need further validation through experiments.
Affiliation(s)
- Deeksha Munnur
  - Sir William Dunn School of Pathology, University of Oxford, Oxford OX1 3RE, UK
- Øyvind Strømland
  - Sir William Dunn School of Pathology, University of Oxford, Oxford OX1 3RE, UK
  - Department of Biomedicine, University of Bergen, Bergen, Norway
- Ji-Chun Yang
  - MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge CB2 0QH, UK
- Laura E Easton
  - MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge CB2 0QH, UK
- Chatrin Chatrin
  - Sir William Dunn School of Pathology, University of Oxford, Oxford OX1 3RE, UK
- Kang Zhu
  - Sir William Dunn School of Pathology, University of Oxford, Oxford OX1 3RE, UK
- Domagoj Baretić
  - Sir William Dunn School of Pathology, University of Oxford, Oxford OX1 3RE, UK
- Marion Schuller
  - Sir William Dunn School of Pathology, University of Oxford, Oxford OX1 3RE, UK
- Wing-Fung Wu
  - MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge CB2 0QH, UK
- Jonathan M Elkins
  - Centre for Medicines Discovery, University of Oxford, Oxford OX3 7DQ, UK
- Dragana Ahel
  - Sir William Dunn School of Pathology, University of Oxford, Oxford OX1 3RE, UK
- Sumana Sanyal
  - Sir William Dunn School of Pathology, University of Oxford, Oxford OX1 3RE, UK
- David Neuhaus
  - MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge CB2 0QH, UK
- Ivan Ahel
  - Sir William Dunn School of Pathology, University of Oxford, Oxford OX1 3RE, UK
11. Cordes MHJ, Sundman AK, Fox HC, Binford GJ. Protein salvage and repurposing in evolution: Phospholipase D toxins are stabilized by a remodeled scrap of a membrane association domain. Protein Sci 2023; 32:e4701. [PMID: 37313620 PMCID: PMC10303701 DOI: 10.1002/pro.4701] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2022] [Revised: 06/03/2023] [Accepted: 06/07/2023] [Indexed: 06/15/2023]
Abstract
The glycerophosphodiester phosphodiesterase (GDPD)-like SMaseD/PLD domain family, which includes phospholipase D (PLD) toxins in recluse spiders and actinobacteria, evolved anciently in bacteria from the GDPD. The PLD enzymes retained the core (β/α)8 barrel fold of GDPD, while gaining a signature C-terminal expansion motif and losing a small insertion domain. Using sequence alignments and phylogenetic analysis, we infer that the C-terminal motif derives from a segment of an ancient bacterial PLAT domain. Formally, part of a protein containing a PLAT domain repeat underwent fusion to the C terminus of a GDPD barrel, leading to attachment of a segment of a PLAT domain, followed by a second complete PLAT domain. The complete domain was retained only in some basal homologs, but the PLAT segment was conserved and repurposed as the expansion motif. The PLAT segment corresponds to strands β7-β8 of a β-sandwich, while the expansion motif as represented in spider PLD toxins has been remodeled as an α-helix, a β-strand, and an ordered loop. The GDPD-PLAT fusion led to two acquisitions in founding the GDPD-like SMaseD/PLD family: (1) a PLAT domain that presumably supported early lipase activity by mediating membrane association, and (2) an expansion motif that putatively stabilized the catalytic domain, possibly compensating for, or permitting, loss of the insertion domain. Of wider significance, messy domain shuffling events can leave behind scraps of domains that can be salvaged, remodeled, and repurposed.
Affiliation(s)
- Holden C. Fox
  - Department of Chemistry and Biochemistry, University of Arizona, Tucson, Arizona, USA
12. Li Z, Chen Y, Li Y, Zeng Y, Li W, Ma X, Huang L, Shen Y. Whole-Genome Resequencing Reveals the Diversity of Patchouli Germplasm. Int J Mol Sci 2023; 24:10970. [PMID: 37446145 DOI: 10.3390/ijms241310970] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2023] [Revised: 06/27/2023] [Accepted: 06/27/2023] [Indexed: 07/15/2023] Open
Abstract
As an important medicinal and aromatic plant, patchouli is distributed throughout most of Asia. However, current research on patchouli's genetic diversity is limited and lacks genome-wide studies. Here, we have collected seven representative patchouli accessions from different localities and performed whole-genome resequencing on them. In total, 402,650 single nucleotide polymorphisms (SNPs) and 153,233 insertions/deletions (INDELs) were detected. Based on these abundant genetic variants, patchouli accessions were primarily classified into the Chinese group and the Southeast Asian group. However, the accession SP (Shipai) collected from China formed a distinct subgroup within the Southeast Asian group. As SP has been used as a genuine herb in traditional Chinese medicine, its unique molecular markers have been subsequently screened and verified. For 26,144 specific SNPs and 16,289 specific INDELs in SP, 10 of them were validated using Polymerase Chain Reaction (PCR) following three different approaches. Further, we analyzed the effects of total genetic variants on genes involved in the sesquiterpene synthesis pathway, which produce the primary phytochemical compounds found in patchouli. Eight genes were ultimately investigated and a gene encoding nerolidol synthetase (PatNES) was chosen and confirmed through biochemical assay. In accession YN, genetic variants in PatNES led to a loss of synthetase activity. Our results provide valuable information for understanding the diversity of patchouli germplasm resources.
Affiliation(s)
- Zhipeng Li
- Institute of Medicinal Plant Physiology and Ecology, School of Pharmaceutical Sciences, Guangzhou University of Chinese Medicine, Guangzhou 510000, China
- Yiqiong Chen
- Institute of Medicinal Plant Physiology and Ecology, School of Pharmaceutical Sciences, Guangzhou University of Chinese Medicine, Guangzhou 510000, China
- Yangyan Li
- Institute of Medicinal Plant Physiology and Ecology, School of Pharmaceutical Sciences, Guangzhou University of Chinese Medicine, Guangzhou 510000, China
- Ying Zeng
- Institute of Medicinal Plant Physiology and Ecology, School of Pharmaceutical Sciences, Guangzhou University of Chinese Medicine, Guangzhou 510000, China
- Wanying Li
- Institute of Medicinal Plant Physiology and Ecology, School of Pharmaceutical Sciences, Guangzhou University of Chinese Medicine, Guangzhou 510000, China
- Xiaona Ma
- Institute of Medicinal Plant Physiology and Ecology, School of Pharmaceutical Sciences, Guangzhou University of Chinese Medicine, Guangzhou 510000, China
- Lili Huang
- Institute of Medicinal Plant Physiology and Ecology, School of Pharmaceutical Sciences, Guangzhou University of Chinese Medicine, Guangzhou 510000, China
- Yanting Shen
- Institute of Medicinal Plant Physiology and Ecology, School of Pharmaceutical Sciences, Guangzhou University of Chinese Medicine, Guangzhou 510000, China
- State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing 100000, China
|
13
|
Holm L, Laiho A, Törönen P, Salgado M. DALI shines a light on remote homologs: One hundred discoveries. Protein Sci 2023; 32:e4519. [PMID: 36419248 PMCID: PMC9793968 DOI: 10.1002/pro.4519] [Citation(s) in RCA: 305] [Impact Index Per Article: 152.5] [Received: 07/18/2022] [Revised: 11/15/2022] [Accepted: 11/20/2022] [Indexed: 11/25/2022]
Abstract
Structural comparison reveals remote homology that often fails to be detected by sequence comparison. The DALI web server (http://ekhidna2.biocenter.helsinki.fi/dali) is a platform for structural analysis that provides database searches and interactive visualization, including structural alignments annotated with secondary structure, protein families and sequence logos, and 3D structure superimposition supported by color-coded sequence and structure conservation. Here, we are using DALI to mine the AlphaFold Database version 1, which increased the structural coverage of protein families by 20%. We found 100 remote homologous relationships hitherto unreported in the current reference database for protein domains, Pfam 35.0. In particular, we linked 35 domains of unknown function (DUFs) to the previously characterized families, generating a functional hypothesis that can be explored downstream in structural biology studies. Other findings include gene fusions, tandem duplications, and adjustments to domain boundaries. The evidence for homology can be browsed interactively through live examples on DALI's website.
Affiliation(s)
- Liisa Holm
- Organismal and Evolutionary Biology Research Program, Faculty of Biological and Environmental Sciences & Institute of Biotechnology, Helsinki Institute of Life Sciences, University of Helsinki, Helsinki, Finland
- Aleksi Laiho
- Organismal and Evolutionary Biology Research Program, Faculty of Biological and Environmental Sciences & Institute of Biotechnology, Helsinki Institute of Life Sciences, University of Helsinki, Helsinki, Finland
- Petri Törönen
- Organismal and Evolutionary Biology Research Program, Faculty of Biological and Environmental Sciences & Institute of Biotechnology, Helsinki Institute of Life Sciences, University of Helsinki, Helsinki, Finland
- Marco Salgado
- Organismal and Evolutionary Biology Research Program, Faculty of Biological and Environmental Sciences & Institute of Biotechnology, Helsinki Institute of Life Sciences, University of Helsinki, Helsinki, Finland
|
14
|
Klotz K, Radwan Y, Chakrabarti K. Dissecting Functional Biological Interactions Using Modular RNA Nanoparticles. Molecules 2022; 28:228. [PMID: 36615420 PMCID: PMC9821959 DOI: 10.3390/molecules28010228] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2022] [Revised: 12/22/2022] [Accepted: 12/25/2022] [Indexed: 12/29/2022] Open
Abstract
Nucleic acid nanoparticles (NANPs) are an exciting and innovative technology in the context of both basic and biomedical research. Made of DNA, RNA, or their chemical analogs, NANPs are programmed for carrying out specific functions within human cells. NANPs are at the forefront of preventing, detecting, and treating disease. Their nucleic acid composition lends them biocompatibility that provides their cargo with enhanced opportunity for coordinated delivery. Of course, the NANP system of targeting specific cells and tissues is not without its disadvantages. Accumulation of NANPs outside of the target tissue and the potential for off-target effects of NANP-mediated cargo delivery present challenges to research and medical professionals and these challenges must be effectively addressed to provide safe treatment to patients. Importantly, development of NANPs with regulated biological activities and immunorecognition becomes a promising route for developing versatile nucleic acid therapeutics. In a basic research context, NANPs can assist investigators in fine-tuning the structure-function relationship of final formulations and in this review, we explore the practical applications of NANPs in laboratory and clinical settings and discuss how we can use established nucleic acid research techniques to design effective NANPs.
Affiliation(s)
- Kaitlin Klotz
- Department of Biological Sciences, University of North Carolina at Charlotte, 9201 University City Blvd., Charlotte, NC 28223, USA
- Yasmine Radwan
- Nanoscale Science Program, Department of Chemistry, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
- Kausik Chakrabarti
- Department of Biological Sciences, University of North Carolina at Charlotte, 9201 University City Blvd., Charlotte, NC 28223, USA
|
15
|
Budimir I, Giampieri E, Saccenti E, Suarez-Diez M, Tarozzi M, Dall'Olio D, Merlotti A, Curti N, Remondini D, Castellani G, Sala C. Intraspecies characterization of bacteria via evolutionary modeling of protein domains. Sci Rep 2022; 12:16595. [PMID: 36198716 PMCID: PMC9534902 DOI: 10.1038/s41598-022-21036-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2022] [Accepted: 09/22/2022] [Indexed: 12/04/2022] Open
Abstract
The ability to detect and characterize bacteria within a biological sample is crucial for the monitoring of infections and epidemics, as well as for the study of human health and its relationship with commensal microorganisms. To this aim, a commonly used technique is the 16S rRNA gene targeted sequencing. PCR-amplified 16S sequences derived from the sample of interest are usually clustered into the so-called Operational Taxonomic Units (OTUs) based on pairwise similarities. Then, representative OTU sequences are compared with reference (human-made) databases to derive their phylogeny and taxonomic classification. Here, we propose a new reference-free approach to define the phylogenetic distance between bacteria based on protein domains, which are the evolving units of proteins. We extract the protein domain profiles of 3368 bacterial genomes and we use an ecological approach to model their Relative Species Abundance distribution. Based on the model parameters, we then derive a new measurement of phylogenetic distance. Finally, we show that such model-based distance is capable of detecting differences between bacteria in cases in which the 16S rRNA-based method fails, providing a possibly complementary approach, which is particularly promising for the analysis of bacterial populations measured by shotgun sequencing.
Affiliation(s)
- Iva Budimir
- Department of Physics and Astronomy 'Augusto Righi', University of Bologna, 40127, Bologna, Italy
- Enrico Giampieri
- Department of Experimental, Diagnostic and Specialty Medicine, University of Bologna, 40138, Bologna, Italy
- Edoardo Saccenti
- Laboratory of Systems and Synthetic Biology, Wageningen University and Research, 6708 WE, Wageningen, The Netherlands
- Maria Suarez-Diez
- Laboratory of Systems and Synthetic Biology, Wageningen University and Research, 6708 WE, Wageningen, The Netherlands
- Martina Tarozzi
- Department of Experimental, Diagnostic and Specialty Medicine, University of Bologna, 40138, Bologna, Italy
- Daniele Dall'Olio
- Department of Physics and Astronomy 'Augusto Righi', University of Bologna, 40127, Bologna, Italy
- Alessandra Merlotti
- Department of Physics and Astronomy 'Augusto Righi', University of Bologna, 40127, Bologna, Italy
- Nico Curti
- Department of Experimental, Diagnostic and Specialty Medicine, University of Bologna, 40138, Bologna, Italy
- Daniel Remondini
- Department of Physics and Astronomy 'Augusto Righi', University of Bologna, 40127, Bologna, Italy
- Gastone Castellani
- Department of Experimental, Diagnostic and Specialty Medicine, University of Bologna, 40138, Bologna, Italy
- Claudia Sala
- Department of Experimental, Diagnostic and Specialty Medicine, University of Bologna, 40138, Bologna, Italy
|
16
|
Russo ET, Barone F, Bateman A, Cozzini S, Punta M, Laio A. DPCfam: Unsupervised protein family classification by Density Peak Clustering of large sequence datasets. PLoS Comput Biol 2022; 18:e1010610. [PMID: 36260616 PMCID: PMC9621593 DOI: 10.1371/journal.pcbi.1010610] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2022] [Revised: 10/31/2022] [Accepted: 09/26/2022] [Indexed: 11/07/2022] Open
Abstract
Proteins that are known only at a sequence level outnumber those with an experimental characterization by orders of magnitude. Classifying protein regions (domains) into homologous families can generate testable functional hypotheses for yet unannotated sequences. Existing domain family resources typically use at least some degree of manual curation: they grow slowly over time and leave a large fraction of the protein sequence space unclassified. We here describe automatic clustering by Density Peak Clustering of UniRef50 v. 2017_07, a protein sequence database including approximately 23M sequences. We performed a radical re-implementation of a pipeline we previously developed in order to allow handling millions of sequences and data volumes of the order of 3 TeraBytes. The modified pipeline, which we call DPCfam, finds ∼ 45,000 protein clusters in UniRef50. Our automatic classification is in close correspondence to the ones of the Pfam and ECOD resources: in particular, about 81% of medium-large Pfam families and 72% of ECOD families can be mapped to clusters generated by DPCfam. In addition, our protocol finds more than 14,000 clusters constituted of protein regions with no Pfam annotation, which are therefore candidates for representing novel protein families. These results are made available to the scientific community through a dedicated repository.
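As a rough illustration of the Density Peak Clustering idea named in this abstract, the sketch below implements the basic density-peak procedure on toy 2D points. It is a minimal sketch under stated assumptions, not the DPCfam pipeline: the function name `density_peaks`, the Euclidean distance, the cutoff `d_c`, and the fixed number of clusters are all illustrative choices, whereas DPCfam works on distances derived from sequence alignments.

```python
from math import dist


def density_peaks(points, d_c, n_clusters):
    """Toy density-peak clustering (hypothetical helper, not DPCfam code).

    rho[i]   = number of other points closer than the cutoff d_c
    delta[i] = distance to the nearest point of higher density
    Centers are the points with the largest rho * delta.
    """
    n = len(points)
    rho = [
        sum(1 for j in range(n) if j != i and dist(points[i], points[j]) < d_c)
        for i in range(n)
    ]
    # Process points from highest to lowest density (ties broken by index).
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    delta = [0.0] * n
    parent = [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            # Densest point: by convention, delta is its largest distance.
            delta[i] = max(dist(points[i], points[j]) for j in range(n))
            continue
        # Nearest neighbour among the points that rank as denser.
        j = min(order[:rank], key=lambda k: dist(points[i], points[k]))
        delta[i] = dist(points[i], points[j])
        parent[i] = j
    centers = sorted(range(n), key=lambda i: -rho[i] * delta[i])[:n_clusters]
    labels = [-1] * n
    for cluster_id, c in enumerate(centers):
        labels[c] = cluster_id
    # Every non-center inherits the label of its denser nearest neighbour.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels


toy_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(density_peaks(toy_points, d_c=2.0, n_clusters=2))
```

On the two well-separated toy groups, each group receives its own label. In practice the number of centers is usually chosen by inspecting the rho and delta values rather than fixed in advance.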
Affiliation(s)
- Federico Barone
- SISSA, Trieste, Italy
- AREA SCIENCE PARK, Trieste, Italy
- Department of Mathematics and Geosciences, University of Trieste, Trieste, Italy
- Alex Bateman
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, United Kingdom
- Marco Punta
- Center for Omics Sciences, IRCCS San Raffaele Institute, Milan, Italy
- Unit of Immunogenetics, Leukemia Genomics and Immunobiology, Division of Immunology, Transplantation and Infectious Disease, IRCCS San Raffaele Scientific Institute, Milan, Italy
|
17
|
Kozlova MI, Shalaeva DN, Dibrova DV, Mulkidjanian AY. Common Patterns of Hydrolysis Initiation in P-loop Fold Nucleoside Triphosphatases. Biomolecules 2022; 12:1345. [PMID: 36291554 PMCID: PMC9599529 DOI: 10.3390/biom12101345] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/09/2022] [Revised: 08/20/2022] [Accepted: 09/14/2022] [Indexed: 11/24/2022] Open
Abstract
The P-loop fold nucleoside triphosphate (NTP) hydrolases (also known as Walker NTPases) function as ATPases, GTPases, and ATP synthases, are often of medical importance, and represent one of the largest and evolutionarily oldest families of enzymes. There is still no consensus on their catalytic mechanism. To clarify this, we performed the first comparative structural analysis of more than 3100 structures of P-loop NTPases that contain bound substrate Mg-NTPs or their analogues. We proceeded on the assumption that structural features common to these P-loop NTPases may be essential for catalysis. Our results are presented in two articles. Here, in the first, we consider the structural elements that stimulate hydrolysis. Upon interaction of P-loop NTPases with their cognate activating partners (RNA/DNA/protein domains), specific stimulatory moieties, usually Arg or Lys residues, are inserted into the catalytic site and initiate the cleavage of gamma phosphate. By analyzing a plethora of structures, we found that the only shared feature was the mechanistic interaction of stimulators with the oxygen atoms of gamma-phosphate group, capable of causing its rotation. One of the oxygen atoms of gamma phosphate coordinates the cofactor Mg ion. The rotation must pull this oxygen atom away from the Mg ion. This rearrangement should affect the properties of the other Mg ligands and may initiate hydrolysis according to the mechanism elaborated in the second article.
Affiliation(s)
- Maria I. Kozlova
- School of Physics, Osnabrueck University, D-49069 Osnabrueck, Germany
- Daria N. Shalaeva
- School of Physics, Osnabrueck University, D-49069 Osnabrueck, Germany
- Daria V. Dibrova
- School of Physics, Osnabrueck University, D-49069 Osnabrueck, Germany
- Armen Y. Mulkidjanian
- School of Physics, Osnabrueck University, D-49069 Osnabrueck, Germany
- Center of Cellular Nanoanalytics, Osnabrueck University, D-49069 Osnabrueck, Germany
|
18
|
Cohen Y, Hershberg R. Rapid Adaptation Often Occurs through Mutations to the Most Highly Conserved Positions of the RNA Polymerase Core Enzyme. Genome Biol Evol 2022; 14:evac105. [PMID: 35876137 PMCID: PMC9459352 DOI: 10.1093/gbe/evac105] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Accepted: 07/01/2022] [Indexed: 11/17/2022] Open
Abstract
Mutations to the genes encoding the RNA polymerase core enzyme (RNAPC) and additional housekeeping regulatory genes were found to be involved in adaptation, in the context of numerous evolutionary experiments, in which bacteria were exposed to diverse selective pressures. This provides a conundrum, as the housekeeping genes that were so often mutated in response to these diverse selective pressures tend to be among the genes that are most conserved in their sequences across the bacterial phylogeny. In order to further examine this apparent discrepancy, we characterized the precise positions of the RNAPC involved in adaptation to a large variety of selective pressures. We found that RNAPC lab adaptations tended to occur at positions displaying traits associated with higher selective constraint. Specifically, compared to other RNAPC positions, positions involved in adaptation tended to be more conserved in their sequences within bacteria, were more often located within defined protein domains, and were located closer to the complex's active site. Higher sequence conservation was also found for resource exhaustion adaptations occurring within additional housekeeping genes. Combined, our results demonstrate that the positions that change most readily in response to well-defined selective pressures exerted in lab environments are often also those that evolve most slowly in nature.
Affiliation(s)
- Yasmin Cohen
- Rachel & Menachem Mendelovitch Evolutionary Processes of Mutation & Natural Selection Research Laboratory, Department of Genetics and Developmental Biology, the Ruth and Bruce Rappaport Faculty of Medicine, Technion-Israel Institute of Technology, Haifa 31096, Israel
- Ruth Hershberg
- Rachel & Menachem Mendelovitch Evolutionary Processes of Mutation & Natural Selection Research Laboratory, Department of Genetics and Developmental Biology, the Ruth and Bruce Rappaport Faculty of Medicine, Technion-Israel Institute of Technology, Haifa 31096, Israel
|
19
|
Miniproteins in medicinal chemistry. Bioorg Med Chem Lett 2022; 71:128806. [PMID: 35660515 DOI: 10.1016/j.bmcl.2022.128806] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 01/31/2022] [Revised: 05/11/2022] [Accepted: 05/16/2022] [Indexed: 11/20/2022]
Abstract
Miniproteins exhibit great potential as scaffolds for drug candidates because of their well-defined structure and good synthetic availability. Because of recently described methodologies for their de novo design, the field of miniproteins is emerging and can provide molecules that effectively bind to problematic targets, i.e., those that have been previously considered to be undruggable. This review describes methodologies for the development of miniprotein scaffolds and for the construction of biologically active miniproteins.
|
20
|
Mahmud S, Guo Z, Quadir F, Liu J, Cheng J. Multi-head attention-based U-Nets for predicting protein domain boundaries using 1D sequence features and 2D distance maps. BMC Bioinformatics 2022; 23:283. [PMID: 35854211 PMCID: PMC9295499 DOI: 10.1186/s12859-022-04829-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2022] [Accepted: 07/08/2022] [Indexed: 01/25/2023] Open
Abstract
The information about the domain architecture of proteins is useful for studying protein structure and function. However, accurate prediction of protein domain boundaries (i.e., sequence regions separating two domains) from sequence remains a significant challenge. In this work, we develop a deep learning method based on multi-head U-Nets (called DistDom) to predict protein domain boundaries utilizing 1D sequence features and predicted 2D inter-residue distance map as input. The 1D features contain the evolutionary and physicochemical information of protein sequences, whereas the 2D distance map includes the structural information of proteins that was rarely used in domain boundary prediction before. The 1D and 2D features are processed by the 1D and 2D U-Nets respectively to generate hidden features. The hidden features are then used by the multi-head attention to predict the probability of each residue of a protein being in a domain boundary, leveraging both local and global information in the features. The residue-level domain boundary predictions can be used to classify proteins as single-domain or multi-domain proteins. It classifies the CASP14 single-domain and multi-domain targets at the accuracy of 75.9%, 13.28% more accurate than the state-of-the-art method. Tested on the CASP14 multi-domain protein targets with expert annotated domain boundaries, the average per-target F1 measure score of the domain boundary prediction by DistDom is 0.263, 29.56% higher than the state-of-the-art method.
Affiliation(s)
- Sajid Mahmud
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, USA
- Zhiye Guo
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, USA
- Farhan Quadir
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, USA
- Jian Liu
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, USA
- Jianlin Cheng
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, USA
|
21
|
Singh NP, Krumlauf R. Diversification and Functional Evolution of HOX Proteins. Front Cell Dev Biol 2022; 10:798812. [PMID: 35646905 PMCID: PMC9136108 DOI: 10.3389/fcell.2022.798812] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/20/2021] [Accepted: 04/08/2022] [Indexed: 01/07/2023] Open
Abstract
Gene duplication and divergence is a major contributor to the generation of morphological diversity and the emergence of novel features in vertebrates during evolution. The availability of sequenced genomes has facilitated our understanding of the evolution of genes and regulatory elements. However, progress in understanding conservation and divergence in the function of proteins has been slow and mainly assessed by comparing protein sequences in combination with in vitro analyses. These approaches help to classify proteins into different families and sub-families, such as distinct types of transcription factors, but how protein function varies within a gene family is less well understood. Some studies have explored the functional evolution of closely related proteins and important insights have begun to emerge. In this review, we will provide a general overview of gene duplication and functional divergence and then focus on the functional evolution of HOX proteins to illustrate evolutionary changes underlying diversification and their role in animal evolution.
Affiliation(s)
- Robb Krumlauf
- Stowers Institute for Medical Research, Kansas City, MO, United States
- Department of Anatomy and Cell Biology, Kansas University Medical Center, Kansas City, KS, United States
- *Correspondence: Robb Krumlauf,
|
22
|
Ijaq J, Chandra D, Ray MK, Jagannadham MV. Investigating the Functional Role of Hypothetical Proteins From an Antarctic Bacterium Pseudomonas sp. Lz4W: Emphasis on Identifying Proteins Involved in Cold Adaptation. Front Genet 2022; 13:825269. [PMID: 35360867 PMCID: PMC8963723 DOI: 10.3389/fgene.2022.825269] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/30/2021] [Accepted: 02/07/2022] [Indexed: 11/28/2022] Open
Abstract
Exploring the molecular mechanisms behind bacterial adaptation to extreme temperatures has potential biotechnological applications. In the present study, Pseudomonas sp. Lz4W, a Gram-negative psychrophilic bacterium adapted to survive in Antarctica, was selected to decipher the molecular mechanism underlying the cold adaptation. Proteome analysis of the isolates grown at 4°C was performed to identify the proteins and pathways that are responsible for the adaptation. However, many proteins from the expressed proteome were found to be hypothetical proteins (HPs), whose function is unknown. Investigating the functional roles of these proteins may provide additional information in the biological understanding of the bacterial cold adaptation. Thus, our study aimed to assign functions to these HPs and understand their role at the molecular level. We used a structured in silico workflow combining different bioinformatics tools and databases for functional annotation. Pseudomonas sp. Lz4W genome (CP017432, version 1) contains 4493 genes and 4412 coding sequences (CDS), of which 743 CDS were annotated as HPs. Of these, from the proteome analysis, 61 HPs were found to be expressed consistently at the protein level. The amino acid sequences of these 61 HPs were submitted to our workflow and we could successfully assign a function to 18 HPs. Most of these proteins were predicted to be involved in biological mechanisms of cold adaptations such as peptidoglycan metabolism, cell wall organization, ATP hydrolysis, outer membrane fluidity, catalysis, and others. This study provided a better understanding of the functional significance of HPs in cold adaptation of Pseudomonas sp. Lz4W.
Our approach emphasizes the importance of addressing the “hypothetical protein problem” for a thorough understanding of mechanisms at the cellular level, and provides an assessment of integrating proteomics methods with various annotation and curation approaches to characterize hypothetical or uncharacterized protein data. The MS proteomics data generated from this study have been deposited to the ProteomeXchange through PRIDE with the dataset identifier PXD029741.
Affiliation(s)
- Johny Ijaq
- Metabolomics Facility, School of Life Sciences, University of Hyderabad, Hyderabad, India
- Deepika Chandra
- CSIR-Centre for Cellular and Molecular Biology, Hyderabad, India
- Malay Kumar Ray
- CSIR-Centre for Cellular and Molecular Biology, Hyderabad, India
- M. V. Jagannadham
- Metabolomics Facility, School of Life Sciences, University of Hyderabad, Hyderabad, India
- *Correspondence: M. V. Jagannadham,
|
23
|
Rojano E, Jabato FM, Perkins JR, Córdoba-Caballero J, García-Criado F, Sillitoe I, Orengo C, Ranea JAG, Seoane-Zonjic P. Assigning protein function from domain-function associations using DomFun. BMC Bioinformatics 2022; 23:43. [PMID: 35033002 PMCID: PMC8761305 DOI: 10.1186/s12859-022-04565-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 10/05/2020] [Accepted: 01/05/2022] [Indexed: 12/03/2022] Open
Abstract
BACKGROUND Protein function prediction remains a key challenge. Domain composition affects protein function. Here we present DomFun, a Ruby gem that uses associations between protein domains and functions, calculated using multiple indices based on tripartite network analysis. These domain-function associations are combined at the protein level, to generate protein-function predictions. RESULTS We analysed 16 tripartite networks connecting homologous superfamily and FunFam domains from CATH-Gene3D with functional annotations from the three Gene Ontology (GO) sub-ontologies, KEGG, and Reactome. We validated the results using the CAFA 3 benchmark platform for GO annotation, finding that out of the multiple association metrics and domain datasets tested, Simpson index for FunFam domain-function associations combined with Stouffer's method leads to the best performance in almost all scenarios. We also found that using FunFams led to better performance than superfamilies, and better results were found for GO molecular function compared to GO biological process terms. DomFun performed as well as the highest-performing method in certain CAFA 3 evaluation procedures in terms of [Formula: see text] and [Formula: see text] We also implemented our own benchmark procedure, Pathway Prediction Performance (PPP), which can be used to validate function prediction for additional annotations sources, such as KEGG and Reactome. Using PPP, we found similar results to those found with CAFA 3 for GO, moreover we found good performance for the other annotation sources. As with CAFA 3, Simpson index with Stouffer's method led to the top performance in almost all scenarios. CONCLUSIONS DomFun shows competitive performance with other methods evaluated in CAFA 3 when predicting proteins function with GO, although results vary depending on the evaluation procedure. Through our own benchmark procedure, PPP, we have shown it can also make accurate predictions for KEGG and Reactome. 
It performs best when using FunFams, combining Simpson index derived domain-function associations using Stouffer's method. The tool has been implemented so that it can be easily adapted to incorporate other protein features, such as domain data from other sources, amino acid k-mers and motifs. The DomFun Ruby gem is available from https://rubygems.org/gems/DomFun . Code maintained at https://github.com/ElenaRojano/DomFun . Validation procedure scripts can be found at https://github.com/ElenaRojano/DomFun_project .
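The two ingredients named in this abstract, a Simpson index for scoring an association and Stouffer's method for combining evidence, can be sketched as follows. This is an illustrative sketch only and not DomFun code (DomFun itself is a Ruby gem). It assumes the Simpson index here means the overlap coefficient between two sets, and that the per-domain evidence being combined is expressed as one-sided p-values strictly between 0 and 1. The function names and the toy protein identifiers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def simpson_index(a, b):
    """Overlap coefficient: |A & B| / min(|A|, |B|) (assumed definition)."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def stouffer_combine(p_values):
    """Unweighted Stouffer's method for one-sided p-values in (0, 1).

    Each p-value is converted to a z-score, the z-scores are summed and
    divided by sqrt(k), and the result is converted back to a p-value.
    """
    std = NormalDist()
    z_scores = [std.inv_cdf(1.0 - p) for p in p_values]
    z_combined = sum(z_scores) / sqrt(len(z_scores))
    return z_combined, 1.0 - std.cdf(z_combined)


# Toy data: proteins carrying a domain vs. proteins annotated with a function.
domain_proteins = {"P1", "P2", "P3"}
function_proteins = {"P2", "P3", "P4", "P5"}
print(simpson_index(domain_proteins, function_proteins))

# Two domains of one protein each give moderate evidence for the same function.
z, p = stouffer_combine([0.05, 0.05])
print(z, p)
```

Combining two moderately significant p-values yields a smaller combined p-value, which is the behaviour that makes Stouffer's method useful for pooling several domain-level signals into one protein-level prediction.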
Affiliation(s)
- Elena Rojano
- Department of Molecular Biology and Biochemistry, University of Malaga, Bulevar Louis Pasteur, 31, 29010 Malaga, Spain
- Institute of Biomedical Research in Malaga (IBIMA), Dr. Miguel Díaz Recio, 28, 29010 Malaga, Spain
- Fernando M. Jabato
- Department of Molecular Biology and Biochemistry, University of Malaga, Bulevar Louis Pasteur, 31, 29010 Malaga, Spain
- Institute of Biomedical Research in Malaga (IBIMA), Dr. Miguel Díaz Recio, 28, 29010 Malaga, Spain
- James R. Perkins
- Department of Molecular Biology and Biochemistry, University of Malaga, Bulevar Louis Pasteur, 31, 29010 Malaga, Spain
- CIBER of Rare Diseases, Av. Monforte de Lemos, 3-5. Pabellon 11. Planta 0, 28029 Madrid, Spain
- Institute of Biomedical Research in Malaga (IBIMA), Dr. Miguel Díaz Recio, 28, 29010 Malaga, Spain
- José Córdoba-Caballero
- Department of Molecular Biology and Biochemistry, University of Malaga, Bulevar Louis Pasteur, 31, 29010 Malaga, Spain
- Federico García-Criado
- Department of Molecular Biology and Biochemistry, University of Malaga, Bulevar Louis Pasteur, 31, 29010 Malaga, Spain
- Ian Sillitoe
- Department of Structural and Molecular Biology, University College London, Gower Street, London, WC1E 6BT UK
- Christine Orengo
- Department of Structural and Molecular Biology, University College London, Gower Street, London, WC1E 6BT UK
- Juan A. G. Ranea
- Department of Molecular Biology and Biochemistry, University of Malaga, Bulevar Louis Pasteur, 31, 29010 Malaga, Spain
- CIBER of Rare Diseases, Av. Monforte de Lemos, 3-5. Pabellon 11. Planta 0, 28029 Madrid, Spain
- Institute of Biomedical Research in Malaga (IBIMA), Dr. Miguel Díaz Recio, 28, 29010 Malaga, Spain
- Pedro Seoane-Zonjic
- Department of Molecular Biology and Biochemistry, University of Malaga, Bulevar Louis Pasteur, 31, 29010 Malaga, Spain
- CIBER of Rare Diseases, Av. Monforte de Lemos, 3-5. Pabellon 11. Planta 0, 28029 Madrid, Spain
- Institute of Biomedical Research in Malaga (IBIMA), Dr. Miguel Díaz Recio, 28, 29010 Malaga, Spain
|
24
|
Cotroneo CE, Gormley IC, Shields DC, Salter-Townshend M. Computational modelling of chromosomally clustering protein domains in bacteria. BMC Bioinformatics 2021; 22:593. [PMID: 34906073 PMCID: PMC8670047 DOI: 10.1186/s12859-021-04512-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/22/2021] [Accepted: 11/16/2021] [Indexed: 01/19/2023] Open
Abstract
BACKGROUND: In bacteria, genes with related functions, such as those involved in the metabolism of the same compound or in infection processes, are often physically close on the genome and form groups called clusters. The enrichment of such clusters over various distantly related bacteria can be used to predict the roles of genes of unknown function that cluster with characterised genes. There is no obvious rule to define a cluster, given the variability of clusters in size and intergenic distance and the ambiguity in what comprises a "gene", since genes can gain and lose domains over time. Protein domains can cluster within a gene, or in adjacent genes of related function, and in both cases these are chromosomally clustered. Here, we model the distances between pairs of protein domain coding regions across a wide range of bacteria and archaea via a probabilistic two-component mixture model, without imposing arbitrary thresholds in terms of gene numbers or distances. RESULTS: We trained our model using matched gene ontology terms to label functionally related pairs and assessed the stability of the parameters of the model across 14,178 archaeal and bacterial strains. We found that the parameters of our mixture model are remarkably stable across bacteria and archaea, except for endosymbionts and obligate intracellular pathogens. Obligate pathogens have smaller genomes, and although they vary, on average do not show noticeably different clustering distances; the main difference in the parameter estimates is that a far greater proportion of the genes sharing ontology terms are clustered. This may reflect that these genomes are enriched for complexes encoded by clustered core housekeeping genes, as a proportion of the total genes. Given the overall stability of the parameter estimates, we then used the mean parameter estimates across the entire dataset to investigate which gene ontology terms are most frequently associated with clustered genes.
CONCLUSIONS: Given the stability of the mixture model across species, it may be used to predict bacterial gene clusters that are shared across multiple species, in addition to giving insights into the evolutionary pressures on the chromosomal locations of genes in different species.
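To make the idea of a two-component mixture over pairwise distances concrete, the following is a minimal sketch, not the authors' model. It assumes, purely for illustration, an exponential component for clustered pairs and a uniform background component, fitted by expectation-maximization; the function names, the component distributions, and the simulated data are all hypothetical.

```python
import math
import random
import statistics


def fit_two_component_mixture(distances, genome_length, n_iter=200):
    """Fit pi * Exponential(rate) + (1 - pi) * Uniform(0, genome_length) by EM.

    ASSUMPTION: these component distributions are illustrative only; the
    abstract does not state which distributions the published model uses.
    """
    pi = 0.5
    rate = 1.0 / statistics.median(distances)  # robust starting value
    uniform_density = 1.0 / genome_length
    for _ in range(n_iter):
        # E-step: posterior probability that each pair is "clustered".
        resp = []
        for d in distances:
            clustered = pi * rate * math.exp(-rate * d)
            background = (1.0 - pi) * uniform_density
            resp.append(clustered / (clustered + background))
        # M-step: update the mixing proportion and the exponential rate.
        total = sum(resp)
        pi = total / len(distances)
        rate = total / sum(r * d for r, d in zip(resp, distances))
    return pi, rate


def simulate_distances(n_clustered=600, n_background=400,
                       mean_clustered=500.0, genome_length=1_000_000.0, seed=0):
    """Hypothetical toy data: short clustered distances plus a uniform background."""
    rng = random.Random(seed)
    clustered = [rng.expovariate(1.0 / mean_clustered) for _ in range(n_clustered)]
    background = [rng.uniform(0.0, genome_length) for _ in range(n_background)]
    return clustered + background


pi_hat, rate_hat = fit_two_component_mixture(simulate_distances(), 1_000_000.0)
```

In this toy setting the fitted mixing proportion plays the role of the "proportion of clustered pairs" discussed in the abstract, and no distance threshold is imposed in advance.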
Affiliation(s)
- Chiara E Cotroneo
- School of Medicine, University College Dublin, Dublin, Ireland; Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Dublin, Ireland
| | | | - Denis C Shields
- School of Medicine, University College Dublin, Dublin, Ireland; Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Dublin, Ireland
|
25
|
James JK, Nanda V. A Folding Insulator Defines Cryptic Domains in Tropomyosin. J Mol Biol 2021; 433:167281. [PMID: 34606830 DOI: 10.1016/j.jmb.2021.167281] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/06/2021] [Revised: 09/20/2021] [Accepted: 09/27/2021] [Indexed: 10/20/2022]
Abstract
Multidomain proteins are the product of evolutionary selection for diversity of function through concatenation and repurposing of existing modular units of structures. In structures of proteins with multiple domains, components are often globular units stitched together with flexible linkers. Multidomain proteins often fold as multiple distinct order-disorder transitions. However, the relationship between structure and folding is not always straightforward. Tropomyosin binds to actin in muscle and cytoskeletal filaments. The structure is that of a continuous α-helix lacking domain boundaries, but unfolding shows distinct transitions suggesting at least three possible domains do exist. To explore how domains might occur in a continuous structure, we used Lifson-Roig helix-coil models with sequence domains of varying helical nucleation propensities. Of these models, ones with a central folding insulator, separating folding of N- and C-terminal domains, are most consistent with experimental folding studies. The positions of domain boundaries are identified by hydrogen-deuterium exchange mass spectrometry. The presence of structurally cryptic folding domains in tropomyosin could relate to its evolution and explain the uneven distribution of deleterious mutations that lead to various cardiomyopathies.
Affiliation(s)
- Jose K James
- Department of Medicine, Mayo Clinic, Rochester, MN 55905, USA.
| | - Vikas Nanda
- Center for Advanced Biotechnology and Medicine and the Department of Biochemistry and Molecular Biology, Robert Wood Johnson Medical School, Rutgers University, NJ 08854, USA.
|
26
|
In Silico Analysis of Fatty Acid Desaturases Structures in Camelina sativa, and Functional Evaluation of Csafad7 and Csafad8 on Seed Oil Formation and Seed Morphology. Int J Mol Sci 2021; 22:10857. [PMID: 34639198 PMCID: PMC8532002 DOI: 10.3390/ijms221910857] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 09/14/2021] [Revised: 10/01/2021] [Accepted: 10/05/2021] [Indexed: 12/19/2022] Open
Abstract
Fatty acid desaturases convert a single bond between two carbon atoms in a fatty acid chain into a double bond, resulting in an unsaturated bond between the two carbons. They are classified into soluble and membrane-bound desaturases, according to their structure, subcellular location, and function. The orthologous genes in Camelina sativa were identified and analyzed, giving a total of 62 desaturase genes. They shared the common fatty acid desaturase domain, which has evolved separately, and proteins of the same family originated from a common ancestor. A mix of conserved, gained, and lost intron structures was evident. In addition, conserved histidine motifs were found in each family, and transmembrane domains were found exclusively in the membrane-bound desaturases. The expression profile analysis of C. sativa desaturases revealed an increase in young leaves, seeds, and flowers. The C. sativa ω3-fatty acid desaturases CsaFAD7 and CsaFAD8 were cloned, and subcellular localization analysis showed that they are located in the chloroplast. They were transferred into Arabidopsis thaliana to obtain transgenic lines. The ω3-fatty acid desaturase could increase the C18:3 level at the expense of C18:2, but decreases in oil content and seed weight, as well as wrinkled phenotypes, were observed in transgenic CsaFAD7 lines, while no significant change was observed in transgenic CsaFAD8 lines in comparison to the wild-type. These findings give insights into the characteristics of desaturase genes and could provide a basis for further investigation aimed at C. sativa improvement. Overexpression of ω3-fatty acid desaturases in seeds could be useful in genetic engineering strategies aimed at modifying the fatty acid composition of seed oil.
|
27
|
Sanchez-Pulido L, Ponting CP. Extending the Horizon of Homology Detection with Coevolution-based Structure Prediction. J Mol Biol 2021; 433:167106. [PMID: 34139218 PMCID: PMC8527833 DOI: 10.1016/j.jmb.2021.167106] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 04/28/2021] [Revised: 06/09/2021] [Accepted: 06/09/2021] [Indexed: 12/12/2022]
Abstract
Traditional sequence analysis algorithms fail to identify distant homologies when they lie beyond a detection horizon. In this review, we discuss how co-evolution-based contact and distance prediction methods are pushing back this homology detection horizon, thereby yielding new functional insights and experimentally testable hypotheses. Based on correlated substitutions, these methods divine three-dimensional constraints among amino acids in protein sequences that were previously devoid of all annotated domains and repeats. The new algorithms discern hidden structure in an otherwise featureless sequence landscape. Their revelatory impact promises to be as profound as the use, by archaeologists, of ground-penetrating radar to discern long-hidden, subterranean structures. As examples of this, we describe how triplicated structures reflecting longin domains in MON1A-like proteins, or UVR-like repeats in DISC1, emerge from their predicted contact and distance maps. These methods also help to resolve structures that do not conform to a "beads-on-a-string" model of protein domains. In one such example, we describe CFAP298 whose ubiquitin-like domain was previously challenging to perceive owing to a large sequence insertion within it. More generally, the new algorithms permit an easier appreciation of domain families and folds whose evolution involved structural insertion or rearrangement. As we exemplify with α1-antitrypsin, coevolution-based predicted contacts may also yield insights into protein dynamics and conformational change. This new combination of structure prediction (using innovative co-evolution based methods) and homology inference (using more traditional sequence analysis approaches) shows great promise for bringing into view a sea of evolutionary relationships that had hitherto lain far beyond the horizon of homology detection.
Affiliation(s)
- Luis Sanchez-Pulido
- Medical Research Council Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh EH4 2XU, UK.
| | - Chris P Ponting
- Medical Research Council Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh EH4 2XU, UK.
|
28
|
Hempel T, Del Razo MJ, Lee CT, Taylor BC, Amaro RE, Noé F. Independent Markov decomposition: Toward modeling kinetics of biomolecular complexes. Proc Natl Acad Sci U S A 2021; 118:e2105230118. [PMID: 34321356 PMCID: PMC8346863 DOI: 10.1073/pnas.2105230118] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 12/18/2022] Open
Abstract
To advance the mission of in silico cell biology, modeling the interactions of large and complex biological systems becomes increasingly relevant. The combination of molecular dynamics (MD) simulations and Markov state models (MSMs) has enabled the construction of simplified models of molecular kinetics on long timescales. Despite its success, this approach is inherently limited by the size of the molecular system. With increasing size of macromolecular complexes, the number of independent or weakly coupled subsystems increases, and the number of global system states increases exponentially, making the sampling of all distinct global states unfeasible. In this work, we present a technique called independent Markov decomposition (IMD) that leverages weak coupling between subsystems to compute a global kinetic model without requiring the sampling of all combinatorial states of subsystems. We give a theoretical basis for IMD and propose an approach for finding and validating such a decomposition. Using empirical few-state MSMs of ion channel models that are well established in electrophysiology, we demonstrate that IMD models can reproduce experimental conductance measurements with a major reduction in sampling compared with a standard MSM approach. We further show how to find the optimal partition of all-atom protein simulations into weakly coupled subunits.
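The core idea of combining subsystem models into a global kinetic model can be illustrated with a small sketch. Under the idealized assumption that two subsystems are fully independent, the transition matrix of the joint system is the Kronecker product of the subsystem transition matrices, so the global states never need to be sampled jointly. The code below is only an illustration of that identity; the helper name and the example matrices are hypothetical and are not taken from the paper.

```python
def kron(a, b):
    """Kronecker product of two matrices given as nested lists.

    For row-stochastic inputs, entry [(i, k)][(j, l)] equals a[i][j] * b[k][l],
    i.e. the probability that subsystem A moves i -> j while B moves k -> l.
    """
    return [
        [a_ij * b_kl for a_ij in row_a for b_kl in row_b]
        for row_a in a
        for row_b in b
    ]


# Hypothetical two-state transition matrices for two independent subsystems.
t_a = [[0.9, 0.1],
       [0.2, 0.8]]
t_b = [[0.7, 0.3],
       [0.4, 0.6]]

# Global 4-state model; global state index = i * len(t_b) + k.
t_global = kron(t_a, t_b)
```

Two 2-state models yield a 4-state global model here; with many subsystems the global state count grows exponentially, which is why estimating the small matrices separately reduces the sampling burden.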
Affiliation(s)
- Tim Hempel
- Department of Mathematics and Computer Science, Freie Universität Berlin, 14195 Berlin, Germany
- Department of Physics, Freie Universität Berlin, 14195 Berlin, Germany
| | - Mauricio J Del Razo
- Department of Mathematics and Computer Science, Freie Universität Berlin, 14195 Berlin, Germany
- Van't Hoff Institute for Molecular Sciences, University of Amsterdam, 1090 GD Amsterdam, The Netherlands
- Korteweg-de Vries Institute for Mathematics, University of Amsterdam, 1090 GE Amsterdam, The Netherlands
- Dutch Institute for Emergent Phenomena, 1090 GL Amsterdam, The Netherlands
| | - Christopher T Lee
- Department of Mechanical and Aerospace Engineering, University of California San Diego, La Jolla, CA 92093
| | - Bryn C Taylor
- Biomedical Sciences Graduate Program, University of California San Diego, La Jolla, CA 92093
| | - Rommie E Amaro
- Department of Chemistry & Biochemistry, University of California San Diego, La Jolla, CA 92093
| | - Frank Noé
- Department of Mathematics and Computer Science, Freie Universität Berlin, 14195 Berlin, Germany
- Department of Physics, Freie Universität Berlin, 14195 Berlin, Germany
- Department of Chemistry, Rice University, Houston, TX 77005
|
29
|
Mulnaes D, Golchin P, Koenig F, Gohlke H. TopDomain: Exhaustive Protein Domain Boundary Metaprediction Combining Multisource Information and Deep Learning. J Chem Theory Comput 2021; 17:4599-4613. [PMID: 34161735 DOI: 10.1021/acs.jctc.1c00129] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 12/14/2022]
Abstract
Protein domains are independent, functional, and stable structural units of proteins. Accurate protein domain boundary prediction plays an important role in understanding protein structure and evolution, as well as for protein structure prediction. Current domain boundary prediction methods differ in terms of boundary definition, methodology, and training databases resulting in disparate performance for different proteins. We developed TopDomain, an exhaustive metapredictor, that uses deep neural networks to combine multisource information from sequence- and homology-based features of over 50 primary predictors. For this purpose, we developed a new domain boundary data set termed the TopDomain data set, in which the true annotations are informed by SCOPe annotations, structural domain parsers, human inspection, and deep learning. We benchmark TopDomain against 2484 targets with 3354 boundaries from the TopDomain test set and achieve F1 scores of 78.4% and 73.8% for multidomain boundary prediction within ±20 residues and ±10 residues of the true boundary, respectively. When examined on targets from CASP11-13 competitions, TopDomain achieves F1 scores of 47.5% and 42.8% for multidomain proteins. TopDomain significantly outperforms 15 widely used, state-of-the-art ab initio and homology-based domain boundary predictors. Finally, we implemented TopDomainTMC, which accurately predicts whether domain parsing is necessary for the target protein.
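The F1 scores "within ±20 residues" quoted above depend on how predicted boundaries are matched to true ones. The following sketch shows one plausible scoring scheme, assuming a greedy one-to-one matching in which a predicted boundary counts as a true positive if it lies within the tolerance of a not-yet-matched true boundary. The matching rule and the function name are assumptions for illustration; the abstract does not specify the exact procedure used by TopDomain.

```python
def boundary_f1(predicted, true, tolerance=20):
    """F1 score for domain boundary positions with a residue tolerance.

    ASSUMPTION: greedy one-to-one matching; each true boundary can be
    claimed by at most one predicted boundary.
    """
    unmatched = sorted(true)
    true_positives = 0
    for p in sorted(predicted):
        hit = next((t for t in unmatched if abs(t - p) <= tolerance), None)
        if hit is not None:
            unmatched.remove(hit)
            true_positives += 1
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(true) if true else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# Hypothetical example: three predicted boundaries, two true boundaries.
score = boundary_f1([100, 250, 400], [110, 300], tolerance=20)
```

Tightening the tolerance can only remove matches, which is consistent with the lower score reported at ±10 residues than at ±20.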
Affiliation(s)
- Daniel Mulnaes
- Institut für Pharmazeutische und Medizinische Chemie, Heinrich-Heine-Universität Düsseldorf, Universitätsstr. 1, 40225 Düsseldorf, Germany
| | - Pegah Golchin
- Institut für Pharmazeutische und Medizinische Chemie, Heinrich-Heine-Universität Düsseldorf, Universitätsstr. 1, 40225 Düsseldorf, Germany
| | - Filip Koenig
- Institut für Pharmazeutische und Medizinische Chemie, Heinrich-Heine-Universität Düsseldorf, Universitätsstr. 1, 40225 Düsseldorf, Germany
| | - Holger Gohlke
- Institut für Pharmazeutische und Medizinische Chemie, Heinrich-Heine-Universität Düsseldorf, Universitätsstr. 1, 40225 Düsseldorf, Germany; John von Neumann Institute for Computing (NIC), Jülich Supercomputing Centre (JSC), Institute of Biological Information Processing (IBI-7: Structural Biochemistry) & Institute of Bio- and Geosciences (IBG-4: Bioinformatics), Forschungszentrum Jülich GmbH, 52425 Jülich, Germany
|
30
|
Ferruz N, Noske J, Höcker B. Protlego: A Python package for the analysis and design of chimeric proteins. Bioinformatics 2021; 37:3182-3189. [PMID: 33901273 PMCID: PMC8504633 DOI: 10.1093/bioinformatics/btab253] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 10/25/2020] [Revised: 03/05/2021] [Accepted: 04/19/2021] [Indexed: 01/03/2023] Open
Abstract
Motivation: Duplication and recombination of protein fragments have led to the highly diverse protein space that we observe today. By mimicking this natural process, the design of protein chimeras via fragment recombination has proven experimentally successful and has opened a new era for the design of customizable proteins. The in silico building of structural models for these chimeric proteins, however, remains a manual task that requires a considerable degree of expertise and is not amenable to high-throughput studies. Energetic and structural analysis of the designed proteins often requires the use of several tools, each with its own technical difficulties and available in different programming languages or web servers. Results: We implemented a Python package that enables automated, high-throughput design of chimeras and their structural analysis. First, it fetches evolutionarily conserved fragments from a built-in database (also available at fuzzle.uni-bayreuth.de). These relationships can then be represented via networks or further selected for chimera construction via recombination. Designed chimeras or natural proteins are then scored and minimized with the Charmm and Amber forcefields, and their diverse structural features can be analyzed with ease. Here, we showcase Protlego's pipeline by exploring the relationships between the P-loop and Rossmann superfolds, building and characterizing their offspring chimeras. We believe that Protlego provides a powerful new tool for the protein design community. Availability and implementation: Protlego runs on the Linux platform and is freely available at https://hoecker-lab.github.io/protlego/ with tutorials and documentation. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Noelia Ferruz
- Department of Biochemistry, University of Bayreuth, Bayreuth, Germany
| | - Jakob Noske
- Department of Biochemistry, University of Bayreuth, Bayreuth, Germany
| | - Birte Höcker
- Department of Biochemistry, University of Bayreuth, Bayreuth, Germany
|
31
|
Dermouche S, Chagot ME, Manival X, Quinternet M. Optimizing the First TPR Domain of the Human SPAG1 Protein Provides Insight into the HSP70 and HSP90 Binding Properties. Biochemistry 2021; 60:2349-2363. [PMID: 33739091 DOI: 10.1021/acs.biochem.1c00052] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/28/2022]
Abstract
Tetratricopeptide repeat domains, or TPR domains, are protein domains that mediate protein:protein interaction. As they allow contacts between proteins, they are of particular interest in transient steps of the assembly process of macromolecular complexes, such as the ribosome or the dynein arms. In this study, we focused on the first TPR domain of the human SPAG1 protein. SPAG1 is a multidomain protein that is important for ciliogenesis whose known mutations are linked to primary ciliary dyskinesia syndrome. It can interact with the chaperones RUVBL1/2, HSP70, and HSP90. Using protein sequence optimization in combination with structural and biophysical approaches, we analyzed, with atomistic precision, how the C-terminal tails of HSPs bind a variant form of SPAG1-TPR1 that mimics the wild-type domain. We discuss our results with regard to other complex three-dimensional structures with the aim of highlighting the motifs in the TPR sequences that could drive the positioning of the HSP peptides. These data could be important for the druggability of TPR regulators.
Affiliation(s)
- Sana Dermouche
- Université de Lorraine, CNRS, IMoPA, F-54000 Nancy, France
| | | | - Xavier Manival
- Université de Lorraine, CNRS, IMoPA, F-54000 Nancy, France
| | - Marc Quinternet
- Université de Lorraine, CNRS, INSERM, IBSLor, F-54000 Nancy, France
|
32
|
Russo ET, Laio A, Punta M. Density Peak clustering of protein sequences associated to a Pfam clan reveals clear similarities and interesting differences with respect to manual family annotation. BMC Bioinformatics 2021; 22:121. [PMID: 33711918 PMCID: PMC7955657 DOI: 10.1186/s12859-021-04013-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/25/2020] [Accepted: 02/09/2021] [Indexed: 11/24/2022] Open
Abstract
BACKGROUND: The identification of protein families is of outstanding practical importance for in silico protein annotation and is at the basis of several bioinformatic resources. Pfam is possibly the best-known protein family database, built over many years of work by domain experts with extensive use of manual curation. This approach is generally very accurate, but it is quite time-consuming and may suffer from a bias generated by the hand-curation itself, which is often guided by the available experimental evidence. RESULTS: We introduce a procedure that aims to automatically identify putative protein families. The procedure is based on Density Peak Clustering and uses as input only local pairwise alignments between protein sequences. In the experiment we present here, we ran the algorithm on about 4000 full-length proteins with at least one domain classified by Pfam as belonging to the Pseudouridine synthase and Archaeosine transglycosylase (PUA) clan. We obtained 71 automatically generated sequence clusters with at least 100 members. While our clusters were largely consistent with the Pfam classification, showing good overlap with either single or multi-domain Pfam family architectures, we also observed some inconsistencies. The latter were inspected using structural and sequence-based evidence, which suggested that the automatic classification captured evolutionary signals reflecting non-trivial features of protein family architectures. Based on this analysis we identified a putative novel pre-PUA domain as well as alternative boundaries for a few PUA or PUA-associated families. As a first indication that our approach was unlikely to be clan-specific, we performed the same analysis on the P53 clan, obtaining comparable results.
CONCLUSIONS: The clustering procedure described in this work takes advantage of the information contained in a large set of pairwise alignments and successfully identifies a set of putative families and family architectures in an unsupervised manner. Comparison with the Pfam classification highlights significant overlap and points to interesting differences, suggesting that our new algorithm could have potential in applications related to automatic protein classification. Testing this hypothesis, however, will require further experiments on large and diverse sequence datasets.
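For readers unfamiliar with Density Peak Clustering, the sketch below shows the general technique on a precomputed distance matrix: each point gets a local density (neighbours within a cutoff) and a distance to the nearest denser point, points with large values of both are taken as cluster centers, and every other point inherits the label of its nearest denser neighbour. This is a generic illustration, not the authors' pipeline; how alignment scores are turned into distances, the cutoff value, and the fixed number of clusters are all assumptions.

```python
def density_peaks(dist, cutoff, n_clusters):
    """Generic Density Peak Clustering on a symmetric distance matrix."""
    n = len(dist)
    # Local density: number of other points closer than the cutoff.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < cutoff)
           for i in range(n)]
    order = sorted(range(n), key=lambda i: -rho[i])  # decreasing density
    global_max = max(max(row) for row in dist)
    delta = [0.0] * n
    nearest_denser = [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = global_max  # densest point: always ranked first below
            continue
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i] = dist[i][j]
        nearest_denser[i] = j
    # Centers: largest rho * delta (stable sort keeps the densest point first).
    centers = sorted(order, key=lambda i: -rho[i] * delta[i])[:n_clusters]
    labels = [-1] * n
    for label, center in enumerate(centers):
        labels[center] = label
    # Assign remaining points in order of decreasing density.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_denser[i]]
    return labels


# Hypothetical toy data: two well-separated groups on a line.
points = [0.0, 0.5, 1.0, 0.6, 10.0, 10.5, 11.0, 10.4]
toy_dist = [[abs(p - q) for q in points] for p in points]
toy_labels = density_peaks(toy_dist, cutoff=0.7, n_clusters=2)
```

Because only pairwise distances are needed, the same routine applies to any dissimilarity derived from pairwise alignments, which is what makes the approach usable without a multiple alignment.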
Affiliation(s)
| | | | - Marco Punta
- Centre for Evolution and Cancer, The Institute of Cancer Research, London, SM2 5NG UK
- Present Address: Center for Omics Sciences, IRCCS San Raffaele Hospital, 20132 Milan, Italy
|
33
|
Ma Y, Chhapekar SS, Lu L, Oh S, Singh S, Kim CS, Kim S, Choi GJ, Lim YP, Choi SR. Genome-wide identification and characterization of NBS-encoding genes in Raphanus sativus L. and their roles related to Fusarium oxysporum resistance. BMC Plant Biology 2021; 21:47. [PMID: 33461498 PMCID: PMC7814608 DOI: 10.1186/s12870-020-02803-8] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 07/24/2020] [Accepted: 12/16/2020] [Indexed: 05/21/2023]
Abstract
BACKGROUND: The nucleotide-binding site-leucine-rich repeat (NBS-LRR) genes are important for plant development and disease resistance. Although genome-wide studies of NBS-encoding genes have been performed in several species, the evolution, structure, expression, and function of these genes remain unknown in radish (Raphanus sativus L.). A recently released draft R. sativus L. reference genome has facilitated the genome-wide identification and characterization of NBS-encoding genes in radish. RESULTS: A total of 225 NBS-encoding genes were identified in the radish genome based on the essential NB-ARC domain through HMM search and Pfam database, with 202 mapped onto nine chromosomes and the remaining 23 localized on different scaffolds. According to a gene structure analysis, we identified 99 NBS-LRR-type genes and 126 partial NBS-encoding genes. Additionally, 80 and 19 genes respectively encoded an N-terminal Toll/interleukin-like domain and a coiled-coil domain. Furthermore, 72% of the 202 NBS-encoding genes were grouped in 48 clusters distributed in 24 crucifer blocks on chromosomes. The U block on chromosomes R02, R04, and R08 had the most NBS-encoding genes (48), followed by the R (24), D (23), E (23), and F (17) blocks. These clusters were mostly homogeneous, containing NBS-encoding genes derived from a recent common ancestor. Tandem (15 events) and segmental (20 events) duplications were revealed in the NBS family. Comparative evolutionary analyses of orthologous genes among Arabidopsis thaliana, Brassica rapa, and Brassica oleracea reflected the importance of the NBS-LRR gene family during evolution. Moreover, examinations of cis-elements identified 70 major elements involved in responses to methyl jasmonate, abscisic acid, auxin, and salicylic acid. According to RNA-seq expression analyses, 75 NBS-encoding genes contributed to the resistance of radish to Fusarium wilt. A quantitative real-time PCR analysis revealed that RsTNL03 (Rs093020) and RsTNL09 (Rs042580) expression positively regulates radish resistance to Fusarium oxysporum, in contrast to the negative regulatory role for RsTNL06 (Rs053740).
CONCLUSIONS: The NBS-encoding gene structures, tandem and segmental duplications, synteny, and expression profiles in radish were elucidated for the first time and compared with those of other Brassicaceae family members (A. thaliana, B. oleracea, and B. rapa) to clarify the evolution of the NBS gene family. These results may be useful for functionally characterizing NBS-encoding genes in radish.
Affiliation(s)
- Yinbo Ma
- Molecular Genetics and Genomics Laboratory, Department of Horticulture, College of Agriculture and Life Science, Chungnam National University, Daejeon, 34134 Republic of Korea
| | - Sushil Satish Chhapekar
- Molecular Genetics and Genomics Laboratory, Department of Horticulture, College of Agriculture and Life Science, Chungnam National University, Daejeon, 34134 Republic of Korea
| | - Lu Lu
- Molecular Genetics and Genomics Laboratory, Department of Horticulture, College of Agriculture and Life Science, Chungnam National University, Daejeon, 34134 Republic of Korea
| | - Sangheon Oh
- Molecular Genetics and Genomics Laboratory, Department of Horticulture, College of Agriculture and Life Science, Chungnam National University, Daejeon, 34134 Republic of Korea
| | - Sonam Singh
- Molecular Genetics and Genomics Laboratory, Department of Horticulture, College of Agriculture and Life Science, Chungnam National University, Daejeon, 34134 Republic of Korea
| | - Chang Soo Kim
- Department of Crop Science, College of Agricultural and Life Sciences, Chungnam National University, Daejeon, 34134 Republic of Korea
| | - Seungho Kim
- Neo Seed Co., 256-45 Jingeonjung-gil, Gongdo-eup, Anseong, Gyeonggi Province 17565 Republic of Korea
| | - Gyung Ja Choi
- Center for Eco-friendly New Materials, Korea Research Institute of Chemical Technology, Daejeon, 34114 Republic of Korea
| | - Yong Pyo Lim
- Molecular Genetics and Genomics Laboratory, Department of Horticulture, College of Agriculture and Life Science, Chungnam National University, Daejeon, 34134 Republic of Korea
| | - Su Ryun Choi
- Molecular Genetics and Genomics Laboratory, Department of Horticulture, College of Agriculture and Life Science, Chungnam National University, Daejeon, 34134 Republic of Korea
|
34
|
James JE, Willis SM, Nelson PG, Weibel C, Kosinski LJ, Masel J. Universal and taxon-specific trends in protein sequences as a function of age. eLife 2021; 10:e57347. [PMID: 33416492 PMCID: PMC7819706 DOI: 10.7554/elife.57347] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 03/28/2020] [Accepted: 01/05/2021] [Indexed: 01/12/2023] Open
Abstract
Extant protein-coding sequences span a huge range of ages, from those that emerged only recently to those present in the last universal common ancestor. Because evolution has had less time to act on young sequences, there might be 'phylostratigraphy' trends in any properties that evolve slowly with age. A long-term reduction in hydrophobicity and hydrophobic clustering was found in previous, taxonomically restricted studies. Here we perform integrated phylostratigraphy across 435 fully sequenced species, using sensitive HMM methods to detect protein domain homology. We find that the reduction in hydrophobic clustering is universal across lineages. However, only young animal domains have a tendency to have higher structural disorder. Among ancient domains, trends in amino acid composition reflect the order of recruitment into the genetic code, suggesting that the composition of the contemporary descendants of ancient sequences reflects amino acid availability during the earliest stages of life, when these sequences first emerged.
Collapse
Affiliation(s)
- Jennifer E James
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, United States
| | - Sara M Willis
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, United States
| | - Paul G Nelson
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, United States
| | - Catherine Weibel
- Department of Physics, University of Arizona, Tucson, United States
- Department of Mathematics, University of Arizona, Tucson, United States
| | - Luke J Kosinski
- Department of Molecular and Cellular Biology, University of Arizona, Tucson, United States
| | - Joanna Masel
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, United States
|
35
|
Wanchai V, Nookaew I, Ussery DW. ProdMX: Rapid query and analysis of protein functional domain based on compressed sparse matrices. Comput Struct Biotechnol J 2020; 18:3890-3896. [PMID: 33335686 PMCID: PMC7719867 DOI: 10.1016/j.csbj.2020.10.023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2020] [Revised: 10/20/2020] [Accepted: 10/23/2020] [Indexed: 11/26/2022] Open
Abstract
Large-scale protein analysis has been used to characterize large numbers of proteins across numerous species. One application is as a high-throughput screening method for the pathogenicity of genomes. Unlike sequence homology methods, protein comparison at a functional level provides a unique opportunity to classify proteins based on their functional structures, without dealing with the sequence complexity of distantly related species. Protein functions can be abstractly described by a set of protein functional domains, such as PfamA domains; a set of genomes can then be mapped to a matrix, with each row representing a genome and the columns representing the presence or absence of a given functional domain. However, a powerful tool is needed to analyze the large sparse matrices generated by the millions of genomes that will become available in the near future. ProdMX is a tool with user-friendly utilities developed to facilitate high-throughput analysis of proteins, and it can be included as a module in a high-throughput pipeline. ProdMX employs a compressed sparse matrix algorithm to reduce the computational resources and time needed for matrix manipulation during functional domain analysis. ProdMX is a free and publicly available Python package that can be installed with popular package managers such as PyPI and Conda, or with a standard installer from source code available on the ProdMX GitHub repository at https://github.com/visanuwan/prodmx.
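The genome-by-domain presence/absence matrix described above is mostly zeros, which is why a compressed sparse layout pays off. The sketch below builds a compressed sparse row (CSR) representation by hand to show what is stored; it does not use or reproduce the ProdMX API, and the function names, genome names, and domain identifiers are placeholders chosen for illustration.

```python
def build_csr(genome_domains):
    """Build a binary CSR matrix from a mapping genome -> set of domain IDs.

    Only the column indices of present domains are stored (`indices`),
    plus one offset per row (`indptr`); absent domains cost no memory.
    """
    genomes = sorted(genome_domains)
    domains = sorted({d for ds in genome_domains.values() for d in ds})
    column = {d: j for j, d in enumerate(domains)}
    indptr, indices = [0], []
    for g in genomes:
        indices.extend(sorted(column[d] for d in genome_domains[g]))
        indptr.append(len(indices))
    return genomes, domains, indptr, indices


def genomes_with_domain(csr, domain):
    """Return the genomes whose row contains the given domain."""
    genomes, domains, indptr, indices = csr
    j = domains.index(domain)
    return [g for i, g in enumerate(genomes)
            if j in indices[indptr[i]:indptr[i + 1]]]


# Placeholder input: three genomes annotated with Pfam-style identifiers.
example = {
    "genomeA": {"PF00005", "PF00072"},
    "genomeB": {"PF00072"},
    "genomeC": set(),
}
example_csr = build_csr(example)
hits = genomes_with_domain(example_csr, "PF00072")
```

In practice a library implementation such as a SciPy sparse matrix would replace the hand-rolled lists; the point here is only that storage scales with the number of domain hits rather than with genomes times domains.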
Affiliation(s)
- Visanu Wanchai
- Arkansas Center for Genomic Epidemiology & Medicine and The Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, AR 72205, USA
- Intawat Nookaew
- Arkansas Center for Genomic Epidemiology & Medicine and The Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, AR 72205, USA
- David W Ussery
- Arkansas Center for Genomic Epidemiology & Medicine and The Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, AR 72205, USA
|
36
|
Chakraborty C, Sharma AR, Sharma G, Lee SS. Comparative Analysis and Molecular Evolution of Class I PI3K Regulatory Subunit p85α Reveal the Structural Similarity Between nSH2 and cSH2 Domains. Int J Pept Res Ther 2020. [DOI: 10.1007/s10989-020-10039-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
|
37
|
Ferruz N, Lobos F, Lemm D, Toledo-Patino S, Farías-Rico JA, Schmidt S, Höcker B. Identification and Analysis of Natural Building Blocks for Evolution-Guided Fragment-Based Protein Design. J Mol Biol 2020; 432:3898-3914. [PMID: 32330481 PMCID: PMC7322520 DOI: 10.1016/j.jmb.2020.04.013] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Received: 12/23/2019] [Revised: 04/12/2020] [Accepted: 04/13/2020] [Indexed: 12/15/2022]
Abstract
Natural evolution has generated an impressively diverse protein universe via duplication and recombination from a set of protein fragments that served as building blocks. The application of these concepts to the design of new proteins using subdomain-sized fragments from different folds has proven to be experimentally successful. To better understand how evolution has shaped our protein universe, we performed an all-against-all comparison of protein domains representing all naturally existing folds and identified conserved homologous protein fragments. Overall, we found more than 1000 protein fragments of various lengths among different folds through similarity network analysis. These fragments are present in very different protein environments and represent versatile building blocks for protein design. These data are available in our web server called F(old P)uzzle (fuzzle.uni-bayreuth.de), which allows users to filter the dataset individually and create customized networks for folds of interest. We believe that our results serve as an invaluable resource for structural and evolutionary biologists and as raw material for the design of custom-made proteins.
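The general idea of turning an all-against-all comparison into a similarity network can be sketched as follows. This is an illustration under stated assumptions, not the Fuzzle pipeline: the scores, the threshold, the domain labels and the function names `build_network` and `connected_components` are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def build_network(scores: Dict[Tuple[str, str], float], threshold: float) -> Dict[str, Set[str]]:
    """Connect two entities when their pairwise similarity reaches the threshold."""
    adjacency: Dict[str, Set[str]] = defaultdict(set)
    for (a, b), score in scores.items():
        # Touch both keys so that isolated entities still appear as nodes.
        adjacency[a]
        adjacency[b]
        if score >= threshold:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def connected_components(adjacency: Dict[str, Set[str]]) -> List[List[str]]:
    """Group nodes into clusters with an iterative depth-first search."""
    seen: Set[str] = set()
    components: List[List[str]] = []
    for start in sorted(adjacency):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(adjacency[node] - seen)
        components.append(sorted(component))
    return components


# Hypothetical pairwise similarity scores between four domains.
toy_scores = {("d1", "d2"): 0.9, ("d2", "d3"): 0.8, ("d3", "d4"): 0.1}
network = build_network(toy_scores, threshold=0.5)
print(connected_components(network))
```

Raising or lowering the threshold splits or merges clusters, which is why customized networks for a fold of interest are typically explored at several cut-offs.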
Affiliation(s)
- Noelia Ferruz
- Department of Biochemistry, University of Bayreuth, Bayreuth, Germany
- Francisco Lobos
- Department of Biochemistry, University of Bayreuth, Bayreuth, Germany; Max Planck Institute for Developmental Biology, Tübingen, Germany
- Dominik Lemm
- Department of Biochemistry, University of Bayreuth, Bayreuth, Germany
- Saacnicteh Toledo-Patino
- Department of Biochemistry, University of Bayreuth, Bayreuth, Germany; Max Planck Institute for Developmental Biology, Tübingen, Germany
- Steffen Schmidt
- Max Planck Institute for Developmental Biology, Tübingen, Germany; Computational Biochemistry, University of Bayreuth, Bayreuth, Germany
- Birte Höcker
- Department of Biochemistry, University of Bayreuth, Bayreuth, Germany; Max Planck Institute for Developmental Biology, Tübingen, Germany
|
38
|
Begum Y, Mondal SK. Comprehensive study of the genes involved in chlorophyll synthesis and degradation pathways in some monocot and dicot plant species. J Biomol Struct Dyn 2020; 39:2387-2414. [PMID: 32292132 DOI: 10.1080/07391102.2020.1748717] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 12/20/2022]
Abstract
Chlorophyll (Chl) biosynthesis is one of the most important cellular processes essential for plant photosynthesis. The Chl degradation pathway is also an important catabolic process that occurs during leaf senescence, fruit ripening and under biotic or abiotic stress conditions. Here we have systematically investigated the molecular evolution, gene structure, compositional analysis along with ENc plot, correspondence analysis and codon usage bias of the proteins and encoded genes involved in Chl metabolism from monocots and dicots. The gene- and species-specific phylogenetic trees built from amino acid sequences showed clear clustering of the selected species into monocots and dicots, but this clustering was not supported by 18S rRNA. Nucleotide composition of the encoding genes showed that average GC%, GC1%, GC2% and GC3% were higher in monocots. RSCU analysis shows that genes from monocots for both pathways, and genes for the synthesis pathway from dicots, are biased only towards G/C-ending synonymous codons, whereas in the degradation pathway most optimal codons (except UUG) in dicots are biased towards A/U-ending synonymous codons. We found strong evidence of episodic diversifying selection at several amino acid sites in all genes investigated. Conserved domains and gene structures were observed for the genes involved in Chl metabolism, with varying lengths of introns and exons, along with some intronless genes within the synthesis pathway. ENc and correspondence analyses suggested that mutational or selection constraints on the genes shape the codon usage. These comprehensive studies may be helpful in further research in molecular phylogenetics and genomics and to better understand the evolutionary dynamics of the Chl metabolic pathway. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Yasmin Begum
- Department of Biophysics, Molecular Biology and Bioinformatics, University of Calcutta, Kolkata, West Bengal, India; Center of Excellence in Systems Biology and Biomedical Engineering (TEQIP Phase-II), University of Calcutta, Kolkata, West Bengal, India
- Sunil Kanti Mondal
- Department of Biotechnology, The University of Burdwan, Burdwan, West Bengal, India
|
39
|
Veevers R, Cawley G, Hayward S. Investigation of sequence features of hinge-bending regions in proteins with domain movements using kernel logistic regression. BMC Bioinformatics 2020; 21:137. [PMID: 32272894 PMCID: PMC7147021 DOI: 10.1186/s12859-020-3464-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2019] [Accepted: 03/20/2020] [Indexed: 11/12/2022] Open
Abstract
Background: Hinge-bending movements in proteins comprising two or more domains form a large class of functional movements. Hinge-bending regions demarcate protein domains and collectively control the domain movement. Consequently, the ability to recognise sequence features of hinge-bending regions and to predict them from sequence alone would benefit various areas of protein research. For example, an understanding of how the sequence features of these regions relate to dynamic properties in multi-domain proteins would aid in the rational design of linkers in therapeutic fusion proteins. Results: The DynDom database of protein domain movements comprises sequences annotated to indicate whether each amino acid residue is located within a hinge-bending region or within an intradomain region. Using statistical methods and Kernel Logistic Regression (KLR) models, this data was used to determine sequence features that favour or disfavour hinge-bending regions. This is a difficult classification problem as the number of negative cases (intradomain residues) is much larger than the number of positive cases (hinge residues). The statistical methods and the KLR models both show that cysteine has the lowest propensity for hinge-bending regions and proline has the highest, even though it is the most rigid amino acid. As hinge-bending regions have been previously shown to occur frequently at the terminal regions of the secondary structures, the propensity for proline at these regions is likely due to its tendency to break secondary structures. The KLR models also indicate that isoleucine may act as a domain-capping residue. We have found that a quadratic KLR model outperforms a linear KLR model and that improvement in performance occurs up to very long window lengths (eighty residues), indicating long-range correlations.
Conclusion: In contrast to the only other approach that focused solely on interdomain hinge-bending regions, the method provides a modest and statistically significant improvement over a random classifier. An explanation of the KLR results is that in the prediction of hinge-bending regions a long-range correlation is at play between a small number of amino acids that either favour or disfavour hinge-bending regions. The resulting sequence-based prediction tool, HingeSeek, is available to run through a webserver at hingeseek.cmp.uea.ac.uk.
Affiliation(s)
- Ruth Veevers
- Computational Biology Laboratory, School of Computing Sciences, University of East Anglia, Norwich, NR4 7TJ, UK
- Gavin Cawley
- Computational Biology Laboratory, School of Computing Sciences, University of East Anglia, Norwich, NR4 7TJ, UK
- Steven Hayward
- Computational Biology Laboratory, School of Computing Sciences, University of East Anglia, Norwich, NR4 7TJ, UK
|
40
|
Milner-White EJ. Protein three-dimensional structures at the origin of life. Interface Focus 2019; 9:20190057. [PMID: 31641431 DOI: 10.1098/rsfs.2019.0057] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Accepted: 09/10/2019] [Indexed: 12/22/2022] Open
Abstract
Proteins are relatively easy to synthesize compared to nucleic acids, and it is likely that there existed a stage prior to the RNA world which can be called the protein world. Some of the three-dimensional (3D) peptide structures in these proteins have, we argue, been conserved since then and may constitute the oldest biological relics in existence. We focus on 3D peptide motifs consisting of up to eight or so amino acid residues. The best known of these is the 'nest', a three- to seven-residue protein motif, which has the function of binding anionic atoms or groups of atoms. Ten per cent of amino acids in typical proteins belong to a nest, so it is a common motif. A five-residue nest is found as part of the well-known P-loop that is a recurring feature of many ATP- or GTP-binding proteins, and it has the function of binding the phosphate part of these ligands. A synthetic hexapeptide, ser-gly-ala-gly-lys-thr, designed to resemble the P-loop, has been shown to bind inorganic phosphate. Another type of nest binds iron-sulfur centres. A range of other simple motifs occur with various intriguing 3D structures; others bind cations or form channels that transport potassium ions; other peptides form catalytically active haem-like or sheet structures with certain transition metals. Amyloid peptides are also discussed. It now seems that the earliest polypeptides were far from being functionless stretches, and had many of the properties, both binding and catalytic, that might be expected to encourage and stabilize simple life forms in the hydrothermal vents of ocean depths.
Affiliation(s)
- E James Milner-White
- Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow G12 8QQ, UK
|
41
|
Wang L, Zhang Y, Zou S. The characterization of pc-polylines representing protein backbones. Proteins 2019; 88:307-318. [PMID: 31442337 DOI: 10.1002/prot.25803] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 05/06/2019] [Revised: 08/08/2019] [Accepted: 08/19/2019] [Indexed: 11/10/2022]
Abstract
The backbone of a protein is typically represented as either a Cα-polyline, a three-dimensional (3D) polyline that passes through the Cα atoms, or a tuple of ϕ,ψ pairs, while its fold is usually assigned using the 3D topological arrangement of the secondary structure elements (SSEs). It is tricky to obtain the SSE composition for a protein from the Cα-polyline representation, while its 3D SSE arrangement is not apparent in the two-dimensional (2D) ϕ,ψ representation. In this article, we first represent the backbone of a protein as a pc-polyline that passes through the centers of its peptide planes. We then analyze the pc-polylines for six different sets of proteins with high-quality crystal structures. The results show that SSE composition becomes recognizable in the pc-polyline representation and consequently the geometrical property of the pc-polyline of a protein could be used to assign its secondary structure. Furthermore, our analysis finds that for each of the six sets the total length of a pc-polyline increases linearly with the number of peptide planes. Interestingly, a comparison of the six regression lines shows that they have almost identical slopes but different intercepts. Most interestingly, there exist decent linear correlations between the intercepts of the six lines and either the average helix contents or the average sheet contents, and between the intercepts and the average backbone hydrogen bonding energetics. Finally, we discuss the implications of the identified correlations for structure classification and protein folding, and the potential applications of pc-polyline representation to structure prediction and protein design.
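A rough sketch of how a polyline through peptide-plane centers and its total length might be computed is given below. The abstract does not define the center of a peptide plane, so this sketch assumes, purely for illustration, that it can be approximated by the midpoint of the peptide bond between the carbonyl C of residue i and the amide N of residue i+1. The function names and coordinates are hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def peptide_plane_centers(n_atoms: Sequence[Point], c_atoms: Sequence[Point]) -> List[Point]:
    """Approximate each peptide-plane center as the midpoint of the C(i)-N(i+1) bond.

    ASSUMPTION: the midpoint is a stand-in for the center used in the paper,
    whose exact definition is not given in the abstract.
    """
    centers: List[Point] = []
    for i in range(len(c_atoms) - 1):
        c, n = c_atoms[i], n_atoms[i + 1]
        centers.append(((c[0] + n[0]) / 2.0, (c[1] + n[1]) / 2.0, (c[2] + n[2]) / 2.0))
    return centers


def polyline_length(points: Sequence[Point]) -> float:
    """Sum the Euclidean lengths of consecutive segments."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


# Hypothetical collinear toy coordinates for three residues (not real protein data).
toy_n = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (8.0, 0.0, 0.0)]
toy_c = [(2.0, 0.0, 0.0), (6.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
centers = peptide_plane_centers(toy_n, toy_c)
print(len(centers), polyline_length(centers))
```

A chain of n residues yields n-1 peptide planes, so plotting `polyline_length` against `len(centers)` for many structures is the kind of length-versus-plane-count relationship the abstract reports as linear.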
Affiliation(s)
- Lincong Wang
- The College of Computer Science and Technology, Jilin University, Changchun, Jilin, China
- Yao Zhang
- The College of Computer Science and Technology, Jilin University, Changchun, Jilin, China
- Shuxue Zou
- The College of Computer Science and Technology, Jilin University, Changchun, Jilin, China
|
42
|
Horsefield S, Burdett H, Zhang X, Manik MK, Shi Y, Chen J, Qi T, Gilley J, Lai JS, Rank MX, Casey LW, Gu W, Ericsson DJ, Foley G, Hughes RO, Bosanac T, von Itzstein M, Rathjen JP, Nanson JD, Boden M, Dry IB, Williams SJ, Staskawicz BJ, Coleman MP, Ve T, Dodds PN, Kobe B. NAD+ cleavage activity by animal and plant TIR domains in cell death pathways. Science 2019; 365:793-799. [PMID: 31439792 DOI: 10.1126/science.aax1911] [Citation(s) in RCA: 334] [Impact Index Per Article: 55.7] [Received: 02/28/2019] [Accepted: 07/23/2019] [Indexed: 02/02/2023]
Abstract
SARM1 (sterile alpha and TIR motif containing 1) is responsible for depletion of nicotinamide adenine dinucleotide in its oxidized form (NAD+) during Wallerian degeneration associated with neuropathies. Plant nucleotide-binding leucine-rich repeat (NLR) immune receptors recognize pathogen effector proteins and trigger localized cell death to restrict pathogen infection. Both processes depend on closely related Toll/interleukin-1 receptor (TIR) domains in these proteins, which, as we show, feature self-association-dependent NAD+ cleavage activity associated with cell death signaling. We further show that SARM1 SAM (sterile alpha motif) domains form an octamer essential for axon degeneration that contributes to TIR domain enzymatic activity. The crystal structures of ribose and NADP+ (the oxidized form of nicotinamide adenine dinucleotide phosphate) complexes of SARM1 and plant NLR RUN1 TIR domains, respectively, reveal a conserved substrate binding site. NAD+ cleavage by TIR domains is therefore a conserved feature of animal and plant cell death signaling pathways.
Affiliation(s)
- Shane Horsefield
- School of Chemistry and Molecular Biosciences, Institute for Molecular Bioscience and Australian Infectious Diseases Research Centre, University of Queensland, Brisbane, QLD 4072, Australia
- Hayden Burdett
- School of Chemistry and Molecular Biosciences, Institute for Molecular Bioscience and Australian Infectious Diseases Research Centre, University of Queensland, Brisbane, QLD 4072, Australia
- Xiaoxiao Zhang
- Agriculture and Food, Commonwealth Scientific and Industrial Research Organisation, Canberra, ACT 2601, Australia; Plant Sciences Division, Research School of Biology, The Australian National University, Canberra ACT 2601, Australia
- Mohammad K Manik
- School of Chemistry and Molecular Biosciences, Institute for Molecular Bioscience and Australian Infectious Diseases Research Centre, University of Queensland, Brisbane, QLD 4072, Australia
- Yun Shi
- Institute for Glycomics, Griffith University, Southport, QLD 4222, Australia
- Jian Chen
- Agriculture and Food, Commonwealth Scientific and Industrial Research Organisation, Canberra, ACT 2601, Australia; Plant Sciences Division, Research School of Biology, The Australian National University, Canberra ACT 2601, Australia
- Tiancong Qi
- Department of Plant and Microbial Biology, University of California Berkeley, Berkeley, CA 94720, USA
- Jonathan Gilley
- John van Geest Centre for Brain Repair, Department of Clinical Neurosciences, University of Cambridge, ED Adrian Building, Forvie Site, Robinson Way, Cambridge CB2 0PY, UK; Babraham Institute, Babraham, Cambridge CB22 3AT, UK
- Jhih-Siang Lai
- School of Chemistry and Molecular Biosciences, Institute for Molecular Bioscience and Australian Infectious Diseases Research Centre, University of Queensland, Brisbane, QLD 4072, Australia
- Maxwell X Rank
- School of Chemistry and Molecular Biosciences, Institute for Molecular Bioscience and Australian Infectious Diseases Research Centre, University of Queensland, Brisbane, QLD 4072, Australia
- Lachlan W Casey
- School of Chemistry and Molecular Biosciences, Institute for Molecular Bioscience and Australian Infectious Diseases Research Centre, University of Queensland, Brisbane, QLD 4072, Australia; Centre for Microscopy and Microanalysis, University of Queensland, Brisbane, QLD 4072, Australia
- Weixi Gu
- School of Chemistry and Molecular Biosciences, Institute for Molecular Bioscience and Australian Infectious Diseases Research Centre, University of Queensland, Brisbane, QLD 4072, Australia
- Daniel J Ericsson
- Macromolecular Crystallography (MX) Beamlines, Australian Synchrotron, Melbourne, VIC 3168, Australia
- Gabriel Foley
- School of Chemistry and Molecular Biosciences, Institute for Molecular Bioscience and Australian Infectious Diseases Research Centre, University of Queensland, Brisbane, QLD 4072, Australia
- Robert O Hughes
- Disarm Therapeutics, 400 Technology Square, Cambridge, MA 02139, USA
- Todd Bosanac
- Disarm Therapeutics, 400 Technology Square, Cambridge, MA 02139, USA
- Mark von Itzstein
- Institute for Glycomics, Griffith University, Southport, QLD 4222, Australia
- John P Rathjen
- Plant Sciences Division, Research School of Biology, The Australian National University, Canberra ACT 2601, Australia
- Jeffrey D Nanson
- School of Chemistry and Molecular Biosciences, Institute for Molecular Bioscience and Australian Infectious Diseases Research Centre, University of Queensland, Brisbane, QLD 4072, Australia
- Mikael Boden
- School of Chemistry and Molecular Biosciences, Institute for Molecular Bioscience and Australian Infectious Diseases Research Centre, University of Queensland, Brisbane, QLD 4072, Australia
- Ian B Dry
- Agriculture and Food, Commonwealth Scientific and Industrial Research Organisation, Urrbrae, SA 5064, Australia
- Simon J Williams
- Plant Sciences Division, Research School of Biology, The Australian National University, Canberra ACT 2601, Australia
- Brian J Staskawicz
- Department of Plant and Microbial Biology, University of California Berkeley, Berkeley, CA 94720, USA
- Michael P Coleman
- John van Geest Centre for Brain Repair, Department of Clinical Neurosciences, University of Cambridge, ED Adrian Building, Forvie Site, Robinson Way, Cambridge CB2 0PY, UK; Babraham Institute, Babraham, Cambridge CB22 3AT, UK
- Thomas Ve
- School of Chemistry and Molecular Biosciences, Institute for Molecular Bioscience and Australian Infectious Diseases Research Centre, University of Queensland, Brisbane, QLD 4072, Australia; Institute for Glycomics, Griffith University, Southport, QLD 4222, Australia
- Peter N Dodds
- Agriculture and Food, Commonwealth Scientific and Industrial Research Organisation, Canberra, ACT 2601, Australia
- Bostjan Kobe
- School of Chemistry and Molecular Biosciences, Institute for Molecular Bioscience and Australian Infectious Diseases Research Centre, University of Queensland, Brisbane, QLD 4072, Australia
|
43
|
AlQuraishi M. End-to-End Differentiable Learning of Protein Structure. Cell Syst 2019; 8:292-301.e3. [PMID: 31005579 PMCID: PMC6513320 DOI: 10.1016/j.cels.2019.03.006] [Citation(s) in RCA: 193] [Impact Index Per Article: 32.2] [Received: 06/22/2018] [Revised: 02/01/2019] [Accepted: 03/11/2019] [Indexed: 12/11/2022]
Abstract
Predicting protein structure from sequence is a central challenge of biochemistry. Co-evolution methods show promise, but an explicit sequence-to-structure map remains elusive. Advances in deep learning that replace complex, human-designed pipelines with differentiable models optimized end to end suggest the potential benefits of similarly reformulating structure prediction. Here, we introduce an end-to-end differentiable model for protein structure learning. The model couples local and global protein structure via geometric units that optimize global geometry without violating local covalent chemistry. We test our model using two challenging tasks: predicting novel folds without co-evolutionary data and predicting known folds without structural templates. In the first task, the model achieves state-of-the-art accuracy, and in the second, it comes within 1-2 Å; competing methods using co-evolution and experimental templates have been refined over many years, and it is likely that the differentiable approach has substantial room for further improvement, with applications ranging from drug discovery to protein design.
Affiliation(s)
- Mohammed AlQuraishi
- Laboratory of Systems Pharmacology, Harvard Medical School, Boston, MA 02115, USA; Department of Systems Biology, Harvard Medical School, Boston, MA 02115, USA
|
44
|
Catazaro J, Caprez A, Swanson D, Powers R. Functional Evolution of Proteins. Proteins 2019; 87:492-501. [PMID: 30714210 DOI: 10.1002/prot.25670] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 05/16/2018] [Revised: 11/02/2018] [Accepted: 01/31/2019] [Indexed: 11/12/2022]
Abstract
The functional evolution of proteins advances through gene duplication followed by functional drift, whereas molecular evolution occurs through random mutational events. Over time, protein active-site structures or functional epitopes remain highly conserved, which enables relationships to be inferred between distant orthologs or paralogs. In this study, we present the first functional clustering and evolutionary analysis of the RCSB Protein Data Bank (RCSB PDB) based on similarities between active-site structures. All of the ligand-bound proteins within the RCSB PDB were scored using our Comparison of Protein Active-site Structures (CPASS) software and database (http://cpass.unl.edu/). Principal component analysis was then used to identify 4431 representative structures to construct a phylogenetic tree based on the CPASS comparative scores (http://itol.embl.de/shared/jcatazaro). The resulting phylogenetic tree identified a sequential, step-wise evolution of protein active-sites and provides novel insights into the emergence of protein function or changes in substrate specificity based on subtle changes in geometry and amino acid composition.
Affiliation(s)
- Jonathan Catazaro
- Department of Chemistry, University of Nebraska-Lincoln, Lincoln, Nebraska
- Adam Caprez
- Holland Computing Center, Office of Research, University of Nebraska-Lincoln, Lincoln, Nebraska
- David Swanson
- Holland Computing Center, Department of Computer Science and Engineering, University of Nebraska-Lincoln, Lincoln, Nebraska
- Robert Powers
- Department of Chemistry, University of Nebraska-Lincoln, Lincoln, Nebraska; Department of Chemistry, Nebraska Center for Integrated Biomolecular Communication, Lincoln, Nebraska
|
45
|
Phua SX, Chan KF, Su CTT, Poh JJ, Gan SKE. Perspective: The promises of a holistic view of proteins-impact on antibody engineering and drug discovery. Biosci Rep 2019; 39:BSR20181958. [PMID: 30630879 PMCID: PMC6398899 DOI: 10.1042/bsr20181958] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Received: 10/26/2018] [Revised: 12/27/2018] [Accepted: 01/09/2019] [Indexed: 12/23/2022] Open
Abstract
The reductionist approach is prevalent in biomedical science. However, increasing evidence now shows that biological systems cannot be simply considered as the sum of their parts. With experimental, technological, and computational advances, we can now do more than view parts in isolation; thus we propose that an increasingly holistic view (where a protein is investigated as much as a whole as possible) is now timely. To further advocate this, we review and discuss several studies and applications involving allostery, where distant protein regions can cross-talk to influence functionality. Therefore, we believe that an increasingly big-picture approach holds great promise, particularly in the areas of antibody engineering and drug discovery in rational drug design.
Affiliation(s)
- Ser-Xian Phua
- Bioinformatics Institute, Agency for Science, Technology and Research (A*STAR), Singapore
- Kwok-Fong Chan
- Bioinformatics Institute, Agency for Science, Technology and Research (A*STAR), Singapore
- Chinh Tran-To Su
- Bioinformatics Institute, Agency for Science, Technology and Research (A*STAR), Singapore
- Jun-Jie Poh
- Bioinformatics Institute, Agency for Science, Technology and Research (A*STAR), Singapore
- APD SKEG Pte Ltd, Singapore
- Samuel Ken-En Gan
- Bioinformatics Institute, Agency for Science, Technology and Research (A*STAR), Singapore
- APD SKEG Pte Ltd, Singapore
- p53 Laboratory, Agency for Science, Technology and Research (A*STAR), Singapore
|
46
|
Shalaeva DN, Cherepanov DA, Galperin MY, Golovin AV, Mulkidjanian AY. Evolution of cation binding in the active sites of P-loop nucleoside triphosphatases in relation to the basic catalytic mechanism. eLife 2018; 7:e37373. [PMID: 30526846 PMCID: PMC6310460 DOI: 10.7554/elife.37373] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.0] [Received: 04/09/2018] [Accepted: 11/26/2018] [Indexed: 01/01/2023] Open
Abstract
The ubiquitous P-loop fold nucleoside triphosphatases (NTPases) are typically activated by an arginine or lysine 'finger'. Some of the apparently ancestral NTPases are, instead, activated by potassium ions. To clarify the activation mechanism, we combined comparative structure analysis with molecular dynamics (MD) simulations of Mg-ATP and Mg-GTP complexes in water and in the presence of potassium, sodium, or ammonium ions. In all analyzed structures of diverse P-loop NTPases, the conserved P-loop motif keeps the triphosphate chain of bound NTPs (or their analogs) in an extended, catalytically prone conformation, similar to that imposed on NTPs in water by potassium or ammonium ions. MD simulations of potassium-dependent GTPase MnmE showed that linking of alpha- and gamma phosphates by the activating potassium ion led to the rotation of the gamma-phosphate group yielding an almost eclipsed, catalytically productive conformation of the triphosphate chain, which could represent the basic mechanism of hydrolysis by P-loop NTPases.
Affiliation(s)
- Daria N Shalaeva
- School of Physics, University of Osnabrück, Osnabrück, Germany
- A.N. Belozersky Institute of Physico-Chemical Biology, Lomonosov Moscow State University, Moscow, Russia
- School of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Moscow, Russia
- Dmitry A Cherepanov
- A.N. Belozersky Institute of Physico-Chemical Biology, Lomonosov Moscow State University, Moscow, Russia
- Semenov Institute of Chemical Physics, Russian Academy of Sciences, Moscow, Russia
- Michael Y Galperin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, United States
- Andrey V Golovin
- School of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Moscow, Russia
- Armen Y Mulkidjanian
- School of Physics, University of Osnabrück, Osnabrück, Germany
- A.N. Belozersky Institute of Physico-Chemical Biology, Lomonosov Moscow State University, Moscow, Russia
- School of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Moscow, Russia
|
47
|
Saravanan KM, Ponnuraj K. Sequence and structural analysis of fibronectin-binding protein reveals importance of multiple intrinsic disordered tandem repeats. J Mol Recognit 2018; 32:e2768. [PMID: 30397967 DOI: 10.1002/jmr.2768] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 06/07/2018] [Revised: 09/03/2018] [Accepted: 10/06/2018] [Indexed: 12/24/2022]
Abstract
The location of certain amino acid sequences, such as repeats, along the polypeptide chain is very important in forming the overall shape of the protein molecule, which in turn determines its function. In gram-positive bacteria, fibronectin-binding protein (FnBP) is one such repeat-containing protein, and it is a cell wall-attached protein responsible for various acute infections in humans. Several studies on the sequence, structure, and function of fibronectin-binding regions of FnBPs have been reported; however, no detailed study has been carried out on the full-length protein sequence. In the present study, we performed a thorough sequence and structure analysis of FnBP_A of Staphylococcus aureus and explored the dual ligand-binding ability of the fibrinogen (fg)-binding region and its molecular recognition processes. Multiple sequence alignment and protein-protein docking analysis reveal the regions which are likely involved in dual ligand binding. Further analysis of docking of the FnBP_A fg-binding region and fn N-terminal modules suggests that if the latter binds to the fg-binding region of FnBP_A, it would inhibit the subsequent binding of fg because of steric hindrance. The sequence analysis further suggests that the abundance of the disorder-promoting residue glutamic acid and the dual-personality (both order- and disorder-promoting) residue threonine in tandem repeats of FnBP_A and B proteins may help the molecule to undergo a conformational change while binding with fn by the β-zipper mechanism. A segment-based power spectral analysis was carried out, which helps to understand the distribution of hydrophobic residues along the sequence, particularly in intrinsically disordered tandem repeats. The results presented here will help to understand the role of internal repeats and intrinsic disorder in the molecular recognition process of a pathogenic cell surface protein.
Affiliation(s)
- Konda Mani Saravanan
- Centre of Advanced Study in Crystallography & Biophysics, University of Madras, Chennai, India
- Karthe Ponnuraj
- Centre of Advanced Study in Crystallography & Biophysics, University of Madras, Chennai, India
|
48
|
Navigating Among Known Structures in Protein Space. Methods Mol Biol 2018. [PMID: 30298400 DOI: 10.1007/978-1-4939-8736-8_12] [Citation(s) in RCA: 0] [Impact Index Per Article: 0]
Abstract
Present-day protein space is the result of 3.7 billion years of evolution, constrained by the underlying physicochemical qualities of the proteins. It is difficult to differentiate between evolutionary traces and effects of physicochemical constraints. Nonetheless, as a rule of thumb, instances of structural reuse, or focusing on structural similarity, are likely attributable to physicochemical constraints, whereas sequence reuse, or focusing on sequence similarity, may be more indicative of evolutionary relationships. Both types of relationships have been studied and can provide meaningful insights into protein biophysics and evolution, which in turn can lead to better algorithms for protein search, annotation, and maybe even design.
In broad strokes, studies of protein space vary in the entities they represent, the similarity measure comparing these entities, and the representation used. The entities can be, for example, protein chains, domains, supra-domains, or smaller protein sub-parts denoted themes. The measures of similarity between the entities can be based on sequence, structure, function, or any combination of these. The representation can be global, encompassing the whole space, or local, focusing on a particular region surrounding protein(s) of interest. Global representations include lists of grouped proteins, protein networks, and maps. Networks are the abstraction that is derived most directly from the similarity data: each node is a protein entity (e.g., a domain), and edges connect similar domains. Selecting the entities, the similarity measure, and the abstraction are three intertwined decisions: the similarity measures allow us to identify the entities, and the selection of entities influences what is a meaningful similarity measure. Similarly, we seek entities that are related to each other in a way for which a simple representation describes their relationships succinctly and accurately.
This chapter will cover studies that rely on different entities, similarity measures, and a range of representations to better understand protein structure space. Scholars may use publicly available navigators offering a global representation, in particular the hierarchical classifications SCOP, CATH, and ECOD, or a local representation, which encompasses structural alignment algorithms. Alternatively, scholars can configure their own navigator using existing tools. To demonstrate this DIY (do it yourself) approach for navigating in protein space, we investigate substrate-binding proteins. By presenting sequence similarities among this large and diverse protein family as a network, we can infer that one member (PDB ID 4ntl; of as yet unknown function) may bind methionine and suggest a putative binding mechanism.
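The network abstraction described in this abstract (nodes are protein entities, edges connect similar ones) can be illustrated with a minimal sketch. This is not the chapter's method or tooling: the entity labels, the similarity scores, and the 0.5 threshold below are all hypothetical values chosen for illustration, and the function names `build_network` and `connected_components` are assumptions.

```python
from collections import defaultdict


def build_network(scores, threshold):
    """Build an undirected similarity network.

    scores: dict mapping (entity_a, entity_b) -> similarity score.
    An edge is added when the score is >= threshold (hypothetical cutoff).
    """
    graph = defaultdict(set)
    for (a, b), score in scores.items():
        # Touch both keys so that isolated entities still appear as nodes.
        graph[a]
        graph[b]
        if score >= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


def connected_components(graph):
    """Group nodes into clusters via an iterative depth-first search."""
    seen = set()
    components = []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(graph[node] - seen)
        components.append(sorted(component))
    return components


# Hypothetical pairwise similarities between five made-up domains.
example_scores = {
    ("A", "B"): 0.9,
    ("B", "C"): 0.8,
    ("C", "D"): 0.2,
    ("D", "E"): 0.7,
}
network = build_network(example_scores, threshold=0.5)
clusters = connected_components(network)
```

With these made-up scores, A, B, and C fall into one cluster and D and E into another, because the C-D similarity is below the cutoff. Raising or lowering the threshold changes which entities are grouped, which is one reason the choice of similarity measure and representation are intertwined.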
|
49
|
Engelmann BW, Hsiao CJ, Blischak JD, Fourne Y, Khan Z, Ford M, Gilad Y. A Methodological Assessment and Characterization of Genetically-Driven Variation in Three Human Phosphoproteomes. Sci Rep 2018; 8:12106. [PMID: 30108239 PMCID: PMC6092387 DOI: 10.1038/s41598-018-30587-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 04/04/2018] [Accepted: 07/17/2018] [Indexed: 11/12/2022] Open
Abstract
Phosphorylation of proteins on serine, threonine, and tyrosine residues is a ubiquitous post-translational modification that plays a key part in essentially every cell signaling process. It is reasonable to assume that inter-individual variation in protein phosphorylation may underlie phenotypic differences, as has been observed for practically every other molecular regulatory phenotype. However, we do not know much about the extent of inter-individual variation in phosphorylation because it is quite challenging to perform a quantitative high-throughput study to assess inter-individual variation in any post-translational modification. To test our ability to address this challenge with SILAC-based mass spectrometry, we quantified phosphorylation levels for three genotyped human cell lines within a nested experimental framework, and found that genetic background is the primary determinant of phosphoproteome variation. We uncovered multiple functional, biophysical, and genetic associations with germline-driven phosphopeptide variation. Variants affecting protein levels or structure were among these associations, with the latter presenting, on average, a stronger effect. Interestingly, we found evidence that is consistent with a phosphopeptide variability buffering effect conferred by properties enriched within longer proteins. Because the small sample size in this 'pilot' study may limit the applicability of our genetic observations, we also undertook a thorough technical assessment of our experimental workflow to aid further efforts. Taken together, these results provide the foundation for future work to characterize inter-individual variation in post-translational modification levels and reveal novel insights into the nature of inter-individual variation in phosphorylation.
Affiliation(s)
- Brett W Engelmann
- Department of Human Genetics, University of Chicago, Chicago, Illinois, USA
- AbbVie, North Chicago, Illinois, USA
- John D Blischak
- Department of Human Genetics, University of Chicago, Chicago, Illinois, USA
- Yannick Fourne
- Department of Human Genetics, University of Chicago, Chicago, Illinois, USA
- Zia Khan
- Department of Human Genetics, University of Chicago, Chicago, Illinois, USA
- Genentech, South San Francisco, California, USA
- Michael Ford
- MS Bioworks, LLC, 3950, Varsity Drive, Ann Arbor, Michigan, USA
- Yoav Gilad
- Department of Human Genetics, University of Chicago, Chicago, Illinois, USA
- Department of Medicine, University of Chicago, Chicago, Illinois, USA
|
50
|
The FAM83 family of proteins: from pseudo-PLDs to anchors for CK1 isoforms. Biochem Soc Trans 2018; 46:761-771. [PMID: 29871876 PMCID: PMC6008594 DOI: 10.1042/bst20160277] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.9] [Received: 01/29/2018] [Revised: 04/05/2018] [Accepted: 04/09/2018] [Indexed: 12/12/2022]
Abstract
The eight members of the FAM83 (FAMily with sequence similarity 83) family of poorly characterised proteins are only present in vertebrates and are defined by the presence of the conserved DUF1669 domain of unknown function at their N-termini. The DUF1669 domain consists of a conserved phospholipase D (PLD)-like catalytic motif. However, the FAM83 proteins display no PLD catalytic (PLDc) activity, and the pseudo-PLDc motif present in each FAM83 member lacks the crucial elements of the native PLDc motif. In the absence of catalytic activity, it is likely that the DUF1669 domain has evolved to take on novel function(s) in biology. Recent studies have indicated that the DUF1669 domain mediates the interaction with different isoforms of the CK1 (casein kinase 1) family of Ser/Thr protein kinases. In turn, different FAM83 proteins, which exhibit unique amino acid sequences outside the DUF1669 domain, deliver CK1 isoforms to unique subcellular compartments. Among the first protein kinases to be discovered, the CK1 isoforms are thought to be constitutively active and are known to control a plethora of biological processes. Yet, how their kinase activity, substrate selectivity and subcellular localisation are regulated has remained a mystery. The emerging evidence now supports a central role for the DUF1669 domain, and the FAM83 proteins, in the regulation of CK1 biology.
|