1
Timsit Y, Sergeant-Perthuis G, Bennequin D. The role of ribosomal protein networks in ribosome dynamics. Nucleic Acids Res 2025; 53:gkae1308. [PMID: 39788545 PMCID: PMC11711686 DOI: 10.1093/nar/gkae1308] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2024] [Revised: 12/12/2024] [Accepted: 01/02/2025] [Indexed: 01/12/2025]
Abstract
Accurate protein synthesis requires ribosomes to integrate signals from distant functional sites and execute complex dynamics. Despite advances in understanding ribosome structure and function, two key questions remain: how information is transmitted between these distant sites, and how ribosomal movements are synchronized. We recently highlighted the existence of ribosomal protein networks, likely evolved to participate in ribosome signaling. Here, we investigate the relationship between ribosomal protein networks and ribosome dynamics. Our findings show that major motion centers in the bacterial ribosome interact specifically with r-proteins, and that ribosomal RNA exhibits high mobility around each r-protein. This suggests that periodic electrostatic changes in the context of negatively charged residues (Glu and Asp) induce RNA-protein 'distance-approach' cycles, controlling key ribosomal movements during translocation. These charged residues play a critical role in modulating electrostatic repulsion between RNA and proteins, thus coordinating ribosomal dynamics. We propose that r-protein networks synchronize ribosomal dynamics through an 'electrostatic domino' effect, extending the concept of allostery to the regulation of movements within supramolecular assemblies.
Affiliation(s)
- Youri Timsit
- Aix Marseille Univ, Université de Toulon, CNRS, IRD, MIO UM110, 163 avenue de Luminy 13288 Marseille, France
- Research Federation for the Study of Global Ocean Systems Ecology and Evolution, FR2022/Tara GOSEE, 3 Rue Michel-Ange, 75016 Paris, France
- Grégoire Sergeant-Perthuis
- Laboratory of Computational and Quantitative Biology (LCQB), Sorbonne Université, 4 Place Jussieu, 75005 Paris, France
- Daniel Bennequin
- Institut de Mathématiques de Jussieu - Paris Rive Gauche (IMJ-PRG), UMR 7586, CNRS, Université Paris Diderot, 8, Place Aurélie Nemours, 75013 Paris, France
2
Pradhan UK, Naha S, Das R, Gupta A, Parsad R, Meher PK. RBProkCNN: Deep learning on appropriate contextual evolutionary information for RNA binding protein discovery in prokaryotes. Comput Struct Biotechnol J 2024; 23:1631-1640. [PMID: 38660008 PMCID: PMC11039349 DOI: 10.1016/j.csbj.2024.04.034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2024] [Revised: 04/12/2024] [Accepted: 04/12/2024] [Indexed: 04/26/2024]
Abstract
RNA-binding proteins (RBPs) are central to key functions such as post-transcriptional regulation, mRNA stability, and adaptation to varied environmental conditions in prokaryotes. While the majority of research has concentrated on eukaryotic RBPs, recent developments underscore the crucial involvement of prokaryotic RBPs. Although computational methods have emerged in recent years to identify RBPs, they have fallen short in accurately identifying prokaryotic RBPs due to their generic nature. To bridge this gap, we introduce RBProkCNN, a novel machine learning-driven computational model meticulously designed for the accurate prediction of prokaryotic RBPs. The prediction process involves the utilization of eight shallow learning algorithms and four deep learning models, incorporating PSSM-based evolutionary features. By leveraging a convolutional neural network (CNN) and evolutionarily significant features selected through extreme gradient boosting variable importance measure, RBProkCNN achieved the highest accuracy in five-fold cross-validation, yielding 98.04% auROC and 98.19% auPRC. Furthermore, RBProkCNN demonstrated robust performance with an independent dataset, showcasing a commendable 95.77% auROC and 95.78% auPRC. Noteworthy is its superior predictive accuracy when compared to several state-of-the-art existing models. RBProkCNN is available as an online prediction tool (https://iasri-sg.icar.gov.in/rbprokcnn/), offering free access to interested users. This tool represents a substantial contribution, enriching the array of resources available for the accurate and efficient prediction of prokaryotic RBPs.
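The auROC and auPRC values quoted above are threshold-free ranking metrics. As a minimal sketch (not RBProkCNN's own evaluation code, and using made-up toy scores rather than any data from the study), auROC can be computed as the probability that a randomly chosen RBP is scored above a randomly chosen non-RBP, and auPRC can be estimated by average precision:

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """auROC via the Mann-Whitney formulation: P(positive score > negative score), ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """auPRC estimate: mean precision at the rank of each positive (tied scores keep input order)."""
    n_pos = sum(1 for y in labels if y == 1)
    if n_pos == 0:
        raise ValueError("need at least one positive example")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at this cut-off
    return total / n_pos


# Toy example: 1 = RNA-binding protein, 0 = non-RBP (hypothetical scores).
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(round(roc_auc(labels, scores), 4), round(average_precision(labels, scores), 4))
```

For these toy scores the sketch prints `0.8889 0.9167`. In practice library routines such as scikit-learn's `roc_auc_score` and `average_precision_score` are the usual choice; the abstract does not state which implementation or PR-curve interpolation the authors used.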
Affiliation(s)
- Upendra Kumar Pradhan
- Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India
- Sanchita Naha
- Division of Computer Applications, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India
- Ritwika Das
- Division of Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India
- Ajit Gupta
- Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India
- Rajender Parsad
- ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India
- Prabina Kumar Meher
- Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India
3
Moonitz SA, Do NT, Noriega R. Electrostatic modulation of multiple binding events between loquacious-PD and double-stranded RNA. Phys Chem Chem Phys 2024; 26:20739-20744. [PMID: 39049620 DOI: 10.1039/d4cp02151g] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/27/2024]
Abstract
Electrostatics can alter the RNA-binding properties of proteins that display structure selectivity without sequence specificity. Loquacious-PD relies on this broad scope response to mediate the interaction of endonucleases with double stranded RNAs. Multimodal spectroscopic probes with in situ perturbations reveal an efficient and stable binding mechanism that disfavors high protein density complexes and is sensitive to local electrostatics.
Affiliation(s)
- Sasha A Moonitz
- Department of Chemistry, University of Utah, Salt Lake City, Utah 84112, USA.
- Nhat T Do
- Department of Chemistry, University of Utah, Salt Lake City, Utah 84112, USA.
- Rodrigo Noriega
- Department of Chemistry, University of Utah, Salt Lake City, Utah 84112, USA.
4
Alston JJ, Soranno A, Holehouse AS. Conserved molecular recognition by an intrinsically disordered region in the absence of sequence conservation. RESEARCH SQUARE 2024:rs.3.rs-4477977. [PMID: 38883712 PMCID: PMC11177979 DOI: 10.21203/rs.3.rs-4477977/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/18/2024]
Abstract
Intrinsically disordered regions (IDRs) are critical for cellular function yet often appear to lack sequence conservation when assessed by multiple sequence alignments. This raises the question of whether and how function can be encoded and preserved in these regions despite massive sequence variation. To address this question, we have applied coarse-grained molecular dynamics simulations to investigate non-specific RNA binding of coronavirus nucleocapsid proteins. Coronavirus nucleocapsid proteins consist of multiple interspersed disordered and folded domains that bind RNA. Here, we focus on the first two domains of coronavirus nucleocapsid proteins: the disordered N-terminal domain (NTD) and the folded RNA binding domain (RBD). While the NTD is highly variable across evolution, the RBD is structurally conserved. This combination makes the NTD-RBD a convenient model system for exploring the interplay between an IDR adjacent to a folded domain and how changes in IDR sequence can influence molecular recognition of a partner. Our results reveal a surprising degree of sequence-specificity encoded by both the composition and the precise order of the amino acids in the NTD. The presence of an NTD can - depending on the sequence - either suppress or enhance RNA binding. Despite this sensitivity, large-scale variation in NTD sequences is possible while certain sequence features are retained. Consequently, a conformationally-conserved dynamic and disordered RNA:protein complex is found across nucleocapsid protein orthologs despite large-scale changes in both NTD sequence and RBD surface chemistry. Taken together, these insights shed light on the ability of disordered regions to preserve functional characteristics despite their sequence variability.
Affiliation(s)
- Jhullian J. Alston
- Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, St. Louis, MO 63110, USA
- Center for Biomolecular Condensates, Washington University in St. Louis, St. Louis, MO, USA
- Present Address, Program In Cellular and Molecular Medicine (PCMM), Boston Children’s Hospital, Boston, MA, USA
- Andrea Soranno
- Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, St. Louis, MO 63110, USA
- Center for Biomolecular Condensates, Washington University in St. Louis, St. Louis, MO, USA
- Alex S. Holehouse
- Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, St. Louis, MO 63110, USA
- Center for Biomolecular Condensates, Washington University in St. Louis, St. Louis, MO, USA
5
Yan Y, Li W, Wang S, Huang T. Seq-RBPPred: Predicting RNA-Binding Proteins from Sequence. ACS OMEGA 2024; 9:12734-12742. [PMID: 38524500 PMCID: PMC10955590 DOI: 10.1021/acsomega.3c08381] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2023] [Revised: 12/18/2023] [Accepted: 12/28/2023] [Indexed: 03/26/2024]
Abstract
RNA-binding proteins (RBPs) can interact with RNAs to regulate RNA translation, modification, splicing, and other important biological processes. The accurate identification of RBPs is of paramount importance for gaining insights into the intricate mechanisms underlying organismal life activities. Traditional experimental methods for identifying RBPs require considerable time and money, so it is important to develop computational methods to predict RBPs. However, existing approaches for RBP prediction still require further improvement, because many RBPs remain unidentified in many species. In this study, we present Seq-RBPPred (predicting RBPs from sequence), a novel method that utilizes a comprehensive feature representation encompassing both biophysical properties and hidden-state features derived from protein sequences. Comprehensive performance evaluations on the testing set demonstrate the superiority of Seq-RBPPred compared with state-of-the-art methods, with an overall accuracy of 0.922, a sensitivity of 0.926, a specificity of 0.903, and a Matthews correlation coefficient (MCC) of 0.757. The data and code of Seq-RBPPred are available at https://github.com/yaoyao-11/Seq-RBPPred.
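The four figures reported for Seq-RBPPred all derive from one confusion matrix on the testing set. The sketch below, using hypothetical counts (not the study's), shows how accuracy, sensitivity, specificity and the Matthews correlation coefficient relate; because MCC uses all four cells, it is generally harder to inflate than accuracy when classes are imbalanced.

```python
import math


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary-classification metrics from confusion-matrix counts (RBP = positive class)."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)  # fraction of true RBPs recovered
    specificity = tn / (tn + fp)  # fraction of non-RBPs correctly rejected
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report MCC = 0 when any row/column total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "mcc": mcc}


# Hypothetical test set of 100 RBPs and 100 non-RBPs.
m = binary_metrics(tp=90, fp=20, tn=80, fn=10)
print({name: round(value, 3) for name, value in m.items()})
```

With these made-up counts the accuracy is 0.85 while the MCC is only about 0.70, illustrating why the two are reported separately.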
Affiliation(s)
- Yuyao Yan
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Shanghai 200021, China
- Wenran Li
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Shanghai 200021, China
- Sijia Wang
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Shanghai 200021, China
- Tao Huang
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Shanghai 200021, China
6
Levine J, Lobyntseva A, Shazman S, Hakim F, Gozes I. Longitudinal Genotype-Phenotype (Vineland Questionnaire) Characterization of 15 ADNP Syndrome Cases Highlights Mutated Protein Length and Structural Characteristics Correlation with Communicative Abilities Accentuated in Males. J Mol Neurosci 2024; 74:15. [PMID: 38282129 DOI: 10.1007/s12031-024-02189-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 01/08/2024] [Indexed: 01/30/2024]
Abstract
Activity-dependent neuroprotective protein (ADNP) is essential for neurodevelopment, and de novo mutations in ADNP cause the ADNP syndrome. From a brain pathology point of view, tauopathy has been demonstrated at a young age, implying stunted development coupled with early/accelerated neurodegeneration. Given potential genotype-phenotype differences and age dependency, we have assessed here a cohort of 15 individuals (1-27 years old), using 1-3 longitudinal parent (caretaker) interviews (Vineland 3 questionnaire) over several years. Our results indicated developmental delays, or even developmental arrests, coupled with potential spurts of development at early ages. Severe outcomes correlated with the truncating, high-impact mutation (in other words, with the length of the remaining mutated protein), as well as with the age of the tested individual, corroborating the hypothesis of developmental delays coupled with accelerated aging. A significant correlation was noted between mutated protein length and communication, implying a high impact of ADNP on communicative skills. Additionally, correlations were discovered between the two previously described epigenetic signatures in ADNP, emphasizing aberrant acquisition of motor behaviors, with truncating mutations around the nuclear localization signal being the most affected. Finally, all individuals seem to acquire an age equivalent of 1-6 years, requiring disease-modifying treatment, such as the ADNP-derived drug candidate NAP (davunetide), which has recently shown efficacy in women suffering from the neurodegenerative disorder progressive supranuclear palsy (PSP), a late-onset tauopathy.
Affiliation(s)
- Joseph Levine
- The Elton Laboratory for Molecular Neuroendocrinology, Department of Human Molecular Genetics and Biochemistry, Faculty of Medicine, Sagol School of Neuroscience and Adams Super Center for Brain Studies, Tel Aviv University, Tel Aviv, 6997801, Israel
- Psychiatric Division, Ben Gurion University of the Negev, Beersheba, Israel
- Alexandra Lobyntseva
- The Elton Laboratory for Molecular Neuroendocrinology, Department of Human Molecular Genetics and Biochemistry, Faculty of Medicine, Sagol School of Neuroscience and Adams Super Center for Brain Studies, Tel Aviv University, Tel Aviv, 6997801, Israel
- Shula Shazman
- Department of Mathematics and Computer Science, The Open University of Israel, Ra'anana, Israel
- Illana Gozes
- The Elton Laboratory for Molecular Neuroendocrinology, Department of Human Molecular Genetics and Biochemistry, Faculty of Medicine, Sagol School of Neuroscience and Adams Super Center for Brain Studies, Tel Aviv University, Tel Aviv, 6997801, Israel.
7
Pradhan UK, Meher PK, Naha S, Pal S, Gupta S, Gupta A, Parsad R. RBPLight: a computational tool for discovery of plant-specific RNA-binding proteins using light gradient boosting machine and ensemble of evolutionary features. Brief Funct Genomics 2023; 22:401-410. [PMID: 37158175 DOI: 10.1093/bfgp/elad016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/05/2022] [Revised: 04/12/2023] [Accepted: 04/21/2023] [Indexed: 05/10/2023]
Abstract
RNA-binding proteins (RBPs) are essential for post-transcriptional gene regulation in eukaryotes, including splicing control, mRNA transport and decay. Thus, accurate identification of RBPs is important to understand gene expression and regulation of cell state. In order to detect RBPs, a number of computational models have been developed. These methods made use of datasets from several eukaryotic species, specifically from mice and humans. Although some models have been tested on Arabidopsis, these techniques fall short of correctly identifying RBPs for other plant species. Therefore, the development of a powerful computational model for identifying plant-specific RBPs is needed. In this study, we presented a novel computational model for locating RBPs in plants. Five deep learning models and ten shallow learning algorithms were utilized for prediction with 20 sequence-derived and 20 evolutionary feature sets. The highest repeated five-fold cross-validation accuracy, 91.24% AU-ROC and 91.91% AU-PRC, was achieved by the light gradient boosting machine. When evaluated on an independent dataset, the developed approach achieved 94.00% AU-ROC and 94.50% AU-PRC. The proposed model achieved significantly higher accuracy for predicting plant-specific RBPs than the currently available state-of-the-art RBP prediction models. Although certain models have already been trained and assessed on the model organism Arabidopsis, this is the first comprehensive computational model for the discovery of plant-specific RBPs. The web server RBPLight was also developed and is publicly accessible at https://iasri-sg.icar.gov.in/rbplight/ to help researchers identify RBPs in plants.
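Repeated five-fold cross-validation, as used to rank the learners above, re-partitions the data several times and averages performance over all folds. A minimal sketch of the index bookkeeping follows; it is plain (unstratified) k-fold with an assumed seed and repeat count, since the abstract does not say whether folds were stratified by class or how many repeats were run. The function name `repeated_kfold` is hypothetical.

```python
import random
from typing import Iterator, List, Tuple


def repeated_kfold(n_samples: int, k: int = 5, repeats: int = 3,
                   seed: int = 0) -> Iterator[Tuple[int, int, List[int], List[int]]]:
    """Yield (repeat, fold, train_idx, test_idx); each repeat reshuffles, then deals indices into k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    rng = random.Random(seed)
    indices = list(range(n_samples))
    for r in range(repeats):
        rng.shuffle(indices)
        folds = [indices[i::k] for i in range(k)]  # fold sizes differ by at most one
        for f, test in enumerate(folds):
            test_set = set(test)
            train = [i for i in indices if i not in test_set]
            yield r, f, train, sorted(test)


# Each (train, test) pair would be used to fit and score a model; the k * repeats scores are then averaged.
for r, f, train, test in repeated_kfold(12, k=5, repeats=2, seed=42):
    print(r, f, len(train), len(test))
```

Within each repeat every sample lands in exactly one test fold, so the averaged score uses every sample once per repeat; a stratified variant would additionally deal RBPs and non-RBPs into folds separately to keep class ratios constant.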
Affiliation(s)
- Upendra K Pradhan
- Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India
- Prabina K Meher
- Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India
- Sanchita Naha
- Division of Computer Applications, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India
- Soumen Pal
- Division of Computer Applications, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India
- Sagar Gupta
- CSIR-Institute of Himalayan Bioresource Technology (CSIR-IHBT), Palampur (HP) 176061, India
- Ajit Gupta
- Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India
- Rajender Parsad
- ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India
8
Alston JJ, Soranno A, Holehouse AS. Conserved molecular recognition by an intrinsically disordered region in the absence of sequence conservation. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.08.06.552128. [PMID: 37609146 PMCID: PMC10441348 DOI: 10.1101/2023.08.06.552128] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/24/2023]
Abstract
Intrinsically disordered regions (IDRs) are critical for cellular function, yet often appear to lack sequence conservation when assessed by multiple sequence alignments. This raises the question of if and how function can be encoded and preserved in these regions despite massive sequence variation. To address this question, we have applied coarse-grained molecular dynamics simulations to investigate non-specific RNA binding of coronavirus nucleocapsid proteins. Coronavirus nucleocapsid proteins consist of multiple interspersed disordered and folded domains that bind RNA. We focussed here on the first two domains of coronavirus nucleocapsid proteins, the disordered N-terminal domain (NTD) followed by the folded RNA binding domain (RBD). While the NTD is highly variable across evolution, the RBD is structurally conserved. This combination makes the NTD-RBD a convenient model system to explore the interplay between an IDR adjacent to a folded domain, and how changes in IDR sequence can influence molecular recognition of a partner. Our results reveal a surprising degree of sequence-specificity encoded by both the composition and the precise order of the amino acids in the NTD. The presence of an NTD can - depending on the sequence - either suppress or enhance RNA binding. Despite this sensitivity, large-scale variation in NTD sequences is possible while certain sequence features are retained. Consequently, a conformationally-conserved fuzzy RNA:protein complex is found across nucleocapsid protein orthologs, despite large-scale changes in both NTD sequence and RBD surface chemistry. Taken together, these insights shed light on the ability of disordered regions to preserve functional characteristics despite their sequence variability.
9
Peng X, Wang X, Guo Y, Ge Z, Li F, Gao X, Song J. RBP-TSTL is a two-stage transfer learning framework for genome-scale prediction of RNA-binding proteins. Brief Bioinform 2022; 23:6596984. [PMID: 35649392 PMCID: PMC9294422 DOI: 10.1093/bib/bbac215] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 03/17/2022] [Revised: 04/25/2022] [Accepted: 05/06/2022] [Indexed: 11/27/2022]
Abstract
RNA binding proteins (RBPs) are critical for the post-transcriptional control of RNAs and play vital roles in a myriad of biological processes, such as RNA localization and gene regulation. Therefore, computational methods that are capable of accurately identifying RBPs are highly desirable and have important implications for biomedical and biotechnological applications. Here, we propose a two-stage deep transfer learning-based framework, termed RBP-TSTL, for accurate prediction of RBPs. In the first stage, the knowledge from the self-supervised pre-trained model was extracted as feature embeddings and used to represent the protein sequences, while in the second stage, a customized deep learning model was initialized based on an annotated pre-training RBPs dataset before being fine-tuned on each corresponding target species dataset. This two-stage transfer learning framework can enable the RBP-TSTL model to be trained effectively and can improve its prediction performance. Extensive performance benchmarking of the RBP-TSTL models trained using the features generated by the self-supervised pre-trained model and other models trained using hand-crafted encoding features demonstrated the effectiveness of the proposed two-stage knowledge transfer strategy based on the self-supervised pre-trained models. Using the best-performing RBP-TSTL models, we further conducted genome-scale RBP predictions for Homo sapiens, Arabidopsis thaliana, Escherichia coli, and Salmonella and established a computational compendium containing all the predicted putative RBP candidates. We anticipate that the proposed RBP-TSTL approach will be explored as a useful tool for the characterization of RNA-binding proteins and exploration of their sequence–structure–function relationships.
Affiliation(s)
- Xinxin Peng
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Victoria 3800, Australia
- Monash Data Futures Institute, Monash University, Melbourne, Victoria 3800, Australia
- Xiaoyu Wang
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Victoria 3800, Australia
- Monash Data Futures Institute, Monash University, Melbourne, Victoria 3800, Australia
- Yuming Guo
- Department of Epidemiology and Preventive Medicine, School of Public Health and Preventive Medicine, Monash University, Melbourne, Victoria 3004, Australia
- Zongyuan Ge
- Monash e-Research Centre and Faculty of Engineering, Monash University, Melbourne, VIC 3800, Australia
- Fuyi Li
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Victoria 3800, Australia
- Department of Microbiology and Immunology, The Peter Doherty Institute for Infection and Immunity, The University of Melbourne, 792 Elizabeth Street, Melbourne, Victoria 3000, Australia
- College of Information Engineering, Northwest A&F University, Yangling, 712100, China
- Xin Gao
- Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
- KAUST Computational Bioscience Research Center, King Abdullah University of Science and Technology
- Jiangning Song
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Victoria 3800, Australia
- Monash Data Futures Institute, Monash University, Melbourne, Victoria 3800, Australia
10
Hamilton I, Gebala M, Herschlag D, Russell R. Direct Measurement of Interhelical DNA Repulsion and Attraction by Quantitative Cross-Linking. J Am Chem Soc 2022; 144:1718-1728. [PMID: 35073489 PMCID: PMC8815069 DOI: 10.1021/jacs.1c11122] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/20/2021] [Indexed: 12/30/2022]
Abstract
To better understand the forces that mediate nucleic acid compaction in biology, we developed the disulfide cross-linking approach xHEED (X-linking of Helices to measure Electrostatic Effects at Distance) to measure the distance-dependent encounter frequency of two DNA helices in solution. Using xHEED, we determined the distance that the electrostatic potential extends from DNA helices, the dependence of this distance on ionic conditions, and the magnitude of repulsion when two helices approach one another. Across all conditions tested, the potential falls to that of the bulk solution within 15 Å of the major groove surface. For separations of ∼30 Å, we measured a repulsion of 1.8 kcal/mol in low monovalent ion concentration (30 mM Na+), with higher Na+ concentrations ameliorating this repulsion, and 2 M Na+ or 100 mM Mg2+ eliminating it. Strikingly, we found full screening at very low Co3+ concentrations and net attraction at higher concentrations, without the higher-order DNA condensation that typically complicates studies of helical attraction. Our measurements define the relevant distances for electrostatic interactions of nucleic-acid helices in biology and introduce a new method to propel further understanding of how these forces impact biological processes.
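The salt dependence reported above is qualitatively what Debye-Hückel screening predicts. As a rough, textbook-level sketch (an illustrative assumption, not the xHEED analysis itself), the Debye length for a 1:1 salt in water at 25 °C is about 17.6 Å at 30 mM Na+ and only about 2 Å at 2 M, in line with higher Na+ concentrations weakening the repulsion. This linearized mean-field picture breaks down close to the highly charged helix surface and cannot capture the ion-correlation effects behind the Co3+-induced attraction. The helper names and the relative permittivity of 78.5 are assumptions.

```python
import math

# Physical constants (SI; CODATA 2018 values).
EPS0 = 8.8541878128e-12      # vacuum permittivity, F/m
K_B = 1.380649e-23           # Boltzmann constant, J/K
E_CHARGE = 1.602176634e-19   # elementary charge, C
N_A = 6.02214076e23          # Avogadro constant, 1/mol


def ionic_strength(ions):
    """Ionic strength in mol/L from (concentration_M, charge) pairs: I = 1/2 * sum(c * z**2)."""
    return 0.5 * sum(c * z * z for c, z in ions)


def debye_length_angstrom(ionic_strength_m: float, eps_r: float = 78.5,
                          temperature_k: float = 298.15) -> float:
    """Debye screening length in Angstrom; eps_r = 78.5 assumes bulk water at 25 C."""
    i_si = ionic_strength_m * 1000.0  # mol/L -> mol/m^3
    lam_m = math.sqrt(eps_r * EPS0 * K_B * temperature_k
                      / (2.0 * N_A * E_CHARGE ** 2 * i_si))
    return lam_m * 1e10  # metres -> Angstrom


for label, ions in [("30 mM NaCl", [(0.03, +1), (0.03, -1)]),
                    ("2 M NaCl", [(2.0, +1), (2.0, -1)]),
                    ("100 mM MgCl2", [(0.1, +2), (0.2, -1)])]:
    i = ionic_strength(ions)
    print(f"{label}: I = {i:.2f} M, Debye length = {debye_length_angstrom(i):.1f} A")
```

Because the screening length scales as one over the square root of ionic strength, quadrupling the salt concentration halves it; the z-squared weighting in the ionic strength is why divalent Mg2+ screens more strongly than the same molar amount of Na+.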
Affiliation(s)
- Ian Hamilton
- Department of Molecular Biosciences, University of Texas at Austin, Austin, Texas 78712, United States
- Magdalena Gebala
- Department of Biochemistry, Stanford University, Stanford, California 94305, United States
- Daniel Herschlag
- Department of Biochemistry, Stanford University, Stanford, California 94305, United States
- Rick Russell
- Department of Molecular Biosciences, University of Texas at Austin, Austin, Texas 78712, United States
11
Han S, Lee H, Lee AJ, Kim SK, Jung I, Koh GY, Kim TK, Lee D. CHD4 Conceals Aberrant CTCF-Binding Sites at TAD Interiors by Regulating Chromatin Accessibility in Mouse Embryonic Stem Cells. Mol Cells 2021; 44:805-829. [PMID: 34764232 PMCID: PMC8627837 DOI: 10.14348/molcells.2021.0224] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 08/31/2021] [Accepted: 09/06/2021] [Indexed: 11/27/2022]
Abstract
CCCTC-binding factor (CTCF) critically contributes to 3D chromatin organization by determining topologically associated domain (TAD) borders. Although CTCF primarily binds at TAD borders, there also exist putative CTCF-binding sites within TADs, which are spread throughout the genome by retrotransposition. However, the detailed mechanism responsible for masking the putative CTCF-binding sites remains largely elusive. Here, we show that the ATP-dependent chromatin remodeler, chromodomain helicase DNA-binding 4 (CHD4), regulates chromatin accessibility to conceal aberrant CTCF-binding sites embedded in H3K9me3-enriched heterochromatic B2 short interspersed nuclear elements (SINEs) in mouse embryonic stem cells (mESCs). Upon CHD4 depletion, these aberrant CTCF-binding sites become accessible and aberrant CTCF recruitment occurs within TADs, resulting in disorganization of local TADs. RNA-binding intrinsically disordered domains (IDRs) of CHD4 are required to prevent this aberrant CTCF binding, and CHD4 is critical for the repression of B2 SINE transcripts. These results collectively reveal that a CHD4-mediated mechanism ensures appropriate CTCF binding and associated TAD organization in mESCs.
Affiliation(s)
- Sungwook Han
- Department of Biological Sciences, Korea Advanced Institute of Science and Technology, Daejeon 34141, Korea
- Hosuk Lee
- Department of Biological Sciences, Korea Advanced Institute of Science and Technology, Daejeon 34141, Korea
- Center for Vascular Research, Institute for Basic Sciences, Daejeon 34141, Korea
- Andrew J. Lee
- Department of Biological Sciences, Korea Advanced Institute of Science and Technology, Daejeon 34141, Korea
- Seung-Kyoon Kim
- Department of Life Sciences, Pohang University of Science and Technology, Pohang 37673, Korea
- Inkyung Jung
- Department of Biological Sciences, Korea Advanced Institute of Science and Technology, Daejeon 34141, Korea
- Gou Young Koh
- Center for Vascular Research, Institute for Basic Sciences, Daejeon 34141, Korea
- Tae-Kyung Kim
- Department of Life Sciences, Pohang University of Science and Technology, Pohang 37673, Korea
- Daeyoup Lee
- Department of Biological Sciences, Korea Advanced Institute of Science and Technology, Daejeon 34141, Korea
12
Ma CY, Pezzotti S, Schwaab G, Gebala M, Herschlag D, Havenith M. Cation enrichment in the ion atmosphere is promoted by local hydration of DNA. Phys Chem Chem Phys 2021; 23:23203-23213. [PMID: 34622888 PMCID: PMC8797164 DOI: 10.1039/d1cp01963e] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Indexed: 11/21/2022]
Abstract
Electrostatic interactions are central to the structure and function of nucleic acids, including their folding, condensation, and interaction with proteins and other charged molecules. These interactions are profoundly affected by ions surrounding nucleic acids, the constituents of the so-called ion atmosphere. Here, we report precise Fourier Transform-Terahertz/Far-Infrared (FT-THz/FIR) measurements, which are sensitive to changes in the ion atmosphere, in the frequency range 30–500 cm⁻¹ for a 24-bp DNA solvated in a series of alkali halide (NaCl, NaF, KCl, CsCl, and CsF) electrolyte solutions. Cation excess in the ion atmosphere is detected experimentally by observation of cation modes of Na+, K+, and Cs+ in the 70–90 cm⁻¹ range. Based on MD simulations, we propose that the magnitude of cation excess (which is salt specific) depends on the ability of the electrolyte to perturb the water network at the DNA interface: in the NaF atmosphere, the ions reduce the strength of interactions between water and the DNA more than in the case of a NaCl electrolyte. Here, we explicitly take into account the solvent contribution to the chemical potential in the ion atmosphere: a decrease in the number of bound water molecules in the hydration layer of DNA is correlated with enhanced density fluctuations, which decrease the free energy cost of ion hydration, thus promoting further ion accumulation within the DNA atmosphere. We propose that taking into account the local solvation is crucial for understanding the ion atmosphere.
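For orientation, spectroscopic wavenumbers convert to frequency by multiplying by the speed of light, so the 30–500 cm⁻¹ window spans roughly 0.9–15 THz (hence "THz/FIR"), and the 70–90 cm⁻¹ cation modes sit near 2.1–2.7 THz, well below the thermal energy k_B·T/(hc) of about 207 cm⁻¹ at 298 K. A small conversion sketch follows; it is plain unit arithmetic with hypothetical helper names, not part of the study's analysis.

```python
SPEED_OF_LIGHT_CM_S = 2.99792458e10  # c in cm/s (exact)
PLANCK_J_S = 6.62607015e-34          # h in J*s (exact)
BOLTZMANN_J_K = 1.380649e-23         # k_B in J/K (exact)


def wavenumber_to_thz(wavenumber_cm: float) -> float:
    """Convert a wavenumber in cm^-1 to a frequency in THz (nu = c * wavenumber)."""
    return SPEED_OF_LIGHT_CM_S * wavenumber_cm / 1e12


def thermal_wavenumber(temperature_k: float) -> float:
    """Thermal energy k_B*T expressed as a wavenumber in cm^-1, i.e. k_B*T / (h*c)."""
    return BOLTZMANN_J_K * temperature_k / (PLANCK_J_S * SPEED_OF_LIGHT_CM_S)


for wn in (30, 70, 90, 500):
    print(f"{wn:>4} cm^-1 = {wavenumber_to_thz(wn):5.2f} THz")
print(f"k_B T at 298 K = {thermal_wavenumber(298.0):.0f} cm^-1")
```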
Affiliation(s)
- Chun Yu Ma
- Department of Physical Chemistry II, Ruhr-University Bochum, 44780 Bochum, Germany.
- Simone Pezzotti
- Department of Physical Chemistry II, Ruhr-University Bochum, 44780 Bochum, Germany.
- Gerhard Schwaab
- Department of Physical Chemistry II, Ruhr-University Bochum, 44780 Bochum, Germany.
- Magdalena Gebala
- Department of Biochemistry, Stanford University, Stanford, California 94305, USA
- Daniel Herschlag
- Department of Biochemistry, Stanford University, Stanford, California 94305, USA
- Martina Havenith
- Department of Physical Chemistry II, Ruhr-University Bochum, 44780 Bochum, Germany.
13
Sörensen T, Leeb S, Danielsson J, Oliveberg M. Polyanions Cause Protein Destabilization Similar to That in Live Cells. Biochemistry 2021; 60:735-746. [PMID: 33635054 PMCID: PMC8028048 DOI: 10.1021/acs.biochem.0c00889] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 11/10/2020] [Revised: 02/11/2021] [Indexed: 12/25/2022]
Abstract
The structural stability of proteins is found to change markedly upon their transfer to the crowded interior of live cells. For some proteins, stability increases, while for others it decreases, depending on both the sequence composition and the type of host cell. The mechanism seems to be linked to the strength and conformational bias of the diffusive in-cell interactions, in which protein charge plays a decisive role. Because most proteins, nucleotides, and membranes carry a net-negative charge, the intracellular environment behaves like a polyanionic (Z:1) system with electrostatic interactions different from those of standard 1:1 ion solutes. To determine how such polyanion conditions influence protein stability, we use negatively charged polyacetate ions to mimic the net-negatively charged cellular environment. The results show that, per Na+ equivalent, polyacetate destabilizes the model protein SOD1barrel significantly more than monoacetate or NaCl. At an equivalent of 100 mM Na+, the polyacetate destabilization of SOD1barrel is similar to that observed in live cells. By the combined use of equilibrium thermal denaturation, folding kinetics, and high-resolution nuclear magnetic resonance, this destabilization is primarily assigned to preferential interaction between polyacetate and the globally unfolded protein. This interaction is relatively weak and involves mainly the outermost N-terminal region of unfolded SOD1barrel. Our findings thus point to a generic influence of polyanions on protein stability, which adds to the sequence-specific contributions and needs to be considered in the evaluation of in vivo data.
Affiliation(s)
- Therese Sörensen
- Department of Biochemistry and Biophysics, Arrhenius Laboratories of Natural Sciences, Stockholm University, S-106 91 Stockholm, Sweden
- Sarah Leeb
- Department of Biochemistry and Biophysics, Arrhenius Laboratories of Natural Sciences, Stockholm University, S-106 91 Stockholm, Sweden
- Jens Danielsson
- Department of Biochemistry and Biophysics, Arrhenius Laboratories of Natural Sciences, Stockholm University, S-106 91 Stockholm, Sweden
- Mikael Oliveberg
- Department of Biochemistry and Biophysics, Arrhenius Laboratories of Natural Sciences, Stockholm University, S-106 91 Stockholm, Sweden
14
Mishra A, Khanal R, Kabir WU, Hoque T. AIRBP: Accurate identification of RNA-binding proteins using machine learning techniques. Artif Intell Med 2021; 113:102034. [PMID: 33685590 DOI: 10.1016/j.artmed.2021.102034] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 04/11/2020] [Revised: 01/19/2021] [Accepted: 02/09/2021] [Indexed: 12/25/2022]
Abstract
Identification of RNA-binding proteins (RBPs) that bind to ribonucleic acid molecules is an important problem in Computational Biology and Bioinformatics. Identifying RBPs is indispensable because they play crucial roles in the post-transcriptional control of RNAs and in RNA metabolism, and have diverse roles in biological processes such as splicing, mRNA stabilization, mRNA localization, translation, RNA synthesis, folding-unfolding, modification, processing, and degradation. The existing experimental techniques for identifying RBPs are time-consuming and expensive. Therefore, identifying RBPs directly from sequence using computational methods can help annotate RBPs and efficiently assist experimental design. In this work, we present a method called AIRBP, which uses an advanced machine learning technique called stacking to effectively predict RBPs from features extracted from evolutionary information, physicochemical properties, and disordered properties. Moreover, AIRBP uses the majority vote from RBPPred, DeepRBPPred, and the stacking model for the prediction of RBPs. The results show that AIRBP attains Accuracy (ACC), Balanced Accuracy (BACC), F1-score, and Matthews Correlation Coefficient (MCC) of 95.84 %, 94.71 %, 0.928, and 0.899, respectively, on the training dataset using 10-fold cross-validation (CV). Further evaluation of AIRBP on independent test sets shows that it achieves ACC, BACC, F1-score, and MCC of 94.36 %, 94.28 %, 0.897, and 0.860 for the Human test set; 91.25 %, 93.00 %, 0.896, and 0.835 for the S. cerevisiae test set; and 90.60 %, 90.41 %, 0.934, and 0.775 for the A. thaliana test set, respectively. These results indicate that AIRBP outperforms the existing Deep- and TriPepSVM methods. Therefore, the proposed better-performing AIRBP can be useful for accurate identification and annotation of RBPs directly from sequence and can help gain valuable insight for treating critical diseases. Availability: Code-data is available here: http://cs.uno.edu/~tamjid/Software/AIRBP/code_data.zip.
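The four scores quoted for AIRBP (ACC, BACC, F1, MCC) are standard binary-classification metrics derived from a confusion matrix in which RBPs are the positive class. The minimal Python sketch below shows how they relate, together with a simple three-way majority vote of the kind the abstract describes (combining RBPPred, DeepRBPPred and the stacking model). It is an illustration only, not the AIRBP code; the function names are hypothetical, and the abstract reports ACC and BACC as percentages (multiply by 100).

```python
import math


def _ratio(num: float, den: float) -> float:
    """Division that returns 0.0 instead of raising when the denominator is zero."""
    return num / den if den else 0.0


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """ACC, BACC, F1 and MCC from a 2x2 confusion matrix (RBP = positive class)."""
    acc = _ratio(tp + tn, tp + fp + tn + fn)
    sensitivity = _ratio(tp, tp + fn)   # recall on the positive (RBP) class
    specificity = _ratio(tn, tn + fp)   # recall on the negative (non-RBP) class
    bacc = (sensitivity + specificity) / 2
    # 2TP / (2TP + FP + FN) equals the harmonic mean of precision and recall
    f1 = _ratio(2 * tp, 2 * tp + fp + fn)
    mcc = _ratio(tp * tn - fp * fn,
                 math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return {"ACC": acc, "BACC": bacc, "F1": f1, "MCC": mcc}


def majority_vote(*votes: int) -> int:
    """Return 1 (predicted RBP) when more than half of the binary votes are 1."""
    return int(sum(votes) * 2 > len(votes))


# Hypothetical counts, for illustration only (not AIRBP results)
print(binary_metrics(tp=90, fp=10, tn=80, fn=20))
print(majority_vote(1, 0, 1))  # two of three predictors say "RBP"
```

BACC averages the per-class recalls and MCC uses all four cells of the matrix, so both are less flattering than plain accuracy when RBPs and non-RBPs are imbalanced, which is presumably why they are reported alongside ACC.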
Affiliation(s)
- Avdesh Mishra
- Department of Electrical Engineering and Computer Science, Texas A&M University-Kingsville, Kingsville, TX, USA
- Reecha Khanal
- Department of Computer Science, University of New Orleans, New Orleans, LA, USA
- Wasi Ul Kabir
- Department of Computer Science, University of New Orleans, New Orleans, LA, USA
- Tamjidul Hoque
- Department of Computer Science, University of New Orleans, New Orleans, LA, USA
15
Palomino‐Hernandez O, Margreiter MA, Rossetti G. Challenges in RNA Regulation in Huntington's Disease: Insights from Computational Studies. Isr J Chem 2020. [DOI: 10.1002/ijch.202000021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
Affiliation(s)
- Oscar Palomino‐Hernandez
- Computational Biomedicine, Institute of Neuroscience and Medicine (INM-9)/Institute for Advanced Simulation (IAS-5), Forschungszentrum Juelich, 52425 Jülich, Germany
- Faculty 1, RWTH Aachen, 52425 Aachen, Germany
- Computation-based Science and Technology Research Center, The Cyprus Institute, Nicosia 2121, Cyprus
- Institute of Life Science, The Hebrew University of Jerusalem, Jerusalem 91904, Israel
- Michael A. Margreiter
- Computational Biomedicine, Institute of Neuroscience and Medicine (INM-9)/Institute for Advanced Simulation (IAS-5), Forschungszentrum Juelich, 52425 Jülich, Germany
- Faculty 1, RWTH Aachen, 52425 Aachen, Germany
- Giulia Rossetti
- Computational Biomedicine, Institute of Neuroscience and Medicine (INM-9)/Institute for Advanced Simulation (IAS-5), Forschungszentrum Juelich, 52425 Jülich, Germany
- Jülich Supercomputing Centre (JSC), Forschungszentrum Jülich, 52425 Jülich, Germany
- Department of Hematology, Oncology, Hemostaseology and Stem Cell Transplantation, University Hospital Aachen, RWTH Aachen University, Pauwelsstraße 30, 52074 Aachen, Germany
16
Quantitative Studies of an RNA Duplex Electrostatics by Ion Counting. Biophys J 2019; 117:1116-1124. [PMID: 31466697 DOI: 10.1016/j.bpj.2019.08.007] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Received: 05/21/2019] [Revised: 08/01/2019] [Accepted: 08/05/2019] [Indexed: 01/22/2023]
Abstract
RNAs are among the most highly charged polyelectrolytes in nature, and understanding their electrostatics is fundamental to understanding their structure and biological functions. An effective way to characterize the electrostatic field generated by nucleic acids is to quantify the interactions between nucleic acids and the ions that surround them. These ions form a loosely associated cloud referred to as an ion atmosphere. Although theoretical and computational studies can describe the ion atmosphere around RNAs, benchmarks are needed to guide the development of these approaches, and experiments to date that read out RNA-ion interactions are limited. Here, we present ion counting studies to quantify the number of ions surrounding well-defined model systems of RNA and DNA duplexes. We observe that the RNA duplex attracts more cations and expels fewer anions than the DNA duplex, and that the RNA duplex interacts significantly more strongly with the divalent cation Mg2+, despite the two duplexes having identical total charge. These experimental results suggest that the RNA duplex generates a stronger electrostatic field than DNA, as predicted from the structural differences between their helices. Theoretical calculations using a nonlinear Poisson-Boltzmann equation agree excellently with experiments for monovalent ions but underestimate Mg2+-DNA and Mg2+-RNA interactions by 20%. These studies provide much-needed stringent benchmarks against which other all-atom theoretical models of RNA-ion interactions can be tested; such interactions likely must be accounted for accurately in structural, dynamic, and energetic terms to confidently model RNA structure, interactions, and function.
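Ion counting expresses the ion atmosphere as a number of excess (or depleted) ions per nucleic-acid molecule. The minimal sketch below assumes the common buffer-equilibration setup: each ion's preferential interaction coefficient Γ is its concentration in the sample minus its concentration in the bulk buffer, divided by the nucleic-acid concentration. For a 1:1 salt, charge neutrality requires the cation excess plus the magnitude of the anion depletion to equal the magnitude of the molecule's charge. The concentrations are made-up illustrative values, not data from this study, and the function names are hypothetical.

```python
def preferential_interaction(c_sample_mM: float, c_bulk_mM: float,
                             c_nucleic_acid_mM: float) -> float:
    """Ions per nucleic-acid molecule in excess (> 0) or depleted (< 0) relative to bulk."""
    return (c_sample_mM - c_bulk_mM) / c_nucleic_acid_mM


def atmosphere_charge(gamma_cation: float, gamma_anion: float,
                      z_cation: int = 1, z_anion: int = -1) -> float:
    """Net charge carried by the ion atmosphere per molecule.

    Charge neutrality means this should equal minus the nucleic-acid charge.
    """
    return z_cation * gamma_cation + z_anion * gamma_anion


# Hypothetical measurement: 10 uM duplex equilibrated against 20 mM NaCl
g_na = preferential_interaction(20.30, 20.0, 0.01)  # cation excess, about +30 per molecule
g_cl = preferential_interaction(19.88, 20.0, 0.01)  # anion depletion, about -12 per molecule
print(g_na, g_cl, atmosphere_charge(g_na, g_cl))    # atmosphere charge of about +42
```

Comparing the measured atmosphere charge with the known phosphate count is a useful internal consistency check; differences between RNA and DNA then show up in how that total is split between cation excess and anion depletion.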
17
Gebala M, Johnson SL, Narlikar GJ, Herschlag D. Ion counting demonstrates a high electrostatic field generated by the nucleosome. eLife 2019; 8:e44993. [PMID: 31184587 PMCID: PMC6584128 DOI: 10.7554/elife.44993] [Citation(s) in RCA: 36] [Impact Index Per Article: 6.0] [Received: 01/09/2019] [Accepted: 06/08/2019] [Indexed: 01/04/2023]
Abstract
In eukaryotes, a first step towards nuclear DNA compaction is the formation of a nucleosome, which consists of negatively charged DNA wrapped around a positively charged histone protein octamer. It is often assumed that complexation of the DNA into the nucleosome completely attenuates the DNA charge and hence the electrostatic field generated by the molecule. In contrast, theoretical and computational studies suggest that the nucleosome retains a strong, negative electrostatic field. Despite their fundamental implications for chromatin organization and function, these opposing views of nucleosome electrostatics have not been tested experimentally. Herein, we directly measure nucleosome electrostatics and find that while nucleosome formation reduces the complex charge by half, the nucleosome nevertheless maintains a strong negative electrostatic field. Our studies highlight the importance of considering the polyelectrolyte nature of the nucleosome and its impact on processes ranging from factor binding to DNA compaction.
Affiliation(s)
- Magdalena Gebala
- Department of Biochemistry, Stanford University, Stanford, United States
- Stephanie L Johnson
- Department of Biochemistry and Biophysics, University of California, San Francisco, San Francisco, United States
- Geeta J Narlikar
- Department of Biochemistry and Biophysics, University of California, San Francisco, San Francisco, United States
- Dan Herschlag
- Department of Biochemistry, Stanford University, Stanford, United States
- Department of Chemistry, Stanford University, Stanford, United States
- ChEM-H Institute, Stanford University, Stanford, United States
18
The fungal ribonuclease-like effector protein CSEP0064/BEC1054 represses plant immunity and interferes with degradation of host ribosomal RNA. PLoS Pathog 2019; 15:e1007620. [PMID: 30856238 PMCID: PMC6464244 DOI: 10.1371/journal.ppat.1007620] [Citation(s) in RCA: 86] [Impact Index Per Article: 14.3] [Received: 05/10/2018] [Revised: 04/15/2019] [Accepted: 02/06/2019] [Indexed: 01/08/2023]
Abstract
The biotrophic fungal pathogen Blumeria graminis causes the powdery mildew disease of cereals and grasses. We present the first crystal structure of a B. graminis effector of pathogenicity (CSEP0064/BEC1054), demonstrating that it has a ribonuclease (RNase)-like fold. This effector is part of a group of RNase-like proteins (termed RALPHs) that comprise the largest set of secreted effector candidates within the B. graminis genomes. Their exceptional abundance suggests that they play crucial roles during pathogenesis. We show that transgenic expression of RALPH CSEP0064/BEC1054 increases susceptibility to infection in both monocotyledonous and dicotyledonous plants. CSEP0064/BEC1054 interacts in planta with the pathogenesis-related protein PR10. The effector protein associates with total RNA and weakly with DNA. Methyl jasmonate (MeJA) levels modulate susceptibility to aniline-induced host RNA fragmentation. In planta expression of CSEP0064/BEC1054 reduces the formation of this RNA fragment. We propose that CSEP0064/BEC1054 is a pseudoenzyme that binds to host ribosomes, thereby inhibiting the action of plant ribosome-inactivating proteins (RIPs) that would otherwise lead to host cell death, a nonviable interaction, and the demise of the fungus.
Powdery mildews are common plant diseases that affect important crop plants, including cereals such as wheat and barley. The fungi that cause this disease are obligate biotrophs: they have an absolute requirement for living host cells, which they penetrate with feeding structures called haustoria. These fungi must be highly effective at avoiding immune recognition, which would lead to the death of both the host cell and the pathogen. We assume they do this by delivering effector proteins to the host. While several hundred secreted effectors have been described in cereal powdery mildews, it is unknown how they work. Here, we use X-ray crystallography and nuclear magnetic resonance (NMR) spectroscopy to determine the structure and interactions of the effector CSEP0064/BEC1054, representative of the largest class of effectors resembling fungal RNases. We find that this effector binds nucleic acids. Expression of the effector in plants increases susceptibility to infection. Moreover, transgenic CSEP0064/BEC1054 expression in wheat inhibits the degradation of host ribosomal RNA induced by ribosome-inactivating proteins (RIPs). We propose a novel mechanism of action for the RNase-like effectors in powdery mildews: they may act as pseudoenzymes to inhibit the host RIPs, known components of plant immune responses that lead to host cell death.
19
Poursheikhali Asghari M, Abdolmaleki P. Prediction of RNA- and DNA-Binding Proteins Using Various Machine Learning Classifiers. Avicenna J Med Biotechnol 2019; 11:104-111. [PMID: 30800250 PMCID: PMC6359699] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022]
Abstract
BACKGROUND Nucleic acid-binding proteins play major roles in different biological processes, such as transcription, splicing and translation. Therefore, predicting the nucleic acid-binding function of proteins is a step toward their full functional annotation. The aim of our research was to improve nucleic acid-binding function prediction. METHODS In the current study, nine machine-learning algorithms were used to predict RNA- and DNA-binding proteins and also to discriminate between RNA-binding and DNA-binding proteins. Electrostatic features were used to predict each function in correspondingly adapted protein datasets. Leave-one-out cross-validation was used to measure the performance of the classifiers. RESULTS The radial basis function classifier gave the best results in predicting RNA- and DNA-binding proteins compared with the other classifiers applied. In discriminating between RNA- and DNA-binding proteins, the multilayer perceptron classifier performed best. CONCLUSION Our findings show that the prediction of nucleic acid-binding function based on these simple electrostatic features can be improved by the classifiers applied. Moreover, reasonable progress has been made in distinguishing between RNA- and DNA-binding proteins.
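Leave-one-out cross-validation trains a classifier on all proteins but one, predicts the held-out protein, and repeats this for every protein in the dataset. The minimal pure-Python sketch below illustrates the procedure. The two features per protein (for example, net charge and fraction of positively charged surface) and the nearest-centroid classifier are hypothetical placeholders chosen for brevity; they are not the nine classifiers or the electrostatic feature set used in the study.

```python
from statistics import fmean


def nearest_centroid_fit(X: list[list[float]], y: list[str]) -> dict[str, list[float]]:
    """Compute the per-class mean feature vector (centroid)."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [fmean(col) for col in zip(*rows)]
    return centroids


def nearest_centroid_predict(centroids: dict[str, list[float]], x: list[float]) -> str:
    """Assign x to the class whose centroid is closest in squared Euclidean distance."""
    def sq_dist(c: list[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda lab: sq_dist(centroids[lab]))


def loocv_accuracy(X: list[list[float]], y: list[str]) -> float:
    """Fraction of proteins predicted correctly when each is held out in turn."""
    correct = 0
    for i in range(len(X)):
        model = nearest_centroid_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        correct += nearest_centroid_predict(model, X[i]) == y[i]
    return correct / len(X)
```

Because every protein is tested exactly once on a model that never saw it, LOOCV makes full use of small datasets, at the cost of refitting the model once per protein.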
Affiliation(s)
- Parviz Abdolmaleki
- Corresponding author: Parviz Abdolmaleki, Ph.D., Department of Biophysics, Faculty of Biological Sciences, Tarbiat Modares University, Tehran, Iran, Tel: +98 21 82883404, Fax: +98 21 82884457
20
Dvir S, Argoetti A, Mandel-Gutfreund Y. Ribonucleoprotein particles: advances and challenges in computational methods. Curr Opin Struct Biol 2018; 53:124-130. [PMID: 30172766 DOI: 10.1016/j.sbi.2018.08.002] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 08/01/2018] [Accepted: 08/07/2018] [Indexed: 01/16/2023]
Abstract
RNA-binding proteins (RBPs) interact with RNA to form Ribonucleoprotein Particles (RNPs). The interactions between RBPs and their RNA partners are traditionally thought to be mediated by highly conserved RNA-binding domains (RBDs). Recently, high-throughput studies have led to the discovery of hundreds of novel RNA-binding proteins and domains, many of which do not follow the classical definition of RNA binding. Despite technological innovations, experimental screens are currently limited to the detection of specific types of RNPs, underscoring the importance of computational methods for predicting novel RBPs and RNA-interacting residues and interfaces. Here, we discuss major challenges in the computational prediction of RBPs and RBDs and outline new strategies to circumvent current limitations of experimental techniques.
Affiliation(s)
- Shlomi Dvir
- Faculty of Biology, Technion-Israel Institute of Technology, Haifa 32000, Israel
- Amir Argoetti
- Faculty of Biology, Technion-Israel Institute of Technology, Haifa 32000, Israel
- Yael Mandel-Gutfreund
- Faculty of Biology, Technion-Israel Institute of Technology, Haifa 32000, Israel; Department of Computer Science, Technion-Israel Institute of Technology, Haifa 32000, Israel.
21
Hu W, Qin L, Li M, Pu X, Guo Y. A structural dissection of protein–RNA interactions based on different RNA base areas of interfaces. RSC Adv 2018; 8:10582-10592. [PMID: 35540439 PMCID: PMC9078961 DOI: 10.1039/c8ra00598b] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 01/20/2018] [Accepted: 03/05/2018] [Indexed: 11/21/2022]
Abstract
Protein–RNA interactions are involved in many cellular processes, but their mechanisms are not fully understood, mainly because of the complexity of RNA structures. Through detailed investigation of the RNA structures in protein–RNA complexes, this paper shows for the first time that the RNAs in these complexes can be clearly classified into three classes (high, medium and low) based on different levels of Pbase (the percentage of base area buried in the RNA interface). For these three RNA classes, more detailed analyses of protein–RNA interactions were performed from various aspects, including interface area, structure, composition and interaction force, to achieve a deeper understanding of the recognition specificity of the three classes of protein–RNA interactions. Under this classification strategy, the three complex classes differ significantly in almost all properties. Complexes in the high class have short, extended RNA structures and behave like protein–ssDNA interactions; their hydrogen bonds and hydrophobic interactions are strong. In complexes of the low class, the RNA structures are mainly double-stranded, like protein–dsDNA interactions, and electrostatic interactions occur frequently. Complexes in the medium class have the longest RNA chains and the largest average interface area, but show no preference for any interaction force. On average, significant feature preferences in composition, secondary structure and intermolecular physicochemical properties can be observed for high and low complexes, but no highly specific features are found for medium complexes. We found that the proposed Pbase is an important parameter that can serve as a new determinant to distinguish protein–RNA complexes. For high and low complexes, the specificity of the recognition process is easier to understand from interface features than it is for medium complexes. Future work should therefore focus on medium complexes, analyzing their structures from additional feature aspects. Overall, this study may contribute to a more detailed understanding of the mechanism of protein–RNA interactions. Qualitative and quantitative measurements of the influence of structure and composition of RNA interfaces on protein–RNA interactions.
Affiliation(s)
- Wen Hu
- College of Chemistry, Sichuan University, Chengdu 610064, People's Republic of China
- Liu Qin
- College of Chemistry, Sichuan University, Chengdu 610064, People's Republic of China
- Menglong Li
- College of Chemistry, Sichuan University, Chengdu 610064, People's Republic of China
- Xuemei Pu
- College of Chemistry, Sichuan University, Chengdu 610064, People's Republic of China
- Yanzhi Guo
- College of Chemistry, Sichuan University, Chengdu 610064, People's Republic of China
22
Moutiez M, Belin P, Gondry M. Aminoacyl-tRNA-Utilizing Enzymes in Natural Product Biosynthesis. Chem Rev 2017; 117:5578-5618. [DOI: 10.1021/acs.chemrev.6b00523] [Citation(s) in RCA: 62] [Impact Index Per Article: 7.8] [Indexed: 01/31/2023]
Affiliation(s)
- Mireille Moutiez
- Institute for Integrative Biology of the Cell (I2BC), CEA, CNRS, Univ. Paris-Sud, Université Paris-Saclay, 91198, Gif-sur-Yvette Cedex, France
- Pascal Belin
- Institute for Integrative Biology of the Cell (I2BC), CEA, CNRS, Univ. Paris-Sud, Université Paris-Saclay, 91198, Gif-sur-Yvette Cedex, France
- Muriel Gondry
- Institute for Integrative Biology of the Cell (I2BC), CEA, CNRS, Univ. Paris-Sud, Université Paris-Saclay, 91198, Gif-sur-Yvette Cedex, France
23
Zhang X, Liu S. RBPPred: predicting RNA-binding proteins from sequence using SVM. Bioinformatics 2016; 33:854-862. [DOI: 10.1093/bioinformatics/btw730] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 06/25/2016] [Accepted: 11/16/2016] [Indexed: 11/13/2022]
24
Ghosh P, Sowdhamini R. Genome-wide survey of putative RNA-binding proteins encoded in the human proteome. Mol Biosyst 2016; 12:532-40. [PMID: 26675803 DOI: 10.1039/c5mb00638d] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Indexed: 01/11/2023]
Abstract
RNA-binding proteins (RBPs) are involved in various post-transcriptional gene regulatory processes and are also functionally important members of the ribosome and the spliceosome. However, RBPs and their interactions with RNA are less well studied than DNA-binding proteins. We have classified the existing RBP structures, available in complexes with RNA and RNA/DNA hybrids, into different structural families and created Hidden Markov Models (HMMs). These structure-centric family HMMs, along with the sequence-centric family HMMs, were used as a primary database to systematically search the human proteome for putative RBPs. We found more than 2600 gene products with RBP signatures in humans, of which around 28% are likely to bind RNA but not DNA, whereas 9% might bind both RNA and DNA. Of these putative RBPs, 11% do not yet have an explicit functional annotation. Nearly 30% of the putative RBPs are exclusively nuclear, 15% have known disease associations and around 30% are enzymes. Around 40% of the proteins identified in this study are novel and have not been reported by recent large-scale studies of human RBPs.
Affiliation(s)
- Pritha Ghosh
- National Centre for Biological Sciences, Tata Institute of Fundamental Research, Bellary Road, Bangalore, Karnataka 560 065, India.
- R Sowdhamini
- National Centre for Biological Sciences, Tata Institute of Fundamental Research, Bellary Road, Bangalore, Karnataka 560 065, India.
25
Ghosh P, Mathew OK, Sowdhamini R. RStrucFam: a web server to associate structure and cognate RNA for RNA-binding proteins from sequence information. BMC Bioinformatics 2016; 17:411. [PMID: 27717309 PMCID: PMC5054549 DOI: 10.1186/s12859-016-1289-x] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 05/05/2016] [Accepted: 09/29/2016] [Indexed: 11/25/2022]
Abstract
Background RNA-binding proteins (RBPs) interact with their cognate RNA(s) to form large biomolecular assemblies. They are versatile in their functionality and are involved in a myriad of processes inside the cell. RBPs with similar structural features and common biological functions are grouped together into families and superfamilies. It would be useful to obtain an early understanding and association of the RNA-binding property of gene-product sequences. Here, we report a web server, RStrucFam, to predict the structure, type of cognate RNA(s) and function(s) of proteins, where possible, from sequence information alone. Results The web server employs Hidden Markov Model scan (hmmscan) to enable association with a back-end database of structural and sequence families. The database (HMMRBP) comprises 437 HMMs of RBP families of known structure, generated using structure-based sequence alignments, and 746 sequence-centric RBP family HMMs. The input protein sequence is associated with structural or sequence domain families if structure or sequence signatures exist. When the protein is associated with a family of known structure, output features such as the multiple structure-based sequence alignment (MSSA) of the query with all other members of that family are provided. Further, the cognate RNA partner(s) for that protein, Gene Ontology (GO) annotations, if any, and a homology model of the protein can be obtained. Users can also browse the database for details pertaining to each family, protein or RNA and their related information, based on keyword search or RNA motif search. Conclusions RStrucFam is a web server that exploits structurally conserved features of RBPs, derived from known family members and imprinted in mathematical profiles, to predict putative RBPs from sequence information. Proteins that fail to associate with such structure-centric families are further queried against the sequence-centric RBP family HMMs in the HMMRBP database. Further, all other essential information pertaining to an RBP, such as overall function annotations, is provided. The web server can be accessed at the following link: http://caps.ncbs.res.in/rstrucfam. Electronic supplementary material: The online version of this article (doi:10.1186/s12859-016-1289-x) contains supplementary material, which is available to authorized users.
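The server's core decision is a two-tier search: the query is first matched against the structure-centric family HMMs, and only if nothing matches is it queried against the sequence-centric HMMs. The minimal sketch below shows that fallback logic. It assumes the hmmscan hits have already been parsed into (family, E-value) pairs; the E-value cutoff of 1e-3 and the family names are hypothetical, not RStrucFam's actual settings.

```python
from typing import Optional

# Hypothetical parsed hmmscan hit: (family name, E-value)
Hit = tuple[str, float]


def assign_family(structure_hits: list[Hit], sequence_hits: list[Hit],
                  evalue_cutoff: float = 1e-3) -> Optional[tuple[str, str]]:
    """Return (tier, best family), preferring structure-centric families.

    Falls back to sequence-centric families only when no structure-centric hit
    passes the cutoff; returns None when neither tier has a significant hit.
    """
    for tier, hits in (("structure-centric", structure_hits),
                       ("sequence-centric", sequence_hits)):
        significant = [h for h in hits if h[1] <= evalue_cutoff]
        if significant:
            family, _ = min(significant, key=lambda h: h[1])  # lowest E-value wins
            return tier, family
    return None


print(assign_family([("KH_like", 0.5)], [("PUF_like", 1e-5)]))
```

Ordering the tiers this way means a weaker but significant structural match is preferred over a stronger sequence-only match, which fits the server's aim of attaching structural information (MSSA, homology model) wherever possible.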
Affiliation(s)
- Pritha Ghosh
- National Centre for Biological Sciences, Tata Institute of Fundamental Research, Bellary Road, Bangalore, Karnataka, 560 065, India
- Oommen K Mathew
- National Centre for Biological Sciences, Tata Institute of Fundamental Research, Bellary Road, Bangalore, Karnataka, 560 065, India
- SASTRA University, Tirumalaisamudram, Thanjavur, 613401, Tamil Nadu, India
- Ramanathan Sowdhamini
- National Centre for Biological Sciences, Tata Institute of Fundamental Research, Bellary Road, Bangalore, Karnataka, 560 065, India.
26
Paz I, Kligun E, Bengad B, Mandel-Gutfreund Y. BindUP: a web server for non-homology-based prediction of DNA and RNA binding proteins. Nucleic Acids Res 2016; 44:W568-74. [PMID: 27198220 PMCID: PMC4987955 DOI: 10.1093/nar/gkw454] [Citation(s) in RCA: 49] [Impact Index Per Article: 5.4] [Received: 03/01/2016] [Accepted: 05/11/2016] [Indexed: 12/12/2022]
Abstract
Gene expression is a multi-step process involving many layers of regulation. The main regulators of the pathway are DNA- and RNA-binding proteins. Although a large number of DNA- and RNA-binding proteins have been identified and extensively studied over the years, many other proteins, including some already known to have another function, are still expected to be discovered. Here we present a new web server, BindUP, freely accessible at http://bindup.technion.ac.il/, for predicting DNA- and RNA-binding proteins using a non-homology-based approach. Our method is based on the electrostatic features of the protein surface and other general properties of the protein. BindUP predicts nucleic acid-binding function given the protein's three-dimensional structure or a structural model. Additionally, BindUP provides information on the largest electrostatic surface patches, visualized on the server. The server was tested on several datasets of DNA- and RNA-binding proteins, including proteins that do not possess DNA- or RNA-binding domains and have no similarity to known nucleic acid-binding proteins, achieving very high accuracy. BindUP can be run in either single or batch mode and can efficiently test hundreds of proteins simultaneously.
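BindUP reports the largest electrostatic surface patches. One common way to define such a patch, assumed here rather than taken from the BindUP paper, is a connected set of surface points whose electrostatic potential exceeds a threshold. The minimal sketch below finds the largest such patch by breadth-first search. The surface is given as a hypothetical adjacency graph of points, and patch size is measured as a point count rather than as surface area.

```python
from collections import deque


def largest_positive_patch(potential: dict[int, float],
                           neighbors: dict[int, list[int]],
                           threshold: float = 0.0) -> set[int]:
    """Largest connected set of surface points with potential above `threshold`."""
    positive = {v for v, phi in potential.items() if phi > threshold}
    seen: set[int] = set()
    best: set[int] = set()
    for start in positive:
        if start in seen:
            continue
        # Breadth-first search over positive neighbours only
        patch = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for w in neighbors.get(v, []):
                if w in positive and w not in seen:
                    seen.add(w)
                    patch.add(w)
                    queue.append(w)
        if len(patch) > len(best):
            best = patch
    return best


# Toy surface: six points in a chain, 0-1-2-3-4-5 (hypothetical potentials)
chain = {i: [j for j in (i - 1, i + 1) if 0 <= j < 6] for i in range(6)}
phi = dict(enumerate([1.2, 0.8, -0.5, 0.3, 0.4, 0.9]))
print(largest_positive_patch(phi, chain))
```

On a real protein surface, the points would come from a triangulated molecular surface with potentials from a Poisson-Boltzmann solver, and each point would be weighted by its area. The graph search itself stays the same.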
Affiliation(s)
- Inbal Paz
- Department of Biology, Technion-Israel Institute of Technology, Technion City, Haifa 32000, Israel
- Efrat Kligun
- Department of Biology, Technion-Israel Institute of Technology, Technion City, Haifa 32000, Israel
- Barak Bengad
- Department of Biology, Technion-Israel Institute of Technology, Technion City, Haifa 32000, Israel
- Yael Mandel-Gutfreund
- Department of Biology, Technion-Israel Institute of Technology, Technion City, Haifa 32000, Israel
27
Motion GB, Howden AJM, Huitema E, Jones S. DNA-binding protein prediction using plant specific support vector machines: validation and application of a new genome annotation tool. Nucleic Acids Res 2015; 43:e158. [PMID: 26304539 PMCID: PMC4678848 DOI: 10.1093/nar/gkv805] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Received: 04/07/2015] [Accepted: 07/28/2015] [Indexed: 11/26/2022]
Abstract
There are currently 151 plants with draft genomes available, but levels of functional annotation for putative protein products are low. Therefore, accurate computational predictions are essential to annotate genomes in the first instance and to provide focus for the more costly and time-consuming functional assays that follow. DNA-binding proteins are an important class of proteins that require annotation, but current computational methods are not applicable to genome-wide predictions in plant species. Here, we explore the use of species- and lineage-specific models for the prediction of DNA-binding proteins in plants. We show that a species-specific support vector machine model based on Arabidopsis sequence data is more accurate (accuracy 81%) than a generic model (74%), and on this basis we develop a plant-specific model for predicting DNA-binding proteins. We apply this model to the tomato proteome and demonstrate its ability to perform accurate high-throughput prediction of DNA-binding proteins. In doing so, we have annotated 36 currently uncharacterised proteins by assigning a putative DNA-binding function. Our model is publicly available, and we propose that it be used in combination with existing tools to help increase annotation levels of DNA-binding proteins encoded in plant genomes.
Affiliation(s)
- Graham B Motion
- Division of Plant Sciences, University of Dundee at the James Hutton Institute, Invergowrie, Dundee DD2 5DA, UK; Cell and Molecular Sciences, James Hutton Institute, Invergowrie, Dundee DD2 5DA, UK
- Andrew J M Howden
- Division of Plant Sciences, University of Dundee at the James Hutton Institute, Invergowrie, Dundee DD2 5DA, UK
- Edgar Huitema
- Division of Plant Sciences, University of Dundee at the James Hutton Institute, Invergowrie, Dundee DD2 5DA, UK
- Susan Jones
- Information and Computational Sciences, James Hutton Institute, Invergowrie, Dundee DD2 5DA, UK
28
Ren H, Shen Y. RNA-binding residues prediction using structural features. BMC Bioinformatics 2015; 16:249. [PMID: 26254826 PMCID: PMC4529986 DOI: 10.1186/s12859-015-0691-0] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Received: 01/03/2015] [Accepted: 07/31/2015] [Indexed: 01/25/2023]
Abstract
Background: RNA-protein complexes play an essential role in many biological processes. To explore the potential functions of RNA-protein complexes, it is important to identify RNA-binding residues in proteins. Results: In this work, we propose a set of new structural features for RNA-binding residue prediction. A set of template patches is first extracted from RNA-binding interfaces. To construct the structural features of a residue, we compare its surrounding patches with each template patch and use the accumulated distances as its structural features. These new features provide sufficient structural information about the surface surrounding a residue, and they can be used to measure the structural similarity between the surfaces surrounding two residues. The new structural features, together with other sequence features, are used to predict RNA-binding residues with an ensemble learning technique. Conclusions: The experimental results reveal the effectiveness of the proposed structural features. In addition, clustering of the template patches reveals distinct structural patterns of RNA-binding sites, although the sequences of template patches in the same cluster are not conserved. We speculate that RNAs may have structural preferences when binding to proteins.
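The abstract leaves the patch representation and the distance measure unspecified. Assuming each patch is summarised as a fixed-length numeric descriptor vector and that "accumulated" means summed Euclidean distance, a per-residue feature vector could be built as in the sketch below. Both assumptions and the names `patch_distance` and `structural_features` are hypothetical, not the authors' implementation.

```python
import math

def patch_distance(a, b):
    """Euclidean distance between two equal-length patch descriptor vectors
    (assumed representation)."""
    return math.dist(a, b)

def structural_features(surrounding_patches, template_patches):
    """One feature per template patch: the distances from every patch around the
    residue to that template, summed (assumed reading of 'accumulated distances')."""
    return [
        sum(patch_distance(p, t) for p in surrounding_patches)
        for t in template_patches
    ]

# Toy 2-D descriptors, purely illustrative
templates = [(0.0, 0.0), (3.0, 4.0)]
around_residue = [(0.0, 0.0), (0.0, 1.0)]
print(structural_features(around_residue, templates))
```

Smaller values mean the residue's neighbourhood resembles that template more closely; such a vector would then be concatenated with sequence features before ensemble training.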
Affiliation(s)
- Huizhu Ren
- 2011 Collaborative Innovation Center of Tianjin for Medical Epigenetics, Key Laboratory of Hormones and Development (Ministry of Health), Metabolic Diseases Hospital & Tianjin Institute of Endocrinology, Tianjin Medical University, Tianjin, 300070, China.
- Ying Shen
- School of Software Engineering, Tongji University, Shanghai, 201804, China; Key Laboratory of Intelligent Perception and Systems for High-Dimensional Information, Ministry of Education, Nanjing University of Science and Technology, Nanjing, 210094, P.R. China.
29
Miao Z, Westhof E. Prediction of nucleic acid binding probability in proteins: a neighboring residue network based score. Nucleic Acids Res 2015; 43:5340-51. [PMID: 25940624 PMCID: PMC4477668 DOI: 10.1093/nar/gkv446] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.4] [Received: 03/16/2015] [Revised: 04/23/2015] [Accepted: 04/24/2015] [Indexed: 11/13/2022]
Abstract
We describe a general binding score for predicting the nucleic acid-binding probability in proteins. The score is derived directly from physicochemical and evolutionary features and integrates a residue neighboring network approach. Our approach achieves stable and high accuracy on both DNA- and RNA-binding proteins and illustrates that the main driving forces for nucleic acid binding are common to both. Because the score effectively integrates the synergistic effects of the network of neighboring residues, and because the prediction yields a hierarchical scoring of the protein surface, energy funnels for nucleic acid binding appear on protein surfaces, pointing to the dynamic process by which proteins bind nucleic acids.
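How the published score combines neighbouring residues is not spelled out in the abstract. The sketch below shows one simple, assumed way a residue-neighbour network can propagate information: each residue's raw propensity is blended with the mean raw score of residues whose centres lie within a distance cutoff. The 8 Å cutoff, the 50/50 weighting and the function name are hypothetical, and this is not the authors' scoring function.

```python
import math

def network_smoothed_scores(coords, raw_scores, cutoff=8.0, self_weight=0.5):
    """Blend each residue's raw score with the mean raw score of its spatial neighbours.

    coords: list of (x, y, z) residue centres; raw_scores: per-residue values.
    Residues closer than `cutoff` angstroms are treated as network neighbours (assumption).
    Residues with no neighbours keep their raw score.
    """
    smoothed = []
    for i, ci in enumerate(coords):
        neighbours = [raw_scores[j] for j, cj in enumerate(coords)
                      if j != i and math.dist(ci, cj) < cutoff]
        if neighbours:
            mean_nb = sum(neighbours) / len(neighbours)
            smoothed.append(self_weight * raw_scores[i] + (1 - self_weight) * mean_nb)
        else:
            smoothed.append(raw_scores[i])
    return smoothed

coords = [(0, 0, 0), (5, 0, 0), (50, 0, 0)]
raw = [1.0, 0.0, 1.0]
print(network_smoothed_scores(coords, raw))  # [0.5, 0.5, 1.0]
```

A residue in a high-scoring neighbourhood is pulled up and one in a low-scoring neighbourhood is pulled down, which is the intuition behind letting neighbours reinforce each other.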
Affiliation(s)
- Zhichao Miao
- Architecture et Réactivité de l'ARN, Université de Strasbourg, Institut de biologie moléculaire et cellulaire du CNRS, 15 Rue Descartes, 67000 Strasbourg, France
- Eric Westhof
- Architecture et Réactivité de l'ARN, Université de Strasbourg, Institut de biologie moléculaire et cellulaire du CNRS, 15 Rue Descartes, 67000 Strasbourg, France
30
Pérez-Cano L, Fernández-Recio J. Dissection and prediction of RNA-binding sites on proteins. Biomol Concepts 2015; 1:345-55. [PMID: 25962008 DOI: 10.1515/bmc.2010.037] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/15/2022]
Abstract
RNA-binding proteins are involved in many important regulatory processes in cells, and their study is essential for a complete understanding of living organisms. They are highly variable from both structural and functional points of view. However, several recent studies of protein-RNA crystal structures have revealed interesting common properties. RNA-binding sites usually constitute patches of positively charged or polar residues that make most of the specific and non-specific contacts with RNA. Negatively charged or aliphatic residues are less frequent at protein-RNA interfaces, although they can also be found either forming aliphatic and positive-negative pairs in protein RNA-binding sites or contacting RNA through their main chains. Aromatic residues found within these interfaces are usually involved in specific base recognition at single-stranded RNA regions. This specific recognition, in combination with structural complementarity, is the key source of specificity in protein-RNA association. Building on this knowledge, a variety of computational methods for predicting RNA-binding sites have been developed, based either on protein sequence or on protein structure. Some reported methods are highly successful at identifying RNA-binding proteins or predicting RNA-binding sites. Given the growing interest in the field, all these studies and prediction methods will undoubtedly contribute to the identification and comprehension of protein-RNA interactions.
31
Wang W, Liu J, Xiong Y, Zhu L, Zhou X. Analysis and classification of DNA-binding sites in single-stranded and double-stranded DNA-binding proteins using protein information. IET Syst Biol 2014; 8:176-83. [PMID: 25075531 DOI: 10.1049/iet-syb.2013.0048] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Indexed: 11/19/2022]
Abstract
Single-stranded DNA-binding proteins (SSBs) and double-stranded DNA-binding proteins (DSBs) play different roles in biological processes when they bind to single-stranded DNA (ssDNA) or double-stranded DNA (dsDNA). However, the underlying binding mechanisms of SSBs and DSBs are not yet fully understood. Here, the authors first constructed two groups of ssDNA- and dsDNA-specific binding sites from two non-redundant sets of SSBs and DSBs. They then analysed the relationship between the two classes of binding sites and a newly proposed set of features (residue charge distribution, secondary structure and spatial shape). To assess and exploit the predictive power of these features, they trained a support vector machine classifier to predict ssDNA- and dsDNA-binding sites. The authors' analysis and prediction results indicated that the two classes of binding sites can be distinguished by the three types of features, and the final classifier using all the features achieved satisfactory performance. In conclusion, the proposed features should deepen understanding of the specificity of proteins that bind to ssDNA or dsDNA.
Affiliation(s)
- Wei Wang
- School of Computer, Wuhan University, Wuhan, Hubei, People's Republic of China
- Juan Liu
- School of Computer, Wuhan University, Wuhan, Hubei, People's Republic of China
- Yi Xiong
- Department of Biological Sciences, Purdue University, West Lafayette, Indiana 47907, USA
- Lida Zhu
- School of Computer, Wuhan University, Wuhan, Hubei, People's Republic of China
- Xionghui Zhou
- School of Computer, Wuhan University, Wuhan, Hubei, People's Republic of China
32
Structure and mechanism of the tRNA-dependent lantibiotic dehydratase NisB. Nature 2014; 517:509-12. [PMID: 25363770 DOI: 10.1038/nature13888] [Citation(s) in RCA: 248] [Impact Index Per Article: 22.5] [Received: 03/14/2014] [Accepted: 09/23/2014] [Indexed: 01/15/2023]
Abstract
Lantibiotics are a class of peptide antibiotics that contain one or more thioether bonds. The lantibiotic nisin is an antimicrobial peptide that is widely used as a food preservative to combat food-borne pathogens. Nisin contains dehydroalanine and dehydrobutyrine residues that are formed by the dehydration of Ser/Thr by the lantibiotic dehydratase NisB (ref. 2). Recent biochemical studies revealed that NisB glutamylates Ser/Thr side chains as part of the dehydration process. However, the molecular mechanism by which NisB uses glutamate to catalyse dehydration remains unresolved. Here we show that this process involves glutamyl-tRNA(Glu) to activate Ser/Thr residues. In addition, the 2.9-Å crystal structure of NisB in complex with its substrate peptide NisA reveals the presence of two separate domains that catalyse the Ser/Thr glutamylation and glutamate elimination steps. The co-crystal structure also provides insights into substrate recognition by lantibiotic dehydratases. Our findings demonstrate an unexpected role for aminoacyl-tRNA in the formation of dehydroamino acids in lantibiotics, and serve as a basis for the functional characterization of the many lantibiotic-like dehydratases involved in the biosynthesis of other classes of natural products.
33
Yang XX, Deng ZL, Liu R. RBRDetector: Improved prediction of binding residues on RNA-binding protein structures using complementary feature- and template-based strategies. Proteins 2014; 82:2455-71. [DOI: 10.1002/prot.24610] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Received: 02/02/2014] [Revised: 04/28/2014] [Accepted: 05/09/2014] [Indexed: 11/05/2022]
Affiliation(s)
- Xiao-Xia Yang
- Agricultural Bioinformatics Key Laboratory of Hubei Province; College of Informatics; Huazhong Agricultural University; Wuhan 430070 People's Republic of China
- Zhi-Luo Deng
- Agricultural Bioinformatics Key Laboratory of Hubei Province; College of Informatics; Huazhong Agricultural University; Wuhan 430070 People's Republic of China
- Rong Liu
- Agricultural Bioinformatics Key Laboratory of Hubei Province; College of Informatics; Huazhong Agricultural University; Wuhan 430070 People's Republic of China
34
Thomas MP, Whangbo J, McCrossan G, Deutsch AJ, Martinod K, Walch M, Lieberman J. Leukocyte protease binding to nucleic acids promotes nuclear localization and cleavage of nucleic acid binding proteins. J Immunol 2014; 192:5390-7. [PMID: 24771851 DOI: 10.4049/jimmunol.1303296] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.2] [Indexed: 11/19/2022]
Abstract
Killer lymphocyte granzyme (Gzm) serine proteases induce apoptosis of pathogen-infected cells and tumor cells. Many known Gzm substrates are nucleic acid binding proteins, and the Gzms accumulate in the target cell nucleus by an unknown mechanism. In this study, we show that human Gzms bind to DNA and RNA with nanomolar affinity. Gzms cleave their substrates most efficiently when both are bound to nucleic acids. RNase treatment of cell lysates reduces Gzm cleavage of RNA binding protein targets, whereas adding RNA to recombinant RNA binding protein substrates increases in vitro cleavage. Binding to nucleic acids also influences Gzm trafficking within target cells. Preincubation with competitor DNA and DNase treatment both reduce Gzm nuclear localization. The Gzms are closely related to neutrophil proteases, including neutrophil elastase (NE) and cathepsin G. During neutrophil activation, NE translocates to the nucleus to initiate DNA extrusion into neutrophil extracellular traps, which bind NE and cathepsin G. These myeloid cell proteases, but not digestive serine proteases, also bind DNA strongly and localize to nuclei and neutrophil extracellular traps in a DNA-dependent manner. Thus, high-affinity nucleic acid binding is a conserved and functionally important property specific to leukocyte serine proteases. Furthermore, nucleic acid binding provides an elegant and simple mechanism to confer specificity of these proteases for cleavage of nucleic acid binding protein substrates that play essential roles in cellular gene expression and cell proliferation.
Affiliation(s)
- Marshall P Thomas
- Program in Cellular and Molecular Medicine, Boston Children's Hospital, Boston, MA 02115; Division of Hematology-Oncology, Boston Children's Hospital, Boston, MA 02215; and Department of Pediatrics, Harvard Medical School, Boston, MA 02115
- Jennifer Whangbo
- Program in Cellular and Molecular Medicine, Boston Children's Hospital, Boston, MA 02115; Division of Hematology-Oncology, Boston Children's Hospital, Boston, MA 02215; and Department of Pediatrics, Harvard Medical School, Boston, MA 02115
- Geoffrey McCrossan
- Program in Cellular and Molecular Medicine, Boston Children's Hospital, Boston, MA 02115; Division of Hematology-Oncology, Boston Children's Hospital, Boston, MA 02215; and Department of Pediatrics, Harvard Medical School, Boston, MA 02115
- Aaron J Deutsch
- Program in Cellular and Molecular Medicine, Boston Children's Hospital, Boston, MA 02115; Division of Hematology-Oncology, Boston Children's Hospital, Boston, MA 02215; and Department of Pediatrics, Harvard Medical School, Boston, MA 02115
- Kimberly Martinod
- Program in Cellular and Molecular Medicine, Boston Children's Hospital, Boston, MA 02115; Division of Hematology-Oncology, Boston Children's Hospital, Boston, MA 02215; and Department of Pediatrics, Harvard Medical School, Boston, MA 02115
- Michael Walch
- Program in Cellular and Molecular Medicine, Boston Children's Hospital, Boston, MA 02115; Division of Hematology-Oncology, Boston Children's Hospital, Boston, MA 02215; and Department of Pediatrics, Harvard Medical School, Boston, MA 02115
- Judy Lieberman
- Program in Cellular and Molecular Medicine, Boston Children's Hospital, Boston, MA 02115; Division of Hematology-Oncology, Boston Children's Hospital, Boston, MA 02215; and Department of Pediatrics, Harvard Medical School, Boston, MA 02115
35
Klus P, Bolognesi B, Agostini F, Marchese D, Zanzoni A, Tartaglia GG. The cleverSuite approach for protein characterization: predictions of structural properties, solubility, chaperone requirements and RNA-binding abilities. Bioinformatics 2014; 30:1601-8. [PMID: 24493033 PMCID: PMC4029037 DOI: 10.1093/bioinformatics/btu074] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.5] [Indexed: 12/11/2022]
Abstract
Motivation: The recent shift towards high-throughput screening is posing new challenges for the interpretation of experimental results. Here we propose the cleverSuite approach for large-scale characterization of protein groups. Description: The central part of the cleverSuite is the cleverMachine (CM), an algorithm that performs statistics on protein sequences by comparing their physico-chemical propensities. The second element is called cleverClassifier and builds on top of the models generated by the CM to allow classification of new datasets. Results: We applied the cleverSuite to predict secondary structure properties, solubility, chaperone requirements and RNA-binding abilities. Using cross-validation and independent datasets, the cleverSuite reproduces experimental findings with great accuracy and provides models that can be used for future investigations. Availability: The intuitive interface for dataset exploration, analysis and prediction is available at http://s.tartaglialab.com/clever_suite. Contact: gian.tartaglia@crg.es. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Petr Klus
- Gene Function and Evolution, Centre for Genomic Regulation (CRG), Dr. Aiguader 88 and Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain
- Benedetta Bolognesi
- Gene Function and Evolution, Centre for Genomic Regulation (CRG), Dr. Aiguader 88 and Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain
- Federico Agostini
- Gene Function and Evolution, Centre for Genomic Regulation (CRG), Dr. Aiguader 88 and Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain
- Domenica Marchese
- Gene Function and Evolution, Centre for Genomic Regulation (CRG), Dr. Aiguader 88 and Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain
- Andreas Zanzoni
- Gene Function and Evolution, Centre for Genomic Regulation (CRG), Dr. Aiguader 88 and Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain
- Gian Gaetano Tartaglia
- Gene Function and Evolution, Centre for Genomic Regulation (CRG), Dr. Aiguader 88 and Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain
36
Sequence based prediction of DNA-binding proteins based on hybrid feature selection using random forest and Gaussian naïve Bayes. PLoS One 2014; 9:e86703. [PMID: 24475169 PMCID: PMC3901691 DOI: 10.1371/journal.pone.0086703] [Citation(s) in RCA: 115] [Impact Index Per Article: 10.5] [Received: 09/29/2013] [Accepted: 12/10/2013] [Indexed: 11/22/2022]
Abstract
Because DNA-binding proteins play vital roles in gene regulation, an efficient method for identifying them is highly desirable and would advance our understanding of protein function. In this study, we proposed a new method for predicting DNA-binding proteins that ranks features using random forest and then performs wrapper-based feature selection with a forward best-first search strategy. The features comprise information from the primary sequence, predicted secondary structure, predicted relative solvent accessibility, and the position-specific scoring matrix. The proposed method, called DBPPred, uses Gaussian naïve Bayes as the underlying classifier because it outperformed five other classifiers: decision tree, logistic regression, k-nearest neighbor, support vector machine with a polynomial kernel, and support vector machine with a radial basis function kernel. DBPPred yields the highest average accuracy (0.791) and average MCC (0.583) in ten runs of five-fold cross-validation on the training benchmark dataset PDB594. Blind tests were then performed on the independent dataset PDB186 with the proposed model trained on the entire PDB594 dataset and with five other existing methods (iDNA-Prot, DNA-Prot, DNAbinder, DNABIND and DBD-Threader); DBPPred yielded the highest accuracy (0.769), MCC (0.538) and AUC (0.790). Independent tests of DBPPred on a large dataset consisting entirely of non-DNA-binding proteins and on two RNA-binding protein datasets also showed improved or comparable quality relative to the relevant prediction methods. Moreover, we observed that for the majority of the features selected by the proposed method, the mean feature values differ significantly between DNA-binding and non-DNA-binding proteins. All of the experimental results indicate that DBPPred can serve as an alternative predictor for the large-scale identification of DNA-binding proteins.
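Accuracy and MCC, as reported for DBPPred above, are standard confusion-matrix statistics. The sketch below computes both from true/false positive/negative counts using their textbook definitions, $\mathrm{ACC} = \frac{TP+TN}{TP+TN+FP+FN}$ and $\mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}$. The counts in the example are invented for illustration and are not DBPPred's data.

```python
import math

def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when any marginal total is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Illustrative counts only (not from the paper): 100 proteins, balanced classes
tp, tn, fp, fn = 40, 40, 10, 10
print(accuracy(tp, tn, fp, fn))  # 0.8
print(mcc(tp, tn, fp, fn))       # 0.6
```

MCC uses all four cells of the confusion matrix, so unlike accuracy it stays informative when DNA-binding and non-binding proteins are unbalanced.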
37
Zhao H, Yang Y, Zhou Y. Prediction of RNA binding proteins comes of age from low resolution to high resolution. Mol Biosyst 2013; 9:2417-25. [PMID: 23872922 PMCID: PMC3870025 DOI: 10.1039/c3mb70167k] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.6] [Indexed: 11/21/2022]
Abstract
The network of protein-RNA interactions is likely to be larger than protein-protein and protein-DNA interaction networks, because the genome encodes tens of times more RNA transcripts than proteins (e.g. only 3% of the human genome codes for proteins), RNAs have diverse functions and localizations, and they are controlled by proteins from birth (transcription) to death (degradation). This massive network is evidenced by several recent experimental discoveries of large numbers of previously unknown RNA-binding proteins (RBPs). Meanwhile, more than 400 non-redundant protein-RNA complex structures (at 25% sequence identity or less) have been deposited in the Protein Data Bank. These sequence and structural resources for RBPs provide ample data for developing computational techniques dedicated to RBP prediction, which are needed because experimentally determining RNA-binding function is time-consuming and expensive. This review compares traditional machine-learning-based approaches with emerging template-based methods at several levels of prediction resolution, ranging from two-state binding/non-binding prediction to binding-residue prediction and protein-RNA complex structure prediction. The analysis indicates that the two approaches are complementary and that combining them may lead to further improvements.
Affiliation(s)
- Huiying Zhao
- School of Informatics, Indiana University Purdue University, Indianapolis, Indiana 46202, USA.
38
Banerji A, Navare C. Fractal nature of protein surface roughness: a note on quantification of change of surface roughness in active sites, before and after binding. J Mol Recognit 2013; 26:201-14. [PMID: 23526774 DOI: 10.1002/jmr.2264] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 03/22/2012] [Revised: 01/07/2013] [Accepted: 01/11/2013] [Indexed: 11/09/2022]
Abstract
The year 2010 marked 25 years since it became known that the roughness of a protein surface has fractal symmetry. Since the publication of Lewis and Rees' paper, hundreds of studies from a spectrum of perspectives have established fractal dimension (FD) as a reliable marker that objectively describes protein surface roughness. In this article, we introduce readers to the fundamentals of fractals and present categorical biophysical and geometrical reasons why FD-based constructs can describe protein surface roughness more accurately. We then review the commonalities (and differences) among the numerous approaches that have investigated protein surfaces with fractal measures, before exploring the patterns in the results they have produced. Apart from presenting the genealogy of approaches and results, we present an analysis that quantifies the difference in surface roughness of stretches of protein surface containing the active site, before and after ligand binding, to further underline the utility of FD-based measures. We found that surface stretches containing the active site generally undergo a significant increase in roughness after binding. After presenting the entire repertoire of FD-based surface roughness studies, we discuss two as-yet-unexplored problems where FD-based techniques can help decipher underlying patterns of surface interactions. Finally, we list the limitations of FD-based constructs and several precautions that must be taken when working with them.
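Box counting is one common way to estimate a fractal dimension; the review surveys several FD constructs, and nothing here implies this is the estimator it recommends. The 2-D sketch below counts occupied grid cells N(ε) at several box sizes ε and takes the least-squares slope of log N(ε) against log(1/ε). Applying it to a protein surface would need 3-D surface points and a third grid coordinate, which is left out here for brevity.

```python
import math

def box_counting_dimension(points, box_sizes):
    """Estimate the fractal dimension of a 2-D point set by box counting.

    For each box size eps, count occupied grid cells N(eps); the dimension is the
    least-squares slope of log N(eps) against log(1/eps). Needs at least two box sizes.
    """
    xs, ys = [], []
    for eps in box_sizes:
        occupied = {(math.floor(x / eps), math.floor(y / eps)) for x, y in points}
        xs.append(math.log(1.0 / eps))
        ys.append(math.log(len(occupied)))
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(xs, ys))
             / sum((a - mx) ** 2 for a in xs))
    return slope

# A densely sampled straight segment should give a dimension close to 1
line = [(i / 10000, 0.5) for i in range(10000)]
print(round(box_counting_dimension(line, [0.1, 0.05, 0.02, 0.01]), 2))  # 1.0
```

A rougher boundary occupies more boxes as ε shrinks, which pushes the slope above the smooth-curve value; that increase is what FD-based roughness measures exploit.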
Affiliation(s)
- Anirban Banerji
- Bioinformatics Centre, University of Pune, Pune, Maharashtra, India.
39
Parisien M, Wang X, Perdrizet G, Lamphear C, Fierke CA, Maheshwari KC, Wilde MJ, Sosnick TR, Pan T. Discovering RNA-protein interactome by using chemical context profiling of the RNA-protein interface. Cell Rep 2013; 3:1703-13. [PMID: 23665222 DOI: 10.1016/j.celrep.2013.04.010] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.0] [Received: 06/12/2012] [Revised: 03/04/2013] [Accepted: 04/12/2013] [Indexed: 02/04/2023]
Abstract
RNA-protein (RNP) interactions are generally required for RNA function. At least 5% of human genes code for RNA-binding proteins. Whereas many approaches can identify the RNA partners of a specific protein, finding the protein partners of a specific RNA is difficult. We present a machine-learning method that scores a protein's potential to bind an RNA structure by using the chemical context profiles of the interface from known RNP structures. Our approach is applicable even when only a single RNP structure is available. We examined 801 mammalian proteins and found that 37 (4.6%) potentially bind transfer RNA (tRNA). Most are enzymes involved in cellular processes unrelated to translation and were not known to interact with RNA. We experimentally tested six positive and three negative predictions for tRNA binding in vivo, and all nine predictions were correct. Our computational approach provides a powerful complement to experiments in discovering new RNPs.
Affiliation(s)
- Marc Parisien
- Department of Biochemistry and Molecular Biology, University of Chicago, Chicago, IL 60637, USA
40
Cirillo D, Agostini F, Tartaglia GG. Predictions of protein-RNA interactions. Wiley Interdiscip Rev Comput Mol Sci 2012. [DOI: 10.1002/wcms.1119] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.1] [Indexed: 12/30/2022]
41
Jahandideh S, Srinivasasainagendra V, Zhi D. Comprehensive comparative analysis and identification of RNA-binding protein domains: multi-class classification and feature selection. J Theor Biol 2012; 312:65-75. [PMID: 22884576 DOI: 10.1016/j.jtbi.2012.07.013] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Received: 02/29/2012] [Revised: 07/09/2012] [Accepted: 07/13/2012] [Indexed: 01/11/2023]
Abstract
RNA-protein interactions play an important role in various cellular processes, such as protein synthesis, gene regulation, post-transcriptional gene regulation, alternative splicing, and infection by RNA viruses. In this study, we designed an automatic procedure that uses the Gene Ontology Annotation (GOA) and Structural Classification of Proteins (SCOP) databases to capture structurally solved RNA-binding protein domains in different subclasses. We then applied tuned multi-class SVM (TMCSVM), Random Forest (RF), and multi-class ℓ1/ℓq-regularized logistic regression (MCRLR) to analyze and classify RNA-binding protein domains based on a comprehensive set of sequence and structural features, and compared the prediction accuracy of these three state-of-the-art methods. In our results, TMCSVM outperforms the other methods, suggesting its potential as a useful tool for the multi-class prediction of RNA-binding protein domains. On the other hand, by elucidating how much each feature contributes to predictive accuracy for the RNA-binding protein domain subclasses, MCRLR provides some biological insights into the roles of sequence and structure in protein-RNA interactions.
Affiliation(s)
- Samad Jahandideh
- Section on Statistical Genetics, Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL, USA
- Vinodh Srinivasasainagendra
- Section on Statistical Genetics, Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL, USA
- Degui Zhi
- Section on Statistical Genetics, Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL, USA
42
Barik A, Mishra A, Bahadur RP. PRince: a web server for structural and physicochemical analysis of protein-RNA interface. Nucleic Acids Res 2012; 40:W440-4. [PMID: 22689640 PMCID: PMC3394290 DOI: 10.1093/nar/gks535] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.5] [Indexed: 01/15/2023]
Abstract
We have developed a web server, PRince, which analyzes the structural features and physicochemical properties of the protein–RNA interface. Users need to submit a PDB file containing the atomic coordinates of both the protein and the RNA molecules in complex form (in ‘.pdb’ format). They should also specify the chain identifiers of the interacting protein and RNA molecules. The size of the protein–RNA interface is estimated by measuring the solvent accessible surface area buried in contact. For a given protein–RNA complex, PRince calculates structural, physicochemical and hydration properties of the interacting surfaces. All the parameters generated by the server are presented in tabular format. The interacting surfaces can also be visualized with software plug-ins such as Jmol. In addition, output files listing the atomic coordinates of the interacting protein, RNA and interface water molecules can be downloaded. The parameters generated by PRince are novel, and users can correlate them with experimentally determined biophysical and biochemical parameters to better understand the specificity of the protein–RNA recognition process. This server will be continuously upgraded to include more parameters. PRince is publicly accessible and free to use. It is available at http://www.facweb.iitkgp.ernet.in/~rbahadur/prince/home.html.
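PRince estimates interface size from the solvent-accessible surface area (ASA) buried in contact. Under the usual definition, that is the ASA of the free protein plus that of the free RNA minus the ASA of the complex. The small function below only performs that arithmetic; the ASA values would come from a separate ASA calculation on the isolated chains and on the complex, and the numbers shown are made up. Conventions differ on whether the interface area is reported as this total or as half of it.

```python
def buried_surface_area(sasa_protein, sasa_rna, sasa_complex):
    """Total solvent-accessible surface area (in square angstroms) buried on
    complex formation: ASA of the two free partners minus ASA of the complex."""
    buried = sasa_protein + sasa_rna - sasa_complex
    if buried < 0:
        raise ValueError("complex ASA exceeds the sum of the free partners")
    return buried

# Illustrative ASA values (square angstroms), not taken from any real complex
print(buried_surface_area(9000.0, 4000.0, 11200.0))  # 1800.0
```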
Affiliation(s)
- Amita Barik
- Department of Biotechnology, Indian Institute of Technology, Kharagpur 721302, India
43
Walia RR, Caragea C, Lewis BA, Towfic F, Terribilini M, El-Manzalawy Y, Dobbs D, Honavar V. Protein-RNA interface residue prediction using machine learning: an assessment of the state of the art. BMC Bioinformatics 2012; 13:89. [PMID: 22574904 PMCID: PMC3490755 DOI: 10.1186/1471-2105-13-89] [Citation(s) in RCA: 63] [Impact Index Per Article: 4.8] [Received: 10/14/2011] [Accepted: 05/10/2012] [Indexed: 11/15/2022]
Abstract
BACKGROUND RNA molecules play diverse functional and structural roles in cells. They function as messengers for transferring genetic information from DNA to proteins, as the primary genetic material in many viruses, as catalysts (ribozymes) important for protein synthesis and RNA processing, and as essential and ubiquitous regulators of gene expression in living organisms. Many of these functions depend on precisely orchestrated interactions between RNA molecules and specific proteins in cells. Understanding the molecular mechanisms by which proteins recognize and bind RNA is essential for comprehending the functional implications of these interactions, but the recognition 'code' that mediates interactions between proteins and RNA is not yet understood. Success in deciphering this code would dramatically impact the development of new therapeutic strategies for intervening in devastating diseases such as AIDS and cancer. Because of the high cost of experimental determination of protein-RNA interfaces, there is an increasing reliance on statistical machine learning methods for training predictors of RNA-binding residues in proteins. However, because of differences in the choice of datasets, performance measures, and data representations used, it has been difficult to obtain an accurate assessment of the current state of the art in protein-RNA interface prediction.
RESULTS We provide a review of published approaches for predicting RNA-binding residues in proteins and a systematic comparison and critical assessment of protein-RNA interface residue predictors trained using these approaches on three carefully curated non-redundant datasets. We directly compare two widely used machine learning algorithms (Naïve Bayes (NB) and Support Vector Machine (SVM)) using three different data representations in which features are encoded using either sequence- or structure-based windows. Our results show that (i) sequence-based classifiers that use a position-specific scoring matrix (PSSM)-based representation (PSSMSeq) outperform those that use an amino acid identity-based representation (IDSeq) or a smoothed PSSM (SmoPSSMSeq); (ii) structure-based classifiers that use a smoothed PSSM representation (SmoPSSMStr) outperform those that use PSSM (PSSMStr) as well as a sequence identity-based representation (IDStr). PSSMSeq classifiers, when tested on an independent test set of 44 proteins, achieve performance comparable to that of three state-of-the-art structure-based predictors (including those that exploit geometric features) in terms of Matthews Correlation Coefficient (MCC), although the structure-based methods achieve substantially higher Specificity (albeit at the expense of Sensitivity) compared to sequence-based methods. We also find that the expected performance of the classifiers at the residue level can be markedly different from that at the protein level. Our experiments show that classifiers trained on three different non-redundant protein-RNA interface datasets achieve comparable cross-validation performance. However, we find that the results are significantly affected by differences in the distance threshold used to define interface residues.
CONCLUSIONS Our results demonstrate that protein-RNA interface residue predictors that use a PSSM-based encoding of sequence windows outperform classifiers that use other encodings of sequence windows. While structure-based methods that exploit geometric features can yield significant increases in the Specificity of protein-RNA interface residue predictions, such increases are offset by decreases in Sensitivity. These results underscore the importance of comparing alternative methods using rigorous statistical procedures, multiple performance measures, and datasets constructed with several alternative definitions of interface residues and redundancy cutoffs, and of including evaluations on independent test sets in the comparisons.
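The residue-level measures compared above (Sensitivity, Specificity, MCC) all derive from a binary confusion matrix. The sketch below is a minimal illustration, not code from the study; the counts in the usage line are hypothetical. Some interface-prediction papers use "Specificity" to mean TP/(TP+FP), which is precision, so the sketch reports both the textbook specificity and precision; check each paper's own definition before comparing numbers.

```python
import math


def confusion_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Residue-level metrics for a binary interface/non-interface predictor.

    Zero denominators are reported as 0.0 rather than raising.
    """
    total = tp + fp + tn + fn
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # recall on interface residues
    specificity = tn / (tn + fp) if tn + fp else 0.0  # textbook true negative rate
    precision = tp / (tp + fp) if tp + fp else 0.0    # called "Specificity" in some interface papers
    accuracy = (tp + tn) / total if total else 0.0
    # Matthews correlation coefficient: +1 perfect, 0 no better than chance, -1 inverted
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Hypothetical counts for 200 residues, not data from the study.
print(confusion_metrics(tp=40, fp=10, tn=140, fn=10))
```

MCC is often preferred for interface prediction because interface residues are a minority class, and MCC stays informative when accuracy is inflated by the many correctly rejected non-interface residues.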
Affiliation(s)
- Rasna R Walia
- Bioinformatics and Computational Biology Program, Iowa State University, USA
- Department of Computer Science, Iowa State University, USA
- Cornelia Caragea
- Center for Computational Intelligence, Learning and Discovery, Iowa State University, USA
- College of Information Sciences & Technology, The Pennsylvania State University, University Park, USA
- Benjamin A Lewis
- Bioinformatics and Computational Biology Program, Iowa State University, USA
- Department of Genetics, Development and Cell Biology, USA
- Fadi Towfic
- Center for Computational Intelligence, Learning and Discovery, Iowa State University, USA
- The Broad Institute, USA
- Yasser El-Manzalawy
- Department of Computer Science, Iowa State University, USA
- Center for Computational Intelligence, Learning and Discovery, Iowa State University, USA
- Department of Systems & Computer Engineering, Al-Azhar University, Egypt
- Drena Dobbs
- Bioinformatics and Computational Biology Program, Iowa State University, USA
- Department of Genetics, Development and Cell Biology, USA
- Vasant Honavar
- Bioinformatics and Computational Biology Program, Iowa State University, USA
- Department of Computer Science, Iowa State University, USA
- Center for Computational Intelligence, Learning and Discovery, Iowa State University, USA
44
McAlister GC, Russell JD, Rumachik NG, Hebert AS, Syka JEP, Geer LY, Westphall MS, Pagliarini DJ, Coon JJ. Analysis of the acidic proteome with negative electron-transfer dissociation mass spectrometry. Anal Chem 2012; 84:2875-82. [PMID: 22335612 PMCID: PMC3310326 DOI: 10.1021/ac203430u] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.8] [Indexed: 01/30/2023]
Abstract
We describe the first implementation of negative electron-transfer dissociation (NETD) on a hybrid ion trap-orbitrap mass spectrometer and its application to high-throughput sequencing of peptide anions. NETD, coupled with high pH separations, negative electrospray ionization (ESI), and an NETD-compatible version of OMSSA, is part of a complete workflow that includes the formation, interrogation, and sequencing of peptide anions. Together these interlocking pieces facilitated the identification of more than 2000 unique peptides from Saccharomyces cerevisiae, representing the most comprehensive analysis of peptide anions by tandem mass spectrometry to date. The same S. cerevisiae samples were interrogated using traditional, positive modes of peptide LC-MS/MS analysis (e.g., acidic LC separations, positive ESI, and collision activated dissociation), and the peptide identifications from the different workflows were compared. Due to a decreased flux of peptide anions and a tendency to produce low-charge-state precursors, the NETD-based LC-MS/MS workflow was not as sensitive as the positive mode methods. However, the use of NETD readily permits access to underrepresented acidic portions of the proteome by identifying peptides that tend to have lower pI values. As such, NETD improves sequence coverage, filling out the acidic portions of proteins that are often overlooked by the other methods.
Affiliation(s)
- Lewis Y. Geer
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bldg. 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Joshua J. Coon
- Department of Chemistry, University of Wisconsin, Madison, WI
- Department of Biomolecular Chemistry, University of Wisconsin, Madison, WI
- Genome Center of Wisconsin, University of Wisconsin, Madison, WI
45
Iwakiri J, Tateishi H, Chakraborty A, Patil P, Kenmochi N. Dissecting the protein-RNA interface: the role of protein surface shapes and RNA secondary structures in protein-RNA recognition. Nucleic Acids Res 2011; 40:3299-306. [PMID: 22199255 PMCID: PMC3333874 DOI: 10.1093/nar/gkr1225] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.4] [Indexed: 01/28/2023]
Abstract
Protein-RNA interactions are essential for many biological processes. However, the structural mechanisms underlying these interactions are not fully understood. Here, we analyzed the protein surface shape (dented, intermediate or protruded) and the RNA base-pairing properties (paired or unpaired nucleotides) at the interfaces of 91 protein-RNA complexes derived from the Protein Data Bank. Dented protein surfaces prefer unpaired nucleotides to paired ones at the interface, and hydrogen bonds frequently occur between the protein backbone and RNA bases. In contrast, protruded protein surfaces do not show such a preference; rather, electrostatic interactions initiate the formation of hydrogen bonds between positively charged amino acids and RNA phosphate groups. Interestingly, in many protein-RNA complexes that interact via an RNA loop, an aspartic acid is favored at the interface. Moreover, in most of these complexes, nucleotide bases in the RNA loop are flipped out and form hydrogen bonds with the protein, which suggests that aspartic acid is important for RNA loop recognition through a base-flipping process. This study provides fundamental insights into the role of protein surface shape and RNA secondary structure in mediating protein-RNA interactions.
Affiliation(s)
- Naoya Kenmochi
- *To whom correspondence should be addressed. Tel/Fax: +81 985 85 9084;
46
Dror I, Shazman S, Mukherjee S, Zhang Y, Glaser F, Mandel-Gutfreund Y. Predicting nucleic acid binding interfaces from structural models of proteins. Proteins 2011; 80:482-9. [PMID: 22086767 DOI: 10.1002/prot.23214] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Received: 07/12/2011] [Revised: 09/27/2011] [Accepted: 09/30/2011] [Indexed: 11/06/2022]
Abstract
The function of DNA- and RNA-binding proteins can be inferred from the characterization and accurate prediction of their binding interfaces. However, the main pitfall of various structure-based methods for predicting nucleic acid binding function is that they are all limited to the relatively small number of proteins for which high-resolution three-dimensional structures are available. In this study, we developed a pipeline for extracting functional electrostatic patches from the surfaces of protein structural models obtained using the I-TASSER protein structure predictor. The largest positive patches are extracted from the protein surface using the patchfinder algorithm. We show that functional electrostatic patches extracted from an ensemble of structural models overlap substantially with the patches extracted from high-resolution structures. Furthermore, by testing our pipeline on a set of 55 known nucleic acid binding proteins for which I-TASSER produces high-quality models, we show that the method accurately identifies the nucleic acid binding interface on structural models of proteins. Using a combined patch approach, we show that patches extracted from an ensemble of models predict the real nucleic acid binding interfaces better than patches extracted from independent models. Overall, these results suggest that combining information from a collection of low-resolution structural models could be a valuable approach for functional annotation. We suggest that our method could also be applied to predicting other functional surfaces of proteins with unknown structure.
Affiliation(s)
- Iris Dror
- Faculty of Biology, Technion - Israel Institute of Technology, Haifa, Israel 32000
47
Zhao H, Yang Y, Zhou Y. Highly accurate and high-resolution function prediction of RNA binding proteins by fold recognition and binding affinity prediction. RNA Biol 2011; 8:988-96. [PMID: 21955494 DOI: 10.4161/rna.8.6.17813] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.4] [Indexed: 01/10/2023]
Abstract
A full understanding of the mechanism of post-transcriptional regulation requires more than simple two-state prediction (binding or not binding) for RNA binding proteins. Here we report a sequence-based technique dedicated to predicting complex structures of protein and RNA by combining fold recognition with binding affinity prediction. The method not only provides highly accurate complex structure prediction (on average, 77% of residues are within 4 Å RMSD of the native structure for the independent test set) but also achieves the best-performing two-state binding or non-binding prediction, with an accuracy of 98%, precision of 84%, and Matthews correlation coefficient (MCC) of 0.62. Moreover, it predicts binding residues with an accuracy of 84%, precision of 66% and MCC of 0.51. In addition, it has a success rate of 77% in predicting RNA binding types (mRNA, tRNA or rRNA). We further demonstrate that it achieves more than 10% improvement in either precision or sensitivity over PSI-BLAST, HHPRED and our previously developed structure-based technique. This method is expected to be useful for highly accurate, genome-scale, high-resolution prediction of RNA-binding proteins and their complex structures. A web server (SPOT) is freely available for academic users at http://sparks.informatics.iupui.edu.
Affiliation(s)
- Huiying Zhao
- School of Informatics, Indiana University Purdue University, Indianapolis, IN, USA
48
Computational methods for prediction of protein-RNA interactions. J Struct Biol 2011; 179:261-8. [PMID: 22019768 DOI: 10.1016/j.jsb.2011.10.001] [Citation(s) in RCA: 83] [Impact Index Per Article: 5.9] [Received: 06/13/2011] [Revised: 09/28/2011] [Accepted: 10/04/2011] [Indexed: 12/21/2022]
Abstract
Understanding the molecular mechanism of protein-RNA recognition and complex formation is a major challenge in structural biology. Unfortunately, the experimental determination of protein-RNA complexes by X-ray crystallography and nuclear magnetic resonance spectroscopy (NMR) is tedious and difficult. Alternatively, protein-RNA interactions can be predicted by computational methods. Although less accurate than experimental observations, computational predictions can be sufficiently accurate to prompt functional hypotheses and guide experiments, e.g. to identify individual amino acid or nucleotide residues. In this article we review 10 methods for predicting protein-RNA interactions, seven of which predict RNA-binding sites from protein sequences and three from structures. We also developed a meta-predictor that uses the output of the top three sequence-based primary predictors to calculate a consensus prediction, which outperforms all the primary predictors. To cover the available software for predicting protein-RNA interactions fully, we also describe five methods for protein-RNA docking. The article highlights the strengths and shortcomings of existing methods for the prediction of protein-RNA interactions and provides suggestions for their further development.
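The consensus meta-predictor described above combines per-residue calls from three primary predictors. The abstract does not state the combination rule, so the sketch below assumes a simple per-residue majority vote; it is an illustration of consensus prediction in general, not the authors' meta-predictor, and the predictor outputs in the usage lines are hypothetical.

```python
from typing import Sequence


def majority_consensus(predictions: Sequence[Sequence[int]]) -> list[int]:
    """Per-residue majority vote over binary calls (1 = RNA-binding, 0 = not).

    Assumption: simple majority voting stands in for the meta-predictor's
    actual (unstated) combination rule.
    """
    if not predictions:
        raise ValueError("need at least one predictor")
    n_residues = len(predictions[0])
    if any(len(p) != n_residues for p in predictions):
        raise ValueError("all predictors must score the same residues")
    needed = len(predictions) / 2  # strictly more than half must agree
    return [int(sum(p[i] for p in predictions) > needed) for i in range(n_residues)]


# Three hypothetical sequence-based predictors scoring a 6-residue stretch.
calls = [
    [1, 0, 1, 1, 0, 0],
    [1, 0, 0, 1, 1, 0],
    [0, 1, 1, 1, 0, 0],
]
print(majority_consensus(calls))
```

With three predictors a residue is called RNA-binding when at least two agree. Weighted voting or averaging of predictor scores are common alternatives when the primary predictors differ in reliability.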
49
Shazman S, Elber G, Mandel-Gutfreund Y. From face to interface recognition: a differential geometric approach to distinguish DNA from RNA binding surfaces. Nucleic Acids Res 2011; 39:7390-9. [PMID: 21693557 PMCID: PMC3177183 DOI: 10.1093/nar/gkr395] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.5] [Indexed: 12/15/2022]
Abstract
Protein–nucleic acid interactions play a critical role in all steps of the gene expression pathway. Nucleic acid (NA) binding proteins interact with their partners, DNA or RNA, via distinct regions on their surface that are characterized by an ensemble of chemical, physical and geometrical properties. In this study, we introduce a novel methodology based on differential geometry, commonly used in face recognition, to characterize and predict NA binding surfaces on proteins. Applying the method to experimentally solved three-dimensional structures of proteins, we successfully distinguish double-stranded DNA (dsDNA)-binding from single-stranded RNA (ssRNA)-binding proteins with 83% accuracy. We show that the method is insensitive to conformational changes that occur upon binding and can be applied to de novo protein-function prediction. Remarkably, when concentrating on the zinc finger motif, we successfully distinguish between RNA and DNA binding interfaces possessing the same binding motif even within the same protein, as demonstrated for the RNA polymerase transcription factor TFIIIA. In conclusion, we present a novel methodology to characterize protein surfaces, which can accurately distinguish dsDNA-binding from ssRNA-binding interfaces. The strength of our method in recognizing fine-tuned differences between NA binding interfaces makes it applicable to many other molecular recognition problems, with potential implications for drug design.
Affiliation(s)
- Shula Shazman
- Department of Computer Science, Technion-Israel Institute of Technology, Haifa, Israel
50
Zeng T, Li J, Liu J. Distinct interfacial biclique patterns between ssDNA-binding proteins and those with dsDNAs. Proteins 2011; 79:598-610. [PMID: 21120860 DOI: 10.1002/prot.22908] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/07/2022]
Abstract
We introduce a new motif, called the interfacial biclique pattern, to study the differences between double-stranded DNA-binding proteins (DSBs, most of which are also known to act as transcription factors) and single-stranded DNA-binding proteins (SSBs), which have recently been found to be involved in many applications. An interfacial biclique pattern in a protein-DNA complex usually consists of a group of residues and a group of nucleotides such that every residue contacts all of the bases. The idea is based on a biological redundancy mechanism: a site mutation has little influence on the ability of the other residues to recognize the target nucleotides, and vice versa. The distribution of residues in the interfacial motifs is investigated to identify stable preferred residues, stable unpreferred residues and unstable preferred residues that distinguish SSBs from DSBs. We also examine residue co-occurrence and residue-base association rules in the interfacial motifs to uncover the different combinations of residues that SSBs and DSBs use to contact one or more bases. We found that DSBs and SSBs each place their own characteristic residues at the right positions for their binding preferences and associations with nucleotides. Some of our results are supported by previously published work.
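In graph terms, an interfacial biclique is a complete bipartite subgraph of the residue-nucleotide contact graph: every chosen residue must contact every chosen nucleotide. The minimal sketch below only checks that property for given residue and nucleotide sets; it does not search for bicliques. The contact representation (a set of (residue, nucleotide) pairs), the contact definition, and the residue labels are hypothetical assumptions, not taken from the study.

```python
from itertools import product


def is_interfacial_biclique(residues, nucleotides, contacts) -> bool:
    """True if every residue in `residues` contacts every nucleotide in `nucleotides`.

    `contacts` is a set of (residue_id, nucleotide_id) pairs; how a contact is
    defined (e.g. an atom-distance cutoff) is an assumption left to the caller.
    """
    if not residues or not nucleotides:
        return False
    return all((r, n) in contacts for r, n in product(residues, nucleotides))


# Hypothetical residue/nucleotide labels, not taken from the study.
contacts = {
    ("ARG12", "DA3"), ("ARG12", "DT4"),
    ("LYS15", "DA3"), ("LYS15", "DT4"),
    ("SER20", "DA3"),
}
print(is_interfacial_biclique({"ARG12", "LYS15"}, {"DA3", "DT4"}, contacts))
print(is_interfacial_biclique({"ARG12", "SER20"}, {"DA3", "DT4"}, contacts))
```

The second call fails because SER20 contacts only DA3. Enumerating all maximal bicliques in a contact graph is a separate, harder search problem that this check does not attempt.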
Affiliation(s)
- Tao Zeng
- School of Computer, Wuhan University, Wuhan, Hubei, China 430072