1
Zuo Y, Chen H, Yang L, Chen R, Zhang X, Deng Z. Research progress on prediction of RNA-protein binding sites in the past five years. Anal Biochem 2024; 691:115535. [PMID: 38643894 DOI: 10.1016/j.ab.2024.115535] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/21/2024] [Revised: 04/08/2024] [Accepted: 04/11/2024] [Indexed: 04/23/2024]
Abstract
Accurately predicting RNA-protein binding sites is essential for a deeper understanding of protein-RNA interactions and their regulatory mechanisms, which are fundamental to gene expression and regulation. However, conventional biological approaches to detecting these sites are often costly and time-consuming. In contrast, computational methods for predicting RNA-protein binding sites are both cost-effective and fast. This review synthesizes existing computational methods and summarizes commonly used databases for predicting RNA-protein binding sites. In addition, applications and innovations of computational methods using traditional machine learning and deep learning for RNA-protein binding site prediction during 2018-2023 are presented. These methods cover a wide range of aspects such as effective database utilization, feature selection and encoding, innovative classification algorithms, and evaluation strategies. After exploring the limitations of existing computational methods, the paper discusses potential directions for future development. DeepRKE, RDense, and DeepDW all employ convolutional neural networks and long short-term memory networks to construct prediction models, yet their algorithm design and feature encoding differ, resulting in different prediction performance.
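The abstract above names feature encoding as one of the design choices that separates CNN/LSTM-based predictors. As a minimal sketch of the simplest such encoding, the Python below one-hot encodes an RNA sequence into a fixed-length matrix. It is not taken from DeepRKE, RDense, or DeepDW, whose actual encodings are not described here; the function name `one_hot_encode`, the `ACGU` column order, and the all-zero padding rows are assumptions made for illustration.

```python
ALPHABET = "ACGU"  # assumed column order; real tools may order or extend this differently


def one_hot_encode(sequence, length=None):
    """Return one row per nucleotide, with a 1.0 in the column of the matching base.

    DNA-style 'T' is mapped to 'U'. If `length` is given, the sequence is
    truncated or padded to that length; padding and unknown bases become
    all-zero rows (an illustrative convention, not a standard).
    """
    seq = sequence.upper().replace("T", "U")
    if length is not None:
        seq = seq[:length].ljust(length, "N")
    return [[1.0 if base == ref else 0.0 for ref in ALPHABET] for base in seq]
```

A matrix like this is the usual input shape for a 1D convolution over sequence positions; richer encodings (k-mers, structure annotations, learned embeddings) replace or extend the four columns.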
Affiliation(s)
- Yun Zuo
- School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, 214000, China
- Huixian Chen
- School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, 214000, China
- Lele Yang
- School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, 214000, China
- Ruoyan Chen
- School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, 214000, China
- Xiaoyao Zhang
- School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, 214000, China
- Zhaohong Deng
- School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, 214000, China
2
Bheemireddy S, Sandhya S, Srinivasan N, Sowdhamini R. Computational tools to study RNA-protein complexes. Front Mol Biosci 2022; 9:954926. [PMID: 36275618 PMCID: PMC9585174 DOI: 10.3389/fmolb.2022.954926] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 05/27/2022] [Accepted: 09/20/2022] [Indexed: 11/19/2022] [Open Access]
Abstract
RNA is a key player in many cellular processes such as signal transduction, replication, transport, cell division, transcription, and translation. These diverse functions are accomplished through interactions of RNA with proteins. However, protein–RNA interactions are still poorly understood in contrast to protein–protein and protein–DNA interactions. This knowledge gap can be attributed to the limited availability of protein-RNA structures along with the experimental difficulties in studying these complexes. Recent progress in computational resources has expanded the number of tools available for studying protein-RNA interactions at various molecular levels. These include tools for predicting interacting residues from primary sequences, modelling protein-RNA complexes, predicting hotspots in these complexes, and gaining insight into the dynamics of their interactions. Each of these tools has its strengths and limitations, which makes it important to select an optimal approach for the question of interest. Here we present a mini review of computational tools to study different aspects of protein-RNA interactions, with a focus on overall application, development of the field, and future perspectives.
Affiliation(s)
- Sneha Bheemireddy
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India
- Sankaran Sandhya
- Department of Biotechnology, Faculty of Life and Allied Health Sciences, M.S. Ramaiah University of Applied Sciences, Bengaluru, India
- *Correspondence: Sankaran Sandhya; Ramanathan Sowdhamini
- Ramanathan Sowdhamini
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India
- National Centre for Biological Sciences, TIFR, GKVK Campus, Bangalore, India
- Institute of Bioinformatics and Applied Biotechnology, Bangalore, India
3
Pal A, Chakrabarti P, Dey S. ProDFace: A web-tool for the dissection of protein-DNA interfaces. Front Mol Biosci 2022; 9:978310. [PMID: 36148013 PMCID: PMC9486321 DOI: 10.3389/fmolb.2022.978310] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/25/2022] [Accepted: 08/09/2022] [Indexed: 11/30/2022] [Open Access]
Abstract
Protein-DNA interactions play a crucial role in gene expression and regulation. Identifying the DNA binding surface of proteins has long been a challenge; in comparison to protein-protein interactions, limited progress has been made in the development of efficient DNA binding site prediction and protein-DNA docking methods. Here we present ProDFace, a web tool that characterizes the binding region of a protein-DNA complex based on amino acid propensity, hydrogen bond (HB) donor capacity (number of solvent-accessible HB donor groups), sequence conservation at the interface core and rim region, and geometry. The program takes as input the structure of a protein-DNA complex in PDB (Protein Data Bank) format, and outputs various physicochemical and geometric parameters of the interface, as well as conservation of the interface residues in the protein component. Values are provided for the whole interface, and again after dissecting it into core and rim regions. Details of water-mediated HBs between protein and DNA, potential HB donor groups present at the binding surface of the protein, and conserved interface residues are also provided as downloadable text files. These parameters can be useful in evaluating and validating protein-DNA docking solutions, structures derived from simulation, and solutions from the available prediction tools, and can facilitate the development of more efficient prediction methods. The web-tool is freely available at structbioinfo.iitj.ac.in/resources/bioinfo/pd_interface.
Affiliation(s)
- Arumay Pal
- School of Bioengineering, Vellore Institute of Technology, Bhopal, India
- Sucharita Dey
- Department of Bioscience and Bioengineering, Indian Institute of Technology Jodhpur, Karwar, India
- *Correspondence: Sucharita Dey
4
Oncul AB, Celik Y, Unel NM, Baloglu MC. Bhlhdb: A next generation database of basic helix loop helix transcription factors based on deep learning model. J Bioinform Comput Biol 2022; 20:2250014. [DOI: 10.1142/s0219720022500147] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
5
Mei LC, Hao GF, Yang GF. Computational methods for predicting hotspots at protein-RNA interfaces. Wiley Interdisciplinary Reviews: RNA 2021; 13:e1675. [PMID: 34080311 DOI: 10.1002/wrna.1675] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 03/11/2021] [Revised: 05/13/2021] [Accepted: 05/14/2021] [Indexed: 11/10/2022]
Abstract
Protein-RNA interactions play essential roles in many critical biological events. A comprehensive understanding of the mechanisms underlying these interactions is helpful when studying cellular activities and therapeutic applications. Hotspots are a small portion of residues contributing much toward protein-RNA binding affinity. In pharmaceutical research, the hotspot residues are seen as the best option for designing small molecules to target proteins of therapeutic interest. With the accumulation of experimental data about protein-RNA interactions, computational methods have been produced for hotspot prediction on a large scale. In this review, we first present an overview of the existing databases for protein-RNA binding data. Furthermore, we outline the most adopted computational methods for hotspot prediction in protein-RNA interactions. Finally, we discuss the applications of hotspot prediction. This article is categorized under: RNA Interactions with Proteins and Other Molecules > Protein-RNA Recognition; RNA Interactions with Proteins and Other Molecules > Protein-RNA Interactions: Functional Implications; RNA Methods > RNA Analyses In Vitro and In Silico.
Affiliation(s)
- Long-Can Mei
- Key Laboratory of Pesticide and Chemical Biology, Ministry of Education, College of Chemistry, Central China Normal University, Wuhan, China; International Joint Research Center for Intelligent Biosensor Technology and Health, Central China Normal University, Wuhan, China
- Ge-Fei Hao
- Key Laboratory of Pesticide and Chemical Biology, Ministry of Education, College of Chemistry, Central China Normal University, Wuhan, China; International Joint Research Center for Intelligent Biosensor Technology and Health, Central China Normal University, Wuhan, China; State Key Laboratory Breeding Base of Green Pesticide and Agricultural Bioengineering, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Research and Development Center for Fine Chemicals, Guizhou University, Guiyang, China
- Guang-Fu Yang
- Key Laboratory of Pesticide and Chemical Biology, Ministry of Education, College of Chemistry, Central China Normal University, Wuhan, China; International Joint Research Center for Intelligent Biosensor Technology and Health, Central China Normal University, Wuhan, China; Collaborative Innovation Center of Chemical Science and Engineering, Tianjin, China
6
Li H, Sze K, Lu G, Ballester PJ. Machine‐learning scoring functions for structure‐based drug lead optimization. Wiley Interdisciplinary Reviews: Computational Molecular Science 2020. [DOI: 10.1002/wcms.1465] [Citation(s) in RCA: 53] [Impact Index Per Article: 10.6] [Indexed: 12/17/2022]
Affiliation(s)
- Hongjian Li
- CUHK‐SDU Joint Laboratory on Reproductive Genetics, School of Biomedical Sciences, Chinese University of Hong Kong, Shatin, Hong Kong
- Kam‐Heung Sze
- CUHK‐SDU Joint Laboratory on Reproductive Genetics, School of Biomedical Sciences, Chinese University of Hong Kong, Shatin, Hong Kong
- Gang Lu
- CUHK‐SDU Joint Laboratory on Reproductive Genetics, School of Biomedical Sciences, Chinese University of Hong Kong, Shatin, Hong Kong
- Pedro J. Ballester
- Cancer Research Center of Marseille (INSERM U1068, Institut Paoli‐Calmettes, Aix‐Marseille Université UM105, CNRS UMR7258), Marseille, France
7
Emamjomeh A, Choobineh D, Hajieghrari B, MahdiNezhad N, Khodavirdipour A. DNA-protein interaction: identification, prediction and data analysis. Mol Biol Rep 2019; 46:3571-3596. [PMID: 30915687 DOI: 10.1007/s11033-019-04763-1] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 11/22/2018] [Accepted: 03/14/2019] [Indexed: 12/30/2022]
Abstract
Life in living organisms depends on specific and purposeful interactions between molecules. Such interactions make the various processes inside the cells and bodies of living organisms possible. Among all the types of interactions between different molecules, DNA-protein interactions are of considerable importance. With the development of numerous experimental techniques, diverse methods are now available for recognizing and investigating such interactions. The traditional experimental techniques for identifying DNA-protein complexes are time-consuming and unsuitable for genome-scale studies; current high-throughput approaches are more efficient at determining such interactions on a large scale, but they are too costly to be practical for daily applications. Hence, the availability of much information on different biological sequences and on the conditions under which such interactions form, together with developments in computing, mathematics, and statistics, motivates scientists to develop bioinformatics tools for predicting the interaction site(s). There has been much progress in this field. In this review, the factors and conditions governing the interaction and the laboratory techniques for examining such interactions are addressed. In addition, bioinformatics tools developed for this purpose are introduced and compared and, in the end, several suggestions are offered for improving the precision of such prediction tools.
Affiliation(s)
- Abbasali Emamjomeh
- Laboratory of Computational Biotechnology and Bioinformatics (CBB), Department of Plant Breeding and Biotechnology (PBB), University of Zabol, Zabol, 98615-538, Iran
- Darush Choobineh
- Agricultural Biotechnology, Department of Plant Breeding and Biotechnology (PBB), Faculty of Agriculture, University of Zabol, Zabol, Iran
- Behzad Hajieghrari
- Department of Agricultural Biotechnology, College of Agriculture, Jahrom University, Jahrom, 74135-111, Iran
- Nafiseh MahdiNezhad
- Laboratory of Computational Biotechnology and Bioinformatics (CBB), Department of Plant Breeding and Biotechnology (PBB), University of Zabol, Zabol, 98615-538, Iran
- Amir Khodavirdipour
- Division of Human Genetics, Department of Anatomy, St. John's hospital, Bangalore, India
8
Liu L, Xiong Y, Gao H, Wei DQ, Mitchell JC, Zhu X. dbAMEPNI: a database of alanine mutagenic effects for protein-nucleic acid interactions. Database: The Journal of Biological Databases and Curation 2018; 2018:4959188. [PMID: 29688380 PMCID: PMC5887268 DOI: 10.1093/database/bay034] [Citation(s) in RCA: 31] [Impact Index Per Article: 4.4] [Received: 12/12/2017] [Accepted: 03/15/2018] [Indexed: 01/08/2023]
Abstract
Protein–nucleic acid interactions play essential roles in various biological activities such as gene regulation, transcription, DNA repair and DNA packaging. Understanding the effects of amino acid substitutions on protein–nucleic acid binding affinities can help elucidate the molecular mechanism of protein–nucleic acid recognition. Until now, no comprehensive and updated database of quantitative binding data on alanine mutagenic effects for protein–nucleic acid interactions is publicly accessible. Thus, we developed a new database of Alanine Mutagenic Effects for Protein-Nucleic Acid Interactions (dbAMEPNI). dbAMEPNI is a manually curated, literature-derived database, comprising over 577 alanine mutagenic data with experimentally determined binding affinities for protein–nucleic acid complexes. It contains several important parameters, such as dissociation constant (Kd), Gibbs free energy change (ΔΔG), experimental conditions and structural parameters of mutant residues. In addition, the database provides an extended dataset of 282 single alanine mutations with only qualitative data (or descriptive effects) of thermodynamic information. Database URL: http://zhulab.ahu.edu.cn/dbAMEPNI
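The two quantities the abstract lists, the dissociation constant (Kd) and the binding free energy change (ΔΔG), are linked by the standard thermodynamic relation ΔΔG = RT ln(Kd,mutant / Kd,wild-type). The sketch below applies that relation in Python. The function name, the default temperature of 298.15 K, and the sign convention (positive means the alanine mutation weakens binding) are assumptions for illustration; the abstract does not state which convention or temperature dbAMEPNI records use.

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3  # gas constant in kcal/(mol*K)


def ddg_from_kd(kd_wild_type, kd_mutant, temperature_k=298.15):
    """Binding free energy change in kcal/mol from two dissociation constants.

    Uses ddG = R * T * ln(Kd_mutant / Kd_wild_type). Both Kd values must be in
    the same units. With this (assumed) sign convention, a positive result
    means the mutant binds more weakly than the wild type.
    """
    if kd_wild_type <= 0 or kd_mutant <= 0:
        raise ValueError("dissociation constants must be positive")
    return R_KCAL_PER_MOL_K * temperature_k * math.log(kd_mutant / kd_wild_type)
```

Under these assumptions, a tenfold increase in Kd at room temperature corresponds to roughly 1.4 kcal/mol, which is why hotspot definitions based on ΔΔG thresholds of one to two kcal/mol map onto order-of-magnitude affinity losses.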
Affiliation(s)
- Ling Liu
- School of Life Sciences, Anhui University, Hefei, Anhui 230601, China
- Yi Xiong
- State Key Laboratory of Microbial Metabolism and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Hongyun Gao
- Information and Engineering College, Dalian University, Dalian 116622, Liaoning, China
- Dong-Qing Wei
- State Key Laboratory of Microbial Metabolism and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Julie C Mitchell
- Department of Biochemistry, University of Wisconsin-Madison, Madison, WI 53706, USA; Department of Mathematics, University of Wisconsin-Madison, Madison, WI 53706, USA; Oak Ridge National Laboratory, Biosciences Division, Oak Ridge, TN 37830, USA
- Xiaolei Zhu
- School of Life Sciences, Anhui University, Hefei, Anhui 230601, China
9
Mallick Gupta A, Mukherjee S, Dutta A, Mukhopadhyay J, Bhattacharyya D, Mandal S. Identification of a suitable promoter for the sigma factor of Mycobacterium tuberculosis. Molecular BioSystems 2017; 13:2370-2378. [PMID: 28952652 DOI: 10.1039/c7mb00317j] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/21/2022]
Abstract
Promoter binding specificity is one of the important characteristics of transcription by Mycobacterium tuberculosis (Mtb) sigma (σ) factors, which remains unexplored due to limited structural evidence. Our previous study on the structural features of Mtb-SigH, consisting of three alpha helices, and its interaction with core RNA polymerase has been extended herein to determine the little known DNA sequence recognition pattern involving its cognate promoters. Herein, high resolution X-ray crystallographic structures of the protein-DNA complexes were inspected to determine the tentative DNA-binding helix of the σ factor. The binding interface in the available crystal structures is found to be populated mainly with specific residues such as Arg, Asn, Lys, Gln, and Ser. We uncovered the helix 3 of Mtb-SigH containing most of these amino acids, which ranged from Arg 64 to Arg 75, forming the predicted active site. The complex of Mtb-SigH:DNA is modelled with 20 promoter sequences. The binding affinity is predicted by scoring these protein-DNA complexes through proximity and interaction parameters obtained by molecular dynamics simulations. The promoters are ranked considering hydrogen bonding, energy of interaction, buried surface area, and distance between centers of masses in interaction with the protein. The ranking is validated through in vitro transcription assays. The trends of these selected promoter interactions have shown variations parallel to the experimental evaluation, emphasizing the success of the active site determination along with screening of the promoter strength. The promoter interaction of Mtb-SigH can be highly beneficial for understanding the regulation of gene expression of a pathogen and also extends a solid platform to predict promoters for other bacterial σ factors.
Affiliation(s)
- A Mallick Gupta
- Department of Microbiology, University of Calcutta, 35, Ballygunge Circular Road, Kolkata, 700019, India
10
Ochoa-Montaño B, Blundell TL. XSuLT: a web server for structural annotation and representation of sequence-structure alignments. Nucleic Acids Res 2017; 45:W381-W387. [PMID: 28510698 PMCID: PMC5793734 DOI: 10.1093/nar/gkx421] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 02/21/2017] [Accepted: 05/04/2017] [Indexed: 12/16/2022] [Open Access]
Abstract
The web server XSuLT, an enhanced version of the protein alignment annotation program JoY, formats a submitted multiple-sequence alignment using three-dimensional (3D) structural information in order to assist in the comparative analysis of protein evolution and in the optimization of alignments for comparative modelling and construct design. In addition to the features analysed by JoY, which include secondary structure, solvent accessibility and sidechain hydrogen bonds, XSuLT annotates each amino acid residue with residue depth, chain and ligand interactions, inter-residue contacts, sequence entropy, root mean square deviation and secondary structure and disorder prediction. It is also now integrated with built-in 3D visualization which interacts with the formatted alignment to facilitate inspection and understanding. Results can be downloaded as stand-alone HTML for the formatted alignment and as XML with the underlying annotation data. XSuLT is freely available at http://structure.bioc.cam.ac.uk/xsult/.
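Sequence entropy is one of the per-residue annotations listed above. The abstract does not say how XSuLT computes it, so the following Python is only a sketch of the most common definition, Shannon entropy per alignment column; the function names and the choice to ignore gap characters are assumptions.

```python
import math
from collections import Counter


def column_entropy(column, ignore_gaps=True):
    """Shannon entropy (in bits) of the residues in one alignment column."""
    residues = [c for c in column if not (ignore_gaps and c == "-")]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def alignment_entropy(sequences):
    """Per-column entropy for a list of equal-length aligned sequences."""
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    return [column_entropy(col) for col in zip(*sequences)]
```

A fully conserved column scores 0 bits and a column split evenly between two residues scores 1 bit, so low values flag positions that are likely to be structurally or functionally constrained.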
Affiliation(s)
- Tom L Blundell
- Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, UK
11
Wilson KA, Wetmore SD. Combining crystallographic and quantum chemical data to understand DNA-protein π-interactions in nature. Struct Chem 2017. [DOI: 10.1007/s11224-017-0954-7] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Indexed: 12/21/2022]
12
Walia RR, El-Manzalawy Y, Honavar VG, Dobbs D. Sequence-Based Prediction of RNA-Binding Residues in Proteins. Methods Mol Biol 2017; 1484:205-235. [PMID: 27787829 PMCID: PMC5796408 DOI: 10.1007/978-1-4939-6406-2_15] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Indexed: 05/28/2023]
Abstract
Identifying individual residues in the interfaces of protein-RNA complexes is important for understanding the molecular determinants of protein-RNA recognition and has many potential applications. Recent technical advances have led to several high-throughput experimental methods for identifying partners in protein-RNA complexes, but determining RNA-binding residues in proteins is still expensive and time-consuming. This chapter focuses on available computational methods for identifying which amino acids in an RNA-binding protein participate directly in contacting RNA. Step-by-step protocols for using three different web-based servers to predict RNA-binding residues are described. In addition, currently available web servers and software tools for predicting RNA-binding sites, as well as databases that contain valuable information about known protein-RNA complexes, RNA-binding motifs in proteins, and protein-binding recognition sites in RNA are provided. We emphasize sequence-based methods that can reliably identify interfacial residues without the requirement for structural information regarding either the RNA-binding protein or its RNA partner.
Affiliation(s)
- Yasser El-Manzalawy
- College of Information Sciences and Technology, Pennsylvania State University, University Park, PA, 16802, USA
- Vasant G Honavar
- College of Information Sciences and Technology, Pennsylvania State University, University Park, PA, 16802, USA
- Drena Dobbs
- Genetics, Development and Cell Biology Department, Iowa State University, 3112 Molecular Biology Building, Ames, IA, 50011-3650, USA
13
Jubb HC, Higueruelo AP, Ochoa-Montaño B, Pitt WR, Ascher DB, Blundell TL. Arpeggio: A Web Server for Calculating and Visualising Interatomic Interactions in Protein Structures. J Mol Biol 2016; 429:365-371. [PMID: 27964945 PMCID: PMC5282402 DOI: 10.1016/j.jmb.2016.12.004] [Citation(s) in RCA: 298] [Impact Index Per Article: 33.1] [Received: 07/05/2016] [Revised: 11/07/2016] [Accepted: 12/06/2016] [Indexed: 11/30/2022]
Abstract
Interactions between proteins and their ligands, such as small molecules, other proteins, and DNA, depend on specific interatomic interactions that can be classified on the basis of atom type and distance and angle constraints. Visualisation of these interactions provides insights into the nature of molecular recognition events and has practical uses in guiding drug design and understanding the structural and functional impacts of mutations. We present Arpeggio, a web server for calculating interactions within and between proteins and protein, DNA, or small-molecule ligands, including van der Waals', ionic, carbonyl, metal, hydrophobic, and halogen bond contacts, and hydrogen bonds and specific atom–aromatic ring (cation–π, donor–π, halogen–π, and carbon–π) and aromatic ring–aromatic ring (π–π) interactions, within user-submitted macromolecule structures. PyMOL session files can be downloaded, allowing high-quality publication images of the interactions to be generated. Arpeggio is implemented in Python and available as a user-friendly web interface at http://structure.bioc.cam.ac.uk/arpeggio/ and as a downloadable package at https://bitbucket.org/harryjubb/arpeggio. Enumeration and visualisation of molecular interactions can facilitate drug development and provide insights towards understanding the consequences of mutations in genetic diseases and protein engineering. Reliable and comprehensive methods to evaluate and visualise the full range of potential molecular interactions across many atom types present in protein structures are invaluable. Arpeggio calculates all intra- and interatomic interactions in macromolecular structures, including van der Waals', ionic, carbonyl, metal, hydrophobic, and halogen bond contacts, and hydrogen bonds and specific atom–aromatic ring (cation–π, donor–π, halogen–π, and carbon–π) and aromatic ring–aromatic ring (π–π) interactions, within a provided Protein Data Bank file. 
Calculations can be within or between any combination of protein, DNA, or small organic molecules. The Arpeggio web server (http://bleoberis.bioc.cam.ac.uk/arpeggioweb/) was implemented to provide a freely available, user-friendly web interface for the exploration of molecular interactions within protein structures, including through WebGL-based visualisation of interactions and downloadable interactive PyMOL session files. Arpeggio is written in Python, requires only Open Source dependencies, and is freely available for download at https://bitbucket.org/harryjubb/arpeggio for use in custom analyses.
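The abstract describes interactions being classified "on the basis of atom type and distance and angle constraints". The Python below is a deliberately reduced sketch of that idea using distance and atom type only. It is not Arpeggio's code or API: the `Atom` class, the three contact labels, and every cutoff value are hypothetical placeholders, and the angle criteria that real hydrogen-bond and aromatic-ring detection needs are left out.

```python
import math
from dataclasses import dataclass

# Hypothetical distance cutoffs in angstroms, chosen only for illustration.
IONIC_CUTOFF = 4.0
POLAR_CUTOFF = 3.5
HYDROPHOBIC_CUTOFF = 4.5


@dataclass(frozen=True)
class Atom:
    name: str             # atom name, e.g. "NZ"
    element: str          # element symbol, e.g. "N"
    coords: tuple         # (x, y, z) in angstroms
    charge: int = 0       # formal charge assigned by the caller


def classify_contact(a, b):
    """Return a coarse contact label for two atoms, or None if no rule matches."""
    distance = math.dist(a.coords, b.coords)
    if a.charge * b.charge < 0 and distance <= IONIC_CUTOFF:
        return "ionic"
    if {a.element, b.element} <= {"N", "O"} and distance <= POLAR_CUTOFF:
        return "polar"  # hydrogen-bond candidate; angle check omitted here
    if a.element == b.element == "C" and distance <= HYDROPHOBIC_CUTOFF:
        return "hydrophobic"
    return None
```

For example, a lysine NZ atom with charge +1 placed 3 Å from a phosphate oxygen with charge -1 would be labelled "ionic" under these placeholder rules. Rule order matters: the first matching rule wins, so a charged N/O pair is reported as ionic rather than polar.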
Affiliation(s)
- Harry C Jubb
- Department of Biochemistry, Sanger Building, University of Cambridge, 80 Tennis Court Road, Cambridge CB2 1GA, UK
- Alicia P Higueruelo
- Department of Biochemistry, Sanger Building, University of Cambridge, 80 Tennis Court Road, Cambridge CB2 1GA, UK
- Bernardo Ochoa-Montaño
- Department of Biochemistry, Sanger Building, University of Cambridge, 80 Tennis Court Road, Cambridge CB2 1GA, UK
- Will R Pitt
- UCB, 208 Bath Road, Slough, West Berkshire SL1 3WE, UK
- David B Ascher
- Department of Biochemistry, Sanger Building, University of Cambridge, 80 Tennis Court Road, Cambridge CB2 1GA, UK
- Tom L Blundell
- Department of Biochemistry, Sanger Building, University of Cambridge, 80 Tennis Court Road, Cambridge CB2 1GA, UK
14
Ghosh P, Sowdhamini R. Genome-wide survey of putative RNA-binding proteins encoded in the human proteome. Molecular BioSystems 2016; 12:532-40. [PMID: 26675803 DOI: 10.1039/c5mb00638d] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Indexed: 01/11/2023]
Abstract
RNA-binding proteins (RBPs) are involved in various post-transcriptional gene regulatory processes and are also functionally important members of the ribosome and the spliceosome. However, RBPs and their interactions with RNA are less well-studied in comparison to DNA-binding proteins. We have classified the existing RBP structures, available in complexes with RNA and RNA/DNA hybrids, into different structural families and created Hidden Markov Models (HMMs). These structure-centric family HMMs, along with the sequence-centric family HMMs, were used as a primary database to systematically search the human proteome for the presence of putative RBPs. We have found more than 2600 gene products with RBP signatures in humans, of which around 28% are likely to bind to RNA but not DNA, whereas 9% might bind to both RNA and DNA. 11% of them do not contain an explicit functional annotation yet. Nearly 30% of the putative RBPs are exclusively nuclear, 15% have known disease associations and around 30% are enzymes. Around 40% of the proteins identified in this study are novel and have not been reported by recent large-scale studies on human RBPs.
Affiliation(s)
- Pritha Ghosh
- National Centre for Biological Sciences, Tata Institute of Fundamental Research, Bellary Road, Bangalore, Karnataka 560 065, India
- R Sowdhamini
- National Centre for Biological Sciences, Tata Institute of Fundamental Research, Bellary Road, Bangalore, Karnataka 560 065, India
15
Ghosh P, Mathew OK, Sowdhamini R. RStrucFam: a web server to associate structure and cognate RNA for RNA-binding proteins from sequence information. BMC Bioinformatics 2016; 17:411. [PMID: 27717309 PMCID: PMC5054549 DOI: 10.1186/s12859-016-1289-x] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 05/05/2016] [Accepted: 09/29/2016] [Indexed: 11/25/2022] [Open Access]
Abstract
Background: RNA-binding proteins (RBPs) interact with their cognate RNA(s) to form large biomolecular assemblies. They are versatile in their functionality and are involved in a myriad of processes inside the cell. RBPs with similar structural features and common biological functions are grouped together into families and superfamilies. It is useful to obtain an early understanding of, and association with, the RNA-binding properties of gene product sequences. Here, we report a web server, RStrucFam, to predict the structure, type of cognate RNA(s) and function(s) of proteins, where possible, from sequence information alone. Results: The web server employs Hidden Markov Model scan (hmmscan) to associate a query with a back-end database of structural and sequence families. The database (HMMRBP) comprises 437 HMMs of RBP families of known structure, generated using structure-based sequence alignments, and 746 sequence-centric RBP family HMMs. The input protein sequence is associated with structural or sequence domain families if structure or sequence signatures exist. When the protein is associated with a family of known structures, output features such as a multiple structure-based sequence alignment (MSSA) of the query with all other members of that family are provided. Further, cognate RNA partner(s) for that protein, Gene Ontology (GO) annotations, if any, and a homology model of the protein can be obtained. Users can also browse the database for details pertaining to each family, protein or RNA, and their related information, based on keyword search or RNA motif search. Conclusions: RStrucFam is a web server that exploits structurally conserved features of RBPs, derived from known family members and imprinted in mathematical profiles, to predict putative RBPs from sequence information. Proteins that fail to associate with such structure-centric families are further queried against the sequence-centric RBP family HMMs in the HMMRBP database. Further, all other essential information pertaining to an RBP, such as overall function annotations, is provided. The web server can be accessed at the following link: http://caps.ncbs.res.in/rstrucfam. Electronic supplementary material: The online version of this article (doi:10.1186/s12859-016-1289-x) contains supplementary material, which is available to authorized users.
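The two-tier association described in this abstract (structure-centric family HMMs first, sequence-centric HMMs as a fallback) can be sketched as follows. This is a minimal illustration, not the RStrucFam implementation: it assumes each tier has already been searched with `hmmscan --tblout`, and the function names, the sample family names and the E-value cutoff of 1e-3 are all hypothetical.

```python
from typing import List, NamedTuple, Optional, Tuple


class Hit(NamedTuple):
    family: str    # target (HMM) name, column 1 of the tblout table
    query: str     # query sequence name, column 3
    evalue: float  # full-sequence E-value, column 5
    score: float   # full-sequence bit score, column 6


def parse_tblout(text: str) -> List[Hit]:
    """Parse hmmscan --tblout text, skipping comment and blank lines."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        f = line.split()
        hits.append(Hit(f[0], f[2], float(f[4]), float(f[5])))
    return hits


def best_hit(hits: List[Hit], max_evalue: float) -> Optional[Hit]:
    """Return the lowest-E-value hit that passes the (assumed) cutoff."""
    significant = [h for h in hits if h.evalue <= max_evalue]
    return min(significant, key=lambda h: h.evalue) if significant else None


def associate(structure_tbl: str, sequence_tbl: str,
              max_evalue: float = 1e-3) -> Tuple[str, Optional[Hit]]:
    """Try structure-centric families first, then sequence-centric ones."""
    hit = best_hit(parse_tblout(structure_tbl), max_evalue)
    if hit is not None:
        return "structure-centric", hit
    hit = best_hit(parse_tblout(sequence_tbl), max_evalue)
    if hit is not None:
        return "sequence-centric", hit
    return "unassigned", None


# Made-up tblout rows: a weak structure-centric hit and a strong sequence-centric hit.
STRUCT_TBL = "# target name ...\nfam_A - query1 - 0.5 10.2 0.1 0.7 9.8 0.1 1.0 1 0 0 1 1 1 0 hypothetical"
SEQ_TBL = "fam_B - query1 - 2e-30 105.3 0.2 3e-30 104.8 0.2 1.1 1 0 0 1 1 1 1 hypothetical"

if __name__ == "__main__":
    print(associate(STRUCT_TBL, SEQ_TBL))
```

In this example the structure-centric hit fails the cutoff, so the query falls through to the sequence-centric tier, mirroring the fallback order stated in the Conclusions.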
Affiliation(s)
- Pritha Ghosh
- National Centre for Biological Sciences, Tata Institute of Fundamental Research, Bellary Road, Bangalore, Karnataka, 560 065, India
- Oommen K Mathew
- National Centre for Biological Sciences, Tata Institute of Fundamental Research, Bellary Road, Bangalore, Karnataka, 560 065, India; SASTRA University, Tirumalaisamudram, Thanjavur, 613401, Tamil Nadu, India
- Ramanathan Sowdhamini
- National Centre for Biological Sciences, Tata Institute of Fundamental Research, Bellary Road, Bangalore, Karnataka, 560 065, India
|
16
|
Li H, Leung KS, Wong MH, Ballester PJ. USR-VS: a web server for large-scale prospective virtual screening using ultrafast shape recognition techniques. Nucleic Acids Res 2016; 44:W436-41. [PMID: 27106057 PMCID: PMC4987897 DOI: 10.1093/nar/gkw320] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.4] [Received: 02/05/2016] [Accepted: 04/06/2016] [Indexed: 12/12/2022]
Abstract
Ligand-based Virtual Screening (VS) methods aim at identifying molecules with a similar activity profile across phenotypic and macromolecular targets to that of a query molecule used as search template. VS methods based on 3D similarity have the advantage of biasing this search toward active molecules with innovative chemical scaffolds, which are highly sought after in drug design to provide novel leads with improved properties over the query molecule (e.g. patentable, of lower toxicity or increased potency). Ultrafast Shape Recognition (USR) has demonstrated excellent performance in the discovery of molecules with previously unknown phenotypic or target activity, with retrospective studies suggesting that its pharmacophoric extension (USRCAT) should obtain even better hit rates once it is used prospectively. Here we present USR-VS (http://usr.marseille.inserm.fr/), the first web server using these two validated ligand-based 3D methods for large-scale prospective VS. In about 2 s, 93.9 million 3D conformers, expanded from 23.1 million purchasable molecules, are screened and the 100 most similar molecules among them in terms of 3D shape and pharmacophoric properties are shown. USR-VS functionality also provides interactive visualization of the similarity of the query molecule against the hit molecules as well as vendor information to purchase selected hits in order to be experimentally tested.
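The abstract does not spell out how shape similarity is computed, so the following is only a sketch of USR as it is commonly described in the literature: 12 descriptors built from the moments of atomic distance distributions to four reference points, compared with an inverse Manhattan-distance score. The choice of third moment (signed cube root of the third central moment), the helper names and the toy coordinates are assumptions; the USR-VS server's own code and the USRCAT pharmacophoric extension are not reproduced here.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]


def _moments(dists: List[float]) -> List[float]:
    """Mean, standard deviation and signed cube root of the third central moment."""
    n = len(dists)
    mean = sum(dists) / n
    var = sum((d - mean) ** 2 for d in dists) / n
    third = sum((d - mean) ** 3 for d in dists) / n
    return [mean, math.sqrt(var), math.copysign(abs(third) ** (1.0 / 3.0), third)]


def usr_descriptor(atoms: Sequence[Point]) -> List[float]:
    """12 shape descriptors: 3 moments for each of 4 reference points."""
    n = len(atoms)
    ctd = tuple(sum(a[i] for a in atoms) / n for i in range(3))  # centroid
    d_ctd = [math.dist(a, ctd) for a in atoms]
    cst = atoms[d_ctd.index(min(d_ctd))]  # atom closest to the centroid
    fct = atoms[d_ctd.index(max(d_ctd))]  # atom farthest from the centroid
    d_fct = [math.dist(a, fct) for a in atoms]
    ftf = atoms[d_fct.index(max(d_fct))]  # atom farthest from fct
    desc: List[float] = []
    for ref in (ctd, cst, fct, ftf):
        desc.extend(_moments([math.dist(a, ref) for a in atoms]))
    return desc


def usr_similarity(x: Sequence[float], y: Sequence[float]) -> float:
    """Score in (0, 1]; 1 means identical descriptor vectors."""
    return 1.0 / (1.0 + sum(abs(a - b) for a, b in zip(x, y)) / len(x))


def top_hits(query: Sequence[Point], library: Dict[str, Sequence[Point]],
             k: int = 100) -> List[Tuple[float, str]]:
    """Rank library conformers by similarity to the query and keep the best k."""
    q = usr_descriptor(query)
    scored = [(usr_similarity(q, usr_descriptor(c)), name) for name, c in library.items()]
    return sorted(scored, reverse=True)[:k]


if __name__ == "__main__":
    # Toy coordinates, not a real molecule.
    mol = [(0.0, 0.0, 0.0), (1.4, 0.1, 0.0), (2.1, 1.3, 0.2), (3.3, 1.2, 1.1), (0.5, -1.0, 0.7)]
    shifted = [(x + 1.0, y + 2.0, z + 3.0) for x, y, z in mol]
    print(top_hits(mol, {"shifted": shifted}, k=1))
```

Because the descriptors depend only on interatomic distances, no alignment of the two molecules is needed, which is what makes screening very large conformer libraries in seconds plausible.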
Affiliation(s)
- Hongjian Li
- Institute of Future Cities, Chinese University of Hong Kong, Hong Kong
- Kwong-S Leung
- Department of Computer Science and Engineering, Chinese University of Hong Kong, Sha Tin, New Territories, Hong Kong
- Man-H Wong
- Department of Computer Science and Engineering, Chinese University of Hong Kong, Sha Tin, New Territories, Hong Kong
- Pedro J Ballester
- Cancer Research Center of Marseille, INSERM U1068, 13009-Marseille, France
|
17
|
Kuang X, Dhroso A, Han JG, Shyu CR, Korkin D. DOMMINO 2.0: integrating structurally resolved protein-, RNA-, and DNA-mediated macromolecular interactions. Database: The Journal of Biological Databases and Curation 2016; 2016:bav114. [PMID: 26827237 PMCID: PMC4733329 DOI: 10.1093/database/bav114] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 03/01/2015] [Accepted: 11/16/2015] [Indexed: 11/14/2022]
Abstract
Macromolecular interactions are formed between proteins, DNA and RNA molecules. Being a principal building block in macromolecular assemblies and pathways, the interactions underlie most cellular functions. Malfunctioning of macromolecular interactions is also linked to a number of diseases. Structural knowledge of the macromolecular interaction allows one to understand the interaction's mechanism, determine its functional implications and characterize the effects of genetic variations, such as single nucleotide polymorphisms, on the interaction. Unfortunately, the interactions mediated by different types of macromolecules, e.g. protein-protein interactions or protein-DNA interactions, have until now been collected into individual and unrelated structural databases. This presents a significant obstacle in the analysis of macromolecular interactions. For instance, the homogeneous structural interaction databases prevent scientists from studying structural interactions of different types that occur in the same macromolecular complex. Here, we introduce DOMMINO 2.0, a structural Database Of Macro-Molecular INteractiOns. Compared to DOMMINO 1.0, a comprehensive database on protein-protein interactions, DOMMINO 2.0 includes the interactions between all three basic types of macromolecules extracted from PDB files. DOMMINO 2.0 is automatically updated on a weekly basis. It currently includes ∼1,040,000 interactions between two polypeptide subunits (e.g. domains, peptides, termini and interdomain linkers), ∼43,000 RNA-mediated interactions, and ∼12,000 DNA-mediated interactions. All protein structures in the database are annotated using SCOP and SUPERFAMILY family annotation. As a result, protein-mediated interactions involving protein domains, interdomain linkers, C- and N-termini, and peptides are identified. Our database provides an intuitive web interface, allowing one to investigate interactions at three different resolution levels: whole subunit network, binary interaction and interaction interface. Database URL: http://dommino.org.
Affiliation(s)
- Xingyan Kuang
- Informatics Institute, University of Missouri, Columbia, MO, USA
- Andi Dhroso
- Department of Computer Science and Bioinformatics and Computational Biology Program, Worcester Polytechnic Institute, Worcester, MA, USA
- Jing Ginger Han
- Informatics Institute, University of Missouri, Columbia, MO, USA
- Chi-Ren Shyu
- Informatics Institute, University of Missouri, Columbia, MO, USA; Department of Electrical and Computer Engineering, Department of Computer Science, University of Missouri, Columbia, MO, USA
- Dmitry Korkin
- Department of Computer Science and Bioinformatics and Computational Biology Program, Worcester Polytechnic Institute, Worcester, MA, USA
|
18
|
AlQuraishi M, Tang S, Xia X. An affinity-structure database of helix-turn-helix: DNA complexes with a universal coordinate system. BMC Bioinformatics 2015; 16:390. [PMID: 26586237 PMCID: PMC4653904 DOI: 10.1186/s12859-015-0819-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 03/04/2015] [Accepted: 11/11/2015] [Indexed: 11/28/2022]
Abstract
Background: Molecular interactions between proteins and DNA molecules underlie many cellular processes, including transcriptional regulation, chromosome replication, and nucleosome positioning. Computational analyses of protein-DNA interactions rely on experimental data characterizing known protein-DNA interactions structurally and biochemically. While many databases exist that contain either structural or biochemical data, few integrate these two data sources in a unified fashion. Such integration is becoming increasingly critical with the rapid growth of structural and biochemical data, and the emergence of algorithms that rely on the synthesis of multiple data types to derive computational models of molecular interactions. Description: We have developed an integrated affinity-structure database in which the experimental and quantitative DNA binding affinities of helix-turn-helix proteins are mapped onto the crystal structures of the corresponding protein-DNA complexes. This database provides access to: (i) protein-DNA structures, (ii) quantitative summaries of protein-DNA binding affinities using position weight matrices, and (iii) raw experimental data of protein-DNA binding instances. Critically, this database establishes a correspondence between experimental structural data and quantitative binding affinity data at the single basepair level. Furthermore, we present a novel alignment algorithm that structurally aligns the protein-DNA complexes in the database and creates a unified residue-level coordinate system for comparing the physico-chemical environments at the interface between complexes. Using this unified coordinate system, we compute the statistics of atomic interactions at the protein-DNA interface of helix-turn-helix proteins. We provide an interactive website for visualization, querying, and analyzing this database, and a downloadable version to facilitate programmatic analysis. Conclusions: This database will facilitate the analysis of protein-DNA interactions and the development of programmatic computational methods that capitalize on integration of structural and biochemical datasets. The database can be accessed at http://ProteinDNA.hms.harvard.edu.
Affiliation(s)
- Mohammed AlQuraishi
- Department of Systems Biology, Harvard Medical School, Boston, MA, 02115, USA; HMS Laboratory of Systems Pharmacology, Harvard Medical School, 200 Longwood Avenue, Boston, MA, 02115, USA
- Shengdong Tang
- Department of Systems Biology, Harvard Medical School, Boston, MA, 02115, USA; HMS Laboratory of Systems Pharmacology, Harvard Medical School, 200 Longwood Avenue, Boston, MA, 02115, USA
- Xide Xia
- Department of Systems Biology, Harvard Medical School, Boston, MA, 02115, USA; HMS Laboratory of Systems Pharmacology, Harvard Medical School, 200 Longwood Avenue, Boston, MA, 02115, USA
|
19
|
Ascher DB, Jubb HC, Pires DEV, Ochi T, Higueruelo A, Blundell TL. Protein-Protein Interactions: Structures and Druggability. Multifaceted Roles of Crystallography in Modern Drug Discovery 2015. [DOI: 10.1007/978-94-017-9719-1_12] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 12/30/2022]
|
20
|
Wilson KA, Wetmore SD. A Survey of DNA–Protein π–Interactions: A Comparison of Natural Occurrences and Structures, and Computationally Predicted Structures and Strengths. Challenges and Advances in Computational Chemistry and Physics 2015. [DOI: 10.1007/978-3-319-14163-3_17] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 02/03/2023]
|
21
|
Halder S, Bhattacharyya D. RNA structure and dynamics: a base pairing perspective. Progress in Biophysics and Molecular Biology 2013; 113:264-83. [PMID: 23891726 DOI: 10.1016/j.pbiomolbio.2013.07.003] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.8] [Received: 04/16/2013] [Revised: 06/25/2013] [Accepted: 07/16/2013] [Indexed: 12/12/2022]
Abstract
RNA is now known to possess various structural, regulatory and enzymatic functions for survival of cellular organisms. Functional RNA structures are generally created by three-dimensional organization of small structural motifs, formed by base pairing between self-complementary sequences from different parts of the RNA chain. In addition to the canonical Watson-Crick or wobble base pairs, several non-canonical base pairs are found to be crucial to the structural organization of RNA molecules. They appear within different structural motifs and are found to stabilize the molecule through long-range intra-molecular interactions between basic structural motifs like double helices and loops. These base pairs also impart functional variation to the minor groove of A-form RNA helices, thus forming anchoring sites for metabolites and ligands. Non-canonical base pairs are formed by edge-to-edge hydrogen bonding interactions between the bases. A large number of theoretical studies have been carried out to detect and analyze these non-canonical base pairs within crystal or NMR-derived structures of different functional RNAs. Theoretical studies of these isolated base pairs using ab initio quantum chemical methods, as well as molecular dynamics simulations of larger fragments, have also established that many of these non-canonical base pairs are as stable as the canonical Watson-Crick base pairs. This review focuses on the various structural aspects of non-canonical base pairs in the organization of RNA molecules and the possible applications of these base pairs in predicting RNA structures with more accuracy.
Affiliation(s)
- Sukanya Halder
- Biophysics Division, Saha Institute of Nuclear Physics, 1/AF, Bidhannagar, Kolkata 700 064, India
|
22
|
Schreyer AM, Blundell TL. CREDO: a structural interactomics database for drug discovery. Database: The Journal of Biological Databases and Curation 2013; 2013:bat049. [PMID: 23868908 PMCID: PMC3715132 DOI: 10.1093/database/bat049] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.3] [Indexed: 12/14/2022]
Abstract
CREDO is a unique relational database storing all pairwise atomic interactions of inter- as well as intra-molecular contacts between small molecules and macromolecules found in experimentally determined structures from the Protein Data Bank. These interactions are integrated with further chemical and biological data. The database implements useful data structures and algorithms such as cheminformatics routines to create a comprehensive analysis platform for drug discovery. The database can be accessed through a web-based interface, downloads of data sets and web services. Database URL: http://www-cryst.bioc.cam.ac.uk/credo
Affiliation(s)
- Adrian M Schreyer
- Department of Biochemistry, University of Cambridge, 80 Tennis Court Road, CB2 1GA Cambridge, UK
|
23
|
Kirsanov DD, Zanegina ON, Aksianov EA, Spirin SA, Karyagina AS, Alexeevski AV. NPIDB: Nucleic acid-Protein Interaction DataBase. Nucleic Acids Res 2012. [PMID: 23193292 PMCID: PMC3531207 DOI: 10.1093/nar/gks1199] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.5] [Indexed: 11/14/2022]
Abstract
The Nucleic acid-Protein Interaction DataBase (http://npidb.belozersky.msu.ru/) contains information derived from structures of DNA-protein and RNA-protein complexes extracted from the Protein Data Bank (3846 complexes in October 2012). It provides a web interface and a set of tools for extracting biologically meaningful characteristics of nucleoprotein complexes. The content of the database is updated weekly. The current version of the Nucleic acid-Protein Interaction DataBase is an upgrade of the version published in 2007. The improvements include a new web interface, new tools for calculation of intermolecular interactions, a classification of SCOP families that contains DNA-binding protein domains and data on conserved water molecules on the DNA-protein interface.
Affiliation(s)
- Dmitry D Kirsanov
- Department of Mathematical Methods in Biology, Belozersky Institute of Physico-Chemical Biology, Lomonosov Moscow State University, Moscow, Russia
|
24
|
Turner D, Kim R, Guo JT. TFinDit: transcription factor-DNA interaction data depository. BMC Bioinformatics 2012; 13:220. [PMID: 22943312 PMCID: PMC3483241 DOI: 10.1186/1471-2105-13-220] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 04/18/2012] [Accepted: 08/23/2012] [Indexed: 11/28/2022]
Abstract
Background: One of the crucial steps in regulation of gene expression is the binding of transcription factor(s) to specific DNA sequences. Knowledge of the binding affinity and specificity at a structural level between transcription factors and their target sites has important implications in our understanding of the mechanism of gene regulation. Due to their unique functions and binding specificity, there is a need for a transcription factor-specific, structure-based database and corresponding web service to facilitate structural bioinformatics studies of transcription factor-DNA interactions, such as development of knowledge-based interaction potential, transcription factor-DNA docking, binding induced conformational changes, and the thermodynamics of protein-DNA interactions. Description: TFinDit is a relational database and a web search tool for studying transcription factor-DNA interactions. The database contains annotated transcription factor-DNA complex structures and related data, such as unbound protein structures, thermodynamic data, and binding sequences for the corresponding transcription factors in the complex structures. TFinDit also provides a user-friendly interface and allows users to either query individual entries or generate datasets through culling the database based on one or more search criteria. Conclusions: TFinDit is a specialized structural database with annotated transcription factor-DNA complex structures and other preprocessed data. We believe that this database/web service can facilitate the development and testing of TF-DNA interaction potentials and TF-DNA docking algorithms, and the study of protein-DNA recognition mechanisms.
Affiliation(s)
- Daniel Turner
- Department of Bioinformatics and Genomics, College of Computing and Informatics, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
|
25
|
Ray SS, Halder S, Kaypee S, Bhattacharyya D. HD-RNAS: An Automated Hierarchical Database of RNA Structures. Front Genet 2012; 3:59. [PMID: 22529851 PMCID: PMC3329738 DOI: 10.3389/fgene.2012.00059] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.5] [Received: 12/26/2011] [Accepted: 03/29/2012] [Indexed: 11/13/2022]
Abstract
One of the important goals of most biological investigations is to classify and organize the experimental findings so that they are readily useful for deriving generalized rules. Although there is a huge amount of information on RNA structures in PDB, it includes redundant files, ambiguous synthetic sequences, etc. Moreover, a systematic hierarchical organization, reflecting RNA classification, is missing in PDB. In this investigation, we have classified all the available RNA structures from PDB through a programmatic approach. Hence, it is now a simple task to regularly update the classification as and when new structures are released. The classification can further determine (i) a non-redundant set of RNA structures and (ii) if available, a set of structures of identical sequence and function, which can highlight structural polymorphism, ligand-induced conformational alterations, etc. Presently, we have classified the available structures (2095 PDB entries having an RNA chain longer than nine nucleotides, solved by X-ray crystallography or NMR spectroscopy) into nine functional classes. Structures with the same function and the same source are mostly seen to be similar, with subtle differences depending on their functional complexation. The web server is available online at http://www.saha.ac.in/biop/www/HD-RNAS.html and is updated regularly.
|
26
|
Muppirala UK, Honavar VG, Dobbs D. Predicting RNA-protein interactions using only sequence information. BMC Bioinformatics 2011; 12:489. [PMID: 22192482 PMCID: PMC3322362 DOI: 10.1186/1471-2105-12-489] [Citation(s) in RCA: 380] [Impact Index Per Article: 27.1] [Received: 07/18/2011] [Accepted: 12/22/2011] [Indexed: 11/22/2022]
Abstract
Background: RNA-protein interactions (RPIs) play important roles in a wide variety of cellular processes, ranging from transcriptional and post-transcriptional regulation of gene expression to host defense against pathogens. High throughput experiments to identify RNA-protein interactions are beginning to provide valuable information about the complexity of RNA-protein interaction networks, but are expensive and time consuming. Hence, there is a need for reliable computational methods for predicting RNA-protein interactions. Results: We propose RPISeq, a family of classifiers for predicting RNA-protein interactions using only sequence information. Given the sequences of an RNA and a protein as input, RPISeq predicts whether or not the RNA-protein pair interacts. The RNA sequence is encoded as a normalized vector of its ribonucleotide 4-mer composition, and the protein sequence is encoded as a normalized vector of its 3-mer composition, based on a 7-letter reduced alphabet representation. Two variants of RPISeq are presented: RPISeq-SVM, which uses a Support Vector Machine (SVM) classifier, and RPISeq-RF, which uses a Random Forest classifier. On two non-redundant benchmark datasets extracted from the Protein-RNA Interface Database (PRIDB), RPISeq achieved an AUC (Area Under the Receiver Operating Characteristic (ROC) curve) of 0.96 and 0.92. On a third dataset containing only mRNA-protein interactions, the performance of RPISeq was competitive with that of a published method that requires information regarding many different features (e.g., mRNA half-life, GO annotations) of the putative RNA and protein partners. In addition, RPISeq classifiers trained using the PRIDB data correctly predicted the majority (57-99%) of non-coding RNA-protein interactions in NPInter-derived networks from E. coli, S. cerevisiae, D. melanogaster, M. musculus, and H. sapiens. Conclusions: Our experiments with RPISeq demonstrate that RNA-protein interactions can be reliably predicted using only sequence-derived information. RPISeq offers an inexpensive method for computational construction of RNA-protein interaction networks, and should provide useful insights into the function of non-coding RNAs. RPISeq is freely available as a web-based server at http://pridb.gdcb.iastate.edu/RPISeq/.
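The sequence encoding described in the Results (RNA 4-mer composition plus protein 3-mer composition over a 7-letter reduced alphabet) can be sketched as below. The abstract does not list the seven residue groups or the normalization, so two assumptions are made and marked in the code: the grouping is the widely used conjoint-triad classification, and each vector is normalized to relative frequencies. All function names are hypothetical and this is not the RPISeq source code.

```python
from itertools import product
from typing import List

RNA_ALPHABET = "ACGU"

# ASSUMPTION: 7-group reduced amino acid alphabet (conjoint-triad grouping);
# the abstract only says "7-letter reduced alphabet".
GROUPS = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_GROUP = {aa: str(i) for i, group in enumerate(GROUPS) for aa in group}


def kmer_composition(seq: str, alphabet: str, k: int) -> List[float]:
    """Relative frequency of every possible k-mer, in lexicographic alphabet order."""
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # windows with unknown symbols are skipped
            counts[window] += 1
    total = sum(counts.values())
    # ASSUMPTION: normalize to frequencies that sum to 1 (or all zeros if empty).
    return [counts[m] / total if total else 0.0 for m in kmers]


def encode_rna(rna: str) -> List[float]:
    """4-mer composition vector of length 4**4 = 256."""
    return kmer_composition(rna.upper().replace("T", "U"), RNA_ALPHABET, 4)


def encode_protein(protein: str) -> List[float]:
    """3-mer composition over the reduced alphabet, length 7**3 = 343."""
    reduced = "".join(AA_TO_GROUP.get(aa, "X") for aa in protein.upper())
    return kmer_composition(reduced, "0123456", 3)


def encode_pair(rna: str, protein: str) -> List[float]:
    """Concatenated 599-dimensional feature vector for one RNA-protein pair."""
    return encode_rna(rna) + encode_protein(protein)


if __name__ == "__main__":
    features = encode_pair("ACGUACGUGGCA", "MKRDEAGVILFPC")
    print(len(features))
```

A vector of this form could then be passed to any standard SVM or Random Forest implementation, which corresponds to the two classifier variants named in the abstract.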
Affiliation(s)
- Usha K Muppirala
- Bioinformatics and Computational Biology Program, Iowa State University, Ames, Iowa, USA
|
27
|
Teyra J, Samsonov SA, Schreiber S, Pisabarro MT. SCOWLP update: 3D classification of protein-protein, -peptide, -saccharide and -nucleic acid interactions, and structure-based binding inferences across folds. BMC Bioinformatics 2011; 12:398. [PMID: 21992011 PMCID: PMC3210135 DOI: 10.1186/1471-2105-12-398] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.6] [Received: 03/09/2011] [Accepted: 10/13/2011] [Indexed: 11/10/2022]
Abstract
Background: Protein interactions are essential for coordinating cellular functions. Proteomic studies have already elucidated a huge number of protein-protein interactions that require detailed functional analysis. Understanding the structural basis of each individual interaction through structure determination is necessary, yet unfeasible. Therefore, computational tools able to predict protein binding regions and recognition modes are required to rationalize putative molecular functions for proteins. With this aim, we previously created SCOWLP, a structural classification of protein binding regions at protein family level, based on the information obtained from high-resolution 3D protein-protein and protein-peptide complexes. Description: We present here a new version of SCOWLP that has been enhanced by the inclusion of protein-nucleic acid and protein-saccharide interactions. SCOWLP takes interfacial solvent into account for a detailed characterization of protein interactions. In addition, the binding regions obtained per protein family have been enriched by the inclusion of predicted binding regions, which have been inferred from structurally related proteins across all existing folds. These inferences might become very useful to suggest novel recognition regions and compare structurally similar interfaces from different families. Conclusions: The updated SCOWLP has new functionalities that allow both detection and comparison of protein regions recognizing different types of ligands, which include other proteins, peptides, nucleic acids and saccharides, within a solvated environment. Currently, SCOWLP allows the analysis of predicted protein binding regions based on structure-based inferences across fold space. These predictions may have a unique potential in assisting protein docking, in providing insights into protein interaction networks, and in guiding rational engineering of protein ligands. The newly designed SCOWLP web application has an improved user-friendly interface that facilitates its usage, and is available at http://www.scowlp.org.
Affiliation(s)
- Joan Teyra
- Structural Bioinformatics, BIOTEC TU Dresden, Tatzberg 47-51, 01037 Dresden, Germany
|
28
|
Bickerton GR, Higueruelo AP, Blundell TL. Comprehensive, atomic-level characterization of structurally characterized protein-protein interactions: the PICCOLO database. BMC Bioinformatics 2011; 12:313. [PMID: 21801404 PMCID: PMC3161047 DOI: 10.1186/1471-2105-12-313] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.5] [Received: 02/04/2011] [Accepted: 07/29/2011] [Indexed: 12/04/2022]
Abstract
Background: Structural studies are increasingly providing huge amounts of information on multi-protein assemblies. Although a complete understanding of cellular processes will be dependent on an explicit characterization of the intermolecular interactions that underlie these assemblies and mediate molecular recognition, these are not well described by standard representations. Results: Here we present PICCOLO, a comprehensive relational database capturing the details of structurally characterized protein-protein interactions. Interactions are described at the level of interacting pairs of atoms, residues and polypeptide chains, with the physico-chemical nature of the interactions being characterized. Distance and angle terms are used to distinguish 12 different interaction types, including van der Waals contacts, hydrogen bonds and hydrophobic contacts. The explicit aim of PICCOLO is to underpin large-scale analyses of the properties of protein-protein interfaces. This is exemplified by an analysis of residue propensity and interface contact preferences derived from a much larger data set than previously reported. However, PICCOLO also supports detailed inspection of particular systems of interest. Conclusions: The current PICCOLO database comprises more than 260 million interacting atom pairs from 38,202 protein complexes. A web interface for the database is available at http://www-cryst.bioc.cam.ac.uk/piccolo.
Affiliation(s)
- George R Bickerton
- Department of Biochemistry, University of Cambridge, Cambridge, CB2 1GA, UK
|
29
|
Li Y, Wang C. Rapid evaluation of the binding energies between peptide amide and DNA base. J Comput Chem 2011; 32:2765-73. [DOI: 10.1002/jcc.21856] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 01/26/2011] [Revised: 04/03/2011] [Accepted: 05/13/2011] [Indexed: 01/22/2023]
Affiliation(s)
- Yang Li
- School of Chemistry and Chemical Engineering, Liaoning Normal University, Dalian 116029, People's Republic of China
- Chang-Sheng Wang
- School of Chemistry and Chemical Engineering, Liaoning Normal University, Dalian 116029, People's Republic of China
|
30
|
Gong S, Worth CL, Cheng TMK, Blundell TL. Meet Me Halfway: When Genomics Meets Structural Bioinformatics. J Cardiovasc Transl Res 2011; 4:281-303. [DOI: 10.1007/s12265-011-9259-1] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 11/30/2010] [Accepted: 02/08/2011] [Indexed: 01/08/2023]
|
31
|
Abstract
Molecular shape complementarity is widely recognized as a key indicator of biological activity. Unfortunately, efficient computation of shape similarity is challenging, which severely limits the potential of shape-based virtual screening. Ultrafast shape recognition (USR) is a recent shape similarity technique that is characterized by its extremely high speed of operation. Here we review important methodological aspects for the optimal application of USR as well as its first applications to medicinal chemistry problems. These applications already include several particularly successful prospective virtual screens, which shows the important role that USR can play in identifying bioactive molecules to be used as chemical probes and potentially as starting points for the drug-discovery process.
Affiliation(s)
- Pedro J Ballester
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
|
32
|
Lewis BA, Walia RR, Terribilini M, Ferguson J, Zheng C, Honavar V, Dobbs D. PRIDB: a Protein-RNA interface database. Nucleic Acids Res 2011; 39:D277-82. [PMID: 21071426 PMCID: PMC3013700 DOI: 10.1093/nar/gkq1108] [Citation(s) in RCA: 103] [Impact Index Per Article: 7.4] [Received: 08/15/2010] [Revised: 10/15/2010] [Accepted: 10/18/2010] [Indexed: 11/25/2022]
Abstract
The Protein-RNA Interface Database (PRIDB) is a comprehensive database of protein-RNA interfaces extracted from complexes in the Protein Data Bank (PDB). It is designed to facilitate detailed analyses of individual protein-RNA complexes and their interfaces, in addition to automated generation of user-defined data sets of protein-RNA interfaces for statistical analyses and machine learning applications. For any chosen PDB complex or list of complexes, PRIDB rapidly displays interfacial amino acids and ribonucleotides within the primary sequences of the interacting protein and RNA chains. PRIDB also identifies ProSite motifs in protein chains and FR3D motifs in RNA chains and provides links to these external databases, as well as to structure files in the PDB. An integrated JMol applet is provided for visualization of interacting atoms and residues in the context of the 3D complex structures. The current version of PRIDB contains structural information regarding 926 protein-RNA complexes available in the PDB (as of 10 October 2010). Atomic- and residue-level contact information for the entire data set can be downloaded in a simple machine-readable format. Also, several non-redundant benchmark data sets of protein-RNA complexes are provided. The PRIDB database is freely available online at http://bindr.gdcb.iastate.edu/PRIDB.
Affiliation(s)
- Benjamin A Lewis
- Bioinformatics and Computational Biology Program, Iowa State University, Iowa, USA.
|
33
|
Churchill CDM, Rutledge LR, Wetmore SD. Effects of the biological backbone on stacking interactions at DNA-protein interfaces: the interplay between the backbone···π and π···π components. Phys Chem Chem Phys 2010; 12:14515-26. [PMID: 20927465 DOI: 10.1039/c0cp00550a] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.5] [Indexed: 11/21/2022]
Abstract
The (gas-phase) MP2/6-31G*(0.25) π···π stacking interactions between the five natural bases and the aromatic amino acids calculated using (truncated) monomers composed of conjugated rings and/or (extended) monomers containing the biological backbone (either the protein backbone or deoxyribose sugar) were previously compared. Although preliminary energetic results indicated that the protein backbone strengthens, while the deoxyribose sugar either strengthens or weakens, the interaction calculated using truncated models, the reasons for these effects were unknown. The present work explains these observations by dissecting the interaction energy of the extended complexes into individual backbone···π and π···π components. Our calculations reveal that the total interaction energy of the extended complex can be predicted as a sum of the backbone···π and π···π components, which indicates that the biological backbone does not significantly affect the ring system through π-polarization. Instead, we find that the backbone can indirectly affect the magnitude of the π···π contribution by changing the relative ring orientations in extended dimers compared with truncated dimers. Furthermore, the strengths of the individual backbone···π contributions are determined to be significant (up to 18 kJ mol⁻¹). Therefore, the origin of the energetic change upon model extension is found to result from a balance between an additional (attractive) backbone···π component and differences in the strength of the π···π interaction. In addition, to understand the effects of the biological backbone on the stacking interactions at DNA-protein interfaces in nature, we analyzed the stacking interactions found in select DNA-protein crystal structures, and verified that an additive approach can be used to examine the strength of these interactions in biological complexes. Interestingly, although the presence of attractive backbone···π contacts is qualitatively confirmed using the quantum theory of atoms in molecules (QTAIM), QTAIM electron density analysis is unable to quantitatively predict the additive relationship of these interactions. Most importantly, this work reveals that both the backbone···π and π···π components must be carefully considered to accurately determine the overall stability of DNA-protein assemblies.
Affiliation(s)
- Cassandra D M Churchill
- Department of Chemistry and Biochemistry, University of Lethbridge, 4401 University Drive, Lethbridge, Alberta, Canada T1K 3M4
|
34
|
Wren JD, Kupfer DM, Perkins EJ, Bridges S, Berleant D. Proceedings of the 2010 MidSouth Computational Biology and Bioinformatics Society (MCBIOS) Conference. BMC Bioinformatics 2010; 11 Suppl 6:S1. [PMID: 20946592 PMCID: PMC3026356 DOI: 10.1186/1471-2105-11-s6-s1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022] Open
|
35
|
Norambuena T, Melo F. The Protein-DNA Interface database. BMC Bioinformatics 2010; 11:262. [PMID: 20482798 PMCID: PMC2885377 DOI: 10.1186/1471-2105-11-262] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.1] [Received: 03/05/2010] [Accepted: 05/18/2010] [Indexed: 12/12/2022] Open
Abstract
The Protein-DNA Interface database (PDIdb) is a repository containing relevant structural information of Protein-DNA complexes solved by X-ray crystallography and available at the Protein Data Bank. The database includes a simple functional classification of the protein-DNA complexes that consists of three hierarchical levels: Class, Type and Subtype. This classification has been defined and manually curated by humans based on the information gathered from several sources that include PDB, PubMed, CATH, SCOP and COPS. The current version of the database contains only structures with resolution of 2.5 Å or higher, accounting for a total of 922 entries. The major aim of this database is to contribute to the understanding of the main rules that underlie the molecular recognition process between DNA and proteins. To this end, the database is focused on each specific atomic interface rather than on the separated binding partners. Therefore, each entry in this database consists of a single and independent protein-DNA interface. We hope that PDIdb will be useful to many researchers working in fields such as the prediction of transcription factor binding sites in DNA, the study of specificity determinants that mediate enzyme recognition events, engineering and design of new DNA binding proteins with distinct binding specificity and affinity, among others. Finally, due to its friendly and easy-to-use web interface, we hope that PDIdb will also serve educational and teaching purposes.
Affiliation(s)
- Tomás Norambuena
- Departamento de Genética Molecular y Microbiología, Facultad de Ciencias Biológicas, Pontificia Universidad Católica de Chile, Santiago, Chile
|
36
|
Contreras-Moreira B. 3D-footprint: a database for the structural analysis of protein-DNA complexes. Nucleic Acids Res 2009; 38:D91-7. [PMID: 19767616 PMCID: PMC2808867 DOI: 10.1093/nar/gkp781] [Citation(s) in RCA: 56] [Impact Index Per Article: 3.5] [Indexed: 11/24/2022] Open
Abstract
3D-footprint is a living database, updated and curated on a weekly basis, which provides estimates of binding specificity for all protein–DNA complexes available at the Protein Data Bank. The web interface allows the user to: (i) browse DNA-binding proteins by keyword; (ii) find proteins that recognize a similar DNA motif and (iii) BLAST similar DNA-binding proteins, highlighting interface residues in the resulting alignments. Each complex in the database is dissected to draw interface graphs and footprint logos, and two complementary algorithms are employed to characterize binding specificity. Moreover, oligonucleotide sequences extracted from literature abstracts are reported in order to show the range of variant sites bound by each protein and other related proteins. Benchmark experiments, including comparisons with expert-curated databases RegulonDB and TRANSFAC, support the quality of structure-based estimates of specificity. The relevant content of the database is available for download as flat files and it is also possible to use the 3D-footprint pipeline to analyze protein coordinates input by the user. 3D-footprint is available at http://floresta.eead.csic.es/3dfootprint with demo buttons and a comprehensive tutorial that illustrates the main uses of this resource.
Affiliation(s)
- Bruno Contreras-Moreira
- Estación Experimental de Aula Dei, Consejo Superior de Investigaciones Científicas, Fundación ARAID, Paseo María Agustín 36, Zaragoza, Spain and Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, Mexico
- *To whom correspondence should be addressed. Tel: +34 976716089;
|
37
|
Structural and functional restraints in the evolution of protein families and superfamilies. Biochem Soc Trans 2009; 37:727-33. [DOI: 10.1042/bst0370727] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.3] [Indexed: 11/17/2022]
Abstract
Divergent evolution of proteins reflects both selectively advantageous and neutral amino acid substitutions. In the present article, we examine restraints on sequence, which arise from selectively advantageous roles for structure and function and which lead to the conservation of local sequences and structures in families and superfamilies. We analyse structurally aligned members of protein families and superfamilies in order to investigate the importance of the local structural environment of amino acid residues in the acceptance of amino acid substitutions during protein evolution. We show that solvent accessibility is the most important determinant, followed by the existence of hydrogen bonds from the side-chain to main-chain functions and the nature of the element of secondary structure to which the amino acid contributes. Polar side chains whose hydrogen-bonding potential is satisfied tend to be more conserved than their unsatisfied or non-hydrogen-bonded counterparts, and buried and satisfied polar residues tend to be significantly more conserved than buried hydrophobic residues. Finally, we discuss the importance of functional restraints in the form of interactions of proteins with other macromolecules in assemblies or with substrates, ligands or allosteric regulators. We show that residues involved in such functional interactions are significantly more conserved and have differing amino acid substitution patterns.
|
38
|
Lee S, Brown A, Pitt WR, Higueruelo AP, Gong S, Bickerton GR, Schreyer A, Tanramluk D, Baylay A, Blundell TL. Structural interactomics: informatics approaches to aid the interpretation of genetic variation and the development of novel therapeutics. Mol Biosyst 2009; 5:1456-72. [DOI: 10.1039/b906402h] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 01/08/2023]
|