1
Nagar N, Tubiana J, Loewenthal G, Wolfson HJ, Ben Tal N, Pupko T. EvoRator2: Predicting Site-specific Amino Acid Substitutions Based on Protein Structural Information Using Deep Learning. J Mol Biol 2023; 435:168155. [PMID: 37356902 DOI: 10.1016/j.jmb.2023.168155] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/09/2023] [Revised: 05/13/2023] [Accepted: 05/17/2023] [Indexed: 06/27/2023]
Abstract
Multiple sequence alignments (MSAs) are the workhorse of molecular evolution and structural biology research. From MSAs, the amino acids that are tolerated at each site during protein evolution can be inferred. However, little is known regarding the repertoire of tolerated amino acids in proteins when only a few or no sequence homologs are available, such as orphan and de novo designed proteins. Here we present EvoRator2, a deep-learning algorithm trained on over 15,000 protein structures that can predict which amino acids are tolerated at any given site, based exclusively on protein structural information mined from atomic coordinate files. We show that EvoRator2 obtained satisfying results for the prediction of position-specific scoring matrices (PSSMs). We further show that EvoRator2 obtained near state-of-the-art performance on proteins with high-quality structures in predicting the effects of mutations in deep mutational scanning (DMS) experiments and that, for certain DMS targets, EvoRator2 outperformed state-of-the-art methods. We also show that combining EvoRator2's predictions with those obtained by a state-of-the-art deep-learning method that accounts for the information in the MSA improved the prediction of mutation effects in DMS experiments in terms of both accuracy and stability. EvoRator2 is designed to predict which amino-acid substitutions are tolerated in proteins without many homologous sequences, including orphan or de novo designed proteins. We implemented our approach in the EvoRator web server (https://evorator.tau.ac.il).
Affiliation(s)
- Natan Nagar
  - The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Jérôme Tubiana
  - Blavatnik School of Computer Science, Raymond & Beverly Sackler Faculty of Exact Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Gil Loewenthal
  - The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Haim J Wolfson
  - Blavatnik School of Computer Science, Raymond & Beverly Sackler Faculty of Exact Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Nir Ben Tal
  - School of Neurobiology, Biochemistry & Biophysics, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Tal Pupko
  - The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel

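The entry above refers to PSSMs, which are conventionally derived from an MSA. As a rough illustration of that conventional calculation (not of EvoRator2 itself, which predicts from structure), the sketch below builds a log-odds PSSM from a toy alignment. The alignment, the uniform background frequency, the pseudocount, and the function name `build_pssm` are all assumptions chosen for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(msa, pseudocount=1.0):
    """Return one dict per alignment column mapping residue -> log2-odds score.

    Simplifying assumptions: uniform background frequency (1/20),
    gaps and non-standard characters are ignored, no sequence weighting.
    """
    background = 1.0 / len(AMINO_ACIDS)
    length = len(msa[0])
    if any(len(seq) != length for seq in msa):
        raise ValueError("all aligned sequences must have the same length")

    pssm = []
    for col in range(length):
        # Count standard residues observed in this column.
        counts = Counter(seq[col] for seq in msa if seq[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        column = {}
        for aa in AMINO_ACIDS:
            freq = (counts[aa] + pseudocount) / total
            column[aa] = math.log2(freq / background)
        pssm.append(column)
    return pssm


# Hypothetical toy alignment of four sequences, three columns.
toy_msa = ["MKV", "MRV", "MKI", "MK-"]
pssm = build_pssm(toy_msa)
best_per_site = [max(col, key=col.get) for col in pssm]
print(best_per_site)
```

Positive scores mark residues seen more often than the background would suggest, and negative scores mark residues that are rare or absent at that site. Real pipelines usually add sequence weighting and an empirical background distribution, both omitted here.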
2
Liu N, Yang Z, Liu Y, Dang X, Zhang Q, Wang J, Liu X, Zhang J, Pan X. Identification of a Putative SARS-CoV-2 Main Protease Inhibitor through In Silico Screening of Self-Designed Molecular Library. Int J Mol Sci 2023; 24:11390. [PMID: 37511149 PMCID: PMC10379331 DOI: 10.3390/ijms241411390] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 05/24/2023] [Revised: 07/06/2023] [Accepted: 07/11/2023] [Indexed: 07/30/2023]
Abstract
There have been outbreaks of SARS-CoV-2 around the world for over three years, and its variants continue to evolve. This has become a major global health threat. The main protease (Mpro, also called 3CLpro) plays a key role in viral replication and proliferation, making it an attractive drug target. Here, we have identified a novel potential inhibitor of Mpro by virtually screening hundreds of nilotinib-like compounds that we designed and synthesized. The screened compounds were assessed using SP docking, XP docking, MM-GBSA analysis, IFD docking, MD simulation, ADME/T prediction, and finally an in vitro enzymatic assay. We identified the compound V291 as a potential SARS-CoV-2 Mpro inhibitor, with a high docking affinity and enzyme inhibitory activity. Moreover, the docking results indicate that His41 is a favorable amino acid for pi-pi interactions, while Glu166 can participate in salt-bridge formation with the protonated primary or secondary amines in the screened molecules. Thus, the compounds reported here are capable of engaging the key amino acids His41 and Glu166 in ligand-receptor interactions. A pharmacophore analysis further supports this assertion.
Affiliation(s)
- Nanxin Liu
  - School of Pharmacy, Health Science Center, Xi'an Jiaotong University, Xi'an 710061, China
- Zeyu Yang
  - School of Pharmacy, Health Science Center, Xi'an Jiaotong University, Xi'an 710061, China
- Yuying Liu
  - School of Pharmacy, Health Science Center, Xi'an Jiaotong University, Xi'an 710061, China
- Xintao Dang
  - School of Pharmacy, Health Science Center, Xi'an Jiaotong University, Xi'an 710061, China
- Qingqing Zhang
  - School of Pharmacy, Health Science Center, Xi'an Jiaotong University, Xi'an 710061, China
- Jin Wang
  - School of Pharmacy, Health Science Center, Xi'an Jiaotong University, Xi'an 710061, China
- Xueying Liu
  - School of Pharmacy, The Fourth Military Medical University, Xi'an 710032, China
- Jie Zhang
  - School of Pharmacy, Health Science Center, Xi'an Jiaotong University, Xi'an 710061, China
- Xiaoyan Pan
  - School of Pharmacy, Health Science Center, Xi'an Jiaotong University, Xi'an 710061, China

3
Sarkar T, Chen Y, Wang Y, Chen Y, Chen F, Reaux CR, Moore LE, Raghavan V, Xu W. Introducing mirror-image discrimination capability to the TSR-based method for capturing stereo geometry and understanding hierarchical structure relationships of protein receptor family. Comput Biol Chem 2023; 103:107824. [PMID: 36753783 PMCID: PMC9992349 DOI: 10.1016/j.compbiolchem.2023.107824] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/09/2022] [Revised: 01/17/2023] [Accepted: 01/30/2023] [Indexed: 02/05/2023]
Abstract
We have developed a Triangular Spatial Relationship (TSR)-based computational method for protein structure comparison and motif discovery that is both sequence and structure alignment-free. A protein 3D structure is modeled by all possible triangles that are constructed with every three Cα atoms of amino acids as vertices. Every triangle is represented using an integer (a key). The keys are calculated by a rule-based formula which is a function of a representative length, a representative angle, and the vertex labels associated with amino acids. A 3D structure is thereby represented by a vector of integers (TSR keys). Global or local structure comparisons are achieved by computing all keys or a set of keys, respectively. Many enzymatic reactions and notable marketed drugs are highly stereospecific. Thus, in this paper, we propose a modified key calculation formula that includes a mechanism for discriminating mirror-image keys to capture stereo geometry. We assign a positive or a negative sign to the integers representing mirror-image keys. Applying the new key calculation function provides the ability to further discriminate mirror-image keys that were previously considered identical. As a result, applying the mirror-image discrimination capability (i) significantly increases the number of distinct keys; (ii) decreases the number of common keys; (iii) decreases structural similarity; and (iv) increases the opportunity to identify specific keys for each type of receptor. The specific keys identified in this study, both without and with mirror-image discrimination, can be considered structure signatures that belong exclusively to a certain type of receptor. Applying mirror-image discrimination introduces stereospecificity to keys, allowing more precise modeling of ligand-target interactions.
The development of mirror-image TSR keys of Cα atoms, in conjunction with the integration of Cα TSR keys with all-atom TSR keys for amino acids and drugs, will lead to a new and promising computational method for aiding drug design and discovery.
Affiliation(s)
- Titli Sarkar
  - Department of Chemistry, University of Louisiana at Lafayette, P.O. Box 44370, Lafayette, LA 70504, USA; The Center for Advanced Computer Studies, University of Louisiana at Lafayette, Lafayette, LA 70504, USA
- Yuwu Chen
  - San Diego Supercomputer Center, University of California San Diego, Gilman Drive, La Jolla, CA 92093, USA
- Yu Wang
  - Department of Chemistry, University of Louisiana at Lafayette, P.O. Box 44370, Lafayette, LA 70504, USA
- Yixin Chen
  - Department of Computer and Information Science, The University of Mississippi, MS 38677, USA
- Feng Chen
  - High Performance Computing, Frey Computing Services Center, Louisiana State University, Baton Rouge, LA 70803, USA
- Camille R Reaux
  - Department of Chemistry, University of Louisiana at Lafayette, P.O. Box 44370, Lafayette, LA 70504, USA
- Laura E Moore
  - Department of Chemistry, University of Louisiana at Lafayette, P.O. Box 44370, Lafayette, LA 70504, USA
- Vijay Raghavan
  - The Center for Advanced Computer Studies, University of Louisiana at Lafayette, Lafayette, LA 70504, USA
- Wu Xu
  - Department of Chemistry, University of Louisiana at Lafayette, P.O. Box 44370, Lafayette, LA 70504, USA

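The entry above describes modeling a structure by every triangle of three Cα atoms. The sketch below illustrates only that enumeration step on hypothetical coordinates. The abstract does not give the key formula or the mirror-image sign rule, so neither is attempted here; the two per-triangle quantities computed (longest edge and largest angle) are stand-ins chosen for illustration and may differ from the paper's "representative length" and "representative angle".

```python
import math
from itertools import combinations


def triangle_features(ca_coords):
    """Yield (indices, longest_edge, largest_angle_deg) for every Cα triple.

    Assumes the three points of each triple are distinct.
    """
    for (i, a), (j, b), (k, c) in combinations(enumerate(ca_coords), 3):
        ab, bc, ca = math.dist(a, b), math.dist(b, c), math.dist(c, a)
        longest = max(ab, bc, ca)
        short1, short2 = sorted((ab, bc, ca))[:2]
        # Law of cosines: the angle opposite the longest edge is the largest.
        cos_angle = (short1**2 + short2**2 - longest**2) / (2 * short1 * short2)
        cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding
        yield (i, j, k), longest, math.degrees(math.acos(cos_angle))


# Hypothetical Cα coordinates (in ångströms) for a four-residue fragment.
toy_ca = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0), (0.0, 0.0, 5.0)]
features = list(triangle_features(toy_ca))
print(len(features), features[0])
```

A chain of n residues yields n(n-1)(n-2)/6 triangles, so the number of keys grows cubically with chain length.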
4
In Search of a Dynamical Vocabulary: A Pipeline to Construct a Basis of Shared Traits in Large-Scale Motions of Proteins. APPLIED SCIENCES-BASEL 2022. [DOI: 10.3390/app12147157] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 02/04/2023]
Abstract
The paradigmatic sequence–structure–dynamics–function relation in proteins is currently well established in the scientific community; in particular, a large effort has been made to probe the first connection, indeed providing convincing evidence of its strength and rationalizing it in a quantitative and general framework. In contrast, however, the role of dynamics as a link between structure and function has eluded a similarly clear-cut verification and description. In this work, we propose a pipeline aimed at building a basis for the quantitative characterization of the large-scale dynamics of a set of proteins, starting from the sole knowledge of their native structures. The method hinges on a dynamics-based clusterization, which allows a straightforward comparison with structural and functional protein classifications. The resulting basis set, obtained through the application to a group of related proteins, is shown to reproduce the salient large-scale dynamical features of the dataset. Most interestingly, the basis set is shown to encode the fluctuation patterns of homologous proteins not belonging to the initial dataset, thus highlighting the general applicability of the pipeline used to build it.
5
Jukič M, Janežič D, Bren U. Potential Novel Thioether-Amide or Guanidine-Linker Class of SARS-CoV-2 Virus RNA-Dependent RNA Polymerase Inhibitors Identified by High-Throughput Virtual Screening Coupled to Free-Energy Calculations. Int J Mol Sci 2021; 22:11143. [PMID: 34681802 PMCID: PMC8540652 DOI: 10.3390/ijms222011143] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 09/21/2021] [Revised: 10/10/2021] [Accepted: 10/13/2021] [Indexed: 01/18/2023]
Abstract
SARS-CoV-2, or severe acute respiratory syndrome coronavirus 2, represents a new pathogen from the family of Coronaviridae that caused a global pandemic of COVID-19 disease. In the absence of effective antiviral drugs, research of novel therapeutic targets such as SARS-CoV-2 RNA-dependent RNA polymerase (RdRp) becomes essential. This viral protein is without a human counterpart and thus represents a unique prospective drug target. However, in vitro biological evaluation testing on RdRp remains difficult and is not widely available. Therefore, we prepared a database of commercial small-molecule compounds and performed an in silico high-throughput virtual screening on the active site of the SARS-CoV-2 RdRp using ensemble docking. We identified a novel thioether-amide or guanidine-linker class of potential RdRp inhibitors and calculated favorable binding free energies of representative hits by molecular dynamics simulations coupled with Linear Interaction Energy calculations. This innovative procedure maximized the respective phase-space sampling and yielded non-covalent inhibitors representing small optimizable molecules that are synthetically readily accessible, commercially available as well as suitable for further biological evaluation and mode of action studies.
Affiliation(s)
- Marko Jukič
  - Laboratory of Physical Chemistry and Chemical Thermodynamics, Faculty of Chemistry and Chemical Engineering, University of Maribor, Smetanova 17, SI-2000 Maribor, Slovenia
  - Faculty of Mathematics, Natural Sciences and Information Technologies, University of Primorska, Glagoljaška 8, SI-6000 Koper, Slovenia
- Dušanka Janežič
  - Faculty of Mathematics, Natural Sciences and Information Technologies, University of Primorska, Glagoljaška 8, SI-6000 Koper, Slovenia
- Urban Bren
  - Laboratory of Physical Chemistry and Chemical Thermodynamics, Faculty of Chemistry and Chemical Engineering, University of Maribor, Smetanova 17, SI-2000 Maribor, Slovenia
  - Faculty of Mathematics, Natural Sciences and Information Technologies, University of Primorska, Glagoljaška 8, SI-6000 Koper, Slovenia

6
Zhu M, Song X, Chen P, Wang W, Wang B. dbHDPLS: A database of human disease-related protein-ligand structures. Comput Biol Chem 2019; 78:353-358. [PMID: 30665056 DOI: 10.1016/j.compbiolchem.2018.12.023] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 10/28/2018] [Revised: 12/11/2018] [Accepted: 12/30/2018] [Indexed: 12/31/2022]
Abstract
Protein-ligand complexes perform specific functions, most of which are related to human diseases. The database of human disease-related protein-ligand structures (dbHDPLS) collects 8833 structures extracted from the Protein Data Bank (PDB) and other related databases. It is annotated with comprehensive information on ligands and drugs, related human diseases, and protein-ligand interactions, together with information on the protein structures. The database may be a reliable resource for structure-based drug target discovery, druggability prediction of protein-ligand binding sites, and the study of drug-disease relationships based on protein-ligand complex structures. It is publicly accessible at http://DeepLearner.ahu.edu.cn/web/dbDPLS/.
Affiliation(s)
- Muchun Zhu
  - Institutes of Physical Science and Information Technology, Anhui University, 230601 Hefei, Anhui, China
- Xiaoping Song
  - Institutes of Physical Science and Information Technology, Anhui University, 230601 Hefei, Anhui, China
- Peng Chen
  - School of Electrical and Information Engineering, Anhui University of Technology, 243032 Ma'anshan, Anhui, China; Institutes of Physical Science and Information Technology, Anhui University, 230601 Hefei, Anhui, China
- Wenyan Wang
  - School of Electrical and Information Engineering, Anhui University of Technology, 243032 Ma'anshan, Anhui, China
- Bing Wang
  - School of Electrical and Information Engineering, Anhui University of Technology, 243032 Ma'anshan, Anhui, China

7
Baeissa H, Benstead-Hume G, Richardson CJ, Pearl FMG. Identification and analysis of mutational hotspots in oncogenes and tumour suppressors. Oncotarget 2017; 8:21290-21304. [PMID: 28423505 PMCID: PMC5400584 DOI: 10.18632/oncotarget.15514] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Received: 09/01/2016] [Accepted: 02/07/2017] [Indexed: 01/25/2023]
Abstract
Background: The key to interpreting the contribution of a disease-associated mutation in the development and progression of cancer is an understanding of the consequences of that mutation both on the function of the affected protein and on the pathways in which that protein is involved. Protein domains encapsulate function, and position-specific domain-based analysis of mutations has been shown to help elucidate their phenotypes. Results: In this paper we examine the domain biases in oncogenes and tumour suppressors, and find that their domain compositions substantially differ. Using data from over 30 different cancers from whole-exome sequencing cancer genomic projects, we mapped over one million mutations to their respective Pfam domains to identify which domains are enriched in any of three different classes of mutation: missense, indels, or truncations. Next, we identified the mutational hotspots within domain families by mapping small mutations to equivalent positions in multiple sequence alignments of protein domains. We find that gain-of-function mutations from oncogenes and loss-of-function mutations from tumour suppressors are normally found in different domain families and, when observed in the same domain families, hotspot mutations are located at different positions within the multiple sequence alignment of the domain. Conclusions: By considering hotspots in tumour suppressors and oncogenes independently, we find that there are different specific positions within domain families that are particularly suited to accommodate either a loss-of-function or a gain-of-function mutation. The position is also dependent on the class of mutation. We find rare mutations co-located with well-known functional mutation hotspots in members of homologous domain superfamilies, and we detect novel mutation hotspots in domain families previously unconnected with cancer. The results of this analysis can be accessed through the MOKCa database (http://strubiol.icr.ac.uk/extra/MOKCa).
Affiliation(s)
- Hanadi Baeissa
  - School of Life Sciences, University of Sussex, Falmer, Brighton, UK

8
Abstract
Twenty years after their discovery, knots in proteins are now quite well understood. They are believed to be functionally advantageous and provide extra stability to protein chains. In this work, we go one step further and search for links: entangled structures, more complex than knots, that consist of several components. We derive conditions that proteins need to meet to be able to form links. We search through the entire Protein Data Bank and identify several sequentially nonhomologous chains that form a Hopf link and a Solomon link. We relate topological properties of these proteins to their function and stability and show that the link topology is characteristic of eukaryotes only. We also explain how the presence of links affects the folding pathways of proteins. Finally, we define necessary conditions to form Borromean rings in proteins and show that no structure in the Protein Data Bank forms a link of this type.
Affiliation(s)
- Pawel Dabrowski-Tumanski
  - Faculty of Chemistry, University of Warsaw, 02-093, Warsaw, Poland
  - Centre of New Technologies, University of Warsaw, 02-097, Warsaw, Poland
- Joanna I Sulkowska
  - Faculty of Chemistry, University of Warsaw, 02-093, Warsaw, Poland
  - Centre of New Technologies, University of Warsaw, 02-097, Warsaw, Poland

9
The CWB2 Cell Wall-Anchoring Module Is Revealed by the Crystal Structures of the Clostridium difficile Cell Wall Proteins Cwp8 and Cwp6. Structure 2017; 25:514-521. [PMID: 28132783 DOI: 10.1016/j.str.2016.12.018] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Received: 04/11/2016] [Revised: 08/10/2016] [Accepted: 12/30/2016] [Indexed: 11/21/2022]
Abstract
Bacterial cell wall proteins play crucial roles in cell survival, growth, and environmental interactions. In Gram-positive bacteria, cell wall proteins include several types that are non-covalently attached via cell wall binding domains. Of the two conserved surface-layer (S-layer)-anchoring modules composed of three tandem SLH or CWB2 domains, the latter have so far eluded structural insight. The crystal structures of Cwp8 and Cwp6 reveal multi-domain proteins, each containing an embedded CWB2 module. It consists of a triangular trimer of Rossmann-fold CWB2 domains, a feature common to 29 cell wall proteins in Clostridium difficile 630. The structural basis of the intact module fold necessary for its binding to the cell wall is revealed. A comparison with previously reported atomic force microscopy data of S-layers suggests that C. difficile S-layers are complex oligomeric structures, likely composed of several different proteins.
10
Vyas R, Bapat S, Jain E, Karthikeyan M, Tambe S, Kulkarni BD. Building and analysis of protein-protein interactions related to diabetes mellitus using support vector machine, biomedical text mining and network analysis. Comput Biol Chem 2016; 65:37-44. [DOI: 10.1016/j.compbiolchem.2016.09.011] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 07/16/2015] [Revised: 09/07/2016] [Accepted: 09/19/2016] [Indexed: 01/06/2023]
11
Abstract
Comparative protein structure modeling predicts the three-dimensional structure of a given protein sequence (target) based primarily on its alignment to one or more proteins of known structure (templates). The prediction process consists of fold assignment, target-template alignment, model building, and model evaluation. This unit describes how to calculate comparative models using the program MODELLER and how to use the ModBase database of such models, and discusses all four steps of comparative modeling, frequently observed errors, and some applications. Modeling lactate dehydrogenase from Trichomonas vaginalis (TvLDH) is described as an example. The download and installation of the MODELLER software is also described. © 2016 by John Wiley & Sons, Inc.
Affiliation(s)
- Benjamin Webb
  - University of California at San Francisco, San Francisco, California
- Andrej Sali
  - University of California at San Francisco, San Francisco, California

12
Webb B, Sali A. Comparative Protein Structure Modeling Using MODELLER. CURRENT PROTOCOLS IN BIOINFORMATICS 2016; 54:5.6.1-5.6.37. [PMID: 27322406 PMCID: PMC5031415 DOI: 10.1002/cpbi.3] [Citation(s) in RCA: 1820] [Impact Index Per Article: 227.5] [Indexed: 12/22/2022]
Abstract
Comparative protein structure modeling predicts the three-dimensional structure of a given protein sequence (target) based primarily on its alignment to one or more proteins of known structure (templates). The prediction process consists of fold assignment, target-template alignment, model building, and model evaluation. This unit describes how to calculate comparative models using the program MODELLER and how to use the ModBase database of such models, and discusses all four steps of comparative modeling, frequently observed errors, and some applications. Modeling lactate dehydrogenase from Trichomonas vaginalis (TvLDH) is described as an example. The download and installation of the MODELLER software is also described. © 2016 by John Wiley & Sons, Inc.
Affiliation(s)
- Benjamin Webb
  - University of California at San Francisco, San Francisco, California
- Andrej Sali
  - University of California at San Francisco, San Francisco, California

13
Sun M, Wang X, Zou C, He Z, Liu W, Li H. Accurate prediction of RNA-binding protein residues with two discriminative structural descriptors. BMC Bioinformatics 2016; 17:231. [PMID: 27266516 PMCID: PMC4897909 DOI: 10.1186/s12859-016-1110-x] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Received: 02/21/2016] [Accepted: 06/02/2016] [Indexed: 11/10/2022]
Abstract
BACKGROUND: RNA-binding proteins participate in many important biological processes concerning RNA-mediated gene regulation, and several computational methods have been recently developed to predict the protein-RNA interactions of RNA-binding proteins. Newly developed discriminative descriptors will help to improve the prediction accuracy of these methods and provide further meaningful information for researchers. RESULTS: In this work, we designed two structural features (residue electrostatic surface potential and triplet interface propensity); statistical and structural analysis of protein-RNA complexes showed that both features are powerful for identifying RNA-binding protein residues. Using these two features and other excellent structure- and sequence-based features, a random forest classifier was constructed to predict RNA-binding residues. The area under the receiver operating characteristic curve (AUC) of five-fold cross-validation for our method on the training set RBP195 was 0.900, and when applied to the test set RBP68, the prediction accuracy (ACC) was 0.868 and the F-score was 0.631. CONCLUSIONS: The good prediction performance of our method revealed that the two newly designed descriptors could be discriminative for inferring protein residues interacting with RNAs. To facilitate the use of our method, a web server called RNAProSite, which implements the proposed method, was constructed and is freely available at http://lilab.ecust.edu.cn/NABind.
Affiliation(s)
- Meijian Sun
  - State Key Laboratory of Bioreactor Engineering, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Mei Long Road, Shanghai, 200237, China
- Xia Wang
  - State Key Laboratory of Bioreactor Engineering, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Mei Long Road, Shanghai, 200237, China
- Chuanxin Zou
  - State Key Laboratory of Bioreactor Engineering, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Mei Long Road, Shanghai, 200237, China
- Zenghui He
  - State Key Laboratory of Bioreactor Engineering, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Mei Long Road, Shanghai, 200237, China
- Wei Liu
  - State Key Laboratory of Bioreactor Engineering, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Mei Long Road, Shanghai, 200237, China
- Honglin Li
  - State Key Laboratory of Bioreactor Engineering, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Mei Long Road, Shanghai, 200237, China

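The entry above reports an accuracy (ACC) of 0.868 alongside a noticeably lower F-score of 0.631. The sketch below shows how two such metrics are computed from a confusion matrix and why they can diverge when binding residues are a minority class. The counts are hypothetical and are not taken from the paper, and the F-score is assumed here to be the balanced F1 measure.

```python
def accuracy(tp, fp, tn, fn):
    """Fraction of all residues that were classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall (ignores true negatives)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 40 binding residues among 200, i.e. an imbalanced set.
tp, fp, tn, fn = 30, 20, 140, 10
acc = accuracy(tp, fp, tn, fn)
f1 = f1_score(tp, fp, fn)
print(f"ACC = {acc:.3f}, F1 = {f1:.3f}")
```

Because the many correctly rejected non-binding residues raise accuracy but do not enter the F1 calculation, accuracy comes out higher than F1 on data like this.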
14
Kumari A, Kanchan S, Sinha RP, Kesheri M. Applications of Bio-molecular Databases in Bioinformatics. MEDICAL IMAGING IN CLINICAL APPLICATIONS 2016. [DOI: 10.1007/978-3-319-33793-7_15] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 12/24/2022]
15
Fox NK, Brenner SE, Chandonia JM. The value of protein structure classification information-Surveying the scientific literature. Proteins 2015; 83:2025-38. [PMID: 26313554 PMCID: PMC4609302 DOI: 10.1002/prot.24915] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 05/08/2015] [Revised: 08/06/2015] [Accepted: 08/18/2015] [Indexed: 11/08/2022]
Abstract
The Structural Classification of Proteins (SCOP) and Class, Architecture, Topology, Homology (CATH) databases have been valuable resources for protein structure classification for over 20 years. Development of SCOP (version 1) concluded in June 2009 with SCOP 1.75. The SCOPe (SCOP-extended) database offers continued development of the classic SCOP hierarchy, adding over 33,000 structures. We have attempted to assess the impact of these two decade old resources and guide future development. To this end, we surveyed recent articles to learn how structure classification data are used. Of 571 articles published in 2012-2013 that cite SCOP, 439 actually use data from the resource. We found that the type of use was fairly evenly distributed among four top categories: A) study protein structure or evolution (27% of articles), B) train and/or benchmark algorithms (28% of articles), C) augment non-SCOP datasets with SCOP classification (21% of articles), and D) examine the classification of one protein/a small set of proteins (22% of articles). Most articles described computational research, although 11% described purely experimental research, and a further 9% included both. We examined how CATH and SCOP were used in 158 articles that cited both databases: while some studies used only one dataset, the majority used data from both resources. Protein structure classification remains highly relevant for a diverse range of problems and settings.
Affiliation(s)
- Naomi K Fox
  - Lawrence Berkeley National Laboratory, Physical Biosciences Division, Berkeley, California, 94720
- Steven E Brenner
  - Lawrence Berkeley National Laboratory, Physical Biosciences Division, Berkeley, California, 94720
  - Department of Plant and Microbial Biology, University of California, Berkeley, California, 94720
- John-Marc Chandonia
  - Lawrence Berkeley National Laboratory, Physical Biosciences Division, Berkeley, California, 94720

16
Zolfaghari Emameh R, Kuuslahti M, Vullo D, Barker HR, Supuran CT, Parkkila S. Ascaris lumbricoides β carbonic anhydrase: a potential target enzyme for treatment of ascariasis. Parasit Vectors 2015; 8:479. [PMID: 26385556 PMCID: PMC4575479 DOI: 10.1186/s13071-015-1098-5] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Received: 08/04/2015] [Accepted: 09/15/2015] [Indexed: 11/23/2022]
Abstract
BACKGROUND: A parasitic roundworm, Ascaris lumbricoides, is the causative agent of ascariasis, with approximately 760 million cases around the world. Helminthic infections occur with a high prevalence mostly in tropical and developing countries. Therefore, design of affordable broad-spectrum anti-helminthic agents against a variety of pathogens, including not only A. lumbricoides but also hookworms and whipworms, is desirable. Beta carbonic anhydrases (β-CAs) are considered promising targets of novel anthelminthics because these enzymes are present in various parasites, while completely absent in vertebrates. METHODS: In this study, we identified an A. lumbricoides β-CA (AlBCA) protein from protein sequence data using bioinformatics tools. We used computational biology resources and methods (including InterPro, CATH/Gene3D, KEGG, and METACYC) to analyze AlBCA and define potential roles of this enzyme in biological pathways. The AlBCA gene was cloned into pFastBac1, and recombinant AlBCA was produced in sf-9 insect cells. Kinetics of AlBCA were analyzed by a stopped-flow method. RESULTS: Multiple sequence alignment revealed that AlBCA contains the two sequence motifs, CXDXR and HXXC, typical for β-CAs. Recombinant AlBCA showed significant CA catalytic activity with kcat of 6.0 × 10⁵ s⁻¹ and kcat/KM of 4.3 × 10⁷ M⁻¹ s⁻¹. The classical CA inhibitor, acetazolamide, showed an inhibition constant of 84.1 nM. Computational modeling suggests that the molecular architecture of AlBCA is highly similar to several other known β-CA structures. Functional predictions suggest that AlBCA might play a role in bicarbonate-mediated metabolic pathways, such as gluconeogenesis and removal of metabolically produced cyanate. CONCLUSIONS: These results open new avenues to further investigate the precise functions of β-CAs in parasites and suggest that novel β-CA specific inhibitors should be developed and tested against helminthic diseases.
Affiliation(s)
- Reza Zolfaghari Emameh
  - Department of Anatomy, School of Medicine, University of Tampere, Tampere, Finland.
  - BioMediTech, University of Tampere, Tampere, Finland.
  - Fimlab Laboratories Ltd and Tampere University Hospital, Tampere, Finland.
- Marianne Kuuslahti
  - Department of Anatomy, School of Medicine, University of Tampere, Tampere, Finland.
- Daniela Vullo
  - Dipartimento di Chimica, Laboratorio di Chimica Bioinorganica, Università degli Studi di Firenze, Sesto Fiorentino, Firenze, Italy.
  - Neurofarba Department, Sezione di Scienze Farmaceutiche e Nutraceutiche, Università degli Studi di Firenze, Sesto Fiorentino, Firenze, Italy.
- Harlan R Barker
  - Department of Anatomy, School of Medicine, University of Tampere, Tampere, Finland.
- Claudiu T Supuran
  - Dipartimento di Chimica, Laboratorio di Chimica Bioinorganica, Università degli Studi di Firenze, Sesto Fiorentino, Firenze, Italy.
  - Neurofarba Department, Sezione di Scienze Farmaceutiche e Nutraceutiche, Università degli Studi di Firenze, Sesto Fiorentino, Firenze, Italy.
- Seppo Parkkila
  - Department of Anatomy, School of Medicine, University of Tampere, Tampere, Finland.
  - Fimlab Laboratories Ltd and Tampere University Hospital, Tampere, Finland.
|
17
|
Hu J, Zhang X, Liu X, Tang J. Prediction of hot regions in protein-protein interaction by combining density-based incremental clustering with feature-based classification. Comput Biol Med 2015; 61:127-37. [PMID: 25899802 DOI: 10.1016/j.compbiomed.2015.03.022] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 10/03/2014] [Revised: 03/19/2015] [Accepted: 03/20/2015] [Indexed: 11/25/2022]
Abstract
Discovering hot regions in protein-protein interaction is important for drug and protein design, while experimental identification of hot regions is a time-consuming and labor-intensive effort; thus, the development of predictive models can be very helpful. In hot region prediction research, some models are based on structure information, and others are based on a protein interaction network. However, the prediction accuracy of these methods can still be improved. In this paper, a new method is proposed for hot region prediction, which combines density-based incremental clustering with feature-based classification. The method uses density-based incremental clustering to obtain rough hot regions, and uses feature-based classification to remove the non-hot spot residues from the rough hot regions. Experimental results show that the proposed method significantly improves the prediction performance of hot regions.
Affiliation(s)
- Jing Hu
  - School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan 430065, Hubei, China; Hubei Province Key Laboratory of Intelligent Information Processing and Real-time Industrial System, Wuhan 430065, Hubei, China.
- Xiaolong Zhang
  - School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan 430065, Hubei, China; Hubei Province Key Laboratory of Intelligent Information Processing and Real-time Industrial System, Wuhan 430065, Hubei, China.
- Xiaoming Liu
  - School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan 430065, Hubei, China; Hubei Province Key Laboratory of Intelligent Information Processing and Real-time Industrial System, Wuhan 430065, Hubei, China.
- Jinshan Tang
  - School of Technology, Michigan Technological University, Houghton, MI 49931, USA.
|
18
|
Abstract
Much of the biochemistry that underlies health, medicine, and numerous biotechnology applications is regulated by proteins, whereby the ability of proteins to effect such processes is dictated by the three-dimensional structural assembly of the proteins. Thus, a detailed understanding of biochemistry requires not only knowledge of the constituent sequence of proteins, but also a detailed understanding of how that sequence folds spatially. Three-dimensional analysis of protein structures is thus proving to be a critical mode of biological and medical discovery in the early twenty-first century, providing fundamental insight into function that produces useful biochemistry and dysfunction that leads to disease. The large number of distinct proteins precludes rigorous laboratory characterization of the complete structural proteome, but fortunately efficient in silico structure prediction is possible for many proteins that have not been experimentally characterized. One technique that continues to provide accurate and efficient protein structure predictions, called comparative modeling, has become a critical tool in many biological disciplines. The discussion herein is an updated version of a previous 2008 treatise focusing on the general philosophy of comparative modeling methods and on specific strategies for successfully achieving reliable and accurate models. The chapter discusses basic aspects of template selection, sequence alignment, spatial alignment, loop and gap modeling, side chain modeling, structural refinement and validation, and provides an important new discussion on automated computational tools for protein structure prediction.
|
19
|
Structural protein reorganization and fold emergence investigated through amino acid sequence permutations. Amino Acids 2014; 47:147-52. [PMID: 25331423 DOI: 10.1007/s00726-014-1849-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/08/2014] [Accepted: 09/29/2014] [Indexed: 10/24/2022]
Abstract
Correlation between random amino acid sequences and protein folds suggests that proteins autonomously evolved the most stable folds, with stability and function evolving subsequently, suggesting the existence of common protein ancestors from which all modern proteins evolved. To test this hypothesis, we shuffled the sequences of 10 natural proteins and obtained 40 different and apparently unrelated folds. Our results suggest that shuffled sequences are sufficiently stable and may act as a basis to evolve functional proteins. The common secondary structure of modern proteins is well represented by a small set of permuted sequences, which also show the emergence of intrinsic disorder and aggregation-prone stretches of the polypeptide chain.
|
20
|
Abstract
Functional characterization of a protein sequence is one of the most frequent problems in biology. This task is usually facilitated by accurate three-dimensional (3-D) structure of the studied protein. In the absence of an experimentally determined structure, comparative or homology modeling can sometimes provide a useful 3-D model for a protein that is related to at least one known protein structure. Comparative modeling predicts the 3-D structure of a given protein sequence (target) based primarily on its alignment to one or more proteins of known structure (templates). The prediction process consists of fold assignment, target-template alignment, model building, and model evaluation. This unit describes how to calculate comparative models using the program MODELLER and discusses all four steps of comparative modeling, frequently observed errors, and some applications. Modeling lactate dehydrogenase from Trichomonas vaginalis (TvLDH) is described as an example. The download and installation of the MODELLER software is also described.
Affiliation(s)
- Benjamin Webb
  - University of California at San Francisco, San Francisco, California
|
21
|
Das J, Lee HR, Sagar A, Fragoza R, Liang J, Wei X, Wang X, Mort M, Stenson PD, Cooper DN, Yu H. Elucidating common structural features of human pathogenic variations using large-scale atomic-resolution protein networks. Hum Mutat 2014; 35:585-93. [PMID: 24599843 DOI: 10.1002/humu.22534] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Received: 04/26/2013] [Accepted: 02/14/2014] [Indexed: 01/24/2023]
Abstract
With the rapid growth of structural genomics, numerous protein crystal structures have become available. However, the parallel increase in knowledge of the functional principles underlying biological processes, and more specifically the underlying molecular mechanisms of disease, has been less dramatic. This notwithstanding, the study of complex cellular networks has made possible the inference of protein functions on a large scale. Here, we combine the scale of network systems biology with the resolution of traditional structural biology to generate a large-scale atomic-resolution interactome-network comprising 3,398 interactions between 2,890 proteins with a well-defined interaction interface and interface residues for each interaction. Within the framework of this atomic-resolution network, we have explored the structural principles underlying variations causing human-inherited disease. We find that in-frame pathogenic variations are enriched at both the interface and in the interacting domain, suggesting that variations not only at interface "hot-spots," but in the entire interacting domain can result in alterations of interactions. Further, the sites of pathogenic variations are closely related to the biophysical strength of the interactions they perturb. Finally, we show that biochemical alterations consequent to these variations are considerably more disruptive than evolutionary changes, with the most significant alterations at the protein interaction interface.
Affiliation(s)
- Jishnu Das
  - Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, 14853; Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, New York, 14853
|
22
|
Wang Q, Yan J, Li X. Protein fold recognition based on functional domain composition. Comput Biol Chem 2014; 48:71-6. [PMID: 24412838 DOI: 10.1016/j.compbiolchem.2013.12.001] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 09/16/2013] [Accepted: 12/09/2013] [Indexed: 11/17/2022]
Abstract
Recognition of protein fold types is an important step in protein structure and function predictions and is also an important method in protein sequence-structure research. Protein fold type reflects the topological pattern of the structure's core. There are currently three methods of protein structure prediction: comparative modeling, fold recognition and de novo prediction. Since comparative modeling is limited by sequence similarity and de novo prediction carries a heavy computational workload, fold recognition has the greatest potential. In order to improve recognition accuracy, a recognition method based on functional domain composition is proposed in this paper. This article focuses on the 124 fold types which have more than 2 samples in the LIFCA database. We apply the functional domain composition to predict the fold types of a protein or a domain. In order to evaluate our method and its sensitivity to the division of samples by SCOP family, we tested our results from different aspects. The average sensitivity, specificity and Matthews correlation coefficient (MCC) of the 124 fold types were found to be 94.58%, 99.96% and 0.91, respectively. Our results indicate that the functional domain composition method is a very promising method for protein fold recognition. Although based on simple classification rules, the LIFCA database captures the functional features of different proteins, reflecting the correspondence between protein structure and function.
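The three metrics reported in this abstract have standard definitions in terms of confusion-matrix counts. The following Python sketch shows those definitions for a single fold type treated as a one-vs-rest problem; the function name `fold_metrics` and the example counts are hypothetical and are not taken from the paper, which reports only averages over 124 fold types.

```python
import math


def fold_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity and Matthews correlation coefficient (MCC)
    from one-vs-rest confusion counts for a single fold type."""
    # Each ratio falls back to 0.0 when its denominator is zero.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return sensitivity, specificity, mcc


# Invented counts for illustration only; they are not results from the paper.
sens, spec, mcc = fold_metrics(tp=90, fp=10, tn=890, fn=10)
print(f"sensitivity={sens:.4f} specificity={spec:.4f} MCC={mcc:.4f}")
```

Averaging these per-fold values over all fold types would give summary figures of the kind quoted above, assuming the paper averages per fold; the abstract does not say how the averages were computed.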
Affiliation(s)
- Qin Wang
  - College of Life Science and Bioengineering, Beijing University of Technology, Beijing 100124, People's Republic of China.
- Jinli Yan
  - College of Life Science and Bioengineering, Beijing University of Technology, Beijing 100124, People's Republic of China.
- Xiaoqin Li
  - College of Life Science and Bioengineering, Beijing University of Technology, Beijing 100124, People's Republic of China.
|
23
|
|
24
|
Webb B, Eswar N, Fan H, Khuri N, Pieper U, Dong G, Sali A. Comparative Modeling of Drug Target Proteins. Reference Module in Chemistry, Molecular Sciences and Chemical Engineering 2014. [PMCID: PMC7157477 DOI: 10.1016/b978-0-12-409547-2.11133-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]
Abstract
In this perspective, we begin by describing the comparative protein structure modeling technique and the accuracy of the corresponding models. We then discuss the significant role that comparative prediction plays in drug discovery. We focus on virtual ligand screening against comparative models and illustrate the state-of-the-art by a number of specific examples.
|
25
|
Chen YC, Sargsyan K, Wright JD, Huang YS, Lim C. Identifying RNA-binding residues based on evolutionary conserved structural and energetic features. Nucleic Acids Res 2013; 42:e15. [PMID: 24343026 PMCID: PMC3919582 DOI: 10.1093/nar/gkt1299] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Indexed: 11/15/2022] Open
Abstract
Increasing numbers of protein structures are solved each year, but many of these structures belong to proteins whose sequences are homologous to sequences in the Protein Data Bank. Nevertheless, the structures of homologous proteins belonging to the same family contain useful information because functionally important residues are expected to preserve physico-chemical, structural and energetic features. This information forms the basis of our method, which detects RNA-binding residues of a given RNA-binding protein as those residues that preserve physico-chemical, structural and energetic features in its homologs. Tests on 81 RNA-bound and 35 RNA-free protein structures showed that our method yields a higher fraction of true RNA-binding residues (higher precision) than two structure-based and two sequence-based machine-learning methods. Because the method requires no training data set and has no parameters, its precision does not degrade when applied to 'novel' protein sequences unlike methods that are parameterized for a given training data set. It was used to predict the 'unknown' RNA-binding residues in the C-terminal RNA-binding domain of human CPEB3. The two predicted residues, F430 and F474, were experimentally verified to bind RNA, in particular F430, whose mutation to alanine or asparagine nearly abolished RNA binding. The method has been implemented in a webserver called DR_bind1, which is freely available with no login requirement at http://drbind.limlab.ibms.sinica.edu.tw.
Affiliation(s)
- Yao Chi Chen
  - Institute of Biomedical Sciences, Academia Sinica, Taipei 115, Taiwan, Genomics Research Center, Academia Sinica, Taipei 115, Taiwan and Department of Chemistry, National Tsing Hua University, Hsinchu 300, Taiwan
|
26
|
Rappoport N, Linial M. Functional inference by ProtoNet family tree: the uncharacterized proteome of Daphnia pulex. BMC Bioinformatics 2013; 14 Suppl 3:S11. [PMID: 23514195 PMCID: PMC3584848 DOI: 10.1186/1471-2105-14-s3-s11] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/20/2022] Open
Abstract
Background Daphnia pulex (water flea) is the first crustacean with a fully sequenced genome. Crustaceans and insects diverged from a common ancestor. Daphnia is a model organism for studying the molecular makeup for coping with environmental challenges. In the complete proteome, there are 30,550 putative proteins. However, about 10,000 of them have no known homologues. Currently, UniProtKB reports 95% of the Daphnia proteins as putative and uncharacterized. Results We have applied ProtoNet, an unsupervised hierarchical protein clustering method that covers about 10 million sequences, for automatic annotation of the Daphnia proteome. 98.7% (26,625) of the Daphnia full-length proteins were successfully mapped to 13,880 ProtoNet stable clusters, and only 1.3% remained unmapped. We compared the properties of the Daphnia protein families with those of the mouse and the fruitfly proteomes. Functional annotations were successfully assigned for 86% of the proteins. Most proteins (61%) were mapped to only 2953 clusters that contain Daphnia duplicated genes. We focused on the functionality of maximally amplified paralogs. Cuticle structure components and a variety of ion channel protein families were associated with a maximal level of gene amplification. We focused on gene amplification as a leading strategy of Daphnia in coping with environmental toxicity. Conclusions Automatic inference is achieved through mapping of sequences to the protein family tree of ProtoNet 6.0. Applying a careful inference protocol resulted in functional assignments for over 86% of the complete proteome. We conclude that the scaffold of ProtoNet can be used as an alignment-free protocol for large-scale annotation of uncharacterized proteomes.
Affiliation(s)
- Nadav Rappoport
  - School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, 91904, Israel
|
27
|
Dey F, Cliff Zhang Q, Petrey D, Honig B. Toward a "structural BLAST": using structural relationships to infer function. Protein Sci 2013; 22:359-66. [PMID: 23349097 DOI: 10.1002/pro.2225] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Received: 12/07/2012] [Revised: 01/17/2013] [Accepted: 01/17/2013] [Indexed: 02/05/2023]
Abstract
We outline a set of strategies to infer protein function from structure. The overall approach depends on extensive use of homology modeling, the exploitation of a wide range of global and local geometric relationships between protein structures and the use of machine learning techniques. The combination of modeling with broad searches of protein structure space defines a "structural BLAST" approach to infer function with high genomic coverage. Applications are described to the prediction of protein-protein and protein-ligand interactions. In the context of protein-protein interactions, our structure-based prediction algorithm, PrePPI, has comparable accuracy to high-throughput experiments. An essential feature of PrePPI involves the use of Bayesian methods to combine structure-derived information with non-structural evidence (e.g. co-expression) to assign a likelihood for each predicted interaction. This, combined with a structural BLAST approach significantly expands the range of applications of protein structure in the annotation of protein function, including systems level biological applications where it has previously played little role.
Affiliation(s)
- Fabian Dey
  - Department of Biochemistry and Molecular Biophysics, Howard Hughes Medical Institute, Center for Computational Biology and Bioinformatics and Initiative in Systems Biology, Columbia University, New York, New York 10032, USA
|
28
|
Micheletti C. Comparing proteins by their internal dynamics: exploring structure-function relationships beyond static structural alignments. Phys Life Rev 2012. [PMID: 23199577 DOI: 10.1016/j.plrev.2012.10.009] [Citation(s) in RCA: 70] [Impact Index Per Article: 5.8] [Indexed: 01/27/2023]
Abstract
The growing interest for comparing protein internal dynamics owes much to the realisation that protein function can be accompanied or assisted by structural fluctuations and conformational changes. Analogously to the case of functional structural elements, those aspects of protein flexibility and dynamics that are functionally oriented should be subject to evolutionary conservation. Accordingly, dynamics-based protein comparisons or alignments could be used to detect protein relationships that are more elusive to sequence and structural alignments. Here we provide an account of the progress that has been made in recent years towards developing and applying general methods for comparing proteins in terms of their internal dynamics and advance the understanding of the structure-function relationship.
Affiliation(s)
- Cristian Micheletti
  - Scuola Internazionale Superiore di Studi Avanzati, via Bonomea 265, Trieste, Italy.
|
29
|
Messih MA, Chitale M, Bajic VB, Kihara D, Gao X. Protein domain recurrence and order can enhance prediction of protein functions. Bioinformatics 2012; 28:i444-i450. [PMID: 22962465 PMCID: PMC3436825 DOI: 10.1093/bioinformatics/bts398] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Indexed: 11/25/2022] Open
Abstract
MOTIVATION Burgeoning sequencing technologies have generated massive amounts of genomic and proteomic data. Annotating the functions of proteins identified in this data has become a big and crucial problem. Various computational methods have been developed to infer the protein functions based on either the sequences or domains of proteins. The existing methods, however, ignore the recurrence and the order of the protein domains in this function inference. RESULTS We developed two new methods to infer protein functions based on protein domain recurrence and domain order. Our first method, DRDO, calculates the posterior probability of the Gene Ontology terms based on domain recurrence and domain order information, whereas our second method, DRDO-NB, relies on the naïve Bayes methodology using the same domain architecture information. Our large-scale benchmark comparisons show strong improvements in the accuracy of the protein function inference achieved by our new methods, demonstrating that domain recurrence and order can provide important information for inference of protein functions. AVAILABILITY The new models are provided as open source programs at http://sfb.kaust.edu.sa/Pages/Software.aspx. CONTACT dkihara@cs.purdue.edu, xin.gao@kaust.edu.sa SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics Online.
Affiliation(s)
- Mario Abdel Messih
  - Mathematical and Computer Sciences and Engineering Division, King Abdullah University of Science and Technology, Thuwal, 23955-6900, Saudi Arabia
|
30
|
Molecular characterization of an α-N-acetylgalactosaminidase from Clonorchis sinensis. Parasitol Res 2012; 111:2149-56. [PMID: 22926676 DOI: 10.1007/s00436-012-3063-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 02/21/2012] [Accepted: 07/24/2012] [Indexed: 10/28/2022]
Abstract
The α-N-acetylgalactosaminidase (α-NAGAL) is an exoglycosidase that selectively cleaves terminal α-linked N-acetylgalactosamines from a variety of sugar chains. A complementary DNA (cDNA) clone encoding a novel Clonorchis sinensis α-NAGAL (Cs-α-NAGAL) was identified in the expressed sequence tags database of the adult C. sinensis liver fluke. The complete coding sequence was 1,308 bp long and encoded a 436-residue protein. The selected glycosidase was manually curated as α-NAGAL (EC 3.2.1.49) based on a composite bioinformatics analysis including a search for orthologues, comparative structure modeling, and the generation of a phylogenetic tree. One orthologue of Cs-α-NAGAL was the Rattus norvegicus α-NAGAL (accession number: NP_001012120) that does not exist in C. sinensis. Cs-α-NAGAL belongs to the GH27 family and the GH-D clan. A phylogenetic analysis revealed that the GH27 family of Cs-α-NAGAL was distinct from GH31 and GH36 within the GH-D clan. The putative 3D structure of Cs-α-NAGAL was built using SWISS-MODEL with a Gallus gallus α-NAGAL template (PDB code 1ktb chain A); this model demonstrated the superimposition of a TIM barrel fold (α/β) structure and substrate binding pocket. Cs-α-NAGAL transcripts were detected in the adult worm and egg cDNA libraries of C. sinensis but not in the metacercaria. Recombinant Cs-α-NAGAL (rCs-α-NAGAL) was expressed in Escherichia coli, and the purified rCs-α-NAGAL was recognized specifically by the C. sinensis-infected human sera. This is the first report of an α-NAGAL protein in the Trematode class, suggesting that it is a potential diagnostic or vaccine candidate with strong antigenicity.
|
31
|
Sola-Carvajal A, García-García MI, García-Carmona F, Sánchez-Ferrer Á. Insights into the evolution of sorbitol metabolism: phylogenetic analysis of SDR196C family. BMC Evol Biol 2012; 12:147. [PMID: 22899811 PMCID: PMC3458964 DOI: 10.1186/1471-2148-12-147] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 12/02/2011] [Accepted: 08/08/2012] [Indexed: 11/17/2022] Open
Abstract
Background Short chain dehydrogenases/reductases (SDR) are NAD(P)(H)-dependent oxidoreductases with a highly conserved 3D structure and of an early origin, which has allowed them to diverge into several families and enzymatic activities. The SDR196C family (http://www.sdr-enzymes.org) groups bacterial sorbitol dehydrogenases (SDH), which are of great industrial interest. In this study, we examine the phylogenetic relationship between the members of this family, and based on the findings and some sequence conserved blocks, a new and a more accurate classification is proposed. Results The distribution of the 66 bacterial SDH species analyzed was limited to Gram-negative bacteria. Six different bacterial families were found, encompassing α-, β- and γ-proteobacteria. This broad distribution in terms of bacteria and niches agrees with that of SDR, which are found in all forms of life. A cluster analysis of sorbitol dehydrogenase revealed different types of gene organization, although with a common pattern in which the SDH gene is surrounded by sugar ABC transporter proteins, another SDR, a kinase, and several gene regulators. According to the obtained trees, six different lineages and three sublineages can be discerned. The phylogenetic analysis also suggested two different origins for SDH in β-proteobacteria and four origins for γ-proteobacteria. Finally, this subdivision was further confirmed by the differences observed in the sequence of the conserved blocks described for SDR and some specific blocks of SDH, and by a functional divergence analysis, which made it possible to establish new consensus sequences and specific fingerprints for the lineages and sub lineages. Conclusion SDH distribution agrees with that observed for SDR, indicating the importance of the polyol metabolism, as an alternative source of carbon and energy. 
The phylogenetic analysis pointed to six clearly defined lineages and three sublineages, and great variability in the origin of this gene, despite its well conserved 3D structure. This suggests that SDH are very old and emerged early during evolution. This study also opens up a new and more accurate classification of the SDR196C family, introducing two numbers at the end of the family name, which indicate the lineage and the sublineage of each member, e.g., SDR196C6.3.
Affiliation(s)
- Agustín Sola-Carvajal
  - Department of Biochemistry and Molecular Biology-A, Faculty of Biology, Regional Campus of International Excellence Campus Mare Nostrum, University of Murcia, Campus Espinardo, Murcia E-30100, Spain
|
32
|
Chen YC, Wright JD, Lim C. DR_bind: a web server for predicting DNA-binding residues from the protein structure based on electrostatics, evolution and geometry. Nucleic Acids Res 2012; 40:W249-56. [PMID: 22661576 PMCID: PMC3394278 DOI: 10.1093/nar/gks481] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.8] [Indexed: 02/06/2023] Open
Abstract
DR_bind is a web server that automatically predicts DNA-binding residues, given the respective protein structure based on (i) electrostatics, (ii) evolution and (iii) geometry. In contrast to machine-learning methods, DR_bind does not require a training data set or any parameters. It predicts DNA-binding residues by detecting a cluster of conserved, solvent-accessible residues that are electrostatically stabilized upon mutation to Asp−/Glu−. The server requires as input the DNA-binding protein structure in PDB format and outputs a downloadable text file of the predicted DNA-binding residues, a 3D visualization of the predicted residues highlighted in the given protein structure, and a downloadable PyMol script for visualization of the results. Calibration on 83 and 55 non-redundant DNA-bound and DNA-free protein structures yielded a DNA-binding residue prediction accuracy/precision of 90/47% and 88/42%, respectively. Since DR_bind does not require any training using protein–DNA complex structures, it may predict DNA-binding residues in novel structures of DNA-binding proteins resulting from structural genomics projects with no conservation data. The DR_bind server is freely available with no login requirement at http://dnasite.limlab.ibms.sinica.edu.tw.
Affiliation(s)
- Yao Chi Chen
  - Institute of Biomedical Sciences, Genomics Research Center, Academia Sinica, Taipei 115, Taiwan
|
33
|
Džunková M, D’Auria G, Pérez-Villarroya D, Moya A. Hybrid sequencing approach applied to human fecal metagenomic clone libraries revealed clones with potential biotechnological applications. PLoS One 2012; 7:e47654. [PMID: 23082187 PMCID: PMC3474745 DOI: 10.1371/journal.pone.0047654] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 08/01/2012] [Accepted: 09/14/2012] [Indexed: 02/07/2023] Open
Abstract
Natural environments represent an incredible source of microbial genetic diversity. Discovery of novel biomolecules involves biotechnological methods that often require the design and implementation of biochemical assays to screen clone libraries. However, when an assay is applied to thousands of clones, one may eventually end up with very few positive clones which, in most of the cases, have to be "domesticated" for downstream characterization and application, and this makes screening both laborious and expensive. The negative clones, which are not considered by the selected assay, may also have biotechnological potential; however, unfortunately they would remain unexplored. Knowledge of the clone sequences provides important clues about potential biotechnological application of the clones in the library; however, the sequencing of clones one-by-one would be very time-consuming and expensive. In this study, we characterized the first metagenomic clone library from the feces of a healthy human volunteer, using a method based on 454 pyrosequencing coupled with a clone-by-clone Sanger end-sequencing. Instead of whole individual clone sequencing, we sequenced 358 clones in a pool. The medium-large insert (7-15 kb) cloning strategy allowed us to assemble these clones correctly, and to assign the clone ends to maintain the link between the position of a living clone in the library and the annotated contig from the 454 assembly. Finally, we found several open reading frames (ORFs) with previously described potential medical application. The proposed approach allows planning ad-hoc biochemical assays for the clones of interest, and the appropriate sub-cloning strategy for gene expression in suitable vectors/hosts.
Affiliation(s)
- Mária Džunková
- Joint Unit of Research in Genomics and Health, Centre for Public Health Research (CSISP) - Cavanilles Institute for Biodiversity and Evolutionary Biology, University of Valencia, Valencia, Spain
- CIBER en Epidemiología y Salud Pública (CIBEResp), Madrid, Spain
- Giuseppe D’Auria
- Joint Unit of Research in Genomics and Health, Centre for Public Health Research (CSISP) - Cavanilles Institute for Biodiversity and Evolutionary Biology, University of Valencia, Valencia, Spain
- CIBER en Epidemiología y Salud Pública (CIBEResp), Madrid, Spain
- * E-mail:
- David Pérez-Villarroya
- Joint Unit of Research in Genomics and Health, Centre for Public Health Research (CSISP) - Cavanilles Institute for Biodiversity and Evolutionary Biology, University of Valencia, Valencia, Spain
- Andrés Moya
- Joint Unit of Research in Genomics and Health, Centre for Public Health Research (CSISP) - Cavanilles Institute for Biodiversity and Evolutionary Biology, University of Valencia, Valencia, Spain
- CIBER en Epidemiología y Salud Pública (CIBEResp), Madrid, Spain
|
34
|
Abstract
Annotation of prokaryotic sequences can be separated into structural and functional annotation. Structural annotation is dependent on algorithmic interrogation of experimental evidence to discover the physical characteristics of a gene. This is done in an effort to construct accurate gene models, so understanding function or evolution of genes among organisms is not impeded. Functional annotation is dependent on sequence similarity to other known genes or proteins in an effort to assess the function of the gene. Combining structural and functional annotation across genomes in a comparative manner promotes higher levels of accurate annotation as well as an advanced understanding of genome evolution. As the availability of bacterial sequences increases and annotation methods improve, the value of comparative annotation will increase.
Affiliation(s)
- Nicholas Beckloff
- Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM, USA
|
35
|
Rappoport N, Karsenty S, Stern A, Linial N, Linial M. ProtoNet 6.0: organizing 10 million protein sequences in a compact hierarchical family tree. Nucleic Acids Res 2011; 40:D313-20. [PMID: 22121228 PMCID: PMC3245180 DOI: 10.1093/nar/gkr1027] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open
Abstract
ProtoNet 6.0 (http://www.protonet.cs.huji.ac.il) is a data structure of protein families that cover the protein sequence space. These families are generated through an unsupervised bottom–up clustering algorithm. This algorithm organizes large sets of proteins in a hierarchical tree that yields high-quality protein families. The 2012 ProtoNet (Version 6.0) tree includes over 9 million proteins, of which 5.5% come from UniProtKB/SwissProt and the rest from UniProtKB/TrEMBL. The hierarchical tree structure is based on an all-against-all comparison of 2.5 million representatives of UniRef50. Rigorous annotation-based quality tests prune the tree to the 162 088 most informative clusters. Every high-quality cluster is assigned a ProtoName that reflects the most significant annotations of its proteins. These annotations are dominated by GO terms, UniProt/Swiss-Prot keywords and InterPro. ProtoNet 6.0 operates in a default mode. When used in the advanced mode, this data structure offers the user a view of the family tree at any desired level of resolution. Systematic comparisons with previous versions of ProtoNet are carried out. They show how our view of protein families evolves as larger parts of the sequence space become known. ProtoNet 6.0 provides numerous tools to navigate the hierarchy of clusters.
Affiliation(s)
- Nadav Rappoport
- School of Computer Science and Engineering, Institute of Life Sciences, The Sudarsky Center for Computational Biology, The Hebrew University of Jerusalem, 91904 Israel
|
36
|
Chakraborty A, Ghosh S, Chowdhary G, Maulik U, Chakrabarti S. DBETH: a Database of Bacterial Exotoxins for Human. Nucleic Acids Res 2011; 40:D615-20. [PMID: 22102573 PMCID: PMC3244994 DOI: 10.1093/nar/gkr942] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/02/2023] Open
Abstract
Pathogenic bacteria produce protein toxins to survive in the hostile environments defined by the host's defense systems and immune response. Recent progress in high-throughput genome sequencing and structure determination techniques has contributed to a better understanding of the mechanisms of action of bacterial toxins at the cellular and molecular levels leading to pathogenicity. It is fair to assume that with time more and more unknown toxins will emerge, not only through the discovery of newer species but also through the genetic rearrangement of existing bacterial genomes. Hence, it is crucial to organize a systematic compilation and subsequent analyses of the inherent features of known bacterial toxins. We developed a Database for Bacterial ExoToxins (DBETH, http://www.hpppi.iicb.res.in/btox/), which contains sequence, structure, interaction network and analytical results for 229 toxins categorized within 24 mechanistic and activity types from 26 bacterial genera. The main objective of this database is to provide a comprehensive knowledgebase for human pathogenic bacterial toxins where various important sequence, structure and physico-chemical property based analyses are provided. Further, we have developed a prediction server attached to this database which aims to identify bacterial toxin-like sequences either by establishing homology with known toxin sequences/domains or by classifying bacterial toxin-specific features using a support vector machine-based learning technique.
Affiliation(s)
- Abhijit Chakraborty
- Department of Structural Biology and Bioinformatics Division, Indian Institute of Chemical Biology, Council for Scientific and Industrial Research, Jadavpur University, Kolkata, WB 700 032, India
|
37
|
Stivala A, Wybrow M, Wirth A, Whisstock JC, Stuckey PJ. Automatic generation of protein structure cartoons with Pro-origami. Bioinformatics 2011; 27:3315-6. [PMID: 21994221 DOI: 10.1093/bioinformatics/btr575] [Citation(s) in RCA: 141] [Impact Index Per Article: 10.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open
Abstract
SUMMARY Protein topology diagrams are 2D representations of protein structure that are particularly useful in understanding and analysing complex protein folds. Generating such diagrams presents a major problem in graph drawing, with automatic approaches often resulting in errors or uninterpretable results. Here we apply a breakthrough in diagram layout to protein topology cartoons, providing clear, accurate, interactive and editable diagrams, which are also an interface to a structural search method. AVAILABILITY Pro-origami is available via a web server at http://munk.csse.unimelb.edu.au/pro-origami CONTACT a.stivala@pgrad.unimelb.edu.au; pjs@csse.unimelb.edu.au.
Affiliation(s)
- Alex Stivala
- Department of Computer Science and Software Engineering, The University of Melbourne Parkville Campus, Victoria 3010, Australia.
|
38
|
Suvorova YM, Rudenko VM, Korotkov EV. Detection change points of triplet periodicity of gene. Gene 2011; 491:58-64. [PMID: 21982972 DOI: 10.1016/j.gene.2011.08.032] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/19/2011] [Revised: 08/10/2011] [Accepted: 08/25/2011] [Indexed: 10/17/2022]
Abstract
Triplet periodicity (TP) is a distinctive property of protein-coding sequences. There are complex genes with more than one TP type along their sequence. We say that these genes contain a triplet periodicity change point. The aim of the work is to find all genes that contain a TP change point and to compare the positions of the change points in genes with known biological data. We have developed a mathematical method to identify triplet periodicity changes along a sequence. We have found 311,221 genes with a TP change point in the KEGG/Genes database (version 48). This is about 8% of the total database volume (4,013,150). We showed that repetitive sequences are not the only cause of such events. We suppose that a TP change point may indicate a fusion of genes or domains. We performed BLAST analysis to find potential ancestral genes for the parts of genes with a TP change point. As a result, we found that in 131,323 cases sequences with a TP change point have proper similarities for one or both parts. The relationship between TP change points and fusion events in genes is discussed. An implementation of the method is available from the authors on request.
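As a rough illustration of what triplet periodicity measures, the sketch below counts nucleotides at each of the three codon positions and scores how strongly nucleotide identity depends on codon position. The abstract does not specify the authors' statistic or change-point procedure, so the function names (`tp_matrix`, `mutual_information`) and the choice of mutual information as the score are assumptions made for illustration only. The last line checks the "about 8%" figure from the counts quoted in the abstract.

```python
import math
from collections import Counter

BASES = "ACGT"

def tp_matrix(seq):
    """Return a 3x4 matrix of nucleotide counts per codon position (hypothetical helper)."""
    counts = [Counter() for _ in range(3)]
    for i, base in enumerate(seq.upper()):
        if base in BASES:  # skip ambiguous symbols such as N
            counts[i % 3][base] += 1
    return [[c[b] for b in BASES] for c in counts]

def mutual_information(matrix):
    """Mutual information (bits) between codon position and nucleotide.

    An assumed periodicity score, not necessarily the one used in the paper.
    """
    total = sum(sum(row) for row in matrix)
    if total == 0:
        return 0.0
    row_sums = [sum(row) for row in matrix]
    col_sums = [sum(matrix[i][j] for i in range(3)) for j in range(4)]
    mi = 0.0
    for i in range(3):
        for j in range(4):
            n = matrix[i][j]
            if n:
                mi += (n / total) * math.log2(n * total / (row_sums[i] * col_sums[j]))
    return mi

# Share of KEGG/Genes entries reported with a TP change point (counts from the abstract).
percent_with_change_point = round(100 * 311221 / 4013150, 1)
```

A change-point search could, under the same assumptions, compare the matrices of the sequence segments to the left and right of each candidate position and report positions where they differ most.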
Affiliation(s)
- Yulia M Suvorova
- Bioinformatics Laboratory, Centre of Bioengineering, Russian Academy of Sciences, 117312, Moscow, Prospect 60-tya Oktyabrya, 7/1, Russia.
|
39
|
Tarrío R, Ayala FJ, Rodríguez-Trelles F. The Vein Patterning 1 (VEP1) gene family laterally spread through an ecological network. PLoS One 2011; 6:e22279. [PMID: 21818306 PMCID: PMC3144213 DOI: 10.1371/journal.pone.0022279] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/04/2011] [Accepted: 06/18/2011] [Indexed: 11/23/2022] Open
Abstract
Lateral gene transfer (LGT) is a major evolutionary mechanism in prokaryotes. Knowledge about LGT in eukaryotes, particularly multicellular ones, has only recently started to accumulate. A widespread assumption sees the gene as the unit of LGT, largely because little is yet known about how LGT chances are affected by structural/functional features at the subgenic level. Here we trace the evolutionary trajectory of VEin Patterning 1, a novel gene family known to be essential for plant development and defense. At the subgenic level VEP1 encodes a dinucleotide-binding Rossmann-fold domain, in common with members of the short-chain dehydrogenase/reductase (SDR) protein family. We found: i) VEP1 likely originated in an aerobic, mesophilic and chemoorganotrophic α-proteobacterium, and was laterally propagated through nets of ecological interactions, including multiple LGTs between phylogenetically distant green plant/fungi-associated bacteria, and five independent LGTs to eukaryotes. Of these five transfers, three are ancient LGTs, implicating an ancestral fungus, the last common ancestor of land plants and an ancestral trebouxiophyte green alga, and two are recent LGTs to modern embryophytes. ii) VEP1's rampant LGT behavior was enabled by the robustness and broad utility of the dinucleotide-binding Rossmann-fold, which provided a platform for the evolution of two unprecedented departures from the canonical SDR catalytic triad. iii) The fate of VEP1 in eukaryotes has been different in different lineages, being ubiquitous and highly conserved in land plants, whereas fungi underwent multiple losses. iv) VEP1-harboring bacteria include non-phytopathogenic and phytopathogenic symbionts which are non-randomly distributed with respect to the type of harbored VEP1 gene.
Our findings suggest that VEP1 may have been instrumental for the evolutionary transition of green plants to land, and point to an LGT-mediated ‘Trojan Horse’ mechanism for the evolution of bacterial pathogenesis against plants. VEP1 may serve as a tool for revealing microbial interactions in plant/fungi-associated environments.
Affiliation(s)
- Rosa Tarrío
- Universidad de Santiago de Compostela, CIBERER, Genome Medicine Group, Santiago de Compostela, Spain
- Department of Ecology and Evolutionary Biology, University of California Irvine, Irvine, California, United States of America
- Francisco J. Ayala
- Department of Ecology and Evolutionary Biology, University of California Irvine, Irvine, California, United States of America
- Francisco Rodríguez-Trelles
- Grup de Biologia Evolutiva, Departament de Genètica i de Microbiologia, Universitat Autònoma de Barcelona, Barcelona, Spain
- Department of Ecology and Evolutionary Biology, University of California Irvine, Irvine, California, United States of America
- * E-mail:
|
40
|
Chaulk SG, Smith Frieday MN, Arthur DC, Culham DE, Edwards RA, Soo P, Frost LS, Keates RAB, Glover JNM, Wood JM. ProQ is an RNA chaperone that controls ProP levels in Escherichia coli. Biochemistry 2011; 50:3095-106. [PMID: 21381725 DOI: 10.1021/bi101683a] [Citation(s) in RCA: 68] [Impact Index Per Article: 5.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
Abstract
Transporter ProP mediates osmolyte accumulation in Escherichia coli cells exposed to high osmolality media. The cytoplasmic ProQ protein amplifies ProP activity by an unknown mechanism. The N- and C-terminal domains of ProQ are predicted to be structurally similar to known RNA chaperone proteins FinO and Hfq from E. coli. Here we demonstrate that ProQ is an RNA chaperone, binding RNA and facilitating both RNA strand exchange and RNA duplexing. Experiments performed with the isolated ProQ domains showed that the FinO-like domain serves as a high-affinity RNA-binding domain, whereas the Hfq-like domain is largely responsible for RNA strand exchange and duplexing. These data suggest that ProQ may regulate ProP production. Transcription of proP proceeds from RpoD- and RpoS-dependent promoters. Lesions at proQ affected ProP levels in an osmolality- and growth phase-dependent manner, decreasing ProP levels when proP was expressed from its own chromosomal promoters or from a heterologous plasmid-based promoter. Small RNA molecules are known to regulate cellular levels of sigma factor RpoS. ProQ did not act by changing RpoS levels since proQ lesions did not influence RpoS-dependent stationary phase thermotolerance and they affected ProP production and activity similarly in bacteria without and with an rpoS defect. Taken together, these results suggest that ProQ does not regulate proP transcription. It may act as an RNA-binding protein to regulate proP translation.
Affiliation(s)
- Steven G Chaulk
- Department of Biochemistry, School of Molecular and Systems Medicine, University of Alberta, Edmonton, Alberta, Canada T6G 2H7
|
41
|
Abstract
Web-based protein structure databases come in a wide variety of types and levels of information content. Those having the most general interest are the various atlases that describe each experimentally determined protein structure and provide useful links, analyses and schematic diagrams relating to its 3D structure and biological function. Also of great interest are the databases that classify 3D structures by their folds, as these can reveal evolutionary relationships which may be hard to detect from sequence comparison alone. Related to these are the numerous servers that compare folds, which are particularly useful for newly solved structures, especially those of unknown function. Beyond these there are a vast number of databases for the most specialized user, dealing with specific families, diseases, structural features and so on.
|
42
|
Meyer T, D'Abramo M, Hospital A, Rueda M, Ferrer-Costa C, Pérez A, Carrillo O, Camps J, Fenollosa C, Repchevsky D, Gelpí JL, Orozco M. MoDEL (Molecular Dynamics Extended Library): A Database of Atomistic Molecular Dynamics Trajectories. Structure 2010; 18:1399-409. [DOI: 10.1016/j.str.2010.07.013] [Citation(s) in RCA: 108] [Impact Index Per Article: 7.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/23/2010] [Revised: 07/19/2010] [Accepted: 07/27/2010] [Indexed: 11/26/2022]
|
43
|
Crystal structure of a novel non-Pfam protein PF2046 solved using low resolution B-factor sharpening and multi-crystal averaging methods. Protein Cell 2010; 1:453-8. [PMID: 21203960 DOI: 10.1007/s13238-010-0045-7] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/05/2010] [Accepted: 03/18/2010] [Indexed: 10/19/2022] Open
Abstract
Sometimes crystals cannot diffract X-rays beyond 3.0 Å resolution due to the intrinsic flexibility associated with the protein. Low resolution diffraction data not only pose a challenge to structure determination, but also hamper interpretation of mechanistic details. Crystals of a 25.6 kDa non-Pfam, hypothetical protein, PF2046, diffracted X-rays to 3.38 Å resolution. A combination of Se-Met derived heavy atom positions with multiple cycles of B-factor sharpening, multi-crystal averaging, and restrained refinement, followed by manual inspection of electron density and model building, resulted in a final model with an R value of 23.5 (R(free) = 24.7). The asymmetric unit was large and consisted of six molecules arranged as a homodimer of trimers. Analysis of the structure revealed the presence of an RNA-binding domain, suggesting a role for PF2046 in the processing of nucleic acids.
|
44
|
The bridge-region of the Ku superfamily is an atypical zinc ribbon domain. J Struct Biol 2010; 172:294-9. [PMID: 20580930 DOI: 10.1016/j.jsb.2010.05.011] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/09/2010] [Accepted: 05/20/2010] [Indexed: 11/23/2022]
Abstract
Members of the Ku superfamily are DNA-end-binding proteins involved in non-homologous end-joining (NHEJ) DNA repair. The published crystal structure of human Ku-DNA complex reveals a heterodimer that forms a ring around dsDNA by means of the Ku core modules. These modules contain a highly conserved seven-stranded β-barrel, which in turn contains an insertion, termed the bridge-region, between its second and third β-strands. The bridge-region adopts an unusual β-strand-rich structure critical for dsDNA-binding and Ku function, but its provenance remains unclear. Here, we demonstrate that the bridge-region of Ku is a novel member of the diverse Zn-ribbon fold group. Sequence analysis reveals that Ku from several Gram-positive bacteria and bacteriophages retain metal-chelating motifs, whereas they have been lost in the versions from most other organisms. Structural comparisons suggest that the Zn-ribbon from Ku-bridge-region is the first example of a circularly permuted, segment-swapped Zn-ribbon. This finding helps explain how Ku is likely to bind DNA as an obligate dimer. Further, we hypothesize that retention of the unusual conformation of the turns of the Zn-ribbons, despite loss of the Zn-binding sites, provides clues regarding the mechanism by which the Ku-bridge-regions sense the DNA state.
|
45
|
Paszkowski-Rogacz M, Slabicki M, Pisabarro MT, Buchholz F. PhenoFam-gene set enrichment analysis through protein structural information. BMC Bioinformatics 2010; 11:254. [PMID: 20478033 PMCID: PMC2881086 DOI: 10.1186/1471-2105-11-254] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/23/2010] [Accepted: 05/17/2010] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND With the current technological advances in high-throughput biology, the necessity to develop tools that help to analyse the massive amount of data being generated is evident. A powerful method of inspecting large-scale data sets is gene set enrichment analysis (GSEA) and investigation of protein structural features can guide determining the function of individual genes. However, a convenient tool that combines these two features to aid in high-throughput data analysis has not been developed yet. In order to fill this niche, we developed the user-friendly, web-based application, PhenoFam. RESULTS PhenoFam performs gene set enrichment analysis by employing structural and functional information on families of protein domains as annotation terms. Our tool is designed to analyse complete sets of results from quantitative high-throughput studies (gene expression microarrays, functional RNAi screens, etc.) without prior pre-filtering or hits-selection steps. PhenoFam utilizes Ensembl databases to link a list of user-provided identifiers with protein features from the InterPro database, and assesses whether results associated with individual domains differ significantly from the overall population. To demonstrate the utility of PhenoFam we analysed a genome-wide RNA interference screen and discovered a novel function of plexins containing the cytoplasmic RasGAP domain. Furthermore, a PhenoFam analysis of breast cancer gene expression profiles revealed a link between breast carcinoma and altered expression of PX domain containing proteins. CONCLUSIONS PhenoFam provides a user-friendly, easily accessible web interface to perform GSEA based on high-throughput data sets and structural-functional protein information, and therefore aids in functional annotation of genes.
Affiliation(s)
- Maciej Paszkowski-Rogacz
- Max Planck Institute of Molecular Cell Biology and Genetics, Pfotenhauerstr, 108, 01307 Dresden, Germany.
|
46
|
Schmidt am Busch M, Sedano A, Simonson T. Computational protein design: validation and possible relevance as a tool for homology searching and fold recognition. PLoS One 2010; 5:e10410. [PMID: 20463972 PMCID: PMC2864755 DOI: 10.1371/journal.pone.0010410] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/15/2009] [Accepted: 03/31/2010] [Indexed: 11/19/2022] Open
Abstract
BACKGROUND Protein fold recognition usually relies on a statistical model of each fold; each model is constructed from an ensemble of natural sequences belonging to that fold. A complementary strategy may be to employ sequence ensembles produced by computational protein design. Designed sequences can be more diverse than natural sequences, possibly avoiding some limitations of experimental databases. METHODOLOGY/PRINCIPAL FINDINGS We explore this strategy for four SCOP families: Small Kunitz-type inhibitors (SKIs), Interleukin-8 chemokines, PDZ domains, and large Caspase catalytic subunits, represented by 43 structures. An automated procedure is used to redesign the 43 proteins. We use the experimental backbones as fixed templates in the folded state and a molecular mechanics model to compute the interaction energies between sidechain and backbone groups. Calculations are done with the Proteins@Home volunteer computing platform. A heuristic algorithm is used to scan the sequence and conformational space, yielding 200,000-300,000 sequences per backbone template. The results confirm and generalize our earlier study of SH2 and SH3 domains. The designed sequences resemble moderately distant natural homologues of the initial templates; e.g., the SUPERFAMILY profile Hidden Markov Model library recognizes 85% of the low-energy sequences as native-like. Conversely, Position Specific Scoring Matrices derived from the sequences can be used to detect natural homologues within the SwissProt database: 60% of known PDZ domains are detected and around 90% of known SKIs and chemokines. Energy components and inter-residue correlations are analyzed and ways to improve the method are discussed. CONCLUSIONS/SIGNIFICANCE For some families, designed sequences can be a useful complement to experimental ones for homologue searching. However, improved tools are needed to extract more information from the designed profiles before the method can be of general use.
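To make concrete the idea of deriving a Position Specific Scoring Matrix from a sequence ensemble, here is a minimal sketch. It assumes equal-length, ungapped sequences, a uniform background frequency and a simple additive pseudocount; none of these choices comes from the abstract, and the names `build_pssm` and `score` are hypothetical. Real homology searches would use dedicated profile tools rather than this toy scorer.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_pssm(sequences, pseudocount=1.0):
    """Build a log-odds PSSM (one dict per position) from equal-length sequences.

    Assumes a uniform background of 1/20 per amino acid (illustrative choice).
    """
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("all sequences must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for pos in range(length):
        column = [s[pos] for s in sequences]
        denom = len(column) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: math.log2(((column.count(aa) + pseudocount) / denom) / background)
            for aa in AMINO_ACIDS
        })
    return pssm

def score(pssm, candidate):
    """Sum the per-position log-odds scores of a candidate of matching length."""
    if len(candidate) != len(pssm):
        raise ValueError("candidate length must match the PSSM")
    return sum(column[aa] for column, aa in zip(pssm, candidate))
```

For example, a matrix built from a handful of designed sequences would score a candidate that matches the ensemble at most positions higher than an unrelated one; a detection threshold would have to be calibrated separately.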
Affiliation(s)
- Marcel Schmidt am Busch
- Laboratoire de Biochimie (CNRS UMR7654), Department of Biology, Ecole Polytechnique, Palaiseau, France
- Audrey Sedano
- Laboratoire de Biochimie (CNRS UMR7654), Department of Biology, Ecole Polytechnique, Palaiseau, France
- Thomas Simonson
- Laboratoire de Biochimie (CNRS UMR7654), Department of Biology, Ecole Polytechnique, Palaiseau, France
|
47
|
Glasner ME, Gerlt JA, Babbitt PC. Mechanisms of protein evolution and their application to protein engineering. ADVANCES IN ENZYMOLOGY AND RELATED AREAS OF MOLECULAR BIOLOGY 2010; 75:193-239, xii-xiii. [PMID: 17124868 DOI: 10.1002/9780471224464.ch3] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/14/2022]
Abstract
Protein engineering holds great promise for the development of new biosensors, diagnostics, therapeutics, and agents for bioremediation. Despite some remarkable successes in experimental and computational protein design, engineered proteins rarely achieve the efficiency or specificity of natural enzymes. Current protein design methods utilize evolutionary concepts, including mutation, recombination, and selection, but the inability to fully recapitulate the success of natural evolution suggests that some evolutionary principles have not been fully exploited. One aspect of protein engineering that has received little attention is how to select the most promising proteins to serve as templates, or scaffolds, for engineering. Two evolutionary concepts that could provide a rational basis for template selection are the conservation of catalytic mechanisms and functional promiscuity. Knowledge of the catalytic motifs responsible for conserved aspects of catalysis in mechanistically diverse superfamilies could be used to identify promising templates for protein engineering. Second, protein evolution often proceeds through promiscuous intermediates, suggesting that templates which are naturally promiscuous for a target reaction could enhance protein engineering strategies. This review explores these ideas and alternative hypotheses concerning protein evolution and engineering. Future research will determine if application of these principles will lead to a protein engineering methodology governed by predictable rules for designing efficient, novel catalysts.
Affiliation(s)
- Margaret E Glasner
- Department of Biopharmaceutical Sciences, University of California-San Francisco, San Francisco, CA 94143, USA
|
48
|
Cheng J, Lu TH, Liu CL, Lin JY. A biophysical elucidation for less toxicity of agglutinin than abrin-a from the seeds of Abrus precatorius in consequence of crystal structure. J Biomed Sci 2010; 17:34. [PMID: 20433687 PMCID: PMC2890655 DOI: 10.1186/1423-0127-17-34] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/04/2010] [Accepted: 04/30/2010] [Indexed: 11/17/2022] Open
Abstract
The X-ray crystal structure of agglutinin from Abrus precatorius in Taiwan is presented. The crystal structure of agglutinin, a type II ribosome-inactivating protein (RIP) from the seeds of Abrus precatorius in Taiwan, has been determined from a novel crystalline form by the molecular replacement method using the coordinates of abrin-a as the template. The structure has space group P41212 with Z = 8, and has been refined at 2.6 Å to an R-factor of 20.4%. The root-mean-square deviations of bond lengths and angles from the standard values are 0.009 Å and 1.3°. The primary, secondary, tertiary and quaternary structures of agglutinin are described and compared with those of abrin-a to a certain extent. In subsequent docking research, we found that Asn200 of abrin-a may form a critical hydrogen bond with G4323 of 28S rRNA, while the corresponding Pro199 of agglutinin is a kinked hydrophobic residue bound with the cleft in a more compact complementary relationship. This may explain the lower toxicity of agglutinin compared with abrin-a, despite the similarity in secondary structure and in the active cleft of the two RIPs.
Affiliation(s)
- Jack Cheng
- Department of Physics, National Tsing Hua University, Hsinchu 30013, Taiwan
|
49
|
Kahraman A, Morris RJ, Laskowski RA, Favia AD, Thornton JM. On the diversity of physicochemical environments experienced by identical ligands in binding pockets of unrelated proteins. Proteins 2010; 78:1120-36. [PMID: 19927322 DOI: 10.1002/prot.22633] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/11/2022]
Abstract
Most function prediction methods that identify cognate ligands from binding site analyses work on the assumption of molecular complementarity. These approaches build on the conjectured complementarity of geometrical and physicochemical properties between ligands and binding sites so that similar binding sites will bind similar ligands. We found that this assumption does not generally hold for protein-ligand interactions and observed that it is not the chemical composition of ligand molecules that dictates the complementarity between protein and ligand molecules, but that the ligand's share within the functional mechanism of a protein determines the degree of complementarity. Here, we present for a set of cognate ligands a descriptive analysis and comparison of the physicochemical properties that each ligand experiences in various nonhomologous binding pockets. The comparisons in each ligand set reveal large variations in their experienced physicochemical properties, suggesting that the same ligand can bind to distinct physicochemical environments. In some protein-ligand complexes, the variation was found to correlate with the electrochemical characteristic of ligand molecules, whereas in others it was disclosed as a prerequisite for the biochemical function of the protein. To achieve binding, proteins were observed to engage in subtle balancing acts between electrostatic and hydrophobic interactions to generate stabilizing free energies of binding. For the presented analysis, a new method for scoring hydrophobicity from molecular environments was developed, showing high correlations with experimentally determined desolvation energies. The presented results highlight the complexities of molecular recognition and underline the challenges of computational structural biology in developing methods to detect these important subtleties.
Affiliation(s)
- Abdullah Kahraman
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, United Kingdom.
|
50
|
Triviño JC, Pazos F. Quantitative global studies of reactomes and metabolomes using a vectorial representation of reactions and chemical compounds. BMC SYSTEMS BIOLOGY 2010; 4:46. [PMID: 20406431 PMCID: PMC2883543 DOI: 10.1186/1752-0509-4-46] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 05/27/2009] [Accepted: 04/20/2010] [Indexed: 12/02/2022]
Abstract
Background Global studies of the protein repertories of organisms are providing important information on the characteristics of the protein space. Many of these studies entail classification of the protein repertory on the basis of structure and/or sequence similarities. The situation is different for metabolism. Because there is no good way of measuring similarities between chemical reactions, there is a barrier to the development of global classifications of "metabolic space" and subsequent studies comparable to those done for protein sequences and structures. Results In this work, we propose a vectorial representation of chemical reactions, which allows them to be compared and classified. In this representation, chemical compounds, reactions and pathways may be represented in the same vectorial space. We show that the representation of chemical compounds reflects their physicochemical properties and can be used for predictive purposes. We use the vectorial representations of reactions to perform a global classification of the reactome of the model organism E. coli. Conclusions We show that this unsupervised clustering results in groups of enzymes more coherent in biological terms than equivalent groupings obtained from the EC hierarchy. This hierarchical clustering produces an optimal set of 21 groups which we analyzed for their biological meaning.
Affiliation(s)
- Juan C Triviño
- Computational Systems Biology Group, National Centre for Biotechnology (CNB-CSIC), C/Darwin, 3, Cantoblanco, 28049 Madrid, Spain
|