1
Bordin N, Lau AM, Orengo C. Large-scale clustering of AlphaFold2 3D models shines light on the structure and function of proteins. Mol Cell 2023; 83:3950-3952. [PMID: 37977115 DOI: 10.1016/j.molcel.2023.10.039] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 10/27/2023] [Revised: 10/27/2023] [Accepted: 10/27/2023] [Indexed: 11/19/2023]
Abstract
Two recent studies exploited ultra-fast structural aligners and deep-learning approaches to cluster the protein structure space in the AlphaFold Database. Barrio-Hernandez et al.¹ and Durairaj et al.² uncovered fascinating new protein functions and structural features previously unknown.
Affiliation(s)
- Nicola Bordin: Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK.
- Andy M Lau: Department of Computer Science, University College London, London WC1E 6BT, UK.
- Christine Orengo: Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK.
2
Wilson IA, Stanfield RL. 50 Years of structural immunology. J Biol Chem 2021; 296:100745. [PMID: 33957119 PMCID: PMC8163984 DOI: 10.1016/j.jbc.2021.100745] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 02/09/2021] [Revised: 03/24/2021] [Accepted: 04/30/2021] [Indexed: 12/12/2022] Open
Abstract
Fifty years ago, the first landmark structures of antibodies heralded the dawn of structural immunology. Momentum then started to build toward understanding how antibodies could recognize the vast universe of potential antigens and how antibody-combining sites could be tailored to engage antigens with high specificity and affinity through recombination of germline genes (V, D, J) and somatic mutation. Equivalent groundbreaking structures in the cellular immune system appeared some 15 to 20 years later and illustrated how processed protein antigens in the form of peptides are presented by MHC molecules to T cell receptors. Structures of antigen receptors in the innate immune system then explained their inherent specificity for particular microbial antigens including lipids, carbohydrates, nucleic acids, small molecules, and specific proteins. These two sides of the immune system act immediately (innate) to particular microbial antigens or evolve (adaptive) to attain high specificity and affinity to a much wider range of antigens. We also include examples of other key receptors in the immune system (cytokine receptors) that regulate immunity and inflammation. Furthermore, these antigen receptors use a limited set of protein folds to accomplish their various immunological roles. The other main players are the antigens themselves. We focus on surface glycoproteins in enveloped viruses including SARS-CoV-2 that enable entry and egress into host cells and are targets for the antibody response. This review covers what we have learned over the past half century about the structural basis of the immune response to microbial pathogens and how that information can be utilized to design vaccines and therapeutics.
MESH Headings
- Adaptive Immunity
- Allergy and Immunology/history
- Animals
- Antibodies, Viral/chemistry
- Antibodies, Viral/genetics
- Antibodies, Viral/immunology
- Antibody Specificity
- Antigen Presentation
- Antigens, Viral/chemistry
- Antigens, Viral/genetics
- Antigens, Viral/immunology
- COVID-19/immunology
- COVID-19/virology
- Crystallography/history
- Crystallography/methods
- History, 20th Century
- History, 21st Century
- Humans
- Immunity, Innate
- Protein Folding
- Protein Interaction Domains and Motifs
- Receptors, Antigen, T-Cell/chemistry
- Receptors, Antigen, T-Cell/genetics
- Receptors, Antigen, T-Cell/immunology
- Receptors, Cytokine/chemistry
- Receptors, Cytokine/genetics
- Receptors, Cytokine/immunology
- SARS-CoV-2/immunology
- SARS-CoV-2/pathogenicity
- V(D)J Recombination
Affiliation(s)
- Ian A Wilson: Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, California, USA; The Skaggs Institute for Chemical Biology, The Scripps Research Institute, La Jolla, California, USA.
- Robyn L Stanfield: Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, California, USA.
3
Retraction: Site‐specific recombination of nitrogen‐fixation genes in cyanobacteria by XisF–XisH–XisI complex: Structures and models, William C. Hwang, James W. Golden, Jaime Pascual, Dong Xu, Anton Cheltsov, Adam Godzik. Proteins 2018; 86:268. [PMID: 30338965 PMCID: PMC5094899 DOI: 10.1002/prot.24679] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 12/14/2022]
Abstract
The above article from Proteins: Structure, Function, and Bioinformatics, published online on 1 September 2014 in Wiley Online Library as Accepted Article (http://onlinelibrary.wiley.com/doi/10.1002/prot.24679/full), has been retracted by agreement between William C. Hwang, James W. Golden, Jaime Pascual, Dong Xu, Anton Cheltsov, Adam Godzik, the Editor‐in‐Chief, Bertrand E. Garcia‐Moreno, and Wiley Periodicals, Inc. The retraction has been agreed because submission was made without agreement from co‐author Adam Godzik.
4
Functional classification of protein toxins as a basis for bioinformatic screening. Sci Rep 2017; 7:13940. [PMID: 29066768 PMCID: PMC5655178 DOI: 10.1038/s41598-017-13957-1] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 07/27/2017] [Accepted: 10/02/2017] [Indexed: 01/05/2023] Open
Abstract
Proteins are fundamental to life and exhibit a wide diversity of activities, some of which are toxic. Therefore, assessing whether a specific protein is safe for consumption in foods and feeds is critical. Simple BLAST searches may reveal homology to a known toxin, when in fact the protein may pose no real danger. Another challenge in making this assessment is the lack of curated databases with a representative set of experimentally validated toxins. Here we have systematically analyzed over 10,000 manually curated toxin sequences using sequence clustering, network analysis, and protein domain classification. We also developed a functional sequence signature method to distinguish toxic from non-toxic proteins. The current database, combined with motif analysis, can be used by researchers and regulators in a hazard screening capacity to assess the potential of a protein to be toxic at early stages of development. Identifying key signatures of toxicity can also aid in redesigning proteins, so as to maintain their desirable functions while reducing the risk of potential health hazards.
5
Lam SD, Das S, Sillitoe I, Orengo C. An overview of comparative modelling and resources dedicated to large-scale modelling of genome sequences. Acta Crystallogr D Struct Biol 2017; 73:628-640. [PMID: 28777078 PMCID: PMC5571743 DOI: 10.1107/s2059798317008920] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.3] [Received: 11/17/2016] [Accepted: 06/14/2017] [Indexed: 12/02/2022] Open
Abstract
Computational modelling of proteins has been a major catalyst in structural biology. Bioinformatics groups have exploited the repositories of known structures to predict high-quality structural models with high efficiency at low cost. This article provides an overview of comparative modelling, reviews recent developments and describes resources dedicated to large-scale comparative modelling of genome sequences. The value of subclustering protein domain superfamilies to guide the template-selection process is investigated. Some recent cases in which structural modelling has aided experimental work to determine very large macromolecular complexes are also cited.
Affiliation(s)
- Su Datt Lam: Institute of Structural and Molecular Biology, UCL, Darwin Building, Gower Street, London WC1E 6BT, England; School of Biosciences and Biotechnology, Faculty of Science and Technology, University Kebangsaan Malaysia, 43600 Bangi, Selangor, Malaysia
- Sayoni Das: Institute of Structural and Molecular Biology, UCL, Darwin Building, Gower Street, London WC1E 6BT, England
- Ian Sillitoe: Institute of Structural and Molecular Biology, UCL, Darwin Building, Gower Street, London WC1E 6BT, England
- Christine Orengo: Institute of Structural and Molecular Biology, UCL, Darwin Building, Gower Street, London WC1E 6BT, England
6
The impact of structural genomics: the first quindecennial. J Struct Funct Genomics 2016; 17:1-16. [PMID: 26935210 DOI: 10.1007/s10969-016-9201-5] [Citation(s) in RCA: 40] [Impact Index Per Article: 4.4] [Received: 12/12/2015] [Accepted: 02/17/2016] [Indexed: 12/21/2022]
Abstract
The period 2000-2015 brought the advent of high-throughput approaches to protein structure determination. With overall funding on the order of $2 billion (in 2010 dollars), the structural genomics (SG) consortia established worldwide have developed pipelines for target selection, protein production, sample preparation, crystallization, and structure determination by X-ray crystallography and NMR. These efforts resulted in the determination of over 13,500 protein structures, mostly from unique protein families, and increased the structural coverage of the expanding protein universe. SG programs contributed over 4400 publications to the scientific literature. The NIH-funded Protein Structure Initiatives alone have produced over 2000 scientific publications, which to date have attracted more than 93,000 citations. Software and database developments that were necessary to handle high-throughput structure determination workflows have led to structures of better quality and improved integrity of the associated data. Organized and accessible data have a positive impact on the reproducibility of scientific experiments. Most of the experimental data generated by the SG centers are freely available to the community and have been utilized by scientists in various fields of research. SG projects have created, improved, streamlined, and validated many protocols for protein production and crystallization, data collection, and functional analysis, significantly benefiting biological and biomedical research.
7
Deng K, Takasuka TE, Bianchetti CM, Bergeman LF, Adams PD, Northen TR, Fox BG. Use of Nanostructure-Initiator Mass Spectrometry to Deduce Selectivity of Reaction in Glycoside Hydrolases. Front Bioeng Biotechnol 2015; 3:165. [PMID: 26579511 PMCID: PMC4621489 DOI: 10.3389/fbioe.2015.00165] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 07/28/2015] [Accepted: 10/02/2015] [Indexed: 12/20/2022] Open
Abstract
Chemically synthesized nanostructure-initiator mass spectrometry (NIMS) probes derivatized with tetrasaccharides were used to study the reactivity of representative Clostridium thermocellum β-glucosidase, endoglucanases, and cellobiohydrolase. Diagnostic patterns for reactions of these different classes of enzymes were observed. Results show sequential removal of glucose by the β-glucosidase and a progressive increase in specificity of reaction from endoglucanases to cellobiohydrolase. Time-dependent reactions of these polysaccharide-selective enzymes were modeled by numerical integration, which provides a quantitative basis to make functional distinctions among a continuum of naturally evolved catalytic properties. Consequently, our method, which combines automated protein translation with high-sensitivity and time-dependent detection of multiple products, provides a new approach to annotate glycoside hydrolase phylogenetic trees with functional measurements.
Affiliation(s)
- Kai Deng: US Department of Energy Joint BioEnergy Institute, Emeryville, CA, USA; Sandia National Laboratories, Livermore, CA, USA
- Taichi E Takasuka: US Department of Energy Great Lakes Bioenergy Research Center, Madison, WI, USA
- Christopher M Bianchetti: US Department of Energy Great Lakes Bioenergy Research Center, Madison, WI, USA; Department of Chemistry, University of Wisconsin-Oshkosh, Oshkosh, WI, USA
- Lai F Bergeman: US Department of Energy Great Lakes Bioenergy Research Center, Madison, WI, USA
- Paul D Adams: US Department of Energy Joint BioEnergy Institute, Emeryville, CA, USA; Lawrence Berkeley National Laboratory, Berkeley, CA, USA; Department of Bioengineering, University of California Berkeley, Berkeley, CA, USA
- Trent R Northen: US Department of Energy Joint BioEnergy Institute, Emeryville, CA, USA; Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Brian G Fox: US Department of Energy Great Lakes Bioenergy Research Center, Madison, WI, USA; Department of Biochemistry, University of Wisconsin-Madison, Madison, WI, USA
8
An assessment of the amount of untapped fold level novelty in under-sampled areas of the tree of life. Sci Rep 2015; 5:14717. [PMID: 26434770 PMCID: PMC4592975 DOI: 10.1038/srep14717] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 05/27/2015] [Accepted: 09/07/2015] [Indexed: 11/14/2022] Open
Abstract
Previous studies of protein fold space suggest that fold coverage is plateauing. However, sequence sampling has been (and remains to a large extent) heavily biased, focusing on culturable phyla. Sustained technological developments have fuelled the advent of metagenomics and single-cell sequencing, which might correct the current sequencing bias. The extent to which these efforts affect structural diversity remains unclear, although preliminary results suggest that uncultured organisms could constitute a source of new folds. We investigate to what extent genomes from uncultured and under-sampled phyla accessed through single-cell sequencing, metagenomics and high-throughput culturing efforts have the potential to increase protein fold space, and conclude that i) genomes from under-sampled phyla appear enriched in sequences not covered by current protein family and fold profile libraries, ii) this enrichment is linked to an excess of short (and possibly partly spurious) sequences in some of the datasets, iii) the discovery rate of novel folds among sequences uncovered by current fold and family profile libraries may be as high as 36%, but would ultimately translate into a marginal increase in global discovery of novel folds. Thus, genomes from under-sampled phyla should have a rather limited impact on increasing coarse-grained tertiary structure level novelty.
9
Vallat B, Madrid-Aliste C, Fiser A. Modularity of Protein Folds as a Tool for Template-Free Modeling of Structures. PLoS Comput Biol 2015; 11:e1004419. [PMID: 26252221 PMCID: PMC4529212 DOI: 10.1371/journal.pcbi.1004419] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Received: 03/10/2015] [Accepted: 06/30/2015] [Indexed: 12/25/2022] Open
Abstract
Predicting the three-dimensional structure of proteins from their amino acid sequences remains a challenging problem in molecular biology. While the current structural coverage of proteins is almost exclusively provided by template-based techniques, the modeling of the rest of the protein sequences increasingly requires template-free methods. However, template-free modeling methods are much less reliable and are usually applicable for smaller proteins, leaving much space for improvement. We present here a novel computational method that uses a library of supersecondary structure fragments, known as Smotifs, to model protein structures. The library of Smotifs has saturated over time, providing a theoretical foundation for efficient modeling. The method relies on weak sequence signals from remotely related protein structures to create a library of Smotif fragments specific to the target protein sequence. This Smotif library is exploited in a fragment assembly protocol to sample decoys, which are assessed by a composite scoring function. Since the Smotif fragments are larger in size compared to the ones used in other fragment-based methods, the proposed modeling algorithm, SmotifTF, can employ an exhaustive sampling during decoy assembly. SmotifTF successfully predicts the overall fold of the target proteins in about 50% of the test cases and performs competitively when compared to other state-of-the-art prediction methods, especially when sequence signal to remote homologs is diminishing. Smotif-based modeling is complementary to current prediction methods and provides a promising direction in addressing the structure prediction problem, especially when targeting larger proteins for modeling.
Each protein folds into a unique three-dimensional structure that enables it to carry out its biological function. Knowledge of the atomic details of protein structures is therefore a key to understanding their function. Advances in high-throughput experimental technologies have led to an exponential increase in the availability of known protein sequences. Although strong progress has been made in experimental protein structure determination, it remains a fact that more than 99% of structural information is provided by computational modeling methods. We describe here a novel structure prediction method, SmotifTF, which uses a unique library of known protein fragments to assemble the three-dimensional structure of a sequence. The fragment library has saturated over time and therefore provides a complete set of building blocks required for model building. The method performs competitively compared to existing methods of structure prediction.
Affiliation(s)
- Brinda Vallat: Department of Systems and Computational Biology, Albert Einstein College of Medicine, Bronx, New York, New York, United States of America
- Carlos Madrid-Aliste: Department of Systems and Computational Biology, Albert Einstein College of Medicine, Bronx, New York, New York, United States of America
- Andras Fiser: Department of Systems and Computational Biology, Albert Einstein College of Medicine, Bronx, New York, New York, United States of America
10
Tsai Y, Holton T, Yeates TO. Diffusion accessibility as a method for visualizing macromolecular surface geometry. Protein Sci 2015; 24:1702-5. [PMID: 26189444 DOI: 10.1002/pro.2752] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 07/07/2015] [Accepted: 07/15/2015] [Indexed: 11/10/2022]
Abstract
Important three-dimensional spatial features such as depth and surface concavity can be difficult to convey clearly in the context of two-dimensional images. In the area of macromolecular visualization, the computer graphics technique of ray-tracing can be helpful, but further techniques for emphasizing surface concavity can give clearer perceptions of depth. The notion of diffusion accessibility is well-suited for emphasizing such features of macromolecular surfaces, but a method for calculating diffusion accessibility has not been made widely available. Here we make available a web-based platform that performs the necessary calculation by solving the Laplace equation for steady state diffusion, and produces scripts for visualization that emphasize surface depth by coloring according to diffusion accessibility. The URL is http://services.mbi.ucla.edu/DiffAcc/.
Affiliation(s)
- Yingssu Tsai: Department of Chemistry and Biochemistry, University of California, Los Angeles
- Thomas Holton: UCLA-DOE Institute for Genomics and Proteomics, Los Angeles, CA
- Todd O Yeates: Department of Chemistry and Biochemistry, University of California, Los Angeles; UCLA-DOE Institute for Genomics and Proteomics, Los Angeles, CA
11
Yang YS, Fernandez B, Lagorce A, Aloin V, De Guillen KM, Boyer JB, Dedieu A, Confalonieri F, Armengaud J, Roumestand C. Prioritizing targets for structural biology through the lens of proteomics: the archaeal protein TGAM_1934 from Thermococcus gammatolerans. Proteomics 2015; 15:114-23. [PMID: 25359407 DOI: 10.1002/pmic.201300535] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 12/03/2013] [Revised: 10/01/2014] [Accepted: 10/24/2014] [Indexed: 11/09/2022]
Abstract
ORFans are hypothetical proteins lacking any significant sequence similarity with other proteins. Here, we highlighted by quantitative proteomics the TGAM_1934 ORFan from the hyperradioresistant Thermococcus gammatolerans archaeon as one of the most abundant hypothetical proteins. This protein has been selected as a priority target for structure determination on the basis of its abundance in three cellular conditions. Its solution structure has been determined using multidimensional heteronuclear NMR spectroscopy. TGAM_1934 displays an original fold, although sharing some similarities with the 3D structure of the bacterial ortholog of frataxin, CyaY, a protein conserved in bacteria and eukaryotes and involved in iron-sulfur cluster biogenesis. These results highlight the potential of structural proteomics in prioritizing ORFan targets for structure determination based on quantitative proteomics data. The proteomic data and structure coordinates have been deposited to the ProteomeXchange with identifier PXD000402 (http://proteomecentral.proteomexchange.org/dataset/PXD000402) and Protein Data Bank under the accession number 2mcf, respectively.
Affiliation(s)
- Yin-Shan Yang: Centre de Biochimie Structurale, Universités de Montpellier, Montpellier, France
12
P2RANK: Knowledge-Based Ligand Binding Site Prediction Using Aggregated Local Features. Algorithms for Computational Biology 2015. [DOI: 10.1007/978-3-319-21233-3_4] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 12/12/2022]
13
Mizianty MJ, Fan X, Yan J, Chalmers E, Woloschuk C, Joachimiak A, Kurgan L. Covering complete proteomes with X-ray structures: a current snapshot. Acta Crystallogr D Biol Crystallogr 2014; 70:2781-93. [PMID: 25372670 PMCID: PMC4220968 DOI: 10.1107/s1399004714019427] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.6] [Received: 04/23/2014] [Accepted: 08/27/2014] [Indexed: 12/23/2022]
Abstract
Structural genomics programs have developed and applied structure-determination pipelines to a wide range of protein targets, facilitating the visualization of macromolecular interactions and the understanding of their molecular and biochemical functions. The fundamental question of whether three-dimensional structures of all proteins and all functional annotations can be determined using X-ray crystallography is investigated. A first-of-its-kind large-scale analysis of crystallization propensity for all proteins encoded in 1953 fully sequenced genomes was performed. It is shown that current X-ray crystallographic knowhow combined with homology modeling can provide structures for 25% of modeling families (protein clusters for which structural models can be obtained through homology modeling), with at least one structural model produced for each Gene Ontology functional annotation. The coverage varies between superkingdoms, with 19% for eukaryotes, 35% for bacteria and 49% for archaea, and with those of viruses following the coverage values of their hosts. It is shown that the crystallization propensities of proteomes from the taxonomic superkingdoms are distinct. The use of knowledge-based target selection is shown to substantially increase the ability to produce X-ray structures. It is demonstrated that the human proteome has one of the highest attainable coverage values among eukaryotes, and GPCR membrane proteins suitable for X-ray structure determination were determined.
Affiliation(s)
- Marcin J. Mizianty: Electrical and Computer Engineering, University of Alberta, Edmonton, Alberta T6G 2V4, Canada
- Xiao Fan: Electrical and Computer Engineering, University of Alberta, Edmonton, Alberta T6G 2V4, Canada
- Jing Yan: Electrical and Computer Engineering, University of Alberta, Edmonton, Alberta T6G 2V4, Canada
- Eric Chalmers: Electrical and Computer Engineering, University of Alberta, Edmonton, Alberta T6G 2V4, Canada
- Christopher Woloschuk: Electrical and Computer Engineering, University of Alberta, Edmonton, Alberta T6G 2V4, Canada
- Andrzej Joachimiak: Midwest Center for Structural Genomics, Argonne National Laboratory, Argonne, IL 60439, USA
- Lukasz Kurgan: Electrical and Computer Engineering, University of Alberta, Edmonton, Alberta T6G 2V4, Canada
14
Huang YJ, Mao B, Aramini JM, Montelione GT. Assessment of template-based protein structure predictions in CASP10. Proteins 2014; 82 Suppl 2:43-56. [PMID: 24323734 DOI: 10.1002/prot.24488] [Citation(s) in RCA: 82] [Impact Index Per Article: 7.5] [Received: 07/30/2013] [Revised: 11/10/2013] [Accepted: 11/19/2013] [Indexed: 12/27/2022]
Abstract
Template-based modeling (TBM) is a major component of the critical assessment of protein structure prediction (CASP). In CASP10, some 41,740 predicted models submitted by 150 predictor groups were assessed as TBM predictions. The accuracy of protein structure prediction was assessed by geometric comparison with experimental X-ray crystal and NMR structures using a composite score that included both global alignment metrics and distance-matrix-based metrics. These included GDT-HA and GDC-all global alignment scores, and the superimposition-independent LDDT distance-matrix-based score. In addition, a superimposition-independent RPF metric, similar to that described previously for comparing protein models against experimental NMR data, was used for comparing predicted protein structure models against experimental protein structures. To score well on all four of these metrics, models must feature accurate predictions of both backbone and side-chain conformations. Performance rankings were determined independently for server and the combined server plus human-curated predictor groups. Final rankings were made using paired head-to-head Student's t-test analysis of raw metric scores among the top 25 performing groups in each category.
Affiliation(s)
- Yuanpeng J Huang: Center for Advanced Biotechnology and Medicine and Department of Molecular Biology and Biochemistry, Rutgers, The State University of New Jersey, Piscataway, New Jersey, 08854; Department of Biochemistry and Molecular Biology, Robert Wood Johnson Medical School, Rutgers, The State University of New Jersey, Piscataway, New Jersey, 08854; Northeast Structural Genomics Consortium, Rutgers, The State University of New Jersey, Piscataway, New Jersey, 08854
15
Kosciolek T, Jones DT. De novo structure prediction of globular proteins aided by sequence variation-derived contacts. PLoS One 2014; 9:e92197. [PMID: 24637808 PMCID: PMC3956894 DOI: 10.1371/journal.pone.0092197] [Citation(s) in RCA: 81] [Impact Index Per Article: 7.4] [Received: 11/06/2013] [Accepted: 02/19/2014] [Indexed: 12/21/2022] Open
Abstract
The advent of high-accuracy residue-residue intra-protein contact prediction methods enabled a significant boost in the quality of de novo structure predictions. Here, we investigate the potential benefits of combining a well-established fragment-based folding algorithm, FRAGFOLD, with PSICOV, a contact prediction method that uses sparse inverse covariance estimation to identify co-varying sites in multiple sequence alignments. Using a comprehensive set of 150 diverse globular target proteins, up to 266 amino acids in length, we are able to address the effectiveness and some limitations of such approaches to globular proteins in practice. Overall we find that using fragment assembly with both statistical potentials and predicted contacts is significantly better than either statistical potentials or contacts alone. Results show up to nearly 80% of correct predictions (TM-score ≥0.5) within the analysed dataset and a mean TM-score of 0.54. Unsuccessful modelling cases emerged either from conformational sampling problems, or insufficient contact prediction accuracy. Nevertheless, a strong dependency of the quality of final models on the fraction of satisfied predicted long-range contacts was observed. This not only highlights the importance of these contacts on determining the protein fold, but also (combined with other ensemble-derived qualities) provides a powerful guide as to the choice of correct models and the global quality of the selected model. A proposed quality assessment scoring function achieves 0.93 precision and 0.77 recall for the discrimination of correct folds on our dataset of decoys. These findings suggest the approach is well-suited for blind predictions on a variety of globular proteins of unknown 3D structure, provided that enough homologous sequences are available to construct a large and accurate multiple sequence alignment for the initial contact prediction step.
Affiliation(s)
- Tomasz Kosciolek: Bioinformatics Group, Department of Computer Science, University College London, London, United Kingdom; Institute of Structural and Molecular Biology, University College London, London, United Kingdom
- David T. Jones: Bioinformatics Group, Department of Computer Science, University College London, London, United Kingdom; Institute of Structural and Molecular Biology, University College London, London, United Kingdom
16
Trends in structural coverage of the protein universe and the impact of the Protein Structure Initiative. Proc Natl Acad Sci U S A 2014; 111:3733-8. [PMID: 24567391 DOI: 10.1073/pnas.1321614111] [Citation(s) in RCA: 63] [Impact Index Per Article: 5.7] [Indexed: 11/18/2022] Open
Abstract
The exponential growth of protein sequence data provides an ever-expanding body of unannotated and misannotated proteins. The National Institutes of Health-supported Protein Structure Initiative and related worldwide structural genomics efforts facilitate functional annotation of proteins through structural characterization. Recently there have been profound changes in the taxonomic composition of sequence databases, which are effectively redefining the scope and contribution of these large-scale structure-based efforts. The faster-growing bacterial genomic entries have overtaken the eukaryotic entries over the last 5 y, but also have become more redundant. Despite the enormous increase in the number of sequences, the overall structural coverage of proteins (including proteins for which reliable homology models can be generated) on the residue level has increased from 30% to 40% over the last 10 y. Structural genomics efforts contributed ∼50% of this new structural coverage, despite determining only ∼10% of all new structures. Based on current trends, it is expected that ∼55% structural coverage (the level required for significant functional insight) will be achieved within 15 y, whereas without structural genomics efforts, realizing this goal will take approximately twice as long.
|
17
|
Zimmerman MD, Grabowski M, Domagalski MJ, Maclean EM, Chruszcz M, Minor W. Data management in the modern structural biology and biomedical research environment. Methods Mol Biol 2014; 1140:1-25. [PMID: 24590705 PMCID: PMC4086192 DOI: 10.1007/978-1-4939-0354-2_1] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.6] [Indexed: 04/13/2023]
Abstract
Modern high-throughput structural biology laboratories produce vast amounts of raw experimental data. The traditional method of data reduction is very simple: results are summarized in peer-reviewed publications, which are hopefully published in high-impact journals. By their nature, publications include only the most important results derived from experiments that may have been performed over the course of many years. The main content of the published paper is a concise compilation of these data, an interpretation of the experimental results, and a comparison of these results with those obtained by other scientists. Due to an avalanche of structural biology manuscripts submitted to scientific journals, in many recent cases descriptions of experimental methodology (and sometimes even experimental results) are pushed to supplementary materials that are only published online and sometimes may not be reviewed as thoroughly as the main body of a manuscript. Trouble may arise when experimental results contradict the results obtained by other scientists, which requires (in the best case) the reexamination of the original raw data or independent repetition of the experiment according to the published description of the experiment. There are reports that a significant fraction of experiments performed in academic laboratories cannot be repeated in an industrial environment (Begley CG & Ellis LM, Nature 483(7391):531-3, 2012). This is not an indication of scientific fraud but rather reflects the inadequate description of experiments performed on different equipment and on biological samples that were produced with disparate methods.
For that reason the goal of a modern data management system is not only the simple replacement of the laboratory notebook by an electronic one but also the creation of a sophisticated, internally consistent, scalable data management system that will combine data obtained by a variety of experiments performed by various individuals on diverse equipment. All data should be stored in a core database that can be used by custom applications to prepare internal reports, statistics, and perform other functions that are specific to the research that is pursued in a particular laboratory. This chapter presents a general overview of the methods of data management and analysis used by structural genomics (SG) programs. In addition to a review of the existing literature on the subject, also presented is experience in the development of two SG data management systems, UniTrack and LabDB. The description is targeted to a general audience, as some technical details have been (or will be) published elsewhere. The focus is on "data management," meaning the process of gathering, organizing, and storing data, but also briefly discussed is "data mining," the process of analysis ideally leading to an understanding of the data. In other words, data mining is the conversion of data into information. Clearly, effective data management is a precondition for any useful data mining. If done properly, gathering details on millions of experiments on thousands of proteins and making them publicly available for analysis, even after the projects themselves have ended, may turn out to be one of the most important benefits of SG programs.
Affiliation(s)
- Matthew D Zimmerman
- Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA, USA
|
18
|
Pulavarti SVSRK, Eletsky A, Lee HW, Acton TB, Xiao R, Everett JK, Prestegard JH, Montelione GT, Szyperski T. Solution NMR structure of CD1104B from pathogenic Clostridium difficile reveals a distinct α-helical architecture and provides first structural representative of protein domain family PF14203. JOURNAL OF STRUCTURAL AND FUNCTIONAL GENOMICS 2013; 14:155-160. [PMID: 24048810 PMCID: PMC3844015 DOI: 10.1007/s10969-013-9164-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 07/03/2013] [Accepted: 09/10/2013] [Indexed: 05/30/2023]
Abstract
A high-quality structure of the 68-residue protein CD1104B from Clostridium difficile strain 630 exhibits a distinct all α-helical fold. The structure presented here is the first representative of bacterial protein domain family PF14203 (currently 180 members) of unknown function (DUF4319) and reveals that the side-chains of the only two strictly conserved residues (Glu 8 and Lys 48) form a salt bridge. Moreover, these two residues are located in the vicinity of the largest surface cleft which is predicted to contribute to a surface area involved in protein-protein interactions. This, along with its coding in transposon CTn4, suggests that CD1104B (and very likely all members of Pfam 14203) functions by interacting with other proteins required for the transfer of transposons between different bacterial species.
Affiliation(s)
- Surya VSRK Pulavarti
- Department of Chemistry, The State University of New York at Buffalo, and Northeast Structural Genomics Consortium, Buffalo, NY 14260, USA
| | - Alexander Eletsky
- Department of Chemistry, The State University of New York at Buffalo, and Northeast Structural Genomics Consortium, Buffalo, NY 14260, USA
| | - Hsiau-Wei Lee
- Complex Carbohydrate Research Center, University of Georgia, and Northeast Structural Genomics Consortium, Athens, GA 30602, USA
| | - Thomas B. Acton
- Center of Advanced Biotechnology and Medicine and Department of Molecular Biology and Biochemistry, Rutgers, The State University of New Jersey and Northeast Structural Genomics Consortium, Piscataway, NJ 08854, USA
| | - Rong Xiao
- Center of Advanced Biotechnology and Medicine and Department of Molecular Biology and Biochemistry, Rutgers, The State University of New Jersey and Northeast Structural Genomics Consortium, Piscataway, NJ 08854, USA
| | - John K. Everett
- Center of Advanced Biotechnology and Medicine and Department of Molecular Biology and Biochemistry, Rutgers, The State University of New Jersey and Northeast Structural Genomics Consortium, Piscataway, NJ 08854, USA
| | - James H. Prestegard
- Complex Carbohydrate Research Center, University of Georgia, and Northeast Structural Genomics Consortium, Athens, GA 30602, USA
| | - Gaetano T. Montelione
- Center of Advanced Biotechnology and Medicine and Department of Molecular Biology and Biochemistry, Rutgers, The State University of New Jersey and Northeast Structural Genomics Consortium, Piscataway, NJ 08854, USA, Department of Biochemistry and Molecular Biology, Robert Wood Johnson Medical School, UMDNJ, Piscataway NJ 08854, USA
| | - Thomas Szyperski
- Department of Chemistry, The State University of New York at Buffalo, and Northeast Structural Genomics Consortium, Buffalo, NY 14260, USA
|
19
|
DePietro PJ, Julfayev ES, McLaughlin WA. Quantification of the impact of PSI:Biology according to the annotations of the determined structures. BMC STRUCTURAL BIOLOGY 2013; 13:24. [PMID: 24139526 PMCID: PMC4016320 DOI: 10.1186/1472-6807-13-24] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 06/14/2013] [Accepted: 10/14/2013] [Indexed: 11/23/2022]
Abstract
Background: Protein Structure Initiative:Biology (PSI:Biology) is the third phase of PSI where protein structures are determined in high-throughput to characterize their biological functions. The transition to the third phase entailed the formation of PSI:Biology Partnerships which are composed of structural genomics centers and biomedical science laboratories. We present a method to examine the impact of protein structures determined under the auspices of PSI:Biology by measuring their rates of annotations. The mean numbers of annotations per structure and per residue are examined. These are designed to provide measures of the amount of structure to function connections that can be leveraged from each structure.
Results: One result is that PSI:Biology structures are found to have a higher rate of annotations than structures determined during the first two phases of PSI. A second result is that the subset of PSI:Biology structures determined through PSI:Biology Partnerships have a higher rate of annotations than those determined exclusive of those partnerships. Both results hold when the annotation rates are examined either at the level of the entire protein or for annotations that are known to fall at specific residues within the portion of the protein that has a determined structure.
Conclusions: We conclude that PSI:Biology determines structures that are estimated to have a higher degree of biomedical interest than those determined during the first two phases of PSI based on a broad array of biomedical annotations. For the PSI:Biology Partnerships, we see that there is an associated added value that represents part of the progress toward the goals of PSI:Biology. We interpret the added value to mean that team-based structural biology projects that utilize the expertise and technologies of structural genomics centers together with biological laboratories in the community are conducted in a synergistic manner. We show that the annotation rates can be used in conjunction with established metrics, i.e. the numbers of structures and impact of publication records, to monitor the progress of PSI:Biology towards its goals of examining structure to function connections of high biomedical relevance. The metric provides an objective means to quantify the overall impact of PSI:Biology as it uses biomedical annotations from external sources.
Affiliation(s)
| | | | - William A McLaughlin
- Department of Basic Science, The Commonwealth Medical College, 525 Pine Street, Scranton, PA 18509, USA.
|
20
|
Mistry J, Kloppmann E, Rost B, Punta M. An estimated 5% of new protein structures solved today represent a new Pfam family. ACTA CRYSTALLOGRAPHICA SECTION D: BIOLOGICAL CRYSTALLOGRAPHY 2013; 69:2186-93. [PMID: 24189229 PMCID: PMC3817691 DOI: 10.1107/s0907444913027157] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Received: 04/17/2013] [Accepted: 10/02/2013] [Indexed: 01/09/2023]
Abstract
High-resolution structural knowledge is key to understanding how proteins function at the molecular level. The number of entries in the Protein Data Bank (PDB), the repository of all publicly available protein structures, continues to increase, with more than 8000 structures released in 2012 alone. The authors of this article have studied how structural coverage of the protein-sequence space has changed over time by monitoring the number of Pfam families that acquired their first representative structure each year from 1976 to 2012. Twenty years ago, for every 100 new PDB entries released, an estimated 20 Pfam families acquired their first structure. By 2012, this decreased to only about five families per 100 structures. The reasons behind the slower pace at which previously uncharacterized families are being structurally covered were investigated. It was found that although more than 50% of current Pfam families are still without a structural representative, this set is enriched in families that are small, functionally uncharacterized or rich in problem features such as intrinsically disordered and transmembrane regions. While these are important constraints, the reasons why it may not yet be time to give up the pursuit of a targeted but more comprehensive structural coverage of the protein-sequence space are discussed.
Affiliation(s)
- Jaina Mistry
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, England
|
21
|
In silico mechanistic profiling to probe small molecule binding to sulfotransferases. PLoS One 2013; 8:e73587. [PMID: 24039991 PMCID: PMC3765257 DOI: 10.1371/journal.pone.0073587] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.7] [Received: 05/21/2013] [Accepted: 07/28/2013] [Indexed: 01/01/2023] Open
Abstract
Drug metabolizing enzymes play a key role in the metabolism, elimination and detoxification of xenobiotics, drugs and endogenous molecules. While their principal role is to detoxify organisms by modifying compounds, such as pollutants or drugs, for a rapid excretion, in some cases they render their substrates more toxic thereby inducing severe side effects and adverse drug reactions, or their inhibition can lead to drug–drug interactions. We focus on sulfotransferases (SULTs), a family of phase II metabolizing enzymes, acting on a large number of drugs and hormones and showing important structural flexibility. Here we report a novel in silico structure-based approach to probe ligand binding to SULTs. We explored the flexibility of SULTs by molecular dynamics (MD) simulations in order to identify the most suitable multiple receptor conformations for ligand binding prediction. Then, we employed a structure-based docking-scoring approach to predict ligand binding and finally we combined the predicted interaction energies by using a QSAR methodology. The results showed that our protocol successfully prioritizes potent binders for the SULT1 isoforms studied here and gives new insights into the specific molecular mechanisms of diverse ligands’ binding related to the plasticity of their binding sites. Our best QSAR models, which introduce protein-ligand interaction energies predicted by docking, showed accuracies of 67.28%, 78.00% and 75.46% for the isoforms SULT1A1, SULT1A3 and SULT1E1, respectively. To the best of our knowledge, our protocol is the first in silico structure-based approach consisting of a protein-ligand interaction analysis at atomic level that considers both ligand and enzyme flexibility, along with a QSAR approach, to identify small molecules that can interact with phase II drug-metabolizing enzymes.
|
22
|
Abstract
Docking is the computational method of choice to quickly predict how a low molecular-weight ligand binds to its macromolecular target. Despite persistent problems in predicting binding free energies, docking has undergone significant advances in numerous areas (throughput, target flexibility). The ever increasing availability of high-resolution X-ray structures and the development of more reliable comparative models for proteins of pharmacological interest paved the way to apply protein–ligand docking to multiple targets to predict main targets and off-targets for bioactive compounds and even to repurpose existing drugs. Applying docking to multiple targets brings an additional level of complexity in scoring numerous and heterogeneous docking poses. Despite undeniable successes, proteome-wide docking should, however, be considered with caution with regard to recall and precision of the predictions.
|
23
|
Villoutreix BO, Lagorce D, Labbé CM, Sperandio O, Miteva MA. One hundred thousand mouse clicks down the road: selected online resources supporting drug discovery collected over a decade. Drug Discov Today 2013; 18:1081-9. [PMID: 23831439 DOI: 10.1016/j.drudis.2013.06.013] [Citation(s) in RCA: 66] [Impact Index Per Article: 5.5] [Received: 05/03/2013] [Revised: 06/18/2013] [Accepted: 06/26/2013] [Indexed: 12/17/2022]
Abstract
Online resources enabling and supporting drug discovery have blossomed during the past ten years. However, drug hunters commonly find themselves overwhelmed by the proliferation of these computer-based resources. Ten years ago, we, the authors of this review, felt that a comprehensive list of in silico resources relating to drug discovery was needed, especially because the internet provides a wealth of inspiring tools that, if fully exploited, could greatly assist the process. We present here a compilation of online tools and databases collected over the past decade. The tools were essentially found through literature and internet searches and, currently, our list contains over 1500 URLs. We also briefly highlight some recently reported services and comment on ongoing and future efforts in the field.
Affiliation(s)
- Bruno O Villoutreix
- Université Paris Diderot, Sorbonne Paris Cité, Inserm UMR-S 973, Molécules Thérapeutiques In Silico, 39 rue Helene Brion, 75013 Paris, France.
|
24
|
Johansson MU, Zoete V, Guex N. Recurrent structural motifs in non-homologous protein structures. Int J Mol Sci 2013; 14:7795-814. [PMID: 23574940 PMCID: PMC3645717 DOI: 10.3390/ijms14047795] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 03/08/2013] [Revised: 03/27/2013] [Accepted: 04/01/2013] [Indexed: 11/18/2022] Open
Abstract
We have extracted an extensive collection of recurrent structural motifs (RSMs), which consist of sequentially non-contiguous structural motifs (4–6 residues), each of which appears with very similar conformation in three or more mutually unrelated protein structures. We find that the proteins in our set are covered to a substantial extent by the recurrent non-contiguous structural motifs, especially the helix and strand regions. Computational alanine scanning calculations indicate that the average folding free energy changes upon alanine mutation for most types of non-alanine residues are higher for amino acids that are present in recurrent structural motifs than for amino acids that are not. The non-alanine amino acids that are most common in the recurrent structural motifs, i.e., phenylalanine, isoleucine, leucine, valine and tyrosine and the less abundant methionine and tryptophan, have the largest folding free energy changes. This indicates that the recurrent structural motifs, as we define them, describe recurrent structural patterns that are important for protein stability. In view of their properties, such structural motifs are potentially useful for inter-residue contact prediction and protein structure refinement.
Affiliation(s)
- Maria U. Johansson
- Vital-IT Group, SIB Swiss Institute of Bioinformatics, CH-1015 Lausanne, Switzerland
- Authors to whom correspondence should be addressed; Tel.: +41-21-692-40-86 (M.U.J.); +41-21-692-40-37 (N.G.); Fax: +41-21-692-40-65 (M.U.J. & N.G.)
| | - Vincent Zoete
- Molecular Modelling Group, SIB Swiss Institute of Bioinformatics, CH-1015 Lausanne, Switzerland
| | - Nicolas Guex
- Vital-IT Group, SIB Swiss Institute of Bioinformatics, CH-1015 Lausanne, Switzerland
- Authors to whom correspondence should be addressed; Tel.: +41-21-692-40-86 (M.U.J.); +41-21-692-40-37 (N.G.); Fax: +41-21-692-40-65 (M.U.J. & N.G.)
|
25
|
Mills JL, Acton TB, Xiao R, Everett JK, Montelione GT, Szyperski T. Solution NMR structure of the helicase associated domain BVU_0683(627-691) from Bacteroides vulgatus provides first structural coverage for protein domain family PF03457 and indicates domain binding to DNA. JOURNAL OF STRUCTURAL AND FUNCTIONAL GENOMICS 2013; 14:19-24. [PMID: 23160728 PMCID: PMC3637686 DOI: 10.1007/s10969-012-9148-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/17/2012] [Accepted: 10/29/2012] [Indexed: 06/01/2023]
Abstract
A high-quality NMR structure of the helicase associated (HA) domain comprising residues 627-691 of the 753-residue protein BVU_0683 from Bacteroides vulgatus exhibits an all α-helical fold. The structure presented here is the first representative for the large protein domain family PF03457 (currently 742 members) of HA domains. Comparison with structurally similar proteins supports the hypothesis that HA domains bind to DNA and that binding specificity varies greatly within the family of HA domains constituting PF03457.
Affiliation(s)
- Jeffrey L. Mills
- Department of Chemistry, The State University of New York at Buffalo, and Northeast Structural Genomics Consortium, Buffalo, NY 14260, USA
| | - Thomas B. Acton
- Center of Advanced Biotechnology and Medicine and Department of Molecular Biology and Biochemistry, Rutgers, The State University of New Jersey and Northeast Structural Genomics Consortium, Piscataway, NJ 08854, USA
| | - Rong Xiao
- Center of Advanced Biotechnology and Medicine and Department of Molecular Biology and Biochemistry, Rutgers, The State University of New Jersey and Northeast Structural Genomics Consortium, Piscataway, NJ 08854, USA
| | - John K. Everett
- Center of Advanced Biotechnology and Medicine and Department of Molecular Biology and Biochemistry, Rutgers, The State University of New Jersey and Northeast Structural Genomics Consortium, Piscataway, NJ 08854, USA
| | - Gaetano T. Montelione
- Center of Advanced Biotechnology and Medicine and Department of Molecular Biology and Biochemistry, Rutgers, The State University of New Jersey and Northeast Structural Genomics Consortium, Piscataway, NJ 08854, USA, Department of Biochemistry, Robert Wood Johnson Medical School, UMDNJ, Piscataway, NJ 08854, USA
| | - Thomas Szyperski
- Department of Chemistry, The State University of New York at Buffalo, and Northeast Structural Genomics Consortium, Buffalo, NY 14260, USA
|
26
|
Affiliation(s)
- Michael Bieler
- Boehringer Ingelheim Pharma GmbH & Co. KG; Lead Discovery and Optimization Support; 88397; Biberach/Riss; Germany
| | - Herbert Koeppen
- Boehringer Ingelheim Pharma GmbH & Co. KG; Lead Discovery and Optimization Support; 88397; Biberach/Riss; Germany
|
27
|
Desaphy J, Azdimousa K, Kellenberger E, Rognan D. Comparison and druggability prediction of protein-ligand binding sites from pharmacophore-annotated cavity shapes. J Chem Inf Model 2012; 52:2287-99. [PMID: 22834646 DOI: 10.1021/ci300184x] [Citation(s) in RCA: 87] [Impact Index Per Article: 6.7] [Indexed: 01/22/2023]
Abstract
Estimating the pairwise similarity of protein-ligand binding sites is a fast and efficient way of predicting cross-reactivity and putative side effects of drug candidates. Among the many tools available, three-dimensional (3D) alignment-dependent methods are usually slow and based on simplified representations of binding site atoms or surfaces. On the other hand, fast and efficient alignment-free methods have recently been described but suffer from a lack of interpretability. We herewith present a novel binding site description (VolSite), coupled to an alignment and comparison tool (Shaper) combining the speed of alignment-free methods with the interpretability of alignment-dependent approaches. It is based on the comparison of negative images of binding cavities encoding both shape and pharmacophoric properties at regularly spaced grid points. Shaper approximates the resulting molecular shape with a smooth Gaussian function and aligns protein binding sites by optimizing their volume overlap. VolSite and Shaper were successfully applied to compare protein-ligand binding sites and to predict their structural druggability.
Affiliation(s)
- Jérémy Desaphy
- Laboratory of Therapeutic Innovation, UMR 7200 Université de Strasbourg/CNRS, Medalis Drug Discovery Center, F-67400 Illkirch, France
|
28
|
Kulp DW, Subramaniam S, Donald JE, Hannigan BT, Mueller BK, Grigoryan G, Senes A. Structural informatics, modeling, and design with an open-source Molecular Software Library (MSL). J Comput Chem 2012; 33:1645-61. [PMID: 22565567 PMCID: PMC3432414 DOI: 10.1002/jcc.22968] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.8] [Received: 12/19/2011] [Revised: 02/16/2012] [Accepted: 03/02/2012] [Indexed: 01/22/2023]
Abstract
We present the Molecular Software Library (MSL), a C++ library for molecular modeling. MSL is a set of tools that supports a large variety of algorithms for the design, modeling, and analysis of macromolecules. Among the main features supported by the library are methods for applying geometric transformations and alignments, the implementation of a rich set of energy functions, side chain optimization, backbone manipulation, calculation of solvent accessible surface area, and other tools. MSL has a number of unique features, such as the ability to store alternative atomic coordinates (for modeling) and multiple amino acid identities at the same backbone position (for design). It has a straightforward mechanism for extending its energy functions and can work with any type of molecule. Although the code base is large, MSL was created with ease of development in mind. It allows the rapid implementation of simple tasks while fully supporting the creation of complex applications. Some of the capabilities of the software are demonstrated here with examples that show how to program complex and essential modeling tasks with few lines of code. MSL is an ongoing and evolving project, with new features and improvements being introduced regularly, but it is mature and suitable for production and has been used in numerous protein modeling and design projects. MSL is open-source software, freely downloadable at http://msl-libraries.org. We propose it as a common platform for the development of new molecular algorithms and to promote the distribution, sharing, and reutilization of computational methods.
Affiliation(s)
| | | | | | - Brett T. Hannigan
- U. of Pennsylvania, Genomics and Computational Biology Graduate Group
|
29
|
Kloppmann E, Punta M, Rost B. Structural genomics plucks high-hanging membrane proteins. Curr Opin Struct Biol 2012; 22:326-32. [PMID: 22622032 DOI: 10.1016/j.sbi.2012.05.002] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.5] [Received: 02/11/2012] [Revised: 03/28/2012] [Accepted: 05/01/2012] [Indexed: 01/21/2023]
Abstract
Recent years have seen the establishment of structural genomics centers that explicitly target integral membrane proteins. Here, we review the advances in targeting these extremely high-hanging fruits of structural biology in high-throughput mode. We observe that the experimental determination of high-resolution structures of integral membrane proteins is increasingly successful both in terms of getting structures and of covering important protein families, for example, from Pfam. Structural genomics has begun to contribute significantly toward this progress. An important component of this contribution is the setup of robotic pipelines that generate a wealth of experimental data for membrane proteins. We argue that prediction methods for the identification of membrane regions and for the comparison of membrane proteins largely suffice to meet the challenges of target selection for structural genomics of membrane proteins. In contrast, we need better methods to prioritize the most promising members in a family of closely related proteins and to annotate protein function from sequence and structure in the absence of homology.
Affiliation(s)
- Edda Kloppmann
- Department of Bioinformatics and Computational Biology, Technical University Munich, Germany.
|
30
|
Montelione GT. The Protein Structure Initiative: achievements and visions for the future. F1000 BIOLOGY REPORTS 2012; 4:7. [PMID: 22500193 PMCID: PMC3318194 DOI: 10.3410/b4-7] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.2] [Indexed: 12/27/2022]
Abstract
The Protein Structure Initiative (PSI) was established in 2000 by the National Institute of General Medical Sciences with the long-term goal of providing 3D (three-dimensional) structural information for most proteins in nature. As advances in genomic sequencing, bioinformatics, homology modelling, and methods for rapid determination of 3D structures of proteins by X-ray crystallography and nuclear magnetic resonance (NMR) converged, it was proposed that our understanding of the biology of protein structure and evolution could be greatly enabled by ‘genomic-scale’ protein structure determination. Over the past 12 years, the PSI has evolved from a testing bed for new methods of sample and structure production to a core component of a wide range of biology programs.
Affiliation(s)
- Gaetano T Montelione
- Center for Advanced Biotechnology and Medicine, Department of Molecular Biology and Biochemistry, Rutgers University Department of Biochemistry and Molecular Biology, Robert Wood Johnson Medical School, Northeast Structural Genomics Consortium, Piscataway, NJ 08854, USA
|
31
|
Eletsky A, Acton TB, Xiao R, Everett JK, Montelione GT, Szyperski T. Solution NMR structures reveal a distinct architecture and provide first structures for protein domain family PF04536. JOURNAL OF STRUCTURAL AND FUNCTIONAL GENOMICS 2012; 13:9-14. [PMID: 22198206 PMCID: PMC3609422 DOI: 10.1007/s10969-011-9122-2] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 11/24/2011] [Accepted: 12/13/2011] [Indexed: 11/29/2022]
Abstract
The protein family (Pfam) PF04536 is a broadly conserved domain family of unknown function (DUF477), with more than 1,350 members in prokaryotic and eukaryotic proteins. High-quality NMR structures of the N-terminal domain comprising residues 41-180 of the 684-residue protein CG2496 from Corynebacterium glutamicum and the N-terminal domain comprising residues 35-182 of the 435-residue protein PG0361 from Porphyromonas gingivalis both exhibit an α/β fold comprised of a four-stranded β-sheet, three α-helices packed against one side of the sheet, and a fourth α-helix attached to the other side. In spite of low sequence similarity (18%) assessed by structure-based sequence alignment, the two structures are globally quite similar. However, moderate structural differences are observed for the relative orientation of two of the four helices. Comparison with known protein structures reveals that the α/β architecture of CG2496(41-180) and PG0361(35-182) has previously not been characterized. Moreover, calculation of surface charge potential and identification of surface clefts indicate that the two domains very likely have different functions.
Affiliation(s)
- Alexander Eletsky
- Department of Chemistry, The State University of New York at Buffalo, Buffalo, NY 14260, USA
|
32
|
Eletsky A, Petrey D, Cliff Zhang Q, Lee HW, Acton TB, Xiao R, Everett JK, Prestegard JH, Honig B, Montelione GT, Szyperski T. Solution NMR structures reveal unique homodimer formation by a winged helix-turn-helix motif and provide first structures for protein domain family PF10771. J Struct Funct Genomics 2012; 13:1-7. [PMID: 22223187 PMCID: PMC3654790 DOI: 10.1007/s10969-011-9121-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 11/24/2011] [Accepted: 12/13/2011] [Indexed: 11/29/2022]
Abstract
High-quality NMR structures of the homo-dimeric proteins Bvu3908 (69 residues in the monomeric unit) from Bacteroides vulgatus and Bt2368 (74 residues) from Bacteroides thetaiotaomicron reveal the presence of winged helix-turn-helix (wHTH) motifs mediating tight complex formation. Such homo-dimer formation by winged HTH motifs is otherwise found only in two DNA-binding proteins with known structure: the C-terminal wHTH domain of transcriptional activator FadR from E. coli and protein TubR from B. thuringiensis, which is involved in plasmid DNA segregation. However, the relative orientation of the wHTH motifs is different and residues involved in DNA binding are not conserved in Bvu3908 and Bt2368. Hence, the proteins of the present study are not very likely to bind DNA, but are likely to exhibit a function that has thus far not been ascribed to homo-dimers formed by winged HTH motifs. The structures of Bvu3908 and Bt2368 are the first atomic resolution structures for Pfam family PF10771, a family of unknown function (DUF2582) currently containing 128 members.
Affiliation(s)
- Alexander Eletsky
- Department of Chemistry, The State University of New York at Buffalo, and Northeast Structural Genomics Consortium, Buffalo, NY 14260, USA
| | - Donald Petrey
- Department of Biochemistry and Molecular Biophysics, Howard Hughes Medical Institute, Center for Computational Biology and Bioinformatics, Columbia University, New York, NY 10032, USA
| | - Qiangfeng Cliff Zhang
- Department of Biochemistry and Molecular Biophysics, Howard Hughes Medical Institute, Center for Computational Biology and Bioinformatics, Columbia University, New York, NY 10032, USA
| | - Hsiau-Wei Lee
- Complex Carbohydrate Research Center, University of Georgia, and Northeast Structural Genomics Consortium, Athens, GA 30602, USA
| | - Thomas B. Acton
- Department of Molecular Biology and Biochemistry, Center of Advanced Biotechnology and Medicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Department of Biochemistry, Robert Wood Johnson Medical School, UMDNJ, and Northeast Structural Genomics Consortium, Piscataway, NJ 08854, USA
| | - Rong Xiao
- Department of Molecular Biology and Biochemistry, Center of Advanced Biotechnology and Medicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Department of Biochemistry, Robert Wood Johnson Medical School, UMDNJ, and Northeast Structural Genomics Consortium, Piscataway, NJ 08854, USA
| | - John K. Everett
- Department of Molecular Biology and Biochemistry, Center of Advanced Biotechnology and Medicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Department of Biochemistry, Robert Wood Johnson Medical School, UMDNJ, and Northeast Structural Genomics Consortium, Piscataway, NJ 08854, USA
| | - James H. Prestegard
- Complex Carbohydrate Research Center, University of Georgia, and Northeast Structural Genomics Consortium, Athens, GA 30602, USA
| | - Barry Honig
- Department of Biochemistry and Molecular Biophysics, Howard Hughes Medical Institute, Center for Computational Biology and Bioinformatics, Columbia University, New York, NY 10032, USA
| | - Gaetano T. Montelione
- Department of Molecular Biology and Biochemistry, Center of Advanced Biotechnology and Medicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Department of Biochemistry, Robert Wood Johnson Medical School, UMDNJ, and Northeast Structural Genomics Consortium, Piscataway, NJ 08854, USA
| | - Thomas Szyperski
- Department of Chemistry, The State University of New York at Buffalo, and Northeast Structural Genomics Consortium, Buffalo, NY 14260, USA
|
33
|
Marks DS, Colwell LJ, Sheridan R, Hopf TA, Pagnani A, Zecchina R, Sander C. Protein 3D structure computed from evolutionary sequence variation. PLoS One 2011; 6:e28766. [PMID: 22163331 PMCID: PMC3233603 DOI: 10.1371/journal.pone.0028766] [Citation(s) in RCA: 778] [Impact Index Per Article: 55.6] [Received: 11/10/2011] [Accepted: 11/14/2011] [Indexed: 11/19/2022] Open
Abstract
The evolutionary trajectory of a protein through sequence space is constrained by its function. Collections of sequence homologs record the outcomes of millions of evolutionary experiments in which the protein evolves according to these constraints. Deciphering the evolutionary record held in these sequences and exploiting it for predictive and engineering purposes presents a formidable challenge. The potential benefit of solving this challenge is amplified by the advent of inexpensive high-throughput genomic sequencing. In this paper we ask whether we can infer evolutionary constraints from a set of sequence homologs of a protein. The challenge is to distinguish true co-evolution couplings from the noisy set of observed correlations. We address this challenge using a maximum entropy model of the protein sequence, constrained by the statistics of the multiple sequence alignment, to infer residue pair couplings. Surprisingly, we find that the strength of these inferred couplings is an excellent predictor of residue-residue proximity in folded structures. Indeed, the top-scoring residue couplings are sufficiently accurate and well-distributed to define the 3D protein fold with remarkable accuracy. We quantify this observation by computing, from sequence alone, all-atom 3D structures of fifteen test proteins from different fold classes, ranging in size from 50 to 260 residues, including a G-protein coupled receptor. These blinded inferences are de novo, i.e., they do not use homology modeling or sequence-similar fragments from known structures. The co-evolution signals provide sufficient information to determine accurate 3D protein structure to 2.7–4.8 Å Cα-RMSD error relative to the observed structure, over at least two-thirds of the protein (method called EVfold, details at http://EVfold.org).
This discovery provides insight into essential interactions constraining protein evolution and will facilitate a comprehensive survey of the universe of protein structures, new strategies in protein and drug design, and the identification of functional genetic variants in normal and disease genomes.
Affiliation(s)
- Debora S Marks
- Department of Systems Biology, Harvard Medical School, Boston, Massachusetts, United States of America.
|
34
|
Cai XH, Jaroszewski L, Wooley J, Godzik A. Internal organization of large protein families: relationship between the sequence, structure, and function-based clustering. Proteins 2011; 79:2389-402. [PMID: 21671455 PMCID: PMC3132221 DOI: 10.1002/prot.23049] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 10/26/2010] [Revised: 02/12/2011] [Accepted: 03/13/2011] [Indexed: 12/14/2022]
Abstract
The protein universe can be organized in families that group proteins sharing common ancestry. Such families display variable levels of structural and functional divergence, from homogeneous families, where all members have the same function and very similar structure, to very divergent families, where large variations in function and structure are observed. For practical purposes of structure and function prediction, it would be beneficial to identify sub-groups of proteins with highly similar structures (iso-structural) and/or functions (iso-functional) within divergent protein families. We compared three algorithms in their ability to cluster large protein families and discuss whether any of these methods could reliably identify such iso-structural or iso-functional groups. We show that clustering using profile-sequence and profile-profile comparison methods closely reproduces clusters based on similarities between 3D structures or clusters of proteins with similar biological functions. In contrast, the still commonly used sequence-based methods with fixed thresholds result in vast overestimates of structural and functional diversity in protein families. As a result, these methods also overestimate the number of protein structures that have to be determined to fully characterize the structural space of such families. The fact that one can build reliable models based on apparently distantly related templates is crucial for extracting the maximal amount of information from new sequencing projects.
Affiliation(s)
- Xiao-hui Cai
- Joint Center for Structural Genomics, Bioinformatics Core, Center for Research in Biological Systems, University of California, San Diego, 9500 Gilman Dr., La Jolla, CA 92093-0446, USA
| | - Lukasz Jaroszewski
- Joint Center for Structural Genomics, Bioinformatics Core, Sanford-Burnham Medical Research Institute, 10901 N. Torrey Pines Road, La Jolla, CA 92037, USA
- Bioinformatics and Systems Biology Program, Sanford-Burnham Medical Research Institute, 10901 N. Torrey Pines Road, La Jolla, CA 92037, USA
| | - John Wooley
- Joint Center for Structural Genomics, Bioinformatics Core, Center for Research in Biological Systems, University of California, San Diego, 9500 Gilman Dr., La Jolla, CA 92093-0446, USA
| | - Adam Godzik
- Joint Center for Structural Genomics, Bioinformatics Core, Center for Research in Biological Systems, University of California, San Diego, 9500 Gilman Dr., La Jolla, CA 92093-0446, USA
- Joint Center for Structural Genomics, Bioinformatics Core, Sanford-Burnham Medical Research Institute, 10901 N. Torrey Pines Road, La Jolla, CA 92037, USA
- Bioinformatics and Systems Biology Program, Sanford-Burnham Medical Research Institute, 10901 N. Torrey Pines Road, La Jolla, CA 92037, USA
|
35
|
Practical applications of structural genomics technologies for mutagen research. Mutat Res 2011; 722:165-70. [PMID: 21182983 DOI: 10.1016/j.mrgentox.2010.12.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2010] [Accepted: 12/10/2010] [Indexed: 11/23/2022]
Abstract
Here we present a perspective on a range of practical uses of structural genomics for mutagen research. Structural genomics is an overloaded term and requires some definition to bound the discussion; we give a brief description of public and private structural genomics endeavors, along with some of their objectives, their activities, their capabilities, and their limitations. We discuss how structural genomics might impact mutagen research in three different scenarios: at a structural genomics center, at a lab with modest resources that also conducts structural biology research, and at a lab that is conducting mutagen research without in-house experimental structural biology. Applications range from functional annotation of single genes or SNPs, to constructing gene networks and pathways, to an integrated systems biology approach. Structural genomics centers can take advantage of systems biology models to prioritize high-value targets for structure determination and in turn extend systems models to better understand systems biology diseases or phenomena. Individual investigator-run structural biology laboratories can collaborate with structural genomics centers, but can also take advantage of technical advances and tools developed by structural genomics centers and can employ a structural genomics approach to advancing biological understanding. Individual investigator-run non-structural biology laboratories can also collaborate with structural genomics centers, possibly influencing targeting decisions, but can also use structure-based annotation tools enabled by the growing coverage of protein fold space provided by structural genomics. Better functional annotation can inform pathway and systems biology models.
|
36
|
Verschueren E, Vanhee P, van der Sloot AM, Serrano L, Rousseau F, Schymkowitz J. Protein design with fragment databases. Curr Opin Struct Biol 2011; 21:452-9. [PMID: 21684149 DOI: 10.1016/j.sbi.2011.05.002] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.9] [Received: 04/04/2011] [Accepted: 05/25/2011] [Indexed: 11/25/2022]
Abstract
Structure-based computational methods are popular tools for designing proteins and interactions between proteins because they provide the necessary insight and details required for rational engineering. Here, we first argue that large-scale databases of fragments contain a discrete but complete set of building blocks that can be used to design structures. We show that these structural alphabets can be saturated to provide conformational ensembles that sample the native structure space around energetic minima. Second, we show that catalogs of interaction patterns hold the key to overcome the lack of scaffolds when computationally designing protein interactions. Finally, we illustrate the power of database-driven computational protein design methods by recent successful applications and discuss what challenges remain to push this field forward.
Affiliation(s)
- Erik Verschueren
- EMBL/CRG Systems Biology Research Unit, Centre for Genomic Regulation (CRG) and UPF, Barcelona, Spain
|
37
|
Mao B, Guan R, Montelione GT. Improved technologies now routinely provide protein NMR structures useful for molecular replacement. Structure 2011; 19:757-66. [PMID: 21645849 PMCID: PMC3612016 DOI: 10.1016/j.str.2011.04.005] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.0] [Received: 01/21/2011] [Revised: 04/07/2011] [Accepted: 04/25/2011] [Indexed: 10/18/2022]
Abstract
Molecular replacement (MR) is widely used for addressing the phase problem in X-ray crystallography. Historically, crystallographers have had limited success using NMR structures as MR search models. Here, we report a comprehensive investigation of the utility of protein NMR ensembles as MR search models, using data for 25 pairs of X-ray and NMR structures solved and refined using modern NMR methods. Starting from NMR ensembles prepared by an improved protocol, FindCore, correct MR solutions were obtained for 22 targets. Based on these solutions, automatic model rebuilding could be done successfully. Rosetta refinement of NMR structures provided MR solutions for another two proteins. We also demonstrate that such properly prepared NMR ensembles and X-ray crystal structures have similar performance when used as MR search models for homologous structures, particularly for targets with sequence identity >40%.
Affiliation(s)
- Binchen Mao
- Center for Advanced Biotechnology and Medicine, Northeast Structural Genomics Consortium, Department of Molecular Biology and Biochemistry, Rutgers, The State University of New Jersey, and Department of Biochemistry, Robert Wood Johnson Medical School, UMDNJ, Piscataway, New Jersey 08854, USA
| | - Rongjin Guan
- Center for Advanced Biotechnology and Medicine, Northeast Structural Genomics Consortium, Department of Molecular Biology and Biochemistry, Rutgers, The State University of New Jersey, and Department of Biochemistry, Robert Wood Johnson Medical School, UMDNJ, Piscataway, New Jersey 08854, USA
| | - Gaetano T. Montelione
- Center for Advanced Biotechnology and Medicine, Northeast Structural Genomics Consortium, Department of Molecular Biology and Biochemistry, Rutgers, The State University of New Jersey, and Department of Biochemistry, Robert Wood Johnson Medical School, UMDNJ, Piscataway, New Jersey 08854, USA
|
38
|
Teyra J, Hawkins J, Zhu H, Pisabarro MT. Studies on the inference of protein binding regions across fold space based on structural similarities. Proteins 2011; 79:499-508. [PMID: 21069715 DOI: 10.1002/prot.22897] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 01/27/2023]
Abstract
The emerging picture of a continuous protein fold space highlights the existence of non-obvious structural similarities between proteins with apparently different topologies. The identification of structural resemblances across fold space and the analysis of similar recognition regions may be a valuable source of information for protein structure-based functional characterization. In this work, we use non-sequential structural alignment methods (ns-SAs) to identify structural similarities between protein pairs independently of their SCOP hierarchy, and we calculate the significance of binding region conservation using the interacting residues overlap in the ns-SA. We cluster the binding inferences for each family to distinguish already known family binding regions from putative new ones. Our methodology exploits the enormous amount of data available in the PDB to identify binding region similarities within protein families and to propose putative binding regions. Our results indicate that there is a plethora of structurally common binding regions among proteins, independently of current fold classifications. We obtain a 6- to 8-fold enrichment of novel binding regions, and identify binding inferences for 728 protein families that so far lack binding information in the PDB. We explore binding mode analogies between ligands from commonly clustered binding regions to investigate the utility of our methodology. A comprehensive analysis of the obtained binding inferences may help in the functional characterization of protein recognition and assist rational engineering. The data obtained in this work are available via the download link at www.scowlp.org.
Affiliation(s)
- Joan Teyra
- Structural Bioinformatics, BIOTEC, Technical University of Dresden, Tatzberg 47-51, 01307 Dresden, Germany.
|
39
|
|
40
|
Julfayev ES, McLaughlin RJ, Tao YP, McLaughlin WA. A new approach to assess and predict the functional roles of proteins across all known structures. J Struct Funct Genomics 2011; 12:9-20. [PMID: 21445639 PMCID: PMC3089730 DOI: 10.1007/s10969-011-9105-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 12/17/2010] [Accepted: 03/14/2011] [Indexed: 12/11/2022]
Abstract
The three-dimensional atomic structures of proteins provide information regarding their function, and codified relationships between structure and function enable the assessment of function from structure. In the current study, a new data mining tool was implemented that checks current gene ontology (GO) annotations and predicts new ones across all the protein structures available in the Protein Data Bank (PDB). The tool overcomes some of the challenges of utilizing large amounts of protein annotation and measurement information to form correspondences between protein structure and function. Protein attributes were extracted from the Structural Biology Knowledgebase and open source biological databases. Based on the presence or absence of a given set of attributes, a given protein's functional annotations were inferred. The results show that attributes derived from the three-dimensional structures of proteins enhanced predictions over those using attributes derived only from primary amino acid sequence. Some predictions reflected known but not completely documented GO annotations. For example, predictions for the GO term for copper ion binding reflected information that a copper ion was known to interact with the protein, based on a ligand interaction database. Other predictions were novel and require further experimental validation. These include predictions for proteins labeled as unknown function in the PDB. Two examples are a role in the regulation of transcription for the protein AF1396 from Archaeoglobus fulgidus and a role in RNA metabolism for the protein psuG from Thermotoga maritima.
Affiliation(s)
- Elchin S. Julfayev
- Department of Basic Science, The Commonwealth Medical College, 525 Pine Street, Scranton, PA 18509 USA
| | - Ryan J. McLaughlin
- Department of Basic Science, The Commonwealth Medical College, 525 Pine Street, Scranton, PA 18509 USA
| | - Yi-Ping Tao
- Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, 610 Taylor Road, Piscataway, NJ 08854-8087 USA
| | - William A. McLaughlin
- Department of Basic Science, The Commonwealth Medical College, 525 Pine Street, Scranton, PA 18509 USA
|
41
|
Lee D, de Beer TAP, Laskowski RA, Thornton JM, Orengo CA. 1,000 structures and more from the MCSG. BMC Struct Biol 2011; 11:2. [PMID: 21219649 PMCID: PMC3024214 DOI: 10.1186/1472-6807-11-2] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.0] [Received: 09/16/2010] [Accepted: 01/10/2011] [Indexed: 11/10/2022]
Abstract
Background: The Midwest Center for Structural Genomics (MCSG) is one of the large-scale centres of the Protein Structure Initiative (PSI). During the first two phases of the PSI the MCSG has solved over a thousand protein structures. A criticism of structural genomics is that target selection strategies mean that some structures are solved without having a known function and thus are of little biomedical significance. Structures of unknown function have stimulated the development of methods for function prediction from structure. Results: We show that the MCSG has met the stated goals of the PSI and use online resources and readily available function prediction methods to provide functional annotations for more than 90% of the MCSG structures. The structure-to-function prediction method ProFunc provides likely functions for many of the MCSG structures that cannot be annotated by sequence-based methods. Conclusions: Although the focus of the PSI was structural coverage, many of the structures solved by the MCSG can also be associated with functional classes and biological roles of possible biomedical value.
Affiliation(s)
- David Lee
- Department of Structural and Molecular Biology, University College London, Darwin Building, Gower Street, London WC1E 6BT, UK.
|
42
|
Acton TB, Xiao R, Anderson S, Aramini J, Buchwald WA, Ciccosanti C, Conover K, Everett J, Hamilton K, Huang YJ, Janjua H, Kornhaber G, Lau J, Lee DY, Liu G, Maglaqui M, Ma L, Mao L, Patel D, Rossi P, Sahdev S, Shastry R, Swapna GVT, Tang Y, Tong S, Wang D, Wang H, Zhao L, Montelione GT. Preparation of protein samples for NMR structure, function, and small-molecule screening studies. Methods Enzymol 2011; 493:21-60. [PMID: 21371586 DOI: 10.1016/b978-0-12-381274-2.00002-9] [Citation(s) in RCA: 81] [Impact Index Per Article: 5.8] [Indexed: 12/03/2022]
Abstract
In this chapter, we concentrate on the production of high-quality protein samples for nuclear magnetic resonance (NMR) studies. In particular, we provide an in-depth description of recent advances in the production of NMR samples and their synergistic use with recent advancements in NMR hardware. We describe the protein production platform of the Northeast Structural Genomics Consortium and outline our high-throughput strategies for producing high-quality protein samples for NMR studies. Our strategy is based on the cloning, expression, and purification of 6×-His-tagged proteins using T7-based Escherichia coli systems and isotope enrichment in minimal media. We describe 96-well ligation-independent cloning and analytical expression systems, parallel preparative scale fermentation, and high-throughput purification protocols. The 6×-His affinity tag allows for a similar two-step purification procedure implemented in a parallel high-throughput fashion that routinely results in purity levels sufficient for NMR studies (>97% homogeneity). Using this platform, the protein open reading frames of over 17,500 different targeted proteins (or domains) have been cloned as over 28,000 constructs. Nearly 5000 of these proteins have been purified to homogeneity in tens of milligram quantities (see Summary Statistics, http://nesg.org/statistics.html), resulting in more than 950 new protein structures, including more than 400 NMR structures, deposited in the Protein Data Bank. The Northeast Structural Genomics Consortium pipeline has been effective in producing protein samples of both prokaryotic and eukaryotic origin. Although this chapter describes our entire pipeline for producing isotope-enriched protein samples, it focuses on the major updates introduced during the last 5 years (Phase 2 of the National Institute of General Medical Sciences Protein Structure Initiative). 
Our advanced automated and/or parallel cloning, expression, purification, and biophysical screening technologies are suitable for implementation in a large individual laboratory or by a small group of collaborating investigators for structural biology, functional proteomics, ligand screening, and structural genomics research.
Affiliation(s)
- Thomas B Acton
- Center for Advanced Biotechnology and Medicine, Department of Molecular Biology and Biochemistry, Northeast Structural Genomics Consortium, Rutgers University, Piscataway, New Jersey, USA
|
43
|
Koonin EV. New variants of known folds: do they bring new biology? Acta Crystallogr Sect F Struct Biol Cryst Commun 2010; 66:1226-9. [PMID: 20944215 PMCID: PMC2954209 DOI: 10.1107/s1744309110013242] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 02/01/2010] [Accepted: 04/09/2010] [Indexed: 05/30/2023]
Abstract
New distinct versions of known protein folds provide a powerful means of protein-function prediction that complements sequence and genomic context analysis. These structures do not supplant direct biochemical experiments, but are indispensable for the complete characterization of proteins.
Affiliation(s)
- Eugene V Koonin
- National Center for Biotechnology Information, National Institutes of Health, Bethesda, Maryland 20894, USA.
|
44
|
Elsliger MA, Deacon AM, Godzik A, Lesley SA, Wooley J, Wüthrich K, Wilson IA. The JCSG high-throughput structural biology pipeline. Acta Crystallogr Sect F Struct Biol Cryst Commun 2010; 66:1137-42. [PMID: 20944202 PMCID: PMC2954196 DOI: 10.1107/s1744309110038212] [Citation(s) in RCA: 96] [Impact Index Per Article: 6.4] [Received: 09/21/2010] [Accepted: 09/24/2010] [Indexed: 11/23/2022]
Abstract
The Joint Center for Structural Genomics high-throughput structural biology pipeline has delivered more than 1000 structures to the community over the past ten years. The JCSG has made a significant contribution to the overall goal of the NIH Protein Structure Initiative (PSI) of expanding structural coverage of the protein universe, as well as making substantial inroads into structural coverage of an entire organism. Targets are processed through an extensive combination of bioinformatics and biophysical analyses to efficiently characterize and optimize each target prior to selection for structure determination. The pipeline uses parallel processing methods at almost every step in the process and can adapt to a wide range of protein targets from bacterial to human. The construction, expansion and optimization of the JCSG gene-to-structure pipeline over the years have resulted in many technological and methodological advances and developments. The vast number of targets and the enormous amounts of associated data processed through the multiple stages of the experimental pipeline required the development of a variety of valuable resources that, wherever feasible, have been converted to free-access web-based tools and applications.
Affiliation(s)
- Marc-André Elsliger
- Joint Center for Structural Genomics (JCSG), http://www.jcsg.org, USA
- Department of Molecular Biology, The Scripps Research Institute, La Jolla, CA, USA
| | - Ashley M. Deacon
- Joint Center for Structural Genomics (JCSG), http://www.jcsg.org, USA
- Stanford Synchrotron Radiation Lightsource, SLAC National Accelerator Laboratory, Stanford University, Menlo Park, CA, USA
| | - Adam Godzik
- Joint Center for Structural Genomics (JCSG), http://www.jcsg.org, USA
- Program on Bioinformatics and Systems Biology, Sanford–Burnham Medical Research Institute La Jolla, CA, USA
- Center for Research in Biological Systems, University of California, San Diego, La Jolla, CA, USA
| | - Scott A. Lesley
- Joint Center for Structural Genomics (JCSG), http://www.jcsg.org, USA
- Department of Molecular Biology, The Scripps Research Institute, La Jolla, CA, USA
- Genomics Institute of the Novartis Research Foundation, San Diego, CA, USA
| | - John Wooley
- Joint Center for Structural Genomics (JCSG), http://www.jcsg.org, USA
- Center for Research in Biological Systems, University of California, San Diego, La Jolla, CA, USA
| | - Kurt Wüthrich
- Joint Center for Structural Genomics (JCSG), http://www.jcsg.org, USA
- Department of Molecular Biology, The Scripps Research Institute, La Jolla, CA, USA
| | - Ian A. Wilson
- Joint Center for Structural Genomics (JCSG), http://www.jcsg.org, USA
- Department of Molecular Biology, The Scripps Research Institute, La Jolla, CA, USA
|
45
|
De Franchi E, Schalon C, Messa M, Onofri F, Benfenati F, Rognan D. Binding of protein kinase inhibitors to synapsin I inferred from pair-wise binding site similarity measurements. PLoS One 2010; 5:e12214. [PMID: 20808948 PMCID: PMC2922380 DOI: 10.1371/journal.pone.0012214] [Citation(s) in RCA: 53] [Impact Index Per Article: 3.5] [Received: 06/10/2010] [Accepted: 07/26/2010] [Indexed: 11/18/2022] Open
Abstract
Predicting off-targets by computational methods is gaining importance in early drug discovery stages. We present a computational method based on three-dimensional binding site comparisons, which prompted us to investigate the cross-reaction of protein kinase inhibitors with synapsin I, an ATP-binding protein regulating neurotransmitter release in the synapse. Systematic pair-wise comparison of the staurosporine-binding site of the proto-oncogene Pim-1 kinase with 6,412 druggable protein-ligand binding sites suggested that the ATP-binding site of synapsin I may recognize the pan-kinase inhibitor staurosporine. This hypothesis was validated biochemically by competition experiments of staurosporine with ATP-γ-³⁵S for binding to synapsin I. Staurosporine, as well as three other inhibitors of protein kinases (cdk2, Pim-1 and casein kinase type 2), effectively bound to synapsin I with nanomolar affinities and promoted synapsin-induced F-actin bundling. The selective Pim-1 kinase inhibitor quercetagetin was shown to be the most potent synapsin I binder (IC50 = 0.15 µM), in agreement with the predicted binding site similarities between synapsin I and various protein kinases. Other protein kinase inhibitors (protein kinase A and chk1 inhibitor), kinase inhibitors (diacylglycerolkinase inhibitor) and various other ATP-competitors (DNA topoisomerase II and HSP-90α inhibitors) did not bind to synapsin I, as predicted from a lower similarity of their respective ATP-binding sites to that of synapsin I. The present data suggest that the observed downregulation of neurotransmitter release by some but not all protein kinase inhibitors may also result in part from direct binding to synapsin I and phosphorylation-independent perturbation of synapsin I function. More generally, the data also demonstrate that cross-reactivity with various targets may be detected by systematic pair-wise similarity measurement of ligand-annotated binding sites.
Affiliation(s)
- Enrico De Franchi
  - Department of Neuroscience and Brain Technologies, The Italian Institute of Technology, Genova, Italy
- Claire Schalon
  - Structural Chemogenomics, Laboratory of Therapeutic Innovation, CNRS UMR 7200, Université de Strasbourg, Illkirch, France
- Mirko Messa
  - Department of Neuroscience and Brain Technologies, The Italian Institute of Technology, Genova, Italy
- Franco Onofri
  - Department of Experimental Medicine, University of Genova and Istituto Nazionale di Neuroscienze, Genova, Italy
- Fabio Benfenati
  - Department of Neuroscience and Brain Technologies, The Italian Institute of Technology, Genova, Italy
  - Department of Experimental Medicine, University of Genova and Istituto Nazionale di Neuroscienze, Genova, Italy
- Didier Rognan
  - Structural Chemogenomics, Laboratory of Therapeutic Innovation, CNRS UMR 7200, Université de Strasbourg, Illkirch, France
|
46
|
Xiao R, Anderson S, Aramini J, Belote R, Buchwald WA, Ciccosanti C, Conover K, Everett JK, Hamilton K, Huang YJ, Janjua H, Jiang M, Kornhaber GJ, Lee DY, Locke JY, Ma LC, Maglaqui M, Mao L, Mitra S, Patel D, Rossi P, Sahdev S, Sharma S, Shastry R, Swapna GVT, Tong SN, Wang D, Wang H, Zhao L, Montelione GT, Acton TB. The high-throughput protein sample production platform of the Northeast Structural Genomics Consortium. J Struct Biol 2010; 172:21-33. [PMID: 20688167 DOI: 10.1016/j.jsb.2010.07.011] [Citation(s) in RCA: 108] [Impact Index Per Article: 7.2] [Received: 04/04/2010] [Revised: 07/24/2010] [Accepted: 07/28/2010] [Indexed: 11/15/2022]
Abstract
We describe the core Protein Production Platform of the Northeast Structural Genomics Consortium (NESG) and outline the strategies used for producing high-quality protein samples. The platform is centered on the cloning, expression and purification of 6X-His-tagged proteins using T7-based Escherichia coli systems. The 6X-His tag allows for similar purification procedures for most targets and implementation of high-throughput (HTP) parallel methods. In most cases, the 6X-His-tagged proteins are sufficiently purified (>97% homogeneity) using an HTP two-step purification protocol for most structural studies. Using this platform, the open reading frames of over 16,000 different targeted proteins (or domains) have been cloned as >26,000 constructs. Over the past 10 years, more than 16,000 of these expressed protein, and more than 4400 proteins (or domains) have been purified to homogeneity in tens of milligram quantities (see Summary Statistics, http://nesg.org/statistics.html). Using these samples, the NESG has deposited more than 900 new protein structures to the Protein Data Bank (PDB). The methods described here are effective in producing eukaryotic and prokaryotic protein samples in E. coli. This paper summarizes some of the updates made to the protein production pipeline in the last 5 years, corresponding to phase 2 of the NIGMS Protein Structure Initiative (PSI-2) project. The NESG Protein Production Platform is suitable for implementation in a large individual laboratory or by a small group of collaborating investigators. These advanced automated and/or parallel cloning, expression, purification, and biophysical screening technologies are of broad value to the structural biology, functional proteomics, and structural genomics communities.
Affiliation(s)
- Rong Xiao
  - Center for Advanced Biotechnology and Medicine, Department of Molecular Biology and Biochemistry, Rutgers, The State University of New Jersey and Robert Wood Johnson Medical School, and Northeast Structural Genomics Consortium, Piscataway, NJ 08854, USA
|
47
|
Murakami Y, Spriggs RV, Nakamura H, Jones S. PiRaNhA: a server for the computational prediction of RNA-binding residues in protein sequences. Nucleic Acids Res 2010; 38:W412-6. [PMID: 20507911 PMCID: PMC2896099 DOI: 10.1093/nar/gkq474] [Citation(s) in RCA: 58] [Impact Index Per Article: 3.9] [Received: 01/30/2010] [Revised: 05/06/2010] [Accepted: 05/13/2010] [Indexed: 12/20/2022] Open
Abstract
The PiRaNhA web server is a publicly available online resource that automatically predicts the location of RNA-binding residues (RBRs) in protein sequences. The goal of functional annotation of sequences in the field of RNA binding is to provide predictions of high accuracy that require only small numbers of targeted mutations for verification. The PiRaNhA server uses a support vector machine (SVM), with position-specific scoring matrices, residue interface propensity, predicted residue accessibility and residue hydrophobicity as features. The server allows the submission of up to 10 protein sequences, and the predictions for each sequence are provided on a web page and via email. The prediction results are provided in sequence format with predicted RBRs highlighted, in text format with the SVM threshold score indicated and as a graph which enables users to quickly identify those residues above any specific SVM threshold. The graph effectively enables the increase or decrease of the false positive rate. When tested on a non-redundant data set of 42 protein sequences not used in training, the PiRaNhA server achieved an accuracy of 85%, specificity of 90% and a Matthews correlation coefficient of 0.41 and outperformed other publicly available servers. The PiRaNhA prediction server is freely available at http://www.bioinformatics.sussex.ac.uk/PIRANHA.
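To make the reported figures concrete, the sketch below shows how accuracy, specificity, false positive rate and the Matthews correlation coefficient follow from a confusion matrix, and how moving a score threshold trades false positives against missed residues. It is an illustration only, not the PiRaNhA implementation: the scores, labels, thresholds and function names are hypothetical, and the rule "score at or above the threshold means predicted RNA-binding" is an assumption.

```python
from math import sqrt


def confusion_counts(scores, labels, threshold):
    """Return (tp, fp, tn, fn), calling a residue binding when score >= threshold."""
    tp = fp = tn = fn = 0
    for score, is_binding in zip(scores, labels):
        predicted = score >= threshold
        if predicted and is_binding:
            tp += 1
        elif predicted:
            fp += 1
        elif is_binding:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def metrics(tp, fp, tn, fn):
    """Compute standard binary-classification metrics from confusion counts."""
    total = tp + fp + tn + fn
    negatives = tn + fp
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / total,
        "specificity": tn / negatives if negatives else 0.0,
        "false_positive_rate": fp / negatives if negatives else 0.0,
        # MCC is conventionally set to 0 when any marginal count is empty.
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


# Hypothetical per-residue scores and true labels (True = RNA-binding residue).
scores = [0.9, 0.8, 0.4, 0.3, 0.2, -0.1, -0.5, -0.7]
labels = [True, True, False, True, False, False, False, False]

for threshold in (0.5, 0.25):
    print(threshold, metrics(*confusion_counts(scores, labels, threshold)))
```

On this toy data, lowering the threshold from 0.5 to 0.25 recovers the binding residue that was missed but introduces one false positive, which is the trade-off a threshold plot lets a user explore.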
Affiliation(s)
- Yoichi Murakami
  - Laboratory of Protein Informatics, Research Center for Structural and Functional Proteomics, Institute for Protein Research, Osaka University, Osaka, Japan and Department of Chemistry and Biochemistry, School of Life Sciences, John Maynard-Smith Building, University of Sussex, Falmer BN1 9QG, UK
- Ruth V. Spriggs
  - Laboratory of Protein Informatics, Research Center for Structural and Functional Proteomics, Institute for Protein Research, Osaka University, Osaka, Japan and Department of Chemistry and Biochemistry, School of Life Sciences, John Maynard-Smith Building, University of Sussex, Falmer BN1 9QG, UK
- Haruki Nakamura
  - Laboratory of Protein Informatics, Research Center for Structural and Functional Proteomics, Institute for Protein Research, Osaka University, Osaka, Japan and Department of Chemistry and Biochemistry, School of Life Sciences, John Maynard-Smith Building, University of Sussex, Falmer BN1 9QG, UK
- Susan Jones
  - Laboratory of Protein Informatics, Research Center for Structural and Functional Proteomics, Institute for Protein Research, Osaka University, Osaka, Japan and Department of Chemistry and Biochemistry, School of Life Sciences, John Maynard-Smith Building, University of Sussex, Falmer BN1 9QG, UK
|
48
|
Lichtarge O, Wilkins A. Evolution: a guide to perturb protein function and networks. Curr Opin Struct Biol 2010; 20:351-9. [PMID: 20444593 PMCID: PMC2916956 DOI: 10.1016/j.sbi.2010.04.002] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.5] [Received: 04/06/2010] [Accepted: 04/08/2010] [Indexed: 12/11/2022]
Abstract
Protein interactions give rise to networks that control cell fate in health and disease; selective means to probe these interactions are therefore of wide interest. We discuss here Evolutionary Tracing (ET), a comparative method to identify protein functional sites and to guide experiments that selectively block, recode, or mimic their amino acid determinants. These studies suggest, in principle, a scalable approach to perturb individual links in protein networks.
Affiliation(s)
- Olivier Lichtarge
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA.
|
49
|
Montelione GT, Szyperski T. Advances in protein NMR provided by the NIGMS Protein Structure Initiative: impact on drug discovery. Curr Opin Drug Discov Devel 2010; 13:335-349. [PMID: 20443167 PMCID: PMC4002360] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/29/2023]
Abstract
Rational drug design relies on the 3D structures of biological macromolecules, with a particular emphasis on proteins. The structural genomics-based high-throughput structure determination platforms established by the Protein Structure Initiative (PSI) of the National Institute of General Medical Sciences (NIGMS) of the NIH are uniquely suited to provide these structures. NMR plays a critical role in structure determination because many important protein targets do not form the single crystals required for X-ray diffraction. NMR can provide valuable structural and dynamic information on proteins and their drug complexes that cannot be obtained with X-ray crystallography. This review discusses recent advances in NMR that have been driven by structural genomics projects. These advances suggest that the future discovery and design of drugs can increasingly rely on protocols using NMR approaches for the rapid and accurate determination of structures.
Affiliation(s)
- Gaetano T Montelione
  - Rutgers University, Center for Advanced Biotechnology and Medicine, Department of Molecular Biology and Biochemistry, Robert Wood Johnson Medical School, Piscataway, NJ 08854-5638, USA.
|
50
|
Hinz U. From protein sequences to 3D-structures and beyond: the example of the UniProt knowledgebase. Cell Mol Life Sci 2010; 67:1049-64. [PMID: 20043185 PMCID: PMC2835715 DOI: 10.1007/s00018-009-0229-6] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.4] [Received: 08/14/2009] [Revised: 12/01/2009] [Accepted: 12/07/2009] [Indexed: 11/12/2022]
Abstract
With the dramatic increase in the volume of experimental results in every domain of life sciences, assembling pertinent data and combining information from different fields has become a challenge. Information is dispersed over numerous specialized databases and is presented in many different formats. Rapid access to experiment-based information about well-characterized proteins helps predict the function of uncharacterized proteins identified by large-scale sequencing. In this context, universal knowledgebases play essential roles in providing access to data from complementary types of experiments and serving as hubs with cross-references to many specialized databases. This review outlines how the value of experimental data is optimized by combining high-quality protein sequences with complementary experimental results, including information derived from protein 3D-structures, using as an example the UniProt knowledgebase (UniProtKB) and the tools and links provided on its website (http://www.uniprot.org/). It also evokes precautions that are necessary for successful predictions and extrapolations.
Affiliation(s)
- Ursula Hinz
  - Swiss-Prot Group, Swiss Institute of Bioinformatics, 1 rue Michel Servet, 1211, Geneva, Switzerland.
|