1
Yang H, Kim K, Li S, Pacheco J, Chen XS. Structural basis of sequence-specific RNA recognition by the antiviral factor APOBEC3G. Nat Commun 2022; 13:7498. [PMID: 36470880 PMCID: PMC9722718 DOI: 10.1038/s41467-022-35201-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2022] [Accepted: 11/22/2022] [Indexed: 12/12/2022]
Abstract
An essential step in restricting HIV infectivity by the antiviral factor APOBEC3G is its incorporation into progeny virions via binding to HIV RNA. However, the mechanism of APOBEC3G capturing viral RNA is unknown. Here, we report crystal structures of a primate APOBEC3G bound to different types of RNAs, revealing that APOBEC3G specifically recognizes unpaired 5'-AA-3' dinucleotides, and to a lesser extent, 5'-GA-3' dinucleotides. APOBEC3G binds to the common 3'A in the AA/GA motifs using an aromatic/hydrophobic pocket in the non-catalytic domain. It binds to the 5'A or 5'G in the AA/GA motifs using an aromatic/hydrophobic groove conformed between the non-catalytic and catalytic domains. APOBEC3G RNA binding property is distinct from that of the HIV nucleocapsid protein recognizing unpaired guanosines. Our findings suggest that the sequence-specific RNA recognition is critical for APOBEC3G virion packaging and restricting HIV infectivity.
Affiliation(s)
- Hanjing Yang
- Molecular and Computational Biology, Departments of Biological Sciences and Chemistry, Los Angeles, CA 90089 USA
- Kyumin Kim
- Molecular and Computational Biology, Departments of Biological Sciences and Chemistry, Los Angeles, CA 90089 USA
- Shuxing Li
- Molecular and Computational Biology, Departments of Biological Sciences and Chemistry, Los Angeles, CA 90089 USA; Center of Excellence in NanoBiophysics, University of Southern California, Los Angeles, CA 90089 USA
- Josue Pacheco
- Molecular and Computational Biology, Departments of Biological Sciences and Chemistry, Los Angeles, CA 90089 USA
- Xiaojiang S. Chen
- Molecular and Computational Biology, Departments of Biological Sciences and Chemistry, Los Angeles, CA 90089 USA; Center of Excellence in NanoBiophysics, University of Southern California, Los Angeles, CA 90089 USA; Genetic, Molecular and Cellular Biology Program, Keck School of Medicine, Los Angeles, CA 90033 USA; Norris Comprehensive Cancer Center, University of Southern California, Los Angeles, CA 90033 USA
2
Behzadi P, Gajdács M. Worldwide Protein Data Bank (wwPDB): A virtual treasure for research in biotechnology. Eur J Microbiol Immunol (Bp) 2021; 11:77-86. [PMID: 34908533 PMCID: PMC8830413 DOI: 10.1556/1886.2021.00020] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 11/06/2021] [Accepted: 11/23/2021] [Indexed: 12/25/2022]
Abstract
The Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB) provides a wide range of digital data regarding biology and biomedicine. This huge internet resource involves a wide range of important biological data, obtained from experiments around the globe by different scientists. The Worldwide Protein Data Bank (wwPDB) represents a brilliant collection of 3D structure data associated with important and vital biomolecules including nucleic acids (RNAs and DNAs) and proteins. Moreover, this database accumulates knowledge regarding function and evolution of biomacromolecules which supports different disciplines such as biotechnology. 3D structure, functional characteristics and phylogenetic properties of biomacromolecules give a deep understanding of the biomolecules' characteristics. An important advantage of the wwPDB database is that its data are updated every week. This updating process helps users to have the newest data and information for their projects. The data and information in wwPDB can be a great support for accurately visualizing and illustrating biomacromolecules in biotechnology. As demonstrated by the SARS-CoV-2 pandemic, rapidly accessible and reliable biological data for microbiology, immunology, vaccinology, and drug development are critical to address many healthcare-related challenges that are facing humanity. The aim of this paper is to introduce the readers to wwPDB, and to highlight the importance of this database in biotechnology, with the expectation that the number of scientists interested in the utilization of Protein Data Bank's resources will increase substantially in the coming years.
Affiliation(s)
- Payam Behzadi
- Department of Microbiology, College of Basic Sciences, Shahr-e-Qods Branch, Islamic Azad University, Tehran, 37541-374, Iran
- Márió Gajdács
- Department of Oral Biology and Experimental Dental Research, Faculty of Dentistry, University of Szeged, 6720, Szeged, Hungary. Corresponding author. Tel.: +36-62-342-532.
3
Burley SK, Berman HM. Open-access data: A cornerstone for artificial intelligence approaches to protein structure prediction. Structure 2021; 29:515-520. [PMID: 33984281 DOI: 10.1016/j.str.2021.04.010] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 03/16/2021] [Revised: 04/08/2021] [Accepted: 04/23/2021] [Indexed: 12/28/2022]
Abstract
The Protein Data Bank (PDB) was established in 1971 to archive three-dimensional (3D) structures of biological macromolecules as a public good. Fifty years later, the PDB is providing millions of data consumers around the world with open access to more than 175,000 experimentally determined structures of proteins and nucleic acids (DNA, RNA) and their complexes with one another and small-molecule ligands. PDB data users are working, teaching, and learning in fundamental biology, biomedicine, bioengineering, biotechnology, and energy sciences. They also represent the fields of agriculture, chemistry, physics and materials science, mathematics, statistics, computer science, and zoology, and even the social sciences. The enormous wealth of 3D structure data stored in the PDB has underpinned significant advances in our understanding of protein architecture, culminating in recent breakthroughs in protein structure prediction accelerated by artificial intelligence approaches and deep or machine learning methods.
Affiliation(s)
- Stephen K Burley
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Rutgers Cancer Institute of New Jersey, Rutgers, The State University of New Jersey, New Brunswick, NJ 08903, USA; Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, San Diego, La Jolla, CA 92093, USA; Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA 92093, USA.
- Helen M Berman
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; The Bridge Institute, Michelson Center for Convergent Bioscience, University of Southern California, Los Angeles, CA 90089, USA.
4
Krebs FS, Zoete V, Trottet M, Pouchon T, Bovigny C, Michielin O. Swiss-PO: a new tool to analyze the impact of mutations on protein three-dimensional structures for precision oncology. NPJ Precis Oncol 2021; 5:19. [PMID: 33737716 PMCID: PMC7973488 DOI: 10.1038/s41698-021-00156-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 08/28/2020] [Accepted: 02/04/2021] [Indexed: 12/12/2022]
Abstract
Swiss-PO is a new web tool to map gene mutations on the 3D structure of corresponding proteins and to intuitively assess the structural implications of protein variants for precision oncology. Swiss-PO is constructed around a manually curated database of 3D structures, variant annotations, and sequence alignments, for a list of 50 genes taken from the Ion AmpliSeq™ Custom Cancer Hotspot Panel. The website was designed to guide users in the choice of the most appropriate structure to analyze regarding the mutated residue, the role of the protein domain it belongs to, or the drug that could be selected to treat the patient. The importance of the mutated residue for the structure and activity of the protein can be assessed based on the molecular interactions exchanged with neighbor residues in 3D within the same protein or between different biomacromolecules, its conservation in orthologs, or the known effect of reported mutations in its 3D or sequence-based vicinity. Swiss-PO is available free of charge and without login at https://www.swiss-po.ch.
Affiliation(s)
- Fanny S Krebs
- Computer-Aided Molecular Engineering, Department of Oncology, Ludwig Institute for Cancer Research Lausanne Branch, University of Lausanne, Lausanne, Switzerland
- Vincent Zoete
- Computer-Aided Molecular Engineering, Department of Oncology, Ludwig Institute for Cancer Research Lausanne Branch, University of Lausanne, Lausanne, Switzerland
- Molecular Modelling Group, Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
- Maxence Trottet
- Computer-Aided Molecular Engineering, Department of Oncology, Ludwig Institute for Cancer Research Lausanne Branch, University of Lausanne, Lausanne, Switzerland
- Molecular Modelling Group, Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
- Timothée Pouchon
- Molecular Modelling Group, Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
- Christophe Bovigny
- Molecular Modelling Group, Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
- Olivier Michielin
- Computer-Aided Molecular Engineering, Department of Oncology, Ludwig Institute for Cancer Research Lausanne Branch, University of Lausanne, Lausanne, Switzerland
- Molecular Modelling Group, Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
- Department of Oncology, Ludwig Institute for Cancer Research, University Hospital of Lausanne, Lausanne, Switzerland
5
Berman HM, Vallat B, Lawson CL. The data universe of structural biology. IUCrJ 2020; 7:630-638. [PMID: 32695409 PMCID: PMC7340255 DOI: 10.1107/s205225252000562x] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 03/20/2020] [Accepted: 04/21/2020] [Indexed: 05/05/2023]
Abstract
The Protein Data Bank (PDB) has grown from a small data resource for crystallographers to a worldwide resource serving structural biology. The history of the growth of the PDB and the role that the community has played in developing standards and policies are described. This article also illustrates how other biophysics communities are collaborating with the worldwide PDB to create a network of interoperating data resources. This network will expand the capabilities of structural biology and enable the determination and archiving of increasingly complex structures.
Affiliation(s)
- Helen M. Berman
- Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Department of Biological Sciences and Bridge Institute, University of Southern California, Los Angeles, CA 90089, USA
- Brinda Vallat
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Catherine L. Lawson
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
6
Bohn JA, Thummar K, York A, Raymond A, Brown WC, Bieniasz PD, Hatziioannou T, Smith JL. APOBEC3H structure reveals an unusual mechanism of interaction with duplex RNA. Nat Commun 2017; 8:1021. [PMID: 29044109 PMCID: PMC5647330 DOI: 10.1038/s41467-017-01309-6] [Citation(s) in RCA: 36] [Impact Index Per Article: 5.1] [Received: 05/26/2017] [Accepted: 09/06/2017] [Indexed: 11/08/2022]
Abstract
The APOBEC3 family of cytidine deaminases cause lethal hypermutation of retroviruses via deamination of newly reverse-transcribed viral DNA. Their ability to bind RNA is essential for virion infiltration and antiviral activity, yet the mechanisms of viral RNA recognition are unknown. By screening naturally occurring, polymorphic, non-human primate APOBEC3H variants for biological and crystallization properties, we obtained a 2.24-Å crystal structure of pig-tailed macaque APOBEC3H with bound RNA. Here, we report that APOBEC3H forms a dimer around a short RNA duplex and, despite the bound RNA, has potent cytidine deaminase activity. The structure reveals an unusual RNA-binding mode in which two APOBEC3H molecules at opposite ends of a seven-base-pair duplex interact extensively with both RNA strands, but form no protein-protein contacts. CLIP-seq analysis revealed that APOBEC3H preferentially binds to sequences in the viral genome predicted to contain duplexes, a property that may facilitate both virion incorporation and catalytic activity.
Affiliation(s)
- Jennifer A Bohn
- Life Sciences Institute, University of Michigan, Ann Arbor, MI, 48109, USA
- Department of Biological Chemistry, University of Michigan, Ann Arbor, MI, 48109, USA
- Keyur Thummar
- Laboratory of Retrovirology, The Rockefeller University, New York, NY, 10065, USA
- Howard Hughes Medical Institute, The Rockefeller University, New York, NY, 10065, USA
- Ashley York
- Laboratory of Retrovirology, The Rockefeller University, New York, NY, 10065, USA
- Howard Hughes Medical Institute, The Rockefeller University, New York, NY, 10065, USA
- Alice Raymond
- Laboratory of Retrovirology, The Rockefeller University, New York, NY, 10065, USA
- Howard Hughes Medical Institute, The Rockefeller University, New York, NY, 10065, USA
- W Clay Brown
- Life Sciences Institute, University of Michigan, Ann Arbor, MI, 48109, USA
- Paul D Bieniasz
- Laboratory of Retrovirology, The Rockefeller University, New York, NY, 10065, USA
- Howard Hughes Medical Institute, The Rockefeller University, New York, NY, 10065, USA
- Janet L Smith
- Life Sciences Institute, University of Michigan, Ann Arbor, MI, 48109, USA
- Department of Biological Chemistry, University of Michigan, Ann Arbor, MI, 48109, USA
7
Carrozzini B, Cascarano GL, Giacovazzo C, Mazzone A. Advances in molecular-replacement procedures: the REVAN pipeline. Acta Crystallogr D Biol Crystallogr 2015; 71:1856-63. [DOI: 10.1107/s1399004715012730] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 05/13/2015] [Accepted: 07/01/2015] [Indexed: 11/10/2022]
Abstract
The REVAN pipeline aiming at the solution of protein structures via molecular replacement (MR) has been assembled. It is the successor to REVA, a pipeline that is particularly efficient when the sequence identity (SI) between the target and the model is greater than 0.30. The REVAN and REVA procedures coincide when the SI is >0.30, but differ substantially in worse conditions. To treat these cases, REVAN combines a variety of programs and algorithms (REMO09, REFMAC, DM, DSR, VLD, free lunch, Coot, Buccaneer and phenix.autobuild). The MR model, suitably rotated and positioned, is first refined by a standard REFMAC refinement procedure, and the corresponding electron density is then submitted to cycles of DM–VLD–REFMAC. The next REFMAC applications exploit the better electron densities obtained at the end of the VLD–EDM sections (a procedure called vector refinement). In order to make the model more similar to the target, the model is submitted to mutations, in which Coot plays a basic role, and it is then cyclically resubmitted to REFMAC–EDM–VLD cycles. The phases thus obtained are submitted to free lunch and allow most of the test structures studied by DiMaio et al. [(2011), Nature (London), 473, 540–543] to be solved without using energy-guided programs.
8
Huang YH, Rose PW, Hsu CN. Citing a Data Repository: A Case Study of the Protein Data Bank. PLoS One 2015; 10:e0136631. [PMID: 26317409 PMCID: PMC4552849 DOI: 10.1371/journal.pone.0136631] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Received: 06/25/2015] [Accepted: 08/06/2015] [Indexed: 12/01/2022]
Abstract
The Protein Data Bank (PDB) is the worldwide repository of 3D structures of proteins, nucleic acids and complex assemblies. The PDB’s large corpus of data (> 100,000 structures) and related citations provide a well-organized and extensive test set for developing and understanding data citation and access metrics. In this paper, we present a systematic investigation of how authors cite PDB as a data repository. We describe a novel metric based on information cascade constructed by exploring the citation network to measure influence between competing works and apply that to analyze different data citation practices to PDB. Based on this new metric, we found that the original publication of RCSB PDB in the year 2000 continues to attract most citations though many follow-up updates were published. None of these follow-up publications by members of the wwPDB organization can compete with the original publication in terms of citations and influence. Meanwhile, authors increasingly choose to use URLs of PDB in the text instead of citing PDB papers, leading to disruption of the growth of the literature citations. A comparison of data usage statistics and paper citations shows that PDB Web access is highly correlated with URL mentions in the text. The results reveal the trend of how authors cite a biomedical data repository and may provide useful insight of how to measure the impact of a data repository.
Affiliation(s)
- Yi-Hung Huang
- Department of Computer Science, National Taiwan University, Taipei 106, Taiwan
- Intel-NTU Connected Context Computing Center, National Taiwan University, Taipei 106, Taiwan
- Peter W. Rose
- RCSB Protein Data Bank, San Diego Supercomputer Center, UC San Diego, La Jolla, CA 92093, United States of America
- Chun-Nan Hsu
- Division of Biomedical Informatics, Department of Medicine, UC San Diego, La Jolla, CA 92093, United States of America
9
Ding HJ, Oikonomou CM, Jensen GJ. The Caltech Tomography Database and Automatic Processing Pipeline. J Struct Biol 2015; 192:279-86. [PMID: 26087141 DOI: 10.1016/j.jsb.2015.06.016] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Received: 03/20/2015] [Revised: 06/11/2015] [Accepted: 06/13/2015] [Indexed: 10/23/2022]
Abstract
Here we describe the Caltech Tomography Database and automatic image processing pipeline, designed to process, store, display, and distribute electron tomographic data including tilt-series, sample information, data collection parameters, 3D reconstructions, correlated light microscope images, snapshots, segmentations, movies, and other associated files. Tilt-series are typically uploaded automatically during collection to a user's "Inbox" and processed automatically, but can also be entered and processed in batches via scripts or file-by-file through an internet interface. As with the video website YouTube, each tilt-series is represented on the browsing page with a link to the full record, a thumbnail image and a video icon that delivers a movie of the tomogram in a pop-out window. Annotation tools allow users to add notes and snapshots. The database is fully searchable, and sets of tilt-series can be selected and re-processed, edited, or downloaded to a personal workstation. The results of further processing and snapshots of key results can be recorded in the database, automatically linked to the appropriate tilt-series. While the database is password-protected for local browsing and searching, datasets can be made public and individual files can be shared with collaborators over the Internet. Together these tools facilitate high-throughput tomography work by both individuals and groups.
Affiliation(s)
- H Jane Ding
- Division of Biology, California Institute of Technology, 1200 E. California Blvd., Pasadena, CA 91125, United States
- Catherine M Oikonomou
- Division of Biology, California Institute of Technology, 1200 E. California Blvd., Pasadena, CA 91125, United States
- Grant J Jensen
- Division of Biology, California Institute of Technology, 1200 E. California Blvd., Pasadena, CA 91125, United States; Howard Hughes Medical Institute, United States
10
Oldfield T. Mathematical Data Mining. Crystallogr Rev 2014. [DOI: 10.1080/08893110410001664909] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2022]
11
Inhester T, Rarey M. Protein-ligand interaction databases: advanced tools to mine activity data and interactions on a structural level. Wiley Interdisciplinary Reviews: Computational Molecular Science 2014. [DOI: 10.1002/wcms.1192] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Indexed: 12/29/2022]
Affiliation(s)
- Therese Inhester
- Center for Bioinformatics; University of Hamburg; Hamburg Germany
- Matthias Rarey
- Center for Bioinformatics; University of Hamburg; Hamburg Germany
12
Ali H, Urolagin S, Gurarslan Ö, Vihinen M. Performance of Protein Disorder Prediction Programs on Amino Acid Substitutions. Hum Mutat 2014; 35:794-804. [DOI: 10.1002/humu.22564] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Received: 12/20/2013] [Accepted: 04/04/2014] [Indexed: 01/04/2023]
Affiliation(s)
- Heidi Ali
- Institute of Biomedical Technology; FI-33014 University of Tampere; Tampere Finland
- BioMediTech; Tampere Finland
- Siddhaling Urolagin
- Department of Experimental Medical Science; Lund University; SE-22184 Lund Sweden
- Ömer Gurarslan
- Institute of Biomedical Technology; FI-33014 University of Tampere; Tampere Finland
- BioMediTech; Tampere Finland
- Mauno Vihinen
- Institute of Biomedical Technology; FI-33014 University of Tampere; Tampere Finland
- BioMediTech; Tampere Finland
- Department of Experimental Medical Science; Lund University; SE-22184 Lund Sweden
- Tampere University Hospital; Tampere Finland
13
Espinosa O, Mitsopoulos K, Hakas J, Pearl F, Zvelebil M. Deriving a mutation index of carcinogenicity using protein structure and protein interfaces. PLoS One 2014; 9:e84598. [PMID: 24454733 PMCID: PMC3893166 DOI: 10.1371/journal.pone.0084598] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Received: 07/22/2013] [Accepted: 11/16/2013] [Indexed: 11/29/2022]
Abstract
With the advent of Next Generation Sequencing the identification of mutations in the genomes of healthy and diseased tissues has become commonplace. While much progress has been made to elucidate the aetiology of disease processes in cancer, the contributions to disease that many individual mutations make remain to be characterised and their downstream consequences on cancer phenotypes remain to be understood. Missense mutations commonly occur in cancers and their consequences remain challenging to predict. However, this knowledge is becoming more vital, for both assessing disease progression and for stratifying drug treatment regimes. Coupled with structural data, comprehensive genomic databases of mutations such as the 1000 Genomes project and COSMIC give an opportunity to investigate general principles of how cancer mutations disrupt proteins and their interactions at the molecular and network level. We describe a comprehensive comparison of cancer and neutral missense mutations; by combining features derived from structural and interface properties we have developed a carcinogenicity predictor, InCa (Index of Carcinogenicity). Upon comparison with other methods, we observe that InCa can predict mutations that might not be detected by other methods. We also discuss general limitations shared by all predictors that attempt to predict driver mutations and discuss how this could impact high-throughput predictions. A web interface to a server implementation is publicly available at http://inca.icr.ac.uk/.
Affiliation(s)
- Octavio Espinosa
- Breakthrough Breast Cancer Research Centre, Institute of Cancer Research, London, United Kingdom
- Konstantinos Mitsopoulos
- Breakthrough Breast Cancer Research Centre, Institute of Cancer Research, London, United Kingdom
- Jarle Hakas
- Breakthrough Breast Cancer Research Centre, Institute of Cancer Research, London, United Kingdom
- Frances Pearl
- UK Cancer Therapeutics Unit, The Institute of Cancer Research, London, United Kingdom
- Translational Drug Discovery Group, School of Life Sciences, University of Sussex, Brighton, United Kingdom
- Marketa Zvelebil
- Breakthrough Breast Cancer Research Centre, Institute of Cancer Research, London, United Kingdom
14
Berman HM. Creating a community resource for protein science. Protein Sci 2012; 21:1587-96. [PMID: 22969036 PMCID: PMC3527698 DOI: 10.1002/pro.2154] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 08/09/2012] [Accepted: 08/30/2012] [Indexed: 12/13/2022]
Abstract
In addition to being one of the early pioneers in protein crystallography, Carl Brändén made significant contributions to science education with his elegant and beautifully illustrated book Introduction to Protein Structure (Brändén and Tooze, New York: Garland, 1991). It is truly an honor to receive this award in their names. This award and the 40th anniversary of the Protein Data Bank (PDB; Berman et al., Structure 2012;20:391-396) have given me an opportunity to reflect on the various components that have contributed to building a resource for protein science and to try to quantify the impact of having PDB data openly available.
Affiliation(s)
- Helen M Berman
- Department of Chemistry and Chemical Biology, Center for Integrative Proteomics Research, Rutgers, The State University of New Jersey, Piscataway, New Jersey 08854, USA
15
Saito M, Takemura N, Shirai T. Classification of ligand molecules in PDB with fast heuristic graph match algorithm COMPLIG. J Mol Biol 2012; 424:379-90. [PMID: 23041414 DOI: 10.1016/j.jmb.2012.10.001] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Received: 07/20/2012] [Revised: 09/15/2012] [Accepted: 10/01/2012] [Indexed: 11/29/2022]
Abstract
A fast heuristic graph-matching algorithm, COMPLIG, was devised to classify the small-molecule ligands in the Protein Data Bank (PDB), which are currently not properly classified on structure basis. By concurrently classifying proteins and ligands, we determined the most appropriate parameter for categorizing ligands to be more than 60% identity of atoms and bonds between molecules, and we classified 11,585 types of ligands into 1946 clusters. Although the large clusters were composed of nucleotides or amino acids, a significant presence of drug compounds was also observed. Application of the system to classify the natural ligand status of human proteins in the current database suggested that, at most, 37% of the experimental structures of human proteins were in complex with natural ligands. However, protein homology- and/or ligand similarity-based modeling was implied to provide models of natural interactions for an additional 28% of the total, which might be used to increase the knowledge of intrinsic protein-metabolite interactions.
Affiliation(s)
- Mihoko Saito
- Nagahama Institute of Bioscience and Technology and Bioinformatics Research Division, Japan Science and Technology Agency, Nagahama, Shiga 526-0829, Japan
16
Müller H, Freytag JC, Leser U. Improving data quality by source analysis. ACM Journal of Data and Information Quality 2012. [DOI: 10.1145/2107536.2107538] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 10/28/2022]
Abstract
In many domains, data cleaning is hampered by our limited ability to specify a comprehensive set of integrity constraints to assist in identification of erroneous data. An alternative approach to improve data quality is to exploit different data sources that contain information about the same set of objects. Such overlapping sources highlight hot-spots of poor data quality through conflicting data values and immediately provide alternative values for conflict resolution. In order to derive a dataset of high quality, we can merge the overlapping sources based on a quality assessment of the conflicting values. The quality of the resulting dataset, however, is highly dependent on our ability to assess the quality of conflicting values effectively.
The main objective of this article is to introduce methods that aid the developer of an integrated system over overlapping, but contradicting sources in the task of improving the quality of data. Value conflicts between contradicting sources are often systematic, caused by some characteristic of the different sources. Our goal is to identify such systematic differences and outline data patterns that occur in conjunction with them. Evaluated by an expert user, the regularities discovered provide insights into possible conflict reasons and help to assess the quality of inconsistent values. The contributions of this article are two concepts of systematic conflicts: contradiction patterns and minimal update sequences. Contradiction patterns resemble a special form of association rules that summarize characteristic data properties for conflict occurrence. We adapt existing association rule mining algorithms for mining contradiction patterns. Contradiction patterns, however, view each class of conflicts in isolation, sometimes leading to largely overlapping patterns. Sequences of set-oriented update operations that transform one data source into the other are compact descriptions for all regular differences among the sources. We consider minimal update sequences as the most likely explanation for observed differences between overlapping data sources. Furthermore, the order of operations within the sequences points out potential dependencies between systematic differences. Finding minimal update sequences, however, is beyond reach in practice. We show that the problem already is NP-complete for a restricted set of operations. In the light of this intractability result, we present heuristics that lead to convincing results for all examples we considered.
Affiliation(s)
- Ulf Leser
- Humboldt-Universität zu Berlin, Berlin, Germany
|
17
|
Abstract
Web-based protein structure databases come in a wide variety of types and levels of information content. Those having the most general interest are the various atlases that describe each experimentally determined protein structure and provide useful links, analyses and schematic diagrams relating to its 3D structure and biological function. Also of great interest are the databases that classify 3D structures by their folds as these can reveal evolutionary relationships which may be hard to detect from sequence comparison alone. Related to these are the numerous servers that compare folds, particularly useful for newly solved structures, and especially those of unknown function. Beyond these there are a vast number of databases for the most specialized user, dealing with specific families, diseases, structural features and so on.
|
18
|
Fischer JD, Holliday GL, Thornton JM. The CoFactor database: organic cofactors in enzyme catalysis. Bioinformatics 2010; 26:2496-7. [PMID: 20679331 PMCID: PMC2944199 DOI: 10.1093/bioinformatics/btq442] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.8] [Indexed: 11/14/2022]
Abstract
Motivation: Organic enzyme cofactors are involved in many enzyme reactions. Therefore, the analysis of cofactors is crucial to gain a better understanding of enzyme catalysis. To aid this, we have created the CoFactor database. Results: CoFactor provides a web interface to access hand-curated data extracted from the literature on organic enzyme cofactors in biocatalysis, as well as automatically collected information. CoFactor includes information on the conformational and solvent accessibility variation of the enzyme-bound cofactors, as well as mechanistic and structural information about the hosting enzymes. Availability: The database is publicly available and can be accessed at http://www.ebi.ac.uk/thornton-srv/databases/CoFactor Contact: julia.fischer@ebi.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Julia D Fischer
- EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.
|
19
|
Hinz U. From protein sequences to 3D-structures and beyond: the example of the UniProt knowledgebase. Cell Mol Life Sci 2010; 67:1049-64. [PMID: 20043185 PMCID: PMC2835715 DOI: 10.1007/s00018-009-0229-6] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.5] [Received: 08/14/2009] [Revised: 12/01/2009] [Accepted: 12/07/2009] [Indexed: 11/12/2022]
Abstract
With the dramatic increase in the volume of experimental results in every domain of life sciences, assembling pertinent data and combining information from different fields has become a challenge. Information is dispersed over numerous specialized databases and is presented in many different formats. Rapid access to experiment-based information about well-characterized proteins helps predict the function of uncharacterized proteins identified by large-scale sequencing. In this context, universal knowledgebases play essential roles in providing access to data from complementary types of experiments and serving as hubs with cross-references to many specialized databases. This review outlines how the value of experimental data is optimized by combining high-quality protein sequences with complementary experimental results, including information derived from protein 3D-structures, using as an example the UniProt knowledgebase (UniProtKB) and the tools and links provided on its website ( http://www.uniprot.org/ ). It also evokes precautions that are necessary for successful predictions and extrapolations.
Affiliation(s)
- Ursula Hinz
- Swiss-Prot Group, Swiss Institute of Bioinformatics, 1 rue Michel Servet, 1211, Geneva, Switzerland.
|
20
|
Laskowski RA. Protein structure databases. Methods Mol Biol 2010; 609:59-82. [PMID: 20221913 DOI: 10.1007/978-1-60327-241-4_4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/28/2023]
Abstract
Web-based protein structure databases come in a wide variety of types and levels of information content. Those having the most general interest are the various atlases that describe each experimentally determined protein structure and provide useful links, analyses, and schematic diagrams relating to its 3D structure and biological function. Also of great interest are the databases that classify 3D structures by their folds as these can reveal evolutionary relationships which may be hard to detect from sequence comparison alone. Related to these are the numerous servers that compare folds--particularly useful for newly solved structures, and especially those of unknown function. Beyond these there are a vast number of databases for the more specialized user, dealing with specific families, diseases, structural features, and so on.
Affiliation(s)
- Roman A Laskowski
- EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
|
21
|
Lopez D, Pazos F. Gene ontology functional annotations at the structural domain level. Proteins 2009; 76:598-607. [PMID: 19241468 DOI: 10.1002/prot.22373] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Indexed: 11/09/2022]
Abstract
Most proteins are organized in domains which can be seen as independent modular units in terms of molecular function (MF). Nevertheless, current functional annotations are done on a "whole-chain" basis without associating specific functions to the individual domains. We present here an automatic method for discerning which particular structural domain within a protein is responsible for a given MF originally attributed to the whole protein. By annotating the SCOP structural domains with gene ontology terms using this method, we obtained the first large-scale functional annotation at the domain level. We performed a large-scale comparison of these annotations with the ones implicit in the functional annotations of Interpro signatures, showing that the performance of this method is globally better. We also discuss in detail some particular examples. Generated automatically and available online, this resource could be the basis for future manually curated annotations.
Affiliation(s)
- Daniel Lopez
- National Centre for Biotechnology, Madrid, Spain
|
22
|
Prieto C, De Las Rivas J. Structural domain-domain interactions: Assessment and comparison with protein-protein interaction data to improve the interactome. Proteins 2009; 78:109-17. [DOI: 10.1002/prot.22569] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Indexed: 11/08/2022]
|
23
|
Maupetit J, Tuffery P, Derreumaux P. A coarse-grained protein force field for folding and structure prediction. Proteins 2009; 69:394-408. [PMID: 17600832 DOI: 10.1002/prot.21505] [Citation(s) in RCA: 164] [Impact Index Per Article: 10.9] [Indexed: 12/22/2022]
Abstract
We have revisited the protein coarse-grained optimized potential for efficient structure prediction (OPEP). The training and validation sets consist of 13 and 16 protein targets. Because optimization depends on details of how the ensemble of decoys is sampled, trial conformations are generated by molecular dynamics, threading, greedy, and Monte Carlo simulations, or taken from publicly available databases. The OPEP parameters are varied by a genetic algorithm using a scoring function which requires that the native structure has the lowest energy, and the native-like structures have energy higher than the native structure but lower than the remote conformations. Overall, we find that OPEP correctly identifies 24 native or native-like states for 29 targets and has very similar capability to the all-atom discrete optimized protein energy model (DOPE), found recently to outperform five currently used energy models.
Affiliation(s)
- Julien Maupetit
- Equipe de Bioinformatique Génomique et Moléculaire, INSERM E0346, Université Paris 7, Tour 53-54, 2 place Jussieu, 75251 Paris, Cedex 05, France
|
24
|
Kirchmair J, Markt P, Distinto S, Schuster D, Spitzer GM, Liedl KR, Langer T, Wolber G. The Protein Data Bank (PDB), its related services and software tools as key components for in silico guided drug discovery. J Med Chem 2009; 51:7021-40. [PMID: 18975926 DOI: 10.1021/jm8005977] [Citation(s) in RCA: 70] [Impact Index Per Article: 4.7] [Indexed: 11/30/2022]
Affiliation(s)
- Johannes Kirchmair
- Department of Pharmaceutical Chemistry, Faculty of Chemistry and Pharmacy and Center for Molecular Biosciences, University of Innsbruck, Innrain 52, A-6020 Innsbruck, Austria
|
25
|
Floris M, Orsini M, Thanaraj TA. Splice-mediated Variants of Proteins (SpliVaP) - data and characterization of changes in signatures among protein isoforms due to alternative splicing. BMC Genomics 2008; 9:453. [PMID: 18831736 PMCID: PMC2573899 DOI: 10.1186/1471-2164-9-453] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 02/11/2008] [Accepted: 10/02/2008] [Indexed: 12/22/2022]
Abstract
BACKGROUND It is often the case that mammalian genes are alternatively spliced; the resulting alternate transcripts often encode protein isoforms that differ in amino acid sequences. Changes among the protein isoforms can alter the cellular properties of proteins. The effect can range from a subtle modulation to a complete loss of function. RESULTS (i) We examined human splice-mediated protein isoforms (as extracted from a manually curated data set, and from a computationally predicted data set) for differences in the annotation for protein signatures (Pfam domains and PRINTS fingerprints) and we characterized the differences and their effects on protein functionalities. An important question addressed relates to the extent of protein isoforms that may lack any known function in the cell. (ii) We present a database that reports differences in protein signatures among human splice-mediated protein isoform sequences. CONCLUSION (i) Characterization: The work points to distinct sets of alternatively spliced genes with varying degrees of annotation for the splice-mediated protein isoforms. Protein molecular functions seen to be often affected are those that relate to: binding, catalytic, transcription regulation, structural molecule, transporter, motor, and antioxidant; and the processes that are often affected are nucleic acid binding, signal transduction, and protein-protein interactions. Signatures are often included/excluded and truncated in length among protein isoforms; truncation is seen as the predominant type of change. Analysis points to the following novel aspects: (a) Analysis using data from the manually curated Vega indicates that one in 8.9 genes can lead to a protein isoform of no "known" function; and one in 18 expressed protein isoforms can be such an "orphan" isoform; the corresponding numbers as seen with the computationally predicted ASD data set are: one in 4.9 genes and one in 9.8 isoforms. (b) When swapping of signatures occurs, it is often between those of the same functional classifications. (c) Pfam domains can occur in varying lengths, and PRINTS fingerprints can occur with varying numbers of constituent motifs among isoforms; since such a variation is seen in a large number of genes, it could be a general mechanism to modulate protein function. (ii) Data: The reported resource (at http://www.bioinformatica.crs4.org/tools/dbs/splivap/) provides the community with the ability to access data on splice-mediated protein isoforms (with value-added annotation such as association with diseases) through changes in protein signatures.
Affiliation(s)
- Matteo Floris
- CRS4-Bioinformatica, Parco Scientifico e Technologico, POLARIS, Edificio 3, 09010 PULA (CA), Sardinia, Italy.
|
26
|
Golovin A, Henrick K. MSDmotif: exploring protein sites and motifs. BMC Bioinformatics 2008; 9:312. [PMID: 18637174 PMCID: PMC2491636 DOI: 10.1186/1471-2105-9-312] [Citation(s) in RCA: 111] [Impact Index Per Article: 6.9] [Received: 05/07/2008] [Accepted: 07/17/2008] [Indexed: 12/20/2022]
Abstract
BACKGROUND Protein structures have conserved features - motifs, which have a sufficient influence on the protein function. These motifs can be found in sequence as well as in 3D space. Understanding of these fragments is essential for 3D structure prediction, modelling and drug-design. The Protein Data Bank (PDB) is the source of this information; however, present search tools have limited 3D options to integrate protein sequence with its 3D structure. RESULTS We describe here a web application for querying the PDB for ligands, binding sites, small 3D structural and sequence motifs and the underlying database. Novel algorithms for chemical fragments, 3D motifs, phi/psi sequences, super-secondary structure motifs and for small 3D structural motif associations searches are incorporated. The interface provides functionality for visualization, search criteria creation, sequence and 3D multiple alignment options. MSDmotif is an integrated system where a results page is also a search form. A set of motif statistics is available for analysis. This set includes molecule and motif binding statistics, distribution of motif sequences, occurrence of an amino-acid within a motif, correlation of amino-acid side-chain charges within a motif and Ramachandran plots for each residue. The binding statistics are presented in association with properties that include a ligand fragment library. Access is also provided through the Distributed Annotation System (DAS) protocol. An additional entry point facilitates XML requests with XML responses. CONCLUSION MSDmotif is unique by combining chemical, sequence and 3D data in a single search engine with a range of search and visualisation options. It provides multiple views of data found in the PDB archive for exploring protein structures.
Affiliation(s)
- Adel Golovin
- EMBL Outstation, The European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.
|
27
|
Gherardini PF, Helmer-Citterich M. Structure-based function prediction: approaches and applications. Briefings in Functional Genomics and Proteomics 2008; 7:291-302. [PMID: 18599513 DOI: 10.1093/bfgp/eln030] [Citation(s) in RCA: 58] [Impact Index Per Article: 3.6] [Indexed: 11/12/2022]
Abstract
The ever increasing number of protein structures determined by structural genomic projects has spurred much interest in the development of methods for structure-based function prediction. Existing methods can be roughly classified in two groups: some use a comparative approach looking for the presence of structural motifs possibly associated with a known biochemical function. Other methods try to identify functional patches on the surface of a protein using only its physicochemical characteristics. This review will cover both kinds of approaches to structure-based function prediction as well as their use in real-world cases. The main issues and limitations in using protein structure to predict function will also be discussed. These are mainly: the assessment of the statistical significance of structural similarities and the extent to which these methods depend on the accuracy and availability of structural data.
Affiliation(s)
- Pier Federico Gherardini
- Department of Biology, Centre for Molecular Bioinformatics, University of Tor Vergata, Rome, Italy.
|
28
|
Manning JR, Jefferson ER, Barton GJ. The contrasting properties of conservation and correlated phylogeny in protein functional residue prediction. BMC Bioinformatics 2008; 9:51. [PMID: 18221517 PMCID: PMC2267696 DOI: 10.1186/1471-2105-9-51] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.1] [Received: 09/05/2007] [Accepted: 01/25/2008] [Indexed: 11/21/2022]
Abstract
Background Amino acids responsible for structure, core function or specificity may be inferred from multiple protein sequence alignments where a limited set of residue types are tolerated. The rise in available protein sequences continues to increase the power of techniques based on this principle. Results A new algorithm, SMERFS, for predicting protein functional sites from multiple sequence alignments was compared to 14 conservation measures and to the MINER algorithm. Validation was performed on an automatically generated dataset of 1457 families derived from the protein interactions database SNAPPI-DB, and a smaller manually curated set of 148 families. The best performing measure overall was Williamson property entropy, with ROC0.1 scores of 0.0087 and 0.0114 for domain and small molecule contact prediction, respectively. The Lancet method performed worse than random on protein-protein interaction site prediction (ROC0.1 score of 0.0008). The SMERFS algorithm gave similar accuracy to the phylogenetic tree-based MINER algorithm but was superior to Williamson in prediction of non-catalytic transient complex interfaces. SMERFS predicts sites that are significantly more solvent accessible compared to Williamson. Conclusion Williamson property entropy is the best performing of 14 conservation measures examined. The difference in performance of SMERFS relative to Williamson in manually defined complexes was dependent on complex type. The best choice of analysis method is therefore dependent on the system of interest. Additional computation employed by MINER in calculation of phylogenetic trees did not produce improved results over SMERFS. SMERFS performance was improved by use of windows over alignment columns, illustrating the necessity of considering the local environment of positions when assessing their functional significance.
|
29
|
Diella F, Gould CM, Chica C, Via A, Gibson TJ. Phospho.ELM: a database of phosphorylation sites--update 2008. Nucleic Acids Res 2007; 36:D240-4. [PMID: 17962309 PMCID: PMC2238828 DOI: 10.1093/nar/gkm772] [Citation(s) in RCA: 182] [Impact Index Per Article: 10.7] [Indexed: 01/07/2023]
Abstract
Phospho.ELM is a manually curated database of eukaryotic phosphorylation sites. The resource includes data collected from published literature as well as high-throughput data sets. The current release of Phospho.ELM (version 7.0, July 2007) contains 4078 phospho-protein sequences covering 12 025 phospho-serine, 2362 phospho-threonine and 2083 phospho-tyrosine sites. The entries provide information about the phosphorylated proteins and the exact position of known phosphorylated instances, the kinases responsible for the modification (where known) and links to bibliographic references. The database entries have hyperlinks to easily access further information from UniProt, PubMed, SMART, ELM, MSD as well as links to the protein interaction databases MINT and STRING. A new BLAST search tool, complementary to retrieval by keyword and UniProt accession number, allows users to submit a protein query (by sequence or UniProt accession) to search against the curated data set of phosphorylated peptides. Phospho.ELM is available online at: http://phospho.elm.eu.org
Affiliation(s)
- Francesca Diella
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, 69117 Heidelberg, Germany and Center for Molecular Bioinformatics, Dept. of Biology, Tor Vergata University, Rome, Italy
- Cathryn M. Gould
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, 69117 Heidelberg, Germany and Center for Molecular Bioinformatics, Dept. of Biology, Tor Vergata University, Rome, Italy
- Claudia Chica
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, 69117 Heidelberg, Germany and Center for Molecular Bioinformatics, Dept. of Biology, Tor Vergata University, Rome, Italy
- Allegra Via
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, 69117 Heidelberg, Germany and Center for Molecular Bioinformatics, Dept. of Biology, Tor Vergata University, Rome, Italy
- Toby J. Gibson
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, 69117 Heidelberg, Germany and Center for Molecular Bioinformatics, Dept. of Biology, Tor Vergata University, Rome, Italy
- * To whom correspondence should be addressed. +49 6221 3878398, +49 6221 3878517
|
30
|
Abstract
The distributed annotation system (DAS) defines a communication protocol used to exchange biological annotations. It is motivated by the idea that annotations should not be provided by single centralized databases but instead be spread over multiple sites. Data distribution, performed by DAS servers, is separated from visualization, which is carried out by DAS clients. The original DAS protocol was designed to serve annotation of genomic sequences. We have extended the protocol to be applicable to macromolecular structures. Here we present SPICE, a new DAS client that can be used to visualize protein sequence and structure annotations. Availability: http://www.efamily.org.uk/software/dasclients/spice/
Affiliation(s)
- Andreas Prlić
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus Hinxton, Cambridge, UK.
|
31
|
Sorzano COS, Jonic S, Cottevieille M, Larquet E, Boisset N, Marco S. 3D electron microscopy of biological nanomachines: principles and applications. European Biophysics Journal: EBJ 2007; 36:995-1013. [PMID: 17611751 DOI: 10.1007/s00249-007-0203-x] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Received: 02/19/2007] [Revised: 06/01/2007] [Accepted: 06/11/2007] [Indexed: 11/21/2022]
Abstract
Transmission electron microscopy is a powerful technique for studying the three-dimensional (3D) structure of a wide range of biological specimens. Knowledge of this structure is crucial for fully understanding complex relationships among macromolecular complexes and organelles in living cells. In this paper, we present the principles and main application domains of 3D transmission electron microscopy in structural biology. Moreover, we survey current developments needed in this field, and discuss the close relationship of 3D transmission electron microscopy with other experimental techniques aimed at obtaining structural and dynamical information from the scale of whole living cells to atomic structure of macromolecular complexes.
Affiliation(s)
- C O S Sorzano
- Bioengineering Lab, Escuela Politécnica Superior, Univ. San Pablo CEU, Campus Urb, Montepríncipe s/n, 28668, Boadilla del Monte, Madrid, Spain.
|
32
|
Plewczynski D, Hoffmann M, von Grotthuss M, Ginalski K, Rychlewski L. In silico prediction of SARS protease inhibitors by virtual high throughput screening. Chem Biol Drug Des 2007; 69:269-79. [PMID: 17461975 PMCID: PMC7188353 DOI: 10.1111/j.1747-0285.2007.00475.x] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Indexed: 11/28/2022]
Abstract
A structure‐based in silico virtual drug discovery procedure was assessed with severe acute respiratory syndrome coronavirus main protease serving as a case study. First, potential compounds were extracted from protein–ligand complexes selected from the Protein Data Bank database based on structural similarity to the target protein. Later, the set of compounds was ranked by docking scores using an Electronic High‐Throughput Screening flexible docking procedure to select the most promising molecules. The set of best performing compounds was then used for similarity search over the 1 million entries in the Ligand.Info Meta‐Database. Selected molecules having a close structural relationship to 2‐methyl‐2,4‐pentanediol may provide candidate lead compounds toward the development of novel allosteric severe acute respiratory syndrome protease inhibitors.
Affiliation(s)
- Dariusz Plewczynski
- Interdisciplinary Centre for Mathematical and Computational Modeling, University of Warsaw, Pawinskiego 5a Street, 02-106 Warsaw, Poland.
|
33
|
Bhalla J, Storchan GB, MacCarthy CM, Uversky VN, Tcherkasskaya O. Local flexibility in molecular function paradigm. Mol Cell Proteomics 2006; 5:1212-23. [PMID: 16571897 DOI: 10.1074/mcp.m500315-mcp200] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.2] [Indexed: 11/06/2022]
Abstract
It is generally accepted that the functional activity of biological macromolecules requires tightly packed three-dimensional structures. Recent theoretical and experimental evidence indicates, however, the importance of molecular flexibility for the proper functioning of some proteins. We examined high resolution structures of proteins in various functional categories with respect to the secondary structure assessment. The latter was considered as a characteristic of the inherent flexibility of a polypeptide chain. We found that the proteins in functionally competent conformational states might be comprised of 20-70% flexible residues. For instance, proteins involved in gene regulation, e.g. transcription factors, are on average largely disordered molecules with over 60% of amino acids residing in "coiled" configurations. In contrast, oxygen transporters constitute a class of relatively rigid molecules with only 30% of residues being locally flexible. Phylogenic comparison of a large number of protein families with respect to the propagation of secondary structure illuminates the growing role of the local flexibility in organisms of greater complexity. Furthermore the local flexibility in protein molecules appears to be dependent on the molecular confinement and is essentially larger in extracellular proteins.
Affiliation(s)
- Jag Bhalla
- Biochemistry and Molecular & Cellular Biology, Georgetown University School of Medicine, Washington, DC 20007, USA
|
34
|
Rother K, Michalsky E, Leser U. How well are protein structures annotated in secondary databases? Proteins 2006; 60:571-6. [PMID: 16021624 DOI: 10.1002/prot.20520] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/06/2022]
Abstract
We investigated to what extent Protein Data Bank (PDB) entries are annotated with second-party information based on existing cross-references between PDB and 15 other databases. We report 2 interesting findings. First, there is a clear "annotation gap" for structures less than 7 years old for secondary databases that are manually curated. Second, the examined databases overlap with each other quite well, dividing the PDB into 2 well-annotated thirds and one poorly annotated third. Both observations should be taken into account in any study depending on the selection of protein structures by their annotation.
Affiliation(s)
- Kristian Rother
- Berlin Center of Genome-Based Bioinformatics (BCB), Institute of Biochemistry at the Charité, Humboldt Universität Berlin, Berlin, Germany.
|
35
|
Block P, Sotriffer CA, Dramburg I, Klebe G. AffinDB: a freely accessible database of affinities for protein-ligand complexes from the PDB. Nucleic Acids Res 2006; 34:D522-6. [PMID: 16381925 PMCID: PMC1347402 DOI: 10.1093/nar/gkj039] [Citation(s) in RCA: 76] [Impact Index Per Article: 4.2] [Indexed: 11/24/2022]
Abstract
AffinDB is a database of affinity data for structurally resolved protein–ligand complexes from the Protein Data Bank (PDB). It is freely accessible at . Affinity data are collected from the scientific literature, both from primary sources describing the original experimental work of affinity determination and from secondary references which report affinity values determined by others. AffinDB currently contains over 730 affinity entries covering more than 450 different protein–ligand complexes. Besides the affinity value, PDB summary information and additional data are provided, including the experimental conditions of the affinity measurement (if available in the corresponding reference); 2D drawing, SMILES code and molecular weight of the ligand; links to other databases, and bibliographic information. AffinDB can be queried by PDB code or by any combination of affinity range, temperature and pH value of the measurement, ligand molecular weight, and publication data (author, journal and year). Search results can be saved as tabular reports in text files. The database is supposed to be a valuable resource for researchers interested in biomolecular recognition and the development of tools for correlating structural data with affinities, as needed, for example, in structure-based drug design.
Affiliation(s)
- Gerhard Klebe
- To whom correspondence should be addressed. Tel: +49 6421 2821313; Fax: +49 6421 2828994;
|
36
|
Gold ND, Jackson RM. Fold Independent Structural Comparisons of Protein–Ligand Binding Sites for Exploring Functional Relationships. J Mol Biol 2006; 355:1112-24. [PMID: 16359705 DOI: 10.1016/j.jmb.2005.11.044] [Citation(s) in RCA: 88] [Impact Index Per Article: 4.9] [Received: 05/20/2005] [Revised: 11/11/2005] [Accepted: 11/15/2005] [Indexed: 11/23/2022]
Abstract
The rapid growth in protein structural data and the emergence of structural genomics projects have increased the need for automatic structure analysis and tools for function prediction. Small molecule recognition is critical to the function of many proteins; therefore, determination of ligand binding site similarity is important for understanding ligand interactions and may allow their functional classification. Here, we present a binding sites database (SitesBase) that, given a known protein-ligand binding site, allows rapid retrieval of other binding sites with similar structure independent of overall sequence or fold similarity. However, each match is also annotated with sequence similarity and fold information to aid interpretation of structure and functional similarity. Similarity in ligand binding sites can indicate common binding modes and recognition of similar molecules, allowing potential inference of function for an uncharacterised protein or providing additional evidence of common function where sequence or fold similarity is already known. Alternatively, the resource can provide valuable information for detailed studies of molecular recognition including structure-based ligand design and in understanding ligand cross-reactivity. Here, we show examples of atomic similarity between superfamily or more distant fold relatives as well as between seemingly unrelated proteins. Assignment of unclassified proteins to structural superfamilies is also undertaken and in most cases substantiates assignments made using sequence similarity. Correct assignment is also possible where sequence similarity fails to find significant matches, illustrating the potential use of binding site comparisons for newly determined proteins.
Affiliation(s)
- Nicola D Gold
- Institute of Molecular and Cellular Biology, University of Leeds, Leeds LS2 9JT, UK
37
Golovin A, Dimitropoulos D, Oldfield T, Rachedi A, Henrick K. MSDsite: a database search and retrieval system for the analysis and viewing of bound ligands and active sites. Proteins 2006; 58:190-9. [PMID: 15468317 DOI: 10.1002/prot.20288] [Citation(s) in RCA: 81] [Impact Index Per Article: 4.5] [Indexed: 11/12/2022]
Abstract
The three-dimensional environments of ligand binding sites have been derived from the parsing and loading of the PDB entries into a relational database. For each bound molecule the biological assembly of the quaternary structure has been used to determine all contact residues and a fast interactive search and retrieval system has been developed. Prosite pattern and short sequence search options are available together with a novel graphical query generator for inter-residue contacts. The database and its query interface are accessible from the Internet through a web server located at: http://www.ebi.ac.uk/msd-srv/msdsite.
Affiliation(s)
- Adel Golovin
- EMBL Outstation, The European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
38
Park SH, Ryu KH, Gilbert D. Fast similarity search for protein 3D structures using topological pattern matching based on spatial relations. Int J Neural Syst 2005; 15:287-96. [PMID: 16187404 DOI: 10.1142/s0129065705000244] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/18/2022]
Abstract
Similarity search for protein 3D structures has become complex and computationally expensive because the size of protein structure databases continues to grow tremendously. Fast structural similarity search systems are now required for practical use in protein structure classification, because existing comparison systems do not return results in time. Our approach uses multi-step processing that consists of a preprocessing step to represent the geometry of protein structures with spatial objects, a filter step to generate a small candidate set using approximate topological string matching, and a refinement step to compute a structural alignment. This paper describes the preprocessing and filtering for fast similarity search using the discovery of topological patterns of secondary structure elements based on spatial relations. Our system is fully implemented using Oracle 8i Spatial. We have previously shown that our approach is faster than other approaches such as DALI. This work shows that discovering topological relations of secondary structure elements in protein structures using the spatial relations of spatial databases is practical for fast structural similarity search for proteins.
Affiliation(s)
- Sung-Hee Park
- Database Bioinformatics Laboratory, School of Electrical & Computer Engineering, Chungbuk National University, Cheongju, 361-763, Korea.
39
Unser M, Sorzano C, Thévenaz P, Jonić S, El-Bez C, De Carlo S, Conway J, Trus B. Spectral signal-to-noise ratio and resolution assessment of 3D reconstructions. J Struct Biol 2005; 149:243-55. [PMID: 15721578 PMCID: PMC1464087 DOI: 10.1016/j.jsb.2004.10.011] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.3] [Received: 01/19/2004] [Revised: 09/22/2004] [Indexed: 11/20/2022]
Abstract
Measuring the quality of three-dimensional (3D) reconstructed biological macromolecules by transmission electron microscopy is still an open problem. In this article, we extend the applicability of the spectral signal-to-noise ratio (SSNR) to the evaluation of 3D volumes reconstructed with any reconstruction algorithm. The basis of the method is to measure the consistency between the data and a corresponding set of reprojections computed for the reconstructed 3D map. The idiosyncrasies of the reconstruction algorithm are taken explicitly into account by performing a noise-only reconstruction. This results in the definition of a 3D SSNR which provides an objective indicator of the quality of the 3D reconstruction. Furthermore, the information to build the SSNR can be used to produce a volumetric SSNR (VSSNR). Our method overcomes the need to divide the data set in two. It also provides a direct measure of the performance of the reconstruction algorithm itself; this latter information is typically not available with the standard resolution methods which are primarily focused on reproducibility alone.
Affiliation(s)
- M. Unser
- Biomedical Imaging Group, Swiss Federal Institute of Technology Lausanne, CH-1015 Lausanne VD, Switzerland
- C.O.S. Sorzano
- Biomedical Imaging Group, Swiss Federal Institute of Technology Lausanne, CH-1015 Lausanne VD, Switzerland
- Escuela Politécnica Superior, Universidad San Pablo-CEU, Campus Urb. Montepríncipe s/n, 28668 Boadilla del Monte, Madrid, Spain
- Biocomputing Unit, National Center of Biotechnology (CSIC), Campus Univ. Autónoma s/n, 28047 Cantoblanco, Madrid, Spain
- Corresponding author. Fax: +34 91 585 4506. E-mail address: (C.O.S. Sorzano)
- P Thévenaz
- Biomedical Imaging Group, Swiss Federal Institute of Technology Lausanne, CH-1015 Lausanne VD, Switzerland
- S. Jonić
- Biomedical Imaging Group, Swiss Federal Institute of Technology Lausanne, CH-1015 Lausanne VD, Switzerland
- C. El-Bez
- Laboratoire d’analyse ultrastructurale, Université de Lausanne, CH-1015 Lausanne VD, Switzerland
- S. De Carlo
- Department of Molecular and Cell Biology, Howard Hughes Medical Institute, University of California, Berkeley, CA 94720, USA
- J.F. Conway
- Laboratoire de Microscopie Electronique Structurale, Institut de Biologie Structurale, 41 rue Jules Horowitz, 38027 Grenoble, Cedex 1, France
- B.L. Trus
- Imaging Sciences Laboratory, Center of Information Technology (NIH/DHHS), 12 Center Drive, MSC 5624, Bethesda, MD 20892-5624, USA
40
Arzt S, Beteva A, Cipriani F, Delageniere S, Felisaz F, Förstner G, Gordon E, Launer L, Lavault B, Leonard G, Mairs T, McCarthy A, McCarthy J, McSweeney S, Meyer J, Mitchell E, Monaco S, Nurizzo D, Ravelli R, Rey V, Shepard W, Spruce D, Svensson O, Theveneau P. Automation of macromolecular crystallography beamlines. Prog Biophys Mol Biol 2005; 89:124-52. [PMID: 15910915 DOI: 10.1016/j.pbiomolbio.2004.09.003] [Citation(s) in RCA: 37] [Impact Index Per Article: 1.9] [Indexed: 11/30/2022]
Abstract
The production of three-dimensional crystallographic structural information of macromolecules can now be thought of as a pipeline which is being streamlined at every stage from protein cloning, expression and purification, through crystallisation to data collection and structure solution. Synchrotron X-ray beamlines are a key section of this pipeline, as it is there that the X-ray diffraction data that ultimately lead to the elucidation of macromolecular structures are collected. The burgeoning number of macromolecular crystallography (MX) beamlines available worldwide may be enhanced significantly with the automation of both their operation and of the experiments carried out on them. This paper reviews the current situation and provides a glimpse of how an MX beamline may look in the not too distant future.
Affiliation(s)
- Steffi Arzt
- European Synchrotron Radiation Facility, 6 rue Jules Horowitz, Zip 38000, BP 220, F-38043 Grenoble Cedex, France
41
Yang ZR, Thomson R, McNeil P, Esnouf RM. RONN: the bio-basis function neural network technique applied to the detection of natively disordered regions in proteins. Bioinformatics 2005; 21:3369-76. [PMID: 15947016 DOI: 10.1093/bioinformatics/bti534] [Citation(s) in RCA: 477] [Impact Index Per Article: 25.1] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION: Recent studies have found many proteins containing regions that do not form well-defined three-dimensional structures in their native states. The study and detection of such disordered regions is important both for understanding protein function and for facilitating structural analysis, since disordered regions may affect solubility and/or crystallizability. RESULTS: We have developed the regional order neural network (RONN) software as an application of our recently developed 'bio-basis function neural network' pattern recognition algorithm to the detection of natively disordered regions in proteins. The results of blind-testing a panel of nine disorder prediction tools (including RONN) against 80 protein sequences derived from the Protein Data Bank show that, based on the probability excess measure, RONN performed the best.
Affiliation(s)
- Zheng Rong Yang
- School of Engineering and Computer Science, Exeter University, Exeter EX4 4QF, UK
42
Michalsky E, Dunkel M, Goede A, Preissner R. SuperLigands - a database of ligand structures derived from the Protein Data Bank. BMC Bioinformatics 2005; 6:122. [PMID: 15943884 PMCID: PMC1173082 DOI: 10.1186/1471-2105-6-122] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.7] [Received: 02/03/2005] [Accepted: 05/19/2005] [Indexed: 12/02/2022] Open
Abstract
BACKGROUND: Currently, the PDB contains approximately 29,000 protein structures comprising over 70,000 experimentally determined three-dimensional structures of over 5,000 different low molecular weight compounds. Information about these PDB ligands can be very helpful in the field of molecular modelling and prediction, particularly for the prediction of protein binding sites and function. DESCRIPTION: Here we present an Internet accessible database delivering PDB ligands in the MDL Mol file format which, in contrast to the PDB format, includes information about bond types. Structural similarity of the compounds can be detected by calculation of Tanimoto coefficients and by three-dimensional superposition. Topological similarity of PDB ligands to known drugs can be assessed via Tanimoto coefficients. CONCLUSION: SuperLigands supplements the set of existing resources of information about small molecules bound to PDB structures. Allowing for three-dimensional comparison of the compounds as a novel feature, this database represents a valuable means of analysis and prediction in the field of biological and medical research.
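The Tanimoto coefficient mentioned in this abstract can be illustrated with a minimal sketch. This is not the SuperLigands implementation: the abstract does not say which fingerprints are used, so the function name `tanimoto`, the set-of-bit-indices representation, and the example fingerprints below are all assumptions made for illustration.

```python
def tanimoto(fp_a: set, fp_b: set) -> float:
    """Tanimoto coefficient |A ∩ B| / |A ∪ B| of two fingerprints.

    Each fingerprint is modelled as the set of indices of its 'on' bits
    (an illustrative assumption, not the database's actual encoding).
    """
    union = len(fp_a | fp_b)
    if union == 0:
        # Two empty fingerprints: return 0.0 by convention (assumption).
        return 0.0
    return len(fp_a & fp_b) / union


# Hypothetical fingerprints for a PDB ligand and a known drug.
ligand_fp = {1, 4, 7, 9}
drug_fp = {1, 4, 8, 9, 12}

# 3 shared bits out of 6 distinct bits.
print(tanimoto(ligand_fp, drug_fp))  # 0.5
```

A value of 1.0 means the two fingerprints are identical and 0.0 means they share no features; any threshold for calling two compounds "similar" would be a separate choice not given in the abstract.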
Affiliation(s)
- Elke Michalsky
- BCB (Berlin Center for Genome Based Bioinformatics) at Institute of Biochemistry, Charité (University Medicine Berlin), Monbijoustr. 2, 10117 Berlin, Germany
- Mathias Dunkel
- BCB (Berlin Center for Genome Based Bioinformatics) at Institute of Biochemistry, Charité (University Medicine Berlin), Monbijoustr. 2, 10117 Berlin, Germany
- Andrean Goede
- BCB (Berlin Center for Genome Based Bioinformatics) at Institute of Biochemistry, Charité (University Medicine Berlin), Monbijoustr. 2, 10117 Berlin, Germany
- Robert Preissner
- BCB (Berlin Center for Genome Based Bioinformatics) at Institute of Biochemistry, Charité (University Medicine Berlin), Monbijoustr. 2, 10117 Berlin, Germany
43
Kersey P, Bower L, Morris L, Horne A, Petryszak R, Kanz C, Kanapin A, Das U, Michoud K, Phan I, Gattiker A, Kulikova T, Faruque N, Duggan K, Mclaren P, Reimholz B, Duret L, Penel S, Reuter I, Apweiler R. Integr8 and Genome Reviews: integrated views of complete genomes and proteomes. Nucleic Acids Res 2005; 33:D297-302. [PMID: 15608201 PMCID: PMC539993 DOI: 10.1093/nar/gki039] [Citation(s) in RCA: 114] [Impact Index Per Article: 6.0] [Indexed: 11/12/2022] Open
Abstract
Integr8 is a new web portal for exploring the biology of organisms with completely deciphered genomes. For over 190 species, Integr8 provides access to general information, recent publications, and a detailed statistical overview of the genome and proteome of the organism. The preparation of this analysis is supported through Genome Reviews, a new database of bacterial and archaeal DNA sequences in which annotation has been upgraded (compared to the original submission) through the integration of data from many sources, including the EMBL Nucleotide Sequence Database, the UniProt Knowledgebase, InterPro, CluSTr, GOA and HOGENOM. Integr8 also allows the users to customize their own interactive analysis, and to download both customized and prepared datasets for their own use. Integr8 is available at http://www.ebi.ac.uk/integr8.
Affiliation(s)
- Paul Kersey
- The EMBL Outstation-European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.
44
Trißl S, Rother K, Müller H, Steinke T, Koch I, Preissner R, Frömmel C, Leser U. Columba: an integrated database of proteins, structures, and annotations. BMC Bioinformatics 2005; 6:81. [PMID: 15801979 PMCID: PMC1087474 DOI: 10.1186/1471-2105-6-81] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.2] [Received: 11/15/2004] [Accepted: 03/31/2005] [Indexed: 11/19/2022] Open
Abstract
BACKGROUND: Structural and functional research often requires the computation of sets of protein structures based on certain properties of the proteins, such as sequence features, fold classification, or functional annotation. Compiling such sets using current web resources is tedious because the necessary data are spread over many different databases. To facilitate this task, we have created COLUMBA, an integrated database of annotations of protein structures. DESCRIPTION: COLUMBA currently integrates twelve different databases, including PDB, KEGG, Swiss-Prot, CATH, SCOP, the Gene Ontology, and ENZYME. The database can be searched using either keyword search or data source-specific web forms. Users can thus quickly select and download PDB entries that, for instance, participate in a particular pathway, are classified as containing a certain CATH architecture, are annotated as having a certain molecular function in the Gene Ontology, and whose structures have a resolution under a defined threshold. The results of queries are provided in both machine-readable extensible markup language and human-readable format. The structures themselves can be viewed interactively on the web. CONCLUSION: The COLUMBA database facilitates the creation of protein structure data sets for many structure-based studies. It allows queries to be combined across a number of structure-related databases not covered by other projects at present. Thus, information on both many and few protein structures can be used efficiently. The web interface for COLUMBA is available at http://www.columba-db.de.
Affiliation(s)
- Silke Trißl
- Institute of Informatics, Humboldt-Universität zu Berlin, Unter den Linden 6, 10099 Berlin, Germany
- Kristian Rother
- Institute of Biochemistry, Charité Universitätsmedizin Berlin, Monbijoustraße 2a, 10117 Berlin, Germany
- Heiko Müller
- Institute of Informatics, Humboldt-Universität zu Berlin, Unter den Linden 6, 10099 Berlin, Germany
- Thomas Steinke
- Zuse Institute Berlin, Takustrasse 7, 14195 Berlin, Germany
- Ina Koch
- Technische Fachhochschule Berlin, Seestr. 64, 13347 Berlin, Germany
- Robert Preissner
- Institute of Biochemistry, Charité Universitätsmedizin Berlin, Monbijoustraße 2a, 10117 Berlin, Germany
- Cornelius Frömmel
- Institute of Biochemistry, Charité Universitätsmedizin Berlin, Monbijoustraße 2a, 10117 Berlin, Germany
- Ulf Leser
- Institute of Informatics, Humboldt-Universität zu Berlin, Unter den Linden 6, 10099 Berlin, Germany
45
Abstract
Arginine is an abundant residue in protein-protein interfaces. The importance of this residue relates to the versatility of its side chain in intermolecular interactions. Different classes of protein-protein interfaces were surveyed for cation-pi interactions. Approximately half of the protein complexes and one-third of the homodimers analyzed were found to contain at least one intermolecular cation-pi pair. Interactions between arginine and tyrosine were found to be the most abundant. The electrostatic interaction energy was calculated to be approximately 3 kcal/mol, on average. A distance-based search of guanidinium:aromatic interactions was also performed using the Macromolecular Structure Database (MSD). This search revealed that half of the guanidinium:aromatic pairs pack in a coplanar manner. Furthermore, it was found that the cationic group of the cation-pi pair is frequently involved in intermolecular hydrogen bonds. In this manner the arginine side chain can participate in multiple interactions, providing a mechanism for inter-protein specificity. Thus, the cation-pi interaction is established as an important contributor to protein-protein interfaces.
Affiliation(s)
- Peter B Crowley
- Instituto de Tecnologia Química e Biológica, Universidade Nova de Lisboa, Oeiras, Portugal.
46
Morris RJ, Najmanovich RJ, Kahraman A, Thornton JM. Real spherical harmonic expansion coefficients as 3D shape descriptors for protein binding pocket and ligand comparisons. Bioinformatics 2005; 21:2347-55. [PMID: 15728116 DOI: 10.1093/bioinformatics/bti337] [Citation(s) in RCA: 145] [Impact Index Per Article: 7.6] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION: An increasing number of protein structures are being determined for which no biochemical characterization is available. The analysis of protein structure and function assignment is becoming an unexpected challenge and a major bottleneck towards the goal of well-annotated genomes. As shape plays a crucial role in biomolecular recognition and function, the examination and development of shape description and comparison techniques is likely to be of prime importance for understanding protein structure-function relationships. RESULTS: A novel technique is presented for the comparison of protein binding pockets. The method uses the coefficients of a real spherical harmonics expansion to describe the shape of a protein's binding pocket. Shape similarity is computed as the L2 distance in coefficient space. Several thousand such comparisons per second can be carried out on a standard Linux PC. Other properties such as the electrostatic potential fit seamlessly into the same framework. The method can also be used directly for describing the shape of proteins and other molecules. AVAILABILITY: A limited version of the software for the real spherical harmonics expansion of a set of points in PDB format is freely available upon request from the authors. Binding pocket comparisons and ligand prediction will be made available through the protein structure annotation pipeline Profunc (written by Roman Laskowski) which will be accessible from the EBI website shortly.
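The comparison step described in this abstract, an L2 distance between coefficient vectors, can be sketched as follows. This is only an illustration under stated assumptions: it does not compute the spherical harmonics expansion itself, the function name `l2_distance` is hypothetical, and the coefficient values are made up rather than taken from the paper.

```python
import math


def l2_distance(coeffs_a, coeffs_b):
    """Euclidean (L2) distance between two equal-length coefficient vectors."""
    if len(coeffs_a) != len(coeffs_b):
        raise ValueError("coefficient vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(coeffs_a, coeffs_b)))


# Made-up expansion coefficients for two binding pockets (assumption).
pocket_1 = [1.0, 0.0, 2.0]
pocket_2 = [1.0, 3.0, -2.0]

# sqrt(0^2 + 3^2 + 4^2) = 5.0; smaller distances mean more similar shapes.
print(l2_distance(pocket_1, pocket_2))  # 5.0
```

Because each comparison is a single pass over two short vectors, it is plausible that many comparisons per second are feasible, which is consistent with the throughput the abstract reports.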
Affiliation(s)
- Richard J Morris
- EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
47
Czerwinski EW, Midoro-Horiuti T, White MA, Brooks EG, Goldblum RM. Crystal structure of Jun a 1, the major cedar pollen allergen from Juniperus ashei, reveals a parallel beta-helical core. J Biol Chem 2005; 280:3740-6. [PMID: 15539389 PMCID: PMC2653420 DOI: 10.1074/jbc.m409655200] [Citation(s) in RCA: 34] [Impact Index Per Article: 1.8] [Indexed: 11/06/2022] Open
Abstract
Pollen from cedar and cypress trees is a major cause of seasonal hypersensitivity in humans in several regions of the Northern Hemisphere. We report the first crystal structure of a cedar allergen, Jun a 1, from the pollen of the mountain cedar Juniperus ashei (Cupressaceae). The core of the structure consists primarily of a parallel beta-helix, which is nearly identical to that found in the pectin/pectate lyases from several plant pathogenic microorganisms. Four IgE epitopes mapped to the surface of the protein are accessible to the solvent. The conserved vWiDH sequence is covered by the first 30 residues of the N terminus. The potential reactive arginine, analogous to the pectin/pectate lyase reaction site, is accessible to the solvent, but the substrate binding groove is blocked by a histidine-aspartate salt bridge, a glutamine, and an alpha-helix, all of which are unique to Jun a 1. These observations suggest that steric hindrance in Jun a 1 precludes enzyme activity. The overall results suggest that it is the structure of Jun a 1 that makes it a potent allergen.
Affiliation(s)
- Edmund W Czerwinski
- Sealy Center for Structural Biology, Department of Human Biological Chemistry and Genetics, University of Texas Medical Branch at Galveston, Galveston, Texas 77555-0647, USA.
48
Sorzano COS, Jonić S, El-Bez C, Carazo JM, De Carlo S, Thévenaz P, Unser M. A multiresolution approach to orientation assignment in 3D electron microscopy of single particles. J Struct Biol 2005; 146:381-92. [PMID: 15099579 DOI: 10.1016/j.jsb.2004.01.006] [Citation(s) in RCA: 37] [Impact Index Per Article: 1.9] [Received: 11/19/2003] [Revised: 01/13/2004] [Indexed: 11/26/2022]
Abstract
Three-dimensional (3D) electron microscopy (3DEM) aims at the determination of the spatial distribution of the Coulomb potential of macromolecular complexes. The 3D reconstruction of a macromolecule using single-particle techniques involves thousands of 2D projections. One of the key parameters required to perform such a 3D reconstruction is the orientation of each projection image as well as its in-plane orientation. This information is unknown experimentally and must be determined using image-processing techniques. We propose the use of wavelets to match the experimental projections with those obtained from a reference 3D model. The wavelet decomposition of the projection images provides a framework for a multiscale matching algorithm in which speed and robustness to noise are gained. Furthermore, this multiresolution approach is combined with a novel orientation selection strategy. Results obtained from computer simulations as well as experimental data encourage the use of this approach.
Affiliation(s)
- C O S Sorzano
- Escuela Politécnica Superior, Universidad San Pablo-CEU, Campus Urb., Madrid, Spain.
49
Macchiarulo A, Nobeli I, Thornton JM. Ligand selectivity and competition between enzymes in silico. Nat Biotechnol 2004; 22:1039-45. [PMID: 15286657 DOI: 10.1038/nbt999] [Citation(s) in RCA: 70] [Impact Index Per Article: 3.5] [Indexed: 11/09/2022]
Abstract
In a cell, there are many possibilities for cross interactions between enzymes and small molecules, arising from the similarities in the structures of the metabolites and the flexibility in binding of protein active sites. Despite this promiscuity, the cognate partners must be able to recognize each other in vivo, for the cell to function efficiently. This study examines the basis of this selectivity in recognition using standard docking calculations and finds significant improvement when proteins and ligands are cross-docked. We find that cognate molecules rarely form the most stable complexes and that specificity may be driven either by recognition of the substrate by the enzyme or the recognition of the enzyme by the substrate. Despite limitations of the in silico methods, especially the scoring functions, these calculations highlight the need to consider cross reactions in the cell and suggest that localization and compartmentalization must be important factors in the evolution of complex cells. However, the inherent promiscuity of these interactions can also benefit an organism, by facilitating the evolution of new functions from old ones. The results also suggest that high-throughput screening should involve not just a panel of small molecules, but also a panel of proteins to test for cross-reactivity.
Affiliation(s)
- Antonio Macchiarulo
- EMBL-European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
50
Stollar EJ, Gelpí JL, Velankar S, Golovin A, Orozco M, Luisi BF. Unconventional interactions between water and heterocyclic nitrogens in protein structures. Proteins 2004; 57:1-8. [PMID: 15326588 DOI: 10.1002/prot.20216] [Citation(s) in RCA: 38] [Impact Index Per Article: 1.9] [Indexed: 11/12/2022]
Abstract
We report an unusual interaction in which a water molecule approaches the heterocyclic nitrogen of tryptophan and histidine along an axis that is roughly perpendicular to the aromatic plane of the side chain. The interaction is distinct from the well-known conventional aromatic hydrogen-bond, and it occurs at roughly the same frequency in protein structures. Calculations indicate that the water-indole interaction is favorable energetically, and we find several cases in which such contacts are conserved among structural orthologs. The indole-water interaction links side chains and peptide backbone in turn regions, connects the side chains in beta-sheets, and bridges secondary elements from different domains. We suggest that the water-indole interaction can be indirectly responsible for the quenching of tryptophan fluorescence that is observed in the folding of homeodomains and, possibly, many other proteins. We also observe a similar interaction between water and the imidazole nitrogens of the histidine side chain. Taken together, these observations suggest that the unconventional water-indole and water-imidazole interactions provide a small but favorable contribution to protein structures.
Affiliation(s)
- Elliott J Stollar
- Department of Biochemistry, University of Cambridge, Cambridge, United Kingdom