1
|
Liu R, Clayton J, Shen M, Bhatnagar S, Shen J. Machine Learning Models to Interrogate Proteome-Wide Covalent Ligandabilities Directed at Cysteines. JACS Au 2024; 4:1374-1384. [PMID: 38665640 PMCID: PMC11040703 DOI: 10.1021/jacsau.3c00749] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2023] [Revised: 02/22/2024] [Accepted: 02/23/2024] [Indexed: 04/28/2024]
Abstract
Machine learning (ML) identification of covalently ligandable sites may accelerate targeted covalent inhibitor design and help expand the druggable proteome space. Here, we report the rigorous development and validation of tree-based models and convolutional neural networks (CNNs) trained on a newly curated database (LigCys3D) of over 1000 liganded cysteines in nearly 800 proteins, represented by over 10,000 three-dimensional structures in the Protein Data Bank (PDB). In tests on unseen data, the tree models and CNNs yielded areas under the receiver operating characteristic curve (AUCs) of 94% and 93%, respectively. Based on the AlphaFold2-predicted structures, the ML models recapitulated the newly liganded cysteines in the PDB with recall values over 90%. To assist the covalent drug discovery community, we report the predicted ligandable cysteines in 392 human kinases and their locations in the sequence-aligned kinase structure, including the PH and SH2 domains. Furthermore, we disseminate a searchable online database, LigCys3D (https://ligcys.computchem.org/), and a web prediction server, DeepCys (https://deepcys.computchem.org/), both of which will be continuously updated and improved by including newly published experimental data. The present work represents an important step toward the ML-led integration of big genome data and structure models to annotate the human proteome space for next-generation covalent drug discovery.
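For readers unfamiliar with the two metrics quoted in this abstract, the following is a minimal sketch of how an AUC and a recall value can be computed from per-cysteine labels and model scores. It is not the authors' evaluation code: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made purely for illustration.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # Count pairwise wins of positives over negatives; a tie counts as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def recall(labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of true positives that score at or above the threshold."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    if not positives:
        raise ValueError("need at least one positive example")
    return sum(s >= threshold for s in positives) / len(positives)


# Hypothetical toy data: 1 = liganded cysteine, 0 = unliganded cysteine.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.2, 0.7]

print(f"AUC    = {roc_auc(labels, scores):.4f}")
print(f"recall = {recall(labels, scores):.2f}")
```

The pairwise formulation is quadratic in the number of examples, which is fine for illustration; a rank-based computation would be the usual choice for large test sets.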
Affiliation(s)
- Ruibin Liu
- Department of Pharmaceutical Sciences, University of Maryland School of Pharmacy, Baltimore, Maryland 21201, United States
| | - Joseph Clayton
- Department of Pharmaceutical Sciences, University of Maryland School of Pharmacy, Baltimore, Maryland 21201, United States
- Division of Applied Regulatory Science, Office of Clinical Pharmacology, Center for Drug Evaluation and Research, U.S. Food and Drug Administration, Silver Spring, Maryland 20993, United States
| | - Mingzhe Shen
- Department of Pharmaceutical Sciences, University of Maryland School of Pharmacy, Baltimore, Maryland 21201, United States
| | - Shubham Bhatnagar
- Department of Computer Science, University of Maryland at College Park, College Park, Maryland 20742, United States
| | - Jana Shen
- Department of Pharmaceutical Sciences, University of Maryland School of Pharmacy, Baltimore, Maryland 21201, United States
|
2
|
Liu R, Clayton J, Shen M, Bhatnagar S, Shen J. Machine Learning Models to Interrogate Proteomewide Covalent Ligandabilities Directed at Cysteines. bioRxiv 2024:2023.08.17.553742. [PMID: 37662346 PMCID: PMC10473668 DOI: 10.1101/2023.08.17.553742] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/05/2023]
Abstract
Machine learning (ML) identification of covalently ligandable sites may accelerate targeted covalent inhibitor design and help expand the druggable proteome space. Here we report the rigorous development and validation of tree-based models and convolutional neural networks (CNNs) trained on a newly curated database (LigCys3D) of over 1,000 liganded cysteines in nearly 800 proteins, represented by over 10,000 three-dimensional structures in the Protein Data Bank (PDB). In tests on unseen data, the tree models and CNNs yielded AUCs (areas under the receiver operating characteristic curve) of 94% and 93%, respectively. Based on the AlphaFold2-predicted structures, the ML models recapitulated the newly liganded cysteines in the PDB with recall values over 90%. To assist the covalent drug discovery community, we report the predicted ligandable cysteines in 392 human kinases and their locations in the sequence-aligned kinase structure, including the PH and SH2 domains. Furthermore, we disseminate a searchable online database, LigCys3D (https://ligcys.computchem.org/), and a web prediction server, DeepCys (https://deepcys.computchem.org/), both of which will be continuously updated and improved by including newly published experimental data. The present work represents a first step towards the ML-led integration of big genome data and structure models to annotate the human proteome space for next-generation covalent drug discovery.
Affiliation(s)
- Ruibin Liu
- Department of Pharmaceutical Sciences, University of Maryland School of Pharmacy, Baltimore, MD 21201, USA
| | - Joseph Clayton
- Department of Pharmaceutical Sciences, University of Maryland School of Pharmacy, Baltimore, MD 21201, USA
- Division of Applied Regulatory Science, Office of Clinical Pharmacology, Center for Drug Evaluation and Research, U.S. Food and Drug Administration, Silver Spring, MD 20993, USA
| | - Mingzhe Shen
- Department of Pharmaceutical Sciences, University of Maryland School of Pharmacy, Baltimore, MD 21201, USA
| | - Shubham Bhatnagar
- Department of Computer Science, University of Maryland at College Park, College Park, MD 20742, USA
| | - Jana Shen
- Department of Pharmaceutical Sciences, University of Maryland School of Pharmacy, Baltimore, MD 21201, USA
|
3
|
Lawson CL, Berman H, Chen L, Vallat B, Zirbel C. The Nucleic Acid Knowledgebase: a new portal for 3D structural information about nucleic acids. Nucleic Acids Res 2024; 52:D245-D254. [PMID: 37953312 PMCID: PMC10767938 DOI: 10.1093/nar/gkad957] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2023] [Revised: 10/02/2023] [Accepted: 10/16/2023] [Indexed: 11/14/2023]
Abstract
The Nucleic Acid Knowledgebase (nakb.org) is a new data resource, updated weekly, for experimentally determined 3D structures containing DNA and/or RNA nucleic acid polymers and their biological assemblies. NAKB indexes nucleic acid-containing structures derived from all major structure determination methods (X-ray, NMR and EM), including all held by the Protein Data Bank (PDB). As the planned successor to the Nucleic Acid Database (NDB), NAKB's design preserves all functionality of the NDB and provides novel nucleic acid-centric content, including structural and functional annotations, as well as annotations from and links to external resources. A variety of custom interactive tools have been developed to enable rapid exploration and drill-down of NAKB's content.
Affiliation(s)
- Catherine L Lawson
- Institute for Quantitative Biomedicine, Rutgers, State University of New Jersey, 174 Frelinghuysen Road, Piscataway, NJ 08854, USA
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Helen M Berman
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
| | - Li Chen
- Institute for Quantitative Biomedicine, Rutgers, State University of New Jersey, 174 Frelinghuysen Road, Piscataway, NJ 08854, USA
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Brinda Vallat
- Institute for Quantitative Biomedicine, Rutgers, State University of New Jersey, 174 Frelinghuysen Road, Piscataway, NJ 08854, USA
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Craig L Zirbel
- Department of Mathematics and Statistics, Bowling Green State University, Bowling Green, OH 43403, USA
|
4
|
Kunnakkattu IR, Choudhary P, Pravda L, Nadzirin N, Smart OS, Yuan Q, Anyango S, Nair S, Varadi M, Velankar S. PDBe CCDUtils: an RDKit-based toolkit for handling and analysing small molecules in the Protein Data Bank. J Cheminform 2023; 15:117. [PMID: 38042830 PMCID: PMC10693035 DOI: 10.1186/s13321-023-00786-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2023] [Accepted: 11/17/2023] [Indexed: 12/04/2023]
Abstract
While the Protein Data Bank (PDB) contains a wealth of structural information on ligands bound to macromolecules, their analysis can be challenging due to the large amount and diversity of data. Here, we present PDBe CCDUtils, a versatile toolkit for processing and analysing small molecules from the PDB in PDBx/mmCIF format. PDBe CCDUtils provides streamlined access to all the metadata for small molecules in the PDB and offers a set of convenient methods to compute various properties using RDKit, such as 2D depictions, 3D conformers, physicochemical properties, scaffolds, common fragments, and cross-references to small molecule databases using UniChem. The toolkit also provides methods for identifying all the covalently attached chemical components in a macromolecular structure and calculating similarity among small molecules. By providing a broad range of functionality, PDBe CCDUtils caters to the needs of researchers in cheminformatics, structural biology, bioinformatics and computational chemistry.
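The abstract mentions calculating similarity among small molecules but does not say which measure is used. As a rough illustration only, the sketch below computes the Tanimoto coefficient, a widely used similarity measure in cheminformatics, on fingerprints represented as sets of "on" bit positions. It does not use or reproduce the PDBe CCDUtils API; the function name and the two example fingerprints are hypothetical.

```python
from typing import AbstractSet


def tanimoto(fp_a: AbstractSet[int], fp_b: AbstractSet[int]) -> float:
    """Tanimoto coefficient: shared on-bits divided by the union of on-bits."""
    if not fp_a and not fp_b:
        # Convention assumed here: two empty fingerprints are treated as identical.
        return 1.0
    return len(fp_a & fp_b) / len(fp_a | fp_b)


# Hypothetical fingerprints, given as the indices of the bits that are set.
ligand_1 = frozenset({1, 4, 7, 9})
ligand_2 = frozenset({1, 4, 8, 9, 12})

print(f"Tanimoto similarity = {tanimoto(ligand_1, ligand_2):.2f}")
```

In practice the fingerprints would come from a cheminformatics library rather than being written by hand; the set representation is used here only to keep the example self-contained.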
Affiliation(s)
- Ibrahim Roshan Kunnakkattu
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Preeti Choudhary
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Lukas Pravda
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Nurul Nadzirin
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Oliver S Smart
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Qi Yuan
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Stephen Anyango
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Sreenath Nair
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Mihaly Varadi
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Sameer Velankar
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.
|
5
|
Bæk KT, Kepp KP. Assessment of AlphaFold2 for Human Proteins via Residue Solvent Exposure. J Chem Inf Model 2022; 62:3391-3400. [PMID: 35785970 DOI: 10.1021/acs.jcim.2c00243] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Indexed: 11/29/2022]
Abstract
As only 35% of human proteins feature (often partial) PDB structures, the protein structure prediction tool AlphaFold2 (AF2) could have a massive impact on human biology and medicine, making independent benchmarks of interest. We studied AF2's ability to describe the backbone solvent exposure as a functionally important and easily interpretable "natural coordinate" of protein conformation, using human proteins as a test case. After screening for appropriate comparative sets, we matched 1818 human proteins predicted by AF2 against 7585 unique experimental PDBs, and after curation for sequence overlap, we assessed 1264 comparative pairs comprising 115 unique AF2 structures and 652 unique experimental structures. AF2 performed markedly worse for multimers, whereas ligands, cofactors, and experimental resolution were, interestingly, not very important for performance. AF2 performed excellently for monomer proteins. Challenges relating to specific groups of residues and multimers were analyzed. We identified larger deviations for lower confidence scores (pLDDT) and found that exposed residues and polar residues (e.g., Asp, Glu, Asn) were less accurately described than hydrophobic residues. Proline conformations were the hardest to predict, probably due to a common location in dynamic solvent-accessible parts. In summary, using solvent exposure as a metric, we quantified the performance of AF2 for human proteins and provided estimates of the expected agreement as a function of ligand presence, multimer/monomer status, local residue solvent exposure, pLDDT, and amino acid type. Overall performance was found to be excellent.
Affiliation(s)
- Kristoffer T Bæk
- DTU Chemistry, Technical University of Denmark, Building 206, Kgs. Lyngby 2800, Denmark
| | - Kasper P Kepp
- DTU Chemistry, Technical University of Denmark, Building 206, Kgs. Lyngby 2800, Denmark
|
6
|
Helliwell JR. Pre- and Post-publication Verification for Reproducible Data Mining in Macromolecular Crystallography. Methods Mol Biol 2022; 2449:235-261. [PMID: 35507266 DOI: 10.1007/978-1-0716-2095-3_10] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/14/2023]
Abstract
Just as an article narrative must be deemed by an editor and referees worthy of becoming a version of record on acceptance for publication, so must the underpinning data be scrutinized before being passed as a version of record. Indeed, without the underpinning data, a study and its conclusions cannot be reproduced at any stage of evaluation, pre- or post-publication. Likewise, an independent study without its own underpinning data cannot be reproduced, let alone be considered a replicate of the first study. The PDB is a modern marvel of achievement, providing organized open access for depositors and users of the data held there and opening up numerous applications. Methods for modeling protein structures and for determining structures are still improving in precision, and artifacts of each method exist, so their accuracy is established only when results are reproduced by other methods. It is on such foundations that reproducible data mining is based. Data rates are expanding considerably, be they at synchrotrons, X-ray free electron lasers (XFELs), electron cryomicroscopes (cryoEM), or neutron facilities. The work of a person as a referee or user of a narrative and its underpinning data may well be complemented in the future by artificial intelligence with machine learning, the former for specific refereeing and the latter for more general validation, both ideally before publication. Examples are described involving rhenium theranostics, the anticancer platins, and the SARS-CoV-2 main protease.
Affiliation(s)
- John R Helliwell
- Department of Chemistry, University of Manchester, Manchester, UK.
|
7
|
Croitoru A, Park SJ, Kumar A, Lee J, Im W, MacKerell AD, Aleksandrov A. Additive CHARMM36 Force Field for Nonstandard Amino Acids. J Chem Theory Comput 2021; 17:3554-3570. [PMID: 34009984 DOI: 10.1021/acs.jctc.1c00254] [Citation(s) in RCA: 30] [Impact Index Per Article: 10.0] [Indexed: 12/22/2022]
Abstract
Nonstandard amino acids are both abundant in nature, where they play a key role in various cellular processes, and can be synthesized in laboratories, for example, for the manufacture of a range of pharmaceutical agents. In this work, we have extended the additive all-atom CHARMM36 and CHARMM General force field (CGenFF) to a large set of 333 nonstandard amino acids. These include both amino acids with nonstandard side chains, such as post-translationally modified and artificial amino acids, as well as amino acids with modified backbone groups, such as chromophores composed of several amino acids. Model compounds representative of the nonstandard amino acids were parametrized for protonation states that are likely at the physiological pH of 7 and, for some more common residues, in both d- and l-stereoisomers. Considering all protonation, tautomeric, and stereoisomeric forms, a total of 406 nonstandard amino acids were parametrized. Emphasis was placed on the quality of both intra- and intermolecular parameters. Partial charges were derived using quantum mechanical (QM) data on model compound dipole moments, electrostatic potentials, and interactions with water. Optimization of all intramolecular parameters, including torsion angle parameters, was performed against information from QM adiabatic potential energy surface (PES) scans. Special emphasis was put on the quality of terms corresponding to PES around rotatable dihedral angles. Validation of the force field was based on molecular dynamics simulations of 20 protein complexes containing different nonstandard amino acids. Overall, the presented parameters will allow for computational studies of a wide range of proteins containing nonstandard amino acids, including natural and artificial residues.
Affiliation(s)
- Anastasia Croitoru
- Laboratoire d'Optique et Biosciences (CNRS UMR7645, INSERM U1182), Ecole Polytechnique, Institut Polytechnique de Paris, F-91128 Palaiseau, France
| | - Sang-Jun Park
- Departments of Biological Sciences, Chemistry, Bioengineering, and Computer Science and Engineering, Lehigh University, Bethlehem, Pennsylvania 18015, United States
| | - Anmol Kumar
- Department of Pharmaceutical Sciences, School of Pharmacy, University of Maryland, 20 Penn Street, Baltimore, Maryland 21201, United States
| | - Jumin Lee
- Departments of Biological Sciences, Chemistry, Bioengineering, and Computer Science and Engineering, Lehigh University, Bethlehem, Pennsylvania 18015, United States
| | - Wonpil Im
- Departments of Biological Sciences, Chemistry, Bioengineering, and Computer Science and Engineering, Lehigh University, Bethlehem, Pennsylvania 18015, United States
| | - Alexander D MacKerell
- Department of Pharmaceutical Sciences, School of Pharmacy, University of Maryland, 20 Penn Street, Baltimore, Maryland 21201, United States
| | - Alexey Aleksandrov
- Laboratoire d'Optique et Biosciences (CNRS UMR7645, INSERM U1182), Ecole Polytechnique, Institut Polytechnique de Paris, F-91128 Palaiseau, France
|
8
|
Burley SK. Impact of structural biologists and the Protein Data Bank on small-molecule drug discovery and development. J Biol Chem 2021; 296:100559. [PMID: 33744282 PMCID: PMC8059052 DOI: 10.1016/j.jbc.2021.100559] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 11/02/2020] [Revised: 02/02/2021] [Accepted: 03/16/2021] [Indexed: 12/12/2022]
Abstract
The Protein Data Bank (PDB) is an international core data resource central to fundamental biology, biomedicine, bioenergy, and biotechnology/bioengineering. Now celebrating its 50th anniversary, the PDB houses >175,000 experimentally determined atomic structures of proteins, nucleic acids, and their complexes with one another and with small molecules and drugs. The importance of three-dimensional (3D) biostructure information for research and education derives from the intimate link between molecular form and function evident throughout biology. Among the most prolific consumers of PDB data are biomedical researchers, who rely on the open access resource as the authoritative source of well-validated, expertly curated biostructures. This review recounts how the PDB grew from just seven protein structures to contain more than 49,000 structures of human proteins that have proven critical for understanding their roles in human health and disease. It then describes how these structures are used in academe and industry to validate drug targets, assess target druggability, characterize how tool compounds and other small molecules bind to drug targets, guide medicinal chemistry optimization of binding affinity and selectivity, and overcome challenges during preclinical drug development. Three case studies drawn from oncology exemplify how structural biologists and open access to PDB structures impacted recent regulatory approvals of antineoplastic drugs.
Affiliation(s)
- Stephen K Burley
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA; Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA; Rutgers Cancer Institute of New Jersey, Robert Wood Johnson Medical School, New Brunswick, New Jersey, USA; Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, San Diego, La Jolla, California, USA; Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, California, USA.
|
9
|
Burley SK, Berman HM, Bhikadiya C, Bi C, Chen L, Di Costanzo L, Christie C, Dalenberg K, Duarte JM, Dutta S, Feng Z, Ghosh S, Goodsell DS, Green RK, Guranovic V, Guzenko D, Hudson BP, Kalro T, Liang Y, Lowe R, Namkoong H, Peisach E, Periskova I, Prlic A, Randle C, Rose A, Rose P, Sala R, Sekharan M, Shao C, Tan L, Tao YP, Valasatava Y, Voigt M, Westbrook J, Woo J, Yang H, Young J, Zhuravleva M, Zardecki C. RCSB Protein Data Bank: biological macromolecular structures enabling research and education in fundamental biology, biomedicine, biotechnology and energy. Nucleic Acids Res 2019; 47:D464-D474. [PMID: 30357411 PMCID: PMC6324064 DOI: 10.1093/nar/gky1004] [Citation(s) in RCA: 717] [Impact Index Per Article: 179.3] [Received: 09/14/2018] [Accepted: 10/11/2018] [Indexed: 02/06/2023]
Abstract
The Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB, rcsb.org), the US data center for the global PDB archive, serves thousands of Data Depositors in the Americas and Oceania and makes 3D macromolecular structure data available at no charge and without usage restrictions to more than 1 million rcsb.org Users worldwide and 600 000 pdb101.rcsb.org education-focused Users around the globe. PDB Data Depositors include structural biologists using macromolecular crystallography, nuclear magnetic resonance spectroscopy and 3D electron microscopy. PDB Data Consumers include researchers, educators and students studying Fundamental Biology, Biomedicine, Biotechnology and Energy. Recent reorganization of RCSB PDB activities into four integrated, interdependent services is described in detail, together with tools and resources added over the past 2 years to RCSB PDB web portals in support of a ‘Structural View of Biology.’
Affiliation(s)
- Stephen K Burley
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA.,Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, San Diego, La Jolla, CA 92093, USA.,Rutgers Cancer Institute of New Jersey, Rutgers, The State University of New Jersey, New Brunswick, NJ 08903, USA.,Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Helen M Berman
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA.,Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Charmi Bhikadiya
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA.,Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Chunxiao Bi
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, San Diego, La Jolla, CA 92093, USA
| | - Li Chen
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA.,Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Luigi Di Costanzo
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA.,Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Cole Christie
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, San Diego, La Jolla, CA 92093, USA
| | - Ken Dalenberg
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Jose M Duarte
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, San Diego, La Jolla, CA 92093, USA
| | - Shuchismita Dutta
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA.,Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Zukang Feng
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA.,Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Sutapa Ghosh
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA.,Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - David S Goodsell
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA.,Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA.,The Scripps Research Institute, La Jolla, CA 92037, USA
| | - Rachel K Green
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA.,Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Vladimir Guranovic
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA.,Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Dmytro Guzenko
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, San Diego, La Jolla, CA 92093, USA
| | - Brian P Hudson
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA.,Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Tara Kalro
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, San Diego, La Jolla, CA 92093, USA
| | - Yuhe Liang
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA.,Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Robert Lowe
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA.,Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Harry Namkoong
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Ezra Peisach
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA.,Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Irina Periskova
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA.,Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Andreas Prlic
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, San Diego, La Jolla, CA 92093, USA
| | - Chris Randle
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, San Diego, La Jolla, CA 92093, USA
| | - Alexander Rose
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, San Diego, La Jolla, CA 92093, USA
| | - Peter Rose
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, San Diego, La Jolla, CA 92093, USA
| | - Raul Sala
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Monica Sekharan
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA.,Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Chenghua Shao
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA.,Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Lihua Tan
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Yi-Ping Tao
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Yana Valasatava
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, San Diego, La Jolla, CA 92093, USA
| | - Maria Voigt
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - John Westbrook
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Jesse Woo
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, San Diego, La Jolla, CA 92093, USA
| | - Huanwang Yang
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Jasmine Young
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Marina Zhuravleva
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Christine Zardecki
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| |
|
10
|
van Beusekom B, Wezel N, Hekkelman ML, Perrakis A, Emsley P, Joosten RP. Building and rebuilding N-glycans in protein structure models. ACTA CRYSTALLOGRAPHICA SECTION D-STRUCTURAL BIOLOGY 2019; 75:416-425. [PMID: 30988258 PMCID: PMC6465985 DOI: 10.1107/s2059798319003875] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 01/31/2019] [Accepted: 03/20/2019] [Indexed: 01/16/2023]
Abstract
Carbohydrates are automatically built and rebuilt using Coot in the PDB-REDO pipeline. N-Glycosylation is one of the most common post-translational modifications and is implicated in, for example, protein folding and interaction with ligands and receptors. N-Glycosylation trees are complex structures of linked carbohydrate residues attached to asparagine residues. While carbohydrates are typically modeled in protein structures, they are often incomplete or have the wrong chemistry. Here, new tools are presented to automatically rebuild existing glycosylation trees, to extend them where possible, and to add new glycosylation trees if they are missing from the model. The method has been incorporated in the PDB-REDO pipeline and has been applied to build or rebuild 16 452 carbohydrate residues in 11 651 glycosylation trees in 4498 structure models, and is also available from the PDB-REDO web server. With better modeling of N-glycosylation, the biological function of this important modification can be better and more easily understood.
Affiliation(s)
- Bart van Beusekom
- Department of Biochemistry, The Netherlands Cancer Institute, Plesmanlaan 121, 1066 CX Amsterdam, The Netherlands
| | - Natasja Wezel
- Department of Biochemistry, The Netherlands Cancer Institute, Plesmanlaan 121, 1066 CX Amsterdam, The Netherlands
| | - Maarten L Hekkelman
- Department of Biochemistry, The Netherlands Cancer Institute, Plesmanlaan 121, 1066 CX Amsterdam, The Netherlands
| | - Anastassis Perrakis
- Department of Biochemistry, The Netherlands Cancer Institute, Plesmanlaan 121, 1066 CX Amsterdam, The Netherlands
| | - Paul Emsley
- MRC Laboratory for Molecular Biology, Francis Crick Avenue, Cambridge Biomedical Campus, Cambridge CB2 0QH, England
| | - Robbie P Joosten
- Department of Biochemistry, The Netherlands Cancer Institute, Plesmanlaan 121, 1066 CX Amsterdam, The Netherlands
| |
|
11
|
Young JY, Westbrook JD, Feng Z, Peisach E, Persikova I, Sala R, Sen S, Berrisford JM, Swaminathan GJ, Oldfield TJ, Gutmanas A, Igarashi R, Armstrong DR, Baskaran K, Chen L, Chen M, Clark AR, Di Costanzo L, Dimitropoulos D, Gao G, Ghosh S, Gore S, Guranovic V, Hendrickx PMS, Hudson BP, Ikegawa Y, Kengaku Y, Lawson CL, Liang Y, Mak L, Mukhopadhyay A, Narayanan B, Nishiyama K, Patwardhan A, Sahni G, Sanz-García E, Sato J, Sekharan MR, Shao C, Smart OS, Tan L, van Ginkel G, Yang H, Zhuravleva MA, Markley JL, Nakamura H, Kurisu G, Kleywegt GJ, Velankar S, Berman HM, Burley SK. Worldwide Protein Data Bank biocuration supporting open access to high-quality 3D structural biology data. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2018; 2018:4844086. [PMID: 29688351 PMCID: PMC5804564 DOI: 10.1093/database/bay002] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.8] [Received: 11/02/2017] [Accepted: 01/02/2018] [Indexed: 11/24/2022]
Abstract
The Protein Data Bank (PDB) is the single global repository for experimentally determined 3D structures of biological macromolecules and their complexes with ligands. The worldwide PDB (wwPDB) is the international collaboration that manages the PDB archive according to the FAIR principles: Findability, Accessibility, Interoperability and Reusability. The wwPDB recently developed OneDep, a unified tool for deposition, validation and biocuration of structures of biological macromolecules. All data deposited to the PDB undergo critical review by wwPDB Biocurators. This article outlines the importance of biocuration for structural biology data deposited to the PDB and describes wwPDB biocuration processes and the role of expert Biocurators in sustaining a high-quality archive. Structural data submitted to the PDB are examined for self-consistency, standardized using controlled vocabularies, cross-referenced with other biological data resources and validated for scientific/technical accuracy. We illustrate how biocuration is integral to PDB data archiving, as it facilitates accurate, consistent and comprehensive representation of biological structure data, allowing efficient and effective usage by research scientists, educators, students and the curious public worldwide. Database URL: https://www.wwpdb.org/
Affiliation(s)
- Jasmine Y Young
- RCSB Protein Data Bank, Center for Integrative Proteomics Research, Rutgers, The State University of New Jersey, 174 Frelinghuysen Road, Piscataway, NJ 08854, USA
| | - John D Westbrook
- RCSB Protein Data Bank, Center for Integrative Proteomics Research, Rutgers, The State University of New Jersey, 174 Frelinghuysen Road, Piscataway, NJ 08854, USA
| | - Zukang Feng
- RCSB Protein Data Bank, Center for Integrative Proteomics Research, Rutgers, The State University of New Jersey, 174 Frelinghuysen Road, Piscataway, NJ 08854, USA
| | - Ezra Peisach
- RCSB Protein Data Bank, Center for Integrative Proteomics Research, Rutgers, The State University of New Jersey, 174 Frelinghuysen Road, Piscataway, NJ 08854, USA
| | - Irina Persikova
- RCSB Protein Data Bank, Center for Integrative Proteomics Research, Rutgers, The State University of New Jersey, 174 Frelinghuysen Road, Piscataway, NJ 08854, USA
| | - Raul Sala
- RCSB Protein Data Bank, Center for Integrative Proteomics Research, Rutgers, The State University of New Jersey, 174 Frelinghuysen Road, Piscataway, NJ 08854, USA
| | - Sanchayita Sen
- Protein Data Bank in Europe (PDBe), European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - John M Berrisford
- Protein Data Bank in Europe (PDBe), European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - G Jawahar Swaminathan
- Protein Data Bank in Europe (PDBe), European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Thomas J Oldfield
- Protein Data Bank in Europe (PDBe), European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Aleksandras Gutmanas
- Protein Data Bank in Europe (PDBe), European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Reiko Igarashi
- PDBj, Institute for Protein Research, Osaka University, 3-2 Yamadaoka, Suita-shi, Osaka 565-0871, Japan
| | - David R Armstrong
- Protein Data Bank in Europe (PDBe), European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Kumaran Baskaran
- BMRB, BioMagResBank, University of Wisconsin-Madison, 433 Babcock Drive, Madison, WI 53706, USA
| | - Li Chen
- RCSB Protein Data Bank, Center for Integrative Proteomics Research, Rutgers, The State University of New Jersey, 174 Frelinghuysen Road, Piscataway, NJ 08854, USA
| | - Minyu Chen
- PDBj, Institute for Protein Research, Osaka University, 3-2 Yamadaoka, Suita-shi, Osaka 565-0871, Japan
| | - Alice R Clark
- Protein Data Bank in Europe (PDBe), European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Luigi Di Costanzo
- RCSB Protein Data Bank, Center for Integrative Proteomics Research, Rutgers, The State University of New Jersey, 174 Frelinghuysen Road, Piscataway, NJ 08854, USA
| | - Dimitris Dimitropoulos
- RCSB Protein Data Bank, Center for Integrative Proteomics Research, Rutgers, The State University of New Jersey, 174 Frelinghuysen Road, Piscataway, NJ 08854, USA
| | - Guanghua Gao
- RCSB Protein Data Bank, Center for Integrative Proteomics Research, Rutgers, The State University of New Jersey, 174 Frelinghuysen Road, Piscataway, NJ 08854, USA
| | - Sutapa Ghosh
- RCSB Protein Data Bank, Center for Integrative Proteomics Research, Rutgers, The State University of New Jersey, 174 Frelinghuysen Road, Piscataway, NJ 08854, USA
| | - Swanand Gore
- Protein Data Bank in Europe (PDBe), European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Vladimir Guranovic
- RCSB Protein Data Bank, Center for Integrative Proteomics Research, Rutgers, The State University of New Jersey, 174 Frelinghuysen Road, Piscataway, NJ 08854, USA
| | - Pieter M S Hendrickx
- Protein Data Bank in Europe (PDBe), European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Brian P Hudson
- RCSB Protein Data Bank, Center for Integrative Proteomics Research, Rutgers, The State University of New Jersey, 174 Frelinghuysen Road, Piscataway, NJ 08854, USA
| | - Yasuyo Ikegawa
- PDBj, Institute for Protein Research, Osaka University, 3-2 Yamadaoka, Suita-shi, Osaka 565-0871, Japan
| | - Yumiko Kengaku
- PDBj, Institute for Protein Research, Osaka University, 3-2 Yamadaoka, Suita-shi, Osaka 565-0871, Japan
| | - Catherine L Lawson
- RCSB Protein Data Bank, Center for Integrative Proteomics Research, Rutgers, The State University of New Jersey, 174 Frelinghuysen Road, Piscataway, NJ 08854, USA
| | - Yuhe Liang
- RCSB Protein Data Bank, Center for Integrative Proteomics Research, Rutgers, The State University of New Jersey, 174 Frelinghuysen Road, Piscataway, NJ 08854, USA
| | - Lora Mak
- Protein Data Bank in Europe (PDBe), European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Abhik Mukhopadhyay
- Protein Data Bank in Europe (PDBe), European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Buvaneswari Narayanan
- RCSB Protein Data Bank, Center for Integrative Proteomics Research, Rutgers, The State University of New Jersey, 174 Frelinghuysen Road, Piscataway, NJ 08854, USA
| | - Kayoko Nishiyama
- PDBj, Institute for Protein Research, Osaka University, 3-2 Yamadaoka, Suita-shi, Osaka 565-0871, Japan
| | - Ardan Patwardhan
- Protein Data Bank in Europe (PDBe), European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Gaurav Sahni
- Protein Data Bank in Europe (PDBe), European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Eduardo Sanz-García
- Protein Data Bank in Europe (PDBe), European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Junko Sato
- PDBj, Institute for Protein Research, Osaka University, 3-2 Yamadaoka, Suita-shi, Osaka 565-0871, Japan
| | - Monica R Sekharan
- RCSB Protein Data Bank, Center for Integrative Proteomics Research, Rutgers, The State University of New Jersey, 174 Frelinghuysen Road, Piscataway, NJ 08854, USA
| | - Chenghua Shao
- RCSB Protein Data Bank, Center for Integrative Proteomics Research, Rutgers, The State University of New Jersey, 174 Frelinghuysen Road, Piscataway, NJ 08854, USA
| | - Oliver S Smart
- Protein Data Bank in Europe (PDBe), European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Lihua Tan
- RCSB Protein Data Bank, Center for Integrative Proteomics Research, Rutgers, The State University of New Jersey, 174 Frelinghuysen Road, Piscataway, NJ 08854, USA
| | - Glen van Ginkel
- Protein Data Bank in Europe (PDBe), European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Huanwang Yang
- RCSB Protein Data Bank, Center for Integrative Proteomics Research, Rutgers, The State University of New Jersey, 174 Frelinghuysen Road, Piscataway, NJ 08854, USA
| | - Marina A Zhuravleva
- RCSB Protein Data Bank, Center for Integrative Proteomics Research, Rutgers, The State University of New Jersey, 174 Frelinghuysen Road, Piscataway, NJ 08854, USA
| | - John L Markley
- BMRB, BioMagResBank, University of Wisconsin-Madison, 433 Babcock Drive, Madison, WI 53706, USA
| | - Haruki Nakamura
- PDBj, Institute for Protein Research, Osaka University, 3-2 Yamadaoka, Suita-shi, Osaka 565-0871, Japan
| | - Genji Kurisu
- PDBj, Institute for Protein Research, Osaka University, 3-2 Yamadaoka, Suita-shi, Osaka 565-0871, Japan
| | - Gerard J Kleywegt
- Protein Data Bank in Europe (PDBe), European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Sameer Velankar
- Protein Data Bank in Europe (PDBe), European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Helen M Berman
- RCSB Protein Data Bank, Center for Integrative Proteomics Research, Rutgers, The State University of New Jersey, 174 Frelinghuysen Road, Piscataway, NJ 08854, USA
| | - Stephen K Burley
- RCSB Protein Data Bank, Center for Integrative Proteomics Research, Rutgers, The State University of New Jersey, 174 Frelinghuysen Road, Piscataway, NJ 08854, USA; RCSB Protein Data Bank, San Diego Supercomputer Center and Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, 9500 Gilman Dr., La Jolla, CA 92093, USA; Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, 174 Frelinghuysen Road, Piscataway, NJ 08854, USA; Rutgers Cancer Institute of New Jersey, Rutgers, The State University of New Jersey, Little Albany St, New Brunswick, NJ 08901, USA
| |
|
12
|
Park SJ, Lee J, Patel DS, Ma H, Lee HS, Jo S, Im W. Glycan Reader is improved to recognize most sugar types and chemical modifications in the Protein Data Bank. Bioinformatics 2018; 33:3051-3057. [PMID: 28582506 DOI: 10.1093/bioinformatics/btx358] [Citation(s) in RCA: 82] [Impact Index Per Article: 13.7] [Received: 02/06/2017] [Accepted: 05/31/2017] [Indexed: 11/12/2022]
Abstract
Motivation: Glycans play a central role in many essential biological processes. Glycan Reader was originally developed to simplify the reading of Protein Data Bank (PDB) files containing glycans through the automatic detection and annotation of sugars and glycosidic linkages between sugar units and to proteins, all based on atomic coordinates and connectivity information. Carbohydrates can have various chemical modifications at different positions, making their chemical space much diverse. Unfortunately, current PDB files do not provide exact annotations for most carbohydrate derivatives and more than 50% of PDB glycan chains have at least one carbohydrate derivative that could not be correctly recognized by the original Glycan Reader. Results: Glycan Reader has been improved and now identifies most sugar types and chemical modifications (including various glycolipids) in the PDB, and both PDB and PDBx/mmCIF formats are supported. CHARMM-GUI Glycan Reader is updated to generate the simulation system and input of various glycoconjugates with most sugar types and chemical modifications. It also offers a new functionality to edit the glycan structures through addition/deletion/modification of glycosylation types, sugar types, chemical modifications, glycosidic linkages, and anomeric states. The simulation system and input files can be used for CHARMM, NAMD, GROMACS, AMBER, GENESIS, LAMMPS, Desmond, OpenMM, and CHARMM/OpenMM. Glycan Fragment Database in GlycanStructure.Org is also updated to provide an intuitive glycan sequence search tool for complex glycan structures with various chemical modifications in the PDB. Availability and implementation: http://www.charmm-gui.org/input/glycan and http://www.glycanstructure.org. Contact: wonpil@lehigh.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Sang-Jun Park
- Department of Biological Sciences and Bioengineering Program, Lehigh University, Bethlehem, PA, USA
| | - Jumin Lee
- Department of Biological Sciences and Bioengineering Program, Lehigh University, Bethlehem, PA, USA
| | - Dhilon S Patel
- Department of Biological Sciences and Bioengineering Program, Lehigh University, Bethlehem, PA, USA
| | - Hongjing Ma
- Department of Biological Sciences and Bioengineering Program, Lehigh University, Bethlehem, PA, USA
| | - Hui Sun Lee
- Department of Biological Sciences and Bioengineering Program, Lehigh University, Bethlehem, PA, USA
| | - Sunhwan Jo
- Leadership Computing Facility, Argonne National Laboratory, Argonne, IL, USA
| | - Wonpil Im
- Department of Biological Sciences and Bioengineering Program, Lehigh University, Bethlehem, PA, USA
| |
|
13
|
Nuti E, Cuffaro D, Bernardini E, Camodeca C, Panelli L, Chaves S, Ciccone L, Tepshi L, Vera L, Orlandini E, Nencetti S, Stura EA, Santos MA, Dive V, Rossello A. Development of Thioaryl-Based Matrix Metalloproteinase-12 Inhibitors with Alternative Zinc-Binding Groups: Synthesis, Potentiometric, NMR, and Crystallographic Studies. J Med Chem 2018; 61:4421-4435. [DOI: 10.1021/acs.jmedchem.8b00096] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Indexed: 12/14/2022]
Affiliation(s)
- Elisa Nuti
- Dipartimento di Farmacia, Università di Pisa, Via Bonanno 6, 56126 Pisa, Italy
| | - Doretta Cuffaro
- Dipartimento di Farmacia, Università di Pisa, Via Bonanno 6, 56126 Pisa, Italy
| | - Elisa Bernardini
- Dipartimento di Farmacia, Università di Pisa, Via Bonanno 6, 56126 Pisa, Italy
- Centro de Química Estrutural, Instituto Superior Técnico, Universidade de Lisboa, Av. Rovisco Pais 1, 1049-001 Lisboa, Portugal
| | - Caterina Camodeca
- Dipartimento di Farmacia, Università di Pisa, Via Bonanno 6, 56126 Pisa, Italy
| | - Laura Panelli
- Dipartimento di Farmacia, Università di Pisa, Via Bonanno 6, 56126 Pisa, Italy
| | - Sílvia Chaves
- Centro de Química Estrutural, Instituto Superior Técnico, Universidade de Lisboa, Av. Rovisco Pais 1, 1049-001 Lisboa, Portugal
| | - Lidia Ciccone
- CEA, Institut des Sciences du Vivant Frédéric Joliot, Service d’Ingénierie Moléculaire des Protéines (SIMOPRO), Université Paris-Saclay, Gif-sur-Yvette 91190, France
- Synchrotron SOLEIL, L’Orme des Merisiers, Saint-Aubin, BP 48, 91192 Gif-sur-Yvette, France
| | - Livia Tepshi
- CEA, Institut des Sciences du Vivant Frédéric Joliot, Service d’Ingénierie Moléculaire des Protéines (SIMOPRO), Université Paris-Saclay, Gif-sur-Yvette 91190, France
| | - Laura Vera
- CEA, Institut des Sciences du Vivant Frédéric Joliot, Service d’Ingénierie Moléculaire des Protéines (SIMOPRO), Université Paris-Saclay, Gif-sur-Yvette 91190, France
- Laboratory of Biomolecular Research, Division of Biology and Chemistry, Paul Scherrer Institute, 5232 Villigen, Switzerland
| | - Elisabetta Orlandini
- Dipartimento di Scienze della Terra, Università di Pisa, via Santa Maria 53, 56126 Pisa, Italy
| | - Susanna Nencetti
- Dipartimento di Farmacia, Università di Pisa, Via Bonanno 6, 56126 Pisa, Italy
| | - Enrico A. Stura
- CEA, Institut des Sciences du Vivant Frédéric Joliot, Service d’Ingénierie Moléculaire des Protéines (SIMOPRO), Université Paris-Saclay, Gif-sur-Yvette 91190, France
| | - M. Amélia Santos
- Centro de Química Estrutural, Instituto Superior Técnico, Universidade de Lisboa, Av. Rovisco Pais 1, 1049-001 Lisboa, Portugal
| | - Vincent Dive
- CEA, Institut des Sciences du Vivant Frédéric Joliot, Service d’Ingénierie Moléculaire des Protéines (SIMOPRO), Université Paris-Saclay, Gif-sur-Yvette 91190, France
| | - Armando Rossello
- Dipartimento di Farmacia, Università di Pisa, Via Bonanno 6, 56126 Pisa, Italy
| |
|
14
|
Beshnova DA, Pereira J, Lamzin VS. Estimation of the protein-ligand interaction energy for model building and validation. Acta Crystallogr D Struct Biol 2017; 73:195-202. [PMID: 28291754 PMCID: PMC5349431 DOI: 10.1107/s2059798317003400] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 07/02/2016] [Accepted: 03/01/2017] [Indexed: 12/03/2022]
Abstract
Macromolecular X-ray crystallography is one of the main experimental techniques to visualize protein-ligand interactions. The high complexity of the ligand universe, however, has delayed the development of efficient methods for the automated identification, fitting and validation of ligands in their electron-density clusters. The identification and fitting are primarily based on the density itself and do not take into account the protein environment, which is a step that is only taken during the validation of the proposed binding mode. Here, a new approach, based on the estimation of the major energetic terms of protein-ligand interaction, is introduced for the automated identification of crystallographic ligands in the indicated binding site with ARP/wARP. The applicability of the method to the validation of protein-ligand models from the Protein Data Bank is demonstrated by the detection of models that are ‘questionable’ and the pinpointing of unfavourable interatomic contacts.
Affiliation(s)
- Daria A. Beshnova
- European Molecular Biology Laboratory, c/o DESY, Notkestrasse 85, 22607 Hamburg, Germany
| | - Joana Pereira
- European Molecular Biology Laboratory, c/o DESY, Notkestrasse 85, 22607 Hamburg, Germany
| | - Victor S. Lamzin
- European Molecular Biology Laboratory, c/o DESY, Notkestrasse 85, 22607 Hamburg, Germany
| |
|
15
|
Young JY, Westbrook JD, Feng Z, Sala R, Peisach E, Oldfield TJ, Sen S, Gutmanas A, Armstrong DR, Berrisford JM, Chen L, Chen M, Di Costanzo L, Dimitropoulos D, Gao G, Ghosh S, Gore S, Guranovic V, Hendrickx PMS, Hudson BP, Igarashi R, Ikegawa Y, Kobayashi N, Lawson CL, Liang Y, Mading S, Mak L, Mir MS, Mukhopadhyay A, Patwardhan A, Persikova I, Rinaldi L, Sanz-Garcia E, Sekharan MR, Shao C, Swaminathan GJ, Tan L, Ulrich EL, van Ginkel G, Yamashita R, Yang H, Zhuravleva MA, Quesada M, Kleywegt GJ, Berman HM, Markley JL, Nakamura H, Velankar S, Burley SK. OneDep: Unified wwPDB System for Deposition, Biocuration, and Validation of Macromolecular Structures in the PDB Archive. Structure 2017; 25:536-545. [PMID: 28190782 DOI: 10.1016/j.str.2017.01.004] [Citation(s) in RCA: 100] [Impact Index Per Article: 14.3] [Received: 09/30/2016] [Revised: 11/08/2016] [Accepted: 01/10/2017] [Indexed: 10/20/2022]
Abstract
OneDep, a unified system for deposition, biocuration, and validation of experimentally determined structures of biological macromolecules to the PDB archive, has been developed as a global collaboration by the worldwide PDB (wwPDB) partners. This new system was designed to ensure that the wwPDB could meet the evolving archiving requirements of the scientific community over the coming decades. OneDep unifies deposition, biocuration, and validation pipelines across all wwPDB, EMDB, and BMRB deposition sites with improved focus on data quality and completeness in these archives, while supporting growth in the number of depositions and increases in their average size and complexity. In this paper, we describe the design, functional operation, and supporting infrastructure of the OneDep system, and provide initial performance assessments.
Affiliation(s)
- Jasmine Y Young
- RCSB Protein Data Bank, Center for Integrative Proteomics Research, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - John D Westbrook
- RCSB Protein Data Bank, Center for Integrative Proteomics Research, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Zukang Feng
- RCSB Protein Data Bank, Center for Integrative Proteomics Research, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Raul Sala
- RCSB Protein Data Bank, Center for Integrative Proteomics Research, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Ezra Peisach
- RCSB Protein Data Bank, Center for Integrative Proteomics Research, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Thomas J Oldfield
- Protein Data Bank in Europe (PDBe), European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
| | - Sanchayita Sen
- Protein Data Bank in Europe (PDBe), European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
| | - Aleksandras Gutmanas
- Protein Data Bank in Europe (PDBe), European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
| | - David R Armstrong
- Protein Data Bank in Europe (PDBe), European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
| | - John M Berrisford
- Protein Data Bank in Europe (PDBe), European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
| | - Li Chen
- RCSB Protein Data Bank, Center for Integrative Proteomics Research, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Minyu Chen
- PDBj, Institute for Protein Research, Osaka University, Osaka, 565-0871, Japan
| | - Luigi Di Costanzo
- RCSB Protein Data Bank, Center for Integrative Proteomics Research, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Dimitris Dimitropoulos
- RCSB Protein Data Bank, Center for Integrative Proteomics Research, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Guanghua Gao
- RCSB Protein Data Bank, Center for Integrative Proteomics Research, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Sutapa Ghosh
- RCSB Protein Data Bank, Center for Integrative Proteomics Research, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Swanand Gore
- Protein Data Bank in Europe (PDBe), European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
| | - Vladimir Guranovic
- RCSB Protein Data Bank, Center for Integrative Proteomics Research, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Pieter M S Hendrickx
- Protein Data Bank in Europe (PDBe), European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
| | - Brian P Hudson
- RCSB Protein Data Bank, Center for Integrative Proteomics Research, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Reiko Igarashi
- PDBj, Institute for Protein Research, Osaka University, Osaka, 565-0871, Japan
| | - Yasuyo Ikegawa
- PDBj, Institute for Protein Research, Osaka University, Osaka, 565-0871, Japan
| | - Naohiro Kobayashi
- PDBj, Institute for Protein Research, Osaka University, Osaka, 565-0871, Japan
| | - Catherine L Lawson
- RCSB Protein Data Bank, Center for Integrative Proteomics Research, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Yuhe Liang
- RCSB Protein Data Bank, Center for Integrative Proteomics Research, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Steve Mading
- BMRB, BioMagResBank, University of Wisconsin-Madison, Madison, WI 53706, USA
| | - Lora Mak
- Protein Data Bank in Europe (PDBe), European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
| | - M Saqib Mir
- Protein Data Bank in Europe (PDBe), European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
| | - Abhik Mukhopadhyay
- Protein Data Bank in Europe (PDBe), European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
| | - Ardan Patwardhan
- Protein Data Bank in Europe (PDBe), European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Irina Persikova
- RCSB Protein Data Bank, Center for Integrative Proteomics Research, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Luana Rinaldi
- Protein Data Bank in Europe (PDBe), European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Eduardo Sanz-Garcia
- Protein Data Bank in Europe (PDBe), European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Monica R Sekharan
- RCSB Protein Data Bank, Center for Integrative Proteomics Research, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Chenghua Shao
- RCSB Protein Data Bank, Center for Integrative Proteomics Research, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- G Jawahar Swaminathan
- Protein Data Bank in Europe (PDBe), European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Lihua Tan
- RCSB Protein Data Bank, Center for Integrative Proteomics Research, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Eldon L Ulrich
- BMRB, BioMagResBank, University of Wisconsin-Madison, Madison, WI 53706, USA
- Glen van Ginkel
- Protein Data Bank in Europe (PDBe), European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Reiko Yamashita
- PDBj, Institute for Protein Research, Osaka University, Osaka, 565-0871, Japan
- Huanwang Yang
- RCSB Protein Data Bank, Center for Integrative Proteomics Research, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Marina A Zhuravleva
- RCSB Protein Data Bank, Center for Integrative Proteomics Research, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Martha Quesada
- RCSB Protein Data Bank, Center for Integrative Proteomics Research, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Gerard J Kleywegt
- Protein Data Bank in Europe (PDBe), European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Helen M Berman
- RCSB Protein Data Bank, Center for Integrative Proteomics Research, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- John L Markley
- BMRB, BioMagResBank, University of Wisconsin-Madison, Madison, WI 53706, USA
- Haruki Nakamura
- PDBj, Institute for Protein Research, Osaka University, Osaka, 565-0871, Japan
- Sameer Velankar
- Protein Data Bank in Europe (PDBe), European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Stephen K Burley
- RCSB Protein Data Bank, Center for Integrative Proteomics Research, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; RCSB Protein Data Bank, San Diego Supercomputer Center, University of California San Diego, La Jolla, CA 92093, USA; Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA 92093, USA; Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Rutgers Cancer Institute of New Jersey, Rutgers, The State University of New Jersey, New Brunswick, NJ 08903, USA
16
Shionyu-Mitsuyama C, Hijikata A, Tsuji T, Shirai T. Classification of ligand molecules in PDB with graph match-based structural superposition. J Struct Funct Genomics 2016; 17:135-146. [PMID: 28012138 DOI: 10.1007/s10969-016-9209-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 12/21/2015] [Accepted: 12/05/2016] [Indexed: 10/20/2022]
Abstract
The fast heuristic graph match algorithm for small molecules, COMPLIG, was improved by adding a structural superposition process to verify the atom-atom matching. The modified method was used to classify the small molecule ligands in the Protein Data Bank (PDB) by their three-dimensional structures, and 16,660 types of ligands in the PDB were classified into 7561 clusters. In contrast, a classification by a previous method (without structure superposition) generated 3371 clusters from the same ligand set. The characteristic feature in the current classification system is the increased number of singleton clusters, which contained only one ligand molecule in a cluster. Inspections of the singletons in the current classification system but not in the previous one implied that the major factors for the isolation were differences in chirality, cyclic conformations, separation of substructures, and bond length. Comparisons between current and previous classification systems revealed that the superposition-based classification was effective in clustering functionally related ligands, such as drugs targeted to specific biological processes, owing to the strictness of the atom-atom matching.
Affiliation(s)
- Clara Shionyu-Mitsuyama
- Department of Bioscience, Nagahama Institute of Bio-science and Technology, 1266 Tamura, Nagahama, 526-0829, Japan
- Atsushi Hijikata
- Department of Bioscience, Nagahama Institute of Bio-science and Technology, 1266 Tamura, Nagahama, 526-0829, Japan
- Toshiyuki Tsuji
- Department of Bioscience, Nagahama Institute of Bio-science and Technology, 1266 Tamura, Nagahama, 526-0829, Japan
- Tsuyoshi Shirai
- Department of Bioscience, Nagahama Institute of Bio-science and Technology, 1266 Tamura, Nagahama, 526-0829, Japan.
17
Adams PD, Aertgeerts K, Bauer C, Bell JA, Berman HM, Bhat TN, Blaney JM, Bolton E, Bricogne G, Brown D, Burley SK, Case DA, Clark KL, Darden T, Emsley P, Feher VA, Feng Z, Groom CR, Harris SF, Hendle J, Holder T, Joachimiak A, Kleywegt GJ, Krojer T, Marcotrigiano J, Mark AE, Markley JL, Miller M, Minor W, Montelione GT, Murshudov G, Nakagawa A, Nakamura H, Nicholls A, Nicklaus M, Nolte RT, Padyana AK, Peishoff CE, Pieniazek S, Read RJ, Shao C, Sheriff S, Smart O, Soisson S, Spurlino J, Stouch T, Svobodova R, Tempel W, Terwilliger TC, Tronrud D, Velankar S, Ward SC, Warren GL, Westbrook JD, Williams P, Yang H, Young J. Outcome of the First wwPDB/CCDC/D3R Ligand Validation Workshop. Structure 2016; 24:502-508. [PMID: 27050687 DOI: 10.1016/j.str.2016.02.017] [Citation(s) in RCA: 44] [Impact Index Per Article: 5.5] [Received: 11/06/2015] [Revised: 02/24/2016] [Accepted: 02/25/2016] [Indexed: 10/22/2022]
Abstract
Crystallographic studies of ligands bound to biological macromolecules (proteins and nucleic acids) represent an important source of information concerning drug-target interactions, providing atomic level insights into the physical chemistry of complex formation between macromolecules and ligands. Of the more than 115,000 entries extant in the Protein Data Bank (PDB) archive, ∼75% include at least one non-polymeric ligand. Ligand geometrical and stereochemical quality, the suitability of ligand models for in silico drug discovery and design, and the goodness-of-fit of ligand models to electron-density maps vary widely across the archive. We describe the proceedings and conclusions from the first Worldwide PDB/Cambridge Crystallographic Data Center/Drug Design Data Resource (wwPDB/CCDC/D3R) Ligand Validation Workshop held at the Research Collaboratory for Structural Bioinformatics at Rutgers University on July 30-31, 2015. Experts in protein crystallography from academe and industry came together with non-profit and for-profit software providers for crystallography and with experts in computational chemistry and data archiving to discuss and make recommendations on best practices, as framed by a series of questions central to structural studies of macromolecule-ligand complexes. What data concerning bound ligands should be archived in the PDB? How should the ligands be best represented? How should structural models of macromolecule-ligand complexes be validated? What supplementary information should accompany publications of structural studies of biological macromolecules? Consensus recommendations on best practices developed in response to each of these questions are provided, together with some details regarding implementation. Important issues addressed but not resolved at the workshop are also enumerated.
Affiliation(s)
- Paul D Adams
- Molecular Biophysics & Integrated Bioimaging Division, Lawrence Berkeley Laboratory, Department of Bioengineering, UC Berkeley, Berkeley, CA 94720-8235, USA
- Cary Bauer
- Bruker AXS, Inc., Madison, WI 53711, USA
- Helen M Berman
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Center for Integrative Proteomics Research, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Talapady N Bhat
- Biosystems and Biomaterials Division, NIST, Gaithersburg, MD 20899, USA
- Evan Bolton
- National Center for Biotechnology Information, U.S. National Library of Medicine, Bethesda, MD 20894, USA
- David Brown
- School of Biosciences, University of Kent, Canterbury CT2 7NH, UK; Charles River Ltd., Structural Biology and Biophysics, Cambridge CB10 1XL, UK
- Stephen K Burley
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Center for Integrative Proteomics Research, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Skaggs School of Pharmacy and Pharmaceutical Sciences and San Diego Supercomputer Center, University of California, San Diego, La Jolla, CA 92093, USA.
- David A Case
- Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Kirk L Clark
- Novartis Institutes for BioMedical Research, Cambridge, MA 02139, USA
- Tom Darden
- OpenEye Scientific, Cambridge, MA 02142, USA
- Paul Emsley
- MRC Laboratory of Molecular Biology, Cambridge CB2 0QH, UK
- Victoria A Feher
- Drug Design Data Resource and Department of Chemistry and Biochemistry, University of California, San Diego, La Jolla, CA 92093, USA.
- Zukang Feng
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Center for Integrative Proteomics Research, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Colin R Groom
- Cambridge Crystallographic Data Centre, Cambridge CB2 1EZ, UK.
- Jorg Hendle
- Structural Biology, Lilly Biotechnology Center, San Diego, CA 92121, USA
- Andrzej Joachimiak
- Structural Biology Center, Biosciences, Argonne National Laboratory, Argonne, IL 60439, USA
- Gerard J Kleywegt
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Tobias Krojer
- Structural Genomics Consortium, University of Oxford, Oxford OX3 7DQ, UK
- Joseph Marcotrigiano
- Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Center for Advanced Biotechnology and Medicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Alan E Mark
- School of Chemistry & Molecular Biosciences, University of Queensland, St Lucia, QLD 4072, Australia
- John L Markley
- BioMagResBank, Department of Biochemistry, University of Wisconsin-Madison, Madison, WI 53706-1544, USA
- Matthew Miller
- Center for Advanced Biotechnology and Medicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Wladek Minor
- Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA 22908, USA
- Gaetano T Montelione
- Center for Advanced Biotechnology and Medicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Department of Molecular Biology and Biochemistry, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Atsushi Nakagawa
- Protein Data Bank Japan, Institute for Protein Research, Osaka University, Osaka 565-0871, Japan
- Haruki Nakamura
- Protein Data Bank Japan, Institute for Protein Research, Osaka University, Osaka 565-0871, Japan
- Marc Nicklaus
- Computer-Aided Drug Design Group, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Frederick, MD 21702, USA
- Susan Pieniazek
- Bristol-Myers Squibb Research and Development, Pennington, NJ 08534, USA
- Randy J Read
- Department of Haematology, Cambridge Institute for Medical Research, University of Cambridge, Cambridge CB2 0XY, UK
- Chenghua Shao
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Center for Integrative Proteomics Research, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Steven Sheriff
- Bristol-Myers Squibb Research and Development, Princeton, NJ 08543, USA
- Oliver Smart
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- John Spurlino
- Janssen Pharmaceuticals, Inc., Spring House, PA 19002, USA
- Terry Stouch
- Science For Solutions, LLC, West Windsor, NJ 08550, USA
- Radka Svobodova
- CEITEC-Central European Institute of Technology and National Centre for Biomolecular Research, Masaryk University Brno, 625 00 Brno, Czech Republic
- Wolfram Tempel
- Structural Genomics Consortium, University of Toronto, Toronto, ON M5G 1L7, Canada
- Dale Tronrud
- Department of Biochemistry and Biophysics, Oregon State University, Corvallis, OR 97331, USA
- Sameer Velankar
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Suzanna C Ward
- Cambridge Crystallographic Data Centre, Cambridge CB2 1EZ, UK
- John D Westbrook
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Center for Integrative Proteomics Research, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Huanwang Yang
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Center for Integrative Proteomics Research, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Jasmine Young
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Center for Integrative Proteomics Research, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
18
Ciccone L, Policar C, Stura EA, Shepard W. Human TTR conformation altered by rhenium tris-carbonyl derivatives. J Struct Biol 2016; 195:353-364. [PMID: 27402536 DOI: 10.1016/j.jsb.2016.07.002] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 05/20/2016] [Revised: 07/07/2016] [Accepted: 07/08/2016] [Indexed: 01/13/2023]
Abstract
Transthyretin (TTR) is a 54 kDa homotetrameric serum protein that transports thyroxine (T4) and retinol. TTR is potentially amyloidogenic due to homotetramer dissociation into monomeric intermediates that self-assemble as amyloid deposits and insoluble fibrils. Most crystallographic structures, including those of amyloidogenic variants, show the same tetramer without major variations in the monomer-monomer interface or in the volume of the interdimeric cavity. Soaking TTR crystals in a solution containing rhenium tris-carbonyl derivatives yields a TTR conformer never observed before. Only one of the two monomers of the crystallographic dimer is significantly altered, and the inner part of the T4 binding cavity is expanded at one end and shrunk at the other. The result redefines the mechanism of allosteric communication between the two sites, suggesting that negative cooperativity is a function of dimer asymmetry, which can be induced through internal or external binding. An aspect that remains unexplained is why the conformational changes are ubiquitous throughout the crystal although the heavy metal content of the derivatized crystals is relatively low. The conformational changes observed, which include Leu(82), may represent a form of TTR better at scavenging β-amyloid. At a resolution of 1.69 Å, with excellent refinement statistics and well-defined electron density for all parts of the structure, it is possible to envisage answering important questions that range from protein cooperative behavior to heavy-atom-induced protein conformational modifications that can result in crystallographic non-isomorphism.
Affiliation(s)
- Lidia Ciccone
- Synchrotron SOLEIL, l'Orme des Merisiers, Saint Aubin, BP 48, 91192 Gif-sur-Yvette, France; CEA, iBiTec-S, Service d'Ingénierie Moléculaire des Protéines (SIMOPRO), Gif-sur-Yvette F-91191, France
- Clotilde Policar
- Ecole Normale Supérieure, Département de chimie, 24, rue Lhomond, 75005 Paris, France; Université Pierre et Marie Curie Paris 6, 4, Place Jussieu, 75005 Paris, France; CNRS, UMR7203, 75005 Paris, France
- Enrico A Stura
- Synchrotron SOLEIL, l'Orme des Merisiers, Saint Aubin, BP 48, 91192 Gif-sur-Yvette, France; CEA, iBiTec-S, Service d'Ingénierie Moléculaire des Protéines (SIMOPRO), Gif-sur-Yvette F-91191, France.
- William Shepard
- Synchrotron SOLEIL, l'Orme des Merisiers, Saint Aubin, BP 48, 91192 Gif-sur-Yvette, France
19
Polsinelli I, Nencetti S, Shepard W, Ciccone L, Orlandini E, Stura EA. A new crystal form of human transthyretin obtained with a curcumin derived ligand. J Struct Biol 2016; 194:8-17. [PMID: 26796656 DOI: 10.1016/j.jsb.2016.01.007] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 12/04/2015] [Revised: 01/14/2016] [Accepted: 01/16/2016] [Indexed: 12/18/2022]
Abstract
Transthyretin (TTR), a 54 kDa homotetrameric protein that transports thyroxine (T4), has been associated with clinical cases of TTR amyloidosis for its tendency to aggregate to form fibrils. Many ligands with a potential to inhibit fibril formation have been studied by X-ray crystallography in complex with TTR. Unfortunately, the ligand is often found in ambiguous electron density that is difficult to interpret. The ligand validation statistics suggest over-interpretation, even for the most active compounds like diflunisal. The primary technical reason is its position on a crystallographic 2-fold axis in the most common crystal form. Further investigations with the use of polyethylene glycol (PEG) to crystallize TTR complexes have resulted in a new trigonal polymorph with two tetramers in the asymmetric unit. The ligand used to obtain this new polymorph, 4-hydroxychalcone, is related to curcumin. Here we evaluate this crystal form to understand the contribution it may bring to the study of TTR-ligand complexes, which are often asymmetric.
Affiliation(s)
- Ivan Polsinelli
- Synchrotron SOLEIL, l'Orme des Merisiers, Saint Aubin, BP 48, 91192 Gif-sur-Yvette, France; Dipartimento di Farmacia, Università di Pisa, Via Bonanno 6, 56126 Pisa, Italy
- Susanna Nencetti
- Dipartimento di Farmacia, Università di Pisa, Via Bonanno 6, 56126 Pisa, Italy
- William Shepard
- Synchrotron SOLEIL, l'Orme des Merisiers, Saint Aubin, BP 48, 91192 Gif-sur-Yvette, France
- Lidia Ciccone
- Dipartimento di Farmacia, Università di Pisa, Via Bonanno 6, 56126 Pisa, Italy
- Enrico A Stura
- Synchrotron SOLEIL, l'Orme des Merisiers, Saint Aubin, BP 48, 91192 Gif-sur-Yvette, France; CEA, iBiTec-S, Service d'Ingénierie Moléculaire des Protéines (SIMOPRO), Gif-sur-Yvette F-91191, France.
20
Dauter Z, Wlodawer A. Progress in protein crystallography. Protein Pept Lett 2016; 23:201-10. [PMID: 26732246 PMCID: PMC6287266 DOI: 10.2174/0929866523666160106153524] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Received: 06/02/2015] [Revised: 10/26/2015] [Accepted: 01/03/2016] [Indexed: 11/22/2022]
Abstract
Macromolecular crystallography has evolved enormously since the pioneering days, when structures were solved by "wizards" performing all the complicated procedures almost by hand. Today, crystal structures of large systems can often be solved very effectively by various powerful automatic programs in days or hours, or even minutes. Such progress is to a large extent coupled to advances in many other fields, such as genetic engineering, computer technology, the availability of synchrotron beam lines and many other techniques, creating the highly interdisciplinary science of macromolecular crystallography. Due to this unprecedented success, crystallography is often treated as one of the analytical methods and practiced by researchers who are interested in structures of macromolecules but not highly competent in the procedures involved in structure determination. One should therefore take into account that contemporary, highly automatic systems can produce results almost without human intervention, but the resulting structures must be carefully checked and validated before their release into the public domain.
Affiliation(s)
- Zbigniew Dauter
- Macromolecular Crystallography Laboratory, National Cancer Institute, Frederick, MD and Argonne, IL, USA.
21
Dufresne Y, Noé L, Leclère V, Pupin M. Smiles2Monomers: a link between chemical and biological structures for polymers. J Cheminform 2015; 7:62. [PMID: 26715946 PMCID: PMC4693424 DOI: 10.1186/s13321-015-0111-5] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 07/31/2015] [Accepted: 12/06/2015] [Indexed: 12/17/2022]
Abstract
Background: The monomeric composition of polymers is a powerful basis for structure comparison and synthetic biology, among other applications. Many databases give access to the atomic structure of compounds, but the monomeric structure of polymers is often lacking. We have designed a smart algorithm, implemented in the tool Smiles2Monomers (s2m), to infer efficiently and accurately the monomeric structure of a polymer from its chemical structure. Results: Our strategy is divided into two steps: first, monomers are mapped onto the atomic structure by an efficient subgraph-isomorphism algorithm; second, the best tiling is computed so that non-overlapping monomers cover all the structure of the target polymer. The mapping is based on a Markovian index built by a dynamic programming algorithm. The index enables s2m to quickly search for all the given monomers on a target polymer. Afterwards, a greedy algorithm combines the mapped monomers into a consistent monomeric structure. Finally, a local branch and cut algorithm refines the structure. We tested this method on two manually annotated databases of polymers and reconstructed the structures de novo with a sensitivity over 90%. The average computation time per polymer is 2 s. Conclusion: s2m automatically creates de novo monomeric annotations for polymers, efficiently in terms of computation time and sensitivity. s2m allowed us to detect annotation errors in the tested databases and to easily find the accurate structures. s2m could therefore be integrated into the curation process of databases of small compounds to verify the current entries and accelerate the annotation of new polymers. The full method can be downloaded or accessed via a website for peptide-like polymers at http://bioinfo.lifl.fr/norine/smiles2monomers.jsp.
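The greedy tiling step described in this abstract can be illustrated with a minimal sketch. It is not the s2m implementation: it assumes the subgraph-isomorphism mapping has already produced a list of candidate matches, and the names `Match` and `greedy_tiling`, as well as the largest-match-first ordering, are hypothetical choices made for illustration. The Markovian index and the branch and cut refinement are not reproduced.

```python
from typing import FrozenSet, List, NamedTuple, Set, Tuple


class Match(NamedTuple):
    """One candidate placement of a monomer on the polymer (hypothetical type)."""
    monomer: str           # monomer label
    atoms: FrozenSet[int]  # indices of the polymer atoms this placement covers


def greedy_tiling(matches: List[Match], n_atoms: int) -> Tuple[List[Match], Set[int]]:
    """Select non-overlapping matches, largest first.

    Returns the selected matches and the set of atoms left uncovered.
    Assumption: preferring larger matches is one plausible greedy criterion;
    the abstract does not state which criterion s2m uses.
    """
    covered: Set[int] = set()
    chosen: List[Match] = []
    # Sort by descending size, then by label and atom list for determinism.
    ordered = sorted(matches, key=lambda m: (-len(m.atoms), m.monomer, sorted(m.atoms)))
    for match in ordered:
        if covered.isdisjoint(match.atoms):  # keep only non-overlapping placements
            chosen.append(match)
            covered |= match.atoms
    uncovered = set(range(n_atoms)) - covered
    return chosen, uncovered


if __name__ == "__main__":
    candidates = [
        Match("M1", frozenset({0, 1, 2})),
        Match("M2", frozenset({2, 3})),
        Match("M3", frozenset({3, 4, 5})),
        Match("M4", frozenset({0, 1})),
    ]
    tiling, leftover = greedy_tiling(candidates, n_atoms=6)
    print([m.monomer for m in tiling], leftover)
```

A greedy pass of this kind can leave atoms uncovered when an early choice blocks a better combination, which is presumably why the abstract mentions a subsequent refinement step.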
Affiliation(s)
- Yoann Dufresne
- Univ. Lille, CNRS, Centrale Lille, UMR 9189-CRIStAL-Centre de Recherche en Informatique Signal et Automatique de Lille, 59000 Lille, France ; Inria Lille Nord Europe, Bonsai team, Parc scientifique de la Haute Borne, 40 avenue Halley, 59650 Villeneuve d'Ascq, France
- Laurent Noé
- Univ. Lille, CNRS, Centrale Lille, UMR 9189-CRIStAL-Centre de Recherche en Informatique Signal et Automatique de Lille, 59000 Lille, France ; Inria Lille Nord Europe, Bonsai team, Parc scientifique de la Haute Borne, 40 avenue Halley, 59650 Villeneuve d'Ascq, France
- Valérie Leclère
- Univ. Lille, CNRS, Centrale Lille, UMR 9189-CRIStAL-Centre de Recherche en Informatique Signal et Automatique de Lille, 59000 Lille, France ; Inria Lille Nord Europe, Bonsai team, Parc scientifique de la Haute Borne, 40 avenue Halley, 59650 Villeneuve d'Ascq, France ; Univ. Lille, INRA, ISA, Univ. Artois, Univ. Littoral Côte d'Opale, EA 7394 - ICV - Institut Charles Viollette, 59000 Lille, France
- Maude Pupin
- Univ. Lille, CNRS, Centrale Lille, UMR 9189-CRIStAL-Centre de Recherche en Informatique Signal et Automatique de Lille, 59000 Lille, France ; Inria Lille Nord Europe, Bonsai team, Parc scientifique de la Haute Borne, 40 avenue Halley, 59650 Villeneuve d'Ascq, France
22
Shabalin I, Dauter Z, Jaskolski M, Minor W, Wlodawer A. Crystallography and chemistry should always go together: a cautionary tale of protein complexes with cisplatin and carboplatin. Acta Crystallogr D Biol Crystallogr 2015; 71:1965-79. [PMID: 26327386 PMCID: PMC4556316 DOI: 10.1107/s139900471500629x] [Citation(s) in RCA: 46] [Impact Index Per Article: 5.1] [Received: 02/26/2015] [Accepted: 03/27/2015] [Indexed: 12/23/2022]
Abstract
The anticancer activity of platinum-containing drugs such as cisplatin and carboplatin is considered to primarily arise from their interactions with nucleic acids; nevertheless, these drugs, or the products of their hydrolysis, also bind to proteins, potentially leading to the known side effects of the treatments. Here, over 40 crystal structures deposited in the Protein Data Bank (PDB) of cisplatin and carboplatin complexes of several proteins were analysed. Significant problems of either a crystallographic or a chemical nature were found in most of the presented atomic models and they could be traced to less or more serious deficiencies in the data-collection and refinement procedures. The re-evaluation of these data and models was possible thanks to their mandatory or voluntary deposition in publicly available databases, emphasizing the point that the availability of such data is critical for making structural science reproducible. Based on this analysis of a selected group of macromolecular structures, the importance of deposition of raw diffraction data is stressed and a procedure for depositing, tracking and using re-refined crystallographic models is suggested.
Affiliation(s)
- Ivan Shabalin
- Department of Molecular Physiology and Biological Physics, University of Virginia, 1340 Jefferson Park Avenue, Charlottesville, VA 22908, USA
- Zbigniew Dauter
- Synchrotron Radiation Research Section, MCL, National Cancer Institute, Argonne National Laboratory, Argonne, IL 60439, USA
- Mariusz Jaskolski
- Department of Crystallography, Faculty of Chemistry, A. Mickiewicz University, Poznan, Poland
- Center for Biocrystallographic Research, Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland
- Wladek Minor
- Department of Molecular Physiology and Biological Physics, University of Virginia, 1340 Jefferson Park Avenue, Charlottesville, VA 22908, USA
- Alexander Wlodawer
- Protein Structure Section, MCL, National Cancer Institute, Frederick, MD 21702, USA
23
Warr WA. Many InChIs and quite some feat. J Comput Aided Mol Des 2015; 29:681-94. [PMID: 26081259 DOI: 10.1007/s10822-015-9854-3] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 04/28/2015] [Accepted: 06/10/2015] [Indexed: 12/14/2022]
Affiliation(s)
- Wendy A Warr
- Wendy Warr & Associates, Holmes Chapel, Crewe, Cheshire, CW4 7HZ, UK