1
|
Identifying protein conformational states in the Protein Data Bank: Toward unlocking the potential of integrative dynamics studies. STRUCTURAL DYNAMICS (MELVILLE, N.Y.) 2024; 11:034701. [PMID: 38774441 PMCID: PMC11106648 DOI: 10.1063/4.0000251] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 03/07/2024] [Accepted: 05/08/2024] [Indexed: 05/24/2024]
Abstract
Studying protein dynamics and conformational heterogeneity is crucial for understanding biomolecular systems and treating disease. Despite the deposition of over 215 000 macromolecular structures in the Protein Data Bank and the advent of AI-based structure prediction tools such as AlphaFold2, RoseTTAFold, and ESMFold, static representations are typically produced, which fail to fully capture macromolecular motion. Here, we discuss the importance of integrating experimental structures with computational clustering to explore the conformational landscapes that manifest protein function. We describe the method developed by the Protein Data Bank in Europe - Knowledge Base to identify distinct conformational states, demonstrate the resource's primary use cases, through examples, and discuss the need for further efforts to annotate protein conformations with functional information. Such initiatives will be crucial in unlocking the potential of protein dynamics data, expediting drug discovery research, and deepening our understanding of macromolecular mechanisms.
Collapse
|
2
|
PDBe CCDUtils: an RDKit-based toolkit for handling and analysing small molecules in the Protein Data Bank. J Cheminform 2023; 15:117. [PMID: 38042830 PMCID: PMC10693035 DOI: 10.1186/s13321-023-00786-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/11/2023] [Accepted: 11/17/2023] [Indexed: 12/04/2023] Open
Abstract
While the Protein Data Bank (PDB) contains a wealth of structural information on ligands bound to macromolecules, their analysis can be challenging due to the large amount and diversity of data. Here, we present PDBe CCDUtils, a versatile toolkit for processing and analysing small molecules from the PDB in PDBx/mmCIF format. PDBe CCDUtils provides streamlined access to all the metadata for small molecules in the PDB and offers a set of convenient methods to compute various properties using RDKit, such as 2D depictions, 3D conformers, physicochemical properties, scaffolds, common fragments, and cross-references to small molecule databases using UniChem. The toolkit also provides methods for identifying all the covalently attached chemical components in a macromolecular structure and calculating similarity among small molecules. By providing a broad range of functionality, PDBe CCDUtils caters to the needs of researchers in cheminformatics, structural biology, bioinformatics and computational chemistry.
Collapse
|
3
|
PDBe and PDBe-KB: Providing high-quality, up-to-date and integrated resources of macromolecular structures to support basic and applied research and education. Protein Sci 2022; 31:e4439. [PMID: 36173162 PMCID: PMC9517934 DOI: 10.1002/pro.4439] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/05/2022] [Revised: 09/02/2022] [Accepted: 09/05/2022] [Indexed: 11/26/2022]
Abstract
The archiving and dissemination of protein and nucleic acid structures as well as their structural, functional and biophysical annotations is an essential task that enables the broader scientific community to conduct impactful research in multiple fields of the life sciences. The Protein Data Bank in Europe (PDBe; pdbe.org) team develops and maintains several databases and web services to address this fundamental need. From data archiving as a member of the Worldwide PDB consortium (wwPDB; wwpdb.org), to the PDBe Knowledge Base (PDBe‐KB; pdbekb.org), we provide data, data‐access mechanisms, and visualizations that facilitate basic and applied research and education across the life sciences. Here, we provide an overview of the structural data and annotations that we integrate and make freely available. We describe the web services and data visualization tools we offer, and provide information on how to effectively use or even further develop them. Finally, we discuss the direction of our data services, and how we aim to tackle new challenges that arise from the recent, unprecedented advances in the field of structure determination and protein structure modeling.
Collapse
|
4
|
PDBe-KB: collaboratively defining the biological context of structural data. Nucleic Acids Res 2022; 50:D534-D542. [PMID: 34755867 PMCID: PMC8728252 DOI: 10.1093/nar/gkab988] [Citation(s) in RCA: 36] [Impact Index Per Article: 18.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/14/2021] [Revised: 10/01/2021] [Accepted: 10/14/2021] [Indexed: 12/15/2022] Open
Abstract
The Protein Data Bank in Europe - Knowledge Base (PDBe-KB, https://pdbe-kb.org) is an open collaboration between world-leading specialist data resources contributing functional and biophysical annotations derived from or relevant to the Protein Data Bank (PDB). The goal of PDBe-KB is to place macromolecular structure data in their biological context by developing standardised data exchange formats and integrating functional annotations from the contributing partner resources into a knowledge graph that can provide valuable biological insights. Since we described PDBe-KB in 2019, there have been significant improvements in the variety of available annotation data sets and user functionality. Here, we provide an overview of the consortium, highlighting the addition of annotations such as predicted covalent binders, phosphorylation sites, effects of mutations on the protein structure and energetic local frustration. In addition, we describe a library of reusable web-based visualisation components and introduce new features such as a bulk download data service and a novel superposition service that generates clusters of superposed protein chains weekly for the whole PDB archive.
Collapse
|
5
|
Prediction of protein assemblies, the next frontier: The CASP14-CAPRI experiment. Proteins 2021; 89:1800-1823. [PMID: 34453465 PMCID: PMC8616814 DOI: 10.1002/prot.26222] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/20/2021] [Revised: 07/24/2021] [Accepted: 08/05/2021] [Indexed: 12/19/2022]
Abstract
We present the results for CAPRI Round 50, the fourth joint CASP-CAPRI protein assembly prediction challenge. The Round comprised a total of twelve targets, including six dimers, three trimers, and three higher-order oligomers. Four of these were easy targets, for which good structural templates were available either for the full assembly, or for the main interfaces (of the higher-order oligomers). Eight were difficult targets for which only distantly related templates were found for the individual subunits. Twenty-five CAPRI groups including eight automatic servers submitted ~1250 models per target. Twenty groups including six servers participated in the CAPRI scoring challenge submitted ~190 models per target. The accuracy of the predicted models was evaluated using the classical CAPRI criteria. The prediction performance was measured by a weighted scoring scheme that takes into account the number of models of acceptable quality or higher submitted by each group as part of their five top-ranking models. Compared to the previous CASP-CAPRI challenge, top performing groups submitted such models for a larger fraction (70-75%) of the targets in this Round, but fewer of these models were of high accuracy. Scorer groups achieved stronger performance with more groups submitting correct models for 70-80% of the targets or achieving high accuracy predictions. Servers performed less well in general, except for the MDOCKPP and LZERD servers, who performed on par with human groups. In addition to these results, major advances in methodology are discussed, providing an informative overview of where the prediction of protein assemblies currently stands.
Collapse
|
6
|
PDBe Aggregated API: Programmatic access to an integrative knowledge graph of molecular structure data. Bioinformatics 2021; 37:3950-3952. [PMID: 34081107 PMCID: PMC8570819 DOI: 10.1093/bioinformatics/btab424] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/24/2021] [Revised: 05/13/2021] [Accepted: 06/02/2021] [Indexed: 12/30/2022] Open
Abstract
SUMMARY The PDBe aggregated API is an open-access and open-source RESTful API that provides programmatic access to a wealth of macromolecular structural data and their functional and biophysical annotations through 80+ API endpoints. The API is powered by the PDBe graph database (https://pdbe.org/graph-schema), an open-access integrative knowledge graph that can be used as a discovery tool to answer complex biological questions. AVAILABILITY AND IMPLEMENTATION The PDBe aggregated API provides up-to-date access to the PDBe graph database, which has weekly releases with the latest data from the Protein Data Bank, integrated with updated annotations from UniProt, Pfam, CATH, SCOP and the PDBe-KB partner resources. The complete list of all the available API endpoints and their descriptions are available at https://pdbe.org/graph-api. The source code of the Python 3.6+ API application is publicly available at https://gitlab.ebi.ac.uk/pdbe-kb/services/pdbe-graph-api. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
|
7
|
|
8
|
Modeling protein‐protein, protein‐peptide, and protein‐oligosaccharide complexes: CAPRI 7th edition. Proteins 2020; 88:916-938. [DOI: 10.1002/prot.25870] [Citation(s) in RCA: 60] [Impact Index Per Article: 15.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/13/2019] [Revised: 12/19/2019] [Accepted: 12/26/2019] [Indexed: 12/19/2022]
|
9
|
PDBe: improved findability of macromolecular structure data in the PDB. Nucleic Acids Res 2020; 48:D335-D343. [PMID: 31691821 PMCID: PMC7145656 DOI: 10.1093/nar/gkz990] [Citation(s) in RCA: 55] [Impact Index Per Article: 13.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/17/2019] [Revised: 10/11/2019] [Accepted: 10/25/2019] [Indexed: 11/23/2022] Open
Abstract
The Protein Data Bank in Europe (PDBe), a founding member of the Worldwide Protein Data Bank (wwPDB), actively participates in the deposition, curation, validation, archiving and dissemination of macromolecular structure data. PDBe supports diverse research communities in their use of macromolecular structures by enriching the PDB data and by providing advanced tools and services for effective data access, visualization and analysis. This paper details the enrichment of data at PDBe, including mapping of RNA structures to Rfam, and identification of molecules that act as cofactors. PDBe has developed an advanced search facility with ∼100 data categories and sequence searches. New features have been included in the LiteMol viewer at PDBe, with updated visualization of carbohydrates and nucleic acids. Small molecules are now mapped more extensively to external databases and their visual representation has been enhanced. These advances help users to more easily find and interpret macromolecular structure data in order to solve scientific problems.
Collapse
|
10
|
Blind prediction of homo- and hetero-protein complexes: The CASP13-CAPRI experiment. Proteins 2019; 87:1200-1221. [PMID: 31612567 PMCID: PMC7274794 DOI: 10.1002/prot.25838] [Citation(s) in RCA: 79] [Impact Index Per Article: 15.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/27/2019] [Revised: 09/26/2019] [Accepted: 09/27/2019] [Indexed: 12/28/2022]
Abstract
We present the results for CAPRI Round 46, the third joint CASP-CAPRI protein assembly prediction challenge. The Round comprised a total of 20 targets including 14 homo-oligomers and 6 heterocomplexes. Eight of the homo-oligomer targets and one heterodimer comprised proteins that could be readily modeled using templates from the Protein Data Bank, often available for the full assembly. The remaining 11 targets comprised 5 homodimers, 3 heterodimers, and two higher-order assemblies. These were more difficult to model, as their prediction mainly involved "ab-initio" docking of subunit models derived from distantly related templates. A total of ~30 CAPRI groups, including 9 automatic servers, submitted on average ~2000 models per target. About 17 groups participated in the CAPRI scoring rounds, offered for most targets, submitting ~170 models per target. The prediction performance, measured by the fraction of models of acceptable quality or higher submitted across all predictors groups, was very good to excellent for the nine easy targets. Poorer performance was achieved by predictors for the 11 difficult targets, with medium and high quality models submitted for only 3 of these targets. A similar performance "gap" was displayed by scorer groups, highlighting yet again the unmet challenge of modeling the conformational changes of the protein components that occur upon binding or that must be accounted for in template-based modeling. Our analysis also indicates that residues in binding interfaces were less well predicted in this set of targets than in previous Rounds, providing useful insights for directions of future improvements.
Collapse
|
11
|
Abstract
Data processing and data management services for structural biology. Enhancements to existing web services for structure solution and analysis. New pipelines to link these services into more complex higher-level workflows. New data management facilities. Making the benefits of European e-Infrastructures more accessible to structural biologists.
The West-Life project (https://about.west-life.eu/) is a Horizon 2020 project funded by the European Commission to provide data processing and data management services for the international community of structural biologists, and in particular to support integrative experimental approaches within the field of structural biology. It has developed enhancements to existing web services for structure solution and analysis, created new pipelines to link these services into more complex higher-level workflows, and added new data management facilities. Through this work it has striven to make the benefits of European e-Infrastructures more accessible to life-science researchers in general and structural biologists in particular.
Collapse
|
12
|
Protein Data Bank: the single global archive for 3D macromolecular structure data. Nucleic Acids Res 2019; 47:D520-D528. [PMID: 30357364 PMCID: PMC6324056 DOI: 10.1093/nar/gky949] [Citation(s) in RCA: 505] [Impact Index Per Article: 101.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/14/2018] [Revised: 09/28/2018] [Accepted: 10/05/2018] [Indexed: 01/10/2023] Open
Abstract
The Protein Data Bank (PDB) is the single global archive of experimentally determined three-dimensional (3D) structure data of biological macromolecules. Since 2003, the PDB has been managed by the Worldwide Protein Data Bank (wwPDB; wwpdb.org), an international consortium that collaboratively oversees deposition, validation, biocuration, and open access dissemination of 3D macromolecular structure data. The PDB Core Archive houses 3D atomic coordinates of more than 144 000 structural models of proteins, DNA/RNA, and their complexes with metals and small molecules and related experimental data and metadata. Structure and experimental data/metadata are also stored in the PDB Core Archive using the readily extensible wwPDB PDBx/mmCIF master data format, which will continue to evolve as data/metadata from new experimental techniques and structure determination methods are incorporated by the wwPDB. Impacts of the recently developed universal wwPDB OneDep deposition/validation/biocuration system and various methods-specific wwPDB Validation Task Forces on improving the quality of structures and data housed in the PDB Core Archive are described together with current challenges and future plans.
Collapse
|
13
|
PDBe: towards reusable data delivery infrastructure at protein data bank in Europe. Nucleic Acids Res 2018; 46:D486-D492. [PMID: 29126160 PMCID: PMC5753225 DOI: 10.1093/nar/gkx1070] [Citation(s) in RCA: 69] [Impact Index Per Article: 11.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/22/2017] [Revised: 10/14/2017] [Accepted: 10/26/2017] [Indexed: 01/12/2023] Open
Abstract
The Protein Data Bank in Europe (PDBe, pdbe.org) is actively engaged in the deposition, annotation, remediation, enrichment and dissemination of macromolecular structure data. This paper describes new developments and improvements at PDBe addressing three challenging areas: data enrichment, data dissemination and functional reusability. New features of the PDBe Web site are discussed, including a context dependent menu providing links to raw experimental data and improved presentation of structures solved by hybrid methods. The paper also summarizes the features of the LiteMol suite, which is a set of services enabling fast and interactive 3D visualization of structures, with associated experimental maps, annotations and quality assessment information. We introduce a library of Web components which can be easily reused to port data and functionality available at PDBe to other services. We also introduce updates to the SIFTS resource which maps PDB data to other bioinformatics resources, and the PDBe REST API.
Collapse
|
14
|
PDBe: improved accessibility of macromolecular structure data from PDB and EMDB. Nucleic Acids Res 2016; 44:D385-95. [PMID: 26476444 PMCID: PMC4702783 DOI: 10.1093/nar/gkv1047] [Citation(s) in RCA: 103] [Impact Index Per Article: 12.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/22/2015] [Accepted: 10/01/2015] [Indexed: 12/30/2022] Open
Abstract
The Protein Data Bank in Europe (http://pdbe.org) accepts and annotates depositions of macromolecular structure data in the PDB and EMDB archives and enriches, integrates and disseminates structural information in a variety of ways. The PDBe website has been redesigned based on an analysis of user requirements, and now offers intuitive access to improved and value-added macromolecular structure information. Unique value-added information includes lists of reviews and research articles that cite or mention PDB entries as well as access to figures and legends from full-text open-access publications that describe PDB entries. A powerful new query system not only shows all the PDB entries that match a given query, but also shows the 'best structures' for a given macromolecule, ligand complex or sequence family using data-quality information from the wwPDB validation reports. A PDBe RESTful API has been developed to provide unified access to macromolecular structure data available in the PDB and EMDB archives as well as value-added annotations, e.g. regarding structure quality and up-to-date cross-reference information from the SIFTS resource. Taken together, these new developments facilitate unified access to macromolecular structure data in an intuitive way for non-expert users and support expert users in analysing macromolecular structure data.
Collapse
|
15
|
COGNAC: a web server for searching and annotating hydrogen-bonded base interactions in RNA three-dimensional structures. Nucleic Acids Res 2014; 42:W382-8. [PMID: 24831543 PMCID: PMC4086061 DOI: 10.1093/nar/gku438] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/17/2022] Open
Abstract
Hydrogen bonds are crucial factors that stabilize a complex ribonucleic acid (RNA) molecule's three-dimensional (3D) structure. Minute conformational changes can result in variations in the hydrogen bond interactions in a particular structure. Furthermore, networks of hydrogen bonds, especially those found in tight clusters, may be important elements in structure stabilization or function and can therefore be regarded as potential tertiary motifs. In this paper, we describe a graph theoretical algorithm implemented as a web server that is able to search for unbroken networks of hydrogen-bonded base interactions and thus provide an accounting of such interactions in RNA 3D structures. This server, COGNAC (COnnection tables Graphs for Nucleic ACids), is also able to compare the hydrogen bond networks between two structures and from such annotations enable the mapping of atomic level differences that may have resulted from conformational changes due to mutations or binding events. The COGNAC server can be accessed at http://mfrlab.org/grafss/cognac.
Collapse
|
16
|
IMAAAGINE: a webserver for searching hypothetical 3D amino acid side chain arrangements in the Protein Data Bank. Nucleic Acids Res 2013; 41:W432-40. [PMID: 23716645 PMCID: PMC3692123 DOI: 10.1093/nar/gkt431] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/27/2022] Open
Abstract
We describe a server that allows the interrogation of the Protein Data Bank for hypothetical 3D side chain patterns that are not limited to known patterns from existing 3D structures. A minimal side chain description allows a variety of side chain orientations to exist within the pattern, and generic side chain types such as acid, base and hydroxyl-containing can be additionally deployed in the search query. Moreover, only a subset of distances between the side chains need be specified. We illustrate these capabilities in case studies involving arginine stacks, serine-acid group arrangements and multiple catalytic triad-like configurations. The IMAAAGINE server can be accessed at http://mfrlab.org/grafss/imaaagine/.
Collapse
|
17
|
Proteins of unknown function in the Protein Data Bank (PDB): an inventory of true uncharacterized proteins and computational tools for their analysis. Int J Mol Sci 2012. [PMID: 23202924 PMCID: PMC3497298 DOI: 10.3390/ijms131012761] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/02/2022] Open
Abstract
Proteins of uncharacterized functions form a large part of many of the currently available biological databases and this situation exists even in the Protein Data Bank (PDB). Our analysis of recent PDB data revealed that only 42.53% of PDB entries (1084 coordinate files) that were categorized under “unknown function” are true examples of proteins of unknown function at this point in time. The remainder 1465 entries also annotated as such appear to be able to have their annotations re-assessed, based on the availability of direct functional characterization experiments for the protein itself, or for homologous sequences or structures thus enabling computational function inference.
Collapse
|
18
|
SPRITE and ASSAM: web servers for side chain 3D-motif searching in protein structures. Nucleic Acids Res 2012; 40:W380-6. [PMID: 22573174 PMCID: PMC3394286 DOI: 10.1093/nar/gks401] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/28/2023] Open
Abstract
Similarities in the 3D patterns of amino acid side chains can provide insights into their function despite the absence of any detectable sequence or fold similarities. Search for protein sites (SPRITE) and amino acid pattern search for substructures and motifs (ASSAM) are graph theoretical programs that can search for 3D amino side chain matches in protein structures, by representing the amino acid side chains as pseudo-atoms. The geometric relationship of the pseudo-atoms to each other as a pattern can be represented as a labeled graph where the pseudo-atoms are the graph's nodes while the edges are the inter-pseudo-atomic distances. Both programs require the input file to be in the PDB format. The objective of using SPRITE is to identify matches of side chains in a query structure to patterns with characterized function. In contrast, a 3D pattern of interest can be searched for existing occurrences in available PDB structures using ASSAM. Both programs are freely accessible without any login requirement. SPRITE is available at http://mfrlab.org/grafss/sprite/ while ASSAM can be accessed at http://mfrlab.org/grafss/assam/.
Collapse
|