1
Segura J, Sanchez-Garcia R, Bittrich S, Rose Y, Burley SK, Duarte JM. Multi-scale structural similarity embedding search across entire proteomes. bioRxiv 2025:2025.02.28.640875. [PMID: 40093062 PMCID: PMC11908163 DOI: 10.1101/2025.02.28.640875] [Indexed: 03/19/2025]
Abstract
The rapid expansion of three-dimensional (3D) biomolecular structure information, driven by breakthroughs in artificial intelligence/deep learning (AI/DL)-based structure predictions, has created an urgent need for scalable and efficient structure similarity search methods. Traditional alignment-based approaches, such as structural superposition tools, are computationally expensive and challenging to scale with the vast number of available macromolecular structures. Herein, we present a scalable structure similarity search strategy designed to navigate extensive repositories of experimentally determined structures and computed structure models predicted using AI/DL methods. Our approach leverages protein language models and a deep neural network architecture to transform 3D structures into fixed-length vectors, enabling efficient large-scale comparisons. Although trained to predict TM-scores between single-domain structures, our model generalizes beyond the domain level, accurately identifying 3D similarity for full-length polypeptide chains and multimeric assemblies. By integrating vector databases, our method facilitates efficient large-scale structure retrieval, addressing the growing challenges posed by the expanding volume of 3D biostructure information.
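The retrieval idea in this abstract, comparing fixed-length structure vectors instead of superposing coordinates, can be illustrated with a minimal sketch. Everything here is an assumption for illustration only: the three-dimensional toy vectors, the names `cosine_similarity` and `top_k`, and the use of cosine similarity with a brute-force scan are not taken from the paper, which does not state its metric in the abstract and relies on a vector database rather than a linear scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, database, k=2):
    """Return the k (name, score) pairs most similar to the query, best first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in database.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical embeddings; real structure embeddings would have far more dimensions.
database = {
    "structure_A": [1.0, 0.0, 0.0],
    "structure_B": [0.9, 0.1, 0.0],
    "structure_C": [0.0, 1.0, 0.0],
}
hits = top_k([1.0, 0.0, 0.0], database, k=2)
for name, score in hits:
    print(name, round(score, 4))
```

A linear scan like this costs one comparison per stored vector, which is why large collections are usually served from an approximate nearest-neighbour index instead.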
Affiliation(s)
- Joan Segura
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California San Diego, La Jolla, CA 92093, USA
- Ruben Sanchez-Garcia
- School of Science and Technology, IE University, Paseo de la Castellana 259, 28046 Madrid, Spain
- Sebastian Bittrich
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California San Diego, La Jolla, CA 92093, USA
- Yana Rose
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California San Diego, La Jolla, CA 92093, USA
- Stephen K Burley
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California San Diego, La Jolla, CA 92093, USA
- Research Collaboratory for Structural Bioinformatics Protein Data Bank and the Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Rutgers Cancer Institute, Rutgers, The State University of New Jersey, New Brunswick, NJ 08901, USA
- Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Rutgers Artificial Intelligence and Data Science (RAD) Collaboratory, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Jose M Duarte
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California San Diego, La Jolla, CA 92093, USA
2
Kim W, Mirdita M, Levy Karin E, Gilchrist CLM, Schweke H, Söding J, Levy ED, Steinegger M. Rapid and sensitive protein complex alignment with Foldseek-Multimer. Nat Methods 2025; 22:469-472. [PMID: 39910251 PMCID: PMC11903335 DOI: 10.1038/s41592-025-02593-7] [Received: 05/13/2024] [Accepted: 12/20/2024] [Indexed: 02/07/2025]
Abstract
Advances in computational structure prediction will vastly augment the hundreds of thousands of currently available protein complex structures. Translating these into discoveries requires aligning them, which is computationally prohibitive. Foldseek-Multimer computes complex alignments from compatible chain-to-chain alignments, identified by efficiently clustering their superposition vectors. Foldseek-Multimer is 3-4 orders of magnitude faster than the gold standard while producing comparable alignments; this allows it to compare billions of complex pairs in 11 h. Foldseek-Multimer is open-source software available via https://github.com/steineggerlab/foldseek/, https://search.foldseek.com/search/ and the BFMD database.
Affiliation(s)
- Woosub Kim
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
- School of Biological Sciences, Seoul National University, Seoul, Republic of Korea
- Milot Mirdita
- School of Biological Sciences, Seoul National University, Seoul, Republic of Korea
- Hugo Schweke
- Department of Chemical and Structural Biology, Weizmann Institute of Science, Rehovot, Israel
- Department of Molecular and Cellular Biology, University of Geneva, Geneva, Switzerland
- Johannes Söding
- Quantitative and Computational Biology, Max-Planck Institute for Multidisciplinary Sciences, Göttingen, Germany
- Campus Institute Data Science (CIDAS), University of Göttingen, Göttingen, Germany
- Emmanuel D Levy
- Department of Chemical and Structural Biology, Weizmann Institute of Science, Rehovot, Israel
- Department of Molecular and Cellular Biology, University of Geneva, Geneva, Switzerland
- Martin Steinegger
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
- School of Biological Sciences, Seoul National University, Seoul, Republic of Korea
- Institute of Molecular Biology and Genetics, Seoul National University, Seoul, Republic of Korea
- Artificial Intelligence Institute, Seoul National University, Seoul, Republic of Korea
3
Burley S, Bhatt R, Bhikadiya C, Bi C, Biester A, Biswas P, Bittrich S, Blaumann S, Brown R, Chao H, Chithari VR, Craig P, Crichlow G, Duarte J, Dutta S, Feng Z, Flatt J, Ghosh S, Goodsell D, Green RK, Guranovic V, Henry J, Hudson B, Joy M, Kaelber J, Khokhriakov I, Lai JS, Lawson C, Liang Y, Myers-Turnbull D, Peisach E, Persikova I, Piehl D, Pingale A, Rose Y, Sagendorf J, Sali A, Segura J, Sekharan M, Shao C, Smith J, Trumbull M, Vallat B, Voigt M, Webb B, Whetstone S, Wu-Wu A, Xing T, Young J, Zalevsky A, Zardecki C. Updated resources for exploring experimentally-determined PDB structures and Computed Structure Models at the RCSB Protein Data Bank. Nucleic Acids Res 2025; 53:D564-D574. [PMID: 39607707 PMCID: PMC11701563 DOI: 10.1093/nar/gkae1091] [Received: 08/05/2024] [Revised: 10/17/2024] [Accepted: 10/28/2024] [Indexed: 11/29/2024]
Abstract
The Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB, RCSB.org), the US Worldwide Protein Data Bank (wwPDB, wwPDB.org) data center for the global PDB archive, provides access to the PDB data via its RCSB.org research-focused web portal. We report substantial additions to the tools and visualization features available at RCSB.org, which now delivers more than 227000 experimentally determined atomic-level three-dimensional (3D) biostructures stored in the global PDB archive alongside more than 1 million Computed Structure Models (CSMs) of proteins (including models for human, model organisms, select human pathogens, crop plants and organisms important for addressing climate change). In addition to providing support for 3D structure motif searches with user-provided coordinates, new features highlighted herein include query results organized by redundancy-reduced Groups and summary pages that facilitate exploration of groups of similar proteins. Newly released programmatic tools are also described, as are enhanced training opportunities.
Affiliation(s)
- Stephen K Burley
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Rutgers Cancer Institute, New Brunswick, NJ 08901, USA
- Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, La Jolla, CA 92093, USA
- Rutgers Artificial Intelligence and Data Science (RAD) Collaboratory, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Rusham Bhatt
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Charmi Bhikadiya
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, La Jolla, CA 92093, USA
- Chunxiao Bi
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, La Jolla, CA 92093, USA
- Alison Biester
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Pratyoy Biswas
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Sebastian Bittrich
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, La Jolla, CA 92093, USA
- Santiago Blaumann
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Ronald Brown
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Henry Chao
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Vivek Reddy Chithari
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Paul A Craig
- School of Chemistry and Materials Science, Rochester Institute of Technology, Rochester, NY 14623, USA
- Gregg V Crichlow
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Jose M Duarte
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, La Jolla, CA 92093, USA
- Shuchismita Dutta
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Rutgers Cancer Institute, New Brunswick, NJ 08901, USA
- Zukang Feng
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Justin W Flatt
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Sutapa Ghosh
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- David S Goodsell
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Rutgers Cancer Institute, New Brunswick, NJ 08901, USA
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, USA
- Rachel Kramer Green
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Vladimir Guranovic
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Jeremy Henry
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, La Jolla, CA 92093, USA
- Brian P Hudson
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Michael Joy
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Jason T Kaelber
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Igor Khokhriakov
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, La Jolla, CA 92093, USA
- Jhih-Siang Lai
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, La Jolla, CA 92093, USA
- Catherine L Lawson
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Yuhe Liang
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Douglas Myers-Turnbull
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, La Jolla, CA 92093, USA
- Ezra Peisach
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Irina Persikova
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Dennis W Piehl
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Aditya Pingale
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Yana Rose
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, La Jolla, CA 92093, USA
- Jared Sagendorf
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Department of Bioengineering and Therapeutic Sciences, Department of Pharmaceutical Chemistry, Quantitative Biosciences Institute, University of California, San Francisco, CA 94158, USA
- Andrej Sali
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Department of Bioengineering and Therapeutic Sciences, Department of Pharmaceutical Chemistry, Quantitative Biosciences Institute, University of California, San Francisco, CA 94158, USA
- Joan Segura
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, La Jolla, CA 92093, USA
- Monica Sekharan
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Chenghua Shao
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- James Smith
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Michael Trumbull
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Brinda Vallat
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Maria Voigt
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Ben Webb
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Department of Bioengineering and Therapeutic Sciences, Department of Pharmaceutical Chemistry, Quantitative Biosciences Institute, University of California, San Francisco, CA 94158, USA
- Shamara Whetstone
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Amy Wu-Wu
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Tongji Xing
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Jasmine Y Young
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Arthur Zalevsky
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Department of Bioengineering and Therapeutic Sciences, Department of Pharmaceutical Chemistry, Quantitative Biosciences Institute, University of California, San Francisco, CA 94158, USA
- Christine Zardecki
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
4
Lushchekina S, Weiner L, Ashani Y, Emrizal R, Firdaus‐Raih M, Silman I, Sussman JL. Why is binding of a divalent metal cation to a structural motif containing four carboxylate residues not accompanied by a conformational change? Protein Sci 2024; 33:e5206. [PMID: 39548604 PMCID: PMC11567836 DOI: 10.1002/pro.5206] [Received: 06/27/2024] [Revised: 10/16/2024] [Accepted: 10/17/2024] [Indexed: 11/18/2024]
Abstract
We earlier showed that Torpedo californica acetylcholinesterase (AChE) contains a cluster of four conserved aspartates that can strongly bind divalent cations, which we named the 4D motif. Binding of the divalent metal cations greatly increases its thermal stability. Here we systematically examined all available crystallographic structures of T. californica AChE. Two additional metal-binding sites were identified, both composed of acidic and histidine residues. Relative binding to the 4D and additional sites was studied using metadynamics simulations. It was observed that in crystal structures devoid of metal ions in the 4D site, the conformation of T. californica AChE is almost identical to that in structures in which it is occupied by a divalent metal ion. Closer examination of the 4D motif reveals that three of the four acidic residues form ion pairs with conserved basic residues surrounding them. We named this new motif the 4A/3B motif. Molecular dynamics with quantum potential simulations was used to quantify the 4D motif's binding strength compared with that of the metal-binding site in the protein fXIIIa, which consists of four aspartates, but is devoid of adjacent cationic residues. Whereas fXIIIa's 4D site, in the absence of a metal cation, expanded significantly in the simulation, that of Torpedo AChE displayed only minor periodic changes in size. Furthermore, the energy of metal ion unbinding from the two sites differs by ca. 10 kcal/mol. We identified several other proteins in the PDB that contain the 4A/3B motif, whose conformations are identical in the presence or absence of a metal ion. An animated Interactive 3D Complement (I3DC) is available in Proteopedia at https://proteopedia.org/w/Journal:Protein_Science:4.
Affiliation(s)
- Sofya Lushchekina
- Department of Biomolecular Sciences, Weizmann Institute of Science, Rehovot, Israel
- Lev Weiner
- Department of Brain Sciences, Weizmann Institute of Science, Rehovot, Israel
- Department of Chemical Research Support, Weizmann Institute of Science, Rehovot, Israel
- Yacov Ashani
- Department of Biomolecular Sciences, Weizmann Institute of Science, Rehovot, Israel
- Reeki Emrizal
- Department of Applied Physics, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, Bangi, Malaysia
- Mohd Firdaus‐Raih
- Department of Applied Physics, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, Bangi, Malaysia
- Institute of Systems Biology, Universiti Kebangsaan Malaysia, Bangi, Malaysia
- Israel Silman
- Department of Brain Sciences, Weizmann Institute of Science, Rehovot, Israel
- Joel L. Sussman
- Department of Chemical and Structural Biology, Weizmann Institute of Science, Rehovot, Israel
- Structural Proteomics Unit, Life Sciences Core Facilities, Weizmann Institute of Science, Rehovot, Israel
5
Sirugue L, Langenfeld F, Lagarde N, Montes M. PLO3S: Protein LOcal Surficial Similarity Screening. Comput Struct Biotechnol J 2024; 26:1-10. [PMID: 38189058 PMCID: PMC10770625 DOI: 10.1016/j.csbj.2023.12.002] [Received: 11/14/2022] [Revised: 12/01/2023] [Accepted: 12/03/2023] [Indexed: 01/09/2024]
Abstract
Studying protein molecular surfaces makes it possible to better understand and predict protein interactions. Different methods developed in computer vision to compare surfaces can be applied to protein molecular surfaces. The present work proposes a method using the Wave Kernel Signature: Protein LOcal Surficial Similarity Screening (PLO3S). The descriptor of the PLO3S method is a local surface shape descriptor projected on a unit sphere, mapped onto a 2D plane, and called Surface Wave Interpolated Maps (SWIM). PLO3S rapidly compares protein surface shapes through local comparisons, so that large protein surface datasets can be filtered in virtual screening protocols for protein structures.
Affiliation(s)
- Léa Sirugue
- Laboratoire GBCM, EA7528, Conservatoire National des Arts et Métiers, Hesam Université, 2, rue Conté, Paris, 75003, France
- Florent Langenfeld
- Laboratoire GBCM, EA7528, Conservatoire National des Arts et Métiers, Hesam Université, 2, rue Conté, Paris, 75003, France
- Nathalie Lagarde
- Laboratoire GBCM, EA7528, Conservatoire National des Arts et Métiers, Hesam Université, 2, rue Conté, Paris, 75003, France
- Matthieu Montes
- Laboratoire GBCM, EA7528, Conservatoire National des Arts et Métiers, Hesam Université, 2, rue Conté, Paris, 75003, France
6
Lai JS, Burley SK, Duarte JM. ZMPY3D: accelerating protein structure volume analysis through vectorized 3D Zernike moments and Python-based GPU integration. Bioinformatics Advances 2024; 4:vbae111. [PMID: 39100546 PMCID: PMC11297494 DOI: 10.1093/bioadv/vbae111] [Received: 05/14/2024] [Revised: 07/12/2024] [Accepted: 07/25/2024] [Indexed: 08/06/2024]
Abstract
Motivation: Volumetric 3D object analyses are being applied in research fields such as structural bioinformatics, biophysics, and structural biology, with potential integration of artificial intelligence/machine learning (AI/ML) techniques. One such method, 3D Zernike moments, has proven valuable in analyzing protein structures (e.g., protein fold classification, protein-protein interaction analysis, and molecular dynamics simulations). Their compactness and efficiency make them amenable to large-scale analyses. Established methods for deriving 3D Zernike moments, however, can be inefficient, particularly when higher order terms are required, hindering broader applications. As the volume of experimental and computationally-predicted protein structure information continues to increase, structural biology has become a "big data" science requiring more efficient analysis tools.
Results: This application note presents a Python-based software package, ZMPY3D, to accelerate computation of 3D Zernike moments by vectorizing the mathematical formulae and using graphical processing units (GPUs). The package offers popular GPU-supported libraries such as CuPy and TensorFlow together with NumPy implementations, aiming to improve computational efficiency, adaptability, and flexibility in future algorithm development. The ZMPY3D package can be installed via PyPI, and the source code is available from GitHub. Volumetric-based protein 3D structural similarity scores and transform matrix of superposition functionalities have both been implemented, creating a powerful computational tool that will allow the research community to amalgamate 3D Zernike moments with existing AI/ML tools, to advance research and education in protein structure bioinformatics.
Availability and implementation: ZMPY3D, implemented in Python, is available on GitHub (https://github.com/tawssie/ZMPY3D) and PyPI, released under the GPL License.
Affiliation(s)
- Jhih-Siang Lai
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, La Jolla, CA 92093, United States
- Stephen K Burley
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, La Jolla, CA 92093, United States
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, United States
- Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, United States
- Cancer Institute of New Jersey, Rutgers, The State University of New Jersey, New Brunswick, NJ 08901, United States
- Jose M Duarte
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, La Jolla, CA 92093, United States
7
Barco RA, Merino N, Lam B, Budnik B, Kaplan M, Wu F, Amend JP, Nealson KH, Emerson D. Comparative proteomics of a versatile, marine, iron-oxidizing chemolithoautotroph. Environ Microbiol 2024; 26:e16632. [PMID: 38861374 DOI: 10.1111/1462-2920.16632] [Received: 01/08/2024] [Accepted: 04/20/2024] [Indexed: 06/13/2024]
Abstract
This study conducted a comparative proteomic analysis to identify potential genetic markers for the biological function of chemolithoautotrophic iron oxidation in the marine bacterium Ghiorsea bivora. To date, this is the only characterized species in the class Zetaproteobacteria that is not an obligate iron-oxidizer, providing a unique opportunity to investigate differential protein expression to identify key genes involved in iron oxidation at circumneutral pH. Over 1000 proteins were identified under both iron- and hydrogen-oxidizing conditions, with differentially expressed proteins found in both treatments. Notably, a gene cluster upregulated during iron oxidation was identified. This cluster contains genes encoding cytochromes that share sequence similarity with the known iron oxidase, Cyc2. Interestingly, these cytochromes, conserved in both Bacteria and Archaea, do not exhibit the typical β-barrel structure of Cyc2. This cluster potentially encodes a biological nanowire-like transmembrane complex containing multiple redox proteins spanning the inner membrane, periplasm, outer membrane, and extracellular space. The upregulation of key genes associated with this complex during iron-oxidizing conditions was confirmed by quantitative reverse transcription-PCR. These findings were further supported by electromicrobiological methods, which demonstrated negative current production by G. bivora in a three-electrode system poised at a cathodic potential. This research provides significant insights into the biological function of chemolithoautotrophic iron oxidation.
Affiliation(s)
- Roman A Barco
- Department of Earth Sciences, University of Southern California, Los Angeles, California, USA
- Department of Biological Sciences, University of Southern California, Los Angeles, California, USA
- Bigelow Laboratory for Ocean Sciences, East Boothbay, Maine, USA
| | - N Merino
- Department of Earth Sciences, University of Southern California, Los Angeles, California, USA
- Earth-Life Science Institute, Tokyo Institute of Technology, Tokyo, Japan
- Lawrence Livermore National Lab, Biosciences and Biotechnology Division, Livermore, California, USA
| | - B Lam
- Department of Biological Sciences, University of Southern California, Los Angeles, California, USA
| | - B Budnik
- Mass Spectrometry and Proteomics Resource Laboratory, Harvard University, Cambridge, Massachusetts, USA
| | - M Kaplan
- Department of Microbiology, University of Chicago, Chicago, Illinois, USA
| | - F Wu
- ZJU-Hangzhou Global Scientific and Technological Innovation Center, Hangzhou, Zhejiang, China
| | - J P Amend
- Department of Earth Sciences, University of Southern California, Los Angeles, California, USA
- Department of Biological Sciences, University of Southern California, Los Angeles, California, USA
| | - K H Nealson
- Department of Earth Sciences, University of Southern California, Los Angeles, California, USA
- Department of Biological Sciences, University of Southern California, Los Angeles, California, USA
| | - D Emerson
- Bigelow Laboratory for Ocean Sciences, East Boothbay, Maine, USA
| |
|
8
|
Qi J, Feng C, Shi Y, Yang J, Zhang F, Li G, Han R. FP-Zernike: An Open-source Structural Database Construction Toolkit for Fast Structure Retrieval. GENOMICS, PROTEOMICS & BIOINFORMATICS 2024; 22:qzae007. [PMID: 38894604 PMCID: PMC11423855 DOI: 10.1093/gpbjnl/qzae007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2022] [Revised: 08/16/2023] [Accepted: 09/20/2023] [Indexed: 06/21/2024]
Abstract
The release of AlphaFold2 has sparked a rapid expansion in protein model databases. Efficient protein structure retrieval is crucial for the analysis of structure models, while measuring the similarity between structures is the key challenge in structural retrieval. Although existing structure alignment algorithms can address this challenge, they are often time-consuming. Currently, the state-of-the-art approach involves converting protein structures into three-dimensional (3D) Zernike descriptors and assessing similarity using Euclidean distance. However, the methods for computing 3D Zernike descriptors mainly rely on structural surfaces and are predominantly web-based, thus limiting their application in studying custom datasets. To overcome this limitation, we developed FP-Zernike, a user-friendly toolkit for computing different types of Zernike descriptors based on feature points. Users simply need to enter a single line of command to calculate the Zernike descriptors of all structures in customized datasets. FP-Zernike outperforms the leading method in terms of retrieval accuracy and binary classification accuracy across diverse benchmark datasets. In addition, we showed the application of FP-Zernike in the construction of the descriptor database and the protocol used for the Protein Data Bank (PDB) dataset to facilitate the local deployment of this tool for interested readers. Our demonstration contained 590,685 structures, and at this scale, our system required only 4-9 s to complete a retrieval. The experiments confirmed that it achieved the state-of-the-art accuracy level. FP-Zernike is an open-source toolkit, with the source code and related data accessible at https://ngdc.cncb.ac.cn/biocode/tools/BT007365/releases/0.1, as well as through a webserver at http://www.structbioinfo.cn/.
Affiliation(s)
- Junhai Qi
- Research Center for Mathematics and Interdisciplinary Sciences, Shandong University, Qingdao 266237, China
- BioMap Research, Menlo Park, CA 94025, USA
| | - Chenjie Feng
- Research Center for Mathematics and Interdisciplinary Sciences, Shandong University, Qingdao 266237, China
- College of Medical Information and Engineering, Ningxia Medical University, Yinchuan 750004, China
| | - Yulin Shi
- Research Center for Mathematics and Interdisciplinary Sciences, Shandong University, Qingdao 266237, China
| | - Jianyi Yang
- Research Center for Mathematics and Interdisciplinary Sciences, Shandong University, Qingdao 266237, China
| | - Fa Zhang
- Institute of Engineering Medicine, Beijing Institute of Technology, Beijing 100081, China
| | - Guojun Li
- Research Center for Mathematics and Interdisciplinary Sciences, Shandong University, Qingdao 266237, China
| | - Renmin Han
- Research Center for Mathematics and Interdisciplinary Sciences, Shandong University, Qingdao 266237, China
| |
|
9
|
Burley SK, Piehl DW, Vallat B, Zardecki C. RCSB Protein Data Bank: supporting research and education worldwide through explorations of experimentally determined and computationally predicted atomic level 3D biostructures. IUCRJ 2024; 11:279-286. [PMID: 38597878 PMCID: PMC11067742 DOI: 10.1107/s2052252524002604] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2023] [Accepted: 03/19/2024] [Indexed: 04/11/2024]
Abstract
The Protein Data Bank (PDB) was established as the first open-access digital data resource in biology and medicine in 1971 with seven X-ray crystal structures of proteins. Today, the PDB houses >210 000 experimentally determined, atomic level, 3D structures of proteins and nucleic acids as well as their complexes with one another and small molecules (e.g. approved drugs, enzyme cofactors). These data provide insights into fundamental biology, biomedicine, bioenergy and biotechnology. They proved particularly important for understanding the SARS-CoV-2 global pandemic. The US-funded Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB) and other members of the Worldwide Protein Data Bank (wwPDB) partnership jointly manage the PDB archive and support >60 000 'data depositors' (structural biologists) around the world. wwPDB ensures the quality and integrity of the data in the ever-expanding PDB archive and supports global open access without limitations on data usage. The RCSB PDB research-focused web portal at https://www.rcsb.org/ (RCSB.org) supports millions of users worldwide, representing a broad range of expertise and interests. In addition to retrieving 3D structure data, PDB 'data consumers' access comparative data and external annotations, such as information about disease-causing point mutations and genetic variations. RCSB.org also provides access to >1 000 000 computed structure models (CSMs) generated using artificial intelligence/machine-learning methods. To avoid doubt, the provenance and reliability of experimentally determined PDB structures and CSMs are identified. Related training materials are available to support users in their RCSB.org explorations.
Affiliation(s)
- Stephen K. Burley
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Research Collaboratory for Structural Biology Protein Data Bank, San Diego Supercomputer Center, University of California, San Diego, La Jolla, CA 92093, USA
- Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Rutgers Cancer Institute of New Jersey, New Brunswick, NJ 08901, USA
| | - Dennis W. Piehl
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Brinda Vallat
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Rutgers Cancer Institute of New Jersey, New Brunswick, NJ 08901, USA
| | - Christine Zardecki
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| |
|
10
|
Greener JG, Jamali K. Fast protein structure searching using structure graph embeddings. BIOINFORMATICS ADVANCES 2024; 5:vbaf042. [PMID: 40196750 PMCID: PMC11974391 DOI: 10.1093/bioadv/vbaf042] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/07/2025] [Revised: 02/11/2025] [Accepted: 03/03/2025] [Indexed: 04/09/2025]
Abstract
Comparing and searching protein structures independent of primary sequence has proved useful for remote homology detection, function annotation, and protein classification. Fast and accurate methods to search with structures will be essential to make use of the vast databases that have recently become available, in the same way that fast protein sequence searching underpins much of bioinformatics. We train a simple graph neural network using supervised contrastive learning to learn a low-dimensional embedding of protein domains. Availability and implementation: The method, called Progres, is available as software at https://github.com/greener-group/progres and as a web server at https://progres.mrc-lmb.cam.ac.uk. It has accuracy comparable to the best current methods and can search the AlphaFold database TED domains in a 10th of a second per query on CPU.
Affiliation(s)
- Joe G Greener
- Medical Research Council Laboratory of Molecular Biology, Cambridge, CB2 0QH, United Kingdom
| | - Kiarash Jamali
- Medical Research Council Laboratory of Molecular Biology, Cambridge, CB2 0QH, United Kingdom
| |
|
11
|
Manalastas-Cantos K, Adoni KR, Pfeifer M, Märtens B, Grünewald K, Thalassinos K, Topf M. Modeling Flexible Protein Structure With AlphaFold2 and Crosslinking Mass Spectrometry. Mol Cell Proteomics 2024; 23:100724. [PMID: 38266916 PMCID: PMC10884514 DOI: 10.1016/j.mcpro.2024.100724] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 09/13/2023] [Revised: 12/23/2023] [Accepted: 12/27/2023] [Indexed: 01/26/2024]
Abstract
We propose a pipeline that combines AlphaFold2 (AF2) and crosslinking mass spectrometry (XL-MS) to model the structure of proteins with multiple conformations. The pipeline consists of two main steps: ensemble generation using AF2 and conformer selection using XL-MS data. For conformer selection, we developed two scores, the monolink probability score (MP) and the crosslink probability score (XLP), both of which are based on residue depth from the protein surface. We benchmarked MP and XLP on a large dataset of decoy protein structures and showed that our scores outperform previously developed scores. We then tested our methodology on three proteins having an open and closed conformation in the Protein Data Bank: Complement component 3 (C3), luciferase, and glutamine-binding periplasmic protein, first generating ensembles using AF2, which were then screened for the open and closed conformations using experimental XL-MS data. In five out of six cases, the most accurate model within the AF2 ensembles (or a conformation within 1 Å of this model) was identified using crosslinks, as assessed through the XLP score. In the remaining case, only the monolinks (assessed through the MP score) successfully identified the open conformation of glutamine-binding periplasmic protein, and these results were further improved by including the "occupancy" of the monolinks. This serves as a compelling proof-of-concept for the effectiveness of monolinks. In contrast, the AF2 assessment score was only able to identify the most accurate conformation in two out of six cases. Our results highlight the complementarity of AF2 with experimental methods like XL-MS, with the MP and XLP scores providing reliable metrics to assess the quality of the predicted models. The MP and XLP scoring functions mentioned above are available at https://gitlab.com/topf-lab/xlms-tools.
Affiliation(s)
- Karen Manalastas-Cantos
- Center for Data and Computing in Natural Sciences, Universität Hamburg, Hamburg, Germany; Department of Integrative Virology, Leibniz-Institut für Virologie (LIV), Centre for Structural Systems Biology (CSSB), Hamburg, Germany
| | - Kish R Adoni
- Institute of Structural and Molecular Biology, Division of Biosciences, University College London, London, UK; Institute of Structural and Molecular Biology, Birkbeck College, University of London, London, United Kingdom
| | - Matthias Pfeifer
- Department of Integrative Virology, Leibniz-Institut für Virologie (LIV), Centre for Structural Systems Biology (CSSB), Hamburg, Germany; Universitätsklinikum Hamburg Eppendorf (UKE), Hamburg, Germany
| | - Birgit Märtens
- Department of Integrative Virology, Leibniz-Institut für Virologie (LIV), Centre for Structural Systems Biology (CSSB), Hamburg, Germany; Universitätsklinikum Hamburg Eppendorf (UKE), Hamburg, Germany
| | - Kay Grünewald
- Department of Integrative Virology, Leibniz-Institut für Virologie (LIV), Centre for Structural Systems Biology (CSSB), Hamburg, Germany; Department of Chemistry, Universität Hamburg, Hamburg, Germany
| | - Konstantinos Thalassinos
- Institute of Structural and Molecular Biology, Division of Biosciences, University College London, London, UK; Institute of Structural and Molecular Biology, Birkbeck College, University of London, London, United Kingdom
| | - Maya Topf
- Department of Integrative Virology, Leibniz-Institut für Virologie (LIV), Centre for Structural Systems Biology (CSSB), Hamburg, Germany; Universitätsklinikum Hamburg Eppendorf (UKE), Hamburg, Germany.
| |
|
12
|
Zhang A, Mickelin O, Kileel J, Verbeke EJ, Marshall NF, Gilles MA, Singer A. Moment-based metrics for molecules computable from cryogenic electron microscopy images. BIOLOGICAL IMAGING 2024; 4:e3. [PMID: 38516630 PMCID: PMC10951804 DOI: 10.1017/s2633903x24000023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2023] [Revised: 01/27/2024] [Accepted: 01/30/2024] [Indexed: 03/23/2024]
Abstract
Single-particle cryogenic electron microscopy (cryo-EM) is an imaging technique capable of recovering the high-resolution three-dimensional (3D) structure of biological macromolecules from many noisy and randomly oriented projection images. One notable approach to 3D reconstruction, known as Kam's method, relies on the moments of the two-dimensional (2D) images. Inspired by Kam's method, we introduce a rotationally invariant metric between two molecular structures, which does not require 3D alignment. Further, we introduce a metric between a stack of projection images and a molecular structure, which is invariant to rotations and reflections and does not require performing 3D reconstruction. Additionally, the latter metric does not assume a uniform distribution of viewing angles. We demonstrate the uses of the new metrics on synthetic and experimental datasets, highlighting their ability to measure structural similarity.
Affiliation(s)
- Andy Zhang
- Program in Applied and Computational Mathematics, Princeton University, Princeton, NJ, USA
| | - Oscar Mickelin
- Program in Applied and Computational Mathematics, Princeton University, Princeton, NJ, USA
| | - Joe Kileel
- Department of Mathematics and Oden Institute, University of Texas at Austin, Austin, TX, USA
| | - Eric J. Verbeke
- Program in Applied and Computational Mathematics, Princeton University, Princeton, NJ, USA
| | | | - Marc Aurèle Gilles
- Program in Applied and Computational Mathematics, Princeton University, Princeton, NJ, USA
| | - Amit Singer
- Program in Applied and Computational Mathematics, Princeton University, Princeton, NJ, USA
- Department of Mathematics, Princeton University, Princeton, NJ, USA
| |
|
13
|
Bitencourt-Ferreira G, Villarreal MA, Quiroga R, Biziukova N, Poroikov V, Tarasova O, de Azevedo Junior WF. Exploring Scoring Function Space: Developing Computational Models for Drug Discovery. Curr Med Chem 2024; 31:2361-2377. [PMID: 36944627 DOI: 10.2174/0929867330666230321103731] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2022] [Revised: 12/15/2022] [Accepted: 12/29/2022] [Indexed: 03/23/2023]
Abstract
BACKGROUND: The idea of scoring function space established a systems-level approach to address the development of models to predict the affinity of drug molecules by those interested in drug discovery. OBJECTIVE: Our goal here is to review the concept of scoring function space and how to explore it to develop machine learning models to address protein-ligand binding affinity. METHODS: We searched the articles available in PubMed related to the scoring function space. We also utilized crystallographic structures found in the Protein Data Bank (PDB) to represent the protein space. RESULTS: The application of systems-level approaches to address receptor-drug interactions allows us to have a holistic view of the process of drug discovery. The scoring function space adds flexibility to the process since it makes it possible to see drug discovery as a relationship involving mathematical spaces. CONCLUSION: The application of the concept of scoring function space has provided us with an integrated view of drug discovery methods. This concept is useful during drug discovery, where we see the process as a computational search of the scoring function space to find an adequate model to predict receptor-drug binding affinity.
Affiliation(s)
| | - Marcos A Villarreal
- CONICET-Departamento de Matemática y Física, Instituto de Investigaciones en Fisicoquímica de Córdoba (INFIQC), Facultad de Ciencias Químicas, Universidad Nacional de Córdoba, Ciudad Universitaria, Córdoba, Argentina
| | - Rodrigo Quiroga
- CONICET-Departamento de Matemática y Física, Instituto de Investigaciones en Fisicoquímica de Córdoba (INFIQC), Facultad de Ciencias Químicas, Universidad Nacional de Córdoba, Ciudad Universitaria, Córdoba, Argentina
| | - Nadezhda Biziukova
- Institute of Biomedical Chemistry, Pogodinskaya Str., 10/8, Moscow, 119121, Russia
| | - Vladimir Poroikov
- Institute of Biomedical Chemistry, Pogodinskaya Str., 10/8, Moscow, 119121, Russia
| | - Olga Tarasova
- Institute of Biomedical Chemistry, Pogodinskaya Str., 10/8, Moscow, 119121, Russia
| | - Walter F de Azevedo Junior
- Pontifical Catholic University of Rio Grande do Sul - PUCRS, Porto Alegre-RS, Brazil
- Specialization Program in Bioinformatics, The Pontifical Catholic University of Rio Grande do Sul (PUCRS), Av. Ipiranga, 6681 Porto Alegre / RS 90619-900, Brazil
| |
|
14
|
Banach M. Structural Outlier Detection and Zernike-Canterakis Moments for Molecular Surface Meshes-Fast Implementation in Python. Molecules 2023; 29:52. [PMID: 38202635 PMCID: PMC10779519 DOI: 10.3390/molecules29010052] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2023] [Revised: 12/06/2023] [Accepted: 12/12/2023] [Indexed: 01/12/2024]
Abstract
Object retrieval systems measure the degree of similarity of the shape of 3D models. They search for the elements of the 3D model databases that resemble the query model. In structural bioinformatics, the query model is a protein tertiary/quaternary structure and the objective is to find similarly shaped molecules in the Protein Data Bank. With the ever-growing size of the PDB, a direct atomic coordinate comparison with all its members is impractical. To overcome this problem, the shape of the molecules can be encoded by fixed-length feature vectors. The distance of a protein to the entire PDB can be measured in this low-dimensional domain in linear time. The state-of-the-art approaches utilize Zernike-Canterakis moments for the shape encoding and supply the retrieval process with geometric data of the input structures. The BioZernike descriptors are a standard utility of the PDB since 2020. However, when trying to calculate the ZC moments locally, the issue of the deficiency of libraries readily available for use in custom programs (i.e., without relying on external binaries) is encountered, in particular programs written in Python. Here, a fast and well-documented Python implementation of the Pozo-Koehl algorithm is presented. In contrast to the more popular algorithm by Novotni and Klein, which is based on the voxelized volume, the PK algorithm produces ZC moments directly from the triangular surface meshes of 3D models. In particular, it can accept the molecular surfaces of proteins as its input. In the presented PK-Zernike library, owing to Numba's just-in-time compilation, a mesh with 50,000 facets is processed by a single thread in a second at the moment order 20. Since this is the first time the PK algorithm is used in structural bioinformatics, it is employed in a novel, simple, but efficient protein structure retrieval pipeline. 
The elimination of the outlying chain fragments via a fast PCA-based subroutine improves the discrimination ability, allowing for this pipeline to achieve an 0.961 area under the ROC curve in the BioZernike validation suite (0.997 for the assemblies). The correlation between the results of the proposed approach and of the 3D Surfer program attains values up to 0.99.
Affiliation(s)
- Mateusz Banach
- Department of Bioinformatics and Telemedicine, Faculty of Medicine, Jagiellonian University Medical College, Medyczna 7, 30-688 Kraków, Poland
| |
|
15
|
Catapano L, Long F, Yamashita K, Nicholls RA, Steiner RA, Murshudov GN. Neutron crystallographic refinement with REFMAC5 from the CCP4 suite. Acta Crystallogr D Struct Biol 2023; 79:1056-1070. [PMID: 37921806 PMCID: PMC7615533 DOI: 10.1107/s2059798323008793] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/06/2023] [Accepted: 10/05/2023] [Indexed: 11/04/2023]
Abstract
Hydrogen (H) atoms are abundant in macromolecules and often play critical roles in enzyme catalysis, ligand-recognition processes and protein-protein interactions. However, their direct visualization by diffraction techniques is challenging. Macromolecular X-ray crystallography affords the localization of only the most ordered H atoms at (sub-)atomic resolution (around 1.2 Å or higher). However, many H atoms of biochemical significance remain undetectable by this method. In contrast, neutron diffraction methods enable the visualization of most H atoms, typically in the form of deuterium (2H) atoms, at much more common resolution values (better than 2.5 Å). Thus, neutron crystallography, although technically demanding, is often the method of choice when direct information on protonation states is sought. REFMAC5 from the Collaborative Computational Project No. 4 (CCP4) is a program for the refinement of macromolecular models against X-ray crystallographic and cryo-EM data. This contribution describes its extension to include the refinement of structural models obtained from neutron crystallographic data. Stereochemical restraints with accurate bond distances between H atoms and their parent atom nuclei are now part of the CCP4 Monomer Library, the source of prior chemical information used in the refinement. One new feature for neutron data analysis in REFMAC5 is refinement of the protium/deuterium (1H/2H) fraction. This parameter describes the relative 1H/2H contribution to neutron scattering for hydrogen isotopes. The newly developed REFMAC5 algorithms were tested by performing the (re-)refinement of several entries available in the PDB and of one novel structure (FutA) using either (i) neutron data only or (ii) neutron data supplemented by external restraints to a reference X-ray crystallographic structure. 
Re-refinement with REFMAC5 afforded models characterized by R-factor values that are consistent with, and in some cases better than, the originally deposited values. The use of external reference structure restraints during refinement has been observed to be a valuable strategy, especially for structures at medium-low resolution.
Affiliation(s)
- Lucrezia Catapano
- Randall Centre for Cell and Molecular Biophysics, Faculty of Life Sciences and Medicine, King’s College London, London SE1 9RT, United Kingdom
- Structural Studies, MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge CB2 0QH, United Kingdom
| | - Fei Long
- Structural Studies, MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge CB2 0QH, United Kingdom
| | - Keitaro Yamashita
- Structural Studies, MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge CB2 0QH, United Kingdom
| | - Robert A. Nicholls
- Structural Studies, MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge CB2 0QH, United Kingdom
| | - Roberto A. Steiner
- Randall Centre for Cell and Molecular Biophysics, Faculty of Life Sciences and Medicine, King’s College London, London SE1 9RT, United Kingdom
- Department of Biomedical Sciences, University of Padova, Via Ugo Bassi 58/B, 35131 Padova, Italy
| | - Garib N. Murshudov
- Structural Studies, MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge CB2 0QH, United Kingdom
| |
|
16
|
Petrovskiy DV, Nikolsky KS, Rudnev VR, Kulikova LI, Butkova TV, Malsagova KA, Kopylov AT, Kaysheva AL. SAFoldNet: A Novel Tool for Discovering and Aligning Three-Dimensional Protein Structures Based on a Neural Network. Int J Mol Sci 2023; 24:14439. [PMID: 37833886 PMCID: PMC10572457 DOI: 10.3390/ijms241914439] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2023] [Revised: 09/15/2023] [Accepted: 09/19/2023] [Indexed: 10/15/2023]
Abstract
The development and improvement of methods for comparing and searching for three-dimensional protein structures remain urgent tasks in modern structural biology. To solve this problem, we developed a new tool, SAFoldNet, which allows for searching, aligning, superimposing, and determining the exact coordinates of fragments of protein structures. The proposed search and alignment tool was built using a neural network. Specifically, we combined neural network predictions with the well-known BLAST algorithm for searching and aligning sequences. The proposed method involves multistage processing, comprising a stage for converting the geometry of protein structures into sequences of a structural alphabet using a neural network, a search stage for forming a set of candidate structures, and a refinement stage for calculating the structural alignment and overlap and evaluating the similarity with the starting structure of the search. The effectiveness and practical applicability of the proposed tool were compared with those of several widely used services for searching and aligning protein structures. The results of the comparisons confirmed that the proposed method is effective and competitive relative to the available modern services. Furthermore, using the proposed approach, a service with a user-friendly web interface was developed, which allows for searching, aligning, and superimposing protein structures; determining the location of protein fragments; mapping onto a protein molecule chain; and providing structural similarity metrics (expected value and root mean square deviation).
Affiliation(s)
| | | | | | | | | | - Kristina A. Malsagova
- Institute of Biomedical Chemistry, 119121 Moscow, Russia; (D.V.P.); (K.S.N.); (V.R.R.); (L.I.K.); (T.V.B.); (A.T.K.); (A.L.K.)
| | | | | |
|
17
|
Ravnik V, Jukič M, Bren U. Identifying Metal Binding Sites in Proteins Using Homologous Structures, the MADE Approach. J Chem Inf Model 2023; 63:5204-5219. [PMID: 37557084 PMCID: PMC10466382 DOI: 10.1021/acs.jcim.3c00558] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/11/2023] [Indexed: 08/11/2023]
Abstract
In order to identify the locations of metal ions in the binding sites of proteins, we have developed a method named the MADE (MAcromolecular DEnsity and Structure Analysis) approach. The MADE approach represents an evolution of our previous toolset, the ProBiS H2O (MD) methodology, for the identification of conserved water molecules. Our method uses experimental structures of proteins homologous to a query, which are subsequently superimposed upon it. Areas with a particular species present in a similar location among many homologous protein structures are identified using a clustering algorithm. Dense clusters likely represent positions containing species important to the query protein structure or function. We analyze well-characterized apo protein structures and show that the MADE approach can identify clusters corresponding to the expected positions of metal ions in their binding sites. The greatest advantage of our method lies in its generality. It can in principle be applied to any species found in protein records; it is not only limited to metal ions. We additionally demonstrate that the MADE approach can be successfully applied to predict the location of cofactors in computer-modeled structures, e.g., via AlphaFold. We also conduct a careful protein superposition method comparison and find our methodology robust and the results largely independent of the selected protein superposition algorithm. We postulate that with increasing structural data availability, additional applications of the MADE approach will be possible such as non-protein systems, water network identification, protein binding site elaboration, and analysis of binding events, all in a dynamic manner. We have implemented the MADE approach as a plugin for the PyMOL molecular visualization tool. The MADE plugin is available free of charge at https://gitlab.com/Jukic/made_software.
Affiliation(s)
- Vid Ravnik
- Faculty of Chemistry and Chemical Engineering, University of Maribor, Smetanova ulica 17, Maribor SI-2000, Slovenia
| | - Marko Jukič
- Faculty of Chemistry and Chemical Engineering, University of Maribor, Smetanova ulica 17, Maribor SI-2000, Slovenia
- The Faculty of Mathematics, Natural Sciences and Information Technologies, University of Primorska, Glagoljaška 8, Koper SI-6000, Slovenia
- Institute for Environmental Protection and Sensors, Beloruska ulica 7, Maribor SI-2000, Slovenia
| | - Urban Bren
- Faculty of Chemistry and Chemical Engineering, University of Maribor, Smetanova ulica 17, Maribor SI-2000, Slovenia
- The Faculty of Mathematics, Natural Sciences and Information Technologies, University of Primorska, Glagoljaška 8, Koper SI-6000, Slovenia
- Institute for Environmental Protection and Sensors, Beloruska ulica 7, Maribor SI-2000, Slovenia
| |
|
18
|
Kakoulidis P, Vlachos IS, Thanos D, Blatch GL, Emiris IZ, Anastasiadou E. Identifying and profiling structural similarities between Spike of SARS-CoV-2 and other viral or host proteins with Machaon. Commun Biol 2023; 6:752. [PMID: 37468602 PMCID: PMC10356814 DOI: 10.1038/s42003-023-05076-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2022] [Accepted: 06/26/2023] [Indexed: 07/21/2023] Open
Abstract
Using protein structure to predict function, interactions, and evolutionary history is still an open challenge, with existing approaches relying extensively on protein homology and families. Here, we present Machaon, a data-driven method combining orientation-invariant metrics on phi-psi angles, inter-residue contacts and surface complexity. It can be readily applied to whole structures or to segments, such as domains and binding sites. Machaon was applied to SARS-CoV-2 Spike monomers of native, Delta and Omicron variants and identified correlations with a wide range of viral proteins from close to distant taxonomy ranks, as well as host proteins, such as the ACE2 receptor. Machaon's meta-analysis of the results highlights structural, chemical and transcriptional similarities between the Spike monomer and human proteins, indicating a multi-level viral mimicry. This extended analysis also revealed relationships of the Spike protein with biological processes such as ubiquitination and angiogenesis and highlighted different patterns in virus attachment among the studied variants. Machaon is available at https://machaonweb.com.
Affiliation(s)
- Panos Kakoulidis
- Department of Informatics and Telecommunications, National and Kapodistrian University of Athens, Ilisia, 157 84, Athens, Greece
- Biomedical Research Foundation of the Academy of Athens, 4 Soranou Ephessiou St., 115 27, Athens, Greece
| | - Ioannis S Vlachos
- Broad Institute of MIT and Harvard, Merkin Building, 415 Main St., Cambridge, MA, 02142, USA
- Cancer Research Institute, Beth Israel Deaconess Medical Center, 330 Brookline Avenue, Boston, MA, 02215, USA
- Department of Pathology, Beth Israel Deaconess Medical Center, 330 Brookline Avenue, Boston, MA, 02215, USA
- Harvard Medical School, 25 Shattuck Street, Boston, MA, 02115, USA
- Spatial Technologies Unit, Harvard Medical School Initiative for RNA Medicine, Dana Building, Beth Israel Deaconess Medical Center, 330 Brookline Avenue, Boston, MA, 02215, USA
| | - Dimitris Thanos
- Biomedical Research Foundation of the Academy of Athens, 4 Soranou Ephessiou St., 115 27, Athens, Greece
| | - Gregory L Blatch
- Biomedical Biotechnology Research Unit, Department of Biochemistry and Microbiology, Rhodes University, PO Box 94, Makhanda (Grahamstown) 6140, Eastern Cape, South Africa
- Biomedical and Drug Discovery Research Group, Faculty of Health Sciences, Higher Colleges of Technology, PO 25026, Sharjah, UAE
- Institute for Health and Sport, Victoria University, Melbourne, PO Box 14428, VIC 8001, Melbourne, Australia
- The Vice Chancellery, The University of Notre Dame Australia, PO Box 1225, WA 6959, Fremantle, Australia
| | - Ioannis Z Emiris
- Department of Informatics and Telecommunications, National and Kapodistrian University of Athens, Ilisia, 157 84, Athens, Greece
- ATHENA Research and Innovation Center, Artemidos 6 & Epidavrou 15125, Marousi, Greece
| | - Ema Anastasiadou
- Biomedical Research Foundation of the Academy of Athens, 4 Soranou Ephessiou St., 115 27, Athens, Greece.
| |
|
19
|
Bittrich S, Bhikadiya C, Bi C, Chao H, Duarte JM, Dutta S, Fayazi M, Henry J, Khokhriakov I, Lowe R, Piehl DW, Segura J, Vallat B, Voigt M, Westbrook JD, Burley SK, Rose Y. RCSB Protein Data Bank: Efficient Searching and Simultaneous Access to One Million Computed Structure Models Alongside the PDB Structures Enabled by Architectural Advances. J Mol Biol 2023; 435:167994. [PMID: 36738985 PMCID: PMC11514064 DOI: 10.1016/j.jmb.2023.167994] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 11/18/2022] [Revised: 01/27/2023] [Accepted: 01/28/2023] [Indexed: 02/05/2023]
Abstract
The Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB) provides open access to experimentally-determined three-dimensional (3D) structures of biomolecules. The RCSB PDB RCSB.org research-focused web portal is used annually by many millions of users around the world. They access biostructure information, run complex queries utilizing various search services (e.g., full-text, structural and chemical attribute, chemical, sequence, and structure similarity searches), and visualize macromolecules in 3D, all at no charge and with no limitations on data usage. Notwithstanding more than 24,000-fold growth of the PDB over the past five decades, experimentally-determined structures are only available for a small subset of the millions of proteins of known sequence. Recently developed machine learning software tools can predict 3D structures of proteins at accuracies comparable to lower-resolution experimental methods. The RCSB PDB now provides access to ∼1,000,000 Computed Structure Models (CSMs) of proteins coming from AlphaFold DB and the ModelArchive alongside ∼200,000 experimentally-determined PDB structures. Both CSMs and PDB structures are available on RCSB.org and via well-established RCSB PDB Data, Search, and 1D-Coordinates application programming interfaces (APIs). Simultaneous delivery of PDB data and CSMs provides users with access to complementary structural information across the human proteome and those of model organisms and selected pathogens. API enhancements are backwards-compatible and programmatic users can "opt in" to access CSMs with minimal effort. Herein, we describe modifications to RCSB PDB cyberinfrastructure required to support sixfold scaling of 3D biostructure data delivery and lay the groundwork for scaling to accommodate hundreds of millions of CSMs.
Affiliation(s)
- Sebastian Bittrich
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, La Jolla, CA 92093, USA.
| | - Charmi Bhikadiya
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, La Jolla, CA 92093, USA
| | - Chunxiao Bi
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, La Jolla, CA 92093, USA
| | - Henry Chao
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Jose M Duarte
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, La Jolla, CA 92093, USA
| | - Shuchismita Dutta
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Cancer Institute of New Jersey, Rutgers, The State University of New Jersey, New Brunswick, NJ 08901, USA
| | - Maryam Fayazi
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Jeremy Henry
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, La Jolla, CA 92093, USA
| | - Igor Khokhriakov
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, La Jolla, CA 92093, USA
| | - Robert Lowe
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Dennis W Piehl
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Joan Segura
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, La Jolla, CA 92093, USA
| | - Brinda Vallat
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Cancer Institute of New Jersey, Rutgers, The State University of New Jersey, New Brunswick, NJ 08901, USA
| | - Maria Voigt
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - John D Westbrook
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Cancer Institute of New Jersey, Rutgers, The State University of New Jersey, New Brunswick, NJ 08901, USA
| | - Stephen K Burley
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, La Jolla, CA 92093, USA; Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Cancer Institute of New Jersey, Rutgers, The State University of New Jersey, New Brunswick, NJ 08901, USA; Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Yana Rose
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, La Jolla, CA 92093, USA
| |
|
20
|
Pan A, Zeng Y, Liu J, Zhou M, Lai EC, Yu Y. Unanticipated broad phylogeny of BEN DNA-binding domains revealed by structural homology searches. Curr Biol 2023; 33:2270-2282.e2. [PMID: 37236184 PMCID: PMC10348805 DOI: 10.1016/j.cub.2023.05.011] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 02/17/2023] [Revised: 04/07/2023] [Accepted: 05/05/2023] [Indexed: 05/28/2023]
Abstract
Organization of protein sequences into domain families is a foundation for cataloging and investigating protein functions. However, long-standing strategies based on primary amino acid sequences are blind to the possibility that proteins with dissimilar sequences could have comparable tertiary structures. Building on our recent findings that in silico structural predictions of BEN family DNA-binding domains closely resemble their experimentally determined crystal structures, we exploited the AlphaFold2 database for comprehensive identification of BEN domains. Indeed, we identified numerous novel BEN domains, including members of new subfamilies. For example, while no BEN domain factors had previously been annotated in C. elegans, this species actually encodes multiple BEN proteins. These include key developmental timing genes of orphan domain status, sel-7 and lin-14, the latter being the central target of the founding miRNA lin-4. We also reveal that the domain of unknown function 4806 (DUF4806), which is widely distributed across metazoans, is structurally similar to BEN and comprises a new subtype. Surprisingly, we find that BEN domains resemble both metazoan and non-metazoan homeodomains in 3D conformation and preserve characteristic residues, indicating that despite their inability to be aligned by conventional methods, these DNA-binding modules are probably evolutionarily related. Finally, we broaden the application of structural homology searches by revealing novel human members of DUF3504, which exists on diverse proteins with presumed or known nuclear functions. Overall, our work strongly expands this recently identified family of transcription factors and illustrates the value of 3D structural predictions to annotate protein domains and interpret their functions.
Affiliation(s)
- Anyu Pan
- State Key Laboratory of Medical Molecular Biology, Department of Molecular Biology and Biochemistry, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences & School of Basic Medicine, Peking Union Medical College, Beijing 100005, China
| | - Yangfan Zeng
- State Key Laboratory of Medical Molecular Biology, Department of Molecular Biology and Biochemistry, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences & School of Basic Medicine, Peking Union Medical College, Beijing 100005, China
| | - Jingjing Liu
- State Key Laboratory of Medical Molecular Biology, Department of Molecular Biology and Biochemistry, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences & School of Basic Medicine, Peking Union Medical College, Beijing 100005, China
| | - Mengjie Zhou
- State Key Laboratory of Medical Molecular Biology, Department of Molecular Biology and Biochemistry, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences & School of Basic Medicine, Peking Union Medical College, Beijing 100005, China
| | - Eric C Lai
- Developmental Biology Program, Sloan Kettering Institute, New York, NY 10065, USA
| | - Yang Yu
- State Key Laboratory of Medical Molecular Biology, Department of Molecular Biology and Biochemistry, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences & School of Basic Medicine, Peking Union Medical College, Beijing 100005, China.
| |
|
21
|
Xue Y, Mei H, Chen Y, Griffin JD, Liu Q, Weisberg E, Yang J. Repurposing clinically available drugs and therapies for pathogenic targets to combat SARS-CoV-2. MedComm (Beijing) 2023; 4:e254. [PMID: 37193304 PMCID: PMC10183156 DOI: 10.1002/mco2.254] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/02/2022] [Revised: 02/11/2023] [Accepted: 03/07/2023] [Indexed: 05/18/2023] Open
Abstract
The coronavirus disease 2019 (COVID-19) pandemic has affected a large portion of the global population, both physically and mentally. Current evidence suggests that the rapidly evolving coronavirus subvariants risk rendering vaccines and antibodies ineffective due to their potential to evade existing immunity, with enhanced transmission activity and higher reinfection rates that could lead to new outbreaks across the globe. The goal of viral management is to disrupt the viral life cycle as well as to relieve severe symptoms such as lung damage, cytokine storm, and organ failure. In the fight against viruses, the combination of viral genome sequencing, elucidation of the structure of viral proteins, and identifying proteins that are highly conserved across multiple coronaviruses has revealed many potential molecular targets. In addition, the time- and cost-effective repurposing of preexisting antiviral drugs or approved/clinical drugs for these targets offers considerable clinical advantages for COVID-19 patients. This review provides a comprehensive overview of various identified pathogenic targets and pathways as well as corresponding repurposed approved/clinical drugs and their potential against COVID-19. These findings provide new insight into the discovery of novel therapeutic strategies that could be applied to the control of disease symptoms emanating from evolving SARS-CoV-2 variants.
Affiliation(s)
- Yiying Xue
- Department of Hematology, Tongji Hospital, Frontier Science Center for Stem Cell Research, Shanghai Key Laboratory of Signaling and Disease Research, School of Life Sciences and Technology, Tongji University, Shanghai, China
| | - Husheng Mei
- Anhui Province Key Laboratory of Medical Physics and Technology, Institute of Health and Medical Technology, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei, China
- University of Science and Technology of China, Hefei, Anhui, China
| | - Yisa Chen
- Department of Hematology, Tongji Hospital, Frontier Science Center for Stem Cell Research, Shanghai Key Laboratory of Signaling and Disease Research, School of Life Sciences and Technology, Tongji University, Shanghai, China
| | - James D. Griffin
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts, USA
- Department of Medicine, Harvard Medical School, Boston, Massachusetts, USA
| | - Qingsong Liu
- Anhui Province Key Laboratory of Medical Physics and Technology, Institute of Health and Medical Technology, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei, China
- University of Science and Technology of China, Hefei, Anhui, China
- Hefei Cancer Hospital, Chinese Academy of Sciences, Hefei, China
| | - Ellen Weisberg
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts, USA
- Department of Medicine, Harvard Medical School, Boston, Massachusetts, USA
| | - Jing Yang
- Department of Hematology, Tongji Hospital, Frontier Science Center for Stem Cell Research, Shanghai Key Laboratory of Signaling and Disease Research, School of Life Sciences and Technology, Tongji University, Shanghai, China
- Anhui Province Key Laboratory of Medical Physics and Technology, Institute of Health and Medical Technology, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei, China
| |
|
22
|
Bordin N, Dallago C, Heinzinger M, Kim S, Littmann M, Rauer C, Steinegger M, Rost B, Orengo C. Novel machine learning approaches revolutionize protein knowledge. Trends Biochem Sci 2023; 48:345-359. [PMID: 36504138 PMCID: PMC10570143 DOI: 10.1016/j.tibs.2022.11.001] [Citation(s) in RCA: 34] [Impact Index Per Article: 17.0] [Received: 07/14/2022] [Revised: 10/24/2022] [Accepted: 11/17/2022] [Indexed: 12/10/2022]
Abstract
Breakthrough methods in machine learning (ML), protein structure prediction, and novel ultrafast structural aligners are revolutionizing structural biology. Obtaining accurate models of proteins and annotating their functions on a large scale is no longer limited by time and resources. The most recent method to be top ranked by the Critical Assessment of Structure Prediction (CASP) assessment, AlphaFold 2 (AF2), is capable of building structural models with an accuracy comparable to that of experimental structures. Annotations of 3D models are keeping pace with the deposition of the structures due to advancements in protein language models (pLMs) and structural aligners that help validate these transferred annotations. In this review we describe how recent developments in ML for protein science are making large-scale structural bioinformatics available to the general scientific community.
Affiliation(s)
- Nicola Bordin
- Institute of Structural and Molecular Biology, University College London, Gower St, WC1E 6BT London, UK
| | - Christian Dallago
- Technical University of Munich (TUM) Department of Informatics, Bioinformatics and Computational Biology - i12, Boltzmannstr. 3, 85748 Garching/Munich, Germany; VantAI, 151 W 42nd Street, New York, NY 10036, USA
| | - Michael Heinzinger
- Technical University of Munich (TUM) Department of Informatics, Bioinformatics and Computational Biology - i12, Boltzmannstr. 3, 85748 Garching/Munich, Germany; TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Boltzmannstr. 11, 85748 Garching, Germany
| | - Stephanie Kim
- School of Biological Sciences, Seoul National University, Seoul, South Korea; Artificial Intelligence Institute, Seoul National University, Seoul, South Korea
| | - Maria Littmann
- Technical University of Munich (TUM) Department of Informatics, Bioinformatics and Computational Biology - i12, Boltzmannstr. 3, 85748 Garching/Munich, Germany
| | - Clemens Rauer
- Institute of Structural and Molecular Biology, University College London, Gower St, WC1E 6BT London, UK
| | - Martin Steinegger
- School of Biological Sciences, Seoul National University, Seoul, South Korea; Artificial Intelligence Institute, Seoul National University, Seoul, South Korea
| | - Burkhard Rost
- Technical University of Munich (TUM) Department of Informatics, Bioinformatics and Computational Biology - i12, Boltzmannstr. 3, 85748 Garching/Munich, Germany; Institute for Advanced Study (TUM-IAS), Lichtenbergstr. 2a, 85748 Garching/Munich, Germany; TUM School of Life Sciences Weihenstephan (TUM-WZW), Alte Akademie 8, Freising, Germany
| | - Christine Orengo
- Institute of Structural and Molecular Biology, University College London, Gower St, WC1E 6BT London, UK.
| |
|
23
|
Zhao K, Xia Y, Zhang F, Zhou X, Li SZ, Zhang G. Protein structure and folding pathway prediction based on remote homologs recognition using PAthreader. Commun Biol 2023; 6:243. [PMID: 36871126 PMCID: PMC9985440 DOI: 10.1038/s42003-023-04605-8] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 11/09/2022] [Accepted: 02/16/2023] [Indexed: 03/06/2023] Open
Abstract
Recognition of remote homologous structures is a necessary module in AlphaFold2 and is also essential for the exploration of protein folding pathways. Here, we propose a method, PAthreader, to recognize remote templates and explore folding pathways. Firstly, we design a three-track alignment between predicted distance profiles and structure profiles extracted from PDB and AlphaFold DB, to improve the recognition accuracy of remote templates. Secondly, we improve the performance of AlphaFold2 using the templates identified by PAthreader. Thirdly, we explore protein folding pathways based on our conjecture that the dynamic folding information of a protein is implicitly contained in its remote homologs. The results show that the average accuracy of PAthreader templates is 11.6% higher than that of HHsearch. In terms of structure modelling, PAthreader outperforms AlphaFold2 and ranks first on the CAMEO blind test for the latest three months. Furthermore, we predict protein folding pathways for 37 proteins, in which the results of 7 proteins are almost consistent with those of biological experiments, and the other 30 human proteins have yet to be verified by biological experiments, revealing that folding information can be exploited from remote homologous structures.
Affiliation(s)
- Kailong Zhao
- College of Information Engineering, Zhejiang University of Technology, HangZhou, 310023, China
| | - Yuhao Xia
- College of Information Engineering, Zhejiang University of Technology, HangZhou, 310023, China
| | - Fujin Zhang
- College of Information Engineering, Zhejiang University of Technology, HangZhou, 310023, China
| | - Xiaogen Zhou
- College of Information Engineering, Zhejiang University of Technology, HangZhou, 310023, China
| | - Stan Z Li
- AI Lab, Research Center for Industries of the Future, Westlake University, Hangzhou, 310030, Zhejiang, China.
| | - Guijun Zhang
- College of Information Engineering, Zhejiang University of Technology, HangZhou, 310023, China.
| |
|
24
|
Xu Q, Dunbrack R. The protein common assembly database (ProtCAD)-a comprehensive structural resource of protein complexes. Nucleic Acids Res 2023; 51:D466-D478. [PMID: 36300618 PMCID: PMC9825537 DOI: 10.1093/nar/gkac937] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 08/15/2022] [Revised: 10/04/2022] [Accepted: 10/11/2022] [Indexed: 01/29/2023] Open
Abstract
Proteins often act through oligomeric interactions with other proteins. X-ray crystallography and cryo-electron microscopy provide detailed information on the structures of biological assemblies, defined as the most likely biologically relevant structures derived from experimental data. In crystal structures, the most relevant assembly may be ambiguously determined, since multiple assemblies observed in the crystal lattice may be plausible. It is estimated that 10-15% of PDB entries may have incorrect or ambiguous assembly annotations. Accurate assemblies are required for understanding functional data and training of deep learning methods for predicting assembly structures. As with any other kind of biological data, replication via multiple independent experiments provides important validation for the determination of biological assembly structures. Here we present the Protein Common Assembly Database (ProtCAD), which presents clusters of protein assembly structures observed in independent structure determinations of homologous proteins in the Protein Data Bank (PDB). ProtCAD is searchable by PDB entry, UniProt identifiers, or Pfam domain designations and provides downloads of coordinate files, PyMol scripts, and publicly available assembly annotations for each cluster of assemblies. About 60% of PDB entries contain assemblies in clusters of at least 2 independent experiments. All clusters and coordinates are available on the ProtCAD web site (http://dunbrack2.fccc.edu/protcad).
Affiliation(s)
- Qifang Xu
- Institute for Cancer Research, Fox Chase Cancer Center, Philadelphia, PA 19111, USA
| | - Roland L Dunbrack
- Institute for Cancer Research, Fox Chase Cancer Center, Philadelphia, PA 19111, USA
| |
|
25
|
Burley SK, Bhikadiya C, Bi C, Bittrich S, Chao H, Chen L, Craig PA, Crichlow GV, Dalenberg K, Duarte JM, Dutta S, Fayazi M, Feng Z, Flatt JW, Ganesan S, Ghosh S, Goodsell DS, Green RK, Guranovic V, Henry J, Hudson BP, Khokhriakov I, Lawson CL, Liang Y, Lowe R, Peisach E, Persikova I, Piehl DW, Rose Y, Sali A, Segura J, Sekharan M, Shao C, Vallat B, Voigt M, Webb B, Westbrook JD, Whetstone S, Young JY, Zalevsky A, Zardecki C. RCSB Protein Data Bank (RCSB.org): delivery of experimentally-determined PDB structures alongside one million computed structure models of proteins from artificial intelligence/machine learning. Nucleic Acids Res 2023; 51:D488-D508. [PMID: 36420884 PMCID: PMC9825554 DOI: 10.1093/nar/gkac1077] [Citation(s) in RCA: 360] [Impact Index Per Article: 180.0] [Received: 09/30/2022] [Revised: 10/17/2022] [Accepted: 11/02/2022] [Indexed: 11/27/2022] Open
Abstract
The Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB), founding member of the Worldwide Protein Data Bank (wwPDB), is the US data center for the open-access PDB archive. As wwPDB-designated Archive Keeper, RCSB PDB is also responsible for PDB data security. Annually, RCSB PDB serves >10 000 depositors of three-dimensional (3D) biostructures working on all permanently inhabited continents. RCSB PDB delivers data from its research-focused RCSB.org web portal to many millions of PDB data consumers based in virtually every United Nations-recognized country, territory, etc. This Database Issue contribution describes upgrades to the research-focused RCSB.org web portal that created a one-stop-shop for open access to ∼200 000 experimentally-determined PDB structures of biological macromolecules alongside >1 000 000 incorporated Computed Structure Models (CSMs) predicted using artificial intelligence/machine learning methods. RCSB.org is a 'living data resource.' Every PDB structure and CSM is integrated weekly with related functional annotations from external biodata resources, providing up-to-date information for the entire corpus of 3D biostructure data freely available from RCSB.org with no usage limitations. Within RCSB.org, PDB structures and the CSMs are clearly identified as to their provenance and reliability. Both are fully searchable, and can be analyzed and visualized using the full complement of RCSB.org web portal capabilities.
Affiliation(s)
- Stephen K Burley
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Rutgers Cancer Institute of New Jersey, New Brunswick, NJ 08901, USA
- Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California San Diego, La Jolla, CA 92093, USA
| | - Charmi Bhikadiya
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California San Diego, La Jolla, CA 92093, USA
| | - Chunxiao Bi
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California San Diego, La Jolla, CA 92093, USA
| | - Sebastian Bittrich
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California San Diego, La Jolla, CA 92093, USA
| | - Henry Chao
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Li Chen
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Paul A Craig
- School of Chemistry and Materials Science, Rochester Institute of Technology, Rochester, NY 14623, USA
| | - Gregg V Crichlow
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Kenneth Dalenberg
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Jose M Duarte
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California San Diego, La Jolla, CA 92093, USA
| | - Shuchismita Dutta
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Rutgers Cancer Institute of New Jersey, New Brunswick, NJ 08901, USA
| | - Maryam Fayazi
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Zukang Feng
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Justin W Flatt
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Sai Ganesan
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Department of Bioengineering and Therapeutic Sciences, Department of Pharmaceutical Chemistry, Quantitative Biosciences Institute, University of California San Francisco, San Francisco, CA 94158, USA
| | - Sutapa Ghosh
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - David S Goodsell
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Rutgers Cancer Institute of New Jersey, New Brunswick, NJ 08901, USA
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, USA
| | - Rachel Kramer Green
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Vladimir Guranovic
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Jeremy Henry
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California San Diego, La Jolla, CA 92093, USA
| | - Brian P Hudson
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Igor Khokhriakov
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California San Diego, La Jolla, CA 92093, USA
| | - Catherine L Lawson
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Yuhe Liang
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Robert Lowe
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Ezra Peisach
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Irina Persikova
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Dennis W Piehl
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Yana Rose
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California San Diego, La Jolla, CA 92093, USA
| | - Andrej Sali
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Department of Bioengineering and Therapeutic Sciences, Department of Pharmaceutical Chemistry, Quantitative Biosciences Institute, University of California San Francisco, San Francisco, CA 94158, USA
| | - Joan Segura
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California San Diego, La Jolla, CA 92093, USA
| | - Monica Sekharan
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Chenghua Shao
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Brinda Vallat
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Maria Voigt
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Ben Webb
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Department of Bioengineering and Therapeutic Sciences, Department of Pharmaceutical Chemistry, Quantitative Biosciences Institute, University of California San Francisco, San Francisco, CA 94158, USA
| | - John D Westbrook
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Rutgers Cancer Institute of New Jersey, New Brunswick, NJ 08901, USA
| | - Shamara Whetstone
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Jasmine Y Young
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Arthur Zalevsky
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Department of Bioengineering and Therapeutic Sciences, Department of Pharmaceutical Chemistry, Quantitative Biosciences Institute, University of California San Francisco, San Francisco, CA 94158, USA
| | - Christine Zardecki
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| |
|
26
|
Durairaj J, de Ridder D, van Dijk AD. Beyond sequence: Structure-based machine learning. Comput Struct Biotechnol J 2022; 21:630-643. [PMID: 36659927 PMCID: PMC9826903 DOI: 10.1016/j.csbj.2022.12.039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2022] [Revised: 12/21/2022] [Accepted: 12/21/2022] [Indexed: 12/31/2022] Open
Abstract
Recent breakthroughs in protein structure prediction demarcate the start of a new era in structural bioinformatics. Combined with various advances in experimental structure determination and the uninterrupted pace at which new structures are published, this promises an age in which protein structure information is as prevalent and ubiquitous as sequence. Machine learning in protein bioinformatics has been dominated by sequence-based methods, but this is now changing to make use of the deluge of rich structural information as input. Machine learning methods making use of structures are scattered across literature and cover a number of different applications and scopes; while some try to address questions and tasks within a single protein family, others aim to capture characteristics across all available proteins. In this review, we look at the variety of structure-based machine learning approaches, how structures can be used as input, and typical applications of these approaches in protein biology. We also discuss current challenges and opportunities in this all-important and increasingly popular field.
Affiliation(s)
- Janani Durairaj
- Biozentrum, University of Basel, Basel, Switzerland
- Bioinformatics Group, Department of Plant Sciences, Wageningen University and Research, Wageningen, the Netherlands
| | - Dick de Ridder
- Bioinformatics Group, Department of Plant Sciences, Wageningen University and Research, Wageningen, the Netherlands
| | - Aalt D.J. van Dijk
- Bioinformatics Group, Department of Plant Sciences, Wageningen University and Research, Wageningen, the Netherlands
| |
|
27
|
Selvaraj V, Rathinavel T, Ammashi S, Nasir Iqbal M. Polyphenolic Phytochemicals Exhibit Promising SARS-COV-2 Papain Like Protease (PLpro) Inhibition Validated through a Computational Approach. Polycycl Aromat Compd 2022. [DOI: 10.1080/10406638.2022.2103578] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/16/2022]
Affiliation(s)
- Vasuki Selvaraj
- Department of Biotechnology, Sona College of Arts and Science, Salem, India
| | | | - Subramanian Ammashi
- PG and Research Department of Biochemistry, Rajah Serfoji Government College, Thanjavur, India
| | - Muhammad Nasir Iqbal
- Department of Bioinformatics, The Islamia University of Bahawalpur, Bahawalpur, Pakistan
| |
|
28
|
Exploring protein symmetry at the RCSB Protein Data Bank. Emerg Top Life Sci 2022; 6:231-243. [PMID: 35801924 PMCID: PMC9472815 DOI: 10.1042/etls20210267] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 04/25/2022] [Revised: 06/15/2022] [Accepted: 06/20/2022] [Indexed: 11/17/2022]
Abstract
The symmetry of biological molecules has fascinated structural biologists ever since the structure of hemoglobin was determined. The Protein Data Bank (PDB) archive is the central global archive of three-dimensional (3D), atomic-level structures of biomolecules, providing open access to the results of structural biology research with no limitations on usage. Roughly 40% of the structures in the archive exhibit some type of symmetry, including formal global symmetry, local symmetry, or pseudosymmetry. The Research Collaboratory for Structural Bioinformatics (RCSB) Protein Data Bank (founding member of the Worldwide Protein Data Bank partnership that jointly manages, curates, and disseminates the archive) provides a variety of tools to assist users interested in exploring the symmetry of biological macromolecules. These tools include multiple modalities for searching and browsing the archive, turnkey methods for biomolecular visualization, documentation, and outreach materials for exploring functional biomolecular symmetry.
|
29
|
Langenfeld F, Aderinwale T, Christoffer C, Shin WH, Terashi G, Wang X, Kihara D, Benhabiles H, Hammoudi K, Cabani A, Windal F, Melkemi M, Otu E, Zwiggelaar R, Hunter D, Liu Y, Sirugue L, Nguyen HNH, Nguyen TDH, Nguyen-Truong VT, Le D, Nguyen HD, Tran MT, Montès M. Surface-based protein domains retrieval methods from a SHREC2021 challenge. J Mol Graph Model 2022; 111:108103. [PMID: 34959149 PMCID: PMC9746607 DOI: 10.1016/j.jmgm.2021.108103] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/09/2021] [Revised: 11/29/2021] [Accepted: 12/04/2021] [Indexed: 12/15/2022]
Abstract
Proteins are essential to nearly all cellular mechanisms and are the effectors of the cell's activities. As such, they often interact through their surface with other proteins or other cellular ligands such as ions or organic molecules. Evolution generates many different proteins with unique abilities, but also proteins with related functions and hence similar 3D surface properties (shape, physico-chemical properties, …). Protein surfaces are therefore of primary importance for their activity. In the present work, we assess the ability of different methods to detect such similarities based on the geometry of the protein surfaces (described as 3D meshes), using either their shape only, or their shape and the electrostatic potential (a biologically relevant property of protein surfaces). Five different groups participated in this contest using the shape-only dataset, and one group extended its pre-existing method to handle the electrostatic potential. Our comparative study reveals both the ability of the methods to detect related proteins and their difficulty in distinguishing between highly related proteins. Our study also allows us to analyze the putative influence of electrostatic information in addition to that of protein shape alone. Finally, the discussion compares the results with those obtained in previous contests for the extended method. The source code of each presented method has been made available online.
Affiliation(s)
- Florent Langenfeld
- Laboratoire de Génomique, Bio-informatique et Chimie Moléculaire (GBCM), EA 7528, Conservatoire National des Arts-et-Métiers, HESAM Université, 2, rue Conté, Paris, 75003, France; corresponding author (F. Langenfeld)
| | - Tunde Aderinwale
- Department of Computer Science, Purdue University, West Lafayette, IN, 47907, USA
| | - Charles Christoffer
- Department of Computer Science, Purdue University, West Lafayette, IN, 47907, USA
| | - Woong-Hee Shin
- Department of Chemical Science Education, Sunchon National University, Suncheon, 57922, Republic of Korea
| | - Genki Terashi
- Department of Biological Sciences, Purdue University, West Lafayette, IN, 47907, USA
| | - Xiao Wang
- Department of Computer Science, Purdue University, West Lafayette, IN, 47907, USA
| | - Daisuke Kihara
- Department of Computer Science, Purdue University, West Lafayette, IN, 47907, USA; Department of Biological Sciences, Purdue University, West Lafayette, IN, 47907, USA
| | - Halim Benhabiles
- Univ. Lille, CNRS, Centrale Lille, Univ. Polytechnique Hauts-de-France, Junia, UMR 8520, IEMN - Institut d’Electronique de Microélectronique et de Nanotechnologie, F-59 000, Lille, France
| | - Karim Hammoudi
- Université de Haute-Alsace, Department of Computer Science, IRIMAS, F-68 100, Mulhouse, France; Université de Strasbourg, France
| | - Adnane Cabani
- Normandie University, UNIROUEN, ESIGELEC, IRSEEM, 76000, Rouen, France
| | - Feryal Windal
- Univ. Lille, CNRS, Centrale Lille, Univ. Polytechnique Hauts-de-France, Junia, UMR 8520, IEMN - Institut d’Electronique de Microélectronique et de Nanotechnologie, F-59 000, Lille, France
| | - Mahmoud Melkemi
- Université de Haute-Alsace, Department of Computer Science, IRIMAS, F-68 100, Mulhouse, France; Université de Strasbourg, France
| | - Ekpo Otu
- Department of Computer Science, Aberystwyth University, Aberystwyth, SY23 3FL, UK
| | - Reyer Zwiggelaar
- Department of Computer Science, Aberystwyth University, Aberystwyth, SY23 3FL, UK
| | - David Hunter
- Department of Computer Science, Aberystwyth University, Aberystwyth, SY23 3FL, UK
| | - Yonghuai Liu
- Department of Computer Science, Edge Hill University, Ormskirk, L39 4QP, UK
| | - Léa Sirugue
- Laboratoire de Génomique, Bio-informatique et Chimie Moléculaire (GBCM), EA 7528, Conservatoire National des Arts-et-Métiers, HESAM Université, 2, rue Conté, Paris, 75003, France
| | - Huu-Nghia H. Nguyen
- University of Science, VNU-HCM, Viet Nam; Vietnam National University, Ho Chi Minh City, Viet Nam
| | - Tuan-Duy H. Nguyen
- University of Science, VNU-HCM, Viet Nam; Vietnam National University, Ho Chi Minh City, Viet Nam
| | | | - Danh Le
- University of Science, VNU-HCM, Viet Nam; Vietnam National University, Ho Chi Minh City, Viet Nam
| | - Hai-Dang Nguyen
- University of Science, VNU-HCM, Viet Nam; Vietnam National University, Ho Chi Minh City, Viet Nam
| | - Minh-Triet Tran
- University of Science, VNU-HCM, Viet Nam; Vietnam National University, Ho Chi Minh City, Viet Nam; John von Neumann Institute, VNU-HCM, Viet Nam
| | - Matthieu Montès
- Laboratoire de Génomique, Bio-informatique et Chimie Moléculaire (GBCM), EA 7528, Conservatoire National des Arts-et-Métiers, HESAM Université, 2, rue Conté, Paris, 75003, France; corresponding author (M. Montès)
| |
|
30
|
Bayly-Jones C, Whisstock JC. Mining folded proteomes in the era of accurate structure prediction. PLoS Comput Biol 2022; 18:e1009930. [PMID: 35333855 PMCID: PMC8986115 DOI: 10.1371/journal.pcbi.1009930] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 10/12/2021] [Revised: 04/06/2022] [Accepted: 02/16/2022] [Indexed: 01/02/2023] Open
Abstract
Protein structure fundamentally underpins the function and processes of numerous biological systems. Fold recognition algorithms offer a sensitive and robust tool to detect structural, and thereby functional, similarities between distantly related homologs. In the era of accurate structure prediction owing to advances in machine learning techniques and a wealth of experimentally determined structures, previously curated sequence databases have become a rich source of biological information. Here, we use bioinformatic fold recognition algorithms to scan the entire AlphaFold structure database to identify novel protein family members, infer function and group predicted protein structures. As an example of the utility of this approach, we identify novel, previously unknown members of various pore-forming protein families, including MACPFs, GSDMs and aerolysin-like proteins.
Affiliation(s)
- Charles Bayly-Jones
- Department of Biochemistry and Molecular Biology, Monash University, Clayton, Australia
- Biomedicine Discovery Institute, Faculty of Medicine, Nursing and Health Sciences, Monash University, Clayton, Australia
| | - James C. Whisstock
- Department of Biochemistry and Molecular Biology, Monash University, Clayton, Australia
- Biomedicine Discovery Institute, Faculty of Medicine, Nursing and Health Sciences, Monash University, Clayton, Australia
| |
|
31
|
Goodsell DS, Burley SK. RCSB Protein Data Bank resources for structure-facilitated design of mRNA vaccines for existing and emerging viral pathogens. Structure 2022; 30:55-68.e2. [PMID: 34739839 PMCID: PMC8567414 DOI: 10.1016/j.str.2021.10.008] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 07/23/2021] [Revised: 09/17/2021] [Accepted: 10/14/2021] [Indexed: 01/11/2023]
Abstract
Structural biologists provide direct insights into the molecular bases of human health and disease. The open-access Protein Data Bank (PDB) stores and delivers three-dimensional (3D) biostructure data that facilitate discovery and development of therapeutic agents and diagnostic tools. We are in the midst of a revolution in vaccinology. Non-infectious mRNA vaccines have been proven during the coronavirus disease 2019 (COVID-19) pandemic. This new technology underpins nimble discovery and clinical development platforms that use knowledge of 3D viral protein structures for societal benefit. The RCSB PDB supports vaccine designers through expert biocuration and rigorous validation of 3D structures; open-access dissemination of structure information; and search, visualization, and analysis tools for structure-guided design efforts. This resource article examines the structural biology underpinning the success of severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) mRNA vaccines and enumerates some of the many protein structures in the PDB archive that could guide design of new countermeasures against existing and emerging viral pathogens.
Affiliation(s)
- David S Goodsell
- RCSB Protein Data Bank and Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Rutgers Cancer Institute of New Jersey, Rutgers, The State University of New Jersey, New Brunswick, NJ 08903, USA; Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, USA
| | - Stephen K Burley
- RCSB Protein Data Bank and Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Rutgers Cancer Institute of New Jersey, Rutgers, The State University of New Jersey, New Brunswick, NJ 08903, USA; Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, San Diego, CA 92093, USA; Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA.
| |
|
32
|
Burley SK, Bhikadiya C, Bi C, Bittrich S, Chen L, Crichlow GV, Duarte JM, Dutta S, Fayazi M, Feng Z, Flatt JW, Ganesan SJ, Goodsell DS, Ghosh S, Kramer Green R, Guranovic V, Henry J, Hudson BP, Lawson CL, Liang Y, Lowe R, Peisach E, Persikova I, Piehl DW, Rose Y, Sali A, Segura J, Sekharan M, Shao C, Vallat B, Voigt M, Westbrook JD, Whetstone S, Young JY, Zardecki C. RCSB Protein Data Bank: Celebrating 50 years of the PDB with new tools for understanding and visualizing biological macromolecules in 3D. Protein Sci 2022; 31:187-208. [PMID: 34676613 PMCID: PMC8740825 DOI: 10.1002/pro.4213] [Citation(s) in RCA: 86] [Impact Index Per Article: 28.7] [Received: 07/29/2021] [Revised: 10/12/2021] [Accepted: 10/12/2021] [Indexed: 01/03/2023]
Abstract
The Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB), funded by the US National Science Foundation, National Institutes of Health, and Department of Energy, has served structural biologists and Protein Data Bank (PDB) data consumers worldwide since 1999. RCSB PDB, a founding member of the Worldwide Protein Data Bank (wwPDB) partnership, is the US data center for the global PDB archive housing biomolecular structure data. RCSB PDB is also responsible for the security of PDB data, as the wwPDB-designated Archive Keeper. Annually, RCSB PDB serves tens of thousands of three-dimensional (3D) macromolecular structure data depositors (using macromolecular crystallography, nuclear magnetic resonance spectroscopy, electron microscopy, and micro-electron diffraction) from all inhabited continents. RCSB PDB makes PDB data available from its research-focused RCSB.org web portal at no charge and without usage restrictions to millions of PDB data consumers working in every nation and territory worldwide. In addition, RCSB PDB operates an outreach and education PDB101.RCSB.org web portal that was used by more than 800,000 educators, students, and members of the public during calendar year 2020. This invited Tools Issue contribution describes (i) how the archive is growing and evolving as new experimental methods generate ever larger and more complex biomolecular structures; (ii) the importance of data standards and data remediation in effective management of the archive and facile integration with more than 50 external data resources; and (iii) new tools and features for 3D structure analysis and visualization made available during the past year via the RCSB.org web portal.
Affiliation(s)
- Stephen K. Burley
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Cancer Institute of New Jersey, Rutgers, The State University of New Jersey, New Brunswick, New Jersey, USA
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, La Jolla, California, USA
- Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
| | - Charmi Bhikadiya
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
| | - Chunxiao Bi
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, La Jolla, California, USA
| | - Sebastian Bittrich
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, La Jolla, California, USA
| | - Li Chen
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
| | - Gregg V. Crichlow
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
| | - Jose M. Duarte
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, La Jolla, California, USA
| | - Shuchismita Dutta
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Cancer Institute of New Jersey, Rutgers, The State University of New Jersey, New Brunswick, New Jersey, USA
| | - Maryam Fayazi
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
| | - Zukang Feng
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
| | - Justin W. Flatt
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
| | - Sai J. Ganesan
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Department of Bioengineering and Therapeutic Sciences, Department of Pharmaceutical Chemistry, Quantitative Biosciences Institute, University of California, San Francisco, California, USA
| | - David S. Goodsell
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Cancer Institute of New Jersey, Rutgers, The State University of New Jersey, New Brunswick, New Jersey, USA
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, California, USA
| | - Sutapa Ghosh
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
| | - Rachel Kramer Green
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
| | - Vladimir Guranovic
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
| | - Jeremy Henry
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, La Jolla, California, USA
| | - Brian P. Hudson
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
| | - Catherine L. Lawson
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
| | - Yuhe Liang
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
| | - Robert Lowe
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
| | - Ezra Peisach
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
| | - Irina Persikova
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
| | - Dennis W. Piehl
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
| | - Yana Rose
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, La Jolla, California, USA
| | - Andrej Sali
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Department of Bioengineering and Therapeutic Sciences, Department of Pharmaceutical Chemistry, Quantitative Biosciences Institute, University of California, San Francisco, California, USA
| | - Joan Segura
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, La Jolla, California, USA
| | - Monica Sekharan
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
| | - Chenghua Shao
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
| | - Brinda Vallat
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
| | - Maria Voigt
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Institute for Quantitative BiomedicineRutgers, The State University of New JerseyPiscatawayNew JerseyUSA
| | - John D. Westbrook
- Research Collaboratory for Structural Bioinformatics Protein Data BankRutgers, The State University of New JerseyPiscatawayNew JerseyUSA
- Institute for Quantitative BiomedicineRutgers, The State University of New JerseyPiscatawayNew JerseyUSA
- Cancer Institute of New JerseyRutgers, The State University of New JerseyNew BrunswickNew JerseyUSA
| | - Shamara Whetstone
- Research Collaboratory for Structural Bioinformatics Protein Data BankRutgers, The State University of New JerseyPiscatawayNew JerseyUSA
- Institute for Quantitative BiomedicineRutgers, The State University of New JerseyPiscatawayNew JerseyUSA
| | - Jasmine Y. Young
- Research Collaboratory for Structural Bioinformatics Protein Data BankRutgers, The State University of New JerseyPiscatawayNew JerseyUSA
- Institute for Quantitative BiomedicineRutgers, The State University of New JerseyPiscatawayNew JerseyUSA
| | - Christine Zardecki
- Research Collaboratory for Structural Bioinformatics Protein Data BankRutgers, The State University of New JerseyPiscatawayNew JerseyUSA
- Institute for Quantitative BiomedicineRutgers, The State University of New JerseyPiscatawayNew JerseyUSA
| |
|
33
|
Mondal A, Perez A. Simultaneous Assignment and Structure Determination of Proteins From Sparsely Labeled NMR Datasets. Front Mol Biosci 2021; 8:774394. [PMID: 34912846 PMCID: PMC8667806 DOI: 10.3389/fmolb.2021.774394] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 09/11/2021] [Accepted: 10/25/2021] [Indexed: 11/29/2022] Open
Abstract
Sparsely labeled NMR samples provide opportunities to study larger biomolecular assemblies than is traditionally done by NMR. This requires new computational tools that can handle the sparsity and ambiguity in the NMR datasets. The MELD (modeling employing limited data) Bayesian approach was assessed to be the best performing in predicting structures from sparsely labeled NMR data in the 13th edition of the Critical Assessment of Structure Prediction (CASP) event, although limitations of the methodology were also noted. In this report, we evaluate the nature and difficulty of modeling unassigned sparsely labeled NMR datasets and report on an improved methodological pipeline leading to higher-accuracy predictions. We benchmark our methodology against the NMR datasets provided by CASP 13.
Affiliation(s)
- Arup Mondal
- The Quantum Theory Project, Department of Chemistry, University of Florida, Gainesville, FL, United States
| | - Alberto Perez
- The Quantum Theory Project, Department of Chemistry, University of Florida, Gainesville, FL, United States
| |
|
34
|
Behzadi P, Gajdács M. Worldwide Protein Data Bank (wwPDB): A virtual treasure for research in biotechnology. Eur J Microbiol Immunol (Bp) 2021; 11:77-86. [PMID: 34908533 PMCID: PMC8830413 DOI: 10.1556/1886.2021.00020] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 11/06/2021] [Accepted: 11/23/2021] [Indexed: 12/25/2022] Open
Abstract
The Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB) provides a wide range of digital data regarding biology and biomedicine. This huge internet resource holds a wide range of important biological data, obtained from experiments around the globe by different scientists. The Worldwide Protein Data Bank (wwPDB) represents a brilliant collection of 3D structure data associated with important and vital biomolecules including nucleic acids (RNAs and DNAs) and proteins. Moreover, this database accumulates knowledge regarding function and evolution of biomacromolecules which supports different disciplines such as biotechnology. 3D structure, functional characteristics and phylogenetic properties of biomacromolecules give a deep understanding of the biomolecules' characteristics. An important advantage of the wwPDB database is that its data are updated every week. This updating process helps users to have the newest data and information for their projects. The data and information in wwPDB can be a great support for accurately visualizing and illustrating biomacromolecules in biotechnology. As demonstrated by the SARS-CoV-2 pandemic, rapidly accessible and reliable biological data for microbiology, immunology, vaccinology, and drug development are critical to address many healthcare-related challenges that are facing humanity. The aim of this paper is to introduce the readers to wwPDB, and to highlight the importance of this database in biotechnology, with the expectation that the number of scientists interested in the utilization of Protein Data Bank's resources will increase substantially in the coming years.
Affiliation(s)
- Payam Behzadi
- Department of Microbiology, College of Basic Sciences, Shahr-e-Qods Branch, Islamic Azad University, Tehran, 37541-374, Iran
| | - Márió Gajdács
- Department of Oral Biology and Experimental Dental Research, Faculty of Dentistry, University of Szeged, 6720, Szeged, Hungary. Corresponding author. Tel.: +36-62-342-532.
| |
|
35
|
Machat M, Langenfeld F, Craciun D, Sirugue L, Labib T, Lagarde N, Maria M, Montes M. Comparative evaluation of shape retrieval methods on macromolecular surfaces: an application of computer vision methods in structural bioinformatics. Bioinformatics 2021; 37:4375-4382. [PMID: 34247232 PMCID: PMC8652110 DOI: 10.1093/bioinformatics/btab511] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/29/2020] [Revised: 05/18/2021] [Accepted: 07/08/2021] [Indexed: 11/24/2022] Open
Abstract
MOTIVATION: The investigation of the structure of biological systems at the molecular level gives insights about their functions and dynamics. Shape and surface of biomolecules are fundamental to molecular recognition events. Characterizing their geometry can lead to more adequate predictions of their interactions. In the present work, we assess the performance of reference shape retrieval methods from the computer vision community on protein shapes. RESULTS: Shape retrieval methods are efficient in identifying orthologous proteins and tracking large conformational changes. This work illustrates the interest for the protein surface shape as a higher-level representation of the protein structure that (i) abstracts the underlying protein sequence, structure or fold, (ii) allows the use of shape retrieval methods to screen large databases of protein structures to identify surficial homologs and possible interacting partners and (iii) opens an extension of the protein structure-function paradigm toward a protein structure-surface(s)-function paradigm. AVAILABILITY AND IMPLEMENTATION: All data are available online at http://datasetmachat.drugdesign.fr. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Mohamed Machat
- Laboratoire GBCM, EA 7528, Conservatoire National des Arts et Métiers, Hesam Université, Paris 75003, France
| | - Florent Langenfeld
- Laboratoire GBCM, EA 7528, Conservatoire National des Arts et Métiers, Hesam Université, Paris 75003, France
| | - Daniela Craciun
- Laboratoire GBCM, EA 7528, Conservatoire National des Arts et Métiers, Hesam Université, Paris 75003, France
| | - Léa Sirugue
- Laboratoire GBCM, EA 7528, Conservatoire National des Arts et Métiers, Hesam Université, Paris 75003, France
| | - Taoufik Labib
- Laboratoire GBCM, EA 7528, Conservatoire National des Arts et Métiers, Hesam Université, Paris 75003, France
| | - Nathalie Lagarde
- Laboratoire GBCM, EA 7528, Conservatoire National des Arts et Métiers, Hesam Université, Paris 75003, France
| | - Maxime Maria
- Laboratoire GBCM, EA 7528, Conservatoire National des Arts et Métiers, Hesam Université, Paris 75003, France
- Laboratoire XLIM, UMR CNRS 7252, Université de Limoges, Limoges 87000, France
| | - Matthieu Montes
- Laboratoire GBCM, EA 7528, Conservatoire National des Arts et Métiers, Hesam Université, Paris 75003, France
| |
|
36
|
Ljung F, André I. ZEAL: protein structure alignment based on shape similarity. Bioinformatics 2021; 37:2874-2881. [PMID: 33772587 DOI: 10.1093/bioinformatics/btab205] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 09/11/2020] [Revised: 02/02/2021] [Accepted: 03/25/2021] [Indexed: 02/02/2023] Open
Abstract
MOTIVATION: Most protein-structure superimposition tools consider only Cartesian coordinates. Yet, much of biology happens on the surface of proteins, which is why proteins with shared ancestry and similar function often have comparable surface shapes. Superposition of proteins based on surface shape can enable comparison of highly divergent proteins, identify convergent evolution and enable detailed comparison of surface features and binding sites. RESULTS: We present ZEAL, an interactive tool to superpose global and local protein structures based on their shape resemblance using 3D (Zernike-Canterakis) functions to represent the molecular surface. In a benchmark study of structures with the same fold, we show that ZEAL outperforms two other methods for shape-based superposition. In addition, alignments from ZEAL were of comparable quality to the coordinate-based superpositions provided by TM-align. For comparisons of proteins with limited sequence and backbone-fold similarity, where coordinate-based methods typically fail, ZEAL can often find alignments with substantial surface-shape correspondence. In combination with shape-based matching, ZEAL can be used as a general tool to study relationships between shape and protein function. We identify several categories of protein functions where global shape similarity is significantly more likely than expected by random chance, when comparing proteins with little similarity on the fold level. In particular, we find that global surface shape similarity is particularly common among DNA binding proteins. AVAILABILITY AND IMPLEMENTATION: ZEAL can be used online at https://andrelab.org/zeal or as a standalone program with command line or graphical user interface. Source files and installers are available at https://github.com/Andre-lab/ZEAL. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Filip Ljung
- Division of Biochemistry and Structural Biology, Department of Chemistry, Lund University, Lund SE-22100, Sweden
| | - Ingemar André
- Division of Biochemistry and Structural Biology, Department of Chemistry, Lund University, Lund SE-22100, Sweden
| |
|
37
|
Druggable hot spots in trypanothione reductase: novel insights and opportunities for drug discovery revealed by DRUGpy. J Comput Aided Mol Des 2021; 35:871-882. [PMID: 34181199 DOI: 10.1007/s10822-021-00403-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2021] [Accepted: 06/18/2021] [Indexed: 10/21/2022]
Abstract
Assessment of target druggability guided by search and characterization of hot spots is a pivotal step in early stages of drug discovery. The raw output of FTMap provides the data to perform this task, but it relies on manual intervention to properly combine different sets of consensus sites, therefore allowing identification of hot spots and evaluation of strength, shape and distance among them. Thus, the user's previous experience with the target and the software has a direct impact on how data generated by the FTMap server can be explored. The DRUGpy plugin was developed to overcome this limitation. By automatically assembling and scoring all possible combinations of consensus sites, the DRUGpy plugin provides FTMap users a straightforward method to identify and characterize hot spots in protein targets. DRUGpy is available in all operating systems that support PyMOL software. DRUGpy promptly identifies and characterizes pockets that are predicted by FTMap to bind druglike molecules with high affinity (druggable sites) or low affinity (borderline sites) and reveals how protein conformational flexibility impacts the target's druggability. The use of DRUGpy on the analysis of trypanothione reductases (TR), a validated drug target against trypanosomatids, showcases the usefulness of the plugin, and led to the identification of a druggable pocket in the conserved dimer interface present in this class of proteins, opening new perspectives to the design of selective inhibitors.
|
38
|
Milanetti E, Miotto M, Di Rienzo L, Nagaraj M, Monti M, Golbek TW, Gosti G, Roeters SJ, Weidner T, Otzen DE, Ruocco G. In-Silico Evidence for a Two Receptor Based Strategy of SARS-CoV-2. Front Mol Biosci 2021; 8:690655. [PMID: 34179095 PMCID: PMC8219949 DOI: 10.3389/fmolb.2021.690655] [Citation(s) in RCA: 46] [Impact Index Per Article: 11.5] [Received: 04/03/2021] [Accepted: 05/19/2021] [Indexed: 01/04/2023] Open
Abstract
We propose a computational investigation on the interaction mechanisms between SARS-CoV-2 spike protein and possible human cell receptors. In particular, we make use of our newly developed numerical method able to determine efficiently and effectively the relationship of complementarity between portions of protein surfaces. This innovative and general procedure, based on the representation of the molecular isoelectronic density surface in terms of 2D Zernike polynomials, allows the rapid and quantitative assessment of the geometrical shape complementarity between interacting proteins, which was unfeasible with previous methods. Our results indicate that SARS-CoV-2 uses a dual strategy: in addition to the known interaction with angiotensin-converting enzyme 2, the viral spike protein can also interact with sialic-acid receptors of the cells in the upper airways.
Affiliation(s)
- Edoardo Milanetti
- Department of Physics, Sapienza University, Rome, Italy
- Center for Life Nano and Neuro Science, Italian Institute of Technology, Rome, Italy
| | - Mattia Miotto
- Department of Physics, Sapienza University, Rome, Italy
- Center for Life Nano and Neuro Science, Italian Institute of Technology, Rome, Italy
| | - Lorenzo Di Rienzo
- Center for Life Nano and Neuro Science, Italian Institute of Technology, Rome, Italy
| | - Madhu Nagaraj
- Interdisciplinary Nanoscience Center (iNANO), Aarhus University, Aarhus, Denmark
| | - Michele Monti
- Centre for Genomic Regulation (CRG), the Barcelona Institute for Science and Technology, Barcelona, Spain
- RNA System Biology Lab, Department of Neuroscience and Brain Technologies, Istituto Italiano di Tecnologia, Genoa, Italy
| | | | - Giorgio Gosti
- Center for Life Nano and Neuro Science, Italian Institute of Technology, Rome, Italy
| | | | - Tobias Weidner
- Department of Chemistry, Aarhus University, Aarhus, Denmark
| | - Daniel E. Otzen
- Interdisciplinary Nanoscience Center (iNANO), Aarhus University, Aarhus, Denmark
| | - Giancarlo Ruocco
- Department of Physics, Sapienza University, Rome, Italy
- Center for Life Nano and Neuro Science, Italian Institute of Technology, Rome, Italy
| |
|
39
|
Rose Y, Duarte JM, Lowe R, Segura J, Bi C, Bhikadiya C, Chen L, Rose AS, Bittrich S, Burley SK, Westbrook JD. RCSB Protein Data Bank: Architectural Advances Towards Integrated Searching and Efficient Access to Macromolecular Structure Data from the PDB Archive. J Mol Biol 2021; 433:166704. [PMID: 33186584 PMCID: PMC9093041 DOI: 10.1016/j.jmb.2020.11.003] [Citation(s) in RCA: 94] [Impact Index Per Article: 23.5] [Received: 09/30/2020] [Revised: 11/03/2020] [Accepted: 11/05/2020] [Indexed: 11/10/2022]
Abstract
The US Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB) serves many millions of unique users worldwide by delivering experimentally-determined 3D structures of biomolecules integrated with >40 external data resources via RCSB.org, application programming interfaces (APIs), and FTP downloads. Herein, we present the architectural redesign of RCSB PDB data delivery services that build on existing PDBx/mmCIF data schemas. New data access APIs (data.rcsb.org) enable efficient delivery of all PDB archive data. A novel GraphQL-based API provides flexible, declarative data retrieval along with a simple-to-use REST API. A powerful new search system (search.rcsb.org) seamlessly integrates heterogeneous types of searches across the PDB archive. Searches may combine text attributes, protein or nucleic acid sequences, small-molecule chemical descriptors, 3D macromolecular shapes, and sequence motifs. The new RCSB.org architecture adheres to the FAIR Principles, empowering users to address a wide array of research problems in fundamental biology, biomedicine, biotechnology, bioengineering, and bioenergy.
Affiliation(s)
- Yana Rose
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, La Jolla, CA 92093, USA
| | - Jose M Duarte
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, La Jolla, CA 92093, USA
| | - Robert Lowe
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Joan Segura
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, La Jolla, CA 92093, USA
| | - Chunxiao Bi
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, La Jolla, CA 92093, USA
| | - Charmi Bhikadiya
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Li Chen
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Alexander S Rose
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, La Jolla, CA 92093, USA
| | - Sebastian Bittrich
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, La Jolla, CA 92093, USA
| | - Stephen K Burley
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Cancer Institute of New Jersey, Rutgers, The State University of New Jersey, New Brunswick, NJ 08901, USA; Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, La Jolla, CA 92093, USA; Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - John D Westbrook
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Cancer Institute of New Jersey, Rutgers, The State University of New Jersey, New Brunswick, NJ 08901, USA.
| |
|
40
|
Structure of papain-like protease from SARS-CoV-2 and its complexes with non-covalent inhibitors. Nat Commun 2021; 12:743. [PMID: 33531496 PMCID: PMC7854729 DOI: 10.1038/s41467-021-21060-3] [Citation(s) in RCA: 310] [Impact Index Per Article: 77.5] [Received: 08/14/2020] [Accepted: 12/21/2020] [Indexed: 12/13/2022] Open
Abstract
The pandemic caused by Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) continues to expand. Papain-like protease (PLpro) is one of two SARS-CoV-2 proteases potentially targetable with antivirals. PLpro is an attractive target because it plays an essential role in cleavage and maturation of viral polyproteins, assembly of the replicase-transcriptase complex, and disruption of host responses. We report a substantive body of structural, biochemical, and virus replication studies that identify several inhibitors of the SARS-CoV-2 enzyme. We determined the high-resolution structure of wild-type PLpro, the active site C111S mutant, and their complexes with inhibitors. This collection of structures details inhibitor recognition and interactions, providing fundamental molecular and mechanistic insight into PLpro. All compounds inhibit the peptidase activity of PLpro in vitro; some block SARS-CoV-2 replication in cell culture assays. These findings will accelerate structure-based drug design efforts targeting PLpro to identify high-affinity inhibitors of clinical value. The SARS-CoV-2 papain-like protease (PLpro) is of interest as an antiviral drug target. Here, the authors synthesize and characterise naphthalene-based inhibitors for PLpro and present the crystal structures of PLpro in its apo state and with the bound inhibitors, which is of interest for further structure-based drug design efforts.
|
41
|
Burley SK, Bhikadiya C, Bi C, Bittrich S, Chen L, Crichlow GV, Christie CH, Dalenberg K, Di Costanzo L, Duarte JM, Dutta S, Feng Z, Ganesan S, Goodsell DS, Ghosh S, Green RK, Guranović V, Guzenko D, Hudson BP, Lawson C, Liang Y, Lowe R, Namkoong H, Peisach E, Persikova I, Randle C, Rose A, Rose Y, Sali A, Segura J, Sekharan M, Shao C, Tao YP, Voigt M, Westbrook J, Young JY, Zardecki C, Zhuravleva M. RCSB Protein Data Bank: powerful new tools for exploring 3D structures of biological macromolecules for basic and applied research and education in fundamental biology, biomedicine, biotechnology, bioengineering and energy sciences. Nucleic Acids Res 2021; 49:D437-D451. [PMID: 33211854 PMCID: PMC7779003 DOI: 10.1093/nar/gkaa1038] [Citation(s) in RCA: 925] [Impact Index Per Article: 231.3] [Received: 09/15/2020] [Revised: 10/14/2020] [Accepted: 11/17/2020] [Indexed: 12/14/2022] Open
Abstract
The Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB), the US data center for the global PDB archive and a founding member of the Worldwide Protein Data Bank partnership, serves tens of thousands of data depositors in the Americas and Oceania and makes 3D macromolecular structure data available at no charge and without restrictions to millions of RCSB.org users around the world, including >660 000 educators, students and members of the curious public using PDB101.RCSB.org. PDB data depositors include structural biologists using macromolecular crystallography, nuclear magnetic resonance spectroscopy, 3D electron microscopy and micro-electron diffraction. PDB data consumers accessing our web portals include researchers, educators and students studying fundamental biology, biomedicine, biotechnology, bioengineering and energy sciences. During the past 2 years, the research-focused RCSB PDB web portal (RCSB.org) has undergone a complete redesign, enabling improved searching with full Boolean operator logic and more facile access to PDB data integrated with >40 external biodata resources. New features and resources are described in detail using examples that showcase recently released structures of SARS-CoV-2 proteins and host cell proteins relevant to understanding and addressing the COVID-19 global pandemic.
Affiliation(s)
- Stephen K Burley
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Cancer Institute of New Jersey, Rutgers, The State University of New Jersey, New Brunswick, NJ 08901, USA
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, La Jolla, CA 92093, USA
- Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Charmi Bhikadiya
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Chunxiao Bi
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, La Jolla, CA 92093, USA
| | - Sebastian Bittrich
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, La Jolla, CA 92093, USA
| | - Li Chen
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Gregg V Crichlow
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Cole H Christie
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, La Jolla, CA 92093, USA
| | - Kenneth Dalenberg
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Luigi Di Costanzo
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Jose M Duarte
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, La Jolla, CA 92093, USA
| | - Shuchismita Dutta
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Cancer Institute of New Jersey, Rutgers, The State University of New Jersey, New Brunswick, NJ 08901, USA
| | - Zukang Feng
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Sai Ganesan
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Department of Biotherapeutic Sciences, University of California, San Francisco, San Francisco, CA 94158, USA
| | - David S Goodsell
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Center for Computational Structural Biology, The Scripps Research Institute, La Jolla, CA 92037, USA
| | - Sutapa Ghosh
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Rachel Kramer Green
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Vladimir Guranović
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Dmytro Guzenko
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, La Jolla, CA 92093, USA
| | - Brian P Hudson
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Catherine L Lawson
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Yuhe Liang
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Robert Lowe
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Harry Namkoong
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Ezra Peisach
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Irina Persikova
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Chris Randle
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, La Jolla, CA 92093, USA
| | - Alexander Rose
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, La Jolla, CA 92093, USA
| | - Yana Rose
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, La Jolla, CA 92093, USA
| | - Andrej Sali
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Department of Biotherapeutic Sciences, University of California, San Francisco, San Francisco, CA 94158, USA
| | - Joan Segura
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, La Jolla, CA 92093, USA
| | - Monica Sekharan
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Chenghua Shao
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Yi-Ping Tao
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Maria Voigt
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - John D Westbrook
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Cancer Institute of New Jersey, Rutgers, The State University of New Jersey, New Brunswick, NJ 08901, USA
| | - Jasmine Y Young
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Christine Zardecki
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Marina Zhuravleva
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| |
|
42
|
Bittrich S, Burley SK, Rose AS. Real-time structural motif searching in proteins using an inverted index strategy. PLoS Comput Biol 2020; 16:e1008502. [PMID: 33284792 PMCID: PMC7746303 DOI: 10.1371/journal.pcbi.1008502] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 04/30/2020] [Revised: 12/17/2020] [Accepted: 11/09/2020] [Indexed: 12/30/2022]
Abstract
Biochemical and biological functions of proteins are the product of both the overall fold of the polypeptide chain and, typically, structural motifs made up of smaller numbers of amino acids that constitute a catalytic center or a binding site and that may be remote from one another in amino acid sequence. Detection of such structural motifs can provide valuable insights into the function(s) of previously uncharacterized proteins. Technically, this remains an extremely challenging problem because of the size of the Protein Data Bank (PDB) archive. Existing methods depend on clustering by sequence similarity and can be computationally slow. We have developed a new approach that uses an inverted index strategy capable of analyzing >170,000 PDB structures with unmatched speed. The efficiency of the inverted index method depends critically on identifying the small number of structures that contain the query motif and ignoring the many structures that are irrelevant. Our approach (implemented at motif.rcsb.org) enables real-time retrieval and superposition of structural motifs, either extracted from a reference structure or uploaded by the user. Herein, we describe the method and present five case studies that exemplify its efficacy and speed for analyzing 3D structures of both proteins and nucleic acids.
The Protein Data Bank (PDB) provides open access to more than 170,000 three-dimensional structures of proteins, nucleic acids, and biological complexes. Similarities between PDB structures give valuable functional and evolutionary insights, but such resemblance may not be evident at the sequence or global structure level. Throughout the database, there are recurring structural motifs: groups of modest numbers of residues in proximity that, for example, support catalytic activity. Identification of common structural motifs can reveal similarities between proteins and serve as fingerprints for spatial configurations of amino acids, such as the His-Asp-Ser catalytic triad found in serine proteases or the zinc coordination site found in Zinc Finger DNA-binding domains. We present a highly efficient yet flexible strategy that allows users, for the first time, to search for arbitrary structural motifs across the entire PDB archive in real time. Our approach scales favorably with the increasing number and complexity of deposited structures and has the potential to be adapted for other applications in a macromolecular context.
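The general idea behind an inverted index for motif lookup can be illustrated with a minimal Python sketch. This is not the authors' implementation: the descriptor (a residue-label pair plus a binned distance), the function names `pair_keys`, `build_index`, and `search`, the `bin_width` parameter, and the toy structures are all assumptions made for illustration only. The sketch shows how intersecting posting lists narrows the archive to a few candidates without touching irrelevant structures.

```python
from collections import defaultdict
from itertools import combinations
from math import dist


def pair_keys(residues, bin_width=1.0):
    """Yield one (label_a, label_b, distance_bin) key per residue pair.

    `residues` is a list of (label, (x, y, z)) tuples. The descriptor is a
    simplified assumption, not the one used by the published method.
    """
    for (label_a, xyz_a), (label_b, xyz_b) in combinations(residues, 2):
        first, second = sorted((label_a, label_b))
        yield (first, second, int(dist(xyz_a, xyz_b) // bin_width))


def build_index(structures, bin_width=1.0):
    """Map every descriptor key to the set of structure IDs containing it."""
    index = defaultdict(set)
    for structure_id, residues in structures.items():
        for key in pair_keys(residues, bin_width):
            index[key].add(structure_id)
    return index


def search(index, motif, bin_width=1.0):
    """Return IDs of structures that contain every descriptor of the motif."""
    postings = [index.get(key, set()) for key in pair_keys(motif, bin_width)]
    if not postings:
        return set()
    # Only structures present in all posting lists remain candidates.
    return set.intersection(*postings)


# Toy coordinates (hypothetical, not taken from any PDB entry).
structures = {
    "toy_A": [("HIS", (0, 0, 0)), ("ASP", (3.2, 0, 0)), ("SER", (0, 4.5, 0))],
    "toy_B": [("HIS", (0, 0, 0)), ("ASP", (9.0, 0, 0)), ("SER", (0, 4.5, 0))],
    "toy_C": [("GLY", (0, 0, 0)), ("ALA", (3.0, 0, 0))],
}
motif = [("HIS", (1, 1, 1)), ("ASP", (4.4, 1, 1)), ("SER", (1, 5.6, 1))]

index = build_index(structures)
candidates = search(index, motif)
print(candidates)
```

In this simplified form the lookup is only a filter: hard distance bins can miss near-boundary matches, and surviving candidates would still need a geometric check (for example, superposition) to confirm that the same residues satisfy all pairwise constraints at once.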
Affiliation(s)
- Sebastian Bittrich
- RCSB Protein Data Bank, San Diego Supercomputer Center, University of California, San Diego, La Jolla, California, USA
| | - Stephen K. Burley
- RCSB Protein Data Bank, San Diego Supercomputer Center, University of California, San Diego, La Jolla, California, USA
- RCSB Protein Data Bank, Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Cancer Institute of New Jersey, Rutgers, The State University of New Jersey, New Brunswick, New Jersey, USA
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, California, USA
| | - Alexander S. Rose
- RCSB Protein Data Bank, San Diego Supercomputer Center, University of California, San Diego, La Jolla, California, USA
| |
|