1. Segura J, Sanchez-Garcia R, Bittrich S, Rose Y, Burley SK, Duarte JM. Multi-scale structural similarity embedding search across entire proteomes. bioRxiv 2025:2025.02.28.640875. [PMID: 40093062 PMCID: PMC11908163 DOI: 10.1101/2025.02.28.640875] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/19/2025]
Abstract
The rapid expansion of three-dimensional (3D) biomolecular structure information, driven by breakthroughs in artificial intelligence/deep learning (AI/DL)-based structure predictions, has created an urgent need for scalable and efficient structure similarity search methods. Traditional alignment-based approaches, such as structural superposition tools, are computationally expensive and challenging to scale with the vast number of available macromolecular structures. Herein, we present a scalable structure similarity search strategy designed to navigate extensive repositories of experimentally determined structures and computed structure models predicted using AI/DL methods. Our approach leverages protein language models and a deep neural network architecture to transform 3D structures into fixed-length vectors, enabling efficient large-scale comparisons. Although trained to predict TM-scores between single-domain structures, our model generalizes beyond the domain level, accurately identifying 3D similarity for full-length polypeptide chains and multimeric assemblies. By integrating vector databases, our method facilitates efficient large-scale structure retrieval, addressing the growing challenges posed by the expanding volume of 3D biostructure information.
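The retrieval step described in this abstract, comparing fixed-length vectors instead of running pairwise structural alignments, can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not name the similarity measure or the vector database, so cosine similarity and a brute-force scan are stand-ins, and the three-dimensional toy vectors and the names `cosine_similarity`, `top_k`, and `structure_A` to `structure_C` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (assumed metric)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float],
          index: Dict[str, Sequence[float]],
          k: int = 3) -> List[Tuple[str, float]]:
    """Brute-force scan: score every stored embedding and keep the k best."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings; real embeddings would have many more dimensions.
index = {
    "structure_A": [0.9, 0.1, 0.0],
    "structure_B": [0.0, 1.0, 0.0],
    "structure_C": [0.8, 0.2, 0.1],
}
hits = top_k([1.0, 0.0, 0.0], index, k=2)
for name, score in hits:
    print(f"{name}\t{score:.3f}")
```

A production vector database would typically replace the linear scan with an approximate nearest-neighbour index; the scoring idea stays the same.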
Affiliation(s)
- Joan Segura
  - Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California San Diego, La Jolla, CA 92093, USA
- Ruben Sanchez-Garcia
  - School of Science and Technology, IE University, Paseo de la Castellana 259, 28046 Madrid, Spain
- Sebastian Bittrich
  - Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California San Diego, La Jolla, CA 92093, USA
- Yana Rose
  - Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California San Diego, La Jolla, CA 92093, USA
- Stephen K Burley
  - Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California San Diego, La Jolla, CA 92093, USA
  - Research Collaboratory for Structural Bioinformatics Protein Data Bank and the Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
  - Rutgers Cancer Institute, Rutgers, The State University of New Jersey, New Brunswick, NJ 08901, USA
  - Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
  - Rutgers Artificial Intelligence and Data Science (RAD) Collaboratory, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Jose M Duarte
  - Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California San Diego, La Jolla, CA 92093, USA
2. Gheeraert A, Bailly T, Ren Y, Hamraoui A, Te J, Vander Meersche Y, Cretin G, Leon Foun Lin R, Gelly JC, Pérez S, Guyon F, Galochkina T. DIONYSUS: a database of protein-carbohydrate interfaces. Nucleic Acids Res 2025; 53:D387-D395. [PMID: 39436020 PMCID: PMC11701518 DOI: 10.1093/nar/gkae890] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/03/2024] [Revised: 09/03/2024] [Accepted: 09/26/2024] [Indexed: 10/23/2024]
Abstract
Protein-carbohydrate interactions govern a wide variety of biological processes and play an essential role in the development of different diseases. Here, we present DIONYSUS, the first database of protein-carbohydrate interfaces annotated according to structural, chemical and functional properties of both proteins and carbohydrates. We provide exhaustive information on the nature of interactions, binding site composition, biological function and specific additional information retrieved from existing databases. The user can easily search the database using protein sequence and structure information or by carbohydrate binding site properties. Moreover, for a given interaction site, the user can perform its comparison with a representative subset of non-covalent protein-carbohydrate interactions to retrieve information on its potential function or specificity. Therefore, DIONYSUS is a source of valuable information both for a deeper understanding of general protein-carbohydrate interaction patterns, for annotation of the previously unannotated proteins and for such applications as carbohydrate-based drug design. DIONYSUS is freely available at www.dsimb.inserm.fr/DIONYSUS/.
Affiliation(s)
- Aria Gheeraert
  - Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM, BIGR, DSIMB, F-75015 Paris, France
- Thomas Bailly
  - Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM, BIGR, DSIMB, F-75015 Paris, France
- Yani Ren
  - Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM, BIGR, DSIMB, F-75015 Paris, France
  - Université Paris-Saclay, INRAE, MetaGenoPolis, 78350 Jouy-en-Josas, France
- Ali Hamraoui
  - Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM, BIGR, DSIMB, F-75015 Paris, France
  - Institut de biologie de l’Ecole normale supérieure (IBENS), Ecole normale supérieure, CNRS, INSERM, PSL Universite Paris, 75005 Paris, France
- Julie Te
  - Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM, BIGR, DSIMB, F-75015 Paris, France
- Yann Vander Meersche
  - Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM, BIGR, DSIMB, F-75015 Paris, France
- Gabriel Cretin
  - Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM, BIGR, DSIMB, F-75015 Paris, France
- Ravy Leon Foun Lin
  - Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM, BIGR, DSIMB, F-75015 Paris, France
- Jean-Christophe Gelly
  - Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM, BIGR, DSIMB, F-75015 Paris, France
- Serge Pérez
  - Centre de Recherches sur les Macromolécules Végétales, University Grenoble Alpes, CNRS, UPR, 5301 Grenoble, France
- Frédéric Guyon
  - Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM, BIGR, DSIMB, F-75015 Paris, France
- Tatiana Galochkina
  - Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM, BIGR, DSIMB, F-75015 Paris, France
3. Vander Meersche Y, Cretin G, Gheeraert A, Gelly JC, Galochkina T. ATLAS: protein flexibility description from atomistic molecular dynamics simulations. Nucleic Acids Res 2024; 52:D384-D392. [PMID: 37986215 PMCID: PMC10767941 DOI: 10.1093/nar/gkad1084] [Citation(s) in RCA: 15] [Impact Index Per Article: 15.0] [Received: 08/11/2023] [Revised: 10/15/2023] [Accepted: 10/30/2023] [Indexed: 11/22/2023]
Abstract
Dynamical behaviour is one of the most crucial protein characteristics. Despite the advances in the field of protein structure resolution and prediction, analysis and prediction of protein dynamic properties remains a major challenge, mostly due to the low accessibility of data and its diversity and heterogeneity. To address this issue, we present ATLAS, a database of standardised all-atom molecular dynamics simulations, accompanied by their analysis in the form of interactive diagrams and trajectory visualisation. ATLAS offers a large-scale view and valuable insights on protein dynamics for a large and representative set of proteins, by combining data obtained through molecular dynamics simulations with information extracted from experimental structures. Users can easily analyse dynamic properties of functional protein regions, such as domain limits (hinge positions) and residues involved in interaction with other biological molecules. Additionally, the database enables exploration of proteins with uncommon dynamic properties conditioned by their environment such as chameleon subsequences and Dual Personality Fragments. The ATLAS database is freely available at https://www.dsimb.inserm.fr/ATLAS.
Affiliation(s)
- Yann Vander Meersche
  - Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM, BIGR, F-75014 Paris, France
- Gabriel Cretin
  - Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM, BIGR, F-75014 Paris, France
- Aria Gheeraert
  - Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM, BIGR, F-75014 Paris, France
- Jean-Christophe Gelly
  - Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM, BIGR, F-75014 Paris, France
- Tatiana Galochkina
  - Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM, BIGR, F-75014 Paris, France
4. Cretin G, Périn C, Zimmermann N, Galochkina T, Gelly JC. ICARUS: flexible protein structural alignment based on Protein Units. Bioinformatics 2023; 39:btad459. [PMID: 37498544 PMCID: PMC10400377 DOI: 10.1093/bioinformatics/btad459] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2022] [Revised: 07/04/2023] [Accepted: 07/26/2023] [Indexed: 07/28/2023]
Abstract
MOTIVATION: Alignment of protein structures is a major problem in structural biology. The first approach commonly used is to consider proteins as rigid bodies. However, alignment of protein structures can be very complex due to conformational variability, or complex evolutionary relationships between proteins such as insertions, circular permutations or repetitions. In such cases, introducing flexibility becomes useful for two reasons: (i) it can help compare two protein chains which adopted two different conformational states, such as due to proteins/ligands interaction or post-translational modifications, and (ii) it aids in the identification of conserved regions in proteins that may have distant evolutionary relationships.
RESULTS: We propose ICARUS, a new approach for flexible structural alignment based on identification of Protein Units, evolutionarily preserved structural descriptors of intermediate size, between secondary structures and domains. ICARUS significantly outperforms reference methods on a dataset of very difficult structural alignments.
AVAILABILITY AND IMPLEMENTATION: Code is freely available online at https://github.com/DSIMB/ICARUS.
Affiliation(s)
- Gabriel Cretin
  - Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM, BIGR, F-75015 Paris, France
  - Laboratoire d’Excellence GR-Ex, 75015 Paris, France
- Charlotte Périn
  - Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM, BIGR, F-75015 Paris, France
  - Laboratoire d’Excellence GR-Ex, 75015 Paris, France
  - TBI, Université de Toulouse, CNRS, INRAE, INSA, 31077 Toulouse, France
- Nicolas Zimmermann
  - Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM, BIGR, F-75015 Paris, France
  - Laboratoire d’Excellence GR-Ex, 75015 Paris, France
- Tatiana Galochkina
  - Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM, BIGR, F-75015 Paris, France
  - Laboratoire d’Excellence GR-Ex, 75015 Paris, France
- Jean-Christophe Gelly
  - Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM, BIGR, F-75015 Paris, France
  - Laboratoire d’Excellence GR-Ex, 75015 Paris, France
5. Dhondge H, Chauvot de Beauchêne I, Devignes MD. CroMaSt: a workflow for assessing protein domain classification by cross-mapping of structural instances between domain databases and structural alignment. Bioinformatics Advances 2023; 3:vbad081. [PMID: 37431435 PMCID: PMC10329740 DOI: 10.1093/bioadv/vbad081] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/03/2023] [Revised: 06/16/2023] [Accepted: 06/26/2023] [Indexed: 07/12/2023]
Abstract
Motivation: Protein domains can be viewed as building blocks, essential for understanding structure-function relationships in proteins. However, each domain database classifies protein domains using its own methodology. Thus, in many cases, domain models and boundaries differ from one domain database to the other, raising the question of domain definition and enumeration of true domain instances.
Results: We propose an automated iterative workflow to assess protein domain classification by cross-mapping domain structural instances between domain databases and by evaluating structural alignments. CroMaSt (for Cross-Mapper of domain Structural instances) will classify all experimental structural instances of a given domain type into four different categories ('Core', 'True', 'Domain-like' and 'Failed'). CroMaSt is developed in Common Workflow Language and takes advantage of two well-known domain databases with wide coverage: Pfam and CATH. It uses the Kpax structural alignment tool with expert-adjusted parameters. CroMaSt was tested with the RNA Recognition Motif domain type and identifies 962 'True' and 541 'Domain-like' structural instances for this domain type. This method solves a crucial issue in domain-centric research and can generate essential information that could be used for synthetic biology and machine-learning approaches of protein domain engineering.
Availability and implementation: The workflow and the Results archive for the CroMaSt runs presented in this article are available from WorkflowHub (doi: 10.48546/workflowhub.workflow.390.2).
Supplementary information: Supplementary data are available at Bioinformatics Advances online.
6. Montiel-Garcia D, Rojas-Labra O, Santoyo-Rivera N, Reddy VS. Epitope-Analyzer: A structure-based webtool to analyze broadly neutralizing epitopes. J Struct Biol 2022; 214:107839. [PMID: 35134530 PMCID: PMC8829422 DOI: 10.1016/j.jsb.2022.107839] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 09/18/2021] [Revised: 01/27/2022] [Accepted: 02/01/2022] [Indexed: 11/23/2022]
Abstract
The antigenic epitope regions of pathogens (e.g., viruses) are recognized by antibodies (Abs) and subsequently cleared by the host immune system, thereby protecting us from disease. Some of these epitopes are conserved among different variants or subgroups of pathogens (e.g., Influenza (FLU) viruses, Coronaviruses), hence can be targeted for potential broad-neutralization. Here we report a web-based tool, Epitope Analyzer (EA), that rapidly identifies conformational epitope and paratope residues in an antigen-antibody complex structure. Furthermore, the tool provides the ways and means to analyze broadly neutralizing epitopes by comparing the equivalent epitope residues in similar antigen structures. The similarity in the epitope residues between (multiple) pairs of similar antigen molecules suggests the presence of conserved epitopes that can be targeted by broadly neutralizing antibodies. These details can be used as a guide in developing effective treatments, such as the design of novel vaccines and formulation of cocktails of broadly neutralizing antibodies, against multiple variants or subgroups of viruses. The web application can be freely accessed from the URL, http://viperdb.scripps.edu/ea.php.
Affiliation(s)
- Daniel Montiel-Garcia
  - Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, USA
- Oscar Rojas-Labra
  - Departments of Computer Systems and Information Technologies, Tecnologico Nacional de Mexico & Instituto Tecnológico Superior de Irapuato, Irapuato, Guanajuato, Mexico
- Nelly Santoyo-Rivera
  - Departments of Computer Systems and Information Technologies, Tecnologico Nacional de Mexico & Instituto Tecnológico Superior de Irapuato, Irapuato, Guanajuato, Mexico
- Vijay S Reddy
  - Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, USA
7. Dey S, Prilusky J, Levy ED. QSalignWeb: A Server to Predict and Analyze Protein Quaternary Structure. Front Mol Biosci 2022; 8:787510. [PMID: 35071324 PMCID: PMC8769216 DOI: 10.3389/fmolb.2021.787510] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2021] [Accepted: 12/02/2021] [Indexed: 11/16/2022]
Abstract
The identification of physiologically relevant quaternary structures (QSs) in crystal lattices is challenging. To predict the physiological relevance of a particular QS, QSalign searches for homologous structures in which subunits interact in the same geometry. This approach proved accurate but was limited to structures already present in the Protein Data Bank (PDB). Here, we introduce a webserver (www.QSalign.org) allowing users to submit homo-oligomeric structures of their choice to the QSalign pipeline. Given a user-uploaded structure, the sequence is extracted and used to search homologs based on sequence similarity and PFAM domain architecture. If structural conservation is detected between a homolog and the user-uploaded QS, physiological relevance is inferred. The web server also generates alternative QSs with PISA and processes them the same way as the query submitted to widen the predictions. The result page also shows representative QSs in the protein family of the query, which is informative if no QS conservation was detected or if the protein appears monomeric. These representative QSs can also serve as a starting point for homology modeling.
Affiliation(s)
- Sucharita Dey
  - Department of Chemical and Structural Biology, Weizmann Institute of Science, Rehovot, Israel
- Jaime Prilusky
  - Department of Life Sciences and Core Facilities, Weizmann Institute of Science, Rehovot, Israel
- Emmanuel D. Levy
  - Department of Chemical and Structural Biology, Weizmann Institute of Science, Rehovot, Israel
8. Machat M, Langenfeld F, Craciun D, Sirugue L, Labib T, Lagarde N, Maria M, Montes M. Comparative evaluation of shape retrieval methods on macromolecular surfaces: an application of computer vision methods in structural bioinformatics. Bioinformatics 2021; 37:4375-4382. [PMID: 34247232 PMCID: PMC8652110 DOI: 10.1093/bioinformatics/btab511] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/29/2020] [Revised: 05/18/2021] [Accepted: 07/08/2021] [Indexed: 11/24/2022]
Abstract
MOTIVATION: The investigation of the structure of biological systems at the molecular level gives insights about their functions and dynamics. Shape and surface of biomolecules are fundamental to molecular recognition events. Characterizing their geometry can lead to more adequate predictions of their interactions. In the present work, we assess the performance of reference shape retrieval methods from the computer vision community on protein shapes.
RESULTS: Shape retrieval methods are efficient in identifying orthologous proteins and tracking large conformational changes. This work illustrates the interest for the protein surface shape as a higher-level representation of the protein structure that (i) abstracts the underlying protein sequence, structure or fold, (ii) allows the use of shape retrieval methods to screen large databases of protein structures to identify surficial homologs and possible interacting partners and (iii) opens an extension of the protein structure-function paradigm toward a protein structure-surface(s)-function paradigm.
AVAILABILITY AND IMPLEMENTATION: All data are available online at http://datasetmachat.drugdesign.fr.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Mohamed Machat
  - Laboratoire GBCM, EA 7528, Conservatoire National des Arts et Métiers, Hesam Université, Paris 75003, France
- Florent Langenfeld
  - Laboratoire GBCM, EA 7528, Conservatoire National des Arts et Métiers, Hesam Université, Paris 75003, France
- Daniela Craciun
  - Laboratoire GBCM, EA 7528, Conservatoire National des Arts et Métiers, Hesam Université, Paris 75003, France
- Léa Sirugue
  - Laboratoire GBCM, EA 7528, Conservatoire National des Arts et Métiers, Hesam Université, Paris 75003, France
- Taoufik Labib
  - Laboratoire GBCM, EA 7528, Conservatoire National des Arts et Métiers, Hesam Université, Paris 75003, France
- Nathalie Lagarde
  - Laboratoire GBCM, EA 7528, Conservatoire National des Arts et Métiers, Hesam Université, Paris 75003, France
- Maxime Maria
  - Laboratoire GBCM, EA 7528, Conservatoire National des Arts et Métiers, Hesam Université, Paris 75003, France
  - Laboratoire XLIM, UMR CNRS 7252, Université de Limoges, Limoges 87000, France
- Matthieu Montes
  - Laboratoire GBCM, EA 7528, Conservatoire National des Arts et Métiers, Hesam Université, Paris 75003, France
9. PDB-wide identification of physiological hetero-oligomeric assemblies based on conserved quaternary structure geometry. Structure 2021; 29:1303-1311.e3. [PMID: 34520740 PMCID: PMC8575123 DOI: 10.1016/j.str.2021.07.012] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 09/15/2020] [Revised: 03/22/2021] [Accepted: 07/23/2021] [Indexed: 11/21/2022]
Abstract
An accurate understanding of biomolecular mechanisms and diseases requires information on protein quaternary structure (QS). A critical challenge in inferring QS information from crystallography data is distinguishing biological interfaces from fortuitous crystal-packing contacts. Here, we employ QS conservation across homologs to infer the biological relevance of hetero-oligomers. We compare the structures and compositions of hetero-oligomers, which allow us to annotate 7,810 complexes as physiologically relevant, 1,060 as likely errors, and 1,432 with comparative information on subunit stoichiometry and composition. Excluding immunoglobulins, these annotations encompass over 51% of hetero-oligomers in the PDB. We curate a dataset of 577 hetero-oligomeric complexes to benchmark these annotations, which reveals an accuracy >94%. When homology information is not available, we compare QS across repositories (PDB, PISA, and EPPIC) to derive confidence estimates. This work provides high-quality annotations along with a large benchmark dataset of hetero-assemblies.
10. Montiel-Garcia D, Santoyo-Rivera N, Ho P, Carrillo-Tripp M, Brooks CL III, Johnson JE, Reddy VS. VIPERdb v3.0: a structure-based data analytics platform for viral capsids. Nucleic Acids Res 2021; 49:D809-D816. [PMID: 33313778 PMCID: PMC7779063 DOI: 10.1093/nar/gkaa1096] [Citation(s) in RCA: 45] [Impact Index Per Article: 11.3] [Received: 09/21/2020] [Revised: 10/23/2020] [Accepted: 12/10/2020] [Indexed: 11/14/2022]
Abstract
VIrus Particle ExploreR data base (VIPERdb) (http://viperdb.scripps.edu) is a curated repository of virus capsid structures and a database of structure-derived data along with various virus specific information. VIPERdb has been continuously improved for over 20 years and contains a number of virus structure analysis tools. The release of VIPERdb v3.0 contains new structure-based data analytics tools like Multiple Structure-based and Sequence Alignment (MSSA) to identify hot-spot residues within a selected group of structures and an anomaly detection application to analyze and curate the structure-derived data within individual virus families. At the time of this writing, there are 931 virus structures from 62 different virus families in the database. Significantly, the new release also contains a standalone database called 'Virus World database' (VWdb) that comprises all the characterized viruses (∼181 000) known to date, gathered from ICTVdb and NCBI, and their capsid protein sequences, organized according to their virus taxonomy with links to known structures in VIPERdb and PDB. Moreover, the new release of VIPERdb includes a service-oriented data engine to handle all the data access requests and provides an interface for futuristic data analytics using machine learning applications.
Affiliation(s)
- Daniel Montiel-Garcia
  - Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, USA
  - Departments of Computer Systems and Information Technologies, Tecnologico Nacional de Mexico & Instituto Tecnológico Superior de Irapuato, Irapuato, Guanajuato, México
- Nelly Santoyo-Rivera
  - Departments of Computer Systems and Information Technologies, Tecnologico Nacional de Mexico & Instituto Tecnológico Superior de Irapuato, Irapuato, Guanajuato, México
- Phuong Ho
  - Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, USA
- Mauricio Carrillo-Tripp
  - Biomolecular Diversity Laboratory, Centro de Investigación y de Estudios Avanzados Unidad Monterrey, Vía del Conocimiento 201, Parque PIIT, C.P. 66600, Apodaca, Nuevo León, México
- Charles L Brooks III
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
  - Department of Chemistry, University of Michigan, Ann Arbor, MI, USA
  - Department of Biophysics, University of Michigan, Ann Arbor, MI, USA
- John E Johnson
  - Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, USA
- Vijay S Reddy
  - Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, USA
11. Zhao Z, Bourne PE. Structural Insights into the Binding Modes of Viral RNA-Dependent RNA Polymerases Using a Function-Site Interaction Fingerprint Method for RNA Virus Drug Discovery. J Proteome Res 2020; 19:4698-4705. [PMID: 32946692 PMCID: PMC7640976 DOI: 10.1021/acs.jproteome.0c00623] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 08/11/2020] [Indexed: 01/18/2023]
Abstract
The coronavirus disease of 2019 (COVID-19) pandemic speaks to the need for drugs that not only are effective but also remain effective given the mutation rate of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). To this end, we describe structural binding-site insights for facilitating COVID-19 drug design when targeting RNA-dependent RNA polymerase (RDRP), a common conserved component of RNA viruses. We combined an RDRP structure data set, including 384 RDRP PDB structures and all corresponding RDRP-ligand interaction fingerprints, thereby revealing the structural characteristics of the active sites for application to RDRP-targeted drug discovery. Specifically, we revealed the intrinsic ligand-binding modes and associated RDRP structural characteristics. Four types of binding modes with corresponding binding pockets were determined, suggesting two major subpockets available for drug discovery. We screened a drug data set of 7894 compounds against these binding pockets and presented the top-10 small molecules as a starting point in further exploring the prevention of virus replication. In summary, the binding characteristics determined here help rationalize RDRP-targeted drug discovery and provide insights into the specific binding mechanisms important for containing the SARS-CoV-2 virus.
Affiliation(s)
- Zheng Zhao
  - School of Data Science, University of Virginia, Charlottesville, Virginia 22904, United States of America
  - Department of Biomedical Engineering, University of Virginia, Charlottesville, Virginia 22904, United States of America
- Philip E. Bourne
  - School of Data Science, University of Virginia, Charlottesville, Virginia 22904, United States of America
  - Department of Biomedical Engineering, University of Virginia, Charlottesville, Virginia 22904, United States of America
12. Wen Z, He J, Huang SY. Topology-independent and global protein structure alignment through an FFT-based algorithm. Bioinformatics 2020; 36:478-486. [PMID: 31384919 DOI: 10.1093/bioinformatics/btz609] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 02/18/2019] [Revised: 07/22/2019] [Accepted: 08/02/2019] [Indexed: 12/12/2022]
Abstract
MOTIVATION: Protein structure alignment is one of the fundamental problems in computational structure biology. A variety of algorithms have been developed to address this important issue in the past decade. However, due to their heuristic nature, current structure alignment methods may suffer from suboptimal alignment and/or over-fragmentation and thus lead to a biologically wrong alignment in some cases. To overcome these limitations, we have developed an accurate topology-independent and global structure alignment method through an FFT-based exhaustive search algorithm, which is referred to as FTAlign.
RESULTS: Our FTAlign algorithm was extensively tested on six commonly used datasets and compared with seven state-of-the-art structure alignment approaches, TMalign, DeepAlign, Kpax, 3DCOMB, MICAN, SPalignNS and CLICK. It was shown that FTAlign outperformed the other methods in reproducing manually curated alignments and obtained a high success rate of 96.7 and 90.0% on two gold-standard benchmarks, MALIDUP and MALISAM, respectively. Moreover, FTAlign also achieved the overall best performance in terms of biologically meaningful structure overlap (SO) and TMscore on both the sequential alignment test sets including MALIDUP, MALISAM and 64 difficult cases from HOMSTRAD, and the non-sequential sets including MALIDUP-NS, MALISAM-NS, 199 topology-different cases, where FTAlign especially showed more advantage for non-sequential alignment. Despite its global search feature, FTAlign is also computationally efficient and can normally complete a pairwise alignment within one second.
AVAILABILITY AND IMPLEMENTATION: http://huanglab.phys.hust.edu.cn/ftalign/.
Affiliation(s)
- Zeyu Wen
  - School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, People's Republic of China
- Jiahua He
  - School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, People's Republic of China
- Sheng-You Huang
  - School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, People's Republic of China
13. El Houasli M, Maigret B, Devignes MD, Ghoorah AW, Grudinin S, Ritchie DW. Modeling and minimizing CAPRI round 30 symmetrical protein complexes from CASP-11 structural models. Proteins 2016; 85:463-469. [PMID: 27701764 DOI: 10.1002/prot.25182] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 07/18/2016] [Revised: 09/13/2016] [Accepted: 09/24/2016] [Indexed: 11/06/2022]
Abstract
Many of the modeling targets in the blind CASP-11/CAPRI-30 experiment were protein homo-dimers and homo-tetramers. Here, we perform a retrospective docking-based analysis of the perfectly symmetrical CAPRI Round 30 targets whose crystal structures have been published. Starting from the CASP "stage-2" fold prediction models, we show that using our recently developed "SAM" polar Fourier symmetry docking algorithm combined with NAMD energy minimization often gives acceptable or better 3D models of the target complexes. We also use SAM to analyze the overall quality of all CASP structural models for the selected targets from a docking-based perspective. We demonstrate that docking only CASP "center" structures for the selected targets provides a fruitful and economical docking strategy. Furthermore, our results show that many of the CASP models are dockable in the sense that they can lead to acceptable or better models of symmetrical complexes. Even though SAM is very fast, using docking and NAMD energy minimization to pull out acceptable docking models from a large ensemble of docked CASP models is computationally expensive. Nonetheless, thanks to our SAM docking algorithm, we expect that applying our docking protocol on a modern computer cluster will give us the ability to routinely model 3D structures of symmetrical protein complexes from CASP-quality models. Proteins 2017; 85:463-469. © 2016 Wiley Periodicals, Inc.
Affiliation(s)
- Marwa El Houasli
  - INRIA, Equipe Capsid, Campus Scientifique, BP 239, 54506, Vandoeuvre-lès-Nancy, France
- Anisah W Ghoorah
  - Department of Computer Science and Engineering, University of Mauritius
- David W Ritchie
  - INRIA, Equipe Capsid, Campus Scientifique, BP 239, 54506, Vandoeuvre-lès-Nancy, France