1
Carpenter KA, Altman RB. Databases of ligand-binding pockets and protein-ligand interactions. Comput Struct Biotechnol J 2024; 23:1320-1338. [PMID: 38585646 PMCID: PMC10997877 DOI: 10.1016/j.csbj.2024.03.015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2024] [Revised: 03/16/2024] [Accepted: 03/17/2024] [Indexed: 04/09/2024] Open
Abstract
Many research groups and institutions have created a variety of databases curating experimental and predicted data related to protein-ligand binding. The landscape of available databases is dynamic, with new databases emerging and established databases becoming defunct. Here, we review the current state of databases that contain binding pockets and protein-ligand binding interactions. We have compiled a list of such databases, fifty-three of which are currently available for use. We discuss variation in how binding pockets are defined and summarize pocket-finding methods. We organize the fifty-three databases into subgroups based on goals and contents, and describe standard use cases. We also illustrate that pockets within the same protein are characterized differently across different databases. Finally, we assess critical issues of sustainability, accessibility and redundancy.
Affiliation(s)
- Kristy A. Carpenter
- Department of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA
- Russ B. Altman
- Department of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA
- Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Department of Medicine, Stanford University, Stanford, CA 94305, USA
2
Lazou M, Bekar-Cesaretli AA, Vajda S, Joseph-McCarthy D. Identification and Ranking of Binding Sites from Structural Ensembles: Application to SARS-CoV-2. Viruses 2024; 16:1647. [PMID: 39599762 PMCID: PMC11599001 DOI: 10.3390/v16111647] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/25/2024] [Revised: 10/16/2024] [Accepted: 10/18/2024] [Indexed: 11/29/2024] Open
Abstract
Target identification and evaluation is a critical step in the drug discovery process. Although time-intensive and complex, the challenge becomes even more acute in the realm of infectious disease, where the rapid emergence of new viruses, the swift mutation of existing targets, and partial effectiveness of approved antivirals can lead to outbreaks of significant public health concern. The COVID-19 pandemic, caused by the SARS-CoV-2 virus, serves as a prime example of this, where despite the allocation of substantial resources, Paxlovid is currently the only effective treatment. In that case, significant effort pre-pandemic had been expended to evaluate the biological target for the closely related SARS-CoV. In this work, we utilize the computational hot spot mapping method, FTMove, to rapidly identify and rank binding sites for a set of nine SARS-CoV-2 drug/potential drug targets. FTMove takes into account protein flexibility by mapping binding site hot spots across an ensemble of structures for a given target. To assess the applicability of the FTMove approach to a wide range of drug targets for viral pathogens, we also carry out a comprehensive review of the known SARS-CoV-2 ligandable sites. The approach is able to identify the vast majority of all known sites and a few additional sites, which may in fact be yet to be discovered as ligandable. Furthermore, a UMAP analysis of the FTMove features for each identified binding site is largely able to separate predicted sites with experimentally known binders from those without known binders. These results demonstrate the utility of FTMove to rapidly identify actionable sites across a range of targets for a given indication. As such, the approach is expected to be particularly useful for assessing target binding sites for any emerging pathogen, as well as for indications in other disease areas, and providing actionable starting points for structure-based drug design efforts.
Affiliation(s)
- Maria Lazou
- Department of Biomedical Engineering, Boston University, Boston, MA 02215, USA
- Sandor Vajda
- Department of Biomedical Engineering, Boston University, Boston, MA 02215, USA
- Department of Chemistry, Boston University, Boston, MA 02215, USA
- Diane Joseph-McCarthy
- Department of Biomedical Engineering, Boston University, Boston, MA 02215, USA
- Department of Chemistry, Boston University, Boston, MA 02215, USA
3
Stevenson GA, Kirshner D, Bennion BJ, Yang Y, Zhang X, Zemla A, Torres MW, Epstein A, Jones D, Kim H, Bennett WFD, Wong SE, Allen JE, Lightstone FC. Clustering Protein Binding Pockets and Identifying Potential Drug Interactions: A Novel Ligand-Based Featurization Method. J Chem Inf Model 2023; 63:6655-6666. [PMID: 37847557 PMCID: PMC10647021 DOI: 10.1021/acs.jcim.3c00722] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2023] [Indexed: 10/18/2023]
Abstract
Protein-ligand interactions are essential to drug discovery and drug development efforts. Desirable on-target or multitarget interactions are the first step in finding an effective therapeutic, while undesirable off-target interactions are the first step in assessing safety. In this work, we introduce a novel ligand-based featurization and mapping of human protein pockets to identify closely related protein targets and to project novel drugs into a hybrid protein-ligand feature space to identify their likely protein interactions. Using structure-based template matches from PDB, protein pockets are featured by the ligands that bind to their best co-complex template matches. The simplicity and interpretability of this approach provide a granular characterization of the human proteome at the protein-pocket level instead of the traditional protein-level characterization by family, function, or pathway. We demonstrate the power of this featurization method by clustering a subset of the human proteome and evaluating the predicted cluster associations of over 7000 compounds.
Affiliation(s)
- Garrett A. Stevenson
- Computational Engineering Division, Lawrence Livermore National Laboratory, Livermore, California 94550, United States
- Dan Kirshner
- Biosciences and Biotechnology Division, Lawrence Livermore National Laboratory, Livermore, California 94550, United States
- Brian J. Bennion
- Biosciences and Biotechnology Division, Lawrence Livermore National Laboratory, Livermore, California 94550, United States
- Yue Yang
- Biosciences and Biotechnology Division, Lawrence Livermore National Laboratory, Livermore, California 94550, United States
- Xiaohua Zhang
- Biosciences and Biotechnology Division, Lawrence Livermore National Laboratory, Livermore, California 94550, United States
- Adam Zemla
- Global Security Computing Applications Division, Lawrence Livermore National Laboratory, Livermore, California 94550, United States
- Marisa W. Torres
- Global Security Computing Applications Division, Lawrence Livermore National Laboratory, Livermore, California 94550, United States
- Aidan Epstein
- Global Security Computing Applications Division, Lawrence Livermore National Laboratory, Livermore, California 94550, United States
- Derek Jones
- Global Security Computing Applications Division, Lawrence Livermore National Laboratory, Livermore, California 94550, United States
- Department of Computer Science and Engineering, University of California, San Diego, La Jolla, California 92093, United States
- Hyojin Kim
- Center for Applied Scientific Computing, Lawrence Livermore National Laboratory, Livermore, California 94550, United States
- W. F. Drew Bennett
- Biosciences and Biotechnology Division, Lawrence Livermore National Laboratory, Livermore, California 94550, United States
- Sergio E. Wong
- Biosciences and Biotechnology Division, Lawrence Livermore National Laboratory, Livermore, California 94550, United States
- Jonathan E. Allen
- Global Security Computing Applications Division, Lawrence Livermore National Laboratory, Livermore, California 94550, United States
- Felice C. Lightstone
- Biosciences and Biotechnology Division, Lawrence Livermore National Laboratory, Livermore, California 94550, United States
4
Sandholtz SH, Drocco JA, Zemla AT, Torres MW, Silva MS, Allen JE. A Computational Pipeline to Identify and Characterize Binding Sites and Interacting Chemotypes in SARS-CoV-2. ACS Omega 2023; 8:21871-21884. [PMID: 37309388 PMCID: PMC10254058 DOI: 10.1021/acsomega.3c01621] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/09/2023] [Accepted: 05/17/2023] [Indexed: 06/14/2023]
Abstract
Minimizing the human and economic costs of the COVID-19 pandemic and future pandemics requires the ability to develop and deploy effective treatments for novel pathogens as soon as possible after they emerge. To this end, we introduce a new computational pipeline for the rapid identification and characterization of binding sites in viral proteins along with the key chemical features, which we call chemotypes, of the compounds predicted to interact with those same sites. The composition of source organisms for the structural models associated with an individual binding site is used to assess the site's degree of structural conservation across different species, including other viruses and humans. We propose a search strategy for novel therapeutics that involves the selection of molecules preferentially containing the most structurally rich chemotypes identified by our algorithm. While we demonstrate the pipeline on SARS-CoV-2, it is generalizable to any new virus, as long as either experimentally solved structures for its proteins are available or sufficiently accurate predicted structures can be constructed.
Affiliation(s)
- Sarah H. Sandholtz
- Biosciences and Biotechnology Division, Physical and Life Sciences Directorate, Lawrence Livermore National Laboratory, Livermore, California 94550, United States of America
- Jeffrey A. Drocco
- Biosciences and Biotechnology Division, Physical and Life Sciences Directorate, Lawrence Livermore National Laboratory, Livermore, California 94550, United States of America
- Adam T. Zemla
- Global Security Computing Applications Division, Computing Directorate, Lawrence Livermore National Laboratory, Livermore, California 94550, United States of America
- Marisa W. Torres
- Global Security Computing Applications Division, Computing Directorate, Lawrence Livermore National Laboratory, Livermore, California 94550, United States of America
- Mary S. Silva
- Global Security Computing Applications Division, Computing Directorate, Lawrence Livermore National Laboratory, Livermore, California 94550, United States of America
- Jonathan E. Allen
- Global Security Computing Applications Division, Computing Directorate, Lawrence Livermore National Laboratory, Livermore, California 94550, United States of America
5
Kimbrel J, Moon J, Avila-Herrera A, Martí JM, Thissen J, Mulakken N, Sandholtz SH, Ferrell T, Daum C, Hall S, Segelke B, Arrildt KT, Messenger S, Wadford DA, Jaing C, Allen JE, Borucki MK. Multiple Mutations Associated with Emergent Variants Can Be Detected as Low-Frequency Mutations in Early SARS-CoV-2 Pandemic Clinical Samples. Viruses 2022; 14:2775. [PMID: 36560780 PMCID: PMC9788161 DOI: 10.3390/v14122775] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/03/2022] [Revised: 08/23/2022] [Accepted: 12/06/2022] [Indexed: 12/15/2022] Open
Abstract
Genetic analysis of intra-host viral populations provides unique insight into pre-emergent mutations that may contribute to the genotype of future variants. Clinical samples positive for SARS-CoV-2 collected in California during the first months of the pandemic were sequenced to define the dynamics of mutation emergence as the virus became established in the state. Deep sequencing of 90 nasopharyngeal samples showed that many mutations associated with the establishment of SARS-CoV-2 globally were present at varying frequencies in a majority of the samples, even those collected as the virus was first detected in the US. A subset of mutations that emerged months later in consensus sequences were detected as subconsensus members of intra-host populations. Spike mutations P681H, H655Y, and V1104L were detected prior to emergence in variant genotypes, mutations were detected at multiple positions within the furin cleavage site, and pre-emergent mutations were identified in the nucleocapsid and the envelope genes. Because many of the samples had a very high depth of coverage, a bioinformatics pipeline, "Mappgene", was established that uses both iVar and LoFreq variant calling to enable identification of very low-frequency variants. This enabled detection of a spike protein deletion present in many samples at low frequency and associated with a variant of concern.
Affiliation(s)
- Jeffrey Kimbrel
- Lawrence Livermore National Laboratory, Livermore, CA 94550, USA
- Joseph Moon
- Lawrence Livermore National Laboratory, Livermore, CA 94550, USA
- James Thissen
- Lawrence Livermore National Laboratory, Livermore, CA 94550, USA
- Nisha Mulakken
- Lawrence Livermore National Laboratory, Livermore, CA 94550, USA
- Tyshawn Ferrell
- Lawrence Livermore National Laboratory, Livermore, CA 94550, USA
- Department of Biochemistry, Emory University School of Medicine, Atlanta, GA 30322, USA
- Chris Daum
- Lawrence Berkeley National Laboratory, US Department of Energy Joint Genome Institute, Berkeley, CA 94720, USA
- Sara Hall
- Lawrence Livermore National Laboratory, Livermore, CA 94550, USA
- Brent Segelke
- Lawrence Livermore National Laboratory, Livermore, CA 94550, USA
- Sharon Messenger
- Viral and Rickettsial Disease Laboratory, California Department of Public Health, Richmond, CA 94804, USA
- Debra A. Wadford
- Viral and Rickettsial Disease Laboratory, California Department of Public Health, Richmond, CA 94804, USA
- Crystal Jaing
- Lawrence Livermore National Laboratory, Livermore, CA 94550, USA