51
Werren EA, Peirent ER, Jantti H, Guxholli A, Srivastava KR, Orenstein N, Narayanan V, Wiszniewski W, Dawidziuk M, Gawlinski P, Umair M, Khan A, Khan SN, Geneviève D, Lehalle D, van Gassen KLI, Giltay JC, Oegema R, van Jaarsveld RH, Rafiullah R, Rappold GA, Rabin R, Pappas JG, Wheeler MM, Bamshad MJ, Tsan YC, Johnson MB, Keegan CE, Srivastava A, Bielas SL. Biallelic variants in CSMD1 are implicated in a neurodevelopmental disorder with intellectual disability and variable cortical malformations. Cell Death Dis 2024; 15:379. [PMID: 38816421 PMCID: PMC11140003 DOI: 10.1038/s41419-024-06768-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2023] [Revised: 05/03/2024] [Accepted: 05/22/2024] [Indexed: 06/01/2024]
Abstract
CSMD1 (Cub and Sushi Multiple Domains 1) is a well-recognized regulator of the complement cascade, an important component of the innate immune response. CSMD1 is highly expressed in the central nervous system (CNS) where emergent functions of the complement pathway modulate neural development and synaptic activity. While CSMD1 is a genetic risk factor for neuropsychiatric disorders, its role in neurodevelopmental disorders is unclear. Through international variant sharing, we identified inherited biallelic CSMD1 variants in eight individuals from six families of diverse ancestry who present with global developmental delay, intellectual disability, microcephaly, and polymicrogyria. We modeled CSMD1 loss-of-function (LOF) pathogenesis in early-stage forebrain organoids differentiated from CSMD1 knockout human embryonic stem cells (hESCs). We show that CSMD1 is necessary for neuroepithelial cytoarchitecture and synchronous differentiation. In summary, we identified a critical role for CSMD1 in brain development and biallelic CSMD1 variants as the molecular basis of a previously undefined neurodevelopmental disorder.
Affiliation(s)
- Elizabeth A Werren
  - Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
  - Advanced Precision Medicine Laboratory, The Jackson Laboratory for Genomic Medicine, Farmington, CT, 06032, USA
- Emily R Peirent
  - Neuroscience Graduate Program, University of Michigan, Ann Arbor, MI, 48109, USA
- Henna Jantti
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Alba Guxholli
  - Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
  - Department of Pediatrics, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
- Kinshuk Raj Srivastava
  - Medicinal and Process Chemistry Division, CSIR-Central Drug Research Institute, Lucknow, 226031, India
- Naama Orenstein
  - Schneider Children's Medical Center of Israel, Petah Tikva, 4920235, Israel
- Vinodh Narayanan
  - Center for Rare Childhood Disorders, Translational Genomics Research Institute, Phoenix, AZ, 85004, USA
- Wojciech Wiszniewski
  - Department of Molecular and Medical Genetics, Oregon Health and Science University, Portland, OR, 97239, USA
- Mateusz Dawidziuk
  - Department of Medical Genetics, Institute of Mother and Child, Warsaw, 01-211, Poland
- Pawel Gawlinski
  - Department of Medical Genetics, Institute of Mother and Child, Warsaw, 01-211, Poland
- Muhammad Umair
  - Medical Genomics Research Department, King Abdullah International Medical Research Center, King Saud Bin Abdulaziz University for Health Sciences, Ministry of National Guard Health Affairs, Riyadh, 11481, Saudi Arabia
  - Department of Life Sciences, School of Science, University of Management and Technology, Lahore, Punjab, 54770, Pakistan
- Amjad Khan
  - Department of Molecular and Medical Genetics, Oregon Health and Science University, Portland, OR, 97239, USA
  - Department of Zoology, University of Lakki Marwat, Lakki Marwat, Khyber Pakhtunkhwa, 28420, Pakistan
- Shahid Niaz Khan
  - Department of Zoology, Kohat University of Science and Technology, Kohat, Pakistan
- David Geneviève
  - Montpellier University, Inserm Unit U1183, Reference Center for Rare Diseases and Developmental Anomalies, CHU, 34000, Montpellier, France
- Daphné Lehalle
  - Sorbonne University, Department of Medical Genetics, Hospital Armand Trousseau, 75012, Paris, France
- K L I van Gassen
  - Department of Genetics, University Medical Centre Utrecht, Utrecht University, Utrecht, 3584 EA, The Netherlands
- Jacques C Giltay
  - Department of Genetics, University Medical Centre Utrecht, Utrecht University, Utrecht, 3584 EA, The Netherlands
- Renske Oegema
  - Department of Genetics, University Medical Centre Utrecht, Utrecht University, Utrecht, 3584 EA, The Netherlands
- Richard H van Jaarsveld
  - Department of Genetics, University Medical Centre Utrecht, Utrecht University, Utrecht, 3584 EA, The Netherlands
- Rafiullah Rafiullah
  - Department of Biotechnology, Faculty of Life Sciences, BUITEMS, Quetta, 87300, Pakistan
- Gudrun A Rappold
  - Department of Human Molecular Genetics, Institute of Human Genetics, Ruprecht-Karls-University, Heidelberg, 69120, Germany
- Rachel Rabin
  - Department of Pediatrics, NYU Grossman School of Medicine, New York, NY, 10016, USA
- John G Pappas
  - Department of Pediatrics, NYU Grossman School of Medicine, New York, NY, 10016, USA
- Marsha M Wheeler
  - Department of Genome Sciences, University of Washington, Seattle, WA, 98195, USA
- Michael J Bamshad
  - Department of Pediatrics, University of Washington, Seattle, WA, 98195, USA
  - Brotman Baty Institute, Washington, 98195, USA
- Yao-Chang Tsan
  - Division of Cardiovascular Medicine, University of Michigan, Ann Arbor, MI, 48109, USA
- Matthew B Johnson
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Catherine E Keegan
  - Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
  - Department of Pediatrics, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
- Anshika Srivastava
  - Department of Medical Genetics, Sanjay Gandhi Postgraduate Institute of Medical Sciences, Lucknow, Uttar Pradesh, 226014, India
- Stephanie L Bielas
  - Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
  - Department of Pediatrics, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
52
Rao J, Xie J, Yuan Q, Liu D, Wang Z, Lu Y, Zheng S, Yang Y. A variational expectation-maximization framework for balanced multi-scale learning of protein and drug interactions. Nat Commun 2024; 15:4476. [PMID: 38796523 PMCID: PMC11530528 DOI: 10.1038/s41467-024-48801-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2023] [Accepted: 05/14/2024] [Indexed: 05/28/2024] Open
Abstract
Protein functions are characterized by interactions with proteins, drugs, and other biomolecules. Understanding these interactions is essential for deciphering the molecular mechanisms underlying biological processes and developing new therapeutic strategies. Current computational methods mostly predict interactions based on either molecular network or structural information, without integrating them within a unified multi-scale framework. While a few multi-view learning methods are devoted to fusing the multi-scale information, these methods tend to rely heavily on a single scale while under-fitting the others, likely owing to the imbalanced nature and inherent greediness of multi-scale learning. To alleviate the optimization imbalance, we present MUSE, a multi-scale representation learning framework based on variational expectation-maximization that optimizes different scales in an alternating procedure over multiple iterations. This strategy efficiently fuses multi-scale information between atomic structure and molecular network scale through mutual supervision and iterative optimization. MUSE outperforms the current state-of-the-art models not only in molecular interaction (protein-protein, drug-protein, and drug-drug) tasks but also in protein interface prediction at the atomic structure scale. More importantly, the multi-scale learning framework shows potential for extension to other scales of computational drug discovery.
Affiliation(s)
- Jiahua Rao
  - School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
- Jiancong Xie
  - School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
- Qianmu Yuan
  - School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
- Deqin Liu
  - School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
- Zhen Wang
  - School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
- Yutong Lu
  - School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
- Shuangjia Zheng
  - Global Institute of Future Technology, Shanghai Jiao Tong University, Shanghai, China
- Yuedong Yang
  - School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
  - Key Laboratory of Machine Intelligence and Advanced Computing (MOE), Sun Yat-sen University, Guangzhou, China
  - State Key Laboratory of Oncology in South China, Sun Yat-sen University, Guangzhou, China
53
Catoiu EA, Mih N, Lu M, Palsson B. Establishing comprehensive quaternary structural proteomes from genome sequence. bioRxiv 2024:2024.04.24.590993. [PMID: 38712217 PMCID: PMC11071507 DOI: 10.1101/2024.04.24.590993] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/08/2024]
Abstract
A critical body of knowledge has developed through advances in protein microscopy, protein-fold modeling, structural biology software, availability of sequenced bacterial genomes, large-scale mutation databases, and genome-scale models. Based on these recent advances, we develop a computational framework that: i) identifies the oligomeric structural proteome encoded by an organism's genome from available structural resources; ii) maps multi-strain alleleomic variation, resulting in the structural proteome for a species; and iii) calculates the 3D orientation of proteins across subcellular compartments with residue-level precision. Using the platform, we: iv) compute the quaternary E. coli K-12 MG1655 structural proteome; v) use a dataset of 12,000 mutations to build Random Forest classifiers that can predict the severity of mutations; and, in combination with a genome-scale model that computes proteome allocation, vi) obtain the spatial allocation of the E. coli proteome. Thus, in conjunction with relevant datasets and increasingly accurate computational models, we can now annotate quaternary structural proteomes, at genome-scale, to obtain a molecular-level understanding of whole-cell functions.
Significance
Advancements in experimental and computational methods have revealed the shapes of multi-subunit proteins. The absence of a unified platform that maps actionable datatypes onto these increasingly accurate structures creates a barrier to structural analyses, especially at the genome-scale. Here, we describe QSPACE, a computational annotation platform that evaluates existing resources to identify the best-available structure for each protein in a user's query, maps the 3D location of actionable datatypes (e.g., active sites, published mutations) onto the selected structures, and uses third-party APIs to determine the subcellular compartment of all amino acids of a protein. As proof-of-concept, we deployed QSPACE to generate the quaternary structural proteome of E. coli MG1655 and demonstrate two use-cases involving large-scale mutant analysis and genome-scale modelling.
54
Wilson E, Cava JK, Chowell D, Raja R, Mangalaparthi KK, Pandey A, Curtis M, Anderson KS, Singharoy A. The electrostatic landscape of MHC-peptide binding revealed using inception networks. Cell Syst 2024; 15:362-373.e7. [PMID: 38554709 DOI: 10.1016/j.cels.2024.03.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2023] [Revised: 11/24/2023] [Accepted: 03/05/2024] [Indexed: 04/02/2024]
Abstract
Predictive modeling of macromolecular recognition and protein-protein complementarity represents one of the cornerstones of biophysical sciences. However, such models are often hindered by the combinatorial complexity of interactions at the molecular interfaces. Exemplary of this problem is peptide presentation by the highly polymorphic major histocompatibility complex class I (MHC-I) molecule, a principal component of immune recognition. We developed human leukocyte antigen (HLA)-Inception, a deep biophysical convolutional neural network, which integrates molecular electrostatics to capture non-bonded interactions for predicting peptide binding motifs across 5,821 MHC-I alleles. These predictions of generated motifs correlate strongly with experimental peptide binding and presentation data. Beyond molecular interactions, the study demonstrates the application of predicted motifs in analyzing MHC-I allele associations with HIV disease progression and patient response to immune checkpoint inhibitors. A record of this paper's transparent peer review process is included in the supplemental information.
Affiliation(s)
- Eric Wilson
  - School of Molecular Sciences, Arizona State University, Tempe, AZ 85207, USA
  - The Precision Immunology Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- John Kevin Cava
  - School of Computing and Augmented Intelligence, Arizona State University, Tempe, AZ 85207, USA
- Diego Chowell
  - The Precision Immunology Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Remya Raja
  - Department of Immunology, Mayo Clinic, Scottsdale, AZ 85259, USA
- Kiran K Mangalaparthi
  - Department of Laboratory Medicine and Pathology, Mayo Clinic, 200 First St SW, Rochester, MN 55905, USA
- Akhilesh Pandey
  - Department of Laboratory Medicine and Pathology, Mayo Clinic, 200 First St SW, Rochester, MN 55905, USA
  - Center for Individualized Medicine, Mayo Clinic, 200 First St SW, Rochester, MN 55905, USA
  - Manipal Academy of Higher Education, Manipal 576104, Karnataka, India
- Marion Curtis
  - Department of Immunology, Mayo Clinic, Scottsdale, AZ 85259, USA
  - College of Medicine and Science, Mayo Clinic, Scottsdale, AZ 85259, USA
  - Department of Cancer Biology, Mayo Clinic, Scottsdale, AZ 85259, USA
- Karen S Anderson
  - School of Life Sciences, Arizona State University, Tempe, AZ 85207, USA
- Abhishek Singharoy
  - School of Molecular Sciences, Arizona State University, Tempe, AZ 85207, USA
55
Yuan Q, Tian C, Yang Y. Genome-scale annotation of protein binding sites via language model and geometric deep learning. eLife 2024; 13:RP93695. [PMID: 38630609 PMCID: PMC11023698 DOI: 10.7554/elife.93695] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/19/2024] Open
Abstract
Revealing protein binding sites with other molecules, such as nucleic acids, peptides, or small ligands, sheds light on disease mechanism elucidation and novel drug design. With the explosive growth of proteins in sequence databases, how to accurately and efficiently identify these binding sites from sequences becomes essential. However, current methods mostly rely on expensive multiple sequence alignments or experimental protein structures, limiting their genome-scale applications. Besides, these methods haven't fully explored the geometry of the protein structures. Here, we propose GPSite, a multi-task network for simultaneously predicting binding residues of DNA, RNA, peptide, protein, ATP, HEM, and metal ions on proteins. GPSite was trained on informative sequence embeddings and predicted structures from protein language models, while comprehensively extracting residual and relational geometric contexts in an end-to-end manner. Experiments demonstrate that GPSite substantially surpasses state-of-the-art sequence-based and structure-based approaches on various benchmark datasets, even when the structures are not well-predicted. The low computational cost of GPSite enables rapid genome-scale binding residue annotations for over 568,000 sequences, providing opportunities to unveil unexplored associations of binding sites with molecular functions, biological processes, and genetic variants. The GPSite webserver and annotation database can be freely accessed at https://bio-web1.nscc-gz.cn/app/GPSite.
Affiliation(s)
- Qianmu Yuan
  - School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
- Chong Tian
  - School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
- Yuedong Yang
  - School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
56
Alexander MP, Zaidi M, Larson N, Mullan A, Pavelko KD, Stegall MD, Bentall A, Wouters BG, McKee T, Taner T. Exploring the single-cell immune landscape of kidney allograft inflammation using imaging mass cytometry. Am J Transplant 2024; 24:549-563. [PMID: 37979921 DOI: 10.1016/j.ajt.2023.11.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2023] [Revised: 11/01/2023] [Accepted: 11/11/2023] [Indexed: 11/20/2023]
Abstract
Kidney allograft inflammation, mostly attributed to rejection and infection, is an important cause of graft injury and loss. Standard histopathological assessment of allograft inflammation provides limited insights into biological processes and the immune landscape. Here, using imaging mass cytometry with a panel of 28 validated biomarkers, we explored the single-cell landscape of kidney allograft inflammation in 32 kidney transplant biopsies and 247 high-dimensional histopathology images of various phenotypes of allograft inflammation (antibody-mediated rejection, T cell-mediated rejection, BK nephropathy, and chronic pyelonephritis). Using novel analytical tools for cell segmentation, we segmented over 900 000 cells and developed a tissue-based classifier using over 3000 manually annotated kidney microstructures (glomeruli, tubules, interstitium, and arteries). Using PhenoGraph, we identified 11 immune and 9 nonimmune clusters and found a high prevalence of memory T cell and macrophage-enriched immune populations across phenotypes. Additionally, we trained a machine learning classifier to identify spatial biomarkers that could discriminate between the different allograft inflammatory phenotypes. Further validation of imaging mass cytometry in larger cohorts and with more biomarkers will likely help interrogate kidney allograft inflammation in more depth than has been possible to date.
Affiliation(s)
- Mariam P Alexander
  - Department of Pathology and Laboratory Medicine, Mayo Clinic, Rochester, Minnesota, USA
- Mark Zaidi
  - Department of Medical Biophysics, University of Toronto, Canada
- Nicholas Larson
  - Division of Clinical Trials and Biostatistics, Department of Quantitative Health Sciences, Mayo Clinic, Rochester, Minnesota, USA
- Aidan Mullan
  - Division of Clinical Trials and Biostatistics, Department of Quantitative Health Sciences, Mayo Clinic, Rochester, Minnesota, USA
- Kevin D Pavelko
  - Immune Monitoring Core Laboratory, Mayo Clinic, Rochester, Minnesota, USA
- Mark D Stegall
  - Departments of Surgery and Immunology, Mayo Clinic, Rochester, Minnesota, USA
- Andrew Bentall
  - Division of Nephrology and Hypertension, Mayo Clinic, Rochester, Minnesota, USA
- Bradly G Wouters
  - Department of Medical Biophysics, University of Toronto, Canada
  - Princess Margaret Cancer Center, University Health Network, University of Toronto, Toronto, Ontario, Canada
- Trevor McKee
  - Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, Ontario, Canada
  - Pathomics Inc., Toronto, Ontario, Canada
- Timucin Taner
  - Departments of Surgery and Immunology, Mayo Clinic, Rochester, Minnesota, USA
57
Jia P, Zhang F, Wu C, Li M. A comprehensive review of protein-centric predictors for biomolecular interactions: from proteins to nucleic acids and beyond. Brief Bioinform 2024; 25:bbae162. [PMID: 38739759 PMCID: PMC11089422 DOI: 10.1093/bib/bbae162] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/01/2024] [Revised: 02/17/2024] [Accepted: 03/31/2024] [Indexed: 05/16/2024] Open
Abstract
Proteins interact with diverse ligands to perform a large number of biological functions, such as gene expression and signal transduction. Accurate identification of these protein-ligand interactions is crucial to the understanding of molecular mechanisms and the development of new drugs. However, traditional biological experiments are time-consuming and expensive. With the development of high-throughput technologies, an increasing amount of protein data is available. In the past decades, many computational methods have been developed to predict protein-ligand interactions. Here, we review a comprehensive set of over 160 protein-ligand interaction predictors, which cover protein-protein, protein-nucleic acid, protein-peptide and protein-other ligands (nucleotide, heme, ion) interactions. We have carried out a comprehensive analysis of the above four types of predictors from several significant perspectives, including their inputs, feature profiles, models, availability, etc. The current methods primarily rely on protein sequences, especially utilizing evolutionary information. The significant improvement in predictions is attributed to deep learning methods. Additionally, sequence-based pretrained models and structure-based approaches are emerging as new trends.
Affiliation(s)
- Pengzhen Jia
  - School of Computer Science and Engineering, Central South University, 932 Lushan Road(S), Changsha 410083, China
- Fuhao Zhang
  - School of Computer Science and Engineering, Central South University, 932 Lushan Road(S), Changsha 410083, China
  - College of Information Engineering, Northwest A&F University, No. 3 Taicheng Road, Yangling, Shaanxi 712100, China
- Chaojin Wu
  - School of Computer Science and Engineering, Central South University, 932 Lushan Road(S), Changsha 410083, China
- Min Li
  - School of Computer Science and Engineering, Central South University, 932 Lushan Road(S), Changsha 410083, China
58
Martini C, Araba V, Beniani M, Armoa Ortiz P, Simmons M, Chalbi M, Mellouk A, El Bakkouri M, Calmettes C. Unraveling the crystal structure of the HpaA adhesin: insights into cell adhesion function and epitope localization of a Helicobacter pylori vaccine candidate. mBio 2024; 15:e0295223. [PMID: 38376163 PMCID: PMC10936181 DOI: 10.1128/mbio.02952-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/30/2023] [Accepted: 01/26/2024] [Indexed: 02/21/2024] Open
Abstract
Helicobacter pylori is a bacterium that exhibits strict host restriction to humans and non-human primates, and the bacterium is widely acknowledged as a significant etiological factor in the development of chronic gastritis, peptic ulcers, and gastric cancers. The pathogenic potential of this organism lies in its adeptness at colonizing the gastric mucosa, which is facilitated by a diverse repertoire of virulence factors, including adhesins that promote the attachment of the bacteria to the gastric epithelium. Among these adhesins, HpaA stands out due to its conserved nature and pivotal role in establishing H. pylori colonization. Moreover, this lipoprotein holds promise as an antigen for the development of effective H. pylori vaccines, thus attracting considerable attention for in-depth investigations into its molecular function and identification of binding determinants. Here, we present the elucidation of the crystallographic structure of HpaA at 2.9 Å resolution. The folding adopts an elongated protein shape, which is distinctive to the Helicobacteraceae family, and features an apical domain extension that plays a critical role in the cell-adhesion activity on gastric epithelial cells. Our study also demonstrates the ability of HpaA to induce TNF-α expression in macrophages, highlighting a novel role as an immunoregulatory effector promoting the pro-inflammatory response in vitro. These findings not only contribute to a deeper comprehension of the multifaceted role of HpaA in H. pylori pathogenesis but also establish a fundamental basis for the design and development of structure-based derivatives, aimed at enhancing the efficacy of H. pylori vaccines.
IMPORTANCE
Helicobacter pylori is a bacterium that can cause chronic gastritis, peptic ulcers, and gastric cancers. The bacterium adheres to the lining of the stomach using proteins called adhesins. One of these proteins, HpaA, is particularly important for H. pylori colonization and is considered a promising vaccine candidate against H. pylori infections. In this work, we determined the atomic structure of HpaA, identifying a protein fold characteristic of the Helicobacter family and delineating specific amino acids that are crucial to support the attachment to the gastric cells. Additionally, we discovered that HpaA can trigger the production of TNF-α, a proinflammatory molecule, in macrophages. These findings provide valuable insights into how H. pylori causes disease and suggest that HpaA has a dual role in both attachment and immune activation. This knowledge could contribute to the development of improved vaccine strategies for preventing H. pylori infections.
Affiliation(s)
- Cyrielle Martini
  - Institut National de la Recherche Scientifique (INRS), Centre Armand Frappier Santé Biotechnologie, Institut Pasteur International Network, Laval, Québec, Canada
- Victoria Araba
  - Institut National de la Recherche Scientifique (INRS), Centre Armand Frappier Santé Biotechnologie, Institut Pasteur International Network, Laval, Québec, Canada
- Meriem Beniani
  - Institut National de la Recherche Scientifique (INRS), Centre Armand Frappier Santé Biotechnologie, Institut Pasteur International Network, Laval, Québec, Canada
- Paula Armoa Ortiz
  - Institut National de la Recherche Scientifique (INRS), Centre Armand Frappier Santé Biotechnologie, Institut Pasteur International Network, Laval, Québec, Canada
- Mimi Simmons
  - National Research Council of Canada (NRC), Human Health Therapeutics Research Center, Montréal, Québec, Canada
- Mariem Chalbi
  - Institut National de la Recherche Scientifique (INRS), Centre Armand Frappier Santé Biotechnologie, Institut Pasteur International Network, Laval, Québec, Canada
- Abdelkader Mellouk
  - Institut National de la Recherche Scientifique (INRS), Centre Armand Frappier Santé Biotechnologie, Institut Pasteur International Network, Laval, Québec, Canada
- Majida El Bakkouri
  - National Research Council of Canada (NRC), Human Health Therapeutics Research Center, Montréal, Québec, Canada
- Charles Calmettes
  - Institut National de la Recherche Scientifique (INRS), Centre Armand Frappier Santé Biotechnologie, Institut Pasteur International Network, Laval, Québec, Canada
  - PROTEO, the Quebec Network for Research on Protein Function, Structure, and Engineering, Québec city, Québec, Canada
59
Tu G, Fu T, Zheng G, Xu B, Gou R, Luo D, Wang P, Xue W. Computational Chemistry in Structure-Based Solute Carrier Transporter Drug Design: Recent Advances and Future Perspectives. J Chem Inf Model 2024; 64:1433-1455. [PMID: 38294194 DOI: 10.1021/acs.jcim.3c01736] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/01/2024]
Abstract
Solute carrier transporters (SLCs) are a class of important transmembrane proteins that are involved in the transportation of diverse solute ions and small molecules into cells. There are approximately 450 SLCs within the human body, and more than a quarter of them are emerging as attractive therapeutic targets for multiple complex diseases, e.g., depression, cancer, and diabetes. However, only 44 unique transporters (∼9.8% of the SLC superfamily) with 3D structures and specific binding sites have been reported. To design innovative and effective drugs targeting diverse SLCs, there are a number of obstacles that need to be overcome. Computational chemistry, including physics-based molecular modeling and machine learning- and deep learning-based artificial intelligence (AI), provides an alternative and complementary way to the classical drug discovery approach. Here, we present a comprehensive overview of recent advances and existing challenges of the computational techniques in structure-based drug design of SLCs from three main aspects: (i) characterizing multiple conformations of the proteins during the functional process of transportation, (ii) identifying druggable sites, especially the cryptic allosteric ones, on the transporters for substrate and drug binding, and (iii) discovering diverse small molecules or synthetic protein binders targeting the binding sites. This work is expected to provide guidelines for a deep understanding of the structure and function of the SLC superfamily to facilitate rational design of novel modulators of the transporters with the aid of state-of-the-art computational chemistry technologies including artificial intelligence.
Affiliation(s)
- Gao Tu
  - Chongqing Key Laboratory of Natural Product Synthesis and Drug Research, School of Pharmaceutical Sciences, Chongqing University, Chongqing 401331, China
- Tingting Fu
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Binbin Xu
  - Chengdu Sintanovo Biotechnology Co., Ltd., Chengdu 610200, China
- Rongpei Gou
  - Chongqing Key Laboratory of Natural Product Synthesis and Drug Research, School of Pharmaceutical Sciences, Chongqing University, Chongqing 401331, China
- Ding Luo
  - Chongqing Key Laboratory of Natural Product Synthesis and Drug Research, School of Pharmaceutical Sciences, Chongqing University, Chongqing 401331, China
- Panpan Wang
  - College of Chemistry and Pharmaceutical Engineering, Huanghuai University, Zhumadian 463000, China
- Weiwei Xue
  - Chongqing Key Laboratory of Natural Product Synthesis and Drug Research, School of Pharmaceutical Sciences, Chongqing University, Chongqing 401331, China
60
Sagendorf JM, Mitra R, Huang J, Chen XS, Rohs R. PNAbind: Structure-based prediction of protein-nucleic acid binding using graph neural networks. bioRxiv 2024:2024.02.27.582387. [PMID: 38529493 PMCID: PMC10962711 DOI: 10.1101/2024.02.27.582387] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/27/2024]
Abstract
The recognition and binding of nucleic acids (NAs) by proteins depends upon complementary chemical, electrostatic and geometric properties of the protein-NA binding interface. Structural models of protein-NA complexes provide insights into these properties but are scarce relative to models of unbound proteins. We present a deep learning approach for predicting protein-NA binding given the apo structure of a protein (PNAbind). Our method utilizes graph neural networks to encode spatial distributions of physicochemical and geometric properties of the protein molecular surface that are predictive of NA binding. Using global physicochemical encodings, our models predict the overall binding function of a protein and can discriminate between specificity for DNA or RNA binding. We show that such predictions made on protein structures modeled with AlphaFold2 can be used to gain mechanistic understanding of chemical and structural features that determine NA recognition. Using local encodings, our models predict the location of NA binding sites at the level of individual binding residues. Binding site predictions were validated against benchmark datasets, achieving AUROC scores in the range of 0.92-0.95. We applied our models to the HIV-1 restriction factor APOBEC3G and show that our predictions are consistent with experimental RNA binding data.
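For readers unfamiliar with the metric quoted above, AUROC can be read as the probability that a randomly chosen binding residue receives a higher score than a randomly chosen non-binding residue. The following is a minimal sketch of that pairwise definition, not the PNAbind evaluation code; the function name `auroc` and the labels and scores are made-up illustrations.

```python
def auroc(labels, scores):
    """Pairwise (Mann-Whitney) form of AUROC: the fraction of positive/negative
    pairs in which the positive is scored higher, with ties counted as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical per-residue data: 1 = binding residue, 0 = non-binding residue.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.91, 0.75, 0.40, 0.55, 0.30, 0.22, 0.10, 0.05]

value = auroc(labels, scores)
print(f"AUROC = {value:.3f}")
```

In this toy case 14 of the 15 positive/negative pairs are ordered correctly, so the score is 14/15. The quadratic pairwise loop is chosen for readability; a rank-based computation would be preferable on large benchmark sets.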
|
61
|
Waksman T, Astin E, Fisher SR, Hunter WN, Bos JIB. Computational Prediction of Structure, Function, and Interaction of Myzus persicae (Green Peach Aphid) Salivary Effector Proteins. Mol Plant Microbe Interact 2024; 37:338-346. [PMID: 38171380 DOI: 10.1094/mpmi-10-23-0154-fi] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Indexed: 01/05/2024]
Abstract
Similar to plant pathogens, phloem-feeding insects such as aphids deliver effector proteins inside their hosts that act to promote host susceptibility and enable feeding and infestation. Despite exciting progress toward identifying and characterizing effector proteins from these insects, their functions remain largely unknown. The recent groundbreaking development in protein structure prediction algorithms, combined with the availability of proteomics and transcriptomic datasets for agriculturally important pests, provides new opportunities to explore the structural and functional diversity of effector repertoires. In this study, we sought to gain insight into the infection strategy used by Myzus persicae (green peach aphid) by predicting and analyzing the structures of a set of 71 effector candidate proteins. We used two protein structure prediction methods, AlphaFold and OmegaFold, that produced mutually consistent results. We observed a wide continuous spectrum of structures among the effector candidates, from disordered proteins to globular enzymes. We made use of the structural information and state-of-the-art computational methods to predict M. persicae effector protein properties, including function and interaction with host plant proteins. Overall, our investigation provides novel insights into prediction of structure, function, and interaction of M. persicae effector proteins and will guide the necessary experimental characterization to address new hypotheses. Copyright © 2024 The Author(s). This is an open access article distributed under the CC BY-NC-ND 4.0 International license.
Affiliation(s)
- Thomas Waksman
- Division of Plant Sciences, School of Life Sciences, University of Dundee, Dundee, DD1 5EH, U.K
| | - Edmund Astin
- Division of Plant Sciences, School of Life Sciences, University of Dundee, Dundee, DD1 5EH, U.K
| | - S Ronan Fisher
- Division of Plant Sciences, School of Life Sciences, University of Dundee, Dundee, DD1 5EH, U.K
| | - William N Hunter
- Biological Chemistry and Drug Discovery, School of Life Sciences, University of Dundee, Dundee, DD1 5EH, U.K
| | - Jorunn I B Bos
- Division of Plant Sciences, School of Life Sciences, University of Dundee, Dundee, DD1 5EH, U.K
- Cell and Molecular Sciences, The James Hutton Institute, Invergowrie, Dundee, DD2 5DA, U.K
|
62
|
Emami N, Ferdousi R. HormoNet: a deep learning approach for hormone-drug interaction prediction. BMC Bioinformatics 2024; 25:87. [PMID: 38418979 PMCID: PMC10903040 DOI: 10.1186/s12859-024-05708-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/05/2023] [Accepted: 02/16/2024] [Indexed: 03/02/2024] Open
Abstract
Several lines of experimental evidence have shown that human endogenous hormones can interact with drugs in many ways and affect drug efficacy. Hormone-drug interactions (HDIs) are important for drug treatment and precision medicine; it is therefore essential to understand hormone-drug associations. Here, we present HormoNet to predict HDI pairs and their risk level by integrating features derived from hormone and drug target proteins. To the best of our knowledge, this is one of the first attempts to employ a deep learning approach for HDI prediction. Amino acid composition and pseudo amino acid composition were applied to represent target information using 30 physicochemical and conformational properties of the proteins. To handle the imbalance problem in the data, we applied the synthetic minority over-sampling technique (SMOTE). Additionally, we constructed novel datasets for HDI prediction and the risk level of their interaction. HormoNet achieved high performance on our constructed hormone-drug benchmark datasets. The results provide insights into the relationship between a hormone and a drug, and indicate the potential benefit of reducing risk levels of interactions in designing more effective therapies for patients in drug treatments. Our benchmark datasets and the source code for HormoNet are available at: https://github.com/EmamiNeda/HormoNet .
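The synthetic minority over-sampling technique mentioned in this abstract creates new minority-class samples by interpolating between an existing minority sample and one of its nearest minority-class neighbours. The sketch below illustrates that idea with the standard library only. It is not the HormoNet implementation: the function name `smote`, the parameter `k`, and the two-dimensional toy points are assumptions made for illustration.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest minority neighbours of the base point, excluding the point itself.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to mate
        synthetic.append(tuple(b + gap * (m - b) for b, m in zip(base, mate)))
    return synthetic


# Hypothetical minority-class feature vectors (two features each).
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote(minority, n_new=6)
print(len(new_points), "synthetic samples generated")
```

Because every synthetic point is a convex combination of two real minority samples, it always lies on the segment between them, which is why the method fills in the minority region rather than duplicating existing rows.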
Affiliation(s)
- Neda Emami
- Department of Health Information Technology, School of Management and Medical Informatics, Tabriz University of Medical Sciences, Tabriz, Iran.
| | - Reza Ferdousi
- Department of Health Information Technology, School of Management and Medical Informatics, Tabriz University of Medical Sciences, Tabriz, Iran
|
63
|
Ni B, Kaplan DL, Buehler MJ. ForceGen: End-to-end de novo protein generation based on nonlinear mechanical unfolding responses using a language diffusion model. Sci Adv 2024; 10:eadl4000. [PMID: 38324676 PMCID: PMC10849601 DOI: 10.1126/sciadv.adl4000] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 10/16/2023] [Accepted: 01/08/2024] [Indexed: 02/09/2024]
Abstract
Through evolution, nature has presented a set of remarkable protein materials, including elastins, silks, keratins and collagens with superior mechanical performances that play crucial roles in mechanobiology. However, going beyond natural designs to discover proteins that meet specified mechanical properties remains challenging. Here, we report a generative model that predicts protein designs to meet complex nonlinear mechanical property-design objectives. Our model leverages deep knowledge on protein sequences from a pretrained protein language model and maps mechanical unfolding responses to create proteins. Via full-atom molecular simulations for direct validation, we demonstrate that the designed proteins are de novo, and fulfill the targeted mechanical properties, including unfolding energy and mechanical strength, as well as the detailed unfolding force-separation curves. Our model offers rapid pathways to explore the enormous mechanobiological protein sequence space unconstrained by biological synthesis, using mechanical features as the target to enable the discovery of protein materials with superior mechanical properties.
Affiliation(s)
- Bo Ni
- Laboratory for Atomistic and Molecular Mechanics (LAMM), Massachusetts Institute of Technology, 77 Massachusetts Ave., Cambridge, MA 02139, USA
| | - David L. Kaplan
- Department of Biomedical Engineering, Tufts University, Medford, MA 02155, USA
| | - Markus J. Buehler
- Laboratory for Atomistic and Molecular Mechanics (LAMM), Massachusetts Institute of Technology, 77 Massachusetts Ave., Cambridge, MA 02139, USA
- Center for Computational Science and Engineering, Schwarzman College of Computing, Massachusetts Institute of Technology, 77 Massachusetts Ave., Cambridge, MA 02139, USA
|
64
|
Høie MH, Gade FS, Johansen J, Würtzen C, Winther O, Nielsen M, Marcatili P. DiscoTope-3.0: improved B-cell epitope prediction using inverse folding latent representations. Front Immunol 2024; 15:1322712. [PMID: 38390326 PMCID: PMC10882062 DOI: 10.3389/fimmu.2024.1322712] [Citation(s) in RCA: 19] [Impact Index Per Article: 19.0] [Received: 10/16/2023] [Accepted: 01/08/2024] [Indexed: 02/24/2024] Open
Abstract
Accurate computational identification of B-cell epitopes is crucial for the development of vaccines, therapies, and diagnostic tools. However, current structure-based prediction methods face limitations due to the dependency on experimentally solved structures. Here, we introduce DiscoTope-3.0, a markedly improved B-cell epitope prediction tool that innovatively employs inverse folding structure representations and a positive-unlabelled learning strategy, and is adapted for both solved and predicted structures. Our tool demonstrates a considerable improvement in performance over existing methods, accurately predicting linear and conformational epitopes across multiple independent datasets. Most notably, DiscoTope-3.0 maintains high predictive performance across solved, relaxed and predicted structures, alleviating the need for experimental structures and extending the general applicability of accurate B-cell epitope prediction by 3 orders of magnitude. DiscoTope-3.0 is made widely accessible on two web servers, processing over 100 structures per submission, and as a downloadable package. In addition, the servers interface with RCSB and AlphaFoldDB, facilitating large-scale prediction across over 200 million cataloged proteins. DiscoTope-3.0 is available at: https://services.healthtech.dtu.dk/service.php?DiscoTope-3.0.
Affiliation(s)
- Magnus Haraldson Høie
- Department of Health Technology, Section for Bioinformatics, Technical University of Denmark (DTU), Kgs. Lyngby, Denmark
| | - Frederik Steensgaard Gade
- Department of Health Technology, Section for Bioinformatics, Technical University of Denmark (DTU), Kgs. Lyngby, Denmark
| | - Julie Maria Johansen
- Department of Health Technology, Section for Bioinformatics, Technical University of Denmark (DTU), Kgs. Lyngby, Denmark
| | - Charlotte Würtzen
- Department of Health Technology, Section for Bioinformatics, Technical University of Denmark (DTU), Kgs. Lyngby, Denmark
| | - Ole Winther
- Section for Cognitive Systems, DTU Compute, Technical University of Denmark (DTU), Kgs. Lyngby, Denmark
- Center for Genomic Medicine, Rigshospitalet (Copenhagen University Hospital), Copenhagen, Denmark
- Department of Biology, Bioinformatics Centre, University of Copenhagen, Copenhagen, Denmark
| | - Morten Nielsen
- Department of Health Technology, Section for Bioinformatics, Technical University of Denmark (DTU), Kgs. Lyngby, Denmark
| | - Paolo Marcatili
- Department of Health Technology, Section for Bioinformatics, Technical University of Denmark (DTU), Kgs. Lyngby, Denmark
|
65
|
Chu L, Ruffolo JA, Harmalkar A, Gray JJ. Flexible protein-protein docking with a multitrack iterative transformer. Protein Sci 2024; 33:e4862. [PMID: 38148272 PMCID: PMC10804679 DOI: 10.1002/pro.4862] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 07/09/2023] [Revised: 11/17/2023] [Accepted: 12/06/2023] [Indexed: 12/28/2023]
Abstract
Conventional protein-protein docking algorithms usually rely on heavy candidate sampling and reranking, but these steps are time-consuming and hinder applications that require high-throughput complex structure prediction, for example, structure-based virtual screening. Existing deep learning methods for protein-protein docking, despite being much faster, suffer from low docking success rates. In addition, they simplify the problem to assume no conformational changes within any protein upon binding (rigid docking). This assumption precludes applications when binding-induced conformational changes play a role, such as allosteric inhibition or docking from uncertain unbound model structures. To address these limitations, we present GeoDock, a multitrack iterative transformer network to predict a docked structure from separate docking partners. Unlike deep learning models for protein structure prediction that input multiple sequence alignments, GeoDock inputs just the sequences and structures of the docking partners, which suits the tasks when the individual structures are given. GeoDock is flexible at the protein residue level, allowing the prediction of conformational changes upon binding. On the Database of Interacting Protein Structures (DIPS) test set, GeoDock achieves a 43% top-1 success rate, outperforming all other tested methods. However, in the standard DIPS train/test splits, we discovered contamination of close homologs in the training set. After decontaminating the training set, the success rate is 31%. On the DB5.5 test set and a benchmark dataset of antibody-antigen complexes, GeoDock outperforms the deep learning models trained using the same dataset but falls behind most of the conventional methods and AlphaFold-Multimer. GeoDock attains an average inference speed of under 1 s on a single GPU, enabling its application in large-scale structure screening. 
Although binding-induced conformational changes are still a challenge owing to limited training and evaluation data, our architecture sets up the foundation to capture this backbone flexibility. Code and a demonstration Jupyter notebook are available at https://github.com/Graylab/GeoDock.
Affiliation(s)
- Lee‐Shin Chu
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, Maryland, USA
| | - Jeffrey A. Ruffolo
- Program in Molecular Biophysics, Johns Hopkins University, Baltimore, Maryland, USA
| | - Ameya Harmalkar
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, Maryland, USA
| | - Jeffrey J. Gray
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, Maryland, USA
- Program in Molecular Biophysics, Johns Hopkins University, Baltimore, Maryland, USA
|
66
|
Xiong D, Qiu Y, Zhao J, Zhou Y, Lee D, Gupta S, Torres M, Lu W, Liang S, Kang JJ, Eng C, Loscalzo J, Cheng F, Yu H. Structurally-informed human interactome reveals proteome-wide perturbations by disease mutations. bioRxiv 2024:2023.04.24.538110. [PMID: 37162909 PMCID: PMC10168245 DOI: 10.1101/2023.04.24.538110] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/11/2023]
Abstract
Human genome sequencing studies have identified numerous loci associated with complex diseases. However, translating human genetic and genomic findings to disease pathobiology and therapeutic discovery remains a major challenge at multiscale interactome network levels. Here, we present a deep-learning-based ensemble framework, termed PIONEER (Protein-protein InteractiOn iNtErfacE pRediction), that accurately predicts protein binding partner-specific interfaces for all known protein interactions in humans and seven other common model organisms, generating comprehensive structurally-informed protein interactomes. We demonstrate that PIONEER outperforms existing state-of-the-art methods. We further systematically validated PIONEER predictions experimentally through generating 2,395 mutations and testing their impact on 6,754 mutation-interaction pairs, confirming the high quality and validity of PIONEER predictions. We show that disease-associated mutations are enriched in PIONEER-predicted protein-protein interfaces after mapping mutations from ~60,000 germline exomes and ~36,000 somatic genomes. We identify 586 significant protein-protein interactions (PPIs) enriched with PIONEER-predicted interface somatic mutations (termed oncoPPIs) from pan-cancer analysis of ~11,000 tumor whole-exomes across 33 cancer types. We show that PIONEER-predicted oncoPPIs are significantly associated with patient survival and drug responses from both cancer cell lines and patient-derived xenograft mouse models. We identify a landscape of PPI-perturbing tumor alleles upon ubiquitination by E3 ligases, and we experimentally validate the tumorigenic KEAP1-NRF2 interface mutation p.Thr80Lys in non-small cell lung cancer. 
We show that PIONEER-predicted PPI-perturbing alleles alter protein abundance and correlate with drug responses and patient survival in colon and uterine cancers, as demonstrated by proteogenomic data from the National Cancer Institute's Clinical Proteomic Tumor Analysis Consortium. PIONEER, implemented as both a web server platform and a software package, identifies functional consequences of disease-associated alleles and offers a deep learning tool for precision medicine at multiscale interactome network levels.
Affiliation(s)
- Dapeng Xiong
- Department of Computational Biology, Cornell University, Ithaca, NY 14853, USA
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, USA
- Center for Innovative Proteomics, Cornell University, Ithaca, NY 14853, USA
| | - Yunguang Qiu
- Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA
| | - Junfei Zhao
- Department of Systems Biology, Herbert Irving Comprehensive Center, Columbia University, New York, NY 10032, USA
| | - Yadi Zhou
- Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA
| | - Dongjin Lee
- Department of Computational Biology, Cornell University, Ithaca, NY 14853, USA
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, USA
| | - Shobhita Gupta
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, USA
- Center for Innovative Proteomics, Cornell University, Ithaca, NY 14853, USA
- Biophysics Program, Cornell University, Ithaca, NY 14853, USA
| | - Mateo Torres
- Department of Computational Biology, Cornell University, Ithaca, NY 14853, USA
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, USA
- Center for Innovative Proteomics, Cornell University, Ithaca, NY 14853, USA
| | - Weiqiang Lu
- Shanghai Key Laboratory of Regulatory Biology, Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai 200241, China
| | - Siqi Liang
- Department of Computational Biology, Cornell University, Ithaca, NY 14853, USA
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, USA
| | - Jin Joo Kang
- Department of Computational Biology, Cornell University, Ithaca, NY 14853, USA
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, USA
- Center for Innovative Proteomics, Cornell University, Ithaca, NY 14853, USA
| | - Charis Eng
- Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA
- Department of Molecular Medicine, Cleveland Clinic Lerner College of Medicine, Case Western Reserve University, Cleveland, OH 44195, USA
- Case Comprehensive Cancer Center, Case Western Reserve University School of Medicine, Cleveland, OH 44106, USA
| | - Joseph Loscalzo
- Channing Division of Network Medicine, Division of Cardiovascular Medicine, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115, USA
| | - Feixiong Cheng
- Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA
- Department of Molecular Medicine, Cleveland Clinic Lerner College of Medicine, Case Western Reserve University, Cleveland, OH 44195, USA
- Case Comprehensive Cancer Center, Case Western Reserve University School of Medicine, Cleveland, OH 44106, USA
| | - Haiyuan Yu
- Department of Computational Biology, Cornell University, Ithaca, NY 14853, USA
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, USA
- Center for Innovative Proteomics, Cornell University, Ithaca, NY 14853, USA
|
67
|
Zheng L, Shi S, Sun X, Lu M, Liao Y, Zhu S, Zhang H, Pan Z, Fang P, Zeng Z, Li H, Li Z, Xue W, Zhu F. MoDAFold: a strategy for predicting the structure of missense mutant protein based on AlphaFold2 and molecular dynamics. Brief Bioinform 2024; 25:bbae006. [PMID: 38305456 PMCID: PMC10835750 DOI: 10.1093/bib/bbae006] [Citation(s) in RCA: 17] [Impact Index Per Article: 17.0] [Received: 11/22/2023] [Revised: 12/26/2023] [Accepted: 01/01/2024] [Indexed: 02/03/2024] Open
Abstract
Protein structure prediction is a longstanding issue crucial for identifying new drug targets and providing a mechanistic understanding of protein functions. To enhance the progress in this field, a spectrum of computational methodologies has been cultivated. AlphaFold2 has exhibited exceptional precision in predicting wild-type protein structures, with performance exceeding that of other methods. However, predicting the structures of missense mutant proteins using AlphaFold2 remains challenging due to the intricate and substantial structural alterations caused by minor sequence variations in the mutant proteins. Molecular dynamics (MD) has been validated for precisely capturing changes in amino acid interactions attributed to protein mutations. Therefore, for the first time, a strategy entitled 'MoDAFold' was proposed to improve the accuracy and reliability of missense mutant protein structure prediction by combining AlphaFold2 with MD. Multiple case studies have confirmed the superior performance of MoDAFold compared to other methods, particularly AlphaFold2.
Affiliation(s)
- Lingyan Zheng
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Industry Solutions Research and Development, Alibaba Cloud Computing, Hangzhou 330110, China
| | - Shuiyang Shi
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
| | - Xiuna Sun
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Industry Solutions Research and Development, Alibaba Cloud Computing, Hangzhou 330110, China
| | - Mingkun Lu
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Industry Solutions Research and Development, Alibaba Cloud Computing, Hangzhou 330110, China
| | - Yang Liao
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
| | - Sisi Zhu
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou 311121, China
| | - Hongning Zhang
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
| | - Ziqi Pan
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
| | - Pan Fang
- Industry Solutions Research and Development, Alibaba Cloud Computing, Hangzhou 330110, China
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
| | - Zhenyu Zeng
- Industry Solutions Research and Development, Alibaba Cloud Computing, Hangzhou 330110, China
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
| | - Honglin Li
- School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
| | - Zhaorong Li
- Industry Solutions Research and Development, Alibaba Cloud Computing, Hangzhou 330110, China
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
| | - Weiwei Xue
- School of Pharmaceutical Sciences, Chongqing University, Chongqing 401331, China
| | - Feng Zhu
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Industry Solutions Research and Development, Alibaba Cloud Computing, Hangzhou 330110, China
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
|
68
|
Bravi B. Development and use of machine learning algorithms in vaccine target selection. NPJ Vaccines 2024; 9:15. [PMID: 38242890 PMCID: PMC10798987 DOI: 10.1038/s41541-023-00795-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/04/2023] [Accepted: 12/07/2023] [Indexed: 01/21/2024] Open
Abstract
Computer-aided discovery of vaccine targets has become a cornerstone of rational vaccine design. In this article, I discuss how Machine Learning (ML) can inform and guide key computational steps in rational vaccine design concerned with the identification of B and T cell epitopes and correlates of protection. I provide examples of ML models, as well as types of data and predictions for which they are built. I argue that interpretable ML has the potential to improve the identification of immunogens also as a tool for scientific discovery, by helping elucidate the molecular processes underlying vaccine-induced immune responses. I outline the limitations and challenges in terms of data availability and method development that need to be addressed to bridge the gap between advances in ML predictions and their translational application to vaccine design.
Affiliation(s)
- Barbara Bravi
- Department of Mathematics, Imperial College London, London, SW7 2AZ, UK.
|
69
|
Zhang S, Han J, Liu J. Protein-protein and protein-nucleic acid binding site prediction via interpretable hierarchical geometric deep learning. Gigascience 2024; 13:giae080. [PMID: 39484977 PMCID: PMC11528319 DOI: 10.1093/gigascience/giae080] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/02/2024] [Revised: 08/29/2024] [Accepted: 09/25/2024] [Indexed: 11/03/2024] Open
Abstract
Identification of protein-protein and protein-nucleic acid binding sites provides insights into biological processes related to protein functions and technical guidance for disease diagnosis and drug design. However, accurate prediction by computational approaches remains highly challenging due to limited knowledge of residue binding patterns. The binding pattern of a residue should be characterized by the spatial distribution of its neighboring residues combined with their physicochemical information interaction, which previous methods have not yet achieved. Here, we design GraphRBF, a hierarchical geometric deep learning model to learn residue binding patterns from big data. To achieve this, GraphRBF describes physicochemical information interactions by designing an enhanced graph neural network and characterizes residue spatial distributions by introducing a prioritized radial basis function neural network. After training and testing, GraphRBF shows great improvements over existing state-of-the-art methods and strong interpretability of its learned representations. Applied to the SARS-CoV-2 Omicron spike protein, GraphRBF successfully identifies known epitopes of the protein. Moreover, it predicts multiple potential binding regions for new nanobodies or even new drugs with strong evidence. A user-friendly online server for GraphRBF is freely available at http://liulab.top/GraphRBF/server.
Affiliation(s)
- Shizhuo Zhang
- School of Mathematics and Statistics, Shandong University (Weihai), Weihai 264209, China
| | - Jiyun Han
- School of Mathematics and Statistics, Shandong University (Weihai), Weihai 264209, China
| | - Juntao Liu
- School of Mathematics and Statistics, Shandong University (Weihai), Weihai 264209, China
|
70
|
Zhang W, Chen K, Zhang L, Zhang X, Zhu B, Lv N, Mi K. The impact of global warming on the signature virulence gene, thermolabile hemolysin, of Vibrio parahaemolyticus. Microbiol Spectr 2023; 11:e0150223. [PMID: 37843303 PMCID: PMC10715048 DOI: 10.1128/spectrum.01502-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/10/2023] [Accepted: 09/05/2023] [Indexed: 10/17/2023] Open
Abstract
IMPORTANCE In this study, Vibrio parahaemolyticus strains were collected from a large number of aquatic products globally, and temperature was found to have an impact on the virulence of these bacteria. As global temperatures rise, mutations in a gene marker called thermolabile hemolysin (tlh) also increase. This suggests that environmental isolates adapt to the warming environment and become more pathogenic. The findings can help in developing tools to analyze and monitor these bacteria and to assess any link between climate change and vibrio-associated diseases, which could be used to forecast associated outbreaks.
Affiliation(s)
- Weishan Zhang
- CAS Key Laboratory of Pathogen Microbiology and Immunology, Institute of Microbiology, Chinese Academy of Sciences, Beijing, China
- Savaid Medical School, University of Chinese Academy of Sciences, Beijing, China
| | - Keyu Chen
- CAS Key Laboratory of Pathogen Microbiology and Immunology, Institute of Microbiology, Chinese Academy of Sciences, Beijing, China
- Savaid Medical School, University of Chinese Academy of Sciences, Beijing, China
| | - Lin Zhang
- Shijiazhuang Customs Technology Center, Hebei, China
| | - Ximeng Zhang
- Science and Technology Research Center of China Customs, Beijing, China
| | - Baoli Zhu
- CAS Key Laboratory of Pathogen Microbiology and Immunology, Institute of Microbiology, Chinese Academy of Sciences, Beijing, China
- Savaid Medical School, University of Chinese Academy of Sciences, Beijing, China
| | - Na Lv
- CAS Key Laboratory of Pathogen Microbiology and Immunology, Institute of Microbiology, Chinese Academy of Sciences, Beijing, China
| | - Kaixia Mi
- CAS Key Laboratory of Pathogen Microbiology and Immunology, Institute of Microbiology, Chinese Academy of Sciences, Beijing, China
- Savaid Medical School, University of Chinese Academy of Sciences, Beijing, China
| |
|
71
|
Lensink MF, Brysbaert G, Raouraoua N, Bates PA, Giulini M, Honorato RV, van Noort C, Teixeira JMC, Bonvin AMJJ, Kong R, Shi H, Lu X, Chang S, Liu J, Guo Z, Chen X, Morehead A, Roy RS, Wu T, Giri N, Quadir F, Chen C, Cheng J, Del Carpio CA, Ichiishi E, Rodriguez‐Lumbreras LA, Fernandez‐Recio J, Harmalkar A, Chu L, Canner S, Smanta R, Gray JJ, Li H, Lin P, He J, Tao H, Huang S, Roel‐Touris J, Jimenez‐Garcia B, Christoffer CW, Jain AJ, Kagaya Y, Kannan H, Nakamura T, Terashi G, Verburgt JC, Zhang Y, Zhang Z, Fujuta H, Sekijima M, Kihara D, Khan O, Kotelnikov S, Ghani U, Padhorny D, Beglov D, Vajda S, Kozakov D, Negi SS, Ricciardelli T, Barradas‐Bautista D, Cao Z, Chawla M, Cavallo L, Oliva R, Yin R, Cheung M, Guest JD, Lee J, Pierce BG, Shor B, Cohen T, Halfon M, Schneidman‐Duhovny D, Zhu S, Yin R, Sun Y, Shen Y, Maszota‐Zieleniak M, Bojarski KK, Lubecka EA, Marcisz M, Danielsson A, Dziadek L, Gaardlos M, Gieldon A, Liwo A, Samsonov SA, Slusarz R, Zieba K, Sieradzan AK, Czaplewski C, Kobayashi S, Miyakawa Y, Kiyota Y, Takeda‐Shitaka M, Olechnovic K, Valancauskas L, Dapkunas J, Venclovas C, Wallner B, Yang L, Hou C, He X, Guo S, Jiang S, Ma X, Duan R, Qui L, Xu X, Zou X, Velankar S, Wodak SJ. Impact of AlphaFold on structure prediction of protein complexes: The CASP15-CAPRI experiment. Proteins 2023; 91:1658-1683. [PMID: 37905971 PMCID: PMC10841881 DOI: 10.1002/prot.26609] [Citation(s) in RCA: 35] [Impact Index Per Article: 17.5] [Received: 07/07/2023] [Revised: 09/22/2023] [Accepted: 09/28/2023] [Indexed: 11/02/2023]
Abstract
We present the results for CAPRI Round 54, the 5th joint CASP-CAPRI protein assembly prediction challenge. The Round offered 37 targets, including 14 homodimers, 3 homotrimers, 13 heterodimers including 3 antibody-antigen complexes, and 7 large assemblies. On average ~70 CASP and CAPRI predictor groups, including more than 20 automatic servers, submitted models for each target. A total of 21 941 models submitted by these groups and by 15 CAPRI scorer groups were evaluated using the CAPRI model quality measures and the DockQ score consolidating these measures. The prediction performance was quantified by a weighted score based on the number of models of acceptable quality or higher submitted by each group among their five best models. Results show substantial progress achieved across a significant fraction of the 60+ participating groups. High-quality models were produced for about 40% of the targets compared to 8% two years earlier. This remarkable improvement is due to the wide use of the AlphaFold2 and AlphaFold2-Multimer software and the confidence metrics they provide. Notably, expanded sampling of candidate solutions by manipulating these deep learning inference engines, enriching multiple sequence alignments, or integration of advanced modeling tools, enabled top performing groups to exceed the performance of a standard AlphaFold2-Multimer version used as a yardstick. This notwithstanding, performance remained poor for complexes with antibodies and nanobodies, where evolutionary relationships between the binding partners are lacking, and for complexes featuring conformational flexibility, clearly indicating that the prediction of protein complexes remains a challenging problem.
Affiliation(s)
- Marc F. Lensink
- Univ. Lille, CNRS, UMR8576 – UGSF – Unité de Glycobiologie Structurale et Fonctionnelle, Lille, France
| | - Guillaume Brysbaert
- Univ. Lille, CNRS, UMR8576 – UGSF – Unité de Glycobiologie Structurale et Fonctionnelle, Lille, France
| | - Nessim Raouraoua
- Univ. Lille, CNRS, UMR8576 – UGSF – Unité de Glycobiologie Structurale et Fonctionnelle, Lille, France
| | - Paul A. Bates
- Biomolecular Modeling Laboratory, The Francis Crick Institute, London, UK
| | - Marco Giulini
- Bijvoet Center for Biomolecular Research, Faculty of Science – Chemistry, Utrecht University, Utrecht, The Netherlands
| | - Rodrigo V. Honorato
- Bijvoet Center for Biomolecular Research, Faculty of Science – Chemistry, Utrecht University, Utrecht, The Netherlands
| | - Charlotte van Noort
- Bijvoet Center for Biomolecular Research, Faculty of Science – Chemistry, Utrecht University, Utrecht, The Netherlands
| | - Joao M. C. Teixeira
- Bijvoet Center for Biomolecular Research, Faculty of Science – Chemistry, Utrecht University, Utrecht, The Netherlands
| | - Alexandre M. J. J. Bonvin
- Bijvoet Center for Biomolecular Research, Faculty of Science – Chemistry, Utrecht University, Utrecht, The Netherlands
| | - Ren Kong
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, China
| | - Hang Shi
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, China
| | - Xufeng Lu
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, China
| | - Shan Chang
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, China
| | - Jian Liu
- Dept. of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
| | - Zhiye Guo
- Dept. of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
| | - Xiao Chen
- Dept. of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
| | - Alex Morehead
- Dept. of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
| | - Raj S. Roy
- Dept. of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
| | - Tianqi Wu
- Dept. of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
| | - Nabin Giri
- Dept. of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
| | - Farhan Quadir
- Dept. of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
| | - Chen Chen
- Dept. of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
| | - Jianlin Cheng
- Dept. of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
| | | | - Eichiro Ichiishi
- International University of Health and Welfare (IUHV Hospital), Nasushiobara‐City, Japan
| | - Luis A. Rodriguez‐Lumbreras
- Instituto de Ciencias de la Vida y del Vino (ICVV), CSIC ‐ Universidad de La Rioja ‐ Gobierno de La Rioja, Logrono, Spain
- Barcelona Supercomputing Center (BSC), Barcelona, Spain
| | - Juan Fernandez‐Recio
- Instituto de Ciencias de la Vida y del Vino (ICVV), CSIC ‐ Universidad de La Rioja ‐ Gobierno de La Rioja, Logrono, Spain
- Barcelona Supercomputing Center (BSC), Barcelona, Spain
| | - Ameya Harmalkar
- Dept. of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, Maryland, USA
| | - Lee‐Shin Chu
- Dept. of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, Maryland, USA
| | - Sam Canner
- Dept. of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, Maryland, USA
| | - Rituparna Smanta
- Dept. of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, Maryland, USA
| | - Jeffrey J. Gray
- Dept. of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, Maryland, USA
- Program in Molecular Biophysics, Johns Hopkins University, Baltimore, Maryland, USA
| | - Hao Li
- School of Physics, Huazhong University of Science and Technology, Wuhan, China
| | - Peicong Lin
- School of Physics, Huazhong University of Science and Technology, Wuhan, China
| | - Jiahua He
- School of Physics, Huazhong University of Science and Technology, Wuhan, China
| | - Huanyu Tao
- School of Physics, Huazhong University of Science and Technology, Wuhan, China
| | - Sheng‐You Huang
- School of Physics, Huazhong University of Science and Technology, Wuhan, China
| | - Jorge Roel‐Touris
- Protein Design and Modeling Lab, Dept. of Structural Biology, Molecular Biology Institute of Barcelona (IBMB‐CSIC), Barcelona, Spain
| | | | | | - Anika J. Jain
- Dept. of Biological Sciences, Purdue University, West Lafayette, Indiana, USA
| | - Yuki Kagaya
- Dept. of Biological Sciences, Purdue University, West Lafayette, Indiana, USA
| | - Harini Kannan
- Dept. of Biological Sciences, Purdue University, West Lafayette, Indiana, USA
- Dept. of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, India
| | - Tsukasa Nakamura
- Dept. of Biological Sciences, Purdue University, West Lafayette, Indiana, USA
| | - Genki Terashi
- Dept. of Biological Sciences, Purdue University, West Lafayette, Indiana, USA
| | - Jacob C. Verburgt
- Dept. of Biological Sciences, Purdue University, West Lafayette, Indiana, USA
| | - Yuanyuan Zhang
- Dept. of Computer Science, Purdue University, West Lafayette, Indiana, USA
| | - Zicong Zhang
- Dept. of Computer Science, Purdue University, West Lafayette, Indiana, USA
| | - Hayato Fujuta
- Dept. of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, India
| | | | - Daisuke Kihara
- Dept. of Computer Science, Purdue University, West Lafayette, Indiana, USA
- Dept. of Biological Sciences, Purdue University, West Lafayette, Indiana, USA
| | | | | | | | | | | | | | | | - Surendra S. Negi
- Sealy Center for Structural Biology and Molecular Biophysics, University of Texas Medical Branch, Galveston, Texas, USA
| | | | | | - Zhen Cao
- King Abdullah University of Science and Technology (KAUST), Saudi Arabia
| | - Mohit Chawla
- King Abdullah University of Science and Technology (KAUST), Saudi Arabia
| | - Luigi Cavallo
- King Abdullah University of Science and Technology (KAUST), Saudi Arabia
- Department of Chemistry and Biology, University of Salerno, Fisciano, Italy
| | | | - Rui Yin
- University of Maryland Institute for Bioscience and Biotechnology Research, Rockville, Maryland, USA
- Dept. of Cell Biology and Molecular Genetics, University of Maryland, College Park, Maryland, USA
| | - Melyssa Cheung
- University of Maryland Institute for Bioscience and Biotechnology Research, Rockville, Maryland, USA
- Dept. of Chemistry and Biochemistry, University of Maryland, College Park, Maryland, USA
| | - Johnathan D. Guest
- University of Maryland Institute for Bioscience and Biotechnology Research, Rockville, Maryland, USA
- Dept. of Cell Biology and Molecular Genetics, University of Maryland, College Park, Maryland, USA
| | - Jessica Lee
- University of Maryland Institute for Bioscience and Biotechnology Research, Rockville, Maryland, USA
- Dept. of Cell Biology and Molecular Genetics, University of Maryland, College Park, Maryland, USA
| | - Brian G. Pierce
- University of Maryland Institute for Bioscience and Biotechnology Research, Rockville, Maryland, USA
- Dept. of Cell Biology and Molecular Genetics, University of Maryland, College Park, Maryland, USA
| | - Ben Shor
- School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel
| | - Tomer Cohen
- School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel
| | - Matan Halfon
- School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel
| | | | - Shaowen Zhu
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, Texas, USA
| | - Rujie Yin
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, Texas, USA
| | - Yuanfei Sun
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, Texas, USA
| | - Yang Shen
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, Texas, USA
- Department of Computer Science and Engineering, Texas A&M University, College Station, Texas, USA
- Institute of Biosciences and Technology and Department of Translational Medical Sciences, Texas A&M University, Houston, Texas, USA
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - Yuta Miyakawa
- School of Pharmacy, Kitasato University, Minato‐ku, Tokyo, Japan
| | - Yasuomi Kiyota
- School of Pharmacy, Kitasato University, Minato‐ku, Tokyo, Japan
| | | | - Kliment Olechnovic
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
| | - Lukas Valancauskas
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
| | - Justas Dapkunas
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
| | - Ceslovas Venclovas
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
| | - Bjorn Wallner
- Bioinformatics Division, Department of Physics, Chemistry, and Biology, Linköping University, Linköping, Sweden
| | - Lin Yang
- National Key Laboratory of Science and Technology on Advanced Composites in Special Environments, Center for Composite Materials and Structures, Harbin Institute of Technology, Harbin, China
- School of Aerospace, Mechanical and Mechatronic Engineering, The University of Sydney, New South Wales, Australia
| | - Chengyu Hou
- School of Electronics and Information Engineering, Harbin Institute of Technology, Harbin, China
| | - Xiaodong He
- National Key Laboratory of Science and Technology on Advanced Composites in Special Environments, Center for Composite Materials and Structures, Harbin Institute of Technology, Harbin, China
- Shenzhen STRONG Advanced Materials Research Institute Col, Ltd, Shenzhen, People's Republic of China
| | - Shuai Guo
- National Key Laboratory of Science and Technology on Advanced Composites in Special Environments, Center for Composite Materials and Structures, Harbin Institute of Technology, Harbin, China
| | - Shenda Jiang
- National Key Laboratory of Science and Technology on Advanced Composites in Special Environments, Center for Composite Materials and Structures, Harbin Institute of Technology, Harbin, China
| | - Xiaoliang Ma
- National Key Laboratory of Science and Technology on Advanced Composites in Special Environments, Center for Composite Materials and Structures, Harbin Institute of Technology, Harbin, China
| | - Rui Duan
- Dalton Cardiovascular Research Center, University of Missouri, Columbia, Missouri, USA
| | - Liming Qui
- Dalton Cardiovascular Research Center, University of Missouri, Columbia, Missouri, USA
| | - Xianjin Xu
- Dalton Cardiovascular Research Center, University of Missouri, Columbia, Missouri, USA
| | - Xiaoqin Zou
- Dalton Cardiovascular Research Center, University of Missouri, Columbia, Missouri, USA
- Dept. of Physics and Astronomy, University of Missouri, Columbia, Missouri, USA
- Dept. of Biochemistry, University of Missouri, Columbia, Missouri, USA
- Institute for Data Science and Informatics, University of Missouri, Columbia, Missouri, USA
| | - Sameer Velankar
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL‐EBI), Hinxton, Cambridge, UK
| | | |
|
72
|
Fang Y, Jiang Y, Wei L, Ma Q, Ren Z, Yuan Q, Wei DQ. DeepProSite: structure-aware protein binding site prediction using ESMFold and pretrained language model. Bioinformatics 2023; 39:btad718. [PMID: 38015872 PMCID: PMC10723037 DOI: 10.1093/bioinformatics/btad718] [Citation(s) in RCA: 23] [Impact Index Per Article: 11.5] [Received: 07/21/2023] [Revised: 11/04/2023] [Accepted: 11/27/2023] [Indexed: 11/30/2023]
Abstract
MOTIVATION Identifying the functional sites of a protein, such as the binding sites of proteins, peptides, or other biological components, is crucial for understanding related biological processes and drug design. However, existing sequence-based methods have limited predictive accuracy, as they only consider sequence-adjacent contextual features and lack structural information. RESULTS In this study, DeepProSite is presented as a new framework for identifying protein binding sites that utilizes both protein structure and sequence information. DeepProSite first generates protein structures from ESMFold and sequence representations from pretrained language models. It then uses a Graph Transformer and formulates binding site prediction as graph node classification. In predicting protein-protein/peptide binding sites, DeepProSite outperforms state-of-the-art sequence- and structure-based methods on most metrics. Moreover, DeepProSite maintains its performance when predicting unbound structures, in contrast to competing structure-based prediction methods. DeepProSite is also extended to the prediction of binding sites for nucleic acids and other ligands, verifying its generalization capability. Finally, an online server for predicting multiple types of residue is established as the implementation of the proposed DeepProSite. AVAILABILITY AND IMPLEMENTATION The datasets and source codes can be accessed at https://github.com/WeiLab-Biology/DeepProSite. The proposed DeepProSite can be accessed at https://inner.wei-group.net/DeepProSite/.
Affiliation(s)
- Yitian Fang
- State Key Laboratory of Microbial Metabolism, Shanghai-Islamabad-Belgrade Joint Innovation Center on Antibacterial Resistances, Joint International Research Laboratory of Metabolic & Developmental Sciences and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200040, China
- Peng Cheng Laboratory, Shenzhen 518055, China
| | - Yi Jiang
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH 43210, USA
| | - Leyi Wei
- School of Software, Shandong University, Jinan, Shandong 250100, China
| | - Qin Ma
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH 43210, USA
| | | | - Qianmu Yuan
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510000, China
| | - Dong-Qing Wei
- State Key Laboratory of Microbial Metabolism, Shanghai-Islamabad-Belgrade Joint Innovation Center on Antibacterial Resistances, Joint International Research Laboratory of Metabolic & Developmental Sciences and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200040, China
- Peng Cheng Laboratory, Shenzhen 518055, China
| |
|
73
|
Wang J, Chen C, Yao G, Ding J, Wang L, Jiang H. Intelligent Protein Design and Molecular Characterization Techniques: A Comprehensive Review. Molecules 2023; 28:7865. [PMID: 38067593 PMCID: PMC10707872 DOI: 10.3390/molecules28237865] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2023] [Revised: 11/13/2023] [Accepted: 11/23/2023] [Indexed: 12/18/2023]
Abstract
In recent years, the widespread application of artificial intelligence algorithms in protein structure, function prediction, and de novo protein design has significantly accelerated the process of intelligent protein design and led to many noteworthy achievements. This advancement in protein intelligent design holds great potential to accelerate the development of new drugs, enhance the efficiency of biocatalysts, and even create entirely new biomaterials. Protein characterization is the key to the performance of intelligent protein design. However, there is no consensus on the most suitable characterization method for intelligent protein design tasks. This review describes the methods, characteristics, and representative applications of traditional descriptors, sequence-based and structure-based protein characterization. It discusses their advantages, disadvantages, and scope of application. It is hoped that this could help researchers to better understand the limitations and application scenarios of these methods, and provide valuable references for choosing appropriate protein characterization techniques for related research in the field, so as to better carry out protein research.
Affiliation(s)
| | | | | | - Junjie Ding
- State Key Laboratory of NBC Protection for Civilian, Beijing 102205, China; (J.W.); (C.C.); (G.Y.)
| | - Liangliang Wang
- State Key Laboratory of NBC Protection for Civilian, Beijing 102205, China; (J.W.); (C.C.); (G.Y.)
| | - Hui Jiang
- State Key Laboratory of NBC Protection for Civilian, Beijing 102205, China; (J.W.); (C.C.); (G.Y.)
| |
|
74
|
Brixi G, Ye T, Hong L, Wang T, Monticello C, Lopez-Barbosa N, Vincoff S, Yudistyra V, Zhao L, Haarer E, Chen T, Pertsemlidis S, Palepu K, Bhat S, Christopher J, Li X, Liu T, Zhang S, Petersen L, DeLisa MP, Chatterjee P. SaLT&PepPr is an interface-predicting language model for designing peptide-guided protein degraders. Commun Biol 2023; 6:1081. [PMID: 37875551 PMCID: PMC10598214 DOI: 10.1038/s42003-023-05464-z] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 07/05/2023] [Accepted: 10/13/2023] [Indexed: 10/26/2023]
Abstract
Protein-protein interactions (PPIs) are critical for biological processes and predicting the sites of these interactions is useful for both computational and experimental applications. We present a Structure-agnostic Language Transformer and Peptide Prioritization (SaLT&PepPr) pipeline to predict interaction interfaces from a protein sequence alone for the subsequent generation of peptidic binding motifs. Our model fine-tunes the ESM-2 protein language model (pLM) with a per-position prediction task to identify PPI sites using data from the PDB, and prioritizes motifs which are most likely to be involved within inter-chain binding. By only using amino acid sequence as input, our model is competitive with structural homology-based methods, but exhibits reduced performance compared with deep learning models that input both structural and sequence features. Inspired by our previous results using co-crystals to engineer target-binding "guide" peptides, we curate PPI databases to identify partners for subsequent peptide derivation. Fusing guide peptides to an E3 ubiquitin ligase domain, we demonstrate degradation of endogenous β-catenin, 4E-BP2, and TRIM8, and highlight the nanomolar binding affinity, low off-targeting propensity, and function-altering capability of our best-performing degraders in cancer cells. In total, our study suggests that prioritizing binders from natural interactions via pLMs can enable programmable protein targeting and modulation.
Affiliation(s)
- Garyk Brixi
- Department of Biomedical Engineering, Duke University, Durham, NC, USA
| | - Tianzheng Ye
- Robert F. Smith School of Chemical and Biomolecular Engineering, Cornell University, Ithaca, NY, USA
| | - Lauren Hong
- Department of Biomedical Engineering, Duke University, Durham, NC, USA
| | - Tian Wang
- Department of Biomedical Engineering, Duke University, Durham, NC, USA
| | - Connor Monticello
- Meinig School of Biomedical Engineering, Cornell University, Ithaca, NY, USA
| | - Natalia Lopez-Barbosa
- Robert F. Smith School of Chemical and Biomolecular Engineering, Cornell University, Ithaca, NY, USA
| | - Sophia Vincoff
- Department of Biomedical Engineering, Duke University, Durham, NC, USA
| | - Vivian Yudistyra
- Department of Biomedical Engineering, Duke University, Durham, NC, USA
| | - Lin Zhao
- Department of Biomedical Engineering, Duke University, Durham, NC, USA
| | - Elena Haarer
- Department of Biomedical Engineering, Duke University, Durham, NC, USA
| | - Tianlai Chen
- Department of Biomedical Engineering, Duke University, Durham, NC, USA
| | | | - Kalyan Palepu
- Department of Biomedical Engineering, Duke University, Durham, NC, USA
| | - Suhaas Bhat
- Department of Biomedical Engineering, Duke University, Durham, NC, USA
| | | | - Xinning Li
- Department of Biomedical Engineering, Duke University, Durham, NC, USA
| | - Tong Liu
- Department of Biomedical Engineering, Duke University, Durham, NC, USA
| | - Sue Zhang
- Department of Biomedical Engineering, Duke University, Durham, NC, USA
| | - Lillian Petersen
- Department of Biomedical Engineering, Duke University, Durham, NC, USA
| | - Matthew P DeLisa
- Robert F. Smith School of Chemical and Biomolecular Engineering, Cornell University, Ithaca, NY, USA
- Meinig School of Biomedical Engineering, Cornell University, Ithaca, NY, USA
- Cornell Institute of Biotechnology, Cornell University, Ithaca, NY, USA
| | - Pranam Chatterjee
- Department of Biomedical Engineering, Duke University, Durham, NC, USA
- Department of Computer Science, Duke University, Durham, NC, USA
- Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, USA
| |
|
75
|
Pandi B, Brenman S, Black A, Ng DCM, Lau E, Lam MPY. Tissue Usage Preference and Intrinsically Disordered Region Remodeling of Alternative Splicing Derived Proteoforms in the Heart. bioRxiv 2023:2023.10.08.561375. [PMID: 37873130 PMCID: PMC10592692 DOI: 10.1101/2023.10.08.561375] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2023]
Abstract
A computational analysis of mass spectrometry data was performed to uncover alternative splicing derived protein variants across chambers of the human heart. Evidence for 216 non-canonical isoforms was apparent in the atrium and the ventricle, including 52 isoforms not documented on SwissProt and recovered using an RNA sequencing derived database. Among non-canonical isoforms, 29 show signs of regulation based on statistically significant preferences in tissue usage, including a ventricular enriched protein isoform of tensin-1 (TNS1) and an atrium-enriched PDZ and LIM Domain 3 (PDLIM3) isoform 2 (PDLIM3-2/ALP-H). Examined variant regions that differ between alternative and canonical isoforms are highly enriched in intrinsically disordered regions, and over two-thirds of such regions are predicted to function in protein binding and/or RNA binding. The analysis here lends further credence to the notion that alternative splicing diversifies the proteome by rewiring intrinsically disordered regions, which are increasingly recognized to play important roles in the generation of biological function from protein sequences.
Affiliation(s)
- Boomathi Pandi
- Department of Medicine/Division of Cardiology, University of Colorado School of Medicine, Aurora, CO 80045, USA
| | - Stella Brenman
- Department of Medicine/Division of Cardiology, University of Colorado School of Medicine, Aurora, CO 80045, USA
| | - Alexander Black
- Department of Medicine/Division of Cardiology, University of Colorado School of Medicine, Aurora, CO 80045, USA
| | - Dominic C. M. Ng
- Department of Medicine/Division of Cardiology, University of Colorado School of Medicine, Aurora, CO 80045, USA
| | - Edward Lau
- Department of Medicine/Division of Cardiology, University of Colorado School of Medicine, Aurora, CO 80045, USA
- Consortium for Fibrosis Research and Translation (CFReT), University of Colorado School of Medicine, Aurora, CO 80045, USA
| | - Maggie P. Y. Lam
- Department of Medicine/Division of Cardiology, University of Colorado School of Medicine, Aurora, CO 80045, USA
- Department of Biochemistry & Molecular Genetics, University of Colorado School of Medicine, Aurora, CO 80045, USA
- Consortium for Fibrosis Research and Translation (CFReT), University of Colorado School of Medicine, Aurora, CO 80045, USA
| |
|
76
|
Mou M, Pan Z, Zhou Z, Zheng L, Zhang H, Shi S, Li F, Sun X, Zhu F. A Transformer-Based Ensemble Framework for the Prediction of Protein-Protein Interaction Sites. Research (Washington, D.C.) 2023; 6:0240. [PMID: 37771850 PMCID: PMC10528219 DOI: 10.34133/research.0240] [Citation(s) in RCA: 45] [Impact Index Per Article: 22.5] [Received: 08/02/2023] [Accepted: 09/08/2023] [Indexed: 09/30/2023]
Abstract
The identification of protein-protein interaction (PPI) sites is essential in the research of protein function and the discovery of new drugs. So far, a variety of computational tools based on machine learning have been developed to accelerate the identification of PPI sites. However, existing methods suffer from low predictive accuracy or a limited scope of application. Specifically, some methods learned only global or local sequential features, leading to low predictive accuracy, while others achieved improved performance by extracting residue interactions from structures but were limited in their application scope by their heavy dependence on precise structure information. There is an urgent need to develop a method that integrates comprehensive information to realize proteome-wide accurate profiling of PPI sites. Herein, a novel ensemble framework for PPI site prediction, EnsemPPIS, is proposed based on transformer and gated convolutional networks. EnsemPPIS can effectively capture not only global and local patterns but also residue interactions. Specifically, EnsemPPIS is unique in (a) extracting residue interactions from protein sequences with a transformer and (b) further integrating global and local sequential features with an ensemble learning strategy. Compared with various existing methods, EnsemPPIS exhibited either superior performance or broader applicability on multiple PPI site prediction tasks. Moreover, pattern analysis based on the interpretability of EnsemPPIS demonstrated that EnsemPPIS was fully capable of learning residue interactions within the local structure of PPI sites using only sequence information. The web server of EnsemPPIS is freely available at http://idrblab.org/ensemppis.
Affiliation(s)
- Minjie Mou
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, National Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
| | - Ziqi Pan
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, National Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
| | - Zhimeng Zhou
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, National Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
| | - Lingyan Zheng
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, National Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
| | - Hanyu Zhang
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, National Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
| | - Shuiyang Shi
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, National Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
| | - Fengcheng Li
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, National Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
| | - Xiuna Sun
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, National Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
| | - Feng Zhu
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, National Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
77
|
Wu H, Han J, Zhang S, Xin G, Mou C, Liu J. Spatom: a graph neural network for structure-based protein-protein interaction site prediction. Brief Bioinform 2023; 24:bbad345. [PMID: 37779247 DOI: 10.1093/bib/bbad345] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/06/2023] [Revised: 08/22/2023] [Accepted: 09/13/2023] [Indexed: 10/03/2023]
Abstract
Accurate identification of protein-protein interaction (PPI) sites remains a computational challenge. We propose Spatom, a novel framework for PPI site prediction. This framework first defines a weighted digraph for a protein structure to precisely characterize the spatial contacts of residues, then performs a weighted digraph convolution to aggregate both spatial local and global information and finally adds an improved graph attention layer to drive the predicted sites to form more continuous region(s). Spatom was tested on a diverse set of challenging protein-protein complexes and demonstrated the best performance among all the compared methods. Furthermore, when tested on multiple popular proteins in a case study, Spatom clearly identifies the interaction interfaces and captures the majority of hotspots. Spatom is expected to contribute to the understanding of protein interactions and drug designs targeting protein binding.
Affiliation(s)
- Haonan Wu
  - School of Mathematics and Statistics, Shandong University, Weihai 264209, China
  - School of Mathematics, Shandong University, Jinan 250100, China
- Jiyun Han
  - School of Mathematics and Statistics, Shandong University, Weihai 264209, China
- Shizhuo Zhang
  - School of Mathematics and Statistics, Shandong University, Weihai 264209, China
- Gaojia Xin
  - School of Mathematics and Statistics, Shandong University, Weihai 264209, China
- Chaozhou Mou
  - School of Mathematics and Statistics, Shandong University, Weihai 264209, China
- Juntao Liu
  - School of Mathematics and Statistics, Shandong University, Weihai 264209, China
78
|
Braghetto A, Orlandini E, Baiesi M. Interpretable Machine Learning of Amino Acid Patterns in Proteins: A Statistical Ensemble Approach. J Chem Theory Comput 2023; 19:6011-6022. [PMID: 37552831 PMCID: PMC10500975 DOI: 10.1021/acs.jctc.3c00383] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2023] [Indexed: 08/10/2023]
Abstract
Explainable and interpretable unsupervised machine learning helps one to understand the underlying structure of data. We introduce an ensemble analysis of machine learning models to consolidate their interpretation. Its application shows that restricted Boltzmann machines compress consistently into a few bits the information stored in a sequence of five amino acids at the start or end of α-helices or β-sheets. The weights learned by the machines reveal unexpected properties of the amino acids and the secondary structure of proteins: (i) His and Thr have a negligible contribution to the amphiphilic pattern of α-helices; (ii) there is a class of α-helices particularly rich in Ala at their end; (iii) Pro occupies most often slots otherwise occupied by polar or charged amino acids, and its presence at the start of helices is relevant; (iv) Glu and especially Asp on one side and Val, Leu, Iso, and Phe on the other display the strongest tendency to mark amphiphilic patterns, i.e., extreme values of an effective hydrophobicity, though they are not the most powerful (non)hydrophobic amino acids.
Affiliation(s)
- Anna Braghetto
  - Department of Physics and Astronomy, University of Padova, Via Marzolo 8, 35131 Padua, Italy
  - INFN, Sezione di Padova, Via Marzolo 8, 35131 Padua, Italy
- Enzo Orlandini
  - Department of Physics and Astronomy, University of Padova, Via Marzolo 8, 35131 Padua, Italy
  - INFN, Sezione di Padova, Via Marzolo 8, 35131 Padua, Italy
- Marco Baiesi
  - Department of Physics and Astronomy, University of Padova, Via Marzolo 8, 35131 Padua, Italy
  - INFN, Sezione di Padova, Via Marzolo 8, 35131 Padua, Italy
79
|
Zhu Y, Zhao L, Wen N, Wang J, Wang C. DataDTA: a multi-feature and dual-interaction aggregation framework for drug-target binding affinity prediction. Bioinformatics 2023; 39:btad560. [PMID: 37688568 PMCID: PMC10516524 DOI: 10.1093/bioinformatics/btad560] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 12/03/2022] [Revised: 05/09/2023] [Accepted: 09/07/2023] [Indexed: 09/11/2023]
Abstract
MOTIVATION: Accurate prediction of drug-target binding affinity (DTA) is crucial for drug discovery. The increase in the publication of large-scale DTA datasets enables the development of various computational methods for DTA prediction. Numerous deep learning-based methods have been proposed to predict affinities, some of which only utilize original sequence information or complex structures, but the effective combination of various information and protein-binding pockets have not been fully mined. Therefore, a new method that integrates available key information is urgently needed to predict DTA and accelerate the drug discovery process. RESULTS: In this study, we propose a novel deep learning-based predictor termed DataDTA to estimate the affinities of drug-target pairs. DataDTA utilizes descriptors of predicted pockets and sequences of proteins, as well as low-dimensional molecular features and SMILES strings of compounds as inputs. Specifically, the pockets were predicted from the three-dimensional structure of proteins and their descriptors were extracted as the partial input features for DTA prediction. The molecular representation of compounds based on algebraic graph features was collected to supplement the input information of targets. Furthermore, to ensure effective learning of multiscale interaction features, a dual-interaction aggregation neural network strategy was developed. DataDTA was compared with state-of-the-art methods on different datasets, and the results showed that DataDTA is a reliable prediction tool for affinities estimation. Specifically, the concordance index (CI) of DataDTA is 0.806 and the Pearson correlation coefficient (R) value is 0.814 on the test dataset, which is higher than other methods. AVAILABILITY AND IMPLEMENTATION: The codes and datasets of DataDTA are available at https://github.com/YanZhu06/DataDTA.
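The two metrics quoted in this abstract, the concordance index (CI) and the Pearson correlation coefficient, can be illustrated with a minimal sketch. It uses the common pairwise definition of CI (ties in the prediction count as 0.5), which is an assumption here and not necessarily the exact evaluation code used for DataDTA. The function names and the affinity values are hypothetical.

```python
from itertools import combinations
from math import sqrt


def concordance_index(y_true, y_pred):
    """Fraction of comparable pairs whose predicted order matches the true order.

    Pairs with equal true affinity are skipped; tied predictions count as 0.5.
    """
    concordant, comparable = 0.0, 0
    for (t_i, p_i), (t_j, p_j) in combinations(zip(y_true, y_pred), 2):
        if t_i == t_j:
            continue  # not comparable: no true ordering to recover
        comparable += 1
        if p_i == p_j:
            concordant += 0.5
        elif (p_i > p_j) == (t_i > t_j):
            concordant += 1.0
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical measured and predicted affinities, for illustration only.
measured = [5.0, 6.2, 7.1, 8.3]
predicted = [5.4, 6.0, 7.5, 7.0]
ci = concordance_index(measured, predicted)
r = pearson_r(measured, predicted)
```

CI depends only on how the predictions are ranked, while Pearson's R also reflects how linearly the predicted values track the measured ones, which is why affinity predictors are often reported with both.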
Affiliation(s)
- Yan Zhu
  - Faculty of Computing, Harbin Institute of Technology, Harbin 150001, China
- Lingling Zhao
  - Faculty of Computing, Harbin Institute of Technology, Harbin 150001, China
- Naifeng Wen
  - School of Mechanical and Electrical Engineering, Dalian Minzu University, Dalian 116600, China
- Junjie Wang
  - Department of Medical Informatics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing 211166, China
- Chunyu Wang
  - Faculty of Computing, Harbin Institute of Technology, Harbin 150001, China
80
|
Schweke H, Xu Q, Tauriello G, Pantolini L, Schwede T, Cazals F, Lhéritier A, Fernandez-Recio J, Rodríguez-Lumbreras LÁ, Schueler-Furman O, Varga JK, Jiménez-García B, Réau MF, Bonvin A, Savojardo C, Martelli PL, Casadio R, Tubiana J, Wolfson H, Oliva R, Barradas-Bautista D, Ricciardelli T, Cavallo L, Venclovas Č, Olechnovič K, Guerois R, Andreani J, Martin J, Wang X, Kihara D, Marchand A, Correia B, Zou X, Dey S, Dunbrack R, Levy E, Wodak S. Discriminating physiological from non-physiological interfaces in structures of protein complexes: A community-wide study. Proteomics 2023; 23:e2200323. [PMID: 37365936 PMCID: PMC10937251 DOI: 10.1002/pmic.202200323] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 02/04/2023] [Revised: 05/11/2023] [Accepted: 05/11/2023] [Indexed: 06/28/2023]
Abstract
Reliably scoring and ranking candidate models of protein complexes and assigning their oligomeric state from the structure of the crystal lattice represent outstanding challenges. A community-wide effort was launched to tackle these challenges. The latest resources on protein complexes and interfaces were exploited to derive a benchmark dataset consisting of 1677 homodimer protein crystal structures, including a balanced mix of physiological and non-physiological complexes. The non-physiological complexes in the benchmark were selected to bury a similar or larger interface area than their physiological counterparts, making it more difficult for scoring functions to differentiate between them. Next, 252 functions for scoring protein-protein interfaces previously developed by 13 groups were collected and evaluated for their ability to discriminate between physiological and non-physiological complexes. A simple consensus score generated using the best performing score of each of the 13 groups, and a cross-validated Random Forest (RF) classifier were created. Both approaches showed excellent performance, with an area under the Receiver Operating Characteristic (ROC) curve of 0.93 and 0.94, respectively, outperforming individual scores developed by different groups. Additionally, AlphaFold2 engines recalled the physiological dimers with significantly higher accuracy than the non-physiological set, lending support to the reliability of our benchmark dataset annotations. Optimizing the combined power of interface scoring functions and evaluating it on challenging benchmark datasets appears to be a promising strategy.
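The evaluation idea in this abstract, combining several interface scoring functions into a consensus and comparing areas under the ROC curve, can be sketched as follows. The abstract does not say how the consensus was computed, so the min-max scaling and averaging below are assumptions made for illustration only. The labels, scores, and function names are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative count as 0.5.
    """
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def consensus(score_lists):
    """Average several scoring functions after min-max scaling each to [0, 1]."""
    scaled = []
    for scores in score_lists:
        low, high = min(scores), max(scores)
        span = (high - low) or 1.0  # avoid dividing by zero for constant scores
        scaled.append([(s - low) / span for s in scores])
    return [sum(column) / len(column) for column in zip(*scaled)]


# Hypothetical data: 1 = physiological interface, 0 = non-physiological.
labels = [1, 1, 1, 0, 0, 0]
score_a = [0.9, 0.4, 0.8, 0.5, 0.2, 0.1]
score_b = [3.0, 9.0, 2.0, 1.0, 4.0, 0.5]

auc_a = roc_auc(labels, score_a)
auc_b = roc_auc(labels, score_b)
auc_consensus = roc_auc(labels, consensus([score_a, score_b]))
```

In this toy data each scoring function misranks a different complex, so the averaged score separates the two classes better than either one alone, which is the effect a consensus is meant to exploit.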
Affiliation(s)
- Julia K. Varga
  - Hebrew University of Jerusalem Institute for Medical Research Israel-Canada
- Jérôme Tubiana
  - Tel Aviv University Blavatnik School of Computer Science
- Haim Wolfson
  - Tel Aviv University Blavatnik School of Computer Science
- Xiaoqin Zou
  - Dalton Cardiovascular Research Center, Institute for Data Science and Informatics, University of Missouri
81
|
Bauer J, Rajagopal N, Gupta P, Gupta P, Nixon AE, Kumar S. How can we discover developable antibody-based biotherapeutics? Front Mol Biosci 2023; 10:1221626. [PMID: 37609373 PMCID: PMC10441133 DOI: 10.3389/fmolb.2023.1221626] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 05/12/2023] [Accepted: 07/10/2023] [Indexed: 08/24/2023]
Abstract
Antibody-based biotherapeutics have emerged as a successful class of pharmaceuticals despite significant challenges and risks to their discovery and development. This review discusses the most frequently encountered hurdles in the research and development (R&D) of antibody-based biotherapeutics and proposes a conceptual framework called biopharmaceutical informatics. Our vision advocates for the syncretic use of computation and experimentation at every stage of biologic drug discovery, considering developability (manufacturability, safety, efficacy, and pharmacology) of potential drug candidates from the earliest stages of the drug discovery phase. The computational advances in recent years allow for more precise formulation of disease concepts, rapid identification, and validation of targets suitable for therapeutic intervention and discovery of potential biotherapeutics that can agonize or antagonize them. Furthermore, computational methods for de novo and epitope-specific antibody design are increasingly being developed, opening novel computationally driven opportunities for biologic drug discovery. Here, we review the opportunities and limitations of emerging computational approaches for optimizing antigens to generate robust immune responses, in silico generation of antibody sequences, discovery of potential antibody binders through virtual screening, assessment of hits, identification of lead drug candidates and their affinity maturation, and optimization for developability. The adoption of biopharmaceutical informatics across all aspects of drug discovery and development cycles should help bring affordable and effective biotherapeutics to patients more quickly.
Affiliation(s)
- Joschka Bauer
  - Early Stage Pharmaceutical Development Biologicals, Boehringer Ingelheim Pharma GmbH & Co. KG, Biberach/Riss, Germany
  - In Silico Team, Boehringer Ingelheim, Hannover, Germany
- Nandhini Rajagopal
  - In Silico Team, Boehringer Ingelheim, Hannover, Germany
  - Biotherapeutics Discovery, Boehringer Ingelheim Pharmaceuticals Inc., Ridgefield, CT, United States
- Priyanka Gupta
  - In Silico Team, Boehringer Ingelheim, Hannover, Germany
  - Biotherapeutics Discovery, Boehringer Ingelheim Pharmaceuticals Inc., Ridgefield, CT, United States
- Pankaj Gupta
  - In Silico Team, Boehringer Ingelheim, Hannover, Germany
  - Biotherapeutics Discovery, Boehringer Ingelheim Pharmaceuticals Inc., Ridgefield, CT, United States
- Andrew E. Nixon
  - Biotherapeutics Discovery, Boehringer Ingelheim Pharmaceuticals Inc., Ridgefield, CT, United States
- Sandeep Kumar
  - In Silico Team, Boehringer Ingelheim, Hannover, Germany
  - Biotherapeutics Discovery, Boehringer Ingelheim Pharmaceuticals Inc., Ridgefield, CT, United States
82
|
Wang C, Liu S, Tang Y, Yang H, Liu J. Diagnostic Test Accuracy of Deep Learning Prediction Models on COVID-19 Severity: Systematic Review and Meta-Analysis. J Med Internet Res 2023; 25:e46340. [PMID: 37477951 PMCID: PMC10403760 DOI: 10.2196/46340] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2023] [Revised: 03/27/2023] [Accepted: 06/30/2023] [Indexed: 07/22/2023]
Abstract
BACKGROUND: Deep learning (DL) prediction models hold great promise in the triage of COVID-19. OBJECTIVE: We aimed to evaluate the diagnostic test accuracy of DL prediction models for assessing and predicting the severity of COVID-19. METHODS: We searched PubMed, Scopus, LitCovid, Embase, Ovid, and the Cochrane Library for studies published from December 1, 2019, to April 30, 2022. Studies that used DL prediction models to assess or predict COVID-19 severity were included, while those without diagnostic test accuracy analysis or severity dichotomies were excluded. QUADAS-2 (Quality Assessment of Diagnostic Accuracy Studies 2), PROBAST (Prediction Model Risk of Bias Assessment Tool), and funnel plots were used to estimate the bias and applicability. RESULTS: A total of 12 retrospective studies involving 2006 patients reported the cross-sectionally assessed value of DL on COVID-19 severity. The pooled sensitivity and area under the curve were 0.92 (95% CI 0.89-0.94; I²=0.00%) and 0.95 (95% CI 0.92-0.96), respectively. A total of 13 retrospective studies involving 3951 patients reported the longitudinal predictive value of DL for disease severity. The pooled sensitivity and area under the curve were 0.76 (95% CI 0.74-0.79; I²=0.00%) and 0.80 (95% CI 0.76-0.83), respectively. CONCLUSIONS: DL prediction models can help clinicians identify potentially severe cases for early triage. However, high-quality research is lacking. TRIAL REGISTRATION: PROSPERO CRD42022329252; https://www.crd.york.ac.uk/prospero/display_record.php?ID=CRD42022329252.
Affiliation(s)
- Changyu Wang
  - Department of Medical Informatics, West China Medical School, Sichuan University, Chengdu, China
  - West China College of Stomatology, Sichuan University, Chengdu, China
- Siru Liu
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, United States
- Yu Tang
  - Xiangya School of Medicine, Central South University, Changsha, China
- Hao Yang
  - Information Center, West China Hospital, Sichuan University, Chengdu, China
- Jialin Liu
  - Department of Medical Informatics, West China Medical School, Sichuan University, Chengdu, China
  - Information Center, West China Hospital, Sichuan University, Chengdu, China
83
|
Nagar N, Tubiana J, Loewenthal G, Wolfson HJ, Ben Tal N, Pupko T. EvoRator2: Predicting Site-specific Amino Acid Substitutions Based on Protein Structural Information Using Deep Learning. J Mol Biol 2023; 435:168155. [PMID: 37356902 DOI: 10.1016/j.jmb.2023.168155] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/09/2023] [Revised: 05/13/2023] [Accepted: 05/17/2023] [Indexed: 06/27/2023]
Abstract
Multiple sequence alignments (MSAs) are the workhorse of molecular evolution and structural biology research. From MSAs, the amino acids that are tolerated at each site during protein evolution can be inferred. However, little is known regarding the repertoire of tolerated amino acids in proteins when only a few or no sequence homologs are available, such as orphan and de novo designed proteins. Here we present EvoRator2, a deep-learning algorithm trained on over 15,000 protein structures that can predict which amino acids are tolerated at any given site, based exclusively on protein structural information mined from atomic coordinate files. We show that EvoRator2 obtained satisfying results for the prediction of position-weighted scoring matrices (PSSM). We further show that EvoRator2 obtained near state-of-the-art performance on proteins with high quality structures in predicting the effect of mutations in deep mutation scanning (DMS) experiments and that for certain DMS targets, EvoRator2 outperformed state-of-the-art methods. We also show that by combining EvoRator2's predictions with those obtained by a state-of-the-art deep-learning method that accounts for the information in the MSA, the prediction of the effect of mutation in DMS experiments was improved in terms of both accuracy and stability. EvoRator2 is designed to predict which amino-acid substitutions are tolerated in such proteins without many homologous sequences, including orphan or de novo designed proteins. We implemented our approach in the EvoRator web server (https://evorator.tau.ac.il).
Affiliation(s)
- Natan Nagar
  - The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Jérôme Tubiana
  - Blavatnik School of Computer Science, Raymond & Beverly Sackler Faculty of Exact Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Gil Loewenthal
  - The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Haim J Wolfson
  - Blavatnik School of Computer Science, Raymond & Beverly Sackler Faculty of Exact Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Nir Ben Tal
  - School of Neurobiology, Biochemistry & Biophysics, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Tal Pupko
  - The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
84
|
Li P, Liu ZP. GeoBind: segmentation of nucleic acid binding interface on protein surface with geometric deep learning. Nucleic Acids Res 2023; 51:e60. [PMID: 37070217 PMCID: PMC10250245 DOI: 10.1093/nar/gkad288] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 01/09/2023] [Revised: 03/21/2023] [Accepted: 04/06/2023] [Indexed: 04/19/2023]
Abstract
Unveiling the nucleic acid binding sites of a protein helps reveal its regulatory functions in vivo. Current methods encode protein sites from the handcrafted features of their local neighbors and recognize them via a classification, which are limited in expressive ability. Here, we present GeoBind, a geometric deep learning method for predicting nucleic binding sites on protein surface in a segmentation manner. GeoBind takes the whole point clouds of protein surface as input and learns the high-level representation based on the aggregation of their neighbors in local reference frames. Testing GeoBind on benchmark datasets, we demonstrate GeoBind is superior to state-of-the-art predictors. Specific case studies are performed to show the powerful ability of GeoBind to explore molecular surfaces when deciphering proteins with multimer formation. To show the versatility of GeoBind, we further extend GeoBind to five other types of ligand binding sites prediction tasks and achieve competitive performances.
Affiliation(s)
- Pengpai Li
  - Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, Shandong 250061, China
- Zhi-Ping Liu
  - Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, Shandong 250061, China
85
|
Graef J, Ehrt C, Rarey M. Binding Site Detection Remastered: Enabling Fast, Robust, and Reliable Binding Site Detection and Descriptor Calculation with DoGSite3. J Chem Inf Model 2023; 63:3128-3137. [PMID: 37130052 DOI: 10.1021/acs.jcim.3c00336] [Citation(s) in RCA: 39] [Impact Index Per Article: 19.5] [Indexed: 05/03/2023]
Abstract
Binding site prediction on protein structures is a crucial step in early phase drug discovery whenever experimental or predicted structure models are involved. DoGSite belongs to the widely used tools for this task. It is a grid-based method that uses a Difference-of-Gaussian filter to detect cavities on the protein surface. We recently reimplemented the first version of this method, released in 2010, focusing on improved binding site detection in the presence of ligands and optimized parameters for more robust, reliable, and fast predictions and binding site descriptor calculations. Here, we introduce the new version, DoGSite3, compare it to its predecessor, and re-evaluate DoGSite on published data sets for a large-scale comparative performance evaluation.
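The Difference-of-Gaussian (DoG) filter named in this abstract can be illustrated with a deliberately simplified sketch. DoGSite operates on a three-dimensional grid over the protein; the example below only shows the filter itself on a hypothetical one-dimensional profile, and none of the names or parameter values are taken from DoGSite.

```python
from math import exp


def gaussian_kernel(sigma, radius=None):
    """Discrete, normalized 1D Gaussian, truncated at about three sigma."""
    if radius is None:
        radius = int(3 * sigma + 0.5)
    weights = [exp(-(d * d) / (2.0 * sigma * sigma)) for d in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(signal, kernel):
    """Convolve with zero padding so the output keeps the input length."""
    radius = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            j = i + k - radius
            if 0 <= j < len(signal):
                acc += weight * signal[j]
        out.append(acc)
    return out


def difference_of_gaussians(signal, sigma_narrow, sigma_wide):
    """Narrow-smoothed signal minus wide-smoothed signal."""
    narrow = smooth(signal, gaussian_kernel(sigma_narrow))
    wide = smooth(signal, gaussian_kernel(sigma_wide))
    return [a - b for a, b in zip(narrow, wide)]


# Hypothetical profile: a five-cell-wide feature centered at index 20.
profile = [0.0] * 41
for i in range(18, 23):
    profile[i] = 1.0

response = difference_of_gaussians(profile, sigma_narrow=2.0, sigma_wide=4.0)
peak = max(range(len(response)), key=response.__getitem__)
```

Subtracting the wide blur from the narrow one suppresses flat regions and responds most strongly to features whose size lies between the two scales, so the response peaks at the center of the feature. The same band-pass behavior is what makes the filter useful for picking out pocket-sized regions on a grid.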
Affiliation(s)
- Joel Graef
  - Universität Hamburg, ZBH - Center for Bioinformatics, Bundesstraße 43, 20146 Hamburg, Germany
- Christiane Ehrt
  - Universität Hamburg, ZBH - Center for Bioinformatics, Bundesstraße 43, 20146 Hamburg, Germany
- Matthias Rarey
  - Universität Hamburg, ZBH - Center for Bioinformatics, Bundesstraße 43, 20146 Hamburg, Germany
86
|
Sharkia R, Jain S, Mahajnah M, Habib C, Azem A, Al-Shareef W, Zalan A. PTRH2 Gene Variants: Recent Review of the Phenotypic Features and Their Bioinformatics Analysis. Genes (Basel) 2023; 14:genes14051031. [PMID: 37239392 DOI: 10.3390/genes14051031] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/08/2023] [Revised: 04/25/2023] [Accepted: 04/28/2023] [Indexed: 05/28/2023]
Abstract
Peptidyl-tRNA hydrolase 2 (PTRH2) is an evolutionarily highly conserved mitochondrial protein. The biallelic mutations in the PTRH2 gene have been suggested to cause a rare autosomal recessive disorder characterized by an infantile-onset multisystem neurologic endocrine and pancreatic disease (IMNEPD). Patients with IMNEPD present varying clinical manifestations, including global developmental delay associated with microcephaly, growth retardation, progressive ataxia, distal muscle weakness with ankle contractures, demyelinating sensorimotor neuropathy, sensorineural hearing loss, and abnormalities of thyroid, pancreas, and liver. In the current study, we conducted an extensive literature review with an emphasis on the variable clinical spectrum and genotypes in patients. Additionally, we reported on a new case with a previously documented mutation. A bioinformatics analysis of the various PTRH2 gene variants was also carried out from a structural perspective. It appears that the most common clinical characteristics among all patients include motor delay (92%), neuropathy (90%), distal weakness (86.4%), intellectual disability (84%), hearing impairment (80%), ataxia (79%), and deformity of head and face (~70%). The less common characteristics include hand deformity (64%), cerebellar atrophy/hypoplasia (47%), and pancreatic abnormality (35%), while the least common appear to be diabetes mellitus (~30%), liver abnormality (~22%), and hypothyroidism (16%). Three missense mutations were revealed in the PTRH2 gene, the most common one being Q85P, which was shared by four different Arab communities and was presented in our new case. Moreover, four different nonsense mutations in the PTRH2 gene were detected. It may be concluded that disease severity depends on the PTRH2 gene variant, as most of the clinical features are manifested by nonsense mutations, while only the common features are presented by missense mutations. 
A bioinformatics analysis of the various PTRH2 gene variants also suggested the mutations to be deleterious, as they seem to disrupt the structural conformation of the enzyme, leading to loss of stability and functionality.
Affiliation(s)
- Rajech Sharkia
  - Unit of Human Biology and Genetics, Triangle Regional Research and Development Center, Kfar Qari 30075, Israel
  - Unit of Natural Sciences, Beit-Berl Academic College, Beit-Berl 4490500, Israel
- Sahil Jain
  - Department of Biochemistry and Molecular Biology, Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Muhammad Mahajnah
  - The Ruth and Bruce Rappaport Faculty of Medicine, Technion-Israel Institute of Technology, Haifa 31096, Israel
  - Child Neurology and Development Center, Hillel Yaffe Medical Center, Hadera 38100, Israel
- Clair Habib
  - Genetics Institute, Rambam Health Care Campus, Haifa 31096, Israel
- Abdussalam Azem
  - Department of Biochemistry and Molecular Biology, Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Wasif Al-Shareef
  - Unit of Human Biology and Genetics, Triangle Regional Research and Development Center, Kfar Qari 30075, Israel
- Abdelnaser Zalan
  - Unit of Human Biology and Genetics, Triangle Regional Research and Development Center, Kfar Qari 30075, Israel
87
|
Krapp LF, Abriata LA, Cortés Rodriguez F, Dal Peraro M. PeSTo: parameter-free geometric deep learning for accurate prediction of protein binding interfaces. Nat Commun 2023; 14:2175. [PMID: 37072397 PMCID: PMC10113261 DOI: 10.1038/s41467-023-37701-8] [Citation(s) in RCA: 56] [Impact Index Per Article: 28.0] [Received: 02/06/2023] [Accepted: 03/28/2023] [Indexed: 04/20/2023]
Abstract
Proteins are essential molecular building blocks of life, responsible for most biological functions as a result of their specific molecular interactions. However, predicting their binding interfaces remains a challenge. In this study, we present a geometric transformer that acts directly on atomic coordinates labeled only with element names. The resulting model, the Protein Structure Transformer (PeSTo), surpasses the current state of the art in predicting protein-protein interfaces and can also predict and differentiate between interfaces involving nucleic acids, lipids, ions, and small molecules with high confidence. Its low computational cost enables processing high volumes of structural data, such as molecular dynamics ensembles, allowing for the discovery of interfaces that remain otherwise inconspicuous in static experimentally solved structures. Moreover, the growing foldome provided by de novo structural predictions can be easily analyzed, providing new opportunities to uncover unexplored biology.
Affiliation(s)
- Lucien F Krapp
  - Institute of Bioengineering, School of Life Sciences, Ecole Fédérale de Lausanne (EPFL) and Swiss Institute of Bioinformatics (SIB), Lausanne, 1015, Switzerland
- Luciano A Abriata
  - Institute of Bioengineering, School of Life Sciences, Ecole Fédérale de Lausanne (EPFL) and Swiss Institute of Bioinformatics (SIB), Lausanne, 1015, Switzerland
- Fabio Cortés Rodriguez
  - Institute of Bioengineering, School of Life Sciences, Ecole Fédérale de Lausanne (EPFL) and Swiss Institute of Bioinformatics (SIB), Lausanne, 1015, Switzerland
- Matteo Dal Peraro
  - Institute of Bioengineering, School of Life Sciences, Ecole Fédérale de Lausanne (EPFL) and Swiss Institute of Bioinformatics (SIB), Lausanne, 1015, Switzerland
88
|
Isert C, Atz K, Schneider G. Structure-based drug design with geometric deep learning. Curr Opin Struct Biol 2023; 79:102548. [PMID: 36842415 DOI: 10.1016/j.sbi.2023.102548] [Citation(s) in RCA: 51] [Impact Index Per Article: 25.5] [Received: 10/24/2022] [Revised: 01/16/2023] [Accepted: 01/24/2023] [Indexed: 02/26/2023]
Abstract
Structure-based drug design uses three-dimensional geometric information of macromolecules, such as proteins or nucleic acids, to identify suitable ligands. Geometric deep learning, an emerging concept of neural-network-based machine learning, has been applied to macromolecular structures. This review provides an overview of the recent applications of geometric deep learning in bioorganic and medicinal chemistry, highlighting its potential for structure-based drug discovery and design. Emphasis is placed on molecular property prediction, ligand binding site and pose prediction, and structure-based de novo molecular design. The current challenges and opportunities are highlighted, and a forecast of the future of geometric deep learning for drug discovery is presented.
Affiliation(s)
- Clemens Isert
  - ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, Zurich, 8093, Switzerland
- Kenneth Atz
  - ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, Zurich, 8093, Switzerland
- Gisbert Schneider
  - ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, Zurich, 8093, Switzerland; ETH Singapore SEC Ltd, 1 CREATE Way, #06-01 CREATE Tower, Singapore, 8093, Singapore.
|
89
|
Gao Z, Jiang C, Zhang J, Jiang X, Li L, Zhao P, Yang H, Huang Y, Li J. Hierarchical graph learning for protein-protein interaction. Nat Commun 2023; 14:1093. [PMID: 36841846 PMCID: PMC9968329 DOI: 10.1038/s41467-023-36736-1] [Citation(s) in RCA: 39] [Impact Index Per Article: 19.5] [Received: 10/18/2022] [Accepted: 02/14/2023] [Indexed: 02/27/2023] Open
Abstract
Protein-protein interactions (PPIs) are fundamental to function and signaling in biological systems. The massive growth in demand and cost associated with experimental PPI studies calls for computational tools for automated prediction and understanding of PPIs. Despite recent progress, in silico methods remain inadequate in modeling the natural PPI hierarchy. Here we present a double-viewed hierarchical graph learning model, HIGH-PPI, to predict PPIs and extrapolate the molecular details involved. In this model, we create a hierarchical graph, in which a node in the PPI network (top outside-of-protein view) is a protein graph (bottom inside-of-protein view). In the bottom view, a group of chemically relevant descriptors, instead of the protein sequences, is used to better capture the structure-function relationship of the protein. HIGH-PPI examines both the outside-of-protein and inside-of-protein views of the human interactome to establish a robust machine understanding of PPIs. This model demonstrates high accuracy and robustness in predicting PPIs. Moreover, HIGH-PPI can interpret the modes of action of PPIs by identifying important binding and catalytic sites precisely. Overall, HIGH-PPI (https://github.com/zqgao22/HIGH-PPI) is a domain-knowledge-driven and interpretable framework for PPI prediction studies.
Affiliation(s)
- Ziqi Gao
  - Data Science and Analytics, The Hong Kong University of Science and Technology, Guangzhou, 511400, China; Division of Emerging Interdisciplinary Areas, The Hong Kong University of Science and Technology, Hong Kong SAR, China
- Chenran Jiang
  - Pingshan Translational Medicine Center, Shenzhen Bay Laboratory, Shenzhen, 518118, China
- Jiawen Zhang
  - Data Science and Analytics, The Hong Kong University of Science and Technology, Guangzhou, 511400, China
- Xiaosen Jiang
  - The Cancer Hospital of the University of Chinese Academy of Sciences (Zhejiang Cancer Hospital), Chinese Academy of Sciences, Hangzhou, 310022, China
- Lanqing Li
  - AI Lab, Tencent, Shenzhen, 518000, China
- Huanming Yang
  - The Cancer Hospital of the University of Chinese Academy of Sciences (Zhejiang Cancer Hospital), Chinese Academy of Sciences, Hangzhou, 310022, China
- Yong Huang
  - Department of Chemistry, The Hong Kong University of Science and Technology, Hong Kong SAR, China.
- Jia Li
  - Data Science and Analytics, The Hong Kong University of Science and Technology, Guangzhou, 511400, China; Division of Emerging Interdisciplinary Areas, The Hong Kong University of Science and Technology, Hong Kong SAR, China.
|
90
|
Derry A, Altman RB. COLLAPSE: A representation learning framework for identification and characterization of protein structural sites. Protein Sci 2023; 32:e4541. [PMID: 36519247 PMCID: PMC9847082 DOI: 10.1002/pro.4541] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/30/2022] [Revised: 12/02/2022] [Accepted: 12/08/2022] [Indexed: 12/23/2022]
Abstract
The identification and characterization of the structural sites which contribute to protein function are crucial for understanding biological mechanisms, evaluating disease risk, and developing targeted therapies. However, the quantity of known protein structures is rapidly outpacing our ability to functionally annotate them. Existing methods for function prediction either do not operate on local sites, suffer from high false positive or false negative rates, or require large site-specific training datasets, necessitating the development of new computational methods for annotating functional sites at scale. We present COLLAPSE (Compressed Latents Learned from Aligned Protein Structural Environments), a framework for learning deep representations of protein sites. COLLAPSE operates directly on the 3D positions of atoms surrounding a site and uses evolutionary relationships between homologous proteins as a self-supervision signal, enabling learned embeddings to implicitly capture structure-function relationships within each site. Our representations generalize across disparate tasks in a transfer learning context, achieving state-of-the-art performance on standardized benchmarks (protein-protein interactions and mutation stability) and on the prediction of functional sites from the Prosite database. We use COLLAPSE to search for similar sites across large protein datasets and to annotate proteins based on a database of known functional sites. These methods demonstrate that COLLAPSE is computationally efficient, tunable, and interpretable, providing a general-purpose platform for computational protein analysis.
Affiliation(s)
- Alexander Derry
  - Department of Biomedical Data Science, Stanford University, Stanford, California, USA
- Russ B. Altman
  - Department of Biomedical Data Science, Stanford University, Stanford, California, USA
  - Departments of Bioengineering, Genetics, and Medicine, Stanford University, Stanford, California, USA
|
91
|
Rogers JR, Nikolényi G, AlQuraishi M. Growing ecosystem of deep learning methods for modeling protein-protein interactions. Protein Eng Des Sel 2023; 36:gzad023. [PMID: 38102755 DOI: 10.1093/protein/gzad023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2023] [Revised: 12/06/2023] [Accepted: 12/07/2023] [Indexed: 12/17/2023] Open
Abstract
Numerous cellular functions rely on protein-protein interactions. Efforts to comprehensively characterize them remain challenged however by the diversity of molecular recognition mechanisms employed within the proteome. Deep learning has emerged as a promising approach for tackling this problem by exploiting both experimental data and basic biophysical knowledge about protein interactions. Here, we review the growing ecosystem of deep learning methods for modeling protein interactions, highlighting the diversity of these biophysically informed models and their respective trade-offs. We discuss recent successes in using representation learning to capture complex features pertinent to predicting protein interactions and interaction sites, geometric deep learning to reason over protein structures and predict complex structures, and generative modeling to design de novo protein assemblies. We also outline some of the outstanding challenges and promising new directions. Opportunities abound to discover novel interactions, elucidate their physical mechanisms, and engineer binders to modulate their functions using deep learning and, ultimately, unravel how protein interactions orchestrate complex cellular behaviors.
Affiliation(s)
- Julia R Rogers
  - Department of Systems Biology, Columbia University, New York, NY 10032, USA
- Gergő Nikolényi
  - Department of Systems Biology, Columbia University, New York, NY 10032, USA
|
92
|
Zheng J, Yang X, Zhang Z. Using PlaPPISite to Predict and Analyze Plant Protein-Protein Interaction Sites. Methods Mol Biol 2023; 2690:385-399. [PMID: 37450161 DOI: 10.1007/978-1-0716-3327-4_30] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/18/2023]
Abstract
Proteome-wide characterization of protein-protein interactions (PPIs) is crucial for systematically understanding the functional roles of protein machinery within cells. With the accumulation of PPI data in different plants, the interaction details of binary PPIs, such as the three-dimensional (3D) structural contexts of interaction sites/interfaces, are urgently needed. To meet this requirement, we have developed a comprehensive and easy-to-use database called PlaPPISite (http://zzdlab.com/plappisite/index.php) to present interaction details for 13 plant interactomes. Here, we provide a clear guide on how to search and view protein interaction details through the PlaPPISite database. First, the running environment of our database is introduced. Second, the input file format is briefly introduced. We then discuss, through several examples, which information related to interaction sites can be obtained. In addition, some notes about PlaPPISite are provided. More importantly, we would like to emphasize the importance of interaction site information in plant systems biology through this user guide of PlaPPISite. In particular, the easily accessible 3D structures of PPIs in the coming post-AlphaFold2 era will boost the application of plant interactomes to decipher the molecular mechanisms of many fundamental biological issues.
Affiliation(s)
- Jingyan Zheng
  - State Key Laboratory of Animal Biotech Breeding, College of Biological Sciences, China Agricultural University, Beijing, China
- Xiaodi Yang
  - Department of Hematology, Peking University First Hospital, Beijing, China.
- Ziding Zhang
  - State Key Laboratory of Animal Biotech Breeding, College of Biological Sciences, China Agricultural University, Beijing, China.
|
93
|
Akagawa M, Shirai T, Sada M, Nagasawa N, Kondo M, Takeda M, Nagasawa K, Kimura R, Okayama K, Hayashi Y, Sugai T, Tsugawa T, Ishii H, Kawashima H, Katayama K, Ryo A, Kimura H. Detailed Molecular Interactions between Respiratory Syncytial Virus Fusion Protein and the TLR4/MD-2 Complex In Silico. Viruses 2022; 14:v14112382. [PMID: 36366480 PMCID: PMC9694959 DOI: 10.3390/v14112382] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2022] [Revised: 10/19/2022] [Accepted: 10/27/2022] [Indexed: 01/31/2023] Open
Abstract
Molecular interactions between the respiratory syncytial virus (RSV) fusion protein (F protein) and the complex of the cellular receptor Toll-like receptor 4 (TLR4) and myeloid differentiation factor-2 (MD-2) are unknown. Thus, to reveal the detailed molecular interactions between them, in silico analyses were performed using various bioinformatics techniques. The present simulation data showed that the neutralizing antibody (NT-Ab) binding sites in both prefusion and postfusion proteins at sites II and IV were involved in the interactions between them and the TLR4 molecule. Moreover, the binding affinity between postfusion proteins and the TLR4/MD-2 complex was higher than that between prefusion proteins and the TLR4/MD-2 complex. This increased binding affinity due to conformational changes in the F protein may contribute to syncytium formation in RSV-infected cells. These results may contribute to a better understanding of the infectivity and pathogenicity (syncytium formation) of RSV.
Affiliation(s)
- Mao Akagawa
  - Department of Health Science, Graduate School of Health Sciences, Gunma Paz University, Takasaki-shi 370-0006, Japan
- Tatsuya Shirai
  - Advanced Medical Science Research Center, Gunma Paz University Research Institute, Shibukawa-shi 377-0008, Japan
- Mitsuru Sada
  - Department of Health Science, Graduate School of Health Sciences, Gunma Paz University, Takasaki-shi 370-0006, Japan
- Norika Nagasawa
  - Department of Health Science, Graduate School of Health Sciences, Gunma Paz University, Takasaki-shi 370-0006, Japan
- Mayumi Kondo
  - Department of Clinical Engineering, Faculty of Medical Technology, Gunma Paz University, Takasaki-shi 370-0006, Japan
- Makoto Takeda
  - Department of Virology III, National Institute of Infectious Diseases, Musashimurayama-shi, Tokyo 208-0011, Japan
- Koo Nagasawa
  - Department of Pediatrics, Graduate School of Medical Science, Chiba University, Chiba-shi 260-8670, Japan
- Ryusuke Kimura
  - Advanced Medical Science Research Center, Gunma Paz University Research Institute, Shibukawa-shi 377-0008, Japan
  - Department of Bacteriology, Graduate School of Medicine, Gunma University, Maebashi-shi 371-8514, Japan
- Kaori Okayama
  - Department of Health Science, Graduate School of Health Sciences, Gunma Paz University, Takasaki-shi 370-0006, Japan
- Yuriko Hayashi
  - Department of Health Science, Graduate School of Health Sciences, Gunma Paz University, Takasaki-shi 370-0006, Japan
- Toshiyuki Sugai
  - Department of Nursing Science, Graduate School of Health Science, Hiroshima University, Hiroshima-shi 734-8551, Japan
- Takeshi Tsugawa
  - Department of Pediatrics, School of Medicine, Sapporo Medical University, Sapporo-shi 060-8543, Japan
- Haruyuki Ishii
  - Department of Respiratory Medicine, School of Medicine, Kyorin University, Mitaka-shi, Tokyo 181-8611, Japan
- Hisashi Kawashima
  - Department of Pediatrics and Adolescent Medicine, Tokyo Medical University, Shinjuku-ku, Tokyo 160-0023, Japan
- Kazuhiko Katayama
  - Laboratory of Viral Infection Control, Graduate School of Infection Control Sciences, Ōmura Satoshi Memorial Institute, Kitasato University, Minato-ku, Tokyo 108-8641, Japan
- Akihide Ryo
  - Department of Microbiology, School of Medicine, Yokohama City University, Yokohama-shi 236-0004, Japan
- Hirokazu Kimura
  - Department of Health Science, Graduate School of Health Sciences, Gunma Paz University, Takasaki-shi 370-0006, Japan
  - Advanced Medical Science Research Center, Gunma Paz University Research Institute, Shibukawa-shi 377-0008, Japan
  - Correspondence: Tel.: +81-27-365-3366; Fax: +81-42-247-8077
|
94
|
Tubiana J, Xiang Y, Fan L, Wolfson HJ, Chen K, Schneidman-Duhovny D, Shi Y. Reduced B cell antigenicity of Omicron lowers host serologic response. Cell Rep 2022; 41:111512. [PMID: 36223774 PMCID: PMC9515332 DOI: 10.1016/j.celrep.2022.111512] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 06/07/2022] [Revised: 08/10/2022] [Accepted: 09/26/2022] [Indexed: 11/25/2022] Open
Abstract
The SARS-CoV-2 Omicron variant evades most neutralizing vaccine-induced antibodies and is associated with lower antibody titers upon breakthrough infections than previous variants. However, the mechanism remains unclear. Here, using a geometric deep-learning model, we find that Omicron's extensively mutated receptor binding site (RBS) features reduced antigenicity compared with previous variants. Mouse immunization experiments with different recombinant receptor binding domain (RBD) variants confirm that the serological response to Omicron is drastically attenuated and less potent. Analyses of serum cross-reactivity and competitive ELISA reveal a reduction in antibody response across both variable and conserved RBD epitopes. Computational modeling confirms that the RBS has a potential for further antigenicity reduction while retaining efficient receptor binding. Finally, we find a similar trend of antigenicity reduction over decades for hCoV229E, a common cold coronavirus. Thus, our study explains the reduced antibody titers associated with Omicron infection and reveals a possible trajectory of future viral evolution.
Affiliation(s)
- Jérôme Tubiana
  - Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv 6997801, Israel; School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem 9190501, Israel
- Yufei Xiang
  - Center for Protein Engineering and Therapeutics, Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Li Fan
  - Division of Pulmonary, Allergy, and Critical Care Medicine, Department of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Haim J Wolfson
  - Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv 6997801, Israel
- Kong Chen
  - Division of Pulmonary, Allergy, and Critical Care Medicine, Department of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA.
- Dina Schneidman-Duhovny
  - School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem 9190501, Israel.
- Yi Shi
  - Center for Protein Engineering and Therapeutics, Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.
|
95
|
Sim J, Kwon S, Seok C. HProteome-BSite: predicted binding sites and ligands in human 3D proteome. Nucleic Acids Res 2022; 51:D403-D408. [PMID: 36243970 PMCID: PMC9825455 DOI: 10.1093/nar/gkac873] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2022] [Revised: 09/20/2022] [Accepted: 09/29/2022] [Indexed: 01/29/2023] Open
Abstract
Atomic-level knowledge of protein-ligand interactions allows a detailed understanding of protein functions and provides critical clues to discovering molecules regulating the functions. While recent innovative deep learning methods for protein structure prediction dramatically increased the structural coverage of the human proteome, molecular interactions remain largely unknown. A new database, HProteome-BSite, provides predictions of binding sites and ligands in the enlarged 3D human proteome. The model structures for human proteins from the AlphaFold Protein Structure Database were processed to structural domains of high confidence to maximize the coverage and reliability of interaction prediction. For ligand binding site prediction, an updated version of a template-based method GalaxySite was used. A high-level performance of the updated GalaxySite was confirmed. HProteome-BSite covers 80.74% of the UniProt entries in the AlphaFold human 3D proteome. Predicted binding sites and binding poses of potential ligands are provided for effective applications to further functional studies and drug discovery. The HProteome-BSite database is available at https://galaxy.seoklab.org/hproteome-bsite/database and is free and open to all users.
Affiliation(s)
- Jiho Sim
  - Department of Chemistry, Seoul National University, Seoul 08826, Republic of Korea
- Sohee Kwon
  - Department of Chemistry, Seoul National University, Seoul 08826, Republic of Korea; Galux Inc, Gwanak-gu, Seoul 08738, Republic of Korea
- Chaok Seok
  - To whom correspondence should be addressed. Tel: +82 2 880 9197; Fax: +82 2 889 1568
|
96
|
Evaluation of the Effectiveness of Derived Features of AlphaFold2 on Single-Sequence Protein Binding Site Prediction. BIOLOGY 2022; 11:biology11101454. [PMID: 36290358 PMCID: PMC9598995 DOI: 10.3390/biology11101454] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/05/2022] [Revised: 09/30/2022] [Accepted: 09/30/2022] [Indexed: 11/06/2022]
Abstract
Simple Summary: With the development of artificial intelligence, researchers can roughly predict the crystal structure of a protein by computer without the need for biological experiments, which provides new ideas and solutions to problems, such as protein-protein interaction and drug-target predictions. In this study, we proposed strategies to combine predicted protein structures with deep learning networks and evaluated them on different protein binding site prediction tasks. Our computational experiment results showed that all proposed strategies could effectively encode structural information for deep learning models.
Abstract: Though AlphaFold2 has attained considerably high precision on protein structure prediction, it is reported that directly inputting coordinates into deep learning networks cannot achieve desirable results on downstream tasks. Thus, how to process and encode the predicted results into effective forms that deep learning models can understand to improve the performance of downstream tasks is worth exploring. In this study, we tested the effects of five processing strategies of coordinates on two single-sequence protein binding site prediction tasks. These five strategies are spatial filtering, the singular value decomposition of a distance map, calculating the secondary structure feature, and the relative accessible surface area feature of proteins. The computational experiment results showed that all strategies were suitable and effective methods to encode structural information for deep learning models. In addition, by performing a case study of a mutated protein, we showed that the spatial filtering strategy could introduce structural changes into HHblits profiles and deep learning networks when protein mutation happens. In sum, this work provides new insight into the downstream tasks of protein-molecule interaction prediction, such as predicting the binding residues of proteins and estimating the effects of mutations.
|
97
|
Sun Y, Jiao Y, Shi C, Zhang Y. Deep learning-based molecular dynamics simulation for structure-based drug design against SARS-CoV-2. Comput Struct Biotechnol J 2022; 20:5014-5027. [PMID: 36091720 PMCID: PMC9448712 DOI: 10.1016/j.csbj.2022.09.002] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 03/14/2022] [Revised: 08/03/2022] [Accepted: 09/03/2022] [Indexed: 11/26/2022] Open
Abstract
Coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus type 2 (SARS-CoV-2), has led to a global pandemic. Deep learning (DL) technology and molecular dynamics (MD) simulation are two mainstream computational approaches to investigate the geometric, chemical and structural features of proteins and guide the relevant drug design. Despite a large number of research papers focusing on drug design for SARS-CoV-2 using DL architectures, it remains unclear how the binding energy of the protein-protein/ligand complex dynamically evolves, which is also vital for drug development. In addition, traditional deep neural networks usually have obvious deficiencies in predicting the interaction sites as protein conformation changes. In this review, we introduce the latest progress in DL and DL-based MD simulation approaches in structure-based drug design (SBDD) for SARS-CoV-2, which could address the problems of protein structure and binding prediction, drug virtual screening, molecular docking and complex evolution. Furthermore, the current challenges and future directions of DL-based MD simulation for SBDD are also discussed.
Affiliation(s)
- Yao Sun
  - School of Science, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong 518055, China
- Yanqi Jiao
  - School of Science, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong 518055, China
- Chengcheng Shi
  - State Key Lab of Urban Water Resource and Environment, School of Science, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong 518055, China
- Yang Zhang
  - School of Science, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong 518055, China
|