1
Gainza P, Bunker RD, Townson SA, Castle JC. Machine learning to predict de novo protein-protein interactions. Trends Biotechnol 2025:S0167-7799(25)00158-1. [PMID: 40425414 DOI: 10.1016/j.tibtech.2025.04.013] [Received: 11/27/2024] [Revised: 04/23/2025] [Accepted: 04/23/2025] [Indexed: 05/29/2025]
Abstract
Advances in machine learning for structural biology have dramatically enhanced our capacity to predict protein-protein interactions (PPIs). Here, we review recent developments in the computational prediction of PPIs, particularly focusing on innovations that enable interaction predictions that have no precedent in nature, termed de novo. We discuss novel machine learning algorithms for PPI prediction, including approaches based on co-folding and atomic graphs. We further highlight methods that learn from molecular surfaces, which can predict PPIs not found in nature, including interactions induced by small molecules. Finally, we explore the emerging biotechnological applications enabled by these predictive capabilities, including the prediction of antibody-antigen complexes and molecular glue-induced PPIs, and discuss their potential to empower drug discovery and protein engineering.
Affiliation(s)
- Pablo Gainza
  - Monte Rosa Therapeutics, Klybeckstrasse 191, 4057 Basel, Switzerland.
- Richard D Bunker
  - Monte Rosa Therapeutics, Klybeckstrasse 191, 4057 Basel, Switzerland
- Sharon A Townson
  - Monte Rosa Therapeutics, Klybeckstrasse 191, 4057 Basel, Switzerland
- John C Castle
  - Monte Rosa Therapeutics, Klybeckstrasse 191, 4057 Basel, Switzerland.
2
Ballerini C, Amoriello R, Maghrebi O, Bellucci G, Addazio I, Betti M, Aprea MG, Masciulli C, Caporali A, Penati V, Ballerini C, De Meo E, Portaccio E, Salvetti M, Amato MP. Exploring the role of EBV in multiple sclerosis pathogenesis through EBV interactome. Front Immunol 2025; 16:1557483. [PMID: 40242760 PMCID: PMC11999961 DOI: 10.3389/fimmu.2025.1557483] [Received: 01/08/2025] [Accepted: 03/18/2025] [Indexed: 04/18/2025] Open
Abstract
Background: Epstein-Barr virus (EBV) is a known risk factor for multiple sclerosis (MS), even though the underlying molecular mechanisms are unclear and engage multiple immune pathways. Furthermore, the ultimate role of EBV in MS pathogenesis is still elusive. In contrast, Cytomegalovirus (CMV) has been identified as a protective factor for MS. Objectives: This study aims to identify MS-associated genes that overlap with EBV interactome and to examine their expression in immune and glial cell subtypes. Methods: We used P-HIPSTer, GWAS, and the Human Protein Atlas (HPA) to derive data on the EBV interactome, MS-associated genes and single-cell gene expression in immune and glial cells. The geneOverlap and dplyr R packages identified overlapping genes. A similar analysis was done for CMV and Adenovirus as negative control. Metascape and GTEx analyzed biological pathways and brain-level gene expression; transcriptomic analysis was performed on glial cells and peripheral blood in MS and controls. All the analyses performed in this study were generated using publicly available data sets. Results: We identified a "core" group of 21 genes shared across EBV interactome, MS genes, and immune and glial cells (p<0.001). Pathway analysis revealed expected associations, such as immune system activation, and unforeseen results, like the prolactin signaling pathway. BCL2 in astrocytes, MINK1 in microglia were significantly upregulated while AHI1 was downregulated in MS compared to controls. Conclusions: Our findings offer novel insights into EBV and CMV interaction with immune and glial cells in MS, that may shed light on mechanisms involved in disease pathophysiology.
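The gene-overlap step in the Methods can be illustrated with a hypergeometric tail probability, which is equivalent to a one-sided Fisher's exact test on the 2x2 overlap table. The sketch below is not the authors' pipeline (they report using R packages): the function name `overlap_pvalue`, the placeholder gene identifiers, and the background size of 20 are all assumptions chosen for illustration.

```python
from math import comb


def overlap_pvalue(set_a, set_b, universe_size):
    """P(overlap >= observed) when set_b is drawn at random from the universe."""
    a, b = len(set_a), len(set_b)
    k = len(set_a & set_b)
    total = comb(universe_size, b)
    # Sum the hypergeometric mass from the observed overlap upward.
    tail = sum(
        comb(a, i) * comb(universe_size - a, b - i)
        for i in range(k, min(a, b) + 1)
    )
    return tail / total


# Placeholder identifiers, not genes from the study.
interactome = {"G1", "G2", "G3", "G4"}
disease_genes = {"G3", "G4", "G5"}
shared = interactome & disease_genes
p = overlap_pvalue(interactome, disease_genes, universe_size=20)
print(sorted(shared), p)
```

The choice of background (the `universe_size` argument) strongly affects the resulting p-value, so in practice it should reflect the set of genes that could have appeared in either list.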
Affiliation(s)
- Chiara Ballerini
  - Department of Neuroscience, Psychology, Drug Research and Child Health (NEUROFARBA), University of Florence, Florence, Italy
- Roberta Amoriello
  - Department of Experimental and Clinical Medicine, University of Florence, Florence, Italy
- Olfa Maghrebi
  - Department of Experimental and Clinical Medicine, University of Florence, Florence, Italy
- Gianmarco Bellucci
  - Department of Neurosciences, Mental Health and Sensory Organs, Sapienza University of Rome, Rome, Italy
- Ilaria Addazio
  - Department of Neuroscience, Psychology, Drug Research and Child Health (NEUROFARBA), University of Florence, Florence, Italy
- Matteo Betti
  - Department of Neuroscience, Psychology, Drug Research and Child Health (NEUROFARBA), University of Florence, Florence, Italy
- Maria Grazia Aprea
  - Department of Neuroscience, Psychology, Drug Research and Child Health (NEUROFARBA), University of Florence, Florence, Italy
- Camilla Masciulli
  - Department of Neuroscience, Psychology, Drug Research and Child Health (NEUROFARBA), University of Florence, Florence, Italy
- Arianna Caporali
  - Department of Neuroscience, Psychology, Drug Research and Child Health (NEUROFARBA), University of Florence, Florence, Italy
- Valeria Penati
  - Department of Neuroscience, Psychology, Drug Research and Child Health (NEUROFARBA), University of Florence, Florence, Italy
- Clara Ballerini
  - Department of Experimental and Clinical Medicine, University of Florence, Florence, Italy
- Ermelinda De Meo
  - Department of Neuroscience, Psychology, Drug Research and Child Health (NEUROFARBA), University of Florence, Florence, Italy
- Emilio Portaccio
  - Department of Neuroscience, Psychology, Drug Research and Child Health (NEUROFARBA), University of Florence, Florence, Italy
- Marco Salvetti
  - Department of Neurosciences, Mental Health and Sensory Organs, Sapienza University of Rome, Rome, Italy
  - Neuromed, IRCCS Istituto Neurologico Mediterraneo (INM), Pozzilli, Italy
- Maria Pia Amato
  - Department of Neuroscience, Psychology, Drug Research and Child Health (NEUROFARBA), University of Florence, Florence, Italy
  - Istituti di Ricovero e Cura a Carattere Scientifico (IRCCS) Fondazione Don Carlo Gnocchi, University of Florence, Florence, Italy
3
Yoon MS, Bae B, Kim K, Park H, Baek M. Deep learning methods for proteome-scale interaction prediction. Curr Opin Struct Biol 2025; 90:102981. [PMID: 39848140 DOI: 10.1016/j.sbi.2024.102981] [Received: 09/11/2024] [Revised: 11/13/2024] [Accepted: 12/22/2024] [Indexed: 01/25/2025]
Abstract
Proteome-scale interaction prediction is essential for understanding protein functions and disease mechanisms. Traditional experimental methods are often limited by scale and complexity, driving the need for computational approaches. Deep learning has emerged as a powerful tool, enabling high-throughput, accurate predictions of protein interactions. This review highlights recent advances in deep learning methods for protein-protein and protein-ligand interaction screening, along with datasets used for model training. Despite the progress with deep learning, challenges such as data quality and validation biases remain. We also discuss the increasing importance of integrating structural information to enhance prediction accuracy and how structure-based deep learning approaches can help overcome current limitations, ultimately advancing biological research and drug discovery.
Affiliation(s)
- Min Su Yoon
  - Department of Biological Sciences, Seoul National University, Seoul 08826, Republic of Korea
- Byunghyun Bae
  - Department of Chemistry, Seoul National University, Seoul 08826, Republic of Korea; Biomedical Research Division, Korea Institute of Science and Technology, Seoul 02792, Republic of Korea
- Kunhee Kim
  - Department of Biological Sciences, Seoul National University, Seoul 08826, Republic of Korea
- Hahnbeom Park
  - Biomedical Research Division, Korea Institute of Science and Technology, Seoul 02792, Republic of Korea; KIST-SKKU Brain Research Center, SKKU Institute for Convergence, Sungkyunkwan University, Suwon 16419, Republic of Korea.
- Minkyung Baek
  - Department of Biological Sciences, Seoul National University, Seoul 08826, Republic of Korea.
4
Buzzao D, Persson E, Guala D, Sonnhammer ELL. FunCoup 6: advancing functional association networks across species with directed links and improved user experience. Nucleic Acids Res 2025; 53:D658-D671. [PMID: 39530220 PMCID: PMC11701656 DOI: 10.1093/nar/gkae1021] [Received: 09/16/2024] [Revised: 10/11/2024] [Accepted: 10/17/2024] [Indexed: 11/16/2024] Open
Abstract
FunCoup 6 (https://funcoup.org) represents a significant advancement in global functional association networks, aiming to provide researchers with a comprehensive view of the functional coupling interactome. This update introduces novel methodologies and integrated tools for improved network inference and analysis. Major new developments in FunCoup 6 include vastly expanding the coverage of gene regulatory links, a new framework for bin-free Bayesian training and a new website. FunCoup 6 integrates a new tool for disease and drug target module identification using the TOPAS algorithm. To expand the utility of the resource for biomedical research, it incorporates pathway enrichment analysis using the ANUBIX and EASE algorithms. The unique comparative interactomics analysis in FunCoup provides insights into network conservation, now allowing users to align orthologs only or query each species network independently. Bin-free training was applied to 23 primary species, and in addition, networks were generated for all remaining 618 species in InParanoiDB 9. Accompanying these advancements, FunCoup 6 features a redesigned website, together with updated API functionalities, and represents a pivotal step forward in functional genomics research, offering unique capabilities for exploring the complex landscape of protein interactions.
Affiliation(s)
- Davide Buzzao
  - Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Box 1031, 171 21 Solna, Sweden
- Emma Persson
  - Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Box 1031, 171 21 Solna, Sweden
- Dimitri Guala
  - Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Box 1031, 171 21 Solna, Sweden
- Erik L L Sonnhammer
  - Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Box 1031, 171 21 Solna, Sweden
5
Wright SN, Colton S, Schaffer LV, Pillich RT, Churas C, Pratt D, Ideker T. State of the interactomes: an evaluation of molecular networks for generating biological insights. Mol Syst Biol 2025; 21:1-29. [PMID: 39653848 PMCID: PMC11697402 DOI: 10.1038/s44320-024-00077-y] [Received: 09/18/2024] [Revised: 11/07/2024] [Accepted: 11/11/2024] [Indexed: 12/18/2024] Open
Abstract
Advancements in genomic and proteomic technologies have powered the creation of large gene and protein networks ("interactomes") for understanding biological systems. However, the proliferation of interactomes complicates the selection of networks for specific applications. Here, we present a comprehensive evaluation of 45 current human interactomes, encompassing protein-protein interactions as well as gene regulatory, signaling, colocalization, and genetic interaction networks. Our analysis shows that large composite networks such as HumanNet, STRING, and FunCoup are most effective for identifying disease genes, while smaller networks such as DIP, Reactome, and SIGNOR demonstrate stronger performance in interaction prediction. Our study provides a benchmark for interactomes across diverse biological applications and clarifies factors that influence network performance. Furthermore, our evaluation pipeline paves the way for continued assessment of emerging and updated interaction networks in the future.
Affiliation(s)
- Sarah N Wright
  - Department of Medicine, University of California San Diego, La Jolla, CA, 92093, USA
- Scott Colton
  - Department of Medicine, University of California San Diego, La Jolla, CA, 92093, USA
- Leah V Schaffer
  - Department of Medicine, University of California San Diego, La Jolla, CA, 92093, USA
- Rudolf T Pillich
  - Department of Medicine, University of California San Diego, La Jolla, CA, 92093, USA
- Christopher Churas
  - Department of Medicine, University of California San Diego, La Jolla, CA, 92093, USA
- Dexter Pratt
  - Department of Medicine, University of California San Diego, La Jolla, CA, 92093, USA
- Trey Ideker
  - Department of Medicine, University of California San Diego, La Jolla, CA, 92093, USA.
6
Bhadra-Lobo S, Derevyanko G, Lamoureux G. Dock2D: Synthetic Data for the Molecular Recognition Problem. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2024; 21:2580-2586. [PMID: 38814763 DOI: 10.1109/tcbb.2024.3407477] [Indexed: 06/01/2024]
Abstract
Predicting the physical interaction of proteins is a cornerstone problem in computational biology. New classes of learning-based algorithms are actively being developed, and are typically trained end-to-end on protein complex structures extracted from the Protein Data Bank. These training datasets tend to be large and difficult to use for prototyping and, unlike image or natural language datasets, they are not easily interpretable by non-experts. We present Dock2D-IP and Dock2D-IF, two "toy" datasets that can be used to select algorithms predicting protein-protein interactions, or any other type of molecular interaction. Using two-dimensional shapes as input, each example from Dock2D-IP ("interaction pose") describes the interaction pose of two shapes known to interact and each example from Dock2D-IF ("interaction fact") describes whether two shapes form a stable complex or not, regardless of how they bind. We propose a number of baseline solutions to the problem and show that the same underlying energy function can be learned either by solving the interaction pose task (formulated as an energy-minimization "docking" problem) or the fact-of-interaction task (formulated as a binding free energy estimation problem).
7
Wilkins GR, Lugo-Martinez J, Murphy RF. Improved protein interaction models predict differences in complexes between human cell lines. bioRxiv 2024:2024.10.25.620244. [PMID: 39484534 PMCID: PMC11527118 DOI: 10.1101/2024.10.25.620244] [Indexed: 11/03/2024]
Abstract
The interactions of proteins to form complexes play a crucial role in cell function. Data on protein-protein or pairwise interactions (PPI) typically come from a combination of sample separation and mass spectrometry. Since 2010, several extensive, high-throughput mass spectrometry-based experimental studies have dramatically expanded public repositories for PPI data and, by extension, our knowledge of protein complexes. Unfortunately, challenges of limited overlap between experiments, modality-oriented biases, and prohibitive costs of experimental reproducibility continue to limit coverage of the human protein assembly map, both underscoring the need for and spurring the development of relevant computational approaches. Here, we present a new method for predicting the strength of protein interactions. It addresses two important issues that have limited past PPI prediction approaches: incomplete feature sets and incomplete proteome coverage. For a given collection of protein pairs, we fused data from heterogeneous sources into a feature matrix and identified the minimal set of feature partitions for which a non-empty set of protein pairs had complete values. For each such feature partition, we trained a classifier to predict PPI probabilities. We then calculated an overall prediction for a given protein pair by weighting the probabilities from all models that applied to that pair. Our approach accurately identified known and highly probable PPI, far exceeding the performance of current approaches and providing more complete proteome coverage. We then used the predicted probabilities to assemble complexes using previously-described graph-based tools and clustering algorithms and again obtained improved results. Lastly, we used features for three human cell lines to predict PPI and complex scores and identified complexes predicted to differ between those cell lines.
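The combination step described above (one classifier per feature partition, then a weighted vote over the classifiers that apply to a given pair) can be sketched as follows. This is an illustrative reading of the abstract, not the authors' code: the feature names, the stand-in scoring functions, and the weights are hypothetical, and in practice each scoring function would be a trained classifier.

```python
def combine_predictions(pair_features, models):
    """Weighted average over models whose required features are all present.

    pair_features: dict of feature name -> value, or None when missing.
    models: list of (required_features, weight, predict_fn) tuples.
    """
    available = {name for name, value in pair_features.items() if value is not None}
    weighted_sum = 0.0
    weight_total = 0.0
    for required, weight, predict in models:
        if required <= available:  # model applies only with complete values
            weighted_sum += weight * predict(pair_features)
            weight_total += weight
    if weight_total == 0.0:
        return None  # no model covers this pair
    return weighted_sum / weight_total


# Hypothetical partitions and stand-in scoring functions.
models = [
    (frozenset({"coexpression"}), 1.0, lambda f: f["coexpression"]),
    (frozenset({"coexpression", "coelution"}), 3.0,
     lambda f: 0.5 * (f["coexpression"] + f["coelution"])),
]

full = {"coexpression": 0.8, "coelution": 0.4}
partial = {"coexpression": 0.8, "coelution": None}
print(combine_predictions(full, models))     # both models apply
print(combine_predictions(partial, models))  # only the first applies
```

Because a pair with missing features still falls back to whichever partitions it does cover, this kind of scheme trades some per-pair accuracy for broader proteome coverage.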
Affiliation(s)
- Gary R. Wilkins
  - Ray and Stephanie Lane Computational Biology Department, School of Computer Science, Carnegie Mellon University
- Jose Lugo-Martinez
  - Ray and Stephanie Lane Computational Biology Department, School of Computer Science, Carnegie Mellon University
- Robert F. Murphy
  - Ray and Stephanie Lane Computational Biology Department, School of Computer Science, Carnegie Mellon University
8
Xiong D, Qiu Y, Zhao J, Zhou Y, Lee D, Gupta S, Torres M, Lu W, Liang S, Kang JJ, Eng C, Loscalzo J, Cheng F, Yu H. A structurally informed human protein-protein interactome reveals proteome-wide perturbations caused by disease mutations. Nat Biotechnol 2024:10.1038/s41587-024-02428-4. [PMID: 39448882 DOI: 10.1038/s41587-024-02428-4] [Received: 02/01/2024] [Accepted: 09/11/2024] [Indexed: 10/26/2024]
Abstract
To assist the translation of genetic findings to disease pathobiology and therapeutics discovery, we present an ensemble deep learning framework, termed PIONEER (Protein-protein InteractiOn iNtErfacE pRediction), that predicts protein-binding partner-specific interfaces for all known protein interactions in humans and seven other common model organisms to generate comprehensive structurally informed protein interactomes. We demonstrate that PIONEER outperforms existing state-of-the-art methods and experimentally validate its predictions. We show that disease-associated mutations are enriched in PIONEER-predicted protein-protein interfaces and explore their impact on disease prognosis and drug responses. We identify 586 significant protein-protein interactions (PPIs) enriched with PIONEER-predicted interface somatic mutations (termed oncoPPIs) from analysis of approximately 11,000 whole exomes across 33 cancer types and show significant associations of oncoPPIs with patient survival and drug responses. PIONEER, implemented as both a web server platform and a software package, identifies functional consequences of disease-associated alleles and offers a deep learning tool for precision medicine at multiscale interactome network levels.
Grants
- R01GM124559 U.S. Department of Health & Human Services | NIH | National Institute of General Medical Sciences (NIGMS)
- R01GM125639 U.S. Department of Health & Human Services | NIH | National Institute of General Medical Sciences (NIGMS)
- R01GM130885 U.S. Department of Health & Human Services | NIH | National Institute of General Medical Sciences (NIGMS)
- RM1GM139738 U.S. Department of Health & Human Services | NIH | National Institute of General Medical Sciences (NIGMS)
- R01DK115398 U.S. Department of Health & Human Services | NIH | National Institute of Diabetes and Digestive and Kidney Diseases (National Institute of Diabetes & Digestive & Kidney Diseases)
- U01HG007691 U.S. Department of Health & Human Services | NIH | National Human Genome Research Institute (NHGRI)
- R01HL155107 U.S. Department of Health & Human Services | NIH | National Heart, Lung, and Blood Institute (NHLBI)
- R01HL155096 U.S. Department of Health & Human Services | NIH | National Heart, Lung, and Blood Institute (NHLBI)
- R01HL166137 U.S. Department of Health & Human Services | NIH | National Heart, Lung, and Blood Institute (NHLBI)
- U54HL119145 U.S. Department of Health & Human Services | NIH | National Heart, Lung, and Blood Institute (NHLBI)
- AHA957729 American Heart Association (American Heart Association, Inc.)
- 24MERIT1185447 American Heart Association (American Heart Association, Inc.)
- R01AG084250 U.S. Department of Health & Human Services | NIH | National Institute on Aging (U.S. National Institute on Aging)
- R56AG074001 U.S. Department of Health & Human Services | NIH | National Institute on Aging (U.S. National Institute on Aging)
- U01AG073323 U.S. Department of Health & Human Services | NIH | National Institute on Aging (U.S. National Institute on Aging)
- R01AG066707 U.S. Department of Health & Human Services | NIH | National Institute on Aging (U.S. National Institute on Aging)
- R01AG076448 U.S. Department of Health & Human Services | NIH | National Institute on Aging (U.S. National Institute on Aging)
- R01AG082118 U.S. Department of Health & Human Services | NIH | National Institute on Aging (U.S. National Institute on Aging)
- RF1AG082211 U.S. Department of Health & Human Services | NIH | National Institute on Aging (U.S. National Institute on Aging)
- R21AG083003 U.S. Department of Health & Human Services | NIH | National Institute on Aging (U.S. National Institute on Aging)
- RF1NS133812 U.S. Department of Health & Human Services | NIH | National Institute of Neurological Disorders and Stroke (NINDS)
Affiliation(s)
- Dapeng Xiong
  - Department of Computational Biology, Cornell University, Ithaca, NY, USA
  - Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY, USA
  - Center for Innovative Proteomics, Cornell University, Ithaca, NY, USA
- Yunguang Qiu
  - Cleveland Clinic Genome Center, Lerner Research Institute, Cleveland Clinic, Cleveland, OH, USA
  - Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH, USA
- Junfei Zhao
  - Department of Systems Biology, Herbert Irving Comprehensive Center, Columbia University, New York, NY, USA
- Yadi Zhou
  - Cleveland Clinic Genome Center, Lerner Research Institute, Cleveland Clinic, Cleveland, OH, USA
  - Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH, USA
- Dongjin Lee
  - Department of Computational Biology, Cornell University, Ithaca, NY, USA
  - Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY, USA
- Shobhita Gupta
  - Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY, USA
  - Center for Innovative Proteomics, Cornell University, Ithaca, NY, USA
  - Biophysics Program, Cornell University, Ithaca, NY, USA
- Mateo Torres
  - Department of Computational Biology, Cornell University, Ithaca, NY, USA
  - Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY, USA
  - Center for Innovative Proteomics, Cornell University, Ithaca, NY, USA
- Weiqiang Lu
  - Shanghai Key Laboratory of Regulatory Biology, Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, China
- Siqi Liang
  - Department of Computational Biology, Cornell University, Ithaca, NY, USA
  - Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY, USA
- Jin Joo Kang
  - Department of Computational Biology, Cornell University, Ithaca, NY, USA
  - Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY, USA
  - Center for Innovative Proteomics, Cornell University, Ithaca, NY, USA
- Charis Eng
  - Cleveland Clinic Genome Center, Lerner Research Institute, Cleveland Clinic, Cleveland, OH, USA
  - Department of Molecular Medicine, Cleveland Clinic Lerner College of Medicine, Case Western Reserve University, Cleveland, OH, USA
  - Case Comprehensive Cancer Center, Case Western Reserve University School of Medicine, Cleveland, OH, USA
- Joseph Loscalzo
  - Channing Division of Network Medicine, Division of Cardiovascular Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Feixiong Cheng
  - Cleveland Clinic Genome Center, Lerner Research Institute, Cleveland Clinic, Cleveland, OH, USA.
  - Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH, USA.
  - Department of Molecular Medicine, Cleveland Clinic Lerner College of Medicine, Case Western Reserve University, Cleveland, OH, USA.
  - Case Comprehensive Cancer Center, Case Western Reserve University School of Medicine, Cleveland, OH, USA.
- Haiyuan Yu
  - Department of Computational Biology, Cornell University, Ithaca, NY, USA.
  - Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY, USA.
  - Center for Innovative Proteomics, Cornell University, Ithaca, NY, USA.
9
Rodriguez DCP, Weber KC, Sundberg B, Glasgow A. MAGPIE: An interactive tool for visualizing and analyzing protein-ligand interactions. Protein Sci 2024; 33:e5027. [PMID: 38989559 PMCID: PMC11237554 DOI: 10.1002/pro.5027] [Received: 03/03/2024] [Revised: 04/22/2024] [Accepted: 05/05/2024] [Indexed: 07/12/2024]
Abstract
Quantitative tools to compile and analyze biomolecular interactions among chemically diverse binding partners would improve therapeutic design and aid in studying molecular evolution. Here we present Mapping Areas of Genetic Parsimony In Epitopes (MAGPIE), a publicly available software package for simultaneously visualizing and analyzing thousands of interactions between a single protein or small molecule ligand (the "target") and all of its protein binding partners ("binders"). MAGPIE generates an interactive three-dimensional visualization from a set of protein complex structures that share the target ligand, as well as sequence logo-style amino acid frequency graphs that show all the amino acids from the set of protein binders that interact with user-defined target ligand positions or chemical groups. MAGPIE highlights all the salt bridge and hydrogen bond interactions made by the target in the visualization and as separate amino acid frequency graphs. Finally, MAGPIE collates the most common target-binder interactions as a list of "hotspots," which can be used to analyze trends or guide the de novo design of protein binders. As an example of the utility of the program, we used MAGPIE to probe how different antibody fragments bind a viral antigen; how a common metabolite binds diverse protein partners; and how two ligands bind orthologs of a well-conserved glycolytic enzyme for a detailed understanding of evolutionarily conserved interactions involved in its activation and inhibition. MAGPIE is implemented in Python 3 and freely available at https://github.com/glasgowlab/MAGPIE, along with sample datasets, usage examples, and helper scripts to prepare input structures.
Affiliation(s)
- Daniel C. Pineda Rodriguez
  - Department of Biochemistry and Molecular Biophysics, Columbia University Irving Medical Center, New York, New York, USA
- Kyle C. Weber
  - Department of Biochemistry and Molecular Biophysics, Columbia University Irving Medical Center, New York, New York, USA
- Belen Sundberg
  - Department of Biochemistry and Molecular Biophysics, Columbia University Irving Medical Center, New York, New York, USA
- Anum Glasgow
  - Department of Biochemistry and Molecular Biophysics, Columbia University Irving Medical Center, New York, New York, USA
10
Reys V, Pons JL, Labesse G. SLiMAn 2.0: meaningful navigation through peptide-protein interaction networks. Nucleic Acids Res 2024; 52:W313-W317. [PMID: 38783158 PMCID: PMC11223867 DOI: 10.1093/nar/gkae398] [Received: 02/29/2024] [Revised: 04/17/2024] [Accepted: 04/30/2024] [Indexed: 05/25/2024] Open
Abstract
Among the myriad protein-protein interactions occurring in living organisms, a substantial number involve small linear motifs (SLiMs) recognized by structured domains. However, predictions of SLiM-based networks are tedious, due to the abundance of such motifs and a high proportion of false-positive hits. For this reason, the web server SLiMAn (Short Linear Motif Analysis) was developed to focus the search on the most relevant SLiMs. Using SLiMAn, one can navigate into a given (meta-)interactome and tune a variety of parameters associated with each type of SLiM in an attempt to identify functional ELM motifs and their recognition domains. The IntAct and BioGRID databases bring experimental information, while IUPred and AlphaFold provide boundaries of folded and disordered regions. Post-translational modifications listed in PhosphoSite+ are highlighted. Links to PubMed accelerate scrutiny of the literature, to support (or not) putative pairings. Dedicated visualization features are also incorporated, such as Cytoscape for macromolecular networks and BINANA for intermolecular contacts within structural models generated by SCWRL 3.0. The use of SLiMAn 2.0 is illustrated on a simple example. It is freely available at https://sliman2.cbs.cnrs.fr.
Affiliation(s)
- Victor Reys
  - Centre de Biologie Structurale, CNRS, INSERM, Univ. Montpellier, Montpellier, France
- Jean-Luc Pons
  - Centre de Biologie Structurale, CNRS, INSERM, Univ. Montpellier, Montpellier, France
- Gilles Labesse
  - Centre de Biologie Structurale, CNRS, INSERM, Univ. Montpellier, Montpellier, France
11
Zhao H, Petrey D, Murray D, Honig B. ZEPPI: Proteome-scale sequence-based evaluation of protein-protein interaction models. Proc Natl Acad Sci U S A 2024; 121:e2400260121. [PMID: 38743624 PMCID: PMC11127014 DOI: 10.1073/pnas.2400260121] [Received: 01/08/2024] [Accepted: 04/18/2024] [Indexed: 05/16/2024] Open
Abstract
We introduce ZEPPI (Z-score Evaluation of Protein-Protein Interfaces), a framework to evaluate structural models of a complex based on sequence coevolution and conservation involving residues in protein-protein interfaces. The ZEPPI score is calculated by comparing metrics for an interface to those obtained from randomly chosen residues. Since contacting residues are defined by the structural model, this obviates the need to account for indirect interactions. Further, although ZEPPI relies on species-paired multiple sequence alignments, its focus on interfacial residues allows it to leverage quite shallow alignments. ZEPPI can be implemented on a proteome-wide scale and is applied here to millions of structural models of dimeric complexes in the Escherichia coli and human interactomes found in the PrePPI database. PrePPI's scoring function is based primarily on the evaluation of protein-protein interfaces, and ZEPPI adds a new feature to this analysis through the incorporation of evolutionary information. ZEPPI performance is evaluated through applications to experimentally determined complexes and to decoys from the CASP-CAPRI experiment. As we discuss, the standard CAPRI scores used to evaluate docking models are based on model quality and not on the ability to give yes/no answers as to whether two proteins interact. ZEPPI is able to detect weak signals from PPI models that the CAPRI scores define as incorrect and, similarly, to identify potential PPIs defined as low confidence by the current PrePPI scoring function. A number of examples that illustrate how the combination of PrePPI and ZEPPI can yield functional hypotheses are provided.
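The core idea of scoring an interface against randomly chosen residues can be sketched as a simple Z-score. This is a minimal illustration of the general technique, not ZEPPI itself: the per-residue `scores` (standing in for a conservation or coevolution metric), the uniform sampling over all residues, the sample count, and the function name are all assumptions.

```python
import random
import statistics


def interface_zscore(scores, interface, n_samples=1000, seed=0):
    """Z-score of the mean interface score against random residue sets."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    observed = statistics.mean(scores[i] for i in interface)
    indices = range(len(scores))
    # Background: mean score of random residue sets of the same size.
    background = [
        statistics.mean(scores[i] for i in rng.sample(indices, len(interface)))
        for _ in range(n_samples)
    ]
    mu = statistics.mean(background)
    sigma = statistics.stdev(background)
    return (observed - mu) / sigma


# Toy per-residue metric for a 20-residue chain (made-up values).
scores = [float(i) for i in range(20)]
high_interface = [15, 16, 17, 18, 19]
low_interface = [0, 1, 2, 3, 4]
z_high = interface_zscore(scores, high_interface)
z_low = interface_zscore(scores, low_interface)
print(z_high, z_low)
```

A strongly positive value indicates that the interface residues score higher on the chosen metric than same-sized random residue sets, which is the kind of signal the abstract describes using to separate plausible interface models from implausible ones.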
Affiliation(s)
- Haiqing Zhao
- Department of Systems Biology, Columbia University Irving Medical Center, New York, NY 10032
- Donald Petrey
- Department of Systems Biology, Columbia University Irving Medical Center, New York, NY 10032
- Diana Murray
- Department of Systems Biology, Columbia University Irving Medical Center, New York, NY 10032
- Barry Honig
- Department of Systems Biology, Columbia University Irving Medical Center, New York, NY 10032
- Department of Biochemistry and Molecular Biophysics, Columbia University Irving Medical Center, New York, NY 10032
- Department of Medicine, Columbia University, New York, NY 10032
- Zuckerman Institute, Columbia University, New York, NY 10027
|
12
|
Su Z, Griffin B, Emmons S, Wu Y. Prediction of interactions between cell surface proteins by machine learning. Proteins 2024; 92:567-580. [PMID: 38050713 DOI: 10.1002/prot.26648] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2023] [Revised: 11/15/2023] [Accepted: 11/20/2023] [Indexed: 12/06/2023]
Abstract
Cells detect changes in their external environments or communicate with each other through proteins on their surfaces. These cell surface proteins form a complicated network of interactions in order to fulfill their functions. The interactions between cell surface proteins are highly dynamic and, thus, challenging to detect using traditional experimental techniques. Here, we tackle this challenge using a computational framework. The primary focus of the framework is to develop new tools to identify interactions between domains in the immunoglobulin (Ig) fold, which is the most abundant domain family in cell surface proteins. These interactions could be formed between ligands and receptors from different cells or between proteins on the same cell surface. In practice, we collected all structural data on Ig domain interactions and transformed them into an interface fragment pair library. A high-dimensional profile can then be constructed from the library for a given pair of query protein sequences. Multiple machine learning models were used to read this profile so that the probability of interaction between the query proteins could be predicted. We tested our models on an experimentally derived dataset that contains 564 cell surface proteins in humans. The cross-validation results show that we can achieve higher than 70% accuracy in identifying the PPIs within this dataset. We then applied this method to a group of 46 cell surface proteins in Caenorhabditis elegans. We screened every possible interaction between these proteins. Many interactions recognized by our machine learning classifiers have been experimentally confirmed in the literature. In conclusion, our computational platform serves as a useful tool to help identify potential new interactions between cell surface proteins in addition to current state-of-the-art experimental techniques. The tool is freely accessible for use by the scientific community. 
Moreover, the general framework of the machine learning classification can also be extended to study the interactions of proteins in other domain superfamilies.
Affiliation(s)
- Zhaoqian Su
- Department of Systems and Computational Biology, Albert Einstein College of Medicine, Bronx, New York, USA
- Brian Griffin
- Department of Genetics, Albert Einstein College of Medicine, Bronx, New York, USA
- Scott Emmons
- Department of Genetics, Albert Einstein College of Medicine, Bronx, New York, USA
- Yinghao Wu
- Department of Systems and Computational Biology, Albert Einstein College of Medicine, Bronx, New York, USA
|
13
|
Zhang J, Durham J, Cong Q. Revolutionizing protein-protein interaction prediction with deep learning. Curr Opin Struct Biol 2024; 85:102775. [PMID: 38330793 DOI: 10.1016/j.sbi.2024.102775] [Citation(s) in RCA: 15] [Impact Index Per Article: 15.0] [Received: 10/13/2023] [Revised: 12/31/2023] [Accepted: 01/05/2024] [Indexed: 02/10/2024]
Abstract
Protein-protein interactions (PPIs) are pivotal for driving diverse biological processes, and any disturbance in these interactions can lead to disease. Thus, the study of PPIs has been a central focus in biology. Recent developments in deep learning methods, coupled with the vast genomic sequence data, have significantly boosted the accuracy of predicting protein structures and modeling protein complexes, approaching levels comparable to experimental techniques. Herein, we review the latest advances in the computational methods for modeling 3D protein complexes and the prediction of protein interaction partners, emphasizing the application of deep learning methods deriving from coevolution analysis. The review also highlights biomedical applications of PPI prediction and outlines challenges in the field.
Affiliation(s)
- Jing Zhang
- Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, TX, USA; Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, USA; Harold C. Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center, Dallas, TX, USA. https://twitter.com/jzhang_genome
- Jesse Durham
- Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, TX, USA; Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, USA; Harold C. Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Qian Cong
- Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, TX, USA; Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, USA; Harold C. Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center, Dallas, TX, USA.
|
14
|
Leuzzi G, Vasciaveo A, Taglialatela A, Chen X, Firestone TM, Hickman AR, Mao W, Thakar T, Vaitsiankova A, Huang JW, Cuella-Martin R, Hayward SB, Kesner JS, Ghasemzadeh A, Nambiar TS, Ho P, Rialdi A, Hebrard M, Li Y, Gao J, Gopinath S, Adeleke OA, Venters BJ, Drake CG, Baer R, Izar B, Guccione E, Keogh MC, Guerois R, Sun L, Lu C, Califano A, Ciccia A. SMARCAL1 is a dual regulator of innate immune signaling and PD-L1 expression that promotes tumor immune evasion. Cell 2024; 187:861-881.e32. [PMID: 38301646 PMCID: PMC10980358 DOI: 10.1016/j.cell.2024.01.008] [Citation(s) in RCA: 16] [Impact Index Per Article: 16.0] [Received: 08/18/2022] [Revised: 07/23/2023] [Accepted: 01/05/2024] [Indexed: 02/03/2024]
Abstract
Genomic instability can trigger cancer-intrinsic innate immune responses that promote tumor rejection. However, cancer cells often evade these responses by overexpressing immune checkpoint regulators, such as PD-L1. Here, we identify the SNF2-family DNA translocase SMARCAL1 as a factor that favors tumor immune evasion by a dual mechanism involving both the suppression of innate immune signaling and the induction of PD-L1-mediated immune checkpoint responses. Mechanistically, SMARCAL1 limits endogenous DNA damage, thereby suppressing cGAS-STING-dependent signaling during cancer cell growth. Simultaneously, it cooperates with the AP-1 family member JUN to maintain chromatin accessibility at a PD-L1 transcriptional regulatory element, thereby promoting PD-L1 expression in cancer cells. SMARCAL1 loss hinders the ability of tumor cells to induce PD-L1 in response to genomic instability, enhances anti-tumor immune responses and sensitizes tumors to immune checkpoint blockade in a mouse melanoma model. Collectively, these studies uncover SMARCAL1 as a promising target for cancer immunotherapy.
Affiliation(s)
- Giuseppe Leuzzi
- Department of Genetics and Development, Columbia University Irving Medical Center, New York, NY 10032, USA; Herbert Irving Comprehensive Cancer Center, Columbia University Irving Medical Center, New York, NY 10032, USA
- Alessandro Vasciaveo
- Herbert Irving Comprehensive Cancer Center, Columbia University Irving Medical Center, New York, NY 10032, USA; Department of Systems Biology, Columbia University Irving Medical Center, New York, NY 10032, USA
- Angelo Taglialatela
- Department of Genetics and Development, Columbia University Irving Medical Center, New York, NY 10032, USA; Herbert Irving Comprehensive Cancer Center, Columbia University Irving Medical Center, New York, NY 10032, USA
- Xiao Chen
- Department of Genetics and Development, Columbia University Irving Medical Center, New York, NY 10032, USA; Herbert Irving Comprehensive Cancer Center, Columbia University Irving Medical Center, New York, NY 10032, USA
- Wendy Mao
- Columbia Center for Translational Immunology, Columbia University Irving Medical Center, New York, NY 10032, USA
- Tanay Thakar
- Department of Genetics and Development, Columbia University Irving Medical Center, New York, NY 10032, USA; Herbert Irving Comprehensive Cancer Center, Columbia University Irving Medical Center, New York, NY 10032, USA
- Alina Vaitsiankova
- Department of Genetics and Development, Columbia University Irving Medical Center, New York, NY 10032, USA; Herbert Irving Comprehensive Cancer Center, Columbia University Irving Medical Center, New York, NY 10032, USA
- Jen-Wei Huang
- Department of Genetics and Development, Columbia University Irving Medical Center, New York, NY 10032, USA; Herbert Irving Comprehensive Cancer Center, Columbia University Irving Medical Center, New York, NY 10032, USA
- Raquel Cuella-Martin
- Department of Genetics and Development, Columbia University Irving Medical Center, New York, NY 10032, USA; Herbert Irving Comprehensive Cancer Center, Columbia University Irving Medical Center, New York, NY 10032, USA
- Samuel B Hayward
- Department of Genetics and Development, Columbia University Irving Medical Center, New York, NY 10032, USA; Herbert Irving Comprehensive Cancer Center, Columbia University Irving Medical Center, New York, NY 10032, USA
- Jordan S Kesner
- Herbert Irving Comprehensive Cancer Center, Columbia University Irving Medical Center, New York, NY 10032, USA; Department of Systems Biology, Columbia University Irving Medical Center, New York, NY 10032, USA
- Ali Ghasemzadeh
- Columbia Center for Translational Immunology, Columbia University Irving Medical Center, New York, NY 10032, USA
- Tarun S Nambiar
- Department of Genetics and Development, Columbia University Irving Medical Center, New York, NY 10032, USA; Herbert Irving Comprehensive Cancer Center, Columbia University Irving Medical Center, New York, NY 10032, USA
- Patricia Ho
- Columbia Center for Translational Immunology, Columbia University Irving Medical Center, New York, NY 10032, USA; Department of Medicine, Division of Hematology and Oncology, Columbia University Irving Medical Center, New York, NY 10032, USA
- Alexander Rialdi
- Center for OncoGenomics and Innovative Therapeutics (COGIT), Center for Therapeutics Discovery, Department of Oncological Sciences and Pharmacological Sciences, Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Maxime Hebrard
- Institute of Molecular and Cell Biology (IMCB), Agency for Science, Technology and Research (ASTAR), Singapore, Singapore
- Yinglu Li
- Department of Genetics and Development, Columbia University Irving Medical Center, New York, NY 10032, USA; Herbert Irving Comprehensive Cancer Center, Columbia University Irving Medical Center, New York, NY 10032, USA
- Jinmei Gao
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198 Gif-sur-Yvette, France
- Charles G Drake
- Herbert Irving Comprehensive Cancer Center, Columbia University Irving Medical Center, New York, NY 10032, USA; Columbia Center for Translational Immunology, Columbia University Irving Medical Center, New York, NY 10032, USA; Department of Urology, Columbia University Irving Medical Center, New York, NY 10032, USA
- Richard Baer
- Herbert Irving Comprehensive Cancer Center, Columbia University Irving Medical Center, New York, NY 10032, USA; Institute for Cancer Genetics, Columbia University Irving Medical Center, New York, NY 10032, USA
- Benjamin Izar
- Herbert Irving Comprehensive Cancer Center, Columbia University Irving Medical Center, New York, NY 10032, USA; Department of Systems Biology, Columbia University Irving Medical Center, New York, NY 10032, USA; Columbia Center for Translational Immunology, Columbia University Irving Medical Center, New York, NY 10032, USA; Department of Medicine, Division of Hematology and Oncology, Columbia University Irving Medical Center, New York, NY 10032, USA
- Ernesto Guccione
- Center for OncoGenomics and Innovative Therapeutics (COGIT), Center for Therapeutics Discovery, Department of Oncological Sciences and Pharmacological Sciences, Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Raphael Guerois
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198 Gif-sur-Yvette, France
- Lu Sun
- EpiCypher Inc., Durham, NC 27709, USA
- Chao Lu
- Department of Genetics and Development, Columbia University Irving Medical Center, New York, NY 10032, USA; Herbert Irving Comprehensive Cancer Center, Columbia University Irving Medical Center, New York, NY 10032, USA
- Andrea Califano
- Herbert Irving Comprehensive Cancer Center, Columbia University Irving Medical Center, New York, NY 10032, USA; Department of Systems Biology, Columbia University Irving Medical Center, New York, NY 10032, USA; Department of Biochemistry and Molecular Biophysics, Columbia University Irving Medical Center, New York, NY 10032, USA; Department of Medicine, Columbia University Irving Medical Center, New York, NY 10032, USA; Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY 10032, USA
- Alberto Ciccia
- Department of Genetics and Development, Columbia University Irving Medical Center, New York, NY 10032, USA; Herbert Irving Comprehensive Cancer Center, Columbia University Irving Medical Center, New York, NY 10032, USA; Institute for Cancer Genetics, Columbia University Irving Medical Center, New York, NY 10032, USA.
|
15
|
Gopalakrishnan S, Venkatraman S. Prediction of influential proteins and enzymes of certain diseases using a directed unimodular hypergraph. Math Biosci Eng 2024; 21:325-345. [PMID: 38303425 DOI: 10.3934/mbe.2024015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/03/2024]
Abstract
Protein-protein interaction (PPI) analysis based on mathematical modeling is an efficient means of identifying hub proteins, corresponding enzymes and many underlying structures. In this paper, a method for the analysis of PPIs is introduced and used to analyze protein interactions of diseases such as Parkinson's, COVID-19 and diabetes mellitus. A directed hypergraph is used to represent PPIs. A novel directed hypergraph depth-first search algorithm is introduced to find the longest paths. The minor hypergraph reduces the dimension of the directed hypergraph, representing the longest paths, and results in the unimodular hypergraph. The property of the unimodular hypergraph clusters influential proteins and enzymes that are related, thereby providing potential avenues for disease treatment.
Affiliation(s)
- Sathyanarayanan Gopalakrishnan
- Department of Mathematics, Srinivasa Ramanujan Centre, School of Arts, Sciences, Humanities and Education, SASTRA Deemed University, Thanjavur, India
- Swaminathan Venkatraman
- Department of Mathematics, School of Arts, Sciences, Humanities and Education, SASTRA Deemed University, Thanjavur, India
|
16
|
Hung TI, Hsieh YJ, Lu WL, Wu KP, Chang CEA. What Strengthens Protein-Protein Interactions: Analysis and Applications of Residue Correlation Networks. J Mol Biol 2023; 435:168337. [PMID: 37918563 PMCID: PMC11637584 DOI: 10.1016/j.jmb.2023.168337] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 09/05/2023] [Revised: 10/13/2023] [Accepted: 10/26/2023] [Indexed: 11/04/2023]
Abstract
Identifying residues critical to protein-protein binding and efficient design of stable and specific protein binders are challenging tasks. Extending beyond the direct contacts in a protein-protein binding interface, our study employs computational modeling to reveal the essential network of residue interactions and dihedral angle correlations critical in protein-protein recognition. We hypothesized that mutating residues exhibiting highly correlated dynamic motion within the interaction network could efficiently optimize protein-protein interactions to create tight and selective protein binders. We tested this hypothesis using the ubiquitin (Ub) and MERS coronaviral papain-like protease (PLpro) complex, since Ub is a central player in multiple cellular functions and PLpro is an antiviral drug target. Our designed ubiquitin variant (UbV) hosting three mutated residues displayed a ∼3,500-fold increase in functional inhibition relative to wild-type Ub. Further optimization of two C-terminal residues within the Ub network resulted in a KD of 1.5 nM and IC50 of 9.7 nM for the five-point Ub mutant, eliciting 27,500-fold and 5,500-fold enhancements in affinity and potency, respectively, as well as improved selectivity, without destabilizing the UbV structure. Our study highlights residue correlation and interaction networks in protein-protein interactions, and introduces an effective approach to design high-affinity protein binders for cell biology research and future therapeutics.
Affiliation(s)
- Ta I Hung
- Department of Chemistry, University of California, Riverside, United States; Department of Bioengineering, University of California, Riverside, United States
- Yun-Jung Hsieh
- Institute of Biological Chemistry, Academia Sinica, Taipei, Taiwan; Institute of Biochemical Sciences, National Taiwan University, Taipei, Taiwan
- Wei-Lin Lu
- Institute of Biological Chemistry, Academia Sinica, Taipei, Taiwan
- Kuen-Phon Wu
- Institute of Biological Chemistry, Academia Sinica, Taipei, Taiwan; Institute of Biochemical Sciences, National Taiwan University, Taipei, Taiwan.
- Chia-En A Chang
- Department of Chemistry, University of California, Riverside, United States.
|
17
|
Zhao H, Murray D, Petrey D, Honig B. ZEPPI: proteome-scale sequence-based evaluation of protein-protein interaction models. Res Sq 2023:rs.3.rs-3289791. [PMID: 37790387 PMCID: PMC10543297 DOI: 10.21203/rs.3.rs-3289791/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/05/2023]
Abstract
We introduce ZEPPI (Z-score Evaluation of Protein-Protein Interfaces), a framework to evaluate structural models of a complex based on sequence co-evolution and conservation involving residues in protein-protein interfaces. The ZEPPI score is calculated by comparing metrics for an interface to those obtained from randomly chosen residues. Since contacting residues are defined by the structural model, this obviates the need to account for indirect interactions. Further, although ZEPPI relies on species-paired multiple sequence alignments, its focus on interfacial residues allows it to leverage quite shallow alignments. ZEPPI performance is evaluated through applications to experimentally determined complexes and to decoys from the CASP-CAPRI experiment. ZEPPI can be implemented on a proteome-wide scale as evidenced by calculations on millions of structural models of dimeric complexes in the E. coli and human interactomes found in the PrePPI database. A number of examples that illustrate how these tools can yield novel functional hypotheses are provided.
Affiliation(s)
- Haiqing Zhao
- Department of Systems Biology, Columbia University Irving Medical Center, New York, NY 10032, USA
- Diana Murray
- Department of Systems Biology, Columbia University Irving Medical Center, New York, NY 10032, USA
- Donald Petrey
- Department of Systems Biology, Columbia University Irving Medical Center, New York, NY 10032, USA
- Barry Honig
- Department of Systems Biology, Columbia University Irving Medical Center, New York, NY 10032, USA
- Department of Biochemistry and Molecular Biophysics, Columbia University Irving Medical Center, New York, NY 10032, USA
- Department of Medicine, Columbia University, New York, NY 10032, USA
- Zuckerman Mind Brain and Behavior Institute, Columbia University, New York, NY 10027, USA
|
18
|
Wodak SJ, Velankar S. Structural biology: The transformational era. Proteomics 2023; 23:e2200084. [PMID: 37667815 DOI: 10.1002/pmic.202200084] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2023] [Accepted: 07/26/2023] [Indexed: 09/06/2023]
Affiliation(s)
- Sameer Velankar
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, UK
|
19
|
Xie L, Xie L. Elucidation of genome-wide understudied proteins targeted by PROTAC-induced degradation using interpretable machine learning. PLoS Comput Biol 2023; 19:e1010974. [PMID: 37590332 PMCID: PMC10464998 DOI: 10.1371/journal.pcbi.1010974] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/22/2023] [Revised: 08/29/2023] [Accepted: 07/27/2023] [Indexed: 08/19/2023] Open
Abstract
Proteolysis-targeting chimeras (PROTACs) are hetero-bifunctional molecules that induce the degradation of target proteins by recruiting an E3 ligase. PROTACs have the potential to inactivate disease-related genes that are considered undruggable by small molecules, making them a promising therapy for the treatment of incurable diseases. However, only a few hundred proteins have been experimentally tested for their amenability to PROTACs, and it remains unclear which other proteins in the entire human genome can be targeted by PROTACs. In this study, we have developed PrePROTAC, an interpretable machine learning model based on a transformer-based protein sequence descriptor and random forest classification. PrePROTAC predicts genome-wide targets that can be degraded by CRBN, one of the E3 ligases. In the benchmark studies, PrePROTAC achieved a ROC-AUC of 0.81, an average precision of 0.84, and over 40% sensitivity at a false positive rate of 0.05. When evaluated by an external test set which comprised proteins from different structural folds than those in the training set, the performance of PrePROTAC did not drop significantly, indicating its generalizability. Furthermore, we developed an embedding SHapley Additive exPlanations (eSHAP) method, which extends conventional SHAP analysis for original features to an embedding space through in silico mutagenesis. This method allowed us to identify key residues in the protein structure that play critical roles in PROTAC activity. The identified key residues were consistent with existing knowledge. Using PrePROTAC, we identified over 600 novel understudied proteins that are potentially degradable by CRBN and proposed PROTAC compounds for three novel drug targets associated with Alzheimer's disease.
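The benchmark figure "sensitivity at a false positive rate of 0.05" quoted in this abstract is a point on the ROC curve. The sketch below shows one way such a value could be computed from labels and classifier scores; it is an illustration only, not PrePROTAC code, and the function name `sensitivity_at_fpr` and the toy labels and scores are assumptions.

```python
def sensitivity_at_fpr(labels, scores, max_fpr=0.05):
    """Highest true positive rate reachable while keeping FPR <= max_fpr."""
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")

    # Walk thresholds from the highest score downwards.
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    tp = fp = 0
    best = 0.0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Examples with tied scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        if fp / n_neg > max_fpr:
            break  # FPR only grows as the threshold drops
        best = tp / n_pos
    return best


# Toy data (assumed): 1 = degradable target, 0 = non-degradable.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
strict = sensitivity_at_fpr(toy_labels, toy_scores, max_fpr=0.0)
relaxed = sensitivity_at_fpr(toy_labels, toy_scores, max_fpr=0.34)
```

With the toy data, no false positives are allowed in the strict case, so only the two top-ranked positives are recovered; allowing one false positive out of three negatives recovers all three.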
Affiliation(s)
- Li Xie
- Department of Computer Science, Hunter College, The City University of New York, New York City, New York, United States of America
- Lei Xie
- Department of Computer Science, Hunter College, The City University of New York, New York City, New York, United States of America
- Ph.D. Program in Computer Science, The Graduate Center, The City University of New York, New York City, New York, United States of America
- Helen and Robert Appel Alzheimer’s Disease Research Institute, Feil Family Brain & Mind Research Institute, Weill Cornell Medicine, Cornell University, New York City, New York, United States of America
|
20
|
Mathews DH, Casadio R, Sternberg MJE. Computational Resources for Molecular Biology 2023. J Mol Biol 2023:168160. [PMID: 37244569 DOI: 10.1016/j.jmb.2023.168160] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/29/2023]
Affiliation(s)
- David H Mathews
- Department of Biochemistry & Biophysics and Center for RNA Biology, University of Rochester, Rochester, NY 14642, USA.
- Rita Casadio
- Biocomputing Group, FABIT-University of Bologna, Bologna I-40126, Italy.
- Michael J E Sternberg
- Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London SW7 2AZ, UK.
|
21
|
Li B, Altelaar M, van Breukelen B. Identification of Protein Complexes by Integrating Protein Abundance and Interaction Features Using a Deep Learning Strategy. Int J Mol Sci 2023; 24:ijms24097884. [PMID: 37175590 PMCID: PMC10178578 DOI: 10.3390/ijms24097884] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/23/2023] [Revised: 04/23/2023] [Accepted: 04/24/2023] [Indexed: 05/15/2023] Open
Abstract
Many essential cellular functions are carried out by multi-protein complexes that can be characterized by their protein-protein interactions. The interactions between protein subunits are critically dependent on the strengths of their interactions and their cellular abundances, both of which span orders of magnitude. Despite many efforts devoted to the global discovery of protein complexes by integrating large-scale protein abundance and interaction features, there is still room for improvement. Here, we integrated >7000 quantitative proteomic samples with three published affinity purification/co-fractionation mass spectrometry datasets into a deep learning framework to predict protein-protein interactions (PPIs), followed by the identification of protein complexes using a two-stage clustering strategy. Our deep-learning-technique-based classifier significantly outperformed recently published machine learning prediction models and in the process captured 5010 complexes containing over 9000 unique proteins. The vast majority of proteins in our predicted complexes exhibited low or no tissue specificity, which is an indication that the observed complexes tend to be ubiquitously expressed throughout all cell types and tissues. Interestingly, our combined approach increased the model sensitivity for low abundant proteins, which amongst other things allowed us to detect the interaction of MCM10, which connects to the replicative helicase complex via the MCM6 protein. The integration of protein abundances and their interaction features using a deep learning approach provided a comprehensive map of protein-protein interactions and a unique perspective on possible novel protein complexes.
Affiliation(s)
- Bohui Li
- Biomolecular Mass Spectrometry and Proteomics, Padualaan 8, 3584 CH Utrecht, The Netherlands
- Utrecht Institute for Pharmaceutical Sciences (UIPS), Utrecht University, Universiteitsweg 99, 3584 CG Utrecht, The Netherlands
- Maarten Altelaar
- Biomolecular Mass Spectrometry and Proteomics, Padualaan 8, 3584 CH Utrecht, The Netherlands
- Utrecht Institute for Pharmaceutical Sciences (UIPS), Utrecht University, Universiteitsweg 99, 3584 CG Utrecht, The Netherlands
- Mass Spectrometry and Proteomics Facility, The Netherlands Cancer Institute, 1066 CX Amsterdam, The Netherlands
- Bas van Breukelen
- Biomolecular Mass Spectrometry and Proteomics, Padualaan 8, 3584 CH Utrecht, The Netherlands
- Utrecht Institute for Pharmaceutical Sciences (UIPS), Utrecht University, Universiteitsweg 99, 3584 CG Utrecht, The Netherlands
|
22
|
Rogers JR, Nikolényi G, AlQuraishi M. Growing ecosystem of deep learning methods for modeling protein-protein interactions. Protein Eng Des Sel 2023; 36:gzad023. [PMID: 38102755 DOI: 10.1093/protein/gzad023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2023] [Revised: 12/06/2023] [Accepted: 12/07/2023] [Indexed: 12/17/2023] Open
Abstract
Numerous cellular functions rely on protein-protein interactions. Efforts to comprehensively characterize them remain challenged however by the diversity of molecular recognition mechanisms employed within the proteome. Deep learning has emerged as a promising approach for tackling this problem by exploiting both experimental data and basic biophysical knowledge about protein interactions. Here, we review the growing ecosystem of deep learning methods for modeling protein interactions, highlighting the diversity of these biophysically informed models and their respective trade-offs. We discuss recent successes in using representation learning to capture complex features pertinent to predicting protein interactions and interaction sites, geometric deep learning to reason over protein structures and predict complex structures, and generative modeling to design de novo protein assemblies. We also outline some of the outstanding challenges and promising new directions. Opportunities abound to discover novel interactions, elucidate their physical mechanisms, and engineer binders to modulate their functions using deep learning and, ultimately, unravel how protein interactions orchestrate complex cellular behaviors.
Affiliation(s)
- Julia R Rogers
- Department of Systems Biology, Columbia University, New York, NY 10032, USA
- Gergő Nikolényi
- Department of Systems Biology, Columbia University, New York, NY 10032, USA
|