1. Su T, Xia Y. A quantitative comparison of the deleteriousness of missense and nonsense mutations using the structurally resolved human protein interactome. Protein Sci 2025; 34:e70155. [PMID: 40384578; PMCID: PMC12086521; DOI: 10.1002/pro.70155] [Received: 01/08/2025] [Revised: 04/02/2025] [Accepted: 04/22/2025] [Indexed: 05/20/2025]
Abstract
The complex genotype-to-phenotype relationships in Mendelian diseases can be elucidated by mutation-induced disturbances to the networks of molecular interactions (interactomes) in human cells. Missense and nonsense mutations cause distinct perturbations within the human protein interactome, leading to functional and phenotypic effects with varying degrees of severity. Here, we structurally resolve the human protein interactome at atomic resolution and perform structural and thermodynamic calculations to assess the biophysical implications of these mutations. We focus on a specific type of missense mutation, known as "quasi-null" mutations, which destabilize proteins and cause similar functional consequences (node removal) to nonsense mutations. We propose a "fold difference" quantification of deleteriousness, which measures the ratio between the fractions of node-removal mutations in datasets of Mendelian disease-causing and non-pathogenic mutations. We estimate the fold differences of node-removal mutations to range from 3 (for quasi-null mutations with folding ΔΔG ≥ 2 kcal/mol) to 20 (for nonsense mutations). We observe a strong positive correlation between biophysical destabilization and phenotypic deleteriousness, demonstrating that the deleteriousness of quasi-null mutations spans a continuous spectrum, with nonsense mutations at the extreme (highly deleterious) end. Our findings substantiate the disparity in phenotypic severity between missense and nonsense mutations and suggest that mutation-induced protein destabilization is indicative of the phenotypic outcomes of missense mutations. Our analyses of node-removal mutations allow for the potential identification of proteins whose removal or destabilization leads to harmful phenotypes, enabling the development of targeted therapeutic approaches and enhancing comprehension of the intricate mechanisms governing genotype-to-phenotype relationships in clinically relevant diseases.
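The "fold difference" described in this abstract is a ratio of two fractions, which can be illustrated with a minimal sketch. The function name, argument names, and counts below are hypothetical and chosen for illustration; they are not taken from the paper, and the authors' actual pipeline may differ.

```python
def fold_difference(disease_hits, disease_total, benign_hits, benign_total):
    """Ratio of the node-removal fraction among disease-causing mutations
    to the node-removal fraction among non-pathogenic mutations."""
    if disease_total <= 0 or benign_total <= 0:
        raise ValueError("dataset sizes must be positive")
    disease_fraction = disease_hits / disease_total
    benign_fraction = benign_hits / benign_total
    if benign_fraction == 0:
        raise ValueError("fold difference is undefined when the benign fraction is zero")
    return disease_fraction / benign_fraction


# Made-up counts for illustration only (not data from the paper):
# 250 of 1000 disease-causing mutations and 50 of 400 non-pathogenic
# mutations are classified as node-removal mutations.
example = fold_difference(disease_hits=250, disease_total=1000,
                          benign_hits=50, benign_total=400)
print(example)
```

A value above 1 means node-removal mutations are over-represented among disease-causing mutations relative to non-pathogenic ones.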
Affiliation(s)
- Ting‐Yi Su
  - Graduate Program in Quantitative Life Sciences, McGill University, Montréal, Québec, Canada
- Yu Xia
  - Graduate Program in Quantitative Life Sciences, McGill University, Montréal, Québec, Canada
  - Department of Bioengineering, McGill University, Montréal, Québec, Canada
2. Maier BD, Petursson B, Lussana A, Petsalaki E. Data-driven extraction of human kinase-substrate relationships from omics datasets. Mol Cell Proteomics 2025:100994. [PMID: 40381888; DOI: 10.1016/j.mcpro.2025.100994] [Received: 09/21/2024] [Revised: 05/01/2025] [Accepted: 05/09/2025] [Indexed: 05/20/2025] [Open access]
Abstract
Phosphorylation forms an important part of the signalling system that cells use for decision making and regulation of processes such as cell division and differentiation. In humans, more than 90% of identified phosphosites lack an annotated upstream kinase. At the same time, around 30% of kinases (as annotated in UniProt) have no known target. This knowledge gap stresses the need to make large-scale, data-driven computational predictions. In this study, we have created a machine learning-based model to derive a probabilistic kinase-substrate network from omics datasets. Our methodology displays improved performance compared to other state-of-the-art kinase-substrate prediction methods and provides predictions for more kinases. Importantly, it better captures new experimentally identified kinase-substrate relationships. It can therefore allow the improved prioritisation of kinase-substrate pairs for illuminating the dark human cell signalling space. Our model is integrated into a web server, SELPHI2.0, to allow unbiased analysis of phosphoproteomics data, facilitating the design of downstream experiments to uncover mechanisms of signal transduction across conditions and cellular contexts.
Affiliation(s)
- Benjamin Dominik Maier
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, United Kingdom
- Borgthor Petursson
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, United Kingdom
- Alessandro Lussana
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, United Kingdom
- Evangelia Petsalaki
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, United Kingdom
3. Deritei D, Inuzuka H, Castaldi PJ, Yun JH, Xu Z, Anamika WJ, Asara JM, Guo F, Zhou X, Glass K, Wei W, Silverman EK. HHIP protein interactions in lung cells provide insight into COPD pathogenesis. Hum Mol Genet 2025; 34:777-789. [PMID: 39945347; PMCID: PMC12037150; DOI: 10.1093/hmg/ddaf016] [Received: 09/16/2024] [Revised: 01/16/2025] [Accepted: 02/10/2025] [Indexed: 02/19/2025] [Open access]
Abstract
Chronic obstructive pulmonary disease (COPD) is the third leading cause of death worldwide. The primary causes of COPD are environmental, including cigarette smoking; however, genetic susceptibility also contributes to COPD risk. Genome-Wide Association Studies (GWASes) have revealed more than 80 genetic loci associated with COPD, leading to the identification of multiple COPD GWAS genes. However, the biological relationships between the identified COPD susceptibility genes are largely unknown. Genes associated with a complex disease are often in close network proximity, i.e. their protein products often interact directly with each other and/or similar proteins. In this study, we use affinity purification mass spectrometry (AP-MS) to identify protein interactions with HHIP, a well-established COPD GWAS gene which is part of the sonic hedgehog pathway, in two disease-relevant lung cell lines (IMR90 and 16HBE). To better understand the network neighborhood of HHIP, its proximity to the protein products of other COPD GWAS genes, and its functional role in COPD pathogenesis, we create HUBRIS, a protein-protein interaction network compiled from 8 publicly available databases. We identified both common and cell type-specific protein-protein interactors of HHIP. We find that our newly identified interactions shorten the network distance between HHIP and the protein products of several COPD GWAS genes, including DSP, MFAP2, TET2, and FBLN5. These new shorter paths include proteins that are encoded by genes involved in extracellular matrix and tissue organization. We found and validated interactions to proteins that provide new insights into COPD pathobiology, including CAVIN1 (IMR90) and TP53 (16HBE). The newly discovered HHIP interactions with CAVIN1 and TP53 implicate HHIP in response to oxidative stress.
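The idea that newly identified interactions can shorten the network distance between two proteins can be illustrated with a breadth-first search on a toy undirected graph. This is only a sketch of the general technique: the intermediate nodes "P1" to "P3", the node "TARGET", and all edges are invented placeholders, and nothing here reproduces HUBRIS or the authors' analysis.

```python
from collections import deque


def shortest_path_length(edges, source, target):
    """Breadth-first search distance (in edges) in an undirected graph.
    Returns None when either node is missing or no path exists."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    if source not in adjacency or target not in adjacency:
        return None
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return distances[node]
        for neighbor in adjacency[node]:
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return None


# Hypothetical toy network; edges are placeholders, not real interactions.
known_edges = [("HHIP", "P1"), ("P1", "P2"), ("P2", "P3"), ("P3", "TARGET")]
new_edges = [("HHIP", "P3")]  # stands in for a newly detected interaction

before = shortest_path_length(known_edges, "HHIP", "TARGET")
after = shortest_path_length(known_edges + new_edges, "HHIP", "TARGET")
print(before, after)
```

In this toy case the added edge creates a shortcut, so the distance to the target drops once the new interaction is included.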
Affiliation(s)
- Dávid Deritei
  - Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, 75 Francis Street, Boston, MA 02115, United States
- Hiroyuki Inuzuka
  - Department of Pathology, Beth Israel Deaconess Medical Center, Harvard Medical School, 330 Brookline Avenue, Boston, MA 02215, United States
- Peter J Castaldi
  - Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, 75 Francis Street, Boston, MA 02115, United States
- Jeong Hyun Yun
  - Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, 75 Francis Street, Boston, MA 02115, United States
- Zhonghui Xu
  - Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, 75 Francis Street, Boston, MA 02115, United States
- Wardatul Jannat Anamika
  - Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, 75 Francis Street, Boston, MA 02115, United States
- John M Asara
  - Division of Signal Transduction, Department of Medicine, Beth Israel Deaconess Medical Center, Harvard Medical School, 330 Brookline Avenue, Boston, MA 02215, United States
- Feng Guo
  - Jiangsu Key Laboratory of Immunity and Metabolism, Jiangsu International Laboratory of Immunity and Metabolism, Department of Pathogen Biology and Immunology, Xuzhou Medical University, Yunlong District, Xuzhou, Jiangsu 221004, China
- Xiaobo Zhou
  - Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, 75 Francis Street, Boston, MA 02115, United States
- Kimberly Glass
  - Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, 75 Francis Street, Boston, MA 02115, United States
- Wenyi Wei
  - Department of Pathology, Beth Israel Deaconess Medical Center, Harvard Medical School, 330 Brookline Avenue, Boston, MA 02215, United States
- Edwin K Silverman
  - Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, 75 Francis Street, Boston, MA 02115, United States
4. Strom JM, Luck K. Bias in, bias out - AlphaFold-Multimer and the structural complexity of protein interfaces. Curr Opin Struct Biol 2025; 91:103002. [PMID: 39938238; DOI: 10.1016/j.sbi.2025.103002] [Received: 09/12/2024] [Revised: 11/28/2024] [Accepted: 01/22/2025] [Indexed: 02/14/2025]
Abstract
A structural understanding of protein-protein interactions is a key component of many facets of applied molecular biology research. AlphaFold-Multimer (AF-MM) provided a breakthrough in the ability to predict protein-protein interface structure. However, the available training data for this model and the resulting benchmarking and validation efforts show a bias toward interactions between more ordered regions of proteins. Here we highlight some of the successes and limitations of AF-MM and discuss available methods and future directions to enable balanced prediction of all interface types.
Affiliation(s)
- Joelle Morgan Strom
  - Institute of Molecular Biology (IMB) gGmbH, Ackermannweg 4, Mainz 55128, Germany
- Katja Luck
  - Institute of Molecular Biology (IMB) gGmbH, Ackermannweg 4, Mainz 55128, Germany
5. Varadi M, Tsenkov M, Velankar S. Challenges in bridging the gap between protein structure prediction and functional interpretation. Proteins 2025; 93:400-410. [PMID: 37850517; PMCID: PMC11623436; DOI: 10.1002/prot.26614] [Received: 06/28/2023] [Revised: 09/26/2023] [Accepted: 10/04/2023] [Indexed: 10/19/2023]
Abstract
The rapid evolution of protein structure prediction tools has significantly broadened access to protein structural data. Although predicted structure models have the potential to accelerate and impact fundamental and translational research significantly, it is essential to note that they are not validated and cannot be considered the ground truth. Thus, challenges persist, particularly in capturing protein dynamics, predicting multi-chain structures, interpreting protein function, and assessing model quality. Interdisciplinary collaborations are crucial to overcoming these obstacles. Databases like the AlphaFold Protein Structure Database, the ESM Metagenomic Atlas, and initiatives like the 3D-Beacons Network provide FAIR access to these data, enabling their interpretation and application across a broader scientific community. Whilst substantial advancements have been made in protein structure prediction, further progress is required to address the remaining challenges. Developing training materials, nurturing collaborations, and ensuring open data sharing will be paramount in this pursuit. The continued evolution of these tools and methodologies will deepen our understanding of protein function and accelerate disease pathogenesis and drug development discoveries.
Affiliation(s)
- Mihaly Varadi
  - Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL‐EBI), Wellcome Genome Campus, Hinxton, Cambridge, UK
- Maxim Tsenkov
  - Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL‐EBI), Wellcome Genome Campus, Hinxton, Cambridge, UK
- Sameer Velankar
  - Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL‐EBI), Wellcome Genome Campus, Hinxton, Cambridge, UK
6. Xiong D, Qiu Y, Zhao J, Zhou Y, Lee D, Gupta S, Torres M, Lu W, Liang S, Kang JJ, Eng C, Loscalzo J, Cheng F, Yu H. A structurally informed human protein-protein interactome reveals proteome-wide perturbations caused by disease mutations. Nat Biotechnol 2024:10.1038/s41587-024-02428-4. [PMID: 39448882; DOI: 10.1038/s41587-024-02428-4] [Received: 02/01/2024] [Accepted: 09/11/2024] [Indexed: 10/26/2024]
Abstract
To assist the translation of genetic findings to disease pathobiology and therapeutics discovery, we present an ensemble deep learning framework, termed PIONEER (Protein-protein InteractiOn iNtErfacE pRediction), that predicts protein-binding partner-specific interfaces for all known protein interactions in humans and seven other common model organisms to generate comprehensive structurally informed protein interactomes. We demonstrate that PIONEER outperforms existing state-of-the-art methods and experimentally validate its predictions. We show that disease-associated mutations are enriched in PIONEER-predicted protein-protein interfaces and explore their impact on disease prognosis and drug responses. We identify 586 significant protein-protein interactions (PPIs) enriched with PIONEER-predicted interface somatic mutations (termed oncoPPIs) from analysis of approximately 11,000 whole exomes across 33 cancer types and show significant associations of oncoPPIs with patient survival and drug responses. PIONEER, implemented as both a web server platform and a software package, identifies functional consequences of disease-associated alleles and offers a deep learning tool for precision medicine at multiscale interactome network levels.
Grants
- R01GM124559 U.S. Department of Health & Human Services | NIH | National Institute of General Medical Sciences (NIGMS)
- R01GM125639 U.S. Department of Health & Human Services | NIH | National Institute of General Medical Sciences (NIGMS)
- R01GM130885 U.S. Department of Health & Human Services | NIH | National Institute of General Medical Sciences (NIGMS)
- RM1GM139738 U.S. Department of Health & Human Services | NIH | National Institute of General Medical Sciences (NIGMS)
- R01DK115398 U.S. Department of Health & Human Services | NIH | National Institute of Diabetes and Digestive and Kidney Diseases (National Institute of Diabetes & Digestive & Kidney Diseases)
- U01HG007691 U.S. Department of Health & Human Services | NIH | National Human Genome Research Institute (NHGRI)
- R01HL155107 U.S. Department of Health & Human Services | NIH | National Heart, Lung, and Blood Institute (NHLBI)
- R01HL155096 U.S. Department of Health & Human Services | NIH | National Heart, Lung, and Blood Institute (NHLBI)
- R01HL166137 U.S. Department of Health & Human Services | NIH | National Heart, Lung, and Blood Institute (NHLBI)
- U54HL119145 U.S. Department of Health & Human Services | NIH | National Heart, Lung, and Blood Institute (NHLBI)
- AHA957729 American Heart Association (American Heart Association, Inc.)
- 24MERIT1185447 American Heart Association (American Heart Association, Inc.)
- R01AG084250 U.S. Department of Health & Human Services | NIH | National Institute on Aging (U.S. National Institute on Aging)
- R56AG074001 U.S. Department of Health & Human Services | NIH | National Institute on Aging (U.S. National Institute on Aging)
- U01AG073323 U.S. Department of Health & Human Services | NIH | National Institute on Aging (U.S. National Institute on Aging)
- R01AG066707 U.S. Department of Health & Human Services | NIH | National Institute on Aging (U.S. National Institute on Aging)
- R01AG076448 U.S. Department of Health & Human Services | NIH | National Institute on Aging (U.S. National Institute on Aging)
- R01AG082118 U.S. Department of Health & Human Services | NIH | National Institute on Aging (U.S. National Institute on Aging)
- RF1AG082211 U.S. Department of Health & Human Services | NIH | National Institute on Aging (U.S. National Institute on Aging)
- R21AG083003 U.S. Department of Health & Human Services | NIH | National Institute on Aging (U.S. National Institute on Aging)
- RF1NS133812 U.S. Department of Health & Human Services | NIH | National Institute of Neurological Disorders and Stroke (NINDS)
Affiliation(s)
- Dapeng Xiong
  - Department of Computational Biology, Cornell University, Ithaca, NY, USA
  - Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY, USA
  - Center for Innovative Proteomics, Cornell University, Ithaca, NY, USA
- Yunguang Qiu
  - Cleveland Clinic Genome Center, Lerner Research Institute, Cleveland Clinic, Cleveland, OH, USA
  - Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH, USA
- Junfei Zhao
  - Department of Systems Biology, Herbert Irving Comprehensive Center, Columbia University, New York, NY, USA
- Yadi Zhou
  - Cleveland Clinic Genome Center, Lerner Research Institute, Cleveland Clinic, Cleveland, OH, USA
  - Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH, USA
- Dongjin Lee
  - Department of Computational Biology, Cornell University, Ithaca, NY, USA
  - Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY, USA
- Shobhita Gupta
  - Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY, USA
  - Center for Innovative Proteomics, Cornell University, Ithaca, NY, USA
  - Biophysics Program, Cornell University, Ithaca, NY, USA
- Mateo Torres
  - Department of Computational Biology, Cornell University, Ithaca, NY, USA
  - Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY, USA
  - Center for Innovative Proteomics, Cornell University, Ithaca, NY, USA
- Weiqiang Lu
  - Shanghai Key Laboratory of Regulatory Biology, Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, China
- Siqi Liang
  - Department of Computational Biology, Cornell University, Ithaca, NY, USA
  - Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY, USA
- Jin Joo Kang
  - Department of Computational Biology, Cornell University, Ithaca, NY, USA
  - Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY, USA
  - Center for Innovative Proteomics, Cornell University, Ithaca, NY, USA
- Charis Eng
  - Cleveland Clinic Genome Center, Lerner Research Institute, Cleveland Clinic, Cleveland, OH, USA
  - Department of Molecular Medicine, Cleveland Clinic Lerner College of Medicine, Case Western Reserve University, Cleveland, OH, USA
  - Case Comprehensive Cancer Center, Case Western Reserve University School of Medicine, Cleveland, OH, USA
- Joseph Loscalzo
  - Channing Division of Network Medicine, Division of Cardiovascular Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Feixiong Cheng
  - Cleveland Clinic Genome Center, Lerner Research Institute, Cleveland Clinic, Cleveland, OH, USA
  - Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH, USA
  - Department of Molecular Medicine, Cleveland Clinic Lerner College of Medicine, Case Western Reserve University, Cleveland, OH, USA
  - Case Comprehensive Cancer Center, Case Western Reserve University School of Medicine, Cleveland, OH, USA
- Haiyuan Yu
  - Department of Computational Biology, Cornell University, Ithaca, NY, USA
  - Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY, USA
  - Center for Innovative Proteomics, Cornell University, Ithaca, NY, USA
7. Dörig C, Marulli C, Peskett T, Volkmar N, Pantolini L, Studer G, Paleari C, Frommelt F, Schwede T, de Souza N, Barral Y, Picotti P. Global profiling of protein complex dynamics with an experimental library of protein interaction markers. Nat Biotechnol 2024:10.1038/s41587-024-02432-8. [PMID: 39415059; DOI: 10.1038/s41587-024-02432-8] [Received: 07/28/2023] [Accepted: 09/16/2024] [Indexed: 10/18/2024]
Abstract
Methods to systematically monitor protein complex dynamics are needed. We introduce serial ultrafiltration combined with limited proteolysis-coupled mass spectrometry (FLiP-MS), a structural proteomics workflow that generates a library of peptide markers specific to changes in protein-protein interactions (PPIs) by probing differences in protease susceptibility between complex-bound and monomeric forms of proteins. The library includes markers mapping to protein-binding interfaces and markers reporting on structural changes that accompany PPI changes. Integrating the marker library with LiP-MS data allows for global profiling of PPIs from unfractionated lysates. We apply FLiP-MS to Saccharomyces cerevisiae and probe changes in protein complex dynamics after DNA replication stress, identifying links between Spt-Ada-Gcn5 acetyltransferase activity and the assembly state of several complexes. FLiP-MS enables protein complex dynamics to be probed upon any perturbation, proteome-wide, at high throughput, and with peptide-level structural resolution, while also reporting on the occupancy of binding interfaces, thus providing both global and molecular views of a system under study.
Affiliation(s)
- Christian Dörig
  - Institute of Molecular Systems Biology, Department of Biology, ETH Zurich, Zurich, Switzerland
- Cathy Marulli
  - Institute of Molecular Systems Biology, Department of Biology, ETH Zurich, Zurich, Switzerland
- Thomas Peskett
  - Institute of Biochemistry, Department of Biology, ETH Zurich, Zurich, Switzerland
- Norbert Volkmar
  - Institute of Molecular Systems Biology, Department of Biology, ETH Zurich, Zurich, Switzerland
- Lorenzo Pantolini
  - Biozentrum, University of Basel, Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, Computational Structural Biology, Basel, Switzerland
- Gabriel Studer
  - Biozentrum, University of Basel, Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, Computational Structural Biology, Basel, Switzerland
- Camilla Paleari
  - Institute of Molecular Systems Biology, Department of Biology, ETH Zurich, Zurich, Switzerland
- Fabian Frommelt
  - Institute of Molecular Systems Biology, Department of Biology, ETH Zurich, Zurich, Switzerland
- Torsten Schwede
  - Biozentrum, University of Basel, Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, Computational Structural Biology, Basel, Switzerland
- Natalie de Souza
  - Institute of Molecular Systems Biology, Department of Biology, ETH Zurich, Zurich, Switzerland
- Yves Barral
  - Institute of Biochemistry, Department of Biology, ETH Zurich, Zurich, Switzerland
- Paola Picotti
  - Institute of Molecular Systems Biology, Department of Biology, ETH Zurich, Zurich, Switzerland
8. Bradley D, Hogrebe A, Dandage R, Dubé AK, Leutert M, Dionne U, Chang A, Villén J, Landry CR. The fitness cost of spurious phosphorylation. EMBO J 2024; 43:4720-4751. [PMID: 39256561; PMCID: PMC11480408; DOI: 10.1038/s44318-024-00200-7] [Received: 10/25/2023] [Revised: 07/23/2024] [Accepted: 07/24/2024] [Indexed: 09/12/2024] [Open access]
Abstract
The fidelity of signal transduction requires the binding of regulatory molecules to their cognate targets. However, the crowded cell interior risks off-target interactions between proteins that are functionally unrelated. How such off-target interactions impact fitness is not generally known. Here, we use Saccharomyces cerevisiae to inducibly express tyrosine kinases. Because yeast lacks bona fide tyrosine kinases, the resulting tyrosine phosphorylation is biologically spurious. We engineered 44 yeast strains each expressing a tyrosine kinase, and quantitatively analysed their phosphoproteomes. This analysis resulted in ~30,000 phosphosites mapping to ~3500 proteins. The number of spurious pY sites generated correlates strongly with decreased growth, and we predict over 1000 pY events to be deleterious. However, we also find that many of the spurious pY sites have a negligible effect on fitness, possibly because of their low stoichiometry. This result is consistent with our evolutionary analyses demonstrating a lack of phosphotyrosine counter-selection in species with tyrosine kinases. Our results suggest that, alongside the risk for toxicity, the cell can tolerate a large degree of non-functional crosstalk as interaction networks evolve.
Affiliation(s)
- David Bradley
  - Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, QC, Canada
  - Department of Biochemistry, Microbiology and Bioinformatics, Université Laval, Québec, QC, Canada
  - Quebec Network for Research on Protein Function, Engineering, and Applications (PROTEO), Université du Québec à Montréal, Montréal, QC, Canada
  - Université Laval Big Data Research Center (BDRC_UL), Québec, QC, Canada
  - Department of Biology, Université Laval, Québec, QC, Canada
- Alexander Hogrebe
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Rohan Dandage
  - Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, QC, Canada
  - Department of Biochemistry, Microbiology and Bioinformatics, Université Laval, Québec, QC, Canada
  - Quebec Network for Research on Protein Function, Engineering, and Applications (PROTEO), Université du Québec à Montréal, Montréal, QC, Canada
  - Université Laval Big Data Research Center (BDRC_UL), Québec, QC, Canada
  - Department of Biology, Université Laval, Québec, QC, Canada
- Alexandre K Dubé
  - Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, QC, Canada
  - Department of Biochemistry, Microbiology and Bioinformatics, Université Laval, Québec, QC, Canada
  - Quebec Network for Research on Protein Function, Engineering, and Applications (PROTEO), Université du Québec à Montréal, Montréal, QC, Canada
  - Université Laval Big Data Research Center (BDRC_UL), Québec, QC, Canada
  - Department of Biology, Université Laval, Québec, QC, Canada
- Mario Leutert
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
  - Institute of Molecular Systems Biology, ETH Zürich, Zürich, Switzerland
- Ugo Dionne
  - Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, QC, Canada
  - Department of Biochemistry, Microbiology and Bioinformatics, Université Laval, Québec, QC, Canada
  - Quebec Network for Research on Protein Function, Engineering, and Applications (PROTEO), Université du Québec à Montréal, Montréal, QC, Canada
  - Université Laval Big Data Research Center (BDRC_UL), Québec, QC, Canada
  - Department of Biology, Université Laval, Québec, QC, Canada
- Alexis Chang
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Judit Villén
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Christian R Landry
  - Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, QC, Canada
  - Department of Biochemistry, Microbiology and Bioinformatics, Université Laval, Québec, QC, Canada
  - Quebec Network for Research on Protein Function, Engineering, and Applications (PROTEO), Université du Québec à Montréal, Montréal, QC, Canada
  - Université Laval Big Data Research Center (BDRC_UL), Québec, QC, Canada
  - Department of Biology, Université Laval, Québec, QC, Canada
9. Alam R, Mahbub S, Bayzid MS. Pair-EGRET: enhancing the prediction of protein-protein interaction sites through graph attention networks and protein language models. Bioinformatics 2024; 40:btae588. [PMID: 39360982; PMCID: PMC11495673; DOI: 10.1093/bioinformatics/btae588] [Received: 03/04/2024] [Revised: 09/03/2024] [Accepted: 10/01/2024] [Indexed: 10/05/2024] [Open access]
Abstract
MOTIVATION: Proteins are responsible for most biological functions, many of which require the interaction of more than one protein molecule. However, accurately predicting protein-protein interaction (PPI) sites (the interfacial residues of a protein that interact with other protein molecules) remains a challenge. The growing demand and cost associated with the reliable identification of PPI sites using conventional experimental methods call for computational tools for automated prediction and understanding of PPIs.
RESULTS: We present Pair-EGRET, an edge-aggregated graph attention network that leverages the features extracted from pretrained transformer-like models to accurately predict PPI sites. Pair-EGRET works on a k-nearest neighbor graph, representing the 3D structure of a protein, and utilizes the cross-attention mechanism for accurate identification of interfacial residues of a pair of proteins. Through an extensive evaluation study using a diverse array of experimental data, evaluation metrics, and case studies on representative protein sequences, we demonstrate that Pair-EGRET can achieve remarkable performance in predicting PPI sites. Moreover, Pair-EGRET can provide interpretable insights from the learned cross-attention matrix.
AVAILABILITY AND IMPLEMENTATION: Pair-EGRET is freely available in open source form at the GitHub Repository https://github.com/1705004/Pair-EGRET.
Affiliation(s)
- Ramisa Alam
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka 1205, Bangladesh
| | - Sazan Mahbub
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka 1205, Bangladesh
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, United States
| | - Md Shamsuzzoha Bayzid
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka 1205, Bangladesh
10
Mier P, Andrade-Navarro MA. Predicting the involvement of polyQ- and polyA in protein-protein interactions by their amino acid context. Heliyon 2024; 10:e37861. [PMID: 39323775 PMCID: PMC11422028 DOI: 10.1016/j.heliyon.2024.e37861] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2024] [Accepted: 09/11/2024] [Indexed: 09/27/2024] Open
Abstract
Homorepeats, specifically polyglutamine (polyQ) and polyalanine (polyA), are often implicated in protein-protein interactions (PPIs). So far, a method to predict the participation of homorepeats in protein interactions is lacking. We propose a machine learning approach to identify PPI-involved polyQ and polyA regions within the human proteome based on known interacting regions. Using the dataset of human homorepeats, we identified 157 polyQ and 745 polyA regions potentially involved in PPIs. Machine learning models, trained on amino acid context and homorepeat length, demonstrated high precision (0.90-0.98) but variable recall (0.42-0.85). Random forest outperformed other models (AUC polyQ = 0.686, AUC polyA = 0.732) using the positions surrounding the homorepeat -10 to +10. Integrating paralog information marginally improved predictions but was excluded for model simplicity. Further optimization revealed that for polyQ, using amino acid surrounding positions from -6 to +6 increased AUC to 0.715. For polyA, no improvement was found. Incorporating coiled coil overlap information enhanced polyA predictions (AUC = 0.745) but not polyQ. Finally, we applied these models to predict PPI involvement across all polyQ and polyA regions, identifying potential interactions. Case studies illustrated the method's predictive capacity, highlighting known interacting regions with high scores and elucidating potential false negatives.
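The amino acid context used as model input (positions -10 to +10 around a homorepeat, or -6 to +6 in the optimized polyQ model) can be sketched as a simple window extraction. The function name `homorepeat_contexts`, the minimum run length of 4, the `-` padding character, and the toy sequence are all assumptions for illustration; the abstract does not state how the authors define or encode these regions.

```python
import re

def homorepeat_contexts(seq: str, residue: str, min_len: int = 4, flank: int = 10):
    """Yield (start, length, left_context, right_context) for each run of `residue`."""
    # A run of at least `min_len` identical residues, e.g. "Q{4,}".
    pattern = re.compile(f"{re.escape(residue)}{{{min_len},}}")
    for m in pattern.finditer(seq):
        left = seq[max(0, m.start() - flank):m.start()]
        right = seq[m.end():m.end() + flank]
        # Pad with '-' so every context has exactly `flank` positions,
        # even when the repeat sits near a sequence terminus.
        yield (m.start(), m.end() - m.start(),
               left.rjust(flank, "-"), right.ljust(flank, "-"))

toy_seq = "MKLPQQQQQQAELRK"  # hypothetical sequence with one polyQ run
contexts = list(homorepeat_contexts(toy_seq, "Q", min_len=4, flank=6))
print(contexts)
```

The two fixed-length context strings, together with the repeat length, are the kind of features a classifier such as a random forest could be trained on after encoding.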
Affiliation(s)
- Pablo Mier
- Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University Mainz, Hans-Dieter-Hüsch-Weg 15, 55128 Mainz, Germany
- Miguel A Andrade-Navarro
- Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University Mainz, Hans-Dieter-Hüsch-Weg 15, 55128 Mainz, Germany
11
Gao K, Cao W, He Z, Liu L, Guo J, Dong L, Song J, Wu Y, Zhao Y. Network medicine analysis for dissecting the therapeutic mechanism of consensus TCM formulae in treating hepatocellular carcinoma with different TCM syndromes. Front Endocrinol (Lausanne) 2024; 15:1373054. [PMID: 39211446 PMCID: PMC11357915 DOI: 10.3389/fendo.2024.1373054] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2024] [Accepted: 07/26/2024] [Indexed: 09/04/2024] Open
Abstract
Introduction: Hepatocellular carcinoma (HCC) is a major cause of cancer-related mortality worldwide. Traditional Chinese Medicine (TCM) is widely utilized as an adjunct therapy, improving patient survival and quality of life. TCM categorizes HCC into five distinct syndromes, each treated with specific herbal formulae. However, the molecular mechanisms underlying these treatments remain unclear. Methods: We employed a network medicine approach to explore the therapeutic mechanisms of TCM in HCC. By constructing a protein-protein interaction (PPI) network, we integrated genes associated with TCM syndromes and their corresponding herbal formulae. This allowed for a quantitative analysis of the topological and functional relationships between TCM syndromes, HCC, and the specific formulae used for treatment. Results: Our findings revealed that genes related to the five TCM syndromes were closely associated with HCC-related genes within the PPI network. The gene sets corresponding to the five TCM formulae exhibited significant proximity to HCC and its related syndromes, suggesting the efficacy of TCM syndrome differentiation and treatment. Additionally, through a random walk algorithm applied to a heterogeneous network, we prioritized active herbal ingredients, with results confirmed by literature. Discussion: The identification of these key compounds underscores the potential of network medicine to unravel the complex pharmacological actions of TCM. This study provides a molecular basis for TCM's therapeutic strategies in HCC and highlights specific herbal ingredients as potential leads for drug development and precision medicine.
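The abstract does not say which random walk variant was used. As one common choice for node prioritization, the sketch below shows a random walk with restart on a small homogeneous toy network; the function name, the restart probability of 0.5, and the four-node path graph are assumptions for illustration and do not reproduce the authors' heterogeneous network.

```python
import numpy as np

def random_walk_with_restart(adj, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Return steady-state visiting probabilities for a walk restarting at `seeds`."""
    adj = np.asarray(adj, dtype=float)
    # Column-normalize so each column is a transition distribution.
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0  # isolated nodes: avoid division by zero
    transition = adj / col_sums
    # Restart distribution: uniform over the seed nodes.
    p0 = np.zeros(adj.shape[0])
    p0[list(seeds)] = 1.0 / len(seeds)
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * (transition @ p) + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p

# Hypothetical undirected path graph: 0 - 1 - 2 - 3, with node 0 as the seed.
path_adj = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = random_walk_with_restart(path_adj, seeds=[0])
print(scores)
```

Nodes closer to the seed receive higher scores, which is the property that makes this family of methods useful for ranking candidate nodes by network proximity.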
Affiliation(s)
- Kai Gao
- School of Traditional Chinese Medicine, Beijing University of Chinese Medicine, Chaoyang District, Beijing, China
- WanChen Cao
- School of Traditional Chinese Medicine, Beijing University of Chinese Medicine, Chaoyang District, Beijing, China
- ZiHao He
- School of Traditional Chinese Medicine, Beijing University of Chinese Medicine, Chaoyang District, Beijing, China
- Liu Liu
- School of Traditional Chinese Medicine, Beijing University of Chinese Medicine, Chaoyang District, Beijing, China
- JinCheng Guo
- School of Traditional Chinese Medicine, Beijing University of Chinese Medicine, Chaoyang District, Beijing, China
- Lei Dong
- School of Traditional Chinese Medicine, Beijing University of Chinese Medicine, Chaoyang District, Beijing, China
- Jini Song
- New York Institute of Technology College of Osteopathic Medicine, Arkansas State University, Jonesboro, AR, United States
- Yang Wu
- The Research Center for Ubiquitous Computing Systems (CUbiCS), Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
- Yi Zhao
- School of Traditional Chinese Medicine, Beijing University of Chinese Medicine, Chaoyang District, Beijing, China
- The Research Center for Ubiquitous Computing Systems (CUbiCS), Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
12
Beltrao P, Van Den Bossche T, Gabriels R, Holstein T, Kockmann T, Nameni A, Panse C, Schlapbach R, Lautenbacher L, Mattanovich M, Nesvizhskii A, Van Puyvelde B, Scheid J, Schwämmle V, Strauss M, Susmelj AK, The M, Webel H, Wilhelm M, Winkelhardt D, Wolski WE, Xi M. Proceedings of the EuBIC-MS developers meeting 2023. J Proteomics 2024; 305:105246. [PMID: 38964537 DOI: 10.1016/j.jprot.2024.105246] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2024] [Revised: 06/19/2024] [Accepted: 06/27/2024] [Indexed: 07/06/2024]
Abstract
The 2023 European Bioinformatics Community for Mass Spectrometry (EuBIC-MS) Developers Meeting was held from January 15th to January 20th, 2023, in Congressi Stefano Franscini at Monte Verità in Ticino, Switzerland. The participants were scientists and developers working in computational mass spectrometry (MS), metabolomics, and proteomics. The 5-day program was split between introductory keynote lectures and parallel hackathon sessions focusing on "Artificial Intelligence in proteomics" to stimulate future directions in the MS-driven omics areas. During the latter, the participants developed bioinformatics tools and resources addressing outstanding needs in the community. The hackathons allowed less experienced participants to learn from more advanced computational MS experts and actively contribute to highly relevant research projects. We successfully produced several new tools applicable to the proteomics community by improving data analysis and facilitating future research.
Affiliation(s)
- Tim Van Den Bossche
- VIB-UGent Center for Medical Biotechnology, 9052 Zwijnaarde, Belgium; Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium
- Ralf Gabriels
- VIB-UGent Center for Medical Biotechnology, 9052 Zwijnaarde, Belgium; Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium
- Tanja Holstein
- VIB-UGent Center for Medical Biotechnology, 9052 Zwijnaarde, Belgium; Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium
- Tobias Kockmann
- Functional Genomics Center Zürich, ETH Zürich/University of Zürich, Winterthurerstrasse 190, CH-8057 Zürich, Switzerland
- Alireza Nameni
- VIB-UGent Center for Medical Biotechnology, 9052 Zwijnaarde, Belgium; Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium
- Christian Panse
- Functional Genomics Center Zürich, ETH Zürich/University of Zürich, Winterthurerstrasse 190, CH-8057 Zürich, Switzerland; Swiss Institute of Bioinformatics, Quartier Sorge - Batiment Amphipole, 1015 Lausanne, Switzerland
- Ralph Schlapbach
- Functional Genomics Center Zürich, ETH Zürich/University of Zürich, Winterthurerstrasse 190, CH-8057 Zürich, Switzerland
- Ludwig Lautenbacher
- Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Maximus-von-Imhof-Forum 3, D - 85354 Freising, Germany
- Matthias Mattanovich
- Novo Nordisk Foundation Center for Basic Metabolic Research, University of Copenhagen, Blegdamsvej 3B, DK-2200 Copenhagen, Denmark
- Alexey Nesvizhskii
- Departments of Pathology and Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48105, USA
- Bart Van Puyvelde
- ProGenTomics, Laboratory of Pharmaceutical Biotechnology, Ghent University, Ottergemsesteenweg 460, BE-9000 Ghent, Belgium
- Jonas Scheid
- Department of Peptide-based Immunotherapy, Institute of Immunology, University and University hospital Tübingen, Auf der Morgenstelle 15, D-72076 Tübingen, Germany; Quantitative Biology Center (QBiC), University of Tübingen, Auf der Morgenstelle 10, D-72076 Tübingen, Germany
- Veit Schwämmle
- Department of Biochemistry and Molecular Biology, University of Southern Denmark, Campusvej 55, 5230 Odense, Denmark
- Maximilian Strauss
- Proteomics Program, Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Matthew The
- TUM School of Life Sciences Technische Universität München, D - 85354 Freising, Germany
- Henry Webel
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Mathias Wilhelm
- Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Maximus-von-Imhof-Forum 3, D - 85354 Freising, Germany
- Witold E Wolski
- Functional Genomics Center Zürich, ETH Zürich/University of Zürich, Winterthurerstrasse 190, CH-8057 Zürich, Switzerland; Swiss Institute of Bioinformatics, Quartier Sorge - Batiment Amphipole, 1015 Lausanne, Switzerland
- Muyao Xi
- Novo Nordisk Foundation Center for Basic Metabolic Research, University of Copenhagen, Blegdamsvej 3B, DK-2200 Copenhagen, Denmark
13
Zitnik M, Li MM, Wells A, Glass K, Morselli Gysi D, Krishnan A, Murali TM, Radivojac P, Roy S, Baudot A, Bozdag S, Chen DZ, Cowen L, Devkota K, Gitter A, Gosline SJC, Gu P, Guzzi PH, Huang H, Jiang M, Kesimoglu ZN, Koyuturk M, Ma J, Pico AR, Pržulj N, Przytycka TM, Raphael BJ, Ritz A, Sharan R, Shen Y, Singh M, Slonim DK, Tong H, Yang XH, Yoon BJ, Yu H, Milenković T. Current and future directions in network biology. Bioinformatics Advances 2024; 4:vbae099. [PMID: 39143982 PMCID: PMC11321866 DOI: 10.1093/bioadv/vbae099] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2023] [Revised: 05/31/2024] [Accepted: 07/08/2024] [Indexed: 08/16/2024]
Abstract
Summary: Network biology is an interdisciplinary field bridging computational and biological sciences that has proved pivotal in advancing the understanding of cellular functions and diseases across biological systems and scales. Although the field has been around for two decades, it remains nascent. It has witnessed rapid evolution, accompanied by emerging challenges. These stem from various factors, notably the growing complexity and volume of data together with the increased diversity of data types describing different tiers of biological organization. We discuss prevailing research directions in network biology, focusing on molecular/cellular networks but also on other biological network types such as biomedical knowledge graphs, patient similarity networks, brain networks, and social/contact networks relevant to disease spread. In more detail, we highlight areas of inference and comparison of biological networks, multimodal data integration and heterogeneous networks, higher-order network analysis, machine learning on networks, and network-based personalized medicine. Following the overview of recent breakthroughs across these five areas, we offer a perspective on future directions of network biology. Additionally, we discuss scientific communities, educational initiatives, and the importance of fostering diversity within the field. This article establishes a roadmap for an immediate and long-term vision for network biology. Availability and implementation: Not applicable.
Affiliation(s)
- Marinka Zitnik
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, United States
- Michelle M Li
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, United States
- Aydin Wells
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, United States
- Lucy Family Institute for Data and Society, University of Notre Dame, Notre Dame, IN 46556, United States
- Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN 46556, United States
- Kimberly Glass
- Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115, United States
- Deisy Morselli Gysi
- Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115, United States
- Department of Statistics, Federal University of Paraná, Curitiba, Paraná 81530-015, Brazil
- Department of Physics, Northeastern University, Boston, MA 02115, United States
- Arjun Krishnan
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, United States
- T M Murali
- Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, United States
- Predrag Radivojac
- Khoury College of Computer Sciences, Northeastern University, Boston, MA 02115, United States
- Sushmita Roy
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53715, United States
- Wisconsin Institute for Discovery, Madison, WI 53715, United States
- Anaïs Baudot
- Aix Marseille Université, INSERM, MMG, Marseille, France
- Serdar Bozdag
- Department of Computer Science and Engineering, University of North Texas, Denton, TX 76203, United States
- Department of Mathematics, University of North Texas, Denton, TX 76203, United States
- Danny Z Chen
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, United States
- Lenore Cowen
- Department of Computer Science, Tufts University, Medford, MA 02155, United States
- Kapil Devkota
- Department of Computer Science, Tufts University, Medford, MA 02155, United States
- Anthony Gitter
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53715, United States
- Morgridge Institute for Research, Madison, WI 53715, United States
- Sara J C Gosline
- Biological Sciences Division, Pacific Northwest National Laboratory, Seattle, WA 98109, United States
- Pengfei Gu
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, United States
- Pietro H Guzzi
- Department of Medical and Surgical Sciences, University Magna Graecia of Catanzaro, Catanzaro, 88100, Italy
- Heng Huang
- Department of Computer Science, University of Maryland College Park, College Park, MD 20742, United States
- Meng Jiang
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, United States
- Ziynet Nesibe Kesimoglu
- Department of Computer Science and Engineering, University of North Texas, Denton, TX 76203, United States
- National Center of Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20814, United States
- Mehmet Koyuturk
- Department of Computer and Data Sciences, Case Western Reserve University, Cleveland, OH 44106, United States
- Jian Ma
- Ray and Stephanie Lane Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, United States
- Alexander R Pico
- Institute of Data Science and Biotechnology, Gladstone Institutes, San Francisco, CA 94158, United States
- Nataša Pržulj
- Department of Computer Science, University College London, London, WC1E 6BT, England
- ICREA, Catalan Institution for Research and Advanced Studies, Barcelona, 08010, Spain
- Barcelona Supercomputing Center (BSC), Barcelona, 08034, Spain
- Teresa M Przytycka
- National Center of Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20814, United States
- Benjamin J Raphael
- Department of Computer Science, Princeton University, Princeton, NJ 08544, United States
- Anna Ritz
- Department of Biology, Reed College, Portland, OR 97202, United States
- Roded Sharan
- School of Computer Science, Tel Aviv University, Tel Aviv, 69978, Israel
- Yang Shen
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, United States
- Mona Singh
- Department of Computer Science, Princeton University, Princeton, NJ 08544, United States
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, United States
- Donna K Slonim
- Department of Computer Science, Tufts University, Medford, MA 02155, United States
- Hanghang Tong
- Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL 61801, United States
- Xinan Holly Yang
- Department of Pediatrics, University of Chicago, Chicago, IL 60637, United States
- Byung-Jun Yoon
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, United States
- Computational Science Initiative, Brookhaven National Laboratory, Upton, NY 11973, United States
- Haiyuan Yu
- Department of Computational Biology, Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, United States
- Tijana Milenković
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, United States
- Lucy Family Institute for Data and Society, University of Notre Dame, Notre Dame, IN 46556, United States
- Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN 46556, United States
14
Geist JL, Lee CY, Strom JM, de Jesús Naveja J, Luck K. Generation of a high confidence set of domain-domain interface types to guide protein complex structure predictions by AlphaFold. Bioinformatics 2024; 40:btae482. [PMID: 39171834 PMCID: PMC11361816 DOI: 10.1093/bioinformatics/btae482] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/21/2024] [Revised: 07/10/2024] [Accepted: 08/20/2024] [Indexed: 08/23/2024] Open
Abstract
MOTIVATION: While the release of AlphaFold (AF) represented a breakthrough for the prediction of protein complex structures, its sensitivity, especially when using full length protein sequences, still remains limited. Modeling success rates might increase if AF predictions were guided by likely interacting protein fragments. This approach requires available sets of highly confident protein-protein interface types. Computational resources, such as 3did, infer interacting globular domain types from observed contacts in protein structures. Assessing the accuracy of these predicted interface types is difficult because we lack hand-curated reference sets of verified domain-domain interface (DDI) types. RESULTS: To improve protein complex modeling of DDIs by AF, we manually inspected 80 randomly selected DDI types from the 3did resource to generate a first reference set of DDI types. Identified cases of DDI type nonapproval (40%) primarily resulted from inaccurate Pfam domain matches, crystal contacts, and synthetic protein constructs. Using logistic regression, we predicted a subset of 2411 out of 5724 considered DDI types in 3did to be of high confidence, which we subsequently applied to 53 000 human-protein interactions to predict DDIs followed by AF modeling. We obtained highly confident AF models for 604 out of 1129 predicted DDIs. Of note, for 47% of them no confident AF structural model could be obtained using full length protein sequences. AVAILABILITY AND IMPLEMENTATION: Code is available at https://github.com/KatjaLuckLab/DDI_manuscript.
Affiliation(s)
- Chop Yan Lee
- Institute of Molecular Biology (IMB) gGmbH, Mainz 55128, Germany
- José de Jesús Naveja
- Institute of Molecular Biology (IMB) gGmbH, Mainz 55128, Germany
- 3rd Medical Department, University Medical Center, Johannes Gutenberg University Mainz, Mainz 55131, Germany
- University Cancer Center, University Medical Center, Johannes Gutenberg University Mainz, Mainz 55131, Germany
- Katja Luck
- Institute of Molecular Biology (IMB) gGmbH, Mainz 55128, Germany
15
Correa Marrero M, Jänes J, Baptista D, Beltrao P. Integrating Large-Scale Protein Structure Prediction into Human Genetics Research. Annu Rev Genomics Hum Genet 2024; 25:123-140. [PMID: 38621234 DOI: 10.1146/annurev-genom-120622-020615] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/17/2024]
Abstract
The last five years have seen impressive progress in deep learning models applied to protein research. Most notably, sequence-based structure predictions have seen transformative gains in the form of AlphaFold2 and related approaches. Millions of missense protein variants in the human population lack annotations, and these computational methods are a valuable means to prioritize variants for further analysis. Here, we review the recent progress in deep learning models applied to the prediction of protein structure and protein variants, with particular emphasis on their implications for human genetics and health. Improved prediction of protein structures facilitates annotations of the impact of variants on protein stability, protein-protein interaction interfaces, and small-molecule binding pockets. Moreover, it contributes to the study of host-pathogen interactions and the characterization of protein function. As genome sequencing in large cohorts becomes increasingly prevalent, we believe that better integration of state-of-the-art protein informatics technologies into human genetics research is of paramount importance.
Affiliation(s)
- Miguel Correa Marrero
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Institute of Molecular Systems Biology, Department of Biology, ETH Zurich, Zurich, Switzerland
- Jürgen Jänes
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Institute of Molecular Systems Biology, Department of Biology, ETH Zurich, Zurich, Switzerland
- Pedro Beltrao
- Instituto Gulbenkian de Ciência, Oeiras, Portugal
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Institute of Molecular Systems Biology, Department of Biology, ETH Zurich, Zurich, Switzerland
16
Volzhenin K, Bittner L, Carbone A. SENSE-PPI reconstructs interactomes within, across, and between species at the genome scale. iScience 2024; 27:110371. [PMID: 39055916 PMCID: PMC11269938 DOI: 10.1016/j.isci.2024.110371] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2023] [Revised: 05/04/2024] [Accepted: 06/21/2024] [Indexed: 07/28/2024] Open
Abstract
Ab initio computational reconstructions of protein-protein interaction (PPI) networks will provide invaluable insights into cellular systems, enabling the discovery of novel molecular interactions and elucidating biological mechanisms within and between organisms. Leveraging the latest generation protein language models and recurrent neural networks, we present SENSE-PPI, a sequence-based deep learning model that efficiently reconstructs ab initio PPIs, distinguishing partners among tens of thousands of proteins and identifying specific interactions within functionally similar proteins. SENSE-PPI demonstrates high accuracy, limited training requirements, and versatility in cross-species predictions, even with non-model organisms and human-virus interactions. Its performance decreases for phylogenetically more distant model and non-model organisms, but signal alteration is very slow. In this regard, it demonstrates the important role of parameters in protein language models. SENSE-PPI is very fast and can test 10,000 proteins against themselves in a matter of hours, enabling the reconstruction of genome-wide proteomes.
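To give a sense of the scale implied by testing 10,000 proteins against themselves, the short calculation below counts the candidate pairs. It assumes that pairs are unordered and reports the count with and without self-pairs; the abstract does not state which convention SENSE-PPI uses.

```python
from math import comb

n_proteins = 10_000

# Unordered pairs of two distinct proteins: n * (n - 1) / 2.
unordered_pairs = comb(n_proteins, 2)

# Adding each protein paired with itself (possible homodimers).
with_self_pairs = unordered_pairs + n_proteins

print(unordered_pairs)   # 49995000
print(with_self_pairs)   # 50005000
```

An all-against-all screen at this size therefore involves roughly 50 million pair evaluations, which is why per-pair inference speed matters.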
Affiliation(s)
- Konstantin Volzhenin
- Sorbonne Université, CNRS, IBPS, UMR 7238, Laboratoire de Biologie Computationnelle et Quantitative (LCQB), 75005 Paris, France
- Lucie Bittner
- Institut de Systématique, Evolution, Biodiversité (ISYEB), Muséum national d’Histoire naturelle, CNRS, Sorbonne Université, EPHE, Université des Antilles, Paris, France
- Institut Universitaire de France, Paris, France
- Alessandra Carbone
- Sorbonne Université, CNRS, IBPS, UMR 7238, Laboratoire de Biologie Computationnelle et Quantitative (LCQB), 75005 Paris, France
- Institut Universitaire de France, Paris, France
17
Dapkūnas J, Timinskas A, Olechnovič K, Tomkuvienė M, Venclovas Č. PPI3D: a web server for searching, analyzing and modeling protein-protein, protein-peptide and protein-nucleic acid interactions. Nucleic Acids Res 2024; 52:W264-W271. [PMID: 38619046 PMCID: PMC11223826 DOI: 10.1093/nar/gkae278] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2024] [Revised: 03/19/2024] [Accepted: 04/03/2024] [Indexed: 04/16/2024] Open
Abstract
Structure-resolved protein interactions with other proteins, peptides and nucleic acids are key for understanding molecular mechanisms. The PPI3D web server enables researchers to query preprocessed and clustered structural data, analyze the results and make homology-based inferences for protein interactions. PPI3D offers three interaction exploration modes: (i) all interactions for proteins homologous to the query, (ii) interactions between two proteins or their homologs and (iii) interactions within a specific PDB entry. The server allows interactive analysis of the identified interactions in both summarized and detailed manner. This includes protein annotations, structures, the interface residues and the corresponding contact surface areas. In addition, users can make inferences about residues at the interaction interface for the query protein(s) from the sequence alignments and homology models. The weekly updated PPI3D database includes all the interaction interfaces and binding sites from PDB, clustered based on both protein sequence and structural similarity, yielding non-redundant datasets without loss of alternative interaction modes. Consequently, the PPI3D users avoid being flooded with redundant information, a typical situation for intensely studied proteins. Furthermore, PPI3D provides a possibility to download user-defined sets of interaction interfaces and analyze them locally. The PPI3D web server is available at https://bioinformatics.lt/ppi3d.
Affiliation(s)
- Justas Dapkūnas
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Saulėtekio av. 7, Vilnius LT-10257, Lithuania
- Albertas Timinskas
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Saulėtekio av. 7, Vilnius LT-10257, Lithuania
- Kliment Olechnovič
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Saulėtekio av. 7, Vilnius LT-10257, Lithuania
- Univ. Grenoble Alpes, CNRS, Grenoble INP, LJK, 38000 Grenoble, France
- Miglė Tomkuvienė
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Saulėtekio av. 7, Vilnius LT-10257, Lithuania
- Česlovas Venclovas
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Saulėtekio av. 7, Vilnius LT-10257, Lithuania
18
Captur G, Doykov I, Chung SC, Field E, Barnes A, Zhang E, Heenan I, Norrish G, Moon JC, Elliott PM, Heywood WE, Mills K, Kaski JP. Novel Multiplexed Plasma Biomarker Panel Has Diagnostic and Prognostic Potential in Children With Hypertrophic Cardiomyopathy. Circulation. Genomic and Precision Medicine 2024; 17:e004448. [PMID: 38847081 PMCID: PMC11188636 DOI: 10.1161/circgen.123.004448] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2023] [Accepted: 04/16/2024] [Indexed: 06/20/2024]
Abstract
BACKGROUND Hypertrophic cardiomyopathy (HCM) is defined clinically by pathological left ventricular hypertrophy. We have previously developed a plasma proteomics biomarker panel that correlates with clinical markers of disease severity and sudden cardiac death risk in adult patients with HCM. The aim of this study was to investigate the utility of adult biomarkers and perform new discoveries in proteomics for childhood-onset HCM. METHODS Fifty-nine protein biomarkers were identified from an exploratory plasma proteomics screen in children with HCM and augmented into our existing multiplexed targeted liquid chromatography-tandem/mass spectrometry-based assay. The association of these biomarkers with clinical phenotypes and outcomes was prospectively tested in plasma collected from 148 children with HCM and 50 healthy controls. Machine learning techniques were used to develop novel pediatric plasma proteomic biomarker panels. RESULTS Four previously identified adult HCM markers (aldolase fructose-bisphosphate A, complement C3a, talin-1, and thrombospondin 1) and 3 new markers (glycogen phosphorylase B, lipoprotein a and profilin 1) were elevated in pediatric HCM. Using supervised machine learning applied to training (n=137) and validation cohorts (n=61), this 7-biomarker panel differentiated HCM from healthy controls with an area under the curve of 1.0 in the training data set (sensitivity 100% [95% CI, 95-100]; specificity 100% [95% CI, 96-100]) and 0.82 in the validation data set (sensitivity 75% [95% CI, 59-86]; specificity 88% [95% CI, 75-94]). Reduced circulating levels of 4 other peptides (apolipoprotein L1, complement 5b, immunoglobulin heavy constant epsilon, and serum amyloid A4) found in children with high sudden cardiac death risk provided complete separation from the low and intermediate risk groups and predicted mortality and adverse arrhythmic outcomes (hazard ratio, 2.04 [95% CI, 1.0-4.2]; P=0.044). 
CONCLUSIONS In children, a 7-biomarker proteomics panel can distinguish HCM from controls with high sensitivity and specificity, and another 4-biomarker panel identifies those at high risk of adverse arrhythmic outcomes, including sudden cardiac death.
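The sensitivity, specificity, and area under the curve quoted in this abstract follow from standard definitions. The sketch below is a minimal illustration only: the function names are hypothetical, and any counts or scores passed to it are made-up inputs, not data from the study.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    sensitivity = TP / (TP + FN): fraction of true cases that are detected.
    specificity = TN / (TN + FP): fraction of controls correctly excluded.
    """
    return tp / (tp + fn), tn / (tn + fp)


def auc(case_scores, control_scores):
    """Area under the ROC curve via its rank interpretation.

    AUC equals the probability that a randomly chosen case scores higher
    than a randomly chosen control; ties count as one half.
    """
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical example inputs (NOT taken from the study):
# 30 of 40 cases detected, 22 of 25 controls correctly excluded.
sens, spec = sensitivity_specificity(tp=30, fn=10, tn=22, fp=3)
```

An AUC of 1.0, as reported for the training data set, corresponds to every case scoring above every control under this definition.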
Affiliation(s)
- Gabriella Captur
  - UCL MRC Unit for Lifelong Health & Ageing, UCL, London, United Kingdom (G.C.)
  - UCL Institute of Cardiovascular Science, UCL, London, United Kingdom (G.C., J.C.M., P.M.E.)
  - The Royal Free Hospital, Centre for Inherited Heart Muscle Conditions, Cardiology Department, UCL, London, United Kingdom (G.C.)
- Ivan Doykov
  - Translational Mass Spectrometry Research Group, UCL Institute of Child Health, London, United Kingdom (I.D., E.Z., W.E.H., K.M.)
- Sheng-Chia Chung
  - UCL Institute of Health Informatics Research, Division of Infection and Immunity, London, United Kingdom (S.-C.C.)
- Ella Field
  - Centre for Paediatric Inherited & Rare Cardiovascular Disease, Institute of Cardiovascular Science, London, United Kingdom (E.F., A.B., I.H., G.N., J.P.K.)
  - Centre for Inherited Cardiovascular Diseases, Great Ormond Street Hospital, London, United Kingdom (E.F., A.B., I.H., G.N., J.P.K.)
- Annabelle Barnes
  - Centre for Paediatric Inherited & Rare Cardiovascular Disease, Institute of Cardiovascular Science, London, United Kingdom (E.F., A.B., I.H., G.N., J.P.K.)
  - Centre for Inherited Cardiovascular Diseases, Great Ormond Street Hospital, London, United Kingdom (E.F., A.B., I.H., G.N., J.P.K.)
- Enpei Zhang
  - Translational Mass Spectrometry Research Group, UCL Institute of Child Health, London, United Kingdom (I.D., E.Z., W.E.H., K.M.)
  - UCL Medical School, University College London, London, United Kingdom (E.Z.)
- Imogen Heenan
  - Centre for Paediatric Inherited & Rare Cardiovascular Disease, Institute of Cardiovascular Science, London, United Kingdom (E.F., A.B., I.H., G.N., J.P.K.)
  - Centre for Inherited Cardiovascular Diseases, Great Ormond Street Hospital, London, United Kingdom (E.F., A.B., I.H., G.N., J.P.K.)
- Gabrielle Norrish
  - Centre for Paediatric Inherited & Rare Cardiovascular Disease, Institute of Cardiovascular Science, London, United Kingdom (E.F., A.B., I.H., G.N., J.P.K.)
  - Centre for Inherited Cardiovascular Diseases, Great Ormond Street Hospital, London, United Kingdom (E.F., A.B., I.H., G.N., J.P.K.)
- James C. Moon
  - Barts Heart Centre, the Cardiovascular Magnetic Resonance Unit, London, United Kingdom (J.C.M.)
- Perry M. Elliott
  - Barts Heart Centre, the Inherited Cardiovascular Diseases Unit, St Bartholomew’s Hospital, London, United Kingdom (P.M.E.)
- Wendy E. Heywood
  - Translational Mass Spectrometry Research Group, UCL Institute of Child Health, London, United Kingdom (I.D., E.Z., W.E.H., K.M.)
- Kevin Mills
  - Translational Mass Spectrometry Research Group, UCL Institute of Child Health, London, United Kingdom (I.D., E.Z., W.E.H., K.M.)
- Juan Pablo Kaski
  - Centre for Paediatric Inherited & Rare Cardiovascular Disease, Institute of Cardiovascular Science, London, United Kingdom (E.F., A.B., I.H., G.N., J.P.K.)
  - Centre for Inherited Cardiovascular Diseases, Great Ormond Street Hospital, London, United Kingdom (E.F., A.B., I.H., G.N., J.P.K.)
|
19
|
Zhao H, Petrey D, Murray D, Honig B. ZEPPI: Proteome-scale sequence-based evaluation of protein-protein interaction models. Proc Natl Acad Sci U S A 2024; 121:e2400260121. [PMID: 38743624 PMCID: PMC11127014 DOI: 10.1073/pnas.2400260121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/08/2024] [Accepted: 04/18/2024] [Indexed: 05/16/2024] Open
Abstract
We introduce ZEPPI (Z-score Evaluation of Protein-Protein Interfaces), a framework to evaluate structural models of a complex based on sequence coevolution and conservation involving residues in protein-protein interfaces. The ZEPPI score is calculated by comparing metrics for an interface to those obtained from randomly chosen residues. Since contacting residues are defined by the structural model, this obviates the need to account for indirect interactions. Further, although ZEPPI relies on species-paired multiple sequence alignments, its focus on interfacial residues allows it to leverage quite shallow alignments. ZEPPI can be implemented on a proteome-wide scale and is applied here to millions of structural models of dimeric complexes in the Escherichia coli and human interactomes found in the PrePPI database. PrePPI's scoring function is based primarily on the evaluation of protein-protein interfaces, and ZEPPI adds a new feature to this analysis through the incorporation of evolutionary information. ZEPPI performance is evaluated through applications to experimentally determined complexes and to decoys from the CASP-CAPRI experiment. As we discuss, the standard CAPRI scores used to evaluate docking models are based on model quality and not on the ability to give yes/no answers as to whether two proteins interact. ZEPPI is able to detect weak signals from PPI models that the CAPRI scores define as incorrect and, similarly, to identify potential PPIs defined as low confidence by the current PrePPI scoring function. A number of examples that illustrate how the combination of PrePPI and ZEPPI can yield functional hypotheses are provided.
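The idea of scoring an interface against randomly chosen residues can be illustrated with a generic Z-score calculation. This is a minimal sketch under stated assumptions, not the ZEPPI implementation: the per-residue `scores` list stands in for any conservation- or coevolution-like metric, and the function name, sample count, and seed are hypothetical.

```python
import random
import statistics


def interface_z_score(scores, interface_idx, n_samples=1000, seed=0):
    """Z-score of the mean score over interface residues, relative to the
    means obtained from random residue sets of the same size.

    scores        : per-residue metric values (assumed input, any real numbers)
    interface_idx : indices of residues that the structural model puts in contact
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    k = len(interface_idx)
    observed = statistics.fmean(scores[i] for i in interface_idx)

    # Null distribution: mean score of k residues drawn without replacement.
    population = range(len(scores))
    null_means = [
        statistics.fmean(scores[i] for i in rng.sample(population, k))
        for _ in range(n_samples)
    ]

    mu = statistics.fmean(null_means)
    sigma = statistics.stdev(null_means)
    if sigma == 0:
        raise ValueError("null distribution has zero variance")
    return (observed - mu) / sigma
```

A large positive value indicates that the interface residues score higher than expected for residue sets of the same size chosen at random.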
Affiliation(s)
- Haiqing Zhao
  - Department of Systems Biology, Columbia University Irving Medical Center, New York, NY 10032
- Donald Petrey
  - Department of Systems Biology, Columbia University Irving Medical Center, New York, NY 10032
- Diana Murray
  - Department of Systems Biology, Columbia University Irving Medical Center, New York, NY 10032
- Barry Honig
  - Department of Systems Biology, Columbia University Irving Medical Center, New York, NY 10032
  - Department of Biochemistry and Molecular Biophysics, Columbia University Irving Medical Center, New York, NY 10032
  - Department of Medicine, Columbia University, New York, NY 10032
  - Zuckerman Institute, Columbia University, New York, NY 10027
|
20
|
Grassmann G, Miotto M, Desantis F, Di Rienzo L, Tartaglia GG, Pastore A, Ruocco G, Monti M, Milanetti E. Computational Approaches to Predict Protein-Protein Interactions in Crowded Cellular Environments. Chem Rev 2024; 124:3932-3977. [PMID: 38535831 PMCID: PMC11009965 DOI: 10.1021/acs.chemrev.3c00550] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2023] [Revised: 02/20/2024] [Accepted: 02/21/2024] [Indexed: 04/11/2024]
Abstract
Investigating protein-protein interactions is crucial for understanding cellular biological processes because proteins often function within molecular complexes rather than in isolation. While experimental and computational methods have provided valuable insights into these interactions, they often overlook a critical factor: the crowded cellular environment. This environment significantly impacts protein behavior, including structural stability, diffusion, and ultimately the nature of binding. In this review, we discuss theoretical and computational approaches that allow the modeling of biological systems to guide and complement experiments and can thus significantly advance the investigation, and possibly the predictions, of protein-protein interactions in the crowded environment of cell cytoplasm. We explore topics such as statistical mechanics for lattice simulations, hydrodynamic interactions, diffusion processes in high-viscosity environments, and several methods based on molecular dynamics simulations. By synergistically leveraging methods from biophysics and computational biology, we review the state of the art of computational methods to study the impact of molecular crowding on protein-protein interactions and discuss its potential revolutionizing effects on the characterization of the human interactome.
Affiliation(s)
- Greta Grassmann
  - Department of Biochemical Sciences “Alessandro Rossi Fanelli”, Sapienza University of Rome, Rome 00185, Italy
  - Center for Life Nano & Neuro Science, Istituto Italiano di Tecnologia, Rome 00161, Italy
- Mattia Miotto
  - Center for Life Nano & Neuro Science, Istituto Italiano di Tecnologia, Rome 00161, Italy
- Fausta Desantis
  - Center for Life Nano & Neuro Science, Istituto Italiano di Tecnologia, Rome 00161, Italy
  - The Open University Affiliated Research Centre at Istituto Italiano di Tecnologia, Genoa 16163, Italy
- Lorenzo Di Rienzo
  - Center for Life Nano & Neuro Science, Istituto Italiano di Tecnologia, Rome 00161, Italy
- Gian Gaetano Tartaglia
  - Center for Life Nano & Neuro Science, Istituto Italiano di Tecnologia, Rome 00161, Italy
  - Department of Neuroscience and Brain Technologies, Istituto Italiano di Tecnologia, Genoa 16163, Italy
  - Center for Human Technologies, Genoa 16152, Italy
- Annalisa Pastore
  - Experiment Division, European Synchrotron Radiation Facility, Grenoble 38043, France
- Giancarlo Ruocco
  - Center for Life Nano & Neuro Science, Istituto Italiano di Tecnologia, Rome 00161, Italy
  - Department of Physics, Sapienza University, Rome 00185, Italy
- Michele Monti
  - RNA System Biology Lab, Department of Neuroscience and Brain Technologies, Istituto Italiano di Tecnologia, Genoa 16163, Italy
- Edoardo Milanetti
  - Center for Life Nano & Neuro Science, Istituto Italiano di Tecnologia, Rome 00161, Italy
  - Department of Physics, Sapienza University, Rome 00185, Italy
|
21
|
Deritei D, Inuzuka H, Castaldi PJ, Yun JH, Xu Z, Anamika WJ, Asara JM, Guo F, Zhou X, Glass K, Wei W, Silverman EK. HHIP protein interactions in lung cells provide insight into COPD pathogenesis. bioRxiv: the preprint server for biology 2024:2024.04.01.586839. [PMID: 38617310 PMCID: PMC11014494 DOI: 10.1101/2024.04.01.586839] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/16/2024]
Abstract
Chronic obstructive pulmonary disease (COPD) is the third leading cause of death worldwide. The primary causes of COPD are environmental, including cigarette smoking; however, genetic susceptibility also contributes to COPD risk. Genome-Wide Association Studies (GWASes) have revealed more than 80 genetic loci associated with COPD, leading to the identification of multiple COPD GWAS genes. However, the biological relationships between the identified COPD susceptibility genes are largely unknown. Genes associated with a complex disease are often in close network proximity, i.e. their protein products often interact directly with each other and/or similar proteins. In this study, we use affinity purification mass spectrometry (AP-MS) to identify protein interactions with HHIP, a well-established COPD GWAS gene which is part of the sonic hedgehog pathway, in two disease-relevant lung cell lines (IMR90 and 16HBE). To better understand the network neighborhood of HHIP, its proximity to the protein products of other COPD GWAS genes, and its functional role in COPD pathogenesis, we create HUBRIS, a protein-protein interaction network compiled from 8 publicly available databases. We identified both common and cell type-specific protein-protein interactors of HHIP. We find that our newly identified interactions shorten the network distance between HHIP and the protein products of several COPD GWAS genes, including DSP, MFAP2, TET2, and FBLN5. These new shorter paths include proteins that are encoded by genes involved in extracellular matrix and tissue organization. We found and validated interactions to proteins that provide new insights into COPD pathobiology, including CAVIN1 (IMR90) and TP53 (16HBE). The newly discovered HHIP interactions with CAVIN1 and TP53 implicate HHIP in response to oxidative stress.
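Network distance in a protein-protein interaction network is commonly taken as the shortest-path length between two nodes, so a newly added interaction can shorten it. The sketch below is a minimal illustration using breadth-first search on a toy graph; the function name and every edge and placeholder node (`P1`, `P2`, `P3`, `TARGET`) are invented for the example and are not interactions reported in the study.

```python
from collections import deque


def shortest_path_length(edges, source, target):
    """Shortest-path length between two nodes of an undirected graph,
    found by breadth-first search. Returns None if no path exists."""
    # Build an undirected adjacency map from the edge list.
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    if source not in adjacency or target not in adjacency:
        return None

    distance = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return distance[node]
        for neighbor in adjacency[node]:
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return None


# Toy network with invented edges (hypothetical, for illustration only).
known = [("HHIP", "P1"), ("P1", "P2"), ("P2", "P3"), ("P3", "TARGET")]
before = shortest_path_length(known, "HHIP", "TARGET")

# Adding one hypothetical new interaction creates a shorter path.
after = shortest_path_length(known + [("HHIP", "P3")], "HHIP", "TARGET")
```

Comparing `before` and `after` shows how a single added edge changes the distance between two nodes.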
|
22
|
Ruiz-Serra V, Valentini S, Madroñero S, Valencia A, Porta-Pardo E. 3Dmapper: a command line tool for BioBank-scale mapping of variants to protein structures. Bioinformatics 2024; 40:btae171. [PMID: 38565273 PMCID: PMC11018535 DOI: 10.1093/bioinformatics/btae171] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2023] [Revised: 02/09/2024] [Accepted: 03/30/2024] [Indexed: 04/04/2024] Open
Abstract
MOTIVATION The interpretation of genomic data is crucial to understand the molecular mechanisms of biological processes. Protein structures play a vital role in facilitating this interpretation by providing functional context to genetic coding variants. However, mapping genes to proteins is a tedious and error-prone task due to inconsistencies in data formats. Over the past two decades, numerous tools and databases have been developed to automatically map annotated positions and variants to protein structures. However, most of these tools are web-based and not well-suited for large-scale genomic data analysis. RESULTS To address this issue, we introduce 3Dmapper, a stand-alone command-line tool developed in Python and R. It systematically maps annotated protein positions and variants to protein structures, providing a solution that is both efficient and reliable. AVAILABILITY AND IMPLEMENTATION https://github.com/vicruiser/3Dmapper.
Affiliation(s)
- Victoria Ruiz-Serra
  - Barcelona Supercomputing Center (BSC)
  - Josep Carreras Leukaemia Research Institute (IJC), Badalona 08916, Spain
- Samuel Valentini
  - Department of Cellular, Computational and Integrative Biology (CIBIO), University of Trento, Trento 38123, Italy
- Sergi Madroñero
  - Josep Carreras Leukaemia Research Institute (IJC), Badalona 08916, Spain
- Alfonso Valencia
  - Barcelona Supercomputing Center (BSC)
  - Institució Catalana de Recerca Avançada (ICREA)
- Eduard Porta-Pardo
  - Barcelona Supercomputing Center (BSC)
  - Josep Carreras Leukaemia Research Institute (IJC), Badalona 08916, Spain
|
23
|
Lee CY, Hubrich D, Varga JK, Schäfer C, Welzel M, Schumbera E, Djokic M, Strom JM, Schönfeld J, Geist JL, Polat F, Gibson TJ, Keller Valsecchi CI, Kumar M, Schueler-Furman O, Luck K. Systematic discovery of protein interaction interfaces using AlphaFold and experimental validation. Mol Syst Biol 2024; 20:75-97. [PMID: 38225382 PMCID: PMC10883280 DOI: 10.1038/s44320-023-00005-6] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 08/03/2023] [Revised: 12/04/2023] [Accepted: 12/05/2023] [Indexed: 01/17/2024] Open
Abstract
Structural resolution of protein interactions enables mechanistic and functional studies as well as interpretation of disease variants. However, structural data is still missing for most protein interactions because we lack computational and experimental tools at scale. This is particularly true for interactions mediated by short linear motifs occurring in disordered regions of proteins. We find that AlphaFold-Multimer predicts with high sensitivity but limited specificity structures of domain-motif interactions when using small protein fragments as input. Sensitivity decreased substantially when using long protein fragments or full length proteins. We delineated a protein fragmentation strategy particularly suited for the prediction of domain-motif interfaces and applied it to interactions between human proteins associated with neurodevelopmental disorders. This enabled the prediction of highly confident and likely disease-related novel interfaces, which we further experimentally corroborated for FBXO23-STX1B, STX1B-VAMP2, ESRRG-PSMC5, PEX3-PEX19, PEX3-PEX16, and SNRPB-GIGYF1 providing novel molecular insights for diverse biological processes. Our work highlights exciting perspectives, but also reveals clear limitations and the need for future developments to maximize the power of Alphafold-Multimer for interface predictions.
Affiliation(s)
- Chop Yan Lee
  - Institute of Molecular Biology (IMB) gGmbH, 55128, Mainz, Germany
- Dalmira Hubrich
  - Institute of Molecular Biology (IMB) gGmbH, 55128, Mainz, Germany
- Julia K Varga
  - Department of Microbiology and Molecular Genetics, Institute for Biomedical Research Israel-Canada, Faculty of Medicine, The Hebrew University of Jerusalem, Jerusalem, 9112001, Israel
- Mareen Welzel
  - Institute of Molecular Biology (IMB) gGmbH, 55128, Mainz, Germany
- Eric Schumbera
  - Institute of Molecular Biology (IMB) gGmbH, 55128, Mainz, Germany
  - Computational Biology and Data Mining Group Biozentrum I, 55128, Mainz, Germany
- Milena Djokic
  - Institute of Molecular Biology (IMB) gGmbH, 55128, Mainz, Germany
- Joelle M Strom
  - Institute of Molecular Biology (IMB) gGmbH, 55128, Mainz, Germany
- Jonas Schönfeld
  - Institute of Molecular Biology (IMB) gGmbH, 55128, Mainz, Germany
- Johanna L Geist
  - Institute of Molecular Biology (IMB) gGmbH, 55128, Mainz, Germany
- Feyza Polat
  - Institute of Molecular Biology (IMB) gGmbH, 55128, Mainz, Germany
- Toby J Gibson
  - Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, 69117, Germany
- Manjeet Kumar
  - Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, 69117, Germany
- Ora Schueler-Furman
  - Department of Microbiology and Molecular Genetics, Institute for Biomedical Research Israel-Canada, Faculty of Medicine, The Hebrew University of Jerusalem, Jerusalem, 9112001, Israel
- Katja Luck
  - Institute of Molecular Biology (IMB) gGmbH, 55128, Mainz, Germany
|
24
|
Xiong D, Qiu Y, Zhao J, Zhou Y, Lee D, Gupta S, Torres M, Lu W, Liang S, Kang JJ, Eng C, Loscalzo J, Cheng F, Yu H. Structurally-informed human interactome reveals proteome-wide perturbations by disease mutations. bioRxiv: the preprint server for biology 2024:2023.04.24.538110. [PMID: 37162909 PMCID: PMC10168245 DOI: 10.1101/2023.04.24.538110] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/11/2023]
Abstract
Human genome sequencing studies have identified numerous loci associated with complex diseases. However, translating human genetic and genomic findings to disease pathobiology and therapeutic discovery remains a major challenge at multiscale interactome network levels. Here, we present a deep-learning-based ensemble framework, termed PIONEER (Protein-protein InteractiOn iNtErfacE pRediction), that accurately predicts protein binding partner-specific interfaces for all known protein interactions in humans and seven other common model organisms, generating comprehensive structurally-informed protein interactomes. We demonstrate that PIONEER outperforms existing state-of-the-art methods. We further systematically validated PIONEER predictions experimentally through generating 2,395 mutations and testing their impact on 6,754 mutation-interaction pairs, confirming the high quality and validity of PIONEER predictions. We show that disease-associated mutations are enriched in PIONEER-predicted protein-protein interfaces after mapping mutations from ~60,000 germline exomes and ~36,000 somatic genomes. We identify 586 significant protein-protein interactions (PPIs) enriched with PIONEER-predicted interface somatic mutations (termed oncoPPIs) from pan-cancer analysis of ~11,000 tumor whole-exomes across 33 cancer types. We show that PIONEER-predicted oncoPPIs are significantly associated with patient survival and drug responses from both cancer cell lines and patient-derived xenograft mouse models. We identify a landscape of PPI-perturbing tumor alleles upon ubiquitination by E3 ligases, and we experimentally validate the tumorigenic KEAP1-NRF2 interface mutation p.Thr80Lys in non-small cell lung cancer. 
We show that PIONEER-predicted PPI-perturbing alleles alter protein abundance and correlate with drug responses and patient survival in colon and uterine cancers as demonstrated by proteogenomic data from the National Cancer Institute's Clinical Proteomic Tumor Analysis Consortium. PIONEER, implemented as both a web server platform and a software package, identifies functional consequences of disease-associated alleles and offers a deep learning tool for precision medicine at multiscale interactome network levels.
Affiliation(s)
- Dapeng Xiong
  - Department of Computational Biology, Cornell University, Ithaca, NY 14853, USA
  - Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, USA
  - Center for Innovative Proteomics, Cornell University, Ithaca, NY 14853, USA
- Yunguang Qiu
  - Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA
- Junfei Zhao
  - Department of Systems Biology, Herbert Irving Comprehensive Center, Columbia University, New York, NY 10032, USA
- Yadi Zhou
  - Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA
- Dongjin Lee
  - Department of Computational Biology, Cornell University, Ithaca, NY 14853, USA
  - Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, USA
- Shobhita Gupta
  - Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, USA
  - Center for Innovative Proteomics, Cornell University, Ithaca, NY 14853, USA
  - Biophysics Program, Cornell University, Ithaca, NY 14853, USA
- Mateo Torres
  - Department of Computational Biology, Cornell University, Ithaca, NY 14853, USA
  - Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, USA
  - Center for Innovative Proteomics, Cornell University, Ithaca, NY 14853, USA
- Weiqiang Lu
  - Shanghai Key Laboratory of Regulatory Biology, Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai 200241, China
- Siqi Liang
  - Department of Computational Biology, Cornell University, Ithaca, NY 14853, USA
  - Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, USA
- Jin Joo Kang
  - Department of Computational Biology, Cornell University, Ithaca, NY 14853, USA
  - Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, USA
  - Center for Innovative Proteomics, Cornell University, Ithaca, NY 14853, USA
- Charis Eng
  - Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA
  - Department of Molecular Medicine, Cleveland Clinic Lerner College of Medicine, Case Western Reserve University, Cleveland, OH 44195, USA
  - Case Comprehensive Cancer Center, Case Western Reserve University School of Medicine, Cleveland, OH 44106, USA
- Joseph Loscalzo
  - Channing Division of Network Medicine, Division of Cardiovascular Medicine, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115, USA
- Feixiong Cheng
  - Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA
  - Department of Molecular Medicine, Cleveland Clinic Lerner College of Medicine, Case Western Reserve University, Cleveland, OH 44195, USA
  - Case Comprehensive Cancer Center, Case Western Reserve University School of Medicine, Cleveland, OH 44106, USA
- Haiyuan Yu
  - Department of Computational Biology, Cornell University, Ithaca, NY 14853, USA
  - Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, USA
  - Center for Innovative Proteomics, Cornell University, Ithaca, NY 14853, USA
|
25
|
Sebek M, Menichetti G. Network Science and Machine Learning for Precision Nutrition. Precision Nutrition 2024:367-402. [DOI: 10.1016/b978-0-443-15315-0.00012-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 01/04/2025]
|
26
|
Kotev M, Diaz Gonzalez C. Molecular Dynamics and Other HPC Simulations for Drug Discovery. Methods Mol Biol 2024; 2716:265-291. [PMID: 37702944 DOI: 10.1007/978-1-0716-3449-3_12] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/14/2023]
Abstract
High performance computing (HPC) is taking an increasingly important place in drug discovery. It makes possible the simulation of complex biochemical systems with high precision in a short time, thanks to the use of sophisticated algorithms. It promotes the advancement of knowledge in fields that are inaccessible or difficult to access through experimentation and it contributes to accelerating the discovery of drugs for unmet medical needs while reducing costs. Herein, we report how computational performance has evolved over the past years, and then we detail three domains where HPC is essential. Molecular dynamics (MD) is commonly used to explore the flexibility of proteins, thus generating a better understanding of different possible approaches to modulate their activity. Modeling and simulation of biopolymer complexes enables the study of protein-protein interactions (PPI) in healthy and disease states, thus helping the identification of targets of pharmacological interest. Virtual screening (VS) also benefits from HPC to predict in a short time, among millions or billions of virtual chemical compounds, the best potential ligands that will be tested in relevant assays to start a rational drug design process.
Affiliation(s)
- Martin Kotev
  - Evotec SE, Integrated Drug Discovery, Molecular Architects, Campus Curie, Toulouse, France
|
27
|
Zięba A, Matosiuk D. Sampling and Scoring in Protein-Protein Docking. Methods Mol Biol 2024; 2780:15-26. [PMID: 38987461 DOI: 10.1007/978-1-0716-3985-6_2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/12/2024]
Abstract
Protein-protein docking is considered one of the most important techniques supporting experimental proteomics. Recent developments in the field of computer science helped to improve this computational technique so that it better handles the complexity of protein nature. Sampling algorithms are responsible for the generation of numerous protein-protein ensembles. Unfortunately, a primary docking output comprises a set of both near-native poses and decoys. Application of the efficient scoring function helps to differentiate poses with the most favorable properties from those that are very unlikely to represent a natural state of the complex. This chapter explains the importance of sampling and scoring in the process of protein-protein docking. Moreover, it summarizes advances in the field.
Affiliation(s)
- Agata Zięba
  - Department of Synthesis and Chemical Technology of Pharmaceutical Substances with Computer Modeling Laboratory, Faculty of Pharmacy, Medical University of Lublin, Lublin, Poland
- Dariusz Matosiuk
  - Department of Synthesis and Chemical Technology of Pharmaceutical Substances with Computer Modeling Laboratory, Faculty of Pharmacy, Medical University of Lublin, Lublin, Poland
|
28
|
Liang JZ, Li DH, Xiao YC, Shi FJ, Zhong T, Liao QY, Wang Y, He QY. LAFEM: A Scoring Model to Evaluate Functional Landscape of Lysine Acetylome. Mol Cell Proteomics 2024; 23:100700. [PMID: 38104799 PMCID: PMC10828473 DOI: 10.1016/j.mcpro.2023.100700] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2023] [Revised: 11/18/2023] [Accepted: 12/14/2023] [Indexed: 12/19/2023] Open
Abstract
Protein lysine acetylation is a critical post-translational modification involved in a wide range of biological processes. To date, about 20,000 acetylation sites of Homo sapiens were identified through mass spectrometry-based proteomic technology, but more than 95% of them have unclear functional annotations because of the lack of existing prioritization strategy to assess the functional importance of the acetylation sites on large scale. Hence, we established a lysine acetylation functional evaluating model (LAFEM) by considering eight critical features surrounding lysine acetylation site to high-throughput estimate the functional importance of given acetylation sites. This was achieved by selecting one of the random forest models with the best performance in 10-fold cross-validation on undersampled training dataset. The global analysis demonstrated that the molecular environment of acetylation sites with high acetylation functional scores (AFSs) mainly had the features of larger solvent-accessible surface area, stronger hydrogen bonding-donating abilities, near motif and domain, higher homology, and disordered degree. Importantly, LAFEM performed well in validation dataset and acetylome, showing good accuracy to screen out fitness directly relevant acetylation sites and assisting to explain the core reason for the difference between biological models from the perspective of acetylome. We further used cellular experiments to confirm that, in nuclear casein kinase and cyclin-dependent kinase substrate 1, acetyl-K35 with higher AFS was more important than acetyl-K9 with lower AFS in the proliferation of A549 cells. LAFEM provides a prioritization strategy to large scale discover the fitness directly relevant acetylation sites, which constitutes an unprecedented resource for better understanding of functional acetylome.
Affiliation(s)
- Jun-Ze Liang
  - MOE Key Laboratory of Tumor Molecular Biology and State Key Laboratory of Bioactive Molecules and Druggability Assessment, College of Life Science and Technology, Jinan University, Guangzhou, China
- De-Hua Li
  - MOE Key Laboratory of Tumor Molecular Biology and State Key Laboratory of Bioactive Molecules and Druggability Assessment, College of Life Science and Technology, Jinan University, Guangzhou, China
- Yong-Chun Xiao
  - Department of Orthopedics, The First Affiliated Hospital of Jinan University, Guangzhou, China
- Fu-Jin Shi
  - MOE Key Laboratory of Tumor Molecular Biology and State Key Laboratory of Bioactive Molecules and Druggability Assessment, College of Life Science and Technology, Jinan University, Guangzhou, China
- Tairan Zhong
  - MOE Key Laboratory of Tumor Molecular Biology and State Key Laboratory of Bioactive Molecules and Druggability Assessment, College of Life Science and Technology, Jinan University, Guangzhou, China
- Qian-Ying Liao
  - IMEC-DistriNet Research Group, Department of Computer Science, KU Leuven, Leuven, Belgium
- Yang Wang
  - MOE Key Laboratory of Tumor Molecular Biology and State Key Laboratory of Bioactive Molecules and Druggability Assessment, College of Life Science and Technology, Jinan University, Guangzhou, China
- Qing-Yu He
  - MOE Key Laboratory of Tumor Molecular Biology and State Key Laboratory of Bioactive Molecules and Druggability Assessment, College of Life Science and Technology, Jinan University, Guangzhou, China
|
29
|
Pascual‐Reguant L, Serra‐Camprubí Q, Datta D, Cianferoni D, Kourtis S, Gañez‐Zapater A, Cannatá C, Espinar L, Querol J, García‐López L, Musa‐Afaneh S, Guirola M, Gkanogiannis A, Miró Canturri A, Guzman M, Rodríguez O, Herencia‐Ropero A, Arribas J, Serra V, Serrano L, Tian TV, Peiró S, Sdelci S. Interactions between BRD4S, LOXL2, and MED1 drive cell cycle transcription in triple-negative breast cancer. EMBO Mol Med 2023; 15:e18459. [PMID: 37937685 PMCID: PMC10701626 DOI: 10.15252/emmm.202318459] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 08/03/2023] [Revised: 10/16/2023] [Accepted: 10/17/2023] [Indexed: 11/09/2023] Open
Abstract
Triple-negative breast cancer (TNBC) often develops resistance to single-agent treatment, which can be circumvented using targeted combinatorial approaches. Here, we demonstrate that the simultaneous inhibition of LOXL2 and BRD4 synergistically limits TNBC proliferation in vitro and in vivo. Mechanistically, LOXL2 interacts in the nucleus with the short isoform of BRD4 (BRD4S), MED1, and the cell cycle transcriptional regulator B-MyB. These interactions sustain the formation of BRD4 and MED1 nuclear transcriptional foci and control cell cycle progression at the gene expression level. The pharmacological co-inhibition of LOXL2 and BRD4 reduces BRD4 nuclear foci, BRD4-MED1 colocalization, and the transcription of cell cycle genes, thus suppressing TNBC cell proliferation. Targeting the interaction between BRD4S and LOXL2 could be a starting point for the development of new anticancer strategies for the treatment of TNBC.
Affiliation(s)
- Laura Pascual‐Reguant
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
| | | | - Debayan Datta
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
| | - Damiano Cianferoni
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
| | - Savvas Kourtis
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
| | - Antoni Gañez‐Zapater
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
| | - Chiara Cannatá
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
| | - Lorena Espinar
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
| | - Jessica Querol
- Vall d'Hebron Institute of Oncology (VHIO), Barcelona, Spain
| | - Laura García‐López
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
| | - Sara Musa‐Afaneh
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
| | - Maria Guirola
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
| | - Anestis Gkanogiannis
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
| | - Andrea Miró Canturri
- Vall d'Hebron Institute of Oncology (VHIO), Barcelona, Spain
- IMIM (Hospital del Mar Medical Research Institute), Barcelona, Spain
| | - Marta Guzman
- Vall d'Hebron Institute of Oncology (VHIO), Barcelona, Spain
| | - Olga Rodríguez
- Vall d'Hebron Institute of Oncology (VHIO), Barcelona, Spain
| | | | - Joaquin Arribas
- Vall d'Hebron Institute of Oncology (VHIO), Barcelona, Spain
- IMIM (Hospital del Mar Medical Research Institute), Barcelona, Spain
- Centro de Investigación Biomédica en Red de Cáncer, Monforte de Lemos, Madrid, Spain
- Department of Biochemistry and Molecular Biology, Universitat Autónoma de Barcelona, Bellaterra, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
| | - Violeta Serra
- Vall d'Hebron Institute of Oncology (VHIO), Barcelona, Spain
| | - Luis Serrano
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
| | - Tian V Tian
- Vall d'Hebron Institute of Oncology (VHIO), Barcelona, Spain
| | - Sandra Peiró
- Vall d'Hebron Institute of Oncology (VHIO), Barcelona, Spain
| | - Sara Sdelci
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
|
30
|
Wang Y, Zhou B, Ru J, Meng X, Wang Y, Liu W. Advances in computational methods for identifying cancer driver genes. Math Biosci Eng 2023; 20:21643-21669. [PMID: 38124614 DOI: 10.3934/mbe.2023958] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2023]
Abstract
Cancer driver genes (CDGs) are crucial in cancer prevention, diagnosis and treatment. This study employed computational methods for identifying CDGs, categorizing them into four groups. The major frameworks for each of these four categories were summarized. Additionally, we systematically gathered data from public databases and biological networks, and we elaborated on computational methods for identifying CDGs using the aforementioned databases. Further, we summarized the algorithms, mainly involving statistics and machine learning, used for identifying CDGs. Notably, the performances of nine typical identification methods for eight types of cancer were compared to analyze the applicability areas of these methods. Finally, we discussed the challenges and prospects associated with methods for identifying CDGs. The present study revealed that the network-based algorithms and machine learning-based methods demonstrated superior performance.
Affiliation(s)
- Ying Wang
- School of Computer Science and Engineering, Changshu Institute of Technology, Changshu 215500, China
| | - Bohao Zhou
- School of Computer Science and Engineering, Changshu Institute of Technology, Changshu 215500, China
| | - Jidong Ru
- School of Textile Garment and Design, Changshu Institute of Technology, Changshu 215500, China
| | - Xianglian Meng
- School of Computer Information and Engineering, Changzhou Institute of Technology, Changzhou 213032, China
| | - Yundong Wang
- School of Computer Science and Engineering, Changshu Institute of Technology, Changshu 215500, China
| | - Wenjie Liu
- School of Computer Information and Engineering, Changzhou Institute of Technology, Changzhou 213032, China
|
31
|
Kosoglu K, Aydin Z, Tuncbag N, Gursoy A, Keskin O. Structural coverage of the human interactome. Brief Bioinform 2023; 25:bbad496. [PMID: 38180828 PMCID: PMC10768791 DOI: 10.1093/bib/bbad496] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/04/2023] [Revised: 11/16/2023] [Accepted: 11/30/2023] [Indexed: 01/07/2024] Open
Abstract
Complex biological processes in cells are embedded in the interactome, representing the complete set of protein-protein interactions. Mapping and analyzing the protein structures are essential to fully comprehending these processes' molecular details. Therefore, knowing the structural coverage of the interactome is important to show the current limitations. Structural modeling of protein-protein interactions requires accurate protein structures. In this study, we mapped all experimental structures to the reference human proteome. Later, we found the enrichment in structural coverage when complementary methods such as homology modeling and deep learning (AlphaFold) were included. We then collected the interactions from the literature and databases to form the reference human interactome, resulting in 117 897 non-redundant interactions. When we analyzed the structural coverage of the interactome, we found that the number of experimentally determined protein complex structures is scarce, corresponding to 3.95% of all binary interactions. We also analyzed known and modeled structures to potentially construct the structural interactome with a docking method. Our analysis showed that 12.97% of the interactions from HuRI and 73.62% and 32.94% from the filtered versions of STRING and HIPPIE could potentially be modeled with high structural coverage or accuracy, respectively. Overall, this paper provides an overview of the current state of structural coverage of the human proteome and interactome.
Affiliation(s)
- Kayra Kosoglu
- Computational Sciences and Engineering, College of Engineering, Koc University, 34450 Istanbul, Turkey
| | - Zeynep Aydin
- Computational Sciences and Engineering, College of Engineering, Koc University, 34450 Istanbul, Turkey
| | - Nurcan Tuncbag
- School of Medicine, Koc University, 34450 Istanbul, Turkey
- Department of Chemical and Biological Engineering, College of Engineering, Koc University, 34450 Istanbul, Turkey
| | - Attila Gursoy
- Department of Computer Engineering, College of Engineering, Koc University, 34450 Istanbul, Turkey
| | - Ozlem Keskin
- Department of Chemical and Biological Engineering, College of Engineering, Koc University, 34450 Istanbul, Turkey
|
32
|
Liu Y, Yang J, Wang T, Luo M, Chen Y, Chen C, Ronai Z, Zhou Y, Ruppin E, Han L. Expanding PROTACtable genome universe of E3 ligases. Nat Commun 2023; 14:6509. [PMID: 37845222 PMCID: PMC10579327 DOI: 10.1038/s41467-023-42233-2] [Citation(s) in RCA: 55] [Impact Index Per Article: 27.5] [Received: 03/21/2023] [Accepted: 09/28/2023] [Indexed: 10/18/2023] Open
Abstract
Proteolysis-targeting chimera (PROTAC) and other targeted protein degradation (TPD) molecules that induce degradation by the ubiquitin-proteasome system (UPS) offer new opportunities to engage targets that remain challenging to be inhibited by conventional small molecules. One fundamental element in the degradation process is the E3 ligase. However, less than 2% amongst hundreds of E3 ligases in the human genome have been engaged in current studies in the TPD field, calling for the recruiting of additional ones to further enhance the therapeutic potential of TPD. To accelerate the development of PROTACs utilizing under-explored E3 ligases, we systematically characterize E3 ligases from seven different aspects, including chemical ligandability, expression patterns, protein-protein interactions (PPI), structure availability, functional essentiality, cellular location, and PPI interface by analyzing 30 large-scale data sets. Our analysis uncovers several E3 ligases as promising extant PROTACs. In total, combining confidence score, ligandability, expression pattern, and PPI, we identified 76 E3 ligases as PROTAC-interacting candidates. We develop a user-friendly and flexible web portal ( https://hanlaboratory.com/E3Atlas/ ) aimed at assisting researchers to rapidly identify E3 ligases with promising TPD activities against specifically desired targets, facilitating the development of these therapies in cancer and beyond.
Affiliation(s)
- Yuan Liu
- Department of Biostatistics and Health Data Science, School of Medicine, Indiana University, Indianapolis, IN, USA
- Brown Center for Immunotherapy, School of Medicine, Indiana University, Indianapolis, IN, USA
- Center for Epigenetics and Disease Prevention, Institute of Biosciences and Technology, Texas A&M University, Houston, TX, USA
| | - Jingwen Yang
- Department of Biostatistics and Health Data Science, School of Medicine, Indiana University, Indianapolis, IN, USA
- Brown Center for Immunotherapy, School of Medicine, Indiana University, Indianapolis, IN, USA
- Center for Epigenetics and Disease Prevention, Institute of Biosciences and Technology, Texas A&M University, Houston, TX, USA
| | - Tianlu Wang
- Center for Translational Cancer Research, Institute of Biosciences and Technology, Texas A&M University, Houston, TX, USA
| | - Mei Luo
- Department of Biostatistics and Health Data Science, School of Medicine, Indiana University, Indianapolis, IN, USA
- Brown Center for Immunotherapy, School of Medicine, Indiana University, Indianapolis, IN, USA
| | - Yamei Chen
- Department of Biostatistics and Health Data Science, School of Medicine, Indiana University, Indianapolis, IN, USA
- Brown Center for Immunotherapy, School of Medicine, Indiana University, Indianapolis, IN, USA
- Center for Epigenetics and Disease Prevention, Institute of Biosciences and Technology, Texas A&M University, Houston, TX, USA
| | - Chengxuan Chen
- Department of Biostatistics and Health Data Science, School of Medicine, Indiana University, Indianapolis, IN, USA
- Brown Center for Immunotherapy, School of Medicine, Indiana University, Indianapolis, IN, USA
- Center for Epigenetics and Disease Prevention, Institute of Biosciences and Technology, Texas A&M University, Houston, TX, USA
| | - Ze'ev Ronai
- Cancer Center, Sanford Burnham Prebys Medical Discovery Institute, La Jolla, CA, 92037, USA
| | - Yubin Zhou
- Center for Translational Cancer Research, Institute of Biosciences and Technology, Texas A&M University, Houston, TX, USA
- Department of Translational Medical Sciences, College of Medicine, Texas A&M University, Houston, TX, USA
| | - Eytan Ruppin
- Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute (NCI), National Institutes of Health (NIH), Bethesda, 20892, MD, USA.
| | - Leng Han
- Department of Biostatistics and Health Data Science, School of Medicine, Indiana University, Indianapolis, IN, USA.
- Brown Center for Immunotherapy, School of Medicine, Indiana University, Indianapolis, IN, USA.
- Center for Epigenetics and Disease Prevention, Institute of Biosciences and Technology, Texas A&M University, Houston, TX, USA.
- Department of Translational Medical Sciences, College of Medicine, Texas A&M University, Houston, TX, USA.
|
33
|
Bradley D, Hogrebe A, Dandage R, Dubé AK, Leutert M, Dionne U, Chang A, Villén J, Landry CR. The fitness cost of spurious phosphorylation. bioRxiv 2023:2023.10.08.561337. [PMID: 37873463 PMCID: PMC10592693 DOI: 10.1101/2023.10.08.561337] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2023]
Abstract
The fidelity of signal transduction requires the binding of regulatory molecules to their cognate targets. However, the crowded cell interior risks off-target interactions between proteins that are functionally unrelated. How such off-target interactions impact fitness is not generally known, but quantifying this is required to understand the constraints faced by cell systems as they evolve. Here, we use the model organism S. cerevisiae to inducibly express tyrosine kinases. Because yeast lacks bona fide tyrosine kinases, most of the resulting tyrosine phosphorylation is spurious. This provides a suitable system to measure the impact of artificial protein interactions on fitness. We engineered 44 yeast strains each expressing a tyrosine kinase, and quantitatively analysed their phosphoproteomes. This analysis resulted in ~30,000 phosphosites mapping to ~3,500 proteins. Examination of the fitness costs in each strain revealed a strong correlation between the number of spurious pY sites and decreased growth. Moreover, the analysis of pY effects on protein structure and on protein function revealed over 1000 pY events that we predict to be deleterious. However, we also find that a large number of the spurious pY sites have a negligible effect on fitness, possibly because of their low stoichiometry. This result is consistent with our evolutionary analyses demonstrating a lack of phosphotyrosine counter-selection in species with bona fide tyrosine kinases. Taken together, our results suggest that, alongside the risk for toxicity, the cell can tolerate a large degree of non-functional crosstalk as interaction networks evolve.
Affiliation(s)
- David Bradley
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, QC, Canada
- Department of Biochemistry, Microbiology and Bioinformatics, Université Laval, Québec, QC, Canada
- Quebec Network for Research on Protein Function, Engineering, and Applications (PROTEO), Université du Québec à Montréal, Montréal, QC, Canada
- Université Laval Big Data Research Center (BDRC_UL), Québec, QC, Canada
- Department of Biology, Université Laval, Québec, QC, Canada
| | - Alexander Hogrebe
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Rohan Dandage
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, QC, Canada
- Department of Biochemistry, Microbiology and Bioinformatics, Université Laval, Québec, QC, Canada
- Quebec Network for Research on Protein Function, Engineering, and Applications (PROTEO), Université du Québec à Montréal, Montréal, QC, Canada
- Université Laval Big Data Research Center (BDRC_UL), Québec, QC, Canada
- Department of Biology, Université Laval, Québec, QC, Canada
| | - Alexandre K Dubé
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, QC, Canada
- Department of Biochemistry, Microbiology and Bioinformatics, Université Laval, Québec, QC, Canada
- Quebec Network for Research on Protein Function, Engineering, and Applications (PROTEO), Université du Québec à Montréal, Montréal, QC, Canada
- Université Laval Big Data Research Center (BDRC_UL), Québec, QC, Canada
- Department of Biology, Université Laval, Québec, QC, Canada
| | - Mario Leutert
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Institute of Molecular Systems Biology, ETH Zürich, Zürich, Switzerland
| | - Ugo Dionne
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, QC, Canada
- Department of Biochemistry, Microbiology and Bioinformatics, Université Laval, Québec, QC, Canada
- Quebec Network for Research on Protein Function, Engineering, and Applications (PROTEO), Université du Québec à Montréal, Montréal, QC, Canada
- Université Laval Big Data Research Center (BDRC_UL), Québec, QC, Canada
- Department of Biology, Université Laval, Québec, QC, Canada
| | - Alexis Chang
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Judit Villén
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Christian R Landry
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, QC, Canada
- Department of Biochemistry, Microbiology and Bioinformatics, Université Laval, Québec, QC, Canada
- Quebec Network for Research on Protein Function, Engineering, and Applications (PROTEO), Université du Québec à Montréal, Montréal, QC, Canada
- Université Laval Big Data Research Center (BDRC_UL), Québec, QC, Canada
- Department of Biology, Université Laval, Québec, QC, Canada
|
34
|
Buljan M, Banaei-Esfahani A, Blattmann P, Meier-Abt F, Shao W, Vitek O, Tang H, Aebersold R. A computational framework for the inference of protein complex remodeling from whole-proteome measurements. Nat Methods 2023; 20:1523-1529. [PMID: 37749212 PMCID: PMC10555833 DOI: 10.1038/s41592-023-02011-w] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 12/30/2020] [Accepted: 08/16/2023] [Indexed: 09/27/2023]
Abstract
Protein complexes are responsible for the enactment of most cellular functions. For the protein complex to form and function, its subunits often need to be present at defined quantitative ratios. Typically, global changes in protein complex composition are assessed with experimental approaches that tend to be time consuming. Here, we have developed a computational algorithm for the detection of altered protein complexes based on the systematic assessment of subunit ratios from quantitative proteomic measurements. We applied it to measurements from breast cancer cell lines and patient biopsies and were able to identify strong remodeling of HDAC2 epigenetic complexes in more aggressive forms of cancer. The presented algorithm is available as an R package and enables the inference of changes in protein complex states by extracting functionally relevant information from bottom-up proteomic datasets.
Affiliation(s)
- Marija Buljan
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland.
- EMPA, Swiss Federal Laboratories for Materials Science and Technology, St Gallen, Switzerland.
- Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland.
| | - Amir Banaei-Esfahani
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland
- Department of Pathology and Molecular Pathology, University Hospital Zurich, Zurich, Switzerland
| | - Peter Blattmann
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland
- Idorsia Pharmaceuticals, Allschwil, Switzerland
| | - Fabienne Meier-Abt
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland
- Department of Medical Oncology and Hematology, University and University Hospital Zurich, Zurich, Switzerland
- Institute of Medical Genetics, University of Zurich, Zurich, Switzerland
| | - Wenguang Shao
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland
- State Key Laboratory of Microbial Metabolism, School of Life Science & Biotechnology, and Joint International Research Laboratory of Metabolic & Developmental Sciences, Shanghai Jiao Tong University, Shanghai 200240, China
| | - Olga Vitek
- Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA
| | - Hua Tang
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
| | - Ruedi Aebersold
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland.
- Faculty of Science, University of Zurich, Zurich, Switzerland.
|
35
|
Zhang L, Wang S, Hou J, Si D, Zhu J, Cao R. ComplexQA: a deep graph learning approach for protein complex structure assessment. Brief Bioinform 2023; 24:bbad287. [PMID: 37930021 DOI: 10.1093/bib/bbad287] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/24/2023] [Revised: 05/09/2023] [Accepted: 07/24/2023] [Indexed: 11/07/2023] Open
Abstract
MOTIVATION: In recent years, the end-to-end deep learning method for single-chain protein structure prediction has achieved high accuracy. For example, the state-of-the-art method AlphaFold, developed by Google, has largely increased the accuracy of protein structure predictions to near experimental accuracy in some of the cases. At the same time, there are few methods that can evaluate the quality of protein complexes at the residue level. In particular, evaluating the quality of residues at the interface of protein complexes can lead to a wide range of applications, such as protein function analysis and drug design. In this paper, we introduce a new deep graph neural network-based method, ComplexQA, to evaluate the local quality of interfaces for protein complexes by utilizing the residue-level structural information in 3D space and the sequence-level constraints. RESULTS: We benchmark our method against other state-of-the-art quality assessment approaches on the HAF2 and DBM55-AF2 datasets (high-quality structural models predicted by AlphaFold-Multimer), and the BM5 docking dataset. The experimental results show that our proposed method achieves better or similar performance compared with other state-of-the-art methods, especially on difficult targets which only contain a few acceptable models. Our method is able to suggest a score for each interface residue, which demonstrates a powerful assessment tool for the ever-increasing number of protein complexes. AVAILABILITY: https://github.com/Cao-Labs/ComplexQA.git. Contact: caora@plu.edu.
Affiliation(s)
- Lei Zhang
- Department of Computer Science and Technology, AnHui University, Hefei, 230601, Anhui, China
| | - Sheng Wang
- Department of Computer Science and Technology, AnHui University, Hefei, 230601, Anhui, China
| | - Jie Hou
- Department of Computer Science, Saint Louis University, Saint Louis, 63103, MO, USA
| | - Dong Si
- Division of Computing and Software Systems, University of Washington Bothell, Bothell, 98011, WA, USA
| | - Junyong Zhu
- Department of Computer Science and Technology, AnHui University, Hefei, 230601, Anhui, China
| | - Renzhi Cao
- Department of Humanities, Pacific Lutheran University, Tacoma, 98447, WA, USA
|
36
|
Zsidó BZ, Bayarsaikhan B, Börzsei R, Hetényi C. Construction of Histone-Protein Complex Structures by Peptide Growing. Int J Mol Sci 2023; 24:13831. [PMID: 37762134 PMCID: PMC10530865 DOI: 10.3390/ijms241813831] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/21/2023] [Revised: 09/04/2023] [Accepted: 09/05/2023] [Indexed: 09/29/2023] Open
Abstract
The structures of histone complexes are master keys to epigenetics. Linear histone peptide tails often bind to shallow pockets of reader proteins via weak interactions, rendering their structure determination challenging. In the present study, a new protocol, PepGrow, is introduced. PepGrow uses docked histone fragments as seeds and grows the full peptide tails in the reader-binding pocket, producing atomic-resolution structures of histone-reader complexes. PepGrow is able to handle the flexibility of histone peptides, and it is demonstrated to be more efficient than linking pre-docked peptide fragments. The new protocol combines the advantages of popular program packages and allows fast generation of solution structures. AutoDock, a force-field-based program, is used to supply the docked peptide fragments used as structural seeds, and the building algorithm of Modeller is adopted and tested as a peptide growing engine. The performance of PepGrow is compared to ten other docking methods, and it is concluded that in situ growing of a ligand from a seed is a viable strategy for the production of complex structures of histone peptides at atomic resolution.
Affiliation(s)
| | | | | | - Csaba Hetényi
- Pharmacoinformatics Unit, Department of Pharmacology and Pharmacotherapy, Medical School, University of Pécs, Szigeti Út 12, 7624 Pécs, Hungary; (B.Z.Z.); (B.B.); (R.B.)
|
37
|
Mohseni Behbahani Y, Saighi P, Corsi F, Laine E, Carbone A. LEVELNET to visualize, explore, and compare protein-protein interaction networks. Proteomics 2023; 23:e2200159. [PMID: 37403279 DOI: 10.1002/pmic.202200159] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 07/16/2022] [Revised: 04/27/2023] [Accepted: 04/28/2023] [Indexed: 07/06/2023]
Abstract
Physical interactions between proteins are central to all biological processes. Yet, the current knowledge of who interacts with whom in the cell and in what manner relies on partial, noisy, and highly heterogeneous data. Thus, there is a need for methods comprehensively describing and organizing such data. LEVELNET is a versatile and interactive tool for visualizing, exploring, and comparing protein-protein interaction (PPI) networks inferred from different types of evidence. LEVELNET helps to break down the complexity of PPI networks by representing them as multi-layered graphs and by facilitating the direct comparison of their subnetworks toward biological interpretation. It focuses primarily on the protein chains whose 3D structures are available in the Protein Data Bank. We showcase some potential applications, such as investigating the structural evidence supporting PPIs associated to specific biological processes, assessing the co-localization of interaction partners, comparing the PPI networks obtained through computational experiments versus homology transfer, and creating PPI benchmarks with desired properties.
Affiliation(s)
- Yasser Mohseni Behbahani
- Sorbonne Université, CNRS, IBPS, Laboratory of Computational and Quantitative Biology (LCQB), UMR 7238, Paris, France
| | - Paul Saighi
- Sorbonne Université, CNRS, IBPS, Laboratory of Computational and Quantitative Biology (LCQB), UMR 7238, Paris, France
| | - Flavia Corsi
- Sorbonne Université, CNRS, IBPS, Laboratory of Computational and Quantitative Biology (LCQB), UMR 7238, Paris, France
| | - Elodie Laine
- Sorbonne Université, CNRS, IBPS, Laboratory of Computational and Quantitative Biology (LCQB), UMR 7238, Paris, France
| | - Alessandra Carbone
- Sorbonne Université, CNRS, IBPS, Laboratory of Computational and Quantitative Biology (LCQB), UMR 7238, Paris, France
|
38
|
Petrey D, Zhao H, Trudeau SJ, Murray D, Honig B. PrePPI: A Structure Informed Proteome-wide Database of Protein-Protein Interactions. J Mol Biol 2023; 435:168052. [PMID: 36933822 PMCID: PMC10293085 DOI: 10.1016/j.jmb.2023.168052] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 01/06/2023] [Revised: 03/09/2023] [Accepted: 03/10/2023] [Indexed: 03/18/2023]
Abstract
We present an updated version of the Predicting Protein-Protein Interactions (PrePPI) webserver which predicts PPIs on a proteome-wide scale. PrePPI combines structural and non-structural evidence within a Bayesian framework to compute a likelihood ratio (LR) for essentially every possible pair of proteins in a proteome; the current database is for the human interactome. The structural modeling (SM) component is derived from template-based modeling and its application on a proteome-wide scale is enabled by a unique scoring function used to evaluate a putative complex. The updated version of PrePPI leverages AlphaFold structures that are parsed into individual domains. As has been demonstrated in earlier applications, PrePPI performs extremely well as measured by receiver operating characteristic curves derived from testing on E. coli and human protein-protein interaction (PPI) databases. A PrePPI database of ∼1.3 million human PPIs can be queried with a webserver application that comprises multiple functionalities for examining query proteins, template complexes, 3D models for predicted complexes, and related features (https://honiglab.c2b2.columbia.edu/PrePPI). PrePPI is a state-of-the-art resource that offers an unprecedented structure-informed view of the human interactome.
Affiliation(s)
- Donald Petrey
- Department of Systems Biology, Columbia University Irving Medical Center, New York, NY 10032, USA
| | - Haiqing Zhao
- Department of Systems Biology, Columbia University Irving Medical Center, New York, NY 10032, USA
| | - Stephen J Trudeau
- Department of Systems Biology, Columbia University Irving Medical Center, New York, NY 10032, USA
| | - Diana Murray
- Department of Systems Biology, Columbia University Irving Medical Center, New York, NY 10032, USA
| | - Barry Honig
- Department of Systems Biology, Columbia University Irving Medical Center, New York, NY 10032, USA; Department of Biochemistry and Molecular Biophysics, Columbia University Irving Medical Center, New York, NY 10032, USA; Department of Medicine, Columbia University, New York, NY 10032, USA; Zuckerman Mind Brain and Behavior Institute, Columbia University, New York, NY 10027, USA.
|
39
|
Pennica C, Hanna G, Islam SA, Sternberg MJE, David A. Missense3D-PPI: A Web Resource to Predict the Impact of Missense Variants at Protein Interfaces Using 3D Structural Data. J Mol Biol 2023; 435:168060. [PMID: 37356905 PMCID: PMC7617523 DOI: 10.1016/j.jmb.2023.168060] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2023] [Revised: 03/19/2023] [Accepted: 03/21/2023] [Indexed: 03/30/2023]
Abstract
In 2019, we released Missense3D, which identifies stereochemical features that are disrupted by a missense variant, such as the introduction of a buried charge. Missense3D analyses the effect of a missense variant on a single structure and thus may fail to identify as damaging those surface variants that disrupt a protein interface, i.e., a protein-protein interaction (PPI) site. Here we present Missense3D-PPI, designed to predict the impact of missense variants at PPI interfaces. Our development dataset comprised 1,279 missense variants (pathogenic n = 733, benign n = 546) in 434 proteins and 545 experimental structures of PPI complexes. Missense3D-PPI was benchmarked after dividing the dataset into training (320 benign and 320 pathogenic variants) and testing (226 benign and 413 pathogenic variants) sets. Structural features affecting PPI, such as disruption of interchain bonds and introduction of unbalanced charged interface residues, were analysed to assess the impact of the variant at the PPI. The performance of Missense3D-PPI was superior to that of Missense3D: sensitivity 44% versus 8% and accuracy 58% versus 40%, p = 4.23 × 10⁻¹⁶. However, the specificity of Missense3D-PPI was lower than that of Missense3D (84% versus 98%). On our dataset, Missense3D-PPI's accuracy was superior to that of BeAtMuSiC (p = 3.4 × 10⁻⁵), mCSM-PPI2 (p = 1.5 × 10⁻¹²) and MutaBind2 (p = 0.0025). Missense3D-PPI represents a valuable tool for predicting the structural effect of missense variants on biological protein networks and is available at the Missense3D web portal (http://missense3d.bc.ic.ac.uk).
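The reported metrics are related by simple arithmetic, which the following sketch reproduces. It assumes that the sensitivity and specificity quoted in the abstract were measured on the test set (413 pathogenic and 226 benign variants); because the abstract gives rounded percentages, the recomputed accuracies are approximate.

```python
def accuracy(sensitivity, specificity, n_pathogenic, n_benign):
    """Accuracy implied by sensitivity and specificity for a given class balance."""
    true_positives = sensitivity * n_pathogenic
    true_negatives = specificity * n_benign
    return (true_positives + true_negatives) / (n_pathogenic + n_benign)


# Test-set sizes as stated in the abstract.
N_PATHOGENIC, N_BENIGN = 413, 226

# Rounded sensitivity and specificity as stated in the abstract.
acc_ppi = accuracy(0.44, 0.84, N_PATHOGENIC, N_BENIGN)  # Missense3D-PPI
acc_m3d = accuracy(0.08, 0.98, N_PATHOGENIC, N_BENIGN)  # Missense3D

print(f"Missense3D-PPI: {acc_ppi:.1%}, Missense3D: {acc_m3d:.1%}")
```

Both values agree with the reported 58% and 40% after rounding. The calculation also shows why the lower specificity of Missense3D-PPI does not hurt its overall accuracy here: pathogenic variants outnumber benign ones in the test set, so the gain in sensitivity dominates.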
Affiliation(s)
- Cecilia Pennica
- Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London SW7 2AZ, UK
| | - Gordon Hanna
- Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London SW7 2AZ, UK
| | - Suhail A Islam
- Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London SW7 2AZ, UK
| | - Michael J E Sternberg
- Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London SW7 2AZ, UK
| | - Alessia David
- Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London SW7 2AZ, UK.
| |
|
40
|
Zheng J, Zheng Z, Fu C, Weng Y, He A, Ye X, Gao W, Tian R. Deciphering intercellular signaling complexes by interaction-guided chemical proteomics. Nat Commun 2023; 14:4138. [PMID: 37438365 DOI: 10.1038/s41467-023-39881-9] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 12/02/2022] [Accepted: 06/27/2023] [Indexed: 07/14/2023] Open
Abstract
Indirect cell-cell interactions mediated by secreted proteins and their plasma membrane receptors play essential roles in regulating intercellular signaling. However, systematic profiling of the interactions between living cell surface receptors and the secretome from neighboring cells remains challenging. Here we develop a chemical proteomics approach, termed interaction-guided crosslinking (IGC), to identify ligand-receptor interactions in situ. By introducing glycan-based ligation and click chemistry, the IGC approach via glycan-to-glycan crosslinking successfully captures receptors from as few as 0.1 million living cells using only 10 ng of secreted ligand. The unparalleled sensitivity and selectivity allow systematic crosslinking and identification of ligand-receptor complexes formed between cell secretome and surfaceome in an unbiased and all-to-all manner, leading to the discovery of a ligand-receptor interaction between pancreatic cancer cell-secreted urokinase (PLAU) and neuropilin 1 (NRP1) on pancreatic cancer-associated fibroblasts. This approach is thus useful for systematically exploring new ligand-receptor pairs and discovering critical intercellular signaling events.
Affiliation(s)
- Jiangnan Zheng
- Department of Chemistry, School of Science, Southern University of Science and Technology, Shenzhen, 518055, China.
| | - Zhendong Zheng
- Department of Chemistry, School of Science, Southern University of Science and Technology, Shenzhen, 518055, China
- School of Environment, Harbin Institute of Technology, Harbin, 150090, China
| | - Changying Fu
- Department of Chemistry, School of Science, Southern University of Science and Technology, Shenzhen, 518055, China
| | - Yicheng Weng
- Department of Chemistry, School of Science, Southern University of Science and Technology, Shenzhen, 518055, China
| | - An He
- Department of Chemistry, School of Science, Southern University of Science and Technology, Shenzhen, 518055, China
| | - Xueting Ye
- Department of Chemistry, School of Science, Southern University of Science and Technology, Shenzhen, 518055, China
| | - Weina Gao
- Department of Chemistry, School of Science, Southern University of Science and Technology, Shenzhen, 518055, China
| | - Ruijun Tian
- Department of Chemistry, School of Science, Southern University of Science and Technology, Shenzhen, 518055, China.
- Research Center for Chemical Biology and Omics Analysis, School of Science, Southern University of Science and Technology, 1088 Xueyuan Road, Shenzhen, 518055, China.
| |
|
41
|
Mo J, Li Z, Chen H, Lu Z, Ding B, Yuan X, Liu Y, Zhu W. Network medicine framework identified drug-repurposing opportunities of pharmaco-active compounds of Angelica acutiloba (Siebold & Zucc.) Kitag. for skin aging. Aging (Albany NY) 2023; 15:5144-5163. [PMID: 37310405 PMCID: PMC10292898 DOI: 10.18632/aging.204789] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2023] [Accepted: 05/15/2023] [Indexed: 06/14/2023]
Abstract
The increasing incidence of skin aging has highlighted the importance of identifying effective drugs that can be repurposed for skin aging. We aimed to identify pharmaco-active compounds with drug-repurposing opportunities for skin aging from Angelica acutiloba (Siebold & Zucc.) Kitag. (AAK). Proximity analysis within the network medicine framework (NMF) first identified 8 key AAK compounds with repurposing opportunities for skin aging, which may act by regulating 29 differentially expressed genes (DEGs) of skin aging, including 13 up-regulated targets and 16 down-regulated targets. Connectivity MAP (cMAP) analysis revealed that the 8 key compounds were involved in regulating cell proliferation and apoptosis, mitochondrial energy metabolism and oxidative stress in skin aging. Molecular docking analysis showed that the 8 key compounds docked well with AR, BCHE, HPGD and PI3, which were identified as specific biomarkers for the diagnosis of skin aging. Finally, the mechanisms of these key compounds were predicted to involve inhibiting the autophagy pathway and activating the Phospholipase D signaling pathway. In conclusion, this study is the first to elucidate the drug-repurposing opportunities of AAK compounds for skin aging, providing a theoretical reference for identifying repurposable drugs from Chinese medicine and new insights for our future research.
Affiliation(s)
- Jiaxin Mo
- The Second Clinical College, Guangzhou University of Chinese Medicine, Guangzhou Province 510006, China
| | - Zunjiang Li
- The Second Clinical College, Guangzhou University of Chinese Medicine, Guangzhou Province 510006, China
| | - Hankun Chen
- Guangzhou Qinglan Biotechnology Co. Ltd., Guangzhou Province 515000, China
| | - Zhongyu Lu
- The Second Clinical College, Guangzhou University of Chinese Medicine, Guangzhou Province 510006, China
| | - Banghan Ding
- The Second Clinical College, Guangzhou University of Chinese Medicine, Guangzhou Province 510006, China
- Guangdong Provincial Hospital of Traditional Chinese Medicine, Guangzhou Province 510120, China
| | - Xiaohong Yuan
- Guangdong Provincial Hospital of Traditional Chinese Medicine, Guangzhou Province 510120, China
| | - Yuan Liu
- Guangzhou Huamiao Biotechnology Research Institute Co. Ltd., Guangzhou Province 510000, China
| | - Wei Zhu
- Guangdong Provincial Hospital of Traditional Chinese Medicine, Guangzhou Province 510120, China
| |
|
42
|
Sousa A, Rocha S, Vieira J, Reboiro-Jato M, López-Fernández H, Vieira CP. On the identification of potential novel therapeutic targets for spinocerebellar ataxia type 1 (SCA1) neurodegenerative disease using EvoPPI3. J Integr Bioinform 2023; 20:jib-2022-0056. [PMID: 36848492 PMCID: PMC10561075 DOI: 10.1515/jib-2022-0056] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2022] [Accepted: 11/26/2022] [Indexed: 03/01/2023] Open
Abstract
EvoPPI (http://evoppi.i3s.up.pt), a meta-database for protein-protein interactions (PPI), has been upgraded (EvoPPI3) to accept new types of data, namely, PPI from patients, cell lines, and animal models, as well as data from gene modifier experiments, for nine neurodegenerative polyglutamine (polyQ) diseases caused by an abnormal expansion of the polyQ tract. The integration of the different types of data allows users to easily compare them, as here shown for Ataxin-1, the polyQ protein involved in spinocerebellar ataxia type 1 (SCA1) disease. Using all available datasets and the data here obtained for Drosophila melanogaster wt and exp Ataxin-1 mutants (also available at EvoPPI3), we show that, in humans, the Ataxin-1 network is much larger than previously thought (380 interactors), with at least 909 interactors. The functional profiling of the newly identified interactors is similar to the ones already reported in the main PPI databases. 16 out of 909 interactors are putative novel SCA1 therapeutic targets, and all but one are already being studied in the context of this disease. The 16 proteins are mainly involved in binding and catalytic activity (mainly kinase activity), functional features already thought to be important in the SCA1 disease.
Affiliation(s)
- André Sousa
- Instituto de Investigação e Inovação em Saúde (I3S), Universidade do Porto, Rua Alfredo Allen, 208, 4200-135 Porto, Portugal
| | - Sara Rocha
- Instituto de Investigação e Inovação em Saúde (I3S), Universidade do Porto, Rua Alfredo Allen, 208, 4200-135 Porto, Portugal
| | - Jorge Vieira
- Instituto de Investigação e Inovação em Saúde (I3S), Universidade do Porto, Rua Alfredo Allen, 208, 4200-135 Porto, Portugal
- Instituto de Biologia Molecular e Celular (IBMC), Rua Alfredo Allen, 208, 4200-135 Porto, Portugal
| | - Miguel Reboiro-Jato
- Department of Computer Science, CINBIO, Universidade de Vigo, ESEI – Escuela Superior de Ingeniería Informática, 32004 Ourense, Spain
- SING Research Group, Galicia Sur Health Research Institute (IIS Galicia Sur), SERGAS-UVIGO, 36213 Vigo, Spain
| | - Hugo López-Fernández
- Department of Computer Science, CINBIO, Universidade de Vigo, ESEI – Escuela Superior de Ingeniería Informática, 32004 Ourense, Spain
- SING Research Group, Galicia Sur Health Research Institute (IIS Galicia Sur), SERGAS-UVIGO, 36213 Vigo, Spain
| | - Cristina P. Vieira
- Instituto de Investigação e Inovação em Saúde (I3S), Universidade do Porto, Rua Alfredo Allen, 208, 4200-135 Porto, Portugal
- Instituto de Biologia Molecular e Celular (IBMC), Rua Alfredo Allen, 208, 4200-135 Porto, Portugal
| |
|
43
|
Hao B, Kovács IA. A positive statistical benchmark to assess network agreement. Nat Commun 2023; 14:2988. [PMID: 37225699 DOI: 10.1038/s41467-023-38625-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/18/2022] [Accepted: 05/09/2023] [Indexed: 05/26/2023] Open
Abstract
Current computational methods for validating experimental network datasets compare overlap, i.e., shared links, with a reference network using a negative benchmark. However, this fails to quantify the level of agreement between the two networks. To address this, we propose a positive statistical benchmark to determine the maximum possible overlap between networks. Our approach can efficiently generate this benchmark in a maximum entropy framework and provides a way to assess whether the observed overlap is significantly different from the best-case scenario. We introduce a normalized overlap score, Normlap, to enhance comparisons between experimental networks. As an application, we compare molecular and functional networks, resulting in an agreement network of human as well as yeast network datasets. The Normlap score can improve the comparison between experimental networks by providing a computational alternative to network thresholding and validation.
Affiliation(s)
- Bingjie Hao
- Department of Physics and Astronomy, Northwestern University, Evanston, IL, 60208, USA
| | - István A Kovács
- Department of Physics and Astronomy, Northwestern University, Evanston, IL, 60208, USA.
- Northwestern Institute on Complex Systems, Northwestern University, Evanston, IL, 60208, USA.
| |
|
44
|
Wodak SJ, Vajda S, Lensink MF, Kozakov D, Bates PA. Critical Assessment of Methods for Predicting the 3D Structure of Proteins and Protein Complexes. Annu Rev Biophys 2023; 52:183-206. [PMID: 36626764 PMCID: PMC10885158 DOI: 10.1146/annurev-biophys-102622-084607] [Citation(s) in RCA: 25] [Impact Index Per Article: 12.5] [Indexed: 01/11/2023]
Abstract
Advances in a scientific discipline are often measured by small, incremental steps. In this review, we report on two intertwined disciplines in the protein structure prediction field, modeling of single chains and modeling of complexes, that have over decades emulated this pattern, as monitored by the community-wide blind prediction experiments CASP and CAPRI. However, over the past few years, dramatic advances were observed for the accurate prediction of single protein chains, driven by a surge of deep learning methodologies entering the prediction field. We review the main scientific developments that enabled these recent breakthroughs and feature the important role of blind prediction experiments in building up and nurturing the structure prediction field. We discuss how the new wave of artificial intelligence-based methods is impacting the fields of computational and experimental structural biology and highlight areas in which deep learning methods are likely to lead to future developments, provided that major challenges are overcome.
Affiliation(s)
- Shoshana J Wodak
- VIB-VUB Center for Structural Biology, Vrije Universiteit Brussel, Brussels, Belgium
| | - Sandor Vajda
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts, USA
- Department of Chemistry, Boston University, Boston, Massachusetts, USA
| | - Marc F Lensink
- Univ. Lille, CNRS, UMR 8576-UGSF-Unité de Glycobiologie Structurale et Fonctionnelle, Lille, France
| | - Dima Kozakov
- Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, New York, USA
- Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, New York, USA
| | - Paul A Bates
- Biomolecular Modelling Laboratory, The Francis Crick Institute, London, United Kingdom
| |
|
45
|
David A, Sternberg MJE. Protein structure-based evaluation of missense variants: Resources, challenges and future directions. Curr Opin Struct Biol 2023; 80:102600. [PMID: 37126977 DOI: 10.1016/j.sbi.2023.102600] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 01/30/2023] [Revised: 03/30/2023] [Accepted: 03/31/2023] [Indexed: 05/03/2023]
Abstract
We provide an overview of the methods that can be used for protein structure-based evaluation of missense variants. The algorithms can be broadly divided into those that calculate the difference in free energy (ΔΔG) between the wild type and variant structures and those that use structural features to predict the damaging effect of a variant without providing a ΔΔG. A wide range of machine learning approaches have been employed to develop those algorithms. We also discuss challenges and opportunities for variant interpretation in view of the recent breakthrough in three-dimensional structural modelling using deep learning.
Affiliation(s)
- Alessia David
- Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London, SW7 2AZ, UK.
| | - Michael J E Sternberg
- Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London, SW7 2AZ, UK
| |
|
46
|
Bartolec TK, Vázquez-Campos X, Norman A, Luong C, Johnson M, Payne RJ, Wilkins MR, Mackay JP, Low JKK. Cross-linking mass spectrometry discovers, evaluates, and corroborates structures and protein-protein interactions in the human cell. Proc Natl Acad Sci U S A 2023; 120:e2219418120. [PMID: 37071682 PMCID: PMC10151615 DOI: 10.1073/pnas.2219418120] [Citation(s) in RCA: 34] [Impact Index Per Article: 17.0] [Received: 11/23/2022] [Accepted: 03/16/2023] [Indexed: 04/19/2023] Open
Abstract
Significant recent advances in structural biology, particularly in the field of cryoelectron microscopy, have dramatically expanded our ability to create structural models of proteins and protein complexes. However, many proteins remain refractory to these approaches because of their low abundance, low stability, or (in the case of complexes) simply not having yet been analyzed. Here, we demonstrate the power of using cross-linking mass spectrometry (XL-MS) for the high-throughput experimental assessment of the structures of proteins and protein complexes. This included those produced by high-resolution but in vitro experimental data, as well as in silico predictions based on amino acid sequence alone. We present the largest XL-MS dataset to date, describing 28,910 unique residue pairs captured across 4,084 unique human proteins and 2,110 unique protein-protein interactions. We show that models of proteins and their complexes predicted by AlphaFold2, and inspired and corroborated by the XL-MS data, offer opportunities to deeply mine the structural proteome and interactome and reveal mechanisms underlying protein structure and function.
Affiliation(s)
- Tara K. Bartolec
- Systems Biology Initiative, School of Biotechnology and Biomolecular Sciences, The University of New South Wales, Randwick, NSW 2052, Australia
| | - Xabier Vázquez-Campos
- Systems Biology Initiative, School of Biotechnology and Biomolecular Sciences, The University of New South Wales, Randwick, NSW 2052, Australia
| | - Alexander Norman
- School of Chemistry, University of Sydney, Sydney, NSW 2006, Australia
| | - Clement Luong
- School of Life and Environmental Sciences, University of Sydney, Sydney, NSW 2006, Australia
| | - Marcus Johnson
- School of Life and Environmental Sciences, University of Sydney, Sydney, NSW 2006, Australia
| | - Richard J. Payne
- School of Chemistry, University of Sydney, Sydney, NSW 2006, Australia
- Australian Research Council Centre of Excellence for Innovations in Peptide and Protein Science, The University of Sydney, Sydney, NSW 2006, Australia
| | - Marc R. Wilkins
- Systems Biology Initiative, School of Biotechnology and Biomolecular Sciences, The University of New South Wales, Randwick, NSW 2052, Australia
| | - Joel P. Mackay
- School of Life and Environmental Sciences, University of Sydney, Sydney, NSW 2006, Australia
| | - Jason K. K. Low
- School of Life and Environmental Sciences, University of Sydney, Sydney, NSW 2006, Australia
| |
|
47
|
Durham J, Zhang J, Humphreys IR, Pei J, Cong Q. Recent advances in predicting and modeling protein-protein interactions. Trends Biochem Sci 2023; 48:527-538. [PMID: 37061423 DOI: 10.1016/j.tibs.2023.03.003] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 08/01/2022] [Revised: 03/03/2023] [Accepted: 03/17/2023] [Indexed: 04/17/2023]
Abstract
Protein-protein interactions (PPIs) drive biological processes, and disruption of PPIs can cause disease. With recent breakthroughs in structure prediction and a deluge of genomic sequence data, computational methods to predict PPIs and model spatial structures of protein complexes are now approaching the accuracy of experimental approaches for permanent interactions and show promise for elucidating transient interactions. As we describe here, the key to this success is rich evolutionary information deciphered from thousands of homologous sequences that coevolve in interacting partners. This covariation signal, revealed by sophisticated statistical and machine learning (ML) algorithms, predicts physiological interactions. Accurate artificial intelligence (AI)-based modeling of protein structures promises to provide accurate 3D models of PPIs at a proteome-wide scale.
Affiliation(s)
- Jesse Durham
- Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, TX, USA; Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, USA; Harold C. Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center, Dallas, TX, USA
| | - Jing Zhang
- Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, TX, USA; Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, USA; Harold C. Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center, Dallas, TX, USA
| | - Ian R Humphreys
- Department of Biochemistry, University of Washington, Seattle, WA, USA; Institute for Protein Design, University of Washington, Seattle, WA, USA
| | - Jimin Pei
- Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, TX, USA; Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, USA; Harold C. Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center, Dallas, TX, USA
| | - Qian Cong
- Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, TX, USA; Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, USA; Harold C. Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center, Dallas, TX, USA.
| |
|
48
|
Mirela Bota P, Hernandez AC, Segura J, Gallego O, Oliva B, Fernandez-Fuentes N. CM2D3: Furnishing the human interactome with structural models of protein complexes derived by comparative modeling and docking. J Mol Biol 2023:168055. [PMID: 36958605 DOI: 10.1016/j.jmb.2023.168055] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2022] [Revised: 03/05/2023] [Accepted: 03/16/2023] [Indexed: 03/25/2023]
Abstract
The human interactome is composed of around half a million interactions according to recent estimates, and three-dimensional structural information is available for only a small fraction of them. Indeed, the structural coverage of the human interactome is very low and, given the complexity and time-consuming nature of solving protein structures, this problem will remain for the foreseeable future. Structural models, or predictions, of protein complexes can provide valuable information when experimentally determined 3D structures are not available. Here we present CM2D3, a relational database containing structural models of the whole human interactome derived from both comparative modeling and data-driven docking. Starting from a consensus interactome derived from integrating several interactomics databases, a strategy was devised to derive structural models by computational means. Currently, CM2D3 includes 33338 structural models, of which 5121 were derived from comparative modeling and the remainder from docking. Of the latter, the structures of 14554 complexes were derived from monomers modeled by M4T, while the rest were modeled with structures predicted by AlphaFold2. Lastly, CM2D3 complements existing resources by focusing on models derived from free docking, as opposed to template-based docking, hence expanding the structural information on protein complexes available to the scientific community. Database URL: http://www.bioinsilico.org/CM2D3.
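The abstract gives three of the model counts explicitly and leaves two to be inferred. The short sketch below derives the implied counts by subtraction; the variable names are illustrative, and the derived figures are arithmetic consequences of the stated numbers rather than values reported by the authors.

```python
# Counts stated in the abstract.
total_models = 33338
comparative_modeling = 5121
docking_from_m4t_monomers = 14554

# Counts implied by "the remainder" and "the rest".
docking_total = total_models - comparative_modeling
docking_from_alphafold2_monomers = docking_total - docking_from_m4t_monomers

breakdown = {
    "comparative modeling": comparative_modeling,
    "docking (M4T monomers)": docking_from_m4t_monomers,
    "docking (AlphaFold2 monomers)": docking_from_alphafold2_monomers,
}

for source, count in breakdown.items():
    print(f"{source}: {count} models ({count / total_models:.1%} of total)")
```

The breakdown makes explicit that docking accounts for the large majority of CM2D3 models, which is consistent with the abstract's emphasis on free docking over template-based modeling.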
Affiliation(s)
- Patricia Mirela Bota
- Structural Bioinformatics Lab (GRIB-IMIM), Universitat Pompeu Fabra, 08950 Barcelona, Catalonia, Spain
| | - Altair C Hernandez
- Live-cell Structural Biology, Department of Medicine and Life Sciences, University Pompeu Fabra, Barcelona 08005, Catalonia, Spain
| | - Joan Segura
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, La Jolla, CA 92093, USA
| | - Oriol Gallego
- Live-cell Structural Biology, Department of Medicine and Life Sciences, University Pompeu Fabra, Barcelona 08005, Catalonia, Spain
| | - Baldo Oliva
- Structural Bioinformatics Lab (GRIB-IMIM), Universitat Pompeu Fabra, 08950 Barcelona, Catalonia, Spain.
| | - Narcis Fernandez-Fuentes
- Institute of Biological, Environmental and Rural Sciences. Aberystwyth University, SY233EE Aberystwyth, United Kingdom.
| |
|
49
|
Petrey D, Zhao H, Trudeau S, Murray D, Honig B. PrePPI: A structure informed proteome-wide database of protein-protein interactions. bioRxiv 2023:2023.02.27.530276. [PMID: 36909476 PMCID: PMC10002632 DOI: 10.1101/2023.02.27.530276] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/04/2023]
Abstract
We present an updated version of the Predicting Protein-Protein Interactions (PrePPI) webserver which predicts PPIs on a proteome-wide scale. PrePPI combines structural and non-structural clues within a Bayesian framework to compute a likelihood ratio (LR) for essentially every possible pair of proteins in a proteome; the current database is for the human interactome. The structural modeling (SM) clue is derived from template-based modeling and its application on a proteome-wide scale is enabled by a unique scoring function used to evaluate a putative complex. The updated version of PrePPI leverages AlphaFold structures that are parsed into individual domains. As has been demonstrated in earlier applications, PrePPI performs extremely well as measured by receiver operating characteristic curves derived from testing on E. coli and human protein-protein interaction (PPI) databases. A PrePPI database of ~1.3 million human PPIs can be queried with a webserver application that comprises multiple functionalities for examining query proteins, template complexes, 3D models for predicted complexes, and related features (https://honiglab.c2b2.columbia.edu/PrePPI). PrePPI is a state-of-the-art resource that offers an unprecedented structure-informed view of the human interactome.
|
50
|
Duran-Frigola M, Cigler M, Winter GE. Advancing Targeted Protein Degradation via Multiomics Profiling and Artificial Intelligence. J Am Chem Soc 2023; 145:2711-2732. [PMID: 36706315 PMCID: PMC9912273 DOI: 10.1021/jacs.2c11098] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 10/19/2022] [Indexed: 01/28/2023]
Abstract
Only around 20% of the human proteome is considered to be druggable with small-molecule antagonists. This leaves some of the most compelling therapeutic targets outside the reach of ligand discovery. The concept of targeted protein degradation (TPD) promises to overcome some of these limitations. In brief, TPD is dependent on small molecules that induce the proximity between a protein of interest (POI) and an E3 ubiquitin ligase, causing ubiquitination and degradation of the POI. In this perspective, we want to reflect on current challenges in the field, and discuss how advances in multiomics profiling, artificial intelligence, and machine learning (AI/ML) will be vital in overcoming them. The presented roadmap is discussed in the context of small-molecule degraders but is equally applicable for other emerging proximity-inducing modalities.
Affiliation(s)
- Miquel Duran-Frigola
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1090 Vienna, Austria
- Ersilia Open Source Initiative, 28 Belgrave Road, CB1 3DE, Cambridge, United Kingdom
| | - Marko Cigler
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1090 Vienna, Austria
| | - Georg E. Winter
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1090 Vienna, Austria
| |
|