1
Yuan R, Zhang J, Zhou J, Cong Q. Recent progress and future challenges in structure-based protein-protein interaction prediction. Mol Ther 2025; 33:2252-2268. [PMID: 40195117 DOI: 10.1016/j.ymthe.2025.04.003] [Received: 02/07/2025] [Revised: 03/05/2025] [Accepted: 04/02/2025] [Indexed: 04/09/2025] Open
Abstract
Protein-protein interactions (PPIs) play a fundamental role in cellular processes, and understanding these interactions is crucial for advances in both basic biological science and biomedical applications. This review presents an overview of recent progress in computational methods for modeling protein complexes and predicting PPIs based on 3D structures, focusing on the transformative role of artificial intelligence-based approaches. We further discuss the expanding biomedical applications of PPI research, including the elucidation of disease mechanisms, drug discovery, and therapeutic design. Despite these advances, significant challenges remain in predicting host-pathogen interactions, interactions between intrinsically disordered regions, and interactions related to immune responses. These challenges are worthwhile for future explorations and represent the frontier of research in this field.
Affiliation(s)
- Rongqing Yuan
  - Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, TX, USA; Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, USA; Lyda Hill Department of Bioinformatics, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Jing Zhang
  - Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, TX, USA; Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Jian Zhou
  - Lyda Hill Department of Bioinformatics, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Qian Cong
  - Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, TX, USA; Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, USA; Harold C. Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center, Dallas, TX, USA
2
Deng P, Zhang Y, Xu L, Lyu J, Li L, Sun F, Zhang WB, Gao H. Computational discovery and systematic analysis of protein entangling motifs in nature: from algorithm to database. Chem Sci 2025:d4sc08649j. [PMID: 40271025 PMCID: PMC12013726 DOI: 10.1039/d4sc08649j] [Received: 12/23/2024] [Accepted: 03/29/2025] [Indexed: 04/25/2025] Open
Abstract
Nontrivial protein topology has the potential to revolutionize protein engineering by enabling the manipulation of proteins' stability and dynamics. However, the rarity of topological proteins in nature poses a challenge for their design, synthesis and application, primarily due to the limited number of available entangling motifs as synthetic templates. Discovering these motifs is particularly difficult, as entanglement is a subtle structural feature that is not readily discernible from protein sequences. In this study, we developed a streamlined workflow enabling efficient and accurate identification of structurally reliable and applicable entangling motifs from protein sequences. Through this workflow, we automatically curated a database of 1115 entangling protein motifs from over 100 thousand sequences in the UniProt Knowledgebase. In our database, 73.3% of C2 entangling motifs and 80.1% of C3 entangling motifs exhibited low structural similarity to known protein structures. The entangled structures in the database were categorized into different groups and their functional and biological significance were analyzed. The results were summarized in an online database accessible through a user-friendly web platform, providing researchers with an expanded toolbox of entangling motifs. This resource is poised to significantly advance the field of protein topology engineering and inspire new research directions in protein design and application.
Affiliation(s)
- Puqing Deng
  - Department of Chemical and Biological Engineering, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
- Yuxuan Zhang
  - Department of Chemical and Biological Engineering, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
- Lianjie Xu
  - Beijing National Laboratory for Molecular Sciences, Key Laboratory of Polymer Chemistry & Physics of Ministry of Education, Center for Soft Matter Science and Engineering, College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, P. R. China
- Jinyu Lyu
  - Department of Chemical and Biological Engineering, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
- Linyan Li
  - Department of Data Science, City University of Hong Kong, Kowloon, Hong Kong
- Fei Sun
  - Department of Chemical and Biological Engineering, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
- Wen-Bin Zhang
  - Beijing National Laboratory for Molecular Sciences, Key Laboratory of Polymer Chemistry & Physics of Ministry of Education, Center for Soft Matter Science and Engineering, College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, P. R. China
  - AI for Science (AI4S)-Preferred Program, Shenzhen Graduate School, Peking University, Shenzhen 518055, P. R. China
- Hanyu Gao
  - Department of Chemical and Biological Engineering, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
3
Doran BA, Chen RY, Giba H, Behera V, Barat B, Sundararajan A, Lin H, Sidebottom A, Pamer EG, Raman AS. Subspecies phylogeny in the human gut revealed by co-evolutionary constraints across the bacterial kingdom. Cell Syst 2025; 16:101167. [PMID: 39826551 DOI: 10.1016/j.cels.2024.12.008] [Received: 12/26/2023] [Revised: 02/16/2024] [Accepted: 12/18/2024] [Indexed: 01/22/2025]
Abstract
The human gut microbiome contains many bacterial strains of the same species ("strain-level variants") that shape microbiome function. The tremendous scale and molecular resolution at which microbial communities are being interrogated motivates addressing how to describe strain-level variants. We introduce the "Spectral Tree", an inferred tree of relatedness built from patterns of co-evolutionary constraint between greater than 7,000 diverse bacteria. Using the Spectral Tree to describe over 600 diverse gut commensal strains that we isolated, whole-genome sequenced, and metabolically profiled revealed (1) widespread phylogenetic structure among strain-level variants, (2) the origins of subspecies phylogeny as a shared history of phage infections across humans, and (3) the key role of inter-human strain variation in predicting strain-level metabolic qualities. Overall, our work demonstrates the existence and metabolic importance of structured phylogeny below the level of species for commensal gut bacteria, motivating a redefinition of individual strains according to their evolutionary context. A record of this paper's transparent peer review process is included in the supplemental information.
Affiliation(s)
- Benjamin A Doran
  - Duchossois Family Institute, University of Chicago, Chicago, IL 60637, USA; Pritzker School of Molecular Engineering, University of Chicago, Chicago, IL 60637, USA
- Robert Y Chen
  - Department of Psychiatry, University of Washington, Seattle, WA 98195, USA
- Hannah Giba
  - Duchossois Family Institute, University of Chicago, Chicago, IL 60637, USA; Department of Pathology, University of Chicago, Chicago, IL 60637, USA
- Vivek Behera
  - Department of Medicine, University of Chicago, Chicago, IL 60637, USA
- Bidisha Barat
  - Duchossois Family Institute, University of Chicago, Chicago, IL 60637, USA
- Huaiying Lin
  - Duchossois Family Institute, University of Chicago, Chicago, IL 60637, USA
- Ashley Sidebottom
  - Duchossois Family Institute, University of Chicago, Chicago, IL 60637, USA
- Eric G Pamer
  - Duchossois Family Institute, University of Chicago, Chicago, IL 60637, USA; Department of Medicine, University of Chicago, Chicago, IL 60637, USA
- Arjun S Raman
  - Duchossois Family Institute, University of Chicago, Chicago, IL 60637, USA; Department of Pathology, University of Chicago, Chicago, IL 60637, USA; Center for the Physics of Evolving Systems, University of Chicago, Chicago, IL 60637, USA
4
Zhang Z, Ghavasieh A, Zhang J, De Domenico M. Coarse-graining network flow through statistical physics and machine learning. Nat Commun 2025; 16:1605. [PMID: 39948344 PMCID: PMC11825948 DOI: 10.1038/s41467-025-56034-2] [Received: 10/28/2023] [Accepted: 01/06/2025] [Indexed: 02/16/2025] Open
Abstract
Information dynamics plays a crucial role in complex systems, from cells to societies. Recent advances in statistical physics have made it possible to capture key network properties, such as flow diversity and signal speed, using entropy and free energy. However, large system sizes pose computational challenges. We use graph neural networks to identify suitable groups of components for coarse-graining a network and achieve a low computational complexity, suitable for practical application. Our approach preserves information flow even under significant compression, as shown through theoretical analysis and experiments on synthetic and empirical networks. We find that the model merges nodes with similar structural properties, suggesting they perform redundant roles in information transmission. This method enables low-complexity compression for extremely large networks, offering a multiscale perspective that preserves information flow in biological, social, and technological networks better than existing methods mostly focused on network structure.
Affiliation(s)
- Zhang Zhang
  - School of Systems Science, Beijing Normal University, Beijing, China
  - Swarma Research, Beijing, China
  - Department of Physics & Astronomy 'Galileo Galilei', University of Padua, Padua, Italy
- Arsham Ghavasieh
  - Center for Complex Networks and Systems Research, Luddy School of Informatics, Computing, and Engineering, Indiana University, Bloomington, IN, USA
- Jiang Zhang
  - School of Systems Science, Beijing Normal University, Beijing, China
  - Swarma Research, Beijing, China
- Manlio De Domenico
  - Department of Physics & Astronomy 'Galileo Galilei', University of Padua, Padua, Italy
  - Padua Center for Network Medicine, University of Padua, Padua, Italy
  - Istituto Nazionale di Fisica Nucleare, Sez., Padova, Italy
5
Lupo U, Sgarbossa D, Milighetti M, Bitbol AF. DiffPaSS-high-performance differentiable pairing of protein sequences using soft scores. Bioinformatics 2024; 41:btae738. [PMID: 39672677 PMCID: PMC11676329 DOI: 10.1093/bioinformatics/btae738] [Received: 09/20/2024] [Revised: 12/05/2024] [Accepted: 12/11/2024] [Indexed: 12/15/2024] Open
Abstract
MOTIVATION: Identifying interacting partners from two sets of protein sequences has important applications in computational biology. Interacting partners share similarities across species due to their common evolutionary history, and feature correlations in amino acid usage due to the need to maintain complementary interaction interfaces. Thus, the problem of finding interacting pairs can be formulated as searching for a pairing of sequences that maximizes a sequence similarity or a coevolution score. Several methods have been developed to address this problem, applying different approximate optimization methods to different scores. RESULTS: We introduce Differentiable Pairing using Soft Scores (DiffPaSS), a differentiable framework for flexible, fast, and hyperparameter-free optimization for pairing interacting biological sequences, which can be applied to a wide variety of scores. We apply it to a benchmark prokaryotic dataset, using mutual information and neighbor graph alignment scores. DiffPaSS outperforms existing algorithms for optimizing the same scores. We demonstrate the usefulness of our paired alignments for the prediction of protein complex structure. DiffPaSS does not require sequences to be aligned, and we also apply it to nonaligned sequences from T-cell receptors. AVAILABILITY AND IMPLEMENTATION: A PyTorch implementation and installable Python package are available at https://github.com/Bitbol-Lab/DiffPaSS.
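The pairing problem stated in this abstract (find the pairing of two sequence sets that maximizes a coevolution score) can be illustrated with a toy sketch. This is not DiffPaSS and does not use its package: it swaps the differentiable optimization for an exhaustive search over permutations, uses a plain empirical mutual information summed over column pairs as the score, and ignores any within-species restriction on which sequences may be paired. All function names and the toy data are hypothetical.

```python
from collections import Counter
from itertools import permutations
from math import log


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two equal-length symbol lists."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


def pairing_score(seqs_a, seqs_b, perm):
    """Sum of mutual information over every (column of A, column of B) pair.

    perm[i] is the index of the sequence in seqs_b paired with seqs_a[i].
    """
    paired_b = [seqs_b[j] for j in perm]
    return sum(
        mutual_information([s[i] for s in seqs_a], [t[k] for t in paired_b])
        for i in range(len(seqs_a[0]))
        for k in range(len(seqs_b[0]))
    )


def best_pairing(seqs_a, seqs_b):
    """Exhaustive search over all pairings (assumption: only a handful of sequences)."""
    return max(
        permutations(range(len(seqs_b))),
        key=lambda perm: pairing_score(seqs_a, seqs_b, perm),
    )


# Hypothetical one-column toy alignments of equal length.
toy_a = ["A", "A", "C", "C"]
toy_b = ["G", "T", "G", "T"]
best = best_pairing(toy_a, toy_b)
print(best, pairing_score(toy_a, toy_b, best))
```

Exhaustive search grows factorially with the number of sequences, which is why the methods referred to in the abstract rely on approximate optimization instead. Several pairings can tie for the best score in such a small example; `max` simply returns the first one it meets.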
Affiliation(s)
- Umberto Lupo
  - Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne CH-1015, Switzerland
  - SIB Swiss Institute of Bioinformatics, Lausanne CH-1015, Switzerland
- Damiano Sgarbossa
  - Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne CH-1015, Switzerland
  - SIB Swiss Institute of Bioinformatics, Lausanne CH-1015, Switzerland
- Martina Milighetti
  - Division of Infection and Immunity, University College London, London WC1E 6BT, United Kingdom
  - Cancer Institute, University College London, London WC1E 6DD, United Kingdom
- Anne-Florence Bitbol
  - Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne CH-1015, Switzerland
  - SIB Swiss Institute of Bioinformatics, Lausanne CH-1015, Switzerland
6
Zhao C, Jiang X, Wang M, Gui S, Yan X, Dong Y, Liu D. Constructing protein-functionalized DNA origami nanodevices for biological applications. Nanoscale 2024; 17:142-157. [PMID: 39564893 DOI: 10.1039/d4nr03599b] [Indexed: 11/21/2024]
Abstract
In living systems, proteins participate in various physiological processes and the clustering of multiple proteins is essential for efficient signaling. Therefore, understanding the effects of the number, distance and orientation of proteins is of great significance. With programmability and addressability, DNA origami technology has enabled fabrication of sophisticated nanostructures with precise arrangement and orientation control of proteins to investigate the effects of these parameters on protein-involved cellular processes. Herein, we highlight the construction and applications of protein-functionalized DNA origami nanodevices. After the introduction of the structural design principles of DNA origami and the strategies of protein-DNA conjugation, the emerging applications of protein-functionalized DNA origami nanodevices with controlled key parameters are mainly discussed, including the regulation of enzyme cascade reactions, modulation of cellular behaviours, drug delivery therapy and protein structural analysis. Finally, the current challenges and potential directions of protein-functionalized DNA origami nanodevices are also presented, advancing their applications in biomedicine, cell biology and structural biology.
Affiliation(s)
- Chuangyuan Zhao
  - CAS Key Laboratory of Colloid, Interface and Chemical Thermodynamics, Beijing National Laboratory for Molecular Sciences, Institute of Chemistry, Chinese Academy of Sciences, Beijing, 100190, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
- Xinran Jiang
  - School of Life Sciences, Fudan University, Shanghai, 200433, China
- Miao Wang
  - Chemistry and Chemical Biology, Cornell University, 122 Baker Laboratory, Ithaca, NY 14853, USA
- Songbai Gui
  - Department of Neurosurgery, Beijing Tiantan Hospital, Capital Medical University, 100071, Beijing, China
- Xin Yan
  - Department of Sports Medicine, Beijing Key Laboratory of Sports Injuries, Peking University Third Hospital, Beijing, 100191, China
- Yuanchen Dong
  - CAS Key Laboratory of Colloid, Interface and Chemical Thermodynamics, Beijing National Laboratory for Molecular Sciences, Institute of Chemistry, Chinese Academy of Sciences, Beijing, 100190, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
- Dongsheng Liu
  - Engineering Research Center of Advanced Rare Earth Materials (Ministry of Education), Department of Chemistry, Tsinghua University, Beijing, 100084, China
7
Bhadra-Lobo S, Derevyanko G, Lamoureux G. Dock2D: Synthetic Data for the Molecular Recognition Problem. IEEE/ACM Trans Comput Biol Bioinform 2024; 21:2580-2586. [PMID: 38814763 DOI: 10.1109/tcbb.2024.3407477] [Indexed: 06/01/2024]
Abstract
Predicting the physical interaction of proteins is a cornerstone problem in computational biology. New classes of learning-based algorithms are actively being developed, and are typically trained end-to-end on protein complex structures extracted from the Protein Data Bank. These training datasets tend to be large and difficult to use for prototyping and, unlike image or natural language datasets, they are not easily interpretable by non-experts. We present Dock2D-IP and Dock2D-IF, two "toy" datasets that can be used to select algorithms predicting protein-protein interactions, or any other type of molecular interactions. Using two-dimensional shapes as input, each example from Dock2D-IP ("interaction pose") describes the interaction pose of two shapes known to interact and each example from Dock2D-IF ("interaction fact") describes whether two shapes form a stable complex or not, regardless of how they bind. We propose a number of baseline solutions to the problem and show that the same underlying energy function can be learned either by solving the interaction pose task (formulated as an energy-minimization "docking" problem) or the fact-of-interaction task (formulated as a binding free energy estimation problem).
8
Humphreys IR, Zhang J, Baek M, Wang Y, Krishnakumar A, Pei J, Anishchenko I, Tower CA, Jackson BA, Warrier T, Hung DT, Peterson SB, Mougous JD, Cong Q, Baker D. Protein interactions in human pathogens revealed through deep learning. Nat Microbiol 2024; 9:2642-2652. [PMID: 39294458 PMCID: PMC11445079 DOI: 10.1038/s41564-024-01791-x] [Received: 04/28/2023] [Accepted: 07/23/2024] [Indexed: 09/20/2024]
Abstract
Identification of bacterial protein-protein interactions and predicting the structures of these complexes could aid in the understanding of pathogenicity mechanisms and developing treatments for infectious diseases. Here we developed RoseTTAFold2-Lite, a rapid deep learning model that leverages residue-residue coevolution and protein structure prediction to systematically identify and structurally characterize protein-protein interactions at the proteome-wide scale. Using this pipeline, we searched through 78 million pairs of proteins across 19 human bacterial pathogens and identified 1,923 confidently predicted complexes involving essential genes and 256 involving virulence factors. Many of these complexes were not previously known; we experimentally tested 12 such predictions, and half of them were validated. The predicted interactions span core metabolic and virulence pathways ranging from post-transcriptional modification to acid neutralization to outer-membrane machinery and should contribute to our understanding of the biology of these important pathogens and the design of drugs to combat them.
Affiliation(s)
- Ian R Humphreys
  - Department of Biochemistry, University of Washington, Seattle, WA, USA
  - Institute for Protein Design, University of Washington, Seattle, WA, USA
- Jing Zhang
  - Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, TX, USA
  - Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, USA
  - Harold C. Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Minkyung Baek
  - Department of Biological Sciences, Seoul National University, Seoul, South Korea
- Yaxi Wang
  - Department of Microbiology, University of Washington, Seattle, WA, USA
- Aditya Krishnakumar
  - Department of Biochemistry, University of Washington, Seattle, WA, USA
  - Institute for Protein Design, University of Washington, Seattle, WA, USA
- Jimin Pei
  - Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, TX, USA
  - Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, USA
  - Harold C. Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Ivan Anishchenko
  - Department of Biochemistry, University of Washington, Seattle, WA, USA
  - Institute for Protein Design, University of Washington, Seattle, WA, USA
- Catherine A Tower
  - Department of Microbiology, University of Washington, Seattle, WA, USA
- Blake A Jackson
  - Department of Microbiology, University of Washington, Seattle, WA, USA
- Thulasi Warrier
  - Department of Molecular Biology and Center for Computational and Integrative Biology, Massachusetts General Hospital, Boston, MA, USA
  - Department of Genetics, Harvard Medical School, Boston, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Deborah T Hung
  - Department of Molecular Biology and Center for Computational and Integrative Biology, Massachusetts General Hospital, Boston, MA, USA
  - Department of Genetics, Harvard Medical School, Boston, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
- S Brook Peterson
  - Department of Microbiology, University of Washington, Seattle, WA, USA
- Joseph D Mougous
  - Department of Microbiology, University of Washington, Seattle, WA, USA
  - Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
  - Microbial Interactions and Microbiome Center, University of Washington, Seattle, WA, USA
- Qian Cong
  - Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, TX, USA
  - Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, USA
  - Harold C. Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center, Dallas, TX, USA
- David Baker
  - Department of Biochemistry, University of Washington, Seattle, WA, USA
  - Institute for Protein Design, University of Washington, Seattle, WA, USA
  - Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
9
Dietler N, Abbara A, Choudhury S, Bitbol AF. Impact of phylogeny on the inference of functional sectors from protein sequence data. PLoS Comput Biol 2024; 20:e1012091. [PMID: 39312591 PMCID: PMC11449291 DOI: 10.1371/journal.pcbi.1012091] [Received: 04/19/2024] [Revised: 10/03/2024] [Accepted: 09/10/2024] [Indexed: 09/25/2024] Open
Abstract
Statistical analysis of multiple sequence alignments of homologous proteins has revealed groups of coevolving amino acids called sectors. These groups of amino-acid sites feature collective correlations in their amino-acid usage, and they are associated to functional properties. Modeling showed that nonlinear selection on an additive functional trait of a protein is generically expected to give rise to a functional sector. These modeling results motivated a principled method, called ICOD, which is designed to identify functional sectors, as well as mutational effects, from sequence data. However, a challenge for all methods aiming to identify sectors from multiple sequence alignments is that correlations in amino-acid usage can also arise from the mere fact that homologous sequences share common ancestry, i.e. from phylogeny. Here, we generate controlled synthetic data from a minimal model comprising both phylogeny and functional sectors. We use this data to dissect the impact of phylogeny on sector identification and on mutational effect inference by different methods. We find that ICOD is most robust to phylogeny, but that conservation is also quite robust. Next, we consider natural multiple sequence alignments of protein families for which deep mutational scan experimental data is available. We show that in this natural data, conservation and ICOD best identify sites with strong functional roles, in agreement with our results on synthetic data. Importantly, these two methods have different premises, since they respectively focus on conservation and on correlations. Thus, their joint use can reveal complementary information.
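Of the two approaches this abstract contrasts, per-site conservation is the simpler to illustrate. The sketch below is one common way to score conservation, not necessarily the measure used in the paper, and it does not implement ICOD: each alignment column receives one minus its Shannon entropy normalized by the maximum entropy of the alphabet. The function names, the 20-letter alphabet default, and the toy alignment are assumptions made for illustration.

```python
from collections import Counter
from math import log


def column_entropy(column):
    """Shannon entropy (in nats) of the symbols observed in one alignment column."""
    n = len(column)
    return -sum((c / n) * log(c / n) for c in Counter(column).values())


def conservation_scores(msa, alphabet_size=20):
    """Per-column conservation in [0, 1]: 1 means a fully conserved column.

    msa is a list of equal-length aligned sequences; gaps, if present,
    are treated like any other symbol (a simplifying assumption).
    """
    max_entropy = log(alphabet_size)
    return [
        1.0 - column_entropy([seq[i] for seq in msa]) / max_entropy
        for i in range(len(msa[0]))
    ]


# Hypothetical toy alignment: column 0 is invariant, column 2 is fully variable.
toy_msa = ["ACD", "ACE", "AGF", "AGH"]
print(conservation_scores(toy_msa))
```

A score of this kind looks at each site independently, whereas correlation-based methods look at pairs of sites, which is why the abstract describes the two as carrying complementary information.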
Affiliation(s)
- Nicola Dietler
  - Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
  - SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Alia Abbara
  - Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
  - SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Subham Choudhury
  - Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
  - SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Anne-Florence Bitbol
  - Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
  - SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
10
Fenster JA, Azzinaro PA, Dinhobl M, Borca MV, Spinard E, Gladue DP. African Swine Fever Virus Protein-Protein Interaction Prediction. Viruses 2024; 16:1170. [PMID: 39066332 PMCID: PMC11281715 DOI: 10.3390/v16071170] [Received: 06/04/2024] [Revised: 07/05/2024] [Accepted: 07/12/2024] [Indexed: 07/28/2024] Open
Abstract
The African swine fever virus (ASFV) is an often deadly disease in swine and poses a threat to swine livestock and swine producers. With its complex genome containing more than 150 coding regions, developing effective vaccines for this virus remains a challenge due to a lack of basic knowledge about viral protein function and protein-protein interactions between viral proteins and between viral and host proteins. In this work, we identified ASFV-ASFV protein-protein interactions (PPIs) using artificial intelligence-powered protein structure prediction tools. We benchmarked our PPI identification workflow on the Vaccinia virus, a widely studied nucleocytoplasmic large DNA virus, and found that it could identify gold-standard PPIs that have been validated in vitro in a genome-wide computational screening. We applied this workflow to more than 18,000 pairwise combinations of ASFV proteins and were able to identify seventeen novel PPIs, many of which have corroborating experimental or bioinformatic evidence for their protein-protein interactions, further validating their relevance. Two of these interactions, between I267L and I8L (I267L__I8L) and between B175L and DP79L (B175L__DP79L), are novel PPIs involving viral proteins known to modulate host immune response.
Affiliation(s)
- Jacob A. Fenster
  - Oak Ridge Institute for Science and Education (ORISE), Oak Ridge, TN 37830, USA
  - Plum Island Animal Disease Center, Foreign Animal Disease Research Unit, Agricultural Research Service, U.S. Department of Agriculture, Orient, NY 11957, USA
  - National Bio and Agro-Defense Facility, Foreign Animal Disease Research Unit, Agricultural Research Service, U.S. Department of Agriculture, Manhattan, KS 66502, USA
- Paul A. Azzinaro
  - Plum Island Animal Disease Center, Foreign Animal Disease Research Unit, Agricultural Research Service, U.S. Department of Agriculture, Orient, NY 11957, USA
  - National Bio and Agro-Defense Facility, Foreign Animal Disease Research Unit, Agricultural Research Service, U.S. Department of Agriculture, Manhattan, KS 66502, USA
- Mark Dinhobl
  - Plum Island Animal Disease Center, Foreign Animal Disease Research Unit, Agricultural Research Service, U.S. Department of Agriculture, Orient, NY 11957, USA
  - National Bio and Agro-Defense Facility, Foreign Animal Disease Research Unit, Agricultural Research Service, U.S. Department of Agriculture, Manhattan, KS 66502, USA
- Manuel V. Borca
  - Plum Island Animal Disease Center, Foreign Animal Disease Research Unit, Agricultural Research Service, U.S. Department of Agriculture, Orient, NY 11957, USA
  - National Bio and Agro-Defense Facility, Foreign Animal Disease Research Unit, Agricultural Research Service, U.S. Department of Agriculture, Manhattan, KS 66502, USA
- Edward Spinard
  - Plum Island Animal Disease Center, Foreign Animal Disease Research Unit, Agricultural Research Service, U.S. Department of Agriculture, Orient, NY 11957, USA
  - National Bio and Agro-Defense Facility, Foreign Animal Disease Research Unit, Agricultural Research Service, U.S. Department of Agriculture, Manhattan, KS 66502, USA
- Douglas P. Gladue
  - Plum Island Animal Disease Center, Foreign Animal Disease Research Unit, Agricultural Research Service, U.S. Department of Agriculture, Orient, NY 11957, USA
  - National Bio and Agro-Defense Facility, Foreign Animal Disease Research Unit, Agricultural Research Service, U.S. Department of Agriculture, Manhattan, KS 66502, USA
11
Pawnikar S, Magenheimer BS, Joshi K, Munoz EN, Haldane A, Maser RL, Miao Y. Activation of Polycystin-1 Signaling by Binding of Stalk-derived Peptide Agonists. bioRxiv 2024:2024.01.06.574465. [PMID: 38260358 PMCID: PMC10802338 DOI: 10.1101/2024.01.06.574465] [Indexed: 01/24/2024]
Abstract
Polycystin-1 (PC1) is the membrane protein product of the PKD1 gene whose mutation is responsible for 85% of the cases of autosomal dominant polycystic kidney disease (ADPKD). ADPKD is primarily characterized by the formation of renal cysts and potential kidney failure. PC1 is an atypical G protein-coupled receptor (GPCR) consisting of 11 transmembrane helices and an autocatalytic GAIN domain that cleaves PC1 into extracellular N-terminal (NTF) and membrane-embedded C-terminal (CTF) fragments. Recently, signaling activation of the PC1 CTF was shown to be regulated by a stalk tethered agonist (TA), a distinct mechanism observed in the adhesion GPCR family. A novel allosteric activation pathway was elucidated for the PC1 CTF through a combination of Gaussian accelerated molecular dynamics (GaMD), mutagenesis and cellular signaling experiments. Here, we show that synthetic, soluble peptides with 7 to 21 residues derived from the stalk TA, in particular, peptides including the first 9 residues (p9), 17 residues (p17) and 21 residues (p21) exhibited the ability to re-activate signaling by a stalkless PC1 CTF mutant in cellular assays. To reveal molecular mechanisms of stalk peptide-mediated signaling activation, we have applied a novel Peptide GaMD (Pep-GaMD) algorithm to elucidate binding conformations of selected stalk peptide agonists p9, p17 and p21 to the stalkless PC1 CTF. The simulations revealed multiple specific binding regions of the stalk peptide agonists to the PC1 protein including an "intermediate" bound yet inactive state. Our Pep-GaMD simulation findings were consistent with the cellular assay experimental data. Binding of peptide agonists to the TOP domain of PC1 induced close TOP-putative pore loop interactions, a characteristic feature of the PC1 CTF signaling activation mechanism. 
Using sequence covariation analysis of PC1 homologs, we further showed that the peptide binding regions were consistent with covarying residue pairs identified between the TOP domain and the stalk TA. Therefore, structural dynamic insights into the mechanisms of PC1 activation by stalk-derived peptide agonists have enabled an in-depth understanding of PC1 signaling. They will form a foundation for development of PC1 as a therapeutic target for the treatment of ADPKD.
Affiliation(s)
- Shristi Pawnikar
- Center for Computational Biology and Department of Molecular Biosciences, University of Kansas, Lawrence, KS 66047
| | - Brenda S. Magenheimer
- Clinical Laboratory Sciences, University of Kansas Medical Center, Kansas City, KS 66160
- The Jared Grantham Kidney Institute, University of Kansas Medical Center, Kansas City, KS 66160
| | - Keya Joshi
- Department of Pharmacology and Computational Medicine Program, University of North Carolina – Chapel Hill, Chapel Hill, NC 27599
| | - Ericka Nevarez Munoz
- Clinical Laboratory Sciences, University of Kansas Medical Center, Kansas City, KS 66160
| | - Allan Haldane
- Dept of Physics, and Center for Biophysics and Computational Biology, Temple University, Philadelphia, PA 19122
| | - Robin L. Maser
- Departments of Biochemistry and Molecular Biology, University of Kansas Medical Center, Kansas City, KS 66160
- Clinical Laboratory Sciences, University of Kansas Medical Center, Kansas City, KS 66160
- The Jared Grantham Kidney Institute, University of Kansas Medical Center, Kansas City, KS 66160
| | - Yinglong Miao
- Department of Pharmacology and Computational Medicine Program, University of North Carolina – Chapel Hill, Chapel Hill, NC 27599
| |
|
12
|
Lupo U, Sgarbossa D, Bitbol AF. Pairing interacting protein sequences using masked language modeling. Proc Natl Acad Sci U S A 2024; 121:e2311887121. [PMID: 38913900 PMCID: PMC11228504 DOI: 10.1073/pnas.2311887121] [Received: 08/12/2023] [Accepted: 12/18/2023] [Indexed: 06/26/2024] Open
Abstract
Predicting which proteins interact together from amino acid sequences is an important task. We develop a method to pair interacting protein sequences which leverages the power of protein language models trained on multiple sequence alignments (MSAs), such as MSA Transformer and the EvoFormer module of AlphaFold. We formulate the problem of pairing interacting partners among the paralogs of two protein families in a differentiable way. We introduce a method called Differentiable Pairing using Alignment-based Language Models (DiffPALM) that solves it by exploiting the ability of MSA Transformer to fill in masked amino acids in multiple sequence alignments using the surrounding context. MSA Transformer encodes coevolution between functionally or structurally coupled amino acids within protein chains. It also captures inter-chain coevolution, despite being trained on single-chain data. Relying on MSA Transformer without fine-tuning, DiffPALM outperforms existing coevolution-based pairing methods on difficult benchmarks of shallow multiple sequence alignments extracted from ubiquitous prokaryotic protein datasets. It also outperforms an alternative method based on a state-of-the-art protein language model trained on single sequences. Paired alignments of interacting protein sequences are a crucial ingredient of supervised deep learning methods to predict the three-dimensional structure of protein complexes. Starting from sequences paired by DiffPALM substantially improves the structure prediction of some eukaryotic protein complexes by AlphaFold-Multimer. It also achieves performance competitive with orthology-based pairing.
Affiliation(s)
- Umberto Lupo
- Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne, Lausanne CH-1015, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne CH-1015, Switzerland
| | - Damiano Sgarbossa
- Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne, Lausanne CH-1015, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne CH-1015, Switzerland
| | - Anne-Florence Bitbol
- Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne, Lausanne CH-1015, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne CH-1015, Switzerland
| |
|
13
|
Yang Q, Jin X, Zhou H, Ying J, Zou J, Liao Y, Lu X, Ge S, Yu H, Min X. SurfPro-NN: A 3D point cloud neural network for the scoring of protein-protein docking models based on surfaces features and protein language models. Comput Biol Chem 2024; 110:108067. [PMID: 38714420 DOI: 10.1016/j.compbiolchem.2024.108067] [Received: 01/18/2024] [Revised: 03/18/2024] [Accepted: 04/01/2024] [Indexed: 05/09/2024]
Abstract
Protein-protein interactions (PPI) play a crucial role in numerous key biological processes, and the structure of protein complexes provides valuable clues for in-depth exploration of molecular-level biological processes. Protein-protein docking technology is widely used to simulate the spatial structure of proteins. However, there are still challenges in selecting candidate decoys that closely resemble the native structure from protein-protein docking simulations. In this study, we introduce a docking evaluation method based on three-dimensional point cloud neural networks named SurfPro-NN, which represents protein structures as point clouds and learns interaction information from protein interfaces by applying a point cloud neural network. With the continuous advancement of deep learning in the field of biology, a series of knowledge-rich pre-trained models have emerged. We incorporate protein surface representation models and language models into our approach, greatly enhancing feature representation capabilities and achieving superior performance in protein docking model scoring tasks. Through comprehensive testing on public datasets, we find that our method outperforms state-of-the-art deep learning approaches in protein-protein docking model scoring. Not only does it significantly improve performance, but it also greatly accelerates training speed. This study demonstrates the potential of our approach in addressing protein interaction assessment problems, providing strong support for future research and applications in the field of biology.
Affiliation(s)
- Qianli Yang
- Institute of Artificial Intelligence, XiaMen University, No. 422, Siming South Road, XiaMen, 361005, Fujian, China.
| | - Xiaocheng Jin
- National Institute of Diagnostics and Vaccine Development in Infectious Diseases, XiaMen University, No. 422, Siming South Road, XiaMen, 361005, Fujian, China; State Key Laboratory of Molecular Vaccinology and Molecular Diagnostics, XiaMen University, No. 422, Siming South Road, XiaMen, 361005, Fujian, China; School of Public Health, XiaMen University, No. 422, Siming South Road, XiaMen, 361005, Fujian, China
| | - Haixia Zhou
- School of Public Health, XiaMen University, No. 422, Siming South Road, XiaMen, 361005, Fujian, China
| | - Junjie Ying
- Institute of Artificial Intelligence, XiaMen University, No. 422, Siming South Road, XiaMen, 361005, Fujian, China
| | - JiaJun Zou
- School of Informatics, XiaMen University, No. 422, Siming South Road, XiaMen, 361005, Fujian, China
| | - Yiyang Liao
- School of Informatics, XiaMen University, No. 422, Siming South Road, XiaMen, 361005, Fujian, China
| | - Xiaoli Lu
- Information and Networking Center, XiaMen University, No. 422, Siming South Road, XiaMen, 361005, Fujian, China
| | - Shengxiang Ge
- National Institute of Diagnostics and Vaccine Development in Infectious Diseases, XiaMen University, No. 422, Siming South Road, XiaMen, 361005, Fujian, China; State Key Laboratory of Molecular Vaccinology and Molecular Diagnostics, XiaMen University, No. 422, Siming South Road, XiaMen, 361005, Fujian, China; School of Public Health, XiaMen University, No. 422, Siming South Road, XiaMen, 361005, Fujian, China
| | - Hai Yu
- National Institute of Diagnostics and Vaccine Development in Infectious Diseases, XiaMen University, No. 422, Siming South Road, XiaMen, 361005, Fujian, China; State Key Laboratory of Molecular Vaccinology and Molecular Diagnostics, XiaMen University, No. 422, Siming South Road, XiaMen, 361005, Fujian, China; School of Public Health, XiaMen University, No. 422, Siming South Road, XiaMen, 361005, Fujian, China.
| | - Xiaoping Min
- School of Informatics, XiaMen University, No. 422, Siming South Road, XiaMen, 361005, Fujian, China; National Institute of Diagnostics and Vaccine Development in Infectious Diseases, XiaMen University, No. 422, Siming South Road, XiaMen, 361005, Fujian, China; State Key Laboratory of Molecular Vaccinology and Molecular Diagnostics, XiaMen University, No. 422, Siming South Road, XiaMen, 361005, Fujian, China.
| |
|
14
|
Zhao H, Petrey D, Murray D, Honig B. ZEPPI: Proteome-scale sequence-based evaluation of protein-protein interaction models. Proc Natl Acad Sci U S A 2024; 121:e2400260121. [PMID: 38743624 PMCID: PMC11127014 DOI: 10.1073/pnas.2400260121] [Received: 01/08/2024] [Accepted: 04/18/2024] [Indexed: 05/16/2024] Open
Abstract
We introduce ZEPPI (Z-score Evaluation of Protein-Protein Interfaces), a framework to evaluate structural models of a complex based on sequence coevolution and conservation involving residues in protein-protein interfaces. The ZEPPI score is calculated by comparing metrics for an interface to those obtained from randomly chosen residues. Since contacting residues are defined by the structural model, this obviates the need to account for indirect interactions. Further, although ZEPPI relies on species-paired multiple sequence alignments, its focus on interfacial residues allows it to leverage quite shallow alignments. ZEPPI can be implemented on a proteome-wide scale and is applied here to millions of structural models of dimeric complexes in the Escherichia coli and human interactomes found in the PrePPI database. PrePPI's scoring function is based primarily on the evaluation of protein-protein interfaces, and ZEPPI adds a new feature to this analysis through the incorporation of evolutionary information. ZEPPI performance is evaluated through applications to experimentally determined complexes and to decoys from the CASP-CAPRI experiment. As we discuss, the standard CAPRI scores used to evaluate docking models are based on model quality and not on the ability to give yes/no answers as to whether two proteins interact. ZEPPI is able to detect weak signals from PPI models that the CAPRI scores define as incorrect and, similarly, to identify potential PPIs defined as low confidence by the current PrePPI scoring function. A number of examples that illustrate how the combination of PrePPI and ZEPPI can yield functional hypotheses are provided.
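The Z-score idea in this abstract (an interface metric compared with the same metric computed on randomly chosen residues) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `interface_z_score`, the per-pair score dictionary, and the random-pair sampling scheme are all assumptions made for illustration only.

```python
import random
import statistics


def interface_z_score(pair_scores, interface_pairs, n_random=1000, seed=0):
    """Z-score of the mean score over interface pairs vs. random pair sets.

    pair_scores: dict mapping (i, j) residue pairs to a per-pair metric,
                 e.g. a coevolution score (hypothetical input format).
    interface_pairs: list of (i, j) pairs in contact in the structural model.
    """
    rng = random.Random(seed)
    all_pairs = list(pair_scores)
    k = len(interface_pairs)

    # Observed metric: mean score over the pairs the model places in contact.
    observed = statistics.fmean(pair_scores[p] for p in interface_pairs)

    # Background: mean score of k randomly chosen pairs, repeated n_random times.
    background = [
        statistics.fmean(pair_scores[p] for p in rng.sample(all_pairs, k))
        for _ in range(n_random)
    ]
    mu = statistics.fmean(background)
    sigma = statistics.stdev(background)
    if sigma == 0:
        # Degenerate background (all scores identical): no signal to report.
        return 0.0
    return (observed - mu) / sigma
```

A high positive value would indicate that the modeled interface scores well above what random residue pairs achieve; how the real method defines its metrics and background is described in the paper itself.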
Affiliation(s)
- Haiqing Zhao
- Department of Systems Biology, Columbia University Irving Medical Center, New York, NY 10032
| | - Donald Petrey
- Department of Systems Biology, Columbia University Irving Medical Center, New York, NY 10032
| | - Diana Murray
- Department of Systems Biology, Columbia University Irving Medical Center, New York, NY 10032
| | - Barry Honig
- Department of Systems Biology, Columbia University Irving Medical Center, New York, NY 10032
- Department of Biochemistry and Molecular Biophysics, Columbia University Irving Medical Center, New York, NY 10032
- Department of Medicine, Columbia University, New York, NY 10032
- Zuckerman Institute, Columbia University, New York, NY 10027
| |
|
15
|
Serra Moncadas L, Hofer C, Bulzu PA, Pernthaler J, Andrei AS. Freshwater genome-reduced bacteria exhibit pervasive episodes of adaptive stasis. Nat Commun 2024; 15:3421. [PMID: 38653968 PMCID: PMC11039613 DOI: 10.1038/s41467-024-47767-7] [Received: 07/18/2023] [Accepted: 04/10/2024] [Indexed: 04/25/2024] Open
Abstract
The emergence of bacterial species is rooted in their inherent potential for continuous evolution and adaptation to an ever-changing ecological landscape. The adaptive capacity of most species frequently resides within the repertoire of genes encoding the secreted proteome (SP), as it serves as a primary interface used to regulate survival/reproduction strategies. Here, by applying evolutionary genomics approaches to metagenomics data, we show that abundant freshwater bacteria exhibit biphasic adaptation states linked to the eco-evolutionary processes governing their genome sizes. While species with average to large genomes adhere to the dominant paradigm of evolution through niche adaptation by reducing the evolutionary pressure on their SPs (via the augmentation of functionally redundant genes that buffer mutational fitness loss) and increasing the phylogenetic distance of recombination events, most of the genome-reduced species exhibit a nonconforming state. In contrast, their SPs reflect a combination of low functional redundancy and high selection pressure, resulting in significantly higher levels of conservation and invariance. Our findings indicate that although niche adaptation is the principal mechanism driving speciation, freshwater genome-reduced bacteria often experience extended periods of adaptive stasis. Understanding the adaptive state of microbial species will lead to a better comprehension of their spatiotemporal dynamics, biogeography, and resilience to global change.
Affiliation(s)
- Lucas Serra Moncadas
- Limnological Station, Department of Plant and Microbial Biology, University of Zurich, Kilchberg, Switzerland
| | - Cyrill Hofer
- Limnological Station, Department of Plant and Microbial Biology, University of Zurich, Kilchberg, Switzerland
| | - Paul-Adrian Bulzu
- Department of Aquatic Microbial Ecology, Institute of Hydrobiology, Biology Centre of the Czech Academy of Sciences, České Budějovice, Czech Republic
| | - Jakob Pernthaler
- Limnological Station, Department of Plant and Microbial Biology, University of Zurich, Kilchberg, Switzerland
| | - Adrian-Stefan Andrei
- Limnological Station, Department of Plant and Microbial Biology, University of Zurich, Kilchberg, Switzerland.
| |
|
16
|
Humphreys IR, Zhang J, Baek M, Wang Y, Krishnakumar A, Pei J, Anishchenko I, Tower CA, Jackson BA, Warrier T, Hung DT, Peterson SB, Mougous JD, Cong Q, Baker D. Essential and virulence-related protein interactions of pathogens revealed through deep learning. bioRxiv 2024:2024.04.12.589144. [PMID: 38645026 PMCID: PMC11030334 DOI: 10.1101/2024.04.12.589144] [Indexed: 04/23/2024]
Abstract
Identification of bacterial protein-protein interactions and predicting the structures of the complexes could aid in the understanding of pathogenicity mechanisms and developing treatments for infectious diseases. Here, we developed a deep learning-based pipeline that leverages residue-residue coevolution and protein structure prediction to systematically identify and structurally characterize protein-protein interactions at the proteome-wide scale. Using this pipeline, we searched through 78 million pairs of proteins across 19 human bacterial pathogens and identified 1923 confidently predicted complexes involving essential genes and 256 involving virulence factors. Many of these complexes were not previously known; we experimentally tested 12 such predictions, and half of them were validated. The predicted interactions span core metabolic and virulence pathways ranging from post-transcriptional modification to acid neutralization to outer membrane machinery and should contribute to our understanding of the biology of these important pathogens and the design of drugs to combat them.
|
17
|
Si Y, Yan C. Protein language model-embedded geometric graphs power inter-protein contact prediction. eLife 2024; 12:RP92184. [PMID: 38564241 PMCID: PMC10987090 DOI: 10.7554/elife.92184] [Indexed: 04/04/2024] Open
Abstract
Accurate prediction of contacting residue pairs between interacting proteins is very useful for structural characterization of protein-protein interactions. Although inter-protein contact prediction has improved significantly in recent years, there is still considerable room for improving prediction accuracy. Here we present a new deep learning method referred to as PLMGraph-Inter for inter-protein contact prediction. Specifically, we employ rotationally and translationally invariant geometric graphs obtained from structures of interacting proteins to integrate multiple protein language models, which are successively transformed by graph encoders formed by geometric vector perceptrons and residual networks formed by dimensional hybrid residual blocks to predict inter-protein contacts. Extensive evaluation on multiple test sets illustrates that PLMGraph-Inter outperforms five top inter-protein contact prediction methods, including DeepHomo, GLINTER, CDPred, DeepHomo2, and DRN-1D2D_Inter, by large margins. In addition, we show that the predictions of PLMGraph-Inter can complement the results of AlphaFold-Multimer. Finally, we show that leveraging the contacts predicted by PLMGraph-Inter as constraints for protein-protein docking can dramatically improve its performance for protein complex structure prediction.
Affiliation(s)
- Yunda Si
- School of Physics, Huazhong University of Science and Technology, Wuhan, China
| | - Chengfei Yan
- School of Physics, Huazhong University of Science and Technology, Wuhan, China
| |
|
18
|
Zhang J, Durham J, Cong Q. Revolutionizing protein-protein interaction prediction with deep learning. Curr Opin Struct Biol 2024; 85:102775. [PMID: 38330793 DOI: 10.1016/j.sbi.2024.102775] [Received: 10/13/2023] [Revised: 12/31/2023] [Accepted: 01/05/2024] [Indexed: 02/10/2024]
Abstract
Protein-protein interactions (PPIs) are pivotal for driving diverse biological processes, and any disturbance in these interactions can lead to disease. Thus, the study of PPIs has been a central focus in biology. Recent developments in deep learning methods, coupled with the vast genomic sequence data, have significantly boosted the accuracy of predicting protein structures and modeling protein complexes, approaching levels comparable to experimental techniques. Herein, we review the latest advances in the computational methods for modeling 3D protein complexes and the prediction of protein interaction partners, emphasizing the application of deep learning methods deriving from coevolution analysis. The review also highlights biomedical applications of PPI prediction and outlines challenges in the field.
Affiliation(s)
- Jing Zhang
- Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, TX, USA; Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, USA; Harold C. Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center, Dallas, TX, USA. https://twitter.com/jzhang_genome
| | - Jesse Durham
- Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, TX, USA; Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, USA; Harold C. Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center, Dallas, TX, USA
| | - Qian Cong
- Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, TX, USA; Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, USA; Harold C. Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center, Dallas, TX, USA.
| |
|
19
|
Zheng J, Dong C, Xiong S. Mycobacterial Rv1804c binds to the PEST domain of IκBα and activates macrophage-mediated proinflammatory responses. iScience 2024; 27:109101. [PMID: 38384838 PMCID: PMC10879709 DOI: 10.1016/j.isci.2024.109101] [Received: 09/30/2023] [Revised: 12/18/2023] [Accepted: 01/30/2024] [Indexed: 02/23/2024] Open
Abstract
Recognition of the components of Mycobacterium tuberculosis (Mtb) by macrophages is vital for initiating a cascade of host immune responses. However, the recognition of Mtb-secretory proteins by the receptor-independent pathways of the host remains unclear. Rv1804c is a highly conserved secretory protein in Mtb. However, its exact function and underlying mechanism in Mtb infection remain poorly understood. In the present study, we observed that Rv1804c activates macrophage-mediated proinflammatory responses in an IKKα-independent manner. Furthermore, we noted that Rv1804c inhibits mycobacterial survival. By elucidating the underlying mechanisms, we observed that Rv1804c activates IκBα by directly interacting with its PEST domain. Moreover, Rv1804c was enriched in attenuated but not in virulent mycobacteria and associated with the disease process of tuberculosis. Our findings provide an alternative pathway via which a mycobacterial secretory protein activates macrophage-mediated proinflammatory responses. Our study findings may shed light on the prevention and treatment of tuberculosis.
Affiliation(s)
- Jianjian Zheng
- Jiangsu Key Laboratory of Infection and Immunity, Institutes of Biology and Medical Sciences, Soochow University, Suzhou 215123, China
| | - Chunsheng Dong
- Jiangsu Key Laboratory of Infection and Immunity, Institutes of Biology and Medical Sciences, Soochow University, Suzhou 215123, China
| | - Sidong Xiong
- Jiangsu Key Laboratory of Infection and Immunity, Institutes of Biology and Medical Sciences, Soochow University, Suzhou 215123, China
| |
|
20
|
Fang T, Szklarczyk D, Hachilif R, von Mering C. Enhancing coevolutionary signals in protein-protein interaction prediction through clade-wise alignment integration. Sci Rep 2024; 14:6009. [PMID: 38472223 PMCID: PMC10933411 DOI: 10.1038/s41598-024-55655-9] [Received: 12/06/2023] [Accepted: 02/26/2024] [Indexed: 03/14/2024] Open
Abstract
Protein-protein interactions (PPIs) play essential roles in most biological processes. The binding interfaces between interacting proteins impose evolutionary constraints that have successfully been employed to predict PPIs from multiple sequence alignments (MSAs). To construct MSAs, critical choices have to be made: how to ensure the reliable identification of orthologs, and how to optimally balance the need for large alignments versus sufficient alignment quality. Here, we propose a divide-and-conquer strategy for MSA generation: instead of building a single, large alignment for each protein, multiple distinct alignments are constructed under distinct clades in the tree of life. Coevolutionary signals are searched separately within these clades, and are only subsequently integrated using machine learning techniques. We find that this strategy markedly improves overall prediction performance, concomitant with better alignment quality. Using the popular DCA algorithm to systematically search pairs of such alignments, a genome-wide all-against-all interaction scan in a bacterial genome is demonstrated. Given the recent successes of AlphaFold in predicting direct PPIs at atomic detail, a discover-and-refine approach is proposed: our method could provide a fast and accurate strategy for pre-screening the entire genome, submitting to AlphaFold only promising interaction candidates, thus reducing false positives as well as computation time.
Affiliation(s)
- Tao Fang
- Department of Molecular Life Sciences, University of Zurich, 8057, Zurich, Switzerland
- SIB Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
| | - Damian Szklarczyk
- Department of Molecular Life Sciences, University of Zurich, 8057, Zurich, Switzerland
- SIB Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
| | - Radja Hachilif
- Department of Molecular Life Sciences, University of Zurich, 8057, Zurich, Switzerland
- SIB Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
| | - Christian von Mering
- Department of Molecular Life Sciences, University of Zurich, 8057, Zurich, Switzerland.
- SIB Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland.
| |
|
21
|
Lei ZC, Wang X, Yang L, Qu H, Sun Y, Yang Y, Li W, Zhang WB, Cao XY, Fan C, Li G, Wu J, Tian ZQ. What can molecular assembly learn from catalysed assembly in living organisms? Chem Soc Rev 2024; 53:1892-1914. [PMID: 38230701 DOI: 10.1039/d3cs00634d] [Indexed: 01/18/2024]
Abstract
Molecular assembly is the process of organizing individual molecules into larger structures and complex systems. The self-assembly approach is predominantly utilized in creating artificial molecular assemblies, and was believed to be the primary mode of molecular assembly in living organisms as well. However, it has been shown that the assembly of many biological complexes is "catalysed" by other molecules, rather than relying solely on self-assembly. In this review, we summarize these catalysed-assembly (catassembly) phenomena in living organisms and systematically analyse their mechanisms. We then expand on these phenomena and discuss related concepts, including catalysed-disassembly and catalysed-reassembly. Catassembly proves to be an efficient and highly selective strategy for synergistically controlling and manipulating various noncovalent interactions, especially in hierarchical molecular assemblies. Overreliance on self-assembly may, to some extent, hinder the advancement of artificial molecular assembly with powerful features. Furthermore, inspired by the biological catassembly phenomena, we propose guidelines for designing artificial catassembly systems and developing characterization and theoretical methods, and review pioneering works along this new direction. Overall, this approach may broaden and deepen our understanding of molecular assembly, enabling the construction and control of intelligent assembly systems with advanced functionality.
Affiliation(s)
- Zhi-Chao Lei
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (iChEM), Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China.
- National Laboratory of Biomacromolecules, CAS Center for Excellence in Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, P. R. China
- University of Chinese Academy of Sciences, Beijing 100049, P. R. China
| | - Xinchang Wang
- School of Electronic Science and Engineering, State Key Laboratory of Physical Chemistry of Solid Surfaces, Xiamen University, Xiamen 361005, P. R. China
| | - Liulin Yang
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (iChEM), Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China.
| | - Hang Qu
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (iChEM), Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China.
| | - Yibin Sun
- Beijing National Laboratory for Molecular Sciences, Key Laboratory of Polymer Chemistry & Physics of Ministry of Education, Center for Soft Matter Science and Engineering, College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, P. R. China
| | - Yang Yang
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (iChEM), Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China.
| | - Wei Li
- National Laboratory of Biomacromolecules, CAS Center for Excellence in Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, P. R. China
- University of Chinese Academy of Sciences, Beijing 100049, P. R. China
| | - Wen-Bin Zhang
- Beijing National Laboratory for Molecular Sciences, Key Laboratory of Polymer Chemistry & Physics of Ministry of Education, Center for Soft Matter Science and Engineering, College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, P. R. China
| | - Xiao-Yu Cao
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (iChEM), Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China.
| | - Chunhai Fan
- School of Chemistry and Chemical Engineering, Frontiers Science Center for Transformative Molecules and National Center for Translational Medicine, Shanghai Jiao Tong University, Shanghai 200240, P. R. China
| | - Guohong Li
- National Laboratory of Biomacromolecules, CAS Center for Excellence in Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, P. R. China
- University of Chinese Academy of Sciences, Beijing 100049, P. R. China
| | - Jiarui Wu
- Key Laboratory of Systems Biology, Center for Excellence in Molecular Cell Science, Shanghai Institute of Biochemistry and Cell Biology, Chinese Academy of Sciences, Shanghai, 200031, P. R. China
- School of Life Science and Technology, ShanghaiTech University, Shanghai, 201210, P. R. China
- Key Laboratory of Systems Health Science of Zhejiang Province, School of Life Science, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Hangzhou, 310024, P. R. China
| | - Zhong-Qun Tian
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (iChEM), Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China.
| |
|
22
|
Gómez Borrego J, Torrent Burgas M. Structural assembly of the bacterial essential interactome. eLife 2024; 13:e94919. [PMID: 38226900 PMCID: PMC10863985 DOI: 10.7554/elife.94919] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2023] [Accepted: 12/22/2023] [Indexed: 01/17/2024] Open
Abstract
The study of protein interactions in living organisms is fundamental for understanding biological processes and central metabolic pathways. Yet, our knowledge of the bacterial interactome remains limited. Here, we combined gene deletion mutant analysis with deep-learning protein folding using AlphaFold2 to predict the core bacterial essential interactome. We predicted and modeled 1402 interactions between essential proteins in bacteria and generated 146 high-accuracy models. Our analysis reveals previously unknown details about the assembly mechanisms of these complexes, highlighting the importance of specific structural features in their stability and function. Our work provides a framework for predicting the essential interactomes of bacteria and highlights the potential of deep-learning algorithms in advancing our understanding of the complex biology of living organisms. The results presented here also offer a promising approach to identify novel antibiotic targets.
Affiliation(s)
- Jordi Gómez Borrego
- Systems Biology of Infection Lab, Department of Biochemistry and Molecular Biology, Biosciences Faculty, Universitat Autònoma de Barcelona, Cerdanyola del Vallès, Spain
| | - Marc Torrent Burgas
- Systems Biology of Infection Lab, Department of Biochemistry and Molecular Biology, Biosciences Faculty, Universitat Autònoma de Barcelona, Cerdanyola del Vallès, Spain
| |
|
23
|
Pei J, Zhang J, Cong Q. Computational analysis of protein-protein interactions of cancer drivers in renal cell carcinoma. FEBS Open Bio 2024; 14:112-126. [PMID: 37964489 PMCID: PMC10761929 DOI: 10.1002/2211-5463.13732] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2023] [Revised: 10/30/2023] [Accepted: 11/06/2023] [Indexed: 11/16/2023] Open
Abstract
Renal cell carcinoma (RCC) is the most common type of kidney cancer with rising cases in recent years. Extensive research has identified various cancer driver proteins associated with different subtypes of RCC. Most RCC drivers are encoded by tumor suppressor genes and exhibit enrichment in functional categories such as protein degradation, chromatin remodeling, and transcription. To further our understanding of RCC, we utilized powerful deep-learning methods based on AlphaFold to predict protein-protein interactions (PPIs) involving RCC drivers. We predicted high-confidence complexes formed by various RCC drivers, including TCEB1, KMT2C/D and KDM6A of the COMPASS-related complexes, TSC1 of the MTOR pathway, and TRRAP. These predictions provide valuable structural insights into the interaction interfaces, some of which are promising targets for cancer drug design, such as the NRF2-MAFK interface. Cancer somatic missense mutations from large datasets of genome sequencing of RCCs were mapped to the interfaces of predicted and experimental structures of PPIs involving RCC drivers, and their effects on the binding affinity were evaluated. We observed more than 100 cancer somatic mutations affecting the binding affinity of complexes formed by key RCC drivers such as VHL and TCEB1. These findings emphasize the importance of these mutations in RCC pathogenesis and potentially offer new avenues for targeted therapies.
Affiliation(s)
- Jimin Pei
- Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Harold C. Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center, Dallas, TX, USA
| | - Jing Zhang
- Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Harold C. Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center, Dallas, TX, USA
| | - Qian Cong
- Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Harold C. Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center, Dallas, TX, USA
| |
|
24
|
Siebenmorgen T, Saremi Nanji Y, Zacharias M. Refinement of Docked Protein-Protein Complexes Using Repulsive Scaling Replica Exchange Simulations. Methods Mol Biol 2024; 2780:289-302. [PMID: 38987474 DOI: 10.1007/978-1-0716-3985-6_15] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/12/2024]
Abstract
Accurate prediction and evaluation of protein-protein complex structures is of major importance to understand the cellular interactome. Predicted complex structures based on deep learning approaches or traditional docking methods often require structural refinement and rescoring for realistic evaluation. Standard molecular dynamics (MD) simulations are time-consuming and often do not structurally improve docking solutions. Better refinement can be achieved with our recently developed replica-exchange-based scheme employing different levels of repulsive biasing between proteins in each replica simulation (RS-REMD). The bias acts specifically on the intermolecular interactions based on an increase in effective pairwise van der Waals radii without changing interactions within each protein or with the solvent. It allows for an improvement of the predicted protein-protein complex structure and simultaneous realistic free energy scoring of protein-protein complexes. The setup of RS-REMD simulations is described in detail, including its application to two examples (all necessary scripts and input files can be obtained from https://gitlab.com/TillCyrill/mmib).
Affiliation(s)
- Till Siebenmorgen
- Technical University of Munich, Physics Department and Center of Functional Protein Assemblies, Garching, Germany
| | - Yasmin Saremi Nanji
- Technical University of Munich, Physics Department and Center of Functional Protein Assemblies, Garching, Germany
| | - Martin Zacharias
- Technical University of Munich, Physics Department and Center of Functional Protein Assemblies, Garching, Germany.
| |
|
25
|
Xia Y, Zhao K, Liu D, Zhou X, Zhang G. Multi-domain and complex protein structure prediction using inter-domain interactions from deep learning. Commun Biol 2023; 6:1221. [PMID: 38040847 PMCID: PMC10692239 DOI: 10.1038/s42003-023-05610-7] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 05/19/2023] [Accepted: 11/20/2023] [Indexed: 12/03/2023] Open
Abstract
Accurately capturing domain-domain interactions is key to understanding protein function and designing structure-based drugs. Although AlphaFold2 has achieved a breakthrough in single-domain structure prediction, structure modeling of multi-domain proteins and complexes remains a challenge. In this study, we developed a multi-domain and complex structure assembly protocol, named DeepAssembly, based on domain segmentation and single-domain modeling algorithms. Firstly, DeepAssembly uses a population-based evolutionary algorithm to assemble multi-domain proteins by inter-domain interactions inferred from a developed deep learning network. Secondly, protein complexes are assembled by means of domains rather than chains using DeepAssembly. Experimental results show that on 219 multi-domain proteins, the average inter-domain distance precision by DeepAssembly is 22.7% higher than that of AlphaFold2. Moreover, DeepAssembly improves accuracy by 13.1% for 164 multi-domain structures with low confidence deposited in the AlphaFold database. We apply DeepAssembly to the prediction of 247 heterodimers. We find that DeepAssembly successfully predicts the interface (DockQ ≥ 0.23) for 32.4% of the dimers, suggesting a lighter way to assemble complex structures by treating domains as assembly units and using inter-domain interactions learned from monomer structures.
Affiliation(s)
- Yuhao Xia
- College of Information Engineering, Zhejiang University of Technology, HangZhou, 310023, China
| | - Kailong Zhao
- College of Information Engineering, Zhejiang University of Technology, HangZhou, 310023, China
| | - Dong Liu
- College of Information Engineering, Zhejiang University of Technology, HangZhou, 310023, China
| | - Xiaogen Zhou
- College of Information Engineering, Zhejiang University of Technology, HangZhou, 310023, China
| | - Guijun Zhang
- College of Information Engineering, Zhejiang University of Technology, HangZhou, 310023, China.
| |
|
26
|
Kewalramani N, Emili A, Crovella M. State-of-the-art computational methods to predict protein-protein interactions with high accuracy and coverage. Proteomics 2023; 23:e2200292. [PMID: 37401192 DOI: 10.1002/pmic.202200292] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/02/2023] [Revised: 05/24/2023] [Accepted: 06/09/2023] [Indexed: 07/05/2023]
Abstract
Prediction of protein-protein interactions (PPIs) commonly involves a significant computational component. Rapid recent advances in the power of computational methods for protein interaction prediction motivate a review of the state-of-the-art. We review the major approaches, organized according to the primary source of data utilized: protein sequence, protein structure, and protein co-abundance. The advent of deep learning (DL) has brought with it significant advances in interaction prediction, and we show how DL is used for each source data type. We review the literature taxonomically, present example case studies in each category, and conclude with observations about the strengths and weaknesses of machine learning methods in the context of the principal sources of data for protein interaction prediction.
Affiliation(s)
- Neal Kewalramani
- Program in Bioinformatics, Boston University, Boston, Massachusetts, USA
| | - Andrew Emili
- OHSU Knight Cancer Institute, Portland, Oregon, USA
| | - Mark Crovella
- Department of Computer Science and Program in Bioinformatics, Boston University, Boston, Massachusetts, USA
| |
|
27
|
Kilian M, Bischofs IB. Co-evolution at protein-protein interfaces guides inference of stoichiometry of oligomeric protein complexes by de novo structure prediction. Mol Microbiol 2023; 120:763-782. [PMID: 37777474 DOI: 10.1111/mmi.15169] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/31/2023] [Revised: 09/10/2023] [Accepted: 09/11/2023] [Indexed: 10/02/2023]
Abstract
The quaternary structure with specific stoichiometry is pivotal to the specific function of protein complexes. However, determining the structure of many protein complexes experimentally remains a major bottleneck. Structural bioinformatics approaches, such as the deep learning algorithm AlphaFold2-Multimer (AF2-multimer), leverage the co-evolution of amino acids and sequence-structure relationships for accurate de novo structure and contact prediction. Pseudo-likelihood maximization direct coupling analysis (plmDCA) has been used to detect co-evolving residue pairs by statistical modeling. Here, we provide evidence that combining both methods can be used for de novo prediction of the quaternary structure and stoichiometry of a protein complex. We achieve this by augmenting the existing AF2-multimer confidence metrics with an interpretable score to identify the complex with an optimal fraction of native contacts of co-evolving residue pairs at intermolecular interfaces. We use this strategy to predict the quaternary structure and non-trivial stoichiometries of Bacillus subtilis spore germination protein complexes with unknown structures. Co-evolution at intermolecular interfaces may therefore synergize with AI-based de novo quaternary structure prediction of structurally uncharacterized bacterial protein complexes.
Affiliation(s)
- Max Kilian
- Max-Planck-Institute for Terrestrial Microbiology, Marburg, Germany
- BioQuant Center for Quantitative Analysis of Molecular and Cellular Biosystems, Heidelberg University, Heidelberg, Germany
- Center for Molecular Biology of Heidelberg University (ZMBH), Heidelberg, Germany
| | - Ilka B Bischofs
- Max-Planck-Institute for Terrestrial Microbiology, Marburg, Germany
- BioQuant Center for Quantitative Analysis of Molecular and Cellular Biosystems, Heidelberg University, Heidelberg, Germany
- Center for Molecular Biology of Heidelberg University (ZMBH), Heidelberg, Germany
| |
|
28
|
Hou Y, Xie T, He L, Tao L, Huang J. Topological links in predicted protein complex structures reveal limitations of AlphaFold. Commun Biol 2023; 6:1098. [PMID: 37898666 PMCID: PMC10613300 DOI: 10.1038/s42003-023-05489-4] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 02/21/2023] [Accepted: 10/19/2023] [Indexed: 10/30/2023] Open
Abstract
AlphaFold is making great progress in protein structure prediction, not only for single-chain proteins but also for multi-chain protein complexes. When using AlphaFold-Multimer to predict protein‒protein complexes, we observed some unusual structures in which chains are looped around each other to form topologically intertwining links at the interface. Based on physical principles, such topological links should generally not exist in native protein complex structures unless covalent modifications of residues are involved. Although it is well known and has been well studied that protein structures may have topologically complex shapes such as knots and links, existing methods are hampered by the chain closure problem and show poor performance in identifying topologically linked structures in protein‒protein complexes. Therefore, we address the chain closure problem by using sliding windows from a local perspective and propose an algorithm to measure the topological-geometric features that can be used to identify topologically linked structures. An application of the method to AlphaFold-Multimer-predicted protein complex structures finds that approximately 1.72% of the predicted structures contain topological links. The method presented in this work will facilitate the computational study of protein‒protein interactions and help further improve the structural prediction of multi-chain protein complexes.
Affiliation(s)
- Yingnan Hou
- Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, 18 Shilongshan Road, Hangzhou, 310024, Zhejiang, China
- Westlake AI Therapeutics Lab, Westlake Laboratory of Life Sciences and Biomedicine, 18 Shilongshan Road, Hangzhou, 310024, Zhejiang, China
| | - Tengyu Xie
- Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, 18 Shilongshan Road, Hangzhou, 310024, Zhejiang, China
- Westlake AI Therapeutics Lab, Westlake Laboratory of Life Sciences and Biomedicine, 18 Shilongshan Road, Hangzhou, 310024, Zhejiang, China
| | - Liuqing He
- Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, 18 Shilongshan Road, Hangzhou, 310024, Zhejiang, China
- Center for Infectious Disease Research, Westlake Laboratory of Life Sciences and Biomedicine, 18 Shilongshan Road, Hangzhou, 310024, Zhejiang, China
| | - Liang Tao
- Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, 18 Shilongshan Road, Hangzhou, 310024, Zhejiang, China
- Center for Infectious Disease Research, Westlake Laboratory of Life Sciences and Biomedicine, 18 Shilongshan Road, Hangzhou, 310024, Zhejiang, China
| | - Jing Huang
- Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, 18 Shilongshan Road, Hangzhou, 310024, Zhejiang, China.
- Westlake AI Therapeutics Lab, Westlake Laboratory of Life Sciences and Biomedicine, 18 Shilongshan Road, Hangzhou, 310024, Zhejiang, China.
| |
|
29
|
Jia K, Kilinc M, Jernigan RL. New alignment method for remote protein sequences by the direct use of pairwise sequence correlations and substitutions. Front Bioinform 2023; 3:1227193. [PMID: 37900964 PMCID: PMC10602800 DOI: 10.3389/fbinf.2023.1227193] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/22/2023] [Accepted: 08/14/2023] [Indexed: 10/31/2023] Open
Abstract
Understanding protein sequences and how they relate to the functions of proteins is extremely important. Sequence alignment is one of the most basic operations in bioinformatics, and usually the first thing learned from an alignment is which positions are the most conserved; these are often critical parts of the structure, such as enzyme active site residues. In addition, the contact pairs in a protein usually correspond closely to the correlations between residue positions in the multiple sequence alignment, and these usually change in a systematic and coordinated way: if one position changes, then the other member of the pair also changes to compensate. In the present work, these correlated pairs are taken as anchor points for a new type of sequence alignment. The main advantage of the method is that it combines the remote homolog detection of our method PROST with the pairwise sequence substitutions of the rigorous method from Kleinjung et al. We show a few examples of the resulting sequence alignments and how they can lead to improvements in alignments for function, even for a disordered protein.
Affiliation(s)
- Kejue Jia
- Roy J. Carver Department of Biochemistry, Biophysics, and Molecular Biology, Iowa State University, Ames, IA, United States
| | - Mesih Kilinc
- Roy J. Carver Department of Biochemistry, Biophysics, and Molecular Biology, Iowa State University, Ames, IA, United States
- Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA, United States
| | - Robert L. Jernigan
- Roy J. Carver Department of Biochemistry, Biophysics, and Molecular Biology, Iowa State University, Ames, IA, United States
- Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA, United States
| |
|
30
|
van Keulen SC, Bonvin AMJJ. Improving the quality of co-evolution intermolecular contact prediction with DisVis. Proteins 2023; 91:1407-1416. [PMID: 37237441 DOI: 10.1002/prot.26514] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2022] [Revised: 03/29/2023] [Accepted: 04/19/2023] [Indexed: 05/28/2023]
Abstract
The steep rise in protein sequences and structures has paved the way for bioinformatics approaches to predict residue-residue interactions in protein complexes. Multiple sequence alignments are commonly used in contact predictions to identify co-evolving residues. These contacts, however, often include false positives (FPs), which may impair their use to predict three-dimensional structures of biomolecular complexes and affect the accuracy of the generated models. Previously, we developed DisVis to identify FPs in mass spectrometry cross-linking data. DisVis makes it possible to assess the accessible interaction space between two proteins consistent with a set of distance restraints. Here, we investigate whether a similar approach could be applied to co-evolution predicted contacts in order to improve their precision prior to using them for modeling. We analyze co-evolution contact predictions with DisVis for a set of 26 protein-protein complexes. The DisVis-reranked and the original co-evolution contacts are then used to model the complexes with our integrative docking software HADDOCK using different filtering scenarios. Our results show that HADDOCK is robust with respect to the precision of the predicted contacts due to the 50% random contact removal during docking and can enhance the quality of docking predictions when combined with DisVis filtering for low-precision contact data. DisVis can thus have a beneficial effect on low-quality data, but overall HADDOCK can accommodate FP restraints without negatively impacting the quality of the resulting models. Other, more precision-sensitive docking protocols might, however, benefit from the increased precision of the predicted contacts after DisVis filtering.
Affiliation(s)
- Siri C van Keulen
- Bijvoet Centre for Biomolecular Research, Faculty of Science - Chemistry, Utrecht University, Utrecht, the Netherlands
| | - Alexandre M J J Bonvin
- Bijvoet Centre for Biomolecular Research, Faculty of Science - Chemistry, Utrecht University, Utrecht, the Netherlands
| |
|
31
|
Shome S, Jia K, Sivasankar S, Jernigan RL. Characterizing interactions in E-cadherin assemblages. Biophys J 2023; 122:3069-3077. [PMID: 37345249 PMCID: PMC10432173 DOI: 10.1016/j.bpj.2023.06.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/21/2022] [Revised: 09/26/2022] [Accepted: 06/14/2023] [Indexed: 06/23/2023] Open
Abstract
Cadherin intermolecular interactions are critical for cell-cell adhesion and play essential roles in tissue formation and the maintenance of tissue structures. In this study, we focus on E-cadherin, a classical cadherin that connects epithelial cells, to understand how E-cadherin molecules interact in cis and trans conformations, that is, when attached to the same cell or to opposing cells. We employ coevolutionary sequence analysis and molecular dynamics simulations to confirm previously known interaction sites as well as to identify new interaction sites. The sequence coevolutionary analysis yields a surprising result: there are no strongly favored intermolecular interaction sites, which is unusual and suggests that many interaction sites may be possible, with none being strongly preferred over others. By using molecular dynamics, we test the persistence of these interactions and how they facilitate adhesion. We build several types of cadherin assemblages, with different numbers and combinations of cis and trans interfaces, to understand how these conformations act to facilitate adhesion. Our results suggest that, in addition to the established interaction sites on the EC1 and EC2 domains, an additional plausible cis interface at the EC3-EC5 domain exists. Furthermore, we identify specific mutations at cis/trans binding sites that impair adhesion within E-cadherin assemblages.
Affiliation(s)
- Sayane Shome
- Roy J. Carver Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, Iowa
| | - Kejue Jia
- Roy J. Carver Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, Iowa
| | - Sanjeevi Sivasankar
- Department of Biomedical Engineering, University of California, Davis, Davis, California
| | - Robert L Jernigan
- Roy J. Carver Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, Iowa.
| |
|
32
|
Chen YH, Chao KH, Wong JY, Liu CF, Leu JY, Tsai HK. A feature extraction free approach for protein interactome inference from co-elution data. Brief Bioinform 2023; 24:bbad229. [PMID: 37328692 DOI: 10.1093/bib/bbad229] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2022] [Revised: 05/01/2023] [Accepted: 05/29/2023] [Indexed: 06/18/2023] Open
Abstract
Protein complexes are key functional units in cellular processes. High-throughput techniques, such as co-fractionation coupled with mass spectrometry (CF-MS), have advanced protein complex studies by enabling global interactome inference. However, dealing with complex fractionation characteristics to define true interactions is not a simple task, since CF-MS is prone to false positives due to the co-elution of non-interacting proteins by chance. Several computational methods have been designed to analyze CF-MS data and construct probabilistic protein-protein interaction (PPI) networks. Current methods usually first infer PPIs based on handcrafted CF-MS features, and then use clustering algorithms to form potential protein complexes. While powerful, these methods suffer from the potential bias of handcrafted features based on domain knowledge and tend to overfit because of the severely imbalanced PPI data. To address these issues, we present a balanced end-to-end learning architecture, Software for Prediction of Interactome with Feature-extraction Free Elution Data (SPIFFED), to integrate feature representation from raw CF-MS data and interactome prediction by convolutional neural network. SPIFFED outperforms the state-of-the-art methods in predicting PPIs under the conventional imbalanced training. When trained with balanced data, SPIFFED had greatly improved sensitivity for true PPIs. Moreover, the ensemble SPIFFED model provides different voting schemes to integrate predicted PPIs from multiple CF-MS data. Using the clustering software (i.e. ClusterONE), SPIFFED allows users to infer high-confidence protein complexes depending on the CF-MS experimental designs. The source code of SPIFFED is freely available at: https://github.com/bio-it-station/SPIFFED.
Affiliation(s)
- Yu-Hsin Chen
- Bioinformatics Program, Taiwan International Graduate Program, National Taiwan University, Taipei 106, Taiwan
- Bioinformatics Program, Taiwan International Graduate Program, Academia Sinica, Taipei 11529, Taiwan
- Institute of Information Science, Academia Sinica, Taipei, 11529, Taiwan
| | - Kuan-Hao Chao
- Institute of Information Science, Academia Sinica, Taipei, 11529, Taiwan
| | - Jin Yung Wong
- Institute of Information Science, Academia Sinica, Taipei, 11529, Taiwan
| | - Chien-Fu Liu
- Institute of Molecular Biology, Academia Sinica, Taipei, 11529, Taiwan
| | - Jun-Yi Leu
- Institute of Molecular Biology, Academia Sinica, Taipei, 11529, Taiwan
| | - Huai-Kuang Tsai
- Bioinformatics Program, Taiwan International Graduate Program, National Taiwan University, Taipei 106, Taiwan
- Bioinformatics Program, Taiwan International Graduate Program, Academia Sinica, Taipei 11529, Taiwan
- Institute of Information Science, Academia Sinica, Taipei, 11529, Taiwan
| |
|
33
|
Lynch M. Mutation pressure, drift, and the pace of molecular coevolution. Proc Natl Acad Sci U S A 2023; 120:e2306741120. [PMID: 37364099 PMCID: PMC10319038 DOI: 10.1073/pnas.2306741120] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 04/26/2023] [Accepted: 05/09/2023] [Indexed: 06/28/2023] Open
Abstract
Most aspects of the molecular biology of cells involve tightly coordinated intermolecular interactions requiring specific recognition at the nucleotide and/or amino acid levels. This has led to long-standing interest in the degree to which constraints on interacting molecules result in conserved vs. accelerated rates of sequence evolution, with arguments commonly being made that molecular coevolution can proceed at rates exceeding the neutral expectation. Here, a fairly general model is introduced to evaluate the degree to which the rate of evolution at functionally interacting sites is influenced by effective population sizes (Ne), mutation rates, strength of selection, and the magnitude of recombination between sites. This theory is of particular relevance to matters associated with interactions between organelle- and nuclear-encoded proteins, as the two genomic environments often exhibit dramatic differences in the power of mutation and drift. Although genes within low Ne environments can drive the rate of evolution of partner genes experiencing higher Ne, rates exceeding the neutral expectation require that the former also have an elevated mutation rate. Testable predictions, some counterintuitive, are presented on how patterns of coevolutionary rates should depend on the relative intensities of drift, selection, and mutation.
Affiliation(s)
- Michael Lynch
- Center for Mechanisms of Evolution, Biodesign Institute, Arizona State University, Tempe, AZ 85287
| |
|
34
|
Choi J. Narrow funnel-like interaction energy distribution is an indicator of specific protein interaction partner. iScience 2023; 26:106911. [PMID: 37305691 PMCID: PMC10250834 DOI: 10.1016/j.isci.2023.106911] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2023] [Revised: 04/28/2023] [Accepted: 05/12/2023] [Indexed: 06/13/2023] Open
Abstract
Protein interaction networks underlie countless biological mechanisms. However, most protein interaction predictions are based either on biological evidence that is biased toward well-known protein interactions or on physical evidence that exhibits low accuracy for weak interactions and requires high computational power. In this study, a novel method is proposed to predict protein interaction partners by investigating narrow funnel-like interaction energy distributions. It was demonstrated that various protein interactions, including those of kinases and E3 ubiquitin ligases, have a narrow funnel-like interaction energy distribution. To analyze the protein interaction distribution, modified iRMS and TM-score metrics are introduced. Using these scores, an algorithm and a deep learning model were developed for predicting protein interaction partners and the substrates of kinases and E3 ubiquitin ligases. The prediction accuracy was similar to or even better than that of yeast two-hybrid screening. Ultimately, this knowledge-free protein interaction prediction method will broaden our understanding of protein interaction networks.
Affiliation(s)
- Juyoung Choi
- Department of Life Science, Sogang University, Seoul 04017, South Korea
35
Cheng Y, Wang H, Xu H, Liu Y, Ma B, Chen X, Zeng X, Wang X, Wang B, Shiau C, Ovchinnikov S, Su XD, Wang C. Co-evolution-based prediction of metal-binding sites in proteomes by machine learning. Nat Chem Biol 2023; 19:548-555. [PMID: 36593274 DOI: 10.1038/s41589-022-01223-z] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Received: 01/11/2022] [Accepted: 11/08/2022] [Indexed: 01/03/2023]
Abstract
Metal ions have various important biological roles in proteins, including structural maintenance, molecular recognition and catalysis. Previous methods of predicting metal-binding sites in proteomes were based on either sequence or structural motifs. Here we developed a co-evolution-based pipeline named 'MetalNet' to systematically predict metal-binding sites in proteomes. We applied MetalNet to proteomes of four representative prokaryotic species and predicted 4,849 potential metalloproteins, which substantially expands the currently annotated metalloproteomes. We biochemically and structurally validated previously unannotated metal-binding sites in several proteins, including apo-citrate lyase phosphoribosyl-dephospho-CoA transferase citX, an Escherichia coli enzyme lacking structural or sequence homology to any known metalloprotein (Protein Data Bank (PDB) codes: 7DCM and 7DCN). MetalNet also successfully recapitulated all known zinc-binding sites from the human spliceosome complex. The pipeline of MetalNet provides a unique and enabling tool for interrogating the hidden metalloproteome and studying metal biology.
Affiliation(s)
- Yao Cheng
- Synthetic and Functional Biomolecules Center, Beijing National Laboratory for Molecular Sciences, Key Laboratory of Bioorganic Chemistry and Molecular Engineering of Ministry of Education, Peking University, Beijing, China
- Department of Chemical Biology, College of Chemistry and Molecular Engineering, Peking University, Beijing, China
- Haobo Wang
- Synthetic and Functional Biomolecules Center, Beijing National Laboratory for Molecular Sciences, Key Laboratory of Bioorganic Chemistry and Molecular Engineering of Ministry of Education, Peking University, Beijing, China
- Department of Chemical Biology, College of Chemistry and Molecular Engineering, Peking University, Beijing, China
- Hua Xu
- State Key Laboratory of Protein and Plant Gene Research, and Biomedical Pioneering Innovation Center (BIOPIC), Peking University, Beijing, China
- Yuan Liu
- Synthetic and Functional Biomolecules Center, Beijing National Laboratory for Molecular Sciences, Key Laboratory of Bioorganic Chemistry and Molecular Engineering of Ministry of Education, Peking University, Beijing, China.
- Department of Chemical Biology, College of Chemistry and Molecular Engineering, Peking University, Beijing, China.
- Bin Ma
- Synthetic and Functional Biomolecules Center, Beijing National Laboratory for Molecular Sciences, Key Laboratory of Bioorganic Chemistry and Molecular Engineering of Ministry of Education, Peking University, Beijing, China
- Department of Chemical Biology, College of Chemistry and Molecular Engineering, Peking University, Beijing, China
- Xuemin Chen
- Synthetic and Functional Biomolecules Center, Beijing National Laboratory for Molecular Sciences, Key Laboratory of Bioorganic Chemistry and Molecular Engineering of Ministry of Education, Peking University, Beijing, China
- Department of Chemical Biology, College of Chemistry and Molecular Engineering, Peking University, Beijing, China
- Xin Zeng
- Peking-Tsinghua Center for Life Sciences, Peking University, Beijing, China
- Xianghe Wang
- Synthetic and Functional Biomolecules Center, Beijing National Laboratory for Molecular Sciences, Key Laboratory of Bioorganic Chemistry and Molecular Engineering of Ministry of Education, Peking University, Beijing, China
- Department of Chemical Biology, College of Chemistry and Molecular Engineering, Peking University, Beijing, China
- Bo Wang
- State Key Laboratory of Protein and Plant Gene Research, and Biomedical Pioneering Innovation Center (BIOPIC), Peking University, Beijing, China
- Sergey Ovchinnikov
- John Harvard Distinguished Science Fellow, Harvard University, Cambridge, MA, USA
- Xiao-Dong Su
- State Key Laboratory of Protein and Plant Gene Research, and Biomedical Pioneering Innovation Center (BIOPIC), Peking University, Beijing, China.
- Chu Wang
- Synthetic and Functional Biomolecules Center, Beijing National Laboratory for Molecular Sciences, Key Laboratory of Bioorganic Chemistry and Molecular Engineering of Ministry of Education, Peking University, Beijing, China.
- Department of Chemical Biology, College of Chemistry and Molecular Engineering, Peking University, Beijing, China.
- Peking-Tsinghua Center for Life Sciences, Peking University, Beijing, China.
36
Li Y, Yu Q, Huang R, Chen H, Ren H, Ma L, He Y, Li W. SARS-CoV-2 SUD2 and Nsp5 Conspire to Boost Apoptosis of Respiratory Epithelial Cells via an Augmented Interaction with the G-Quadruplex of BclII. mBio 2023; 14:e0335922. [PMID: 36853058 PMCID: PMC10127692 DOI: 10.1128/mbio.03359-22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2022] [Accepted: 02/09/2023] [Indexed: 03/01/2023] Open
Abstract
The molecular mechanisms by which SUD2 recruits other proteins of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) to exert its G-quadruplex (G4)-dependent pathogenic function are unknown. Herein, Nsp5 was singled out as a high-affinity binding partner of the SUD2-N+M domains (SUD2core), binding through a surface that spans these two domains. Biochemical and fluorescent assays demonstrated that this complex also formed in the nucleus of living host cells. Moreover, the SUD2core-Nsp5 complex displayed significantly higher selective binding affinity for the G4 structure in the BclII promoter than did SUD2core alone. This increased stability exhibited by the tertiary complex was rationalized by AlphaFold2 and molecular dynamics analysis. In line with these molecular interactions, downregulation of BclII and subsequent augmented apoptosis of respiratory cells were both observed. These results provide novel information and a new avenue to explore therapeutic strategies targeting SARS-CoV-2. IMPORTANCE SUD2, a unique protein domain closely related to the pathogenesis of SARS-CoV-2, has been reported to bind the G-quadruplex (G4), a special noncanonical DNA structure endowed with important functions in regulating gene expression. However, the interacting partner of SUD2, among other SARS-CoV-2 Nsps, and the resulting functional consequences remain unknown. Here, a stable complex formed between SUD2 and Nsp5 was fully characterized both in vitro and in host cells. Moreover, this complex had a significantly enhanced binding affinity specifically targeting the Bcl2G4 in the promoter region of the antiapoptotic gene BclII, compared with SUD2 alone. In respiratory epithelial cells, the SUD2-Nsp5 complex promoted BclII-mediated apoptosis in a G4-dependent manner. These results reveal fresh information about matched multicomponent interactions, which can be parlayed to develop new therapeutics for future relevant viral disease.
Affiliation(s)
- Ying Li
- Department of Respiratory and Critical Care Medicine, Targeted Tracer Research and Development Laboratory, West China Hospital, Sichuan University, Chengdu, China
- Institute of Respiratory Health, Frontiers Science Center for Disease-Related Molecular Network, West China Hospital, Sichuan University, Chengdu, China
- Precision Medicine Research Center, West China Hospital, Sichuan University, Chengdu, China
- Quanwei Yu
- Department of Respiratory and Critical Care Medicine, Targeted Tracer Research and Development Laboratory, West China Hospital, Sichuan University, Chengdu, China
- Institute of Respiratory Health, Frontiers Science Center for Disease-Related Molecular Network, West China Hospital, Sichuan University, Chengdu, China
- Precision Medicine Research Center, West China Hospital, Sichuan University, Chengdu, China
- Ridong Huang
- Department of Respiratory and Critical Care Medicine, Targeted Tracer Research and Development Laboratory, West China Hospital, Sichuan University, Chengdu, China
- Institute of Respiratory Health, Frontiers Science Center for Disease-Related Molecular Network, West China Hospital, Sichuan University, Chengdu, China
- Precision Medicine Research Center, West China Hospital, Sichuan University, Chengdu, China
- Hai Chen
- Department of Respiratory and Critical Care Medicine, Targeted Tracer Research and Development Laboratory, West China Hospital, Sichuan University, Chengdu, China
- Institute of Respiratory Health, Frontiers Science Center for Disease-Related Molecular Network, West China Hospital, Sichuan University, Chengdu, China
- Precision Medicine Research Center, West China Hospital, Sichuan University, Chengdu, China
- Hequan Ren
- Department of Respiratory and Critical Care Medicine, Targeted Tracer Research and Development Laboratory, West China Hospital, Sichuan University, Chengdu, China
- Institute of Respiratory Health, Frontiers Science Center for Disease-Related Molecular Network, West China Hospital, Sichuan University, Chengdu, China
- Precision Medicine Research Center, West China Hospital, Sichuan University, Chengdu, China
- Lingling Ma
- Department of Respiratory and Critical Care Medicine, Targeted Tracer Research and Development Laboratory, West China Hospital, Sichuan University, Chengdu, China
- Institute of Respiratory Health, Frontiers Science Center for Disease-Related Molecular Network, West China Hospital, Sichuan University, Chengdu, China
- Precision Medicine Research Center, West China Hospital, Sichuan University, Chengdu, China
- Yang He
- Department of Respiratory and Critical Care Medicine, Targeted Tracer Research and Development Laboratory, West China Hospital, Sichuan University, Chengdu, China
- Institute of Respiratory Health, Frontiers Science Center for Disease-Related Molecular Network, West China Hospital, Sichuan University, Chengdu, China
- Precision Medicine Research Center, West China Hospital, Sichuan University, Chengdu, China
- Weimin Li
- Department of Respiratory and Critical Care Medicine, Targeted Tracer Research and Development Laboratory, West China Hospital, Sichuan University, Chengdu, China
- Institute of Respiratory Health, Frontiers Science Center for Disease-Related Molecular Network, West China Hospital, Sichuan University, Chengdu, China
- Precision Medicine Research Center, West China Hospital, Sichuan University, Chengdu, China
37
Krapp LF, Abriata LA, Cortés Rodriguez F, Dal Peraro M. PeSTo: parameter-free geometric deep learning for accurate prediction of protein binding interfaces. Nat Commun 2023; 14:2175. [PMID: 37072397 PMCID: PMC10113261 DOI: 10.1038/s41467-023-37701-8] [Citation(s) in RCA: 56] [Impact Index Per Article: 28.0] [Received: 02/06/2023] [Accepted: 03/28/2023] [Indexed: 04/20/2023] Open
Abstract
Proteins are essential molecular building blocks of life, responsible for most biological functions as a result of their specific molecular interactions. However, predicting their binding interfaces remains a challenge. In this study, we present a geometric transformer that acts directly on atomic coordinates labeled only with element names. The resulting model, the Protein Structure Transformer (PeSTo), surpasses the current state of the art in predicting protein-protein interfaces and can also predict and differentiate between interfaces involving nucleic acids, lipids, ions, and small molecules with high confidence. Its low computational cost enables processing high volumes of structural data, such as molecular dynamics ensembles, allowing for the discovery of interfaces that remain otherwise inconspicuous in static experimentally solved structures. Moreover, the growing foldome provided by de novo structural predictions can be easily analyzed, providing new opportunities to uncover unexplored biology.
Affiliation(s)
- Lucien F Krapp
- Institute of Bioengineering, School of Life Sciences, Ecole Fédérale de Lausanne (EPFL) and Swiss Institute of Bioinformatics (SIB), Lausanne, 1015, Switzerland
- Luciano A Abriata
- Institute of Bioengineering, School of Life Sciences, Ecole Fédérale de Lausanne (EPFL) and Swiss Institute of Bioinformatics (SIB), Lausanne, 1015, Switzerland
- Fabio Cortés Rodriguez
- Institute of Bioengineering, School of Life Sciences, Ecole Fédérale de Lausanne (EPFL) and Swiss Institute of Bioinformatics (SIB), Lausanne, 1015, Switzerland
- Matteo Dal Peraro
- Institute of Bioengineering, School of Life Sciences, Ecole Fédérale de Lausanne (EPFL) and Swiss Institute of Bioinformatics (SIB), Lausanne, 1015, Switzerland.
38
Durham J, Zhang J, Humphreys IR, Pei J, Cong Q. Recent advances in predicting and modeling protein-protein interactions. Trends Biochem Sci 2023; 48:527-538. [PMID: 37061423 DOI: 10.1016/j.tibs.2023.03.003] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 08/01/2022] [Revised: 03/03/2023] [Accepted: 03/17/2023] [Indexed: 04/17/2023]
Abstract
Protein-protein interactions (PPIs) drive biological processes, and disruption of PPIs can cause disease. With recent breakthroughs in structure prediction and a deluge of genomic sequence data, computational methods to predict PPIs and model spatial structures of protein complexes are now approaching the accuracy of experimental approaches for permanent interactions and show promise for elucidating transient interactions. As we describe here, the key to this success is rich evolutionary information deciphered from thousands of homologous sequences that coevolve in interacting partners. This covariation signal, revealed by sophisticated statistical and machine learning (ML) algorithms, predicts physiological interactions. Accurate artificial intelligence (AI)-based modeling of protein structures promises to provide accurate 3D models of PPIs at a proteome-wide scale.
Affiliation(s)
- Jesse Durham
- Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, TX, USA; Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, USA; Harold C. Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Jing Zhang
- Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, TX, USA; Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, USA; Harold C. Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Ian R Humphreys
- Department of Biochemistry, University of Washington, Seattle, WA, USA; Institute for Protein Design, University of Washington, Seattle, WA, USA
- Jimin Pei
- Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, TX, USA; Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, USA; Harold C. Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Qian Cong
- Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, TX, USA; Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, USA; Harold C. Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center, Dallas, TX, USA.
39
O'Reilly FJ, Graziadei A, Forbrig C, Bremenkamp R, Charles K, Lenz S, Elfmann C, Fischer L, Stülke J, Rappsilber J. Protein complexes in cells by AI-assisted structural proteomics. Mol Syst Biol 2023; 19:e11544. [PMID: 36815589 PMCID: PMC10090944 DOI: 10.15252/msb.202311544] [Citation(s) in RCA: 40] [Impact Index Per Article: 20.0] [Received: 01/16/2023] [Revised: 01/24/2023] [Accepted: 02/07/2023] [Indexed: 02/24/2023] Open
Abstract
Accurately modeling the structures of proteins and their complexes using artificial intelligence is revolutionizing molecular biology. Experimental data enable a candidate-based approach to systematically model novel protein assemblies. Here, we use a combination of in-cell crosslinking mass spectrometry and co-fractionation mass spectrometry (CoFrac-MS) to identify protein-protein interactions in the model Gram-positive bacterium Bacillus subtilis. We show that crosslinking interactions prior to cell lysis reveals protein interactions that are often lost upon cell lysis. We predict the structures of these protein interactions and others in the SubtiWiki database with AlphaFold-Multimer and, after controlling for the false-positive rate of the predictions, we propose novel structural models of 153 dimeric and 14 trimeric protein assemblies. Crosslinking MS data independently validates the AlphaFold predictions and scoring. We report and validate novel interactors of central cellular machineries that include the ribosome, RNA polymerase, and pyruvate dehydrogenase, assigning function to several uncharacterized proteins. Our approach uncovers protein-protein interactions inside intact cells, provides structural insight into their interaction interfaces, and is applicable to genetically intractable organisms, including pathogenic bacteria.
Affiliation(s)
- Francis J O'Reilly
- Chair of Bioanalytics, Technische Universität Berlin, Berlin, Germany
- Present address: Center for Structural Biology, Center for Cancer Research, National Cancer Institute (NCI), Frederick, MD, USA
- Rica Bremenkamp
- Department of General Microbiology, Institute of Microbiology and Genetics, August‐University Göttingen, Göttingen, Germany
- Swantje Lenz
- Chair of Bioanalytics, Technische Universität Berlin, Berlin, Germany
- Christoph Elfmann
- Department of General Microbiology, Institute of Microbiology and Genetics, August‐University Göttingen, Göttingen, Germany
- Lutz Fischer
- Chair of Bioanalytics, Technische Universität Berlin, Berlin, Germany
- Jörg Stülke
- Department of General Microbiology, Institute of Microbiology and Genetics, August‐University Göttingen, Göttingen, Germany
- Juri Rappsilber
- Chair of Bioanalytics, Technische Universität Berlin, Berlin, Germany
- Wellcome Centre for Cell Biology, University of Edinburgh, Edinburgh, UK
40
Gandarilla-Pérez CA, Pinilla S, Bitbol AF, Weigt M. Combining phylogeny and coevolution improves the inference of interaction partners among paralogous proteins. PLoS Comput Biol 2023; 19:e1011010. [PMID: 36996234 PMCID: PMC10089317 DOI: 10.1371/journal.pcbi.1011010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2022] [Revised: 04/11/2023] [Accepted: 03/08/2023] [Indexed: 04/01/2023] Open
Abstract
Predicting protein-protein interactions from sequences is an important goal of computational biology. Various sources of information can be used to this end. Starting from the sequences of two interacting protein families, one can use phylogeny or residue coevolution to infer which paralogs are specific interaction partners within each species. We show that these two signals can be combined to improve the performance of the inference of interaction partners among paralogs. For this, we first align the sequence-similarity graphs of the two families through simulated annealing, yielding a robust partial pairing. We next use this partial pairing to seed a coevolution-based iterative pairing algorithm. This combined method improves performance over either separate method. The improvement obtained is striking in the difficult cases where the average number of paralogs per species is large or where the total number of sequences is modest.
Affiliation(s)
- Carlos A Gandarilla-Pérez
- Facultad de Física, Universidad de la Habana, San Lázaro y L, Vedado, Habana, Cuba
- Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratoire de Biologie Computationnelle et Quantitative (LCQB, UMR 7238), Paris, France
- Sergio Pinilla
- Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratoire de Biologie Computationnelle et Quantitative (LCQB, UMR 7238), Paris, France
- Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratoire Jean Perrin (UMR 8237), Paris, France
- Anne-Florence Bitbol
- Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Martin Weigt
- Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratoire de Biologie Computationnelle et Quantitative (LCQB, UMR 7238), Paris, France
41
Si Y, Yan C. Improved inter-protein contact prediction using dimensional hybrid residual networks and protein language models. Brief Bioinform 2023; 24:7033302. [PMID: 36759333 DOI: 10.1093/bib/bbad039] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 08/27/2022] [Revised: 01/13/2023] [Accepted: 01/18/2023] [Indexed: 02/11/2023] Open
Abstract
The knowledge of contacting residue pairs between interacting proteins is very useful for the structural characterization of protein-protein interactions (PPIs). However, accurately identifying the tens of contacting pairs among hundreds of thousands of inter-protein residue pairs is extremely challenging, and the performance of state-of-the-art inter-protein contact prediction methods is still quite limited. In this study, we developed a deep learning method for inter-protein contact prediction, which is referred to as DRN-1D2D_Inter. Specifically, we employed pretrained protein language models to generate structural information-enriched input features to residual networks formed by dimensional hybrid residual blocks to perform inter-protein contact prediction. Extensively benchmarking DRN-1D2D_Inter on multiple datasets, including both heteromeric PPIs and homomeric PPIs, we show that DRN-1D2D_Inter consistently and significantly outperformed two state-of-the-art inter-protein contact prediction methods, GLINTER and DeepHomo, even though both of the latter methods leveraged the native structures of interacting proteins in the prediction, whereas DRN-1D2D_Inter made the prediction purely from sequences. We further show that applying the predicted contacts as constraints for protein-protein docking can significantly improve its performance for protein complex structure prediction.
Affiliation(s)
- Yunda Si
- School of Physics, Huazhong University of Science and Technology, China
- Chengfei Yan
- School of Physics, Huazhong University of Science and Technology, China
42
Kleeorin Y, Russ WP, Rivoire O, Ranganathan R. Undersampling and the inference of coevolution in proteins. Cell Syst 2023; 14:210-219.e7. [PMID: 36693377 PMCID: PMC10911952 DOI: 10.1016/j.cels.2022.12.013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2021] [Revised: 01/02/2022] [Accepted: 12/23/2022] [Indexed: 01/24/2023]
Abstract
Protein structure, function, and evolution depend on local and collective epistatic interactions between amino acids. A powerful approach to defining these interactions is to construct models of couplings between amino acids that reproduce the empirical statistics (frequencies and correlations) observed in sequences comprising a protein family. The top couplings are then interpreted. Here, we show that as currently implemented, this inference unequally represents epistatic interactions, a problem that fundamentally arises from limited sampling of sequences in the context of distinct scales at which epistasis occurs in proteins. We show that these issues explain the ability of current approaches to predict tertiary contacts between amino acids and the inability to obviously expose larger networks of functionally relevant, collectively evolving residues called sectors. This work provides a necessary foundation for more deeply understanding and improving evolution-based models of proteins.
Affiliation(s)
- Yaakov Kleeorin
- Center for Physics of Evolving Systems, Department of Biochemistry & Molecular Biology, University of Chicago, Chicago, IL 60637, USA
- William P Russ
- Green Center for Systems Biology, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Olivier Rivoire
- Center for Interdisciplinary Research in Biology (CIRB), Collège de France, CNRS, INSERM, PSL Research University, 75005 Paris, France.
- Rama Ranganathan
- Center for Physics of Evolving Systems, Department of Biochemistry & Molecular Biology, University of Chicago, Chicago, IL 60637, USA; The Pritzker School for Molecular Engineering, University of Chicago, Chicago, IL 60637, USA.
43
Karamanos TK. Chasing long-range evolutionary couplings in the AlphaFold era. Biopolymers 2023; 114:e23530. [PMID: 36752285 PMCID: PMC10909459 DOI: 10.1002/bip.23530] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 11/24/2022] [Revised: 01/26/2023] [Accepted: 01/27/2023] [Indexed: 02/09/2023]
Abstract
Coevolution between protein residues is normally interpreted as direct contact. However, the evolutionary record of a protein sequence contains rich information that may include long-range functional couplings, couplings that report on homo-oligomeric states or even conformational changes. Due to the complexity of the sequence space and the lack of structural information on various members of a protein family, it has been difficult to effectively mine the additional information encoded in a multiple sequence alignment (MSA). Here, taking advantage of the recent release of the AlphaFold (AF) database we attempt to identify coevolutionary couplings that cannot be explained simply by spatial proximity. We propose a simple computational method that performs direct coupling analysis on a MSA and searches for couplings that are not satisfied in any of the AF models of members of the identified protein family. Application of this method on 2012 protein families suggests that ~12% of the total identified coevolving residue pairs are spatially distant and more likely to be disordered than their contacting counterparts. We expect that this analysis will help improve the quality of coevolutionary distance restraints used for structure determination and will be useful in identifying potentially functional/allosteric cross-talk between distant residues.
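The filtering step described in this abstract (keeping only couplings that are not satisfied in any structural model of the family) can be illustrated with a minimal sketch. It is not the author's implementation: the function name `distant_couplings`, the 8 Å contact cutoff, and the toy distance maps are all assumptions made for illustration, and the direct coupling analysis that would produce the coupled pairs is not shown.

```python
def distant_couplings(coupled_pairs, model_distance_maps, contact_cutoff=8.0):
    """Return coupled residue pairs that are not in contact in ANY model.

    coupled_pairs: list of (i, j) residue index pairs (hypothetical DCA output).
    model_distance_maps: one square distance matrix (list of lists, in Å)
        per structural model of a family member, indexed by alignment column.
    contact_cutoff: assumed contact threshold in Å (8.0 is an illustrative choice).
    """
    unsatisfied = []
    for i, j in coupled_pairs:
        # Smallest distance observed for this pair across all models.
        min_dist = min(dmap[i][j] for dmap in model_distance_maps)
        if min_dist > contact_cutoff:
            unsatisfied.append((i, j))
    return unsatisfied


# Toy data: two models of a three-residue family (distances in Å).
model_a = [[0.0, 5.0, 20.0], [5.0, 0.0, 15.0], [20.0, 15.0, 0.0]]
model_b = [[0.0, 6.0, 7.0], [6.0, 0.0, 18.0], [7.0, 18.0, 0.0]]
pairs = [(0, 1), (0, 2), (1, 2)]

result = distant_couplings(pairs, [model_a, model_b])
print(result)
```

In this toy case the pair (0, 2) is distant in one model but in contact in the other, so it counts as satisfied; only pairs that are distant in every model are reported.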
44
Sarmah DT, Parveen R, Kundu J, Chatterjee S. Latent tuberculosis and computational biology: A less-talked affair. Prog Biophys Mol Biol 2023; 178:17-31. [PMID: 36781150 DOI: 10.1016/j.pbiomolbio.2023.02.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2022] [Revised: 02/09/2023] [Accepted: 02/10/2023] [Indexed: 02/13/2023]
Abstract
Tuberculosis (TB) is a pervasive and devastating air-borne disease caused by the organisms belonging to the Mycobacterium tuberculosis (Mtb) complex. Currently, it is the global leader in infectious disease-related death in adults. The proclivity of TB to enter the latent state has become a significant impediment to the global effort to eradicate TB. Despite decades of research, latent tuberculosis (LTB) mechanisms remain poorly understood, making it difficult to develop efficient treatment methods. In this review, we seek to shed light on the current understanding of the mechanism of LTB, with an accentuation on the insights gained through computational biology. We have outlined various well-established computational biology components, such as omics, network-based techniques, mathematical modelling, artificial intelligence, and molecular docking, to disclose the crucial facets of LTB. Additionally, we highlighted important tools and software that may be used to conduct a variety of systems biology assessments. Finally, we conclude the article by addressing the possible future directions in this field, which might help a better understanding of LTB progression.
Affiliation(s)
- Dipanka Tanu Sarmah
- Complex Analysis Group, Translational Health Science and Technology Institute, NCR Biotech Science Cluster, Faridabad, 121001, India
- Rubi Parveen
- Complex Analysis Group, Translational Health Science and Technology Institute, NCR Biotech Science Cluster, Faridabad, 121001, India
- Jayendrajyoti Kundu
- Complex Analysis Group, Translational Health Science and Technology Institute, NCR Biotech Science Cluster, Faridabad, 121001, India
- Samrat Chatterjee
- Complex Analysis Group, Translational Health Science and Technology Institute, NCR Biotech Science Cluster, Faridabad, 121001, India.
45
Sgarbossa D, Lupo U, Bitbol AF. Generative power of a protein language model trained on multiple sequence alignments. eLife 2023; 12:e79854. [PMID: 36734516 PMCID: PMC10038667 DOI: 10.7554/elife.79854] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 04/28/2022] [Accepted: 02/02/2023] [Indexed: 02/04/2023] Open
Abstract
Computational models starting from large ensembles of evolutionarily related protein sequences capture a representation of protein families and learn constraints associated with protein structure and function. They thus open the possibility of generating novel sequences belonging to protein families. Protein language models trained on multiple sequence alignments, such as MSA Transformer, are highly attractive candidates to this end. We propose and test an iterative method that directly employs the masked language modeling objective to generate sequences using MSA Transformer. We demonstrate that the resulting sequences score as well as natural sequences, for homology, coevolution, and structure-based measures. For large protein families, our synthetic sequences have similar or better properties compared to sequences generated by Potts models, including experimentally validated ones. Moreover, for small protein families, our generation method based on MSA Transformer outperforms Potts models. Our method also more accurately reproduces the higher-order statistics and the distribution of sequences in sequence space of natural data than Potts models. MSA Transformer is thus a strong candidate for protein sequence generation and protein design.
Affiliation(s)
- Damiano Sgarbossa
- Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Umberto Lupo
- Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Anne-Florence Bitbol
- Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
| |
|
46
|
Towards a structurally resolved human protein interaction network. Nat Struct Mol Biol 2023; 30:216-225. [PMID: 36690744 PMCID: PMC9935395 DOI: 10.1038/s41594-022-00910-8] [Citation(s) in RCA: 118] [Impact Index Per Article: 59.0] [Received: 02/11/2022] [Accepted: 12/14/2022] [Indexed: 01/25/2023]
Abstract
Cellular functions are governed by molecular machines that assemble through protein-protein interactions. Their atomic details are critical to studying their molecular mechanisms. However, fewer than 5% of hundreds of thousands of human protein interactions have been structurally characterized. Here we test the potential and limitations of recent progress in deep-learning methods using AlphaFold2 to predict structures for 65,484 human protein interactions. We show that experiments can orthogonally confirm higher-confidence models. We identify 3,137 high-confidence models, of which 1,371 have no homology to a known structure. We identify interface residues harboring disease mutations, suggesting potential mechanisms for pathogenic variants. Groups of interface phosphorylation sites show patterns of co-regulation across conditions, suggestive of coordinated tuning of multiple protein interactions as signaling responses. Finally, we provide examples of how the predicted binary complexes can be used to build larger assemblies helping to expand our understanding of human cell biology.
|
47
|
Dietler N, Lupo U, Bitbol AF. Impact of phylogeny on structural contact inference from protein sequence data. J R Soc Interface 2023; 20:20220707. [PMID: 36751926 PMCID: PMC9905998 DOI: 10.1098/rsif.2022.0707] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2022] [Accepted: 01/09/2023] [Indexed: 02/09/2023] Open
Abstract
Local and global inference methods have been developed to infer structural contacts from multiple sequence alignments of homologous proteins. They rely on correlations in amino acid usage at contacting sites. Because homologous proteins share a common ancestry, their sequences also feature phylogenetic correlations, which can impair contact inference. We investigate this effect by generating controlled synthetic data from a minimal model where the importance of contacts and of phylogeny can be tuned. We demonstrate that global inference methods, specifically Potts models, are more resilient to phylogenetic correlations than local methods, based on covariance or mutual information. This holds whether or not phylogenetic corrections are used, and may explain the success of global methods. We analyse the roles of selection strength and of phylogenetic relatedness. We show that sites that mutate early in the phylogeny yield false positive contacts. We consider natural data and realistic synthetic data, and our findings generalize to these cases. Our results highlight the impact of phylogeny on contact prediction from protein sequences and illustrate the interplay between the rich structure of biological data and inference.
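As an illustration of the "local methods" the abstract mentions, the following is a minimal sketch of mutual information between two alignment columns. It is not the authors' code: the function name `column_mutual_information` and the toy alignment are hypothetical, and the sketch applies none of the pseudocounts, sequence reweighting, or phylogenetic corrections that practical contact-inference pipelines typically use.

```python
from collections import Counter
from math import log2


def column_mutual_information(msa, i, j):
    """Mutual information (in bits) between columns i and j of an alignment.

    `msa` is a list of equal-length aligned sequences (hypothetical input format).
    """
    n = len(msa)
    pair_counts = Counter((seq[i], seq[j]) for seq in msa)
    i_counts = Counter(seq[i] for seq in msa)
    j_counts = Counter(seq[j] for seq in msa)
    mi = 0.0
    for (a, b), count in pair_counts.items():
        p_ab = count / n
        # p_ab / (p_a * p_b), written with raw counts to limit rounding error
        mi += p_ab * log2(p_ab * n * n / (i_counts[a] * j_counts[b]))
    return mi


# Toy alignment: columns 0 and 1 covary perfectly, column 2 is independent of column 0.
toy_msa = ["AKL", "AKV", "GEL", "GEV"]
mi_coupled = column_mutual_information(toy_msa, 0, 1)
mi_independent = column_mutual_information(toy_msa, 0, 2)
```

In this toy case the covarying pair scores higher than the independent pair. A score of this kind treats each column pair in isolation, whereas the global methods discussed in the abstract (Potts models) fit all pairs jointly.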
Affiliation(s)
- Nicola Dietler
- Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| | - Umberto Lupo
- Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| | - Anne-Florence Bitbol
- Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| |
|
48
|
Tresnak DT, Hackel BJ. Deep Antimicrobial Activity and Stability Analysis Inform Lysin Sequence-Function Mapping. ACS Synth Biol 2023; 12:249-264. [PMID: 36599162 PMCID: PMC10822705 DOI: 10.1021/acssynbio.2c00509] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 01/05/2023]
Abstract
Antibiotic-resistant infectious disease is a critical challenge to human health. Antimicrobial proteins offer a compelling solution if engineered for potency, selectivity, and physiological stability. Lysins, which lyse cells via degradation of cell wall peptidoglycans, have significant potential to fill this role. Yet, the functional complexity of antimicrobial activity has hindered high-throughput characterization for discovery and design. To dramatically expand knowledge of the sequence-function landscape of lysins, we developed a depletion-based assay for library-scale measurement of lysin inhibitory activity. We coupled this platform with a high-throughput proteolytic stability assay to assess the activity and stability of ∼5 × 10^4 lysin catalytic domain variants, resulting in the discovery of a variant with increased activity (70 ± 20%) and stability (7.2 ± 0.4 °C increased midpoint of thermal denaturation). Ridge regression of the resulting data set demonstrated that libraries with a higher average Hamming distance better informed pairwise models and that coupling activity and stability assays enabled better prediction of catalytically active lysins. The best models achieved Pearson's correlation coefficients of 0.87 ± 0.01 and 0.61 ± 0.04 for predicting catalytic domain stability and activity, respectively. Our work provides an efficient strategy for constructing protein sequence-function landscapes, drastically increases screening throughput for engineering lysins, and yields promising lysins for further development.
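The "average Hamming distance" of a library can be made concrete with a short sketch. The abstract does not state the exact definition used, so the code below assumes the mean over all unordered pairs of equal-length variants. The helper names and the toy library are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def mean_pairwise_hamming(library):
    """Mean Hamming distance over all unordered pairs of variants (assumed definition)."""
    pairs = list(combinations(library, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)


# Toy library of three 4-residue variants (illustrative only).
toy_library = ["ACDE", "ACDF", "GCDF"]
average_distance = mean_pairwise_hamming(toy_library)
```

A library with a larger value samples more divergent variants. The all-pairs loop is quadratic in library size, so for libraries on the scale of the one in the abstract a random subsample of pairs would be a cheaper estimate.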
Affiliation(s)
- Daniel T Tresnak
- Department of Chemical Engineering and Materials Science, University of Minnesota-Twin Cities, 421 Washington Avenue SE, Minneapolis, Minnesota 55455, United States
| | - Benjamin J Hackel
- Department of Chemical Engineering and Materials Science, University of Minnesota-Twin Cities, 421 Washington Avenue SE, Minneapolis, Minnesota 55455, United States
| |
|
49
|
Kim D, Ha D, Lee K, Lee H, Kim I, Kim S. An evolution-based machine learning to identify cancer type-specific driver mutations. Brief Bioinform 2023; 24:6961611. [PMID: 36575568 DOI: 10.1093/bib/bbac593] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2022] [Revised: 11/18/2022] [Accepted: 12/03/2022] [Indexed: 12/29/2022] Open
Abstract
Identifying cancer type-specific driver mutations is crucial for illuminating distinct pathologic mechanisms across various tumors and providing opportunities for patient-specific treatment. However, although many computational methods have been developed to predict driver mutations in a type-specific manner, these methods still have room to improve. Here, we devise a novel feature based on sequence co-evolution analysis to identify cancer type-specific driver mutations and construct a machine learning (ML) model with state-of-the-art performance. Specifically, relying on 28 000 tumor samples across 66 cancer types, our ML framework outperformed current leading methods of detecting cancer driver mutations. Interestingly, the cancer mutations identified by the sequence co-evolution feature are frequently observed in interfaces mediating tissue-specific protein-protein interactions that are known to associate with shaping tissue-specific oncogenesis. Moreover, we provide pre-calculated potential oncogenicity on available human proteins with prediction scores of all possible residue alterations through a user-friendly website (http://sbi.postech.ac.kr/w/cancerCE). This work will facilitate the identification of cancer type-specific driver mutations in newly sequenced tumor samples.
Affiliation(s)
| | | | | | | | - Inhae Kim
- ImmunoBiome Inc., Pohang, South Korea
| | - Sanguk Kim
- Department of Life Sciences; Artificial Intelligence Graduate Program, Pohang University of Science and Technology, Pohang 790-784, South Korea; Institute of Convergence Research and Education in Advanced Technology, Yonsei University, Seoul 120-149, South Korea
| |
|
50
|
Launay R, Teppa E, Esque J, André I. Modeling Protein Complexes and Molecular Assemblies Using Computational Methods. Methods Mol Biol 2023; 2553:57-77. [PMID: 36227539 DOI: 10.1007/978-1-0716-2617-7_4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2023]
Abstract
Many biological molecules are assembled into supramolecular complexes that are necessary to perform functions in the cell. Better understanding and characterization of these molecular assemblies are thus essential to further elucidate molecular mechanisms and key protein-protein interactions that could be targeted to modulate the protein binding affinity or develop new binders. Experimental access to structural information on these supramolecular assemblies is often hampered by the size of these systems, which makes their recombinant production and characterization rather difficult. Computational methods combining structural data, molecular modeling techniques, and sequence coevolution information can thus offer a good alternative to gain access to the structural organization of protein complexes and assemblies. Herein, we present some computational methods to predict structural models of the protein partners, to search for interacting regions using coevolution information, and to build molecular assemblies. The approach is exemplified using a case study to model the succinate-quinone oxidoreductase heterocomplex.
Affiliation(s)
- Romain Launay
- Toulouse Biotechnology Institute, TBI, Université de Toulouse, CNRS, INRAE, INSA, Toulouse Cedex 04, France
| | - Elin Teppa
- Toulouse Biotechnology Institute, TBI, Université de Toulouse, CNRS, INRAE, INSA, Toulouse Cedex 04, France
| | - Jérémy Esque
- Toulouse Biotechnology Institute, TBI, Université de Toulouse, CNRS, INRAE, INSA, Toulouse Cedex 04, France
| | - Isabelle André
- Toulouse Biotechnology Institute, TBI, Université de Toulouse, CNRS, INRAE, INSA, Toulouse Cedex 04, France
| |
|