1
Deng P, Zhang Y, Xu L, Lyu J, Li L, Sun F, Zhang WB, Gao H. Computational discovery and systematic analysis of protein entangling motifs in nature: from algorithm to database. Chem Sci 2025; 16:8998-9009. [PMID: 40271025 PMCID: PMC12013726 DOI: 10.1039/d4sc08649j] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2024] [Accepted: 03/29/2025] [Indexed: 04/25/2025]
Abstract
Nontrivial protein topology has the potential to revolutionize protein engineering by enabling the manipulation of proteins' stability and dynamics. However, the rarity of topological proteins in nature poses a challenge for their design, synthesis and application, primarily due to the limited number of available entangling motifs as synthetic templates. Discovering these motifs is particularly difficult, as entanglement is a subtle structural feature that is not readily discernible from protein sequences. In this study, we developed a streamlined workflow enabling efficient and accurate identification of structurally reliable and applicable entangling motifs from protein sequences. Through this workflow, we automatically curated a database of 1115 entangling protein motifs from over 100 thousand sequences in the UniProt Knowledgebase. In our database, 73.3% of C2 entangling motifs and 80.1% of C3 entangling motifs exhibited low structural similarity to known protein structures. The entangled structures in the database were categorized into different groups and their functional and biological significance were analyzed. The results were summarized in an online database accessible through a user-friendly web platform, providing researchers with an expanded toolbox of entangling motifs. This resource is poised to significantly advance the field of protein topology engineering and inspire new research directions in protein design and application.
Affiliation(s)
- Puqing Deng
  - Department of Chemical and Biological Engineering, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
- Yuxuan Zhang
  - Department of Chemical and Biological Engineering, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
- Lianjie Xu
  - Beijing National Laboratory for Molecular Sciences, Key Laboratory of Polymer Chemistry & Physics of Ministry of Education, Center for Soft Matter Science and Engineering, College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, P. R. China
- Jinyu Lyu
  - Department of Chemical and Biological Engineering, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
- Linyan Li
  - Department of Data Science, City University of Hong Kong, Kowloon, Hong Kong
- Fei Sun
  - Department of Chemical and Biological Engineering, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
- Wen-Bin Zhang
  - Beijing National Laboratory for Molecular Sciences, Key Laboratory of Polymer Chemistry & Physics of Ministry of Education, Center for Soft Matter Science and Engineering, College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, P. R. China
  - AI for Science (AI4S)-Preferred Program, Shenzhen Graduate School, Peking University, Shenzhen 518055, P. R. China
- Hanyu Gao
  - Department of Chemical and Biological Engineering, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
2
da Silva FB, Perlinska AP, Płonka J, Flapan E, Sulkowska JI. Universe of Lasso Proteins: Exploring the limit of entanglement of protein predicted by AlphaFold. J Mol Biol 2025:169217. [PMID: 40398674 DOI: 10.1016/j.jmb.2025.169217] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/20/2025] [Revised: 05/13/2025] [Accepted: 05/13/2025] [Indexed: 05/23/2025]
Abstract
Knots and lasso topology represent a class of natural motifs found in proteins that are characterized by a threaded structure. Proteins with a lasso motif represent a macroscopic version of the peptide lasso, which is known for its high stability and offers tremendous potential for the development of novel therapeutics. Here, based on AlphaFold, we have shown the limit of topological complexity of naturally occurring protein structures with cysteine bridges. Based on 176 million high confidence (pLDDT > 70) AlphaFold-predicted protein models and a detailed analysis of the conservation of the motif in a family, we found four new lasso motifs, including L4 and LS4LS3 topologies, and the first examples of knotted lasso proteins: L1K31 and L3#K31. We show that in the case of natural proteins, there are no lassos with 5 threadings, but there exist some with 6. Families possessing proteins with more than 6 threadings did not exceed the conservation threshold of 10%. Moreover, we propose a probable folding mechanism for the LS4LS3 lasso motif, enhancing our view on protein folding and stability. This work expands the topological space of lasso type motifs in proteins but also suggests that more complex structures could be unfavorable for proteins.
Affiliation(s)
- Agata P Perlinska
  - Centre of New Technologies, University of Warsaw, Banacha 2c, 02-097 Warsaw, Poland
- Jacek Płonka
  - Centre of New Technologies, University of Warsaw, Banacha 2c, 02-097 Warsaw, Poland; Faculty of Chemistry, University of Warsaw, Pasteura 1, 02-093 Warsaw, Poland
- Joanna I Sulkowska
  - Centre of New Technologies, University of Warsaw, Banacha 2c, 02-097 Warsaw, Poland
3
Deng P, Xu L, Wei Y, Sun F, Li L, Zhang WB, Gao H. Deep Learning-Assisted Discovery of Protein Entangling Motifs. Biomacromolecules 2025; 26:1520-1529. [PMID: 39937127 DOI: 10.1021/acs.biomac.4c01243] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/13/2025]
Abstract
Natural topological proteins exhibit unique properties including enhanced stability, controlled quaternary structures, and dynamic switching properties, highlighting topology as a unique dimension in protein engineering. Although artificial design and synthesis of topological proteins have achieved certain success, their diversity and complexity remain rather limited due to the scarcity of available entangling motifs essential for the construction of nontrivial protein topologies. In this work, we developed a deep-learning model to predict the entanglement features of a homodimer based solely on its amino acid sequence via the Gauss linking number matrices. The model achieved a search speed that was dozens of times faster than AlphaFold-Multimer, while maintaining comparable mean squared error. It was used to screen for entangling motifs from the genome of a hyperthermophilic archaeon. We demonstrated the effectiveness of our model by successful wet-lab synthesis of protein catenanes using two candidate entangling motifs. These findings show the great potential of our model for advancing the design and synthesis of novel topological proteins.
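The abstract above relies on the Gauss linking number to quantify entanglement. As a minimal sketch of that quantity (not the authors' model, and not their matrix construction over a homodimer), the Gauss double integral can be approximated numerically for two closed polygonal curves. The function names `gauss_linking_number` and `circle`, the midpoint discretization, and the test rings are illustrative assumptions.

```python
import math


def _sub(p, q):
    return (p[0] - q[0], p[1] - q[1], p[2] - q[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _mid(p, q):
    return tuple((p[k] + q[k]) / 2 for k in range(3))


def gauss_linking_number(curve_a, curve_b):
    """Approximate (1/4pi) * double integral of r . (dr_a x dr_b) / |r|^3.

    Each curve is a list of 3D points treated as a closed polygon.
    Every segment pair contributes one midpoint-rule term, so the
    result is only approximate; the sign depends on curve orientation.
    """
    total = 0.0
    na, nb = len(curve_a), len(curve_b)
    for i in range(na):
        a0, a1 = curve_a[i], curve_a[(i + 1) % na]
        da, ma = _sub(a1, a0), _mid(a0, a1)
        for j in range(nb):
            b0, b1 = curve_b[j], curve_b[(j + 1) % nb]
            db, mb = _sub(b1, b0), _mid(b0, b1)
            r = _sub(ma, mb)
            dist = math.sqrt(_dot(r, r))
            total += _dot(r, _cross(da, db)) / dist ** 3
    return total / (4 * math.pi)


def circle(center, u, v, n=100):
    """Unit circle around `center` in the plane spanned by unit vectors u, v."""
    points = []
    for i in range(n):
        t = 2 * math.pi * i / n
        points.append(tuple(center[k] + math.cos(t) * u[k] + math.sin(t) * v[k]
                            for k in range(3)))
    return points


# Hypothetical test geometry: a ring in the xy-plane, one ring threaded
# through it (a Hopf link), and one ring moved away so nothing is threaded.
ring_xy = circle((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
ring_linked = circle((1.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 1.0))
ring_far = circle((3.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 1.0))

lk_linked = gauss_linking_number(ring_xy, ring_linked)
lk_far = gauss_linking_number(ring_xy, ring_far)
```

For the threaded pair the value should come out close to ±1, and close to 0 for the separated pair. Protein chains are open curves, so applying the same integral to backbone segments generally gives non-integer values.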
Affiliation(s)
- Puqing Deng
  - Department of Chemical and Biological Engineering, Hong Kong University of Science and Technology, Clear Water Bay 999077, Hong Kong
- Lianjie Xu
  - Beijing National Laboratory for Molecular Sciences, Key Laboratory of Polymer Chemistry & Physics of Ministry of Education, Center for Soft Matter Science and Engineering, College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, P. R. China
- Ying Wei
  - College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, P. R. China
- Fei Sun
  - Department of Chemical and Biological Engineering, Hong Kong University of Science and Technology, Clear Water Bay 999077, Hong Kong
- Linyan Li
  - Department of Data Science, City University of Hong Kong, Kowloon 999077, Hong Kong
- Wen-Bin Zhang
  - Beijing National Laboratory for Molecular Sciences, Key Laboratory of Polymer Chemistry & Physics of Ministry of Education, Center for Soft Matter Science and Engineering, College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, P. R. China
  - AI for Science (AI4S)-Preferred Program, Shenzhen Graduate School, Peking University, Shenzhen 518055, P. R. China
- Hanyu Gao
  - Department of Chemical and Biological Engineering, Hong Kong University of Science and Technology, Clear Water Bay 999077, Hong Kong
4
Røgen P. Sequence-Similar Protein Domain Pairs With Structural or Topological Dissimilarity. Proteins 2025; 93:588-597. [PMID: 39392124 PMCID: PMC11809131 DOI: 10.1002/prot.26753] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/11/2024] [Revised: 08/14/2024] [Accepted: 09/26/2024] [Indexed: 10/12/2024]
Abstract
For a variety of applications, protein structures are clustered by sequence similarity, and sequence-redundant structures are disregarded. Sequence-similar chains are likely to have similar structures, but significant structural variation, as measured with RMSD, has been documented for sequence-similar chains and found usually to have a functional explanation. Moving two neighboring stretches of backbone through each other may change the chain topology and alter possible folding paths. The size of this motion is compatible to a variation in a flexible loop. We search and find domains with alternate chain topology in CATH4.2 sequence families relatively independent of sequence identity and of structural similarity as measured by RMSD. Structural, topological, and functional representative sets should therefore keep sequence-similar domains not just with structural variation but also with topological variation. We present BCAlign that finds Alignment and superposition of protein Backbone Curves by optimizing a user chosen convex combination of structural derivation and derivation between the structure-based sequence alignment and an input sequence alignment. Steric and topological obstructions from deforming a curve into an aligned curve are then found by a previously developed algorithm. For highly sequence-similar domains, sequence-based structural alignment better represents the chains motion and generally reveals larger structural and topological variation than structure-based does. Fold-switching protein pairs have been reported to be most frequent between X-ray and NMR structures and estimated to be underrepresented in the PDB as the alternate configuration is harder to resolve. Here we similarly find chain topology most frequently altered between X-ray and NMR structures.
Affiliation(s)
- Peter Røgen
  - Department of Applied Mathematics and Computer Science, Technical University of Denmark, Kongens Lyngby, Denmark
5
Liu Y, Rao S, Hoskins I, Geng M, Zhao Q, Chacko J, Ghatpande V, Qi K, Persyn L, Wang J, Zheng D, Zhong Y, Park D, Cenik ES, Agarwal V, Ozadam H, Cenik C. Translation efficiency covariation across cell types is a conserved organizing principle of mammalian transcriptomes. bioRxiv 2025:2024.08.11.607360. [PMID: 39149359 PMCID: PMC11326257 DOI: 10.1101/2024.08.11.607360] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/17/2024]
Abstract
Characterization of shared patterns of RNA expression between genes across conditions has led to the discovery of regulatory networks and novel biological functions. However, it is unclear if such coordination extends to translation, a critical step in gene expression. Here, we uniformly analyzed 3,819 ribosome profiling datasets from 117 human and 94 mouse tissues and cell lines. We introduce the concept of Translation Efficiency Covariation (TEC), identifying coordinated translation patterns across cell types. We nominate potential mechanisms driving shared patterns of translation regulation. TEC is conserved across human and mouse cells and helps uncover gene functions. Moreover, our observations indicate that proteins that physically interact are highly enriched for positive covariation at both translational and transcriptional levels. Our findings establish translational covariation as a conserved organizing principle of mammalian transcriptomes.
Affiliation(s)
- Yue Liu
  - Department of Molecular Biosciences, University of Texas at Austin, Austin, TX 78712, USA
- Shilpa Rao
  - Department of Molecular Biosciences, University of Texas at Austin, Austin, TX 78712, USA
- Ian Hoskins
  - Department of Molecular Biosciences, University of Texas at Austin, Austin, TX 78712, USA
- Michael Geng
  - Department of Molecular Biosciences, University of Texas at Austin, Austin, TX 78712, USA
- Qiuxia Zhao
  - Department of Molecular Biosciences, University of Texas at Austin, Austin, TX 78712, USA
- Jonathan Chacko
  - Department of Molecular Biosciences, University of Texas at Austin, Austin, TX 78712, USA
- Vighnesh Ghatpande
  - Department of Molecular Biosciences, University of Texas at Austin, Austin, TX 78712, USA
- Kangsheng Qi
  - Department of Molecular Biosciences, University of Texas at Austin, Austin, TX 78712, USA
- Logan Persyn
  - Department of Molecular Biosciences, University of Texas at Austin, Austin, TX 78712, USA
- Jun Wang
  - mRNA Center of Excellence, Sanofi, Waltham, MA 02451, USA
- Dinghai Zheng
  - mRNA Center of Excellence, Sanofi, Waltham, MA 02451, USA
- Yochen Zhong
  - Department of Molecular Biosciences, University of Texas at Austin, Austin, TX 78712, USA
- Dayea Park
  - Department of Molecular Biosciences, University of Texas at Austin, Austin, TX 78712, USA
- Elif Sarinay Cenik
  - Department of Molecular Biosciences, University of Texas at Austin, Austin, TX 78712, USA
- Vikram Agarwal
  - mRNA Center of Excellence, Sanofi, Waltham, MA 02451, USA
- Hakan Ozadam
  - Department of Molecular Biosciences, University of Texas at Austin, Austin, TX 78712, USA
  - Present address: Sail Biomedicines, Cambridge, MA 02141, USA
6
Lau AM, Bordin N, Kandathil SM, Sillitoe I, Waman VP, Wells J, Orengo CA, Jones DT. Exploring structural diversity across the protein universe with The Encyclopedia of Domains. Science 2024; 386:eadq4946. [PMID: 39480926 DOI: 10.1126/science.adq4946] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/21/2024] [Accepted: 08/30/2024] [Indexed: 11/02/2024]
Abstract
The AlphaFold Protein Structure Database (AFDB) contains more than 214 million predicted protein structures composed of domains, which are independently folding units found in multiple structural and functional contexts. Identifying domains can enable many functional and evolutionary analyses but has remained challenging because of the sheer scale of the data. Using deep learning methods, we have detected and classified every domain in the AFDB, producing The Encyclopedia of Domains. We detected nearly 365 million domains, over 100 million more than can be found by sequence methods, covering more than 1 million taxa. Reassuringly, 77% of the nonredundant domains are similar to known superfamilies, greatly expanding representation of their domain space. We uncovered more than 10,000 new structural interactions between superfamilies and thousands of new folds across the fold space continuum.
Affiliation(s)
- Andy M Lau
  - Department of Computer Science, University College London, London WC1E 6BT, UK
- Nicola Bordin
  - Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
- Shaun M Kandathil
  - Department of Computer Science, University College London, London WC1E 6BT, UK
- Ian Sillitoe
  - Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
- Vaishali P Waman
  - Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
- Jude Wells
  - Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
  - Centre for Artificial Intelligence, University College London, London WC1V 6BH, UK
- Christine A Orengo
  - Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
- David T Jones
  - Department of Computer Science, University College London, London WC1E 6BT, UK
  - Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
7
Csikász-Nagy A, Fichó E, Noto S, Reguly I. Computational tools to predict context-specific protein complexes. Curr Opin Struct Biol 2024; 88:102883. [PMID: 38986166 DOI: 10.1016/j.sbi.2024.102883] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/07/2024] [Revised: 05/21/2024] [Accepted: 06/19/2024] [Indexed: 07/12/2024]
Abstract
Interactions between thousands of proteins define cells' protein-protein interaction (PPI) network. Some of these interactions lead to the formation of protein complexes. It is challenging to identify a protein complex in a haystack of protein-protein interactions, and it is even more difficult to predict all protein complexes of the complexome. Simulations and machine learning approaches try to crack these problems by looking at the PPI network or predicted protein structures. Clustering of PPI networks led to the first protein complex predictions, while most recently, atomistic models of protein complexes and deep-learning-based structure prediction methods have also emerged. The simulation of PPI level interactions even enables the quantitative prediction of protein complexes. These methods, the required data sources, and their potential future developments are discussed in this review.
Affiliation(s)
- Attila Csikász-Nagy
  - Cytocast Hungary Kft, Budapest, Hungary; Faculty of Information Technology and Bionics, Pázmány Péter Catholic University, Budapest, Hungary
- Santiago Noto
  - Cytocast Hungary Kft, Budapest, Hungary; Escola de Matemática Aplicada, Fundação Getúlio Vargas, Rio de Janeiro, Brazil
- István Reguly
  - Cytocast Hungary Kft, Budapest, Hungary; Faculty of Information Technology and Bionics, Pázmány Péter Catholic University, Budapest, Hungary
8
Sikora M, Klimentova E, Uchal D, Sramkova D, Perlinska AP, Nguyen ML, Korpacz M, Malinowska R, Nowakowski S, Rubach P, Simecek P, Sulkowska JI. Knot or not? Identifying unknotted proteins in knotted families with sequence-based Machine Learning model. Protein Sci 2024; 33:e4998. [PMID: 38888487 PMCID: PMC11184937 DOI: 10.1002/pro.4998] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/04/2023] [Revised: 03/14/2024] [Accepted: 04/09/2024] [Indexed: 06/20/2024]
Abstract
Knotted proteins, although scarce, are crucial structural components of certain protein families, and their roles continue to be a topic of intense research. Capitalizing on the vast collection of protein structure predictions offered by AlphaFold (AF), this study computationally examines the entire UniProt database to create a robust dataset of knotted and unknotted proteins. Utilizing this dataset, we develop a machine learning (ML) model capable of accurately predicting the presence of knots in protein structures solely from their amino acid sequences. We tested the model's capabilities on 100 proteins whose structures had not yet been predicted by AF and found agreement with our local prediction in 92% cases. From the point of view of structural biology, we found that all potentially knotted proteins predicted by AF can be classified only into 17 families. This allows us to discover the presence of unknotted proteins in families with a highly conserved knot. We found only three new protein families: UCH, DUF4253, and DUF2254, that contain both knotted and unknotted proteins, and demonstrate that deletions within the knot core could potentially account for the observed unknotted (trivial) topology. Finally, we have shown that in the majority of knotted families (11 out of 15), the knotted topology is strictly conserved in functional proteins with very low sequence similarity. We have conclusively demonstrated that proteins AF predicts as unknotted are structurally accurate in their unknotted configurations. However, these proteins often represent nonfunctional fragments, lacking significant portions of the knot core (amino acid sequence).
Affiliation(s)
- Maciej Sikora
  - Centre of New Technologies, University of Warsaw, Warsaw, Poland
  - Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warsaw, Poland
- Eva Klimentova
  - Central European Institute of Technology, Masaryk University, Brno, Czech Republic
  - National Centre for Biomolecular Research, Faculty of Science, Masaryk University, Brno, Czech Republic
- Dawid Uchal
  - Centre of New Technologies, University of Warsaw, Warsaw, Poland
  - Faculty of Physics, University of Warsaw, Warsaw, Poland
- Denisa Sramkova
  - Central European Institute of Technology, Masaryk University, Brno, Czech Republic
  - National Centre for Biomolecular Research, Faculty of Science, Masaryk University, Brno, Czech Republic
- Mai Lan Nguyen
  - Centre of New Technologies, University of Warsaw, Warsaw, Poland
- Marta Korpacz
  - Centre of New Technologies, University of Warsaw, Warsaw, Poland
  - Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warsaw, Poland
- Roksana Malinowska
  - Centre of New Technologies, University of Warsaw, Warsaw, Poland
  - Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warsaw, Poland
- Szymon Nowakowski
  - Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warsaw, Poland
  - Faculty of Physics, University of Warsaw, Warsaw, Poland
- Pawel Rubach
  - Centre of New Technologies, University of Warsaw, Warsaw, Poland
  - Warsaw School of Economics, Warsaw, Poland
- Petr Simecek
  - Central European Institute of Technology, Masaryk University, Brno, Czech Republic
9
Xie T, Huang J. Can Protein Structure Prediction Methods Capture Alternative Conformations of Membrane Transporters? J Chem Inf Model 2024; 64:3524-3536. [PMID: 38564295 DOI: 10.1021/acs.jcim.3c01936] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/04/2024]
Abstract
Understanding the conformational dynamics of proteins, such as the inward-facing (IF) and outward-facing (OF) transition observed in transporters, is vital for elucidating their functional mechanisms. Despite significant advances in protein structure prediction (PSP) over the past three decades, most efforts have been focused on single-state prediction, leaving multistate or alternative conformation prediction (ACP) relatively unexplored. This discrepancy has led to the development of highly accurate PSP methods such as AlphaFold, yet their capabilities for ACP remain limited. To investigate the performance of current PSP methods in ACP, we curated a data set, named IOMemP, consisting of 32 experimentally determined high-resolution IF and OF structures of 16 membrane proteins with substantial conformational changes. We benchmarked 12 representative PSP methods, along with two recent multistate methods based on AlphaFold, against this data set. Our findings reveal a remarkably consistent preference for specific states across various PSP methods. We elucidated how coevolution information in MSAs influences state preference. Moreover, we showed that AlphaFold, when excluding coevolution information, estimated similar energies between the experimental IF and OF conformations, indicating that the energy model learned by AlphaFold is not biased toward any particular state. Our IOMemP data set and benchmark results are anticipated to advance the development of robust ACP methods.
Affiliation(s)
- Tengyu Xie
  - College of Life Science, Zhejiang University, Hangzhou, Zhejiang 310058, China
  - Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, Hangzhou, Zhejiang 310024, China
  - Westlake AI Therapeutics Lab, Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, Zhejiang 310024, China
- Jing Huang
  - College of Life Science, Zhejiang University, Hangzhou, Zhejiang 310058, China
  - Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, Hangzhou, Zhejiang 310024, China
  - Westlake AI Therapeutics Lab, Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, Zhejiang 310024, China