1
Sun Q, Wang H, Xie J, Wang L, Mu J, Li J, Ren Y, Lai L. Computer-Aided Drug Discovery for Undruggable Targets. Chem Rev 2025. [PMID: 40423592 DOI: 10.1021/acs.chemrev.4c00969] [Indexed: 05/28/2025]
Abstract
Undruggable targets are those of therapeutic significance that are challenging for conventional drug design approaches. Such targets often exhibit unique features, including highly dynamic structures, a lack of well-defined ligand-binding pockets, highly conserved active sites, and functional modulation by protein-protein interactions. Recent advances in computational simulations and artificial intelligence have revolutionized the drug design landscape, giving rise to innovative strategies for overcoming these obstacles. In this review, we highlight the latest progress in computational approaches for drug design against undruggable targets, present several successful case studies, and discuss remaining challenges and future directions. Special emphasis is placed on four primary target categories: intrinsically disordered proteins, protein allosteric regulation, protein-protein interactions, and protein degradation, along with a discussion of emerging target types. We also examine how AI-driven methodologies have transformed the field, from protein-ligand complex structure prediction and virtual screening to de novo ligand generation for undruggable targets. Integrating computational methods with experimental techniques is expected to bring further breakthroughs in overcoming the hurdles posed by undruggable targets. As the field continues to evolve, these advances hold great promise for expanding the druggable space, offering new therapeutic opportunities for previously untreatable diseases.
Affiliation(s)
- Qi Sun
- BNLMS, College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, China
- Peking University Chengdu Academy for Advanced Interdisciplinary Biotechnologies, Chengdu, Sichuan 610213, China
- Hanping Wang
- BNLMS, College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, China
- Juan Xie
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
- Liying Wang
- BNLMS, College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, China
- Junxi Mu
- Peking-Tsinghua Center for Life Science, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
- Junren Li
- BNLMS, College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, China
- Yuhao Ren
- BNLMS, College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, China
- Luhua Lai
- BNLMS, College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, China
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
- Peking-Tsinghua Center for Life Science, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
- Peking University Chengdu Academy for Advanced Interdisciplinary Biotechnologies, Chengdu, Sichuan 610213, China
- Research Unit of Drug Design Method, Chinese Academy of Medical Sciences, Peking University, Beijing 100871, China
2
Tolmachev V, Papalanis E, Bezverkhniaia EA, Rosly AH, Vorobyeva A, Orlova A, Carlqvist M, Frejd FY, Oroujeni M. Impact of Radiometal Chelates on In Vivo Visualization of Immune Checkpoint Protein Using Radiolabeled Affibody Molecules. ACS Pharmacol Transl Sci 2025; 8:706-717. [PMID: 40109742 PMCID: PMC11915182 DOI: 10.1021/acsptsci.4c00539] [Received: 09/09/2024] [Revised: 02/05/2025] [Accepted: 02/10/2025] [Indexed: 03/22/2025]
Abstract
The immune checkpoint protein B7-H3 (CD276) is overexpressed in various cancers and is an attractive target for the treatment of malignant tumors. Radionuclide molecular imaging of B7-H3 expression using engineered scaffold proteins such as Affibody molecules is a promising strategy for selecting potential responders to B7-H3-targeted therapy. Feasibility of B7-H3 imaging was demonstrated using two 99mTc-labeled probes, AC12 and an affinity-matured SYNT179, using a [99mTc]Tc-GGGC label. This study aimed to evaluate whether a residualizing 111In-based label provides better imaging contrast than a nonresidualizing label. To do so, the SYNT179 and AC12-GGGC Affibody molecules were labeled with 111In using the (4,10-bis-carboxymethyl-7-{[2-(2,5-dioxo-3-thioxo-pyrrolidin-1-yl)-ethylcarbamoyl]-methyl}-1,4,7,10-tetraaza-cyclododec-1-yl)-acetic acid (maleimide-DOTA) chelator, site-specifically coupled to the C-terminus of the Affibody molecules. The binding affinities of the 111In-labeled conjugates to living B7-H3-expressing cells were higher than those of the 99mTc-labeled variants. In mice with B7-H3-expressing xenografts, the tumor uptake of the 111In-labeled proteins (3.6 ± 0.3 and 1.8 ± 0.5%ID/g for [111In]In-SYNT179-DOTA and [111In]In-AC12-DOTA, respectively) was significantly (p < 0.05, ANOVA) higher than that of their 99mTc-labeled counterparts (1.6 ± 0.2%ID/g and 0.8 ± 0.2%ID/g for [99mTc]Tc-SYNT179 and [99mTc]Tc-AC12-GGGC, respectively). The best variant, [111In]In-SYNT179-DOTA, provided a tumor-to-blood ratio of 31.1 ± 2.9, twice that of [99mTc]Tc-SYNT179 and 7-fold higher than that of [99mTc]Tc-AC12-GGGC. Both 111In-labeled Affibody molecules had higher renal retention than the 99mTc-labeled ones, but their hepatobiliary excretion was appreciably lower, potentially improving the imaging of abdominal metastases. Overall, [111In]In-SYNT179-DOTA is the most promising tracer for visualization of B7-H3 expression.
Affiliation(s)
- Vladimir Tolmachev
- Department of Immunology, Genetics and Pathology, Uppsala University, 751 85 Uppsala, Sweden
- Eleftherios Papalanis
- Department of Immunology, Genetics and Pathology, Uppsala University, 751 85 Uppsala, Sweden
- Alia Hani Rosly
- Department of Immunology, Genetics and Pathology, Uppsala University, 751 85 Uppsala, Sweden
- Anzhelika Vorobyeva
- Department of Immunology, Genetics and Pathology, Uppsala University, 751 85 Uppsala, Sweden
- Anna Orlova
- Department of Medicinal Chemistry, Uppsala University, 751 83 Uppsala, Sweden
- Fredrik Y Frejd
- Department of Immunology, Genetics and Pathology, Uppsala University, 751 85 Uppsala, Sweden
- Affibody AB, 171 65 Solna, Sweden
- Maryam Oroujeni
- Department of Immunology, Genetics and Pathology, Uppsala University, 751 85 Uppsala, Sweden
3
Zhong J, Zou Z, Qiu J, Wang S. ScFold: a GNN-based model for efficient inverse folding of short-chain proteins via spatial reduction. Brief Bioinform 2025; 26:bbaf156. [PMID: 40205854 PMCID: PMC11982017 DOI: 10.1093/bib/bbaf156] [Received: 11/04/2024] [Revised: 02/24/2025] [Accepted: 03/19/2025] [Indexed: 04/11/2025]
Abstract
In protein design, efficiently constructing protein sequences that accurately fold into predefined structures has become an important area of research. Although advances have been made for long-chain proteins, the design of short-chain proteins deserves equal consideration. Short, single chains typically carry less structural information than full-length chains, which can degrade design performance. To address this challenge, we introduce ScFold, a model with a new node module that uses spatial dimensionality reduction and positional encoding to improve the extraction of structural features. ScFold achieves a recovery rate of 52.22% on the CATH4.2 dataset and a recovery rate of 41.6% for short-chain proteins. It also achieves recovery rates of 59.32% and 61.59% on the TS50 and TS500 datasets, respectively, demonstrating its effectiveness across diverse protein types. We further stratified the TS500 and CATH4.2 datasets by protein length and tested ScFold on the length-specific sub-datasets; the results confirm the model's superiority on short-chain proteins. Finally, we selected several groups of protein sequences from the CATH4.2 dataset for structural visualization and compared the model-generated sequences with the target sequences.
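Sequence recovery, the metric behind the percentages above, is conventionally the fraction of positions at which the designed residue matches the native residue. The Python sketch below assumes that per-residue definition, averages it per protein, and groups proteins into length bins to mirror the length-stratified evaluation. The abstract does not state ScFold's exact averaging scheme or bin edges, so the function names and bins here are hypothetical.

```python
from collections import defaultdict


def recovery_rate(designed: str, native: str) -> float:
    """Fraction of aligned positions where the designed residue equals the native one."""
    if len(designed) != len(native):
        raise ValueError("designed and native sequences must have the same length")
    if not native:
        raise ValueError("sequences must be non-empty")
    matches = sum(d == n for d, n in zip(designed, native))
    return matches / len(native)


def stratified_recovery(pairs, bins):
    """Mean per-protein recovery within each length bin.

    pairs: iterable of (designed, native) sequence pairs.
    bins:  list of inclusive (low, high) length ranges (hypothetical edges).
    Proteins whose length falls outside every bin are skipped.
    """
    per_bin = defaultdict(list)
    for designed, native in pairs:
        length = len(native)
        for low, high in bins:
            if low <= length <= high:
                per_bin[(low, high)].append(recovery_rate(designed, native))
                break  # assign each protein to the first matching bin only
    return {b: sum(vals) / len(vals) for b, vals in per_bin.items()}
```

For example, `recovery_rate("ACDE", "ACDF")` gives 0.75, since three of the four positions match. Averaging per residue across the whole dataset instead of per protein would weight long chains more heavily, which is one reason reported recovery figures are not always directly comparable.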
Affiliation(s)
- Jiancheng Zhong
- College of Information Science and Engineering, Hunan Normal University, 36 Lushan Road, Yuelu District, Changsha 410081, Hunan, China
- Zhiwei Zou
- College of Information Science and Engineering, Hunan Normal University, 36 Lushan Road, Yuelu District, Changsha 410081, Hunan, China
- Jie Qiu
- College of Information Science and Engineering, Hunan Normal University, 36 Lushan Road, Yuelu District, Changsha 410081, Hunan, China
- Shaokai Wang
- Department of Mathematics, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong SAR, China
4
Patat AS, Nalbantoğlu ÖU. Enhancing Functional Protein Design Using Heuristic Optimization and Deep Learning for Anti-Inflammatory and Gene Therapy Applications. Proteins 2025. [PMID: 39985803 DOI: 10.1002/prot.26810] [Received: 08/29/2024] [Revised: 01/21/2025] [Accepted: 02/03/2025] [Indexed: 02/24/2025]
Abstract
Protein sequence design is a highly challenging task, aimed at discovering new proteins that are more functional and producible under laboratory conditions than their natural counterparts. Deep learning-based approaches developed to address this problem have achieved significant success. However, these approaches often do not adequately emphasize the functional properties of proteins. In this study, we developed a heuristic optimization method to enhance key functionalities such as solubility, flexibility, and stability, while preserving the structural integrity of proteins. This method aims to reduce laboratory demands by enabling a design that is both functional and structurally sound. This approach is particularly valuable for the synthetic production of proteins with anti-inflammatory properties and those used in gene therapy. The designed proteins were initially evaluated for their ability to preserve natural structures using recovery and confidence metrics, followed by assessments with the AlphaFold tool. Additionally, natural protein sequences were mutated using a genetic algorithm and compared with those designed by our method. The results demonstrate that the protein sequences generated by our method exhibit much greater similarity to native protein sequences and structures. The code and sequences for the designed proteins are available at https://github.com/aysenursoyturk/HMHO.
Affiliation(s)
- Ayşenur Soytürk Patat
- Department of Bioinformatics Systems Biology, Erciyes University, Kayseri, Turkey
- Department of Bioinformatics, Necmettin Erbakan University, Konya, Turkey
5
Chen Z, Ji M, Qian J, Zhang Z, Zhang X, Gao H, Wang H, Wang R, Qi Y. ProBID-Net: a deep learning model for protein-protein binding interface design. Chem Sci 2024; 15:19977-19990. [PMID: 39568891 PMCID: PMC11575592 DOI: 10.1039/d4sc02233e] [Received: 04/04/2024] [Accepted: 10/11/2024] [Indexed: 11/22/2024]
Abstract
Protein-protein interactions are pivotal in numerous biological processes. The computational design of these interactions facilitates the creation of novel binding proteins, which is crucial for advancing biopharmaceutical products. With the evolution of artificial intelligence (AI), protein design tools have swiftly transitioned from scoring-function-based to AI-based models. However, many AI models for protein design assume that the amino acid sequence of the input protein is entirely unknown, an assumption suited to de novo design but problematic when designing protein-protein interactions for a receptor whose sequence is known. To bridge this gap in computational protein design, we introduce ProBID-Net. Trained on natural protein-protein complex structures and protein domain-domain interface structures, ProBID-Net can discern features from known target protein structures to design specific binding proteins based on their binding sites. In independent tests, ProBID-Net achieved interface sequence recovery rates of 52.7%, 43.9%, and 37.6%, matching or surpassing ProteinMPNN in binding protein design. When validated with AlphaFold-Multimer, the sequences designed by ProBID-Net showed close correspondence between the design target and the predicted structure. Moreover, the model's output can predict changes in binding affinity upon mutations in protein complexes, even when no data on such mutations were provided during training (zero-shot prediction). In summary, the ProBID-Net model is poised to significantly advance the design of protein-protein interactions.
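The abstract does not say how ProBID-Net's output is converted into a mutation-effect prediction. One common convention for zero-shot scoring with sequence-design models, assumed here, is the log-odds ratio log p(mut) - log p(wt) at the mutated position, with multi-point variants scored by summing per-site values under an additivity assumption. The probability dictionaries stand in for the model's per-position output and are hypothetical.

```python
import math


def mutation_log_odds(site_probs, wt, mut):
    """Zero-shot score for one point mutation: log p(mut) - log p(wt).

    site_probs: dict mapping one-letter amino-acid codes to the model's
    probabilities at that position (a hypothetical stand-in for model output).
    Positive values mean the model prefers the mutant residue.
    """
    p_wt, p_mut = site_probs[wt], site_probs[mut]
    if p_wt <= 0 or p_mut <= 0:
        raise ValueError("probabilities must be positive to take logs")
    return math.log(p_mut) - math.log(p_wt)


def variant_log_odds(probs_by_position, mutations):
    """Score a multi-point variant by summing per-site log-odds.

    mutations: iterable of (position, wt, mut) tuples.
    Summation assumes the mutations contribute additively.
    """
    return sum(
        mutation_log_odds(probs_by_position[pos], wt, mut)
        for pos, wt, mut in mutations
    )
```

If the model is well calibrated, a positive score would be expected to correspond to a mutation that strengthens binding (a negative ddG when ddG is defined as dG(mutant) - dG(wild type)), so such scores are typically evaluated by rank correlation with experimental affinity changes rather than as absolute energies.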
Affiliation(s)
- Zhihang Chen
- Department of Medicinal Chemistry, School of Pharmacy, Fudan University 826 Zhangheng Road Shanghai 201203 People's Republic of China
- Menglin Ji
- Department of Medicinal Chemistry, School of Pharmacy, Fudan University 826 Zhangheng Road Shanghai 201203 People's Republic of China
- Jie Qian
- Department of Medicinal Chemistry, School of Pharmacy, Fudan University 826 Zhangheng Road Shanghai 201203 People's Republic of China
- Zhe Zhang
- Department of Medicinal Chemistry, School of Pharmacy, Fudan University 826 Zhangheng Road Shanghai 201203 People's Republic of China
- Xiangying Zhang
- Department of Medicinal Chemistry, School of Pharmacy, Fudan University 826 Zhangheng Road Shanghai 201203 People's Republic of China
- Haotian Gao
- Department of Medicinal Chemistry, School of Pharmacy, Fudan University 826 Zhangheng Road Shanghai 201203 People's Republic of China
- Haojie Wang
- Department of Medicinal Chemistry, School of Pharmacy, Fudan University 826 Zhangheng Road Shanghai 201203 People's Republic of China
- Renxiao Wang
- Department of Medicinal Chemistry, School of Pharmacy, Fudan University 826 Zhangheng Road Shanghai 201203 People's Republic of China
- Yifei Qi
- Department of Medicinal Chemistry, School of Pharmacy, Fudan University 826 Zhangheng Road Shanghai 201203 People's Republic of China
6
Zhang J, Basu S, Zhang F, Kurgan L. MERIT: Accurate Prediction of Multi Ligand-binding Residues with Hybrid Deep Transformer Network, Evolutionary Couplings and Transfer Learning. J Mol Biol 2024:168872. [PMID: 40133785 DOI: 10.1016/j.jmb.2024.168872] [Received: 09/13/2024] [Revised: 10/30/2024] [Accepted: 11/15/2024] [Indexed: 03/27/2025]
Abstract
Multi-ligand binding residues (MLBRs) are amino acids in protein sequences that interact with multiple different ligands, including proteins, peptides, nucleic acids, and a variety of small molecules. MLBRs are implicated in a number of cellular functions and are targeted in the context of multiple human diseases. Many sequence-based predictors identify residues that interact with specific ligand types, and they can be used collectively to identify MLBRs. However, no existing method predicts MLBRs directly. To this end, we conceptualize, design, evaluate and release MERIT (Multi-binding rEsidues pRedIcTor). This tool relies on a custom-crafted deep neural network with several innovative features, such as a multi-layered/step architecture with transformer modules trained using a custom-designed loss function, computation of evolutionary couplings, and application of transfer learning. An ablation analysis shows that these innovations boost predictive performance. In particular, they reduce the number of cross-predictions, defined as residues that interact with a single ligand type but are incorrectly predicted as MLBRs. We compare MERIT against a representative selection of current and popular ligand-specific predictors, meta-predictors that combine their results to identify MLBRs, and a baseline regression-based predictor. These tests reveal that MERIT provides accurate predictions and statistically outperforms these alternatives. Moreover, using two test datasets, one with MLBRs and another with only single-ligand binding residues, we show that MERIT consistently produces relatively low false positive rates, including low rates of cross-predictions. The web server and datasets from this study are freely available at http://biomine.cs.vcu.edu/servers/MERIT/.
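The two error measures in the last comparison can be made concrete under one plausible reading, assumed here: the false positive rate is the fraction of non-MLBRs (residues binding fewer than two ligand types) that are predicted as MLBRs, and the cross-prediction rate is the same fraction restricted to residues that bind exactly one ligand type. The label encoding and function name are hypothetical and are not taken from MERIT's evaluation protocol.

```python
def mlbr_error_rates(ligand_type_counts, predicted_mlbr):
    """Return (false_positive_rate, cross_prediction_rate) for MLBR predictions.

    ligand_type_counts: per-residue number of distinct ligand types bound (0, 1, 2, ...).
    predicted_mlbr:     per-residue booleans, True if predicted to be an MLBR.
    A residue is a true MLBR if it binds two or more ligand types.
    """
    if len(ligand_type_counts) != len(predicted_mlbr):
        raise ValueError("labels and predictions must have the same length")
    # Non-MLBRs: residues binding zero or one ligand type.
    non_mlbr = [p for n, p in zip(ligand_type_counts, predicted_mlbr) if n < 2]
    # Single-ligand binders: the subset whose false positives count as cross-predictions.
    single = [p for n, p in zip(ligand_type_counts, predicted_mlbr) if n == 1]
    fpr = sum(non_mlbr) / len(non_mlbr) if non_mlbr else 0.0
    cross = sum(single) / len(single) if single else 0.0
    return fpr, cross
```

For example, `mlbr_error_rates([0, 0, 1, 1, 2, 3], [False, False, True, False, True, True])` returns `(0.25, 0.5)`: one of the four non-MLBRs is flagged, and that residue is one of the two single-ligand binders.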
Affiliation(s)
- Jian Zhang
- School of Computer and Information Technology, Xinyang Normal University, Xinyang 464000, China; Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324003, China.
- Sushmita Basu
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
- Fuhao Zhang
- College of Information Engineering, Northwest A&F University, Yangling, Shaanxi 712100, China
- Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA.
7
Xu W, Wu Z, Zhang C, Zhu C, Duan H. PepCARES: A Comprehensive Advanced Refinement and Evaluation System for Peptide Design and Affinity Screening. ACS Omega 2024; 9:46429-46438. [PMID: 39583700 PMCID: PMC11579952 DOI: 10.1021/acsomega.4c07682] [Received: 08/20/2024] [Revised: 10/27/2024] [Accepted: 11/04/2024] [Indexed: 11/26/2024]
Abstract
Peptides are crucial in vaccine research, and their remarkable specificity and efficacy make them a promising potential drug class. However, designing and screening these peptides computationally is challenging. Here, we present the comprehensive advanced refinement and evaluation system (PepCARES), a program that uses our novel model, PeptideMPNN, together with score evaluation for peptide design and affinity screening. PeptideMPNN, built on ProteinMPNN with transfer learning, significantly enhances sequence recovery (by 26.26%) and reduces perplexity (by 0.536) in a sequence generation task. We designed peptides targeting two HLA alleles and, using MHCfovea and PDBePISA, identified candidates with high potential. From 20 designed peptides, 14 and 7 peptides were selected, respectively. Our research provides a method for designing and screening peptides, taking an important step toward the development of peptide-based vaccines.
Affiliation(s)
- Wen Xu
- College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou 310014, China
- Zhipeng Wu
- College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou 310014, China
- Chengyun Zhang
- AI Department, Shanghai Highslab Therapeutics, Inc., Shanghai 201203, China
- Cheng Zhu
- College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou 310014, China
- Hongliang Duan
- Faculty of Applied Sciences, Macao Polytechnic University, Macao 999078, China
8
Zhang F, Naeem M, Yu B, Liu F, Ju J. Improving the enzymatic activity and stability of N-carbamoyl hydrolase using deep learning approach. Microb Cell Fact 2024; 23:164. [PMID: 38834993 PMCID: PMC11151596 DOI: 10.1186/s12934-024-02439-5] [Received: 02/14/2024] [Accepted: 05/24/2024] [Indexed: 06/06/2024]
Abstract
BACKGROUND Optically active D-amino acids are widely used as intermediates in the synthesis of antibiotics, insecticides, and peptide hormones. Currently, the two-enzyme cascade reaction using DHdt and DCase is the most efficient way to produce D-amino acids, but DCase is susceptible to heat inactivation. Here, to enhance the enzymatic activity and thermal stability of DCase, the rational design software "Feitian" was developed based on deep learning prediction of kcat. RESULTS Based on empirical design and "Feitian" predictions, six single-point mutants with high predicted kcat values were selected and constructed by site-directed mutagenesis. Of these six, three mutants (Q4C, T212S, and A302C) showed higher enzymatic activity than the wild-type. Furthermore, the combined triple-point mutant DCase-M3 (Q4C/T212S/A302C) exhibited a 4.25-fold increase in activity (29.77 ± 4.52 U) and a 2.25-fold increase in thermal stability compared with the wild-type. In the whole-cell reaction, the mutant Q4C/T212S/A302C produced a high D-HPG titer (2.57 ± 0.43 mM), about 2.04-fold that of the wild-type. Molecular dynamics simulations showed that DCase-M3 significantly enhances the rigidity of the catalytic site, thereby increasing its activity. CONCLUSIONS In this study, the efficient rational design software "Feitian" was developed, with a prediction accuracy of about 50% for enzymatic activity. A triple-point mutant, DCase-M3 (Q4C/T212S/A302C), with enhanced enzymatic activity and thermostability was obtained, which could be applied to the development of a fully enzymatic process for the industrial production of D-HPG.
Affiliation(s)
- Fa Zhang
- College of Life Science, Hebei Normal University, Shijiazhuang, 050024, China
- Institute of Microbiology, Chinese Academy of Sciences, Beijing, 100101, China
- Muhammad Naeem
- College of Life Science, Hebei Normal University, Shijiazhuang, 050024, China
- Bo Yu
- Institute of Microbiology, Chinese Academy of Sciences, Beijing, 100101, China
- Feixia Liu
- Institute of Microbiology, Chinese Academy of Sciences, Beijing, 100101, China.
- Jiansong Ju
- College of Life Science, Hebei Normal University, Shijiazhuang, 050024, China.
- Hebei Collaborative Innovation Center for Eco-Environment, Shijiazhuang, 050024, China.
9
Mu J, Li Z, Zhang B, Zhang Q, Iqbal J, Wadood A, Wei T, Feng Y, Chen HF. Graphormer supervised de novo protein design method and function validation. Brief Bioinform 2024; 25:bbae135. [PMID: 38557677 PMCID: PMC10982952 DOI: 10.1093/bib/bbae135] [Received: 10/07/2023] [Revised: 01/31/2024] [Accepted: 03/12/2024] [Indexed: 04/04/2024]
Abstract
Protein design is central to nearly all protein engineering problems, as it can enable the creation of proteins with new biological functions, such as improving the catalytic efficiency of enzymes. One key facet of protein design, fixed-backbone protein sequence design, seeks to design new sequences that will conform to a prescribed protein backbone structure. Nonetheless, existing sequence design methods have limitations, such as low sequence diversity and insufficient experimental validation of the designed functional proteins. These inadequacies obstruct the goal of functional protein design. To address these limitations, we developed the Graphormer-based Protein Design (GPD) model. This model applies the Transformer to a graph-based representation of three-dimensional protein structures and adds Gaussian noise and random sequence masks to node features, thereby enhancing sequence recovery and diversity. The GPD model performed significantly better than the state-of-the-art ProteinMPNN model on multiple independent tests, especially in sequence diversity. We employed GPD to design CalB hydrolase and generated nine artificially designed CalB proteins. The results show a 1.7-fold increase in catalytic activity compared with wild-type CalB and strong substrate selectivity on p-nitrophenyl acetate with different carbon chain lengths (C2-C16). Thus, the GPD method could be used for the de novo design of industrial enzymes and protein drugs. The code was released at https://github.com/decodermu/GPD.
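A minimal sketch of the input corruption described above, adding Gaussian noise to per-residue node features and randomly hiding sequence identities, is shown below in plain Python. The feature layout, noise level, mask probability, and the `MASK_TOKEN` index are assumptions for illustration, not values taken from GPD; the released code at the GitHub URL above is the authoritative reference.

```python
import random

NUM_AA = 20
MASK_TOKEN = NUM_AA  # hypothetical extra index meaning "residue identity hidden"


def corrupt_node_inputs(node_feats, seq_tokens, noise_std=0.1, mask_prob=0.15, seed=None):
    """Add Gaussian noise to node features and randomly mask sequence tokens.

    node_feats: list of per-residue feature vectors (lists of floats).
    seq_tokens: list of amino-acid indices in [0, NUM_AA).
    Returns (noisy_feats, masked_tokens, mask), where mask[i] is True if
    residue i's identity was replaced by MASK_TOKEN.
    """
    if len(node_feats) != len(seq_tokens):
        raise ValueError("one feature vector per residue is required")
    rng = random.Random(seed)
    # Independent Gaussian noise on every feature value.
    noisy = [[x + rng.gauss(0.0, noise_std) for x in row] for row in node_feats]
    # Each residue is masked independently with probability mask_prob.
    mask = [rng.random() < mask_prob for _ in seq_tokens]
    masked = [MASK_TOKEN if m else t for t, m in zip(seq_tokens, mask)]
    return noisy, masked, mask
```

Applying a fresh corruption at each training step means the model rarely sees identical inputs twice, which is the usual rationale for this kind of augmentation improving robustness and, when sampling, sequence diversity.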
Affiliation(s)
- Junxi Mu
- State Key Laboratory of Microbial metabolism, Joint International Research Laboratory of Metabolic Developmental Sciences, Department of Bioinformatics and Biostatistics, National Experimental Teaching Center for Life Sciences and Biotechnology, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai, 200240, China
- Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, No.5 Yiheyuan Road, Beijing, 100871, China
- Zhengxin Li
- State Key Laboratory of Microbial metabolism, Joint International Research Laboratory of Metabolic Developmental Sciences, Department of Bioinformatics and Biostatistics, National Experimental Teaching Center for Life Sciences and Biotechnology, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai, 200240, China
- Bo Zhang
- State Key Laboratory of Microbial metabolism, Joint International Research Laboratory of Metabolic Developmental Sciences, Department of Bioinformatics and Biostatistics, National Experimental Teaching Center for Life Sciences and Biotechnology, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai, 200240, China
- Qi Zhang
- State Key Laboratory of Microbial metabolism, Joint International Research Laboratory of Metabolic Developmental Sciences, Department of Bioinformatics and Biostatistics, National Experimental Teaching Center for Life Sciences and Biotechnology, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai, 200240, China
- Jamshed Iqbal
- Centre for Advanced Drug Research, COMSATS University Islamabad, Abbottabad Campus, Abbottabad, 22060, Pakistan
- Abdul Wadood
- Department of Biochemistry, Abdul Wali Khan University Mardan, Mardan, 23200, Pakistan
- Ting Wei
- State Key Laboratory of Microbial metabolism, Joint International Research Laboratory of Metabolic Developmental Sciences, Department of Bioinformatics and Biostatistics, National Experimental Teaching Center for Life Sciences and Biotechnology, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai, 200240, China
- Yan Feng
- State Key Laboratory of Microbial metabolism, Joint International Research Laboratory of Metabolic Developmental Sciences, Department of Bioinformatics and Biostatistics, National Experimental Teaching Center for Life Sciences and Biotechnology, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai, 200240, China
- Hai-Feng Chen
- State Key Laboratory of Microbial metabolism, Joint International Research Laboratory of Metabolic Developmental Sciences, Department of Bioinformatics and Biostatistics, National Experimental Teaching Center for Life Sciences and Biotechnology, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai, 200240, China
10
Yu J, Mu J, Wei T, Chen HF. Multi-indicator comparative evaluation for deep learning-based protein sequence design methods. Bioinformatics 2024; 40:btae037. [PMID: 38261649 PMCID: PMC10868333 DOI: 10.1093/bioinformatics/btae037] [Received: 10/24/2023] [Revised: 12/20/2023] [Accepted: 01/18/2024] [Indexed: 01/25/2024]
Abstract
MOTIVATION Proteins found in nature represent only a fraction of the vast space of possible proteins. Protein design offers an opportunity to explore and expand this protein landscape. Within protein design, protein sequence design plays a crucial role, and numerous successful methods have been developed; deep learning-based methods in particular have advanced significantly in recent years. However, a comprehensive and systematic comparison of these methods has been lacking, and the indicators reported by different methods are often inconsistent or ineffective. RESULTS To address this gap, we designed a diverse set of indicators covering several important aspects, including sequence recovery, diversity, root-mean-square deviation of protein structure, secondary structure, and the distribution of polar and nonpolar amino acids. We used an improved weighted inferiority-superiority distance method to comprehensively assess the performance of eight widely used deep learning-based protein sequence design methods. Our evaluation not only ranks these methods but also offers optimization suggestions by analyzing the strengths and weaknesses of each. Furthermore, we developed a method to select the best temperature parameter and proposed solutions for a common issue in protein design methods: designed sequences containing consecutive repetitive amino acids. These findings can help users select suitable protein sequence design methods. Overall, our work contributes to the field of protein sequence design by providing a comprehensive evaluation system and optimization suggestions for different methods.
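"Inferiority-superiority distance" appears to be a rendering of TOPSIS (technique for order preference by similarity to an ideal solution). Assuming that reading, the sketch below implements standard weighted TOPSIS: each indicator column is vector-normalized and weighted, each method's Euclidean distance to the ideal and anti-ideal points is computed, and methods are ranked by relative closeness to the ideal. The authors' "improved" variant is not described in the abstract, so the weights and indicator directions here are user inputs, not their settings.

```python
import math


def weighted_topsis(matrix, weights, benefit):
    """Closeness scores in [0, 1] for each alternative; higher is better.

    matrix:  rows are methods, columns are indicators (raw scores).
    weights: one non-negative weight per indicator.
    benefit: one bool per indicator; True if larger is better (e.g. recovery),
             False if smaller is better (e.g. RMSD).
    """
    n_cols = len(weights)
    # Vector-normalize each column (guarding against all-zero columns), then weight it.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) or 1.0 for j in range(n_cols)]
    v = [[weights[j] * row[j] / norms[j] for j in range(n_cols)] for row in matrix]
    cols = list(zip(*v))
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(cols)]
    anti_ideal = [min(c) if benefit[j] else max(c) for j, c in enumerate(cols)]
    scores = []
    for row in v:
        d_best = math.dist(row, ideal)
        d_worst = math.dist(row, anti_ideal)
        total = d_best + d_worst
        scores.append(d_worst / total if total else 0.0)
    return scores
```

For instance, recovery and diversity would be marked as benefit indicators and RMSD as a cost indicator; sorting methods by descending score then gives the ranking. Because the result depends on the chosen weights, reporting them alongside the ranking makes the comparison reproducible.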
Affiliation(s)
- Jinyu Yu
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, Department of Bioinformatics and Biostatistics, National Experimental Teaching Center for Life Sciences and Biotechnology, School of Life Sciences and Biotechnology, Shanghai Center for Systems Biomedicine, Shanghai Jiao Tong University, Shanghai 200240, China
- Junxi Mu
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, Department of Bioinformatics and Biostatistics, National Experimental Teaching Center for Life Sciences and Biotechnology, School of Life Sciences and Biotechnology, Shanghai Center for Systems Biomedicine, Shanghai Jiao Tong University, Shanghai 200240, China
- Ting Wei
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, Department of Bioinformatics and Biostatistics, National Experimental Teaching Center for Life Sciences and Biotechnology, School of Life Sciences and Biotechnology, Shanghai Center for Systems Biomedicine, Shanghai Jiao Tong University, Shanghai 200240, China
- Hai-Feng Chen
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, Department of Bioinformatics and Biostatistics, National Experimental Teaching Center for Life Sciences and Biotechnology, School of Life Sciences and Biotechnology, Shanghai Center for Systems Biomedicine, Shanghai Jiao Tong University, Shanghai 200240, China
11
Liu Y, Liu H. Protein sequence design on given backbones with deep learning. Protein Eng Des Sel 2024; 37:gzad024. [PMID: 38157313 DOI: 10.1093/protein/gzad024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2023] [Revised: 12/08/2023] [Accepted: 12/18/2023] [Indexed: 01/03/2024]
Abstract
Deep learning methods for protein sequence design focus on modeling and sampling the many-dimensional distribution of amino acid sequences conditioned on the backbone structure. To produce physically foldable sequences, inter-residue couplings must be properly considered. Iterative and autoregressive methods treat these couplings explicitly. Non-autoregressive models, which treat the couplings implicitly, are computationally more efficient but have yet to be tested in wet-lab experiments. Currently, sequence design methods are evaluated mainly by native sequence recovery rate and native sequence perplexity. These metrics can be complemented by sequence-structure compatibility metrics obtained from energy calculations or structure prediction. However, existing computational metrics have important limitations that may make it unwarranted to generalize computational test results to performance in real applications. Validation of design methods by wet-lab experiments should be encouraged.
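The sketch below computes the two metrics named above, under a few assumptions. The designed and native sequences are taken to be position-aligned on the same backbone. The model is assumed to supply one probability per amino acid at each position. Perplexity is computed as the exponential of the mean negative log-probability assigned to the native residues. The function names and the dict-based probability format are illustrative, not taken from any particular tool.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def sequence_recovery(designed: str, native: str) -> float:
    """Fraction of positions where the designed residue matches the native one."""
    if len(designed) != len(native) or not native:
        raise ValueError("sequences must be non-empty and position-aligned")
    matches = sum(d == n for d, n in zip(designed, native))
    return matches / len(native)

def native_perplexity(position_probs: list[dict[str, float]], native: str) -> float:
    """exp of the mean negative log-probability the model assigns to native residues."""
    if len(position_probs) != len(native) or not native:
        raise ValueError("need one probability distribution per native position")
    nll = -sum(math.log(probs[aa]) for probs, aa in zip(position_probs, native))
    return math.exp(nll / len(native))

# Toy usage with made-up inputs.
native = "ACDKF"
designed = "ACDEF"
uniform = [{aa: 1 / 20 for aa in AMINO_ACIDS} for _ in native]
recovery = sequence_recovery(designed, native)   # 4 of 5 positions match
perplexity = native_perplexity(uniform, native)  # a uniform model gives ~20
```

A model that assigns equal probability to all 20 residues has a perplexity of about 20. Lower values mean the model concentrates more probability on the native residues.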
Affiliation(s)
- Yufeng Liu
- MOE Key Laboratory for Membraneless Organelles and Cellular Dynamics, School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui 230027, China
- Haiyan Liu
- MOE Key Laboratory for Membraneless Organelles and Cellular Dynamics, School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui 230027, China
- Biomedical Sciences and Health Laboratory of Anhui Province, University of Science and Technology of China, Hefei, Anhui 230027, China
- School of Biomedical Engineering, Suzhou Institute for Advanced Research, University of Science and Technology of China, Suzhou, Jiangsu 215004, China
12
Zhang X, Yin H, Ling F, Zhan J, Zhou Y. SPIN-CGNN: Improved fixed backbone protein design with contact map-based graph construction and contact graph neural network. PLoS Comput Biol 2023; 19:e1011330. [PMID: 38060617 PMCID: PMC10729952 DOI: 10.1371/journal.pcbi.1011330] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 07/06/2023] [Revised: 12/19/2023] [Accepted: 11/27/2023] [Indexed: 12/20/2023]
Abstract
Recent advances in deep learning have significantly improved the ability to infer protein sequences directly from protein structures for fixed-backbone design. The methods have evolved from early multi-layer perceptrons to convolutional neural networks, transformers, and graph neural networks (GNNs). However, the conventional approach of constructing a K-nearest-neighbors (KNN) graph for a GNN has limited the use of edge information, which plays a critical role in network performance. Here we introduced SPIN-CGNN, which uses protein contact maps to define nearest neighbors. Together with auxiliary edge updates and selective kernels, SPIN-CGNN achieved refolding ability (as assessed by AlphaFold2) comparable to current state-of-the-art techniques. It also significantly improved on them in terms of sequence recovery, perplexity, deviation from native amino-acid compositions, conservation of hydrophobic positions, and low-complexity regions, according to tests on unseen structures, "hallucinated" structures, and structures generated by diffusion models. The results suggest that, compared with native sequences, low-complexity regions in deep-learning-designed sequences, particularly for generated structures, still need improvement.
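To illustrate the difference between the two graph constructions mentioned above, the sketch below builds both edge sets from C-alpha coordinates. A KNN graph gives every residue exactly k neighbors regardless of distance. A contact-map graph links only residues within a distance cutoff, so the number of neighbors varies. The 8 Å cutoff, the use of C-alpha atoms, and all function names are assumptions for illustration. The abstract does not give SPIN-CGNN's actual contact definition, and this sketch does not model its edge features, auxiliary edge updates, or selective kernels.

```python
import numpy as np

def pairwise_distances(ca: np.ndarray) -> np.ndarray:
    """Euclidean distance matrix between residue coordinates (N x 3 -> N x N)."""
    diff = ca[:, None, :] - ca[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def knn_edges(ca: np.ndarray, k: int) -> set[tuple[int, int]]:
    """Directed edges i -> j to the k nearest other residues (fixed degree k)."""
    d = pairwise_distances(ca)
    np.fill_diagonal(d, np.inf)                 # exclude self-loops
    nbrs = np.argsort(d, axis=1)[:, :k]
    return {(i, int(j)) for i in range(len(ca)) for j in nbrs[i]}

def contact_edges(ca: np.ndarray, cutoff: float = 8.0) -> set[tuple[int, int]]:
    """Directed edges between residues closer than `cutoff` (variable degree)."""
    d = pairwise_distances(ca)
    np.fill_diagonal(d, np.inf)                 # exclude self-loops
    src, dst = np.nonzero(d < cutoff)
    return {(int(i), int(j)) for i, j in zip(src, dst)}

# Toy example: an idealized straight chain with 3.8 Å between consecutive C-alpha atoms.
ca = np.array([[3.8 * i, 0.0, 0.0] for i in range(10)])
knn = knn_edges(ca, k=4)
contacts = contact_edges(ca, cutoff=8.0)
```

In this toy chain, every residue gets four KNN edges, so terminal residue 0 is linked to residue 4 at 15.2 Å. The contact-based graph gives each terminal residue only two neighbors and residues 2 to 7 four each.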
Affiliation(s)
- Xing Zhang
- School of Biology and Biological Engineering, South China University of Technology, Guangzhou, People’s Republic of China
- Institute of Systems and Physical Biology, Shenzhen Bay Laboratory, Shenzhen, People’s Republic of China
- Hongmei Yin
- Institute of Systems and Physical Biology, Shenzhen Bay Laboratory, Shenzhen, People’s Republic of China
- Fei Ling
- School of Biology and Biological Engineering, South China University of Technology, Guangzhou, People’s Republic of China
- Jian Zhan
- Institute of Systems and Physical Biology, Shenzhen Bay Laboratory, Shenzhen, People’s Republic of China
- Yaoqi Zhou
- Institute of Systems and Physical Biology, Shenzhen Bay Laboratory, Shenzhen, People’s Republic of China