1
Nguyen JDM, da Hora GCA, Mifflin MC, Roberts AG, Swanson JMJ. In silico design of foldable lasso peptides. Biophys J 2025; 124:1532-1547. [PMID: 40181537 DOI: 10.1016/j.bpj.2025.03.036] [Received: 01/09/2025] [Revised: 03/03/2025] [Accepted: 03/31/2025] [Indexed: 04/05/2025]
Abstract
Lasso peptides are a unique class of natural products with distinctively threaded structures, conferring exceptional stability against thermal and proteolytic degradation. Despite their promising biotechnological and pharmaceutical applications, reported attempts to prepare them by chemical synthesis result in forming the nonthreaded branched-cyclic isomer, rather than the desired lassoed structure. This is likely due to the entropic challenge of folding a short, threaded motif before chemically mediated cyclization. Accordingly, this study aims to better understand and enhance the relative stability of pre-lasso conformations (the essential precursor to lasso peptide formation) through sequence optimization, chemical modification, and disulfide incorporation. Using Rosetta fixed backbone design, optimal sequences for several class II lasso peptides are identified. Enhanced sampling with well-tempered metadynamics confirmed that designed sequences derived from the lasso structures of rubrivinodin and microcin J25 exhibit a notable improvement in pre-lasso stability relative to the competing nonthreaded conformations. Chemical modifications to the isopeptide bond-forming residues of microcin J25 further increase the probability of pre-lasso formation, highlighting the beneficial role of noncanonical amino acid residues. Counterintuitively, the introduction of a disulfide cross-link decreased pre-lasso stability. Although cross-linking inherently constrains the peptide structure, decreasing the entropic dominance of unfolded phase space, it hinders the requisite wrapping of the N-terminal end around the tail to adopt the pre-lasso conformation. However, combining chemical modifications with the disulfide cross-link results in further pre-lasso stabilization, indicating that the ring modifications counteract the constraints and provide a cooperative benefit with cross-linking. These findings lay the groundwork for further design efforts to enable synthetic access to the lasso peptide scaffold.
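The "relative stability" discussed in this abstract can be read as a free-energy difference between the pre-lasso and nonthreaded conformational basins. The following is a minimal sketch, assuming the reweighted basin populations are already available from an enhanced-sampling run. The function name, the temperature, and the example populations are hypothetical illustrations and are not taken from the paper.

```python
import math

K_B_KCAL = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def relative_free_energy(p_prelasso, p_nonthreaded, temperature=300.0):
    """Return F(pre-lasso) - F(nonthreaded) in kcal/mol.

    Uses dF = -kB * T * ln(p_prelasso / p_nonthreaded); a more negative
    value means the pre-lasso basin is more favorable.
    """
    if p_prelasso <= 0 or p_nonthreaded <= 0:
        raise ValueError("populations must be positive")
    return -K_B_KCAL * temperature * math.log(p_prelasso / p_nonthreaded)


# Hypothetical populations, chosen only to illustrate the comparison.
dF_reference = relative_free_energy(0.05, 0.95)
dF_designed = relative_free_energy(0.20, 0.80)
print(f"reference: {dF_reference:.2f} kcal/mol")
print(f"designed:  {dF_designed:.2f} kcal/mol")
```

In this reading, a design "improves pre-lasso stability" when its free-energy difference is lower than that of the reference sequence.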
Affiliation(s)
- John D M Nguyen
  - Department of Chemistry, University of Utah, Salt Lake City, Utah
- Marcus C Mifflin
  - Department of Chemistry, University of Utah, Salt Lake City, Utah
- Andrew G Roberts
  - Department of Chemistry, University of Utah, Salt Lake City, Utah
2
Zhao Z, Fernie AR, Zhang Y. Engineering nitrogen and carbon fixation for next-generation plants. Curr Opin Plant Biol 2025; 85:102699. [PMID: 40056871 DOI: 10.1016/j.pbi.2025.102699] [Received: 12/21/2024] [Revised: 02/12/2025] [Accepted: 02/15/2025] [Indexed: 03/10/2025]
Abstract
Improving plant nitrogen (N) and carbon (C) acquisition and assimilation is a major challenge for global agriculture, food security, and ecological sustainability. Emerging synthetic biology techniques, including directed evolution, artificial intelligence (AI)-guided enzyme design, and metabolic engineering, have opened new avenues for optimizing nitrogenase to fix atmospheric N2 in plants, engineering Rhizobia or other nitrogen-fixing bacteria for symbiotic associations with both legume and nonlegume crops, and enhancing carbon fixation to improve photosynthetic efficiency and source-to-sink assimilate fluxes. Here, we discuss the potential for engineering nitrogen fixation and carbon fixation mechanisms in plants, from rational and AI-driven optimization of nitrogen and carbon fixation cycles. Furthermore, we discuss strategies for modifying source-to-sink relationships to promote robust growth in extreme conditions, such as arid deserts, saline-alkaline soils, or even extraterrestrial environments like Mars. The combined engineering of N and C pathways promises a new generation of crops with enhanced productivity, resource-use efficiency, and resilience. Finally, we explore future perspectives, focusing on the integration of enzyme engineering via directed evolution and computational design to accelerate metabolic innovation in plants.
Affiliation(s)
- Zehong Zhao
  - State Key Laboratory of Seed Innovation, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, China; College of Advanced Agricultural Sciences, University of Chinese Academy of Sciences, Beijing, China
- Alisdair R Fernie
  - Max-Planck-Institute of Molecular Plant Physiology, Am Mühlenberg 1, 14476 Potsdam-Golm, Germany
- Youjun Zhang
  - State Key Laboratory of Seed Innovation, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, China; College of Advanced Agricultural Sciences, University of Chinese Academy of Sciences, Beijing, China
3
Ertelt M, Moretti R, Meiler J, Schoeder CT. Self-supervised machine learning methods for protein design improve sampling but not the identification of high-fitness variants. Sci Adv 2025; 11:eadr7338. [PMID: 39937901 PMCID: PMC11817935 DOI: 10.1126/sciadv.adr7338] [Received: 07/15/2024] [Accepted: 01/10/2025] [Indexed: 02/14/2025]
Abstract
Machine learning (ML) is changing the world of computational protein design, with data-driven methods surpassing biophysical-based methods in experimental success. However, they are most often reported as case studies, lack integration and standardization, and are therefore hard to objectively compare. In this study, we established a streamlined and diverse toolbox for methods that predict amino acid probabilities inside the Rosetta software framework that allows for the side-by-side comparison of these models. Subsequently, existing protein fitness landscapes were used to benchmark novel ML methods in realistic protein design settings. We focused on the traditional problems of protein design: sampling and scoring. A major finding of our study is that ML approaches are better at purging the sampling space from deleterious mutations. Nevertheless, scoring resulting mutations without model fine-tuning showed no clear improvement over scoring with Rosetta. We conclude that ML now complements, rather than replaces, biophysical methods in protein design.
Affiliation(s)
- Moritz Ertelt
  - Institute for Drug Discovery, Leipzig University Faculty of Medicine, Leipzig, Germany
  - Center for Scalable Data Analytics and Artificial Intelligence ScaDS.AI, Dresden/Leipzig, Dresden, Germany
- Rocco Moretti
  - Department of Chemistry, Vanderbilt University, Nashville, TN, USA
  - Center for Structural Biology, Vanderbilt University, Nashville, TN, USA
- Jens Meiler
  - Institute for Drug Discovery, Leipzig University Faculty of Medicine, Leipzig, Germany
  - Center for Scalable Data Analytics and Artificial Intelligence ScaDS.AI, Dresden/Leipzig, Dresden, Germany
  - Department of Chemistry, Vanderbilt University, Nashville, TN, USA
  - Center for Structural Biology, Vanderbilt University, Nashville, TN, USA
- Clara T. Schoeder
  - Institute for Drug Discovery, Leipzig University Faculty of Medicine, Leipzig, Germany
  - Center for Scalable Data Analytics and Artificial Intelligence ScaDS.AI, Dresden/Leipzig, Dresden, Germany
4
Nguyen JDM, da Hora GCA, Mifflin MC, Roberts AG, Swanson JMJ. Tying the Knot: In Silico Design of Foldable Lasso Peptides. bioRxiv 2025:2025.01.17.633674. [PMID: 39896618 PMCID: PMC11785075 DOI: 10.1101/2025.01.17.633674] [Indexed: 02/04/2025]
Abstract
Lasso peptides are a unique class of natural products with distinctively threaded structures, conferring exceptional stability against thermal and proteolytic degradation. Despite their promising biotechnological and pharmaceutical applications, reported attempts to prepare them by chemical synthesis result in forming the nonthreaded branched-cyclic isomer, rather than the desired lassoed structure. This is likely due to the entropic challenge of folding a short, threaded motif prior to chemically mediated cyclization. Accordingly, this study aims to better understand and enhance the relative stability of pre-lasso conformations (the essential precursor to lasso peptide formation) through sequence optimization, chemical modification, and disulfide incorporation. Using Rosetta fixed backbone design, optimal sequences for several class II lasso peptides are identified. Enhanced sampling with well-tempered metadynamics confirmed that designed sequences derived from the lasso structures of rubrivinodin and microcin J25 exhibit a notable improvement in pre-lasso stability relative to the competing nonthreaded conformations. Chemical modifications to the isopeptide bond-forming residues of microcin J25 further increase the probability of pre-lasso formation, highlighting the beneficial role of non-canonical amino acid residues. Counterintuitively, the introduction of a disulfide cross-link decreased pre-lasso stability. Although cross-linking inherently constrains the peptide structure, decreasing the entropic dominance of unfolded phase space, it hinders the requisite wrapping of the N-terminal end around the tail to adopt the pre-lasso conformation. However, combining chemical modifications with the disulfide cross-link results in further pre-lasso stabilization, indicating that the ring modifications counteract the constraints and provide a cooperative benefit with cross-linking. These findings lay the groundwork for further design efforts to enable synthetic access to the lasso peptide scaffold.
Significance
Lasso peptides are a unique class of ribosomally synthesized and post-translationally modified natural products with diverse biological activities and potential for therapeutic applications. Although direct synthesis would facilitate therapeutic design, it has not yet been possible to fold these short sequences to their threaded architecture without the help of biosynthetic enzyme stabilization. Our work explores strategies to enhance the stability of the pre-lasso structure, the essential precursor to de novo lasso peptide formation. We find that sequence design, incorporating non-canonical amino acid residues, and design-guided cross-linking can augment stability to increase the likelihood of lasso motif accessibility. This work presents several strategies for the continued design of foldable lasso peptides.
5
Luo Z, Wang Q, Xia Y, Zhu X, Yang S, Xu Z, Gu L. DLBWE-Cys: a deep-learning-based tool for identifying cysteine S-carboxyethylation sites using binary-weight encoding. Front Genet 2025; 15:1464976. [PMID: 39845187 PMCID: PMC11751040 DOI: 10.3389/fgene.2024.1464976] [Received: 07/15/2024] [Accepted: 12/23/2024] [Indexed: 01/24/2025]
Abstract
Cysteine S-carboxyethylation, a novel post-translational modification (PTM), plays a critical role in the pathogenesis of autoimmune diseases, particularly ankylosing spondylitis. Accurate identification of S-carboxyethylation modification sites is essential for elucidating their functional mechanisms. Unfortunately, there are currently no computational tools that can accurately predict these sites, posing a significant challenge to this area of research. In this study, we developed a new deep learning model, DLBWE-Cys, which integrates CNN, BiLSTM, Bahdanau attention mechanisms, and a fully connected neural network (FNN), using Binary-Weight encoding specifically designed for the accurate identification of cysteine S-carboxyethylation sites. Our experimental results show that our model architecture outperforms other machine learning and deep learning models in 5-fold cross-validation and independent testing. Feature comparison experiments confirmed the superiority of our proposed Binary-Weight encoding method over other encoding techniques. t-SNE visualization further validated the model's effective classification capabilities. Additionally, we confirmed the similarity between the distribution of positional weights in our Binary-Weight encoding and the allocation of weights in attentional mechanisms. Further experiments proved the effectiveness of our Binary-Weight encoding approach. Thus, this model paves the way for predicting cysteine S-carboxyethylation modification sites in protein sequences. The source code of DLBWE-Cys and experiments data are available at: https://github.com/ztLuo-bioinfo/DLBWE-Cys.
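The 5-fold cross-validation mentioned in this abstract follows a standard splitting scheme, which can be sketched as below. This is a generic illustration using only the Python standard library, not the DLBWE-Cys code (see the repository linked above for that). The function name, the sample count, and the seed are hypothetical, and stratification by class label is deliberately left out.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Return a list of (train_indices, test_indices) pairs for k-fold CV.

    Indices are shuffled once, dealt into k folds, and each fold serves
    as the held-out test set exactly once.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


# Hypothetical dataset of 23 candidate sites.
splits = k_fold_indices(23, k=5)
for fold_id, (train, test) in enumerate(splits):
    print(f"fold {fold_id}: {len(train)} train, {len(test)} test")
```

An independent test set, as the abstract describes, would be held out before this splitting and never used for model selection.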
Affiliation(s)
- Zhengtao Luo
  - School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei, Anhui, China
  - Anhui Province Key Laboratory of Smart Agricultural Technology and Equipment, Hefei, Anhui, China
  - Anhui Provincial Engineering Research Center for Agricultural Information Perception and Intelligent Computing, Anhui Agricultural University, Hefei, Anhui, China
- Qingyong Wang
  - School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei, Anhui, China
  - Anhui Province Key Laboratory of Smart Agricultural Technology and Equipment, Hefei, Anhui, China
  - Anhui Provincial Engineering Research Center for Agricultural Information Perception and Intelligent Computing, Anhui Agricultural University, Hefei, Anhui, China
- Yingchun Xia
  - School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei, Anhui, China
  - Anhui Province Key Laboratory of Smart Agricultural Technology and Equipment, Hefei, Anhui, China
  - Anhui Provincial Engineering Research Center for Agricultural Information Perception and Intelligent Computing, Anhui Agricultural University, Hefei, Anhui, China
- Xiaolei Zhu
  - School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei, Anhui, China
  - Anhui Province Key Laboratory of Smart Agricultural Technology and Equipment, Hefei, Anhui, China
  - Anhui Provincial Engineering Research Center for Agricultural Information Perception and Intelligent Computing, Anhui Agricultural University, Hefei, Anhui, China
- Shuai Yang
  - School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei, Anhui, China
  - Anhui Province Key Laboratory of Smart Agricultural Technology and Equipment, Hefei, Anhui, China
  - Anhui Provincial Engineering Research Center for Agricultural Information Perception and Intelligent Computing, Anhui Agricultural University, Hefei, Anhui, China
- Zhaochun Xu
  - Computer Department, Jingdezhen Ceramic University, Jingdezhen, China
  - School for Interdisciplinary Medicine and Engineering, Harbin Medical University, Harbin, China
- Lichuan Gu
  - School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei, Anhui, China
  - Anhui Province Key Laboratory of Smart Agricultural Technology and Equipment, Hefei, Anhui, China
  - Anhui Provincial Engineering Research Center for Agricultural Information Perception and Intelligent Computing, Anhui Agricultural University, Hefei, Anhui, China
6
Kim DN, Yin T, Zhang T, Im AK, Cort JR, Rozum JC, Pollock D, Qian WJ, Feng S. Artificial Intelligence Transforming Post-Translational Modification Research. Bioengineering (Basel) 2024; 12:26. [PMID: 39851300 PMCID: PMC11762806 DOI: 10.3390/bioengineering12010026] [Received: 10/31/2024] [Revised: 12/16/2024] [Accepted: 12/29/2024] [Indexed: 01/26/2025]
Abstract
Post-Translational Modifications (PTMs) are covalent changes to amino acids that occur after protein synthesis, including covalent modifications on side chains and peptide backbones. Many PTMs profoundly impact cellular and molecular functions and structures, and their significance extends to evolutionary studies as well. In light of these implications, we have explored how artificial intelligence (AI) can be utilized in researching PTMs. Initially, rationales for adopting AI and its advantages in understanding the functions of PTMs are discussed. Then, various deep learning architectures and programs, including recent applications of language models, for predicting PTM sites on proteins and the regulatory functions of these PTMs are compared. Finally, our high-throughput PTM-data-generation pipeline, which formats data suitably for AI training and predictions, is described. We hope this review illuminates areas where future AI models on PTMs can be improved, thereby contributing to the field of PTM bioengineering.
Affiliation(s)
- Doo Nam Kim
  - Biological Sciences Division, Pacific Northwest National Laboratory, 902 Battelle Blvd, Richland, WA 99352, USA
- Tianzhixi Yin
  - National Security Directorate, Pacific Northwest National Laboratory, 902 Battelle Blvd, Richland, WA 99352, USA
- Tong Zhang
  - Biological Sciences Division, Pacific Northwest National Laboratory, 902 Battelle Blvd, Richland, WA 99352, USA
- Alexandria K. Im
  - Biological Sciences Division, Pacific Northwest National Laboratory, 902 Battelle Blvd, Richland, WA 99352, USA
- John R. Cort
  - Biological Sciences Division, Pacific Northwest National Laboratory, 902 Battelle Blvd, Richland, WA 99352, USA
- Jordan C. Rozum
  - Biological Sciences Division, Pacific Northwest National Laboratory, 902 Battelle Blvd, Richland, WA 99352, USA
- David Pollock
  - Biological Sciences Division, Pacific Northwest National Laboratory, 902 Battelle Blvd, Richland, WA 99352, USA
  - Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Aurora, CO 80045, USA
- Wei-Jun Qian
  - Biological Sciences Division, Pacific Northwest National Laboratory, 902 Battelle Blvd, Richland, WA 99352, USA
- Song Feng
  - Biological Sciences Division, Pacific Northwest National Laboratory, 902 Battelle Blvd, Richland, WA 99352, USA
7
Gluth A, Li X, Gritsenko MA, Gaffrey MJ, Kim DN, Lalli PM, Chu RK, Day NJ, Sagendorf TJ, Monroe ME, Feng S, Liu T, Yang B, Qian WJ, Zhang T. Integrative Multi-PTM Proteomics Reveals Dynamic Global, Redox, Phosphorylation, and Acetylation Regulation in Cytokine-Treated Pancreatic Beta Cells. Mol Cell Proteomics 2024; 23:100881. [PMID: 39550035 PMCID: PMC11700301 DOI: 10.1016/j.mcpro.2024.100881] [Received: 08/28/2024] [Revised: 10/28/2024] [Accepted: 11/13/2024] [Indexed: 11/18/2024]
Abstract
Studying regulation of protein function at a systems level necessitates an understanding of the interplay among diverse posttranslational modifications (PTMs). A variety of proteomics sample processing workflows are currently used to study specific PTMs but rarely characterize multiple types of PTMs from the same sample inputs. Method incompatibilities and laborious sample preparation steps complicate large-scale physiological investigations and can lead to variations in results. The single-pot, solid-phase-enhanced sample preparation (SP3) method for sample cleanup is compatible with different lysis buffers and amenable to automation, making it attractive for high-throughput multi-PTM profiling. Herein, we describe an integrative SP3 workflow for multiplexed quantification of protein abundance, cysteine thiol oxidation, phosphorylation, and acetylation. The broad applicability of this approach is demonstrated using cell and tissue samples, and its utility for studying interacting regulatory networks is highlighted in a time-course experiment of cytokine-treated β-cells. We observed a swift response in the global regulation of protein abundances consistent with rapid activation of JAK-STAT and NF-κB signaling pathways. Regulators of these pathways as well as proteins involved in their target processes displayed multi-PTM dynamics indicative of complex cellular response stages: acute, adaptation, and chronic (prolonged stress). PARP14, a negative regulator of JAK-STAT, had multiple colocalized PTMs that may be involved in intraprotein regulatory crosstalk. Our workflow provides a high-throughput platform that can profile multi-PTMomes from the same sample set, which is valuable in unraveling the functional roles of PTMs and their co-regulation.
Affiliation(s)
- Austin Gluth
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington, USA; Department of Biological Systems Engineering, Washington State University, Richland, Washington, USA
- Xiaolu Li
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington, USA
- Marina A Gritsenko
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington, USA
- Matthew J Gaffrey
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington, USA
- Doo Nam Kim
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington, USA
- Priscila M Lalli
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington, USA
- Rosalie K Chu
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington, USA
- Nicholas J Day
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington, USA
- Tyler J Sagendorf
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington, USA
- Matthew E Monroe
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington, USA
- Song Feng
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington, USA
- Tao Liu
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington, USA
- Bin Yang
  - Department of Biological Systems Engineering, Washington State University, Richland, Washington, USA
- Wei-Jun Qian
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington, USA
- Tong Zhang
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington, USA
8
Wu Z, Li P, Chen Y, Chen X, Feng Y, Guo Z, Zhu D, Yong Y, Chen H. Rational Design for Enhancing Cellobiose Dehydrogenase Activity and Its Synergistic Role in Straw Degradation. J Agric Food Chem 2024; 72:24620-24631. [PMID: 39468403 DOI: 10.1021/acs.jafc.4c05991] [Indexed: 10/30/2024]
Abstract
Addressing the demand for efficient biological degradation of straw, this study employed rational design coupled with structural biology and enzyme engineering techniques to enhance the catalytic activity of cellobiose dehydrogenase (PsCDH, CDH from Pycnoporus sanguineus). By predicting and modifying the active site and key amino acids of PsCDH, several CDH immobilized enzyme preparations with higher catalytic activities were successfully obtained. The excellent mutant T1 (C286Y/A461H/F464R) exhibited a 2.7-fold increase in enzyme activity compared to the wild type. Simulated calculations indicated that the enhancement of catalytic activity was primarily due to the formation of additional intermolecular interactions between CDH and the substrate, as well as the enlargement of the substrate pocket to reduce steric hindrance effects. Additionally, molecular dynamics simulation analysis revealed a potential correlation between structural stability and enzyme activity. When PsCDH was added to a multienzyme synergistic straw degradation system, the cellulose degradation rate increased by 1.84-fold. Moreover, mutant T1 further increased the degradation of lignocellulose in the mixed system. This study provides efficient enzyme sources and modification strategies for the high-efficiency biological conversion of straw and unconventional feedstock degradation, thereby possessing significant academic value and application prospects.
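The mutant label in this abstract uses the common point-mutation shorthand (wild-type residue, 1-based position, replacement residue, with multiple substitutions joined by slashes). A minimal sketch of parsing that notation and applying it to a sequence is shown below. The helper names and the four-residue toy sequence are hypothetical; the real PsCDH sequence is not reproduced here.

```python
import re

# One substitution: wild-type letter, 1-based position, new letter.
MUTATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutations(label):
    """Parse a label such as 'C286Y/A461H/F464R' into (wt, pos, new) tuples."""
    mutations = []
    for token in label.split("/"):
        match = MUTATION_RE.match(token.strip())
        if match is None:
            raise ValueError(f"unrecognized mutation: {token!r}")
        wild_type, position, new = match.groups()
        mutations.append((wild_type, int(position), new))
    return mutations


def apply_mutations(sequence, mutations):
    """Apply substitutions, checking the wild-type residue at each position."""
    residues = list(sequence)
    for wild_type, position, new in mutations:
        if residues[position - 1] != wild_type:
            raise ValueError(f"expected {wild_type} at position {position}")
        residues[position - 1] = new
    return "".join(residues)


t1 = parse_mutations("C286Y/A461H/F464R")
print(t1)

# Toy sequence only, to show the substitution step.
toy_mutant = apply_mutations("MCAF", parse_mutations("C2Y/A3H/F4R"))
print(toy_mutant)
```

Checking the wild-type residue before substituting guards against off-by-one numbering errors, which are easy to make when a construct carries a tag or signal peptide.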
Affiliation(s)
- Zhengfen Wu
  - School of Life Sciences, Jiangsu University, Zhenjiang, Jiangsu 212013, China
- Pengfei Li
  - School of Life Sciences, Jiangsu University, Zhenjiang, Jiangsu 212013, China
- Yong Chen
  - School of Life Sciences, Jiangsu University, Zhenjiang, Jiangsu 212013, China
- Xihua Chen
  - School of Life Sciences, Jiangsu University, Zhenjiang, Jiangsu 212013, China
- Yong Feng
  - School of Life Sciences, Jiangsu University, Zhenjiang, Jiangsu 212013, China
- Zhongjian Guo
  - School of Life Sciences, Jiangsu University, Zhenjiang, Jiangsu 212013, China
- Daochen Zhu
  - Biofuels Institute, Jiangsu University, 301 Xuefu Road, Zhenjiang, Jiangsu Province 212013, China
- Yangchun Yong
  - Biofuels Institute, Jiangsu University, 301 Xuefu Road, Zhenjiang, Jiangsu Province 212013, China
- Huayou Chen
  - School of Life Sciences, Jiangsu University, Zhenjiang, Jiangsu 212013, China
9
Qin Z, Ren H, Zhao P, Wang K, Liu H, Miao C, Du Y, Li J, Wu L, Chen Z. Current computational tools for protein lysine acylation site prediction. Brief Bioinform 2024; 25:bbae469. [PMID: 39316944 PMCID: PMC11421846 DOI: 10.1093/bib/bbae469] [Received: 05/31/2024] [Revised: 08/20/2024] [Accepted: 09/07/2024] [Indexed: 09/26/2024]
Abstract
As a main subtype of post-translational modification (PTM), protein lysine acylations (PLAs) play crucial roles in regulating diverse functions of proteins. With recent advancements in proteomics technology, the identification of PTM is becoming a data-rich field. A large amount of experimentally verified data is urgently required to be translated into valuable biological insights. With computational approaches, PLA can be accurately detected across the whole proteome, even for organisms with small-scale datasets. Herein, a comprehensive summary of 166 in silico PLA prediction methods is presented, including a single type of PLA site and multiple types of PLA sites. This recapitulation covers important aspects that are critical for the development of a robust predictor, including data collection and preparation, sample selection, feature representation, classification algorithm design, model evaluation, and method availability. Notably, we discuss the application of protein language models and transfer learning to solve the small-sample learning issue. We also highlight the prediction methods developed for functionally relevant PLA sites and species/substrate/cell-type-specific PLA sites. In conclusion, this systematic review could potentially facilitate the development of novel PLA predictors and offer useful insights to researchers from various disciplines.
Affiliation(s)
- Zhaohui Qin
  - Collaborative Innovation Center of Henan Grain Crops, Henan Key Laboratory of Rice Molecular Breeding and High Efficiency Production, College of Agronomy, Henan Agricultural University, Zhengzhou 450046, China
- Haoran Ren
  - Collaborative Innovation Center of Henan Grain Crops, Henan Key Laboratory of Rice Molecular Breeding and High Efficiency Production, College of Agronomy, Henan Agricultural University, Zhengzhou 450046, China
- Pei Zhao
  - State Key Laboratory of Cotton Biology, Institute of Cotton Research of Chinese Academy of Agricultural Sciences (CAAS), Anyang 455000, China
- Kaiyuan Wang
  - Collaborative Innovation Center of Henan Grain Crops, Henan Key Laboratory of Rice Molecular Breeding and High Efficiency Production, College of Agronomy, Henan Agricultural University, Zhengzhou 450046, China
- Huixia Liu
  - Collaborative Innovation Center of Henan Grain Crops, Henan Key Laboratory of Rice Molecular Breeding and High Efficiency Production, College of Agronomy, Henan Agricultural University, Zhengzhou 450046, China
- Chunbo Miao
  - Collaborative Innovation Center of Henan Grain Crops, Henan Key Laboratory of Rice Molecular Breeding and High Efficiency Production, College of Agronomy, Henan Agricultural University, Zhengzhou 450046, China
- Yanxiu Du
  - Collaborative Innovation Center of Henan Grain Crops, Henan Key Laboratory of Rice Molecular Breeding and High Efficiency Production, College of Agronomy, Henan Agricultural University, Zhengzhou 450046, China
- Junzhou Li
  - Collaborative Innovation Center of Henan Grain Crops, Henan Key Laboratory of Rice Molecular Breeding and High Efficiency Production, College of Agronomy, Henan Agricultural University, Zhengzhou 450046, China
- Liuji Wu
  - National Key Laboratory of Wheat and Maize Crop Science, College of Agronomy, Henan Agricultural University, Zhengzhou 450046, China
- Zhen Chen
  - Collaborative Innovation Center of Henan Grain Crops, Henan Key Laboratory of Rice Molecular Breeding and High Efficiency Production, College of Agronomy, Henan Agricultural University, Zhengzhou 450046, China
10
Ertelt M, Meiler J, Schoeder CT. Combining Rosetta Sequence Design with Protein Language Model Predictions Using Evolutionary Scale Modeling (ESM) as Restraint. ACS Synth Biol 2024; 13:1085-1092. [PMID: 38568188 PMCID: PMC11036486 DOI: 10.1021/acssynbio.3c00753] [Received: 12/16/2023] [Revised: 02/16/2024] [Accepted: 03/20/2024] [Indexed: 04/20/2024]
Abstract
Computational protein sequence design has the ambitious goal of modifying existing or creating new proteins; however, designing stable and functional proteins is challenging without predictability of protein dynamics and allostery. Informing protein design methods with evolutionary information limits the mutational space to more native-like sequences and results in increased stability while maintaining functions. Recently, language models, trained on millions of protein sequences, have shown impressive performance in predicting the effects of mutations. Assessing Rosetta-designed sequences with a language model showed scores that were worse than those of their original sequence. To inform Rosetta design protocols with language model predictions, we added a new metric to restrain the energy function during design using the Evolutionary Scale Modeling (ESM) model. The resulting sequences have better language model scores and similar sequence recovery, with only a minor decrease in the fitness as assessed by Rosetta energy. In conclusion, our work combines the strength of recent machine learning approaches with the Rosetta protein design toolbox.
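"Sequence recovery", one of the metrics named in this abstract, is commonly computed as the fraction of positions at which a designed sequence matches the native one. The following is a minimal sketch under that assumption. The function name and the two ten-residue sequences are made up for illustration, and this is not the Rosetta or ESM implementation.

```python
def sequence_recovery(native, designed):
    """Fraction of aligned positions where the designed residue equals the native one.

    Assumes equal-length, already aligned sequences (fixed-backbone design).
    """
    if len(native) != len(designed):
        raise ValueError("sequences must have the same length")
    if not native:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(native, designed))
    return matches / len(native)


# Hypothetical sequences differing at two of ten positions.
native_seq = "MKTAYIAKQR"
designed_seq = "MKSAYLAKQR"
recovery = sequence_recovery(native_seq, designed_seq)
print(f"sequence recovery: {recovery:.2f}")
```

A design protocol that keeps recovery similar while improving a second score, as the abstract reports for language model scores, has changed which residues are chosen without drifting further from the native sequence.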
Affiliation(s)
- Moritz Ertelt
  - Institute for Drug Discovery, University Leipzig Medicine Faculty, Liebigstr. 19, D-04103 Leipzig, Germany
  - Center for Scalable Data Analytics and Artificial Intelligence ScaDS.AI, D-04105 Leipzig, Germany
- Jens Meiler
  - Institute for Drug Discovery, University Leipzig Medicine Faculty, Liebigstr. 19, D-04103 Leipzig, Germany
  - Center for Scalable Data Analytics and Artificial Intelligence ScaDS.AI, D-04105 Leipzig, Germany
  - Department of Chemistry, Vanderbilt University, Nashville, Tennessee 37235, United States
  - Center for Structural Biology, Vanderbilt University, Nashville, Tennessee 37235, United States
- Clara T. Schoeder
  - Institute for Drug Discovery, University Leipzig Medicine Faculty, Liebigstr. 19, D-04103 Leipzig, Germany
  - Center for Scalable Data Analytics and Artificial Intelligence ScaDS.AI, D-04105 Leipzig, Germany