1
Haq I, Anwar F, Tong Y. De Novo Design of Highly Stable Binders Targeting Dihydrofolate Reductase in Klebsiella pneumoniae. Proteins 2025. [PMID: 40371895 DOI: 10.1002/prot.26835] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2025] [Revised: 03/17/2025] [Accepted: 04/25/2025] [Indexed: 05/16/2025]
Abstract
The study aims to design novel therapeutic inhibitors targeting the DHFR protein of Klebsiella pneumoniae. However, challenges like bacterial resistance to peptides and the limitations of computational models in predicting in vivo behavior must be addressed to refine the design process and improve therapeutic efficacy. This study employed deep learning-based bioinformatics techniques to tackle these issues. The study involved retrieving DHFR protein sequences from Klebsiella strains, aligning them to identify conserved regions, and using deep learning models (OmegaFold, ProteinMPNN) to design de novo inhibitors. Cell-penetrating peptide (CPP) motifs were added to enhance delivery, followed by allergenicity and thermal stability assessments. Molecular docking and dynamics simulations evaluated the binding affinity and stability of the inhibitors with DHFR. A conserved 60-residue region was identified, and 60 de novo binders were generated, resulting in 7200 sequences. After allergenicity prediction and stability testing, 10 sequences with melting points near 70°C were shortlisted. Strong binding affinities were observed, especially for complexes 4OR7-1787 and 4OR7-1811, which remained stable in molecular dynamics simulations, indicating their potential as therapeutic agents. This study designed stable de novo peptides with cell-penetrating properties and strong binding affinity to DHFR. Future steps include in vitro validation to assess their effectiveness in inhibiting DHFR, followed by in vivo studies to evaluate their therapeutic potential and stability. These peptides offer a promising strategy against Klebsiella pneumoniae infections, providing potential alternatives to current antibiotics. Experimental validation will be key to assessing their clinical relevance.
Affiliation(s)
- Ihteshamul Haq
- College of Life Sciences and Technology, Beijing University of Chemical Technology, Beijing, China
- Faheem Anwar
- Medical School, Tianjin University, Tianjin, China
- Yigang Tong
- College of Life Sciences and Technology, Beijing University of Chemical Technology, Beijing, China
2
Otero-Carrasco B, Nevado PT, Muñoz RA, Ferreiro GD, Pérez AP, Caraça-Valente Hernández JP, Rodríguez-González A. Finding patterns in lung cancer protein sequences for drug repurposing. PLoS One 2025; 20:e0322546. [PMID: 40334012 PMCID: PMC12058034 DOI: 10.1371/journal.pone.0322546] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/17/2024] [Accepted: 03/22/2025] [Indexed: 05/09/2025] Open
Abstract
Proteins are fundamental biomolecules composed of one or more chains of amino acids. They are essential for all living organisms, contributing to various biological functions and regulatory processes. Alterations in protein structures and functions are closely linked to diseases, emphasizing the need for in-depth study. A thorough understanding of these associations is crucial for developing targeted and more effective therapeutic strategies. Computational analyses of biomedical data facilitate the identification of specific patterns in proteins associated with diseases, providing novel insights into their biological roles. This study introduces a computational approach designed to detect relevant sequence patterns within proteins. These patterns, characterized by specific amino acid arrangements, can be critical for protein functionality. The proposed methodology was applied to proteins targeted by drugs used in lung cancer treatment, a disease that remains the leading cause of cancer-related mortality worldwide. Given that non-small cell lung cancer represents 85-90% of all lung cancer cases, it was selected as the primary focus of this study. Significant sequence patterns were identified, establishing connections between drug-target proteins and proteins associated with lung cancer. Based on these findings, a novel computational framework was developed to extend this pattern-based analysis to proteins linked to other diseases. By employing this approach, relationships between lung cancer drug-target proteins and proteins associated with four additional cancer types were uncovered. These associations, characterized by shared amino acid sequence features, suggest potential opportunities for drug repurposing.
Furthermore, validation through an extensive literature review confirmed biological links between lung cancer drug-target proteins and proteins related to other malignancies, reinforcing the potential of this methodology for identifying new therapeutic applications.
Affiliation(s)
- Belén Otero-Carrasco
- Centro de Tecnología Biomédica, Universidad Politécnica de Madrid, Pozuelo de Alarcón, Madrid, Spain
- ETS Ingenieros Informáticos, Universidad Politécnica de Madrid, Boadilla del Monte, Madrid, Spain.
- Paloma Tejera Nevado
- Centro de Tecnología Biomédica, Universidad Politécnica de Madrid, Pozuelo de Alarcón, Madrid, Spain
- ETS Ingenieros Informáticos, Universidad Politécnica de Madrid, Boadilla del Monte, Madrid, Spain.
- Rafael Artiñano Muñoz
- Centro de Tecnología Biomédica, Universidad Politécnica de Madrid, Pozuelo de Alarcón, Madrid, Spain
- ETS Ingenieros Informáticos, Universidad Politécnica de Madrid, Boadilla del Monte, Madrid, Spain.
- Gema Díaz Ferreiro
- ETS Ingenieros Informáticos, Universidad Politécnica de Madrid, Boadilla del Monte, Madrid, Spain.
- Aurora Pérez Pérez
- ETS Ingenieros Informáticos, Universidad Politécnica de Madrid, Boadilla del Monte, Madrid, Spain.
- Alejandro Rodríguez-González
- Centro de Tecnología Biomédica, Universidad Politécnica de Madrid, Pozuelo de Alarcón, Madrid, Spain
- ETS Ingenieros Informáticos, Universidad Politécnica de Madrid, Boadilla del Monte, Madrid, Spain.
3
Sil S, Datta I, Basu S. Use of AI-methods over MD simulations in the sampling of conformational ensembles in IDPs. Front Mol Biosci 2025; 12:1542267. [PMID: 40264953 PMCID: PMC12011600 DOI: 10.3389/fmolb.2025.1542267] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/09/2024] [Accepted: 03/17/2025] [Indexed: 04/24/2025] Open
Abstract
Intrinsically Disordered Proteins (IDPs) challenge traditional structure-function paradigms by existing as dynamic ensembles rather than stable tertiary structures. Capturing these ensembles is critical to understanding their biological roles, yet Molecular Dynamics (MD) simulations, though accurate and widely used, are computationally expensive and struggle to sample rare, transient states. Artificial intelligence (AI) offers a transformative alternative, with deep learning (DL) enabling efficient and scalable conformational sampling. DL models leverage large-scale datasets to learn complex, non-linear, sequence-to-structure relationships, allowing for the modeling of conformational ensembles in IDPs without the constraints of traditional physics-based approaches. Such DL approaches have been shown to outperform MD in generating diverse ensembles with comparable accuracy. Most models rely primarily on simulated data for training, while experimental data serve a critical role in validation, aligning the generated conformational ensembles with observable physical and biochemical properties. However, challenges remain, including dependence on data quality, limited interpretability, and scalability for larger proteins. Hybrid approaches combining AI and MD can bridge the gaps by integrating statistical learning with thermodynamic feasibility. Future directions include incorporating physics-based constraints and learning experimental observables into DL frameworks to refine predictions and enhance applicability. AI-driven methods hold significant promise in IDP research, offering novel insights into protein dynamics and therapeutic targeting while overcoming the limitations of traditional MD simulations.
Affiliation(s)
- Souradeep Sil
- Department of Genetics, Osmania University, Hyderabad, India
- Ishita Datta
- Department of Genetics and Plant Breeding, Banaras Hindu University, Varanasi, India
- Sankar Basu
- Department of Microbiology, Asutosh College (Affiliated with University of Calcutta), Kolkata, India
4
Dauparas J, Lee GR, Pecoraro R, An L, Anishchenko I, Glasscock C, Baker D. Atomic context-conditioned protein sequence design using LigandMPNN. Nat Methods 2025; 22:717-723. [PMID: 40155723 PMCID: PMC11978504 DOI: 10.1038/s41592-025-02626-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/02/2023] [Accepted: 02/10/2025] [Indexed: 04/01/2025]
Abstract
Protein sequence design in the context of small molecules, nucleotides and metals is critical to enzyme and small-molecule binder and sensor design, but current state-of-the-art deep-learning-based sequence design methods are unable to model nonprotein atoms and molecules. Here we describe a deep-learning-based protein sequence design method called LigandMPNN that explicitly models all nonprotein components of biomolecular systems. LigandMPNN significantly outperforms Rosetta and ProteinMPNN on native backbone sequence recovery for residues interacting with small molecules (63.3% versus 50.4% and 50.5%), nucleotides (50.5% versus 35.2% and 34.0%) and metals (77.5% versus 36.0% and 40.6%). LigandMPNN generates not only sequences but also sidechain conformations to allow detailed evaluation of binding interactions. LigandMPNN has been used to design over 100 experimentally validated small-molecule and DNA-binding proteins with high affinity and high structural accuracy (as indicated by four X-ray crystal structures), and redesign of Rosetta small-molecule binder designs has increased binding affinity by as much as 100-fold. We anticipate that LigandMPNN will be widely useful for designing new binding proteins, sensors and enzymes.
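As a rough illustration of the sequence recovery metric quoted in the abstract above, the sketch below computes the fraction of positions at which a designed sequence matches the native one, optionally restricted to a subset of residue indices (for example, positions contacting a ligand). The function name, the index subset, and the toy sequences are assumptions made for illustration; they are not taken from LigandMPNN or its evaluation code.

```python
from typing import Iterable, Optional


def sequence_recovery(native: str, designed: str,
                      positions: Optional[Iterable[int]] = None) -> float:
    """Fraction of compared positions where designed matches native.

    positions: optional 0-based indices to restrict the comparison to
    (hypothetical stand-in for residues near a ligand).
    """
    if len(native) != len(designed):
        raise ValueError("native and designed sequences must have equal length")
    idx = list(range(len(native))) if positions is None else list(positions)
    if not idx:
        raise ValueError("no positions to compare")
    matches = sum(native[i] == designed[i] for i in idx)
    return matches / len(idx)


# Toy sequences (made up): mismatches at indices 2 and 5.
native = "MKTAYIAKQR"
designed = "MKSAYLAKQR"
print(sequence_recovery(native, designed))                       # all positions
print(sequence_recovery(native, designed, positions=[2, 5, 6]))  # chosen subset
```

Restricting the comparison to an index subset mirrors the idea of reporting recovery only for residues that interact with nonprotein atoms, although how those residues are selected in practice is not specified here.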
Affiliation(s)
- Justas Dauparas
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
- Gyu Rie Lee
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
- Robert Pecoraro
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
- Department of Physics, University of Washington, Seattle, WA, USA
- Linna An
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
- Ivan Anishchenko
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
- Cameron Glasscock
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
- David Baker
- Department of Biochemistry, University of Washington, Seattle, WA, USA.
- Institute for Protein Design, University of Washington, Seattle, WA, USA.
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA.
5
Li D, Zhu Y, Zhang W, Liu J, Yang X, Liu Z, Wei D. AI Prediction of Structural Stability of Nanoproteins Based on Structures and Residue Properties by Mean Pooled Dual Graph Convolutional Network. Interdiscip Sci 2025; 17:101-113. [PMID: 39367992 DOI: 10.1007/s12539-024-00662-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/21/2024] [Revised: 09/18/2024] [Accepted: 09/22/2024] [Indexed: 10/07/2024]
Abstract
The structural stability of proteins is an important topic in various fields such as biotechnology, pharmaceuticals, and enzymology. Specifically, understanding the structural stability of proteins is crucial for protein design. Artificial design, while pursuing high thermodynamic stability and rigidity of proteins, inevitably sacrifices biological functions closely related to protein flexibility. The highest thermodynamic stability is not always optimal for proteins to perform their biological functions. Extensive theoretical and experimental screening is often required to obtain stable protein structures. Thus, it becomes critically important to develop a stability prediction model based on the balance between protein stability and bioactivity. To design protein drugs with better functionality in a broader structural space, a novel protein structural stability predictor called PSSP has been developed in this study. PSSP is a mean pooled dual graph convolutional network (GCN) model based on sequence characteristics and secondary structure, distance matrix, graph, and residue properties of a nanoprotein to provide rapid prediction and judgment. This model exhibits excellent robustness in predicting the structural stability of nanoproteins. Compared with previous artificial intelligence algorithms, the results indicate that this model can provide a rapid and accurate assessment of the structural stability of artificially designed proteins, showing great promise for promoting the robust development of protein design.
Affiliation(s)
- Daixi Li
- Institute of Biothermal Engineering, University of Shanghai for Science and Technology, Shanghai, 20093, China.
- Pengcheng Laboratory, Shenzhen, 518055, China.
- Yuqi Zhu
- Institute of Biothermal Engineering, University of Shanghai for Science and Technology, Shanghai, 20093, China
- Wujie Zhang
- Chemical and Biomolecular Engineering Program, Physics and Chemistry Department, Milwaukee School of Engineering, Milwaukee, 53202, USA
- Jing Liu
- Institute of Biothermal Engineering, University of Shanghai for Science and Technology, Shanghai, 20093, China
- Xiaochen Yang
- Institute of Biothermal Engineering, University of Shanghai for Science and Technology, Shanghai, 20093, China
- Zhihong Liu
- Pingshan Translational Medicine Center, Shenzhen Bay Laboratory, Shenzhen, 518118, China
- Dongqing Wei
- Pengcheng Laboratory, Shenzhen, 518055, China
- State Key Laboratory of Microbial Metabolism, Shanghai-Islamabad-Belgrade Joint Innovation, Center On Antibacterial Resistances, Joint International Research Laboratory of Metabolic and Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
6
Zhou Y, Shi L, Li X, Wei S, Ye X, Gao Y, Zhou Y, Cheng L, Cheng L, Duan F, Li M, Zhang H, Qian Q, Zhou W. Genetic engineering of RuBisCO by multiplex CRISPR editing small subunits in rice. Plant Biotechnology Journal 2025; 23:731-749. [PMID: 39630060 PMCID: PMC11869188 DOI: 10.1111/pbi.14535] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/12/2024] [Revised: 09/26/2024] [Accepted: 11/16/2024] [Indexed: 03/01/2025]
Abstract
Ribulose-1,5-bisphosphate carboxylase/oxygenase (RuBisCO) is required for photosynthetic carbon assimilation, as it catalyses the conversion of inorganic carbon into organic carbon. Despite its importance, RuBisCO is inefficient; it has a low catalytic rate and poor substrate specificity. Improving the catalytic performance of RuBisCO is one of the key routes for enhancing plant photosynthesis. As the basic subunit of RuBisCO, RbcS affects the catalytic properties and plays a key role in stabilizing the structure of the holoenzyme. Yet the functions of RbcS in crops remain largely unknown. Toward this end, we employed CRISPR-Cas9 technology to randomly edit five rbcS genes in rice (OsrbcS1-5), generating a series of knockout mutants. Mutations of the predominant rbcS genes in rice photosynthetic tissues, OsrbcS2-5, conferred inhibited growth, delayed heading and reduced yield under field conditions, accompanied by lower RuBisCO contents and activities and significantly reduced photosynthetic efficiency. The growth-retarded phenotypes were more severe when multiple genes were mutated. In addition, we revealed that these mutants had fewer chloroplasts and starch grains and a lower sugar content in the shoot base, resulting in fewer rice tillers. Further structural analysis of the mutated RuBisCO enzyme in one rbcs2,3,5 mutant line uncovered no significant differences from the wild-type protein, indicating that the mutations of rbcS did not compromise the protein assembly or the structure. Our findings generated a genetically diverse mutant pool, which offers a valuable resource and novel insights into unravelling the mechanisms of RuBisCO in rice. The multiplex genetic engineering approach of this study provides an effective and feasible strategy for RuBisCO modification in crops, further facilitating photosynthesis improvement and sustainable crop production.
Affiliation(s)
- Yujie Zhou
- Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Lifang Shi
- National Laboratory of Biomacromolecules, CAS Center for Excellence in Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China
- Xia Li
- Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Shaobo Wei
- Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Xiangyuan Ye
- Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Yuan Gao
- Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Yupeng Zhou
- Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Lin Cheng
- Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Long Cheng
- Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Fengying Duan
- Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Mei Li
- National Laboratory of Biomacromolecules, CAS Center for Excellence in Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China
- Hui Zhang
- Shanghai Key Laboratory of Plant Molecular Sciences, College of Life Sciences, Shanghai Normal University, Shanghai, China
- Qian Qian
- Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Wenbin Zhou
- Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
7
Johnson SR, Fu X, Viknander S, Goldin C, Monaco S, Zelezniak A, Yang KK. Computational scoring and experimental evaluation of enzymes generated by neural networks. Nat Biotechnol 2025; 43:396-405. [PMID: 38653796 PMCID: PMC11919684 DOI: 10.1038/s41587-024-02214-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/08/2023] [Accepted: 03/20/2024] [Indexed: 04/25/2024]
Abstract
In recent years, generative protein sequence models have been developed to sample novel sequences. However, predicting whether generated proteins will fold and function remains challenging. We evaluate a set of 20 diverse computational metrics to assess the quality of enzyme sequences produced by three contrasting generative models: ancestral sequence reconstruction, a generative adversarial network and a protein language model. Focusing on two enzyme families, we expressed and purified over 500 natural and generated sequences with 70-90% identity to the most similar natural sequences to benchmark computational metrics for predicting in vitro enzyme activity. Over three rounds of experiments, we developed a computational filter that improved the rate of experimental success by 50-150%. The proposed metrics and models will drive protein engineering research by serving as a benchmark for generative protein sequence models and helping to select active variants for experimental testing.
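The abstract above describes generated sequences with 70-90% identity to the most similar natural sequence. A minimal sketch of that selection criterion follows, assuming the sequences are already aligned to equal length with `-` for gaps. The function names, the gap convention, and the toy sequences are illustrative assumptions, not the authors' pipeline, which would normally use a dedicated alignment tool.

```python
from typing import Sequence


def percent_identity(a: str, b: str) -> float:
    """Percent identity between two pre-aligned, equal-length sequences.

    Columns where either sequence has a gap ('-') are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        return 0.0
    matches = sum(x == y for x, y in compared)
    return 100.0 * matches / len(compared)


def max_identity_to_natural(candidate: str, naturals: Sequence[str]) -> float:
    """Identity of the candidate to its most similar natural sequence."""
    return max(percent_identity(candidate, n) for n in naturals)


def in_identity_window(candidate: str, naturals: Sequence[str],
                       low: float = 70.0, high: float = 90.0) -> bool:
    """True if the best identity falls inside the [low, high] window."""
    return low <= max_identity_to_natural(candidate, naturals) <= high


# Toy data (made up for illustration).
naturals = ["MKTAYIAKQR", "MKTAYIGKQR"]
generated = "MKSAYIAKQR"
print(max_identity_to_natural(generated, naturals))
print(in_identity_window(generated, naturals))
```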
Affiliation(s)
- Xiaozhi Fu
- Department of Life Sciences, Chalmers University of Technology, Gothenburg, Sweden
- Sandra Viknander
- Department of Life Sciences, Chalmers University of Technology, Gothenburg, Sweden
- Clara Goldin
- Department of Life Sciences, Chalmers University of Technology, Gothenburg, Sweden
- Aleksej Zelezniak
- Department of Life Sciences, Chalmers University of Technology, Gothenburg, Sweden.
- Institute of Biotechnology, Life Sciences Centre, Vilnius University, Vilnius, Lithuania.
- Randall Centre for Cell & Molecular Biophysics, King's College London, Guy's Campus, London, UK.
8
Patat AS, Nalbantoğlu ÖU. Enhancing Functional Protein Design Using Heuristic Optimization and Deep Learning for Anti-Inflammatory and Gene Therapy Applications. Proteins 2025. [PMID: 39985803 DOI: 10.1002/prot.26810] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2024] [Revised: 01/21/2025] [Accepted: 02/03/2025] [Indexed: 02/24/2025]
Abstract
Protein sequence design is a highly challenging task, aimed at discovering new proteins that are more functional and producible under laboratory conditions than their natural counterparts. Deep learning-based approaches developed to address this problem have achieved significant success. However, these approaches often do not adequately emphasize the functional properties of proteins. In this study, we developed a heuristic optimization method to enhance key functionalities such as solubility, flexibility, and stability, while preserving the structural integrity of proteins. This method aims to reduce laboratory demands by enabling a design that is both functional and structurally sound. This approach is particularly valuable for the synthetic production of proteins with anti-inflammatory properties and those used in gene therapy. The designed proteins were initially evaluated for their ability to preserve natural structures using recovery and confidence metrics, followed by assessments with the AlphaFold tool. Additionally, natural protein sequences were mutated using a genetic algorithm and compared with those designed by our method. The results demonstrate that the protein sequences generated by our method exhibit much greater similarity to native protein sequences and structures. The code and sequences for the designed proteins are available at https://github.com/aysenursoyturk/HMHO.
Affiliation(s)
- Ayşenur Soytürk Patat
- Department of Bioinformatics Systems Biology, Erciyes University, Kayseri, Turkey
- Department of Bioinformatics, Necmettin Erbakan University, Konya, Turkey
9
Chaves EJF, Coêlho DF, Cruz CHB, Moreira EG, Simões JCM, Nascimento‐Filho MJ, Lins RD. Structure-based computational design of antibody mimetics: challenges and perspectives. FEBS Open Bio 2025; 15:223-235. [PMID: 38925955 PMCID: PMC11788748 DOI: 10.1002/2211-5463.13855] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2024] [Revised: 05/17/2024] [Accepted: 06/19/2024] [Indexed: 06/28/2024] Open
Abstract
The design of antibody mimetics holds great promise for revolutionizing therapeutic interventions by offering alternatives to conventional antibody therapies. Structure-based computational approaches have emerged as indispensable tools in the rational design of those molecules, enabling the precise manipulation of their structural and functional properties. This review covers the main classes of designed antigen-binding motifs, as well as alternative strategies to develop tailored ones. We discuss the intricacies of different computational protein-protein interaction design strategies, showcased by selected successful cases in the literature. Subsequently, we explore the latest advancements in computational techniques, including the integration of machine and deep learning methodologies into the design framework, which has led to an augmented design pipeline. Finally, we turn to the current challenges that stand between high-throughput computer design of antibody mimetics and experimental realization, offering a forward-looking perspective into the field and the promises it holds for biotechnology.
Affiliation(s)
- Danilo F. Coêlho
- Department of Fundamental Chemistry, Federal University of Pernambuco, Recife, Brazil
- Carlos H. B. Cruz
- Institute of Structural and Molecular Biology, University College London, UK
- Júlio C. M. Simões
- Aggeu Magalhães Institute, Oswaldo Cruz Foundation, Recife, Brazil
- Department of Fundamental Chemistry, Federal University of Pernambuco, Recife, Brazil
- Manassés J. Nascimento‐Filho
- Aggeu Magalhães Institute, Oswaldo Cruz Foundation, Recife, Brazil
- Department of Fundamental Chemistry, Federal University of Pernambuco, Recife, Brazil
- Roberto D. Lins
- Aggeu Magalhães Institute, Oswaldo Cruz Foundation, Recife, Brazil
- Department of Fundamental Chemistry, Federal University of Pernambuco, Recife, Brazil
- Fiocruz Genomics Network, Brazil
10
Liu J, Yang M, Yu Y, Xu H, Wang T, Li K, Zhou X. Advancing bioinformatics with large language models: components, applications and perspectives. ARXIV 2025:arXiv:2401.04155v2. [PMID: 38259343 PMCID: PMC10802675] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/24/2024]
Abstract
Large language models (LLMs) are a class of artificial intelligence models based on deep learning, which have great performance in various tasks, especially in natural language processing (NLP). Large language models typically consist of artificial neural networks with numerous parameters, trained on large amounts of unlabeled input using self-supervised or semi-supervised learning. However, their potential for solving bioinformatics problems may even exceed their proficiency in modeling human language. In this review, we will provide a comprehensive overview of the essential components of large language models (LLMs) in bioinformatics, spanning genomics, transcriptomics, proteomics, drug discovery, and single-cell analysis. Key aspects covered include tokenization methods for diverse data types, the architecture of transformer models, the core attention mechanism, and the pre-training processes underlying these models. Additionally, we will introduce currently available foundation models and highlight their downstream applications across various bioinformatics domains. Finally, drawing from our experience, we will offer practical guidance for both LLM users and developers, emphasizing strategies to optimize their use and foster further innovation in the field.
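To make the tokenization step mentioned in the abstract above more concrete, here is a minimal sketch of overlapping k-mer tokenization, one common way to turn a nucleotide sequence into tokens for a language model. The function names, the default `k`, and the toy sequence are assumptions for illustration; the review itself may discuss other schemes, and this is not presented as its method.

```python
from typing import Dict, List


def kmer_tokenize(sequence: str, k: int = 3, stride: int = 1) -> List[str]:
    """Split a sequence into k-mers taken every `stride` characters.

    stride=1 gives overlapping k-mers; stride=k gives non-overlapping ones.
    Sequences shorter than k yield an empty list.
    """
    if k <= 0 or stride <= 0:
        raise ValueError("k and stride must be positive")
    seq = sequence.upper()
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def build_vocab(tokens: List[str]) -> Dict[str, int]:
    """Assign integer ids to tokens in order of first appearance."""
    vocab: Dict[str, int] = {}
    for token in tokens:
        vocab.setdefault(token, len(vocab))
    return vocab


# Toy example (made-up sequence).
tokens = kmer_tokenize("ATGCGT", k=3)
vocab = build_vocab(tokens)
ids = [vocab[t] for t in tokens]
print(tokens)
print(ids)
```

The integer ids are what an embedding layer would consume; a real model would use a fixed vocabulary built over the whole training corpus rather than one sequence.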
Affiliation(s)
- Jiajia Liu
- Center for Computational Systems Medicine, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, 77030, USA
- Mengyuan Yang
- Department of Cell Biology and Genetics, School of Basic Medical Sciences, Xi’an Jiaotong University Health Science Center, Xi’an, China
- Yankai Yu
- School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu, Sichuan 611756, China
- Haixia Xu
- Center for Computational Systems Medicine, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, 77030, USA
- Tiangang Wang
- Center for Computational Systems Medicine, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, 77030, USA
- Kang Li
- West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, China
- Xiaobo Zhou
- Center for Computational Systems Medicine, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, 77030, USA
- McGovern Medical School, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- School of Dentistry, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
11
Mohapatra M, Sahu C, Mohapatra S. Trends of Artificial Intelligence (AI) Use in Drug Targets, Discovery and Development: Current Status and Future Perspectives. Curr Drug Targets 2025; 26:221-242. [PMID: 39473198 DOI: 10.2174/0113894501322734241008163304] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 05/25/2024] [Revised: 08/14/2024] [Accepted: 08/26/2024] [Indexed: 05/07/2025]
Abstract
The applications of artificial intelligence (AI) in pharmaceutical sectors have advanced drug discovery and development methods. AI has been applied in virtual drug design, molecule synthesis, advanced research, various screening methods, and decision-making processes. In the fourth industrial revolution, when medical discoveries are happening swiftly, AI technology is essential to reduce the costs, effort, and time in the pharmaceutical industry. Further, it will aid "genome-based medicine" and "drug discovery." AI may prepare proactive databases according to diseases, disorders, and appropriate usage of drugs, which will supply the data required for the process of drug development. The application of AI has improved clinical trials in patient selection within a population, stratification, and sample assessment such as biomarkers, effectiveness measures, dosage selection, and trial length. Various studies suggest AI could perform better than conventional techniques in drug discovery. The present review focuses on the positive impact of AI on drug discovery and development processes in the pharmaceutical industry, as well as its beneficial usage in health sectors.
Affiliation(s)
- Manmayee Mohapatra
  - Department of Pharmaceutics, Einstein College of Pharmacy, Bhubaneswar, Biju Patnaik University of Technology, Rourkela, Odisha, India
- Chittaranjan Sahu
  - Department of Pharmacology, Koustuv Research Institute of Medical Science (KRIMS), Bhubaneswar, Biju Patnaik University of Technology, Rourkela, Odisha, India
- Snehamayee Mohapatra
  - School of Pharmaceutical Sciences, Sikhya 'O' Anusandhan University, Bhubaneswar, Odisha, India
12
Park W, Cha S, Hahn JS. Advancements in Biological Conversion of C1 Feedstocks: Sustainable Bioproduction and Environmental Solutions. ACS Synth Biol 2024; 13:3788-3798. [PMID: 39610332 DOI: 10.1021/acssynbio.4c00519] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2024]
Abstract
The use of one-carbon (C1) feedstocks, including carbon dioxide (CO2), carbon monoxide (CO), formate (HCO2H), methanol (CH3OH), and methane (CH4), presents a significant opportunity for sustainable bioproduction and environmental conservation. This Perspective explores the development of biological methods for converting C1 feedstocks into valuable products, emphasizing major progress from engineering native C1 assimilation pathways to the creation of synthetic autotrophs and methylotrophs that utilize these carbon sources. Additionally, we discuss hybrid approaches that merge biological and electrochemical systems, particularly for the conversion of CO2. This Perspective underscores the importance of C1 bioconversion in promoting sustainable biotechnological strategies for a low-carbon future.
Affiliation(s)
- Wooyoung Park
  - School of Chemical and Biological Engineering, Institute of Chemical Processes, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul 08826, Republic of Korea
- Seungwoo Cha
  - School of Chemical and Biological Engineering, Institute of Chemical Processes, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul 08826, Republic of Korea
- Ji-Sook Hahn
  - School of Chemical and Biological Engineering, Institute of Chemical Processes, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul 08826, Republic of Korea
13
Otesteanu CF, Caldelari R, Heussler V, Sznitman R. Machine learning for predicting Plasmodium liver stage development in vitro using microscopy imaging. Comput Struct Biotechnol J 2024; 24:334-342. [PMID: 38690550 PMCID: PMC11059334 DOI: 10.1016/j.csbj.2024.04.029] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2023] [Revised: 04/09/2024] [Accepted: 04/10/2024] [Indexed: 05/02/2024] Open
Abstract
Malaria, a significant global health challenge, is caused by Plasmodium parasites. The Plasmodium liver stage plays a pivotal role in the establishment of the infection. This study focuses on the liver stage development of the model organism Plasmodium berghei, employing fluorescent microscopy imaging and convolutional neural networks (CNNs) for analysis. Convolutional neural networks have been recently proposed as a viable option for tasks such as malaria detection, prediction of host-pathogen interactions, or drug discovery. Our research aimed to predict the transition of Plasmodium-infected liver cells to the merozoite stage, a key development phase, 15 hours in advance. We collected and analyzed hourly imaging data over a span of at least 38 hours from 400 sequences, encompassing 502 parasites. Our method was compared to human annotations to validate its efficacy. Performance metrics, including the area under the receiver operating characteristic curve (AUC), sensitivity, and specificity, were evaluated on an independent test dataset. The outcomes revealed an AUC of 0.873, a sensitivity of 84.6%, and a specificity of 83.3%, underscoring the potential of our CNN-based framework to predict liver stage development of P. berghei. These findings not only demonstrate the feasibility of our methodology but also could potentially contribute to the broader understanding of parasite biology.
Affiliation(s)
- Corin F. Otesteanu
  - Artificial Intelligence in Medicine group, University of Bern, Switzerland
- Reto Caldelari
  - Institute of Cell Biology, University of Bern, Switzerland
- Raphael Sznitman
  - Artificial Intelligence in Medicine group, University of Bern, Switzerland
14
Heinzinger M, Weissenow K, Sanchez J, Henkel A, Mirdita M, Steinegger M, Rost B. Bilingual language model for protein sequence and structure. NAR Genom Bioinform 2024; 6:lqae150. [PMID: 39633723 PMCID: PMC11616678 DOI: 10.1093/nargab/lqae150] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2024] [Revised: 08/02/2024] [Accepted: 10/21/2024] [Indexed: 12/07/2024] Open
Abstract
Adapting language models to protein sequences spawned the development of powerful protein language models (pLMs). Concurrently, AlphaFold2 broke through in protein structure prediction. Now we can systematically and comprehensively explore the dual nature of proteins that act and exist as three-dimensional (3D) machines and evolve as linear strings of one-dimensional (1D) sequences. Here, we leverage pLMs to simultaneously model both modalities in a single model. We encode protein structures as token sequences using the 3Di-alphabet introduced by the 3D-alignment method Foldseek. For training, we built a non-redundant dataset from AlphaFoldDB and fine-tuned an existing pLM (ProtT5) to translate between 3Di and amino acid sequences. As a proof-of-concept for our novel approach, dubbed Protein 'structure-sequence' T5 (ProstT5), we showed improved performance for subsequent, structure-related prediction tasks, leading to three orders of magnitude speedup for deriving 3Di. This will be crucial for future applications trying to search metagenomic sequence databases at the sensitivity of structure comparisons. Our work showcased the potential of pLMs to tap into the information-rich protein structure revolution fueled by AlphaFold2. ProstT5 paves the way to develop new tools integrating the vast resource of 3D predictions and opens new research avenues in the post-AlphaFold2 era.
Affiliation(s)
- Michael Heinzinger
  - School of Computation, Information, and Technology (CIT), Department of Informatics, Bioinformatics & Computational Biology, TUM (Technical University of Munich), 85748 Garching/Munich, Germany
- Konstantin Weissenow
  - School of Computation, Information, and Technology (CIT), Department of Informatics, Bioinformatics & Computational Biology, TUM (Technical University of Munich), 85748 Garching/Munich, Germany
- Joaquin Gomez Sanchez
  - School of Computation, Information, and Technology (CIT), Department of Informatics, Bioinformatics & Computational Biology, TUM (Technical University of Munich), 85748 Garching/Munich, Germany
- Adrian Henkel
  - School of Computation, Information, and Technology (CIT), Department of Informatics, Bioinformatics & Computational Biology, TUM (Technical University of Munich), 85748 Garching/Munich, Germany
- Milot Mirdita
  - School of Biological Sciences, Seoul National University, 08826 Seoul, South Korea
- Martin Steinegger
  - School of Biological Sciences, Seoul National University, 08826 Seoul, South Korea
  - Artificial Intelligence Institute, Seoul National University, 08826 Seoul, South Korea
  - Institute of Molecular Biology and Genetics, Seoul National University, 08826 Seoul, South Korea
- Burkhard Rost
  - School of Computation, Information, and Technology (CIT), Department of Informatics, Bioinformatics & Computational Biology, TUM (Technical University of Munich), 85748 Garching/Munich, Germany
  - Institute for Advanced Study (TUM-IAS), Lichtenbergstr, 2a, 85748 Garching/Munich, Germany & TUM School of Life Sciences Weihenstephan (TUM-WZW), Alte Akademie 8, Freising, Germany
15
Hemant Kumar S, Venkatachalapathy M, Sistla R, Poongavanam V. Advances in molecular glues: exploring chemical space and design principles for targeted protein degradation. Drug Discov Today 2024; 29:104205. [PMID: 39393773 DOI: 10.1016/j.drudis.2024.104205] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/06/2024] [Revised: 09/18/2024] [Accepted: 10/04/2024] [Indexed: 10/13/2024]
Abstract
The discovery of the E3 ligase cereblon (CRBN) as the target of thalidomide and its analogs revolutionized the field of targeted protein degradation (TPD). This ubiquitin-mediated degradation pathway was first harnessed by bivalent degraders. Recently, the emergence of low-molecular-weight molecular glue degraders (MGDs) has expanded the TPD landscape, because MGDs operate via the same mechanism while offering attractive physicochemical properties that are consistent with small-molecule therapeutics. This review delves into the discovery and advancement of MGDs, with case studies on cyclin K and the zinc finger protein IKZF2, highlighting the design principles, biological assays and therapeutic applications. Additionally, it examines the chemical space of molecular glues and outlines the collaborative efforts that are fueling innovation in this field.
Affiliation(s)
- S Hemant Kumar
  - thinkMolecular Technologies Pvt. Ltd, Haralur, Bangalore, KA 560102, India
- Ramesh Sistla
  - thinkMolecular Technologies Pvt. Ltd, Haralur, Bangalore, KA 560102, India
16
Leone L, De Fenza M, Esposito A, Maglio O, Nastri F, Lombardi A. Peptides and metal ions: A successful marriage for developing artificial metalloproteins. J Pept Sci 2024; 30:e3606. [PMID: 38719781 DOI: 10.1002/psc.3606] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/19/2024] [Revised: 03/27/2024] [Accepted: 03/28/2024] [Indexed: 10/12/2024]
Abstract
The mutual relationship between peptides and metal ions enables metalloproteins to have crucial roles in biological systems, including structural, sensing, electron transport, and catalytic functions. The effort to reproduce and/or enhance these roles, or even to create unprecedented functions, is the focus of protein design, the first step toward the comprehension of the complex machinery of nature. Nowadays, protein design allows the building of sophisticated scaffolds, with novel functions and exceptional stability. Recent progress in metalloprotein design has led to the building of peptides/proteins capable of orchestrating the desired functions of different metal cofactors. The structural diversity of peptides allows proper selection of first- and second-shell ligands, as well as long-range electrostatic and hydrophobic interactions, which represent precious tools for tuning metal properties. The scope of this review is to discuss the construction of metal sites in de novo designed and miniaturized scaffolds. Selected examples of mono-, di-, and multi-nuclear binding sites from the last 20 years will be described in an effort to highlight key artificial models of catalytic or electron-transfer metalloproteins. The authors' goal is to make readers feel like guests at the marriage between peptides and metal ions while offering sources of inspiration for future architects of innovative, artificial metalloproteins.
Affiliation(s)
- Linda Leone
  - Department of Chemical Sciences, University of Naples Federico II, Naples, Italy
- Maria De Fenza
  - Department of Chemical Sciences, University of Naples Federico II, Naples, Italy
- Alessandra Esposito
  - Department of Chemical Sciences, University of Naples Federico II, Naples, Italy
- Ornella Maglio
  - Department of Chemical Sciences, University of Naples Federico II, Naples, Italy
  - Institute of Biostructures and Bioimaging, National Research Council, Naples, Italy
- Flavia Nastri
  - Department of Chemical Sciences, University of Naples Federico II, Naples, Italy
- Angela Lombardi
  - Department of Chemical Sciences, University of Naples Federico II, Naples, Italy
17
Xie X, Gui L, Qiao B, Wang G, Huang S, Zhao Y, Sun S. Deep learning in template-free de novo biosynthetic pathway design of natural products. Brief Bioinform 2024; 25:bbae495. [PMID: 39373052 PMCID: PMC11456888 DOI: 10.1093/bib/bbae495] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2024] [Revised: 09/12/2024] [Accepted: 09/20/2024] [Indexed: 10/08/2024] Open
Abstract
Natural products (NPs) are indispensable in drug development, particularly in combating infections, cancer, and neurodegenerative diseases. However, their limited availability poses significant challenges. Template-free de novo biosynthetic pathway design provides a strategic solution for NP production, with deep learning standing out as a powerful tool in this domain. This review delves into state-of-the-art deep learning algorithms in NP biosynthesis pathway design. It provides an in-depth discussion of databases like Kyoto Encyclopedia of Genes and Genomes (KEGG), Reactome, and UniProt, which are essential for model training, along with chemical databases such as Reaxys, SciFinder, and PubChem for transfer learning to expand models' understanding of the broader chemical space. It evaluates the potential and challenges of sequence-to-sequence and graph-to-graph translation models for accurate single-step prediction. Additionally, it discusses search algorithms for multistep prediction and deep learning algorithms for predicting enzyme function. The review also highlights the pivotal role of deep learning in improving catalytic efficiency through enzyme engineering, which is essential for enhancing NP production. Moreover, it examines the application of large language models in pathway design, enzyme discovery, and enzyme engineering. Finally, it addresses the challenges and prospects associated with template-free approaches, offering insights into potential advancements in NP biosynthesis pathway design.
Affiliation(s)
- Xueying Xie
  - Key Laboratory of Saline-Alkali Vegetation Ecology Restoration, Ministry of Education (Northeast Forestry University), No. 26 Hexing Road, Xiangfang District, Harbin 150001, China
  - College of Life Science, Northeast Forestry University, No. 26 Hexing Road, Xiangfang District, Harbin 150040, China
- Lin Gui
  - College of Computer and Control Engineering, Northeast Forestry University, No. 26 Hexing Road, Xiangfang District, Harbin 150040, China
- Baixue Qiao
  - Key Laboratory of Saline-Alkali Vegetation Ecology Restoration, Ministry of Education (Northeast Forestry University), No. 26 Hexing Road, Xiangfang District, Harbin 150001, China
  - College of Life Science, Northeast Forestry University, No. 26 Hexing Road, Xiangfang District, Harbin 150040, China
- Guohua Wang
  - College of Computer and Control Engineering, Northeast Forestry University, No. 26 Hexing Road, Xiangfang District, Harbin 150040, China
- Shan Huang
  - Department of Neurology, The Second Affiliated Hospital, Harbin Medical University, No. 246 Xuefu Road, Nangang District, Harbin 150081, China
- Yuming Zhao
  - College of Computer and Control Engineering, Northeast Forestry University, No. 26 Hexing Road, Xiangfang District, Harbin 150040, China
- Shanwen Sun
  - Key Laboratory of Saline-Alkali Vegetation Ecology Restoration, Ministry of Education (Northeast Forestry University), No. 26 Hexing Road, Xiangfang District, Harbin 150001, China
  - College of Life Science, Northeast Forestry University, No. 26 Hexing Road, Xiangfang District, Harbin 150040, China
18
Gong X, Zhang J, Gan Q, Teng Y, Hou J, Lyu Y, Liu Z, Wu Z, Dai R, Zou Y, Wang X, Zhu D, Zhu H, Liu T, Yan Y. Advancing microbial production through artificial intelligence-aided biology. Biotechnol Adv 2024; 74:108399. [PMID: 38925317 DOI: 10.1016/j.biotechadv.2024.108399] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2024] [Revised: 05/20/2024] [Accepted: 06/23/2024] [Indexed: 06/28/2024]
Abstract
Microbial cell factories (MCFs) have been leveraged to construct sustainable platforms for value-added compound production. To optimize metabolism and reach optimal productivity, synthetic biology has developed various genetic devices to engineer microbial systems by gene editing, high-throughput protein engineering, and dynamic regulation. However, current synthetic biology methodologies still rely heavily on manual design, laborious testing, and exhaustive analysis. The emerging interdisciplinary field of artificial intelligence (AI) and biology has become pivotal in addressing the remaining challenges. AI-aided microbial production harnesses the power of processing, learning, and predicting vast amounts of biological data within seconds, providing outputs with high probability. With well-trained AI models, the conventional Design-Build-Test (DBT) cycle has been transformed into a multidimensional Design-Build-Test-Learn-Predict (DBTLP) workflow, leading to significantly improved operational efficiency and reduced labor consumption. Here, we comprehensively review the main components and recent advances in AI-aided microbial production, focusing on genome annotation, AI-aided protein engineering, artificial functional protein design, and AI-enabled pathway prediction. Finally, we discuss the challenges of integrating novel AI techniques into biology and propose the potential of large language models (LLMs) in advancing microbial production.
Affiliation(s)
- Xinyu Gong
  - School of Chemical, Materials, and Biomedical Engineering, College of Engineering, The University of Georgia, Athens, GA 30602, USA
- Jianli Zhang
  - School of Chemical, Materials, and Biomedical Engineering, College of Engineering, The University of Georgia, Athens, GA 30602, USA
- Qi Gan
  - School of Chemical, Materials, and Biomedical Engineering, College of Engineering, The University of Georgia, Athens, GA 30602, USA
- Yuxi Teng
  - School of Chemical, Materials, and Biomedical Engineering, College of Engineering, The University of Georgia, Athens, GA 30602, USA
- Jixin Hou
  - School of ECAM, College of Engineering, University of Georgia, Athens, GA 30602, USA
- Yanjun Lyu
  - Department of Computer Science and Engineering, The University of Texas at Arlington, Arlington 76019, USA
- Zhengliang Liu
  - School of Computing, The University of Georgia, Athens, GA 30602, USA
- Zihao Wu
  - School of Computing, The University of Georgia, Athens, GA 30602, USA
- Runpeng Dai
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Yusong Zou
  - School of Chemical, Materials, and Biomedical Engineering, College of Engineering, The University of Georgia, Athens, GA 30602, USA
- Xianqiao Wang
  - School of ECAM, College of Engineering, University of Georgia, Athens, GA 30602, USA
- Dajiang Zhu
  - Department of Computer Science and Engineering, The University of Texas at Arlington, Arlington 76019, USA
- Hongtu Zhu
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Tianming Liu
  - School of Computing, The University of Georgia, Athens, GA 30602, USA
- Yajun Yan
  - School of Chemical, Materials, and Biomedical Engineering, College of Engineering, The University of Georgia, Athens, GA 30602, USA
19
Ribeiro-Filho HV, Jara GE, Guerra JVS, Cheung M, Felbinger NR, Pereira JGC, Pierce BG, Lopes-de-Oliveira PS. Exploring the potential of structure-based deep learning approaches for T cell receptor design. PLoS Comput Biol 2024; 20:e1012489. [PMID: 39348412 PMCID: PMC11466415 DOI: 10.1371/journal.pcbi.1012489] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2024] [Revised: 10/10/2024] [Accepted: 09/14/2024] [Indexed: 10/02/2024] Open
Abstract
Deep learning methods, trained on the increasing set of available protein 3D structures and sequences, have substantially impacted the protein modeling and design field. These advancements have facilitated the creation of novel proteins, or the optimization of existing ones designed for specific functions, such as binding a target protein. Despite the demonstrated potential of such approaches in designing general protein binders, their application in designing immunotherapeutics remains relatively underexplored. A relevant application is the design of T cell receptors (TCRs). Given the crucial role of T cells in mediating immune responses, redirecting these cells to tumor or infected target cells through the engineering of TCRs has shown promising results in treating diseases, especially cancer. However, the computational design of TCR interactions presents challenges for current physics-based methods, particularly due to the unique natural characteristics of these interfaces, such as low affinity and cross-reactivity. For this reason, in this study, we explored the potential of two structure-based deep learning protein design methods, ProteinMPNN and ESM-IF1, in designing fixed-backbone TCRs for binding target antigenic peptides presented by the MHC through different design scenarios. To evaluate TCR designs, we employed a comprehensive set of sequence- and structure-based metrics, highlighting the benefits of these methods in comparison to classical physics-based design methods and identifying deficiencies for improvement.
Affiliation(s)
- Helder V. Ribeiro-Filho
  - Brazilian Biosciences National Laboratory, Brazilian Center for Research in Energy and Materials, Campinas, São Paulo, Brazil
- Gabriel E. Jara
  - Brazilian Biosciences National Laboratory, Brazilian Center for Research in Energy and Materials, Campinas, São Paulo, Brazil
- João V. S. Guerra
  - Brazilian Biosciences National Laboratory, Brazilian Center for Research in Energy and Materials, Campinas, São Paulo, Brazil
  - Graduate Program in Pharmaceutical Sciences, Faculty of Pharmaceutical Sciences, University of Campinas, Campinas, São Paulo, Brazil
- Melyssa Cheung
  - Institute for Bioscience and Biotechnology Research, University of Maryland, Rockville, Maryland, United States of America
  - Department of Chemistry and Biochemistry, University of Maryland, College Park, Maryland, United States of America
- Nathaniel R. Felbinger
  - Institute for Bioscience and Biotechnology Research, University of Maryland, Rockville, Maryland, United States of America
  - Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, Maryland, United States of America
- José G. C. Pereira
  - Brazilian Biosciences National Laboratory, Brazilian Center for Research in Energy and Materials, Campinas, São Paulo, Brazil
- Brian G. Pierce
  - Institute for Bioscience and Biotechnology Research, University of Maryland, Rockville, Maryland, United States of America
  - Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, Maryland, United States of America
- Paulo S. Lopes-de-Oliveira
  - Brazilian Biosciences National Laboratory, Brazilian Center for Research in Energy and Materials, Campinas, São Paulo, Brazil
  - Graduate Program in Pharmaceutical Sciences, Faculty of Pharmaceutical Sciences, University of Campinas, Campinas, São Paulo, Brazil
20
Yu Z, Wang J. Strategies and procedures to generate chimeric DNA polymerases for improved applications. Appl Microbiol Biotechnol 2024; 108:445. [PMID: 39167106 PMCID: PMC11339088 DOI: 10.1007/s00253-024-13276-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2024] [Revised: 08/04/2024] [Accepted: 08/06/2024] [Indexed: 08/23/2024]
Abstract
Chimeric DNA polymerases with notable performance have been generated for a wide range of applications, including DNA amplification and molecular diagnostics. This rational design method aims to improve specific enzymatic characteristics or introduce novel functions by fusing amino acid sequences from different proteins with a single DNA polymerase to create a chimeric DNA polymerase. As many cases from the past decade have shown, several strategies have proved efficient: swapping homologous domains between polymerases to combine benefits from different species, incorporating additional domains for exonuclease activity or enhanced DNA-binding ability, and integrating functional proteins along with specific protein structural patterns to improve thermal stability and tolerance to inhibitors. The conventional protocol to develop a chimeric DNA polymerase with desired traits involves a Design-Build-Test-Learn (DBTL) cycle. This procedure begins with the selection of a parent polymerase, followed by the identification of relevant domains and the devising of a fusion strategy. After recombinant expression and purification of the chimeric polymerase, its performance is evaluated. The outcomes of these evaluations are analyzed to further enhance and optimize the functionality of the polymerase. This review, centered on microorganisms, briefly outlines typical instances of chimeric DNA polymerases by category and presents a general methodology for their creation. KEY POINTS: • Chimeric DNA polymerase is generated by rational design method. • Strategies include domain exchange and addition of proteins, domains, and motifs. • Chimeric DNA polymerase exhibits improved enzymatic properties or novel functions.
Affiliation(s)
- Zhuoxuan Yu
  - School of Biology and Biological Engineering, South China University of Technology, Guangzhou, 510006, China
- Jufang Wang
  - School of Biology and Biological Engineering, South China University of Technology, Guangzhou, 510006, China
  - Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, South China University of Technology, Guangzhou, 510006, China
21
Middendorf L, Ravi Iyengar B, Eicholt LA. Sequence, Structure, and Functional Space of Drosophila De Novo Proteins. Genome Biol Evol 2024; 16:evae176. [PMID: 39212966 PMCID: PMC11363682 DOI: 10.1093/gbe/evae176] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 07/29/2024] [Indexed: 09/04/2024] Open
Abstract
During de novo emergence, new protein coding genes emerge from previously nongenic sequences. The de novo proteins they encode are dissimilar in composition and predicted biochemical properties to conserved proteins. However, functional de novo proteins indeed exist. Both identification of functional de novo proteins and their structural characterization are experimentally laborious. To identify functional and structured de novo proteins in silico, we applied recently developed machine learning based tools and found that most de novo proteins are indeed different from conserved proteins both in their structure and sequence. However, some de novo proteins are predicted to adopt known protein folds, participate in cellular reactions, and to form biomolecular condensates. Apart from broadening our understanding of de novo protein evolution, our study also provides a large set of testable hypotheses for focused experimental studies on structure and function of de novo proteins in Drosophila.
Affiliation(s)
- Lasse Middendorf
  - Institute for Evolution and Biodiversity, University of Muenster, Huefferstrasse 1, 48149 Muenster, Germany
- Bharat Ravi Iyengar
  - Institute for Evolution and Biodiversity, University of Muenster, Huefferstrasse 1, 48149 Muenster, Germany
- Lars A Eicholt
  - Institute for Evolution and Biodiversity, University of Muenster, Huefferstrasse 1, 48149 Muenster, Germany
22
Jang YJ, Qin QQ, Huang SY, Peter ATJ, Ding XM, Kornmann B. Accurate prediction of protein function using statistics-informed graph networks. Nat Commun 2024; 15:6601. [PMID: 39097570 PMCID: PMC11297950 DOI: 10.1038/s41467-024-50955-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2023] [Accepted: 07/15/2024] [Indexed: 08/05/2024] Open
Abstract
Understanding protein function is pivotal in comprehending the intricate mechanisms that underlie many crucial biological activities, with far-reaching implications in the fields of medicine, biotechnology, and drug development. However, more than 200 million proteins remain uncharacterized, and computational efforts heavily rely on protein structural information to predict annotations of varying quality. Here, we present a method that utilizes statistics-informed graph networks to predict protein function solely from sequence. Our method inherently characterizes evolutionary signatures, allowing for a quantitative assessment of the significance of residues that carry out specific functions. PhiGnet not only demonstrates superior performance compared to alternative approaches but also narrows the sequence-function gap, even in the absence of structural information. Our findings indicate that applying deep learning to evolutionary data can highlight functional sites at the residue level, providing valuable support for interpreting both existing properties and new functionalities of proteins in research and biomedicine.
Affiliation(s)
- Yaan J Jang
  - Department of Biochemistry, University of Oxford, Oxford, UK
  - AmoAi Technologies, Oxford, UK
- Qi-Qi Qin
  - AmoAi Technologies, Oxford, UK
  - School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai, China
- Si-Yu Huang
  - AmoAi Technologies, Oxford, UK
  - Oxford Martin School, University of Oxford, Oxford, UK
  - School of Systems Science, Beijing Normal University, Beijing, China
- Xue-Ming Ding
  - School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai, China
- Benoît Kornmann
  - Department of Biochemistry, University of Oxford, Oxford, UK
23
Albanese KI, Petrenas R, Pirro F, Naudin EA, Borucu U, Dawson WM, Scott DA, Leggett GJ, Weiner OD, Oliver TAA, Woolfson DN. Rationally seeded computational protein design of α-helical barrels. Nat Chem Biol 2024; 20:991-999. [PMID: 38902458 PMCID: PMC11288890 DOI: 10.1038/s41589-024-01642-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/25/2023] [Accepted: 05/09/2024] [Indexed: 06/22/2024]
Abstract
Computational protein design is advancing rapidly. Here we describe efficient routes starting from validated parallel and antiparallel peptide assemblies to design two families of α-helical barrel proteins with central channels that bind small molecules. Computational designs are seeded by the sequences and structures of defined de novo oligomeric barrel-forming peptides, and adjacent helices are connected by loop building. For targets with antiparallel helices, short loops are sufficient. However, targets with parallel helices require longer connectors; namely, an outer layer of helix-turn-helix-turn-helix motifs that are packed onto the barrels. Throughout these computational pipelines, residues that define open states of the barrels are maintained. This minimizes sequence sampling, accelerating the design process. For each of six targets, just two to six synthetic genes are made for expression in Escherichia coli. On average, 70% of these genes express to give soluble monomeric proteins that are fully characterized, including high-resolution structures for most targets that match the design models with high accuracy.
Affiliation(s)
- Katherine I Albanese
- School of Chemistry, University of Bristol, Bristol, UK
- Max Planck-Bristol Centre for Minimal Biology, University of Bristol, Bristol, UK
- Fabio Pirro
- School of Chemistry, University of Bristol, Bristol, UK
- Ufuk Borucu
- School of Biochemistry, University of Bristol, Medical Sciences Building, Bristol, UK
- D Arne Scott
- Rosa Biotech, Science Creates St Philips, Bristol, UK
- Orion D Weiner
- Cardiovascular Research Institute, Department of Biochemistry and Biophysics, University of California San Francisco, San Francisco, CA, USA
- Derek N Woolfson
- School of Chemistry, University of Bristol, Bristol, UK.
- Max Planck-Bristol Centre for Minimal Biology, University of Bristol, Bristol, UK.
- School of Biochemistry, University of Bristol, Medical Sciences Building, Bristol, UK.
- Bristol BioDesign Institute, University of Bristol, Bristol, UK.
24
Wang Q, Liu X, Zhang H, Chu H, Shi C, Zhang L, Bai J, Liu P, Li J, Zhu X, Liu Y, Chen Z, Huang R, Chang H, Liu T, Chang Z, Cheng J, Jiang H. Cytochrome P450 Enzyme Design by Constraining the Catalytic Pocket in a Diffusion Model. Research (Washington, D.C.) 2024; 7:0413. [PMID: 38979516 PMCID: PMC11227911 DOI: 10.34133/research.0413] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2024] [Accepted: 05/27/2024] [Indexed: 07/10/2024]
Abstract
Although cytochrome P450 enzymes are the most versatile biocatalysts in nature, there is insufficient comprehension of the molecular mechanism underlying their functional innovation process. Here, by combining ancestral sequence reconstruction, reverse mutation assay, and progressive forward accumulation, we identified 5 founder residues in the catalytic pocket of flavone 6-hydroxylase (F6H) and proposed a "3-point fixation" model to elucidate the functional innovation mechanisms of P450s in nature. According to this design principle of catalytic pocket, we further developed a de novo diffusion model (P450Diffusion) to generate artificial P450s. Ultimately, among the 17 non-natural P450s we generated, 10 designs exhibited significant F6H activity and 6 exhibited a 1.3- to 3.5-fold increase in catalytic capacity compared to the natural CYP706X1. This work not only explores the design principle of catalytic pockets of P450s, but also provides an insight into the artificial design of P450 enzymes with desired functions.
Affiliation(s)
- Qian Wang
- Key Laboratory of Engineering Biology for Low-Carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- National Center of Technology Innovation for Synthetic Biology, Tianjin 300308, China
- Xiaonan Liu
- Key Laboratory of Engineering Biology for Low-Carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- National Center of Technology Innovation for Synthetic Biology, Tianjin 300308, China
- Hejian Zhang
- Key Laboratory of Engineering Biology for Low-Carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, China
- National Center of Technology Innovation for Synthetic Biology, Tianjin 300308, China
- College of Biotechnology, Tianjin University of Science and Technology, Tianjin 300457, China
- Huanyu Chu
- Key Laboratory of Engineering Biology for Low-Carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Chao Shi
- Department of Biochemistry and Biophysics, School of Basic Medical Sciences, Peking University, Beijing 100191, China
- Lei Zhang
- Key Laboratory of Engineering Biology for Low-Carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, China
- College of Life Science and Technology, Wuhan Polytechnic University, Wuhan, Hubei 430023, China
- Jie Bai
- Key Laboratory of Engineering Biology for Low-Carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, China
- National Center of Technology Innovation for Synthetic Biology, Tianjin 300308, China
- Pi Liu
- Key Laboratory of Engineering Biology for Low-Carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, China
- National Center of Technology Innovation for Synthetic Biology, Tianjin 300308, China
- Jing Li
- Key Laboratory of Engineering Biology for Low-Carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, China
- National Center of Technology Innovation for Synthetic Biology, Tianjin 300308, China
- State Key Laboratory of Elemento-Organic Chemistry, College of Chemistry, Nankai University, Tianjin 300071, China
- College of Life Science, Nankai University, Tianjin 300071, China
- Xiaoxi Zhu
- Key Laboratory of Engineering Biology for Low-Carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- National Center of Technology Innovation for Synthetic Biology, Tianjin 300308, China
- Yuwan Liu
- Key Laboratory of Engineering Biology for Low-Carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, China
- National Center of Technology Innovation for Synthetic Biology, Tianjin 300308, China
- Zhangxin Chen
- Department of Biochemistry and Biophysics, School of Basic Medical Sciences, Peking University, Beijing 100191, China
- Rong Huang
- Key Laboratory of Engineering Biology for Low-Carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, China
- National Center of Technology Innovation for Synthetic Biology, Tianjin 300308, China
- Hong Chang
- Key Laboratory of Engineering Biology for Low-Carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, China
- National Center of Technology Innovation for Synthetic Biology, Tianjin 300308, China
- Tian Liu
- Key Laboratory of Engineering Biology for Low-Carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, China
- National Center of Technology Innovation for Synthetic Biology, Tianjin 300308, China
- Zhenzhan Chang
- Department of Biochemistry and Biophysics, School of Basic Medical Sciences, Peking University, Beijing 100191, China
- Jian Cheng
- Key Laboratory of Engineering Biology for Low-Carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, China
- National Center of Technology Innovation for Synthetic Biology, Tianjin 300308, China
- Huifeng Jiang
- Key Laboratory of Engineering Biology for Low-Carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, China
- National Center of Technology Innovation for Synthetic Biology, Tianjin 300308, China
25
Wang J, Watson JL, Lisanza SL. Protein Design Using Structure-Prediction Networks: AlphaFold and RoseTTAFold as Protein Structure Foundation Models. Cold Spring Harb Perspect Biol 2024; 16:a041472. [PMID: 38438190 PMCID: PMC11216169 DOI: 10.1101/cshperspect.a041472] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/06/2024]
Abstract
Designing proteins with tailored structures and functions is a long-standing goal in bioengineering. Recently, deep learning advances have enabled protein structure prediction at near-experimental accuracy, which has catalyzed progress in protein design as well. We review recent studies that use structure-prediction neural networks to design proteins, via approaches such as activation maximization, inpainting, or denoising diffusion. These methods have led to major improvements over previous methods in wet-lab success rates for designing protein binders, metalloproteins, enzymes, and oligomeric assemblies. These results show that structure-prediction models are a powerful foundation for developing protein-design tools and suggest that continued improvement of their accuracy and generality will be key to unlocking the full potential of protein design.
Affiliation(s)
- Jue Wang
- Department of Biochemistry, University of Washington, Seattle, Washington 98195, USA
- Institute for Protein Design, University of Washington, Seattle, Washington 98195, USA
- Graduate Program in Biological Physics, Structure and Design, University of Washington, Seattle, Washington 98195, USA
- DeepMind, London EC4A 3BF, United Kingdom
- Joseph L Watson
- Department of Biochemistry, University of Washington, Seattle, Washington 98195, USA
- Institute for Protein Design, University of Washington, Seattle, Washington 98195, USA
- Sidney L Lisanza
- Department of Biochemistry, University of Washington, Seattle, Washington 98195, USA
- Institute for Protein Design, University of Washington, Seattle, Washington 98195, USA
- Graduate Program in Biological Physics, Structure and Design, University of Washington, Seattle, Washington 98195, USA
26
Rahmati F, Sethi D, Shu W, Asgari Lajayer B, Mosaferi M, Thomson A, Price GW. Advances in microbial exoenzymes bioengineering for improvement of bioplastics degradation. Chemosphere 2024; 355:141749. [PMID: 38521099 DOI: 10.1016/j.chemosphere.2024.141749] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2023] [Revised: 03/06/2024] [Accepted: 03/16/2024] [Indexed: 03/25/2024]
Abstract
Plastic pollution has become a major global concern, posing numerous challenges for the environment and wildlife. Most conventional ways of plastics degradation are inefficient and cause great damage to ecosystems. The development of biodegradable plastics offers a promising solution for waste management. These plastics are designed to break down under various conditions, opening up new possibilities to mitigate the negative impact of traditional plastics. Microbes, including bacteria and fungi, play a crucial role in the degradation of bioplastics by producing and secreting extracellular enzymes, such as cutinase, lipases, and proteases. However, these microbial enzymes are sensitive to extreme environmental conditions, such as temperature and acidity, affecting their functions and stability. To address these challenges, scientists have employed protein engineering and immobilization techniques to enhance enzyme stability and predict protein structures. Strategies such as improving enzyme and substrate interaction, increasing enzyme thermostability, reinforcing the bonding between the active site of the enzyme and substrate, and refining enzyme activity are being utilized to boost enzyme immobilization and functionality. Recently, bioengineering through gene cloning and expression in potential microorganisms, has revolutionized the biodegradation of bioplastics. This review aimed to discuss the most recent protein engineering strategies for modifying bioplastic-degrading enzymes in terms of stability and functionality, including enzyme thermostability enhancement, reinforcing the substrate binding to the enzyme active site, refining with other enzymes, and improvement of enzyme surface and substrate action. Additionally, discovered bioplastic-degrading exoenzymes by metagenomics techniques were emphasized.
Affiliation(s)
- Farzad Rahmati
- Department of Microbiology, Faculty of Science, Qom Branch, Islamic Azad University (IAU), Qom 37185364, Iran
- Debadatta Sethi
- Sugarcane Research Station, Odisha University of Agriculture and Technology, Nayagarh, India
- Weixi Shu
- Faculty of Agriculture, Dalhousie University, Truro, NS, B2N 5E3, Canada
- Mohammad Mosaferi
- Health and Environment Research Center, Tabriz Health Services Management Research Center, Tabriz University of Medical Sciences, Tabriz, Iran
- Allan Thomson
- Perennia Food and Agriculture Corporation, 173 Dr. Bernie MacDonald Dr., Bible Hill, Truro, NS, B6L 2H5, Canada
- G W Price
- Faculty of Agriculture, Dalhousie University, Truro, NS, B2N 5E3, Canada.
27
Saikia B, Baruah A. In silico design of misfolding resistant proteins: the role of structural similarity of a competing conformational ensemble in the optimization of frustration. Soft Matter 2024; 20:3283-3298. [PMID: 38529658 DOI: 10.1039/d4sm00171k] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/27/2024]
Abstract
Most state-of-the-art in silico design methods fail due to misfolding of designed sequences to a conformation other than the target. Thus, a method to design misfolding resistant proteins will provide a better understanding of the misfolding phenomenon and will also increase the success rate of in silico design methods. In this work, we optimize the conformational ensemble to be selected for negative design purposes based on the similarity of the conformational ensemble to the target. Five ensembles with different degrees of similarity to the target are created and destabilized and the target is stabilized while designing sequences using mean field theory and Monte Carlo simulation methods. The results suggest that the degree of similarity of the non-native conformations to the target plays a prominent role in designing misfolding resistant protein sequences. The design procedures that destabilize the conformational ensemble with moderate similarity to the target have proven to be more promising. Incorporation of either highly similar or highly dissimilar conformations to the target conformation into the non-native ensemble to be destabilized may lead to sequences with a higher misfolding propensity. This will significantly reduce the conformational space to be considered in any protein design procedure. Interestingly, the results suggest that a sequence with higher frustration in the target structure does not necessarily lead to a misfold prone sequence. A successful design method may purposefully choose a frustrated sequence in the target conformation if that sequence is even more frustrated in the competing non-native conformations.
Affiliation(s)
- Bondeepa Saikia
- Department of Chemistry, Dibrugarh University, Dibrugarh 786004, India.
- Anupaul Baruah
- Department of Chemistry, Dibrugarh University, Dibrugarh 786004, India.
28
Judge A, Sankaran B, Hu L, Palaniappan M, Birgy A, Prasad BVV, Palzkill T. Network of epistatic interactions in an enzyme active site revealed by large-scale deep mutational scanning. Proc Natl Acad Sci U S A 2024; 121:e2313513121. [PMID: 38483989 PMCID: PMC10962969 DOI: 10.1073/pnas.2313513121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/06/2023] [Accepted: 02/14/2024] [Indexed: 03/19/2024]
Abstract
Cooperative interactions between amino acids are critical for protein function. A genetic reflection of cooperativity is epistasis, which is when a change in the amino acid at one position changes the sequence requirements at another position. To assess epistasis within an enzyme active site, we utilized CTX-M β-lactamase as a model system. CTX-M hydrolyzes β-lactam antibiotics to provide antibiotic resistance, allowing a simple functional selection for rapid sorting of modified enzymes. We created all pairwise mutations across 17 active site positions in the β-lactamase enzyme and quantitated the function of variants against two β-lactam antibiotics using next-generation sequencing. Context-dependent sequence requirements were determined by comparing the antibiotic resistance function of double mutations across the CTX-M active site to their predicted function based on the constituent single mutations, revealing both positive epistasis (synergistic interactions) and negative epistasis (antagonistic interactions) between amino acid substitutions. The resulting trends demonstrate that positive epistasis is present throughout the active site, that epistasis between residues is mediated through substrate interactions, and that residues more tolerant to substitutions serve as generic compensators which are responsible for many cases of positive epistasis. Additionally, we show that a key catalytic residue (Glu166) is amenable to compensatory mutations, and we characterize one such double mutant (E166Y/N170G) that acts by an altered catalytic mechanism. These findings shed light on the unique biochemical factors that drive epistasis within an enzyme active site and will inform enzyme engineering efforts by bridging the gap between amino acid sequence and catalytic function.
Affiliation(s)
- Allison Judge
- Verna and Marrs McLean Department of Biochemistry and Molecular Pharmacology, Baylor College of Medicine, Houston, TX 77030
- Banumathi Sankaran
- Department of Molecular Biophysics and Integrated Bioimaging, Berkeley Center for Structural Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720
- Liya Hu
- Verna and Marrs McLean Department of Biochemistry and Molecular Pharmacology, Baylor College of Medicine, Houston, TX 77030
- Murugesan Palaniappan
- Department of Pathology and Immunology, Center for Drug Discovery, Baylor College of Medicine, Houston, TX 77030
- André Birgy
- Verna and Marrs McLean Department of Biochemistry and Molecular Pharmacology, Baylor College of Medicine, Houston, TX 77030
- Infections, Antimicrobials, Modelling, Evolution, UMR 1137, French Institute for Medical Research (INSERM), Faculty of Health, Université Paris Cité, Paris 75006, France
- B. V. Venkataram Prasad
- Verna and Marrs McLean Department of Biochemistry and Molecular Pharmacology, Baylor College of Medicine, Houston, TX 77030
- Timothy Palzkill
- Verna and Marrs McLean Department of Biochemistry and Molecular Pharmacology, Baylor College of Medicine, Houston, TX 77030
29
Kohyama S, Frohn BP, Babl L, Schwille P. Machine learning-aided design and screening of an emergent protein function in synthetic cells. Nat Commun 2024; 15:2010. [PMID: 38443351 PMCID: PMC10914801 DOI: 10.1038/s41467-024-46203-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/27/2023] [Accepted: 02/16/2024] [Indexed: 03/07/2024]
Abstract
Recently, utilization of Machine Learning (ML) has led to astonishing progress in computational protein design, bringing into reach the targeted engineering of proteins for industrial and biomedical applications. However, the design of proteins for emergent functions of core relevance to cells, such as the ability to spatiotemporally self-organize and thereby structure the cellular space, is still extremely challenging. While on the generative side conditional generative models and multi-state design are on the rise, for emergent functions there is a lack of tailored screening methods as typically needed in a protein design project, both computational and experimental. Here we describe a proof-of-principle of how such screening, in silico and in vitro, can be achieved for ML-generated variants of a protein that forms intracellular spatiotemporal patterns. For computational screening we use a structure-based divide-and-conquer approach to find the most promising candidates, while for the subsequent in vitro screening we use synthetic cell-mimics as established by Bottom-Up Synthetic Biology. We then show that the best screened candidate can indeed completely substitute the wildtype gene in Escherichia coli. These results raise great hopes for the next level of synthetic biology, where ML-designed synthetic proteins will be used to engineer cellular functions.
Affiliation(s)
- Shunshi Kohyama
- Dept. Cellular and Molecular Biophysics, Max Planck Institute of Biochemistry, Martinsried, D-82152, Germany
- Béla P Frohn
- Dept. Cellular and Molecular Biophysics, Max Planck Institute of Biochemistry, Martinsried, D-82152, Germany
- Leon Babl
- Dept. Cellular and Molecular Biophysics, Max Planck Institute of Biochemistry, Martinsried, D-82152, Germany
- Petra Schwille
- Dept. Cellular and Molecular Biophysics, Max Planck Institute of Biochemistry, Martinsried, D-82152, Germany.
30
Yang J, Li FZ, Arnold FH. Opportunities and Challenges for Machine Learning-Assisted Enzyme Engineering. ACS Central Science 2024; 10:226-241. [PMID: 38435522 PMCID: PMC10906252 DOI: 10.1021/acscentsci.3c01275] [Citation(s) in RCA: 25] [Impact Index Per Article: 25.0] [Received: 10/17/2023] [Revised: 12/26/2023] [Accepted: 01/16/2024] [Indexed: 03/05/2024]
Abstract
Enzymes can be engineered at the level of their amino acid sequences to optimize key properties such as expression, stability, substrate range, and catalytic efficiency, or even to unlock new catalytic activities not found in nature. Because the search space of possible proteins is vast, enzyme engineering usually involves discovering an enzyme starting point that has some level of the desired activity followed by directed evolution to improve its "fitness" for a desired application. Recently, machine learning (ML) has emerged as a powerful tool to complement this empirical process. ML models can contribute to (1) starting point discovery by functional annotation of known protein sequences or generating novel protein sequences with desired functions and (2) navigating protein fitness landscapes for fitness optimization by learning mappings between protein sequences and their associated fitness values. In this Outlook, we explain how ML complements enzyme engineering and discuss its future potential to unlock improved engineering outcomes.
Affiliation(s)
- Jason Yang
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, United States
- Francesca-Zhoufan Li
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California 91125, United States
- Frances H. Arnold
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, United States
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California 91125, United States
31
Pan X, Li Y, Huang P, Staecker H, He M. Extracellular vesicles for developing targeted hearing loss therapy. J Control Release 2024; 366:460-478. [PMID: 38182057 DOI: 10.1016/j.jconrel.2023.12.050] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/12/2023] [Revised: 12/19/2023] [Accepted: 12/28/2023] [Indexed: 01/07/2024]
Abstract
Substantial efforts have been made for local administration of small molecules or biologics in treating hearing loss diseases caused by either trauma, genetic mutations, or drug ototoxicity. Recently, extracellular vesicles (EVs) naturally secreted from cells have drawn increasing attention on attenuating hearing impairment from both preclinical studies and clinical studies. Highly emerging field utilizing diverse bioengineering technologies for developing EVs as the bioderived therapeutic materials, along with artificial intelligence (AI)-based targeting toolkits, shed the light on the unique properties of EVs specific to inner ear delivery. This review will illuminate such exciting research field from fundamentals of hearing protective functions of EVs to biotechnology advancement and potential clinical translation of functionalized EVs. Specifically, the advancements in assessing targeting ligands using AI algorithms are systematically discussed. The overall translational potential of EVs is reviewed in the context of auditory sensing system for developing next generation gene therapy.
Affiliation(s)
- Xiaoshu Pan
- Department of Pharmaceutics, College of Pharmacy, University of Florida, Gainesville, Florida 32610, United States
- Yanjun Li
- Department of Medicinal Chemistry, Center for Natural Products, Drug Discovery and Development, University of Florida, Gainesville, Florida 32610, United States
- Peixin Huang
- Department of Otolaryngology, Head and Neck Surgery, University of Kansas School of Medicine, Kansas City, Kansas 66160, United States
- Hinrich Staecker
- Department of Otolaryngology, Head and Neck Surgery, University of Kansas School of Medicine, Kansas City, Kansas 66160, United States.
- Mei He
- Department of Pharmaceutics, College of Pharmacy, University of Florida, Gainesville, Florida 32610, United States.
32
Amalia L, Chang CY, Wang SSS, Yeh YC, Tsai SL. Recent advances in the biological depolymerization and upcycling of polyethylene terephthalate. Curr Opin Biotechnol 2024; 85:103053. [PMID: 38128200 DOI: 10.1016/j.copbio.2023.103053] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/01/2023] [Revised: 11/16/2023] [Accepted: 11/28/2023] [Indexed: 12/23/2023]
Abstract
Polyethylene terephthalate (PET) is favored for its exceptional properties and widespread daily use. This review highlights recent advancements that enable the development of biological tools for PET decomposition, transforming PET into valuable platform chemicals and materials in upcycling processes. Enhancing PET hydrolases' catalytic activity and efficiency through protein engineering strategies is a priority, facilitating more effective PET waste management. Efforts to create novel PET hydrolases for large-scale PET depolymerization continue, but cost-effectiveness remains challenging. Hydrolyzed monomers must add additional value to make PET recycling economically attractive. Valorization of hydrolysis products through the upcycling process is expected to produce new compounds with different values and qualities from the initial polymer, making the decomposed monomers more appealing. Advances in synthetic biology and enzyme engineering hold promise for PET upcycling. While biological depolymerization offers environmental benefits, further research is needed to make PET upcycling sustainable and economically feasible.
Affiliation(s)
- Lita Amalia
- Department of Chemical Engineering, National Taiwan University of Science and Technology, Taipei 10607, Taiwan
- Chia-Yu Chang
- Department of Chemical Engineering, National Taiwan University, Taipei 10617, Taiwan
- Steven S-S Wang
- Department of Chemical Engineering, National Taiwan University, Taipei 10617, Taiwan
- Yi-Chun Yeh
- Department of Chemistry, National Taiwan Normal University, Taipei 11677, Taiwan
- Shen-Long Tsai
- Department of Chemical Engineering, National Taiwan University of Science and Technology, Taipei 10607, Taiwan.
33
Kortemme T. De novo protein design - From new structures to programmable functions. Cell 2024; 187:526-544. [PMID: 38306980 PMCID: PMC10990048 DOI: 10.1016/j.cell.2023.12.028] [Citation(s) in RCA: 46] [Impact Index Per Article: 46.0] [Received: 10/27/2023] [Revised: 12/03/2023] [Accepted: 12/19/2023] [Indexed: 02/04/2024]
Abstract
Methods from artificial intelligence (AI) trained on large datasets of sequences and structures can now "write" proteins with new shapes and molecular functions de novo, without starting from proteins found in nature. In this Perspective, I will discuss the state of the field of de novo protein design at the juncture of physics-based modeling approaches and AI. New protein folds and higher-order assemblies can be designed with considerable experimental success rates, and difficult problems requiring tunable control over protein conformations and precise shape complementarity for molecular recognition are coming into reach. Emerging approaches incorporate engineering principles (tunability, controllability, and modularity) into the design process from the beginning. Exciting frontiers lie in deconstructing cellular functions with de novo proteins and, conversely, constructing synthetic cellular signaling from the ground up. As methods improve, many more challenges are unsolved.
Affiliation(s)
- Tanja Kortemme
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA 94158, USA; Quantitative Biosciences Institute, University of California, San Francisco, San Francisco, CA 94158, USA; Chan Zuckerberg Biohub, San Francisco, CA 94158, USA.
34
Yu J, Mu J, Wei T, Chen HF. Multi-indicator comparative evaluation for deep learning-based protein sequence design methods. Bioinformatics 2024; 40:btae037. [PMID: 38261649 PMCID: PMC10868333 DOI: 10.1093/bioinformatics/btae037] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/24/2023] [Revised: 12/20/2023] [Accepted: 01/18/2024] [Indexed: 01/25/2024]
Abstract
MOTIVATION: Proteins found in nature represent only a fraction of the vast space of possible proteins. Protein design presents an opportunity to explore and expand this protein landscape. Within protein design, protein sequence design plays a crucial role, and numerous successful methods have been developed. Notably, deep learning-based protein sequence design methods have experienced significant advancements in recent years. However, a comprehensive and systematic comparison and evaluation of these methods have been lacking, with indicators provided by different methods often inconsistent or lacking effectiveness. RESULTS: To address this gap, we have designed a diverse set of indicators that cover several important aspects, including sequence recovery, diversity, root-mean-square deviation of protein structure, secondary structure, and the distribution of polar and nonpolar amino acids. In our evaluation, we have employed an improved weighted inferiority-superiority distance method to comprehensively assess the performance of eight widely used deep learning-based protein sequence design methods. Our evaluation not only provides rankings of these methods but also offers optimization suggestions by analyzing the strengths and weaknesses of each method. Furthermore, we have developed a method to select the best temperature parameter and proposed solutions for the common issue of designing sequences with consecutive repetitive amino acids, which is often encountered in protein design methods. These findings can greatly assist users in selecting suitable protein sequence design methods. Overall, our work contributes to the field of protein sequence design by providing a comprehensive evaluation system and optimization suggestions for different methods.
Affiliation(s)
- Jinyu Yu
  - State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, Department of Bioinformatics and Biostatistics, National Experimental Teaching Center for Life Sciences and Biotechnology, School of Life Sciences and Biotechnology, Shanghai Center for Systems Biomedicine, Shanghai Jiao Tong University, Shanghai 200240, China
- Junxi Mu
  - State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, Department of Bioinformatics and Biostatistics, National Experimental Teaching Center for Life Sciences and Biotechnology, School of Life Sciences and Biotechnology, Shanghai Center for Systems Biomedicine, Shanghai Jiao Tong University, Shanghai 200240, China
- Ting Wei
  - State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, Department of Bioinformatics and Biostatistics, National Experimental Teaching Center for Life Sciences and Biotechnology, School of Life Sciences and Biotechnology, Shanghai Center for Systems Biomedicine, Shanghai Jiao Tong University, Shanghai 200240, China
- Hai-Feng Chen
  - State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, Department of Bioinformatics and Biostatistics, National Experimental Teaching Center for Life Sciences and Biotechnology, School of Life Sciences and Biotechnology, Shanghai Center for Systems Biomedicine, Shanghai Jiao Tong University, Shanghai 200240, China

35
Chica RA, Ferruz N. What does it take for an 'AlphaFold Moment' in functional protein engineering and design? Nat Biotechnol 2024; 42:173-174. [PMID: 38361055 DOI: 10.1038/s41587-023-02120-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 02/17/2024]
Affiliation(s)
- Roberto A Chica
  - Department of Chemistry and Biomolecular Sciences, University of Ottawa, Ottawa, Ontario, Canada
  - Center for Catalysis Research and Innovation, University of Ottawa, Ottawa, Ontario, Canada
- Noelia Ferruz
  - Department of Structural and Molecular Biology, Molecular Biology Institute of Barcelona (CSIC), Barcelona Science Park, Barcelona, Spain

36
Pantolini L, Studer G, Pereira J, Durairaj J, Tauriello G, Schwede T. Embedding-based alignment: combining protein language models with dynamic programming alignment to detect structural similarities in the twilight-zone. Bioinformatics 2024; 40:btad786. [PMID: 38175775 PMCID: PMC10792726 DOI: 10.1093/bioinformatics/btad786] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 07/10/2023] [Revised: 10/27/2023] [Accepted: 12/29/2023] [Indexed: 01/06/2024]
Abstract
MOTIVATION Language models are routinely used for text classification and generative tasks. Recently, the same architectures were applied to protein sequences, unlocking powerful new approaches in the bioinformatics field. Protein language models (pLMs) generate high-dimensional embeddings on a per-residue level and encode a "semantic meaning" of each individual amino acid in the context of the full protein sequence. These representations have been used as a starting point for downstream learning tasks and, more recently, for identifying distant homologous relationships between proteins. RESULTS In this work, we introduce a new method that generates embedding-based protein sequence alignments (EBA) and show how these capture structural similarities even in the twilight zone, outperforming both classical methods as well as other approaches based on pLMs. The method shows excellent accuracy despite the absence of training and parameter optimization. We demonstrate that the combination of pLMs with alignment methods is a valuable approach for the detection of relationships between proteins in the twilight-zone. AVAILABILITY AND IMPLEMENTATION The code to run EBA and reproduce the analysis described in this article is available at: https://git.scicore.unibas.ch/schwede/EBA and https://git.scicore.unibas.ch/schwede/eba_benchmark.
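The general idea of pairing per-residue embeddings with dynamic programming can be sketched as follows. This is not the EBA implementation (that code is at the repository URLs given above); it is a minimal illustration that assumes cosine similarity as the residue-pair score, a linear gap penalty, and plain Needleman-Wunsch global alignment. The toy two-dimensional vectors stand in for real protein language model embeddings, and all names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def align(emb_a, emb_b, gap=-0.5):
    """Global alignment scored by embedding similarity instead of a substitution matrix.

    Returns (total score, list of aligned (i, j) residue index pairs).
    """
    n, m = len(emb_a), len(emb_b)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = score[i - 1][j - 1] + cosine(emb_a[i - 1], emb_b[j - 1])
            score[i][j] = max(match, score[i - 1][j] + gap, score[i][j - 1] + gap)

    # Trace back from the bottom-right cell to recover the aligned pairs.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        sim = cosine(emb_a[i - 1], emb_b[j - 1])
        if math.isclose(score[i][j], score[i - 1][j - 1] + sim):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif math.isclose(score[i][j], score[i - 1][j] + gap):
            i -= 1  # residue i of sequence A is aligned to a gap
        else:
            j -= 1  # residue j of sequence B is aligned to a gap
    pairs.reverse()
    return score[n][m], pairs


# Hypothetical toy embeddings: three residues in A, two in B.
emb_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
emb_b = [[1.0, 0.0], [1.0, 1.0]]
total, pairs = align(emb_a, emb_b)
print(total, pairs)
```

In the toy case the middle residue of A has no good partner in B, so the alignment skips it and pairs the two residues whose embeddings point in the same direction.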
Affiliation(s)
- Lorenzo Pantolini
  - Biozentrum, University of Basel, Basel 4056, Switzerland
  - SIB Swiss Institute of Bioinformatics, Basel 4056, Switzerland
- Gabriel Studer
  - Biozentrum, University of Basel, Basel 4056, Switzerland
  - SIB Swiss Institute of Bioinformatics, Basel 4056, Switzerland
- Joana Pereira
  - Biozentrum, University of Basel, Basel 4056, Switzerland
  - SIB Swiss Institute of Bioinformatics, Basel 4056, Switzerland
- Janani Durairaj
  - Biozentrum, University of Basel, Basel 4056, Switzerland
  - SIB Swiss Institute of Bioinformatics, Basel 4056, Switzerland
- Gerardo Tauriello
  - Biozentrum, University of Basel, Basel 4056, Switzerland
  - SIB Swiss Institute of Bioinformatics, Basel 4056, Switzerland
- Torsten Schwede
  - Biozentrum, University of Basel, Basel 4056, Switzerland
  - SIB Swiss Institute of Bioinformatics, Basel 4056, Switzerland

37
Aguilera-Puga MDC, Cancelarich NL, Marani MM, de la Fuente-Nunez C, Plisson F. Accelerating the Discovery and Design of Antimicrobial Peptides with Artificial Intelligence. Methods Mol Biol 2024; 2714:329-352. [PMID: 37676607 DOI: 10.1007/978-1-0716-3441-7_18] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Indexed: 09/08/2023]
Abstract
Peptides modulate many processes of human physiology targeting ion channels, protein receptors, or enzymes. They represent valuable starting points for the development of new biologics against communicable and non-communicable disorders. However, turning native peptide ligands into druggable materials requires high selectivity and efficacy, predictable metabolism, and good safety profiles. Machine learning models have gradually emerged as cost-effective and time-saving solutions to predict and generate new proteins with optimal properties. In this chapter, we will discuss the evolution and applications of predictive modeling and generative modeling to discover and design safe and effective antimicrobial peptides. We will also present their current limitations and suggest future research directions, applicable to peptide drug design campaigns.
Affiliation(s)
- Mariana D C Aguilera-Puga
  - Centro de Investigación y de Estudios Avanzados del IPN (CINVESTAV-IPN), Unidad de Genómica Avanzada, Laboratorio Nacional de Genómica para la Biodiversidad (Langebio), Irapuato, Guanajuato, Mexico
  - CINVESTAV-IPN, Unidad Irapuato, Departamento de Biotecnología y Bioquímica, Irapuato, Guanajuato, Mexico
- Natalia L Cancelarich
  - Instituto Patagónico para el Estudio de los Ecosistemas Continentales (IPEEC), Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Puerto Madryn, Argentina
- Mariela M Marani
  - Instituto Patagónico para el Estudio de los Ecosistemas Continentales (IPEEC), Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Puerto Madryn, Argentina
- Cesar de la Fuente-Nunez
  - Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
  - Departments of Bioengineering and Chemical and Biomolecular Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA, USA
  - Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, PA, USA
- Fabien Plisson
  - Centro de Investigación y de Estudios Avanzados del IPN (CINVESTAV-IPN), Unidad de Genómica Avanzada, Laboratorio Nacional de Genómica para la Biodiversidad (Langebio), Irapuato, Guanajuato, Mexico
  - CINVESTAV-IPN, Unidad Irapuato, Departamento de Biotecnología y Bioquímica, Irapuato, Guanajuato, Mexico

38
Wang H, Zeng W, Huang X, Liu Z, Sun Y, Zhang L. MTTLm6A: A multi-task transfer learning approach for base-resolution mRNA m6A site prediction based on an improved transformer. Math Biosci Eng 2024; 21:272-299. [PMID: 38303423 DOI: 10.3934/mbe.2024013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/03/2024]
Abstract
N6-methyladenosine (m6A) is a crucial RNA modification involved in various biological activities. Computational methods have been developed for the detection of m6A sites in Saccharomyces cerevisiae at base-resolution due to their cost-effectiveness and efficiency. However, the generalization of these methods has been hindered by limited base-resolution datasets. Additionally, RMBase contains a vast number of low-resolution m6A sites for Saccharomyces cerevisiae, and base-resolution sites are often inferred from these low-resolution results through post-calibration. We propose MTTLm6A, a multi-task transfer learning approach for base-resolution mRNA m6A site prediction based on an improved transformer. First, the RNA sequences are encoded by using one-hot encoding. Then, we construct a multi-task model that combines a convolutional neural network with a multi-head-attention deep framework. This model not only detects low-resolution m6A sites, it also assigns reasonable probabilities to the predicted sites. Finally, we employ transfer learning to predict base-resolution m6A sites based on the low-resolution m6A sites. Experimental results on Saccharomyces cerevisiae m6A and Homo sapiens m1A data demonstrate that MTTLm6A respectively achieved area under the receiver operating characteristic (AUROC) values of 77.13% and 92.9%, outperforming the state-of-the-art models. At the same time, it shows that the model has strong generalization ability. To enhance user convenience, we have made a user-friendly web server for MTTLm6A publicly available at http://47.242.23.141/MTTLm6A/index.php.
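The first step described in the abstract above, one-hot encoding of RNA sequences, can be sketched in a few lines. This is an illustration only, not the MTTLm6A code: the alphabet order (A, C, G, U), the all-zero row for unknown bases such as N, and the function name are assumptions made here.

```python
ALPHABET = "ACGU"  # assumed channel order; the paper may use a different one


def one_hot(seq: str) -> list[list[int]]:
    """Encode an RNA sequence as a list of 4-element indicator rows."""
    index = {base: i for i, base in enumerate(ALPHABET)}
    rows = []
    for base in seq.upper():
        row = [0] * len(ALPHABET)
        if base in index:
            row[index[base]] = 1  # bases outside the alphabet stay all-zero
        rows.append(row)
    return rows


# Hypothetical example: a 5-nt window centered on a candidate adenosine.
encoded = one_hot("GGACU")
for base, row in zip("GGACU", encoded):
    print(base, row)
```

Each sequence of length L becomes an L x 4 matrix, which is the kind of input a convolutional layer can scan along the sequence axis.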
Affiliation(s)
- Honglei Wang
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, China
  - School of Information Engineering, Xuzhou College of Industrial Technology, Xuzhou, China
- Wenliang Zeng
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, China
- Xiaoling Huang
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, China
- Zhaoyang Liu
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, China
- Yanjing Sun
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, China
- Lin Zhang
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, China

39
Avraham O, Tsaban T, Ben-Aharon Z, Tsaban L, Schueler-Furman O. Protein language models can capture protein quaternary state. BMC Bioinformatics 2023; 24:433. [PMID: 37964216 PMCID: PMC10647083 DOI: 10.1186/s12859-023-05549-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2023] [Accepted: 10/27/2023] [Indexed: 11/16/2023]
Abstract
BACKGROUND Determining a protein's quaternary state, i.e. the number of monomers in a functional unit, is a critical step in protein characterization. Many proteins form multimers for their activity, and over 50% are estimated to naturally form homomultimers. Experimental quaternary state determination can be challenging and require extensive work. To complement these efforts, a number of computational tools have been developed for quaternary state prediction, often utilizing experimentally validated structural information. Recently, dramatic advances have been made in the field of deep learning for predicting protein structure and other characteristics. Protein language models, such as ESM-2, that apply computational natural-language models to proteins successfully capture secondary structure, protein cell localization and other characteristics, from a single sequence. Here we hypothesize that information about the protein quaternary state may be contained within protein sequences as well, allowing us to benefit from these novel approaches in the context of quaternary state prediction. RESULTS We generated ESM-2 embeddings for a large dataset of proteins with quaternary state labels from the curated QSbio dataset. We trained a model for quaternary state classification and assessed it on a non-overlapping set of distinct folds (ECOD family level). Our model, named QUEEN (QUaternary state prediction using dEEp learNing), performs worse than approaches that include information from solved crystal structures. However, it successfully learned to distinguish multimers from monomers, and predicts the specific quaternary state with moderate success, better than simple sequence similarity-based annotation transfer. Our results demonstrate that complex, quaternary state related information is included in such embeddings. CONCLUSIONS QUEEN is the first to investigate the power of embeddings for the prediction of the quaternary state of proteins. 
As such, it lays out strengths as well as limitations of a sequence-based protein language model approach, compared to structure-based approaches. Since it does not require any structural information and is fast, we anticipate that it will be of wide use both for in-depth investigation of specific systems, as well as for studies of large sets of protein sequences. A simple colab implementation is available at: https://colab.research.google.com/github/Furman-Lab/QUEEN/blob/main/QUEEN_prediction_notebook.ipynb.
Affiliation(s)
- Orly Avraham
  - Department of Microbiology and Molecular Genetics, Faculty of Medicine, Institute for Biomedical Research Israel-Canada, The Hebrew University of Jerusalem, Jerusalem, Israel
- Tomer Tsaban
  - Department of Microbiology and Molecular Genetics, Faculty of Medicine, Institute for Biomedical Research Israel-Canada, The Hebrew University of Jerusalem, Jerusalem, Israel
- Ziv Ben-Aharon
  - Department of Microbiology and Molecular Genetics, Faculty of Medicine, Institute for Biomedical Research Israel-Canada, The Hebrew University of Jerusalem, Jerusalem, Israel
- Linoy Tsaban
  - Gaffin Center for Neuro-Oncology, Sharett Institute for Oncology, Hadassah Medical Center and Faculty of Medicine, Hebrew University of Jerusalem, Jerusalem, Israel
  - The Wohl Institute for Translational Medicine, Hadassah Medical Center and Faculty of Medicine, Hebrew University of Jerusalem, Jerusalem, Israel
- Ora Schueler-Furman
  - Department of Microbiology and Molecular Genetics, Faculty of Medicine, Institute for Biomedical Research Israel-Canada, The Hebrew University of Jerusalem, Jerusalem, Israel

40
Kouba P, Kohout P, Haddadi F, Bushuiev A, Samusevich R, Sedlar J, Damborsky J, Pluskal T, Sivic J, Mazurenko S. Machine Learning-Guided Protein Engineering. ACS Catal 2023; 13:13863-13895. [PMID: 37942269 PMCID: PMC10629210 DOI: 10.1021/acscatal.3c02743] [Citation(s) in RCA: 41] [Impact Index Per Article: 20.5] [Received: 06/15/2023] [Revised: 09/20/2023] [Indexed: 11/10/2023]
Abstract
Recent progress in engineering highly promising biocatalysts has increasingly involved machine learning methods. These methods leverage existing experimental and simulation data to aid in the discovery and annotation of promising enzymes, as well as in suggesting beneficial mutations for improving known targets. The field of machine learning for protein engineering is gathering steam, driven by recent success stories and notable progress in other areas. It already encompasses ambitious tasks such as understanding and predicting protein structure and function, catalytic efficiency, enantioselectivity, protein dynamics, stability, solubility, aggregation, and more. Nonetheless, the field is still evolving, with many challenges to overcome and questions to address. In this Perspective, we provide an overview of ongoing trends in this domain, highlight recent case studies, and examine the current limitations of machine learning-based methods. We emphasize the crucial importance of thorough experimental validation of emerging models before their use for rational protein design. We present our opinions on the fundamental problems and outline the potential directions for future research.
Affiliation(s)
- Petr Kouba
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
  - Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
  - Faculty of Electrical Engineering, Czech Technical University in Prague, Technicka 2, 166 27 Prague 6, Czech Republic
- Pavel Kohout
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
  - International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
- Faraneh Haddadi
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
  - International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
- Anton Bushuiev
  - Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
- Raman Samusevich
  - Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
  - Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Flemingovo nám. 2, 160 00 Prague 6, Czech Republic
- Jiri Sedlar
  - Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
- Jiri Damborsky
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
  - International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
- Tomas Pluskal
  - Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Flemingovo nám. 2, 160 00 Prague 6, Czech Republic
- Josef Sivic
  - Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
- Stanislav Mazurenko
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
  - International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic

41
Braun M, Gruber CC, Krassnigg A, Kummer A, Lutz S, Oberdorfer G, Siirola E, Snajdrova R. Accelerating Biocatalysis Discovery with Machine Learning: A Paradigm Shift in Enzyme Engineering, Discovery, and Design. ACS Catal 2023; 13:14454-14469. [PMID: 37942268 PMCID: PMC10629211 DOI: 10.1021/acscatal.3c03417] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 07/25/2023] [Revised: 09/29/2023] [Accepted: 10/03/2023] [Indexed: 11/10/2023]
Abstract
Emerging computational tools promise to revolutionize protein engineering for biocatalytic applications and accelerate the development timelines previously needed to optimize an enzyme to its more efficient variant. For over a decade, the benefits of predictive algorithms have helped scientists and engineers navigate the complexity of functional protein sequence space. More recently, spurred by dramatic advances in underlying computational tools, the promise of faster, cheaper, and more accurate enzyme identification, characterization, and engineering has catapulted terms such as artificial intelligence and machine learning to the must-have vocabulary in the field. This Perspective aims to showcase the current status of applications in pharmaceutical industry and also to discuss and celebrate the innovative approaches in protein science by highlighting their potential in selected recent developments and offering thoughts on future opportunities for biocatalysis. It also critically assesses the technology's limitations, unanswered questions, and unmet challenges.
Affiliation(s)
- Markus Braun
  - Department of Biochemistry, Graz University of Technology, Petersgasse 12/2, 8010 Graz, Austria
- Christian C Gruber
  - Enzyme and Drug Discovery, Innophore, 1700 Montgomery Street, San Francisco, California 94111, United States
- Andreas Krassnigg
  - Enzyme and Drug Discovery, Innophore, 1700 Montgomery Street, San Francisco, California 94111, United States
- Arkadij Kummer
  - Moderna, Inc., 200 Technology Square, Cambridge, Massachusetts 02139, United States
- Stefan Lutz
  - Codexis Inc., 200 Penobscot Drive, Redwood City, California 94063, United States
- Gustav Oberdorfer
  - Department of Biochemistry, Graz University of Technology, Petersgasse 12/2, 8010 Graz, Austria
- Elina Siirola
  - Novartis Institute for Biomedical Research, Global Discovery Chemistry, Basel CH-4108, Switzerland
- Radka Snajdrova
  - Novartis Institute for Biomedical Research, Global Discovery Chemistry, Basel CH-4108, Switzerland

42
Romero-Romero S, Lindner S, Ferruz N. Exploring the Protein Sequence Space with Global Generative Models. Cold Spring Harb Perspect Biol 2023; 15:a041471. [PMID: 37848247 PMCID: PMC10626256 DOI: 10.1101/cshperspect.a041471] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2023]
Abstract
Recent advancements in specialized large-scale architectures for training images and language have profoundly impacted the field of computer vision and natural language processing (NLP). Language models, such as the recent ChatGPT and GPT-4, have demonstrated exceptional capabilities in processing, translating, and generating human language. These breakthroughs have also been reflected in protein research, leading to the rapid development of numerous new methods in a short time, with unprecedented performance. Several of these models have been developed with the goal of generating sequences in novel regions of the protein space. In this work, we provide an overview of the use of protein generative models, reviewing (1) language models for the design of novel artificial proteins, (2) works that use non-transformer architectures, and (3) applications in directed evolution approaches.
Affiliation(s)
- Noelia Ferruz
  - Barcelona Institute of Molecular Biology, 08028 Barcelona, Spain

43
Kandathil SM, Lau AM, Jones DT. Machine learning methods for predicting protein structure from single sequences. Curr Opin Struct Biol 2023; 81:102627. [PMID: 37320955 DOI: 10.1016/j.sbi.2023.102627] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 01/25/2023] [Revised: 05/17/2023] [Accepted: 05/17/2023] [Indexed: 06/17/2023]
Abstract
Recent breakthroughs in protein structure prediction have increasingly relied on the use of deep neural networks. These recent methods are notable in that they produce 3-D atomic coordinates as a direct output of the networks, a feature which presents many advantages. Although most techniques of this type make use of multiple sequence alignments as their primary input, a new wave of methods have attempted to use just single sequences as the input. We discuss the make-up and operating principles of these models, and highlight new developments in these areas, as well as areas for future development.
Affiliation(s)
- Shaun M Kandathil
  - Department of Computer Science, University College London, Gower Street, London, WC1E 6BT, United Kingdom
- Andy M Lau
  - Department of Computer Science, University College London, Gower Street, London, WC1E 6BT, United Kingdom
- David T Jones
  - Department of Computer Science, University College London, Gower Street, London, WC1E 6BT, United Kingdom

44
Durairaj J, de Ridder D, van Dijk AD. Beyond sequence: Structure-based machine learning. Comput Struct Biotechnol J 2022; 21:630-643. [PMID: 36659927 PMCID: PMC9826903 DOI: 10.1016/j.csbj.2022.12.039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2022] [Revised: 12/21/2022] [Accepted: 12/21/2022] [Indexed: 12/31/2022]
Abstract
Recent breakthroughs in protein structure prediction demarcate the start of a new era in structural bioinformatics. Combined with various advances in experimental structure determination and the uninterrupted pace at which new structures are published, this promises an age in which protein structure information is as prevalent and ubiquitous as sequence. Machine learning in protein bioinformatics has been dominated by sequence-based methods, but this is now changing to make use of the deluge of rich structural information as input. Machine learning methods making use of structures are scattered across literature and cover a number of different applications and scopes; while some try to address questions and tasks within a single protein family, others aim to capture characteristics across all available proteins. In this review, we look at the variety of structure-based machine learning approaches, how structures can be used as input, and typical applications of these approaches in protein biology. We also discuss current challenges and opportunities in this all-important and increasingly popular field.
Affiliation(s)
- Janani Durairaj
  - Biozentrum, University of Basel, Basel, Switzerland
  - Bioinformatics Group, Department of Plant Sciences, Wageningen University and Research, Wageningen, the Netherlands
- Dick de Ridder
  - Bioinformatics Group, Department of Plant Sciences, Wageningen University and Research, Wageningen, the Netherlands
- Aalt D.J. van Dijk
  - Bioinformatics Group, Department of Plant Sciences, Wageningen University and Research, Wageningen, the Netherlands