1
Ding N, Jiang Y, Lee S, Cheng Z, Ran X, Ding Y, Ge R, Zhang Y, Yang ZJ. Enzyme miniaturization: Revolutionizing future biocatalysts. Biotechnol Adv 2025:108598. [PMID: 40354901 DOI: 10.1016/j.biotechadv.2025.108598] [Received: 01/24/2025] [Revised: 04/05/2025] [Accepted: 05/09/2025] [Indexed: 05/14/2025]
Abstract
Enzyme miniaturization offers a transformative approach to overcome limitations posed by the large size of conventional enzymes in industrial, therapeutic, and diagnostic applications. However, the evolutionary optimization of enzymes for activity and stability has not inherently favored compact structures, creating challenges for modern applications requiring smaller and more efficient catalysts. In this review, we surveyed the advantages of miniature enzymes, including enhanced expression, folding efficiency, thermostability, and resistance to proteolysis. We described the applications of miniature enzymes as biosensors, therapeutic agents, and industrial catalysts. We highlighted strategies such as genome mining, rational design, random deletion, and de novo design for achieving enzyme miniaturization, integrating both computational and experimental techniques. By investigating these approaches, we aim to provide a framework for advancing enzyme engineering, emphasizing the unique potential of smaller enzymes to revolutionize biocatalysis, gene therapy, and biosensing technologies.
Affiliation(s)
- Ning Ding
- Department of Chemistry, Vanderbilt University, Nashville, TN 37235, United States; Center for Structural Biology, Vanderbilt University, Nashville, TN 37235, United States
- Yaoyukun Jiang
- Department of Chemistry, Vanderbilt University, Nashville, TN 37235, United States; Department of Chemistry and California Institute for Quantitative Biosciences, University of California-Berkeley, Berkeley, CA 94720, United States
- Sangsin Lee
- Department of Genetics, Stanford University, Stanford, CA 94305, United States
- Zihao Cheng
- Department of Chemistry, Vanderbilt University, Nashville, TN 37235, United States
- Xinchun Ran
- Department of Chemistry, Vanderbilt University, Nashville, TN 37235, United States
- Yujing Ding
- State Key Laboratory of Chemical Resource Engineering, Beijing University of Chemical Technology, Beijing 100029, China; Beijing Advanced Innovation Center for Soft Matter Science and Engineering, Beijing University of Chemical Technology, Beijing 100029, China
- Robbie Ge
- Department of Chemistry, Vanderbilt University, Nashville, TN 37235, United States
- Yifei Zhang
- State Key Laboratory of Chemical Resource Engineering, Beijing University of Chemical Technology, Beijing 100029, China; Beijing Advanced Innovation Center for Soft Matter Science and Engineering, Beijing University of Chemical Technology, Beijing 100029, China
- Zhongyue J Yang
- Department of Chemistry, Vanderbilt University, Nashville, TN 37235, United States; Center for Structural Biology, Vanderbilt University, Nashville, TN 37235, United States
2
Gelman S, Johnson B, Freschlin C, Sharma A, D'Costa S, Peters J, Gitter A, Romero PA. Biophysics-based protein language models for protein engineering. bioRxiv: the preprint server for biology 2025:2024.03.15.585128. [PMID: 38559182 PMCID: PMC10980077 DOI: 10.1101/2024.03.15.585128] [Indexed: 04/04/2024]
Abstract
Protein language models trained on evolutionary data have emerged as powerful tools for predictive problems involving protein sequence, structure, and function. However, these models overlook decades of research into biophysical factors governing protein function. We propose Mutational Effect Transfer Learning (METL), a protein language model framework that unites advanced machine learning and biophysical modeling. Using the METL framework, we pretrain transformer-based neural networks on biophysical simulation data to capture fundamental relationships between protein sequence, structure, and energetics. We finetune METL on experimental sequence-function data to harness these biophysical signals and apply them when predicting protein properties like thermostability, catalytic activity, and fluorescence. METL excels in challenging protein engineering tasks like generalizing from small training sets and position extrapolation, although existing methods that train on evolutionary signals remain powerful for many types of experimental assays. We demonstrate METL's ability to design functional green fluorescent protein variants when trained on only 64 examples, showcasing the potential of biophysics-based protein language models for protein engineering.
Affiliation(s)
- Sam Gelman
- Department of Computer Sciences, University of Wisconsin-Madison
- Morgridge Institute for Research
- Bryce Johnson
- Department of Computer Sciences, University of Wisconsin-Madison
- Morgridge Institute for Research
- Arnav Sharma
- Department of Computer Sciences, University of Wisconsin-Madison
- Morgridge Institute for Research
- Sameer D'Costa
- Department of Biochemistry, University of Wisconsin-Madison
- John Peters
- Morgridge Institute for Research
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison
- Anthony Gitter
- Department of Computer Sciences, University of Wisconsin-Madison
- Morgridge Institute for Research
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison
- Philip A Romero
- Department of Biochemistry, University of Wisconsin-Madison
- Department of Biomedical Engineering, Duke University
3
Milchevskiy YV, Kravatskaya GI, Kravatsky YV. AAindexNC: Estimating the Physicochemical Properties of Non-Canonical Amino Acids, Including Those Derived from the PDB and PDBeChem Databank. Int J Mol Sci 2024; 25:12555. [PMID: 39684267 DOI: 10.3390/ijms252312555] [Received: 10/30/2024] [Revised: 11/15/2024] [Accepted: 11/18/2024] [Indexed: 12/18/2024]
Abstract
The physicochemical properties of amino acid residues from the AAindex database are widely used as predictors in models of protein structure and properties. However, the AAindex database contains data only for the 20 canonical amino acids. Non-canonical amino acids, while less common, are not rare; the Protein Data Bank includes proteins with more than 1000 distinct non-canonical amino acids. In this study, we propose a method to estimate the AAindex physicochemical properties of non-canonical amino acids and assess the quality of these predictions. We implemented our method as a bioinformatics tool and used it to estimate the physicochemical properties of non-canonical amino acids from the PDB, representing their chemical structures as SMILES strings obtained from the PDBeChem databank. The bioinformatics tool and the resulting database of estimated properties are freely available on the authors' website and can be downloaded via GitHub.
Affiliation(s)
- Yury V Milchevskiy
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Vavilov Str., 32, 119991 Moscow, Russia
- Galina I Kravatskaya
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Vavilov Str., 32, 119991 Moscow, Russia
- Yury V Kravatsky
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Vavilov Str., 32, 119991 Moscow, Russia
- Center for Precision Genome Editing and Genetic Technologies for Biomedicine, Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Vavilov Str., 32, 119991 Moscow, Russia