1. Chia BS, Seah YFS, Wang B, Shen K, Srivastava D, Chew WL. Engineering a New Generation of Gene Editors: Integrating Synthetic Biology and AI Innovations. ACS Synth Biol 2025; 14:636-647. [PMID: 39999982 PMCID: PMC11934138 DOI: 10.1021/acssynbio.4c00686] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/04/2024] [Revised: 01/06/2025] [Accepted: 01/16/2025] [Indexed: 02/27/2025]
Abstract
CRISPR-Cas technology has revolutionized biology by enabling precise DNA and RNA edits with ease. However, significant challenges remain for translating this technology into clinical applications. Traditional protein engineering methods, such as rational design, mutagenesis screens, and directed evolution, have been used to address issues like low efficacy, specificity, and high immunogenicity. These methods are labor-intensive, time-consuming, and resource-intensive and often require detailed structural knowledge. Recently, computational strategies have emerged as powerful solutions to these limitations. Using artificial intelligence (AI) and machine learning (ML), the discovery and design of novel gene-editing enzymes can be streamlined. AI/ML models predict activity, specificity, and immunogenicity while also enhancing mutagenesis screens and directed evolution. These approaches not only accelerate rational design but also create new opportunities for developing safer and more efficient genome-editing tools, which could eventually be translated into the clinic.
Affiliation(s)
- Bing Shao Chia
  - Genome Institute of Singapore, Agency for Science, Technology and Research, 60 Biopolis Street, Singapore 138672, Singapore
- Yu Fen Samantha Seah
  - Genome Institute of Singapore, Agency for Science, Technology and Research, 60 Biopolis Street, Singapore 138672, Singapore
- Bolun Wang
  - Genome Institute of Singapore, Agency for Science, Technology and Research, 60 Biopolis Street, Singapore 138672, Singapore
- Kimberle Shen
  - Genome Institute of Singapore, Agency for Science, Technology and Research, 60 Biopolis Street, Singapore 138672, Singapore
- Diya Srivastava
  - Genome Institute of Singapore, Agency for Science, Technology and Research, 60 Biopolis Street, Singapore 138672, Singapore
- Wei Leong Chew
  - Genome Institute of Singapore, Agency for Science, Technology and Research, 60 Biopolis Street, Singapore 138672, Singapore
  - Synthetic Biology Translational Research Programme, Yong Loo Lin School of Medicine, National University of Singapore, Singapore 117596, Singapore
2. Wang F, Marouli A, Charoenwongwatthana P, Chang CY. Learn from artificial intelligence: the pursuit of objectivity. Lett Appl Microbiol 2025; 78:ovaf021. [PMID: 39933596 DOI: 10.1093/lambio/ovaf021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2024] [Revised: 01/28/2025] [Accepted: 02/10/2025] [Indexed: 02/13/2025]
Abstract
Humans continuously face threats from emerging novel pathogens and antimicrobial-resistant bacteria or fungi, which require urgent and efficient solutions. Microbes also produce compounds or chemicals that are highly valuable to humans, which require continuous refinement and improvement of yields. Artificial intelligence (AI) is a promising tool in the search for solutions to combat diseases and improve productivity, underpinned by robust research providing accurate information. However, the extent of AI credibility is yet to be fully understood. In terms of human bias, AI could arguably act as a means of ensuring scientific objectivity to increase accuracy and precision; however, whether this is possible has not been fully discussed. Human bias and error can be introduced at any step of the research process, from conducting experiments and data processing through to influencing clinical applications. Despite AI's contribution to advancing knowledge, the question remains: is AI able to achieve objectivity in microbiological research? Here, the benefits, drawbacks, and responsibilities of AI utilization in microbiological research and clinical applications are discussed.
Affiliation(s)
- Fengyi Wang
  - School of Dental Sciences, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne, NE2 4BW, UK
- Angeliki Marouli
  - School of Dental Sciences, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne, NE2 4BW, UK
- Pisit Charoenwongwatthana
  - School of Dental Sciences, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne, NE2 4BW, UK
  - Department of Oral Medicine and Periodontology, Faculty of Dentistry, Mahidol University, Bangkok, 10400, Thailand
- Chien-Yi Chang
  - School of Dental Sciences, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne, NE2 4BW, UK
3. Hassan YM, Mohamed AS, Hassan YM, El-Sayed WM. Recent developments and future directions in point-of-care next-generation CRISPR-based rapid diagnosis. Clin Exp Med 2025; 25:33. [PMID: 39789283 PMCID: PMC11717804 DOI: 10.1007/s10238-024-01540-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/11/2024] [Accepted: 12/15/2024] [Indexed: 01/12/2025]
Abstract
The demand for sensitive, rapid, and affordable diagnostic techniques has surged, particularly following the COVID-19 pandemic, driving the development of CRISPR-based diagnostic tools that utilize Cas effector proteins (such as Cas9, Cas12, and Cas13) as viable alternatives to traditional nucleic acid-based detection methods. These CRISPR systems, often integrated with biosensing and amplification technologies, provide precise, rapid, and portable diagnostics, making on-site testing without the need for extensive infrastructure feasible, especially in underserved or rural areas. In contrast, traditional diagnostic methods, while still essential, are often limited by the need for costly equipment and skilled operators, restricting their accessibility. As a result, developing accessible, user-friendly solutions for at-home, field, and laboratory diagnostics has become a key focus in CRISPR diagnostic innovations. This review examines the current state of CRISPR-based diagnostics and their potential applications across a wide range of diseases, including cancers (e.g., colorectal and breast cancer), genetic disorders (e.g., sickle cell disease), and infectious diseases (e.g., tuberculosis, malaria, Zika virus, and human papillomavirus). Additionally, the integration of machine learning (ML) and artificial intelligence (AI) to enhance the accuracy, scalability, and efficiency of CRISPR diagnostics is discussed, alongside the challenges of incorporating CRISPR technologies into point-of-care settings. The review also explores the potential for these cutting-edge tools to revolutionize disease diagnosis and personalized treatment in the future, while identifying the challenges and future directions necessary to address existing gaps in CRISPR-based diagnostic research.
Affiliation(s)
- Youssef M Hassan
  - Department of Zoology, Faculty of Science, Ain Shams University, Abbassia, Cairo, 11566, Egypt
- Ahmed S Mohamed
  - Biotechnology Program, Faculty of Science, Ain Shams University, Abbassia, Cairo, 11566, Egypt
- Yaser M Hassan
  - Biotechnology Program, Faculty of Science, Ain Shams University, Abbassia, Cairo, 11566, Egypt
- Wael M El-Sayed
  - Department of Zoology, Faculty of Science, Ain Shams University, Abbassia, Cairo, 11566, Egypt.
4. Chu HY, Peng J, Mou Y, Wong ASL. Quantifying Protein-Nucleic Acid Interactions for Engineering Useful CRISPR-Cas9 Genome-Editing Variants. Methods Mol Biol 2025; 2870:227-243. [PMID: 39543038 DOI: 10.1007/978-1-0716-4213-9_12] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2024]
Abstract
Numerous high-specificity Cas9 variants have been engineered for precision genome editing. These variants typically harbor multiple mutations designed to alter the Cas9-single guide RNA (sgRNA)-DNA complex interactions for reduced off-target cleavage. By dissecting the contributions of individual mutations, we attempt to derive principles for designing high-specificity Cas9 variants. Here, we computationally modeled the specificity harnessing mutations of the widely used Cas9 isolated from Streptococcus pyogenes (SpCas9) and investigated their individual mutational effects. We quantified the mutational effects in terms of energy and contact changes by comparing the wild-type and mutant structures. We found that these mutations disrupt the protein-protein or protein-DNA contacts within the Cas9-sgRNA-DNA complex. We also identified additional impacted amino acid sites via energy changes that constitute the structural microenvironment encompassing the focal mutation, giving insights into how the mutations contribute to the high-specificity phenotype of SpCas9. Our method outlines a strategy to evaluate mutational effects that can facilitate rational design for Cas9 optimization.
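As a rough illustration of what a "contact change" between wild-type and mutant structures can mean, the sketch below counts atom pairs that lie within a distance cutoff and compares the two counts. It is not the authors' pipeline: the function name `count_contacts`, the 4.0 Å cutoff, and all coordinates are assumptions made up for this example, and the energy term described in the abstract is not modeled at all.

```python
import math


def count_contacts(group_a, group_b, cutoff=4.0):
    """Count atom pairs (one atom from each group) closer than `cutoff` angstroms."""
    return sum(
        1
        for a in group_a
        for b in group_b
        if math.dist(a, b) < cutoff
    )


# Hypothetical coordinates in angstroms, invented for illustration only.
dna_atoms = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
wild_type_residue = [(0.0, 3.5, 0.0), (3.0, 3.0, 0.0)]  # side chain reaching the DNA
mutant_residue = [(0.0, 6.0, 0.0)]                      # shorter side chain, further away

wt = count_contacts(wild_type_residue, dna_atoms)
mut = count_contacts(mutant_residue, dna_atoms)
contact_change = mut - wt  # negative means the mutation removed contacts
print("wild-type contacts:", wt, "mutant contacts:", mut, "change:", contact_change)
```

A negative `contact_change` in this toy setup corresponds to the kind of disrupted protein-DNA contact the abstract reports; a real analysis would read coordinates from modeled structures rather than hard-coded tuples.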
Affiliation(s)
- Hoi Yee Chu
  - Laboratory of Combinatorial Genetics and Synthetic Biology, School of Biomedical Sciences, The University of Hong Kong, Pokfulam, Hong Kong SAR, China
  - Centre for Oncology and Immunology, Hong Kong Science Park, Hong Kong SAR, China
- Jiaxing Peng
  - Laboratory of Combinatorial Genetics and Synthetic Biology, School of Biomedical Sciences, The University of Hong Kong, Pokfulam, Hong Kong SAR, China
  - Centre for Oncology and Immunology, Hong Kong Science Park, Hong Kong SAR, China
- Yuanbiao Mou
  - Laboratory of Combinatorial Genetics and Synthetic Biology, School of Biomedical Sciences, The University of Hong Kong, Pokfulam, Hong Kong SAR, China
  - Centre for Oncology and Immunology, Hong Kong Science Park, Hong Kong SAR, China
- Alan S L Wong
  - Laboratory of Combinatorial Genetics and Synthetic Biology, School of Biomedical Sciences, The University of Hong Kong, Pokfulam, Hong Kong SAR, China.
  - Centre for Oncology and Immunology, Hong Kong Science Park, Hong Kong SAR, China.
5. Zhou B, Zheng L, Wu B, Yi K, Zhong B, Tan Y, Liu Q, Liò P, Hong L. A conditional protein diffusion model generates artificial programmable endonuclease sequences with enhanced activity. Cell Discov 2024; 10:95. [PMID: 39251570 PMCID: PMC11385924 DOI: 10.1038/s41421-024-00728-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2024] [Accepted: 08/13/2024] [Indexed: 09/11/2024] [Open Access]
Abstract
Deep learning-based methods for generating functional proteins address the growing need for novel biocatalysts, allowing for precise tailoring of functionalities to meet specific requirements. This advancement leads to the development of highly efficient and specialized proteins with diverse applications across scientific, technological, and biomedical fields. This study establishes a pipeline for protein sequence generation with a conditional protein diffusion model, namely CPDiffusion, to create diverse sequences of proteins with enhanced functions. CPDiffusion accommodates protein-specific conditions, such as secondary structures and highly conserved amino acids. Without relying on extensive training data, CPDiffusion effectively captures highly conserved residues and sequence features for specific protein families. We applied CPDiffusion to generate artificial sequences of Argonaute (Ago) proteins based on the backbone structures of wild-type (WT) Kurthia massiliensis Ago (KmAgo) and Pyrococcus furiosus Ago (PfAgo), which are complex multi-domain programmable endonucleases. The generated sequences deviate by up to nearly 400 amino acids from their WT templates. Experimental tests demonstrated that the majority of the generated proteins for both KmAgo and PfAgo show unambiguous activity in DNA cleavage, with many of them exhibiting superior activity as compared to the WT. These findings underscore CPDiffusion's remarkable success rate in generating novel sequences for proteins with complex structures and functions in a single step, leading to enhanced activity. This approach facilitates the design of enzymes with multi-domain molecular structures and intricate functions through in silico generation and screening, all accomplished without the need for supervision from labeled data.
Affiliation(s)
- Bingxin Zhou
  - Institute of Natural Sciences, Shanghai Jiao Tong University, Shanghai, China
  - Shanghai National Center for Applied Mathematics (SJTU center), Shanghai Jiao Tong University, Shanghai, China
- Lirong Zheng
  - Institute of Natural Sciences, Shanghai Jiao Tong University, Shanghai, China.
  - Department of Cell and Developmental Biology & Michigan Neuroscience Institute, University of Michigan Medical School, Ann Arbor, MI, USA.
- Banghao Wu
  - Institute of Natural Sciences, Shanghai Jiao Tong University, Shanghai, China
  - School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Kai Yi
  - School of Mathematics and Statistics, University of New South Wales, Sydney, NSW, Australia
- Bozitao Zhong
  - Institute of Natural Sciences, Shanghai Jiao Tong University, Shanghai, China
- Yang Tan
  - Institute of Natural Sciences, Shanghai Jiao Tong University, Shanghai, China
- Qian Liu
  - School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Pietro Liò
  - Department of Computer Science and Technology, University of Cambridge, Cambridge, UK.
- Liang Hong
  - Institute of Natural Sciences, Shanghai Jiao Tong University, Shanghai, China.
  - Shanghai National Center for Applied Mathematics (SJTU center), Shanghai Jiao Tong University, Shanghai, China.
  - Zhangjiang Institute for Advanced Study, Shanghai Jiao Tong University, Shanghai, China.
  - Shanghai Artificial Intelligence Laboratory, Shanghai, China.
6. Capponi S, Wang S. AI in cellular engineering and reprogramming. Biophys J 2024; 123:2658-2670. [PMID: 38576162 PMCID: PMC11393708 DOI: 10.1016/j.bpj.2024.04.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2023] [Revised: 03/19/2024] [Accepted: 04/01/2024] [Indexed: 04/06/2024] [Open Access]
Abstract
During the last decade, artificial intelligence (AI) has increasingly been applied in biophysics and related fields, including cellular engineering and reprogramming, offering novel approaches to understand, manipulate, and control cellular function. The potential of AI lies in its ability to analyze complex datasets and generate predictive models. AI algorithms can process large amounts of data from single-cell genomics and multiomic technologies, allowing researchers to gain mechanistic insights into the control of cell identity and function. By integrating and interpreting these complex datasets, AI can help identify key molecular events and regulatory pathways involved in cellular reprogramming. This knowledge can inform the design of precision engineering strategies, such as the development of new transcription factor and signaling molecule cocktails, to manipulate cell identity and drive authentic cell fate across lineage boundaries. Furthermore, when used in combination with computational methods, AI can accelerate and improve the analysis and understanding of the intricate relationships between genes, proteins, and cellular processes. In this review article, we explore the current state of AI applications in biophysics with a specific focus on cellular engineering and reprogramming. Then, we showcase a couple of recent applications where we combined machine learning with experimental and computational techniques. Finally, we briefly discuss the challenges and prospects of AI in cellular engineering and reprogramming, emphasizing the potential of these technologies to revolutionize our ability to engineer cells for a variety of applications, from disease modeling and drug discovery to regenerative medicine and biomanufacturing.
Affiliation(s)
- Sara Capponi
  - IBM Almaden Research Center, San Jose, California; Center for Cellular Construction, San Francisco, California.
- Shangying Wang
  - Bay Area Institute of Science, Altos Labs, Redwood City, California.
7. Gong X, Zhang J, Gan Q, Teng Y, Hou J, Lyu Y, Liu Z, Wu Z, Dai R, Zou Y, Wang X, Zhu D, Zhu H, Liu T, Yan Y. Advancing microbial production through artificial intelligence-aided biology. Biotechnol Adv 2024; 74:108399. [PMID: 38925317 DOI: 10.1016/j.biotechadv.2024.108399] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2024] [Revised: 05/20/2024] [Accepted: 06/23/2024] [Indexed: 06/28/2024]
Abstract
Microbial cell factories (MCFs) have been leveraged to construct sustainable platforms for value-added compound production. To optimize metabolism and reach optimal productivity, synthetic biology has developed various genetic devices to engineer microbial systems by gene editing, high-throughput protein engineering, and dynamic regulation. However, current synthetic biology methodologies still rely heavily on manual design, laborious testing, and exhaustive analysis. The emerging interdisciplinary field of artificial intelligence (AI) and biology has become pivotal in addressing the remaining challenges. AI-aided microbial production harnesses the power of processing, learning, and predicting vast amounts of biological data within seconds, providing outputs with high probability. With well-trained AI models, the conventional Design-Build-Test (DBT) cycle has been transformed into a multidimensional Design-Build-Test-Learn-Predict (DBTLP) workflow, leading to significantly improved operational efficiency and reduced labor consumption. Here, we comprehensively review the main components and recent advances in AI-aided microbial production, focusing on genome annotation, AI-aided protein engineering, artificial functional protein design, and AI-enabled pathway prediction. Finally, we discuss the challenges of integrating novel AI techniques into biology and propose the potential of large language models (LLMs) in advancing microbial production.
Affiliation(s)
- Xinyu Gong
  - School of Chemical, Materials, and Biomedical Engineering, College of Engineering, The University of Georgia, Athens, GA 30602, USA
- Jianli Zhang
  - School of Chemical, Materials, and Biomedical Engineering, College of Engineering, The University of Georgia, Athens, GA 30602, USA
- Qi Gan
  - School of Chemical, Materials, and Biomedical Engineering, College of Engineering, The University of Georgia, Athens, GA 30602, USA
- Yuxi Teng
  - School of Chemical, Materials, and Biomedical Engineering, College of Engineering, The University of Georgia, Athens, GA 30602, USA
- Jixin Hou
  - School of ECAM, College of Engineering, University of Georgia, Athens, GA 30602, USA
- Yanjun Lyu
  - Department of Computer Science and Engineering, The University of Texas at Arlington, Arlington 76019, USA
- Zhengliang Liu
  - School of Computing, The University of Georgia, Athens, GA 30602, USA
- Zihao Wu
  - School of Computing, The University of Georgia, Athens, GA 30602, USA
- Runpeng Dai
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Yusong Zou
  - School of Chemical, Materials, and Biomedical Engineering, College of Engineering, The University of Georgia, Athens, GA 30602, USA
- Xianqiao Wang
  - School of ECAM, College of Engineering, University of Georgia, Athens, GA 30602, USA
- Dajiang Zhu
  - Department of Computer Science and Engineering, The University of Texas at Arlington, Arlington 76019, USA
- Hongtu Zhu
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Tianming Liu
  - School of Computing, The University of Georgia, Athens, GA 30602, USA
- Yajun Yan
  - School of Chemical, Materials, and Biomedical Engineering, College of Engineering, The University of Georgia, Athens, GA 30602, USA.
8. Guan A, He Z, Wang X, Jia ZJ, Qin J. Engineering the next-generation synthetic cell factory driven by protein engineering. Biotechnol Adv 2024; 73:108366. [PMID: 38663492 DOI: 10.1016/j.biotechadv.2024.108366] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2023] [Revised: 03/21/2024] [Accepted: 04/22/2024] [Indexed: 05/09/2024]
Abstract
Synthetic cell factories offer substantial advantages in the economically efficient production of biofuels, chemicals, and pharmaceutical compounds. However, to create a high-performance synthetic cell factory, precise regulation of cellular material and energy flux is essential. In this context, protein components including enzymes, transcription factor-based biosensors, and transporters play pivotal roles. Protein engineering aims to create novel protein variants with desired properties by modifying or designing protein sequences. This review summarizes the latest advances of protein engineering in optimizing various aspects of synthetic cell factories, including enhancing enzyme activity to eliminate production bottlenecks, altering enzyme selectivity to steer metabolic pathways towards desired products, modifying enzyme promiscuity to explore innovative routes, and improving the efficiency of transporters. Furthermore, the use of protein engineering to modify protein-based biosensors accelerates evolutionary processes and optimizes the regulation of metabolic pathways. The remaining challenges and future opportunities in this field are also discussed.
Affiliation(s)
- Ailin Guan
  - College of Biomass Science and Engineering, Sichuan University, Chengdu 610065, China
- Zixi He
  - College of Biomass Science and Engineering, Sichuan University, Chengdu 610065, China
- Xin Wang
  - West China School of Pharmacy, Sichuan University, Chengdu 610041, China
- Zhi-Jun Jia
  - West China School of Pharmacy, Sichuan University, Chengdu 610041, China
- Jiufu Qin
  - College of Biomass Science and Engineering, Sichuan University, Chengdu 610065, China.
9. Shrestha S, Barvenik KJ, Chen T, Yang H, Li Y, Kesavan MM, Little JM, Whitley HC, Teng Z, Luo Y, Tubaldi E, Chen PY. Machine intelligence accelerated design of conductive MXene aerogels with programmable properties. Nat Commun 2024; 15:4685. [PMID: 38824129 PMCID: PMC11144242 DOI: 10.1038/s41467-024-49011-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/23/2023] [Accepted: 05/14/2024] [Indexed: 06/03/2024] [Open Access]
Abstract
Designing ultralight conductive aerogels with tailored electrical and mechanical properties is critical for various applications. Conventional approaches rely on iterative, time-consuming experiments across a vast parameter space. Herein, an integrated workflow is developed to combine collaborative robotics with machine learning to accelerate the design of conductive aerogels with programmable properties. An automated pipetting robot is operated to prepare 264 mixtures of Ti3C2Tx MXene, cellulose, gelatin, and glutaraldehyde at different ratios/loadings. After freeze-drying, the aerogels' structural integrity is evaluated to train a support vector machine classifier. Through 8 active learning cycles with data augmentation, 162 unique conductive aerogels are fabricated/characterized via robotics-automated platforms, enabling the construction of an artificial neural network prediction model. The prediction model conducts two-way design tasks: (1) predicting the aerogels' physicochemical properties from fabrication parameters and (2) automating the inverse design of aerogels for specific property requirements. The combined use of model interpretation and finite element simulations validates a pronounced correlation between aerogel density and compressive strength. The model-suggested aerogels with high conductivity, customized strength, and pressure insensitivity allow for compression-stable Joule heating for wearable thermal management.
Affiliation(s)
- Snehi Shrestha
  - Department of Chemical and Biomolecular Engineering, University of Maryland, College Park, MD, 20742, USA
- Kieran James Barvenik
  - Department of Mechanical Engineering, University of Maryland, College Park, MD, 20742, USA
- Tianle Chen
  - Department of Chemical and Biomolecular Engineering, University of Maryland, College Park, MD, 20742, USA
- Haochen Yang
  - Department of Chemical and Biomolecular Engineering, University of Maryland, College Park, MD, 20742, USA
- Yang Li
  - Department of Chemical and Biomolecular Engineering, University of Maryland, College Park, MD, 20742, USA
- Meera Muthachi Kesavan
  - Department of Chemical and Biomolecular Engineering, University of Maryland, College Park, MD, 20742, USA
- Joshua M Little
  - Department of Chemical and Biomolecular Engineering, University of Maryland, College Park, MD, 20742, USA
- Hayden C Whitley
  - Department of Chemical and Biomolecular Engineering, University of Maryland, College Park, MD, 20742, USA
- Zi Teng
  - US Department of Agriculture, Agricultural Research Service, Food Quality Laboratory and Environment Microbial Food Safety Laboratory, Beltsville Agricultural Research Center, Beltsville, MD, 20725, USA
- Yaguang Luo
  - US Department of Agriculture, Agricultural Research Service, Food Quality Laboratory and Environment Microbial Food Safety Laboratory, Beltsville Agricultural Research Center, Beltsville, MD, 20725, USA
- Eleonora Tubaldi
  - Department of Mechanical Engineering, University of Maryland, College Park, MD, 20742, USA.
  - Maryland Robotics Center, College Park, MD, 20742, USA.
- Po-Yen Chen
  - Department of Chemical and Biomolecular Engineering, University of Maryland, College Park, MD, 20742, USA.
  - Maryland Robotics Center, College Park, MD, 20742, USA.
10. Lim SR, Lee SJ. Multiplex CRISPR-Cas Genome Editing: Next-Generation Microbial Strain Engineering. J Agric Food Chem 2024; 72:11871-11884. [PMID: 38744727 PMCID: PMC11141556 DOI: 10.1021/acs.jafc.4c01650] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2024] [Revised: 05/02/2024] [Accepted: 05/08/2024] [Indexed: 05/16/2024]
Abstract
Genome editing is a crucial technology for obtaining desired phenotypes in a variety of species, ranging from microbes to plants, animals, and humans. With the advent of CRISPR-Cas technology, it has become possible to edit the intended sequence by modifying the target recognition sequence in guide RNA (gRNA). By expressing multiple gRNAs simultaneously, it is possible to edit multiple targets at the same time, allowing for the simultaneous introduction of various functions into the cell. This can significantly reduce the time and cost of obtaining engineered microbial strains for specific traits. In this review, we investigate the resolution of multiplex genome editing and its application in engineering microorganisms, including bacteria and yeast. Furthermore, we examine how recent advancements in artificial intelligence technology could assist in microbial genome editing and engineering. Based on these insights, we present our perspectives on the future evolution and potential impact of multiplex genome editing technologies in the agriculture and food industry.
Affiliation(s)
- Se Ra Lim
  - Department of Systems Biotechnology and Institute of Microbiomics, Chung-Ang University, Anseong 17546, Republic of Korea
- Sang Jun Lee
  - Department of Systems Biotechnology and Institute of Microbiomics, Chung-Ang University, Anseong 17546, Republic of Korea
11. Callaway E. 'ChatGPT for CRISPR' creates new gene-editing tools. Nature 2024; 629:272. [PMID: 38684833 DOI: 10.1038/d41586-024-01243-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/02/2024]
12. Chen H, Lu Z, Ma L. A top variant identification pipeline for protein engineering. Cell Syst 2024; 15:105-106. [PMID: 38387439 DOI: 10.1016/j.cels.2024.01.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2024] [Accepted: 01/29/2024] [Indexed: 02/24/2024]
Abstract
Understanding the fitness of protein variants with combinatorial mutations is critical for effective protein engineering. In this issue of Cell Systems, Chu et al. present TopVIP, a top variant identification pipeline that accurately picks the greatest number of best-performing, high-fitness protein variants by leveraging a zero-shot predictor and low-N iterative sampling.
Affiliation(s)
- Hui Chen
  - Westlake Genetech, Hangzhou, China
- Zhike Lu
  - Westlake Genetech, Hangzhou, China; School of Life Sciences, Westlake University, Hangzhou, China
- Lijia Ma
  - School of Life Sciences, Westlake University, Hangzhou, China.
13. Chu HY, Fong JHC, Thean DGL, Zhou P, Fung FKC, Huang Y, Wong ASL. Accurate top protein variant discovery via low-N pick-and-validate machine learning. Cell Syst 2024; 15:193-203.e6. [PMID: 38340729 DOI: 10.1016/j.cels.2024.01.002] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/21/2023] [Revised: 10/11/2023] [Accepted: 01/18/2024] [Indexed: 02/12/2024]
Abstract
A strategy to obtain the greatest number of best-performing variants with least amount of experimental effort over the vast combinatorial mutational landscape would have enormous utility in boosting resource producibility for protein engineering. Toward this goal, we present a simple and effective machine learning-based strategy that outperforms other state-of-the-art methods. Our strategy integrates zero-shot prediction and multi-round sampling to direct active learning via experimenting with only a few predicted top variants. We find that four rounds of low-N pick-and-validate sampling of 12 variants for machine learning yielded the best accuracy of up to 92.6% in selecting the true top 1% variants in combinatorial mutant libraries, whereas two rounds of 24 variants can also be used. We demonstrate our strategy in successfully discovering high-performance protein variants from diverse families including the CRISPR-based genome editors, supporting its generalizable application for solving protein engineering tasks. A record of this paper's transparent peer review process is included in the supplemental information.
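The multi-round sampling idea in this abstract (rank by a zero-shot score, measure a small batch of top predictions, refit, repeat) can be sketched as a toy loop. This is only an illustration under stated assumptions, not the published method: the additive per-site model, the 0.1 weight on the zero-shot prior, the synthetic 256-variant library, and every function name here are hypothetical choices made for this example. Only the four rounds of 12 variants come from the abstract.

```python
import itertools
import random


def fit_additive(measured):
    """Estimate a mean fitness for every (position, residue) pair seen so far."""
    sums, counts = {}, {}
    for variant, fitness in measured.items():
        for site in enumerate(variant):  # site is a (position, residue) tuple
            sums[site] = sums.get(site, 0.0) + fitness
            counts[site] = counts.get(site, 0) + 1
    return {site: sums[site] / counts[site] for site in sums}


def predict(variant, effects, prior, fallback):
    """Average the per-site estimates; unseen sites use the mean measured fitness."""
    site_scores = [effects.get(site, fallback) for site in enumerate(variant)]
    return sum(site_scores) / len(site_scores) + 0.1 * prior


def pick_and_validate(library, zero_shot, measure, rounds=4, batch=12):
    """Each round: measure the top-ranked unmeasured variants, then refit and rescore."""
    measured = {}
    scores = dict(zero_shot)  # the first round ranks by the zero-shot prior alone
    for _ in range(rounds):
        unmeasured = [v for v in library if v not in measured]
        unmeasured.sort(key=lambda v: scores[v], reverse=True)
        for variant in unmeasured[:batch]:
            measured[variant] = measure(variant)
        effects = fit_additive(measured)
        fallback = sum(measured.values()) / len(measured)
        scores = {v: predict(v, effects, zero_shot[v], fallback) for v in library}
    return measured, scores


# Synthetic stand-in for a combinatorial mutant library (4 sites x 4 residues).
rng = random.Random(0)
alphabet, n_sites = "ACDE", 4
library = list(itertools.product(alphabet, repeat=n_sites))
site_weight = {(p, r): rng.gauss(0, 1) for p in range(n_sites) for r in alphabet}
true_fitness = {v: sum(site_weight[s] for s in enumerate(v)) for v in library}
zero_shot = {v: true_fitness[v] + rng.gauss(0, 1) for v in library}  # noisy prior

measured, scores = pick_and_validate(library, zero_shot, true_fitness.__getitem__)
print(len(measured), "variants measured out of", len(library))
```

In this sketch `measure` simply looks up the hidden synthetic fitness; in practice it would stand in for a wet-lab assay, and the refit step would use whatever supervised model the experimenter trusts.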
Affiliation(s)
- Hoi Yee Chu
  - Laboratory of Combinatorial Genetics and Synthetic Biology, School of Biomedical Sciences, The University of Hong Kong, Pokfulam, Hong Kong SAR, China; Centre for Oncology and Immunology, Hong Kong Science Park, Hong Kong SAR, China
- John H C Fong
  - Laboratory of Combinatorial Genetics and Synthetic Biology, School of Biomedical Sciences, The University of Hong Kong, Pokfulam, Hong Kong SAR, China
- Dawn G L Thean
  - Laboratory of Combinatorial Genetics and Synthetic Biology, School of Biomedical Sciences, The University of Hong Kong, Pokfulam, Hong Kong SAR, China
- Peng Zhou
  - Laboratory of Combinatorial Genetics and Synthetic Biology, School of Biomedical Sciences, The University of Hong Kong, Pokfulam, Hong Kong SAR, China; Centre for Oncology and Immunology, Hong Kong Science Park, Hong Kong SAR, China
- Frederic K C Fung
  - Laboratory of Combinatorial Genetics and Synthetic Biology, School of Biomedical Sciences, The University of Hong Kong, Pokfulam, Hong Kong SAR, China; Centre for Oncology and Immunology, Hong Kong Science Park, Hong Kong SAR, China
- Yuanhua Huang
  - School of Biomedical Sciences, The University of Hong Kong, Pokfulam, Hong Kong SAR, China; Department of Statistics and Actuarial Science, The University of Hong Kong, Pokfulam, Hong Kong SAR, China
- Alan S L Wong
  - Laboratory of Combinatorial Genetics and Synthetic Biology, School of Biomedical Sciences, The University of Hong Kong, Pokfulam, Hong Kong SAR, China; Centre for Oncology and Immunology, Hong Kong Science Park, Hong Kong SAR, China
14
Qiu Y, Wei GW. Artificial intelligence-aided protein engineering: from topological data analysis to deep protein language models. Brief Bioinform 2023; 24:bbad289. [PMID: 37580175 PMCID: PMC10516362 DOI: 10.1093/bib/bbad289] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 05/08/2023] [Revised: 07/14/2023] [Accepted: 07/26/2023] [Indexed: 08/16/2023]
Abstract
Protein engineering is an emerging field in biotechnology that has the potential to revolutionize various areas, such as antibody design, drug discovery, food security, ecology, and more. However, the mutational space involved is too vast to be handled through experimental means alone. Leveraging accumulative protein databases, machine learning (ML) models, particularly those based on natural language processing (NLP), have considerably expedited protein engineering. Moreover, advances in topological data analysis (TDA) and artificial intelligence-based protein structure prediction, such as AlphaFold2, have made more powerful structure-based ML-assisted protein engineering strategies possible. This review aims to offer a comprehensive, systematic, and indispensable set of methodological components, including TDA and NLP, for protein engineering and to facilitate their future development.
Affiliation(s)
- Yuchi Qiu
  - Department of Mathematics, Michigan State University, East Lansing, MI 48824, USA
- Guo-Wei Wei
  - Department of Mathematics, Michigan State University, East Lansing, MI 48824, USA
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI 48824, USA
  - Department of Electrical and Computer Engineering, Michigan State University, East Lansing, MI 48824, USA
15
Qiu Y, Wei GW. Artificial intelligence-aided protein engineering: from topological data analysis to deep protein language models. ARXIV 2023:arXiv:2307.14587v1. [PMID: 37547662 PMCID: PMC10402185] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/08/2023]
Abstract
Protein engineering is an emerging field in biotechnology that has the potential to revolutionize various areas, such as antibody design, drug discovery, food security, ecology, and more. However, the mutational space involved is too vast to be handled through experimental means alone. Leveraging accumulative protein databases, machine learning (ML) models, particularly those based on natural language processing (NLP), have considerably expedited protein engineering. Moreover, advances in topological data analysis (TDA) and artificial intelligence-based protein structure prediction, such as AlphaFold2, have made more powerful structure-based ML-assisted protein engineering strategies possible. This review aims to offer a comprehensive, systematic, and indispensable set of methodological components, including TDA and NLP, for protein engineering and to facilitate their future development.
Affiliation(s)
- Yuchi Qiu
  - Department of Mathematics, Michigan State University, East Lansing, MI 48824, USA
- Guo-Wei Wei
  - Department of Mathematics, Michigan State University, East Lansing, MI 48824, USA
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI 48824, USA
  - Department of Electrical and Computer Engineering, Michigan State University, East Lansing, MI 48824, USA
16
Abbate E, Andrion J, Apel A, Biggs M, Chaves J, Cheung K, Ciesla A, Clark-ElSayed A, Clay M, Contridas R, Fox R, Hein G, Held D, Horwitz A, Jenkins S, Kalbarczyk K, Krishnamurthy N, Mirsiaghi M, Noon K, Rowe M, Shepherd T, Tarasava K, Tarasow TM, Thacker D, Villa G, Yerramsetty K. Optimizing the strain engineering process for industrial-scale production of bio-based molecules. J Ind Microbiol Biotechnol 2023; 50:kuad025. [PMID: 37656881 PMCID: PMC10548853 DOI: 10.1093/jimb/kuad025] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 06/05/2023] [Accepted: 08/29/2023] [Indexed: 09/03/2023]
Abstract
Biomanufacturing could contribute as much as $30 trillion to the global economy by 2030. However, the success of the growing bioeconomy depends on our ability to manufacture high-performing strains in a time- and cost-effective manner. The Design-Build-Test-Learn (DBTL) framework has proven to be an effective strain engineering approach. Significant improvements have been made in genome engineering, genotyping, and phenotyping throughput over the last couple of decades that have greatly accelerated the DBTL cycles. However, to achieve a radical reduction in strain development time and cost, we need to look at the strain engineering process through a lens of optimizing the whole cycle, as opposed to simply increasing throughput at each stage. We propose an approach that integrates all 4 stages of the DBTL cycle and takes advantage of the advances in computational design, high-throughput genome engineering, and phenotyping methods, as well as machine learning tools for making predictions about strain scale-up performance. In this perspective, we discuss the challenges of industrial strain engineering, outline the best approaches to overcoming these challenges, and showcase examples of successful strain engineering projects for production of heterologous proteins, amino acids, and small molecules, as well as improving tolerance, fitness, and de-risking the scale-up of industrial strains.
Affiliation(s)
- Eric Abbate
  - Inscripta, Inc., 5720 Stoneridge Dr, Suite 300, Pleasanton, CA 94588, USA
- Jennifer Andrion
  - Inscripta, Inc., 5720 Stoneridge Dr, Suite 300, Pleasanton, CA 94588, USA
- Amanda Apel
  - Inscripta, Inc., 5720 Stoneridge Dr, Suite 300, Pleasanton, CA 94588, USA
- Matthew Biggs
  - Inscripta, Inc., 5720 Stoneridge Dr, Suite 300, Pleasanton, CA 94588, USA
- Julie Chaves
  - Inscripta, Inc., 5720 Stoneridge Dr, Suite 300, Pleasanton, CA 94588, USA
- Kristi Cheung
  - Inscripta, Inc., 5720 Stoneridge Dr, Suite 300, Pleasanton, CA 94588, USA
- Anthony Ciesla
  - Inscripta, Inc., 5720 Stoneridge Dr, Suite 300, Pleasanton, CA 94588, USA
- Alia Clark-ElSayed
  - Inscripta, Inc., 5720 Stoneridge Dr, Suite 300, Pleasanton, CA 94588, USA
- Michael Clay
  - Inscripta, Inc., 5720 Stoneridge Dr, Suite 300, Pleasanton, CA 94588, USA
- Riarose Contridas
  - Inscripta, Inc., 5720 Stoneridge Dr, Suite 300, Pleasanton, CA 94588, USA
- Richard Fox
  - Inscripta, Inc., 5720 Stoneridge Dr, Suite 300, Pleasanton, CA 94588, USA
- Glenn Hein
  - Inscripta, Inc., 5720 Stoneridge Dr, Suite 300, Pleasanton, CA 94588, USA
- Dan Held
  - Inscripta, Inc., 5720 Stoneridge Dr, Suite 300, Pleasanton, CA 94588, USA
- Andrew Horwitz
  - Inscripta, Inc., 5720 Stoneridge Dr, Suite 300, Pleasanton, CA 94588, USA
- Stefan Jenkins
  - Inscripta, Inc., 5720 Stoneridge Dr, Suite 300, Pleasanton, CA 94588, USA
- Mona Mirsiaghi
  - Inscripta, Inc., 5720 Stoneridge Dr, Suite 300, Pleasanton, CA 94588, USA
- Katherine Noon
  - Inscripta, Inc., 5720 Stoneridge Dr, Suite 300, Pleasanton, CA 94588, USA
- Mike Rowe
  - Inscripta, Inc., 5720 Stoneridge Dr, Suite 300, Pleasanton, CA 94588, USA
- Tyson Shepherd
  - Inscripta, Inc., 5720 Stoneridge Dr, Suite 300, Pleasanton, CA 94588, USA
- Katia Tarasava
  - Inscripta, Inc., 5720 Stoneridge Dr, Suite 300, Pleasanton, CA 94588, USA
- Theodore M Tarasow
  - Inscripta, Inc., 5720 Stoneridge Dr, Suite 300, Pleasanton, CA 94588, USA
- Drew Thacker
  - Inscripta, Inc., 5720 Stoneridge Dr, Suite 300, Pleasanton, CA 94588, USA
- Gladys Villa
  - Inscripta, Inc., 5720 Stoneridge Dr, Suite 300, Pleasanton, CA 94588, USA
17
Song Z, Zhang Q, Wu W, Pu Z, Yu H. Rational design of enzyme activity and enantioselectivity. Front Bioeng Biotechnol 2023; 11:1129149. [PMID: 36761300 PMCID: PMC9902596 DOI: 10.3389/fbioe.2023.1129149] [Citation(s) in RCA: 44] [Impact Index Per Article: 22.0] [Received: 12/21/2022] [Accepted: 01/16/2023] [Indexed: 01/25/2023]
Abstract
The rational design strategy for engineering enzymes is to predict candidate mutants from an understanding of the relationship between protein structure and function, and then to introduce the mutations by site-directed mutagenesis. Rational design methods are universal, relatively fast, and have the potential to be developed into algorithms that can quantitatively predict the performance of the designed sequences. Compared with improving protein stability, designing an enzyme with improved activity or selectivity has been more challenging, owing to the complexity of enzyme molecular structure and an inadequate understanding of the relationships between enzyme structures and functions. However, with growing computational power, more advanced algorithms, and a deeper understanding of enzyme catalytic mechanisms, rational design could significantly simplify the process of engineering enzyme functions, and the number of studies applying rational design strategies has been increasing. Here, we review recent advances in applying the rational design strategy to engineer enzyme functions, including activity and enantioselectivity. Five strategies are discussed, namely multiple sequence alignment, steric hindrance-based design, remodeling of the interaction network, dynamics modification, and computational protein design, and successful cases using these strategies are introduced.
Affiliation(s)
- Zhongdi Song
  - Key Laboratory of Pollution Exposure and Health Intervention of Zhejiang Province, Interdisciplinary Research Academy, Zhejiang Shuren University, Hangzhou, China
- Qunfeng Zhang
  - Institute of Bioengineering, College of Chemical and Biological Engineering, Zhejiang University, Hangzhou, Zhejiang, China
- Wenhui Wu
  - ZJU-Hangzhou Global Scientific and Technological Innovation Centre, Hangzhou, Zhejiang, China
- Zhongji Pu
  - Institute of Bioengineering, College of Chemical and Biological Engineering, Zhejiang University, Hangzhou, Zhejiang, China
  - ZJU-Hangzhou Global Scientific and Technological Innovation Centre, Hangzhou, Zhejiang, China
- Haoran Yu
  - Institute of Bioengineering, College of Chemical and Biological Engineering, Zhejiang University, Hangzhou, Zhejiang, China
  - ZJU-Hangzhou Global Scientific and Technological Innovation Centre, Hangzhou, Zhejiang, China
18
A systematic mapping study on machine learning techniques for the prediction of CRISPR/Cas9 sgRNA target cleavage. Comput Struct Biotechnol J 2022; 20:5813-5823. [PMID: 36382194 PMCID: PMC9630617 DOI: 10.1016/j.csbj.2022.10.013] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/27/2022] [Revised: 09/21/2022] [Accepted: 10/08/2022] [Indexed: 11/30/2022]
Abstract
CRISPR/Cas9 technology has greatly accelerated genome engineering research. The CRISPR/Cas9 complex, a bacterial immune response system, is widely adopted for RNA-driven targeted genome editing. The systematic mapping study presented in this paper examines the literature on machine learning (ML) techniques employed in the prediction of CRISPR/Cas9 sgRNA on/off-target cleavage, focusing on improving support in sgRNA design activities and identifying areas currently being researched. This area of research has expanded greatly in recent years, which motivated a Systematic Mapping Study (SMS), a secondary study method that has proven effective. Unlike a classic review, an SMS makes no comparison of methods or results; that task can instead be the subject of a systematic literature review that focuses on one of the themes highlighted by the SMS. To the best of the authors' knowledge, no other SMS has been published on this topic. Fifty-seven papers published between 2017 and April 30, 2022 were analyzed. The study reveals that the most widely used ML model is the convolutional neural network (CNN), followed by the feedforward neural network (FNN), while the use of other models is marginal. Other findings include the wide availability of both open code and platforms that support researchers, and a clear prevalence of public funding for research on this topic.