1
Ding S, Zheng J, Jia C. DeepMEns: an ensemble model for predicting sgRNA on-target activity based on multiple features. Brief Funct Genomics 2025; 24:elae043. [PMID: 39528429 PMCID: PMC11735754 DOI: 10.1093/bfgp/elae043] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/12/2024] [Revised: 10/12/2024] [Accepted: 10/21/2024] [Indexed: 11/16/2024]
Abstract
The CRISPR/Cas9 system developed from Streptococcus pyogenes (SpCas9) has high potential in gene editing. However, its successful application is hindered by the considerable variability in target efficiencies across different single guide RNAs (sgRNAs). Although several deep learning models have been created to predict sgRNA on-target activity, the intrinsic mechanisms of these models are difficult to explain, and there is still scope for improvement in prediction performance. To overcome these issues, we propose an interpretable ensemble model based on deep learning, termed DeepMEns, to predict sgRNA on-target activity. Using five different training and validation datasets, we constructed five sub-regressors, each comprising three parts. The first part uses one-hot encoding, wherein a 0-1 representation of the secondary structure is used as the input to a convolutional neural network (CNN) with a Transformer encoder. The second part uses the DNA shape feature matrix as the input to a CNN with a Transformer encoder. The third part uses positional encoding feature matrices as the input to a long short-term memory network with an attention mechanism. The outputs of these three parts are concatenated through a flatten layer, and the final prediction is the average of the five sub-regressors. Extensive benchmarking experiments indicated that DeepMEns achieved the highest Spearman correlation coefficient on 6 of 10 independent test datasets compared with previous predictors; this finding indicates that DeepMEns can achieve state-of-the-art performance. Moreover, the ablation analysis indicated that the ensemble strategy may improve the performance of the prediction model.
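The two simplest ingredients named in this abstract, one-hot encoding of the sequence and averaging over five sub-regressors, can be sketched as follows. This is a minimal illustration and not the DeepMEns implementation: the function names `one_hot` and `ensemble_predict` and the toy scoring functions are assumptions made for the example, and the real sub-regressors are trained neural networks.

```python
from statistics import mean

NUCLEOTIDES = "ACGT"  # column order of the one-hot matrix (an assumption)


def one_hot(seq):
    """Encode a DNA sequence as rows of 0/1 values in A, C, G, T order."""
    rows = []
    for base in seq.upper():
        if base not in NUCLEOTIDES:
            raise ValueError(f"unexpected base: {base!r}")
        rows.append([1 if base == n else 0 for n in NUCLEOTIDES])
    return rows


def ensemble_predict(sub_regressors, features):
    """Return the mean prediction of several sub-regressors (plain callables)."""
    return mean(model(features) for model in sub_regressors)


# Hypothetical stand-ins for five trained sub-regressors: each adds a
# different bias to a trivial function of the encoded input.
toy_models = [
    lambda x, b=b: b + 0.01 * sum(map(sum, x))
    for b in (0.1, 0.2, 0.3, 0.4, 0.5)
]

encoded = one_hot("ACGT")
score = ensemble_predict(toy_models, encoded)
print(encoded)
print(round(score, 2))
```

Averaging is the simplest way to combine regressors trained on different data splits; whether it helps in practice depends on how much the individual models disagree.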
Affiliation(s)
- Shumei Ding
  - School of Science, Dalian Maritime University, Dalian 116026, China
- Jia Zheng
  - School of Science, Dalian Maritime University, Dalian 116026, China
- Cangzhi Jia
  - School of Science, Dalian Maritime University, Dalian 116026, China
2
Cardiff RAL, Carothers JM, Zalatan JG, Sauro HM. Systems-Level Modeling for CRISPR-Based Metabolic Engineering. ACS Synth Biol 2024; 13:2643-2652. [PMID: 39119666 DOI: 10.1021/acssynbio.4c00053] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/10/2024]
Abstract
The CRISPR-Cas system has enabled the development of sophisticated, multigene metabolic engineering programs through the use of guide RNA-directed activation or repression of target genes. To optimize biosynthetic pathways in microbial systems, we need improved models to inform design and implementation of transcriptional programs. Recent progress has resulted in new modeling approaches for identifying gene targets and predicting the efficacy of guide RNA targeting. Genome-scale and flux balance models have successfully been applied to identify targets for improving biosynthetic production yields using combinatorial CRISPR-interference (CRISPRi) programs. The advent of new approaches for tunable and dynamic CRISPR activation (CRISPRa) promises to further advance these engineering capabilities. Once appropriate targets are identified, guide RNA prediction models can lead to increased efficacy in gene targeting. Developing improved models and incorporating approaches from machine learning may be able to overcome current limitations and greatly expand the capabilities of CRISPR-Cas9 tools for metabolic engineering.
Affiliation(s)
- Ryan A L Cardiff
  - Molecular Engineering & Sciences Institute and Center for Synthetic Biology, University of Washington, Seattle, Washington 98195, United States
  - Department of Chemistry, University of Washington, Seattle, Washington 98195, United States
  - Department of Chemical Engineering, University of Washington, Seattle, Washington 98195, United States
- James M Carothers
  - Molecular Engineering & Sciences Institute and Center for Synthetic Biology, University of Washington, Seattle, Washington 98195, United States
  - Department of Chemical Engineering, University of Washington, Seattle, Washington 98195, United States
- Jesse G Zalatan
  - Molecular Engineering & Sciences Institute and Center for Synthetic Biology, University of Washington, Seattle, Washington 98195, United States
  - Department of Chemistry, University of Washington, Seattle, Washington 98195, United States
- Herbert M Sauro
  - Molecular Engineering & Sciences Institute and Center for Synthetic Biology, University of Washington, Seattle, Washington 98195, United States
  - Department of Bioengineering, University of Washington, Seattle, Washington 98195, United States
3
Burbano DA, Kiattisewee C, Karanjia AV, Cardiff RAL, Faulkner ID, Sugianto W, Carothers JM. CRISPR Tools for Engineering Prokaryotic Systems: Recent Advances and New Applications. Annu Rev Chem Biomol Eng 2024; 15:389-430. [PMID: 38598861 DOI: 10.1146/annurev-chembioeng-100522-114706] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/12/2024]
Abstract
In the past decades, the broad selection of CRISPR-Cas systems has revolutionized biotechnology by enabling multimodal genetic manipulation in diverse organisms. Rooted in a molecular engineering perspective, we recapitulate the different CRISPR components and how they can be designed for specific genetic engineering applications. We first introduce the repertoire of Cas proteins and tethered effectors used to program new biological functions through gene editing and gene regulation. We review current guide RNA (gRNA) design strategies and computational tools and how CRISPR-based genetic circuits can be constructed through regulated gRNA expression. Then, we present recent advances in CRISPR-based biosensing, bioproduction, and biotherapeutics across in vitro and in vivo prokaryotic systems. Finally, we discuss forthcoming applications in prokaryotic CRISPR technology that will transform synthetic biology principles in the near future.
Affiliation(s)
- Diego Alba Burbano
  - Department of Chemical Engineering, University of Washington, Seattle, Washington, USA
  - Molecular Engineering & Sciences Institute and Center for Synthetic Biology, University of Washington, Seattle, Washington, USA
- Cholpisit Kiattisewee
  - Department of Chemical Engineering, University of Washington, Seattle, Washington, USA
  - Molecular Engineering & Sciences Institute and Center for Synthetic Biology, University of Washington, Seattle, Washington, USA
- Ava V Karanjia
  - Department of Chemical Engineering, University of Washington, Seattle, Washington, USA
  - Molecular Engineering & Sciences Institute and Center for Synthetic Biology, University of Washington, Seattle, Washington, USA
- Ryan A L Cardiff
  - Molecular Engineering & Sciences Institute and Center for Synthetic Biology, University of Washington, Seattle, Washington, USA
- Ian D Faulkner
  - Department of Chemical Engineering, University of Washington, Seattle, Washington, USA
  - Molecular Engineering & Sciences Institute and Center for Synthetic Biology, University of Washington, Seattle, Washington, USA
- Widianti Sugianto
  - Department of Chemical Engineering, University of Washington, Seattle, Washington, USA
  - Molecular Engineering & Sciences Institute and Center for Synthetic Biology, University of Washington, Seattle, Washington, USA
- James M Carothers
  - Department of Chemical Engineering, University of Washington, Seattle, Washington, USA
  - Molecular Engineering & Sciences Institute and Center for Synthetic Biology, University of Washington, Seattle, Washington, USA
4
Lim SR, Lee SJ. Multiplex CRISPR-Cas Genome Editing: Next-Generation Microbial Strain Engineering. J Agric Food Chem 2024; 72:11871-11884. [PMID: 38744727 PMCID: PMC11141556 DOI: 10.1021/acs.jafc.4c01650] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2024] [Revised: 05/02/2024] [Accepted: 05/08/2024] [Indexed: 05/16/2024]
Abstract
Genome editing is a crucial technology for obtaining desired phenotypes in a variety of species, ranging from microbes to plants, animals, and humans. With the advent of CRISPR-Cas technology, it has become possible to edit the intended sequence by modifying the target recognition sequence in guide RNA (gRNA). By expressing multiple gRNAs simultaneously, it is possible to edit multiple targets at the same time, allowing for the simultaneous introduction of various functions into the cell. This can significantly reduce the time and cost of obtaining engineered microbial strains for specific traits. In this review, we investigate the resolution of multiplex genome editing and its application in engineering microorganisms, including bacteria and yeast. Furthermore, we examine how recent advancements in artificial intelligence technology could assist in microbial genome editing and engineering. Based on these insights, we present our perspectives on the future evolution and potential impact of multiplex genome editing technologies in the agriculture and food industry.
Affiliation(s)
- Se Ra Lim
  - Department of Systems Biotechnology and Institute of Microbiomics, Chung-Ang University, Anseong 17546, Republic of Korea
- Sang Jun Lee
  - Department of Systems Biotechnology and Institute of Microbiomics, Chung-Ang University, Anseong 17546, Republic of Korea
5
Dixit S, Kumar A, Srinivasan K, Vincent PMDR, Ramu Krishnan N. Advancing genome editing with artificial intelligence: opportunities, challenges, and future directions. Front Bioeng Biotechnol 2024; 11:1335901. [PMID: 38260726 PMCID: PMC10800897 DOI: 10.3389/fbioe.2023.1335901] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2023] [Accepted: 12/19/2023] [Indexed: 01/24/2024]
Abstract
Clustered regularly interspaced short palindromic repeat (CRISPR)-based genome editing (GED) technologies have unlocked exciting possibilities for understanding genes and improving medical treatments. Meanwhile, artificial intelligence (AI) helps genome editing achieve more precision, efficiency, and affordability in tackling various diseases, such as sickle cell anemia and thalassemia. AI models have been used to design guide RNAs (gRNAs) for CRISPR-Cas systems. Tools such as DeepCRISPR, CRISTA, and DeepHF can predict optimal gRNAs for a specified target sequence. These predictions take into account multiple factors, including genomic context, Cas protein type, desired mutation type, on-target/off-target scores, potential off-target sites, and the potential impacts of genome editing on gene function and cell phenotype. These models aid in optimizing different genome editing technologies, such as base, prime, and epigenome editing, which are advanced techniques for introducing precise and programmable changes to DNA sequences without relying on the homology-directed repair pathway or donor DNA templates. Furthermore, AI, in combination with genome editing and precision medicine, enables personalized treatments based on genetic profiles. AI analyzes patients' genomic data to identify mutations, variations, and biomarkers associated with diseases such as cancer, diabetes, and Alzheimer's disease. However, several challenges persist, including high costs, off-target editing, suitable delivery methods for CRISPR cargoes, improving editing efficiency, and ensuring safety in clinical applications. This review explores AI's contribution to improving CRISPR-based genome editing technologies and addresses existing challenges. It also discusses potential areas for future research in AI-driven CRISPR-based genome editing technologies. The integration of AI and genome editing opens up new possibilities for genetics, biomedicine, and healthcare, with significant implications for human health.
Affiliation(s)
- Shriniket Dixit
  - School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, India
- Anant Kumar
  - School of Bioscience and Technology, Vellore Institute of Technology, Vellore, India
- Kathiravan Srinivasan
  - School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, India
- P. M. Durai Raj Vincent
  - School of Computer Science Engineering and Information Systems, Vellore Institute of Technology, Vellore, India
- Nadesh Ramu Krishnan
  - School of Computer Science Engineering and Information Systems, Vellore Institute of Technology, Vellore, India
6
Miao R, Jahn M, Shabestary K, Peltier G, Hudson EP. CRISPR interference screens reveal growth-robustness tradeoffs in Synechocystis sp. PCC 6803 across growth conditions. Plant Cell 2023; 35:3937-3956. [PMID: 37494719 PMCID: PMC10615215 DOI: 10.1093/plcell/koad208] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2023] [Revised: 06/01/2023] [Accepted: 07/20/2023] [Indexed: 07/28/2023]
Abstract
Barcoded mutant libraries are a powerful tool for elucidating gene function in microbes, particularly when screened in multiple growth conditions. Here, we screened a pooled CRISPR interference library of the model cyanobacterium Synechocystis sp. PCC 6803 in 11 bioreactor-controlled conditions, spanning multiple light regimes and carbon sources. This gene repression library contained 21,705 individual mutants with high redundancy over all open reading frames and noncoding RNAs. Comparison of the derived gene fitness scores revealed multiple instances of gene repression being beneficial in 1 condition while generally detrimental in others, particularly for genes within light harvesting and conversion, such as antennae components at high light and PSII subunits during photoheterotrophy. Suboptimal regulation of such genes likely represents a tradeoff of reduced growth speed for enhanced robustness to perturbation. The extensive data set assigns condition-specific importance to many previously unannotated genes and suggests additional functions for central metabolic enzymes. Phosphoribulokinase, glyceraldehyde-3-phosphate dehydrogenase, and the small protein CP12 were critical for mixotrophy and photoheterotrophy, which implicates the ternary complex as important for redirecting metabolic flux in these conditions in addition to inactivation of the Calvin cycle in the dark. To predict the potency of sgRNA sequences, we applied machine learning on sgRNA sequences and gene repression data, which showed the importance of C enrichment and T depletion proximal to the PAM site. Fitness data for all genes in all conditions are compiled in an interactive web application.
Affiliation(s)
- Rui Miao
  - School of Engineering Sciences in Chemistry, Biotechnology and Health, Science for Life Laboratory, KTH Royal Institute of Technology, Stockholm, SE-17165, Sweden
- Michael Jahn
  - School of Engineering Sciences in Chemistry, Biotechnology and Health, Science for Life Laboratory, KTH Royal Institute of Technology, Stockholm, SE-17165, Sweden
  - Max Planck Unit for the Science of Pathogens, 10117 Berlin, Germany
- Kiyan Shabestary
  - School of Engineering Sciences in Chemistry, Biotechnology and Health, Science for Life Laboratory, KTH Royal Institute of Technology, Stockholm, SE-17165, Sweden
  - Department of Bioengineering and Imperial College Centre for Synthetic Biology, Imperial College London, London SW7 2AZ, UK
- Gilles Peltier
  - Aix Marseille Univ, CEA, CNRS, Institut de Biosciences et Biotechnologies Aix-Marseille, CEA Cadarache, 13108 Saint Paul-Lez-Durance, France
- Elton P Hudson
  - School of Engineering Sciences in Chemistry, Biotechnology and Health, Science for Life Laboratory, KTH Royal Institute of Technology, Stockholm, SE-17165, Sweden
7
Zhang G, Luo Y, Dai X, Dai Z. Benchmarking deep learning methods for predicting CRISPR/Cas9 sgRNA on- and off-target activities. Brief Bioinform 2023; 24:bbad333. [PMID: 37775147 DOI: 10.1093/bib/bbad333] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 01/27/2023] [Revised: 08/31/2023] [Accepted: 09/04/2023] [Indexed: 10/01/2023]
Abstract
In silico design of single guide RNA (sgRNA) plays a critical role in clustered regularly interspaced, short palindromic repeats/CRISPR-associated protein 9 (CRISPR/Cas9) system. Continuous efforts are aimed at improving sgRNA design with efficient on-target activity and reduced off-target mutations. In the last 5 years, an increasing number of deep learning-based methods have achieved breakthrough performance in predicting sgRNA on- and off-target activities. Nevertheless, it is worthwhile to systematically evaluate these methods for their predictive abilities. In this review, we conducted a systematic survey on the progress in prediction of on- and off-target editing. We investigated the performances of 10 mainstream deep learning-based on-target predictors using nine public datasets with different sample sizes. We found that in most scenarios, these methods showed superior predictive power on large- and medium-scale datasets than on small-scale datasets. In addition, we performed unbiased experiments to provide in-depth comparison of eight representative approaches for off-target prediction on 12 publicly available datasets with various imbalanced ratios of positive/negative samples. Most methods showed excellent performance on balanced datasets but have much room for improvement on moderate- and severe-imbalanced datasets. This study provides comprehensive perspectives on CRISPR/Cas9 sgRNA on- and off-target activity prediction and improvement for method development.
Affiliation(s)
- Guishan Zhang
  - College of Engineering, Shantou University, Shantou 515063, China
- Ye Luo
  - College of Engineering, Shantou University, Shantou 515063, China
- Xianhua Dai
  - School of Cyber Science and Technology, Sun Yat-sen University, Shenzhen 518107, China
  - Southern Marine Science and Engineering Guangdong Laboratory, Zhuhai 519000, China
- Zhiming Dai
  - School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510006, China
  - Guangdong Province Key Laboratory of Big Data Analysis and Processing, Sun Yat-sen University, Guangzhou 510006, China
8
Lee M. Deep learning in CRISPR-Cas systems: a review of recent studies. Front Bioeng Biotechnol 2023; 11:1226182. [PMID: 37469443 PMCID: PMC10352112 DOI: 10.3389/fbioe.2023.1226182] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 05/24/2023] [Accepted: 06/22/2023] [Indexed: 07/21/2023]
Abstract
In genetic engineering, the revolutionary CRISPR-Cas system has proven to be a vital tool for precise genome editing. Simultaneously, the emergence and rapid evolution of deep learning methodologies has provided an impetus to the scientific exploration of genomic data. These concurrent advancements mandate regular investigation of the state-of-the-art, particularly given the pace of recent developments. This review focuses on the significant progress achieved during 2019-2023 in the utilization of deep learning for predicting guide RNA (gRNA) activity in the CRISPR-Cas system, a key element determining the effectiveness and specificity of genome editing procedures. In this paper, an analytical overview of contemporary research is provided, with emphasis placed on the amalgamation of artificial intelligence and genetic engineering. The importance of our review is underscored by the necessity to comprehend the rapidly evolving deep learning methodologies and their potential impact on the effectiveness of the CRISPR-Cas system. By analyzing recent literature, this review highlights the achievements and emerging trends in the integration of deep learning with the CRISPR-Cas systems, thus contributing to the future direction of this essential interdisciplinary research area.
9
Sherkatghanad Z, Abdar M, Charlier J, Makarenkov V. Using traditional machine learning and deep learning methods for on- and off-target prediction in CRISPR/Cas9: a review. Brief Bioinform 2023; 24:bbad131. [PMID: 37080758 PMCID: PMC10199778 DOI: 10.1093/bib/bbad131] [Citation(s) in RCA: 24] [Impact Index Per Article: 12.0] [Received: 11/02/2022] [Revised: 03/07/2023] [Accepted: 03/13/2023] [Indexed: 04/22/2023]
Abstract
CRISPR/Cas9 (Clustered Regularly Interspaced Short Palindromic Repeats and CRISPR-associated protein 9) is a popular and effective two-component technology used for targeted genetic manipulation. It is currently the most versatile and accurate method of gene and genome editing, which benefits from a large variety of practical applications. For example, in biomedicine, it has been used in research related to cancer, virus infections, pathogen detection, and genetic diseases. Current CRISPR/Cas9 research is based on data-driven models for on- and off-target prediction as a cleavage may occur at non-target sequence locations. Nowadays, conventional machine learning and deep learning methods are applied on a regular basis to accurately predict on-target knockout efficacy and off-target profile of given single-guide RNAs (sgRNAs). In this paper, we present an overview and a comparative analysis of traditional machine learning and deep learning models used in CRISPR/Cas9. We highlight the key research challenges and directions associated with target activity prediction. We discuss recent advances in the sgRNA-DNA sequence encoding used in state-of-the-art on- and off-target prediction models. Furthermore, we present the most popular deep learning neural network architectures used in CRISPR/Cas9 prediction models. Finally, we summarize the existing challenges and discuss possible future investigations in the field of on- and off-target prediction. Our paper provides valuable support for academic and industrial researchers interested in the application of machine learning methods in the field of CRISPR/Cas9 genome editing.
Affiliation(s)
- Zeinab Sherkatghanad
  - Departement d’Informatique, Universite du Quebec a Montreal, H2X 3Y7, Montreal, QC, Canada
- Moloud Abdar
  - Institute for Intelligent Systems Research and Innovation (IISRI), Deakin University, 3216, Geelong, VIC, Australia
- Jeremy Charlier
  - Departement d’Informatique, Universite du Quebec a Montreal, H2X 3Y7, Montreal, QC, Canada
- Vladimir Makarenkov
  - Departement d’Informatique, Universite du Quebec a Montreal, H2X 3Y7, Montreal, QC, Canada
10
Dallo T, Krishnakumar R, Kolker SD, Ruffing AM. High-Density Guide RNA Tiling and Machine Learning for Designing CRISPR Interference in Synechococcus sp. PCC 7002. ACS Synth Biol 2023; 12:1175-1186. [PMID: 36893454 DOI: 10.1021/acssynbio.2c00653] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/11/2023]
Abstract
While CRISPRi was previously established in Synechococcus sp. PCC 7002 (hereafter 7002), the design principles for guide RNA (gRNA) effectiveness remain largely unknown. Here, 76 strains of 7002 were constructed with gRNAs targeting three reporter systems to evaluate features that impact gRNA efficiency. Correlation analysis of the data revealed that important features of gRNA design include the position relative to the start codon, GC content, protospacer adjacent motif (PAM) site, minimum free energy, and targeted DNA strand. Unexpectedly, some gRNAs targeting upstream of the promoter region showed small but significant increases in reporter expression, and gRNAs targeting the terminator region showed greater repression than gRNAs targeting the 3' end of the coding sequence. Machine learning algorithms enabled prediction of gRNA effectiveness, with Random Forest having the best performance across all training sets. This study demonstrates that high-density gRNA data and machine learning can improve gRNA design for tuning gene expression in 7002.
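Several of the gRNA features listed in this abstract are simple to compute from the sequence and its genomic context. The sketch below assembles three of them (GC content, position relative to the start codon, and targeted strand) into a feature record of the kind a Random Forest could be trained on. The function and field names are hypothetical, and the study's actual feature definitions may differ; minimum free energy is omitted because it requires an RNA folding tool.

```python
def gc_content(seq):
    """Fraction of G and C bases in a non-empty nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return (seq.count("G") + seq.count("C")) / len(seq)


def grna_features(spacer, offset_from_start_codon, targets_template_strand):
    """Build a flat feature record for one gRNA (hypothetical field names)."""
    return {
        "gc_content": gc_content(spacer),
        # Signed distance in bp; negative values lie upstream of the start codon.
        "offset_from_start_codon": offset_from_start_codon,
        # Encode the targeted strand as 0/1 so tree-based models can use it.
        "template_strand": int(targets_template_strand),
    }


example = grna_features("GGCCAATT", offset_from_start_codon=-35,
                        targets_template_strand=False)
print(example)
```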
Affiliation(s)
- Tessa Dallo
  - Molecular and Microbiology, Sandia National Laboratories, P.O. Box 5800, MS 1413, Albuquerque, New Mexico 87185, United States
- Raga Krishnakumar
  - Systems Biology, Sandia National Laboratories, P.O. Box 969, MS 9292, Livermore, California 94551, United States
- Stephanie D Kolker
  - Molecular and Microbiology, Sandia National Laboratories, P.O. Box 5800, MS 1413, Albuquerque, New Mexico 87185, United States
- Anne M Ruffing
  - Molecular and Microbiology, Sandia National Laboratories, P.O. Box 5800, MS 1413, Albuquerque, New Mexico 87185, United States
11
Rottinghaus AG, Vo S, Moon TS. Computational design of CRISPR guide RNAs to enable strain-specific control of microbial consortia. Proc Natl Acad Sci U S A 2023; 120:e2213154120. [PMID: 36574681 PMCID: PMC9910470 DOI: 10.1073/pnas.2213154120] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 07/31/2022] [Accepted: 11/29/2022] [Indexed: 12/28/2022]
Abstract
Microbes naturally coexist in complex, multistrain communities. However, extracting individual microbes from and specifically manipulating the composition of these consortia remain challenging. The sequence-specific nature of CRISPR guide RNAs can be leveraged to accurately differentiate microorganisms and facilitate the creation of tools that can achieve these tasks. We developed a computational program, ssCRISPR, which designs strain-specific CRISPR guide RNA sequences with user-specified target strains, protected strains, and guide RNA properties. We experimentally verify the accuracy of the strain specificity predictions in both Escherichia coli and Pseudomonas spp. and show that up to three nucleotide mismatches are often required to ensure perfect specificity. To demonstrate the functionality of ssCRISPR, we apply computationally designed CRISPR-Cas9 guide RNAs to two applications: the purification of specific microbes through one- and two-plasmid transformation workflows and the targeted removal of specific microbes using DNA-loaded liposomes. For strain purification, we utilize gRNAs designed to target and kill all microbes in a consortium except the specific microbe to be isolated. For strain elimination, we utilize gRNAs designed to target only the unwanted microbe while protecting all other strains in the community. ssCRISPR will be of use in diverse microbiota engineering applications.
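The mismatch criterion mentioned in this abstract can be illustrated with a small check: a guide is treated as safe for a protected strain only if every same-length window of that strain's sequence differs from the guide by at least a chosen number of nucleotides. This is a simplified sketch and not the ssCRISPR algorithm; it ignores PAM requirements, the reverse strand, and position-dependent mismatch tolerance, and the default threshold of 3 is taken from the abstract's "up to three nucleotide mismatches" remark as an assumption.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_mismatches(guide, genome):
    """Smallest mismatch count between the guide and any window of the genome."""
    k = len(guide)
    windows = (genome[i:i + k] for i in range(len(genome) - k + 1))
    # If the genome is shorter than the guide there are no windows to match.
    return min((hamming(guide, w) for w in windows), default=k)


def is_strain_specific(guide, protected_genomes, min_required=3):
    """True if the guide is at least `min_required` mismatches from every protected genome."""
    return all(min_mismatches(guide, g) >= min_required for g in protected_genomes)


guide = "ACGTACGT"
print(min_mismatches(guide, "TTACGTACGTTT"))        # exact match present
print(is_strain_specific(guide, ["TTTTTTTTTTTT"]))  # distant sequence only
```

A production tool would index the genomes rather than scan every window, but the linear scan keeps the criterion itself easy to read.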
Affiliation(s)
- Austin G. Rottinghaus
  - Department of Energy, Environmental and Chemical Engineering, Washington University in St. Louis, St. Louis, MO 63130
- Steven Vo
  - Division of Biology and Biomedical Sciences, Washington University in St. Louis, St. Louis, MO 63110
- Tae Seok Moon
  - Department of Energy, Environmental and Chemical Engineering, Washington University in St. Louis, St. Louis, MO 63130
  - Division of Biology and Biomedical Sciences, Washington University in St. Louis, St. Louis, MO 63110
12
Niu M, Zou Q. SgRNA-RF: Identification of SgRNA On-Target Activity With Imbalanced Datasets. IEEE/ACM Trans Comput Biol Bioinform 2022; 19:2442-2453. [PMID: 33979289 DOI: 10.1109/tcbb.2021.3079116] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 06/12/2023]
Abstract
Single-guide RNA is a guide RNA (gRNA), which guides the insertion or deletion of uridine residues into kinetoplastid during RNA editing. It is a small non-coding RNA that can pair with pre-mRNA. SgRNA is a critical component of the CRISPR/Cas9 gene knockout system and plays an important role in gene editing and gene regulation. It is important to accurately and quickly identify sgRNAs with high on-target activity. Due to its importance, several computational predictors have been proposed to predict sgRNA on-target activity. All these methods have clearly contributed to the development of this very important field. However, they also have certain limitations. In this paper, we developed a new classifier, SgRNA-RF, which extracts nucleic acid composition and structure features from sgRNA sequences and identifies on-target activity with a random forest algorithm. In addition, to address dataset imbalance, this paper proposes a new method called CS-Smote. We compared SgRNA-RF with state-of-the-art predictors on the five datasets and found that SgRNA-RF significantly improved the identification accuracy, with accuracies of 0.8636, 0.9161, 0.894, 0.938, 0.965, 0.77, 0.979, and 0.973, respectively. The user-friendly web server that implements SgRNA-RF is freely available at http://server.malab.cn/sgRNA-RF/.
13
Li B, Ai D, Liu X. CNN-XG: A Hybrid Framework for sgRNA On-Target Prediction. Biomolecules 2022; 12:409. [PMID: 35327601 PMCID: PMC8945678 DOI: 10.3390/biom12030409] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 01/26/2022] [Revised: 02/23/2022] [Accepted: 03/03/2022] [Indexed: 02/04/2023]
Abstract
As the third-generation gene editing technology, CRISPR/Cas9 has a wide range of applications. The success of CRISPR depends on the editing of the target gene via a functional complex of sgRNA and Cas9 proteins. Therefore, sgRNAs with high specificity and high on-target cleavage efficiency can make this process more accurate and efficient. Although there are already many sophisticated machine learning and deep learning models for predicting the on-target cleavage efficiency of sgRNA, prediction accuracy remains to be improved. XGBoost performs well at classification because, as an ensemble model, it can overcome the deficiencies of a single classifier, and we aim to improve the prediction of sgRNA on-target activity by introducing XGBoost into the model. We present a novel machine learning framework that combines a convolutional neural network (CNN) and XGBoost to predict sgRNA on-target knockout efficacy. Our framework, called CNN-XG, is mainly composed of two parts: a CNN feature extractor automatically extracts features from sequences, and an XGBoost predictor makes predictions from the features extracted after convolution. Experiments on commonly used datasets show that CNN-XG performed significantly better than other existing frameworks in the classification mode.
Affiliation(s)
- Bohao Li
  - School of Mathematics and Physics, University of Science and Technology Beijing, Beijing 100083, China
- Dongmei Ai
  - School of Mathematics and Physics, University of Science and Technology Beijing, Beijing 100083, China
  - Basic Experimental Center of Natural Science, University of Science and Technology Beijing, Beijing 100083, China
- Xiuqin Liu
  - School of Mathematics and Physics, University of Science and Technology Beijing, Beijing 100083, China
|
14
|
A systematic mapping study on machine learning techniques for the prediction of CRISPR/Cas9 sgRNA target cleavage. Comput Struct Biotechnol J 2022; 20:5813-5823. [PMID: 36382194 PMCID: PMC9630617 DOI: 10.1016/j.csbj.2022.10.013] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/27/2022] [Revised: 09/21/2022] [Accepted: 10/08/2022] [Indexed: 11/30/2022] Open
Abstract
CRISPR/Cas9 technology has greatly accelerated genome engineering research. The CRISPR/Cas9 complex, a bacterial immune response system, is widely adopted for RNA-driven targeted genome editing. The systematic mapping study presented in this paper examines the literature on machine learning (ML) techniques employed in the prediction of CRISPR/Cas9 sgRNA on/off-target cleavage, focusing on improving support in sgRNA design activities and identifying areas currently being researched. This area of research has greatly expanded recently, and we found it appropriate to work on a Systematic Mapping Study (SMS), an investigation that has proven to be an effective secondary study method. Unlike a classic review, in an SMS, no comparison of methods or results is made, while this task can instead be the subject of a systematic literature review that chooses one theme among those highlighted in this SMS. The study is illustrated in this paper. To the best of the authors' knowledge, no other SMS studies have been published on this topic. Fifty-seven papers published in the period 2017–2022 (April, 30) were analyzed. This study reveals that the most widely used ML model is the convolutional neural network (CNN), followed by the feedforward neural network (FNN), while the use of other models is marginal. Other interesting information has emerged, such as the wide availability of both open code and platforms dedicated to supporting the activity of researchers or the fact that there is a clear prevalence of public funds that finance research on this topic.
|
15
|
Racharaks R, Arnold W, Peccia J. Development of CRISPR-Cas9 knock-in tools for free fatty acid production using the fast-growing cyanobacterial strain Synechococcus elongatus UTEX 2973. J Microbiol Methods 2021; 189:106315. [PMID: 34454980 DOI: 10.1016/j.mimet.2021.106315] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 07/06/2021] [Revised: 08/23/2021] [Accepted: 08/23/2021] [Indexed: 12/26/2022]
Abstract
Synechococcus elongatus UTEX 2973 has one of the fastest measured doubling times among cyanobacteria, making it an important candidate for metabolic engineering. Traditional genetic engineering methods, which rely on homologous recombination, are however inefficient, labor-intensive, and time-consuming due to the oligoploid or polyploid nature of cyanobacteria and the reliance on unique antibiotic resistance markers. CRISPR-Cas9 has emerged as an effective and versatile editing platform in a wide variety of organisms, but its application for cyanobacterial engineering is limited by the inherent toxicity of Cas9, which results in poor transformation efficiencies. Here, we demonstrated that a single-plasmid CRISPR-Cas9 system, pCRISPomyces-2, can effectively knock in a truncated thioesterase gene from Escherichia coli to generate free fatty acid (FFA)-producing mutants of Syn2973. To do so, three parameters were evaluated for their effect on generating recipient colonies after conjugation with pCRISPomyces-2-based plasmids: 1) a modified conjugation protocol termed streaked conjugation, 2) the deletion of the gene encoding RecJ exonuclease, and 3) the single guide RNA (sgRNA) sequence. With the use of the streaked conjugation protocol and a ΔrecJ mutant strain of Syn2973, the conjugation efficiency for the pCRISPomyces-2 plasmid could be improved by 750-fold over the wildtype (WT), to a conjugation efficiency of 2.0 × 10⁻⁶ transconjugants/recipient cell. While deletion of the RecJ exonuclease alone increased the conjugation efficiency by 150-fold over the WT, FFA generation was impaired in FFA-producing mutants with the ΔrecJ background, and the large number of poor FFA-producing isolates indicated a potential increase in spontaneous mutation rates. The sgRNA sequence was found to be critical in achieving the desired CRISPR-Cas9-mediated knock-in mutation, as the sgRNA impacts conjugation efficiency, likelihood of homogenous recombinants, and free fatty acid production in engineered strains.
Affiliation(s)
- Ratanachat Racharaks
  - Department of Chemical and Environmental Engineering, Yale University, New Haven, CT, USA
- Wyatt Arnold
  - Department of Chemical and Environmental Engineering, Yale University, New Haven, CT, USA
- Jordan Peccia
  - Department of Chemical and Environmental Engineering, Yale University, New Haven, CT, USA
|
16
|
Vinodkumar PK, Ozcinar C, Anbarjafari G. Prediction of sgRNA Off-Target Activity in CRISPR/Cas9 Gene Editing Using Graph Convolution Network. Entropy (Basel) 2021; 23:608. [PMID: 34069050 PMCID: PMC8156774 DOI: 10.3390/e23050608] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 02/28/2021] [Revised: 05/03/2021] [Accepted: 05/12/2021] [Indexed: 12/26/2022]
Abstract
CRISPR/Cas9 is a powerful genome-editing technology that has been widely applied in targeted gene repair and gene expression regulation. One of the main challenges for the CRISPR/Cas9 system is the occurrence of unexpected cleavage at some sites (off-targets) and predicting them is necessary due to its relevance in gene editing research. Very few deep learning models have been developed so far to predict the off-target propensity of single guide RNA (sgRNA) at specific DNA fragments by using artificial feature extract operations and machine learning techniques; however, this is a convoluted process that is difficult to understand and implement for researchers. In this research work, we introduce a novel graph-based approach to predict off-target efficacy of sgRNA in the CRISPR/Cas9 system that is easy to understand and replicate for researchers. This is achieved by creating a graph with sequences as nodes and by using a link prediction method to predict the presence of links between sgRNA and off-target inducing target DNA sequences. Features for the sequences are extracted from within the sequences. We used HEK293 and K562 t datasets in our experiments. GCN predicted the off-target gene knockouts (using link prediction) by predicting the links between sgRNA and off-target sequences with an auROC value of 0.987.
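The auROC figure quoted above summarizes how well predicted link scores separate true sgRNA/off-target pairs from non-pairs. A minimal sketch of that metric follows, computed as the probability that a randomly chosen positive pair scores higher than a randomly chosen negative pair. The function name `auroc` and the labels and scores in the example are made up for illustration and are not data from the paper.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a random positive outranks a random
    negative, with ties counted as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical link-prediction output: 1 = true link, 0 = no link
labels = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.5, 0.2, 0.1]
value = auroc(labels, scores)  # 5 of 6 positive/negative pairs ranked correctly
```

The pairwise form is quadratic in the number of examples, which is fine for a small check; sorting-based implementations are preferable for large datasets.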
Affiliation(s)
- Cagri Ozcinar
  - iCV Lab, Institute of Technology, University of Tartu, 51009 Tartu, Estonia
- Gholamreza Anbarjafari
  - iCV Lab, Institute of Technology, University of Tartu, 51009 Tartu, Estonia
  - PwC Advisory Finland, 00180 Helsinki, Finland
|
17
|
Zhang G, Zeng T, Dai Z, Dai X. Prediction of CRISPR/Cas9 single guide RNA cleavage efficiency and specificity by attention-based convolutional neural networks. Comput Struct Biotechnol J 2021; 19:1445-1457. [PMID: 33841753 PMCID: PMC8010402 DOI: 10.1016/j.csbj.2021.03.001] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 12/09/2020] [Revised: 02/26/2021] [Accepted: 03/01/2021] [Indexed: 12/26/2022] Open
Abstract
CRISPR/Cas9 is a preferred genome editing tool and has been widely adopted across a range of disciplines, from molecular biology to gene therapy. A key prerequisite for the success of CRISPR/Cas9 is its capacity to distinguish between single guide RNA (sgRNA) on-target sites and homologous off-target sites. Thus, optimizing sgRNA design by maximizing on-target activity and minimizing potential off-target mutations is a crucial concern for this system. Several deep learning models have been developed for comprehensive understanding of sgRNA cleavage efficacy and specificity. Although these methods achieve their performance by automatically learning a suitable representation from the input data, there is still room to improve accuracy and interpretability. Here, we propose novel interpretable attention-based convolutional neural networks, namely CRISPR-ONT and CRISPR-OFFT, for the prediction of CRISPR/Cas9 sgRNA on- and off-target activities, respectively. Experimental tests on public datasets demonstrate that our models yield satisfactory results in terms of accuracy and interpretability. Our findings contribute to the understanding of how RNA-guided Cas9 nucleases scan the mammalian genome. Data and source codes are available at https://github.com/Peppags/CRISPRont-CRISPRofft.
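Attention contributes to interpretability because it assigns each sequence position an explicit weight that can be inspected after prediction. The sketch below shows a generic dot-product attention pooling step in plain Python. It is not the CRISPR-ONT or CRISPR-OFFT architecture: the function names, the query vector, and the toy per-position features are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(features, query):
    """Weight per-position feature vectors by their similarity to `query`.

    Returns the pooled vector and the per-position attention weights;
    the weights are what one would inspect for interpretation.
    """
    scores = [sum(f * q for f, q in zip(vec, query)) for vec in features]
    weights = softmax(scores)
    pooled = [
        sum(w * vec[d] for w, vec in zip(weights, features))
        for d in range(len(query))
    ]
    return pooled, weights


# Toy example: three positions, two feature channels each
position_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pooled, weights = attention_pool(position_features, query=[2.0, 0.0])
```

Here positions 1 and 3 receive equal, larger weights than position 2 because only they align with the query's first channel.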
Affiliation(s)
- Guishan Zhang
  - Key Laboratory of Digital Signal and Image Processing of Guangdong Provincial, College of Engineering, Shantou University, Shantou 515063, China
  - School of Electronics and Information Technology, Sun Yat-sen University, Guangzhou 510006, China
- Tian Zeng
  - School of Electronics and Information Technology, Sun Yat-sen University, Guangzhou 510006, China
- Zhiming Dai
  - School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510006, China
  - Guangdong Province Key Laboratory of Big Data Analysis and Processing, Sun Yat-sen University, Guangzhou 510006, China
- Xianhua Dai
  - School of Electronics and Information Technology, Sun Yat-sen University, Guangzhou 510006, China
  - Southern Marine Science and Engineering Guangdong Laboratory, Zhuhai 519000, China
|
18
|
Sun W, Wang H. Recent advances of genome editing and related technologies in China. Gene Ther 2020; 27:312-320. [DOI: 10.1038/s41434-020-0181-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 02/21/2020] [Revised: 04/24/2020] [Accepted: 07/22/2020] [Indexed: 12/26/2022]
|