1
Danaeifar M, Najafi A. Artificial Intelligence and Computational Biology in Gene Therapy: A Review. Biochem Genet 2024:10.1007/s10528-024-10799-1. [PMID: 38635012 DOI: 10.1007/s10528-024-10799-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2023] [Accepted: 04/02/2024] [Indexed: 04/19/2024]
Abstract
Artificial intelligence (AI) is a trending field in almost every area of science and technology. Computational biology and AI can support gene therapy at many steps, including gene identification, gene editing, vector design, development of new macromolecules, and modeling of gene delivery. Tools used by computational biology and AI in this field include genomic, transcriptomic and proteomic data analysis, machine learning algorithms, and molecular interaction studies. These tools can identify new gene targets, propose novel vectors, optimize experimental conditions, predict outcomes, and suggest the best strategies to avoid undesired immune responses following gene therapy treatment.
Affiliation(s)
- Mohsen Danaeifar
  - Molecular Biology Research Center, Systems Biology and Poisonings Institute, Baqiyatallah University of Medical Science, P.O. Box 19395-5487, Tehran, Iran
- Ali Najafi
  - Molecular Biology Research Center, Systems Biology and Poisonings Institute, Baqiyatallah University of Medical Science, P.O. Box 19395-5487, Tehran, Iran

2
Sandip Vora D, Manoj Bhandari S, Sundar D. DNA shape features improve prediction of CRISPR/Cas9 activity. Methods 2024:S1046-2023(24)00102-6. [PMID: 38641083 DOI: 10.1016/j.ymeth.2024.04.012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2023] [Revised: 03/27/2024] [Accepted: 04/10/2024] [Indexed: 04/21/2024]
Abstract
The CRISPR/Cas9 genome editing technology has transformed basic and translational research in biology and medicine. However, progress is hindered by off-target effects and by limited knowledge of the mechanism of the Cas9 protein. Machine learning models have been proposed to predict Cas9 activity at unintended sites, yet feature engineering plays a major role in the outcome of these predictors. This study evaluates how the performance of such predictors improves when epigenetic and DNA shape feature groups are added to the conventionally used sequence-based Cas9 target and off-target datasets. The approach used neural networks trained over a diverse range of parameters, allowing us to systematically assess the performance gain on three carefully designed datasets: (i) sequence only, (ii) sequence and epigenetic features, and (iii) sequence, epigenetic and DNA shape features. The addition of DNA shape information significantly improved predictive performance, as evaluated by the Akaike and Bayesian information criteria. Evaluation of individual feature importance with permutation and LIME-based methods also indicates that model outcomes are influenced not only by sequence features such as mismatches and nucleotide composition, but also by base-pairing parameters such as opening and stretch, which indicate distortion of the DNA-RNA hybrid in the presence of mismatches.
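The abstract ranks the three feature sets by the Akaike and Bayesian information criteria. As a minimal sketch of how such a comparison works (not the authors' code), the standard definitions AIC = 2k − 2 ln L̂ and BIC = k ln n − 2 ln L̂ are applied below to three models; the log-likelihoods, parameter counts and sample size are hypothetical placeholders, not results from the study.

```python
import math


def aic(log_likelihood: float, k: int) -> float:
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * k - 2 * log_likelihood


def bic(log_likelihood: float, k: int, n: int) -> float:
    """Bayesian information criterion: k ln(n) - 2 ln(L)."""
    return k * math.log(n) - 2 * log_likelihood


# Hypothetical fitted models: (name, maximized log-likelihood, number of parameters).
# These numbers are illustrative placeholders, not values from the study.
models = [
    ("sequence", -1200.0, 50),
    ("sequence+epigenetic", -1150.0, 60),
    ("sequence+epigenetic+shape", -1080.0, 75),
]
n_samples = 5000  # hypothetical number of training examples

for name, ll, k in models:
    print(f"{name:28s} AIC={aic(ll, k):8.1f} BIC={bic(ll, k, n_samples):8.1f}")

# Lower AIC/BIC means a better trade-off between goodness of fit and complexity.
best = min(models, key=lambda m: bic(m[1], m[2], n_samples))
print("lowest BIC:", best[0])
```

Lower values are preferred; BIC penalizes each extra parameter by ln n rather than 2, so it favors a richer feature set only when the likelihood gain outweighs that larger penalty.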
Affiliation(s)
- Dhvani Sandip Vora
  - Department of Biochemical Engineering and Biotechnology, Indian Institute of Technology Delhi, New Delhi 110016, India
- Sakshi Manoj Bhandari
  - Department of Mathematics, Indian Institute of Technology Delhi, Hauz Khas, New Delhi 110016, India
- Durai Sundar
  - Department of Biochemical Engineering and Biotechnology, Indian Institute of Technology Delhi, New Delhi 110016, India
  - School of Artificial Intelligence, Indian Institute of Technology Delhi, Hauz Khas, New Delhi 110016, India

3
Wessels HH, Stirn A, Méndez-Mancilla A, Kim EJ, Hart SK, Knowles DA, Sanjana NE. Prediction of on-target and off-target activity of CRISPR-Cas13d guide RNAs using deep learning. Nat Biotechnol 2024; 42:628-637. [PMID: 37400521 DOI: 10.1038/s41587-023-01830-8] [Citation(s) in RCA: 15] [Impact Index Per Article: 15.0] [Received: 12/08/2021] [Accepted: 05/16/2023] [Indexed: 07/05/2023]
Abstract
Transcriptome engineering applications in living cells with RNA-targeting CRISPR effectors depend on accurate prediction of on-target activity and off-target avoidance. Here we design and test ~200,000 RfxCas13d guide RNAs targeting essential genes in human cells with systematically designed mismatches and insertions and deletions (indels). We find that mismatches and indels have a position- and context-dependent impact on Cas13d activity, and mismatches that result in G-U wobble pairings are better tolerated than other single-base mismatches. Using this large-scale dataset, we train a convolutional neural network that we term targeted inhibition of gene expression via gRNA design (TIGER) to predict efficacy from guide sequence and context. TIGER outperforms the existing models at predicting on-target and off-target activity on our dataset and published datasets. We show that TIGER scoring combined with specific mismatches yields the first general framework to modulate transcript expression, enabling the use of RNA-targeting CRISPRs to precisely control gene dosage.
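The finding that G-U wobble mismatches are better tolerated than other single-base mismatches presupposes a way to label each guide-target position. A minimal sketch, assuming the guide and target RNAs are already aligned with the target written 3'→5' so that position i pairs with guide position i; the sequences are hypothetical and this is not part of TIGER.

```python
# Canonical Watson-Crick RNA pairs and G-U wobble pairs (guide base, target base).
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def classify_pairs(guide: str, target_antiparallel: str) -> list[str]:
    """Label each aligned position as 'match', 'wobble' or 'mismatch'.

    Assumes `target_antiparallel` is the target RNA written 3'->5', so that
    index i pairs with guide index i (a simplifying convention for this sketch).
    """
    if len(guide) != len(target_antiparallel):
        raise ValueError("sequences must be aligned to equal length")
    labels = []
    for g, t in zip(guide.upper(), target_antiparallel.upper()):
        if (g, t) in WATSON_CRICK:
            labels.append("match")
        elif (g, t) in WOBBLE:
            labels.append("wobble")
        else:
            labels.append("mismatch")
    return labels


guide = "GAUCGA"   # hypothetical guide fragment, 5'->3'
target = "CUAGUC"  # hypothetical target fragment, written 3'->5'
print(classify_pairs(guide, target))
```

Here position 5 is a G-U wobble and position 6 an A-C mismatch; a position- and context-aware model would weigh these differently.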
Affiliation(s)
- Hans-Hermann Wessels
  - New York Genome Center, New York City, NY, USA
  - Department of Biology, New York University, New York City, NY, USA
- Andrew Stirn
  - New York Genome Center, New York City, NY, USA
  - Department of Computer Science, Columbia University, New York City, NY, USA
- Alejandro Méndez-Mancilla
  - New York Genome Center, New York City, NY, USA
  - Department of Biology, New York University, New York City, NY, USA
- Eric J Kim
  - Department of Computer Science, Columbia University, New York City, NY, USA
- Sydney K Hart
  - New York Genome Center, New York City, NY, USA
  - Department of Biology, New York University, New York City, NY, USA
- David A Knowles
  - New York Genome Center, New York City, NY, USA
  - Department of Computer Science, Columbia University, New York City, NY, USA
  - Data Science Institute, Columbia University, New York City, NY, USA
  - Department of Systems Biology, Columbia University, New York City, NY, USA
- Neville E Sanjana
  - New York Genome Center, New York City, NY, USA
  - Department of Biology, New York University, New York City, NY, USA

4
Zhuang J, Gao W, Su R. EnAMP: A novel deep learning ensemble antibacterial peptide recognition algorithm based on multi-features. J Bioinform Comput Biol 2024; 22:2450001. [PMID: 38406833 DOI: 10.1142/s021972002450001x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/27/2024]
Abstract
Antimicrobial peptides (AMPs), as preferred alternatives to antibiotics, have wide applications and good prospects. Identifying AMPs through wet-lab experiments remains expensive, time-consuming and challenging. Many machine learning methods have been proposed to predict AMPs, with good results. In this work, we combine two kinds of word embedding features with statistical features of peptide sequences to develop an ensemble classifier named EnAMP. Two deep neural networks are trained on Word2vec and GloVe word embedding features of peptide sequences, respectively, while the statistical features of the peptide sequences are used to train random forest and support vector machine classifiers. The final prediction is the average of the four classifiers' outputs. Compared with other state-of-the-art algorithms on six datasets, EnAMP outperforms most existing models at similar computational cost, and its performance is comparable even to that of computationally expensive algorithms based on Bidirectional Encoder Representations from Transformers (BERT). EnAMP source code and data are available at https://github.com/ruisue/EnAMP.
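EnAMP's final step, averaging the outputs of four classifiers, is a form of soft voting. A minimal sketch, assuming each base classifier already outputs a probability that a peptide is an AMP; the score lists and variable names are hypothetical, and the 0.5 decision threshold is an assumption rather than a value stated in the abstract.

```python
from statistics import mean


def ensemble_predict(probabilities: list[list[float]], threshold: float = 0.5):
    """Soft voting: average each peptide's scores across classifiers, then threshold."""
    n_peptides = len(probabilities[0])
    if any(len(p) != n_peptides for p in probabilities):
        raise ValueError("every classifier must score the same peptides")
    # zip(*...) groups the four classifiers' scores for each peptide.
    averaged = [mean(scores) for scores in zip(*probabilities)]
    labels = [int(score >= threshold) for score in averaged]
    return averaged, labels


# Hypothetical AMP probabilities for three peptides from the four base models.
dnn_word2vec = [0.90, 0.20, 0.60]
dnn_glove = [0.80, 0.30, 0.40]
random_forest = [0.70, 0.10, 0.50]
svm = [0.95, 0.40, 0.45]

averaged, labels = ensemble_predict([dnn_word2vec, dnn_glove, random_forest, svm])
print(averaged, labels)
```

Averaging lets a confident minority of models be outvoted: the third peptide scores 0.6 in one network but averages below the threshold.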
Affiliation(s)
- Jujuan Zhuang
  - School of Science, Dalian Maritime University, Dalian, Liaoning, P. R. China
- Wanquan Gao
  - School of Science, Dalian Maritime University, Dalian, Liaoning, P. R. China
- Rui Su
  - School of Science, Dalian Maritime University, Dalian, Liaoning, P. R. China

5
Luo Y, Chen Y, Xie H, Zhu W, Zhang G. Interpretable CRISPR/Cas9 off-target activities with mismatches and indels prediction using BERT. Comput Biol Med 2024; 169:107932. [PMID: 38199209 DOI: 10.1016/j.compbiomed.2024.107932] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2023] [Revised: 12/25/2023] [Accepted: 01/01/2024] [Indexed: 01/12/2024]
Abstract
Off-target effects of CRISPR/Cas9 can lead to suboptimal genome editing outcomes. Numerous deep learning-based approaches have achieved excellent performance in off-target prediction; however, few can predict off-target activities involving both mismatches and indels between a single guide RNA (sgRNA) and its target DNA sequence. In addition, data imbalance is a common pitfall in off-target prediction, and the complexity of genomic contexts makes building an interpretable model challenging. To address these issues, we first developed a BERT-based model called CRISPR-BERT to improve the prediction of off-target activities with both mismatches and indels. Second, we proposed an adaptive batch-wise class balancing strategy to counter the noise in imbalanced off-target data. Finally, we applied a visualization approach to investigate generalizable, nucleotide position-dependent patterns of sgRNA-DNA pairs for off-target activity. In a comprehensive comparison with existing methods on five mismatch-only datasets and two mismatch-and-indel datasets, CRISPR-BERT achieved the best performance in terms of AUROC and PRAUC. The visualization analysis also demonstrated how the implicit knowledge learned by CRISPR-BERT facilitates off-target prediction, showing potential for model interpretability. Collectively, CRISPR-BERT provides an accurate and interpretable framework for off-target prediction and contributes to practical sgRNA optimization for improved target specificity in CRISPR/Cas9 genome editing. The source code is available at https://github.com/BrokenStringx/CRISPR-BERT.
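The abstract does not detail CRISPR-BERT's adaptive batch-wise class balancing strategy. As a generic illustration of the idea only (not the authors' method), the sketch below recomputes class weights inside each mini-batch so that the rare positive class and the abundant negative class contribute equally to a binary cross-entropy loss; the weighting rule n / (2 · n_class) and the function name are assumptions.

```python
import math


def batch_balanced_bce(probs: list[float], labels: list[int], eps: float = 1e-7) -> float:
    """Binary cross-entropy with class weights recomputed for this batch only.

    Each class present in the batch gets weight n / (2 * n_class), so positives
    and negatives carry equal total weight regardless of the imbalance ratio.
    """
    n = len(labels)
    n_pos = sum(labels)
    n_neg = n - n_pos
    # A class absent from the batch contributes no terms; its weight is unused.
    w_pos = n / (2 * n_pos) if n_pos else 1.0
    w_neg = n / (2 * n_neg) if n_neg else 1.0
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        if y == 1:
            total += -w_pos * math.log(p)
        else:
            total += -w_neg * math.log(1 - p)
    return total / n


# Hypothetical batch: one true off-target among eight candidate sites.
labels = [1, 0, 0, 0, 0, 0, 0, 0]
probs = [0.5] * 8
print(batch_balanced_bce(probs, labels))
```

With one positive among eight sites, the positive is weighted 4.0 and each negative 8/14 ≈ 0.57, so both classes carry the same total weight.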
Affiliation(s)
- Ye Luo
  - College of Engineering, Shantou University, Shantou, 515063, China
- Yaowen Chen
  - College of Engineering, Shantou University, Shantou, 515063, China
- HuanZeng Xie
  - College of Engineering, Shantou University, Shantou, 515063, China
- Wentao Zhu
  - College of Engineering, Shantou University, Shantou, 515063, China
- Guishan Zhang
  - College of Engineering, Shantou University, Shantou, 515063, China

6
Toufikuzzaman M, Hassan Samee MA, Sohel Rahman M. CRISPR-DIPOFF: an interpretable deep learning approach for CRISPR Cas-9 off-target prediction. Brief Bioinform 2024; 25:bbad530. [PMID: 38388680 PMCID: PMC10883906 DOI: 10.1093/bib/bbad530] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/04/2023] [Revised: 12/14/2023] [Accepted: 12/19/2023] [Indexed: 02/24/2024]
Abstract
CRISPR-Cas9 is a groundbreaking genome-editing tool that harnesses a bacterial defense system to alter DNA sequences accurately. This technology holds vast promise in domains such as biotechnology, agriculture and medicine. However, such power comes with its own perils, one of which is the potential for unintended modifications (off-target effects), highlighting the need for accurate prediction and mitigation strategies. Although previous studies have improved off-target prediction with deep learning, they often struggle with the precision-recall trade-off, which limits their effectiveness, and they do not adequately interpret the complex decision-making process of their models. To address these limitations, we thoroughly explored deep learning networks, particularly recurrent neural network-based models, leveraging their established success in handling sequence data. We also employed a genetic algorithm for hyperparameter tuning to optimize these models' performance. Our experiments demonstrate significant performance improvement over the current state of the art in off-target prediction, highlighting the efficacy of our approach. Furthermore, using the integrated gradients method, we interpret our models, yielding a detailed analysis of the factors that contribute to off-target predictions, in particular the presence of two sub-regions in the seed region of the single guide RNA, which extends the established biological hypothesis of off-target effects. To the best of our knowledge, our model is the first to combine high efficacy, interpretability and a desirable balance between precision and recall.
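Integrated gradients attributes a prediction to input features by averaging the model's gradient along a straight path from a baseline input to the actual input. A minimal sketch, assuming a toy logistic model with an analytic gradient in place of the authors' recurrent networks; the weights, features and baseline are hypothetical. The check at the end uses the method's completeness property: attributions sum to f(x) − f(baseline).

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def model(x: list[float], w: list[float], b: float) -> float:
    """Toy differentiable model: logistic regression f(x) = sigmoid(w.x + b)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def gradient(x: list[float], w: list[float], b: float) -> list[float]:
    """Analytic gradient of the toy model: f(x) * (1 - f(x)) * w."""
    f = model(x, w, b)
    return [f * (1 - f) * wi for wi in w]


def integrated_gradients(x, baseline, w, b, steps: int = 200) -> list[float]:
    """Midpoint Riemann-sum approximation of integrated gradients."""
    avg_grad = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th interval on [0, 1]
        point = [bi + alpha * (xi - bi) for xi, bi in zip(x, baseline)]
        for i, g in enumerate(gradient(point, w, b)):
            avg_grad[i] += g / steps
    return [(xi - bi) * g for xi, bi, g in zip(x, baseline, avg_grad)]


w, b = [2.0, -1.0, 0.5, 1.5], -1.0   # hypothetical trained parameters
x = [1.0, 0.0, 1.0, 1.0]             # hypothetical binary features
baseline = [0.0, 0.0, 0.0, 0.0]      # all-zero reference input
attr = integrated_gradients(x, baseline, w, b)
print([round(a, 3) for a in attr])
print(sum(attr), model(x, w, b) - model(baseline, w, b))
```

A feature equal to its baseline value (the second one here) receives zero attribution by construction.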
Affiliation(s)
- Md Toufikuzzaman
  - Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, 1205, Bangladesh
- Md Abul Hassan Samee
  - Department of Integrative Physiology, Baylor College of Medicine, Houston, TX 77030, USA
- M Sohel Rahman
  - Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, 1205, Bangladesh

7
Dixit S, Kumar A, Srinivasan K, Vincent PMDR, Ramu Krishnan N. Advancing genome editing with artificial intelligence: opportunities, challenges, and future directions. Front Bioeng Biotechnol 2024; 11:1335901. [PMID: 38260726 PMCID: PMC10800897 DOI: 10.3389/fbioe.2023.1335901] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2023] [Accepted: 12/19/2023] [Indexed: 01/24/2024]
Abstract
Clustered regularly interspaced short palindromic repeat (CRISPR)-based genome editing (GED) technologies have unlocked exciting possibilities for understanding genes and improving medical treatments. Artificial intelligence (AI) helps genome editing achieve greater precision, efficiency and affordability in tackling diseases such as sickle cell anemia and thalassemia. AI models have been used to design guide RNAs (gRNAs) for CRISPR-Cas systems. Tools such as DeepCRISPR, CRISTA and DeepHF can predict optimal gRNAs for a specified target sequence. These predictions take into account multiple factors, including genomic context, Cas protein type, desired mutation type, on-target/off-target scores, potential off-target sites, and the potential impact of genome editing on gene function and cell phenotype. These models aid in optimizing advanced genome editing technologies such as base, prime and epigenome editing, which introduce precise and programmable changes to DNA sequences without relying on the homology-directed repair pathway or donor DNA templates. Furthermore, AI, together with genome editing and precision medicine, enables personalized treatments based on genetic profiles: AI analyzes patients' genomic data to identify mutations, variations and biomarkers associated with diseases such as cancer, diabetes and Alzheimer's disease. However, several challenges persist, including high costs, off-target editing, suitable delivery methods for CRISPR cargoes, editing efficiency, and safety in clinical applications. This review explores AI's contribution to improving CRISPR-based genome editing technologies, addresses existing challenges, and discusses potential areas for future research in AI-driven CRISPR-based genome editing. The integration of AI and genome editing opens up new possibilities for genetics, biomedicine and healthcare, with significant implications for human health.
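Tools such as DeepCRISPR, CRISTA and DeepHF score candidate gRNAs for a target sequence, but the candidates must first be enumerated. A minimal sketch, assuming SpCas9 with its canonical 5'-NGG-3' PAM and a 20-nt protospacer immediately 5' of the PAM on either strand; the function name and test sequences are hypothetical, and this reproduces none of the named tools (the scoring step is omitted).

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_spcas9_guides(seq: str, guide_len: int = 20) -> list[tuple[str, int, str, str]]:
    """Return (strand, start, protospacer, PAM) for every NGG PAM on both strands.

    `start` is 0-based in the 5'->3' coordinates of the strand being scanned;
    for '-' hits that is the reverse complement of the input sequence.
    """
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        # i is the PAM start; it needs guide_len bases before it and 3 bases from it.
        for i in range(guide_len, len(s) - 2):
            pam = s[i:i + 3]
            if pam[1:] == "GG":
                hits.append((strand, i - guide_len, s[i - guide_len:i], pam))
    return hits


protospacer = "ACGTACGTACGTACGTACGT"  # hypothetical 20-nt target
print(find_spcas9_guides(protospacer + "TGG"))   # PAM on the given strand
print(find_spcas9_guides("CCA" + protospacer))   # PAM on the opposite strand
```

A "CCN" motif on the given strand corresponds to an NGG PAM on the opposite strand, which is why both strands are scanned.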
Affiliation(s)
- Shriniket Dixit
  - School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, India
- Anant Kumar
  - School of Bioscience and Technology, Vellore Institute of Technology, Vellore, India
- Kathiravan Srinivasan
  - School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, India
- P. M. Durai Raj Vincent
  - School of Computer Science Engineering and Information Systems, Vellore Institute of Technology, Vellore, India
- Nadesh Ramu Krishnan
  - School of Computer Science Engineering and Information Systems, Vellore Institute of Technology, Vellore, India

8
Yang Y, Li J, Zou Q, Ruan Y, Feng H. Prediction of CRISPR-Cas9 off-target activities with mismatches and indels based on hybrid neural network. Comput Struct Biotechnol J 2023; 21:5039-5048. [PMID: 37867973 PMCID: PMC10589368 DOI: 10.1016/j.csbj.2023.10.018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/06/2023] [Revised: 10/10/2023] [Accepted: 10/11/2023] [Indexed: 10/24/2023]
Abstract
The CRISPR/Cas9 system has significantly advanced the field of gene editing, yet its clinical application is constrained by the considerable challenge of off-target effects. Although numerous deep learning models for off-target prediction have been proposed, most struggle to extract the nuanced features of guide RNA (gRNA) and DNA sequence pairs and to mitigate information loss as data pass through the model. To address these limitations, we introduce a novel Hybrid Neural Network (HNN) model that uses a parallelized network structure to extract pertinent features from different positions and base types in the sequence while minimizing information loss. Notably, this study is the first to apply word embedding techniques to extract information from sequence pairs that contain insertions and deletions (indels). Comprehensive evaluation across diverse datasets indicates that our model outperforms existing state-of-the-art off-target prediction methods. The datasets and source code supporting this study are available at https://github.com/Yang-k955/CRISPR-HW.
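Applying word embeddings to sequence pairs with indels requires turning each aligned position into a discrete token. One possible tokenization, sketched here as an assumption rather than CRISPR-HW's actual scheme: write both aligned sequences in the DNA alphabet with '-' for gaps, treat each (sgRNA base, DNA base) pair as one of 25 tokens, and pass the integer ids to an embedding layer. The alignment itself is assumed to be given.

```python
from itertools import product

ALPHABET = "ACGT-"  # '-' marks a gap (an insertion or deletion) in the alignment

# Each aligned (sgRNA base, DNA base) pair is one token: 5 x 5 = 25 possible tokens.
PAIR_VOCAB = {a + b: idx for idx, (a, b) in enumerate(product(ALPHABET, repeat=2))}


def encode_pair(sgrna_aln: str, dna_aln: str) -> list[int]:
    """Map an aligned sgRNA/DNA pair to integer token ids for an embedding layer."""
    if len(sgrna_aln) != len(dna_aln):
        raise ValueError("aligned sequences must have equal length")
    return [PAIR_VOCAB[a + b] for a, b in zip(sgrna_aln.upper(), dna_aln.upper())]


# Hypothetical alignment: the DNA carries one extra base opposite a gap in the sgRNA.
print(encode_pair("ACG-T", "ACGAT"))
```

Here `encode_pair("ACG-T", "ACGAT")` yields `[0, 6, 12, 20, 18]`, where token 20 (`"-A"`) marks the extra DNA base; matches, mismatches and both kinds of indel each get their own learnable embedding vector.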
Affiliation(s)
- Yanpeng Yang
  - School of Mathematics and Computer Science, Zhejiang A&F University, Hangzhou 311300, China
- Jian Li
  - School of Mathematics and Computer Science, Zhejiang A&F University, Hangzhou 311300, China
- Quan Zou
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, China
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China
- Yaoping Ruan
  - School of Mathematics and Computer Science, Zhejiang A&F University, Hangzhou 311300, China
- Hailin Feng
  - School of Mathematics and Computer Science, Zhejiang A&F University, Hangzhou 311300, China

9
Liu Y, Fan R, Yi J, Cui Q, Cui C. A fusion framework of deep learning and machine learning for predicting sgRNA cleavage efficiency. Comput Biol Med 2023; 165:107476. [PMID: 37696181 DOI: 10.1016/j.compbiomed.2023.107476] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2023] [Revised: 08/22/2023] [Accepted: 09/04/2023] [Indexed: 09/13/2023]
Abstract
The CRISPR/Cas9 system is a powerful tool for genome editing. Numerous studies have shown that sgRNAs can strongly affect editing efficiency. However, it is still unclear which rules should be followed to design sgRNAs with high cleavage efficiency. Several machine learning and deep learning methods have been developed to predict sgRNA cleavage efficiency, but their prediction accuracy is still unsatisfactory. Here we propose a fusion framework of deep learning and machine learning that first processes the primary sequence and secondary structure features of sgRNAs with both a convolutional neural network (CNN) and a recurrent neural network (RNN), and then uses the features extracted by these deep neural networks to train a conventional machine learning model with LightGBM (LGBM). The new approach outperformed previous methods: the Spearman correlation coefficient between predicted and measured sgRNA cleavage efficiency for our model (0.917) is over 5% higher than that of the most advanced existing method (0.865), and the mean squared error decreases from 7.89 × 10⁻³ to 4.75 × 10⁻³. Finally, we developed an online tool, CRISep (http://www.cuilab.cn/CRISep), to evaluate the availability of sgRNAs based on our models.
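The abstract reports performance as Spearman's correlation and mean squared error. A minimal sketch (not CRISep's code) of Spearman's rho computed as the Pearson correlation of average ranks, followed by the relative changes implied by the quoted figures: about 6.0% higher Spearman's rho, consistent with "over 5%", and about 39.8% lower MSE. These percentages are derived arithmetic, not values reported by the authors.

```python
def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values receive the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho as the Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Relative changes implied by the figures quoted in the abstract.
rho_new, rho_old = 0.917, 0.865
mse_new, mse_old = 4.75e-3, 7.89e-3
print(f"Spearman gain: {(rho_new - rho_old) / rho_old:.1%}")
print(f"MSE reduction: {(mse_old - mse_new) / mse_old:.1%}")
```

Because Spearman's rho depends only on ranks, it rewards a model that orders sgRNAs correctly even if its absolute efficiency values are off, which MSE would penalize.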
Affiliation(s)
- Yu Liu
  - Department of Biomedical Informatics, MOE Key Lab of Cardiovascular Sciences, School of Basic Medical Sciences, Peking University, Beijing, China
- Rui Fan
  - Department of Biomedical Informatics, MOE Key Lab of Cardiovascular Sciences, School of Basic Medical Sciences, Peking University, Beijing, China
- Jingkun Yi
  - Department of Biomedical Informatics, MOE Key Lab of Cardiovascular Sciences, School of Basic Medical Sciences, Peking University, Beijing, China
- Qinghua Cui
  - Department of Biomedical Informatics, MOE Key Lab of Cardiovascular Sciences, School of Basic Medical Sciences, Peking University, Beijing, China
- Chunmei Cui
  - Department of Biomedical Informatics, MOE Key Lab of Cardiovascular Sciences, School of Basic Medical Sciences, Peking University, Beijing, China

10
Zhang G, Luo Y, Dai X, Dai Z. Benchmarking deep learning methods for predicting CRISPR/Cas9 sgRNA on- and off-target activities. Brief Bioinform 2023; 24:bbad333. [PMID: 37775147 DOI: 10.1093/bib/bbad333] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 01/27/2023] [Revised: 08/31/2023] [Accepted: 09/04/2023] [Indexed: 10/01/2023]
Abstract
In silico design of single guide RNA (sgRNA) plays a critical role in the clustered regularly interspaced short palindromic repeats/CRISPR-associated protein 9 (CRISPR/Cas9) system. Continuous efforts aim to design sgRNAs with efficient on-target activity and reduced off-target mutations. In the last 5 years, a growing number of deep learning-based methods have achieved breakthrough performance in predicting sgRNA on- and off-target activities. Nevertheless, their predictive abilities deserve systematic evaluation. In this review, we conducted a systematic survey of progress in the prediction of on- and off-target editing. We investigated the performance of 10 mainstream deep learning-based on-target predictors using nine public datasets of different sample sizes. We found that in most scenarios these methods performed better on large- and medium-scale datasets than on small-scale datasets. In addition, we performed unbiased experiments to compare in depth eight representative off-target prediction approaches on 12 publicly available datasets with various imbalance ratios of positive to negative samples. Most methods performed excellently on balanced datasets but leave much room for improvement on moderately and severely imbalanced datasets. This study provides comprehensive perspectives on CRISPR/Cas9 sgRNA on- and off-target activity prediction and on directions for improving method development.
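The benchmark found that most off-target methods degrade on imbalanced datasets. A toy sketch of why the choice of metric matters under imbalance, using hypothetical scores rather than the benchmark's evaluation code: AUROC (computed here as the Mann-Whitney probability that a positive outscores a negative) is unchanged when more equally ranked negatives are added, while average precision, a common PR-AUC estimator, drops.

```python
def auroc(scores: list[float], labels: list[int]) -> float:
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(scores: list[float], labels: list[int]) -> float:
    """Mean of the precision at the rank of each positive (ties not specially handled)."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


pos_scores = [0.9, 0.6]  # hypothetical scores for two true off-targets
for n_neg_each in (1, 10):
    neg_scores = [0.7] * n_neg_each + [0.2] * n_neg_each
    scores = pos_scores + neg_scores
    labels = [1] * len(pos_scores) + [0] * len(neg_scores)
    print(n_neg_each * 2, "negatives:",
          f"AUROC={auroc(scores, labels):.3f}",
          f"AP={average_precision(scores, labels):.3f}")
```

With 2 negatives, AUROC is 0.750 and AP 0.833; with 20 negatives, AUROC stays 0.750 while AP falls to 0.583, because every high-scoring negative now pushes the second positive further down the ranking.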
Affiliation(s)
- Guishan Zhang
  - College of Engineering, Shantou University, Shantou 515063, China
- Ye Luo
  - College of Engineering, Shantou University, Shantou 515063, China
- Xianhua Dai
  - School of Cyber Science and Technology, Sun Yat-sen University, Shenzhen 518107, China
  - Southern Marine Science and Engineering Guangdong Laboratory, Zhuhai 519000, China
- Zhiming Dai
  - School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510006, China
  - Guangdong Province Key Laboratory of Big Data Analysis and Processing, Sun Yat-sen University, Guangzhou 510006, China

11
Abstract
In genetic engineering, the revolutionary CRISPR-Cas system has proven to be a vital tool for precise genome editing. At the same time, the emergence and rapid evolution of deep learning methodologies have given impetus to the scientific exploration of genomic data. These concurrent advances call for regular review of the state of the art, particularly given the pace of recent developments. This review focuses on the significant progress achieved during 2019-2023 in using deep learning to predict guide RNA (gRNA) activity in the CRISPR-Cas system, a key factor determining the effectiveness and specificity of genome editing procedures. It provides an analytical overview of contemporary research, with emphasis on the amalgamation of artificial intelligence and genetic engineering. The review is motivated by the need to understand rapidly evolving deep learning methodologies and their potential impact on the effectiveness of the CRISPR-Cas system. By analyzing recent literature, it highlights the achievements and emerging trends in integrating deep learning with CRISPR-Cas systems, thus contributing to the future direction of this essential interdisciplinary research area.

12
Sherkatghanad Z, Abdar M, Charlier J, Makarenkov V. Using traditional machine learning and deep learning methods for on- and off-target prediction in CRISPR/Cas9: a review. Brief Bioinform 2023; 24:7130974. [PMID: 37080758 DOI: 10.1093/bib/bbad131] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 11/02/2022] [Revised: 03/07/2023] [Accepted: 03/13/2023] [Indexed: 04/22/2023]
Abstract
CRISPR/Cas9 (Clustered Regularly Interspaced Short Palindromic Repeats and CRISPR-associated protein 9) is a popular and effective two-component technology for targeted genetic manipulation. It is currently the most versatile and accurate method of gene and genome editing, with a large variety of practical applications. In biomedicine, for example, it has been used in research on cancer, viral infections, pathogen detection and genetic diseases. Because cleavage may also occur at non-target sequence locations, current CRISPR/Cas9 research relies on data-driven models for on- and off-target prediction. Conventional machine learning and deep learning methods are now routinely applied to predict the on-target knockout efficacy and off-target profile of given single-guide RNAs (sgRNAs). In this paper, we present an overview and a comparative analysis of traditional machine learning and deep learning models used in CRISPR/Cas9. We highlight the key research challenges and directions associated with target activity prediction. We discuss recent advances in the sgRNA-DNA sequence encoding used in state-of-the-art on- and off-target prediction models. Furthermore, we present the most popular deep learning neural network architectures used in CRISPR/Cas9 prediction models. Finally, we summarize the existing challenges and discuss possible future investigations in on- and off-target prediction. Our paper provides valuable support for academic and industrial researchers interested in applying machine learning methods to CRISPR/Cas9 genome editing.
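The review discusses how sgRNA-DNA pairs are encoded as model input. One simple scheme of this kind, sketched here under stated assumptions (equal-length, indel-free sequences written in the DNA alphabet; the function names are hypothetical), one-hot encodes the sgRNA base and the DNA base at each position and combines them with a bitwise OR, so matches have a single 1 and mismatches have two.

```python
ONE_HOT = {"A": (1, 0, 0, 0), "C": (0, 1, 0, 0), "G": (0, 0, 1, 0), "T": (0, 0, 0, 1)}


def or_encode(sgrna: str, dna: str) -> list[tuple[int, ...]]:
    """Encode each aligned position as the bitwise OR of the two bases' one-hot vectors."""
    if len(sgrna) != len(dna):
        raise ValueError("sequences must have equal length (no indels in this scheme)")
    return [tuple(a | b for a, b in zip(ONE_HOT[s], ONE_HOT[d]))
            for s, d in zip(sgrna.upper(), dna.upper())]


def mismatch_positions(sgrna: str, dna: str) -> list[int]:
    """Positions whose OR-encoded vector has two set bits, i.e. mismatches."""
    return [i for i, v in enumerate(or_encode(sgrna, dna)) if sum(v) == 2]


# Hypothetical 4-nt fragment with one mismatch at index 2 (G vs C).
print(or_encode("ACGT", "ACCT"))
print(mismatch_positions("ACGT", "ACCT"))
```

The resulting L × 4 matrix can be fed directly to a convolutional layer; this scheme cannot represent indels, which is one motivation for the richer encodings the review surveys.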
Affiliation(s)
- Zeinab Sherkatghanad
  - Département d'Informatique, Université du Québec à Montréal, H2X 3Y7, Montréal, QC, Canada
- Moloud Abdar
  - Institute for Intelligent Systems Research and Innovation (IISRI), Deakin University, 3216, Geelong, VIC, Australia
- Jeremy Charlier
  - Département d'Informatique, Université du Québec à Montréal, H2X 3Y7, Montréal, QC, Canada
- Vladimir Makarenkov
  - Département d'Informatique, Université du Québec à Montréal, H2X 3Y7, Montréal, QC, Canada

13
Vora DS, Yadav S, Sundar D. Hybrid Multitask Learning Reveals Sequence Features Driving Specificity in the CRISPR/Cas9 System. Biomolecules 2023; 13:biom13040641. [PMID: 37189388 DOI: 10.3390/biom13040641] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 02/21/2023] [Revised: 03/27/2023] [Accepted: 03/28/2023] [Indexed: 04/05/2023]
Abstract
CRISPR/Cas9 technology can edit genomes precisely and is at the heart of various recent scientific and medical advances. Progress in biomedical research is nevertheless hindered by an inadvertent burden that genome editors place on the genome: off-target effects. Although experimental screens for off-targets have improved our understanding of Cas9 activity, that knowledge remains incomplete because the derived rules do not extrapolate well to new target sequences. Because the rules that drive Cas9 activity are not fully understood, recently developed off-target prediction tools have increasingly relied on machine learning and deep learning techniques to assess the full risk posed by likely off-targets. In this study, we present a count-based as well as a deep learning-based approach to derive the sequence features that determine Cas9 activity at a sequence. Off-target determination poses two major challenges: identifying a likely site of Cas9 activity and predicting the extent of Cas9 activity at that site. The hybrid multitask CNN-biLSTM model developed here, named CRISP-RCNN, simultaneously predicts off-targets and the extent of activity at them. Using integrated gradients and weighting kernels to approximate feature importance, we analyzed nucleotide and position preferences and mismatch tolerance.
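CRISP-RCNN predicts two things at once: whether a site is an off-target and how much activity occurs there. Multitask models of this kind are commonly trained on a weighted sum of per-task losses. A minimal sketch for one site, assuming binary cross-entropy for the classification head and squared error for the activity head; the weight `alpha`, the masking of the regression term to positive sites and the function name are hypothetical choices, not CRISP-RCNN's published loss.

```python
import math


def multitask_loss(p_off: float, y_off: int, pred_activity: float,
                   true_activity: float, alpha: float = 0.5, eps: float = 1e-7) -> float:
    """Weighted sum of a classification loss and a regression loss for one site.

    The regression term is applied only when y_off == 1, i.e. where an activity
    value is assumed to have been measured (an assumption of this sketch).
    """
    p = min(max(p_off, eps), 1 - eps)  # clip to avoid log(0)
    bce = -(y_off * math.log(p) + (1 - y_off) * math.log(1 - p))
    mse = (pred_activity - true_activity) ** 2 if y_off == 1 else 0.0
    return alpha * bce + (1 - alpha) * mse


# Hypothetical predictions for a non-off-target site and an off-target site.
print(multitask_loss(0.5, 0, 0.3, 0.0))
print(multitask_loss(0.5, 1, 0.8, 0.6))
```

Sharing one encoder between both heads lets the activity signal at true off-targets inform the classification features, while `alpha` controls how much each task shapes those shared features.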
Collapse
Affiliation(s)
- Dhvani Sandip Vora
- Department of Biochemical Engineering and Biotechnology, Indian Institute of Technology Delhi, Hauz Khas, New Delhi 110016, India
- Shashank Yadav
- Department of Biochemical Engineering and Biotechnology, Indian Institute of Technology Delhi, Hauz Khas, New Delhi 110016, India
- Durai Sundar
- Department of Biochemical Engineering and Biotechnology, Indian Institute of Technology Delhi, Hauz Khas, New Delhi 110016, India
- Yardi School of Artificial Intelligence, Indian Institute of Technology Delhi, Hauz Khas, New Delhi 110016, India
14
Hu X, Zhang B, Li X, Li M, Wang Y, Dan H, Zhou J, Wei Y, Ge K, Li P, Song Z. The application and progression of CRISPR/Cas9 technology in ophthalmological diseases. Eye (Lond) 2023; 37:607-617. [PMID: 35915232 PMCID: PMC9998618 DOI: 10.1038/s41433-022-02169-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 02/15/2022] [Revised: 06/07/2022] [Accepted: 06/30/2022] [Indexed: 11/08/2022]
Abstract
The clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated nuclease (Cas) system is an adaptive immune defence system that has gradually evolved in bacteria and archaea to combat invading viruses and exogenous DNA. Advances in technology have enabled researchers to deepen their understanding of this immune process in vivo and of its potential for use in genome editing. To date, applications of CRISPR/Cas9 genome editing technology in ophthalmology have included gene therapy for corneal dystrophy, glaucoma, congenital cataract, Leber's congenital amaurosis, retinitis pigmentosa, Usher syndrome, fundus neovascular disease, proliferative vitreoretinopathy, retinoblastoma and other eye diseases. Additionally, combining CRISPR/Cas9 genome editing technology with adeno-associated virus vectors and induced pluripotent stem cells provides further therapeutic avenues for the treatment of eye diseases. Nonetheless, many challenges remain in developing clinically feasible retinal genome editing therapies. This review discusses the development and mechanism of CRISPR/Cas9, as well as its applications and challenges in gene therapy for eye diseases.
Affiliation(s)
- Xumeng Hu
- Henan Eye Hospital, Henan Eye Institution, Henan Provincial People's Hospital, Zhengzhou University People's Hospital, Zhengzhou, 450003, Henan, China
- Beibei Zhang
- Henan Eye Hospital, Henan Eye Institution, Henan Provincial People's Hospital, Zhengzhou University People's Hospital, Zhengzhou, 450003, Henan, China
- Xiaoli Li
- Henan Eye Hospital, Henan Eye Institution, Henan Provincial People's Hospital, Zhengzhou University People's Hospital, Zhengzhou, 450003, Henan, China
- Miao Li
- Henan Eye Hospital, Henan Eye Institution, Henan Provincial People's Hospital, Zhengzhou University People's Hospital, Zhengzhou, 450003, Henan, China
- Yange Wang
- Henan Eye Hospital, Henan Eye Institution, Henan Provincial People's Hospital, Zhengzhou University People's Hospital, Zhengzhou, 450003, Henan, China
- Handong Dan
- Henan Eye Hospital, Henan Eye Institution, Henan Provincial People's Hospital, Zhengzhou University People's Hospital, Zhengzhou, 450003, Henan, China
- Jiamu Zhou
- Henan Eye Hospital, Henan Eye Institution, Henan Provincial People's Hospital, Zhengzhou University People's Hospital, Zhengzhou, 450003, Henan, China
- Yuanmeng Wei
- Henan Eye Hospital, Henan Eye Institution, Henan Provincial People's Hospital, Zhengzhou University People's Hospital, Zhengzhou, 450003, Henan, China
- Keke Ge
- Henan Eye Hospital, Henan Eye Institution, Henan Provincial People's Hospital, Zhengzhou University People's Hospital, Zhengzhou, 450003, Henan, China
- Pan Li
- Henan Eye Hospital, Henan Eye Institution, Henan Provincial People's Hospital, Zhengzhou University People's Hospital, Zhengzhou, 450003, Henan, China
- Zongming Song
- Henan Eye Hospital, Henan Eye Institution, Henan Provincial People's Hospital, Zhengzhou University People's Hospital, Zhengzhou, 450003, Henan, China.
15
Abstract
Advancements in high-throughput sequencing have yielded vast amounts of genomic data, which are studied using genome-wide association study (GWAS)/phenome-wide association study (PheWAS) methods to identify associations between genotype and phenotype. These findings have contributed to pharmacogenomics and improved clinical decision support at the point of care in many healthcare systems. However, the accumulation of genomic data from sequencing and clinical data from electronic health records (EHRs) poses significant challenges for data scientists. Following the rise of artificial intelligence (AI) technologies such as machine learning and deep learning, an increasing number of GWAS/PheWAS studies have successfully leveraged these technologies to overcome the aforementioned challenges. In this review, we focus on the application of data science and AI technology in three areas: risk prediction and identification of causal single-nucleotide polymorphisms, EHR-based phenotyping, and CRISPR guide RNA design. Additionally, we highlight a few emerging AI technologies, such as transfer learning and multi-view learning, which have started to benefit, or will benefit, genomic studies.
Affiliation(s)
- Jing Lin
- NUHS Corporate Office, National University Health System, Singapore
- Kee Yuan Ngiam
- NUHS Corporate Office, National University Health System, Singapore; Department of Surgery, National University of Singapore, Singapore; Correspondence: A/Prof Kee Yuan Ngiam, Group Chief Technology Officer, NUHS Corporate Office, National University Health System, 1E Kent Ridge Road, 119228, Singapore. E-mail:
16
Kwon J, Kim M, Hwang W, Jo A, Hwang GH, Jung M, Kim UG, Cui G, Kim H, Eom JH, Hur JK, Lee J, Kim Y, Kim JS, Bae S, Lee JK. Extru-seq: a method for predicting genome-wide Cas9 off-target sites with advantages of both cell-based and in vitro approaches. Genome Biol 2023; 24:4. [PMID: 36627653 PMCID: PMC9832775 DOI: 10.1186/s13059-022-02842-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2022] [Accepted: 12/21/2022] [Indexed: 01/11/2023]
Abstract
We present a novel genome-wide off-target prediction method named Extru-seq and compare it with cell-based (GUIDE-seq), in vitro (Digenome-seq), and in silico methods using promiscuous guide RNAs with large numbers of valid off-target sites. Extru-seq demonstrates a high validation rate and retention of information about the intracellular environment, both beneficial characteristics of cell-based methods. Extru-seq also shows a low miss rate and could easily be performed in clinically relevant cell types with little optimization, which are major positive features of the in vitro methods. In summary, Extru-seq shows beneficial features of cell-based and in vitro methods.
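A small calculation makes the two comparison criteria in this abstract concrete. The definitions below are assumptions for illustration, and the paper's exact definitions may differ:

- validation rate: the fraction of a method's predicted sites that are confirmed as true off-targets;
- miss rate: the fraction of confirmed off-target sites that the method did not report.

The site identifiers are invented.

```python
def validation_rate(predicted: set, validated: set) -> float:
    """Share of a method's predicted sites that were confirmed as true off-targets."""
    return len(predicted & validated) / len(predicted) if predicted else 0.0

def miss_rate(predicted: set, validated: set) -> float:
    """Share of confirmed off-target sites that the method failed to report."""
    return len(validated - predicted) / len(validated) if validated else 0.0

# Hypothetical site identifiers, for illustration only.
validated_sites = {"chr1:1000", "chr2:2500", "chr5:300", "chrX:77"}
method_predictions = {"chr1:1000", "chr2:2500", "chr9:42"}

print(validation_rate(method_predictions, validated_sites))
print(miss_rate(method_predictions, validated_sites))
```

Under these definitions, a method that reports many candidate sites tends to lower its miss rate at the cost of its validation rate. The abstract presents Extru-seq as doing well on both measures.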
Affiliation(s)
- Woochang Hwang
- Department of Pre-Medicine, College of Medicine, Hanyang University, Seoul, Republic of Korea
- Hanyang Institute of Bioscience and Biotechnology, Hanyang University, Seoul, Republic of Korea
- Anna Jo
- Toolgen, Seoul, Republic of Korea
- Gue-Ho Hwang
- Department of Chemistry, Hanyang University, Seoul, Republic of Korea
- Gang Cui
- Department of Ophthalmology, Yonsei University College of Medicine, Seoul, Republic of Korea
- Heonseok Kim
- Department of Medicine, Division of Oncology, Stanford University, Stanford, USA
- Joon-Ho Eom
- National Institute of Food and Drug Safety Evaluation, Cheongju, Republic of Korea
- Junho K Hur
- Hanyang Institute of Bioscience and Biotechnology, Hanyang University, Seoul, Republic of Korea
- Department of Genetics, College of Medicine, Hanyang University, Seoul, Republic of Korea
- Junwon Lee
- Department of Ophthalmology, Yonsei University College of Medicine, Seoul, Republic of Korea
- Jin-Soo Kim
- Center for Genome Engineering, Institute for Basic Science (IBS), Seoul, Republic of Korea
- Sangsu Bae
- Department of Biochemistry and Molecular Biology, Seoul National University College of Medicine, Seoul, Republic of Korea
17
Patra P, B R D, Kundu P, Das M, Ghosh A. Recent advances in machine learning applications in metabolic engineering. Biotechnol Adv 2023; 62:108069. [PMID: 36442697 DOI: 10.1016/j.biotechadv.2022.108069] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 07/17/2022] [Revised: 10/18/2022] [Accepted: 11/22/2022] [Indexed: 11/27/2022]
Abstract
Metabolic engineering encompasses several widely used strategies, which currently hold a prominent place in biotechnology as their potential manifests in a plethora of research and commercial products with strong societal impact. The genomic revolution that began almost three decades ago initiated the generation of large omics datasets, which have helped researchers gain a better understanding of cellular behavior. Metabolic engineering built on these large datasets has given researchers detailed insights into, and a reasonable understanding of, the intricacies of biosystems. However, existing trial-and-error approaches to metabolic engineering are laborious and time-intensive when the goal is to produce target compounds at high yields through genetic manipulation of host organisms. Machine learning (ML), coupled with available metabolic engineering test instances and omics data, provides a comprehensive and multidisciplinary approach that enables scientists to evaluate various parameters for effective strain design. This vast amount of biological data should be standardized through knowledge engineering to train different ML models that provide accurate predictions for gene circuit design, protein modification, optimization of bioprocess parameters for scale-up, and screening of robust hyper-producing cell factories. This review briefly introduces the premise of ML and then describes various ML methods and algorithms, alongside the numerous omics datasets available for training ML models to predict metabolic outcomes with high accuracy. The combined interplay between ML algorithms and biological datasets through knowledge engineering has guided recent advances in applications such as CRISPR/Cas systems, gene circuits, protein engineering, metabolic pathway reconstruction, and bioprocess engineering.
Finally, this review addresses the likely challenges of applying ML in metabolic engineering, which will guide researchers toward novel techniques for overcoming these limitations.
Affiliation(s)
- Pradipta Patra
- School of Energy Science and Engineering, Indian Institute of Technology Kharagpur, West Bengal 721302, India
- Disha B R
- B.M.S College of Engineering, Basavanagudi, Bengaluru, Karnataka 560019, India
- Pritam Kundu
- School of Energy Science and Engineering, Indian Institute of Technology Kharagpur, West Bengal 721302, India
- Manali Das
- School of Bioscience, Indian Institute of Technology Kharagpur, West Bengal 721302, India
- Amit Ghosh
- School of Energy Science and Engineering, Indian Institute of Technology Kharagpur, West Bengal 721302, India; P.K. Sinha Centre for Bioenergy and Renewables, Indian Institute of Technology Kharagpur, West Bengal 721302, India.
18
Anton N, Doroftei B, Curteanu S, Catãlin L, Ilie OD, Târcoveanu F, Bogdănici CM. Comprehensive Review on the Use of Artificial Intelligence in Ophthalmology and Future Research Directions. Diagnostics (Basel) 2022; 13. [PMID: 36611392 DOI: 10.3390/diagnostics13010100] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2022] [Revised: 12/12/2022] [Accepted: 12/26/2022] [Indexed: 12/31/2022]
Abstract
BACKGROUND With several applications in medicine, and in ophthalmology in particular, artificial intelligence (AI) tools have been used to detect visual function deficits, thus playing a key role in diagnosing eye diseases and in predicting the evolution of these common and disabling diseases. AI tools such as artificial neural networks (ANNs) are increasingly involved in the detection and individualized management of ophthalmic diseases. This review analyzes studies on the efficiency of AI in medicine, especially in ophthalmology. MATERIALS AND METHODS We conducted a comprehensive review to collect all accounts published between 2015 and 2022 that refer to these applications of AI in medicine, especially in ophthalmology. Neural networks play a major role in determining when preliminary anti-glaucoma therapy should be initiated to stop the progression of the disease. RESULTS Different surveys in the literature show the remarkable benefit of these AI tools in ophthalmology for evaluating the visual field, optic nerve, and retinal nerve fiber layer, thus ensuring higher precision in detecting glaucoma progression and retinal changes in diabetes. We identified 1762 publications on applications of artificial intelligence in ophthalmology, comprising review articles and research articles (301 PubMed, 144 Scopus, 445 Web of Science, 872 ScienceDirect). Of these, we analyzed 70 articles and review papers (diabetic retinopathy (N = 24), glaucoma (N = 24), DMLV (N = 15), other pathologies (N = 7)) after applying the inclusion and exclusion criteria. CONCLUSION In medicine, AI tools are used in surgery, radiology, gynecology, oncology, etc., for making diagnoses, predicting disease evolution, and assessing prognosis in patients with oncological pathologies.
In ophthalmology, AI can potentially increase patients' access to screening and clinical diagnosis and decrease healthcare costs, particularly when the risk of disease is high or communities face financial shortages. AI/DL (deep learning) algorithms using both OCT and FO images will change image analysis techniques and methodologies. Optimizing these combined technologies will accelerate progress in this area.
19
Mak JK, Störtz F, Minary P. Comprehensive computational analysis of epigenetic descriptors affecting CRISPR-Cas9 off-target activity. BMC Genomics 2022; 23:805. [PMID: 36474180 DOI: 10.1186/s12864-022-09012-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/20/2022] [Accepted: 10/17/2022] [Indexed: 12/12/2022]
Abstract
BACKGROUND A common issue in CRISPR-Cas9 genome editing is off-target activity, which prevents the widespread use of CRISPR-Cas9 in medical applications. Among other factors, primary chromatin structure and epigenetics may influence off-target activity. METHODS In this work, we use crisprSQL, an off-target database, to analyze the effect of 19 epigenetic descriptors on CRISPR-Cas9 off-target activity. These 19 epigenetic features/scores consist of 6 experimental epigenetic features and 13 computed nucleosome organization-related features. Of these, 15 scores are considered here for the first time: the 13 newly computed nucleosome occupancy/positioning scores and 2 experimental features (MNase and DRIP). The remaining 4 scores are experimental features (CTCF, DNase I, H3K4me3, RRBS) commonly used in deep learning models for off-target activity prediction. For data curation, MNase was aggregated from existing experimental nucleosome occupancy data. Based on the sequence context information available in crisprSQL, we also computed nucleosome occupancy/positioning scores for off-target sites. RESULTS To investigate the relationship between the 19 epigenetic features and off-target activity, we first conducted Spearman and Pearson correlation analyses. These analyses show that some computed scores derived from training-based models and training-free algorithms outperform all experimental epigenetic features. Next, we evaluated the contribution of all epigenetic features in two successful machine/deep learning models that predict off-target activity. We found that some computed scores, unlike any of the 6 experimental features, contribute significantly to the predictions of both models. As a practical research contribution, we make the off-target dataset containing all 19 epigenetic features available to the research community.
CONCLUSIONS Our comprehensive computational analysis helps the CRISPR-Cas9 community better understand the relationship between epigenetic features and CRISPR-Cas9 off-target activity.
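The correlation screen described under RESULTS can be sketched in a few lines of standard-library Python. This is an illustrative implementation, not the authors' code. Spearman's coefficient is computed as the Pearson correlation of average ranks, and the feature and activity values are made-up toy numbers.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation = Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Toy values for one hypothetical epigenetic score and measured off-target activity.
feature_score = [0.1, 0.4, 0.35, 0.8, 0.6]
off_target_activity = [0.05, 0.2, 0.3, 0.9, 0.7]

print(round(pearson(feature_score, off_target_activity), 3))
print(round(spearman(feature_score, off_target_activity), 3))
```

Computing both coefficients is informative because Pearson measures linear association, while Spearman only asks whether the relationship is monotonic. A feature can therefore rank sites well without scaling linearly with activity.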
20
Hasanzadeh A, Hamblin MR, Kiani J, Noori H, Hardie JM, Karimi M, Shafiee H. Could artificial intelligence revolutionize the development of nanovectors for gene therapy and mRNA vaccines? Nano Today 2022; 47:101665. [PMID: 37034382 PMCID: PMC10081506 DOI: 10.1016/j.nantod.2022.101665] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 06/19/2023]
Abstract
Gene therapy enables the introduction of nucleic acids such as DNA and RNA into host cells and is expected to revolutionize the treatment of a wide range of diseases. The field has been further accelerated by the discovery of CRISPR/Cas technology, which allows accurate genome editing in a broad range of cells and organisms in vitro and in vivo. Despite many advances in gene delivery and the development of various viral and non-viral gene delivery vectors, the lack of highly efficient non-viral systems with low cellular toxicity remains a challenge. Cutting-edge technologies such as artificial intelligence (AI) have great potential to offer new paradigms for solving this issue. Herein, we review AI and its major subfields, including machine learning (ML), neural networks (NNs), expert systems, deep learning (DL), computer vision, and robotics. We discuss the potential of AI-based models and algorithms in the design of targeted gene delivery vehicles capable of crossing extracellular and intracellular barriers through viral mimicry strategies. Finally, we discuss the role of AI in improving the function of CRISPR/Cas systems and in developing novel nanobots and mRNA vaccine carriers.
Affiliation(s)
- Akbar Hasanzadeh
- Cellular and Molecular Research Center, Iran University of Medical Sciences, Tehran 1449614535, Iran
- Department of Medical Nanotechnology, Faculty of Advanced Technologies in Medicine, Iran University of Medical Sciences, Tehran 1449614535, Iran
- Michael R Hamblin
- Laser Research Centre, Faculty of Health Science, University of Johannesburg, Doornfontein 2028, South Africa
- Radiation Biology Research Center, Iran University of Medical Sciences, Tehran, Iran
- Jafar Kiani
- Oncopathology Research Center, Iran University of Medical Sciences, Tehran 1449614535, Iran
- Department of Molecular Medicine, Faculty of Advanced Technologies in Medicine, Iran University of Medical Sciences, Tehran, Iran
- Hamid Noori
- Cellular and Molecular Research Center, Iran University of Medical Sciences, Tehran 1449614535, Iran
- Department of Medical Nanotechnology, Faculty of Advanced Technologies in Medicine, Iran University of Medical Sciences, Tehran 1449614535, Iran
- Joseph M. Hardie
- Division of Engineering in Medicine, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, 02139 USA
- Mahdi Karimi
- Cellular and Molecular Research Center, Iran University of Medical Sciences, Tehran 1449614535, Iran
- Department of Medical Nanotechnology, Faculty of Advanced Technologies in Medicine, Iran University of Medical Sciences, Tehran 1449614535, Iran
- Oncopathology Research Center, Iran University of Medical Sciences, Tehran 1449614535, Iran
- Research Center for Science and Technology in Medicine, Tehran University of Medical Sciences, Tehran 141556559, Iran
- Applied Biotechnology Research Centre, Tehran Medical Science, Islamic Azad University, Tehran 1584743311, Iran
- Hadi Shafiee
- Division of Engineering in Medicine, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, 02139 USA
21
Yang Q, Wu L, Meng J, Ma L, Zuo E, Sun Y. EpiCas-DL: Predicting sgRNA activity for CRISPR-mediated epigenome editing by deep learning. Comput Struct Biotechnol J 2023; 21:202-11. [PMID: 36582444 DOI: 10.1016/j.csbj.2022.11.034] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 05/13/2022] [Revised: 11/15/2022] [Accepted: 11/15/2022] [Indexed: 11/21/2022]
Abstract
CRISPR-mediated epigenome editing enables gene expression regulation without changing the underlying DNA sequence, and thus has vast potential for basic research and gene therapy. Effective selection of a single guide RNA (sgRNA) with high on-target efficiency and specificity would facilitate the application of epigenome editing tools. Here we performed an extensive analysis of CRISPR-mediated epigenome editing tools on thousands of experimentally examined on-target sites and established EpiCas-DL, a deep learning framework to optimize sgRNA design for gene silencing or activation. EpiCas-DL achieves high accuracy in sgRNA activity prediction for targeted gene silencing or activation and outperforms other available in silico methods. In addition, EpiCas-DL also identifies both epigenetic and sequence features that affect sgRNA efficacy in gene silencing and activation, facilitating the application of epigenome editing for research and therapy. EpiCas-DL is available at http://www.sunlab.fun:3838/EpiCas-DL.
22
Yaish O, Asif M, Orenstein Y. A systematic evaluation of data processing and problem formulation of CRISPR off-target site prediction. Brief Bioinform 2022; 23:bbac157. [PMID: 35595297 DOI: 10.1093/bib/bbac157] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 10/04/2021] [Revised: 03/24/2022] [Accepted: 04/07/2022] [Indexed: 11/14/2022]
Abstract
The CRISPR/Cas9 system is widely used in a broad range of gene-editing applications. While this editing technique is quite accurate in the target region, there may be many unplanned off-target sites (OTSs). Consequently, a plethora of computational methods have been developed to predict off-target cleavage sites given a guide RNA and a reference genome. However, these methods are based on small-scale datasets (only tens to hundreds of OTSs) produced by OTS-detection techniques with a low signal-to-noise ratio. Recently, CHANGE-seq, a new in vitro experimental technique for detecting OTSs, was used to produce a dataset of unprecedented scale and quality (>200 000 OTSs across 110 guide RNAs). In addition, the same study included in cellula GUIDE-seq experiments for 58 of the guide RNAs. Here, we fill a gap left by previous computational methods by using these data to systematically evaluate data processing and the formulation of the CRISPR OTS prediction problem. Our evaluations show that data transformation as a pre-processing step is critical prior to model training. Moreover, we demonstrate the improvement gained by adding potential inactive OTSs to the training datasets. Furthermore, our results point to the importance of including the number of mismatches between guide RNAs and their OTSs as a feature. Finally, we present predictive in cellula off-target models based on both in vitro and in cellula data and compare them with state-of-the-art methods in predicting true OTSs. Our conclusions will be instrumental in any future development of an off-target predictor based on high-throughput datasets.
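Two of the preprocessing points above can be sketched together: transforming the raw read counts, and adding the number of guide/site mismatches as a feature. This is a minimal sketch under stated assumptions, not the authors' pipeline:

- the log(1 + x) transform is an assumed example, since this abstract does not say which transformation performed best;
- the 23-nt sequences are illustrative;
- a read count of 0 stands in for one of the potential inactive OTSs added to training.

```python
import math

def mismatch_count(guide: str, site: str) -> int:
    """Number of positions where the aligned guide and off-target site differ."""
    if len(guide) != len(site):
        raise ValueError("guide and site must be aligned to the same length")
    return sum(1 for g, s in zip(guide.upper(), site.upper()) if g != s)

def log_transform(read_count: float) -> float:
    """Assumed transform log(1 + reads): compresses the long tail; 0 reads map to 0."""
    return math.log1p(read_count)

def make_features(guide: str, site: str, read_count: float) -> dict:
    """Hypothetical helper building one training example (feature + regression label)."""
    return {"mismatches": mismatch_count(guide, site), "label": log_transform(read_count)}

guide = "GAGTCCGAGCAGAAGAAGAAGGG"  # illustrative guide + PAM (23 nt)
site = "GAGACCGAGTAGAAGAAGAAGGG"   # same sequence with two substitutions

active = make_features(guide, site, read_count=150)
inactive = make_features(guide, site, read_count=0)  # potential inactive OTS
print(active, inactive)
```

Without a transform, a few very high-count sites would dominate a squared-error loss. The mismatch count gives the model a direct, interpretable summary of guide/site divergence.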
Affiliation(s)
- Ofir Yaish
- School of Electrical and Computer Engineering, Ben-Gurion University of the Negev, Beer Sheva, Israel
- Maor Asif
- School of Electrical and Computer Engineering, Ben-Gurion University of the Negev, Beer Sheva, Israel
- Yaron Orenstein
- School of Electrical and Computer Engineering, Ben-Gurion University of the Negev, Beer Sheva, Israel
23
Mattiello L, Rütgers M, Sua-Rojas MF, Tavares R, Soares JS, Begcy K, Menossi M. Molecular and Computational Strategies to Increase the Efficiency of CRISPR-Based Techniques. Front Plant Sci 2022; 13:868027. [PMID: 35712599 PMCID: PMC9194676 DOI: 10.3389/fpls.2022.868027] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2022] [Accepted: 04/27/2022] [Indexed: 06/15/2023]
Abstract
The prokaryote-derived Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)/Cas-mediated gene editing tools have revolutionized our ability to precisely manipulate specific genome sequences in plants and animals. The simplicity, precision, affordability, and robustness of this technology have allowed a myriad of genomes from a diverse group of plant species to be successfully edited. Even though CRISPR/Cas, base editing, and prime editing technologies have been rapidly adopted and implemented in plants, their editing efficiency and specificity vary greatly. In this review, we provide a critical overview of recent advances in CRISPR/Cas9-derived technologies and their implications for enhancing editing efficiency. We highlight major efforts to engineer Cas9, Cas12a, Cas12b, and Cas12f proteins to improve their efficiency. We also provide a perspective on the global future of agricultural products developed with DNA-free CRISPR/Cas techniques. Improving the efficiency of CRISPR-based technologies will enable the implementation of genome editing tools in a variety of crop plants, as well as accelerate progress in basic research and molecular breeding.
Affiliation(s)
- Lucia Mattiello
- Department of Genetics, Evolution, Microbiology and Immunology, Institute of Biology, State University of Campinas (UNICAMP), Campinas, Brazil
- Mark Rütgers
- Department of Genetics, Evolution, Microbiology and Immunology, Institute of Biology, State University of Campinas (UNICAMP), Campinas, Brazil
- Maria Fernanda Sua-Rojas
- Department of Genetics, Evolution, Microbiology and Immunology, Institute of Biology, State University of Campinas (UNICAMP), Campinas, Brazil
- Rafael Tavares
- Cell and Developmental Biology, John Innes Centre, Norwich, United Kingdom
- José Sérgio Soares
- Department of Genetics, Evolution, Microbiology and Immunology, Institute of Biology, State University of Campinas (UNICAMP), Campinas, Brazil
- Kevin Begcy
- Environmental Horticulture Department, University of Florida, Gainesville, FL, United States
- Marcelo Menossi
- Department of Genetics, Evolution, Microbiology and Immunology, Institute of Biology, State University of Campinas (UNICAMP), Campinas, Brazil
24
Sapoval N, Aghazadeh A, Nute MG, Antunes DA, Balaji A, Baraniuk R, Barberan CJ, Dannenfelser R, Dun C, Edrisi M, Elworth RAL, Kille B, Kyrillidis A, Nakhleh L, Wolfe CR, Yan Z, Yao V, Treangen TJ. Current progress and open challenges for applying deep learning across the biosciences. Nat Commun 2022; 13:1728. [PMID: 35365602 PMCID: PMC8976012 DOI: 10.1038/s41467-022-29268-7] [Citation(s) in RCA: 49] [Impact Index Per Article: 24.5] [Received: 08/25/2021] [Accepted: 03/09/2022] [Indexed: 11/19/2022]
Abstract
Deep Learning (DL) has recently enabled unprecedented advances in one of the grand challenges in computational biology: the half-century-old problem of protein structure prediction. In this paper we discuss recent advances, limitations, and future perspectives of DL on five broad areas: protein structure prediction, protein function prediction, genome engineering, systems biology and data integration, and phylogenetic inference. We discuss each application area and cover the main bottlenecks of DL approaches, such as training data, problem scope, and the ability to leverage existing DL architectures in new contexts. To conclude, we provide a summary of the subject-specific and general challenges for DL across the biosciences.
Affiliation(s)
- Nicolae Sapoval
- Department of Computer Science, Rice University, Houston, TX, USA
- Amirali Aghazadeh
- Department of Electrical Engineering and Computer Sciences, University of California Berkeley, Berkeley, CA, USA
- Michael G Nute
- Department of Computer Science, Rice University, Houston, TX, USA
- Dinler A Antunes
- Department of Biology and Biochemistry, University of Houston, Houston, TX, USA
- Advait Balaji
- Department of Computer Science, Rice University, Houston, TX, USA
- Richard Baraniuk
- Department of Electrical and Computer Engineering, Rice University, Houston, TX, USA
- C J Barberan
- Department of Electrical and Computer Engineering, Rice University, Houston, TX, USA
- Chen Dun
- Department of Computer Science, Rice University, Houston, TX, USA
- R A Leo Elworth
- Department of Computer Science, Rice University, Houston, TX, USA
- Bryce Kille
- Department of Computer Science, Rice University, Houston, TX, USA
- Luay Nakhleh
- Department of Computer Science, Rice University, Houston, TX, USA
- Cameron R Wolfe
- Department of Computer Science, Rice University, Houston, TX, USA
- Zhi Yan
- Department of Computer Science, Rice University, Houston, TX, USA
- Vicky Yao
- Department of Computer Science, Rice University, Houston, TX, USA
- Todd J Treangen
- Department of Computer Science, Rice University, Houston, TX, USA.
- Department of Bioengineering, Rice University, Houston, TX, USA.
25
Abstract
As a third-generation gene editing technology, CRISPR/Cas9 has a wide range of applications. The success of CRISPR depends on editing the target gene via a functional complex of sgRNA and Cas9 protein. Therefore, sgRNAs with high specificity and high on-target cleavage efficiency can make this process more accurate and efficient. Although many sophisticated machine learning and deep learning models already exist to predict the on-target cleavage efficiency of sgRNAs, their prediction accuracy remains to be improved. XGBoost performs well at classification because, as an ensemble model, it can overcome the deficiencies of a single classifier, so we sought to improve the prediction of sgRNA on-target activity by introducing XGBoost into the model. We present a novel machine learning framework that combines a convolutional neural network (CNN) and XGBoost to predict sgRNA on-target knockout efficacy. Our framework, called CNN-XG, is composed of two main parts: a CNN feature extractor that automatically extracts features from sequences, and an XGBoost predictor applied to the features extracted by convolution. Experiments on commonly used datasets show that CNN-XG performed significantly better than other existing frameworks in classification mode.
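The CNN half of this two-stage design can be illustrated in a few steps. The sketch one-hot encodes a sequence, slides convolution kernels along it, applies ReLU and max-pools each kernel to a single number. It is hypothetical and hand-weighted: CNN-XG learns its kernels by training, and its actual architecture is not reproduced here. In a CNN-XG-style pipeline, feature vectors like these would feed a gradient-boosted tree model (for example XGBoost's `XGBClassifier`) instead of a dense output layer.

```python
def one_hot(seq: str):
    """Encode a DNA sequence as 4-element vectors in A, C, G, T order (other bases raise KeyError)."""
    index = {"A": 0, "C": 1, "G": 2, "T": 3}
    encoded = []
    for base in seq.upper():
        vec = [0.0] * 4
        vec[index[base]] = 1.0
        encoded.append(vec)
    return encoded

def conv1d_relu_maxpool(encoded, kernels):
    """Slide each (width x 4) kernel along the sequence, apply ReLU, keep the maximum.

    Returns one scalar feature per kernel, i.e. a fixed-length vector that a
    downstream tree-based classifier could consume.
    """
    features = []
    for kernel in kernels:
        width = len(kernel)
        best = 0.0  # starting at 0 implements ReLU followed by global max pooling
        for start in range(len(encoded) - width + 1):
            score = sum(kernel[k][c] * encoded[start + k][c]
                        for k in range(width) for c in range(4))
            best = max(best, score)
        features.append(best)
    return features

# Hand-set, hypothetical kernels: a "GG" detector and a 4-nt T-counting kernel.
gg_kernel = [[0, 0, 1, 0], [0, 0, 1, 0]]
t_kernel = [[0, 0, 0, 1]] * 4

features = conv1d_relu_maxpool(one_hot("ACGGTA"), [gg_kernel, t_kernel])
print(features)
```

Using a tree ensemble as the final stage lets the model combine the pooled motif scores non-linearly, which is the motivation the abstract gives for pairing the CNN with XGBoost.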
Affiliation(s)
- Bohao Li
- School of Mathematics and Physics, University of Science and Technology Beijing, Beijing 100083, China; (B.L.); (D.A.)
- Dongmei Ai
- School of Mathematics and Physics, University of Science and Technology Beijing, Beijing 100083, China; (B.L.); (D.A.)
- Basic Experimental Center of Natural Science, University of Science and Technology Beijing, Beijing 100083, China
- Xiuqin Liu
- School of Mathematics and Physics, University of Science and Technology Beijing, Beijing 100083, China; (B.L.); (D.A.)
26
Krohannon A, Srivastava M, Rauch S, Srivastava R, Dickinson BC, Janga SC. CASowary: CRISPR-Cas13 guide RNA predictor for transcript depletion. BMC Genomics 2022; 23:172. [PMID: 35236300 PMCID: PMC8889671 DOI: 10.1186/s12864-022-08366-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/06/2021] [Accepted: 02/03/2022] [Indexed: 12/26/2022]
Abstract
BACKGROUND The recent discovery of the CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats)-associated (Cas) gene-editing system has led to its widespread use for improving our understanding of a variety of biological systems. Cas13, a less studied Cas protein, has been repurposed to allow efficient and precise editing of RNA molecules. The Cas13 system uses base complementarity between a crRNA/sgRNA (CRISPR RNA or single guide RNA) and a target RNA transcript to bind preferentially to the target transcript only. Compared with the upstream regulatory regions of protein-coding genes in the genome, the transcriptome is significantly more redundant, with many transcripts sharing long stretches of identical nucleotide sequence. Transcripts also form complex three-dimensional structures and interact with an array of RBPs (RNA-binding proteins), both of which may affect how effectively target sequences are depleted. However, our understanding of the relevant features, and of methods that can predict whether a specific sgRNA will effectively knock down a transcript, is very limited. RESULTS Here we present a novel machine learning and computational tool, CASowary, to predict the efficacy of an sgRNA. We used publicly available RNA knockdown data from Cas13 characterization experiments for 555 sgRNAs targeting the transcriptome in HEK293 cells, in conjunction with transcriptome-wide protein occupancy information. Our model uses a Decision Tree architecture with a set of 112 sequence and target-availability features to classify sgRNA efficacy into one of four classes based on the expected level of target transcript knockdown. After accounting for noise in the training data set, the noise-normalized accuracy exceeds 70%.
Additionally, highly effective sgRNA predictions were experimentally validated using an independent RNA-targeting Cas system, CIRTS, confirming the robustness and reproducibility of the model's sgRNA predictions. Using a transcriptome-wide protein occupancy map generated with POP-seq in HeLa cells, compared against a publicly available protein-RNA interaction map in HEK293 cells, we show that CASowary can predict high-quality guides for numerous transcripts in a cell-line-specific manner. CONCLUSIONS Applying CASowary to whole transcriptomes should enable rapid deployment of CRISPR/Cas13 systems, facilitating the development of therapeutic interventions for aberrations in RNA regulatory processes.
Affiliation(s)
- Alexander Krohannon
- Department of BioHealth Informatics, School of Informatics and Computing, Indiana University Purdue University Indianapolis (IUPUI), 535 West Michigan St, Indianapolis, IN, 46202, USA
- Mansi Srivastava
- Department of BioHealth Informatics, School of Informatics and Computing, Indiana University Purdue University Indianapolis (IUPUI), 535 West Michigan St, Indianapolis, IN, 46202, USA
- Simone Rauch
- Department of Chemistry, The University of Chicago, Chicago, IL, USA
- Department of Biochemistry and Molecular Biology, The University of Chicago, Chicago, Illinois, 60637, USA
- Rajneesh Srivastava
- Department of BioHealth Informatics, School of Informatics and Computing, Indiana University Purdue University Indianapolis (IUPUI), 535 West Michigan St, Indianapolis, IN, 46202, USA
- Bryan C Dickinson
- Department of Chemistry, The University of Chicago, Chicago, IL, USA
- Sarath Chandra Janga
- Department of BioHealth Informatics, School of Informatics and Computing, Indiana University Purdue University Indianapolis (IUPUI), 535 West Michigan St, Indianapolis, IN, 46202, USA.
- Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, 5021 Health Information and Translation Sciences (HITS), 410 West 10th Street, Indianapolis, IN, 46202, USA.
- Department of Medical and Molecular Genetics, Indiana University School of Medicine, Medical Research and Library Building, 975 West Walnut Street, Indianapolis, IN, 46202, USA.
27
Fu R, He W, Dou J, Villarreal OD, Bedford E, Wang H, Hou C, Zhang L, Wang Y, Ma D, Chen Y, Gao X, Depken M, Xu H. Systematic decomposition of sequence determinants governing CRISPR/Cas9 specificity. Nat Commun 2022; 13:474. [PMID: 35078987 PMCID: PMC8789861 DOI: 10.1038/s41467-022-28028-x] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 08/11/2021] [Accepted: 01/04/2022] [Indexed: 12/20/2022]
Abstract
The specificity of CRISPR/Cas9 genome editing is largely determined by the sequences of guide RNA (gRNA) and the targeted DNA, yet the sequence-dependent rules underlying off-target effects are not fully understood. To systematically explore the sequence determinants governing CRISPR/Cas9 specificity, here we describe a dual-target system to measure the relative cleavage rate between off- and on-target sequences (off-on ratios) of 1902 gRNAs on 13,314 synthetic target sequences, and reveal a set of sequence rules involving 2 factors in off-targeting: 1) a guide-intrinsic mismatch tolerance (GMT) independent of the mismatch context; 2) an "epistasis-like" combinatorial effect of multiple mismatches, which are associated with the free-energy landscape in R-loop formation and are explainable by a multi-state kinetic model. These sequence rules lead to the development of MOFF, a model-based predictor of Cas9-mediated off-target effects. Moreover, the "epistasis-like" combinatorial effect suggests a strategy of allele-specific genome editing using mismatched guides. With the aid of MOFF prediction, this strategy significantly improves the selectivity and expands the application domain of Cas9-based allele-specific editing, as tested in a high-throughput allele-editing screen on 18 cancer hotspot mutations.
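MOFF's rules are expressed in terms of mismatches between the gRNA and its target: a guide-intrinsic mismatch tolerance plus an epistasis-like combined effect of several mismatches. Any mismatch-based off-target score must first locate those mismatches. The hypothetical helper below is not part of MOFF and covers only that step. The 1-based numbering from the PAM-distal end and the example sequences are assumptions.

```python
from typing import List


def mismatch_positions(guide: str, target: str) -> List[int]:
    """Return 1-based positions where the guide and target protospacer differ.

    Both sequences are written 5'->3' in the DNA alphabet (T, not U).
    Position 1 is the PAM-distal end and position len(guide) sits next to
    the PAM; this numbering is an assumed convention.
    """
    if len(guide) != len(target):
        raise ValueError("guide and target must have equal length")
    return [
        i + 1
        for i, (g, t) in enumerate(zip(guide.upper(), target.upper()))
        if g != t
    ]


if __name__ == "__main__":
    guide = "GACGCATAAAGATGAGACGC"        # hypothetical 20-nt guide
    off_target = "GACGCATAAAGATGTGACGA"   # hypothetical site with two mismatches
    positions = mismatch_positions(guide, off_target)
    print(positions, len(positions))  # [15, 20] 2
```

A mismatch-aware scorer would then combine per-position penalties with an interaction term for multiple mismatches. MOFF's guide-intrinsic tolerance and epistasis-like effect operate at that stage, which is not reproduced here.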
Affiliation(s)
- Rongjie Fu
- Department of Epigenetics and Molecular Carcinogenesis, The University of Texas MD Anderson Cancer Center, Smithville, TX, 78957, USA
- Wei He
- Department of Epigenetics and Molecular Carcinogenesis, The University of Texas MD Anderson Cancer Center, Smithville, TX, 78957, USA
- Jinzhuang Dou
- Department of Epigenetics and Molecular Carcinogenesis, The University of Texas MD Anderson Cancer Center, Smithville, TX, 78957, USA
- Oscar D Villarreal
- Department of Epigenetics and Molecular Carcinogenesis, The University of Texas MD Anderson Cancer Center, Smithville, TX, 78957, USA
- Ella Bedford
- Department of Epigenetics and Molecular Carcinogenesis, The University of Texas MD Anderson Cancer Center, Smithville, TX, 78957, USA
- Helen Wang
- Department of Epigenetics and Molecular Carcinogenesis, The University of Texas MD Anderson Cancer Center, Smithville, TX, 78957, USA
- Connie Hou
- Department of Epigenetics and Molecular Carcinogenesis, The University of Texas MD Anderson Cancer Center, Smithville, TX, 78957, USA
- Liang Zhang
- Department of Epigenetics and Molecular Carcinogenesis, The University of Texas MD Anderson Cancer Center, Smithville, TX, 78957, USA
- Yalong Wang
- Department of Epigenetics and Molecular Carcinogenesis, The University of Texas MD Anderson Cancer Center, Smithville, TX, 78957, USA
- Dacheng Ma
- Department of Chemical and Biomolecular Engineering, Rice University, Houston, TX, 77005, USA
- Yiwen Chen
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA
- Xue Gao
- Department of Chemical and Biomolecular Engineering, Rice University, Houston, TX, 77005, USA
- Department of Chemistry, Rice University, Houston, TX, 77005, USA
- Department of Bioengineering, Rice University, Houston, TX, 77005, USA
- Martin Depken
- Kavli Institute of NanoScience and Department of BionanoScience, Delft University of Technology, Delft, 2629HZ, the Netherlands
- Han Xu
- Department of Epigenetics and Molecular Carcinogenesis, The University of Texas MD Anderson Cancer Center, Smithville, TX, 78957, USA.
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA.
- The Center for Cancer Epigenetics, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA.
28
Dimauro G, Barletta VS, Catacchio CR, Colizzi L, Maglietta R, Ventura M. A systematic mapping study on machine learning techniques for the prediction of CRISPR/Cas9 sgRNA target cleavage. Comput Struct Biotechnol J 2022; 20:5813-5823. [PMID: 36382194 PMCID: PMC9630617 DOI: 10.1016/j.csbj.2022.10.013] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 07/27/2022] [Revised: 09/21/2022] [Accepted: 10/08/2022] [Indexed: 11/30/2022]
Abstract
CRISPR/Cas9 technology has greatly accelerated genome engineering research. The CRISPR/Cas9 complex, derived from a bacterial immune system, is widely adopted for RNA-guided targeted genome editing. The systematic mapping study (SMS) presented in this paper examines the literature on machine learning (ML) techniques used to predict CRISPR/Cas9 sgRNA on- and off-target cleavage, with a focus on improving support for sgRNA design and identifying areas of current research. Because this research area has expanded greatly in recent years, we chose an SMS, a secondary-study method that has proven effective. Unlike a classic review, an SMS does not compare methods or results; that task can instead be taken up by a systematic literature review focused on one of the themes highlighted here. To the best of the authors' knowledge, no other SMS has been published on this topic. Fifty-seven papers published between 2017 and April 30, 2022 were analyzed. The study shows that the most widely used ML model is the convolutional neural network (CNN), followed by the feedforward neural network (FNN), while other models are used only marginally. Other findings include the wide availability of open code and of platforms dedicated to supporting researchers, and the clear predominance of public funding for research on this topic.
29
Zhang ZR, Jiang ZR. Effective use of sequence information to predict CRISPR-Cas9 off-target. Comput Struct Biotechnol J 2022; 20:650-661. [PMID: 35140885 PMCID: PMC8804193 DOI: 10.1016/j.csbj.2022.01.006] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 08/23/2021] [Revised: 01/05/2022] [Accepted: 01/08/2022] [Indexed: 12/05/2022]
Abstract
The CRISPR/Cas9 gene-editing system is a third-generation gene-editing technology that has been widely used in biomedical applications. However, off-target effects remain a challenging problem for the CRISPR/Cas9 system in practical applications. Although many models have been developed to predict off-target activity, current models do not make effective use of sequence-pair information, leaving room for improved accuracy. This study aims to use sequence-pair information effectively to improve the prediction of off-target activity. We propose a new scheme for encoding sequence pairs and design a new model, CRISPR-IP, for predicting off-target activity. Our encoding scheme uses function channels to distinguish regions with different functions in the sequence pairs, and type channels to distinguish between bases and base pairs, effectively representing the sequence-pair information. The CRISPR-IP model combines a CNN, a BiLSTM and an attention layer to learn features of sequence pairs. Performance verification on two data sets showed that our encoding scheme represents sequence-pair information effectively and that CRISPR-IP outperforms other models. Data and source code are available at https://github.com/BioinfoVirgo/CRISPR-IP.
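The abstract describes function channels (which region a position belongs to) and type channels (bases versus base pairs) but does not give their exact layout. The encoding below is a simplified, hypothetical one in that spirit, not the published CRISPR-IP scheme. For each aligned position it records the union of the two bases' one-hot codes, a mismatch flag, and whether the position falls in the protospacer or the PAM. The 20 + 3 nt split and the 7-channel layout are assumptions.

```python
from typing import List

BASES = "ACGT"          # channel order for the base block (assumed)
PROTOSPACER_LEN = 20    # assumed: 20-nt protospacer followed by a 3-nt PAM


def encode_pair(sgrna: str, target: str) -> List[List[int]]:
    """Encode an aligned sgRNA/target pair as a (length x 7) matrix.

    Hypothetical channel layout:
      0-3  union of the one-hot codes of the sgRNA base and the target base
      4    type flag: 1 if the two bases differ (mismatched base pair)
      5    function flag: 1 if the position lies in the protospacer
      6    function flag: 1 if the position lies in the PAM
    Characters outside A/C/G/T raise ValueError (from str.index).
    """
    if len(sgrna) != len(target):
        raise ValueError("sequences must be aligned to equal length")
    rows = []
    for i, (a, b) in enumerate(zip(sgrna.upper(), target.upper())):
        row = [0] * 7
        row[BASES.index(a)] = 1
        row[BASES.index(b)] = 1
        row[4] = int(a != b)
        in_protospacer = i < PROTOSPACER_LEN
        row[5] = int(in_protospacer)
        row[6] = int(not in_protospacer)
        rows.append(row)
    return rows


if __name__ == "__main__":
    sgrna = "GACGCATAAAGATGAGACGCTGG"   # hypothetical on-target sequence
    target = "GACGCATAAAGATGTGACGCTGG"  # hypothetical site, A->T at position 15
    matrix = encode_pair(sgrna, target)
    print(matrix[14])  # [1, 0, 0, 1, 1, 1, 0]
```

Merging both bases into one 4-channel block keeps the input compact. A separate mismatch flag lets a model tell a matched base from a mismatched pair even when the union encoding alone would be ambiguous about which strand carried which base.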
30
Abstract
BACKGROUND A growing number of Cas9 variants with higher specificity are being developed to avoid off-target effects, producing a significant volume of experimental data. Conventional machine learning performs poorly on these datasets, while deep learning-based methods often lack interpretability, forcing researchers to trade off accuracy against interpretability. A method is needed that matches deep learning-based methods in performance while offering interpretability comparable to conventional machine learning methods. RESULTS To address these problems, we propose AttCRISPR, an intrinsically interpretable deep learning method for predicting on-target activity. AttCRISPR uses an ensemble learning strategy to stack available encoding-based and embedding-based methods with strong interpretability. Compared with state-of-the-art methods on the WT-SpCas9, eSpCas9(1.1) and SpCas9-HF1 datasets, AttCRISPR achieves average Spearman correlations of 0.872, 0.867 and 0.867, respectively, on several public datasets, outperforming these methods. Furthermore, benefiting from two attention modules, one spatial and one temporal, AttCRISPR has good interpretability: through these modules, we can understand its decisions at both global and local levels without additional post hoc explanation techniques. CONCLUSION With the trained models, we reveal the preference for each position-dependent nucleotide on the sgRNA (single guide RNA) sequence in each dataset at a global level. At a local level, we show that the interpretability of AttCRISPR can guide researchers in designing sgRNAs with higher activity.
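AttCRISPR's results are reported as Spearman values, that is, the rank correlation between predicted and measured on-target activity. Below is a minimal, library-free sketch of that metric, not the authors' evaluation code. It computes the Pearson correlation of the ranks and gives tied values their average rank. The example activity values are invented for illustration. In practice, `scipy.stats.spearmanr` computes the same coefficient.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values that starts at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of average ranks.

    Raises ZeroDivisionError if either input is constant (zero rank variance).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


if __name__ == "__main__":
    measured = [0.10, 0.35, 0.20, 0.80, 0.55]   # hypothetical activities
    predicted = [0.15, 0.25, 0.30, 0.70, 0.60]  # hypothetical predictions
    print(round(spearman(measured, predicted), 3))  # 0.9
```

Because only ranks enter the calculation, the coefficient rewards a model that orders sgRNAs correctly even if its predicted activity values are on a different scale from the measurements.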
Affiliation(s)
- Li-Ming Xiao
- School of Computer Science and Technology, East China Normal University, Shanghai, 200062, China
- Yun-Qi Wan
- School of Computer Science and Technology, East China Normal University, Shanghai, 200062, China
- Zhen-Ran Jiang
- School of Computer Science and Technology, East China Normal University, Shanghai, 200062, China.
31
Vinodkumar PK, Ozcinar C, Anbarjafari G. Prediction of sgRNA Off-Target Activity in CRISPR/Cas9 Gene Editing Using Graph Convolution Network. Entropy (Basel) 2021; 23:608. [PMID: 34069050 PMCID: PMC8156774 DOI: 10.3390/e23050608] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 02/28/2021] [Revised: 05/03/2021] [Accepted: 05/12/2021] [Indexed: 12/26/2022]
Abstract
CRISPR/Cas9 is a powerful genome-editing technology that has been widely applied in targeted gene repair and gene expression regulation. One of the main challenges for the CRISPR/Cas9 system is unexpected cleavage at some sites (off-targets), and predicting these sites is necessary given their relevance to gene-editing research. Very few deep learning models have been developed so far to predict the off-target propensity of single guide RNA (sgRNA) at specific DNA fragments; they rely on manual feature-extraction operations and machine learning techniques, a convoluted process that researchers find difficult to understand and implement. In this work, we introduce a novel graph-based approach to predict the off-target efficacy of sgRNA in the CRISPR/Cas9 system that is easy for researchers to understand and replicate. We construct a graph with sequences as nodes and use a link prediction method to predict the presence of links between sgRNAs and off-target-inducing target DNA sequences. Features for each sequence are extracted from the sequence itself. We used HEK293 and K562 datasets in our experiments. The graph convolutional network (GCN) predicted off-target gene knockouts by predicting links between sgRNAs and off-target sequences, with an auROC of 0.987.
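The GCN's link-prediction performance is reported as auROC. Below is a hedged, library-free sketch, not the authors' pipeline. Over a set of scored candidate sgRNA/target links, AUROC equals the fraction of positive-negative pairs in which the true link scores higher, with ties counted as one half. The labels and scores are hypothetical. `sklearn.metrics.roc_auc_score` returns the same value for binary labels.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the Mann-Whitney U statistic.

    labels: 1 for a true sgRNA/off-target link, 0 for a non-link.
    scores: predicted link scores; higher means more likely a link.
    Ties between a positive and a negative score count as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    labels = [1, 1, 0, 0, 1, 0]               # hypothetical link labels
    scores = [0.9, 0.8, 0.3, 0.6, 0.4, 0.1]   # hypothetical link scores
    print(round(auroc(labels, scores), 3))  # 0.889
```

The pairwise loop costs O(P x N), which is acceptable for a sketch. Rank-based implementations scale better on large candidate-link sets.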
Affiliation(s)
- Cagri Ozcinar
- iCV Lab, Institute of Technology, University of Tartu, 51009 Tartu, Estonia; (P.K.V.); (C.O.)
- Gholamreza Anbarjafari
- iCV Lab, Institute of Technology, University of Tartu, 51009 Tartu, Estonia; (P.K.V.); (C.O.)
- PwC Advisory Finland, 00180 Helsinki, Finland
32
Zhang J, Khazalwa EM, Abkallo HM, Zhou Y, Nie X, Ruan J, Zhao C, Wang J, Xu J, Li X, Zhao S, Zuo E, Steinaa L, Xie S. The advancements, challenges, and future implications of the CRISPR/Cas9 system in swine research. J Genet Genomics 2021; 48:347-360. [PMID: 34144928 DOI: 10.1016/j.jgg.2021.03.015] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/28/2020] [Revised: 03/10/2021] [Accepted: 03/13/2021] [Indexed: 12/11/2022]
Abstract
Clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated protein 9 (CRISPR/Cas9) genome editing technology has dramatically influenced swine research by enabling the production of high-quality disease-resistant pig breeds, thus improving yields. In addition, CRISPR/Cas9 has been used extensively in pigs as one of the tools in biomedical research. In this review, we present the advancements of the CRISPR/Cas9 system in swine research, such as animal breeding, vaccine development, xenotransplantation, and disease modeling. We also highlight the current challenges and some potential applications of the CRISPR/Cas9 technologies.
Affiliation(s)
- Jinfu Zhang
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education & Key Lab of Swine Genetics and Breeding of Ministry of Agriculture and Rural Affairs, Huazhong Agricultural University, Wuhan 430070, PR China
- Emmanuel M Khazalwa
- Animal and Human Health Program, Biosciences, International Livestock Research Institute (ILRI), P.O. Box 30709, Nairobi 00100, Kenya
- Hussein M Abkallo
- Animal and Human Health Program, Biosciences, International Livestock Research Institute (ILRI), P.O. Box 30709, Nairobi 00100, Kenya
- Yuan Zhou
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education & Key Lab of Swine Genetics and Breeding of Ministry of Agriculture and Rural Affairs, Huazhong Agricultural University, Wuhan 430070, PR China
- Xiongwei Nie
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education & Key Lab of Swine Genetics and Breeding of Ministry of Agriculture and Rural Affairs, Huazhong Agricultural University, Wuhan 430070, PR China
- Jinxue Ruan
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education & Key Lab of Swine Genetics and Breeding of Ministry of Agriculture and Rural Affairs, Huazhong Agricultural University, Wuhan 430070, PR China
- Changzhi Zhao
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education & Key Lab of Swine Genetics and Breeding of Ministry of Agriculture and Rural Affairs, Huazhong Agricultural University, Wuhan 430070, PR China
- Jieru Wang
- Key Laboratory of Pig Molecular Quantitative Genetics of Anhui Academy of Agricultural Sciences, Livestock and Poultry Epidemic Diseases Research Center of Anhui Province, Anhui Provincial Key Laboratory of Livestock and Poultry Product Safety Engineering, Institute of Animal Husbandry and Veterinary Medicine, Anhui Academy of Agricultural Sciences, Hefei 230031, PR China
- Jing Xu
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education & Key Lab of Swine Genetics and Breeding of Ministry of Agriculture and Rural Affairs, Huazhong Agricultural University, Wuhan 430070, PR China
- Xinyun Li
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education & Key Lab of Swine Genetics and Breeding of Ministry of Agriculture and Rural Affairs, Huazhong Agricultural University, Wuhan 430070, PR China; The Cooperative Innovation Center for Sustainable Pig Production, Huazhong Agricultural University, Wuhan 430070, PR China
- Shuhong Zhao
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education & Key Lab of Swine Genetics and Breeding of Ministry of Agriculture and Rural Affairs, Huazhong Agricultural University, Wuhan 430070, PR China; The Cooperative Innovation Center for Sustainable Pig Production, Huazhong Agricultural University, Wuhan 430070, PR China
- Erwei Zuo
- Lingnan Guangdong Laboratory of Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, PR China.
- Lucilla Steinaa
- Animal and Human Health Program, Biosciences, International Livestock Research Institute (ILRI), P.O. Box 30709, Nairobi 00100, Kenya.
- Shengsong Xie
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education & Key Lab of Swine Genetics and Breeding of Ministry of Agriculture and Rural Affairs, Huazhong Agricultural University, Wuhan 430070, PR China; Animal and Human Health Program, Biosciences, International Livestock Research Institute (ILRI), P.O. Box 30709, Nairobi 00100, Kenya; The Cooperative Innovation Center for Sustainable Pig Production, Huazhong Agricultural University, Wuhan 430070, PR China.
33
Abstract
Inherited retinal degenerations (IRDs) are a leading cause of blindness. Although gene-supplementation therapies have been developed, they are available for only a small proportion of recessive IRD mutations. In contrast, genome editing using clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated (Cas) systems could provide alternative therapeutic avenues for treating a wide range of genetic retinal diseases through targeted knockdown or correction of mutant alleles. Progress in this rapidly evolving field is highlighted by the recent Food and Drug Administration approval of a clinical trial for EDIT-101 (Editas Medicine, Inc., Cambridge, MA), which has demonstrated efficacious genome editing in a mouse model of CEP290-associated Leber congenital amaurosis and safety in nonhuman primates. Nonetheless, significant challenges remain in developing clinically viable retinal genome-editing therapies. In particular, IRD-causing mutations occur in more than 200 known genes, with considerable heterogeneity in mutation type and position within each gene. There are also remaining safety concerns over long-term expression of Cas9 in vivo. This review highlights (i) technological advances in gene editing, (ii) major safety concerns associated with retinal genome editing, and (iii) potential strategies for overcoming these challenges to develop clinical therapies.
Affiliation(s)
- Joel Quinn
- Nuffield Laboratory of Ophthalmology, Nuffield Department of Clinical Neurosciences, University of Oxford, Oxford, United Kingdom
- Ayesha Musa
- Nuffield Laboratory of Ophthalmology, Nuffield Department of Clinical Neurosciences, University of Oxford, Oxford, United Kingdom
- Ariel Kantor
- Nuffield Laboratory of Ophthalmology, Nuffield Department of Clinical Neurosciences, University of Oxford, Oxford, United Kingdom
- Michelle E. McClements
- Nuffield Laboratory of Ophthalmology, Nuffield Department of Clinical Neurosciences, University of Oxford, Oxford, United Kingdom
- Jasmina Cehajic-Kapetanovic
- Nuffield Laboratory of Ophthalmology, Nuffield Department of Clinical Neurosciences, University of Oxford, Oxford, United Kingdom
- Oxford Eye Hospital, Oxford University Hospitals NHS Foundation Trust, Oxford, United Kingdom
- Robert E. MacLaren
- Nuffield Laboratory of Ophthalmology, Nuffield Department of Clinical Neurosciences, University of Oxford, Oxford, United Kingdom
- Oxford Eye Hospital, Oxford University Hospitals NHS Foundation Trust, Oxford, United Kingdom
- Kanmin Xue
- Nuffield Laboratory of Ophthalmology, Nuffield Department of Clinical Neurosciences, University of Oxford, Oxford, United Kingdom
- Oxford Eye Hospital, Oxford University Hospitals NHS Foundation Trust, Oxford, United Kingdom
- Correspondence: Dr. Kanmin Xue, Nuffield Laboratory of Ophthalmology, Nuffield Department of Clinical Neurosciences, University of Oxford, Oxford, United Kingdom.
34
Abstract
Human creativity has led to machines that outperform human capabilities in workload, effectiveness, precision, endurance, strength, and repetitiveness. Building such machines has always been both a vision and a way to transcend human limits and give more meaning to life. The common denominator of all these creations was that they were meant to replace, enhance, or go beyond the mechanical capabilities of the human body. The story took a new turn in 1950, when Alan Turing introduced the concept of a machine that could think. Artificial intelligence, coined as a term in 1956, describes the use of computers to imitate intelligence and critical thinking comparable to those of humans. However, the revolution began in 1943 with artificial neural networks, an attempt to exploit the architecture of the human brain to perform tasks at which conventional algorithms had little success. Artificial intelligence is becoming a research focus and a tool of strategic value, and the same holds in healthcare. In this manuscript, we address key questions about artificial intelligence in medicine: what artificial intelligence is and how it works, what value it offers in medical applications, and what its prospects are.
Affiliation(s)
- Andreas Larentzakis
- First Department of Propaedeutic Surgery, Athens Medical School, National and Kapodistrian University of Athens, Hippocration General Athens Hospital, Athens, Greece
- Nik Lygeros
- Laboratoire de Génie des Procédés Catalytiques, Centre National de la Recherche Scientifique/École Supérieure de Chimie Physique Électronique, Lyon, France
35
Störtz F, Minary P. crisprSQL: a novel database platform for CRISPR/Cas off-target cleavage assays. Nucleic Acids Res 2021; 49:D855-D861. [PMID: 33084893 PMCID: PMC7778913 DOI: 10.1093/nar/gkaa885] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 08/14/2020] [Revised: 09/23/2020] [Accepted: 10/17/2020] [Indexed: 12/20/2022]
Abstract
With ongoing development of the CRISPR/Cas programmable nuclease system, in vivo therapeutic gene editing is increasingly within reach. However, non-negligible off-target effects remain a major concern for clinical applications. Although many off-target cleavage datasets have been published, no comprehensive, transparent overview tool has yet been established. Here, we present crisprSQL (http://www.crisprsql.com), an interactive and bioinformatically enhanced collection of CRISPR/Cas9 off-target cleavage studies aimed at enriching the fields of cleavage profiling, gene-editing safety analysis and transcriptomics. The current version of crisprSQL contains cleavage data from 144 guide RNAs on 25,632 guide-target pairs from human and rodent cell lines, with interaction-specific references to epigenetic markers and gene names. As the first curated database of this standard, it promises to enhance safety quantification research, inform experiment design and fuel the development of computational off-target prediction algorithms.
Affiliation(s)
- Florian Störtz
- Department of Computer Science, University of Oxford, Parks Road, Oxford OX1 3QD, UK
- Peter Minary
- Department of Computer Science, University of Oxford, Parks Road, Oxford OX1 3QD, UK