1
Zhu W, Xie H, Chen Y, Zhang G. CrnnCrispr: An Interpretable Deep Learning Method for CRISPR/Cas9 sgRNA On-Target Activity Prediction. Int J Mol Sci 2024; 25:4429. [PMID: 38674012 PMCID: PMC11050447 DOI: 10.3390/ijms25084429] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/12/2024] [Revised: 04/11/2024] [Accepted: 04/15/2024] [Indexed: 04/28/2024] Open
Abstract
CRISPR/Cas9 is a powerful genome-editing tool in biology, but its wide applications are challenged by a lack of knowledge governing single-guide RNA (sgRNA) activity. Several deep-learning-based methods have been developed for the prediction of on-target activity. However, there is still room for improvement. Here, we proposed a hybrid neural network named CrnnCrispr, which integrates a convolutional neural network and a recurrent neural network for on-target activity prediction. We performed unbiased experiments with four mainstream methods on nine public datasets with varying sample sizes. Additionally, we incorporated a transfer learning strategy to boost the prediction power on small-scale datasets. Our results showed that CrnnCrispr outperformed existing methods in terms of accuracy and generalizability. Finally, we applied a visualization approach to investigate the generalizable nucleotide-position-dependent patterns of sgRNAs for on-target activity, which shows potential in terms of model interpretability and further helps in understanding the principles of sgRNA design.
Affiliation(s)
- Guishan Zhang
- College of Engineering, Shantou University, Shantou 515063, China; (W.Z.); (H.X.); (Y.C.)
2
Liu Y, Gong Q. Deep Learning Models for Predicting Hearing Thresholds Based on Swept-Tone Stimulus-Frequency Otoacoustic Emissions. Ear Hear 2024; 45:465-475. [PMID: 37990395 DOI: 10.1097/aud.0000000000001443] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2023]
Abstract
OBJECTIVES: This study aims to develop deep learning (DL) models for the quantitative prediction of hearing thresholds based on stimulus-frequency otoacoustic emissions (SFOAEs) evoked by swept tones.
DESIGN: A total of 174 ears with normal hearing and 388 ears with sensorineural hearing loss were studied. SFOAEs in the 0.3 to 4.3 kHz frequency range were recorded using linearly swept tones at a rate of 2 Hz/msec, with the stimulus level changing from 40 to 60 dB SPL in 10 dB steps. Four DL models were used to predict hearing thresholds at octave frequencies from 0.5 to 4 kHz. The four models were a conventional convolutional neural network (CNN), a hybrid CNN-k-nearest neighbor (KNN), a hybrid CNN-support vector machine (SVM), and a hybrid CNN-random forest (RF); each was built individually for each frequency. The input to the DL models was the measured raw SFOAE amplitude spectra and their corresponding signal-to-noise ratio spectra. All DL models shared a CNN-based feature self-extractor. They differed in that the conventional CNN used a fully connected layer to make the final regression decision, whereas the hybrid CNN-KNN, CNN-SVM, and CNN-RF models replaced the last fully connected layer of the CNN with a traditional machine learning (ML) regressor, that is, KNN, SVM, and RF, respectively. Model performance was evaluated using mean absolute error and SE averaged over 20 repetitions of 5 × 5-fold nested cross-validation. The performance of the proposed DL models was compared with two types of traditional ML models.
RESULTS: The proposed SFOAE-based DL models resulted in an optimal mean absolute error of 5.98, 5.22, 5.51, and 6.06 dB at 0.5, 1, 2, and 4 kHz, respectively, superior to that obtained by the traditional ML models. The corresponding SEs were 8.55, 7.27, 7.58, and 7.95 dB at 0.5, 1, 2, and 4 kHz, respectively. All the DL models outperformed any of the traditional ML models.
CONCLUSIONS: The proposed swept-tone SFOAE-based DL models were capable of quantitatively predicting hearing thresholds with satisfactory performance. With DL techniques, the underlying relationship between SFOAEs and hearing thresholds at disparate frequencies was explored and captured, potentially improving the diagnostic value of SFOAEs.
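The evaluation scheme named in this abstract, mean absolute error under 5 × 5-fold nested cross-validation, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names (`k_fold_indices`, `nested_cv_splits`, `mean_absolute_error`) are hypothetical, the sample count is arbitrary, and model fitting is left out entirely.

```python
import random


def k_fold_indices(indices, k, rng):
    """Shuffle the indices and deal them into k nearly equal folds."""
    shuffled = list(indices)
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def nested_cv_splits(n_samples, outer_k=5, inner_k=5, seed=0):
    """Yield (train_idx, val_idx, test_idx) for every outer/inner fold pair.

    Hypothetical helper: the outer loop holds out a test fold, the inner
    loop splits the remaining samples for model selection.
    """
    rng = random.Random(seed)
    outer_folds = k_fold_indices(range(n_samples), outer_k, rng)
    for o, test_idx in enumerate(outer_folds):
        # Everything outside the outer test fold is available for tuning.
        remaining = [i for j, fold in enumerate(outer_folds) if j != o for i in fold]
        inner_folds = k_fold_indices(remaining, inner_k, rng)
        for v, val_idx in enumerate(inner_folds):
            train_idx = [i for j, fold in enumerate(inner_folds) if j != v for i in fold]
            yield train_idx, val_idx, test_idx


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# One repetition produces 5 outer x 5 inner = 25 splits.
print(len(list(nested_cv_splits(n_samples=50))))
```

Repeating the procedure with different `seed` values and averaging the outer-fold errors would correspond to the "20 repetitions" the abstract mentions; how the authors seeded or stratified their folds is not stated there.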
Affiliation(s)
- Yin Liu
- Department of Biomedical Engineering, School of Medicine, Tsinghua University, Beijing, China
- Qin Gong
- Department of Biomedical Engineering, School of Medicine, Tsinghua University, Beijing, China
- School of Medicine, Shanghai University, Shanghai, China
3
Zhong Z, Li Z, Yang J, Wang Q. Unified Model to Predict gRNA Efficiency across Diverse Cell Lines and CRISPR-Cas9 Systems. J Chem Inf Model 2023; 63:7320-7329. [PMID: 37983481 DOI: 10.1021/acs.jcim.3c01339] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2023]
Abstract
Computationally predicting the efficiency of a guide RNA (gRNA) from its sequence is crucial to designing the CRISPR-Cas9 system. Currently, machine learning (ML)-based models are widely used for such predictions. However, these ML models often show performance imbalance when applied to multiple data sets from diverse sources, hindering the practical utilization of these tools. To address this issue, we propose a Michaelis-Menten theoretical framework that integrates information from multiple data sets. We demonstrate that the binding free energy can serve as a useful invariant that bridges the data from different experimental setups. Building upon this framework, we develop a new ML model called Uni-deepSG. This model exhibits broad applicability on 27 data sets with different cell types, Cas9 variants, and gRNA designs. Our work confirms the existence of a generalized model for predicting gRNA efficiency and lays the theoretical groundwork necessary to finalize such a model.
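For orientation, the textbook relations behind a Michaelis-Menten treatment and a binding free energy are reproduced below. The abstract does not give the paper's own equations, so this is only the standard form, with assumed symbols: $v$ is the reaction rate, $[S]$ the substrate concentration, $V_{\max}$ the maximal rate, $K_M$ the Michaelis constant, $K_d$ the dissociation constant, and $c^{\circ}$ the standard concentration.

```latex
% Standard Michaelis-Menten rate law (textbook form, not taken from the paper)
v = \frac{V_{\max}\,[S]}{K_M + [S]}

% Standard binding free energy in terms of the dissociation constant
\Delta G^{\circ}_{\mathrm{bind}} = RT \ln\!\left(\frac{K_d}{c^{\circ}}\right)
```

A more negative $\Delta G^{\circ}_{\mathrm{bind}}$ corresponds to a smaller $K_d$, that is, tighter binding. How Uni-deepSG maps measured efficiencies onto such a quantity is not specified in the abstract.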
Affiliation(s)
- Zhicheng Zhong
- Department of Physics, University of Science and Technology of China, Hefei 230026, Anhui, China
- Zeying Li
- Department of Physics, University of Science and Technology of China, Hefei 230026, Anhui, China
- Jie Yang
- Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
- Qian Wang
- Department of Physics, University of Science and Technology of China, Hefei 230026, Anhui, China
4
Noshay J, Walker T, Alexander W, Klingeman D, Romero J, Walker A, Prates E, Eckert C, Irle S, Kainer D, Jacobson D. Quantum biological insights into CRISPR-Cas9 sgRNA efficiency from explainable-AI driven feature engineering. Nucleic Acids Res 2023; 51:10147-10161. [PMID: 37738140 PMCID: PMC10602897 DOI: 10.1093/nar/gkad736] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2022] [Revised: 08/07/2023] [Accepted: 08/29/2023] [Indexed: 09/24/2023] Open
Abstract
CRISPR-Cas9 tools have transformed genetic manipulation capabilities in the laboratory. Empirical rules-of-thumb have been developed for only a narrow range of model organisms, and mechanistic underpinnings for sgRNA efficiency remain poorly understood. This work establishes a novel feature set and new public resource, produced with quantum chemical tensors, for interpreting and predicting sgRNA efficiency. Feature engineering for sgRNA efficiency is performed using an explainable-artificial intelligence model: iterative Random Forest (iRF). By encoding quantitative attributes of position-specific sequences for Escherichia coli sgRNAs, we identify important traits for sgRNA design in bacterial species. Additionally, we show that expanding positional encoding to quantum descriptors of base-pair, dimer, trimer, and tetramer sequences captures intricate interactions in local and neighboring nucleotides of the target DNA. These features highlight variation in CRISPR-Cas9 sgRNA dynamics between E. coli and H. sapiens genomes. These novel encodings of sgRNAs enhance our understanding of the elaborate quantum biological processes involved in CRISPR-Cas9 machinery.
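The position-specific encoding of single bases, dimers, trimers, and tetramers described in this abstract can be illustrated with a small sketch. It is only an assumption-laden stand-in: the paper attaches quantum chemical descriptors to each k-mer, whereas the code below emits plain binary indicators, and the function names and feature-key format are hypothetical.

```python
def positional_kmers(seq, k):
    """Return (position, k-mer) pairs for every window of length k in seq."""
    seq = seq.upper()
    return [(pos, seq[pos:pos + k]) for pos in range(len(seq) - k + 1)]


def positional_kmer_features(seq, max_k=4):
    """Build a sparse indicator dict keyed by position and k-mer.

    Hypothetical encoding: a value of 1 marks that a given k-mer occurs at a
    given position. A descriptor-based encoding would replace the 1 with
    physical or chemical properties of that k-mer.
    """
    features = {}
    for k in range(1, max_k + 1):
        for pos, kmer in positional_kmers(seq, k):
            features[f"pos{pos}_{kmer}"] = 1
    return features


# Dimers of a toy 4-nt sequence, then the feature count for a 20-nt guide.
print(positional_kmers("ACGT", 2))
print(len(positional_kmer_features("ACGTACGTACGTACGTACGT")))
```

For a 20-nt sequence this yields 20 + 19 + 18 + 17 = 74 position-specific features, one per window at each k from 1 to 4.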
Affiliation(s)
- Jaclyn M Noshay
- Computational and Predictive Biology, Biosciences, Oak Ridge National Laboratory, Oak Ridge, TN, USA
- Tyler Walker
- Bredesen Center for Interdisciplinary Research and Graduate Education, University of Tennessee-Knoxville, Knoxville, TN, USA
- William G Alexander
- Synthetic Biology, Biosciences, Oak Ridge National Laboratory, Oak Ridge, TN, USA
- Dawn M Klingeman
- Synthetic Biology, Biosciences, Oak Ridge National Laboratory, Oak Ridge, TN, USA
- Jonathon Romero
- Bredesen Center for Interdisciplinary Research and Graduate Education, University of Tennessee-Knoxville, Knoxville, TN, USA
- Angelica M Walker
- Bredesen Center for Interdisciplinary Research and Graduate Education, University of Tennessee-Knoxville, Knoxville, TN, USA
- Erica Prates
- Computational and Predictive Biology, Biosciences, Oak Ridge National Laboratory, Oak Ridge, TN, USA
- Carrie Eckert
- Synthetic Biology, Biosciences, Oak Ridge National Laboratory, Oak Ridge, TN, USA
- Stephan Irle
- Computational Sciences and Engineering, Oak Ridge National Laboratory, Oak Ridge, TN, USA
- David Kainer
- Computational and Predictive Biology, Biosciences, Oak Ridge National Laboratory, Oak Ridge, TN, USA
- Daniel A Jacobson
- Computational and Predictive Biology, Biosciences, Oak Ridge National Laboratory, Oak Ridge, TN, USA
5
Zhang G, Luo Y, Dai X, Dai Z. Benchmarking deep learning methods for predicting CRISPR/Cas9 sgRNA on- and off-target activities. Brief Bioinform 2023; 24:bbad333. [PMID: 37775147 DOI: 10.1093/bib/bbad333] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 01/27/2023] [Revised: 08/31/2023] [Accepted: 09/04/2023] [Indexed: 10/01/2023] Open
Abstract
In silico design of single guide RNA (sgRNA) plays a critical role in the clustered regularly interspaced short palindromic repeats/CRISPR-associated protein 9 (CRISPR/Cas9) system. Continuous efforts are aimed at improving sgRNA design with efficient on-target activity and reduced off-target mutations. In the last 5 years, an increasing number of deep learning-based methods have achieved breakthrough performance in predicting sgRNA on- and off-target activities. Nevertheless, it is worthwhile to systematically evaluate these methods for their predictive abilities. In this review, we conducted a systematic survey on the progress in prediction of on- and off-target editing. We investigated the performances of 10 mainstream deep learning-based on-target predictors using nine public datasets with different sample sizes. We found that in most scenarios, these methods showed greater predictive power on large- and medium-scale datasets than on small-scale datasets. In addition, we performed unbiased experiments to provide an in-depth comparison of eight representative approaches for off-target prediction on 12 publicly available datasets with various imbalanced ratios of positive/negative samples. Most methods showed excellent performance on balanced datasets but had much room for improvement on moderately and severely imbalanced datasets. This study provides comprehensive perspectives on CRISPR/Cas9 sgRNA on- and off-target activity prediction and improvement for method development.
Affiliation(s)
- Guishan Zhang
- College of Engineering, Shantou University, Shantou 515063, China
- Ye Luo
- College of Engineering, Shantou University, Shantou 515063, China
- Xianhua Dai
- School of Cyber Science and Technology, Sun Yat-sen University, Shenzhen 518107, China
- Southern Marine Science and Engineering Guangdong Laboratory, Zhuhai 519000, China
- Zhiming Dai
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510006, China
- Guangdong Province Key Laboratory of Big Data Analysis and Processing, Sun Yat-sen University, Guangzhou 510006, China
6
Ham DT, Browne TS, Banglorewala PN, Wilson TL, Michael RK, Gloor GB, Edgell DR. A generalizable Cas9/sgRNA prediction model using machine transfer learning with small high-quality datasets. Nat Commun 2023; 14:5514. [PMID: 37679324 PMCID: PMC10485023 DOI: 10.1038/s41467-023-41143-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/22/2023] [Accepted: 08/24/2023] [Indexed: 09/09/2023] Open
Abstract
The CRISPR/Cas9 nuclease from Streptococcus pyogenes (SpCas9) can be used with single guide RNAs (sgRNAs) as a sequence-specific antimicrobial agent and as a genome-engineering tool. However, current bacterial sgRNA activity models struggle with accurate predictions and do not generalize well, possibly because the underlying datasets used to train the models do not accurately measure SpCas9/sgRNA activity and cannot distinguish on-target cleavage from toxicity. Here, we solve this problem by using a two-plasmid positive selection system to generate high-quality data that more accurately reports on SpCas9/sgRNA cleavage and that separates activity from toxicity. We develop a machine learning architecture (crisprHAL) that can be trained on existing datasets, that shows marked improvements in sgRNA activity prediction accuracy when transfer learning is used with small amounts of high-quality data, and that can generalize predictions to different bacteria. The crisprHAL model recapitulates known SpCas9/sgRNA-target DNA interactions and provides a pathway to a generalizable sgRNA bacterial activity prediction tool that will enable accurate antimicrobial and genome engineering applications.
Affiliation(s)
- Dalton T Ham
- Department of Biochemistry, Schulich School of Medicine and Dentistry, London, ON, N6A5C1, Canada
- Tyler S Browne
- Department of Biochemistry, Schulich School of Medicine and Dentistry, London, ON, N6A5C1, Canada
- Pooja N Banglorewala
- Department of Biochemistry, Schulich School of Medicine and Dentistry, London, ON, N6A5C1, Canada
- Gregory B Gloor
- Department of Biochemistry, Schulich School of Medicine and Dentistry, London, ON, N6A5C1, Canada.
- David R Edgell
- Department of Biochemistry, Schulich School of Medicine and Dentistry, London, ON, N6A5C1, Canada.
7
Lee M. Deep learning in CRISPR-Cas systems: a review of recent studies. Front Bioeng Biotechnol 2023; 11:1226182. [PMID: 37469443 PMCID: PMC10352112 DOI: 10.3389/fbioe.2023.1226182] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 05/24/2023] [Accepted: 06/22/2023] [Indexed: 07/21/2023] Open
Abstract
In genetic engineering, the revolutionary CRISPR-Cas system has proven to be a vital tool for precise genome editing. Simultaneously, the emergence and rapid evolution of deep learning methodologies has provided an impetus to the scientific exploration of genomic data. These concurrent advancements mandate regular investigation of the state-of-the-art, particularly given the pace of recent developments. This review focuses on the significant progress achieved during 2019-2023 in the utilization of deep learning for predicting guide RNA (gRNA) activity in the CRISPR-Cas system, a key element determining the effectiveness and specificity of genome editing procedures. In this paper, an analytical overview of contemporary research is provided, with emphasis placed on the amalgamation of artificial intelligence and genetic engineering. The importance of our review is underscored by the necessity to comprehend the rapidly evolving deep learning methodologies and their potential impact on the effectiveness of the CRISPR-Cas system. By analyzing recent literature, this review highlights the achievements and emerging trends in the integration of deep learning with the CRISPR-Cas systems, thus contributing to the future direction of this essential interdisciplinary research area.
8
Sherkatghanad Z, Abdar M, Charlier J, Makarenkov V. Using traditional machine learning and deep learning methods for on- and off-target prediction in CRISPR/Cas9: a review. Brief Bioinform 2023; 24:7130974. [PMID: 37080758 DOI: 10.1093/bib/bbad131] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 11/02/2022] [Revised: 03/07/2023] [Accepted: 03/13/2023] [Indexed: 04/22/2023] Open
Abstract
CRISPR/Cas9 (Clustered Regularly Interspaced Short Palindromic Repeats and CRISPR-associated protein 9) is a popular and effective two-component technology used for targeted genetic manipulation. It is currently the most versatile and accurate method of gene and genome editing, which benefits from a large variety of practical applications. For example, in biomedicine, it has been used in research related to cancer, virus infections, pathogen detection, and genetic diseases. Current CRISPR/Cas9 research is based on data-driven models for on- and off-target prediction as a cleavage may occur at non-target sequence locations. Nowadays, conventional machine learning and deep learning methods are applied on a regular basis to accurately predict on-target knockout efficacy and off-target profile of given single-guide RNAs (sgRNAs). In this paper, we present an overview and a comparative analysis of traditional machine learning and deep learning models used in CRISPR/Cas9. We highlight the key research challenges and directions associated with target activity prediction. We discuss recent advances in the sgRNA-DNA sequence encoding used in state-of-the-art on- and off-target prediction models. Furthermore, we present the most popular deep learning neural network architectures used in CRISPR/Cas9 prediction models. Finally, we summarize the existing challenges and discuss possible future investigations in the field of on- and off-target prediction. Our paper provides valuable support for academic and industrial researchers interested in the application of machine learning methods in the field of CRISPR/Cas9 genome editing.
Affiliation(s)
- Zeinab Sherkatghanad
- Département d'Informatique, Université du Québec à Montréal, H2X 3Y7, Montreal, QC, Canada
- Moloud Abdar
- Institute for Intelligent Systems Research and Innovation (IISRI), Deakin University, 3216, Geelong, VIC, Australia
- Jeremy Charlier
- Département d'Informatique, Université du Québec à Montréal, H2X 3Y7, Montreal, QC, Canada
- Vladimir Makarenkov
- Département d'Informatique, Université du Québec à Montréal, H2X 3Y7, Montreal, QC, Canada
9
Patra P, B R D, Kundu P, Das M, Ghosh A. Recent advances in machine learning applications in metabolic engineering. Biotechnol Adv 2023; 62:108069. [PMID: 36442697 DOI: 10.1016/j.biotechadv.2022.108069] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 07/17/2022] [Revised: 10/18/2022] [Accepted: 11/22/2022] [Indexed: 11/27/2022]
Abstract
Metabolic engineering encompasses several widely used strategies, which currently hold a prominent place in biotechnology as their potential manifests through a plethora of research and commercial products with a strong societal impact. The genomic revolution that occurred almost three decades ago initiated the generation of large omics datasets, which have helped in gaining a better understanding of cellular behavior. Metabolic engineering built on these large datasets has allowed researchers to gain detailed insights and a reasonable understanding of the intricacies of biosystems. However, the existing trial-and-error approaches for metabolic engineering are laborious and time-intensive when it comes to the production of target compounds with high yields through genetic manipulations in host organisms. Machine learning (ML) coupled with the available metabolic engineering test instances and omics data brings a comprehensive and multidisciplinary approach that enables scientists to evaluate various parameters for effective strain design. This vast amount of biological data should be standardized through knowledge engineering to train different ML models for providing accurate predictions in gene circuit design, modification of proteins, optimization of bioprocess parameters for scaling up, and screening of hyper-producing robust cell factories. This review briefly introduces the premise of ML, then surveys various ML methods and algorithms alongside the numerous omics datasets available to train ML models for predicting metabolic outcomes with high accuracy. The combinative interplay between the ML algorithms and biological datasets through knowledge engineering has guided the recent advancements in applications such as CRISPR/Cas systems, gene circuits, protein engineering, metabolic pathway reconstruction, and bioprocess engineering. Finally, this review addresses the probable challenges of applying ML in metabolic engineering, which will guide researchers toward novel techniques to overcome the limitations.
Affiliation(s)
- Pradipta Patra
- School of Energy Science and Engineering, Indian Institute of Technology Kharagpur, West Bengal 721302, India
- Disha B R
- B.M.S College of Engineering, Basavanagudi, Bengaluru, Karnataka 560019, India
- Pritam Kundu
- School of Energy Science and Engineering, Indian Institute of Technology Kharagpur, West Bengal 721302, India
- Manali Das
- School of Bioscience, Indian Institute of Technology Kharagpur, West Bengal 721302, India
- Amit Ghosh
- School of Energy Science and Engineering, Indian Institute of Technology Kharagpur, West Bengal 721302, India; P.K. Sinha Centre for Bioenergy and Renewables, Indian Institute of Technology Kharagpur, West Bengal 721302, India.
10
Integration of CRISPR/Cas9 with artificial intelligence for improved cancer therapeutics. J Transl Med 2022; 20:534. [PMID: 36401282 PMCID: PMC9673220 DOI: 10.1186/s12967-022-03765-1] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 10/06/2022] [Accepted: 11/08/2022] [Indexed: 11/19/2022] Open
Abstract
Gene editing has great potential in treating diseases caused by well-characterized molecular alterations. The introduction of clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated protein 9 (Cas9)–based gene-editing tools has substantially improved the precision and efficiency of gene editing. The CRISPR/Cas9 system offers several advantages over the existing gene-editing approaches, such as its ability to target practically any genomic sequence, enabling the rapid development and deployment of novel CRISPR-mediated knock-out/knock-in methods. CRISPR/Cas9 has been widely used to develop cancer models, validate essential genes as druggable targets, study drug-resistance mechanisms, explore gene non-coding areas, and develop biomarkers. CRISPR gene editing can create more-effective chimeric antigen receptor (CAR)-T cells that are durable, cost-effective, and more readily available. However, further research is needed to define the CRISPR/Cas9 system’s pros and cons, establish best practices, and determine social and ethical implications. This review summarizes recent CRISPR/Cas9 developments, particularly in cancer research and immunotherapy, and the potential of CRISPR/Cas9-based screening in developing cancer precision medicine and engineering models for targeted cancer therapy, highlighting the existing challenges and future directions. Lastly, we highlight the role of artificial intelligence in refining the CRISPR system's on-target and off-target effects, a critical factor for the broader application in cancer therapeutics.
11
Zhang Z, Li Y, Li Y. Prediction approach of larch wood density from visible-near-infrared spectroscopy based on parameter calibrating and transfer learning. Front Plant Sci 2022; 13:1006292. [PMID: 36267936 PMCID: PMC9577256 DOI: 10.3389/fpls.2022.1006292] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/29/2022] [Accepted: 09/20/2022] [Indexed: 06/16/2023]
Abstract
Wood density, a key indicator of wood properties, is of great significance for enhancing wood utilization and modifying wood properties in sustainable forest management. Visible-near-infrared (Vis-NIR) spectroscopy provides a feasible solution for obtaining wood density owing to its efficiency and non-destructiveness. However, the spectral responses differ in wood products with different moisture content conditions, and changes in external factors may cause the regression model to fail. Although some calibration transfer methods and convolutional neural network (CNN)-based deep transfer learning methods have been proposed, the generalization ability and prediction accuracy of the models still need to be improved. To predict wood density from Vis-NIR spectra under different moisture contents, a deep transfer learning hybrid method with automatic calibration capability (Resnet1D-SVR-TrAdaBoost.R2) was proposed in this study. The method avoids the overfitting that CNNs suffer on small-sample data and accounts for the complex external factors in actual production to enhance feature extraction and transfer between samples. The method was evaluated on a larch dataset with different moisture content conditions. The hybrid method achieved the best prediction results across different target-domain calibration samples and moisture contents, and it outperformed the traditional calibration transfer and transfer learning methods. In particular, the hybrid model achieved an improvement of about 0.1 in both R² and root mean square error (RMSE) values compared to the support vector regression model transferred by the piecewise direct standardization method (SVR+PDS), which has the best performance among traditional calibration methods. To further ascertain the generalizability of the hybrid model, the model was validated with samples collected from mixed moisture contents as the target domain. Various experiments demonstrated that the Resnet1D-SVR-TrAdaBoost.R2 model could predict larch wood density with high generalization ability and accuracy but was computationally expensive. It showed the potential to be extended to predict other metrics of wood.
Affiliation(s)
- Zheyu Zhang
- College of Engineering and Technology, Northeast Forestry University, Harbin, China
- Yaoxiang Li
- College of Engineering and Technology, Northeast Forestry University, Harbin, China
- Ying Li
- College of Energy and Transportation Engineering, Inner Mongolia Agricultural University, Hohhot, China
12
Yu M, Zheng H, Xu D, Shuai Y, Tian S, Cao T, Zhou M, Zhu Y, Zhao S, Li X. Non-contact detection method of pregnant sows backfat thickness based on two-dimensional images. Anim Genet 2022; 53:769-781. [PMID: 35989407 DOI: 10.1111/age.13248] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/19/2022] [Revised: 07/16/2022] [Accepted: 07/27/2022] [Indexed: 11/27/2022]
Abstract
Since sow backfat thickness (BFT) is highly correlated with service life and reproductive effectiveness, dynamic monitoring of BFT is a critical component of large-scale sow farm productivity. Existing contact methods for measuring sow BFT have problems, including high measurement intensity, stress reactions in sows, low biological safety, and difficulty in meeting the requirements for multiple measurements. This article presents a two-dimensional (2D) image-based approach for determining the BFT of pregnant sows when combined with the backfat growth rate (BGR). The 2D image features of sows extracted by convolutional neural networks (CNN) and the artificially defined phenotypic features of sows, such as hip width, hip height, body length, hip height-width ratio, length-width ratio, and waist-hip ratio, were each combined with BGR to construct a prediction model for sow BFT using support vector regression (SVR). Testing and comparison showed that using CNN to extract features from images could effectively replace artificially defined features, and that BGR contributed to the model's accuracy improvement. The CNN-BGR-SVR model performed the best, with an R² of 0.72, a mean absolute error of 1.21 mm, a root mean square error of 1.50 mm, and a mean absolute percentage error of 7.57%. The results demonstrated that the CNN-BGR-SVR model based on 2D images was capable of detecting sow BFT, establishing a new reference for non-contact sow BFT detection technology.
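The four regression metrics reported in this abstract (R², mean absolute error, root mean square error, and mean absolute percentage error) follow standard definitions, which the sketch below spells out. The function name `regression_metrics` is hypothetical and the numbers in the usage line are made-up illustrative values, not data from the study.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute R^2, MAE, RMSE, and MAPE (in percent) from paired values.

    Assumes y_true contains no zeros (MAPE divides by each true value)
    and is not constant (R^2 divides by the total sum of squares).
    """
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mape = 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return {"r2": r2, "mae": mae, "rmse": rmse, "mape": mape}


# Illustrative thickness values in mm (invented for the example).
print(regression_metrics([10.0, 20.0, 30.0], [12.0, 18.0, 30.0]))
```

MAE and RMSE carry the unit of the target (mm here), while R² and MAPE are unitless, which is why the abstract can quote them side by side.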
Affiliation(s)
- Mengyuan Yu
- Key Laboratory of Smart Farming for Agricultural Animals, Huazhong Agricultural University, Wuhan, Hubei, China
- Shenzhen Institute of Nutrition and Health, Huazhong Agricultural University, Wuhan, Hubei, China
- Hongya Zheng
- Key Laboratory of Smart Farming for Agricultural Animals, Huazhong Agricultural University, Wuhan, Hubei, China
- Shenzhen Institute of Nutrition and Health, Huazhong Agricultural University, Wuhan, Hubei, China
- Dihong Xu
- Key Laboratory of Smart Farming for Agricultural Animals, Huazhong Agricultural University, Wuhan, Hubei, China
- Shenzhen Institute of Nutrition and Health, Huazhong Agricultural University, Wuhan, Hubei, China
- Yonghui Shuai
- Key Laboratory of Smart Farming for Agricultural Animals, Huazhong Agricultural University, Wuhan, Hubei, China
- Shenzhen Institute of Nutrition and Health, Huazhong Agricultural University, Wuhan, Hubei, China
- Shanfeng Tian
- Key Laboratory of Smart Farming for Agricultural Animals, Huazhong Agricultural University, Wuhan, Hubei, China
- Shenzhen Institute of Nutrition and Health, Huazhong Agricultural University, Wuhan, Hubei, China
- Tingjin Cao
- Key Laboratory of Smart Farming for Agricultural Animals, Huazhong Agricultural University, Wuhan, Hubei, China
- Shenzhen Institute of Nutrition and Health, Huazhong Agricultural University, Wuhan, Hubei, China
- Mingyan Zhou
- Key Laboratory of Smart Farming for Agricultural Animals, Huazhong Agricultural University, Wuhan, Hubei, China
- Shenzhen Institute of Nutrition and Health, Huazhong Agricultural University, Wuhan, Hubei, China
- Yuhua Zhu
- Shenzhen Institute of Nutrition and Health, Huazhong Agricultural University, Wuhan, Hubei, China
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
| | - Shuhong Zhao
- College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, Hubei, China
| | - Xuan Li
- Key Laboratory of Smart Farming for Agricultural Animals, Huazhong Agricultural University, Wuhan, Hubei, China.,Shenzhen Institute of Nutrition and Health, Huazhong Agricultural University, Wuhan, Hubei, China.,Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
|
13
|
Konstantakos V, Nentidis A, Krithara A, Paliouras G. CRISPR-Cas9 gRNA efficiency prediction: an overview of predictive tools and the role of deep learning. Nucleic Acids Res 2022; 50:3616-3637. [PMID: 35349718 PMCID: PMC9023298 DOI: 10.1093/nar/gkac192] [Citation(s) in RCA: 51] [Impact Index Per Article: 25.5] [Received: 11/16/2021] [Revised: 03/09/2022] [Accepted: 03/28/2022] [Indexed: 12/26/2022] Open
Abstract
The clustered regularly interspaced short palindromic repeat (CRISPR)/CRISPR-associated protein 9 (Cas9) system has become a successful and promising technology for gene editing. To facilitate its effective application, various computational tools have been developed. These tools can assist researchers in the guide RNA (gRNA) design process by predicting cleavage efficiency and specificity and excluding undesirable targets. However, while many tools are available, assessments of their application scenarios and performance benchmarks are limited. Moreover, new deep learning tools have recently been explored for gRNA efficiency prediction but have not been systematically evaluated. Here, we discuss the approaches that pertain to the on-target activity problem, focusing mainly on the features and computational methods they utilize. Furthermore, we evaluate these tools on independent datasets and give some suggestions for their usage. We conclude with some challenges and perspectives on future directions for CRISPR-Cas9 guide design.
Affiliation(s)
- Vasileios Konstantakos
- Institute of Informatics and Telecommunications, NCSR Demokritos, Patr. Gregoriou E & 27 Neapoleos Str, 15341 Athens, Greece
- Anastasios Nentidis
- Institute of Informatics and Telecommunications, NCSR Demokritos, Patr. Gregoriou E & 27 Neapoleos Str, 15341 Athens, Greece
- School of Informatics, Aristotle University of Thessaloniki, 54124, Thessaloniki, Greece
- Anastasia Krithara
- Institute of Informatics and Telecommunications, NCSR Demokritos, Patr. Gregoriou E & 27 Neapoleos Str, 15341 Athens, Greece
- Georgios Paliouras
- Institute of Informatics and Telecommunications, NCSR Demokritos, Patr. Gregoriou E & 27 Neapoleos Str, 15341 Athens, Greece
|
14
|
Li B, Ai D, Liu X. CNN-XG: A Hybrid Framework for sgRNA On-Target Prediction. Biomolecules 2022; 12:409. [PMID: 35327601 PMCID: PMC8945678 DOI: 10.3390/biom12030409] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 01/26/2022] [Revised: 02/23/2022] [Accepted: 03/03/2022] [Indexed: 02/04/2023] Open
Abstract
As the third-generation gene editing technology, CRISPR/Cas9 has a wide range of applications. The success of CRISPR depends on the editing of the target gene via a functional complex of sgRNA and Cas9 proteins. Therefore, sgRNAs with high specificity and high on-target cleavage efficiency can make this process more accurate and efficient. Although there are already many sophisticated machine learning and deep learning models for predicting the on-target cleavage efficiency of sgRNA, prediction accuracy remains to be improved. XGBoost is good at classification because, as an ensemble model, it can overcome the deficiencies of a single classifier, and we aim to improve the prediction of sgRNA on-target activity by introducing XGBoost into the model. We present a novel machine learning framework that combines a convolutional neural network (CNN) and XGBoost to predict sgRNA on-target knockout efficacy. Our framework, called CNN-XG, is mainly composed of two parts: a CNN feature extractor automatically extracts features from sequences, and an XGBoost predictor makes predictions from the features extracted after convolution. Experiments on commonly used datasets show that CNN-XG performed significantly better than other existing frameworks in the classification mode.
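The first of the two stages described in this abstract, turning a sequence into a fixed-length feature vector by convolution, can be illustrated with a small sketch. The abstract does not give the encoding, filter sizes, or pooling used by CNN-XG, so the one-hot encoding, the two hand-written filters, and the helper names `one_hot` and `conv_max_features` below are assumptions for illustration only; in a trained network the filter weights would be learned, and the resulting vector would then be passed to a gradient-boosted predictor such as XGBoost.

```python
def one_hot(seq):
    """Encode a DNA sequence as rows of four values in the order A, C, G, T."""
    alphabet = "ACGT"
    return [[1.0 if base == letter else 0.0 for letter in alphabet] for base in seq]

def conv_max_features(seq, filters):
    """Slide each filter along the sequence, apply ReLU, keep the maximum response."""
    encoded = one_hot(seq)
    features = []
    for kernel in filters:  # kernel: list of rows, each holding 4 weights
        width = len(kernel)
        best = 0.0  # ReLU floor: negative responses are clipped to zero
        for start in range(len(encoded) - width + 1):
            score = sum(
                kernel[i][j] * encoded[start + i][j]
                for i in range(width)
                for j in range(4)
            )
            best = max(best, score)
        features.append(best)
    return features

# Hand-written stand-ins for learned filters (assumption): one responds to "GG",
# the other to "TT".
filters = [
    [[0, 0, 1, 0], [0, 0, 1, 0]],
    [[0, 0, 0, 1], [0, 0, 0, 1]],
]
features = conv_max_features("ACGGTACGTA", filters)
```

Each entry of `features` is the strongest match of one filter anywhere in the sequence, which gives a fixed-length vector that a separate tree-based model can consume regardless of where the motif occurs.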
Affiliation(s)
- Bohao Li
- School of Mathematics and Physics, University of Science and Technology Beijing, Beijing 100083, China; (B.L.); (D.A.)
- Dongmei Ai
- School of Mathematics and Physics, University of Science and Technology Beijing, Beijing 100083, China; (B.L.); (D.A.)
- Basic Experimental Center of Natural Science, University of Science and Technology Beijing, Beijing 100083, China
- Xiuqin Liu
- School of Mathematics and Physics, University of Science and Technology Beijing, Beijing 100083, China; (B.L.); (D.A.)
|
15
|
A systematic mapping study on machine learning techniques for the prediction of CRISPR/Cas9 sgRNA target cleavage. Comput Struct Biotechnol J 2022; 20:5813-5823. [PMID: 36382194 PMCID: PMC9630617 DOI: 10.1016/j.csbj.2022.10.013] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 07/27/2022] [Revised: 09/21/2022] [Accepted: 10/08/2022] [Indexed: 11/30/2022] Open
Abstract
CRISPR/Cas9 technology has greatly accelerated genome engineering research. The CRISPR/Cas9 complex, a bacterial immune response system, is widely adopted for RNA-driven targeted genome editing. The systematic mapping study presented in this paper examines the literature on machine learning (ML) techniques employed in the prediction of CRISPR/Cas9 sgRNA on/off-target cleavage, focusing on improving support for sgRNA design activities and identifying areas currently being researched. This area of research has expanded greatly in recent years, so we considered it appropriate to conduct a Systematic Mapping Study (SMS), an investigation that has proven to be an effective secondary study method. Unlike a classic review, an SMS makes no comparison of methods or results; that task can instead be the subject of a systematic literature review that chooses one theme among those highlighted in this SMS. To the best of the authors' knowledge, no other SMS has been published on this topic. Fifty-seven papers published between 2017 and April 30, 2022, were analyzed. This study reveals that the most widely used ML model is the convolutional neural network (CNN), followed by the feedforward neural network (FNN), while the use of other models is marginal. Other interesting findings have emerged, such as the wide availability of both open code and platforms dedicated to supporting researchers, and a clear prevalence of public funding for research on this topic.
|
16
|
Ahmadi F, Quach ABV, Shih SCC. Is microfluidics the "assembly line" for CRISPR-Cas9 gene-editing? Biomicrofluidics 2020; 14:061301. [PMID: 33262863 PMCID: PMC7688342 DOI: 10.1063/5.0029846] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 09/16/2020] [Accepted: 11/09/2020] [Indexed: 06/12/2023]
Abstract
Acclaimed as one of the biggest scientific breakthroughs, CRISPR technology has brought significant improvements across the biotechnological spectrum, from editing genetic defects in diseases for gene therapy to modifying organisms for the production of biofuels. Since its inception, the CRISPR-Cas9 system has become easier and more versatile to use. Many variants have been found, giving the CRISPR toolkit a broad range that includes the activation and repression of genes in addition to the previously known knockout and knockin of genes. In this Perspective, we describe efforts to automate the gene-editing workflow, with particular emphasis on the use of microfluidic technology. We discuss how automation can address the limitations of gene-editing and how the marriage between microfluidics and gene-editing will expand the application space of CRISPR.
Affiliation(s)
- Steve C. C. Shih
- Author to whom correspondence should be addressed. Tel.: +1-(514) 848-2424 x7579
|