1
|
Niu M, Zou Q. SgRNA-RF: Identification of SgRNA On-Target Activity With Imbalanced Datasets. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:2442-2453. [PMID: 33979289 DOI: 10.1109/tcbb.2021.3079116] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/12/2023]
Abstract
Single-guide RNA is a guide RNA (gRNA), which guides the insertion or deletion of uridine residues into kinetoplastid during RNA editing. It is a small non-coding RNA that can be combined with pre -mRNA pairing. SgRNA is a critical component of the CRISPR/Cas9 gene knockout system and play an important role in gene editing and gene regulation. It is important to accurately and quickly identify highly on-target activity sgRNAs. Due to its importance, several computational predictors have been proposed to predict sgRNAs on-target activity. All these methods have clearly contributed to the development of this very important field. However, they also have certain limitations. In the paper, we developed a new classifier SgRNA-RF, which extracts the features of nucleic acid composition and structure of on-target activity sgRNA sequence and identified by random forest algorithm. In addition to solving an imbalanced dataset, this paper proposed a new method called CS-Smote. We compared sgRNA-RF with state-of-the-art predictors on the five datasets, and found SgRNA-RF significantly improved the identification accuracy, with accuracies of 0.8636,0.9161,0.894,0.938,0.965,0.77,0.979,0.973, respectively. The user-friendly web server that implements sgRNA-RF is freely available at http://server.malab.cn/sgRNA-RF/.
Collapse
|
2
|
Pulido-Quetglas C, Johnson R. Designing libraries for pooled CRISPR functional screens of long noncoding RNAs. Mamm Genome 2022; 33:312-327. [PMID: 34533605 PMCID: PMC9114037 DOI: 10.1007/s00335-021-09918-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/07/2021] [Accepted: 09/09/2021] [Indexed: 02/01/2023]
Abstract
Human and other genomes encode tens of thousands of long noncoding RNAs (lncRNAs), the vast majority of which remain uncharacterised. High-throughput functional screening methods, notably those based on pooled CRISPR-Cas perturbations, promise to unlock the biological significance and biomedical potential of lncRNAs. Such screens are based on libraries of single guide RNAs (sgRNAs) whose design is critical for success. Few off-the-shelf libraries are presently available, and lncRNAs tend to have cell-type-specific expression profiles, meaning that library design remains in the hands of researchers. Here we introduce the topic of pooled CRISPR screens for lncRNAs and guide readers through the three key steps of library design: accurate annotation of transcript structures, curation of optimal candidate sets, and design of sgRNAs. This review is a starting point and reference for researchers seeking to design custom CRISPR screening libraries for lncRNAs.
Collapse
Affiliation(s)
- Carlos Pulido-Quetglas
- Department of Medical Oncology, Inselspital, Bern University Hospital, University of Bern, 3010, Bern, Switzerland
- Department for BioMedical Research, University of Bern, 3008, Bern, Switzerland
- Graduate School of Cellular and Biomedical Sciences, University of Bern, 3012, Bern, Switzerland
| | - Rory Johnson
- Department of Medical Oncology, Inselspital, Bern University Hospital, University of Bern, 3010, Bern, Switzerland.
- Department for BioMedical Research, University of Bern, 3008, Bern, Switzerland.
- School of Biology and Environmental Science, University College Dublin, Dublin, D04 V1W8, Ireland.
- Conway Institute for Biomolecular and Biomedical Research, University College Dublin, Dublin, D04 V1W8, Ireland.
| |
Collapse
|
3
|
Hu WX, Rong Y, Guo Y, Jiang F, Tian W, Chen H, Dong SS, Yang TL. ExsgRNA: reduce off-target efficiency by on-target mismatched sgRNA. Brief Bioinform 2022; 23:6587171. [PMID: 35580855 DOI: 10.1093/bib/bbac183] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/14/2022] [Revised: 04/08/2022] [Accepted: 04/23/2022] [Indexed: 11/13/2022] Open
Abstract
Clustered regularly interspaced short palindromic repeats (CRISPR)/Cas9 gene editing technology has been widely used to facilitate efficient genome editing. Current popular sgRNA design tools only consider the sgRNA perfectly matched to the target site and provide the results without any on-target mismatch. We suppose taking on-target gRNA-DNA mismatches into consideration might provide better sgRNA with similar binding activity and reduced off-target sites. Here, we trained a seq2seq-attention model with feedback-loop architecture, to automatically generate sgRNAs with on-target mismatches. Dual-luciferase reporter experiment showed that multiple sgRNAs with three mismatches could achieve the 80% of the relative activity of the perfect matched sgRNA. Meanwhile, it could reduce the number of off-target sites using sgRNAs with on-target mismatches. Finally, we provided a freely accessible web server sgRNA design tool named ExsgRNA. Users could submit their target sequence to this server and get optimal sgRNAs with less off-targets and similar on-target activity compared with the perfect-matched sgRNA.
Collapse
Affiliation(s)
- Wei-Xin Hu
- Biomedical Informatics & Genomics Center, Key Laboratory of Biomedical Information Engineering of Ministry of Education, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an 710049, P.R. China
| | - Yu Rong
- Biomedical Informatics & Genomics Center, Key Laboratory of Biomedical Information Engineering of Ministry of Education, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an 710049, P.R. China
| | - Yan Guo
- Biomedical Informatics & Genomics Center, Key Laboratory of Biomedical Information Engineering of Ministry of Education, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an 710049, P.R. China
| | - Feng Jiang
- Biomedical Informatics & Genomics Center, Key Laboratory of Biomedical Information Engineering of Ministry of Education, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an 710049, P.R. China
| | - Wen Tian
- Biomedical Informatics & Genomics Center, Key Laboratory of Biomedical Information Engineering of Ministry of Education, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an 710049, P.R. China
| | - Hao Chen
- Biomedical Informatics & Genomics Center, Key Laboratory of Biomedical Information Engineering of Ministry of Education, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an 710049, P.R. China
| | - Shan-Shan Dong
- Biomedical Informatics & Genomics Center, Key Laboratory of Biomedical Information Engineering of Ministry of Education, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an 710049, P.R. China.,Research Institute of Xi'an Jiaotong University, Zhejiang, China
| | - Tie-Lin Yang
- Biomedical Informatics & Genomics Center, Key Laboratory of Biomedical Information Engineering of Ministry of Education, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an 710049, P.R. China
| |
Collapse
|
4
|
A systematic mapping study on machine learning techniques for the prediction of CRISPR/Cas9 sgRNA target cleavage. Comput Struct Biotechnol J 2022; 20:5813-5823. [PMID: 36382194 PMCID: PMC9630617 DOI: 10.1016/j.csbj.2022.10.013] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/27/2022] [Revised: 09/21/2022] [Accepted: 10/08/2022] [Indexed: 11/30/2022] Open
Abstract
CRISPR/Cas9 technology has greatly accelerated genome engineering research. The CRISPR/Cas9 complex, a bacterial immune response system, is widely adopted for RNA-driven targeted genome editing. The systematic mapping study presented in this paper examines the literature on machine learning (ML) techniques employed in the prediction of CRISPR/Cas9 sgRNA on/off-target cleavage, focusing on improving support in sgRNA design activities and identifying areas currently being researched. This area of research has greatly expanded recently, and we found it appropriate to work on a Systematic Mapping Study (SMS), an investigation that has proven to be an effective secondary study method. Unlike a classic review, in an SMS, no comparison of methods or results is made, while this task can instead be the subject of a systematic literature review that chooses one theme among those highlighted in this SMS. The study is illustrated in this paper. To the best of the authors' knowledge, no other SMS studies have been published on this topic. Fifty-seven papers published in the period 2017–2022 (April, 30) were analyzed. This study reveals that the most widely used ML model is the convolutional neural network (CNN), followed by the feedforward neural network (FNN), while the use of other models is marginal. Other interesting information has emerged, such as the wide availability of both open code and platforms dedicated to supporting the activity of researchers or the fact that there is a clear prevalence of public funds that finance research on this topic.
Collapse
|
5
|
Dou L, Zhou W, Zhang L, Xu L, Han K. Accurate identification of RNA D modification using multiple features. RNA Biol 2021; 18:2236-2246. [PMID: 33729104 PMCID: PMC8632091 DOI: 10.1080/15476286.2021.1898160] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/21/2020] [Revised: 02/13/2021] [Accepted: 02/23/2021] [Indexed: 10/21/2022] Open
Abstract
As one of the common post-transcriptional modifications in tRNAs, dihydrouridine (D) has prominent effects on regulating the flexibility of tRNA as well as cancerous diseases. Facing with the expensive and time-consuming sequencing techniques to detect D modification, precise computational tools can largely promote the progress of molecular mechanisms and medical developments. We proposed a novel predictor, called iRNAD_XGBoost, to identify potential D sites using multiple RNA sequence representations. In this method, by considering the imbalance problem using hybrid sampling method SMOTEEEN, the XGBoost-selected top 30 features are applied to construct model. The optimized model showed high Sn and Sp values of 97.13% and 97.38% over jackknife test, respectively. For the independent experiment, these two metrics separately achieved 91.67% and 94.74%. Compared with iRNAD method, this model illustrated high generalizability and consistent prediction efficiencies for positive and negative samples, which yielded satisfactory MCC scores of 0.94 and 0.86, respectively. It is inferred that the chemical property and nucleotide density features (CPND), electron-ion interaction pseudopotential (EIIP and PseEIIP) as well as dinucleotide composition (DNC) are crucial to the recognition of D modification. The proposed predictor is a promising tool to help experimental biologists investigate molecular functions.
Collapse
Affiliation(s)
- Lijun Dou
- School of Automotive and Transportation Engineering, Shenzhen Polytechnic, Shenzhen, GuangdongChina
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, SichuanChina
| | - Wenyang Zhou
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, HeilongjiangChina
| | - Lichao Zhang
- School of Intelligent Manufacturing and Equipment, Shenzhen Institute of Information Technology, Shenzhen, Guangdong, China
| | - Lei Xu
- School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen, GuangdongChina
| | - Ke Han
- School of Computer and Information Engineering, Harbin University of Commerce, Harbin, HeilongjiangChina
| |
Collapse
|
6
|
Niu M, Lin Y, Zou Q. sgRNACNN: identifying sgRNA on-target activity in four crops using ensembles of convolutional neural networks. PLANT MOLECULAR BIOLOGY 2021; 105:483-495. [PMID: 33385273 DOI: 10.1007/s11103-020-01102-y] [Citation(s) in RCA: 65] [Impact Index Per Article: 16.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/17/2020] [Accepted: 12/01/2020] [Indexed: 06/12/2023]
Abstract
KEY MESSAGE We proposed an ensemble convolutional neural network model to identify sgRNA high on-target activity in four crops and we used one-hot encoding and k-mers for sequence encoding. As an important component of the CRISPR/Cas9 system, single-guide RNA (sgRNA) plays an important role in gene redirection and editing. sgRNA has played an important role in the improvement of agronomic species, but there is a lack of effective bioinformatics tools to identify the activity of sgRNA in agronomic species. Therefore, it is necessary to develop a method based on machine learning to identify sgRNA high on-target activity. In this work, we proposed a simple convolutional neural network method to identify sgRNA high on-target activity. Our study used one-hot encoding and k-mers for sequence data conversion and a voting algorithm for constructing the convolutional neural network ensemble model sgRNACNN for the prediction of sgRNA activity. The ensemble model sgRNACNN was used for predictions in four crops: Glycine max, Zea mays, Sorghum bicolor and Triticum aestivum. The accuracy rates of the four crops in the sgRNACNN model were 82.43%, 80.33%, 78.25% and 87.49%, respectively. The experimental results showed that sgRNACNN realizes the identification of high on-target activity sgRNA of agronomic data and can meet the demands of sgRNA activity prediction in agronomy to a certain extent. These results have certain significance for guiding crop gene editing and academic research. The source code and relevant dataset can be found in the following link: https://github.com/nmt315320/sgRNACNN.git .
Collapse
Affiliation(s)
- Mengting Niu
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
| | - Yuan Lin
- Department of System Integration, Sparebanken Vest, Bergen, Norway.
| | - Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China.
| |
Collapse
|
7
|
Liang G, Wu J, Xu L. A prognosis-related based method for miRNA selection on liver hepatocellular carcinoma prediction. Comput Biol Chem 2021; 91:107433. [PMID: 33540232 DOI: 10.1016/j.compbiolchem.2020.107433] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/24/2020] [Revised: 12/16/2020] [Accepted: 12/20/2020] [Indexed: 12/18/2022]
Abstract
Hepatocellular carcinoma (HCC) is considered as the sixth most common cancer in the world, and it is also considered as one of the causes of death. Moreover, the poor prognosis of recurrence of HCC after surgery and metastasis is also a big problem for human health. If the disease can be diagnosed earlier, the survival rate of the patients will be improved significantly. In the early stage of hepatocellular carcinoma, the expression of miRNAs is likely to become abnormal. In our work, the expression profile of miRNAs of human HCC in cancer tissue is compared with their adjacent tissue samples collected from tumor cancer genomic Atlas (TCGA) platform, then the genes with significant difference are selected by Limma test. Selected genes are referred to predict miRNAs related to the prognosis of HCC patients. Finally, miRNAs regulated by target genes are selected by our method, and the experimental results demonstrated that our method is more efficient than biology wet experimental method with lower cost.
Collapse
Affiliation(s)
- Guangmin Liang
- School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen, 518000, China
| | - Jin Wu
- School of Management, Shenzhen Polytechnic, Shenzhen, 518000, China.
| | - Lei Xu
- School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen, 518000, China.
| |
Collapse
|
8
|
Bi Y, Xiang D, Ge Z, Li F, Jia C, Song J. An Interpretable Prediction Model for Identifying N 7-Methylguanosine Sites Based on XGBoost and SHAP. MOLECULAR THERAPY. NUCLEIC ACIDS 2020; 22:362-372. [PMID: 33230441 PMCID: PMC7533297 DOI: 10.1016/j.omtn.2020.08.022] [Citation(s) in RCA: 78] [Impact Index Per Article: 15.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 07/14/2020] [Accepted: 08/20/2020] [Indexed: 12/19/2022]
Abstract
Recent studies have increasingly shown that the chemical modification of mRNA plays an important role in the regulation of gene expression. N7-methylguanosine (m7G) is a type of positively-charged mRNA modification that plays an essential role for efficient gene expression and cell viability. However, the research on m7G has received little attention to date. Bioinformatics tools can be applied as auxiliary methods to identify m7G sites in transcriptomes. In this study, we develop a novel interpretable machine learning-based approach termed XG-m7G for the differentiation of m7G sites using the XGBoost algorithm and six different types of sequence-encoding schemes. Both 10-fold and jackknife cross-validation tests indicate that XG-m7G outperforms iRNA-m7G. Moreover, using the powerful SHAP algorithm, this new framework also provides desirable interpretations of the model performance and highlights the most important features for identifying m7G sites. XG-m7G is anticipated to serve as a useful tool and guide for researchers in their future studies of mRNA modification sites.
Collapse
Affiliation(s)
- Yue Bi
- School of Science, Dalian Maritime University, Dalian 116026, China
| | - Dongxu Xiang
- Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
| | - Zongyuan Ge
- Monash e-Research Centre and Faculty of Engineering, Monash University, Melbourne, VIC 3800, Australia
| | - Fuyi Li
- Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
| | - Cangzhi Jia
- School of Science, Dalian Maritime University, Dalian 116026, China
| | - Jiangning Song
- Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia.,Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia
| |
Collapse
|
9
|
Gu X, Chen Z, Wang D. Prediction of G Protein-Coupled Receptors With CTDC Extraction and MRMD2.0 Dimension-Reduction Methods. Front Bioeng Biotechnol 2020; 8:635. [PMID: 32671038 PMCID: PMC7329982 DOI: 10.3389/fbioe.2020.00635] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/21/2020] [Accepted: 05/26/2020] [Indexed: 11/13/2022] Open
Abstract
The G Protein-Coupled Receptor (GPCR) family consists of more than 800 different members. In this article, we attempt to use the physicochemical properties of Composition, Transition, Distribution (CTD) to represent GPCRs. The dimensionality reduction method of MRMD2.0 filters the physicochemical properties of GPCR redundancy. Matplotlib plots the coordinates to distinguish GPCRs from other protein sequences. The chart data show a clear distinction effect, and there is a well-defined boundary between the two. The experimental results show that our method can predict GPCRs.
Collapse
Affiliation(s)
- Xingyue Gu
- Institute of Computing Science and Technology, Guangzhou University, Guangzhou, China
| | - Zhihua Chen
- Institute of Computing Science and Technology, Guangzhou University, Guangzhou, China
| | - Donghua Wang
- Department of General Surgery, Heilongjiang Province Land Reclamation Headquarters General Hospital, Harbin, China
| |
Collapse
|