1. Yang Y, Zhong Y, Chen L. EIciRNAs in focus: current understanding and future perspectives. RNA Biol 2025; 22:1-12. [PMID: 39711231 DOI: 10.1080/15476286.2024.2443876] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Revised: 11/14/2024] [Accepted: 12/09/2024] [Indexed: 12/24/2024] [Open Access]
Abstract
Circular RNAs (circRNAs) are a unique class of covalently closed single-stranded RNA molecules that play diverse roles in normal physiology and pathology. Among the major types of circRNA, exon-intron circRNA (EIciRNA) distinguishes itself by its sequence composition and nuclear localization. Recent RNA-seq technologies and computational methods have facilitated the detection and characterization of EIciRNAs, with features like circRNA intron retention (CIR) and tissue-specificity being characterized. EIciRNAs have been identified to exert their functions via mechanisms such as regulating gene transcription, and the physiological relevance of EIciRNAs has been reported. Within this review, we present a summary of the current understanding of EIciRNAs, delving into their identification and molecular functions. Additionally, we emphasize factors regulating EIciRNA biogenesis and the physiological roles of EIciRNAs based on recent research. We also discuss the future challenges in EIciRNA exploration, underscoring the potential for novel functions and functional mechanisms of EIciRNAs for further investigation.
Affiliation(s)
- Yan Yang
- Department of Cardiology, The First Affiliated Hospital of USTC, School of Basic Medical Sciences, Division of Life Science and Medicine, University of Science and Technology of China, Hefei, China
- Hefei National Laboratory for Physical Sciences at Microscale, University of Science and Technology of China, Hefei, China
- Yinchun Zhong
- Hefei National Laboratory for Physical Sciences at Microscale, University of Science and Technology of China, Hefei, China
- Department of Clinical Laboratory, The First Affiliated Hospital of USTC, School of Basic Medical Sciences, Division of Life Science and Medicine, University of Science and Technology of China, Hefei, China
- Liang Chen
- Department of Cardiology, The First Affiliated Hospital of USTC, School of Basic Medical Sciences, Division of Life Science and Medicine, University of Science and Technology of China, Hefei, China
2. Han B, Bai S, Liu Y, Wu J, Feng X, Xin R. Definer: A computational method for accurate identification of RNA pseudouridine sites based on deep learning. PLoS One 2025; 20:e0320077. [PMID: 40273178 PMCID: PMC12021131 DOI: 10.1371/journal.pone.0320077] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/22/2024] [Accepted: 02/12/2025] [Indexed: 04/26/2025] [Open Access]
Abstract
Pseudouridine is an important modification site, which is widely present in a variety of non-coding RNAs and is involved in a variety of important biological processes. Studies have shown that pseudouridine is important in many biological functions such as gene expression, RNA structural stability, and various diseases. Therefore, accurate identification of pseudouridine sites can effectively explain the functional mechanism of this modification site. Due to the rapid increase of genomics data, traditional biological experimental methods to identify RNA modification sites can no longer meet the practical needs, and it is necessary to accurately identify pseudouridine sites from high-throughput RNA sequence data by computational methods. In this study, we propose a deep learning-based computational method, Definer, to accurately identify RNA pseudouridine loci in three species, Homo sapiens, Saccharomyces cerevisiae and Mus musculus. The method incorporates two sequence coding schemes, including NCP and One-hot, and then feeds the extracted RNA sequence features into a deep learning model constructed from CNN, GRU and Attention. The benchmark dataset contains data from three species, H. sapiens, S. cerevisiae and M. musculus, and the results using 10-fold cross-validation show that Definer significantly outperforms other existing methods. Meanwhile, the data sets of two species, H. sapiens and S. cerevisiae, were tested independently to further demonstrate the predictive ability of the model. In summary, our method, Definer, can accurately identify pseudouridine modification sites in RNA.
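The two sequence coding schemes named in this abstract can be sketched as follows. This is only an illustration, not the authors' code: it assumes that "NCP" refers to the nucleotide chemical property encoding commonly used in RNA modification predictors (ring structure, functional group, hydrogen-bond strength), and the function name `encode` and the choice to concatenate both encodings per position are hypothetical.

```python
# Hypothetical sketch: per-nucleotide one-hot + NCP encoding of an RNA sequence.
# Assumption: NCP = (purine ring?, amino group?, weak hydrogen bonding?).
ONE_HOT = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "U": (0, 0, 0, 1),
}
NCP = {
    "A": (1, 1, 1),  # purine, amino, weak
    "C": (0, 1, 0),  # pyrimidine, amino, strong
    "G": (1, 0, 0),  # purine, keto, strong
    "U": (0, 0, 1),  # pyrimidine, keto, weak
}


def encode(seq):
    """Return one 7-dimensional row (4 one-hot + 3 NCP values) per nucleotide."""
    rows = []
    for base in seq.upper().replace("T", "U"):  # treat DNA-style T as U
        one_hot = ONE_HOT.get(base, (0, 0, 0, 0))  # unknown bases map to zeros
        ncp = NCP.get(base, (0, 0, 0))
        rows.append(list(one_hot + ncp))
    return rows


if __name__ == "__main__":
    for row in encode("AUGC"):
        print(row)
```

A matrix of this shape (sequence length by 7) is the kind of input a convolutional or recurrent model could consume; how Definer actually combines the two encodings is not stated in the abstract.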
Affiliation(s)
- Bo Han
- Jilin Chemical Hospital, Jilin, P.R. China
- Sudan Bai
- College of Information and Control Engineering, Jilin Institute of Chemical Technology, Jilin, P.R. China
- Yang Liu
- Jilin Chemical Hospital, Jilin, P.R. China
- Jiezhang Wu
- College of Information and Control Engineering, Jilin Institute of Chemical Technology, Jilin, P.R. China
- Xin Feng
- School of Science, Jilin Institute of Chemical Technology, Jilin, P.R. China
- Ruihao Xin
- College of Information and Control Engineering, Jilin Institute of Chemical Technology, Jilin, P.R. China
3. Yu H, Yu Y, Xia Y. circ2LO: Identification of CircRNA Based on the LucaOne Large Model. Genes (Basel) 2025; 16:413. [PMID: 40282373 PMCID: PMC12026638 DOI: 10.3390/genes16040413] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2025] [Revised: 03/25/2025] [Accepted: 03/28/2025] [Indexed: 04/29/2025] [Open Access]
Abstract
Circular RNA is a type of noncoding RNA with a special covalent bond structure. As an endogenous RNA in animals and plants, it is formed through RNA splicing. The 5' and 3' ends of the exons form circular RNA at the back-splicing sites. Circular RNA plays an important regulatory role in diseases by interacting with the associated miRNAs. Accurate identification of circular RNA can enrich the data on circular RNA and provide new ideas for drug development. At present, mainstream circular RNA recognition algorithms are divided into two categories: those based on RNA sequence position information and those based on RNA sequence biometric information. Herein, we propose a method for the recognition of circular RNA, called circ2LO, which utilizes the LucaOne large model for feature embedding of the splicing sites of RNA sequences as well as their upstream and downstream sequences to prevent semantic information loss caused by the traditional one-hot encoding method. Subsequently, it employs a convolutional layer to extract features and a self-attention mechanism to extract interactive features to accurately capture the core features of the circular RNA at the splicing sites. Finally, it uses a fully connected layer to identify circular RNA. The accuracy of circ2LO on the human dataset reached 95.47%, which is higher than the values shown by existing methods. It also achieved accuracies of 97.04% and 72.04% on the Arabidopsis and mouse datasets, respectively, demonstrating good robustness. Through rigorous validation, the circ2LO model has proven its high-precision identification capability for circular RNAs, marking it as a potentially transformative analytical platform in the circRNA research field.
Affiliation(s)
- Haihao Yu
- Computer Science and Technology College, Heilongjiang Institute of Technology, No. 999 Hongqi Street, Harbin 150009, China
- Yue Yu
- College of Animal Science, Jilin University, No. 1977 Xinzhu Road, Changchun 130012, China
- Yanling Xia
- College of Wildlife and Protected Area, Northeast Forestry University, No. 26 Hexing Road, Harbin 150040, China
4. Peng L, Li H, Yuan S, Meng T, Chen Y, Fu X, Cao D. metaCDA: A Novel Framework for CircRNA-Driven Drug Discovery Utilizing Adaptive Aggregation and Meta-Knowledge Learning. J Chem Inf Model 2025; 65:2129-2144. [PMID: 39937612 DOI: 10.1021/acs.jcim.4c02193] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/14/2025]
Abstract
In the emerging field of RNA drugs, circular RNA (circRNA) has attracted much attention as a novel multifunctional therapeutic target. Delving deeper into the intricate interactions between circRNA and disease is critical for driving drug discovery efforts centered around circRNAs. Current computational methods face two significant limitations: a lack of aggregate information in heterogeneous graph networks and a lack of higher-order fusion information. To this end, we present a novel approach, metaCDA, which utilizes meta-knowledge and adaptive aggregate learning to improve the accuracy of circRNA and disease association predictions and addresses the limitations of both. We calculate multiple similarity measures between disease and circRNA, construct a heterogeneous graph based on these, and apply meta-networks to extract meta-knowledge from the heterogeneous graph, so that the constructed heterogeneous maps have adaptive contrast enhancement information. Then, we construct a nodal adaptive attention aggregation system, which integrates a multihead attention mechanism and a nodal adaptive attention aggregation mechanism, so as to achieve accurate capture of higher-order fusion information. We conducted extensive experiments, and the results show that metaCDA outperforms existing state-of-the-art models and can effectively predict disease-associated circRNA, opening up new prospects for circRNA-driven drug discovery.
Affiliation(s)
- Li Peng
- School of Computer Science and Engineering, Hunan University of Science and Technology, Xiangtan 411100, China
- Huaping Li
- School of Computer Science and Engineering, Hunan University of Science and Technology, Xiangtan 411100, China
- Sisi Yuan
- Department of Bioinformatics and Genomics, the University of North Carolina at Charlotte, Charlotte, North Carolina 28223-0001, United States
- Tao Meng
- College of Computer and Mathematics, Central South University of Forestry and Technology, Changsha, Hunan 410004, China
- Yifan Chen
- Institute of Artificial Intelligence Application, College of Computer and Mathematics, Central South University of Forestry and Technology, Changsha, Hunan 410004, P. R. China
- Xiangzheng Fu
- School of Chinese Medicine, Hong Kong Baptist University, Hong Kong, SAR 999077, China
- Dongsheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, Hunan 410003, China
5. Shukla R, Singh TR. AlzGenPred - CatBoost-based gene classifier for predicting Alzheimer's disease using high-throughput sequencing data. Sci Rep 2024; 14:30294. [PMID: 39639110 PMCID: PMC11621786 DOI: 10.1038/s41598-024-82208-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2024] [Accepted: 12/03/2024] [Indexed: 12/07/2024] [Open Access]
Abstract
Alzheimer's disease (AD) is a progressive neurodegenerative disorder characterized by memory loss. Owing to advances in next-generation sequencing, an enormous amount of AD-associated genomics data is available. However, how these genes are involved in AD remains an open research question. Therefore, AlzGenPred was developed to identify AD-associated genes using machine learning. A total of 13,504 features derived from eight sequence-encoding schemes were generated and evaluated using 16 machine learning algorithms. Network-based features significantly outperformed sequence-based features, effectively distinguishing AD-associated genes. In contrast, sequence-based features failed to classify accurately. To improve performance, we generated 24 fused features (6020 D) from sequence-based encodings, increasing accuracy by 5-7% using a two-step LightGBM-based recursive feature selection method. However, accuracy remained below 70% even after hyperparameter tuning. Therefore, network-based features were used to build the CatBoost-based ML method AlzGenPred, which achieved 96.55% accuracy and 98.99% AUROC. The developed method was tested on the AlzGene dataset, where it showed 96.43% accuracy. The model was then validated using a transcriptomics dataset. AlzGenPred provides a reliable and user-friendly tool for identifying potential AD biomarkers, accelerating biomarker discovery, and advancing our understanding of AD. It is available at https://www.bioinfoindia.org/alzgenpred/ and https://github.com/shuklarohit815/AlzGenPred .
Affiliation(s)
- Rohit Shukla
- Department of Biotechnology and Bioinformatics, Jaypee University of Information Technology (JUIT), Waknaghat, Solan, 173234, H.P., India
- Center of Excellence for Aging and Brain Repair, Morsani College of Medicine, University of South Florida, Tampa, 33613, FL, USA
- Tiratha Raj Singh
- Department of Biotechnology and Bioinformatics, Jaypee University of Information Technology (JUIT), Waknaghat, Solan, 173234, H.P., India.
- Centre of Healthcare Technologies and Informatics (CEHTI), Jaypee University of Information Technology (JUIT), Waknaghat, Solan, 173234, H.P., India.
6. Pradhan UK, Behera P, Das R, Naha S, Gupta A, Parsad R, Pradhan SK, Meher PK. AScirRNA: A novel computational approach to discover abiotic stress-responsive circular RNAs in plant genome. Comput Biol Chem 2024; 113:108205. [PMID: 39265460 DOI: 10.1016/j.compbiolchem.2024.108205] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/19/2024] [Revised: 07/12/2024] [Accepted: 09/04/2024] [Indexed: 09/14/2024]
Abstract
In the realm of plant biology, understanding the intricate regulatory mechanisms governing stress responses stands as a pivotal pursuit. Circular RNAs (circRNAs), emerging as critical players in gene regulation, have recently garnered attention for their potential roles in abiotic stress adaptation. A comprehensive grasp of circRNAs' functions in stress response offers avenues for breeders to manipulate plants and develop abiotic stress-resistant crop cultivars that thrive in challenging climates. This study pioneers a machine learning-based model for predicting abiotic stress-responsive circRNAs. The K-tuple nucleotide composition (KNC) and Pseudo KNC (PKNC) features were utilized to numerically represent circRNAs. Three different feature selection strategies were employed to select relevant and non-redundant features. Eight shallow and four deep learning algorithms were evaluated to build the final predictive model. Following five-fold cross-validation, the XGBoost learning algorithm demonstrated superior performance with 260 LightGBM-chosen KNC features (Accuracy: 74.55%, auROC: 81.23%, auPRC: 76.52%) and 160 PKNC features (Accuracy: 74.32%, auROC: 81.04%, auPRC: 76.43%), over other combinations of learning algorithms and feature selection techniques. Further, the robustness of the developed models was evaluated using an independent test dataset, where the overall accuracy, auROC and auPRC were found to be 73.13%, 72.34% and 72.68% for the KNC feature set and 73.52%, 79.53% and 73.09% for the PKNC feature set, respectively. This computational approach was also integrated into an online prediction tool, AScirRNA (https://iasri-sg.icar.gov.in/ascirna/), for easy prediction by users. Both the proposed model and the developed tool are poised to augment ongoing efforts in identifying stress-responsive circRNAs in plants.
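To make the K-tuple nucleotide composition (KNC) representation mentioned above concrete, here is a minimal sketch. It assumes the usual definition of KNC as the normalized frequency of every possible k-mer in a sequence; the function name `knc`, the default `k=2`, and the handling of ambiguous bases are illustrative choices, not details taken from the paper.

```python
# Hypothetical sketch of K-tuple nucleotide composition (KNC):
# the relative frequency of each of the 4**k possible k-mers.
from collections import Counter
from itertools import product


def knc(seq, k=2, alphabet="ACGU"):
    """Return a dict mapping each k-mer to its normalized frequency in seq."""
    seq = seq.upper().replace("T", "U")  # accept DNA-style input
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    # Count every overlapping window of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Windows containing characters outside the alphabet are ignored.
    total = sum(counts[m] for m in kmers)
    return {m: (counts[m] / total if total else 0.0) for m in kmers}


if __name__ == "__main__":
    features = knc("ACACGU", k=2)
    print({m: round(v, 3) for m, v in features.items() if v > 0})
```

For a given `k` the feature vector has `4**k` entries, so concatenating several values of `k` quickly produces hundreds of features, which is one reason a feature selection step such as the one described in the abstract is useful.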
Affiliation(s)
- Upendra Kumar Pradhan
- Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India.
- Prasanjit Behera
- Department of Bioinformatics, Odisha University of Agriculture & Technology, Bhubaneswar, Odisha 751003, India.
- Ritwika Das
- Division of Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India.
- Sanchita Naha
- Division of Computer Applications, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India.
- Ajit Gupta
- Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India.
- Rajender Parsad
- ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India.
- Sukanta Kumar Pradhan
- Department of Bioinformatics, Odisha University of Agriculture & Technology, Bhubaneswar, Odisha 751003, India.
- Prabina Kumar Meher
- Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India.
7. Digby B, Finn S, Ó Broin P. Computational approaches and challenges in the analysis of circRNA data. BMC Genomics 2024; 25:527. [PMID: 38807085 PMCID: PMC11134749 DOI: 10.1186/s12864-024-10420-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2024] [Accepted: 05/15/2024] [Indexed: 05/30/2024] [Open Access]
Abstract
Circular RNAs (circRNA) are a class of non-coding RNA, forming a single-stranded covalently closed loop structure generated via back-splicing. Advancements in sequencing methods and technologies in conjunction with algorithmic developments of bioinformatics tools have enabled researchers to characterise the origin and function of circRNAs, with practical applications as a biomarker of diseases becoming increasingly relevant. Computational methods developed for circRNA analysis are predicated on detecting the chimeric back-splice junction of circRNAs whilst mitigating false-positive sequencing artefacts. In this review, we discuss in detail the computational strategies developed for circRNA identification, highlighting a selection of tool strengths, weaknesses and assumptions. In addition to circRNA identification tools, we describe methods for characterising the role of circRNAs within the competing endogenous RNA (ceRNA) network, their interactions with RNA-binding proteins, and publicly available databases for rich circRNA annotation.
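The idea of detecting a chimeric back-splice junction can be illustrated with a deliberately simplified sketch. It assumes a split read whose two pieces have already been aligned to the forward strand of the same reference, and it classifies the read by the genomic order of those pieces; the `Segment` type and `classify_split_read` function are hypothetical and do not reproduce the logic of any specific tool discussed in the review.

```python
# Hypothetical sketch: classify a two-piece split alignment.
# Assumption: the transcript is on the plus strand, so a linear splice
# places the read's 3' piece downstream of its 5' piece, while a
# back-splice places the 3' piece upstream of the 5' piece.
from typing import NamedTuple


class Segment(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


def classify_split_read(first, second):
    """first is the read's 5' piece, second its 3' piece."""
    if first.chrom != second.chrom:
        return "chimeric"      # pieces map to different chromosomes
    if second.start >= first.end:
        return "linear"        # canonical order: consistent with linear splicing
    if second.end <= first.start:
        return "back-splice"   # reversed order: candidate circRNA junction
    return "overlapping"       # ambiguous alignment, usually discarded


if __name__ == "__main__":
    read = (Segment("chr1", 500, 550), Segment("chr1", 100, 150))
    print(classify_split_read(*read))
```

Real pipelines add further checks before accepting a candidate, for example splice-site signals and support from multiple reads, to limit the false-positive artefacts the review mentions.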
Affiliation(s)
- Barry Digby
- School of Mathematical and Statistical Sciences, University of Galway, Galway, Ireland.
- Stephen Finn
- Discipline of Histopathology, School of Medicine, Trinity College Dublin and Cancer Molecular Diagnostic Laboratory, Dublin, Ireland
- Pilib Ó Broin
- School of Mathematical and Statistical Sciences, University of Galway, Galway, Ireland
8. Niu M, Wang C, Chen Y, Zou Q, Qi R, Xu L. CircRNA identification and feature interpretability analysis. BMC Biol 2024; 22:44. [PMID: 38408987 PMCID: PMC10898045 DOI: 10.1186/s12915-023-01804-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/29/2023] [Accepted: 12/18/2023] [Indexed: 02/28/2024] [Open Access]
Abstract
BACKGROUND: Circular RNAs (circRNAs) can regulate microRNA activity and are related to various diseases, such as cancer. Functional research on circRNAs is the focus of scientific research. Accurate identification of circRNAs is important for gaining insight into their functions. Although several circRNA prediction models have been developed, their prediction accuracy is still unsatisfactory. Therefore, providing a more accurate computational framework to predict circRNAs and analyse their looping characteristics is crucial for systematic annotation.
RESULTS: We developed a novel framework, CircDC, for distinguishing circRNAs from other lncRNAs. CircDC uses four different feature encoding schemes and adopts a multilayer convolutional neural network and a bidirectional long short-term memory network to learn high-order feature representations and make circRNA predictions. The results demonstrate that the proposed CircDC model is more accurate than existing models. In addition, an interpretable analysis of the features affecting the model is performed, and the computational framework is applied to the extended application of circRNA identification.
CONCLUSIONS: CircDC is suitable for the prediction of circRNAs. The identification of circRNAs helps to understand and delve into the related biological processes and functions. Feature importance analysis increases model interpretability and uncovers significant biological properties. The relevant code and data in this article can be accessed for free at https://github.com/nmt315320/CircDC.git .
Affiliation(s)
- Mengting Niu
- School of Electronic and Communication Engineering, Shenzhen Polytechnic University, Shenzhen, 518055, China
- Postdoctoral Innovation Practice Base, Shenzhen Polytechnic University, Shenzhen, 518055, China
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
- Chunyu Wang
- Faculty of Computing, Harbin Institute of Technology, Harbin, 150000, Heilongjiang, China
- Yaojia Chen
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, No.4 Block 2 North Jianshe Road, Chengdu, 610054, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, No.4 Block 2 North Jianshe Road, Chengdu, 610054, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- Ren Qi
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China.
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China.
- Lei Xu
- School of Electronic and Communication Engineering, Shenzhen Polytechnic University, Shenzhen, 518055, China.
9. Niu M, Wang C, Chen Y, Zou Q, Xu L. Identification, characterization and expression analysis of circRNA encoded by SARS-CoV-1 and SARS-CoV-2. Brief Bioinform 2024; 25:bbad537. [PMID: 38279648 PMCID: PMC10818166 DOI: 10.1093/bib/bbad537] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/16/2023] [Revised: 12/12/2023] [Accepted: 12/22/2023] [Indexed: 01/28/2024] [Open Access]
Abstract
Virus-encoded circular RNA (circRNA) participates in the immune response to viral infection, affects the human immune system, and can be used as a target for precision therapy and as a tumor biomarker. The coronaviruses SARS-CoV-1 and SARS-CoV-2 (SARS-CoV-1/2) that have emerged in recent years are highly contagious and have high mortality rates. Little is known about the circRNAs encoded by SARS-CoV-1/2. Therefore, this study explores whether SARS-CoV-1/2 encode circRNAs and, if so, what the characteristics and functions of these circRNAs are. Based on RNA-seq data of SARS-CoV-1 and SARS-CoV-2 infections, we used circRNA identification tools (circRNA_finder, find_circ and CIRI2) to identify circRNAs. We identified 151 and 470 circRNAs encoded by SARS-CoV-1 and SARS-CoV-2, respectively, indicating that SARS-CoV-2 has a more prominent circRNA-encoding ability than SARS-CoV-1. Expression analysis showed that only a few circRNAs encoded by SARS-CoV-1/2 were highly expressed, and that the positive strand produced more abundant circRNAs. Then, based on the identified SARS-CoV-1/2-encoded circRNAs, we performed circRNA identification and characterization using the previously developed CirRNAPL. Finally, target gene prediction and functional enrichment analysis were performed. Viral circRNA was found to be closely related to cancer and to have a potential role in regulating host cell functions. This study examined the characteristics and functions of viral circRNAs encoded by the coronaviruses SARS-CoV-1/2, providing a valuable resource for further research on the function and molecular mechanism of coronavirus circRNA.
Affiliation(s)
- Mengting Niu
- School of Electronic and Communication Engineering, Shenzhen Polytechnic University, Shenzhen 518055, China
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
- Chunyu Wang
- Faculty of Computing, Harbin Institute of Technology, Harbin, Heilongjiang 150000, China
- Yaojia Chen
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, No.4 Block 2 North Jianshe Road, Chengdu 610054, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, No.4 Block 2 North Jianshe Road, Chengdu 610054, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- Lei Xu
- School of Electronic and Communication Engineering, Shenzhen Polytechnic University, Shenzhen 518055, China
10. Feng XY, Zhu SX, Pu KJ, Huang HJ, Chen YQ, Wang WT. New insight into circRNAs: characterization, strategies, and biomedical applications. Exp Hematol Oncol 2023; 12:91. [PMID: 37828589 PMCID: PMC10568798 DOI: 10.1186/s40164-023-00451-w] [Citation(s) in RCA: 49] [Impact Index Per Article: 24.5] [Received: 08/16/2023] [Accepted: 09/23/2023] [Indexed: 10/14/2023] [Open Access]
Abstract
Circular RNAs (circRNAs) are a class of covalently closed, endogenous ncRNAs. Most circRNAs are derived from exonic or intronic sequences by precursor RNA back-splicing. Advanced high-throughput RNA sequencing and experimental technologies have enabled the extensive identification and characterization of circRNAs, such as novel types of biogenesis, tissue-specific and cell-specific expression patterns, epigenetic regulation, translation potential, localization and metabolism. Increasing evidence has revealed that circRNAs participate in diverse cellular processes, and their dysregulation is involved in the pathogenesis of various diseases, particularly cancer. In this review, we systematically discuss the characterization of circRNAs, databases, challenges for circRNA discovery, new insight into strategies used in circRNA studies and biomedical applications. Although recent studies have advanced the understanding of circRNAs, advanced knowledge and approaches for circRNA annotation, functional characterization and biomedical applications are continuously needed to provide new insights into circRNAs. The emergence of circRNA-based protein translation strategy will be a promising direction in the field of biomedicine.
Affiliation(s)
- Xin-Yi Feng
- MOE Key Laboratory of Gene Function and Regulation, Guangdong Province Key Laboratory of Pharmaceutical Functional Genes, State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, Guangzhou, 510275, People's Republic of China
- Shun-Xin Zhu
- MOE Key Laboratory of Gene Function and Regulation, Guangdong Province Key Laboratory of Pharmaceutical Functional Genes, State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, Guangzhou, 510275, People's Republic of China
- Ke-Jia Pu
- MOE Key Laboratory of Gene Function and Regulation, Guangdong Province Key Laboratory of Pharmaceutical Functional Genes, State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, Guangzhou, 510275, People's Republic of China
- Heng-Jing Huang
- MOE Key Laboratory of Gene Function and Regulation, Guangdong Province Key Laboratory of Pharmaceutical Functional Genes, State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, Guangzhou, 510275, People's Republic of China
- Yue-Qin Chen
- MOE Key Laboratory of Gene Function and Regulation, Guangdong Province Key Laboratory of Pharmaceutical Functional Genes, State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, Guangzhou, 510275, People's Republic of China.
- Wen-Tao Wang
- MOE Key Laboratory of Gene Function and Regulation, Guangdong Province Key Laboratory of Pharmaceutical Functional Genes, State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, Guangzhou, 510275, People's Republic of China.
11. Yuan L, Zhao J, Shen Z, Zhang Q, Geng Y, Zheng CH, Huang DS. iCircDA-NEAE: Accelerated attribute network embedding and dynamic convolutional autoencoder for circRNA-disease associations prediction. PLoS Comput Biol 2023; 19:e1011344. [PMID: 37651321 PMCID: PMC10470932 DOI: 10.1371/journal.pcbi.1011344] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2023] [Accepted: 07/10/2023] [Indexed: 09/02/2023] [Open Access]
Abstract
Accumulating evidence suggests that circRNAs play crucial roles in human diseases. CircRNA-disease association prediction is extremely helpful in understanding pathogenesis, diagnosis, and prevention, as well as in identifying relevant biomarkers. During the past few years, a large number of deep learning (DL) based methods have been proposed for predicting circRNA-disease associations and have achieved impressive prediction performance. However, these methods have two main drawbacks. First, they underutilize biometric information in the data. Second, the features they extract do not represent the association characteristics between circRNAs and diseases well. In this study, we developed a novel deep learning model, named iCircDA-NEAE, to predict circRNA-disease associations. In particular, we use disease semantic similarity, Gaussian interaction profile kernel, circRNA expression profile similarity, and Jaccard similarity simultaneously for the first time, and extract hidden features based on accelerated attribute network embedding (AANE) and dynamic convolutional autoencoder (DCAE). Experimental results on the circR2Disease dataset show that iCircDA-NEAE significantly outperforms other competing methods. In addition, 16 of the top 20 circRNA-disease pairs with the highest prediction scores were validated by relevant literature. Furthermore, we observe that iCircDA-NEAE can effectively predict new potential circRNA-disease associations.
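Two of the similarity measures named in this abstract, the Gaussian interaction profile (GIP) kernel and Jaccard similarity, can be sketched on a small binary association matrix. This is an illustration under assumptions: each row is one circRNA's 0/1 association profile over diseases, and the kernel bandwidth is normalized by the mean squared profile norm, a commonly used convention that the abstract does not confirm. The function names `jaccard` and `gip_kernel` are hypothetical.

```python
# Hypothetical sketch: Jaccard similarity and Gaussian interaction profile
# (GIP) kernel computed from binary circRNA-disease association profiles.
import math


def jaccard(a, b):
    """|A intersect B| / |A union B| for two equal-length 0/1 profiles."""
    inter = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    return inter / union if union else 0.0


def gip_kernel(profiles, gamma_prime=1.0):
    """K[i][j] = exp(-gamma * ||p_i - p_j||^2).

    Assumption: gamma = gamma_prime / mean_i(||p_i||^2).
    """
    norms = [sum(v * v for v in p) for p in profiles]
    mean_norm = sum(norms) / len(norms)
    gamma = gamma_prime / mean_norm if mean_norm else 0.0
    n = len(profiles)
    return [
        [
            math.exp(-gamma * sum((x - y) ** 2
                                  for x, y in zip(profiles[i], profiles[j])))
            for j in range(n)
        ]
        for i in range(n)
    ]


if __name__ == "__main__":
    # Rows: three circRNAs; columns: three diseases (toy data).
    assoc = [[1, 0, 1], [1, 0, 1], [0, 1, 0]]
    print(jaccard(assoc[0], assoc[2]))
    print(gip_kernel(assoc)[0])
```

Transposing the association matrix gives the corresponding disease-side similarities; how the paper fuses these measures with semantic and expression similarity is not described in the abstract.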
Affiliation(s)
- Lin Yuan
- Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Shandong Computer Science Center, Qilu University of Technology (Shandong Academy of Sciences), Jinan, China
- Shandong Engineering Research Center of Big Data Applied Technology, Faculty of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan, China
- Shandong Provincial Key Laboratory of Computer Networks, Shandong Fundamental Research Center for Computer Science, Jinan, China
- Jiawang Zhao
- Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Shandong Computer Science Center, Qilu University of Technology (Shandong Academy of Sciences), Jinan, China
- Shandong Engineering Research Center of Big Data Applied Technology, Faculty of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan, China
- Shandong Provincial Key Laboratory of Computer Networks, Shandong Fundamental Research Center for Computer Science, Jinan, China
| | - Zhen Shen
- School of Computer and Software, Nanyang Institute of Technology, Nanyang, China
| | - Qinhu Zhang
- Eastern Institute for Advanced Study, Eastern Institute of Technology, Ningbo, China
| | - Yushui Geng
- Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Shandong Computer Science Center, Qilu University of Technology (Shandong Academy of Sciences), Jinan, China
- Shandong Engineering Research Center of Big Data Applied Technology, Faculty of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan, China
- Shandong Provincial Key Laboratory of Computer Networks, Shandong Fundamental Research Center for Computer Science, Jinan, China
| | - Chun-Hou Zheng
- Key Lab of Intelligent Computing and Signal Processing of Ministry of Education, School of Artificial Intelligence, Anhui University, Hefei, China
| | - De-Shuang Huang
- Eastern Institute for Advanced Study, Eastern Institute of Technology, Ningbo, China
| |
|
12
|
Rebolledo C, Silva JP, Saavedra N, Maracaja-Coutinho V. Computational approaches for circRNAs prediction and in silico characterization. Brief Bioinform 2023; 24:7150741. [PMID: 37139555 DOI: 10.1093/bib/bbad154] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 11/22/2022] [Revised: 03/20/2023] [Accepted: 03/30/2023] [Indexed: 05/05/2023] Open
Abstract
Circular RNAs (circRNAs) are single-stranded, covalently closed non-coding RNA molecules originating from RNA splicing. Their functions include regulatory potential over other molecules, such as microRNAs, messenger RNAs and RNA-binding proteins. For circRNA identification, several algorithms are available and can be classified into two major types: pseudo-reference-based and split-alignment-based approaches. In general, the data generated by circRNA transcriptome initiatives are deposited in dedicated public databases, which provide a large amount of information on different species and functional annotations. In this review, we describe the main computational resources for the identification and characterization of circRNAs, covering the algorithms and predictive tools used to evaluate their potential role in a particular transcriptomics project, as well as the public repositories containing relevant data and information on circRNAs, and we summarize their characteristics, reliability and the amount of data reported.
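The split-alignment idea mentioned above can be illustrated with a small sketch. It assumes a read has already been aligned in two segments, that both segments are reported in reference orientation, and that a back-splice shows up as the earlier part of the read mapping downstream of the later part. The `Segment` type and `back_splice` function are hypothetical and do not correspond to any specific tool covered by the review.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Segment:
    chrom: str
    strand: str        # orientation of the alignment, "+" or "-"
    ref_start: int     # 0-based start on the reference
    ref_end: int       # exclusive end on the reference
    read_offset: int   # start of the segment within the reference-oriented read


def back_splice(seg_a: Segment, seg_b: Segment) -> Optional[Tuple[str, int, int]]:
    """Return (chrom, circle_start, circle_end) if the two segments of one read
    are ordered in the read opposite to their order on the reference."""
    if seg_a.chrom != seg_b.chrom or seg_a.strand != seg_b.strand:
        return None  # chimeras across chromosomes or orientations are ignored here
    first, second = sorted((seg_a, seg_b), key=lambda s: s.read_offset)
    if first.ref_start >= second.ref_end:
        # The read starts near the downstream end of the circle and continues
        # at its upstream start: the signature of a back-splice junction.
        return (first.chrom, second.ref_start, first.ref_end)
    return None  # collinear order, consistent with linear splicing


# Hypothetical read spanning a junction that joins position 500 back to position 100.
junction_read = (Segment("chr1", "+", 480, 500, 0), Segment("chr1", "+", 100, 120, 20))
linear_read = (Segment("chr1", "+", 100, 120, 0), Segment("chr1", "+", 480, 500, 20))
```

Real pipelines add further filters (splice-signal checks, minimum anchor length, supporting-read counts) that are omitted from this sketch.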
Affiliation(s)
- Camilo Rebolledo
- Center of Molecular Biology & Pharmacogenetics, Department of Basic Sciences, Scientific and Technological Resources, Universidad de La Frontera, Temuco, Chile
- Advanced Center for Chronic Diseases - ACCDiS, Facultad de Ciencias Químicas y Farmacéuticas, Universidad de Chile, Santiago, Chile
- Centro de Modelamiento Molecular, Biofísica y Bioinformática - CM2B2, Facultad de Ciencias Químicas y Farmacéuticas, Universidad de Chile, Santiago, Chile
| | - Juan Pablo Silva
- Centro de Modelamiento Molecular, Biofísica y Bioinformática - CM2B2, Facultad de Ciencias Químicas y Farmacéuticas, Universidad de Chile, Santiago, Chile
- ANID Anillo ACT210004 SYSTEMIX, Rancagua, Chile
| | - Nicolás Saavedra
- Center of Molecular Biology & Pharmacogenetics, Department of Basic Sciences, Scientific and Technological Resources, Universidad de La Frontera, Temuco, Chile
| | - Vinicius Maracaja-Coutinho
- Advanced Center for Chronic Diseases - ACCDiS, Facultad de Ciencias Químicas y Farmacéuticas, Universidad de Chile, Santiago, Chile
- Centro de Modelamiento Molecular, Biofísica y Bioinformática - CM2B2, Facultad de Ciencias Químicas y Farmacéuticas, Universidad de Chile, Santiago, Chile
- ANID Anillo ACT210004 SYSTEMIX, Rancagua, Chile
- Anillo Inflammation in HIV/AIDS - InflammAIDS, Santiago, Chile
| |
|
13
|
Wu P, Nie Z, Huang Z, Zhang X. CircPCBL: Identification of Plant CircRNAs with a CNN-BiGRU-GLT Model. PLANTS (BASEL, SWITZERLAND) 2023; 12:1652. [PMID: 37111874 PMCID: PMC10143888 DOI: 10.3390/plants12081652] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2023] [Revised: 04/10/2023] [Accepted: 04/13/2023] [Indexed: 06/19/2023]
Abstract
Circular RNAs (circRNAs), which are produced by the splicing of pre-mRNAs, are strongly linked to the emergence of several tumor types. Identifying circRNAs is the first step in any follow-up study. Currently, most established circRNA recognition tools target animals. However, the sequence features of plant circRNAs differ from those of animal circRNAs, so these tools cannot reliably detect plant circRNAs. For example, plant circRNAs have non-GT/AG splicing signals at their junction sites and few reverse complementary sequences and repetitive elements in their flanking introns. In addition, there have been few studies on circRNAs in plants, so a plant-specific method for identifying circRNAs is urgently needed. In this study, we propose CircPCBL, a deep-learning approach that uses only raw sequences to distinguish plant circRNAs from other lncRNAs. CircPCBL comprises two separate detectors: a CNN-BiGRU detector and a GLT detector. The CNN-BiGRU detector takes the one-hot encoding of the RNA sequence as input, while the GLT detector uses k-mer (k = 1 to 4) features. The output matrices of the two submodels are then concatenated and passed through a fully connected layer to produce the final output. To verify the generalization performance of the model, we evaluated CircPCBL on several datasets. It achieved an F1 of 85.40% on the validation dataset composed of six different plant species and 85.88%, 75.87%, and 86.83% on the three cross-species independent test sets composed of Cucumis sativus, Populus trichocarpa, and Gossypium raimondii, respectively. On the real set, CircPCBL correctly predicted ten of the eleven experimentally reported circRNAs of Poncirus trifoliata and nine of the ten lncRNAs of rice, for accuracies of 90.9% and 90%, respectively. CircPCBL could potentially contribute to the identification of circRNAs in plants. In addition, CircPCBL achieved an average accuracy of 94.08% on the human datasets, implying its potential application to animal datasets. Finally, CircPCBL is available as a web server, from which the data and source code can also be downloaded free of charge.
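The two input encodings described in this abstract can be sketched as follows. This is a minimal illustration that assumes an RNA alphabet of A, C, G and U and normalised k-mer frequencies; the actual preprocessing used by CircPCBL (alphabet, padding, normalisation) is not specified in the abstract, and the function names are hypothetical.

```python
from itertools import product

ALPHABET = "ACGU"  # assumption: RNA alphabet; a DNA-style "T" would need a different string


def one_hot(seq):
    """One row per base, one column per alphabet letter; unknown bases become all zeros."""
    return [[1 if base == ref else 0 for ref in ALPHABET] for base in seq]


def kmer_frequencies(seq, k_values=(1, 2, 3, 4)):
    """Concatenated k-mer frequency vectors for each k (4 + 16 + 64 + 256 = 340 values)."""
    features = []
    for k in k_values:
        kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
        counts = dict.fromkeys(kmers, 0)
        windows = max(len(seq) - k + 1, 0)
        for i in range(windows):
            kmer = seq[i:i + k]
            if kmer in counts:  # skip windows containing ambiguous bases
                counts[kmer] += 1
        features.extend(counts[m] / windows if windows else 0.0 for m in kmers)
    return features


seq = "AUGGCUAGCU"          # hypothetical toy sequence
encoded = one_hot(seq)      # shape: len(seq) x 4
kmer_vec = kmer_frequencies(seq)
```

In a two-branch model of this kind, `encoded` would feed the sequence branch and `kmer_vec` the feature branch before their outputs are concatenated.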
Affiliation(s)
- Pengpeng Wu
- Anhui Province Key Laboratory of Smart Agricultural Technology and Equipment, Anhui Agricultural University, Hefei 230036, China
- School of Life Science, Anhui Agricultural University, Hefei 230036, China
| | - Zhenjun Nie
- Anhui Province Key Laboratory of Smart Agricultural Technology and Equipment, Anhui Agricultural University, Hefei 230036, China
- School of Information and Computer Science, Anhui Agricultural University, Hefei 230036, China
| | - Zhiqiang Huang
- Anhui Province Key Laboratory of Smart Agricultural Technology and Equipment, Anhui Agricultural University, Hefei 230036, China
- School of Information and Computer Science, Anhui Agricultural University, Hefei 230036, China
| | - Xiaodan Zhang
- Anhui Province Key Laboratory of Smart Agricultural Technology and Equipment, Anhui Agricultural University, Hefei 230036, China
- School of Information and Computer Science, Anhui Agricultural University, Hefei 230036, China
| |
|
14
|
Wang W, Sun L, Huang MT, Quan Y, Jiang T, Miao Z, Zhang Q. Regulatory circular RNAs in viral diseases: applications in diagnosis and therapy. RNA Biol 2023; 20:847-858. [PMID: 37882652 PMCID: PMC10730172 DOI: 10.1080/15476286.2023.2272118] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Accepted: 10/10/2023] [Indexed: 10/27/2023] Open
Abstract
Circular RNA (circRNA) forms closed loops via back-splicing in precursor mRNA, resisting exonuclease degradation. In higher eukaryotes, protein-coding genes create circRNAs through exon back-splicing. Unlike mRNAs, circRNAs possess unique production and structural traits, bestowing distinct cellular functions and biomedical potential. In this review, we explore the pivotal roles of viral circRNAs and associated RNA in various biological processes. Analysing the interactions between viral circRNA and host cellular machinery yields fresh insights into antiviral immunity, catalysing the development of potential therapeutics. Furthermore, circRNAs serve as enduring biomarkers in viral diseases due to their stable translation within specific tissues. Additionally, a deeper understanding of translational circRNA could expedite the establishment of circRNA-based expression platforms, meeting the rising demand for broad-spectrum viral vaccines. We also highlight the applications of circular RNA in biomarker studies as well as circRNA-based therapeutics. Prospectively, we expect a technological revolution in combating viral infections using circRNA.
Affiliation(s)
- Wei Wang
- Guangzhou National Laboratory, Guangzhou, Guangdong, China
| | - Lei Sun
- Guangzhou National Laboratory, Guangzhou, Guangdong, China
| | - Meng-Ting Huang
- Guangzhou National Laboratory, Guangzhou, Guangdong, China
- State Key Laboratory of Respiratory Disease, National Clinical Research Center for Respiratory Disease, Guangzhou Institute of Respiratory Health, The First Affiliated Hospital of Guangzhou Medical University, Guangzhou, Guangdong, China
| | - Yun Quan
- Guangzhou National Laboratory, Guangzhou, Guangdong, China
| | - Tao Jiang
- State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Academy of Military Medical Sciences, Beijing, China
| | - Zhichao Miao
- Guangzhou National Laboratory, Guangzhou, Guangdong, China
- Shanghai Key Laboratory of Anesthesiology and Brain Functional Modulation, Clinical Research Center for Anesthesiology and Perioperative Medicine, Translational Research Institute of Brain and Brain-Like Intelligence, Shanghai Fourth People’s Hospital, School of Medicine, Tongji University, Shanghai, China
| | - Qiong Zhang
- Guangzhou National Laboratory, Guangzhou, Guangdong, China
- State Key Laboratory of Respiratory Disease, National Clinical Research Center for Respiratory Disease, Guangzhou Institute of Respiratory Health, The First Affiliated Hospital of Guangzhou Medical University, Guangzhou, Guangdong, China
| |
|
15
|
Wang X, Liu Y, Li J, Wang G. StackCirRNAPred: computational classification of long circRNA from other lncRNA based on stacking strategy. BMC Bioinformatics 2022; 23:563. [PMID: 36575368 PMCID: PMC9793644 DOI: 10.1186/s12859-022-05118-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/03/2022] [Accepted: 12/20/2022] [Indexed: 12/29/2022] Open
Abstract
BACKGROUND: CircRNAs are essential for the regulation of post-transcriptional gene expression, including as miRNA sponges, and play an important role in disease development. Several computational tools have recently been proposed to predict circRNAs, but because each uses only a single classifier, there is still considerable room to improve performance. RESULTS: We propose StackCirRNAPred, a stacking-based computational method for classifying long circRNAs from other lncRNAs. Because a single feature might not distinguish circRNAs well from other lncRNAs, we first extracted features from different sources, including nucleic acid composition, sequence spatial features, physicochemical properties, and Alu and tandem repeats. We then applied the stacking strategy to integrate the stronger classifiers: RF, LightGBM, and XGBoost. This allows the model to incorporate these features more flexibly. StackCirRNAPred was significantly better than other tools, with precision, accuracy, F1, recall and MCC of 0.843, 0.833, 0.831, 0.819 and 0.666, respectively. When tested directly on the mouse dataset, StackCirRNAPred was still significantly better than other methods, with precision, accuracy, F1, recall and MCC of 0.837, 0.839, 0.839, 0.841 and 0.677, respectively. CONCLUSIONS: We proposed StackCirRNAPred, based on a stacking strategy, to distinguish long circRNAs from other lncRNAs. With the test results demonstrating the validity and robustness of StackCirRNAPred, we hope that it will complement existing circRNA prediction methods and be helpful in downstream research.
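The stacking strategy can be sketched in a few lines: base learners produce out-of-fold scores, and a meta-learner is trained on those scores. The sketch below is only an illustration under strong simplifications. The toy `CentroidScorer` stands in for the RF, LightGBM and XGBoost base classifiers named in the abstract, the meta-learner is a small logistic regression, and the data are hypothetical.

```python
import math


class CentroidScorer:
    """Toy base learner: compares a single feature with the two class means."""
    def __init__(self, feature):
        self.feature = feature

    def fit(self, X, y):
        pos = [row[self.feature] for row, label in zip(X, y) if label == 1]
        neg = [row[self.feature] for row, label in zip(X, y) if label == 0]
        self.pos_mean, self.neg_mean = sum(pos) / len(pos), sum(neg) / len(neg)
        return self

    def predict_proba(self, X):
        margins = [abs(r[self.feature] - self.neg_mean) - abs(r[self.feature] - self.pos_mean) for r in X]
        return [1.0 / (1.0 + math.exp(-m)) for m in margins]


class LogisticMeta:
    """Meta-learner: logistic regression trained by batch gradient descent."""
    def fit(self, Z, y, lr=0.5, epochs=500):
        self.w, self.b = [0.0] * len(Z[0]), 0.0
        for _ in range(epochs):
            errors = [self._prob(z) - t for z, t in zip(Z, y)]
            for j in range(len(self.w)):
                self.w[j] -= lr * sum(e * z[j] for e, z in zip(errors, Z)) / len(Z)
            self.b -= lr * sum(errors) / len(Z)
        return self

    def _prob(self, z):
        return 1.0 / (1.0 + math.exp(-(sum(w * v for w, v in zip(self.w, z)) + self.b)))

    def predict(self, Z):
        return [1 if self._prob(z) >= 0.5 else 0 for z in Z]


def out_of_fold(make_learner, X, y, n_folds=4):
    """Score every sample with a learner that did not see it during training."""
    oof = [0.0] * len(X)
    for fold in range(n_folds):
        test = [i for i in range(len(X)) if i % n_folds == fold]
        train = [i for i in range(len(X)) if i % n_folds != fold]
        learner = make_learner().fit([X[i] for i in train], [y[i] for i in train])
        for i, p in zip(test, learner.predict_proba([X[i] for i in test])):
            oof[i] = p
    return oof


# Hypothetical, well-separated toy data (two features, binary labels).
X = [[0.1, 0.3], [2.9, 3.2], [0.4, 0.0], [3.1, 2.7], [0.2, 0.5], [2.6, 3.0], [0.0, 0.2], [3.3, 2.8]]
y = [0, 1, 0, 1, 0, 1, 0, 1]
factories = [lambda: CentroidScorer(0), lambda: CentroidScorer(1)]
meta_features = [list(row) for row in zip(*(out_of_fold(f, X, y) for f in factories))]
meta = LogisticMeta().fit(meta_features, y)
base_models = [f().fit(X, y) for f in factories]  # refit on all data for inference


def stacked_predict(samples):
    columns = [m.predict_proba(samples) for m in base_models]
    return meta.predict([list(row) for row in zip(*columns)])
```

Training the meta-learner on out-of-fold scores rather than in-sample scores is the usual way to keep it from simply trusting whichever base model overfits most.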
Affiliation(s)
- Xin Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China (GRID: grid.19373.3f; ISNI: 0000 0001 0193 3564)
| | - Yadong Liu
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China (GRID: grid.19373.3f; ISNI: 0000 0001 0193 3564)
| | - Jie Li
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China (GRID: grid.19373.3f; ISNI: 0000 0001 0193 3564)
| | - Guohua Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China (GRID: grid.19373.3f; ISNI: 0000 0001 0193 3564)
| |
|
16
|
Ao C, Jiao S, Wang Y, Yu L, Zou Q. Biological Sequence Classification: A Review on Data and General Methods. RESEARCH (WASHINGTON, D.C.) 2022; 2022:0011. [PMID: 39285948 PMCID: PMC11404319 DOI: 10.34133/research.0011] [Citation(s) in RCA: 47] [Impact Index Per Article: 15.7] [Received: 07/25/2022] [Accepted: 10/25/2022] [Indexed: 09/19/2024]
Abstract
With the rapid development of biotechnology, the number of biological sequences has grown exponentially. The continuous expansion of biological sequence data promotes the application of machine learning to biological sequences, with the goal of constructing predictive models that mine sequence information. There are many branches of biological sequence classification research. In this review, we mainly focus on machine learning based classification of the function and modification of biological sequences. Sequence-based prediction and analysis are basic tasks in understanding the biological functions of DNA, RNA, proteins, and peptides. However, hundreds of classification models have been developed for biological sequences, and the wide variety of specific methods can seem overwhelming at first glance. Here, we aim to establish a long-term support website (http://lab.malab.cn/~acy/BioseqData/home.html), which provides readers with detailed information on the classification methods and download links to relevant datasets. We briefly introduce the steps to build an effective model framework for biological sequence data. In addition, a brief introduction to single-cell sequencing data analysis methods and their applications in biology is included. Finally, we discuss the current challenges and future perspectives of biological sequence classification research.
Affiliation(s)
- Chunyan Ao
- School of Computer Science and Technology, Xidian University, Xi'an, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
| | - Shihu Jiao
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
| | - Yansu Wang
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
| | - Liang Yu
- School of Computer Science and Technology, Xidian University, Xi'an, China
| | - Quan Zou
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
| |
|
17
|
Wang J, Lu S, Wang SH, Zhang YD. A review on extreme learning machine. MULTIMEDIA TOOLS AND APPLICATIONS 2022; 81:41611-41660. [DOI: 10.1007/s11042-021-11007-7] [Citation(s) in RCA: 31] [Impact Index Per Article: 10.3] [Received: 08/21/2020] [Revised: 02/26/2021] [Accepted: 05/05/2021] [Indexed: 08/30/2023]
Abstract
Extreme learning machine (ELM) is a training algorithm for single hidden layer feedforward neural networks (SLFNs) that converges much faster than traditional methods and yields promising performance. In this paper, we present a comprehensive review of ELM. First, we focus on the theoretical analysis, including universal approximation theory and generalization. Then, we list the various improvements that help ELM work better in terms of stability, efficiency, and accuracy. Because of its outstanding performance, ELM has been successfully applied in many real-time learning tasks for classification, clustering, and regression. In addition, we report the applications of ELM in medical imaging: MRI, CT, and mammography. The controversies surrounding ELM are also discussed. We aim to report these advances and identify some future perspectives.
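The core of ELM training can be sketched briefly: the hidden-layer weights are drawn at random and left fixed, and only the output weights are obtained by solving a linear least-squares problem. The sketch below assumes a tanh activation, a small ridge term for numerical stability, and a plain Gauss-Jordan solver; the class name, hyperparameters and XOR toy data are illustrative choices, not taken from the review.

```python
import math
import random


def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0.0:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]


class ELM:
    def __init__(self, n_inputs, n_hidden, ridge=1e-6, seed=0):
        rng = random.Random(seed)
        # Random input weights and biases are never trained.
        self.W = [[rng.uniform(-2, 2) for _ in range(n_inputs)] for _ in range(n_hidden)]
        self.b = [rng.uniform(-2, 2) for _ in range(n_hidden)]
        self.ridge = ridge
        self.beta = None

    def hidden(self, X):
        return [[math.tanh(sum(w * x for w, x in zip(ws, row)) + b)
                 for ws, b in zip(self.W, self.b)] for row in X]

    def fit(self, X, T):
        H = self.hidden(X)
        L = len(self.W)
        # Regularised normal equations: (H^T H + ridge * I) beta = H^T T
        HtH = [[sum(h[i] * h[j] for h in H) + (self.ridge if i == j else 0.0)
                for j in range(L)] for i in range(L)]
        HtT = [[sum(h[i] * t[k] for h, t in zip(H, T)) for k in range(len(T[0]))]
               for i in range(L)]
        self.beta = solve(HtH, HtT)
        return self

    def predict(self, X):
        return [[sum(h[i] * self.beta[i][k] for i in range(len(h)))
                 for k in range(len(self.beta[0]))] for h in self.hidden(X)]


# Toy XOR problem: four samples, one output.
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
T = [[0.0], [1.0], [1.0], [0.0]]
elm = ELM(n_inputs=2, n_hidden=12).fit(X, T)
preds = elm.predict(X)
```

Because no iterative weight updates are involved, training reduces to a single linear solve, which is where the speed advantage described above comes from.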
|
18
|
Asim MN, Ibrahim MA, Imran Malik M, Dengel A, Ahmed S. Circ-LocNet: A Computational Framework for Circular RNA Sub-Cellular Localization Prediction. Int J Mol Sci 2022; 23:ijms23158221. [PMID: 35897818 PMCID: PMC9329987 DOI: 10.3390/ijms23158221] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/16/2022] [Revised: 07/15/2022] [Accepted: 07/20/2022] [Indexed: 02/04/2023] Open
Abstract
Circular ribonucleic acids (circRNAs) are novel non-coding RNAs that arise from alternative splicing of precursor mRNA in which exons are joined in reversed order. Despite the abundance of circRNAs in human genes and their involvement in diverse physiological processes, the function of most circRNAs remains unknown. As with other non-coding RNAs, knowledge of the sub-cellular localization of circRNAs can help clarify their influence on protein synthesis, degradation, and destination, their association with different diseases, and their potential for drug development. To date, wet-lab experimental approaches are used to detect the sub-cellular locations of circular RNAs. These approaches help to elucidate the role of circRNAs as protein scaffolds, RNA-binding protein (RBP) sponges, micro-RNA (miRNA) sponges, parental gene expression modifiers, alternative splicing regulators, and transcription regulators. To complement wet-lab experiments, and considering the progress made by machine learning approaches in determining the sub-cellular localization of other non-coding RNAs, this paper develops a computational framework, Circ-LocNet, to precisely detect circRNA sub-cellular localization. Circ-LocNet performs a comprehensive extrinsic evaluation of 7 residue frequency-based, residue order and frequency-based, and physico-chemical property-based sequence descriptors using the five most widely used machine learning classifiers. Further, it explores the performance impact of K-order sequence descriptor fusion, in which it ensembles similar as well as dissimilar genres of statistical representation learning approaches to reap their combined benefits. Considering the diversity of statistical representation learning schemes, it assesses the performance of second-order, third-order, and up to seventh-order sequence descriptor fusion. A comprehensive empirical evaluation of Circ-LocNet over a newly developed benchmark dataset using different settings reveals that standalone residue frequency-based sequence descriptors and tree-based classifiers are more suitable for predicting the sub-cellular localization of circular RNAs. Further, K-order heterogeneous sequence descriptor fusion in combination with tree-based classifiers most accurately predicts the sub-cellular localization of circular RNAs. We anticipate that this study will act as a rich baseline and push the development of robust computational methodologies for accurately determining the sub-cellular localization of novel circRNAs.
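K-order descriptor fusion, as described above, amounts to concatenating the feature vectors of K different sequence descriptors and evaluating every such combination. The sketch below illustrates this with three simple descriptors; the names `NAC`, `DNC` and `GC` and their definitions are illustrative stand-ins and are not claimed to be the seven descriptors evaluated by Circ-LocNet.

```python
from itertools import combinations, product

BASES = "ACGU"  # assumption: RNA alphabet


def nac(seq):
    """Nucleotide composition: 4 frequencies."""
    return [seq.count(b) / len(seq) for b in BASES]


def dnc(seq):
    """Dinucleotide composition: 16 frequencies over overlapping pairs."""
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    return [pairs.count(a + b) / len(pairs) for a, b in product(BASES, repeat=2)]


def gc(seq):
    """GC content as a one-element vector."""
    return [(seq.count("G") + seq.count("C")) / len(seq)]


DESCRIPTORS = {"NAC": nac, "DNC": dnc, "GC": gc}


def fuse(seq, names):
    """Concatenate the vectors of the chosen descriptors, in the given order."""
    vector = []
    for name in names:
        vector.extend(DESCRIPTORS[name](seq))
    return vector


def k_order_fusions(k):
    """All ways of choosing k distinct descriptors."""
    return list(combinations(sorted(DESCRIPTORS), k))


seq = "AUGGCUAGCUAA"  # hypothetical toy sequence
second_order = {names: fuse(seq, names) for names in k_order_fusions(2)}
```

With seven descriptors, the same enumeration would give 21 second-order fusions and a single seventh-order fusion, each of which would be passed to every classifier under evaluation.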
Affiliation(s)
- Muhammad Nabeel Asim
- German Research Center for Artificial Intelligence (DFKI), 67663 Kaiserslautern, Germany; (M.A.I.); (A.D.); (S.A.)
- Department of Computer Science, Technical University of Kaiserslautern, 67663 Kaiserslautern, Germany
- Correspondence:
| | - Muhammad Ali Ibrahim
- German Research Center for Artificial Intelligence (DFKI), 67663 Kaiserslautern, Germany; (M.A.I.); (A.D.); (S.A.)
- Department of Computer Science, Technical University of Kaiserslautern, 67663 Kaiserslautern, Germany
| | - Muhammad Imran Malik
- School of Computer Science & Electrical Engineering, National University of Sciences and Technology, Islamabad 44000, Pakistan;
| | - Andreas Dengel
- German Research Center for Artificial Intelligence (DFKI), 67663 Kaiserslautern, Germany; (M.A.I.); (A.D.); (S.A.)
- Department of Computer Science, Technical University of Kaiserslautern, 67663 Kaiserslautern, Germany
| | - Sheraz Ahmed
- German Research Center for Artificial Intelligence (DFKI), 67663 Kaiserslautern, Germany; (M.A.I.); (A.D.); (S.A.)
- DeepReader GmbH, Trippstadter Str. 122, 67663 Kaiserslautern, Germany
| |
|
19
|
Niu M, Zou Q. SgRNA-RF: Identification of SgRNA On-Target Activity With Imbalanced Datasets. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:2442-2453. [PMID: 33979289 DOI: 10.1109/tcbb.2021.3079116] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 06/12/2023]
Abstract
Single-guide RNA (sgRNA) is a guide RNA (gRNA), which guides the insertion or deletion of uridine residues in kinetoplastids during RNA editing. It is a small non-coding RNA that can pair with pre-mRNA. SgRNA is a critical component of the CRISPR/Cas9 gene knockout system and plays an important role in gene editing and gene regulation. It is important to accurately and quickly identify sgRNAs with high on-target activity. For this reason, several computational predictors have been proposed to predict sgRNA on-target activity. All these methods have clearly contributed to the development of this field. However, they also have certain limitations. In this paper, we developed a new classifier, SgRNA-RF, which extracts nucleic acid composition and structure features from sgRNA sequences and identifies on-target activity with the random forest algorithm. In addition, to handle the imbalanced dataset, we propose a new method called CS-Smote. We compared SgRNA-RF with state-of-the-art predictors on the five datasets and found that SgRNA-RF significantly improved the identification accuracy, with accuracies of 0.8636, 0.9161, 0.894, 0.938, 0.965, 0.77, 0.979 and 0.973, respectively. The user-friendly web server that implements SgRNA-RF is freely available at http://server.malab.cn/sgRNA-RF/.
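The abstract does not define CS-Smote, so the sketch below shows only the standard SMOTE idea that such methods build on: a synthetic minority sample is created by interpolating between a minority sample and one of its nearest minority neighbours. The function name, parameters and toy data are hypothetical, and this is not the authors' CS-Smote.

```python
import random


def smote(minority, n_synthetic, k=2, seed=0):
    """Generate synthetic minority samples by interpolating towards near neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbours of the chosen sample (squared Euclidean distance)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Hypothetical minority-class feature vectors.
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
synthetic = smote(minority, n_synthetic=6)
```

The synthetic samples would be appended to the minority class before training the classifier, so that both classes contribute comparable numbers of examples.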
|
20
|
An Improved Multi-Label Learning Method with ELM-RBF and a Synergistic Adaptive Genetic Algorithm. ALGORITHMS 2022. [DOI: 10.3390/a15060185] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/05/2023]
Abstract
Owing to the great progress of information technology, a huge number of multi-label samples are now available in daily life. As a result, multi-label classification has attracted widespread attention. Unlike traditional machine learning methods, which are time-consuming during the training phase, ELM-RBF (extreme learning machine-radial basis function) is more efficient and has become a research hotspot in multi-label classification. However, because of the lack of effective optimization methods, conventional extreme learning machines are often unstable and tend to fall into local optima, which leads to low prediction accuracy in practical applications. To this end, a modified ELM-RBF with a synergistic adaptive genetic algorithm (ELM-RBF-SAGA) is proposed in this paper. In ELM-RBF-SAGA, we present a synergistic adaptive genetic algorithm (SAGA) to optimize the performance of ELM-RBF. Two optimization methods are employed collaboratively in SAGA: one adjusts the range of the fitness values, and the other updates the crossover and mutation probabilities. Extensive experiments show that ELM-RBF-SAGA achieves excellent performance in multi-label classification.
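The abstract does not give SAGA's update rule for the crossover and mutation probabilities. As an illustration of what an adaptive rule can look like, the sketch below follows the classic fitness-based scheme usually attributed to Srinivas and Patnaik, in which above-average individuals are disrupted less. The function name and the default constants are illustrative assumptions, and this is not claimed to be the rule used in SAGA.

```python
def adaptive_probabilities(f_parent, f_individual, f_max, f_avg,
                           k1=1.0, k2=0.5, k3=1.0, k4=0.5):
    """Return (crossover probability, mutation probability) for a maximisation problem.

    f_parent:     larger fitness of the two parents selected for crossover
    f_individual: fitness of the individual considered for mutation
    f_max, f_avg: best and mean fitness of the current population
    """
    spread = f_max - f_avg
    if spread <= 0:
        # Population has converged: fall back to the exploratory defaults.
        return k3, k4
    pc = k1 * (f_max - f_parent) / spread if f_parent >= f_avg else k3
    pm = k2 * (f_max - f_individual) / spread if f_individual >= f_avg else k4
    return pc, pm


best = adaptive_probabilities(10.0, 10.0, f_max=10.0, f_avg=5.0)   # best individual is protected
weak = adaptive_probabilities(4.0, 4.0, f_max=10.0, f_avg=5.0)     # below-average individual
middle = adaptive_probabilities(7.5, 7.5, f_max=10.0, f_avg=5.0)
```

Under this kind of rule the best solutions pass through nearly unchanged, while weak solutions are recombined and mutated aggressively, which is one way to reduce the risk of stalling in a local optimum.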
|
21
|
Mi Z, Zhongqiang C, Caiyun J, Yanan L, Jianhua W, Liang L. Circular RNA detection methods: A minireview. Talanta 2022; 238:123066. [PMID: 34808570 DOI: 10.1016/j.talanta.2021.123066] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 09/16/2021] [Revised: 11/11/2021] [Accepted: 11/12/2021] [Indexed: 12/21/2022]
Abstract
Circular RNA (circRNA), a novel type of covalently closed RNA, is implicated in several developmental and metabolic disease processes. CircRNAs exhibit tissue-specific expression and are stable, abundant, and highly conserved, making them ideal biomarkers for diagnosis and prognosis. Accurate profiling of circRNAs, however, is a prerequisite for their clinical application. Traditional methods such as northern blotting, RT-qPCR, and microarray analysis provide useful but limited information. To address these issues, a number of novel assays have recently emerged, such as droplet digital PCR (ddPCR), isothermal exponential amplification, and rolling circle amplification, which increase the sensitivity and specificity of circRNA detection. Herein, we summarize the advantages and limitations of the new detection methods and discuss the challenges as well as future directions.
Affiliation(s)
- Zhang Mi
- Department of Pharmacy, Zhongnan Hospital of Wuhan University, Wuhan, 430071, China
| | - Chen Zhongqiang
- School of Medicine, Jianghan University, Wuhan, 430056, China
| | - Jiang Caiyun
- Department of Pharmacy, The Third Affiliate Hospital of Sun Yat-Sen University, Guangzhou, 510630, China
| | - Liu Yanan
- Department of Pharmacy, Zhongnan Hospital of Wuhan University, Wuhan, 430071, China
| | - Wu Jianhua
- Department of Pharmacy, Zhongnan Hospital of Wuhan University, Wuhan, 430071, China
| | - Liu Liang
- Department of Pharmacy, Zhongnan Hospital of Wuhan University, Wuhan, 430071, China.
| |
|
22
|
Zhao Z, Yang W, Zhai Y, Liang Y, Zhao Y. Identify DNA-Binding Proteins Through the Extreme Gradient Boosting Algorithm. Front Genet 2022; 12:821996. [PMID: 35154264 PMCID: PMC8837382 DOI: 10.3389/fgene.2021.821996] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 11/25/2021] [Accepted: 12/07/2021] [Indexed: 12/13/2022] Open
Abstract
The exploration of DNA-binding proteins (DBPs) is an important aspect of studying biological life activities. Research on life activities relies on findings about DBPs, and the decline of many life activities is closely related to DBPs. DBPs are generally identified through biochemical experiments, an approach that is inefficient and requires considerable manpower, material resources and time. At present, several computational approaches have been developed to detect DBPs, among which machine learning (ML) algorithm-based techniques have shown excellent performance. In our experiments, our method uses fewer features and simpler recognition methods than other methods while still obtaining satisfactory results. First, we use six feature extraction methods to extract sequence features from the same group of DBPs. Then, these features are concatenated, and the data are standardized. Finally, the extreme gradient boosting (XGBoost) model is used to construct an effective predictive model. Compared with other excellent methods, our proposed method has achieved better results. The accuracy achieved by our method is 78.26% for PDB2272 and 85.48% for PDB186. The accuracy of the experimental results achieved by our strategy is similar to that of previous detection methods.
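The preprocessing steps described here, concatenating several feature blocks and then standardizing, can be sketched as follows. The sketch assumes column-wise z-score standardization and uses hypothetical toy feature blocks; the six feature extraction methods themselves and the XGBoost training step are not reproduced.

```python
import math


def concatenate(feature_blocks):
    """Join several feature matrices (same number of rows) column-wise."""
    n_rows = len(feature_blocks[0])
    merged = []
    for i in range(n_rows):
        row = []
        for block in feature_blocks:
            row.extend(block[i])
        merged.append(row)
    return merged


def standardize(matrix):
    """Column-wise z-score; constant columns are left at zero."""
    n = len(matrix)
    columns = list(zip(*matrix))
    means = [sum(c) / n for c in columns]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / n) or 1.0
            for c, m in zip(columns, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in matrix]


# Hypothetical output of two feature extractors for the same two proteins.
block_a = [[1.0, 2.0], [3.0, 2.0]]
block_b = [[10.0], [20.0]]
features = standardize(concatenate([block_a, block_b]))
```

The standardized matrix would then be passed, together with the labels, to a gradient-boosting classifier such as XGBoost.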
Affiliation(s)
- Ziye Zhao
- College of Information and Computer Engineering, Northeast Forestry University, Harbin, China
| | - Wen Yang
- International Medical Center, Shenzhen University General Hospital, Shenzhen, China
| | - Yixiao Zhai
- College of Information and Computer Engineering, Northeast Forestry University, Harbin, China
| | - Yingjian Liang
- Department of Obstetrics and Gynecology, The First Affiliated Hospital of Harbin Medical University, Harbin, China
- *Correspondence: Yingjian Liang, ; Yuming Zhao,
| | - Yuming Zhao
- College of Information and Computer Engineering, Northeast Forestry University, Harbin, China
- *Correspondence: Yingjian Liang, ; Yuming Zhao,
| |
|
23
|
Niu M, Zou Q, Lin C. CRBPDL: Identification of circRNA-RBP interaction sites using an ensemble neural network approach. PLoS Comput Biol 2022; 18:e1009798. [PMID: 35051187 PMCID: PMC8806072 DOI: 10.1371/journal.pcbi.1009798] [Citation(s) in RCA: 32] [Impact Index Per Article: 10.7] [Received: 09/23/2021] [Revised: 02/01/2022] [Accepted: 01/02/2022] [Indexed: 02/06/2023] Open
Abstract
Circular RNAs (circRNAs) are non-coding RNAs with a special circular structure formed by the reverse splicing mechanism. Increasing evidence shows that circular RNAs can directly bind to RNA-binding proteins (RBPs) and play an important role in a variety of biological activities. The interactions between circRNAs and RBPs are key to understanding the mechanism of posttranscriptional regulation, and accurately identifying binding sites is very useful for analyzing these interactions. In past research, some machine learning (ML) based predictors have been presented, but their prediction accuracy still needs to be improved. Therefore, we present a novel computational model, CRBPDL, which uses an Adaboost-integrated deep hierarchical network to identify circular RNA-RBP binding sites. CRBPDL combines five different feature encoding schemes to encode the original RNA sequence and uses deep multiscale residual networks (MSRN) and bidirectional gated recurrent units (BiGRUs) to learn high-level feature representations, which allows it to extract local and global context information at the same time. Additionally, a self-attention mechanism is employed to improve the robustness of CRBPDL. Finally, the Adaboost algorithm is applied to integrate the deep learning (DL) models and improve the prediction performance and reliability of the model. To verify the usefulness of CRBPDL, we compared its performance with state-of-the-art methods on 37 circular RNA data sets and 31 linear RNA data sets. The results show that CRBPDL is general, reliable, and robust. The code and data sets are available at https://github.com/nmt315320/CRBPDL.git. A growing body of evidence shows that circular RNAs can directly bind to proteins and participate in many different biological processes. Computational methods can quickly and accurately predict the binding sites of circular RNAs and RBPs. To identify the interactions of circRNAs with 37 different types of circRNA-binding proteins, we developed an integrated deep learning network based on a hierarchical network, called CRBPDL. It can effectively learn high-level feature representations. The performance of the model was verified through comparative experiments with different feature extraction algorithms, different deep learning models and different classifier models. Moreover, the CRBPDL model was applied to 31 linear RNAs, and the effectiveness of our method was demonstrated by comparison with the results of current leading algorithms. We expect that the CRBPDL model can effectively predict circular RNA-RBP binding sites and provide reliable candidates for further biological experiments.
Affiliation(s)
- Mengting Niu
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang, China
| | - Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang, China
| | - Chen Lin
- School of Informatics, Xiamen University, Xiamen, China
| |
|
24
|
Attention-Based Deep Multiple-Instance Learning for Classifying Circular RNA and Other Long Non-Coding RNA. Genes (Basel) 2021; 12:genes12122018. [PMID: 34946967 PMCID: PMC8701965 DOI: 10.3390/genes12122018] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/11/2021] [Revised: 12/14/2021] [Accepted: 12/17/2021] [Indexed: 12/23/2022] Open
Abstract
Circular RNA (circRNA) is a distinct, circular long non-coding RNA (lncRNA) with specific roles in transcriptional regulation and multiple biological processes. Distinguishing circRNAs from other lncRNAs is necessary for relevant research. In this study, we designed an attention-based multi-instance learning (MIL) network architecture fed with raw sequences to learn the sparse features of RNA sequences and accomplish the circRNA identification task. The model outperformed state-of-the-art models. Moreover, after validating the effectiveness of the attention mechanism on a handwritten digit dataset, the key sequence loci underlying circRNA recognition were obtained from the corresponding attention scores. Motif enrichment analysis then identified some of the key motifs for circRNA formation. In conclusion, we designed a deep learning network architecture suitable for learning gene sequences with sparse features and applied it to the circRNA identification task; the model has strong representation capability in indicating some key loci.
|
25
|
Niu M, Ju Y, Lin C, Zou Q. Characterizing viral circRNAs and their application in identifying circRNAs in viruses. Brief Bioinform 2021; 23:6377516. [PMID: 34585234 DOI: 10.1093/bib/bbab404] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 07/14/2021] [Revised: 08/23/2021] [Accepted: 09/02/2021] [Indexed: 01/19/2023] Open
Abstract
Circular RNAs (circRNAs) are non-coding RNAs with a special circular structure formed by the reverse splicing mechanism, and they play an important role in a variety of biological activities. Viruses can encode circRNAs, and viral circRNAs have been found in multiple single-stranded and double-stranded viruses. However, the characteristics and functions of viral circRNAs remain unknown. Sequence alignment showed that viral circRNAs are less conserved than animal circRNAs, indicating that viral circRNAs may evolve rapidly. Analysis of the sequence characteristics of viral and animal circRNAs showed that the two are similar in nucleic acid composition but differ markedly in secondary structure and autocorrelation characteristics. Based on these characteristics, machine learning algorithms were employed to construct a prediction model to identify viral circRNAs. Additionally, analysis of the interactions between viral circRNAs and miRNAs showed that viral circRNAs are expected to interact with 518 human miRNAs, providing a preliminary analysis of the role of viral circRNAs. Viral circRNAs were also found to be potentially involved in many KEGG pathways related to the nervous system and cancer. We curated an online server, and the data and code are available at http://server.malab.cn/viral-CircRNA/.
Affiliation(s)
- Mengting Niu
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China; Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang, China
| | - Ying Ju
- School of Informatics, Xiamen University, Xiamen, China
| | - Chen Lin
- School of Informatics, Xiamen University, Xiamen, China
| | - Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China; Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang, China
| |
|
26
|
Xue Y, Ye X, Wei L, Zhang X, Sakurai T, Wei L. Better Performance with Transformer: CPPFormer in precise prediction of cell-Penetrating Peptides. Curr Med Chem 2021; 29:881-893. [PMID: 34544332 DOI: 10.2174/0929867328666210920103140] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 03/11/2021] [Revised: 07/28/2021] [Accepted: 08/07/2021] [Indexed: 11/22/2022]
Abstract
With its superior performance, the Transformer model, which is based on the 'Encoder-Decoder' paradigm, has become the mainstream in natural language processing. Meanwhile, bioinformatics has embraced machine learning and made great progress in drug design and protein property prediction. Cell-penetrating peptides (CPPs) are a kind of permeable protein that can conveniently serve as a 'postman' in drug penetration tasks. However, only a small number of CPPs have been discovered, let alone applied in practice to drug permeability. Correctly identifying CPPs therefore opens up a new way to carry macromolecules into cells without other potentially harmful materials in the drug. Most previous work uses only trivial machine learning techniques and hand-crafted features to construct a simple classifier. In CPPFormer, we adopt the attention structure of the Transformer, rebuild the network around the short length characteristic of CPPs, and use an automatic feature extractor together with a few manually engineered features to co-direct the predicted results. Compared with all previous methods and other classic text classification models, the empirical results show that our proposed deep model-based method achieves the best performance, 92.16% accuracy on the CPP924 dataset, and passes various index tests.
Affiliation(s)
- Yuyang Xue
- Department of Computer Science, University of Tsukuba, Tsukuba, Japan
| | - Xiucai Ye
- Department of Computer Science, University of Tsukuba, Tsukuba, Japan
| | - Lesong Wei
- Department of Computer Science, University of Tsukuba, Tsukuba, Japan
| | - Xin Zhang
- School of Software, Shandong University, Jinan, China
| | - Tetsuya Sakurai
- Department of Computer Science, University of Tsukuba, Tsukuba, Japan
| | - Leyi Wei
- School of Software, Shandong University, Jinan, China
| |
|
27
|
Asim MN, Ibrahim MA, Imran Malik M, Dengel A, Ahmed S. Advances in Computational Methodologies for Classification and Sub-Cellular Locality Prediction of Non-Coding RNAs. Int J Mol Sci 2021; 22:8719. [PMID: 34445436 PMCID: PMC8395733 DOI: 10.3390/ijms22168719] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 07/17/2021] [Revised: 08/02/2021] [Accepted: 08/03/2021] [Indexed: 02/06/2023] Open
Abstract
Apart from protein-coding ribonucleic acids (RNAs), there exists a variety of non-coding RNAs (ncRNAs) that regulate complex cellular and molecular processes. High-throughput sequencing technologies and bioinformatics approaches have greatly advanced the exploration of ncRNAs, revealing their crucial roles in gene regulation, miRNA binding, protein interactions, and splicing. Furthermore, ncRNAs are involved in the development of complicated diseases like cancer. Categorization of ncRNAs is essential to understand the mechanisms of diseases and to develop effective treatments. Sub-cellular localization information helps explain the diverse functions of ncRNAs. To date, several computational methodologies have been proposed to precisely identify the class as well as the sub-cellular localization patterns of RNAs. This paper discusses different types of ncRNAs and reviews computational approaches proposed in the last 10 years to distinguish coding RNA from ncRNA, to identify sub-types of ncRNAs such as piwi-associated RNA, micro RNA, long ncRNA, and circular RNA, and to determine the sub-cellular localization of distinct ncRNAs and RNAs. Furthermore, it summarizes diverse ncRNA classification and sub-cellular localization determination datasets along with benchmark performance to aid the development and evaluation of novel computational methodologies. It identifies research gaps, heterogeneity, and challenges in the development of computational approaches for RNA sequence analysis. We expect that our analysis will help artificial intelligence researchers learn, on one platform, the state-of-the-art performance, model selection for various tasks, dominantly used sequence descriptors, and neural architectures, and interpret inter-species and intra-species performance deviation.
Affiliation(s)
- Muhammad Nabeel Asim
- German Research Center for Artificial Intelligence (DFKI), 67663 Kaiserslautern, Germany
- Department of Computer Science, Technical University of Kaiserslautern, 67663 Kaiserslautern, Germany
| | - Muhammad Ali Ibrahim
- German Research Center for Artificial Intelligence (DFKI), 67663 Kaiserslautern, Germany
- Department of Computer Science, Technical University of Kaiserslautern, 67663 Kaiserslautern, Germany
| | - Muhammad Imran Malik
- National Center for Artificial Intelligence (NCAI), National University of Sciences and Technology, Islamabad 44000, Pakistan
- School of Electrical Engineering & Computer Science, National University of Sciences and Technology, Islamabad 44000, Pakistan
| | - Andreas Dengel
- German Research Center for Artificial Intelligence (DFKI), 67663 Kaiserslautern, Germany
- Department of Computer Science, Technical University of Kaiserslautern, 67663 Kaiserslautern, Germany
| | - Sheraz Ahmed
- German Research Center for Artificial Intelligence (DFKI), 67663 Kaiserslautern, Germany
- DeepReader GmbH, Trippstadter Str. 122, 67663 Kaiserslautern, Germany
| |
|
28
|
Pedraz-Valdunciel C, Rosell R. Defining the landscape of circRNAs in non-small cell lung cancer and their potential as liquid biopsy biomarkers: a complete review including current methods. EXTRACELLULAR VESICLES AND CIRCULATING NUCLEIC ACIDS 2021; 2:179-201. [PMID: 39697533 PMCID: PMC11648509 DOI: 10.20517/evcna.2020.07] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 12/22/2020] [Revised: 03/22/2021] [Accepted: 06/02/2021] [Indexed: 12/20/2024]
Abstract
Despite the significant decrease in population-level mortality of lung cancer patients reflected in the Surveillance, Epidemiology, and End Results program national database, lung cancer, led by non-small cell lung cancer (NSCLC), continues to be the most commonly diagnosed cancer and the foremost cause of cancer-related death worldwide, primarily due to late-stage diagnosis and ineffective treatment regimens. Although innovative single therapies and their combinations are constantly being tested in clinical trials, the five-year survival rate of late-stage lung cancer remains only 5% (Cancer Research UK). Hence, research into the early diagnosis of lung cancer and the prediction of treatment response is critical for improving the overall survival of these patients. Circular RNAs (circRNAs) are a rediscovered type of RNA featuring a stable structure and highly tissue-specific expression. Evidence has revealed that aberrant circRNA expression plays an important role in carcinogenesis and tumor progression. Further investigation is warranted to assess the value of EV- and platelet-derived circRNAs as liquid biopsy-based readouts for lung cancer detection. This review discusses the origin and biology of circRNAs and analyzes their present landscape in NSCLC, focusing on liquid biopsies to illustrate the different methodological trends currently available in research. The possible limitations that could be holding back the clinical implementation of circRNAs are also analyzed.
Affiliation(s)
- Carlos Pedraz-Valdunciel
- Cancer Biology and Precision Medicine Department, Germans Trias i Pujol Research Institute and Hospital, Badalona 08916, Spain
- Biochemistry, Molecular Biology and Biomedicine Department, Universitat Autónoma de Barcelona, Bellaterra, Barcelona 08193, Spain
| | - Rafael Rosell
- Cancer Biology and Precision Medicine Department, Germans Trias i Pujol Research Institute and Hospital, Badalona 08916, Spain
- Universitat Autónoma de Barcelona, Bellaterra, Barcelona 08193, Spain
| |
|
29
|
Liu D, Fang L. Current research on circular RNAs and their potential clinical implications in breast cancer. Cancer Biol Med 2021; 18:j.issn.2095-3941.2020.0275. [PMID: 34018386 PMCID: PMC8330541 DOI: 10.20892/j.issn.2095-3941.2020.0275] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 06/03/2020] [Accepted: 11/26/2020] [Indexed: 02/06/2023] Open
Abstract
Breast cancer (BC) is one of the most common cancers and a leading cause of death among women worldwide, and its morbidity rate is growing. Discovery of novel biomarkers is necessary for early BC detection, treatment, and prognostication. Circular RNAs (circRNAs), a novel type of endogenous non-coding RNA with covalently closed continuous loops, have been found to have a crucial role in tumorigenesis. Studies have demonstrated that circRNAs are aberrantly expressed in the tumor tissues and plasma of patients with BC, and that they modulate gene expression, affecting the proliferation, metastasis, and chemoresistance of BC, by specifically binding microRNAs (miRNAs) and regulating their expression. Therefore, circRNAs can be used as novel potential diagnostic and prognostic markers and as therapeutic targets for BC. This article summarizes the properties, functions, and regulatory mechanisms of circRNAs, particularly current research on their association with BC proliferation, metastasis, and chemoresistance.
Affiliation(s)
- Diya Liu
- Department of Thyroid and Breast Diseases, Shanghai Tenth People’s Hospital, Shanghai 200070, China
| | - Lin Fang
- Department of Thyroid and Breast Diseases, Shanghai Tenth People’s Hospital, Shanghai 200070, China
| |
|
30
|
Niu M, Lin Y, Zou Q. sgRNACNN: identifying sgRNA on-target activity in four crops using ensembles of convolutional neural networks. PLANT MOLECULAR BIOLOGY 2021; 105:483-495. [PMID: 33385273 DOI: 10.1007/s11103-020-01102-y] [Citation(s) in RCA: 65] [Impact Index Per Article: 16.3] [Received: 05/17/2020] [Accepted: 12/01/2020] [Indexed: 06/12/2023]
Abstract
KEY MESSAGE We proposed an ensemble convolutional neural network model to identify sgRNA high on-target activity in four crops, using one-hot encoding and k-mers for sequence encoding. As an important component of the CRISPR/Cas9 system, single-guide RNA (sgRNA) plays an important role in gene redirection and editing. sgRNA has contributed to the improvement of agronomic species, but effective bioinformatics tools for identifying sgRNA activity in agronomic species are lacking. Therefore, it is necessary to develop a machine learning-based method to identify sgRNA high on-target activity. In this work, we proposed a simple convolutional neural network method for this purpose. Our study used one-hot encoding and k-mers for sequence data conversion and a voting algorithm to construct the convolutional neural network ensemble model sgRNACNN for the prediction of sgRNA activity. The ensemble model sgRNACNN was used for predictions in four crops: Glycine max, Zea mays, Sorghum bicolor and Triticum aestivum. The accuracy rates of the sgRNACNN model for the four crops were 82.43%, 80.33%, 78.25% and 87.49%, respectively. The experimental results showed that sgRNACNN can identify high on-target activity sgRNA in agronomic data and can meet the demands of sgRNA activity prediction in agronomy to a certain extent. These results have certain significance for guiding crop gene editing and academic research. The source code and relevant dataset can be found at the following link: https://github.com/nmt315320/sgRNACNN.git.
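As an illustration of the two sequence encodings and the voting step named in this abstract, the following is a minimal Python sketch. The function names, the choice k = 3, the tie-breaking rule, and the example sequence are assumptions made for illustration; they are not taken from the sgRNACNN source code.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"

def one_hot(seq):
    """Encode each base as a 4-element indicator vector (order A, C, G, T)."""
    return [[1 if base == b else 0 for b in BASES] for base in seq.upper()]

def kmer_counts(seq, k=3):
    """Count overlapping k-mers, returned in a fixed lexicographic order."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [counts["".join(p)] for p in product(BASES, repeat=k)]

def majority_vote(predictions):
    """Combine binary labels from several models; ties resolve to 0 (assumed rule)."""
    return 1 if sum(predictions) * 2 > len(predictions) else 0

guide = "GACGTTACGT"  # hypothetical 10-nt sequence, not from the paper's data
print(one_hot(guide)[:2])        # indicator vectors for the first two bases
print(len(kmer_counts(guide)))   # 4**3 = 64 k-mer features
print(majority_vote([1, 0, 1]))  # label chosen by two of three models
```

One-hot encoding keeps positional information, whereas k-mer counts summarize local composition regardless of position, which is one plausible reason for combining them in an ensemble.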
Affiliation(s)
- Mengting Niu
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
| | - Yuan Lin
- Department of System Integration, Sparebanken Vest, Bergen, Norway
| | - Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
| |
|
31
|
Lei X, Zhang C, Wang Y. Predicting Metabolite-Disease Associations Based on Spy Strategy and ABC Algorithm. Front Mol Biosci 2020; 7:603121. [PMID: 33344506 PMCID: PMC7747351 DOI: 10.3389/fmolb.2020.603121] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 09/05/2020] [Accepted: 10/08/2020] [Indexed: 12/12/2022] Open
Abstract
In recent years, latent metabolite-disease associations have been a significant focus in the biomedical domain, and a growing body of experimental evidence indicates that metabolites correlate with the diagnosis of complex human diseases. Several computational methods have been developed to detect potential metabolite-disease associations. In this article, we propose a novel method based on the spy strategy and an artificial bee colony (ABC) algorithm for metabolite-disease association prediction (SSABCMDA). Because many true associations are missing among the unconfirmed metabolite-disease pairs, the spy strategy is adopted to extract reliable negative samples from the unconfirmed pairs. The ABC algorithm is used to optimize the model parameters, given their effect on performance. In cross-validation experiments, our method achieves excellent predictive performance. Moreover, three types of case studies are conducted on three common diseases to demonstrate the validity and utility of the SSABCMDA method. The experimental results indicate that our method can effectively predict potential associations between metabolites and diseases.
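The spy idea can be sketched as follows, assuming the general form of the spy technique from positive-unlabeled learning: a few known positives ("spies") are hidden among the unconfirmed pairs, a classifier scores every pair, and the spies' scores set a threshold below which unconfirmed pairs are treated as reliable negatives. The abstract does not state the thresholding rule SSABCMDA uses, so the function name, the `noise_fraction` parameter, and the scores below are hypothetical.

```python
def reliable_negatives(spy_scores, unlabeled_scores, noise_fraction=0.0):
    """Return indices of unlabeled items scoring below the spy-derived threshold.

    spy_scores: classifier scores of known positives hidden among the unlabeled set.
    unlabeled_scores: classifier scores of the unconfirmed pairs.
    noise_fraction: share of lowest-scoring spies to ignore as possible noise.
    """
    if not spy_scores:
        raise ValueError("at least one spy score is required")
    ranked = sorted(spy_scores)
    # Skip the lowest-scoring fraction of spies, but always keep one spy.
    cut = min(int(len(ranked) * noise_fraction), len(ranked) - 1)
    threshold = ranked[cut]
    return [i for i, score in enumerate(unlabeled_scores) if score < threshold]

# Hypothetical scores from some already-trained classifier.
spies = [0.62, 0.71, 0.85]
unconfirmed = [0.10, 0.66, 0.40, 0.90]
print(reliable_negatives(spies, unconfirmed))
```

Pairs that score lower than every spy look less positive than any known positive, which is the rationale for treating them as negatives during training.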
Affiliation(s)
- Xiujuan Lei
- School of Computer Science, Shaanxi Normal University, Xi'an, China
| | - Cheng Zhang
- School of Computer Science, Shaanxi Normal University, Xi'an, China
| | - Yueyue Wang
- School of Computer Science, Shaanxi Normal University, Xi'an, China
| |
|
32
|
Meng C, Wu J, Guo F, Dong B, Xu L. CWLy-pred: A novel cell wall lytic enzyme identifier based on an improved MRMD feature selection method. Genomics 2020; 112:4715-4721. [DOI: 10.1016/j.ygeno.2020.08.015] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 07/10/2020] [Revised: 08/04/2020] [Accepted: 08/13/2020] [Indexed: 10/25/2022]
|
33
|
A Method for Identifying Vesicle Transport Proteins Based on LibSVM and MRMD. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2020; 2020:8926750. [PMID: 33133228 PMCID: PMC7591939 DOI: 10.1155/2020/8926750] [Citation(s) in RCA: 45] [Impact Index Per Article: 9.0] [Received: 08/04/2020] [Revised: 08/14/2020] [Accepted: 09/16/2020] [Indexed: 12/14/2022]
Abstract
With the development of computer technology, many machine learning algorithms have been applied to the field of biology, forming the discipline of bioinformatics. Protein function prediction is a classic research topic in this subject area. Although many scholars have made progress in identifying proteins with different algorithms, they often extract a large number of feature types and use very complex classification methods to obtain little improvement in classification performance, and this process is very time-consuming. In this research, we attempt to use as few features as possible to classify vesicular transport proteins while still obtaining a comparatively satisfactory classification result. We adopt CTDC, a submethod of the composition, transition, and distribution (CTD) method, to extract only 39 features from each sequence, and LibSVM is used as the classification method. We use the SMOTE method to deal with the problem of dataset imbalance. There are 11619 protein sequences in our dataset. We selected 4428 sequences to train our classification model and another 1832 sequences from our dataset to test the classification performance, and finally achieved an accuracy of 71.77%. After dimension reduction by MRMD, the accuracy is 72.16%.
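To make the CTDC composition descriptor concrete, here is a small Python sketch for a single physicochemical property. It assumes the usual CTD convention of splitting the 20 amino acids into three groups per property and reporting the fraction of residues in each group; the hydrophobicity grouping shown is a commonly used one and is an assumption here, since the abstract does not list the property tables. The function name and the example peptide are hypothetical.

```python
# One physicochemical property split into three residue groups.
# Assumption: a commonly used hydrophobicity grouping, not confirmed by the abstract.
HYDROPHOBICITY_GROUPS = {
    "polar": "RKEDQN",
    "neutral": "GASTPHY",
    "hydrophobic": "CLVIMFW",
}

def ctdc_composition(sequence, groups=HYDROPHOBICITY_GROUPS):
    """Return the fraction of residues that fall in each group of one property."""
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("sequence must not be empty")
    return {
        name: sum(sequence.count(aa) for aa in members) / len(sequence)
        for name, members in groups.items()
    }

features = ctdc_composition("MKRLLVAG")  # hypothetical 8-residue peptide
print(features)
```

Each property contributes three values, so 13 properties would give 13 × 3 = 39 features, which is consistent with the feature count reported above; the number of properties is inferred rather than stated in the abstract.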
|
34
|
Manavalan B, Hasan MM, Basith S, Gosu V, Shin TH, Lee G. Empirical Comparison and Analysis of Web-Based DNA N4-Methylcytosine Site Prediction Tools. MOLECULAR THERAPY. NUCLEIC ACIDS 2020; 22:406-420. [PMID: 33230445 PMCID: PMC7533314 DOI: 10.1016/j.omtn.2020.09.010] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Received: 06/16/2020] [Accepted: 09/11/2020] [Indexed: 12/12/2022]
Abstract
DNA N4-methylcytosine (4mC) is a crucial epigenetic modification involved in various biological processes. Accurate genome-wide identification of these sites is critical for improving our understanding of their biological functions and mechanisms. As experimental methods for 4mC identification are tedious, expensive, and labor-intensive, several machine learning-based approaches have been developed for genome-wide detection of such sites in multiple species. However, the predictions projected by these tools are difficult to quantify and compare. To date, no systematic performance comparison of 4mC tools has been reported. The aim of this study was to compare and critically evaluate 12 publicly available 4mC site prediction tools according to species specificity, based on a huge independent validation dataset. The tools 4mCCNN (Escherichia coli), DNA4mC-LIP (Arabidopsis thaliana), iDNA-MS (Fragaria vesca), DNA4mC-LIP and 4mCCNN (Drosophila melanogaster), and four tools for Caenorhabditis elegans achieved excellent overall performance compared with their counterparts. However, none of the existing methods was suitable for Geoalkalibacter subterraneus, Geobacter pickeringii, and Mus musculus, thereby limiting their practical applicability. Model transferability to five species and non-transferability to three species are also discussed. The presented evaluation will assist researchers in selecting appropriate prediction tools that best suit their purpose and provide useful guidelines for the development of improved 4mC predictors in the future.
Affiliation(s)
- Balachandran Manavalan
- Department of Physiology, Ajou University School of Medicine, Suwon 16499, Republic of Korea
| | - Md Mehedi Hasan
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, Iizuka, Fukuoka 820-8502, Japan; Japan Society for the Promotion of Science, Chiyoda-ku, Tokyo 102-0083, Japan
| | - Shaherin Basith
- Department of Physiology, Ajou University School of Medicine, Suwon 16499, Republic of Korea
| | - Vijayakumar Gosu
- Department of Animal Biotechnology, Jeonbuk National University, Jeonju 54896, Republic of Korea
| | - Tae-Hwan Shin
- Department of Physiology, Ajou University School of Medicine, Suwon 16499, Republic of Korea
| | - Gwang Lee
- Department of Physiology, Ajou University School of Medicine, Suwon 16499, Republic of Korea; Department of Molecular Science and Technology, Ajou University, Suwon 16499, Republic of Korea
| |
|
35
|
Hasan MM, Basith S, Khatun MS, Lee G, Manavalan B, Kurata H. Meta-i6mA: an interspecies predictor for identifying DNA N6-methyladenine sites of plant genomes by exploiting informative features in an integrative machine-learning framework. Brief Bioinform 2020; 22:5903398. [PMID: 32910169 DOI: 10.1093/bib/bbaa202] [Citation(s) in RCA: 85] [Impact Index Per Article: 17.0] [Received: 06/11/2020] [Revised: 08/06/2020] [Accepted: 08/06/2020] [Indexed: 12/13/2022] Open
Abstract
DNA N6-methyladenine (6mA) is an important epigenetic modification involved in various cellular processes. Accurate identification of 6mA sites is a challenging task in genome analysis and is needed to understand their biological functions. To date, several species-specific machine learning (ML)-based models have been proposed, but the majority of them were not tested on other species. Hence, their practical application to other plant species is quite limited. In this study, we explored 10 different feature encoding schemes, with the goal of capturing key characteristics around 6mA sites. We selected five feature encoding schemes, based on physicochemical and position-specific information, that possess high discriminative capability. The resultant feature sets were input to six commonly used ML methods (random forest, support vector machine, extremely randomized tree, logistic regression, naïve Bayes and AdaBoost). The Rosaceae genome was employed to train these classifiers, which generated 30 baseline models. To integrate their individual strengths, we proposed Meta-i6mA, which combines the baseline models using a meta-predictor approach. In extensive independent tests, Meta-i6mA showed high Matthews correlation coefficient values of 0.918, 0.827 and 0.635 on Rosaceae, rice and Arabidopsis thaliana, respectively, and outperformed the existing predictors. We anticipate that Meta-i6mA can be applied across different plant species. Furthermore, we developed a user-friendly online web server, which is available at http://kurata14.bio.kyutech.ac.jp/Meta-i6mA/.
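For readers unfamiliar with the Matthews correlation coefficient (MCC) reported above, the standard definition from a binary confusion matrix can be sketched in Python as below. The confusion-matrix counts are invented for illustration and are not results from the paper; returning 0.0 when the denominator is zero is a common convention assumed here.

```python
import math

def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: report 0.0 when any marginal total is zero.
    return numerator / denominator if denominator else 0.0

# Hypothetical counts for a 6mA-site classifier on a balanced test set.
mcc = matthews_corrcoef(tp=90, tn=85, fp=15, fn=10)
print(round(mcc, 3))
```

MCC ranges from -1 to 1 and uses all four cells of the confusion matrix, so it is less sensitive to class imbalance than plain accuracy.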
Affiliation(s)
| | - Shaherin Basith
- Department of Physiology, Ajou University School of Medicine, Republic of Korea
| | - Mst Shamima Khatun
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, Japan
| | - Gwang Lee
- Department of Physiology, Ajou University School of Medicine, Republic of Korea
| | | | - Hiroyuki Kurata
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, Japan
| |
|