1
Cao C, Wang C, Yang S, Zou Q. CircSI-SSL: circRNA-binding site identification based on self-supervised learning. Bioinformatics 2024; 40:btae004. [PMID: 38180876 PMCID: PMC10789309 DOI: 10.1093/bioinformatics/btae004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/26/2023] [Revised: 11/13/2023] [Accepted: 01/03/2024] [Indexed: 01/07/2024] Open
Abstract
MOTIVATION: In recent years, circular RNAs (circRNAs), a particular form of RNA with a closed-loop structure, have attracted widespread attention because of their physiological significance (they can directly bind proteins), leading to the development of numerous protein-binding site identification algorithms. Unfortunately, these methods are supervised and require large numbers of labeled samples during training to perform well, and such labels are difficult to obtain because they require many biological experiments.
RESULTS: To reduce the number of labels needed for training in the circRNA-binding site prediction task, this article proposes a self-supervised learning algorithm for binding site identification named CircSI-SSL. According to our survey of the literature, no previous work in this field has taken this approach. Specifically, CircSI-SSL first combines multiple feature coding schemes and employs RNA_Transformer for cross-view sequence prediction (the self-supervised task) to learn mutual information from the multi-view data, and is then fine-tuned with only a few labeled samples. Comprehensive experiments on six widely used circRNA datasets indicate that CircSI-SSL achieves excellent performance compared with previous algorithms, even in the extreme case where the ratio of training data to test data is 1:9. In addition, a transfer experiment on six linear RNA (linRNA) datasets, run without network modification or hyperparameter adjustment, shows that CircSI-SSL has good scalability. In summary, the self-supervised prediction algorithm proposed in this article is expected to replace previous supervised algorithms and to have broader application value.
AVAILABILITY AND IMPLEMENTATION: The source code and data are available at https://github.com/cc646201081/CircSI-SSL.
Affiliation(s)
- Chao Cao
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang 324003, China
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, Sichuan 611731, China
- Chunyu Wang
  - Faculty of Computing, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Shuhong Yang
  - Faculty of Mathematics and Computer Science, Guangdong Ocean University, Zhanjiang, Guangdong 524088, China
- Quan Zou
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang 324003, China
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, Sichuan 611731, China
2
Murtaj V, Butti E, Martino G, Panina-Bordignon P. Endogenous neural stem cells characterization using omics approaches: Current knowledge in health and disease. Front Cell Neurosci 2023; 17:1125785. [PMID: 37091923 PMCID: PMC10113633 DOI: 10.3389/fncel.2023.1125785] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2022] [Accepted: 03/03/2023] [Indexed: 04/08/2023] Open
Abstract
Neural stem cells (NSCs), an invaluable source of neuronal and glial progeny, have been widely interrogated in the last twenty years, mainly to understand their therapeutic potential. Most of the studies were performed with cells derived from pluripotent stem cells of either rodents or humans, and have mainly focused on their potential in regenerative medicine. High-throughput omics technologies, such as transcriptomics, epigenetics, proteomics, and metabolomics, which exploded in the past decade, represent a powerful tool to investigate the molecular mechanisms characterizing the heterogeneity of endogenous NSCs. The transition from bulk studies to single-cell approaches brought significant insights by revealing complex system phenotypes, from the molecular to the organism level. Here, we discuss the current literature, which has been greatly enriched in the "omics era" and has successfully explored the nature and function of endogenous NSCs and the process of neurogenesis. Overall, the information obtained from omics studies of endogenous NSCs provides a sharper picture of NSC function during neurodevelopment in healthy and in perturbed environments.
Affiliation(s)
- Valentina Murtaj
  - Division of Neuroscience, San Raffaele Vita-Salute University, Milan, Italy
  - Neuroimmunology, Division of Neuroscience, Institute of Experimental Neurology, IRCCS Ospedale San Raffaele, Milan, Italy
- Erica Butti
  - Neuroimmunology, Division of Neuroscience, Institute of Experimental Neurology, IRCCS Ospedale San Raffaele, Milan, Italy
- Gianvito Martino
  - Division of Neuroscience, San Raffaele Vita-Salute University, Milan, Italy
  - Neuroimmunology, Division of Neuroscience, Institute of Experimental Neurology, IRCCS Ospedale San Raffaele, Milan, Italy
- Paola Panina-Bordignon
  - Division of Neuroscience, San Raffaele Vita-Salute University, Milan, Italy
  - Neuroimmunology, Division of Neuroscience, Institute of Experimental Neurology, IRCCS Ospedale San Raffaele, Milan, Italy
- *Correspondence: Paola Panina-Bordignon
3
Dori M, Caroli J, Forcato M. Circr, a Computational Tool to Identify miRNA:circRNA Associations. Front Bioinform 2022; 2:852834. [PMID: 36304313 PMCID: PMC9580875 DOI: 10.3389/fbinf.2022.852834] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/11/2022] [Accepted: 02/21/2022] [Indexed: 08/21/2023] Open
Abstract
Circular RNAs (circRNAs) are known to act as important regulators of microRNA (miRNA) activity. Yet computational resources to identify miRNA:circRNA interactions are mostly limited to already annotated circRNAs or are affected by high rates of false-positive predictions. To overcome these limitations, we developed Circr, a computational tool for the prediction of associations between circRNAs and miRNAs. Circr combines three publicly available algorithms for de novo prediction of miRNA binding sites on target sequences (miRanda, RNAhybrid, and TargetScan) and annotates each identified miRNA:target pair with experimentally validated miRNA:RNA interactions and binding sites for Argonaute proteins derived from either ChIPseq or CLIPseq data. Combining multiple tools for the identification of a single miRNA recognition site with experimental data allows efficient prioritization of candidate miRNA:circRNA interactions for functional studies in different organisms. Circr can use its internal annotation database or custom annotation tables to enhance the identification of novel and not previously annotated miRNA:circRNA sites in virtually any species. Circr is written in Python 3.6 and is released under the GNU GPL3.0 License at https://github.com/bicciatolab/Circr.
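The idea of combining several predictors for one recognition site can be illustrated with a minimal sketch. It is not Circr's code and does not call miRanda, RNAhybrid, or TargetScan; the function names, the `min_tools` threshold, the overlap rule, and the toy coordinates are all assumptions made for illustration. Each tool's output is assumed to be already parsed into `(miRNA, start, end)` tuples.

```python
from typing import Dict, List, Set, Tuple

# A predicted site: (miRNA name, start, end), 0-based and half-open.
# This tuple layout is an assumption of the sketch, not a Circr format.
Site = Tuple[str, int, int]


def overlaps(a: Site, b: Site) -> bool:
    """True if both sites belong to the same miRNA and their intervals overlap."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def consensus_sites(
    predictions: Dict[str, List[Site]], min_tools: int = 2
) -> List[Tuple[Site, Set[str]]]:
    """Keep sites supported by at least `min_tools` tools.

    A tool supports a site if it reports an overlapping site for the same miRNA.
    The first tool to report a site provides its representative coordinates.
    """
    kept: List[Tuple[Site, Set[str]]] = []
    seen: List[Site] = []
    for sites in predictions.values():
        for site in sites:
            # Skip sites that overlap one we have already evaluated.
            if any(overlaps(site, s) for s in seen):
                continue
            seen.append(site)
            supporters = {
                tool
                for tool, others in predictions.items()
                if any(overlaps(site, o) for o in others)
            }
            if len(supporters) >= min_tools:
                kept.append((site, supporters))
    return kept


# Toy, made-up predictions keyed by tool name (labels only).
toy = {
    "miRanda": [("miR-A", 100, 122), ("miR-B", 300, 322)],
    "RNAhybrid": [("miR-A", 105, 127)],
    "TargetScan": [("miR-A", 110, 118), ("miR-C", 50, 58)],
}
result = consensus_sites(toy, min_tools=2)
```

With this toy input only the `miR-A` site survives, because it is the only one that more than one tool reports. Requiring agreement trades sensitivity for fewer false positives, which is the motivation the abstract gives for combining tools.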
Affiliation(s)
- Martina Dori
  - Department of Life Sciences, University of Modena and Reggio Emilia, Modena, Italy
- Jimmy Caroli
  - Department of Life Sciences, University of Modena and Reggio Emilia, Modena, Italy
  - Department of Drug Design and Pharmacology, University of Copenhagen, Copenhagen, Denmark
- Mattia Forcato
  - Department of Life Sciences, University of Modena and Reggio Emilia, Modena, Italy
4
Niu M, Zou Q, Lin C. CRBPDL: Identification of circRNA-RBP interaction sites using an ensemble neural network approach. PLoS Comput Biol 2022; 18:e1009798. [PMID: 35051187 PMCID: PMC8806072 DOI: 10.1371/journal.pcbi.1009798] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 09/23/2021] [Revised: 02/01/2022] [Accepted: 01/02/2022] [Indexed: 02/06/2023] Open
Abstract
Circular RNAs (circRNAs) are non-coding RNAs with a special circular structure formed by the reverse splicing mechanism. Increasing evidence shows that circular RNAs can directly bind to RNA-binding proteins (RBPs) and play an important role in a variety of biological activities. The interactions between circRNAs and RBPs are key to understanding the mechanism of posttranscriptional regulation, and accurately identifying binding sites is very useful for analyzing these interactions. Previous research has presented some predictors based on machine learning (ML), but prediction accuracy still needs to be improved. Therefore, we present a novel computational model, CRBPDL, which uses an Adaboost-integrated deep hierarchical network to identify circular RNA-RBP binding sites. CRBPDL combines five different feature encoding schemes to encode the original RNA sequence and uses deep multiscale residual networks (MSRN) and bidirectional gated recurrent units (BiGRUs) to learn high-level feature representations effectively, which allows it to extract local and global context information at the same time. Additionally, a self-attention mechanism is employed to improve the robustness of CRBPDL. Finally, the Adaboost algorithm is applied to integrate the deep learning (DL) models and improve the prediction performance and reliability of the model. To verify the usefulness of CRBPDL, we compared its performance with state-of-the-art methods on 37 circular RNA data sets and 31 linear RNA data sets. The results show that CRBPDL performs universally, reliably, and robustly. The code and data sets are available at https://github.com/nmt315320/CRBPDL.git.
Growing evidence shows that circular RNA can directly bind to proteins and participate in countless different biological processes. Computational methods can quickly and accurately predict the binding sites of circular RNA and RBPs. To identify the interaction of circRNA with 37 different types of circRNA-binding proteins, we developed an integrated deep learning network based on a hierarchical network, called CRBPDL. It can effectively learn high-level feature representations. The performance of the model was verified through comparative experiments with different feature extraction algorithms, different deep learning models, and different classifier models. Moreover, the CRBPDL model was applied to 31 linear RNAs, and the effectiveness of our method was demonstrated by comparison with the results of current leading algorithms. It is expected that the CRBPDL model can effectively predict circular RNA-RBP binding sites and provide reliable candidates for further biological experiments.
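The abstract does not name the five feature encoding schemes, so the following is only a generic sketch of what encoding an RNA sequence for a neural network can look like. One-hot encoding and k-mer frequencies are shown as two common choices; whether CRBPDL uses either is an assumption, and the function names are hypothetical.

```python
from itertools import product
from typing import Dict, List

ALPHABET = "ACGU"


def one_hot(seq: str) -> List[List[int]]:
    """Encode each base as a 4-element indicator row in A, C, G, U order.

    DNA-style T is read as U; any other symbol (e.g. N) becomes an all-zero row.
    """
    seq = seq.upper().replace("T", "U")
    return [[1 if base == letter else 0 for letter in ALPHABET] for base in seq]


def kmer_frequencies(seq: str, k: int = 2) -> Dict[str, float]:
    """Return the relative frequency of every possible k-mer over ACGU.

    Windows containing symbols outside ACGU are ignored.
    """
    seq = seq.upper().replace("T", "U")
    counts = {"".join(p): 0 for p in product(ALPHABET, repeat=k)}
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:
            counts[kmer] += 1
            total += 1
    return {kmer: (c / total if total else 0.0) for kmer, c in counts.items()}


matrix = one_hot("ACGU")
freqs = kmer_frequencies("AAAA", k=2)
```

One-hot rows keep positional information for convolutional or recurrent layers, while k-mer frequencies give a fixed-length composition vector. Combining several such views is the general pattern the abstract describes.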
Affiliation(s)
- Mengting Niu
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang, China
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang, China
- Chen Lin
  - School of Informatics, Xiamen University, Xiamen, China
5
Suenkel C, Cavalli D, Massalini S, Calegari F, Rajewsky N. A Highly Conserved Circular RNA Is Required to Keep Neural Cells in a Progenitor State in the Mammalian Brain. Cell Rep 2020; 30:2170-2179.e5. [PMID: 32075758 DOI: 10.1016/j.celrep.2020.01.083] [Citation(s) in RCA: 44] [Impact Index Per Article: 14.7] [Received: 02/22/2019] [Revised: 11/19/2019] [Accepted: 01/23/2020] [Indexed: 11/22/2022] Open
Abstract
circSLC45A4 is the main RNA splice isoform produced from its genetic locus and one of the most highly expressed circRNAs in the developing human frontal cortex. Knockdown of this highly conserved circRNA in a human neuroblastoma cell line is sufficient to induce spontaneous neuronal differentiation, measurable by increased expression of neuronal marker genes. Depletion of circSlc45a4 in the developing mouse cortex causes a significant reduction of the basal progenitor pool and increases the expression of neurogenic regulators. Furthermore, knockdown of circSlc45a4 induces a significant depletion of cells in the cortical plate. In addition, deconvolution of the bulk RNA-seq data with the help of single-cell RNA-seq data validates the depletion of basal progenitors and reveals an increase in Cajal-Retzius cells. In summary, we present a detailed study of a highly conserved circular RNA that is necessary to maintain the pool of neural progenitors in vitro and in vivo.
6
Dori M, Cavalli D, Lesche M, Massalini S, Alieh LHA, de Toledo BC, Khudayberdiev S, Schratt G, Dahl A, Calegari F. MicroRNA profiling of mouse cortical progenitors and neurons reveals miR-486-5p as a regulator of neurogenesis. Development 2020; 147:dev.190520. [PMID: 32273274 DOI: 10.1242/dev.190520] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 03/11/2020] [Accepted: 03/26/2020] [Indexed: 12/12/2022]
Abstract
MicroRNAs (miRNAs) are short (∼22 nt) single-stranded non-coding RNAs that regulate gene expression at the post-transcriptional level. Over recent years, many studies have extensively characterized the involvement of miRNA-mediated regulation in neurogenesis and brain development. However, a comprehensive catalog of cortical miRNAs expressed in a cell-specific manner in progenitor types of the developing mammalian cortex is still missing. Overcoming this limitation, here we exploited a double reporter mouse line previously validated by our group to allow the identification of the transcriptional signature of neurogenic commitment and provide the field with the complete atlas of miRNA expression in proliferating neural stem cells, neurogenic progenitors and newborn neurons during corticogenesis. By extending the currently known list of miRNAs expressed in the mouse brain by over twofold, our study highlights the power of cell type-specific analyses for the detection of transcripts that would otherwise be diluted out when studying bulk tissues. We further exploited our data by predicting putative miRNAs and validated the power of our approach by providing evidence for the involvement of miR-486 in brain development.
Affiliation(s)
- Martina Dori
  - CRTD - Center for Regenerative Therapies Dresden, School of Medicine, TU Dresden, Fetscherstrasse 105, 01307 Dresden, Germany
- Daniel Cavalli
  - CRTD - Center for Regenerative Therapies Dresden, School of Medicine, TU Dresden, Fetscherstrasse 105, 01307 Dresden, Germany
- Mathias Lesche
  - DRESDEN-concept Genome Center c/o Center for Molecular and Cellular Bioengineering (CMCB), Technische Universität Dresden, Fetscherstrasse 105, 01307 Dresden, Germany
- Simone Massalini
  - CRTD - Center for Regenerative Therapies Dresden, School of Medicine, TU Dresden, Fetscherstrasse 105, 01307 Dresden, Germany
- Leila Haj Abdullah Alieh
  - CRTD - Center for Regenerative Therapies Dresden, School of Medicine, TU Dresden, Fetscherstrasse 105, 01307 Dresden, Germany
- Beatriz Cardoso de Toledo
  - CRTD - Center for Regenerative Therapies Dresden, School of Medicine, TU Dresden, Fetscherstrasse 105, 01307 Dresden, Germany
- Sharof Khudayberdiev
  - Institute for Physiological Chemistry, Biochemical-Pharmacological Center Marburg, Philipps-University of Marburg, Karl-von-Frisch-Strasse 2, 35043 Marburg, Germany
- Gerhard Schratt
  - Institute for Physiological Chemistry, Biochemical-Pharmacological Center Marburg, Philipps-University of Marburg, Karl-von-Frisch-Strasse 2, 35043 Marburg, Germany
- Andreas Dahl
  - DRESDEN-concept Genome Center c/o Center for Molecular and Cellular Bioengineering (CMCB), Technische Universität Dresden, Fetscherstrasse 105, 01307 Dresden, Germany
- Federico Calegari
  - CRTD - Center for Regenerative Therapies Dresden, School of Medicine, TU Dresden, Fetscherstrasse 105, 01307 Dresden, Germany
7
Niu M, Zhang J, Li Y, Wang C, Liu Z, Ding H, Zou Q, Ma Q. CirRNAPL: A web server for the identification of circRNA based on extreme learning machine. Comput Struct Biotechnol J 2020; 18:834-42. [PMID: 32308930 DOI: 10.1016/j.csbj.2020.03.028] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 11/16/2019] [Revised: 03/29/2020] [Accepted: 03/29/2020] [Indexed: 12/27/2022] Open
Abstract
Circular RNA (circRNA) plays an important role in the development of diseases and offers a novel avenue for drug development. Accurate identification of circRNAs is important for a deeper understanding of their functions. In this study, we developed a new classifier, CirRNAPL, which extracts nucleic acid composition and structure features from the circRNA sequence and optimizes an extreme learning machine with the particle swarm optimization algorithm. We compared CirRNAPL with existing methods, including BLAST, on three datasets and found that CirRNAPL significantly improved identification accuracy, with accuracies of 0.815, 0.802, and 0.782, respectively. Additionally, we performed sequence alignment on 564 sequences of the independent detection set of the third data set and analyzed the expression level of circRNAs. Results showed the expression level of the sequence is positively correlated with the abundance. A user-friendly CirRNAPL web server is freely available at http://server.malab.cn/CirRNAPL/.
Key Words
- ACC, Accuracy
- CNN, Convolutional Neural Networks
- Circular RNA
- DAC, Dinucleotide-based auto-covariance
- DACC, Dinucleotide-based auto-cross-covariance
- DCC, Dinucleotide-based cross-covariance
- ELM, extreme learning machine
- Expression level
- Extreme learning machine
- GAC, Geary autocorrelation
- Identification
- MAC, Moran autocorrelation
- MCC, Matthews Correlation Coefficient
- MRMD, Maximum-Relevance-Maximum-Distance
- NMBAC, Normalized Moreau–Broto autocorrelation
- PC-PseDNC-General, General parallel correlation pseudo-dinucleotide composition
- PCGs, protein coding genes
- PSO, particle swarm optimization algorithm
- Particle swarm optimization algorithm
- PseDPC, Pseudo-distance structure status pair composition
- PseSSC, Pseudo-structure status composition
- RBF, radial basis function
- RF, random forest
- SC-PseDNC-General, General series correlation pseudo-dinucleotide composition
- SE, Sensitivity
- SP, Specificity
- SVM, support vector machine
- Triplet, Local structure-sequence triplet element
- circRNA, circular RNA
- lncRNAs, long non-coding RNAs
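The evaluation abbreviations in the list above (ACC, SE, SP, MCC) follow the standard definitions for binary classification. A minimal sketch of computing them from confusion-matrix counts is given here for reference; the function name and the example counts are made up and are not results from the CirRNAPL paper.

```python
import math
from typing import Dict


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute ACC, SE, SP and MCC from confusion-matrix counts.

    Any ratio with a zero denominator is reported as 0.0.
    """
    total = tp + fp + tn + fn
    acc = (tp + tn) / total if total else 0.0
    se = tp / (tp + fn) if (tp + fn) else 0.0   # sensitivity (recall)
    sp = tn / (tn + fp) if (tn + fp) else 0.0   # specificity
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ACC": acc, "SE": se, "SP": sp, "MCC": mcc}


# Made-up counts for illustration only.
metrics = binary_metrics(tp=40, fp=10, tn=45, fn=5)
```

MCC is often reported alongside accuracy because it stays informative when the positive and negative classes are unbalanced.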
8
Dori M, Bicciato S. Integration of Bioinformatic Predictions and Experimental Data to Identify circRNA-miRNA Associations. Genes (Basel) 2019; 10:genes10090642. [PMID: 31450634 PMCID: PMC6769881 DOI: 10.3390/genes10090642] [Citation(s) in RCA: 50] [Impact Index Per Article: 10.0] [Received: 07/18/2019] [Revised: 08/20/2019] [Accepted: 08/21/2019] [Indexed: 12/20/2022] Open
Abstract
Circular RNAs (circRNAs) have recently emerged as a novel class of transcripts, characterized by covalently linked 3'-5' ends that result in the so-called backsplice junction. During the last few years, thousands of circRNAs have been identified in different organisms. Yet, although their role as disease biomarkers has started to emerge, defining their function remains challenging. Different studies have shown that certain circRNAs act as miRNA sponges, but every attempt to generalize from the single case to the "circ-ome" has failed so far. In this review, we explore the potential to define miRNA "sponging" as a more general function of circRNAs and describe the different approaches to predict miRNA response elements (MREs) in known or novel circRNA sequences. Moreover, we discuss how experiments based on Ago2-IP and experimentally validated miRNA:target duplexes can be used to either prioritize or validate putative miRNA-circRNA associations.
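One simple way to look for candidate MREs, sketched here under stated assumptions, is to search for perfect matches to the miRNA seed (taken as miRNA nucleotides 2-8) and to let the search wrap around the backsplice junction, since a circular sequence has no true end. This is not one of the specific tools discussed in the review; the function names and the toy sequences are hypothetical, and real predictors also score pairing energy, site context, and conservation.

```python
from typing import List

# Watson-Crick complement table for RNA.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed_target_motif(mirna: str, start: int = 2, end: int = 8) -> str:
    """Return the target motif that pairs with the miRNA seed.

    `start` and `end` are 1-based, inclusive miRNA positions. The motif is the
    reverse complement of the seed, written 5'->3' on the target.
    """
    mirna = mirna.upper().replace("T", "U")
    seed = mirna[start - 1:end]
    return seed.translate(COMPLEMENT)[::-1]


def find_circular_matches(circ_seq: str, motif: str) -> List[int]:
    """Return 0-based start positions of `motif` on a circular sequence.

    The sequence is extended by the first len(motif) - 1 bases so that sites
    spanning the backsplice junction are found as well.
    """
    circ_seq = circ_seq.upper().replace("T", "U")
    n, k = len(circ_seq), len(motif)
    if k == 0 or k > n:
        return []
    extended = circ_seq + circ_seq[:k - 1]
    return [i for i in range(n) if extended[i:i + k] == motif]


# Toy, made-up sequences for illustration only.
toy_mirna = "UACGGAUCCAGUUCGAUAGCUA"
motif = seed_target_motif(toy_mirna)        # seed "ACGGAUC" -> motif "GAUCCGU"
junction_hits = find_circular_matches("CCGUAAAAAGAU", motif)
internal_hits = find_circular_matches("AAGAUCCGUAA", motif)
```

In the first toy circRNA the site exists only across the junction (the last three bases followed by the first four), so a scan of the linear string would miss it. This is why circRNA-specific handling matters when predicting MREs.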
Affiliation(s)
- Martina Dori
  - Center for Genome Research, Department of Life Sciences, University of Modena and Reggio Emilia, Via G. Campi, 287, 41100 Modena, Italy.
- Silvio Bicciato
  - Center for Genome Research, Department of Life Sciences, University of Modena and Reggio Emilia, Via G. Campi, 287, 41100 Modena, Italy.