1
Cao C, Li M, Wang C, Xu L, Zou Q, Wang Y, Han W. DGCLCMI: a deep graph collaboration learning method to predict circRNA-miRNA interactions. BMC Biol 2025; 23:104. [PMID: 40264118 PMCID: PMC12016396 DOI: 10.1186/s12915-025-02197-9] [Received: 01/15/2025] [Accepted: 03/25/2025] [Indexed: 04/24/2025] [Open access]
Abstract
BACKGROUND Numerous studies have shown that circRNA can act as a miRNA sponge, competitively binding to miRNAs and thereby regulating gene expression and disease progression. Because traditional wet lab experiments are costly and time-consuming, analyzing circRNA-miRNA associations is often inefficient and labor-intensive. Although some computational models have been developed to identify these associations, they fail to capture the deep collaborative features between circRNA and miRNA interactions and do not guide the training of feature extraction networks based on these high-order relationships, leading to poor prediction performance. RESULTS To address these issues, we propose a deep graph collaboration learning method for circRNA-miRNA interaction, called DGCLCMI. First, it uses word2vec to encode sequences into word embeddings. Next, we present a joint model that combines an improved neural graph collaborative filtering method with a feature extraction network for optimization. Deep interaction information is embedded as informative features within the sequence representations for prediction. Comprehensive experiments on three well-established datasets across seven metrics demonstrate that our algorithm significantly outperforms previous models, achieving an average AUC of 0.960. In addition, a case study reveals that 18 out of 20 predicted unknown CMI data points are accurate. CONCLUSIONS DGCLCMI improves circRNA and miRNA feature representation by capturing deep collaborative information, achieving superior performance compared to prior methods. It facilitates the discovery of unknown associations and sheds light on their roles in physiological processes.
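The abstract states only that word2vec encodes sequences into word embeddings; it does not say how a sequence is split into "words". A common choice, assumed here purely for illustration, is overlapping k-mers. The sketch below shows that tokenization step and a vocabulary lookup. The function names, `k=3`, and the stride are hypothetical and are not taken from the paper.

```python
def kmer_tokens(sequence: str, k: int = 3, stride: int = 1) -> list[str]:
    """Split a sequence into overlapping k-mer 'words' (assumed tokenization)."""
    if k <= 0 or stride <= 0:
        raise ValueError("k and stride must be positive")
    # Normalise case and map DNA-style T to RNA-style U.
    seq = sequence.upper().replace("T", "U")
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def build_vocab(corpus: list[list[str]]) -> dict[str, int]:
    """Map each distinct k-mer to an integer id, in order of first appearance."""
    vocab: dict[str, int] = {}
    for sentence in corpus:
        for word in sentence:
            vocab.setdefault(word, len(vocab))
    return vocab


# Each sequence becomes one "sentence" of k-mer words.
tokens = kmer_tokens("AUGGCUA", k=3)
vocab = build_vocab([tokens])
print(tokens)
print(vocab)
```

The resulting token lists are the kind of input a word2vec trainer expects (one list of words per sequence); the training step itself is omitted here.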
Affiliation(s)
- Chao Cao
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, Sichuan, 611731, China
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang, 324003, China
- Mengli Li
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang, 324003, China
- Chunyu Wang
  - Faculty of Computing, Harbin Institute of Technology, Harbin, Heilongjiang, 150001, China
- Lei Xu
  - School of Electronic and Communication Engineering, Shenzhen Polytechnic University, Shenzhen, Guangdong, 518055, China
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, Sichuan, 611731, China
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang, 324003, China
- Yansu Wang
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, Sichuan, 611731, China
- Wu Han
  - Department of Statistics, Stanford University, Stanford, CA, 94043, USA.
2
Guo Y, Lei X, Li S. An Integrated TCN-CrossMHA Model for Predicting circRNA-RBP Binding Sites. Interdiscip Sci 2025; 17:86-100. [PMID: 39503827 DOI: 10.1007/s12539-024-00660-9] [Received: 02/04/2024] [Revised: 09/14/2024] [Accepted: 09/17/2024] [Indexed: 02/19/2025]
Abstract
Circular RNA (circRNA) can bind RNA-binding proteins (RBPs), thereby exerting a substantial impact on diseases. Predicting binding sites aids in understanding the interaction mechanism and offers insights for disease treatment strategies. Here, we propose circTCA, a novel approach based on a temporal convolutional network (TCN) and a cross multi-head attention mechanism, to predict circRNA-RBP binding sites. First, we employ two distinct encoding methods to obtain two raw matrices of circRNA sequences. Then, two parallel TCN blocks extract shallow and abstract features of the two matrices separately. The two are fused through the cross multi-head attention mechanism, after which global expectation pooling assigns weights to the concatenated feature. Finally, a fully connected (FC) layer classifies the input sequence. We compare circTCA with five other methods and conduct ablation experiments to demonstrate its effectiveness. We also visualize features and compare the motifs extracted by circTCA with existing motifs. Overall, circTCA is effective for predicting circRNA-RBP binding sites.
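The building block usually associated with a temporal convolutional network is the dilated causal 1D convolution. The sketch below shows that generic operation on a single channel. It is not circTCA's configuration: the kernel weights, dilation rates, and function name are illustrative assumptions.

```python
def dilated_causal_conv1d(x: list[float], weights: list[float], dilation: int = 1) -> list[float]:
    """Compute y[t] = sum_k weights[k] * x[t - k * dilation], with x treated as zero before t = 0."""
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:  # positions before the start act as zero padding
                acc += w * x[idx]
        out.append(acc)
    return out


signal = [1.0, 2.0, 3.0, 4.0, 5.0]
# Stacking layers with growing dilation widens the receptive field:
# kernel size 2 with dilations 1 and 2 lets each output see 4 inputs.
layer1 = dilated_causal_conv1d(signal, [1.0, 1.0], dilation=1)
layer2 = dilated_causal_conv1d(layer1, [1.0, 1.0], dilation=2)
print(layer1)
print(layer2)
```

With all-ones kernels, `layer2[t]` is the sum of the last four inputs up to position `t`, which makes the widened receptive field easy to check by hand.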
Affiliation(s)
- Yajing Guo
  - School of Computer Science, Shaanxi Normal University, Xi'an, 710119, China
- Xiujuan Lei
  - School of Computer Science, Shaanxi Normal University, Xi'an, 710119, China.
- Shuyu Li
  - School of Computer Science, Shaanxi Normal University, Xi'an, 710119, China
3
Liang SZ, Wang L, You ZH, Yu CQ, Wei MM, Wei Y, Shi TL, Jiang C. Predicting circRNA-Disease Associations through Multisource Domain-Aware Embeddings and Feature Projection Networks. J Chem Inf Model 2025; 65:1666-1676. [PMID: 39829001 DOI: 10.1021/acs.jcim.4c02250] [Indexed: 01/22/2025]
Abstract
Recent studies have highlighted the significant role of circular RNAs (circRNAs) in various diseases. Accurately predicting circRNA-disease associations is crucial for understanding their biological functions and disease mechanisms. This work introduces the MNDCDA method, designed to address the challenges posed by the limited number of known circRNA-disease associations and the high cost of biological experiments. MNDCDA integrates multiple biological data sources with neighborhood-aware embedding models and deep feature projection networks to predict potential pathways linking circRNAs to diseases. Initially, comprehensive biometric data are used to construct four similarity networks, forming a diverse circRNA-disease interaction framework. Next, a neighborhood-aware embedding model captures structural information about circRNAs and diseases, while deep feature projection networks learn high-order feature interactions and nonlinear connections. Finally, a bilinear decoder identifies novel associations between circRNAs and diseases. The MNDCDA model achieved an AUC of 0.9070 on a constructed benchmark dataset. In case studies, 25 out of 30 predicted circRNA-disease pairs were validated through wet lab experiments and published literature. These extensive experimental results demonstrate that MNDCDA is a robust computational tool for predicting circRNA-disease associations, providing valuable insights while helping to reduce research costs.
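A bilinear decoder, in its generic form, scores a pair of embeddings as sigmoid(c^T W d), where W is a learned matrix. The sketch below shows that scoring step only. The embedding dimension, the weight values, and the function name are assumptions for illustration and are not taken from MNDCDA.

```python
import math


def bilinear_score(c: list[float], W: list[list[float]], d: list[float]) -> float:
    """Return sigmoid(c^T W d) for a circRNA embedding c and a disease embedding d."""
    # W d: one dot product per row of W.
    Wd = [sum(w_ij * d_j for w_ij, d_j in zip(row, d)) for row in W]
    # c^T (W d): the raw association logit.
    logit = sum(c_i * v for c_i, v in zip(c, Wd))
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical 2-dimensional embeddings and an identity weight matrix.
W = [[1.0, 0.0], [0.0, 1.0]]
aligned = bilinear_score([1.0, 0.0], W, [1.0, 0.0])
orthogonal = bilinear_score([1.0, 0.0], W, [0.0, 1.0])
print(aligned, orthogonal)
```

With an identity W the score reduces to a sigmoid of the dot product, so orthogonal embeddings score exactly 0.5; a learned W lets the decoder weight interactions between different embedding dimensions.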
Affiliation(s)
- Si-Zhe Liang
  - School of Information Engineering, Xijing University, Xi'an 710123, China
- Lei Wang
  - Guangxi Key Lab of Human-Machine Interaction and Intelligent Decision, Guangxi Academy of Sciences, Nanning 530007, China
  - School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, China
- Zhu-Hong You
  - School of Computer Science, Northwestern Polytechnical University, Xi'an 710129, China
- Chang-Qing Yu
  - School of Information Engineering, Xijing University, Xi'an 710123, China
- Meng-Meng Wei
  - School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, China
- Yu Wei
  - School of Information Engineering, Xijing University, Xi'an 710123, China
- Tai-Long Shi
  - School of Information Engineering, Xijing University, Xi'an 710123, China
- Chen Jiang
  - School of Information Engineering, Xijing University, Xi'an 710123, China
4
Asim MN, Ibrahim MA, Asif T, Dengel A. RNA sequence analysis landscape: A comprehensive review of task types, databases, datasets, word embedding methods, and language models. Heliyon 2025; 11:e41488. [PMID: 39897847 PMCID: PMC11783440 DOI: 10.1016/j.heliyon.2024.e41488] [Received: 09/28/2024] [Revised: 12/23/2024] [Accepted: 12/24/2024] [Indexed: 02/04/2025] [Open access]
Abstract
Deciphering the information in RNA sequences reveals their diverse roles in living organisms, including gene regulation and protein synthesis. Aberrations in RNA sequences, such as dysregulation and mutations, can drive a diverse spectrum of diseases including cancers, genetic disorders, and neurodegenerative conditions. Furthermore, researchers are harnessing RNA's therapeutic potential to transform traditional treatment paradigms into personalized therapies through the development of RNA-based drugs and gene therapies. To gain insights into biological functions, detect diseases at early stages, and develop potent therapeutics, researchers perform diverse types of RNA sequence analysis tasks. RNA sequence analysis through conventional wet-lab methods is expensive, time-consuming, and error-prone. Enabling large-scale RNA sequence analysis by empowering wet-lab experimental methods with Artificial Intelligence (AI) applications requires scientists to have comprehensive knowledge of both DNA and AI fields. While molecular biologists encounter challenges in understanding AI methods, computer scientists often lack the basic foundations of RNA sequence analysis tasks. Considering the absence of comprehensive literature that bridges this research gap and promotes the development of AI-driven RNA sequence analysis applications, the contributions of this manuscript are manifold: It equips AI researchers with the biological foundations of 47 distinct RNA sequence analysis tasks. It sets the stage for the development of benchmark datasets for these 47 tasks by presenting the cruxes of 64 different biological databases. It presents applications of word embeddings and language models across the 47 tasks. It streamlines the development of new predictors by surveying the performance values of 58 word embedding and 70 language model based predictive pipelines, as well as the top-performing traditional sequence encoding based predictors and their performance across the 47 RNA sequence analysis tasks.
Affiliation(s)
- Muhammad Nabeel Asim
  - German Research Center for Artificial Intelligence GmbH, Kaiserslautern, 67663, Germany
- Muhammad Ali Ibrahim
  - German Research Center for Artificial Intelligence GmbH, Kaiserslautern, 67663, Germany
  - Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, Kaiserslautern, 67663, Germany
- Tayyaba Asif
  - Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, Kaiserslautern, 67663, Germany
- Andreas Dengel
  - German Research Center for Artificial Intelligence GmbH, Kaiserslautern, 67663, Germany
  - Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, Kaiserslautern, 67663, Germany
5
Cao C, Wang C, Dai Q, Zou Q, Wang T. CRBPSA: CircRNA-RBP interaction sites identification using sequence structural attention model. BMC Biol 2024; 22:260. [PMID: 39543602 PMCID: PMC11566611 DOI: 10.1186/s12915-024-02055-0] [Received: 07/13/2024] [Accepted: 10/30/2024] [Indexed: 11/17/2024] [Open access]
Abstract
BACKGROUND Due to the ability of circRNA to bind with corresponding RBPs and play a critical role in gene regulation and disease prevention, numerous identification algorithms have been developed. Nevertheless, most of the current mainstream methods primarily capture one-dimensional sequence features through various descriptors, while neglecting the effective extraction of secondary structure features. Moreover, as the number of introduced descriptors increases, the issues of sparsity and ineffective representation also rise, causing a significant burden on computational models and leaving room for improvement in predictive performance. RESULTS Based on this, we focused on capturing the features of secondary structure in sequences and developed a new architecture called CRBPSA, which is based on a sequence-structure attention mechanism. Firstly, a base-pairing matrix is generated by calculating the matching probability between each base, with a Gaussian function introduced as a weight to construct the secondary structure. Then, a Structure_Transformer is employed to extract base-pairing information and spatial positional dependencies, enabling the identification of binding sites through deeper feature extraction. Experimental results using the same set of hyperparameters on 37 circRNA datasets, totaling 671,952 samples, show that the CRBPSA algorithm achieves an average AUC of 99.93%, surpassing all existing prediction methods. CONCLUSIONS CRBPSA is a lightweight and efficient prediction tool for circRNA-RBP, which can capture structural features of sequences with minimal computational resources and accurately predict protein-binding sites. This tool facilitates a deeper understanding of the biological processes and mechanisms underlying circRNA and protein interactions.
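The abstract describes a base-pairing matrix built from pairwise matching scores with a Gaussian function as a weight, without giving the formula. The sketch below is one plausible reading, stated as an assumption: each pairable position (i, j) accumulates the pairing scores of nearby stacked pairs (i + t, j - t), weighted by exp(-t^2 / 2). The pairing strengths, the window size, and all names are illustrative and are not taken from CRBPSA.

```python
import math

# Illustrative pairing strengths (assumed values, not taken from the paper).
PAIR_SCORE = {
    ("A", "U"): 2.0, ("U", "A"): 2.0,
    ("G", "C"): 3.0, ("C", "G"): 3.0,
    ("G", "U"): 0.8, ("U", "G"): 0.8,
}


def pair_score(a: str, b: str) -> float:
    """Return the assumed pairing strength of two bases (0.0 if they cannot pair)."""
    return PAIR_SCORE.get((a, b), 0.0)


def base_pairing_matrix(seq: str, max_offset: int = 3) -> list[list[float]]:
    """Build a symmetric matrix of Gaussian-weighted pairing scores."""
    seq = seq.upper()
    n = len(seq)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if pair_score(seq[i], seq[j]) == 0.0:
                continue  # bases i and j cannot pair at all
            total = 0.0
            # Add neighbouring stacked pairs (i + t, j - t), down-weighted by distance t.
            for t in range(-max_offset, max_offset + 1):
                a, b = i + t, j - t
                if 0 <= a < b < n:
                    total += math.exp(-0.5 * t * t) * pair_score(seq[a], seq[b])
            matrix[i][j] = matrix[j][i] = total
    return matrix


hairpin = base_pairing_matrix("GAAAC")
stem = base_pairing_matrix("GGCC")
print(hairpin[0][4], stem[0][3])
```

In this reading, a pair that sits inside a stem of consecutive pairs scores higher than an isolated pair, which is the intuition behind weighting neighbours rather than scoring each pair on its own.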
Affiliation(s)
- Chao Cao
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, Sichuan, China
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang, China
- Chunyu Wang
  - Faculty of Computing, Harbin Institute of Technology, Harbin, Heilongjiang, China
- Qi Dai
  - College of Life Science and Medicine, Zhejiang Sci-Tech University, Hangzhou, China
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, Sichuan, China
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang, China
- Tao Wang
  - School of Computer Science, Northwestern Polytechnical University, Xi'an, China.
6
Tian Z, Han C, Xu L, Teng Z, Song W. MGCNSS: miRNA-disease association prediction with multi-layer graph convolution and distance-based negative sample selection strategy. Brief Bioinform 2024; 25:bbae168. [PMID: 38622356 PMCID: PMC11018511 DOI: 10.1093/bib/bbae168] [Received: 12/12/2023] [Revised: 03/14/2024] [Accepted: 03/31/2024] [Indexed: 04/17/2024] [Open access]
Abstract
Identifying disease-associated microRNAs (miRNAs) could help reveal the deep mechanisms of diseases, which promotes the development of new medicine. Recently, network-based approaches have been widely proposed for inferring the potential associations between miRNAs and diseases. However, these approaches ignore the importance of different relations in meta-paths when learning the embeddings of miRNAs and diseases. Besides, they pay little attention to screening out reliable negative samples, which is crucial for improving prediction accuracy. In this study, we propose a novel approach named MGCNSS with multi-layer graph convolution and a high-quality negative sample selection strategy. Specifically, MGCNSS first constructs a comprehensive heterogeneous network by integrating miRNA and disease similarity networks with their known association relationships. Then, we employ multi-layer graph convolution to automatically capture meta-path relations of different lengths in the heterogeneous network and learn discriminative representations of miRNAs and diseases. After that, MGCNSS establishes a highly reliable negative sample set from the unlabeled sample set with the distance-based negative sample selection strategy. Finally, we train MGCNSS in an unsupervised manner and predict the potential associations between miRNAs and diseases. The experimental results demonstrate that MGCNSS outperforms all baseline methods on both balanced and imbalanced datasets. More importantly, we conduct case studies on colon neoplasms and esophageal neoplasms, further confirming the ability of MGCNSS to detect potential candidate miRNAs. The source code is publicly available on GitHub: https://github.com/15136943622/MGCNSS/tree/master.
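The abstract says reliable negatives are chosen from unlabeled samples by a distance-based strategy but does not give the distance or the reference point. One simple reading, assumed here for illustration only, is to rank unlabeled pair embeddings by Euclidean distance from the centroid of the known positives and keep the farthest ones. The function name and the toy embeddings are hypothetical.

```python
import math


def select_reliable_negatives(
    positives: list[list[float]],
    unlabeled: list[list[float]],
    n_select: int,
) -> list[int]:
    """Return indices of the n_select unlabeled samples farthest from the positive centroid."""
    dim = len(positives[0])
    # Centroid of the known positive pair embeddings.
    centroid = [sum(p[k] for p in positives) / len(positives) for k in range(dim)]
    # Rank unlabeled samples from farthest to nearest.
    ranked = sorted(
        range(len(unlabeled)),
        key=lambda i: math.dist(unlabeled[i], centroid),
        reverse=True,
    )
    return ranked[:n_select]


# Toy 2-dimensional pair embeddings (hypothetical values).
positives = [[0.0, 0.0], [2.0, 0.0]]
unlabeled = [[1.0, 0.5], [10.0, 0.0], [1.0, -4.0], [0.0, 0.0]]
chosen = select_reliable_negatives(positives, unlabeled, n_select=2)
print(chosen)
```

The rationale is that unlabeled pairs lying close to known positives are more likely to be undiscovered associations, so treating only distant pairs as negatives reduces label noise during training.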
Affiliation(s)
- Zhen Tian
  - School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450000, China
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, China
- Chenguang Han
  - School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450000, China
- Lewen Xu
  - School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450000, China
- Zhixia Teng
  - College of Computer and Control Engineering, Northeast Forestry University, Harbin 150040, China
- Wei Song
  - School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450000, China