1. Hou Z, Xu Z, Yan C, Luo H, Luo J. CPI-GGS: A deep learning model for predicting compound-protein interaction based on graphs and sequences. Comput Biol Chem 2025; 115:108326. [PMID: 39752853 DOI: 10.1016/j.compbiolchem.2024.108326] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/05/2024] [Revised: 12/17/2024] [Accepted: 12/24/2024] [Indexed: 02/26/2025]
Abstract
BACKGROUND Compound-protein interaction (CPI) is essential to drug discovery and design, where traditional methods are often costly and have low success rates. Recently, the integration of machine learning and deep learning in CPI research has shown potential to reduce costs and enhance discovery efficiency by improving protein target identification accuracy. Additionally, with an urgent need for novel therapies against complex diseases, CPI investigation could lead to the identification of effective new drugs. Since drug-target interactions involve complex biological processes, refined models are necessary for precise feature extraction and analysis. Nevertheless, current CPI prediction methods still face significant limitations: predictions lack sufficient accuracy, models require improved generalization ability, and further validation across diverse datasets remains essential. RESULTS To address some issues at the current stage, this paper proposes a combined deep learning method, CPI-GGS, for predicting and analyzing compound-protein interactions. The source code is available on GitHub at https://github.com/xingjie321/CPI-GGS. CONCLUSIONS The experimental results demonstrate improved accuracy in predicting compound-protein interactions and enhance the understanding of how compounds and proteins interact, providing a valuable new tool for drug discovery and development.
Affiliation(s)
- Zhanwei Hou: School of Software, Henan Polytechnic University, Jiaozuo 454003, China
- Zhenhan Xu: School of Software, Henan Polytechnic University, Jiaozuo 454003, China
- Chaokun Yan: School of Computer and Information Engineering, Henan University, Kaifeng 475001, China
- Huimin Luo: School of Computer and Information Engineering, Henan University, Kaifeng 475001, China
- Junwei Luo: School of Software, Henan Polytechnic University, Jiaozuo 454003, China
2. Yao B, Song Y. lncRNA-disease association prediction based on optimizing measures of multi-graph regularized matrix factorization. Comput Methods Biomech Biomed Engin 2025:1-16. [PMID: 40114384 DOI: 10.1080/10255842.2025.2479854] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/26/2024] [Revised: 02/05/2025] [Accepted: 02/17/2025] [Indexed: 03/22/2025]
Abstract
In this paper, we propose a novel lncRNA-disease association prediction algorithm based on optimizing measures of multi-graph regularized matrix factorization (OM-MGRMF). The method first calculates the semantic similarity of diseases, the functional similarity of lncRNAs, and the Gaussian similarity of both. It then constructs a new lncRNA-disease association matrix using the K-nearest-neighbor (KNN) algorithm. Finally, the objective function is constructed from ranking measures and multi-graph regularization constraints and is iteratively optimized by an adaptive gradient descent algorithm. In K-fold cross-validation, OM-MGRMF outperforms classical methods.
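The "Gaussian similarity" step mentioned in this abstract can be illustrated with a short sketch. It assumes the commonly used Gaussian interaction profile (GIP) kernel, in which each entity is described by its row of a binary association matrix and the bandwidth is normalised by the mean squared norm of those rows. The abstract does not give the exact formulation, so the function name `gip_kernel`, the parameter `gamma_prime`, and the toy matrix are illustrative assumptions rather than the authors' code.

```python
import math

def gip_kernel(assoc, gamma_prime=1.0):
    """Gaussian interaction profile kernel over the rows of a binary matrix.

    Assumed formulation: K[i][j] = exp(-gamma * ||row_i - row_j||^2), with
    gamma = gamma_prime / (mean squared norm of the rows).
    """
    n = len(assoc)
    mean_sq_norm = sum(sum(x * x for x in row) for row in assoc) / n
    if mean_sq_norm == 0:
        raise ValueError("association matrix has no known associations")
    gamma = gamma_prime / mean_sq_norm
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(assoc[i], assoc[j]))
            sim[i][j] = math.exp(-gamma * sq_dist)
    return sim

# Toy association matrix (made-up values): 3 lncRNAs (rows) by 4 diseases (columns).
A = [
    [1, 0, 1, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
]
lnc_sim = gip_kernel(A)                               # lncRNA-lncRNA similarity
dis_sim = gip_kernel([list(c) for c in zip(*A)])      # disease-disease similarity
```

Rows with identical interaction profiles get similarity 1, and the similarity decays as the profiles diverge. Transposing the matrix yields the kernel for the other entity type.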
Affiliation(s)
- Bin Yao: School of Electrical Engineering and Automation, Henan Polytechnic University, Jiaozuo, China; Henan International Joint Laboratory of Direct Drive and General of Intelligent Equipment, Jiaozuo, China
- Yunzhong Song: School of Electrical Engineering and Automation, Henan Polytechnic University, Jiaozuo, China; Henan International Joint Laboratory of Direct Drive and General of Intelligent Equipment, Jiaozuo, China
3. Zhu R, Wang Y, Dai LY. CLHGNNMDA: Hypergraph Neural Network Model Enhanced by Contrastive Learning for miRNA-Disease Association Prediction. J Comput Biol 2025; 32:47-63. [PMID: 39602201 DOI: 10.1089/cmb.2024.0720] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2024]
Abstract
Numerous biological experiments have demonstrated that microRNA (miRNA) is involved in gene regulation within cells, and mutations and abnormal expression of miRNA can cause a myriad of intricate diseases. Forecasting the association between miRNA and diseases can enhance disease prevention and treatment and accelerate drug research, which holds considerable importance for the development of clinical medicine and drug research. This investigation introduces a contrastive learning-augmented hypergraph neural network model, termed CLHGNNMDA, aimed at predicting associations between miRNAs and diseases. Initially, CLHGNNMDA constructs multiple hypergraphs by leveraging diverse similarity metrics related to miRNAs and diseases. Subsequently, hypergraph convolution is applied to each hypergraph to extract feature representations for nodes and hyperedges. Following this, autoencoders are employed to reconstruct information regarding the feature representations of nodes and hyperedges and to integrate various features of miRNAs and diseases extracted from each hypergraph. Finally, a joint contrastive loss function is utilized to refine the model and optimize its parameters. The CLHGNNMDA framework employs multi-hypergraph contrastive learning for the construction of a contrastive loss function. This approach takes into account inter-view interactions and upholds the principle of consistency, thereby augmenting the model's representational efficacy. The results obtained from fivefold cross-validation substantiate that the CLHGNNMDA algorithm achieves a mean area under the receiver operating characteristic curve of 0.9635 and a mean area under the precision-recall curve of 0.9656. These metrics are notably superior to those attained by contemporary state-of-the-art methodologies.
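The abstract does not spell out its joint contrastive loss, so the following is only a generic sketch of the idea of contrasting two views of the same nodes. It assumes an InfoNCE-style objective with cosine similarity, where the embedding of node i in one view should be closest to the embedding of the same node in the other view. The function names, the temperature value, and the toy embeddings are hypothetical, and all embeddings are assumed to be non-zero vectors.

```python
import math

def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(view_a, view_b, temperature=0.5):
    """Mean InfoNCE loss: node i in view_a is the positive for node i in view_b."""
    n = len(view_a)
    total = 0.0
    for i in range(n):
        logits = [cosine(view_a[i], view_b[j]) / temperature for j in range(n)]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denom - logits[i]
    return total / n

# Toy embeddings of two nodes as seen from two views (made-up values).
view_a = [[1.0, 0.0], [0.0, 1.0]]
view_b = [[0.9, 0.1], [0.1, 0.9]]
aligned = info_nce(view_a, view_b)          # views agree on node identity
shuffled = info_nce(view_a, view_b[::-1])   # views disagree
```

The loss is lower when the two views agree on which embedding belongs to which node, which is the consistency property a contrastive objective of this kind rewards.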
Affiliation(s)
- Rong Zhu: School of Computer Science, Qufu Normal University, Rizhao, China
- Yong Wang: Laboratory Experimental Teaching and Equipment Management Center, Qufu Normal University, Rizhao, China
- Ling-Yun Dai: School of Computer Science, Qufu Normal University, Rizhao, China
4. Peng Y, Chu S, Huang X, Cheng Y. PPDAMEGCN: Predicting piRNA-Disease Associations Based on Multi-Edge Type Graph Convolutional Network. IET Syst Biol 2025; 19:e70011. [PMID: 40120103 PMCID: PMC11929523 DOI: 10.1049/syb2.70011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/04/2024] [Revised: 01/16/2025] [Accepted: 03/06/2025] [Indexed: 03/25/2025]
Abstract
Recently, many studies have shown that Piwi-interacting RNAs (piRNAs) play key roles in various biological processes and are associated with complex human diseases. To accelerate the traditional biomedical experimental methods for determining piRNA-disease associations, many computational approaches have been proposed. However, piRNA-disease associations can be classified into known and unknown associations, each of which may provide distinct types of information. Traditional graph convolutional networks (GCNs) typically treat all edges in a graph as identical, overlooking the fact that different edge types may carry different signals and influence the learning process in unique ways. In this study, we propose a new piRNA-disease association prediction method, called PPDAMEGCN, based on a multi-edge type graph convolutional network. First, we calculate the piRNA sequence similarity from the piRNA sequence information using the Smith-Waterman method. The disease semantic similarity is computed from the Disease Ontology (DO). In addition, we calculate the Gaussian interaction profile (GIP) kernel similarities of piRNAs and diseases from the known piRNA-disease associations. Then, we construct the piRNA similarity network by integrating the piRNA sequence similarity and GIP similarity, and the disease similarity network by integrating the disease semantic similarity and GIP similarity. Finally, we obtain the piRNA and disease embeddings by applying the multi-edge type graph convolutional network model to the heterogeneous piRNA-disease association network. The association probability score of a piRNA-disease pair is calculated by a multilayer perceptron (MLP) from the concatenated embedding of the pair. We also compare PPDAMEGCN with other piRNA-disease prediction methods. The experimental results show that our method outperforms the compared methods.
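The Smith-Waterman step named in this abstract is a standard local-alignment dynamic program, sketched below. The scoring values (match 2, mismatch -1, linear gap -1) and the normalisation by the geometric mean of the self-alignment scores are illustrative assumptions; the abstract does not state which scoring scheme or normalisation the authors used.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score between strings a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best

def normalized_similarity(a, b):
    """Scale the score to [0, 1] using the two self-alignment scores (assumed choice)."""
    denom = (smith_waterman_score(a, a) * smith_waterman_score(b, b)) ** 0.5
    return smith_waterman_score(a, b) / denom if denom else 0.0

# Toy sequences (made up); both share the local motif "ACG".
score = smith_waterman_score("TTACGTT", "GGACGGG")
sim = normalized_similarity("TTACGTT", "GGACGGG")
```

Only two rows of the dynamic-programming table are kept because the sketch returns the score alone. Recovering the alignment itself would require the full table and a traceback.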
Affiliation(s)
- Yinglong Peng: School of Information and Intelligence, XiangXi Vocational and Technical College for Nationalities, Jishou, China
- Shuang Chu: School of Informatics, Hunan University of Chinese Medicine, Changsha, China
- Xindi Huang: School of Informatics, Hunan University of Chinese Medicine, Changsha, China
- Yan Cheng: School of Informatics, Hunan University of Chinese Medicine, Changsha, China
5. Yang Y, Sun Y, Li F, Guan B, Liu JX, Shang J. MGCNRF: Prediction of Disease-Related miRNAs Based on Multiple Graph Convolutional Networks and Random Forest. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:15701-15709. [PMID: 37459265 DOI: 10.1109/tnnls.2023.3289182] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 11/01/2024]
Abstract
An increasing number of microRNAs (miRNAs) have been confirmed to be inextricably linked to various diseases, and the discovery of their associations has become a routine way of treating diseases. To overcome the time-consuming and laborious nature of traditional experiments for verifying miRNA-disease associations (MDAs), a variety of computational methods have emerged. However, these methods still have many shortcomings in terms of predictive performance and accuracy. In this study, a model based on multiple graph convolutional networks and random forest (MGCNRF) was proposed for the prediction of MDAs. Specifically, MGCNRF first mapped miRNA functional similarity and sequence similarity, disease semantic similarity and target similarity, and the known MDAs into four different two-layer heterogeneous networks. Second, MGCNRF fed the four heterogeneous networks into four different layered attention graph convolutional networks (GCNs), respectively, to extract MDA embeddings. Finally, MGCNRF integrated the embeddings of every MDA into the features of the miRNA-disease pair and predicted potential MDAs with a random forest (RF). Fivefold cross-validation was applied to verify the prediction performance of MGCNRF, which outperforms seven other state-of-the-art methods in area under the curve. Furthermore, the accuracy and the case studies of different diseases further demonstrate the scientific rationale of MGCNRF. In conclusion, MGCNRF can serve as a scientific tool for predicting potential MDAs.
6. Zhang B, Wang H, Ma C, Huang H, Fang Z, Qu J. LDAGM: prediction lncRNA-disease asociations by graph convolutional auto-encoder and multilayer perceptron based on multi-view heterogeneous networks. BMC Bioinformatics 2024; 25:332. [PMID: 39407120 PMCID: PMC11481433 DOI: 10.1186/s12859-024-05950-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/01/2024] [Accepted: 10/01/2024] [Indexed: 10/19/2024]
Abstract
BACKGROUND Long non-coding RNAs (lncRNAs) are relevant to the prevention, diagnosis, and treatment of a variety of complex human diseases, so it is crucial to establish a method that efficiently predicts lncRNA-disease associations. RESULTS In this paper, we propose a prediction method for lncRNA-disease associations, named LDAGM, which is based on a Graph Convolutional Autoencoder and a Multilayer Perceptron model. The method first extracts the functional similarity and Gaussian interaction profile kernel similarity of lncRNAs and miRNAs, as well as the semantic similarity and Gaussian interaction profile kernel similarity of diseases. It then constructs six homogeneous networks and deeply fuses them using a deep topology feature extraction method. The fused networks facilitate feature complementation and deep mining of the original associations, capturing the deep connections between nodes. Next, by combining the obtained deep topological features with the similarity network of lncRNA, disease, and miRNA interactions, we construct a multi-view heterogeneous network model. The Graph Convolutional Autoencoder is employed for nonlinear feature extraction. Finally, the extracted nonlinear features are combined with the deep topological features of the multi-view heterogeneous network to obtain the final feature representation of the lncRNA-disease pair. Prediction of lncRNA-disease associations is performed using the Multilayer Perceptron model. To enhance the performance and stability of the Multilayer Perceptron model, we introduce a hidden layer called the aggregation layer. Through a gate mechanism, it controls the flow of information between the hidden layers of the Multilayer Perceptron model, aiming to achieve optimal feature extraction from each hidden layer. CONCLUSIONS Parameter analysis, ablation studies, and comparison experiments verified the effectiveness of this method, and case studies verified its accuracy in predicting lncRNA-disease associations.
Grants
- No. 62172123 National Natural Science Foundation, China
- Grant No. 2022ZX01A36 the Key Research and Development Program of Heilongjiang
- No. ZY20B11 the Special projects for the central government to guide the development of local science and technology, China
- No. CXRC20221104236 the Harbin Manufacturing Technology Innovation Talent Project
Affiliation(s)
- Bing Zhang: Harbin University of Science and Technology, Harbin, 150006, Heilongjiang province, China
- Haoyu Wang: Harbin University of Science and Technology, Harbin, 150006, Heilongjiang province, China
- Chao Ma: Harbin University of Science and Technology, Harbin, 150006, Heilongjiang province, China
- Hai Huang: Harbin University of Science and Technology, Harbin, 150006, Heilongjiang province, China
- Zhou Fang: Cyberspace Research Center, Harbin, 150001, Heilongjiang province, China
- Jiaxing Qu: Cyberspace Research Center, Harbin, 150001, Heilongjiang province, China
7. Lan W, Li C, Chen Q, Yu N, Pan Y, Zheng Y, Chen YPP. LGCDA: Predicting CircRNA-Disease Association Based on Fusion of Local and Global Features. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2024; 21:1413-1422. [PMID: 38607720 DOI: 10.1109/tcbb.2024.3387913] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/14/2024]
Abstract
CircRNA has been shown to be involved in the occurrence of many diseases. Several computational frameworks have been proposed to identify circRNA-disease associations. Although existing computational methods have achieved considerable success, they still need improvement, as their performance may degrade due to data sparsity and memory overflow. To address these problems, we develop a novel computational framework called LGCDA that predicts circRNA-disease associations by fusing local and global features. First, we construct closed local subgraphs using k-hop closed subgraphs and label the subgraphs to obtain rich graph pattern information. Then, the local features are extracted using a graph neural network (GNN). In addition, we fuse the Gaussian interaction profile (GIP) kernel and cosine similarity to obtain global features. Finally, the score of circRNA-disease associations is predicted using a multilayer perceptron (MLP) based on the local and global features. We perform five-fold cross-validation on five datasets for model evaluation, and our model surpasses other advanced methods.
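The k-hop subgraph construction described in this abstract can be pictured with a small breadth-first-search sketch. It assumes that the local subgraph for a candidate (circRNA, disease) pair consists of every node within k hops of either endpoint plus the edges among those nodes; the node-labelling scheme is not described in the abstract and is omitted. The function names and the toy graph are hypothetical.

```python
from collections import deque

def hop_distances(adj, source, k):
    """Breadth-first search from source that stops expanding after k hops."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == k:
            continue
        for nbr in adj.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist

def k_hop_enclosing_subgraph(adj, u, v, k):
    """Nodes within k hops of u or v, and the undirected edges among them."""
    nodes = set(hop_distances(adj, u, k)) | set(hop_distances(adj, v, k))
    edges = {tuple(sorted((a, b)))
             for a in nodes for b in adj.get(a, ()) if b in nodes}
    return nodes, edges

# Toy bipartite graph (made up): circRNAs c1..c3 and diseases d1..d3.
adj = {
    "c1": ["d1", "d2"], "c2": ["d2"], "c3": ["d3"],
    "d1": ["c1"], "d2": ["c1", "c2"], "d3": ["c3"],
}
nodes, edges = k_hop_enclosing_subgraph(adj, "c1", "d3", k=1)
```

Working on one small subgraph per candidate pair, rather than on the whole graph at once, is one way such methods keep memory use bounded.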
8. Su Y, Liu J, Wu Q, Gao Z, Wang J, Li H, Zheng C. AMPFLDAP: Adaptive Message Passing and Feature Fusion on Heterogeneous Network for LncRNA-Disease Associations Prediction. Interdiscip Sci 2024; 16:608-622. [PMID: 38581626 DOI: 10.1007/s12539-024-00610-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2023] [Revised: 01/03/2024] [Accepted: 01/03/2024] [Indexed: 04/08/2024]
Abstract
Exploration of the intricate connections between long noncoding RNA (lncRNA) and diseases, referred to as lncRNA-disease associations (LDAs), plays a pivotal and indispensable role in unraveling the underlying molecular mechanisms of diseases and devising practical treatment approaches. It is imperative to employ computational methods for predicting lncRNA-disease associations to circumvent the need for superfluous experimental endeavors. Graph-based learning models have gained substantial popularity in predicting these associations, primarily because of their capacity to leverage node attributes and relationships within the network. Nevertheless, there remains much room for enhancing the performance of these techniques by incorporating and harmonizing the node attributes more effectively. In this context, we introduce a novel model, i.e., Adaptive Message Passing and Feature Fusion (AMPFLDAP), for forecasting lncRNA-disease associations within a heterogeneous network. Firstly, we constructed a heterogeneous network involving lncRNA, microRNA (miRNA), and diseases based on established associations and employing Gaussian interaction profile kernel similarity as a measure. Then, an adaptive topological message passing mechanism is suggested to address the information aggregation for heterogeneous networks. The topological features of nodes in the heterogeneous network were extracted based on the adaptive topological message passing mechanism. Moreover, an attention mechanism is applied to integrate both topological and semantic information to achieve the multimodal features of biomolecules, which are further used to predict potential LDAs. The experimental results demonstrated that the performance of the proposed AMPFLDAP is superior to seven state-of-the-art methods. 
Furthermore, to validate its efficacy in practical scenarios, we conducted detailed case studies involving three distinct diseases, which conclusively demonstrated AMPFLDAP's effectiveness in the prediction of LDAs.
Affiliation(s)
- Yansen Su: Key Laboratory of Intelligent Computing and Signal Processing, Anhui University, 111 Jiulong Road, Hefei, 230601, Anhui, China
- Jingjing Liu: Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, 5089 Wangjiang West Road, Hefei, 230088, Anhui, China
- Qingwen Wu: Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, 5089 Wangjiang West Road, Hefei, 230088, Anhui, China
- Zhen Gao: Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, 5089 Wangjiang West Road, Hefei, 230088, Anhui, China
- Jing Wang: Key Laboratory of Intelligent Computing and Signal Processing, Anhui University, 111 Jiulong Road, Hefei, 230601, Anhui, China; Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, 5089 Wangjiang West Road, Hefei, 230088, Anhui, China
- Haitao Li: Key Laboratory of Intelligent Computing and Signal Processing, Anhui University, 111 Jiulong Road, Hefei, 230601, Anhui, China
- Chunhou Zheng: Key Laboratory of Intelligent Computing and Signal Processing, Anhui University, 111 Jiulong Road, Hefei, 230601, Anhui, China
9. Luo J, Wang J, Zhai H, Wang J. GCphase: an SNP phasing method using a graph partition and error correction algorithm. BMC Bioinformatics 2024; 25:267. [PMID: 39160480 PMCID: PMC11331634 DOI: 10.1186/s12859-024-05901-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2023] [Accepted: 08/14/2024] [Indexed: 08/21/2024]
Abstract
BACKGROUND The use of long reads for single nucleotide polymorphism (SNP) phasing has become popular, providing substantial support for research on human diseases and genetic studies in animals and plants. However, due to the complexity of the linkage relationships between SNP loci and sequencing errors in the reads, recent methods still cannot yield satisfactory results. RESULTS In this study, we present a graph-based algorithm, GCphase, which utilizes the minimum cut algorithm to perform phasing. First, based on the alignment between long reads and the reference genome, GCphase filters out ambiguous SNP sites and useless read information. Second, GCphase constructs a graph in which a vertex represents alleles of an SNP locus and each edge represents the presence of read support; GCphase then adopts a graph minimum-cut algorithm to phase the SNPs. Next, GCphase uses two error correction steps to refine the phasing results obtained from the previous step, effectively reducing the error rate. Finally, GCphase obtains the phase block. GCphase was compared with three other methods, WhatsHap, HapCUT2, and LongPhase, on Nanopore and PacBio long-read datasets. The code is available from https://github.com/baimawjy/GCphase. CONCLUSIONS Experimental results show that, across datasets and sequencing depths, GCphase has the fewest switch errors and the highest accuracy among the compared methods.
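The switch-error metric used in this comparison can be made concrete with a minimal sketch. It assumes a single phase block, a binary allele encoding ("0" or "1" at each heterozygous site), and two haplotype strings covering the same sites; the function name and the example strings are hypothetical, and this is not the evaluation script used by the authors.

```python
def switch_errors(truth, predicted):
    """Count phase switches between a predicted haplotype and the true one.

    A switch is counted whenever agreement with the truth flips between two
    adjacent heterozygous sites, so a fully complementary prediction (the
    other haplotype of the same phasing) has zero switches.
    """
    if len(truth) != len(predicted):
        raise ValueError("haplotypes must cover the same sites")
    agree = [t == p for t, p in zip(truth, predicted)]
    return sum(1 for prev, curr in zip(agree, agree[1:]) if prev != curr)

# Toy haplotypes over six heterozygous sites (made-up values).
truth = "010101"
one_switch = switch_errors(truth, "010010")   # the phase flips once, after site 3
single_flip = switch_errors(truth, "011101")  # one isolated wrong site
```

In this simple count, an isolated wrong site contributes two switches. Some evaluation tools report such cases separately as flip errors, which is a design choice this sketch does not make.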
Affiliation(s)
- Junwei Luo: School of Software, Henan Polytechnic University, Jiaozuo, 454003, China
- Jiayi Wang: School of Software, Henan Polytechnic University, Jiaozuo, 454003, China
- Haixia Zhai: School of Software, Henan Polytechnic University, Jiaozuo, 454003, China
- Junfeng Wang: School of Software, Henan Polytechnic University, Jiaozuo, 454003, China
10. Diao B, Luo J, Guo Y. A comprehensive survey on deep learning-based identification and predicting the interaction mechanism of long non-coding RNAs. Brief Funct Genomics 2024; 23:314-324. [PMID: 38576205 DOI: 10.1093/bfgp/elae010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2023] [Revised: 02/25/2024] [Accepted: 03/14/2024] [Indexed: 04/06/2024]
Abstract
Long noncoding RNAs (lncRNAs) have been discovered to be extensively involved in eukaryotic epigenetic, transcriptional, and post-transcriptional regulatory processes with the advancements in sequencing technology and genomics research. Therefore, they play crucial roles in the body's normal physiology and various disease outcomes. Presently, numerous unknown lncRNA sequencing data require exploration. Establishing deep learning-based prediction models for lncRNAs provides valuable insights for researchers, substantially reducing time and costs associated with trial and error and facilitating the disease-relevant lncRNA identification for prognosis analysis and targeted drug development as the era of artificial intelligence progresses. However, most lncRNA-related researchers lack awareness of the latest advancements in deep learning models and model selection and application in functional research on lncRNAs. Thus, we elucidate the concept of deep learning models, explore several prevalent deep learning algorithms and their data preferences, conduct a comprehensive review of recent literature studies with exemplary predictive performance over the past 5 years in conjunction with diverse prediction functions, critically analyze and discuss the merits and limitations of current deep learning models and solutions, while also proposing prospects based on cutting-edge advancements in lncRNA research.
Affiliation(s)
- Biyu Diao: Department of Breast Surgery, The First Affiliated Hospital of Ningbo University, No. 59, Liuting Street, Haishu District, Ningbo 315000, China
- Jin Luo: Department of Breast Surgery, The First Affiliated Hospital of Ningbo University, No. 59, Liuting Street, Haishu District, Ningbo 315000, China
- Yu Guo: Department of Breast Surgery, The First Affiliated Hospital of Ningbo University, No. 59, Liuting Street, Haishu District, Ningbo 315000, China
11. He J, Li M, Qiu J, Pu X, Guo Y. HOPEXGB: A Consensual Model for Predicting miRNA/lncRNA-Disease Associations Using a Heterogeneous Disease-miRNA-lncRNA Information Network. J Chem Inf Model 2024; 64:2863-2877. [PMID: 37604142 DOI: 10.1021/acs.jcim.3c00856] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Indexed: 08/23/2023]
Abstract
Predicting disease-related microRNAs (miRNAs) and long noncoding RNAs (lncRNAs) is crucial to find new biomarkers for the prevention, diagnosis, and treatment of complex human diseases. Computational predictions for miRNA/lncRNA-disease associations are of great practical significance, since traditional experimental detection is expensive and time-consuming. In this paper, we proposed a consensual machine-learning technique-based prediction approach to identify disease-related miRNAs and lncRNAs by high-order proximity preserved embedding (HOPE) and eXtreme Gradient Boosting (XGB), named HOPEXGB. By connecting lncRNA, miRNA, and disease nodes based on their correlations and relationships, we first created a heterogeneous disease-miRNA-lncRNA (DML) information network to achieve an effective fusion of information on similarities, correlations, and interactions among miRNAs, lncRNAs, and diseases. In addition, a more rational negative data set was generated based on the similarities of unknown associations with the known ones, so as to effectively reduce the false negative rate in the data set for model construction. By 10-fold cross-validation, HOPE shows better performance than other graph embedding methods. The final consensual HOPEXGB model yields robust performance with a mean prediction accuracy of 0.9569 and also demonstrates high sensitivity and specificity advantages compared to lncRNA/miRNA-specific predictions. Moreover, it is superior to other existing methods and gives promising performance on the external testing data, indicating that integrating the information on lncRNA-miRNA interactions and the similarities of lncRNAs/miRNAs is beneficial for improving the prediction performance of the model. Finally, case studies on lung, stomach, and breast cancers indicate that HOPEXGB could be a powerful tool for preclinical biomarker detection and bioexperiment preliminary screening for the diagnosis and prognosis of cancers. 
HOPEXGB is publicly available at https://github.com/airpamper/HOPEXGB.
Affiliation(s)
- Jian He: College of Chemistry, Sichuan University, Chengdu 610064, China
- Menglong Li: College of Chemistry, Sichuan University, Chengdu 610064, China
- Jiangguo Qiu: College of Chemistry, Sichuan University, Chengdu 610064, China
- Xuemei Pu: College of Chemistry, Sichuan University, Chengdu 610064, China
- Yanzhi Guo: College of Chemistry, Sichuan University, Chengdu 610064, China
12. Cao J, Chen Q, Qiu J, Wang Y, Lan W, Du X, Tan K. NGCN: Drug-target interaction prediction by integrating information and feature learning from heterogeneous network. J Cell Mol Med 2024; 28:e18224. [PMID: 38509739 PMCID: PMC10955156 DOI: 10.1111/jcmm.18224] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2023] [Revised: 02/14/2024] [Accepted: 02/26/2024] [Indexed: 03/22/2024]
Abstract
Drug-target interaction (DTI) prediction is essential for new drug design and development. Constructing a heterogeneous network based on diverse information about drugs, proteins and diseases provides new opportunities for DTI prediction. However, the inherent complexity, high dimensionality and noise of such a network prevent us from taking full advantage of its characteristics. This article proposes a novel method, NGCN, to predict drug-target interactions from an integrated heterogeneous network, extracting relevant biological properties and association information while maintaining the topology information. Unlike traditional methods, it focuses on learning a low-dimensional topology representation of drugs and targets via a graph-based convolutional neural network to improve the performance of DTI prediction. NGCN achieves substantial performance improvements over other state-of-the-art methods, such as a nearly 1.0% increase in AUPR value. Moreover, we verify the robustness of NGCN through benchmark tests, and the experimental results demonstrate that it is an extensible framework capable of combining heterogeneous information for DTI prediction.
Affiliation(s)
- Junyue Cao
- College of Life Science and Technology, Guangxi University, Nanning, China
- Qingfeng Chen
- School of Computer, Electronics and Information, Guangxi University, Nanning, China
- Junlai Qiu
- School of Computer, Electronics and Information, Guangxi University, Nanning, China
- Yiming Wang
- School of Computer, Electronics and Information, Guangxi University, Nanning, China
- Wei Lan
- School of Computer, Electronics and Information, Guangxi University, Nanning, China
- Xiaojing Du
- School of Computer, Electronics and Information, Guangxi University, Nanning, China
- Kai Tan
- School of Computer, Electronics and Information, Guangxi University, Nanning, China

13
Lu P, Li L. MGDHGS: Gene-bridged metabolite-disease relationships prediction via GraphSAGE and self-attention mechanism. Comput Biol Chem 2024; 109:108036. [PMID: 38422603 DOI: 10.1016/j.compbiolchem.2024.108036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2023] [Revised: 02/14/2024] [Accepted: 02/21/2024] [Indexed: 03/02/2024]
Abstract
Metabolites represent the underlying information of biological systems. Revealing the links between metabolites and diseases can facilitate the development of targeted drugs. Traditional biological experiments can be used to validate metabolite-disease relationships, but these methods are time-consuming and labor-intensive. In contrast, the prevailing computational methods have improved efficiency but primarily rely on metabolite-disease interactions, overlooking the impact of other biological components. To remedy the problem, we present a novel computational framework (MGDHGS) based on a metabolite-gene-disease heterogeneous network to forecast potential associations. Specifically, we first integrate data from multiple sources to construct a metabolite-gene-disease heterogeneous network that includes known associations and computationally derived similarities. Then, GraphSAGE is harnessed to learn the low-dimensional neighborhood representation in the heterogeneous network, and a self-attention mechanism is applied to effectively capture the connectivity patterns, which contributes to combining the intrinsic and extrinsic features of nodes. Finally, the relationship probability scores are predicted by linear regression based on these characteristics. In five-fold cross-validation against five state-of-the-art methods, MGDHGS achieves an AUC of 0.9734 and a PR of 0.9718, and the case studies validate that the metabolite-disease associations predicted by MGDHGS can be substantiated through pertinent biological experiments. The findings of this study could contribute to the development of targeted drugs and offer solid support for our understanding of the complex interactions between metabolites, genes and diseases.
Affiliation(s)
- Pengli Lu
- School of Computer and Communication, Lanzhou University of Technology, Lanzhou, 730050, Gansu, PR China.
- Ling Li
- School of Computer and Communication, Lanzhou University of Technology, Lanzhou, 730050, Gansu, PR China.

14
Lan W, Liao H, Chen Q, Zhu L, Pan Y, Chen YPP. DeepKEGG: a multi-omics data integration framework with biological insights for cancer recurrence prediction and biomarker discovery. Brief Bioinform 2024; 25:bbae185. [PMID: 38678587 PMCID: PMC11056029 DOI: 10.1093/bib/bbae185] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/13/2024] [Revised: 03/07/2024] [Accepted: 04/09/2024] [Indexed: 05/01/2024] Open
Abstract
Deep learning-based multi-omics data integration methods have the capability to reveal the mechanisms of cancer development, discover cancer biomarkers and identify pathogenic targets. However, current methods ignore the potential correlations between samples in integrating multi-omics data. In addition, providing accurate biological explanations still poses significant challenges due to the complexity of deep learning models. Therefore, there is an urgent need for a deep learning-based multi-omics integration method to explore the potential correlations between samples and provide model interpretability. Herein, we propose a novel interpretable multi-omics data integration method (DeepKEGG) for cancer recurrence prediction and biomarker discovery. In DeepKEGG, a biological hierarchical module is designed for local connections of neuron nodes and model interpretability based on the biological relationship between genes/miRNAs and pathways. In addition, a pathway self-attention module is constructed to explore the correlation between different samples and generate the potential pathway feature representation for enhancing the prediction performance of the model. Lastly, an attribution-based feature importance calculation method is utilized to discover biomarkers related to cancer recurrence and provide a biological interpretation of the model. Experimental results demonstrate that DeepKEGG outperforms other state-of-the-art methods in 5-fold cross validation. Furthermore, case studies also indicate that DeepKEGG serves as an effective tool for biomarker discovery. The code is available at https://github.com/lanbiolab/DeepKEGG.
Affiliation(s)
- Wei Lan
- Guangxi Key Laboratory of Multimedia Communications and Network Technology, School of Computer, Electronic and Information, Guangxi University, No. 100 Daxue Road, Xixiangtang District, Nanning 530004, China
- Haibo Liao
- Guangxi Key Laboratory of Multimedia Communications and Network Technology, School of Computer, Electronic and Information, Guangxi University, No. 100 Daxue Road, Xixiangtang District, Nanning 530004, China
- Qingfeng Chen
- Guangxi Key Laboratory of Multimedia Communications and Network Technology, School of Computer, Electronic and Information, Guangxi University, No. 100 Daxue Road, Xixiangtang District, Nanning 530004, China
- Lingzhi Zhu
- School of Computer and Information Science, Hunan Institute of Technology, No. 18 Henghua Road, Zhuhui District, Hengyang 421002, China
- Yi Pan
- School of Computer Science and Control Engineering, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, No. 1068 Xueyuan Avenue, Shenzhen University Town, Nanshan District, Shenzhen 518055, China
- Yi-Ping Phoebe Chen
- Department of Computer Science and Information Technology, La Trobe University, Plenty Rd, Bundoora, Melbourne, Victoria 3086, Australia

15
Wang S, Qiao J, Feng S. Prediction of lncRNA and disease associations based on residual graph convolutional networks with attention mechanism. Sci Rep 2024; 14:5185. [PMID: 38431702 PMCID: PMC11319593 DOI: 10.1038/s41598-024-55957-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2023] [Accepted: 02/29/2024] [Indexed: 03/05/2024] Open
Abstract
LncRNAs are non-coding RNAs with a length of more than 200 nucleotides. More and more evidence shows that lncRNAs are inextricably linked with diseases. To make up for the shortcomings of traditional methods, researchers began to collect relevant biological data in the database and used bioinformatics prediction tools to predict the associations between lncRNAs and diseases, which greatly improved the efficiency of the study. To improve the prediction accuracy of current methods, we propose a new lncRNA-disease associations prediction method with attention mechanism, called ResGCN-A. Firstly, we integrated lncRNA functional similarity, lncRNA Gaussian interaction profile kernel similarity, disease semantic similarity, and disease Gaussian interaction profile kernel similarity to obtain lncRNA comprehensive similarity and disease comprehensive similarity. Secondly, the residual graph convolutional network was used to extract the local features of lncRNAs and diseases. Thirdly, the new attention mechanism was used to assign the weight of the above features to further obtain the potential features of lncRNAs and diseases. Finally, the training set required by the Extra-Trees classifier was obtained by concatenating potential features, and the potential associations between lncRNAs and diseases were obtained by the trained Extra-Trees classifier. ResGCN-A combines the residual graph convolutional network with the attention mechanism to realize the local and global features fusion of lncRNA and diseases, which is beneficial to obtain more accurate features and improve the prediction accuracy. In the experiment, ResGCN-A was compared with five other methods through 5-fold cross-validation. The results show that the AUC value and AUPR value obtained by ResGCN-A are 0.9916 and 0.9951, which are superior to the other five methods. In addition, case studies and robustness evaluation have shown that ResGCN-A is an effective method for predicting lncRNA-disease associations. 
The source code for ResGCN-A will be available at https://github.com/Wangxiuxiun/ResGCN-A.
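The Gaussian interaction profile (GIP) kernel similarity named in this abstract is commonly defined as exp(-gamma * ||IP(i) - IP(j)||^2), where IP(i) is row i of the binary association matrix and gamma is a bandwidth divided by the mean squared norm of all profiles. The following is a minimal Python sketch of that common definition, not the ResGCN-A implementation; the function name `gip_kernel_similarity`, the parameter `gamma_prime`, and the toy input are assumptions made for illustration.

```python
import math


def gip_kernel_similarity(assoc, gamma_prime=1.0):
    """Pairwise GIP kernel similarity between the rows of a 0/1 association matrix.

    assoc: list of rows; row i is the interaction profile of entity i
           (for example, one lncRNA against all diseases).
    gamma_prime: bandwidth before normalisation (hypothetical default of 1.0).
    """
    n = len(assoc)
    # Normalise the bandwidth by the mean squared norm of the profiles.
    sq_norms = [sum(v * v for v in row) for row in assoc]
    mean_sq_norm = sum(sq_norms) / n
    if mean_sq_norm == 0:
        raise ValueError("all interaction profiles are empty")
    gamma = gamma_prime / mean_sq_norm

    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Squared Euclidean distance between the two profiles.
            dist2 = sum((a - b) ** 2 for a, b in zip(assoc[i], assoc[j]))
            sim[i][j] = math.exp(-gamma * dist2)
    return sim
```

Calling `gip_kernel_similarity([[1, 0, 1], [1, 0, 1], [0, 1, 0]])` gives a similarity of 1.0 for the two identical profiles and a much smaller value for the disjoint one. Applying the same function to the transposed matrix would give the disease-side similarity.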
Affiliation(s)
- Shengchang Wang
- School of Electronic and Information Engineering, Harbin Institute of Technology, Harbin, 150001, China
- Jiaqing Qiao
- School of Electronic and Information Engineering, Harbin Institute of Technology, Harbin, 150001, China
- Shou Feng
- College of Information and Communication Engineering, Harbin Engineering University, Harbin, 150001, China.

16
Zhou L, Peng X, Zeng L, Peng L. Finding potential lncRNA-disease associations using a boosting-based ensemble learning model. Front Genet 2024; 15:1356205. [PMID: 38495672 PMCID: PMC10940470 DOI: 10.3389/fgene.2024.1356205] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/18/2023] [Accepted: 02/01/2024] [Indexed: 03/19/2024] Open
Abstract
Introduction: Long non-coding RNAs (lncRNAs) have been used clinically as potential prognostic biomarkers for various types of cancer. Identifying associations between lncRNAs and diseases helps capture potential biomarkers and design efficient therapeutic options for diseases. Wet experiments for identifying these associations are costly and laborious. Methods: We developed LDA-SABC, a novel boosting-based framework for lncRNA-disease association (LDA) prediction. LDA-SABC extracts LDA features based on singular value decomposition (SVD) and classifies lncRNA-disease pairs (LDPs) by incorporating LightGBM and AdaBoost into the convolutional neural network. Results: The performance of LDA-SABC was evaluated under five-fold cross validations (CVs) on lncRNAs, diseases, and LDPs. It clearly outperformed four other classical LDA inference methods (SDLDA, LDNFSGB, LDASR, and IPCAF) in terms of precision, recall, accuracy, F1 score, AUC, and AUPR. Based on the accurate LDA prediction performance of LDA-SABC, we used it to find potential lncRNA biomarkers for lung cancer. The results suggested that 7SK and HULC could be related to non-small-cell lung cancer (NSCLC) and lung adenocarcinoma (LUAD), respectively. Conclusion: We hope that our proposed LDA-SABC method can help improve LDA identification.
Affiliation(s)
- Liqian Zhou
- School of Computer Science, Hunan University of Technology, Zhuzhou, Hunan, China
- Xinhuai Peng
- School of Computer Science, Hunan University of Technology, Zhuzhou, Hunan, China
- Lijun Zeng
- School of Computer Science, Hunan Institute of Technology, Hengyang, China
- Lihong Peng
- School of Computer Science, Hunan University of Technology, Zhuzhou, Hunan, China

17
Lan W, Liu M, Chen J, Ye J, Zheng R, Zhu X, Peng W. JLONMFSC: Clustering scRNA-seq data based on joint learning of non-negative matrix factorization and subspace clustering. Methods 2024; 222:1-9. [PMID: 38128706 DOI: 10.1016/j.ymeth.2023.11.019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2023] [Revised: 11/07/2023] [Accepted: 11/29/2023] [Indexed: 12/23/2023] Open
Abstract
The development of single cell RNA sequencing (scRNA-seq) has provided new perspectives to study biological problems at the single cell level. One of the key issues in scRNA-seq data analysis is to divide cells into several clusters for discovering the heterogeneity and diversity of cells. However, the existing scRNA-seq data are high-dimensional, sparse, and noisy, which challenges the existing single-cell clustering methods. In this study, we propose a joint learning framework (JLONMFSC) for clustering scRNA-seq data. In our method, the dimension of the original data is reduced to minimize the effect of noise. In addition, the graph regularized matrix factorization is used to learn the local features. Further, the Low-Rank Representation (LRR) subspace clustering is utilized to learn the global features. Finally, the joint learning of local features and global features is performed to obtain the results of clustering. We compare the proposed algorithm with eight state-of-the-art algorithms for clustering performance on six datasets, and the experimental results demonstrate that JLONMFSC achieves better performance on all datasets. The code is available at https://github.com/lanbiolab/JLONMFSC.
Affiliation(s)
- Wei Lan
- School of Computer, Electronic and Information, Guangxi University, Nanning, China; Guangxi Key Laboratory of Multimedia Communications and Network Technology, Guangxi University, Nanning, China.
- Mingyang Liu
- School of Computer, Electronic and Information, Guangxi University, Nanning, China
- Jianwei Chen
- School of Computer, Electronic and Information, Guangxi University, Nanning, China
- Jin Ye
- School of Computer, Electronic and Information, Guangxi University, Nanning, China
- Ruiqing Zheng
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, China
- Xiaoshu Zhu
- School of Computer Science and Information Security, Guilin University of Science and Technology, Guilin, China
- Wei Peng
- School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, China

18
Momanyi BM, Zhou YW, Grace-Mercure BK, Temesgen SA, Basharat A, Ning L, Tang L, Gao H, Lin H, Tang H. SAGESDA: Multi-GraphSAGE networks for predicting SnoRNA-disease associations. Curr Res Struct Biol 2023; 7:100122. [PMID: 38188542 PMCID: PMC10771890 DOI: 10.1016/j.crstbi.2023.100122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/04/2023] [Revised: 11/30/2023] [Accepted: 12/24/2023] [Indexed: 01/09/2024] Open
Abstract
Over the years, extensive research has highlighted the functional roles of small nucleolar RNAs in various biological processes associated with the development of complex human diseases. Therefore, understanding the existing relationships between different snoRNAs and diseases is crucial for advancing disease diagnosis and treatment. However, classical biological experiments for identifying snoRNA-disease associations are expensive and time-consuming. Therefore, there is an urgent need for cost-effective computational techniques that can enhance the efficiency and accuracy of prediction. While several computational models have already been proposed, many suffer from limitations and suboptimal performance. In this study, we introduced a novel Graph Neural Network-based (GNN) classification model, called SAGESDA, which is implemented through the GraphSAGE architecture with attention for the prediction of snoRNA-disease associations. The classifier leverages local neighbouring nodes in a heterogeneous network to generate new node embeddings through message passing. The mini-batch gradient descent technique was applied to divide the graph into smaller sub-graphs, which enhances the model's accuracy, speed and scalability. With these advancements, SAGESDA attained an area under the receiver operating characteristic (ROC) curve (AUC) of 0.92 using the standard dot product classifier, surpassing previous related studies. This notable performance demonstrates that SAGESDA is a promising model for predicting unknown snoRNA-disease associations with high accuracy. The SAGESDA implementation details can be obtained from https://github.com/momanyibiffon/SAGESDA.git.
Affiliation(s)
- Biffon Manyura Momanyi
- School of Computer Science and Engineering, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Yu-Wei Zhou
- School of Health Care Technology, Chengdu Neusoft University, Chengdu, China
- Bakanina Kissanga Grace-Mercure
- School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, 610054, China
- Sebu Aboma Temesgen
- School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, 610054, China
- Ahmad Basharat
- School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, 610054, China
- Lin Ning
- School of Health Care Technology, Chengdu Neusoft University, Chengdu, China
- School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, 610054, China
- Lixia Tang
- School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, 610054, China
- Hui Gao
- School of Computer Science and Engineering, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Hao Lin
- School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, 610054, China
- Hua Tang
- School of Basic Medical Sciences, Southwest Medical University, Luzhou, 646000, China
- Basic Medicine Research Innovation Center for Cardiometabolic Diseases, Ministry of Education, Luzhou, 646000, China
- Central Nervous System Drug Key Laboratory of Sichuan Province, Luzhou, 646000, China

19
Lu Z, Zhong H, Tang L, Luo J, Zhou W, Liu L. Predicting lncRNA-disease associations based on heterogeneous graph convolutional generative adversarial network. PLoS Comput Biol 2023; 19:e1011634. [PMID: 38019786 PMCID: PMC10686445 DOI: 10.1371/journal.pcbi.1011634] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 04/18/2023] [Accepted: 10/25/2023] [Indexed: 12/01/2023] Open
Abstract
There is a growing body of evidence indicating the crucial roles that long non-coding RNAs (lncRNAs) play in the development and progression of various diseases, including cancers, cardiovascular diseases, and neurological disorders. However, accurately predicting potential lncRNA-disease associations remains a challenge, as existing methods have limitations in extracting heterogeneous association information and handling sparse and unbalanced data. To address these issues, we propose a novel computational method, called HGC-GAN, which combines heterogeneous graph convolutional neural networks (GCN) and generative adversarial networks (GAN) to predict potential lncRNA-disease associations. Specifically, we construct a lncRNA-miRNA-disease heterogeneous network by integrating multiple association data and sequence information. The GCN-based generator is then employed to aggregate neighbor information of nodes and obtain node embeddings, which are used to predict lncRNA-disease associations. Meanwhile, the GAN-based discriminator is trained to distinguish between real and fake lncRNA-disease associations generated by the generator, enabling the generator to improve its ability to generate accurate lncRNA-disease associations gradually. Our experimental results demonstrate that HGC-GAN performs better in predicting potential lncRNA-disease associations, with AUC and AUPR values of 0.9591 and 0.9606, respectively, under 10-fold cross-validation. Moreover, our case study further confirms the effectiveness of HGC-GAN in predicting potential lncRNA-disease associations, even for novel lncRNAs without any known lncRNA-disease associations. Overall, our proposed method HGC-GAN provides a promising approach to predict potential lncRNA-disease associations and may have important implications for disease diagnosis, treatment, and drug development.
Affiliation(s)
- Zhonghao Lu
- School of Information, Yunnan Normal University, Yunnan, People’s Republic of China
- Hua Zhong
- School of Information, Yunnan Normal University, Yunnan, People’s Republic of China
- Lin Tang
- Key Laboratory of Educational Information for Nationalities Ministry of Education, Yunnan Normal University, Yunnan, People’s Republic of China
- Jing Luo
- State Key Laboratory for Conservation and Utilization of Bio-resource in Yunnan, School of Life Sciences and School of Ecology and Environment, Yunnan University, Kunming, People’s Republic of China
- Wei Zhou
- School of Software, Yunnan University, Kunming, People’s Republic of China
- Lin Liu
- School of Information, Yunnan Normal University, Yunnan, People’s Republic of China

20
Wang T, Zhu X, Wang K, Ding R. Circ_0006324 regulates cell proliferation, cell-cycle progression, apoptosis, and glycolysis of non-small cell lung cancer cells through miR-496/TRIM59 axis. J Biochem Mol Toxicol 2023; 37:e23473. [PMID: 37545326 DOI: 10.1002/jbt.23473] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2022] [Revised: 06/25/2023] [Accepted: 07/08/2023] [Indexed: 08/08/2023]
Abstract
Increasing evidence suggests that circular RNA (circRNA) plays an important role in non-small cell lung cancer (NSCLC) progression. This study aimed to investigate the role and potential molecular mechanism of circ_0006324 in NSCLC. The expression levels of circ_0006324, miR-496, miR-488-5p, and tripartite motif-containing 59 (TRIM59) mRNA were determined by quantitative real-time polymerase chain reaction (PCR). 3-(4,5-dimethyl-2-thiazolyl)-2,5-diphenyl-2-H-tetrazolium bromide assay, EdU assay, and flow cytometry were carried out to evaluate cell proliferation and apoptosis. The extracellular acidification rate and lactic acid production were examined to assess cell glycolysis. Western blot assay was used to detect protein levels. The target relationship of circ_0006324/miR-496/TRIM59 axis was validated by RNA pull-down assay, dual luciferase reporter assay, and radio immunoprecipitation assay. Xenograft tumor assay was performed to reveal the function of circ_0006324 in vivo. Circ_0006324 was upregulated in NSCLC and related to tumor node metastasis stage and distant metastasis. Knockdown of circ_0006324 impeded NSCLC cell proliferation and glycolysis and promoted cell apoptosis. MiR-496 was verified as a target of circ_0006324, and circ_0006324 mediated the altering of cell proliferation, apoptosis, and glycolysis of NSCLC cells through targeting miR-496. TRIM59 was verified as a target of miR-496, and circ_0006324 positively regulated TRIM59 expression by targeting miR-496. Overexpression of TRIM59 could reverse the effects of circ_0006324 silencing on the proliferation, apoptosis, and glycolysis of NSCLC cells. Circ_0006324 knockdown impeded NSCLC tumor growth in vivo. Circ_0006324 functioned as a tumor promoter in NSCLC to promote cell proliferation, cell cycle progression, and glycolysis and inhibit cell apoptosis via miR-496/TRIM59 axis.
Affiliation(s)
- Tao Wang
- Department of Thoracic surgery, Affiliated hospital of Guizhou medical university, Guiyang, Guizhou, China
- Xu Zhu
- Department of Thoracic surgery, Affiliated hospital of Guizhou medical university, Guiyang, Guizhou, China
- Kai Wang
- Department of Thoracic surgery, Affiliated hospital of Guizhou medical university, Guiyang, Guizhou, China
- Ronghai Ding
- Department of Basic Medicine, Guizhou Medical university, Guiyang, Guizhou, China

21
Kim Y, Lee M. Deep Learning Approaches for lncRNA-Mediated Mechanisms: A Comprehensive Review of Recent Developments. Int J Mol Sci 2023; 24:10299. [PMID: 37373445 DOI: 10.3390/ijms241210299] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2023] [Revised: 06/16/2023] [Accepted: 06/17/2023] [Indexed: 06/29/2023] Open
Abstract
This review paper provides an extensive analysis of the rapidly evolving convergence of deep learning and long non-coding RNAs (lncRNAs). Considering the recent advancements in deep learning and the increasing recognition of lncRNAs as crucial components in various biological processes, this review aims to offer a comprehensive examination of these intertwined research areas. The remarkable progress in deep learning necessitates thoroughly exploring its latest applications in the study of lncRNAs. Therefore, this review provides insights into the growing significance of incorporating deep learning methodologies to unravel the intricate roles of lncRNAs. By scrutinizing the most recent research spanning from 2021 to 2023, this paper provides a comprehensive understanding of how deep learning techniques are employed in investigating lncRNAs, thereby contributing valuable insights to this rapidly evolving field. The review is aimed at researchers and practitioners looking to integrate deep learning advancements into their lncRNA studies.
Affiliation(s)
- Yoojoong Kim
- School of Computer Science and Information Engineering, The Catholic University of Korea, Bucheon 14662, Republic of Korea
- Minhyeok Lee
- School of Electrical and Electronics Engineering, Chung-Ang University, Seoul 06974, Republic of Korea

22
Wu P, Nie Z, Huang Z, Zhang X. CircPCBL: Identification of Plant CircRNAs with a CNN-BiGRU-GLT Model. Plants (Basel) 2023; 12:1652. [PMID: 37111874 PMCID: PMC10143888 DOI: 10.3390/plants12081652] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2023] [Revised: 04/10/2023] [Accepted: 04/13/2023] [Indexed: 06/19/2023]
Abstract
Circular RNAs (circRNAs), which are produced post-splicing of pre-mRNAs, are strongly linked to the emergence of several tumor types. The initial stage in conducting follow-up studies involves identifying circRNAs. Currently, animals are the primary target of most established circRNA recognition technologies. However, the sequence features of plant circRNAs differ from those of animal circRNAs, making it impossible to detect plant circRNAs. For example, there are non-GT/AG splicing signals at circRNA junction sites and few reverse complementary sequences and repetitive elements in the flanking intron sequences of plant circRNAs. In addition, there have been few studies on circRNAs in plants, and thus it is urgent to create a plant-specific method for identifying circRNAs. In this study, we propose CircPCBL, a deep-learning approach that only uses raw sequences to distinguish between circRNAs found in plants and other lncRNAs. CircPCBL comprises two separate detectors: a CNN-BiGRU detector and a GLT detector. The CNN-BiGRU detector takes in the one-hot encoding of the RNA sequence as the input, while the GLT detector uses k-mer (k = 1 - 4) features. The output matrices of the two submodels are then concatenated and ultimately pass through a fully connected layer to produce the final output. To verify the generalization performance of the model, we evaluated CircPCBL using several datasets, and the results revealed that it had an F1 of 85.40% on the validation dataset composed of six different plants species and 85.88%, 75.87%, and 86.83% on the three cross-species independent test sets composed of Cucumis sativus, Populus trichocarpa, and Gossypium raimondii, respectively. With an accuracy of 90.9% and 90%, respectively, CircPCBL successfully predicted ten of the eleven circRNAs of experimentally reported Poncirus trifoliata and nine of the ten lncRNAs of rice on the real set. CircPCBL could potentially contribute to the identification of circRNAs in plants. 
In addition, it is remarkable that CircPCBL also achieved an average accuracy of 94.08% on the human datasets, which is also an excellent result, implying its potential application in animal datasets. Ultimately, CircPCBL is available as a web server, from which the data and source code can also be downloaded free of charge.
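The two input encodings described in this abstract (one-hot encoding of the raw sequence, and k-mer features for k = 1 to 4) can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the CircPCBL code: the function names, the A/C/G/U alphabet order, the conversion of T to U, and the choice to normalise counts to frequencies are all assumptions.

```python
from itertools import product

ALPHABET = "ACGU"  # assumed base order for both encodings


def kmer_frequencies(seq, k_values=(1, 2, 3, 4)):
    """Concatenated k-mer frequency vector (4 + 16 + 64 + 256 = 340 values)."""
    seq = seq.upper().replace("T", "U")
    features = []
    for k in k_values:
        kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
        counts = dict.fromkeys(kmers, 0)
        for i in range(max(len(seq) - k + 1, 0)):
            kmer = seq[i:i + k]
            if kmer in counts:  # windows with ambiguous bases such as N are skipped
                counts[kmer] += 1
        total = sum(counts.values())
        # Frequencies rather than raw counts (an assumption of this sketch).
        features.extend(counts[m] / total if total else 0.0 for m in kmers)
    return features


def one_hot(seq):
    """One row per base, one column per letter of ALPHABET; unknown bases map to all zeros."""
    seq = seq.upper().replace("T", "U")
    return [[1 if base == letter else 0 for letter in ALPHABET] for base in seq]
```

For example, `kmer_frequencies("ACGUACGU")` returns a 340-element vector whose first four entries are the single-base frequencies, and `one_hot("AC")` returns two rows with a single 1 in the A and C columns respectively.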
Affiliation(s)
- Pengpeng Wu
- Anhui Province Key Laboratory of Smart Agricultural Technology and Equipment, Anhui Agricultural University, Hefei 230036, China
- School of Life Science, Anhui Agricultural University, Hefei 230036, China
- Zhenjun Nie
- Anhui Province Key Laboratory of Smart Agricultural Technology and Equipment, Anhui Agricultural University, Hefei 230036, China
- School of Information and Computer Science, Anhui Agricultural University, Hefei 230036, China
- Zhiqiang Huang
- Anhui Province Key Laboratory of Smart Agricultural Technology and Equipment, Anhui Agricultural University, Hefei 230036, China
- School of Information and Computer Science, Anhui Agricultural University, Hefei 230036, China
- Xiaodan Zhang
- Anhui Province Key Laboratory of Smart Agricultural Technology and Equipment, Anhui Agricultural University, Hefei 230036, China
- School of Information and Computer Science, Anhui Agricultural University, Hefei 230036, China

23
Li S, Chang M, Tong L, Wang Y, Wang M, Wang F. Screening potential lncRNA biomarkers for breast cancer and colorectal cancer combining random walk and logistic matrix factorization. Front Genet 2023; 13:1023615. [PMID: 36744179 PMCID: PMC9895102 DOI: 10.3389/fgene.2022.1023615] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/20/2022] [Accepted: 10/10/2022] [Indexed: 01/21/2023] Open
Abstract
Breast cancer and colorectal cancer are two of the most common malignant tumors worldwide and are among the leading causes of cancer mortality. Many studies have demonstrated that long noncoding RNAs (lncRNAs) are closely linked with the occurrence and development of the two cancers. Therefore, it is essential to design an effective way to identify potential lncRNA biomarkers for them. In this study, we developed a computational method (LDA-RWLMF) by integrating random walk with restart and Logistic Matrix Factorization to investigate the roles of lncRNA biomarkers in the prognosis and diagnosis of the two cancers. We first fuse disease semantic and Gaussian association profile similarities and lncRNA functional and Gaussian association profile similarities. Second, we design a negative selection algorithm to extract negative LncRNA-Disease Associations (LDA) based on random walk. Third, we develop a logistic matrix factorization model to predict possible LDAs. We compare our proposed LDA-RWLMF method with four classical LDA prediction methods, that is, LNCSIM1, LNCSIM2, ILNCSIM, and IDSSIM. The results from 5-fold cross validation on the MNDR dataset show that LDA-RWLMF computes the best AUC value of 0.9312, outperforming the above four LDA prediction methods. Finally, after determining the performance of LDA-RWLMF, we rank all lncRNA biomarkers for the two cancers. We find that 48 and 50 lncRNAs have the highest association scores with breast cancer and colorectal cancer, respectively, among all lncRNAs known to associate with them on the MNDR dataset. We predict that lncRNAs HULC and HAR1A could be potential biomarkers for breast cancer and colorectal cancer, respectively, and need biomedical experimental validation.
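Random walk with restart, which this abstract names as one building block, iterates p(t+1) = (1 - r) W p(t) + r p(0) on a column-normalised adjacency matrix W until the scores stop changing. The following generic Python sketch shows only that iteration; it is not the LDA-RWLMF code, the abstract does not say how the resulting scores drive negative selection, and the function name, the restart probability of 0.3, and the toy graph are assumptions.

```python
def random_walk_with_restart(adj, seed, restart=0.3, tol=1e-10, max_iter=1000):
    """Steady-state visiting probabilities of a walker that restarts at node `seed`.

    adj: square list-of-lists adjacency (or similarity) matrix with non-negative weights.
    restart: probability of jumping back to the seed at each step (assumed value).
    """
    n = len(adj)
    # Column-normalise so each non-empty column sums to 1.
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    W = [[adj[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
         for i in range(n)]

    p0 = [0.0] * n
    p0[seed] = 1.0
    p = p0[:]
    for _ in range(max_iter):
        nxt = [(1 - restart) * sum(W[i][j] * p[j] for j in range(n)) + restart * p0[i]
               for i in range(n)]
        # Stop once the L1 change between iterations is negligible.
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p
```

On a three-node path graph `[[0, 1, 0], [1, 0, 1], [0, 1, 0]]` with `seed=0`, the scores decrease with distance from the seed, which is the property that makes low-scoring pairs plausible candidates for negative samples.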
|
24
|
Lan W, Dong Y, Zhang H, Li C, Chen Q, Liu J, Wang J, Chen YPP. Benchmarking of computational methods for predicting circRNA-disease associations. Brief Bioinform 2023; 24:6972300. [PMID: 36611256 DOI: 10.1093/bib/bbac613] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/21/2022] [Revised: 10/29/2022] [Accepted: 12/11/2022] [Indexed: 01/09/2023]
Abstract
Accumulating evidence demonstrates that circular RNA (circRNA) plays an important role in human diseases. Identifying circRNA-disease associations can aid the diagnosis of human diseases, but the traditional approach based on biological experiments is time-consuming. To address this limitation, a series of computational methods have been proposed in recent years. However, few works have summarized these methods or compared their performance. In this paper, we divide the existing methods into three categories: information propagation, traditional machine learning and deep learning. Then, the baseline methods in each category are introduced in detail. Further, 5 different datasets are collected, and 14 representative methods of each category are selected and compared in 5-fold cross-validation, 10-fold cross-validation and the de novo experiment. To further evaluate the effectiveness of these methods, six common cancers are selected to compare the number of correctly identified circRNA-disease associations in the top-10, top-20, top-50, top-100 and top-200. In addition, based on the results, observations about the robustness and characteristics of these methods are summarized. Finally, future directions and challenges are discussed.
Affiliation(s)
- Wei Lan
- School of Computer, Electronic and Information and Guangxi Key Laboratory of Multimedia Communications and Network Technology, Guangxi University, Nanning, Guangxi 530004, China
- Yi Dong
- School of Computer, Electronic and Information and Guangxi Key Laboratory of Multimedia Communications and Network Technology, Guangxi University, Nanning, Guangxi 530004, China
- Hongyu Zhang
- School of Computer, Electronic and Information and Guangxi Key Laboratory of Multimedia Communications and Network Technology, Guangxi University, Nanning, Guangxi 530004, China
- Chunling Li
- School of Computer, Electronic and Information and Guangxi Key Laboratory of Multimedia Communications and Network Technology, Guangxi University, Nanning, Guangxi 530004, China
- Qingfeng Chen
- School of Computer, Electronic and Information and State Key Laboratory for Conservation and Utilization of Subtropical Agro-bioresources, Guangxi University, Nanning, Guangxi 530004, China
- Jin Liu
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China
- Jianxin Wang
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China
- Yi-Ping Phoebe Chen
- Department of Computer Science and Information Technology, La Trobe University, Melbourne, Victoria 3086, Australia
|
25
|
Lin L, Chen R, Zhu Y, Xie W, Jing H, Chen L, Zou M. SCCPMD: Probability matrix decomposition method subject to corrected similarity constraints for inferring long non-coding RNA-disease associations. Front Microbiol 2023; 13:1093615. [PMID: 36713213 PMCID: PMC9874942 DOI: 10.3389/fmicb.2022.1093615] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 11/09/2022] [Accepted: 11/30/2022] [Indexed: 01/13/2023]
Abstract
Accumulating evidence has demonstrated various associations of long non-coding RNAs (lncRNAs) with human diseases, such as abnormal expression due to microbial influences that cause disease. Gaining a deeper understanding of lncRNA-disease associations is essential for disease diagnosis, treatment, and prevention. In recent years, many matrix decomposition methods have been used to predict potential lncRNA-disease associations. However, these methods do not use microbe-disease association information to enrich disease similarity, and they make limited use of similarity information in the decomposition process. To address these issues, we propose a correction-based similarity-constrained probability matrix decomposition method (SCCPMD) to predict lncRNA-disease associations. The microbe-disease associations are first used to enrich the disease semantic similarity matrix. The logistic function is then used to correct the lncRNA and disease similarity matrices. Finally, the two corrected similarity matrices are added to the probability matrix decomposition as constraints to predict the potential lncRNA-disease associations. The experimental results show that SCCPMD outperforms the five advanced comparison algorithms. In addition, SCCPMD demonstrated excellent prediction performance in a case study for breast cancer, lung cancer, and renal cell carcinoma, with prediction accuracy reaching 80, 100, and 100%, respectively. Therefore, SCCPMD shows excellent predictive performance in identifying unknown lncRNA-disease associations.
Affiliation(s)
- Lieqing Lin
- Center of Campus Network & Modern Educational Technology, Guangdong University of Technology, Guangzhou, China
- Ruibin Chen
- School of Computer, Guangdong University of Technology, Guangzhou, China
- Yinting Zhu
- School of Computer, Guangdong University of Technology, Guangzhou, China
- Weijie Xie
- School of Computer, Guangdong University of Technology, Guangzhou, China
- Huaiguo Jing
- Sports Department, Guangdong University of Technology, Guangzhou, China
- Langcheng Chen
- Center of Campus Network & Modern Educational Technology, Guangdong University of Technology, Guangzhou, China
- Minqing Zou
- Department of Experiment Teaching, Guangdong University of Technology, Guangzhou, China
|
26
|
DRGCNCDA: Predicting circRNA-disease interactions based on knowledge graph and disentangled relational graph convolutional network. Methods 2022; 208:35-41. [DOI: 10.1016/j.ymeth.2022.10.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/12/2022] [Revised: 09/15/2022] [Accepted: 10/10/2022] [Indexed: 11/06/2022]
|
27
|
Tan J, Li X, Zhang L, Du Z. Recent advances in machine learning methods for predicting LncRNA and disease associations. Front Cell Infect Microbiol 2022; 12:1071972. [PMID: 36530425 PMCID: PMC9748103 DOI: 10.3389/fcimb.2022.1071972] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 10/17/2022] [Accepted: 11/11/2022] [Indexed: 12/03/2022]
Abstract
Long non-coding RNAs (lncRNAs) are involved in almost the entire cell life cycle through different mechanisms and play an important role in many key biological processes. Mutations and dysregulation of lncRNAs have been implicated in many complex human diseases. Therefore, identifying the relationship between lncRNAs and diseases not only contributes to biologists' understanding of disease mechanisms, but also provides new ideas and solutions for disease diagnosis, treatment, prognosis and prevention. Since the existing experimental methods for predicting lncRNA-disease associations (LDAs) are expensive and time consuming, machine learning methods for predicting lncRNA-disease associations have become increasingly popular among researchers. In this review, we summarize some of the human diseases studied by LDAs prediction models, association and similarity features of LDAs prediction, performance evaluation methods of models and some advanced machine learning prediction models of LDAs. Finally, we discuss the potential limitations of machine learning-based methods for LDAs prediction and provide some ideas for designing new prediction models.
|
28
|
Lan W, Dong Y, Chen Q, Liu J, Wang J, Chen YPP, Pan S. IGNSCDA: Predicting CircRNA-Disease Associations Based on Improved Graph Convolutional Network and Negative Sampling. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:3530-3538. [PMID: 34506289 DOI: 10.1109/tcbb.2021.3111607] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 06/13/2023]
Abstract
Accumulating evidence has shown that circRNA plays an important role in human diseases. It can be used as a potential biomarker for the diagnosis and treatment of disease. Although some computational methods have been proposed to predict circRNA-disease associations, their performance still needs to be improved. In this paper, we propose a new computational model based on Improved Graph convolutional network and Negative Sampling to predict CircRNA-Disease Associations. Our method first constructs a heterogeneous network based on known circRNA-disease associations. Then, an improved graph convolutional network is designed to obtain the feature vectors of circRNAs and diseases. Further, a multi-layer perceptron is employed to predict circRNA-disease associations based on these feature vectors. In addition, a negative sampling method is employed to reduce the effect of noisy samples; it selects negative samples based on circRNA expression profile similarity and Gaussian Interaction Profile kernel similarity. 5-fold cross validation is used to evaluate the performance of the method. The results show that IGNSCDA outperforms other state-of-the-art methods in prediction performance. Moreover, the case study shows that IGNSCDA is an effective tool for predicting potential circRNA-disease associations.
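The Gaussian Interaction Profile (GIP) kernel similarity mentioned in the abstract is commonly defined as K(i, j) = exp(-gamma * ||IP_i - IP_j||^2), where IP_i is row i of the binary association matrix and gamma is a bandwidth divided by the mean squared norm of the rows. The sketch below follows that common definition and is only an illustration; the function name, the bandwidth default of 1.0, and the toy matrix are assumptions, and the paper may use different settings.

```python
import math


def gip_kernel(assoc, gamma_prime=1.0):
    """Gaussian interaction profile kernel between the rows of a 0/1 matrix.

    assoc       : list of rows, e.g. circRNAs x diseases (hypothetical toy input)
    gamma_prime : bandwidth before normalization (assumed default)
    """
    n = len(assoc)
    # Normalize the bandwidth by the mean squared norm of the profiles.
    sq_norms = [sum(v * v for v in row) for row in assoc]
    mean_sq_norm = sum(sq_norms) / n
    gamma = gamma_prime / mean_sq_norm if mean_sq_norm else 0.0
    return [[math.exp(-gamma * sum((a - b) ** 2
                                   for a, b in zip(assoc[i], assoc[j])))
             for j in range(n)]
            for i in range(n)]


# Three circRNAs described by their associations with three diseases.
profiles = [[1, 0, 1], [1, 0, 1], [0, 1, 0]]
K = gip_kernel(profiles)
```

Rows with identical association profiles get similarity 1, and the similarity decays as the profiles diverge.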
|
29
|
Yao D, Zhang T, Zhan X, Zhang S, Zhan X, Zhang C. Geometric complement heterogeneous information and random forest for predicting lncRNA-disease associations. Front Genet 2022; 13:995532. [PMID: 36092871 PMCID: PMC9448985 DOI: 10.3389/fgene.2022.995532] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/16/2022] [Accepted: 08/01/2022] [Indexed: 11/20/2022]
Abstract
Growing evidence has shown that the abnormal expression of long non-coding RNA (lncRNA) is relevant to a variety of human diseases. Therefore, accurate identification of disease-related lncRNAs can help to understand lncRNA expression at the molecular level and to explore more effective treatments for diseases. Many lncRNA-disease association prediction models have been proposed, but recognizing unknown lncRNA-disease associations remains a challenge. In this work, we propose a computational model for predicting lncRNA-disease associations based on geometric complement heterogeneous information and random forest. Firstly, geometric complement heterogeneous information was used to integrate experimentally verified lncRNA-miRNA interactions and miRNA-disease associations. Secondly, lncRNA and disease features consisting of their respective similarity coefficients were fused into the input feature space. Thirdly, an autoencoder was adopted to project the raw high-dimensional features into a low-dimensional space to learn representations for lncRNAs and diseases. Finally, the low-dimensional lncRNA and disease features were fused into the input feature space to train a random forest classifier for lncRNA-disease association prediction. Under five-fold cross-validation, the AUC (area under the receiver operating characteristic curve) is 0.9897 and the AUPR (area under the precision-recall curve) is 0.7040, indicating that our model performs better than several state-of-the-art lncRNA-disease association prediction models. In addition, case studies on colon and stomach cancer indicate that our model has a good ability to predict disease-related lncRNAs.
Affiliation(s)
- Dengju Yao
- School of Computer Science and Technology, Harbin University of Science and Technology, Harbin, China
- *Correspondence: Dengju Yao
- Tao Zhang
- School of Computer Science and Technology, Harbin University of Science and Technology, Harbin, China
- Xiaojuan Zhan
- School of Computer Science and Technology, Harbin University of Science and Technology, Harbin, China
- College of Computer Science and Technology, Heilongjiang Institute of Technology, Harbin, China
- Shuli Zhang
- School of Computer Science and Technology, Harbin University of Science and Technology, Harbin, China
- Xiaorong Zhan
- Department of Endocrinology and Metabolism, Hospital of South University of Science and Technology, Shenzhen, China
- Chao Zhang
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, China
|
30
|
Wang B, Liu R, Zheng X, Du X, Wang Z. lncRNA-disease association prediction based on matrix decomposition of elastic network and collaborative filtering. Sci Rep 2022; 12:12700. [PMID: 35882886 PMCID: PMC9325687 DOI: 10.1038/s41598-022-16594-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2022] [Accepted: 07/12/2022] [Indexed: 11/30/2022]
Abstract
In recent years, with the continuous development and innovation of high-throughput biotechnology, more and more evidence shows that lncRNA plays an essential role in biological life activities and is related to the occurrence of various diseases. However, because traditional biological experiments are costly and time-consuming, the number of experimentally verified lncRNA-disease associations is small. Computer-aided study is an important way to investigate lncRNA-disease associations. Using existing data to build a prediction model and predict unknown lncRNA-disease associations can make biological experiments more targeted and improve their accuracy. Therefore, we need an accurate and efficient method to predict the relationship between lncRNAs and diseases and to help biologists with the diagnosis and treatment of diseases. Most current lncRNA-disease association prediction methods do not consider the model instability caused by real data, nor the possibility that predictive models overfit the data. This paper proposes a lncRNA-disease association prediction model (ENCFLDA) that combines an elastic network with matrix decomposition and collaborative filtering. This method uses the existing lncRNA-miRNA association data and miRNA-disease association data to predict the association between unknown lncRNAs and diseases. First, since the known lncRNA-disease association matrix is very sparse, cosine similarity and KNN are used to update the lncRNA-disease association matrix. The matrix is then updated by matrix decomposition combined with an elastic net algorithm, to increase the stability of the overall prediction model and eliminate data overfitting. The final prediction matrix is then obtained through collaborative filtering based on lncRNA. Simulation experiments show that the AUC value of ENCFLDA can reach 0.9148 under the LOOCV framework, which is higher than the prediction result of the latest model.
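The first step described above, using cosine similarity and KNN to densify a sparse association matrix, can be illustrated with a small sketch. This is one plausible reading of that step and not the ENCFLDA procedure itself: the function names, the choice of k, and the rule of filling only zero entries with a similarity-weighted neighbor average are all assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def knn_fill(assoc, k=2):
    """Fill zero entries of each row from its k most cosine-similar rows.

    assoc : lncRNA x disease 0/1 matrix as a list of rows (hypothetical toy input)
    Known (non-zero) entries are left unchanged; the input is not modified.
    """
    n = len(assoc)
    filled = [row[:] for row in assoc]
    for i in range(n):
        # Rank the other rows by similarity to row i and keep the top k.
        neighbors = sorted(((cosine(assoc[i], assoc[j]), j)
                            for j in range(n) if j != i), reverse=True)[:k]
        total = sum(sim for sim, _ in neighbors)
        if total == 0:
            continue  # no informative neighbor for this row
        for col in range(len(assoc[i])):
            if assoc[i][col] == 0:
                filled[i][col] = sum(sim * assoc[j][col]
                                     for sim, j in neighbors) / total
    return filled


known = [[1, 1, 0], [1, 1, 1], [0, 0, 1]]
dense = knn_fill(known, k=1)
```

In this toy case the first lncRNA inherits a score for the third disease from its nearest neighbor, while its known associations stay as they were.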
Affiliation(s)
- Bo Wang
- College of Computer and Control, Qiqihar University, Qiqihar, 161006, China
- RunJie Liu
- College of Computer and Control, Qiqihar University, Qiqihar, 161006, China
- XiaoDong Zheng
- College of Computer and Control, Qiqihar University, Qiqihar, 161006, China
- XiaoXin Du
- College of Computer and Control, Qiqihar University, Qiqihar, 161006, China
- ZhengFei Wang
- College of Computer and Control, Qiqihar University, Qiqihar, 161006, China
|
31
|
Chen S, Zhang Y, Ding X, Li W. Identification of lncRNA/circRNA-miRNA-mRNA ceRNA Network as Biomarkers for Hepatocellular Carcinoma. Front Genet 2022; 13:838869. [PMID: 35386284 PMCID: PMC8977626 DOI: 10.3389/fgene.2022.838869] [Citation(s) in RCA: 38] [Impact Index Per Article: 12.7] [Received: 12/18/2021] [Accepted: 02/24/2022] [Indexed: 12/24/2022]
Abstract
Background: Hepatocellular carcinoma (HCC) accounts for the majority of liver cancer, with the incidence and mortality rates increasing every year. Despite the improvement of clinical management, substantial challenges remain due to its high recurrence rates and short survival period. This study aimed to identify potential diagnostic and prognostic biomarkers in HCC through bioinformatic analysis. Methods: Datasets from GEO and TCGA databases were used for the bioinformatic analysis. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses were carried out by WebGestalt website and clusterProfiler package of R. The STRING database and Cytoscape software were used to establish the protein-protein interaction (PPI) network. The GEPIA website was used to perform expression analyses of the genes. The miRDB, miRWalk, and TargetScan were employed to predict miRNAs and the expression levels of the predicted miRNAs were explored via OncomiR database. LncRNAs were predicted in the StarBase and LncBase while circRNA prediction was performed by the circBank. ROC curve analysis and Kaplan-Meier (KM) survival analysis were performed to evaluate the diagnostic and prognostic value of the gene expression, respectively. Results: A total of 327 upregulated and 422 downregulated overlapping DEGs were identified between HCC tissues and noncancerous liver tissues. The PPI network was constructed with 89 nodes and 178 edges and eight hub genes were selected to predict upstream miRNAs and ceRNAs. 
A lncRNA/circRNA-miRNA-mRNA network was successfully constructed based on the ceRNA hypothesis, including five lncRNAs (DLGAP1-AS1, GAS5, LINC00665, TYMSOS, and ZFAS1), six circRNAs (hsa_circ_0003209, hsa_circ_0008128, hsa_circ_0020396, hsa_circ_0030051, hsa_circ_0034049, and hsa_circ_0082333), eight miRNAs (hsa-miR-150-5p, hsa-miR-19b-3p, hsa-miR-23b-3p, hsa-miR-26a-5p, hsa-miR-651-5p, hsa-miR-10a-5p, hsa-miR-214-5p and hsa-miR-486-5p), and five mRNAs (CDC6, GINS1, MCM4, MCM6, and MCM7). The ceRNA network can promote HCC progression via cell cycle, DNA replication, and other pathways. Clinical diagnostic and survival analyses demonstrated that the ZFAS1/hsa-miR-150-5p/GINS1 ceRNA regulatory axis had a high diagnostic and prognostic value. Conclusion: These results revealed that cell cycle and DNA replication pathway could be potential pathways to participate in HCC development. The ceRNA network is expected to provide potential biomarkers and therapeutic targets for HCC management, especially the ZFAS1/hsa-miR-150-5p/GINS1 regulatory axis.
Affiliation(s)
- Shanshan Chen
- Cancer Center, Beijing Ditan Hospital, Capital Medical University, Beijing, China
- Yongchao Zhang
- Cancer Center, Beijing Ditan Hospital, Capital Medical University, Beijing, China
- Xiaoyan Ding
- Cancer Center, Beijing Ditan Hospital, Capital Medical University, Beijing, China
- Wei Li
- Cancer Center, Beijing Ditan Hospital, Capital Medical University, Beijing, China
|
32
|
Hao X, Chen Q, Pan H, Qiu J, Zhang Y, Yu Q, Han Z, Du X. Enhancing drug-drug interaction prediction by three-way decision and knowledge graph embedding. Granular Computing 2022; 8:67-76. [PMID: 38624759 PMCID: PMC8913867 DOI: 10.1007/s41066-022-00315-4] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 12/07/2021] [Accepted: 02/15/2022] [Indexed: 11/30/2022]
Abstract
Drug-drug interaction (DDI) prediction is essential in pharmaceutical research and clinical application. Existing computational methods mainly extract data from multiple resources and treat the task as binary classification. However, this cannot unambiguously determine the boundary between positive and negative samples, owing to the incompleteness and uncertainty of the derived data. A granular computing method called three-way decision has proved effective for making decisions under uncertainty, but it relies on supplementary information to make delayed decisions. Recently, biomedical knowledge graphs have been regarded as an important source of abundant supplementary information about drugs. This paper proposes a three-way decision-based method called 3WDDI, which uses knowledge graph embeddings as supplementary features to enhance DDI prediction. The drug pairs are divided into positive, negative and boundary regions by a Convolutional Neural Network (CNN) according to drug chemical structure features. Further, a delayed decision is made for objects in the boundary region by integrating knowledge graph embedding features to improve the accuracy of decision-making. The empirical results show that 3WDDI yields up to 0.8922, 0.9614, 0.9582, 0.8930 for Accuracy, AUPR, AUC and F1-score, respectively, and outperforms several baseline models.
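The three-region scheme described above can be sketched as a small decision function. This is only an illustration of the general three-way decision idea, not the 3WDDI implementation: the thresholds 0.8 and 0.2, the 0.5 cut-off for the second stage, and the function and parameter names are all assumptions.

```python
def three_way_decide(first_score, second_score, alpha=0.8, beta=0.2):
    """Classify a drug pair using two stages of evidence.

    first_score  : probability from a first-stage model (e.g. structure-based)
    second_score : probability from a second-stage model that uses extra features;
                   consulted only when the first stage is inconclusive
    alpha, beta  : acceptance and rejection thresholds (assumed values)
    """
    if first_score >= alpha:
        return "positive"   # confident acceptance
    if first_score <= beta:
        return "negative"   # confident rejection
    # Boundary region: defer to the second-stage score (assumed 0.5 cut-off).
    return "positive" if second_score >= 0.5 else "negative"


decisions = [
    three_way_decide(0.95, 0.10),  # accepted at the first stage
    three_way_decide(0.05, 0.90),  # rejected at the first stage
    three_way_decide(0.50, 0.70),  # boundary case, resolved by the second stage
]
```

Only pairs whose first-stage score falls strictly between the two thresholds ever reach the second stage, which is what makes the decision "delayed".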
Affiliation(s)
- Xinkun Hao
- School of Computer, Electronics and Information, Guangxi University, Nanning, 530004 Guangxi China
- School of Computer Science and Engineering, Yulin Normal University, Yulin, 537000 Guangxi China
- Qingfeng Chen
- School of Computer, Electronics and Information, Guangxi University, Nanning, 530004 Guangxi China
- Department of Computer Science and Information Technology, La Trobe University, Melbourne, VIA 3086 Australia
- School of Computer Science and Engineering, Yulin Normal University, Yulin, 537000 Guangxi China
- Haiming Pan
- School of Computer, Electronics and Information, Guangxi University, Nanning, 530004 Guangxi China
- School of Computer Science and Engineering, Yulin Normal University, Yulin, 537000 Guangxi China
- Jie Qiu
- School of Computer, Electronics and Information, Guangxi University, Nanning, 530004 Guangxi China
- School of Computer Science and Engineering, Yulin Normal University, Yulin, 537000 Guangxi China
- Yuxiao Zhang
- School of Computer, Electronics and Information, Guangxi University, Nanning, 530004 Guangxi China
- School of Computer Science and Engineering, Yulin Normal University, Yulin, 537000 Guangxi China
- Qian Yu
- School of Computer, Electronics and Information, Guangxi University, Nanning, 530004 Guangxi China
- School of Computer Science and Engineering, Yulin Normal University, Yulin, 537000 Guangxi China
- Zongzhao Han
- School of Computer, Electronics and Information, Guangxi University, Nanning, 530004 Guangxi China
- School of Computer Science and Engineering, Yulin Normal University, Yulin, 537000 Guangxi China
- Xiaojing Du
- School of Computer, Electronics and Information, Guangxi University, Nanning, 530004 Guangxi China
- School of Computer Science and Engineering, Yulin Normal University, Yulin, 537000 Guangxi China
|
33
|
Ma Y. DeepMNE: Deep Multi-network Embedding for lncRNA-Disease Association prediction. IEEE J Biomed Health Inform 2022; 26:3539-3549. [PMID: 35180094 DOI: 10.1109/jbhi.2022.3152619] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Indexed: 11/06/2022]
Abstract
Long non-coding RNA (lncRNA) participates in various biological processes, hence its mutations and disorders play an important role in the pathogenesis of multiple human diseases. Identifying disease-related lncRNAs is crucial for the diagnosis, prevention, and treatment of diseases. Although a large number of computational approaches have been developed, effectively integrating multi-omics data and accurately predicting potential lncRNA-disease associations remains a challenge, especially regarding new lncRNAs and new diseases. In this work, we propose a new method with deep multi-network embedding, called DeepMNE, to discover potential lncRNA-disease associations, especially for novel diseases and lncRNAs. DeepMNE extracts multi-omics data to describe diseases and lncRNAs, and proposes a network fusion method based on deep learning to integrate multi-source information. Moreover, DeepMNE complements the sparse association network and uses kernel neighborhood similarity to construct disease similarity and lncRNA similarity networks. Furthermore, a graph embedding method is adopted to predict potential associations. Experimental results demonstrate that, compared to other state-of-the-art methods, DeepMNE has a higher predictive performance on new associations, new lncRNAs and new diseases. DeepMNE also achieves considerable predictive performance on perturbed datasets. Additionally, the results of two different types of case studies indicate that DeepMNE can be used as an effective tool for disease-related lncRNA prediction. The code of DeepMNE is freely available at https://github.com/Mayingjun20179/DeepMNE.
|
34
|
Sheng N, Huang L, Wang Y, Zhao J, Xuan P, Gao L, Cao Y. Multi-channel graph attention autoencoders for disease-related lncRNAs prediction. Brief Bioinform 2022; 23:6519791. [PMID: 35108355 DOI: 10.1093/bib/bbab604] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 10/05/2021] [Revised: 12/08/2021] [Accepted: 12/27/2021] [Indexed: 12/31/2022]
Abstract
MOTIVATION Disease-related long non-coding RNAs (lncRNAs) can be used as biomarkers for disease diagnosis and treatment. The development of effective computational approaches to predict lncRNA-disease associations (LDAs) can provide insights into the pathogenesis of complex human diseases and reduce experimental costs. However, few of the existing methods use microRNA (miRNA) information or consider the complex relationship between inter-graph and intra-graph in a complex-graph to assist prediction. RESULTS In this paper, the relationships between nodes of the same type and nodes of different types in a complex-graph are introduced. We propose a multi-channel graph attention autoencoder model, called MGATE, to predict LDAs. First, an lncRNA-miRNA-disease complex-graph is established based on the similarity and correlation among lncRNAs, miRNAs and diseases to integrate the complex associations among them. Secondly, in order to fully extract the comprehensive information of the nodes, we use graph autoencoder networks to learn multiple representations from the complex-graph, inter-graph and intra-graph. Thirdly, a graph-level attention mechanism integration module is adopted to adaptively merge the three representations, and a combined training strategy is performed to optimize the whole model to ensure complementarity and consistency among the multi-graph embedding representations. Finally, multiple classifiers are explored, and Random Forest is used to predict the association score between lncRNA and disease. Experimental results on the public dataset show that the area under the receiver operating characteristic curve and the area under the precision-recall curve of MGATE are 0.964 and 0.413, respectively. MGATE significantly outperformed seven state-of-the-art methods. Furthermore, the case studies of three cancers further demonstrate the ability of MGATE to identify potential disease-correlated candidate lncRNAs. 
The source code and supplementary data are available at https://github.com/sheng-n/MGATE. CONTACT huanglan@jlu.edu.cn, wy6868@jlu.edu.cn.
Affiliation(s)
- Nan Sheng
- Key laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China
- Lan Huang
- Key laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China
- Yan Wang
- Key laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China
- School of Artificial Intelligence, Jilin University, Changchun 130012, China
- Jing Zhao
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus OH 43210, USA
- Ping Xuan
- School of Computer Science and Technology, Heilongjiang University, Harbin 150080, China
- Ling Gao
- School of Computer Science and Technology, Heilongjiang University, Harbin 150080, China
- Yangkun Cao
- School of Artificial Intelligence, Jilin University, Changchun 130012, China
|
35
|
Li X, Ai H, Li B, Zhang C, Meng F, Ai Y. MIMRDA: A Method Incorporating the miRNA and mRNA Expression Profiles for Predicting miRNA-Disease Associations to Identify Key miRNAs (microRNAs). Front Genet 2022; 13:825318. [PMID: 35154284 PMCID: PMC8829120 DOI: 10.3389/fgene.2022.825318] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/30/2021] [Accepted: 01/10/2022] [Indexed: 01/22/2023]
Abstract
Identifying cancer-related miRNAs (or microRNAs) that precisely target mRNAs is important for diagnosis and treatment of cancer. Creating novel methods to identify candidate miRNAs becomes an imminent Frontier of researches in the field. One major obstacle lies in the integration of the state-of-the-art databases. Here, we introduce a novel method, MIMRDA, which incorporates the miRNA and mRNA expression profiles for predicting miRNA-disease associations to identify key miRNAs. As a proof-of-principle study, we use the MIMRDA method to analyze TCGA datasets of 20 types (BLCA, BRCA, CESE, CHOL, COAD, ESCA, HNSC, KICH, KIRC, KIRP, LIHC, LUAD, LUSC, PAAD, PRAD, READ, SKCM, STAD, THCA and UCEC) of cancer, which identified hundreds of top-ranked miRNAs. Some (as Category 1) of them are endorsed by public databases including TCGA, miRTarBase, miR2Disease, HMDD, MISIM, ncDR and mTD; others (as Category 2) are supported by literature evidences. miR-21 (representing Category 1) and miR-1258 (representing Category 2) display the excellent characteristics of biomarkers in multi-dimensional assessments focusing on the function similarity analysis, overall survival analysis, and anti-cancer drugs’ sensitivity or resistance analysis. We compare the performance of the MIMRDA method over the Limma and SPIA packages, and estimate the accuracy of the MIMRDA method in classifying top-ranked miRNAs via the Random Forest simulation test. Our results indicate the superiority and effectiveness of the MIMRDA method, and recommend some top-ranked key miRNAs be potential biomarkers that warrant experimental validations.
Affiliation(s)
- Xianbin Li
- State Key Laboratory for Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
| | - Hannan Ai
- State Key Laboratory for Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
- Department of Electrical and Computer Engineering, The Grainger College of Engineering, University of Illinois at Urbana-Champaign, Urbana, IL, United States
- National Center for Quality Supervision and Inspection of Automatic Equipment, National Center for Testing and Evaluation of Robots (Guangzhou), CRAT, SINOMACH-IT, Guangzhou, China
- *Correspondence: Yuncan Ai; Hannan Ai
| | - Bizhou Li
- State Key Laboratory for Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
| | - Chaohui Zhang
- State Key Laboratory for Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
| | - Fanmei Meng
- State Key Laboratory for Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
| | - Yuncan Ai
- State Key Laboratory for Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
- *Correspondence: Yuncan Ai; Hannan Ai
|
36
|
Peng L, Yuan R, Shen L, Gao P, Zhou L. LPI-EnEDT: an ensemble framework with extra tree and decision tree classifiers for imbalanced lncRNA-protein interaction data classification. BioData Min 2021; 14:50. [PMID: 34861891 PMCID: PMC8642957 DOI: 10.1186/s13040-021-00277-4] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 04/29/2021] [Accepted: 08/22/2021] [Indexed: 12/14/2022] Open
Abstract
BACKGROUND Long noncoding RNAs (lncRNAs) are closely linked to various biological processes. Identifying interacting lncRNA-protein pairs helps in understanding the functions and mechanisms of lncRNAs. Wet-lab experiments are costly and time-consuming. Most computational methods fail to account for the imbalanced nature of lncRNA-protein interaction (LPI) data. More importantly, they were evaluated on a single dataset, which introduces prediction bias. RESULTS In this study, we develop an Ensemble framework (LPI-EnEDT) with Extra tree and Decision Tree classifiers to implement imbalanced LPI data classification. First, five LPI datasets are assembled. Second, lncRNAs and proteins are separately characterized based on Pyfeat and BioTriangle, and the features are concatenated into a vector that represents each lncRNA-protein pair. Finally, an ensemble framework with Extra tree and decision tree classifiers is developed to classify unlabeled lncRNA-protein pairs. The comparative experiments demonstrate that LPI-EnEDT outperforms four classical LPI prediction methods (LPI-BLS, LPI-CatBoost, LPI-SKF, and PLIPCOM) under cross validations on lncRNAs, proteins, and LPIs. The average AUC values on the five datasets are 0.8480, 0.7078, and 0.9066 under the three cross validations, respectively. The average AUPRs are 0.8175, 0.7265, and 0.8882, respectively. Case analyses suggest that there are underlying associations between HOTTIP and Q9Y6M1, and between NRON and Q15717. CONCLUSIONS By fusing diverse biological features of lncRNAs and proteins and exploiting an ensemble learning model with Extra tree and decision tree classifiers, this work focuses on imbalanced LPI data classification as well as interaction information inference for a new lncRNA (or protein).
Affiliation(s)
- Lihong Peng
- School of Computer Science, Hunan University of Technology, No.88, Taishan West Road, Tianyuan District, Zhuzhou, China; College of Life Sciences and Chemistry, Hunan University of Technology, No.88, Taishan West Road, Tianyuan District, Zhuzhou, China
| | - Ruya Yuan
- School of Computer Science, Hunan University of Technology, No.88, Taishan West Road, Tianyuan District, Zhuzhou, China
| | - Ling Shen
- School of Computer Science, Hunan University of Technology, No.88, Taishan West Road, Tianyuan District, Zhuzhou, China
| | - Pengfei Gao
- College of Life Sciences and Chemistry, Hunan University of Technology, No.88, Taishan West Road, Tianyuan District, Zhuzhou, China
| | - Liqian Zhou
- School of Computer Science, Hunan University of Technology, No.88, Taishan West Road, Tianyuan District, Zhuzhou, China.
|
37
|
Luo J, Ding H, Shen J, Zhai H, Wu Z, Yan C, Luo H. BreakNet: detecting deletions using long reads and a deep learning approach. BMC Bioinformatics 2021; 22:577. [PMID: 34856923 PMCID: PMC8641175 DOI: 10.1186/s12859-021-04499-5] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 09/14/2021] [Accepted: 11/29/2021] [Indexed: 12/29/2022] Open
Abstract
Background Structural variations (SVs) occupy a prominent position in human genetic diversity, and deletions form an important type of SV that has been suggested to be associated with genetic diseases. Although various deletion calling methods based on long reads have been proposed, a new approach is still needed to mine features in long-read alignment information. Recently, deep learning has attracted much attention in genome analysis, and it is a promising technique for calling SVs. Results In this paper, we propose BreakNet, a deep learning method that detects deletions by using long reads. BreakNet first extracts feature matrices from long-read alignments. Second, it uses a time-distributed convolutional neural network (CNN) to integrate and map the feature matrices to feature vectors. Third, BreakNet employs a bidirectional long short-term memory (BLSTM) model to analyse the produced set of continuous feature vectors in both the forward and backward directions. Finally, a classification module determines whether a region refers to a deletion. On real long-read sequencing datasets, we demonstrate that BreakNet outperforms Sniffles, SVIM and cuteSV in terms of their F1 scores. The source code for the proposed method is available from GitHub at https://github.com/luojunwei/BreakNet. Conclusions Our work shows that deep learning can be combined with long reads to call deletions more effectively than existing methods. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-021-04499-5.
Affiliation(s)
- Junwei Luo
- College of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, 454003, China
| | - Hongyu Ding
- College of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, 454003, China
| | - Jiquan Shen
- College of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, 454003, China.
| | - Haixia Zhai
- College of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, 454003, China
| | - Zhengjiang Wu
- College of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, 454003, China
| | - Chaokun Yan
- School of Computer Science and Information Engineering, Henan University, Kaifeng, 475001, China
| | - Huimin Luo
- School of Computer Science and Information Engineering, Henan University, Kaifeng, 475001, China
|
38
|
Lan W, Dong Y, Chen Q, Zheng R, Liu J, Pan Y, Chen YPP. KGANCDA: predicting circRNA-disease associations based on knowledge graph attention network. Brief Bioinform 2021; 23:6447436. [PMID: 34864877 DOI: 10.1093/bib/bbab494] [Citation(s) in RCA: 35] [Impact Index Per Article: 8.8] [Received: 08/29/2021] [Revised: 10/12/2021] [Accepted: 10/26/2021] [Indexed: 12/31/2022] Open
Abstract
Increasing evidence shows that circRNA plays a significant role in the development of many diseases. In addition, many studies have shown that circRNA can be considered a potential biomarker for clinical diagnosis and treatment of disease. Some computational methods have been proposed to predict circRNA-disease associations. However, the performance of these methods is limited by the sparsity of low-order interaction information. In this paper, we propose a new computational method (KGANCDA) to predict circRNA-disease associations based on a knowledge graph attention network. The circRNA-disease knowledge graphs are constructed by collecting multiple relationship data among circRNA, disease, miRNA and lncRNA. Then, the knowledge graph attention network is designed to obtain embeddings of each entity by distinguishing the importance of information from neighbors. Besides the low-order neighbor information, it can also capture high-order neighbor information from multisource associations, which alleviates the problem of data sparsity. Finally, a multilayer perceptron is applied to predict the affinity score of circRNA-disease associations based on the embeddings of circRNA and disease. The experimental results show that KGANCDA outperforms other state-of-the-art methods in 5-fold cross validation. Furthermore, the case study demonstrates that KGANCDA is an effective tool for predicting potential circRNA-disease associations.
Affiliation(s)
- Wei Lan
- School of Computer, Electronic and Information, Guangxi University, Nanning, China
| | - Yi Dong
- School of Computer, Electronic and Information, Guangxi University, Nanning, China
| | - Qingfeng Chen
- School of Computer, Electronic and Information, Guangxi University, Nanning, China
| | - Ruiqing Zheng
- School of Computer, Electronic and Information, Guangxi University, Nanning, China
| | - Jin Liu
- School of Computer, Electronic and Information, Guangxi University, Nanning, China
| | - Yi Pan
- School of Computer, Electronic and Information, Guangxi University, Nanning, China
| | - Yi-Ping Phoebe Chen
- School of Computer, Electronic and Information, Guangxi University, Nanning, China
|
39
|
Fan Y, Chen M, Pan X. GCRFLDA: scoring lncRNA-disease associations using graph convolution matrix completion with conditional random field. Brief Bioinform 2021; 23:6363052. [PMID: 34486019 DOI: 10.1093/bib/bbab361] [Citation(s) in RCA: 37] [Impact Index Per Article: 9.3] [Received: 05/25/2021] [Revised: 07/19/2021] [Accepted: 08/16/2021] [Indexed: 12/12/2022] Open
Abstract
Long noncoding RNAs (lncRNAs) play important roles in various biological regulatory processes, and are closely related to the occurrence and development of diseases. Identifying lncRNA-disease associations is valuable for revealing the molecular mechanism of diseases and exploring treatment strategies. Thus, it is necessary to computationally predict lncRNA-disease associations as a complementary method for biological experiments. In this study, we proposed a novel prediction method GCRFLDA based on the graph convolutional matrix completion. GCRFLDA first constructed a graph using the available lncRNA-disease association information. Then, it constructed an encoder consisting of conditional random field and attention mechanism to learn efficient embeddings of nodes, and a decoder layer to score lncRNA-disease associations. In GCRFLDA, the Gaussian interaction profile kernels similarity and cosine similarity were fused as side information of lncRNA and disease nodes. Experimental results on four benchmark datasets show that GCRFLDA is superior to other existing methods. Moreover, we conducted case studies on four diseases and observed that 70 of 80 predicted associated lncRNAs were confirmed by the literature.
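The two similarity measures named in this abstract can be illustrated with a minimal pure-Python sketch. It assumes the commonly used form of the Gaussian interaction profile (GIP) kernel, with the bandwidth normalised by the mean squared norm of the profiles. The function names, the toy association matrix, and the equal-weight fusion are illustrative assumptions and are not taken from the GCRFLDA paper.

```python
import math


def gip_kernel(profiles, gamma_prime=1.0):
    """GIP kernel between the rows of a binary association matrix."""
    n = len(profiles)
    # Bandwidth: gamma' divided by the mean squared norm of the profiles.
    mean_sq_norm = sum(sum(v * v for v in row) for row in profiles) / n
    if mean_sq_norm == 0:
        raise ValueError("association matrix has no known associations")
    gamma = gamma_prime / mean_sq_norm
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            kernel[i][j] = math.exp(-gamma * sq_dist)
    return kernel


def cosine_similarity(u, v):
    """Cosine similarity of two profiles; 0.0 if either profile is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def fused_similarity(profiles, weight=0.5):
    """Weighted average of GIP and cosine similarity (fusion rule is an assumption)."""
    gip = gip_kernel(profiles)
    n = len(profiles)
    return [
        [weight * gip[i][j] + (1 - weight) * cosine_similarity(profiles[i], profiles[j])
         for j in range(n)]
        for i in range(n)
    ]


# Toy lncRNA-by-disease association matrix (hypothetical data).
associations = [[1, 0, 1], [1, 0, 0], [0, 1, 0]]
lncrna_similarity = fused_similarity(associations)
```

Transposing the association matrix before calling `fused_similarity` would give the disease-side similarity in the same way.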
Affiliation(s)
- Yongxian Fan
- School of Computer Science and Information Security, Guilin University of Electronic Technology
| | - Meijun Chen
- Guilin University of Electronic Technology, Guilin 541004, China
| | - Xiaoyong Pan
- Department of Automation of Shanghai Jiao Tong University
|
40
|
Zheng Y, Wu Z. Cascade Deep Forest With Heterogeneous Similarity Measures for Drug-Target Interaction Prediction. Front Genet 2021; 12:702259. [PMID: 34504515 PMCID: PMC8421679 DOI: 10.3389/fgene.2021.702259] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/29/2021] [Accepted: 05/24/2021] [Indexed: 11/13/2022] Open
Abstract
Drug repositioning is a method of systematically identifying potential molecular targets that known drugs may act on. Compared with traditional methods, drug repositioning has been extensively studied due to the development of multi-omics technology and systems biology methods. Because of its biological network properties, it is possible to apply machine learning algorithms for prediction. Based on various heterogeneous network models, this paper proposes a method named THNCDF for predicting drug-target interactions. Various heterogeneous networks are integrated to build a tripartite network, and similarity calculation methods are used to obtain a similarity matrix. Then, the cascade deep forest method is used to make predictions. Results indicate that THNCDF outperforms previously reported methods under 10-fold cross-validation on the benchmark data sets proposed by Y. Yamanishi. The area under the precision-recall curve (AUPR) values on the Enzyme, GPCR, Ion Channel, and Nuclear Receptor data sets are 0.988, 0.980, 0.938, and 0.906, respectively. The experimental results illustrate the feasibility of this method.
Affiliation(s)
- Ying Zheng
- School of Computer & Communication Engineering, Changsha University of Science & Technology, Changsha, China
|
41
|
Zhao Y, Fang ZY, Lin CX, Deng C, Xu YP, Li HD. RFCell: A Gene Selection Approach for scRNA-seq Clustering Based on Permutation and Random Forest. Front Genet 2021; 12:665843. [PMID: 34386033 PMCID: PMC8354212 DOI: 10.3389/fgene.2021.665843] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/09/2021] [Accepted: 04/01/2021] [Indexed: 11/13/2022] Open
Abstract
In recent years, the application of single cell RNA-seq (scRNA-seq) has become more and more popular in fields such as biology and medical research. Analyzing scRNA-seq data can discover complex cell populations and infer single-cell trajectories in cell development. Clustering is one of the most important methods to analyze scRNA-seq data. In this paper, we focus on improving scRNA-seq clustering through gene selection, which also reduces the dimensionality of scRNA-seq data. Studies have shown that gene selection for scRNA-seq data can improve clustering accuracy. Therefore, it is important to select genes with cell type specificity. Gene selection not only helps to reduce the dimensionality of scRNA-seq data, but also can improve cell type identification in combination with clustering methods. Here, we proposed RFCell, a supervised gene selection method, which is based on permutation and random forest classification. We first use RFCell and three existing gene selection methods to select gene sets on 10 scRNA-seq data sets. Then, three classical clustering algorithms are used to cluster the cells obtained by these gene selection methods. We found that the gene selection performance of RFCell was better than other gene selection methods.
Affiliation(s)
- Yuan Zhao
- Hunan Provincial Key Laboratory on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, China
| | - Zhao-Yu Fang
- School of Mathematics and Statistics, Central South University, Changsha, China
| | - Cui-Xiang Lin
- Hunan Provincial Key Laboratory on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, China
| | - Chao Deng
- Hunan Provincial Key Laboratory on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, China
| | - Yun-Pei Xu
- Hunan Provincial Key Laboratory on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, China
| | - Hong-Dong Li
- Hunan Provincial Key Laboratory on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, China
|
42
|
Peng X, Li Y, Kong X, Zhu X, Ding X. Investigating Different DNA Methylation Patterns at the Resolution of Methylation Haplotypes. Front Genet 2021; 12:697279. [PMID: 34262601 PMCID: PMC8273290 DOI: 10.3389/fgene.2021.697279] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2021] [Accepted: 06/01/2021] [Indexed: 11/15/2022] Open
Abstract
Different DNA methylation patterns present in different tissues or cell types are considered one of the main reasons for tissue-specific gene expression. In recent years, many methods have been proposed to identify differentially methylated regions (DMRs) based on the mixture of methylation signals from homologous chromosomes. To investigate the possible influence of homologous chromosomes on methylation analysis, this paper proposes a method (MHap) to construct methylation haplotypes for homologous chromosomes in CpG-dense regions. Comparing the methylation consistency between homologous chromosomes in different cell types shows that the majority of paired methylation haplotypes derived from homologous chromosomes are consistent, while a lower methylation consistency was observed in the breast cancer sample. The hypomethylation consistency of differentiated cells is also higher than that of the corresponding undifferentiated stem cells. Furthermore, based on the methylation haplotypes constructed on homologous chromosomes, a method (MHap_DMR) is developed to identify DMRs between differentiated cells and the corresponding undifferentiated stem cells, or between the breast cancer sample and the normal breast sample. Comparing the methylation haplotype modes of DMRs in two cell types reveals the direction of DNA methylation change on homologous chromosomes during cell differentiation and cancerization. The code is available at: https://github.com/xqpeng/MHap_DMR.
Affiliation(s)
- Xiaoqing Peng
- Center for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha, China
| | - Yiming Li
- Center for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha, China
| | - Xiangyan Kong
- School of Computer Science and Engineering, Central South University, Changsha, China
| | - Xiaoshu Zhu
- School of Computer Science and Engineering, Yulin Normal University, Yulin, China
| | - Xiaojun Ding
- School of Computer Science and Engineering, Yulin Normal University, Yulin, China
|
43
|
Wang J, Wang W, Yan C, Luo J, Zhang G. Predicting Drug-Disease Association Based on Ensemble Strategy. Front Genet 2021; 12:666575. [PMID: 34012464 PMCID: PMC8128144 DOI: 10.3389/fgene.2021.666575] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 02/10/2021] [Accepted: 03/23/2021] [Indexed: 12/29/2022] Open
Abstract
Drug repositioning is used to find new uses for existing drugs, effectively shortening the drug research and development cycle and reducing costs and risks. A new model of drug repositioning based on ensemble learning is proposed. This work develops a novel computational drug repositioning approach called CMAF to discover potential drug-disease associations. First, for new drugs and diseases or unknown drug-disease pairs, based on their known neighbor information, an association probability can be obtained by implementing the weighted K nearest known neighbors (WKNKN) method and improving the drug-disease association information. Then, a new drug similarity network and new disease similarity network can be constructed. Three prediction models are applied and ensembled to enable the final association of drug-disease pairs based on improved drug-disease association information and the constructed similarity network. The experimental results demonstrate that the developed approach outperforms recent state-of-the-art prediction models. Case studies further confirm the predictive ability of the proposed method. Our proposed method can effectively improve the prediction results.
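The WKNKN preprocessing step mentioned in this abstract can be sketched as follows. This is a pure-Python illustration of the general weighted K nearest known neighbors idea as it is usually described: each row and column profile is re-estimated from its K most similar neighbours with rank-decayed weights, and the result only ever raises the original entries. The parameter names `k` and `eta`, the toy matrices, and the choice to exclude an entity from its own neighbour list are assumptions, not details confirmed by the CMAF paper.

```python
def _knn_estimate(rows, sim, k, eta):
    """Estimate each row profile from its k most similar other rows."""
    n, width = len(rows), len(rows[0])
    estimate = [[0.0] * width for _ in range(n)]
    for i in range(n):
        # Neighbours sorted by decreasing similarity, excluding the row itself.
        neighbours = sorted(
            (j for j in range(n) if j != i), key=lambda j: sim[i][j], reverse=True
        )[:k]
        norm = sum(sim[i][j] for j in neighbours)
        if norm == 0:
            continue  # no informative neighbour, leave the estimate at zero
        for rank, j in enumerate(neighbours):
            weight = (eta ** rank) * sim[i][j]  # decay with neighbour rank
            for c in range(width):
                estimate[i][c] += weight * rows[j][c] / norm
    return estimate


def _transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def wknkn(assoc, sim_rows, sim_cols, k=2, eta=0.5):
    """Return an association matrix with zeros replaced by neighbour-based scores."""
    row_est = _knn_estimate(assoc, sim_rows, k, eta)
    col_est = _transpose(_knn_estimate(_transpose(assoc), sim_cols, k, eta))
    return [
        [max(assoc[i][j], (row_est[i][j] + col_est[i][j]) / 2)
         for j in range(len(assoc[0]))]
        for i in range(len(assoc))
    ]


# Hypothetical data: drug 2 has no known disease association.
drug_disease = [[1, 0], [0, 1], [0, 0]]
drug_sim = [[1.0, 0.2, 0.8], [0.2, 1.0, 0.4], [0.8, 0.4, 1.0]]
disease_sim = [[1.0, 0.5], [0.5, 1.0]]
improved = wknkn(drug_disease, drug_sim, disease_sim)
```

In this toy case the row for the new drug goes from all zeros to small positive scores that lean towards the disease of its most similar neighbour, while known associations stay at 1.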
Affiliation(s)
- Jianlin Wang
- School of Computer and Information Engineering, Henan University, Kaifeng, China
| | - Wenxiu Wang
- School of Computer and Information Engineering, Henan University, Kaifeng, China
| | - Chaokun Yan
- School of Computer and Information Engineering, Henan University, Kaifeng, China
| | - Junwei Luo
- College of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, China
| | - Ge Zhang
- School of Computer and Information Engineering, Henan University, Kaifeng, China
|
44
|
Liu Z, Chen Q, Lan W, Pan H, Hao X, Pan S. GADTI: Graph Autoencoder Approach for DTI Prediction From Heterogeneous Network. Front Genet 2021; 12:650821. [PMID: 33912218 PMCID: PMC8072283 DOI: 10.3389/fgene.2021.650821] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 01/08/2021] [Accepted: 03/12/2021] [Indexed: 12/26/2022] Open
Abstract
Identifying drug-target interactions (DTIs) is the basis for drug development. However, using biochemical experiments to discover drug-target interactions has low coverage and high costs. Many computational methods have been developed to predict potential drug-target interactions based on known drug-target interactions, but the accuracy of these methods still needs to be improved. In this article, a graph autoencoder approach for DTI prediction (GADTI) is proposed to discover potential interactions between drugs and targets using a heterogeneous network, which integrates diverse drug-related and target-related datasets. Its encoder consists of two components: a graph convolutional network (GCN) and a random walk with restart (RWR). The decoder is DistMult, a matrix factorization model, which uses embedding vectors from the encoder to discover potential DTIs. The combination of GCN and RWR can provide nodes with more information through a larger neighborhood, and it can also avoid the over-smoothing and computational complexity caused by multi-layer message passing. Based on 10-fold cross-validation, we conduct three experiments in different scenarios. The results show that GADTI is superior to the baseline methods in both the area under the receiver operating characteristic curve and the area under the precision-recall curve. In addition, based on the latest DrugBank dataset (V5.1.8), the case study shows that 54.8% of newly approved DTIs are predicted by GADTI.
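For readers unfamiliar with the random walk with restart (RWR) named in this abstract, the following is a minimal pure-Python sketch of the standard iteration p ← (1 − r)·W·p + r·p0 on a small undirected graph. It illustrates RWR in isolation only. How GADTI couples RWR with its GCN encoder is not shown, and the function name, restart probability and toy graph are assumptions.

```python
def random_walk_with_restart(adjacency, seed, restart=0.5, tol=1e-12, max_iter=1000):
    """Stationary visiting probabilities of a walker that restarts at `seed`."""
    n = len(adjacency)
    # Column-normalise the adjacency matrix to get transition probabilities.
    col_sums = [sum(adjacency[i][j] for i in range(n)) for j in range(n)]
    transition = [
        [adjacency[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]
    start = [0.0] * n
    start[seed] = 1.0
    prob = start[:]
    for _ in range(max_iter):
        updated = [
            (1 - restart) * sum(transition[i][j] * prob[j] for j in range(n))
            + restart * start[i]
            for i in range(n)
        ]
        converged = sum(abs(a - b) for a, b in zip(updated, prob)) < tol
        prob = updated
        if converged:
            break
    return prob


# Path graph 0 - 1 - 2, with the walk restarting at node 0.
path_graph = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
scores = random_walk_with_restart(path_graph, seed=0)
```

For this path graph the scores converge to 7/12, 1/3 and 1/12, so relevance decays with distance from the seed while still reaching nodes beyond the immediate neighbourhood.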
Affiliation(s)
- Zhixian Liu
- School of Medical, Guangxi University, Nanning, China; School of Electronics and Information Engineering, Beibu Gulf University, Qinzhou, China
| | - Qingfeng Chen
- School of Computer, Electronic and Information, Guangxi University, Nanning, China
| | - Wei Lan
- School of Computer, Electronic and Information, Guangxi University, Nanning, China
| | - Haiming Pan
- School of Computer, Electronic and Information, Guangxi University, Nanning, China
| | - Xinkun Hao
- School of Computer, Electronic and Information, Guangxi University, Nanning, China
| | - Shirui Pan
- Department of Data Science and AI, Monash University, Melbourne, VIC, Australia
|
45
|
Xie G, Huang B, Sun Y, Wu C, Han Y. RWSF-BLP: a novel lncRNA-disease association prediction model using random walk-based multi-similarity fusion and bidirectional label propagation. Mol Genet Genomics 2021; 296:473-483. [PMID: 33590345 DOI: 10.1007/s00438-021-01764-3] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 12/06/2020] [Accepted: 01/28/2021] [Indexed: 12/13/2022]
Abstract
An increasing number of studies and experiments have demonstrated that long noncoding RNAs (lncRNAs) have a massive impact on various biological processes. Predicting potential associations between lncRNAs and diseases can not only improve our understanding of the molecular mechanisms of human diseases but also facilitate the identification of biomarkers for disease diagnosis, treatment, and prevention. However, identifying such associations through experiments is costly and demanding, which has prompted researchers to develop computational methods to complement these experiments. In this paper, we constructed a novel model called RWSF-BLP (a novel lncRNA-disease association prediction model using Random Walk-based multi-Similarity Fusion and Bidirectional Label Propagation), which applies an efficient random walk-based multi-similarity fusion (RWSF) method to fuse different similarity matrices and utilizes bidirectional label propagation to predict potential lncRNA-disease associations. Leave-one-out cross-validation (LOOCV) and 5-fold cross-validation (5-fold-CV) were used to evaluate the performance of RWSF-BLP. Results showed that RWSF-BLP achieved reliable AUCs of 0.9086 and 0.9115 ± 0.0044 under LOOCV and 5-fold-CV, respectively, and outperformed four other canonical methods. Case studies on lung cancer and leukemia demonstrated that potential lncRNA-disease associations can be predicted through our method. Therefore, our method can accurately infer potential lncRNA-disease associations and may be a good choice in future biomedical research.
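The label propagation idea referred to in this abstract can be sketched with the generic update F ← α·S·F + (1 − α)·Y, run once over the row entities and once over the column entities and then averaged. This is a minimal pure-Python illustration under stated assumptions: the row normalisation, the value of `alpha`, the fixed iteration count and the simple averaging of the two directions are hypothetical choices, not the published RWSF-BLP formulation.

```python
def row_normalise(sim):
    """Scale each row of a similarity matrix to sum to 1 (zero rows stay zero)."""
    normalised = []
    for row in sim:
        total = sum(row)
        normalised.append([v / total if total else 0.0 for v in row])
    return normalised


def propagate(sim, labels, alpha=0.5, iters=200):
    """Iterate F <- alpha * S * F + (1 - alpha) * Y for a fixed number of steps."""
    S = row_normalise(sim)
    n, m = len(labels), len(labels[0])
    F = [list(row) for row in labels]
    for _ in range(iters):
        F = [
            [alpha * sum(S[i][k] * F[k][j] for k in range(n))
             + (1 - alpha) * labels[i][j]
             for j in range(m)]
            for i in range(n)
        ]
    return F


def bidirectional_propagate(sim_rows, sim_cols, labels, alpha=0.5):
    """Average the scores propagated over row entities and over column entities."""
    from_rows = propagate(sim_rows, labels, alpha)
    labels_t = [list(col) for col in zip(*labels)]
    from_cols_t = propagate(sim_cols, labels_t, alpha)
    return [
        [(from_rows[i][j] + from_cols_t[j][i]) / 2 for j in range(len(labels[0]))]
        for i in range(len(labels))
    ]


# Hypothetical example: two lncRNAs, one disease, one known association.
scores = bidirectional_propagate([[0, 1], [1, 0]], [[1.0]], [[1.0], [0.0]])
```

Here the unlabelled lncRNA receives a non-zero score purely because it is similar to the labelled one, which is the behaviour that makes propagation useful for ranking candidate associations.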
Affiliation(s)
- Guobo Xie
- School of Computer Science, Guangdong University of Technology, Guangzhou, China
| | - Bin Huang
- School of Computer Science, Guangdong University of Technology, Guangzhou, China
| | - Yuping Sun
- School of Computer Science, Guangdong University of Technology, Guangzhou, China.
| | - Changhai Wu
- School of Computer Science, Guangdong University of Technology, Guangzhou, China
| | - Yuqiong Han
- School of Computer Science, Guangdong University of Technology, Guangzhou, China
|
46
|
He Z, Zhang J, Yuan X, Zhang Y. Integrating Somatic Mutations for Breast Cancer Survival Prediction Using Machine Learning Methods. Front Genet 2021; 11:632901. [PMID: 33537063 PMCID: PMC7848170 DOI: 10.3389/fgene.2020.632901] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 11/24/2020] [Accepted: 12/30/2020] [Indexed: 12/13/2022] Open
Abstract
Breast cancer is the most common malignancy in women, and because it has a high mortality rate, there is an urgent need for computational methods that increase the accuracy of breast cancer survival prediction models. Although multi-omics data such as gene expression have been extensively used in recent studies, the accurate prognosis of breast cancer remains a challenge. Somatic mutations are another important and promising data source for studying cancer development, and their effect on the prognosis of breast cancer remains to be further explored. Meanwhile, these omics datasets are high-dimensional and redundant. Therefore, we adopted multiple kernel learning (MKL) to efficiently integrate somatic mutations with current molecular data, including gene expression, copy number variation (CNV), methylation, and protein expression data, for the prediction of breast cancer survival. Before integration, the maximum relevance minimum redundancy (mRMR) feature selection method was used to select, for each type of data, features that show high relevance to survival and low redundancy among themselves. The experimental results demonstrated that the proposed method achieved the best performance and that prediction performance improved markedly when somatic mutations were included, indicating that somatic mutations are critical for improving breast cancer survival predictions. Moreover, mRMR was superior to other feature selection methods used in previous studies. Furthermore, MKL outperformed the other traditional classifiers in multi-omics data integration. Our analysis indicated that, by employing promising omics data such as somatic mutations and harnessing proper feature selection methods and effective integration frameworks, the accuracy of breast cancer survival prediction can be further increased, thereby supporting better clinical diagnosis and more effective treatment for breast cancer patients.
Affiliation(s)
- Zongzhen He
- School of Computer Science and Technology, Xidian University, Xi'an, China
| | - Junying Zhang
- School of Computer Science and Technology, Xidian University, Xi'an, China
| | - Xiguo Yuan
- School of Computer Science and Technology, Xidian University, Xi'an, China
| | - Yuanyuan Zhang
- School of Information and Control Engineering, Qingdao University of Technology, Qingdao, China
|
47
|
Gu C, Shi X, Dang X, Chen J, Chen C, Chen Y, Pan X, Huang T. Identification of Common Genes and Pathways in Eight Fibrosis Diseases. Front Genet 2021; 11:627396. [PMID: 33519923 PMCID: PMC7844395 DOI: 10.3389/fgene.2020.627396] [Citation(s) in RCA: 34] [Impact Index Per Article: 8.5] [Received: 11/09/2020] [Accepted: 12/15/2020] [Indexed: 01/05/2023] Open
Abstract
Acute and chronic inflammation often lead to fibrosis, which is also the common and final pathological outcome of chronic inflammatory diseases. To explore the common genes and pathogenic pathways among different fibrotic diseases, we collected all the reported genes of eight fibrotic diseases: eye fibrosis, heart fibrosis, hepatic fibrosis, intestinal fibrosis, lung fibrosis, pancreas fibrosis, renal fibrosis, and skin fibrosis. We calculated the Kyoto Encyclopedia of Genes and Genomes (KEGG) and Gene Ontology (GO) enrichment scores of all fibrotic disease genes. Each gene was encoded using KEGG and GO enrichment scores, which reflect how strongly a gene can affect the corresponding function. For each fibrotic disease, the key KEGG and GO features were identified by comparing the KEGG and GO enrichment scores between reported disease genes and other genes using the Monte Carlo feature selection (MCFS) method. We compared the gene overlaps among the eight fibrotic diseases, and connective tissue growth factor (CTGF) was finally identified as the common key molecule. The key KEGG and GO features of the eight fibrotic diseases were all screened by the MCFS method. Moreover, we found overlapping pathways between renal fibrosis and skin fibrosis, such as GO:1901890 (positive regulation of cell junction assembly), as well as common regulatory genes, such as CTGF, which is the key molecule regulating fibrogenesis. We hope to offer new insight into the cellular and molecular mechanisms underlying fibrosis and thereby help lead to the development of new drugs that specifically delay or even improve the symptoms of fibrosis.
Affiliation(s)
- Chang Gu
- Department of Thoracic Surgery, Shanghai Chest Hospital, Shanghai Jiao Tong University, Shanghai, China
- Department of Thoracic Surgery, Shanghai Pulmonary Hospital, Tongji University School of Medicine, Shanghai, China
| | - Xin Shi
- Department of Cardiology, Shanghai Chest Hospital, Shanghai Jiao Tong University, Shanghai, China
| | - Xuening Dang
- Department of Colorectal and Anal Surgery, Xinhua Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Shanghai Colorectal Cancer Research Center, Shanghai, China
| | - Jiafei Chen
- Department of Thoracic Surgery, Shanghai Pulmonary Hospital, Tongji University School of Medicine, Shanghai, China
| | - Chunji Chen
- Department of Thoracic Surgery, Shanghai Chest Hospital, Shanghai Jiao Tong University, Shanghai, China
| | - Yumei Chen
- Department of Nuclear Medicine, Ren Ji Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
| | - Xufeng Pan
- Department of Thoracic Surgery, Shanghai Chest Hospital, Shanghai Jiao Tong University, Shanghai, China
| | - Tao Huang
- Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, Shanghai, China
|